Commit Graph

113 Commits

Author SHA1 Message Date
Andrew Seigner 64ed8e4a74
Introduce Cluster Heartbeat cronjob (#3056)
`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.

Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.

Example check URL:
https://versioncheck.linkerd.io/version.json?
  install-time=1562761177&
  k8s-version=v1.15.0&
  meshed-pods=8&
  rps=3&
  source=heartbeat&
  uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
  version=git-b97ee9f7

Fixes #2961

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-23 17:12:30 -07:00
Alex Leong d6ef9ea460
Update ServiceProfile CRD to version v1alpha2 and remove validation (#3078)
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller.  Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.

To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2.  This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource.  If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can.  This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.  

Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation).  In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-07-23 11:46:31 -07:00
Ivan Sim 36681218ba
Revert "Increase the retry duration in the post-upgrade 'check' integration test (#2944)" (#3091)
This reverts commit 60c58c1f85.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-07-18 09:25:27 -07:00
Alejandro Pedraza 68f2f694e3
Improve object cleanup when integration tests fail (#3080)
Integration tests may fail and leave behind namespaces that following
builds aren't able to clean up because the git sha is being included in
the namespace name, and the following builds don't know about those
shas.

This modifies the `test-cleanup` script to delete based on object labels
instead of relying on the objects names, now that after 2.4 all the
control plane components are labeled. Note that this will also remove
non-testing linkerd namespaces, but we were already kinda doing that
partially because we were removing the cluster-level resources (CRDs,
webhook configs, clusterroles, clusterrolebindings, psp).

`test-cleanup` no longer receives a namespace name as an argument.

The data plane namespaces aren't labeled though, so I've added the
`linkerd.io/is-test-data-plane` label for them in
`CreateNamespaceIfNotExists()`, and making sure all tests that need a
data plaine explicitly call that method instead of creating the
namespace as a side-effect in `KubectlApply()`.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-07-12 15:01:10 -05:00
Jonathan Juares Beber 2dcbde08b3 Show pod status more clearly (#1967) (#2989)
During operations with `linkerd stat` sometimes it's not clear the actual
pod status.

This commit introduces a method, to the `k8s`package, getting the pod status,
based on [`kubectl` logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640))
to expose the `STATUS` column for pods . Also, it changes the stat command
on the` cli` package adding a column when the resource type is a Pod.

Fixes #1967

Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
2019-07-10 12:44:44 -07:00
Andrew Seigner 5d0746ff91
Add NET_RAW to `linkerd check --pre` (#3055)
`linkerd check --pre` validates that PSPs provide `NET_ADMIN`, but was
not validating `NET_RAW`, despite `NET_RAW` being required by Linkerd's
proxy-init container since #2969.

Introduce a `has NET_RAW capability` check to `linkerd check --pre`.

Fixes #3054

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-10 20:28:49 +02:00
Andrew Seigner 7c87fd4498
Make ReplicaSet check more explicit. (#3017)
The `linkerd check` for healthy ReplicaSets had a generic
`control plane components ready` description, and a hint anchor to
`l5d-existence-psp`. While a ReplicaSet failure could definitely occur
due to psp, that hintAnchor was already in use by the "control plane
PodSecurityPolicies exist" check.

Rename the `control plane components ready` check to
`control plane replica sets are ready`, and the hintAnchor from
`l5d-existence-psp` to `l5d-existence-replicasets`.

Relates to https://github.com/linkerd/website/issues/372.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-07-02 20:02:08 +02:00
Ivan Sim 866fe6fa5e
Introduce global resources checks to install and multi-stage install (#2987)
* Introduce new checks to determine existence of global resources and the
'linkerd-config' config map.
* Update pre-check to check for existence of global resources

This ensures that multiple control planes can't be installed into
different namespaces.

* Update integration test clean-up script to delete psp and crd

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-27 09:59:12 -07:00
Andrew Seigner 2528e3d62d
Make NET_ADMIN check a warning, add PSP check (#2958)
`linkerd check` validates whether PSP's exist, and if the caller has the
`NET_ADMIN` capability. This check was previously failing if `NET_ADMIN`
was not found, even in the case where the PSP admission controller was
not running. Related, `linkerd install` now includes a PSP, so
`linkerd check` should also validate that the caller can create PSP's.

Modify the `NET_ADMIN` check to warn, but not fail, if PSP's are found
but the caller does not have `NET_ADMIN`. Update the warning message to
mention that this is only a problem if the PSP admission controller is
running (and will only be a problem during injection, since #2920
handles control plane installation by adding its own PSP).

Also introduce a check to validate the caller can create PSP's.

Fixes #2884, #2849

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-06-20 17:58:26 +02:00
Ivan Sim e2e976cce9
Add `NET_RAW` capability to the proxy-init container (#2969)
Also, update control plane PSP to match linkerd/website#94

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-19 19:34:37 -07:00
Ivan Sim 60c58c1f85
Increase the retry duration in the post-upgrade 'check' integration test (#2944)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-14 12:23:18 -07:00
Alejandro Pedraza 7fc6c195ad
Set MWC and VWC failure policy to 'fail' in HA mode only (#2943)
Fixes #2927

Also moved `TestInstallSP` after `TestCheckPostInstall` so we're sure
the validating webhook is ready before installing a service profile.

Signed-off-by: Alejandro Pedraza Borrero <alejandro@buoyant.io>
2019-06-14 11:50:59 -05:00
Ivan Sim c4354ab1a4
Fix inject integration test failure (#2928)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-06-11 15:08:37 -07:00
Alex Leong 06a69f69c5
Refactor destination service (#2786)
This is a major refactor of the destination service.  The goals of this refactor are to simplify the code for improved maintainability.  In particular:

* Remove the "resolver" interfaces.  These were a holdover from when our decision tree was more complex about how to handle different kinds of authorities.  The current implementation only accepts fully qualified kubernetes service names and thus this was an unnecessary level of indirection.
* Moved the endpoints and profile watchers into their own package for a more clear separation of concerns.  These watchers deal only in Kubernetes primitives and are agnostic to how they are used.  This allows a cleaner layering when we use them from our gRPC service.
* Renamed the "listener" types to "translator" to make it more clear that the function of these structs is to translate kubernetes updates from the watcher to gRPC messages.

Signed-off-by: Alex Leong <alex@buoyant.io>
2019-06-04 15:01:16 -07:00
Alejandro Pedraza 74ca92ea25
Split proxy-init into separate repo (#2824)
Split proxy-init into separate repo

Fixes #2563

The new repo is https://github.com/linkerd/linkerd2-proxy-init, and I
tagged the latest there `v1.0.0`.

Here, I've removed the `/proxy-init` dir and pinned the injected
proxy-init version to `v1.0.0` in the injector code and tests.

`/cni-plugin` depends on proxy-init, so I updated the import paths
there, and could verify CNI is still working (there is some flakiness
but unrelated to this PR).

For consistency, I added a `--init-image-version` flag to `linkerd
inject` along with its corresponding override config annotation.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-06-03 16:24:05 -05:00
Alejandro Pedraza b384995397
Known warnings should hold just the message part, not the reason nor the involved object name (#2865)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-29 17:13:31 -05:00
Dennis Adjei-Baah b9cc66c6c8
fixes roleRef name in linkerd-tap rbac (#2845)
* fixes roleRef name in linkerd-tap rbac
2019-05-28 10:05:01 -07:00
Andrew Seigner bd4c2788fa Modify upgrade integration test to upgrade stable (#2835)
In #2679 we introduced an upgrade integration test. At the time we only
supported upgrading from a recent edge. Since that PR, a stable build
was released supporting upgrade.

Modify the upgrade integration test to upgrade from the latest stable
rather than latest edge. This fulfills the original intent of #2669.

Also add some known k8s event warnings.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-05-20 13:36:07 -07:00
Dennis Adjei-Baah a0fa1dff59
Move tap service into its own pod. (#2773)
* Split tap into its own pod in the control plane

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-05-15 16:28:44 -05:00
mg b965d0d30e Introduce pre-install healthcheck for clock skew (#2803)
* Adding pre-install check for clock skew
* Fixing lint error - time.Since
* Update test data for clock skew check
* Incorporating code review comments
* Additional fix - clock skew test

Signed-off-by: Matej Gera <matejgera@gmail.com>
2019-05-13 10:14:38 -07:00
Alejandro Pedraza 065c221858
Support for resources opting-out of tap (#2807)
Support for resources opting out of tap

Implements the `linkerd inject --disable-tap` flag (although hidden pending #2811) and the config override annotation `config.linkerd.io/disable-tap`.
Fixes #2778

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-10 14:17:23 -05:00
Alejandro Pedraza f4ab9d6f9e
Integration test for k8s events generated during install (#2805)
Integration test for k8s events generated during install

Fixes #2713

I did make sure a scenario like the one described in #2964 is caught.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-10 09:09:54 -05:00
Alejandro Pedraza 2ae0daca9f
Add `linkerd inject --manual` test into install_test.go (#2791)
Add `linkerd inject --manual` test into install_test.go

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-05-08 11:13:30 -05:00
Ivan Sim 00a94be073
Change the service profile assertion conditions (#2785)
With the server configured to response with a failure of 50%, the test first
checks to ensure the actual success rate is less than 100%. Then the
service profile is edited to perform retries. The test then checks to
ensure the effective success rate is at least 95%.

This is (hopefully) more reliable than changing the test to perform waits and
retries until there is a difference between effective success rate and actual
success rate and compare them.

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-05-03 08:37:48 -07:00
Andrew Seigner 66494591e0
Multi-stage check support (#2765)
Add support for `linkerd check config`. Validates the existence of the
Linkerd Namespace, ClusterRoles, ClusterRoleBindings, ServiceAccounts,
and CustomResourceDefitions.

Part of #2337

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-30 17:17:59 +01:00
Andrew Seigner 85a7c885d4
Add known proxy-injector log warning to tests (#2760)
PR #2737 introduced a warning in the proxy-injector when owner ref
lookups failed due to not having up-to-date ReplicaSet information. That
warning may occur during integration tests, causing a failure.

Add the warning as a known controller log message. The warning will be
printed as a skipped test, allowing the integration tests to pass.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-30 13:38:59 +01:00
Ivan Sim 714035fee9
Define default resource spec for proxy-init init container (#2763)
Fixes #2750 

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-29 11:41:05 -07:00
Andrew Seigner 15ffd86cf1
Introduce multi-stage upgrade (#2723)
`linkerd install` supports a 2-stage install process, `linkerd upgrade`
did not.

Add 2-stage support for `linkerd upgrade`. Also exercise multi-stage
functionality during upgrade integration tests.

Part of #2337

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-25 14:29:52 -07:00
Alejandro Pedraza 53bb7c47f6
Make the auto-injector required and removed proxy-auto-inject flag (#2733)
Make the auto-injector required and removed proxy-auto-inject flag

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-24 13:06:51 -05:00
Alejandro Pedraza 62d9a80894
New `linkerd inject` default and manual modes (#2721)
Fixes #2720 and 2711 

This changes the default behavior of `linkerd inject` to not inject the
proxy but just the `linkerd.io/inject: enabled` annotation for the
auto-injector to pick it up (regardless of any namespace annotation).

A new `--manual` mode was added, which behaves as before, injecting
the proxy in the command output.

The unit tests are running with `--manual` to avoid any changes in the
fixtures.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-24 09:05:27 -05:00
Alejandro Pedraza c56766a923
Add config.linkerd.io/disable-identity annotation (#2717)
Add config.linkerd.io/disable-identity annotation

First part of #2540

We'll tackle support for `--disable-identity` in `linkerd install` in a
separate commit.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-19 19:04:49 -04:00
Andrew Seigner ebbb55cf6b
Report known logging errors in test output (#2724)
The integration tests check container logs for errors. When an error is
encountered that matches a list of expected errors, it was hidden, and
the test passed.

Modify the integration tests to report known errors in logs via
`t.Skipf`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-19 14:22:35 -07:00
Gaurav Kumar 392cab80fa Add check for unschedulable pods and psp issues (#2510)
Fixes #2465 

* Add check for unschedulable pods and psp issues (#2465)
* Return error reason and message on pod or node failure

Signed-off-by: Gaurav Kumar <gaurav.kumar9825@gmail.com>
2019-04-19 11:42:22 -07:00
Ivan Sim 8d13084f94
Split the `linkerd-version` CLI flag into `control-plane-version` and `proxy-version` (#2702)
* The 'linkerd-version' CLI flag is renamed to 'control-plane-version'
* Add version field to proxy config
* Add the control plane version to the global config
* Unit test for init image version
* Use more specific control plane and proxy versions in unit tests

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-19 11:35:20 -07:00
Dennis Adjei-Baah be614656bb
add service profile integration tests for service profile metrics (#2685)
* add integration tests for retryable requests

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-04-18 11:01:49 -07:00
Andrew Seigner 8323e104fb
Introduce upgrade --from-manifests flag (#2697)
The `linkerd upgrade` command read the control-plane's config from
Kubernetes, which required the environment to be configured to connect
to the appropriate k8s cluster.

Intrdouce a `linkerd upgrade --from-manifests` flag, allowing the user
to feed the output of `linkerd install` into the upgrade command.

Fixes #2629

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-17 13:32:21 -07:00
Ivan Sim 1c0f147718
Integration test for the 'upgrade' command (#2679)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-11 19:37:50 -07:00
Ivan Sim 4e19827457
Allow identity to be disabled during inject on existing cluster (#2686)
Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-04-11 13:37:06 -07:00
Andrew Seigner 46d1490f98
Skip proxy logging errors in integration tests (#2647)
The list of known proxy log errors has been growing, and causing regular
ci failures.

Skip proxy logging errors. The tests will continue to run and report
unexpected errors, but this will not fail the tests (and ci). Also break
out the controller log errors separately, and continue to fail on those.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-10 16:43:56 -07:00
Dennis Adjei-Baah c166b1db55
Service profiles integration tests (#2638)
adds integration tests for service profiles that test profile generation through the tap and Open API flags.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2019-04-09 10:16:26 -07:00
Alejandro Pedraza edb225069c
Add validation webhook for service profiles (#2623)
Add validation webhook for service profiles

Fixes #2075

Todo in a follow-up PRs: remove the SP check from the CLI check.

Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
2019-04-05 16:10:47 -05:00
Andrew Seigner 2f80add17a
Introduce inject integration tests (#2616)
This change introduces integration tests for `linkerd inject`. The tests
perform CLI injection, with and without params, and validates the
output, including annotations.

Also add some known errors in logs to `install_test.go`.

TODO:
- deploy uninjected and injected resources to a default and
  auto-injected cluster
- test creation and update

Part of #2459

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-05 11:42:49 -07:00
harsh jain 976bc40345 Fixes #2607: Remove TLS from stat (#2613)
Removes the TLS percentages from the stat command in the CLI.
2019-04-04 10:37:42 -07:00
Andrew Seigner a3bba0e143
Fix logging regex to handle any addr (#2619)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-02 12:47:38 -07:00
Andrew Seigner a21c7be9f5
Fix ci logging errors (#2617)
Introduce some additional known logging errors to the integration tests

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-02 11:24:30 -07:00
Andrew Seigner b454f8fbc1
Introduce auto inject integration tests (#2595)
The integration tests were not exercising proxy auto inject.

Introduce a `--proxy-auto-inject` flag to `install_test.go`, which
now exercises install, check, and smoke test deploy for both manual and
auto injected use cases.

Part of #2569

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-04-01 10:32:56 -07:00
Andrew Seigner 9eab0e28a6
Introduce ServiceProfile integration tests (#2588)
The existing integration tests were not validating ServiceProfile
functionality.

Introduce ServiceProfile integration tests that:
- install control-plane ServiceProfiles via `linkerd install-sp`
- install smoke-test ServiceProfiles via `linkerd profile --proto`
- validate well-formed ServiceProfiles via `linkerd check`
- validate `linkerd routes` returns expected output

Fixes #2520

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-03-29 10:52:54 -07:00
Alejandro Pedraza e711ab7b15
Update tap integration test to account for always-on mTLS (#2589)
Out of all the integration tests (egress, get, stat, tap and
install_test) only in stat and tap do meshed (proxy-to-proxy) connections take
place, which we can test are 100% TLS.

For stat, #2537 already added such check for connections with the
Prometheus pod (connections to other pods are not meshed, apparently).

This commit adds such check for tap.

Fixes #2519
2019-03-29 12:28:47 -05:00
Risha Mars eda36e3258
Always show TCP open connections in the CLI (#2533)
Allow the TCP CONNECTIONS column to be shown on all stat queries in the CLI.
This column will now be called TCP_CONN for brevity.
Read/Write bytes will still only be shown on -o wide or -o json
2019-03-27 13:34:28 -07:00
Oliver Gould da0330743f
Provide peer Identities via the Destination API (#2537)
This change reintroduces identity hinting to the destination service.
The Get endpoint includes identities for pods that are injected with an
identity-mode of "default" and have the same linkerd control plane.

A `serviceaccount` label is now also added to destination response
metadata so that it's accessible in prometheus and tap.
2019-03-22 09:19:14 -07:00