`linkerd check`, the web dashboard, and Grafana all perform version
checks to validate Linkerd is up to date. It's common for users to
seldom execute these codepaths. This makes it difficult to identify what
versions of Linkerd are currently in use and what environments it is
being run in, which helps prioritize testing and backports.
Introduce a `heartbeat` CronJob to the default Linkerd install. The
cronjob executes every 24 hours, starting from 5 minutes after
`linkerd install` is run.
Example check URL:
https://versioncheck.linkerd.io/version.json?
install-time=1562761177&
k8s-version=v1.15.0&
meshed-pods=8&
rps=3&
source=heartbeat&
uuid=cc4bb700-3314-426a-9f0f-ec588b9df020&
version=git-b97ee9f7
Fixes#2961
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.
To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.
Bumping the version is necessary because if we did not, it would be possible for the controller to start before the CRD is updated (removing the validation). In this case, when the CRD is edited, the controller will lose its list watch on ServiceProfiles and will stop getting updates.
Signed-off-by: Alex Leong <alex@buoyant.io>
Integration tests may fail and leave behind namespaces that following
builds aren't able to clean up because the git sha is being included in
the namespace name, and the following builds don't know about those
shas.
This modifies the `test-cleanup` script to delete based on object labels
instead of relying on the objects names, now that after 2.4 all the
control plane components are labeled. Note that this will also remove
non-testing linkerd namespaces, but we were already kinda doing that
partially because we were removing the cluster-level resources (CRDs,
webhook configs, clusterroles, clusterrolebindings, psp).
`test-cleanup` no longer receives a namespace name as an argument.
The data plane namespaces aren't labeled though, so I've added the
`linkerd.io/is-test-data-plane` label for them in
`CreateNamespaceIfNotExists()`, and making sure all tests that need a
data plaine explicitly call that method instead of creating the
namespace as a side-effect in `KubectlApply()`.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
During operations with `linkerd stat` sometimes it's not clear the actual
pod status.
This commit introduces a method, to the `k8s`package, getting the pod status,
based on [`kubectl` logic](33a3e325f7/pkg/printers/internalversion/printers.go (L558-L640))
to expose the `STATUS` column for pods . Also, it changes the stat command
on the` cli` package adding a column when the resource type is a Pod.
Fixes#1967
Signed-off-by: Jonathan Juares Beber <jonathanbeber@gmail.com>
`linkerd check --pre` validates that PSPs provide `NET_ADMIN`, but was
not validating `NET_RAW`, despite `NET_RAW` being required by Linkerd's
proxy-init container since #2969.
Introduce a `has NET_RAW capability` check to `linkerd check --pre`.
Fixes#3054
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The `linkerd check` for healthy ReplicaSets had a generic
`control plane components ready` description, and a hint anchor to
`l5d-existence-psp`. While a ReplicaSet failure could definitely occur
due to psp, that hintAnchor was already in use by the "control plane
PodSecurityPolicies exist" check.
Rename the `control plane components ready` check to
`control plane replica sets are ready`, and the hintAnchor from
`l5d-existence-psp` to `l5d-existence-replicasets`.
Relates to https://github.com/linkerd/website/issues/372.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Introduce new checks to determine existence of global resources and the
'linkerd-config' config map.
* Update pre-check to check for existence of global resources
This ensures that multiple control planes can't be installed into
different namespaces.
* Update integration test clean-up script to delete psp and crd
Signed-off-by: Ivan Sim <ivan@buoyant.io>
`linkerd check` validates whether PSP's exist, and if the caller has the
`NET_ADMIN` capability. This check was previously failing if `NET_ADMIN`
was not found, even in the case where the PSP admission controller was
not running. Related, `linkerd install` now includes a PSP, so
`linkerd check` should also validate that the caller can create PSP's.
Modify the `NET_ADMIN` check to warn, but not fail, if PSP's are found
but the caller does not have `NET_ADMIN`. Update the warning message to
mention that this is only a problem if the PSP admission controller is
running (and will only be a problem during injection, since #2920
handles control plane installation by adding its own PSP).
Also introduce a check to validate the caller can create PSP's.
Fixes#2884, #2849
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2927
Also moved `TestInstallSP` after `TestCheckPostInstall` so we're sure
the validating webhook is ready before installing a service profile.
Signed-off-by: Alejandro Pedraza Borrero <alejandro@buoyant.io>
This is a major refactor of the destination service. The goals of this refactor are to simplify the code for improved maintainability. In particular:
* Remove the "resolver" interfaces. These were a holdover from when our decision tree was more complex about how to handle different kinds of authorities. The current implementation only accepts fully qualified kubernetes service names and thus this was an unnecessary level of indirection.
* Moved the endpoints and profile watchers into their own package for a more clear separation of concerns. These watchers deal only in Kubernetes primitives and are agnostic to how they are used. This allows a cleaner layering when we use them from our gRPC service.
* Renamed the "listener" types to "translator" to make it more clear that the function of these structs is to translate kubernetes updates from the watcher to gRPC messages.
Signed-off-by: Alex Leong <alex@buoyant.io>
Split proxy-init into separate repo
Fixes#2563
The new repo is https://github.com/linkerd/linkerd2-proxy-init, and I
tagged the latest there `v1.0.0`.
Here, I've removed the `/proxy-init` dir and pinned the injected
proxy-init version to `v1.0.0` in the injector code and tests.
`/cni-plugin` depends on proxy-init, so I updated the import paths
there, and could verify CNI is still working (there is some flakiness
but unrelated to this PR).
For consistency, I added a `--init-image-version` flag to `linkerd
inject` along with its corresponding override config annotation.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
In #2679 we introduced an upgrade integration test. At the time we only
supported upgrading from a recent edge. Since that PR, a stable build
was released supporting upgrade.
Modify the upgrade integration test to upgrade from the latest stable
rather than latest edge. This fulfills the original intent of #2669.
Also add some known k8s event warnings.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Support for resources opting out of tap
Implements the `linkerd inject --disable-tap` flag (although hidden pending #2811) and the config override annotation `config.linkerd.io/disable-tap`.
Fixes#2778
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Integration test for k8s events generated during install
Fixes#2713
I did make sure a scenario like the one described in #2964 is caught.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
With the server configured to response with a failure of 50%, the test first
checks to ensure the actual success rate is less than 100%. Then the
service profile is edited to perform retries. The test then checks to
ensure the effective success rate is at least 95%.
This is (hopefully) more reliable than changing the test to perform waits and
retries until there is a difference between effective success rate and actual
success rate and compare them.
Signed-off-by: Ivan Sim <ivan@buoyant.io>
Add support for `linkerd check config`. Validates the existence of the
Linkerd Namespace, ClusterRoles, ClusterRoleBindings, ServiceAccounts,
and CustomResourceDefitions.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
PR #2737 introduced a warning in the proxy-injector when owner ref
lookups failed due to not having up-to-date ReplicaSet information. That
warning may occur during integration tests, causing a failure.
Add the warning as a known controller log message. The warning will be
printed as a skipped test, allowing the integration tests to pass.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`linkerd install` supports a 2-stage install process, `linkerd upgrade`
did not.
Add 2-stage support for `linkerd upgrade`. Also exercise multi-stage
functionality during upgrade integration tests.
Part of #2337
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2720 and 2711
This changes the default behavior of `linkerd inject` to not inject the
proxy but just the `linkerd.io/inject: enabled` annotation for the
auto-injector to pick it up (regardless of any namespace annotation).
A new `--manual` mode was added, which behaves as before, injecting
the proxy in the command output.
The unit tests are running with `--manual` to avoid any changes in the
fixtures.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Add config.linkerd.io/disable-identity annotation
First part of #2540
We'll tackle support for `--disable-identity` in `linkerd install` in a
separate commit.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The integration tests check container logs for errors. When an error is
encountered that matches a list of expected errors, it was hidden, and
the test passed.
Modify the integration tests to report known errors in logs via
`t.Skipf`.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Fixes#2465
* Add check for unschedulable pods and psp issues (#2465)
* Return error reason and message on pod or node failure
Signed-off-by: Gaurav Kumar <gaurav.kumar9825@gmail.com>
* The 'linkerd-version' CLI flag is renamed to 'control-plane-version'
* Add version field to proxy config
* Add the control plane version to the global config
* Unit test for init image version
* Use more specific control plane and proxy versions in unit tests
Signed-off-by: Ivan Sim <ivan@buoyant.io>
The `linkerd upgrade` command read the control-plane's config from
Kubernetes, which required the environment to be configured to connect
to the appropriate k8s cluster.
Intrdouce a `linkerd upgrade --from-manifests` flag, allowing the user
to feed the output of `linkerd install` into the upgrade command.
Fixes#2629
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The list of known proxy log errors has been growing, and causing regular
ci failures.
Skip proxy logging errors. The tests will continue to run and report
unexpected errors, but this will not fail the tests (and ci). Also break
out the controller log errors separately, and continue to fail on those.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
adds integration tests for service profiles that test profile generation through the tap and Open API flags.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Add validation webhook for service profiles
Fixes#2075
Todo in a follow-up PRs: remove the SP check from the CLI check.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
This change introduces integration tests for `linkerd inject`. The tests
perform CLI injection, with and without params, and validates the
output, including annotations.
Also add some known errors in logs to `install_test.go`.
TODO:
- deploy uninjected and injected resources to a default and
auto-injected cluster
- test creation and update
Part of #2459
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The integration tests were not exercising proxy auto inject.
Introduce a `--proxy-auto-inject` flag to `install_test.go`, which
now exercises install, check, and smoke test deploy for both manual and
auto injected use cases.
Part of #2569
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Out of all the integration tests (egress, get, stat, tap and
install_test) only in stat and tap do meshed (proxy-to-proxy) connections take
place, which we can test are 100% TLS.
For stat, #2537 already added such check for connections with the
Prometheus pod (connections to other pods are not meshed, apparently).
This commit adds such check for tap.
Fixes#2519
Allow the TCP CONNECTIONS column to be shown on all stat queries in the CLI.
This column will now be called TCP_CONN for brevity.
Read/Write bytes will still only be shown on -o wide or -o json
This change reintroduces identity hinting to the destination service.
The Get endpoint includes identities for pods that are injected with an
identity-mode of "default" and have the same linkerd control plane.
A `serviceaccount` label is now also added to destination response
metadata so that it's accessible in prometheus and tap.