This changes the data plane and control plane checks to only consider pods that
are in a running state.
This is currently an issue in scenarios when upgrading and an old pod is still
terminating while the upgraded pod is already running. Running the `check`
command will report that the old pod has an unexpected proxy version even though
it is terminating; it should not warn in this case.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* checks: add proxy checks for core cp and extension pods
Fixes#5623
This PR adds proxy checks for control-plane and extension pods
when the respective checks are ran. This can make sure proxies
are working correctly and are able to communicate.
Currently, The following checks are added:
- proxy status checks
- proxy certificate checks
- proxy version checks
These are the same data-plane proxy checks that were already
present.
As these checks result in errors in most cases under integration
tests as there are latest versions online. This is fixed by templating
the check golden files and checking for the known error.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Remove the `linkerd-controller` pod
Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.
## Removals
- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put that under `/pkg/prometheus/test_helper.go`, which is now by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.
## Health checks
- Removed the `can initialize the client` check given such client is no longer needed. The `linkerd-api` section was left with only the check `control pods are ready`, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.
## Other changes
- Fixed the Control Plane section of the dashboard, taking account the disappearance of `linkerd-controller` and previously, of `linkerd-sp-validator`.
* Move `sp-validator` into the `destination` pod
Fixes#5195
The webhook secrets remain the same, as do the `profileValidator` settings in `values.yaml`, so this doesn't pose any upgrading challenges.
The `helm-upgrade` test was installing the current viz chart version from `viz/charts/linkerd-viz` instead of the latest stable published one, before attempting to perform the helm upgrade.
As seen below, we're passing `--helm-chart` and `helm-stable-chart` as parameters, but only `--viz-helm-chart`
```console
Test script: [install_test.go] Params: [--helm-path=/home/alpeb/src/pr/version2/linkerd2/bin/helm --helm-chart=/home/alpeb/src/pr/version2/linkerd2/charts/linkerd2 --viz-helm-chart=/home/alpeb/src/pr/version2/linkerd2/viz/charts/linkerd-viz --helm-stable-chart=linkerd/linkerd2 --helm-release=helm-test --upgrade-helm-from-version=stable-2.10.0]
```
So this PR adds a new `--viz-helm-stable-chart` parameter that instructs which version to install before performing the upgrade.
This is affecting #6000 which has changes in the viz chart that are going undetected and are failing the `helm-upgrade` test.
`CheckDeployments()` verified that some deployment had the appropriate
number of replicas in the Ready state. `CheckPods()` does the same, plus
checking if there were restarts (a single restart returns
`RestartCountError` which only triggers a warning on the calling side,
more restarts trigger a regular error). We were always calling the
former followed by a call to the latter which is superfluous, so we're
getting rid of the latter.
Also, the `testutil.DeploySpec` struct had a `Containers` field for
checking the name of containers, but that wasn't always checked and
didn't really represent a potential error that wouldn't be clearly
manifested otherwise (like in golden files), so that was simplified as
well.
* Fix helm-upgrade integration test
Update `install_test.go` now that the upgrade test is done from 2.10.
This also implied installing viz right after core.
Refactored `HelmInstallPlain()` into `HelmCmdPlain()` to work for both
helm install and upgrade.
* Add expected heartbeat config entry to the upgrade-stable test, and remove testCheckCommand arg (for lint)
- Get rid of all the custom settings passed through `--set` during `helm install`, and instead let the defaults mechanisms in the templates to kick-in
- Before installing `linkerd-viz` through Helm, wait on _all_ the core components to be ready (this is what might have been causing the restarts seen in CI)
- Show full output when `linkerd jaeger check` fails, and do some cleanup before triggering the tracing tests. But then I decided to temporarily disable that test till we figure out what's the deal.
* Remove linkerd prefix from extension resources
This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* tests: add exnternal-prometheus integration test
Fixes#5659
Though most control-plane components dont differntiate on default
vs external prometheus, There have been issues w.r.t CLI and external
prometheus i.e check, etc.
This PR adds a e2e deep integration tests w.r.t external pronmetheus
thereby running viz and non-viz cmds integration tests on a linkerd-viz
ewith external prometheus instance
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* values: removal of .global field
Fixes#5425
With the new extension model, We no longer need `Global` field
as we don't rely on chart dependencies anymore. This helps us
further cleanup Values, and make configuration more simpler.
To make upgrades and the usage of new CLI with older config work,
We add a new method called `config.RemoveGlobalFieldIfPresent` that
is used in the upgrade and `FetchCurrentConfiguration` paths to remove
global field and attach its child nodes if global is present. This is verified
by the `TestFetchCurrentConfiguration`'s older test that has the global
field.
We also don't yet remove .global in some helm stable-upgrade tests for
the initial install to work.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Run extension checks when linkerd check is invoked
This change allows the linkerd check command to also run any known
linkerd extension commands that have been installed in the cluster. It
does this by first querying for any namespace that has the label
selector `linkerd.io/extension` and then runs the subcommands for either
`jaeger`, `multicluster` or `viz`. This change runs basic namespace
healthchecks for extensions that aren't part of the Linkerd extension suite.
Fixes#5233
For consistency we rename the extension charts to a common naming scheme:
linkerd-viz -> linkerd-viz (unchanged)
jaeger -> linkerd-jaeger
linkerd2-multicluster -> linkerd-multicluster
linkerd2-multicluster-link -> linkerd-multicluster-link
We also make the chart files and chart readmes a bit more uniform.
Signed-off-by: Alex Leong <alex@buoyant.io>
* viz: move some components into linkerd-viz
This branch moves the grafana,prometheus,web, tap components
into a new viz chart, following the same extension model that
multi-cluster and jaeger follow.
The components in viz are not injected during install time, and
will go through the injector. The `viz install` does not have any
cli flags to customize the install directly but instead follow the Helm
way of customization by using flags such as
`set`, `set-string`, `values`, `set-files`.
**Changes Include**
- Move `grafana`, `prometheus`, `web`, `tap` templates into viz extension.
- Remove all add-on related charts, logic and tests w.r.t CLI & Helm.
- Clean up `linkerd2/values.go` & `linkerd2/values.yaml` to not contain
fields related to viz components.
- Update `linkerd check` Healthchecks to not check for viz components.
- Create a new top level `viz` directory with CLI logic and Helm charts.
- Clean fields in the `viz/Values.yaml` to be in the `<component>.<property>`
model. Ex: `prometheus.resources`, `dashboard.image.tag`, etc so that it is
consistent everywhere.
**Testing**
```bash
# Install the Core Linkerd Installation
./bin/linkerd install | k apply -f -
# Wait for the proxy-injector to be ready
# Install the Viz Extension
./bin/linkerd cli viz install | k apply -f -
# Customized Install
./bin/linkerd cli viz install --set prometheus.enabled=false | k apply -f -
```
What is not included in this PR:
- Move of Controller from core install into the viz extension.
- Simplification and refactoring of the core chart i.e removing `.global`, etc.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
The rare cases where these tests were useful don't make up for the burden of
maintaing them, having different k8s version change the messages and
having unexpected warnings come up that didn't affect the final
convergence of the system.
With this we also revert the indirection added back in #4538 that
fetched unmatched warnings after a test had failed.
Fixes#5191
The logs command adds a external dependency that we forked to work but
does not fit within linkerd's core set of responsibilities. Hence, This
is being removed.
For capabilities like this, The Kubernetes plugin ecosystem has better
and well maintained tools that can be used.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Most invocations of `TestHelper.LinkerdRun` don't actually need the stderr
output except to encode it in the error message. This changes this helper
to return an error that includes the full invoked command and error message.
Invocations that need direct access to stderr must call `TestHelper.PipeToLinkerdRun`
The purpose of this test is to validate that the auto injector configures the proxy and the additional containers according to the specified config.
This is done by providing a helper that can generate the desired annotations and later inspect an injected pod in order to determine that every bit of configuration has been accounted for. This test is to provide further assurance that #5036 did not introduce any regressions.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Adds bin/certs-openssl, which creates self-signed root cert/key and issuer cert/key using openssl. This will be used in the two clusters set up in the multicluster integration test (followup PR), given CI already has openssl and to avoid having to install step.
Adds a new flag `--certs-path` to the integration tests, pointing to the path where those certs (ca.crt, ca.key, issuer.key and issuer.crt) will be located to be fed into linkerd install's `--identity-*` flags.
* Integration test for smi-metrics
This PR adds an integration test which installs SMI-Metrics and performs
queries and matches the reply with a regex query.
Currently, We store the SMI Helm pkg locally and run the test on top, so
That our CI does not break and we will periodically update the package
based on the newer releases of SMI-Metrics
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tests: Add new CNI deep integration tests
Fixes#3944
This PR adds a new test, called cni-calico-deep which installs the Linkerd CNI
plugin on top of a cluster with Calico and performs the current integration tests on top, thus
validating various Linkerd features when CNI is enabled. For Calico
to work, special config is required for kind which is at `cni-calico.yaml`
This is different from the CNI integration tests that we run in
cloud integration which performs the CNI level integration tests.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR removes the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latences.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
This creates a new integration test target that launches the deep suite,
using a linkerd instance installed through Helm.
I've added a `global.proxyInit.ignoreInboundPorts=1234,5678` override
during install and enhanced the injection test to catch problems like
what we saw in #4679.
## Summary
Change the default behavior of integration tests to be isolated by cluster.
Additionally, make running one or all tests easier than the current process.
These changes are explained more in the [Testing
RFC](https://github.com/linkerd/rfc/blob/master/design/0004-isolated-integration-tests.md)
## Changes
This is a script used only by Linkerd developers, but there is a lot of useful
usage examples and explanations in `bin/tests --help` output:
```
Run Linkerd integration tests.
Optionally specify one of the following tests: [upgrade helm helm-upgrade uninstall deep external-issuer]
Usage:
tests [--images] [--images-host ssh://linkerd-docker] [--name test-name] [--skip-kind-create] /path/to/linkerd
Examples:
# Run all tests in isolated clusters
tests /path/to/linkerd
# Run single test in isolated clusters
tests --name test-name /path/to/linkerd
# Skip KinD cluster creation and run all tests in default cluster context
tests --skip-kind-create /path/to/linkerd
# Load images from tar files located under the 'image-archives' directory
# Note: This is primarly for CI
tests --images /path/to/linkerd
# Retrieve images from a remote docker instance and then load them into KinD
# Note: This is primarly for CI
tests --images --images-host ssh://linkerd-docker /path/to/linkerd
Available Commands:
--name: the argument to this option is the specific test to run
--skip-kind-create: skip KinD cluster creation step and run tests in an existing cluster.
--images: (Primarily for CI) use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: (Primarily for CI) the argument to this option is used as the remote docker instance from which images are first retrieved (using 'docker save') to be then loaded into KinD. This command requires --images.
```
### Run all tests
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run single test (upgrade for example):
Current:
```bash
. bin/_test-run.sh
init_test_run $PWD/bin/linkerd
upgrade_integration_tests
```
New:
```bash
bin/tests --name upgrade $PWD/bin/linkerd
```
### Run tests in isolated KinD clusters
Current: Not possible without running single tests in newly created clusters
manually
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run tests in isolated namespaces on an existing cluster
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests --skip-kind-create $PWD/bin/linkerd
```
## CI
`kind_integration` has been updated so that it does not create a KinD cluster as
part of its test setup.
`cloud_integration` passes the `--skip-kind-create` flag so that the tests are
run serially in a non-KinD cluster.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This PR adds multicluster components to the integration tests.
The existing tests have been modified to pass the `--multicluster` flag so that the entire integration test suite runs with multicluster components.
Currently, the upgrade tests do not have multicluster components installed, but this will be done in a follow-up PR.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Adds parameters like kubernetesHelper, k8scontext, etc to the NewGenericTestHelper func allowing it to be more general, and to be able to be usable through linkerd2-conformance
Followup to #4522
This removes the `controlPlaneInstalled` var in `bin/install_test.go`
that flagged whether the control plane was already present in the series
of tests, whose intention was to avoid fetching the logs/events when the CP wasn't yet
there. That was done under the assumption `TestMain()` would feed that
flag to the runner for each individual test function, but it turns out
`TestMain()` only runs once per test file, and so
`controlPlaneInstalled` remained with its initial value `false`.
So now logs/events are fetched always, even if the control plane is not
there. If the CP is absent and we try fetching, we only see a `didn't
find any client-go entries` message.
* Fetch logs/events when integration test fails, not only for install tests
## Motivation
Mainly to know what caused containers to not start (or to restart), like in #4285
## Implementation
Followup to #4410, where we fetched unexpected logs/events when a test failed in `test/install_test.go`; now we're expanding that behavior to every integration test.
For that, we replace in each `TestMain()`:
```go
os.Exit(m.Run())
```
with
```go
os.Exit(testutil.Run(m, TestHelper, true))
```
where `testutil.Run()` executes the tests and fetches the logs/events if the tests failed.
Also extracted the log/event fetching and matching into its own separate file.
* Appease linter
* For external_issuer_integration_tests controlPlaninstalled wasn't being set
Upgraded to Helm v3.2.1 from v2.16.1, getting rid of Tiller and making
other simplifications.
Note that the version placeholder in the `values.yaml` files had to be
changed from `{version}` to `linkerdVersionValue` because the former
confuses Helm v3.
* Bug in `linkerd uninstall` when attempting to delete PSP
We were using a wrong apiVersion for PSP in `linkerd uninstall`'s
output, which avoids removing that resource:
```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"
$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
In light of the breaking changes we are introducing to the Helm chart and the convoluted upgrade process (see linkerd/website#647) an integration test can be quite helpful. This simply installs latest stable through helm install and then upgrades to the current head of the branch.
Signed-off-by: Zahari Dichev zaharidichev@gmail.com
In various integration tests we're not showing stderr when a failure
happens, thus hiding some possibly useful debugging info.
E.g. in the latest CI failures, commands like `linkerd update` were
failing with no visible reason why.
* Enable cert rotation test to work with dynamic namespaces
This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds.
The tests in #3775 were failing because the second secret holding the issuer cert replacement was a leaf cert and not a root/intermediary cert capable of signing the CSRs. This is how the replacement cert looked like:
```bash
$ k -n l5d-integration-external-issuer get secrets linkerd-identity-issuer-new -ojson | jq '.data|.["tls.crt"]' | tr -d '"' | base64 -d | step certificate inspect -
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 2 (0x2)
Signature Algorithm: ECDSA-SHA256
Issuer: CN=identity.l5d-integration-external-issuer.cluster.local
Validity
Not Before: Dec 6 19:16:08 2019 UTC
Not After : Dec 5 19:16:28 2020 UTC
Subject: CN=identity.l5d-integration-external-issuer.cluster.local
Subject Public Key Info:
Public Key Algorithm: ECDSA
Public-Key: (256 bit)
X:
93:d5:fa:f8:d1:44:4f:9a:8c:aa:0c:9e:4f:98:a3:
8d:28:d9:cc:f2:74:4c:5f:76:14:52:47:b9:fb:c9:
a3:33
Y:
d2:04:74:95:2e:b4:78:28:94:8a:90:b2:fb:66:1b:
e7:60:e5:02:48:d2:02:0e:4d:9e:4f:6f:e9:0a:d9:
22:78
Curve: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Subject Alternative Name:
DNS:identity.l5d-integration-external-issuer.cluster.local
Signature Algorithm: ECDSA-SHA256
30:46:02:21:00:f6:93:2f:10:ba:eb:be:bf:77:1a:2d:68:e6:
04:17:a4:b4:2a:05:80:f7:c5:f7:37:82:7b:b7:9c:a1:66:6a:
e1:02:21:00:b3:65:06:37:49:06:1e:13:98:7c:cf:f9:71:ce:
5a:55:de:f6:1b:83:85:b0:a8:88:b7:cf:21:d1:16:f2:10:f9
```
For it to be a root/intermediate cert it should have had `CA:TRUE` under the `X509v3 extensions` section.
Why did the test pass sometimes? When it did pass for me, I could see in the linkerd-identity proxy logs something like:
```
ERR! [ 320.964592s] linkerd2_proxy_identity::certify Received invalid ceritficate: invalid certificate: UnknownIssuer
```
so the cert retrieved from identity still was invalid but for some reason the proxy, sometimes, keeps on going despite that. And when one would delete the linkerd-identity pod, its proxy wouldn't come up at all, also showing that error.
With the changes from this branch, we no longer see that error in the logs and after deleting the linkerd-identity pod it comes back gracefully.
This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds.
* Enable cert rotation test to work with dynamic namespaces
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address comments
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address further comments
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Traffic split integration test
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
* Address comments
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
* Display placeholder when there is no basic stats data
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
This change introduces integration tests for `linkerd inject`. The tests
perform CLI injection, with and without params, and validates the
output, including annotations.
Also add some known errors in logs to `install_test.go`.
TODO:
- deploy uninjected and injected resources to a default and
auto-injected cluster
- test creation and update
Part of #2459
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The integration tests were not exercising proxy auto inject.
Introduce a `--proxy-auto-inject` flag to `install_test.go`, which
now exercises install, check, and smoke test deploy for both manual and
auto injected use cases.
Part of #2569
Signed-off-by: Andrew Seigner <siggy@buoyant.io>