Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from ns:<namespace> to a JSON format so that more variables can be
encdoed in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.
Fixes#4498
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR removes the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latences.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
EndpointSlices have been made opt-in due to their experimental nature. This PR
introduces a new install flag 'enableEndpointSlices' that will allow adopters to
specify in their cli install or helm install step whether they would like to
use endpointslices as a resource in the destination service, instead of the
endpoints k8s resource.
Signed-off-by: Matei David <matei.david.35@gmail.com>
This creates a new integration test target that launches the deep suite,
using a linkerd instance installed through Helm.
I've added a `global.proxyInit.ignoreInboundPorts=1234,5678` override
during install and enhanced the injection test to catch problems like
what we saw in #4679.
* Refactor install test helpers
- Move testResourcesPostInstall to testutil.TestResourcesPostInstall
- Move exerciseTestAppEndpoint to testutil.ExerciseTestAppEndpoint
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
* Trigger CI
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
## Summary
Change the default behavior of integration tests to be isolated by cluster.
Additionally, make running one or all tests easier than the current process.
These changes are explained more in the [Testing
RFC](https://github.com/linkerd/rfc/blob/master/design/0004-isolated-integration-tests.md)
## Changes
This is a script used only by Linkerd developers, but there is a lot of useful
usage examples and explanations in `bin/tests --help` output:
```
Run Linkerd integration tests.
Optionally specify one of the following tests: [upgrade helm helm-upgrade uninstall deep external-issuer]
Usage:
tests [--images] [--images-host ssh://linkerd-docker] [--name test-name] [--skip-kind-create] /path/to/linkerd
Examples:
# Run all tests in isolated clusters
tests /path/to/linkerd
# Run single test in isolated clusters
tests --name test-name /path/to/linkerd
# Skip KinD cluster creation and run all tests in default cluster context
tests --skip-kind-create /path/to/linkerd
# Load images from tar files located under the 'image-archives' directory
# Note: This is primarly for CI
tests --images /path/to/linkerd
# Retrieve images from a remote docker instance and then load them into KinD
# Note: This is primarly for CI
tests --images --images-host ssh://linkerd-docker /path/to/linkerd
Available Commands:
--name: the argument to this option is the specific test to run
--skip-kind-create: skip KinD cluster creation step and run tests in an existing cluster.
--images: (Primarily for CI) use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: (Primarily for CI) the argument to this option is used as the remote docker instance from which images are first retrieved (using 'docker save') to be then loaded into KinD. This command requires --images.
```
### Run all tests
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run single test (upgrade for example):
Current:
```bash
. bin/_test-run.sh
init_test_run $PWD/bin/linkerd
upgrade_integration_tests
```
New:
```bash
bin/tests --name upgrade $PWD/bin/linkerd
```
### Run tests in isolated KinD clusters
Current: Not possible without running single tests in newly created clusters
manually
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run tests in isolated namespaces on an existing cluster
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests --skip-kind-create $PWD/bin/linkerd
```
## CI
`kind_integration` has been updated so that it does not create a KinD cluster as
part of its test setup.
`cloud_integration` passes the `--skip-kind-create` flag so that the tests are
run serially in a non-KinD cluster.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This PR adds multicluster components to the integration tests.
The existing tests have been modified to pass the `--multicluster` flag so that the entire integration test suite runs with multicluster components.
Currently, the upgrade tests do not have multicluster components installed, but this will be done in a follow-up PR.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
In #4595 we stopped failing integration tests whenever a pod restarted
just once, which is being caused by containerd/containerd#4068.
But we forgot to remove the warning event corresponding to that
containerd failure, and such unexpected event continues to fail the
tests. So this change adds that event to the list of expected ones.
Adds parameters like kubernetesHelper, k8scontext, etc to the NewGenericTestHelper func allowing it to be more general, and to be able to be usable through linkerd2-conformance
* Integration tests: Warn (instead of erroring) upon pod restarts
Fixes#4595
Don't have integration tests fail whenever a pod is detected to have
restarted just once. For now we'll be just logging this out and creating
a warning annotation for it.
Followup to #4522
This removes the `controlPlaneInstalled` var in `bin/install_test.go`
that flagged whether the control plane was already present in the series
of tests, whose intention was to avoid fetching the logs/events when the CP wasn't yet
there. That was done under the assumption `TestMain()` would feed that
flag to the runner for each individual test function, but it turns out
`TestMain()` only runs once per test file, and so
`controlPlaneInstalled` remained with its initial value `false`.
So now logs/events are fetched always, even if the control plane is not
there. If the CP is absent and we try fetching, we only see a `didn't
find any client-go entries` message.
* Fetch logs/events when integration test fails, not only for install tests
## Motivation
Mainly to know what caused containers to not start (or to restart), like in #4285
## Implementation
Followup to #4410, where we fetched unexpected logs/events when a test failed in `test/install_test.go`; now we're expanding that behavior to every integration test.
For that, we replace in each `TestMain()`:
```go
os.Exit(m.Run())
```
with
```go
os.Exit(testutil.Run(m, TestHelper, true))
```
where `testutil.Run()` executes the tests and fetches the logs/events if the tests failed.
Also extracted the log/event fetching and matching into its own separate file.
* Appease linter
* For external_issuer_integration_tests controlPlaninstalled wasn't being set
Upgraded to Helm v3.2.1 from v2.16.1, getting rid of Tiller and making
other simplifications.
Note that the version placeholder in the `values.yaml` files had to be
changed from `{version}` to `linkerdVersionValue` because the former
confuses Helm v3.
* Refactor integration tests to use annotations functions
First part of #4176
Replaced all the `t.Error`/`t.Fatal` calls in the integration with the
new functions defined in `testutil/annotations.go` as described in #4292,
in order for the errors to produce Github annotations.
Most of these calls have now two strings: one containing a generic error
message and another with a more specific message. The former is what
will be aggregated and seen in the CI reports at
[linkerd2-ci-metrics](https://github.com/linkerd/linkerd2-ci-metrics).
Other changes:
- Improved the annotation generator in `annotations.go` so that the
message includes the name of the test.
- When a failure from `RetryFor` occurs, log the original timeout so
we can consider incrementing it when the failure is persistent.
* Go test failure message wrappers to create GH Annotations
First part of #4176
## Problem
Failures in go tests need to be properly formatted as Github annotations
so that we can fetch them through Github's API for aggregation and
analysis.
## Solution
A wrapper for error messages has been created in `testutil/annotations.go`.
The idea is that instead of throwing test failures like this:
```go
t.Failf("error retrieving data;\nExpected: %#v\nActual: %#v", expected,
actual)
```
We'd throw them like this:
```go
testutil.AnnotationFatalf("error retrieving data", "error retrieving data;\nExpected: %#v\nActual: %#v", expected,
actual)
```
That will continue reporting the error as before (when using `go test`
or another test runner), but as a side-effect it will also send to
stdout something like:
```
::error file=pkg/inject_test.go,line=133::error retrieving data
```
Which becomes a GH annotation, visible in the CI run summary screen.
The fist string art is used to have the GH annotation be a generic error message
that can be aggregated and counted across multiple test runs. If `testutil.Fatalf(str, args...)`
is called instead, the original error message will be used.
Note that that the output will be produced only when the env var
`GH_ANNOTATION` is set (which will when tests are triggered from a
Github Actions workflow).
Besides `testutil/annotation.go` and its accompanying unit test file,
other changes were made in other tests as examples, the plan being that
in a further PR _all_ the tests will use these wrappers.
* Bug in `linkerd uninstall` when attempting to delete PSP
We were using a wrong apiVersion for PSP in `linkerd uninstall`'s
output, which avoids removing that resource:
```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"
$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
* Update control-plane-namespace label
Upgrade command ignores changes to the namespace object
Add linkerd.io/control-plane-ns=linkerd label to the control-plane namespace
Fixes#3958
* Add controlPlaneNamespace label to namespace.yaml
* Modify tests for updated controlPlaneNamespace label
* Fix faulty values.yaml value
* Localize reference for controlPlaneNamespace label in kubernetes_helper.go
Signed-off-by: Supratik Das <rick.das08@gmail.com>
In light of the breaking changes we are introducing to the Helm chart and the convoluted upgrade process (see linkerd/website#647) an integration test can be quite helpful. This simply installs latest stable through helm install and then upgrades to the current head of the branch.
Signed-off-by: Zahari Dichev zaharidichev@gmail.com
In various integration tests we're not showing stderr when a failure
happens, thus hiding some possibly useful debugging info.
E.g. in the latest CI failures, commands like `linkerd update` were
failing with no visible reason why.
* Enable cert rotation test to work with dynamic namespaces
This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds.
The tests in #3775 were failing because the second secret holding the issuer cert replacement was a leaf cert and not a root/intermediary cert capable of signing the CSRs. This is how the replacement cert looked like:
```bash
$ k -n l5d-integration-external-issuer get secrets linkerd-identity-issuer-new -ojson | jq '.data|.["tls.crt"]' | tr -d '"' | base64 -d | step certificate inspect -
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 2 (0x2)
Signature Algorithm: ECDSA-SHA256
Issuer: CN=identity.l5d-integration-external-issuer.cluster.local
Validity
Not Before: Dec 6 19:16:08 2019 UTC
Not After : Dec 5 19:16:28 2020 UTC
Subject: CN=identity.l5d-integration-external-issuer.cluster.local
Subject Public Key Info:
Public Key Algorithm: ECDSA
Public-Key: (256 bit)
X:
93:d5:fa:f8:d1:44:4f:9a:8c:aa:0c:9e:4f:98:a3:
8d:28:d9:cc:f2:74:4c:5f:76:14:52:47:b9:fb:c9:
a3:33
Y:
d2:04:74:95:2e:b4:78:28:94:8a:90:b2:fb:66:1b:
e7:60:e5:02:48:d2:02:0e:4d:9e:4f:6f:e9:0a:d9:
22:78
Curve: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Subject Alternative Name:
DNS:identity.l5d-integration-external-issuer.cluster.local
Signature Algorithm: ECDSA-SHA256
30:46:02:21:00:f6:93:2f:10:ba:eb:be:bf:77:1a:2d:68:e6:
04:17:a4:b4:2a:05:80:f7:c5:f7:37:82:7b:b7:9c:a1:66:6a:
e1:02:21:00:b3:65:06:37:49:06:1e:13:98:7c:cf:f9:71:ce:
5a:55:de:f6:1b:83:85:b0:a8:88:b7:cf:21:d1:16:f2:10:f9
```
For it to be a root/intermediate cert it should have had `CA:TRUE` under the `X509v3 extensions` section.
Why did the test pass sometimes? When it did pass for me, I could see in the linkerd-identity proxy logs something like:
```
ERR! [ 320.964592s] linkerd2_proxy_identity::certify Received invalid ceritficate: invalid certificate: UnknownIssuer
```
so the cert retrieved from identity still was invalid but for some reason the proxy, sometimes, keeps on going despite that. And when one would delete the linkerd-identity pod, its proxy wouldn't come up at all, also showing that error.
With the changes from this branch, we no longer see that error in the logs and after deleting the linkerd-identity pod it comes back gracefully.
This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds.
* Enable cert rotation test to work with dynamic namespaces
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address comments
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address further comments
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Traffic split integration test
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
* Address comments
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
* Display placeholder when there is no basic stats data
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
* Replaced `uuid` with `uid` from linkerd-config resource
Fixes#3621
Removed the old `uuid` for identifying linkerd installations, and
replaced it with the `uid` property from the `linkerd-config` ConfigMap.
I tested that this `uid` remains the same by updating the config and
also upgrading linkerd, using both the CLI and Helm.
Note that this required granting `linkerd-web` RBAC access to the
`linkerd-config` Config.
I also added an integration test to verify the stability of the uid.
Fixes#3356
1.16 removes some api groups that were already deprecated. From k8s blog
post (https://kubernetes.io/blog/2019/07/18/api-deprecations-in-1-16/):
```
- PodSecurityPolicy: will no longer be served from extensions/v1beta1 in
v1.16.
Migrate to the policy/v1beta1 API, available since v1.10. Existing
persisted data can be retrieved/updated via the policy/v1beta1 API.
- DaemonSet, Deployment, StatefulSet, and ReplicaSet: will no longer be
served from extensions/v1beta1, apps/v1beta1, or apps/v1beta2 in v1.16.
Migrate to the apps/v1 API, available since v1.9. Existing persisted
data can be retrieved/updated via the apps/v1 API.
```
Previous PRs had already made this change at the Helm templates level,
but we still needed to do it at the API calls and tests.
The integration tests ran fine for k8s 1.12 and 1.15. They fail on 1.16
because the upgrade integration test tries to install linkerd 2.5 which is not
compatible with 1.16.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Fix auto-injecting pods and integration tests reporting
When creating an Event when auto-injection occurs (#3316) we try to
fetch the parent object to associate the event to it. If the parent
doesn't exist (like in the case of stand-alone pods) the event isn't
created. I had missed dealing with one part where that parent was
expected.
This also adds a new integration test that I verified fails before this
fix.
Finally, I removed from `_test-run.sh` some `|| exit_code=$?` that was
preventing the whole suite to report failure whenever one of the tests
in `/tests` failed.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
The Tap Service enabled tapping of any meshed pod, regardless of user
privilege.
This change introduces a new Tap APIService. Kubernetes provides
authentication and authorization of Tap requests, and then forwards
requests to a new Tap APIServer, which implements a Kubernetes
aggregated APIServer. The Tap APIServer authenticates the client TLS
from Kubernetes, and authorizes the user via a SubjectAccessReview.
This change also modifies the `linkerd tap` command to make requests
against the new APIService.
The Tap APIService implements these Kubernetes-style endpoints:
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/tap
POST /apis/tap.linkerd.io/v1alpha1/watch/namespaces/:ns/:res/:name/tap
GET /apis
GET /apis/tap.linkerd.io
GET /apis/tap.linkerd.io/v1alpha1
GET /healthz
GET /healthz/log
GET /healthz/ping
GET /metrics
GET /openapi/v2
GET /version
Users authorize to the new `tap.linkerd.io/v1alpha1` via RBAC. Only the
`watch` verb is supported. Access is also available via subresources
such as `deployments/tap` and `pods/tap`.
This change introduces the following resources into the default Linkerd
install:
- Global
- APIService/v1alpha1.tap.linkerd.io
- ClusterRoleBinding/linkerd-linkerd-tap-auth-delegator
- `linkerd` namespace:
- Secret/linkerd-tap-tls
- `kube-system` namespace:
- RoleBinding/linkerd-linkerd-tap-auth-reader
Tasks not covered by this PR:
- `linkerd top`
- `linkerd dashboard`
- `linkerd profile --tap`
- removal of the unauthenticated tap controller
Fixes#2725, #3162, #3172
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Similar to `kubectl --as`, global flag across all linkerd subcommands
which sets a `ImpersonationConfig` in the Kubernetes API config.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Integration tests may fail and leave behind namespaces that following
builds aren't able to clean up because the git sha is being included in
the namespace name, and the following builds don't know about those
shas.
This modifies the `test-cleanup` script to delete based on object labels
instead of relying on the objects names, now that after 2.4 all the
control plane components are labeled. Note that this will also remove
non-testing linkerd namespaces, but we were already kinda doing that
partially because we were removing the cluster-level resources (CRDs,
webhook configs, clusterroles, clusterrolebindings, psp).
`test-cleanup` no longer receives a namespace name as an argument.
The data plane namespaces aren't labeled though, so I've added the
`linkerd.io/is-test-data-plane` label for them in
`CreateNamespaceIfNotExists()`, and making sure all tests that need a
data plaine explicitly call that method instead of creating the
namespace as a side-effect in `KubectlApply()`.
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
* Simplify port-forwarding code
Simplifies the establishment of a port-forwarding by moving the common
logic into `PortForward.Init()`
Stemmed from this
[comment](https://github.com/linkerd/linkerd2/pull/2937#discussion_r295078800)
Signed-off-by: Alejandro Pedraza <alejandro@buoyant.io>
Numerous codepaths have emerged that create k8s configs, k8s clients,
and make k8s api requests.
This branch consolidates k8s client creation and APIs. The primary
change migrates most codepaths to call `k8s.NewAPI` to instantiate a
`KubernetesAPI` struct from `pkg`. `KubernetesAPI` implements the
`kubernetes.Interface` (clientset) interface, and also persists a
`client-go` `rest.Config`.
Specific list of changes:
- removes manual GET requests from `k8s.KubernetesAPI`, in favor of
clientsets
- replaces most calls to `k8s.GetConfig`+`kubernetes.NewForConfig` with
a single `k8s.NewAPI`
- introduces a `timeout` param to `k8s.NewAPI`, currently only used by
healthchecks
- removes `NewClientSet` in `controller/k8s/clientset.go` in favor of
`k8s.NewAPI`
- removes `httpClient` and `clientset` from `HealthChecker`, use
`KubernetesAPI` instead
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This change introduces integration tests for `linkerd inject`. The tests
perform CLI injection, with and without params, and validates the
output, including annotations.
Also add some known errors in logs to `install_test.go`.
TODO:
- deploy uninjected and injected resources to a default and
auto-injected cluster
- test creation and update
Part of #2459
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The integration tests were not exercising proxy auto inject.
Introduce a `--proxy-auto-inject` flag to `install_test.go`, which
now exercises install, check, and smoke test deploy for both manual and
auto injected use cases.
Part of #2569
Signed-off-by: Andrew Seigner <siggy@buoyant.io>