Fixes#4790
This PR removes both the SMI-Metrics templates along with the
experimental sub-commands. This also removes pkg `smi-metrics`
as there is no direct use of it without the commands.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at aaf440489e (commitcomment-41423663)
The action reports that the changes in this PR would make it happy: 5b82c6c5ca
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
Add a new structure on the destination controller side to keep track of contextual information.
The token format has been changed from ns:<namespace> to a JSON format so that more variables can be
encdoed in the token. As part of this PR, a new field 'nodeName' has been added to help with service
topologies.
Fixes#4498
Signed-off-by: Matei David <matei.david.35@gmail.com>
This PR removes the service mirror controller from `linkerd mc install` to `linkerd mc link`, as described in https://github.com/linkerd/rfc/pull/31. For fuller context, please see that RFC.
Basic multicluster functionality works here including:
* `linkerd mc install` installs the Link CRD but not any service mirror controllers
* `linkerd mc link` creates a Link resource and installs a service mirror controller which uses that Link
* The service mirror controller creates and manages mirror services, a gateway mirror, and their endpoints.
* The `linkerd mc gateways` command lists all linked target clusters, their liveliness, and probe latences.
* The `linkerd check` multicluster checks have been updated for the new architecture. Several checks have been rendered obsolete by the new architecture and have been removed.
The following are known issues requiring further work:
* the service mirror controller uses the existing `mirror.linkerd.io/gateway-name` and `mirror.linkerd.io/gateway-ns` annotations to select which services to mirror. it does not yet support configuring a label selector.
* an unlink command is needed for removing multicluster links: see https://github.com/linkerd/linkerd2/issues/4707
* an mc uninstall command is needed for uninstalling the multicluster addon: see https://github.com/linkerd/linkerd2/issues/4708
Signed-off-by: Alex Leong <alex@buoyant.io>
As linkerd-prometheus is optional now, the checks are also separated
and should only work when the prometheus add-on is installed.
This is done by re-using the add-on check code.
This creates a new integration test target that launches the deep suite,
using a linkerd instance installed through Helm.
I've added a `global.proxyInit.ignoreInboundPorts=1234,5678` override
during install and enhanced the injection test to catch problems like
what we saw in #4679.
An unappropriate variable reuse resulted in the failure of the test for
upgrading using manifests. This only happened when the upgrade was
retried a second time (when there's a discrepancy in the heartbeat cron
schedule, which is bening).
This fixes the deep integration test which currently only calls `run_test` for
`edges` integration test.
This occurs because `run_test "${tests[@]}"` will pass an entire array of
filenames when `run_test` only expects *one* filename.
The solution is to loop through `tests` and call `run_test` for each file.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This moves Prometheus as a add-on, thus making it optional but enabled by default. The also make `linkerd-prometheus` more configurable, and allow it to have its own life-cycle for upgrades, configuration, etc.
This work will be followed by documentation that help users configure existing Prometheus to work with Linkerd.
**Changes Include:**
- moving prometheus manifests into a separate chart at `charts/add-ons/prometheus`, and adding it as a dependency to `linkerd2`
- implement the `addOn` interface to support the same with CLI.
- include configuration in `linkerd-config-addons`
**User Facing Changes:**
The default install experience does not change much but for users who have already configured Prometheus differently, would need to apply the same using the new configuration fields present in chart README
Using following command the wrong spelling were found and later on
fixed:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
This PR adds a new cli test to see if installation yamls are correctly
generated even on windows, this is important because of all the file
path difference between windows and Linux, and if any code uses a wrong
format might cause the chart generation commands to fail on windows.
This creates a separate workflow for both release and integration.
Also, all the exisiting integration tests are moved in to
/tests/integration to separate from /test/cli as this test does not fall
under integration tests category
* feat: add log format annotation and helm value
Json log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500
but wiring the option through as an annotation/helm value is still
necessary.
This PR adds the annotation and helm value to configure log format.
Closes#2491
Signed-off-by: Naseem <naseem@transit.app>
* Refactor install test helpers
- Move testResourcesPostInstall to testutil.TestResourcesPostInstall
- Move exerciseTestAppEndpoint to testutil.ExerciseTestAppEndpoint
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
* Trigger CI
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
## Summary
Change the default behavior of integration tests to be isolated by cluster.
Additionally, make running one or all tests easier than the current process.
These changes are explained more in the [Testing
RFC](https://github.com/linkerd/rfc/blob/master/design/0004-isolated-integration-tests.md)
## Changes
This is a script used only by Linkerd developers, but there is a lot of useful
usage examples and explanations in `bin/tests --help` output:
```
Run Linkerd integration tests.
Optionally specify one of the following tests: [upgrade helm helm-upgrade uninstall deep external-issuer]
Usage:
tests [--images] [--images-host ssh://linkerd-docker] [--name test-name] [--skip-kind-create] /path/to/linkerd
Examples:
# Run all tests in isolated clusters
tests /path/to/linkerd
# Run single test in isolated clusters
tests --name test-name /path/to/linkerd
# Skip KinD cluster creation and run all tests in default cluster context
tests --skip-kind-create /path/to/linkerd
# Load images from tar files located under the 'image-archives' directory
# Note: This is primarly for CI
tests --images /path/to/linkerd
# Retrieve images from a remote docker instance and then load them into KinD
# Note: This is primarly for CI
tests --images --images-host ssh://linkerd-docker /path/to/linkerd
Available Commands:
--name: the argument to this option is the specific test to run
--skip-kind-create: skip KinD cluster creation step and run tests in an existing cluster.
--images: (Primarily for CI) use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: (Primarily for CI) the argument to this option is used as the remote docker instance from which images are first retrieved (using 'docker save') to be then loaded into KinD. This command requires --images.
```
### Run all tests
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run single test (upgrade for example):
Current:
```bash
. bin/_test-run.sh
init_test_run $PWD/bin/linkerd
upgrade_integration_tests
```
New:
```bash
bin/tests --name upgrade $PWD/bin/linkerd
```
### Run tests in isolated KinD clusters
Current: Not possible without running single tests in newly created clusters
manually
New:
```bash
bin/tests $PWD/bin/linkerd
```
### Run tests in isolated namespaces on an existing cluster
Old:
```bash
bin/test-run $PWD/bin/linkerd
```
New:
```bash
bin/tests --skip-kind-create $PWD/bin/linkerd
```
## CI
`kind_integration` has been updated so that it does not create a KinD cluster as
part of its test setup.
`cloud_integration` passes the `--skip-kind-create` flag so that the tests are
run serially in a non-KinD cluster.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This PR adds multicluster components to the integration tests.
The existing tests have been modified to pass the `--multicluster` flag so that the entire integration test suite runs with multicluster components.
Currently, the upgrade tests do not have multicluster components installed, but this will be done in a follow-up PR.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Post-2.8.0 integration test cleanup
We had some code for testing upgrades from pre-2.8.0 stables that took
care of creating the non-existent `linkerd-smi-metrics` SA, which is no
longer necessary.
I also had missed many spots in test/install_test.go from #4623
* Integration tests: Warn (instead of erroring) upon pod restarts
Fixes#4595
Don't have integration tests fail whenever a pod is detected to have
restarted just once. For now we'll be just logging this out and creating
a warning annotation for it.
* Fix install-pr script
* Add image-archives path to commands to use the files
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Signed-off-by: Charles Pretzer <charles@buoyant.io>
Co-authored-by: Charles Pretzer <charles@buoyant.io>
This adds an integration test for upgrading from the latest edge to the current
build.
Closes#4471
Signed-off-by: Kevin Leimkuhler kevin@kleimkuhler.com
Followup to #4522
This removes the `controlPlaneInstalled` var in `bin/install_test.go`
that flagged whether the control plane was already present in the series
of tests, whose intention was to avoid fetching the logs/events when the CP wasn't yet
there. That was done under the assumption `TestMain()` would feed that
flag to the runner for each individual test function, but it turns out
`TestMain()` only runs once per test file, and so
`controlPlaneInstalled` remained with its initial value `false`.
So now logs/events are fetched always, even if the control plane is not
there. If the CP is absent and we try fetching, we only see a `didn't
find any client-go entries` message.
* Fixes#4305
Fixed SP route for `POST /api/v1/query`:
```
$ bin/linkerd routes -n linkerd deploy/linkerd-prometheus
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET /api/v1/query_range linkerd-prometheus 100.00% 3.9rps 1ms 2ms 2ms
GET /api/v1/series linkerd-prometheus 100.00% 1.1rps 1ms 1ms 1ms
POST /api/v1/query linkerd-prometheus 100.00% 3.1rps 1ms 17ms 19ms
[DEFAULT] linkerd-prometheus - - - - -
```
Also added one missing route for `linkerd-grafana`, realizing afterwards there are
many other ones missing, but not really worth adding them all.
I also removed the routes in `linkerd-controller` for the tap routes
given that's no longer handled in that service.
And the tap service SP was also removed alltogether since nothing was
getting reported.
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, for gateway. This allows us to detect loops and
propagate identity through the gateway.
This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.
---
* proxy: v2.99.0
The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.
---
* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
* inject: Support config.linkerd.io/enable-gateway
This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targetting the inbound listener
through the outbound proxy.
This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).
* proxy: v2.100.0
This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.
---
* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
* Fetch logs/events when integration test fails, not only for install tests
## Motivation
Mainly to know what caused containers to not start (or to restart), like in #4285
## Implementation
Followup to #4410, where we fetched unexpected logs/events when a test failed in `test/install_test.go`; now we're expanding that behavior to every integration test.
For that, we replace in each `TestMain()`:
```go
os.Exit(m.Run())
```
with
```go
os.Exit(testutil.Run(m, TestHelper, true))
```
where `testutil.Run()` executes the tests and fetches the logs/events if the tests failed.
Also extracted the log/event fetching and matching into its own separate file.
* Appease linter
* For external_issuer_integration_tests controlPlaninstalled wasn't being set
* When installation test fails, fetch logs and events
Re #4371
When a test fails in `./test/install_test.go`, trigger the `TestLogs`
and `TestEvents` tests in a separate process in order to output any
unexpected logs/events that might have caused the initial test failure.
For instance, currently we're sporadically experiencing pod restarts.
Instead of ignoring them, this might help provide us with the real
underlying cause.
Upgraded to Helm v3.2.1 from v2.16.1, getting rid of Tiller and making
other simplifications.
Note that the version placeholder in the `values.yaml` files had to be
changed from `{version}` to `linkerdVersionValue` because the former
confuses Helm v3.
#4217 suggests a retries integration test, but this is already tested as part
of the ServiceProfiles test.
In order to fix this issue, an extra check has been added to the assertion of
the `ActualSuccess` value. It now asserts the value is both greater than 0 and
less than 100.
Closes#4217
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Refactor integration tests to use annotations functions
First part of #4176
Replaced all the `t.Error`/`t.Fatal` calls in the integration with the
new functions defined in `testutil/annotations.go` as described in #4292,
in order for the errors to produce Github annotations.
Most of these calls have now two strings: one containing a generic error
message and another with a more specific message. The former is what
will be aggregated and seen in the CI reports at
[linkerd2-ci-metrics](https://github.com/linkerd/linkerd2-ci-metrics).
Other changes:
- Improved the annotation generator in `annotations.go` so that the
message includes the name of the test.
- When a failure from `RetryFor` occurs, log the original timeout so
we can consider incrementing it when the failure is persistent.
Sometimes for no clear reason pods are taking their time to become
available. The `kubectl wait --for=condition=available` command in
`inject_test.go` is failing sporadically because of this.
e.g in
https://github.com/linkerd/linkerd2/runs/652159504?check_suite_focus=true#step:14:56
I could reproduce this and even though I couldn't see any errors in the logs
or events, I could confirm how long it's taking for the pod to come up:
```
$ k -n l5d-integration-inject-test describe po inject-test-terminus-enabled
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m12s default-scheduler Successfully assigned l5d-integration-inject-test/inject-test-terminus-enabled-96fd5f5dc-5qlpb to gke-alpeb-dev-default-pool-b94ca25c-h84p
Normal Pulled 6m55s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Container image "gcr.io/linkerd-io/proxy-init:v1.3.2" already present on machine
Normal Created 6m54s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Created container linkerd-init
Normal Started 6m47s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Started container linkerd-init
Normal Pulled 6m28s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Container image "buoyantio/bb:v0.0.5" already present on machine
Normal Created 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Created container bb-terminus
Normal Started 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Started container bb-terminus
Normal Pulled 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Container image "gcr.io/linkerd-io/proxy:git-2a95d373" already present on machine
Normal Created 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Created container linkerd-proxy
Normal Started 6m27s kubelet, gke-alpeb-dev-default-pool-b94ca25c-h84p Started container linkerd-proxy
```
here the pod took 45s to start!
Updated rule in list of ignored k8s warning events to make it more
generic and to account for this failure:
```
error killing pod: failed to "KillPodSandbox" for
"756c8333-1d4d-4f42-bc2d-bd99eb8b4c94" with KillPodSandboxError: "rpc
error: code = Unknown desc = networkPlugin cni failed to teardown pod
\"_\" network: operation Delete is not supported on
WorkloadEndpoint(default/gke--testing--git--2d2fd3f1--default--pool--b9cfce6d--tgcn-cni-bd3ca37ee6fc3a05bafa26ce71faa05279ce08de02462040300786cb7e046b38-eth0)"
```
That happened here:
https://github.com/linkerd/linkerd2/runs/653622248?check_suite_focus=true#step:6:27
When using cli commands that work on namespaced resources in the cluster, the default namespace used by the cli is hardcoded to the default Kubernetes namespace (i.e 'default'). This update will allow cli commands that operate on namespaced resources to automatically infer what the name of the default namespace is, by taking the relevant default from the currently used Kubeconfig context. In short, this allows the omission of the -n flag in commands such as linkerd metrics, when working with resources that belong to a namespace that is set as default in the currently active context.
Validation was done manually by setting the default namespace of the currently used context, as well as through two integration tests that target the tap and get command respectively.
Signed-off-by: Matei David <matei.david.35@gmail.com>
Fixes#3807
By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range. This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example.
This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.
We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.
Signed-off-by: Alex Leong <alex@buoyant.io>
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPi for controlplaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes#3984
We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint. This endpoint returns a much smaller payload.
Signed-off-by: Alex Leong <alex@buoyant.io>
## Motivation
Introduces an `unmeshed` flag to the `stat` command so that users can opt-in
to viewing unmeshed resources in the `stat` output.
This changes the existing behavior of the `stat` command such that unmeshed
resources no longer render by default in the output.
Before:
```
❯ bin/linkerd stat -A deploy
NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN
kube-system coredns 0/1 - - - - - -
kube-system local-path-provisioner 0/1 - - - - - -
kube-system metrics-server 0/1 - - - - - -
kube-system traefik 0/1 - - - - - -
linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 2ms 2ms 2
linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 1ms 1ms 11
...
```
After:
```
❯ bin/linkerd stat -A deploy
NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN
linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 1ms 1ms 2
linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 2ms 2ms 13
...
```
Closes#3871
## Solution
Using the meshed pod count in the stat response, resources with a count of `0`
are not rendered in the table.
The `-l`/`--selector` flag do not work for all resource types, so applying a
default label does not solve this problem. While it works for pods, it does
not work for deployments as the `linkerd.io/inject` is an annotation that
cannot be selected on.
I did not think a shorthand flag was necessary for this. I do not think users
will commonly pass this flag to the `stat` command, and I didn't think adding
an additional short flag such as `u` was necessary.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed. The flag defaults to false and is hidden.
We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Bug in `linkerd uninstall` when attempting to delete PSP
We were using a wrong apiVersion for PSP in `linkerd uninstall`'s
output, which avoids removing that resource:
```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"
$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
Followup to #4193
This is to verify that the list of SA installed, as well as the list of
SA in the linkerd-psp RoleBinding match the list of expected SA defined
in `healthcheck.go`.
* Add missing SAs to linkerd check
This adds the service accounts `linkerd-destination` and
`linkerd-smi-metrics` that were missing from the "control plane
ServiceAccounts exist" check.