Depends on #9195
Currently, `linkerd inject --default-inbound-policy` does not set the
`config.linkerd.io/default-inbound-policy` annotation on the injected
resource(s).
The `inject` command does _try_ to set that annotation if it's set in
the `Values` generated by `proxyFlagSet`:
14d1dbb3b7/cli/cmd/inject.go (L485-L487)
...but, the flag in the proxy `FlagSet` doesn't set
`Values.Proxy.DefaultInboundPolicy`, it sets
`Values.PolicyController.DefaultAllowPolicy`:
7c5e3aaf40/cli/cmd/options.go (L375-L379)
This is because the flag set is shared across `linkerd inject` and
`linkerd install` subcommands, and in `linkerd install`, we want to set
the default policy for the whole cluster by configuring the policy
controller. In `linkerd inject`, though, we want to add the annotation
to the injected pods only.
This branch fixes this issue by changing the flag so that it sets the
`Values.Proxy.DefaultInboundPolicy` instead of the
`Values.PolicyController.DefaultAllowPolicy` value. In `linkerd
install`, we then set `Values.PolicyController.DefaultAllowPolicy` based
on the value of `Values.Proxy.DefaultInboundPolicy`, while in `inject`,
we will now actually add the annotation.
This branch is based on PR #9195, which adds validation to reject
invalid values for `--default-inbound-policy`, rather than on `main`.
This is because the validation code added in that PR had to be moved
around a bit, since it now needs to validate the
`Values.Proxy.DefaultInboundPolicy` value rather than the
`Values.PolicyController.DefaultAllowPolicy` value. I thought using
#9195 as a base branch was better than basing this on `main` and then
having to resolve merge conflicts later. When that PR merges, this can
be rebased onto `main`.
Fixes#9168
The policy integration tests builds some of the same container images
that are built in the integration test workflow. We can avoid this
duplication while still making the policy test execution optional.
This change moves `integration_tests.yml` and `policy_controller.yml`
into `integration.yml`. The policy test workflows now re-use some of the
`just` recipes used for development. This change also seperates the
tests that need Viz components so that, eventually, they can be executed
more selectively.
Furthermore, this change updates the docker-build action to take a tag
as a parameter, rather than *setting* a TAG environment variable. This
change tries to simplify the use of environment variables by setting a
single tag output in both the integration and release workflows.
Signed-off-by: Oliver Gould <ver@buoyant.io>
PR linkerd/linkerd2-proxy#1815 added support for a
`LINKERD2_PROXY_SHUTDOWN_GRACE_PERIOD` environment variable that
configures the proxy's maximum grace period for graceful shutdown. This
is intended to ensure that if a proxy is shut down, it will eventually
terminate in a relatively timely manner, even if some stubborn
connections don't close gracefully.
This branch adds support for a `config.linkerd.io/shutdown-grace-period`
annotation that can be used to override the default grace period
duration.
Hopefully I've added this everywhere it needs to be added --- please let
me know if I've missed anything!
Package promm is used for making assertions on prometheus metrics.
Fixes#8250
I believe this improves the readability of tests. While the new assertions use much more LOC (basically plenty vs one line regex), this allows developers to understand better what is actually being tested.
It allows for better test granularity. For example, it is easier to include more details in "has metrics" assertions, while having checks in "has no such metrics" assertions (for example look at `HasNoOutboundHTTPRequest`).
Actually, I was able to spot a bug in opaque port tests. Previously tests were passing only because we were looking for a very specific metrics.
If this gets approved, my plan is to use `metrictest` in more places and replace all assertions based on the regular expressions.
Signed-off-by: Krzysztof Dryś <krzysztofdrys@gmail.com>
When we compare generated manifests against fixtures, we do a simple
string comparison to compare output. The diffed data can be pretty hard
to understand.
This change adds a new test helper, `DiffTestYAML` that parses strings
as arbitrary YAML data structures and uses `deep.Equal` to generate a
diff of the datastructures.
Now, when a test fails, we'll get output like:
```
install_test.go:244: YAML mismatches install_output.golden:
slice[32].map[spec].map[template].map[spec].map[containers].slice[3].map[image]: PolicyControllerImageName:PolicyControllerVersion != SomeOtherImage:PolicyControllerVersion
```
While testing this, it became apparent that several of our generated
golden files were not actually valid YAML, due to the `LinkerdVersion`
value being unset. This has been fixed.
Signed-off-by: Oliver Gould <ver@buoyant.io>
This change continues the work from #7403 by refactoring the
multicluster tests in order to install components programatically.
As part of this change, we now generate certificates (a CA and a shared
issuer) in code, and add a few utilities to manage different Kubernetes
contexts; a few examples are `KubectlApplyWithContext` and a function to
re-initialise the clientset with an arbitrary context.
Few bits and pieces have also been changed as I went through this, such
as applying entire files as opposed to reading manifests in memory
before piping them to kubectl.
Some other changes:
* remove logic from test runner script that set-up multicluster
* add a more rigurous check test after linking source to target cluster
* remove `target1`, `source` and `target_statefulset` tests
* consolidated previous tests in one file
Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Co-authored-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Closes#7826
This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed my individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant amount of these lints are been exluced by the various `exclude-rules` rules added to `.golangci.yml`. These include operations are files that generally do not fail such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.
Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.
Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This change hoists all of the upgrade logic into dedicated files.
Upgrade tests are now done programatically (at the expense of repeating a lot of code between upgrade-edge and upgrade-stable). This (in theory) will remove the dependency on an external test-runner and special arguments for the TestHelper. Upgrade tests can now be run through go's runner, provided a Kubernetes cluster has been provisioned: `go test ./test/integration/upgrade-<release-channel>/...`
Signed-off-by: Matei David <matei@buoyant.io>
The timer created by calling `time.After` is not cleaned up until the timer fires. Repeated calls to `time.After` in a loop create multiple timers which can accumulate if the loop runs faster than the timeout.
We replace `time.After` with `time.NewTimer` and explicitly call `Stop` when we are finished to clean up the timer.
Signed-off-by: Alex Leong <alex@buoyant.io>
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.
[gc]: https://github.com/go-critic/go-critic
Signed-off-by: Oliver Gould <ver@buoyant.io>
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.
This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).
[el]: https://github.com/polyfloyd/go-errorlint
Signed-off-by: Oliver Gould <ver@buoyant.io>
When the flakey policy test can't produce a success rate, let's just
mark the test as skipped instead of failing CI.
Relates to #7590
Signed-off-by: Oliver Gould <ver@buoyant.io>
Go's test runner (`go test`) can be non-deterministic with the order in
which it runs the tests. Tests in Go seem to be always
run in parallel, but the specifics here differ depending on the
available CPU.
We can take advantage of parallelism here to get better timing on our
tests, however, we need to block the start of each test until the
control plane (or extension) pods are ready. In each `TestMain`, we
block until the pods are ready.
Signed-off-by: Matei David <matei@buoyant.io>
This change re-works how tests are organised, by splitting each
integration test based on its functionality. Each "suite", is a
directory that compromises tests that will be specific to that suite;
deep tests will test the control plane, viz tests will target the
monitoring stack, and so on.
As part of this change, aside from moving the tests, the GH workflow
file has also been changed to remove old test names, and add (or
replace) new ones where appropriate.
To have a fully functioning viz suite, a new flag has been added to the
TestHelper to discern between tests that make use of the monitoring
stack and tests that do not. The external suite also has a duplicated
`edges` test so we can make sure interactions with external prometheus
don't break.
Signed-off-by: Matei David <matei@buoyant.io>
* Remove the `proxy.disableIdentity` config
Fixes#7724
Also:
- Removed the `linkerd.io/identity-mode` annotation.
- Removed the `config.linkerd.io/disable-identity` annotation.
- Removed the `linkerd.proxy.validation` template partial, which only
made sense when `proxy.disableIdentity` was `true`.
- TestInjectManualParams now requires to hit the cluster to retrieve the
trust root.
The goal is to support configuring the
`--subnets-to-ignore` flag in proxy-init
This change adds a new annotation `/skip-subnets` which
takes a comma-separated list of valid CIDR.
The argument will map to the `--subnets-to-ignore`
flag in the proxy-init initContainer.
Fixes#6758
Signed-off-by: Michael Lin <mlzc@hey.com>
* Stop shipping grafana-based image
Fixes#6045#7358
With this change we stop building a Grafana-based image preloaded with the Linkerd Grafana dashboards.
Instead, we'll recommend users to install Grafana by themselves, and we provide a file `grafana/values.yaml` with a default config that points to all the same Grafana dashboards we had, which are now hosted in https://grafana.com/orgs/linkerd/dashboards .
The new file `grafana/README.md` contains instructions for installing the official Grafana Helm chart, and mentions other available methods.
The `grafana.enabled` flag has been removed, and `grafanaUrl` has been moved to `grafana.url`. This will help consolidating other grafana settings that might emerge, in particular when #7429 gets addressed.
## Dashboards definitions changes
The dashboard definitions under `grafana/dashboards` (which should be kept in sync with what's published in https://grafana.com/orgs/linkerd/dashboards), got updated, adding the `__inputs`, `__elements` and `__requires` entries at the beginning, that were required in order to be published.
Fixes#6584#6620#7405
# Namespace Removal
With this change, the `namespace.yaml` template is rendered only for CLI installs and not Helm, and likewise the `namespace:` entry in the namespace-level objects (using a new `partials.namespace` helper).
The `installNamespace` and `namespace` entries in `values.yaml` have been removed.
There in the templates where the namespace is required, we moved from `.Values.namespace` to `.Release.Namespace` which is filled-in automatically by Helm. For the CLI, `install.go` now explicitly defines the contents of the `Release` map alongside `Values`.
The proxy-injector has a new `linkerd-namespace` argument given the namespace is no longer persisted in the `linkerd-config` ConfigMap, so it has to be passed in. To pass it further down to `injector.Inject()` without modifying the `Handler` signature, a closure was used.
------------
Update: Merged-in #6638: Similar changes for the `linkerd-viz` chart:
Stop rendering `namespace.yaml` in the `linkerd-viz` chart.
The additional change here is the addition of the `namespace-metadata.yaml` template (and its RBAC), _not_ rendered in CLI installs, which is a Helm `post-install` hook, consisting on a Job that executes a script adding the required annotations and labels to the viz namespace using a PATCH request against kube-api. The script first checks if the namespace doesn't already have an annotations/labels entries, in which case it has to add extra ops in that patch.
---------
Update: Merged-in the approved #6643, #6665 and #6669 which address the `linkerd2-cni`, `linkerd-multicluster` and `linkerd-jaeger` charts.
Additional changes from what's already mentioned above:
- Removes the install-namespace option from `linkerd install-cni`, which isn't found in `linkerd install` nor `linkerd viz install` anyways, and it would add some complexity to support.
- Added a dependency on the `partials` chart to the `linkerd-multicluster-link` chart, so that we can tap on the `partials.namespace` helper.
- We don't have any more the restriction on having the muticluster objects live in a separate namespace than linkerd. It's still good practice, and that's the default for the CLI install, but I removed that validation.
Finally, as a side-effect, the `linkerd mc allow` subcommand was fixed; it has been broken for a while apparently:
```console
$ linkerd mc allow --service-account-name foobar
Error: template: linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml:16:7: executing "linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml" at <include "partials.annotations.created-by" $>: error calling include: template: no template "partials.annotations.created-by" associated with template "gotpl"
```
---------
Update: see helm/helm#5465 describing the current best-practice
# Core Helm Charts Split
This removes the `linkerd2` chart, and replaces it with the `linkerd-crds` and `linkerd-control-plane` charts. Note that the viz and other extension charts are not concerned by this change.
Also note the original `values.yaml` file has been split into both charts accordingly.
### UX
```console
$ helm install linkerd-crds --namespace linkerd --create-namespace linkerd/linkerd-crds
...
# certs.yaml should contain identityTrustAnchorsPEM and the identity issuer values
$ helm install linkerd-control-plane --namespace linkerd -f certs.yaml linkerd/linkerd-control-plane
```
### Upgrade
As explained in #6635, this is a breaking change. Users will have to uninstall the `linkerd2` chart and install these two, and eventually rollout the proxies (they should continue to work during the transition anyway).
### CLI
The CLI install/upgrade code was updated to be able to pick the templates from these new charts, but the CLI UX remains identical as before.
### Other changes
- The `linkerd-crds` and `linkerd-control-plane` charts now carry a version scheme independent of linkerd's own versioning, as explained in #7405.
- These charts are Helm v3, which is reflected in the `Chart.yaml` entries and in the removal of the `requirements.yaml` files.
- In the integration tests, replaced the `helm-chart` arg with `helm-charts` containing the path `./charts`, used to build the paths for both charts.
### Followups
- Now it's possible to add a `ServiceProfile` instance for Destination in the `linkerd-control-plane` chart.
Fixes#3307
Add support for annotations `config.linkerd.io/proxy-ephemeral-storage-limit` and `config.linkerd.io/proxy-ephemeral-storage-request`
Signed-off-by: Michael Lin <mlzc@hey.com>
Closes#6813, followup to #6846
This runs `./test/integration/install_test.go`, installing linkerd with
`--set policyController.defaultAllowPolicy=deny` and subsequently
installing viz, making sure everything is alright.
This PR adds a new `policy` integration tests suite
to check the output of for the `srv` and `saz` commands and
make sure they have the expected format.
This is majorly needed to make sure things work as expected
with stat cmd output without having to do manual testing.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes#6733
As policy resources provide a grouping, statistics summaries should
also be allowed on these groupings which are useful to the user. Them
being port specific provide a great way to break down these metrics
further.
This PR adds support for policy resources i.e `server` and `serverauthorization`
on the `stat` command.
## Changes
This adds a new path in the `stat_summary.go` file to handle policy
objects. I tried to see if we could re-use some of the other paths
but some of the labels seems to differ and hence a different path
had to be created. We can try to refactor and merge them though.
We support both request and TCP metrics for the `server` resource
while only the former with `serverauthorization` resources
as metrics are generated in this manner.
This also adds these policy objects into the `k8s` package to
make them as known resources.
For both the policy resources, `--from` doesn't work as these
metrics are not exposed from outbound, and there is no way to
query about the client workload from the inbound metrics. `--to`
is supported to get metrics specifically for a destination workload.
(just like on a service)
## Testing
```bash
> curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject --proxy-log-level debug - | kubectl apply -f -
> kubectl apply -f 897de1a8d5/emojivoto-policy.yml
# Initial values
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.0rps 100.00% 1.8rps 1ms 1ms 3ms 1 188.6B/s 2072.9B/s
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps 80.70% 0.9rps 1ms 2ms 3ms 1 91.4B/s 52.7B/s
emojivoto web-http 0.0rps 90.68% 2.0rps 2ms 10ms 28ms 1 153.7B/s 4509.4B/s
# After changing the `emoji-grpc` authz
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell) took 2s
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.3rps 100.00% 1.1rps 0ms 0ms 0ms 1 156.5B/s 1282.4B/s
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps 87.88% 0.6rps 0ms 0ms 0ms 1 53.5B/s 31.5B/s
emojivoto web-http 0.0rps 61.18% 1.4rps 1ms 2ms 2ms 1 110.2B/s 2195.7B/s
# after changing the `web-http` authz
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.0rps - - - - - - - -
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps - - - - - - - -
emojivoto web-http 1.0rps - - - - - - - -
> linkerd viz stat srv/emoji-grpc -n emojivoto -owide
NAME SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emoji-grpc 100.00% 2.0rps 1ms 1ms 1ms 1 199.9B/s 2208.0B/s
> linkerd viz stat srv/web-http -n emojivoto -owide
NAME SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web-http 94.02% 1.9rps 4ms 9ms 10ms 1 152.7B/s 4505.9B/s
> linkerd viz stat srv -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emoji-grpc - 100.00% 2.0rps 1ms 1ms 1ms 1 201.6B/s 2209.8B/s
prom - - - - - - - - -
voting-grpc - 86.21% 1.0rps 1ms 1ms 1ms 1 98.3B/s 55.9B/s
web-http - 91.67% 2.0rps 3ms 8ms 10ms 1 157.7B/s 4600.3B/s
> linkerd viz stat serverauthorization/web-public -n emojivoto
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
web-http - 89.83% 2.0rps 3ms 9ms 10ms
> linkerd viz stat saz -n emojivoto
NAME AUTHORIZATION MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
emoji-grpc emoji-grpc - 100.00% 2.0rps 1ms 1ms 1ms
prom prom-prometheus - - - - - -
voting-grpc voting-grpc - 89.83% 1.0rps 1ms 1ms 1ms
web-http web-public - 94.96% 2.0rps 1ms 5ms 9ms
> linkerd viz stat saz/web-public -n emojivoto
NAME AUTHORIZATION MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
web-http web-public - 90.00% 2.0rps 1ms 5ms 9ms
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at 0d56327e6f (commitcomment-51603624)
The action reports that the changes in this PR would make it happy: 03a9c310aa
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
This changes the data plane and control plane checks to only consider pods that
are in a running state.
This is currently an issue in scenarios when upgrading and an old pod is still
terminating while the upgraded pod is already running. Running the `check`
command will report that the old pod has an unexpected proxy version even though
it is terminating; it should not warn in this case.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Go 1.16.4 includes a fix for a denial-of-service in net/http: golang/go#45710
Go's error file-line formatting changed in 1.16.3, so this change
updates tests to only do suffix matching on these error strings.
* checks: add proxy checks for core cp and extension pods
Fixes#5623
This PR adds proxy checks for control-plane and extension pods
when the respective checks are ran. This can make sure proxies
are working correctly and are able to communicate.
Currently, The following checks are added:
- proxy status checks
- proxy certificate checks
- proxy version checks
These are the same data-plane proxy checks that were already
present.
As these checks result in errors in most cases under integration
tests as there are latest versions online. This is fixed by templating
the check golden files and checking for the known error.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Remove the `linkerd-controller` pod
Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.
## Removals
- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put that under `/pkg/prometheus/test_helper.go`, which is now by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.
## Health checks
- Removed the `can initialize the client` check given such client is no longer needed. The `linkerd-api` section was left with only the check `control pods are ready`, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.
## Other changes
- Fixed the Control Plane section of the dashboard, taking account the disappearance of `linkerd-controller` and previously, of `linkerd-sp-validator`.
* Move `sp-validator` into the `destination` pod
Fixes#5195
The webhook secrets remain the same, as do the `profileValidator` settings in `values.yaml`, so this doesn't pose any upgrading challenges.
The `helm-upgrade` test was installing the current viz chart version from `viz/charts/linkerd-viz` instead of the latest stable published one, before attempting to perform the helm upgrade.
As seen below, we're passing `--helm-chart` and `helm-stable-chart` as parameters, but only `--viz-helm-chart`
```console
Test script: [install_test.go] Params: [--helm-path=/home/alpeb/src/pr/version2/linkerd2/bin/helm --helm-chart=/home/alpeb/src/pr/version2/linkerd2/charts/linkerd2 --viz-helm-chart=/home/alpeb/src/pr/version2/linkerd2/viz/charts/linkerd-viz --helm-stable-chart=linkerd/linkerd2 --helm-release=helm-test --upgrade-helm-from-version=stable-2.10.0]
```
So this PR adds a new `--viz-helm-stable-chart` parameter that instructs which version to install before performing the upgrade.
This is affecting #6000 which has changes in the viz chart that are going undetected and are failing the `helm-upgrade` test.
### What
When a namespace has the opaque ports annotation, pods and services should
inherit it if they do not have one themselves. Currently, services do this but
pods do not. This can lead to surprising behavior where services are correctly
marked as opaque, but pods are not.
This changes the proxy-injector so that it now passes down the opaque ports
annotation to pods from their namespace if they do not have their own annotation
set. Closes#5736.
### How
The proxy-injector webhook receives admission requests for pods and services.
Regardless of the resource kind, it now checks if the resource should inherit
the opaque ports annotation from its namespace. It should inherit it if the
namespace has the annotation but the resource does not.
If the resource should inherit the annotation, the webhook creates an annotation
patch which is only responsible for adding the opaque ports annotation.
After generating the annotation patch, it checks if the resource is injectable.
From here there are a few scenarios:
1. If no annotation patch was created and the resource is not injectable, then
admit the request with no changes. Examples of this are services with no OP
annotation and inject-disabled pods with no OP annotation.
2. If the resource is a pod and it is injectable, create a patch that includes
the proxy and proxy-init containers—as well as any other annotations and
labels.
3. The above two scenarios lead to a patch being generated at this point, so no
matter the resource the patch is returned.
### UI changes
Resources are now reported to either be "injected", "skipped", or "annotated".
The first pass at this PR worked around the fact that injection reports consider
services and namespaces injectable. This is not accurate because they don't have
pod templates that could be injected; they can however be annotated.
To fix this, an injection report now considers resources "annotatable" and uses
this to clean up some logic in the `inject` command, as well as avoid a more
complex proxy-injector webhook.
What's cool about this is it fixes some `inject` command output that would label
resources as "injected" when they were not even mutated. For example, namespaces
were always reported as being injected even if annotations were not added. Now,
it will properly report that a namespace has been "annotated" or "skipped".
### Tests
For testing, unit tests and integration tests have been added. Manual testing
can be done by installing linkerd with `debug` controller log levels, and
tailing the proxy-injector's app container when creating pods or services.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
`CheckDeployments()` verified that some deployment had the appropriate
number of replicas in the Ready state. `CheckPods()` does the same, plus
checking if there were restarts (a single restart returns
`RestartCountError` which only triggers a warning on the calling side,
more restarts trigger a regular error). We were always calling the
former followed by a call to the latter which is superfluous, so we're
getting rid of the latter.
Also, the `testutil.DeploySpec` struct had a `Containers` field for
checking the name of containers, but that wasn't always checked and
didn't really represent a potential error that wouldn't be clearly
manifested otherwise (like in golden files), so that was simplified as
well.
* Fix helm-upgrade integration test
Update `install_test.go` now that the upgrade test is done from 2.10.
This also implied installing viz right after core.
Refactored `HelmInstallPlain()` into `HelmCmdPlain()` to work for both
helm install and upgrade.
* Add expected heartbeat config entry to the upgrade-stable test, and remove testCheckCommand arg (for lint)
- Get rid of all the custom settings passed through `--set` during `helm install`, and instead let the defaults mechanisms in the templates to kick-in
- Before installing `linkerd-viz` through Helm, wait on _all_ the core components to be ready (this is what might have been causing the restarts seen in CI)
- Show full output when `linkerd jaeger check` fails, and do some cleanup before triggering the tracing tests. But then I decided to temporarily disable that test till we figure out what's the deal.
* Move CP check after the readiness check
Moved the `can initialize client` and `can query the control plane API`
checks from the `linkerd-existence` section to the `linkerd-api` because
they required the `linkerd-controller` pod to not just be "Running" but
actually be ready.
This was causing `linkerd check` to show some port-forwarding warnings
when ran right after install.
This also allowed getting rid of the `CheckPublicAPIClientOrExit` function
and directly use `CheckPublicAPIClientOrRetryOrExit` (better naming
punted for later) which was refactored so it always runs the
`linkerd-api` checks before retrieving the client.
Other changes:
- Temporarily disabled `upgrade-edge` test because the latest edge has this readiness check issue
- Have the upgrade tests do proper pruning (stolen for @Pothulapati's #5673😉 )
- Added missing label to tap SA (fixes#5850)
- Complete tap-injector Service selector
* Remove linkerd prefix from extension resources
This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
This PR enables the temporarily disabled external-prometheus-deep
integration test. This also fixes the clean up issue by essentially
moving the external-prometheus resources into `external-prometheus`
which has the relevant annotation required to be deleted by
`bin/test-cleanup`
This PR also updates the test resource label to be `test.linkerd.io/is-test-data-plane` from
`linkerd.io/is-test-data-plane` to prevent `linkerd inject` from removing it.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tests: add exnternal-prometheus integration test
Fixes#5659
Though most control-plane components dont differntiate on default
vs external prometheus, There have been issues w.r.t CLI and external
prometheus i.e check, etc.
This PR adds a e2e deep integration tests w.r.t external pronmetheus
thereby running viz and non-viz cmds integration tests on a linkerd-viz
ewith external prometheus instance
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* values: removal of .global field
Fixes#5425
With the new extension model, We no longer need `Global` field
as we don't rely on chart dependencies anymore. This helps us
further cleanup Values, and make configuration more simpler.
To make upgrades and the usage of new CLI with older config work,
We add a new method called `config.RemoveGlobalFieldIfPresent` that
is used in the upgrade and `FetchCurrentConfiguration` paths to remove
global field and attach its child nodes if global is present. This is verified
by the `TestFetchCurrentConfiguration`'s older test that has the global
field.
We also don't yet remove .global in some helm stable-upgrade tests for
the initial install to work.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Run extension checks when linkerd check is invoked
This change allows the linkerd check command to also run any known
linkerd extension commands that have been installed in the cluster. It
does this by first querying for any namespace that has the label
selector `linkerd.io/extension` and then runs the subcommands for either
`jaeger`, `multicluster` or `viz`. This change runs basic namespace
healthchecks for extensions that aren't part of the Linkerd extension suite.
Fixes#5233
For consistency we rename the extension charts to a common naming scheme:
linkerd-viz -> linkerd-viz (unchanged)
jaeger -> linkerd-jaeger
linkerd2-multicluster -> linkerd-multicluster
linkerd2-multicluster-link -> linkerd-multicluster-link
We also make the chart files and chart readmes a bit more uniform.
Signed-off-by: Alex Leong <alex@buoyant.io>
## What this changes
This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.
If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.
This also removes the environment variable for explicitly disabling tap through
an environment variable. Tap status for a proxy is now determined only be the
presence or absence of the tap service name environment variable.
Closes#5326
## How it changes
### tap-injector
The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:
- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace
### LINKERD2_PROXY_TAP_DISABLED
Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only be if the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.
### controller image
The tap-injector has been added to the controller image's several startup
commands which determines what it will do in the cluster.
As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: move sub-cmds using viz extension under viz cmd
Fixes#5327 , #5524
This branch moves the following commands, under the `linkerd viz`
cmd as they use the viz extension to perform the job.
- dashboard
- edges
- routes
- stat
- tap
- top
This also creates a new pkg `public-api` which fecilitates
interaction and communication with public-api to be used
across extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>