This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed. The flag defaults to false and is hidden.
We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Bug in `linkerd uninstall` when attempting to delete PSP
We were using a wrong apiVersion for PSP in `linkerd uninstall`'s
output, which avoids removing that resource:
```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"
$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
* edge-20.4.1
This release introduces some cool new functionalities, all provided by our
awesome community of contributors! Also two bugs were fixed that were introduced
since edge-20.3.2.
* CLI
* Added `linkerd uninstall` command to uninstall the control plane (thanks
@Matei207!)
* Fixed a bug causing `linkerd routes -o wide` to not show the proper actual
success rate
* Controller
* Fail proxy injection if the pod spec has `automountServiceAccountToken`
disabled (thanks @mayankshah1607!)
* Web UI
* Added a route dashboard to Grafana (thanks @lundbird!)
* Proxy
* Fixed a bug causing the proxy's inbound to spuriously return 503 timeouts
The `cloud_integration_tests` job was creating its tests under
namespaces containing the git SHA. This is a left-over from when all the
tests ran in the same cluster, which is no longer the case, and thus no
longer needed.
This fixes the [current CI
failure](https://github.com/linkerd/linkerd2/runs/556330879?check_suite_focus=true#step:6:24)
in master.
This release fixes a bug introduced in v2.89.0 that could cause spurious
timeouts for inbound proxies that handle HTTP requests for many distinct
domains.
---
* inbound: Do not cache per-endpoint services (linkerd/linkerd2-proxy#469)
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0. Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.
This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4. This keeps all of our transitive dependencies synchronized on one version of client-go.
This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code. I took this opportunity to update our code generation script to properly use the version of code-generater from `go.mod` rather than a hardcoded SHA.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Fix bin/kind-load for pull requests
Followup to #4212
External PRs were failing because:
1) The image tarballs weren't being loaded from the `images-archives`
directory
2) Concurrent calls to `bin/kind` were attempting to download the KinD
binary simultaneously, resulting in a "text file busy" error. To avoid
that, now we just call `bin/kind` synchronously one time beforehand.
* Handle automountServiceAccountToken
Return error during inject if pod spec has `automountServiceAccountToken: false`
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
Fixes#4206 Followup to #4167
Extract common logic to load images into KinD, from `bin/kind-load`, `bin/install-pr`, `.github/workflows/kind_integration.yml` and `.github/workflows/release.yml`.
Besides removing the duplication, `bin/kind-load` will benefit in performance by having each image be loaded in parallel.
```
Load into KinD the images for Linkerd's proxy, controller, web, grafana, debug and cni-plugin.
Usage:
bin/kind-load [--images] [--images-host ssh://linkerd-docker]
Examples:
# Load images from the local docker instance
bin/kind-load
# Load images from tar files located in the current directory
bin/kind-load --images
# Retrieve images from a remote docker instance and then load them into KinD
bin/kind-load --images --images-host ssh://linkerd-docker
Available Commands:
--images: use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: the argument to this option is used as the remote docker instance from which images are first retrieved
(using 'docker save') to be then loaded into KinD. This command requires --images.
```
This release introduces several fixes and improvements to the CLI.
* CLI
* Added support for kubectl-style label selectors in many CLI commands (thanks
@mayankshah1607!)
* Fixed the path regex in service profiles generated from proto files without
a package name (thanks @amariampolskiy!)
* Fixed an error when injecting Cronjobs that have no metadata
* Relaxed the clock skew check to match the default node heartbeat interval
on Kubernetes 1.17 and made this check a warning
* Fixed a bug where the linkerd-smi-metrics pod could not be created on
clusters with pod security policy enabled
* Internal
* Upgraded tracing components to more recent versions and improved resource
defaults (thanks @Pothulapati!)
Signed-off-by: Alex Leong <alex@buoyant.io>
Followup to #4193
This is to verify that the list of SA installed, as well as the list of
SA in the linkerd-psp RoleBinding match the list of expected SA defined
in `healthcheck.go`.
The linkerd-smi-metrics ServiceAccount wasn't hooked into linkerd's PSP
resource, which resulted in the linkerd-smi-metrics ReplicaSet failing
to spawn pods:
```
Error creating: pods "linkerd-smi-metrics-574f57ffd4-" is forbidden:
unable to validate against any pod security policy: []
```
Fixes#3943
The Linkerd clock skew check requires that all nodes in the cluster have reported a heartbeat within (approximately) the last minute. However, in Kubernetes 1.17, the default heartbeat interval is 5 minutes. This means that the clock skew check will often fail in Kubernetes 1.17 clusters.
We relax the check to only require that heartbeats have been detected in the past 5 minutes, matching the default heartbeat interval in Kubernetes 1.17. We also switch this check to be a warning so that clusters which are configured with longer heartbeat intervals don't see this as a fatal error.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add missing SAs to linkerd check
This adds the service accounts `linkerd-destination` and
`linkerd-smi-metrics` that were missing from the "control plane
ServiceAccounts exist" check.
Fixes#4179
Changes to Go dependencies will touch all Dockerfiles in the repo which requires approval from the codeowners of each subdirectory.
We revise the codeowners to add more owners for the Dockerfiles so that approval is not required from the subdirectory owners specifically.
Signed-off-by: Alex Leong <alex@buoyant.io>
When injecting a Cronjob with no
`spec.jobTemplate.spec.template.metadata` we were getting the following
error:
```
Error transforming resources: jsonpatch add operation does not apply:
doc is missing path:
"/spec/jobTemplate/spec/template/metadata/annotations"
```
This only happens to Cronjobs because other workloads force having at
least a label there that is used in `spec.selector` (at least as of v1
workloads).
With this fix, if no metadata is detected, then we add it in the json patch when
injecting, prior to adding the injection annotation.
I've added a couple of new unit tests, one that verifies that this
doesn't remove metadata contents in Cronjobs that do have that metadata,
and another one that tests injection in Cronjobs that don't have
metadata (which I verified it failed prior to this fix).
Currently the release tag regex matches against arguments that have `edge` or
`stable` as a substring.
It should only match against arguments that are either `edge` or `stable`.
For example, the graceful error handling is not triggered for the following:
```
❯ bin/create-release-tag edge-20.3.3
bin/create-release-tag: line 92: release_tag: unbound variable
```
This PR fixes the regex so that the above results in graceful error handling.
```
❯ bin/create-release-tag edge-20.3.3
Error: valid release channels: edge, stable
Usage:
bin/create-release-tag edge
bin/create-release-tag stable 2.4.8
```
## edge-20.3.3
This release introduces new experimental CLI commands for querying metrics using
the Service Mesh Interface (SMI) and for multi-cluster support via service
mirroring.
If you would like to learn more about service mirroring or SMI, or are
interested in experimenting with these features, please join us in
[Linkerd Slack](https://slack.linkerd.io) for help and feedback.
* CLI
* Added experimental `linkerd cluster` commands for managing multi-cluster
service mirroring
* Added the experimental `linkerd alpha clients` command, which uses the
smi-metrics API to display client-side metrics from each of a resource's
clients
* Added retries to some `linkerd check` checks to prevent spurious failures
when run immediately after cluster creation or Linkerd installation
This version contains an fix for a bug that was rejecting all requests on clusters configured with an empty list of allowed client names. Because smi-metrics is an apiservice, this was also preventing namespaces from terminating.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Upgrade golangci-lint to v1.23.8
This should help with some timeouts we're seeing in CI.
I fixed some new warnings found in `inject.go` and `uninject.go`.
Also we now have to explicitly disable linting `/controller/gen`.
The linter was also complaining that in `/pkg/k8s/fake.go` the
`spClient.Interface` and `tsclient.Interface` returned in the function
`newFakeClientSetsFromManifests()` aren't used, but I opted to ignore
that to leave them available for future tests.
* Bump proxy-init to v1.3.2
Bumped `proxy-init` version to v1.3.2, fixing an issue with `go.mod`
(linkerd/linkerd2-proxy-init#9).
This is a non-user-facing fix.
## Motivation
I noticed the Go language server stopped working in VS Code and narrowed it
down to `go build ./...` failing with the following:
```
❯ go build ./...
go: github.com/linkerd/stern@v0.0.0-20190907020106-201e8ccdff9c: parsing go.mod: go.mod:3: usage: go 1.23
```
This change updates `linkerd/stern` version with changes made in
linkerd/stern#3 to fix this issue.
This does not depend on #4170, but it is also needed in order to completely
fix `go build ./...`
## Motivation
After #4147 added the `install-pr` script, installing PRs into existing
clusters does not work if that cluster is a KinD cluster
Changing the script to be able to use KinD, and specifically automate `kind
load` would be helpful!
## Solution
The script can now be used in the following ways.
```
❯ bin/install-pr --help
Install Linkerd with the changes made in a GitHub Pull Request.
Usage:
--context: The name of the kubeconfig context to use
# Install Linkerd into the current cluster
bin/install-pr 1234
# Install Linkerd into the current KinD cluster
bin/install-pr [-k|--kind] 1234
# Install Linkerd into the 'kind-pr-1234' KinD cluster
bin/install-pr [-k|--kind] --context kind-pr-1234 1234
```
The script assumes that the cluster (KinD or not) has already been created. If
the cluster is a KinD cluster, the `-k|--kind` flag should be passed.
If the `--context` flag is not passsed, the install defaults to the current
context (`kubectl config current-context`).
I also added a [`-h|--help]` option that describes how to use the script.
This change removes the target port requirement when resolving ports in the dst service. Based on the comments, it seems that we need to have a target port defined in the port spec in order to resolve to the port in the Endpoints. In reality if target port is note defined when creating the service, k8s will set the port and the target port to the same value. Seems to me that checking for the targetPort to be different than 0, is a no-op.
Signed-off-by: Zahari Dichev zaharidichev@gmail.com
## Motivation
Testing #4167 has revealed some `linkerd check` failures that occur only
because the checks happen too quickly after cluster creation or install. If
retried, they pass on the second time.
Some checkers already handle this with the `retryDeadline` field. If a checker
does not set this field, there is no retry.
## Solution
Add retries to the `l5d-existence-replicasets`
`l5d-existence-unschedulable-pods` checks so that these checks do not fail
during a chained cluster creation > install > check process.
We add the `linkerd alpha clients` command which displays client side metrics from each of a resource's clients. This allows you to see who all of your clients are and see what your resource's metrics look like from your clients' point of view. Since these metrics are measured on the client-side, they include network latency.
```
> linkerd alpha clients deploy/web -n emojivoto
FROM TO SUCCESS RPS LATENCY_P50 LATENCY_P90 LATENCY_P99
vote-bot.emojivoto web 97.50% 2.0rps 4ms 5ms 5ms
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This change is in a similar vein to #4052 which provided support for
configuring service profile retries via a vendor extension of
`x-linkerd-retryable`, when generating from an openapi specification.
This change is very similar to the final version of that pull request,
and adds a timeout value based on `x-linkerd-timeout`.
At this point I believe that if the timeout is not specified then the
default provided by linkerd of 30s will apply anyway, but won't
explicitly be reflected in the service profile, which I'm somewhat okay
with as a current state, but I think there's a potential future
improvement that the default timeout is always shown when generating
from an open api spec, but that's more to make it clear and obvious that
that timeout exists.
Signed-off-by: Lewis Cowper <lewis.cowper@googlemail.com>
This release builds on changes in the prior release to ensure that
balancers process updates eagerly.
Cache capacity limitations have been removed; and services now fail
eagerly, rather than making all requests wait for the timeout to expire.
Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable
is parsed.
---
* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)
## Motivation
#4147 adds a script for setting up a local cluster that uses the images built
from the changes introduced in a forked PR. This would be useful for all PRs.
In order to install Linkerd from a PR into a local cluster, the images still
need to be built at some point. If you happen to have SSH config setup for our
Packet host, you can pull them from there. That is not very
accessible--requiring that someone adds you as a user--so we can take a
similar approach to forked PRs.
## Solution
All PRs now make an artifact directory that is uploaded as part of the KinD
integration tests. This way, the `install-pr` script can use those images no
matter if the PR is a fork or not.
We use curl for fetching remote files in our `bin` scripts. Replace the use of `wget` with `curl` in `bin/shellcheck` for consistency.
Signed-off-by: Alex Leong <alex@buoyant.io>