* Support Multi-stage install with Add-Ons
* add upgrade tests for add-ons
* add multi stage upgrade unit tests
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release introduces a per-endpoint authority-override feature. This
is driven by the destination controller and is needed to support
multi-cluster gateways.
---
* Update to Rust 1.42.0 (linkerd/linkerd2-proxy#483)
* Adjust metric description. (linkerd/linkerd2-proxy#484)
* Use authority override from metadata (linkerd/linkerd2-proxy#458)
#4195 relaxed the clock skew check to match the Kubernetes 1.17 default
heartbeat interval.
This is the same issue that was preventing an update to the `kind` version
used.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPI for controlPlaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Some `linkerd stat` test failures were being hidden
`linkerd stat` was doing an early `os.Exit(0)` when no traffic was
found, which prevented `go test` from reporting any test failure that ended in
that code path.
This was hiding a mismatch in the golden files for HA after the
introduction of the rolling update strategy (#4267), and the failure of
`linkerd stat trafficsplit` not returning results unless `--unmeshed` is
used. For the latter, I added the flag to the tests so that they pass for now,
but the underlying issue remains to be fixed in a separate
PR.
The addition of the `--unmeshed` flag changed the rendering behavior of the
`stat` command so that resources with 0 meshed pods are not displayed by
default.
Rendering is based off the row's `MeshedPodCount` field which is currently not
set by `func trafficSplitResourceQuery`. This change sets that field now so
that in rendering, the trafficsplit resource is rendered in the output.
The reason for this not showing up in testing is addressed by #4272 where the
`stat` command behavior for no traffic is changed.
The following now works without `--unmeshed` flag being passed:
```
❯ bin/linkerd stat -A ts
NAMESPACE NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
default backend-traffic-split backend-svc backend-svc 500m - - - - -
default backend-traffic-split backend-svc failing-svc 0 - - - - -
```
Upgrade Linkerd's base docker image to use go 1.14.2 in order to stay modern.
The only code change required was to update a test which was checking the error message of a `crypto/x509.CertificateInvalidError`. The error message of this error changed between go versions. We update the test to not check for the specific error string so that this test passes regardless of go version.
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #3984
We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint. This endpoint returns a much smaller payload.
Signed-off-by: Alex Leong <alex@buoyant.io>
This introduces a rolling update strategy to Linkerd deployments that have
three replicas during HA deployments. This allows for at most one pod to begin
terminating before a new pod is ready.
This allows for upgrades to take place on three node clusters. As a pod begins
terminating, it opens up the node for the new pod to start initializing.
`.Values.enablePodAntiAffinity` was chosen as the conditional here because it
is set by the `values-ha.yaml` config on HA deployments.
It can be difficult to know which versions of the proxy are running in your cluster, especially when you have pods running at multiple different proxy versions.
We add two pieces of CLI functionality to assist with this:
The `linkerd check --proxy` command will now list all data plane pods which are not up-to-date rather than just printing the first one it encounters:
```
‼ data plane is up-to-date
Some data plane pods are not running the current version:
* default/books-84958fff5-95j75 (git-ca760bdd)
* default/authors-57c6dc9b47-djldq (git-ca760bdd)
* default/traffic-85f58ccb66-vxr49 (git-ca760bdd)
* default/release-name-smi-metrics-899c68958-5ctpz (git-ca760bdd)
* default/webapp-6975dc796f-2ngh4 (git-ca760bdd)
* default/webapp-6975dc796f-z4bc4 (git-ca760bdd)
* emojivoto/voting-54ffc5787d-wj6cp (git-ca760bdd)
* emojivoto/vote-bot-7b54d6999b-57srw (git-ca760bdd)
* emojivoto/emoji-5cb99f85d8-5bhvm (git-ca760bdd)
* emojivoto/web-7988674b8b-zfvvm (git-ca760bdd)
* default/webapp-6975dc796f-d2fbc (git-ca760bdd)
* default/curl (git-7f6bbc73)
see https://linkerd.io/checks/#l5d-data-plane-version for hints
```
The `linkerd version` command now supports a `--proxy` flag which will list all proxy versions running in the cluster and the number of pods running each version:
```
linkerd version --proxy
Client version: dev-7b9d475f-alex
Server version: edge-20.4.1
Proxy versions:
edge-20.4.1 (10 pods)
git-ca760bdd (11 pods)
git-7f6bbc73 (1 pods)
```
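The per-version summary boils down to a tally over the proxy version of each pod. A minimal Go sketch, assuming the versions have already been collected into a slice; `countVersions` and the sample data are illustrative and not the CLI's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// countVersions tallies how many pods run each proxy version.
func countVersions(podVersions []string) map[string]int {
	counts := make(map[string]int)
	for _, v := range podVersions {
		counts[v]++
	}
	return counts
}

func main() {
	// Hypothetical input: one entry per data plane pod.
	counts := countVersions([]string{"edge-20.4.1", "git-ca760bdd", "edge-20.4.1"})

	// Sort the versions so the output order is stable between runs.
	versions := make([]string, 0, len(counts))
	for v := range counts {
		versions = append(versions, v)
	}
	sort.Strings(versions)

	fmt.Println("Proxy versions:")
	for _, v := range versions {
		fmt.Printf("\t%s (%d pods)\n", v, counts[v])
	}
}
```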
Signed-off-by: Alex Leong <alex@buoyant.io>
* ## edge-20.4.2
This release brings a number of CLI fixes and Controller improvements.
* CLI
* Fixed a bug that caused the proxy to crash after upgrade if
`--skip-outbound-ports` or `--skip-inbound-ports` were used
* Added an `--unmeshed` flag to the `stat` command, such that unmeshed resources
are only displayed if the user opts in
* Added a `--smi-metrics` flag to `install`, to allow installation of the
experimental `linkerd-smi-metrics` component
* Fixed a bug in `linkerd stat`, causing incorrect output formatting when using
the `wide` flag
* Fixed a bug, causing `linkerd uninstall` to fail when attempting to delete
PSPs
* Controller
* Improved the anti-affinity of `linkerd-smi-metrics` deployment to avoid
pod scheduling problems during `upgrade`
* Improved endpoints change detection in the `destinations` service, enabling
mirrored remote services to change cluster gateways
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This release includes a new protocol detection timeout, which prevents
clients from consuming resources indefinitely when they do not send any
data.
Additionally, the proxy's admin server now supports a `/live` endpoint
for liveness checks, and a feature has been added to enrich tracing
metadata from a file of label/values.
---
* Add Labels from a path as oc-collector attributes (linkerd/linkerd2-proxy#463)
* Add liveness endpoint to admin server (linkerd/linkerd2-proxy#470)
* docker: Use buildkit for caching (linkerd/linkerd2-proxy#472)
* Makefile: Use STRIP variable with strip as default (linkerd/linkerd2-proxy#475)
* Add checksec to the release process (linkerd/linkerd2-proxy#476)
* Time out protocol detect futures (linkerd/linkerd2-proxy#464)
* Ensure that checksec is executable (linkerd/linkerd2-proxy#477)
* Fix the checksec URL (linkerd/linkerd2-proxy#478)
* Undo hardcoded release version (linkerd/linkerd2-proxy#479)
This fixes an issue users are experiencing when they upgrade from Linkerd
2.6 to 2.7 and use the [kubernetes-external-secrets]() project.
The change introduced by #3700 resulted in the tap service showing up in the
`/openapi/v2` API response. I confirmed this with a local build.
A dependency within the project expects the `operationID` field to be present
in the swagger definition. It is optional as stated in the
[spec](https://swagger.io/docs/specification/paths-and-operations/). Its
purpose is to identify an operation and should be unique.
This change adds that field to tap service swagger spec. While this can be
fixed in the KES dependency, it certainly does not hurt to add and other
libraries may similarly expect this field.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #4257
This was introduced in 2.7.0. When performing an upgrade on an
installation that used `--skip-outbound-ports` or
`--skip-inbound-ports`, the upgrade picks those values up from the
ConfigMap and parses them incorrectly. When proxy-init then picks them up,
the iptables commands fail.
I've also improved one of the upgrade unit tests to include these flags,
and confirmed it failed before this fix.
## Motivation
Introduces an `unmeshed` flag to the `stat` command so that users can opt-in
to viewing unmeshed resources in the `stat` output.
This changes the existing behavior of the `stat` command such that unmeshed
resources no longer render by default in the output.
Before:
```
❯ bin/linkerd stat -A deploy
NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN
kube-system coredns 0/1 - - - - - -
kube-system local-path-provisioner 0/1 - - - - - -
kube-system metrics-server 0/1 - - - - - -
kube-system traefik 0/1 - - - - - -
linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 2ms 2ms 2
linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 1ms 1ms 11
...
```
After:
```
❯ bin/linkerd stat -A deploy
NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN
linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 1ms 1ms 2
linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 2ms 2ms 13
...
```
Closes #3871
## Solution
Using the meshed pod count in the stat response, resources with a count of `0`
are not rendered in the table.
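A minimal sketch of that filtering step, assuming a simplified row type; `row` and `visibleRows` are hypothetical names and not the CLI's actual types.

```go
package main

import "fmt"

// row is a hypothetical, trimmed-down stand-in for a stat table row.
type row struct {
	Name           string
	MeshedPodCount uint64
}

// visibleRows returns the rows to render. Rows without meshed pods are
// dropped unless the caller opted in with the unmeshed flag.
func visibleRows(rows []row, unmeshed bool) []row {
	if unmeshed {
		return rows
	}
	out := make([]row, 0, len(rows))
	for _, r := range rows {
		if r.MeshedPodCount > 0 {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []row{
		{Name: "coredns", MeshedPodCount: 0},
		{Name: "linkerd-controller", MeshedPodCount: 1},
	}
	for _, r := range visibleRows(rows, false) {
		fmt.Println(r.Name)
	}
}
```

This also shows why a resource whose `MeshedPodCount` is never populated disappears from the default output: it is indistinguishable from an unmeshed one.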
The `-l`/`--selector` flag does not work for all resource types, so applying a
default label does not solve this problem. While it works for pods, it does
not work for deployments as the `linkerd.io/inject` is an annotation that
cannot be selected on.
I did not think a shorthand flag such as `-u` was necessary for this, because
I do not expect users to pass this flag to the `stat` command often.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed. The flag defaults to false and is hidden.
We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Bug in `linkerd uninstall` when attempting to delete PSP
We were using the wrong apiVersion for PSP in `linkerd uninstall`'s
output, which prevented that resource from being removed:
```
$ linkerd uninstall | kubectl delete -f -
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-controller"
deleted
clusterrole.rbac.authorization.k8s.io "linkerd-linkerd-destination"
deleted
...
mutatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-proxy-injector-webhook-config" deleted
validatingwebhookconfiguration.admissionregistration.k8s.io
"linkerd-sp-validator-webhook-config" deleted
namespace "linkerd" deleted
error: unable to recognize "uninstall.yml": no matches for kind
"PodSecurityPolicy" in version "extensions/v1beta1"
$ kubectl get psp -oname
podsecuritypolicy.policy/linkerd-linkerd-control-plane
```
I've also replaced the uninstall integration test with a new separate
suite that performs the installation, waits for it to be ready,
uninstalls, and then confirms `linkerd check --pre` returns as expected.
* edge-20.4.1
This release introduces some cool new functionalities, all provided by our
awesome community of contributors! It also fixes two bugs that were introduced
since edge-20.3.2.
* CLI
* Added `linkerd uninstall` command to uninstall the control plane (thanks
@Matei207!)
* Fixed a bug causing `linkerd routes -o wide` to not show the proper actual
success rate
* Controller
* Fail proxy injection if the pod spec has `automountServiceAccountToken`
disabled (thanks @mayankshah1607!)
* Web UI
* Added a route dashboard to Grafana (thanks @lundbird!)
* Proxy
* Fixed a bug causing the proxy's inbound to spuriously return 503 timeouts
The `cloud_integration_tests` job was creating its tests under
namespaces containing the git SHA. This is a left-over from when all the
tests ran in the same cluster, which is no longer the case, and thus no
longer needed.
This fixes the [current CI
failure](https://github.com/linkerd/linkerd2/runs/556330879?check_suite_focus=true#step:6:24)
in master.
This release fixes a bug introduced in v2.89.0 that could cause spurious
timeouts for inbound proxies that handle HTTP requests for many distinct
domains.
---
* inbound: Do not cache per-endpoint services (linkerd/linkerd2-proxy#469)
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0. Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.
This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4. This keeps all of our transitive dependencies synchronized on one version of client-go.
This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code. I took this opportunity to update our code generation script to properly use the version of code-generator from `go.mod` rather than a hardcoded SHA.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Fix bin/kind-load for pull requests
Followup to #4212
External PRs were failing because:
1) The image tarballs weren't being loaded from the `images-archives`
directory
2) Concurrent calls to `bin/kind` were attempting to download the KinD
binary simultaneously, resulting in a "text file busy" error. To avoid
that, now we just call `bin/kind` synchronously one time beforehand.
* Handle automountServiceAccountToken
Return error during inject if pod spec has `automountServiceAccountToken: false`
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
Fixes #4206. Followup to #4167
Extract common logic to load images into KinD, from `bin/kind-load`, `bin/install-pr`, `.github/workflows/kind_integration.yml` and `.github/workflows/release.yml`.
Besides removing the duplication, `bin/kind-load` also gets faster because the images are now loaded in parallel.
```
Load into KinD the images for Linkerd's proxy, controller, web, grafana, debug and cni-plugin.
Usage:
bin/kind-load [--images] [--images-host ssh://linkerd-docker]
Examples:
# Load images from the local docker instance
bin/kind-load
# Load images from tar files located in the current directory
bin/kind-load --images
# Retrieve images from a remote docker instance and then load them into KinD
bin/kind-load --images --images-host ssh://linkerd-docker
Available Commands:
--images: use 'kind load image-archive' to load the images from local .tar files in the current directory.
--images-host: the argument to this option is used as the remote docker instance from which images are first retrieved
(using 'docker save') to be then loaded into KinD. This command requires --images.
```
This release introduces several fixes and improvements to the CLI.
* CLI
* Added support for kubectl-style label selectors in many CLI commands (thanks
@mayankshah1607!)
* Fixed the path regex in service profiles generated from proto files without
a package name (thanks @amariampolskiy!)
* Fixed an error when injecting Cronjobs that have no metadata
* Relaxed the clock skew check to match the default node heartbeat interval
on Kubernetes 1.17 and made this check a warning
* Fixed a bug where the linkerd-smi-metrics pod could not be created on
clusters with pod security policy enabled
* Internal
* Upgraded tracing components to more recent versions and improved resource
defaults (thanks @Pothulapati!)
Signed-off-by: Alex Leong <alex@buoyant.io>
Followup to #4193
This is to verify that the list of SA installed, as well as the list of
SA in the linkerd-psp RoleBinding match the list of expected SA defined
in `healthcheck.go`.
The linkerd-smi-metrics ServiceAccount wasn't hooked into linkerd's PSP
resource, which resulted in the linkerd-smi-metrics ReplicaSet failing
to spawn pods:
```
Error creating: pods "linkerd-smi-metrics-574f57ffd4-" is forbidden:
unable to validate against any pod security policy: []
```
Fixes #3943
The Linkerd clock skew check requires that all nodes in the cluster have reported a heartbeat within (approximately) the last minute. However, in Kubernetes 1.17, the default heartbeat interval is 5 minutes. This means that the clock skew check will often fail in Kubernetes 1.17 clusters.
We relax the check to only require that heartbeats have been detected in the past 5 minutes, matching the default heartbeat interval in Kubernetes 1.17. We also switch this check to be a warning so that clusters which are configured with longer heartbeat intervals don't see this as a fatal error.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add missing SAs to linkerd check
This adds the service accounts `linkerd-destination` and
`linkerd-smi-metrics` that were missing from the "control plane
ServiceAccounts exist" check.
Fixes #4179
Changes to Go dependencies will touch all Dockerfiles in the repo which requires approval from the codeowners of each subdirectory.
We revise the codeowners to add more owners for the Dockerfiles so that approval is not required from the subdirectory owners specifically.
Signed-off-by: Alex Leong <alex@buoyant.io>
When injecting a Cronjob with no
`spec.jobTemplate.spec.template.metadata` we were getting the following
error:
```
Error transforming resources: jsonpatch add operation does not apply:
doc is missing path:
"/spec/jobTemplate/spec/template/metadata/annotations"
```
This only happens to Cronjobs because other workloads force having at
least a label there that is used in `spec.selector` (at least as of v1
workloads).
With this fix, if no metadata is detected, then we add it in the json patch when
injecting, prior to adding the injection annotation.
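The ordering matters because a JSON Patch `add` fails when the parent of its path does not exist. A minimal sketch of building such a patch, assuming a simplified shape; `annotationOps` and the `example.com/injected` annotation are hypothetical and not the injector's actual patch template.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// annotationOps returns the operations needed to set one annotation on the
// pod template found at base. Missing parent objects are created first.
// The key must already be JSON Pointer escaped ("/" written as "~1").
func annotationOps(base, escapedKey, value string, hasMetadata, hasAnnotations bool) []patchOp {
	var ops []patchOp
	if !hasMetadata {
		ops = append(ops, patchOp{Op: "add", Path: base + "/metadata", Value: map[string]string{}})
	}
	if !hasMetadata || !hasAnnotations {
		ops = append(ops, patchOp{Op: "add", Path: base + "/metadata/annotations", Value: map[string]string{}})
	}
	ops = append(ops, patchOp{Op: "add", Path: base + "/metadata/annotations/" + escapedKey, Value: value})
	return ops
}

func main() {
	// A Cronjob template with no metadata at all; the annotation is hypothetical.
	ops := annotationOps("/spec/jobTemplate/spec/template", "example.com~1injected", "true", false, false)
	out, err := json.MarshalIndent(ops, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

When the metadata and annotations objects already exist, only the final operation is emitted, so existing contents are left untouched.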
I've added a couple of new unit tests: one verifies that this
doesn't remove metadata contents in Cronjobs that do have that metadata,
and the other tests injection in Cronjobs that don't have
metadata (which I verified failed prior to this fix).
Currently the release tag regex matches against arguments that have `edge` or
`stable` as a substring.
It should only match against arguments that are either `edge` or `stable`.
For example, the graceful error handling is not triggered for the following:
```
❯ bin/create-release-tag edge-20.3.3
bin/create-release-tag: line 92: release_tag: unbound variable
```
This PR fixes the regex so that the above results in graceful error handling.
```
❯ bin/create-release-tag edge-20.3.3
Error: valid release channels: edge, stable
Usage:
bin/create-release-tag edge
bin/create-release-tag stable 2.4.8
```
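The difference between the two behaviours comes down to anchoring the pattern. The release script is a shell script, so the Go snippet below only illustrates the idea, and both patterns are assumptions rather than copies of the script's regex.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// loose matches any argument that merely contains a channel name.
	loose = regexp.MustCompile(`edge|stable`)
	// strict matches only arguments that are exactly a channel name.
	strict = regexp.MustCompile(`^(edge|stable)$`)
)

func main() {
	for _, arg := range []string{"edge", "stable", "edge-20.3.3"} {
		fmt.Printf("%-12s loose=%t strict=%t\n",
			arg, loose.MatchString(arg), strict.MatchString(arg))
	}
}
```

With the anchored pattern, `edge-20.3.3` is rejected up front and can be routed to the usage message.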