This release modifies Linkerd's internal buffering to avoid idling out
services as a request arrives. This could cause failures for requests
that are sent exactly once per minute, such as Prometheus scrapes.
---
* Set a grpc-status of UNAVAILABLE only on io errors (linkerd/linkerd2-proxy#498)
* inbound: Remove unnecessary buffer (linkerd/linkerd2-proxy#501)
* buffer: Move idle timeouts into the buffer (linkerd/linkerd2-proxy#502)
* make: Support CARGO_TARGET for multi-arch builds (linkerd/linkerd2-proxy#497)
* release: Use arch-specific paths (linkerd/linkerd2-proxy#508)
Use [gotestsum](https://github.com/gotestyourself/gotestsum) for running
unit tests in CI, so we get a summary result at the end, instead of having to
scroll up to find failures.
This doesn't apply to integration tests, as only failures are shown there,
and they're easily visible.
Certain install flags are intended to help with Linkerd development and generally are not useful (and are potentially confusing) to users.
We hide these flags in release (edge or stable) builds of the CLI but show them in all other builds. The list of affected flags is:
* control-plane-version
* proxy-image
* proxy-version
* image-pull-policy
* init-image
* init-image-version
Signed-off-by: Alex Leong <alex@buoyant.io>
When using CLI commands that work on namespaced resources in the cluster, the default namespace used by the CLI is hardcoded to the default Kubernetes namespace (i.e. 'default'). This update allows CLI commands that operate on namespaced resources to infer the default namespace from the currently used kubeconfig context. In short, the -n flag can be omitted in commands such as linkerd metrics when working with resources in the namespace that is set as default in the currently active context.
Validation was done manually by setting the default namespace of the currently used context, as well as through two integration tests that target the tap and get commands respectively.
Signed-off-by: Matei David <matei.david.35@gmail.com>
This allows end-user flexibility for options such as log format. Rather than bubbling up every such config option into helm values, extra arguments provide more flexibility.
Add a prometheusAlertmanagers value that allows configuring a list of statically targeted alertmanager instances.
Use rule configmaps for prometheus rules. They take a list of {name,subPath,configMap} values and mount them accordingly. Provided that the subpaths end with _rules.yml or _rules.yaml, they should be loaded by prometheus as per prometheus.yml's rule_files content.
Signed-off-by: Naseem <naseem@transit.app>
* Go test failure message wrappers to create GH Annotations
First part of #4176
## Problem
Failures in go tests need to be properly formatted as Github annotations
so that we can fetch them through Github's API for aggregation and
analysis.
## Solution
A wrapper for error messages has been created in `testutil/annotations.go`.
The idea is that instead of throwing test failures like this:
```go
t.Fatalf("error retrieving data;\nExpected: %#v\nActual: %#v", expected,
actual)
```
We'd throw them like this:
```go
testutil.AnnotationFatalf("error retrieving data", "error retrieving data;\nExpected: %#v\nActual: %#v", expected,
actual)
```
That will continue reporting the error as before (when using `go test`
or another test runner), but as a side-effect it will also send to
stdout something like:
```
::error file=pkg/inject_test.go,line=133::error retrieving data
```
Which becomes a GH annotation, visible in the CI run summary screen.
The first string arg is used to have the GH annotation be a generic error message
that can be aggregated and counted across multiple test runs. If `testutil.Fatalf(str, args...)`
is called instead, the original error message will be used.
Note that the output will be produced only when the env var
`GH_ANNOTATION` is set (which it will be when tests are triggered from a
Github Actions workflow).
Besides `testutil/annotations.go` and its accompanying unit test file,
other changes were made in other tests as examples, the plan being that
in a further PR _all_ the tests will use these wrappers.
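The annotation format described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `testutil/annotations.go`: the helper names `formatAnnotation` and `annotate` are hypothetical, and the real wrappers also report the failure through the test runner.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// formatAnnotation builds a GitHub Actions error annotation line.
// Hypothetical helper, named for illustration only.
func formatAnnotation(file string, line int, msg string) string {
	return fmt.Sprintf("::error file=%s,line=%d::%s", file, line, msg)
}

// annotate prints the annotation to stdout only when the GH_ANNOTATION
// env var is set, using the file and line of its caller.
func annotate(msg string) {
	if os.Getenv("GH_ANNOTATION") == "" {
		return
	}
	// Caller(1) reports the location of whoever called annotate.
	if _, file, line, ok := runtime.Caller(1); ok {
		fmt.Println(formatAnnotation(file, line, msg))
	}
}

func main() {
	// Prints nothing unless GH_ANNOTATION is set in the environment.
	annotate("error retrieving data")
}
```

Note that `runtime.Caller` returns an absolute file path, so a real implementation would likely need to make it relative to the repository root to match the example output above.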
* Increase timeout for Helm cleanup in integration tests
Tests were failing sporadically, waiting for the Helm namespace to get
cleaned up. I verified that it is getting cleaned up, but taking more
time sometimes.
* update changelog for edge-20.4.5
This edge release includes several new CLI commands for use with
multi-cluster gateways, and adds liveness checks and metrics for
gateways. Additionally, it makes the proxy's gRPC error-handling
behavior more consistent with other implementations, and includes a fix
for a bug in the web UI.
* CLI
* Added `linkerd cluster setup-remote` command for setting up a
multi-cluster gateway
* Added `linkerd cluster gateways` command to display stats for
multi-cluster gateways
* Changed `linkerd cluster export-service` to modify a provided YAML
file and output it, rather than mutating the cluster
* Controller
* Added liveness checks and Prometheus metrics for multi-cluster
gateways
* Changed the proxy injector to configure proxies to do destination
lookups for IPs in the private IP range
* Web UI
* Fixed errors when viewing resource detail pages
* Internal
* Created script and config to build a Linkerd CLI Chocolatey package
for Windows users, which will be published with stable releases
(thanks to @drholmie!)
* Proxy
* Changed the proxy to set a `grpc-status: UNAVAILABLE` trailer when a
gRPC response stream is interrupted by a transport error
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* review feedback
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This release improves gRPC-aware error handling to set a `grpc-status`
of `UNAVAILABLE` when a response stream is interrupted by a transport
error. This is consistent with common gRPC implementations' error-
handling behavior.
---
* Handle GRPC body errors (linkerd/linkerd2-proxy#493)
Fixes #3807
By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range. This allows us to get destination metadata, including identity, for HTTP requests which target an IP address in the cluster, such as Prometheus metrics scrape requests.
This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.
We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add Linkerd CLI Chocolatey Package
This PR partially fixes #3063 by building a Chocolatey package for Linkerd2's Windows CLI.
It adds the build scripts for the Linkerd Chocolatey package and is based on discussions in
https://github.com/linkerd/linkerd2/pull/3921
Signed-off-by: Animesh Narayan Dangwal <animesh.leo@gmail.com>
Second part of #4176
Added extra Jest reporter when running js tests from CI, which will send
to stdout a GH annotation for each test failure, something like:
```
::error file=/home/alpeb/src/forks/linkerd2/web/app/js/components/Navigation.test.jsx::Navigation › checks state when versions do not match
```
See the [health
metrics RFC](https://github.com/linkerd/rfc/blob/master/design/0002-ci-health-metrics.md) for more context.
Fixes #4298
Since we started using annotated tags for releases (because they
need to be signed), `bin/root-tag` will append `^0` to them when used
after checking out a release tag. E.g.:
```
$ bin/root-tag
edge-20.4.4^0
```
which breaks version checking by the CLI.
This PR removes that trailing `^0` whenever it's present
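The actual fix lives in `bin/root-tag`. Purely as an illustration of the intended normalization, a sketch in Go (the function name `stripPeelSuffix` is hypothetical) could look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// stripPeelSuffix removes a trailing "^0" from a tag name, if present.
// Tags without the suffix are returned unchanged.
func stripPeelSuffix(tag string) string {
	return strings.TrimSuffix(tag, "^0")
}

func main() {
	fmt.Println(stripPeelSuffix("edge-20.4.4^0")) // annotated tag case
	fmt.Println(stripPeelSuffix("edge-20.4.4"))   // already clean
}
```

Both calls print `edge-20.4.4`, which is the form the CLI version check expects.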
* Support Multi-stage install with Add-Ons
* add upgrade tests for add-ons
* add multi stage upgrade unit tests
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This release introduces a per-endpoint authority-override feature. This
is driven by the destination controller and is needed to support
multi-cluster gateways.
---
* Update to Rust 1.42.0 (linkerd/linkerd2-proxy#483)
* Adjust metric description. (linkerd/linkerd2-proxy#484)
* Use authority override from metadata (linkerd/linkerd2-proxy#458)
#4195 relaxed the clock skew check to match the Kubernetes 1.17 default
heartbeat interval.
This is the same issue that was preventing an update to the `kind` version
used.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPI for controlplaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Some `linkerd stat` test failures were being hidden
`linkerd stat` was doing an early `os.Exit(0)` when no traffic was
found, which prevented `go test` from reporting any test failure that ended in
that code path.
This was hiding a mismatch in the golden files for HA after the
introduction of the rolling update strategy (#4267), and the failure of
`linkerd stat trafficsplit` not returning results unless `--unmeshed` is
used. For the latter, I added the flag to the tests in order to temporarily pass
them, but the underlying issue remains to be fixed in a separate
PR.
The addition of the `--unmeshed` flag changed the rendering behavior of the
`stat` command so that resources with 0 meshed pods are not displayed by
default.
Rendering is based off the row's `MeshedPodCount` field which is currently not
set by `func trafficSplitResourceQuery`. This change sets that field now so
that in rendering, the trafficsplit resource is rendered in the output.
The reason for this not showing up in testing is addressed by #4272 where the
`stat` command behavior for no traffic is changed.
The following now works without `--unmeshed` flag being passed:
```
❯ bin/linkerd stat -A ts
NAMESPACE NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
default backend-traffic-split backend-svc backend-svc 500m - - - - -
default backend-traffic-split backend-svc failing-svc 0 - - - - -
```
Upgrade Linkerd's base docker image to use go 1.14.2 in order to stay modern.
The only code change required was to update a test which was checking the error message of a `crypto/x509.CertificateInvalidError`. The error message of this error changed between go versions. We update the test to not check for the specific error string so that this test passes regardless of go version.
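One way to write such a version-independent check is to match on the error type rather than its message. The following is a sketch under that assumption, not the test that was actually changed; `isCertInvalid` and `expiredErr` are hypothetical names.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
)

// isCertInvalid reports whether err is, or wraps, a
// crypto/x509.CertificateInvalidError, without inspecting its message.
func isCertInvalid(err error) bool {
	var invalid x509.CertificateInvalidError
	return errors.As(err, &invalid)
}

// expiredErr builds a wrapped sample error for demonstration purposes.
func expiredErr() error {
	return fmt.Errorf("verify failed: %w",
		x509.CertificateInvalidError{Reason: x509.Expired})
}

func main() {
	fmt.Println(isCertInvalid(expiredErr()))
	fmt.Println(isCertInvalid(errors.New("some other error")))
}
```

Because the check depends only on the error's type, it keeps passing when the wording of the message changes between Go releases.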
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #3984
We use the new `/live` admin endpoint in the Linkerd proxy for liveness probes instead of the `/metrics` endpoint. This endpoint returns a much smaller payload.
Signed-off-by: Alex Leong <alex@buoyant.io>
This introduces a rolling update strategy to Linkerd deployments that have
three replicas during HA deployments. This allows for at most one pod to begin
terminating before a new pod is ready.
This allows for upgrades to take place on three node clusters. As a pod begins
terminating, it opens up the node for the new pod to start initializing.
`.Values.enablePodAntiAffinity` was chosen as the conditional here because it
is set by the `values-ha.yaml` config on HA deployments.
It can be difficult to know which versions of the proxy are running in your cluster, especially when you have pods running at multiple different proxy versions.
We add two pieces of CLI functionality to assist with this:
The `linkerd check --proxy` command will now list all data plane pods which are not up-to-date rather than just printing the first one it encounters:
```
‼ data plane is up-to-date
Some data plane pods are not running the current version:
* default/books-84958fff5-95j75 (git-ca760bdd)
* default/authors-57c6dc9b47-djldq (git-ca760bdd)
* default/traffic-85f58ccb66-vxr49 (git-ca760bdd)
* default/release-name-smi-metrics-899c68958-5ctpz (git-ca760bdd)
* default/webapp-6975dc796f-2ngh4 (git-ca760bdd)
* default/webapp-6975dc796f-z4bc4 (git-ca760bdd)
* emojivoto/voting-54ffc5787d-wj6cp (git-ca760bdd)
* emojivoto/vote-bot-7b54d6999b-57srw (git-ca760bdd)
* emojivoto/emoji-5cb99f85d8-5bhvm (git-ca760bdd)
* emojivoto/web-7988674b8b-zfvvm (git-ca760bdd)
* default/webapp-6975dc796f-d2fbc (git-ca760bdd)
* default/curl (git-7f6bbc73)
see https://linkerd.io/checks/#l5d-data-plane-version for hints
```
The `linkerd version` command now supports a `--proxy` flag which will list all proxy versions running in the cluster and the number of pods running each version:
```
linkerd version --proxy
Client version: dev-7b9d475f-alex
Server version: edge-20.4.1
Proxy versions:
edge-20.4.1 (10 pods)
git-ca760bdd (11 pods)
git-7f6bbc73 (1 pods)
```
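The per-version tally shown above boils down to counting pods by proxy version. A minimal sketch, assuming the versions have already been collected into a slice (`countVersions` is a hypothetical name, not the CLI's actual function):

```go
package main

import (
	"fmt"
	"sort"
)

// countVersions tallies how many pods run each proxy version.
func countVersions(podVersions []string) map[string]int {
	counts := make(map[string]int)
	for _, v := range podVersions {
		counts[v]++
	}
	return counts
}

func main() {
	counts := countVersions([]string{"edge-20.4.1", "git-ca760bdd", "edge-20.4.1"})

	// Sort the versions so the output order is deterministic.
	versions := make([]string, 0, len(counts))
	for v := range counts {
		versions = append(versions, v)
	}
	sort.Strings(versions)

	for _, v := range versions {
		fmt.Printf("%s (%d pods)\n", v, counts[v])
	}
}
```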
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-20.4.2
This release brings a number of CLI fixes and Controller improvements.
* CLI
* Fixed a bug that caused the proxy to crash after upgrade if
`--skip-outbound-ports` or `--skip-inbound-ports` were used
* Added `unmeshed` flag to the `stat` command, such that unmeshed resources
are only displayed if the user opts-in
* Added a `--smi-metrics` flag to `install`, to allow installation of the
experimental `linkerd-smi-metrics` component
* Fixed a bug in `linkerd stat`, causing incorrect output formatting when using
the `wide` flag
* Fixed a bug, causing `linkerd uninstall` to fail when attempting to delete
PSPs
* Controller
* Improved the anti-affinity of `linkerd-smi-metrics` deployment to avoid
pod scheduling problems during `upgrade`
* Improved endpoints change detection in the `destinations` service, enabling
mirrored remote services to change cluster gateways
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This release includes a new protocol detection timeout, which prevents
clients from consuming resources indefinitely when they do not send any
data.
Additionally: the proxy's admin endpoint now supports a `/live` endpoint
for liveness checks, and a feature has been added to enrich tracing
metadata from a file of label/values.
---
* Add Labels from a path as oc-collector attributes (linkerd/linkerd2-proxy#463)
* Add liveness endpoint to admin server (linkerd/linkerd2-proxy#470)
* docker: Use buildkit for caching (linkerd/linkerd2-proxy#472)
* Makefile: Use STRIP variable with strip as default (linkerd/linkerd2-proxy#475)
* Add checksec to the release process (linkerd/linkerd2-proxy#476)
* Time out protocol detect futures (linkerd/linkerd2-proxy#464)
* Ensure that checksec is executable (linkerd/linkerd2-proxy#477)
* Fix the checksec URL (linkerd/linkerd2-proxy#478)
* Undo hardcoded release version (linkerd/linkerd2-proxy#479)
This fixes an issue users are experiencing when upgrading from Linkerd
2.6 to 2.7 and using the [kubernetes-external-secrets]() project.
The change introduced by #3700 resulted in the tap service showing up in the
`/openapi/v2` API response. I confirmed this with a local build.
A dependency within the project expects the `operationID` field to be present
in the swagger definition. It is optional as stated in the
[spec](https://swagger.io/docs/specification/paths-and-operations/). Its
purpose is to identify an operation and should be unique.
This change adds that field to tap service swagger spec. While this can be
fixed in the KES dependency, it certainly does not hurt to add and other
libraries may similarly expect this field.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #4257
This was introduced in 2.7.0. When performing an upgrade on an
installation having used `--skip-outbound-ports` or
`--skip-inbound-ports`, the upgrade picks those values from the
ConfigMap, parses them wrongly, and then when proxy-init picks them the
iptables commands fail.
I've also improved one of the upgrade unit tests to include these flags,
and confirmed it failed before this fix.
## Motivation
Introduces an `unmeshed` flag to the `stat` command so that users can opt-in
to viewing unmeshed resources in the `stat` output.
This changes the existing behavior of the `stat` command such that unmeshed
resources no longer render by default in the output.
Before:
```
❯ bin/linkerd stat -A deploy
NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN
kube-system coredns 0/1 - - - - - -
kube-system local-path-provisioner 0/1 - - - - - -
kube-system metrics-server 0/1 - - - - - -
kube-system traefik 0/1 - - - - - -
linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 2ms 2ms 2
linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 1ms 1ms 11
...
```
After:
```
❯ bin/linkerd stat -A deploy
NAMESPACE NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN
linkerd linkerd-controller 1/1 100.00% 0.3rps 1ms 1ms 1ms 2
linkerd linkerd-destination 1/1 100.00% 0.3rps 1ms 2ms 2ms 13
...
```
Closes #3871
## Solution
Using the meshed pod count in the stat response, resources with a count of `0`
are not rendered in the table.
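The filtering rule can be sketched as follows. The `row` type and `filterRows` function are simplified, hypothetical stand-ins for the CLI's real row type and rendering code:

```go
package main

import "fmt"

// row is a simplified stand-in for a stat table row.
type row struct {
	Name           string
	MeshedPodCount uint64
}

// filterRows drops rows with zero meshed pods unless showUnmeshed is true.
func filterRows(rows []row, showUnmeshed bool) []row {
	var out []row
	for _, r := range rows {
		if r.MeshedPodCount == 0 && !showUnmeshed {
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []row{
		{Name: "coredns", MeshedPodCount: 0},
		{Name: "linkerd-controller", MeshedPodCount: 1},
	}
	// Without the unmeshed option, only the meshed resource is kept.
	for _, r := range filterRows(rows, false) {
		fmt.Println(r.Name)
	}
}
```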
The `-l`/`--selector` flag does not work for all resource types, so applying a
default label does not solve this problem. While it works for pods, it does
not work for deployments as the `linkerd.io/inject` is an annotation that
cannot be selected on.
I did not think a shorthand flag was necessary for this. I do not think users
will commonly pass this flag to the `stat` command, and I didn't think adding
an additional short flag such as `u` was necessary.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>