## edge-20.3.3
This release introduces new experimental CLI commands for querying metrics using
the Service Mesh Interface (SMI) and for multi-cluster support via service
mirroring.
If you would like to learn more about service mirroring or SMI, or are
interested in experimenting with these features, please join us in
[Linkerd Slack](https://slack.linkerd.io) for help and feedback.
* CLI
* Added experimental `linkerd cluster` commands for managing multi-cluster
service mirroring
* Added the experimental `linkerd alpha clients` command, which uses the
smi-metrics API to display client-side metrics from each of a resource's
clients
* Added retries to some `linkerd check` checks to prevent spurious failures
when run immediately after cluster creation or Linkerd installation
This version contains an fix for a bug that was rejecting all requests on clusters configured with an empty list of allowed client names. Because smi-metrics is an apiservice, this was also preventing namespaces from terminating.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Upgrade golangci-lint to v1.23.8
This should help with some timeouts we're seeing in CI.
I fixed some new warnings found in `inject.go` and `uninject.go`.
Also we now have to explicitly disable linting `/controller/gen`.
The linter was also complaining that in `/pkg/k8s/fake.go` the
`spClient.Interface` and `tsclient.Interface` returned in the function
`newFakeClientSetsFromManifests()` aren't used, but I opted to ignore
that to leave them available for future tests.
* Bump proxy-init to v1.3.2
Bumped `proxy-init` version to v1.3.2, fixing an issue with `go.mod`
(linkerd/linkerd2-proxy-init#9).
This is a non-user-facing fix.
## Motivation
I noticed the Go language server stopped working in VS Code and narrowed it
down to `go build ./...` failing with the following:
```
❯ go build ./...
go: github.com/linkerd/stern@v0.0.0-20190907020106-201e8ccdff9c: parsing go.mod: go.mod:3: usage: go 1.23
```
This change updates `linkerd/stern` version with changes made in
linkerd/stern#3 to fix this issue.
This does not depend on #4170, but it is also needed in order to completely
fix `go build ./...`
## Motivation
After #4147 added the `install-pr` script, installing PRs into existing
clusters does not work if that cluster is a KinD cluster
Changing the script to be able to use KinD, and specifically automate `kind
load` would be helpful!
## Solution
The script can now be used in the following ways.
```
❯ bin/install-pr --help
Install Linkerd with the changes made in a GitHub Pull Request.
Usage:
--context: The name of the kubeconfig context to use
# Install Linkerd into the current cluster
bin/install-pr 1234
# Install Linkerd into the current KinD cluster
bin/install-pr [-k|--kind] 1234
# Install Linkerd into the 'kind-pr-1234' KinD cluster
bin/install-pr [-k|--kind] --context kind-pr-1234 1234
```
The script assumes that the cluster (KinD or not) has already been created. If
the cluster is a KinD cluster, the `-k|--kind` flag should be passed.
If the `--context` flag is not passsed, the install defaults to the current
context (`kubectl config current-context`).
I also added a [`-h|--help]` option that describes how to use the script.
This change removes the target port requirement when resolving ports in the dst service. Based on the comments, it seems that we need to have a target port defined in the port spec in order to resolve to the port in the Endpoints. In reality if target port is note defined when creating the service, k8s will set the port and the target port to the same value. Seems to me that checking for the targetPort to be different than 0, is a no-op.
Signed-off-by: Zahari Dichev zaharidichev@gmail.com
## Motivation
Testing #4167 has revealed some `linkerd check` failures that occur only
because the checks happen too quickly after cluster creation or install. If
retried, they pass on the second time.
Some checkers already handle this with the `retryDeadline` field. If a checker
does not set this field, there is no retry.
## Solution
Add retries to the `l5d-existence-replicasets`
`l5d-existence-unschedulable-pods` checks so that these checks do not fail
during a chained cluster creation > install > check process.
We add the `linkerd alpha clients` command which displays client side metrics from each of a resource's clients. This allows you to see who all of your clients are and see what your resource's metrics look like from your clients' point of view. Since these metrics are measured on the client-side, they include network latency.
```
> linkerd alpha clients deploy/web -n emojivoto
FROM TO SUCCESS RPS LATENCY_P50 LATENCY_P90 LATENCY_P99
vote-bot.emojivoto web 97.50% 2.0rps 4ms 5ms 5ms
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This change is in a similar vein to #4052 which provided support for
configuring service profile retries via a vendor extension of
`x-linkerd-retryable`, when generating from an openapi specification.
This change is very similar to the final version of that pull request,
and adds a timeout value based on `x-linkerd-timeout`.
At this point I believe that if the timeout is not specified then the
default provided by linkerd of 30s will apply anyway, but won't
explicitly be reflected in the service profile, which I'm somewhat okay
with as a current state, but I think there's a potential future
improvement that the default timeout is always shown when generating
from an open api spec, but that's more to make it clear and obvious that
that timeout exists.
Signed-off-by: Lewis Cowper <lewis.cowper@googlemail.com>
This release builds on changes in the prior release to ensure that
balancers process updates eagerly.
Cache capacity limitations have been removed; and services now fail
eagerly, rather than making all requests wait for the timeout to expire.
Also, a bug was fixed in the way the `LINKERD2_PROXY_LOG` env variable
is parsed.
---
* Introduce a backpressure-propagating buffer (linkerd/linkerd2-proxy#451)
* trace: update tracing-subscriber to 0.2.3 (linkerd/linkerd2-proxy#455)
* timeout: Introduce FailFast, Idle, and Probe middlewares (linkerd/linkerd2-proxy#452)
* cache: Let services self-evict (linkerd/linkerd2-proxy#456)
## Motivation
#4147 adds a script for setting up a local cluster that uses the images built
from the changes introduced in a forked PR. This would be useful for all PRs.
In order to install Linkerd from a PR into a local cluster, the images still
need to be built at some point. If you happen to have SSH config setup for our
Packet host, you can pull them from there. That is not very
accessible--requiring that someone adds you as a user--so we can take a
similar approach to forked PRs.
## Solution
All PRs now make an artifact directory that is uploaded as part of the KinD
integration tests. This way, the `install-pr` script can use those images no
matter if the PR is a fork or not.
We use curl for fetching remote files in our `bin` scripts. Replace the use of `wget` with `curl` in `bin/shellcheck` for consistency.
Signed-off-by: Alex Leong <alex@buoyant.io>
# Install PR
This script takes a Github pull request number as an argument, downloads the
docker images from the pull request's artifacts, pushes them, and installs
them on your Kubernetes cluster. Requires a Github personal access token
in the $GITHUB_TOKEN environment variable.
Signed-off-by: Alex Leong <alex@buoyant.io>
* use custom all values for top line dashboard
* convert remaining allValue params to wildcard glob
Signed-off-by: Matt Miller <mamiller@rosettastone.com>
## Motivation
Closes#4140
Automatically create new edge release tag:
```
❯ bin/create-release-tag edge
edge-20.3.2 tag created and signed.
tag: edge-20.3.2
To push tag, run:
git push origin edge-20.3.2
```
Validate new stable release tag:
```
❯ bin/create-release-tag stable 2.7.1
stable-2.7.1 tag created and signed.
tag: stable-2.7.1
To push tag, run:
git push origin stable-2.7.1
```
## Solution
The release tag script now takes a release channel argument. If the release
channel argument is `stable`, a second argument is required for the version.
If the release channel is `edge`, the script gets the current edge version and
creates a new edge version with the current year: `YY`, month: `MM`, and
increments the current month minor if it is not a new month.
If the release channel is `stable`, the script will only validate the version.
Example error cases:
```
❯ bin/create-release-tag
Error: create-release-tag accepts 1 or 2 arguments
Usage:
create-release-tag edge
create-release-tag stable x.x.x
```
```
❯ bin/create-release-tag foo
Error: valid release channels: edge, stable
Usage:
bin/create-release-tag edge
bin/create-release-tag stable 2.4.8
```
```
❯ bin/create-release-tag edge 2.7.1
Error: accepts 1 argument
Usage:
bin/create-release-tag edge
```
```
❯ bin/create-release-tag stable
Error: accepts 2 arguments
Usage:
bin/create-release-tag stable 2.4.8
```
```
❯ bin/create-release-tag stable 2.7
Error: version reference incorrect
Usage:
bin/create-release-tag stable 2.4.8
```
```
❯ bin/create-release-tag stable 2.7.1.1
Error: version reference incorrect
Usage:
bin/create-release-tag stable 2.4.8
```
More helpful error messages when the `linkerd alpha stat` command fails. For example, when the user is not authorized:
```
> linkerd alpha stat deploy/web -n emojivoto --as obama@buoyant.io
Error: deployments.metrics.smi-spec.io "web" is forbidden: User "obama@buoyant.io" cannot get resource "deployments" in API group "metrics.smi-spec.io" in the namespace "emojivoto"
```
When an error is encountered on the server:
```
> linkerd alpha stat deploy -n emojivoto
Error: Unauthorized client certificate. Check configuration and try again.
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR introduces the `linkerd alpha stat` command which will eventually replace the `linkerd stat` command. This command functions in a similar way, but with slightly different arguments and is implemented using the smi-metrics API. This means that access to metrics can be controlled with RBAC.
See the `linkerd alpha stat` help text for full details, or try one of these commands:
* `linkerd alpha stat -n emojivoto deploy/web`
* `linkerd alpha stat -n emojivoto deploy`
* `linkerd alpha stat -n emojivoto deploy/web --to deploy/emoji`
Signed-off-by: Alex Leong <alex@buoyant.io>
* proxy: v2.88.0
This release includes a significant internal change to how backpressure
is handled in the proxy. These changes fix a class of bugs related to discovery
staleness, and it should be rarer to encounter "dispatch timeout"
errors.
---
* orig-proto: Be more flexible to stack placement (linkerd/linkerd2-proxy#444)
* Remove Clone requirement from controller clients (linkerd/linkerd2-proxy#449)
* server: Simplify HTTP server type constraints (linkerd/linkerd2-proxy#450)
* Overhaul buffering & caching to better-support backpressure (linkerd/linkerd2-proxy#453)
This is a followup to #4129, fixing this warning:
```
In ./bin/create-release-tag line 32:
tmp=$(. "$bindir"/_release.sh; extract_release_notes)
^-------------------^ SC2119: Use
extract_release_notes "$@" if function's
$1 should mean script's $1.
```
In order to use functions in bash that use optional arguments that don't
generate this warning, we have to disable the SC2120 check, as explained here:
https://github.com/koalaman/shellcheck/wiki/SC2120#exceptions
Extracted the logic to pull the latest release notes, out of
`bin/create-release-tag` into `bin/_release.sh` so that it can be reused
in the `release.yml` workflow, which needs to use that inside
`gh_release` when creating the github release in order to have prettier
markup release notes instead of a plaintext message pulled out of the tag
message.
The new extracted function also receives an optional argument with the
name of the file to put the release notes into, because the `body_path`
parameter in `softprops/action-gh-release` doesn't work with dynamic
vars.
Finally, now the `website_publish` job will only launch until the `gh_release`
has succeeded.
Unit tests that exercise most of the code in cluster_watcher.go. Essentially the whole cluster mirroring machinary can be tought of as a function that takes remote cluster state, local cluster state, and modification events and as a result it either modifies local cluster state or issues new events onto the queue. This is what these tests are trying to model. I think this covers a lot of the logic there. Any suggestions for other edge cases are welcome.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
The `/namespaces` page in the web dashboard was rendering broken Grafana
links, containing an extra `var-namespace=` param, for example:
```
/grafana/dashboard/db/linkerd-namespace?var-namespace=&var-namespace=emojivoto
```
Root cause was the `GrafanaLink` component taking both `resource` and
`namespace` properties, but not special-casing when
`resource === 'namespace' && namespace === ''`.
Modify the `GrafanaLink` component to omit the `var-namespace` param
when a `namespace` property is not provided.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
PR #4117 was root-caused with the help of `shellcheck`.
This change introduces a `bin/shellcheck` script, and adds it to CI. In
CI, many checks are disabled to allow it to pass. This will at least
prevent introduction of new classes of shell issue, and should motivate
re-enabling more checks over time.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
`bin/fetch-proxy` was failing on Linux:
```bash
$ bin/fetch-proxy
linkerd2-proxy-v2.87.0/
linkerd2-proxy-v2.87.0/LICENSE
linkerd2-proxy-v2.87.0/bin/
linkerd2-proxy-v2.87.0/bin/linkerd2-proxy
bin/fetch-proxy: 31: [: Linux: unexpected operator
/home/siggy/code/linkerd2/target/proxy/linkerd2-proxy-v2.87.0
```
Also in CI:
https://github.com/linkerd/linkerd2/runs/473746447?check_suite_focus=true#step:5:32
Unfortunately `bin/fetch-proxy` still returned a zero exit status, because
`set -e` does not apply to commands that are part of `if` statements.
From https://ss64.com/bash/set.html:
```
-e Exit immediately if a simple command exits with a non-zero status, unless
the command that fails is part of an until or while loop, part of an
if statement, part of a && or || list, or if the command's return status
is being inverted using !. -o errexit
```
Fortunately when the `if` command failed, it fell through to the `else` clause
for Linux, and copied `linkerd-proxy` successfully.
Root cause was a `==` instead of `=`. `shellcheck` confirms, and also
recommends quoting:
```bash
$ shellcheck bin/fetch-proxy
In bin/fetch-proxy line 31:
if [ $(uname) == "Darwin" ]; then
^-- SC2046: Quote this to prevent word splitting.
^-- SC2039: In POSIX sh, == in place of = is undefined.
```
Apply `shellcheck` recommendations.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
When adding an action we can quickly vet it and fix it to a sha. Whereas
if we use a tag, the 3rd party can change the code and retag it without us noticing
This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Adds the SMI metrics API to the Linkerd install flow. This installs the SMI metrics controller deployment, the SMI metrics ApiService object, and supporting RBAC, and config resources.
This is the first step toward having Linkerd consume the SMI metrics API in the CLI and web dashboard.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Check Extension api server Authentication
* Added Checks and tests for extension api-server authentication
* Fixed Failing Static Checks
* Updated the golden file
Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
## edge-20.2.3
This release introduces the first optional add-on `tracing`, added through the
new add-on model!
The existing optional `tracing` components Jaeger and OpenCensus can now be
installed as add-on components.
There will be more information to come about the new add-on model, but please
refer to the details of [#3955](https://github.com/linkerd/linkerd2/pull/3955) for how to get started.
* CLI
* Added the `linkerd diagnostics` command to get metrics only from the
control plane, excluding metrics from the data plane proxies (thanks
@srv-twry!)
* Added the `linkerd install --prometheus-image` option for installing a
custom Prometheus image (thanks @christyjacob4!)
* Fixed an issue with `linkerd upgrade` where changes to the `Namespace`
object were ignored (thanks @supra08!)
* Controller
* Added the `tracing` add-on which installs Jaeger and OpenCensus as add-on
components (thanks @Pothulapati!!)
* Proxy
* Increased the inbound router's default capacity from 100 to 10k to
accommodate environments that have a high cardinality of virtual hosts
served by a single pod
* Web UI
* Fixed styling in the CallToAction banner (thanks @aliariff!)
This adds a message after running the `create-release-script` that I intended to
add as part of the initial PR. Example output:
```
❯ bin/create-release-tag $TAG tag created and signed.
tag: edge-93.1.1
To push tag, run:
git push origin edge-93.1.1
```
Fixes#4105
In my local machine, `linkerd stat` was not returning traffic up until
the 17th try or so. Which explains why the 20s timeout was a bit too
close to the limit and this test was failing sometimes. So I increased
the timeout up to 40s and I'm also adding stderr to the error message.
* Add spacing in CallToAction banner
The call to action banner in control plane page is missing some spacing.
The CSS is defined but not yet used. So the solution is to add the class name to the corresponding banner.
After its merged the banner will have more space.
Fixes#3690
* Remove unused css
Signed-off-by: Ali Ariff <ali.ariff12@gmail.com>
Added the `website_publish_check` job, triggered upon edge/stable tag pushes,
that verifies that the install script at the website has indeed been
updated with the corresponding edge/stable version. It performs the
check every 5 seconds, 10 times, before giving up and failing the build
run. Tested fine in my fork 👍