## edge-21.4.1
This is a release candidate for `stable-2.10.1`!
This includes several fixes for the core installation as well as the Multicluster,
Jaeger, and Viz extensions. There are two significant proxy fixes that address
TLS detection and admin server failures.
Thanks to all our 2.10 users who helped discover these issues!
* Fixed TCP read and write bytes/sec calculations to group by label based on
  whether traffic is inbound or outbound
* Updated dashboard build to use webpack v5
* Modified the proxy-injector to add the opaque ports annotation to pods if
their namespace has it set
* Added CA certs to the Viz extension's `metrics-api` container so that it can
validate the certificate of an external Prometheus
* Fixed an issue where inbound TLS detection from non-meshed workloads could
break
* Fixed an issue where the admin server's HTTP detection would fail and not
recover; these are now handled gracefully and without logging warnings
* Aligned the Helm installation heartbeat schedule to match that of the CLI
* Fixed an issue with Multicluster's service mirror where its endpoint repair
  retries were not properly rate limited
* Removed components from the control plane dashboard that are now part of the
  Viz extension
* Fixed components in the Jaeger extension to set the correct Prometheus scrape
values
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5976
Currently, the Jaeger and Collector components in the jaeger extension
do not actually support metrics scraping because the relevant
ports are not exposed and the Prometheus annotations are not set
correctly.
This PR fixes those values.
Note that, by default, Prometheus in `linkerd-viz` does not
scrape jaeger metrics; additional configuration
has to be applied for that.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* multicluster: make service mirror honour `requeueLimit`
Fixes #5374
Currently, whenever the `gatewayAddress` is changed, the service
mirror component keeps trying to `repairEndpoints` (which is
invoked every `repairPeriod`). This behavior is fine and expected,
but because the service mirror does not currently honor `requeueLimit`,
it keeps requeuing the same event and retrying with no limit.
The condition that we use to limit requeues,
`if (rcsw.eventsQueue.NumRequeues(event) < rcsw.requeueLimit)`, does
not work for the following reason:
- For the queue to actually track requeues, `AddRateLimited` has to be
  used instead, which makes `NumRequeues` return the real
  number of requeues for a specific event.
This change updates the requeuing logic to use `AddRateLimited` instead
of `Add`.
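For illustration, a minimal sketch of this retry pattern using client-go's rate-limiting workqueue; the type and field names here are hypothetical, only the workqueue calls are real client-go API:
```go
package servicemirror

import (
	log "github.com/sirupsen/logrus"
	"k8s.io/client-go/util/workqueue"
)

// worker is an illustrative stand-in for the service mirror's cluster watcher.
type worker struct {
	eventsQueue  workqueue.RateLimitingInterface
	requeueLimit int
}

func (w *worker) processNextEvent(handle func(interface{}) error) bool {
	event, shutdown := w.eventsQueue.Get()
	if shutdown {
		return false
	}
	defer w.eventsQueue.Done(event)

	if err := handle(event); err != nil {
		if w.eventsQueue.NumRequeues(event) < w.requeueLimit {
			log.Infof("Requeues: %d, Limit: %d for event %v",
				w.eventsQueue.NumRequeues(event), w.requeueLimit, event)
			// AddRateLimited (unlike plain Add) records the requeue, so
			// NumRequeues reflects the real retry count for this event.
			w.eventsQueue.AddRateLimited(event)
			log.Errorf("Error processing %v (will retry): %s", event, err)
		} else {
			log.Errorf("Error processing %v (giving up): %s", event, err)
			// Forget clears the rate limiter's tracking for this event.
			w.eventsQueue.Forget(event)
		}
		return true
	}
	// Success: reset the retry counter so future failures start from zero.
	w.eventsQueue.Forget(event)
	return true
}
```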
After these changes, the logs in the service mirror are as follows:
```bash
time="2021-03-30T16:52:31Z" level=info msg="Received: OnAddCalled: {svc: Service: {name: grafana, namespace: linkerd-viz, annotations: [[linkerd.io/created-by=linkerd/helm git-0e2ecd7b]], labels [[linkerd.io/extension=viz]]}}" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 0, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
```
As seen, `RepairEndpoints` is called every `repairPeriod`, which
is 1 minute by default. Whenever a failure happens, it is retried,
but now the failures are tracked and the event is given up once it
reaches the `requeueLimit`, which is 3 by default.
This also fixes the requeuing logic for all types of events,
not just `repairEndpoints`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tests: include viz extension in upgrade-* integration tests
Currently, the source version of viz is installed even for
upgrade integration tests. Instead, just like the core
control-plane, the viz extension should be started at the stable/edge
version and then upgraded to the source version.
This helps discover any upgrade issues for the viz extension.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Schedule heartbeat 10 mins after install
... for the Helm installation method, thus aligning it with the CLI
installation method, to reduce the midnight peak on the receiving end.
The logic added into the chart is now reused by the CLI as well.
Also, set `concurrencyPolicy=Replace` so that when a job fails and it's
retried, the retries get canceled when the next scheduled job is triggered.
Finally, the Go client only failed when the connection failed;
successful connections with a non-200 response status were considered
successful and thus the job wasn't retried. Fixed that as well.
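A minimal sketch of that status-code check; names and the URL are illustrative, not the actual heartbeat code:
```go
package main

import (
	"fmt"
	"net/http"
)

// checkHeartbeat treats any non-200 response as an error so the CronJob's
// retry logic actually kicks in, rather than only failing on connection errors.
func checkHeartbeat(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err // connection-level failure
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("heartbeat request returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// Illustrative endpoint, not the real heartbeat URL.
	if err := checkHeartbeat("https://versioncheck.example.com/version"); err != nil {
		fmt.Println("heartbeat failed:", err)
	}
}
```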
This release fixes two issues:
1. The inbound proxy could break non-meshed TLS connections when the
initial ClientHello message was larger than 512 bytes or when the
entire message was not received in the first data packet of the
connection. TLS detection has been fixed to ensure that the entire
message is preserved in these cases.
2. The admin server could emit warnings about HTTP detection failing in
some innocuous situations, such as when the socket closes before
a request is sent. These situations are now handled gracefully
without logging warnings.
---
* Update MAINTAINERS to point at the main repo (linkerd/linkerd2-proxy#950)
* outbound: Configure endpoint construction in logical stack (linkerd/linkerd2-proxy#949)
* outbound: Decouple the TCP connect stack from the target type (linkerd/linkerd2-proxy#951)
* outbound: Make HTTP endpoint stack generic on its target (linkerd/linkerd2-proxy#952)
* outbound: Make the HTTP server stack generic (linkerd/linkerd2-proxy#953)
* Update profile response to include a logical address (linkerd/linkerd2-proxy#954)
* inbound, outbound: `Param`-ify `listen::Addrs` (linkerd/linkerd2-proxy#955)
* tls: Fix inbound I/O when TLS detection fails (linkerd/linkerd2-proxy#958)
* tls: Test SNI detection (linkerd/linkerd2-proxy#959)
* admin: Handle connections that fail protocol detection (linkerd/linkerd2-proxy#960)
Fixes #5966, fixes #5955
The metrics-api container in the Viz extension does not have the default set of system CA certificates installed. This means that it will fail to validate the certificate of an external Prometheus server over HTTPS.
We now install the default CA certs into the container.
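For context, a short sketch of why the system roots matter: Go's TLS verification relies on the system certificate pool, so an image without CA certs cannot verify an external Prometheus. The URL and client setup below are illustrative:
```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
)

func main() {
	// SystemCertPool reads the CA bundle baked into the container image; if
	// ca-certificates is missing, verifying the external Prometheus fails
	// with "x509: certificate signed by unknown authority".
	roots, err := x509.SystemCertPool()
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: roots},
		},
	}
	// Illustrative URL standing in for an external Prometheus over HTTPS.
	resp, err := client.Get("https://prometheus.example.com/api/v1/query?query=up")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```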
Signed-off-by: Alex Leong <alex@buoyant.io>
### What
When a namespace has the opaque ports annotation, pods and services should
inherit it if they do not have one themselves. Currently, services do this but
pods do not. This can lead to surprising behavior where services are correctly
marked as opaque, but pods are not.
This changes the proxy-injector so that it now passes down the opaque ports
annotation to pods from their namespace if they do not have their own annotation
set. Closes #5736.
### How
The proxy-injector webhook receives admission requests for pods and services.
Regardless of the resource kind, it now checks if the resource should inherit
the opaque ports annotation from its namespace. It should inherit it if the
namespace has the annotation but the resource does not.
If the resource should inherit the annotation, the webhook creates an annotation
patch which is only responsible for adding the opaque ports annotation.
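A rough sketch of that inheritance check, with hypothetical names rather than the actual webhook code:
```go
package inject

const opaquePortsAnnotation = "config.linkerd.io/opaque-ports"

// inheritOpaquePorts returns the namespace's opaque-ports value and whether an
// annotation patch is needed: only when the namespace sets the annotation and
// the pod or service does not set its own.
func inheritOpaquePorts(nsAnnotations, workloadAnnotations map[string]string) (string, bool) {
	nsValue, nsHas := nsAnnotations[opaquePortsAnnotation]
	_, workloadHas := workloadAnnotations[opaquePortsAnnotation]
	if nsHas && !workloadHas {
		return nsValue, true
	}
	return "", false
}
```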
After generating the annotation patch, it checks if the resource is injectable.
From here there are a few scenarios:
1. If no annotation patch was created and the resource is not injectable, then
admit the request with no changes. Examples of this are services with no OP
annotation and inject-disabled pods with no OP annotation.
2. If the resource is a pod and it is injectable, create a patch that includes
the proxy and proxy-init containers—as well as any other annotations and
labels.
3. Otherwise, a patch has been generated at this point, so it is returned no
matter the resource kind.
### UI changes
Resources are now reported to either be "injected", "skipped", or "annotated".
The first pass at this PR worked around the fact that injection reports consider
services and namespaces injectable. This is not accurate because they don't have
pod templates that could be injected; they can however be annotated.
To fix this, an injection report now considers resources "annotatable" and uses
this to clean up some logic in the `inject` command, as well as avoid a more
complex proxy-injector webhook.
What's cool about this is it fixes some `inject` command output that would label
resources as "injected" when they were not even mutated. For example, namespaces
were always reported as being injected even if annotations were not added. Now,
it will properly report that a namespace has been "annotated" or "skipped".
### Tests
For testing, unit tests and integration tests have been added. Manual testing
can be done by installing linkerd with `debug` controller log levels, and
tailing the proxy-injector's app container when creating pods or services.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change upgrades the current webpack version used to build the
dashboard to version 5. This upgrade required an update to some
deprecated JS build libraries (i.e. `@babel-eslint` ->
`@babel/eslint-parser` and `eslint-loader` -> `eslint-webpack-plugin`).
Upgrading these packages also surfaced some new linting rules. Some of
the files have been updated to fix them, while other linting rules
were disabled and will be fixed in a follow-up PR.
This change also includes updates to the dashboard `Dockerfile` to
update the node and yarn versions used to build the web assets.
Fixes #5945
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Add peer label to TCP read and write stat queries
Closes #5693
### Tests
---
After refactoring, `linkerd viz stat` behaves the same way (I haven't checked gateways or routes).
```
$ linkerd viz stat deploy/web -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 91.91% 2.3rps 2ms 4ms 5ms 3 185.3B/s 5180.0B/s
# same value as before, latency seems to have dropped
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
# queries show the peer label
---
$ linkerd viz stat deploy/web -n emojivoto --from deploy/vote-bot -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 93.16% 1.9rps 3ms 4ms 4ms 1 4503.4B/s 153.1B/s
# stats same as before except for latency which seems to have dropped a bit
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
# queries show the right label
```
Signed-off-by: mateiidavid <matei.david.35@gmail.com>
* Changelog for edge-21.3.4
This release fixes some issues around publishing of the CLI binary
for Apple Silicon M1 chips. This release also includes some fixes and
improvements to the dashboard, destination, and the CLI.
* Fixed an issue where the topology graph in the dashboard was no longer
draggable
* Updated the IP Watcher in destination to ignore pods in "Terminating" state
(thanks @Wenliang-CHEN!)
* Added `installNamespace` toggle in the jaeger extension's install.
(thanks @jijeesh!)
* Updated `healthcheck` pkg to have `hintBaseURL` configurable, useful
for external extensions using that pkg
* Added multi-arch support for RabbitMQ integration tests (thanks @barkardk!)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change updates the bin/web script to work with linkerd viz's
`metrics-api`. Previously, the script was still pointing to the
linkerd `controller` pod which no longer has the public-api to get
metrics.
This change also fixes the script's port-forward function so that we can
point to both the `linkerd-controller` and the `metrics-api`; the broken
port forwarding was causing the linkerd web development environment to
fail to start up when running the script.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Fixes #5939
Some CNIs reassign the IP of a terminating pod to a new pod, which
leads to duplicate IPs in the cluster.
It eventually triggers #5939.
This commit makes the IPWatcher, when given an IP, filter out terminating pods
(those that have a deletionTimestamp set).
The issue is hard to reproduce because we are not able to assign a
particular IP to a pod manually.
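A minimal sketch of such a filter, with an illustrative helper name:
```go
package watcher

import corev1 "k8s.io/api/core/v1"

// skipTerminating drops pods that have been marked for deletion, since some
// CNIs can reassign their IPs to new pods before they are fully removed.
func skipTerminating(pods []*corev1.Pod) []*corev1.Pod {
	running := make([]*corev1.Pod, 0, len(pods))
	for _, pod := range pods {
		if pod.DeletionTimestamp != nil {
			continue // terminating: its IP may already belong to another pod
		}
		running = append(running, pod)
	}
	return running
}
```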
Signed-off-by: Bruce <wenliang.chen@personio.de>
Co-authored-by: Bruce <wenliang.chen@personio.de>
This PR adds a new field, `installNamespace`, to the jaeger
extension's `values.yaml`, used
to toggle the presence of the namespace manifest.
This is useful when installing/upgrading into a
custom namespace and follows the same pattern
as the other extensions.
Signed-off-by: jijeesh <jijeesh.ka@gmail.com>
`CheckDeployments()` verified that some deployment had the appropriate
number of replicas in the Ready state. `CheckPods()` does the same, plus
checking if there were restarts (a single restart returns
`RestartCountError` which only triggers a warning on the calling side,
more restarts trigger a regular error). We were always calling the
former followed by a call to the latter, which is superfluous, so we're
getting rid of the former.
Also, the `testutil.DeploySpec` struct had a `Containers` field for
checking the name of containers, but that wasn't always checked and
didn't really represent a potential error that wouldn't be clearly
manifested otherwise (like in golden files), so that was simplified as
well.
* checks: make hintBaseURL configurable
Fixes #5740
Currently, when external binaries use the `healthcheck` pkg, they
can't really override the `hintBaseURL` variable, which is used
to set the base URL of the troubleshooting doc (in which `hintAnchor`
is used as an anchor).
This PR updates that by adding a new `hintBaseURL` field to
`healthcheck.Category`. This field has been added to `Category`
instead of `Checker` so as to reduce unnecessary redundancy, as
the chances of setting a different base URL for checkers under
the same category are very low.
This PR also updates the `runChecks` logic to automatically fall back
to the Linkerd troubleshooting doc as the base URL so that
the current checks don't have to set this field. This can be done
the other way around (i.e. removing the fallback logic and making all
categories specify the URL) if that turns out to be a problem.
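A minimal sketch of the fallback idea, using illustrative stand-in types rather than the actual `healthcheck` API:
```go
package healthcheck

const defaultHintBaseURL = "https://linkerd.io/checks/#"

// category is an illustrative stand-in for healthcheck.Category.
type category struct {
	name        string
	hintBaseURL string
}

// hintURL builds the troubleshooting link for a checker's hint anchor,
// falling back to the Linkerd docs when a category does not set its own base.
func (c category) hintURL(hintAnchor string) string {
	base := c.hintBaseURL
	if base == "" {
		base = defaultHintBaseURL
	}
	return base + hintAnchor
}
```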
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change fixes an issue where the default linkerd MacOS binary would
not get created by `docker-build-cli-bin`. This was caused by the
`Darwin/arm64` build step overwriting the binary created by the previous
`Darwin/amd64` build step. Tested the change on MacOS and confirmed that
both the default amd64 and arm64 versions are built.
Fixes #5933
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
This release includes various bug fixes and improvements to the CLI, the
identity and destination control plane components as well as the proxy. This
release also ships with a new CLI binary for Apple Silicon M1 chips.
* Updated `helm-upgrade` and `upgrade-stable` integration tests now that 2.10
has been released
* Added new RabbitMQ integration tests (thanks @barkardk!)
* Updated the Go version to 1.16.2
* Fixed an issue where the `linkerd identity` command returned the root
certificate of a pod instead of its leaf certificate
* Fixed an issue where the destination service would respond with too
big of a header and result in http2 protocol errors
* Updated `docker-build-cli-bin` to build Darwin Arm64 binaries. This
now adds support for running Linkerd CLI on Apple Silicon M1 chips
* Improved error messaging when trying to install Linkerd on a cluster
that already had Linkerd installed
* Fixed an issue where the `destination` control plane component sometimes
  returned endpoint addresses with a `0` port number while pods were
  undergoing a rollout
* Added a loading spinner to the `linkerd check` command when running extension
checks
* Fixed an issue where pod lookups by host IP and host port fail even though
the cluster has a matching pod
* Control plane proxies no longer emit warnings about the resolution stream
ending. This error was innocuous.
* Fixed an issue where proxies could infinitely retry failed requests to
the `destination` controller when it returned a `FailedPrecondition`
* The proxy's logging infrastructure has been updated to reduce memory
pressure in high-connection environments.
This release includes several stability improvements, following initial
feedback from the stable-2.10.0 release:
* The control plane proxies no longer emit warnings about the resolution
stream ending. This error was innocuous.
* The proxy's logging infrastructure has been updated to avoid including
client addresses in cached logging spans. Now, client addresses are
preserved to be included in warning logs. This should reduce memory
pressure in high-connection environments.
* The proxy could infinitely retry failed requests to the destination
controller when it returned a FailedPrecondition, indicating an
unexpected cluster state. These errors are now handled gracefully.
---
* Prevent fixed-address resolutions from ending (linkerd/linkerd2-proxy#945)
* http: Parameterize authority overriding (linkerd/linkerd2-proxy#946)
* tracing: Avoid high-cardinality client in INFO spans (linkerd/linkerd2-proxy#947)
* Handle FailedPrecondition errors from the control plane (linkerd/linkerd2-proxy#948)
This fixes an issue where pod lookups by host IP and host port fail even though
the cluster has a matching pod.
Usually these manifested as `FailedPrecondition` errors, but the messages were
too long and resulted in http/2 errors. This change depends on #5893 which fixes
that separate issue.
This changes how often those `FailedPrecondition` errors actually occur. The
destination service now considers pod host IPs and should reduce the frequency
of those errors.
Closes #5881
---
Lookups like this happen when a pod is created with a host IP and host port set
in its spec. It still has a pod IP when running, but requests to
`hostIP:hostPort` will also be redirected to the pod. Combinations of host IP
and host Port are unique to the cluster and enforced by Kubernetes.
Currently, the destination service fails to find pods in this scenario because
we only keep an index of pods and their pod IPs, not pods and their host IPs.
To fix this, we now also keep an index of pods and their host IPs, if and only
if they have the host IP set.
Now when doing a pod lookup, we consider both the IP and the port. We perform
the following steps (a simplified sketch in code follows the list):
1. Do a lookup by IP in the pod podIP index
- If only one pod is found then return it
2. 0 or more than 1 pods have the same pod IP
3. Do a lookup by IP in the pod hostIP index
- If any number of pods were found, we know that IP maps to a node IP.
Therefore, we search for a pod with a matching host Port. If one exists then
return it; if not then there is no pod that matches `hostIP:port`
4. The IP does not map to a host IP
5. If multiple pods were found in `1`, then we know there are pods with
conflicting podIPs and an error is returned
6. If no pods were found in `1` then there is no pod that matches `IP:port`
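A simplified sketch of the decision tree above, with illustrative index and function names rather than the destination service's actual code:
```go
package watcher

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podForIPPort resolves an address using two indices: pods keyed by pod IP
// and pods keyed by host IP (only pods with a host IP appear in the latter).
func podForIPPort(byPodIP, byHostIP map[string][]*corev1.Pod, ip string, port uint32) (*corev1.Pod, error) {
	podIPPods := byPodIP[ip]
	if len(podIPPods) == 1 {
		return podIPPods[0], nil // unambiguous pod IP match
	}

	// 0 or more than 1 pods share this pod IP: check whether it is a node IP.
	if hostIPPods := byHostIP[ip]; len(hostIPPods) > 0 {
		for _, pod := range hostIPPods {
			for _, c := range pod.Spec.Containers {
				for _, p := range c.Ports {
					if uint32(p.HostPort) == port {
						return pod, nil // hostIP:hostPort is unique in the cluster
					}
				}
			}
		}
		return nil, nil // node IP, but no pod exposes this host port
	}

	if len(podIPPods) > 1 {
		return nil, fmt.Errorf("pod IP conflict for %s", ip)
	}
	return nil, nil // no pod matches IP:port
}
```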
---
Aside from the additional IP watcher test being added, this can be tested with
the following steps:
1. Create a kind cluster. kind is required because its pods in `kube-system`
   have the same pod IPs; this is not the case with k3d: `bin/kind create cluster`
2. Install Linkerd with `4445` marked as opaque: `linkerd install --set
   proxy.opaquePorts="4445" |kubectl apply -f -`
3. Get the node IP: `kubectl get -o wide nodes`
4. Pull my fork of `tcp-echo`:
   ```
   $ git clone https://github.com/kleimkuhler/tcp-echo
   ...
   $ git checkout --track kleimkuhler/host-pod-repro
   ```
5. `helm package .`
6. Install `tcp-echo` with the server not injected and the correct host IP: `helm
   install tcp-echo tcp-echo-0.1.0.tgz --set server.linkerdInject="false" --set
   hostIP="..."`
7. Looking at the client's proxy logs, you should not observe any errors or
   protocol detection timeouts.
8. Looking at the server logs, you should see all the requests coming through
   correctly.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5637
Currently, when extension checks are running through `linkerd check`,
we don't show any status. This means that when the check
is run directly after an extension install, the CLI looks like it
got stuck somewhere with no status, as per the issue description.
This PR updates the code to show a spinner with the name of the
extension whose check is being run.
```bash
control-plane-version
---------------------
control plane is up-to-date
control plane and cli versions match
control plane running stable-2.10.0 but cli running dev-e5ff84ce-tarun
see https://linkerd.io/checks/#l5d-version-control for hints
Status check results are
Linkerd extensions checks
=========================
linkerd-jaeger
--------------
linkerd-jaeger extension Namespace exists
collector and jaeger service account exists
collector config map exists
jaeger extension pods are injected
jaeger extension pods are running
Status check results are
/ Running viz extension check
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
# Problem
While rolling out, often not all pods will be ready on the same set of
ports, leading the Kubernetes Endpoints API to return multiple subsets,
each covering a different set of ports, with the end result that the
same address gets repeated across subsets.
The old code for endpointsToAddresses would loop through all subsets, and the
later occurrences of an address would overwrite previous ones, with the
last one prevailing.
If the last subset happened to be for an irrelevant port, and the port to
be resolved is named, resolveTargetPort would resolve to port 0, which would
return port 0 to clients, ultimately leading linkerd-proxy to forward
connections to port 0.
This only happens if the pods selected by a service expose > 1 port, the
service maps to > 1 of these ports, and at least one of these ports is named.
# Solution
Never write an address to the set of addresses if the resolved port is 0, which
indicates that named port resolution failed.
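A minimal sketch of that guard, with illustrative names; the real logic lives in the destination service's endpoint translation:
```go
package watcher

import corev1 "k8s.io/api/core/v1"

// resolveTargetPort returns 0 when a named port cannot be resolved against
// the given subset, mirroring the failure mode described above.
func resolveTargetPort(port string, subset corev1.EndpointSubset) uint32 {
	for _, p := range subset.Ports {
		if p.Name == port {
			return uint32(p.Port)
		}
	}
	return 0
}

// addAddresses skips any address whose port resolved to 0 so a later,
// irrelevant subset can never overwrite a good entry with port 0.
func addAddresses(addrs map[string]uint32, subset corev1.EndpointSubset, namedPort string) {
	resolved := resolveTargetPort(namedPort, subset)
	if resolved == 0 {
		return // named port not present in this subset; do not record it
	}
	for _, a := range subset.Addresses {
		addrs[a.IP] = resolved
	}
}
```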
# Validation
Added a test case.
Signed-off-by: Riccardo Freixo <riccardofreixo@gmail.com>
* cli: update err msg when control-plane already exists.
Fixes #5889
Currently, the error message shown when the control plane already
exists suggests using `--ignore-cluster`. We should instead
suggest that users run `linkerd upgrade`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change modifies the Linkerd build scripts to start building darwin
arm64 CLI binaries. This is to ensure architecture compatibility with the
M1 Apple CPU.
Fixes #5886
This reduces the possible HTTP response size from the destination service when
it encounters an error during a profile lookup.
If multiple objects on a cluster share the same IP (such as pods in
`kube-system`), the destination service will return an error with the two
conflicting pod yamls.
In certain cases, these pod yamls can be too large for the HTTP response and the
destination pod's proxy will indicate that with the following error:
```
hyper::proto::h2::server: send response error: user error: header too big
```
From the app pod's proxy, this results in the following error:
```
poll_profile: linkerd_service_profiles::client: Could not fetch profile error=status: Unknown, message: "http2 error: protocol error: unexpected internal error encountered"
```
We now only return the conflicting pods' (or services') names. This reduces the
size of the returned error and prevents these warnings from occurring.
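A hedged sketch of how such a smaller error might be built with gRPC's status package; names are illustrative:
```go
package destination

import (
	"strings"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	corev1 "k8s.io/api/core/v1"
)

// podConflictError returns a FailedPrecondition whose message lists only the
// conflicting pods' namespace/name, instead of embedding their full manifests.
func podConflictError(pods []*corev1.Pod) error {
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Namespace+"/"+p.Name)
	}
	return status.Errorf(codes.FailedPrecondition,
		"Pod IP address conflict: %s", strings.Join(names, ", "))
}
```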
Example response error:
```
poll_profile: linkerd_service_profiles::client: Could not fetch profile error=status: FailedPrecondition, message: "Pod IP address conflict: kube-system/kindnet-wsflq, kube-system/kube-scheduler-kind-control-plane", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 12 Mar 2021 19:54:09 GMT"} }
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fix various lint issues:
- Remove unnecessary calls to fmt.Sprint
- Fix check for empty string
- Fix unnecessary calls to Printf
- Combine multiple `append`s into a single call
Signed-off-by: shubhendra <withshubh@gmail.com>
* update go.mod and docker images to go 1.16.1
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* update test error messages for ParseDuration
* update go version to 1.16.2
Add integration tests for external resources
Linkerd changes sometimes cause regressions in external components that users have installed in their stacks, such as RabbitMQ.
This PR adds a new integration test and the functionality to install additional components and run basic tests on them, to make sure Linkerd changes do not have an adverse effect on RabbitMQ.
This change includes the following steps:
- Deploy a RabbitMQ server
- Deploy a RabbitMQ client; the code for that client is hosted at https://github.com/barkardk/integration
- Use a golden file for output verification
- Use the test suite to inject linkerd into the deployments
- The client then creates a queue and a message, consumes said message, and sends a log output.
Signed-off-by: Kristin Barkardottir <kristin.barkardottir@netapp.com>
* Fix helm-upgrade integration test
Update `install_test.go` now that the upgrade test is done from 2.10.
This also implied installing viz right after core.
Refactored `HelmInstallPlain()` into `HelmCmdPlain()` to work for both
helm install and upgrade.
* Add expected heartbeat config entry to the upgrade-stable test, and remove testCheckCommand arg (for lint)
* Update BUILD.md
Fixes #5063
- Completed table of contents.
- Added references to viz and other extensions where appropriate.
- Updated components chart.
- Removed build architecture chart, too big and convoluted now to be of use ¯\_(ツ)_/¯
- Moved the `go test` paragraphs into `TEST.md`, and moved the linting paragraph out of `TEST.md`.
- Group Helm paragraphs together.
- Deleted the "Go modules and dependencies" paragraph. We can assume most people are already familiar with go modules.
- Assume we're always using buildkit
- Add notes about restarting the CP for tap and tracing to be enabled
This release introduces Linkerd extensions. The default control plane no longer
includes Prometheus, Grafana, the dashboard, or several other components that
previously shipped by default. This results in a much smaller and simpler set
of core functionality. Visibility and metrics functionality is now available
in the Viz extension under the `linkerd viz` command. Cross-cluster
communication functionality is now available in the Multicluster extension
under the `linkerd multicluster` command. Distributed tracing functionality is
now available in the Jaeger extension under the `linkerd jaeger` command.
This release also introduces the ability to mark certain ports as "opaque",
indicating that the proxy should treat the traffic as opaque TCP instead of
attempting protocol detection. This allows the proxy to provide TCP metrics
and mTLS for server-speaks-first protocols. It also enables support for
TCP traffic in the Multicluster extension.
**Upgrade notes**: Please see the [upgrade
instructions](https://linkerd.io/2/tasks/upgrade/#upgrade-notice-stable-2100).
* Proxy
* Updated the proxy to use TLS version 1.3; support for TLS 1.2 remains
enabled for compatibility with prior proxy versions
* Improved support for server-speaks-first protocols by allowing ports to be
marked as opaque, causing the proxy to skip protocol detection. Ports can
be marked as opaque by setting the `config.linkerd.io/opaque-ports`
annotation on the Pod and Service or by using the `--opaque-ports` flag with
`linkerd inject`
* Ports `25,443,587,3306,5432,11211` have been removed from the default skip
ports; all traffic through those ports is now proxied and handled opaquely
by default
* Fixed an issue that could cause the inbound proxy to fail meshed HTTP/1
requests from older proxies (from the stable-2.8.x vintage)
* Added a new `/shutdown` admin endpoint that may only be accessed over the
loopback network allowing batch jobs to gracefully terminate the proxy on
completion
* Control Plane
* Removed all components and functionality related to visibility, tracing,
or multicluster. These have been moved into extensions
* Changed the identity controller to receive the trust anchor via environment
variable instead of by flag; this allows the certificate to be loaded from a
config map or secret (thanks @mgoltzsche!)
* Added PodDisruptionBudgets to the control plane components so that they
cannot be all terminated at the same time during disruptions
(thanks @tustvold!)
* Added missing label to the `linkerd-config-overrides` secret to avoid
breaking upgrades performed with the help of `kubectl apply --prune`
* Fixed an issue where the `proxy-injector` and `sp-validator` did not refresh
their certs automatically when provided externally—like through cert-manager
* CLI
* Changed the `check` command to include each installed extension's `check`
output; this allows users to check for proper configuration and installation
of Linkerd without running a command for each extension
* Moved the `metrics`, `endpoints`, and `install-sp` commands into subcommands
under the `diagnostics` command
* Added an `--opaque-ports` flag to `linkerd inject` to easily mark ports
as opaque.
* Added the `repair` command which will repopulate resources needed for
properly upgrading a Linkerd installation
* Added Helm-style `set`, `set-string`, `values`, `set-files` customization
flags for the `linkerd install` command
* Introduced the `linkerd identity` command, used to fetch the TLS certificates
for injected pods (thanks @jimil749)
* Removed the `get` and `logs` command from the CLI
* Helm
* Changed many Helm values, please see the upgrade notes
* Viz
* Updated the Web UI to only display the "Gateway" sidebar link when the
multicluster extension is active
* Added a `linkerd viz list` command to list pods with tap enabled
* Fixed an issue where the `tap` APIServer would not refresh its certs
automatically when provided externally—like through cert-manager
* Multicluster
* Added support for cross-cluster TCP traffic
* Updated the service mirror controller to copy the
`config.linkerd.io/opaque-ports` annotation when mirroring services so that
cross-cluster traffic can be correctly handled as opaque
* Added support for multicluster gateways of types other than LoadBalancer
(thanks @DaspawnW!)
* Jaeger
* Added a `linkerd jaeger list` command to list pods with tracing enabled
* Other
* Docker images are now hosted on the `cr.l5d.io` registry
This release includes changes from a massive list of contributors. A special
thank-you to everyone who helped make this release possible:
[Lutz Behnke](https://github.com/cypherfox)
[Björn Wenzel](https://github.com/DaspawnW)
[Filip Petkovski](https://github.com/fpetkovski)
[Simon Weald](https://github.com/glitchcrab)
[GMarkfjard](https://github.com/GMarkfjard)
[hodbn](https://github.com/hodbn)
[Hu Shuai](https://github.com/hs0210)
[Jimil Desai](https://github.com/jimil749)
[jiraguha](https://github.com/jiraguha)
[Joakim Roubert](https://github.com/joakimr-axis)
[Josh Soref](https://github.com/jsoref)
[Kelly Campbell](https://github.com/kellycampbell)
[Matei David](https://github.com/mateiidavid)
[Mayank Shah](https://github.com/mayankshah1607)
[Max Goltzsche](https://github.com/mgoltzsche)
[Mitch Hulscher](https://github.com/mhulscher)
[Eugene Formanenko](https://github.com/mo4islona)
[Nathan J Mehl](https://github.com/n-oden)
[Nicolas Lamirault](https://github.com/nlamirault)
[Oleh Ozimok](https://github.com/oleh-ozimok)
[Piyush Singariya](https://github.com/piyushsingariya)
[Naga Venkata Pradeep Namburi](https://github.com/pradeepnnv)
[rish-onesignal](https://github.com/rish-onesignal)
[Shai Katz](https://github.com/shaikatz)
[Takumi Sue](https://github.com/tkms0106)
[Raphael Taylor-Davies](https://github.com/tustvold)
[Yashvardhan Kukreja](https://github.com/yashvardhan-kukreja)
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-21.3.2
This edge release is another release candidate for stable 2.10 and fixes some
final bugs found in testing. A big thank you to users who have helped us
identify these issues!
* Fixed an issue with the service profile validating webhook that prevented
service profiles from being added or updated
* Updated the `check` command output hint anchors to match Linkerd component
names
* Fixed a permission issue with the Viz extension's tap admin cluster role by
adding namespace listing to the allowed actions
* Fixed an issue with the proxy where connections would not be torn down when
communicating with a defunct endpoint
* Improved diagnostic logging in the proxy
* Fixed an issue with the Viz extension's Prometheus template that prevented
users from specifying a log level flag for that component (thanks @n-oden!)
* Fixed a template parsing issue that prevented users from specifying additional
  ignored inbound ports through Helm's `--set` flag
* Fixed an issue with the proxy where non-HTTP streams could sometimes hang due
to TLS buffering
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release fixes an issue where non-HTTP streams could hang due to TLS
buffering. Buffered data is now flushed more aggressively to prevent TCP
streams from getting "stuck" in the proxy.
---
* duplex: Ensure written data is flushed (linkerd/linkerd2-proxy#944)
The `ignoreInboundPorts` field is not parsed correctly when passed through the
`--set` Helm flag. This was discovered in https://github.com/linkerd/linkerd2/pull/5874#pullrequestreview-606779599.
This is happening because the value is not parsed into a string before using it
in the templating.
Before:
```
linkerd install --set proxyInit.ignoreInboundPorts="12345" |grep 12345
...
- "4190,4191,%!s(int64=12345)"
...
```
After:
```
linkerd install --set proxyInit.ignoreInboundPorts="12345" |grep 12345
...
- "4190,4191,12345"
...
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
# Problem
If a user specifies a `log.level` flag for linkerd-prometheus in `prometheus.args`, the template for linkerd-prometheus will generate a prometheus command where `--log.level` is specified twice (the first being directly interpolated from `prometheus.logLevel` in the template), and prometheus will crash-loop because its flags parser does not allow duplicate flags that are not arrays.
# Solution
Only fill in the `--log.level` flag from `.Values.prometheus.logLevel` if the `log.level` key is _not_ present in `.Values.prometheus.args`
# Validation
Added a test case.
Signed-off-by: Nathan J. Mehl <n@oden.io>
Currently, there are no `Notes` printed out after installation
is performed through Helm for extensions, like we do for the core
chart. This updates the viz and jaeger charts to include that,
along with instructions to view the dashboard.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>