In #4595 we stopped failing integration tests whenever a pod restarted
just once, which is caused by containerd/containerd#4068.
But we forgot to remove the warning event corresponding to that
containerd failure, and that unexpected event continues to fail the
tests. So this change adds that event to the list of expected ones.
The `choco_pack` job only runs for stable tags. For the jobs that depend
on it to run on non-stable tags, we need to move this tag check from the
`choco_pack` job level down into its steps.
## edge-20.6.3
This edge release is a release candidate for stable-2.8.1. It includes a fix
to support multicluster gateways on EKS.
* The `config.linkerd.io/proxy-destination-get-networks` annotation configures
the networks for which a proxy can discover metadata. This is an advanced
configuration option that has security implications.
* The multicluster service-mirror has been extended to resolve DNS names for
target clusters when an IP address is not known.
* Linkerd checks could fail when run from the dashboard. Thanks to @alex-berger
for providing a fix!
* The CLI will be published for Chocolatey (Windows) on future stable releases.
* Base images have been updated:
* debian:buster-20200514-slim
* grafana/grafana:7.0.3
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Oliver Gould <ver@buoyant.io>
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.
To avoid situations like that, this PR introduces a way to modify the set of networks for which the proxy does IP-based discovery. The change introduces a Helm value, `.Values.global.proxy.destinationGetNetworks`, that can be used to modify this setting. There are two ways a user can affect it:
- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using the `config.linkerd.io/proxy-destination-get-networks` annotation on injected workloads to override this value
Note that this setting cannot be tweaked through the `install` or `inject` command.
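As a rough illustration of what the network list controls, the sketch below checks whether an IPv4 address falls inside any network of a list. It is not the proxy's implementation: the function names, the comma-separated CIDR format, and the example addresses are assumptions made for this example.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if IPv4 address $1 falls inside CIDR block $2 (e.g. 10.0.0.0/8).
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( bits == 0 ? 0 : (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Return 0 if address $1 is inside any network of the comma-separated list $2
# (hypothetical helper; the list format is assumed).
should_discover() {
  local ip=$1 nets cidr
  IFS=, read -ra nets <<< "$2"
  for cidr in "${nets[@]}"; do
    ip_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}
```

With a list such as `10.0.0.0/8,172.16.0.0/12`, `should_discover 10.1.2.3 ...` succeeds while an address outside both ranges does not, which mirrors the idea of narrowing the list so that the ingress address is excluded from discovery.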
Fix: #4585
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Explicitly shebang `bin/update-go-deps-shas` with `#!/bin/bash` instead
of `#!/bin/sh` because the latter points to `dash` in most Ubuntu-based
distros, and the script's `bin/_tag.sh` dependency requires bash.
## Problem
#4557 changed the name of the function that `helm_upgrade_integration_tests`
uses.
`install_stable()` was renamed to `latest_release_channel()` and now takes an
argument for specifying either `edge` or `stable`.
`run_helm_upgrade_test` is a function used by the helm upgrade integration test
and was not properly updated to use `latest_release_channel()`.
This silently passed integration tests because `run_helm_upgrade_test` started
passing an empty string for the version to upgrade from, which results in the
default behavior of `install_test.go`--and therefore still passes.
## Solution
`run_helm_upgrade_test` now uses `latest_release_channel()` and passes the
proper argument.
Additionally, it checks that the version returned from
`latest_release_channel()` is not empty. If it is empty, it exits the test. This
ensures something like this does not happen in the future.
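The guard described above could look roughly like the following sketch. Both functions are stand-ins written for this example: `latest_release_channel` here is a stub with a placeholder version, `upgrade_from_version` is a hypothetical name, and the sketch uses `return` where the real script exits.

```bash
# Stub standing in for the real helper: prints the latest version of the
# given release channel, or nothing when the channel is unknown.
# The version strings are placeholders.
latest_release_channel() {
  case "$1" in
    edge) echo "edge-0.0.0" ;;
    stable) echo "stable-0.0.0" ;;
  esac
}

# Hypothetical wrapper: refuse to continue when no version was returned,
# instead of silently falling back to the default test behavior.
upgrade_from_version() {
  local version
  version=$(latest_release_channel "$1")
  if [ -z "$version" ]; then
    echo "error: no version found for release channel '$1'" >&2
    return 1
  fi
  echo "$version"
}
```

Failing loudly on an empty version is what keeps a renamed or misused helper from passing the test by accident.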
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Adds parameters such as kubernetesHelper and k8scontext to the NewGenericTestHelper func, making it more general and usable from linkerd2-conformance
* Integration tests: Warn (instead of erroring) upon pod restarts
Fixes #4595
Don't have integration tests fail whenever a pod is detected to have
restarted just once. For now we'll be just logging this out and creating
a warning annotation for it.
* Fix install-pr script
* Add image-archives path to commands to use the files
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Signed-off-by: Charles Pretzer <charles@buoyant.io>
Co-authored-by: Charles Pretzer <charles@buoyant.io>
Fixes #4606
This has not worked as far back as stable-2.6.0.
## Solution
The recommended upgrade process is to include `--prune` as part of `kubectl
apply ..`:
```bash
$ linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
```
This is an issue for multi-stage upgrade because `linkerd upgrade config` does
not include the `linkerd-config` ConfigMap in its output.
`kubectl apply --prune ..` will then prune this resource because it matches the
label selector *and* is not in the above output.
The issue occurs when `linkerd upgrade control-plane` is run and expects to find
the ConfigMap that was just pruned.
This can be fixed by not suggesting to prune resources as part of the
multi-stage upgrade.
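The pruning behavior can be modeled with a small sketch: anything that exists under the label selector but is missing from the applied output gets deleted. This is only a model of the selection rule, not how `kubectl` is implemented; `pruned_resources` and the resource names other than `linkerd-config` are made up for illustration.

```bash
# Print the resources a label-selector prune would delete: entries present in
# the labeled set ($1, newline-separated) but absent from the applied output ($2).
pruned_resources() {
  LC_ALL=C comm -23 <(LC_ALL=C sort <<< "$1") <(LC_ALL=C sort <<< "$2")
}

# Resources currently carrying the control-plane label (illustrative names).
existing=$'configmap/linkerd-config\nserviceaccount/example'
# Resources present in the config stage output (illustrative names).
applied=$'serviceaccount/example'

pruned_resources "$existing" "$applied"
```

Here the ConfigMap is the only entry reported, which is the resource the `control-plane` stage later fails to find.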
## Considered
Including `templates/config.yaml` in the install output regardless of the stage.
Instead of it being a template only used in `control-plane` stage in
[render](4aa3ca7f87/cli/cmd/install.go (L873-L886)), it could always be rendered.
This just exposes other things that are pruned in the process:
```bash
❯ bin/linkerd upgrade control-plane |kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
× Failed to build upgrade configuration: secrets "linkerd-identity-issuer" not found
For troubleshooting help, visit: https://linkerd.io/upgrade/#troubleshooting
error: no objects passed to apply
```
Ultimately, resources that are part of the `control-plane` stage need to remain, and
that will not happen if we prune all resources not in the `config` stage output.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This adds an integration test for upgrading from the latest edge to the current
build.
Closes #4471
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* CI steps for Chocolatey package - take 2
Followup to #4205, supersedes #4205
This adds:
- A new `psscript-analyzer` job in the `statics_checks.yml`
workflow for linting the Chocolatey PowerShell script.
- A new `choco_pack` job in the `release.yml` workflow for
updating the Chocolatey spec file and generating the
package. This is only triggered for stable releases. It requires
a windows runner in order to run the choco tooling (in theory
it should have worked on a linux runner but in practice it
didn't).
- The `Create release` step was updated to upload the generated package,
if present.
- The source file path in `bin/win/linkerd.nuspec` was updated
to make this work.
* Name nupkg file consistently with the other release assets
My experience of our CODEOWNERS setup is that it frequently causes us to
require additional pro-forma reviews, but I think we can do a decent job
of getting the proper reviews informally without enforcing ownership.
I'd like to simplify this by relaxing the CODEOWNERS to add
@linkerd/maintainers by default. The project infrastructure docs should
remain locked-down, requiring a review from me; and I've updated the
CHANGES review requirement to be @adleong and me (practically, I'll
review most of the CHANGES, but Alex is a suitable fallback in most
cases).
Then, we leave the CNI ownership as-is (unless others want to volunteer
for those reviews ;).
Fixes #4582
When a target cluster gateway is exposed as a hostname rather than with a fixed IP address, the service mirror controller fails to create mirror services and gateway mirrors for that gateway. This is because we only look at the IP field of the gateway service.
We make two changes to address this problem:
First, when extracting the gateway spec from a gateway that has a hostname instead of an IP address, we do a DNS lookup to resolve that hostname into an IP address to use in the mirror service endpoints and gateway mirror endpoints.
Second, we schedule a repair job on a regular (1 minute) interval to update these endpoint objects. This has the effect of re-resolving the DNS names every minute to pick up any changes in DNS resolution.
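The first change amounts to a fallback: use the gateway's IP when one is set, otherwise resolve its hostname. A minimal shell sketch of that fallback follows; the controller itself is not written in shell, and `gateway_address` as well as the use of `getent` (which is not available on every platform) are assumptions for illustration.

```bash
# Print the address to use for a gateway: the IP ($1) when it is set,
# otherwise the first address the hostname ($2) resolves to.
gateway_address() {
  local ip=$1 hostname=$2
  if [ -n "$ip" ]; then
    echo "$ip"
    return 0
  fi
  # Resolve the hostname via the system resolver and keep the first result.
  getent hosts "$hostname" | awk '{print $1; exit}'
}
```

Calling such a function again on every repair run is what picks up changes in DNS resolution over time.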
Signed-off-by: Alex Leong <alex@buoyant.io>
As reported in #4259, `linkerd check` run from Linkerd's web console is
broken because the underlying RBAC Role cannot access the apiregistration.k8s.io API Group.
With this commit the RBAC Role is fixed allowing read-only access to the API Group
apiregistration.k8s.io.
Fixes #4259
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
Put back space after `grafanaUrl` label in `linkerd-config-addons.yaml`
to avoid breaking the yaml parsing.
```
$ linkerd check
...
linkerd-addons
--------------
‼ 'linkerd-config-addons' config map exists
could not unmarshal linkerd-config-addons config-map: error
unmarshaling JSON: while decoding JSON: json: cannot unmarshal
string into Go struct field Values.global of type linkerd2.Global
```
This was added in #4544 to avoid the configmap being badly formatted.
This PR fixes the yaml. If we don't set `grafanaUrl`, the configmap
format still gets messed up, but apparently that's just a cosmetic
problem:
```
apiVersion: v1
data:
values: "global:\n grafanaUrl: \ngrafana:\n enabled: true\n
image:\n name:
gcr.io/linkerd-io/grafana\n name: linkerd-grafana\n resources:\n
cpu:\n limit:
240m\n memory:\n limit: null\ntracing:\n enabled:
false"
kind: ConfigMap
```
Fixes #4541
This PR adds the following checks
- if a mirrored service has endpoints. (This includes gateway mirrors too).
- if an exported service is referencing a gateway that does not exist.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alex Leong <alex@buoyant.io>
Container-optimized OS on GKE runs with a set of read/write rules that prevent the linkerd-gateway from starting up.
These changes move the directories that nginx needs to write to /tmp and configure the error_log to write to stderr
Signed-off-by: Charles Pretzer <charles@buoyant.io>
This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.
Fix #4546
When the identity annotation on a gateway service is updated, this change is not propagated to the mirror gateway endpoints object.
This is because the annotations are updated on the wrong object and the changes are lost.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add namespace global flag to hold default namespace name (#4469)
Signed-off-by: Matei David <matei.david.35@gmail.com>
* Change name of controlplane install namespace constant and init point for kubeNamespace
Signed-off-by: Matei David <matei.david.35@gmail.com>
Problem
When updating or writing tests with complex data, e.g. the certificates, the built-in diff is not as powerful as a dedicated external tool.
Solution
Dump all resource specifications created as part of failing tests to a supplied folder for external analysis.
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
Followup to #4522
This removes the `controlPlaneInstalled` var in `bin/install_test.go`
that flagged whether the control plane was already present in the series
of tests, whose intention was to avoid fetching the logs/events when the CP wasn't yet
there. That was done under the assumption `TestMain()` would feed that
flag to the runner for each individual test function, but it turns out
`TestMain()` only runs once per test file, and so
`controlPlaneInstalled` remained with its initial value `false`.
So now logs/events are fetched always, even if the control plane is not
there. If the CP is absent and we try fetching, we only see a `didn't
find any client-go entries` message.
Fixes #4454
As explained
[here](https://github.com/kubernetes/kubernetes/issues/36222#issuecomment-553966166),
trailing spaces in configmap data make it look funky when retrieved
later on. This is currently affecting `linkerd-config-addons` and
`linkerd-gateway-config`:
```
$ k -n linkerd-multicluster get cm linkerd-gateway-config -oyaml
apiVersion: v1
data:
nginx.conf: "events {\n}\nstream { \n
\ server { \n
\ listen 4180; \n
\ proxy_pass 127.0.0.1:4140; \n
\ } \n}
\nhttp {\n server {\n listen 4181;\n location /health {\n access_log
off;\n return 200 \"healthy\\n\";\n }\n }\n server {\n listen
\ 8888;\n location /health-local {\n access_log off;\n return
200 \"healthy\\n\";\n }\n } \n}"
kind: ConfigMap
```
AFAIK this is only cosmetic and doesn't affect functionality.
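A quick way to spot offending lines before they reach a ConfigMap is to search the source file for trailing whitespace. The helper below is a sketch; `find_trailing_whitespace` is a name made up for this example and is not part of the repo's scripts.

```bash
# Print "line-number:content" for every line of file $1 that ends in a space
# or tab. Returns non-zero when the file has no such line.
find_trailing_whitespace() {
  grep -n '[[:space:]]$' "$1"
}
```

Running it against a template file lists each line to clean up; an empty result means the file should render without the escaped-string formatting shown above.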
## edge-20.6.1
This edge release is a release candidate for `stable-2.8`! It introduces several
improvements and fixes for multicluster support.
* CLI
* Added multicluster daisy chain checks to `linkerd check`
* Added list of successful gateways in multicluster checks section of `linkerd
check`
* Controller
* Renamed multicluster gateway ports to `mc-gateway` and `mc-probe`
* Fixed Service Profiles routes for `linkerd-prometheus`
* Internal
* Fixed array handling in the `bin/fmt` script
* Improved error reporting for scripts in failed CI runs
* Improved logs and event reporting in CI for all integration test failures
* Fixed `uname` flags for Darwin in the `bin/lint` script
* Fixed shellcheck errors in all `bin/` scripts (thanks @joakimr-axis!)
* Helm
* Added support for `linkerd mc allow`
* Added ability to disable secret resources for self-signed certs (thanks
@cypherfox!)
* Proxy
* Modified the `linkerd-gateway` component to use the inbound proxy, rather
than nginx, for gateway; this allows Linkerd to detect loops and propagate
identity
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Our stretch images contain some libraries/utilities with CVEs. While we
can't yet upgrade all containers (see #3486), we can upgrade the proxy
image (which is the most widely deployed).
Change terminology from local/remote to source/target in events and metrics.
This does not change any variable, function, struct, or field names since
testing is still improving
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #4531
This PR updates the `linkerd-gateway` cm's name to be templated, to allow multiple Gateway installations in the same cluster with different configmaps.
(Installing multiple gateways in the same cluster is possible only through Helm, as the CLI doesn't expose those commands currently.)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Fixes #4305
Fixed SP route for `POST /api/v1/query`:
```
$ bin/linkerd routes -n linkerd deploy/linkerd-prometheus
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET /api/v1/query_range linkerd-prometheus 100.00% 3.9rps 1ms 2ms 2ms
GET /api/v1/series linkerd-prometheus 100.00% 1.1rps 1ms 1ms 1ms
POST /api/v1/query linkerd-prometheus 100.00% 3.1rps 1ms 17ms 19ms
[DEFAULT] linkerd-prometheus - - - - -
```
Also added one missing route for `linkerd-grafana`, realizing afterwards there are
many other ones missing, but not really worth adding them all.
I also removed the routes in `linkerd-controller` for the tap routes
given that's no longer handled in that service.
And the tap service SP was also removed altogether since nothing was
getting reported.
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, for gateway. This allows us to detect loops and
propagate identity through the gateway.
This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.
---
* proxy: v2.99.0
The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.
---
* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
* inject: Support config.linkerd.io/enable-gateway
This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targeting the inbound listener
through the outbound proxy.
This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).
* proxy: v2.100.0
This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.
---
* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)