This PR just modifies the log levels on the probe and cluster watchers
to emit in INFO what they would emit in DEBUG. I think it makes sense
as we need that information to track problems. The only difference is
that when probing gateways we only log if the probe attempt was
unsuccessful.
Fix#4546
When the identity annotation on a gateway service is updated, this change is not propagated to the mirror gateway endpoints object.
This is because the annotations are updated on the wrong object and the changes are lost.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add namespace global flag to hold default namespace name (#4469)
Signed-off-by: Matei David <matei.david.35@gmail.com>
* Change name of controlplane install namespace constant and init point for kubeNamespace
Signed-off-by: Matei David <matei.david.35@gmail.com>
Problem
When updating / writing tests with complex data, e.g the certificates, the build-in diff is not as powerful as dedicated external tool.
Solution
Dump all resource specifications created as part of failing tests to a supplied folder for external analysis.
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
Followup to #4522
This removes the `controlPlaneInstalled` var in `bin/install_test.go`
that flagged whether the control plane was already present in the series
of tests, whose intention was to avoid fetching the logs/events when the CP wasn't yet
there. That was done under the assumption `TestMain()` would feed that
flag to the runner for each individual test function, but it turns out
`TestMain()` only runs once per test file, and so
`controlPlaneInstalled` remained with its initial value `false`.
So now logs/events are fetched always, even if the control plane is not
there. If the CP is absent and we try fetching, we only see a `didn't
find any client-go entries` message.
Fixes#4454
As explained
[here](https://github.com/kubernetes/kubernetes/issues/36222#issuecomment-553966166),
trailing spaces in configmap data makes it to look funky when retrieved
later on. This is currently affecting `linkerd-config-addons` and
`linkerd-gateway-config`:
```
$ k -n linkerd-multicluster get cm linkerd-gateway-config -oyaml
apiVersion: v1
data:
nginx.conf: "events {\n}\nstream { \n
\ server { \n
\ listen 4180; \n
\ proxy_pass 127.0.0.1:4140; \n
\ } \n}
\nhttp {\n server {\n listen 4181;\n location /health {\n access_log
off;\n return 200 \"healthy\\n\";\n }\n }\n server {\n listen
\ 8888;\n location /health-local {\n access_log off;\n return
200 \"healthy\\n\";\n }\n } \n}"
kind: ConfigMap
```
AFAIK this is only cosmetic and doesn't affect functionality.
## edge-20.6.1
This edge release is a release candidate for `stable-2.8`! It introduces several
improvements and fixes for multicluster support.
* CLI
* Added multicluster daisy chain checks to `linkerd check`
* Added list of successful gatways in multicluster checks section of `linkerd
check`
* Controller
* Renamed multicluster gateway ports to `mc-gateway` and `mc-probe`
* Fixed Service Profiles routes for `linkerd-prometheus`
* Internal
* Fixed array handling in the `bin/fmt` script
* Improved error reporting for scripts in failed CI runs
* Improved logs and event reporting in CI for all integration test failures
* Fixed `uname` flags for Darwin in the `bin/lint` script
* Fixed shellcheck errors in all `bin/` scripts (thanks @joakimr-axis!)
* Helm
* Added support for `linkerd mc allow`
* Added ability to disable secret rescources for self-signed certs (thanks
@cypherfox!)
* Proxy
* Modified the `linkerd-gateway` component to use the inbound proxy, rather
than nginx, for gateway; this allows Linkerd to detect loops and propogate
identity
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Our stretch images contain some libraries/utilities with CVEs. While we
can't yet upgrade all containers (see #3486), we can upgrade the proxy
image (which is the most widely deployed).
Change terminology from local/remote to source/target in events and metrics.
This does not change any variable, function, struct, or field names since
testing is still improving
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes#4531
This PR updates the `linkerd-gateway` cm's name to be templated. To allow multiple Gateway installations in the same cluster with different configmaps.
(Installing multiple gateways in the same cluster is possible only through Helm, as the CLI dosen't expose those commands currently.)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Fixes#4305
Fixed SP route for `POST /api/v1/query`:
```
$ bin/linkerd routes -n linkerd deploy/linkerd-prometheus
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET /api/v1/query_range linkerd-prometheus 100.00% 3.9rps 1ms 2ms 2ms
GET /api/v1/series linkerd-prometheus 100.00% 1.1rps 1ms 1ms 1ms
POST /api/v1/query linkerd-prometheus 100.00% 3.1rps 1ms 17ms 19ms
[DEFAULT] linkerd-prometheus - - - - -
```
Also added one missing route for `linkerd-grafana`, realizing afterwards there are
many other ones missing, but not really worth adding them all.
I also removed the routes in `linkerd-controller` for the tap routes
given that's no longer handled in that service.
And the tap service SP was also removed alltogether since nothing was
getting reported.
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, for gateway. This allows us to detect loops and
propagate identity through the gateway.
This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.
---
* proxy: v2.99.0
The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.
---
* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
* inject: Support config.linkerd.io/enable-gateway
This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targetting the inbound listener
through the outbound proxy.
This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).
* proxy: v2.100.0
This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.
---
* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
- Add quotes where missing, to handle whitespace & c:o.
- Use single quotes for non-expansion strings.
- Fix quotes were the current would cause errors.
Signed-off-by: Joakim Roubert <joakim.roubert@axis.com>
The version of `uname` on Darwin doesn't support the `-o` flag, resulting in an error message when running the `bin/lint` script.
We add an if-branch to short-circuit the `uname-o` call if running on Darwin.
Signed-off-by: Alex Leong <alex@buoyant.io>
Change terminology from local/remote to source/target in service-mirror and
healthchecks help text.
This does not change any variable, function, struct, or field names since
testing is still improving
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Change terminology from local/remote to source/target in `multicluster` CLI help
text.
This does not change any variable, function, struct, or field names since
testing is still improving.
Relevant issue: #4480
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Fetch logs/events when integration test fails, not only for install tests
## Motivation
Mainly to know what caused containers to not start (or to restart), like in #4285
## Implementation
Followup to #4410, where we fetched unexpected logs/events when a test failed in `test/install_test.go`; now we're expanding that behavior to every integration test.
For that, we replace in each `TestMain()`:
```go
os.Exit(m.Run())
```
with
```go
os.Exit(testutil.Run(m, TestHelper, true))
```
where `testutil.Run()` executes the tests and fetches the logs/events if the tests failed.
Also extracted the log/event fetching and matching into its own separate file.
* Appease linter
* For external_issuer_integration_tests controlPlaninstalled wasn't being set
There are a few notable things happening in this PR:
- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways beeing created and to probe them. This means that probes are initiated for all gateways no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Failures in `bin/_test-run` from commands different than `go test`
aren't currently properly reported, in part because CI's bash default is
to have `set -e` which terminates the script and just outputs
`##[error]Process completed with exit code 2.` like
[here](https://github.com/linkerd/linkerd2/pull/4496/checks?check_run_id=720720352#step:14:116)
```
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
× no unschedulable pods
linkerd-controller-6c77c7ffb8-w8wh5: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-destination-6767d88f7f-rcnbq: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-grafana-76c76fcfb9-pdhfb: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-identity-5bcf97d6c8-q6rll: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-prometheus-6b95c56b44-hd9m6: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-proxy-injector-58d794ff9-jf7cj: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-sp-validator-6c5f999bfb-qg252: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-tap-6fdf84fc65-6txvr: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-web-8484fbd867-nm8z2: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints
Status check results are ×
[error]Process completed with exit code 2.
```
I've made the following changes to `bin/_test-run` to generate better
messages and Github annotations when an error occurs:
- Unset `set -e` so that errors don't immediately exit the script and
don't allow us to properly format the errors.
- Removed many of the `exit_on_err` calls after go test calls because
those output enough information already (they were not being used
anyways in CI because of `set -e`). And instead have `run_test` exit
upon a `go test` error.
- Added `exit_on_err` calls right after non-`go-test` commands to
properly report their failure.
- Refactored the `exit_on_err` function so that it generates a Github
error annotation upon failure.
- Removed `trap` in `install_stable`, since the OS should be able to
handle GC for stuff under `/tmp`.
Also, I've changed the exit 2 code from `linkerd check` when it fails,
to exit code 1.
Fixes#4478
We add some additional output text when the "all remote cluster gateways are alive" check succeeds to list the gateways that have been detected as alive. In order to do this, we have added an `VerboseSuccess` error type. Even though this type implements the `error` interface, it represents a success which contains additional information to be printed.
Sample output when dead gateways are detected:
```
[...]
√ service mirror controller can access remote clusters
× all remote cluster gateways are alive
Some gateways are not alive:
* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]
see https://linkerd.io/checks/#l5d-multicluster-remote-gateways-alive for hints
√ clusters share trust anchors
```
Sample output when all gateways are alive:
```
[...]
√ service mirror controller can access remote clusters
√ all remote cluster gateways are alive
* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]
√ clusters share trust anchors
```
Signed-off-by: Alex Leong <alex@buoyant.io>
For the Edge-20.5.6 release notes: Mention under the Helm section that the user might wanna manually remove the `nginx-configuration` configmap that is left over after this upgrade.
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
A mirror-service is one that has been created by the mirror service controller and resolves to a gateway in another cluster. If a mirror service is exported (and thus mirrored into another cluster) this creates a "daisy chain" where requests can come in to the cluster through the local gateway and be immediately sent out of the cluster to a remote gateway. If the remote gateway is in the source cluster, this can create an infinite loop.
Similarly, if an exported service routes to a mirror service by a traffic split, the same daisy chain effect occurs.
One example where this can come up is with multicluster fail-over. If both clusters simultaneously fail-over even a portion of their traffic, a loop is created.
We add a check that detects either of the above conditions and warns of the existence of a daisy chain.
Signed-off-by: Alex Leong <alex@buoyant.io>
Quoting the list of directories passed to `goimports` was causing the list to be interpreted as a single argument which was stopping `bin/fmt` from working.
Instead, use `read` to split the list of directories into an array.
Also fix up incorrect formatting that has crept in while `bin/fmt` has been broken.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds a `allow` and `link` commands, effectivelly enabling a cluster to have more than one set of credentials that allow it to be mirrored.
Fx #4461
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
In #4436 `head_root_tag()` was changed to replace `sed` with a
bash-native substitution. This assumes bash is our shell, which is the
case in `bin/_tag.sh` but not in `bin/root-tag` which calls it, and
which has a `sh` shebang that in Ubuntu points to dash instead of bash,
which breaks with the new bash-native substitution. Ergo, I'm
expliciting the bash shebang in this file.
This is @psinghal20's changes in #4462 which is currently failing CI.
Fixes#4456
Description from the original PR:
> This pr renames the `cluster` command in CLI to `multicluster` command. It
> also adds a shorthand `mc` for easy use.
>
> Fixes#4456
>
> Signed-off-by: psinghal20 <psinghal20@gmail.com>
The CI failure doesn't seem to be related to this change, but has only been seen
on forks. Opening this from a non-fork for now to continue investigating.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: psinghal20 <psinghal20@gmail.com>
* When installation test fails, fetch logs and events
Re #4371
When a test fails in `./test/install_test.go`, trigger the `TestLogs`
and `TestEvents` tests in a separate process in order to output any
unexpected logs/events that might have caused the initial test failure.
For instance, currently we're sporadically experiencing pod restarts.
Instead of ignoring them, this might help provide us with the real
underlying cause.
Added comments to document several methods and strucs on cmd package. Based on GoDoc guidelines. Focus on alpha cli command
Signed-off-by: arthursens <arthursens2005@gmail.com>
In some ingress setups, the proxy could be tricked into looping requests
through the outbound proxy. We now detect these loops and fail these
requests with a 502, saving your precious CPU.
---
* outbound: Prevent loops (linkerd/linkerd2-proxy#525)
* CLI
* Fixed the display of the meshed pod column for non-selector services in
`linkerd stat` output
* Added an `addon-overwrite` upgrade flag which allows users to overwrite the
existing addon config rather than merging into it
* Added a `--close-wait-timeout` inject flag which sets the
`nf_conntrack_tcp_timeout_close_wait` property which can be used to mitigate
connection issues with application that hold half-closed sockets
* Controller
* Restricted the service-mirror's RBAC permissions so that it no longer is
able to read secrets in all namespaces
* Moved many multicluster components into the `linkerd-multicluster` namespace
by default
* Added multicluster gateway mirror services to allow multicluster liveness
probes to work in private networks
* Fixed an issue where multicluster gateway mirror services could be
incorrectly deleted during a resync
* Internal
* Fixed many style issues in build scripts (thanks @joakimr-axis!)
* Helm
* Added `global.grafanaUrl` variable to allow using an existing Grafana
installation
Signed-off-by: Alex Leong <alex@buoyant.io>