In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.
In order to avoid situations like that this PR introduces a way to modify the set of networks for which the proxy shall do IP based discovery. The change introduces a helm flag `.Values.global.proxy.destinationGetNetworks` that can be used to modify this value. There are two ways a user can affect the this setting:
- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using an annotation ` config.linkerd.io/proxy-destination-get-networks` for injected workloads to override this value
Note that this setting cannot be tweaked through the `install` or `inject` command
Fix: #4585
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes#4606
This has not worked as far back as stable-2.6.0.
## Solution
The recommended upgrade process is to include `--prune` as part of `kubectl
apply ..`:
```bash
$ linkerd upgrade | kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
```
This is an issue for multi-stage upgrade because `linkerd upgrade config` does
not include the `linkerd-config` ConfigMap in it's output.
`kubectl apply --prune ..` will then prune this resource because it matches the
label selector *and* is not in the above output.
The issue occurs when `linkerd upgrade control-plane` is run and expects to find
the ConfigMap that was just pruned.
This can be fixed by not suggesting to prune resources as part of the
multi-stage upgrade.
## Considered
Including `templates/config.yaml` in the install output regardless of the stage.
Instead of it being a template only used in `control-plane` stage in
[render](4aa3ca7f87/cli/cmd/install.go (L873-L886)), it could always be rendered.
This just exposes other things that are pruned in the process:
```bash
❯ bin/linkerd upgrade control-plane |kubectl apply --prune -l linkerd.io/control-plane-ns=linkerd -f -
× Failed to build upgrade configuration: secrets "linkerd-identity-issuer" not found
For troubleshooting help, visit: https://linkerd.io/upgrade/#troubleshooting
error: no objects passed to apply
```
Ultimately, resources part of the `control-plane` stage need to remain and that
will not happen if we prune all resources not in the `config` stage output
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
As reported in #4259 linkerd check run from linkerd's web cconsole is
broken as the underlying RBAC Role cannot access the apiregistration.k8s.io API Group.
With this commit the RBAC Role is fixed allowing read-only access to the API Group
apiregistration.k8s.io.
Fixes#4259
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
Put back space after `grafanaUrl` label in `linkerd-config-addons.yaml`
to avoid breaking the yaml parsing.
```
$ linkerd check
...
linkerd-addons
--------------
‼ 'linkerd-config-addons' config map exists
could not unmarshal linkerd-config-addons config-map: error
unmarshaling JSON: while decoding JSON: json: cannot unmarshal
string into Go struct field Values.global of type linkerd2.Global
```
This was added in #4544 to avoid having the configmap being badly formatted.
So this PR fixes the yaml, but then if we don't set `grafanaUrl` the
configmap format gets messed up, but apparently that's just a cosmetic
problem:
```
apiVersion: v1
data:
values: "global:\n grafanaUrl: \ngrafana:\n enabled: true\n
image:\n name:
gcr.io/linkerd-io/grafana\n name: linkerd-grafana\n resources:\n
cpu:\n limit:
240m\n memory:\n limit: null\ntracing:\n enabled:
false"
kind: ConfigMap
```
Fixes#4541
This PR adds the following checks
- if a mirrored service has endpoints. (This includes gateway mirrors too).
- if an exported service is referencing a gateway that does not exist.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alex Leong <alex@buoyant.io>
* Add namespace global flag to hold default namespace name (#4469)
Signed-off-by: Matei David <matei.david.35@gmail.com>
* Change name of controlplane install namespace constant and init point for kubeNamespace
Signed-off-by: Matei David <matei.david.35@gmail.com>
Problem
When updating / writing tests with complex data, e.g the certificates, the build-in diff is not as powerful as dedicated external tool.
Solution
Dump all resource specifications created as part of failing tests to a supplied folder for external analysis.
Signed-off-by: Lutz Behnke <lutz.behnke@finleap.com>
Fixes#4454
As explained
[here](https://github.com/kubernetes/kubernetes/issues/36222#issuecomment-553966166),
trailing spaces in configmap data makes it to look funky when retrieved
later on. This is currently affecting `linkerd-config-addons` and
`linkerd-gateway-config`:
```
$ k -n linkerd-multicluster get cm linkerd-gateway-config -oyaml
apiVersion: v1
data:
nginx.conf: "events {\n}\nstream { \n
\ server { \n
\ listen 4180; \n
\ proxy_pass 127.0.0.1:4140; \n
\ } \n}
\nhttp {\n server {\n listen 4181;\n location /health {\n access_log
off;\n return 200 \"healthy\\n\";\n }\n }\n server {\n listen
\ 8888;\n location /health-local {\n access_log off;\n return
200 \"healthy\\n\";\n }\n } \n}"
kind: ConfigMap
```
AFAIK this is only cosmetic and doesn't affect functionality.
* Fixes#4305
Fixed SP route for `POST /api/v1/query`:
```
$ bin/linkerd routes -n linkerd deploy/linkerd-prometheus
ROUTE SERVICE SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
GET /api/v1/query_range linkerd-prometheus 100.00% 3.9rps 1ms 2ms 2ms
GET /api/v1/series linkerd-prometheus 100.00% 1.1rps 1ms 1ms 1ms
POST /api/v1/query linkerd-prometheus 100.00% 3.1rps 1ms 17ms 19ms
[DEFAULT] linkerd-prometheus - - - - -
```
Also added one missing route for `linkerd-grafana`, realizing afterwards there are
many other ones missing, but not really worth adding them all.
I also removed the routes in `linkerd-controller` for the tap routes
given that's no longer handled in that service.
And the tap service SP was also removed alltogether since nothing was
getting reported.
Change terminology from local/remote to source/target in `multicluster` CLI help
text.
This does not change any variable, function, struct, or field names since
testing is still improving.
Relevant issue: #4480
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
There are a few notable things happening in this PR:
- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways beeing created and to probe them. This means that probes are initiated for all gateways no matter whether there are mirrored services being paired
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Failures in `bin/_test-run` from commands different than `go test`
aren't currently properly reported, in part because CI's bash default is
to have `set -e` which terminates the script and just outputs
`##[error]Process completed with exit code 2.` like
[here](https://github.com/linkerd/linkerd2/pull/4496/checks?check_run_id=720720352#step:14:116)
```
linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
× no unschedulable pods
linkerd-controller-6c77c7ffb8-w8wh5: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-destination-6767d88f7f-rcnbq: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-grafana-76c76fcfb9-pdhfb: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-identity-5bcf97d6c8-q6rll: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-prometheus-6b95c56b44-hd9m6: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-proxy-injector-58d794ff9-jf7cj: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-sp-validator-6c5f999bfb-qg252: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-tap-6fdf84fc65-6txvr: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
linkerd-web-8484fbd867-nm8z2: 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
see https://linkerd.io/checks/#l5d-existence-unschedulable-pods for hints
Status check results are ×
[error]Process completed with exit code 2.
```
I've made the following changes to `bin/_test-run` to generate better
messages and Github annotations when an error occurs:
- Unset `set -e` so that errors don't immediately exit the script and
don't allow us to properly format the errors.
- Removed many of the `exit_on_err` calls after go test calls because
those output enough information already (they were not being used
anyways in CI because of `set -e`). And instead have `run_test` exit
upon a `go test` error.
- Added `exit_on_err` calls right after non-`go-test` commands to
properly report their failure.
- Refactored the `exit_on_err` function so that it generates a Github
error annotation upon failure.
- Removed `trap` in `install_stable`, since the OS should be able to
handle GC for stuff under `/tmp`.
Also, I've changed the exit 2 code from `linkerd check` when it fails,
to exit code 1.
Quoting the list of directories passed to `goimports` was causing the list to be interpreted as a single argument which was stopping `bin/fmt` from working.
Instead, use `read` to split the list of directories into an array.
Also fix up incorrect formatting that has crept in while `bin/fmt` has been broken.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds a `allow` and `link` commands, effectivelly enabling a cluster to have more than one set of credentials that allow it to be mirrored.
Fx #4461
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
This is @psinghal20's changes in #4462 which is currently failing CI.
Fixes#4456
Description from the original PR:
> This pr renames the `cluster` command in CLI to `multicluster` command. It
> also adds a shorthand `mc` for easy use.
>
> Fixes#4456
>
> Signed-off-by: psinghal20 <psinghal20@gmail.com>
The CI failure doesn't seem to be related to this change, but has only been seen
on forks. Opening this from a non-fork for now to continue investigating.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: psinghal20 <psinghal20@gmail.com>
Added comments to document several methods and strucs on cmd package. Based on GoDoc guidelines. Focus on alpha cli command
Signed-off-by: arthursens <arthursens2005@gmail.com>
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10Fixes#4276
We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`.
Signed-off-by: Alex Leong <alex@buoyant.io>
When viewing the output of `linkerd stat` for services which do not have a selector (such as services created by the service-mirror, for example) the meshed count column shows the total number which exist, even though the service actually selects no pods at all.
We update the StatSummary implementation to account for services which have no selector.
Additionally, we update the logic of the `--unmeshed` flag. When the `--unmeshed` flag is not set, we typically skip rows for unmeshed resources because those resources would have no stats. This is not appropriate to do when the `--from` flag is also set because in this case, metrics are not collected on the target resource but are instead collected on the client-side. This means that stats can be present, even for unmeshed resources and these resources should still be displayed, even if the `--unmeshed` flag is not set.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.
Fix#4411
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR introduces a few changes that were requested after a bit of service mirror reviewing.
- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in
- we unify the namespace namings so all multicluster resources are installedi n `linkerd-multicluster` on both clusters
- fixed checks to account for changes
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Certain install flags are intended to help with Linkerd development and generally are not useful (and are potentially confusing) to users.
We hide these flags in release (edge or stable) builds of the CLI but show them in all other builds. The list of affected flags is:
* control-plane-version
* proxy-image
* proxy-version
* image-pull-policy
* init-image
* init-image-version
Signed-off-by: Alex Leong <alex@buoyant.io>
When using cli commands that work on namespaced resources in the cluster, the default namespace used by the cli is hardcoded to the default Kubernetes namespace (i.e 'default'). This update will allow cli commands that operate on namespaced resources to automatically infer what the name of the default namespace is, by taking the relevant default from the currently used Kubeconfig context. In short, this allows the omission of the -n flag in commands such as linkerd metrics, when working with resources that belong to a namespace that is set as default in the currently active context.
Validation was done manually by setting the default namespace of the currently used context, as well as through two integration tests that target the tap and get command respectively.
Signed-off-by: Matei David <matei.david.35@gmail.com>
This allows end user flexibility for options such as log format. Rather than bubbling up such possible config options into helm values, extra arguments provides more flexibility.
Add prometheusAlertmanagers value allows configuring a list of statically targetted alertmanager instances.
Use rule configmaps for prometheus rules. They take a list of {name,subPath,configMap} values and mounts them accordingly. Provided that subpaths end with _rules.yml or _rules.yaml they should be loaded by prometheus as per prometheus.yml's rule_files content.
Signed-off-by: Naseem <naseem@transit.app>
Fixes#3807
By setting the LINKERD2_PROXY_DESTINATION_GET_NETWORKS environment variable, we configure the Linkerd proxy to do destination lookups for authorities which are IP addresses in the private network range. This allows us to get destination metadata including identity for HTTP requests which target an IP address in the cluster, Prometheus metrics scrape requests, for example.
This change allowed us to update the "direct edges" test which ensures that the edges command produces correct output for traffic which is addressed directly to a pod IP.
We also re-enabled the "linkerd stat" integration tests which had been disabled while the destination service did not yet support these types of IP queries.
Signed-off-by: Alex Leong <alex@buoyant.io>