The following command was used to find the misspellings, which were
then fixed:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
Based on the [EndpointSlice PR](https://github.com/linkerd/linkerd2/pull/4663), this is just the k8s/api support for endpointslices to shorten the first PR.
* Adds CRD
* Adds functions that check whether the cluster has EndpointSlice access
* Adds discovery & endpointslice informers to api.
Signed-off-by: Matei David <matei.david.35@gmail.com>
* feat: add log format annotation and helm value
JSON log formatting has been added via https://github.com/linkerd/linkerd2-proxy/pull/500
but wiring the option through as an annotation/helm value is still
necessary.
This PR adds the annotation and helm value to configure log format.
Closes #2491
Signed-off-by: Naseem <naseem@transit.app>
Currently `linkerd check` appears to hang on HA installations where there are pods that are unschedulable. In reality it is just waiting on a condition that might never become true, without showing any useful information (i.e. which pods are not scheduled). This change sets `surfaceErrorOnRetry: true` so the user gets feedback on which conditions are not yet met, instead of simply being shown that the check is waiting to complete.
Fix #4680
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Currently commands that need access to the public API execute the `LinkerdControlPlaneExistenceChecks`. This set of checks includes one that specifically checks that there are no unschedulable pods. In fact, commands like `stat` and `edge` do not need that requirement to be met.
This change relaxes this by making the "no unschedulable pods" check warning-only. Fixes #3940
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Data disappears upon Prometheus restarts because it is all held in memory.
Adding an option to enable persistence by means of a PVC would be the right approach. It is commonly seen in a wide array of Helm charts.
Fixes #4576
Signed-off-by: Naseem <naseem@transit.app>
Regenerated protobuf files, using version 1.4.2 that was upgraded from
1.3.2 with the proxy-api update in #4614.
As of v1.4, protobuf messages must not be copied (because they
hold a mutex), so whenever a message is passed to or returned from a
function we need to use a pointer.
This affects _mostly_ test files.
This is required to unblock #4620 which is adding a field to the config
protobuf.
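The pointer requirement can be illustrated without protobuf at all. The sketch below uses a hypothetical `Config` struct holding a `sync.Mutex` as a stand-in for a generated message (the struct and both helper names are assumptions, not code from this repo); `go vet` flags by-value copies of such types, so the helpers take and return pointers.

```go
package main

import (
	"fmt"
	"sync"
)

// Config is a hypothetical stand-in for a generated protobuf message.
// Like such messages, it holds a mutex and so must not be copied.
type Config struct {
	mu      sync.Mutex
	Version string
}

// setVersion takes a pointer, so the lock and the data are shared with the
// caller instead of being copied.
func setVersion(c *Config, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.Version = v
}

// newConfig returns a pointer rather than a value for the same reason.
func newConfig(v string) *Config {
	return &Config{Version: v}
}

func main() {
	c := newConfig("1.3.2")
	setVersion(c, "1.4.2")
	fmt.Println(c.Version)
}
```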
* Update inject to error out on failure
Update the injection process to return an error when the failure is caused by sidecar, udp, automountServiceAccountToken or hostNetwork
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
Tools like cert-manager might encode private keys in PKCS8 format instead of PKCS1
in which case linkerd would fail as it cannot parse PKCS8 encoded private keys.
With this commit support for parsing PKCS8 encoded private keys is added to linkerd,
allowing it to read ECDSA and RSA keys encoded in PKCS8.
Unit tests have been added to test the private key parsing.
This commit addresses https://github.com/jetstack/cert-manager/issues/2942.
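As a rough illustration of the parsing approach (not the actual Linkerd code), the sketch below dispatches on the PEM block type and uses the standard library's `x509.ParsePKCS8PrivateKey` for PKCS8 input. The helper names `parsePrivateKey` and `pkcs8ECDSAPEM` are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// parsePrivateKey is a hypothetical helper: it decodes one PEM block and
// accepts PKCS1 (RSA), SEC1 (EC) and PKCS8 (RSA or ECDSA) encodings.
func parsePrivateKey(pemBytes []byte) (interface{}, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, errors.New("no PEM block found")
	}
	switch block.Type {
	case "RSA PRIVATE KEY":
		return x509.ParsePKCS1PrivateKey(block.Bytes)
	case "EC PRIVATE KEY":
		return x509.ParseECPrivateKey(block.Bytes)
	case "PRIVATE KEY":
		key, err := x509.ParsePKCS8PrivateKey(block.Bytes)
		if err != nil {
			return nil, err
		}
		// PKCS8 can wrap several key types; only accept the two named above.
		switch key.(type) {
		case *rsa.PrivateKey, *ecdsa.PrivateKey:
			return key, nil
		default:
			return nil, fmt.Errorf("unsupported PKCS8 key type %T", key)
		}
	default:
		return nil, fmt.Errorf("unsupported PEM type %q", block.Type)
	}
}

// pkcs8ECDSAPEM generates a throwaway P-256 key and encodes it as PKCS8 PEM.
func pkcs8ECDSAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

func main() {
	pemBytes, err := pkcs8ECDSAPEM()
	if err != nil {
		panic(err)
	}
	key, err := parsePrivateKey(pemBytes)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T\n", key)
}
```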
Signed-off-by: Alexander Berger <alex.berger@nexxiot.com>
Signed-off-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
Co-authored-by: alex.berger@nexiot.ch <alex.berger@nexiot.ch>
This PR makes the "service mirror controller is running" check retry on failure. This brings the check in line with the rest of the checks that verify that certain Linkerd components are running. It is especially useful in integration tests, where we want to wait a certain amount of time for the service mirror component to be initialized before failing the `linkerd check` command.
Fix #4642
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
In #4585 we are observing an issue where a loop is encountered when using nginx ingress. The problem is that the outbound proxy does a dst lookup on the IP address which happens to be the very same address the ingress is listening on.
In order to avoid situations like that, this PR introduces a way to modify the set of networks for which the proxy does IP-based discovery. The change introduces a Helm value, `.Values.global.proxy.destinationGetNetworks`, that can be used to modify this setting. There are two ways a user can affect it:
- setting the `destinationGetNetworks` field in values during a Helm install, which changes the default on all injected pods
- using the annotation `config.linkerd.io/proxy-destination-get-networks` on injected workloads to override this value
Note that this setting cannot be tweaked through the `install` or `inject` commands.
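To make the idea concrete, here is a minimal sketch of deciding whether an address qualifies for IP-based discovery. It assumes the value is a comma-separated list of CIDRs, which this description does not state; `parseNetworks` and `shouldDiscover` are hypothetical names, and the real decision is made inside the proxy.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseNetworks parses a comma-separated CIDR list (the assumed format of
// the destinationGetNetworks value).
func parseNetworks(list string) ([]*net.IPNet, error) {
	var nets []*net.IPNet
	for _, s := range strings.Split(list, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		_, n, err := net.ParseCIDR(s)
		if err != nil {
			return nil, fmt.Errorf("invalid network %q: %w", s, err)
		}
		nets = append(nets, n)
	}
	return nets, nil
}

// shouldDiscover reports whether addr falls inside any configured network.
func shouldDiscover(nets []*net.IPNet, addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, n := range nets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	nets, err := parseNetworks("10.0.0.0/8,172.16.0.0/12,192.168.0.0/16")
	if err != nil {
		panic(err)
	}
	fmt.Println(shouldDiscover(nets, "10.1.2.3"))
	fmt.Println(shouldDiscover(nets, "8.8.8.8"))
}
```

Narrowing the list so that it excludes the address the ingress listens on would keep the outbound proxy from doing a lookup on that address.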
Fix: #4585
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes #4541
This PR adds the following checks:
- if a mirrored service has endpoints. (This includes gateway mirrors too).
- if an exported service is referencing a gateway that does not exist.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Alex Leong <alex@buoyant.io>
This change modifies the linkerd-gateway component to use the inbound
proxy, rather than nginx, for gateway. This allows us to detect loops and
propagate identity through the gateway.
This change also cleans up port naming to `mc-gateway` and `mc-probe`
to resolve conflicts with Kubernetes validation.
---
* proxy: v2.99.0
The proxy can now operate as gateway, routing requests from its inbound
proxy to the outbound proxy, without passing the requests to a local
application. This supports Linkerd's multicluster feature by adding a
`Forwarded` header to propagate the original client identity and assist
in loop detection.
---
* Add loop detection to inbound & TCP forwarding (linkerd/linkerd2-proxy#527)
* Test loop detection (linkerd/linkerd2-proxy#532)
* fallback: Unwrap errors recursively (linkerd/linkerd2-proxy#534)
* app: Split inbound/outbound constructors into components (linkerd/linkerd2-proxy#533)
* Introduce a gateway between inbound and outbound (linkerd/linkerd2-proxy#540)
* gateway: Add a Forwarded header (linkerd/linkerd2-proxy#544)
* gateway: Return errors instead of responses (linkerd/linkerd2-proxy#547)
* Fail requests that loop through the gateway (linkerd/linkerd2-proxy#545)
* inject: Support config.linkerd.io/enable-gateway
This change introduces a new annotation,
config.linkerd.io/enable-gateway, that, when set, enables the proxy to
act as a gateway, routing all traffic targeting the inbound listener
through the outbound proxy.
This also removes the nginx default listener and gateway port of 4180,
instead using 4143 (the inbound port).
* proxy: v2.100.0
This change modifies the inbound gateway caching so that requests may be
routed to multiple leaves of a traffic split.
---
* inbound: Do not cache gateway services (linkerd/linkerd2-proxy#549)
Change terminology from local/remote to source/target in service-mirror and
healthchecks help text.
This does not change any variable, function, struct, or field names since
testing is still improving.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
There are a few notable things happening in this PR:
- the probe manager has been decoupled from the cluster_watcher. Now its only responsibility is to watch for mirrored gateways being created and to probe them. This means that probes are initiated for all gateways, regardless of whether any mirrored services are paired with them
- the number of paired services is derived from the existing services in the cluster rather than being published as a metric by the prober
- there are no events being exchanged between the cluster watcher and the probe manager
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes #4478
We add some additional output text when the "all remote cluster gateways are alive" check succeeds to list the gateways that have been detected as alive. In order to do this, we have added a `VerboseSuccess` error type. Even though this type implements the `error` interface, it represents a success which contains additional information to be printed.
Sample output when dead gateways are detected:
```
[...]
√ service mirror controller can access remote clusters
× all remote cluster gateways are alive
Some gateways are not alive:
* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]
see https://linkerd.io/checks/#l5d-multicluster-remote-gateways-alive for hints
√ clusters share trust anchors
```
Sample output when all gateways are alive:
```
[...]
√ service mirror controller can access remote clusters
√ all remote cluster gateways are alive
* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]
√ clusters share trust anchors
```
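For illustration, a success-carrying error type along these lines might look roughly like the sketch below. Only the `VerboseSuccess` name comes from this change; the `Message` field and the `report` printer are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// VerboseSuccess implements error but represents a passing check that
// carries extra text to print. The Message field is an assumed layout.
type VerboseSuccess struct {
	Message string
}

func (e *VerboseSuccess) Error() string { return e.Message }

// report is a hypothetical printer: a VerboseSuccess is rendered as a pass
// with details, any other non-nil error as a failure.
func report(description string, err error) string {
	var vs *VerboseSuccess
	switch {
	case err == nil:
		return "√ " + description
	case errors.As(err, &vs):
		return "√ " + description + "\n" + vs.Message
	default:
		return "× " + description + "\n" + err.Error()
	}
}

func main() {
	alive := &VerboseSuccess{Message: "* cluster: [gke], gateway: [linkerd-multicluster/linkerd-gateway]"}
	fmt.Println(report("all remote cluster gateways are alive", alive))
	fmt.Println(report("all remote cluster gateways are alive", errors.New("Some gateways are not alive")))
}
```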
Signed-off-by: Alex Leong <alex@buoyant.io>
A mirror-service is one that has been created by the mirror service controller and resolves to a gateway in another cluster. If a mirror service is exported (and thus mirrored into another cluster) this creates a "daisy chain" where requests can come in to the cluster through the local gateway and be immediately sent out of the cluster to a remote gateway. If the remote gateway is in the source cluster, this can create an infinite loop.
Similarly, if an exported service routes to a mirror service by a traffic split, the same daisy chain effect occurs.
One example where this can come up is with multicluster fail-over. If both clusters simultaneously fail-over even a portion of their traffic, a loop is created.
We add a check that detects either of the above conditions and warns of the existence of a daisy chain.
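The two conditions can be sketched as a simple scan over services. The `Service` struct and its fields below are hypothetical simplifications for the example; the real check reads Kubernetes services and traffic splits.

```go
package main

import "fmt"

// Service is a hypothetical, simplified view of a Kubernetes service.
type Service struct {
	Name     string
	Exported bool     // mirrored into another cluster
	Mirror   bool     // created by the mirror service controller
	Backends []string // traffic split leaves, by service name
}

// daisyChains returns the names of exported services that either are mirror
// services themselves or route to a mirror service through a traffic split.
func daisyChains(services []Service) []string {
	mirrors := map[string]bool{}
	for _, s := range services {
		if s.Mirror {
			mirrors[s.Name] = true
		}
	}
	var found []string
	for _, s := range services {
		if !s.Exported {
			continue
		}
		if s.Mirror {
			found = append(found, s.Name)
			continue
		}
		for _, b := range s.Backends {
			if mirrors[b] {
				found = append(found, s.Name)
				break
			}
		}
	}
	return found
}

// exampleServices returns made-up services covering both conditions.
func exampleServices() []Service {
	return []Service{
		{Name: "books-east", Exported: true, Mirror: true},
		{Name: "authors-east", Mirror: true},
		{Name: "authors", Exported: true, Backends: []string{"authors-local", "authors-east"}},
		{Name: "webapp", Exported: true},
	}
}

func main() {
	for _, name := range daisyChains(exampleServices()) {
		fmt.Println("daisy chain detected:", name)
	}
}
```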
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds `allow` and `link` commands, effectively enabling a cluster to have more than one set of credentials that allow it to be mirrored.
Fix #4461
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
This PR addresses two problems:
- when a resync happens (or the mirror controller is restarted) we incorrectly classify the remote gateway as a mirrored service that is not mirrored anymore and we delete it
- when updating services due to a gateway update, we need to select only the services for the particular cluster
The latter fixes #4451
Depends on https://github.com/linkerd/linkerd2-proxy-init/pull/10
Fixes #4276
We add a `--close-wait-timeout` inject flag which configures the proxy-init container to run with `privileged: true` and to set `nf_conntrack_tcp_timeout_close_wait`.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change creates a gateway proxy for every gateway. This enables the probe worker to leverage the destination service functionality in order to discover the identity of the gateway.
Fix #4411
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR introduces a few changes that were requested after a bit of service mirror reviewing.
- we restrict the RBACs so the service mirror controller cannot read secrets in all namespaces but only in the one that it is installed in
- we unify the namespace naming so all multicluster resources are installed in `linkerd-multicluster` on both clusters
- fixed checks to account for changes
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
The Linkerd control plane components' admin servers have an idle connection timeout of 10 seconds. This means that they will close connections which have been idle for 10 seconds. These components are also configured with a 10 second period for liveness checks. This introduces a race condition where connections will be idle for approximately 10 seconds between liveness checks and can idle out, potentially causing the next liveness check to fail.
We remove the idle timeout so that the connection stays alive.
Certain install flags are intended to help with Linkerd development and generally are not useful (and are potentially confusing) to users.
We hide these flags in release (edge or stable) builds of the CLI but show them in all other builds. The list of affected flags is:
* control-plane-version
* proxy-image
* proxy-version
* image-pull-policy
* init-image
* init-image-version
Signed-off-by: Alex Leong <alex@buoyant.io>
This allows end users flexibility for options such as log format. Rather than bubbling up every such config option into Helm values, extra arguments provide more flexibility.
Adding a `prometheusAlertmanagers` value allows configuring a list of statically targeted alertmanager instances.
Use rule configmaps for Prometheus rules. They take a list of {name,subPath,configMap} values and mount them accordingly. Provided that subpaths end with _rules.yml or _rules.yaml, they should be loaded by Prometheus as per prometheus.yml's rule_files content.
Signed-off-by: Naseem <naseem@transit.app>
* Go test failure message wrappers to create GH Annotations
First part of #4176
## Problem
Failures in go tests need to be properly formatted as Github annotations
so that we can fetch them through Github's API for aggregation and
analysis.
## Solution
A wrapper for error messages has been created in `testutil/annotations.go`.
The idea is that instead of throwing test failures like this:
```go
t.Fatalf("error retrieving data;\nExpected: %#v\nActual: %#v", expected,
	actual)
```
We'd throw them like this:
```go
testutil.AnnotationFatalf("error retrieving data", "error retrieving data;\nExpected: %#v\nActual: %#v", expected,
actual)
```
That will continue reporting the error as before (when using `go test`
or another test runner), but as a side-effect it will also send to
stdout something like:
```
::error file=pkg/inject_test.go,line=133::error retrieving data
```
Which becomes a GH annotation, visible in the CI run summary screen.
The first string arg is used to have the GH annotation be a generic error message
that can be aggregated and counted across multiple test runs. If `testutil.Fatalf(str, args...)`
is called instead, the original error message will be used.
Note that the output will be produced only when the env var
`GH_ANNOTATION` is set (which it will be when tests are triggered from a
Github Actions workflow).
Besides `testutil/annotations.go` and its accompanying unit test file,
other changes were made in other tests as examples, the plan being that
in a further PR _all_ the tests will use these wrappers.
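A minimal sketch of how such a wrapper could produce the annotation line is shown below. It is not the code in `testutil`; `annotation` and `annotateCaller` are hypothetical names. Note that `runtime.Caller` reports an absolute path, so a real implementation would presumably trim it to a repo-relative one.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// annotation formats a GitHub Actions error command for a source location.
func annotation(file string, line int, msg string) string {
	return fmt.Sprintf("::error file=%s,line=%d::%s", file, line, msg)
}

// annotateCaller prints an annotation for its caller's location, but only
// when the GH_ANNOTATION environment variable is set.
func annotateCaller(msg string) {
	if _, set := os.LookupEnv("GH_ANNOTATION"); !set {
		return
	}
	// Caller(1) is the function that invoked annotateCaller.
	_, file, line, ok := runtime.Caller(1)
	if !ok {
		return
	}
	fmt.Println(annotation(file, line, msg))
}

func main() {
	fmt.Println(annotation("pkg/inject_test.go", 133, "error retrieving data"))
	annotateCaller("error retrieving data") // silent unless GH_ANNOTATION is set
}
```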
* Support Multi-stage install with Add-Ons
* add upgrade tests for add-ons
* add multi stage upgrade unit tests
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* use downward API to mount labels to the proxy container as a volume
* add namespace as a label to the pod
* add a trace inject test
* add downwardAPI for controlplaneTracing
* add controlPlaneTracing condition to volumeMounts
* update add-ons to have workload-ns
* add workload-ns label to control-plane components
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Upgrade Linkerd's base docker image to use go 1.14.2 in order to stay modern.
The only code change required was to update a test which was checking the error message of a `crypto/x509.CertificateInvalidError`. The error message of this error changed between go versions. We update the test to not check for the specific error string so that this test passes regardless of go version.
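One way to assert on the error type rather than on its message is `errors.As`, sketched below. This is an illustration under that assumption, not necessarily how the updated test is written; `isCertificateInvalid` and `sampleExpiredError` are hypothetical helpers.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
)

// isCertificateInvalid reports whether err is, or wraps, an
// x509.CertificateInvalidError, without inspecting its message.
func isCertificateInvalid(err error) bool {
	var invalid x509.CertificateInvalidError
	return errors.As(err, &invalid)
}

// sampleExpiredError builds a wrapped CertificateInvalidError for the demo.
func sampleExpiredError() error {
	return fmt.Errorf("verify: %w", x509.CertificateInvalidError{Reason: x509.Expired})
}

func main() {
	fmt.Println(isCertificateInvalid(sampleExpiredError()))
	fmt.Println(isCertificateInvalid(errors.New("some other failure")))
}
```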
Signed-off-by: Alex Leong <alex@buoyant.io>
It can be difficult to know which versions of the proxy are running in your cluster, especially when you have pods running at multiple different proxy versions.
We add two pieces of CLI functionality to assist with this:
The `linkerd check --proxy` command will now list all data plane pods which are not up-to-date rather than just printing the first one it encounters:
```
‼ data plane is up-to-date
Some data plane pods are not running the current version:
* default/books-84958fff5-95j75 (git-ca760bdd)
* default/authors-57c6dc9b47-djldq (git-ca760bdd)
* default/traffic-85f58ccb66-vxr49 (git-ca760bdd)
* default/release-name-smi-metrics-899c68958-5ctpz (git-ca760bdd)
* default/webapp-6975dc796f-2ngh4 (git-ca760bdd)
* default/webapp-6975dc796f-z4bc4 (git-ca760bdd)
* emojivoto/voting-54ffc5787d-wj6cp (git-ca760bdd)
* emojivoto/vote-bot-7b54d6999b-57srw (git-ca760bdd)
* emojivoto/emoji-5cb99f85d8-5bhvm (git-ca760bdd)
* emojivoto/web-7988674b8b-zfvvm (git-ca760bdd)
* default/webapp-6975dc796f-d2fbc (git-ca760bdd)
* default/curl (git-7f6bbc73)
see https://linkerd.io/checks/#l5d-data-plane-version for hints
```
The `linkerd version` command now supports a `--proxy` flag which will list all proxy versions running in the cluster and the number of pods running each version:
```
linkerd version --proxy
Client version: dev-7b9d475f-alex
Server version: edge-20.4.1
Proxy versions:
edge-20.4.1 (10 pods)
git-ca760bdd (11 pods)
git-7f6bbc73 (1 pods)
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds a `--smi-metrics` install flag which controls if the SMI-metrics controller and associated RBAC and APIService resources are installed. The flag defaults to false and is hidden.
We plan to remove this flag or default it to true if and when the SMI-Metrics integration graduates from experimental.
Signed-off-by: Alex Leong <alex@buoyant.io>
Here we upgrade our dependencies on client-go to 0.17.4 and smi-sdk-go to 0.3.0. Since smi-sdk-go uses client-go 0.17.4, these upgrades must be performed simultaneously.
This also requires simultaneously upgrading our dependency on linkerd/stern to a SHA which also uses client-go 0.17.4. This keeps all of our transitive dependencies synchronized on one version of client-go.
This ALSO requires updating our codegen scripts to use the 0.17.4 version of code-generator and running it to generate 0.17.4 compatible generated code. I took this opportunity to update our code generation script to properly use the version of code-generator from `go.mod` rather than a hardcoded SHA.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Handle automountServiceAccountToken
Return error during inject if pod spec has `automountServiceAccountToken: false`
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
Followup to #4193
This is to verify that the list of SA installed, as well as the list of
SA in the linkerd-psp RoleBinding match the list of expected SA defined
in `healthcheck.go`.
Fixes #3943
The Linkerd clock skew check requires that all nodes in the cluster have reported a heartbeat within (approximately) the last minute. However, in Kubernetes 1.17, the default heartbeat interval is 5 minutes. This means that the clock skew check will often fail in Kubernetes 1.17 clusters.
We relax the check to only require that heartbeats have been detected in the past 5 minutes, matching the default heartbeat interval in Kubernetes 1.17. We also switch this check to be a warning so that clusters which are configured with longer heartbeat intervals don't see this as a fatal error.
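The relaxed comparison amounts to the following sketch, where the threshold is a parameter. The function names and node data are made up for the example; the real check reads heartbeat times from node conditions.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// staleNodes returns the sorted names of nodes whose last heartbeat is
// older than maxAge relative to now.
func staleNodes(heartbeats map[string]time.Time, now time.Time, maxAge time.Duration) []string {
	var stale []string
	for node, last := range heartbeats {
		if now.Sub(last) > maxAge {
			stale = append(stale, node)
		}
	}
	sort.Strings(stale)
	return stale
}

// exampleStale evaluates three made-up nodes against a threshold in minutes.
func exampleStale(minutes int) []string {
	now := time.Date(2020, 3, 1, 12, 0, 0, 0, time.UTC)
	heartbeats := map[string]time.Time{
		"node-a": now.Add(-30 * time.Second),
		"node-b": now.Add(-4 * time.Minute),
		"node-c": now.Add(-6 * time.Minute),
	}
	return staleNodes(heartbeats, now, time.Duration(minutes)*time.Minute)
}

func main() {
	fmt.Println("1 minute threshold:", exampleStale(1))
	fmt.Println("5 minute threshold:", exampleStale(5))
}
```

With the one minute threshold, a healthy node that heartbeats every 5 minutes (`node-b` here) is flagged; with the 5 minute threshold it is not.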
Signed-off-by: Alex Leong <alex@buoyant.io>
* Add missing SAs to linkerd check
This adds the service accounts `linkerd-destination` and
`linkerd-smi-metrics` that were missing from the "control plane
ServiceAccounts exist" check.
When injecting a Cronjob with no
`spec.jobTemplate.spec.template.metadata` we were getting the following
error:
```
Error transforming resources: jsonpatch add operation does not apply:
doc is missing path:
"/spec/jobTemplate/spec/template/metadata/annotations"
```
This only happens to Cronjobs because other workloads force having at
least a label there that is used in `spec.selector` (at least as of v1
workloads).
With this fix, if no metadata is detected, then we add it in the json patch when
injecting, prior to adding the injection annotation.
I've added a couple of new unit tests, one that verifies that this
doesn't remove metadata contents in Cronjobs that do have that metadata,
and another one that tests injection in Cronjobs that don't have
metadata (which I verified it failed prior to this fix).
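The shape of the resulting patch can be sketched as follows. This is a simplified illustration, not the injector's code: `annotationPatch` is a hypothetical helper and `linkerd.io/inject` is used only as an example key. Note the `~1` escaping that JSON Pointer requires for the `/` in the annotation name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

const metadataPath = "/spec/jobTemplate/spec/template/metadata"

// pointerEscaper escapes a key for use as a JSON Pointer path segment.
var pointerEscaper = strings.NewReplacer("~", "~0", "/", "~1")

// annotationPatch builds a patch that adds one annotation to a CronJob's pod
// template. When the template has no metadata, the metadata object (with an
// empty annotations map) is added first so that the second op has a parent.
func annotationPatch(hasMetadata bool, key, value string) ([]byte, error) {
	var ops []patchOp
	if !hasMetadata {
		ops = append(ops, patchOp{
			Op:    "add",
			Path:  metadataPath,
			Value: map[string]interface{}{"annotations": map[string]string{}},
		})
	}
	ops = append(ops, patchOp{
		Op:    "add",
		Path:  metadataPath + "/annotations/" + pointerEscaper.Replace(key),
		Value: value,
	})
	return json.Marshal(ops)
}

func main() {
	patch, err := annotationPatch(false, "linkerd.io/inject", "enabled")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```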
This version contains a fix for a bug that was rejecting all requests on clusters configured with an empty list of allowed client names. Because smi-metrics is an apiservice, this was also preventing namespaces from terminating.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Upgrade golangci-lint to v1.23.8
This should help with some timeouts we're seeing in CI.
I fixed some new warnings found in `inject.go` and `uninject.go`.
Also we now have to explicitly disable linting `/controller/gen`.
The linter was also complaining that in `/pkg/k8s/fake.go` the
`spClient.Interface` and `tsclient.Interface` returned in the function
`newFakeClientSetsFromManifests()` aren't used, but I opted to ignore
that to leave them available for future tests.
* Bump proxy-init to v1.3.2
Bumped `proxy-init` version to v1.3.2, fixing an issue with `go.mod`
(linkerd/linkerd2-proxy-init#9).
This is a non-user-facing fix.
## Motivation
Testing #4167 has revealed some `linkerd check` failures that occur only
because the checks happen too quickly after cluster creation or install. If
retried, they pass on the second time.
Some checkers already handle this with the `retryDeadline` field. If a checker
does not set this field, there is no retry.
## Solution
Add retries to the `l5d-existence-replicasets` and
`l5d-existence-unschedulable-pods` checks so that these checks do not fail
during a chained cluster creation > install > check process.
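The retry behaviour can be sketched as a loop bounded by a deadline. Only the `retryDeadline` name comes from the healthcheck code; `runWithRetry`, its signature and the `onRetry` callback are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// runWithRetry re-runs check until it passes or another attempt would run
// past retryDeadline. onRetry, if non-nil, receives each intermediate error
// so that it can be shown to the user.
func runWithRetry(check func() error, retryDeadline time.Time, interval time.Duration, onRetry func(error)) error {
	for {
		err := check()
		if err == nil {
			return nil
		}
		// Give up once waiting again would exceed the deadline.
		if time.Now().Add(interval).After(retryDeadline) {
			return err
		}
		if onRetry != nil {
			onRetry(err)
		}
		time.Sleep(interval)
	}
}

// exampleRetry runs a check that fails twice before passing and returns the
// number of attempts used.
func exampleRetry() (int, error) {
	attempts := 0
	check := func() error {
		attempts++
		if attempts < 3 {
			return errors.New("pods not yet scheduled")
		}
		return nil
	}
	err := runWithRetry(check, time.Now().Add(2*time.Second), 10*time.Millisecond, func(err error) {
		fmt.Println("retrying:", err)
	})
	return attempts, err
}

func main() {
	attempts, err := exampleRetry()
	fmt.Println(attempts, err)
}
```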
This change is in a similar vein to #4052 which provided support for
configuring service profile retries via a vendor extension of
`x-linkerd-retryable`, when generating from an openapi specification.
This change is very similar to the final version of that pull request,
and adds a timeout value based on `x-linkerd-timeout`.
At this point I believe that if the timeout is not specified, the
Linkerd default of 30s applies anyway but is not explicitly reflected
in the service profile. I'm somewhat okay with that as a current state.
A potential future improvement would be to always show the default
timeout when generating from an OpenAPI spec, to make it clear and
obvious that the timeout exists.
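As a sketch of the behaviour described above, the helper below reads the extension and reports whether it was present, so the caller can leave the route's timeout unset otherwise. It assumes the extension value is a Go-style duration string such as `"250ms"` and that the extensions arrive as a plain map; `routeTimeout` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

const timeoutExtension = "x-linkerd-timeout"

// routeTimeout reads the x-linkerd-timeout vendor extension from an
// operation's extension map. The bool is false when the extension is absent.
func routeTimeout(extensions map[string]interface{}) (time.Duration, bool, error) {
	raw, found := extensions[timeoutExtension]
	if !found {
		return 0, false, nil
	}
	s, isString := raw.(string)
	if !isString {
		return 0, false, fmt.Errorf("%s must be a string, got %T", timeoutExtension, raw)
	}
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, false, fmt.Errorf("invalid %s %q: %w", timeoutExtension, s, err)
	}
	return d, true, nil
}

func main() {
	d, ok, err := routeTimeout(map[string]interface{}{timeoutExtension: "250ms"})
	fmt.Println(d, ok, err)

	d, ok, err = routeTimeout(map[string]interface{}{})
	fmt.Println(d, ok, err)
}
```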
Signed-off-by: Lewis Cowper <lewis.cowper@googlemail.com>
More helpful error messages when the `linkerd alpha stat` command fails. For example, when the user is not authorized:
```
> linkerd alpha stat deploy/web -n emojivoto --as obama@buoyant.io
Error: deployments.metrics.smi-spec.io "web" is forbidden: User "obama@buoyant.io" cannot get resource "deployments" in API group "metrics.smi-spec.io" in the namespace "emojivoto"
```
When an error is encountered on the server:
```
> linkerd alpha stat deploy -n emojivoto
Error: Unauthorized client certificate. Check configuration and try again.
```
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR introduces the `linkerd alpha stat` command which will eventually replace the `linkerd stat` command. This command functions in a similar way, but with slightly different arguments and is implemented using the smi-metrics API. This means that access to metrics can be controlled with RBAC.
See the `linkerd alpha stat` help text for full details, or try one of these commands:
* `linkerd alpha stat -n emojivoto deploy/web`
* `linkerd alpha stat -n emojivoto deploy`
* `linkerd alpha stat -n emojivoto deploy/web --to deploy/emoji`
Signed-off-by: Alex Leong <alex@buoyant.io>
This PR introduces a service mirroring component that is responsible for watching remote clusters and mirroring their services locally.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Adds the SMI metrics API to the Linkerd install flow. This installs the SMI metrics controller deployment, the SMI metrics ApiService object, and supporting RBAC, and config resources.
This is the first step toward having Linkerd consume the SMI metrics API in the CLI and web dashboard.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Check Extension api server Authentication
* Added Checks and tests for extension api-server authentication
* Fixed Failing Static Checks
* Updated the golden file
Signed-off-by: Christy Jacob <christyjacob4@gmail.com>
* Moves Common templates needed to partials
As add-ons re-use the partials helm chart, all the templates needed by multiple charts should be present in partials
This commit also updates the helm tests
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* add tracing add-on helm chart
Tracing sub-chart includes open-census and jaeger components as a sub-chart which can be enabled as needed
* Updated Install path to also install add-ons
This includes new interface for add-ons to implement, with example tracing implementation
* Updates Linkerd install path to also install add-ons
Changes include:
- Adds an optional Linkerd Values configmap which stores add-on configuration when add-ons are present.
- Updates Linkerd install path to check for add-ons and render their sub-charts.
- Adds an install option called config, which is used to pass configuration for add-ons.
- Uses a fork of mergo to overwrite default Values with the Values struct generated from config.
* Updates the upgrade path about add-ons.
Upgrade path now checks for the linkerd-values cm, and overwrites the default values with it, if present.
It then checks the config option, for any further overwrites
* Refactor linkerd-values and re-update tests
also adds relevant nil checks
* Refactor code to fix linting issues
* Fixes an error with linkerd-config global values
Also refactors the linkerd-values cm to work the same with helm
* Fix a nil pointer issue for tests
* Updated Tracing add-on chart meta-data
Also introduced a defaultGetFiles method for add-ons
* Add add-on/charts to gitignore
* refactor gitignore for chart deps
* Moves sub-charts to /charts directly
* Refactor linkerd values cm
* Add comment in linkerd-values
* remove extra controlplanetracing flag
* Support Stages deployment for add-ons along with tests
* linting fix
* update tracing rbac
* Removes the need for add-on Interface
- Uses helm loading capabilities to get info about add-ons
- Uses reflection to not have to unnecessarily add checks for each add-on type
* disable tracing flag
* Remove dep on forked mergo
- Re-use merge from helm
* Re-use helm's merge
* Override the chartDir path during tests
* add error check
* Updated the dependency iteration code
Currently, the charts directory will not have the deps in the repo. So the code is updated to read the dependencies from requirements.yaml
and use that info to read templates from the relevant add-ons directory.
* Hard Code add-ons name
* Remove struct details for add-ons
- As we don't use the fields of an add-on struct, they don't need to be typed. Instead we can just read the `enabled` flag using reflection
- Users can just use map[string]interface{} as the add-on type.
* update unit tests
* linting fix
* Rename flag to addon-config
* Use Chart loading logic
- This code uses chart loading to read the files and keep in a vfs.
- Once we have those files read we will then use them for generation of sub-charts.
* Go fmt fix
* Update the linkerd-values cm to use second level field
* Add relevant unit tests for mergeRaw
* linting fix
* Move addon tests to a new file
* Fix golden files
* remove addon install unit test
* Refactor sub-chart load logic
* Add install tracing unit test
* golden file update for tracing install
* Update golden files to reflect another pr changes
* Move addon-config flag to recordFlagSet
* add relevant tracing enabled checks
* linting fix
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* CLI command to fetch control plane metrics
Fixes #3116
* Add GetResponse method to return http GET response
* Implemented timeouts using waitgroups
* Refactor metrics command by extracting common code to metrics_diagnostics_util
* Refactor diagnostics to remove code duplication
* Update portforward_test for NewContainerMetricsForward function
* Lint code
* Incorporate Alex's suggestions
* Lint code
* fix minor errors
* Add unit test for getAllContainersWithPort
* Update metrics and diagnostics to store results in a buffer and print once
* Incorporate Ivan's suggestions
* consistent error handling inside diagnostics
* add coloring for the output
* spawn goroutines for each pod instead of each container
* switch back to unbuffered channel
* remove coloring in the output
* Add a long description of the command
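The concurrency pattern from the steps above (one goroutine per pod, a wait group, an unbuffered channel and a timeout) can be sketched as below. This is an illustration rather than the command's code: `fetchAll`, `result` and the fake fetcher are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// result holds what was fetched from one pod.
type result struct {
	Pod  string
	Body string
	Err  error
}

// fetchAll runs fetch for each pod in its own goroutine and collects results
// over an unbuffered channel until all are done or the timeout elapses.
func fetchAll(pods []string, fetch func(pod string) (string, error), timeout time.Duration) []result {
	resultCh := make(chan result)
	done := make(chan struct{})
	defer close(done)

	var wg sync.WaitGroup
	for _, pod := range pods {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			body, err := fetch(p)
			select {
			case resultCh <- result{Pod: p, Body: body, Err: err}:
			case <-done: // the collector gave up; do not block forever
			}
		}(pod)
	}
	// Close the channel once every worker has finished.
	go func() {
		wg.Wait()
		close(resultCh)
	}()

	timer := time.NewTimer(timeout)
	defer timer.Stop()
	var results []result
collect:
	for {
		select {
		case r, ok := <-resultCh:
			if !ok {
				break collect
			}
			results = append(results, r)
		case <-timer.C:
			break collect
		}
	}
	// Sort so that output order does not depend on goroutine scheduling.
	sort.Slice(results, func(i, j int) bool { return results[i].Pod < results[j].Pod })
	return results
}

// exampleFetch uses a fake fetcher in place of a real port-forward.
func exampleFetch() []result {
	fake := func(pod string) (string, error) { return "metrics from " + pod, nil }
	return fetchAll([]string{"web", "emoji", "voting"}, fake, time.Second)
}

func main() {
	for _, r := range exampleFetch() {
		fmt.Println(r.Pod, r.Body, r.Err)
	}
}
```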
Signed-off-by: Saurav Tiwary <srv.twry@gmail.com>
* cli: handle `linkerd metrics` port-forward gracefully
- add return for routine in func `Init()` in case of error
- add return from func `getMetrics()` if error from `portforward.Init()`
* Remove select block at pkg/k8s/portforward.go
- It is now the caller's responsibility to call pf.Stop()
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
* Retry policy is manually written in a YAML file and patched into the service profile
Added support for configuring service profile retries (`x-linkerd-retryable`) via the OpenAPI spec
Signed-off-by: Kohsheen Tiku <kohsheen.t@gmail.com>
* bin/helm-build automatically updates version in values.yaml
Have the Helm charts building script (`bin/helm-build`) update the
linkerd version in the `values.yaml` files according to the tagged
version, thus removing the need of doing this manually on every release.
This is akin to the update we do in `version.go` at CLI build time.
Note that `shellcheck` is issuing some warnings about this script, but
that's on code that was already there, so that will be handled in a
followup PR.
Update identity controller to make issuer certificates diagnosable if
cert validity is causing error
- Add expiry time in identity log message
- Add current time in identity log message
- Emit k8s event with appropriate message
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
This fix ensures that we ignore whitespace and newlines when checking that roots match between the Linkerd config map and the issuer secret (in the case of using an external issuer + Helm).
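A whitespace-insensitive comparison can be sketched as below. This is an assumption about the approach rather than the actual fix, and `stripWhitespace` and `rootsMatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// stripWhitespace removes all whitespace, including newlines, from s.
func stripWhitespace(s string) string {
	return strings.Join(strings.Fields(s), "")
}

// rootsMatch compares two PEM strings while ignoring whitespace differences.
func rootsMatch(a, b string) bool {
	return stripWhitespace(a) == stripWhitespace(b)
}

func main() {
	fromConfig := "-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----\n"
	fromSecret := "-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----"
	fmt.Println(rootsMatch(fromConfig, fromSecret))
}
```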
Fixes: #3907
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
## edge-20.1.3
* CLI
* Introduced `linkerd check --pre --linkerd-cni-enabled`, used when the CNI
plugin is used, to check it has been properly installed before proceeding
with the control plane installation
* Added support for the `--as-group` flag so that users can impersonate
groups for Kubernetes operations (thanks @mayankshah160!)
* Controller
* Fixed an issue where an override of the Docker registry was not being
applied to debug containers (thanks @javaducky!)
* Added check for the Subject Alternate Name attributes to the API server
when access restrictions have been enabled (thanks @javaducky!)
* Added support for arbitrary pod labels so that users can leverage the
Linkerd provided Prometheus instance to scrape for their own labels
(thanks @daxmc99!)
* Fixed an issue with CNI config parsing
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
**Subject**
Fixes bug where override of Docker registry was not being applied to debug containers (#3851)
**Problem**
Overrides for Docker registry are not being applied to debug containers and provide no means to correct the image.
**Solution**
This update expands the `data.proxy` configuration section within the Linkerd `ConfigMap` to maintain the overridden image name for debug containers at _install_-time similar to handling of the `proxy` and `proxyInit` images.
This change also enables the further override option of the registry for debug containers at _inject_-time given utilization of the `--registry` CLI option.
**Validation**
Several new unit tests have been created to confirm functionality. In addition, the following workflows were run through:
### Standard Workflow with Custom Registry
This workflow installs the Linkerd control plane based upon a custom registry, then injects the debug sidecar into a service.
* Start with a k8s instance having no Linkerd installation
* Build all images locally using `bin/docker-build`
* Create custom tags (using same version) for generated images, e.g. `docker tag gcr.io/linkerd-io/debug:git-a4ebecb6 javaducky.com/linkerd-io/debug:git-a4ebecb6`
* Install Linkerd with registry override `bin/linkerd install --registry=javaducky.com/linkerd-io | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap now contains the debug image name, pull policy, and version within the `data.proxy` section
* Request injection of the debug image into an available container. I used the Emojivoto voting service as described in https://linkerd.io/2/tasks/using-the-debug-container/ as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Once the deployment creates a new pod for the service, inspection should show that the pod now includes the `linkerd-debug` container, running the override image seen previously within the ConfigMap
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Setting the `config.linkerd.io/enable-debug-sidecar` annotation to “false” should cause the pod to be recreated without the debug container.
### Overriding the Custom Registry Override at Injection
This builds upon the “Standard Workflow with Custom Registry” by overriding the Docker registry utilized for the debug container at the time of injection.
* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=gcr.io/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the override annotation for `config.linkerd.io/debug-image` having the debug container from the new registry. Viewing the running pod should show that the `linkerd-debug` container was injected and running the correct image. Of note, the proxy and proxy-init images are still running the “original” override images.
* As before, setting the `config.linkerd.io/enable-debug-sidecar` annotation to “false” should cause the pod to be recreated without the debug container.
### Standard Workflow with Default Registry
This is the typical workflow, which uses the standard Linkerd image registry.
* Uninstall the Linkerd control plane using `bin/linkerd install --ignore-cluster | kubectl delete -f -` as described at https://linkerd.io/2/tasks/uninstall/
* Clean the Emojivoto environment using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl delete -f -` then reinstall using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl apply -f -`
* Perform standard Linkerd installation as `bin/linkerd install | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap references the default debug image of `gcr.io/linkerd-io/debug` within the `data.proxy` section
* Request injection of the debug image into an available container as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Setting the `config.linkerd.io/enable-debug-sidecar` annotation to “false” should cause the pod to be recreated without the debug container.
### Overriding the Default Registry at Injection
This workflow builds upon the “Standard Workflow with Default Registry” by overriding the Docker registry utilized for the debug container at the time of injection.
* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=javaducky.com/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the override annotation for `config.linkerd.io/debug-image` having the debug container from the new registry. Viewing the running pod should show that the `linkerd-debug` container was injected and running the correct image. Of note, the proxy and proxy-init images are still running the “original” override images.
* As before, setting the `config.linkerd.io/enable-debug-sidecar` annotation to “false” should cause the pod to be recreated without the debug container.
Fixes issue #3851
Signed-off-by: Paul Balogh <javaducky@gmail.com>
As part of the effort to remove the "experimental" label from the CNI plugin, this PR introduces CNI checks to `linkerd check`.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes
- https://github.com/linkerd/linkerd2/issues/2962
- https://github.com/linkerd/linkerd2/issues/2545
### Problem
Field omissions for workload objects are not respected while marshaling to JSON.
### Solution
After digging into the code, I realized that when workload objects are marshaled, various fields hold empty structs as values, and these fields should be omitted. As of now, the standard library `encoding/json` does not treat zero-value structs as empty for the `omitempty` tag. The relevant issue can be found [here](https://github.com/golang/go/issues/11939). To tackle this problem, the object declaration would need _pointer-to-struct_ as the field type instead of the _struct_ itself. However, that approach is out of scope, as the workload object declarations are handled by the k8s library.
I was able to find a drop-in replacement for the `encoding/json` library which supports omitting zero-value structs with the `omitempty` tag. It can be found [here](https://github.com/clarketm/json). I have used this library to implement simple filter-like functionality that removes empty tags once a YAML with empty tags is generated, leaving the previously existing methods unaffected.
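The standard-library behavior described above can be reproduced with a small sketch. The types `Resources`, `withStruct`, and `withPointer` are made up for illustration and are not Kubernetes or Linkerd types; only `encoding/json` from the standard library is used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Resources stands in for any nested struct field of a workload object.
type Resources struct {
	CPU string `json:"cpu,omitempty"`
}

// withStruct embeds the nested value directly: omitempty has no effect here,
// because encoding/json never considers a struct value "empty".
type withStruct struct {
	Name      string    `json:"name"`
	Resources Resources `json:"resources,omitempty"`
}

// withPointer uses a pointer: a nil pointer is "empty" and is omitted.
type withPointer struct {
	Name      string     `json:"name"`
	Resources *Resources `json:"resources,omitempty"`
}

// mustMarshal marshals v to JSON and panics on error.
func mustMarshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustMarshal(withStruct{Name: "web"}))  // {"name":"web","resources":{}}
	fmt.Println(mustMarshal(withPointer{Name: "web"})) // {"name":"web"}
}
```

The first line of output shows the stray empty object that this PR filters out; the second shows why a pointer field would avoid it if the type declarations could be changed.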
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
Adds a check to ensure kube-system namespace has `config.linkerd.io/admission-webhooks:disabled`
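The condition being checked can be sketched as a lookup on the namespace's label map. The function `webhooksDisabled` and the constant names are hypothetical; the real check reads the namespace through the Kubernetes API, which is left out here to keep the example self-contained.

```go
package main

import "fmt"

const (
	// Label key and value taken from the description above.
	webhooksLabel = "config.linkerd.io/admission-webhooks"
	disabledValue = "disabled"
)

// webhooksDisabled is a hypothetical helper: it reports whether a namespace's
// labels mark admission webhooks as disabled. Reading a missing key from a
// map (or from a nil map) yields "", so unlabeled namespaces return false.
func webhooksDisabled(labels map[string]string) bool {
	return labels[webhooksLabel] == disabledValue
}

func main() {
	kubeSystem := map[string]string{webhooksLabel: disabledValue}
	fmt.Println(webhooksDisabled(kubeSystem)) // prints "true"
	fmt.Println(webhooksDisabled(nil))        // prints "false"
}
```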
Fixes #3721
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>