This allows users of Linkerd to leverage the Prometheus instance
deployed by the mesh for their metrics needs. With support for pod labels
beyond those added by Linkerd, users are able to scrape metrics
based on their own labels.
Signed-off-by: Dax McDonald <dax@rancher.com>
Subject
Utilize Common Name or Subject Alternative Name for access checks (#3459)
Problem
When access restrictions to the API server have been enabled with the `requestheader-allowed-names` configuration, only the Common Name of the requestor's certificate is checked. This check should also include the Subject Alternative Name attributes.
Solution
API server will now check the SAN attributes (DNS Names, Email Addresses, IP Addresses, and URIs) when determining accessibility for allowed names.
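As an illustration, here is a minimal Go sketch (not the actual API server code; the helper name and structure are assumptions) of matching a client certificate against an allow-list by Common Name or any SAN attribute, using only the standard `crypto/x509` types:
```go
package main

import (
	"crypto/x509"
	"fmt"
)

// allowedName reports whether cert matches any entry in allowed, considering
// the Common Name as well as the DNS name, email address, IP address, and
// URI SAN attributes. (Hypothetical helper for illustration.)
func allowedName(cert *x509.Certificate, allowed []string) bool {
	names := []string{cert.Subject.CommonName}
	names = append(names, cert.DNSNames...)
	names = append(names, cert.EmailAddresses...)
	for _, ip := range cert.IPAddresses {
		names = append(names, ip.String())
	}
	for _, u := range cert.URIs {
		names = append(names, u.String())
	}
	for _, a := range allowed {
		for _, n := range names {
			if n == a {
				return true
			}
		}
	}
	return false
}

func main() {
	// Usage with a parsed certificate (certificate parsing elided here).
	cert := &x509.Certificate{DNSNames: []string{"example-client"}}
	fmt.Println(allowedName(cert, []string{"example-client"})) // true
}
```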
Fixes issue #3459
Signed-off-by: Paul Balogh <javaducky@gmail.com>
This is a follow-up to #3882, which adopted a bunch of new linting rules
in our JavaScript codebase. The `no-use-before-define` rule requires
moving some functions around, so I'm doing it in a separate branch.
Note that I was originally going to also enable the `react/sort-comp` rule
as part of this branch, but I decided that the sort ordering doesn't
work for our codebase.
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
**Subject**
Fixes bug where override of Docker registry was not being applied to debug containers (#3851)
**Problem**
Overrides for the Docker registry are not applied to debug containers, and there is no means to correct the image.
**Solution**
This update expands the `data.proxy` configuration section within the Linkerd `ConfigMap` to maintain the overridden image name for debug containers at _install_ time, similar to the handling of the `proxy` and `proxyInit` images.
This change also enables further override of the registry for debug containers at _inject_ time when the `--registry` CLI option is used.
**Validation**
Several new unit tests have been created to confirm functionality. In addition, the following workflows were run through:
### Standard Workflow with Custom Registry
This workflow installs the Linkerd control plane using a custom registry, then injects the debug sidecar into a service.
* Start with a k8s instance having no Linkerd installation
* Build all images locally using `bin/docker-build`
* Create custom tags (using same version) for generated images, e.g. `docker tag gcr.io/linkerd-io/debug:git-a4ebecb6 javaducky.com/linkerd-io/debug:git-a4ebecb6`
* Install Linkerd with registry override `bin/linkerd install --registry=javaducky.com/linkerd-io | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap now contains the debug image name, pull policy, and version within the `data.proxy` section
* Request injection of the debug image into an available container. I used the Emojivoto voting service as described in https://linkerd.io/2/tasks/using-the-debug-container/ as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Once the deployment creates a new pod for the service, inspection should show that the pod now includes a container named `linkerd-debug` running the override image seen previously within the ConfigMap
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Setting the `config.linkerd.io/enable-debug-sidecar` annotation to `false` should cause the pod to be recreated without the debug container.
### Overriding the Custom Registry Override at Injection
This builds upon the “Standard Workflow with Custom Registry” by overriding the Docker registry utilized for the debug container at the time of injection.
* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=gcr.io/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the `config.linkerd.io/debug-image` override annotation referencing the debug image from the new registry. Viewing the running pod should show that the `linkerd-debug` container was injected and is running the correct image. Of note, the proxy and proxy-init containers are still running the “original” override images.
* As before, setting the `config.linkerd.io/enable-debug-sidecar` annotation to `false` should cause the pod to be recreated without the debug container.
### Standard Workflow with Default Registry
This is the typical workflow, using the standard Linkerd image registry.
* Uninstall the Linkerd control plane using `bin/linkerd install --ignore-cluster | kubectl delete -f -` as described at https://linkerd.io/2/tasks/uninstall/
* Clean the Emojivoto environment using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl delete -f -` then reinstall using `curl -sL https://run.linkerd.io/emojivoto.yml | kubectl apply -f -`
* Perform standard Linkerd installation as `bin/linkerd install | kubectl apply -f -`
* Once Linkerd has been fully initialized, you should be able to confirm that the `linkerd-config` ConfigMap references the default debug image of `gcr.io/linkerd-io/debug` within the `data.proxy` section
* Request injection of the debug image into an available container as `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar - | kubectl apply -f -`
* Debugging can also be verified by viewing debug container logs as `kubectl -n emojivoto logs deploy/voting linkerd-debug -f`
* Setting the `config.linkerd.io/enable-debug-sidecar` annotation to `false` should cause the pod to be recreated without the debug container.
### Overriding the Default Registry at Injection
This workflow builds upon the “Standard Workflow with Default Registry” by overriding the Docker registry utilized for the debug container at the time of injection.
* “Clean” the Emojivoto voting service by removing any Linkerd annotations from the deployment
* Request injection similar to before, except provide the `--registry` option as in `kubectl -n emojivoto get deploy/voting -o yaml | bin/linkerd inject --enable-debug-sidecar --registry=javaducky.com/linkerd-io - | kubectl apply -f -`
* Inspection of the deployment config should now show the `config.linkerd.io/debug-image` override annotation referencing the debug image from the new registry. Viewing the running pod should show that the `linkerd-debug` container was injected and is running the correct image. Of note, the proxy and proxy-init containers are still running the “original” override images.
* As before, setting the `config.linkerd.io/enable-debug-sidecar` annotation to `false` should cause the pod to be recreated without the debug container.
Fixes issue #3851
Signed-off-by: Paul Balogh <javaducky@gmail.com>
As part of the effort to remove the "experimental" label from the CNI plugin, this PR introduces CNI checks to `linkerd check`
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This integration test roughly follows the [Linkerd guide to distributed tracing](https://linkerd.io/2019/10/07/a-guide-to-distributed-tracing-with-linkerd/).
We deploy the tracing components (oc-collector and jaeger), emojivoto, and nginx as an ingress to do span initiation. We then watch the jaeger API and check that a trace is eventually created that includes traces from all of the data plane components: nginx, linkerd-proxy, web, voting, and emoji.
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-20.1.2
* CLI
* Added HA specific checks to `linkerd check` to ensure that the `kube-system`
namespace has the `config.linkerd.io/admission-webhooks:disabled`
label set
* Fixed a problem causing the presence of unnecessary empty fields in
generated resource definitions (thanks @mayankshah1607)
* Proxy
* Fixed an issue that could cause the OpenCensus exporter to stall
* Internal
* Added validation to incoming sidecar injection requests that ensures
the value of `linkerd.io/inject` is either `enabled` or `disabled`
(thanks @mayankshah1607)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This release fixes an issue that could cause the OpenCensus exporter to
stall.
This release does NOT include the experimental changes from
v2.83.0-experimental.
---
* http: Use the endpoint type to inform URI normalization (linkerd/linkerd2-proxy#404)
* Remove clone in opencensus exporter to ensure task is notified (linkerd/linkerd2-proxy#405)
* sort alphabetically and update Prometheus version
* update version field to static
* sort linkerd2-cni readme
* switch to uppercase CNI
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes
- https://github.com/linkerd/linkerd2/issues/2962
- https://github.com/linkerd/linkerd2/issues/2545
### Problem
Field omissions for workload objects are not respected while marshaling to JSON.
### Solution
After digging a bit into the code, I came to realize that while marshaling, workload objects have empty structs as values for various fields which would rather be omitted. As of now, the standard library `encoding/json` does not support zero values of structs with the `omitempty` tag. The relevant issue can be found [here](https://github.com/golang/go/issues/11939). To tackle this problem, the object declaration should have _pointer-to-struct_ as a field type instead of _struct_ itself. However, this approach would be out of scope as the workload object declaration is handled by the k8s library.
I was able to find a drop-in replacement for the `encoding/json` library which supports zero values of structs with the `omitempty` tag. It can be found [here](https://github.com/clarketm/json). I have made use of this library to implement simple filter-like functionality that removes empty tags once a YAML with empty tags is generated, leaving the previously existing methods unaffected.
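To illustrate the difference, here is a minimal Go sketch (the `Workload`/`Status` types are purely illustrative, not the actual Kubernetes object declarations) comparing the standard library with the drop-in replacement:
```go
package main

import (
	stdjson "encoding/json"
	"fmt"

	"github.com/clarketm/json"
)

type Status struct {
	Phase string `json:"phase,omitempty"`
}

type Workload struct {
	Name   string `json:"name,omitempty"`
	Status Status `json:"status,omitempty"` // zero-value struct, not a pointer
}

func main() {
	w := Workload{Name: "voting"}

	// The standard library keeps the empty struct despite `omitempty`.
	std, _ := stdjson.Marshal(w)
	fmt.Println(string(std)) // {"name":"voting","status":{}}

	// The drop-in replacement omits the zero-value struct.
	alt, _ := json.Marshal(w)
	fmt.Println(string(alt)) // {"name":"voting"}
}
```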
Signed-off-by: Mayank Shah <mayankshah1614@gmail.com>
There are a few dangling references to old release versions in our charts and readmes.
I've removed as many of these references as possible so that we no longer need to worry about them getting out of date. The one reference that remains is `cniPluginVersion` and this will need to be manually updated as part of the release process.
Signed-off-by: Alex Leong <alex@buoyant.io>
Adds a check to ensure the kube-system namespace has the `config.linkerd.io/admission-webhooks:disabled` label
Fixes #3721
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
## edge-20.1.1
This edge release includes experimental improvements to the Linkerd proxy's
request buffering and backpressure infrastructure.
Additionally, we've fixed several bugs when installing Linkerd with Helm,
updated the CLI to allow using both port numbers _and_ port ranges with the
`--skip-inbound-ports` and `--skip-outbound-ports` flags, and fixed a dashboard
error that can occur if the dashboard is open in a browser while updating Linkerd.
**Note**: The `linkerd-proxy` version included with this release is more
experimental than usual. We'd love your help testing, but be aware that there
might be stability issues.
* CLI
* Added the ability to pass both port numbers and port ranges to
`--skip-inbound-ports` and `--skip-outbound-ports` (thanks to @javaducky!)
* Controller
* Fixed a race condition in the `linkerd-web` service
* Updated Prometheus to 2.15.2 (thanks @Pothulapati)
* Web UI
* Fixed an error when refreshing an already open dashboard when the Linkerd
version has changed
* Proxy
* Internal changes to the proxy's request buffering and backpressure
infrastructure
* Helm
* Fixed the `linkerd-cni` Helm chart not setting proper namespace annotations
and labels
* Fixed certificate issuance lifetime not being set when installing through
Helm
* More improvements to Helm best practices (thanks to @Pothulapati!)
This is an experimental release that includes large changes to the
proxy's request buffering and backpressure infrastructure.
Please exercise caution before deploying this proxy version into mission
critical environments.
`cniEnabled` was hard-coded to `false` in the `_config.tpl` template, so the init container was always added during injection regardless of having installed the control plane with `--set noInitContainer=true`.
This affects injection after installing with Helm, not after installing with the CLI.
Repro steps under the edge-19.12.3 tag:
```bash
$ helm install charts/linkerd2-cni
# wait for the linkerd-cni-xxx pod to come up
# refresh linkerd2's chart dependencies
$ bin/helm-build
# overrides.yaml should contain all the mandatory values for certs
$ helm install -f overrides.yaml --set noInitContainer=true --set installNamespace=false charts/linkerd2
# verify the global config `cniEnabled` is NOT being persisted appropriately
$ k -n linkerd get cm linkerd-config -oyaml | grep cni
"cniEnabled": false,
# install and inject emojivoto
$ curl https://run.linkerd.io/emojivoto.yml|bin/go-run cli inject -|k apply -f -
# verify that the init container is being (unexpectedly) added
$ k -n emojivoto get po emoji-xxxxx-xxx -oyaml | grep initContainer
initContainers:
initContainerStatuses:
```
In this branch:
```bash
$ helm install charts/linkerd2-cni
# wait for the linkerd-cni-xxx pod to come up
# refresh linkerd2's chart dependencies
$ bin/helm-build
# overrides.yaml should contain all the mandatory values for certs
$ helm install -f overrides.yaml --set noInitContainer=true --set installNamespace=false charts/linkerd2
# verify the global config `cniEnabled` is being persisted appropriately
$ k -n linkerd get cm linkerd-config -oyaml | grep cni
"cniEnabled": true,
# install and inject emojivoto
$ curl https://run.linkerd.io/emojivoto.yml|bin/go-run cli inject -|k apply -f -
# verify that the init container is NOT being added
$ k -n emojivoto get po emoji-xxxxx-xxx -oyaml | grep initContainer
# nothing returned
```
This replaces #3872
Fixes a problem where the identity service can issue a certificate that has a lifetime larger than the issuer certificate's. This was causing the proxies to end up using an invalid TLS certificate. This fix ensures that the lifetime of the issued certificate is not greater than the smallest lifetime of the certs in the issuer cert trust chain.
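A minimal Go sketch of the clamping idea (the helper and its inputs are hypothetical, not the actual identity service code):
```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// clampExpiry returns the requested expiry, lowered if necessary so it never
// exceeds the expiry of any certificate in the issuer trust chain.
func clampExpiry(requested time.Time, chain []*x509.Certificate) time.Time {
	expiry := requested
	for _, c := range chain {
		if c.NotAfter.Before(expiry) {
			expiry = c.NotAfter
		}
	}
	return expiry
}

func main() {
	// The issuer expires in 24h, but a 48h certificate is requested.
	issuer := &x509.Certificate{NotAfter: time.Now().Add(24 * time.Hour)}
	requested := time.Now().Add(48 * time.Hour)
	fmt.Println(clampExpiry(requested, []*x509.Certificate{issuer}))
}
```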
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Fixes https://github.com/linkerd/linkerd2/issues/3878
If the `--registry` flag is provided to Linkerd without the `--proxy-image` or `--init-image` flags, the `--registry` flag is ignored and not applied to the existing values for the proxy or init images pulled from the configmap.
We now override the registry with the value from the `--registry` flag regardless of which other flags are provided.
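A minimal Go sketch of the idea (the helper name is hypothetical, not the actual CLI code): the registry portion of the existing image reference is replaced whenever `--registry` is given, independently of the image flags:
```go
package main

import (
	"fmt"
	"strings"
)

// overrideRegistry swaps the registry prefix of an image reference such as
// "gcr.io/linkerd-io/proxy" for the given registry, keeping the final image
// name intact. An empty registry leaves the image unchanged.
func overrideRegistry(image, registry string) string {
	if registry == "" {
		return image
	}
	name := image
	if i := strings.LastIndex(image, "/"); i >= 0 {
		name = image[i+1:]
	}
	return registry + "/" + name
}

func main() {
	fmt.Println(overrideRegistry("gcr.io/linkerd-io/proxy", "javaducky.com/linkerd-io"))
	// javaducky.com/linkerd-io/proxy
}
```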
Signed-off-by: Alex Leong <alex@buoyant.io>
The current set of JavaScript linting rules that we're using in this
project is outdated, and it has led to a variety of competing styles
in the JavaScript codebase.
Update the project's linting rules to match those provided by the latest
release of eslint-config-airbnb, but disable a bunch of rules that
aren't compatible with this project.
I've split this change into two commits. The first commit contains the
manual changes that I made to satisfy the new rules, and the second
commit contains all of the whitespace, quoting and commas changes that
were fixed automatically by eslint.
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Revert "Replace actions/checkout with actions/upload/download (#3602)"
This reverts commit 397970e917.
* Upgraded actions/checkout to @v2
Reverts #3602 and Fixes #3881
## Motivation
Full background: #2074
#2074 was recently reopened because a user reported an error that occurs when
refreshing an already opened dashboard after the dashboard build has changed.
This can occur when upgrading or downgrading.
#2074 explores a larger issue about a redirection that occurs when loading the
dashboard JS. However, the actual issue that users are experiencing happens
because `index_bundle.js` is being cached when it should not be.
Even if the hash of the JS bundle changes, users can see (on the current edge)
that browsers do in fact cache `index_bundle.js`.
The easiest way I reproduced this was:
1. Install `edge-19.12.3`
2. `linkerd dashboard` (and keep the tab open)
3. Uninstall `edge-19.12.3`
4. Install `stable-2.5.0`
5. `linkerd dashboard`
6. Refresh in all browsers: Users will observe the `edge-19.12.3` dashboard
still renders (with all of its new additions) even though `stable-2.5.0` is
installed with its older theme.
Below are screenshots of Safari and Firefox caching the file. Chrome was not as
easy to reproduce:
*Safari*: [screenshot: safari-linkerd]
*Firefox*: [screenshot: firefox-linkerd]
## Solution
This change updates the response header when requesting `index_bundle.js`
from the server to ensure caching does not take place; mainly, `no-cache` is
changed to `no-store` and `must-revalidate` is now included.
`no-store` and `must-revalidate` are redundant on some browsers, but both are
required to cover all browsers (and versions).
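For illustration, a minimal sketch using a plain Go `net/http` handler (not the actual dashboard server code; the bundle path is an assumption):
```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// The bundle path here is an assumption for the sketch.
	http.HandleFunc("/dist/index_bundle.js", func(w http.ResponseWriter, r *http.Request) {
		// `no-store` disables caching outright; `must-revalidate` is kept as
		// well to cover browsers/versions that do not honor `no-store`.
		w.Header().Set("Cache-Control", "no-store, must-revalidate")
		http.ServeFile(w, r, "dist/index_bundle.js")
	})
	log.Fatal(http.ListenAndServe(":8084", nil))
}
```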
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
Due to wrong snake casing, the issuance lifetime setting was not reflected when installing through Helm. This commit solves that problem.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Use `kind export kubeconfig` instead of `scp`
Follow-up to this #3864
[comment](https://github.com/linkerd/linkerd2/pull/3864#discussion_r360976473)
Stop moving the kubeconfig file between the GitHub Actions env and the
build server with `scp`, and instead use `kind export kubeconfig`.
* Replaced deprecated '--loglevel debug' flag with '--verbosity 3'
We were ignoring events like
```
MountVolume.SetUp failed for volume .* : couldn't propagate object cache: timed out waiting for the condition
```
but as of k8s 1.16 those got replaced by more precise messages, like
```
MountVolume.SetUp failed for volume "linkerd-identity-token-cm4fn" :failed to sync secret cache: timed out waiting for the condition
MountVolume.SetUp failed for volume "prometheus-config" : failed to sync configmap cache: timed out waiting for the condition
```
This was causing sporadic CI test failures like the one
[here](https://github.com/linkerd/linkerd2/runs/368424822#step:7:562).
So I'm including another regex for that.
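For illustration, a Go sketch of a pattern that would cover both the old and the new messages (an approximation, not the exact regex added to the test helpers):
```go
package main

import (
	"fmt"
	"regexp"
)

// ignoredEvents matches both the pre-1.16 wording and the newer, more
// precise secret/configmap cache messages.
var ignoredEvents = regexp.MustCompile(
	`MountVolume\.SetUp failed for volume .*: ?` +
		`(couldn't propagate object cache|failed to sync (secret|configmap) cache): ` +
		`timed out waiting for the condition`)

func main() {
	msg := `MountVolume.SetUp failed for volume "prometheus-config" : failed to sync configmap cache: timed out waiting for the condition`
	fmt.Println(ignoredEvents.MatchString(msg)) // true
}
```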
Re: 96c41f8a1e
* The `linkerd-cni` chart should set proper annotations/labels for the namespace
When installing through Helm, the `linkerd-cni` chart will (by default)
install itself under the same namespace ("linkerd") where the `linkerd` chart will be
installed afterwards. So it needs to set up the proper annotations and labels.
* Fix Helm install when disabling init containers
To install linkerd using Helm after having installed linkerd's CNI plugin, one needs to `--set noInitContainer=true`.
But to determine whether to use init containers or not, we weren't
evaluating that value, but instead `Values.proxyInit`, which is indeed null
when installing through the CLI but not when installing with Helm. So
init containers were being added despite having passed `--set
noInitContainer=true`.
* Upgrade `kind` to v0.6.1
Fixes #3852
Upgraded `/bin/kind` to pull v0.6.1.
Also have `workflow.yml` use `KUBECONFIG` explicitly for setting the
location of the config file, now that `kind get kubeconfig-path` has
been deprecated (check
https://github.com/kubernetes-sigs/kind/releases/tag/v0.6.0 for detailed
info).
Note that on the build server the kind binary for this version is
`kind-0.6.1`, leaving the `kind` binary still pointing to v0.5.1 while
this gets merged and all the PR branches pick up this change.
* update flags to smaller
* add tests for the same
* fix control plane trace flag
* add tests for control plane tracing install
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This PR restructures how the array of apiRequests is constructed in the
`ResourceDetail` component to reduce unnecessary data requests. In the case of a
Pod detail page, we will no longer query the API for a list of pods in a
namespace, or request metrics for those pods, which we do for all other resource
detail pages.
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
Fixes #3801
This will package and build the `linkerd2-cni` chart from the
`charts/linkerd2-cni` directory and update our Helm Hub's `index.yaml`
file to index it.
This will only be run in the `chart_deploy` job of our Github Actions
when an edge/stable tag is pushed.
Once that happens, users will be able to install the chart with a
command like:
```
helm install linkerd-edge/linkerd2-cni
```
Docs update will follow.
* Enable mixed configuration of skip-[inbound|outbound]-ports using port numbers and ranges (#3752)
* included tests for generated output given proxy-ignore configuration options
* renamed "validate" method to "parseAndValidate" given mutation
* updated documentation to denote inclusiveness of ranges
* Updates for expansion of ignored inbound and outbound port ranges to be handled by the proxy-init rather than CLI (#3766)
This change maintains the configured ports and ranges as strings rather than unsigned integers, while still providing validation at the command layer (see the sketch after this list).
* Bump versions for proxy-init to v1.3.0
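A minimal Go sketch of that parse-and-validate step (simplified relative to the real CLI flag handling):
```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseAndValidate checks that every comma-separated entry is either a valid
// port or a valid low-high range (inclusive), returning the entries unchanged
// as strings so they can be handed to proxy-init as-is.
func parseAndValidate(value string) ([]string, error) {
	entries := strings.Split(value, ",")
	for _, e := range entries {
		bounds := strings.SplitN(e, "-", 2)
		low, err := strconv.Atoi(strings.TrimSpace(bounds[0]))
		if err != nil || low < 1 || low > 65535 {
			return nil, fmt.Errorf("invalid port %q", e)
		}
		if len(bounds) == 2 {
			high, err := strconv.Atoi(strings.TrimSpace(bounds[1]))
			if err != nil || high < low || high > 65535 {
				return nil, fmt.Errorf("invalid port range %q", e)
			}
		}
	}
	return entries, nil
}

func main() {
	ports, err := parseAndValidate("25,8000-8999")
	fmt.Println(ports, err) // [25 8000-8999] <nil>
}
```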
Signed-off-by: Paul Balogh <javaducky@gmail.com>
In various integration tests we're not showing stderr when a failure
happens, thus hiding some possibly useful debugging info.
E.g. in the latest CI failures, commands like `linkerd update` were
failing with no visible reason why.
* Changes for edge-19.12.3
Signed-off-by: Charles Pretzer <charles@buoyant.io>
* CHANGES.md updates based on feedback
Signed-off-by: Charles Pretzer <charles@buoyant.io>
* Fix flag name
Signed-off-by: Charles Pretzer <charles@buoyant.io>
Fixes #3444, fixes #3443
## Background and Behavior
This change adds support for the destination service to resolve Get requests which contain a service clusterIP or pod IP as the `Path` parameter. It returns the stream of endpoints, just as if `Get` had been called with the service's authority. This lays the groundwork for allowing the proxy to establish TLS on TCP connections by allowing the proxy to do destination lookups for the SO_ORIG_DST of TCP connections. When that IP address corresponds to a service cluster IP or pod IP, the destination service will return the endpoints stream, including the pod metadata required to establish identity.
Prior to this change, attempting to look up an IP address in the destination service would result in an `InvalidArgument` error.
Updating the `GetProfile` method to support IP address lookups is out of scope, and attempts to look up an IP address with the `GetProfile` method will result in `InvalidArgument`.
## Implementation
We do this by creating an `IPWatcher` which wraps the `EndpointsWatcher` and supports lookups by IP. `IPWatcher` maintains a mapping of clusterIPs to service IDs and translates subscriptions to an IP address into a subscription to the service ID using the underlying `EndpointsWatcher`.
Since the service name is no longer always inferable directly from the input parameters, we restructure `EndpointTranslator` and `PodSet` so that we propagate the service name from the endpoints API response.
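A minimal Go sketch of the wrapping idea (types and method names here are illustrative, not the actual watcher code):
```go
package main

import "fmt"

// ServiceID identifies a service by namespace and name.
type ServiceID struct {
	Namespace, Name string
}

// EndpointsWatcher is the underlying watcher that understands services.
type EndpointsWatcher interface {
	Subscribe(id ServiceID, port uint32)
}

// IPWatcher maintains a mapping of clusterIPs to service IDs and forwards
// IP subscriptions as service subscriptions.
type IPWatcher struct {
	byClusterIP map[string]ServiceID
	endpoints   EndpointsWatcher
}

// SubscribeIP resolves the clusterIP to a service ID and subscribes to it
// via the wrapped EndpointsWatcher.
func (w *IPWatcher) SubscribeIP(ip string, port uint32) error {
	id, ok := w.byClusterIP[ip]
	if !ok {
		return fmt.Errorf("no service found for clusterIP %s", ip)
	}
	w.endpoints.Subscribe(id, port)
	return nil
}

type printWatcher struct{}

func (printWatcher) Subscribe(id ServiceID, port uint32) {
	fmt.Printf("subscribed to %s/%s:%d\n", id.Namespace, id.Name, port)
}

func main() {
	w := &IPWatcher{
		byClusterIP: map[string]ServiceID{"10.96.0.12": {"emojivoto", "voting-svc"}},
		endpoints:   printWatcher{},
	}
	_ = w.SubscribeIP("10.96.0.12", 8080)
}
```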
## Testing
This can be tested by running the destination service locally, using the current kube context to connect to a Kubernetes cluster:
```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Then lookups can be issued using the destination client:
```
go run controller/script/destination-client/main.go -path 192.168.54.78:80 -method get -addr localhost:8086
```
Service cluster IPs and pod IPs can be used as the `path` argument.
Signed-off-by: Alex Leong <alex@buoyant.io>
This release adds a defense mechanism to ensure that resolutions are
released when the associated balancer becomes idle and should have
been dropped from the proxy.
Furthermore, the proxy is now more selective as to which gRPC status
codes are considered "failures" in metrics.
---
* Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395)
* discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
The Kubernetes docs recommend a common set of labels for resources:
https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#labels
Add the following 3 labels to all control-plane workloads:
```
app.kubernetes.io/name: controller # or destination, etc
app.kubernetes.io/part-of: Linkerd
app.kubernetes.io/version: edge-X.Y.Z
```
Fixes #3816
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Inject preStop hook into the proxy sidecar container to stop it last
This commit adds support for a graceful shutdown technique that is used
by some Kubernetes administrators while a more permanent solution is
being discussed in
https://github.com/kubernetes/kubernetes/issues/65502
The problem is that the RollingUpdate strategy does not guarantee that all
traffic will be sent to a new pod _before_ the previous pod is removed.
Kubernetes is internally an event-driven system, and when a pod is being
terminated, several components can receive the event simultaneously.
If an Ingress Controller gets the event too late, or processes it
more slowly than Kubernetes removes the pod from its Service, user requests
will continue flowing into a black hole.
According [to the documentation](https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods)
> 1. If one of the Pod’s containers has defined a `preStop` hook,
> it is invoked inside of the container. If the `preStop` hook is still
> running after the grace period expires, step 2 is then invoked with
> a small (2 second) extended grace period.
>
> 2. The container is sent the `TERM` signal. Note that not all
> containers in the Pod will receive the `TERM` signal at the same time
> and may each require a preStop hook if the order in which
> they shut down matters.
This commit adds support for the `preStop` hook that can be configured
in three forms:
1. As the command-line argument `--wait-before-exit-seconds` for the
`linkerd inject` command.
2. As the `linkerd2` Helm chart value `Proxy.WaitBeforeExitSeconds`.
3. As the `config.alpha.linkerd.io/wait-before-exit-seconds` annotation.
If configured, it will add the following `preStop` hook to the proxy container
definition:
```yaml
lifecycle:
preStop:
exec:
command:
- /bin/bash
- -c
- sleep {{.Values.Proxy.WaitBeforeExitSeconds}}
```
To achieve maximum benefit from the option, the main container should have
its own `preStop` hook with a `sleep` command inside whose period is
smaller than the one set for the proxy sidecar. And neither of them
must be bigger than the `terminationGracePeriodSeconds` configured for the
entire pod.
An example of a rendered Kubernetes resource where
`.Values.Proxy.WaitBeforeExitSeconds` is equal to `40`:
```yaml
# application container
lifecycle:
preStop:
exec:
command:
- /bin/bash
- -c
- sleep 20
# linkerd-proxy container
lifecycle:
preStop:
exec:
command:
- /bin/bash
- -c
- sleep 40
terminationGracePeriodSeconds: 160 # for entire pod
```
Fixes #3747
Signed-off-by: Eugene Glotov <kivagant@gmail.com>