In various integration tests we're not showing stderr when a failure
happens, thus hiding some possibly useful debugging info.
E.g. in the latest CI failures, commands like `linkerd update` were
failing with no visible reason why.
* Changes for edge-19.12.3
Signed-off-by: Charles Pretzer <charles@buoyant.io>
* CHANGES.md updates based on feedback
Signed-off-by: Charles Pretzer <charles@buoyant.io>
* Fix flag name
Signed-off-by: Charles Pretzer <charles@buoyant.io>
Fixes#3444Fixes#3443
## Background and Behavior
This change adds support for the destination service to resolve Get requests which contain a service clusterIP or pod ip as the `Path` parameter. It returns the stream of endpoints, just as if `Get` had been called with the service's authority. This lays the groundwork for allowing the proxy to TLS TCP connections by allowing the proxy to do destination lookups for the SO_ORIG_DST of tcp connections. When that ip address corresponds to a service cluster ip or pod ip, the destination service will return the endpoints stream, including the pod metadata required to establish identity.
Prior to this change, attempting to look up an ip address in the destination service would result in a `InvalidArgument` error.
Updating the `GetProfile` method to support ip address lookups is out of scope and attempts to look up an ip address with the `GetProfile` method will result in `InvalidArgument`.
## Implementation
We do this by creating a `IPWatcher` which wraps the `EndpointsWatcher` and supports lookups by ip. `IPWatcher` maintains a mapping up clusterIPs to service ids and translates subscriptions to an IP address into a subscription to the service id using the underlying `EndpointsWatcher`.
Since the service name is no longer always infer-able directly from the input parameters, we restructure `EndpointTranslator` and `PodSet` so that we propagate the service name from the endpoints API response.
## Testing
This can be tested by running the destination service locally, using the current kube context to connect to a Kubernetes cluster:
```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Then lookups can be issued using the destination client:
```
go run controller/script/destination-client/main.go -path 192.168.54.78:80 -method get -addr localhost:8086
```
Service cluster ips and pod ips can be used as the `path` argument.
Signed-off-by: Alex Leong <alex@buoyant.io>
This release adds a defense mechanism to ensure that resolutions are
released when the associated balancer becomes idle and should have
been dropped from the proxy.
Furthermore, the proxy is now more selective as to which gRPC status
codes are considered "failures" in metrics.
---
* Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395)
* discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
The Kubernetes docs recommend a common set of labels for resources:
https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/#labels
Add the following 3 labels to all control-plane workloads:
```
app.kubernetes.io/name: controller # or destination, etc
app.kubernetes.io/part-of: Linkerd
app.kubernetes.io/version: edge-X.Y.Z
```
Fixes#3816
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Inject preStop hook into the proxy sidecar container to stop it last
This commit adds support for a Graceful Shutdown technique that is used
by some Kubernetes administrators while the more perspective
configuration is being discussed in
https://github.com/kubernetes/kubernetes/issues/65502
The problem is that RollingUpdate strategy does not guarantee that all
traffic will be sent to a new pod _before_ the previous pod is removed.
Kubernetes inside is an event-driven system and when a pod is being
terminating, several processes can receive the event simultaneously.
And if an Ingress Controller gets the event too late or processes it
slower than Kubernetes removes the pod from its Service, users requests
will continue flowing into the black whole.
According [to the documentation](https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods)
> 1. If one of the Pod’s containers has defined a `preStop` hook,
> it is invoked inside of the container. If the `preStop` hook is still
> running after the grace period expires, step 2 is then invoked with
> a small (2 second) extended grace period.
>
> 2. The container is sent the `TERM` signal. Note that not all
> containers in the Pod will receive the `TERM` signal at the same time
> and may each require a preStop hook if the order in which
> they shut down matters.
This commit adds support for the `preStop` hook that can be configured
in three forms:
1. As command line argument `--wait-before-exit-seconds` for
`linkerd inject` command.
2. As `linkerd2` Helm chart value `Proxy.WaitBeforeExitSeconds`.
2. As `config.alpha.linkerd.io/wait-before-exit-seconds` annotation.
If configured, it will add the following preHook to the proxy container
definition:
```yaml
lifecycle:
preStop:
exec:
command:
- /bin/bash
- -c
- sleep {{.Values.Proxy.WaitBeforeExitSeconds}}
```
To achieve max benefit from the option, the main container should have
its own `preStop` hook with the `sleep` command inside which has
a smaller period than is set for the proxy sidecar. And none of them
must be bigger than `terminationGracePeriodSeconds` configured for the
entire pod.
An example of a rendered Kubernetes resource where
`.Values.Proxy.WaitBeforeExitSeconds` is equal to `40`:
```yaml
# application container
lifecycle:
preStop:
exec:
command:
- /bin/bash
- -c
- sleep 20
# linkerd-proxy container
lifecycle:
preStop:
exec:
command:
- /bin/bash
- -c
- sleep 40
terminationGracePeriodSeconds: 160 # for entire pod
```
Fixes#3747
Signed-off-by: Eugene Glotov <kivagant@gmail.com>
Replaces theme.spacing.unit in the TapQueryForm component, which is deprecated,
with theme.spacing(1), as part of the upgrade to Material-UI v4.
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
This PR pauses the network activity when the dashboard is not visible, resuming
it as soon as the user goes back to it. To do that, we are using the
react-page-visibility library.
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
This PR updates Material-UI from v3.6.1 to v4.7.1. The Material-UI
icon library has also been updated from v3.0.1 to v4.5.1.
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
Closes#3483.
This PR refactors and simplifies breadcrumb text pluralization. The redesigned
dashboard added a view that shows the user a list of all pods, deployments, etc.
in a namespace. The breadcrumb navigation text needed to be tweaked to correctly
pluralize the resource type selected.
Closes#3764.
This PR fixes an issue where the dashboard would cut off the bottom of the
Community Updates posts (displayed in an iframe) if the browser height was
shorter than the height of the iframe. Related to [#605 in the linkerd website
repo](https://github.com/linkerd/website/pull/605).
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
* Added checks for cert correctness
* Add warning checks for approaching expiration
* Add unit tests
* Improve unit tests
* Address comments
* Address more comments
* Prevent upgrade from breaking proxies when issuer cert is overwritten (#3821)
* Address more comments
* Add gate to upgrade cmd that checks that all proxies roots work with the identitiy issuer that we are updating to
* Address comments
* Enable use of upgarde to modify both roots and issuer at the same time
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Closes#3778.
Fixes a formatting issue in the dashboard Tap/Top form where if a longer
resource name was selected, the placement of the buttons was off.
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
This PR addresses recent JS unit test failures on CI by:
* Upgrading yarn from 1.7.0 to 1.21.1 (current stable version) in the Dockerfile
and Github Actions workflow
* Wrapping the yarn installation with the --network-concurrency 1 flag, setting the
maximum number of concurrent network requests to 1, suggested as a fix here:
https://github.com/yarnpkg/yarn/issues/2629
v2.80.0 fixed a problem where the destination controller client's
connection receive window could become exhausted, preventing additional
updates from the controller. The connection window has been increased
from 64K to 1MB to prevent a single stalled stream from block others.
Furthermore, discovery for IP addresses has been disabled in the proxy,
as the control plane does not yet support these resolutions. This
additionally lessons the load on the destination controller client.
---
* profiles: Eagerly read profiles off the wire (linkerd/linkerd2-proxy#397)
* router: Ensure that the purge task completes (linkerd/linkerd2-proxy#396)
* app-core: Add `accept` context with peer addr (linkerd/linkerd2-proxy#398)
* Remove default for destination lookup subnets (linkerd/linkerd2-proxy#399)
* Configure the HTTP/2 connection window to 1MB (linkerd/linkerd2-proxy#400)
* Enable cert rotation test to work with dynamic namespaces
This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds.
The tests in #3775 were failing because the second secret holding the issuer cert replacement was a leaf cert and not a root/intermediary cert capable of signing the CSRs. This is how the replacement cert looked like:
```bash
$ k -n l5d-integration-external-issuer get secrets linkerd-identity-issuer-new -ojson | jq '.data|.["tls.crt"]' | tr -d '"' | base64 -d | step certificate inspect -
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 2 (0x2)
Signature Algorithm: ECDSA-SHA256
Issuer: CN=identity.l5d-integration-external-issuer.cluster.local
Validity
Not Before: Dec 6 19:16:08 2019 UTC
Not After : Dec 5 19:16:28 2020 UTC
Subject: CN=identity.l5d-integration-external-issuer.cluster.local
Subject Public Key Info:
Public Key Algorithm: ECDSA
Public-Key: (256 bit)
X:
93:d5:fa:f8:d1:44:4f:9a:8c:aa:0c:9e:4f:98:a3:
8d:28:d9:cc:f2:74:4c:5f:76:14:52:47:b9:fb:c9:
a3:33
Y:
d2:04:74:95:2e:b4:78:28:94:8a:90:b2:fb:66:1b:
e7:60:e5:02:48:d2:02:0e:4d:9e:4f:6f:e9:0a:d9:
22:78
Curve: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Server Authentication, TLS Web Client Authentication
X509v3 Subject Alternative Name:
DNS:identity.l5d-integration-external-issuer.cluster.local
Signature Algorithm: ECDSA-SHA256
30:46:02:21:00:f6:93:2f:10:ba:eb:be:bf:77:1a:2d:68:e6:
04:17:a4:b4:2a:05:80:f7:c5:f7:37:82:7b:b7:9c:a1:66:6a:
e1:02:21:00:b3:65:06:37:49:06:1e:13:98:7c:cf:f9:71:ce:
5a:55:de:f6:1b:83:85:b0:a8:88:b7:cf:21:d1:16:f2:10:f9
```
For it to be a root/intermediate cert it should have had `CA:TRUE` under the `X509v3 extensions` section.
Why did the test pass sometimes? When it did pass for me, I could see in the linkerd-identity proxy logs something like:
```
ERR! [ 320.964592s] linkerd2_proxy_identity::certify Received invalid ceritficate: invalid certificate: UnknownIssuer
```
so the cert retrieved from identity still was invalid but for some reason the proxy, sometimes, keeps on going despite that. And when one would delete the linkerd-identity pod, its proxy wouldn't come up at all, also showing that error.
With the changes from this branch, we no longer see that error in the logs and after deleting the linkerd-identity pod it comes back gracefully.
This PR adds support for CronJobs and ReplicaSets to `linkerd inject`, the web
dashboard and CLI. It adds a new Grafana dashboard for each kind of resource.
Closes#3614Closes#3630Closes#3584Closes#3585
Signed-off-by: Sergio Castaño Arteaga tegioz@icloud.com
Signed-off-by: Cintia Sanchez Garcia cynthiasg@icloud.com
This PR fixes a dashboard unit test added in #3666 that was passing, but
returning a warning.
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
* Pods with non empty securitycontext capabilities fail to be injected
Followup to #3744
The `_capabilities.tpl` template got its variables scope changed in
`Values.Proxy`, which caused inject to fail when security context
capabilities were detected.
Discovered when testing injecting the nginx ingress controller.
* Add identity-issuer-certificate-file and identity-issuer-key-file to upgrade command
Signed-off-by: zaharidichev <zaharidichev@gmail.com>
* Implement logic to use identity-trust-anchors-file flag to update the anchors
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address remarks
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* No need for `processYAML()` in `install`
Since `install` uses helm to do its proxy injection, there's no need to
call `processYAML`. This also fixes an issue discovered in #3687 where
we started supporting injection of cronjobs, and even though `linkerd`'s
namespace is flagged to skip automatic injection it was being injected.
This replaces #3773 as it's a much more simpler approach.
## edge-19.12.1
* CLI
* Added condition to the `linkerd stat` command that requires a window size
of at least 15 seconds to work properly with Prometheus
* Web UI
* Fixed a table wrap issue in the resource detail view that made sidebar
font size inconsistent
* Internal
* Fixed whitespace path handling in non-docker build scripts (thanks
@joakimr-axis!)
* Removed calico logutils dependency that was incompatible with go 1.13
* Updated Helm templates to use fully-qualified variable references based
upon Helm best practices (thanks @javaducky!)
* Added new browser tests for URL routing in dashboard
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This PR fixes issues with Octopus graph line styling in the dashboard, and improves
the UI of the collapsed neighbors display.
Closes#3577
Signed-off-by: Cintia Sanchez Garcia cynthiasg@icloud.com
This PR adds support for dynamic cert generation when running the cert rotation intergration tests. This allows to avoid baking in the namespace in the certificate CN, thereby allowing us to run these tests on the clouds.
* Enable cert rotation test to work with dynamic namespaces
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address comments
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* Address further comments
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR updates `react` and `react-dom` to version 16.11.0, and `react-router`
and `react-router-dom` to version 5.1.2.
The following breaking changes have been fixed as part of the upgrade:
- Change deprecated `componentWillUpdate` to `componentDidUpdate`
- Replace`react-url-query` library with `use-query-params` (a Hook) due
to the deprecation of some React lifecycle methods. This required some
changes in the Tap, Top, TapQueryForm and TopRoutes components.
Fixes#3617
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
This PR allows the dashboard to query for a resource's definition in YAML
format, if the boolean `queryForDefinition` in the `ResourceDetail` component is
set to true.
This change to the web API and the dashboard component was made for a future
redesigned dashboard detail page. At present, `queryForDefinition` is set to
false and there is no visible change to the user with this PR.
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com> Signed-off-by: Cintia
Sanchez Garcia <cynthiasg@icloud.com>
This PR adds two tests to test the dashboard's new navigation and routing
patterns: url-routing.js and namespace-select.js
It deletes the now-obsolete logo-redirect.js test.