## edge-21.4.1
This is a release candidate for `stable-2.10.1`!
This includes several fixes for the core installation as well as the Multicluster,
Jaeger, and Viz extensions. There are two significant proxy fixes that address
TLS detection and admin server failures.
Thanks to all our 2.10 users who helped discover these issues!
* Fixed TCP read and write bytes/sec calculations to group by label based on
  whether traffic is inbound or outbound
* Updated dashboard build to use webpack v5
* Modified the proxy-injector to add the opaque ports annotation to pods if
their namespace has it set
* Added CA certs to the Viz extension's `metrics-api` container so that it can
validate the certificate of an external Prometheus
* Fixed an issue where inbound TLS detection from non-meshed workloads could
break
* Fixed an issue where the admin server's HTTP detection would fail and not
recover; these are now handled gracefully and without logging warnings
* Aligned the Helm installation heartbeat schedule to match that of the CLI
* Fixed an issue with Multicluster's service mirror where its endpoint repair
  retries were not properly rate limited
* Removed components from the control plane dashboard that are now part of the
  Viz extension
* Fixed components in the Jaeger extension to set the correct Prometheus scrape
values
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5976
Currently, the Jaeger and Collector components in the jaeger extension
do not actually support metrics scraping because the relevant
ports are not exposed and the Prometheus annotations are not set
correctly.
This PR fixes those values.
Note that, by default, Prometheus in `linkerd-viz` does not
scrape jaeger metrics; additional configuration
has to be applied for that.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* multicluster: make service mirror honour `requeueLimit`
Fixes #5374
Currently, whenever the `gatewayAddress` is changed, the service
mirror component keeps trying to `repairEndpoints` (which is
invoked every `repairPeriod`). This behavior is fine and expected,
but because the service mirror does not currently honor `requeueLimit`,
it keeps requeuing the same event and retrying with no limit.
The condition that we use to limit requeues,
`if (rcsw.eventsQueue.NumRequeues(event) < rcsw.requeueLimit)`, does
not work for the following reason:
- For the queue to actually track requeues, `AddRateLimited` has to be
  used instead, which makes `NumRequeues` return the real
  number of requeues for a specific event.
This change updates the requeuing logic to use `AddRateLimited` instead
of `Add`.
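For illustration, a minimal sketch of this retry pattern using client-go's rate-limiting workqueue; the type and field names here are hypothetical, only the workqueue calls are real client-go API:
```go
package servicemirror

import (
	log "github.com/sirupsen/logrus"
	"k8s.io/client-go/util/workqueue"
)

// worker is an illustrative stand-in for the service mirror's cluster watcher.
type worker struct {
	eventsQueue  workqueue.RateLimitingInterface
	requeueLimit int
}

func (w *worker) processNextEvent(handle func(interface{}) error) bool {
	event, shutdown := w.eventsQueue.Get()
	if shutdown {
		return false
	}
	defer w.eventsQueue.Done(event)

	if err := handle(event); err != nil {
		if w.eventsQueue.NumRequeues(event) < w.requeueLimit {
			log.Infof("Requeues: %d, Limit: %d for event %v",
				w.eventsQueue.NumRequeues(event), w.requeueLimit, event)
			// AddRateLimited (unlike plain Add) records the requeue, so
			// NumRequeues reflects the real retry count for this event.
			w.eventsQueue.AddRateLimited(event)
			log.Errorf("Error processing %v (will retry): %s", event, err)
		} else {
			log.Errorf("Error processing %v (giving up): %s", event, err)
			// Forget clears the rate limiter's tracking for this event.
			w.eventsQueue.Forget(event)
		}
		return true
	}
	// Success: reset the retry counter so future failures start from zero.
	w.eventsQueue.Forget(event)
	return true
}
```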
After these changes, the logs in the service mirror are as follows:
```bash
time="2021-03-30T16:52:31Z" level=info msg="Received: OnAddCalled: {svc: Service: {name: grafana, namespace: linkerd-viz, annotations: [[linkerd.io/created-by=linkerd/helm git-0e2ecd7b]], labels [[linkerd.io/extension=viz]]}}" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:52:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 0, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 1, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 2, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (will retry): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Received: RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=warning msg="Error resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=info msg="Requeues: 3, Limit: 3 for event RepairEndpoints" apiAddress="https://172.18.0.4:6443" cluster=remote
time="2021-03-30T16:53:31Z" level=error msg="Error processing RepairEndpoints (giving up): Inner errors:\n\tError resolving 'foobar': lookup foobar on 10.43.0.10:53: no such host" apiAddress="https://172.18.0.4:6443" cluster=remote
```
As seen, `RepairEndpoints` is called every `repairPeriod`, which
is 1 minute by default. Whenever a failure happens, it is retried,
but now the failures are tracked and the event is given up once it
reaches the `requeueLimit`, which is 3 by default.
This also fixes the requeuing logic for all types of events,
not just `repairEndpoints`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* tests: include viz extension in upgrade-* integration tests
Currently, the source version of viz is installed even for
upgrade integration tests. Instead, just like the core
control-plane, the viz extension should be started at the stable/edge
version and then upgraded to the source version.
This helps discover any upgrade issues for the viz extension.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* Schedule heartbeat 10 mins after install
... for the Helm installation method, thus aligning it with the CLI
installation method, to reduce the midnight peak on the receiving end.
The logic added into the chart is now reused by the CLI as well.
Also, set `concurrencyPolicy=Replace` so that when a job fails and it's
retried, the retries get canceled when the next scheduled job is triggered.
Finally, the Go client only failed when the connection failed;
successful connections with a non-200 response status were considered
successful and thus the job wasn't retried. Fixed that as well.
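A minimal sketch of that status-code check; names and the URL are illustrative, not the actual heartbeat code:
```go
package main

import (
	"fmt"
	"net/http"
)

// checkHeartbeat treats any non-200 response as an error so the CronJob's
// retry logic actually kicks in, rather than only failing on connection errors.
func checkHeartbeat(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err // connection-level failure
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("heartbeat request returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// Illustrative endpoint, not the real heartbeat URL.
	if err := checkHeartbeat("https://versioncheck.example.com/version"); err != nil {
		fmt.Println("heartbeat failed:", err)
	}
}
```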
This release fixes two issues:
1. The inbound proxy could break non-meshed TLS connections when the
initial ClientHello message was larger than 512 bytes or when the
entire message was not received in the first data packet of the
connection. TLS detection has been fixed to ensure that the entire
message is preserved in these cases.
2. The admin server could emit warnings about HTTP detection failing in
some innocuous situations, such as when the socket closes before
a request is sent. These situations are now handled gracefully
without logging warnings.
---
* Update MAINTAINERS to point at the main repo (linkerd/linkerd2-proxy#950)
* outbound: Configure endpoint construction in logical stack (linkerd/linkerd2-proxy#949)
* outbound: Decouple the TCP connect stack from the target type (linkerd/linkerd2-proxy#951)
* outbound: Make HTTP endpoint stack generic on its target (linkerd/linkerd2-proxy#952)
* outbound: Make the HTTP server stack generic (linkerd/linkerd2-proxy#953)
* Update profile response to include a logical address (linkerd/linkerd2-proxy#954)
* inbound, outbound: `Param`-ify `listen::Addrs` (linkerd/linkerd2-proxy#955)
* tls: Fix inbound I/O when TLS detection fails (linkerd/linkerd2-proxy#958)
* tls: Test SNI detection (linkerd/linkerd2-proxy#959)
* admin: Handle connections that fail protocol detection (linkerd/linkerd2-proxy#960)
Fixes #5966, fixes #5955
The metrics-api container in the Viz extension does not have the default set of system CA certificates installed. This means that it will fail to validate the certificate of an external Prometheus server over HTTPS.
We now install the default CA certs into the container.
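For context, a short sketch of why the system roots matter: Go's TLS verification relies on the system certificate pool, so an image without CA certs cannot verify an external Prometheus. The URL and client setup below are illustrative:
```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
)

func main() {
	// SystemCertPool reads the CA bundle baked into the container image; if
	// ca-certificates is missing, verifying the external Prometheus fails
	// with "x509: certificate signed by unknown authority".
	roots, err := x509.SystemCertPool()
	if err != nil {
		log.Fatal(err)
	}
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: roots},
		},
	}
	// Illustrative URL standing in for an external Prometheus over HTTPS.
	resp, err := client.Get("https://prometheus.example.com/api/v1/query?query=up")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```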
Signed-off-by: Alex Leong <alex@buoyant.io>
### What
When a namespace has the opaque ports annotation, pods and services should
inherit it if they do not have one themselves. Currently, services do this but
pods do not. This can lead to surprising behavior where services are correctly
marked as opaque, but pods are not.
This changes the proxy-injector so that it now passes down the opaque ports
annotation to pods from their namespace if they do not have their own annotation
set. Closes #5736.
### How
The proxy-injector webhook receives admission requests for pods and services.
Regardless of the resource kind, it now checks if the resource should inherit
the opaque ports annotation from its namespace. It should inherit it if the
namespace has the annotation but the resource does not.
If the resource should inherit the annotation, the webhook creates an annotation
patch which is only responsible for adding the opaque ports annotation.
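A rough sketch of that inheritance check, with hypothetical names rather than the actual webhook code:
```go
package inject

const opaquePortsAnnotation = "config.linkerd.io/opaque-ports"

// inheritOpaquePorts returns the namespace's opaque-ports value and whether an
// annotation patch is needed: only when the namespace sets the annotation and
// the pod or service does not set its own.
func inheritOpaquePorts(nsAnnotations, workloadAnnotations map[string]string) (string, bool) {
	nsValue, nsHas := nsAnnotations[opaquePortsAnnotation]
	_, workloadHas := workloadAnnotations[opaquePortsAnnotation]
	if nsHas && !workloadHas {
		return nsValue, true
	}
	return "", false
}
```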
After generating the annotation patch, it checks if the resource is injectable.
From here there are a few scenarios:
1. If no annotation patch was created and the resource is not injectable, then
admit the request with no changes. Examples of this are services with no OP
annotation and inject-disabled pods with no OP annotation.
2. If the resource is a pod and it is injectable, create a patch that includes
the proxy and proxy-init containers—as well as any other annotations and
labels.
3. Otherwise, a patch has been generated at this point, so it is returned no
matter the resource kind.
### UI changes
Resources are now reported to either be "injected", "skipped", or "annotated".
The first pass at this PR worked around the fact that injection reports consider
services and namespaces injectable. This is not accurate because they don't have
pod templates that could be injected; they can however be annotated.
To fix this, an injection report now considers resources "annotatable" and uses
this to clean up some logic in the `inject` command, as well as avoid a more
complex proxy-injector webhook.
What's cool about this is it fixes some `inject` command output that would label
resources as "injected" when they were not even mutated. For example, namespaces
were always reported as being injected even if annotations were not added. Now,
it will properly report that a namespace has been "annotated" or "skipped".
### Tests
For testing, unit tests and integration tests have been added. Manual testing
can be done by installing linkerd with `debug` controller log levels, and
tailing the proxy-injector's app container when creating pods or services.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change upgrades the current webpack version used to build the
dashboard to version 5. This upgrade required an update to some
deprecated JS build libraries (i.e. `@babel-eslint` ->
`@babel/eslint-parser` and `eslint-loader` -> `eslint-webpack-plugin`).
Upgrading these packages also surfaced some new linting rules. Some of
the files have been updated to fix them, while other linting rules
were disabled and will be fixed in a follow-up PR.
This change also includes updates to the dashboard `Dockerfile` to
update the node and yarn versions used to build the web assets.
Fixes #5945
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
Add peer label to TCP read and write stat queries
Closes #5693
### Tests
---
After refactoring, `linkerd viz stat` behaves the same way (I haven't checked gateways or routes).
```
$ linkerd viz stat deploy/web -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 91.91% 2.3rps 2ms 4ms 5ms 3 185.3B/s 5180.0B/s
# same value as before, latency seems to have dropped
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
# queries show the peer label
---
$ linkerd viz stat deploy/web -n emojivoto --from deploy/vote-bot -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 93.16% 1.9rps 3ms 4ms 4ms 1 4503.4B/s 153.1B/s
# stats same as before except for latency which seems to have dropped a bit
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
# queries show the right label
```
Signed-off-by: mateiidavid <matei.david.35@gmail.com>
* Changelog for edge-21.3.4
This release fixes some issues around publishing of the CLI binary
for Apple Silicon M1 chips. This release also includes some fixes and
improvements to the dashboard, destination, and the CLI.
* Fixed an issue where the topology graph in the dashboard was no longer
draggable
* Updated the IP Watcher in destination to ignore pods in "Terminating" state
(thanks @Wenliang-CHEN!)
* Added `installNamespace` toggle in the jaeger extension's install.
(thanks @jijeesh!)
* Updated `healthcheck` pkg to have `hintBaseURL` configurable, useful
for external extensions using that pkg
* Added multi-arch support for RabbitMQ integration tests (thanks @barkardk!)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change updates the bin/web script to work with linkerd viz's
`metrics-api`. Previously, the script was still pointing to the
linkerd `controller` pod which no longer has the public-api to get
metrics.
This change also fixes the script's port-forward function so that we can
point to both the `linkerd-controller` and the `metrics-api`; the broken
port forwarding was causing the linkerd web development environment to
fail to start up when running the script.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Fixes #5939
Some CNIs reassign the IP of a terminating pod to a new pod, which
leads to duplicate IPs in the cluster.
It eventually triggers #5939.
This commit makes the IPWatcher, when given an IP, filter out terminating pods
(those that have a deletionTimestamp set).
The issue is hard to reproduce because we are not able to assign a
particular IP to a pod manually.
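A minimal sketch of such a filter, with an illustrative helper name:
```go
package watcher

import corev1 "k8s.io/api/core/v1"

// skipTerminating drops pods that have been marked for deletion, since some
// CNIs can reassign their IPs to new pods before they are fully removed.
func skipTerminating(pods []*corev1.Pod) []*corev1.Pod {
	running := make([]*corev1.Pod, 0, len(pods))
	for _, pod := range pods {
		if pod.DeletionTimestamp != nil {
			continue // terminating: its IP may already belong to another pod
		}
		running = append(running, pod)
	}
	return running
}
```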
Signed-off-by: Bruce <wenliang.chen@personio.de>
Co-authored-by: Bruce <wenliang.chen@personio.de>
This PR adds a new field, `installNamespace`, to the jaeger
extension's `values.yaml`, used
to toggle the presence of the namespace manifest.
This is useful when installing/upgrading into a
custom namespace and follows the same pattern
as the other extensions.
Signed-off-by: jijeesh <jijeesh.ka@gmail.com>
`CheckDeployments()` verified that some deployment had the appropriate
number of replicas in the Ready state. `CheckPods()` does the same, plus
checking if there were restarts (a single restart returns
`RestartCountError` which only triggers a warning on the calling side,
more restarts trigger a regular error). We were always calling the
former followed by a call to the latter, which is superfluous, so we're
getting rid of the former.
Also, the `testutil.DeploySpec` struct had a `Containers` field for
checking the name of containers, but that wasn't always checked and
didn't really represent a potential error that wouldn't be clearly
manifested otherwise (like in golden files), so that was simplified as
well.
* checks: make hintBaseURL configurable
Fixes #5740
Currently, when external binaries use the `healthcheck` pkg, they
can't really override the `hintBaseURL` variable, which is used
to set the base URL of the troubleshooting doc (in which `hintAnchor`
is used as an anchor).
This PR updates that by adding a new `hintBaseURL` field to
`healthcheck.Category`. This field has been added to `Category`
instead of `Checker` so as to reduce unnecessary redundancy, as
the chances of setting a different base URL for checkers under
the same category are very low.
This PR also updates the `runChecks` logic to automatically fall back
to the Linkerd troubleshooting doc as the base URL so that
the current checks don't have to set this field. This can be done
the other way around (i.e. removing the fallback logic and making all
categories specify the URL) if that turns out to be a problem.
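A minimal sketch of the fallback idea, using illustrative stand-in types rather than the actual `healthcheck` API:
```go
package healthcheck

const defaultHintBaseURL = "https://linkerd.io/checks/#"

// category is an illustrative stand-in for healthcheck.Category.
type category struct {
	name        string
	hintBaseURL string
}

// hintURL builds the troubleshooting link for a checker's hint anchor,
// falling back to the Linkerd docs when a category does not set its own base.
func (c category) hintURL(hintAnchor string) string {
	base := c.hintBaseURL
	if base == "" {
		base = defaultHintBaseURL
	}
	return base + hintAnchor
}
```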
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change fixes an issue where the default linkerd MacOS binary would
not get created by `docker-build-cli-bin`. This was caused by the
`Darwin/arm64` build step overwriting the binary created by the previous
`Darwin/amd64` build step. Tested the change on MacOS and confirmed that
both the default amd64 and arm64 versions are built.
Fixes #5933
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
This release includes various bug fixes and improvements to the CLI, the
identity and destination control plane components as well as the proxy. This
release also ships with a new CLI binary for Apple Silicon M1 chips.
* Updated `helm-upgrade` and `upgrade-stable` integration tests now that 2.10
has been released
* Added new RabbitMQ integration tests (thanks @barkardk!)
* Updated the Go version to 1.16.2
* Fixed an issue where the `linkerd identity` command returned the root
certificate of a pod instead of its leaf certificate
* Fixed an issue where the destination service would respond with too
big of a header and result in http2 protocol errors
* Updated `docker-build-cli-bin` to build Darwin Arm64 binaries. This
now adds support for running Linkerd CLI on Apple Silicon M1 chips
* Improved error messaging when trying to install Linkerd on a cluster
that already had Linkerd installed
* Fixed an issue where the `destination` control plane component sometimes
  returned endpoint addresses with a `0` port number while pods were
  undergoing a rollout
* Added a loading spinner to the `linkerd check` command when running extension
checks
* Fixed an issue where pod lookups by host IP and host port fail even though
the cluster has a matching pod
* Control plane proxies no longer emit warnings about the resolution stream
ending. This error was innocuous.
* Fixed an issue where proxies could infinitely retry failed requests to
the `destination` controller when it returned a `FailedPrecondition`
* The proxy's logging infrastructure has been updated to reduce memory
pressure in high-connection environments.
This release includes several stability improvements, following initial
feedback from the stable-2.10.0 release:
* The control plane proxies no longer emit warnings about the resolution
stream ending. This error was innocuous.
* The proxy's logging infrastructure has been updated to avoid including
client addresses in cached logging spans. Now, client addresses are
preserved to be included in warning logs. This should reduce memory
pressure in high-connection environments.
* The proxy could infinitely retry failed requests to the destination
controller when it returned a FailedPrecondition, indicating an
unexpected cluster state. These errors are now handled gracefully.
---
* Prevent fixed-address resolutions from ending (linkerd/linkerd2-proxy#945)
* http: Parameterize authority overriding (linkerd/linkerd2-proxy#946)
* tracing: Avoid high-cardinality client in INFO spans (linkerd/linkerd2-proxy#947)
* Handle FailedPrecondition errors from the control plane (linkerd/linkerd2-proxy#948)
This fixes an issue where pod lookups by host IP and host port fail even though
the cluster has a matching pod.
Usually these manifested as `FailedPrecondition` errors, but the messages were
too long and resulted in http/2 errors. This change depends on #5893 which fixes
that separate issue.
This changes how often those `FailedPrecondition` errors actually occur. The
destination service now considers pod host IPs and should reduce the frequency
of those errors.
Closes #5881
---
Lookups like this happen when a pod is created with a host IP and host port set
in its spec. It still has a pod IP when running, but requests to
`hostIP:hostPort` will also be redirected to the pod. Combinations of host IP
and host Port are unique to the cluster and enforced by Kubernetes.
Currently, the destination service fails to find pods in this scenario because
we only keep an index of pods and their pod IPs, not pods and their host IPs.
To fix this, we now also keep an index of pods and their host IPs, if and only
if they have the host IP set.
Now when doing a pod lookup, we consider both the IP and the port. We perform
the following steps (a simplified sketch in code follows the list):
1. Do a lookup by IP in the pod podIP index
- If only one pod is found then return it
2. 0 or more than 1 pods have the same pod IP
3. Do a lookup by IP in the pod hostIP index
- If any number of pods were found, we know that IP maps to a node IP.
Therefore, we search for a pod with a matching host Port. If one exists then
return it; if not then there is no pod that matches `hostIP:port`
4. The IP does not map to a host IP
5. If multiple pods were found in `1`, then we know there are pods with
conflicting podIPs and an error is returned
6. If no pods were found in `1` then there is no pod that matches `IP:port`
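A simplified sketch of the decision tree above, with illustrative index and function names rather than the destination service's actual code:
```go
package watcher

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podForIPPort resolves an address using two indices: pods keyed by pod IP
// and pods keyed by host IP (only pods with a host IP appear in the latter).
func podForIPPort(byPodIP, byHostIP map[string][]*corev1.Pod, ip string, port uint32) (*corev1.Pod, error) {
	podIPPods := byPodIP[ip]
	if len(podIPPods) == 1 {
		return podIPPods[0], nil // unambiguous pod IP match
	}

	// 0 or more than 1 pods share this pod IP: check whether it is a node IP.
	if hostIPPods := byHostIP[ip]; len(hostIPPods) > 0 {
		for _, pod := range hostIPPods {
			for _, c := range pod.Spec.Containers {
				for _, p := range c.Ports {
					if uint32(p.HostPort) == port {
						return pod, nil // hostIP:hostPort is unique in the cluster
					}
				}
			}
		}
		return nil, nil // node IP, but no pod exposes this host port
	}

	if len(podIPPods) > 1 {
		return nil, fmt.Errorf("pod IP conflict for %s", ip)
	}
	return nil, nil // no pod matches IP:port
}
```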
---
Aside from the additional IP watcher test being added, this can be tested with
the following steps:
1. Create a kind cluster. kind is required because its pods in `kube-system`
   have the same pod IPs; this is not the case with k3d: `bin/kind create cluster`
2. Install Linkerd with `4445` marked as opaque: `linkerd install --set
   proxy.opaquePorts="4445" |kubectl apply -f -`
3. Get the node IP: `kubectl get -o wide nodes`
4. Pull my fork of `tcp-echo`:
   ```
   $ git clone https://github.com/kleimkuhler/tcp-echo
   ...
   $ git checkout --track kleimkuhler/host-pod-repro
   ```
5. `helm package .`
6. Install `tcp-echo` with the server not injected and the correct host IP: `helm
   install tcp-echo tcp-echo-0.1.0.tgz --set server.linkerdInject="false" --set
   hostIP="..."`
7. Looking at the client's proxy logs, you should not observe any errors or
   protocol detection timeouts.
8. Looking at the server logs, you should see all the requests coming through
   correctly.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5637
Currently, when extension checks are running through `linkerd check`,
we don't show any status. This means that when the check
is run directly after an extension install, the CLI looks like it
got stuck somewhere with no status, as per the issue description.
This PR updates the code to show a spinner with the name of the
extension whose check is being run.
```bash
control-plane-version
---------------------
control plane is up-to-date
control plane and cli versions match
control plane running stable-2.10.0 but cli running dev-e5ff84ce-tarun
see https://linkerd.io/checks/#l5d-version-control for hints
Status check results are
Linkerd extensions checks
=========================
linkerd-jaeger
--------------
linkerd-jaeger extension Namespace exists
collector and jaeger service account exists
collector config map exists
jaeger extension pods are injected
jaeger extension pods are running
Status check results are
/ Running viz extension check
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
# Problem
While rolling out, often not all pods will be ready on the same set of
ports, leading the Kubernetes Endpoints API to return multiple subsets,
each covering a different set of ports, with the end result that the
same address gets repeated across subsets.
The old code for endpointsToAddresses would loop through all subsets, and the
later occurrences of an address would overwrite previous ones, with the
last one prevailing.
If the last subset happened to be for an irrelevant port, and the port to
be resolved is named, resolveTargetPort would resolve to port 0, which would
return port 0 to clients, ultimately leading linkerd-proxy to forward
connections to port 0.
This only happens if the pods selected by a service expose > 1 port, the
service maps to > 1 of these ports, and at least one of these ports is named.
# Solution
Never write an address to the set of addresses if the resolved port is 0, which
indicates that named port resolution failed.
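A minimal sketch of that guard, with illustrative names; the real logic lives in the destination service's endpoint translation:
```go
package watcher

import corev1 "k8s.io/api/core/v1"

// resolveTargetPort returns 0 when a named port cannot be resolved against
// the given subset, mirroring the failure mode described above.
func resolveTargetPort(port string, subset corev1.EndpointSubset) uint32 {
	for _, p := range subset.Ports {
		if p.Name == port {
			return uint32(p.Port)
		}
	}
	return 0
}

// addAddresses skips any address whose port resolved to 0 so a later,
// irrelevant subset can never overwrite a good entry with port 0.
func addAddresses(addrs map[string]uint32, subset corev1.EndpointSubset, namedPort string) {
	resolved := resolveTargetPort(namedPort, subset)
	if resolved == 0 {
		return // named port not present in this subset; do not record it
	}
	for _, a := range subset.Addresses {
		addrs[a.IP] = resolved
	}
}
```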
# Validation
Added a test case.
Signed-off-by: Riccardo Freixo <riccardofreixo@gmail.com>
* cli: update err msg when control-plane already exists.
Fixes #5889
Currently, the error message shown when the control plane already
exists suggests using `--ignore-cluster`. We should instead
suggest that users run `linkerd upgrade`.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This change modifies the Linkerd build scripts to start building darwin
arm64 CLI binaries. This is to ensure architecture compatibility with the
M1 Apple CPU.
Fixes #5886
This reduces the possible HTTP response size from the destination service when
it encounters an error during a profile lookup.
If multiple objects on a cluster share the same IP (such as pods in
`kube-system`), the destination service will return an error with the two
conflicting pod yamls.
In certain cases, these pod yamls can be too large for the HTTP response and the
destination pod's proxy will indicate that with the following error:
```
hyper::proto::h2::server: send response error: user error: header too big
```
From the app pod's proxy, this results in the following error:
```
poll_profile: linkerd_service_profiles::client: Could not fetch profile error=status: Unknown, message: "http2 error: protocol error: unexpected internal error encountered"
```
We now only return the conflicting pods' (or services') names. This reduces the
size of the returned error and prevents these warnings from occurring.
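A hedged sketch of how such a smaller error might be built with gRPC's status package; names are illustrative:
```go
package destination

import (
	"strings"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	corev1 "k8s.io/api/core/v1"
)

// podConflictError returns a FailedPrecondition whose message lists only the
// conflicting pods' namespace/name, instead of embedding their full manifests.
func podConflictError(pods []*corev1.Pod) error {
	names := make([]string, 0, len(pods))
	for _, p := range pods {
		names = append(names, p.Namespace+"/"+p.Name)
	}
	return status.Errorf(codes.FailedPrecondition,
		"Pod IP address conflict: %s", strings.Join(names, ", "))
}
```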
Example response error:
```
poll_profile: linkerd_service_profiles::client: Could not fetch profile error=status: FailedPrecondition, message: "Pod IP address conflict: kube-system/kindnet-wsflq, kube-system/kube-scheduler-kind-control-plane", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 12 Mar 2021 19:54:09 GMT"} }
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fix various lint issues:
- Remove unnecessary calls to fmt.Sprint
- Fix check for empty string
- Fix unnecessary calls to Printf
- Combine multiple `append`s into a single call
Signed-off-by: shubhendra <withshubh@gmail.com>
* update go.mod and docker images to go 1.16.1
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* update test error messages for ParseDuration
* update go version to 1.16.2
Add integration tests for external resources
Linkerd changes sometimes cause regressions in external components that users have installed in their stacks, such as RabbitMQ.
This PR adds a new integration test and the functionality to install additional components and run basic tests on them, to make sure Linkerd changes do not have an adverse effect on RabbitMQ.
This change includes the following steps:
- Deploy a RabbitMQ server
- Deploy a RabbitMQ client; the code for that client is hosted at https://github.com/barkardk/integration
- Use a golden file for output verification
- Use the test suite to inject linkerd into the deployments
- The client then creates a queue and a message, consumes said message, and sends a log output.
Signed-off-by: Kristin Barkardottir <kristin.barkardottir@netapp.com>
* Fix helm-upgrade integration test
Update `install_test.go` now that the upgrade test is done from 2.10.
This also implied installing viz right after core.
Refactored `HelmInstallPlain()` into `HelmCmdPlain()` to work for both
helm install and upgrade.
* Add expected heartbeat config entry to the upgrade-stable test, and remove testCheckCommand arg (for lint)
* Update BUILD.md
Fixes #5063
- Completed table of contents.
- Added references to viz and other extensions where appropriate.
- Updated components chart.
- Removed build architecture chart, too big and convoluted now to be of use ¯\_(ツ)_/¯
- Moved the `go test` paragraphs into `TEST.md`, and moved the linting paragraph out of `TEST.md`.
- Group Helm paragraphs together.
- Deleted the "Go modules and dependencies" paragraph. We can assume most people are already familiar with go modules.
- Assume we're always using buildkit
- Add notes about restarting the CP for tap and tracing to be enabled
This release introduces Linkerd extensions. The default control plane no longer
includes Prometheus, Grafana, the dashboard, or several other components that
previously shipped by default. This results in a much smaller and simpler set
of core functionality. Visibility and metrics functionality is now available
in the Viz extension under the `linkerd viz` command. Cross-cluster
communication functionality is now available in the Multicluster extension
under the `linkerd multicluster` command. Distributed tracing functionality is
now available in the Jaeger extension under the `linkerd jaeger` command.
This release also introduces the ability to mark certain ports as "opaque",
indicating that the proxy should treat the traffic as opaque TCP instead of
attempting protocol detection. This allows the proxy to provide TCP metrics
and mTLS for server-speaks-first protocols. It also enables support for
TCP traffic in the Multicluster extension.
**Upgrade notes**: Please see the [upgrade
instructions](https://linkerd.io/2/tasks/upgrade/#upgrade-notice-stable-2100).
* Proxy
* Updated the proxy to use TLS version 1.3; support for TLS 1.2 remains
enabled for compatibility with prior proxy versions
* Improved support for server-speaks-first protocols by allowing ports to be
marked as opaque, causing the proxy to skip protocol detection. Ports can
be marked as opaque by setting the `config.linkerd.io/opaque-ports`
annotation on the Pod and Service or by using the `--opaque-ports` flag with
`linkerd inject`
* Ports `25,443,587,3306,5432,11211` have been removed from the default skip
ports; all traffic through those ports is now proxied and handled opaquely
by default
* Fixed an issue that could cause the inbound proxy to fail meshed HTTP/1
requests from older proxies (from the stable-2.8.x vintage)
* Added a new `/shutdown` admin endpoint that may only be accessed over the
loopback network allowing batch jobs to gracefully terminate the proxy on
completion
* Control Plane
* Removed all components and functionality related to visibility, tracing,
or multicluster. These have been moved into extensions
* Changed the identity controller to receive the trust anchor via environment
variable instead of by flag; this allows the certificate to be loaded from a
config map or secret (thanks @mgoltzsche!)
* Added PodDisruptionBudgets to the control plane components so that they
cannot be all terminated at the same time during disruptions
(thanks @tustvold!)
* Added missing label to the `linkerd-config-overrides` secret to avoid
breaking upgrades performed with the help of `kubectl apply --prune`
* Fixed an issue where the `proxy-injector` and `sp-validator` did not refresh
their certs automatically when provided externally—like through cert-manager
* CLI
* Changed the `check` command to include each installed extension's `check`
output; this allows users to check for proper configuration and installation
of Linkerd without running a command for each extension
* Moved the `metrics`, `endpoints`, and `install-sp` commands into subcommands
under the `diagnostics` command
* Added an `--opaque-ports` flag to `linkerd inject` to easily mark ports
as opaque.
* Added the `repair` command which will repopulate resources needed for
properly upgrading a Linkerd installation
* Added Helm-style `set`, `set-string`, `values`, `set-files` customization
flags for the `linkerd install` command
* Introduced the `linkerd identity` command, used to fetch the TLS certificates
for injected pods (thanks @jimil749)
* Removed the `get` and `logs` command from the CLI
* Helm
* Changed many Helm values, please see the upgrade notes
* Viz
* Updated the Web UI to only display the "Gateway" sidebar link when the
multicluster extension is active
* Added a `linkerd viz list` command to list pods with tap enabled
* Fixed an issue where the `tap` APIServer would not refresh its certs
automatically when provided externally—like through cert-manager
* Multicluster
* Added support for cross-cluster TCP traffic
* Updated the service mirror controller to copy the
`config.linkerd.io/opaque-ports` annotation when mirroring services so that
cross-cluster traffic can be correctly handled as opaque
* Added support for multicluster gateways of types other than LoadBalancer
(thanks @DaspawnW!)
* Jaeger
* Added a `linkerd jaeger list` command to list pods with tracing enabled
* Other
* Docker images are now hosted on the `cr.l5d.io` registry
This release includes changes from a massive list of contributors. A special
thank-you to everyone who helped make this release possible:
[Lutz Behnke](https://github.com/cypherfox)
[Björn Wenzel](https://github.com/DaspawnW)
[Filip Petkovski](https://github.com/fpetkovski)
[Simon Weald](https://github.com/glitchcrab)
[GMarkfjard](https://github.com/GMarkfjard)
[hodbn](https://github.com/hodbn)
[Hu Shuai](https://github.com/hs0210)
[Jimil Desai](https://github.com/jimil749)
[jiraguha](https://github.com/jiraguha)
[Joakim Roubert](https://github.com/joakimr-axis)
[Josh Soref](https://github.com/jsoref)
[Kelly Campbell](https://github.com/kellycampbell)
[Matei David](https://github.com/mateiidavid)
[Mayank Shah](https://github.com/mayankshah1607)
[Max Goltzsche](https://github.com/mgoltzsche)
[Mitch Hulscher](https://github.com/mhulscher)
[Eugene Formanenko](https://github.com/mo4islona)
[Nathan J Mehl](https://github.com/n-oden)
[Nicolas Lamirault](https://github.com/nlamirault)
[Oleh Ozimok](https://github.com/oleh-ozimok)
[Piyush Singariya](https://github.com/piyushsingariya)
[Naga Venkata Pradeep Namburi](https://github.com/pradeepnnv)
[rish-onesignal](https://github.com/rish-onesignal)
[Shai Katz](https://github.com/shaikatz)
[Takumi Sue](https://github.com/tkms0106)
[Raphael Taylor-Davies](https://github.com/tustvold)
[Yashvardhan Kukreja](https://github.com/yashvardhan-kukreja)
Signed-off-by: Alex Leong <alex@buoyant.io>
## edge-21.3.2
This edge release is another release candidate for stable 2.10 and fixes some
final bugs found in testing. A big thank you to users who have helped us
identify these issues!
* Fixed an issue with the service profile validating webhook that prevented
service profiles from being added or updated
* Updated the `check` command output hint anchors to match Linkerd component
names
* Fixed a permission issue with the Viz extension's tap admin cluster role by
adding namespace listing to the allowed actions
* Fixed an issue with the proxy where connections would not be torn down when
communicating with a defunct endpoint
* Improved diagnostic logging in the proxy
* Fixed an issue with the Viz extension's Prometheus template that prevented
users from specifying a log level flag for that component (thanks @n-oden!)
* Fixed a template parsing issue that prevented users from specifying additional
  ignored inbound ports through Helm's `--set` flag
* Fixed an issue with the proxy where non-HTTP streams could sometimes hang due
to TLS buffering
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This release fixes an issue where non-HTTP streams could hang due to TLS
buffering. Buffered data is now flushed more aggressively to prevent TCP
streams from getting "stuck" in the proxy.
---
* duplex: Ensure written data is flushed (linkerd/linkerd2-proxy#944)
The `ignoreInboundPorts` field is not parsed correctly when passed through the
`--set` Helm flag. This was discovered in https://github.com/linkerd/linkerd2/pull/5874#pullrequestreview-606779599.
This is happening because the value is not parsed into a string before using it
in the templating.
Before:
```
linkerd install --set proxyInit.ignoreInboundPorts="12345" |grep 12345
...
- "4190,4191,%!s(int64=12345)"
...
```
After:
```
linkerd install --set proxyInit.ignoreInboundPorts="12345" |grep 12345
...
- "4190,4191,12345"
...
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
# Problem
If a user specifies a `log.level` flag for linkerd-prometheus in `prometheus.args`, the template for linkerd-prometheus will generate a prometheus command where `--log.level` is specified twice (the first being directly interpolated from `prometheus.logLevel` in the template), and prometheus will crash-loop because its flags parser does not allow duplicate flags that are not arrays.
# Solution
Only fill in the `--log.level` flag from `.Values.prometheus.logLevel` if the `log.level` key is _not_ present in `.Values.prometheus.args`
# Validation
Added a test case.
Signed-off-by: Nathan J. Mehl <n@oden.io>
Currently, there are no `Notes` printed out after installation
is performed through Helm for extensions, like we do for the core
chart. This updates the viz and jaeger charts to include that,
along with instructions to view the dashboard.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>