When the proxy's TCP server encounters an error (usually because one of
the connections failed), we log the error and the client's address. The
server's address is omitted because it varies based on context that is
not known in this module: in some cases it's the actual server address
on the socket, but when proxying a connection it may be determined by
the value retrieved from the SO_ORIGINAL_DST socket option.
To fix this, the server now requires that connection metadata be able to
materialize an 'AddrPair' parameter that describes a client-server
connection. The TCP listener impls are updated to satisfy this based on
the appropriate metadata; and the TCP server consumes this type to
include both client and server addresses in the relevant logs/contexts.
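Roughly, the new parameter looks something like the sketch below. The field
names and the logging span are illustrative, not the proxy's actual
definitions:
```rust
use std::net::SocketAddr;

/// Illustrative sketch of a client-server address pair materialized from
/// connection metadata; the real type and its trait impls live in the proxy.
#[derive(Copy, Clone, Debug)]
pub struct AddrPair {
    pub client: SocketAddr,
    pub server: SocketAddr,
}

/// The TCP server can then include both addresses in its logging context.
fn server_span(addrs: &AddrPair) -> tracing::Span {
    tracing::info_span!(
        "server",
        client.addr = %addrs.client,
        server.addr = %addrs.server,
    )
}
```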
ecaaf39 changed the proxy's behavior with regard to creating [default
response classifiers][default]: the defaults used to support detecting
gRPC responses (regardless of the request properties).
To fix this, we modify the metrics module that uses response
classifiers to *require* them without inferring defaults. This enforces
the intended usage pattern so that we do not silently and implicitly
fall back to the default behavior.
This change also updates the `NewClassify` module that inserts the
response classifier request extension so that overrides are supported.
We then can install a default classifier early in request processing and
override it only if specified by a route configuration.
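A minimal sketch of that override pattern, using a placeholder classifier
type stored in `http` request extensions (the real classifier type and the
layer that installs it belong to the proxy):
```rust
use http::Request;

/// Placeholder for the proxy's response-classifier type.
#[derive(Clone, Debug)]
struct Classify(&'static str);

/// Install a default classifier early in request processing, but only if one
/// has not already been set.
fn insert_default<B>(req: &mut Request<B>) {
    if req.extensions().get::<Classify>().is_none() {
        req.extensions_mut().insert(Classify("default"));
    }
}

/// Later, a route configuration may override the installed classifier;
/// `Extensions::insert` returns the previously-installed value, if any.
fn override_classifier<B>(req: &mut Request<B>, classify: Classify) -> Option<Classify> {
    req.extensions_mut().insert(classify)
}
```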
To support this change, the http-metrics crate is updated to support
querying response_total metrics without stringifying everything.
[default]: ecaaf39b46 (diff-372e8a8a57b1fad5d94f37d2f77fdc7a45bcf708782475424b75d671f99ea1a0L97-L103)
The controller client includes a recovery/backoff module that causes
resolutions to be retried when an unexpected error is encountered.
These events are only logged at the debug and trace log levels.
This change updates the destination and policy controller recovery
modules to log unexpected errors as warnings.
The gate middleware controls a service's readiness so that it can exert
back-pressure. This is used, for instance, by the circuit breaker module
so that an endpoint can go into an unavailable state after the breaker
has been tripped and be marked available again as it recovers.
This change fixes a bug in that recovery scenario: when the gate is in a
Limited state (i.e. when the circuit breaker puts an endpoint into
Probation to test its availability), and a caller (i.e. the balancer) is
waiting for the endpoint to leave probation, the balancer may never be
notified that the endpoint has left its probation state.
To fix this, we update the gate controller to definitively close its
inner Semaphore when transitioning out of a limited state -- dropping
the sender's reference to the semaphore does not close it while a
receiver still holds one.
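In terms of `tokio::sync::Semaphore`, the fix amounts to something like the
following (the function name is illustrative):
```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// Sketch: on leaving the Limited state, explicitly close the semaphore so
/// that any receiver still holding a clone of the `Arc` observes the
/// transition; dropping only the sender's clone would leave it open.
fn exit_limited(semaphore: &Arc<Semaphore>) {
    semaphore.close();
}
```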
This issue is somewhat masked by the balancer's polling behavior, where
endpoint states are only advanced as requests are processed. It seems
likely, however, that this scenario could be encountered in the wild
when circuit breaking is enabled on a service.
If `Gate` becomes ready, it assumes the inner service remains ready
indefinitely.
Load balancers rely on lazy and redundant readiness checking to avoid
disconnected endpoints.
This change fixes the Gate to ensure that the inner service is always
polled whenever the gate is polled.
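In tower terms, the fix amounts to always delegating readiness to the inner
service. The sketch below is simplified and omits the real gate's state
machine and wakers:
```rust
use std::task::{Context, Poll};
use tower::Service;

/// Simplified sketch: the gate must not cache the inner service's readiness.
/// Even when the gate is open, readiness is re-checked on every poll;
/// otherwise a disconnected endpoint could be reported as ready indefinitely.
struct Gate<S> {
    open: bool, // stand-in for the real gate state
    inner: S,
}

impl<S, Req> Service<Req> for Gate<S>
where
    S: Service<Req>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        if !self.open {
            // The real gate registers interest in a state change here.
            return Poll::Pending;
        }
        // Always poll the inner service rather than assuming it stays ready.
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        self.inner.call(req)
    }
}
```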
* chore: change `rust-toolchain` file to toml format
The `rust-toolchain` file containing only a Rust version number is
deprecated in favor of a TOML-formatted `rust-toolchain.toml`. Using the
old format seems to make Dependabot unhappy --- it complains that:
```
only rust-toolchain files formatted as TOML are supported, the non-TOML
format was deprecated by Rust
```
Therefore, this branch changes the toolchain file in this repo to the
TOML format. This required updating the CI workflows that check that
the toolchain matches to use a new regex.
328826caa updated the balancer's discovery channel to prevent backing up
into the discovery stream: when the channel fills, the discovery stream
is dropped. This results in balancers becoming permanently stale (should
they ever be used again).
This change modifies the discovery stream so that these errors are
instead fatal for the balancer; they are recorded distinctly by the
error counters.
To fix this, we replace the `DiscoverNew` module with a
`discover::NewServices` module that wraps the buffering layer. The
buffer now only holds target metadata, and services are only built as
entries are dequeued from the channel.
This has the (positive) side-effect that the proxy's stack_create_total
metric will not be incremented before the balancer actually uses an
endpoint stack. Previously, this metric would be incremented for all
queued endpoint updates.
We also now log at INFO the address of all additions and removals from a
balancer. This should dramatically improve diagnostics in stale endpoint
situations.
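A simplified sketch of that flow, with stand-in types (the `Update` enum and
the `new_service` closure) in place of the proxy's real discovery and
stack-building machinery:
```rust
use tokio::sync::mpsc;

/// Stand-ins for the proxy's real discovery types.
type Target = std::net::SocketAddr;

enum Update {
    Add(Target),
    Remove(Target),
}

/// Sketch: the channel carries only target metadata; an endpoint's service is
/// constructed only when the balancer dequeues the update (which is also when
/// stack_create_total is incremented), not when the discovery stream enqueues
/// it. Additions and removals are logged at INFO.
async fn drain_updates<N, S>(mut rx: mpsc::Receiver<Update>, mut new_service: N)
where
    N: FnMut(Target) -> S,
{
    while let Some(update) = rx.recv().await {
        match update {
            Update::Add(addr) => {
                tracing::info!(%addr, "Adding endpoint to the balancer");
                let _svc = new_service(addr); // built only at dequeue time
            }
            Update::Remove(addr) => {
                tracing::info!(%addr, "Removing endpoint from the balancer");
            }
        }
    }
}
```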
This commit updates the proxy's dependency on `rustix` in order to
resolve a potential memory exhaustion issue when using the
`rustix::fs::Dir` iterator with the `linux-raw` backend. This issue is
described in GHSA-c827-hfw6-qwvm.
We currently depend on both `rustix` v0.36 and v0.37 as transitive deps,
so this branch updates the v0.36 dep from v0.36.14 to v0.36.16, and the
v0.37 dependency from v0.37.4 to v0.37.7.
Unfortunately, we weren't able to get Dependabot to bump these deps for
us, because it no longer supports the legacy (non-TOML) `rust-toolchain`
file (see #2487 for details). Therefore, we have to do this bump
manually.
In 6d2abbc, we changed how outbound proxies process discovery updates.
The prior implementation used a watchdog timeout to bound the amount of
time an update stream could be full. With that change, when an update
channel fills, the backpressure can extend to the destination
controller's gRPC response stream.
To detect and avoid this harmful (and useless) backpressure, this change
modifies the balancer's discovery processing stream to exit when the
balancer has 1000 unprocessed discovery updates. A sufficiently scary
warning is logged.
Fixes https://github.com/linkerd/linkerd2/issues/11449
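Schematically, the new behavior looks like the sketch below; the constant
name and error message are illustrative, and only the 1000-update limit comes
from the change above:
```rust
/// Illustrative constant; the 1000-update limit is described above.
const MAX_PENDING_UPDATES: usize = 1000;

/// Sketch: rather than exerting backpressure into the controller's gRPC
/// response stream, the discovery task gives up when the balancer has left
/// too many updates unprocessed, logging a warning and failing the balancer.
fn check_backlog(pending: usize) -> Result<(), &'static str> {
    if pending > MAX_PENDING_UPDATES {
        tracing::warn!(
            pending,
            "The balancer is not processing discovery updates; aborting discovery"
        );
        return Err("too many unprocessed discovery updates");
    }
    Ok(())
}
```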
The `grpc_status` metric label is rendered as a long-form, human-readable string value in the proxy metrics. For example:
```
response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1
```
This is due to the `Display` impl for `Code`. We now explicitly convert the code to an `i32` so that the label renders as a number instead:
```
response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1
```
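The conversion itself is small; a sketch using `tonic::Code` (assuming that
is the `Code` in question here):
```rust
use tonic::Code;

/// Render the numeric gRPC status code for the metric label (e.g. 2 for
/// `Code::Unknown`) rather than its human-readable Display form.
fn grpc_status_label(code: Code) -> String {
    (code as i32).to_string()
}
```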
Signed-off-by: Alex Leong <alex@buoyant.io>
Currently, if errors occur while parsing a client identity from a TLS
certificate, the `client_identity` function in `linkerd-meshtls-rustls`
will simply discard the error and return `None`. This means that we
cannot easily determine *why* a connection has no client identity ---
there may have been no client cert, but we may also have failed to parse
a client cert that was present.
In order to make debugging these issues a little easier, I've changed
this function to log any errors returned by `rustls-webpki` while
parsing client certs.
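The shape of the change is roughly the following; the `parse` closure stands
in for the real webpki-based parsing, and the log level shown is illustrative:
```rust
/// Sketch of the change: rather than discarding a certificate-parsing error
/// and returning `None`, log why extracting the client identity failed.
fn client_identity<T, E: std::fmt::Display>(
    cert_der: &[u8],
    parse: impl Fn(&[u8]) -> Result<T, E>,
) -> Option<T> {
    match parse(cert_der) {
        Ok(id) => Some(id),
        Err(error) => {
            tracing::warn!(%error, "Failed to extract a client identity from the peer certificate");
            None
        }
    }
}
```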
Currently, the proxy [depends on an outdated version of `rustls`][1],
v0.20.8. The `rustls` dependency is via our dependency on `tokio-rustls`
v0.23.4; we don't have a direct `rustls` dependency, in order to ensure
that the version of `rustls` is always the same version as used by
`tokio-rustls`. `rustls` also has a dependency on `webpki`, and v0.20.x
of `rustls` uses the original `webpki` crate, rather than the
`rustls-webpki` crate. So, unfortunately, because we have a transitive
dep on `webpki` via `rustls`, PR linkerd/linkerd2-proxy#2465 did not
remove _all_ `webpki` deps from our dependency tree, only the direct
dependency.
This branch updates to `rustls` v0.21.x, which depends on
`rustls-webpki` rather than `webpki`, removing the `webpki` dependency.
This is accomplished by updating `tokio-rustls` to v0.24.x, implicitly
updating the transitive `rustls` dep. In order to update to the
semver-incompatible version of `rustls`, it was necessary to modify our
code in order to track some breaking API changes. I've also added a
`cargo-deny` ban for `webpki` to our `deny.toml`, to ensure that we
always use the actively-maintained `rustls-webpki` crate rather than
`webpki` classic.
Since peer certificate validation is performed through `rustls` rather
than through the direct `rustls-webpki` dependency, this should
hopefully resolve issues with issuer certs that contain name constraints
--- these were not fixed by linkerd/linkerd2-proxy#2465, because the
failure with certs containing name constraints occurred inside of the
*`webpki` version depended on by `rustls`*, rather than inside of the
proxy's direct dep. See [this comment][2] for details.
In addition, it was necessary to update `rustls-webpki` to v0.101.6,
since v0.101.5 was yanked due to an accidental API breaking change.
<details>
<summary>Verifying that we no longer depend on `webpki`:</summary>
Before:
```console
$ cargo tree -p webpki -i
webpki v0.22.1
├── rustls v0.20.8
│ └── tokio-rustls v0.23.4
│ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│ └── linkerd-meshtls-rustls v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/meshtls/rustls)
│ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound)
│ │ ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app)
│ │ │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│ │ │ └── linkerd2-proxy v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd2-proxy)
│ │ ├── linkerd-app-admin v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/admin)
│ │ │ └── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│ │ │ [dev-dependencies]
│ │ │ └── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway)
│ │ └── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│ │ [dev-dependencies]
│ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│ ├── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound)
│ │ ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│ │ [dev-dependencies]
│ │ └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│ └── linkerd-meshtls v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/meshtls)
│ ├── linkerd-app-core v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/core)
│ │ ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│ │ ├── linkerd-app-admin v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/admin) (*)
│ │ ├── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│ │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│ │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│ │ ├── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│ │ └── linkerd-app-test v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/test)
│ │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│ │ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│ │ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│ │ [dev-dependencies]
│ │ ├── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│ │ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│ │ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│ ├── linkerd-proxy-tap v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/proxy/tap)
│ │ └── linkerd-app-core v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/core) (*)
│ └── linkerd2-proxy v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd2-proxy)
│ [dev-dependencies]
│ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│ ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│ [dev-dependencies]
│ ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│ └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
└── tokio-rustls v0.23.4 (*)
```
After:
```console
$ cargo tree -p webpki -i
error: package ID specification `webpki` did not match any packages
```
</details>
[1]:
8afc72258b/Cargo.lock (L2450-L2460C2)
[2]:
https://github.com/linkerd/linkerd2/issues/9299#issuecomment-1730094953
Now that [v0.101.5 of `rustls-webpki`][1] has been [published][2], we
can depend on the crate from crates.io. This allows us to remove the
Git dependency on the branch preparing that release to be published,
which allows us to remove the allowance for Git dependencies in the
`cargo-deny` config.
[1]: https://github.com/rustls/webpki/releases/tag/v%2F0.101.5
[2]: https://crates.io/crates/rustls-webpki/0.101.5
This commit changes the `linkerd-meshtls-rustls` crate to use the
upstream `rustls-webpki` crate, maintained by Rustls, rather than our
fork of `briansmith/webpki` from GitHub. Since `rustls-webpki` includes
the change which was the initial motivation for the `linkerd/webpki`
fork (rustls/webpki#42), we can now depend on upstream.
Currently, we must take a Git dependency on `rustls-webpki`, since a
release including a fix for an issue (rustls/webpki#167) which prevents
`rustls-webpki` from parsing our test certificates has not yet been
published. Once v0.101.5 of `rustls-webpki` is published (see
rustls/webpki#170), we can remove the Git dep. For now, I've updated
`cargo-deny` to allow the Git dependency.
The `linkerd-meshtls-boring` crate currently uses a Git dependency on
`boring` and `tokio-boring`. This is because, when this crate was
initially introduced, the proxy required unreleased changes to these
crates. Now, however, upstream has published all the changes we depended
on (this happened ages ago), and we can depend on these libraries from
crates.io.
This branch removes the Git deps and updates to v3.0.0 of
`boring`/`tokio-boring`. I've also changed the `cargo-deny` settings to
no longer allow Git deps on these crates, as we no longer depend on them
from Git.
In 2.13, the default inbound and outbound HTTP request queue capacity
decreased from 10,000 requests to 100 requests (in PR #2078). This
change results in proxies shedding load much more aggressively while
under high load to a single destination service, resulting in increased
error rates in comparison to 2.12 (see linkerd/linkerd2#11055 for
details).
This commit changes the default HTTP request queue capacities for the
inbound and outbound proxies back to 10,000 requests, the way they were
in 2.12 and earlier. In manual load testing I've verified that
increasing the queue capacity results in a substantial decrease in 503
Service Unavailable errors emitted by the proxy: with a queue capacity
of 100 requests, the load test described [here] observed a failure rate
of 51.51% of requests, while with a queue capacity of 10,000 requests,
the same load test observes no failures.
Note that I did not modify the TCP connection queue capacities, or the
control plane request queue capacity. These were previously configured
by the same variable before #2078, but were split out into separate vars
in that change. I don't think the queue capacity limits for TCP
connection establishment or for control plane requests are currently
resulting in instability the way the decreased request queue capacity
is, so I decided to make a more focused change to just the HTTP request
queues for the proxies.
[here]: https://github.com/linkerd/linkerd2/issues/11055#issuecomment-1650957357
The proxy currently emits very little useful version information.
This change updates the proxy to support new build-time environment
variables that are used to report version information:
* LINKERD2_PROXY_BUILD_TIME
* LINKERD2_PROXY_VENDOR
* LINKERD2_PROXY_VERSION
Additionally, several pre-existing Git-oriented metadata fields have been
removed, as they were generally redundant or uninformative. The Rustc
version has also been removed (since it has no real user-facing value
and can be easily determined by the version/tag).
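For reference, build-time environment variables of this kind are typically
captured with `option_env!`; a sketch under that assumption (the fallback
strings are placeholders, not the proxy's actual defaults):
```rust
/// Build-time metadata captured at compile time; falls back to placeholder
/// values when the variables are not set by the build.
pub const BUILD_TIME: &str = match option_env!("LINKERD2_PROXY_BUILD_TIME") {
    Some(v) => v,
    None => "unknown",
};
pub const VENDOR: &str = match option_env!("LINKERD2_PROXY_VENDOR") {
    Some(v) => v,
    None => "unknown",
};
pub const VERSION: &str = match option_env!("LINKERD2_PROXY_VERSION") {
    Some(v) => v,
    None => "unknown",
};
```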
When the outbound proxy resolves an outbound policy from the policy
controller's `OutboundPolicies` API, the policy controller may return an
error with the `grpc-status` code `NotFound` in order to indicate that
the destination is not a ClusterIP service. When this occurs, the proxy
will fall back to either using a ServiceProfile, if the ServiceProfile
contains non-trivial configuration, or synthesizing a default client
policy from the ServiceProfile.
However, when the outbound proxy is configured to run in ingress mode,
the fallback behavior does not occur. Instead, the ingress mode proxy
treats any error returned by the policy controller's `OutboundPolicies`
API as fatal. This means that when an ingress controller performs its
own load-balancing and opens a connection to a pod IP directly, the
ingress mode proxy will fail any requests on that connection. This is a
bug, and is the cause of the issues described in linkerd/linkerd2#10908.
This branch fixes this by changing the ingress mode proxy to handle
`NotFound` errors returned by the policy controller. I've added similar
logic for synthesizing default policies from a discovered
ServiceProfile, or using the profile if it's non-trivial. Unfortunately,
we can't just reuse the existing `Outbound::resolver` method, as ingress
discovery may be performed for an original destination address *or* for
a DNS name, and it's necessary to construct fallback policies in either
case. Instead, I've added a new function with similar behavior that's
ingress-specific.
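The core of the new handling is roughly the following; types other than
`tonic::Status`/`Code` are illustrative stand-ins:
```rust
use tonic::{Code, Status};

/// Sketch: in ingress mode, a NotFound from the `OutboundPolicies` API now
/// selects ServiceProfile-based fallback behavior instead of failing the
/// request.
enum Discovery<P> {
    Policy(P),
    ProfileFallback,
}

fn handle_policy_result<P>(result: Result<P, Status>) -> Result<Discovery<P>, Status> {
    match result {
        Ok(policy) => Ok(Discovery::Policy(policy)),
        // The destination is not a ClusterIP service; fall back to the
        // ServiceProfile (or a default policy synthesized from it).
        Err(status) if status.code() == Code::NotFound => Ok(Discovery::ProfileFallback),
        // All other errors remain fatal.
        Err(status) => Err(status),
    }
}
```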
I've manually tested this change against the repro steps[^1] described
in linkerd/linkerd2#10908, and verified that the proxy 503s on 2.13.4,
and that it once again routes correctly after applying this change.
Fixes linkerd/linkerd2#10908.
[^1]: As described in the first comment, using Contour and podinfo.
PRs #2418 and #2419 add per-route and per-backend request timeouts
configured by the `OutboundPolicies` API to the `MatchedRoute` and
`MatchedBackend` layers in the outbound `ClientPolicy` stack,
respectively. This means that — unlike in the `ServiceProfile` stack —
two separate request timeouts can be configured in `ClientPolicy`
stacks. However, because both the `MatchedRoute` and `MatchedBackend`
layers are in the HTTP logical stack, the errors emitted by both
timeouts will have a `LogicalError` as their most specific error
metadata, meaning that the log messages and `l5d-proxy-error` headers
recorded for these timeouts do not indicate whether the timeout that
failed the request was the route request timeout or the backend request
timeout.
In order to ensure this information is recorded and exposed to the user,
this branch adds two new error wrapper types, one of which enriches an
error with a `RouteRef`'s metadata, and one of which enriches an error
with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all
errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps
errors with `BackendRef` metadata. This way, when the route timeout
fails a request, the error will include the route metadata, while when
the backend request timeout fails a request, the error will include both
the route and backend metadata.
Adding these new error wrappers also has the additional side benefit of
adding this metadata to errors returned by filters, allowing users to
distinguish between errors emitted by a filter on a route rule and
errors emitted by a per-backend filter. Also, any other errors emitted
lower in the stack for requests that are handled by a client policy
stack will now also include this metadata, which seems generally useful.
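A simplified sketch of the wrapper shape for the route case; the real types
carry richer metadata, use the proxy's error conventions, and the backend
wrapper is analogous:
```rust
use std::fmt;

type BoxError = Box<dyn std::error::Error + 'static>;

/// Illustrative stand-in for the route metadata attached to errors.
#[derive(Clone, Debug)]
pub struct RouteRef(pub String);

/// Wraps an inner error with the route's metadata so that timeouts and filter
/// errors surface which route they came from.
#[derive(Debug)]
pub struct RouteError {
    route: RouteRef,
    source: BoxError,
}

impl RouteError {
    pub fn new(route: RouteRef, source: BoxError) -> Self {
        Self { route, source }
    }
}

impl fmt::Display for RouteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "route {}: {}", self.route.0, self.source)
    }
}

impl std::error::Error for RouteError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        Some(&*self.source)
    }
}
```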
Example errors, taken from a proxy unit test:
backend request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s
```
route request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s
```
Depends on #2418
The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
PR #2418 updates the proxy to depend on the new proxy-api release, and
implements the `Rule.request_timeout` field added to the API. However,
that branch does *not* add a timeout for the
`RouteBackend.request_timeout` field. This branch changes the proxy to
apply the backend request timeout when configured by the policy
controller.
This branch implements `RouteBackend.request_timeout` by adding an
additional timeout layer in the `MatchedBackend` stack. This applies the
per-backend timeout once a backend is selected for a route. I've also
added stack tests for the interaction between the request and backend
request timeouts.
Note that once retries are added to client policy stacks, it may be
necessary to move the backend request timeout to ensure it occurs
"below" retries, depending on where the retry middleware ends up being
located in the proxy stack.
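In tower terms, the per-backend timeout is a middleware applied around the
selected backend's service; a sketch (the proxy's actual middleware and error
types differ):
```rust
use std::time::Duration;
use tower::timeout::Timeout;

/// Sketch: the backend request timeout wraps the per-backend service, below
/// the route-level timeout, so each timeout fails requests with its own
/// error. The duration comes from `RouteBackend.request_timeout` when set.
fn backend_with_timeout<S>(backend: S, timeout: Duration) -> Timeout<S> {
    Timeout::new(backend, timeout)
}
```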
The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
This branch updates the proxy-api dependency to v0.10.0 and adds the new
timeout configuration fields to the proxy's internal client policy
types. In addition, this branch adds a timeout middleware to the HTTP
client policy stack, so that the timeout described by the
`Rule.request_timeout` field is now applied.
Implementing the `RouteBackend.request_timeout` field with semantics as
close as possible to those described in GEP-1742 will be somewhat more
complex, and will be added in a separate PR.
The gRPC protocol always sets the HTTP response status code to 200 and instead communicates failures in a grpc-status header sent in a TRAILERS frame. Linkerd uses the HTTP response status code to determine if a response is successful, and therefore will consider all gRPC responses successful regardless of their gRPC status code. This means that functionality such as retries and circuit breaking do not function correctly with gRPC traffic.
We update the HTTP classifier to look for the presence of a `Content-Type: application/grpc` request header and use gRPC response classification when it is set.
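The detection itself is a simple header check; a sketch using the `http`
crate (the classifier selection around it is the proxy's own):
```rust
use http::{header::CONTENT_TYPE, Request};

/// Sketch: select gRPC response classification when the request advertises a
/// gRPC content type; otherwise keep plain HTTP status-code classification.
fn is_grpc<B>(req: &Request<B>) -> bool {
    req.headers()
        .get(CONTENT_TYPE)
        .and_then(|v| v.to_str().ok())
        .map(|ct| ct.starts_with("application/grpc"))
        .unwrap_or(false)
}
```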
Signed-off-by: Alex Leong <alex@buoyant.io>
In the most recent stable versions, pods cannot communicate with themselves when using a ClusterIP. While direct (pod-to-pod) connections are never sent through the proxy and are skipped at the iptables level, connections to a logical service still pass through the proxy. When the chosen endpoint is the same as the source of the traffic, TLS and H2 upgrades should be skipped.
Every endpoint receives an h2 upgrade hint in its metadata. When looking into the problem, I noticed that client settings do not take into account that the target may be local. When deciding what client settings to use, we do not upgrade the connection when the hint is "unknown" (gatewayed connections) or "opaque". This change does a similar thing by using H1 settings when the protocol is H1 and the target IP is also part of the inbound IPs passed to the proxy.
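The check itself is a simple membership test; a sketch (the function name and
surrounding client-settings logic are illustrative):
```rust
use std::{collections::HashSet, net::IpAddr};

/// Sketch of the check described above: when the selected endpoint address is
/// one of the proxy's own inbound IPs (i.e. the pod is talking to itself via
/// a ClusterIP), keep plain HTTP/1 client settings rather than upgrading.
fn endpoint_is_local(endpoint_ip: IpAddr, inbound_ips: &HashSet<IpAddr>) -> bool {
    inbound_ips.contains(&endpoint_ip)
}
```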
Fixes linkerd/linkerd2#10816
Signed-off-by: Matei David <matei@buoyant.io>
The W3C context propagation uses the wrong span ID right now. That
causes all spans emitted by linkerd-proxy to be siblings rather than
children of their original parent.
This only applies to W3C as far as I can tell, because the B3
propagation uses the span ID correctly.
Signed-off-by: Willi Schönborn <w.schoenborn@gmail.com>
The proc-macro ecosystem is in the middle of a migration from `syn` v1
to `syn` v2. Some crates (such as `tokio-macros`, `async-trait`,
`tracing-attributes`, etc) have been updated to v2, while others haven't
yet. This means that `cargo deny` will not currently permit us to update
some of those crates to versions that depend on `syn` v2, because they
will create a duplicate dependency.
Since `syn` is used by proc-macros (executed at compile time), duplicate
versions won't have an impact on the final binary size. Therefore, it's
fine to allow both v1 and v2 to coexist while the ecosystem is still
being gradually migrated to the new version.
If the policy controller is from a Linkerd version earlier than 2.13.x,
it will return the `Unimplemented` gRPC status code for requests to the
`OutboundPolicies` API. The proxy's outbound policy client will
currently retry this error code, rather than synthesizing a default
policy. Since 2.13.x proxies require an `OutboundPolicy` to be
discovered before handling outbound traffic, this means that 2.13.x
proxies cannot handle outbound connections when the control plane
is on an earlier version. Therefore, installing Linkerd 2.13 and then
downgrading to 2.12 can potentially break the data plane's ability to
route traffic.
In order to support downgrade scenarios, the proxy should also
synthesize a default policy when receiving an `Unimplemented` gRPC
status code from the policy controller. This branch changes the proxy to
do that. A warning is logged which indicates that the control plane
version is older than the proxy's.
The proxy injector populates an environment variable,
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION`, with a list
of all ports marked as opaque. Currently, however, the proxy _does not
actually use this environment variable_. Instead, opaque ports are
discovered from the policy controller. The opaque ports environment
variable was used only when running in the "fixed" inbound policy mode,
where all inbound policies are determined from environment variables,
and no policy controller address is provided. This mode is no longer
supported, and the policy controller address is now required, so the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable is not currently used to discover inbound opaque ports.
There are two issues with the current state of things. One is that
inbound policy discovery is _non-blocking_: when an inbound proxy
receives a connection on a port that it has not previously discovered a
policy for, it uses the default policy until it has successfully
discovered a policy for that port from the policy controller. This means
that the proxy may perform protocol detection on the first connection to
an opaque port. This isn't great, as it may result in a protocol
detection timeout error on a port that the user had previously marked as
opaque. It would be preferable for the proxy to read the environment
variable, and use it to determine whether the default policy for a port
is opaque, so that ports marked as opaque disable protocol detection
even before the "actual" policy is discovered.
The other issue with the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable is that it is currently a list of _individual port numbers_,
while the proxy injector can accept annotations that specify _ranges_ of
opaque ports. This means that when a very large number of ports are
marked as opaque, the proxy manifest must contain a list of each
individual port number in those ranges, making it potentially quite
large. See linkerd/linkerd2#9803 for details on this issue.
This branch addresses both of these problems. The proxy is changed so
that it will once again read the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable, and use it to determine which ports should have opaque
policies by default. The parsing of the environment variable is changed
to support specifying ports as a list of ranges, rather than a list of
individual port numbers. Along with a proxy-injector change, this would
resolve the manifest size issue described in linkerd/linkerd2#9803.
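A sketch of the range-list parsing described above; error handling and
validation (e.g. rejecting inverted ranges) are simplified, and the proxy's
real parser differs:
```rust
use std::ops::RangeInclusive;

/// Accepts entries such as "25,443,8000-9000" and yields inclusive port
/// ranges; a bare port is treated as a single-port range.
fn parse_port_ranges(s: &str) -> Result<Vec<RangeInclusive<u16>>, std::num::ParseIntError> {
    let mut ranges = Vec::new();
    for entry in s.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let range = match entry.split_once('-') {
            Some((lo, hi)) => lo.trim().parse()?..=hi.trim().parse()?,
            None => {
                let port: u16 = entry.parse()?;
                port..=port
            }
        };
        ranges.push(range);
    }
    Ok(ranges)
}
```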
This is implemented by changing the `inbound::policy::Store` type to
also include a set of port ranges that are marked as opaque. When the
`Store` handles a `get_policy` call for a port that is not already in
the cache, it starts a control plane watch for that port just as it did
previously. However, when determining the initial _default_ value for
the policy, before the control plane discovery provides one, it checks
whether the port is in a range that is marked as opaque, and, if it is,
uses an opaque default policy instead.
This approach was chosen rather than pre-populating the `Store` with
policies for all opaque ports to better handle the case where very large
ranges are marked as opaque and are used infrequently. If the `Store`
was pre-populated with default policies for all such ports, it would
essentially behave as though all ports in
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` were also in
`LINKERD2_PROXY_INBOUND_PORTS`, and the proxy would immediately start a
policy controller discovery watch for all opaque ports, which would be
kept open for the proxy's entire lifetime. In cases where the opaque
ports ranges include ~10,000s of ports, this causes significant
unnecessary load on the policy controller. Storing opaque port ranges
separately and using them to determine the default policy as needed
allows opaque port policies to be treated the same as non-default ports,
which are discovered as needed and can be evicted from the cache if they
are unused. If a port is in both
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` *and*
`LINKERD2_PROXY_INBOUND_PORTS`, the proxy will start discovery eagerly
and retain the port in the cache forever, but the default policy will be
opaque.
I've also added a test for the behavior of opaque ports where the port's
policy has not been discovered from the policy controller. That test
fails on `main`, as the proxy attempts protocol detection, but passes on
this branch.
In addition, I changed the parsing of the `LINKERD2_PROXY_INBOUND_PORTS`
environment variable to also accept ranges, because it seemed like a
nice thing to do while I was here. :)
Looks like we accidentally merged PR #2375 without a CI build against
the latest state of `main`. In the meantime since #2375 was last built
on CI, PR #2374 added an additional metadata field to
`policy::HttpParams`, which the `HttpParams` constructed in the test
added from #2375 doesn't populate. Therefore, merging this PR broke the
build. Whoops!
This commit populates the `meta` field, fixing it.
This branch adds a new test for failure accrual in load balancers with
multiple endpoints. This test asserts that endpoints whose circuit
breakers have tripped will not be selected by a load balancer.
Since upstream has yet to release a version with PR
bluejekyll/trust-dns#1881, this commit changes the proxy's default log
level to silence warnings from `trust_dns_proto` that are generally
spurious.
See linkerd/linkerd2#10123 for details.
Currently, the outbound proxy determines whether or not to perform
protocol detection based on the presence of the `opaque_protocol` field
on the resolved `ServiceProfile` from the Destination controller.
However, the `OutboundPolicy` resolved from the policy controller also
contains a `proxy_protocol` field that indicates what protocol should be
used for this destination. While the proxy uses the HTTPRoutes from the
`OutboundPolicy`'s `proxy_protocol`, it does _not_ take into account the
`proxy_protocol` when determining whether or not to perform protocol
detection. This can result in the outbound proxy performing protocol
detection on connections to destinations that have been marked as
opaque.
This branch modifies the outbound proxy to use the `proxy_protocol` from
the `OutboundPolicy`, as well as the `opaque_protocol` field from the
`ServiceProfile`, when determining whether or not to perform protocol
detection. In addition, I've added an integration test, which fails before
making the changes on this branch.
Fixes linkerd/linkerd2#10745
The DOS mitigation changes in `h2` v0.3.17 inadvertently introduced a
potential panic (hyperium/h2#674). Version 0.3.18 fixes this, so we
should bump the proxy's dependency to avoid panics.
Currently, when the outbound proxy makes a direct connection prefixed
with a `TransportHeader` in order to send HTTP traffic, it will always
send a `SessionProtocol` hint with the HTTP version as part of the
header. This instructs the inbound proxy to use that protocol, even if
the target port has a ServerPolicy that marks that port as opaque, which
can result in incorrect handling of that connection. See
linkerd/linkerd2#9888 for details.
In order to prevent this, linkerd/linkerd2-proxy-api#197 adds a new
`ProtocolHint` value to the protobuf endpoint metadata message. This
will allow the Destination controller to explicitly indicate to the
outbound proxy that a given endpoint is known to handle all connections
to a port as an opaque TCP stream, and that the proxy should not perform
a protocol upgrade or send a `SessionProtocol` in the transport header.
This branch updates the proxy to handle this new hint value, and adds
tests that the outbound proxy behaves as expected.
Along with linkerd/linkerd2#10301, this will fix linkerd/linkerd2#9888.
I opened a new PR for this change rather than attempting to rebase my
previous PR #2209, as it felt a bit easier to start with a new branch
and just make the changes that were still relevant. Therefore, this
closes #2209.