2303 Commits

Author SHA1 Message Date
Oliver Gould 26b718c55d
Include server address in server error logs (#2500)
When the proxy's TCP server encounters an error (usually due to one of
the connections failing), we log the error and the client's address. The
server's address was omitted because it varies based on context that is
not known in this module: in some cases it's the actual server address
on the socket, but when proxying a connection it may be determined by
the value retrieved from the SO_ORIGINAL_DST socket option.

To fix this, the server now requires that connection metadata be able to
materialize an 'AddrPair' parameter that describes a client-server
connection. The TCP listener impls are updated to satisfy this based on
the appropriate metadata; and the TCP server consumes this type to
include both client and server addresses in the relevant logs/contexts.
2023-11-03 10:30:25 -07:00
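
A minimal sketch of the `AddrPair` requirement from #2500, assuming a small "parameter" trait that each listener's connection metadata implements. The `Param` trait shape, the `Accepted` and `OrigDstAccepted` types, and the `format_error` helper are hypothetical and are not taken from the proxy's source:

```rust
use std::net::SocketAddr;

/// A client-server address pair describing one connection (assumed shape).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AddrPair {
    pub client: SocketAddr,
    pub server: SocketAddr,
}

/// Hypothetical stand-in for a trait that materializes a parameter from metadata.
pub trait Param<T> {
    fn param(&self) -> T;
}

/// Hypothetical metadata for a connection accepted directly on a listener.
pub struct Accepted {
    pub client: SocketAddr,
    pub local: SocketAddr,
}

/// Hypothetical metadata for a proxied connection, where the server address
/// comes from the original destination (e.g. SO_ORIGINAL_DST).
pub struct OrigDstAccepted {
    pub client: SocketAddr,
    pub local: SocketAddr,
    pub orig_dst: SocketAddr,
}

impl Param<AddrPair> for Accepted {
    fn param(&self) -> AddrPair {
        // The server address is the socket's own local address.
        AddrPair { client: self.client, server: self.local }
    }
}

impl Param<AddrPair> for OrigDstAccepted {
    fn param(&self) -> AddrPair {
        // The server address is the original destination, not the local socket.
        AddrPair { client: self.client, server: self.orig_dst }
    }
}

/// Formats an error log line that includes both client and server addresses.
pub fn format_error<M: Param<AddrPair>>(meta: &M, error: &str) -> String {
    let AddrPair { client, server } = meta.param();
    format!("connection failed: error={error} client.addr={client} server.addr={server}")
}
```

The server code only depends on `Param<AddrPair>`, so each listener decides which address counts as the server address.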
Oliver Gould cbf226e8e5
inbound: Fix gRPC response classification (#2496)
ecaaf39 changed the proxy's behavior with regard to creating [default
response classifiers][default]: the defaults used to support detecting
gRPC responses (regardless of the request properties).

To fix this, we modify the metrics module that uses response
classifiers to *require* them without inferring defaults. This enforces
the intended usage pattern so that we do not silently and implicitly
fall back to the default behavior.

This change also updates the `NewClassify` module that inserts the
response classifier request extension so that overrides are supported.
We then can install a default classifier early in request processing and
override it only if specified by a route configuration.

To support this change, the http-metrics crate is updated to support
querying response_total metrics without stringifying everything.

[default]: ecaaf39b46 (diff-372e8a8a57b1fad5d94f37d2f77fdc7a45bcf708782475424b75d671f99ea1a0L97-L103)
2023-11-01 17:41:19 -07:00
Oliver Gould 920b2ddfba
Log a warning when the controller clients receive an error (#2499)
The controller client includes a recovery/backoff module that causes
resolutions to be retried when an unexpected error is encountered.
These events are only logged at debugging and trace log levels.

This change updates the destination and policy controller recovery
modules to log unexpected errors as warnings.
2023-11-01 16:18:48 -07:00
Oliver Gould 6999daca59
gate: Fix readiness deadlock (#2493)
The gate middleware controls a service's readiness so that it can exert
back-pressure. This is used, for instance, by the circuit breaker module
so that an endpoint can go into an unavailable state after the breaker
has been tripped and be marked available again as it recovers.

This change fixes a bug in that recovery scenario: when the gate is in a
Limited state (i.e. when the circuit breaker puts an endpoint into
Probation to test its availability), and a caller (i.e. the balancer) is
waiting for the endpoint to leave probation, the balancer may never be
notified that the endpoint has left its probation state.

To fix this, we update the gate controller to definitively close its
inner Semaphore when transitioning out of a limited state -- dropping
the semaphore in the sender doesn't close it when it's being held by a
receiver.

This issue is somewhat masked by the balancer's polling behavior, where
endpoint states are only advanced as requests are processed. It seems
likely, however, that this scenario could be encountered in the wild
when circuit breaking is enabled on a service.
2023-11-01 12:30:38 -07:00
Oliver Gould bbc4e23c1b
Bump ahash to v0.8.5 (#2498)
* Bump ahash to v0.8.5

* Allow BSD-2-Clause
2023-11-01 12:29:47 -07:00
Oliver Gould 4f68425ccf
gate: Detect disconnected inner services in readiness (#2491)
If `Gate` becomes ready, it assumes the inner service remains ready
indefinitely.

Load balancers rely on lazy and redundant readiness checking to avoid
disconnected endpoints.

This change fixes the Gate to ensure that the inner service is always
polled whenever the gate is polled.
2023-10-25 12:55:31 -07:00
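
The readiness fix in #2491 can be illustrated with a deliberately simplified model. The `Ready` trait below is a hypothetical stand-in for a service's readiness check (it omits the task context and wakers that real async readiness polling needs), and `Gate` and `Flaky` are illustrative types rather than the proxy's implementation:

```rust
use std::task::Poll;

/// Simplified, hypothetical readiness trait (no task context or wakers).
pub trait Ready {
    fn poll_ready(&mut self) -> Poll<Result<(), &'static str>>;
}

/// A gate that is either open or shut, wrapping an inner service.
pub struct Gate<S> {
    inner: S,
    open: bool,
}

impl<S: Ready> Gate<S> {
    pub fn new(inner: S, open: bool) -> Self {
        Self { inner, open }
    }
}

impl<S: Ready> Ready for Gate<S> {
    fn poll_ready(&mut self) -> Poll<Result<(), &'static str>> {
        // Always poll the inner service, even if the gate was ready before,
        // so that a disconnected inner service surfaces its error.
        match self.inner.poll_ready() {
            Poll::Ready(Ok(())) => {}
            other => return other,
        }
        if self.open {
            Poll::Ready(Ok(()))
        } else {
            Poll::Pending
        }
    }
}

/// Test double: ready for a fixed number of polls, then disconnected.
pub struct Flaky {
    pub ready_polls: usize,
}

impl Ready for Flaky {
    fn poll_ready(&mut self) -> Poll<Result<(), &'static str>> {
        if self.ready_polls == 0 {
            return Poll::Ready(Err("disconnected"));
        }
        self.ready_polls -= 1;
        Poll::Ready(Ok(()))
    }
}
```

A gate that cached its earlier readiness would keep reporting ready after `Flaky` disconnects; polling the inner service every time makes the second poll return the error.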
Eliza Weisman 986d45895c
chore: change `rust-toolchain` file to toml format (#2487)
* chore: change `rust-toolchain` file to toml format

The `rust-toolchain` file containing only a Rust version number is
deprecated in favor of a TOML-formatted `rust-toolchain.toml`. Using the
old format seems to make Dependabot unhappy --- it complains that:

```
only rust-toolchain files formatted as TOML are supported, the non-TOML
format was deprecated by Rust
```

Therefore, this branch changes the toolchain file in this repo to the
TOML format. This required updating the CI workflows that check that
the toolchain matches to use a new regex.
2023-10-23 10:26:19 -07:00
Oliver Gould 6cf2e9f3e1
balance: Fail the discovery stream on queue backup (#2486)
328826caa updated the balancer's discovery channel to prevent backing up
into the discovery stream by dropping the discovery stream. This results
in balancers becoming permanently stale (should they ever be used
again).

This change modifies the discovery stream so that these errors are fatal
for the balancer. These errors are recorded distinctly by the error counters.

To fix this, we replace the `DiscoverNew` module with a
`discover::NewServices` module that wraps the buffering layer. The
buffer now only holds target metadata, and services are only built as
the entry is dequeued from the channel.

This has the (positive) side-effect that the proxy's stack_create_total
metric will not be incremented before the balancer actually uses an
endpoint stack. Previously, this metric would be incremented for all
queued endpoint updates.

We also now log at INFO the address of all additions and removals from a
balancer. This should dramatically improve diagnostics in stale endpoint
situations.
2023-10-19 11:44:42 -07:00
Eliza Weisman 777435b404
build(deps): update `rustix` to v0.36.16/v0.37.7 (#2488)
This commit updates the proxy's dependency on `rustix` in order to
resolve a potential memory exhaustion issue when using the
`rustix::fs::Dir` iterator with the `linux-raw` backend. This issue is
described in GHSA-c827-hfw6-qwvm.

We currently depend on both `rustix` v0.36 and v0.37 as transitive deps,
so this branch updates the v0.36 dep from v0.36.14 to v0.36.16, and the
v0.37 dependency from v0.37.4 to v0.37.7.

Unfortunately, we weren't able to get Dependabot to bump these deps for
us, because it no longer supports the legacy (non-TOML) `rust-toolchain`
file (see #2487 for details). Therefore, we have to do this bump
manually.
2023-10-19 09:37:53 -07:00
Oliver Gould 328826caa7
balance: Log and fail stuck discovery streams. (#2484)
In 6d2abbc, we changed how outbound proxies process discovery updates.
The prior implementation used a watchdog timeout to bound the amount of
time an update stream could be full. With that change, when an update
channel fills, the backpressure can extend to the destination
controller's gRPC response stream.

To detect and avoid this harmful (and useless) backpressure, this change
modifies the balancer's discovery processing stream to exit when the
balancer has 1000 unprocessed discovery updates. A sufficiently scary
warning is logged.
2023-10-17 11:01:19 -07:00
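
As a rough illustration of the approach in #2484, the sketch below uses a bounded standard-library channel and fails as soon as the queue is full, instead of blocking and pushing backpressure upstream. The `Update` and `ForwardError` types and the `queue` and `forward` helpers are hypothetical; the proxy itself uses async streams rather than `std::sync::mpsc`:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical discovery update: an endpoint address was added or removed.
#[derive(Debug, PartialEq, Eq)]
pub enum Update {
    Add(String),
    Remove(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ForwardError {
    /// The balancer has too many unprocessed updates; the stream should end.
    QueueFull,
    /// The balancer side of the queue was dropped.
    Closed,
}

/// Creates a bounded queue between the discovery stream and the balancer.
/// The commit message uses a limit of 1000 unprocessed updates.
pub fn queue(capacity: usize) -> (SyncSender<Update>, Receiver<Update>) {
    sync_channel(capacity)
}

/// Forwards updates without blocking. Returns the number of updates that
/// were enqueued, or an error as soon as the queue is full or closed.
pub fn forward(
    tx: &SyncSender<Update>,
    updates: impl IntoIterator<Item = Update>,
) -> Result<usize, ForwardError> {
    let mut sent = 0;
    for update in updates {
        match tx.try_send(update) {
            Ok(()) => sent += 1,
            Err(TrySendError::Full(_)) => {
                // Log loudly and exit rather than stalling the upstream stream.
                eprintln!("WARN discovery queue is full; failing the discovery stream");
                return Err(ForwardError::QueueFull);
            }
            Err(TrySendError::Disconnected(_)) => return Err(ForwardError::Closed),
        }
    }
    Ok(sent)
}
```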
Alex Leong 54979bc5d5
Render grpc_status metric label as number (#2480)
Fixes https://github.com/linkerd/linkerd2/issues/11449

The `grpc_status` metric label is rendered as a long form, human readable string value in the proxy metrics.  For example:

```
response_total{direction="outbound", [...], classification="failure",grpc_status="Unknown error",error=""} 1
```

This is because of the Display impl for Code.  We explicitly convert to an i32 so this renders as a number instead:

```
response_total{direction="outbound", [...] ,classification="failure",grpc_status="2",error=""} 1
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-10-03 17:17:28 -07:00
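
The difference between the two renderings in #2480 comes down to formatting the code through `Display` versus casting it to an integer. The `Code` enum below is a hypothetical three-variant stand-in for the real gRPC status type, and the helper names are illustrative:

```rust
use std::fmt;

/// Hypothetical subset of gRPC status codes, using the standard numbering.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i32)]
pub enum Code {
    Ok = 0,
    Cancelled = 1,
    Unknown = 2,
}

impl fmt::Display for Code {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // A long-form, human-readable description (assumed wording).
        f.write_str(match self {
            Code::Ok => "The operation completed successfully",
            Code::Cancelled => "The operation was cancelled",
            Code::Unknown => "Unknown error",
        })
    }
}

/// Renders the label through `Display`, yielding the long-form string.
pub fn label_display(code: Code) -> String {
    format!("grpc_status=\"{}\"", code)
}

/// Renders the label by explicitly converting to an `i32` first.
pub fn label_numeric(code: Code) -> String {
    format!("grpc_status=\"{}\"", code as i32)
}
```

With this sketch, `label_display(Code::Unknown)` yields `grpc_status="Unknown error"`, while `label_numeric(Code::Unknown)` yields `grpc_status="2"`.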
dependabot[bot] 45b324f7b4
build(deps): bump actions/upload-artifact from 3.1.2 to 3.1.3 (#2479)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](0b7f8abb15...a8a3f3ad30)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-28 13:57:26 -07:00
dependabot[bot] 597549d66e
build(deps): bump DavidAnson/markdownlint-cli2-action (#2476)
Bumps [DavidAnson/markdownlint-cli2-action](https://github.com/davidanson/markdownlint-cli2-action) from 12.0.0 to 13.0.0.
- [Release notes](https://github.com/davidanson/markdownlint-cli2-action/releases)
- [Commits](3aaa38e446...ed4dec634f)

---
updated-dependencies:
- dependency-name: DavidAnson/markdownlint-cli2-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-28 13:57:12 -07:00
dependabot[bot] ad95981117
build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.4 to 1.5.5 (#2478)
Bumps [EmbarkStudios/cargo-deny-action](https://github.com/embarkstudios/cargo-deny-action) from 1.5.4 to 1.5.5.
- [Release notes](https://github.com/embarkstudios/cargo-deny-action/releases)
- [Commits](a50c7d5f86...1e59595bed)

---
updated-dependencies:
- dependency-name: EmbarkStudios/cargo-deny-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-28 13:56:40 -07:00
dependabot[bot] 768ff0b497
build(deps): bump tj-actions/changed-files from 39.0.2 to 39.2.0 (#2475)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 39.0.2 to 39.2.0.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](6ee9cdc581...8238a41032)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-28 13:56:19 -07:00
dependabot[bot] 0dafe33e9a
build(deps): bump actions/checkout from 3.5.0 to 4.1.0 (#2474)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.5.0 to 4.1.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](8f4b7f8486...8ade135a41)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-27 14:53:30 -07:00
Eliza Weisman e92f325bb6
meshtls: log errors parsing client certs (#2467)
Currently, if errors occur while parsing a client identity from a TLS
certificate, the `client_identity` function in `linkerd-meshtls-rustls`
will simply discard the error and return `None`. This means that we
cannot easily determine *why* a connection has no client identity ---
there may have been no client cert, but we may also have failed to parse
a client cert that was present.

In order to make debugging these issues a little easier, I've changed
this function to log any errors returned by `rustls-webpki` while
parsing client certs.
2023-09-27 11:24:32 -07:00
dependabot[bot] 16a75fe1c7
build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.0 to 1.5.4 (#2448)
Bumps [EmbarkStudios/cargo-deny-action](https://github.com/embarkstudios/cargo-deny-action) from 1.5.0 to 1.5.4.
- [Release notes](https://github.com/embarkstudios/cargo-deny-action/releases)
- [Commits](8af37f5d0c...a50c7d5f86)

---
updated-dependencies:
- dependency-name: EmbarkStudios/cargo-deny-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-27 11:21:39 -07:00
dependabot[bot] d44158cebb
build(deps): bump tj-actions/changed-files from 36.2.1 to 39.0.2 (#2468)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 36.2.1 to 39.0.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](c9124514c3...6ee9cdc581)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-27 11:08:06 -07:00
dependabot[bot] 218c9f6835
build(deps): bump DavidAnson/markdownlint-cli2-action (#2460)
Bumps [DavidAnson/markdownlint-cli2-action](https://github.com/davidanson/markdownlint-cli2-action) from 9.0.0 to 12.0.0.
- [Release notes](https://github.com/davidanson/markdownlint-cli2-action/releases)
- [Commits](5b7c9f74fe...3aaa38e446)

---
updated-dependencies:
- dependency-name: DavidAnson/markdownlint-cli2-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-27 11:06:42 -07:00
Eliza Weisman 85db2fcb69
meshtls: update to `rustls` v0.21.7 (#2472)
Currently, the proxy [depends on an outdated version of `rustls`][1],
v0.20.8. The `rustls` dependency is via our dependency on `tokio-rustls`
v0.23.4; we don't have a direct `rustls` dependency, in order to ensure
that the version of `rustls` is always the same version as used by
`tokio-rustls`. `rustls` also has a dependency on `webpki`, and v0.20.x
of `rustls` uses the original `webpki` crate, rather than the
`rustls-webpki` crate. So, unfortunately, because we have a transitive
dep on `webpki` via `rustls`, PR linkerd/linkerd2-proxy#2465 did not
remove _all_ `webpki` deps from our dependency tree, only the direct
dependency.

This branch updates to `rustls` v0.21.x, which depends on
`rustls-webpki` rather than `webpki`, removing the `webpki` dependency.
This is accomplished by updating `tokio-rustls` to v0.24.x, implicitly
updating the transitive `rustls` dep. In order to update to the
semver-incompatible version of `rustls`, it was necessary to modify our
code in order to track some breaking API changes. I've also added a
`cargo-deny` ban for `webpki` to our `deny.toml`, to ensure that we
always use the actively-maintained `rustls-webpki` crate rather than
`webpki` classic.

Since peer certificate validation is performed through `rustls` rather
than through the direct `rustls-webpki` dependency, this should
hopefully resolve issues with issuer certs that contain name constraints
--- these were not fixed by linkerd/linkerd2-proxy#2465, because the
failure with certs containing name constraints occurred inside of the
*`webpki` version depended on by `rustls`*, rather than inside of the
proxy's direct dep. See [this comment][2] for details.

In addition, it was necessary to update `rustls-webpki` to v0.101.6,
since v0.101.5 was yanked due to an accidental API breaking change.

<details>

<summary>Verifying that we no longer depend on `webpki`:</summary>

Before:

```console
$ cargo tree -p webpki -i
webpki v0.22.1
├── rustls v0.20.8
│   └── tokio-rustls v0.23.4
│       ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│       └── linkerd-meshtls-rustls v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/meshtls/rustls)
│           ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound)
│           │   ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app)
│           │   │   ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│           │   │   └── linkerd2-proxy v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd2-proxy)
│           │   ├── linkerd-app-admin v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/admin)
│           │   │   └── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│           │   │   [dev-dependencies]
│           │   │   └── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│           │   └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway)
│           │       └── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│           │   [dev-dependencies]
│           │   └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│           ├── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound)
│           │   ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│           │   └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│           │   [dev-dependencies]
│           │   └── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│           └── linkerd-meshtls v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/meshtls)
│               ├── linkerd-app-core v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/core)
│               │   ├── linkerd-app v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app) (*)
│               │   ├── linkerd-app-admin v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/admin) (*)
│               │   ├── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│               │   ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│               │   ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│               │   ├── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│               │   └── linkerd-app-test v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/test)
│               │       ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│               │       ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│               │       └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│               │       [dev-dependencies]
│               │       ├── linkerd-app-gateway v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/gateway) (*)
│               │       ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│               │       └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│               ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│               ├── linkerd-proxy-tap v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/proxy/tap)
│               │   └── linkerd-app-core v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/core) (*)
│               └── linkerd2-proxy v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd2-proxy)
│               [dev-dependencies]
│               ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│               ├── linkerd-app-integration v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/integration)
│               └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
│           [dev-dependencies]
│           ├── linkerd-app-inbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/inbound) (*)
│           └── linkerd-app-outbound v0.1.0 (/home/eliza/Code/linkerd2-proxy/linkerd/app/outbound) (*)
└── tokio-rustls v0.23.4 (*)
```

After:

```console
$ cargo tree -p webpki -i
error: package ID specification `webpki` did not match any packages
```

</details>

[1]:
    8afc72258b/Cargo.lock (L2450-L2460C2)
[2]:
    https://github.com/linkerd/linkerd2/issues/9299#issuecomment-1730094953
2023-09-21 14:17:49 -07:00
Pranoy Kumar Kundu 8afc72258b
Replace `procinfo` with `procfs` (#2433)
This PR replaces the [procinfo](https://crates.io/crates/procinfo) crate,
which has not been maintained for over 5 years, with
[procfs](https://crates.io/crates/procfs).

Signed-off-by: Pranoy Kumar Kundu <pranoy1998k@gmail.com>

Fixes linkerd/linkerd2/issues/10819
2023-09-20 12:47:30 -07:00
Eliza Weisman c10c4b7ac7
meshtls: use published `rustls-webpki` v0.101.5 (#2470)
Now that [v0.101.5 of `rustls-webpki`][1] has been [published][2], we
can now depend on the crate from crates.io. This allows us to remove the
Git dependency on the branch preparing that release to be published,
which allows us to remove the allowance for Git dependencies in the
`cargo-deny` config.

[1]: https://github.com/rustls/webpki/releases/tag/v%2F0.101.5
[2]: https://crates.io/crates/rustls-webpki/0.101.5
2023-09-18 11:13:11 -07:00
Eliza Weisman cd04d7a801
use `rustls-webpki` instead of `linkerd/webpki` (#2465)
This commit changes the `linkerd-meshtls-rustls` crate to use the
upstream `rustls-webpki` crate, maintained by Rustls, rather than our
fork of `briansmith/webpki` from GitHub. Since `rustls-webpki` includes
the change which was the initial motivation for the `linkerd/webpki`
fork (rustls/webpki#42), we can now depend on upstream.

Currently, we must take a Git dependency on `rustls-webpki`, since a
release including a fix for an issue (rustls/webpki#167) which prevents
`rustls-webpki` from parsing our test certificates has not yet been
published. Once v0.101.5 of `rustls-webpki` is published (see PR
rustls/webpki#170), we can remove the Git dep. For now, I've updated
`cargo-deny` to allow the Git dependency.
2023-09-11 11:03:52 -07:00
Eliza Weisman 426120a6e9
build(deps): use published version of `boring` (#2454)
The `linkerd-meshtls-boring` crate currently uses a Git dependency on
`boring` and `tokio-boring`. This is because, when this crate was
initially introduced, the proxy required unreleased changes to these
crates. Now, however, upstream has published all the changes we depended
on (this happened ages ago), and we can depend on these libraries from
crates.io.

This branch removes the Git deps and updates to v3.0.0 of
`boring`/`tokio-boring`. I've also changed the `cargo-deny` settings to
no longer allow Git deps on these crates, as we no longer depend on them
from Git.
2023-08-25 10:27:48 -07:00
Eliza Weisman 9fa90df4ec
Increase HTTP request queue capacity (#2449)
In 2.13, the default inbound and outbound HTTP request queue capacity
decreased from 10,000 requests to 100 requests (in PR #2078). This
change results in proxies shedding load much more aggressively while
under high load to a single destination service, resulting in increased
error rates in comparison to 2.12 (see linkerd/linkerd2#11055 for
details).

This commit changes the default HTTP request queue capacities for the
inbound and outbound proxies back to 10,000 requests, the way they were
in 2.12 and earlier. In manual load testing I've verified that
increasing the queue capacity results in a substantial decrease in 503
Service Unavailable errors emitted by the proxy: with a queue capacity
of 100 requests, the load test described [here] observed a failure rate
of 51.51% of requests, while with a queue capacity of 10,000 requests,
the same load test observes no failures.

Note that I did not modify the TCP connection queue capacities, or the
control plane request queue capacity. These were previously configured
by the same variable before #2078, but were split out into separate vars
in that change. I don't think the queue capacity limits for TCP
connection establishment or for control plane requests are currently
resulting in instability the way the decreased request queue capacity
is, so I decided to make a more focused change to just the HTTP request
queues for the proxies.

[here]: https://github.com/linkerd/linkerd2/issues/11055#issuecomment-1650957357
2023-08-03 10:19:39 -07:00
Alex Leong 32245601bb
Add support for response header filter (#2439)
Add support for the response header modifier, which was added to the proxy API in https://github.com/linkerd/linkerd2-proxy-api/pull/251

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-07-19 14:47:03 -07:00
Oliver Gould d6172c594c
Emit distinguishable version info (#2432)
The proxy currently emits very little useful version information.

This change updates the proxy to support new build-time environment
variables that are used to report version information:

* LINKERD2_PROXY_BUILD_TIME
* LINKERD2_PROXY_VENDOR
* LINKERD2_PROXY_VERSION

Additionally, several pre-existing Git-oriented metadata have been
removed, as they were generally redundant or uninformative. The Rustc
version has also been removed (since it has no real user-facing value
and can be easily determined by the version/tag).
2023-06-23 14:21:28 -07:00
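
One plausible way to consume the build-time variables named in #2432 is the standard `option_env!` macro, which reads an environment variable when the crate is compiled. The `BuildInfo` struct, the `or_default` helper, and the fallback strings are assumptions for illustration, not the proxy's actual code:

```rust
/// Build metadata captured at compile time (hypothetical struct).
#[derive(Debug)]
pub struct BuildInfo {
    pub version: &'static str,
    pub vendor: &'static str,
    pub date: &'static str,
}

/// Falls back to a placeholder when a variable was not set at build time.
const fn or_default(value: Option<&'static str>, default: &'static str) -> &'static str {
    match value {
        Some(v) => v,
        None => default,
    }
}

/// `option_env!` is evaluated by the compiler, so these values are fixed
/// when the binary is built. The fallback strings are assumed placeholders.
pub const BUILD_INFO: BuildInfo = BuildInfo {
    version: or_default(option_env!("LINKERD2_PROXY_VERSION"), "0.0.0-dev"),
    vendor: or_default(option_env!("LINKERD2_PROXY_VENDOR"), "unknown"),
    date: or_default(option_env!("LINKERD2_PROXY_BUILD_TIME"), "unknown"),
};
```

Building with, for example, `LINKERD2_PROXY_VERSION=1.2.3 cargo build` would bake that value into `BUILD_INFO.version`; a plain `cargo build` falls back to the placeholders.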
Eliza Weisman f0277cc776
outbound: handle `NotFound` client policies in ingress mode (#2431)
When the outbound proxy resolves an outbound policy from the policy
controller's `OutboundPolicies` API, the policy controller may return an
error with the `grpc-status` code `NotFound` in order to indicate that
the destination is not a ClusterIP service. When this occurs, the proxy
will fall back to either using a ServiceProfile, if the ServiceProfile
contains non-trivial configuration, or synthesizing a default client
policy from the ServiceProfile.

However, when the outbound proxy is configured to run in ingress mode,
the fallback behavior does not occur. Instead, the ingress mode proxy
treats any error returned by the policy controller's `OutboundPolicies`
API as fatal. This means that when an ingress controller performs its
own load-balancing and opens a connection to a pod IP directly, the
ingress mode proxy will fail any requests on that connection. This is a
bug, and is the cause of the issues described in linkerd/linkerd2#10908.

This branch fixes this by changing the ingress mode proxy to handle
`NotFound` errors returned by the policy controller. I've added similar
logic for synthesizing default policies from a discovered
ServiceProfile, or using the profile if it's non-trivial. Unfortunately,
we can't just reuse the existing `Outbound::resolver` method, as ingress
discovery may be performed for an original destination address *or* for
a DNS name, and it's necessary to construct fallback policies in either
case. Instead, I've added a new function with similar behavior that's
ingress-specific.

I've manually tested this change against the repro steps[^1] described
in linkerd/linkerd2#10908, and verified that the proxy 503s on 2.13.4,
and that it once again routes correctly after applying this change.

Fixes linkerd/linkerd2#10908.

[^1]: As described in the first comment, using Contour and podinfo.
2023-06-20 12:14:37 -07:00
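
The fallback behavior described in #2431 amounts to a small decision table. The sketch below models it with plain enums; `PolicyResponse`, `Profile`, `Routing`, and `resolve` are hypothetical names, and real discovery is asynchronous and carries far more state:

```rust
/// Hypothetical outcome of querying the policy controller.
#[derive(Debug, PartialEq, Eq)]
pub enum PolicyResponse {
    Policy(String),
    /// The controller answered with a `NotFound` status.
    NotFound,
    /// Any other error from the controller.
    Failed(String),
}

/// Hypothetical view of a discovered ServiceProfile.
pub struct Profile {
    pub addr: String,
    /// True when the profile carries no meaningful configuration.
    pub is_trivial: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Routing {
    /// Use the policy returned by the controller.
    ClientPolicy(String),
    /// Use the non-trivial ServiceProfile directly.
    Profile(String),
    /// Synthesize a default client policy from the profile.
    DefaultPolicy(String),
}

pub fn resolve(response: PolicyResponse, profile: &Profile) -> Result<Routing, String> {
    match response {
        PolicyResponse::Policy(name) => Ok(Routing::ClientPolicy(name)),
        // `NotFound` is not fatal: fall back based on the profile's contents.
        PolicyResponse::NotFound if !profile.is_trivial => {
            Ok(Routing::Profile(profile.addr.clone()))
        }
        PolicyResponse::NotFound => Ok(Routing::DefaultPolicy(profile.addr.clone())),
        // All other controller errors remain fatal.
        PolicyResponse::Failed(msg) => Err(msg),
    }
}
```

Before the fix, an ingress-mode proxy effectively treated the `NotFound` arms like the `Failed` arm.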
Eliza Weisman 9f0a2698f4
outbound: add backend and route metadata to errors (#2428)
PRs #2418 and #2419 add per-route and per-backend request timeouts
configured by the `OutboundPolicies` API to the `MatchedRoute` and
`MatchedBackend` layers in the outbound `ClientPolicy` stack,
respectively. This means that — unlike in the `ServiceProfile` stack —
two separate request timeouts can be configured in `ClientPolicy`
stacks. However, because both the `MatchedRoute` and `MatchedBackend`
layers are in the HTTP logical stack, the errors emitted by both
timeouts will have a `LogicalError` as their most specific error
metadata, meaning that the log messages and `l5d-proxy-error` headers
recorded for these timeouts do not indicate whether the timeout that
failed the request was the route request timeout or the backend request
timeout.

In order to ensure this information is recorded and exposed to the user,
this branch adds two new error wrapper types, one of which enriches an
error with a `RouteRef`'s metadata, and one of which enriches an error
with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all
errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps
errors with `BackendRef` metadata. This way, when the route timeout
fails a request, the error will include the route metadata, while when
the backend request timeout fails a request, the error will include both
the route and backend metadata.

Adding these new error wrappers also has the additional side benefit of
adding this metadata to errors returned by filters, allowing users to
distinguish between errors emitted by a filter on a route rule and
errors emitted by a per-backend filter. Also, any other errors emitted
lower in the stack for requests that are handled by a client policy
stack will now also include this metadata, which seems generally useful.

Example errors, taken from a proxy unit test:

backend request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s
```
route request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s
```
2023-06-15 12:26:31 -07:00
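
The layered messages shown in #2428 follow the usual Rust pattern of wrapping an error and exposing the inner one through `Error::source`. The sketch below reproduces the route and backend portion of those messages; `ResponseTimeout`, `RouteError`, `BackendError`, and `render` are hypothetical names, not the wrapper types added by the commit:

```rust
use std::error::Error;
use std::fmt;

/// The innermost failure: a response timeout after some number of seconds.
#[derive(Debug)]
pub struct ResponseTimeout(pub u64);

impl fmt::Display for ResponseTimeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "HTTP response timeout after {}s", self.0)
    }
}

impl Error for ResponseTimeout {}

/// Wraps an error with the route that was handling the request.
#[derive(Debug)]
pub struct RouteError {
    pub route: String,
    pub source: Box<dyn Error>,
}

impl fmt::Display for RouteError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "route {}", self.route)
    }
}

impl Error for RouteError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Wraps an error with the backend that was selected for the request.
#[derive(Debug)]
pub struct BackendError {
    pub backend: String,
    pub source: Box<dyn Error>,
}

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "backend {}", self.backend)
    }
}

impl Error for BackendError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Joins an error and all of its sources with ": ", outermost first.
pub fn render(error: &dyn Error) -> String {
    let mut out = error.to_string();
    let mut cause = error.source();
    while let Some(e) = cause {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cause = e.source();
    }
    out
}
```

A route timeout is wrapped only in `RouteError`, while a backend timeout is wrapped in both, which is what lets the two cases be told apart in logs.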
Eliza Weisman 966306bc84
outbound: implement `OutboundPolicies` backend request timeouts (#2419)
Depends on #2418 

The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
PR #2418 updates the proxy to depend on the new proxy-api release, and
implements the `Rule.request_timeout` field added to the API. However,
that branch does *not* add a timeout for the
`RouteBackend.request_timeout` field. This branch changes the proxy to
apply the backend request timeout when configured by the policy
controller.

This branch implements `RouteBackend.request_timeout` by adding an
additional timeout layer in the `MatchedBackend` stack. This applies the
per-backend timeout once a backend is selected for a route. I've also
added stack tests for the interaction between the request and backend
request timeouts.

Note that once retries are added to client policy stacks, it may be
necessary to move the backend request timeout to ensure it occurs
"below" retries, depending on where the retry middleware ends up being
located in the proxy stack.
2023-06-15 10:48:49 -07:00
dependabot[bot] 1a6225feb7
build(deps): bump tj-actions/changed-files from 35.7.7 to 36.2.1 (#2427)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 35.7.7 to 36.2.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](db5dd7c176...c9124514c3)

---
updated-dependencies:
- dependency-name: tj-actions/changed-files
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-15 09:33:42 -07:00
Eliza Weisman a512ea658d
outbound: implement `OutboundPolicies` route request timeouts (#2418)
The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.

This branch updates the proxy-api dependency to v0.10.0 and adds the new
timeout configuration fields to the proxy's internal client policy
types. In addition, this branch adds a timeout middleware to the HTTP
client policy stack, so that the timeout described by the
`Rule.request_timeout` field is now applied.

Implementing the `RouteBackend.request_timeout` field with semantics as
close as possible to those described in GEP-1742 will be somewhat more
complex, and will be added in a separate PR.
2023-06-14 11:36:40 -07:00
Eliza Weisman 864a5dbc97
recover: remove unused `mut` (#2425) 2023-06-12 11:45:35 -07:00
Alex Leong 704ef31a28
Classify grpc requests properly in the http classifier (#2410)
The gRPC protocol always sets the HTTP response status code to 200 and instead communicates failures in a grpc-status header sent in a TRAILERS frame. Linkerd uses the HTTP response status code to determine if a response is successful, and therefore will consider all gRPC responses successful regardless of their gRPC status code. This means that features such as retries and circuit breaking do not work correctly with gRPC traffic.

We update the Http classifier to look for the presence of a `Content-Type: application/grpc` header and use Grpc response classification when it is set.
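
A minimal sketch of the selection logic, for illustration only. The names `Classifier`, `is_grpc_content_type`, and `classifier_for` are hypothetical and are not the proxy's actual types; accepting `+<format>` and `;<params>` suffixes while rejecting other suffixes is an assumption of this sketch.

```rust
/// Response classification strategies (hypothetical names for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum Classifier {
    /// Judge the response by its HTTP status code.
    HttpStatus,
    /// Judge the response by the `grpc-status` trailer.
    GrpcStatus,
}

/// Returns true if a `Content-Type` value denotes gRPC, i.e. it is
/// `application/grpc` optionally followed by `+<format>` or `;<params>`.
/// (Assumption: other suffixes, such as `-web`, are not treated as gRPC.)
fn is_grpc_content_type(value: &str) -> bool {
    let value = value.trim().to_ascii_lowercase();
    match value.strip_prefix("application/grpc") {
        Some(rest) => rest.is_empty() || rest.starts_with('+') || rest.starts_with(';'),
        None => false,
    }
}

/// Selects a classifier from the request's `Content-Type`, if present.
fn classifier_for(content_type: Option<&str>) -> Classifier {
    match content_type {
        Some(value) if is_grpc_content_type(value) => Classifier::GrpcStatus,
        _ => Classifier::HttpStatus,
    }
}
```

Requests without a `Content-Type` header fall back to HTTP status classification in this sketch.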

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-05-22 13:40:37 -07:00
Matei David 3933feb587
Skip h2 upgrade when target is local (#2407)
In the most recent stable versions, pods cannot communicate with themselves when using a ClusterIP. While direct (pod-to-pod) connections are never sent through the proxy and are skipped at the iptables level, connections to a logical service still pass through the proxy. When the chosen endpoint is the same as the source of the traffic, TLS and H2 upgrades should be skipped.

Every endpoint receives an h2 upgrade hint in its metadata. When looking into the problem, I noticed that client settings do not take into account that the target may be local. When deciding what client settings to use, we do not upgrade the connection when the hint is "unknown" (gatewayed connections) or "opaque". This change does a similar thing by using H1 settings when the protocol is H1 and the target IP is also part of the inbound IPs passed to the proxy.
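
The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the proxy's code: `Hint`, `ClientSettings`, and `client_settings` are hypothetical names, and the function only models requests whose protocol is HTTP/1.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Protocol hint attached to an endpoint (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Hint {
    H2,
    Opaque,
    Unknown,
}

/// Client settings chosen for an HTTP/1 request (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ClientSettings {
    /// Send the request as plain HTTP/1.
    Http1,
    /// Upgrade the HTTP/1 request to HTTP/2 between proxies.
    UpgradeToH2,
}

/// Chooses settings for an HTTP/1 request to `target`.
/// `inbound_ips` is the set of IPs that belong to this proxy's own pod.
fn client_settings(hint: Hint, target: IpAddr, inbound_ips: &HashSet<IpAddr>) -> ClientSettings {
    match hint {
        // No upgrade for gatewayed ("unknown") or opaque endpoints.
        Hint::Unknown | Hint::Opaque => ClientSettings::Http1,
        // No upgrade when the chosen endpoint is the local pod itself.
        Hint::H2 if inbound_ips.contains(&target) => ClientSettings::Http1,
        Hint::H2 => ClientSettings::UpgradeToH2,
    }
}
```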

Fixes linkerd/linkerd2#10816

Signed-off-by: Matei David <matei@buoyant.io>
2023-05-10 10:01:03 +01:00
Willi Schönborn 5366652c2b
Propagate correct span ID via W3C context (#2408)
The W3C context propagation uses the wrong span ID right now. That
causes all spans emitted by linkerd-proxy to be siblings rather than
children of their original parent.

This only applies to W3C as far as I can tell, because the B3
propagation uses the span ID correctly.

Signed-off-by: Willi Schönborn <w.schoenborn@gmail.com>
2023-05-04 09:21:35 -07:00
Oliver Gould c30d39f73e
dev: Update to Rust v1.69.0 (#2402)
Update dev to v40
2023-04-25 15:56:34 -07:00
dependabot[bot] 2a32127d48
build(deps): bump async-trait from 0.1.66 to 0.1.68 (#2368)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.66 to 0.1.68.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.66...0.1.68)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
2023-04-25 14:52:13 -07:00
dependabot[bot] fc3a1e860c
build(deps): bump futures from 0.3.26 to 0.3.28 (#2370)
Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.26 to 0.3.28.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.26...0.3.28)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-25 14:08:37 -07:00
Eliza Weisman 717fd22b23
chore: allow `syn` v1 and v2 to coexist peacefully (#2401)
The proc-macro ecosystem is in the middle of a migration from `syn` v1
to `syn` v2. Some crates (such as `tokio-macros`, `async-trait`,
`tracing-attributes`, etc) have been updated to v2, while others haven't
yet. This means that `cargo deny` will not currently permit us to update
some of those crates to versions that depend on `syn` v2, because they
will create a duplicate dependency.

Since `syn` is used by proc-macros (executed at compile time), duplicate
versions won't have an impact on the final binary size. Therefore, it's
fine to allow both v1 and v2 to coexist while the ecosystem is still
being gradually migrated to the new version.
2023-04-25 12:42:20 -07:00
dependabot[bot] aacd8c9bac
build(deps): bump io-lifetimes from 1.0.4 to 1.0.10 (#2379)
Bumps [io-lifetimes](https://github.com/sunfishcode/io-lifetimes) from 1.0.4 to 1.0.10.
- [Release notes](https://github.com/sunfishcode/io-lifetimes/releases)
- [Commits](https://github.com/sunfishcode/io-lifetimes/compare/1.0.4...v1.0.10)

---
updated-dependencies:
- dependency-name: io-lifetimes
  dependency-type: indirect
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-04-25 09:58:41 -07:00
Eliza Weisman 181a20753a
outbound: synthesize client policies on `Unimplemented` (#2396)
If the policy controller is from a Linkerd version earlier than 2.13.x,
it will return the `Unimplemented` gRPC status code for requests to the
`OutboundPolicies` API. The proxy's outbound policy client will
currently retry this error code, rather than synthesizing a default
policy. Since 2.13.x proxies require an `OutboundPolicy` to be
discovered before handling outbound traffic, this means that 2.13.x
proxies cannot handle outbound connections when the control plane
is on an earlier version. Therefore, installing Linkerd 2.13 and then
downgrading to 2.12 can potentially break the data plane's ability to
route traffic.

In order to support downgrade scenarios, the proxy should also
synthesize a default policy when receiving an `Unimplemented` gRPC
status code from the policy controller. This branch changes the proxy to
do that. A warning is logged which indicates that the control plane
version is older than the proxy's.
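
As a rough sketch of the intended behavior (not the actual client code):
`GrpcCode`, `Recovery`, and `recover` are hypothetical names, the warning
text is made up for illustration, and retrying every other error code is
a simplifying assumption of this sketch.

```rust
/// A few gRPC status codes relevant to this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum GrpcCode {
    Unimplemented,
    Unavailable,
    Internal,
}

/// What the policy client does after a failed `OutboundPolicies` request.
#[derive(Debug, PartialEq, Eq)]
enum Recovery {
    /// Stop retrying, log the warning, and use a synthesized default policy.
    SynthesizeDefault { warning: &'static str },
    /// Keep retrying the request.
    Retry,
}

/// Maps an error code to a recovery action.
fn recover(code: GrpcCode) -> Recovery {
    match code {
        // An older policy controller does not serve the API at all, so
        // retrying can never succeed.
        GrpcCode::Unimplemented => Recovery::SynthesizeDefault {
            warning: "policy controller does not implement OutboundPolicies; \
                      the control plane may be older than the proxy",
        },
        // Assumption: all other errors are treated as retryable here.
        GrpcCode::Unavailable | GrpcCode::Internal => Recovery::Retry,
    }
}
```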
2023-04-25 09:56:30 -07:00
Eliza Weisman ad4d5b64bb
inbound: determine default policies using the opaque ports env var (#2395)
The proxy injector populates an environment variable,
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION`, with a list
of all ports marked as opaque. Currently, however, the proxy _does not
actually use this environment variable_. Instead, opaque ports are
discovered from the policy controller. The opaque ports environment
variable was used only when running in the "fixed" inbound policy mode,
where all inbound policies are determined from environment variables,
and no policy controller address is provided. This mode is no longer
supported, and the policy controller address is now required, so the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable is not currently used to discover inbound opaque ports.

There are two issues with the current state of things. One is that
inbound policy discovery is _non-blocking_: when an inbound proxy
receives a connection on a port that it has not previously discovered a
policy for, it uses the default policy until it has successfully
discovered a policy for that port from the policy controller. This means
that the proxy may perform protocol detection on the first connection to
an opaque port. This isn't great, as it may result in a protocol
detection timeout error on a port that the user had previously marked as
opaque. It would be preferable for the proxy to read the environment
variable, and use it to determine whether the default policy for a port
is opaque, so that ports marked as opaque disable protocol detection
even before the "actual" policy is discovered.

The other issue with the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable is that it is currently a list of _individual port numbers_,
while the proxy injector can accept annotations that specify _ranges_ of
opaque ports. This means that when a very large number of ports are
marked as opaque, the proxy manifest must contain a list of each
individual port number in those ranges, making it potentially quite
large. See linkerd/linkerd2#9803 for details on this issue.

This branch addresses both of these problems. The proxy is changed so
that it will once again read the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable, and use it to determine which ports should have opaque
policies by default. The parsing of the environment variable is changed
to support specifying ports as a list of ranges, rather than a list of
individual port numbers. Along with a proxy-injector change, this would
resolve the manifest size issue described in linkerd/linkerd2#9803.

This is implemented by changing the `inbound::policy::Store` type to
also include a set of port ranges that are marked as opaque. When the
`Store` handles a `get_policy` call for a port that is not already in
the cache, it starts a control plane watch for that port just as it did
previously. However, when determining the initial _default_ value for
the policy, before the control plane discovery provides one, it checks
whether the port is in a range that is marked as opaque, and, if it is,
uses an opaque default policy instead.
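
A simplified sketch of the range parsing and the default-policy lookup
follows. It is illustrative only: `parse_port_ranges`, `OpaqueDefaults`,
and `DefaultPolicy` are hypothetical names, and the `start-end` syntax
with comma separators is an assumption about the variable's format.

```rust
use std::ops::RangeInclusive;

/// Parses a comma-separated list such as "25,4000-5000" into inclusive
/// port ranges. A single port `p` becomes the range `p..=p`.
fn parse_port_ranges(s: &str) -> Result<Vec<RangeInclusive<u16>>, String> {
    let mut ranges = Vec::new();
    for part in s.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (lo, hi) = match part.split_once('-') {
            Some((lo, hi)) => (lo.trim(), hi.trim()),
            None => (part, part),
        };
        let lo_port = lo
            .parse::<u16>()
            .map_err(|e| format!("invalid port {lo:?}: {e}"))?;
        let hi_port = hi
            .parse::<u16>()
            .map_err(|e| format!("invalid port {hi:?}: {e}"))?;
        if lo_port > hi_port {
            return Err(format!("invalid range {part:?}: start exceeds end"));
        }
        ranges.push(lo_port..=hi_port);
    }
    Ok(ranges)
}

/// Default policy used until discovery returns the real one.
#[derive(Debug, PartialEq, Eq)]
enum DefaultPolicy {
    /// Perform protocol detection.
    Detect,
    /// Treat the connection as opaque TCP.
    Opaque,
}

/// Holds only the ranges, so no per-port state exists before first use.
struct OpaqueDefaults {
    ranges: Vec<RangeInclusive<u16>>,
}

impl OpaqueDefaults {
    /// Picks the initial default for a port that is not yet in the cache.
    fn default_policy(&self, port: u16) -> DefaultPolicy {
        if self.ranges.iter().any(|r| r.contains(&port)) {
            DefaultPolicy::Opaque
        } else {
            DefaultPolicy::Detect
        }
    }
}
```

Because only the ranges are stored, a range covering thousands of ports
costs one entry here rather than one cached policy per port.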

This approach was chosen rather than pre-populating the `Store` with
policies for all opaque ports to better handle the case where very large
ranges are marked as opaque and are used infrequently. If the `Store`
was pre-populated with default policies for all such ports, it would
essentially behave as though all ports in
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` were also in
`LINKERD2_PROXY_INBOUND_PORTS`, and the proxy would immediately start a
policy controller discovery watch for all opaque ports, which would be
kept open for the proxy's entire lifetime. In cases where the opaque
port ranges include tens of thousands of ports, this would cause
significant unnecessary load on the policy controller. Storing opaque port ranges
separately and using them to determine the default policy as needed
allows opaque port policies to be treated the same as non-default ports,
which are discovered as needed and can be evicted from the cache if they
are unused. If a port is in both
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` *and*
`LINKERD2_PROXY_INBOUND_PORTS`, the proxy will start discovery eagerly
and retain the port in the cache forever, but the default policy will be
opaque.

I've also added a test for the behavior of opaque ports where the port's
policy has not been discovered from the policy controller. That test
fails on `main`, as the proxy attempts protocol detection, but passes on
this branch.

In addition, I changed the parsing of the `LINKERD2_PROXY_INBOUND_PORTS`
environment variable to also accept ranges, because it seemed like a
nice thing to do while I was here. :)
2023-04-25 09:56:07 -07:00
Eliza Weisman 15bebe4eeb
outbound: add missing `meta` field in test policy (#2400)
Looks like we accidentally merged PR #2375 without a CI build against
the latest state of `main`. In the meantime since #2375 was last built
on CI, PR #2374 added an additional metadata field to
`policy::HttpParams`, which the `HttpParams` constructed in the test
added from #2375 doesn't populate. Therefore, merging #2375 broke the
build. Whoops!

This commit populates the `meta` field, fixing it.
2023-04-24 14:22:28 -07:00
Eliza Weisman 2a48e12e2b
outbound: test load balancer behavior with failure accrual (#2375)
This branch adds a new test for failure accrual in load balancers with
multiple endpoints. This test asserts that endpoints whose circuit
breakers have tripped will not be selected by a load balancer.
2023-04-24 13:34:25 -07:00
Eliza Weisman 423ffb0af8
set default `trust_dns` log level to `ERROR` (#2393)
Since upstream has yet to release a version with PR
bluejekyll/trust-dns#1881, this commit changes the proxy's default log
level to silence warnings from `trust_dns_proto` that are generally
spurious.

See linkerd/linkerd2#10123 for details.
2023-04-24 13:25:25 -07:00
Eliza Weisman 9d8607351a
outbound: determine protocol based on `OutboundPolicy` (#2397)
Currently, the outbound proxy determines whether or not to perform
protocol detection based on the presence of the `opaque_protocol` field
on the resolved `ServiceProfile` from the Destination controller.
However, the `OutboundPolicy` resolved from the policy controller also
contains a `proxy_protocol` field that indicates what protocol should be
used for this destination. While the proxy uses the HTTPRoutes from the
`OutboundPolicy`'s `proxy_protocol`, it does _not_ take into account the
`proxy_protocol` when determining whether or not to perform protocol
detection. This can result in the outbound proxy performing protocol
detection on connections to destinations that have been marked as
opaque.

This branch modifies the outbound proxy to use the `proxy_protocol` from
the `OutboundPolicy`, as well as the `opaque_protocol` field from the
`ServiceProfile`, when determining whether or not to perform protocol
detection. In addition, I've added an integration test, which fails before
making the changes on this branch.
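
The combined check can be sketched as below. This is an illustration,
not the proxy's implementation: `PolicyProtocol`, `Discovery`, and
`should_detect_protocol` are hypothetical names, and the policy protocol
is reduced to two variants for brevity.

```rust
/// Protocol indicated by the `OutboundPolicy` (reduced to two cases).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum PolicyProtocol {
    /// The policy asks the proxy to detect the protocol.
    Detect,
    /// The policy marks the destination as opaque.
    Opaque,
}

/// Discovery results for one destination (hypothetical type).
struct Discovery {
    /// Whether the `ServiceProfile` marks the destination as opaque.
    profile_opaque: bool,
    /// The protocol from the `OutboundPolicy`.
    policy_protocol: PolicyProtocol,
}

/// Detection runs only if neither source marks the destination opaque.
fn should_detect_protocol(d: &Discovery) -> bool {
    !d.profile_opaque && d.policy_protocol != PolicyProtocol::Opaque
}
```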

Fixes linkerd/linkerd2#10745
2023-04-24 13:22:08 -07:00
Eliza Weisman 051e0e199d
build(deps): bump `h2` to v0.3.18 (#2394)
The DoS mitigation changes in `h2` v0.3.17 inadvertently introduced a
potential panic (hyperium/h2#674). Version 0.3.18 fixes this, so we
should bump the proxy's dependency to avoid panics.
2023-04-18 14:51:25 -07:00
Eliza Weisman c7918cfb1f
outbound: handle `Opaque` protocol hints on endpoints (#2237)
Currently, when the outbound proxy makes a direct connection prefixed
with a `TransportHeader` in order to send HTTP traffic, it will always
send a `SessionProtocol` hint with the HTTP version as part of the
header. This instructs the inbound proxy to use that protocol, even if
the target port has a ServerPolicy that marks that port as opaque, which
can result in incorrect handling of that connection. See
linkerd/linkerd2#9888 for details.

In order to prevent this, linkerd/linkerd2-proxy-api#197 adds a new
`ProtocolHint` value to the protobuf endpoint metadata message. This
will allow the Destination controller to explicitly indicate to the
outbound proxy that a given endpoint is known to handle all connections
to a port as an opaque TCP stream, and that the proxy should not perform
a protocol upgrade or send a `SessionProtocol` in the transport header.
This branch updates the proxy to handle this new hint value, and adds
tests that the outbound proxy behaves as expected.

Along with linkerd/linkerd2#10301, this will fix linkerd/linkerd2#9888.

I opened a new PR for this change rather than attempting to rebase my
previous PR #2209, as it felt a bit easier to start with a new branch
and just make the changes that were still relevant. Therefore, this
closes #2209.
2023-04-13 14:09:01 -07:00