this commit removes a piece of code that has been commented out.
it additionally removes a variable binding that is not needed. `dst` is
not moved, so we do not need to bind the address of the destination
service to a variable, nor do we need to clone it.
Signed-off-by: katelyn martin <kate@buoyant.io>
Now that the `rustls` initialization/configuration has been decoupled from `linkerd-meshtls`, we can get the provider directly from there. This handles the uninitialized case better, which should be less of a problem now that we always directly initialize the provider in main.
Signed-off-by: Scott Fleener <scott@buoyant.io>
`NewBroadcastClassification<C, X, N>` is not used.
`BroadcastClassification<C, S>` is only used by the `gate` submodule in
this crate.
this commit removes `NewBroadcastClassification`, since it is unused.
this commit demotes `channel` to an internal submodule, since it has no
external users.
the reëxport of `BroadcastClassification` is unused, though it is left
intact because it _is_ exposed by `NewClassifyGateSet`'s implementation
of `NewService<T>`.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_core::classify` reëxports symbols from
`linkerd_proxy_http::classify::gate`.
nothing makes use of this, and these symbols are already reëxported from
`linkerd_proxy_http::classify`. existing callsites in the outbound proxy
import this middleware directly, or through the reëxport in
`linkerd_proxy_http`.
this commit removes this `pub use` directive, since it does nothing.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(tls): Move rustls into dedicated crate
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Remove extraneous provider installs from tests
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Install default rustls provider in main
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Refactor shims out of meshtls
Meshtls previously assumed that multiple TLS implementations could be used. Now that we've consolidated on rustls as the TLS implementation, we can remove these shims.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Refactor mode out of meshtls
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit removes `linkerd_app_inbound::Inbound::proxy_metrics()`.
this accessor is not used anywhere.
Signed-off-by: katelyn martin <kate@buoyant.io>
* feat(tls): Explicitly include post-quantum key exchange algorithms
This explicitly sets the key exchange algorithms the proxy uses. It adds `X25519MLKEM768` as the most preferred algorithm in non-FIPS mode, and `SECP256R1MLKEM768` in FIPS mode.
Note that `X25519MLKEM768` is still probably appropriate for FIPS environments according to [NIST's special publication 800-56Cr2](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Cr2.pdf) as it performs a FIPS-approved key-establishment first (`MLKEM768`), but we should evaluate this position more before committing to it.
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit changes this comment in two ways:
1. fix a copy-paste typo. this should say "inbound", not "outbound".
2. add note that this is a "legacy" structure.
the equivalent structure in the outbound proxy was labeled as such
in https://github.com/linkerd/linkerd2-proxy/pull/2887.
see:
```rust
/// Holds LEGACY outbound proxy metrics.
#[derive(Clone, Debug)]
pub struct OutboundMetrics {
pub(crate) http_errors: error::Http,
pub(crate) tcp_errors: error::Tcp,
// pub(crate) http_route_backends: RouteBackendMetrics,
// pub(crate) grpc_route_backends: RouteBackendMetrics,
/// Holds metrics that are common to both inbound and outbound proxies. These metrics are
/// reported separately
pub(crate) proxy: Proxy,
pub(crate) prom: PromMetrics,
}
```
\- <dce6b61191/linkerd/app/outbound/src/metrics.rs (L22-L35)>
`authz::HttpAuthzMetrics`, `error::HttpErrorMetrics`,
`authz::TcpAuthzMetrics`, and `error::TcpErrorMetrics` all make use of
the "legacy" metrics implementation defined in `linkerd_metrics`.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(tls): Remove ring as crypto backend
The broader ecosystem has mostly moved to aws-lc-rs as the primary rustls backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* feat(tls): Remove boring as a TLS implementation
BoringSSL, as we use it today, doesn't integrate well with the broader rustls ecosystem, so this removes it. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Restore existing aws-lc feature names for compatibility
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(tls): Use correct feature name for fips conditionals
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
This adds a few small improvements to how we handle the `aws-lc` usage in the proxy:
- Pull provider customization to the `aws-lc` backend, reducing the amount that the module exposes
- Validate that the provider is actually FIPS compatible when fips is enabled
- Use the same signature verification algorithms in the `rustls` provider as we do in the cert verifier. Previously, the provider also included RSA_PSS_2048_8192_SHA256, which is marked as legacy and which we have no strong reason to support.
- Add change detector tests for the cipher suites, key exchange groups, and signature algorithms. These should ideally never change unless `rustls` changes, at which point we can re-evaluate which algorithms are in use by the proxy.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Remove ring as crypto backend
The broader ecosystem has mostly moved to aws-lc-rs as the primary rustls backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Restore existing aws-lc feature names for compatibility
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(tls): Use correct feature name for fips conditionals
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
`linkerd-metrics` contains a suite of facilities for defining,
registering, and serving Prometheus metrics. these predate the
[`prometheus-client`](https://crates.io/crates/prometheus-client/)
crate, which should now be used for our metrics.
`linkerd-metrics` defines a `prom` namespace, which reëxports symbols
from the `prometheus-client` library. as the documentation comment for
this submodule notes, this should be used for all new metrics.
6b323d8457/linkerd/metrics/src/lib.rs (L30-L60)
`linkerd-metrics` still provides its legacy types in the public surface
of this library today, which can make it difficult to differentiate
between our two metrics implementations.
this branch introduces a new `legacy` namespace, to help clarify the
distinction between these two Prometheus implementations, and to smooth
the road to further adoption of `prometheus-client` interfaces across
the proxy.
---
* refactor(metrics): introduce empty `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Counter` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Gauge` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Histogram` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Metric` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtMetric` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtMetrics` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtLabels` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `LastUpdate` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Store` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `SharedStore` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Serve` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `NewMetrics` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Factor` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
This includes a small set of metrics about the currently installed rustls crypto provider and the algorithms it is configured to use.
We don't have 100% assurance that a default crypto provider has been installed before registering the metric, but in local testing it never appeared to be a problem. When we refactor the rustls initialization we can add an extra guarantee that we've initialized it by this point.
Example metric:
```
# HELP rustls_info Proxy TLS info.
# TYPE rustls_info gauge
rustls_info{tls_suites="TLS13_AES_128_GCM_SHA256,TLS13_AES_256_GCM_SHA384,TLS13_CHACHA20_POLY1305_SHA256,",tls_kx_groups="X25519,secp256r1,secp384r1,X25519MLKEM768,",tls_rand="AwsLcRs",tls_key_provider="AwsLcRs",tls_fips="false"} 1
```
Signed-off-by: Scott Fleener <scott@buoyant.io>
We only build linux/amd64 during typical release CI runs. This means that the platform-
specific builds are not exercised. This change updates the release workflow so that
all platforms are built whenever the workflow itself is changed.
The broader ecosystem has mostly moved to `aws-lc-rs` as the primary `rustls` backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
This requires some extra configuration for successful cross-compilation. Ideally, we can remove this extra configuration once linkerd/dev v48 is available.
This doesn't remove `ring` as a crypto backend; that can come in a follow-up at https://github.com/linkerd/linkerd2-proxy/pull/4029
* build(deps): bump tokio from 1.45.0 to 1.47.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.45.0 to 1.47.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.45.0...tokio-1.47.0)
---
updated-dependencies:
- dependency-name: tokio
dependency-version: 1.47.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
---
chore(deny): ignore socket2@v0.5
there is now a v0.6 used by the latest tokio.
while we wait for this new version to propagate through the ecosystem,
allow for two socket2 dependencies.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/integration): remove inbound_io_err test
> @cratelyn I think it would be appropriate to remove these tests, given
> that they can no longer behave properly. I don't think that this test
> case is particularly meaningful or load bearing, it's best just to
> unblock the dependency updates.
\- <https://github.com/BuoyantIO/enterprise-linkerd/issues/1645#issuecomment-3046905516>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/integration): remove inbound_multi test
this test exercises the same thing that the previous two tests do, as
the comment at the top of it points out.
this test is redundant, and we have removed the i/o error coverage that
this was redundant with. let's remove it.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
This architecture has become too significant of a maintenance burden, and isn't used often enough to justify the associated maintenance cost.
This removes arm/v7 from all the build infrastructure/dockerfiles/etc. Note that arm64 targets are still widely used and well supported.
Related: https://github.com/linkerd/linkerd2/pull/14308
Signed-off-by: Scott Fleener <scott@buoyant.io>
Currently, disabling the `ring` feature does not actually disable the dependency across the tree. Doing so requires a couple of tightly coupled steps:
- Making `ring` and `aws-lc` exclusive features, raising a compile error if they are both enabled.
- Removing a direct dependency on some `ring` types, and instead going through `rustls` for equivalent functionality.
- Removing a direct dependency on the `ring` crypto provider for integration tests, and instead using the provider from `linkerd-meshtls`.
- Installing the default crypto provider globally for the process and re-using it when requested, mostly to make the tests pass.
This was tested using a temporary `cargo deny` config that forbade `ring` when `aws-lc-rs` was used, and vice versa. Note that it doesn't completely remove `ring` for dev dependencies, but that can be done as a follow-up.
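
As a minimal sketch of the first step, mutually exclusive Cargo features can be enforced with `compile_error!`. The feature names mirror the ones above; the `backend_name` helper is hypothetical and exists only to make the selected backend observable. This is an illustration of the technique, not necessarily how the proxy's crates spell it.

```rust
// Fail the build if both crypto backends are enabled at once.
#[cfg(all(feature = "ring", feature = "aws-lc"))]
compile_error!("the `ring` and `aws-lc` features are mutually exclusive");

/// Hypothetical helper reporting which backend was compiled in.
pub fn backend_name() -> &'static str {
    if cfg!(feature = "aws-lc") {
        "aws-lc"
    } else if cfg!(feature = "ring") {
        "ring"
    } else {
        "none"
    }
}

fn main() {
    println!("crypto backend: {}", backend_name());
}
```

Building with both features enabled would then stop at the `compile_error!` message rather than silently linking both backends.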
Signed-off-by: Scott Fleener <scott@buoyant.io>
This makes two changes to the preferred cipher suite order.
- Prefer AES algorithms over ChaCha20. AES is significantly faster when AES hardware is present, and AES hardware is on all x86 CPUs since ~2010, and all ARM server CPUs for a similar amount of time. For these reasons it's reasonable to default to AES for modern deployments, and it's the same default that `aws-lc-rs` makes anyway.
- Remove ChaCha20 when FIPS is enabled. It's no longer a supported algorithm, so we shouldn't have it as an option.
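
The resulting preference order can be sketched as follows. This is an illustration only: the strings stand in for the `rustls` cipher suite constants, `preferred_suites` is a hypothetical name, and the relative order of the two AES suites is an assumption.

```rust
/// Hypothetical helper returning the preferred TLS 1.3 cipher suites by name.
/// AES suites come first; ChaCha20 is only offered outside FIPS mode.
pub fn preferred_suites(fips: bool) -> Vec<&'static str> {
    // Assumption: AES-128 is listed ahead of AES-256.
    let mut suites = vec!["TLS13_AES_128_GCM_SHA256", "TLS13_AES_256_GCM_SHA384"];
    if !fips {
        // Kept as the least preferred option when FIPS is not enabled.
        suites.push("TLS13_CHACHA20_POLY1305_SHA256");
    }
    suites
}

fn main() {
    println!("non-FIPS: {:?}", preferred_suites(false));
    println!("FIPS:     {:?}", preferred_suites(true));
}
```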
Signed-off-by: Scott Fleener <scott@buoyant.io>
Auditing tools like Syft cannot inspect proxy dependencies, which makes it difficult to inspect the state of a binary. This change updates the release process to use cargo-auditable, which documents the proxy's crate dependencies in its release binary.
tikv-jemallocator supersedes jemallocator. To enable jemalloc profiling, this change updates the dependency and adds a `jemalloc-profiling` feature so that profiling can be enabled at build time.
We use the ubuntu-24.04 runner by default, but in forks this may not be appropriate. This change updates the runners to support overriding via the LINKERD2_PROXY_RUNNER variable.
* build(deps): bump the rustls group across 1 directory with 3 updates
Bumps the rustls group with 3 updates in the / directory: [rustls-webpki](https://github.com/rustls/webpki), [rustls](https://github.com/rustls/rustls) and [rustls-pki-types](https://github.com/rustls/pki-types).
Updates `rustls-webpki` from 0.103.1 to 0.103.2
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.1...v/0.103.2)
Updates `rustls` from 0.23.26 to 0.23.27
- [Release notes](https://github.com/rustls/rustls/releases)
- [Changelog](https://github.com/rustls/rustls/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rustls/rustls/compare/v/0.23.26...v/0.23.27)
Updates `rustls-pki-types` from 1.11.0 to 1.12.0
- [Release notes](https://github.com/rustls/pki-types/releases)
- [Commits](https://github.com/rustls/pki-types/compare/v/1.11.0...v/1.12.0)
---
updated-dependencies:
- dependency-name: rustls-webpki
dependency-version: 0.103.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: rustls
- dependency-name: rustls
dependency-version: 0.23.27
dependency-type: indirect
update-type: version-update:semver-patch
dependency-group: rustls
- dependency-name: rustls-pki-types
dependency-version: 1.12.0
dependency-type: indirect
update-type: version-update:semver-minor
dependency-group: rustls
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix(rustls): Remove dependency on most rustls internal types
We only used these types for generating a ClientHello message for testing. Instead, we can manually encode a sample message based on the TLS spec.
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Scott Fleener <scott@buoyant.io>
* chore(app/env): fix typo
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(app/env): a lower default maximum per-host connection limit
see also:
* #4004
* linkerd/linkerd2#14204
in #4004 we fixed an issue related to our HTTP/1.1 client's connection
pool.
this further hedges against future issues related to our HTTP client
exhausting resources available to its container. today, the limit by
default is `usize::MAX`, which is dramatically higher than the practical
limit.
this commit changes the limit for outbound idle connections per-host to
10,000.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
When constructing the HTTP/1 client, we configure connection pooling, but
notably do not provide a timer implementation to Hyper. This causes hyper's
connection pool to be configured without idle timeouts, which may lead to
resource leaks, especially for clients that communicate with many virtual hosts.
This change updates the HTTP/1 client builder to use a Tokio timer, which allows
Hyper to manage idle timeouts correctly.
This is a strong ciphersuite that's reasonable to include as a supported option. We still prefer CHACHA20_POLY1305 in non-FIPS modes for its speed, as well as keeping CHACHA20_POLY1305 as a backup for older proxies that only support it.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
this is based on #3987.
in #3987 (_see https://github.com/linkerd/linkerd2/issues/13821_) we discovered that some of the types that implement [`FmtLabels`](085be9978d/linkerd/metrics/src/fmt.rs (L5)) could collide when used in registry keys; i.e., they might emit identical label sets, but distinct `Hash` values.
#3987 solves two bugs. this pull request proposes a follow-on change, introducing _exhaustive_ bindings to implementations of `FmtLabels`, to prevent this category of bug from recurring in the future.
this change means that the introduction of an additional field to any of these label structures, e.g. `OutboundEndpointLabels` or `HTTPLocalRateLimitLabels`, will cause a compilation error unless said new field is handled in the corresponding `FmtLabels` implementation.
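
as a minimal sketch of what an exhaustive binding looks like: the `FmtLabels` trait below is a stand-in with the same method signature as ours, while `ExampleLabels`, `DisplayLabels`, and `render` are hypothetical names used only for illustration.

```rust
use std::fmt;

/// stand-in for the legacy `FmtLabels` trait.
trait FmtLabels {
    fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;
}

/// hypothetical label structure.
struct ExampleLabels {
    direction: &'static str,
    port: u16,
}

impl FmtLabels for ExampleLabels {
    fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // no `..` here: adding a field to `ExampleLabels` breaks this binding
        // until the new field is either formatted or explicitly ignored.
        let Self { direction, port } = self;
        write!(f, "direction=\"{}\",port=\"{}\"", direction, port)
    }
}

/// hypothetical adapter so that labels can be rendered to a `String`.
struct DisplayLabels<'a, L>(&'a L);

impl<L: FmtLabels> fmt::Display for DisplayLabels<'_, L> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_labels(f)
    }
}

fn render<L: FmtLabels>(labels: &L) -> String {
    DisplayLabels(labels).to_string()
}

fn main() {
    let labels = ExampleLabels { direction: "outbound", port: 4143 };
    println!("{}", render(&labels));
}
```

a field that is intentionally left out of the labels can still be bound as `field: _`, which keeps the omission visible at the callsite.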
### 🔖 a note
in writing this pull request, i noticed one label that i believe is unintentionally being elided. i've refrained from changing behavior in this pull request. i do note it though, as an example of this syntax identifying the category of bug i hope to hedge against here.
---
* fix: do not key transport metrics registry on `ClientTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: do not key transport metrics registry on `ServerTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(transport-metrics): exhaustive `Eos: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `TlsAccept: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `TargetAddr: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(metrics): exhaustive `Label: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(http/metrics): exhaustive `Status: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ControlLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ProfileRouteLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `InboundEndpointLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerLabel: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerAuthzLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `RouteLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `RouteAuthzLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `OutboundEndpointLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `Authority: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `StackLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): exhaustive `HTTPLocalRateLimitLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): exhaustive `Key<L>: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(metrics): remove redundant banner comment
these impl blocks are all `FmtLabels`, following another series of the
same, above. we don't need another one of these comments.
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(metrics): exhaustive `AndThen: FmtMetrics`
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(app/core): note unused label
see #3262 (618838ec7), which introduced this label.
to preserve behavior, this label remains unused.
X-Ref: #3262
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
The maximum inbound and outbound connect backoffs are currently set at 500ms. This is very aggressive in practice, especially when an endpoint remains unavailable.
This change increases the maximum backoff durations:
* inbound: 10s
* outbound: 60s
The default minimum backoff durations remain unchanged at 100ms so that failed
connections are retried quickly. This change only increases the default _maximum_ backoff so that the timeout increases substantially when an endpoint is unavailable for a longer period of time.
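
To illustrate the effect of the new maximums, here is a rough sketch that assumes a plain doubling schedule without jitter. The `backoff` helper is hypothetical; the proxy's actual backoff implementation may differ, for example by adding jitter.

```rust
use std::time::Duration;

/// Hypothetical helper: the delay before retry number `attempt` (0-based),
/// doubling from `min` and capped at `max`. Jitter is omitted.
pub fn backoff(min: Duration, max: Duration, attempt: u32) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    min.checked_mul(factor).unwrap_or(max).min(max)
}

fn main() {
    let min = Duration::from_millis(100);
    let limits = [
        ("inbound", Duration::from_secs(10)),
        ("outbound", Duration::from_secs(60)),
    ];
    for (name, max) in limits {
        let delays: Vec<Duration> = (0..12).map(|n| backoff(min, max, n)).collect();
        println!("{name}: {delays:?}");
    }
}
```

Under this assumption, early retries still happen quickly, and the delay only reaches the cap after several consecutive failures.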
### 🖼️ background
the linkerd2 proxy implements, registers, and exports Prometheus metrics using a variety of systems, for historical reasons. new metrics broadly rely upon the official [`prometheus-client`](https://github.com/prometheus/client_rust/) library, whose interfaces are reexported for internal consumption in the [`linkerd_metrics::prom`](https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/metrics/src/lib.rs#L30-L60) namespace.
other metrics predate this library however, and rely on the metrics registry implemented in the workspace's [`linkerd-metrics`](https://github.com/linkerd/linkerd2-proxy/tree/main/linkerd/metrics) library.
### 🐛 bug report
* https://github.com/linkerd/linkerd2/issues/13821

linkerd/linkerd2#13821 reported a bug in which duplicate metrics could be observed and subsequently dropped by Prometheus when upgrading the control plane via helm with an existing workload running.
### 🦋 reproduction example
for posterity, i'll note the reproduction steps here.
i used these steps to identify the `2025.3.2` edge release as the affected release. upgrading from `2025.2.3` to `2025.3.1` did not exhibit this behavior. see below for more discussion about the cause.
generate certificates via <https://linkerd.io/2.18/tasks/generate-certificates/>
using these two deployments, courtesy of @GTRekter:
<details>
<summary>**💾 click to expand: app deployment**</summary>
```yaml
apiVersion: v1
kind: Namespace
metadata:
name: simple-app
annotations:
linkerd.io/inject: enabled
---
apiVersion: v1
kind: Service
metadata:
name: simple-app-v1
namespace: simple-app
spec:
selector:
app: simple-app-v1
version: v1
ports:
- port: 80
targetPort: 5678
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: simple-app-v1
namespace: simple-app
spec:
replicas: 1
selector:
matchLabels:
app: simple-app-v1
version: v1
template:
metadata:
labels:
app: simple-app-v1
version: v1
spec:
containers:
- name: http-app
image: hashicorp/http-echo:latest
args:
- "-text=Simple App v1"
ports:
- containerPort: 5678
```
</details>
<details>
<summary>**🤠 click to expand: client deployment**</summary>
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: traffic
namespace: simple-app
spec:
replicas: 1
selector:
matchLabels:
app: traffic
template:
metadata:
labels:
app: traffic
spec:
containers:
- name: traffic
image: curlimages/curl:latest
command:
- /bin/sh
- -c
- |
while true; do
TIMESTAMP_SEND=$(date '+%Y-%m-%d %H:%M:%S')
PAYLOAD="{\"timestamp\":\"$TIMESTAMP_SEND\",\"test_id\":\"sniff_me\",\"message\":\"hello-world\"}"
echo "$TIMESTAMP_SEND - Sending payload: $PAYLOAD"
RESPONSE=$(curl -s -X POST \
-H "Content-Type: application/json" \
-d "$PAYLOAD" \
http://simple-app-v1.simple-app.svc.cluster.local:80)
TIMESTAMP_RESPONSE=$(date '+%Y-%m-%d %H:%M:%S')
echo "$TIMESTAMP_RESPONSE - RESPONSE: $RESPONSE"
sleep 1
done
```
</details>
and this prometheus configuration:
<details>
<summary>**🔥 click to expand: prometheus configuration**</summary>
```yaml
global:
scrape_interval: 10s
scrape_configs:
- job_name: 'pod'
scrape_interval: 10s
static_configs:
- targets: ['localhost:4191']
labels:
group: 'traffic'
```
</details>
we will perform the following steps:
```sh
# install the edge release
# specify the versions we'll migrate between.
export FROM="2025.3.1"
export TO="2025.3.2"
# create a cluster, and add the helm charts.
kind create cluster
helm repo add linkerd-edge https://helm.linkerd.io/edge
# install linkerd's crd's and control plane.
helm install linkerd-crds linkerd-edge/linkerd-crds \
-n linkerd --create-namespace --version $FROM
helm install linkerd-control-plane \
-n linkerd \
--set-file identityTrustAnchorsPEM=cert/ca.crt \
--set-file identity.issuer.tls.crtPEM=cert/issuer.crt \
--set-file identity.issuer.tls.keyPEM=cert/issuer.key \
--version $FROM \
linkerd-edge/linkerd-control-plane
# install a simple app and a client to drive traffic.
kubectl apply -f duplicate-metrics-simple-app.yml
kubectl apply -f duplicate-metrics-traffic.yml
# bind the traffic pod's metrics port to the host.
kubectl port-forward -n simple-app deploy/traffic 4191
# start prometheus, begin scraping metrics
prometheus --config.file=prometheus.yml
```
now, open a browser and query `irate(request_total[1m])`.
next, upgrade the control plane:
```sh
helm upgrade linkerd-crds linkerd-edge/linkerd-crds \
-n linkerd --create-namespace --version $TO
helm upgrade linkerd-control-plane \
-n linkerd \
--set-file identityTrustAnchorsPEM=cert/ca.crt \
--set-file identity.issuer.tls.crtPEM=cert/issuer.crt \
--set-file identity.issuer.tls.keyPEM=cert/issuer.key \
--version $TO \
linkerd-edge/linkerd-control-plane
```
prometheus will begin emitting warnings regarding 34 time series being dropped.
in your browser, querying `irate(request_total[1m])` once more will show that
the rate of requests has stopped, due to the new time series being dropped.
next, restart the workloads...
```sh
kubectl rollout restart deployment -n simple-app simple-app-v1 traffic
```
prometheus warnings will go away, as reported in linkerd/linkerd2#13821.
### 🔍 related changes
* https://github.com/linkerd/linkerd2/pull/13699
* https://github.com/linkerd/linkerd2/pull/13715
in linkerd/linkerd2#13715 and linkerd/linkerd2#13699, we made some changes to the destination controller. from the "Cautions" section of the `2025.3.2` edge release:
> Additionally, this release changes the default for `outbound-transport-mode`
> to `transport-header`, which will result in all traffic between meshed
> proxies flowing on port 4143, rather than using the original destination
> port.
linkerd/linkerd2#13699 (_included in `edge-25.3.1`_) introduced this outbound transport-protocol configuration surface, but maintained the default behavior, while linkerd/linkerd2#13715 (_included in `edge-25.3.2`_) altered the default behavior to route meshed traffic via port 4143.
this is a visible change in behavior that can be observed when upgrading from a version that preceded this change to the mesh. this means that when upgrading across `edge-25.3.2`, such as from the `2025.2.1` to `2025.3.2` versions of the helm charts, or from the `2025.2.3` to the `2025.3.4` versions of the helm charts (_reported upstream in linkerd/linkerd2#13821_), the freshly upgraded destination controller pods will begin routing meshed traffic differently.
i'll state explicitly, _that_ is not a bug! it is, however, an important clue to bear in mind: data plane pods that were started with the previous control plane version, and continue running after the control plane upgrade, will have seen both routing patterns. reporting a duplicate time series for affected metrics indicates that there is a hashing collision in our metrics system.
### 🐛 the bug(s)
we define a collection of structures to model labels for inbound and outbound endpoints'
metrics:
```rust
// linkerd/app/core/src/metrics.rs
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum EndpointLabels {
Inbound(InboundEndpointLabels),
Outbound(OutboundEndpointLabels),
}
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct InboundEndpointLabels {
pub tls: tls::ConditionalServerTls,
pub authority: Option<http::uri::Authority>,
pub target_addr: SocketAddr,
pub policy: RouteAuthzLabels,
}
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct OutboundEndpointLabels {
pub server_id: tls::ConditionalClientTls,
pub authority: Option<http::uri::Authority>,
pub labels: Option<String>,
pub zone_locality: OutboundZoneLocality,
pub target_addr: SocketAddr,
}
```
\- <https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/core/src/metrics.rs>
pay particular attention to the derived `Hash` implementation. note the `tls::ConditionalClientTls` and `tls::ConditionalServerTls` types used in each of these labels. these are used by some of our types like `TlsConnect` to emit prometheus labels, using our legacy system's `FmtLabels` trait:
```rust
// linkerd/app/core/src/transport/labels.rs
impl FmtLabels for TlsConnect<'_> {
fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.0 {
Conditional::None(tls::NoClientTls::Disabled) => {
write!(f, "tls=\"disabled\"")
}
Conditional::None(why) => {
write!(f, "tls=\"no_identity\",no_tls_reason=\"{}\"", why)
}
Conditional::Some(tls::ClientTls { server_id, .. }) => {
write!(f, "tls=\"true\",server_id=\"{}\"", server_id)
}
}
}
}
```
\- <99316f7898/linkerd/app/core/src/transport/labels.rs (L151-L165)>
note the `ClientTls` case, which ignores fields in the client tls information:
```rust
// linkerd/tls/src/client.rs
/// A stack parameter that configures a `Client` to establish a TLS connection.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ClientTls {
pub server_name: ServerName,
pub server_id: ServerId,
pub alpn: Option<AlpnProtocols>,
}
```
\- <99316f7898/linkerd/tls/src/client.rs (L20-L26)>
this means that there is potential for an identical set of labels to be emitted given two `ClientTls` structures with distinct server names or ALPN protocols. for brevity, i'll elide the equivalent issue with `ServerTls`, and its corresponding `TlsAccept<'_>` label implementation, though it exhibits the same issue.
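
to make the collision concrete, here is a small, self-contained model of it. the `ClientTls` below is a simplified, hypothetical stand-in that uses `String` fields, and the `HashMap` stands in for our metrics registry; none of this is the proxy's real code.

```rust
use std::collections::HashMap;

/// simplified, hypothetical stand-in for `ClientTls`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ClientTls {
    server_name: String,
    server_id: String,
}

/// formats labels the way `TlsConnect` does: only `server_id` is emitted.
fn fmt_labels(tls: &ClientTls) -> String {
    let ClientTls { server_id, .. } = tls;
    format!("tls=\"true\",server_id=\"{}\"", server_id)
}

/// registers one counter per key and returns the rendered label sets.
fn rendered_series(keys: &[ClientTls]) -> Vec<String> {
    let mut registry: HashMap<ClientTls, u64> = HashMap::new();
    for key in keys {
        *registry.entry(key.clone()).or_default() += 1;
    }
    let mut series: Vec<String> = registry.keys().map(fmt_labels).collect();
    series.sort();
    series
}

fn main() {
    let a = ClientTls { server_name: "a.example".into(), server_id: "web.ns".into() };
    let b = ClientTls { server_name: "b.example".into(), ..a.clone() };
    // two registry entries, but both render the same label set.
    for line in rendered_series(&[a, b]) {
        println!("request_total{{{line}}}");
    }
}
```

the two keys differ only in a field that the formatter ignores, so the registry holds two entries that are indistinguishable once exported.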
### 🔨 the fix
this pull request introduces two new types: `ClientTlsLabels` and `ServerTlsLabels`. these continue to implement `Hash`, for use as a key in our metrics registry, and for use in formatting labels.
`ClientTlsLabels` and `ServerTlsLabels` each resemble `ClientTls` and `ServerTls`, respectively, but do not contain any fields that are elided in label formatting, to prevent duplicate metrics from being emitted.
relatedly, #3988 audits our existing `FmtLabels` implementations and makes use of exhaustive bindings, to prevent this category of problem in the short-term future. ideally, we might eventually consider replacing the metrics interfaces in `linkerd-metrics`, but that is strictly kept out-of-scope for the purposes of this particular fix.
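
the shape of the fix can be sketched like so. this is an assumption-laden illustration: the field sets, the `From` conversion, and `distinct_keys` are hypothetical, and the real `ClientTlsLabels` may be defined differently.

```rust
use std::collections::HashSet;

/// simplified, hypothetical stand-in for `ClientTls`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ClientTls {
    server_name: String,
    server_id: String,
}

/// hypothetical sketch of a labels type: it holds only what is formatted.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ClientTlsLabels {
    server_id: String,
}

impl From<ClientTls> for ClientTlsLabels {
    fn from(tls: ClientTls) -> Self {
        // exhaustive binding: the dropped field is named, not hidden by `..`.
        let ClientTls { server_name: _, server_id } = tls;
        Self { server_id }
    }
}

/// counts the distinct registry keys produced by a set of connections.
fn distinct_keys(conns: Vec<ClientTls>) -> usize {
    conns
        .into_iter()
        .map(ClientTlsLabels::from)
        .collect::<HashSet<_>>()
        .len()
}

fn main() {
    let a = ClientTls { server_name: "a.example".into(), server_id: "web.ns".into() };
    let b = ClientTls { server_name: "b.example".into(), ..a.clone() };
    println!("distinct keys: {}", distinct_keys(vec![a, b]));
}
```

because the key and the formatted output are derived from the same fields, two connections that would render identically also hash identically.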
---
* fix: do not key transport metrics registry on `ClientTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: do not key transport metrics registry on `ServerTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
This change does two things:
- adds support for `NamedPipes` to our SPIRE client. This will allow the client to connect to SPIRE agents running on Windows hosts.
- renames `LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET` to `LINKERD2_PROXY_IDENTITY_SPIRE_WORKLOAD_API_ADDRESS` and deprecates the former.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
`linkerd-app-test` relies on some dependencies that are unused.
this commit removes these dependencies from the crate's manifest.
see #3928 and #3929.
Signed-off-by: katelyn martin <kate@buoyant.io>
see linkerd/linkerd2#14050.
this change fixes a logical bug with
`linkerd_http_retry::peek_trailers::PeekTrailersBody::<B>::read_body(..)`.
`read_body(..)` constructs a `PeekTrailersBody<B>`, by polling the inner
body to see whether or not it can reach the end of the stream by only
yielding to the asynchronous runtime once.
in linkerd/linkerd2-proxy#3559, we restructured this middleware's
internal modeling to reflect the `Frame<T>`-oriented signatures of the
`http_body::Body` trait's 1.0 interface.
unfortunately, this included a bug which could cause the first frame in
a stream to be discarded if the second `Body::poll_frame()` call
(_invoked via `now_or_never()`_) returns `Pending`. this could cause
non-deterministic errors for users when sending traffic to HTTPRoutes
and GRPCRoutes with retry annotations applied.
this change rectifies this problem, ensuring that the first frame is not
discarded when attempting to peek a body's trailers.
to confirm that this works as expected, additional test coverage is
introduced that confirms that the data and trailers of the inner body
are passed through faithfully.
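the invariant can be sketched with a synchronous model. this is not the middleware's actual code: `MockBody`, `Frame`, `Peeked`, and this `read_body` are hypothetical simplifications that use `std::task::Poll` directly, rather than `http_body::Body` and `now_or_never()`.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Simplified frame: either a chunk of data or a set of trailers.
#[derive(Clone, Debug, PartialEq)]
enum Frame {
    Data(&'static str),
    Trailers(&'static str),
}

/// A scripted mock body: each poll pops the next scripted result.
struct MockBody {
    polls: VecDeque<Poll<Option<Frame>>>,
}

impl MockBody {
    fn poll_frame(&mut self) -> Poll<Option<Frame>> {
        // Once the script is exhausted, report the end of the stream.
        self.polls.pop_front().unwrap_or(Poll::Ready(None))
    }
}

/// What we learned by peeking at the body.
#[derive(Debug, PartialEq)]
enum Peeked {
    /// The body yielded nothing at all.
    Empty,
    /// The stream ended within two polls.
    Complete { data: Option<Frame>, trailers: Option<Frame> },
    /// The stream has not ended; `buffered` frames must be replayed first.
    Incomplete { buffered: Vec<Frame> },
}

fn read_body(body: &mut MockBody) -> Peeked {
    // First poll: the body may be empty, not ready, or yield a frame.
    let first = match body.poll_frame() {
        Poll::Ready(None) => return Peeked::Empty,
        Poll::Pending => return Peeked::Incomplete { buffered: Vec::new() },
        Poll::Ready(Some(frame)) => frame,
    };
    if let Frame::Trailers(_) = first {
        return Peeked::Complete { data: None, trailers: Some(first) };
    }
    // Second poll: whatever happens here, `first` must not be dropped.
    match body.poll_frame() {
        Poll::Ready(None) => Peeked::Complete { data: Some(first), trailers: None },
        Poll::Ready(Some(trailers @ Frame::Trailers(_))) => {
            Peeked::Complete { data: Some(first), trailers: Some(trailers) }
        }
        Poll::Ready(Some(second)) => Peeked::Incomplete { buffered: vec![first, second] },
        // The case at issue: a pending second poll still retains the first frame.
        Poll::Pending => Peeked::Incomplete { buffered: vec![first] },
    }
}
```

in this model, a body scripted as "one data frame, then `Pending`" yields `Incomplete` with the data frame still buffered, which is the behavior the new test coverage is meant to pin down.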
---
* feat(http/retry): additional `PeekTrailersBody<B>` test coverage
this commit introduces additional test coverage to
`linkerd_http_retry::peek_trailers::PeekTrailersBody<B>`.
this body middleware is used to facilitate transparent http retries, and
allows callers to possibly inspect the trailers for a response, by
polling an `http_body::Body`.
this commit introduces additional unit test coverage that confirms that
the data and trailers of the inner body are passed through faithfully.
Signed-off-by: katelyn martin <kate@buoyant.io>
* feat(http/retry): another `PeekTrailersBody<B>` test case
this commit introduces some additional coverage for bodies that return
`Pending` when polled a second time.
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(http/retry): `PeekTrailersBody<B>` retains first frame
this commit fixes a logical bug with
`linkerd_http_retry::peek_trailers::PeekTrailersBody::<B>::read_body(..)`.
`read_body(..)` constructs a `PeekTrailersBody<B>`, by polling the inner
body to see whether or not it can reach the end of the stream by only
yielding to the asynchronous runtime once.
in linkerd/linkerd2-proxy#3559, we restructured this middleware's
internal modeling to reflect the `Frame<T>`-oriented signatures of the
`http_body::Body` trait's 1.0 interface.
unfortunately, this included a bug which could cause the first frame in
a stream to be discarded if the second `Body::poll_frame()` call
(_invoked via `now_or_never()`_) returns `Pending`. this could cause
non-deterministic errors for users when sending traffic to HTTPRoutes
and GRPCRoutes with retry annotations applied.
this commit rectifies this problem, ensuring that the first frame is not
discarded when attempting to peek a body's trailers.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd-app-test` exposes some functions that we never use elsewhere.
this commit removes these functions.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_test::service` contains facilities that are unused.
this commit removes this submodule from the `linkerd-app-test` library.
Signed-off-by: katelyn martin <kate@buoyant.io>
this is a trivial, cosmetic change.
`Config` has two consecutive `impl` blocks in the `linkerd-app` library.
these do not include distinct generics or trait bounds, so the methods
contained therein do not need to live in two distinct `impl` blocks.
this commit consolidates these blocks.
while we are performing this change, we add two `=== impl T ===`
banners, which are used throughout the project as greppable strings to
find methods and trait implementations for a given type.
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit hoists `tracing`, used liberally throughout our project,
such that it is managed as a single workspace dependency.
this will be helpful someday when a 0.2 release happens.
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit introduces a concrete error type for the `orig_proto`
upgrade layer.
this layer is used by the proxy's http client to transparently upgrade
outbound http/1 traffic to http/2. rather than boxing errors, we define
a concrete error type to facilitate inspecting errors in the future.
for now, the top-level http client continues to box errors thrown by the
"orig_proto" upgrade client.
see also, #3894 (ea75ac0).
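as a rough illustration of the pattern, and not the actual type introduced here, the sketch below defines a concrete error enum that an outer caller can still box. the `UpgradeError` name, its variants, and the `boxed` helper are all assumptions made for this example.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical concrete error; the name and variants are assumptions.
#[derive(Debug)]
enum UpgradeError {
    Http1(std::io::Error),
    Http2(std::io::Error),
}

impl fmt::Display for UpgradeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Http1(e) => write!(f, "http/1 client error: {}", e),
            Self::Http2(e) => write!(f, "http/2 client error: {}", e),
        }
    }
}

impl Error for UpgradeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Http1(e) | Self::Http2(e) => Some(e),
        }
    }
}

/// The outer client can keep boxing, without losing the concrete type.
fn boxed(error: UpgradeError) -> Box<dyn Error + Send + Sync> {
    Box::new(error)
}
```

a boxed `UpgradeError` can later be recovered with `downcast_ref::<UpgradeError>()` and matched on by variant, which is the kind of inspection a concrete type makes possible.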
Signed-off-by: katelyn martin <kate@buoyant.io>
the `linkerd-error` crate includes two functions that can be used to
examine the cause of a dynamic, boxed error. for example, here is the
`is_caused_by()` function, used in some of our error recovery logic:
```rust
/// Determines whether the provided error was caused by an `E` typed error.
pub fn is_caused_by<E: std::error::Error + 'static>(
    mut error: &(dyn std::error::Error + 'static),
) -> bool {
    loop {
        if error.is::<E>() {
            return true;
        }
        error = match error.source() {
            Some(e) => e,
            None => return false,
        };
    }
}
```
we rely on [`thiserror`](https://github.com/dtolnay/thiserror/) to
generate boilerplate code for our error structures. this includes an
attribute called `transparent` that will delegate down to an inner
error.
however, this delegation means that the causal chains inspected by
the function above might not properly identify an inner error. this
test, for example, fails:
```rust
// linkerd/dns/src/lib.rs
#[derive(Debug, Clone, Error)]
#[error("invalid SRV record {:?}", self.0)]
struct InvalidSrv(rdata::SRV);

#[derive(Debug, Error)]
enum SrvRecordError {
    #[error(transparent)]
    Invalid(#[from] InvalidSrv),
    #[error("failed to resolve SRV record: {0}")]
    Resolve(#[from] hickory_resolver::ResolveError),
}

#[test]
fn srv_record_reports_cause_correctly() {
    let srv = "foobar.linkerd-dst-headless.linkerd.svc.cluster.local."
        .parse::<hickory_resolver::Name>()
        .map(|name| rdata::SRV::new(1, 1, 8086, name))
        .expect("a valid domain name");

    let error = SrvRecordError::Invalid(InvalidSrv(srv));
    let error: Box<dyn std::error::Error + 'static> = Box::new(error);

    assert!(linkerd_error::is_caused_by::<InvalidSrv>(&*error));
    assert!(linkerd_error::cause_ref::<InvalidSrv>(&*error).is_some());
}
```
the `transparent` attribute will delegate directly down to `InvalidSrv`
when `Error::source()` is invoked. this means that our downcasting logic
in `linkerd-error` used to ascertain causes of dynamic, boxed errors
will fail to identify a `SrvRecordError` as being caused by an
`InvalidSrv`.
by replacing the `transparent` attribute with a `"{0}"` display
attribute, we continue to transparently show the inner error when
printed as a string, but will include `InvalidSrv` in the causal chain.
this branch replaces `transparent` attributes in an assortment of
error variants.
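the difference can be reproduced without `thiserror`. the sketch below hand-writes two wrappers that approximate what the derive generates for each attribute; `Inner`, `Transparent`, and `Wrapped` are hypothetical names, and `is_caused_by` is copied in shape from the function shown earlier.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct Inner;

impl fmt::Display for Inner {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid record")
    }
}

impl Error for Inner {}

/// Approximates `#[error(transparent)]`: `Display` *and* `source()` both
/// forward to the inner error, so `Inner` itself never appears in the chain.
#[derive(Debug)]
struct Transparent(Inner);

impl fmt::Display for Transparent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for Transparent {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.0.source()
    }
}

/// Approximates `#[error("{0}")]` with a source: same display output, but
/// `source()` returns the inner error itself.
#[derive(Debug)]
struct Wrapped(Inner);

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Walks the causal chain looking for an `E`.
fn is_caused_by<E: Error + 'static>(mut error: &(dyn Error + 'static)) -> bool {
    loop {
        if error.is::<E>() {
            return true;
        }
        error = match error.source() {
            Some(e) => e,
            None => return false,
        };
    }
}
```

both wrappers print identically, but only `Wrapped` is identified as being caused by `Inner`.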
---
* test(dns): add a failing test
this commit adds a failing unit test. this test shows that dns errors
might not report their cause correctly, due to thiserror's `transparent`
attribute passing directly through to `InvalidSrv`'s cause.
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(dns): replace `error(transparent)` attribute
this commit fixes the failing unit test introduced in the previous
commit.
the `transparent` attribute will delegate directly down to `InvalidSrv`
when `Error::source()` is invoked. this means that our downcasting logic
in `linkerd-error` used to ascertain causes of dynamic, boxed errors
will fail to identify a `SrvRecordError` as being caused by an
`InvalidSrv`.
by replacing the `transparent` attribute with a `"{0}"` display
attribute, we continue to transparently show the inner error when
printed as a string, but will include `InvalidSrv` in the causal chain.
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: errors report inner sources
this commit performs the same transformation as the previous commit,
replacing `transparent` with equivalent pass-through `"{0}"` display
strings, adding `#[source]` where needed.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
this structure exposes its fields, but those fields are never accessed
elsewhere, aside from test code.
this commit removes the `pub` directives from the address and tls
fields. in their stead, test interfaces are added to allow the
`tagged_transport` test suite to function.
Signed-off-by: katelyn martin <kate@buoyant.io>
this is a small mechanical refactor to the http/1 client.
our http/2 and "orig_proto" clients are tower services. our http/1
client, on the other hand, exposes a concrete inherent method `request`.
to be consistent, this changes our http client to treat this http/1
client as a service as well.
Signed-off-by: katelyn martin <kate@buoyant.io>
This has a few benefits. Primarily this gives us a reasonable path to creating FIPS-enabled builds on architectures other than x86-64, as well as a path away from using BoringSSL as a backend.
Additionally, rustls has been using the aws-lc-rs library as the default backend for a little while now, so this gives us the opportunity to stay in line with the most widely used option in the ecosystem.
Signed-off-by: Scott Fleener <scott@buoyant.io>
the initial replay body, circa the usage of our "compatibility" layer
(4b53081, #3598), used to need an extra poll to confirm the absence of
trailers before it would report itself as reaching the end of the
stream. these tests were added in (afda8a7b3, #3583).
this was an artifact of how the compatibility middleware masked the
previous `poll_data()` and `poll_trailer()` methods behind a
forward-compatible `poll_frame()`- and `frame()`-oriented interface.
this commit removes these extra calls to `initial.frame().await`, now
that the initial body will report the end of stream without an extra
call to await a `None`.
X-ref: #3598
X-ref: #3583
Signed-off-by: katelyn martin <kate@buoyant.io>
This introduces a GitHub Copilot instructions file under .github to guide AI-driven code generation and updates the devcontainer configuration accordingly.
The new instructions enforce Rust styling, error handling, and tracing conventions across the project. It ensures generated code passes `cargo fmt` and `clippy`, avoids unwraps, and uses structured logging.
In 65db3dd we enabled overriding the behavior to export TLS hostnames for
outbound traffic, but we omitted TLS hostname labels.
This change updates the tls module to mirror the http module's behavior.
we use the `symbolic-common` and `symbolic-demangle` crates in our
dependency tree. these live in the same repo, here:
<https://github.com/getsentry/symbolic>
this commit introduces a "group" so that dependabot will upgrade them in
lockstep, rather than individually, such as in pull requests like
#3853, #3852, #3857, #3858, or #3860.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/outbound): `linkerd-mock-http-body` test dependency
this adds a development dependency, so we can use this mock body type in
the outbound proxy's unit tests.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/outbound): additional http route metrics tests
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/outbound): additional grpc route metrics tests
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(http/prom): record bodies when eos reached
this commit fixes a bug discovered by @alpeb, which was introduced in
proxy v2.288.0.
> The associated metric is `outbound_http_route_request_statuses_total`:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error="UNKNOWN"} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error="UNKNOWN"} 10
> ```
>
> The problem was introduced in `edge-25.3.4`, with the proxy `v2.288.0`.
> Before that the metrics looked like:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error=""} 193
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 96
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error=""} 96
> ```
>
> So the difference is the non-empty value for `error=UNKNOWN` even
> when `http_status` is 2xx, which `linkerd viz stat-outbound`
> interprets as failed requests.
in #3086 we introduced a suite of route- and backend-level metrics. that
subsystem contains a body middleware that will report itself as having
reached the end-of-stream by delegating directly down to its inner
body's `is_end_stream()` hint.
this is roughly correct, but is slightly distinct from the actual
invariant: a `linkerd_http_prom::record_response::ResponseBody<B>` must
call its `end_stream` helper to classify the outcome and increment the
corresponding time series in the
`outbound_http_route_request_statuses_total` metric family.
in #3504 we upgraded our hyper dependency. while doing so, we neglected
to include a call to `end_stream` if a data frame is yielded and the
inner body reports itself as having reached the end-of-stream.
this meant that instrumented bodies would be polled until the end is
reached, but were being dropped before a `None` was encountered.
this commit fixes this issue in two ways, to be defensive:
* invoke `end_stream()` if a non-trailers frame is yielded, and the
inner body now reports itself as having ended. this restores the
behavior in place prior to #3504. see the relevant component of that
diff, here:
<https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274>
* rather than delegating to the inner `<B as Body>::is_end_stream()`
method, report the end-of-stream being reached by inspecting whether
or not the inner response state has been taken. this is the state that
directly indicates whether or not the `ResponseBody<B>` middleware is
finished.
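the two changes above can be sketched with a synchronous model. this is an illustration under simplifying assumptions, not the real `ResponseBody<B>`: `Frame`, `Inner`, and the `Rc<Cell<u64>>` counter standing in for the metric family are all hypothetical.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Simplified frame type.
enum Frame {
    Data,
    Trailers,
}

/// Mock inner body: yields scripted frames, and hints that the stream has
/// ended as soon as no frames remain.
struct Inner {
    frames: VecDeque<Frame>,
}

impl Inner {
    fn next_frame(&mut self) -> Option<Frame> {
        self.frames.pop_front()
    }

    fn is_end_stream(&self) -> bool {
        self.frames.is_empty()
    }
}

/// Instrumented body. `state` is `Some` until the outcome has been recorded.
struct ResponseBody {
    inner: Inner,
    state: Option<Rc<Cell<u64>>>,
}

impl ResponseBody {
    /// Records the outcome exactly once, by taking the state.
    fn end_stream(&mut self) {
        if let Some(counter) = self.state.take() {
            counter.set(counter.get() + 1);
        }
    }

    fn next_frame(&mut self) -> Option<Frame> {
        let frame = self.inner.next_frame();
        match &frame {
            // A data frame that ends the stream: record now, since the caller
            // may drop this body without ever polling for a `None`.
            Some(Frame::Data) if self.inner.is_end_stream() => self.end_stream(),
            Some(Frame::Data) => {}
            Some(Frame::Trailers) | None => self.end_stream(),
        }
        frame
    }

    /// Reports the end of the stream from our own state, not the inner hint.
    fn is_end_stream(&self) -> bool {
        self.state.is_none()
    }
}
```

with a single-frame body, the counter is incremented as soon as that frame is yielded, even if the caller never polls again.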
X-ref: #3504
X-ref: #3086
X-ref: linkerd/linkerd2#8733
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
this adds `tokio-boring` to the `boring` group.
this will group these crates together and bump them in lockstep.
see, for example:
* #3838
* #3840
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit removes the `linkerd-http-executor` crate, and replaces all
usage of its `TracingExecutor` type with the `TokioExecutor` type
provided by `hyper-util`.
this work is based upon hyperium/hyper-util#166. that change, included
in the 0.1.11 release, altered the `TokioExecutor` type so that it
propagates tracing context when the `tracing` feature is enabled.
with that change made, our `TracingExecutor` type is now redundant.
* https://github.com/hyperium/hyper-util/pull/166
* https://github.com/hyperium/hyper-util/blob/master/CHANGELOG.md#0111-2025-03-31
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit introduces a new metric family tracking the rate and outcome
of dns lookups made by the linkerd2 proxy. this metric family has three
labels, counting the number of DNS resolutions for each distinct
control plane client, by record type (A/AAAA or SRV), and by outcome
(success or failure).
this metric is named `control_dns_resolutions_total`.
this commit generally does this via the addition of some new interfaces
to `linkerd-dns`'s `Resolver` structure. the `resolve_addrs()` method is
extended to increment particular counters if they have been installed.
the `linkerd-app` crate's `Dns` type now encapsulates its resolver, and
callers acquire a new resolver by providing a client name to its
`resolver()` method. this uses the client name to construct label sets
and create the corresponding time series for each client.
once proxies with this patch are running, and the viz extension has been
installed, one can query this metric like so:
**nb:** the screenshot shows an early prototype; this metric has since
been renamed.

this promQL query...
```
sum(rate(control_dns_resolutions_total[1m])) by (app,client,result) > 0
```
...will show the per-minute rate of dns lookups/failures across each
application workload, for each control-plane client, for each possible
outcome.
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
In linkerd/linkerd2-proxy#3547, we removed unsafe authority labels. This was a
breaking change, since the behavior was considered unsafe.
To support a graceful migration, this change adds an environment configuration,
`LINKERD2_PROXY_INBOUND_AUTHORITY_LABELS=unsafe`, that reverts to the prior
behavior.
It may be configured in linkerd2 via the proxy.additionalEnv helm value.
The latest edge doesn't properly install Gateway API CRDs. This changes our
justfile to install the resources from the upstream release instead of the
Linkerd CLI.
this commit changes a message for a debug-level tracing event.
this block builds a trace collector. we can call it that, instead of the
more generic term "client". there are many clients being built here,
including identity, policy, and destination controller clients.
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit fixes some broken links now that we have updated to the
latest 1.0 version of `http-body`.
this should address some warnings that can be seen in pull requests'
"files" tab in github. see, for example:
`https://github.com/linkerd/linkerd2-proxy/pull/3818/files`.
Signed-off-by: katelyn martin <kate@buoyant.io>
`LINKERD2_PROXY_RESOLV_CONF` is an environment variable that ostensibly
is used to set the path of the resolver configuration file.
this connects to a `resolv_conf_path` field in the application's dns
`Config` structure, but that field is never used.
because it is marked as public, this isn't caught by the compiler's dead
code analysis.
see `resolv.conf(5)` for more information.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(deps): dependabot group for unicode components
this commit introduces a new dependabot group.
this will update all of the crates maintained by the icu4x organization
in lockstep. we depend upon these transitively to handle urls.
```
; cargo tree | rg icu_ | rg 'icu_\w*' --only-matching | sort | uniq
icu_collections
icu_locid
icu_locid_transform
icu_locid_transform_data
icu_normalizer
icu_normalizer_data
icu_properties
icu_properties_data
icu_provider
icu_provider_macros
```
see:
- https://docs.rs/icu/latest/icu/
- https://icu.unicode.org/
- https://github.com/orgs/unicode-org/repositories?type=all
- https://crates.io/crates/idna
- #3811
- #3812
- #3813
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit: alphabetize
Signed-off-by: katelyn martin <kate@buoyant.io>
* review: use a glob
Co-authored-by: Oliver Gould <ver@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
this commit addresses a todo comment in the `linkerd-proxy-resolve`
crate. this comment mentioned that a `match` block was originally an `if
let` block. a clippy lint is locally ignored as well, regarding `match`
statements with a single pattern.
contrary to the comment, `if let` *does* work with pin projection, as of
today.
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit adds a group to the dependabot configuration.
this will mean that dependabot updates `tonic` and `tonic-build` in
lockstep.
Signed-off-by: katelyn martin <kate@buoyant.io>
DNS servers may return extremely low TTLs in some cases. When we're polling DNS to power a load balancer, we need to enforce a minimum duration to prevent tight-looping DNS queries.
This change adds a 5s minimum time between DNS lookups when resolving control plane components.
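As a minimal sketch of the idea (not the proxy's actual code), the floor can be expressed as a clamp on the TTL. The `MIN_REFRESH` constant and `refresh_after` helper are hypothetical names used only for illustration.

```rust
use std::time::Duration;

/// The 5s floor described above.
const MIN_REFRESH: Duration = Duration::from_secs(5);

/// Hypothetical helper: how long to wait before the next DNS lookup,
/// given the TTL reported by the server.
fn refresh_after(ttl: Duration) -> Duration {
    // Honor the TTL, but never poll more often than the floor allows.
    ttl.max(MIN_REFRESH)
}
```

With this clamp, a zero or sub-second TTL results in a 5s wait, while longer TTLs are honored as-is.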
Fixes linkerd/linkerd2#13508
* build(deps): bump deranged from 0.4.0 to 0.4.1
Bumps [deranged](https://github.com/jhpratt/deranged) from 0.4.0 to 0.4.1.
- [Commits](https://github.com/jhpratt/deranged/commits)
---
updated-dependencies:
- dependency-name: deranged
  dependency-type: indirect
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix(proxy/tap): fix inference error
https://github.com/jhpratt/deranged/issues/19
`deranged` added some additional interfaces in 0.4.1 that seem to affect
this `Into<T>` invocation. use `From::from` instead, so we can
explicitly indicate that we wish to convert this into an integer for
comparison.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: katelyn martin <kate@buoyant.io>
We can run our testing k3d cluster with minimal components enabled. This will
speed up the cluster creation and deletion process, especially in CI.
* chore(deps)!: upgrade to tower 0.5
this commit updates our tower dependency from 0.4 to 0.5.
note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.
the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:
- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implements `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])
see:
* <https://github.com/tower-rs/tower/pull/584>
* <https://github.com/tower-rs/tower/pull/681>
* <https://github.com/tower-rs/tower/pull/703>
* <https://github.com/tower-rs/tower/pull/637>
* <https://github.com/tower-rs/tower/pull/654>
the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.
updating to box the inner `A` and `B` services' errors, so we satisfy
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.
* <https://github.com/tower-rs/tower/compare/v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>
this work is based upon #3504. for more information, see:
* https://github.com/linkerd/linkerd2/issues/8733
* https://github.com/linkerd/linkerd2-proxy/pull/3504
Signed-off-by: katelyn martin <kate@buoyant.io>
X-Ref: https://github.com/tower-rs/tower/pull/815
X-Ref: https://github.com/tower-rs/tower/pull/817
X-Ref: https://github.com/tower-rs/tower/pull/818
X-Ref: https://github.com/tower-rs/tower/pull/819
* fix(stack/loadshed): update test affected by tower-rs/tower#635
this commit updates a test that was affected by breaking changes in
tower's `Buffer` middleware. see this excerpt from the description of
that change:
> I had to change some of the integration tests slightly as part of this
> change. This is because the buffer implementation using semaphore
> permits is _very subtly_ different from one using a bounded channel. In
> the `Semaphore`-based implementation, a semaphore permit is stored in
> the `Message` struct sent over the channel. This is so that the capacity
> is used as long as the message is in flight. However, when the worker
> task is processing a message that's been received from the channel,
> the permit is still not dropped. Essentially, the one message actively
> held by the worker task _also_ occupies one "slot" of capacity, so the
> actual channel capacity is one less than the value passed to the
> constructor, _once the first request has been sent to the worker_. The
> bounded MPSC changed this behavior so that capacity is only occupied
> while a request is actually in the channel, which broke some tests
> that relied on the old (and technically wrong) behavior.
pay particular attention to this:
> The bounded MPSC changed this behavior so that capacity is only
> occupied while a request is actually in the channel, which broke some
> tests that relied on the old (and technically wrong) behavior.
that pr adds an additional message to the channel in tests exercising
the load-shedding behavior, on account of the previous (incorrect)
behavior.
https://github.com/tower-rs/tower/pull/635/files#r797108274
this commit performs the same change for our corresponding test, adding
an additional `ready()` call before we hit the buffer's limit.
Signed-off-by: katelyn martin <kate@buoyant.io>
* review: use vendored `Either` for consistency
https://github.com/linkerd/linkerd2-proxy/pull/3744#discussion_r1999878537
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
In #3626, we refactored the origin_dst determination logic to utilize
socket2 calls. However, this change inadvertently disrupted IPv6 and
dual-stack support, causing the server to fail to start when deployed on
such network configurations:
```
WARN ThreadId(01) inbound: linkerd_app_core::serve: Server failed to accept connection error=No such file or directory (os error 2)
```
This change reintroduces detection of the current network family,
calling socket2's `original_dst()` or `original_dst_ipv6()` depending on
the case.
Tested fine in both IPv6 and dual-stack Kind clusters.
this golfs down the return expression in
`NameRef::try_from_ascii_str()`.
rather than binding our `s` to a temporary variable, in order to return
a `Self(s)` result, we can take the same result and use `Result::map` to
convert a `Result<&'a str, InvalidName>` to a
`Result<NameRef<'a>, InvalidName>`.
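a before-and-after sketch of this transformation follows. the `validate` function is a hypothetical stand-in for the real name validation, and the types are simplified for illustration.

```rust
#[derive(Debug, PartialEq)]
struct InvalidName;

#[derive(Debug, PartialEq)]
struct NameRef<'a>(&'a str);

/// Hypothetical stand-in for the real validation logic.
fn validate(s: &str) -> Result<&str, InvalidName> {
    if !s.is_empty() && s.is_ascii() {
        Ok(s)
    } else {
        Err(InvalidName)
    }
}

impl<'a> NameRef<'a> {
    /// Before: bind the validated string, then wrap it.
    fn try_from_ascii_str_before(s: &'a str) -> Result<Self, InvalidName> {
        let s = validate(s)?;
        Ok(Self(s))
    }

    /// After: map the `Ok` value through the tuple-struct constructor.
    fn try_from_ascii_str(s: &'a str) -> Result<Self, InvalidName> {
        validate(s).map(Self)
    }
}
```

both forms return the same result for any input; the second simply avoids the temporary binding.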
Signed-off-by: katelyn martin <kate@buoyant.io>
* build(deps): bump the hickory group with 2 updates
Bumps the hickory group with 2 updates: [hickory-resolver](https://github.com/hickory-dns/hickory-dns) and [hickory-proto](https://github.com/hickory-dns/hickory-dns).
Updates `hickory-resolver` from 0.24.4 to 0.25.1
- [Release notes](https://github.com/hickory-dns/hickory-dns/releases)
- [Changelog](https://github.com/hickory-dns/hickory-dns/blob/main/OLD-CHANGELOG.md)
- [Commits](https://github.com/hickory-dns/hickory-dns/compare/v0.24.4...v0.25.1)
Updates `hickory-proto` from 0.24.4 to 0.25.1
- [Release notes](https://github.com/hickory-dns/hickory-dns/releases)
- [Changelog](https://github.com/hickory-dns/hickory-dns/blob/main/OLD-CHANGELOG.md)
- [Commits](https://github.com/hickory-dns/hickory-dns/compare/v0.24.4...v0.25.1)
---
updated-dependencies:
- dependency-name: hickory-resolver
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: hickory
- dependency-name: hickory-proto
  dependency-type: indirect
  update-type: version-update:semver-minor
  dependency-group: hickory
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(dns): address breaking changes in `hickory-resolver`
see also #3782.
this commit addresses breaking changes in the v0.25.0 release of
`hickory-resolver`, used by our `linkerd-dns` crate to handle DNS
resolution.
see the release notes, here:
<https://github.com/hickory-dns/hickory-dns/releases/tag/v0.25.0>
> 0.25.0 represents a large release for the Hickory DNS project. Over 14
> months since 0.24.0, we've [..] addressed a number of findings from our
> first security audit.
changes that are relevant to us include:
> * Support for TLS using native-tls or OpenSSL has been removed. We now
> only provide first-party support for rustls (0.23, for DNS over TLS,
> HTTP/2, QUIC and HTTP/3). We support ring or aws-lc-rs for
> cryptographic operations both for DNSSEC and TLS. The
> dns-over-rustls,dns-over-native-tls, dns-over-openssl,
> dns-over-https-rustls, dns-over-https, dns-over-quic and dns-over-h3
> features have been removed in favor of a set of
> {tls,https,quic,h3}-{aws-lc-rs,ring} features across our library
> crates.
>
> * The synchronous API in the resolver and client crates, which
> previously provided a thin partial wrapper over the asynchronous
> API, has been removed. Downstream users will have to migrate to the
> asynchronous API.
>
> * Error types are now exposed directly in the crate roots.
this commit updates references to the
`hickory_resolver::error::ResolveError` error with
`hickory_resolver::ResolveError` now that the errors submodule is
private. (hickory-dns/hickory-dns#2530)
this commit replaces references to
`hickory_resolver::TokioAsyncResolver` with its new name,
`hickory_resolver::TokioResolver`. (hickory-dns/hickory-dns#2521)
this commit inspects "no records found" errors according to the new api.
this particular change isn't explicitly documented, but it occurred in
hickory-dns/hickory-dns#2094. see, in particular, the corresponding
changes in the upstream repo's own code. for
example: https://github.com/hickory-dns/hickory-dns/pull/2094/files#diff-330847b46040a30d449f85e8a804bea085f0974d3cba80d79d83acc56f33542dL176-R178
```diff
- match error.kind() {
- ResolveErrorKind::NoRecordsFound { query, soa, .. } => {
+ match error.proto().map(ProtoError::kind) {
+ Some(ProtoErrorKind::NoRecordsFound { query, soa, .. }) => {
```
there is a small pull request being proposed upstream to introduce a
`Builder::with_options()` method, which would make our construction of a
dns resolver marginally more idiomatic. this, however, is not a blocker
by any means.
X-Ref: hickory-dns/hickory-dns#2521
X-Ref: hickory-dns/hickory-dns#2830
X-Ref: hickory-dns/hickory-dns#2094
X-Ref: hickory-dns/hickory-dns#2877
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This PR adds an `os` param to our package job in the release workflow.
This allows us to build and release Windows artifacts.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
this branch is motivated by [review feedback](https://github.com/linkerd/linkerd2-proxy/pull/3504#discussion_r1999706761) from #3504. see
linkerd/linkerd2#8733 for more information on upgrading `hyper`. there,
we asked:
> I wonder if we should be a little more defensive about cloning [`HttpConnect`]. What does cloning it mean? When handling a CONNECT request, we can't clone the request, really. (Technically, we can't clone the body, but practically, it means we can't clone the request). Can we easily track whether this was accidentally cloned (i.e. with a custom Clone impl or Arc or some such) and validate at runtime (i.e., in proxy::http::h1) that everything is copacetic?
`linkerd-http-upgrade` provides a `HttpConnect` type that is intended
for use as a response extension. this commit performs a refactor,
removing this type.
we use this extension in a single piece of tower middleware. typically,
these sorts of extensions are intended for e.g. passing state between
distinct layers of tower middleware, or otherwise facilitating
extensions to the HTTP family of protocols.
this extension is only constructed and subsequently referenced within a
single file, in the `linkerd_proxy_http::http::h1::Client`. we can
perform the same task by using the `is_http_connect` boolean we use to
conditionally insert this extension.
then, this branch removes a helper function for a computation whose
amortization is no longer as helpful. now that we are passing
`is_http_connect` down into this function, we are no longer inspecting
the response's extensions. because of that, the only work to do is to
check the status code, which is a very cheap comparison.
this also restates an `if version != HTTP_11 { .. }` conditional block as
a match statement. this is a code motion change, none of the inner blocks
are changed.
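as a rough illustration of the shape described above, here is a minimal sketch. the `Version` enum, the `is_upgrade` name, and the bare `u16` status code are all hypothetical stand-ins, not the proxy's actual types. it assumes that a `101 Switching Protocols` response signals an upgrade, and that a successful (2xx) response to a CONNECT request does as well.

```rust
/// Hypothetical stand-in for an HTTP version type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Version {
    Http10,
    Http11,
    Http2,
}

/// Hypothetical helper: reports whether a response begins an upgrade.
///
/// `is_http_connect` is passed in by the caller, so no response
/// extensions need to be inspected here.
fn is_upgrade(version: Version, status: u16, is_http_connect: bool) -> bool {
    match version {
        // Upgrades are a concept specific to HTTP/1.1.
        Version::Http11 => match status {
            // 101 Switching Protocols.
            101 => true,
            // A successful response to a CONNECT request opens a tunnel.
            200..=299 => is_http_connect,
            _ => false,
        },
        // Any other version never upgrades.
        _ => false,
    }
}
```

with the boolean passed down, the remaining work is a status code comparison, which is why a separate helper to amortize it is no longer worthwhile.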
reviewers are encouraged to examine this branch commit-by-commit; because
of the sensitivity of this change, this refactor is performed in small,
methodical changes.
for posterity, i've run the linkerd/linkerd2 test suite against this branch, as of
57dd7f4a60.
---
* refactor(http/upgrade): remove `HttpConnect` extension
`linkerd-http-upgrade` provides a `HttpConnect` type that is intended
for use as a response extension. this commit performs a refactor,
removing this type.
we use this extension in a single piece of tower middleware. typically,
these sorts of extensions are intended for e.g. passing state between
distinct layers of tower middleware, or otherwise facilitating
extensions to the HTTP family of protocols.
this extension is only constructed and subsequently referenced within a
single file, in the `linkerd_proxy_http::http::h1::Client`. we can
perform the same task by using the `is_http_connect` boolean we use to
conditionally insert this extension.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(proxy/http): fold helper function
this removes a helper function for a computation whose amortization is
no longer as helpful.
now that we are passing `is_http_connect` down into this function, we
are no longer inspecting the response's extensions. because of that, the
only work to do is to check the status code, which is a very cheap
comparison.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(proxy/http): match on response status
this commit refactors a sequence of conditional blocks in a helper
function used to identify HTTP/1.1 upgrades.
this commit replaces this sequence of conditional blocks with a match
statement.
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(proxy/http): rename `res` to `rsp`
we follow a convention where we tend to name responses `rsp`, not `res`
or `resp`. this commit applies that convention to this helper function.
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(proxy/http): import `Version`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(proxy/http): match on http version
this restates an `if version != HTTP_11 { .. }` conditional block as a
match statement.
this is a code motion change, none of the inner blocks are changed.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(proxy/http): add comments on http/1.1
this commit adds a brief comment noting that upgrades are a concept
specific to http/1.1.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
Outbound hostname metrics were recently disabled. This conditionally re-enables those through a `LINKERD2_PROXY_OUTBOUND_METRICS_HOSTNAME_LABELS` env var, wired through the policy/routing config with the option for individual policies and routes to set this separately from the global config.
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit adds a `[workspace.package]` table at the root of the cargo
workspace. constituent manifests are updated to use the workspace-level
metadata.
this is generally a superficial chore, but has a pleasant future upside:
when new rust editions are released (e.g. 2024), we will only need to
update the edition specified at the root of the workspace.
Signed-off-by: katelyn martin <kate@buoyant.io>
* build(deps): bump tempfile from 3.17.1 to 3.19.0
Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.17.1 to 3.19.0.
- [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Stebalien/tempfile/compare/v3.17.1...v3.19.0)
---
updated-dependencies:
- dependency-name: tempfile
dependency-type: indirect
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deny.toml): skip rustix v0.38
this commit adds mention of rustix, whose 1.0 release is still
propagating through the ecosystem, to the deny.toml.
nb: this also removes the bitflags directive, which no longer included a
duplicate version.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: katelyn martin <kate@buoyant.io>
this commit performs a small refactor to one of the unit tests in
`linkerd-stack`'s load-shedding middleware.
this adds a span to the worker tasks spawned in this test, so that
tracing logs can be associated with particular oneshot services.
see #3744 for more information on upgrading our tower dependency. this
is cherry-picked from investigations on that branch related to breaking
changes in 0.5 related to the `Buffer` middleware.
after this change, logs now look like this:
```
; RUST_LOG="trace" cargo test -p linkerd-stack buffer_load_shed -- --nocapture
running 1 test
[ 0.002770s] TRACE worker{id=oneshot1}: tower::buffer::service: sending request to buffer worker
[ 0.002809s] TRACE worker{id=oneshot2}: tower::buffer::service: sending request to buffer worker
[ 0.002823s] TRACE worker{id=oneshot3}: tower::buffer::service: sending request to buffer worker
[ 0.002843s] DEBUG worker{id=oneshot4}: linkerd_stack::loadshed: Service has become unavailable
[ 0.002851s] DEBUG worker{id=oneshot4}: linkerd_stack::loadshed: Service shedding load
[ 0.002878s] TRACE tower::buffer::worker: worker polling for next message
[ 0.002885s] TRACE tower::buffer::worker: processing new request
[ 0.002892s] TRACE worker{id=oneshot1}: tower::buffer::worker: resumed=false worker received request; waiting for service readiness
[ 0.002901s] DEBUG worker{id=oneshot1}: tower::buffer::worker: service.ready=true processing request
[ 0.002914s] TRACE worker{id=oneshot1}: tower::buffer::worker: returning response future
[ 0.002926s] TRACE tower::buffer::worker: worker polling for next message
[ 0.002931s] TRACE tower::buffer::worker: processing new request
[ 0.002935s] TRACE worker{id=oneshot2}: tower::buffer::worker: resumed=false worker received request; waiting for service readiness
[ 0.002946s] TRACE worker{id=oneshot2}: tower::buffer::worker: service.ready=false delay
[ 0.002983s] TRACE worker{id=oneshot5}: tower::buffer::service: sending request to buffer worker
[ 0.003001s] DEBUG worker{id=oneshot6}: linkerd_stack::loadshed: Service has become unavailable
[ 0.003007s] DEBUG worker{id=oneshot6}: linkerd_stack::loadshed: Service shedding load
[ 0.003017s] DEBUG worker{id=oneshot7}: linkerd_stack::loadshed: Service has become unavailable
[ 0.003024s] DEBUG worker{id=oneshot7}: linkerd_stack::loadshed: Service shedding load
[ 0.003035s] TRACE tower::buffer::worker: worker polling for next message
[ 0.003041s] TRACE tower::buffer::worker: resuming buffered request
[ 0.003045s] TRACE worker{id=oneshot2}: tower::buffer::worker: resumed=true worker received request; waiting for service readiness
[ 0.003052s] DEBUG worker{id=oneshot2}: tower::buffer::worker: service.ready=true processing request
[ 0.003060s] TRACE worker{id=oneshot2}: tower::buffer::worker: returning response future
[ 0.003068s] TRACE tower::buffer::worker: worker polling for next message
[ 0.003073s] TRACE tower::buffer::worker: processing new request
[ 0.003077s] TRACE worker{id=oneshot3}: tower::buffer::worker: resumed=false worker received request; waiting for service readiness
[ 0.003084s] DEBUG worker{id=oneshot3}: tower::buffer::worker: service.ready=true processing request
[ 0.003091s] TRACE worker{id=oneshot3}: tower::buffer::worker: returning response future
[ 0.003099s] TRACE tower::buffer::worker: worker polling for next message
[ 0.003103s] TRACE tower::buffer::worker: processing new request
[ 0.003107s] TRACE worker{id=oneshot5}: tower::buffer::worker: resumed=false worker received request; waiting for service readiness
[ 0.003114s] DEBUG worker{id=oneshot5}: tower::buffer::worker: service.ready=true processing request
[ 0.003121s] TRACE worker{id=oneshot5}: tower::buffer::worker: returning response future
[ 0.003129s] TRACE tower::buffer::worker: worker polling for next message
test loadshed::tests::buffer_load_shed ... ok
```
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit replaces `humantime`, which is no longer maintained, with
`jiff`.
see this error when `main` is built today:
```
error[unmaintained]: humantime is unmaintained
┌─ /linkerd/linkerd2-proxy/Cargo.lock:78:1
│
78 │ humantime 2.1.0 registry+https://github.com/rust-lang/crates.io-index
│ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ unmaintained advisory detected
│
├ ID: RUSTSEC-2025-0014
├ Advisory: https://rustsec.org/advisories/RUSTSEC-2025-0014
├ Latest `humantime` crates.io release is four years old and GitHub repository has
not seen commits in four years. Question about maintenance status has not gotten
any reaction from maintainer: https://github.com/tailhook/humantime/issues/31
## Possible alternatives
* [jiff](https://crates.io/crates/jiff) provides same kind of functionality
├ Announcement: https://github.com/tailhook/humantime/issues/31
├ Solution: No safe upgrade is available!
├ humantime v2.1.0
└── linkerd-http-access-log v0.1.0
└── linkerd-app-inbound v0.1.0
├── linkerd-app v0.1.0
│ ├── linkerd-app-integration v0.1.0
│ └── linkerd2-proxy v0.1.0
├── linkerd-app-admin v0.1.0
│ ├── linkerd-app v0.1.0 (*)
│ └── (dev) linkerd-app-integration v0.1.0 (*)
└── linkerd-app-gateway v0.1.0
└── linkerd-app v0.1.0 (*)
advisories FAILED, bans ok, licenses ok, sources ok
```
see:
* https://github.com/rustsec/advisory-db/pull/2249.
* https://github.com/tailhook/humantime/issues/31.
Signed-off-by: katelyn martin <kate@buoyant.io>
kubert-prometheus-process is a new crate that includes all of Linkerd's system
metrics and more. This also helps avoid annoying compilation issues on
non-Linux systems.
this updates the prometheus client dependency.
additionally, this commit updates the `kubert-prometheus-tokio`
dependency, so that we agree on the client library in use.
Signed-off-by: katelyn martin <kate@buoyant.io>
When the proxy boots up, it needs to select a number of I/O worker threads to
allocate to the runtime. This change adds a new environment variable that allows
this value to scale based on the number of CPUs available on the host.
A CORES_MAX_RATIO value of 1.0 will allocate one worker thread per CPU core. A
lesser value will allocate fewer worker threads. Values are rounded to the
nearest whole number.
The CORES_MIN value sets a lower bound on the number of worker threads to use.
The CORES_MAX value sets an upper bound.
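The arithmetic described above can be sketched as follows. This is a minimal illustration, not the proxy's implementation: the `worker_threads` function and its parameter names are hypothetical, and the way the three settings interact (ratio first, then the lower bound, then the upper bound, with at least one thread) is an assumption.

```rust
/// Hypothetical helper: picks a worker thread count from the CPU count.
///
/// Assumption: the ratio is applied first, then the lower bound, then the
/// upper bound, and at least one worker thread is always allocated.
fn worker_threads(cpus: usize, max_ratio: f64, min: usize, max: usize) -> usize {
    // Scale the CPU count by the ratio, rounding to the nearest whole number.
    let scaled = (cpus as f64 * max_ratio).round() as usize;
    // Apply the lower bound, then the upper bound, then a floor of one.
    scaled.max(min).min(max).max(1)
}
```

Under these assumptions, a ratio of 1.0 on an 8-core host yields 8 threads, a ratio of 0.5 yields 4, and an upper bound of 16 caps a 64-core host at 16.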
The proxy predates the multi-threaded tokio runtime. When switching to it, we
added a 'multicore' feature to adopt it incrementally. This has been the only
supported configuration for many years now.
This change removes the needless feature flag to simplify the runtime
configuration.
The outbound proxy makes protocol decisions based on the discovery response,
keyed on a "parent" reference.
This change adds a `protocol::metrics` middleware that records connection counts
by parent reference.
Inbound proxies may receive meshed traffic directly on the proxy's inbound port
with a transport header, informing inbound routing behavior.
This change updates the inbound proxy to record metrics about the usage of
transport headers, including the total number of requests with a transport
header by session protocol and target port.
This change updates the DetectHttp middleware to record metrics about HTTP
protocol detection. Specifically, it records the counts of results and a very
coarse histogram of the time taken to detect the protocol.
The inbound, outbound, and admin (via inbound) stacks are updated to record
metrics against the main registry.
* refactor(http): consolidate HTTP protocol detection
Linkerd's HTTP protocol detection logic is spread across a few crates: the
linkerd-detect crate is generic over the actual protocol detection logic, and
the linkerd-proxy-http crate provides an implementation. There are no other
implementations of the Detect interface. This leads to gnarly type signatures in
the form `Result<Option<http::Variant>, DetectTimeoutError>`: simultaneously
verbose and not particularly informative (what does the None case mean exactly?).
This commit introduces a new crate, `linkerd-http-detect`, consolidating this
logic and removes the prior implementations. The admin, inbound, and outbound
stacks are updated to use these new types. This work is done in anticipation of
introducing metrics that report HTTP detection behavior.
There are no functional changes.
* feat(http/detect)!: error when the socket is closed
When a proxy does protocol detection, the initial read may indicate that the
connection was closed by the client with no data being written to the socket. In
such a case, the proxy continues to process the connection as if it may be proxied,
but we expect this to fail immediately. This can lead to unexpected proxy
behavior: for example, inbound proxies may report policy denials.
To address this, this change surfaces an error (as if the read call failed).
This could, theoretically, impact some bizarre clients that initiate half-open
connections. These corner cases can use explicit opaque policies to bypass
detection.
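The idea can be sketched with a blocking reader from the standard library. This is a simplified illustration under stated assumptions: the proxy's detection is asynchronous, the `read_initial` helper is hypothetical, and the choice of `UnexpectedEof` as the error kind is an assumption rather than the proxy's actual error type.

```rust
use std::io::{self, Read};

/// Hypothetical helper: performs the initial read used for protocol
/// detection, treating an immediately closed socket as an error.
fn read_initial<R: Read>(io: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let n = io.read(buf)?;
    if n == 0 && !buf.is_empty() {
        // A zero-length read into a non-empty buffer means the peer closed
        // the connection without writing any data.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "socket closed before any data was written",
        ));
    }
    Ok(n)
}
```

Surfacing the closed socket at this point means later stages never see a connection that is already known to be dead.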
We include a group/version/kind for inbound server resources, but we do not
indicate which specific port the server is applied to. This is important context
to understand the inbound proxy's behavior, especially when using the default
servers.
This change adds a `srv_port` label to inbound server metrics to definitively
and consistently indicate the server port used for inbound policy.
The RefusedNoTarget error type is a remnant of an older version of the direct
stack. This commit updates the error message to reflect the current state of the
code: we require ALPN-negotiated transport headers on all direct connections.
Linkerd's HTTP protocol detection logic is spread across a few crates: the
linkerd-detect crate is generic over the actual protocol detection logic, and
the linkerd-proxy-http crate provides an implementation. There are no other
implementations of the Detect interface. This leads to gnarly type signatures in
the form `Result<Option<http::Variant>, DetectTimeoutError>`: simultaneously
verbose and not particularly informative (what does the None case mean exactly?).
This commit introduces a new crate, `linkerd-http-detect`, consolidating this
logic and removes the prior implementations. The admin, inbound, and outbound
stacks are updated to use these new types. This work is done in anticipation of
introducing metrics that report HTTP detection behavior.
There are no functional changes.
Our build can occasionally fail when the sha is not a valid semver label:
--- stdout
cargo:rustc-env=GIT_SHA=025979070
cargo:rustc-env=LINKERD2_PROXY_BUILD_DATE=2025-03-08T16:32:34Z
--- stderr
thread 'main' panicked at linkerd/app/core/build.rs:18:17:
LINKERD2_PROXY_VERSION must be semver: version='0.0.0-dev.025979070'
error='invalid leading zero in pre-release identifier'
To fix this, the dot is removed so the version string is 0.0.0-dev025979070,
which is valid.
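The failure is occasional because SemVer only forbids leading zeros in pre-release identifiers that consist entirely of digits. A short sha that happens to be all digits and begins with `0` trips the rule when it stands alone after the dot. The following sketch checks that rule by hand; the function names are hypothetical, and the build script itself relies on a semver parser rather than code like this.

```rust
/// Checks one dot-separated pre-release identifier against the SemVer rules:
/// non-empty, ASCII alphanumerics and hyphens only, and no leading zero when
/// the identifier is purely numeric.
fn valid_prerelease_identifier(ident: &str) -> bool {
    if ident.is_empty() {
        return false;
    }
    if !ident.chars().all(|c| c.is_ascii_alphanumeric() || c == '-') {
        return false;
    }
    let numeric = ident.chars().all(|c| c.is_ascii_digit());
    // Only numeric identifiers are subject to the leading-zero rule.
    !(numeric && ident.len() > 1 && ident.starts_with('0'))
}

/// Checks a whole pre-release string, e.g. the `dev.025979070` part of
/// `0.0.0-dev.025979070`.
fn valid_prerelease(pre: &str) -> bool {
    pre.split('.').all(valid_prerelease_identifier)
}
```

With the dot removed, `dev025979070` is a single alphanumeric identifier, so the leading-zero rule no longer applies.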
pr #3715 missed a small handful of cargo dependencies. this commit marks
these so that they also use the workspace-level tower version.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(deps): `tower` is a workspace dependency
see https://github.com/linkerd/linkerd2/issues/8733 for more
information.
see https://github.com/linkerd/linkerd2-proxy/pull/3504 as well.
see #3456 (c740b6d8), #3466 (ca50d6bb), #3473 (b87455a9), and #3701
(cf4ef39) for some other previous pr's that moved dependencies to be
managed at the workspace level.
see also https://github.com/linkerd/drain-rs/pull/36 for another pull
request that relates to our tower dependency.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(deps): `tower-service` is a workspace dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(deps): `tower-test` is a workspace dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
noticed while addressing `cargo-deny` errors in #3504. these crates
include a few unused dependencies, which we can remove. while we
are in the neighborhood, we make some subjective tweaks to tidy up
these imports.
---
* chore(opentelemetry): remove unused `http` dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(opentelemetry): tidy imports
this groups imports at the crate level, and directly imports some
symbols from their respective crates rather than through an alias of
said crate. a `self` prefix is added to clarify imports from submodules
of this crate.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(opentelemetry): remove unused `tokio-stream` dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(opencensus): remove unused `http` dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(opencensus): use self prefix in import
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
Currently, TCP metrics are not logged for HTTP requests coming in through the tagged transport header stack.
This adds that instrumentation, like we do for the opaque and gateway stacks already present.
Signed-off-by: Scott Fleener <scott@buoyant.io>
see https://github.com/linkerd/linkerd2/issues/8733 for more
information.
this commit moves `prost-build` so that it is now managed as a workspace
dependency. while only used in tests, these tests can fail if this is
not versioned in lockstep with our other protobuffer dependencies.
see #3456 (c740b6d8), #3466 (ca50d6bb), and especially #3473 (b87455a9)
for some other previous pr's that moved dependencies to be managed at
the workspace level.
Signed-off-by: katelyn martin <kate@buoyant.io>
see https://github.com/linkerd/linkerd2/issues/8733 for more
information.
we are in the process of upgrading to hyper 1.x.
in the process of doing so, we will wish to use our friendly `BoxBody`
type, which provides a convenient and reusable interface to abstract
over different arbitrary `B`-typed request and response bodies.
unfortunately, by virtue of its definition, it is not a `Sync` type:
```rust
pub struct BoxBody {
    inner: Pin<Box<dyn Body<Data = Data, Error = Error> + Send + 'static>>,
}

#[pin_project]
pub struct Data {
    #[pin]
    inner: Box<dyn bytes::Buf + Send + 'static>,
}
```
these are erased `Box<dyn ..>` objects that only ensure `Send`-ness.
rather than changing that, because that is the proper definition of the
type, we should update code in our test client and test server to stop
requesting arbitrary `Sync` bounds.
this commit removes `Sync` bounds from various places that in fact only
need be `Send + 'static`.
this will help facilitate making use of `BoxBody` in #3504.
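to illustrate why `Send + 'static` is sufficient here, consider the sketch below. every name in it (`Chunk`, `Counted`, `BoxChunk`, `remaining_on_worker`) is hypothetical and only mirrors the shape of the situation: an erased `Box<dyn .. + Send>` wrapping a value that is `Send` but not `Sync`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a body-like trait.
trait Chunk {
    fn remaining(&self) -> usize;
}

/// `Cell` makes this type `Send` but not `Sync`.
struct Counted(Cell<usize>);

impl Chunk for Counted {
    fn remaining(&self) -> usize {
        self.0.get()
    }
}

/// An erased object that, like `BoxBody`, only promises `Send`.
type BoxChunk = Box<dyn Chunk + Send + 'static>;

impl Chunk for BoxChunk {
    fn remaining(&self) -> usize {
        // Delegate to the erased inner value.
        (**self).remaining()
    }
}

/// Moving a value onto another thread only requires `Send + 'static`.
/// Adding a `Sync` bound here would needlessly reject `BoxChunk`.
fn remaining_on_worker<B: Chunk + Send + 'static>(body: B) -> usize {
    std::thread::spawn(move || body.remaining())
        .join()
        .expect("worker thread panicked")
}
```

a value that is moved into a task is never shared between threads by reference, so `Sync` is not needed.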
Signed-off-by: katelyn martin <kate@buoyant.io>
this method is not used by any test code, nor any other internal code.
this commit removes
`linkerd_app_integration::tcp::TcpConn::target_addr()`.
Signed-off-by: katelyn martin <kate@buoyant.io>
`TapEventExt` provides an extension trait interface that we use to
extend `linkerd_proxy_api::tap::TapEvent` with additional interfaces
for use in integration tests.
this commit removes `request_init_path()`. this method was originally
added in 3ac6b72c4 (#154), but was never actually implemented and will
only ever panic when invoked. thus, it can be removed.
Signed-off-by: katelyn martin <kate@buoyant.io>
we follow a convention of grouping imported symbols at the crate-level.
this commit tidies up imports in `linkerd_app_integration::tcp` to
follow this convention.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_integration::tcp` provides a `TcpClient` type that is
distinct from the primary `linkerd_app_integration::client::Client` type
broadly used in integration tests.
this commit makes a small change to reduce indirection, and clarify that
this is constructing a different client implementation from a different
submodule.
this removes `linkerd_app_integration::client::tcp()`, and updates test
code to call the `tcp::client()` function that this is masking.
this is the client-side equivalent to #3688 (a10d1d7e).
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit removes some misdirection from the various constructors for
our test server.
currently, we expose a family of constructor functions `server::new()`,
`server::http1()`, ..., and so forth.
each of these invoke a private `server::Server::http1()`,
`server::Server::http2()`, `server::Server::http2_tls()`, ...,
counterpart, which then delegates down once more to another private
constructor `server::Server::new()`.
this is all a bit roundabout, particularly because these private
constructors are not used by any other internal code in the `server`
submodule.
this commit removes these inherent `Server` constructors, since they are
private and not used by any test code. each free-standing constructor
function is altered to instead directly construct a `Server`.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): remove `Request`, `Response` aliases
see https://github.com/linkerd/linkerd2/issues/8733.
this commit removes two type aliases from our test server
implementation. these are each tied to the defunct `hyper::Body` type.
since much of this code was originally written (between 2017 and 2020)
we've since developed some patterns / idioms elsewhere for dealing with
request and response bodies.
to help set the stage for tweaks to which interfaces need
`hyper::body::Incoming`, which types work with our general default of
`BoxBody`, and which can be generic across arbitrary `B`-typed bodies,
we remove these aliases and provide the body parameter to `Request` and
`Response`.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): remove `Request`, `Response` aliases
see https://github.com/linkerd/linkerd2/issues/8733.
this commit removes two type aliases from our test client
implementation. these are each tied to the defunct `hyper::Body` type.
since much of this code was originally written (between 2017 and 2020)
we've since developed some patterns / idioms elsewhere for dealing with
request and response bodies.
to help set the stage for tweaks to which interfaces need
`hyper::body::Incoming`, which types work with our general default of
`BoxBody`, and which can be generic across arbitrary `B`-typed bodies,
we remove these aliases and provide the body parameter to `Request` and
`Response`.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(app/integration): add whitespace for consistency
we follow a convention of an empty line between functions.
this commit adds an empty line.
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(app/integration): remove whitespace for consistency
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(app/integration): add whitespace for consistency
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
these constants exist, and are generally considered a best practice for
these situations.
this commit replaces numeric literals with named constants.
Signed-off-by: katelyn martin <kate@buoyant.io>
the test server implementation in `linkerd_app_integration` defines a
`BoxError` alias. we have a boxed error type in
`linkerd_app_core::Error` that achieves the same purpose, that we can
use instead.
this commit replaces this type alias with a reëxport of
`linkerd_app_core::Error`.
see also, #3685, which removed another similar alias.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_integration::tcp` provides a `TcpServer` type that is
distinct from the primary `linkerd_app_integration::server::Server` type
broadly used in integration tests.
this commit makes a small change to reduce indirection, and clarify that
this is constructing a different server implementation from a
different submodule.
this removes `linkerd_app_integration::server::tcp()`, and updates test
code to call the `tcp::server()` function that this is masking.
Signed-off-by: katelyn martin <kate@buoyant.io>
elsewhere in our codebase, we follow a pattern that can be called a
"new service". this is a `Service<T>` whose response `S` is itself
a `Service<U>`.
new services are often useful for dealing with particular connection
semantics, and provide us a way to model a connection that services many
requests.
our test server code makes use of a `Svc`, which wraps a reference to a
map of uri's and routes. there is an associated `NewSvc` type that does
not provide any material benefit. this `NewSvc` type is a `Service<()>`
that never exerts backpressure, nor performs any action besides
`Arc::clone`ing the map of routes.
this commit golfs down `linkerd_app_integration::server::Server`, by
directly cloning the routes into a `Svc(_)`, without the need for
polling a future or handling an (impossible) error.
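the simplification can be sketched roughly as follows. this is not the test server's code: the `Routes` alias maps paths to plain strings purely for illustration, and `svc_for_connection` is a hypothetical name for the step that previously went through `NewSvc`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical route table: paths mapped to canned response bodies.
type Routes = HashMap<String, String>;

/// A per-connection service wrapping a shared reference to the routes.
#[derive(Clone)]
struct Svc(Arc<Routes>);

impl Svc {
    /// Looks up the canned response for a path, if any.
    fn route(&self, path: &str) -> Option<&str> {
        self.0.get(path).map(String::as_str)
    }
}

/// Builds the service for a newly accepted connection by cloning the `Arc`.
/// There is no future to poll and no error to handle.
fn svc_for_connection(routes: &Arc<Routes>) -> Svc {
    Svc(Arc::clone(routes))
}
```

because the only work is an `Arc::clone`, modeling this step as a `Service<()>` adds ceremony without adding backpressure or fallibility.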
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_integration::running()` is a public function that is not
used by any external callers. this function is used in one place, when
setting up the test client used for integration tests.
this commit inlines this logic, and moves the associated `Running` type
alias down alongside the `Run` enum.
Signed-off-by: katelyn martin <kate@buoyant.io>
To support cross-compilation to windows, this change adds an 'os' param to the
justfile, used in the release to cross-build to x86_64-pc-windows-gnu.
This will produce a binary named 'linkerd2-proxy-v2.999.9-x86_64.exe'.
The proxy does not yet compile on windows, so this is a placeholder for now.
`linkerd_app_integration` defines an `Error` alias.
we have a boxed error type in `linkerd_app_core::Error` that achieves
the same purpose, that we can use instead.
this commit replaces this type alias with a reëxport of
`linkerd_app_core::Error`.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): use `Result::expect()`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): clarify `<SyncSvc as Service<T>>::call()`
this commit makes some cosmetic changes to
`linkerd_app_integration::tap::SyncSvc`'s implementation of
`tower::Service<T>`.
documentation comments are added to clarify something that makes this
service slightly interesting, and notably different from code suitable
for use in production / real-world contexts.
this service wraps an underlying `Client`, and provides a service
implementation that deals with arbitrary `B`-typed request bodies.
this provides a flexible adapter that simplifies test code.
this service, however, *blocks* the calling thread (off-task) to collect
the body into a cheaply-cloneable `Bytes`.
this commit outlines that logic into an associated function and adds
additional documentation noting this property, and the basis for this
assumption.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): loosen `SyncSvc` bounds
the bounds placed upon the inbound request's `B`-typed body are overly
restrictive for `<SyncSvc as Service<T>>`. this commit removes some
superfluous bounds, so that only those that are currently needed by this
code are now required.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
see https://github.com/linkerd/linkerd2/issues/8733 for more
information.
see https://github.com/linkerd/linkerd2-proxy/pull/3559 and
https://github.com/linkerd/linkerd2-proxy/pull/3614 for more information
on the `ForwardCompatibleBody<B>` wrapper.
this branch updates test code in `linkerd-app-integration` so that it
interacts with request and response bodies via an adapter that polls for
frames in a manner consistent with the 1.0 api of `http_body`.
this allows us to limit the diff in
https://github.com/linkerd/linkerd2-proxy/pull/3504, which will only
need to remove this adapter once using hyper 1.0.
see #3671 and #3672, which perform the same change for
`linkerd-app-inbound` and `linkerd-app-outbound`, respectively.
---
* chore(app/integration): `linkerd-http-body-compat` test dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): generalize `hyper::Body`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/integration): use `ForwardCompatibleBody`
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
see https://github.com/linkerd/linkerd2/issues/8733 for more
information.
see https://github.com/linkerd/linkerd2-proxy/pull/3559 and
https://github.com/linkerd/linkerd2-proxy/pull/3614 for more information
on the `ForwardCompatibleBody<B>` wrapper.
this branch updates test code in `linkerd-app-outbound` so that it
interacts with request and response bodies via an adapter that polls for
frames in a manner consistent with the 1.0 api of `http_body`.
this allows us to limit the diff in
https://github.com/linkerd/linkerd2-proxy/pull/3504, which will only
need to remove this adapter once we are using hyper 1.0.
see #3671 and #3673, which perform the same change for
`linkerd-app-inbound` and `linkerd-app-integration`, respectively.
---
* chore(app/outbound): `linkerd-http-body-compat` test dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/outbound): use `Response::into_body()`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/outbound): use `ForwardCompatibleBody`
see https://github.com/linkerd/linkerd2/issues/8733.
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/outbound): use `ForwardCompatibleBody`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/outbound): use `ForwardCompatibleBody`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/outbound): use `ForwardCompatibleBody`
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
see https://github.com/linkerd/linkerd2/issues/8733 for more
information.
see https://github.com/linkerd/linkerd2-proxy/pull/3559 and
https://github.com/linkerd/linkerd2-proxy/pull/3614 for more information
on the `ForwardCompatibleBody<B>` wrapper.
this branch updates test code in `linkerd-app-inbound` so that it
interacts with request and response bodies via an adapter that polls for
frames in a manner consistent with the 1.0 api of `http_body`.
this allows us to limit the diff in
https://github.com/linkerd/linkerd2-proxy/pull/3504, which will only
need to remove this adapter once we are using hyper 1.0.
see #3672 and #3673, which perform the same change for
`linkerd-app-outbound` and `linkerd-app-integration`, respectively.
---
* refactor(app/inbound): `linkerd-http-body-compat` test dependency
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): use `ForwardCompatibleBody`
see https://github.com/linkerd/linkerd2/issues/8733.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
see #3651 and linkerd/linkerd2#8733.
#3651 missed this unused trait bound, which we want to loosen
to account for changes in hyper's api.
Signed-off-by: katelyn martin <kate@buoyant.io>