this commit removes a piece of code that has been commented out.
it additionally removes a variable binding that is not needed. `dst` is
not moved, so we do not need to bind the address of the destination
service to a variable, nor do we need to clone it.
Signed-off-by: katelyn martin <kate@buoyant.io>
Now that the `rustls` initialization/configuration has been decoupled from `linkerd-meshtls`, we can get the provider directly from there. This handles the uninitialized case better, which should be less of a problem now that we always directly initialize the provider in main.
Signed-off-by: Scott Fleener <scott@buoyant.io>
`NewBroadcastClassification<C, X, N>` is not used.
`BroadcastClassification<C, S>` is only used by the `gate` submodule in
this crate.
this commit removes `NewBroadcastClassification`, since it is unused.
this commit demotes `channel` to an internal submodule, since it has no
external users.
the reëxport of `BroadcastClassification` is unused, though it is left
intact because it _is_ exposed by `NewClassifyGateSet`'s implementation
of `NewService<T>`.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_core::classify` reëxports symbols from
`linkerd_proxy_http::classify::gate`.
nothing makes use of this, and these symbols are already reëxported from
`linkerd_proxy_http::classify`. existing callsites in the outbound proxy
import this middleware directly, or through the reëxport in
`linkerd_proxy_http`.
this commit removes this `pub use` directive, since it does nothing.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(tls): Move rustls into dedicated crate
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Remove extraneous provider installs from tests
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Install default rustls provider in main
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Refactor shims out of meshtls
Meshtls previously assumed that multiple TLS implementations could be used. Now that we've consolidated on rustls as the TLS implementation, we can remove these shims.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Refactor mode out of meshtls
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit removes `linkerd_app_inbound::Inbound::proxy_metrics()`.
this accessor is not used anywhere.
Signed-off-by: katelyn martin <kate@buoyant.io>
* feat(tls): Explicitly include post-quantum key exchange algorithms
This explicitly sets the key exchange algorithms the proxy uses. It adds `X25519MLKEM768` as the most preferred algorithm in non-FIPS mode, and `SECP256R1MLKEM768` in FIPS mode.
Note that `X25519MLKEM768` is still probably appropriate for FIPS environments according to [NIST's special publication 800-56Cr2](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Cr2.pdf) as it performs a FIPS-approved key-establishment first (`MLKEM768`), but we should evaluate this position more before committing to it.
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit changes this comment in two ways:
1. fix a copy-paste typo. this should say "inbound", not "outbound".
2. add note that this is a "legacy" structure.
the equivalent structure in the outbound proxy was labeled as such
in https://github.com/linkerd/linkerd2-proxy/pull/2887.
see:
```rust
/// Holds LEGACY outbound proxy metrics.
#[derive(Clone, Debug)]
pub struct OutboundMetrics {
    pub(crate) http_errors: error::Http,
    pub(crate) tcp_errors: error::Tcp,
    // pub(crate) http_route_backends: RouteBackendMetrics,
    // pub(crate) grpc_route_backends: RouteBackendMetrics,
    /// Holds metrics that are common to both inbound and outbound proxies. These metrics are
    /// reported separately
    pub(crate) proxy: Proxy,
    pub(crate) prom: PromMetrics,
}
```
\- <dce6b61191/linkerd/app/outbound/src/metrics.rs (L22-L35)>
`authz::HttpAuthzMetrics`, `error::HttpErrorMetrics`,
`authz::TcpAuthzMetrics`, and `error::TcpErrorMetrics` all make use of
the "legacy" metrics implementation defined in `linkerd_metrics`.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(tls): Remove ring as crypto backend
The broader ecosystem has mostly moved to aws-lc-rs as the primary rustls backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* feat(tls): Remove boring as a TLS implementation
BoringSSL, as we use it today, doesn't integrate well with the broader rustls ecosystem, so this removes it. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Restore existing aws-lc feature names for compatibility
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(tls): Use correct feature name for fips conditionals
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
This adds a few small improvements to how we handle the `aws-lc` usage in the proxy:
- Pull provider customization to the `aws-lc` backend, reducing the amount that the module exposes
- Validate that the provider is actually FIPS compatible when fips is enabled
- Use the same signature verification algorithms in the `rustls` provider as we do in the cert verifier. Previously, the provider also included RSA_PSS_2048_8192_SHA256, which is marked as legacy and we don't have a strong reason to support.
- Add change detector tests for the cipher suites, key exchange groups, and signature algorithms. These should ideally never change unless `rustls` changes, at which point we can re-evaluate which algorithms are in use by the proxy.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Remove ring as crypto backend
The broader ecosystem has mostly moved to aws-lc-rs as the primary rustls backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Restore existing aws-lc feature names for compatibility
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(tls): Use correct feature name for fips conditionals
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
`linkerd-metrics` contains a suite of facilities for defining,
registering, and serving Prometheus metrics. these predate the
[`prometheus-client`](https://crates.io/crates/prometheus-client/)
crate, which should now be used for our metrics.
`linkerd-metrics` defines a `prom` namespace, which reëxports symbols
from the `prometheus-client` library. as the documentation comment for
this submodule notes, this should be used for all new metrics.
6b323d8457/linkerd/metrics/src/lib.rs (L30-L60)
`linkerd-metrics` still provides its legacy types in the public surface
of this library today, which can make it difficult to differentiate
between our two metrics implementations.
this branch introduces a new `legacy` namespace, to help clarify the
distinction between these two Prometheus implementations, and to smooth
the road to further adoption of `prometheus-client` interfaces across
the proxy.
---
* refactor(metrics): introduce empty `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Counter` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Gauge` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Histogram` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Metric` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtMetric` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtMetrics` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtLabels` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `LastUpdate` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Store` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `SharedStore` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Serve` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `NewMetrics` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Factor` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
This includes a small set of metrics about the currently installed rustls crypto provider and the algorithms it is configured to use.
We don't have 100% assurance that a default crypto provider has been installed before registering the metric, but in local testing it never appeared to be a problem. When we refactor the rustls initialization we can add an extra guarantee that we've initialized it by this point.
Example metric:
```
# HELP rustls_info Proxy TLS info.
# TYPE rustls_info gauge
rustls_info{tls_suites="TLS13_AES_128_GCM_SHA256,TLS13_AES_256_GCM_SHA384,TLS13_CHACHA20_POLY1305_SHA256,",tls_kx_groups="X25519,secp256r1,secp384r1,X25519MLKEM768,",tls_rand="AwsLcRs",tls_key_provider="AwsLcRs",tls_fips="false"} 1
```
Signed-off-by: Scott Fleener <scott@buoyant.io>
We only build linux/amd64 during typical release CI runs. This means that the
platform-specific builds are not exercised. This change updates the release workflow so that
all platforms are built whenever the workflow itself is changed.
The broader ecosystem has mostly moved to `aws-lc-rs` as the primary `rustls` backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
This requires some extra configuration for successful cross-compilation. Ideally, we can remove this extra configuration once linkerd/dev v48 is available.
This doesn't remove `ring` as a crypto backend; that can come in a follow-up at https://github.com/linkerd/linkerd2-proxy/pull/4029
* build(deps): bump tokio from 1.45.0 to 1.47.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.45.0 to 1.47.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.45.0...tokio-1.47.0)
---
updated-dependencies:
- dependency-name: tokio
dependency-version: 1.47.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
---
chore(deny): ignore socket2@v0.5
there is now a v0.6 used by the latest tokio.
while we wait for this new version to propagate through the ecosystem,
allow for two socket2 dependencies.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/integration): remove inbound_io_err test
> @cratelyn I think it would be appropriate to remove these tests, given
> that they can no longer behave properly. I don't think that this test
> case is particularly meaningful or load bearing, it's best just to
> unblock the dependency updates.
\- <https://github.com/BuoyantIO/enterprise-linkerd/issues/1645#issuecomment-3046905516>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/integration): remove inbound_multi test
this test exercises the same thing that the previous two tests do, as
the comment at the top of it points out.
this test is redundant, and we have removed the i/o error coverage that
this was redundant with. let's remove it.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
This architecture has become too significant of a maintenance burden, and isn't used often enough to justify the associated maintenance cost.
This removes arm/v7 from all the build infrastructure/dockerfiles/etc. Note that arm64 targets are still widely used and well supported.
Related: https://github.com/linkerd/linkerd2/pull/14308
Signed-off-by: Scott Fleener <scott@buoyant.io>
Currently, disabling the `ring` feature does not actually disable the dependency across the tree. Doing so requires a couple of tightly coupled steps:
- Making `ring` and `aws-lc` exclusive features, raising a compile error if they are both enabled.
- Removing a direct dependency on some `ring` types, and instead going through `rustls` for equivalent functionality.
- Removing a direct dependency on the `ring` crypto provider for integration tests, and instead using the provider from `linkerd-meshtls`.
- Installing the default crypto provider globally for the process and re-using it when requested, mostly to make the tests pass.
This was tested using a temporary `cargo deny` config that forbade `ring` when `aws-lc-rs` was used, and vice-versa. Note that it doesn't completely remove `ring` for dev dependencies, but that can be done as a follow-up.
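
A minimal sketch of how two features can be made mutually exclusive at compile time. The feature names mirror the ones mentioned above; the error text and the `backend_name` helper are hypothetical and only illustrate the technique, not the proxy's actual code.

```rust
// Assumption: the feature names `ring` and `aws-lc` follow this commit
// message; the error text is illustrative.
#[cfg(all(feature = "ring", feature = "aws-lc"))]
compile_error!("the `ring` and `aws-lc` features are mutually exclusive");

/// Reports which backend was selected at compile time (hypothetical helper).
pub fn backend_name() -> &'static str {
    if cfg!(feature = "aws-lc") {
        "aws-lc"
    } else if cfg!(feature = "ring") {
        "ring"
    } else {
        "none"
    }
}
```

With both features enabled, the build stops at the `compile_error!` invocation instead of silently linking both backends.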
Signed-off-by: Scott Fleener <scott@buoyant.io>
This makes two changes to the preferred cipher suite order.
- Prefer AES algorithms over ChaCha20. AES is significantly faster when AES hardware is present, and AES hardware is on all x86 CPUs since ~2010, and all ARM server CPUs for a similar amount of time. For these reasons it's reasonable to default to AES for modern deployments, and it's the same default that `aws-lc-rs` makes anyway.
- Remove ChaCha20 when FIPS is enabled. It's no longer a supported algorithm, so we shouldn't have it as an option.
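
A minimal sketch of the resulting preference order. Suite names are written as plain strings for illustration, `preferred_cipher_suites` is a hypothetical helper, and the relative order of the two AES suites is an assumption; the proxy itself configures `rustls` suite types rather than strings.

```rust
/// Returns cipher suite names in preference order (illustrative only).
///
/// Assumption: the relative order of the two AES suites is a guess; the
/// change only states that AES is preferred over ChaCha20.
pub fn preferred_cipher_suites(fips: bool) -> Vec<&'static str> {
    let mut suites = vec!["TLS13_AES_128_GCM_SHA256", "TLS13_AES_256_GCM_SHA384"];
    if !fips {
        // ChaCha20 is only offered outside of FIPS mode, and after AES.
        suites.push("TLS13_CHACHA20_POLY1305_SHA256");
    }
    suites
}
```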
Signed-off-by: Scott Fleener <scott@buoyant.io>
Auditing tools like Syft cannot inspect proxy dependencies, which makes it difficult to inspect the state of a binary. This change updates the release process to use cargo-auditable, which documents the proxy's crate dependencies in its release binary.
tikv-jemallocator supersedes jemallocator. To enable jemalloc profiling, this change updates the dependency and adds a `jemalloc-profiling` feature so that profiling can be enabled at build time.
We use the ubuntu-24.04 runner by default, but in forks this may not be appropriate. This change updates the runners to support overriding via the LINKERD2_PROXY_RUNNER variable.
* build(deps): bump the rustls group across 1 directory with 3 updates
Bumps the rustls group with 3 updates in the / directory: [rustls-webpki](https://github.com/rustls/webpki), [rustls](https://github.com/rustls/rustls) and [rustls-pki-types](https://github.com/rustls/pki-types).
Updates `rustls-webpki` from 0.103.1 to 0.103.2
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.1...v/0.103.2)
Updates `rustls` from 0.23.26 to 0.23.27
- [Release notes](https://github.com/rustls/rustls/releases)
- [Changelog](https://github.com/rustls/rustls/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rustls/rustls/compare/v/0.23.26...v/0.23.27)
Updates `rustls-pki-types` from 1.11.0 to 1.12.0
- [Release notes](https://github.com/rustls/pki-types/releases)
- [Commits](https://github.com/rustls/pki-types/compare/v/1.11.0...v/1.12.0)
---
updated-dependencies:
- dependency-name: rustls-webpki
dependency-version: 0.103.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: rustls
- dependency-name: rustls
dependency-version: 0.23.27
dependency-type: indirect
update-type: version-update:semver-patch
dependency-group: rustls
- dependency-name: rustls-pki-types
dependency-version: 1.12.0
dependency-type: indirect
update-type: version-update:semver-minor
dependency-group: rustls
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix(rustls): Remove dependency on most rustls internal types
We only used these types for generating a ClientHello message for testing. Instead, we can manually encode a sample message based on the TLS spec.
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Scott Fleener <scott@buoyant.io>
* chore(app/env): fix typo
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(app/env): a lower default maximum per-host connection limit
see also:
* #4004
* linkerd/linkerd2#14204
in #4004 we fixed an issue related to our HTTP/1.1 client's connection
pool.
this further hedges against future issues related to our HTTP client
exhausting resources available to its container. today, the limit by
default is `usize::MAX`, which is dramatically higher than the practical
limit.
this commit changes the limit for outbound idle connections per-host to
10,000.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
When constructing the HTTP/1 client, we configure connection pooling, but
notably do not provide a timer implementation to Hyper. This causes hyper's
connection pool to be configured without idle timeouts, which may lead to
resource leaks, especially for clients that communicate with many virtual hosts.
This change updates the HTTP/1 client builder to use a Tokio timer, which allows
Hyper to manage idle timeouts correctly.
This is a strong ciphersuite that's reasonable to include as a supported option. We still prefer CHACHA20_POLY1305 in non-FIPS modes for its speed, as well as keeping CHACHA20_POLY1305 as a backup for older proxies that only support it.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
this is based on #3987.
in #3987 (_see https://github.com/linkerd/linkerd2/issues/13821_) we discovered that some of the types that implement [`FmtLabels`](085be9978d/linkerd/metrics/src/fmt.rs (L5)) could collide when used in registry keys; i.e., they might emit identical label sets, but distinct `Hash` values.
#3987 solves two bugs. this pull request proposes a follow-on change, introducing _exhaustive_ bindings to implementations of `FmtLabels`, to prevent this category of bug from recurring in the future.
this change means that the introduction of an additional field to any of these label structures, e.g. `OutboundEndpointLabels` or `HTTPLocalRateLimitLabels`, will cause a compilation error unless said new field is handled in the corresponding `FmtLabels` implementation.
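
a minimal sketch of the exhaustive binding pattern. the `FmtLabels` trait here is a simplified stand-in for the legacy trait, and `ExampleLabels` and `DisplayLabels` are hypothetical types invented for illustration.

```rust
use std::fmt;

/// Stand-in for the legacy `FmtLabels` trait (simplified for this sketch).
pub trait FmtLabels {
    fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result;
}

/// Hypothetical label structure, not one of the proxy's real types.
pub struct ExampleLabels {
    pub direction: &'static str,
    pub target_port: u16,
}

impl FmtLabels for ExampleLabels {
    fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Exhaustive binding: there is no `..` here, so adding a field to
        // `ExampleLabels` fails to compile until it is handled below.
        let Self {
            direction,
            target_port,
        } = self;
        write!(f, "direction=\"{}\",target_port=\"{}\"", direction, target_port)
    }
}

/// Hypothetical adapter so that labels can be rendered with `to_string()`.
pub struct DisplayLabels<'a, L>(pub &'a L);

impl<L: FmtLabels> fmt::Display for DisplayLabels<'_, L> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_labels(f)
    }
}
```

a field that should intentionally stay out of the labels can still be bound as `field: _`, which documents the omission at the point where it happens.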
### 🔖 a note
in writing this pull request, i noticed one label that i believe is unintentionally being elided. i've refrained from changing behavior in this pull request. i do note it though, as an example of this syntax identifying the category of bug i hope to hedge against here.
---
* fix: do not key transport metrics registry on `ClientTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: do not key transport metrics registry on `ServerTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(transport-metrics): exhaustive `Eos: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `TlsAccept: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `TargetAddr: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(metrics): exhaustive `Label: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(http/metrics): exhaustive `Status: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ControlLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ProfileRouteLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `InboundEndpointLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerLabel: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerAuthzLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `RouteLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `RouteAuthzLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `OutboundEndpointLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `Authority: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `StackLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): exhaustive `HTTPLocalRateLimitLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): exhaustive `Key<L>: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(metrics): remove redundant banner comment
these impl blocks are all `FmtLabels`, following another series of the
same, above. we don't need another one of these comments.
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(metrics): exhaustive `AndThen: FmtMetrics`
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(app/core): note unused label
see #3262 (618838ec7), which introduced this label.
to preserve behavior, this label remains unused.
X-Ref: #3262
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
The inbound and outbound connect backoffs currently max out at 500ms. This is very aggressive in practice, especially when an endpoint remains unavailable.
This change increases the maximum backoff durations:
* inbound: 10s
* outbound: 60s
The default minimum backoff durations remain unchanged at 100ms so that failed
connections are retried quickly. This change only increases the default _maximum_ backoff so that the timeout increases substantially when an endpoint is unavailable for a longer period of time.
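
A minimal sketch of how the minimum and maximum interact, assuming a plain doubling schedule without jitter. The `backoff` function is hypothetical; the proxy's actual backoff implementation may differ.

```rust
use std::time::Duration;

/// Computes the delay before the given retry attempt (0-based).
///
/// Assumption: a plain doubling schedule without jitter; the proxy's real
/// backoff implementation may differ.
pub fn backoff(min: Duration, max: Duration, attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // Grow from `min`, but never wait longer than `max`.
    min.saturating_mul(factor).min(max)
}
```

Under this schedule a 500ms cap is reached by the fourth attempt (100ms, 200ms, 400ms, 500ms), after which an unavailable endpoint is retried twice per second indefinitely. A 10s or 60s cap lets the delay keep growing for several more attempts.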
### 🖼️ background
the linkerd2 proxy implements, registers, and exports Prometheus metrics using a variety of systems, for historical reasons. new metrics broadly rely upon the official [`prometheus-client`](https://github.com/prometheus/client_rust/) library, whose interfaces are reexported for internal consumption in the [`linkerd_metrics::prom`](https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/metrics/src/lib.rs#L30-L60) namespace.
other metrics predate this library however, and rely on the metrics registry implemented in the workspace's [`linkerd-metrics`](https://github.com/linkerd/linkerd2-proxy/tree/main/linkerd/metrics) library.
### 🐛 bug report
* https://github.com/linkerd/linkerd2/issues/13821

linkerd/linkerd2#13821 reported a bug in which duplicate metrics could be observed and subsequently dropped by Prometheus when upgrading the control plane via helm with an existing workload running.
### 🦋 reproduction example
for posterity, i'll note the reproduction steps here.
i used these steps to identify the `2025.3.2` edge release as the affected release. upgrading from `2025.2.3` to `2025.3.1` did not exhibit this behavior. see below for more discussion about the cause.
generate certificates via <https://linkerd.io/2.18/tasks/generate-certificates/>
using these two deployments, courtesy of @GTRekter:
<details>
<summary>**💾 click to expand: app deployment**</summary>
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: simple-app
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: v1
kind: Service
metadata:
  name: simple-app-v1
  namespace: simple-app
spec:
  selector:
    app: simple-app-v1
    version: v1
  ports:
    - port: 80
      targetPort: 5678
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-app-v1
  namespace: simple-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-app-v1
      version: v1
  template:
    metadata:
      labels:
        app: simple-app-v1
        version: v1
    spec:
      containers:
        - name: http-app
          image: hashicorp/http-echo:latest
          args:
            - "-text=Simple App v1"
          ports:
            - containerPort: 5678
```
</details>
<details>
<summary>**🤠 click to expand: client deployment**</summary>
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traffic
  namespace: simple-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traffic
  template:
    metadata:
      labels:
        app: traffic
    spec:
      containers:
        - name: traffic
          image: curlimages/curl:latest
          command:
            - /bin/sh
            - -c
            - |
              while true; do
                TIMESTAMP_SEND=$(date '+%Y-%m-%d %H:%M:%S')
                PAYLOAD="{\"timestamp\":\"$TIMESTAMP_SEND\",\"test_id\":\"sniff_me\",\"message\":\"hello-world\"}"
                echo "$TIMESTAMP_SEND - Sending payload: $PAYLOAD"
                RESPONSE=$(curl -s -X POST \
                  -H "Content-Type: application/json" \
                  -d "$PAYLOAD" \
                  http://simple-app-v1.simple-app.svc.cluster.local:80)
                TIMESTAMP_RESPONSE=$(date '+%Y-%m-%d %H:%M:%S')
                echo "$TIMESTAMP_RESPONSE - RESPONSE: $RESPONSE"
                sleep 1
              done
```
</details>
and this prometheus configuration:
<details>
<summary>**🔥 click to expand: prometheus configuration**</summary>
```yaml
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'pod'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:4191']
        labels:
          group: 'traffic'
```
</details>
we will perform the following steps:
```sh
# install the edge release
# specify the versions we'll migrate between.
export FROM="2025.3.1"
export TO="2025.3.2"
# create a cluster, and add the helm charts.
kind create cluster
helm repo add linkerd-edge https://helm.linkerd.io/edge
# install linkerd's crd's and control plane.
helm install linkerd-crds linkerd-edge/linkerd-crds \
-n linkerd --create-namespace --version $FROM
helm install linkerd-control-plane \
-n linkerd \
--set-file identityTrustAnchorsPEM=cert/ca.crt \
--set-file identity.issuer.tls.crtPEM=cert/issuer.crt \
--set-file identity.issuer.tls.keyPEM=cert/issuer.key \
--version $FROM \
linkerd-edge/linkerd-control-plane
# install a simple app and a client to drive traffic.
kubectl apply -f duplicate-metrics-simple-app.yml
kubectl apply -f duplicate-metrics-traffic.yml
# bind the traffic pod's metrics port to the host.
kubectl port-forward -n simple-app deploy/traffic 4191
# start prometheus, begin scraping metrics
prometheus --config.file=prometheus.yml
```
now, open a browser and query `irate(request_total[1m])`.
next, upgrade the control plane:
```
helm upgrade linkerd-crds linkerd-edge/linkerd-crds \
-n linkerd --create-namespace --version $TO
helm upgrade linkerd-control-plane \
-n linkerd \
--set-file identityTrustAnchorsPEM=cert/ca.crt \
--set-file identity.issuer.tls.crtPEM=cert/issuer.crt \
--set-file identity.issuer.tls.keyPEM=cert/issuer.key \
--version $TO \
linkerd-edge/linkerd-control-plane
```
prometheus will begin emitting warnings regarding 34 time series being dropped.
in your browser, querying `irate(request_total[1m])` once more will show that
the rate of requests has stopped, due to the new time series being dropped.
next, restart the workloads...
```
kubectl rollout restart deployment -n simple-app simple-app-v1 traffic
```
prometheus warnings will go away, as reported in linkerd/linkerd2#13821.
### 🔍 related changes
* https://github.com/linkerd/linkerd2/pull/13699
* https://github.com/linkerd/linkerd2/pull/13715
in linkerd/linkerd2#13715 and linkerd/linkerd2#13699, we made some changes to the destination controller. from the "Cautions" section of the `2025.3.2` edge release:
> Additionally, this release changes the default for `outbound-transport-mode`
> to `transport-header`, which will result in all traffic between meshed
> proxies flowing on port 4143, rather than using the original destination
> port.
linkerd/linkerd2#13699 (_included in `edge-25.3.1`_) introduced this outbound transport-protocol configuration surface, but maintained the default behavior, while linkerd/linkerd2#13715 (_included in `edge-25.3.2`_) altered the default behavior to route meshed traffic via port 4143.
this is a visible change in behavior that can be observed when upgrading from a version that preceded this change to the mesh. this means that when upgrading across `edge-25.3.2`, such as from the `2025.2.1` to `2025.3.2` versions of the helm charts, or from the `2025.2.3` to the `2025.3.4` versions of the helm charts (_reported upstream in linkerd/linkerd2#13821_), the freshly upgraded destination controller pods will begin routing meshed traffic differently.
i'll state explicitly, _that_ is not a bug! it is, however, an important clue to bear in mind: data plane pods that were started with the previous control plane version, and continue running after the control plane upgrade, will have seen both routing patterns. reporting a duplicate time series for affected metrics indicates that there is a hashing collision in our metrics system.
### 🐛 the bug(s)
we define a collection of structures to model labels for inbound and outbound endpoints'
metrics:
```rust
// linkerd/app/core/src/metrics.rs
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum EndpointLabels {
    Inbound(InboundEndpointLabels),
    Outbound(OutboundEndpointLabels),
}
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct InboundEndpointLabels {
    pub tls: tls::ConditionalServerTls,
    pub authority: Option<http::uri::Authority>,
    pub target_addr: SocketAddr,
    pub policy: RouteAuthzLabels,
}
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct OutboundEndpointLabels {
    pub server_id: tls::ConditionalClientTls,
    pub authority: Option<http::uri::Authority>,
    pub labels: Option<String>,
    pub zone_locality: OutboundZoneLocality,
    pub target_addr: SocketAddr,
}
```
\- <https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/core/src/metrics.rs>
pay particular attention to the derived `Hash` implementation. note the `tls::ConditionalClientTls` and `tls::ConditionalServerTls` types used in each of these labels. these are used by some of our types like `TlsConnect` to emit prometheus labels, using our legacy system's `FmtLabels` trait:
```rust
// linkerd/app/core/src/transport/labels.rs
impl FmtLabels for TlsConnect<'_> {
    fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            Conditional::None(tls::NoClientTls::Disabled) => {
                write!(f, "tls=\"disabled\"")
            }
            Conditional::None(why) => {
                write!(f, "tls=\"no_identity\",no_tls_reason=\"{}\"", why)
            }
            Conditional::Some(tls::ClientTls { server_id, .. }) => {
                write!(f, "tls=\"true\",server_id=\"{}\"", server_id)
            }
        }
    }
}
```
\- <99316f7898/linkerd/app/core/src/transport/labels.rs (L151-L165)>
note the `ClientTls` case, which ignores fields in the client tls information:
```rust
// linkerd/tls/src/client.rs
/// A stack parameter that configures a `Client` to establish a TLS connection.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ClientTls {
    pub server_name: ServerName,
    pub server_id: ServerId,
    pub alpn: Option<AlpnProtocols>,
}
```
\- <99316f7898/linkerd/tls/src/client.rs (L20-L26)>
this means that there is potential for an identical set of labels to be emitted given two `ClientTls` structures with distinct server names or ALPN protocols. for brevity, i'll elide the equivalent issue with `ServerTls`, and its corresponding `TlsAccept<'_>` label implementation, though it exhibits the same issue.
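
to make the collision concrete, here is a minimal sketch using simplified stand-ins. this `ClientTls` uses plain strings rather than the real types, and `labels` and `registry_sizes` are hypothetical helpers written for this illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `ClientTls`; fields are plain strings here.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ClientTls {
    pub server_name: String,
    pub server_id: String,
}

/// Mirrors the label formatting above: only `server_id` is emitted.
pub fn labels(tls: &ClientTls) -> String {
    format!("tls=\"true\",server_id=\"{}\"", tls.server_id)
}

/// Returns (registry entries, distinct label sets) after recording each key.
pub fn registry_sizes(keys: &[ClientTls]) -> (usize, usize) {
    // The registry is keyed on the full structure, via its derived `Hash`.
    let mut registry: HashMap<ClientTls, u64> = HashMap::new();
    for key in keys {
        *registry.entry(key.clone()).or_default() += 1;
    }
    // Render every entry's labels and count how many are actually unique.
    let mut rendered: Vec<String> = registry.keys().map(labels).collect();
    rendered.sort();
    rendered.dedup();
    (registry.len(), rendered.len())
}
```

two keys that differ only in `server_name` occupy two registry entries but render one label set, which a scraper sees as a duplicate time series.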
### 🔨 the fix
this pull request introduces two new types: `ClientTlsLabels` and `ServerTlsLabels`. these continue to implement `Hash`, for use as a key in our metrics registry, and for use in formatting labels.
`ClientTlsLabels` and `ServerTlsLabels` each resemble `ClientTls` and `ServerTls`, respectively, but do not contain any fields that are elided in label formatting, to prevent duplicate metrics from being emitted.
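
a rough sketch of the idea, not the actual definitions: the types below use plain strings, and the `From` conversion and `distinct_keys` helper are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Simplified stand-in for `ClientTls`; fields are plain strings here.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ClientTls {
    pub server_name: String,
    pub server_id: String,
    pub alpn: Option<String>,
}

/// Sketch of a label key that holds only what is actually formatted.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ClientTlsLabels {
    pub server_id: String,
}

impl From<&ClientTls> for ClientTlsLabels {
    fn from(tls: &ClientTls) -> Self {
        // Exhaustive binding: new `ClientTls` fields must be considered here.
        let ClientTls {
            server_name: _,
            server_id,
            alpn: _,
        } = tls;
        Self {
            server_id: server_id.clone(),
        }
    }
}

/// Number of distinct registry keys produced by a set of connections.
pub fn distinct_keys(conns: &[ClientTls]) -> usize {
    conns
        .iter()
        .map(|tls| ClientTlsLabels::from(tls))
        .collect::<HashSet<_>>()
        .len()
}
```

because the key and the formatted labels are now derived from the same fields, two connections that would render identical labels also hash to the same registry entry.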
relatedly, #3988 audits our existing `FmtLabels` implementations and makes use of exhaustive bindings, to prevent this category of problem in the short-term future. ideally, we might eventually consider replacing the metrics interfaces in `linkerd-metrics`, but that is strictly kept out-of-scope for the purposes of this particular fix.
---
* fix: do not key transport metrics registry on `ClientTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: do not key transport metrics registry on `ServerTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
This change does two things:
- adds support for `NamedPipes` to our SPIRE client. This will allow the client to connect to SPIRE agents running on Windows hosts.
- renames the `LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET` environment variable to `LINKERD2_PROXY_IDENTITY_SPIRE_WORKLOAD_API_ADDRESS` and deprecates the former.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
`linkerd-app-test` relies on some dependencies that are unused.
this commit removes these dependencies from the crate's manifest.
see #3928 and #3929.
Signed-off-by: katelyn martin <kate@buoyant.io>
see linkerd/linkerd2#14050.
this change fixes a logical bug with
`linkerd_http_retry::peek_trailers::PeekTrailersBody::<B>::read_body(..)`.
`read_body(..)` constructs a `PeekTrailersBody<B>`, by polling the inner
body to see whether or not it can reach the end of the stream by only
yielding to the asynchronous runtime once.
in linkerd/linkerd2-proxy#3559, we restructured this middleware's
internal modeling to reflect the `Frame<T>`-oriented signatures of the
`http_body::Body` trait's 1.0 interface.
unfortunately, this included a bug which could cause the first frame in
a stream to be discarded if the second `Body::poll_frame()` call
(_invoked via `now_or_never()`_) returns `Pending`. this could cause
non-deterministic errors for users when sending traffic to HTTPRoutes
and GRPCRoutes with retry annotations applied.
this change rectifies this problem, ensuring that the first frame is not
discarded when attempting to peek a body's trailers.
to confirm that this works as expected, additional test coverage is
introduced that confirms that the data and trailers of the inner body
are passed through faithfully.
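to make the failure mode concrete, here is a deliberately simplified model of the peeking logic. everything in it is an assumption made for illustration: `Frame`, `ScriptedBody`, `Peeked`, and `peek` are hypothetical, the body is polled without a `Context`, and the `retain_first` flag exists only to contrast the regression with the fix. the real middleware operates on `http_body::Body` and `now_or_never()`.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for `http_body::Frame<T>`.
#[derive(Clone, Debug, PartialEq)]
enum Frame {
    Data(&'static str),
    Trailers(&'static str),
}

/// A body whose poll results are scripted up front. Once the script is
/// exhausted, it reports the end of the stream.
struct ScriptedBody(VecDeque<Poll<Option<Frame>>>);

impl ScriptedBody {
    fn poll_frame(&mut self) -> Poll<Option<Frame>> {
        self.0.pop_front().unwrap_or(Poll::Ready(None))
    }
}

/// The frames buffered while peeking, plus the rest of the inner body.
struct Peeked {
    buffered: Vec<Frame>,
    rest: ScriptedBody,
}

/// Waits for the first frame, then polls exactly once more to see whether
/// the stream can be finished without yielding again.
fn peek(mut body: ScriptedBody, retain_first: bool) -> Peeked {
    let mut buffered = Vec::new();
    // Model of awaiting the first frame: poll until the body is ready.
    let first = loop {
        if let Poll::Ready(frame) = body.poll_frame() {
            break frame;
        }
    };
    // Model of the single `now_or_never()` poll for a second frame.
    match body.poll_frame() {
        Poll::Ready(second) => {
            buffered.extend(first);
            buffered.extend(second);
        }
        // The fix: keep the first frame even if the second poll is pending.
        Poll::Pending if retain_first => buffered.extend(first),
        // The regression: the first frame is silently dropped here.
        Poll::Pending => {}
    }
    Peeked { buffered, rest: body }
}

impl Peeked {
    /// Yields the buffered frames, then drains the remaining inner body.
    fn into_frames(mut self) -> Vec<Frame> {
        let mut frames = self.buffered;
        loop {
            match self.rest.poll_frame() {
                Poll::Ready(Some(frame)) => frames.push(frame),
                Poll::Ready(None) => return frames,
                Poll::Pending => continue,
            }
        }
    }
}
```

with a script of "data, then pending, then trailers", the fixed path yields both the data and the trailers, while the regressed path yields only the trailers.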
---
* feat(http/retry): additional `PeekTrailersBody<B>` test coverage
this commit introduces additional test coverage to
`linkerd_http_retry::peek_trailers::PeekTrailersBody<B>`.
this body middleware is used to facilitate transparent http retries, and
allows callers to possibly inspect the trailers for a response, by
polling an `http_body::Body`.
this commit introduces additional unit test coverage that confirms that
the data and trailers of the inner body are passed through faithfully.
Signed-off-by: katelyn martin <kate@buoyant.io>
* feat(http/retry): another `PeekTrailersBody<B>` test case
this commit introduces some additional coverage for bodies that return
`Pending` when polled a second time.
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(http/retry): `PeekTrailersBody<B>` retains first frame
this commit fixes a logical bug with
`linkerd_http_retry::peek_trailers::PeekTrailersBody::<B>::read_body(..)`.
`read_body(..)` constructs a `PeekTrailersBody<B>`, by polling the inner
body to see whether or not it can reach the end of the stream by only
yielding to the asynchronous runtime once.
in linkerd/linkerd2-proxy#3559, we restructured this middleware's
internal modeling to reflect the `Frame<T>`-oriented signatures of the
`http_body::Body` trait's 1.0 interface.
unfortunately, this included a bug which could cause the first frame in
a stream to be discarded if the second `Body::poll_frame()` call
(_invoked via `now_or_never()`_) returns `Pending`. this could cause
non-deterministic errors for users when sending traffic to HTTPRoutes
and GRPCRoutes with retry annotations applied.
this commit rectifies this problem, ensuring that the first frame is not
discarded when attempting to peek a body's trailers.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd-app-test` exposes some functions that we never use elsewhere.
this commit removes these functions.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_test::service` contains facilities that are unused.
this commit removes this submodule from the `linkerd-app-test` library.
Signed-off-by: katelyn martin <kate@buoyant.io>
this is a trivial, cosmetic change.
`Config` has two consecutive `impl` blocks in the `linkerd-app` library.
these do not include distinct generics or trait bounds, so the methods
contained therein do not need to live in two distinct `impl` blocks.
this commit consolidates these blocks.
while we are performing this change, we add two `=== impl T ===`
banners, which are used throughout the project as greppable strings to
find methods and trait implementations for a given type.
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit hoists `tracing`, used liberally throughout our project,
such that it is managed as a single workspace dependency.
this will be helpful someday when a 0.2 release happens.
Signed-off-by: katelyn martin <kate@buoyant.io>
this commit introduces a concrete error type for the `orig_proto`
upgrade layer.
this layer is used by the proxy's http client to transparently upgrade
outbound http/1 traffic to http/2. rather than boxing errors, we define
a concrete error type to facilitate inspecting errors in the future.
for now, the top-level http client continues to box errors thrown by the
"orig_proto" upgrade client.
see also, #3894 (ea75ac0).
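as a sketch of why a concrete type helps, consider the hypothetical example below. the `UpgradeError` enum, its variants, and the helper functions are invented for illustration and are not the names used by the proxy. the point is only that a caller can box the error at its own boundary and still recover the concrete variant later.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical concrete error for an upgrading client. The variant names
/// are illustrative, not the proxy's actual error type.
#[derive(Debug)]
enum UpgradeError {
    /// The upgraded (http/2) client failed.
    Http2(io::Error),
    /// The fallback (http/1) client failed.
    Http1(io::Error),
}

impl fmt::Display for UpgradeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Http2(e) => write!(f, "http/2 client error: {}", e),
            Self::Http1(e) => write!(f, "http/1 client error: {}", e),
        }
    }
}

impl Error for UpgradeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Http2(e) | Self::Http1(e) => Some(e),
        }
    }
}

type BoxError = Box<dyn Error + Send + Sync>;

/// An outer client may continue to box errors at its own boundary.
fn boxed(error: UpgradeError) -> BoxError {
    Box::new(error)
}

/// Because the inner type is concrete, it can still be inspected later.
fn is_http2_failure(error: &BoxError) -> bool {
    matches!(
        error.downcast_ref::<UpgradeError>(),
        Some(UpgradeError::Http2(_))
    )
}
```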
Signed-off-by: katelyn martin <kate@buoyant.io>
the `linkerd-error` crate includes two functions that can be used to
examine the cause of a dynamic, boxed error. for example, here is the
`is_caused_by()` function, used in some of our error recovery logic:
```rust
/// Determines whether the provided error was caused by an `E` typed error.
pub fn is_caused_by<E: std::error::Error + 'static>(
    mut error: &(dyn std::error::Error + 'static),
) -> bool {
    loop {
        if error.is::<E>() {
            return true;
        }
        error = match error.source() {
            Some(e) => e,
            None => return false,
        };
    }
}
```
we rely on [`thiserror`](https://github.com/dtolnay/thiserror/) to
generate boilerplate code for our error structures. this includes an
attribute called `transparent` that will delegate down to an inner
error.
however, this delegation means that the causal chains inspected by
the function above might not properly identify an inner error. this
test, for example, fails:
```rust
// linkerd/dns/src/lib.rs
#[derive(Debug, Clone, Error)]
#[error("invalid SRV record {:?}", self.0)]
struct InvalidSrv(rdata::SRV);

#[derive(Debug, Error)]
enum SrvRecordError {
    #[error(transparent)]
    Invalid(#[from] InvalidSrv),
    #[error("failed to resolve SRV record: {0}")]
    Resolve(#[from] hickory_resolver::ResolveError),
}

#[test]
fn srv_record_reports_cause_correctly() {
    let srv = "foobar.linkerd-dst-headless.linkerd.svc.cluster.local."
        .parse::<hickory_resolver::Name>()
        .map(|name| rdata::SRV::new(1, 1, 8086, name))
        .expect("a valid domain name");
    let error = SrvRecordError::Invalid(InvalidSrv(srv));
    let error: Box<dyn std::error::Error + 'static> = Box::new(error);
    assert!(linkerd_error::is_caused_by::<InvalidSrv>(&*error));
    assert!(linkerd_error::cause_ref::<InvalidSrv>(&*error).is_some());
}
```
```
the `transparent` attribute will delegate directly down to `InvalidSrv`
when `Error::source()` is invoked. this means that our downcasting logic
in `linkerd-error` used to ascertain causes of dynamic, boxed errors
will fail to identify a `SrvRecordError` as being caused by an
`InvalidSrv`.
by replacing the `transparent` attribute with a `"{0}"` display
attribute, we continue to transparently show the inner error when
printed as a string, but will include `InvalidSrv` in the causal chain.
this branch replaces `transparent` attributes in an assortment of
error variants.
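the difference can be reproduced without `thiserror` by writing the two `Error` implementations by hand. the sketch below is an approximation of what the derive generates, not its literal expansion. `Transparent` and `WithSource` are hypothetical wrapper types, and `InvalidSrv` is reduced to a unit struct.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct InvalidSrv;

impl fmt::Display for InvalidSrv {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid SRV record")
    }
}

impl Error for InvalidSrv {}

/// Approximates `#[error(transparent)]`: both `Display` and `source()` are
/// forwarded to the inner error, so the wrapper reports the inner error's
/// source rather than the inner error itself.
#[derive(Debug)]
struct Transparent(InvalidSrv);

impl fmt::Display for Transparent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

impl Error for Transparent {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.0.source()
    }
}

/// Approximates `#[error("{0}")]` with `#[from]`: the message is still the
/// inner error's, but `source()` returns the inner error.
#[derive(Debug)]
struct WithSource(InvalidSrv);

impl fmt::Display for WithSource {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for WithSource {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Same shape as the `linkerd-error` helper quoted earlier.
fn is_caused_by<E: Error + 'static>(mut error: &(dyn Error + 'static)) -> bool {
    loop {
        if error.is::<E>() {
            return true;
        }
        error = match error.source() {
            Some(e) => e,
            None => return false,
        };
    }
}
```

both wrappers print the same message, but only `WithSource` is identified as being caused by an `InvalidSrv`.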
---
* test(dns): add a failing test
this commit adds a failing unit test. this test shows that dns errors
might not report their cause correctly, due to thiserror's `transparent`
attribute passing directly through to `InvalidSrv`'s cause.
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(dns): replace `error(transparent)` attribute
this commit fixes the failing unit test introduced in the previous
commit.
the `transparent` attribute will delegate directly down to `InvalidSrv`
when `Error::source()` is invoked. this means that our downcasting logic
in `linkerd-error` used to ascertain causes of dynamic, boxed errors
will fail to identify a `SrvRecordError` as being caused by an
`InvalidSrv`.
by replacing the `transparent` attribute with a `"{0}"` display
attribute, we continue to transparently show the inner error when
printed as a string, but will include `InvalidSrv` in the causal chain.
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: errors report inner sources
this commit performs the same transformation as the previous commit,
replacing `transparent` with equivalent pass-through `"{0}"` display
strings, adding `#[source]` where needed.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
this structure exposes its fields, but those fields are never accessed
elsewhere, aside from test code.
this commit removes the `pub` directives from the address and tls
fields. in their stead, test interfaces are added to allow the
`tagged_transport` test suite to function.
Signed-off-by: katelyn martin <kate@buoyant.io>
this is a small mechanical refactor to the http/1 client.
our http/2 and "orig_proto" clients are tower services. our http/1
client, on the other hand, exposes a concrete inherent method `request`.
to be consistent, this changes our http client to treat this http/1
client as a service as well.
Signed-off-by: katelyn martin <kate@buoyant.io>
This has a few benefits. Primarily this gives us a reasonable path to creating FIPS-enabled builds on architectures other than x86-64, as well as a path away from using BoringSSL as a backend.
Additionally, rustls has been using the aws-lc-rs library as the default backend for a little while now, so this gives us the opportunity to stay in line with the most widely used option in the ecosystem.
Signed-off-by: Scott Fleener <scott@buoyant.io>
the initial replay body, circa the usage of our "compatibility" layer
(4b53081, #3598), used to need an extra poll to confirm the absence of
trailers before it would report itself as reaching the end of the
stream. these tests were added in (afda8a7b3, #3583).
this was an artifact of how the compatibility middleware masked the
previous `poll_data()` and `poll_trailer()` methods behind a
forward-compatible `poll_frame()`- and `frame()`-oriented interface.
this commit removes these extra calls to `initial.frame().await`, now
that the initial body will report the end of stream without an extra
call to await a `None`.
X-ref: #3598
X-ref: #3583
Signed-off-by: katelyn martin <kate@buoyant.io>
This introduces a GitHub Copilot instructions file under .github to guide AI-driven code generation and updates the devcontainer configuration accordingly.
The new instructions enforce Rust styling, error handling, and tracing conventions across the project. It ensures generated code passes `cargo fmt` and `clippy`, avoids unwraps, and uses structured logging.
In 65db3dd we enabled overriding the behavior to export TLS hostnames for
outbound traffic, but we omitted TLS hostname labels.
This change updates the tls module to mirror the http module's behavior.
we use the `symbolic-common` and `symbolic-demangle` crates in our
dependency tree. these live in the same repo, here:
<https://github.com/getsentry/symbolic>
this commit introduces a "group" so that dependabot will upgrade them in
lockstep, rather than individually, such as in pull requests like
#3853, #3852, #3857, #3858, or #3860.
Signed-off-by: katelyn martin <kate@buoyant.io>