this commit removes a piece of code that has been commented out.
it additionally removes a variable binding that is not needed. `dst` is
not moved, so we do not need to bind the address of the destination
service to a variable, nor do we need to clone it.
Signed-off-by: katelyn martin <kate@buoyant.io>
Now that the `rustls` initialization/configuration has been decoupled from `linkerd-meshtls`, we can get the provider directly from there. This handles the uninitialized case better, which should be less of a problem now that we always directly initialize the provider in main.
Signed-off-by: Scott Fleener <scott@buoyant.io>
`NewBroadcastClassification<C, X, N>` is not used.
`BroadcastClassification<C, S>` is only used by the `gate` submodule in
this crate.
this commit removes `NewBroadcastClassification`, since it is unused.
this commit demotes `channel` to an internal submodule, since it has no
external users.
the reëxport of `BroadcastClassification` is unused, though it is left
intact because it _is_ exposed by `NewClassifyGateSet`'s implementation
of `NewService<T>`.
Signed-off-by: katelyn martin <kate@buoyant.io>
`linkerd_app_core::classify` reëxports symbols from
`linkerd_proxy_http::classify::gate`.
nothing makes use of this, and these symbols are already reëxported from
`linkerd_proxy_http::classify`. existing callsites in the outbound proxy
import this middleware directly, or through the reëxport in
`linkerd_proxy_http`.
this commit removes this `pub use` directive, since it does nothing.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(tls): Move rustls into dedicated crate
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Remove extraneous provider installs from tests
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Install default rustls provider in main
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Refactor shims out of meshtls
Meshtls previously assumed that multiple TLS implementations could be used. Now that we've consolidated on rustls as the TLS implementation, we can remove these shims.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Refactor mode out of meshtls
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit removes `linkerd_app_inbound::Inbound::proxy_metrics()`.
this accessor is not used anywhere.
Signed-off-by: katelyn martin <kate@buoyant.io>
* feat(tls): Explicitly include post-quantum key exchange algorithms
This explicitly sets the key exchange algorithms the proxy uses. It adds `X25519MLKEM768` as the most preferred algorithm in non-FIPS mode, and `SECP256R1MLKEM768` in FIPS mode.
Note that `X25519MLKEM768` is still probably appropriate for FIPS environments according to [NIST's special publication 800-56Cr2](https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Cr2.pdf) as it performs a FIPS-approved key-establishment first (`MLKEM768`), but we should evaluate this position more before committing to it.
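As a rough sketch of the intended preference order (group names as plain strings; the classical fallback groups shown in FIPS mode are an assumption for illustration, not taken from the actual provider configuration):

```rust
// Sketch of the key-exchange preference described above. Only the
// most-preferred hybrid groups come from this change; the classical
// fallbacks are illustrative assumptions.
fn kx_preference(fips: bool) -> Vec<&'static str> {
    if fips {
        // FIPS mode: prefer the hybrid P-256 + ML-KEM-768 group.
        vec!["SECP256R1MLKEM768", "secp256r1", "secp384r1"]
    } else {
        // Non-FIPS mode: prefer the hybrid X25519 + ML-KEM-768 group.
        vec!["X25519MLKEM768", "X25519", "secp256r1", "secp384r1"]
    }
}

fn main() {
    println!("{}", kx_preference(false)[0]);
    println!("{}", kx_preference(true)[0]);
}
```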
Signed-off-by: Scott Fleener <scott@buoyant.io>
this commit changes this comment in two ways:
1. fix a copy-paste typo. this should say "inbound", not "outbound".
2. add note that this is a "legacy" structure.
the equivalent structure in the outbound proxy was labeled as such
in https://github.com/linkerd/linkerd2-proxy/pull/2887.
see:
```rust
/// Holds LEGACY outbound proxy metrics.
#[derive(Clone, Debug)]
pub struct OutboundMetrics {
    pub(crate) http_errors: error::Http,
    pub(crate) tcp_errors: error::Tcp,
    // pub(crate) http_route_backends: RouteBackendMetrics,
    // pub(crate) grpc_route_backends: RouteBackendMetrics,
    /// Holds metrics that are common to both inbound and outbound proxies. These metrics are
    /// reported separately
    pub(crate) proxy: Proxy,
    pub(crate) prom: PromMetrics,
}
```
\- <dce6b61191/linkerd/app/outbound/src/metrics.rs (L22-L35)>
`authz::HttpAuthzMetrics`, `error::HttpErrorMetrics`,
`authz::TcpAuthzMetrics`, and `error::TcpErrorMetrics` all make use of
the "legacy" metrics implementation defined in `linkerd_metrics`.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(tls): Remove ring as crypto backend
The broader ecosystem has mostly moved to aws-lc-rs as the primary rustls backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* feat(tls): Remove boring as a TLS implementation
BoringSSL, as we use it today, doesn't integrate well with the broader rustls ecosystem, so this removes it. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Restore existing aws-lc feature names for compatibility
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(tls): Use correct feature name for fips conditionals
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
This adds a few small improvements to how we handle the `aws-lc` usage in the proxy:
- Pull provider customization to the `aws-lc` backend, reducing the amount that the module exposes
- Validate that the provider is actually FIPS compatible when fips is enabled
- Use the same signature verification algorithms in the `rustls` provider as we do in the cert verifier. Previously, the provider also included RSA_PSS_2048_8192_SHA256, which is marked as legacy and which we have no strong reason to support.
- Add change detector tests for the cipher suites, key exchange groups, and signature algorithms. These should ideally never change unless `rustls` changes, at which point we can re-evaluate which algorithms are in use by the proxy.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Remove ring as crypto backend
The broader ecosystem has mostly moved to aws-lc-rs as the primary rustls backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
There will need to be some refactoring to clean up the rustls provider interfaces, but that will come in follow-ups.
Signed-off-by: Scott Fleener <scott@buoyant.io>
* chore(tls): Restore existing aws-lc feature names for compatibility
Signed-off-by: Scott Fleener <scott@buoyant.io>
* fix(tls): Use correct feature name for fips conditionals
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: Scott Fleener <scott@buoyant.io>
`linkerd-metrics` contains a suite of facilities for defining,
registering, and serving Prometheus metrics. these predate the
[`prometheus-client`](https://crates.io/crates/prometheus-client/)
crate, which should now be used for our metrics.
`linkerd-metrics` defines a `prom` namespace, which reëxports symbols
from the `prometheus-client` library. as the documentation comment for
this submodule notes, this should be used for all new metrics.
6b323d8457/linkerd/metrics/src/lib.rs (L30-L60)
`linkerd-metrics` still provides its legacy types in the public surface
of this library today, which can make it difficult to differentiate
between our two metrics implementations.
this branch introduces a new `legacy` namespace, to help clarify the
distinction between these two Prometheus implementations, and to smooth
the road to further adoption of `prometheus-client` interfaces across
the proxy.
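the shape of this change, in miniature (module and type names simplified; this is an illustrative sketch, not the real crate layout):

```rust
mod counter {
    #[derive(Default)]
    pub struct Counter(pub u64);
}
mod gauge {
    #[derive(Default)]
    pub struct Gauge(pub i64);
}

// the new `legacy` namespace: the types still exist, but callsites now
// name them as `legacy::Counter`, making it obvious which of the two
// metrics implementations they depend on.
pub mod legacy {
    pub use super::{counter::Counter, gauge::Gauge};
}

fn main() {
    let c = legacy::Counter::default();
    println!("{}", c.0);
}
```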
---
* refactor(metrics): introduce empty `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Counter` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Gauge` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Histogram` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Metric` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtMetric` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtMetrics` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `FmtLabels` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `LastUpdate` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Store` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `SharedStore` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Serve` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `NewMetrics` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor: move `Factor` into `legacy` namespace
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
This includes a small set of metrics about the currently installed rustls crypto provider and the algorithms it is configured to use.
We don't have 100% assurance that a default crypto provider has been installed before registering the metric, but in local testing it never appeared to be a problem. When we refactor the rustls initialization we can add an extra guarantee that we've initialized it by this point.
Example metric:
```
# HELP rustls_info Proxy TLS info.
# TYPE rustls_info gauge
rustls_info{tls_suites="TLS13_AES_128_GCM_SHA256,TLS13_AES_256_GCM_SHA384,TLS13_CHACHA20_POLY1305_SHA256,",tls_kx_groups="X25519,secp256r1,secp384r1,X25519MLKEM768,",tls_rand="AwsLcRs",tls_key_provider="AwsLcRs",tls_fips="false"} 1
```
Signed-off-by: Scott Fleener <scott@buoyant.io>
We only build linux/amd64 during typical release CI runs. This means that the platform-
specific builds are not exercised. This change updates the release workflow so that
all platforms are built whenever the workflow itself is changed.
The broader ecosystem has mostly moved to `aws-lc-rs` as the primary `rustls` backend, and we should follow suit. This will also simplify the maintenance of the proxy's TLS implementation in the long term.
This requires some extra configuration for successful cross-compilation; ideally we can remove it once linkerd/dev v48 is available.
This doesn't remove `ring` as a crypto backend; that can come in a follow-up at https://github.com/linkerd/linkerd2-proxy/pull/4029
* build(deps): bump tokio from 1.45.0 to 1.47.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.45.0 to 1.47.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.45.0...tokio-1.47.0)
---
updated-dependencies:
- dependency-name: tokio
dependency-version: 1.47.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
---
chore(deny): ignore socket2@v0.5
there is now a v0.6 used by the latest tokio.
while we wait for this new version to propagate through the ecosystem,
allow for two socket2 dependencies.
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/integration): remove inbound_io_err test
> @cratelyn I think it would be appropriate to remove these tests, given
> that they can no longer behave properly. I don't think that this test
> case is particularly meaningful or load bearing, it's best just to
> unblock the dependency updates.
\- <https://github.com/BuoyantIO/enterprise-linkerd/issues/1645#issuecomment-3046905516>
Co-authored-by: Oliver Gould <ver@buoyant.io>
Signed-off-by: katelyn martin <kate@buoyant.io>
* chore(app/integration): remove inbound_multi test
this test exercises the same thing that the previous two tests do, as
the comment at the top of it points out.
this test is redundant with the i/o error coverage that we have just
removed. let's remove it.
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
This architecture has become too significant a maintenance burden, and isn't used often enough to justify the associated maintenance cost.
This removes arm/v7 from all the build infrastructure/dockerfiles/etc. Note that arm64 targets are still widely used and well supported.
Related: https://github.com/linkerd/linkerd2/pull/14308
Signed-off-by: Scott Fleener <scott@buoyant.io>
Currently, disabling the `ring` feature does not actually disable the dependency across the tree. Doing so requires a couple of tightly coupled steps:
- Making `ring` and `aws-lc` exclusive features, raising a compile error if they are both enabled.
- Removing a direct dependency on some `ring` types, and instead going through `rustls` for equivalent functionality.
- Removing a direct dependency on the `ring` crypto provider for integration tests, and instead using the provider from `linkerd-meshtls`.
- Installing the default crypto provider globally for the process and re-using it when requested, mostly to make the tests pass.
This was tested using a temporary `cargo deny` config that forbade `ring` when `aws-lc-rs` was used, and vice-versa. Note that this doesn't completely remove `ring` from dev dependencies, but that can be done as a follow-up.
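The first of these steps can be enforced at build time with a guard in the crate root; a minimal sketch (the feature names come from the list above, the wiring is illustrative):

```rust
// Reject builds that enable both TLS backends at once. With exactly one
// (or, here, neither) feature enabled, the guard compiles away to nothing.
#[cfg(all(feature = "ring", feature = "aws-lc"))]
compile_error!("features `ring` and `aws-lc` are mutually exclusive; enable exactly one");

// In a plain build with neither feature set, compilation proceeds normally.
const BACKENDS_CONSISTENT: bool = true;

fn main() {
    println!("{BACKENDS_CONSISTENT}");
}
```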
Signed-off-by: Scott Fleener <scott@buoyant.io>
This makes two changes to the preferred cipher suite order.
- Prefer AES algorithms over ChaCha20. AES is significantly faster when AES hardware is present, and AES hardware is on all x86 CPUs since ~2010, and all ARM server CPUs for a similar amount of time. For these reasons it's reasonable to default to AES for modern deployments, and it's the same default that `aws-lc-rs` makes anyway.
- Remove ChaCha20 when FIPS is enabled. It's no longer a supported algorithm, so we shouldn't have it as an option.
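The resulting preference order can be sketched as plain data (suite names as strings, mirroring the ordering described above rather than the actual provider API):

```rust
// Sketch of the new cipher-suite preference described above.
fn preferred_suites(fips: bool) -> Vec<&'static str> {
    let mut suites = vec![
        // AES first: hardware acceleration makes it the faster default.
        "TLS13_AES_128_GCM_SHA256",
        "TLS13_AES_256_GCM_SHA384",
        // ChaCha20 kept as a fallback for peers without AES hardware...
        "TLS13_CHACHA20_POLY1305_SHA256",
    ];
    if fips {
        // ...but dropped entirely in FIPS mode, where it is not approved.
        suites.retain(|s| !s.contains("CHACHA20"));
    }
    suites
}

fn main() {
    println!("{:?}", preferred_suites(true));
}
```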
Signed-off-by: Scott Fleener <scott@buoyant.io>
Auditing tools like Syft cannot inspect proxy dependencies, which makes it difficult to inspect the state of a binary. This change updates the release process to use cargo-auditable, which documents the proxy's crate dependencies in its release binary.
tikv-jemallocator supersedes jemallocator. To enable jemalloc profiling, this change updates the dependency and adds a `jemalloc-profiling` feature so that profiling can be enabled at build time.
We use the ubuntu-24.04 runner by default, but in forks this may not be appropriate. This change updates the runners to support overriding via the LINKERD2_PROXY_RUNNER variable.
* build(deps): bump the rustls group across 1 directory with 3 updates
Bumps the rustls group with 3 updates in the / directory: [rustls-webpki](https://github.com/rustls/webpki), [rustls](https://github.com/rustls/rustls) and [rustls-pki-types](https://github.com/rustls/pki-types).
Updates `rustls-webpki` from 0.103.1 to 0.103.2
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.1...v/0.103.2)
Updates `rustls` from 0.23.26 to 0.23.27
- [Release notes](https://github.com/rustls/rustls/releases)
- [Changelog](https://github.com/rustls/rustls/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rustls/rustls/compare/v/0.23.26...v/0.23.27)
Updates `rustls-pki-types` from 1.11.0 to 1.12.0
- [Release notes](https://github.com/rustls/pki-types/releases)
- [Commits](https://github.com/rustls/pki-types/compare/v/1.11.0...v/1.12.0)
---
updated-dependencies:
- dependency-name: rustls-webpki
dependency-version: 0.103.2
dependency-type: direct:production
update-type: version-update:semver-patch
dependency-group: rustls
- dependency-name: rustls
dependency-version: 0.23.27
dependency-type: indirect
update-type: version-update:semver-patch
dependency-group: rustls
- dependency-name: rustls-pki-types
dependency-version: 1.12.0
dependency-type: indirect
update-type: version-update:semver-minor
dependency-group: rustls
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix(rustls): Remove dependency on most rustls internal types
We only used these types for generating a ClientHello message for testing. Instead, we can manually encode a sample message based on the TLS spec.
Signed-off-by: Scott Fleener <scott@buoyant.io>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Scott Fleener <scott@buoyant.io>
* chore(app/env): fix typo
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix(app/env): a lower default maximum per-host connection limit
see also:
* #4004
* linkerd/linkerd2#14204
in #4004 we fixed an issue related to our HTTP/1.1 client's connection
pool.
this further hedges against future issues related to our HTTP client
exhausting resources available to its container. today, the limit by
default is `usize::MAX`, which is dramatically higher than the practical
limit.
this commit changes the limit for outbound idle connections per-host to
10,000.
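a small sketch of the new default (the environment variable name here is hypothetical; only the 10,000 cap and the old `usize::MAX` default come from this change):

```rust
// Hypothetical helper: read the per-host idle-connection limit from the
// environment, defaulting to the new cap of 10,000 rather than usize::MAX.
fn max_idle_connections_per_host() -> usize {
    std::env::var("OUTBOUND_MAX_IDLE_CONNS_PER_HOST") // hypothetical name
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(10_000)
}

fn main() {
    println!("{}", max_idle_connections_per_host());
}
```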
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
When constructing the HTTP/1 client, we configure connection pooling, but
notably do not provide a timer implementation to Hyper. This causes hyper's
connection pool to be configured without idle timeouts, which may lead to
resource leaks, especially for clients that communicate with many virtual hosts.
This change updates the HTTP/1 client builder to use a Tokio timer, which allows
Hyper to manage idle timeouts correctly.
This is a strong ciphersuite that's reasonable to include as a supported option. We still prefer CHACHA20_POLY1305 in non-FIPS modes for its speed, as well as keeping CHACHA20_POLY1305 as a backup for older proxies that only support it.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
this is based on #3987.
in #3987 (_see https://github.com/linkerd/linkerd2/issues/13821_) we discovered that some of the types that implement [`FmtLabels`](085be9978d/linkerd/metrics/src/fmt.rs (L5)) could collide when used in registry keys; i.e., they might emit identical label sets, but distinct `Hash` values.
#3987 solves two bugs. this pull request proposes a follow-on change, introducing _exhaustive_ bindings to implementations of `FmtLabels`, to prevent this category of bug from recurring in the future.
this change means that the introduction of an additional field to any of these label structures, e.g. `OutboundEndpointLabels` or `HTTPLocalRateLimitLabels`, will cause a compilation error unless said new field is handled in the corresponding `FmtLabels` implementation.
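in miniature, the exhaustive-binding pattern looks like this (with a simplified, hypothetical label struct rather than the real ones):

```rust
use std::fmt;

// A simplified, hypothetical label struct.
struct EndpointLabels {
    authority: Option<String>,
    target_addr: String,
}

impl EndpointLabels {
    fn fmt_labels(&self, f: &mut impl fmt::Write) -> fmt::Result {
        // Exhaustive binding: if a new field is added to the struct, this
        // pattern stops compiling until the field is handled (or explicitly
        // ignored) here.
        let Self { authority, target_addr } = self;
        if let Some(authority) = authority {
            write!(f, "authority=\"{authority}\",")?;
        }
        write!(f, "target_addr=\"{target_addr}\"")
    }
}

fn main() {
    let labels = EndpointLabels {
        authority: Some("web.example.com".to_string()),
        target_addr: "10.0.0.1:8080".to_string(),
    };
    let mut out = String::new();
    labels.fmt_labels(&mut out).unwrap();
    println!("{out}");
}
```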
### 🔖 a note
in writing this pull request, i noticed one label that i believe is unintentionally being elided. i've refrained from changing behavior in this pull request. i do note it though, as an example of this syntax identifying the category of bug i hope to hedge against here.
---
* fix: do not key transport metrics registry on `ClientTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: do not key transport metrics registry on `ServerTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(transport-metrics): exhaustive `Eos: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `TlsAccept: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `TargetAddr: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(metrics): exhaustive `Label: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(http/metrics): exhaustive `Status: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ControlLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ProfileRouteLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `InboundEndpointLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerLabel: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `ServerAuthzLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `RouteLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `RouteAuthzLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `OutboundEndpointLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `Authority: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/core): exhaustive `StackLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): exhaustive `HTTPLocalRateLimitLabels: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* refactor(app/inbound): exhaustive `Key<L>: FmtLabels`
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(metrics): remove redundant banner comment
these impl blocks are all `FmtLabels`, following another series of the
same, above. we don't need another one of these comments.
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(metrics): exhaustive `AndThen: FmtMetrics`
Signed-off-by: katelyn martin <kate@buoyant.io>
* nit(app/core): note unused label
see #3262 (618838ec7), which introduced this label.
to preserve behavior, this label remains unused.
X-Ref: #3262
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
The inbound and outbound connect backoffs currently max out at 500ms. This is very aggressive in practice, especially when an endpoint remains unavailable.
This change increases the maximum backoff durations:
* inbound: 10s
* outbound: 60s
The default minimum backoff durations remain unchanged at 100ms so that failed
connections are retried quickly. This change only increases the default _maximum_ backoff so that the timeout increases substantially when an endpoint is unavailable for a longer period of time.
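The resulting schedule can be sketched with plain `Duration` arithmetic (the doubling factor is illustrative, and a real backoff implementation would typically also apply jitter, which is omitted here):

```rust
use std::time::Duration;

// Exponential backoff clamped to a maximum: 100ms, 200ms, 400ms, ... max.
fn backoff(attempt: u32, min: Duration, max: Duration) -> Duration {
    min.checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(max)
        .min(max)
}

fn main() {
    let min = Duration::from_millis(100);
    let outbound_max = Duration::from_secs(60);
    for attempt in [0, 3, 20] {
        println!("{:?}", backoff(attempt, min, outbound_max));
    }
}
```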
### 🖼️ background
the linkerd2 proxy implements, registers, and exports Prometheus metrics using a variety of systems, for historical reasons. new metrics broadly rely upon the official [`prometheus-client`](https://github.com/prometheus/client_rust/) library, whose interfaces are reexported for internal consumption in the [`linkerd_metrics::prom`](https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/metrics/src/lib.rs#L30-L60) namespace.
other metrics predate this library however, and rely on the metrics registry implemented in the workspace's [`linkerd-metrics`](https://github.com/linkerd/linkerd2-proxy/tree/main/linkerd/metrics) library.
### 🐛 bug report
* https://github.com/linkerd/linkerd2/issues/13821 reported a bug in which duplicate metrics could be observed and subsequently dropped by Prometheus when upgrading the control plane via helm with an existing workload running.
### 🦋 reproduction example
for posterity, i'll note the reproduction steps here.
i used these steps to identify the `2025.3.2` edge release as the affected release. upgrading from `2025.2.3` to `2025.3.1` did not exhibit this behavior. see below for more discussion about the cause.
generate certificates via <https://linkerd.io/2.18/tasks/generate-certificates/>
using these two deployments, courtesy of @GTRekter:
<details>
<summary>**💾 click to expand: app deployment**</summary>
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: simple-app
  annotations:
    linkerd.io/inject: enabled
---
apiVersion: v1
kind: Service
metadata:
  name: simple-app-v1
  namespace: simple-app
spec:
  selector:
    app: simple-app-v1
    version: v1
  ports:
    - port: 80
      targetPort: 5678
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simple-app-v1
  namespace: simple-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: simple-app-v1
      version: v1
  template:
    metadata:
      labels:
        app: simple-app-v1
        version: v1
    spec:
      containers:
        - name: http-app
          image: hashicorp/http-echo:latest
          args:
            - "-text=Simple App v1"
          ports:
            - containerPort: 5678
```
</details>
<details>
<summary>**🤠 click to expand: client deployment**</summary>
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: traffic
  namespace: simple-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traffic
  template:
    metadata:
      labels:
        app: traffic
    spec:
      containers:
        - name: traffic
          image: curlimages/curl:latest
          command:
            - /bin/sh
            - -c
            - |
              while true; do
                TIMESTAMP_SEND=$(date '+%Y-%m-%d %H:%M:%S')
                PAYLOAD="{\"timestamp\":\"$TIMESTAMP_SEND\",\"test_id\":\"sniff_me\",\"message\":\"hello-world\"}"
                echo "$TIMESTAMP_SEND - Sending payload: $PAYLOAD"
                RESPONSE=$(curl -s -X POST \
                  -H "Content-Type: application/json" \
                  -d "$PAYLOAD" \
                  http://simple-app-v1.simple-app.svc.cluster.local:80)
                TIMESTAMP_RESPONSE=$(date '+%Y-%m-%d %H:%M:%S')
                echo "$TIMESTAMP_RESPONSE - RESPONSE: $RESPONSE"
                sleep 1
              done
```
</details>
and this prometheus configuration:
<details>
<summary>**🔥 click to expand: prometheus configuration**</summary>
```yaml
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'pod'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:4191']
        labels:
          group: 'traffic'
</details>
we will perform the following steps:
```sh
# install the edge release
# specify the versions we'll migrate between.
export FROM="2025.3.1"
export TO="2025.3.2"
# create a cluster, and add the helm charts.
kind create cluster
helm repo add linkerd-edge https://helm.linkerd.io/edge
# install linkerd's crd's and control plane.
helm install linkerd-crds linkerd-edge/linkerd-crds \
  -n linkerd --create-namespace --version $FROM
helm install linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=cert/ca.crt \
  --set-file identity.issuer.tls.crtPEM=cert/issuer.crt \
  --set-file identity.issuer.tls.keyPEM=cert/issuer.key \
  --version $FROM \
  linkerd-edge/linkerd-control-plane
# install a simple app and a client to drive traffic.
kubectl apply -f duplicate-metrics-simple-app.yml
kubectl apply -f duplicate-metrics-traffic.yml
# bind the traffic pod's metrics port to the host.
kubectl port-forward -n simple-app deploy/traffic 4191
# start prometheus, begin scraping metrics
prometheus --config.file=prometheus.yml
```
now, open a browser and query `irate(request_total[1m])`.
next, upgrade the control plane:
```
helm upgrade linkerd-crds linkerd-edge/linkerd-crds \
  -n linkerd --create-namespace --version $TO
helm upgrade linkerd-control-plane \
  -n linkerd \
  --set-file identityTrustAnchorsPEM=cert/ca.crt \
  --set-file identity.issuer.tls.crtPEM=cert/issuer.crt \
  --set-file identity.issuer.tls.keyPEM=cert/issuer.key \
  --version $TO \
  linkerd-edge/linkerd-control-plane
```
prometheus will begin emitting warnings regarding 34 time series being dropped.
in your browser, querying `irate(request_total[1m])` once more will show that
the rate of requests has stopped, due to the new time series being dropped.
next, restart the workloads...
```
kubectl rollout restart deployment -n simple-app simple-app-v1 traffic
```
prometheus warnings will go away, as reported in linkerd/linkerd2#13821.
### 🔍 related changes
* https://github.com/linkerd/linkerd2/pull/13699
* https://github.com/linkerd/linkerd2/pull/13715
in linkerd/linkerd2#13715 and linkerd/linkerd2#13699, we made some changes to the destination controller. from the "Cautions" section of the `2025.3.2` edge release:
> Additionally, this release changes the default for `outbound-transport-mode`
> to `transport-header`, which will result in all traffic between meshed
> proxies flowing on port 4143, rather than using the original destination
> port.
linkerd/linkerd2#13699 (_included in `edge-25.3.1`_) introduced this outbound transport-protocol configuration surface, but maintained the default behavior, while linkerd/linkerd2#13715 (_included in `edge-25.3.2`_) altered the default behavior to route meshed traffic via port 4143.
this is a visible change in behavior that can be observed when upgrading the mesh from a version that preceded this change. this means that when upgrading across `edge-25.3.2`, such as from the `2025.2.1` to `2025.3.2` versions of the helm charts, or from the `2025.2.3` to the `2025.3.4` versions of the helm charts (_reported upstream in linkerd/linkerd2#13821_), the freshly upgraded destination controller pods will begin routing meshed traffic differently.
i'll state explicitly, _that_ is not a bug! it is, however, an important clue to bear in mind: data plane pods that were started with the previous control plane version, and continue running after the control plane upgrade, will have seen both routing patterns. reporting a duplicate time series for affected metrics indicates that there is a hashing collision in our metrics system.
### 🐛 the bug(s)
we define a collection of structures to model labels for inbound and outbound endpoints'
metrics:
```rust
// linkerd/app/core/src/metrics.rs
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum EndpointLabels {
    Inbound(InboundEndpointLabels),
    Outbound(OutboundEndpointLabels),
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct InboundEndpointLabels {
    pub tls: tls::ConditionalServerTls,
    pub authority: Option<http::uri::Authority>,
    pub target_addr: SocketAddr,
    pub policy: RouteAuthzLabels,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct OutboundEndpointLabels {
    pub server_id: tls::ConditionalClientTls,
    pub authority: Option<http::uri::Authority>,
    pub labels: Option<String>,
    pub zone_locality: OutboundZoneLocality,
    pub target_addr: SocketAddr,
}
```
\- <https://github.com/linkerd/linkerd2-proxy/blob/main/linkerd/app/core/src/metrics.rs>
pay particular attention to the derived `Hash` implementations. note the `tls::ConditionalClientTls` and `tls::ConditionalServerTls` types used in each of these labels. these are used by some of our types like `TlsConnect` to emit prometheus labels, using our legacy system's `FmtLabels` trait:
```rust
// linkerd/app/core/src/transport/labels.rs
impl FmtLabels for TlsConnect<'_> {
    fn fmt_labels(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            Conditional::None(tls::NoClientTls::Disabled) => {
                write!(f, "tls=\"disabled\"")
            }
            Conditional::None(why) => {
                write!(f, "tls=\"no_identity\",no_tls_reason=\"{}\"", why)
            }
            Conditional::Some(tls::ClientTls { server_id, .. }) => {
                write!(f, "tls=\"true\",server_id=\"{}\"", server_id)
            }
        }
    }
}
```
\- <99316f7898/linkerd/app/core/src/transport/labels.rs (L151-L165)>
note the `ClientTls` case, which ignores fields in the client tls information:
```rust
// linkerd/tls/src/client.rs
/// A stack parameter that configures a `Client` to establish a TLS connection.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub struct ClientTls {
    pub server_name: ServerName,
    pub server_id: ServerId,
    pub alpn: Option<AlpnProtocols>,
}
```
\- <99316f7898/linkerd/tls/src/client.rs (L20-L26)>
this means that there is potential for an identical set of labels to be emitted given two `ClientTls` structures with distinct server names or ALPN protocols. for brevity, i'll elide the equivalent issue with `ServerTls`, and its corresponding `TlsAccept<'_>` label implementation, though it exhibits the same issue.
### 🔨 the fix
this pull request introduces two new types: `ClientTlsLabels` and `ServerTlsLabels`. these continue to implement `Hash`, for use as a key in our metrics registry, and for use in formatting labels.
`ClientTlsLabels` and `ServerTlsLabels` each resemble `ClientTls` and `ServerTls`, respectively, but do not contain any fields that are elided in label formatting, to prevent duplicate metrics from being emitted.
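in miniature, with stand-in types rather than the real `tls` module, the fix looks like:

```rust
use std::collections::HashMap;

// Stand-in for the real type: the full connection parameter...
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ClientTls {
    server_name: String,
    server_id: String,
    alpn: Option<Vec<String>>,
}

// ...and the new label type, carrying only the fields that are actually
// formatted, so identical label sets always hash to the same registry key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ClientTlsLabels {
    server_id: String,
}

impl From<&ClientTls> for ClientTlsLabels {
    fn from(tls: &ClientTls) -> Self {
        let ClientTls { server_id, server_name: _, alpn: _ } = tls;
        Self { server_id: server_id.clone() }
    }
}

fn main() {
    // Two connections that differ only in ALPN emit identical labels...
    let a = ClientTls { server_name: "web".into(), server_id: "web".into(), alpn: None };
    let b = ClientTls { alpn: Some(vec!["h2".into()]), ..a.clone() };
    assert_ne!(a, b);

    // ...so keying the registry on the label type collapses them into one
    // time series instead of two colliding ones.
    let mut registry: HashMap<ClientTlsLabels, u64> = HashMap::new();
    *registry.entry(ClientTlsLabels::from(&a)).or_default() += 1;
    *registry.entry(ClientTlsLabels::from(&b)).or_default() += 1;
    println!("{}", registry.len());
}
```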
relatedly, #3988 audits our existing `FmtLabels` implementations and makes use of exhaustive bindings, to prevent this category of problem in the short-term future. ideally, we might eventually consider replacing the metrics interfaces in `linkerd-metrics`, but that is strictly kept out-of-scope for the purposes of this particular fix.
---
* fix: do not key transport metrics registry on `ClientTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
* fix: do not key transport metrics registry on `ServerTls`
Signed-off-by: katelyn martin <kate@buoyant.io>
---------
Signed-off-by: katelyn martin <kate@buoyant.io>
This change does two things:
- adds support for `NamedPipes` to our SPIRE client. This will allow the client to connect to spire agents running on Windows hosts
- renames `LINKERD2_PROXY_IDENTITY_SPIRE_SOCKET` to `LINKERD2_PROXY_IDENTITY_SPIRE_WORKLOAD_API_ADDRESS` and deprecates the former.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>