The `linkerd-meshtls-boring` crate currently uses a Git dependency on
`boring` and `tokio-boring`. This is because, when this crate was
initially introduced, the proxy required unreleased changes to these
crates. Now, however, upstream has published all the changes we depended
on (this happened ages ago), and we can depend on these libraries from
crates.io.
This branch removes the Git deps and updates to v3.0.0 of
`boring`/`tokio-boring`. I've also changed the `cargo-deny` settings to
no longer allow Git deps on these crates, as we no longer depend on them
from Git.
In 2.13, the default inbound and outbound HTTP request queue capacity
decreased from 10,000 requests to 100 requests (in PR #2078). This
change causes proxies to shed load much more aggressively while under
high load to a single destination service, resulting in increased error
rates compared to 2.12 (see linkerd/linkerd2#11055 for
details).
This commit changes the default HTTP request queue capacities for the
inbound and outbound proxies back to 10,000 requests, the way they were
in 2.12 and earlier. In manual load testing I've verified that
increasing the queue capacity results in a substantial decrease in 503
Service Unavailable errors emitted by the proxy: with a queue capacity
of 100 requests, the load test described [here] observed a failure rate
of 51.51%, while with a queue capacity of 10,000 requests the same load
test observed no failures.
Note that I did not modify the TCP connection queue capacities, or the
control plane request queue capacity. These were previously configured
by the same variable before #2078, but were split out into separate vars
in that change. I don't think the queue capacity limits for TCP
connection establishment or for control plane requests are currently
resulting in instability the way the decreased request queue capacity
is, so I decided to make a more focused change to just the HTTP request
queues for the proxies.
[here]: https://github.com/linkerd/linkerd2/issues/11055#issuecomment-1650957357
The proxy currently emits very little useful version information.
This change updates the proxy to support new build-time environment
variables that are used to report version information:
* LINKERD2_PROXY_BUILD_TIME
* LINKERD2_PROXY_VENDOR
* LINKERD2_PROXY_VERSION
Additionally, several pre-existing Git-oriented metadata fields have been
removed, as they were generally redundant or uninformative. The Rustc
version has also been removed, since it has no real user-facing value
and can easily be determined from the version/tag.
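For illustration, here is a minimal sketch of how such build-time variables might be surfaced, assuming they are captured at compile time with `option_env!`; the struct and field names are illustrative, not the proxy's actual types:

```rust
/// Build-time version metadata (sketch; field names are illustrative).
#[derive(Clone, Copy, Debug)]
pub struct BuildInfo {
    pub build_time: &'static str,
    pub vendor: &'static str,
    pub version: &'static str,
}

/// Captured at compile time from the build environment; falls back to "unknown"
/// when a variable is not set.
pub const BUILD_INFO: BuildInfo = BuildInfo {
    build_time: match option_env!("LINKERD2_PROXY_BUILD_TIME") {
        Some(v) => v,
        None => "unknown",
    },
    vendor: match option_env!("LINKERD2_PROXY_VENDOR") {
        Some(v) => v,
        None => "unknown",
    },
    version: match option_env!("LINKERD2_PROXY_VERSION") {
        Some(v) => v,
        None => "unknown",
    },
};
```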
When the outbound proxy resolves an outbound policy from the policy
controller's `OutboundPolicies` API, the policy controller may return an
error with the `grpc-status` code `NotFound` in order to indicate that
the destination is not a ClusterIP service. When this occurs, the proxy
will fall back to either using a ServiceProfile, if the ServiceProfile
contains non-trivial configuration, or synthesizing a default client
policy from the ServiceProfile.
However, when the outbound proxy is configured to run in ingress mode,
the fallback behavior does not occur. Instead, the ingress mode proxy
treats any error returned by the policy controller's `OutboundPolicies`
API as fatal. This means that when an ingress controller performs its
own load-balancing and opens a connection to a pod IP directly, the
ingress mode proxy will fail any requests on that connection. This is a
bug, and is the cause of the issues described in linkerd/linkerd2#10908.
This branch fixes this by changing the ingress mode proxy to handle
`NotFound` errors returned by the policy controller. I've added similar
logic for synthesizing default policies from a discovered
ServiceProfile, or using the profile if it's non-trivial. Unfortunately,
we can't just reuse the existing `Outbound::resolver` method, as ingress
discovery may be performed for an original destination address *or* for
a DNS name, and it's necessary to construct fallback policies in either
case. Instead, I've added a new function with similar behavior that's
ingress-specific.
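A minimal sketch of the fallback decision, assuming a `tonic`-style `Status` error; the `or_fallback` helper and its shape are illustrative rather than the actual proxy code:

```rust
use tonic::{Code, Status};

/// Fall back to a synthesized policy only when the policy controller returns
/// `NotFound` (i.e. the destination is not a ClusterIP service); all other
/// errors remain fatal.
fn or_fallback<P>(result: Result<P, Status>, fallback: impl FnOnce() -> P) -> Result<P, Status> {
    match result {
        Ok(policy) => Ok(policy),
        Err(status) if status.code() == Code::NotFound => Ok(fallback()),
        Err(status) => Err(status),
    }
}
```

In the ingress-mode case, `fallback()` would either reuse a non-trivial ServiceProfile or synthesize a default client policy, as described above.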
I've manually tested this change against the repro steps[^1] described
in linkerd/linkerd2#10908, and verified that the proxy 503s on 2.13.4,
and that it once again routes correctly after applying this change.
Fixes linkerd/linkerd2#10908.
[^1]: As described in the first comment, using Contour and podinfo.
PRs #2418 and #2419 add per-route and per-backend request timeouts
configured by the `OutboundPolicies` API to the `MatchedRoute` and
`MatchedBackend` layers in the outbound `ClientPolicy` stack,
respectively. This means that — unlike in the `ServiceProfile` stack —
two separate request timeouts can be configured in `ClientPolicy`
stacks. However, because both the `MatchedRoute` and `MatchedBackend`
layers are in the HTTP logical stack, the errors emitted by both
timeouts will have a `LogicalError` as their most specific error
metadata, meaning that the log messages and `l5d-proxy-error` headers
recorded for these timeouts do not indicate whether the timeout that
failed the request was the route request timeout or the backend request
timeout.
In order to ensure this information is recorded and exposed to the user,
this branch adds two new error wrapper types, one of which enriches an
error with a `RouteRef`'s metadata, and one of which enriches an error
with a `BackendRef`'s metadata. The `MatchedRoute` stack now wraps all
errors with `RouteRef` metadata, and the `MatchedBackend` stack wraps
errors with `BackendRef` metadata. This way, when the route timeout
fails a request, the error will include the route metadata, while when
the backend request timeout fails a request, the error will include both
the route and backend metadata.
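As a rough illustration, the wrappers can be thought of as `thiserror`-style types that prepend resource metadata to the error chain. This is a hedged sketch with `String` standing in for the actual `RouteRef`/`BackendRef` metadata types:

```rust
type Error = Box<dyn std::error::Error + Send + Sync>;

/// Enriches an error with route metadata (sketch; `String` stands in for `RouteRef`).
#[derive(Debug, thiserror::Error)]
#[error("route {route}: {source}")]
pub struct RouteError {
    route: String,
    #[source]
    source: Error,
}

/// Enriches an error with backend metadata (sketch; `String` stands in for `BackendRef`).
#[derive(Debug, thiserror::Error)]
#[error("backend {backend}: {source}")]
pub struct BackendError {
    backend: String,
    #[source]
    source: Error,
}
```

Because `MatchedBackend` sits below `MatchedRoute`, a backend-level failure ends up wrapped by both types, which is what produces the nested `route ...: backend ...:` prefixes in the examples below.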
Adding these new error wrappers also has the side benefit of attaching
this metadata to errors returned by filters, allowing users to
distinguish between errors emitted by a filter on a route rule and
errors emitted by a per-backend filter. Also, any other errors emitted
lower in the stack for requests that are handled by a client policy
stack will now also include this metadata, which seems generally useful.
Example errors, taken from a proxy unit test:
backend request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: backend service.test.test-svc:666: HTTP response timeout after 1s
```
route request:
```
logical service logical.test.svc.cluster.local:666: route httproute.test.timeout-route: HTTP response timeout after 2s
```
Depends on #2418
The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
PR #2418 updates the proxy to depend on the new proxy-api release, and
implements the `Rule.request_timeout` field added to the API. However,
that branch does *not* add a timeout for the
`RouteBackend.request_timeout` field. This branch changes the proxy to
apply the backend request timeout when configured by the policy
controller.
This branch implements `RouteBackend.request_timeout` by adding an
additional timeout layer in the `MatchedBackend` stack. This applies the
per-backend timeout once a backend is selected for a route. I've also
added stack tests for the interaction between the request and backend
request timeouts.
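To illustrate the layering (not the proxy's actual stack modules), here is a hedged sketch using tower's generic `TimeoutLayer`: the route-level timeout wraps backend selection, while the backend-level timeout wraps only the selected backend's client.

```rust
use std::{convert::Infallible, time::Duration};
use tower::{service_fn, timeout::TimeoutLayer, ServiceBuilder};

fn main() {
    // Stand-in for a per-backend client service (hypothetical).
    let client = service_fn(|_req: ()| async { Ok::<_, Infallible>(()) });

    // With `ServiceBuilder`, the first layer added is outermost.
    let _svc = ServiceBuilder::new()
        .layer(TimeoutLayer::new(Duration::from_secs(2))) // Rule.request_timeout (MatchedRoute)
        .layer(TimeoutLayer::new(Duration::from_secs(1))) // RouteBackend.request_timeout (MatchedBackend)
        .service(client);
}
```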
Note that once retries are added to client policy stacks, it may be
necessary to move the backend request timeout to ensure it occurs
"below" retries, depending on where the retry middleware ends up being
located in the proxy stack.
The latest proxy-api release, v0.10.0, adds fields to the
`OutboundPolicies` API for configuring HTTP request timeouts, based on
the proposed changes to HTTPRoute in kubernetes-sigs/gateway-api#1997.
This branch updates the proxy-api dependency to v0.10.0 and adds the new
timeout configuration fields to the proxy's internal client policy
types. In addition, this branch adds a timeout middleware to the HTTP
client policy stack, so that the timeout described by the
`Rule.request_timeout` field is now applied.
Implementing the `RouteBackend.request_timeout` field with semantics as
close as possible to those described in GEP-1742 will be somewhat more
complex, and will be added in a separate PR.
The gRPC protocol always sets the HTTP response status code to 200 and instead communicates failures in a `grpc-status` header sent in a TRAILERS frame. Linkerd uses the HTTP response status code to determine whether a response is successful, and therefore considers all gRPC responses successful regardless of their gRPC status code. This means that functionality such as retries and circuit breaking does not function correctly with gRPC traffic.
We update the HTTP classifier to look for the presence of a `Content-Type: application/grpc` header and use gRPC response classification when it is set.
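A minimal sketch of that check, assuming the `http` crate's request type (the helper name is illustrative):

```rust
fn is_grpc<B>(req: &http::Request<B>) -> bool {
    req.headers()
        .get(http::header::CONTENT_TYPE)
        .and_then(|value| value.to_str().ok())
        // Also matches suffixed content types such as `application/grpc+proto`.
        .map(|ct| ct.starts_with("application/grpc"))
        .unwrap_or(false)
}
```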
Signed-off-by: Alex Leong <alex@buoyant.io>
In the most recent stable versions, pods cannot communicate with themselves when using a ClusterIP. While direct (pod-to-pod) connections are never sent through the proxy and are skipped at the iptables level, connections to a logical service still pass through the proxy. When the chosen endpoint is the same as the source of the traffic, TLS and H2 upgrades should be skipped.
Every endpoint receives an h2 upgrade hint in its metadata. When looking into the problem, I noticed that client settings do not take into account that the target may be local. When deciding what client settings to use, we do not upgrade the connection when the hint is "unknown" (gatewayed connections) or "opaque". This change does something similar: it uses H1 settings when the protocol is H1 and the target IP is among the inbound IPs passed to the proxy.
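A hedged sketch of the added check (names are illustrative): the H2 upgrade is only applied when the hint allows it and the chosen endpoint is not one of the proxy's own inbound IPs.

```rust
use std::{collections::HashSet, net::IpAddr};

/// Returns true when the connection may be upgraded to H2 (sketch, not the proxy's code).
fn may_upgrade(endpoint_ip: IpAddr, hint_allows_h2: bool, inbound_ips: &HashSet<IpAddr>) -> bool {
    // A local target (traffic to self) keeps its original H1 settings.
    hint_allows_h2 && !inbound_ips.contains(&endpoint_ip)
}
```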
Fixes linkerd/linkerd2#10816
Signed-off-by: Matei David <matei@buoyant.io>
The W3C context propagation uses the wrong span ID right now. That
causes all spans emitted by linkerd-proxy to be siblings rather than
children of their original parent.
This only applies to W3C as far as I can tell, because the B3
propagation uses the span ID correctly.
Signed-off-by: Willi Schönborn <w.schoenborn@gmail.com>
The proc-macro ecosystem is in the middle of a migration from `syn` v1
to `syn` v2. Some crates (such as `tokio-macros`, `async-trait`,
`tracing-attributes`, etc) have been updated to v2, while others haven't
yet. This means that `cargo deny` will not currently permit us to update
some of those crates to versions that depend on `syn` v2, because they
will create a duplicate dependency.
Since `syn` is used by proc-macros (executed at compile time), duplicate
versions won't have an impact on the final binary size. Therefore, it's
fine to allow both v1 and v2 to coexist while the ecosystem is still
being gradually migrated to the new version.
If the policy controller is from a Linkerd version earlier than 2.13.x,
it will return the `Unimplemented` gRPC status code for requests to the
`OutboundPolicies` API. The proxy's outbound policy client will
currently retry this error code, rather than synthesizing a default
policy. Since 2.13.x proxies require an `OutboundPolicy` to be
discovered before handling outbound traffic, this means that 2.13.x
proxies cannot handle outbound connections when the control plane
is on an earlier version. Therefore, installing Linkerd 2.13 and then
downgrading to 2.12 can potentially break the data plane's ability to
route traffic.
In order to support downgrade scenarios, the proxy should also
synthesize a default policy when receiving an `Unimplemented` gRPC
status code from the policy controller. This branch changes the proxy to
do that. A warning is logged which indicates that the control plane
version is older than the proxy's.
The proxy injector populates an environment variable,
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION`, with a list
of all ports marked as opaque. Currently, however, the proxy _does not
actually use this environment variable_. Instead, opaque ports are
discovered from the policy controller. The opaque ports environment
variable was used only when running in the "fixed" inbound policy mode,
where all inbound policies are determined from environment variables,
and no policy controller address is provided. This mode is no longer
supported, and the policy controller address is now required, so the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable is not currently used to discover inbound opaque ports.
There are two issues with the current state of things. One is that
inbound policy discovery is _non-blocking_: when an inbound proxy
receives a connection on a port that it has not previously discovered a
policy for, it uses the default policy until it has successfully
discovered a policy for that port from the policy controller. This means
that the proxy may perform protocol detection on the first connection to
an opaque port. This isn't great, as it may result in a protocol
detection timeout error on a port that the user had previously marked as
opaque. It would be preferable for the proxy to read the environment
variable, and use it to determine whether the default policy for a port
is opaque, so that ports marked as opaque disable protocol detection
even before the "actual" policy is discovered.
The other issue with the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable is that it is currently a list of _individual port numbers_,
while the proxy injector can accept annotations that specify _ranges_ of
opaque ports. This means that when a very large number of ports are
marked as opaque, the proxy manifest must contain a list of each
individual port number in those ranges, making it potentially quite
large. See linkerd/linkerd2#9803 for details on this issue.
This branch addresses both of these problems. The proxy is changed so
that it will once again read the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable, and use it to determine which ports should have opaque
policies by default. The parsing of the environment variable is changed
to support specifying ports as a list of ranges, rather than a list of
individual port numbers. Along with a proxy-injector change, this would
resolve the manifest size issue described in linkerd/linkerd2#9803.
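A hedged sketch of the ranged parsing and the default-policy lookup (hypothetical helpers, not the proxy's actual parser):

```rust
use std::{num::ParseIntError, ops::RangeInclusive};

/// Parses a ranged port list such as "25,3000-4000,9999" (illustrative helper).
fn parse_port_ranges(s: &str) -> Result<Vec<RangeInclusive<u16>>, ParseIntError> {
    s.split(',')
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(|p| match p.split_once('-') {
            Some((lo, hi)) => Ok(lo.trim().parse::<u16>()?..=hi.trim().parse::<u16>()?),
            None => {
                let port: u16 = p.parse()?;
                Ok(port..=port)
            }
        })
        .collect()
}

/// Whether a port should default to an opaque policy before discovery completes.
fn is_opaque_by_default(port: u16, ranges: &[RangeInclusive<u16>]) -> bool {
    ranges.iter().any(|r| r.contains(&port))
}
```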
This is implemented by changing the `inbound::policy::Store` type to
also include a set of port ranges that are marked as opaque. When the
`Store` handles a `get_policy` call for a port that is not already in
the cache, it starts a control plane watch for that port just as it did
previously. However, when determining the initial _default_ value for
the policy, before the control plane discovery provides one, it checks
whether the port is in a range that is marked as opaque, and, if it is,
uses an opaque default policy instead.
This approach was chosen rather than pre-populating the `Store` with
policies for all opaque ports to better handle the case where very large
ranges are marked as opaque and are used infrequently. If the `Store`
was pre-populated with default policies for all such ports, it would
essentially behave as though all ports in
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` were also in
`LINKERD2_PROXY_INBOUND_PORTS`, and the proxy would immediately start a
policy controller discovery watch for all opaque ports, which would be
kept open for the proxy's entire lifetime. In cases where the opaque
port ranges include tens of thousands of ports, this would cause
significant unnecessary load on the policy controller. Storing opaque
port ranges separately and using them to determine the default policy as
needed allows opaque ports to be treated like other non-default ports:
their policies are discovered as needed and can be evicted from the cache if they
are unused. If a port is in both
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` *and*
`LINKERD2_PROXY_INBOUND_PORTS`, the proxy will start discovery eagerly
and retain the port in the cache forever, but the default policy will be
opaque.
I've also added a test for the behavior of opaque ports where the port's
policy has not been discovered from the policy controller. That test
fails on `main`, as the proxy attempts protocol detection, but passes on
this branch.
In addition, I changed the parsing of the `LINKERD2_PROXY_INBOUND_PORTS`
environment variable to also accept ranges, because it seemed like a
nice thing to do while I was here. :)
Looks like we accidentally merged PR #2375 without a CI build against
the latest state of `main`. In the meantime since #2375 was last built
on CI, PR #2374 added an additional metadata field to
`policy::HttpParams`, which the `HttpParams` constructed in the test
added in #2375 doesn't populate. Therefore, merging that PR broke the
build. Whoops!
This commit populates the `meta` field, fixing it.
This branch adds a new test for failure accrual in load balancers with
multiple endpoints. This test asserts that endpoints whose circuit
breakers have tripped will not be selected by a load balancer.
Since upstream has yet to release a version with PR
bluejekyll/trust-dns#1881, this commit changes the proxy's default log
level to silence warnings from `trust_dns_proto` that are generally
spurious.
See linkerd/linkerd2#10123 for details.
Currently, the outbound proxy determines whether or not to perform
protocol detection based on the presence of the `opaque_protocol` field
on the resolved `ServiceProfile` from the Destination controller.
However, the `OutboundPolicy` resolved from the policy controller also
contains a `proxy_protocol` field that indicates what protocol should be
used for this destination. While the proxy uses the HTTPRoutes from the
`OutboundPolicy`'s `proxy_protocol`, it does _not_ take into account the
`proxy_protocol` when determining whether or not to perform protocol
detection. This can result in the outbound proxy performing protocol
detection on connections to destinations that have been marked as
opaque.
This branch modifies the outbound proxy to use the `proxy_protocol` from
the `OutboundPolicy`, as well as the `opaque_protocol` field from the
`ServiceProfile`, when determining whether or not to perform protocol
detection. In addition, I've added an integration test, which fails
without the changes on this branch.
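Conceptually, the decision now combines both sources, roughly like this hedged sketch (type and field names are illustrative):

```rust
/// Stand-in for the protocol carried by the resolved `OutboundPolicy` (illustrative).
enum PolicyProtocol {
    Detect,
    Opaque,
    Http1,
    Http2,
    Grpc,
}

/// Protocol detection is skipped if either the ServiceProfile marks the destination
/// as opaque or the policy's `proxy_protocol` is opaque.
fn skip_detection(profile_marked_opaque: bool, policy: &PolicyProtocol) -> bool {
    profile_marked_opaque || matches!(policy, PolicyProtocol::Opaque)
}
```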
Fixes linkerd/linkerd2#10745
The DoS mitigation changes in `h2` v0.3.17 inadvertently introduced a
potential panic (hyperium/h2#674). Version 0.3.18 fixes this, so we
should bump the proxy's dependency to avoid panics.
Currently, when the outbound proxy makes a direct connection prefixed
with a `TransportHeader` in order to send HTTP traffic, it will always
send a `SessionProtocol` hint with the HTTP version as part of the
header. This instructs the inbound proxy to use that protocol, even if
the target port has a ServerPolicy that marks that port as opaque, which
can result in incorrect handling of that connection. See
linkerd/linkerd2#9888 for details.
In order to prevent this, linkerd/linkerd2-proxy-api#197 adds a new
`ProtocolHint` value to the protobuf endpoint metadata message. This
will allow the Destination controller to explicitly indicate to the
outbound proxy that a given endpoint is known to handle all connections
to a port as an opaque TCP stream, and that the proxy should not perform
a protocol upgrade or send a `SessionProtocol` in the transport header.
This branch updates the proxy to handle this new hint value, and adds
tests that the outbound proxy behaves as expected.
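A hedged sketch of how the new hint affects the transport header (names are illustrative, not the actual protobuf or proxy types):

```rust
/// Stand-in for the endpoint metadata's protocol hint (illustrative).
enum ProtocolHint {
    Unknown,
    Http2,
    Opaque,
}

#[derive(Clone, Copy)]
enum SessionProtocol {
    Http1,
    Http2,
}

/// Returns the `SessionProtocol` to include in the transport header, if any.
fn session_protocol(hint: &ProtocolHint, detected: SessionProtocol) -> Option<SessionProtocol> {
    match hint {
        // The endpoint handles this port as an opaque TCP stream: send no
        // `SessionProtocol` and perform no protocol upgrade.
        ProtocolHint::Opaque => None,
        // Otherwise, advertise an HTTP session protocol as before.
        _ => Some(detected),
    }
}
```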
Along with linkerd/linkerd2#10301, this will fix linkerd/linkerd2#9888.
I opened a new PR for this change rather than attempting to rebase my
previous PR #2209, as it felt a bit easier to start with a new branch
and just make the changes that were still relevant. Therefore, this
closes #2209.
d6f20a6 added a new `outbound_http_balancer_endpoints` gauge, but
omitted it from metrics export. This change fixes this deficiency to
ensure that this metric is rendered.
It's not currently possible to know how many endpoints are in a
balancer. This change adds an `outbound_http_balancer_endpoints` gauge
that exposes the number of endpoints a balancer has by their current
readiness status.
Note that in the concrete stack we do not currently differentiate
between gRPC and HTTP backends, so all balancers are exposed under this
single metric.
PR #2250 removed the `l5d-dst-canonical` header from ServiceProfile
requests. This header is used by the inbound proxy for telemetry
purposes, so removing it and not putting it back broke ServiceProfile
route metrics. This commit adds a layer for setting this header to the
service profile route stack.
We can also add the header for non-ServiceProfile requests if that's
desirable; I'll have to look into whether it is. This commit should, at
least, fix the existing ServiceProfile route metrics.
Fixes #10521
When performing policy-based routing, proxies may dispatch requests
through per-route backend configurations. In order to illustrate how
routing rules apply and how backend distributions are being honored,
this change adds two new metrics:
* `outbound_http_route_backend_requests_total`; and
* `outbound_grpc_route_backend_requests_total`
Each of these metrics includes labels that identify a route's parent
(i.e. a Service), the route resource being used, and the backend
resource being used.
This implementation does NOT implement any form of metrics eviction for
these new metrics. This is tolerable for the short term as the
cardinality of services and routes is generally much less than the
cardinality of individual endpoints (where we do require
timeout/eviction for metrics).
The OutboundPolicies API includes resource references with its
responses. These references allow us to accurately identify an arbitrary
(Kubernetes style) resource by its group, kind, namespace, and name.
To support new metrics that include these resource coordinates as
labels, we need access to the parent and backend references from the
balancer stack.
This branch makes the following changes to accommodate this:
1. We introduce `ParentRef`, `RouteRef`, `BackendRef`, and
`EndpointRef` new-types to serve as explicit markers for
different types of metadata we may encounter (sketched below).
2. The profile stack is updated to parse out metadata from service names
in the form `<name>.<ns>.svc.*`.
3. We also support discovering pod metadata from the endpoint labels
`dst_pod` and `dst_namespace`.
4. When we can't infer any metadata from a profile response, we use an
"unknown" metadata.
5. When using policy, we use control-plane provided metadata.
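A minimal sketch of what the new-types look like, assuming a shared `Meta` struct carrying the group/kind/namespace/name coordinates (the exact shape here is illustrative):

```rust
use std::sync::Arc;

/// Shared resource metadata coordinates (sketch; field names are illustrative).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Meta {
    pub group: String,
    pub kind: String,
    pub namespace: String,
    pub name: String,
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ParentRef(pub Arc<Meta>);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct RouteRef(pub Arc<Meta>);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct BackendRef(pub Arc<Meta>);

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct EndpointRef(pub Arc<Meta>);
```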
In practice, pod metadata won't be surfaced by the existing code path;
but we have a relatively simple path forward to using it.
This change splits the balancer stack into its own layer so it's
distinct from the rest of the concrete stack configuration.
There are only minor functional changes in this branch:
1. The `balance{addr=...}` tracing context is replaced by
`service{ns=..,name=...}` for clarity;
2. Errors returned from the balance stack are now rendered as:
`Service {name}.{ns}: {error}`
There are no explicit indications of failure-accrual-related behavior.
This change adds INFO-level logs when a failure accrual breaker is
closed or reopened. These log messages are scoped within the
'endpoint' log context so that log lines include endpoint and balancer
addresses.
```
INFO balance{addr=logical.test.svc.cluster.local:666}:endpoint{addr=192.0.2.41:666}:consecutive_failures: linkerd_app_outbound::http::breaker::consecutive_failures: Consecutive failure-accrual breaker closed
INFO balance{addr=logical.test.svc.cluster.local:666}:endpoint{addr=192.0.2.41:666}:consecutive_failures: linkerd_app_outbound::http::breaker::consecutive_failures: Consecutive failure-accrual breaker reopened
```
When performing policy-based routing, proxies may dispatch requests
through per-route backend configurations. In order to illustrate how
routing rules apply and how backend distributions are being honored,
this change adds two new metrics:
* `outbound_http_route_backend_requests_total`; and
* `outbound_grpc_route_backend_requests_total`
Each of these metrics includes labels that identify a route's parent
(i.e. a Service), the route resource being used, and the backend
resource being used.
This implementation does NOT implement any form of metrics eviction for
these new metrics. This is tolerable for the short term as the
cardinality of services and routes is generally much less than the
cardinality of individual endpoints (where we do require
timeout/eviction for metrics).
This branch adds a stack test in `linkerd_app_outbound` for the
consecutive-failures failure accrual policy. This test builds the HTTP
logical and concrete stacks and tests their behavior using a mocked
`tower::Service` as the endpoint stack, and a mock stack target type and
destination resolver. Because these stack tests do not perform IO,
execute in a single thread, and use the paused Tokio timer, they should
be fully deterministic, minimizing potential flaky failures. However, we
are still testing enough of the stack to exercise the circuit-breaking
functionality integrated with other components of the proxy.
The test added in this branch exercises only the behavior of a single
concrete stack (one backend). It tests that the backend's circuit
breaker trips when the configured number of failures occur, that the
backend is probed after a probation backoff, and the behavior of failed
and successful probe requests. This test does *not* currently exercise
the behavior of distributors/load balancers when only some endpoints are
failing. Those tests can be added in a follow-up PR.
This test revealed a bug in the `Gate` middleware that prevented the
test from passing. The `Gate` middleware's `poll_ready` implementation
attempts to acquire a semaphore permit when the gate is in a limited
state, and then polls the inner service for readiness. If the inner
service is ready, the semaphore permit is stored in the gate to be
consumed in `call`. However, this implementation does not check if the
gate has *already* acquired a semaphore permit and stored it in the
gate. In that case, a subsequent call to `poll_ready` after the gate has
returned `Poll::Ready` will result in it attempting to acquire another
permit. If the semaphore only has a single permit, and the gate has
already acquired it, it will never become ready if `poll_ready` is
called a second time.
This branch fixes this bug by changing the `Gate` to only call
`self.poll_acquire()` if the gate has not acquired a permit. If it has
already acquired a permit, the `poll_ready` implementation returns
`Poll::Ready` immediately.
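The essence of the fix is an idempotent acquire: only try to take a permit when one is not already held. A hedged sketch of that pattern using a Tokio semaphore (not the actual `Gate` implementation):

```rust
use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

/// Minimal sketch of the idempotent-acquire pattern (illustrative only).
struct Limited {
    semaphore: Arc<Semaphore>,
    permit: Option<OwnedSemaphorePermit>,
}

impl Limited {
    async fn ready(&mut self) {
        // Only acquire when no permit is held; a repeated readiness check is a
        // no-op until the stored permit is consumed (by the equivalent of `call`).
        if self.permit.is_none() {
            let permit = self
                .semaphore
                .clone()
                .acquire_owned()
                .await
                .expect("semaphore must not be closed");
            self.permit = Some(permit);
        }
    }
}
```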
This branch updates the proxy to configure failure accrual policies from
the OutboundPolicy API. It updates the dependency on
`linkerd2-proxy-api` to include changes from
linkerd/linkerd2-proxy-api#223, which adds failure accrual configuration
to the `Protocol::Http1`, `Protocol::Http2`, and `Protocol::Grpc`
variants.
Currently, `ConsecutiveFailures` (added in #2357) is the only supported
failure accrual policy. If the proxy API does not configure a failure
accrual policy, failures are not accrued, and endpoints do not become
unavailable due to failure accrual.
Depends on https://github.com/linkerd/linkerd2-proxy/pull/2354.
This branch introduces a `ConsecutiveFailures` failure accrual policy
which marks an endpoint as unavailable if a given number of failures
occur in a row without any successes. Once the endpoint is marked as
failing, it is issued a single probe request after a delay. If the probe
succeeds, the endpoint transitions back to the available state.
Otherwise, it remains unavailable, with subsequent probe requests being
issued with an exponential backoff. The consecutive failures failure
accrual policy was initially implemented in PR #2334 by @olix0r.
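A rough sketch of the accrual logic (illustrative only; probing and the exponential backoff are omitted):

```rust
/// Tracks consecutive failures for a single endpoint (sketch, not the proxy's code).
struct ConsecutiveFailures {
    max_failures: usize,
    failures: usize,
}

impl ConsecutiveFailures {
    /// Records a response and returns whether the endpoint remains available.
    fn record(&mut self, success: bool) -> bool {
        if success {
            // Any success (including a successful probe) resets the streak.
            self.failures = 0;
        } else {
            self.failures += 1;
        }
        self.failures < self.max_failures
    }
}
```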
A new `FailureAccrual::ConsecutiveFailures` variant is added in
`linkerd2-proxy-client-policy` for configuring the consecutive failures
failure accrual policy. The construction of circuit breakers in the
outbound stack is changed from a closure implementing `ExtractParam` to
a new `breaker::Params` type, so that the type returned can be the same
regardless of which failure accrual policy is constructed.
Depends on #2353.
PR #2353 adds middleware for implementing request-level circuit
breaking. This branch adds the circuit breaking middleware to the
outbound concrete stack, and adds plumbing for configuring a concrete
stack's circuit breaker based on params provided by the target.
A new `FailureAccrual` enum is added in `linkerd2-proxy-client-policy`
to represent the failure accrual policy for a circuit breaker.
Currently, no actual implementations of failure accrual policies exist
in the proxy, so the only available variant of `FailureAccrual` is
`FailureAccrual::None`, which disables circuit breaking. Circuit
breaking middleware (a `Gate`/`BroadcastClassification` pair) is still
constructed, but no failure accrual task is spawned to actually open and
shut the gate, so no circuit breaking is actually performed. Subsequent
branches will actually implement failure accrual policies.
Failure accrual policies are configured at the protocol level, rather
than per-route or per-backend. This is because a given policy may
contain multiple routes referencing the same backend, and a single
concrete stack is constructed for that backend that's shared across all
distributions that include it. If failure accrual policies were
configured at the `RouteBackend` level, we would need to build separate
client stacks if the same backend is referenced by `RouteBackend`s that
have different failure accrual policies. Currently, failure accrual
policies are not present in the proxy API, so all `ClientPolicy`
instances have the default policy, `FailureAccrual::None`. Once the
`OutboundPolicy` proxy API actually provides failure accrual
configurations, a subsequent branch will populate this configuration
from discovery.
When a `backendRef` in an HTTPRoute references a valid Service name
which doesn't currently exist in the cluster, the policy controller
constructs a synthesized backend with a `failureInjector` filter, so
that traffic sent to the invalid backend will fail with a descriptive
error message, and any other backends in the route will still work
normally. Because the backend doesn't exist, there is no reasonable
value for the policy controller to set as the backend's `dispatcher`
field in the OutboundPolicy proto.
Unfortunately, however, the proxy currently rejects any OutboundPolicy
which has a backend without a populated `dispatcher` field. This means
that when an OutboundPolicy has one backend that has no `dispatcher`,
the proxy will synthesize an invalid policy that fails all traffic,
rather than one that only fails traffic sent to that backend.
This branch changes the proxy so that backends without a `dispatcher`
are no longer rejected by the client policy protobuf conversion code.
Instead, a new `Dispatcher::Fail` variant is used to represent a backend
which always fails requests with a configured error message, so that a
functional `ClientPolicy` can still be constructed when one or more
backends lack a dispatcher. When constructing a stack for a `Concrete`
target with a `Fail` dispatcher, the proxy builds a service that fails
all requests. In practice, these errors will not be observed when the
policy controller returns a backend for an invalid Service name, as the
`failureInjector` filter's error should be produced first, but
constructing a service is necessary for the stack to function correctly.
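A hedged sketch of what an always-failing service for the `Fail` dispatcher can look like, using a boxed error type; names are illustrative, not the proxy's actual stack code:

```rust
use std::{
    future::{ready, Ready},
    sync::Arc,
    task::{Context, Poll},
};

type Error = Box<dyn std::error::Error + Send + Sync>;

/// A service that fails every request with a configured message (sketch).
#[derive(Clone)]
struct Fail {
    message: Arc<str>,
}

impl<Req> tower::Service<Req> for Fail {
    type Response = ();
    type Error = Error;
    type Future = Ready<Result<(), Error>>;

    fn poll_ready(&mut self, _: &mut Context<'_>) -> Poll<Result<(), Error>> {
        // Always ready: the failure is produced per request, not via backpressure.
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, _req: Req) -> Self::Future {
        ready(Err(self.message.to_string().into()))
    }
}
```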
Closes linkerd/linkerd2#10620