linkerd2-proxy

Commit Graph

Author	SHA1	Message	Date
Eliza Weisman	0a085f98e8	Actually fix the master CI build (#164 ) It turns out that increasing the recursion limit for the `tap` test crate _actually_ fixes the compiler error that's broken the last several builds on master. Since I'm now able to locally reproduce the error (which occurs only when running the tests in release mode), I've verified that this actually does fix the issue. Thus, I've also reverted the previous commit (`7c35f27ad3`) which claimed to fix this issue, as it turns out that was not actually necessary. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-12-14 12:18:59 -08:00
Eliza Weisman	761a08e4ac	Make TLS accept logic compatible with disabled protocol detection (#158 ) This branch changes the proxy's accept logic so that the proxy will no longer attempt to terminate TLS on ports which are configured to skip protocol detection. This means that a Linkerd deployment with `--tls optional` will no longer break server-speaks-first protocols like MySQL (although that traffic will not be encrypted). Since it's necessary to get the connection's original destination to determine if it's on a port which should skip protocol detection, I've moved the SO_ORIGINAL_DST call down the stack from `Server` to `BoundPort`. However, to avoid making an additional unnecessary syscall, the original destination is propagated to the server, along with the information about whether or not protocol detection is enabled. This is the approach described in https://github.com/linkerd/linkerd2/issues/1270#issuecomment-406124236. I've also written a new integration test for server-speaks-first protocols with TLS enabled. This test is essentially the same as the existing `transparency::tcp_server_first` test, but with TLS enabled for the test proxy. I've confirmed that this fails against master. Furthermore, I've validated this change by deploying the `booksapp` demo with MySQL with TLS enabled, which [previously didn't work](https://github.com/linkerd/linkerd2/issues/1648#issuecomment-432867702). Closes linkerd/linkerd2#1270 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-12-13 12:31:13 -08:00
Sean McArthur	3ac6b72c48	Add basic tap integration tests (#154 ) Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-12-04 18:37:26 -08:00
Sean McArthur	88340cadf3	replace proxy::http usage of tower-h2 with hyper Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-11-27 17:29:18 -08:00
Oliver Gould	5d93690045	tests: Replace `uimplemented!` with 'unreachable!' (#141 ) When developing, it's convenient to use `unimplemented!` as a placeholder for work that has not yet been done. However, we also use `unimplemented!` in various tests in stubbed methods; so searching the project for `unimplemented` produces false positives. This change replaces these with `unreachable!`, which is functionaly equivalent, but better indicates that the current usage does not reach these methods and disambiguates usage of `unimplemented!`.	2018-11-27 14:27:30 -08:00
Sean McArthur	cde7675a9a	Ensure l5d-orig-proto header is removed before returning responses Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-11-16 12:10:29 -08:00
Sean McArthur	f37c9e5128	Update all tower pieces to use Service<Request> (#132 ) Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-11-16 11:19:17 -08:00
Oliver Gould	d5e2ff2cb7	Canonicalize outbound names via DNS for inbound profiles (#129 ) When the inbound proxy receives requests, these requests may have relative `:authority` values like _web:8080_. Because these requests can come from hosts with a variety of DNS configurations, the inbound proxy can't make a sufficient guess about the fully qualified name (e.g. _web.ns.svc.cluster.local._). In order for the inbound proxy to discover inbound service profiles, we need to establish some means for the inbound proxy to determine the "canonical" name of the service for each request. This change introduces a new `l5d-dst-canonical` header that is set by the outbound proxy and used by the remote inbound proxy to determine which profile should be used. The outbound proxy determines the canonical destination by performing DNS resolution as requests are routed and uses this name for profile and address discovery. This change removes the proxy's hardcoded Kubernetes dependency. The `LINKERD2_PROXY_DESTINATION_GET_SUFFIXES` and `LINKERD2_PROXY_DESTINATION_PROFILE_SUFFIXES` environment variables control which domains may be discovered via the destination service. Finally, HTTP settings detection has been moved into a dedicated routing layer at the "bottom" of the stack. This is done do that canonicalization and discovery need not be done redundantly for each set of HTTP settings. Now, HTTP settings, only configure the HTTP client stack within an endpoint. Fixes linkerd/linkerd2#1798	2018-11-15 11:41:17 -08:00
Sean McArthur	21887e57e4	change Inbound to always use localhost Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-11-14 15:59:48 -08:00
Oliver Gould	d396acda6d	Fall-back to former classification logic with profiles (#123 ) With the introduction of profile-based classification, the proxy would not perform normal gRPC classification in some cases when it could & should. This change simplifies our default classifier logic and falls back to the default grpc-aware behavior whenever another classification cannot be performed. Furthermore, this change moves the `proxy::http::classify` module to `proxy::http::metrics::classify`, as these modules should only be relied on for metrics classification. Other module (for instance, retries), should provide their own abstractions. Finally, this change fixes a test error-formatting issue.	2018-11-12 18:22:12 -08:00
Oliver Gould	5e0a15b8a7	Introduce outbound route metrics (#117 ) The Destination Profile API---provided by linkerd2-proxy-api v0.1.3-- allows the proxy to discovery route information for an HTTP service. As the proxy processes outbound requests, in addition to doing address resolution through the Destination service, the proxy may also discover profiles including route patterns and labels. When the proxy has route information for a destination, it applies the RequestMatch for each route to find the first-matching route. The route's labels are used to expose `route_`-prefixed HTTP metrics (and each label is prefixed with `rt_`). Furthermore, if a route includes ResponseMatches, they are used to perform classification (i.e. for the `response_total` and `route_response_total` metrics). A new `proxy::http::profiles` module implements a router that consumes routes from an infinite stream of route lists. The `app::profiles` module implements a client that continually and repeatedly tries to establish a watch for the destination's routes (with some backoff). Route discovery does not _block_ routing; that is, the first request to a destination will likely be processed before the route information is retrieved from the controller (i.e. on the default route). Route configuration is applied in a best-effort fashion.	2018-11-05 16:30:39 -08:00
Oliver Gould	8fca9ebde2	Only use the classification label for response_total (#116 ) As described in https://github.com/linkerd/linkerd2/issues/1832, our eager classification is too complicated. This changes the `classification` label to only be used with the `response_total` label. The following changes have been made: 1. response_latency metrics only include a status_code and not a classification. 2. response_total metrics include classification labels. 3. transport metrics no longer expose a `classification` label (since it's misleading). now the `errno` label is set to be empty when there is no error. 4. Only gRPC classification applies when the request's content type starts with `application/grpc+` The `proxy::http::classify` APIs have been changed so that classifiers cannot return a classification before the classifier is fully consumed.	2018-11-01 14:59:44 -07:00
Sean McArthur	68ec7f1488	fix test import conflict of assert_contains macro Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-10-26 14:21:26 -07:00
Oliver Gould	978fed1cf6	refactor: Structure the proxy in terms of `Stack` (#100 ) As the proxy's functionality has grown, the HTTP routing functionality has become complex. Module boundaries have become ill-defined, which leads to tight coupling--especially around the `ctx` metadata types and `Service` type signatures. This change introduces a `Stack` type (and subcrate) that is used as the base building block for proxy functionality. The `proxy` module now exposes generic components--stack layers--that are configured and instantiated in the `app::main` module. This change reorganizes the repo as follows: - Several auxiliary crates have been split out from the `src/` directory into `lib/fs-watch`, `lib/stack` and `lib/task`. - All logic specific to configuring and running the linkerd2 sidecar proxy has been moved into `src/app`. The `Main` type has been moved from `src/lib.rs` to `src/app/main.rs`. - The `src/proxy` has reusable, generic components useful for building proxies in terms of `Stack`s. The logic contained in `lib/bind.rs`, pertaining to per-endpoint service behavior, has almost entirely been moved into `app::main`. `control::destination` has changed so that it is not responsible for building services. (It used to take a clone of `Bind` and use it to create per-endpoint services). Instead, the destination service implements the new `proxy::Resolve` trait, which produces an infinite `Resolution` stream for each lookup. This allows the `proxy::balance` module to be generic over the servie discovery source. Furthermore, the `router::Recognize` API has changed to only expose a `recgonize()` method and not a `bind_service()` method. The `bind_service` logic is now modeled as a `Stack`. The `telemetry::http` module has been replaced by a `proxy::http::metrics` module that is generic over its metadata types and does not rely on the old telemetry event system. These events are now a local implementation detail of the `tap` module. There are no user-facing changes in the proxy's behavior.	2018-10-11 11:25:03 -07:00
Oliver Gould	977ff25cb9	Make config variable naming uniform (#99 ) There are a few crufty remaining references to the `private` and `public` proxy interfaces. `private` referred to the pod-local side of the proxy; and `public` the external side. These terms have been replaced by `inbound` and `outbound` proxies. The "private forward address" is now the "inbound forward address". The "private listener" is now the "outbound listener". The "public listener" is now the "inbound listener". This change adds alternate environment variables to support configuration: - `LINKERD2_PROXY_INBOUND_CONNECT_TIMEOUT` - `LINKERD2_PROXY_INBOUND_FORWARD` - `LINKERD2_PROXY_INBOUND_LISTENER` - `LINKERD2_PROXY_OUTBOUND_CONNECT_TIMEOUT` - `LINKERD2_PROXY_OUTBOUND_LISTENER` The old configuration variables have not been removed. Instead, a warning is logged when such a variable is used to configure the proxy.	2018-10-03 07:51:54 -07:00
Oliver Gould	53a85f442a	Split the `timeout` module into its own subcrate (#96 ) The `timeout` module has very little to do with the proxy, specifically. This change moves it into a dedicated subcrate. This helps to clarify dependencies and to minimize generic logic in the main proxy crate. In doing this, an unused implementation of AsyncRead/AsyncWrite for Timeout was deleted. Furthermore, the `HumanDuration` type has been copied into tests/support/mod.rs so that this type need not be part of the timeout module's public API.	2018-09-18 15:09:31 -07:00
Oliver Gould	0d890dab6b	Mark tcp_connect_err tests as macos-specific (#63 ) The `tcp_connect_err` tests do not pass on Linux. I initially observed that the errno produced is ECONNREFUSED on Linux, not EXFULL, but even changing this does not help the test to pass reliably, as the socket teardown semantics appear to be slightly different. In order to prevent spurious test failures during development, this change marks these two tests as macos-specific.	2018-08-14 12:32:41 -07:00
Sean McArthur	47b0f0402d	strip other connection-level headers from HTTP/1 (#43 ) Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-08-07 11:00:42 -07:00
Sean McArthur	68f9a427d6	fix orig-proto translation with HTTP1 chunked bodies (#42 ) Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-08-06 16:20:32 -07:00
Sean McArthur	ab1b280de8	Add orig-proto which uses HTTP2 between proxies (#32 ) When the destination service returns a hint that an endpoint is another proxy, eligible HTTP1 requests are translated into HTTP2 and sent over an HTTP2 connection. The original protocol details are encoded in a header, `l5d-orig-proto`. When a proxy receives an inbound HTTP2 request with this header, the request is translated back into its HTTP/1 representation before being passed to the internal service. Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-08-03 15:03:14 -07:00
Eliza Weisman	1e24aeb615	Limit concurrent Destination service queries (#36 ) Required for linkerd/linkerd2#1322. Currently, the proxy places a limit on the number of active routes in the route cache. This limit defaults to 100 routes, and is intended to prevent the proxy from requesting more than 100 lookups from the Destination service. However, in some cases, such as Prometheus scraping a large number of pods, the proxy hits this limit even though none of those requests actually result in requests to service discovery (since Prometheus scrapes pods by their IP addresses). This branch implements @briansmith's suggestion in https://github.com/linkerd/linkerd2/issues/1322#issuecomment-407161829. It splits the router capacity limit to two separate, configurable limits, one that sets an upper bound on the number of concurrently active destination lookups, and one that limits the capacity of the router cache. I've done some preliminary testing using the `lifecycle` tests, where a single Prometheus instance is configured to scrape a very large number of proxies. In these tests, neither limit is reached. Furthermore, I've added integration tests in `tests/discovery` to exercise the destination service query limit. These tests ensure that query capacity is released when inactive routes which create queries are evicted from the router cache, and that the limit does _not_ effect DNS queries. This branch obsoletes and closes #27, which contained an earlier version of these changes. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-08-02 16:40:12 -07:00
Oliver Gould	18a8d7956d	Improve tcp_connect_err test flakiness (#37 ) Both tcp_connect_err tests frequently fail, even during local development. This seems to happen because the proxy doesn't necessarily observe the socket closure. Instead of shutting down the socket gracefully, we can just drop it! This helps the test pass much more reliably.	2018-08-01 17:25:49 -07:00
Oliver Gould	b2fcd5d276	Remove the telemetry system's event channel (#30 ) The proxy's telemetry system is implemented with a channel: the proxy thread generates events and the control thread consumes these events to record metrics and satisfy Tap observations. This design was intended to minimize latency overhead in the data path. However, this design leads to substantial CPU overhead: the control thread's work scales with the proxy thread's work, leading to resource contention in busy, resource-limited deployments. This design also has other drawbacks in terms of allocation & makes it difficult to implement planned features like payload-aware Tapping. This change removes the event channel so that all telemetry is recorded instantaneously in the data path, setting up for further simplifications so that, eventually, the metrics registry properly uses service lifetimes to support eviction. This change has a potentially negative side effect: metrics scrapes obtain the same lock that the data path uses to write metrics so, if the metrics server gets heavy traffic, it can directly impact proxy latency. These effects will be ameliorated by future changes that reduce the need for the Mutex in the proxy thread.	2018-07-26 11:16:27 -07:00
Markus Jais	7788f60e0e	fixed some typos in comments and Dockerfile (#25 ) Signed-off-by: Markus Jais <markusjais@googlemail.com>	2018-07-25 10:10:59 -10:00
Sean McArthur	04a8ae3edf	update to hyper 0.12.7 to fix a keep-alive bug (#26 ) Specifically proxied bodies would make use of an optimization in hyper that resulted in the connection not knowing (but it did know! just didn't tell itself...) that the body was finished, and so the connection was closed. 0.12.7 includes the fix in hyper. As part of this upgrade, the keep-alive tests have been adjusted to send a small body, since the empty body was not triggering this case.	2018-07-23 18:33:55 -07:00
Eliza Weisman	2d4086aee9	Add errno label to transport close metrics (when applicable) (#12 ) This branch adds a label displaying the Unix error code name (or the raw error code, on other operating systems or if the error code was not recognized) to the metrics generated for TCP connection failures. It also adds a couple of tests for label formatting. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-07-23 15:37:04 -07:00
Sean McArthur	9f5648d955	fix control client Backoff to poll its timer when backing off (#13 ) The `Backoff` service wrapper is used for the controller client service so that if the proxy can't find the controller (there is a connection error), it doesn't keep trying in a tight loop, but instead waits a couple seconds before trying again, presuming that the control plane was rebooting. When "backing off", a timer would be set, but it wasn't polled, so the task was never registered to wake up after the delay. This turns out to not have been a problem in practice, since the background destination task was joined with other tasks that were constantly waking up, allowing it to try again anyways. To add tests for this, a new `ENV_CONTROL_BACKOFF_DELAY` config value has been added, so that the tests don't have to wait the default 5 seconds. Signed-off-by: Sean McArthur <sean@buoyant.io>	2018-07-16 12:41:47 -07:00
Oliver Gould	bbf217ff4f	Replace references to _Conduit_ (#6 ) There are various comments, examples, and documentation that refers to Conduit. This change replaces or removes these refernces. CONTRIBUTING.md has been updated to refer to GOVERNANCE/MAINTAINERS.	2018-07-12 20:41:17 -07:00
Eliza Weisman	2f4c1b220a	Add labels for `tls::ReasonForNoIdentity` (#5 ) Fixes linkerd/linkerd2#1276. Currently, metrics with the `tls="no_identity"` label are duplicated. This is because that label is generated from the `tls_status` label on the `TransportLabels` struct, which is either `Some(())` or a `ReasonForNoTls`. `ReasonForNoTls` has a variant`ReasonForNoTls::NoIdentity`, which contains a `ReasonForNoIdentity`, but when we format that variant as a label, we always just produce the string "no_identity", regardless of the value of the `ReasonForNoIdentity`. However, label types are _also_ used as hash map keys into the map that stores the metrics scopes, so although two instances of `ReasonForNoTls::NoIdentity` with different `ReasonForNoIdentity`s produce the same formatted label output, they aren't _equal_, since that field differs, so they correspond to different metrics. This branch resolves this issue, by adding an additional label to these metrics, based on the `ReasonForNoIdentity`. Now, the separate lines in the metrics output that correspond to each `ReasonForNoIdentity` have a label differentiating them from each other. Note that the the `NotImplementedForTap` and `NotImplementedForMetrics` reasons will never show up in metrics labels, currently, since we don't gather metrics from the tap and metrics servers at the moment. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-07-12 16:04:25 -07:00
Oliver Gould	c23ecd0cbc	Migrate `conduit-proxy` to `linkerd2-proxy` The proxy now honors environment variables starting with `LINKERD2_PROXY_`.	2018-07-07 22:45:21 +00:00

30 Commits