The `tcp_connect_err` tests do not pass on Linux.
I initially observed that the errno produced on Linux is ECONNREFUSED,
not EXFULL, but even after accounting for this, the tests do not pass
reliably, as the socket teardown semantics appear to differ slightly.
To prevent spurious test failures during development, this change marks
these two tests as macOS-specific.
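A minimal sketch of restricting a test to macOS with a `cfg` attribute; the test name and body below are placeholders, not the actual `tcp_connect_err` tests:

```rust
#[test]
#[cfg(target_os = "macos")]
fn tcp_connect_err_refused() {
    // Exercise a refused TCP connection and assert on the resulting
    // telemetry; this test is only compiled and run on macOS.
}
```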
When the destination service returns a hint that an endpoint is another
proxy, eligible HTTP/1 requests are translated into HTTP/2 and sent over
an HTTP/2 connection. The original protocol details are encoded in a
header, `l5d-orig-proto`. When a proxy receives an inbound HTTP/2
request with this header, the request is translated back into its HTTP/1
representation before being passed to the internal service.
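A minimal sketch of the idea using the `http` crate; the header name comes from the description above, but the value encoding and function names here are illustrative:

```rust
use http::{header::HeaderValue, Request, Version};

// Outbound side: record the original protocol before upgrading to HTTP/2.
fn upgrade<B>(mut req: Request<B>) -> Request<B> {
    let orig = match req.version() {
        Version::HTTP_10 => "HTTP/1.0",
        _ => "HTTP/1.1",
    };
    req.headers_mut()
        .insert("l5d-orig-proto", HeaderValue::from_static(orig));
    *req.version_mut() = Version::HTTP_2;
    req
}

// Inbound side: strip the header and restore the original HTTP/1 version.
fn downgrade<B>(mut req: Request<B>) -> Request<B> {
    if let Some(orig) = req.headers_mut().remove("l5d-orig-proto") {
        *req.version_mut() = if orig == "HTTP/1.0" {
            Version::HTTP_10
        } else {
            Version::HTTP_11
        };
    }
    req
}
```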
Signed-off-by: Sean McArthur <sean@buoyant.io>
Required for linkerd/linkerd2#1322.
Currently, the proxy places a limit on the number of active routes
in the route cache. This limit defaults to 100 routes, and is intended
to prevent the proxy from requesting more than 100 lookups from the
Destination service.
However, in some cases, such as Prometheus scraping a large number of
pods, the proxy hits this limit even though none of those requests
actually result in requests to service discovery (since Prometheus
scrapes pods by their IP addresses).
This branch implements @briansmith's suggestion in
https://github.com/linkerd/linkerd2/issues/1322#issuecomment-407161829.
It splits the router capacity limit into two separate, configurable
limits: one that sets an upper bound on the number of concurrently
active Destination lookups, and one that limits the capacity of the
router cache.
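For illustration, the split might be configured along these lines; the environment variable names and defaults below are placeholders, not the proxy's actual configuration keys:

```rust
use std::env;

// Sketch of the two separate limits.
struct RouterConfig {
    /// Maximum number of routes kept in the router cache.
    cache_capacity: usize,
    /// Maximum number of concurrently active Destination service lookups.
    max_dst_queries: usize,
}

fn load_config() -> RouterConfig {
    let parse = |key: &str, default: usize| {
        env::var(key)
            .ok()
            .and_then(|v| v.parse().ok())
            .unwrap_or(default)
    };
    RouterConfig {
        cache_capacity: parse("EXAMPLE_ROUTER_CAPACITY", 100),
        max_dst_queries: parse("EXAMPLE_DST_QUERY_CONCURRENCY", 100),
    }
}
```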
I've done some preliminary testing using the `lifecycle` tests, where a
single Prometheus instance is configured to scrape a very large number
of proxies. In these tests, neither limit is reached. Furthermore, I've added
integration tests in `tests/discovery` to exercise the destination service
query limit. These tests ensure that query capacity is released when inactive
routes that created queries are evicted from the router cache, and that the
limit does _not_ affect DNS queries.
This branch obsoletes and closes #27, which contained an earlier version of
these changes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Both `tcp_connect_err` tests frequently fail, even during local
development. This seems to happen because the proxy doesn't necessarily
observe the socket closure.
Instead of shutting down the socket gracefully, we can just drop it!
This helps the tests pass much more reliably.
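A minimal sketch of the difference in the test support code; the function names here are illustrative:

```rust
use std::net::{Shutdown, TcpStream};

// The previous approach: explicitly shut down both halves of the connection.
fn close_gracefully(sock: TcpStream) -> std::io::Result<()> {
    sock.shutdown(Shutdown::Both)
}

// What this change does instead: simply drop the handle, which closes the
// descriptor immediately.
fn close_by_dropping(sock: TcpStream) {
    drop(sock);
}
```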
The proxy's telemetry system is implemented with a channel: the proxy thread
generates events and the control thread consumes these events to record
metrics and satisfy Tap observations. This design was intended to minimize
latency overhead in the data path.
However, this design leads to substantial CPU overhead: the control thread's
work scales with the proxy thread's work, leading to resource contention in
busy, resource-limited deployments. This design also has allocation-related
drawbacks and makes it difficult to implement planned features such as
payload-aware Tapping.
This change removes the event channel so that all telemetry is recorded
instantaneously in the data path, setting up for further simplifications so
that, eventually, the metrics registry properly uses service lifetimes to
support eviction.
This change has a potentially negative side effect: metrics scrapes acquire
the same lock that the data path uses to write metrics, so heavy traffic to
the metrics server can directly impact proxy latency. These effects will be
ameliorated by future changes that reduce the need for the Mutex in the
proxy thread.
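For illustration, the post-change shape looks roughly like this: the data path updates a shared registry in place, and the metrics server takes the same lock to render a scrape. The names and structure below are simplified, not the proxy's actual types:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Labels = String;

#[derive(Default)]
struct Registry {
    request_total: HashMap<Labels, u64>,
}

#[derive(Clone, Default)]
struct Metrics(Arc<Mutex<Registry>>);

impl Metrics {
    // Called inline on the proxy thread for each request.
    fn record_request(&self, labels: &str) {
        let mut reg = self.0.lock().unwrap();
        *reg.request_total.entry(labels.to_string()).or_insert(0) += 1;
    }

    // Called by the metrics server; contends on the same lock as the data path.
    fn render(&self) -> String {
        let reg = self.0.lock().unwrap();
        reg.request_total
            .iter()
            .map(|(labels, v)| format!("request_total{{{}}} {}\n", labels, v))
            .collect()
    }
}
```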
Specifically, proxied bodies would hit an optimization in hyper that left the
connection unaware that the body had finished (the information was tracked
internally but never propagated), so the connection was closed instead of
being kept alive. hyper 0.12.7 includes the fix.
As part of this upgrade, the keep-alive tests have been adjusted to send
a small body, since an empty body did not trigger this case.
This branch adds a label to the metrics generated for TCP connection
failures, displaying the Unix error code name (or the raw error code on
other operating systems, or when the code is not recognized).
It also adds a couple of tests for label formatting.
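For illustration, the mapping might look like this sketch, which assumes the `libc` crate and covers only a few codes:

```rust
use std::io;

// Label a connect error with its errno name, falling back to the raw code
// when the error isn't one we recognize.
fn errno_label(err: &io::Error) -> String {
    match err.raw_os_error() {
        Some(libc::ECONNREFUSED) => "ECONNREFUSED".to_string(),
        Some(libc::ECONNRESET) => "ECONNRESET".to_string(),
        Some(libc::ETIMEDOUT) => "ETIMEDOUT".to_string(),
        Some(code) => code.to_string(),
        None => "UNKNOWN".to_string(),
    }
}
```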
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The `Backoff` service wrapper is used for the controller client service
so that if the proxy can't reach the controller (there is a connection
error), it doesn't keep retrying in a tight loop, but instead waits a
couple of seconds before trying again, on the assumption that the control
plane is rebooting.
When "backing off", a timer would be set, but it wasn't polled, so the
task was never registered to wake up after the delay. This turns out to
not have been a problem in practice, since the background destination
task was joined with other tasks that were constantly waking up,
allowing it to try again anyway.
To make this testable, a new `ENV_CONTROL_BACKOFF_DELAY` config value
has been added, so that the tests don't have to wait the default 5
seconds.
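A minimal sketch of the fix in current Tokio terms (the proxy at the time used futures 0.1; the names and structure here are illustrative, not the actual `Backoff` implementation):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

use tokio::time::{sleep, Sleep};

/// Illustrative backoff state: after a connect error, wait `delay` before
/// reporting readiness again.
struct Backoff {
    delay: Duration,
    waiting: Option<Pin<Box<Sleep>>>,
}

impl Backoff {
    fn start(&mut self) {
        self.waiting = Some(Box::pin(sleep(self.delay)));
    }

    fn poll_backoff(&mut self, cx: &mut Context<'_>) -> Poll<()> {
        if let Some(timer) = self.waiting.as_mut() {
            // The bug: the timer was created but never polled, so no waker was
            // registered and the task was never rescheduled. Polling it here
            // registers the wakeup for when the delay elapses.
            match timer.as_mut().poll(cx) {
                Poll::Pending => return Poll::Pending,
                Poll::Ready(()) => self.waiting = None,
            }
        }
        Poll::Ready(())
    }
}
```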
Signed-off-by: Sean McArthur <sean@buoyant.io>
There are various comments, examples, and documentation that refer to
Conduit. This change replaces or removes these references.
CONTRIBUTING.md has been updated to refer to GOVERNANCE/MAINTAINERS.
Fixes linkerd/linkerd2#1276.
Currently, metrics with the `tls="no_identity"` label are duplicated.
This is because that label is generated from the `tls_status` label on
the `TransportLabels` struct, which is either `Some(())` or a
`ReasonForNoTls`. `ReasonForNoTls` has a
variant, `ReasonForNoTls::NoIdentity`, which contains a
`ReasonForNoIdentity`, but when we format that variant as a label, we
always just produce the string "no_identity", regardless of the value of
the `ReasonForNoIdentity`.
However, label types are _also_ used as hash map keys into the map that
stores the metrics scopes, so although two instances of
`ReasonForNoTls::NoIdentity` with different `ReasonForNoIdentity`s
produce the same formatted label output, they aren't _equal_, since that
field differs, so they correspond to different metrics.
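For illustration, the mismatch between formatting and equality boils down to something like the following; the variants and metric name below are simplified placeholders:

```rust
use std::collections::HashMap;
use std::fmt;

// The label type hashes on the full variant, but formats several distinct
// values identically.
#[derive(Clone, PartialEq, Eq, Hash)]
enum ReasonForNoIdentity {
    NotHttp,
    NoPeerName,
}

#[derive(Clone, PartialEq, Eq, Hash)]
enum TlsStatus {
    Some,
    NoIdentity(ReasonForNoIdentity),
}

impl fmt::Display for TlsStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TlsStatus::Some => f.write_str("tls=\"true\""),
            // Every `NoIdentity` value renders the same label text...
            TlsStatus::NoIdentity(_) => f.write_str("tls=\"no_identity\""),
        }
    }
}

fn main() {
    let mut scopes: HashMap<TlsStatus, u64> = HashMap::new();
    *scopes.entry(TlsStatus::Some).or_default() += 1;
    // ...but distinct `NoIdentity` values are distinct hash-map keys, so two
    // separate scopes are created and the rendered output contains two
    // `tls="no_identity"` lines.
    *scopes.entry(TlsStatus::NoIdentity(ReasonForNoIdentity::NotHttp)).or_default() += 1;
    *scopes.entry(TlsStatus::NoIdentity(ReasonForNoIdentity::NoPeerName)).or_default() += 1;
    for (labels, count) in &scopes {
        println!("tcp_open_total{{{}}} {}", labels, count);
    }
}
```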
This branch resolves this issue by adding an additional label to these
metrics, based on the `ReasonForNoIdentity`. Now, the separate lines in
the metrics output that correspond to each `ReasonForNoIdentity` have a
label differentiating them from each other.
Note that the `NotImplementedForTap` and `NotImplementedForMetrics`
reasons will currently never show up in metrics labels, since we don't
gather metrics from the tap and metrics servers.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>