linkerd2

Commit Graph

Author	SHA1	Message	Date
Oliver Gould	8bedd9d38a	rfc: Make DestinationServiceQuery generic (#749 ) The goals of this change are: 1. Reduce the size/complexity of `control::discovery` in order to ease code reviews. 2. Extract a reusable grpc streaming utility. There are no intended functional changes. `control::discovery::DestinationServiceQuery` is used to track the state of a request (and streaming response) to the destination service. Very little of this logic is specific to the destination service. The `DestinationServiceQuery` and associated `UpdateRx` type have been moved to a new module, `control::remote_stream`, as `Remote` and `Receiver`, respectively. Both of these types are generic over the gRPC message type, so it will be possible to use this utility with additional API endpoints. The `Receiver::poll` implementation has been simplified to be more idiomatic with the rest of our code (namely, using `try_ready!`).	2018-05-08 16:54:20 -07:00
Oliver Gould	63fbbd6931	proxy: Parse units with duration configurations (#909 ) Configuration values that take durations are currently specified as time values with no units. So `600` may mean 600ms in some contexts and 10 minutes in others. In order to avoid this problem, this change now requires that configurations provide explicit units for time values such as '600ms' or '10 minutes'. Fixes #27.	2018-05-08 13:54:12 -07:00
Oliver Gould	68e203a2fc	proxy: Use Duration types for config defaults (#906 ) It's easy to misconfigure default durations, since they're recorded as integers and converted to Durations separately. Now, all default constants that represent durations use const `Duration` instances (enabled by a recent Rust release). This fixes #905 which was caused by using the wrong time unit for the metrics retain time.	2018-05-08 10:58:22 -07:00
Oliver Gould	f4dba72cc3	proxy: Track SingleUse services against router capacity (#902 ) PR #898 introduces capacity limits to the router. However, because the router supports "single-use" routes (routes that are bound only for the life of a single HTTP1 request), it is easy for a router to exceed its configured capacity. In order to fix this, the `Reuse` type is removed from the router library so that _all_ routes are considered cacheable. It's now the responsibility of the bound service to enforce policies with regard to client retention. Previously, routes were not added to the cache when the service could not be used to process more than a single request. Now, `Bind` wraps the services it returns in the `Binding` type, which dictates whether a single client is reused or a new one is bound for each request. This enables all routes to be cached without changing behavior with regard to connection reuse.	2018-05-08 10:57:56 -07:00
Oliver Gould	02e6d018d0	proxy: Bound on router capacity (#898 ) Currently, the proxy may cache an unbounded number of routes. In order to prevent such leaks in production, new configurations are introduced to limit the number of inbound and outbound HTTP routes. By default, we support 100 inbound routes and 10K outbound routes. In a followup, we'll introduce an eviction strategy so that capacity can be reclaimed gracefully.	2018-05-04 16:32:30 -07:00
Oliver Gould	3310968647	proxy: Refactor router implementation (#894 ) The Router's primary `call` implementation is somewhat difficult to follow. This change does not introduce any functional changes, but makes the function easier to reason about. This is being done in preparation for functional changes.	2018-05-02 15:47:36 -07:00
Oliver Gould	7f16079f64	proxy: Upgrade tower dependencies (#892 ) In order to pick up https://github.com/tower-rs/tower-grpc/pull/60, upgrade tower dependencies. This will reduce the cost of updating for upcoming tower-h2 improvements.	2018-05-02 13:40:55 -07:00
Eliza Weisman	86bb701be8	Add unit tests for `metrics::record` (#890 ) This PR adds unit tests for `metrics::record`, based on the benchmarks for the same function. Currently, there is a test that fires a single response end event and asserts that the metrics state is correct afterward, and a test that fires all the events to simulate a full connection lifetime, and asserts that the metrics state is correct afterward. I'd like to also add a test that simulates multiple events with different labels, but I'll add that in a subsequent PR. In order to add these tests, it was necessary to add test-only accessors to make some `metrics` structs `pub` so that the test can access them. I also added some test-only functions to `metrics::Histogram`s, to make them easier to make assertions about.	2018-05-02 13:26:27 -07:00
Oliver Gould	2578a47617	proxy: Do not build Arbitrary types in Docker (#889 ) When the proxy's Dockerfile ran tests, it was necessary to build Arbitrary types for quickchecking protobuf types. Now that tests have been disabled, this optional set of dependencies is no longer required. Relates to #882.	2018-05-01 14:56:24 -07:00
Oliver Gould	eb08d47347	proxy: Fix Tap ID generation (#885 ) The proxy's tap server assigns a sequential numeric ID to each inbound Tap request to assist tap lifecycle management. The server implementation keeps a local counter to keep track of tap IDs. However, this implementation is cloned for each individual tap request, so `0` is the only tap ID ever used. This change stores the tap ID counter in a shared atomic integer. Debug logging has been improved as well.	2018-05-01 11:59:45 -07:00
Oliver Gould	1801118906	Do not run tests in proxy Dockerfile (#882 ) The proxy Dockerfile includes test execution. While the intentions of this are good, it has unintended consequences: we can ship code linked with test dependencies. Because we have other means for testing proxy code (cargo, locally; and CI runs tests outside of Docker), it is fine to remove these tests.	2018-05-01 11:54:02 -07:00
Eliza Weisman	dd3b952634	proxy: Fix metrics constructor in benches (#881 ) Fixes a test compilation error.	2018-04-30 17:48:07 -07:00
Oliver Gould	ada5cb267e	proxy: Expire metrics that have not been updated for 10 minutes (#880 ) The proxy is now configured with the CONDUIT_PROXY_METRICS_RETAIN_IDLE environment variable that dictates the amount of time that the proxy will retain metrics that have not been updated. A timestamp is maintained for each unique set of labels, indicating the last time that the scope was updated. Then, when metrics are read, all metrics older than CONDUIT_PROXY_METRICS_RETAIN_IDLE are dropped from the stats registry. A ctx::test_utils module has been added to aid testing. Fixes #819	2018-04-30 16:11:12 -07:00
Oliver Gould	6512b02780	proxy: Group metrics by label (#879 ) Previously, we maintained a map of labels for each metric. Because the same keys are used in multiple scopes, this causes redundant hashing & map lookup when updating metrics. With this change, there is now only one map per unique label scope and all of the metrics for each scope are stored in the value. This makes metric insertion faster and prepares for eviction of idle metrics. The Metric type has been split into Metric, which now only holds metric metadata and is responsible for printing a given metric, and Scopes, which holds groupings of metrics by label. The metrics! macro is provided to make it easy to define Metric instances statically.	2018-04-30 15:33:09 -07:00
Oliver Gould	7247cb8ddc	proxy: Make each metric type responsible for formatting (#878 ) In order to set up for a refactor that removes the `Metric` type, the `FmtMetric` trait--implemented by `Counter`, `Gauge`, and `Histogram`--is introduced to push prometheus formatting down into each type. With this change, the `Histogram` type now relies on `Counter` (and its metric formatting) more heavily.	2018-04-30 13:00:21 -07:00
Oliver Gould	f73fdc1eae	Move `metrics::Serve` into its own module (#877 ) With this change, metrics/mod.rs now contains only metrics types.	2018-04-30 10:52:08 -07:00
Eliza Weisman	1fb15a55b3	proxy: Remove Arcs from metric labels (#873 ) This PR removes the `Arc`s from the various label types in the proxy's `metrics` modules. This should make the write side of the metrics code much more efficient (and makes the code much simpler! :D). This change was particularly easy to implement for the TCP `TransportLabels` and `TransportCloseLabels`, which consisted of only `struct`s and `enum`s, and could easily be changed to derive `Copy`. For protocol-level `RequestLabels`, the request's authority was a `String`, which still needs to be reference-counted, as the overhead of cloning `String`s is almost certainly worse than that added by ref-counting. However, rather than adding an additional `Arc<str>`, I changed `RequestLabels` to store the authority as a `http::uri::Authority`, which is backed by a `ByteStr` and thus already ref-counted. Now, when constructing `RequestLabels`, we just take another reference to the `Authority` already stored in the request context. Since `Authority` implements `fmt::Display` already, formatting the labels still works. `ResponseLabels` already store the `DstLabels` string in an `Arc`, so no additional changes there were necessary. By removing the outer `Arc` around `ResponseLabels`, we now only have to ref-count the portion of the label type that would actually be inefficient to clone. @olix0r ran the benchmarks from #874 against this branch, and it seems to be a small but noticeable improvement: ``` test record_many_dsts ... bench: 151,076 ns/iter (+/- 182,151) test record_one_conn_request ... bench: 1,599 ns/iter (+/- 209) test record_response_end ... bench: 676 ns/iter (+/- 144) ``` before: ``` test record_many_dsts ... bench: 158,403 ns/iter (+/- 130,241) test record_one_conn_request ... bench: 1,823 ns/iter (+/- 1,408) test record_response_end ... bench: 547 ns/iter (+/- 70) ``` Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-29 13:50:27 -07:00
Oliver Gould	935ccd5f78	proxy: Implement benchmarks for telemetry recording (#874 ) Before changing the telemetry implementation, we should have a means to understand the impacts of such changes. To run, you must use a nightly toolchain: ``` rustup run nightly cargo bench -p conduit-proxy -- record ```	2018-04-29 12:55:26 -07:00
Oliver Gould	64a3bb09b2	Rename `metrics::Aggregate` to `metrics::Record` (#875 ) Move `Record` into its own file.	2018-04-28 15:35:29 -07:00
Oliver Gould	9c70310406	proxy: Implement a From converter for latency::Ms (#872 ) This reduces callsite verbosity for latency measurements at the expense of a fn-level generic.	2018-04-27 17:55:44 -07:00
Eliza Weisman	9156a80a22	proxy: Add histogram unit tests (#870 ) This PR adds the unit tests for the proxy metrics module's Histogram implementation that I wrote in #775 to @olix0r's Histogram implementation added in #868. The tests weren't too difficult to adapt for the new code, and everything seems to work correctly! Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-27 17:02:13 -07:00
Oliver Gould	1cadef9516	proxy: Make `Histogram` generic over its value (#868 ) In order to support histograms measured in, for instance, microseconds, Histogram should blindly store integers without being aware of the unit. In order to accomplish this, we make `Histogram` generic over a `V: Into<u64>`, such that all values added to the histogram must be of type `V`. In doing this, we also make the histogram buckets configurable, though we maintain the same defaults used for latency values. The `Histogram` type has been moved to a new module, and the `Bucket` and `Bounds` helper types have been introduced to help make histogram logic clearer and latency-agnostic.	2018-04-27 14:43:09 -07:00
Sean McArthur	5fb4695358	proxy: wrap connections in Transport sensor before peeking (#851 ) In case there are any errors while peeking the connection to do protocol detection, the sensors will now be in place to detect them. Besides just errors, this will also allow reporting about connections that are accepted, but then immediately closed. Additionally: - add write_buf implementation for Transport sensor, which can help performance for http1/http2 - add better logs for TCP connection errors - add printlns for when tests fail Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-04-27 14:18:23 -07:00
Oliver Gould	80fdb97f88	Move `Counter` and `Gauge` to their own modules (#861 ) In preparation for a larger metrics refactor, this change splits the Counter and Gauge types into their own modules. Furthermore, this makes one minor change to these types: incr() and decr() no longer return `self`. We were not actually ever using the returned self references, and I find the unit return type to more obviously indicate the side-effecty-ness of these calls. #smpfy	2018-04-26 16:49:37 -07:00
Oliver Gould	d4d0e579c2	Introduce the `peer` label to transport metrics (#848 ) Previously, the proxy exposed separate _accept_ and _connect_ metrics for some metric types, but not for all. This leads to confusing aggregations, particularly for read and write totals. This change primarily introduces the `peer` Prometheus label (with possible values _src_ or _dst_) to indicate which side of the proxy the metric reflects. Additionally, the `received_bytes` and `sent_bytes` metrics have been renamed to `tcp_read_bytes_total` and `tcp_write_bytes_total`, respectively. This more naturally fits into existing idioms. Stream classification is not applied to these metrics, as we plan to increment them throughout stream lifetime and not only on close. The `tcp_connections_open` metric has also been renamed to `tcp_open_connections` to reflect Prometheus idioms. Finally, `msg1` and `msg2` have been constified in telemetry test fixtures so that tests are somewhat easier to read.	2018-04-25 14:06:33 -07:00
Brian Smith	0c23ad416b	Proxy: Use trust-dns-resolver for DNS. (#834 ) trust-dns-resolver is a more complete implementation. In particular, it supports CNAMEs correctly, which is needed for PR #764. It also supports /etc/hosts, which will help with issue #62. Use the 0.8.2 pre-release, since 0.8.2 hasn't been released yet. It was created at our request. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-25 10:04:49 -10:00
Eliza Weisman	3a3079d8ed	Fix assertions in metrics_compression test (#847 ) Fixes #846 The proxy `metrics_compression` test contained an assertion that a compressed scrape contained the `request_duration_ms_count` metric. This was chosen completely arbitrarily, and was only intended as an assertion that metrics were updated between compressed scrapes. Unfortunately, that metric was removed in `d9112abc93`, so when #665 merged to master, this test broke. CI didn't catch this since we don't build merges for PRs; we should probably (re)enable this in Travis? This PR fixes the test to assert on a metric that wasn't removed. Sorry for the ❌s!	2018-04-25 11:02:52 -07:00
Carl Lerche	298321a6c1	Bump proxy h2 dependency to v0.1.6. (#845 ) This release includes a number of bug fixes related to HTTP/2.0 stream management. Signed-off-by: Carl Lerche <me@carllerche.com>	2018-04-25 08:40:42 -07:00
Eliza Weisman	a2c60e8fcf	Add optional GZIP compression to proxy /metrics endpoint (#665 ) Closes #598. According to the Prometheus documentation, metrics export endpoints should support serving metrics compressed using GZIP. I've modified the proxy's `/metrics` endpoint to serve metrics compressed with GZIP when an `Accept-Encoding: gzip` request header is sent. I've also added a new unit test that attempts to get the proxy's metrics endpoint as GZIP, and asserts that the metrics are decompressed successfully. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-24 17:42:50 -07:00
Oliver Gould	3799debab5	Add destination labels to all relevant tap events (#840 ) The proxy incorrectly only added labels to response events. Destination labels should be added to all tap events sent by the proxy.	2018-04-24 17:15:13 -07:00
Eliza Weisman	5d29c270bf	proxy: Add tcp_connections_open gauge (#791 ) Depends on #785. This PR adds the `tcp_connections_open` gauge to the proxy's TCP metrics. It also adds some tests for that metric.	2018-04-24 10:17:48 -07:00
Sean McArthur	b053b16e9d	proxy tests: reduce some boilerplate, improve error information (#833 ) The `controller` part of the proxy will now use a default, removing the need to pass the exact same `controller::new().run()` in every test case. The TCP server and client will include their socket addresses in some panics. Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-04-23 18:01:51 -07:00
Eliza Weisman	d9112abc93	proxy: remove unused metrics (#826 ) This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md`. Closes #821	2018-04-23 16:05:20 -07:00
Eliza Weisman	455c99d4d4	Ignore flaky metrics tests on CI (#832 ) Fixes #831. Proxy metrics tests `transport::inbound_tcp_accept` and `transport::inbound_tcp_duration` are known to be flaky and should be ignored on CI. Note that the outbound versions of these tests were already marked as flaky, so this was almost certainly either an oversight or the result of an incorrect merge. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-23 14:29:34 -07:00
Eliza Weisman	79304cdcaf	proxy: Unbreak process_start_time_seconds metric (#825 ) The refactoring of how metrics are formatted in `674ce87588` inadvertently introduced a bug that caused the `process_start_time_seconds` metric to be formatted as just a number without the metric name. This causes Prometheus to fail with a parse error rather than accepting the metrics. I've fixed this issue, and added a unit test to detect regressions in the future.	2018-04-20 15:59:08 -07:00
Eliza Weisman	c19da70965	proxy: Add classifications to TCP close stats (#790 ) This PR adds a `classification` label to transport level metrics collected on transport close. Like the `classification` label on HTTP response metrics, the value may be either `"success"` or `"failure"`. The label value is determined based on the `clean` field on the `TransportClose` event, which indicates whether a transport closed cleanly or due to an error. I've updated the tests for transport-level metrics to reflect the addition of the new label. I'd like to also modify the test support code to allow us to close transports with errors, in order to test that the errors are correctly classified as failures.	2018-04-19 19:01:48 -07:00
Oliver Gould	b968682a10	proxy: Support destination label matching for tap (#817 ) Now, the tap server may specify that requests should be matched by destination label. For example, if the controller's Destination service returns the labels: `{"service": "users", "namespace": "prod"}` for an endpoint, then tap would be able to specify a match like `namespace=prod` to match requests destined to that namespace.	2018-04-19 17:58:45 -07:00
Eliza Weisman	674ce87588	proxy: Add transport-level metrics (#785 ) This branch adds all the transport-level Prometheus metrics as described in #742, with the exception of the `tcp_connections_open` gauge (to be added in a subsequent branch). A brief description of the metrics added in this branch: * `tcp_accept_open_total`: counter of the number of connections accepted by the proxy * `tcp_accept_close_total`: counter of the number of accepted connections that have closed * `tcp_connect_open_total`: counter of the number of connections opened by the proxy * `tcp_connect_close_total`: counter of the number of connections opened by the proxy that have been closed. * `tcp_connection_duration_ms`: histogram of the total duration of each TCP connection (incremented on connection close) * `sent_bytes`: counter of the total number of bytes sent on TCP connections (incremented on connection close) * `received_bytes`: counter of the total number of bytes received on TCP connections (incremented on connection close) These metrics are labeled with the direction (inbound or outbound) and whether the connection was proxied as raw TCP or corresponds to an HTTP request. Additionally, I've added several proxy tests for these metrics. Note that there are some cases which are currently untested; in particular, while there are tests for the `tcp_accept_close_total` counter, it's more difficult to test the `tcp_connect_close_total` counter, due to connection pooling. I'd like to improve the tests for this code in additional branches.	2018-04-19 17:27:43 -07:00
Oliver Gould	689c42263a	proxy: Add destination labels to TapEvents (#814 ) The Tap API supports key-value labels on endpoint metadata. The proxy was not setting these labels previously. In order to add these labels onto tap events, we store the original set of labels in an `Arc<HashMap>` on `DstLabels`. When tap events are emitted, the destination's labels are copied from the `DstLabels` into each event.	2018-04-19 16:57:40 -07:00
Oliver Gould	0e39d6d8fa	proxy: Remove the `Labeled` middleware in favor of client context labels (#812 ) The `Labeled` middleware is used to add `DstLabels` to each request. Now that each client context maintains a watch on its endpoint's `DstLabels`, the `Labeled` middleware can safely be removed. This has one subtle behavior change: labels are associated with requests _lazily_, whereas before they were determined _eagerly_. This means that if an endpoint's labels are updated before the telemetry system captures the labels for the request, it may use the newer labels. Previously, it would only use the labels at the time that the request originated.	2018-04-19 15:36:01 -07:00
Oliver Gould	c76dd1caea	proxy: Track destination labels in client ctx (#799 ) Currently, only the request context holds destination labels. However, destination labels are more accurately associated with the client context, since the client context is what tracks the remote peer address (and destination labels are associated with this address). No functional changes.	2018-04-19 14:22:13 -07:00
Oliver Gould	2238c91e92	proxy: Introduce the control::discovery::Endpoint type (#798 ) Building on #796, this creates a new `Endpoint` type that wraps `SocketAddr`. Still, no functional change has been introduced, but this sets up to move destination labels into the bind stack directly (by adding the labels watch to the `Endpoint` type).	2018-04-19 13:31:21 -07:00
Oliver Gould	491fae7cc4	proxy: Rewrite mock controller to accept a stream of dst updates (#808 ) Currently, the mock controller, which is used in tests, takes all of its updates a priori, which makes it hard to control when an update occurs within a test. Now, the controller exposes a `DstSender`, which wraps an unbounded channel of destination updates. This allows tests to trigger updates at a specific point in the test. In order to accomplish this, the controller's hand-rolled gRPC server implementation has been discarded in favor of a real gRPC destination service. This requires that the `controller-grpc` project now builds both clients and servers for the destination service. Additionally, we now build a tap client as well (assuming that we'll want to write tests against our tap server).	2018-04-19 11:01:10 -07:00
Oliver Gould	926c4cf323	proxy: Make control::discovery::Bind generic over its Endpoint type (#796 ) Previously, `Bind` required that it bind to `SocketAddr` (and `SocketAddr` only). This makes it hard to pass additional information from service discovery into the client's stack. To resolve this, `Bind` now has an additional `Endpoint` trait-generic type, and `Bind::bind` accepts an `Endpoint` rather than a `SocketAddr`. No additional endpoints have been introduced yet. There are no functional changes in this refactor.	2018-04-19 11:00:28 -07:00
Oliver Gould	2097d5a1db	proxy: Cleanup control::discovery (#797 ) `set_labels` was needlessly `Arc`ed. `Metadata` does not need to be public. No functional changes.	2018-04-19 10:59:24 -07:00
Oliver Gould	06dd8d90ee	Introduce the TapByResource API (#778 ) This changes the public api to have a new rpc type, `TapByResource`. This api supersedes the Tap api. `TapByResource` is richer, more closely reflecting the proxy's capabilities. The proxy's Tap api is extended to select over destination labels, corresponding with those returned by the Destination api. Now both `Tap` and `TapByResource`'s responses may include destination labels. This change avoids breaking backwards compatibility by: * introducing the new `TapByResource` rpc type, opting not to change Tap * extending the proxy's Match type with a new, optional, `destination_label` field. * `TapEvent` is extended with a new, optional, `destination_meta`.	2018-04-18 15:37:07 -07:00
Eliza Weisman	2e919bf813	Move request open timestamp to the top of the stack (#744 ) Currently, the request open timestamp, which is used for calculating latency, is captured in the `sensor::http::Http` middleware. However, the sensor middleware is placed fairly low in the stack, below some of the proxy's components that can add measurable latency (e.g. the router). This PR moves the request_open timestamp out of the `Http` middleware and into a new `TimestampRequestOpen` middleware, which is installed at the top of the stack (before the router). The `TimestampRequestOpen` middleware adds the timestamp as a request extension, so that it can later be consumed by the `Http` sensor to generate the request stats. By moving the timestamping to the top of the stack, the timestamp should more accurately cover the overhead of the proxy, but a majority of the telemetry work can still be done where it was previously. I'd like to have included unit tests for this change, but since the expected improvement is in the accuracy of latency measurements, there's no easy way to test this programmatically.	2018-04-17 15:01:36 -07:00
Eliza Weisman	6121afb6f2	Factor out reused test fixtures from telemetry tests (#782 ) This is a fairly minor refactor to the proxy telemetry tests. `b07b554d2b` added a `Fixture` in the Destination service labeling tests added in #661 to reduce the repetition of copied and pasted code in those tests. I've refactored most of the other telemetry tests to also use the test fixture. Significantly less code is copied and pasted now. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-17 14:15:56 -07:00
Sean McArthur	3cd16e8e40	proxy: clean up some logs and a few warnings in proxy tests (#780 ) Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-04-17 12:53:20 -07:00
Eliza Weisman	cf2d7b1d7d	proxy: move metrics::prometheus module to root metrics module (#763 ) The proxy `telemetry::metrics::prometheus` module was initially added in order to give the Prometheus metrics export code a separate namespace from the controller push metrics. Since the controller push metrics code was removed from the proxy in #616, we no longer need a separate module for the Prometheus-specific metrics code. Therefore, I've moved that code to the root `telemetry::metrics` module, which should hopefully make the proxy source tree structure a little simpler. This is a fairly trivial refactor. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-17 11:19:27 -07:00
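
Commit 63fbbd6931 (#909) requires duration settings to carry explicit units, so a bare `600` is no longer ambiguous. The following is a minimal Rust sketch of such a parser, not the proxy's actual code: the function `parse_duration`, the error type `ParseDurationError`, and the accepted suffixes (`ms`, `s`, `m`, `h`) are all assumptions made for illustration.

```rust
use std::time::Duration;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ParseDurationError {
    /// The value had no unit suffix, e.g. a bare "600".
    MissingUnit,
    /// The numeric part was empty, malformed, or too large.
    InvalidNumber,
    /// The suffix is not one of the assumed units.
    UnknownUnit(String),
}

/// Parses values such as "600ms", "30s", "10m", or "1h" into a `Duration`.
/// The suffix set is an assumption, not the proxy's documented syntax.
pub fn parse_duration(s: &str) -> Result<Duration, ParseDurationError> {
    let s = s.trim();
    // The magnitude is the leading run of ASCII digits; the rest is the unit.
    // A string made only of digits has no unit and is rejected.
    let split = s
        .find(|c: char| !c.is_ascii_digit())
        .ok_or(ParseDurationError::MissingUnit)?;
    let (digits, unit) = s.split_at(split);
    let n: u64 = digits
        .parse()
        .map_err(|_| ParseDurationError::InvalidNumber)?;
    // Scale to seconds with an overflow check so huge inputs return an
    // error instead of panicking.
    let secs = |factor: u64| {
        n.checked_mul(factor)
            .map(Duration::from_secs)
            .ok_or(ParseDurationError::InvalidNumber)
    };
    match unit {
        "ms" => Ok(Duration::from_millis(n)),
        "s" => Ok(Duration::from_secs(n)),
        "m" => secs(60),
        "h" => secs(60 * 60),
        other => Err(ParseDurationError::UnknownUnit(other.to_string())),
    }
}
```

Because a bare `600` is rejected with `MissingUnit`, the "600ms or 10 minutes" ambiguity described in the commit cannot arise from a value parsed this way.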

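Commit eb08d47347 (#885) fixes tap IDs that were always `0` because each clone of the tap server kept its own counter. A minimal sketch of the fix's idea, assuming a hypothetical `TapIds` type (not the proxy's actual type): the counter lives behind an `Arc`, so every clone increments the same `AtomicUsize` instead of a private copy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical ID source; cloning it shares the underlying counter.
#[derive(Clone, Debug, Default)]
pub struct TapIds {
    next: Arc<AtomicUsize>,
}

impl TapIds {
    /// Returns the next tap ID. `fetch_add` is an atomic read-modify-write,
    /// so concurrent callers never receive the same value; `Relaxed` suffices
    /// because only uniqueness matters, not ordering with other memory.
    pub fn next_id(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

Had the counter been a plain `usize` field copied into each clone, every clone would start again from `0`, which matches the bug the commit describes.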

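Commits 6512b02780 (#879), ada5cb267e (#880), and 68e203a2fc (#906) together describe metrics grouped by label scope, a last-updated timestamp per scope, eviction of idle scopes when metrics are read, and duration defaults written as const `Duration` values. A minimal sketch of that bookkeeping, assuming hypothetical names (`Scopes`, `get_mut`, `retain_active`, `DEFAULT_RETAIN_IDLE`) that are not taken from the proxy's source; only the 10-minute default comes from the title of #880.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Idle-retention default as a const `Duration` rather than a bare integer,
/// so the unit cannot be misread (10 minutes, per the title of #880).
pub const DEFAULT_RETAIN_IDLE: Duration = Duration::from_secs(10 * 60);

/// Hypothetical scope map: one entry per unique label set, holding that
/// scope's metrics and the last time they were updated.
pub struct Scopes<L, M> {
    scopes: HashMap<L, (M, Instant)>,
}

impl<L: Hash + Eq, M: Default> Scopes<L, M> {
    pub fn new() -> Self {
        Scopes { scopes: HashMap::new() }
    }

    /// Looks up (or creates) the metrics for `labels`, refreshing the
    /// scope's timestamp: one hash lookup per update, not one per metric.
    pub fn get_mut(&mut self, labels: L, now: Instant) -> &mut M {
        let entry = self
            .scopes
            .entry(labels)
            .or_insert_with(|| (M::default(), now));
        entry.1 = now;
        &mut entry.0
    }

    /// Drops every scope not updated within `retain_idle` of `now`;
    /// intended to run when metrics are read.
    pub fn retain_active(&mut self, now: Instant, retain_idle: Duration) {
        self.scopes
            .retain(|_, entry| now.duration_since(entry.1) <= retain_idle);
    }

    pub fn len(&self) -> usize {
        self.scopes.len()
    }
}
```

Passing `now` explicitly, rather than calling `Instant::now()` inside, keeps the eviction logic deterministic and easy to test.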
171 Commits
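
Commit 1cadef9516 (#868) makes `Histogram` generic over a `V: Into<u64>` with configurable buckets, so the storage is unit-agnostic. A minimal sketch of that shape, assuming a simplified design (per-bucket counts plus an overflow bucket) and a hypothetical `Ms` newtype standing in for a latency unit; the proxy's real `Bucket` and `Bounds` types are not reproduced here.

```rust
use std::marker::PhantomData;

/// Stores raw integers; the unit (ms, µs, ...) is carried only by `V`.
pub struct Histogram<V: Into<u64>> {
    /// Inclusive upper bound of each bucket, in ascending order.
    bounds: Vec<u64>,
    /// Per-bucket (non-cumulative) counts, plus a final overflow bucket.
    buckets: Vec<u64>,
    sum: u64,
    _value: PhantomData<V>,
}

impl<V: Into<u64>> Histogram<V> {
    pub fn new(bounds: Vec<u64>) -> Self {
        let n = bounds.len() + 1;
        Histogram { bounds, buckets: vec![0; n], sum: 0, _value: PhantomData }
    }

    pub fn add(&mut self, value: V) {
        let v: u64 = value.into();
        // First bucket whose upper bound is >= v; otherwise the overflow bucket.
        let idx = self
            .bounds
            .iter()
            .position(|&b| v <= b)
            .unwrap_or(self.bounds.len());
        self.buckets[idx] += 1;
        self.sum += v;
    }

    pub fn count(&self) -> u64 {
        self.buckets.iter().sum()
    }

    pub fn sum(&self) -> u64 {
        self.sum
    }

    pub fn buckets(&self) -> &[u64] {
        &self.buckets
    }
}

/// Hypothetical millisecond unit; only values of this type can be added
/// to a `Histogram<Ms>`.
pub struct Ms(pub u64);

impl From<Ms> for u64 {
    fn from(ms: Ms) -> u64 {
        ms.0
    }
}
```

A Prometheus exporter would still need to emit cumulative `le` buckets, which it can derive by running a prefix sum over these per-bucket counts.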