Commit Graph

218 Commits

Author SHA1 Message Date
Oliver Gould f4dba72cc3
proxy: Track SingleUse services against router capacity (#902)
PR #898 introduces capacity limits to the balancer. However, because the
router supports "single-use" routes--routes that are bound only for the
life of a single HTTP1 request--it is easy for a router to exceed its
configured capacity.

In order to fix this, the `Reuse` type is removed from the router
library so that _all_ routes are considered cacheable. It's now the
responsibility of the bound service to enforce policies with regards to
client retention.

Routes were not added to the cache when the service could not be used to
process more than a single request. Now, `Bind` wraps its returned
services (via the `Binding` type), that dictate whether a single client
is reused or if one is bound for each request.

This enables all routes to be cached without changing behavior with
regards to connection reuse.
2018-05-08 10:57:56 -07:00
Oliver Gould 02e6d018d0
proxy: Bound on router capacity (#898)
Currently, the proxy may cache an unbounded number of routes. In order
to prevent such leaks in production, new configurations are introduced
to limit the number of inbound and outbound HTTP routes. By default, we
support 100 inbound routes and 10K outbound routes.

In a followup, we'll introduce an eviction strategy so that capacity can
be reclaimed gracefully.
2018-05-04 16:32:30 -07:00
Oliver Gould 3310968647
proxy: Refactor router implementation (#894)
The Router's primary `call` implementation is somewhat difficult to
follow.

This change does not introduce any functional changes, but makes the
function easier to reason about.

This is being done in preparation for functional changes.
2018-05-02 15:47:36 -07:00
Oliver Gould 7f16079f64
proxy: Upgrade tower dependencies (#892)
In order to pick up https://github.com/tower-rs/tower-grpc/pull/60,
upgrade tower dependencies. This will reduce the cost of updating
for upcoming tower-h2 improvements.
2018-05-02 13:40:55 -07:00
Eliza Weisman 86bb701be8
Add unit tests for `metrics::record` (#890)
This PR adds unit tests for `metrics::record`, based on the benchmarks for the
same function. Currently, there is a test that fires a single response end event
and asserts that the metrics state is correct afterward, and a test that fires
all the events to simulate a full connection lifetime, and asserts that the
metrics state is correct afterward. I'd like to also add a test that simulates
multiple events with different labels, but I'll add that in a subsequent PR,

In order to add these tests, it was necessary to to add test-only accessors
to make some `metrics` structs `pub`` so that the test can access them. 
I also added some test-only functions to `metrics::Histogram`s, to make 
them easier to make assertions about.
2018-05-02 13:26:27 -07:00
Oliver Gould 2578a47617
proxy: Do not build Arbitrary types in Docker (#889)
When the proxy's Dockerfile ran tests, it was necessary to build
Arbitrary types for quickchecking protobuf types.

Now that tests have been disabled, this optional set of dependencies is
no longer required.

Relates to #882.
2018-05-01 14:56:24 -07:00
Oliver Gould eb08d47347
proxy: Fix Tap ID generation (#885)
The proxy's tap server assigns a sequential numeric ID to each inbound
Tap request to assist tap lifecycle management.

The server implementation keeps a local counter to keep track of tap
IDs. However, this implementation is cloned for each individual tap
requests, so `0` the only tap ID ever used.

This change moves the Tap ID to be stored in a shared atomic integer.

Debug logging has been improved as well.
2018-05-01 11:59:45 -07:00
Oliver Gould 1801118906
Do not run tests in proxy Dockerfile (#882)
The proxy Dockerfile includes test execution. While the intentions of
this are good, it has unintended consequences: we can ship code linked
with test dependencies.

Because we have other means for testing proxy code (cargo, locally; and
CI runs tests outside of Docker), it is fine to remove these tests.
2018-05-01 11:54:02 -07:00
Eliza Weisman dd3b952634 proxy: Fix metrics constructor in benches (#881)
Fixes a test compilation error.
2018-04-30 17:48:07 -07:00
Oliver Gould ada5cb267e
proxy: Expire metrics that have not been updated for 10 minutes (#880)
The proxy is now configured with the CONDUIT_PROXY_METRICS_RETAIN_IDLE
environment variable that dictates the amount of time that the proxy will retain
metrics that have not been updated.

A timestamp is maintained for each unique set of labels, indicating the last time
that the scope was updated. Then, when metrics are read, all metrics older than
CONDUIT_PROXY_METRICS_RETAIN_IDLE are dropped from the stats registry.

A ctx::test_utils module has been added to aid testing.

Fixes #819
2018-04-30 16:11:12 -07:00
Oliver Gould 6512b02780
proxy: Group metrics by label (#879)
Previously, we maintained a map of labels for each metric. Because the same keys are used
in multiple scopes, this causes redundant hashing & map lookup when updating metrics.

With this change, there is now only one map per unique label scope and all of the metrics
for each scope are stored in the value. This makes metrics inserting faster and prepares
for eviction of idle metrics.

The Metric type has been split into Metric, which now only holds metric metadata and is
responsible for printing a given metric, and Scopes which holds groupings of metrics by
label.

The metrics! macro is provided to make it easy to define Metric instances statically.
2018-04-30 15:33:09 -07:00
Oliver Gould 7247cb8ddc
proxy: Make each metric type responsible for formatting (#878)
In order to set up for a refactor that removes the `Metric` type, the
`FmtMetric` trait--implemented by `Counter`, `Gauge`, and
`Histogram`--is introduced to push prometheus formatting down into each
type.

With this change, the `Histogram` type now relies on `Counter` (and its
metric formatting) more heavily.
2018-04-30 13:00:21 -07:00
Oliver Gould f73fdc1eae
Move `metrics::Serve` into its own module (#877)
With this change, metrics/mod.rs now contains only metrics types.
2018-04-30 10:52:08 -07:00
Eliza Weisman 1fb15a55b3
proxy: Remove Arcs from metric labels (#873)
This PR removes the `Arc`s from the various label types in the proxy's 
`metrics` modules. This should make the write side of the metrics code
much more efficient (and makes the code much simpler! :D).

This change was particularly easy to implement for the TCP `TransportLabels`
and `TransportCloseLabels`, which consisted of only `struct`s and `enum`s,
and could easily be changed to derive `Copy`. 

For protocol-level `RequestLabels`, the request's authority was a `String`,
which still needs to be reference-counted, as the overhead of cloning `String`s
is almost certainly worse than that added by ref-counting. However, rather than
adding an additional `Arc<str>`, I changed `RequestLabels` to store the 
authority as a `http::uri::Authority`, which is backed by a `ByteStr` and thus
already ref-counted. Now, when constructing `RequestLabels`, we just take 
another reference to the `Authority` already stored in the request context.
Since `Authority` implements `fmt::Display` already, formatting the labels 
still works.

`ResponseLabels` already store the `DstLabels` string in an `Arc`, so no
additional changes there were necessary. By removing the outer `Arc` around
`ResponseLabels`, we now only have to ref-count the portion of the label type
that would actually be inefficient to clone.

@olix0r ran the benchmarks from #874 against this branch, and it seems to be 
a small but noticeable improvement:
```
test record_many_dsts        ... bench:     151,076 ns/iter (+/- 182,151)
test record_one_conn_request ... bench:       1,599 ns/iter (+/- 209)
test record_response_end     ... bench:         676 ns/iter (+/- 144)
```

before:
```
test record_many_dsts        ... bench:     158,403 ns/iter (+/- 130,241)
test record_one_conn_request ... bench:       1,823 ns/iter (+/- 1,408)
test record_response_end     ... bench:         547 ns/iter (+/- 70)
```

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-29 13:50:27 -07:00
Oliver Gould 935ccd5f78
proxy: Implement benchmarks for telemetry recording (#874)
Before changing the telemetry implementation, we should have a means to
understand the impacts of such changes.

To run, you must use a nightly toolchain:
```
rustup run nightly cargo bench -p conduit-proxy -- record
```
2018-04-29 12:55:26 -07:00
Oliver Gould 64a3bb09b2
Rename `metrics::Aggregate` to `metrics::Record` (#875)
Move `Record` into its own file.
2018-04-28 15:35:29 -07:00
Oliver Gould 9c70310406
proxy: Implement a From converter for latency::Ms (#872)
This reduces callsite verbosity for latency measurements at the expense of a fn-level
generic.
2018-04-27 17:55:44 -07:00
Eliza Weisman 9156a80a22
proxy: Add histogram unit tests (#870)
This PR adds the unit tests for the proxy metrics module's Histogram 
implementation that I wrote in #775 to @olix0r's Histogram implementation
added in #868. The tests weren't too difficult to adapt for the new code,
and everything seems to work correctly!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-27 17:02:13 -07:00
Oliver Gould 1cadef9516
proxy: Make `Histogram` generic over its value (#868)
In order to support histograms measured in, for instance, microseconds,
Histogram should blind store integers without being aware of the unit.
In order to accomplish this, we make `Histogram` generic over a `V:
Into<u64>`, such that all values added to the histogram must be of type
`V`.

In doing this, we also make the histogram buckets configurable, though
we maintain the same defaults used for latency values.

The `Histogram` type has been moved to a new module, and the `Bucket`
and `Bounds` helper types have been introduced to help make histogram
logic clearer and latency-agnostic.
2018-04-27 14:43:09 -07:00
Sean McArthur 5fb4695358
proxy: wrap connections in Transport sensor before peeking (#851)
In case there are any errors while peeking the connection to do protocol
detection, the sensors will now be in place to detect them. Besides just
errors, this will also allow reporting about connections that are
accepted, but then immediately closed.

Additionally:

- add write_buf implementation for Transport sensor, can help
  performance for http1/http2
- add better logs for tcp connections errors
- add printlns for when tests fail

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-04-27 14:18:23 -07:00
Oliver Gould 80fdb97f88
Move `Counter` and `Gauge` to their own modules (#861)
In preparation for a larger metrics refactor, this change splits the
Counter and Gauge types into their own modules.

Furthermore, this makes the minor change to these types: incr() and
decr() no longer return `self`. We were not actually ever using the
returned self references, and I find the unit return type to more
obviously indicate the side-effecty-ness of these calls. #smpfy
2018-04-26 16:49:37 -07:00
Oliver Gould d4d0e579c2
Introduce the `peer` label to transport metrics (#848)
Previously, the proxy exposed separate _accept_ and _connect_ metrics
for some metric types, but not for all. This leads to confusing
aggregations, particularly for read and write taotals.

This change primarily introduces the `peer` prometheus label (with
possible values _src_ or _dst_) to indicate which side of the proxy the
metric reflects.

Additionally, the `received_bytes` and `sent_bytes` metrics have been
renamed as `tcp_read_bytes_total` and `tcp_write_bytes_total`,
resectively. This more naturally fits into existing idioms.  Stream
classification is not applied to these metrics, as we plan to increment
them throughout stream lifetime and not only on close.

The `tcp_connections_open` metric has also been renamed to
`tcp_open_connections` to reflect Prometheus idioms.

Finally, `msg1` and `msg2` have been constified in telemetry test
fixtures so that tests are somewhat easier to read.
2018-04-25 14:06:33 -07:00
Brian Smith 0c23ad416b
Proxy: Use trust-dns-resolver for DNS. (#834)
trust-dns-resolver is a more complete implementation. In particular,
it supports CNAMES correctly, which is needed for PR #764. It also
supports /etc/hosts, which will help with issue #62.

Use the 0.8.2 pre-release since it hasn't been released yet. It was
created at our request.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 10:04:49 -10:00
Eliza Weisman 3a3079d8ed
Fix assertions in metrics_compression test (#847)
Fixes #846 

The proxy `metrics_compression` test contained an assertion that a compressed scrape contained the `request_duration_ms_count` metric. This was chosen completely arbitrarily, and was only intended as an assertion that metrics were updated between compressed scrapes. Unfortunately, that metric was removed in d9112abc93, so when #665 merged to master, this test broke. CI didn't catch this since we don't build merges for PRs --- we should probably (re)enable this in Travis?

This PR fixes the test to assert on a metric that wasn't removed. Sorry for the s!
2018-04-25 11:02:52 -07:00
Carl Lerche 298321a6c1
Bump proxy h2 dependency to v0.1.6. (#845)
This release includes a number of bug fixes related to HTTP/2.0 stream
management.

Signed-off-by: Carl Lerche <me@carllerche.com>
2018-04-25 08:40:42 -07:00
Eliza Weisman a2c60e8fcf
Add optional GZIP compression to proxy /metrics endpoint (#665)
Closes #598.

According to the Prometheus documentation, metrics export endpoints should support serving metrics compressed using GZIP. I've modified the proxy's `/metrics` endpoint to serve metrics compressed with GZIP when an `Accept-Encoding: gzip` request header is sent.

I've also added a new unit test that attempts to get the proxy's metrics endpoint as GZIP, and asserts that the metrics are decompressed successfully.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-24 17:42:50 -07:00
Oliver Gould 3799debab5
Add destination labels to all relevant tap events (#840)
The proxy incorrectly only added labels to response events. Destination
labels should be added to all tap events sent by the proxy.
2018-04-24 17:15:13 -07:00
Eliza Weisman 5d29c270bf
proxy: Add tcp_connections_open gauge (#791)
Depends on #785.

This PR adds the `tcp_connections_open` gauge to the proxy's TCP metrics. It also adds some tests for that metric.
2018-04-24 10:17:48 -07:00
Sean McArthur b053b16e9d
proxy tests: reduce some boilerplate, improve error information (#833)
The `controller` part of the proxy will now use a default, removing the
need to pass the exact same `controller::new().run()` in every test
case.

The TCP server and client will include their socket addresses in some
panics.

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-04-23 18:01:51 -07:00
Eliza Weisman d9112abc93
proxy: remove unused metrics (#826)
This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md`

Closes #821
2018-04-23 16:05:20 -07:00
Eliza Weisman 455c99d4d4
Ignore flaky metrics tests on CI (#832)
Fixes #831.

Proxy metrics tests `transport::inbound_tcp_accept` and `transport::inbound_tcp_duration` are known to be flaky and should be ignored on CI. Note that the outbound versions of these tests were already marked as flaky, so this was almost certainly either an oversight or the result of an incorrect merge.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-23 14:29:34 -07:00
Eliza Weisman 79304cdcaf
proxy: Unbreak process_start_time_seconds metric (#825)
The refactoring of how metrics are formatted in 674ce87588 inadvertently introduced a bug that caused the `process_start_time_seconds` metric to be formatted as just a number without the metric name. This causes Prometheus to fail with a parse error rather than accepting the metrics.

I've fixed this issue, and added a unit test to detect regressions in the future.
2018-04-20 15:59:08 -07:00
Eliza Weisman c19da70965
proxy: Add classifications to TCP close stats (#790)
This PR adds a `classification` label to transport level metrics collected on transport close. Like the `classification` label on HTTP response metrics, the value may be either `"success"` or `"failure"`. The label value is determined based on the `clean` field on the `TransportClose` event, which indicates whether a transport closed cleanly or due to an error.

I've updated the tests for transport-level metrics to reflect the addition of the new label. I'd like to also modify the test support code to allow us to close transports with errors, in order to test that the errors are correctly classified as failures.
2018-04-19 19:01:48 -07:00
Oliver Gould b968682a10
proxy: Support destination label matching for tap (#817)
Now, the tap server may specify that requests should be matched by destination
label.

For example, if the controller's Destination service returns the labels:
`{"service": "users", "namespace": "prod"}` for an endpoint, then tap would be
able to specify a match like `namespace=prod` to match requests destined to
that namespace.
2018-04-19 17:58:45 -07:00
Eliza Weisman 674ce87588
proxy: Add transport-level metrics (#785)
This branch adds all the transport-level Prometheus metrics as described in #742, with the exception of the `tcp_connections_open` gauge (to be added in a subsequent branch). 

A brief description of the metrics added in this branch:
* `tcp_accept_open_total`: counter of the number of connections accepted by the proxy
* `tcp_accept_close_total`: counter of the number of accepted connections that have closed
* `tcp_connect_open_total`: counter of the number of connections opened by the proxy
* `tcp_connect_close_total`: counter of the number of connections opened by the proxy that have been closed.
* `tcp_connection_duration_ms`: histogram of the total duration of each TCP connection (incremented on connection close)
* `sent_bytes`: counter of the total number of bytes sent on TCP connections (incremented on connection close)
* `received_bytes`: counter of the total number of bytes received on TCP connections (incremented on connection close)

 These metrics are labeled with the direction (inbound or outbound) and whether the connection was proxied as raw TCP or corresponds to an HTTP request. 

Additionally, I've added several proxy tests for these metrics. Note that there are some cases which are currently untested; in particular, while there are tests for the `tcp_accept_close_total` counter, it's more difficult to test the `tcp_connect_close_total` counter, due to connection pooling. I'd like to improve the tests for this code in additional branches.
2018-04-19 17:27:43 -07:00
Oliver Gould 689c42263a
proxy: Add destination labels to TapEvents (#814)
The Tap API supports key-value labels on endpoint metadata. The proxy was not
setting these labels previously.

In order to add these labels onto tap events, we store the original set of
labels in an `Arc<HashMap>` on `DstLabels`. When tap events are emitted, the
destination' labels are copied from the `DstLabels` into each event.
2018-04-19 16:57:40 -07:00
Oliver Gould 0e39d6d8fa
proxy: Remove the `Labeled` middleware in favor of client context labels (#812)
The `Labeled` middleware is used to add `DstLabels` to each request. Now that
each client context maintains a watch on its endpoint's `DstLabels`, the
`Labeled` middleware can safely be removed.

This has one subtle behavior change: labels are associated with requests
_lazily_, whereas before they were determined _eagerly_. This means that if an
endpoints labels are updated before the telemetry system captures the labels
for the request, it may use the newer labels. Previously, it would only use the
labels at the time that the request originated.
2018-04-19 15:36:01 -07:00
Oliver Gould c76dd1caea
proxy: Track destination labels in client ctx (#799)
Currently, only the request context holds destination labels. However,
destination labels are more accurately associated with the client context,
since the client context is what tracks the remote peer address (and
destination labels are associated with this address).

No functional changes.
2018-04-19 14:22:13 -07:00
Oliver Gould 2238c91e92
proxy: Introduce the control::discovery::Endpoint type (#798)
Building on #796, this creates a new `Endpoint` type that wraps `SocketAddr`.

Still, no functional change has been introduced, but this sets up to move
destination labels into the bind stack directly (by adding the labels watch to
the `Endpoint` type).
2018-04-19 13:31:21 -07:00
Oliver Gould 491fae7cc4
proxy: Rewrite mock controller to accept a stream of dst updates (#808)
Currently, the mock controller, which is used in tests, takes all of its
updates a priori, which makes it hard to control when an update occurs within a
test.

Now, the controller exposes a `DstSender`, which wraps an unbounded channel of
destination updates. This allows tests to trigger updates at a specific point
in the test.

In order to accomplish this, the controller's hand-rolled gRPC server
implementation has been discarded in favor of a real gRPC destination service.
This requires that the `controller-grpc` project now builds both clients
and servers for the destination service. Additionally, we now build a tap
client as well (assuming that we'll want to write tests against our tap
server).
2018-04-19 11:01:10 -07:00
Oliver Gould 926c4cf323
proxy: Make control::discovery::Bind generic over its Endpoint type (#796)
Previously, `Bind` required that it bind to `SocketAddr` (and `SocketAddr`
only). This makes it hard to pass additional information from service discovery
into the client's stack.

To resolve this, `Bind` now has an additional `Endpoint` trait-generic type,
and `Bind::bind` accepts an `Endpoint` rather than a `SocketAddr`.

No additional endpoints have been introduced yet. There are no functional
changes in this refactor.
2018-04-19 11:00:28 -07:00
Oliver Gould 2097d5a1db
proxy: Cleanup control::discovery (#797)
`set_labels` was needlessly `Arc`ed.

`Metadata` does not need to be public.

No functional changes.
2018-04-19 10:59:24 -07:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Eliza Weisman 2e919bf813
Move request open timestamp to the top of the stack (#744)
Currently, the request open timestamp, which is used for calculating latency, is captured in the `sensor::http::Http` middleware. However, the sensor middleware is placed fairly low in the stack, below some of the proxy's components that can add measurable latency (e.g. the router).

This PR moves the request_open timestamp out of the `Http` middleware and into a new `TimestampRequestOpen` middleware, which is installed at the top of the stack (before the router). The `TimestampRequestOpen` middleware adds the timestamp as a request extension, so that it can later be consumed by the `Http` sensor to generate the request stats.

By moving the timestamping to the top of the stack, the timestamp should more accurately cover the overhead of the proxy, but a majority of the telemetry work can still be done where it was previously.

I'd like to have included unit tests for this change, but since the expected improvement is in the accuracy of latency measurements, there's no easy way to test this programmatically.
2018-04-17 15:01:36 -07:00
Eliza Weisman 6121afb6f2
Factor out reused test fixtures from telemetry tests (#782)
This is a fairly minor refactor to the proxy telemetry tests. b07b554d2b added a `Fixture` in the Destination service labeling tests added in #661 to reduce the repetition of copied and pasted code in those tests. I've refactored most of the other telemetry tests to also use the test fixture. Significantly less code is copied and pasted now.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-17 14:15:56 -07:00
Sean McArthur 3cd16e8e40
proxy: clean up some logs and a few warnings in proxy tests (#780)
Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-04-17 12:53:20 -07:00
Eliza Weisman cf2d7b1d7d
proxy: move metrics::prometheus module to root metrics module (#763)
The proxy `telemetry::metrics::prometheus` module was initially added in order to give the Prometheus metrics export code a separate namespace from the controller push metrics. Since the controller push metrics code was removed from the proxy in #616, we no longer need a separate module for the Prometheus-specific metrics code. Therefore, I've moved that code to the root `telemetry::metrics` module, which should hopefully make the proxy source tree structure a little simpler.

This is a fairly trivial refactor.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-17 11:19:27 -07:00
Eliza Weisman 64f4dfe07f
Refactor control::Cache and add tests (#733)
Closes #713. This is a follow-up from #688.

This PR makes a number of refactorings to the proxy's `control::Cache` module and removes all but one of the `clone` calls. 

The `CacheChange` enum now contains the changed key and a reference to the changed value when applicable. This simplifies `on_change` functions, which no longer have to take both a tuple of `(K, V)` and a `CacheChange` and can now simply destructure the `CacheChange`, and since the changed value is passed as a reference, the `on_change` function can now decide whether or not it should be cloned. This means that we can remove a majority of the clones previously present here.

I've also rewritten `Cache::update_union` so that it no longer clones values (twice if the cache was invalidated). There's still one `clone` call in `Cache::update_intersection`, but it seems like it will be fairly tricky to remove. However, I've moved the `V: Clone` bound to that function specifically. `Cache::clear` and `Cache::update_union` so that they no longer call `Cache::update_intersection` internally, so they don't need a `V: Clone` bound.

In addition, I've added some unit tests that test that `on_change` is called with the correct `CacheChange`s when key/value pairs are modified.
2018-04-16 16:42:55 -07:00
Brian Smith 621f3c2e56
Revert "Avoid `cargo fetch --locked` in proxy/Dockerfile. (#593)" (#767)
This reverts commit d38a2acff8.

The change being reverted here did reduce downloads that occur when
Cargo.lock is updated. However, it had the unwanted side-effect of
invalidating at least part of the Cargo download cache when other
files, including in particular files under proto/, were modified.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-16 13:27:49 -10:00
Brian Smith 0c37067554
Reduce proto dependencies in proxy/Dockerfile (#765)
Reduce the dependencies on files under proto/ to eliminate Docker
detecting false dependencies that trigger rebuilds.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-13 14:49:55 -10:00
Oliver Gould cc44db054f
Remove NODE_NAME and POD_NAME env usage (#758)
* proxy: Remove pod_name and node_name

* cli: Do not inject POD_NAME and NODE_NAME env vars
2018-04-13 13:09:51 -07:00
Oliver Gould efdfc93b50
Stop pushing telemetry reports from the proxy (#616)
Now that the controller does not depend on pushed telemetry reports, the
proxy need not depend on the telemetry API or maintain legacy sampling
logic.
2018-04-12 17:39:29 -07:00
Eliza Weisman b6180d8bfe
Add unit tests for Labeled middleware (#738)
I've added unit tests for the `Labeled` middleware used to add Destination labels in the proxy, as @olix0r requested in https://github.com/runconduit/conduit/pull/661#discussion_r179897783. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-12 15:10:01 -07:00
Eliza Weisman 61d15a6c3e
Ignore flaky telemetry tests on CI (#752)
The tests for label metadata updates from the control plane are flaky on CI. This is likely due to the CI containers not having enough cores to execute the test proxy thread, the test proxy's controller client thread, the mock controller thread, and the test server thread simultaneously --- see #751 for more information. 

For now, I'm ignoring these on CI. Eventually, I'd like to change the mock controller code in test support so that we can trigger it to send a second metadata update only after the request has finished.

I think this issue also makes merging #738 a higher priority, so that we can still have some tests running on CI that exercise some part of the label update behaviour.
2018-04-12 14:59:17 -07:00
Eliza Weisman b07b554d2b
Add labels from service discovery to proxy metrics reports (#661)
PR #654 adds pod-based metric labels to the Destination API responses for cluster-local services. 

This PR modifies the proxy to actually add these labels to reported Prometheus metrics for outbound requests to local services. 

It enhances the proxy's `control::discovery` module to track these labels and add a `LabelRequest` middleware to the service stack built in `Bind` for labeled services. Requests transiting `LabelRequest` are given an `Extension` which contains these labels, which are then added to events produced by the `Sensors` for these requests. When these events are aggregated to Prometheus metrics, the labels are added.

I've also added some tests in `test/telemetry.rs` ensuring that these metrics are added correctly when the Destination service provides labels.

Closes #660

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-12 12:54:38 -07:00
Sean McArthur 7f54b5253d
proxy: fix flaky tcp graceful shutdown test (#735) 2018-04-10 19:47:00 -07:00
Sean McArthur 02c6887020
proxy: improve graceful shutdown process (#684)
- The listener is immediately closed on receipt of a shutdown signal.
- All in-progress server connections are now counted, and the process will
  not shutdown until the connection count has dropped to zero.
- In the case of HTTP1, idle connections are closed. In the case of HTTP2,
  the HTTP2 graceful shutdown steps are followed of sending various
  GOAWAYs.
2018-04-10 14:15:37 -07:00
Brian Smith 7319cf648f
Proxy: Do L7 load balancing for all external HTTP services. (#726)
Previously when the proxy could tell, by parsing, the request-target
is not in the cluster, it would not override the destination. That is,
load balancing would be disabled for such destinations.

With this change, the proxy will do L7 load balancing for all HTTP
services as long as the request-target has a DNS name.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-10 08:07:16 -10:00
Brian Smith bc16034fd6
Proxy: Fall back to using DNS when Destination service can't find service. (#692)
Fixes #155.
2018-04-07 18:26:06 -10:00
Brian Smith c25e9c371b
Refactor poll_destination() in service discovery. (#725)
No change in behavior is intended here.

Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-07 18:15:19 -10:00
Brian Smith 7d3b715c4d
Proxy: Move DNS name normalization to service discovery (#722)
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 15:04:09 -10:00
Eliza Weisman 8bc05472ed
Make `control::Cache` key-value in order to store discovery metadata (#688)
This PR changes the proxy's `control::Cache` module from a set to a key-value map. 

This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type.

Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements. 

Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata.

If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.
2018-04-06 13:54:16 -07:00
Brian Smith 15037d9618
Proxy: Improve DNS name parsing (#708)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 08:45:18 -10:00
Eliza Weisman 605e68dff6
Add pretty durations to panics from `assert_eventually!` (#677)
This PR adds the pretty-printing for durations I added in #676 to the panic message from the `assert_eventually!` macro added in #669. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-06 10:49:17 -07:00
Brian Smith c31f4ba993
Remove unused conversions for Destination. (#701)
These have not been used for a while; they are dead code.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 07:35:35 -10:00
Brian Smith 7bc4ffd0a4
Revert "Proxy: Refactor DNS name parsing and normalization (#673)" (#700)
This reverts commit 311ef410a8.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:49:32 -10:00
Brian Smith 1b223723bc
Revert "Proxy: Refactor poll_destination() in service discovery. (#674)" (#698)
This reverts commit 4fb9877b89.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:36:01 -10:00
Brian Smith 4fb9877b89
Proxy: Refactor poll_destination() in service discovery. (#674)
No change in behavior is intended here.

Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 13:05:11 -10:00
Brian Smith 311ef410a8
Proxy: Refactor DNS name parsing and normalization (#673)
Proxy: Refactor DNS name parsing and normalization

Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 12:32:12 -10:00
Eliza Weisman 18fa42ebd0
Pretty-print durations in log messages (#676)
This branch adds simple pretty-printing to duration in log timeout messages. If the duration is >= 1 second, it's printed in seconds with a fractional part. If the duration is less than 1 second, it is printed in milliseconds. This simple formatting may not be sufficient as a formatting rule for all cases, but should be sufficient for printing our relatively small timeouts.

Log messages now look something like this:
```
ERROR 2018-04-04T20:05:49Z: conduit_proxy: turning operation timed out after 100 ms into 500
```

Previously, they looked like this:
```
ERROR 2018-04-04T20:07:26Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
```

I made this change partially because I wanted to make the panics from the `eventually!` macro added in #669 more readable.
2018-04-05 13:47:19 -07:00
Eliza Weisman 49bf01b0da
Add `assert_eventually!` macro to help de-flake telemetry tests (#669)
Closes #615.

Based on @olix0r's suggestion in https://github.com/runconduit/conduit/issues/613#issuecomment-376024744, this PR adds an `assert_eventually!` macro to retry an assertion a set number of times, waiting for 15 ms between retries. This is loosely based on ScalaTest's [eventually](http://doc.scalatest.org/1.8/org/scalatest/concurrent/Eventually.html).

I've rewritten the flaky telemetry tests to use the `assert_eventually!` macro, to compensate for delays in the served metrics being updated between client requests and metrics scrapes.
2018-04-05 11:23:34 -07:00
Eliza Weisman 6b370b4466
Split labels out of `prometheus.rs` into its own file (#680)
The proxy's `telemetry/metrics/prometheus.rs` file was starting to get long and hard to find one's way around in. I split the prometheus labels code out into a separate submodule and `RequestLabels` and `ResponseLabels` public. This seems like a reasonable division of the code, and the resultant files are much easier to read.
2018-04-04 15:49:17 -07:00
Oliver Gould 2dc964c583
Move control::discovery::Cache into its own module (#672)
The proxy's control::discovery module is becoming a bit dense in terms
of what it implements.

In order to make this code more understandable, and to be able to use a
similar caching strategy in other parts of the controller, the
`control::cache` module now holds discovery's cache implementation.

This module is only visible within the `control` module, and it now
exposes two new public methods: `values()` and
`set_reset_on_next_modification()`.
2018-04-04 14:27:04 -07:00
Eliza Weisman 01628bfa43
Fix missing comma in gRPC status code labels (#670)
Fixes the issue caught by @olix0r in https://github.com/runconduit/conduit/pull/661#issuecomment-378431155
2018-04-04 10:41:21 -07:00
Brian Smith 06bf78ccdf
Use Rust 1.25 to build Docker images. (#667)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-03 08:22:29 -10:00
Phil Calçado 19001f8d38 Add pod-based metric_labels to destinations response (#429) (#654)
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix mock controller in proxy tests (#656)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Address review feedback
* Rename files in the destination package

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-02 18:36:57 -07:00
Sean McArthur 47f9665b8e
proxy: allow disable protocol detection on specific ports (#648)
- Adds environment variables to configure a set of ports that, when an
  incoming connection has an SO_ORIGINAL_DST with a port matching, will
  disable protocol detection for that connection and immediately start a
  TCP proxy.
- Adds a default list of well known ports: SMTP and MySQL.

Closes #339
2018-04-02 14:24:36 -07:00
Brian Smith f931dec3b3
Proxy: Completely replace current set of destinations on reconnect (#632)
Previosuly, when the proxy was disconnected from the Destination
service and then reconnects, the proxy would not forget old, outdated
entries in its cache of endpoints. If those endpoints had been removed
while the proxy was disconnected then the proxy would never become
aware of that.

Instead, on the first message after a reconnection, replace the entire
set of cached entries with the new set, which may be empty.

Prior to this change, the new test
outbound_destinations_reset_on_reconnect_followed_by_no_endpoints_exists
passed already
but outbound_destinations_reset_on_reconnect_followed_by_add_none
and outbound_destinations_reset_on_reconnect_followed_by_remove_none
failed. Now all these tests pass.

Fixes #573

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-29 16:50:08 -10:00
Brian Smith bae72c32ed
Proxy: Factor out Destination service connection logic (#631)
* Proxy: Factor out Destination service connection logic

Centralize the connection initiation logic for the Destination service
to make it easier to maintain. Clarify that the `rx` field isn't needed
prior to a (re)connect.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Rename `rx` to `query`.

Signed-off-by: Brian Smith <brian@briansmith.org>

* "recoonect" -> "reconnect"

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-29 08:20:57 -10:00
Eliza Weisman c688cf6028
Add response classification to proxy metrics (#639)
This PR adds a `classification` label to proxy response metrics, as @olix0r described in https://github.com/runconduit/conduit/issues/634#issuecomment-376964083. The label is either "success" or "failure", depending on the following rules:
+ **if** the response had a gRPC status code, *then*
   - gRPC status code 0 is considered a success
   - all others are considered failures
+ **else if** the response had an HTTP status code, *then*
  - status codes < 500 are considered success,
  - status codes >= 500 are considered failures
+ **else if** the response stream failed **then**
  - the response is a failure.

I've also added end-to-end tests for the classification of HTTP responses (with some work towards classifying gRPC responses as well). Additionally, I've updated `doc/proxy_metrics.md` to reflect the added `classification` label.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-03-28 14:49:00 -07:00
Eliza Weisman 40b9b345a5
All counters in proxy telemetry wrap on overflows (#603)
In #602, @olix0r suggested that telemetry counters should wrap on overflows, as "most timeseries systems (like prometheus) are designed to handle this case gracefully."

This PR changes counters to use explicitly wrapping arithmetic.

Closes #602.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-03-27 14:03:12 -07:00
Brian Smith 7dc21f9588
Add the NoEndpoints message to the Destination API (#564)
Have the controller tell the client whether the service exists, not
just what are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambigious names.

Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 10:45:41 -10:00
Eliza Weisman e7aa3d4105
Add process_start_time_seconds Prometheus metric (#628)
As described in #619. `process_start_time_seconds` is the idiomatic way of reporting to Prometheus the uptime of a process. It should contain the time in seconds since the beginning of the Unix epoch.

The proxy now exports this metric:
```
➜ http get localhost:4191/metrics
HTTP/1.1 200 OK
Content-Length: 902
Content-Type: text/plain; charset=utf-8
Date: Mon, 26 Mar 2018 22:09:55 GMT

# HELP request_total A counter of the number of requests the proxy has received.
# TYPE request_total counter

# HELP request_duration_ms A histogram of the duration of a request. This is measured from when the request headers are received to when the request stream has completed.
# TYPE request_duration_ms histogram

# HELP response_total A counter of the number of responses the proxy has received.
# TYPE response_total counter

# HELP response_duration_ms A histogram of the duration of a response. This is measured from when theresponse headers are received to when the response stream has completed.
# TYPE response_duration_ms histogram

# HELP response_latency_ms A histogram of the total latency of a response. This is measured from whenthe request headers are received to when the response stream has completed.
# TYPE response_latency_ms histogram

process_start_time_seconds 1522102089

```

Closes #619
2018-03-27 12:54:31 -07:00
Eliza Weisman bdcdfa8874
Actually skip flaky tests on CI and in Docker (#626)
Flaky proxy tests were not actually being ignored properly. This is due to our use of a Cargo workspace; as it turns out that Cargo doesn't propagate feature flags from the workspace to the crates in the workspace (see rust-lang/cargo#4753). 

If I run `cargo test --no-default-features` in the root directory, the `flaky_tests` feature is still passed, and the flaky tests still run:
```
➜ cargo test --no-default-features
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok
test inbound::tests::recognize_default_no_loop ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_orig_dst ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy-74584a35ef749a60

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/discovery-73cd0b65bd7a45ae

running 16 tests
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http1::outbound_asks_controller_api ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
server h1 error: invalid HTTP version specified
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test outbound_updates_newer_services ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::absolute_uris::outbound_times_out ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http2::outbound_times_out ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::outbound_times_out ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/telemetry-cb5bee2d2b94332c

running 12 tests
test metrics_endpoint_inbound_request_count ... ok
test metrics_endpoint_inbound_request_duration ... ok
test metrics_endpoint_outbound_request_count ... ok
test records_latency_statistics ... ignored
test telemetry_report_errors_are_ignored ... ok
test metrics_endpoint_outbound_request_duration ... ok
test metrics_have_no_double_commas ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok

test result: ok. 11 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/transparency-9d14bf92d8ba3700

running 19 tests
ERROR 2018-03-26T20:54:10Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http11_upgrade_not_supported ... ok
test http11_absolute_uri_differs_from_host ... ok
test http10_without_host ... ok
test http1_head_responses ... ok
test http10_with_host ... ok
test http1_connect_not_supported ... ok
test http1_bodyless_responses ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_removes_connection_headers ... ok
test http1_one_connection_per_host ... ok
test inbound_http1 ... ok
test inbound_tcp ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_host_have_unique_connections ... ok
test outbound_tcp ... ok
test tcp_with_no_orig_dst ... ok
test tcp_connections_close_if_client_closes ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy_controller_grpc-7fdac3528475b1dc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy_router-024926cac5d328ee

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/convert-ae9bd3b8fee21c85

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/futures_mpsc_lossy-4afd31454ff77b40

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy-controller-grpc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests convert

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy-router

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests futures-mpsc-lossy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

This also happens if the `-p` flag is used to run tests only in the `conduit-proxy` crate:

```
➜ cargo test -p conduit-proxy --no-default-features
   Compiling conduit-proxy v0.3.0 (file:///Users/eliza/Code/go/src/github.com/runconduit/conduit/proxy)
    Finished dev [unoptimized + debuginfo] target(s) in 17.27 secs
     Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test inbound::tests::recognize_default_no_loop ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test inbound::tests::recognize_orig_dst ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy-74584a35ef749a60

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/discovery-73cd0b65bd7a45ae

running 16 tests
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http1::outbound_asks_controller_api ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
server h1 error: invalid HTTP version specified
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test outbound_updates_newer_services ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::absolute_uris::outbound_times_out ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::outbound_times_out ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http2::outbound_times_out ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/telemetry-cb5bee2d2b94332c

running 12 tests
test metrics_endpoint_inbound_request_duration ... ok
test metrics_endpoint_inbound_request_count ... ok
test metrics_endpoint_outbound_request_count ... ok
test metrics_endpoint_outbound_request_duration ... ok
test telemetry_report_errors_are_ignored ... ok
test metrics_have_no_double_commas ... ok
test inbound_sends_telemetry ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok
test records_latency_statistics ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/transparency-9d14bf92d8ba3700

running 19 tests
ERROR 2018-03-26T20:56:55Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http1_connect_not_supported ... ok
test http11_upgrade_not_supported ... ok
test http10_without_host ... ok
test http11_absolute_uri_differs_from_host ... ok
test http1_head_responses ... ok
test http10_with_host ... ok
test http1_bodyless_responses ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_removes_connection_headers ... ok
test http1_one_connection_per_host ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_host_have_unique_connections ... ok
test inbound_http1 ... ok
test inbound_tcp ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test outbound_tcp ... ok
test tcp_with_no_orig_dst ... ok
test tcp_connections_close_if_client_closes ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

However, if I `cd` into the `proxy` directory (so that Cargo treats the `conduit-proxy` crate as the root project, rather than the workspace) and pass the `--no-default-features` flag, the flaky tests are skipped as expected:

```
➜ (cd proxy && exec cargo test --no-default-features)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-ac198a96228a056e

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test inbound::tests::recognize_default_no_loop ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_orig_dst ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-41e0f900f97e194b

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/discovery-7ba7fe16345a347a

running 16 tests
test http1::absolute_uris::outbound_times_out ... ignored
test http1::outbound_times_out ... ignored
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_asks_controller_api ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http2::outbound_times_out ... ignored
server h1 error: invalid HTTP version specified
ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test outbound_updates_newer_services ... ok

test result: ok. 13 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/telemetry-b0763b64edd8fc68

running 12 tests
test metrics_endpoint_inbound_request_count ... ignored
test metrics_endpoint_inbound_request_duration ... ignored
test metrics_endpoint_inbound_response_latency ... ignored
test metrics_endpoint_outbound_request_count ... ignored
test metrics_endpoint_outbound_request_duration ... ignored
test metrics_endpoint_outbound_response_latency ... ignored
test records_latency_statistics ... ignored
test telemetry_report_errors_are_ignored ... ok
test metrics_have_no_double_commas ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok

test result: ok. 5 passed; 0 failed; 7 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/transparency-300fd801daa85ccf

running 19 tests
ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http1_connect_not_supported ... ok
test http11_upgrade_not_supported ... ok
test http10_without_host ... ok
test http10_with_host ... ok
test http11_absolute_uri_differs_from_host ... ok
test http1_head_responses ... ok
test http1_bodyless_responses ... ok
test http1_removes_connection_headers ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_one_connection_per_host ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test inbound_tcp ... ok
test inbound_http1 ... ok
test http1_requests_without_host_have_unique_connections ... ok
test outbound_tcp ... ok
test tcp_connections_close_if_client_closes ... ok
test tcp_with_no_orig_dst ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

I'm wrapping the `cd` and `cargo test` command in a subshell so that the CWD on Travis is still in the repo root when the command exits, but the return value from `cargo test` is  propagated.

Closes #625
2018-03-26 17:11:06 -07:00
Brian Smith 7247ffeee3
Proxy: Clarify destination test support code queue handling (#617)
Use `VecDeqeue` to make the queue structure clear. Follow good practice
by minimizing the amount of time the lock is held. Clarify how
defaulting logic works.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-26 10:45:05 -10:00
Oliver Gould 006360aa90
Skip flaky tests for #613 (#614)
The metrics endpoint tests are flaky because there are no guarantees
that the metrics pipeline has processed events before the metrics
endpoint is read. This can cause CI to fail spuriously.

Disable these tests from running in CI until #613 is resolved.
2018-03-25 14:26:14 -07:00
Andrew Seigner 291d8e97ab
Move injected data from env var to k8s labels (#605)
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.

This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 16:11:34 -07:00
Eliza Weisman 9321932918
Add request_duration_ms metric and increment request_total on request end (#589)
This PR adds the `request_duration_ms` metric to the Prometheus metrics exported by the proxy. It also modifies the `request_total` metric so that it is incremented when a request stream finishes, rather than when it opens, for consistency with how the `response_total` metric is generated.

Making this change required modifying `telemetry::sensors::http` to generate a `StreamRequestEnd` event similar to the `StreamResponseEnd` event. This is done similarly to how sensors are added to response bodies, by generalizing the `ResponseBody` type into a `MeasuredBody` type that can wrap a request or response body. Since this changed the type of request bodies, it necessitated changing request types pretty much everywhere else in the proxy codebase in order to fix the resulting type errors, which is why the diff for this PR is so large.

Closes #570
2018-03-22 15:27:34 -07:00
Eliza Weisman 3f10c80256
Fix double comma in outbound metrics (#601)
Fixes #600 

The proxy metrics endpoint has a bug where metrics recorded in the outbound direction can contain two commas in a row when no outbound label is present. This occurs because the code for formatting the outbound direction label mistakenly assumed that there would always be a destination pod owner label as well, but the proxy isn't currently aware of the destination's pod owner (waiting for #429). 

I've fixed this issue by moving the place where the comma is output from the `fmt::Display` impl for `RequestLabels` to the `fmt::Display` impl for `OutboudnLabels`. This way, the comma between the `direction` and `dst_*` labels is only output when the `dst_*` label is present. 

This bug made it to master since all of the proxy end-to-end tests for metrics only test the inbound router. I've rectified this issue by adding tests on the outbound router as well (which would fail against the current master due to the double comma bug). I've also added a test that asserts there are no double commas in exported metrics, to protect against regressions to this bug.
2018-03-22 14:17:10 -07:00
Eliza Weisman 1c9ce4d118
Add Prometheus /metrics endpoint to proxy (#569)
This PR adds an endpoint to the proxy that serves metrics in Prometheus' text exposition format. The endpoint currently serves the `request_total`, `response_total`, `response_latency_ms`, and `response_duration_ms metrics`, as described in #536. The endpoint's port and address are configurable with the `CONDUIT_PROXY_METRICS_LISTENER` environment variable.

Tests have been added in t`ests/telemetry.rs`
2018-03-21 16:19:32 -07:00
Brian Smith d38a2acff8
Avoid `cargo fetch --locked` in proxy/Dockerfile. (#593)
`cargo fetch` doesn't consider the target platform and downloads
all crates needed to build for any target.

Stop using `cargo fetch` and instead use the implicit fetch done
by `cargo build`, which *does* consider the target platform.

This change results in 12 (soon 15) fewer crates downloaded.
This is a non-trivial savings in build time for a full rebuild
since cargo downloads crates in parallel.

```diff
- Downloading bitflags v1.0.1
- Downloading fuchsia-zircon v0.3.3
- Downloading fuchsia-zircon-sys v0.3.3
- Downloading miow v0.2.1
- Downloading redox_syscall v0.1.37
- Downloading redox_termios v0.1.1
- Downloading termion v1.5.1
- Downloading winapi v0.3.4
- Downloading winapi-i686-pc-windows-gnu v0.4.0
- Downloading winapi-x86_64-pc-windows-gnu v0.4.0
- Downloading wincolor v0.1.6
- Downloading ws2_32-sys v0.2.1
```

I verified that no downloads are done during an incremental
build.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-21 08:51:18 -10:00
Sean McArthur cd59465366
proxy: add SIGTERM and SIGINT handlers (#581)
When the proxy is run in a Docker container, it runs as PID 1, with
no default signal handlers setup. In order to react to signals from
Kubernetes about shutting down, we need to set up explicit handlers.

This adds handlers for SIGTERM and SIGINT.

Closes #549
2018-03-16 18:53:20 -07:00
Carl Lerche 35f61dcc56 Proxy: Upgrade h2 and indexmap crates (#572)
In order to pick up a bugfix in h2, upgrade:

h2 0.1.2
indexmap 1.0.0

Signed-off-by: Carl Lerche <me@carllerche.com>
2018-03-14 12:35:38 -07:00
Eliza Weisman 8da70bb6e2
Run all discovery tests for HTTP/1 as well as HTTP/2 (#556)
In order to ensure we catch discovery and routing issues arising from different logic for HTTP/1 and HTTP/2 requests, I've modified tests/discovery.rs to run all applicable tests with both HTTP/1 and HTTP/2 requests. The tests themselves are largely unchanged, but now there are separate modules containing HTTP/1 and HTTP/2 versions of a majority of the tests.
2018-03-09 17:24:48 -08:00
Eliza Weisman d62a869e68
Fix outbound HTTP/1 requests not using Destinations (#555)
Commit 569d6939a7 introduced a regression that caused the proxy to stop using the Destination service for outbound HTTP/1 requests with no authority in the request URI but a valid authority in the `Host:` header. 

The bug is due to some code in `Outbound::recognize` which assumed that a request had already been passed through `normalize_our_view_of_uri`. This was valid at one point while I was writing #492, as URIs were normalized prior to `recognize` and a request `Extension` was used to mark that they had been rewritten, and the host header and request URI could be assumed to be in agreement, but after merging #514 into the dev branch for #492, this behaviour changed and I forgot to update the logic in `recognize`.

I've fixed the issue by adding the logic for routing on `Host:` headers back into `Outbound::recognize`.

@seanmonstar added a test in `discovery.rs`, `outbound_http1_asks_controller_about_host`, which should exercise this case. I've added a couple more unit tests in that file to try and ensure we cover more of the different cases that can occur here.

Fixes #552
2018-03-09 16:25:19 -08:00
Sean McArthur 83d6a1f579
proxy: improve transparency of host headers and absolute-uris (#535)
In some cases, we would adjust an existing Host header, or add one. And in all cases when an HTTP/1 request was received with an absolute-form target, it was not passed on.

Now, the Host header is never changed. And if the Uri was in absolute-form, it is sent in the same format.

Closes #518
2018-03-08 13:15:21 -08:00
Eliza Weisman 6af9871f13
Fix infinite loop in `tcp::HalfDuplex::copy_into()` (#537)
An infinite loop exists in the TCP proxy, which could be triggered by any raw TCP connection (including HTTPS requests). The connection will be proxied successfully, but instead of closing, it will remain open, and the proxy's CPU usage will remain extremely high indefinitely.

Since `Duplex::poll` will call `half_in.copy_into()`/`half_out.copy_into()` repeatedly, even after they return `Async::Ready`, when one half has shut down and returned ready, it may still be polled again, as `Duplex::poll` waits until _both_ halves have returned `Ready`. Because of the guard that `!dst.is_shutdown`, intended to prevent the destination from shutting down twice, the function will not return if it is polled again after returning `Async::Ready` once.

I've fixed this by moving the guard against double shutdowns out of the loop, so that the function will return `Async::Ready` again if it is polled after shutting down the destination.

I've also included a unit test against regressions to this bug. The unit test fails against master.

Fixes #519 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Andrew Seigner <andrew@sig.gy>
2018-03-08 12:43:19 -08:00
Brian Smith 3a73411375
Proxy: Test & document localhost. name resolution. (#531)
* Proxy: Test & document localhost. name resolution.

Closes #358.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 17:40:39 -10:00
Brian Smith 7aa1d0b26d
Proxy: Don't resolve absolute names outside zone using Destinations (#530)
* Proxy: Don't resolve absolute names outside zone using Destinations service

Many absolute names were being resolved using the Destinations service due to logic error
in the proxy's matching of the zone to the default zone.

Fix that bug.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 14:53:32 -10:00
Brian Smith 649e784d9c
Simplify cluster zone suffix handling in the proxy (#528)
* Temporarily stop trying to support configurable zones in the proxy.

None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.

Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting

The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.

It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.

Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Further hard-code "cluster.local" as the zone, temporarily.

Addresses review feedback.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 14:30:13 -10:00