Commit Graph

256 Commits

Author SHA1 Message Date
Eliza Weisman 4490db9909
proxy: Add TLS identity to endpoint metadata and wire it through to `Connect::new` (#1008)
Depends on #1006. Depends on #1041.

This PR adds a `tls_identity` field to the endpoint `Metadata` struct, which
contains the `TlsIdentity` metadata sent by the control plane's Destination
service. 

I changed the `ctx::transport::Client` context struct to hold a `Metadata`,
rather than just the labels, so the TLS support determination is always
available. In addition, I've added it as an additional parameter to 
`transport::Connect::new`, so that when we create a new connection, the TLS
code will be able to determine whether or not TLS is supported and, if it is, 
how to verify the endpoint's identity.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-04 20:08:55 -07:00
Eliza Weisman d5d610f542
proxy: Change `DEFAULT_OUTBOUND_ROUTER_CAPACITY` from 10,000 to 100 (#1060)
The proxy can't actually support 10K clients currently (for one, we can't open
10K resolution streams to the destination service). 100 is a more-realistic 
but sufficiently-high default.
2018-06-04 14:34:34 -07:00
Eliza Weisman 7220fb5367
proxy: Reload TLS config on changes (#1056)
This PR modifies the proxy's TLS code so that the TLS config files are reloaded
when any of them has changed (including if they did not previously exist).

If reloading the configs returns an error, we log an error and continue using
the old config.

Currently, this is implemented by polling the file system for the time they
were last modified at a fixed interval. However, I've implemented this so 
that the changes are passed around as a `Stream`, and that reloading and
updating the config is in a separate function the one that detects changes.
Therefore, it should be fairly easy to plug in support for `inotify` (and 
other FS watch APIs) later, as long as we can use them to generate a 
`Stream` of changes.

Closes #369 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-04 13:36:28 -07:00
Eliza Weisman b7a759cb64
proxy: Update `dns` module to use new Trust-DNS `AsyncResolver` API (#1032)
Depends on #974.  Closes #859.

This PR updates the proxy's `dns` module to use the new `AsyncResolver` API I
added to `trust-dns-resolver` in bluejekyll/trust-dns#487. This allows us to 
spawn one `Future` that will drive DNS resolution in the background, rather
than having to repeatedly clone a heavyweight `ResolverFuture` for every 
lookup. It also means that we no longer have to clone the name to resolve in
quite as many places.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-01 14:36:37 -07:00
Eliza Weisman cccebc2b26
proxy: Honor TTLs for DNS responses (#974)
Closes #711. Depends on #967.

This PR changes the proxy's `destination` module to honor the TTLs associated
with DNS lookups, now that bluejekyll/trust-dns#444 has been merged and we can
access this information from the Trust-DNS Resolver API. 

The `destination::background::DestinationSet` type has been modified so that, 
when a successful result is received for a DNS query, the DNS server will be 
polled again after the deadline associated with that query, rather than after
a fixed deadline. The fixed deadline is still used to determine when to poll
again for negative DNS responses or for errors.

Furthermore, Conduit now accepts an optional CONDUIT_PROXY_DNS_MIN_TTL 
environment variable that will configure a minimum TTL for DNS results. If the
deadline of a DNS response gives it a TTL shorter than the configured minimum,
Conduit will not poll DNS again until after that minimum TTL is elapsed. By
default, there is no minimum value set, as this feature is intended primarily
for when Conduit is run locally for development purposes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-01 12:17:48 -07:00
Brian Smith b114ef6819
Add initial infrastructure for optionally accepting TLS connections (#1047)
* Add initial infrastructure for optinally accepting TLS connections.

If the environment gives us the paths to the certificate chain and private key
then use TLS for all accepted TCP connections. Otherwise, continue on using
plaintext for all accepted TCP connections. The default behavior--no TLS--isn't
changed.

Later we'll make this smarter by adding protocol detection so that when the TLS
configuration is available, we'll accept both TLS and non-TLS connections.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-05-31 12:20:57 -10:00
Eliza Weisman 5a42ce357e
proto: Add TLS identity to WeightedAddr message (#1041)
Required for #1008.

This PR adds the `TlsIdentity` message to the Destination service proto,
to describe what strategy the proxy should use for verifying an endpoint's TLS
certificates. It also adds a `TlsIdentity` field to the `WeightedAddr` message.

Currently, there is one possible variant for `TlsIdentity`, `KubernetesPodName`, 
which consists of the Kubernetes pod name of the endpoint, the namespace of
the endpoint, and the namespace of that pod's Conduit control plane. The proxy
should attempt to connect over TLS if the control plane namespace matches its 
own control plane namespace. The pod name and namespace are used to verify 
the endpoint's TLS certificate.

See https://github.com/runconduit/conduit/issues/386#issuecomment-392948046.

This change was initially part of #1008, but I factored it out to make the diff
smaller.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-31 11:48:25 -07:00
Oliver Gould 294de5b3c4
proxy: Add rich logging contexts (#1037)
While debugging proxy issues, I found it necessary to change how logging contexts are
instrumented, especially for clients.

This change moves away from using `Debug` types to in favor of `Display` types.
Furthermore, the `logging` module now provides a uniform set of logging contexts to be
used throughout the application.  All clients, servers, and background tasks should now be
instrumented so that their log messages contain predictable metadata.

Some small improvements have been made to ensure that logging contexts are correct
when a `Future` is dropped (which is important for some H2 uses, especially).
2018-05-30 13:41:59 -07:00
Oliver Gould db2478f5a2
proxy: Fix bench tests and require bench tests in CI (#1038)
b3170af changed the DstLabels api, but the bench test was not updated
accordingly.

Furthermore, since bench tests require a nightly rust version, we've
avoided running them in CI. This makes it easy for these tests to break, however.

This updates the benches/record.rs. Additionally, in CI, we pin the rust nightly'
version to a known-good version so that we can reliably run these bench test
without the fear of external changes breaking our build.
2018-05-30 07:20:28 -07:00
Oliver Gould 22719a2898
proxy: Ensure labels are reliably ordered (#1030)
The proxy receives a hash map of endpoint labels from the destination
service. As this map is serialized into a string, its keys and values
do not have a stable ordering.

To fix this, we sort the keys for all labels before constructing an
instance of `DstLabels`.

This change was much more difficult to test than it was to fix, so tests
this change was tested manually.

Fixes #1015
2018-05-30 07:13:26 -07:00
Eliza Weisman b3170af567
proxy: Remove dynamic label updating on bound services (#1006)
Depends on tower-rs/tower#75. Required for #386

In order for the proxy to use the TLS support metadata from the Destination 
service correctly, we determined that the code for dynamically changing the
labels on an already-bound service should be removed, and any change in
metadata should cause an endpoint to be rebound.

I've modified the proxy so that we no longer update the labels using 
`futures-watch` (as a sidenote, we no longer depend on that crate). Metadata
update events now cause the `tower-discover::Discover` implementation for 
`DestinationSet` to re-insert the changed endpoint into the load balancer.
Upstream PR tower-rs/tower#75 in tower-balance changes the load balancer 
to honor duplicate insertions by replacing the old endpoint rather than 
ignoring them; that change is necessary for the tests to pass on this branch.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-29 12:48:59 -07:00
Brian Smith 8c79ff3601
Fix location of raw pointer comment in `ContextGuard`. (#1027)
Commit b861a6df31 moved the code the
comment was describing, but didn't move the comment.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-05-26 18:30:37 -10:00
Brian Smith 9c339afe72
Abstract I/O interface into a trait. (#1020)
* Rename so_original_dst.rs to addr_info.rs.

Prepare for expanding the functionality of this module by renaming it.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Abstract I/O interface into a trait.

Instead of pattern matching over an `Io` variant, use a `Box<Io>` to
abstract the I/O interface. This will make it easier to add a TLS
transport.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-05-26 10:04:31 -10:00
Eliza Weisman 1a89107ece
proxy: Fix missing logging contexts on inbound/outbound (#1025)
Changes to `BoundPort::listen_and_fold` inadvertently broke the 
`::logging::context_future`s on the `serve` futures for the Inbound and 
outbound proxies, leading to log messages that didn't have the appropriate
context. This fixes that.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-25 16:31:33 -07:00
Brian Smith 88d614a425
Prepare `BoundPort::listen_and_fold` for upcoming TLS work. (#1018)
Refactor `listen_and_fold()` to make it possible to insert more futures
into the chain before the folding.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-05-25 09:59:05 -10:00
Oliver Gould a3cb1e47a2
proxy: Record EOS when bodies are dropped (#1003)
It appears that hyper does not necessarily poll bodies to completion,
and instead simply drops a body as soon as `content-length` is reached
(hyperium/hyper#1521).

This change implements Drop for MeasuredBody such that the stream-end
event is triggered if it had not been triggered previously. This ensures
that response latencies and counts are recorded for HTTP/1 streams.

Fixes #994
2018-05-24 10:40:29 -07:00
Oliver Gould 8dc56f8691
proxy: Fix h1 body implementation (#995)
In the h1-h2 glue code, we incorrectly called `is_empty()` to determine
if an H1 stream had ended. `is_empty` only returns true if there was no
body at all (rather than if the body has been fully consumed).

By changing this to call `hyper::body::Payload::is_end_stream`, h1
bodies now behave the same as h2 bodies.

Relates to #994
2018-05-24 07:23:14 -07:00
Oliver Gould 163e0a1e9a
proxy: Record HTTP latency at first data frame (#981)
Currently, the proxy records a request's latency as the time between
when a request is opened and when its response stream completes. This is
not what we intend to record, especially when a response is long-lived.

In order to more accurate record latency, we want to track the time at
which the first response body frame is received (which is a close
approximation of time-to-first-byte).

Telemetry aggregation has been changed to use the first-frame time to
compute latencies; tests have been updated to exercise this behavior; and
the metrics documentation has been updated to reflect this change.

Addresses #818 
Relates to #980
2018-05-23 16:02:44 -07:00
Oliver Gould 41d9f915ed
proxy: Alter telemetry to use discrete instants (#980)
Proxy tasks emit events to the telemetry system. These events are used
aggregate counts and latencies, as well as to inform Tap requests.
Initially, these events included durations, describing the relevant time
that elapsed between this event and another.

This approach is somewhat inflexible -- it unnecessarily constrains the
set of measurements that can computed in the telemetry system.

To remedy this, the `Event` types can be changed to report discrete
`Instant`s (rather than `Duration`s). Then, when latencies are computed
in the telemetry system, these discrete instants can be compared to
produce durations.

There are no functional changes in this PR.
2018-05-22 14:57:00 -07:00
Eliza Weisman d256377eea
proxy: Remove configure-and-bind-to-executor pattern (#967)
A common pattern when using the old Tokio API was separating the configuration
of a task from binding it to an executor to run on. This was often necessary
when we wanted to construct a type corresponding to some task before the
reactor on which it would execute was initialized. Typically, this was 
accomplished with two separate types, one of which represented the 
configuration and exposed only a method to take a reactor `Handle` and
transform it to the other type, representing the actual task.

After we migrate to the new Tokio API in #944, executors no longer need to be
passed explictly, as we can use `DefaultExecutor::current` or 
`current_thread::TaskExecutor::current` to spawn a task on the current 
executor. Therefore, a lot of this complexity can be refactored away.

This PR refactors the `Config` and `Process` structs in
i`control::destination::background` into a single `Background` struct, and 
removes the `dns::Config` and `telemetry::MakeControl` structs (`dns::Resolver`
and `telemetry::Control` are now constructed directly). It should not cause
any functional changes.

Closes #966

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-21 15:40:33 -07:00
Eliza Weisman fb1268b4d3
proxy: Use `impl Trait` to unbox some futures (#969)
Now that `impl Trait` is stable, we don't need to box as many futures. We still
need to box before spawning them on an executor, but the component futures no 
longer require their own boxes. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io
2018-05-19 13:19:05 -07:00
Eliza Weisman a067d3918c
proxy: Upgrade Conduit to use the new version of Tokio (#944)
Closes #888.  Closes #867.

This branch upgrades Conduit to use the new Tokio API. It was also necessary to
upgrade some other dependencies (including `hyper`, and `trust-dns`) alongside
this upgrade. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-17 16:38:15 -07:00
Sean McArthur fb904f0174
proxy: rebind services on connect errors (#952)
Instead of having connect errors destroy all buffered requests,
this changes Bind to return a service that can rebind itself when
there is a connect error.

It won't try to establish the new connection itself, but waits for
the buffer to poll again. Combing this with changes in tower-buffer
to remove canceled requests from the buffer should mean that we
won't loop on connect errors for forever.

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-05-17 14:15:16 -07:00
Oliver Gould a1b56846b2
proxy: Drop destination resolutions when unused (#956)
A proxy dispatches requests over a constrained number of routes. When
the router's upper bound is reached---and potentially in other future
scenarios---router capacity is created by removing unused routes, their
load balancers, and all related endpoint stacks.

However, in the current regime, the controller subsystem will continue
to monitor discovery observations. As the number of active observations
expands over time, the controller task ends up with more and more work
to do.

This change introduces a shared atomic boolean between the resolution
returned to the load balancer and the state maintained when
communicating with the service. Before the controller polls its active
resolutions, it first ensures that all unused resolutions are dropped.
2018-05-16 17:28:11 -07:00
Eliza Weisman 6433ce060d
proxy: Fix end events not firing when a stream is ended by a DATA frame (#957)
A recent upstream change in `tower-h2` (tower-rs/tower-h2@d9b3140) caused some
HTTP/2 streams that were previously terminated by TRAILERS frames to be 
terminated by empty DATA frames with the end of stream bit set, instead.

This broke some tests in my dev branch for #944, as our test server also uses
`tower-h2`, and some of the metrics tests were no longer seeing the expected
`StreamResponseEnd` events due to this change. This issue may also occur in
other cases, resulting in incorrect metrics.

This PR changes `MeasuredBody::poll_data` to trigger the Stream End event if
it sees a DATA frame that ends the stream.

Fixes #954 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-16 16:24:29 -07:00
Oliver Gould fbba6bf4c8
rfc: proxy: Split `control::discovery` into submodules (#955)
While preparing #946, I was again struck by the `discovery` module being very weighty
(nearly 800 dense lines). The intent of this change is only to improve readability. There
are no functional changes. The following aesthetic changes have been made:

* `control::discovery` has been renamed to `control::destination` to be more consistent
  with the rest of conduit's terminology (destinations aren't the only thing that need to
  be discovered).
* In that vein, the `Discovery` type has been renamed `Resolver` (since it exposes one
  function, `resolve`).
* The `Watch` type has been renamed `Resolution`. This disambiguates the type form
  `futures_watch::Watch`(which is used in the same code) and makes it more clearly the
  product of a `Resolver`.
* The `Background` and `DiscoveryWork` names were very opaque.  `Background` is now
  `background::Config` to indicate that it can't actually _do_ anything; and
  `DiscoveryWork` is now `background::Process` to indicate that it's responsible for
  processing destination updates.
* `DestinationSet` is now a private implementation detail in the `background` module.
* An internal `ResolveRequest` type replaces an unnamed tuple (now that it's used across
  files).
* `rustfmt` has been run on `background.rs` and `endpoint.rs`
2018-05-15 17:23:01 -07:00
Eliza Weisman b28193817c
proxy: Move absolute URI detection to `bind::Protocol` (#938)
This is in preparation for landing the Tokio upgrade. 

In the upcoming Hyper release, the handling of absolute form request URIs
moved from `hyper::Request` to the `hyper::client::connect::Connect` trait.
Once we upgrade to the new Tokio, we will have to upgrade our Hyper 
dependency as well.

Currently, Conduit detects whether the request URI is in absolute form in
`h1::normalize_our_view_of_uri` and adds an extension to the request if it is.
This will no longer work with the new Hyper, as that function is called from
the `bind::NormalizeUri` service, which is not constructed until after the 
client connection is established. Therefore, it is necessary to move this
information to `bind::Protocol`, so that it can be passed to 
`transparency::client::HyperConnect` (our implementation of Hyper's `Connect`
trait) when we are using the newest Hyper. 

For now, however, I've left in the `UriIsAbsoluteForm` extension and continued
to set it in  `h1::normalize_our_view_of_uri`, since we currently still use it
on the current Hyper version. I thought it was good to minimize the changes to
this existing code, as it will be removed when we migrate to the new Hyper.

This PR shouldn't cause any functional changes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-15 13:16:58 -07:00
Eliza Weisman fac845bdcd
proxy: Add a lazy version of ThreadRng (#936)
This is in preparation for landing the Tokio upgrade. 

In order to be generic over Tokio's current thread and threadpool executors,
a number of types in Conduit which were not previously `Send` are now required
to be `Send`. A majority of this work will be done in the main Tokio upgrade
PR, as it is in many cases not possible to make these types `Send` _without_
using the new Tokio API (in order to remove `Handle`s, etc.); however, I'm
factoring out everything possible and trying to land it in separate PRs.

The p2c load balancer constructed in `Outbound` is currently parameterized
over a random number generator. We currently construct it by getting the 
thread-local RNG, and passing it to the load balancer constructor. However,
the thread-local RNG is not `Send`. I've fixed this issue by creating a new
zero-sized empty struct type which implements `rand::Rng` simply by calling
`thread_rng()` every time its' called, and passing that to 
`choose::power_of_two_choices` instead. Since this is an empty type which 
contains no data, and the correct thread-local RNG is accessed whenever
the methods are called, this new type can trivially be `Send`. According to
the `rand` crate's documentation, this is the correct way to use `ThreadRng`
anyway:
> Retrieve the lazily-initialized thread-local random number generator, seeded
> by the system. Intended to be used in method chaining style, e.g. 
> `thread_rng().gen::<i32>()`.
> (from https://docs.rs/rand/0.4.2/rand/fn.thread_rng.html)

This shouldn't lead to any functional changes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-11 17:26:23 -07:00
Eliza Weisman f3cd0c4f27
proxy: Make `outbound_updates_newer_services` test forward-compatible (#939)
This is in preparation for landing the Tokio upgrade.

The test `discovery::outbound_updates_newer_services` currently contains an
assertion that an HTTP/2 request to an HTTP/1 service will return a response
with status code 500. This is because the current version of Hyper on which
Conduit depends does not support protocol upgrades.

However, commit hyperium/hyper@bc6af88a32, which
adds support for this kind of protocol upgrade, was recently merged to Hyper's
master branch. Therefore, this assertion will no longer be correct once we 
depend on the upcoming Hyper release. When we migrate to the new Tokio, it will
be necessary to upgrade our Hyper dependency as well, and this test will fail.

I've modified the test to no longer make assertions about the response's status
code, so that it's compatible with both the current and future Hyper versions.
If the response is not `Ok`, the test will still fail, since 
`tests::support::Client::request()` `expect`s that the response is successful,
but the status code is ignored. I've added a comment in the test explaining 
this.

Eventually, when the master version of Conduit depends on the latest Hyper, we
may want to change this test to assert that the status code is 200 instead. We
may also want to add more tests for Hyper's protocol upgrade functionality, but
that seems out of scope for this PR.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-05-11 14:36:03 -07:00
Oliver Gould 319088bd70
proxy: Update to Rust 1.26.0 (#942)
Notably, this adds support for `impl Trait`.

More info at https://blog.rust-lang.org/2018/05/10/Rust-1.26.html
2018-05-11 13:52:56 -07:00
Oliver Gould e91699bba2
proxy/router: Implement LRU cache eviction (#925)
The router's cache has no means to evict unused entries when capacity is
reached.

This change does the following:

- Wraps cache values in a smart pointer that tracks the last time of
  access for each entry. The smart pointer updates the access time when
  the reference to entry is dropped.
- When capacity is not available, all nodes that have not been accessed
  within some minimal idle age are dropped.

Accesses and updates to the map are O(1) when capacity is available.
Reclaiming capacity is O(n), so it's expected that the router is
configured with enough capacity such that capacity need not be reclaimed
usually.
2018-05-10 19:06:31 -07:00
Oliver Gould 27345a6d2f
proxy/router: Create a separate `cache` module (#920)
The router's `Inner` type contains a map of routes. Recently, this map's
capacity has become constrained to prevent leakage for long-running
processes.

This change prepares for a fuller LRU implementation by moving the
router's `Inner` type to a new (tested) module, `cache`.
2018-05-10 14:00:50 -07:00
Oliver Gould 2019747ff8
router: Store `recognize` outside of lock (#913)
The router stores its cache and `Recognize` implementation within a `Mutex`,
but there is no need for the recognizer to be locked.

This change creates a new `Cache` type that is locked independently of
`Recognize`. In order to accomplish this, `Recognize::bind_service` has
been changed to take an immutable reference to its `self`.

The (unused) `Single` type has been removed because it relied on
`bind_service` being mutable.
2018-05-10 11:51:01 -07:00
Oliver Gould 76ca32297e
proxy: Remove try_bind_route! macro (#915)
The macro is now only used once, so it seems clearer just to inline the
logic.
2018-05-09 14:38:20 -07:00
Sean McArthur b48a26b0b7
proxy: change peek to use reads for eventual support of TLS (#901) 2018-05-08 18:19:12 -07:00
Oliver Gould 8bedd9d38a
rfc: Make DestinationServiceQuery generic (#749)
The goals of this change are:
1. Reduce the size/complexity of `control::discovery` in order to ease code reviews.
2. Extract a reusable grpc streaming utility.

There are no intended functional changes.

`control::discovery::DestinationServiceQuery` is used to track the state of a request (and
streaming response) to the destination service. Very little of this logic is specific to
the destination service.

The `DestinationServiceQuery` and associated `UpdateRx` type have been moved to a new
module, `control::remote_stream`, as `Remote` and `Receiver`, respectively. Both of these
types are generic over the gRPC message type, so it will be possible to use this utility
with additional API endpoints.

The `Receiver::poll` implementation has been simplified to be more idiomatic with the rest
of our code (namely, using `try_ready!`).
2018-05-08 16:54:20 -07:00
Oliver Gould 63fbbd6931
proxy: Parse units with duration configurations (#909)
Configuration values that take durations are currently specified as
time values with no units.  So `600` may mean 600ms in some contexts and
10 minutes in others.

In order to avoid this problem, this change now requires that
configurations provide explicit units for time values such as '600ms' or
10 minutes'.

Fixes #27.
2018-05-08 13:54:12 -07:00
Oliver Gould 68e203a2fc
proxy: Use Duration types for config defaults (#906)
It's easy to misconfigure default durations, since they're recorded as
integers and converted to Durations separately.

Now, all default constants that represent durations use const `Duration`
instances (enabled by a recent Rust release).

This fixes #905 which was caused by using the wrong time unit for the
metrics retain time.
2018-05-08 10:58:22 -07:00
Oliver Gould f4dba72cc3
proxy: Track SingleUse services against router capacity (#902)
PR #898 introduces capacity limits to the balancer. However, because the
router supports "single-use" routes--routes that are bound only for the
life of a single HTTP1 request--it is easy for a router to exceed its
configured capacity.

In order to fix this, the `Reuse` type is removed from the router
library so that _all_ routes are considered cacheable. It's now the
responsibility of the bound service to enforce policies with regards to
client retention.

Routes were not added to the cache when the service could not be used to
process more than a single request. Now, `Bind` wraps its returned
services (via the `Binding` type), that dictate whether a single client
is reused or if one is bound for each request.

This enables all routes to be cached without changing behavior with
regards to connection reuse.
2018-05-08 10:57:56 -07:00
Oliver Gould 02e6d018d0
proxy: Bound on router capacity (#898)
Currently, the proxy may cache an unbounded number of routes. In order
to prevent such leaks in production, new configurations are introduced
to limit the number of inbound and outbound HTTP routes. By default, we
support 100 inbound routes and 10K outbound routes.

In a followup, we'll introduce an eviction strategy so that capacity can
be reclaimed gracefully.
2018-05-04 16:32:30 -07:00
Oliver Gould 3310968647
proxy: Refactor router implementation (#894)
The Router's primary `call` implementation is somewhat difficult to
follow.

This change does not introduce any functional changes, but makes the
function easier to reason about.

This is being done in preparation for functional changes.
2018-05-02 15:47:36 -07:00
Oliver Gould 7f16079f64
proxy: Upgrade tower dependencies (#892)
In order to pick up https://github.com/tower-rs/tower-grpc/pull/60,
upgrade tower dependencies. This will reduce the cost of updating
for upcoming tower-h2 improvements.
2018-05-02 13:40:55 -07:00
Eliza Weisman 86bb701be8
Add unit tests for `metrics::record` (#890)
This PR adds unit tests for `metrics::record`, based on the benchmarks for the
same function. Currently, there is a test that fires a single response end event
and asserts that the metrics state is correct afterward, and a test that fires
all the events to simulate a full connection lifetime, and asserts that the
metrics state is correct afterward. I'd like to also add a test that simulates
multiple events with different labels, but I'll add that in a subsequent PR,

In order to add these tests, it was necessary to to add test-only accessors
to make some `metrics` structs `pub`` so that the test can access them. 
I also added some test-only functions to `metrics::Histogram`s, to make 
them easier to make assertions about.
2018-05-02 13:26:27 -07:00
Oliver Gould 2578a47617
proxy: Do not build Arbitrary types in Docker (#889)
When the proxy's Dockerfile ran tests, it was necessary to build
Arbitrary types for quickchecking protobuf types.

Now that tests have been disabled, this optional set of dependencies is
no longer required.

Relates to #882.
2018-05-01 14:56:24 -07:00
Oliver Gould eb08d47347
proxy: Fix Tap ID generation (#885)
The proxy's tap server assigns a sequential numeric ID to each inbound
Tap request to assist tap lifecycle management.

The server implementation keeps a local counter to keep track of tap
IDs. However, this implementation is cloned for each individual tap
requests, so `0` the only tap ID ever used.

This change moves the Tap ID to be stored in a shared atomic integer.

Debug logging has been improved as well.
2018-05-01 11:59:45 -07:00
Oliver Gould 1801118906
Do not run tests in proxy Dockerfile (#882)
The proxy Dockerfile includes test execution. While the intentions of
this are good, it has unintended consequences: we can ship code linked
with test dependencies.

Because we have other means for testing proxy code (cargo, locally; and
CI runs tests outside of Docker), it is fine to remove these tests.
2018-05-01 11:54:02 -07:00
Eliza Weisman dd3b952634 proxy: Fix metrics constructor in benches (#881)
Fixes a test compilation error.
2018-04-30 17:48:07 -07:00
Oliver Gould ada5cb267e
proxy: Expire metrics that have not been updated for 10 minutes (#880)
The proxy is now configured with the CONDUIT_PROXY_METRICS_RETAIN_IDLE
environment variable that dictates the amount of time that the proxy will retain
metrics that have not been updated.

A timestamp is maintained for each unique set of labels, indicating the last time
that the scope was updated. Then, when metrics are read, all metrics older than
CONDUIT_PROXY_METRICS_RETAIN_IDLE are dropped from the stats registry.

A ctx::test_utils module has been added to aid testing.

Fixes #819
2018-04-30 16:11:12 -07:00
Oliver Gould 6512b02780
proxy: Group metrics by label (#879)
Previously, we maintained a map of labels for each metric. Because the same keys are used
in multiple scopes, this causes redundant hashing & map lookup when updating metrics.

With this change, there is now only one map per unique label scope and all of the metrics
for each scope are stored in the value. This makes metrics inserting faster and prepares
for eviction of idle metrics.

The Metric type has been split into Metric, which now only holds metric metadata and is
responsible for printing a given metric, and Scopes which holds groupings of metrics by
label.

The metrics! macro is provided to make it easy to define Metric instances statically.
2018-04-30 15:33:09 -07:00
Oliver Gould 7247cb8ddc
proxy: Make each metric type responsible for formatting (#878)
In order to set up for a refactor that removes the `Metric` type, the
`FmtMetric` trait--implemented by `Counter`, `Gauge`, and
`Histogram`--is introduced to push prometheus formatting down into each
type.

With this change, the `Histogram` type now relies on `Counter` (and its
metric formatting) more heavily.
2018-04-30 13:00:21 -07:00
Oliver Gould f73fdc1eae
Move `metrics::Serve` into its own module (#877)
With this change, metrics/mod.rs now contains only metrics types.
2018-04-30 10:52:08 -07:00
Eliza Weisman 1fb15a55b3
proxy: Remove Arcs from metric labels (#873)
This PR removes the `Arc`s from the various label types in the proxy's 
`metrics` modules. This should make the write side of the metrics code
much more efficient (and makes the code much simpler! :D).

This change was particularly easy to implement for the TCP `TransportLabels`
and `TransportCloseLabels`, which consisted of only `struct`s and `enum`s,
and could easily be changed to derive `Copy`. 

For protocol-level `RequestLabels`, the request's authority was a `String`,
which still needs to be reference-counted, as the overhead of cloning `String`s
is almost certainly worse than that added by ref-counting. However, rather than
adding an additional `Arc<str>`, I changed `RequestLabels` to store the 
authority as a `http::uri::Authority`, which is backed by a `ByteStr` and thus
already ref-counted. Now, when constructing `RequestLabels`, we just take 
another reference to the `Authority` already stored in the request context.
Since `Authority` implements `fmt::Display` already, formatting the labels 
still works.

`ResponseLabels` already store the `DstLabels` string in an `Arc`, so no
additional changes there were necessary. By removing the outer `Arc` around
`ResponseLabels`, we now only have to ref-count the portion of the label type
that would actually be inefficient to clone.

@olix0r ran the benchmarks from #874 against this branch, and it seems to be 
a small but noticeable improvement:
```
test record_many_dsts        ... bench:     151,076 ns/iter (+/- 182,151)
test record_one_conn_request ... bench:       1,599 ns/iter (+/- 209)
test record_response_end     ... bench:         676 ns/iter (+/- 144)
```

before:
```
test record_many_dsts        ... bench:     158,403 ns/iter (+/- 130,241)
test record_one_conn_request ... bench:       1,823 ns/iter (+/- 1,408)
test record_response_end     ... bench:         547 ns/iter (+/- 70)
```

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-29 13:50:27 -07:00
Oliver Gould 935ccd5f78
proxy: Implement benchmarks for telemetry recording (#874)
Before changing the telemetry implementation, we should have a means to
understand the impacts of such changes.

To run, you must use a nightly toolchain:
```
rustup run nightly cargo bench -p conduit-proxy -- record
```
2018-04-29 12:55:26 -07:00
Oliver Gould 64a3bb09b2
Rename `metrics::Aggregate` to `metrics::Record` (#875)
Move `Record` into its own file.
2018-04-28 15:35:29 -07:00
Oliver Gould 9c70310406
proxy: Implement a From converter for latency::Ms (#872)
This reduces callsite verbosity for latency measurements at the expense of a fn-level
generic.
2018-04-27 17:55:44 -07:00
Eliza Weisman 9156a80a22
proxy: Add histogram unit tests (#870)
This PR adds the unit tests for the proxy metrics module's Histogram 
implementation that I wrote in #775 to @olix0r's Histogram implementation
added in #868. The tests weren't too difficult to adapt for the new code,
and everything seems to work correctly!

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-27 17:02:13 -07:00
Oliver Gould 1cadef9516
proxy: Make `Histogram` generic over its value (#868)
In order to support histograms measured in, for instance, microseconds,
Histogram should blind store integers without being aware of the unit.
In order to accomplish this, we make `Histogram` generic over a `V:
Into<u64>`, such that all values added to the histogram must be of type
`V`.

In doing this, we also make the histogram buckets configurable, though
we maintain the same defaults used for latency values.

The `Histogram` type has been moved to a new module, and the `Bucket`
and `Bounds` helper types have been introduced to help make histogram
logic clearer and latency-agnostic.
2018-04-27 14:43:09 -07:00
Sean McArthur 5fb4695358
proxy: wrap connections in Transport sensor before peeking (#851)
In case there are any errors while peeking the connection to do protocol
detection, the sensors will now be in place to detect them. Besides just
errors, this will also allow reporting about connections that are
accepted, but then immediately closed.

Additionally:

- add write_buf implementation for Transport sensor, can help
  performance for http1/http2
- add better logs for tcp connections errors
- add printlns for when tests fail

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-04-27 14:18:23 -07:00
Oliver Gould 80fdb97f88
Move `Counter` and `Gauge` to their own modules (#861)
In preparation for a larger metrics refactor, this change splits the
Counter and Gauge types into their own modules.

Furthermore, this makes the minor change to these types: incr() and
decr() no longer return `self`. We were not actually ever using the
returned self references, and I find the unit return type to more
obviously indicate the side-effecty-ness of these calls. #smpfy
2018-04-26 16:49:37 -07:00
Oliver Gould d4d0e579c2
Introduce the `peer` label to transport metrics (#848)
Previously, the proxy exposed separate _accept_ and _connect_ metrics
for some metric types, but not for all. This leads to confusing
aggregations, particularly for read and write taotals.

This change primarily introduces the `peer` prometheus label (with
possible values _src_ or _dst_) to indicate which side of the proxy the
metric reflects.

Additionally, the `received_bytes` and `sent_bytes` metrics have been
renamed as `tcp_read_bytes_total` and `tcp_write_bytes_total`,
resectively. This more naturally fits into existing idioms.  Stream
classification is not applied to these metrics, as we plan to increment
them throughout stream lifetime and not only on close.

The `tcp_connections_open` metric has also been renamed to
`tcp_open_connections` to reflect Prometheus idioms.

Finally, `msg1` and `msg2` have been constified in telemetry test
fixtures so that tests are somewhat easier to read.
2018-04-25 14:06:33 -07:00
Brian Smith 0c23ad416b
Proxy: Use trust-dns-resolver for DNS. (#834)
trust-dns-resolver is a more complete implementation. In particular,
it supports CNAMES correctly, which is needed for PR #764. It also
supports /etc/hosts, which will help with issue #62.

Use the 0.8.2 pre-release since it hasn't been released yet. It was
created at our request.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-25 10:04:49 -10:00
Eliza Weisman 3a3079d8ed
Fix assertions in metrics_compression test (#847)
Fixes #846 

The proxy `metrics_compression` test contained an assertion that a compressed scrape contained the `request_duration_ms_count` metric. This was chosen completely arbitrarily, and was only intended as an assertion that metrics were updated between compressed scrapes. Unfortunately, that metric was removed in d9112abc93, so when #665 merged to master, this test broke. CI didn't catch this since we don't build merges for PRs --- we should probably (re)enable this in Travis?

This PR fixes the test to assert on a metric that wasn't removed. Sorry for the s!
2018-04-25 11:02:52 -07:00
Carl Lerche 298321a6c1
Bump proxy h2 dependency to v0.1.6. (#845)
This release includes a number of bug fixes related to HTTP/2.0 stream
management.

Signed-off-by: Carl Lerche <me@carllerche.com>
2018-04-25 08:40:42 -07:00
Eliza Weisman a2c60e8fcf
Add optional GZIP compression to proxy /metrics endpoint (#665)
Closes #598.

According to the Prometheus documentation, metrics export endpoints should support serving metrics compressed using GZIP. I've modified the proxy's `/metrics` endpoint to serve metrics compressed with GZIP when an `Accept-Encoding: gzip` request header is sent.

I've also added a new unit test that attempts to get the proxy's metrics endpoint as GZIP, and asserts that the metrics are decompressed successfully.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-24 17:42:50 -07:00
Oliver Gould 3799debab5
Add destination labels to all relevant tap events (#840)
The proxy incorrectly only added labels to response events. Destination
labels should be added to all tap events sent by the proxy.
2018-04-24 17:15:13 -07:00
Eliza Weisman 5d29c270bf
proxy: Add tcp_connections_open gauge (#791)
Depends on #785.

This PR adds the `tcp_connections_open` gauge to the proxy's TCP metrics. It also adds some tests for that metric.
2018-04-24 10:17:48 -07:00
Sean McArthur b053b16e9d
proxy tests: reduce some boilerplate, improve error information (#833)
The `controller` part of the proxy will now use a default, removing the
need to pass the exact same `controller::new().run()` in every test
case.

The TCP server and client will include their socket addresses in some
panics.

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-04-23 18:01:51 -07:00
Eliza Weisman d9112abc93
proxy: remove unused metrics (#826)
This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md`

Closes #821
2018-04-23 16:05:20 -07:00
Eliza Weisman 455c99d4d4
Ignore flaky metrics tests on CI (#832)
Fixes #831.

Proxy metrics tests `transport::inbound_tcp_accept` and `transport::inbound_tcp_duration` are known to be flaky and should be ignored on CI. Note that the outbound versions of these tests were already marked as flaky, so this was almost certainly either an oversight or the result of an incorrect merge.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-23 14:29:34 -07:00
Eliza Weisman 79304cdcaf
proxy: Unbreak process_start_time_seconds metric (#825)
The refactoring of how metrics are formatted in 674ce87588 inadvertently introduced a bug that caused the `process_start_time_seconds` metric to be formatted as just a number without the metric name. This causes Prometheus to fail with a parse error rather than accepting the metrics.

I've fixed this issue, and added a unit test to detect regressions in the future.
2018-04-20 15:59:08 -07:00
Eliza Weisman c19da70965
proxy: Add classifications to TCP close stats (#790)
This PR adds a `classification` label to transport level metrics collected on transport close. Like the `classification` label on HTTP response metrics, the value may be either `"success"` or `"failure"`. The label value is determined based on the `clean` field on the `TransportClose` event, which indicates whether a transport closed cleanly or due to an error.

I've updated the tests for transport-level metrics to reflect the addition of the new label. I'd like to also modify the test support code to allow us to close transports with errors, in order to test that the errors are correctly classified as failures.
2018-04-19 19:01:48 -07:00
Oliver Gould b968682a10
proxy: Support destination label matching for tap (#817)
Now, the tap server may specify that requests should be matched by destination
label.

For example, if the controller's Destination service returns the labels:
`{"service": "users", "namespace": "prod"}` for an endpoint, then tap would be
able to specify a match like `namespace=prod` to match requests destined to
that namespace.
2018-04-19 17:58:45 -07:00
Eliza Weisman 674ce87588
proxy: Add transport-level metrics (#785)
This branch adds all the transport-level Prometheus metrics as described in #742, with the exception of the `tcp_connections_open` gauge (to be added in a subsequent branch). 

A brief description of the metrics added in this branch:
* `tcp_accept_open_total`: counter of the number of connections accepted by the proxy
* `tcp_accept_close_total`: counter of the number of accepted connections that have closed
* `tcp_connect_open_total`: counter of the number of connections opened by the proxy
* `tcp_connect_close_total`: counter of the number of connections opened by the proxy that have been closed.
* `tcp_connection_duration_ms`: histogram of the total duration of each TCP connection (incremented on connection close)
* `sent_bytes`: counter of the total number of bytes sent on TCP connections (incremented on connection close)
* `received_bytes`: counter of the total number of bytes received on TCP connections (incremented on connection close)

 These metrics are labeled with the direction (inbound or outbound) and whether the connection was proxied as raw TCP or corresponds to an HTTP request. 

Additionally, I've added several proxy tests for these metrics. Note that there are some cases which are currently untested; in particular, while there are tests for the `tcp_accept_close_total` counter, it's more difficult to test the `tcp_connect_close_total` counter, due to connection pooling. I'd like to improve the tests for this code in additional branches.
2018-04-19 17:27:43 -07:00
Oliver Gould 689c42263a
proxy: Add destination labels to TapEvents (#814)
The Tap API supports key-value labels on endpoint metadata. The proxy was not
setting these labels previously.

In order to add these labels onto tap events, we store the original set of
labels in an `Arc<HashMap>` on `DstLabels`. When tap events are emitted, the
destination' labels are copied from the `DstLabels` into each event.
2018-04-19 16:57:40 -07:00
Oliver Gould 0e39d6d8fa
proxy: Remove the `Labeled` middleware in favor of client context labels (#812)
The `Labeled` middleware is used to add `DstLabels` to each request. Now that
each client context maintains a watch on its endpoint's `DstLabels`, the
`Labeled` middleware can safely be removed.

This has one subtle behavior change: labels are associated with requests
_lazily_, whereas before they were determined _eagerly_. This means that if an
endpoints labels are updated before the telemetry system captures the labels
for the request, it may use the newer labels. Previously, it would only use the
labels at the time that the request originated.
2018-04-19 15:36:01 -07:00
Oliver Gould c76dd1caea
proxy: Track destination labels in client ctx (#799)
Currently, only the request context holds destination labels. However,
destination labels are more accurately associated with the client context,
since the client context is what tracks the remote peer address (and
destination labels are associated with this address).

No functional changes.
2018-04-19 14:22:13 -07:00
Oliver Gould 2238c91e92
proxy: Introduce the control::discovery::Endpoint type (#798)
Building on #796, this creates a new `Endpoint` type that wraps `SocketAddr`.

Still, no functional change has been introduced, but this sets up to move
destination labels into the bind stack directly (by adding the labels watch to
the `Endpoint` type).
2018-04-19 13:31:21 -07:00
Oliver Gould 491fae7cc4
proxy: Rewrite mock controller to accept a stream of dst updates (#808)
Currently, the mock controller, which is used in tests, takes all of its
updates a priori, which makes it hard to control when an update occurs within a
test.

Now, the controller exposes a `DstSender`, which wraps an unbounded channel of
destination updates. This allows tests to trigger updates at a specific point
in the test.

In order to accomplish this, the controller's hand-rolled gRPC server
implementation has been discarded in favor of a real gRPC destination service.
This requires that the `controller-grpc` project now builds both clients
and servers for the destination service. Additionally, we now build a tap
client as well (assuming that we'll want to write tests against our tap
server).
2018-04-19 11:01:10 -07:00
Oliver Gould 926c4cf323
proxy: Make control::discovery::Bind generic over its Endpoint type (#796)
Previously, `Bind` required that it bind to `SocketAddr` (and `SocketAddr`
only). This makes it hard to pass additional information from service discovery
into the client's stack.

To resolve this, `Bind` now has an additional `Endpoint` trait-generic type,
and `Bind::bind` accepts an `Endpoint` rather than a `SocketAddr`.

No additional endpoints have been introduced yet. There are no functional
changes in this refactor.
2018-04-19 11:00:28 -07:00
Oliver Gould 2097d5a1db
proxy: Cleanup control::discovery (#797)
`set_labels` was needlessly `Arc`ed.

`Metadata` does not need to be public.

No functional changes.
2018-04-19 10:59:24 -07:00
Oliver Gould 06dd8d90ee
Introduce the TapByResource API (#778)
This changes the public api to have a new rpc type, `TapByResource`.
This api supersedes the Tap api. `TapByResource` is richer, more closely 
reflecting the proxy's capabilities.

The proxy's Tap api is extended to select over destination labels,
corresponding with those returned by the Destination api.

Now both `Tap` and `TapByResource`'s responses may include destination
labels.

This change avoids breaking backwards compatibility by:

* introducing the new `TapByResource` rpc type, opting not to change Tap
* extending the proxy's Match type with a new, optional, `destination_label` field.
* `TapEvent` is extended with a new, optional, `destination_meta`.
2018-04-18 15:37:07 -07:00
Eliza Weisman 2e919bf813
Move request open timestamp to the top of the stack (#744)
Currently, the request open timestamp, which is used for calculating latency, is captured in the `sensor::http::Http` middleware. However, the sensor middleware is placed fairly low in the stack, below some of the proxy's components that can add measurable latency (e.g. the router).

This PR moves the request_open timestamp out of the `Http` middleware and into a new `TimestampRequestOpen` middleware, which is installed at the top of the stack (before the router). The `TimestampRequestOpen` middleware adds the timestamp as a request extension, so that it can later be consumed by the `Http` sensor to generate the request stats.

By moving the timestamping to the top of the stack, the timestamp should more accurately cover the overhead of the proxy, but a majority of the telemetry work can still be done where it was previously.

I'd like to have included unit tests for this change, but since the expected improvement is in the accuracy of latency measurements, there's no easy way to test this programmatically.
2018-04-17 15:01:36 -07:00
Eliza Weisman 6121afb6f2
Factor out reused test fixtures from telemetry tests (#782)
This is a fairly minor refactor to the proxy telemetry tests. b07b554d2b added a `Fixture` in the Destination service labeling tests added in #661 to reduce the repetition of copied and pasted code in those tests. I've refactored most of the other telemetry tests to also use the test fixture. Significantly less code is copied and pasted now.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-17 14:15:56 -07:00
Sean McArthur 3cd16e8e40
proxy: clean up some logs and a few warnings in proxy tests (#780)
Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-04-17 12:53:20 -07:00
Eliza Weisman cf2d7b1d7d
proxy: move metrics::prometheus module to root metrics module (#763)
The proxy `telemetry::metrics::prometheus` module was initially added in order to give the Prometheus metrics export code a separate namespace from the controller push metrics. Since the controller push metrics code was removed from the proxy in #616, we no longer need a separate module for the Prometheus-specific metrics code. Therefore, I've moved that code to the root `telemetry::metrics` module, which should hopefully make the proxy source tree structure a little simpler.

This is a fairly trivial refactor.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-17 11:19:27 -07:00
Eliza Weisman 64f4dfe07f
Refactor control::Cache and add tests (#733)
Closes #713. This is a follow-up from #688.

This PR makes a number of refactorings to the proxy's `control::Cache` module and removes all but one of the `clone` calls. 

The `CacheChange` enum now contains the changed key and a reference to the changed value when applicable. This simplifies `on_change` functions, which no longer have to take both a tuple of `(K, V)` and a `CacheChange` and can now simply destructure the `CacheChange`, and since the changed value is passed as a reference, the `on_change` function can now decide whether or not it should be cloned. This means that we can remove a majority of the clones previously present here.

I've also rewritten `Cache::update_union` so that it no longer clones values (twice if the cache was invalidated). There's still one `clone` call in `Cache::update_intersection`, but it seems like it will be fairly tricky to remove. However, I've moved the `V: Clone` bound to that function specifically. `Cache::clear` and `Cache::update_union` so that they no longer call `Cache::update_intersection` internally, so they don't need a `V: Clone` bound.

In addition, I've added some unit tests that test that `on_change` is called with the correct `CacheChange`s when key/value pairs are modified.
2018-04-16 16:42:55 -07:00
Brian Smith 621f3c2e56
Revert "Avoid `cargo fetch --locked` in proxy/Dockerfile. (#593)" (#767)
This reverts commit d38a2acff8.

The change being reverted here did reduce downloads that occur when
Cargo.lock is updated. However, it had the unwanted side-effect of
invalidating at least part of the Cargo download cache when other
files, including in particular files under proto/, were modified.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-16 13:27:49 -10:00
Brian Smith 0c37067554
Reduce proto dependencies in proxy/Dockerfile (#765)
Reduce the dependencies on files under proto/ to eliminate Docker
detecting false dependencies that trigger rebuilds.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-13 14:49:55 -10:00
Oliver Gould cc44db054f
Remove NODE_NAME and POD_NAME env usage (#758)
* proxy: Remove pod_name and node_name

* cli: Do not inject POD_NAME and NODE_NAME env vars
2018-04-13 13:09:51 -07:00
Oliver Gould efdfc93b50
Stop pushing telemetry reports from the proxy (#616)
Now that the controller does not depend on pushed telemetry reports, the
proxy need not depend on the telemetry API or maintain legacy sampling
logic.
2018-04-12 17:39:29 -07:00
Eliza Weisman b6180d8bfe
Add unit tests for Labeled middleware (#738)
I've added unit tests for the `Labeled` middleware used to add Destination labels in the proxy, as @olix0r requested in https://github.com/runconduit/conduit/pull/661#discussion_r179897783. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-12 15:10:01 -07:00
Eliza Weisman 61d15a6c3e
Ignore flaky telemetry tests on CI (#752)
The tests for label metadata updates from the control plane are flaky on CI. This is likely due to the CI containers not having enough cores to execute the test proxy thread, the test proxy's controller client thread, the mock controller thread, and the test server thread simultaneously --- see #751 for more information. 

For now, I'm ignoring these on CI. Eventually, I'd like to change the mock controller code in test support so that we can trigger it to send a second metadata update only after the request has finished.

I think this issue also makes merging #738 a higher priority, so that we can still have some tests running on CI that exercise some part of the label update behaviour.
2018-04-12 14:59:17 -07:00
Eliza Weisman b07b554d2b
Add labels from service discovery to proxy metrics reports (#661)
PR #654 adds pod-based metric labels to the Destination API responses for cluster-local services. 

This PR modifies the proxy to actually add these labels to reported Prometheus metrics for outbound requests to local services. 

It enhances the proxy's `control::discovery` module to track these labels and add a `LabelRequest` middleware to the service stack built in `Bind` for labeled services. Requests transiting `LabelRequest` are given an `Extension` which contains these labels, which are then added to events produced by the `Sensors` for these requests. When these events are aggregated to Prometheus metrics, the labels are added.

I've also added some tests in `test/telemetry.rs` ensuring that these metrics are added correctly when the Destination service provides labels.

Closes #660

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-12 12:54:38 -07:00
Sean McArthur 7f54b5253d
proxy: fix flaky tcp graceful shutdown test (#735) 2018-04-10 19:47:00 -07:00
Sean McArthur 02c6887020
proxy: improve graceful shutdown process (#684)
- The listener is immediately closed on receipt of a shutdown signal.
- All in-progress server connections are now counted, and the process will
  not shutdown until the connection count has dropped to zero.
- In the case of HTTP1, idle connections are closed. In the case of HTTP2,
  the HTTP2 graceful shutdown steps are followed of sending various
  GOAWAYs.
2018-04-10 14:15:37 -07:00
Brian Smith 7319cf648f
Proxy: Do L7 load balancing for all external HTTP services. (#726)
Previously when the proxy could tell, by parsing, the request-target
is not in the cluster, it would not override the destination. That is,
load balancing would be disabled for such destinations.

With this change, the proxy will do L7 load balancing for all HTTP
services as long as the request-target has a DNS name.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-10 08:07:16 -10:00
Brian Smith bc16034fd6
Proxy: Fall back to using DNS when Destination service can't find service. (#692)
Fixes #155.
2018-04-07 18:26:06 -10:00
Brian Smith c25e9c371b
Refactor poll_destination() in service discovery. (#725)
No change in behavior is intended here.

Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-07 18:15:19 -10:00
Brian Smith 7d3b715c4d
Proxy: Move DNS name normalization to service discovery (#722)
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 15:04:09 -10:00
Eliza Weisman 8bc05472ed
Make `control::Cache` key-value in order to store discovery metadata (#688)
This PR changes the proxy's `control::Cache` module from a set to a key-value map. 

This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type.

Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements. 

Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata.

If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.
2018-04-06 13:54:16 -07:00