`tower-balance` has been updated with a Peak-EWMA load balancer; and a
new crate, `tower-h2-balance` has been introduced to make the load
balancer aware of some H2 stream events.
The Peak-EWMA balancer is designed to reduce tail latency by maintaining
an Exponentially Weighted Moving Average of latencies to each endpoint
which decay over a 10s window.
This commit adds the initial wiring to forward TLS config changes to the
watches used by TLS clients as well as TLS servers. As the TLS clients
are not yet implemented, the config type is currently `()`, but once
the client config is implemented, we should be able to drop it in
seamlessly.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-authored-by: Brian Smith <brian@briansmith.org>
* Fix non-Linux builds.
The change to signal.rs is needed for Windows.
The change to config.rs is needed for Windows and maybe other platforms.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Proxy: Better encapsulate the details of TLS config watching.
Encapsulate more of the TLS configuration logic in the TLS submodule. This allows
for easier refactoring. In particular, this will make adding the client TLS configuration
easier.
Signed-off-by: Brian Smith <brian@briansmith.org>
This branch adds an inotify-based implementation of filesystem watches
for the TLS config files. On Linux, where inotify is available, this is
used instead of the polling-based code I added in #1056 and #1076.
In order to avoid the issues detecting changes to files in Kubernetes
ConfigMaps described in #1061, we watch the directory _containing_ the
files we care about rather than the files themselves. I've tested this
manually in Docker for Mac Kubernetes and can confirm that ConfigMap
changes are detected successfully.
Closes#1061. Closes#369.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Proxy: Map Kubernetes Pod Namespace/Name to TLS identity.
Map the Kubernetes identity into a DNS name that can be used to
validate the peer's certificate. The final mapping is TBD; the
important thing for now is that the mapped name doesn't collide
with any real DNS name.
Encapsulate the mapping logic within the TLS submodule.
Minimize `Arc`ing and `Clone`ing of TLS identities.
This has no effect in default configurations since the settings that
enable the functionality are not set by default.
Signed-off-by: Brian Smith <brian@briansmith.org>
This PR modifies the proxy's TLS code so that the TLS config files are reloaded
when any of them has changed (including if they did not previously exist).
If reloading the configs returns an error, we log an error and continue using
the old config.
Currently, this is implemented by polling the file system for the time they
were last modified at a fixed interval. However, I've implemented this so
that the changes are passed around as a `Stream`, and that reloading and
updating the config is in a separate function the one that detects changes.
Therefore, it should be fairly easy to plug in support for `inotify` (and
other FS watch APIs) later, as long as we can use them to generate a
`Stream` of changes.
Closes#369
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Depends on #974. Closes#859.
This PR updates the proxy's `dns` module to use the new `AsyncResolver` API I
added to `trust-dns-resolver` in bluejekyll/trust-dns#487. This allows us to
spawn one `Future` that will drive DNS resolution in the background, rather
than having to repeatedly clone a heavyweight `ResolverFuture` for every
lookup. It also means that we no longer have to clone the name to resolve in
quite as many places.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Closes#711. Depends on #967.
This PR changes the proxy's `destination` module to honor the TTLs associated
with DNS lookups, now that bluejekyll/trust-dns#444 has been merged and we can
access this information from the Trust-DNS Resolver API.
The `destination::background::DestinationSet` type has been modified so that,
when a successful result is received for a DNS query, the DNS server will be
polled again after the deadline associated with that query, rather than after
a fixed deadline. The fixed deadline is still used to determine when to poll
again for negative DNS responses or for errors.
Furthermore, Conduit now accepts an optional CONDUIT_PROXY_DNS_MIN_TTL
environment variable that will configure a minimum TTL for DNS results. If the
deadline of a DNS response gives it a TTL shorter than the configured minimum,
Conduit will not poll DNS again until after that minimum TTL is elapsed. By
default, there is no minimum value set, as this feature is intended primarily
for when Conduit is run locally for development purposes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Add initial infrastructure for optinally accepting TLS connections.
If the environment gives us the paths to the certificate chain and private key
then use TLS for all accepted TCP connections. Otherwise, continue on using
plaintext for all accepted TCP connections. The default behavior--no TLS--isn't
changed.
Later we'll make this smarter by adding protocol detection so that when the TLS
configuration is available, we'll accept both TLS and non-TLS connections.
Signed-off-by: Brian Smith <brian@briansmith.org>
While debugging proxy issues, I found it necessary to change how logging contexts are
instrumented, especially for clients.
This change moves away from using `Debug` types to in favor of `Display` types.
Furthermore, the `logging` module now provides a uniform set of logging contexts to be
used throughout the application. All clients, servers, and background tasks should now be
instrumented so that their log messages contain predictable metadata.
Some small improvements have been made to ensure that logging contexts are correct
when a `Future` is dropped (which is important for some H2 uses, especially).
Depends on tower-rs/tower#75. Required for #386
In order for the proxy to use the TLS support metadata from the Destination
service correctly, we determined that the code for dynamically changing the
labels on an already-bound service should be removed, and any change in
metadata should cause an endpoint to be rebound.
I've modified the proxy so that we no longer update the labels using
`futures-watch` (as a sidenote, we no longer depend on that crate). Metadata
update events now cause the `tower-discover::Discover` implementation for
`DestinationSet` to re-insert the changed endpoint into the load balancer.
Upstream PR tower-rs/tower#75 in tower-balance changes the load balancer
to honor duplicate insertions by replacing the old endpoint rather than
ignoring them; that change is necessary for the tests to pass on this branch.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Rename so_original_dst.rs to addr_info.rs.
Prepare for expanding the functionality of this module by renaming it.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Abstract I/O interface into a trait.
Instead of pattern matching over an `Io` variant, use a `Box<Io>` to
abstract the I/O interface. This will make it easier to add a TLS
transport.
Signed-off-by: Brian Smith <brian@briansmith.org>
Changes to `BoundPort::listen_and_fold` inadvertently broke the
`::logging::context_future`s on the `serve` futures for the Inbound and
outbound proxies, leading to log messages that didn't have the appropriate
context. This fixes that.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
A common pattern when using the old Tokio API was separating the configuration
of a task from binding it to an executor to run on. This was often necessary
when we wanted to construct a type corresponding to some task before the
reactor on which it would execute was initialized. Typically, this was
accomplished with two separate types, one of which represented the
configuration and exposed only a method to take a reactor `Handle` and
transform it to the other type, representing the actual task.
After we migrate to the new Tokio API in #944, executors no longer need to be
passed explictly, as we can use `DefaultExecutor::current` or
`current_thread::TaskExecutor::current` to spawn a task on the current
executor. Therefore, a lot of this complexity can be refactored away.
This PR refactors the `Config` and `Process` structs in
i`control::destination::background` into a single `Background` struct, and
removes the `dns::Config` and `telemetry::MakeControl` structs (`dns::Resolver`
and `telemetry::Control` are now constructed directly). It should not cause
any functional changes.
Closes#966
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Now that `impl Trait` is stable, we don't need to box as many futures. We still
need to box before spawning them on an executor, but the component futures no
longer require their own boxes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io
Closes#888. Closes#867.
This branch upgrades Conduit to use the new Tokio API. It was also necessary to
upgrade some other dependencies (including `hyper`, and `trust-dns`) alongside
this upgrade.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This is in preparation for landing the Tokio upgrade.
In order to be generic over Tokio's current thread and threadpool executors,
a number of types in Conduit which were not previously `Send` are now required
to be `Send`. A majority of this work will be done in the main Tokio upgrade
PR, as it is in many cases not possible to make these types `Send` _without_
using the new Tokio API (in order to remove `Handle`s, etc.); however, I'm
factoring out everything possible and trying to land it in separate PRs.
The p2c load balancer constructed in `Outbound` is currently parameterized
over a random number generator. We currently construct it by getting the
thread-local RNG, and passing it to the load balancer constructor. However,
the thread-local RNG is not `Send`. I've fixed this issue by creating a new
zero-sized empty struct type which implements `rand::Rng` simply by calling
`thread_rng()` every time its' called, and passing that to
`choose::power_of_two_choices` instead. Since this is an empty type which
contains no data, and the correct thread-local RNG is accessed whenever
the methods are called, this new type can trivially be `Send`. According to
the `rand` crate's documentation, this is the correct way to use `ThreadRng`
anyway:
> Retrieve the lazily-initialized thread-local random number generator, seeded
> by the system. Intended to be used in method chaining style, e.g.
> `thread_rng().gen::<i32>()`.
> (from https://docs.rs/rand/0.4.2/rand/fn.thread_rng.html)
This shouldn't lead to any functional changes.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The router's cache has no means to evict unused entries when capacity is
reached.
This change does the following:
- Wraps cache values in a smart pointer that tracks the last time of
access for each entry. The smart pointer updates the access time when
the reference to entry is dropped.
- When capacity is not available, all nodes that have not been accessed
within some minimal idle age are dropped.
Accesses and updates to the map are O(1) when capacity is available.
Reclaiming capacity is O(n), so it's expected that the router is
configured with enough capacity such that capacity need not be reclaimed
usually.
Configuration values that take durations are currently specified as
time values with no units. So `600` may mean 600ms in some contexts and
10 minutes in others.
In order to avoid this problem, this change now requires that
configurations provide explicit units for time values such as '600ms' or
10 minutes'.
Fixes#27.
Currently, the proxy may cache an unbounded number of routes. In order
to prevent such leaks in production, new configurations are introduced
to limit the number of inbound and outbound HTTP routes. By default, we
support 100 inbound routes and 10K outbound routes.
In a followup, we'll introduce an eviction strategy so that capacity can
be reclaimed gracefully.
The proxy is now configured with the CONDUIT_PROXY_METRICS_RETAIN_IDLE
environment variable that dictates the amount of time that the proxy will retain
metrics that have not been updated.
A timestamp is maintained for each unique set of labels, indicating the last time
that the scope was updated. Then, when metrics are read, all metrics older than
CONDUIT_PROXY_METRICS_RETAIN_IDLE are dropped from the stats registry.
A ctx::test_utils module has been added to aid testing.
Fixes#819
Before changing the telemetry implementation, we should have a means to
understand the impacts of such changes.
To run, you must use a nightly toolchain:
```
rustup run nightly cargo bench -p conduit-proxy -- record
```
trust-dns-resolver is a more complete implementation. In particular,
it supports CNAMES correctly, which is needed for PR #764. It also
supports /etc/hosts, which will help with issue #62.
Use the 0.8.2 pre-release since it hasn't been released yet. It was
created at our request.
Signed-off-by: Brian Smith <brian@briansmith.org>
Closes#598.
According to the Prometheus documentation, metrics export endpoints should support serving metrics compressed using GZIP. I've modified the proxy's `/metrics` endpoint to serve metrics compressed with GZIP when an `Accept-Encoding: gzip` request header is sent.
I've also added a new unit test that attempts to get the proxy's metrics endpoint as GZIP, and asserts that the metrics are decompressed successfully.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Currently, the request open timestamp, which is used for calculating latency, is captured in the `sensor::http::Http` middleware. However, the sensor middleware is placed fairly low in the stack, below some of the proxy's components that can add measurable latency (e.g. the router).
This PR moves the request_open timestamp out of the `Http` middleware and into a new `TimestampRequestOpen` middleware, which is installed at the top of the stack (before the router). The `TimestampRequestOpen` middleware adds the timestamp as a request extension, so that it can later be consumed by the `Http` sensor to generate the request stats.
By moving the timestamping to the top of the stack, the timestamp should more accurately cover the overhead of the proxy, but a majority of the telemetry work can still be done where it was previously.
I'd like to have included unit tests for this change, but since the expected improvement is in the accuracy of latency measurements, there's no easy way to test this programmatically.
PR #654 adds pod-based metric labels to the Destination API responses for cluster-local services.
This PR modifies the proxy to actually add these labels to reported Prometheus metrics for outbound requests to local services.
It enhances the proxy's `control::discovery` module to track these labels and add a `LabelRequest` middleware to the service stack built in `Bind` for labeled services. Requests transiting `LabelRequest` are given an `Extension` which contains these labels, which are then added to events produced by the `Sensors` for these requests. When these events are aggregated to Prometheus metrics, the labels are added.
I've also added some tests in `test/telemetry.rs` ensuring that these metrics are added correctly when the Destination service provides labels.
Closes#660
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
- The listener is immediately closed on receipt of a shutdown signal.
- All in-progress server connections are now counted, and the process will
not shutdown until the connection count has dropped to zero.
- In the case of HTTP1, idle connections are closed. In the case of HTTP2,
the HTTP2 graceful shutdown steps are followed of sending various
GOAWAYs.
Previously when the proxy could tell, by parsing, the request-target
is not in the cluster, it would not override the destination. That is,
load balancing would be disabled for such destinations.
With this change, the proxy will do L7 load balancing for all HTTP
services as long as the request-target has a DNS name.
Signed-off-by: Brian Smith <brian@briansmith.org>
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.
Signed-off-by: Brian Smith <brian@briansmith.org>
This PR changes the proxy's `control::Cache` module from a set to a key-value map.
This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type.
Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements.
Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata.
If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.
Proxy: Refactor DNS name parsing and normalization
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.
Signed-off-by: Brian Smith <brian@briansmith.org>
- Adds environment variables to configure a set of ports that, when an
incoming connection has an SO_ORIGINAL_DST with a port matching, will
disable protocol detection for that connection and immediately start a
TCP proxy.
- Adds a default list of well known ports: SMTP and MySQL.
Closes#339
This PR adds an endpoint to the proxy that serves metrics in Prometheus' text exposition format. The endpoint currently serves the `request_total`, `response_total`, `response_latency_ms`, and `response_duration_ms metrics`, as described in #536. The endpoint's port and address are configurable with the `CONDUIT_PROXY_METRICS_LISTENER` environment variable.
Tests have been added in t`ests/telemetry.rs`
When the proxy is run in a Docker container, it runs as PID 1, with
no default signal handlers setup. In order to react to signals from
Kubernetes about shutting down, we need to set up explicit handlers.
This adds handlers for SIGTERM and SIGINT.
Closes#549
* Temporarily stop trying to support configurable zones in the proxy.
None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.
Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting
The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.
It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.
Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Further hard-code "cluster.local" as the zone, temporarily.
Addresses review feedback.
Signed-off-by: Brian Smith <brian@briansmith.org>
Currently, the `Reconnect` middleware does not reconnect on connection errors (see #491) and treats them as request errors. This means that when a connection timeout is wrapped in a `Reconnect`, timeout errors are treated as request errors, and the request returns HTTP 500. Since this is not the desired behavior, the connection timeouts should be removed, at least until their errors can be handled differently.
This PR removes the connect timeouts from `Bind`, as described in https://github.com/runconduit/conduit/pull/483#issuecomment-369380003.
It removes the `CONDUIT_PROXY_PUBLIC_CONNECT_TIMEOUT_MS` environment variable, but _not_ the `CONDUIT_PROXY_PRIVATE_CONNECT_TIMEOUT_MS` variable, since this is also used for the TCP connect timeouts. If we want also want to remove the TCP connection timeouts, I can do that as well.
Closes#483. Fixes#491.
This PR changes the proxy to log error messages using `fmt::Display` whenever possible, which should lead to much more readable and meaningful error messages
This is part of the work I started last week on issue #442. While I haven't finished everything for that issue (all errors still are mapped to HTTP 500 error codes), I wanted to go ahead and open a PR for the more readable error messages. This is partially because I found myself merging these changes into other branches to aid in debugging, and because I figured we may as well have the nicer logging on master.
Currently we have to download and build two different versions of
the ordermap crate.
I will submit similar PRs for the dependent crates so that we will
eventually all be using the same version of indexmap.
Signed-off-by: Brian Smith <brian@briansmith.org>
Version 1.7.0 of the url crate seems to be broken which means we cannot
`cargo update` the proxy without locking url to version 1.6. Since we only
use it in a very limited way anyway, and since we use http::uri for parsing
much more, just switch all uses of the url crate to use http::uri for parsing
instead.
This eliminates some build dependencies.
Signed-off-by: Brian Smith <brian@briansmith.org>
Closes#403.
When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, this means that the proxy will wait forever, rather than responding with an error.
I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)).
Testing with @klingerf's reproduction from #403:
```
curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get | head -n1
HTTP/1.1 500 Internal Server Error
```
proxy logs:
```rust
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140)
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240
proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 }))
```
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.
This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.
Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.
Fixes#287
Signed-off-by: Carl Lerche <me@carllerche.com>
Currently, the conduit proxy uses a simplistic Round-Robin load
balancing algorithm. This strategy degrades severely when individual
endpoints exhibit abnormally high latency.
This change improves this situation somewhat by making the load balancer
aware of the number of outstanding requests to each endpoint. When nodes
exhibit high latency, they should tend to have more pending requests
than faster nodes; and the Power-of-Two-Choices node selector can be
used to distribute requests to lesser-loaded instances.
From the finagle guide:
The algorithm randomly picks two nodes from the set of ready endpoints
and selects the least loaded of the two. By repeatedly using this
strategy, we can expect a manageable upper bound on the maximum load of
any server.
The maximum load variance between any two servers is bound by
ln(ln(n))` where `n` is the number of servers in the cluster.
Signed-off-by: Oliver Gould <ver@buoyant.io>
The proxy depends on `protoc`-generated gRPC bindings to communicate
with the controller. In order to generate these bindings, build-time
dependencies must be compiled.
In order to support a more granular, cacheable build scheme, a new crate
has been created to house these gRPC bindings,
`conduit-proxy-controller-grpc`.
Because `TryFrom` and `TryInto` conversions are implemented for
protobuf-defined types, the `convert` module also had to be moved to
into a dedicated crate.
Furthermore, because the proxy's tests require that
`quickcheck::Aribtrary` be implemented for protobuf types, the
`conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature
fla protobuf types, the `conduit-proxy-controller-grpc` crate supports
an _arbitrary_ feature flag.
While we're moving these libraries around, the `tower-router` crate has
been moved to `proxy/router` and renamed to `conduit-proxy-router.`
`futures-mpsc-lossy` has been moved into the proxy directory but has not
been renamed.
Finally, the `proxy/Dockerfile-deps` image has been updated to avoid the
wasteful building of dependency artifacts, as they are not actually used
by `proxy/Dockerfile`.
The conduit repo includes several library projects that have since been
moved into external repos, including `tower-grpc` and `tower-h2`.
This change removes these vendored libraries in favor of using the new
external crates.
The proxy will now try to detect what protocol new connections are
using, and route them accordingly. Specifically:
- HTTP/2 stays the same.
- HTTP/1 is now accepted, and will try to send an HTTP/1 request
to the target.
- If neither HTTP/1 nor 2, assume a TCP stream and simply forward
between the source and destination.
* tower-h2: fix Server Clone bounds
* proxy: implement Async{Read,Write} extra methods for Connection
Closes#130Closes#131