The proxy's metrics are instrumented with a `tls` label that describes
the state of TLS for each connection and associated messges.
This same level of detail is useful to get in `tap` output as well.
This change updates Tap in the following ways:
* `TapEvent` protobuf updated:
* Added `source_meta` field including source labels
* `proxy_direction` enum indicates which proxy server was used.
* The proxy adds a `tls` label to both source and destination meta indicating the state of each peer's connection
* The CLI uses the `proxy_direction` field to determine which `tls` label should be rendered.
The `tls` label could sometimes be formatted incorrectly, without a
preceding comma.
To fix this, the `TlsStatus` type no longer formats commas so that they
must be provided in the context in which they are used (as is done
otherwise in this file).
This branch modifies the proxy's logic for opening a connection so
that when an attempted TLS handshake fails, the proxy will retry that
connection without TLS.
This is implemented by changing the `UpgradeToTls` case in the `Future`
implementation for `Connecting`, so that rather than simply wrapping
a poll to the TLS upgrade future with `try_ready!` (and thus failing
the future if the upgrade future fails), we reset the state of the
future to the `Plaintext` state and continue looping. The `tls_status`
field of the future is changed to `ReasonForNoTls::HandshakeFailed`,
and the `Plaintext` state is changed so that if its `tls_status` is
`HandshakeFailed`, it will no longer attempt to upgrade to TLS when the
plaintext connection is successfully established.
Closes#1084
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
If the controller address has a loopback host then don't use TLS to connect
to it. TLS isn't needed for security in that case. In mormal configurations
the proxy isn't terminating TLS for loopback connections anyway.
Signed-off-by: Brian Smith <brian@briansmith.org>
Instead of attempting to construct identities itself, have the proxy
accept fully-formed identities from whatever configures it. This allows
us to centralize the formatting of the identity strings in the Go code
that is shared between the `conduit inject`, `conduit install`, and CA
components.
One wrinkle: The pod namespace isn't necessarily available at
`conduit inject` time, so the proxy must implement a simple variable
substitution mechanism to insert the pod namespace into its identity.
This has the side-effect of enabling TLS to the controller since the
controller's identity is now available.
Signed-off-by: Brian Smith <brian@briansmith.org>
This branch adds buckets for latencies below 10 ms to the proxy's latency
histograms, and removes the buckets for 100, 200, 300, 400, and 500
seconds, so the largest non-infinity bucket is 50,000 ms. It also removes
comments that claimed that these buckets were the same as those created
by the control plane, as this is no longer true (the metrics are now scraped
by Prometheus from the proxy directly).
Closes#1208
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Update dest service with a different tls identity strategy
* Send controller namespace as separate field
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
#1203 introduced a bug in the implementation of `Future` for
`connection::ConditionallyUpgradeServerToTls`. If the attempt to match
the current peek buffer was incomplete, the `Future` implementation
would return `Ok(Async::NotReady)`. This results in the task yielding.
However, in this case the task would not be notified again, as the
`NotReady` state wasn't from an underlying IO resource. Instead, the
would _never_ be ready.
This branch fixes this issue by simply continuing the loop, so that
we instead try to read more bytes from the socket and try to match
again, until the match is successful or the _socket_ returns `NotReady`.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
When the proxy receives a `CONNECT` request, the HTTP Upgrade pieces
are used since a CONNECT is very similar to an Upgrade. If the CONNECT
response back from the proxied client request is successful, the
connection is converted into a TCP proxy, just like with Upgrades.
There are currently two issues which can lead to false positives (changes being
reported when files have not actually changed) in the polling-based filesystem
watch implementation.
The first issue is that when checking each watched file for changes, the loop
iterating over each path currently short-circuits as soon as it detects a
change. This means that if two or more files have changed, the first time we
poll the fs, we will see the first change, then if we poll again, we will see
the next change, and so on.
This branch fixes that issue by always hashing all the watched files, even if a
change has already been detected. This way, if all the files change between one
poll and the next, we no longer generate additional change events until a file
actually changes again.
The other issue is that the old implementation would treat any instance of a
"file not found" error as indicating that the file had been deleted, and
generate a change event. This leads to changes repeatedly being detected as
long as a file does not exist, rather than a single time when the file's
existence state actually changes.
This branch fixes that issue as well, by only generating change events on
"file not found" errors if the file existed the last time it was polled.
Otherwise, if a file did not previously exist, we no longer generate a new
event.
I've verified both of these fixes through manual testing, as well as a new
test for the second issue. The new test fails on master but passes on this
branch.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
On the server (accept) side of TLS, if the traffic isn't targetting the
proxy (as determined by the TLS ClientHello SNI), or if the traffic
isn't TLS, then pass it through.
Signed-off-by: Brian Smith <brian@briansmith.org>
Copy most of the implementation of `connection::Connection` to create
a way to prefix a `TcpStream` with some previously-read bytes. This
will allow us to read and parse a TLS ClientHello message to see if it
is intended for the proxy to process, and then "rewind" and feed it
back into the TLS implementation if so.
This must be in the `transport` submodule in order for it to implement
the private `Io` trait.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Proxy: Add parser to distinguish proxy TLS traffic from other traffic.
Distinguish incoming TLS traffic intended for the proxy to terminate
from TLS traffic intended for the proxied service to terminate and from
non-TLS traffic.
The new version of `untrusted` is required for this to work.
Signed-off-by: Brian Smith <brian@briansmith.org>
* More tests
Signed-off-by: Brian Smith <brian@briansmith.org>
* Stop abusing `futures::Async`.
Signed-off-by: Brian Smith <brian@briansmith.org>
As the TLS client config watch stored in `ctx::Process` is used only in
`Bind`, it's not necessary for it to be part of the process context.
Instead, it can be explicitly passed into `Bind`.
The resultant code is simpler, and resolves a potential cyclic
dependency caused when adding `Sensors` to the watch (see
https://github.com/runconduit/conduit/pull/1141#issuecomment-400082357).
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This branch adds the rebinding logic added to outbound clients in #1185
to the controller client used in the proxy's `control::destination::background`
module. Now, if we are communicating with the control plane over TLS, we will
rebind the controller client stack if the TLS client configuration changes,
using the `WatchService` added in #1177.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Signed-off-by: Brian Smith <brian@briansmith.org>
Co-authored-by: Brian Smith <brian@briansmith.org>
control/mod.rs contains a variety of miscelaneous utilities. In
preparation of adding other types into the root of `control`, this
change creates a `control::util` module that holds them.
Rearrange the TLS configuration loading tests to enable them to be
extended outside the tls::config submodule.
Signed-off-by: Brian Smith <brian@briansmith.org>
Simplify the code and make it easier to report finer-grained
reasoning about what part(s) of the TLS configuration are
missing.
This is based on Eliza's PR #1186.
Signed-off-by: Brian Smith <brian@briansmith.org>
This branch adds process stats to the proxy's metrics, as described in
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics.
In particular, it adds metrics for the process's total CPU time, number of
open file descriptors and max file descriptors, virtual memory size, and
resident set size.
This branch adds a dependency on the `procinfo` crate. Since this crate and the
syscalls it wraps are Linux-specific, these stats are only reported on Linux.
On other operating systems, they aren't reported.
Manual testing
Metrics scrape:
```
eliza@ares:~$ curl http://localhost:4191/metrics
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 19
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 45252608
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 12132352
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1529017536
```
Note that the `process_cpu_seconds_total` stat is 0 because I just launched this conduit instance and it's not seeing any load; it does go up after i sent a few requests to it.
Confirm RSS & virtual memory stats w/ `ps`, and get Conduit's pid so we can check the fd stats
(note that `ps` reports virt/rss in kb while Conduit's metrics reports them in bytes):
```
eliza@ares:~$ ps aux | grep conduit | grep -v grep
eliza 16766 0.0 0.0 44192 12956 pts/2 Sl+ 16:05 0:00 target/debug/conduit-proxy
```
Count conduit process's open fds:
```
eliza@ares:~$ cd /proc/16766/fd
eliza@ares:/proc/16766/fd$ ls -l | wc -l
18
```
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This branch changes the proxy's `Bind` module to add a middleware layer
which watches for TLS cliend configuration changes and rebinds the
endpoint stacks of any endpoints with which it is able to communicate with over
TLS (i.e. those with `TlsIdentity` metadata) when the client config changes. The
rebinding is done at the level of individual endpoint stacks, rather than for the
entire service stack for the destination.
This obsoletes my previous PRs #1169 and #1175.
Closes#1161
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
WatchService is a middleware that rebinds its inner service
each time a Watch updates.
This is planned to be used to rebind endpoint stacks when TLS
configuration changes. Later, it should probably be moved into
the tower repo.
While investigating TLS configuration, I found myself wanting a
docstring on `tls::config::watch_for_config_changes`.
This has one minor change in functionality: now, `future::empty()`
is returned instead of `future:ok(())` so that the task never completes.
It seems that, ultimately, we'll want to treat it as an error if we lose
the ability to receive configuration updates.
* Proxy: Implement TLS conditional accept more like TLS conditional connect.
Clean up the accept side of the TLS configuration logic.
Signed-off-by: Brian Smith <brian@briansmith.org>
Any HTTP/1.1 requests seen by the proxy will automatically set up
to prepare such that if the proxied responses agree to an upgrade,
the two connections will converted into a standard TCP proxy duplex.
Implementation
-----------------
This adds a new type, `transparency::Http11Upgrade`, which is a sort of rendezvous type for triggering HTTP/1.1 upgrades. In the h1 server service, if a request looks like an upgrade (`h1::wants_upgrade`), the request body is decorated with this new `Http11Upgrade` type. It is actually a pair, and so the second half is put into the request extensions, so that the h1 client service may look for it right before serialization. If it finds the half in the extensions, it decorates the *response* body with that half (if it looks like a response upgrade (`h1::is_upgrade`)).
The `HttpBody` type now has a `Drop` impl, which will look to see if its been decorated with an `Http11Upgrade` half. If so, it will check for hyper's new `Body::on_upgrade()` future, and insert that into the half.
When both `Http11Upgrade` halves are dropped, its internal `Drop` will look to if both halves have supplied an upgrade. If so, the two `OnUpgrade` futures from hyper are joined on, and when they succeed, a `transparency::tcp::duplex()` future is created. This chain is spawned into the default executor.
The `drain::Watch` signal is carried along, to ensure upgraded connections still count towards active connections when the proxy wants to shutdown.
Closes#195
This adds `Io::write_buf_erased` that doesn't required `Self: Sized`, so
it can be called on trait objects. By using this method, specialized
methods of `TcpStream` (and others) can use their `write_buf` to do
vectored writes.
Since it can be easy to forget to call `Io::write_buf_erased` instead of
`Io::write_buf`, the concept of making a `Box<Io>` has been made
private. A new type, `BoxedIo`, implements all the super traits of `Io`,
while making the `Io` trait private to the `transport` module. Anything
hoping to use a `Box<Io>` can use a `BoxedIo` instead, and know that
the write buf erase dance is taken care of.
Adds a test to `transport::io` checking that the dance we've done does
indeed call the underlying specialized `write_buf` method.
Closes#1162
* Proxy: More carefully keep track of the reason TLS isn't used.
There is only one case where we dynamically don't know whether we'll
have an identity to construct a TLS connection configuration. Refactor
the code with that in mind, better documenting all the reasons why an
identity isn't available.
Signed-off-by: Brian Smith <brian@briansmith.org>
Move TLS cipher suite configuration to tls::config.
Use the same configuration to act as a client and a server.
Signed-off-by: Brian Smith <brian@briansmith.org>
The comments in Outbound::recognize had become somewhat stale as the
logic changed. Furthermore, this implementation may be easier to
understand if broken into smaller pieces.
This change reorganizes the Outbound:recognize method into helper
methods--`destination`, `host_port`, and `normalize`--each with
accompanying docstrings that more accurately reflect the current
implementation.
This also has the side-effect benefit of eliminating a string clone on
every request.
Depends on #1047.
This PR adds a `tls="true"` label to metrics produced by TLS connections and
requests/responses on those connections, and a `tls="no_config"` label on
connections where TLS was enabled but the proxy has not been able to load
a valid TLS configuration.
Currently, these labels are only set on accepted connections, as we are not yet
opening encrypted connections, but I wired through the `tls_status` field on
the `Client` transport context as well, so when we start opening client
connections with TLS, the label will be applied to their metrics as well.
Closes#1046
Signed-off-by: Eliza Weisman <eliza@buoyanbt.io>
* Proxy: Make TLS server aware of its own identity.
When validating the TLS configuration, make sure the certificate is
valid for the current pod. Make the pod's identity available at that
point in time so it can do so. Since the identity is available now,
simplify the validation of our own certificate by using Rustls's API
instead of dropping down to the lower-level webpli API.
This is a step towards the server differentiating between TLS
handshakes it is supposed to terminate vs. TLS handshakes it is
supposed to pass through.
This is also a step toward the client side (connect) of TLS, which will
reuse much of the configuration logic.
Signed-off-by: Brian Smith <brian@briansmith.org>
Using MS Edge and probably other clients with the Conduit proxy when
TLS is enabled fails because Rustls doesn't take into consideration
that Conduit only supports one signature scheme (ECDSA P-256 SHA-256).
This bug was fixed in Rustls when ECDSA support was added, after the
latest release. With this change MS Edge can talk to Conduit.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously, the proxy would not attempt to load its TLS certificates until a fs
watch detected that one of them had changed. This means that if the proxy was
started with valid files already at the configured paths, it would not load
them until one of the files changed.
This branch fixes that issue by starting the stream of changes with one event
_followed_ by any additional changes detected by watching the filesystem.
I've manually tested that this fixes the issue, both on Linux and on macOS, and
can confirm that this fixes the issue. In addition, when I start writing
integration tests for certificate reloading, I'll make sure to include a test
to detect any regressions.
Closes#1133.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Refactor the way the TLS trust anchors are configured in preparation
for the client and server authenticating each others' certificates.
Make the use of client certificates optional pending the implementation
of authorization policy.
Signed-off-by: Brian Smith <brian@briansmith.org>
When a TLS handshake error occurs, the proxy just stops accepting
requests. It seems my expectations of how `Stream` handles errors
were wrong.
The test for this will be added in a separate PR after the
infrastructure needed for TLS testing is added. (This is a chicken
and egg problem.)
Signed-off-by: Brian Smith <brian@briansmith.org>
This PR changes the proxy's Inotify watch code to avoid always falling back to
polling the filesystem when the watched files don't exist yet. It also contains
some additional cleanup and refactoring of the inotify code, including moving
the non-TLS-specific filesystem watching code out of the `tls::config` module
and into a new `fs_watch` module.
In addition, it adds tests for both the polling-based and inotify-based watch
implementations, and changes the polling-based watches to hash the files rather
than using timestamps from the file's metadata to detect changes. These changes
are originally from #1094 and #1091, respectively, but they're included here
because @briansmith asked that all the changes be made in one PR.
Closes#1094. Closes#1091. Fixes#1090. Fixes#1097. Fixes#1061.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
prost-0.4.0 has been released, which removes unnecessary dependencies.
tower-grpc is being updated simultaneously, as this is the proxy's
primary use of prost.
See: https://github.com/danburkert/prost/releases/tag/v0.4.0
* proxy: Update `rand` to 0.5.1
The proxy depends on rand-0.4, which is superceded by newer APIs in
rand-0.5. Since we're already using rand-0.5 via the tower-balance
crate, it seems appropriate to upgrade the proxy.
* Expand lock files in reviews
In e2093e3, we created a `convert` crate when refactoring the proxy's
gRPC bindings into a dedicated crate.
It's not really necessary to handle `convert` as a crate, given that it
holds a single 39-line file that's mostly comments. It's possible to
"vendor" this file in the proxy, and controller-grpc crate doesn't
even need this trait (in fact, the proxy probably doesn't either).
`tower-balance` has been updated with a Peak-EWMA load balancer; and a
new crate, `tower-h2-balance` has been introduced to make the load
balancer aware of some H2 stream events.
The Peak-EWMA balancer is designed to reduce tail latency by maintaining
an Exponentially Weighted Moving Average of latencies to each endpoint
which decay over a 10s window.
This commit adds the initial wiring to forward TLS config changes to the
watches used by TLS clients as well as TLS servers. As the TLS clients
are not yet implemented, the config type is currently `()`, but once
the client config is implemented, we should be able to drop it in
seamlessly.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-authored-by: Brian Smith <brian@briansmith.org>
Depends on #1032.
This branch makes some additional changes to the proxy's DNS code. In
particular, since we no longer need to clone the resolver on every lookup,
it removes some `clone()` calls in `DestinationSet::reset_dns_query`.
I've also changed the DNS futures to use the new contextual logging code
on master.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Fix non-Linux builds.
The change to signal.rs is needed for Windows.
The change to config.rs is needed for Windows and maybe other platforms.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Proxy: Better encapsulate the details of TLS config watching.
Encapsulate more of the TLS configuration logic in the TLS submodule. This allows
for easier refactoring. In particular, this will make adding the client TLS configuration
easier.
Signed-off-by: Brian Smith <brian@briansmith.org>
This branch adds an inotify-based implementation of filesystem watches
for the TLS config files. On Linux, where inotify is available, this is
used instead of the polling-based code I added in #1056 and #1076.
In order to avoid the issues detecting changes to files in Kubernetes
ConfigMaps described in #1061, we watch the directory _containing_ the
files we care about rather than the files themselves. I've tested this
manually in Docker for Mac Kubernetes and can confirm that ConfigMap
changes are detected successfully.
Closes#1061. Closes#369.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>