Simplify the code and make it easier to report finer-grained
reasoning about what part(s) of the TLS configuration are
missing.
This is based on Eliza's PR #1186.
Signed-off-by: Brian Smith <brian@briansmith.org>
This branch adds process stats to the proxy's metrics, as described in
https://prometheus.io/docs/instrumenting/writing_clientlibs/#process-metrics.
In particular, it adds metrics for the process's total CPU time, number of
open file descriptors and max file descriptors, virtual memory size, and
resident set size.
This branch adds a dependency on the `procinfo` crate. Since this crate and the
syscalls it wraps are Linux-specific, these stats are only reported on Linux.
On other operating systems, they aren't reported.
Manual testing
Metrics scrape:
```
eliza@ares:~$ curl http://localhost:4191/metrics
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 19
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 45252608
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 12132352
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1529017536
```
Note that the `process_cpu_seconds_total` stat is 0 because I just launched this conduit instance and it's not seeing any load; it does go up after i sent a few requests to it.
Confirm RSS & virtual memory stats w/ `ps`, and get Conduit's pid so we can check the fd stats
(note that `ps` reports virt/rss in kb while Conduit's metrics reports them in bytes):
```
eliza@ares:~$ ps aux | grep conduit | grep -v grep
eliza 16766 0.0 0.0 44192 12956 pts/2 Sl+ 16:05 0:00 target/debug/conduit-proxy
```
Count conduit process's open fds:
```
eliza@ares:~$ cd /proc/16766/fd
eliza@ares:/proc/16766/fd$ ls -l | wc -l
18
```
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
This branch changes the proxy's `Bind` module to add a middleware layer
which watches for TLS cliend configuration changes and rebinds the
endpoint stacks of any endpoints with which it is able to communicate with over
TLS (i.e. those with `TlsIdentity` metadata) when the client config changes. The
rebinding is done at the level of individual endpoint stacks, rather than for the
entire service stack for the destination.
This obsoletes my previous PRs #1169 and #1175.
Closes#1161
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
WatchService is a middleware that rebinds its inner service
each time a Watch updates.
This is planned to be used to rebind endpoint stacks when TLS
configuration changes. Later, it should probably be moved into
the tower repo.
PR #978 introduced usage of parallel in docker-build. Unfortunately this
breaks if the system has non-GNU parallel.
Remove usage of parallel until we can do at least one of the following:
- detect version of parallel installed
- make usage of parallel optional and off by default
- confirm this speeds up builds for a majority of use cases
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add CA certificate bundle distributor to conduit install
* Update ca-distributor to use shared informers
* Only install CA distributor when --enable-tls flag is set
* Only copy CA bundle into namespaces where inject pods have the same controller
* Update API config to only watch pods and configmaps
* Address review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
While investigating TLS configuration, I found myself wanting a
docstring on `tls::config::watch_for_config_changes`.
This has one minor change in functionality: now, `future::empty()`
is returned instead of `future:ok(())` so that the task never completes.
It seems that, ultimately, we'll want to treat it as an error if we lose
the ability to receive configuration updates.
* Proxy: Implement TLS conditional accept more like TLS conditional connect.
Clean up the accept side of the TLS configuration logic.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Any HTTP/1.1 requests seen by the proxy will automatically set up
to prepare such that if the proxied responses agree to an upgrade,
the two connections will converted into a standard TCP proxy duplex.
Implementation
-----------------
This adds a new type, `transparency::Http11Upgrade`, which is a sort of rendezvous type for triggering HTTP/1.1 upgrades. In the h1 server service, if a request looks like an upgrade (`h1::wants_upgrade`), the request body is decorated with this new `Http11Upgrade` type. It is actually a pair, and so the second half is put into the request extensions, so that the h1 client service may look for it right before serialization. If it finds the half in the extensions, it decorates the *response* body with that half (if it looks like a response upgrade (`h1::is_upgrade`)).
The `HttpBody` type now has a `Drop` impl, which will look to see if its been decorated with an `Http11Upgrade` half. If so, it will check for hyper's new `Body::on_upgrade()` future, and insert that into the half.
When both `Http11Upgrade` halves are dropped, its internal `Drop` will look to if both halves have supplied an upgrade. If so, the two `OnUpgrade` futures from hyper are joined on, and when they succeed, a `transparency::tcp::duplex()` future is created. This chain is spawned into the default executor.
The `drain::Watch` signal is carried along, to ensure upgraded connections still count towards active connections when the proxy wants to shutdown.
Closes#195
This adds `Io::write_buf_erased` that doesn't required `Self: Sized`, so
it can be called on trait objects. By using this method, specialized
methods of `TcpStream` (and others) can use their `write_buf` to do
vectored writes.
Since it can be easy to forget to call `Io::write_buf_erased` instead of
`Io::write_buf`, the concept of making a `Box<Io>` has been made
private. A new type, `BoxedIo`, implements all the super traits of `Io`,
while making the `Io` trait private to the `transport` module. Anything
hoping to use a `Box<Io>` can use a `BoxedIo` instead, and know that
the write buf erase dance is taken care of.
Adds a test to `transport::io` checking that the dance we've done does
indeed call the underlying specialized `write_buf` method.
Closes#1162
* Enable optional parallel build of docker images
By default, docker does image builds in a single thread. For our containers, this is a little slow on my system. Using `parallel` allows for *optional* improvements in speed there.
Before: 41s
After: 22s
* Move parallel help text to stderr
Don't allow the CLI or Web UI to request named resources if --all-namespaces is used.
This follows kubectl, which also does not allow requesting named resources
over all namespaces.
This PR also updates the Web API's behaviour to be in line with the CLI's.
Both will now default to the default namespace if no namespace is specified.
* Proxy: More carefully keep track of the reason TLS isn't used.
There is only one case where we dynamically don't know whether we'll
have an identity to construct a TLS connection configuration. Refactor
the code with that in mind, better documenting all the reasons why an
identity isn't available.
Signed-off-by: Brian Smith <brian@briansmith.org>
Move TLS cipher suite configuration to tls::config.
Use the same configuration to act as a client and a server.
Signed-off-by: Brian Smith <brian@briansmith.org>
Problem
`conduit stat` would cause a panic for any resource that wasn't in the list
of StatAllResourceTypes
This bug was introduced by https://github.com/runconduit/conduit/pull/1088/files
Solution
Fix writeStatsToBuffer to not depend on what resources are in StatAllResourceTypes
Also adds a unit test and integration test for `conduit stat ns`
* dest service: close open streams on shutdown
* Log instead of print in pkg packages
* Convert ServerClose to a receive-only channel
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The comments in Outbound::recognize had become somewhat stale as the
logic changed. Furthermore, this implementation may be easier to
understand if broken into smaller pieces.
This change reorganizes the Outbound:recognize method into helper
methods--`destination`, `host_port`, and `normalize`--each with
accompanying docstrings that more accurately reflect the current
implementation.
This also has the side-effect benefit of eliminating a string clone on
every request.
- If error messages are very long, truncate them and display a toggle to show the full message
- Tweak the headings - remove Pod, Container and Image - instead show them as titles
- Also move over from using Ant's Modal.method to the plain Modal component, which is a
little simpler to hook into our other renders.
Depends on #1047.
This PR adds a `tls="true"` label to metrics produced by TLS connections and
requests/responses on those connections, and a `tls="no_config"` label on
connections where TLS was enabled but the proxy has not been able to load
a valid TLS configuration.
Currently, these labels are only set on accepted connections, as we are not yet
opening encrypted connections, but I wired through the `tls_status` field on
the `Client` transport context as well, so when we start opening client
connections with TLS, the label will be applied to their metrics as well.
Closes#1046
Signed-off-by: Eliza Weisman <eliza@buoyanbt.io>
* Proxy: Make TLS server aware of its own identity.
When validating the TLS configuration, make sure the certificate is
valid for the current pod. Make the pod's identity available at that
point in time so it can do so. Since the identity is available now,
simplify the validation of our own certificate by using Rustls's API
instead of dropping down to the lower-level webpli API.
This is a step towards the server differentiating between TLS
handshakes it is supposed to terminate vs. TLS handshakes it is
supposed to pass through.
This is also a step toward the client side (connect) of TLS, which will
reuse much of the configuration logic.
Signed-off-by: Brian Smith <brian@briansmith.org>
Remove the filling and stacking in request rate graphs that combine resources,
to make it easier to spot outliers.
* Grafana: remove fill and stack from individual resource breakouts
* Remove all the stacks and fills from request rates everywhere
The `kubernetes-nodes-cadvisor` Prometheus queries node-level data via
the Kubernetes API server. In some configurations of Kubernetes, namely
minikube and at least one baremetal kubespray cluster, this API call
requires the `get` verb on the `nodes/proxy` resource.
Enable `get` for `nodes/proxy` for the `conduit-prometheus` service
account.
Fixes#912
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Using MS Edge and probably other clients with the Conduit proxy when
TLS is enabled fails because Rustls doesn't take into consideration
that Conduit only supports one signature scheme (ECDSA P-256 SHA-256).
This bug was fixed in Rustls when ECDSA support was added, after the
latest release. With this change MS Edge can talk to Conduit.
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously, the proxy would not attempt to load its TLS certificates until a fs
watch detected that one of them had changed. This means that if the proxy was
started with valid files already at the configured paths, it would not load
them until one of the files changed.
This branch fixes that issue by starting the stream of changes with one event
_followed_ by any additional changes detected by watching the filesystem.
I've manually tested that this fixes the issue, both on Linux and on macOS, and
can confirm that this fixes the issue. In addition, when I start writing
integration tests for certificate reloading, I'll make sure to include a test
to detect any regressions.
Closes#1133.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Refactor the way the TLS trust anchors are configured in preparation
for the client and server authenticating each others' certificates.
Make the use of client certificates optional pending the implementation
of authorization policy.
Signed-off-by: Brian Smith <brian@briansmith.org>
When a TLS handshake error occurs, the proxy just stops accepting
requests. It seems my expectations of how `Stream` handles errors
were wrong.
The test for this will be added in a separate PR after the
infrastructure needed for TLS testing is added. (This is a chicken
and egg problem.)
Signed-off-by: Brian Smith <brian@briansmith.org>
* Start running integration tests in CI
* Add gcp helper funcs
* Split integration test cleanup into separate phase
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Display proxy container errors in the Web UI
Add an error modal to display pod errors
Add icon to data tables to indicate errors are present
Display errors on the Service Mesh Overview Page and all the resource pages
This PR changes the proxy's Inotify watch code to avoid always falling back to
polling the filesystem when the watched files don't exist yet. It also contains
some additional cleanup and refactoring of the inotify code, including moving
the non-TLS-specific filesystem watching code out of the `tls::config` module
and into a new `fs_watch` module.
In addition, it adds tests for both the polling-based and inotify-based watch
implementations, and changes the polling-based watches to hash the files rather
than using timestamps from the file's metadata to detect changes. These changes
are originally from #1094 and #1091, respectively, but they're included here
because @briansmith asked that all the changes be made in one PR.
Closes#1094. Closes#1091. Fixes#1090. Fixes#1097. Fixes#1061.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
prost-0.4.0 has been released, which removes unnecessary dependencies.
tower-grpc is being updated simultaneously, as this is the proxy's
primary use of prost.
See: https://github.com/danburkert/prost/releases/tag/v0.4.0
- It would be nice to display container errors in the UI. This PR gets the pod's container
statuses and returns them in the public api
- Also add a terminationMessagePolicy to conduit's inject so that we can capture the
proxy's error messages if it terminates
* proxy: Update `rand` to 0.5.1
The proxy depends on rand-0.4, which is superceded by newer APIs in
rand-0.5. Since we're already using rand-0.5 via the tower-balance
crate, it seems appropriate to upgrade the proxy.
* Expand lock files in reviews