Commit Graph

303 Commits

Author SHA1 Message Date
Oliver Gould 4e79348af7
Fully encapsulate process metrics in `mod process` (#41)
The `process` module exposes a `Sensor` type that is different from
other types called `Sensor`. Most `Sensor` types instrument other
types with telemetry. The `process::Sensor` type, on the other hand,
is used to read system metrics from the `/proc` filesystem, returning
a metrics summary.

Furthermore, `telemetry::metrics::Root` owns the process start time
metric.

In the interest of making the telemetry system more modular, this moves
all process-related telemetry concerns into the `process` module.
Instead of exposing a `Sensor` that produces metrics, a single public
`Process` type implements `fmt::Display` directly.

This removes process-related concerns from `telemetry/metrics/mod.rs` to
set up further refactoring along these lines.
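
A minimal sketch of the shape described above (illustrative only, not the
actual module code): a single `Process` type that reads system stats and
renders its metrics directly via `fmt::Display`, rather than exposing a
separate `Sensor`.

```
use std::fmt;

struct Process {
    start_time_secs: u64, // captured once, when the proxy starts
}

impl fmt::Display for Process {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // The real module reads more stats from the /proc filesystem;
        // only the start-time gauge is shown here as an example.
        writeln!(f, "# TYPE process_start_time_seconds gauge")?;
        writeln!(f, "process_start_time_seconds {}", self.start_time_secs)
    }
}
```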
2018-08-06 14:09:33 -07:00
Eliza Weisman 1774c87400
Refactor control::destination::background::client module (#38)
This branch should not make any functional changes.

This branch makes two minor refactorings to the `client` module in 
`control::destination::background`:

 1. Remove the `AddOrigin` middleware and replace it with the 
    `tower-add-origin` crate from `tower-http`. These middlewares are
    functionally identical, but the Tower version has tests.
 2. Change `ClientService` from a type alias to a tuple struct. This
    means that some of the middleware that are used only in this module
    (`LogErrors` and `Backoff`) are no longer part of a publicly visible
    type and can be made private to the module.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-03 17:00:20 -07:00
Sean McArthur ab1b280de8
Add orig-proto which uses HTTP2 between proxies (#32)
When the destination service returns a hint that an endpoint is another
proxy, eligible HTTP/1 requests are translated into HTTP/2 and sent over
an HTTP/2 connection. The original protocol details are encoded in a
header, `l5d-orig-proto`. When a proxy receives an inbound HTTP/2
request with this header, the request is translated back into its HTTP/1
representation before being passed to the internal service.
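
A rough sketch of the header round-trip described above, using the `http`
crate; the header name comes from this commit message, but the value format
shown is only illustrative, not necessarily what the proxy emits.

```
use http::{header::HeaderValue, Request, Version};

fn upgrade_to_h2<B>(mut req: Request<B>) -> Request<B> {
    // Record the original protocol before forwarding over HTTP/2.
    req.headers_mut()
        .insert("l5d-orig-proto", HeaderValue::from_static("HTTP/1.1"));
    *req.version_mut() = Version::HTTP_2;
    req
}

fn downgrade_to_h1<B>(mut req: Request<B>) -> Request<B> {
    // On the inbound side, restore the original protocol and drop the header.
    if req.headers_mut().remove("l5d-orig-proto").is_some() {
        *req.version_mut() = Version::HTTP_11;
    }
    req
}
```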

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-08-03 15:03:14 -07:00
Eliza Weisman 1e24aeb615
Limit concurrent Destination service queries (#36)
Required for linkerd/linkerd2#1322.

Currently, the proxy places a limit on the number of active routes
in the route cache. This limit defaults to 100 routes, and is intended
to prevent the proxy from requesting more than 100 lookups from the 
Destination service. 

However, in some cases, such as Prometheus scraping a large number of
pods, the proxy hits this limit even though none of those requests 
actually result in requests to service discovery (since Prometheus 
scrapes pods by their IP addresses). 

This branch implements @briansmith's suggestion in 
https://github.com/linkerd/linkerd2/issues/1322#issuecomment-407161829.
It splits the router capacity limit into two separate, configurable 
limits, one that sets an upper bound on the number of concurrently 
active destination lookups, and one that limits the capacity of the
router cache.
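
A minimal sketch, under assumed names, of the bookkeeping this implies:
destination lookups draw from a fixed pool of query slots that is tracked
separately from the router cache's capacity.

```
struct QueryLimit {
    max_active: usize,
    active: usize,
}

impl QueryLimit {
    /// Try to reserve a slot for a new Destination service query.
    fn try_acquire(&mut self) -> bool {
        if self.active < self.max_active {
            self.active += 1;
            true
        } else {
            false // over the limit: don't issue another lookup
        }
    }

    /// Release a slot when an inactive route is evicted from the router cache.
    fn release(&mut self) {
        self.active = self.active.saturating_sub(1);
    }
}
```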

I've done some preliminary testing using the `lifecycle` tests, where a
single Prometheus instance is configured to scrape a very large number 
of proxies. In these tests, neither limit is reached. Furthermore, I've added
integration tests in `tests/discovery` to exercise the destination service 
query limit. These tests ensure that query capacity is released when inactive
routes that created queries are evicted from the router cache, and that the
limit does _not_ affect DNS queries.

This branch obsoletes and closes #27, which contained an earlier version of
these changes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-02 16:40:12 -07:00
Oliver Gould 18a8d7956d
Improve tcp_connect_err test flakiness (#37)
Both tcp_connect_err tests frequently fail, even during local
development. This seems to happen because the proxy doesn't necessarily
observe the socket closure.

Instead of shutting down the socket gracefully, we can just drop it!
This helps the test pass much more reliably.
2018-08-01 17:25:49 -07:00
Eliza Weisman 37164afb3a
refactor: Make `Background::query_destination_service_if_relevant` into a method (#35)
This is strictly a refactor which should make no functional changes.

Currently, the function used to construct new Destination service
queries (`Background::query_destination_service_if_relevant`) is a
function rather than a method on `Background`, although it takes as an
argument a field from `Background`. This is because in some cases, it is
called where `self.destinations` is borrowed mutably, preventing `self`
from being borrowed immutably.

Right now, this means that one additional field has to be passed
explicitly. However, in order to add the limit on active Destination
service queries, it was necessary to add two additional fields to
`Background` that have to be passed to this function. Since these
arguments should always come from fields on `Background`, it would be
preferable for this to be a method.

This branch breaks out some of the fields on
`control::destination::Background` into their own structs:
`DestinationCache`, which holds the map of DNS names to
`DestinationSet`s and the queue of DNS names that need reconnects; and
`Config`, which holds the configuration necessary to create a new
Destination service query (currently just the `Namespaces` config). This
allows us to have separate borrows on the `DestinationCache` and
`Config`. 

When I make the additional changes necessary to add the limit on active 
destination queries, the two additional fields necessary can be added to 
`Config`, rather than having to explicitly pass them into 
`query_destination_service_if_relevant` every time it's called.
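
A simplified sketch of the split described above (struct names from this
commit message; the fields shown are illustrative). Keeping the query
configuration and the cache in separate structs lets a method borrow the
cache mutably while still reading the config.

```
use std::collections::HashMap;

struct Config {
    namespaces: Vec<String>, // stand-in for the `Namespaces` config
}

struct DestinationCache {
    destinations: HashMap<String, u32>, // DNS name -> destination set (placeholder)
}

struct Background {
    config: Config,
    cache: DestinationCache,
}

impl Background {
    fn query_destination_service_if_relevant(&mut self, name: &str) {
        // Disjoint field borrows: `&mut self.cache` and `&self.config`
        // can coexist because they refer to different fields.
        let cfg = &self.config;
        let dests = &mut self.cache.destinations;
        if !cfg.namespaces.is_empty() {
            dests.entry(name.to_string()).or_insert(0);
        }
    }
}
```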

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-01 15:37:48 -07:00
Eliza Weisman a615834f7b
Refactor `control::destination::background` into smaller modules (#34)
This branch is purely a refactor and should result in no functional 
changes.

The `control::destination::background` module has become quite large,
making the code difficult to read and review changes to. This branch
separates out the `DestinationSet` type and the destination service
client code into their own modules inside of `background`. Furthermore,
it rolls the `control::utils` module into the `client` submodule of
`background`, as that code is only used in the `client` module.

I think there's some additional work that can be done to make this code
clearer beyond simply splitting apart some of these large files, and I
intend to do some refactoring in additional follow-up branches.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-01 14:14:34 -07:00
Eliza Weisman 51e07b2a68
Update h2 version to v0.1.11 (#33)
This branch updates the proxy's `h2` dependency to v0.1.11. This version
removes a busy loop when shutting down an idle server
(carllerche/h2#296), and fixes a potential panic when dropping clients
(carllerche/h2#295).

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-31 12:12:26 -07:00
Oliver Gould b2fcd5d276
Remove the telemetry system's event channel (#30)
The proxy's telemetry system is implemented with a channel: the proxy thread
generates events and the control thread consumes these events to record
metrics and satisfy Tap observations. This design was intended to minimize
latency overhead in the data path.

However, this design leads to substantial CPU overhead: the control thread's
work scales with the proxy thread's work, leading to resource contention in
busy, resource-limited deployments. This design also has other drawbacks in
terms of allocation & makes it difficult to implement planned features like
payload-aware Tapping.

This change removes the event channel so that all telemetry is recorded
instantaneously in the data path, setting up for further simplifications so
that, eventually, the metrics registry properly uses service lifetimes to
support eviction.

This change has a potentially negative side effect: metrics scrapes obtain
the same lock that the data path uses to write metrics so, if the metrics
server gets heavy traffic, it can directly impact proxy latency. These
effects will be ameliorated by future changes that reduce the need for the
Mutex in the proxy thread.
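
A minimal sketch, under assumed names, of the approach described above: the
data path records metrics directly into a shared, locked registry, and the
metrics endpoint renders from that same lock (the contention noted in the
last paragraph).

```
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Registry {
    requests_total: u64,
}

#[derive(Clone, Default)]
struct Metrics(Arc<Mutex<Registry>>);

impl Metrics {
    // Called inline on the proxy thread instead of sending an event.
    fn record_request(&self) {
        self.0.lock().unwrap().requests_total += 1;
    }

    // Called by the metrics server on scrape; takes the same lock.
    fn render(&self) -> String {
        let r = self.0.lock().unwrap();
        format!("requests_total {}\n", r.requests_total)
    }
}
```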
2018-07-26 11:16:27 -07:00
Markus Jais 7788f60e0e fixed some typos in comments and Dockerfile (#25)
Signed-off-by: Markus Jais <markusjais@googlemail.com>
2018-07-25 10:10:59 -10:00
Brian Smith 9a19457ca1
Add initial tests for client and server connection handling w.r.t. TLS. (#28)
* Add initial tests for client and server connection handling w.r.t. TLS.

Add a simple framework for TLS connection handling and some initial tests
that use it.

An explicit effort has been made to keep the test configuration as close to the
production configuration as possible; e.g. we use regular TCP sockets instead of some
mock TCP sockets. This matters less now, but will matter more later, if/when we
implement more low-level TLS-over-TCP optimizations.

Rename `ConnectionConfig::identity` to `ConnectionConfig::server_identity` to make
it clearer that it is always the identity of the server, regardless of which role
the `ConnectionConfig` is being used in.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-24 17:32:23 -10:00
Sean McArthur 04a8ae3edf
update to hyper 0.12.7 to fix a keep-alive bug (#26)
Specifically, proxied bodies would make use of an optimization in hyper
that resulted in the connection not knowing (but it did know! it just
didn't tell itself...) that the body was finished, and so the connection
was closed. hyper 0.12.7 includes the fix.

As part of this upgrade, the keep-alive tests have been adjusted to send
a small body, since the empty body was not triggering this case.
2018-07-23 18:33:55 -07:00
Eliza Weisman 2d4086aee9
Add errno label to transport close metrics (when applicable) (#12)
This branch adds a label displaying the Unix error code name (or the raw
error code, on other operating systems or if the error code was not 
recognized) to the metrics generated for TCP connection failures.

It also adds a couple of tests for label formatting.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-23 15:37:04 -07:00
Brian Smith b3b578be39
Configure listen ports' TLS when constructing them. (#21)
The way TLS is done for a bound port is fixed based on its role and whatever
the TLS settings are, so it makes sense to configure the TLS aspects of the
bound port during construction. This will also make writing tests easier.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-20 10:40:50 -10:00
Brian Smith 448605b4c3
Allow TLS configuration watches to start before telemetry. (#19)
Previously it wasn't possible to create objects that need to watch the TLS
configuration until the telemetry sensors were created. Split the watching
initialization into two parts so that in the (near) future TLS-using objects can
be created before the telemetry sensors are created.

This will allow us to initialize `BoundPort` with the TLS settings during
creation instead of later in `BoundPort::listen_and_fold()`. This will also
facilitate TLS testing.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-20 10:21:25 -10:00
Brian Smith 38058eb7d8
Reduce visibility of some `transport::connection` items. (#20)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-20 10:20:50 -10:00
Eliza Weisman 6a81e1f137
Improve error messages in logs (#18)
Currently, the messages that the proxy logs on errors are often not 
very useful. For example, when an error occurred that caused the proxy
to return HTTP 500, we log a message like this:

```
ERR! proxy={server=in listen=0.0.0.0:4143 remote=127.0.0.1:57416} linkerd2_proxy turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
```

Note that:
+ Regardless of what the error actually was, the current log message
  *always* says "protocol error: unexpected internal error encountered",
  which is both fairly unclear *and* often not actually the case.
+ Regardless of whether the error was encountered by an HTTP/1 or 
  HTTP/2 client, the error message always includes the string
  "underlying HTTP/2 error". This is probably fairly confusing for 
  users who are, say, only proxying HTTP/1 traffic.

This branch fixes several related issues around the clarity of the
proxy's error messages:

+ A couple cases in the `transparency` module that returned
  `io::Error::from(io::ErrorKind::Other)` have been replaced with
  more descriptive errors that propagate the underlying error. This
  necessitated adding bounds on some error types.
+ Introduced a new `transparency::client::Error` enum that can be
  either a `h2::Error` or a `hyper::Error`, depending on whether
  the client is HTTP/1 or HTTP/2, and proxies its `std::error::Error`
  impl to the wrapped error type (see the sketch below). This way, we
  don't return a `tower_h2::client::Error` (with format output that
  includes the string "HTTP/2") from everything, and discard
  significantly less error information.
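
A rough sketch of that wrapper enum, assuming the `hyper` and `h2` crates;
the real implementation is more involved (and predates `Error::source`), but
the idea is to forward everything to whichever error is actually wrapped.

```
use std::{error, fmt};

#[derive(Debug)]
enum Error {
    Http1(hyper::Error),
    Http2(h2::Error),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Forward to the wrapped error, so HTTP/1 clients never see a
        // message mentioning "HTTP/2" (and vice versa).
        match *self {
            Error::Http1(ref e) => fmt::Display::fmt(e, f),
            Error::Http2(ref e) => fmt::Display::fmt(e, f),
        }
    }
}

impl error::Error for Error {
    fn source(&self) -> Option<&(dyn error::Error + 'static)> {
        match *self {
            Error::Http1(ref e) => Some(e),
            Error::Http2(ref e) => Some(e),
        }
    }
}
```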

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-19 16:38:31 -07:00
Sean McArthur 04e4b4409b update httparse to v1.3.2
Allows using SIMD instructions when parsing.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-19 16:12:35 -07:00
Sean McArthur 11e7eb6357 update hyper to v0.12.6
Brings in fix to reduce connection churn related to keep-alive racing.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-19 16:12:35 -07:00
Eliza Weisman 30a48a7d8b
Accept TLS connections even when TLS configuration isn't available (#22)
Closes linkerd/linkerd2#1272.

Currently, if TLS is enabled but the TLS configuration isn't available
(yet), the proxy will pass through all traffic to the application.
However, the destination service will tell other proxies to send TLS
traffic to the pod unconditionally, so the proxy will pass through TLS
handshakes to the application that are destined for the proxy itself.

In linkerd/linkerd2#1272, @briansmith suggested that we change the 
proxy so that when it hasn't yet loaded a TLS configuration, it will
accept TLS handshakes, but fail them. This branch implements that 
behaviour by making the `rustls::sign::CertifiedKey` in `CertResolver`
optional, and changing the `CertResolver` to return `None` when 
`rustls` asks it to resolve a certificate in that case. The server
config watch is now initially created with `Conditional::Some` with an
empty `CertResolver`, rather than `Conditional::None(NoConfig)`, so
that the proxy will accept incoming handshakes, but fail them.
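
A simplified sketch of the idea: the resolver holds an optional key, and
resolution simply returns `None` until the TLS configuration has actually
been loaded. Type and method names are abbreviated stand-ins; the real code
implements rustls's certificate-resolver trait.

```
struct CertifiedKey; // stand-in for `rustls::sign::CertifiedKey`

struct CertResolver {
    certified_key: Option<CertifiedKey>, // `None` until the config is loaded
}

impl CertResolver {
    fn resolve(&self) -> Option<&CertifiedKey> {
        // With no key available, the handshake is accepted but then fails,
        // rather than being passed through to the application.
        self.certified_key.as_ref()
    }
}
```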

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-18 17:06:45 -07:00
Eliza Weisman f208acb3a5
Fix incorrect process_cpu_seconds_total metric (#7)
Fixes linkerd/linkerd2#1239.

The proxy's `process_cpu_seconds_total` metric is currently calculated
incorrectly and will differ from the CPU stats reported by other 
sources. This is because it currently calculates the CPU time by summing
the `utime` and `stime` fields of the stat struct returned by `procinfo`.
However, those numbers are expressed in _clock ticks_, not seconds, so
the metric is expressed in the wrong unit.

This branch fixes this issue by using `sysconf` to get the number of
clock ticks per second when the process sensor is created, and then
dividing `utime + stime` by that number, so that the value is expressed
in seconds.
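
A minimal sketch of that calculation, assuming the `libc` crate: `utime +
stime` are reported in clock ticks, so they are divided by
`sysconf(_SC_CLK_TCK)` to produce seconds.

```
fn clock_ticks_per_second() -> u64 {
    // Queried once, when the process sensor is created.
    unsafe { libc::sysconf(libc::_SC_CLK_TCK) as u64 }
}

fn cpu_seconds_total(utime_ticks: u64, stime_ticks: u64) -> f64 {
    // utime + stime are in clock ticks, not seconds.
    (utime_ticks + stime_ticks) as f64 / clock_ticks_per_second() as f64
}
```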

## Demonstration:

(Note that the last column in `ps aux` output is the CPU time total)
```
eliza@ares:~$ ps aux | grep linkerd2-proxy | grep -v grep
eliza    40703  0.2  0.0  45580 14864 pts/0    Sl+  13:49   0:03 target/debug/linkerd2-proxy
eliza@ares:~$ curl localhost:4191/metrics
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 19
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 46673920
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 15220736
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1531428584
eliza@ares:~$
```

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-17 15:48:02 -07:00
Brian Smith 1fdcbfaba6
Replace "conduit" with "linkerd" in TLS test data. (#17)
This is purely aesthetic; the TLS logic doesn't care about the product name.

The test data was regenerated by re-running gen-certs.sh after modifying it.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-17 09:22:17 -10:00
Brian Smith e1b4e66836
Upgrade TLS dependencies. (#16)
Fixes linkerd/linkerd2#1330.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-17 09:21:59 -10:00
Sean McArthur 162f53dc8d
spawn individual admin tasks instead of joining all (#10)
Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-17 11:50:57 -07:00
Eliza Weisman 2ac114ba65
Point inotify dependency at master (#14)
Now that inotify-rs/inotify#105 has merged, we will no longer see
rampant CPU use from using the master version of `inotify`. I've 
updated Cargo.toml to depend on master rather than on my branch.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-17 10:21:40 -07:00
Sean McArthur aa60ddb088
remove busy loop from destination background future when shutdown (#15)
When the proxy is shutting down, once there are no more outbound
connections, the sender side of the resolver channel is dropped. In the
admin background thread, when the destination background future is
notified of the closure, instead of shutting down itself, it just busy
loops. Now, after seeing shutdown, the background future ends as well.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-17 09:50:08 -07:00
Sean McArthur 9f5648d955
fix control client Backoff to poll its timer when backing off (#13)
The `Backoff` service wrapper is used for the controller client service
so that if the proxy can't find the controller (there is a connection
error), it doesn't keep trying in a tight loop, but instead waits a
couple seconds before trying again, presuming that the control plane
was rebooting.

When "backing off", a timer would be set, but it wasn't polled, so the
task was never registered to wake up after the delay. This turns out to
not have been a problem in practice, since the background destination
task was joined with other tasks that were constantly waking up,
allowing it to try again anyways.
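
A rough sketch of the fix, in the futures 0.1 / tokio-timer style used at
the time; the state type and names are illustrative rather than the proxy's
actual `Backoff` middleware.

```
use futures::{Async, Future, Poll};
use tokio_timer::Delay;

enum State {
    Active,
    BackingOff(Delay),
}

impl State {
    /// Check whether the backoff has elapsed; crucially, this polls the timer.
    fn poll_ready(&mut self) -> Poll<(), ()> {
        if let State::BackingOff(ref mut delay) = *self {
            // Polling registers the current task to be woken when the delay
            // fires; merely creating the Delay does not.
            match delay.poll() {
                Ok(Async::Ready(())) => {}
                Ok(Async::NotReady) => return Ok(Async::NotReady),
                Err(_) => return Err(()),
            }
        }
        *self = State::Active;
        Ok(Async::Ready(()))
    }
}
```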

To add tests for this, a new `ENV_CONTROL_BACKOFF_DELAY` config value
has been added, so that the tests don't have to wait the default 5
seconds.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-16 12:41:47 -07:00
Brian Smith 73edefb795
Move `connection` submodule to `transport`. (#11)
This allows easier logging configuration for the entire transport system
using the common prefix `conduit_proxy::transport`. Previously logging had to be
controlled separately/additionally for `conduit_proxy::connection`.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-13 12:46:38 -10:00
Sean McArthur f7233fd682 Revert "remove busy loop from destination background future during shutdown (#8)"
This reverts commit 4bee7b0b55.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-13 13:18:39 -07:00
Oliver Gould bbf217ff4f
Replace references to _Conduit_ (#6)
There are various comments, examples, and documentation that refers to
Conduit. This change replaces or removes these references.

CONTRIBUTING.md has been updated to refer to GOVERNANCE/MAINTAINERS.
2018-07-12 20:41:17 -07:00
Eliza Weisman 2f4c1b220a
Add labels for `tls::ReasonForNoIdentity` (#5)
Fixes linkerd/linkerd2#1276.

Currently, metrics with the `tls="no_identity"` label are duplicated.
This is because that label is generated from the `tls_status` label on
the `TransportLabels` struct, which is either `Some(())` or a
`ReasonForNoTls`. `ReasonForNoTls` has a
variant `ReasonForNoTls::NoIdentity`, which contains a
`ReasonForNoIdentity`, but when we format that variant as a label, we
always just produce the string "no_identity", regardless of the value of
the `ReasonForNoIdentity`. 

However, label types are _also_ used as hash map keys into the map that
stores the metrics scopes, so although two instances of
`ReasonForNoTls::NoIdentity` with different `ReasonForNoIdentity`s
produce the same formatted label output, they aren't _equal_, since that
field differs, so they correspond to different metrics.

This branch resolves this issue by adding an additional label to these
metrics, based on the `ReasonForNoIdentity`. Now, the separate lines in
the metrics output that correspond to each `ReasonForNoIdentity` have a
label differentiating them from each other.

Note that the `NotImplementedForTap` and `NotImplementedForMetrics`
reasons will currently never show up in metrics labels, since we don't
gather metrics from the tap and metrics servers at the moment.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-12 16:04:25 -07:00
Sean McArthur 4bee7b0b55
remove busy loop from destination background future during shutdown (#8)
When the proxy is shutting down, once there are no more outbound
connections, the sender side of the resolver channel is dropped. In the
admin background thread, when the destination background future is
notified of the closure, instead of shutting down itself, it just busy
loops. Now, after seeing shutdown, the background future ends as well.

While examining this, I noticed all the background futures are joined
together into a single `Future` before being spawned on a dedicated
current_thread executor. Join in this case is inefficient, since *every*
single time *one* of the futures is ready, they are *all* polled again.
Since we have an executor handy, it's better to allow it to manage each
of the futures individually.
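
A minimal sketch of the change, assuming tokio 0.1's
`runtime::current_thread` module; the function and future names are
illustrative.

```
use futures::Future;
use tokio::runtime::current_thread::Runtime;

fn spawn_background<A, B>(mut rt: Runtime, fut_a: A, fut_b: B) -> Runtime
where
    A: Future<Item = (), Error = ()> + 'static,
    B: Future<Item = (), Error = ()> + 'static,
{
    // Before: one joined future, re-polled in full whenever any part woke:
    //   rt.spawn(fut_a.join(fut_b).map(|_| ()));
    // After: the executor tracks each future's wakeups independently.
    rt.spawn(fut_a);
    rt.spawn(fut_b);
    rt
}
```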

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-12 15:22:32 -07:00
Oliver Gould 3c48ba7f62
Add a README (#4) 2018-07-11 16:01:54 -07:00
Oliver Gould 8db765c7bc
dev: Add a Dockerfile for development (#3)
When working on the proxy, it's important to be able to build a Docker
image that can be tested in the context of the existing linkerd2
project.

This change adds a `make docker` target that produces a docker image,
optionally tagged via the `DOCKER_TAG` environment variable.

This is intended to be used for development--especially on non-Linux
OSes.
2018-07-11 15:27:33 -07:00
Oliver Gould 0ca5d11c03
Adopt Linkerd2's governance (#2)
For the time being, @briansmith and I will serve as super-maintainers
for the linkerd2-proxy.
2018-07-10 15:59:12 -07:00
Oliver Gould 02a64e980f
ci: Publish artifacts to build.l5d.io
In order to setup continuous integration, proxy artifacts need to be
published somewhere predictable and discoverable. This change configures
Travis CI to publish proxy artifacts built from master to:

    build.l5d.io/linkerd2-proxy/linkerd2-proxy-${ref}.tar.gz
    build.l5d.io/linkerd2-proxy/linkerd2-proxy-${ref}.txt

The tarball includes an optimized proxy binary and metadata (right now, just
the LICENSE file, but later this should include additional version/build
metadata that can be used for runtime diagnostics).

The text file includes the sha256 sum of the tarball.

A `Makefile` is introduced to encapsulate build logic so that it can both
drive CI and be used manually.

Travis CI is configured to run debug-mode tests against PRs and to run a full
release package-test-publish for commits to
master.
2018-07-08 14:24:25 -07:00
Oliver Gould c23ecd0cbc
Migrate `conduit-proxy` to `linkerd2-proxy`
The proxy now honors environment variables starting with
`LINKERD2_PROXY_`.
2018-07-07 22:45:21 +00:00
Eliza Weisman ec303942ee proxy: Add tls_config_last_reload_seconds metric (#1204)
Depends on #1141.

This PR adds a `tls_config_last_reload_seconds` Prometheus metric
that reports the last time the TLS configuration files were reloaded.

Proof that it works:

Started the proxy with no certs, then generated them:
```
➜ http GET localhost:4191/metrics
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 323
content-type: text/plain
date: Mon, 25 Jun 2018 23:02:52 GMT

# HELP tls_config_reload_total Total number of times the proxy's TLS config files were reloaded.
# TYPE tls_config_reload_total counter
tls_config_reload_total{status="io_error",path="example-example.crt",error_code="2"} 9
tls_config_reload_total{status="reloaded"} 3
# HELP tls_config_last_reload_seconds Timestamp of when the TLS configuration files were last reloaded successfully (in seconds since the UNIX epoch)
# TYPE tls_config_last_reload_seconds gauge
tls_config_last_reload_seconds 1529967764
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1529967754
```

Started the proxy with certs already present:
```
➜ http GET localhost:4191/metrics
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 285
content-type: text/plain
date: Mon, 25 Jun 2018 23:04:39 GMT

# HELP tls_config_reload_total Total number of times the proxy's TLS config files were reloaded.
# TYPE tls_config_reload_total counter
tls_config_reload_total{status="reloaded"} 4
# HELP tls_config_last_reload_seconds Timestamp of when the TLS configuration files were last reloaded successfully (in seconds since the UNIX epoch)
# TYPE tls_config_last_reload_seconds gauge
tls_config_last_reload_seconds 1529967876
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1529967874
```

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-05 16:23:57 -07:00
Eliza Weisman dd7ac18cc5 proxy: Fix out-of-control inotify CPU use (#1263)
The `inotify-rs` library's `EventStream` implementation currently 
calls `task::current().notify()` in a hot loop when a poll returns
`WouldBlock`, causing the task to constantly burn CPU. 

This branch updates the `inotify-rs` dependency to point at a branch
of `inotify-rs` I had previously written. That branch rewrites the
`EventStream` to use `mio` to register interest in the `inotify` file
descriptor instead, fixing the out-of-control polling.

When inotify-rs/inotify#105 is merged upstream, we can go back to 
depending on the master version of the library.

Fixes #1261

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-03 20:16:12 -07:00
Oliver Gould b9b35ec11c proxy: Handle connection close during TLS detection (#1256)
During protocol detection, we buffer data to detect a TLS Client Hello
message. If the client disconnects while this detection occurs, we do
not properly handle the disconnect, and the proxy may busy loop.

To fix this, we must handle the case where `read(2)` returns 0 by
creating a `Connection` with the already-closed socket.
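
A simplified, synchronous sketch of the case being handled (the proxy does
this inside a non-blocking poll; names here are illustrative): a read that
returns 0 means the peer closed the connection, and detection must finish
with the already-closed socket instead of waiting for bytes that will never
arrive.

```
use std::io::Read;
use std::net::TcpStream;

enum Detected {
    /// Peer closed before sending enough bytes; hand back the closed socket.
    ClosedByPeer(TcpStream),
    /// More data buffered; keep sniffing for a TLS ClientHello.
    Buffered(TcpStream, usize),
}

fn read_for_detection(mut socket: TcpStream, buf: &mut [u8]) -> std::io::Result<Detected> {
    let n = socket.read(buf)?;
    if n == 0 {
        // read(2) returned 0: the client disconnected during detection.
        return Ok(Detected::ClosedByPeer(socket));
    }
    Ok(Detected::Buffered(socket, n))
}
```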

While doing this, I've moved some of the implementation of
`ConditionallyUpgradeServerToTls::poll` into helpers on
`ConditionallyUpgradeServerToTlsInner` so that the poll method is easier
to read, hiding the inner details from the polling logic.
2018-07-03 15:36:48 -07:00
Eliza Weisman 1e39ab6ac4 proxy: Add a Prometheus metric for reporting errors loading TLS configs (#1141)
This PR adds a Prometheus stat tracking the number of times
TLS config files have been reloaded, and the number of times
reloading those files has errored. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-03 15:24:20 -07:00
Eliza Weisman 5a3b1cdb3a proxy: Add TLS label in `transparency::retry_reconnect_errors` test (#1258) 2018-07-03 12:27:08 -07:00
Oliver Gould 866167a955 tap: Support `tls` labeling (#1244)
The proxy's metrics are instrumented with a `tls` label that describes
the state of TLS for each connection and associated messges.

This same level of detail is useful to get in `tap` output as well.

This change updates Tap in the following ways:
* `TapEvent` protobuf updated:
  * Added `source_meta` field including source labels
  * `proxy_direction` enum indicates which proxy server was used.
* The proxy adds a `tls` label to both source and destination meta indicating the state of each peer's connection
* The CLI uses the `proxy_direction` field to determine which `tls` label should be rendered.
2018-07-02 17:19:20 -07:00
Oliver Gould 051a7639c5 proxy: Always include `tls` label in metrics (#1243)
The `tls` label could sometimes be formatted incorrectly, without a
preceding comma.

To fix this, the `TlsStatus` type no longer formats commas so that they
must be provided in the context in which they are used (as is done
otherwise in this file).
2018-07-02 16:21:06 -07:00
Eliza Weisman 91108a2d53 proxy: Fall back to plaintext communication when a TLS handshake fails (#1173)
This branch modifies the proxy's logic for opening a connection so
that when an attempted TLS handshake fails, the proxy will retry that
connection without TLS.

This is implemented by changing the `UpgradeToTls` case in the `Future`
implementation for `Connecting`, so that rather than simply wrapping
a poll to the TLS upgrade future with `try_ready!` (and thus failing
the future if the upgrade future fails), we reset the state of the
future to the `Plaintext` state and continue looping. The `tls_status`
field of the future is changed to `ReasonForNoTls::HandshakeFailed`,
and the `Plaintext` state is changed so that if its `tls_status` is
`HandshakeFailed`, it will no longer attempt to upgrade to TLS when the
plaintext connection is successfully established.
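
A simplified sketch of the state transition described above (names
approximate this description; the real `Connecting` future carries more
state): a failed TLS upgrade drops back to the plaintext state and records
`HandshakeFailed`, which suppresses any further upgrade attempt for this
connection.

```
enum ReasonForNoTls {
    HandshakeFailed,
}

enum Connecting {
    Plaintext { tls_status: Option<ReasonForNoTls> },
    UpgradeToTls,
}

impl Connecting {
    /// Called when the TLS upgrade future fails.
    fn on_handshake_error(&mut self) {
        // Instead of failing the whole connect future, retry without TLS.
        *self = Connecting::Plaintext {
            tls_status: Some(ReasonForNoTls::HandshakeFailed),
        };
    }

    /// Once plaintext is established, decide whether to try upgrading to TLS.
    fn should_upgrade(&self) -> bool {
        match *self {
            Connecting::Plaintext { ref tls_status } => tls_status.is_none(),
            Connecting::UpgradeToTls => false,
        }
    }
}
```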

Closes #1084 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-29 17:08:03 -07:00
Brian Smith da61aace6c Proxy: Skip TLS for control plane loopback connections. (#1229)
If the controller address has a loopback host then don't use TLS to connect
to it. TLS isn't needed for security in that case. In normal configurations
the proxy isn't terminating TLS for loopback connections anyway.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-28 17:24:09 -10:00
Brian Smith 03814c18eb Proxy: Get identity of pod & controller from configuration. (#1221)
Instead of attempting to construct identities itself, have the proxy
accept fully-formed identities from whatever configures it. This allows
us to centralize the formatting of the identity strings in the Go code
that is shared between the `conduit inject`, `conduit install`, and CA
components.

One wrinkle: The pod namespace isn't necessarily available at
`conduit inject` time, so the proxy must implement a simple variable
substitution mechanism to insert the pod namespace into its identity.
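
A tiny sketch of that substitution idea; the `$(pod_ns)` variable syntax
shown here is an assumption for illustration, not necessarily the proxy's
actual format.

```
fn substitute_pod_ns(identity_template: &str, pod_ns: &str) -> String {
    // Insert the pod namespace, known only at runtime, into the identity.
    identity_template.replace("$(pod_ns)", pod_ns)
}
```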

This has the side-effect of enabling TLS to the controller since the
controller's identity is now available.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-27 17:17:34 -10:00
Brian Smith c67546653a Proxy: Use new destination service TLS identity scheme. (#1222)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-27 14:47:57 -10:00
Eliza Weisman af7b56f963 proxy: Replace >=100,000 ms latency buckets with 1, 2, 3, 4, and 5 ms (#1218)
This branch adds buckets for latencies below 10 ms to the proxy's latency
histograms, and removes the buckets for 100, 200, 300, 400, and 500 
seconds, so the largest non-infinity bucket is 50,000 ms. It also removes
comments that claimed that these buckets were the same as those created
by the control plane, as this is no longer true (the metrics are now scraped
by Prometheus from the proxy directly).

Closes #1208

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-27 16:53:42 -07:00
Kevin Lingerfelt 26d2bce656 Update dest service with a different tls identity strategy (#1215)
* Update dest service with a different tls identity strategy
* Send controller namespace as separate field

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-27 11:40:02 -07:00