Commit Graph

303 Commits

Author SHA1 Message Date
Oliver Gould 4e79348af7
Fully encapsulate process metrics in `mod process` (#41)
The `process` module exposes a `Sensor` type that is different from
other types called `Sensor`. Most `Sensor` types instrument other
types with telemetry. The `process::Sensor` type, on the other hand,
is used to read system metrics from the `/proc` filesystem, returning
a metrics summary.

Furthermore, `telemetry::metrics::Root` owns the process start time
metric.

In the interest of making the telemetry system more modular, this moves
all process-related telemetry concerns into the `process` module.
Instead of exposing a `Sensor` that produces metrics, a single public
`Process` type implements `fmt::Display` directly.

This removes process-related concerns from `telemetry/metrics/mod.rs` to
set up further refactoring along these lines.
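
A minimal sketch of the shape described above (illustrative only, not the
actual module code): a single `Process` type that reads system stats and
renders its metrics directly via `fmt::Display`, rather than exposing a
separate `Sensor`.

```
use std::fmt;

struct Process {
    start_time_secs: u64, // captured once, when the proxy starts
}

impl fmt::Display for Process {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // The real module reads more stats from the /proc filesystem;
        // only the start-time gauge is shown here as an example.
        writeln!(f, "# TYPE process_start_time_seconds gauge")?;
        writeln!(f, "process_start_time_seconds {}", self.start_time_secs)
    }
}
```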
2018-08-06 14:09:33 -07:00
Eliza Weisman 1774c87400
Refactor control::destination::background::client module (#38)
This branch should not make any functional changes.

This branch makes two minor refactorings to the `client` module in 
`control::destination::background`:

 1. Remove the `AddOrigin` middleware and replace it with the 
    `tower-add-origin` crate from `tower-http`. These middlewares are
    functionally identical, but the Tower version has tests.
 2. Change `ClientService` from a type alias to a tuple struct. This
    means that some of the middleware that are used only in this module
    (`LogErrors` and `Backoff`) are no longer part of a publicly visible
    type and can be made private to the module.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-03 17:00:20 -07:00
Sean McArthur ab1b280de8
Add orig-proto which uses HTTP2 between proxies (#32)
When the destination service returns a hint that an endpoint is another
proxy, eligible HTTP/1 requests are translated into HTTP/2 and sent over
an HTTP/2 connection. The original protocol details are encoded in a
header, `l5d-orig-proto`. When a proxy receives an inbound HTTP/2
request with this header, the request is translated back into its HTTP/1
representation before being passed to the internal service.
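
A rough sketch of the header round-trip described above, using the `http`
crate; the header name comes from this commit message, but the value format
shown is only illustrative, not necessarily what the proxy emits.

```
use http::{header::HeaderValue, Request, Version};

fn upgrade_to_h2<B>(mut req: Request<B>) -> Request<B> {
    // Record the original protocol before forwarding over HTTP/2.
    req.headers_mut()
        .insert("l5d-orig-proto", HeaderValue::from_static("HTTP/1.1"));
    *req.version_mut() = Version::HTTP_2;
    req
}

fn downgrade_to_h1<B>(mut req: Request<B>) -> Request<B> {
    // On the inbound side, restore the original protocol and drop the header.
    if req.headers_mut().remove("l5d-orig-proto").is_some() {
        *req.version_mut() = Version::HTTP_11;
    }
    req
}
```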

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-08-03 15:03:14 -07:00
Eliza Weisman 1e24aeb615
Limit concurrent Destination service queries (#36)
Required for linkerd/linkerd2#1322.

Currently, the proxy places a limit on the number of active routes
in the route cache. This limit defaults to 100 routes, and is intended
to prevent the proxy from requesting more than 100 lookups from the 
Destination service. 

However, in some cases, such as Prometheus scraping a large number of
pods, the proxy hits this limit even though none of those requests 
actually result in requests to service discovery (since Prometheus 
scrapes pods by their IP addresses). 

This branch implements @briansmith's suggestion in 
https://github.com/linkerd/linkerd2/issues/1322#issuecomment-407161829.
It splits the router capacity limit into two separate, configurable 
limits, one that sets an upper bound on the number of concurrently 
active destination lookups, and one that limits the capacity of the
router cache.
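
A minimal sketch, under assumed names, of the bookkeeping this implies:
destination lookups draw from a fixed pool of query slots that is tracked
separately from the router cache's capacity.

```
struct QueryLimit {
    max_active: usize,
    active: usize,
}

impl QueryLimit {
    /// Try to reserve a slot for a new Destination service query.
    fn try_acquire(&mut self) -> bool {
        if self.active < self.max_active {
            self.active += 1;
            true
        } else {
            false // over the limit: don't issue another lookup
        }
    }

    /// Release a slot when an inactive route is evicted from the router cache.
    fn release(&mut self) {
        self.active = self.active.saturating_sub(1);
    }
}
```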

I've done some preliminary testing using the `lifecycle` tests, where a
single Prometheus instance is configured to scrape a very large number 
of proxies. In these tests, neither limit is reached. Furthermore, I've added
integration tests in `tests/discovery` to exercise the destination service 
query limit. These tests ensure that query capacity is released when inactive
routes that created queries are evicted from the router cache, and that the
limit does _not_ affect DNS queries.

This branch obsoletes and closes #27, which contained an earlier version of
these changes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-02 16:40:12 -07:00
Oliver Gould 18a8d7956d
Improve tcp_connect_err test flakiness (#37)
Both tcp_connect_err tests frequently fail, even during local
development. This seems to happen because the proxy doesn't necessarily
observe the socket closure.

Instead of shutting down the socket gracefully, we can just drop it!
This helps the test pass much more reliably.
2018-08-01 17:25:49 -07:00
Eliza Weisman 37164afb3a
refactor: Make `Background::query_destination_service_if_relevant` into a method (#35)
This is strictly a refactor which should make no functional changes.

Currently, the function used to construct new Destination service
queries (`Background::query_destination_service_if_relevant`) is a
function rather than a method on `Background`, although it takes as an
argument a field from `Background`. This is because in some cases, it is
called where `self.destinations` is borrowed mutably, preventing `self`
from being borrowed immutably.

Right now, this means that one additional field has to be passed
explicitly. However, in order to add the limit on active Destination
service queries, it was necessary to add two additional fields to
`Background` that have to be passed to this function. Since these
arguments should always come from fields on `Background`, it would be
preferable for this to be a method.

This branch breaks out some of the fields on
`control::destination::Background` into their own structs:
`DestinationCache`, which holds the map of DNS names to
`DestinationSet`s and the queue of DNS names that need reconnects; and
`Config`, which holds the configuration necessary to create a new
Destination service query (currently just the `Namespaces` config). This
allows us to have separate borrows on the `DestinationCache` and
`Config`. 

When I make the additional changes necessary to add the limit on active 
destination queries, the two additional fields necessary can be added to 
`Config`, rather than having to explicitly pass them into 
`query_destination_service_if_relevant` every time it's called.
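
A simplified sketch of the split described above (struct names from this
commit message; the fields shown are illustrative). Keeping the query
configuration and the cache in separate structs lets a method borrow the
cache mutably while still reading the config.

```
use std::collections::HashMap;

struct Config {
    namespaces: Vec<String>, // stand-in for the `Namespaces` config
}

struct DestinationCache {
    destinations: HashMap<String, u32>, // DNS name -> destination set (placeholder)
}

struct Background {
    config: Config,
    cache: DestinationCache,
}

impl Background {
    fn query_destination_service_if_relevant(&mut self, name: &str) {
        // Disjoint field borrows: `&mut self.cache` and `&self.config`
        // can coexist because they refer to different fields.
        let cfg = &self.config;
        let dests = &mut self.cache.destinations;
        if !cfg.namespaces.is_empty() {
            dests.entry(name.to_string()).or_insert(0);
        }
    }
}
```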

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-01 15:37:48 -07:00
Eliza Weisman a615834f7b
Refactor `control::destination::background` into smaller modules (#34)
This branch is purely a refactor and should result in no functional 
changes.

The `control::destination::background` module has become quite large,
making the code difficult to read and review changes to. This branch
separates out the `DestinationSet` type and the destination service
client code into their own modules inside of `background`. Furthermore,
it rolls the `control::utils` module into the `client` submodule of
`background`, as that code is only used in the `client` module.

I think there's some additional work that can be done to make this code
clearer beyond simply splitting apart some of these large files, and I
intend to do some refactoring in additional follow-up branches.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-08-01 14:14:34 -07:00
Eliza Weisman 51e07b2a68
Update h2 version to v0.1.11 (#33)
This branch updates the proxy's `h2` dependency to v0.1.11. This version
removes a busy loop when shutting down an idle server
(carllerche/h2#296), and fixes a potential panic when dropping clients
(carllerche/h2#295).

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-31 12:12:26 -07:00
Oliver Gould b2fcd5d276
Remove the telemetry system's event channel (#30)
The proxy's telemetry system is implemented with a channel: the proxy thread
generates events and the control thread consumes these events to record
metrics and satisfy Tap observations. This design was intended to minimize
latency overhead in the data path.

However, this design leads to substantial CPU overhead: the control thread's
work scales with the proxy thread's work, leading to resource contention in
busy, resource-limited deployments. This design also has other drawbacks in
terms of allocation & makes it difficult to implement planned features like
payload-aware Tapping.

This change removes the event channel so that all telemetry is recorded
instantaneously in the data path, setting up for further simplifications so
that, eventually, the metrics registry properly uses service lifetimes to
support eviction.

This change has a potentially negative side effect: metrics scrapes obtain
the same lock that the data path uses to write metrics so, if the metrics
server gets heavy traffic, it can directly impact proxy latency. These
effects will be ameliorated by future changes that reduce the need for the
Mutex in the proxy thread.
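
A minimal sketch, under assumed names, of the approach described above: the
data path records metrics directly into a shared, locked registry, and the
metrics endpoint renders from that same lock (the contention noted in the
last paragraph).

```
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Registry {
    requests_total: u64,
}

#[derive(Clone, Default)]
struct Metrics(Arc<Mutex<Registry>>);

impl Metrics {
    // Called inline on the proxy thread instead of sending an event.
    fn record_request(&self) {
        self.0.lock().unwrap().requests_total += 1;
    }

    // Called by the metrics server on scrape; takes the same lock.
    fn render(&self) -> String {
        let r = self.0.lock().unwrap();
        format!("requests_total {}\n", r.requests_total)
    }
}
```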
2018-07-26 11:16:27 -07:00
Markus Jais 7788f60e0e fixed some typos in comments and Dockerfile (#25)
Signed-off-by: Markus Jais <markusjais@googlemail.com>
2018-07-25 10:10:59 -10:00
Brian Smith 9a19457ca1
Add initial tests for client and server connection handling w.r.t. TLS. (#28)
* Add initial tests for client and server connection handling w.r.t. TLS.

Add a simple framework for TLS connection handling and some initial tests
that use it.

An explicit effort has been made to keep the test configuration as close to the
production configuration as possible; e.g. we use regular TCP sockets instead of some
mock TCP sockets. This matters less now, but will matter more later, if/when we
implement more low-level TLS-over-TCP optimizations.

Rename `ConnectionConfig::identity` to `ConnectionConfig::server_identity` to make
it clearer that it is always the identity of the server, regardless of which role
the `ConnectionConfig` is being used in.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-24 17:32:23 -10:00
Sean McArthur 04a8ae3edf
update to hyper 0.12.7 to fix a keep-alive bug (#26)
Specifically, proxied bodies would make use of an optimization in hyper
that resulted in the connection not knowing (but it did know! it just
didn't tell itself...) that the body was finished, and so the connection
was closed. hyper 0.12.7 includes the fix.

As part of this upgrade, the keep-alive tests have been adjusted to send
a small body, since the empty body was not triggering this case.
2018-07-23 18:33:55 -07:00
Eliza Weisman 2d4086aee9
Add errno label to transport close metrics (when applicable) (#12)
This branch adds a label displaying the Unix error code name (or the raw
error code, on other operating systems or if the error code was not 
recognized) to the metrics generated for TCP connection failures.

It also adds a couple of tests for label formatting.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-23 15:37:04 -07:00
Brian Smith b3b578be39
Configure listen ports' TLS when constructing them. (#21)
The way TLS is done for a bound port is fixed based on its role and whatever
the TLS settings are, so it makes sense to configure the TLS aspects of the
bound port during construction. This will also make writing tests easier.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-20 10:40:50 -10:00
Brian Smith 448605b4c3
Allow TLS configuration watches to start before telemetry. (#19)
Previously it wasn't possible to create objects that need to watch the TLS
configuration until the telemetry sensors were created. Split the watching
initialization into two parts so that in the (near) future TLS-using objects can
be created before the telemetry sensors are created.

This will allow us to initialize `BoundPort` with the TLS settings during
creation instead of later in `BoundPort::listen_and_fold()`. This will also
facilitate TLS testing.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-20 10:21:25 -10:00
Brian Smith 38058eb7d8
Reduce visibility of some `transport::connection` items. (#20)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-20 10:20:50 -10:00
Eliza Weisman 6a81e1f137
Improve error messages in logs (#18)
Currently, the messages that the proxy logs on errors are often not 
very useful. For example, when an error occurred that caused the proxy
to return HTTP 500, we log a message like this:

```
ERR! proxy={server=in listen=0.0.0.0:4143 remote=127.0.0.1:57416} linkerd2_proxy turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
```

Note that:
+ Regardless of what the error actually was, the current log message
  *always* says "protocol error: unexpected internal error encountered",
  which is both fairly unclear *and* often not actually the case.
+ Regardless of whether the error was encountered by an HTTP/1 or 
  HTTP/2 client, the error message always includes the string
  "underlying HTTP/2 error". This is probably fairly confusing for 
  users who are, say, only proxying HTTP/1 traffic.

This branch fixes several related issues around the clarity of the
proxy's error messages:

+ A couple cases in the `transparency` module that returned
  `io::Error::from(io::ErrorKind::Other)` have been replaced with
  more descriptive errors that propagate the underlying error. This
  necessitated adding bounds on some error types.
+ Introduced a new `transparency::client::Error` enum that can be
  either a `h2::Error` or a `hyper::Error`, depending on whether
  the client is HTTP/1 or HTTP/2, and proxies its `std::error::Error`
  impl to the wrapped error type (see the sketch below). This way, we
  don't return a `tower_h2::client::Error` (with format output that
  includes the string "HTTP/2") from everything, and discard
  significantly less error information.
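
A rough sketch of that wrapper enum, assuming the `hyper` and `h2` crates;
the real implementation is more involved (and predates `Error::source`), but
the idea is to forward everything to whichever error is actually wrapped.

```
use std::{error, fmt};

#[derive(Debug)]
enum Error {
    Http1(hyper::Error),
    Http2(h2::Error),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Forward to the wrapped error, so HTTP/1 clients never see a
        // message mentioning "HTTP/2" (and vice versa).
        match *self {
            Error::Http1(ref e) => fmt::Display::fmt(e, f),
            Error::Http2(ref e) => fmt::Display::fmt(e, f),
        }
    }
}

impl error::Error for Error {
    fn source(&self) -> Option<&(dyn error::Error + 'static)> {
        match *self {
            Error::Http1(ref e) => Some(e),
            Error::Http2(ref e) => Some(e),
        }
    }
}
```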

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-19 16:38:31 -07:00
Sean McArthur 04e4b4409b update httparse to v1.3.2
Allows using SIMD instructions when parsing.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-19 16:12:35 -07:00
Sean McArthur 11e7eb6357 update hyper to v0.12.6
Brings in fix to reduce connection churn related to keep-alive racing.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-19 16:12:35 -07:00
Eliza Weisman 30a48a7d8b
Accept TLS connections even when TLS configuration isn't available (#22)
Closes linkerd/linkerd2#1272.

Currently, if TLS is enabled but the TLS configuration isn't available
(yet), the proxy will pass through all traffic to the application.
However, the destination service will tell other proxies to send TLS
traffic to the pod unconditionally, so the proxy will pass through TLS
handshakes to the application that are destined for the proxy itself.

In linkerd/linkerd2#1272, @briansmith suggested that we change the 
proxy so that when it hasn't yet loaded a TLS configuration, it will
accept TLS handshakes, but fail them. This branch implements that 
behaviour by making the `rustls::sign::CertifiedKey` in `CertResolver`
optional, and changing the `CertResolver` to return `None` when 
`rustls` asks it to resolve a certificate in that case. The server
config watch is now initially created with `Conditional::Some` with an
empty `CertResolver`, rather than `Conditional::None(NoConfig)`, so
that the proxy will accept incoming handshakes, but fail them.
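
A simplified sketch of the idea: the resolver holds an optional key, and
resolution simply returns `None` until the TLS configuration has actually
been loaded. Type and method names are abbreviated stand-ins; the real code
implements rustls's certificate-resolver trait.

```
struct CertifiedKey; // stand-in for `rustls::sign::CertifiedKey`

struct CertResolver {
    certified_key: Option<CertifiedKey>, // `None` until the config is loaded
}

impl CertResolver {
    fn resolve(&self) -> Option<&CertifiedKey> {
        // With no key available, the handshake is accepted but then fails,
        // rather than being passed through to the application.
        self.certified_key.as_ref()
    }
}
```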

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-18 17:06:45 -07:00
Eliza Weisman f208acb3a5
Fix incorrect process_cpu_seconds_total metric (#7)
Fixes linkerd/linkerd2#1239.

The proxy's `process_cpu_seconds_total` metric is currently calculated
incorrectly and will differ from the CPU stats reported by other 
sources. This is because it currently calculates the CPU time by summing
the `utime` and `stime` fields of the stat struct returned by `procinfo`.
However, those numbers are expressed in _clock ticks_, not seconds, so
the metric is expressed in the wrong unit.

This branch fixes this issue by using `sysconf` to get the number of
clock ticks per second when the process sensor is created, and then
dividing `utime + stime` by that number, so that the value is expressed
in seconds.
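
A minimal sketch of that calculation, assuming the `libc` crate: `utime +
stime` are reported in clock ticks, so they are divided by
`sysconf(_SC_CLK_TCK)` to produce seconds.

```
fn clock_ticks_per_second() -> u64 {
    // Queried once, when the process sensor is created.
    unsafe { libc::sysconf(libc::_SC_CLK_TCK) as u64 }
}

fn cpu_seconds_total(utime_ticks: u64, stime_ticks: u64) -> f64 {
    // utime + stime are in clock ticks, not seconds.
    (utime_ticks + stime_ticks) as f64 / clock_ticks_per_second() as f64
}
```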

## Demonstration:

(Note that the last column in `ps aux` output is the CPU time total)
```
eliza@ares:~$ ps aux | grep linkerd2-proxy | grep -v grep
eliza    40703  0.2  0.0  45580 14864 pts/0    Sl+  13:49   0:03 target/debug/linkerd2-proxy
eliza@ares:~$ curl localhost:4191/metrics
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 3
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 19
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 46673920
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 15220736
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1531428584
eliza@ares:~$
```

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-17 15:48:02 -07:00
Brian Smith 1fdcbfaba6
Replace "conduit" with "linkerd" in TLS test data. (#17)
This is purely aesthetic; the TLS logic doesn't care about the product name.

The test data was regenerated by re-running gen-certs.sh after modifying it.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-17 09:22:17 -10:00
Brian Smith e1b4e66836
Upgrade TLS dependencies. (#16)
Fixes linkerd/linkerd2#1330.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-17 09:21:59 -10:00
Sean McArthur 162f53dc8d
spawn individual admin tasks instead of joining all (#10)
Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-17 11:50:57 -07:00
Eliza Weisman 2ac114ba65
Point inotify dependency at master (#14)
Now that inotify-rs/inotify#105 has merged, we will no longer see
rampant CPU use from using the master version of `inotify`. I've 
updated Cargo.toml to depend on master rather than on my branch.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-17 10:21:40 -07:00
Sean McArthur aa60ddb088
remove busy loop from destination background future when shutdown (#15)
When the proxy is shutting down, once there are no more outbound
connections, the sender side of the resolver channel is dropped. In the
admin background thread, when the destination background future is
notified of the closure, instead of shutting down itself, it just busy
loops. Now, after seeing shutdown, the background future ends as well.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-17 09:50:08 -07:00
Sean McArthur 9f5648d955
fix control client Backoff to poll its timer when backing off (#13)
The `Backoff` service wrapper is used for the controller client service
so that if the proxy can't find the controller (there is a connection
error), it doesn't keep trying in a tight loop, but instead waits a
couple seconds before trying again, presuming that the control plane
was rebooting.

When "backing off", a timer would be set, but it wasn't polled, so the
task was never registered to wake up after the delay. This turns out to
not have been a problem in practice, since the background destination
task was joined with other tasks that were constantly waking up,
allowing it to try again anyways.
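
A rough sketch of the fix, in the futures 0.1 / tokio-timer style used at
the time; the state type and names are illustrative rather than the proxy's
actual `Backoff` middleware.

```
use futures::{Async, Future, Poll};
use tokio_timer::Delay;

enum State {
    Active,
    BackingOff(Delay),
}

impl State {
    /// Check whether the backoff has elapsed; crucially, this polls the timer.
    fn poll_ready(&mut self) -> Poll<(), ()> {
        if let State::BackingOff(ref mut delay) = *self {
            // Polling registers the current task to be woken when the delay
            // fires; merely creating the Delay does not.
            match delay.poll() {
                Ok(Async::Ready(())) => {}
                Ok(Async::NotReady) => return Ok(Async::NotReady),
                Err(_) => return Err(()),
            }
        }
        *self = State::Active;
        Ok(Async::Ready(()))
    }
}
```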

To add tests for this, a new `ENV_CONTROL_BACKOFF_DELAY` config value
has been added, so that the tests don't have to wait the default 5
seconds.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-16 12:41:47 -07:00
Brian Smith 73edefb795
Move `connection` submodule to `transport`. (#11)
This allows easier logging configuration for the entire transport system
using the common prefix `conduit_proxy::transport`. Previously logging had to be
controlled separately/additionally for `conduit_proxy::connection`.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-07-13 12:46:38 -10:00
Sean McArthur f7233fd682 Revert "remove busy loop from destination background future during shutdown (#8)"
This reverts commit 4bee7b0b55.

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-13 13:18:39 -07:00
Oliver Gould bbf217ff4f
Replace references to _Conduit_ (#6)
There are various comments, examples, and documentation that refers to
Conduit. This change replaces or removes these references.

CONTRIBUTING.md has been updated to refer to GOVERNANCE/MAINTAINERS.
2018-07-12 20:41:17 -07:00
Eliza Weisman 2f4c1b220a
Add labels for `tls::ReasonForNoIdentity` (#5)
Fixes linkerd/linkerd2#1276.

Currently, metrics with the `tls="no_identity"` label are duplicated.
This is because that label is generated from the `tls_status` label on
the `TransportLabels` struct, which is either `Some(())` or a
`ReasonForNoTls`. `ReasonForNoTls` has a
variant `ReasonForNoTls::NoIdentity`, which contains a
`ReasonForNoIdentity`, but when we format that variant as a label, we
always just produce the string "no_identity", regardless of the value of
the `ReasonForNoIdentity`. 

However, label types are _also_ used as hash map keys into the map that
stores the metrics scopes, so although two instances of
`ReasonForNoTls::NoIdentity` with different `ReasonForNoIdentity`s
produce the same formatted label output, they aren't _equal_, since that
field differs, so they correspond to different metrics.

This branch resolves this issue by adding an additional label to these
metrics, based on the `ReasonForNoIdentity`. Now, the separate lines in
the metrics output that correspond to each `ReasonForNoIdentity` have a
label differentiating them from each other.

Note that the `NotImplementedForTap` and `NotImplementedForMetrics`
reasons will currently never show up in metrics labels, since we don't
gather metrics from the tap and metrics servers at the moment.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-12 16:04:25 -07:00
Sean McArthur 4bee7b0b55
remove busy loop from destination background future during shutdown (#8)
When the proxy is shutting down, once there are no more outbound
connections, the sender side of the resolver channel is dropped. In the
admin background thread, when the destination background future is
notified of the closure, instead of shutting down itself, it just busy
loops. Now, after seeing shutdown, the background future ends as well.

While examining this, I noticed all the background futures are joined
together into a single `Future` before being spawned on a dedicated
current_thread executor. Join in this case is inefficient, since *every*
single time *one* of the futures is ready, they are *all* polled again.
Since we have an executor handy, it's better to allow it to manage each
of the futures individually.
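
A minimal sketch of the change, assuming tokio 0.1's
`runtime::current_thread` module; the function and future names are
illustrative.

```
use futures::Future;
use tokio::runtime::current_thread::Runtime;

fn spawn_background<A, B>(mut rt: Runtime, fut_a: A, fut_b: B) -> Runtime
where
    A: Future<Item = (), Error = ()> + 'static,
    B: Future<Item = (), Error = ()> + 'static,
{
    // Before: one joined future, re-polled in full whenever any part woke:
    //   rt.spawn(fut_a.join(fut_b).map(|_| ()));
    // After: the executor tracks each future's wakeups independently.
    rt.spawn(fut_a);
    rt.spawn(fut_b);
    rt
}
```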

Signed-off-by: Sean McArthur <sean@buoyant.io>
2018-07-12 15:22:32 -07:00
Oliver Gould 3c48ba7f62
Add a README (#4) 2018-07-11 16:01:54 -07:00
Oliver Gould 8db765c7bc
dev: Add a Dockerfile for development (#3)
When working on the proxy, it's important to be able to build a Docker
image that can be tested in the context of the existing linkerd2
project.

This change adds a `make docker` target that produces a docker image,
optionally tagged via the `DOCKER_TAG` environment variable.

This is intended to be used for development--especially on non-Linux
OSes.
2018-07-11 15:27:33 -07:00
Oliver Gould 0ca5d11c03
Adopt Linkerd2's governance (#2)
For the time being, @briansmith and I will serve as super-maintainers
for the linkerd2-proxy.
2018-07-10 15:59:12 -07:00
Oliver Gould 02a64e980f
ci: Publish artifacts to build.l5d.io
In order to setup continuous integration, proxy artifacts need to be
published somewhere predictable and discoverable. This change configures
Travis CI to publish proxy artifacts built from master to:

    build.l5d.io/linkerd2-proxy/linkerd2-proxy-${ref}.tar.gz
    build.l5d.io/linkerd2-proxy/linkerd2-proxy-${ref}.txt

The tarball includes an optimized proxy binary and metadata (right now, just
the LICENSE file, but later this should include additional version/build
metadata that can be used for runtime diagnostics).

The text file includes the sha256 sum of the tarball.

A `Makefile` is introduced to encapsulate build logic so that it can both
drive CI and be used manually.

Travis CI is configured to run debug-mode tests against PRs and to run a full
release package-test-publish for commits to
master.
2018-07-08 14:24:25 -07:00
Oliver Gould c23ecd0cbc
Migrate `conduit-proxy` to `linkerd2-proxy`
The proxy now honors environment variables starting with
`LINKERD2_PROXY_`.
2018-07-07 22:45:21 +00:00
Eliza Weisman ec303942ee proxy: Add tls_config_last_reload_seconds metric (#1204)
Depends on #1141.

This PR adds a `tls_config_last_reload_seconds` Prometheus metric
that reports the last time the TLS configuration files were reloaded.

Proof that it works:

Started the proxy with no certs, then generated them:
```
➜ http GET localhost:4191/metrics
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 323
content-type: text/plain
date: Mon, 25 Jun 2018 23:02:52 GMT

# HELP tls_config_reload_total Total number of times the proxy's TLS config files were reloaded.
# TYPE tls_config_reload_total counter
tls_config_reload_total{status="io_error",path="example-example.crt",error_code="2"} 9
tls_config_reload_total{status="reloaded"} 3
# HELP tls_config_last_reload_seconds Timestamp of when the TLS configuration files were last reloaded successfully (in seconds since the UNIX epoch)
# TYPE tls_config_last_reload_seconds gauge
tls_config_last_reload_seconds 1529967764
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1529967754
```

Started the proxy with certs already present:
```
➜ http GET localhost:4191/metrics
HTTP/1.1 200 OK
content-encoding: gzip
content-length: 285
content-type: text/plain
date: Mon, 25 Jun 2018 23:04:39 GMT

# HELP tls_config_reload_total Total number of times the proxy's TLS config files were reloaded.
# TYPE tls_config_reload_total counter
tls_config_reload_total{status="reloaded"} 4
# HELP tls_config_last_reload_seconds Timestamp of when the TLS configuration files were last reloaded successfully (in seconds since the UNIX epoch)
# TYPE tls_config_last_reload_seconds gauge
tls_config_last_reload_seconds 1529967876
# HELP process_start_time_seconds Time that the process started (in seconds since the UNIX epoch)
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1529967874
```

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-05 16:23:57 -07:00
Eliza Weisman dd7ac18cc5 proxy: Fix out-of-control inotify CPU use (#1263)
The `inotify-rs` library's `EventStream` implementation currently 
calls `task::current().notify()` in a hot loop when a poll returns
`WouldBlock`, causing the task to constantly burn CPU. 

This branch updates the `inotify-rs` dependency to point at a branch
of `inotify-rs` I had previously written. That branch rewrites the
`EventStream` to use `mio` to register interest in the `inotify` file
descriptor instead, fixing the out-of-control polling.

When inotify-rs/inotify#105 is merged upstream, we can go back to 
depending on the master version of the library.

Fixes #1261

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-03 20:16:12 -07:00
Oliver Gould b9b35ec11c proxy: Handle connection close during TLS detection (#1256)
During protocol detection, we buffer data to detect a TLS Client Hello
message. If the client disconnects while this detection occurs, we do
not properly handle the disconnect, and the proxy may busy loop.

To fix this, we must handle the case where `read(2)` returns 0 by
creating a `Connection` with the already-closed socket.
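
A simplified, synchronous sketch of the case being handled (the proxy does
this inside a non-blocking poll; names here are illustrative): a read that
returns 0 means the peer closed the connection, and detection must finish
with the already-closed socket instead of waiting for bytes that will never
arrive.

```
use std::io::Read;
use std::net::TcpStream;

enum Detected {
    /// Peer closed before sending enough bytes; hand back the closed socket.
    ClosedByPeer(TcpStream),
    /// More data buffered; keep sniffing for a TLS ClientHello.
    Buffered(TcpStream, usize),
}

fn read_for_detection(mut socket: TcpStream, buf: &mut [u8]) -> std::io::Result<Detected> {
    let n = socket.read(buf)?;
    if n == 0 {
        // read(2) returned 0: the client disconnected during detection.
        return Ok(Detected::ClosedByPeer(socket));
    }
    Ok(Detected::Buffered(socket, n))
}
```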

While doing this, I've moved some of the implementation of
`ConditionallyUpgradeServerToTls::poll` into helpers on
`ConditionallyUpgradeServerToTlsInner` so that the poll method is easier
to read, hiding the inner details from the polling logic.
2018-07-03 15:36:48 -07:00
Eliza Weisman 1e39ab6ac4 proxy: Add a Prometheus metric for reporting errors loading TLS configs (#1141)
This PR adds a Prometheus stat tracking the number of times
TLS config files have been reloaded, and the number of times
reloading those files has errored. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-07-03 15:24:20 -07:00
Eliza Weisman 5a3b1cdb3a proxy: Add TLS label in `transparency::retry_reconnect_errors` test (#1258) 2018-07-03 12:27:08 -07:00
Oliver Gould 866167a955 tap: Support `tls` labeling (#1244)
The proxy's metrics are instrumented with a `tls` label that describes
the state of TLS for each connection and associated messges.

This same level of detail is useful to get in `tap` output as well.

This change updates Tap in the following ways:
* `TapEvent` protobuf updated:
  * Added `source_meta` field including source labels
  * `proxy_direction` enum indicates which proxy server was used.
* The proxy adds a `tls` label to both source and destination meta indicating the state of each peer's connection
* The CLI uses the `proxy_direction` field to determine which `tls` label should be rendered.
2018-07-02 17:19:20 -07:00
Oliver Gould 051a7639c5 proxy: Always include `tls` label in metrics (#1243)
The `tls` label could sometimes be formatted incorrectly, without a
preceding comma.

To fix this, the `TlsStatus` type no longer formats commas so that they
must be provided in the context in which they are used (as is done
otherwise in this file).
2018-07-02 16:21:06 -07:00
Eliza Weisman 91108a2d53 proxy: Fall back to plaintext communication when a TLS handshake fails (#1173)
This branch modifies the proxy's logic for opening a connection so
that when an attempted TLS handshake fails, the proxy will retry that
connection without TLS.

This is implemented by changing the `UpgradeToTls` case in the `Future`
implementation for `Connecting`, so that rather than simply wrapping
a poll to the TLS upgrade future with `try_ready!` (and thus failing
the future if the upgrade future fails), we reset the state of the
future to the `Plaintext` state and continue looping. The `tls_status`
field of the future is changed to `ReasonForNoTls::HandshakeFailed`,
and the `Plaintext` state is changed so that if its `tls_status` is
`HandshakeFailed`, it will no longer attempt to upgrade to TLS when the
plaintext connection is successfully established.
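
A simplified sketch of the state transition described above (names
approximate this description; the real `Connecting` future carries more
state): a failed TLS upgrade drops back to the plaintext state and records
`HandshakeFailed`, which suppresses any further upgrade attempt for this
connection.

```
enum ReasonForNoTls {
    HandshakeFailed,
}

enum Connecting {
    Plaintext { tls_status: Option<ReasonForNoTls> },
    UpgradeToTls,
}

impl Connecting {
    /// Called when the TLS upgrade future fails.
    fn on_handshake_error(&mut self) {
        // Instead of failing the whole connect future, retry without TLS.
        *self = Connecting::Plaintext {
            tls_status: Some(ReasonForNoTls::HandshakeFailed),
        };
    }

    /// Once plaintext is established, decide whether to try upgrading to TLS.
    fn should_upgrade(&self) -> bool {
        match *self {
            Connecting::Plaintext { ref tls_status } => tls_status.is_none(),
            Connecting::UpgradeToTls => false,
        }
    }
}
```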

Closes #1084 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-29 17:08:03 -07:00
Brian Smith da61aace6c Proxy: Skip TLS for control plane loopback connections. (#1229)
If the controller address has a loopback host then don't use TLS to connect
to it. TLS isn't needed for security in that case. In normal configurations
the proxy isn't terminating TLS for loopback connections anyway.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-28 17:24:09 -10:00
Brian Smith 03814c18eb Proxy: Get identity of pod & controller from configuration. (#1221)
Instead of attempting to construct identities itself, have the proxy
accept fully-formed identities from whatever configures it. This allows
us to centralize the formatting of the identity strings in the Go code
that is shared between the `conduit inject`, `conduit install`, and CA
components.

One wrinkle: The pod namespace isn't necessarily available at
`conduit inject` time, so the proxy must implement a simple variable
substitution mechanism to insert the pod namespace into its identity.
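
A tiny sketch of that substitution idea; the `$(pod_ns)` variable syntax
shown here is an assumption for illustration, not necessarily the proxy's
actual format.

```
fn substitute_pod_ns(identity_template: &str, pod_ns: &str) -> String {
    // Insert the pod namespace, known only at runtime, into the identity.
    identity_template.replace("$(pod_ns)", pod_ns)
}
```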

This has the side-effect of enabling TLS to the controller since the
controller's identity is now available.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-27 17:17:34 -10:00
Brian Smith c67546653a Proxy: Use new destination service TLS identity scheme. (#1222)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-06-27 14:47:57 -10:00
Eliza Weisman af7b56f963 proxy: Replace >=100,000 ms latency buckets with 1, 2, 3, 4, and 5 ms (#1218)
This branch adds buckets for latencies below 10 ms to the proxy's latency
histograms, and removes the buckets for 100, 200, 300, 400, and 500 
seconds, so the largest non-infinity bucket is 50,000 ms. It also removes
comments that claimed that these buckets were the same as those created
by the control plane, as this is no longer true (the metrics are now scraped
by Prometheus from the proxy directly).

Closes #1208

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-06-27 16:53:42 -07:00
Kevin Lingerfelt 26d2bce656 Update dest service with a different tls identity strategy (#1215)
* Update dest service with a different tls identity strategy
* Send controller namespace as separate field

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-27 11:40:02 -07:00