linkerd2

Commit Graph

Author	SHA1	Message	Date
Oliver Gould	02e6d018d0	proxy: Bound on router capacity (#898 ) Currently, the proxy may cache an unbounded number of routes. In order to prevent such leaks in production, new configurations are introduced to limit the number of inbound and outbound HTTP routes. By default, we support 100 inbound routes and 10K outbound routes. In a followup, we'll introduce an eviction strategy so that capacity can be reclaimed gracefully.	2018-05-04 16:32:30 -07:00
Oliver Gould	7f16079f64	proxy: Upgrade tower dependencies (#892 ) In order to pick up https://github.com/tower-rs/tower-grpc/pull/60, upgrade tower dependencies. This will reduce the cost of updating for upcoming tower-h2 improvements.	2018-05-02 13:40:55 -07:00
Oliver Gould	ada5cb267e	proxy: Expire metrics that have not been updated for 10 minutes (#880 ) The proxy is now configured with the CONDUIT_PROXY_METRICS_RETAIN_IDLE environment variable that dictates the amount of time that the proxy will retain metrics that have not been updated. A timestamp is maintained for each unique set of labels, indicating the last time that the scope was updated. Then, when metrics are read, all metrics older than CONDUIT_PROXY_METRICS_RETAIN_IDLE are dropped from the stats registry. A ctx::test_utils module has been added to aid testing. Fixes #819	2018-04-30 16:11:12 -07:00
Oliver Gould	935ccd5f78	proxy: Implement benchmarks for telemetry recording (#874 ) Before changing the telemetry implementation, we should have a means to understand the impacts of such changes. To run, you must use a nightly toolchain: ``` rustup run nightly cargo bench -p conduit-proxy -- record ```	2018-04-29 12:55:26 -07:00
Brian Smith	0c23ad416b	Proxy: Use trust-dns-resolver for DNS. (#834 ) trust-dns-resolver is a more complete implementation. In particular, it supports CNAMES correctly, which is needed for PR #764. It also supports /etc/hosts, which will help with issue #62. Use the 0.8.2 pre-release since it hasn't been released yet. It was created at our request. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-25 10:04:49 -10:00
Eliza Weisman	a2c60e8fcf	Add optional GZIP compression to proxy /metrics endpoint (#665 ) Closes #598. According to the Prometheus documentation, metrics export endpoints should support serving metrics compressed using GZIP. I've modified the proxy's `/metrics` endpoint to serve metrics compressed with GZIP when an `Accept-Encoding: gzip` request header is sent. I've also added a new unit test that attempts to get the proxy's metrics endpoint as GZIP, and asserts that the metrics are decompressed successfully. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-24 17:42:50 -07:00
Eliza Weisman	2e919bf813	Move request open timestamp to the top of the stack (#744 ) Currently, the request open timestamp, which is used for calculating latency, is captured in the `sensor::http::Http` middleware. However, the sensor middleware is placed fairly low in the stack, below some of the proxy's components that can add measurable latency (e.g. the router). This PR moves the request_open timestamp out of the `Http` middleware and into a new `TimestampRequestOpen` middleware, which is installed at the top of the stack (before the router). The `TimestampRequestOpen` middleware adds the timestamp as a request extension, so that it can later be consumed by the `Http` sensor to generate the request stats. By moving the timestamping to the top of the stack, the timestamp should more accurately cover the overhead of the proxy, but a majority of the telemetry work can still be done where it was previously. I'd like to have included unit tests for this change, but since the expected improvement is in the accuracy of latency measurements, there's no easy way to test this programmatically.	2018-04-17 15:01:36 -07:00
Oliver Gould	efdfc93b50	Stop pushing telemetry reports from the proxy (#616 ) Now that the controller does not depend on pushed telemetry reports, the proxy need not depend on the telemetry API or maintain legacy sampling logic.	2018-04-12 17:39:29 -07:00
Eliza Weisman	b07b554d2b	Add labels from service discovery to proxy metrics reports (#661 ) PR #654 adds pod-based metric labels to the Destination API responses for cluster-local services. This PR modifies the proxy to actually add these labels to reported Prometheus metrics for outbound requests to local services. It enhances the proxy's `control::discovery` module to track these labels and add a `LabelRequest` middleware to the service stack built in `Bind` for labeled services. Requests transiting `LabelRequest` are given an `Extension` which contains these labels, which are then added to events produced by the `Sensors` for these requests. When these events are aggregated to Prometheus metrics, the labels are added. I've also added some tests in `test/telemetry.rs` ensuring that these metrics are added correctly when the Destination service provides labels. Closes #660 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-12 12:54:38 -07:00
Sean McArthur	02c6887020	proxy: improve graceful shutdown process (#684 ) - The listener is immediately closed on receipt of a shutdown signal. - All in-progress server connections are now counted, and the process will not shutdown until the connection count has dropped to zero. - In the case of HTTP1, idle connections are closed. In the case of HTTP2, the HTTP2 graceful shutdown steps are followed of sending various GOAWAYs.	2018-04-10 14:15:37 -07:00
Brian Smith	7319cf648f	Proxy: Do L7 load balancing for all external HTTP services. (#726 ) Previously when the proxy could tell, by parsing, the request-target is not in the cluster, it would not override the destination. That is, load balancing would be disabled for such destinations. With this change, the proxy will do L7 load balancing for all HTTP services as long as the request-target has a DNS name. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-10 08:07:16 -10:00
Brian Smith	bc16034fd6	Proxy: Fall back to using DNS when Destination service can't find service. (#692 ) Fixes #155.	2018-04-07 18:26:06 -10:00
Brian Smith	7d3b715c4d	Proxy: Move DNS name normalization to service discovery (#722 ) Only the destination service needs normalized names (and even then, that's just temporary). The rest of the code needs the name as it was given, except case-normalized (lowercased). Because DNS fallack isn't implemented in service discovery yet, Outbound still a temporary workaround using FullyQualifiedName to keep things working; thta will be removed once DNS fallback is implemented in service discovery. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-06 15:04:09 -10:00
Eliza Weisman	8bc05472ed	Make `control::Cache` key-value in order to store discovery metadata (#688 ) This PR changes the proxy's `control::Cache` module from a set to a key-value map. This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type. Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements. Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata. If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.	2018-04-06 13:54:16 -07:00
Brian Smith	7bc4ffd0a4	Revert "Proxy: Refactor DNS name parsing and normalization (#673 )" (#700 ) This reverts commit `311ef410a8`. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-05 16:49:32 -10:00
Brian Smith	311ef410a8	Proxy: Refactor DNS name parsing and normalization (#673 ) Proxy: Refactor DNS name parsing and normalization Only the destination service needs normalized names (and even then, that's just temporary). The rest of the code needs the name as it was given, except case-normalized (lowercased). Because DNS fallack isn't implemented in service discovery yet, Outbound still a temporary workaround using FullyQualifiedName to keep things working; thta will be removed once DNS fallback is implemented in service discovery. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-05 12:32:12 -10:00
Sean McArthur	47f9665b8e	proxy: allow disable protocol detection on specific ports (#648 ) - Adds environment variables to configure a set of ports that, when an incoming connection has an SO_ORIGINAL_DST with a port matching, will disable protocol detection for that connection and immediately start a TCP proxy. - Adds a default list of well known ports: SMTP and MySQL. Closes #339	2018-04-02 14:24:36 -07:00
Eliza Weisman	1c9ce4d118	Add Prometheus /metrics endpoint to proxy (#569 ) This PR adds an endpoint to the proxy that serves metrics in Prometheus' text exposition format. The endpoint currently serves the `request_total`, `response_total`, `response_latency_ms`, and `response_duration_ms metrics`, as described in #536. The endpoint's port and address are configurable with the `CONDUIT_PROXY_METRICS_LISTENER` environment variable. Tests have been added in t`ests/telemetry.rs`	2018-03-21 16:19:32 -07:00
Sean McArthur	cd59465366	proxy: add SIGTERM and SIGINT handlers (#581 ) When the proxy is run in a Docker container, it runs as PID 1, with no default signal handlers setup. In order to react to signals from Kubernetes about shutting down, we need to set up explicit handlers. This adds handlers for SIGTERM and SIGINT. Closes #549	2018-03-16 18:53:20 -07:00
Brian Smith	649e784d9c	Simplify cluster zone suffix handling in the proxy (#528 ) * Temporarily stop trying to support configurable zones in the proxy. None of the zone configuration is tested and lots of things assume the cluster zone is `cluster.local`. Further, how exactly the proxy will actually learn the cluster zone hasn't been decided yet. Just hard-code the zone as "cluster.local" in the proxy until configurable zones are fully implemented and tested to be working correctly. Signed-off-by: Brian Smith <brian@briansmith.org> * Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting The way that Kubernetes configures DNS search suffixes has some negative consequences as some names like "example.com" are ambiguous: depending on whether there is a service "example" in the "com" namespace, "example.com" may refer to an external service or an internal service, and this can fluctuate over time. In recognition of that we added the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would be part of a solution for users to opt out of the unfortunate behavior if their applications didn't depend on the DNS search suffix feature. It turns out similar effects can be acheived using a custom dnsConfig, starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level. Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative. Further, attempting to support it optionally made the code complex and hard to read. Therefore, let's just remove it. If/when somebody actually requests this functionality then we can add it back, if dnsConfig isn't a valid alternative for them. Signed-off-by: Brian Smith <brian@briansmith.org> * Further hard-code "cluster.local" as the zone, temporarily. Addresses review feedback. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 14:30:13 -10:00
Eliza Weisman	ad073c79b9	Remove connect timeouts from Bind (#487 ) Currently, the `Reconnect` middleware does not reconnect on connection errors (see #491) and treats them as request errors. This means that when a connection timeout is wrapped in a `Reconnect`, timeout errors are treated as request errors, and the request returns HTTP 500. Since this is not the desired behavior, the connection timeouts should be removed, at least until their errors can be handled differently. This PR removes the connect timeouts from `Bind`, as described in https://github.com/runconduit/conduit/pull/483#issuecomment-369380003. It removes the `CONDUIT_PROXY_PUBLIC_CONNECT_TIMEOUT_MS` environment variable, but _not_ the `CONDUIT_PROXY_PRIVATE_CONNECT_TIMEOUT_MS` variable, since this is also used for the TCP connect timeouts. If we want also want to remove the TCP connection timeouts, I can do that as well. Closes #483. Fixes #491.	2018-03-05 15:38:20 -08:00
Eliza Weisman	c3ad9e5f2f	Use fmt::Display to format error messages in logs (#477 ) This PR changes the proxy to log error messages using `fmt::Display` whenever possible, which should lead to much more readable and meaningful error messages This is part of the work I started last week on issue #442. While I haven't finished everything for that issue (all errors still are mapped to HTTP 500 error codes), I wanted to go ahead and open a PR for the more readable error messages. This is partially because I found myself merging these changes into other branches to aid in debugging, and because I figured we may as well have the nicer logging on master.	2018-03-02 12:44:18 -08:00
Brian Smith	0d14c196f5	Proxy: Upgrade from ordermap 0.2 crate to indexmap 0.4. (#466 ) Currently we have to download and build two different versions of the ordermap crate. I will submit similar PRs for the dependent crates so that we will eventually all be using the same version of indexmap. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-26 19:29:22 -10:00
Brian Smith	8607875267	Stop using the url crate in the proxy. (#450 ) Version 1.7.0 of the url crate seems to be broken which means we cannot `cargo update` the proxy without locking url to version 1.6. Since we only use it in a very limited way anyway, and since we use http::uri for parsing much more, just switch all uses of the url crate to use http::uri for parsing instead. This eliminates some build dependencies. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-02-26 08:55:48 -10:00
Eliza Weisman	694f691b71	Add timeout to Outbound::bind_service (#436 ) Closes #403. When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, this means that the proxy will wait forever, rather than responding with an error. I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)). Testing with @klingerf's reproduction from #403: ``` curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get \| head -n1 HTTP/1.1 500 Internal Server Error ``` proxy logs: ```rust proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 } proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140) proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240 proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 })) ```	2018-02-26 10:18:35 -08:00
Eliza Weisman	50103d479a	Remove timestamps from log messages (#399 ) As @olix0r requested in https://github.com/runconduit/conduit/issues/56#issuecomment-356771758, I've removed timestamps from the Conduit proxy's log records. Closes #56 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-02-21 15:17:17 -08:00
Carl Lerche	287128885e	Proxy: Limit the max number of in-flight requests. (#398 ) Currently, the max number of in-flight requests in the proxy is unbounded. This is due to the `Buffer` middleware being unbounded. This is resolved by adding an instance of `InFlightLimit` around `Buffer`, capping the max number of in-flight requests for a given endpoint. Currently, the limit is hardcoded to 10,000. However, this will eventually become a configuration value. Fixes #287 Signed-off-by: Carl Lerche <me@carllerche.com>	2018-02-20 19:56:21 -08:00
Oliver Gould	a2d537f5c4	Use a load-aware balancer (#251 ) Currently, the conduit proxy uses a simplistic Round-Robin load balancing algorithm. This strategy degrades severely when individual endpoints exhibit abnormally high latency. This change improves this situation somewhat by making the load balancer aware of the number of outstanding requests to each endpoint. When nodes exhibit high latency, they should tend to have more pending requests than faster nodes; and the Power-of-Two-Choices node selector can be used to distribute requests to lesser-loaded instances. From the finagle guide: The algorithm randomly picks two nodes from the set of ready endpoints and selects the least loaded of the two. By repeatedly using this strategy, we can expect a manageable upper bound on the maximum load of any server. The maximum load variance between any two servers is bound by ln(ln(n))` where `n` is the number of servers in the cluster. Signed-off-by: Oliver Gould <ver@buoyant.io>	2018-02-07 09:39:31 -08:00
Oliver Gould	e2093e37f8	Move the Rust gRPC bindings to a dedicated crate (#275 ) The proxy depends on `protoc`-generated gRPC bindings to communicate with the controller. In order to generate these bindings, build-time dependencies must be compiled. In order to support a more granular, cacheable build scheme, a new crate has been created to house these gRPC bindings, `conduit-proxy-controller-grpc`. Because `TryFrom` and `TryInto` conversions are implemented for protobuf-defined types, the `convert` module also had to be moved to into a dedicated crate. Furthermore, because the proxy's tests require that `quickcheck::Aribtrary` be implemented for protobuf types, the `conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature fla protobuf types, the `conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature flag. While we're moving these libraries around, the `tower-router` crate has been moved to `proxy/router` and renamed to `conduit-proxy-router.` `futures-mpsc-lossy` has been moved into the proxy directory but has not been renamed. Finally, the `proxy/Dockerfile-deps` image has been updated to avoid the wasteful building of dependency artifacts, as they are not actually used by `proxy/Dockerfile`.	2018-02-06 10:31:48 -08:00
Eliza Weisman	eddc37de28	Adopt external tower-grpc and tower-h2 deps #225 ) The conduit repo includes several library projects that have since been moved into external repos, including `tower-grpc` and `tower-h2`. This change removes these vendored libraries in favor of using the new external crates.	2018-02-01 11:57:02 -08:00
Sean McArthur	54aef56e25	proxy: add transparent protocol detection and handling The proxy will now try to detect what protocol new connections are using, and route them accordingly. Specifically: - HTTP/2 stays the same. - HTTP/1 is now accepted, and will try to send an HTTP/1 request to the target. - If neither HTTP/1 nor 2, assume a TCP stream and simply forward between the source and destination. * tower-h2: fix Server Clone bounds * proxy: implement Async{Read,Write} extra methods for Connection Closes #130 Closes #131	2018-01-23 16:14:07 -08:00
Eliza Weisman	63d1a5d70d	Add Protocol field to Transports telemetry (#138 ) See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP). Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes. Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP. Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy... Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-01-11 16:00:38 -08:00
clemensw	b1831cd415	[proxy] Fix rendering for top-level rustdoc (#113 ) Signed-off-by: clemensw <clemensw@users.noreply.github.com>	2018-01-08 15:40:12 -08:00
Brian Smith	8385a7a8c1	Proxy: Map unqualified/partially-qualified names to FQDN (#59 ) * Proxy: Map unqualified/partially-qualified names to FQDN Previously we required the service to fully qualify all service names for outbound traffic. Many services are written assuming that Kubernetes will complete names using its DNS search path, and those services weren't working with Conduit. Now add an option, used by default, to fully-qualify the domain names. Currently only Kubernetes-like name completion for services is supported, but the configuration syntax is open-ended to allow for alternatives in the future. Also, the auto-completion can be disabled for applications that prefer to ensure they're always using unambiguous names. Once routing is implemented then it is likely that (default) routing rules will replace these hard-coded rules. Unit tests for the name completion logic are included. Part of the solution for #9. The changes to `conduit inject` to actually use this facility will be in another PR.	2017-12-19 11:59:26 -10:00
Brian Smith	40d50f0f4a	Encapsulate listening port connection acceptance logic (#46 ) Previously every use of `BoundPort` repeated a bunch of logic. Move the repeated logic to `BoundPort` itself. Just remove the no-op handshaking logic; new handshaking logic will be added to `BoundPort` when TLS is added.	2017-12-14 13:19:05 -10:00
Brian Smith	81fb0fea5f	Move default private connect timeout to `Config` (#42 ) Previously the default value of this setting was in lib.rs instead of being automatically set in `Config` like all the other defaults, which was inconsistent and confusing. Fix this by moving the defaulting logic to `Config`. Validated by running the test suite.	2017-12-13 21:15:21 -06:00
Brian Smith	0c2aa0e185	Centralize and clarify TCP port binding (#43 ) Previously the logic related to listening for incoming TCP connections was duplicated in several places. Begin centralizing this logic. Future commits will centralize it further. No validation was done other than running the test suite.	2017-12-13 19:45:15 -06:00
Brian Smith	0185522821	Proxy: Parse environment variables in one place (#26 ) Previously `Process` did its own environment variable parsing and did not benefit from the improved error handling that `config` now has. Additionally, future changes will need access to these same environment variables in other parts of the proxy. Move `Process`'s environment variable parsing to `config` to address both of these issues. Now there are no uses of `env::var` outside of `config` except for logging, which is the final desired state. I validated this manually.	2017-12-13 19:33:37 -06:00
Brian Smith	559f4a76fb	Proxy: Use production config parsing in tests (#25 ) * Proxy: Use production config parsing in tests Previosuly the testing code for the proxy was sensitive to the values of environment variables unintentionally, because `Config` looked at the environment variables. Also, the tests were largely avoiding testing the production configuration parsing code since they were doing their own parsing. Now the tests avoid looking at environment variables other than `ENV_LOG`, which makes them more resilient. Also the tests now parse the settings using the same code as production use uses. I validated this manually.	2017-12-13 19:27:50 -06:00
Eliza Weisman	2fdb859dff	Add timeout to in-flight telemetry reports (#12 ) This PR adds a configurable timeout duration after which in-flight telemetry reports are dropped, cancelling the corresponding RPC request to the control plane. I've also made the `Timeout` implementation used in `TimeoutConnect` generic, and reused it in multiple places, including the timeout for in-flight reports. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2017-12-13 15:07:36 -08:00
Oliver Gould	980f85963d	apply rustffmt on proxy, remove rustfmt.toml for now	2017-12-05 00:44:16 +00:00
Oliver Gould	b104bd0676	Introducing Conduit, the ultralight service mesh We’ve built Conduit from the ground up to be the fastest, lightest, simplest, and most secure service mesh in the world. It features an incredibly fast and safe data plane written in Rust, a simple yet powerful control plane written in Go, and a design that’s focused on performance, security, and usability. Most importantly, Conduit incorporates the many lessons we’ve learned from over 18 months of production service mesh experience with Linkerd. This repository contains a few tightly-related components: - `proxy` -- an HTTP/2 proxy written in Rust; - `controller` -- a control plane written in Go with gRPC; - `web` -- a UI written in React, served by Go.	2017-12-05 00:24:55 +00:00

42 Commits