This PR ensures that the mapping of requests to outbound connections is segregated by `Host:` header value. In most cases, the desired behavior is provided by Hyper's connection pooling. However, Hyper does not give us the desired behavior for requests that have no `Host:` header and no authority in the request URI, and were therefore routed based on SO_ORIGINAL_DST: we would like each such request to have its own outbound connection, but Hyper will reuse the same connection for them.
Therefore, I have modified `conduit_proxy_router::Recognize` so that implementations of `Recognize` can indicate whether the service for a given key may be cached, and the router only caches a service when it is marked as cacheable. I've also changed the `reconstruct_uri` function, which rewrites HTTP/1 requests, to mark requests that had no authority and no `Host:` header and whose authority was rewritten to the request's ORIGINAL_DST. When this is the case, the `Recognize` implementations for `Inbound` and `Outbound` mark these requests as non-cacheable.
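A rough sketch of the shape of this change (the names below are illustrative, not the exact `conduit_proxy_router` API):

```rust
// Illustrative sketch only; names do not match the real conduit_proxy_router API.
struct Request {
    host: Option<String>, // `Host:` header / URI authority, if any
}

#[derive(Clone, Copy, PartialEq)]
enum Cacheability {
    Cacheable,
    // e.g. an HTTP/1 request with no authority and no `Host:` header that was
    // routed purely by SO_ORIGINAL_DST.
    NotCacheable,
}

trait Recognize {
    type Key: std::hash::Hash + Eq + Clone;
    type Service;

    /// Return the routing key for a request, plus whether the service bound
    /// for that key may be cached and reused by later requests.
    fn recognize(&self, req: &Request) -> Option<(Self::Key, Cacheability)>;

    fn bind_service(&mut self, key: &Self::Key) -> Self::Service;
}
```

The router then only inserts a bound service into its cache when the key was marked cacheable; non-cacheable requests get a freshly bound service (and therefore a fresh connection) every time.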
I've also added unit tests ensuring that (a) connections are created per `Host:` header, and (b) requests with no `Host:` header each create a new connection. The first test passes without any additional changes, but the second passes only on this branch. The tests were originally added in PR #489, but this branch supersedes that one.
Fixes#415. Closes#489.
The main goal with this change is to make it clear, from looking at
Cargo.lock, that the Conduit proxy doesn't depend on OpenSSL. Although
the *ring* crate attempts to avoid conflicts with symbols defined in
OpenSSL, that is a manual process that doesn't have automatic
verification yet.
The secondary goal is to reduce the total number of dependencies to
make (at least) full from-scratch builds, such as those in CI, faster.
As a result of this PR, and of the following upstream PRs we
submitted to prost, as well as some similar PRs in other upstream
projects and in conduit itself, our usage of prost now results in us
depending on many fewer crates:
* https://github.com/danburkert/prost/pull/78
* https://github.com/danburkert/prost/pull/79
* https://github.com/danburkert/prost/pull/82
* https://github.com/danburkert/prost/pull/84
Here are the crate dependencies that are removed:
* adler32
* aho-corasick
* build_const
* bzip2
* bzip2-sys
* crc
* curl
* curl-sys
* env_logger (0.4)
* flate2
* lazy_static
* libz-sys
* memchr
* miniz_oxide
* miniz_oxide_c_api
* msdos_time
* openssl-probe
* openssl-sys
* pkg-config
* podio
* regex
* regex-syntax
* schannel
* socket2
* thread_local
* unreachable
* utf8-ranges
* vcpkg
* zip
Pretty much all of these are build dependencies, but Cargo.lock doesn't
distinguish between build dependencies and regular dependencies.
Signed-off-by: Brian Smith <brian@briansmith.org>
In keeping with the goal of being a transparent proxy, we want to proxy
requests and responses with as little modification as possible. Basically,
servers and clients should see messages that look the same whether the proxy
was injected or not.
With that goal in mind, we want to make sure that body headers (things
like `Content-Length`, `Transfer-Encoding`, etc) are left alone. Prior
to this commit, we at times were changing behavior. Sometimes
`Transfer-Encoding` was added to requests, or `Content-Length: 0` may
have been removed. While RFC 7230 defines these differences as
semantically equivalent, implementations may not handle them correctly.
Now, we've added some fixes to prevent any of these header changes
from occurring, along with tests to make sure library updates don't
regress.
For requests:
- With no message body, `Transfer-Encoding: chunked` should no longer be
added.
- With `Content-Length: 0`, the header is forwarded untouched.
For responses:
- Tests were added that responses not allowed to have bodies (to HEAD
requests, 204, 304) did not have `Transfer-Encoding` added.
- Tests that `Content-Length: 0` is preserved.
- Tests that HTTP/1.0 responses with no body headers do not have
`Transfer-Encoding` added.
- Tests that `HEAD` responses forward `Content-Length` headers (but not
an actual body).
Closes#447
Signed-off-by: Sean McArthur <sean@seanmonstar.com>
Currently, the `Reconnect` middleware does not reconnect on connection errors (see #491) and treats them as request errors. This means that when a connection timeout is wrapped in a `Reconnect`, timeout errors are treated as request errors, and the request returns HTTP 500. Since this is not the desired behavior, the connection timeouts should be removed, at least until their errors can be handled differently.
This PR removes the connect timeouts from `Bind`, as described in https://github.com/runconduit/conduit/pull/483#issuecomment-369380003.
It removes the `CONDUIT_PROXY_PUBLIC_CONNECT_TIMEOUT_MS` environment variable, but _not_ the `CONDUIT_PROXY_PRIVATE_CONNECT_TIMEOUT_MS` variable, since the latter is also used for the TCP connect timeouts. If we also want to remove the TCP connection timeouts, I can do that as well.
Closes#483. Fixes#491.
This PR changes the proxy to log error messages using `fmt::Display` whenever possible, which should lead to much more readable and meaningful error messages.
This is part of the work I started last week on issue #442. While I haven't finished everything for that issue (all errors still are mapped to HTTP 500 error codes), I wanted to go ahead and open a PR for the more readable error messages. This is partially because I found myself merging these changes into other branches to aid in debugging, and because I figured we may as well have the nicer logging on master.
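For illustration only (this is not code from the PR), the difference amounts to formatting errors with `Display` rather than `Debug`:

```rust
use std::fmt;

// A hypothetical error type, purely for illustration.
#[derive(Debug)]
struct ConnectError {
    addr: &'static str,
}

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "failed to connect to {}", self.addr)
    }
}

fn main() {
    let err = ConnectError { addr: "10.0.0.240:8086" };
    // Debug exposes the type's structure: ConnectError { addr: "10.0.0.240:8086" }
    println!("ERR! {:?}", err);
    // Display gives the human-readable message this PR prefers in logs.
    println!("ERR! {}", err);
}
```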
We previously `join`ed on piping data from both sides, meaning
that the future didn't complete until **both** sides had disconnected.
Even if the client disconnected, it was possible the server never knew,
and we "leaked" this future.
To fix this, the `join` is replaced with a `Duplex` future, which pipes
from both ends into the other, while also detecting when one side shuts
down. When a side does shut down, a write shutdown is forwarded to the
other side, to allow draining to occur for deployments that half-close
sockets.
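The idea, sketched here with present-day tokio I/O utilities rather than the proxy's actual `Duplex` future:

```rust
use tokio::io::{AsyncRead, AsyncWrite, AsyncWriteExt};

// Sketch only: pipe both directions and forward the half-close, so a peer
// that shuts down its write side still lets the other side finish draining.
async fn duplex<A, B>(a: A, b: B) -> std::io::Result<()>
where
    A: AsyncRead + AsyncWrite + Unpin,
    B: AsyncRead + AsyncWrite + Unpin,
{
    let (mut a_read, mut a_write) = tokio::io::split(a);
    let (mut b_read, mut b_write) = tokio::io::split(b);

    let a_to_b = async {
        tokio::io::copy(&mut a_read, &mut b_write).await?;
        // `a` finished sending: propagate the write shutdown to `b`.
        b_write.shutdown().await
    };
    let b_to_a = async {
        tokio::io::copy(&mut b_read, &mut a_write).await?;
        a_write.shutdown().await
    };

    // Unlike a plain join of two copy futures, each direction shuts down the
    // other side's writer as soon as it sees EOF, so neither end is left hanging.
    tokio::try_join!(a_to_b, b_to_a)?;
    Ok(())
}
```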
Closes#434
As requested by @briansmith in https://github.com/runconduit/conduit/issues/415#issuecomment-369026560 and https://github.com/runconduit/conduit/issues/415#issuecomment-369032059, I've refactored `FullyQualifiedAuthority::normalize` to _always_ return a `FullyQualifiedAuthority`, along with a boolean value indicating whether or not the Destination service should be used for that authority.
This is in contrast to returning an `Option<FullyQualifiedAuthority>` where `None` indicated that the Destination service should not be used, which is what this function did previously.
This is required for further progress on #415.
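In other words, the signature changes roughly like this (simplified sketch, not the proxy's exact code):

```rust
struct FullyQualifiedAuthority(String);

// Before: `None` doubled as "don't use the Destination service for this name".
fn normalize_before(authority: &str) -> Option<FullyQualifiedAuthority> {
    // ... normalization elided ...
    unimplemented!()
}

// After: always return the normalized authority, plus an explicit flag saying
// whether the Destination service should be consulted for it.
fn normalize_after(authority: &str) -> (FullyQualifiedAuthority, bool) {
    // ... normalization elided ...
    unimplemented!()
}
```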
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
```
cargo update -p tempdir
cargo update -p abstract-ns
```
The new version of tempdir actually adds a new dependency, but
apparently that is to fix a bug.
Signed-off-by: Brian Smith <brian@briansmith.org>
Currently we have to download and build two different versions of
the ordermap crate.
I will submit similar PRs for the dependent crates so that we will
eventually all be using the same version of indexmap.
Signed-off-by: Brian Smith <brian@briansmith.org>
This was caused by a new call to `env_logger::init()` being added after
the PR that rewrote all existing calls to `env_logger::try_init()` had
landed.
Fixes#469
Signed-off-by: Brian Smith <brian@briansmith.org>
Stop initializing env_logger in every test. In env_logger 0.5, it
may only be initialized once per process.
Also, Prost will soon upgrade to env_logger 0.5 and this will
(eventually) help reduce the number of versions of env_logger we
have to build. Turning off the regex feature will (eventually) also
reduce the number of dependencies we have to build. Unfortunately,
as it is now, the number of dependencies has increased because
env_logger increased its dependencies in 0.5.
Signed-off-by: Brian Smith <brian@briansmith.org>
Turning off the default features of quickcheck removes its
`env_logger` and `log` dependencies. It uses older versions of
those packages than conduit-proxy will use, so this will
(eventually) reduce the number of versions of those packages that
get downloaded and built.
Signed-off-by: Brian Smith <brian@briansmith.org>
Hyper depends on tokio-proto via one of its default features. By turning
off Hyper's default features, we can avoid that dependency. That reduces
the number of dependencies by 4.
Signed-off-by: Brian Smith <brian@briansmith.org>
Version 1.7.0 of the url crate seems to be broken, which means we cannot
`cargo update` the proxy without locking url to version 1.6. Since we only
use it in a very limited way anyway, and since we use http::uri for parsing
much more, just switch all uses of the url crate to use http::uri for parsing
instead.
This eliminates some build dependencies.
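For illustration, the substitution is along these lines (the URI here is made up):

```rust
fn main() {
    // Previously this would have gone through url::Url::parse.
    let uri: http::Uri = "http://web.emojivoto.svc.cluster.local:8080/api"
        .parse()
        .expect("valid URI");

    assert_eq!(uri.host(), Some("web.emojivoto.svc.cluster.local"));
    assert_eq!(uri.path(), "/api");
}
```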
Signed-off-by: Brian Smith <brian@briansmith.org>
Closes#403.
When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, this means that the proxy will wait forever, rather than responding with an error.
I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)).
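For reference, a sketch of reading the new setting (the parse format here is an assumption made for the sake of the example; the proxy's actual config parsing may differ):

```rust
use std::time::Duration;

// Illustrative only: read CONDUIT_PROXY_BIND_TIMEOUT, defaulting to 10 seconds.
// Parsing the value as whole seconds is an assumption made for this sketch.
fn bind_timeout() -> Duration {
    std::env::var("CONDUIT_PROXY_BIND_TIMEOUT")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or_else(|| Duration::from_secs(10))
}

fn main() {
    // This duration is what the timeout wrapping the service returned from
    // `bind_service` is constructed with.
    println!("bind timeout: {:?}", bind_timeout());
}
```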
Testing with @klingerf's reproduction from #403:
```
curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get | head -n1
HTTP/1.1 500 Internal Server Error
```
proxy logs:
```
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140)
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240
proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 }))
```
This PR adds a `flaky_tests` cargo feature to control whether or not to ignore tests that are timing-dependent. This feature is enabled by default in local builds, but disabled on CI and in all Docker builds.
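Illustratively, a timing-dependent test can be gated like this (the exact attribute and wiring in the repo may differ):

```rust
// With `flaky_tests` among the crate's default features, a plain local
// `cargo test` still runs this test, while builds that disable default
// features (CI, Docker) skip it.
#[test]
#[cfg_attr(not(feature = "flaky_tests"), ignore)]
fn response_latency_is_reported_within_window() {
    // ... timing-sensitive assertions elided ...
}
```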
Closes#440
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.
This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.
Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.
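To illustrate the idea only (this standalone sketch is not tower's `InFlightLimit`, which is what the proxy actually layers over `Buffer`):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// A minimal in-flight limit: refuse new requests once `max` are outstanding.
struct InFlightLimit {
    current: Arc<AtomicUsize>,
    max: usize,
}

struct InFlightGuard {
    current: Arc<AtomicUsize>,
}

impl InFlightLimit {
    fn new(max: usize) -> Self {
        InFlightLimit { current: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Try to start a request; the returned guard frees the slot when dropped.
    fn try_start(&self) -> Option<InFlightGuard> {
        if self.current.fetch_add(1, Ordering::SeqCst) >= self.max {
            self.current.fetch_sub(1, Ordering::SeqCst);
            return None;
        }
        Some(InFlightGuard { current: self.current.clone() })
    }
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.current.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let limit = InFlightLimit::new(2);
    let g1 = limit.try_start();
    let _g2 = limit.try_start();
    assert!(limit.try_start().is_none()); // third request is refused
    drop(g1);
    assert!(limit.try_start().is_some()); // a slot is free again
}
```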
Fixes#287
Signed-off-by: Carl Lerche <me@carllerche.com>
The proxy will check that the requested authority looks like a local service, and if it doesn't, it will no longer ask the Destination service about the request, instead just using the SO_ORIGINAL_DST, enabling egress naturally.
The rules used to determine if it looks like a local service come from this comment:
> If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.
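A literal reading of that rule, as a sketch rather than the proxy's actual implementation:

```rust
// Sketch of the rule quoted above; illustrative only.
fn looks_like_local_service(name: &str, default_zone: Option<&str>) -> bool {
    let labels: Vec<&str> = name.trim_end_matches('.').split('.').collect();
    match default_zone {
        // No default zone: expect exactly "$a.$b.svc".
        None => labels.len() == 3 && labels[2] == "svc",
        // With a default zone: expect "$a.$b.svc.$default_zone".
        Some(zone) => {
            let zone_labels: Vec<&str> = zone.trim_end_matches('.').split('.').collect();
            labels.len() == 3 + zone_labels.len()
                && labels[2] == "svc"
                && labels[3..] == zone_labels[..]
        }
    }
}

fn main() {
    assert!(looks_like_local_service("web.default.svc", None));
    assert!(looks_like_local_service("web.default.svc.cluster.local", Some("cluster.local")));
    assert!(!looks_like_local_service("httpbin.org", Some("cluster.local")));
}
```

Requests whose authority fails this check skip the Destination lookup and are proxied to their SO_ORIGINAL_DST.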
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.
Closes#266
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Follow-up from #315.
Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.
I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.
Closes#265
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports.
I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.
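A minimal sketch of the bucket-counting scheme (the bounds below are placeholders, not the proxy's actual bucket list):

```rust
// Illustrative fixed-bucket latency histogram; the real proxy's bounds match
// the Prometheus buckets, and a report carries the bounds alongside the counts.
struct Histogram {
    bounds: &'static [u64], // upper bounds, in milliseconds
    counts: Vec<u64>,       // one count per bound, plus an overflow bucket
}

impl Histogram {
    fn new(bounds: &'static [u64]) -> Self {
        Histogram { bounds, counts: vec![0; bounds.len() + 1] }
    }

    /// Record one observed latency by bumping the count of its bucket.
    fn observe(&mut self, latency_ms: u64) {
        let idx = self
            .bounds
            .iter()
            .position(|&upper| latency_ms <= upper)
            .unwrap_or(self.bounds.len()); // overflow bucket
        self.counts[idx] += 1;
    }
}

fn main() {
    let mut h = Histogram::new(&[10, 50, 100, 500, 1000]); // placeholder bounds
    h.observe(7);
    h.observe(42);
    h.observe(9_999);
    assert_eq!(h.counts, vec![1, 1, 0, 0, 0, 1]);
}
```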
I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
Currently, the conduit proxy uses a simplistic Round-Robin load
balancing algorithm. This strategy degrades severely when individual
endpoints exhibit abnormally high latency.
This change improves this situation somewhat by making the load balancer
aware of the number of outstanding requests to each endpoint. When nodes
exhibit high latency, they should tend to have more pending requests
than faster nodes; and the Power-of-Two-Choices node selector can be
used to distribute requests to lesser-loaded instances.
From the finagle guide:
The algorithm randomly picks two nodes from the set of ready endpoints
and selects the least loaded of the two. By repeatedly using this
strategy, we can expect a manageable upper bound on the maximum load of
any server.
The maximum load variance between any two servers is bound by
`ln(ln(n))`, where `n` is the number of servers in the cluster.
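The selection step itself is small; a sketch, with load measured as outstanding requests per the change above:

```rust
// Power-of-two-choices pick: of two endpoints sampled at random from the
// ready set, take the one with fewer outstanding requests. (Sampling is
// elided here; the endpoints are fixed for illustration.)
struct Endpoint {
    addr: &'static str,
    in_flight: usize, // outstanding requests to this endpoint
}

fn p2c<'a>(a: &'a Endpoint, b: &'a Endpoint) -> &'a Endpoint {
    if a.in_flight <= b.in_flight { a } else { b }
}

fn main() {
    let a = Endpoint { addr: "10.1.0.7:8080", in_flight: 12 };
    let b = Endpoint { addr: "10.1.0.9:8080", in_flight: 3 };
    assert_eq!(p2c(&a, &b).addr, "10.1.0.9:8080");
}
```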
Signed-off-by: Oliver Gould <ver@buoyant.io>
The proxy depends on `protoc`-generated gRPC bindings to communicate
with the controller. In order to generate these bindings, build-time
dependencies must be compiled.
In order to support a more granular, cacheable build scheme, a new crate
has been created to house these gRPC bindings,
`conduit-proxy-controller-grpc`.
Because `TryFrom` and `TryInto` conversions are implemented for
protobuf-defined types, the `convert` module also had to be moved
into a dedicated crate.
Furthermore, because the proxy's tests require that
`quickcheck::Arbitrary` be implemented for protobuf types, the
`conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature
flag.
While we're moving these libraries around, the `tower-router` crate has
been moved to `proxy/router` and renamed to `conduit-proxy-router`.
`futures-mpsc-lossy` has been moved into the proxy directory but has not
been renamed.
Finally, the `proxy/Dockerfile-deps` image has been updated to avoid the
wasteful building of dependency artifacts, as they are not actually used
by `proxy/Dockerfile`.
The conduit repo includes several library projects that have since been
moved into external repos, including `tower-grpc` and `tower-h2`.
This change removes these vendored libraries in favor of using the new
external crates.
Response End events were only triggered after polling the trailers of
a response, but when the Response is given to a hyper h1 server, it
doesn't know about trailers, so they were never polled!
The fix is that the `BodyStream` glue will now poll the wrapped body for
trailers after it sees the end of the data, before telling hyper the
stream is over. This ensures a ResponseEnd event is emitted.
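A conceptual sketch of the state change (the types and names here are made up; the real glue is an async body adapter in the proxy):

```rust
// Instead of ending the stream as soon as the data runs out, the glue now
// passes through a Trailers state so the instrumented body's trailers are
// polled (emitting ResponseEnd) before hyper is told the stream is over.
enum State {
    Data,
    Trailers,
    Done,
}

trait InstrumentedBody {
    /// Next data chunk, or `None` when the data is finished.
    fn poll_data(&mut self) -> Option<Vec<u8>>;
    /// Polling trailers is what lets telemetry emit a ResponseEnd event,
    /// even though an HTTP/1 response has no trailers to forward.
    fn poll_trailers(&mut self);
}

struct BodyStream<B> {
    body: B,
    state: State,
}

impl<B: InstrumentedBody> BodyStream<B> {
    fn poll_next(&mut self) -> Option<Vec<u8>> {
        loop {
            match self.state {
                State::Data => match self.body.poll_data() {
                    Some(chunk) => return Some(chunk),
                    None => self.state = State::Trailers,
                },
                State::Trailers => {
                    // Previously the stream ended without this step.
                    self.body.poll_trailers();
                    self.state = State::Done;
                }
                State::Done => return None,
            }
        }
    }
}
```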
Includes a proxy telemetry test over h1 connections.
* Make Eos optional in TapEvent
In protobuf, `grpc_status` not being set is the same as being set to zero,
which is also status OK.
Modify `TapEvent` to include an optional EOS struct.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Part of #198
* Add Eos to proto & proxy tap end-of-stream events
The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer.
This PR updates the proxy. The proto and controller changes are in PR #204.
Part of #198. Closes#202
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The proxy will now try to detect what protocol new connections are
using, and route them accordingly. Specifically:
- HTTP/2 stays the same.
- HTTP/1 is now accepted, and will try to send an HTTP/1 request
to the target.
- If neither HTTP/1 nor 2, assume a TCP stream and simply forward
between the source and destination.
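For illustration, detection along these lines can be done by peeking at the first bytes of a new connection (this sketch is not the proxy's actual code; the heuristics are simplified):

```rust
/// Every HTTP/2 connection begins with this fixed 24-byte preface.
const H2_PREFACE: &[u8] = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n";

enum Protocol {
    Http2,
    Http1,
    Tcp,
}

fn detect(peeked: &[u8]) -> Protocol {
    if peeked.starts_with(H2_PREFACE) {
        return Protocol::Http2;
    }
    // An HTTP/1 request starts with an ASCII method token followed by a space,
    // e.g. "GET " or "POST ". (A real implementation would buffer more bytes
    // before deciding and check against known methods.)
    let looks_like_h1 = peeked
        .iter()
        .position(|&b| b == b' ')
        .map(|sp| sp > 0 && peeked[..sp].iter().all(|b| b.is_ascii_uppercase()))
        .unwrap_or(false);
    if looks_like_h1 {
        return Protocol::Http1;
    }
    // Anything else is treated as an opaque TCP stream and forwarded as-is.
    Protocol::Tcp
}

fn main() {
    assert!(matches!(detect(H2_PREFACE), Protocol::Http2));
    assert!(matches!(detect(b"GET / HTTP/1.1\r\n"), Protocol::Http1));
    assert!(matches!(detect(&[0x16, 0x03, 0x01]), Protocol::Tcp)); // e.g. a TLS ClientHello
}
```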
* tower-h2: fix Server Clone bounds
* proxy: implement Async{Read,Write} extra methods for Connection
Closes#130. Closes#131.
As @seanmonstar noticed, the build script will currently re-compile all the protobufs regardless of whether or not they have changed, making the build much slower.
This PR modifies it to emit `cargo:rerun-if-changed=` for all the protobuf files, so they will only be regenerated if one of them changes.
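For reference, emitting the directives looks roughly like this (the paths are illustrative; the real build script lists the repo's actual `.proto` files):

```rust
// build.rs (simplified sketch)
fn main() {
    let protos = ["proto/common/common.proto", "proto/proxy/tap.proto"]; // illustrative paths
    for proto in &protos {
        // Tell Cargo to rerun this script only when one of these files changes.
        println!("cargo:rerun-if-changed={}", proto);
    }
    // ... protobuf/gRPC code generation happens here ...
}
```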
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Our build instructions were scattered across a few READMEs.
This consolidates all instructions relevant to Conduit development into
a single BUILD.md.
Fixes#134
Signed-off-by: Andrew Seigner <andrew@sig.gy>
See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP).
Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes.
Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP.
Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy...
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
The image tags for gcr.io/runconduit/go-deps and
gcr.io/runconduit/proxy-deps were not updating to account for all
changes in those images.
Modify SHA generation to include all files that affect the base
dependency images. Also add instructions to README.md for updating
hard-coded SHAs in Dockerfiles.
Fixes#115
Signed-off-by: Andrew Seigner <andrew@sig.gy>
Because whether or not to build a new deps image is based on the SHA of Cargo.lock, changes to the deps Dockerfile will not cause a new deps image to be built. Because of this, the current proxy deps Docker image is based on the wrong Rust version, breaking the build. See #115 for details on this issue.
I've appended a newline to Cargo.lock to change the lockfile's SHA and trigger a rebuild of the deps Docker image on CI. I've also added a comment in the Dockerfile noting that it is necessary to do this when changing that file.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Since the methods on this trait were moved to direct implementations on the
implementing types, this produces an unused import warning with the latest
(1.23) Rust standard library. As we set `deny(warnings)`, this breaks the build.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Previously there was a default controller URL in the proxy. This
default was never used for any proxy injected by `conduit inject` and
it was the wrong default when using the proxy outside of Kubernetes.
Also, more generally, this is such an important setting in terms of
correctness and security that it was dangerous to let it be implied in
any context.
Remove the default, requiring that it be set in order for the proxy to
start.