Commit Graph

54 Commits

Author SHA1 Message Date
Brian Smith 6d3a7c337d Reduce memory allocations during logging. (#445)
Stop initializing env_logger in every test. In env_logger 0.5, it
may only be initialized once per process.

Also, Prost will soon upgrade to env_logger 0.5 and this will
(eventually) help reduce the number of versions of env_logger we
have to build. Turning off the regex feature will (eventually) also
reduce the number of dependencies we have to build. Unfortunately,
as it is now, the number of dependencies has increased because
env_logger increased its dependencies in 0.5.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 18:32:47 -10:00
Brian Smith 320c794f0f Turn off the `use_logging` default feature of quickcheck. (#465)
Turning off the default features of quickcheck removes its
`env_logger` and `log` dependencies. It uses older versions of
those packages than conduit-proxy will use, so this will
(eventually) reduce the number of versions of those packages that
get downloaded and built.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 16:04:58 -10:00
Brian Smith f2c711230b Remove the unused tokio-proto build dependency. (#451)
Hyper depends on tokio-proto with a default feature. By turning off
its default features, we can avoid that dependency. That reduces the
number of dependencies by 4.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 08:56:05 -10:00
Brian Smith 819fa072d6 Stop using the url crate in the proxy. (#450)
Version 1.7.0 of the url crate seems to be broken which means we cannot
`cargo update` the proxy without locking url to version 1.6. Since we only
use it in a very limited way anyway, and since we use http::uri for parsing
much more, just switch all uses of the url crate to use http::uri for parsing
instead.

This eliminates some build dependencies.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 08:55:48 -10:00
Eliza Weisman 6df6e8150f Add timeout to Outbound::bind_service (#436)
Closes #403.

When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, this means that the proxy will wait forever, rather than responding with an error.

I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)).

Testing with @klingerf's reproduction from #403:
```
curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get | head -n1
HTTP/1.1 500 Internal Server Error
```
proxy logs:
```rust
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140)
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240
proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 }))
```
2018-02-26 10:18:35 -08:00
Eliza Weisman 777b2fcfe4 Add flaky_tests feature for skipping some tests on CI (#441)
This PR adds a `flaky_tests` cargo feature to control whether or not to ignore tests that are timing-dependent. This feature is enabled by default in local builds, but disabled on CI and in all Docker builds.

Closes #440
2018-02-26 10:17:53 -08:00
Sean McArthur 427335a4d5 proxy: don't send transfer-encoding for empty GET requests (#410)
This is fixed in hyper v0.11.19.

Closes #402
2018-02-23 16:22:45 -08:00
Sean McArthur 8ea68f9c44 proxy: fix intermittent tap match method quickcheck failure (#427)
Closes #413
2018-02-23 12:13:51 -08:00
Sean McArthur 4e53eaf909 proxy: improve overall quality of logs (#424)
- Removed several useless `trace!("poll")` lines
- Upgraded controller client errors to `error` level
- Made controller client errors human-readable
2018-02-22 15:51:04 -08:00
Sean McArthur 34f5c58c99 proxy: add more tests for loopback IPs in FullyQualifiedAuthority (#411)
Closes #357

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-02-21 18:15:23 -10:00
Eliza Weisman 10915b6da3 Remove timestamps from log messages (#399)
As @olix0r requested in https://github.com/runconduit/conduit/issues/56#issuecomment-356771758, I've removed timestamps from the Conduit proxy's log records.

Closes #56

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-21 15:17:17 -08:00
Kevin Lingerfelt 13275818ee Prepare for the v0.3.0 release (#406)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-21 11:14:11 -08:00
Carl Lerche d2e86904ef Proxy: Limit the max number of in-flight requests. (#398)
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.

This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.

Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.

Fixes #287

Signed-off-by: Carl Lerche <me@carllerche.com>
2018-02-20 19:56:21 -08:00
Sean McArthur 30dc48db51 proxy: use original dst if authority doesnt look like local service (#397)
The proxy will check that the requested authority looks like a local service, and if it doesn't, it will no longer ask the Destination service about the request, instead just using the SO_ORIGINAL_DST, enabling egress naturally.

The rules used to determine if it looks like a local service come from this comment:

> If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.
2018-02-20 18:09:21 -08:00
Eliza Weisman ea06854fca Remove unused metrics (#322)
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.

Closes #266 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 17:14:17 -08:00
Eliza Weisman 713ddf24c8 Remove per-path metrics from telemetry pipeline (#317)
Follow-up from #315.

Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.

I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.

Closes #265 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 14:20:28 -08:00
Eliza Weisman ae0e2d7d86 Store proxy latencies in a structure that matches controller histogram (#11)
The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. 

I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.

I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
2018-02-07 18:02:59 -08:00
Oliver Gould 9c02c4d6e7 Use a load-aware balancer (#251)
Currently, the conduit proxy uses a simplistic Round-Robin load
balancing algorithm. This strategy degrades severely when individual
endpoints exhibit abnormally high latency.

This change improves this situation somewhat by making the load balancer
aware of the number of outstanding requests to each endpoint. When nodes
exhibit high latency, they should tend to have more pending requests
than faster nodes; and the Power-of-Two-Choices node selector can be
used to distribute requests to lesser-loaded instances.

From the finagle guide:

    The algorithm randomly picks two nodes from the set of ready endpoints
    and selects the least loaded of the two. By repeatedly using this
    strategy, we can expect a manageable upper bound on the maximum load of
    any server.

    The maximum load variance between any two servers is bound by
    ln(ln(n))` where `n` is the number of servers in the cluster.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2018-02-07 09:39:31 -08:00
Oliver Gould 0cd1f65e39 Move the Rust gRPC bindings to a dedicated crate (#275)
The proxy depends on `protoc`-generated gRPC bindings to communicate
with the controller. In order to generate these bindings, build-time
dependencies must be compiled.

In order to support a more granular, cacheable build scheme, a new crate
has been created to house these gRPC bindings,
`conduit-proxy-controller-grpc`.

Because `TryFrom` and `TryInto` conversions are implemented for
protobuf-defined types, the `convert` module also had to be moved to
into a dedicated crate.

Furthermore, because the proxy's tests require that
`quickcheck::Aribtrary` be implemented for protobuf types, the
`conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature
fla protobuf types, the `conduit-proxy-controller-grpc` crate supports
an _arbitrary_ feature flag.

While we're moving these libraries around, the `tower-router` crate has
been moved to `proxy/router` and renamed to `conduit-proxy-router.`
`futures-mpsc-lossy` has been moved into the proxy directory but has not
been renamed.

Finally, the `proxy/Dockerfile-deps` image has been updated to avoid the
wasteful building of dependency artifacts, as they are not actually used
by `proxy/Dockerfile`.
2018-02-06 10:31:48 -08:00
Oliver Gould ad1aedf7b6 Add a newline to dco.yml (#254) 2018-02-01 15:16:02 -08:00
Oliver Gould 817f914798 Do not require DCO signoff for project members (#252)
We only need the DCO bot to validate external submissions.
2018-02-01 13:45:32 -08:00
Eliza Weisman b56cc883c1 Adopt external tower-grpc and tower-h2 deps #225)
The conduit repo includes several library projects that have since been
moved into external repos, including `tower-grpc` and `tower-h2`.

This change removes these vendored libraries in favor of using the new
external crates.
2018-02-01 11:57:02 -08:00
Dennis Adjei-Baah 53299f6c78 Prepare for v0.2.0 release (#248)
* prepare for v0.2.0 release

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-01-31 15:39:48 -08:00
Sean McArthur 719000082f proxy: fix tcp_with_no_orig_dst test (#229)
Sometimes, the try_read will return a connection error, sometimes it
will just return EOF. Handle both cases.

Closes #226
2018-01-29 15:15:06 -08:00
Sean McArthur 8b7baf62c3 proxy: fix h1 streams to trigger response end events
Response End events were only triggered after polling the trailers of
a response, but when the Response is given to a hyper h1 server, it
doesn't know about trailers, so they were never polled!

The fix is that the `BodyStream` glue will now poll the wrapped body for
trailers after it sees the end of the data, before telling hyper the
stream is over. This ensures a ResponseEnd event is emitted.

Includes a proxy telemetry test over h1 connections.
2018-01-25 16:36:16 -08:00
Andrew Seigner 0008002236 Move EosCtx to common for Tap and Telemetery (#204)
* Make Eos optional in TapEvent

grpc_status not being set in protobuf is the same as being set to zero,
which is also status OK

Modify TapEvent to include an optional EOS struct

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Part of #198

* Add Eos to proto & proxy tap end-of-stream events

The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer.

This PR updates the proxy. The proto and controller changes are in PR #204.
Part of #198. Closes #202

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-24 15:48:00 -08:00
Sean McArthur 1e9ff8be03 proxy: add transparent protocol detection and handling
The proxy will now try to detect what protocol new connections are
using, and route them accordingly. Specifically:

- HTTP/2 stays the same.
- HTTP/1 is now accepted, and will try to send an HTTP/1 request
  to the target.
- If neither HTTP/1 nor 2, assume a TCP stream and simply forward
  between the source and destination.

* tower-h2: fix Server Clone bounds
* proxy: implement Async{Read,Write} extra methods for Connection

Closes #130 
Closes #131
2018-01-23 16:14:07 -08:00
Andrew Seigner cb6c2eab16 Updates for v0.1.3 release (#185)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-19 13:58:52 -08:00
Andrew Seigner d22ce60c0c Updates for v0.1.2 release (#171)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-19 10:56:20 -08:00
Eliza Weisman 3bbeac09d2 Use cargo:rerun-if-changed to avoid recompiling protos (#160)
As @seanmonstar noticed, the build script will currently re-compile all the protobufs regardless of whether or not they have changed, making the build much slower. 

This PR modifies it to emit `cargo:rerun-if-changed=` for all the protobuf files, so they will only be regenerated if one of them changes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-17 09:23:27 -08:00
Andrew Seigner 1a386d1f1f Introduce BUILD.md (#137)
Our build instructions were scattered across a few README's.

This consolidates all instructions relevant to Conduit development into
a single BUILD.md.

Fixes #134

Signed-off-by: Andrew Seigner <andrew@sig.gy>
2018-01-16 23:19:53 -08:00
Eliza Weisman d6cd34fc98 Add Protocol field to Transports telemetry (#138)
See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP).

Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes.

Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP.

Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy...

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-11 16:00:38 -08:00
clemensw 84ab38414e [proxy] Fix rendering for top-level rustdoc (#113)
Signed-off-by: clemensw <clemensw@users.noreply.github.com>
2018-01-08 15:40:12 -08:00
Andrew Seigner cb73e42fab Fix Go and Proxy dependency image SHAs (#117)
The image tags for gcr.io/runconduit/go-deps and
gcr.io/runconduit/proxy-deps were not updating to account for all
changes in those images.

Modify SHA generation to include all files that affect the base
dependency images. Also add instructions to README.md for updating
hard-coded SHAs in Dockerfile's.

Fixes #115

Signed-off-by: Andrew Seigner <andrew@sig.gy>
2018-01-08 11:19:49 -08:00
Eliza Weisman 77dba6f013 Change Cargo.lock to trigger deps image rebuild (#116)
Because whether or not to build a new deps image is based on the SHA of Cargo.lock, changes to the deps Dockerfile will not cause a new deps image to be built. Because of this, the current proxy deps Docker image is based on the wrong Rust version, breaking the build. See #115 for details on this issue.

I've appended a newline to Cargo.lock to change the lockfile's SHA and trigger a rebuild of the deps Docker image on CI. I've also added a comment in the Dockerfile noting that it is necessary to do this when changing that file.

Signed-off-by: Eliza Weisman eliza@buoyant.io
2018-01-08 10:29:51 -08:00
Eliza Weisman 27ababf5bd Remove `AsciiExt` import (#104)
Since the methods on this trait were moved to direct implementations on the
implementing types, this produces an unused import warning with the latest
(1.23) Rust standard library. As we set `deny(warnings)`, this breaks the build.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-04 10:49:13 -08:00
Brian Smith f1a4e98053 Remove default controller URL from proxy. (#48)
Previously there was a default controller URL in the proxy. This
default was never used for any proxy injected by `conduit inject` and
it was the wrong default when using the proxy outside of Kubernetes.
Also more generally this is such an important setting in terms of
correctness and security that it was dangerous to let it be implied in
any context.

Remove the default, requiring that it be set in order for the proxy to
start.
2018-01-02 08:44:27 -10:00
Sky Ao 20a6e18922 correct typo: Enviroment -> Environment (#100)
Signed-off-by: Sky Ao <aoxiaojian@gmail.com>
2017-12-29 10:14:48 -08:00
Kevin Lingerfelt 138d6e8a32 Add contributing doc and DCO file (#88)
* Add contributing doc and DCO file

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Fix small typos

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2017-12-22 14:54:27 -08:00
Sean McArthur c40e407ad0 disable push promises in proxy (#70) 2017-12-21 14:41:17 -08:00
Kevin Lingerfelt 957d02b9cb Prepare the repo for the v0.1.1 release (#75)
* Prepare the repo for the v0.1.1 release

* Add changelog

* Changelog updates, wrap at 100 characters
2017-12-20 10:51:53 -08:00
Brian Smith 1164759540 Proxy: Map unqualified/partially-qualified names to FQDN (#59)
* Proxy: Map unqualified/partially-qualified names to FQDN

Previously we required the service to fully qualify all service names
for outbound traffic. Many services are written assuming that
Kubernetes will complete names using its DNS search path, and those
services weren't working with Conduit.

Now add an option, used by default, to fully-qualify the domain names.
Currently only Kubernetes-like name completion for services is
supported, but the configuration syntax is open-ended to allow for
alternatives in the future. Also, the auto-completion can be disabled
for applications that prefer to ensure they're always using unambiguous
names. Once routing is implemented then it is likely that (default)
routing rules will replace these hard-coded rules.

Unit tests for the name completion logic are included.

Part of the solution for #9. The changes to `conduit inject` to
actually use this facility will be in another PR.
2017-12-19 11:59:26 -10:00
Brian Smith fdf9f1a81c Use `connection::Connection` for outbound connections (#51)
Previously `connection::Connection` was only being used for inbound
connections, not outbound connections. This led to some duplicate
logic and also made it difficult to adapt that code to enable TLS.

Now outbound connections use `connection::Connection` too. This will
allow the upcoming TLS logic to guarantee that `TCP_NODELAY` is
enabled at the right time, and the TLS logic also control access to
the underlying plaintext socket for security reasons.
2017-12-15 12:44:25 -10:00
Brian Smith 1af68d3a14 Encapsulate listening port connection acceptance logic (#46)
Previously every use of `BoundPort` repeated a bunch of logic.

Move the repeated logic to `BoundPort` itself. Just remove the no-op
handshaking logic; new handshaking logic will be added to `BoundPort`
when TLS is added.
2017-12-14 13:19:05 -10:00
Brian Smith 95cb05d3a9 Move default private connect timeout to `Config` (#42)
Previously the default value of this setting was in lib.rs instead of
being automatically set in `Config` like all the other defaults, which
was inconsistent and confusing.

Fix this by moving the defaulting logic to `Config`.

Validated by running the test suite.
2017-12-13 21:15:21 -06:00
Brian Smith 284fbcfb20 Centralize and clarify TCP port binding (#43)
Previously the logic related to listening for incoming TCP connections
was duplicated in several places.

Begin centralizing this logic. Future commits will centralize it
further.

No validation was done other than running the test suite.
2017-12-13 19:45:15 -06:00
Brian Smith 86fb3c7e4a Proxy: Parse environment variables in one place (#26)
Previously `Process` did its own environment variable parsing and did
not benefit from the improved error handling that `config` now has.
Additionally, future changes will need access to these same environment
variables in other parts of the proxy.

Move `Process`'s environment variable parsing to `config` to address
both of these issues. Now there are no uses of `env::var` outside of
`config` except for logging, which is the final desired state.

I validated this manually.
2017-12-13 19:33:37 -06:00
Brian Smith 4ccff3f333 Proxy: Use production config parsing in tests (#25)
* Proxy: Use production config parsing in tests

Previosuly the testing code for the proxy was sensitive to the values
of environment variables unintentionally, because `Config` looked at
the environment variables. Also, the tests were largely avoiding
testing the production configuration parsing code since they were
doing their own parsing.

Now the tests avoid looking at environment variables other than
`ENV_LOG`, which makes them more resilient. Also the tests now parse
the settings using the same code as production use uses.

I validated this manually.
2017-12-13 19:27:50 -06:00
Brian Smith ad515bb537 Proxy: Parse all environment variables before aborting (#24)
Previously, as soon as we would encounter one environment variable with
an invalid value we would exit. This is frustrating behavior when
deploying to Kubernetes and there are multiple problems because the
edit-compile-test cycle is so slow.

Fix this by parsing all the environment variables and logging error
messages before exiting.

I validated this manually.
2017-12-13 18:56:14 -06:00
Eliza Weisman 97be2dd8cd Add timeout to in-flight telemetry reports (#12)
This PR adds a configurable timeout duration after which in-flight telemetry reports are dropped, cancelling the corresponding RPC request to the control plane.

I've also made the `Timeout` implementation used in `TimeoutConnect` generic, and reused it in multiple places, including the timeout for in-flight reports.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2017-12-13 15:07:36 -08:00