Commit Graph

161 Commits

Author SHA1 Message Date
Brian Smith 7319cf648f
Proxy: Do L7 load balancing for all external HTTP services. (#726)
Previously when the proxy could tell, by parsing, the request-target
is not in the cluster, it would not override the destination. That is,
load balancing would be disabled for such destinations.

With this change, the proxy will do L7 load balancing for all HTTP
services as long as the request-target has a DNS name.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-10 08:07:16 -10:00
Brian Smith bc16034fd6
Proxy: Fall back to using DNS when Destination service can't find service. (#692)
Fixes #155.
2018-04-07 18:26:06 -10:00
Brian Smith c25e9c371b
Refactor poll_destination() in service discovery. (#725)
No change in behavior is intended here.

Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-07 18:15:19 -10:00
Brian Smith 7d3b715c4d
Proxy: Move DNS name normalization to service discovery (#722)
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 15:04:09 -10:00
Eliza Weisman 8bc05472ed
Make `control::Cache` key-value in order to store discovery metadata (#688)
This PR changes the proxy's `control::Cache` module from a set to a key-value map. 

This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type.

Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements. 

Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata.

If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.
2018-04-06 13:54:16 -07:00
Brian Smith 15037d9618
Proxy: Improve DNS name parsing (#708)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 08:45:18 -10:00
Eliza Weisman 605e68dff6
Add pretty durations to panics from `assert_eventually!` (#677)
This PR adds the pretty-printing for durations I added in #676 to the panic message from the `assert_eventually!` macro added in #669. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-06 10:49:17 -07:00
Brian Smith c31f4ba993
Remove unused conversions for Destination. (#701)
These have not been used for a while; they are dead code.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 07:35:35 -10:00
Brian Smith 7bc4ffd0a4
Revert "Proxy: Refactor DNS name parsing and normalization (#673)" (#700)
This reverts commit 311ef410a8.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:49:32 -10:00
Brian Smith 1b223723bc
Revert "Proxy: Refactor poll_destination() in service discovery. (#674)" (#698)
This reverts commit 4fb9877b89.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:36:01 -10:00
Brian Smith 4fb9877b89
Proxy: Refactor poll_destination() in service discovery. (#674)
No change in behavior is intended here.

Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 13:05:11 -10:00
Brian Smith 311ef410a8
Proxy: Refactor DNS name parsing and normalization (#673)
Proxy: Refactor DNS name parsing and normalization

Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 12:32:12 -10:00
Eliza Weisman 18fa42ebd0
Pretty-print durations in log messages (#676)
This branch adds simple pretty-printing to duration in log timeout messages. If the duration is >= 1 second, it's printed in seconds with a fractional part. If the duration is less than 1 second, it is printed in milliseconds. This simple formatting may not be sufficient as a formatting rule for all cases, but should be sufficient for printing our relatively small timeouts.

Log messages now look something like this:
```
ERROR 2018-04-04T20:05:49Z: conduit_proxy: turning operation timed out after 100 ms into 500
```

Previously, they looked like this:
```
ERROR 2018-04-04T20:07:26Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
```

I made this change partially because I wanted to make the panics from the `eventually!` macro added in #669 more readable.
2018-04-05 13:47:19 -07:00
Eliza Weisman 49bf01b0da
Add `assert_eventually!` macro to help de-flake telemetry tests (#669)
Closes #615.

Based on @olix0r's suggestion in https://github.com/runconduit/conduit/issues/613#issuecomment-376024744, this PR adds an `assert_eventually!` macro to retry an assertion a set number of times, waiting for 15 ms between retries. This is loosely based on ScalaTest's [eventually](http://doc.scalatest.org/1.8/org/scalatest/concurrent/Eventually.html).

I've rewritten the flaky telemetry tests to use the `assert_eventually!` macro, to compensate for delays in the served metrics being updated between client requests and metrics scrapes.
2018-04-05 11:23:34 -07:00
Eliza Weisman 6b370b4466
Split labels out of `prometheus.rs` into its own file (#680)
The proxy's `telemetry/metrics/prometheus.rs` file was starting to get long and hard to find one's way around in. I split the prometheus labels code out into a separate submodule and `RequestLabels` and `ResponseLabels` public. This seems like a reasonable division of the code, and the resultant files are much easier to read.
2018-04-04 15:49:17 -07:00
Oliver Gould 2dc964c583
Move control::discovery::Cache into its own module (#672)
The proxy's control::discovery module is becoming a bit dense in terms
of what it implements.

In order to make this code more understandable, and to be able to use a
similar caching strategy in other parts of the controller, the
`control::cache` module now holds discovery's cache implementation.

This module is only visible within the `control` module, and it now
exposes two new public methods: `values()` and
`set_reset_on_next_modification()`.
2018-04-04 14:27:04 -07:00
Eliza Weisman 01628bfa43
Fix missing comma in gRPC status code labels (#670)
Fixes the issue caught by @olix0r in https://github.com/runconduit/conduit/pull/661#issuecomment-378431155
2018-04-04 10:41:21 -07:00
Brian Smith 06bf78ccdf
Use Rust 1.25 to build Docker images. (#667)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-03 08:22:29 -10:00
Phil Calçado 19001f8d38 Add pod-based metric_labels to destinations response (#429) (#654)
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix mock controller in proxy tests (#656)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Address review feedback
* Rename files in the destination package

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-02 18:36:57 -07:00
Sean McArthur 47f9665b8e
proxy: allow disable protocol detection on specific ports (#648)
- Adds environment variables to configure a set of ports that, when an
  incoming connection has an SO_ORIGINAL_DST with a port matching, will
  disable protocol detection for that connection and immediately start a
  TCP proxy.
- Adds a default list of well known ports: SMTP and MySQL.

Closes #339
2018-04-02 14:24:36 -07:00
Brian Smith f931dec3b3
Proxy: Completely replace current set of destinations on reconnect (#632)
Previosuly, when the proxy was disconnected from the Destination
service and then reconnects, the proxy would not forget old, outdated
entries in its cache of endpoints. If those endpoints had been removed
while the proxy was disconnected then the proxy would never become
aware of that.

Instead, on the first message after a reconnection, replace the entire
set of cached entries with the new set, which may be empty.

Prior to this change, the new test
outbound_destinations_reset_on_reconnect_followed_by_no_endpoints_exists
passed already
but outbound_destinations_reset_on_reconnect_followed_by_add_none
and outbound_destinations_reset_on_reconnect_followed_by_remove_none
failed. Now all these tests pass.

Fixes #573

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-29 16:50:08 -10:00
Brian Smith bae72c32ed
Proxy: Factor out Destination service connection logic (#631)
* Proxy: Factor out Destination service connection logic

Centralize the connection initiation logic for the Destination service
to make it easier to maintain. Clarify that the `rx` field isn't needed
prior to a (re)connect.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Rename `rx` to `query`.

Signed-off-by: Brian Smith <brian@briansmith.org>

* "recoonect" -> "reconnect"

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-29 08:20:57 -10:00
Eliza Weisman c688cf6028
Add response classification to proxy metrics (#639)
This PR adds a `classification` label to proxy response metrics, as @olix0r described in https://github.com/runconduit/conduit/issues/634#issuecomment-376964083. The label is either "success" or "failure", depending on the following rules:
+ **if** the response had a gRPC status code, *then*
   - gRPC status code 0 is considered a success
   - all others are considered failures
+ **else if** the response had an HTTP status code, *then*
  - status codes < 500 are considered success,
  - status codes >= 500 are considered failures
+ **else if** the response stream failed **then**
  - the response is a failure.

I've also added end-to-end tests for the classification of HTTP responses (with some work towards classifying gRPC responses as well). Additionally, I've updated `doc/proxy_metrics.md` to reflect the added `classification` label.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-03-28 14:49:00 -07:00
Eliza Weisman 40b9b345a5
All counters in proxy telemetry wrap on overflows (#603)
In #602, @olix0r suggested that telemetry counters should wrap on overflows, as "most timeseries systems (like prometheus) are designed to handle this case gracefully."

This PR changes counters to use explicitly wrapping arithmetic.

Closes #602.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-03-27 14:03:12 -07:00
Brian Smith 7dc21f9588
Add the NoEndpoints message to the Destination API (#564)
Have the controller tell the client whether the service exists, not
just what are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambigious names.

Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 10:45:41 -10:00
Eliza Weisman e7aa3d4105
Add process_start_time_seconds Prometheus metric (#628)
As described in #619. `process_start_time_seconds` is the idiomatic way of reporting to Prometheus the uptime of a process. It should contain the time in seconds since the beginning of the Unix epoch.

The proxy now exports this metric:
```
➜ http get localhost:4191/metrics
HTTP/1.1 200 OK
Content-Length: 902
Content-Type: text/plain; charset=utf-8
Date: Mon, 26 Mar 2018 22:09:55 GMT

# HELP request_total A counter of the number of requests the proxy has received.
# TYPE request_total counter

# HELP request_duration_ms A histogram of the duration of a request. This is measured from when the request headers are received to when the request stream has completed.
# TYPE request_duration_ms histogram

# HELP response_total A counter of the number of responses the proxy has received.
# TYPE response_total counter

# HELP response_duration_ms A histogram of the duration of a response. This is measured from when theresponse headers are received to when the response stream has completed.
# TYPE response_duration_ms histogram

# HELP response_latency_ms A histogram of the total latency of a response. This is measured from whenthe request headers are received to when the response stream has completed.
# TYPE response_latency_ms histogram

process_start_time_seconds 1522102089

```

Closes #619
2018-03-27 12:54:31 -07:00
Eliza Weisman bdcdfa8874
Actually skip flaky tests on CI and in Docker (#626)
Flaky proxy tests were not actually being ignored properly. This is due to our use of a Cargo workspace; as it turns out that Cargo doesn't propagate feature flags from the workspace to the crates in the workspace (see rust-lang/cargo#4753). 

If I run `cargo test --no-default-features` in the root directory, the `flaky_tests` feature is still passed, and the flaky tests still run:
```
➜ cargo test --no-default-features
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok
test inbound::tests::recognize_default_no_loop ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_orig_dst ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy-74584a35ef749a60

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/discovery-73cd0b65bd7a45ae

running 16 tests
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http1::outbound_asks_controller_api ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
server h1 error: invalid HTTP version specified
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test outbound_updates_newer_services ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::absolute_uris::outbound_times_out ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http2::outbound_times_out ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::outbound_times_out ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/telemetry-cb5bee2d2b94332c

running 12 tests
test metrics_endpoint_inbound_request_count ... ok
test metrics_endpoint_inbound_request_duration ... ok
test metrics_endpoint_outbound_request_count ... ok
test records_latency_statistics ... ignored
test telemetry_report_errors_are_ignored ... ok
test metrics_endpoint_outbound_request_duration ... ok
test metrics_have_no_double_commas ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok

test result: ok. 11 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/transparency-9d14bf92d8ba3700

running 19 tests
ERROR 2018-03-26T20:54:10Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http11_upgrade_not_supported ... ok
test http11_absolute_uri_differs_from_host ... ok
test http10_without_host ... ok
test http1_head_responses ... ok
test http10_with_host ... ok
test http1_connect_not_supported ... ok
test http1_bodyless_responses ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_removes_connection_headers ... ok
test http1_one_connection_per_host ... ok
test inbound_http1 ... ok
test inbound_tcp ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_host_have_unique_connections ... ok
test outbound_tcp ... ok
test tcp_with_no_orig_dst ... ok
test tcp_connections_close_if_client_closes ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy_controller_grpc-7fdac3528475b1dc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy_router-024926cac5d328ee

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/convert-ae9bd3b8fee21c85

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/futures_mpsc_lossy-4afd31454ff77b40

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy-controller-grpc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests convert

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy-router

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests futures-mpsc-lossy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

This also happens if the `-p` flag is used to run tests only in the `conduit-proxy` crate:

```
➜ cargo test -p conduit-proxy --no-default-features
   Compiling conduit-proxy v0.3.0 (file:///Users/eliza/Code/go/src/github.com/runconduit/conduit/proxy)
    Finished dev [unoptimized + debuginfo] target(s) in 17.27 secs
     Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test inbound::tests::recognize_default_no_loop ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test inbound::tests::recognize_orig_dst ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy-74584a35ef749a60

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/discovery-73cd0b65bd7a45ae

running 16 tests
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http1::outbound_asks_controller_api ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
server h1 error: invalid HTTP version specified
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test outbound_updates_newer_services ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::absolute_uris::outbound_times_out ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::outbound_times_out ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http2::outbound_times_out ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/telemetry-cb5bee2d2b94332c

running 12 tests
test metrics_endpoint_inbound_request_duration ... ok
test metrics_endpoint_inbound_request_count ... ok
test metrics_endpoint_outbound_request_count ... ok
test metrics_endpoint_outbound_request_duration ... ok
test telemetry_report_errors_are_ignored ... ok
test metrics_have_no_double_commas ... ok
test inbound_sends_telemetry ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok
test records_latency_statistics ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/transparency-9d14bf92d8ba3700

running 19 tests
ERROR 2018-03-26T20:56:55Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http1_connect_not_supported ... ok
test http11_upgrade_not_supported ... ok
test http10_without_host ... ok
test http11_absolute_uri_differs_from_host ... ok
test http1_head_responses ... ok
test http10_with_host ... ok
test http1_bodyless_responses ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_removes_connection_headers ... ok
test http1_one_connection_per_host ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_host_have_unique_connections ... ok
test inbound_http1 ... ok
test inbound_tcp ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test outbound_tcp ... ok
test tcp_with_no_orig_dst ... ok
test tcp_connections_close_if_client_closes ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

However, if I `cd` into the `proxy` directory (so that Cargo treats the `conduit-proxy` crate as the root project, rather than the workspace) and pass the `--no-default-features` flag, the flaky tests are skipped as expected:

```
➜ (cd proxy && exec cargo test --no-default-features)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-ac198a96228a056e

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test inbound::tests::recognize_default_no_loop ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_orig_dst ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-41e0f900f97e194b

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/discovery-7ba7fe16345a347a

running 16 tests
test http1::absolute_uris::outbound_times_out ... ignored
test http1::outbound_times_out ... ignored
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_asks_controller_api ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http2::outbound_times_out ... ignored
server h1 error: invalid HTTP version specified
ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test outbound_updates_newer_services ... ok

test result: ok. 13 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/telemetry-b0763b64edd8fc68

running 12 tests
test metrics_endpoint_inbound_request_count ... ignored
test metrics_endpoint_inbound_request_duration ... ignored
test metrics_endpoint_inbound_response_latency ... ignored
test metrics_endpoint_outbound_request_count ... ignored
test metrics_endpoint_outbound_request_duration ... ignored
test metrics_endpoint_outbound_response_latency ... ignored
test records_latency_statistics ... ignored
test telemetry_report_errors_are_ignored ... ok
test metrics_have_no_double_commas ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok

test result: ok. 5 passed; 0 failed; 7 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/transparency-300fd801daa85ccf

running 19 tests
ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http1_connect_not_supported ... ok
test http11_upgrade_not_supported ... ok
test http10_without_host ... ok
test http10_with_host ... ok
test http11_absolute_uri_differs_from_host ... ok
test http1_head_responses ... ok
test http1_bodyless_responses ... ok
test http1_removes_connection_headers ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_one_connection_per_host ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test inbound_tcp ... ok
test inbound_http1 ... ok
test http1_requests_without_host_have_unique_connections ... ok
test outbound_tcp ... ok
test tcp_connections_close_if_client_closes ... ok
test tcp_with_no_orig_dst ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

I'm wrapping the `cd` and `cargo test` command in a subshell so that the CWD on Travis is still in the repo root when the command exits, but the return value from `cargo test` is  propagated.

Closes #625
2018-03-26 17:11:06 -07:00
Brian Smith 7247ffeee3
Proxy: Clarify destination test support code queue handling (#617)
Use `VecDeqeue` to make the queue structure clear. Follow good practice
by minimizing the amount of time the lock is held. Clarify how
defaulting logic works.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-26 10:45:05 -10:00
Oliver Gould 006360aa90
Skip flaky tests for #613 (#614)
The metrics endpoint tests are flaky because there are no guarantees
that the metrics pipeline has processed events before the metrics
endpoint is read. This can cause CI to fail spuriously.

Disable these tests from running in CI until #613 is resolved.
2018-03-25 14:26:14 -07:00
Andrew Seigner 291d8e97ab
Move injected data from env var to k8s labels (#605)
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.

This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 16:11:34 -07:00
Eliza Weisman 9321932918
Add request_duration_ms metric and increment request_total on request end (#589)
This PR adds the `request_duration_ms` metric to the Prometheus metrics exported by the proxy. It also modifies the `request_total` metric so that it is incremented when a request stream finishes, rather than when it opens, for consistency with how the `response_total` metric is generated.

Making this change required modifying `telemetry::sensors::http` to generate a `StreamRequestEnd` event similar to the `StreamResponseEnd` event. This is done similarly to how sensors are added to response bodies, by generalizing the `ResponseBody` type into a `MeasuredBody` type that can wrap a request or response body. Since this changed the type of request bodies, it necessitated changing request types pretty much everywhere else in the proxy codebase in order to fix the resulting type errors, which is why the diff for this PR is so large.

Closes #570
2018-03-22 15:27:34 -07:00
Eliza Weisman 3f10c80256
Fix double comma in outbound metrics (#601)
Fixes #600 

The proxy metrics endpoint has a bug where metrics recorded in the outbound direction can contain two commas in a row when no outbound label is present. This occurs because the code for formatting the outbound direction label mistakenly assumed that there would always be a destination pod owner label as well, but the proxy isn't currently aware of the destination's pod owner (waiting for #429). 

I've fixed this issue by moving the place where the comma is output from the `fmt::Display` impl for `RequestLabels` to the `fmt::Display` impl for `OutboudnLabels`. This way, the comma between the `direction` and `dst_*` labels is only output when the `dst_*` label is present. 

This bug made it to master since all of the proxy end-to-end tests for metrics only test the inbound router. I've rectified this issue by adding tests on the outbound router as well (which would fail against the current master due to the double comma bug). I've also added a test that asserts there are no double commas in exported metrics, to protect against regressions to this bug.
2018-03-22 14:17:10 -07:00
Eliza Weisman 1c9ce4d118
Add Prometheus /metrics endpoint to proxy (#569)
This PR adds an endpoint to the proxy that serves metrics in Prometheus' text exposition format. The endpoint currently serves the `request_total`, `response_total`, `response_latency_ms`, and `response_duration_ms metrics`, as described in #536. The endpoint's port and address are configurable with the `CONDUIT_PROXY_METRICS_LISTENER` environment variable.

Tests have been added in t`ests/telemetry.rs`
2018-03-21 16:19:32 -07:00
Brian Smith d38a2acff8
Avoid `cargo fetch --locked` in proxy/Dockerfile. (#593)
`cargo fetch` doesn't consider the target platform and downloads
all crates needed to build for any target.

Stop using `cargo fetch` and instead use the implicit fetch done
by `cargo build`, which *does* consider the target platform.

This change results in 12 (soon 15) fewer crates downloaded.
This is a non-trivial savings in build time for a full rebuild
since cargo downloads crates in parallel.

```diff
- Downloading bitflags v1.0.1
- Downloading fuchsia-zircon v0.3.3
- Downloading fuchsia-zircon-sys v0.3.3
- Downloading miow v0.2.1
- Downloading redox_syscall v0.1.37
- Downloading redox_termios v0.1.1
- Downloading termion v1.5.1
- Downloading winapi v0.3.4
- Downloading winapi-i686-pc-windows-gnu v0.4.0
- Downloading winapi-x86_64-pc-windows-gnu v0.4.0
- Downloading wincolor v0.1.6
- Downloading ws2_32-sys v0.2.1
```

I verified that no downloads are done during an incremental
build.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-21 08:51:18 -10:00
Sean McArthur cd59465366
proxy: add SIGTERM and SIGINT handlers (#581)
When the proxy is run in a Docker container, it runs as PID 1, with
no default signal handlers setup. In order to react to signals from
Kubernetes about shutting down, we need to set up explicit handlers.

This adds handlers for SIGTERM and SIGINT.

Closes #549
2018-03-16 18:53:20 -07:00
Carl Lerche 35f61dcc56 Proxy: Upgrade h2 and indexmap crates (#572)
In order to pick up a bugfix in h2, upgrade:

h2 0.1.2
indexmap 1.0.0

Signed-off-by: Carl Lerche <me@carllerche.com>
2018-03-14 12:35:38 -07:00
Eliza Weisman 8da70bb6e2
Run all discovery tests for HTTP/1 as well as HTTP/2 (#556)
In order to ensure we catch discovery and routing issues arising from different logic for HTTP/1 and HTTP/2 requests, I've modified tests/discovery.rs to run all applicable tests with both HTTP/1 and HTTP/2 requests. The tests themselves are largely unchanged, but now there are separate modules containing HTTP/1 and HTTP/2 versions of a majority of the tests.
2018-03-09 17:24:48 -08:00
Eliza Weisman d62a869e68
Fix outbound HTTP/1 requests not using Destinations (#555)
Commit 569d6939a7 introduced a regression that caused the proxy to stop using the Destination service for outbound HTTP/1 requests with no authority in the request URI but a valid authority in the `Host:` header. 

The bug is due to some code in `Outbound::recognize` which assumed that a request had already been passed through `normalize_our_view_of_uri`. This was valid at one point while I was writing #492, as URIs were normalized prior to `recognize` and a request `Extension` was used to mark that they had been rewritten, and the host header and request URI could be assumed to be in agreement, but after merging #514 into the dev branch for #492, this behaviour changed and I forgot to update the logic in `recognize`.

I've fixed the issue by adding the logic for routing on `Host:` headers back into `Outbound::recognize`.

@seanmonstar added a test in `discovery.rs`, `outbound_http1_asks_controller_about_host`, which should exercise this case. I've added a couple more unit tests in that file to try and ensure we cover more of the different cases that can occur here.

Fixes #552
2018-03-09 16:25:19 -08:00
Sean McArthur 83d6a1f579
proxy: improve transparency of host headers and absolute-uris (#535)
In some cases, we would adjust an existing Host header, or add one. And in all cases when an HTTP/1 request was received with an absolute-form target, it was not passed on.

Now, the Host header is never changed. And if the Uri was in absolute-form, it is sent in the same format.

Closes #518
2018-03-08 13:15:21 -08:00
Eliza Weisman 6af9871f13
Fix infinite loop in `tcp::HalfDuplex::copy_into()` (#537)
An infinite loop exists in the TCP proxy, which could be triggered by any raw TCP connection (including HTTPS requests). The connection will be proxied successfully, but instead of closing, it will remain open, and the proxy's CPU usage will remain extremely high indefinitely.

Since `Duplex::poll` will call `half_in.copy_into()`/`half_out.copy_into()` repeatedly, even after they return `Async::Ready`, when one half has shut down and returned ready, it may still be polled again, as `Duplex::poll` waits until _both_ halves have returned `Ready`. Because of the guard that `!dst.is_shutdown`, intended to prevent the destination from shutting down twice, the function will not return if it is polled again after returning `Async::Ready` once.

I've fixed this by moving the guard against double shutdowns out of the loop, so that the function will return `Async::Ready` again if it is polled after shutting down the destination.

I've also included a unit test against regressions to this bug. The unit test fails against master.

Fixes #519 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Co-Authored-By: Andrew Seigner <andrew@sig.gy>
2018-03-08 12:43:19 -08:00
Brian Smith 3a73411375
Proxy: Test & document localhost. name resolution. (#531)
* Proxy: Test & document localhost. name resolution.

Closes #358.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 17:40:39 -10:00
Brian Smith 7aa1d0b26d
Proxy: Don't resolve absolute names outside zone using Destinations (#530)
* Proxy: Don't resolve absolute names outside zone using Destinations service

Many absolute names were being resolved using the Destinations service due to logic error
in the proxy's matching of the zone to the default zone.

Fix that bug.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 14:53:32 -10:00
Brian Smith 649e784d9c
Simplify cluster zone suffix handling in the proxy (#528)
* Temporarily stop trying to support configurable zones in the proxy.

None of the zone configuration is tested and lots of things assume the cluster
zone is `cluster.local`. Further, how exactly the proxy will actually learn the
cluster zone hasn't been decided yet.

Just hard-code the zone as "cluster.local" in the proxy until configurable zones
are fully implemented and tested to be working correctly.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting

The way that Kubernetes configures DNS search suffixes has some negative
consequences as some names like "example.com" are ambiguous: depending on
whether there is a service "example" in the "com" namespace, "example.com"
may refer to an external service or an internal service, and this can
fluctuate over time. In recognition of that we added the
CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would
be part of a solution for users to opt out of the unfortunate behavior
if their applications didn't depend on the DNS search suffix feature.

It turns out similar effects can be acheived using a custom dnsConfig,
starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level.
Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based seems duplicative.
Further, attempting to support it optionally made the code complex and hard
to read.

Therefore, let's just remove it. If/when somebody actually requests this
functionality then we can add it back, if dnsConfig isn't a valid alternative
for them.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Further hard-code "cluster.local" as the zone, temporarily.

Addresses review feedback.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 14:30:13 -10:00
Brian Smith 72c6a9cab2
Proxy: Make CONDUIT_PROXY_POD_NAMESPACE a required parameter. (#527)
Wwe will be able to simplify service discovery in the near future if we
can rely on the namespace being available.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 11:12:05 -10:00
Eliza Weisman 569d6939a7
Enforce that requests are mapped to connections for each Host: header values (#492)
This PR ensures that the mapping of requests to outbound connections is segregated by `Host:` header values. In most cases, the desired behavior is provided by Hyper's connection pooling. However, Hyper does not handle the case where a request had no `Host:` header and the request URI had no authority part, and the request was routed based on the SO_ORIGINAL_DST in the desired manner. We would like these requests to each have their own outbound connection, but Hyper will reuse the same connection for such requests. 

Therefore, I have modified `conduit_proxy_router::Recognize` to allow implementations of `Recognize` to indicate whether the service for a given key can be cached, and to only cache the service when it is marked as cachable. I've also changed the `reconstruct_uri` function, which rewrites HTTP/1 requests, to mark when a request had no authority and no `Host:` header, and the authority was rewritten to be the request's ORIGINAL_DST. When this is the case, the `Recognize` implementations for `Inbound` and `Outbound` will mark these requests as non-cachable.

I've also added unit tests ensuring that A, connections are created per `Host:` header, and B, that requests with no `Host:` header each create a new connection. The first test passes without any additional changes, but the second only passes on this branch. The tests were added in PR #489, but this branch supersedes that branch.

Fixes #415. Closes #489.
2018-03-06 16:44:14 -08:00
Sean McArthur c278228c1b
proxy: preserve body headers in http1 (#457)
As a goal of being a transparent proxy, we want to proxy requests and
responses with as little modification as possible. Basically, servers
and clients should see messages that look the same whether the proxy was
injected or not.

With that goal in mind, we want to make sure that body headers (things
like `Content-Length`, `Transfer-Encoding`, etc) are left alone. Prior
to this commit, we at times were changing behavior. Sometimes
`Transfer-Encoding` was added to requests, or `Content-Length: 0` may
have been removed. While RC 7230 defines that differences are
semantically the same, implementations may not handle them correctly.

Now, we've added some fixes to prevent any of these header changes
from occurring, along with tests to make sure library updates don't
regress.

For requests:

- With no message body, `Transfer-Encoding: chunked` should no longer be
added.
- With `Content-Length: 0`, the header is forwarded untouched.

For responses:

- Tests were added that responses not allowed to have bodies (to HEAD
requests, 204, 304) did not have `Transfer-Encoding` added.
- Tests that `Content-Length: 0` is preserved.
- Tests that HTTP/1.0 responses with no body headers do not have
`Transfer-Encoding` added.
- Tests that `HEAD` responses forward `Content-Length` headers (but not
an actual body).

Closes #447

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-03-05 18:10:51 -08:00
Eliza Weisman ad073c79b9
Remove connect timeouts from Bind (#487)
Currently, the `Reconnect` middleware does not reconnect on connection errors (see  #491) and treats them as request errors. This means that when a connection timeout is wrapped in a `Reconnect`, timeout errors are treated as request errors, and the request returns HTTP 500. Since  this is not the desired behavior, the connection timeouts should be removed, at least until their errors can be handled differently.

This PR removes the connect timeouts from `Bind`, as described in https://github.com/runconduit/conduit/pull/483#issuecomment-369380003.

It removes the `CONDUIT_PROXY_PUBLIC_CONNECT_TIMEOUT_MS` environment variable, but _not_ the `CONDUIT_PROXY_PRIVATE_CONNECT_TIMEOUT_MS` variable, since this is also used for the TCP connect timeouts. If we want also want to remove the TCP connection timeouts, I can do that as well.

Closes #483. Fixes #491.
2018-03-05 15:38:20 -08:00
Eliza Weisman c3ad9e5f2f
Use fmt::Display to format error messages in logs (#477)
This PR changes the proxy to log error messages using `fmt::Display` whenever possible, which should lead to much more readable and meaningful error messages

This is part of the work I started last week on issue #442. While I haven't finished everything for that issue (all errors still are mapped to HTTP 500 error codes), I wanted to go ahead and open a PR for the more readable error messages. This is partially because I found myself merging these changes into other branches to aid in debugging, and because I figured we may as well have the nicer logging on master.
2018-03-02 12:44:18 -08:00
Sean McArthur f9d8f3d94a
proxy: detect TCP socket hang ups from client or server (#463)
We previously `join`ed on piping data from both sides, meaning
that the future didn't complete until **both** sides had disconnected.
Even if the client disconnected, it was possible the server never knew,
and we "leaked" this future.

To fix this, the `join` is replaced with a `Duplex` future, which pipes
from both ends into the other, while also detecting when one side shuts
down. When a side does shutdown, a write shutdown is forwarded to the
other side, to allow draining to occur for deployments that half-close
sockets.

Closes #434
2018-03-02 10:14:54 -08:00
Brian Smith ea9a2c84e9
Proxy: Update domain dep. to remove rand 0.3 dep. (#495)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-01 17:16:21 -10:00
Sean McArthur 42ee56d1af
proxy: de-duplicate method parsing in tap quickcheck (#472)
Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-02-28 14:12:36 -08:00
Eliza Weisman 41bef41eb5
Refactor FullyQualifiedAuthority::normalize to always return authority (#476)
As requested by @briansmith in https://github.com/runconduit/conduit/issues/415#issuecomment-369026560 and https://github.com/runconduit/conduit/issues/415#issuecomment-369032059, I've refactored `FullyQualifiedAuthority::normalize` to _always_ return a `FullyQualifiedAuthority`, along with a boolean value indicating whether or not the Destination service should be used for that authority. 

This is in contrast to returning an `Option<FullyQualifiedAuthority>` where `None` indicated that the Destination service should not be used, which is what this function did previously.

This is required for further progress on #415.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-27 17:38:43 -08:00
Brian Smith 0d14c196f5
Proxy: Upgrade from ordermap 0.2 crate to indexmap 0.4. (#466)
Currently we have to download and build two different versions of
the ordermap crate.

I will submit similar PRs for the dependent crates so that we will
eventually all be using the same version of indexmap.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 19:29:22 -10:00
Brian Smith d993820cb3
Fix intermittent outbound_times_out failure. (#471)
This was caused by the fact that a new instance of `env_logger::init()`
was added after the PR that rewrote them all to `env_logger::try_init()`
was added.

Fixes #469

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 19:07:36 -10:00
Brian Smith 6b4d294a40
Reduce memory allocations during logging. (#445)
Stop initializing env_logger in every test. In env_logger 0.5, it
may only be initialized once per process.

Also, Prost will soon upgrade to env_logger 0.5 and this will
(eventually) help reduce the number of versions of env_logger we
have to build. Turning off the regex feature will (eventually) also
reduce the number of dependencies we have to build. Unfortunately,
as it is now, the number of dependencies has increased because
env_logger increased its dependencies in 0.5.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 18:32:47 -10:00
Brian Smith 578cffca22
Turn off the `use_logging` default feature of quickcheck. (#465)
Turning off the default features of quickcheck removes its
`env_logger` and `log` dependencies. It uses older versions of
those packages than conduit-proxy will use, so this will
(eventually) reduce the number of versions of those packages that
get downloaded and built.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 16:04:58 -10:00
Brian Smith 617d7da894
Remove the unused tokio-proto build dependency. (#451)
Hyper depends on tokio-proto with a default feature. By turning off
its default features, we can avoid that dependency. That reduces the
number of dependencies by 4.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 08:56:05 -10:00
Brian Smith 8607875267
Stop using the url crate in the proxy. (#450)
Version 1.7.0 of the url crate seems to be broken which means we cannot
`cargo update` the proxy without locking url to version 1.6. Since we only
use it in a very limited way anyway, and since we use http::uri for parsing
much more, just switch all uses of the url crate to use http::uri for parsing
instead.

This eliminates some build dependencies.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-26 08:55:48 -10:00
Eliza Weisman 694f691b71
Add timeout to Outbound::bind_service (#436)
Closes #403.

When the Destination service does not return a result for a service, the proxy connection for that service will hang indefinitely waiting for a result from Destination. If, for example, the requested name doesn't exist, this means that the proxy will wait forever, rather than responding with an error.

I've added a timeout wrapping the service returned from `<Outbound as Recognize>::bind_service`. The timeout can be configured by setting the `CONDUIT_PROXY_BIND_TIMEOUT` environment variable, and defaults to 10 seconds (because that's the default value for [a similar configuration in Linkerd](https://linkerd.io/config/1.3.5/linkerd/index.html#router-parameters)).

Testing with @klingerf's reproduction from #403:
```
curl -sIH 'Host: httpbin.org' $(minikube service proxy-http --url)/get | head -n1
HTTP/1.1 500 Internal Server Error
```
proxy logs:
```rust
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy using controller at HostAndPort { host: Domain("proxy-api.conduit.svc.cluster.local"), port: 8086 }
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy routing on V4(127.0.0.1:4140)
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy proxying on V4(0.0.0.0:4143) to None
proxy-5698f79b66-8rczl conduit-proxy INFO conduit_proxy::transport::connect "controller-client", DNS resolved proxy-api.conduit.svc.cluster.local to 10.0.0.240
proxy-5698f79b66-8rczl conduit-proxy ERR! conduit_proxy::map_err turning service error into 500: Inner(Timeout(Duration { secs: 10, nanos: 0 }))
```
2018-02-26 10:18:35 -08:00
Eliza Weisman 6309741ae7
Add flaky_tests feature for skipping some tests on CI (#441)
This PR adds a `flaky_tests` cargo feature to control whether or not to ignore tests that are timing-dependent. This feature is enabled by default in local builds, but disabled on CI and in all Docker builds.

Closes #440
2018-02-26 10:17:53 -08:00
Sean McArthur 381fb3800e
proxy: don't send transfer-encoding for empty GET requests (#410)
This is fixed in hyper v0.11.19.

Closes #402
2018-02-23 16:22:45 -08:00
Sean McArthur 7365c5645a
proxy: fix intermittent tap match method quickcheck failure (#427)
Closes #413
2018-02-23 12:13:51 -08:00
Sean McArthur b861a6df31
proxy: improve overall quality of logs (#424)
- Removed several useless `trace!("poll")` lines
- Upgraded controller client errors to `error` level
- Made controller client errors human-readable
2018-02-22 15:51:04 -08:00
Sean McArthur 382b68faeb proxy: add more tests for loopback IPs in FullyQualifiedAuthority (#411)
Closes #357

Signed-off-by: Sean McArthur <sean@seanmonstar.com>
2018-02-21 18:15:23 -10:00
Eliza Weisman 50103d479a
Remove timestamps from log messages (#399)
As @olix0r requested in https://github.com/runconduit/conduit/issues/56#issuecomment-356771758, I've removed timestamps from the Conduit proxy's log records.

Closes #56

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-21 15:17:17 -08:00
Kevin Lingerfelt 383babfae2
Prepare for the v0.3.0 release (#406)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-02-21 11:14:11 -08:00
Carl Lerche 287128885e
Proxy: Limit the max number of in-flight requests. (#398)
Currently, the max number of in-flight requests in the proxy is
unbounded. This is due to the `Buffer` middleware being unbounded.

This is resolved by adding an instance of `InFlightLimit` around
`Buffer`, capping the max number of in-flight requests for a given
endpoint.

Currently, the limit is hardcoded to 10,000. However, this will
eventually become a configuration value.

Fixes #287

Signed-off-by: Carl Lerche <me@carllerche.com>
2018-02-20 19:56:21 -08:00
Sean McArthur 236f71fbe0
proxy: use original dst if authority doesnt look like local service (#397)
The proxy will check that the requested authority looks like a local service, and if it doesn't, it will no longer ask the Destination service about the request, instead just using the SO_ORIGINAL_DST, enabling egress naturally.

The rules used to determine if it looks like a local service come from this comment:

> If default_zone.is_none() and the name is in the form $a.$b.svc, or if !default_zone.is_none() and the name is in the form $a.$b.svc.$default_zone, for some a and some b, then use the Destination service. Otherwise, use the IP given.
2018-02-20 18:09:21 -08:00
Oliver Gould c454ac413c
Upgrade to Rust 1.24.0 (#363)
Upgrade to Rust 1.24.0
2018-02-16 14:37:29 -08:00
Eliza Weisman 8bc497a057
Remove unused metrics (#322)
Removed the `method` label from Prometheus, and removed HTTP methods from reports. Removed `StreamSummary` from reports and replaced it with a `u32` count of streams.

Closes #266 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 17:14:17 -08:00
Eliza Weisman 458e9d2ac5
Remove per-path metrics from telemetry pipeline (#317)
Follow-up from #315.

Now that the UIs don't report per-path metrics, we can remove the path label from Prometheus, the path aggregation and filtering options from the telemetry API, and the path field from the proxy report API.

I've modified the tests to no longer expect the removed fields, and manually verified that Conduit still works after making these changes.

Closes #265 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-02-09 14:20:28 -08:00
Eliza Weisman 915f08ac4c
Store proxy latencies in a structure that matches controller histogram (#11)
The proxy currently stores latency values in an `OrderMap` and reports every observed latency value to the controller's telemetry API since the last report. The telemetry API then sends each individual value to Prometheus. This doesn't scale well when there are a large number of proxies making reports. 

I've modified the proxy to use a fixed-size histogram that matches the histogram buckets in Prometheus. Each report now includes an array indicating the histogram bounds, and each response scope contains a set of counts corresponding to each index in the bounds array, indicating the number of times a latency in that bucket was observed. The controller then reports the upper bound of each bucket to Prometheus, and can use the proxy's reported set of bucket bounds so that the observed values will be correct even if the bounds in the control plane are changed independently of those set in the proxy.

I've also modified `simulate-proxy` to generate the new report structure, and added tests in the proxy's telemetry test suite validating the new behaviour.
2018-02-07 18:02:59 -08:00
Oliver Gould a2d537f5c4
Use a load-aware balancer (#251)
Currently, the conduit proxy uses a simplistic Round-Robin load
balancing algorithm. This strategy degrades severely when individual
endpoints exhibit abnormally high latency.

This change improves this situation somewhat by making the load balancer
aware of the number of outstanding requests to each endpoint. When nodes
exhibit high latency, they should tend to have more pending requests
than faster nodes; and the Power-of-Two-Choices node selector can be
used to distribute requests to lesser-loaded instances.

From the finagle guide:

    The algorithm randomly picks two nodes from the set of ready endpoints
    and selects the least loaded of the two. By repeatedly using this
    strategy, we can expect a manageable upper bound on the maximum load of
    any server.

    The maximum load variance between any two servers is bound by
    ln(ln(n))` where `n` is the number of servers in the cluster.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2018-02-07 09:39:31 -08:00
Oliver Gould 6a0936e699
Remove proxy/Dockerfile-deps (#279)
The current proxy Dockerfile configuration does not cache dependencies
well, which can increase build times substantially.

By carefully splitting proxy/Dockerfile into several stages that mock
parts of the project, dependencies may be built and cached in Docker
such that changes to the proxy only require building the conduit-proxy
crate.

Furthermore, proxy/Dockerfile now runs the proxy's tests before
producing an artifact, unless the ` PROXY_SKIP_TESTS` build-arg is set
and not-empty.

The `PROXY_UNOPTIMIZED` build-arg has been added to support quicker,
debug-friendly builds.
2018-02-06 13:01:38 -08:00
Oliver Gould e2093e37f8
Move the Rust gRPC bindings to a dedicated crate (#275)
The proxy depends on `protoc`-generated gRPC bindings to communicate
with the controller. In order to generate these bindings, build-time
dependencies must be compiled.

In order to support a more granular, cacheable build scheme, a new crate
has been created to house these gRPC bindings,
`conduit-proxy-controller-grpc`.

Because `TryFrom` and `TryInto` conversions are implemented for
protobuf-defined types, the `convert` module also had to be moved to
into a dedicated crate.

Furthermore, because the proxy's tests require that
`quickcheck::Aribtrary` be implemented for protobuf types, the
`conduit-proxy-controller-grpc` crate supports an _arbitrary_ feature
fla protobuf types, the `conduit-proxy-controller-grpc` crate supports
an _arbitrary_ feature flag.

While we're moving these libraries around, the `tower-router` crate has
been moved to `proxy/router` and renamed to `conduit-proxy-router.`
`futures-mpsc-lossy` has been moved into the proxy directory but has not
been renamed.

Finally, the `proxy/Dockerfile-deps` image has been updated to avoid the
wasteful building of dependency artifacts, as they are not actually used
by `proxy/Dockerfile`.
2018-02-06 10:31:48 -08:00
Andrew Seigner 277c06cf1e
Simplify and refactor k8s labels and annnotations (#227)
The conduit.io/* k8s labels and annotations we're redundant in some
cases, and not flexible enough in others.

This change modifies the labels in the following ways:
`conduit.io/plane: control` => `conduit.io/controller-component: web`
`conduit.io/controller: conduit` => `conduit.io/controller-ns: conduit`
`conduit.io/plane: data` => (remove, redundant with `conduit.io/controller-ns`)
It also centralizes all k8s labels and annotations into
pkg/k8s/labels.go, and adds tests for the install command.

Part of #201

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-01 14:12:06 -08:00
Eliza Weisman eddc37de28 Adopt external tower-grpc and tower-h2 deps #225)
The conduit repo includes several library projects that have since been
moved into external repos, including `tower-grpc` and `tower-h2`.

This change removes these vendored libraries in favor of using the new
external crates.
2018-02-01 11:57:02 -08:00
Dennis Adjei-Baah 01312f9ffe
Prepare for v0.2.0 release (#248)
* prepare for v0.2.0 release

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-01-31 15:39:48 -08:00
Sean McArthur 9720a32de7
proxy: fix tcp_with_no_orig_dst test (#229)
Sometimes, the try_read will return a connection error, sometimes it
will just return EOF. Handle both cases.

Closes #226
2018-01-29 15:15:06 -08:00
Sean McArthur b861318e86
proxy: fix h1 streams to trigger response end events
Response End events were only triggered after polling the trailers of
a response, but when the Response is given to a hyper h1 server, it
doesn't know about trailers, so they were never polled!

The fix is that the `BodyStream` glue will now poll the wrapped body for
trailers after it sees the end of the data, before telling hyper the
stream is over. This ensures a ResponseEnd event is emitted.

Includes a proxy telemetry test over h1 connections.
2018-01-25 16:36:16 -08:00
Andrew Seigner aa17e37ab5
Add docker deps validation to ci (#207)
If docker image tags were out of date, ci would not fail until the
docker-deploy stage (master merge).

Modify ci to validate tags as part of the default ci run.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-25 12:41:02 -08:00
Andrew Seigner 4e2eb18f1d
Use cargo frozen flag in build scripts (#206)
The cargo commands in our docker and ci scripts were at risk for
modifying Cargo.lock and cache.

Using cargo's --frozen flag (and --locked during fetch) ensures our
build is consistent with what's defined across Cargo.toml, Cargo.lock,
and cached build artifacts.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-25 11:08:15 -08:00
Andrew Seigner d0a0bb22bd
Move EosCtx to common for Tap and Telemetery (#204)
* Make Eos optional in TapEvent

grpc_status not being set in protobuf is the same as being set to zero,
which is also status OK

Modify TapEvent to include an optional EOS struct

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

Part of #198

* Add Eos to proto & proxy tap end-of-stream events

The proxy now outputs `Eos` instead of `grpc_status` in all end-of-stream tap events. The EOS value is set to `grpc_status_code` when the response ended with a `grpc_status` trailer, `http_reset_code` when the response ended with a reset, and no `Eos` when the response ended gracefully without a `grpc_status` trailer.

This PR updates the proxy. The proto and controller changes are in PR #204.
Part of #198. Closes #202

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-24 15:48:00 -08:00
Sean McArthur 54aef56e25
proxy: add transparent protocol detection and handling
The proxy will now try to detect what protocol new connections are
using, and route them accordingly. Specifically:

- HTTP/2 stays the same.
- HTTP/1 is now accepted, and will try to send an HTTP/1 request
  to the target.
- If neither HTTP/1 nor 2, assume a TCP stream and simply forward
  between the source and destination.

* tower-h2: fix Server Clone bounds
* proxy: implement Async{Read,Write} extra methods for Connection

Closes #130 
Closes #131
2018-01-23 16:14:07 -08:00
Andrew Seigner 06c9894c31
Updates for v0.1.3 release (#185)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-19 13:58:52 -08:00
Andrew Seigner e6f17faf28
Updates for v0.1.2 release (#171)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-01-19 10:56:20 -08:00
Oliver Gould 008f53865b
Make proxy-deps multi-stage to remove the original source files (#161)
Previously, proxy-deps and go-deps included the source tree for local
projects. This can cause build conflicts when files are renamed.

By adopting a multi-stage build for the proxy-deps image, we can be sure
that we only preserve essential dependencies & manifests in the
proxy-deps and go-deps images.

Furthermore, `bin/update-go-deps-shas` and `bin/update-proxy-deps-shas` have
been added to ease maintenance when files are changed.

Fixes #159

Signed-off-by: Oliver Gould <ver@buoyant.io>
2018-01-17 12:26:22 -08:00
Eliza Weisman 2b20a8bb10
Use cargo:rerun-if-changed to avoid recompiling protos (#160)
As @seanmonstar noticed, the build script will currently re-compile all the protobufs regardless of whether or not they have changed, making the build much slower. 

This PR modifies it to emit `cargo:rerun-if-changed=` for all the protobuf files, so they will only be regenerated if one of them changes.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-17 09:23:27 -08:00
Eliza Weisman 63d1a5d70d
Add Protocol field to Transports telemetry (#138)
See #132. This PR adds a protocol field to the ClientTransport and ServerTransport messages, and modifies the proxy to report a value for this field (currently, it's only ever HTTP).

Currently, HTTP/1 and HTTP/2 are collapsed into one Protocol variant, see #132 (comment). I expect that we can treat H1 as a subset of H2 as far as metrics goes.

Note that after discussing it with @klingerf, I learned that the control plane telemetry API currently does not do anything with the ClientTransport and ServerTransport messages, so beyond regenerating the protobuf-generated code, no controller changes were actually necessary. As we actually add metrics to TCP transports, we'll want to make some additions to the telemetry API to ingest these metrics. If any metrics are shared between HTTP and raw TCP transports (say, bytes sent), we'll want to differentiate between them in Prometheus. All the metrics that the control plane currently ingests from telemetry reports are likely to be HTTP-specific (requests, responses, response latencies), or at least, do not apply to raw TCP.

Actually adding metrics to raw TCP transports will probably have to wait until there are raw TCP transports implemented in the proxy...

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-11 16:00:38 -08:00
clemensw b1831cd415 [proxy] Fix rendering for top-level rustdoc (#113)
Signed-off-by: clemensw <clemensw@users.noreply.github.com>
2018-01-08 15:40:12 -08:00
Andrew Seigner caeb83a526
Fix Go and Proxy dependency image SHAs (#117)
The image tags for gcr.io/runconduit/go-deps and
gcr.io/runconduit/proxy-deps were not updating to account for all
changes in those images.

Modify SHA generation to include all files that affect the base
dependency images. Also add instructions to README.md for updating
hard-coded SHAs in Dockerfile's.

Fixes #115

Signed-off-by: Andrew Seigner <andrew@sig.gy>
2018-01-08 11:19:49 -08:00
Eliza Weisman 90b5ddfb00
Change Cargo.lock to trigger deps image rebuild (#116)
Because whether or not to build a new deps image is based on the SHA of Cargo.lock, changes to the deps Dockerfile will not cause a new deps image to be built. Because of this, the current proxy deps Docker image is based on the wrong Rust version, breaking the build. See #115 for details on this issue.

I've appended a newline to Cargo.lock to change the lockfile's SHA and trigger a rebuild of the deps Docker image on CI. I've also added a comment in the Dockerfile noting that it is necessary to do this when changing that file.

Signed-off-by: Eliza Weisman eliza@buoyant.io
2018-01-08 10:29:51 -08:00
Eliza Weisman 2bd9f99f6b
Require Rust 1.23 in proxy Dockerfile-deps (#105)
After merging #104, Conduit will not build against pre-1.23 Rust versions. This PR updates the Dockerfile to require this version. This should fix the build on master.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-05 13:54:38 -08:00
Eliza Weisman 67d4b56253
Remove `AsciiExt` import (#104)
Since the methods on this trait were moved to direct implementations on the
implementing types, this produces an unused import warning with the latest
(1.23) Rust standard library. As we set `deny(warnings)`, this breaks the build.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-01-04 10:49:13 -08:00
Brian Smith 02176d8d16
Remove default controller URL from proxy. (#48)
Previously there was a default controller URL in the proxy. This
default was never used for any proxy injected by `conduit inject` and
it was the wrong default when using the proxy outside of Kubernetes.
Also more generally this is such an important setting in terms of
correctness and security that it was dangerous to let it be implied in
any context.

Remove the default, requiring that it be set in order for the proxy to
start.
2018-01-02 08:44:27 -10:00
Sky Ao 238c54414b correct typo: Enviroment -> Environment (#100)
Signed-off-by: Sky Ao <aoxiaojian@gmail.com>
2017-12-29 10:14:48 -08:00
Sean McArthur 8e2dd66bfa
disable push promises in proxy (#70) 2017-12-21 14:41:17 -08:00
Kevin Lingerfelt a8e75115ab
Prepare the repo for the v0.1.1 release (#75)
* Prepare the repo for the v0.1.1 release

* Add changelog

* Changelog updates, wrap at 100 characters
2017-12-20 10:51:53 -08:00
Brian Smith 8385a7a8c1
Proxy: Map unqualified/partially-qualified names to FQDN (#59)
* Proxy: Map unqualified/partially-qualified names to FQDN

Previously we required the service to fully qualify all service names
for outbound traffic. Many services are written assuming that
Kubernetes will complete names using its DNS search path, and those
services weren't working with Conduit.

Now add an option, used by default, to fully-qualify the domain names.
Currently only Kubernetes-like name completion for services is
supported, but the configuration syntax is open-ended to allow for
alternatives in the future. Also, the auto-completion can be disabled
for applications that prefer to ensure they're always using unambiguous
names. Once routing is implemented then it is likely that (default)
routing rules will replace these hard-coded rules.

Unit tests for the name completion logic are included.

Part of the solution for #9. The changes to `conduit inject` to
actually use this facility will be in another PR.
2017-12-19 11:59:26 -10:00
Brian Smith 4fa4891694
Use `connection::Connection` for outbound connections (#51)
Previously `connection::Connection` was only being used for inbound
connections, not outbound connections. This led to some duplicate
logic and also made it difficult to adapt that code to enable TLS.

Now outbound connections use `connection::Connection` too. This will
allow the upcoming TLS logic to guarantee that `TCP_NODELAY` is
enabled at the right time, and the TLS logic also control access to
the underlying plaintext socket for security reasons.
2017-12-15 12:44:25 -10:00