linkerd2

Commit Graph

Author	SHA1	Message	Date
Andrew Seigner	1ed4a93b5e	Higher velocity metrics from simulate-proxy (#635 ) simulate-proxy increments a single set of metrics on each iteration, and also randomizes http status codes, leaving counters unchanged across several collections. Modify simulate-proxy to increment all metrics on each iteration, provide a 90% success rate, ensure a pod does not call itself, and increase proxy count from 3 to 10. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-28 13:30:02 -07:00
Kevin Lingerfelt	59c75a73a9	Add tests/utils/scripts for running integration tests (#608 ) * Add tests/utils/scripts for running integration tests Add a suite of integration tests in the `test/` directory, as well as utilities for testing in the `testutil/` directory. You can use the `bin/test-run` script to run the full suite of tests, and the `bin/test-cleanup` script to cleanup after the tests. The test/README.md file has more information about running tests. @pcalcado, @franziskagoltz, and @rmars also contributed to this change. * Create TEST.md file at the root of the repo * Update based on review feedback * Relax external service IP timeout for GKE * Update TEST.md with more info about different types of test runs * More updates to TEST.md based on review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-27 15:06:55 -07:00
Andrew Seigner	fe35509406	Clean up Prometheus labels scraped from proxy (#633 ) The Prometheus scrape config collects from Conduit proxies, and maps Kubernetes labels to Prometheus labels, prefixing them with "k8s_". This change keeps the resultant Prometheus labels consistent with their source Kubernetes labels. For example: "deployment" and "pod_template_hash". Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-27 15:01:08 -07:00
Eliza Weisman	40b9b345a5	All counters in proxy telemetry wrap on overflows (#603 ) In #602, @olix0r suggested that telemetry counters should wrap on overflows, as "most timeseries systems (like prometheus) are designed to handle this case gracefully." This PR changes counters to use explicitly wrapping arithmetic. Closes #602. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-03-27 14:03:12 -07:00
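The wrapping behaviour described in the commit above can be illustrated with a small sketch. This is not the proxy's code (the proxy is written in Rust); it is a Python model in which `WrappingCounter`, `U64_MAX`, and `increase` are hypothetical names, the counter width is assumed to be 64 bits, and `increase` is a simplified picture of how a reset-aware consumer might treat a counter that drops.

```python
U64_MAX = (1 << 64) - 1  # largest value of an (assumed) 64-bit unsigned counter


class WrappingCounter:
    """Hypothetical counter that wraps to zero instead of failing on overflow."""

    def __init__(self, value=0):
        self.value = value & U64_MAX

    def incr(self, n=1):
        # Masking emulates fixed-width wrapping arithmetic in Python,
        # whose integers are otherwise unbounded.
        self.value = (self.value + n) & U64_MAX


def increase(prev, curr):
    """Reset-aware increase between two samples (simplified model)."""
    # A drop is treated as a restart from zero, so only `curr` is counted.
    return curr - prev if curr >= prev else curr


if __name__ == "__main__":
    counter = WrappingCounter(U64_MAX)
    counter.incr()
    print(counter.value)  # the counter has wrapped around to 0
```

Under this model a wrap looks the same to the consumer as a process restart, which is why an explicit wrap is preferable to a panic or a saturated value.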
Brian Smith	7dc21f9588	Add the NoEndpoints message to the Destination API (#564 ) Have the controller tell the client whether the service exists, not just which endpoints are available. This way we can implement fallback logic to alternate service discovery mechanisms for ambiguous names. Signed-off-by: Brian Smith <brian@briansmith.org> Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-27 10:45:41 -10:00
Eliza Weisman	e7aa3d4105	Add process_start_time_seconds Prometheus metric (#628 ) As described in #619. `process_start_time_seconds` is the idiomatic way of reporting to Prometheus the uptime of a process. It should contain the time in seconds since the beginning of the Unix epoch. The proxy now exports this metric: ``` ➜ http get localhost:4191/metrics HTTP/1.1 200 OK Content-Length: 902 Content-Type: text/plain; charset=utf-8 Date: Mon, 26 Mar 2018 22:09:55 GMT # HELP request_total A counter of the number of requests the proxy has received. # TYPE request_total counter # HELP request_duration_ms A histogram of the duration of a request. This is measured from when the request headers are received to when the request stream has completed. # TYPE request_duration_ms histogram # HELP response_total A counter of the number of responses the proxy has received. # TYPE response_total counter # HELP response_duration_ms A histogram of the duration of a response. This is measured from when theresponse headers are received to when the response stream has completed. # TYPE response_duration_ms histogram # HELP response_latency_ms A histogram of the total latency of a response. This is measured from whenthe request headers are received to when the response stream has completed. # TYPE response_latency_ms histogram process_start_time_seconds 1522102089 ``` Closes #619	2018-03-27 12:54:31 -07:00
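As a rough illustration of the metric in the commit above, the sketch below renders a `process_start_time_seconds` sample line and derives uptime from it. It is a Python model, not the proxy's implementation; `render_start_time` and `uptime_seconds` are hypothetical helper names, and the sample value is the one shown in the commit's output.

```python
import time


def render_start_time(start_time_seconds):
    """Render the sample line as it appears in the text exposition output."""
    return f"process_start_time_seconds {int(start_time_seconds)}\n"


def uptime_seconds(start_time_seconds, now=None):
    """Uptime is the current Unix time minus the recorded start time."""
    if now is None:
        now = time.time()
    return now - start_time_seconds


if __name__ == "__main__":
    start = 1522102089  # value taken from the commit's sample output
    print(render_start_time(start), end="")
    print(uptime_seconds(start, now=start + 60))
```

Exporting the start time rather than a running uptime value means the process records it once at startup, and the consumer computes uptime at query time.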
Eliza Weisman	bdcdfa8874	Actually skip flaky tests on CI and in Docker (#626 ) Flaky proxy tests were not actually being ignored properly. This is due to our use of a Cargo workspace; as it turns out that Cargo doesn't propagate feature flags from the workspace to the crates in the workspace (see rust-lang/cargo#4753). If I run `cargo test --no-default-features` in the root directory, the `flaky_tests` feature is still passed, and the flaky tests still run: ``` ➜ cargo test --no-default-features Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f running 13 tests test fully_qualified_authority::tests::test_normalized_authority ... ok test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok test ctx::transport::tests::same_addr_ipv4 ... ok test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok test ctx::transport::tests::same_addr_ipv6 ... ok test telemetry::tap::match_::tests::http_from_proto ... ok test inbound::tests::recognize_default_no_ctx ... ok test telemetry::tap::match_::tests::tcp_from_proto ... ok test telemetry::tap::match_::tests::tcp_matches ... ok test inbound::tests::recognize_default_no_loop ... ok test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok test inbound::tests::recognize_default_no_orig_dst ... ok test inbound::tests::recognize_orig_dst ... ok test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/conduit_proxy-74584a35ef749a60 running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/discovery-73cd0b65bd7a45ae running 16 tests test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok test http1::outbound_reconnects_if_controller_stream_ends ... ok test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok test http1::outbound_asks_controller_without_orig_dst ... 
ok test http1::absolute_uris::outbound_asks_controller_api ... ok test http1::outbound_asks_controller_api ... ok test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok test http2::outbound_reconnects_if_controller_stream_ends ... ok test http2::outbound_asks_controller_api ... ok test http2::outbound_asks_controller_without_orig_dst ... ok test http1::outbound_uses_orig_dst_if_not_local_svc ... ok server h1 error: invalid HTTP version specified test http2::outbound_uses_orig_dst_if_not_local_svc ... ok ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500 test outbound_updates_newer_services ... ok ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500 test http1::absolute_uris::outbound_times_out ... ok ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500 test http2::outbound_times_out ... ok ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500 test http1::outbound_times_out ... ok test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/telemetry-cb5bee2d2b94332c running 12 tests test metrics_endpoint_inbound_request_count ... ok test metrics_endpoint_inbound_request_duration ... ok test metrics_endpoint_outbound_request_count ... ok test records_latency_statistics ... ignored test telemetry_report_errors_are_ignored ... ok test metrics_endpoint_outbound_request_duration ... ok test metrics_have_no_double_commas ... ok test http1_inbound_sends_telemetry ... ok test inbound_sends_telemetry ... ok test inbound_aggregates_telemetry_over_several_requests ... ok test metrics_endpoint_inbound_response_latency ... ok test metrics_endpoint_outbound_response_latency ... ok test result: ok. 
11 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out Running target/debug/deps/transparency-9d14bf92d8ba3700 running 19 tests ERROR 2018-03-26T20:54:10Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500 test http11_upgrade_not_supported ... ok test http11_absolute_uri_differs_from_host ... ok test http10_without_host ... ok test http1_head_responses ... ok test http10_with_host ... ok test http1_connect_not_supported ... ok test http1_bodyless_responses ... ok test http1_content_length_zero_is_preserved ... ok test http1_removes_connection_headers ... ok test http1_one_connection_per_host ... ok test inbound_http1 ... ok test inbound_tcp ... ok test http1_requests_without_body_doesnt_add_transfer_encoding ... ok test http1_response_end_of_file ... ok test http1_requests_without_host_have_unique_connections ... ok test outbound_tcp ... ok test tcp_with_no_orig_dst ... ok test tcp_connections_close_if_client_closes ... ok test outbound_http1 ... ok test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/conduit_proxy_controller_grpc-7fdac3528475b1dc running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/conduit_proxy_router-024926cac5d328ee running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/convert-ae9bd3b8fee21c85 running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/futures_mpsc_lossy-4afd31454ff77b40 running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests conduit-proxy running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests conduit-proxy-controller-grpc running 0 tests test result: ok. 
0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests convert running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests conduit-proxy-router running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests futures-mpsc-lossy running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out ``` This also happens if the `-p` flag is used to run tests only in the `conduit-proxy` crate: ``` ➜ cargo test -p conduit-proxy --no-default-features Compiling conduit-proxy v0.3.0 (file:///Users/eliza/Code/go/src/github.com/runconduit/conduit/proxy) Finished dev [unoptimized + debuginfo] target(s) in 17.27 secs Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f running 13 tests test fully_qualified_authority::tests::test_normalized_authority ... ok test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok test ctx::transport::tests::same_addr_ipv6 ... ok test ctx::transport::tests::same_addr_ipv4 ... ok test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok test inbound::tests::recognize_default_no_loop ... ok test telemetry::tap::match_::tests::http_from_proto ... ok test inbound::tests::recognize_default_no_orig_dst ... ok test inbound::tests::recognize_default_no_ctx ... ok test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok test telemetry::tap::match_::tests::tcp_from_proto ... ok test inbound::tests::recognize_orig_dst ... ok test telemetry::tap::match_::tests::tcp_matches ... ok test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/conduit_proxy-74584a35ef749a60 running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/discovery-73cd0b65bd7a45ae running 16 tests test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok test http1::outbound_reconnects_if_controller_stream_ends ... 
ok test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok test http1::outbound_asks_controller_without_orig_dst ... ok test http1::absolute_uris::outbound_asks_controller_api ... ok test http1::outbound_asks_controller_api ... ok test http1::outbound_uses_orig_dst_if_not_local_svc ... ok test http2::outbound_reconnects_if_controller_stream_ends ... ok test http2::outbound_asks_controller_without_orig_dst ... ok test http2::outbound_asks_controller_api ... ok test http2::outbound_uses_orig_dst_if_not_local_svc ... ok server h1 error: invalid HTTP version specified ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500 test outbound_updates_newer_services ... ok ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500 test http1::absolute_uris::outbound_times_out ... ok ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500 test http1::outbound_times_out ... ok ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500 test http2::outbound_times_out ... ok test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/telemetry-cb5bee2d2b94332c running 12 tests test metrics_endpoint_inbound_request_duration ... ok test metrics_endpoint_inbound_request_count ... ok test metrics_endpoint_outbound_request_count ... ok test metrics_endpoint_outbound_request_duration ... ok test telemetry_report_errors_are_ignored ... ok test metrics_have_no_double_commas ... ok test inbound_sends_telemetry ... ok test http1_inbound_sends_telemetry ... ok test inbound_aggregates_telemetry_over_several_requests ... ok test metrics_endpoint_inbound_response_latency ... 
ok test metrics_endpoint_outbound_response_latency ... ok test records_latency_statistics ... ok test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running target/debug/deps/transparency-9d14bf92d8ba3700 running 19 tests ERROR 2018-03-26T20:56:55Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500 test http1_connect_not_supported ... ok test http11_upgrade_not_supported ... ok test http10_without_host ... ok test http11_absolute_uri_differs_from_host ... ok test http1_head_responses ... ok test http10_with_host ... ok test http1_bodyless_responses ... ok test http1_content_length_zero_is_preserved ... ok test http1_removes_connection_headers ... ok test http1_one_connection_per_host ... ok test http1_response_end_of_file ... ok test http1_requests_without_host_have_unique_connections ... ok test inbound_http1 ... ok test inbound_tcp ... ok test http1_requests_without_body_doesnt_add_transfer_encoding ... ok test outbound_tcp ... ok test tcp_with_no_orig_dst ... ok test tcp_connections_close_if_client_closes ... ok test outbound_http1 ... ok test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests conduit-proxy running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out ``` However, if I `cd` into the `proxy` directory (so that Cargo treats the `conduit-proxy` crate as the root project, rather than the workspace) and pass the `--no-default-features` flag, the flaky tests are skipped as expected: ``` ➜ (cd proxy && exec cargo test --no-default-features) Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-ac198a96228a056e running 13 tests test fully_qualified_authority::tests::test_normalized_authority ... ok test ctx::transport::tests::same_addr_ipv4 ... 
ok test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok test ctx::transport::tests::same_addr_ipv6 ... ok test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok test telemetry::tap::match_::tests::tcp_from_proto ... ok test telemetry::tap::match_::tests::http_from_proto ... ok test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok test telemetry::tap::match_::tests::tcp_matches ... ok test inbound::tests::recognize_default_no_ctx ... ok test inbound::tests::recognize_default_no_loop ... ok test inbound::tests::recognize_default_no_orig_dst ... ok test inbound::tests::recognize_orig_dst ... ok test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-41e0f900f97e194b running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/discovery-7ba7fe16345a347a running 16 tests test http1::absolute_uris::outbound_times_out ... ignored test http1::outbound_times_out ... ignored test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok test http1::outbound_reconnects_if_controller_stream_ends ... ok test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok test http1::outbound_uses_orig_dst_if_not_local_svc ... ok test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok test http1::outbound_asks_controller_without_orig_dst ... ok test http1::outbound_asks_controller_api ... ok test http1::absolute_uris::outbound_asks_controller_api ... ok test http2::outbound_times_out ... ignored server h1 error: invalid HTTP version specified ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500 test http2::outbound_reconnects_if_controller_stream_ends ... 
ok test http2::outbound_uses_orig_dst_if_not_local_svc ... ok test http2::outbound_asks_controller_api ... ok test http2::outbound_asks_controller_without_orig_dst ... ok test outbound_updates_newer_services ... ok test result: ok. 13 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/telemetry-b0763b64edd8fc68 running 12 tests test metrics_endpoint_inbound_request_count ... ignored test metrics_endpoint_inbound_request_duration ... ignored test metrics_endpoint_inbound_response_latency ... ignored test metrics_endpoint_outbound_request_count ... ignored test metrics_endpoint_outbound_request_duration ... ignored test metrics_endpoint_outbound_response_latency ... ignored test records_latency_statistics ... ignored test telemetry_report_errors_are_ignored ... ok test metrics_have_no_double_commas ... ok test http1_inbound_sends_telemetry ... ok test inbound_sends_telemetry ... ok test inbound_aggregates_telemetry_over_several_requests ... ok test result: ok. 5 passed; 0 failed; 7 ignored; 0 measured; 0 filtered out Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/transparency-300fd801daa85ccf running 19 tests ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500 test http1_connect_not_supported ... ok test http11_upgrade_not_supported ... ok test http10_without_host ... ok test http10_with_host ... ok test http11_absolute_uri_differs_from_host ... ok test http1_head_responses ... ok test http1_bodyless_responses ... ok test http1_removes_connection_headers ... ok test http1_content_length_zero_is_preserved ... ok test http1_one_connection_per_host ... ok test http1_response_end_of_file ... ok test http1_requests_without_body_doesnt_add_transfer_encoding ... ok test inbound_tcp ... ok test inbound_http1 ... 
ok test http1_requests_without_host_have_unique_connections ... ok test outbound_tcp ... ok test tcp_connections_close_if_client_closes ... ok test tcp_with_no_orig_dst ... ok test outbound_http1 ... ok test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out Doc-tests conduit-proxy running 0 tests test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out ``` I'm wrapping the `cd` and `cargo test` command in a subshell so that the CWD on Travis is still in the repo root when the command exits, but the return value from `cargo test` is propagated. Closes #625	2018-03-26 17:11:06 -07:00
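The subshell idiom in the commit above, `(cd proxy && exec cargo test ...)`, has two properties: the caller's working directory is untouched, and the command's exit status is passed back. A minimal Python sketch of the same idea follows; `run_in_dir` is a hypothetical helper, and the child command here is a stand-in for `cargo test` so that the example runs anywhere.

```python
import os
import subprocess
import sys
import tempfile


def run_in_dir(cmd, directory):
    """Run `cmd` with `directory` as its working directory.

    Only the child process changes directory; the caller's CWD is left
    alone, and the child's exit status is returned to the caller.
    """
    return subprocess.run(cmd, cwd=directory).returncode


if __name__ == "__main__":
    before = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for `cargo test --no-default-features`: exits with status 3.
        status = run_in_dir([sys.executable, "-c", "import sys; sys.exit(3)"], workdir)
    print(status, os.getcwd() == before)
```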
Brian Smith	7247ffeee3	Proxy: Clarify destination test support code queue handling (#617 ) Use `VecDeque` to make the queue structure clear. Follow good practice by minimizing the amount of time the lock is held. Clarify how defaulting logic works. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-26 10:45:05 -10:00
Oliver Gould	006360aa90	Skip flaky tests for #613 (#614 ) The metrics endpoint tests are flaky because there are no guarantees that the metrics pipeline has processed events before the metrics endpoint is read. This can cause CI to fail spuriously. Disable these tests from running in CI until #613 is resolved.	2018-03-25 14:26:14 -07:00
Oliver Gould	c5179ba10b	Remove references to `cli` images (#611 ) CI builds on master have been failing to publish `cli-bin` images because the `docker-push` script still refers to the `cli` image, though it was removed in `e7c4a9d4b9`. This change removes references to the `cli` image from all scripts.	2018-03-25 09:46:34 -07:00
Andrew Seigner	12c6531546	Update docker-compose environment to match prod (#609 ) The Prometheus config in the docker-compose environment had fallen behind the prod setup. This change updates the docker-compose environment in the following ways: - Prometheus config more closely matches prod, based on #583 - simulate-proxy labels match prod, based on #605 - add Grafana container Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-23 17:00:39 -07:00
Andrew Seigner	291d8e97ab	Move injected data from env var to k8s labels (#605 ) The inject code detects the object it is being injected into, and writes self-identifying information into the CONDUIT_PROMETHEUS_LABELS environment variable, so that conduit-proxy may read this information and report it to Prometheus at collection time. This change puts the self-identifying information directly into Kubernetes labels, which Prometheus already collects, removing the need for conduit-proxy to be aware of this information. The resulting label in Prometheus is recorded in the form `k8s_deployment`. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-23 16:11:34 -07:00
Eliza Weisman	9321932918	Add request_duration_ms metric and increment request_total on request end (#589 ) This PR adds the `request_duration_ms` metric to the Prometheus metrics exported by the proxy. It also modifies the `request_total` metric so that it is incremented when a request stream finishes, rather than when it opens, for consistency with how the `response_total` metric is generated. Making this change required modifying `telemetry::sensors::http` to generate a `StreamRequestEnd` event similar to the `StreamResponseEnd` event. This is done similarly to how sensors are added to response bodies, by generalizing the `ResponseBody` type into a `MeasuredBody` type that can wrap a request or response body. Since this changed the type of request bodies, it necessitated changing request types pretty much everywhere else in the proxy codebase in order to fix the resulting type errors, which is why the diff for this PR is so large. Closes #570	2018-03-22 15:27:34 -07:00
Andrew Seigner	fb1d6a5c66	Introduce Conduit Health dashboard (#591 ) In addition to dashboards that display service health, we need a dashboard to display health of the Conduit service mesh itself. This change introduces a conduit-health dashboard. It currently only displays health metrics for the control plane components. Proxy health will come later. Fixes #502 Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-22 15:16:03 -07:00
Alex Leong	d50550515e	Add the proxy pod owner as a Prometheus label (#448 ) Update the inject command to set a CONDUIT_PROMETHEUS_LABELS proxy environment variable with the name of the pod spec that the proxy is injected into. This will later be used as a label value when the proxy is exposing metrics. Fixes: #426 Signed-off-by: Alex Leong <alex@buoyant.io>	2018-03-22 15:10:51 -07:00
Eliza Weisman	3f10c80256	Fix double comma in outbound metrics (#601 ) Fixes #600 The proxy metrics endpoint has a bug where metrics recorded in the outbound direction can contain two commas in a row when no outbound label is present. This occurs because the code for formatting the outbound direction label mistakenly assumed that there would always be a destination pod owner label as well, but the proxy isn't currently aware of the destination's pod owner (waiting for #429). I've fixed this issue by moving the place where the comma is output from the `fmt::Display` impl for `RequestLabels` to the `fmt::Display` impl for `OutboundLabels`. This way, the comma between the `direction` and `dst_` labels is only output when the `dst_` label is present. This bug made it to master since all of the proxy end-to-end tests for metrics only test the inbound router. I've rectified this issue by adding tests on the outbound router as well (which would fail against the current master due to the double comma bug). I've also added a test that asserts there are no double commas in exported metrics, to protect against regressions to this bug.	2018-03-22 14:17:10 -07:00
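The fix described in the commit above comes down to who owns the separator: the optional label group emits its own leading comma, so nothing is written when the group is empty. The following Python sketch models that idea only; the proxy's real code is Rust `fmt::Display` impls, and the function names and the `dst_deployment` label used here are hypothetical.

```python
def format_dst_labels(dst_labels):
    """Format the optional destination labels, including the leading comma.

    Returns an empty string when there are no destination labels, so the
    caller never emits a dangling separator.
    """
    if not dst_labels:
        return ""
    return "," + ",".join(f'{key}="{value}"' for key, value in sorted(dst_labels.items()))


def format_request_labels(direction, dst_labels=None):
    """Format the direction label followed by any destination labels."""
    return f'direction="{direction}"' + format_dst_labels(dst_labels)


if __name__ == "__main__":
    print(format_request_labels("outbound"))
    print(format_request_labels("outbound", {"dst_deployment": "web"}))
```

A regression check in the spirit of the commit's new test is simply that `",,"` never appears in the formatted output.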
Andrew Seigner	c03508ba8c	Update Prometheus to scrape data and control plane (#583 ) The existing telemetry pipeline relies on Prometheus scraping the Telemetry service, which will soon be removed. This change configures Prometheus to scrape the conduit proxies directly for telemetry data, and the control plane components for control-plane health information. This affects the output of both conduit install and conduit inject. Fixes #428, #501 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-22 13:58:11 -07:00
Dennis Adjei-Baah	b90668a0b5	Modify simulate proxy to expose prometheus metrics (#576 ) The simulate-proxy script pushes metrics to the telemetry service. This PR modifies the script to expose metrics to a Prometheus endpoint. This functionality creates a server that randomly generates response_total, request_total, response_duration_ms and response_latency_ms. The server reads pod information from a k8s cluster and picks a random namespace to use for all exposed metrics. Tested out these changes with a locally running Prometheus server. I also ran the docker-compose.yml to make sure metrics were being recorded by the Prometheus docker container. fixes #498 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-21 16:40:12 -07:00
Eliza Weisman	1c9ce4d118	Add Prometheus /metrics endpoint to proxy (#569 ) This PR adds an endpoint to the proxy that serves metrics in Prometheus' text exposition format. The endpoint currently serves the `request_total`, `response_total`, `response_latency_ms`, and `response_duration_ms` metrics, as described in #536. The endpoint's port and address are configurable with the `CONDUIT_PROXY_METRICS_LISTENER` environment variable. Tests have been added in `tests/telemetry.rs`	2018-03-21 16:19:32 -07:00
Brian Smith	359460c826	Proxy: Download fewer crates in Travis CI runs (#597 ) Follow up https://github.com/runconduit/conduit/pull/593 by avoiding `cargo fetch` in favor of the implicit fetch done in `cargo build` to work around the lack of a `--target` flag in `cargo fetch`. This should at least slightly improve the speed and reliability of Travis CI runs. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-21 10:43:53 -10:00
Brian Smith	a9589ecf99	Update tempdir dependency to improve deps situation. (#596 ) Replace an unconditional dependency on windows-specific crates in tempdir (via its update of its remove_dir_all dependency), which eliminates the need to download any windows-specific crates during the build when targeting non-Windows platforms. Also, when targeting Windows platforms, replace a winapi 0.2.x dependency with a winapi 0.3.x dependency. This results in two fewer downloads during Docker builds: ```diff - Downloading winapi v0.2.8 - Downloading winapi-build v0.1.1 ``` Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-21 10:43:32 -10:00
Brian Smith	d38a2acff8	Avoid `cargo fetch --locked` in proxy/Dockerfile. (#593 ) `cargo fetch` doesn't consider the target platform and downloads all crates needed to build for any target. Stop using `cargo fetch` and instead use the implicit fetch done by `cargo build`, which does consider the target platform. This change results in 12 (soon 15) fewer crates downloaded. This is a non-trivial savings in build time for a full rebuild since cargo downloads crates in parallel. ```diff - Downloading bitflags v1.0.1 - Downloading fuchsia-zircon v0.3.3 - Downloading fuchsia-zircon-sys v0.3.3 - Downloading miow v0.2.1 - Downloading redox_syscall v0.1.37 - Downloading redox_termios v0.1.1 - Downloading termion v1.5.1 - Downloading winapi v0.3.4 - Downloading winapi-i686-pc-windows-gnu v0.4.0 - Downloading winapi-x86_64-pc-windows-gnu v0.4.0 - Downloading wincolor v0.1.6 - Downloading ws2_32-sys v0.2.1 ``` I verified that no downloads are done during an incremental build. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-21 08:51:18 -10:00
Brian Smith	000e1fff24	Update codegen and tower-balance to remove indexmap dep. (#594 ) ```sh $ cargo update -p codegen -p tower-balance [...] Removing indexmap v0.4.1 ``` Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-21 08:28:33 -10:00
Andrew Seigner	680bf6211a	Add Grafana support to conduit dashboard command (#590 ) The existing `conduit dashboard` command supported opening the conduit dashboard, or displaying the conduit dashboard URL, via a `url` boolean flag. Replace the `url` boolean flag with a `show` string flag, with three modes: `conduit dashboard --show conduit`: default, open conduit dashboard `conduit dashboard --show grafana`: open grafana dashboard `conduit dashboard --show url`: display dashboard URLs Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-20 18:07:30 -07:00
Alex Leong	997df861a3	Add proxy metrics documentation (#536 ) * Add proxy metrics documentation Signed-off-by: Alex Leong <alex@buoyant.io>	2018-03-20 14:50:42 -07:00
Andy Hume	e6286e1bdf	cli: ensure check command has 80-character output (#587 ) Successful `conduit check` commands now take into account `[ok]` and `\n` tokens when constraining line length. Fixes #554 Signed-off-by: Andy Hume <andyhume@gmail.com>	2018-03-20 13:55:19 -07:00
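One way to read the commit above is as a width budget: the status token and the trailing newline both count against the 80-character limit. The sketch below is a guess at that arithmetic in Python, not the CLI's actual Go code; `format_check_line`, the dot filler, and the sample description are assumptions made for illustration, as is counting the newline inside the 80 characters.

```python
LINE_WIDTH = 80  # target width from the commit title


def format_check_line(description, status="[ok]"):
    """Pad `description` so description + filler + status + newline is 80 chars."""
    # The filler budget subtracts the status token and the newline as well
    # as the description itself.
    filler = LINE_WIDTH - len(description) - len(status) - len("\n")
    return description + "." * max(filler, 0) + status + "\n"


if __name__ == "__main__":
    line = format_check_line("kubernetes-api: can initialize the client")
    print(line, end="")
    print(len(line))
```

Descriptions too long for the budget are left unpadded in this sketch rather than truncated.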
Andrew Seigner	3ca8e84eec	Add Top Line and Deployment Grafana dashboards (#562 ) Existing Grafana configuration contained no dashboards, just a skeleton for testing. Introduce two Grafana dashboards: 1) Top Line: Overall health of all Conduit-enabled services 2) Deployment: Health of a specific conduit-enabled deployment Fixes #500 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-20 10:22:30 -07:00
Andy Hume	1a66e7f8f1	cli: reduce timeouts on API check requests (#586 ) Applies timeout of 5s to check request contexts. This overrides 30s timeout applied at client transport level, and stops the conduit check command from taking > 90s to complete. Fixes #553 Signed-off-by: Andy Hume <andyhume@gmail.com>	2018-03-19 17:15:01 -07:00
Alena Varkockova	b82f89f4d9	Reuse code for metrics serving in controller (#585 ) Signed-off-by: Alena Varkockova varkockova.a@gmail.com	2018-03-19 10:33:25 -07:00
Sean McArthur	cd59465366	proxy: add SIGTERM and SIGINT handlers (#581 ) When the proxy is run in a Docker container, it runs as PID 1, with no default signal handlers set up. In order to react to signals from Kubernetes about shutting down, we need to set up explicit handlers. This adds handlers for SIGTERM and SIGINT. Closes #549	2018-03-16 18:53:20 -07:00
Brian Smith	e7c4a9d4b9	Remove the cli docker image (#579 ) This image isn't used. It references its base image using the `latest` tag, which is wrong; it should have been using the tag that the base image was built with. It is likely that the last few iterations of this image that we've published have wrong and useless contents. With that in mind, just remove the image. Fixes #578. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-16 14:22:46 -10:00
Brian Smith	c5cf53d022	Update release notes for v0.3.1. (#574 ) * Update release notes for v0.3.1. Signed-off-by: Brian Smith <brian@briansmith.org> * Add PR authorship for non-Buoyant contributors. Signed-off-by: Brian Smith <brian@briansmith.org> * Wrap lines at 100 characters. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-15 08:50:09 -10:00
Carl Lerche	35f61dcc56	Proxy: Upgrade h2 and indexmap crates (#572 ) In order to pick up a bugfix in h2, upgrade: h2 0.1.2 indexmap 1.0.0 Signed-off-by: Carl Lerche <me@carllerche.com>	2018-03-14 12:35:38 -07:00
Brian Smith	216efaa568	Stop collapsing Cargo.lock in GitHub PR reviews. (#551 ) We should review changes to Cargo.lock to ensure we're not adding unexpected and/or unnecessary dependencies. (Maybe we should do the same for Gopkg.lock, but I'm not in a position to say for sure.) Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-13 10:17:33 -07:00
Alex Leong	9eb084c99d	Most controller listeners should only bind on localhost (#494 ) * Most controller listeners should only bind on localhost * Use default listening addresses in controller components * Review feedback * Revert test_helper change * Revert use of absolute domains Signed-off-by: Alex Leong <alex@buoyant.io>	2018-03-12 11:32:20 -07:00
Eliza Weisman	8da70bb6e2	Run all discovery tests for HTTP/1 as well as HTTP/2 (#556 ) In order to ensure we catch discovery and routing issues arising from different logic for HTTP/1 and HTTP/2 requests, I've modified tests/discovery.rs to run all applicable tests with both HTTP/1 and HTTP/2 requests. The tests themselves are largely unchanged, but now there are separate modules containing HTTP/1 and HTTP/2 versions of a majority of the tests.	2018-03-09 17:24:48 -08:00
Eliza Weisman	d62a869e68	Fix outbound HTTP/1 requests not using Destinations (#555 ) Commit `569d6939a7` introduced a regression that caused the proxy to stop using the Destination service for outbound HTTP/1 requests with no authority in the request URI but a valid authority in the `Host:` header. The bug is due to some code in `Outbound::recognize` which assumed that a request had already been passed through `normalize_our_view_of_uri`. This was valid at one point while I was writing #492, as URIs were normalized prior to `recognize` and a request `Extension` was used to mark that they had been rewritten, and the host header and request URI could be assumed to be in agreement, but after merging #514 into the dev branch for #492, this behaviour changed and I forgot to update the logic in `recognize`. I've fixed the issue by adding the logic for routing on `Host:` headers back into `Outbound::recognize`. @seanmonstar added a test in `discovery.rs`, `outbound_http1_asks_controller_about_host`, which should exercise this case. I've added a couple more unit tests in that file to try and ensure we cover more of the different cases that can occur here. Fixes #552	2018-03-09 16:25:19 -08:00
Brian Smith	9cdc485ee4	Proxy: Update deps to improve logging and remove slab 0.3 & ordermap deps. (#550 ) Improve per-module logging (reportedly log 0.3 doesn't work with env_logger 0.5 as well as log 0.4 does in this respect) and eliminate unnecessary dependencies. ``` cargo update -p mio cargo update -p tokio-io cargo update -p tower cargo update -p tower-h2 cargo update -p tower-grpc ``` This removes (partial output of the above `cargo update` commands): ``` Removing log v0.3.9 Removing ordermap v0.2.13 Removing ordermap v0.3.5 Removing slab v0.3.0 ``` Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-08 18:38:10 -10:00
Sean McArthur	83d6a1f579	proxy: improve transparency of host headers and absolute-uris (#535 ) In some cases, we would adjust an existing Host header, or add one. And in all cases when an HTTP/1 request was received with an absolute-form target, it was not passed on. Now, the Host header is never changed. And if the Uri was in absolute-form, it is sent in the same format. Closes #518	2018-03-08 13:15:21 -08:00
Carl Lerche	1b4a426d16	Proxy: Update h2 dependency. (#539 ) The h2 crate (HTTP/2.0 client and server) has a new release which includes bug fixes and stability improvements. This updates the Cargo.lock file to include the new release. Closes #538 Signed-off-by: Carl Lerche <me@carllerche.com>	2018-03-08 12:59:27 -08:00
Eliza Weisman	6af9871f13	Fix infinite loop in `tcp::HalfDuplex::copy_into()` (#537 ) An infinite loop exists in the TCP proxy, which could be triggered by any raw TCP connection (including HTTPS requests). The connection will be proxied successfully, but instead of closing, it will remain open, and the proxy's CPU usage will remain extremely high indefinitely. Since `Duplex::poll` will call `half_in.copy_into()`/`half_out.copy_into()` repeatedly, even after they return `Async::Ready`, when one half has shut down and returned ready, it may still be polled again, as `Duplex::poll` waits until _both_ halves have returned `Ready`. Because of the guard that `!dst.is_shutdown`, intended to prevent the destination from shutting down twice, the function will not return if it is polled again after returning `Async::Ready` once. I've fixed this by moving the guard against double shutdowns out of the loop, so that the function will return `Async::Ready` again if it is polled after shutting down the destination. I've also included a unit test against regressions to this bug. The unit test fails against master. Fixes #519 Signed-off-by: Eliza Weisman <eliza@buoyant.io> Co-Authored-By: Andrew Seigner <andrew@sig.gy>	2018-03-08 12:43:19 -08:00
Brian Smith	3a73411375	Proxy: Test & document localhost. name resolution. (#531 ) * Proxy: Test & document localhost. name resolution. Closes #358. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 17:40:39 -10:00
Brian Smith	7aa1d0b26d	Proxy: Don't resolve absolute names outside zone using Destinations (#530 ) * Proxy: Don't resolve absolute names outside zone using Destinations service Many absolute names were being resolved using the Destinations service due to a logic error in the proxy's matching of the zone to the default zone. Fix that bug. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 14:53:32 -10:00
Brian Smith	649e784d9c	Simplify cluster zone suffix handling in the proxy (#528 ) * Temporarily stop trying to support configurable zones in the proxy. None of the zone configuration is tested and lots of things assume the cluster zone is `cluster.local`. Further, how exactly the proxy will actually learn the cluster zone hasn't been decided yet. Just hard-code the zone as "cluster.local" in the proxy until configurable zones are fully implemented and tested to be working correctly. Signed-off-by: Brian Smith <brian@briansmith.org> * Remove the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting The way that Kubernetes configures DNS search suffixes has some negative consequences as some names like "example.com" are ambiguous: depending on whether there is a service "example" in the "com" namespace, "example.com" may refer to an external service or an internal service, and this can fluctuate over time. In recognition of that we added the CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN setting, thinking this would be part of a solution for users to opt out of the unfortunate behavior if their applications didn't depend on the DNS search suffix feature. It turns out similar effects can be achieved using a custom dnsConfig, starting in Kubernetes 1.10 when dnsConfig reaches the beta stability level. Now any CONDUIT_PROXY_DESTINATIONS_AUTOCOMPLETE_FQDN-based solution seems duplicative. Further, attempting to support it optionally made the code complex and hard to read. Therefore, let's just remove it. If/when somebody actually requests this functionality then we can add it back, if dnsConfig isn't a valid alternative for them. Signed-off-by: Brian Smith <brian@briansmith.org> * Further hard-code "cluster.local" as the zone, temporarily. Addresses review feedback. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 14:30:13 -10:00
Dennis Adjei-Baah	ad42f2f8ab	Retry k8s watch endpoints on error (#510 ) Shortly after Conduit is installed in a k8s environment, the control plane components that establish watch endpoints with k8s run into networking issues during proxy initialization. During a failure, each watcher fails to retry its connection to the k8s watch endpoint, which leads to timeouts and, eventually, multiple controller pod restarts. This PR adds retry logic to each "watch"-enabled package. Fixes #478 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-07 13:40:43 -08:00
Brian Smith	72c6a9cab2	Proxy: Make CONDUIT_PROXY_POD_NAMESPACE a required parameter. (#527 ) We will be able to simplify service discovery in the near future if we can rely on the namespace being available. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 11:12:05 -10:00
Brian Smith	0d4ab39ce7	Revert "Make absolute names truly absolute. (#525 )" (#533 ) This reverts commit `517616a166`. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 10:57:10 -10:00
Brian Smith	82f9db7deb	Patch prost-derive 0.3.2 to current master to prune dependencies. (#526 ) Pick up https://github.com/danburkert/prost/pull/87, which results in the following reduction in build dependencies for the proxy: Removing failure_derive v0.1.1 Adding prost-derive v0.3.2 (https://github.com/danburkert/prost#3427352e) Removing prost-derive v0.3.2 Removing quote v0.3.15 Removing syn v0.11.11 Removing synom v0.11.3 Removing synstructure v0.6.1 Removing unicode-xid v0.0.4 Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 10:27:00 -10:00
Brian Smith	517616a166	Make absolute names truly absolute. (#525 ) Kubernetes will do multiple DNS lookups for a name like `proxy-api.conduit.svc.cluster.local` based on the default search settings in /etc/resolv.conf for each container: 1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A 2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A 3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A 4. proxy-api.conduit.svc.cluster.local. IN A We do not need or want this search to be done, so avoid it by making each name absolute by appending a period so that the first three DNS queries are skipped for each name. The case for `localhost` is even worse because we expect that `localhost` will always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default search is done: 1. localhost.conduit.svc.cluster.local. IN A 2. localhost.svc.cluster.local. IN A 3. localhost.cluster.local. IN A 4. localhost. IN A Avoid these unnecessary DNS queries by making each name absolute, so that the first three DNS queries are skipped for each name. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 09:46:03 -10:00
Kevin Lingerfelt	47fc2eae20	Set -logtostderr flag on controller components (#524 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-07 10:18:15 -08:00
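
The DNS search behavior described in commit 517616a166 above can be illustrated with a small sketch. This is not code from the repository: `candidate_queries` is a hypothetical helper that models, in simplified form, how a stub resolver applies the `search` and `ndots` settings from /etc/resolv.conf. The value `ndots = 5` is an assumption (it is the usual setting for Kubernetes pods), and real resolvers have further rules that are not modeled here.

```rust
/// Hypothetical helper (not from the Conduit code base): returns the fully
/// qualified names a stub resolver would try, in order, under a simplified
/// model of the resolv.conf `search` and `ndots` options.
fn candidate_queries(name: &str, search: &[&str], ndots: usize) -> Vec<String> {
    // A trailing dot marks the name as absolute, so the search list is skipped.
    if name.ends_with('.') {
        return vec![name.to_string()];
    }

    // The name as given, made absolute.
    let as_is = format!("{}.", name);

    // The name with each search suffix appended.
    let expanded: Vec<String> = search
        .iter()
        .map(|suffix| format!("{}.{}.", name, suffix.trim_end_matches('.')))
        .collect();

    // Names with fewer than `ndots` dots are tried against the search list
    // first; names with at least `ndots` dots are tried as-is first.
    let dots = name.matches('.').count();
    let mut queries = Vec::new();
    if dots >= ndots {
        queries.push(as_is);
        queries.extend(expanded);
    } else {
        queries.extend(expanded);
        queries.push(as_is);
    }
    queries
}

fn main() {
    // Search list taken from the example in the commit message.
    let search = ["conduit.svc.cluster.local", "svc.cluster.local", "cluster.local"];
    let ndots = 5; // assumed Kubernetes default

    for name in ["proxy-api.conduit.svc.cluster.local", "proxy-api.conduit.svc.cluster.local."] {
        println!("{}", name);
        for query in candidate_queries(name, &search, ndots) {
            println!("  {} IN A", query);
        }
    }
}
```

Under this model the relative name produces the four lookups listed in the commit message, while the name with a trailing period produces only the final one, which is the saving that commit was after.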


303 Commits
