Commit Graph

1837 Commits

Author SHA1 Message Date
Eliza Weisman 605e68dff6
Add pretty durations to panics from `assert_eventually!` (#677)
This PR adds the pretty-printing for durations I added in #676 to the panic message from the `assert_eventually!` macro added in #669. 

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-04-06 10:49:17 -07:00
Brian Smith c31f4ba993
Remove unused conversions for Destination. (#701)
These have not been used for a while; they are dead code.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-06 07:35:35 -10:00
Brian Smith 7bc4ffd0a4
Revert "Proxy: Refactor DNS name parsing and normalization (#673)" (#700)
This reverts commit 311ef410a8.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:49:32 -10:00
Brian Smith 1b223723bc
Revert "Proxy: Refactor poll_destination() in service discovery. (#674)" (#698)
This reverts commit 4fb9877b89.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 16:36:01 -10:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Brian Smith 4fb9877b89
Proxy: Refactor poll_destination() in service discovery. (#674)
No change in behavior is intended here.

Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 13:05:11 -10:00
Brian Smith 311ef410a8
Proxy: Refactor DNS name parsing and normalization (#673)
Proxy: Refactor DNS name parsing and normalization

Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallack isn't
implemented in service discovery yet, Outbound still a temporary
workaround using FullyQualifiedName to keep things working; thta will
be removed once DNS fallback is implemented in service discovery.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-05 12:32:12 -10:00
Andrew Seigner 28d5007cdf
Harmonize Prometheus label usage (#690)
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.

Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployement`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.

Fixes #655

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 15:09:06 -07:00
Andrew Seigner 65be27c3c0
Fix ci job failing when new Docker image added (#691)
The master ci job executes a `docker-pull master` prior to building, to
bootstrap the Docker image cache. This command fails if the PR being
merged to master introduces a new Docker image, for example:
https://travis-ci.org/runconduit/conduit/jobs/362841328

This changes the master ci job to handle a `docker-pull master` failure
gracefully.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 15:01:54 -07:00
Andrew Seigner 9508e11b45
Build conduit-specific Grafana Docker image (#679)
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.

Roll our own Grafana Docker image, that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.

Fixes #567
Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 14:20:05 -07:00
Eliza Weisman 18fa42ebd0
Pretty-print durations in log messages (#676)
This branch adds simple pretty-printing to duration in log timeout messages. If the duration is >= 1 second, it's printed in seconds with a fractional part. If the duration is less than 1 second, it is printed in milliseconds. This simple formatting may not be sufficient as a formatting rule for all cases, but should be sufficient for printing our relatively small timeouts.

Log messages now look something like this:
```
ERROR 2018-04-04T20:05:49Z: conduit_proxy: turning operation timed out after 100 ms into 500
```

Previously, they looked like this:
```
ERROR 2018-04-04T20:07:26Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
```

I made this change partially because I wanted to make the panics from the `eventually!` macro added in #669 more readable.
2018-04-05 13:47:19 -07:00
Eliza Weisman 49bf01b0da
Add `assert_eventually!` macro to help de-flake telemetry tests (#669)
Closes #615.

Based on @olix0r's suggestion in https://github.com/runconduit/conduit/issues/613#issuecomment-376024744, this PR adds an `assert_eventually!` macro to retry an assertion a set number of times, waiting for 15 ms between retries. This is loosely based on ScalaTest's [eventually](http://doc.scalatest.org/1.8/org/scalatest/concurrent/Eventually.html).

I've rewritten the flaky telemetry tests to use the `assert_eventually!` macro, to compensate for delays in the served metrics being updated between client requests and metrics scrapes.
2018-04-05 11:23:34 -07:00
Eliza Weisman 6b370b4466
Split labels out of `prometheus.rs` into its own file (#680)
The proxy's `telemetry/metrics/prometheus.rs` file was starting to get long and hard to find one's way around in. I split the prometheus labels code out into a separate submodule and `RequestLabels` and `ResponseLabels` public. This seems like a reasonable division of the code, and the resultant files are much easier to read.
2018-04-04 15:49:17 -07:00
Oliver Gould 2dc964c583
Move control::discovery::Cache into its own module (#672)
The proxy's control::discovery module is becoming a bit dense in terms
of what it implements.

In order to make this code more understandable, and to be able to use a
similar caching strategy in other parts of the controller, the
`control::cache` module now holds discovery's cache implementation.

This module is only visible within the `control` module, and it now
exposes two new public methods: `values()` and
`set_reset_on_next_modification()`.
2018-04-04 14:27:04 -07:00
Eliza Weisman 01628bfa43
Fix missing comma in gRPC status code labels (#670)
Fixes the issue caught by @olix0r in https://github.com/runconduit/conduit/pull/661#issuecomment-378431155
2018-04-04 10:41:21 -07:00
Risha Mars d1a39ea6bf
Define a new telemetry Stat API (#663)
* Define a new telemetry Stat API

Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented and the original Stat deleted.
2018-04-03 14:45:58 -07:00
Brian Smith 06bf78ccdf
Use Rust 1.25 to build Docker images. (#667)
Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-03 08:22:29 -10:00
Franziska von der Goltz eff848a8cf
fix pod status and count in control plane dashboard (#659)
* fix pod status and count display in control plane dashboard section:
- the control plane would show terminated and stale deployments in the UI, this is confusing and might indicate errors
- this filters out temrinated and failed component deploys from the UI
- it is to note that pending deploys will still be counted and represented with a greyed out status dot
- Fixes: #606

Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>


Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
2018-04-03 10:39:35 -07:00
Phil Calçado 19001f8d38 Add pod-based metric_labels to destinations response (#429) (#654)
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md

Signed-off-by: Phil Calcado <phil@buoyant.io>

* Fix mock controller in proxy tests (#656)

Signed-off-by: Eliza Weisman <eliza@buoyant.io>

* Address review feedback
* Rename files in the destination package

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-02 18:36:57 -07:00
Andrew Seigner ee042e1943
Rename grafana viz to top-line (#666)
The primary Grafana dashboard was named 'viz' from a prototype.

Rename 'viz' to 'Top Line'.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-02 18:10:35 -07:00
Brian Smith df9ead9c36
Use Go 1.10.1 to build all Go code. (#650)
Go 1.10.1 is a security release.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-04-02 14:58:30 -10:00
Andrew Seigner bf721466e3
Filter out conduit controller pods from Grafana (#657)
The Grafana dashboards were displaying all proxy-enabled pods, including
conduit controller pods. In the old telemetry pipeline filtering these
out required knowledge of the controller's namespace, which the
dashboards are agnostic to.

This change leverages the new `conduit_io_control_plane_component`
prometheus label to filter out proxy-enabled controller components.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-02 17:56:12 -07:00
Sean McArthur 47f9665b8e
proxy: allow disable protocol detection on specific ports (#648)
- Adds environment variables to configure a set of ports that, when an
  incoming connection has an SO_ORIGINAL_DST with a port matching, will
  disable protocol detection for that connection and immediately start a
  TCP proxy.
- Adds a default list of well known ports: SMTP and MySQL.

Closes #339
2018-04-02 14:24:36 -07:00
Andrew Seigner 97546e0646
Modify simulate-proxy to be more pod-centric (#653)
simulate-proxy uses a deployment object from kubernetes to simulate
each proxy metrics endpoint.

Modify simulate-proxy to instead use a pod to simulate each proxy
metrics endpoint. This ensures that each metrics endpoint consistently
represents a pod in kubernetes, including it's namespace, deployment,
and label information.

This change also adds support for:
- a new `metric-ports` flag, default is `10000-10009`.
- `classification`, `pod_name`, and `pod_template_hash` labels

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-30 13:28:45 -07:00
Phil Calçado bbed49c5bd Refactor destination service and add tests in preparation to add information about labels (#645)
* Extracted logic from destination server

* Make tests follow style used elsewhere in the code

* Extract single interface for resolvers

* Add tests for k8s and ipv4 resolvers

* Fix small usability issues

* Update dep

* Act on feedback

Signed-off-by: Phil Calcado <phil@buoyant.io>
2018-03-30 11:36:48 -07:00
Brian Smith f931dec3b3
Proxy: Completely replace current set of destinations on reconnect (#632)
Previosuly, when the proxy was disconnected from the Destination
service and then reconnects, the proxy would not forget old, outdated
entries in its cache of endpoints. If those endpoints had been removed
while the proxy was disconnected then the proxy would never become
aware of that.

Instead, on the first message after a reconnection, replace the entire
set of cached entries with the new set, which may be empty.

Prior to this change, the new test
outbound_destinations_reset_on_reconnect_followed_by_no_endpoints_exists
passed already
but outbound_destinations_reset_on_reconnect_followed_by_add_none
and outbound_destinations_reset_on_reconnect_followed_by_remove_none
failed. Now all these tests pass.

Fixes #573

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-29 16:50:08 -10:00
Andrew Seigner 8fe742e2de
Update Grafana dashboards to use new proxy metrics (#637)
The Top-line and Deployment Grafana dashboards relied on the
soon-to-be-removed telemetry pipeline metrics.

Update the Grafana dashboards to query for the new, proxy-based metrics.
Grafana dashboard layouts have not changed.

Depends on #635 to render metrics.

Part of #420.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-29 13:00:01 -07:00
Brian Smith bae72c32ed
Proxy: Factor out Destination service connection logic (#631)
* Proxy: Factor out Destination service connection logic

Centralize the connection initiation logic for the Destination service
to make it easier to maintain. Clarify that the `rx` field isn't needed
prior to a (re)connect.

Signed-off-by: Brian Smith <brian@briansmith.org>

* Rename `rx` to `query`.

Signed-off-by: Brian Smith <brian@briansmith.org>

* "recoonect" -> "reconnect"

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-29 08:20:57 -10:00
Andrew Seigner 666c83e963
Add pod_name to Prometheus labels (#649)
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.

This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-29 11:07:35 -07:00
Deshi Xiao 732f3c1565 fix grafana CrashLoopBackoff on image 5.0.3 (#646)
this is a known issue with grafana in k8s. grafana/grafana:5.0.4 was just released today. update the repo from 5.0.3 to 5.0.4
fixed issues #582

Signed-off-by: Deshi Xiao <xiaods@gmail.com>
2018-03-29 09:36:11 -07:00
Oliver Gould 6e435754a1
Improve CLI docker caching (#612)
Currently, the CLI docker image copies the entire `controller`
directory, though the CLI only requires a few of its subdirectories.
This causes the CLI's docker cache to be needlessly invalidated when,
for instance, a service implementation changes.

By restricting the copied directories to `controller/{api,public,util}`,
build caching is improved.
2018-03-29 09:29:06 -07:00
Carl Lerche 288e041b8f proxy: Update h2 to 0.1.3 (#640)
Signed-off-by: Carl Lerche <me@carllerche.com>
2018-03-29 09:22:54 -07:00
Franziska von der Goltz 67fac9d240
remove toggle sorting functionality from TableComponent (#630)
remove toggle sorting functionality from TableComponent:
- tables displaying metrics allowed to toggle between being sorted and unsorted when clicking the same button. This was confusing behavior for the user.
- this PR removes the toggle functionality and introduces a BaseTable Component that extends antd's component without the capability to toggle
- Fixes: #566

Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
2018-03-28 18:01:34 -07:00
Eliza Weisman c688cf6028
Add response classification to proxy metrics (#639)
This PR adds a `classification` label to proxy response metrics, as @olix0r described in https://github.com/runconduit/conduit/issues/634#issuecomment-376964083. The label is either "success" or "failure", depending on the following rules:
+ **if** the response had a gRPC status code, *then*
   - gRPC status code 0 is considered a success
   - all others are considered failures
+ **else if** the response had an HTTP status code, *then*
  - status codes < 500 are considered success,
  - status codes >= 500 are considered failures
+ **else if** the response stream failed **then**
  - the response is a failure.

I've also added end-to-end tests for the classification of HTTP responses (with some work towards classifying gRPC responses as well). Additionally, I've updated `doc/proxy_metrics.md` to reflect the added `classification` label.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-03-28 14:49:00 -07:00
Andrew Seigner 1ed4a93b5e
Higher velocity metrics from simulate-proxy (#635)
simulate-proxy increments a single set of metrics on each iteration, and
also randomizes http status codes, leaving counters unchanged across
several collections.

Modify simuilate-proxy to increment all metrics on each iteration,
provide a 90% success rate, ensure a pod does not call itself, and
increase proxy count from 3 to 10.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-28 13:30:02 -07:00
Kevin Lingerfelt 59c75a73a9
Add tests/utils/scripts for running integration tests (#608)
* Add tests/utils/scripts for running integration tests

Add a suite of integration tests in the `test/` directory, as well as
utilities for testing in the `testutil/` directory.

You can use the `bin/test-run` script to run the full suite of tests,
and the `bin/test-cleanup` script to cleanup after the tests.

The test/README.md file has more information about running tests.

@pcalcado, @franziskagoltz, and @rmars also contributed to this change.

* Create TEST.md file at the root of the repo

* Update based on review feedback

* Relax external service IP timeout for GKE

* Update TEST.md with more info about different types of test runs

* More updates to TEST.md based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 15:06:55 -07:00
Andrew Seigner fe35509406
Clean up Prometheus labels scraped from proxy (#633)
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, appending "k8s_".

This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-27 15:01:08 -07:00
Eliza Weisman 40b9b345a5
All counters in proxy telemetry wrap on overflows (#603)
In #602, @olix0r suggested that telemetry counters should wrap on overflows, as "most timeseries systems (like prometheus) are designed to handle this case gracefully."

This PR changes counters to use explicitly wrapping arithmetic.

Closes #602.

Signed-off-by: Eliza Weisman <eliza@buoyant.io>
2018-03-27 14:03:12 -07:00
Brian Smith 7dc21f9588
Add the NoEndpoints message to the Destination API (#564)
Have the controller tell the client whether the service exists, not
just what are available. This way we can implement fallback logic to
alternate service discovery mechanisms for ambigious names.

Signed-off-by: Brian Smith <brian@briansmith.org>
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-27 10:45:41 -10:00
Eliza Weisman e7aa3d4105
Add process_start_time_seconds Prometheus metric (#628)
As described in #619. `process_start_time_seconds` is the idiomatic way of reporting to Prometheus the uptime of a process. It should contain the time in seconds since the beginning of the Unix epoch.

The proxy now exports this metric:
```
➜ http get localhost:4191/metrics
HTTP/1.1 200 OK
Content-Length: 902
Content-Type: text/plain; charset=utf-8
Date: Mon, 26 Mar 2018 22:09:55 GMT

# HELP request_total A counter of the number of requests the proxy has received.
# TYPE request_total counter

# HELP request_duration_ms A histogram of the duration of a request. This is measured from when the request headers are received to when the request stream has completed.
# TYPE request_duration_ms histogram

# HELP response_total A counter of the number of responses the proxy has received.
# TYPE response_total counter

# HELP response_duration_ms A histogram of the duration of a response. This is measured from when theresponse headers are received to when the response stream has completed.
# TYPE response_duration_ms histogram

# HELP response_latency_ms A histogram of the total latency of a response. This is measured from whenthe request headers are received to when the response stream has completed.
# TYPE response_latency_ms histogram

process_start_time_seconds 1522102089

```

Closes #619
2018-03-27 12:54:31 -07:00
Eliza Weisman bdcdfa8874
Actually skip flaky tests on CI and in Docker (#626)
Flaky proxy tests were not actually being ignored properly. This is due to our use of a Cargo workspace; as it turns out that Cargo doesn't propagate feature flags from the workspace to the crates in the workspace (see rust-lang/cargo#4753). 

If I run `cargo test --no-default-features` in the root directory, the `flaky_tests` feature is still passed, and the flaky tests still run:
```
➜ cargo test --no-default-features
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok
test inbound::tests::recognize_default_no_loop ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_orig_dst ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy-74584a35ef749a60

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/discovery-73cd0b65bd7a45ae

running 16 tests
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http1::outbound_asks_controller_api ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
server h1 error: invalid HTTP version specified
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test outbound_updates_newer_services ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::absolute_uris::outbound_times_out ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http2::outbound_times_out ... ok
ERROR 2018-03-26T20:54:09Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::outbound_times_out ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/telemetry-cb5bee2d2b94332c

running 12 tests
test metrics_endpoint_inbound_request_count ... ok
test metrics_endpoint_inbound_request_duration ... ok
test metrics_endpoint_outbound_request_count ... ok
test records_latency_statistics ... ignored
test telemetry_report_errors_are_ignored ... ok
test metrics_endpoint_outbound_request_duration ... ok
test metrics_have_no_double_commas ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok

test result: ok. 11 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/transparency-9d14bf92d8ba3700

running 19 tests
ERROR 2018-03-26T20:54:10Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http11_upgrade_not_supported ... ok
test http11_absolute_uri_differs_from_host ... ok
test http10_without_host ... ok
test http1_head_responses ... ok
test http10_with_host ... ok
test http1_connect_not_supported ... ok
test http1_bodyless_responses ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_removes_connection_headers ... ok
test http1_one_connection_per_host ... ok
test inbound_http1 ... ok
test inbound_tcp ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_host_have_unique_connections ... ok
test outbound_tcp ... ok
test tcp_with_no_orig_dst ... ok
test tcp_connections_close_if_client_closes ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy_controller_grpc-7fdac3528475b1dc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy_router-024926cac5d328ee

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/convert-ae9bd3b8fee21c85

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/futures_mpsc_lossy-4afd31454ff77b40

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy-controller-grpc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests convert

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy-router

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests futures-mpsc-lossy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

This also happens if the `-p` flag is used to run tests only in the `conduit-proxy` crate:

```
➜ cargo test -p conduit-proxy --no-default-features
   Compiling conduit-proxy v0.3.0 (file:///Users/eliza/Code/go/src/github.com/runconduit/conduit/proxy)
    Finished dev [unoptimized + debuginfo] target(s) in 17.27 secs
     Running target/debug/deps/conduit_proxy-0e0ab2829c6b743f

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test inbound::tests::recognize_default_no_loop ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test inbound::tests::recognize_orig_dst ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/conduit_proxy-74584a35ef749a60

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/discovery-73cd0b65bd7a45ae

running 16 tests
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http1::outbound_asks_controller_api ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
server h1 error: invalid HTTP version specified
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test outbound_updates_newer_services ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::absolute_uris::outbound_times_out ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http1::outbound_times_out ... ok
ERROR 2018-03-26T20:56:50Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
test http2::outbound_times_out ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/telemetry-cb5bee2d2b94332c

running 12 tests
test metrics_endpoint_inbound_request_duration ... ok
test metrics_endpoint_inbound_request_count ... ok
test metrics_endpoint_outbound_request_count ... ok
test metrics_endpoint_outbound_request_duration ... ok
test telemetry_report_errors_are_ignored ... ok
test metrics_have_no_double_commas ... ok
test inbound_sends_telemetry ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok
test metrics_endpoint_inbound_response_latency ... ok
test metrics_endpoint_outbound_response_latency ... ok
test records_latency_statistics ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/transparency-9d14bf92d8ba3700

running 19 tests
ERROR 2018-03-26T20:56:55Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http1_connect_not_supported ... ok
test http11_upgrade_not_supported ... ok
test http10_without_host ... ok
test http11_absolute_uri_differs_from_host ... ok
test http1_head_responses ... ok
test http10_with_host ... ok
test http1_bodyless_responses ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_removes_connection_headers ... ok
test http1_one_connection_per_host ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_host_have_unique_connections ... ok
test inbound_http1 ... ok
test inbound_tcp ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test outbound_tcp ... ok
test tcp_with_no_orig_dst ... ok
test tcp_connections_close_if_client_closes ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

However, if I `cd` into the `proxy` directory (so that Cargo treats the `conduit-proxy` crate as the root project, rather than the workspace) and pass the `--no-default-features` flag, the flaky tests are skipped as expected:

```
➜ (cd proxy && exec cargo test --no-default-features)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-ac198a96228a056e

running 13 tests
test fully_qualified_authority::tests::test_normalized_authority ... ok
test ctx::transport::tests::same_addr_ipv4 ... ok
test ctx::transport::tests::same_addr_ip6_compat_ipv4 ... ok
test ctx::transport::tests::same_addr_ipv6 ... ok
test ctx::transport::tests::same_addr_ip6_mapped_ipv4 ... ok
test telemetry::tap::match_::tests::tcp_from_proto ... ok
test telemetry::tap::match_::tests::http_from_proto ... ok
test transparency::tcp::tests::duplex_doesnt_hang_when_one_half_finishes ... ok
test telemetry::tap::match_::tests::tcp_matches ... ok
test inbound::tests::recognize_default_no_ctx ... ok
test inbound::tests::recognize_default_no_loop ... ok
test inbound::tests::recognize_default_no_orig_dst ... ok
test inbound::tests::recognize_orig_dst ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/conduit_proxy-41e0f900f97e194b

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/discovery-7ba7fe16345a347a

running 16 tests
test http1::absolute_uris::outbound_times_out ... ignored
test http1::outbound_times_out ... ignored
test http1::absolute_uris::outbound_reconnects_if_controller_stream_ends ... ok
test http1::outbound_reconnects_if_controller_stream_ends ... ok
test http1::absolute_uris::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::outbound_uses_orig_dst_if_not_local_svc ... ok
test http1::absolute_uris::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_asks_controller_without_orig_dst ... ok
test http1::outbound_asks_controller_api ... ok
test http1::absolute_uris::outbound_asks_controller_api ... ok
test http2::outbound_times_out ... ignored
server h1 error: invalid HTTP version specified
ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: frame with invalid size into 500
test http2::outbound_reconnects_if_controller_stream_ends ... ok
test http2::outbound_uses_orig_dst_if_not_local_svc ... ok
test http2::outbound_asks_controller_api ... ok
test http2::outbound_asks_controller_without_orig_dst ... ok
test outbound_updates_newer_services ... ok

test result: ok. 13 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/telemetry-b0763b64edd8fc68

running 12 tests
test metrics_endpoint_inbound_request_count ... ignored
test metrics_endpoint_inbound_request_duration ... ignored
test metrics_endpoint_inbound_response_latency ... ignored
test metrics_endpoint_outbound_request_count ... ignored
test metrics_endpoint_outbound_request_duration ... ignored
test metrics_endpoint_outbound_response_latency ... ignored
test records_latency_statistics ... ignored
test telemetry_report_errors_are_ignored ... ok
test metrics_have_no_double_commas ... ok
test http1_inbound_sends_telemetry ... ok
test inbound_sends_telemetry ... ok
test inbound_aggregates_telemetry_over_several_requests ... ok

test result: ok. 5 passed; 0 failed; 7 ignored; 0 measured; 0 filtered out

     Running /Users/eliza/Code/go/src/github.com/runconduit/conduit/target/debug/deps/transparency-300fd801daa85ccf

running 19 tests
ERROR 2018-03-26T21:48:32Z: conduit_proxy: turning Error caused by underlying HTTP/2 error: protocol error: unexpected internal error encountered into 500
test http1_connect_not_supported ... ok
test http11_upgrade_not_supported ... ok
test http10_without_host ... ok
test http10_with_host ... ok
test http11_absolute_uri_differs_from_host ... ok
test http1_head_responses ... ok
test http1_bodyless_responses ... ok
test http1_removes_connection_headers ... ok
test http1_content_length_zero_is_preserved ... ok
test http1_one_connection_per_host ... ok
test http1_response_end_of_file ... ok
test http1_requests_without_body_doesnt_add_transfer_encoding ... ok
test inbound_tcp ... ok
test inbound_http1 ... ok
test http1_requests_without_host_have_unique_connections ... ok
test outbound_tcp ... ok
test tcp_connections_close_if_client_closes ... ok
test tcp_with_no_orig_dst ... ok
test outbound_http1 ... ok

test result: ok. 19 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests conduit-proxy

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

```

I'm wrapping the `cd` and `cargo test` command in a subshell so that the CWD on Travis is still in the repo root when the command exits, but the return value from `cargo test` is  propagated.

Closes #625
2018-03-26 17:11:06 -07:00
Brian Smith 7247ffeee3
Proxy: Clarify destination test support code queue handling (#617)
Use `VecDeqeue` to make the queue structure clear. Follow good practice
by minimizing the amount of time the lock is held. Clarify how
defaulting logic works.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-26 10:45:05 -10:00
Oliver Gould 006360aa90
Skip flaky tests for #613 (#614)
The metrics endpoint tests are flaky because there are no guarantees
that the metrics pipeline has processed events before the metrics
endpoint is read. This can cause CI to fail spuriously.

Disable these tests from running in CI until #613 is resolved.
2018-03-25 14:26:14 -07:00
Oliver Gould c5179ba10b
Remove references to `cli` images (#611)
CI builds on master have been failing to publish `cli-bin` images because the
`docker-push` script still refers to the `cli` image, though it was removed in
e7c4a9d4b9.

This change removes references to the `cli` image from all scripts.
2018-03-25 09:46:34 -07:00
Andrew Seigner 12c6531546
Update docker-compose environment to match prod (#609)
The Prometheus config in the docker-compose environment had fallen
behind the prod setup.

This change updates the docker-compose environment in the following
ways:
- Prometheus config more closely matches prod, based on #583
- simulate-proxy labels matches prod, based on #605
- add Grafana container

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 17:00:39 -07:00
Andrew Seigner 291d8e97ab
Move injected data from env var to k8s labels (#605)
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.

This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 16:11:34 -07:00
Eliza Weisman 9321932918
Add request_duration_ms metric and increment request_total on request end (#589)
This PR adds the `request_duration_ms` metric to the Prometheus metrics exported by the proxy. It also modifies the `request_total` metric so that it is incremented when a request stream finishes, rather than when it opens, for consistency with how the `response_total` metric is generated.

Making this change required modifying `telemetry::sensors::http` to generate a `StreamRequestEnd` event similar to the `StreamResponseEnd` event. This is done similarly to how sensors are added to response bodies, by generalizing the `ResponseBody` type into a `MeasuredBody` type that can wrap a request or response body. Since this changed the type of request bodies, it necessitated changing request types pretty much everywhere else in the proxy codebase in order to fix the resulting type errors, which is why the diff for this PR is so large.

Closes #570
2018-03-22 15:27:34 -07:00
Andrew Seigner fb1d6a5c66
Introduce Conduit Health dashboard (#591)
In addition to dashboards display service health, we need a dashboard to
display health of the Conduit service mesh itself.

This change introduces a conduit-health dashboard. It currently only
displays health metrics for the control plane components. Proxy health
will come later.

Fixes #502

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-22 15:16:03 -07:00
Alex Leong d50550515e
Add the proxy pod owner as a Prometheus label (#448)
Update the inject command to set a CONDUIT_PROMETHEUS_LABELS proxy environment variable with the name of the pod spec that the proxy is injected into. This will later be used as a label value when the proxy is exposing metrics.

Fixes: #426

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-03-22 15:10:51 -07:00
Eliza Weisman 3f10c80256
Fix double comma in outbound metrics (#601)
Fixes #600 

The proxy metrics endpoint has a bug where metrics recorded in the outbound direction can contain two commas in a row when no outbound label is present. This occurs because the code for formatting the outbound direction label mistakenly assumed that there would always be a destination pod owner label as well, but the proxy isn't currently aware of the destination's pod owner (waiting for #429). 

I've fixed this issue by moving the place where the comma is output from the `fmt::Display` impl for `RequestLabels` to the `fmt::Display` impl for `OutboudnLabels`. This way, the comma between the `direction` and `dst_*` labels is only output when the `dst_*` label is present. 

This bug made it to master since all of the proxy end-to-end tests for metrics only test the inbound router. I've rectified this issue by adding tests on the outbound router as well (which would fail against the current master due to the double comma bug). I've also added a test that asserts there are no double commas in exported metrics, to protect against regressions to this bug.
2018-03-22 14:17:10 -07:00