linkerd2

Commit Graph

Author	SHA1	Message	Date
Andrew Seigner	5a5c6d14ab	Update Grafana to 5.1.0, handle missing data (#876 ) Conduit 0.4.1 contained some rough edges in the Grafana deployment. This PR includes the following: - bump Grafana to 5.1.0 - fix deployment and rc graphs when no data present - fix some text sections overlapping due to scrolling Fixes #705 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-29 22:24:22 +02:00
Oliver Gould	935ccd5f78	proxy: Implement benchmarks for telemetry recording (#874 ) Before changing the telemetry implementation, we should have a means to understand the impacts of such changes. To run, you must use a nightly toolchain: ``` rustup run nightly cargo bench -p conduit-proxy -- record ```	2018-04-29 12:55:26 -07:00
Oliver Gould	64a3bb09b2	Rename `metrics::Aggregate` to `metrics::Record` (#875 ) Move `Record` into its own file.	2018-04-28 15:35:29 -07:00
Oliver Gould	9c70310406	proxy: Implement a From converter for latency::Ms (#872 ) This reduces callsite verbosity for latency measurements at the expense of a fn-level generic.	2018-04-27 17:55:44 -07:00
Eliza Weisman	9156a80a22	proxy: Add histogram unit tests (#870 ) This PR adds the unit tests for the proxy metrics module's Histogram implementation that I wrote in #775 to @olix0r's Histogram implementation added in #868. The tests weren't too difficult to adapt for the new code, and everything seems to work correctly! Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-27 17:02:13 -07:00
Oliver Gould	1cadef9516	proxy: Make `Histogram` generic over its value (#868 ) In order to support histograms measured in, for instance, microseconds, Histogram should blindly store integers without being aware of the unit. In order to accomplish this, we make `Histogram` generic over a `V: Into<u64>`, such that all values added to the histogram must be of type `V`. In doing this, we also make the histogram buckets configurable, though we maintain the same defaults used for latency values. The `Histogram` type has been moved to a new module, and the `Bucket` and `Bounds` helper types have been introduced to help make histogram logic clearer and latency-agnostic.	2018-04-27 14:43:09 -07:00
Sean McArthur	5fb4695358	proxy: wrap connections in Transport sensor before peeking (#851 ) In case there are any errors while peeking the connection to do protocol detection, the sensors will now be in place to detect them. Besides just errors, this will also allow reporting about connections that are accepted, but then immediately closed. Additionally: - add write_buf implementation for Transport sensor, can help performance for http1/http2 - add better logs for tcp connections errors - add printlns for when tests fail Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-04-27 14:18:23 -07:00
Oliver Gould	80fdb97f88	Move `Counter` and `Gauge` to their own modules (#861 ) In preparation for a larger metrics refactor, this change splits the Counter and Gauge types into their own modules. Furthermore, this makes the minor change to these types: incr() and decr() no longer return `self`. We were not actually ever using the returned self references, and I find the unit return type to more obviously indicate the side-effecty-ness of these calls. #smpfy	2018-04-26 16:49:37 -07:00
Risha Mars	f85dab8937	Upgrade antd to 3.4.3 (#855 )	2018-04-26 16:35:42 -07:00
Risha Mars	4661aaf30d	Add a namespace column to the metrics tables (#854 ) * Add a namespace column to the metrics tables, support long resource names * Add a test for GrafanaLink * Change the PodList.jsx component to not use the ListPods api	2018-04-26 16:34:59 -07:00
Andrew Seigner	97bf4fcdf2	Release Notes for 0.4.1 release. (#839 ) Also update Getting Started and Debugging docs to reflect changes in `Tap` and `Stat`. Fixes #838 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-26 13:32:41 -07:00
Eliza Weisman	309ef14a11	Add transport-level metrics to proxy-metrics.md (#742 ) This PR adds a description of the transport level (TCP) metrics that the Conduit proxy now exposes as of `6ad0960`. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-26 09:44:52 -07:00
Risha Mars	cd8c53ca2d	Remove the deployment search bar from the sidebar (#853 ) We removed individual Deployment pages a while ago, but left the autocomplete search bar in. Clicking on searches goes to a 404 because we don't have /deployment any more. This will be revisited in the future with direct links to grafana dashboards to all the resources we support.	2018-04-25 16:28:54 -07:00
Eliza Weisman	d55e334a42	Add TCP stats to deployment dashboards (#824 ) This PR adds the TCP metrics added in #785 and #790 to the Grafana deployment dashboards. I've added three new charts in the "Inbound Traffic" and "Outbound Traffic" headings: + "TCP Connection Failures": plots the number of failed TCP connections over time + "TCP Connections Open": shows the number of accepted and opened connections currently open + "TCP Connection Duration": a heatmap of connection durations over time I'm planning on adding similar graphs to other dashboards as well in subsequent PRs.	2018-04-25 16:26:43 -07:00
Risha Mars	fbacdd8a05	Add a Replication Controllers page in the Web UI (#850 ) * Add a Replication Controllers page in the Web UI @siggy pointed out that we don't need to use the PodsList api any more, since the new stats endpoint (#671) includes meshedPodCount and totalPodCount, which is all we need to determine whether the deployment/rc has been added to the mesh (which is what we were using ListPods to determine). This PR modifies deployments to not use the pods api any more, and adds a Replication Controllers page. This page is quite similar to the Deployments page in logic, so I've made a PodOwnersList component to share the code. I haven't added Replication Controllers to the Service Mesh page yet, because that page does require a list of component pods. Also, we don't need the calls to Prometheus for the Service Mesh page, so I don't want to use the existing stat apis for it. I figure that is a large enough change for a separate PR.	2018-04-25 15:01:06 -07:00
Brian Smith	c5d2dab8bd	Remove special support for ExternalName services (#764 ) After this was implemented we found that ExternalName services are represented in DNS as CNAMEs, which means that the proxy's DNS fallback logic can be used instead of doing DNS in the control plane. Besides simplifying the controller, this will also increase fidelity with the proxied pods' DNS configuration (improve transparency). Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-25 11:53:33 -10:00
Oliver Gould	d4d0e579c2	Introduce the `peer` label to transport metrics (#848 ) Previously, the proxy exposed separate _accept_ and _connect_ metrics for some metric types, but not for all. This leads to confusing aggregations, particularly for read and write totals. This change primarily introduces the `peer` prometheus label (with possible values _src_ or _dst_) to indicate which side of the proxy the metric reflects. Additionally, the `received_bytes` and `sent_bytes` metrics have been renamed as `tcp_read_bytes_total` and `tcp_write_bytes_total`, respectively. This more naturally fits into existing idioms. Stream classification is not applied to these metrics, as we plan to increment them throughout stream lifetime and not only on close. The `tcp_connections_open` metric has also been renamed to `tcp_open_connections` to reflect Prometheus idioms. Finally, `msg1` and `msg2` have been constified in telemetry test fixtures so that tests are somewhat easier to read.	2018-04-25 14:06:33 -07:00
Brian Smith	0c23ad416b	Proxy: Use trust-dns-resolver for DNS. (#834 ) trust-dns-resolver is a more complete implementation. In particular, it supports CNAMES correctly, which is needed for PR #764. It also supports /etc/hosts, which will help with issue #62. Use the 0.8.2 pre-release since it hasn't been released yet. It was created at our request. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-25 10:04:49 -10:00
Andrew Seigner	dce31b888f	Deprecate Tap, rename TapByResource to Tap (#844 ) The `conduit tap` command is now deprecated. Replace `conduit tap` with `conduit tapByResource`. Rename tapByResource to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now returns Unimplemented. Fixes #804 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-25 12:24:46 -07:00
Eliza Weisman	3a3079d8ed	Fix assertions in metrics_compression test (#847 ) Fixes #846 The proxy `metrics_compression` test contained an assertion that a compressed scrape contained the `request_duration_ms_count` metric. This was chosen completely arbitrarily, and was only intended as an assertion that metrics were updated between compressed scrapes. Unfortunately, that metric was removed in `d9112abc93`, so when #665 merged to master, this test broke. CI didn't catch this since we don't build merges for PRs --- we should probably (re)enable this in Travis? This PR fixes the test to assert on a metric that wasn't removed. Sorry for the ❌s!	2018-04-25 11:02:52 -07:00
Risha Mars	aca09813fd	Add a Replication Controller grafana dashboard (#843 ) * Add a Replication Controller grafana dashboard, very similar to the Deployment one	2018-04-25 10:57:41 -07:00
Carl Lerche	298321a6c1	Bump proxy h2 dependency to v0.1.6. (#845 ) This release includes a number of bug fixes related to HTTP/2.0 stream management. Signed-off-by: Carl Lerche <me@carllerche.com>	2018-04-25 08:40:42 -07:00
Andrew Seigner	640570cd6b	Make TapByResource output destination pod (#837 ) The TapByResource command now has access to destination labels from the proxy, but was not outputting them on the cli. Modify the TapByResource output to print the destination pod label, rather than the ip, when available. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-24 18:58:40 -07:00
Andrew Seigner	a0a9a42e23	Implement Public API and Tap on top of Lister (#835 ) public-api and tap were both using their own implementations of the Kubernetes Informer/Lister APIs. This change factors out all Informer/Lister usage into the Lister module. This also introduces a new `Lister.GetObjects` method. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-24 18:10:48 -07:00
Eliza Weisman	a2c60e8fcf	Add optional GZIP compression to proxy /metrics endpoint (#665 ) Closes #598. According to the Prometheus documentation, metrics export endpoints should support serving metrics compressed using GZIP. I've modified the proxy's `/metrics` endpoint to serve metrics compressed with GZIP when an `Accept-Encoding: gzip` request header is sent. I've also added a new unit test that attempts to get the proxy's metrics endpoint as GZIP, and asserts that the metrics are decompressed successfully. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-24 17:42:50 -07:00
Oliver Gould	3799debab5	Add destination labels to all relevant tap events (#840 ) The proxy incorrectly only added labels to response events. Destination labels should be added to all tap events sent by the proxy.	2018-04-24 17:15:13 -07:00
Eliza Weisman	5d29c270bf	proxy: Add tcp_connections_open gauge (#791 ) Depends on #785. This PR adds the `tcp_connections_open` gauge to the proxy's TCP metrics. It also adds some tests for that metric.	2018-04-24 10:17:48 -07:00
Sean McArthur	b053b16e9d	proxy tests: reduce some boilerplate, improve error information (#833 ) The `controller` part of the proxy will now use a default, removing the need to pass the exact same `controller::new().run()` in every test case. The TCP server and client will include their socket addresses in some panics. Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-04-23 18:01:51 -07:00
Andrew Seigner	03d4684d3b	Introduce K8s Lister, integrate simulate-proxy (#829 ) The Kubernetes client-go Informer/Lister APIs are implemented in several parts of the code base. This change introduces a Lister module, providing Informer/Lister capability through a simple interface. Once this merges, we can follow up with moving public-api and tap onto Lister. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:44:19 -07:00
Andrew Seigner	baf4ea1a5a	Implement TapByResource in Tap Service (#827 ) The TapByResource endpoint was previously a stub. Implement end-to-end tapByResource functionality, with support for specifying any kubernetes resource(s) as target and destination. Fixes #803, #49 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:13:26 -07:00
Eliza Weisman	d9112abc93	proxy: remove unused metrics (#826 ) This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md` Closes #821	2018-04-23 16:05:20 -07:00
Andrew Seigner	39eccb09e2	cli: standardize kubernetes resource parsing (#830 ) The Tap command leveraged new cli parsing code, enabling Kubernetes resources specified as `(TYPE [NAME] \| TYPE/NAME)`. The Stat command did not use this. Modify the Stat command to use the same cli flag parsing code as Tap. Remove the to/from-resource flags from Stat. Fixes #792 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 15:17:42 -07:00
Eliza Weisman	455c99d4d4	Ignore flaky metrics tests on CI (#832 ) Fixes #831. Proxy metrics tests `transport::inbound_tcp_accept` and `transport::inbound_tcp_duration` are known to be flaky and should be ignored on CI. Note that the outbound versions of these tests were already marked as flaky, so this was almost certainly either an oversight or the result of an incorrect merge. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-23 14:29:34 -07:00
Eliza Weisman	79304cdcaf	proxy: Unbreak process_start_time_seconds metric (#825 ) The refactoring of how metrics are formatted in `674ce87588` inadvertently introduced a bug that caused the `process_start_time_seconds` metric to be formatted as just a number without the metric name. This causes Prometheus to fail with a parse error rather than accepting the metrics. I've fixed this issue, and added a unit test to detect regressions in the future.	2018-04-20 15:59:08 -07:00
Eliza Weisman	8147a363e9	Make simulate-proxy match proxy output (#822 ) This PR makes two changes to the `simulate-proxy` script: 1. Removed the `protocol={"http", "tcp"}` label from TCP metrics. The proxy no longer adds this label (see https://github.com/runconduit/conduit/pull/785#discussion_r182563499). 2. Fixed failed responses being labeled with `classification="fail"` rather than `classification="failure"` (the label the proxy sets). I noticed that while I was here and decided to fix it as well. Note that the first change required some minor changes to the `proxyMetricCollectors` struct in `simulate-proxy`; since the label cardinality for TCP open stats decreased by one due to removing the `protocol` label, it's no longer necessary for that struct to have `CounterVec`/`GaugeVec` pointers for these stats. It now owns the actual `Counter`/`Gauge` instead. This means that the metric vecs that are created to be labeled for `inbound` and `outbound` are now stored as variables in the `newSimulatedProxy` function rather than going in a `proxyMetricCollectors` struct first. This shouldn't impact behaviour at all.	2018-04-20 12:11:57 -07:00
Brian Smith	ef5fac2109	Remove docs for never-implemented Destination service heartbeat (#820 ) Fixes #610. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-20 04:54:33 -10:00
Eliza Weisman	c19da70965	proxy: Add classifications to TCP close stats (#790 ) This PR adds a `classification` label to transport level metrics collected on transport close. Like the `classification` label on HTTP response metrics, the value may be either `"success"` or `"failure"`. The label value is determined based on the `clean` field on the `TransportClose` event, which indicates whether a transport closed cleanly or due to an error. I've updated the tests for transport-level metrics to reflect the addition of the new label. I'd like to also modify the test support code to allow us to close transports with errors, in order to test that the errors are correctly classified as failures.	2018-04-19 19:01:48 -07:00
Andrew Seigner	326d9f493c	Fix top-line Grafana counts (#815 ) The top-line single stat numbers were not calculated properly, resulting in inflated counts. Modify the underlying Prometheus queries to ensure accurate counts of Deployments, Pods, and Namespaces. Fixes #801. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 18:01:06 -07:00
Andrew Seigner	1f32f130de	Upgrade Prometheus from 2.1.0 to 2.2.1 (#816 ) There have been a number of performance improvements and bug fixes since v2.1.0. Bump our Prometheus container to v2.2.1. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 18:00:53 -07:00
Oliver Gould	b968682a10	proxy: Support destination label matching for tap (#817 ) Now, the tap server may specify that requests should be matched by destination label. For example, if the controller's Destination service returns the labels: `{"service": "users", "namespace": "prod"}` for an endpoint, then tap would be able to specify a match like `namespace=prod` to match requests destined to that namespace.	2018-04-19 17:58:45 -07:00
Andrew Seigner	c26955186b	Introduce service-centric Grafana dashboard (#810 ) Conduit's Grafana currently provides Top-line, Deployment, Pod, and Mesh Health dashboards. This change adds a new Conduit Service dashboard. In addition to top-line information, this dashboard focuses primarily on requests to a Service, as only dst_service is available in our metrics. Part of #706 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 17:32:42 -07:00
Eliza Weisman	674ce87588	proxy: Add transport-level metrics (#785 ) This branch adds all the transport-level Prometheus metrics as described in #742, with the exception of the `tcp_connections_open` gauge (to be added in a subsequent branch). A brief description of the metrics added in this branch: * `tcp_accept_open_total`: counter of the number of connections accepted by the proxy * `tcp_accept_close_total`: counter of the number of accepted connections that have closed * `tcp_connect_open_total`: counter of the number of connections opened by the proxy * `tcp_connect_close_total`: counter of the number of connections opened by the proxy that have been closed. * `tcp_connection_duration_ms`: histogram of the total duration of each TCP connection (incremented on connection close) * `sent_bytes`: counter of the total number of bytes sent on TCP connections (incremented on connection close) * `received_bytes`: counter of the total number of bytes received on TCP connections (incremented on connection close) These metrics are labeled with the direction (inbound or outbound) and whether the connection was proxied as raw TCP or corresponds to an HTTP request. Additionally, I've added several proxy tests for these metrics. Note that there are some cases which are currently untested; in particular, while there are tests for the `tcp_accept_close_total` counter, it's more difficult to test the `tcp_connect_close_total` counter, due to connection pooling. I'd like to improve the tests for this code in additional branches.	2018-04-19 17:27:43 -07:00
Oliver Gould	689c42263a	proxy: Add destination labels to TapEvents (#814 ) The Tap API supports key-value labels on endpoint metadata. The proxy was not setting these labels previously. In order to add these labels onto tap events, we store the original set of labels in an `Arc<HashMap>` on `DstLabels`. When tap events are emitted, the destination's labels are copied from the `DstLabels` into each event.	2018-04-19 16:57:40 -07:00
Andrew Seigner	79bdc638b3	Service support in stat command (#809 ) The `stat` command did not support `service` as a resource type. This change adds `service` support to the `stat` command. Specifically: - as a destination resource on `--to` commands - as a target resource on `--from` commands Fixes #805 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 16:51:20 -07:00
Oliver Gould	0e39d6d8fa	proxy: Remove the `Labeled` middleware in favor of client context labels (#812 ) The `Labeled` middleware is used to add `DstLabels` to each request. Now that each client context maintains a watch on its endpoint's `DstLabels`, the `Labeled` middleware can safely be removed. This has one subtle behavior change: labels are associated with requests _lazily_, whereas before they were determined _eagerly_. This means that if an endpoint's labels are updated before the telemetry system captures the labels for the request, it may use the newer labels. Previously, it would only use the labels at the time that the request originated.	2018-04-19 15:36:01 -07:00
Eliza Weisman	6eec6256f7	Add transport-level metrics to simulate-proxy (#811 ) This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI. Closes #793	2018-04-19 15:18:43 -07:00
Andrew Seigner	293e00bc3e	Introduce tapByResource cli command (#802 ) The existing `tap` command is being deprecated. Introduce a `tapByResource` cli command. It supports tapping a Kubernetes resource or collection of resources, optionally filtered by outbound resources. This command will eventually replace `tap`. Part of #778 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 14:44:23 -07:00
Oliver Gould	c76dd1caea	proxy: Track destination labels in client ctx (#799 ) Currently, only the request context holds destination labels. However, destination labels are more accurately associated with the client context, since the client context is what tracks the remote peer address (and destination labels are associated with this address). No functional changes.	2018-04-19 14:22:13 -07:00
Oliver Gould	2238c91e92	proxy: Introduce the control::discovery::Endpoint type (#798 ) Building on #796, this creates a new `Endpoint` type that wraps `SocketAddr`. Still, no functional change has been introduced, but this sets up to move destination labels into the bind stack directly (by adding the labels watch to the `Endpoint` type).	2018-04-19 13:31:21 -07:00
Oliver Gould	491fae7cc4	proxy: Rewrite mock controller to accept a stream of dst updates (#808 ) Currently, the mock controller, which is used in tests, takes all of its updates a priori, which makes it hard to control when an update occurs within a test. Now, the controller exposes a `DstSender`, which wraps an unbounded channel of destination updates. This allows tests to trigger updates at a specific point in the test. In order to accomplish this, the controller's hand-rolled gRPC server implementation has been discarded in favor of a real gRPC destination service. This requires that the `controller-grpc` project now builds both clients and servers for the destination service. Additionally, we now build a tap client as well (assuming that we'll want to write tests against our tap server).	2018-04-19 11:01:10 -07:00
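
Commit 1cadef9516 above describes a histogram that stores plain integers without knowing their unit and whose buckets are configurable. The following is a minimal sketch of that idea in Python, not the proxy's Rust implementation: the `Histogram` and `Ms` classes, their methods, and the example bounds are all hypothetical names chosen for illustration, and `int(value)` stands in for the `Into<u64>` conversion the commit mentions.

```python
import bisect


class Ms:
    """Hypothetical unit wrapper: a latency in milliseconds."""

    def __init__(self, millis):
        self.millis = millis

    def __int__(self):
        # Plays the role of `Into<u64>`: the histogram only sees an integer.
        return self.millis


class Histogram:
    """Unit-agnostic histogram with configurable bucket bounds (illustrative)."""

    def __init__(self, bounds):
        # `bounds` are inclusive upper bounds; an implicit +Inf bucket follows.
        if list(bounds) != sorted(bounds):
            raise ValueError("bucket bounds must be sorted ascending")
        self.bounds = tuple(bounds)
        self.counts = [0] * (len(self.bounds) + 1)
        self.sum = 0

    def add(self, value):
        v = int(value)  # convert without knowing the unit
        # First bucket whose upper bound is >= v; past the end means +Inf.
        idx = bisect.bisect_left(self.bounds, v)
        self.counts[idx] += 1
        self.sum += v

    def cumulative(self):
        # Prometheus-style cumulative counts, one per bucket including +Inf.
        running, out = 0, []
        for count in self.counts:
            running += count
            out.append(running)
        return out


latency = Histogram((10, 100))
for sample in (Ms(5), Ms(10), Ms(11), Ms(1000)):
    latency.add(sample)
```

Because the histogram only ever calls `int()` on what it is given, the same class could hold microseconds or byte counts by passing a different wrapper type and a different set of bounds.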
