linkerd2

Author	SHA1	Message	Date
Eliza Weisman	6eec6256f7	Add transport-level metrics to simulate-proxy (#811 ) This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI. Closes #793	2018-04-19 15:18:43 -07:00
Andrew Seigner	293e00bc3e	Introduce tapByResource cli command (#802 ) The existing `tap` command is being deprecated. Introduce a `tapByResource` cli command. It supports tapping a Kubernetes resource or collection of resources, optionally filtered by outbound resources. This command will eventually replace `tap`. Part of #778 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 14:44:23 -07:00
Oliver Gould	c76dd1caea	proxy: Track destination labels in client ctx (#799 ) Currently, only the request context holds destination labels. However, destination labels are more accurately associated with the client context, since the client context is what tracks the remote peer address (and destination labels are associated with this address). No functional changes.	2018-04-19 14:22:13 -07:00
Oliver Gould	2238c91e92	proxy: Introduce the control::discovery::Endpoint type (#798 ) Building on #796, this creates a new `Endpoint` type that wraps `SocketAddr`. Still, no functional change has been introduced, but this sets up to move destination labels into the bind stack directly (by adding the labels watch to the `Endpoint` type).	2018-04-19 13:31:21 -07:00
Oliver Gould	491fae7cc4	proxy: Rewrite mock controller to accept a stream of dst updates (#808 ) Currently, the mock controller, which is used in tests, takes all of its updates a priori, which makes it hard to control when an update occurs within a test. Now, the controller exposes a `DstSender`, which wraps an unbounded channel of destination updates. This allows tests to trigger updates at a specific point in the test. In order to accomplish this, the controller's hand-rolled gRPC server implementation has been discarded in favor of a real gRPC destination service. This requires that the `controller-grpc` project now builds both clients and servers for the destination service. Additionally, we now build a tap client as well (assuming that we'll want to write tests against our tap server).	2018-04-19 11:01:10 -07:00
Oliver Gould	926c4cf323	proxy: Make control::discovery::Bind generic over its Endpoint type (#796 ) Previously, `Bind` required that it bind to `SocketAddr` (and `SocketAddr` only). This makes it hard to pass additional information from service discovery into the client's stack. To resolve this, `Bind` now has an additional `Endpoint` trait-generic type, and `Bind::bind` accepts an `Endpoint` rather than a `SocketAddr`. No additional endpoints have been introduced yet. There are no functional changes in this refactor.	2018-04-19 11:00:28 -07:00
Oliver Gould	2097d5a1db	proxy: Cleanup control::discovery (#797 ) `set_labels` was needlessly `Arc`ed. `Metadata` does not need to be public. No functional changes.	2018-04-19 10:59:24 -07:00
Kevin Lingerfelt	653dc6bfaa	Add replication controller stats in CLI (#794 ) * Add replication controller stats in CLI * Fix pod status in stat summary tests Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 18:12:14 -07:00
Oliver Gould	06dd8d90ee	Introduce the TapByResource API (#778 ) This changes the public api to have a new rpc type, `TapByResource`. This api supersedes the Tap api. `TapByResource` is richer, more closely reflecting the proxy's capabilities. The proxy's Tap api is extended to select over destination labels, corresponding with those returned by the Destination api. Now both `Tap` and `TapByResource`'s responses may include destination labels. This change avoids breaking backwards compatibility by: * introducing the new `TapByResource` rpc type, opting not to change Tap * extending the proxy's Match type with a new, optional, `destination_label` field. * `TapEvent` is extended with a new, optional, `destination_meta`.	2018-04-18 15:37:07 -07:00
Andrew Seigner	1e4ac8fda8	Destination service provides pod-template-hash (#784 ) The Destination service does not provide ReplicaSet information to the proxy. The `pod-template-hash` label approximates selecting over all pods in a ReplicaSet or ReplicationController. Modify the Destination service to provide this label to the proxy. Relates to #508 and #741 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-18 14:41:27 -07:00
Kevin Lingerfelt	71a51afb40	Expose pod stats in CLI, web UI, and Grafana (#788 ) * Expose pod stats in CLI, web UI, and Grafana * Fix js api helpers test * Add outbound traffic stats to pod dashboard Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 11:26:47 -07:00
Eliza Weisman	2e919bf813	Move request open timestamp to the top of the stack (#744 ) Currently, the request open timestamp, which is used for calculating latency, is captured in the `sensor::http::Http` middleware. However, the sensor middleware is placed fairly low in the stack, below some of the proxy's components that can add measurable latency (e.g. the router). This PR moves the request_open timestamp out of the `Http` middleware and into a new `TimestampRequestOpen` middleware, which is installed at the top of the stack (before the router). The `TimestampRequestOpen` middleware adds the timestamp as a request extension, so that it can later be consumed by the `Http` sensor to generate the request stats. By moving the timestamping to the top of the stack, the timestamp should more accurately cover the overhead of the proxy, but a majority of the telemetry work can still be done where it was previously. I'd like to have included unit tests for this change, but since the expected improvement is in the accuracy of latency measurements, there's no easy way to test this programmatically.	2018-04-17 15:01:36 -07:00
Andrew Seigner	9e8cce0838	Destination service returns "Running" pod labels (#781 ) When the Destination sees an IP address, it looks up Pods by that IP, and associates Pod label data to it. If the lookup by IP returned more than one Pod, it simply picked the first one. This is not correct, specifically in cases where one pod is in a Running state, and others are not. Modify the Destination service to only return label data for Pods in the Running state. Fixes #773 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-17 14:42:54 -07:00
Eliza Weisman	6121afb6f2	Factor out reused test fixtures from telemetry tests (#782 ) This is a fairly minor refactor to the proxy telemetry tests. `b07b554d2b` added a `Fixture` in the Destination service labeling tests added in #661 to reduce the repetition of copied and pasted code in those tests. I've refactored most of the other telemetry tests to also use the test fixture. Significantly less code is copied and pasted now. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-17 14:15:56 -07:00
Sean McArthur	3cd16e8e40	proxy: clean up some logs and a few warnings in proxy tests (#780 ) Signed-off-by: Sean McArthur <sean@seanmonstar.com>	2018-04-17 12:53:20 -07:00
Eliza Weisman	cf2d7b1d7d	proxy: move metrics::prometheus module to root metrics module (#763 ) The proxy `telemetry::metrics::prometheus` module was initially added in order to give the Prometheus metrics export code a separate namespace from the controller push metrics. Since the controller push metrics code was removed from the proxy in #616, we no longer need a separate module for the Prometheus-specific metrics code. Therefore, I've moved that code to the root `telemetry::metrics` module, which should hopefully make the proxy source tree structure a little simpler. This is a fairly trivial refactor. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-17 11:19:27 -07:00
Andrew Seigner	727521f914	Permit arbitrary time windows in public-api (#774 ) The public-api previously only permitted 4 hard-coded time windows: 10s, 1m, 10m, 1h. This was primarily a relic of the recently removed telemetry system. Modify the public-api to validate the time string, but allow for any window size, which is then passed through to Prometheus. Fixes #686 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-16 17:37:17 -07:00
Eliza Weisman	64f4dfe07f	Refactor control::Cache and add tests (#733 ) Closes #713. This is a follow-up from #688. This PR makes a number of refactorings to the proxy's `control::Cache` module and removes all but one of the `clone` calls. The `CacheChange` enum now contains the changed key and, when applicable, a reference to the changed value. This simplifies `on_change` functions, which no longer have to take both a tuple of `(K, V)` and a `CacheChange` and can now simply destructure the `CacheChange`. Since the changed value is passed as a reference, the `on_change` function can now decide whether or not it should be cloned. This means that we can remove most of the clones previously present here. I've also rewritten `Cache::update_union` so that it no longer clones values (twice if the cache was invalidated). There's still one `clone` call in `Cache::update_intersection`, but it seems like it will be fairly tricky to remove. However, I've moved the `V: Clone` bound to that function specifically, and changed `Cache::clear` and `Cache::update_union` so that they no longer call `Cache::update_intersection` internally; as a result, they don't need a `V: Clone` bound. In addition, I've added some unit tests that check that `on_change` is called with the correct `CacheChange`s when key/value pairs are modified.	2018-04-16 16:42:55 -07:00
Brian Smith	621f3c2e56	Revert "Avoid `cargo fetch --locked` in proxy/Dockerfile. (#593 )" (#767 ) This reverts commit `d38a2acff8`. The change being reverted here did reduce downloads that occur when Cargo.lock is updated. However, it had the unwanted side-effect of invalidating at least part of the Cargo download cache when other files, including in particular files under proto/, were modified. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-16 13:27:49 -10:00
Kevin Lingerfelt	11a4359e9a	Misc cleanup following the telemetry rewrite (#771 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-16 15:51:07 -07:00
Oliver Gould	cd9f755262	v0.4.0 (#772 ) Conduit 0.4.0 overhauls Conduit's telemetry system and improves service discovery reliability. * Web UI * New automatically-configured Grafana dashboards for all Deployments. * Command-line interface * `conduit stat` has been completely rewritten to accept arguments like `kubectl get`. The `--to` and `--from` filters can be used to filter traffic by destination and source, respectively. `conduit stat` currently can operate on `Namespace` and `Deployment` Kubernetes resources. More resource types will be added in the next release! * Proxy (data plane) * New Prometheus-formatted metrics are now exposed on `:4191/metrics`, including rich destination labeling for outbound HTTP requests. The proxy no longer pushes metrics to the control plane. * The proxy now handles `SIGINT` or `SIGTERM`, gracefully draining requests until all are complete or `SIGQUIT` is received. * SMTP and MySQL (ports 25 and 3306) are now treated as opaque TCP by default. You should no longer have to specify `--skip-outbound-ports` to communicate with such services. * When the proxy reconnected to the controller, it could continue to send requests to old endpoints. Now, when the proxy reconnects to the controller, it properly removes invalid endpoints. * A bug impacting some HTTP/2 reset scenarios has been fixed. * Service Discovery * Previously, the proxy failed to resolve some domain names that could be misinterpreted as a Kubernetes Service name. This has been fixed by extending the _Destination_ API with a negative acknowledgement response. * Control Plane * The _Telemetry_ service and associated APIs have been removed. * Documentation * Updated Roadmap * Added prometheus metrics guide	2018-04-16 14:42:15 -07:00
Oliver Gould	800cefdb77	Skip the proxy on the metrics port (#770 ) When Prometheus queries the proxy for data, these requests are reported as inbound traffic to the pod. This leads to misleading stats when a pod otherwise receives little or no traffic. To prevent these requests from being proxied, the metrics port is now added to the default inbound skip-ports list (as is already the case for the tap server). Fixes #769	2018-04-16 11:54:58 -07:00
Andrew Seigner	c9cdd838dc	Standardize and polish Grafana for 0.4.0 release (#766 ) The top-line, deployments, and health Grafana dashboards had inconsistent layouts and data. This change standardizes our Grafana dashboards. Every row is composed of Success Rate, Request Rate, and Latency. Part of #420. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-13 18:01:44 -07:00
Brian Smith	0c37067554	Reduce proto dependencies in proxy/Dockerfile (#765 ) Reduce the dependencies on files under proto/ to eliminate Docker detecting false dependencies that trigger rebuilds. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-13 14:49:55 -10:00
Andrew Seigner	77fb6d3709	Add namespace as a resource type in public-api (#760 ) * Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This change also includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analogous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 16:53:01 -07:00
Oliver Gould	cc44db054f	Remove NODE_NAME and POD_NAME env usage (#758 ) * proxy: Remove pod_name and node_name * cli: Do not inject POD_NAME and NODE_NAME env vars	2018-04-13 13:09:51 -07:00
Andrew Seigner	21886760c6	Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762 ) Conduit was relying on apps/v1 for the Deployment and ReplicaSet APIs, but apps/v1 is not available on Kubernetes 1.8. This prevented the public-api from starting. Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on a test cluster. Fixes #761 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-13 12:08:16 -07:00
Kevin Lingerfelt	fb15fe7c1a	Remove the telemetry service (#757 ) * Remove the telemetry service The telemetry service is no longer needed, now that prometheus scrapes metrics directly from proxies, and the public-api talks directly to prometheus. In this branch I'm removing the service itself as well as all of the telemetry protobuf, and updating the conduit install command to no longer install the service. I'm also removing the old version of the stat command, which required the telemetry service, and renaming the statsummary command to stat. * Fix time window tests * Remove deprecated controller scrape config Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 11:21:29 -07:00
Oliver Gould	efdfc93b50	Stop pushing telemetry reports from the proxy (#616 ) Now that the controller does not depend on pushed telemetry reports, the proxy need not depend on the telemetry API or maintain legacy sampling logic.	2018-04-12 17:39:29 -07:00
Kevin Lingerfelt	37434d048a	Update web component to use new stat api (#753 ) * Update web component to use new stat api * Address review feedback * Add external link icon Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-12 17:35:03 -07:00
Andrew Seigner	e9b209829d	Handle NaN metrics (#750 ) The Prometheus client sometimes returns NaN if a calculation is invalid, such as histogram_quantile when no requests have occurred. Add IsNaN check in the public-api and set output to zero. Fixes #747 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-12 15:21:00 -07:00
Eliza Weisman	b6180d8bfe	Add unit tests for Labeled middleware (#738 ) I've added unit tests for the `Labeled` middleware used to add Destination labels in the proxy, as @olix0r requested in https://github.com/runconduit/conduit/pull/661#discussion_r179897783. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-12 15:10:01 -07:00
Eliza Weisman	61d15a6c3e	Ignore flaky telemetry tests on CI (#752 ) The tests for label metadata updates from the control plane are flaky on CI. This is likely due to the CI containers not having enough cores to execute the test proxy thread, the test proxy's controller client thread, the mock controller thread, and the test server thread simultaneously --- see #751 for more information. For now, I'm ignoring these on CI. Eventually, I'd like to change the mock controller code in test support so that we can trigger it to send a second metadata update only after the request has finished. I think this issue also makes merging #738 a higher priority, so that we can still have some tests running on CI that exercise some part of the label update behaviour.	2018-04-12 14:59:17 -07:00
Eliza Weisman	b07b554d2b	Add labels from service discovery to proxy metrics reports (#661 ) PR #654 adds pod-based metric labels to the Destination API responses for cluster-local services. This PR modifies the proxy to actually add these labels to reported Prometheus metrics for outbound requests to local services. It enhances the proxy's `control::discovery` module to track these labels and add a `LabelRequest` middleware to the service stack built in `Bind` for labeled services. Requests transiting `LabelRequest` are given an `Extension` which contains these labels, which are then added to events produced by the `Sensors` for these requests. When these events are aggregated to Prometheus metrics, the labels are added. I've also added some tests in `test/telemetry.rs` ensuring that these metrics are added correctly when the Destination service provides labels. Closes #660 Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-04-12 12:54:38 -07:00
Andrew Seigner	624b87f743	Implement ListPods in public-api (#743 ) The ListPods endpoint's logic resides in the telemetry service, which is going away. Move ListPods logic into public-api, use new k8s informer APIs. Fixes #694 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 17:53:57 -07:00
Kevin Lingerfelt	47caf1ca07	Add --all-namespaces flag to CLI statsummary command (#745 ) * Add --all-namespaces flag to CLI statsummary command * Fix statsummary output formatting Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-11 16:40:25 -07:00
Andrew Seigner	259fdcd134	Add latency stats in new stat summary endpoint (#737 ) The new StatSummary endpoint was only providing request volume and success rate information. Add support for retrieving latency stats via StatSummary. Also make all prometheus calls in parallel, and implement kubernetes test fixtures. Fixes #681 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 11:58:32 -07:00
Kevin Lingerfelt	e1e1b6b599	Controller: add more destination labels, fix service label (#731 ) * Add more destination labels, fix service label * Update owner labels to match proxy metrics docs Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-11 10:44:52 -07:00
Sean McArthur	7f54b5253d	proxy: fix flaky tcp graceful shutdown test (#735 )	2018-04-10 19:47:00 -07:00
Sean McArthur	02c6887020	proxy: improve graceful shutdown process (#684 ) - The listener is immediately closed on receipt of a shutdown signal. - All in-progress server connections are now counted, and the process will not shut down until the connection count has dropped to zero. - In the case of HTTP1, idle connections are closed. In the case of HTTP2, the HTTP2 graceful shutdown procedure is followed, which involves sending GOAWAY frames.	2018-04-10 14:15:37 -07:00
Kevin Lingerfelt	91c359e612	Switch public API to use cached k8s resources (#724 ) * Switch public API to use cached k8s resources * Move shared informer code to separate goroutine * Fix spelling issue Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-10 11:39:31 -07:00
Brian Smith	7319cf648f	Proxy: Do L7 load balancing for all external HTTP services. (#726 ) Previously when the proxy could tell, by parsing, the request-target is not in the cluster, it would not override the destination. That is, load balancing would be disabled for such destinations. With this change, the proxy will do L7 load balancing for all HTTP services as long as the request-target has a DNS name. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-10 08:07:16 -10:00
Andrew Seigner	3a341abe9a	Fix success rate calculation in public api (#723 ) The success rate calculation relies on the `classification` label, but was incorrectly specifying `fail` rather than `failure`. Fix public api to specify `failure`. Also re-org public api tests for easier Kubernetes and Prometheus mocking. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-10 11:04:04 -07:00
Brian Smith	bc16034fd6	Proxy: Fall back to using DNS when Destination service can't find service. (#692 ) Fixes #155.	2018-04-07 18:26:06 -10:00
Brian Smith	c25e9c371b	Refactor poll_destination() in service discovery. (#725 ) No change in behavior is intended here. Split poll_destination() into two parts, one that operates locally on the DestinationSet, and the other that operates on data that isn't wholly local to the DestinationSet. This makes the code easier to understand. This is being done in preparation for adding DNS fallback polling to poll_destination(). Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-07 18:15:19 -10:00
Brian Smith	7d3b715c4d	Proxy: Move DNS name normalization to service discovery (#722 ) Only the destination service needs normalized names (and even then, that's just temporary). The rest of the code needs the name as it was given, except case-normalized (lowercased). Because DNS fallback isn't implemented in service discovery yet, Outbound still uses FullyQualifiedName as a temporary workaround to keep things working; that will be removed once DNS fallback is implemented in service discovery. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-06 15:04:09 -10:00
Andrew Seigner	716b392231	Move StatSummary logic into grpc server (#717 ) The StatSummary logic was implemented as a method on http_server. Move the StatSummary logic into grpc_server, for consistency with the other endpoints. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 16:46:15 -07:00
Andrew Seigner	b6bcdcc059	Namespace-aware Grafana dashboards (#716 ) The Grafana dashboards key off of deployment, but had no awareness of namespaces, causing incorrect metrics aggregation and display. This change makes the Grafana dashboards key off of namespaces, and also modifies the Grafana links in the Conduit dashboard to link to namespace+deployment. Fixes #704 Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 15:37:53 -07:00
Kevin Lingerfelt	baa4d10c2f	CLI: change conduit namespace shorthand flag to -c (#714 ) * CLI: change conduit namespace shorthand flag to -c All of the conduit CLI subcommands accept a --conduit-namespace flag, indicating the namespace where conduit is running. Some of the subcommands also provide a --namespace flag, indicating the kubernetes namespace where a user's application code is running. To prevent confusion, I'm changing the shorthand flag for the conduit namespace to -c, and using the -n shorthand when referring to user namespaces. As part of this change I've also standardized the capitalization of all of our command line flags, removed the -r shorthand for the install --registry flag, and made the global --kubeconfig and --api-addr flags apply to all subcommands. * Switch flag descriptions from lowercase to Capital Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-06 14:47:31 -07:00
Eliza Weisman	8bc05472ed	Make `control::Cache` key-value in order to store discovery metadata (#688 ) This PR changes the proxy's `control::Cache` module from a set to a key-value map. This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type. Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements. Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata. If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.	2018-04-06 13:54:16 -07:00

391 Commits