- The listener is immediately closed on receipt of a shutdown signal.
- All in-progress server connections are now counted, and the process will
not shut down until the connection count has dropped to zero (see the
sketch below).
- In the case of HTTP1, idle connections are closed. In the case of HTTP2,
the HTTP2 graceful shutdown steps are followed, sending the appropriate
GOAWAYs.
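A minimal sketch of the connection-counting idea; the type and method names (`ConnTracker`, `start`, `finish`, `wait_for_drain`) are illustrative, and the actual accept loop and HTTP1/HTTP2 handling are not shown:
```rust
use std::sync::{Arc, Condvar, Mutex};

/// Counts in-progress connections so shutdown can wait for the count to
/// reach zero after the listener has been closed.
#[derive(Clone, Default)]
struct ConnTracker {
    inner: Arc<(Mutex<usize>, Condvar)>,
}

impl ConnTracker {
    /// Called when a connection is accepted.
    fn start(&self) {
        *self.inner.0.lock().unwrap() += 1;
    }

    /// Called when a connection completes.
    fn finish(&self) {
        let mut count = self.inner.0.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            self.inner.1.notify_all();
        }
    }

    /// Called after the shutdown signal closes the listener; blocks until
    /// every in-progress connection has finished.
    fn wait_for_drain(&self) {
        let mut count = self.inner.0.lock().unwrap();
        while *count > 0 {
            count = self.inner.1.wait(count).unwrap();
        }
    }
}
```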
* Switch public API to use cached k8s resources
* Move shared informer code to separate goroutine
* Fix spelling issue
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
Previously, when the proxy could tell by parsing the request-target that
it was not in the cluster, it would not override the destination. That is,
load balancing would be disabled for such destinations.
With this change, the proxy will do L7 load balancing for all HTTP
services as long as the request-target has a DNS name.
Signed-off-by: Brian Smith <brian@briansmith.org>
The success rate calculation relies on the `classification` label, but
the query incorrectly specified `fail` rather than `failure`.
Fix the public API to specify `failure`. Also re-org the public API tests for
easier Kubernetes and Prometheus mocking.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
No change in behavior is intended here.
Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().
Signed-off-by: Brian Smith <brian@briansmith.org>
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallback isn't
implemented in service discovery yet, Outbound still has a temporary
workaround using FullyQualifiedName to keep things working; that will
be removed once DNS fallback is implemented in service discovery.
Signed-off-by: Brian Smith <brian@briansmith.org>
The StatSummary logic was implemented as a method on http_server.
Move the StatSummary logic into grpc_server, for consistency with the
other endpoints.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Grafana dashboards key off of deployment, but had no awareness of
namespaces, causing incorrect metrics aggregation and display.
This change makes the Grafana dashboards key off of namespaces, and also
modifies the Grafana links in the Conduit dashboard to link to
namespace+deployment.
Fixes #704
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* CLI: change conduit namespace shorthand flag to -c
All of the conduit CLI subcommands accept a --conduit-namespace flag,
indicating the namespace where conduit is running. Some of the
subcommands also provide a --namespace flag, indicating the kubernetes
namespace where a user's application code is running. To prevent
confusion, I'm changing the shorthand flag for the conduit namespace to
-c, and using the -n shorthand when referring to user namespaces.
As part of this change I've also standardized the capitalization of all
of our command line flags, removed the -r shorthand for the install
--registry flag, and made the global --kubeconfig and --api-addr flags
apply to all subcommands.
* Switch flag descriptions from lowercase to Capital
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
This PR changes the proxy's `control::Cache` module from a set to a key-value map.
This change is made in order to use the values in the map to store metadata from the Destination API, but allow evictions and insertions to be based only on the `SocketAddr` of the destination entry. This will make code in PR #661 much simpler, by removing the need to wrap `SocketAddr`s in the cache in a `Labeled` struct for storing metadata, and the need for custom `Borrow` implementations on that type.
Furthermore, I've changed from using a standard library `HashSet`/`HashMap` as the underlying collection to using `IndexMap`, as we suspect that this will result in performance improvements.
Currently, as `master` has no additional metadata to associate with cache entries, the type of the values in the map is `()`. When #661 merges, the values will actually contain metadata.
If we suspect that there are many other use-cases for `control::Cache` where it will be treated as a set rather than a map, we may want to provide a separate set of impls for `Cache<T, ()>` (like `std::HashSet`) to make the API more ergonomic in this case.
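As a rough sketch of the shape described above (names and signatures are illustrative, not the proxy's actual `control::Cache` API):
```rust
use indexmap::IndexMap;
use std::hash::Hash;

/// Key-value cache; in the proxy's case the key is a `SocketAddr` and the
/// value is `()` until destination metadata is attached.
struct Cache<K: Eq + Hash, V> {
    inner: IndexMap<K, V>,
}

impl<K: Eq + Hash, V> Cache<K, V> {
    fn new() -> Self {
        Cache { inner: IndexMap::new() }
    }

    /// Insertions are keyed on `K` alone; any existing value is replaced.
    fn insert(&mut self, key: K, value: V) -> Option<V> {
        self.inner.insert(key, value)
    }

    /// Evictions need only the key, so no wrapper type or custom `Borrow`
    /// implementation is required to ignore attached metadata.
    fn evict(&mut self, key: &K) -> Option<V> {
        self.inner.swap_remove(key)
    }

    fn contains(&self, key: &K) -> bool {
        self.inner.contains_key(key)
    }
}

// Used as a set today: Cache<std::net::SocketAddr, ()>; #661 swaps () for metadata.
```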
The public-api in the docker-compose environment is not configured to
talk to Prometheus or Kubernetes, which is now required with the new
telemetry pipeline.
Modify the public-api config in docker-compose to connect to k8s and
prom.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The new statsummary command accepted friendly k8s names, which worked
for k8s queries, but Prometheus requires a specific key.
Modify the statsummary query to map friendly k8s names to canonical k8s
names when constructing the query. Then during the query, map the
canonical k8s name to a specific Prometheus label.
Fixes #695
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Link to Grafana from Conduit Dashboard
Previously the only way to access the Grafana dashboards was via direct
link, provided by the `conduit dashboard` command.
Add Grafana links throughout the Conduit Dashboard, next to all
Deployment objects. This change also modifies the behavior of the
ConduitLink helper, to enable linking to other deployments proxied by
the `conduit dashboard` command.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* review feedback
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* review feedback, fix console, remove absolute
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This PR adds the pretty-printing for durations I added in #676 to the panic message from the `assert_eventually!` macro added in #669.
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
Start implementing new conduit stat summary endpoint.
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.
Current implementation just retrieves requests and mesh/total pod count
(so latency stats are always 0).
Uses API defined in #663
Example queries that the stat endpoint will eventually satisfy are listed in #627
This branch includes commits from @klingerf
* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
No change in behavior is intended here.
Split poll_destination() into two parts, one that operates locally
on the DestinationSet, and the other that operates on data that isn't
wholly local to the DestinationSet. This makes the code easier to
understand. This is being done in preparation for adding DNS fallback
polling to poll_destination().
Signed-off-by: Brian Smith <brian@briansmith.org>
Proxy: Refactor DNS name parsing and normalization
Only the destination service needs normalized names (and even then,
that's just temporary). The rest of the code needs the name as it was
given, except case-normalized (lowercased). Because DNS fallback isn't
implemented in service discovery yet, Outbound still has a temporary
workaround using FullyQualifiedName to keep things working; that will
be removed once DNS fallback is implemented in service discovery.
Signed-off-by: Brian Smith <brian@briansmith.org>
The Destination service used slightly different labels than the
telemetry pipeline expected; specifically, they were prefixed with `k8s_*`.
Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployment`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.
Fixes #655
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The master CI job executes a `docker-pull master` prior to building, to
bootstrap the Docker image cache. This command fails if the PR being
merged to master introduces a new Docker image, for example:
https://travis-ci.org/runconduit/conduit/jobs/362841328
This changes the master CI job to handle a `docker-pull master` failure
gracefully.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.
Roll our own Grafana Docker image that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.
Fixes #567
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This branch adds simple pretty-printing for durations in log timeout messages. If the duration is >= 1 second, it is printed in seconds with a fractional part. If it is less than 1 second, it is printed in milliseconds. This simple formatting may not be sufficient as a rule for all cases, but should be enough for printing our relatively small timeouts.
Log messages now look something like this:
```
ERROR 2018-04-04T20:05:49Z: conduit_proxy: turning operation timed out after 100 ms into 500
```
Previously, they looked like this:
```
ERROR 2018-04-04T20:07:26Z: conduit_proxy: turning operation timed out after Duration { secs: 0, nanos: 100000000 } into 500
```
I made this change partially because I wanted to make the panics from the `eventually!` macro added in #669 more readable.
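For illustration, a minimal version of that formatting rule (the helper name is made up, and the real code may differ):
```rust
use std::time::Duration;

/// Formats durations >= 1 second in seconds with a fractional part, and
/// shorter durations in whole milliseconds.
fn humanize(d: Duration) -> String {
    if d >= Duration::from_secs(1) {
        let secs = d.as_secs() as f64 + f64::from(d.subsec_nanos()) * 1e-9;
        format!("{} s", secs)
    } else {
        format!("{} ms", d.subsec_nanos() / 1_000_000)
    }
}

// humanize(Duration::from_millis(100))  => "100 ms"
// humanize(Duration::from_millis(2500)) => "2.5 s"
```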
The proxy's `telemetry/metrics/prometheus.rs` file was getting long and hard to navigate. I split the Prometheus labels code out into a separate submodule and made `RequestLabels` and `ResponseLabels` public. This seems like a reasonable division of the code, and the resultant files are much easier to read.
The proxy's control::discovery module is becoming a bit dense in terms
of what it implements.
In order to make this code more understandable, and to be able to use a
similar caching strategy in other parts of the controller, the
`control::cache` module now holds discovery's cache implementation.
This module is only visible within the `control` module, and it now
exposes two new public methods: `values()` and
`set_reset_on_next_modification()`.
* Define a new telemetry Stat API
Proposed definition of a new Stat API, for the purpose of satisfying the queries proposed in #627.
StatSummary will replace Stat once implemented, at which point the original Stat will be deleted.
* fix pod status and count display in control plane dashboard section:
- the control plane section would show terminated and stale deployments in the UI, which is confusing and might suggest errors
- this filters terminated and failed component deploys out of the UI
- note that pending deploys will still be counted and represented with a greyed-out status dot
- Fixes: #606
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
* Add pod-based metric_labels to destinations response
* Add documentation on running control plane to BUILD.md
Signed-off-by: Phil Calcado <phil@buoyant.io>
* Fix mock controller in proxy tests (#656)
Signed-off-by: Eliza Weisman <eliza@buoyant.io>
* Address review feedback
* Rename files in the destination package
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
The Grafana dashboards were displaying all proxy-enabled pods, including
conduit controller pods. In the old telemetry pipeline, filtering these
out required knowledge of the controller's namespace, which the
dashboards are agnostic to.
This change leverages the new `conduit_io_control_plane_component`
prometheus label to filter out proxy-enabled controller components.
Part of #420
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
- Adds environment variables to configure a set of ports; when an
incoming connection's SO_ORIGINAL_DST port matches one of them, protocol
detection is disabled for that connection and the connection is
immediately proxied as plain TCP (see the sketch below).
- Adds a default list of well-known ports: SMTP and MySQL.
Closes #339
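A rough sketch of how such a port list might be parsed and consulted; the environment variable name and the exact default ports here are placeholders, not the proxy's actual configuration:
```rust
use std::collections::HashSet;
use std::env;

/// Parses a comma-separated list of ports from an environment variable,
/// falling back to well-known defaults (SMTP 25, MySQL 3306).
fn ports_disabling_protocol_detection() -> HashSet<u16> {
    env::var("EXAMPLE_DISABLE_PROTOCOL_DETECTION_PORTS")
        .unwrap_or_else(|_| "25,3306".to_string())
        .split(',')
        .filter_map(|p| p.trim().parse().ok())
        .collect()
}

/// If the connection's SO_ORIGINAL_DST port is in the set, the proxy skips
/// protocol detection and forwards the connection as plain TCP.
fn should_skip_detection(orig_dst_port: u16, ports: &HashSet<u16>) -> bool {
    ports.contains(&orig_dst_port)
}
```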
simulate-proxy uses a deployment object from kubernetes to simulate
each proxy metrics endpoint.
Modify simulate-proxy to instead use a pod to simulate each proxy
metrics endpoint. This ensures that each metrics endpoint consistently
represents a pod in kubernetes, including its namespace, deployment,
and label information.
This change also adds support for:
- a new `metric-ports` flag, default is `10000-10009`.
- `classification`, `pod_name`, and `pod_template_hash` labels
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Extracted logic from destination server
* Make tests follow style used elsewhere in the code
* Extract single interface for resolvers
* Add tests for k8s and ipv4 resolvers
* Fix small usability issues
* Update dep
* Act on feedback
Signed-off-by: Phil Calcado <phil@buoyant.io>
Previously, when the proxy was disconnected from the Destination
service and then reconnected, the proxy would not forget old, outdated
entries in its cache of endpoints. If those endpoints had been removed
while the proxy was disconnected then the proxy would never become
aware of that.
Instead, on the first message after a reconnection, replace the entire
set of cached entries with the new set, which may be empty.
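A sketch of the reset-on-reconnect idea; the field and method names are illustrative, not the proxy's actual cache API:
```rust
use indexmap::IndexMap;
use std::net::SocketAddr;

struct Endpoints {
    entries: IndexMap<SocketAddr, ()>,
    reset_on_next_modification: bool,
}

impl Endpoints {
    /// Called when the Destination service stream reconnects.
    fn on_reconnect(&mut self) {
        self.reset_on_next_modification = true;
    }

    /// Applies the first update after a reconnection by replacing the whole
    /// set of cached entries, so endpoints removed while disconnected are
    /// dropped; later updates are applied incrementally as before.
    fn update(&mut self, new_entries: IndexMap<SocketAddr, ()>) {
        if self.reset_on_next_modification {
            self.entries.clear();
            self.reset_on_next_modification = false;
        }
        self.entries.extend(new_entries);
    }
}
```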
Prior to this change, the new test
outbound_destinations_reset_on_reconnect_followed_by_no_endpoints_exists
already passed,
but outbound_destinations_reset_on_reconnect_followed_by_add_none
and outbound_destinations_reset_on_reconnect_followed_by_remove_none
failed. Now all of these tests pass.
Fixes #573
Signed-off-by: Brian Smith <brian@briansmith.org>
The Top-line and Deployment Grafana dashboards relied on the
soon-to-be-removed telemetry pipeline metrics.
Update the Grafana dashboards to query for the new, proxy-based metrics.
Grafana dashboard layouts have not changed.
Depends on #635 to render metrics.
Part of #420.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Proxy: Factor out Destination service connection logic
Centralize the connection initiation logic for the Destination service
to make it easier to maintain. Clarify that the `rx` field isn't needed
prior to a (re)connect.
Signed-off-by: Brian Smith <brian@briansmith.org>
* Rename `rx` to `query`.
Signed-off-by: Brian Smith <brian@briansmith.org>
* "recoonect" -> "reconnect"
Signed-off-by: Brian Smith <brian@briansmith.org>
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.
This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
This is a known issue with Grafana in k8s. grafana/grafana:5.0.4 was just released today; update the repo from 5.0.3 to 5.0.4.
Fixes #582
Signed-off-by: Deshi Xiao <xiaods@gmail.com>
Currently, the CLI docker image copies the entire `controller`
directory, though the CLI only requires a few of its subdirectories.
This causes the CLI's docker cache to be needlessly invalidated when,
for instance, a service implementation changes.
By restricting the copied directories to `controller/{api,public,util}`,
build caching is improved.
remove toggle sorting functionality from TableComponent:
- tables displaying metrics allowed toggling between being sorted and unsorted when clicking the same button, which was confusing behavior for the user
- this PR removes the toggle functionality and introduces a BaseTable component that extends antd's component, without the capability to toggle
- Fixes: #566
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
This PR adds a `classification` label to proxy response metrics, as @olix0r described in https://github.com/runconduit/conduit/issues/634#issuecomment-376964083. The label is either "success" or "failure", depending on the following rules:
+ **if** the response had a gRPC status code, *then*
- gRPC status code 0 is considered a success
- all others are considered failures
+ **else if** the response had an HTTP status code, *then*
- status codes < 500 are considered successes,
- status codes >= 500 are considered failures
+ **else if** the response stream failed, *then*
- the response is a failure.
I've also added end-to-end tests for the classification of HTTP responses (with some work towards classifying gRPC responses as well). Additionally, I've updated `doc/proxy_metrics.md` to reflect the added `classification` label.
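A sketch of these classification rules (the types and helper here are illustrative, not the proxy's actual implementation):
```rust
/// Hypothetical summary of a completed response, for illustration only.
struct ResponseEnd {
    grpc_status: Option<u32>,
    http_status: Option<u16>,
    stream_failed: bool,
}

/// Returns the value of the `classification` label according to the rules
/// described above.
fn classification(end: &ResponseEnd) -> &'static str {
    match (end.grpc_status, end.http_status) {
        // gRPC status code 0 (OK) is a success; any other gRPC status is a failure.
        (Some(grpc), _) => if grpc == 0 { "success" } else { "failure" },
        // HTTP status codes below 500 are successes; 500 and above are failures.
        (None, Some(http)) => if http < 500 { "success" } else { "failure" },
        // No status at all: a failed stream is a failure (treating the
        // remaining case as a success is an assumption, not in the rules).
        (None, None) => if end.stream_failed { "failure" } else { "success" },
    }
}
```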
Signed-off-by: Eliza Weisman <eliza@buoyant.io>