linkerd2

Commit Graph

Author	SHA1	Message	Date
Kevin Lingerfelt	eebc612d52	Add install flag for sending tls identity info to proxies (#1055 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-04 16:55:06 -07:00
Kevin Lingerfelt	ec2433e9bd	Update controller to use 'tls' metric label (#1044 ) * Update controller to use 'tls' metric label * Fix meshed column formatter Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-01 16:44:33 -07:00
Eliza Weisman	5a42ce357e	proto: Add TLS identity to WeightedAddr message (#1041 ) Required for #1008. This PR adds the `TlsIdentity` message to the Destination service proto, to describe what strategy the proxy should use for verifying an endpoint's TLS certificates. It also adds a `TlsIdentity` field to the `WeightedAddr` message. Currently, there is one possible variant for `TlsIdentity`, `KubernetesPodName`, which consists of the Kubernetes pod name of the endpoint, the namespace of the endpoint, and the namespace of that pod's Conduit control plane. The proxy should attempt to connect over TLS if the control plane namespace matches its own control plane namespace. The pod name and namespace are used to verify the endpoint's TLS certificate. See https://github.com/runconduit/conduit/issues/386#issuecomment-392948046. This change was initially part of #1008, but I factored it out to make the diff smaller. Signed-off-by: Eliza Weisman <eliza@buoyant.io>	2018-05-31 11:48:25 -07:00
Risha Mars	ffabdefc6c	Add queries to prometheus to determine number of fully meshed requests (#983 ) - Update the `response_total` prometheus query of the StatSummary endpoint to also break queries out by a `meshed` label. - Add a 'Secured' column to the web UI/CLI stat displays, which indicate the percentage of traffic starting and ending in the mesh This meshed label is used in the CLI/Web UI to display a column of the percentage of traffic that starts/ends in the mesh. (Which is a proxy indicator for whether that traffic is 'secured' when we add TLS by default for intra mesh requests). The `meshed` label is not yet added anywhere, so until it is supplied by the proxy, all traffic will show up as 0% secured in the web/CLI.	2018-05-24 11:05:09 -07:00
Andrew Seigner	8a3b1a638a	Introduce meshed label in simulate-proxy (#992 ) The proxy does not yet support a `meshed` label. In anticipation of a `meshed` label in the proxy, introduce this label in `simulate-proxy`, for testing. Relates to #306 and #386. Signed-off-by: Andrew Seigner <siggy@buoyant.io> secured -> meshed Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-23 15:06:11 -07:00
Andrew Seigner	84e6eb5c87	Fix nil pointer dereference in StatSummary (#991 ) The StatSummary endpoint was dereferencing StatSummaryRequest.Selector.Resource, causing a panic when it received an empty request. Fix StatSummary to use the nil-friendly StatSummaryRequest.GetSelector().GetResource() methods, and add a test to validate. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-23 13:21:49 -07:00
Risha Mars	1e6434f6de	Fix bug in the public-api where conduit stat params were ignored (#971 ) * Fix bug where we were dropping parts of the StatSummaryRequest * Add tests for prometheus query strings and for failed cases Problem In #928 I rewrote the stat api to handle 'all' as a resource type. To query for all resource types, we would copy the Resource, LabelSelector and TimeWindow of the original request, and then go through all the resource types and set Resource.Type for each resource we wanted to get. The bug is that while we copy over some fields of the original request, we didn't copy over all of them - namely Resource.Name and the Outbound resource. So the Stat endpoint would ignore any --to or --from flags, and would ignore requests for a specific named resource. Solution Copy over all fields from the request. I've also added tests for this case. In this process I've refactored the stat_summary_test code to make it a bit easier to read/use.	2018-05-18 16:06:06 -07:00
Kevin Lingerfelt	36ec391dbe	Go: update k8s dependencies to 1.10.2 (#962 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-05-17 15:46:58 -07:00
Risha Mars	b8dc83f9d2	Modify the Stat API to handle requests for resource type "all" (#928 ) Allow the Stat endpoint in the public-api to accept requests for resourceType "all". Currently, this queries Pods, Deployments, RCs and Services, but can be modified to query other resources as well. Both the CLI and web endpoints now work if you set resourceType to all. e.g. `conduit stat all`	2018-05-11 14:35:37 -07:00
Kevin Lingerfelt	4e8e1eb84d	CLI: Fix validation for service stats (#935 ) * CLI: Fix validation for service stats * Address review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-05-11 10:28:49 -07:00
Oliver Gould	a786089fd6	docker: Cache versionless builds before building versioned go binaries (#921 ) The way that git-related version information is linked into go binaries busts Docker's cache such that every commit causes all binaries to rebuilt. In order to ameliorate this, we can build each binary once without version information first so that its artifacts are cached. When Go sources are not changed and only the version information changes, builds are 4.3x faster than before (from 5+ minutes to <90s). On `master` Branch off of master and build (mostly cached): ``` :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 9.10s user 6.30s system 5% cpu 4:26.47 total ``` Rebuild without changing anything (highly cached): ``` :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 9.23s user 6.04s system 47% cpu 32.017 total ``` Update only the git sha and rebuild: ``` :; git ci -am 'bump it' --allow-empty [ver/eg 2749eb3] bump it :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 8.55s user 6.08s system 4% cpu 5:22.25 total ``` On this branch: Rebuild without changing anything (highly cached): ``` :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build 8.94s user 5.97s system 46% cpu 32.257 total ``` Update only the git sha and rebuild: ``` :; git ci -am 'bump it' --allow-empty [ver/go-docker-cache-versionless 77a80b5] bump it :; time DOCKER_TRACE=1 bin/docker-build ... DOCKER_TRACE=1 bin/docker-build-cli-bin 2.02s user 1.34s system 9% cpu 34.144 total ```	2018-05-10 10:22:09 -07:00
Risha Mars	416381cdfd	Fix bug where GetPodsFor(pod) was returning all pods in a namespace (#900 ) * Fix bug where GetPodsFor(pod) was returning all pods in a namespace Problem In lister.GetPodsFor, when the input object was a pod, we would return all the pods in the namespace. I would expect GetPodsFor(pod) to return only one pod - the pod itself. Cause The cause of this is that when the object type was pod we were setting the selector to selector = labels.Everything() which gets all the pods in the namespace. Fix Special case GetPodsFor(pod) to return the pod itself, rather than looking up pods via labels.	2018-05-08 13:52:49 -07:00
Risha Mars	f94856e489	Modify the Stat endpoint to also return the number of failed conduit pods (#895 ) * Modify the Stat endpoint to also return the count of failed pods * Add comments explaining pod count stats * Rename total pod count to running pod count This is to support the service mesh overview page, as I'd like to include an indicator of failed pods there.	2018-05-08 10:35:21 -07:00
Brian Smith	c5d2dab8bd	Remove special support for ExternalName services (#764 ) After this was implemented we found that ExternalName services are represented in DNS as CNAMEs, which means that the proxy's DNS fallback logic can be used instead of doing DNS in the control plane. Besides simplifying the controller, this will also increase fidelity with the proxied pods' DNS configuration (improve transparency). Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-25 11:53:33 -10:00
Andrew Seigner	dce31b888f	Deprecate Tap, rename TapByResource to Tap (#844 ) The `conduit tap` command is now deprecated. Replace `conduit tap` with `connduit tapByResource`. Rename tapByResource to tap. The underlying protobuf for tap remains, the tap gRPC endpoint now returns Unimplemented. Fixes #804 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-25 12:24:46 -07:00
Andrew Seigner	a0a9a42e23	Implement Public API and Tap on top of Lister (#835 ) public-api and and tap were both using their own implementations of the Kubernetes Informer/Lister APIs. This change factors out all Informer/Lister usage into the Lister module. This also introduces a new `Lister.GetObjects` method. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-24 18:10:48 -07:00
Andrew Seigner	03d4684d3b	Introduce K8s Lister, integrate simulate-proxy (#829 ) The Kubernetes client-go Informer/Lister APIs are implemented in several parts of the code base. This change introduces a Lister module, providing Informer/Lister capability through a simple interface. Once this merges, we can follow up with moving public-api and tap onto Lister. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:44:19 -07:00
Andrew Seigner	baf4ea1a5a	Implement TapByResource in Tap Service (#827 ) The TapByResource endpoint was previously a stub. Implement end-to-end tapByResource functionality, with support for specifying any kubernetes resource(s) as target and destination. Fixes #803, #49 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 16:13:26 -07:00
Eliza Weisman	d9112abc93	proxy: remove unused metrics (#826 ) This PR removes the unused `request_duration_ms` and `response_duration_ms` histogram metrics from the proxy. It also removes them from the `simulate-proxy` script's output, and from `docs/proxy-metrics.md` Closes #821	2018-04-23 16:05:20 -07:00
Andrew Seigner	39eccb09e2	cli: standardize kubernetes resource parsing (#830 ) The Tap command leveraged new cli parsing code, enabling Kubernetes resources specified as `(TYPE [NAME] \| TYPE/NAME)`. The Stat command did not use this. Modify the Stat command to use the same cli flag parsing code as Tap. Remove the to/from-resource flags from Stat. Fixes #792 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-23 15:17:42 -07:00
Eliza Weisman	8147a363e9	Make simulate-proxy match proxy output (#822 ) This PR makes two changes to the `simulate-proxy` script: 1. Removed the `protocol={"http", "tcp"}` label from TCP metrics. The proxy no longer adds this label (see https://github.com/runconduit/conduit/pull/785#discussion_r182563499). 2. Fixed failed responses being labeled with `classification="fail"` rather than `classification="failure"` (the label the proxy sets). I noticed that while I was here and decided to fix it as well. Note that the first change required some minor changes to the `proxyMetricCollectors` struct in `simulate-proxy`; since the label cardinality for TCP open stats decreased by one due to removing the `protocol` label, it's no longer necessary for that struct to `haveCounterVec`/`GaugeVec` pointers for these stats. It now owns the actual `Counter`/`Gauge` instead. This means that the metric vecs that are created to be labeled for `inbound` and `outbound` are now stored as variables in the `newSimulatedProxy` function rather than going in a `proxyMetricCollectors` struct first. This shouldn't impact behaviour at all.	2018-04-20 12:11:57 -07:00
Andrew Seigner	79bdc638b3	Service support in stat command (#809 ) The `stat` command did not support `service` as a resource type. This change adds `service` support to the `stat` command. Specifically: - as a destination resource on `--to` commands - as a target resource on `--from` commands Fixes #805 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 16:51:20 -07:00
Eliza Weisman	6eec6256f7	Add transport-level metrics to simulate-proxy (#811 ) This PR adds the transport-level metrics described in #742 to the `simulate-proxy` script. This will be useful while adding these metrics to the Grafana dashboard and/or CLI. Closes #793	2018-04-19 15:18:43 -07:00
Andrew Seigner	293e00bc3e	Introduce tapByResource cli command (#802 ) The existing `tap` command is being deprecated. Introduce a `tapByResource` cli command. It supports tapping a Kubernetes resource or collection of resources, optionally filtered by outbound resources. This command will eventually replace `tap`. Part of #778 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-19 14:44:23 -07:00
Kevin Lingerfelt	653dc6bfaa	Add replication controller stats in CLI (#794 ) * Add replication controller stats in CLI * Fix pod status in stat summary tests Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 18:12:14 -07:00
Oliver Gould	06dd8d90ee	Introduce the TapByResource API (#778 ) This changes the public api to have a new rpc type, `TapByResource`. This api supersedes the Tap api. `TapByResource` is richer, more closely reflecting the proxy's capabilities. The proxy's Tap api is extended to select over destination labels, corresponding with those returned by the Destination api. Now both `Tap` and `TapByResource`'s responses may include destination labels. This change avoids breaking backwards compatibility by: * introducing the new `TapByResource` rpc type, opting not to change Tap * extending the proxy's Match type with a new, optional, `destination_label` field. * `TapEvent` is extended with a new, optional, `destination_meta`.	2018-04-18 15:37:07 -07:00
Andrew Seigner	1e4ac8fda8	Destination service provides pod-template-hash (#784 ) The Destination service does not provide ReplicaSet information to the proxy. The `pod-template-hash` label approximates selecting over all pods in a ReplicaSet or ReplicationController. Modify the Destination service to provide this label to the proxy. Relates to #508 and #741 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-18 14:41:27 -07:00
Kevin Lingerfelt	71a51afb40	Expose pod stats in CLI, web UI, and Grafana (#788 ) * Expose pod stats in CLI, web UI, and Grafana * Fix js api helpers test * Add outbound traffic stats to pod dashboard Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 11:26:47 -07:00
Andrew Seigner	9e8cce0838	Destination service returns "Running" pod labels (#781 ) When the Destination sees an IP address, it looks up Pods by that IP, and associates Pod label data to it. If the lookup by IP returned more than one Pod, it simply picked the first one. This is not correct, specifically in cases where one pod is in a Running state, and others are not. Modify the Destination service to only return label data for Pods in the Running state. Fixes #773 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-17 14:42:54 -07:00
Andrew Seigner	727521f914	Permit arbitrary time windows in public-api (#774 ) The public-api previously only permitted 4 hard-coded time windows: 10s, 1m, 10m, 1h. This was primarily a relic of the recently removed telemetry system. Modify the public-api to validate the time string, but allow for any window size, which is then passed through to Prometheus. Fixes #686 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-16 17:37:17 -07:00
Kevin Lingerfelt	11a4359e9a	Misc cleanup following the telemetry rewrite (#771 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-16 15:51:07 -07:00
Andrew Seigner	77fb6d3709	Add namespace as a resource type in public-api (#760 ) * Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 16:53:01 -07:00
Andrew Seigner	21886760c6	Use apps/v1beta2 for Kubernetes 1.8 compatibility (#762 ) Conduit was relying on apps/v1 to Deployment and ReplicaSet APIs. apps/v1 is not available on Kubernetes 1.8. This prevented the public-api from starting. Switch Conduit to use apps/v1beta2. Also increase the Kubernetes API cache sync timeout from 10 to 60 seconds, as it was taking 11 seconds on a test cluster. Fixes #761 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-13 12:08:16 -07:00
Kevin Lingerfelt	fb15fe7c1a	Remove the telemetry service (#757 ) * Remove the telemetry service The telemetry service is no longer needed, now that prometheus scrapes metrics directly from proxies, and the public-api talks directly to prometheus. In this branch I'm removing the service itself as well as all of the telemetry protobuf, and updating the conduit install command to no longer install the service. I'm also removing the old version of the stat command, which required the telemetry service, and renaming the statsummary command to stat. * Fix time window tests * Remove deprecated controller scrape config Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 11:21:29 -07:00
Andrew Seigner	e9b209829d	Handle NaN metrics (#750 ) The Prometheus client sometimes returns NaN if a calculation is invalid, such as histogram_quantile when no requests have occurred. Add IsNaN check in the public-api and set output to zero. Fixes #747 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-12 15:21:00 -07:00
Andrew Seigner	624b87f743	Implement ListPods in public-api (#743 ) The ListPods endpoint's logic resides in the telemetry service, which is going away. Move ListPods logic into public-api, use new k8s informer APIs. Fixes #694 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 17:53:57 -07:00
Kevin Lingerfelt	47caf1ca07	Add --all-namespaces flag to CLI statsummary command (#745 ) * Add --all-namespaces flag to CLI statsummary command * Fix statsummary output formatting Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-11 16:40:25 -07:00
Andrew Seigner	259fdcd134	Add latency stats in new stat summary endpoint (#737 ) The new StatSummary endpoint was only providing request volume and successs rate information. Add support for retrieving latency stats via StatSummary. Also make all prometheus calls in parallel, and implement kubernetes test fixtures. Fixes #681 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-11 11:58:32 -07:00
Kevin Lingerfelt	e1e1b6b599	Controller: add more destination labels, fix service label (#731 ) * Add more destination labels, fix service label * Update owner labels to match proxy metrics docs Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-11 10:44:52 -07:00
Kevin Lingerfelt	91c359e612	Switch public API to use cached k8s resources (#724 ) * Switch public API to use cached k8s resources * Move shared informer code to separate goroutine * Fix spelling issue Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-10 11:39:31 -07:00
Andrew Seigner	3a341abe9a	Fix success rate calculation in public api (#723 ) The success rate calculation relies on the `classification` label, but was incorrectly specifying `fail` rather than `failure`. Fix public api to specify `failure`. Also re-org public api tests for easier Kubernetes and Prometheus mocking. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-10 11:04:04 -07:00
Andrew Seigner	716b392231	Move StatSummary logic into grpc server (#717 ) The StatSummary logic was implemented as a method on http_server. Move the StatSummary logic into grpc_server, for consistency with the other endpoints. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 16:46:15 -07:00
Andrew Seigner	50c323c617	Use canonical k8s names, fix prom labels (#702 ) The new statsummary command accepted friendly k8s names, which worked for k8s queries, but Prometheus requires a specific key. Modify the statsummary query to map friendly k8s names to canonical k8s names when constructing the query. Then during the query, map the canonical k8s name to a specific Prometheus label. Fixes #695 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-06 12:34:54 -07:00
Risha Mars	2f5b5ea5f2	Start implementing conduit stat summary endpoint (#671 ) Start implementing new conduit stat summary endpoint. Changes the public-api to call prometheus directly instead of the telemetry service. Wired through to `api/stat` on the web server, as well as `conduit statsummary` on the CLI. Works for deployments only. Current implementation just retrieves requests and mesh/total pod count (so latency stats are always 0). Uses API defined in #663 Example queries the stat endpoint will eventually satisfy in #627 This branch includes commits from @klingerf * run ./bin/dep ensure * run ./bin/update-go-deps-shas	2018-04-05 17:05:06 -07:00
Andrew Seigner	28d5007cdf	Harmonize Prometheus label usage (#690 ) The Destination service used slightly different labels than the telemetry pipeline expected, specifically, prefixed with `k8s_`. Make all Prometheus labels consistent by dropping `k8s_`. Also rename `pod_name` to `pod` for consistency with `deployement`, etc. Also update and reorganize `proxy-metrics.md` to reflect new labelling. Fixes #655 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-05 15:09:06 -07:00
Risha Mars	d1a39ea6bf	Define a new telemetry Stat API (#663 ) * Define a new telemetry Stat API Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627. StatSummary will replace Stat once implemented and the original Stat deleted.	2018-04-03 14:45:58 -07:00
Phil Calçado	19001f8d38	Add pod-based metric_labels to destinations response (#429 ) (#654 ) * Extracted logic from destination server * Make tests follow style used elsewhere in the code * Extract single interface for resolvers * Add tests for k8s and ipv4 resolvers * Fix small usability issues * Update dep * Act on feedback * Add pod-based metric_labels to destinations response * Add documentation on running control plane to BUILD.md Signed-off-by: Phil Calcado <phil@buoyant.io> * Fix mock controller in proxy tests (#656) Signed-off-by: Eliza Weisman <eliza@buoyant.io> * Address review feedback * Rename files in the destination package Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-02 18:36:57 -07:00
Brian Smith	df9ead9c36	Use Go 1.10.1 to build all Go code. (#650 ) Go 1.10.1 is a security release. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-04-02 14:58:30 -10:00
Andrew Seigner	97546e0646	Modify simulate-proxy to be more pod-centric (#653 ) simulate-proxy uses a deployment object from kubernetes to simulate each proxy metrics endpoint. Modify simulate-proxy to instead use a pod to simulate each proxy metrics endpoint. This ensures that each metrics endpoint consistently represents a pod in kubernetes, including it's namespace, deployment, and label information. This change also adds support for: - a new `metric-ports` flag, default is `10000-10009`. - `classification`, `pod_name`, and `pod_template_hash` labels Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-30 13:28:45 -07:00
Phil Calçado	bbed49c5bd	Refactor destination service and add tests in preparation to add information about labels (#645 ) * Extracted logic from destination server * Make tests follow style used elsewhere in the code * Extract single interface for resolvers * Add tests for k8s and ipv4 resolvers * Fix small usability issues * Update dep * Act on feedback Signed-off-by: Phil Calcado <phil@buoyant.io>	2018-03-30 11:36:48 -07:00

1 2 3

123 Commits