linkerd2

Commit Graph

Author	SHA1	Message	Date
Ben Lambert	297cb570f2	Added a --ha flag to install CLI (#1852 ) This change allows some advised production config to be applied to the install of the control plane. Currently this runs 3x replicas of the controller and adds some pretty sane requests to each of the components + containers of the control plane. Fixes #1101 Signed-off-by: Ben Lambert <ben@blam.sh>	2018-11-20 23:03:59 -05:00
Alex Leong	d8b5ebaa6d	Remove the proxy-api container (#1813 ) A container called `proxy-api` runs in the Linkerd2 controller pod. This container listens on port 8086 and serves the proxy-api but does nothing other than forward gRPC requests to the destination container which listens on port 8089. We remove the proxy-api container altogether and change the destination container to listen on port 8086 instead of 8089. The result is that clients still use the proxy-api by connecting to `proxy-api.<ns>.svc.cluster.local:8086` but the controller has one fewer containers. This results in a simpler system that is easier to reason about. Signed-off-by: Alex Leong <alex@buoyant.io>	2018-10-29 16:31:43 -07:00
Alex Leong	43c22fe967	Implement getProfiles method in destination service (#1759 ) We implement the getProfiles method in the destination service. This method returns a stream of destination profiles for a given authority. It does this by looking up the ServiceProfile resource in the controller namespace named `<svc>.<ns>` where `<svc>` is the name of the service and `<ns>` is the namespace of the service. This PR includes: * Adding a ServiceProfile Custom Resource Definition to linkerd install * A watch based implementation of the getProfiles method in the destination service, similar to the implementation of get. * An update to the destination client script that allows querying the getProfiles method. Signed-off-by: Alex Leong <alex@buoyant.io>	2018-10-16 15:39:12 -07:00
Kevin Lingerfelt	e9874b9c3e	Improve docker layer caching for web image (#1757 ) * Improve docker layer caching for web image * Move all web files to /linkerd dir Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-10-16 11:10:52 -07:00
Kevin Lingerfelt	46c887ca00	Add --single-namespace install flag for restricted permissions (#1721 ) * Add --single-namespace install flag for restricted permissions * Better formatting in install template * Mark --single-namespace and --proxy-auto-inject as experimental * Fix wording of --single-namespace check flag * Small healthcheck refactor Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-10-11 10:55:57 -07:00
Ivan Sim	4fba6aca0a	Proxy init and sidecar containers auto-injection (#1714 ) * Support auto sidecar-injection 1. Add proxy-injector deployment spec to cli/install/template.go 2. Inject the Linkerd CA bundle into the MutatingWebhookConfiguration during the webhook's start-up process. 3. Add a new handler to the CA controller to create a new secret for the webhook when a new MutatingWebhookConfiguration is created. 4. Declare a config map to store the proxy and proxy-init container specs used during the auto-inject process. 5. Ignore namespace and pods that are labeled with linkerd.io/auto-inject: disabled or linkerd.io/auto-inject: completed 6. Add new flag to `linkerd install` to enable/disable proxy auto-injection Proposed implementation for #561. * Resolve missing packages errors * Move the auto-inject label to the pod level * PR review items * Move proxy-injector to its own deployment * Ignore pods that already have proxy injected This ensures the webhook doesn't error out due to proxy that are injected using the command * PR review items on creating/updating the MWC on-start * Replace API calls to ConfigMap with file reads * Fixed post-rebase broken tests * Don't mutate the auto-inject label Since we started using healhcheck.HasExistingSidecars() to ensure pods with existing proxies aren't mutated, we don't need to use the auto-inject label as an indicator. This resolves a bug which happens with the kubectl run command where the deployment is also assigned the auto-inject label. The mutation causes the pod auto-inject label to not match the deployment label, causing kubectl run to fail. * Tidy up unit tests * Include proxy resource requests in sidecar config map * Fixes to broken YAML in CLI install config The ignore inbound and outbound ports are changed to string type to avoid broken YAML caused by the string conversion in the uint slice. Also, parameterized the proxy bind timeout option in template.go. Renamed the sidecar config map to 'linkerd-proxy-injector-webhook-config'. Signed-off-by: ihcsim <ihcsim@gmail.com>	2018-10-10 12:09:22 -07:00
Kevin Lingerfelt	b5ff29c8aa	Add data plane check to validate proxy version (#1574 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-09-04 15:22:38 -07:00
Risha Mars	fff09c5d06	Only tap pods that are meshed (#1535 ) Previously, we would tap any resource's pods, regardless of whether the pods were meshed or not. We can't actually tap non-meshed pods, so I'm adding a check that will filter out non-meshed pods from the pods that tap watches. Previous behaviour: When attempting to hang a non meshed pod, it would establish a watch on the pods, but then never return any results. In the CLI you could just cancel it with Ctrl-C. In the web, clicking Stop would send a WebSocket.close(1000) but wouldn't actually close the connection... Behaviour after change : If no pods under the specified resource are meshed, it'll return an error of no pods being found to tap	2018-08-28 09:59:52 -07:00
Kevin Lingerfelt	4845b4ec04	Restore linkerd.io/control-plane* labels (#1411 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-08-07 13:53:29 -07:00
Kevin Lingerfelt	e0a01c5dd8	Remove node scrape target, kubernetes grafana dashboard (#1410 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-08-07 13:41:38 -07:00
Kevin Lingerfelt	bd19e8aaff	Update prometheus to only scrape proxies in the same mesh (#1402 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-08-06 12:05:55 -07:00
Kevin Lingerfelt	51848230a0	Send glog logs to stderr by default (#1367 ) * Send glog logs to stderr by default * Factor out more shared flag parsing code Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-07-25 12:59:24 -07:00
Brian Smith	a98bfb1ca7	Rename `ca-bundle-distributor` to `ca`. (#1340 ) `ca-bundle-distributor` described the original role of the program but `ca` ("Certificate Authority") better describes its current role. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-07-17 14:10:40 -10:00
Franziska von der Goltz	c7ac072acc	update grafana dashboards: conduit to linkerd (#1320 ) * update grafana dashboards to remove conduit reference and replace with linkerd instances * update test install fixtures to reflect changes Fixes: #1315 Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>	2018-07-16 13:05:01 -07:00
Kevin Lingerfelt	e5cce1abaf	Rename CLI from conduit to linkerd (#1312 ) * Rename CLI binary * Update integration tests for new binary name * Rename --conduit-namespace flag, change default ns * Rename occurrences of conduit in rest of CLI * Rename inject and install components * Remove conduit occurrences in docker files * Additional miscellaneous cleanup * Move protobuf definitions to linkerd2 package * Rename conduit.io labels to use linkerd.io * Rename conduit-managed segment to linkerd-managed * Fix conduit references in web project Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-07-12 17:14:07 -07:00
Andrew Seigner	e18fa48135	Name ClusterRole objects to be namespace-specific (#1295 ) The control-plane's `ClusterRole` and `ClusterRoleBinding` objects are global. Because their names did not vary across multiple control-plane deployments, it prevented multiple control-planes from coexisting (when RBAC is enabled). Modify the `ClusterRole` and `ClusterRoleBinding` objects to include the control-plane's namespace in their names. Also modify the integration test to first install two control-planes, and then perform its full suite of tests, to prevent regression. Fixes #1292. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-07-10 16:21:20 -07:00
Brian Smith	252a8d39d3	Generate an ephemeral CA at startup that distributes TLS credentials (#1245 ) Create a ephemeral, in-memory TLS certificate authority and integrate it into the certificate distributor. Remove the re-creation of deleted ConfigMaps; this will be added back later in #1248. Signed-off-by: Brian Smith brian@briansmith.org Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-07-02 18:09:31 -10:00
Brian Smith	cca8e7077d	Add TLS support to `conduit inject`. (#1220 ) * Add TLS support to `conduit inject`. Add the settings needed to enable TLs when `--tls=optional` is passed on the commend line. Later the requirement to add `--tls` will be removed. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-06-27 16:04:07 -10:00
Kevin Lingerfelt	af85d1714f	Add probes and log termination policy for distributor (#1178 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-21 14:02:41 -07:00
Kevin Lingerfelt	12f869e7fc	Add CA certificate bundle distributor to conduit install (#675 ) * Add CA certificate bundle distributor to conduit install * Update ca-distributor to use shared informers * Only install CA distributor when --enable-tls flag is set * Only copy CA bundle into namespaces where inject pods have the same controller * Update API config to only watch pods and configmaps * Address review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-21 13:12:21 -07:00
Kevin Lingerfelt	682b0274b5	Add controller admin servers and readiness probes (#1168 ) * Add controller admin servers and readiness probes * Tweak readiness probes to be more sane * Refactor based on review feedback Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-20 17:32:44 -07:00
Andrew Seigner	0b9e7ff7df	Enable get for nodes/proxy for Prometheus RBAC (#1142 ) The `kubernetes-nodes-cadvisor` Prometheus queries node-level data via the Kubernetes API server. In some configurations of Kubernetes, namely minikube and at least one baremetal kubespray cluster, this API call requires the `get` verb on the `nodes/proxy` resource. Enable `get` for `nodes/proxy` for the `conduit-prometheus` service account. Fixes #912 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-06-18 17:49:23 +01:00
Thomas Rampelberg	516807bde6	Add readiness/liveness checks for third party components (#1121 ) * Add readiness/liveness checks for third party components Any possible issues with the third party control plane components can wedge the services. Take the best practices for prometheus/grafana and add them to our template. See #1116 * Update test fixtures for new output	2018-06-14 13:01:13 -07:00
Kevin Lingerfelt	eebc612d52	Add install flag for sending tls identity info to proxies (#1055 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-06-04 16:55:06 -07:00
Andrew Seigner	8a1a3b31d4	Fix non-default proxy-api port (#979 ) Running `conduit install --api-port xxx` where xxx != 8086 would yield a broken install. Fix the install command to correctly propagate the `api-port` flag, setting it as the serve address in the proxy-api container. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-22 10:34:25 -07:00
Andrew Seigner	1275b1ae89	Introduce Grafana, K8s, and Prom dashboards (#904 ) Grafana provides default dashboards for Prometheus and Grafana health. The community also provides Kubernetes-specific dashboards. Conduit was not taking advantage of these. Introduce new Grafana dashboards focused on Grafana, Kubernetes, and Prometheus health. Tag all Conduit dashboards for easier UI navigation. Also fix layout in Conduit Health dashboard. Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-05-08 23:11:43 +02:00
Andrew Seigner	97bf4fcdf2	Release Notes for 0.4.1 release. (#839 ) Also update Getting Started and Debugging docs to reflect changes in `Tap` and `Stat`. Fixes #838 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-26 13:32:41 -07:00
Kevin Lingerfelt	653dc6bfaa	Add replication controller stats in CLI (#794 ) * Add replication controller stats in CLI * Fix pod status in stat summary tests Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-18 18:12:14 -07:00
Andrew Seigner	77fb6d3709	Add namespace as a resource type in public-api (#760 ) * Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 16:53:01 -07:00
Kevin Lingerfelt	fb15fe7c1a	Remove the telemetry service (#757 ) * Remove the telemetry service The telemetry service is no longer needed, now that prometheus scrapes metrics directly from proxies, and the public-api talks directly to prometheus. In this branch I'm removing the service itself as well as all of the telemetry protobuf, and updating the conduit install command to no longer install the service. I'm also removing the old version of the stat command, which required the telemetry service, and renaming the statsummary command to stat. * Fix time window tests * Remove deprecated controller scrape config Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-04-13 11:21:29 -07:00
Risha Mars	2f5b5ea5f2	Start implementing conduit stat summary endpoint (#671 ) Start implementing new conduit stat summary endpoint. Changes the public-api to call prometheus directly instead of the telemetry service. Wired through to `api/stat` on the web server, as well as `conduit statsummary` on the CLI. Works for deployments only. Current implementation just retrieves requests and mesh/total pod count (so latency stats are always 0). Uses API defined in #663 Example queries the stat endpoint will eventually satisfy in #627 This branch includes commits from @klingerf * run ./bin/dep ensure * run ./bin/update-go-deps-shas	2018-04-05 17:05:06 -07:00
Andrew Seigner	28d5007cdf	Harmonize Prometheus label usage (#690 ) The Destination service used slightly different labels than the telemetry pipeline expected, specifically, prefixed with `k8s_`. Make all Prometheus labels consistent by dropping `k8s_`. Also rename `pod_name` to `pod` for consistency with `deployement`, etc. Also update and reorganize `proxy-metrics.md` to reflect new labelling. Fixes #655 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-05 15:09:06 -07:00
Andrew Seigner	9508e11b45	Build conduit-specific Grafana Docker image (#679 ) Using a vanilla Grafana Docker image as part of `conduit install` avoided maintaining a conduit-specific Grafana Docker image, but made packaging dashboard json files cumbersome. Roll our own Grafana Docker image, that includes conduit-specific dashboard json files. This significantly decreases the `conduit install` output size, and enables dashboard integration in the docker-compose environment. Fixes #567 Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-05 14:20:05 -07:00
Andrew Seigner	ee042e1943	Rename grafana viz to top-line (#666 ) The primary Grafana dashboard was named 'viz' from a prototype. Rename 'viz' to 'Top Line'. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-02 18:10:35 -07:00
Andrew Seigner	bf721466e3	Filter out conduit controller pods from Grafana (#657 ) The Grafana dashboards were displaying all proxy-enabled pods, including conduit controller pods. In the old telemetry pipeline filtering these out required knowledge of the controller's namespace, which the dashboards are agnostic to. This change leverages the new `conduit_io_control_plane_component` prometheus label to filter out proxy-enabled controller components. Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-04-02 17:56:12 -07:00
Andrew Seigner	8fe742e2de	Update Grafana dashboards to use new proxy metrics (#637 ) The Top-line and Deployment Grafana dashboards relied on the soon-to-be-removed telemetry pipeline metrics. Update the Grafana dashboards to query for the new, proxy-based metrics. Grafana dashboard layouts have not changed. Depends on #635 to render metrics. Part of #420. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-29 13:00:01 -07:00
Andrew Seigner	666c83e963	Add pod_name to Prometheus labels (#649 ) Previously we were using the instance label to uniquely identify a pod. This meant that getting stats by pod name would require extra queries to Kubernetes to map pod name to instance. This change adds a pod_name label to metrics at collection time. This should not affect cardinality as pod_name is invariant with respect to instance. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-29 11:07:35 -07:00
Andrew Seigner	fe35509406	Clean up Prometheus labels scraped from proxy (#633 ) The Prometheus scrape config collects from Conduit proxies, and maps Kubernetes labels to Prometheus labels, appending "k8s_". This change keeps the resultant Prometheus labels consistent with their source Kubernetes labels. For example: "deployment" and "pod_template_hash". Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-27 15:01:08 -07:00
Andrew Seigner	291d8e97ab	Move injected data from env var to k8s labels (#605 ) The inject code detects the object it is being injected into, and writes self-identifying information into the CONDUIT_PROMETHEUS_LABELS environment variable, so that conduit-proxy may read this information and report it to Prometheus at collection time. This change puts the self-identifying information directly into Kubernetes labels, which Prometheus already collects, removing the need for conduit-proxy to be aware of this information. The resulting label in Prometheus is recorded in the form `k8s_deployment`. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-23 16:11:34 -07:00
Andrew Seigner	fb1d6a5c66	Introduce Conduit Health dashboard (#591 ) In addition to dashboards display service health, we need a dashboard to display health of the Conduit service mesh itself. This change introduces a conduit-health dashboard. It currently only displays health metrics for the control plane components. Proxy health will come later. Fixes #502 Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-22 15:16:03 -07:00
Andrew Seigner	c03508ba8c	Update Prometheus to scrape data and control plane (#583 ) The existing telemetry pipeline relies on Prometheus scraping the Telemetry service, which will soon be removed. This change configures Prometheus to scrape the conduit proxies directly for telemetry data, and the control plane components for control-plane health information. This affects the output of both conduit install and conduit inject. Fixes #428, #501 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-22 13:58:11 -07:00
Andrew Seigner	680bf6211a	Add Grafana support to conduit dashboard command (#590 ) The existing `conduit dashboard` command supported opening the conduit dashboard, or displaying the conduit dashboard URL, via a `url` boolean flag. Replace the `url` boolean flag with a `show` string flag, with three modes: `conduit dashboard --show conduit`: default, open conduit dashboard `conduit dashboard --show grafana`: open grafana dashboard `conduit dashboard --show url`: display dashboard URLs Part of #420 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-20 18:07:30 -07:00
Andrew Seigner	3ca8e84eec	Add Top Line and Deployment Grafana dashboards (#562 ) Existing Grafana configuration contained no dashboards, just a skeleton for testing. Introduce two Grafana dashboards: 1) Top Line: Overall health of all Conduit-enabled services 2) Deployment: Health of a specific conduit-enabled deployment Fixes #500 Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-20 10:22:30 -07:00
Alex Leong	9eb084c99d	Most controller listeners should only bind on localhost (#494 ) * Most controller listeners should only bind on localhost * Use default listening addresses in controller components * Review feedback * Revert test_helper change * Revert use of absolute domains Signed-off-by: Alex Leong <alex@buoyant.io>	2018-03-12 11:32:20 -07:00
Brian Smith	0d4ab39ce7	Revert "Make absolute names truly absolute. (#525 )" (#533 ) This reverts commit `517616a166`. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 10:57:10 -10:00
Brian Smith	517616a166	Make absolute names truly absolute. (#525 ) Kubernetes will do multiple DNS lookups for a name like `proxy-api.conduit.svc.cluster.local` based on the default search settings in /etc/resolv.conf for each container: 1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A 2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A 3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A 4. proxy-api.conduit.svc.cluster.local. IN A We do not need or want this search to be done, so avoid it by making each name absolute by appending a period so that the first three DNS queries are skipped for each name. The case for `localhost` is even worse because we expect that `localhost` will always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default search is done: 1. localhost.conduit.svc.cluster.local. IN A 2. localhost.svc.cluster.local. IN A 3. localhost.cluster.local. IN A 4. localhost. IN A Avoid these unnecessary DNS queries by making each name absolute, so that the first three DNS queries are skipped for each name. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-07 09:46:03 -10:00
Kevin Lingerfelt	47fc2eae20	Set -logtostderr flag on controller components (#524 ) Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>	2018-03-07 10:18:15 -08:00
Andrew Seigner	a065174688	Disable Grafana update check (#521 ) Grafana by default calls out to grafana.com to check for updates. As user's of Conduit do not have direct control over updating Grafana directly, this update check is not needed. Disable Grafana's update check via grafana.ini. This is also a workaround for #155, root cause of #519. Signed-off-by: Andrew Seigner <siggy@buoyant.io>	2018-03-06 16:14:44 -08:00
Dennis Adjei-Baah	5a4c5aa683	Exclude telemetry generated by the control plane when requesting depl… (#493 ) When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh. This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment. * exclude telemetry generated by the control plane when requesting deployment metrics fixes #370 Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>	2018-03-05 17:58:08 -08:00
Brian Smith	4c9b9c0f68	Install: Don't install buoyantio/kubectl into the prometheus pod. (#509 ) In the initial review for this code (preceding the creation of the runconduit/conduit repository), it was noted that this container is not actually used, so this is actually dead code. Further, this container actualy causes a minor problem, as it doesn't implement any retry logic, thus it will sometimes often cause errors to be logged. See https://github.com/runconduit/conduit/issues/496#issuecomment-370105328. Further, this is a "buoyantio/" branded container. IF we actually need such a container then it should be a Conduit-branded container. See https://github.com/runconduit/conduit/issues/478 for additional context. Signed-off-by: Brian Smith <brian@briansmith.org>	2018-03-05 08:59:14 -10:00

1 2

54 Commits