84 Commits

Author SHA1 Message Date
Kevin Lingerfelt 682b0274b5
Add controller admin servers and readiness probes (#1168)
* Add controller admin servers and readiness probes
* Tweak readiness probes to be more sane
* Refactor based on review feedback

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-20 17:32:44 -07:00
Andrew Seigner 0b9e7ff7df
Enable get for nodes/proxy for Prometheus RBAC (#1142)
The `kubernetes-nodes-cadvisor` Prometheus job queries node-level data via
the Kubernetes API server. In some configurations of Kubernetes, namely
minikube and at least one baremetal kubespray cluster, this API call
requires the `get` verb on the `nodes/proxy` resource.

Enable `get` for `nodes/proxy` for the `conduit-prometheus` service
account.

Fixes #912

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-06-18 17:49:23 +01:00
Thomas Rampelberg 516807bde6
Add readiness/liveness checks for third party components (#1121)
* Add readiness/liveness checks for third party components

Any possible issues with the third party control plane components can wedge the services.

Take the best practices for prometheus/grafana and add them to our template. See #1116

* Update test fixtures for new output
2018-06-14 13:01:13 -07:00
Kevin Lingerfelt eebc612d52
Add install flag for sending tls identity info to proxies (#1055)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-04 16:55:06 -07:00
Andrew Seigner 8a1a3b31d4
Fix non-default proxy-api port (#979)
Running `conduit install --api-port xxx` where xxx != 8086 would yield a
broken install.

Fix the install command to correctly propagate the `api-port` flag,
setting it as the serve address in the proxy-api container.
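
A minimal sketch of the idea, assuming the install output is rendered with Go's `text/template`. The `installOptions` type, its `APIPort` field, and the `-addr` argument are hypothetical names chosen for illustration, not the actual Conduit template:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// installOptions is a hypothetical stand-in for values collected from CLI flags.
type installOptions struct {
	APIPort uint
}

// proxyAPIArgs is an illustrative template fragment; the real template differs.
const proxyAPIArgs = `- "-addr=:{{.APIPort}}"`

// renderProxyAPIArgs fills the serve address from the flag value
// rather than relying on a hard-coded default such as 8086.
func renderProxyAPIArgs(opts installOptions) (string, error) {
	tmpl, err := template.New("proxy-api").Parse(proxyAPIArgs)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, opts); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderProxyAPIArgs(installOptions{APIPort: 9000})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the sketch is that the port appears in exactly one place, the options struct, so a non-default value reaches the container arguments unchanged.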

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-22 10:34:25 -07:00
Andrew Seigner 1275b1ae89
Introduce Grafana, K8s, and Prom dashboards (#904)
Grafana provides default dashboards for Prometheus and Grafana health.
The community also provides Kubernetes-specific dashboards. Conduit was
not taking advantage of these.

Introduce new Grafana dashboards focused on Grafana, Kubernetes, and
Prometheus health. Tag all Conduit dashboards for easier UI navigation.
Also fix layout in Conduit Health dashboard.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-08 23:11:43 +02:00
Andrew Seigner 97bf4fcdf2
Release Notes for 0.4.1 release. (#839)
Also update Getting Started and Debugging docs to reflect changes in
`Tap` and `Stat`.

Fixes #838

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-26 13:32:41 -07:00
Kevin Lingerfelt 653dc6bfaa
Add replication controller stats in CLI (#794)
* Add replication controller stats in CLI
* Fix pod status in stat summary tests

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 18:12:14 -07:00
Andrew Seigner 77fb6d3709
Add namespace as a resource type in public-api (#760)
* Add namespace as a resource type in public-api

The cli and public-api only supported deployments as a resource type.

This change adds support for namespace as a resource type in the cli and
public-api. This change also includes:
- cli statsummary now prints `-`'s when objects are not in the mesh
- cli statsummary prints `No resources found.` when applicable
- removed `out-` from cli statsummary flags, and analogous proto changes
- switched public-api to use native prometheus label types
- misc error handling and logging fixes
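
The two output rules in the list above (`-` for objects outside the mesh, `No resources found.` for an empty result) could be sketched roughly as follows. The `row` type, its fields, and the column headers are assumptions made for the example and are not taken from the Conduit CLI:

```go
package main

import (
	"fmt"
	"strings"
)

// row is a hypothetical per-resource stat; Meshed is false when the
// resource has no pods in the mesh.
type row struct {
	Name        string
	Meshed      bool
	SuccessRate float64 // fraction between 0 and 1
}

// renderStats prints one line per resource, using "-" in place of
// stats that are unavailable because the resource is not meshed.
func renderStats(rows []row) string {
	if len(rows) == 0 {
		return "No resources found.\n"
	}
	var b strings.Builder
	b.WriteString("NAME\tSUCCESS\n")
	for _, r := range rows {
		if !r.Meshed {
			fmt.Fprintf(&b, "%s\t-\n", r.Name)
			continue
		}
		fmt.Fprintf(&b, "%s\t%.2f%%\n", r.Name, r.SuccessRate*100)
	}
	return b.String()
}

func main() {
	fmt.Print(renderStats([]row{
		{Name: "web", Meshed: true, SuccessRate: 0.5},
		{Name: "batch", Meshed: false},
	}))
	fmt.Print(renderStats(nil))
}
```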

Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>

* Refactor filter and groupby label formulation

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Rename stat_summary.go to stat.go in cli

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>

* Update rbac privileges for namespace stats

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 16:53:01 -07:00
Kevin Lingerfelt fb15fe7c1a
Remove the telemetry service (#757)
* Remove the telemetry service

The telemetry service is no longer needed, now that prometheus scrapes
metrics directly from proxies, and the public-api talks directly to
prometheus. In this branch I'm removing the service itself as well as
all of the telemetry protobuf, and updating the conduit install command
to no longer install the service. I'm also removing the old version of
the stat command, which required the telemetry service, and renaming the
statsummary command to stat.

* Fix time window tests

* Remove deprecated controller scrape config

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-13 11:21:29 -07:00
Risha Mars 2f5b5ea5f2
Start implementing conduit stat summary endpoint (#671)
Start implementing new conduit stat summary endpoint. 
Changes the public-api to call prometheus directly instead of the
telemetry service. Wired through to `api/stat` on the web server,
as well as `conduit statsummary` on the CLI. Works for deployments only.

Current implementation just retrieves requests and mesh/total pod count 
(so latency stats are always 0). 

Uses API defined in #663
Example queries the stat endpoint will eventually satisfy in #627

This branch includes commits from @klingerf 

* run ./bin/dep ensure
* run ./bin/update-go-deps-shas
2018-04-05 17:05:06 -07:00
Andrew Seigner 28d5007cdf
Harmonize Prometheus label usage (#690)
The Destination service used slightly different labels than the
telemetry pipeline expected, specifically, prefixed with `k8s_*`.

Make all Prometheus labels consistent by dropping `k8s_*`. Also rename
`pod_name` to `pod` for consistency with `deployment`, etc. Also update
and reorganize `proxy-metrics.md` to reflect new labelling.
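
As an illustration only, the renaming described here amounts to a small label-name transformation. The `normalizeLabels` helper below is hypothetical; in the project the change is made where the labels are produced rather than in a post-processing step like this:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLabels drops the "k8s_" prefix from label names and renames
// "pod_name" to "pod", leaving label values untouched.
func normalizeLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for name, value := range in {
		name = strings.TrimPrefix(name, "k8s_")
		if name == "pod_name" {
			name = "pod"
		}
		out[name] = value
	}
	return out
}

func main() {
	labels := normalizeLabels(map[string]string{
		"k8s_deployment": "web",
		"pod_name":       "web-1",
	})
	fmt.Println(labels)
}
```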

Fixes #655

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 15:09:06 -07:00
Andrew Seigner 9508e11b45
Build conduit-specific Grafana Docker image (#679)
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.

Roll our own Grafana Docker image, that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.

Fixes #567
Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 14:20:05 -07:00
Andrew Seigner ee042e1943
Rename grafana viz to top-line (#666)
The primary Grafana dashboard was named 'viz' from a prototype.

Rename 'viz' to 'Top Line'.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-02 18:10:35 -07:00
Andrew Seigner bf721466e3
Filter out conduit controller pods from Grafana (#657)
The Grafana dashboards were displaying all proxy-enabled pods, including
conduit controller pods. In the old telemetry pipeline filtering these
out required knowledge of the controller's namespace, which the
dashboards are agnostic to.

This change leverages the new `conduit_io_control_plane_component`
prometheus label to filter out proxy-enabled controller components.
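
A sketch of what such a filter can look like as a PromQL selector, built here in Go. It relies on PromQL treating an empty-string matcher as "label is empty or absent"; the metric name `request_total` and the helper name are assumptions for the example:

```go
package main

import "fmt"

// controlPlaneLabel is the label named in the commit message above.
const controlPlaneLabel = "conduit_io_control_plane_component"

// dataPlaneSelector builds a selector that keeps only series on which
// the control-plane label is empty or not set at all.
func dataPlaneSelector(metric string) string {
	return fmt.Sprintf(`%s{%s=""}`, metric, controlPlaneLabel)
}

func main() {
	// request_total is a hypothetical metric name used for illustration.
	fmt.Println(dataPlaneSelector("request_total"))
}
```

Because the filter keys on a label rather than a namespace, the dashboards need no knowledge of where the controller is installed.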

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-02 17:56:12 -07:00
Andrew Seigner 8fe742e2de
Update Grafana dashboards to use new proxy metrics (#637)
The Top-line and Deployment Grafana dashboards relied on the
soon-to-be-removed telemetry pipeline metrics.

Update the Grafana dashboards to query for the new, proxy-based metrics.
Grafana dashboard layouts have not changed.

Depends on #635 to render metrics.

Part of #420.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-29 13:00:01 -07:00
Andrew Seigner 666c83e963
Add pod_name to Prometheus labels (#649)
Previously we were using the instance label to uniquely identify a pod.
This meant that getting stats by pod name would require extra queries to
Kubernetes to map pod name to instance.

This change adds a pod_name label to metrics at collection time. This
should not affect cardinality as pod_name is invariant with respect to
instance.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-29 11:07:35 -07:00
Andrew Seigner fe35509406
Clean up Prometheus labels scraped from proxy (#633)
The Prometheus scrape config collects from Conduit proxies, and maps
Kubernetes labels to Prometheus labels, prepending "k8s_".

This change keeps the resultant Prometheus labels consistent with their
source Kubernetes labels. For example: "deployment" and
"pod_template_hash".

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-27 15:01:08 -07:00
Andrew Seigner 291d8e97ab
Move injected data from env var to k8s labels (#605)
The inject code detects the object it is being injected into, and writes
self-identifying information into the CONDUIT_PROMETHEUS_LABELS
environment variable, so that conduit-proxy may read this information
and report it to Prometheus at collection time.

This change puts the self-identifying information directly into
Kubernetes labels, which Prometheus already collects, removing the need
for conduit-proxy to be aware of this information. The resulting label
in Prometheus is recorded in the form `k8s_deployment`.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-23 16:11:34 -07:00
Andrew Seigner fb1d6a5c66
Introduce Conduit Health dashboard (#591)
In addition to dashboards that display service health, we need a dashboard to
display health of the Conduit service mesh itself.

This change introduces a conduit-health dashboard. It currently only
displays health metrics for the control plane components. Proxy health
will come later.

Fixes #502

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-22 15:16:03 -07:00
Andrew Seigner c03508ba8c
Update Prometheus to scrape data and control plane (#583)
The existing telemetry pipeline relies on Prometheus scraping the
Telemetry service, which will soon be removed.

This change configures Prometheus to scrape the conduit proxies directly
for telemetry data, and the control plane components for control-plane
health information. This affects the output of both conduit install
and conduit inject.

Fixes #428, #501

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-22 13:58:11 -07:00
Andrew Seigner 680bf6211a
Add Grafana support to conduit dashboard command (#590)
The existing `conduit dashboard` command supported opening the conduit
dashboard, or displaying the conduit dashboard URL, via a `url` boolean
flag.

Replace the `url` boolean flag with a `show` string flag, with three
modes:
`conduit dashboard --show conduit`: default, open conduit dashboard
`conduit dashboard --show grafana`: open grafana dashboard
`conduit dashboard --show url`: display dashboard URLs
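
The three modes can be pictured as a simple dispatch on the flag value. This is a sketch rather than the command's actual implementation: the `dashboardAction` function and its returned descriptions are hypothetical, and treating an empty value as the default is an assumption:

```go
package main

import "fmt"

// dashboardAction maps a --show value to a description of what the
// command would do, rejecting values outside the three known modes.
func dashboardAction(show string) (string, error) {
	switch show {
	case "", "conduit": // assumed: empty falls back to the default mode
		return "open conduit dashboard", nil
	case "grafana":
		return "open grafana dashboard", nil
	case "url":
		return "display dashboard URLs", nil
	default:
		return "", fmt.Errorf("unknown value for --show: %q", show)
	}
}

func main() {
	for _, mode := range []string{"conduit", "grafana", "url", "bogus"} {
		action, err := dashboardAction(mode)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(mode, "->", action)
	}
}
```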

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-20 18:07:30 -07:00
Andrew Seigner 3ca8e84eec
Add Top Line and Deployment Grafana dashboards (#562)
Existing Grafana configuration contained no dashboards, just a skeleton
for testing.

Introduce two Grafana dashboards:
1) Top Line: Overall health of all Conduit-enabled services
2) Deployment: Health of a specific conduit-enabled deployment

Fixes #500

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-20 10:22:30 -07:00
Alex Leong 9eb084c99d
Most controller listeners should only bind on localhost (#494)
* Most controller listeners should only bind on localhost
* Use default listening addresses in controller components
* Review feedback
* Revert test_helper change
* Revert use of absolute domains

Signed-off-by: Alex Leong <alex@buoyant.io>
2018-03-12 11:32:20 -07:00
Brian Smith 0d4ab39ce7
Revert "Make absolute names truly absolute. (#525)" (#533)
This reverts commit 517616a166.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 10:57:10 -10:00
Brian Smith 517616a166
Make absolute names truly absolute. (#525)
Kubernetes will do multiple DNS lookups for a name like
`proxy-api.conduit.svc.cluster.local` based on the default search settings
in /etc/resolv.conf for each container:

1. proxy-api.conduit.svc.cluster.local.conduit.svc.cluster.local. IN A
2. proxy-api.conduit.svc.cluster.local.svc.cluster.local. IN A
3. proxy-api.conduit.svc.cluster.local.cluster.local. IN A
4. proxy-api.conduit.svc.cluster.local. IN A

We do not need or want this search to be done, so avoid it by making each
name absolute by appending a period so that the first three DNS queries
are skipped for each name.

The case for `localhost` is even worse because we expect that `localhost` will
always resolve to 127.0.0.1 and/or ::1, but this is not guaranteed if the default
search is done:

1. localhost.conduit.svc.cluster.local. IN A
2. localhost.svc.cluster.local. IN A
3. localhost.cluster.local. IN A
4. localhost. IN A

Avoid these unnecessary DNS queries by making each name absolute, so that the
first three DNS queries are skipped for each name.
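
The lookup order listed above can be reproduced with a short sketch of glibc-style search-list expansion. The `candidateQueries` function is hypothetical, and the `ndots` value of 5 is assumed to match the Kubernetes default for pods:

```go
package main

import (
	"fmt"
	"strings"
)

// candidateQueries lists the names a stub resolver would try, in order.
// A name ending in "." is absolute and bypasses the search list.
func candidateQueries(name string, search []string, ndots int) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	absolute := name + "."
	var viaSearch []string
	for _, domain := range search {
		viaSearch = append(viaSearch, name+"."+domain+".")
	}
	// Names with at least ndots dots are tried as-is first;
	// all others go through the search list first.
	if strings.Count(name, ".") >= ndots {
		return append([]string{absolute}, viaSearch...)
	}
	return append(viaSearch, absolute)
}

func main() {
	search := []string{"conduit.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, name := range []string{"localhost", "localhost."} {
		fmt.Println(name, "->", candidateQueries(name, search, 5))
	}
}
```

With the trailing period the function returns a single candidate, which is the saving the commit message describes.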

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-07 09:46:03 -10:00
Kevin Lingerfelt 47fc2eae20
Set -logtostderr flag on controller components (#524)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-03-07 10:18:15 -08:00
Andrew Seigner a065174688
Disable Grafana update check (#521)
Grafana by default calls out to grafana.com to check for updates. As
users of Conduit do not have direct control over updating Grafana,
this update check is not needed.

Disable Grafana's update check via grafana.ini.

This is also a workaround for #155, root cause of #519.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-03-06 16:14:44 -08:00
Dennis Adjei-Baah 5a4c5aa683
Exclude telemetry generated by the control plane when requesting depl… (#493)
When the conduit proxy is injected into the controller pod, we observe controller pod proxy stats show up as an "outbound" deployment for an unrelated upstream deployment. This may cause confusion when monitoring deployments in the service mesh.

This PR filters out this "misleading" stat in the public api whenever the dashboard requests metric information for a specific deployment.

* exclude telemetry generated by the control plane when requesting deployment metrics

fixes #370

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-03-05 17:58:08 -08:00
Brian Smith 4c9b9c0f68
Install: Don't install buoyantio/kubectl into the prometheus pod. (#509)
In the initial review for this code (preceding the creation of the
runconduit/conduit repository), it was noted that this container is not
actually used, so this is actually dead code.

Further, this container actually causes a minor problem, as it doesn't
implement any retry logic, thus it will often cause errors to
be logged. See
https://github.com/runconduit/conduit/issues/496#issuecomment-370105328.

Further, this is a "buoyantio/" branded container. If we actually need
such a container then it should be a Conduit-branded container.

See https://github.com/runconduit/conduit/issues/478 for additional
context.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-03-05 08:59:14 -10:00
Andrew Seigner d50c8b4ac8
Add Grafana to conduit install (#444)
`conduit install` deploys prometheus, but lacks a general-purpose way to
visualize that data.

This change adds a Grafana container to the `conduit install` command. It
includes two sample dashboards, viz and health, in their own respective
source files.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-28 11:36:21 -08:00
Dennis Adjei-Baah 893bacf8d6
Make prometheus URL in config fully qualified DNS name (#443)
The telemetry service in the controller pod uses a non-fully qualified URL to connect to the prometheus pod in the control plane. This PR changes the telemetry service's prometheus URL to be fully qualified, consistent with other URLs in the control plane. This change was tested in minikube. The logs report no errors, and the prometheus dashboard shows that stats are being recorded from all conduit proxies.

fixes #414

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2018-02-26 09:40:31 -08:00
Brian Smith 86bb65a148
Remove potentially-conflicting `app` labels in control plane (#373)
The `app` label should be reserved for end-user applications and we
shouldn't use it ourselves. We already have a Conduit-specific label
that is prefixed with `conduit.io/` to avoid naming
collisions with users' labels, so just use that one instead.

Signed-off-by: Brian Smith <brian@briansmith.org>
2018-02-23 12:43:55 -10:00
Andrew Seigner 83f9c391bb
Move install template to its own file (#423)
The template used by `conduit install` was hard-coded in install.go.

This change moves the template into its own file, in anticipation of
increasing the template's size and complexity.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-02-23 14:15:31 -08:00