36 Commits

Author SHA1 Message Date
Andrew Seigner 81790b6735
Bump Prometheus to v2.10.0 (#2979)
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-06-21 12:51:31 -07:00
Gaurav Kumar cbcd201715
Add TCP stats to the Linkerd Pod Grafana dashboard (#2329) (#2477)
* Add TCP stats to the Linkerd Pod Grafana dashboard (#2329)
* Minimize tcp stats and link it to dashboard tcp tables
* Add rows to fix minimization issues

Signed-off-by: Gaurav Kumar <gaurav.kumar9825@gmail.com>
2019-03-14 14:49:13 -07:00
Tarun Pothulapati 8f6c63d5ea
Added Jobs Resource to Linkerd Dashboard along with grafana. (#2439)
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2019-03-06 17:06:46 -08:00
Andrew Seigner 8384f1eb56
Ensure shared tooltips in Linkerd Health dashboard (#2324)
All Grafana graphs use shared tooltips (display all series in the
tooltip rather than the one currently moused-over), except for 3 graphs
in the Linkerd Health dashboard.

This change ensures all tooltips are shared.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-19 15:55:36 -08:00
Risha Mars ee18a7fe31
Modify the grafana variable queries to use a tcp-based metric (#2272)
Currently, we use request_total for the variable query to determine the names in
the grafana dropdowns. We should use a non-http-based metric instead, so that if
there is only TCP traffic, the dropdowns will still be populated.

This branch uses process_start_time_seconds instead of the http-based
request_total to query for grafana variables
2019-02-19 13:46:02 -08:00
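The Go sketch below shows why the variable source matters. It assumes the dropdowns are populated by a `label_values(metric, label)` style query. A workload that carries only TCP traffic never exports `request_total`, so it is missing from a dropdown built from that metric. By contrast, `process_start_time_seconds` is assumed to be exported by every proxy regardless of traffic type. The `Series` type, the `labelValues` helper, and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Series is a simplified Prometheus series: its label set, with the metric
// name stored under "__name__".
type Series map[string]string

// labelValues mimics a label_values(metric, label) variable query: it returns
// the sorted, distinct values of label across all series of the given metric.
func labelValues(all []Series, metric, label string) []string {
	seen := map[string]bool{}
	for _, s := range all {
		if s["__name__"] != metric {
			continue
		}
		if v, ok := s[label]; ok && v != "" {
			seen[v] = true
		}
	}
	vals := make([]string, 0, len(seen))
	for v := range seen {
		vals = append(vals, v)
	}
	sort.Strings(vals)
	return vals
}

func main() {
	// Hypothetical data: "web" serves HTTP, "db" carries only TCP traffic,
	// so only "web" ever exports request_total.
	series := []Series{
		{"__name__": "request_total", "namespace": "default", "deployment": "web"},
		{"__name__": "process_start_time_seconds", "namespace": "default", "deployment": "web"},
		{"__name__": "process_start_time_seconds", "namespace": "default", "deployment": "db"},
	}
	fmt.Println(labelValues(series, "request_total", "deployment"))              // [web]
	fmt.Println(labelValues(series, "process_start_time_seconds", "deployment")) // [db web]
}
```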
Andrew Seigner 1df1683b6a
Instrument k8s clients (#2243)
The control-plane's clients, specifically the Kubernetes clients, did
not provide telemetry information.

Introduce a `prometheus.ClientWithTelemetry` wrapper to instrument
arbitrary clients. Apply this wrapper to Kubernetes clients.

Fixes #2183

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2019-02-18 09:10:02 -08:00
Ivan Sim f6e75ec83a
Add statefulsets to the dashboard and CLI (#2234)
Fixes #1983

Signed-off-by: Ivan Sim <ivan@buoyant.io>
2019-02-08 15:37:44 -08:00
zak 8c413ca38b
Wire up stats commands for daemonsets (#2006) (#2086)
DaemonSet stats are not currently shown in the cli stat command, web ui
or grafana dashboard. This commit adds daemonset support for stat.

Update stat command's help message to reference daemonsets.
Update the public-api to support stats for daemonsets.
Add tests for stat summary and api.

Add daemonset get/list/watch permissions to the linkerd-controller
cluster role that's created using the install command.
Update golden expectation test files for install command
yaml manifest output.

Update web UI with daemonsets
Update navigation, overview and pages to list daemonsets and the pods
associated with them.
Add daemonset paths to server, and ui apps.

Add grafana dashboard for daemonsets; a clone of the deployment
dashboard.

Update dependencies and dockerfile hashes

Add DaemonSet support to tap and top commands

Fixes #2006

Signed-off-by: Zak Knill <zrjknill@gmail.com>
2019-01-24 14:34:13 -08:00
Kevin Lingerfelt a27bb2e0ce
Proxy grafana requests through web service (#2039)
* Proxy grafana requests through web service
* Fix -grafana-addr default, clarify -api-addr flag
* Fix version check in grafana dashboards
* Fix comment typo

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2019-01-04 16:07:57 -08:00
Kevin Lingerfelt 37ae423bb3
Add linkerd- prefix to all objects in linkerd install (#1920)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-12-04 15:41:47 -08:00
Oliver Gould 747fd328e9
grafana: Show TCP closes by errno (#1839)
linkerd/linkerd2-proxy#116 removes the `classification` label for the
`tcp_close_total` metric because TCP sockets that close with an error
do not actually indicate any sort of failure -- many graceful shutdown
situations can still cause a socket error.

This change uses the `errno` label to break down `tcp_close_total` metrics.
2018-11-02 10:20:11 -07:00
Alejandro Pedraza 338848d2bc
Add Grafana dashboard for Authorities (#1772)
* Add Grafana dashboard for Authorities

Proposal for #1225

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>

* Implement code review suggestions

Modified Inbound by Deployment and Inbound by Pod graphs according to klingerf's feedback.
Removed template variables values.

Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
2018-10-18 13:56:13 -07:00
Kevin Lingerfelt 12b10e27c1
Update version checks to support release channels (#1667)
* Update version checks to support release channels
* Update based on review feedback
* Fix sidebar tests
* Update CI config for edge and stable tags

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-09-17 17:13:50 -07:00
Andrew Seigner b708378d07
Add version check to Grafana dashboard (#1638)
* Add version check to Grafana dashboard

The web dashboard checks the local Linkerd version against the latest
release, and informs the user if an update is available. Grafana was not
doing this.

Modify the Grafana dashboard to perform a version check, and prompt the
user to update if needed.

Fixes #1607

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-09-13 15:28:44 -07:00
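The Go sketch below shows one form such a version check could take. It assumes version strings of the form `<channel>-<major>.<minor>.<patch>` (for example `stable-2.3.0`), in keeping with the release channels introduced in #1667. Versions on different channels are not compared. `parseVersion` and `updateAvailable` are hypothetical helpers, and the real check may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a version string of the assumed form
// "<channel>-<major>.<minor>.<patch>" into its channel and numeric parts.
func parseVersion(v string) (channel string, nums []int, err error) {
	i := strings.Index(v, "-")
	if i < 0 {
		return "", nil, fmt.Errorf("missing channel in %q", v)
	}
	channel = v[:i]
	for _, part := range strings.Split(v[i+1:], ".") {
		n, convErr := strconv.Atoi(part)
		if convErr != nil {
			return "", nil, fmt.Errorf("bad component %q in %q", part, v)
		}
		nums = append(nums, n)
	}
	return channel, nums, nil
}

// updateAvailable reports whether latest is newer than local on the same
// release channel; versions on different channels are never compared.
func updateAvailable(local, latest string) (bool, error) {
	lc, ln, err := parseVersion(local)
	if err != nil {
		return false, err
	}
	rc, rn, err := parseVersion(latest)
	if err != nil {
		return false, err
	}
	if lc != rc {
		return false, nil
	}
	for i := 0; i < len(ln) && i < len(rn); i++ {
		if rn[i] != ln[i] {
			return rn[i] > ln[i], nil
		}
	}
	return len(rn) > len(ln), nil
}

func main() {
	for _, pair := range [][2]string{
		{"stable-2.3.0", "stable-2.3.2"},
		{"edge-19.6.2", "edge-19.6.2"},
		{"stable-2.3.0", "edge-19.6.4"},
	} {
		ok, err := updateAvailable(pair[0], pair[1])
		fmt.Println(pair[0], pair[1], ok, err)
	}
}
```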
Kevin Lingerfelt 4845b4ec04
Restore linkerd.io/control-plane* labels (#1411)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-07 13:53:29 -07:00
Kevin Lingerfelt e0a01c5dd8
Remove node scrape target, kubernetes grafana dashboard (#1410)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-08-07 13:41:38 -07:00
Kevin Lingerfelt 4b9700933a
Update prometheus labels to match k8s resource names (#1355)
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-23 15:45:05 -07:00
Franziska von der Goltz c7ac072acc
update grafana dashboards: conduit to linkerd (#1320)
* update grafana dashboards to remove conduit reference and replace with linkerd instances
* update test install fixtures to reflect changes

Fixes: #1315

Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
2018-07-16 13:05:01 -07:00
Kevin Lingerfelt e5cce1abaf
Rename CLI from conduit to linkerd (#1312)
* Rename CLI binary
* Update integration tests for new binary name
* Rename --conduit-namespace flag, change default ns
* Rename occurrences of conduit in rest of CLI
* Rename inject and install components
* Remove conduit occurrences in docker files
* Additional miscellaneous cleanup
* Move protobuf definitions to linkerd2 package
* Rename conduit.io labels to use linkerd.io
* Rename conduit-managed segment to linkerd-managed
* Fix conduit references in web project

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-07-12 17:14:07 -07:00
Andrew Seigner e70d62dc9f
Introduce Proxy process telemetry in Grafana (#1199)
PR #1128 introduced new proxy process stats.

Introduce Grafana graphs that expose these new proxy process stats.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-06-27 00:58:28 +01:00
Risha Mars fdb0b7f63f
Grafana: remove fill and stack from individual resource breakouts (#1092)
Remove the filling and stacking in request rate graphs that combine resources, 
to make it easier to spot outliers.

* Grafana: remove fill and stack from individual resource breakouts
* Remove all the stacks and fills from request rates everywhere
2018-06-18 10:14:39 -07:00
Risha Mars 53b713b2a8
Remove the ⚠️ emoji from non-tlsed grafana stat labels (#1089)
2018-06-08 15:00:56 -07:00
Risha Mars b930bc6b88
Fix conduit health grafana dashboard (#1086)
* Scope health queries to controller namespace

* Add a prometheus query variable to get the conduit namespace
2018-06-08 12:57:05 -07:00
Kevin Lingerfelt ec2433e9bd
Update controller to use 'tls' metric label (#1044)
* Update controller to use 'tls' metric label
* Fix meshed column formatter

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-06-01 16:44:33 -07:00
Andrew Seigner 95f9f8dc35
Add meshed label support to Grafana (#1021)
The Grafana dashboards currently show Request Volume by ns/deploy/pod.

Add a `meshed` dimension to the Request Volume graphs, in anticipation
of the `meshed`/`secured` label from the proxy. Also increase `irate`
time window queries from `20s` to `30s`, per recommendation from
Prometheus team.

Relates to #388.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-25 14:10:57 -07:00
Andrew Seigner 6fccdee58e
Stop special-casing conduit controller in Grafana (#984)
The Grafana dashboards were explicitly filtering out Conduit
control-plane data.

Remove control-plane filtering from Grafana dashboards. This brings
Grafana in-line with web, and also encourages better dog-fooding of our
proxy metrics and dashboards. Also update Grafana to 5.1.3, update the
BUILD.md architecture diagram to include Prometheus and Grafana, and
introduce a Prometheus Benchmark dashboard, courtesy of Robust
Perception.

Fixes #908

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-23 13:47:20 -07:00
Andrew Seigner 1275b1ae89
Introduce Grafana, K8s, and Prom dashboards (#904)
Grafana provides default dashboards for Prometheus and Grafana health.
The community also provides Kubernetes-specific dashboards. Conduit was
not taking advantage of these.

Introduce new Grafana dashboards focused on Grafana, Kubernetes, and
Prometheus health. Tag all Conduit dashboards for easier UI navigation.
Also fix layout in Conduit Health dashboard.

Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-05-08 23:11:43 +02:00
Andrew Seigner 5a5c6d14ab
Update Grafana to 5.1.0, handle missing data (#876)
Conduit 0.4.1 contained some rough edges in the Grafana deployment.

This PR includes the following:
- bump Grafana to 5.1.0
- fix deployment and rc graphs when no data present
- fix some text sections overlapping due to scrolling

Fixes #705

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-29 22:24:22 +02:00
Eliza Weisman d55e334a42
Add TCP stats to deployment dashboards (#824)
This PR adds the TCP metrics introduced in #785 and #790 to the Grafana deployment dashboards. I've added three new charts under the "Inbound Traffic" and "Outbound Traffic" headings:
+ "TCP Connection Failures": plots the number of failed TCP connections over time
+ "TCP Connections Open": shows the number of accepted and opened connections currently open
+ "TCP Connection Duration": a heatmap of connection durations over time

I'm planning on adding similar graphs to other dashboards as well in subsequent PRs.
2018-04-25 16:26:43 -07:00
Risha Mars aca09813fd
Add a Replication Controller grafana dashboard (#843)
* Add a Replication Controller grafana dashboard, very similar to the Deployment one
2018-04-25 10:57:41 -07:00
Andrew Seigner 326d9f493c
Fix top-line Grafana counts (#815)
The top-line single stat numbers were not calculated properly, resulting
in inflated counts.

Modify the underlying Prometheus queries to ensure accurate counts of
Deployments, Pods, and Namespaces.

Fixes #801.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 18:01:06 -07:00
Andrew Seigner c26955186b
Introduce service-centric Grafana dashboard (#810)
Conduit's Grafana currently provides Top-line, Deployment, Pod, and Mesh
Health dashboards.

This change adds a new Conduit Service dashboard. In addition to
top-line information, this dashboard focuses primarily on requests to a
Service, as only dst_service is available in our metrics.

Part of #706

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-19 17:32:42 -07:00
Kevin Lingerfelt 71a51afb40
Expose pod stats in CLI, web UI, and Grafana (#788)
* Expose pod stats in CLI, web UI, and Grafana
* Fix js api helpers test
* Add outbound traffic stats to pod dashboard

Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
2018-04-18 11:26:47 -07:00
Andrew Seigner c9cdd838dc
Standardize and polish Grafana for 0.4.0 release (#766)
The top-line, deployments, and health Grafana dashboards had
inconsistent layouts and data.

This change standardizes our Grafana dashboards. Every row is composed
of Success Rate, Request Rate, and Latency.

Part of #420.

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-13 18:01:44 -07:00
Andrew Seigner b6bcdcc059
Namespace-aware Grafana dashboards (#716)
The Grafana dashboards key off of deployment, but had no awareness of
namespaces, causing incorrect metrics aggregation and display.

This change makes the Grafana dashboards key off of namespaces, and also
modifies the Grafana links in the Conduit dashboard to link to
namespace+deployment.

Fixes #704
Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-06 15:37:53 -07:00
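The Go sketch below illustrates the aggregation problem described above. Summing by `deployment` alone merges same-named deployments from different namespaces, while keying on `(namespace, deployment)` keeps them apart. The types, function names, and values are hypothetical, chosen to mirror `sum by (deployment)` versus `sum by (namespace, deployment)`.

```go
package main

import "fmt"

// reqSeries is a simplified request_total series: the labels that matter
// here plus the counter value.
type reqSeries struct {
	namespace, deployment string
	value                 float64
}

// key identifies a workload by namespace and deployment.
type key struct{ namespace, deployment string }

// sumByNamespaceDeployment mirrors sum by (namespace, deployment) (...).
func sumByNamespaceDeployment(series []reqSeries) map[key]float64 {
	out := map[key]float64{}
	for _, s := range series {
		out[key{s.namespace, s.deployment}] += s.value
	}
	return out
}

// sumByDeployment mirrors sum by (deployment) (...), the namespace-unaware form
// that merges same-named deployments across namespaces.
func sumByDeployment(series []reqSeries) map[string]float64 {
	out := map[string]float64{}
	for _, s := range series {
		out[s.deployment] += s.value
	}
	return out
}

func main() {
	series := []reqSeries{
		{"prod", "web", 90},
		{"staging", "web", 10},
	}
	fmt.Println(sumByDeployment(series)["web"])                       // 100
	fmt.Println(sumByNamespaceDeployment(series)[key{"prod", "web"}]) // 90
}
```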
Andrew Seigner 9508e11b45
Build conduit-specific Grafana Docker image (#679)
Using a vanilla Grafana Docker image as part of `conduit install`
avoided maintaining a conduit-specific Grafana Docker image, but made
packaging dashboard json files cumbersome.

Roll our own Grafana Docker image that includes conduit-specific
dashboard json files. This significantly decreases the `conduit install`
output size, and enables dashboard integration in the docker-compose
environment.

Fixes #567
Part of #420

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
2018-04-05 14:20:05 -07:00