Added the `DS_PROMETHEUS` parameter in all the Grafana dashboard
definitions. When importing a definition into a Grafana Cloud instance
for example, the import form will allow selecting the datasource from
those currently available. On the other hand, when using an in-cluster
instance such as one installed through the Grafana Helm chart, the
parameter gets overridden with the `values.yaml` entry
`dashboards.default.{name}.datasource`.
Also, the JavaScript snippet used in the dashboard definitions to
check for the latest Linkerd version has been wrapped in a hidden
div. This avoids showing the script itself when it gets escaped on
import into Grafana Cloud.
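To illustrate the effect of the `DS_PROMETHEUS` parameter, here is a minimal sketch of resolving the placeholder in a definition. Grafana performs this substitution itself at import or provisioning time; the `resolve_datasource` helper and the trimmed-down dashboard below are hypothetical and only show where the placeholder appears.

```python
import json


def resolve_datasource(definition, datasource):
    """Return a copy of a dashboard definition with ${DS_PROMETHEUS} resolved.

    Hypothetical helper for illustration only; it assumes the datasource
    name contains no characters that need JSON escaping.
    """
    raw = json.dumps(definition)
    return json.loads(raw.replace("${DS_PROMETHEUS}", datasource))


# Hypothetical, heavily trimmed dashboard definition.
dashboard = {
    "__inputs": [
        {"name": "DS_PROMETHEUS", "type": "datasource", "pluginId": "prometheus"}
    ],
    "panels": [{"title": "Success rate", "datasource": "${DS_PROMETHEUS}"}],
}

resolved = resolve_datasource(dashboard, "prometheus")
print(resolved["panels"][0]["datasource"])
```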
* Stop shipping grafana-based image
Fixes #6045 #7358
With this change we stop building a Grafana-based image preloaded with the Linkerd Grafana dashboards.
Instead, we'll recommend that users install Grafana themselves, and we provide a file `grafana/values.yaml` with a default config that points to all the same Grafana dashboards we had, which are now hosted at https://grafana.com/orgs/linkerd/dashboards .
The new file `grafana/README.md` contains instructions for installing the official Grafana Helm chart, and mentions other available methods.
The `grafana.enabled` flag has been removed, and `grafanaUrl` has been moved to `grafana.url`. This will help consolidate other Grafana settings that might emerge, in particular when #7429 gets addressed.
## Dashboard definition changes
The dashboard definitions under `grafana/dashboards` (which should be kept in sync with what's published at https://grafana.com/orgs/linkerd/dashboards) were updated to add the `__inputs`, `__elements` and `__requires` entries at the beginning, which are required for publishing.
Given how links to Grafana charts are [built](b0a799eee7/web/app/js/components/GrafanaLink.jsx (L7)), all the charts' UUIDs should be prefixed with `linkerd-`. This fixes the broken links to the charts for deployments, cronjobs, jobs, daemonsets, replicasets, replicationcontrollers and statefulsets.
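The prefix rule above can be checked mechanically. The following is a minimal sketch, assuming each definition stores its identifier in a top-level `uid` field; the `missing_prefix` helper and the sample data are hypothetical.

```python
PREFIX = "linkerd-"


def missing_prefix(dashboards):
    """Return the names of definitions whose uid lacks the linkerd- prefix."""
    return sorted(
        name
        for name, definition in dashboards.items()
        if not str(definition.get("uid", "")).startswith(PREFIX)
    )


# Hypothetical sample: file name -> parsed dashboard definition.
dashboards = {
    "deployment.json": {"uid": "linkerd-deployment"},
    "cronjob.json": {"uid": "cronjob"},
}

print(missing_prefix(dashboards))
```

In practice the dictionary would be built by loading every JSON file under `grafana/dashboards`.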
* Include viz components in Prom scrapes, fix Linkerd Health charts
Fixes #5429
Expanded the `linkerd-controller` Prometheus scraping config so it also includes the `linkerd-viz` namespace. Also simplified the first relabeling config there by removing the `__meta_kubernetes_pod_label_linkerd_io_control_plane_component` source label, which wasn't serving any purpose. On its own, that extra scraping now produces non-empty Go charts at the bottom of the `Linkerd Health` charts for the viz components.
Additionally, the `namespace-viz` variable was added to `health.json`, and is then used in the queries for the `Control-Plane Traffic` and `Control-Plane TCP Metrics` charts to include the viz pods.
Finally, in that same file, the queries for the `Data-Plane Telemetry` section were simplified by removing the redundant filter on the `control_plane_ns` label.
Prometheus uses a relabel rule that has changed since 1.16.
Use both "pod_name" and "pod" to avoid breaking changes.
Also use both "container" and "container_name" for the
same reason.
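One way to tolerate both label names in a query is to join two selectors with the PromQL `or` operator. This is only a sketch of that idea, not necessarily how this change implements it; the `either_label` helper and the example metric are assumptions.

```python
def either_label(metric, value, old="pod_name", new="pod"):
    """Build a PromQL expression that matches on either label name.

    Hypothetical helper: series carrying the new label are selected first,
    and series carrying only the old label are added through `or`.
    """
    return f'{metric}{{{new}=~"{value}"}} or {metric}{{{old}=~"{value}"}}'


query = either_label("container_memory_usage_bytes", "web-.*")
print(query)
```

The same helper covers the container case by passing `old="container_name"` and `new="container"`.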
Fixes #4380
Signed-off-by: Florian Davasse <florian.davasse@stack-labs.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at aaf440489e (commitcomment-41423663)
The action reports that the changes in this PR would make it happy: 5b82c6c5ca
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
The misspellings were found with the following command and then
fixed:
```
codespell --skip CHANGES.md,.git,go.sum,\
controller/cmd/service-mirror/events_formatting.go,\
controller/cmd/service-mirror/cluster_watcher_test_util.go,\
SECURITY_AUDIT.pdf,.gcp.json.enc,web/app/img/favicon.png \
--ignore-words-list=aks,uint,ans,files\' --check-filenames \
--check-hidden
```
Signed-off-by: Suraj Deshmukh <surajd.service@gmail.com>
Change terminology from local/remote to source/target in events and metrics.
This does not change any variable, function, struct, or field names since
testing is still improving
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change adds labels to endpoints that target remote services. It also adds a Grafana dashboard that can be used to monitor multicluster traffic.
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
* use custom all values for top line dashboard
* convert remaining allValue params to wildcard glob
Signed-off-by: Matt Miller <mamiller@rosettastone.com>
This PR adds support for CronJobs and ReplicaSets to `linkerd inject`, the web
dashboard and CLI. It adds a new Grafana dashboard for each kind of resource.
Closes #3614
Closes #3630
Closes #3584
Closes #3585
Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
Signed-off-by: Cintia Sanchez Garcia <cynthiasg@icloud.com>
* Add TCP stats to the Linkerd Pod Grafana dashboard (#2329)
* Minimize tcp stats and link it to dashboard tcp tables
* Add rows to fix minimization issues
Signed-off-by: Gaurav Kumar <gaurav.kumar9825@gmail.com>
All Grafana graphs use shared tooltips (display all series in the
tooltip rather than the one currently moused-over), except for 3 graphs
in the Linkerd Health dashboard.
This change ensures all tooltips are shared.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Currently, we use request_total for the variable query to determine the names in
the Grafana dropdowns. We should use a non-HTTP-based metric instead, so that if
there is only TCP traffic, the dropdowns will still be populated.
This branch uses process_start_time_seconds instead of the HTTP-based
request_total to query for Grafana variables.
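As a rough illustration, a Grafana variable backed by Prometheus is typically populated with a `label_values(metric, label)` query. The sketch below builds such queries from process_start_time_seconds; the `variable_query` helper is hypothetical, and the real dashboards may add label filters to the metric selector.

```python
def variable_query(label, metric="process_start_time_seconds"):
    """Build a Grafana label_values() query for a template variable.

    Hypothetical helper: the default metric is reported regardless of
    whether the traffic is HTTP or TCP.
    """
    return f"label_values({metric}, {label})"


for label in ("namespace", "deployment", "pod"):
    print(variable_query(label))
```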
The control-plane's clients, specifically the Kubernetes clients, did
not provide telemetry information.
Introduce a `prometheus.ClientWithTelemetry` wrapper to instrument
arbitrary clients. Apply this wrapper to Kubernetes clients.
Fixes #2183
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
DaemonSet stats are not currently shown in the cli stat command, web ui
or grafana dashboard. This commit adds daemonset support for stat.
Update stat command's help message to reference daemonsets.
Update the public-api to support stats for daemonsets.
Add tests for stat summary and api.
Add daemonset get/list/watch permissions to the linkerd-controller
cluster role that's created using the install command.
Update golden expectation test files for install command
yaml manifest output.
Update web UI with daemonsets
Update navigation, overview and pages to list daemonsets and the pods
associated with them.
Add daemonset paths to server, and ui apps.
Add grafana dashboard for daemonsets; a clone of the deployment
dashboard.
Update dependencies and dockerfile hashes
Add DaemonSet support to tap and top commands
Fixes #2006
Signed-off-by: Zak Knill <zrjknill@gmail.com>
* Proxy grafana requests through web service
* Fix -grafana-addr default, clarify -api-addr flag
* Fix version check in grafana dashboards
* Fix comment typo
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
linkerd/linkerd2-proxy#116 removes the `classification` label for the
`tcp_close_total` metric because TCP sockets that close with an error
do not actually indicate any sort of failure -- many graceful shutdown
situations can still cause a socket error.
This change uses the `errno` label to enumerate tcp_close_total metrics.
* Add Grafana dashboard for Authorities
Proposal for #1225
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Implement code review suggestions
Modified Inbound by Deployment and Inbound by Pod graphs according to klingerf's feedback.
Removed template variables values.
Signed-off-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
* Update version checks to support release channels
* Update based on review feedback
* Fix sidebar tests
* Update CI config for edge and stable tags
Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
* Add version check to Grafana dashboard
The web dashboard checks the local Linkerd version against the latest
release, and informs the user if an update is available. Grafana was not
doing this.
Modify the Grafana dashboard to perform a version check, and prompt the
user to update if needed.
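The comparison behind such a check can be sketched as follows. This is an illustration rather than the dashboard's actual script: it assumes versions of the form `<channel>-<dotted number>` (for example `stable-2.1.0`), and the `parse` and `update_available` helpers and the sample data are hypothetical.

```python
def parse(version):
    """Split a version like 'stable-2.1.0' into ('stable', (2, 1, 0))."""
    channel, _, number = version.partition("-")
    return channel, tuple(int(part) for part in number.split("."))


def update_available(current, latest_by_channel):
    """Return True if a newer release exists on the current version's channel."""
    channel, number = parse(current)
    latest = latest_by_channel.get(channel)
    if latest is None:
        # Unknown channel: nothing to compare against.
        return False
    return parse(latest)[1] > number


# Hypothetical latest releases, keyed by channel.
latest = {"stable": "stable-2.1.0", "edge": "edge-18.9.2"}
print(update_available("stable-2.0.0", latest))
print(update_available("edge-18.9.2", latest))
```

Comparing numeric tuples rather than raw strings avoids ordering mistakes such as treating `2.10.0` as older than `2.9.0`.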
Fixes #1607
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* update grafana dashboards to remove conduit reference and replace with linkerd instances
* update test install fixtures to reflect changes
Fixes: #1315
Signed-off-by: Franziska von der Goltz <franziska@vdgoltz.eu>
PR #1128 introduced new proxy process stats.
Introduce Grafana graphs that expose these new proxy process stats.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Remove the filling and stacking in request rate graphs that combine resources,
to make it easier to spot outliers.
* Grafana: remove fill and stack from individual resource breakouts
* Remove all the stacks and fills from request rates everywhere
The Grafana dashboards currently show Request Volume by ns/deploy/pod.
Add a `meshed` dimension to the Request Volume graphs, in anticipation
of the `meshed`/`secured` label from the proxy. Also increase `irate`
time window queries from `20s` to `30s`, per recommendation from
Prometheus team.
Relates to #388.
Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The Grafana dashboards were explicitly filtering out Conduit
control-plane data.
Remove control-plane filtering from Grafana dashboards. This brings
Grafana in line with web, and also encourages better dog-fooding of our
proxy metrics and dashboards. Also update Grafana to 5.1.3, update the
BUILD.md architecture diagram to include Prometheus and Grafana, and
introduce a Prometheus Benchmark dashboard, courtesy of Robust
Perception.
Fixes #908
Signed-off-by: Andrew Seigner <siggy@buoyant.io>