Commit Graph

4 Commits

Author SHA1 Message Date
Tarun Pothulapati cd2e911be3
viz: add data-plane and prometheus healthchecks (#5602)
* viz: add data-plane and prometheus healthchecks

Fixes #5325

This branch adds the remaining healthchecks for the viz extension,
i.e.:

- Data-plane metrics check in Prometheus
- `--proxy` mode which also checks for tap injections based
  on annotations.

For this, the following changes were needed:
- Category.ID is made public so that categories can be toggled
  based on the `--proxy` flag
- Made the tap env key a field so that it can be re-used for
  checks

Simplify viz.NewHealthChecker by removing the need to
pass categoryIDs and instead using
hc.appendCategories directly at the caller to add the
required categories. This is possible by dividing the vizCategories
into separate functions.
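
The caller-side composition described above can be sketched roughly as follows. This is a minimal illustration, not the actual Linkerd code: the types, the exported `AppendCategories` method, the category IDs, and the `buildChecker` helper are all assumptions made for the example.

```go
package main

import "fmt"

// CategoryID is a hypothetical identifier for a group of checks.
type CategoryID string

// Category exposes its ID publicly so callers can pick categories by name.
type Category struct {
	ID     CategoryID
	Checks []string
}

// HealthChecker holds the categories that will be run.
type HealthChecker struct {
	categories []Category
}

// NewHealthChecker takes no category IDs; callers add categories afterwards.
func NewHealthChecker() *HealthChecker { return &HealthChecker{} }

// AppendCategories adds the given categories in order.
func (hc *HealthChecker) AppendCategories(cs ...Category) {
	hc.categories = append(hc.categories, cs...)
}

// IDs returns the IDs of the registered categories, in order.
func (hc *HealthChecker) IDs() []CategoryID {
	ids := make([]CategoryID, 0, len(hc.categories))
	for _, c := range hc.categories {
		ids = append(ids, c.ID)
	}
	return ids
}

// Each category lives in its own function (names and contents are illustrative).
func vizCategory() Category {
	return Category{ID: "linkerd-viz", Checks: []string{"prometheus is running"}}
}

func dataPlaneCategory() Category {
	return Category{ID: "linkerd-viz-data-plane", Checks: []string{"data plane metrics exist"}}
}

// buildChecker shows the caller deciding which categories apply.
func buildChecker(proxy bool) *HealthChecker {
	hc := NewHealthChecker()
	hc.AppendCategories(vizCategory())
	if proxy {
		hc.AppendCategories(dataPlaneCategory())
	}
	return hc
}

func main() {
	fmt.Println(buildChecker(false).IDs())
	fmt.Println(buildChecker(true).IDs())
}
```

Keeping the constructor free of category arguments means a flag such as `--proxy` only affects the call site, not the checker itself.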

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-01 23:01:13 +05:30
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and move things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
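
A rough sketch of the wrapping approach described in this list, assuming a simplified `Runner` interface with a single method. The real `healthcheck.Runner` interface and struct fields are not reproduced here; `CoreChecker`, `VizChecker`, and `runAll` are hypothetical names.

```go
package main

import "fmt"

// Runner is a stand-in for an interface that abstracts over both checkers.
// Its single method is an assumption for this sketch.
type Runner interface {
	RunChecks() []string
}

// CoreChecker knows nothing about viz.
type CoreChecker struct {
	checks []string
}

// RunChecks returns a copy of the core check names.
func (c *CoreChecker) RunChecks() []string {
	return append([]string(nil), c.checks...)
}

// VizChecker wraps the core checker and adds viz-only state,
// so the core type keeps no dependency on viz.
type VizChecker struct {
	*CoreChecker
	VizNamespace string
}

// RunChecks runs the core checks first, then the viz-specific one.
func (v *VizChecker) RunChecks() []string {
	results := v.CoreChecker.RunChecks()
	return append(results, "namespace "+v.VizNamespace+" exists")
}

// runAll accepts either checker through the shared interface.
func runAll(r Runner) int {
	return len(r.RunChecks())
}

func main() {
	core := &CoreChecker{checks: []string{"control plane is healthy"}}
	viz := &VizChecker{CoreChecker: core, VizNamespace: "linkerd-viz"}
	fmt.Println(runAll(core), runAll(viz))
}
```

Embedding keeps the dependency pointing one way: the viz wrapper imports the core type, never the reverse.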

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also set up logging in multicluster/cmd/root.go so that messages are displayed properly when --verbose is used
2021-01-21 18:26:38 -05:00
Yashvardhan Kukreja b67bbe157b
add jaeger check: to confirm whether the jaeger injector pod is in running state or not (#5528)
Currently, the linkerd jaeger check runs multiple checks, but none of them confirms that the jaeger injector is running.

This commit adds that required check to confirm the running state of the jaeger injector pod.
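
A check of this kind might look roughly like the sketch below. It uses a plain `Pod` struct instead of the Kubernetes client types, and the `checkInjectorRunning` function and the `jaeger-injector` name prefix are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a simplified stand-in for a Kubernetes pod (hypothetical type).
type Pod struct {
	Name  string
	Phase string
}

// checkInjectorRunning fails if no pod matches the prefix,
// or if any matching pod is not in the Running phase.
func checkInjectorRunning(pods []Pod, prefix string) error {
	found := false
	for _, p := range pods {
		if !strings.HasPrefix(p.Name, prefix) {
			continue
		}
		found = true
		if p.Phase != "Running" {
			return fmt.Errorf("pod %s is in phase %s, expected Running", p.Name, p.Phase)
		}
	}
	if !found {
		return fmt.Errorf("no pod with prefix %q found", prefix)
	}
	return nil
}

func main() {
	pods := []Pod{
		{Name: "collector-abc", Phase: "Running"},
		{Name: "jaeger-injector-xyz", Phase: "Pending"},
	}
	fmt.Println(checkInjectorRunning(pods, "jaeger-injector"))
}
```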

Fixes #5495

Signed-off-by: Yashvardhan Kukreja <yash.kukreja.98@gmail.com>
2021-01-19 08:35:16 +05:30
Tarun Pothulapati 0a2f1f3a26
viz: add check sub-command (#5496)
* viz: add check sub-command

This adds a new `viz check` cmd performing checks for the resources
in the linkerd-viz extension. Checks include resource checks and
the health of resources, certs, etc.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-15 15:31:45 -05:00