Fixes #5966
Fixes #5955
The metrics-api container in the Viz extension does not have the default set of system CA certificates installed. This means that it fails to validate the certificate of an external Prometheus server when connecting over HTTPS.
We now install the default CA certs into the container.
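The failure mode can be illustrated with a minimal Go sketch, assuming a client that relies on the system trust store to verify the server. `newHTTPSClient` is a hypothetical helper written for illustration, not code taken from metrics-api.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
)

// newHTTPSClient is a hypothetical helper (not taken from metrics-api).
// It builds an HTTP client that validates server certificates against the
// system CA pool. In an image that ships no CA certificates, that pool is
// empty or fails to load, so a publicly signed certificate cannot be verified.
func newHTTPSClient() (*http.Client, error) {
	pool, err := x509.SystemCertPool()
	if err != nil {
		return nil, fmt.Errorf("loading system CA pool: %w", err)
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	client, err := newHTTPSClient()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("client ready:", client != nil)
}
```

With the CA bundle installed in the image, the same client code can verify certificates issued by well-known authorities without any extra configuration.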
Signed-off-by: Alex Leong <alex@buoyant.io>
Add peer label to TCP read and write stat queries
Closes #5693
### Tests
---
After refactoring, `linkerd viz stat` behaves the same way (I haven't checked gateways or routes).
```
$ linkerd viz stat deploy/web -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 91.91% 2.3rps 2ms 4ms 5ms 3 185.3B/s 5180.0B/s
# same value as before, latency seems to have dropped
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
# queries show the peer label
---
$ linkerd viz stat deploy/web -n emojivoto --from deploy/vote-bot -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 93.16% 1.9rps 3ms 4ms 4ms 1 4503.4B/s 153.1B/s
# stats same as before except for latency which seems to have dropped a bit
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
# queries show the right label
```
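The shape of the queries in the debug logs above can be reproduced with a small Go sketch. `buildTCPQuery` is a hypothetical helper, not the function used by the stat code; it only shows how adding a `peer` label to the selector set changes the rendered query.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildTCPQuery is a hypothetical helper that renders a query of the shape
// seen in the debug logs. Labels are sorted by name so the output is
// deterministic.
func buildTCPQuery(metric string, labels map[string]string, window string, groupBy []string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	selectors := make([]string, 0, len(names))
	for _, name := range names {
		selectors = append(selectors, fmt.Sprintf("%s=%q", name, labels[name]))
	}

	return fmt.Sprintf("sum(increase(%s{%s}[%s])) by (%s)",
		metric, strings.Join(selectors, ", "), window, strings.Join(groupBy, ", "))
}

func main() {
	// Inbound query: the peer label is "src", as in the first log excerpt.
	q := buildTCPQuery("tcp_write_bytes_total", map[string]string{
		"deployment": "web",
		"direction":  "inbound",
		"namespace":  "emojivoto",
		"peer":       "src",
	}, "1m", []string{"namespace", "deployment"})
	fmt.Println(q)
}
```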
Signed-off-by: mateiidavid <matei.david.35@gmail.com>
* update go.mod and docker images to go 1.16.1
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* update test error messages for ParseDuration
* update go version to 1.16.2
When introducing the `linkerd-await` helper, we provided a default value
for `TARGETARCH`. This appears to interfere with multi-arch image
builds, causing ARM builds to fetch amd64 binaries.
Unsetting this default appears to fix this issue.
When a container starts up, we generally want to wait for the proxy to
initialize before starting the controller (which may initiate outbound
connections, especially to the Kubernetes API). This is true for all
pods except the identity controller, which must start before its proxy.
This change adds the linkerd-await helper to all of our container
images. Its use is explicitly disabled in the identity controller, due
to startup ordering constraints, and the heartbeat controller, because
it does not run a proxy currently.
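The startup ordering can be sketched in Go as a readiness poll that blocks until the proxy answers. This is an illustration of the idea only and says nothing about how linkerd-await itself is implemented; `waitReady` and `demo` are hypothetical names, and a local test server stands in for the proxy's readiness endpoint.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// waitReady polls url until it answers 200 OK or ctx is done.
func waitReady(ctx context.Context, url string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		// Not ready yet: wait for the next tick or give up with the context.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// demo stands in for a proxy that is not ready for its first two probes.
func demo() error {
	var probes int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&probes, 1) <= 2 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return waitReady(ctx, srv.URL, 10*time.Millisecond)
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("proxy never became ready:", err)
		return
	}
	fmt.Println("proxy ready, starting controller")
}
```

A component such as the identity controller, which must start before its proxy, would simply skip the wait.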
Fixes #5819
* Remove linkerd prefix from extension resources
This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
The Go-1.14 release branch includes a number of important updates. This
change updates our containers' base image to the latest release, 1.14.15.
See linkerd/linkerd2-proxy-init#32
Fixes #5655
Closes #5545.
This change moves all tap and tap-injector code into the viz directory.
The tap and tap-injector components now also use a new tap image, separating
these components from the controller image that they were previously part of. As
a result, the controller image no longer carries any build dependencies related
to tap.
Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
the metrics-api and tap Protobuf.
There is no change in behavior for the `viz tap` command.
### Reviewing
#### Docker images
All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.
#### Controller and pkg directories
This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.
#### viz/tap
This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.
The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5575
Now that only viz makes use of the `SelfCheck` API, this merges `healthcheck.proto` into `viz.proto`.
Also removed the "checkRPC" functionality, which handled multiple API responses and was only used by `SelfCheck`, because the extra complexity was not warranted. We revert to the plain vanilla "check", which just concatenates the error responses.
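A minimal sketch of the plain approach, assuming each failed probe is reported as a Go error. `joinCheckErrors` and `sampleFailures` are hypothetical names used for illustration and are not taken from the healthcheck package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// joinCheckErrors is a hypothetical helper. It folds the individual
// failures into a single error with one message per line, and returns nil
// when nothing failed.
func joinCheckErrors(errs []error) error {
	msgs := make([]string, 0, len(errs))
	for _, err := range errs {
		if err != nil {
			msgs = append(msgs, err.Error())
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return errors.New(strings.Join(msgs, "\n"))
}

// sampleFailures returns two illustrative failures.
func sampleFailures() []error {
	return []error{
		errors.New("Error calling the Kubernetes API: someerror"),
		errors.New("Error calling Prometheus from the control plane: someerror"),
	}
}

func main() {
	if err := joinCheckErrors(sampleFailures()); err != nil {
		fmt.Println(err)
	}
}
```

A single check can then surface every underlying failure at once without a dedicated multi-response code path.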
## Success Output
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```
## Failure Examples
Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common`, as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api`, moving the Prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying HTTP server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* mc check shouldn't err when viz is not available. Also set up the logger correctly in multicluster/cmd/root.go so that messages are displayed when --verbose is used
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.prot`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto`, except for the `Version` stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
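
The temporary split described above, where one server exposes separate `PublicAPIServer` and `VizAPIServer` interfaces, can be sketched in Go as follows. The interface names come from the commit, but the method sets and signatures are simplified assumptions, and `grpcServer`, `versionOf` and `statsOf` are hypothetical.

```go
package main

import "fmt"

// PublicAPIServer and VizAPIServer are simplified stand-ins for the two
// interfaces named in the commit. Their method sets here are assumptions.
type PublicAPIServer interface {
	Version() string
}

type VizAPIServer interface {
	StatSummary(resource string) string
}

// grpcServer is a hypothetical type that implements both interfaces for
// now, so callers can already depend on only the API they need.
type grpcServer struct {
	version string
}

func (s *grpcServer) Version() string { return s.version }

func (s *grpcServer) StatSummary(resource string) string {
	return "stats for " + resource
}

// Compile-time checks that grpcServer satisfies both interfaces.
var (
	_ PublicAPIServer = (*grpcServer)(nil)
	_ VizAPIServer    = (*grpcServer)(nil)
)

// versionOf only needs the public API.
func versionOf(api PublicAPIServer) string { return api.Version() }

// statsOf only needs the viz API.
func statsOf(api VizAPIServer, resource string) string {
	return api.StatSummary(resource)
}

func main() {
	srv := &grpcServer{version: "dev"}
	fmt.Println(versionOf(srv))
	fmt.Println(statsOf(srv, "deploy/web"))
}
```

Because each caller names the narrow interface it uses, moving the viz methods to a separate server later would not require changes on the caller side.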