Fixes #5652
This PR adds a new annotation that is set when an
external Prometheus is used. Based on that
annotation, the CLI can tell whether an external instance
is being used; if the annotation is absent, it assumes
the default instance is present.
This updates the viz checks to skip some checkers if the default
Prometheus instance is absent.
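A minimal Go sketch of that skip logic, assuming the annotation is read from the viz namespace's annotations. The annotation key `example.io/external-prometheus`, the `check` type and the `needsDefaultProm` flag are hypothetical names for illustration, not the identifiers this PR uses.

```go
package main

import "fmt"

// externalPrometheusAnnotation is a HYPOTHETICAL key used only for this sketch.
const externalPrometheusAnnotation = "example.io/external-prometheus"

// check is a simplified stand-in for a single viz health check.
type check struct {
	description string
	// needsDefaultProm marks checks that only make sense for the bundled Prometheus.
	needsDefaultProm bool
}

// usesExternalPrometheus reports whether the (assumed) annotation is present.
func usesExternalPrometheus(nsAnnotations map[string]string) bool {
	_, ok := nsAnnotations[externalPrometheusAnnotation]
	return ok
}

// selectChecks drops the default-Prometheus checks when an external
// Prometheus is in use, and keeps every check otherwise.
func selectChecks(all []check, nsAnnotations map[string]string) []check {
	external := usesExternalPrometheus(nsAnnotations)
	var selected []check
	for _, c := range all {
		if external && c.needsDefaultProm {
			continue // the default Prometheus was never installed
		}
		selected = append(selected, c)
	}
	return selected
}

func main() {
	checks := []check{
		{description: "viz extension pods are running"},
		{description: "prometheus pod is running", needsDefaultProm: true},
	}
	annotations := map[string]string{externalPrometheusAnnotation: "http://prometheus.example:9090"}
	for _, c := range selectChecks(checks, annotations) {
		fmt.Println("running:", c.description)
	}
}
```

Keying the decision on the presence of a single annotation keeps the default case (no annotation) backwards compatible: every check still runs.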
This PR also removes the Grafana checks, as they are not useful
and add unnecessary complexity.
This also cleans up some `grafanaUrl` leftovers from the core
control-plane chart.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes #5686
Test:
```bash
$ linkerd viz install --set tapInjector.logLevel=debug | k apply -f -
# and then, when creating a pod, we can see debug log entries such as:
time="2021-02-10T16:19:28Z" level=debug msg="admission request: &AdmissionRequest{UID:c5e95e8d-...
```
Closes #5545.
This change moves all tap and tap-injector code into the viz directory.
The tap and tap-injector components now also use a new tap image, separating
these components from the controller image they were previously part of. As a
result, the controller image no longer has any build dependencies related to
tap.
Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.
There is no change in behavior for the `viz tap` command.
### Reviewing
#### Docker images
All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.
#### Controller and pkg directories
These are primarily deletions. Most of the code deleted from these directories
now lives in the tap directory of the Viz extension.
#### viz/tap
This is where all the tap-related code now lives. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.
The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Include viz components in Prom scrapes, fix Linkerd Health charts
Fixes #5429
Expanded the `linkerd-controller` Prometheus scraping config so that it also includes the `linkerd-viz` namespace. Also simplified the first relabelling config there by removing the `__meta_kubernetes_pod_label_linkerd_io_control_plane_component` source label, which wasn't serving any purpose. On its own, that extra scraping makes the Go charts at the bottom of the `Linkerd Health` dashboard non-empty for the viz components.
Additionally, the `namespace-viz` variable was added to `health.json`; it is then used in the queries for the `Control-Plane Traffic` and `Control-Plane TCP Metrics` charts to include the viz pods.
Finally, in that same file, the queries for the `Data-Plane Telemetry` section were simplified by removing the redundant filter on the `control_plane_ns` label.
(Background information)
At our company we check the sops-encrypted Linkerd manifest into a GitHub repository,
and I came across the following problem.
---
Three dashes mark the start of a YAML document (or the end of the
directives).
https://yaml.org/spec/1.2/spec.html#id2800132
If there are only comments between two `---` markers, the document is empty.
Consider a file that includes an empty document at the top:
```yaml
---
# foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
# bar
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```
When we encrypt and decrypt it with [sops](https://github.com/mozilla/sops), the empty document will be
converted to `{}`.
```yaml
{}
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```
This is not a valid k8s manifest; validation fails because `apiVersion` and `kind` are not set:
```
error validating data: [apiVersion not set, kind not set]
```
---
I suspect this is (at least partly) a sops problem, but either way I think this modification is harmless enough.
Thank you.
Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
*Closes #5484*
### Changes
---
*Overview*:
* Update golden files and make necessary spec changes
* Update test files for viz
* Add v1 to healthcheck and uninstall
* Fix link-crd clusterDomain field validation
- To update to v1, I had to change the CRD schemas to be version-based (i.e. each version has to declare its own schema). I also noticed an error in the link CRD: the field was declared as `targetDomainName` instead of `targetClusterDomain`. In addition, `additionalPrinterColumns` is now a version-dependent field as well.
- For `admissionregistration` resources, I had to add the `admissionReviewVersions` field, which is required in v1; I included `v1` and `v1beta1`.
- In `healthcheck.go` and `resources.go` (used by `uninstall`), I had to change the client-go API versions (i.e. from `v1beta1` to `v1` for admissionregistration and apiextensions) so that we don't see any warning messages when uninstalling or when running the install checks.
I tested against different CLI and k8s versions to have a bit more confidence in the changes (in addition to the automated tests). I hope the cases below are enough; if not, let me know and I can test further.
### Tests
Linkerd local build CLI + k8s 1.19+
`install/check/mc-check/mc-install/mc-link/viz-install/viz-check/uninstall/`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2+k3s1", GitCommit:"1d4adb0301b9a63ceec8cabb11b309e061f43d5f", GitTreeState:"clean", BuildDate:"2021-01-14T23:52:37Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-b0fd2ec8
Server version: unavailable
$ bin/linkerd install | kubectl apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-b0fd2ec8
Status check results are :tick:
# MC
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
Status check results are :tick:
$ bin/linkerd mc link foo | k apply -f - # test crd creation
# had a validation error here because the schema had targetDomainName instead of targetClusterDomain
# changed, rebuilt cli, re-installed mc, tried command again
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
...
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd uninstall | k delete -f -
- no errors, no version warnings -
```
Linkerd local build CLI + k8s 1.17
`check-pre/install/mc-check/mc-install/mc-link/viz-install/viz-check`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17-rc1+k3s1", GitCommit:"e8c9484078bc59f2cd04f4018b095407758073f5", GitTreeState:"clean", BuildDate:"2021-01-14T06:20:56Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
$ bin/linkerd version
Client version: git-3d2d4df1 # made changes to link-crd after prev test case
Server version: unavailable
$ bin/linkerd check --pre --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd check --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd mc check
- no errors, no version warnings -
Status check results are :tick:
$ bin/linkerd mc link --cluster-name foo | k apply -f -
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
# VIZ
$ bin/linkerd viz install | k apply -f -
- no errors, no version warnings -
$ bin/linkerd viz check
- no errors, no version warnings -
- hangs indefinitely after linkerd-viz can talk to Kubernetes
```
Linkerd edge (21.1.3) CLI + k8s 1.17 (already installed)
`check`
```
$ linkerd version
Client version: edge-21.1.3
Server version: git-3d2d4df1
$ linkerd check
- no errors -
- warnings: mismatch between cli & control plane, control plane not up to date (both expected) -
Status check results are :tick:
```
Linkerd stable (2.9.2) CLI + k8s 1.17 (already installed)
`check/uninstall`
```
$ linkerd version
Client version: stable-2.9.2
Server version: git-3d2d4df1
$ linkerd check
× control plane ClusterRoles exist
missing ClusterRoles: linkerd-linkerd-tap
see https://linkerd.io/checks/#l5d-existence-cr for hints
Status check results are ×
# viz wasn't installed, hence the error; installing viz didn't help since
# the resource is named `viz-tap` now
# moving to uninstall
$ linkerd uninstall | k delete -f -
- no warnings, no errors -
```
_Note_: I used `go test ./cli/cmd/... --generate`, which is why there are so many changes 😨
Signed-off-by: Matei David <matei.david.35@gmail.com>
* Protobuf changes:
  - Moved `healthcheck.proto` back from viz to `proto/common`, as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
  - Extracted the IP-related types from `viz.proto` and put them in `/controller/gen/common/net`, to be used by both the public and the viz APIs.
* Added chart templates for the new viz linkerd-metrics-api pod.
* Spin-off viz healthcheck:
  - Created `viz/pkg/healthcheck/healthcheck.go`, which wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields that were removed from the core `healthcheck`. That way the core healthcheck has no dependencies on viz, and viz's healthcheck can now be used to retrieve viz API clients.
  - The core and viz healthcheck libs are now abstracted via the new `healthcheck.Runner` interface.
  - Refactored the data plane checks so they don't rely on calling `ListPods`.
  - The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is handling the command itself. This command also now retrieves its viz API client through viz's healthcheck.
* Removed the linkerd-controller dependency on Prometheus:
  - Removed the `global.prometheusUrl` config from the core values.yml.
  - Left the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
  - Created a new gRPC server under `viz/metrics-api`, moving the Prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying HTTP server).
  - Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears, as it's enough to rely on the viz `ApiClient` protobuf type.
  - Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
  - Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, stripped the Prometheus parameters and other no-longer-relevant bits out of the public-api's `main.go` file.
* linkerd-web updates: it needs to connect to both the public-api and the viz API, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
  - Changes to command files under `cli/cmd`:
    - Updated `endpoints.go` according to the new API interface name.
    - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
  - Changes to command files under `viz/cmd`:
    - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
  - Other changes to have tests pass:
    - Added `metrics-api` to the list of docker images to build in the Actions workflows.
    - In `bin/fmt`, exclude protobuf-generated files instead of entire directories, because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* `mc check` shouldn't error when viz is not available. Also properly configure logging in `multicluster/cmd/root.go` so that messages are displayed when `--verbose` is used.
* viz: cleanup helm values.yaml
This branch fixes some nits around the naming of default variables,
i.e. it replaces the usage of `global` with `default`.
It renames `globalLogLevel` to `defaultLogLevel` and `globalUID` to
`defaultUID`, along with some chart README updates.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## What this changes
This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.
If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.
This also removes the environment variable for explicitly disabling tap.
Tap status for a proxy is now determined only by the presence or absence of
the tap service name environment variable.
Closes #5326
## How it changes
### tap-injector
The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:
- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace
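The three conditions above can be sketched in Go as a single function over a simplified pod model. This is not the tap-injector's code: the `Pod` and `Container` types are hypothetical, the `config.linkerd.io/disable-tap` key is assumed for the disable annotation, "already mutated" is approximated by the variable already being present, and a real admission webhook would return a JSON patch rather than mutate a struct in place.

```go
package main

import "fmt"

// Simplified, hypothetical pod model; not the Kubernetes API types.
type Container struct {
	Name string
	Env  map[string]string
}

type Pod struct {
	Annotations map[string]string
	Containers  []Container
}

const (
	proxyContainerName = "linkerd-proxy"
	tapSvcEnv          = "LINKERD2_PROXY_TAP_SVC_NAME"
	// Assumed annotation key for disabling tap on a pod or its namespace.
	tapDisabledAnnotation = "config.linkerd.io/disable-tap"
)

// injectTap adds the tap service name to the proxy container's environment
// only if the pod has a proxy, has not been mutated yet, and tap is not
// disabled on the pod or its namespace. It reports whether it changed the pod.
func injectTap(pod *Pod, nsAnnotations map[string]string, tapSvcName string) bool {
	if pod.Annotations[tapDisabledAnnotation] == "true" || nsAnnotations[tapDisabledAnnotation] == "true" {
		return false // tap disabled via annotation
	}
	for i := range pod.Containers {
		c := &pod.Containers[i]
		if c.Name != proxyContainerName {
			continue
		}
		if _, present := c.Env[tapSvcEnv]; present {
			return false // already mutated
		}
		if c.Env == nil {
			c.Env = map[string]string{}
		}
		c.Env[tapSvcEnv] = tapSvcName
		return true
	}
	return false // no Linkerd proxy container
}

func main() {
	pod := &Pod{Containers: []Container{{Name: "app"}, {Name: proxyContainerName}}}
	fmt.Println(injectTap(pod, nil, "tap-svc")) // first admission request mutates the pod
	fmt.Println(injectTap(pod, nil, "tap-svc")) // already mutated, so this is a no-op
}
```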
### LINKERD2_PROXY_TAP_DISABLED
Now that tap is an extension of Linkerd and not a core component, it no longer
makes sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only by whether the
tap-injector adds the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.
### controller image
The tap-injector has been added as one of the controller image's startup
commands, which determine what the image does in the cluster.
As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: move sub-cmds using viz extension under viz cmd
Fixes #5327, #5524
This branch moves the following commands under the `linkerd viz`
command, as they use the viz extension to do their job:
- dashboard
- edges
- routes
- stat
- tap
- top
This also creates a new `public-api` package, which facilitates
interaction with the public-api and can be used
across extensions.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
The last viz refactoring removed support for modifying the k8s resources
used by the proxies injected into the control plane components (values
like `tapProxyResources`, `prometheus.proxy.resources`, etc).
This adds them back, using a consistent naming: `tap.proxy.resources`,
`dashboard.proxy.resources`, etc.
Also fixes the tap Helm template, which referenced
`.Values.tapResources` instead of `.Values.tap.resources`.
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* viz: add render golden tests
This branch adds golden tests for the viz install. These will be
useful for tracking changes in the rendered output as more changes are added.
This also moves the common code used across extensions
to generate diffs into `testutil`, so that it can be used more widely.
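As a rough illustration of what a shared golden-file diff helper might look like, here is a hypothetical `diffLines` in Go. It is not the `testutil` code from this branch, and it compares lines by position, so a single inserted line shows up as a cascade of differences; a real helper would more likely use an LCS-based diff.

```go
package main

import (
	"fmt"
	"strings"
)

// diffLines returns one human-readable entry per line that differs between
// the expected (golden) output and the actual rendered output.
func diffLines(expected, actual string) []string {
	exp := strings.Split(expected, "\n")
	act := strings.Split(actual, "\n")
	n := len(exp)
	if len(act) > n {
		n = len(act)
	}
	var diffs []string
	for i := 0; i < n; i++ {
		var e, a string // a missing line compares as ""
		if i < len(exp) {
			e = exp[i]
		}
		if i < len(act) {
			a = act[i]
		}
		if e != a {
			diffs = append(diffs, fmt.Sprintf("line %d: -%q +%q", i+1, e, a))
		}
	}
	return diffs
}

func main() {
	golden := "kind: Deployment\nreplicas: 1"
	rendered := "kind: Deployment\nreplicas: 2"
	for _, d := range diffLines(golden, rendered) {
		fmt.Println(d)
	}
}
```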
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>