Commit Graph

12 Commits

Author SHA1 Message Date
Tarun Pothulapati cb6c1fce03
viz: make prom checks dynamic by using annotations (#5680)
Fixes #5652 

This PR adds new annotation that is added when a
external Prometheus is used. Based on that
annotations, The CLI can get to know if an external instance
is being used and if the annotation is absent, that the
the default instance is present.

This updates the viz Checks to skip some checkers if the default
 Prometheus instances are absent.

This PR also removes the grafana checks as they are not useful
and add unnecessary complexity.

This also cleans up some `grafanaUrl` stuff from the core
control-plane chart.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-02-12 21:25:42 +05:30
Alejandro Pedraza a14f3f4eec
Add the 'tapInjector.logLevel' value. (#5713)
Fixes #5686

Test:
```bash
$ linkerd viz install --set tapInjector.logLevel=debug | k apply -f -

// and then when creating a pod we can see debug log entries such as:

time="2021-02-10T16:19:28Z" level=debug msg="admission request: &AdmissionRequest{UID:c5e95e8d-...
```
2021-02-11 09:21:41 -05:00
Kevin Leimkuhler 75fcc9d623
Move tap from core into Viz extension (#5651)
Closes #5545.

This change moves all tap and tap-injector code into the viz directory. 

The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.

Finally, the tap Protobuf has been separated from the metrics-api and moved into
it's own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.

There is no change in behavior for the `viz tap` command.

### Reviewing

#### Docker images

All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.

#### Controller and pkg directories

This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.

#### viz/tap

This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.

The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-09 12:43:21 -05:00
Alejandro Pedraza b8ed799372
Include viz components in Prom scrapes, fix Linkerd Health charts (#5656)
* Include viz components in Prom scrapes, fix Linkerd Health charts

Fixes #5429

Expanded the `linkerd-controller` Prometheus scraping config so it also includes the `linkerd-viz` namespace. Also simplified the first relabelling config there removing the `_meta_kubernetes_pod_label_linkerd_io_control_plane_component` source label that wasn't serving any purpose. Just by its own, that extra scraping now allows having non-empty Go charts at the bottom of the `Linkerd Health` charts for the viz components.

Additionally, the `namespace-viz` variable was added into `health.json` which then is leveraged in the queries for the `Control-Plane Traffic` and `Control-Plane TCP Metrics` charts to include the viz pods.

Finally in that same file the queries for the `Data-Plane Telemetry` section were simplified by removing the filter on the `control_plane_ns` label which was redundant.
2021-02-04 09:40:23 -05:00
Takumi Sue 77add64860
Remove extra three dashes from helm templates (#5628)
(Background information)
In our company we are checking the sops-encrypted Linkerd manifest into GitHub repository,
and I came across the following problem.

---

Three dashes mean the start of the YAML document (or the end of the
directive).
https://yaml.org/spec/1.2/spec.html#id2800132

If there are only comments between `---`, the document is empty.
Assume the file which include an empty document at the top of itself.

```yaml
---
# foo
---
apiVersion: v1
kind: Namespace
metadata:
  name: foo
---
# bar
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
```

When we encrypt and decrypt it with [sops](https://github.com/mozilla/sops), the empty document will be
converted to `{}`.

```yaml
{}
---
apiVersion: v1
kind: Namespace
metadata:
    name: foo
---
apiVersion: v1
kind: Namespace
metadata:
    name: bar
```

It is invalid as k8s manifest ([apiVersion not set, kind not set]).

```
error validating data: [apiVersion not set, kind not set]
```

---

I'm afraid that it's sops's problem (at least partly), but anyhow this modification is enough harmless I think.
Thank you.

Signed-off-by: Takumi Sue <u630868b@alumni.osaka-u.ac.jp>
2021-02-01 10:51:34 -05:00
Matei David 0ce9e84a94
Introduce V1 to CRDs and Mutating Hooks (#5603)
*Closes #5484*
 ### Changes
---
*Overview*:
 * Update golden files and make necessary spec changes
 * Update test files for viz
 * Add v1 to healthcheck and uninstall
 * Fix link-crd clusterDomain field validation

- To update to v1, I had to change crd schemas to be version-based (i.e each version has to declare its own schema). I noticed an error in the link-crd (`targetClusterDomain` was `targetDomainName`). Also, additionalPrinterColumns are also version-dependent as a field now.

- For `admissionregistration` resources I had to add an additional `admissionReviewVersions` field -- I included `v1` and `v1beta1`.

- In `healthcheck.go` and `resources.go` (used by `uninstall`) I had to make some changes to the client-go versions (i.e from `v1beta1` to `v1` for admissionreg and apiextension) so that we don't see any warning messages when uninstalling or when we do any install checks. 

I tested again different cli and k8s versions to have a bit more confidence in the changes (in addition to automated tests), hope the cases below will be enough, if not let me know and I can test further.

### Tests

Linkerd local build CLI + k8s 1.19+
`install/check/mc-check/mc-install/mc-link/viz-install/viz-check/uninstall/`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2+k3s1", GitCommit:"1d4adb0301b9a63ceec8cabb11b309e061f43d5f", GitTreeState:"clean", BuildDate:"2021-01-14T23:52:37Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

$ bin/linkerd version
Client version: git-b0fd2ec8
Server version: unavailable

$ bin/linkerd install | kubectl apply -f -
- no errors, no version warnings - 

$ bin/linkerd check --expected-version git-b0fd2ec8
Status check results are :tick:

# MC

$ bin/linkerd mc install | k apply -f - 
- no erros, no version warnings - 

$ bin/linkerd mc check
Status check results are :tick:

$ bin/linkerd mc link foo | k apply -f -   # test crd creation
# had a validation error here because the schema had targetDomainName instead of targetClusterDomain
# changed, rebuilt cli, re-installed mc, tried command again
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created
...

# VIZ
$ bin/linkerd viz install | k apply -f - 
- no errors, no version warnings - 

$ bin/linkerd viz check 
- no errors, no version warnings - 
Status check results are :tick:

$ bin/linkerd uninstall | k delete -f -
- no errors, no version warnings - 
```

Linkerd local build CLI + k8s 1.17
`check-pre/install/mc-check/mc-install/mc-link/viz-install/viz-check`
```
$ kubectl version
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17-rc1+k3s1", GitCommit:"e8c9484078bc59f2cd04f4018b095407758073f5", GitTreeState:"clean", BuildDate:"2021-01-14T06:20:56Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

$ bin/linkerd version
Client version: git-3d2d4df1 # made changes to link-crd after prev test case
Server version: unavailable

$ bin/linkerd check --pre --expected-version git-3d2d4df1
- no errors, no version warnings -
Status check results are :tick:

$ bin/linkerd install | k apply -f -
- no errors, no version warnings -

$ bin/linkerd check --expected-version git-3d2d4df1
- no errors, no version warnings - 
Status check results are :tick:

$ bin/linkerd mc install | k apply -f -
- no errors, no version warnings - 

$ bin/linkerd mc check 
- no errors, no version warnings - 
Status check results are :tick:

$ bin/linkerd mc link --cluster-name foo | k apply -f -
bin/linkerd mc link --cluster-name foo | k apply -f -
secret/cluster-credentials-foo created
link.multicluster.linkerd.io/foo created

# VIZ

$ bin/linkerd viz install | k apply -f - 
- no errors, no version warnings - 

$ bin/linkerd viz check
- no errors, no version warnings -
- hangs up indefinitely after linkerd-viz can talk to Kubernetes
```

Linkerd edge (21.1.3) CLI + k8s 1.17 (already installed)
`check`
```
$ linkerd version
Client version: edge-21.1.3
Server version: git-3d2d4df1

$ linkerd check
- no errors -
- warnings: mismatch between cli & control plane, control plane not up to date (both expected) -
Status check results are :tick:
```

Linkerd stable (2.9.2) CLI + k8s 1.17 (already installed)
`check/uninstall`
```
$ linkerd version
Client version: stable-2.9.2
Server version: git-3d2d4df1

$ linkerd check
× control plane ClusterRoles exist
    missing ClusterRoles: linkerd-linkerd-tap
    see https://linkerd.io/checks/#l5d-existence-cr for hints

Status check results are ×

# viz wasn't installed, hence the error, installing viz didn't help since
# the res is named `viz-tap` now
# moving to uninstall

$ linkerd uninstall | k delete -f -
- no warnings, no errors - 
```

_Note_: I used `go test ./cli/cmd/... --generate` which is why there are so many changes 😨 

Signed-off-by: Matei David <matei.david.35@gmail.com>
2021-02-01 09:18:13 -05:00
Alejandro Pedraza 8ac5360041
Extract from public-api all the Prometheus dependencies, and moves things into a new viz component 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompaigning http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
2021-01-21 18:26:38 -05:00
Tarun Pothulapati 288fbefe02
viz: cleanup helm values.yaml (#5546)
* viz: cleanup helm values.yaml

This branch fixes some nits around naming of default variables
i.e replace the usage of global with default.

Renames globalLogLevel to defaultLogLevel and globalUID to
defaultUID along with some chart README updates.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-22 00:48:16 +05:30
Kevin Leimkuhler e7f2a3fba3
viz: add tap-injector (#5540)
## What this changes

This adds a tap-injector component to the `linkerd-viz` extension which is
responsible for adding the tap service name environment variable to the Linkerd
proxy container.

If a pod does not have a Linkerd proxy, no action is taken. If tap is disabled
via annotation on the pod or the namespace, no action is taken.

This also removes the environment variable for explicitly disabling tap through
an environment variable. Tap status for a proxy is now determined only be the
presence or absence of the tap service name environment variable.

Closes #5326

## How it changes

### tap-injector

The tap-injector component determines if `LINKERD2_PROXY_TAP_SVC_NAME` should be
added to a pod's Linkerd proxy container environment. If the pod satisfies the
following, it is added:

- The pod has a Linkerd proxy container
- The pod has not already been mutated
- Tap is not disabled via annotation on the pod or the pod's namespace

### LINKERD2_PROXY_TAP_DISABLED

Now that tap is an extension of Linkerd and not a core component, it no longer
made sense to explicitly enable or disable tap through this Linkerd proxy
environment variable. The status of tap is now determined only be if the
tap-injector adds or does not add the `LINKERD2_PROXY_TAP_SVC_NAME` environment
variable.

### controller image

The tap-injector has been added to the controller image's several startup
commands which determines what it will do in the cluster.

As a follow-up, I think splitting out the `tap` and `tap-injector` commands from
the controller image into a linkerd-viz image (or something like that) makes
sense.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-21 11:24:08 -05:00
Tarun Pothulapati 4c3d002501
viz: move sub-cmds using viz extension under viz cmd (#5485)
* viz: move sub-cmds using viz extension under viz cmd

Fixes #5327 , #5524 

This branch moves the following commands, under the `linkerd viz`
cmd as they use the viz extension to perform the job.

- dashboard
- edges
- routes
- stat
- tap
- top

This also creates a new pkg `public-api` which fecilitates
interaction and communication with public-api to be used
across extensions.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
2021-01-13 12:11:25 +05:30
Alejandro Pedraza a9317af3d8
Add back support for proxy resource settings (#5517)
The last viz refactoring removed support for modifying the k8s resources
used by the proxies injected into the control plane components (values
like `tapProxyResources`, `prometheus.proxy.resources`, etc).

This adds them back, using a consistent naming: `tap.proxy.resources`,
`dashboard.proxy.resources`, etc.

Also fixes the tap helm template that was making reference to
`.Values.tapResources` instead of `.Values.tap.resources`.

Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-12 10:56:43 -05:00
Tarun Pothulapati 836c077898
viz: add render golden tests (#5433)
* viz: add render golden tests

This branch adds golden tests for the viz install. This would be
useful to track changes in render as more changes are added.

This also moves the common code that is used across extensions
to generate diffs into `testutil` to be able to be used widely.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-01-12 11:59:16 +05:30