Fixes #6740
#6711 removed the usage of unnecessary reference variables
in the proxy template, as they are not needed. Their definitions
were left in place because of race conditions with extension installs.
As `2.11` was released with that change, now is a good time to
remove the definitions too, as no usages should be present after a
`2.11` upgrade.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Now that SMI functionality is fully moving into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting it by default.
This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it converts `TrafficSplit` objects into `ServiceProfiles`
that the destination component can understand and react to accordingly.
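For reference, a minimal sketch of what that conversion looks like, using simplified local types rather than the actual `linkerd-smi` controller code:

```go
package main

import "fmt"

// Simplified stand-ins for the SMI TrafficSplit and Linkerd ServiceProfile
// specs; the real controller uses the generated API types.
type Backend struct {
	Service string
	Weight  int
}

type TrafficSplitSpec struct {
	Service  string // apex service
	Backends []Backend
}

type WeightedDst struct {
	Authority string
	Weight    int
}

type ServiceProfileSpec struct {
	DstOverrides []WeightedDst
}

// toServiceProfile maps each TrafficSplit backend to a dstOverride on the
// apex service's ServiceProfile, which the destination controller understands.
func toServiceProfile(ts TrafficSplitSpec, namespace, clusterDomain string) ServiceProfileSpec {
	var sp ServiceProfileSpec
	for _, b := range ts.Backends {
		sp.DstOverrides = append(sp.DstOverrides, WeightedDst{
			Authority: fmt.Sprintf("%s.%s.svc.%s", b.Service, namespace, clusterDomain),
			Weight:    b.Weight,
		})
	}
	return sp
}

func main() {
	ts := TrafficSplitSpec{
		Service: "backend-svc",
		Backends: []Backend{
			{Service: "backend-svc", Weight: 500},
			{Service: "failing-svc", Weight: 500},
		},
	}
	fmt.Printf("%+v\n", toServiceProfile(ts, "default", "cluster.local"))
}
```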
Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e. splits and weights) is also
surfaced through the UI (in the new `services` tab) and the `viz` CLI.
So we are not really losing any UI functionality here.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* build: upgrade to Go 1.17
This commit introduces three changes:
1. Update the `go` directive in `go.mod` to 1.17
2. Update all Dockerfiles from `golang:1.16.2` to
`golang:1.17.3`
3. Update all CI to use Go 1.17
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* chore: run `go fmt ./...`
This commit synchronizes `//go:build` lines with `// +build` lines.
Reference: https://go.googlesource.com/proposal/+/master/design/draft-gobuild.md
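For reference, a Go file with a build constraint now carries both directive forms, which `gofmt` keeps in sync, for example:

```go
//go:build linux && amd64
// +build linux,amd64

// The //go:build line is the new syntax introduced alongside Go 1.17;
// the // +build line is retained for older toolchains.
package platform
```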
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Similar to the `linkerd authz` command, which lists all authorizations for a given resource, and `linkerd viz stat`, which can show metrics for policy resources, we introduce a `linkerd viz authz` command which shows metrics for server authorizations broken down by server for a given resource. It also shows the rate of unauthorized requests to each server. This is helpful for seeing a breakdown of which authorizations are being used and what proportion of traffic is being rejected. For example:
```console
> linkerd viz authz -n emojivoto deploy
SERVER AUTHZ SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
emoji-grpc emoji-grpc 100.00% 1.8rps 1ms 1ms 1ms
prom prom-prometheus - - - - -
voting-grpc [UNAUTHORIZED] - 0.9rps - - -
web-http web-public 50.00% 1.8rps 4ms 190ms 198ms
```
This shows us a few things right away:
* all traffic to the emoji-grpc server is authorized by the emoji-grpc server authorization
* the prom server defines a prom-prometheus server authorization, but it is not receiving any traffic
* the voting-grpc server has no server authorizations, and thus all 0.9rps is getting rejected
Fixes #6733
As policy resources provide a grouping, statistics summaries should
also be available for these groupings, which are useful to the user. Because
they are port-specific, they provide a great way to break these metrics
down further.
This PR adds support for the policy resources, i.e. `server` and `serverauthorization`,
in the `stat` command.
## Changes
This adds a new path in the `stat_summary.go` file to handle policy
objects. I tried to re-use some of the other paths,
but some of the labels seem to differ, so a separate path
had to be created. We can try to refactor and merge them later, though.
We support both request and TCP metrics for the `server` resource,
but only the former for `serverauthorization` resources,
as that is how the metrics are generated.
This also adds these policy objects to the `k8s` package so
that they are treated as known resources.
For both policy resources, `--from` doesn't work, as these
metrics are not exposed on the outbound side and there is no way to
query the client workload from the inbound metrics. `--to`
is supported to get metrics specifically for a destination workload
(just like with a service).
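As a rough illustration of the new path (not the actual `stat_summary.go` code), the idea is to key the inbound queries on policy-resource labels instead of pod-owner labels; the label names below are placeholders, not necessarily the ones the proxy exports:

```go
package main

import "fmt"

// buildPolicyLabels sketches how a stat request for a policy resource could be
// turned into a Prometheus label selector. The label names ("srv_name",
// "saz_name") are illustrative placeholders.
func buildPolicyLabels(kind, name, namespace string) (string, error) {
	switch kind {
	case "server":
		return fmt.Sprintf(`direction="inbound", srv_name=%q, namespace=%q`, name, namespace), nil
	case "serverauthorization":
		// TCP metrics are not broken down per authorization, so only
		// request metrics make sense for this kind.
		return fmt.Sprintf(`direction="inbound", saz_name=%q, namespace=%q`, name, namespace), nil
	default:
		return "", fmt.Errorf("unsupported policy resource: %s", kind)
	}
}

func main() {
	labels, _ := buildPolicyLabels("server", "web-http", "emojivoto")
	fmt.Printf("request_total{%s}\n", labels)
}
```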
## Testing
```bash
> curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject --proxy-log-level debug - | kubectl apply -f -
> kubectl apply -f 897de1a8d5/emojivoto-policy.yml
# Initial values
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.0rps 100.00% 1.8rps 1ms 1ms 3ms 1 188.6B/s 2072.9B/s
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps 80.70% 0.9rps 1ms 2ms 3ms 1 91.4B/s 52.7B/s
emojivoto web-http 0.0rps 90.68% 2.0rps 2ms 10ms 28ms 1 153.7B/s 4509.4B/s
# After changing the `emoji-grpc` authz
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell) took 2s
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.3rps 100.00% 1.1rps 0ms 0ms 0ms 1 156.5B/s 1282.4B/s
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps 87.88% 0.6rps 0ms 0ms 0ms 1 53.5B/s 31.5B/s
emojivoto web-http 0.0rps 61.18% 1.4rps 1ms 2ms 2ms 1 110.2B/s 2195.7B/s
# after changing the `web-http` authz
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.0rps - - - - - - - -
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps - - - - - - - -
emojivoto web-http 1.0rps - - - - - - - -
> linkerd viz stat srv/emoji-grpc -n emojivoto -owide
NAME SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emoji-grpc 100.00% 2.0rps 1ms 1ms 1ms 1 199.9B/s 2208.0B/s
> linkerd viz stat srv/web-http -n emojivoto -owide
NAME SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web-http 94.02% 1.9rps 4ms 9ms 10ms 1 152.7B/s 4505.9B/s
> linkerd viz stat srv -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emoji-grpc - 100.00% 2.0rps 1ms 1ms 1ms 1 201.6B/s 2209.8B/s
prom - - - - - - - - -
voting-grpc - 86.21% 1.0rps 1ms 1ms 1ms 1 98.3B/s 55.9B/s
web-http - 91.67% 2.0rps 3ms 8ms 10ms 1 157.7B/s 4600.3B/s
> linkerd viz stat serverauthorization/web-public -n emojivoto
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
web-http - 89.83% 2.0rps 3ms 9ms 10ms
> linkerd viz stat saz -n emojivoto
NAME AUTHORIZATION MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
emoji-grpc emoji-grpc - 100.00% 2.0rps 1ms 1ms 1ms
prom prom-prometheus - - - - - -
voting-grpc voting-grpc - 89.83% 1.0rps 1ms 1ms 1ms
web-http web-public - 94.96% 2.0rps 1ms 5ms 9ms
> linkerd viz stat saz/web-public -n emojivoto
NAME AUTHORIZATION MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
web-http web-public - 90.00% 2.0rps 1ms 5ms 9ms
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Currently, `viz stat` on services is pretty restricted because a service
is not a pod-owner resource. This PR fixes that by making
it use `direction="outbound", authority="svc"` when querying
the Prometheus metrics. This means that for services, we can
generate metrics from the *meshed* client side.
`StatsSummary` metrics on a service are further divided into
two kinds:
### Service has no `ServiceProfiles.dstOverrides`
In this case, we just return the metrics by
querying for `direction="outbound", authority="svc"`, along
with any `--from` resources specified as client query labels.
We also gate this path to fail for requests that have `--from`
as a service, or for `svc/* --to xyz`, as they are invalid, i.e.
we can't render metrics with a service as the client.
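A hedged sketch of what such an outbound query could look like (the exact label set and metric shape in the metrics-api may differ):

```go
package main

import "fmt"

// serviceRequestQuery sketches the outbound query used when a service has no
// dstOverrides: metrics come from meshed clients, keyed by the service's
// authority, optionally restricted to a --from deployment. Label names are
// illustrative.
func serviceRequestQuery(svc, namespace, fromDeploy string) string {
	authority := fmt.Sprintf("%s.%s.svc.cluster.local", svc, namespace)
	labels := fmt.Sprintf(`direction="outbound", authority=%q`, authority)
	if fromDeploy != "" {
		labels += fmt.Sprintf(`, deployment=%q`, fromDeploy)
	}
	return fmt.Sprintf("sum(increase(response_total{%s}[1m])) by (classification)", labels)
}

func main() {
	fmt.Println(serviceRequestQuery("backend-svc", "linkerd-trafficsplit-test-sp", "slow-cooker"))
}
```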
### Service has `ServiceProfiles.dstOverrides`
Here, we follow a path similar to the `TrafficSplit` one,
except that we use a `ServiceProfile` resource
object instead.
_The TrafficSplit path will be removed or merged into the
`Service` path in a separate PR for simplification._
## Testing
### Apply Traffic Splitting through `ServiceProfiles`
```bash
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5 took 1m11s
➜ k create ns linkerd-trafficsplit-test-sp ~/work/linkerd2
namespace/linkerd-trafficsplit-test-sp created
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5
➜ ./bin/linkerd inject ./test/integration/trafficsplit/testdata/application.yaml | k -n linkerd-trafficsplit-test-sp apply -f - ~/work/linkerd2
document missing "kind" field, skipped
deployment "backend" injected
service "backend-svc" skipped
deployment "failing" injected
service "failing-svc" skipped
deployment "slow-cooker" injected
service "slow-cooker" skipped
deployment.apps/backend created
service/backend-svc created
deployment.apps/failing created
service/failing-svc created
deployment.apps/slow-cooker created
service/slow-cooker created
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5
➜ k apply -f ./test/integration/trafficsplit/testdata/sp/updated-traffic-split-leaf-weights.yaml -n linkerd-trafficsplit-test-sp ~/work/linkerd2
serviceprofile.linkerd.io/backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local created
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5
➜ k describe sp -n linkerd-trafficsplit-test-sp ~/work/linkerd2
Name: backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
Namespace: linkerd-trafficsplit-test-sp
Labels: <none>
Annotations: <none>
API Version: linkerd.io/v1alpha2
Kind: ServiceProfile
Metadata:
Creation Timestamp: 2021-07-01T11:05:06Z
Generation: 1
Managed Fields:
API Version: linkerd.io/v1alpha2
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:dstOverrides:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2021-07-01T11:05:06Z
Resource Version: 1398
UID: fce0a250-1396-4a14-9729-e19030048c7a
Spec:
Dst Overrides:
Authority: backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
Weight: 500m
Authority: failing-svc.linkerd-trafficsplit-test-sp.svc.cluster.local:8081
Weight: 500m
Events: <none>
```
### CLI Output
```bash
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp ~/work/linkerd2
NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc 500m 100.00% 0.9rps 1ms 2ms 2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local failing-svc 500m 0.00% 1.1rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via took 2s
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker ~/work/linkerd2
NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc 500m 100.00% 0.4rps 1ms 2ms 2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local failing-svc 500m 0.00% 0.6rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via took 2s
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1 ~/work/linkerd2
NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc 500m 100.00% 0.5rps 1ms 2ms 2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local failing-svc 500m 0.00% 0.5rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/prometheus -n linkerd-viz ~/work/linkerd2
StatSummary API error: service only supported as a target on 'from' queries, or as a destination on 'to' queries%
# With no `sp.dstOverrides`
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via took 10s
➜ k -n linkerd-trafficsplit-test-sp delete sp backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local ~/work/linkerd2
serviceprofile.linkerd.io "backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local" deleted
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp ~/work/linkerd2
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc - 100.00% 1.2rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1 --from-namespace linkerd-trafficsplit-test-sp
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc - 100.00% 0.6rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker --from-namespace linkerd-trafficsplit-test-sp
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc - 100.00% 0.7rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat deploy/slow-cooker -n linkerd-trafficsplit-test-sp --to svc/backend-svc ~/work/linkerd2
No traffic found.
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ~/work/linkerd2
```
Note: _This means that we need documentation changes to
let users know that `viz stat` on a service shows client-side
metrics and will be missing metrics from unmeshed
clients._
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes https://github.com/linkerd/linkerd2/issues/3706
The implementation of the `linkerd viz edges` command works by gathering HTTP and TCP metrics in both the inbound and outbound directions and combining this data in dubious ways.
We make the implementation simpler and more correct by instead doing the following:
* Gather TCP metrics only
  * (this drops support for very old proxy versions which do not expose the `tcp_open_connections` metric)
* Gather outbound metrics only
  * (all meshed edges will have a src in the mesh and will be present in the outbound metrics)
* Outbound metrics do not have a `client_id` label, so we fill in this missing data by inspecting the source pod via the k8s API and reconstructing that pod's TLS identity based on its service account name and namespace (see the sketch below).
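A minimal sketch of that reconstruction, assuming the usual Linkerd identity format of `<serviceaccount>.<namespace>.serviceaccount.identity.<control-plane-ns>.<trust-domain>`:

```go
package main

import "fmt"

// reconstructIdentity builds the TLS identity we expect a source pod to use,
// from its service account and namespace. The control-plane namespace and
// trust domain shown here are the common defaults; the real code reads them
// from configuration.
func reconstructIdentity(serviceAccount, namespace string) string {
	const (
		controlPlaneNS = "linkerd"
		trustDomain    = "cluster.local"
	)
	return fmt.Sprintf("%s.%s.serviceaccount.identity.%s.%s",
		serviceAccount, namespace, controlPlaneNS, trustDomain)
}

func main() {
	// e.g. a pod in emojivoto running as the "web" service account
	fmt.Println(reconstructIdentity("web", "emojivoto"))
	// -> web.emojivoto.serviceaccount.identity.linkerd.cluster.local
}
```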
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #5589
The core control plane has a dependency on the viz package in order to use the `BuildResource` function. This "backwards" dependency means that the viz source code needs to be included in core docker-builds and is bad for code hygiene.
We move the `BuildResource` function into the viz package. In `cli/cmd/metrics.go` we replace a call to `BuildResource` with a call directly to `CanonicalResourceNameFromFriendlyName`.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Fix namespace always showing up in topology graph
Fixes #6211
In #6091, code was added to include the namespace in the list of all
stat resource types. This was added so that we'd have a complete list
of resource types that could be suggested by the CLI autocompletion
code. However, this list was also used by the web frontend in a query
that gathered metrics from all resource types. This then caused the
query to inadvertently create an inbound metric for deployments that
showed traffic from a given namespace.
This change breaks up the code so that we have a separate list for the
autocompletion code without the namespace value, while the web frontend
keeps the original list it used prior to #6091.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Go 1.16.4 includes a fix for a denial-of-service in net/http: golang/go#45710
Go's error file-line formatting changed in 1.16.3, so this change
updates tests to only do suffix matching on these error strings.
Linkerd's CLI offers basic shell suggestions for most of its subcommands.
These suggestions are based on hardcoded suggestion lists; for example,
`viz stat` auto-suggests a list of all resource types supported by that
command, located in `k8s.go`. Although this provides basic suggestions
for k8s manifest resources, prior to this change there was no
way to get auto-suggested resources from the k8s cluster Linkerd is
installed in.
This change adds a new `CommandCompletion` module that reads arguments
and a substring, and queries the k8s API to determine what suggestions
to provide to the user. The current implementation makes the module
generic enough to query most Kubernetes resources and can be used for all
subcommands in the CLI.
This change only applies this behavior to the `stat` command as a first
step. Adding auto-completion for other commands will be done in a number
of follow-up PRs.
To test out the change on this branch:
- Build the CLI binaries on this branch
- Install the completion scripts for your shell environment. Running
`linkerd completion -h` should give you more info on how to do that.
- If not installed already, install `linkerd viz`
```
linkerd viz install | k apply -f -
```
- Test out completion by typing
```
linkerd viz stat [tab][tab]
```
Part of #5981
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
### What
This change adds the `config.linkerd.io/proxy-await` annotation which when set will delay application container start until the proxy is ready. This allows users to force application containers to wait for the proxy container to be ready without modifying the application's Docker image. This is different from the current use-case of [linkerd-await](https://github.com/olix0r/linkerd-await) which does require modifying the image.
---
To support this, Linkerd is using the fact that containers are started in the order that they appear in `spec.containers`. If `linkerd-proxy` is the first container, then it will be started first.
Kubernetes will start each container without waiting on the result of the previous container. However, if a container has a hook that is executed immediately after container creation, then Kubernetes will wait on the result of that hook before creating the next container. Using a `PostStart` hook in the `linkerd-proxy` container, the `linkerd-await` binary can be run and force Kubernetes to pause container creation until the proxy is ready. Once `linkerd-await` completes, the container hook completes and the application container is created.
Adding the `config.linkerd.io/proxy-await` annotation to a pod's metadata results in the `linkerd-proxy` container being the first container, as well as having the container hook:
```yaml
postStart:
exec:
command:
- /usr/lib/linkerd/linkerd-await
```
---
### Update after draft
There has been some additional discussion both off GitHub as well as on this PR (specifically with @electrical).
First, we decided that this feature should be enabled by default. The reason for this is that, more often than not, it will prevent start-up ordering issues from occurring without having any negative effects on the application. Additionally, this will be part of edge releases up until 2.11 (the next stable release), and having it enabled by default will allow us to check that it does not often conflict with applications. Once we are closer to 2.11, we'll be able to determine if it should be disabled by default because it causes more issues than it prevents.
Second, this feature will remain configurable; if disabled, then upon injection the proxy container will not be made the first container in the pod manifest. This is important for the reasons discussed with @electrical about tools that make assumptions about app containers being the first container. For example, Rancher defaults to showing overview pages for the `0` index container, and if the proxy container was always `0` then this would defeat the purpose of the overview page.
### Testing
To test this I used the `sleep.sh` script and changed `Dockerfile-proxy` to use it as its `ENTRYPOINT`. This forces the container to sleep for 20 seconds before starting the proxy.
---
`sleep.sh`:
```bash
#!/bin/bash
echo "sleeping..."
sleep 20
/usr/bin/linkerd2-proxy-run
```
`Dockerfile-proxy`:
```dockerfile
...
COPY sleep.sh /sleep.sh
RUN ["chmod", "+x", "/sleep.sh"]
ENTRYPOINT ["/sleep.sh"]
```
---
```bash
# Build and install with the above changes
$ bin/docker-build
...
$ bin/image-load --k3d
...
$ bin/linkerd install |kubectl apply -f -
```
Annotate the `emoji` deployment so that it's the only workload that waits for its proxy to be ready, and inject it:
```bash
cat emojivoto.yaml |bin/linkerd inject - |kubectl apply -f -
```
You can then see that the `emoji` deployment is not starting its application container until the proxy is ready:
```bash
$ kubectl get -n emojivoto pods
NAME READY STATUS RESTARTS AGE
voting-ff4c54b8d-sjlnz 1/2 Running 0 9s
emoji-f985459b4-7mkzt 0/2 PodInitializing 0 9s
web-5f86686c4d-djzrz 1/2 Running 0 9s
vote-bot-6d7677bb68-mv452 1/2 Running 0 9s
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Remove the `linkerd-controller` pod
Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.
## Removals
- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put them under `/pkg/prometheus/test_helper.go`, which is now used by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.
## Health checks
- Removed the `can initialize the client` check given that the client is no longer needed. The `linkerd-api` section was left with only the `control pods are ready` check, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.
## Other changes
- Fixed the Control Plane section of the dashboard, taking into account the disappearance of `linkerd-controller` and, previously, of `linkerd-sp-validator`.
* Removed `Version` API from the public-api
This is a sibling PR to #5993, and it's the second step towards removing the `linkerd-controller` pod.
This one deals with a replacement for the `Version` API, fetching instead the `linkerd-config` CM and retrieving the `LinkerdVersion` value.
## Changes to the public-api
- Removal of the `publicPb.ApiClient` entry from the `Client` interface
- Removal of the `publicPb.ApiServer` entry from the `Server` interface
- Removal of the `Version` and related methods from `client.go`, `grpc_server.go` and `http_server.go`
## Changes to `linkerd version`
- Removal of all references to the public API.
- Call `healthcheck.GetServerVersion` to retrieve the version
## Changes to `linkerd check`
- Removal of the "can query the control API" check from the "linkerd-api" section
- Addition of a new "can retrieve the control plane version" check under the "control-plane-version" section
## Changes to `linkerd-web`
- The version is now retrieved from the `linkerd-config` CM instead of a public-API call.
- Removal of all references to the public API.
- Removal of the `data-go-version` global attribute on the dashboard, which wasn't being used.
## Other changes
- Added `ValuesFromConfigMap` function in `values.go` to convert the `linkerd-config` CM into a `*Values` struct instance (a sketch of the idea is shown after this list)
- Removal of the `public` protobuf
- Refactor 'linkerd repair' to use the refactored 'healthcheck.GetServerVersion()' function
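A minimal sketch of the `ValuesFromConfigMap` idea, assuming the chart values are stored under a `values` key in the ConfigMap's data (the key and the fields of the real `*Values` struct may differ):

```go
package config

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// Values is a trimmed-down stand-in for the chart's *Values struct.
type Values struct {
	LinkerdVersion     string `json:"linkerdVersion"`
	ControllerLogLevel string `json:"controllerLogLevel"`
}

// ValuesFromConfigMap unmarshals the values YAML stored in the linkerd-config
// ConfigMap into a Values struct. The "values" data key is an assumption for
// this sketch.
func ValuesFromConfigMap(cm *corev1.ConfigMap) (*Values, error) {
	raw, ok := cm.Data["values"]
	if !ok {
		return nil, fmt.Errorf("config map %s has no values entry", cm.Name)
	}
	var v Values
	if err := yaml.Unmarshal([]byte(raw), &v); err != nil {
		return nil, fmt.Errorf("failed to parse values: %w", err)
	}
	return &v, nil
}
```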
Fixes #5966
Fixes #5955
The metrics-api container in the Viz extension does not have the default set of system CA certificates installed. This means that it will fail to validate the certificate of an external Prometheus server served over HTTPS.
We now install the default CA certs into the container.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add peer label to TCP read and write stat queries
Closes #5693
### Tests
---
After refactoring, `linkerd viz stat` behaves the same way (I haven't checked gateways or routes).
```
$ linkerd viz stat deploy/web -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 91.91% 2.3rps 2ms 4ms 5ms 3 185.3B/s 5180.0B/s
# same value as before, latency seems to have dropped
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
# queries show the peer label
---
$ linkerd viz stat deploy/web -n emojivoto --from deploy/vote-bot -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 93.16% 1.9rps 3ms 4ms 4ms 1 4503.4B/s 153.1B/s
# stats same as before except for latency which seems to have dropped a bit
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
# queries show the right label
```
Signed-off-by: mateiidavid <matei.david.35@gmail.com>
* update go.mod and docker images to go 1.16.1
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* update test error messages for ParseDuration
* update go version to 1.16.2
When introducing the `linkerd-await` helper, we provided a default value
for `TARGETARCH`. This appears to interfere with multi-arch image
builds, causing ARM builds to fetch amd64 binaries.
Unsetting this default appears to fix this issue.
When a container starts up, we generally want to wait for the proxy to
initialize before starting the controller (which may initiate outbound
connections, especially to the Kubernetes API). This is true for all
pods except the identity controller, which must start before its proxy.
This change adds the linkerd-await helper to all of our container
images. Its use is explicitly disabled in the identity controller, due
to startup ordering constraints, and the heartbeat controller, because
it does not run a proxy currently.
Fixes #5819
* Remove linkerd prefix from extension resources
This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
The Go 1.14 release branch includes a number of important updates. This
change updates our containers' base image to the latest release, 1.14.15.
See linkerd/linkerd2-proxy-init#32
Fixes #5655
Closes #5545.
This change moves all tap and tap-injector code into the viz directory.
The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.
Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
the metrics-api and tap Protobuf.
There is no change in behavior for the `viz tap` command.
### Reviewing
#### Docker images
All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.
#### Controller and pkg directories
This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.
#### viz/tap
This is the location where all the tap-related code now lives. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.
The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5575
Now that only viz makes use of the `SelfCheck` API, merged `healthcheck.proto` into `viz.proto`.
Also removed the "checkRPC" functionality that was used for handling multiple API responses and was only used by `SelfCheck`, because the extra complexity was not warranted. We revert to the plain vanilla "check" by just concatenating error responses.
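A rough sketch of the simplified approach, assuming a check runs several sub-probes and just folds their failures into one message:

```go
package check

import (
	"errors"
	"strings"
)

// runSelfCheck runs each probe and concatenates their failures into a single
// error, which is how the combined messages in the failure examples below
// would be produced. The probe signature is illustrative.
func runSelfCheck(probes []func() error) error {
	var msgs []string
	for _, probe := range probes {
		if err := probe(); err != nil {
			msgs = append(msgs, err.Error())
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return errors.New(strings.Join(msgs, "\n"))
}
```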
## Success Output
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```
## Failure Examples
Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* `mc check` shouldn't error when viz is not available. Also properly set up the logger in `multicluster/cmd/root.go` so that it properly displays messages when `--verbose` is used
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.proto`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`