Fixes #6740
#6711 removed the usage of unnecessary reference variables
in the proxy template, as they are not needed. Their definitions
were left in place because of race conditions with extension installs.
As `2.11` was released with that change, now is a good time to
remove the definitions too, as no usages should be present after a
`2.11` upgrade.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Now that SMI functionality is fully moving into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting it by default.
This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it converts `TrafficSplit` objects into `ServiceProfiles`
that the destination component can understand and react to accordingly.
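For reference, a minimal sketch of what that conversion looks like, using simplified local types rather than the actual `linkerd-smi` controller code:

```go
package main

import "fmt"

// Simplified stand-ins for the SMI TrafficSplit and Linkerd ServiceProfile
// specs; the real controller uses the generated API types.
type Backend struct {
	Service string
	Weight  int
}

type TrafficSplitSpec struct {
	Service  string // apex service
	Backends []Backend
}

type WeightedDst struct {
	Authority string
	Weight    int
}

type ServiceProfileSpec struct {
	DstOverrides []WeightedDst
}

// toServiceProfile maps each TrafficSplit backend to a dstOverride on the
// apex service's ServiceProfile, which the destination controller understands.
func toServiceProfile(ts TrafficSplitSpec, namespace, clusterDomain string) ServiceProfileSpec {
	var sp ServiceProfileSpec
	for _, b := range ts.Backends {
		sp.DstOverrides = append(sp.DstOverrides, WeightedDst{
			Authority: fmt.Sprintf("%s.%s.svc.%s", b.Service, namespace, clusterDomain),
			Weight:    b.Weight,
		})
	}
	return sp
}

func main() {
	ts := TrafficSplitSpec{
		Service: "backend-svc",
		Backends: []Backend{
			{Service: "backend-svc", Weight: 500},
			{Service: "failing-svc", Weight: 500},
		},
	}
	fmt.Printf("%+v\n", toServiceProfile(ts, "default", "cluster.local"))
}
```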
Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e. splits and weights) is also
surfaced through the UI (in the new `services` tab) and the `viz` CLI.
So we are not really losing any UI functionality here.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
* build: upgrade to Go 1.17
This commit introduces three changes:
1. Update the `go` directive in `go.mod` to 1.17
2. Update all Dockerfiles from `golang:1.16.2` to
`golang:1.17.3`
3. Update all CI to use Go 1.17
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* chore: run `go fmt ./...`
This commit synchronizes `//go:build` lines with `// +build` lines.
Reference: https://go.googlesource.com/proposal/+/master/design/draft-gobuild.md
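For reference, a Go file with a build constraint now carries both directive forms, which `gofmt` keeps in sync, for example:

```go
//go:build linux && amd64
// +build linux,amd64

// The //go:build line is the new syntax introduced alongside Go 1.17;
// the // +build line is retained for older toolchains.
package platform
```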
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Similar to the `linkerd authz` command, which lists all authorizations for a given resource, and `linkerd viz stat`, which can show metrics for policy resources, we introduce a `linkerd viz authz` command which shows metrics for server authorizations broken down by server for a given resource. It also shows the rate of unauthorized requests to each server. This is helpful for seeing a breakdown of which authorizations are being used and what proportion of traffic is being rejected. For example:
```console
> linkerd viz authz -n emojivoto deploy
SERVER AUTHZ SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
emoji-grpc emoji-grpc 100.00% 1.8rps 1ms 1ms 1ms
prom prom-prometheus - - - - -
voting-grpc [UNAUTHORIZED] - 0.9rps - - -
web-http web-public 50.00% 1.8rps 4ms 190ms 198ms
```
This shows us a few things right away:
* all traffic to the emoji-grpc server is authorized by the emoji-grpc server authorization
* the prom server defines a prom-prometheus server authorization, but it is not receiving any traffic
* the voting-grpc server has no server authorizations, and thus all 0.9rps is getting rejected
Fixes #6733
As policy resources provide a grouping, statistics summaries should
also be available for these groupings, which are useful to the user. Because
they are port-specific, they provide a great way to break these metrics
down further.
This PR adds support for the policy resources, i.e. `server` and `serverauthorization`,
in the `stat` command.
## Changes
This adds a new path in the `stat_summary.go` file to handle policy
objects. I tried to re-use some of the other paths,
but some of the labels seem to differ, so a separate path
had to be created. We can try to refactor and merge them later, though.
We support both request and TCP metrics for the `server` resource,
but only the former for `serverauthorization` resources,
as that is how the metrics are generated.
This also adds these policy objects to the `k8s` package so
that they are treated as known resources.
For both policy resources, `--from` doesn't work, as these
metrics are not exposed on the outbound side and there is no way to
query the client workload from the inbound metrics. `--to`
is supported to get metrics specifically for a destination workload
(just like with a service).
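As a rough illustration of the new path (not the actual `stat_summary.go` code), the idea is to key the inbound queries on policy-resource labels instead of pod-owner labels; the label names below are placeholders, not necessarily the ones the proxy exports:

```go
package main

import "fmt"

// buildPolicyLabels sketches how a stat request for a policy resource could be
// turned into a Prometheus label selector. The label names ("srv_name",
// "saz_name") are illustrative placeholders.
func buildPolicyLabels(kind, name, namespace string) (string, error) {
	switch kind {
	case "server":
		return fmt.Sprintf(`direction="inbound", srv_name=%q, namespace=%q`, name, namespace), nil
	case "serverauthorization":
		// TCP metrics are not broken down per authorization, so only
		// request metrics make sense for this kind.
		return fmt.Sprintf(`direction="inbound", saz_name=%q, namespace=%q`, name, namespace), nil
	default:
		return "", fmt.Errorf("unsupported policy resource: %s", kind)
	}
}

func main() {
	labels, _ := buildPolicyLabels("server", "web-http", "emojivoto")
	fmt.Printf("request_total{%s}\n", labels)
}
```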
## Testing
```bash
> curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject --proxy-log-level debug - | kubectl apply -f -
> kubectl apply -f 897de1a8d5/emojivoto-policy.yml
# Initial values
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.0rps 100.00% 1.8rps 1ms 1ms 3ms 1 188.6B/s 2072.9B/s
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps 80.70% 0.9rps 1ms 2ms 3ms 1 91.4B/s 52.7B/s
emojivoto web-http 0.0rps 90.68% 2.0rps 2ms 10ms 28ms 1 153.7B/s 4509.4B/s
# After changing the `emoji-grpc` authz
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell) took 2s
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.3rps 100.00% 1.1rps 0ms 0ms 0ms 1 156.5B/s 1282.4B/s
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps 87.88% 0.6rps 0ms 0ms 0ms 1 53.5B/s 31.5B/s
emojivoto web-http 0.0rps 61.18% 1.4rps 1ms 2ms 2ms 1 110.2B/s 2195.7B/s
# after changing the `web-http` authz
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.7 via via ❄️ impure (shell)
➜ ./bin/go-run cli viz stat srv -A -owide ~/work/linkerd2
NAMESPACE NAME UNAUTHORIZED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emojivoto emoji-grpc 0.0rps - - - - - - - -
emojivoto prom 0.0rps - - - - - - - -
emojivoto voting-grpc 0.0rps - - - - - - - -
emojivoto web-http 1.0rps - - - - - - - -
> linkerd viz stat srv/emoji-grpc -n emojivoto -owide
NAME SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emoji-grpc 100.00% 2.0rps 1ms 1ms 1ms 1 199.9B/s 2208.0B/s
> linkerd viz stat srv/web-http -n emojivoto -owide
NAME SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web-http 94.02% 1.9rps 4ms 9ms 10ms 1 152.7B/s 4505.9B/s
> linkerd viz stat srv -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
emoji-grpc - 100.00% 2.0rps 1ms 1ms 1ms 1 201.6B/s 2209.8B/s
prom - - - - - - - - -
voting-grpc - 86.21% 1.0rps 1ms 1ms 1ms 1 98.3B/s 55.9B/s
web-http - 91.67% 2.0rps 3ms 8ms 10ms 1 157.7B/s 4600.3B/s
> linkerd viz stat serverauthorization/web-public -n emojivoto
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
web-http - 89.83% 2.0rps 3ms 9ms 10ms
> linkerd viz stat saz -n emojivoto
NAME AUTHORIZATION MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
emoji-grpc emoji-grpc - 100.00% 2.0rps 1ms 1ms 1ms
prom prom-prometheus - - - - - -
voting-grpc voting-grpc - 89.83% 1.0rps 1ms 1ms 1ms
web-http web-public - 94.96% 2.0rps 1ms 5ms 9ms
> linkerd viz stat saz/web-public -n emojivoto
NAME AUTHORIZATION MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
web-http web-public - 90.00% 2.0rps 1ms 5ms 9ms
```
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Currently, `viz stat` on services is pretty restricted because a service
is not a pod-owner resource. This PR fixes that by making
it use `direction="outbound", authority="svc"` when querying
the Prometheus metrics. This means that for services, we can
generate metrics from the *meshed* client side.
`StatsSummary` metrics on a service are further divided into
two kinds:
### Service has no `ServiceProfiles.dstOverrides`
In this case, we just return the metrics by
querying for `direction="outbound", authority="svc"`, along
with any `--from` resources specified as client query labels.
We also gate this path to fail for requests that have `--from`
as a service, or for `svc/* --to xyz`, as they are invalid, i.e.
we can't render metrics with a service as the client.
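A hedged sketch of what such an outbound query could look like (the exact label set and metric shape in the metrics-api may differ):

```go
package main

import "fmt"

// serviceRequestQuery sketches the outbound query used when a service has no
// dstOverrides: metrics come from meshed clients, keyed by the service's
// authority, optionally restricted to a --from deployment. Label names are
// illustrative.
func serviceRequestQuery(svc, namespace, fromDeploy string) string {
	authority := fmt.Sprintf("%s.%s.svc.cluster.local", svc, namespace)
	labels := fmt.Sprintf(`direction="outbound", authority=%q`, authority)
	if fromDeploy != "" {
		labels += fmt.Sprintf(`, deployment=%q`, fromDeploy)
	}
	return fmt.Sprintf("sum(increase(response_total{%s}[1m])) by (classification)", labels)
}

func main() {
	fmt.Println(serviceRequestQuery("backend-svc", "linkerd-trafficsplit-test-sp", "slow-cooker"))
}
```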
### Service has `ServiceProfiles.dstOverrides`
Here, we follow a path similar to the `TrafficSplit` one,
except that we use a `ServiceProfile` resource
object instead.
_The TrafficSplit path will be removed or merged into the
`Service` path in a separate PR for simplification._
## Testing
### Apply Traffic Splitting through `ServiceProfiles`
```bash
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5 took 1m11s
➜ k create ns linkerd-trafficsplit-test-sp ~/work/linkerd2
namespace/linkerd-trafficsplit-test-sp created
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5
➜ ./bin/linkerd inject ./test/integration/trafficsplit/testdata/application.yaml | k -n linkerd-trafficsplit-test-sp apply -f - ~/work/linkerd2
document missing "kind" field, skipped
deployment "backend" injected
service "backend-svc" skipped
deployment "failing" injected
service "failing-svc" skipped
deployment "slow-cooker" injected
service "slow-cooker" skipped
deployment.apps/backend created
service/backend-svc created
deployment.apps/failing created
service/failing-svc created
deployment.apps/slow-cooker created
service/slow-cooker created
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5
➜ k apply -f ./test/integration/trafficsplit/testdata/sp/updated-traffic-split-leaf-weights.yaml -n linkerd-trafficsplit-test-sp ~/work/linkerd2
serviceprofile.linkerd.io/backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local created
on ⛵ kind-kind linkerd2 on 🌱 taru [📦++1🤷] via 🐼 v1.16.5
➜ k describe sp -n linkerd-trafficsplit-test-sp ~/work/linkerd2
Name: backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
Namespace: linkerd-trafficsplit-test-sp
Labels: <none>
Annotations: <none>
API Version: linkerd.io/v1alpha2
Kind: ServiceProfile
Metadata:
Creation Timestamp: 2021-07-01T11:05:06Z
Generation: 1
Managed Fields:
API Version: linkerd.io/v1alpha2
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:kubectl.kubernetes.io/last-applied-configuration:
f:spec:
.:
f:dstOverrides:
Manager: kubectl-client-side-apply
Operation: Update
Time: 2021-07-01T11:05:06Z
Resource Version: 1398
UID: fce0a250-1396-4a14-9729-e19030048c7a
Spec:
Dst Overrides:
Authority: backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
Weight: 500m
Authority: failing-svc.linkerd-trafficsplit-test-sp.svc.cluster.local:8081
Weight: 500m
Events: <none>
```
### CLI Output
```bash
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp ~/work/linkerd2
NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc 500m 100.00% 0.9rps 1ms 2ms 2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local failing-svc 500m 0.00% 1.1rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via took 2s
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker ~/work/linkerd2
NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc 500m 100.00% 0.4rps 1ms 2ms 2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local failing-svc 500m 0.00% 0.6rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via took 2s
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1 ~/work/linkerd2
NAME APEX LEAF WEIGHT SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc 500m 100.00% 0.5rps 1ms 2ms 2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local failing-svc 500m 0.00% 0.5rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 main [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/prometheus -n linkerd-viz ~/work/linkerd2
StatSummary API error: service only supported as a target on 'from' queries, or as a destination on 'to' queries%
# With no `sp.dstOverrides`
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via took 10s
➜ k -n linkerd-trafficsplit-test-sp delete sp backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local ~/work/linkerd2
serviceprofile.linkerd.io "backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local" deleted
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp ~/work/linkerd2
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc - 100.00% 1.2rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1 --from-namespace linkerd-trafficsplit-test-sp
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc - 100.00% 0.6rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker --from-namespace linkerd-trafficsplit-test-sp
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99
backend-svc - 100.00% 0.7rps 1ms 2ms 2ms
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ./bin/go-run cli viz stat deploy/slow-cooker -n linkerd-trafficsplit-test-sp --to svc/backend-svc ~/work/linkerd2
No traffic found.
on ⛵ kind-kind linkerd2 on 🌱 taru [📦📝🤷] via 🐼 v1.16.6 via
➜ ~/work/linkerd2
```
Note: _This means that we need documentation changes to
let users know that `viz stat` on a service shows client-side
metrics and will be missing metrics from unmeshed
clients._
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Fixes https://github.com/linkerd/linkerd2/issues/3706
The implementation of the `linkerd viz edges` command works by gathering HTTP and TCP metrics in both the inbound and outbound directions and combining this data in dubious ways.
We make the implementation simpler and more correct by instead doing the following:
* Gather TCP metrics only
  * (this drops support for very old proxy versions which do not expose the `tcp_open_connections` metric)
* Gather outbound metrics only
  * (all meshed edges will have a src in the mesh and will be present in the outbound metrics)
* Outbound metrics do not have a `client_id` label, so we fill in this missing data by inspecting the source pod via the k8s API and reconstructing that pod's TLS identity based on its service account name and namespace (see the sketch below).
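A minimal sketch of that reconstruction, assuming the usual Linkerd identity format of `<serviceaccount>.<namespace>.serviceaccount.identity.<control-plane-ns>.<trust-domain>`:

```go
package main

import "fmt"

// reconstructIdentity builds the TLS identity we expect a source pod to use,
// from its service account and namespace. The control-plane namespace and
// trust domain shown here are the common defaults; the real code reads them
// from configuration.
func reconstructIdentity(serviceAccount, namespace string) string {
	const (
		controlPlaneNS = "linkerd"
		trustDomain    = "cluster.local"
	)
	return fmt.Sprintf("%s.%s.serviceaccount.identity.%s.%s",
		serviceAccount, namespace, controlPlaneNS, trustDomain)
}

func main() {
	// e.g. a pod in emojivoto running as the "web" service account
	fmt.Println(reconstructIdentity("web", "emojivoto"))
	// -> web.emojivoto.serviceaccount.identity.linkerd.cluster.local
}
```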
Signed-off-by: Alex Leong <alex@buoyant.io>
Fixes #5589
The core control plane has a dependency on the viz package in order to use the `BuildResource` function. This "backwards" dependency means that the viz source code needs to be included in core docker-builds and is bad for code hygiene.
We move the `BuildResource` function into the viz package. In `cli/cmd/metrics.go` we replace a call to `BuildResource` with a call directly to `CanonicalResourceNameFromFriendlyName`.
Signed-off-by: Alex Leong <alex@buoyant.io>
* Fix namespace always showing up in topology graph
Fixes #6211
In #6091, code was added to include the namespace in the list of all
stat resource types. This was added so that we'd have a complete list
of resource types that could be suggested by the CLI autocompletion
code. However, this list was also used by the web frontend in a query
that gathered metrics from all resource types. This then caused the
query to inadvertently create an inbound metric for deployments that
showed traffic from a given namespace.
This change breaks up the code so that we have a separate list for the
autocompletion code without the namespace value, while the web frontend
keeps the original list it used prior to #6091.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
Go 1.16.4 includes a fix for a denial-of-service in net/http: golang/go#45710
Go's error file-line formatting changed in 1.16.3, so this change
updates tests to only do suffix matching on these error strings.
Linkerd's CLI offers basic shell suggestions for most of its subcommands.
These suggestions are based on hardcoded suggestion lists; for example,
`viz stat` auto-suggests a list of all resource types supported by that
command, located in `k8s.go`. Although this provides basic suggestions
for k8s manifest resources, prior to this change there was no
way to get auto-suggested resources from the k8s cluster Linkerd is
installed in.
This change adds a new `CommandCompletion` module that reads arguments
and a substring, and queries the k8s API to determine what suggestions
to provide to the user. The current implementation makes the module
generic enough to query most Kubernetes resources and can be used for all
subcommands in the CLI.
This change only applies this behavior to the `stat` command as a first
step. Adding auto-completion for other commands will be done in a number
of follow-up PRs.
To test out the change on this branch:
- Build the CLI binaries on this branch
- Install the completion scripts for your shell environment. Running
`linkerd completion -h` should give you more info on how to do that.
- If not installed already, install `linkerd viz`
```
linkerd viz install | k apply -f -
```
- Test out completion by typing
```
linkerd viz stat [tab][tab]
```
Part of #5981
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
### What
This change adds the `config.linkerd.io/proxy-await` annotation which when set will delay application container start until the proxy is ready. This allows users to force application containers to wait for the proxy container to be ready without modifying the application's Docker image. This is different from the current use-case of [linkerd-await](https://github.com/olix0r/linkerd-await) which does require modifying the image.
---
To support this, Linkerd is using the fact that containers are started in the order that they appear in `spec.containers`. If `linkerd-proxy` is the first container, then it will be started first.
Kubernetes will start each container without waiting on the result of the previous container. However, if a container has a hook that is executed immediately after container creation, then Kubernetes will wait on the result of that hook before creating the next container. Using a `PostStart` hook in the `linkerd-proxy` container, the `linkerd-await` binary can be run and force Kubernetes to pause container creation until the proxy is ready. Once `linkerd-await` completes, the container hook completes and the application container is created.
Adding the `config.linkerd.io/proxy-await` annotation to a pod's metadata results in the `linkerd-proxy` container being the first container, as well as having the container hook:
```yaml
postStart:
exec:
command:
- /usr/lib/linkerd/linkerd-await
```
---
### Update after draft
There has been some additional discussion both off GitHub as well as on this PR (specifically with @electrical).
First, we decided that this feature should be enabled by default. The reason for this is that, more often than not, it will prevent start-up ordering issues from occurring without having any negative effects on the application. Additionally, this will be part of edge releases up until 2.11 (the next stable release), and having it enabled by default will allow us to check that it does not often conflict with applications. Once we are closer to 2.11, we'll be able to determine if it should be disabled by default because it causes more issues than it prevents.
Second, this feature will remain configurable; if disabled, then upon injection the proxy container will not be made the first container in the pod manifest. This is important for the reasons discussed with @electrical about tools that make assumptions about app containers being the first container. For example, Rancher defaults to showing overview pages for the `0` index container, and if the proxy container was always `0` then this would defeat the purpose of the overview page.
### Testing
To test this I used the `sleep.sh` script and changed `Dockerfile-proxy` to use it as its `ENTRYPOINT`. This forces the container to sleep for 20 seconds before starting the proxy.
---
`sleep.sh`:
```bash
#!/bin/bash
echo "sleeping..."
sleep 20
/usr/bin/linkerd2-proxy-run
```
`Dockerfile-proxy`:
```dockerfile
...
COPY sleep.sh /sleep.sh
RUN ["chmod", "+x", "/sleep.sh"]
ENTRYPOINT ["/sleep.sh"]
```
---
```bash
# Build and install with the above changes
$ bin/docker-build
...
$ bin/image-load --k3d
...
$ bin/linkerd install |kubectl apply -f -
```
Annotate the `emoji` deployment so that it's the only workload that waits for its proxy to be ready, and inject it:
```bash
cat emojivoto.yaml |bin/linkerd inject - |kubectl apply -f -
```
You can then see that the `emoji` deployment is not starting its application container until the proxy is ready:
```bash
$ kubectl get -n emojivoto pods
NAME READY STATUS RESTARTS AGE
voting-ff4c54b8d-sjlnz 1/2 Running 0 9s
emoji-f985459b4-7mkzt 0/2 PodInitializing 0 9s
web-5f86686c4d-djzrz 1/2 Running 0 9s
vote-bot-6d7677bb68-mv452 1/2 Running 0 9s
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
* Remove the `linkerd-controller` pod
Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.
## Removals
- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put them under `/pkg/prometheus/test_helper.go`, which is now used by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.
## Health checks
- Removed the `can initialize the client` check given that the client is no longer needed. The `linkerd-api` section was left with only the `control pods are ready` check, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.
## Other changes
- Fixed the Control Plane section of the dashboard, taking into account the disappearance of `linkerd-controller` and, previously, of `linkerd-sp-validator`.
* Removed `Version` API from the public-api
This is a sibling PR to #5993, and it's the second step towards removing the `linkerd-controller` pod.
This one deals with a replacement for the `Version` API, fetching instead the `linkerd-config` CM and retrieving the `LinkerdVersion` value.
## Changes to the public-api
- Removal of the `publicPb.ApiClient` entry from the `Client` interface
- Removal of the `publicPb.ApiServer` entry from the `Server` interface
- Removal of the `Version` and related methods from `client.go`, `grpc_server.go` and `http_server.go`
## Changes to `linkerd version`
- Removal of all references to the public API.
- Call `healthcheck.GetServerVersion` to retrieve the version
## Changes to `linkerd check`
- Removal of the "can query the control API" check from the "linkerd-api" section
- Addition of a new "can retrieve the control plane version" check under the "control-plane-version" section
## Changes to `linkerd-web`
- The version is now retrieved from the `linkerd-config` CM instead of a public-API call.
- Removal of all references to the public API.
- Removal of the `data-go-version` global attribute on the dashboard, which wasn't being used.
## Other changes
- Added `ValuesFromConfigMap` function in `values.go` to convert the `linkerd-config` CM into a `*Values` struct instance (a sketch of the idea is shown after this list)
- Removal of the `public` protobuf
- Refactor 'linkerd repair' to use the refactored 'healthcheck.GetServerVersion()' function
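A minimal sketch of the `ValuesFromConfigMap` idea, assuming the chart values are stored under a `values` key in the ConfigMap's data (the key and the fields of the real `*Values` struct may differ):

```go
package config

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

// Values is a trimmed-down stand-in for the chart's *Values struct.
type Values struct {
	LinkerdVersion     string `json:"linkerdVersion"`
	ControllerLogLevel string `json:"controllerLogLevel"`
}

// ValuesFromConfigMap unmarshals the values YAML stored in the linkerd-config
// ConfigMap into a Values struct. The "values" data key is an assumption for
// this sketch.
func ValuesFromConfigMap(cm *corev1.ConfigMap) (*Values, error) {
	raw, ok := cm.Data["values"]
	if !ok {
		return nil, fmt.Errorf("config map %s has no values entry", cm.Name)
	}
	var v Values
	if err := yaml.Unmarshal([]byte(raw), &v); err != nil {
		return nil, fmt.Errorf("failed to parse values: %w", err)
	}
	return &v, nil
}
```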
Fixes #5966
Fixes #5955
The metrics-api container in the Viz extension does not have the default set of system CA certificates installed. This means that it will fail to validate the certificate of an external Prometheus server served over HTTPS.
We now install the default CA certs into the container.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add peer label to TCP read and write stat queries
Closes #5693
### Tests
---
After refactoring, `linkerd viz stat` behaves the same way (I haven't checked gateways or routes).
```
$ linkerd viz stat deploy/web -n emojivoto -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 91.91% 2.3rps 2ms 4ms 5ms 3 185.3B/s 5180.0B/s
# same value as before, latency seems to have dropped
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"
# queries show the peer label
---
$ linkerd viz stat deploy/web -n emojivoto --from deploy/vote-bot -o wide
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 TCP_CONN READ_BYTES/SEC WRITE_BYTES/SEC
web 1/1 93.16% 1.9rps 3ms 4ms 4ms 1 4503.4B/s 153.1B/s
# stats same as before except for latency which seems to have dropped a bit
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"
# queries show the right label
```
Signed-off-by: mateiidavid <matei.david.35@gmail.com>
* update go.mod and docker images to go 1.16.1
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* update test error messages for ParseDuration
* update go version to 1.16.2
When introducing the `linkerd-await` helper, we provided a default value
for `TARGETARCH`. This appears to interfere with multi-arch image
builds, causing ARM builds to fetch amd64 binaries.
Unsetting this default appears to fix this issue.
When a container starts up, we generally want to wait for the proxy to
initialize before starting the controller (which may initiate outbound
connections, especially to the Kubernetes API). This is true for all
pods except the identity controller, which must start before its proxy.
This change adds the linkerd-await helper to all of our container
images. Its use is explicitly disabled in the identity controller, due
to startup ordering constraints, and the heartbeat controller, because
it does not run a proxy currently.
Fixes #5819
* Remove linkerd prefix from extension resources
This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
The Go 1.14 release branch includes a number of important updates. This
change updates our containers' base image to the latest release, 1.14.15.
See linkerd/linkerd2-proxy-init#32
Fixes #5655
Closes #5545.
This change moves all tap and tap-injector code into the viz directory.
The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.
Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
the metrics-api and tap Protobuf.
There is no change in behavior for the `viz tap` command.
### Reviewing
#### Docker images
All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.
#### Controller and pkg directories
This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.
#### viz/tap
This is the location where all the tap-related code now lives. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.
The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes #5575
Now that only viz makes use of the `SelfCheck` API, merged `healthcheck.proto` into `viz.proto`.
Also removed the "checkRPC" functionality that was used for handling multiple API responses and was only used by `SelfCheck`, because the extra complexity was not warranted. We revert to the plain vanilla "check" by just concatenating error responses.
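A rough sketch of the simplified approach, assuming a check runs several sub-probes and just folds their failures into one message:

```go
package check

import (
	"errors"
	"strings"
)

// runSelfCheck runs each probe and concatenates their failures into a single
// error, which is how the combined messages in the failure examples below
// would be produced. The probe signature is illustrative.
func runSelfCheck(probes []func() error) error {
	var msgs []string
	for _, probe := range probes {
		if err := probe(); err != nil {
			msgs = append(msgs, err.Error())
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return errors.New(strings.Join(msgs, "\n"))
}
```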
## Success Output
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```
## Failure Examples
Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
Error calling the Kubernetes API: someerror
Error calling Prometheus from the control plane: someerror
see https://linkerd.io/checks/#l5d-api-control-api for hints
Status check results are ×
```
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it remains being used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.
* Added chart templates for new viz linkerd-metrics-api pod
* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.
* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).
* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api` moving prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying http server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.
* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.
* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.
* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
- Updated `endpoints.go` according to new API interface name.
- Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
- `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
- Added `metrics-api` to list of docker images to build in actions workflows.
- In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).
* Add retry to 'tap API service is running' check
* `mc check` shouldn't error when viz is not available. Also properly set up the logger in `multicluster/cmd/root.go` so that it properly displays messages when `--verbose` is used
* Separate observability API
Closes #5312
This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.
- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.proto`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.
The other changes in the go files are just changes in the imports to point to the new protobufs.
Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`