Commit Graph

32 Commits

Author SHA1 Message Date
Tarun Pothulapati aa9ee6b007
injector: remove unused proxy reference env variables (#7382)
Fixes #6740

#6711 removed the usage of unnecessary reference variables
in the proxy template, as they are not needed. Their definitions
were left in place because there were race conditions with extension installs.

As `2.11` was released with that change, now is a good time to
remove the definitions too, as no usages should be present from a
`2.11` upgrade.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-12-16 00:06:18 +05:30
Tarun Pothulapati 4170b49b33
smi: remove default functionality in linkerd (#7334)
Now that SMI functionality is being fully moved into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting it by default.

This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it converts `TrafficSplit` objects into `ServiceProfiles`
that the destination component can understand and react to.

Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e., splits and weights) is also
surfaced through the `UI` (in the new `services` tab) and the `viz cmd`.
So we are not really losing any UI functionality here.
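
As a rough illustration of the conversion described above, the sketch below maps weighted `TrafficSplit` backends onto `ServiceProfile`-style destination overrides. The `Backend` and `WeightedDst` types, the `toDstOverrides` helper, and the port parameter are hypothetical simplifications; the real logic in `linkerd-smi` may differ.

```go
package main

import "fmt"

// Backend is a hypothetical, simplified TrafficSplit backend.
type Backend struct {
	Service string
	Weight  string // Kubernetes quantity, e.g. "500m"
}

// WeightedDst is a hypothetical, simplified ServiceProfile dstOverride.
type WeightedDst struct {
	Authority string
	Weight    string
}

// toDstOverrides maps each backend to an override whose authority is the
// backend's fully qualified service name plus the given port.
func toDstOverrides(backends []Backend, namespace, clusterDomain string, port int) []WeightedDst {
	overrides := make([]WeightedDst, 0, len(backends))
	for _, b := range backends {
		overrides = append(overrides, WeightedDst{
			Authority: fmt.Sprintf("%s.%s.svc.%s:%d", b.Service, namespace, clusterDomain, port),
			Weight:    b.Weight,
		})
	}
	return overrides
}

func main() {
	backends := []Backend{
		{Service: "backend-svc", Weight: "500m"},
		{Service: "failing-svc", Weight: "500m"},
	}
	for _, o := range toDstOverrides(backends, "demo", "cluster.local", 8080) {
		fmt.Println(o.Authority, o.Weight)
	}
}
```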

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-12-03 12:07:30 +05:30
Eng Zer Jun f2fb35aa46
build: upgrade to Go 1.17 (#7371)
* build: upgrade to Go 1.17

This commit introduces three changes:
	1. Update the `go` directive in `go.mod` to 1.17
	2. Update all Dockerfiles from `golang:1.16.2` to
	   `golang:1.17.3`
	3. Update all CI to use Go 1.17

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* chore: run `go fmt ./...`

This commit synchronizes `//go:build` lines with `// +build` lines.
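
For readers unfamiliar with the two constraint syntaxes, here is a small sketch using the standard library's `go/build/constraint` package (available since Go 1.16) to derive the legacy `// +build` form from a `//go:build` line. The `plusBuildLines` helper name and the sample constraint are illustrative only and are not taken from this commit.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// plusBuildLines parses a //go:build line and returns the equivalent
// legacy // +build lines.
func plusBuildLines(goBuildLine string) ([]string, error) {
	expr, err := constraint.Parse(goBuildLine)
	if err != nil {
		return nil, err
	}
	return constraint.PlusBuildLines(expr)
}

func main() {
	lines, err := plusBuildLines("//go:build linux && !cgo")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```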

Reference: https://go.googlesource.com/proposal/+/master/design/draft-gobuild.md
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-11-30 15:36:11 -05:00
Kevin Leimkuhler d147104b29
Upgrade golangci-lint (#7338)
This upgrades golangci-lint from `v1.25.1` to `v1.43.0` and fixes a few new lint errors.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2021-11-23 12:54:47 -07:00
Alex Leong e5dd7810fa
Add linkerd viz authz command (#6875)
Similarly to the `linkerd authz` command which lists all authorizations for a given resource and `linkerd viz stat` which can show metrics for policy resources, we introduce a `linkerd viz authz` command which shows metrics for server authorizations broken down by server for a given resource.  It also shows the rate of unauthorized requests to each server.  This is helpful for seeing a breakdown of which authorizations are being used and what proportion of traffic is being rejected.  For example:


```console
> linkerd viz authz -n emojivoto deploy
SERVER       AUTHZ            SUCCESS     RPS  LATENCY_P50  LATENCY_P95  LATENCY_P99  
emoji-grpc   emoji-grpc       100.00%  1.8rps          1ms          1ms          1ms  
prom         prom-prometheus        -       -            -            -            -  
voting-grpc  [UNAUTHORIZED]         -  0.9rps            -            -            -  
web-http     web-public        50.00%  1.8rps          4ms        190ms        198ms
```

This shows us a few things right away:

* all traffic to the emoji-grpc server is authorized by the emoji-grpc server authorization
* the prom server defines a prom-prometheus server authorization, but it is not receiving any traffic
* the voting-grpc server has no server authorizations, and thus all 0.9rps is getting rejected
2021-09-21 09:36:05 -07:00
Tarun Pothulapati 45478b6db8
viz: support `stat` on new policy resources (#6785)
Fixes #6733

As policy resources provide a grouping, statistics summaries should
also be allowed on these groupings, which are useful to the user. Because
they are port-specific, they provide a great way to break down these
metrics further.

This PR adds support for policy resources, i.e. `server` and `serverauthorization`,
on the `stat` command.

## Changes

This adds a new path in the `stat_summary.go` file to handle policy
objects. I tried to see if we could re-use some of the other paths
but some of the labels seem to differ, and hence a different path
had to be created. We can try to refactor and merge them though.

We support both request and TCP metrics for the `server` resource
while only the former with `serverauthorization` resources
as metrics are generated in this manner.

This also adds these policy objects to the `k8s` package to
make them known resources.

For both the policy resources, `--from` doesn't work as these
metrics are not exposed from outbound, and there is no way to
query about the client workload from the inbound metrics. `--to`
is supported to get metrics specifically for a destination workload.
(just like on a service)

## Testing

```bash
> curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject --proxy-log-level debug - | kubectl apply -f -

> kubectl apply -f 897de1a8d5/emojivoto-policy.yml


# Initial values
> ./bin/go-run cli viz stat srv -A -owide
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.0rps   100.00%   1.8rps           1ms           1ms           3ms          1         188.6B/s         2072.9B/s
emojivoto   prom                0.0rps         -        -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps    80.70%   0.9rps           1ms           2ms           3ms          1          91.4B/s           52.7B/s
emojivoto   web-http            0.0rps    90.68%   2.0rps           2ms          10ms          28ms          1         153.7B/s         4509.4B/s

# After changing the `emoji-grpc` authz
> ./bin/go-run cli viz stat srv -A -owide
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.3rps   100.00%   1.1rps           0ms           0ms           0ms          1         156.5B/s         1282.4B/s
emojivoto   prom                0.0rps         -        -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps    87.88%   0.6rps           0ms           0ms           0ms          1          53.5B/s           31.5B/s
emojivoto   web-http            0.0rps    61.18%   1.4rps           1ms           2ms           2ms          1         110.2B/s         2195.7B/s

# after changing the `web-http` authz

> ./bin/go-run cli viz stat srv -A -owide
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.0rps         -     -             -             -             -          -                -                 -
emojivoto   prom                0.0rps         -     -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps         -     -             -             -             -          -                -                 -
emojivoto   web-http            1.0rps         -     -             -             -             -          -                -                 -

> linkerd  viz stat srv/emoji-grpc -n emojivoto -owide
NAME         SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emoji-grpc        100.00%   2.0rps           1ms           1ms           1ms          1         199.9B/s         2208.0B/s

> linkerd  viz stat srv/web-http -n emojivoto -owide
NAME      SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
web-http         94.02%   1.9rps           4ms           9ms          10ms          1         152.7B/s         4505.9B/s

> linkerd  viz stat srv -n emojivoto -o wide                                                      
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emoji-grpc         -   100.00%   2.0rps           1ms           1ms           1ms          1         201.6B/s         2209.8B/s
prom               -         -        -             -             -             -          -                -                 -
voting-grpc        -    86.21%   1.0rps           1ms           1ms           1ms          1          98.3B/s           55.9B/s
web-http           -    91.67%   2.0rps           3ms           8ms          10ms          1         157.7B/s         4600.3B/s


> linkerd  viz stat serverauthorization/web-public -n emojivoto
NAME       MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99  
web-http        -    89.83%   2.0rps           3ms           9ms          10ms

> linkerd viz stat saz -n emojivoto
NAME          AUTHORIZATION     MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
emoji-grpc    emoji-grpc             -   100.00%   2.0rps           1ms           1ms           1ms
prom          prom-prometheus        -         -        -             -             -             -
voting-grpc   voting-grpc            -    89.83%   1.0rps           1ms           1ms           1ms
web-http      web-public             -    94.96%   2.0rps           1ms           5ms           9ms

> linkerd viz stat saz/web-public -n emojivoto                                                 
NAME       AUTHORIZATION   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
web-http   web-public           -    90.00%   2.0rps           1ms           5ms           9ms
```

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-09-15 10:59:36 +05:30
Stepan Rabotkin 5e6a1b5508
Graceful shutdown for admin server (#6817)
* Graceful shutdown for admin server

Signed-off-by: Stepan Rabotkin <epicstyt@gmail.com>
2021-09-07 10:50:31 -05:00
Tarun Pothulapati a330d20aa0
stat_summary: support service metrics using `authority` label (#6514)
Currently, `viz stat` on services is pretty restricted because
a service is not a pod-owner resource. This PR fixes that by making
it use the `direction="outbound", authority="svc"` labels while querying
the Prometheus metrics. This means that for services, we can
generate metrics from the *meshed* client side.

`StatsSummary` metrics on a service are further divided into
two kinds:

### Service has no `ServiceProfiles.dstOverrides` 

In this case, we just return the metrics by
querying for `direction="outbound", authority="svc"`, along
with any `--from` resources specified as client query labels.
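
A minimal sketch of how such a label set could be assembled and rendered as a Prometheus selector follows. The helper names `serviceQueryLabels` and `selector`, and the `deployment` client label, are hypothetical; the actual code in `stat_summary.go` may build its queries differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// serviceQueryLabels returns the outbound labels for a service authority,
// merged with any client ("--from") labels.
func serviceQueryLabels(authority string, from map[string]string) map[string]string {
	labels := map[string]string{
		"direction": "outbound",
		"authority": authority,
	}
	for k, v := range from {
		labels[k] = v
	}
	return labels
}

// selector renders labels as a PromQL label matcher list, sorted by key
// so that the output is stable.
func selector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	labels := serviceQueryLabels(
		"backend-svc.demo.svc.cluster.local",
		map[string]string{"deployment": "slow-cooker"}, // hypothetical client label
	)
	fmt.Println("response_total" + selector(labels))
}
```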

We also gate this path so that it fails for requests that have `--from`
as a service or for `svc/* --to xyz`, as they are invalid, i.e.
we can't render metrics with a service as the client.
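
The gating rule can be sketched as a small validation function. The `Resource` type, the `validateServiceQuery` name, and the error text are hypothetical and only illustrate the rule stated above, not the actual API code.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a hypothetical kind/name pair such as service/backend-svc.
type Resource struct {
	Kind string
	Name string
}

const kindService = "service"

var errServiceAsClient = errors.New("a service cannot act as the client side of a stat query")

// validateServiceQuery rejects queries where a service would be the client:
// either "--from svc/..." or "svc/... --to ...".
func validateServiceQuery(target Resource, from, to *Resource) error {
	if from != nil && from.Kind == kindService {
		return errServiceAsClient
	}
	if target.Kind == kindService && to != nil {
		return errServiceAsClient
	}
	return nil
}

func main() {
	svc := Resource{Kind: kindService, Name: "backend-svc"}
	deploy := Resource{Kind: "deployment", Name: "slow-cooker"}

	fmt.Println(validateServiceQuery(svc, &deploy, nil)) // service target, deployment client
	fmt.Println(validateServiceQuery(svc, nil, &deploy)) // service used as client via --to
	fmt.Println(validateServiceQuery(deploy, &svc, nil)) // service used as client via --from
}
```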

### Service has `ServiceProfiles.dstOverrides` 

Here, we follow a path similar to that of `TrafficSplit`,
except that we use a `ServiceProfile` resource
object instead.

_The TrafficSplit path will be removed or merged into the
`Service` path in a separate PR for simplification._


## Testing

### Apply Traffic Splitting through `ServiceProfiles`

```bash
$ k create ns linkerd-trafficsplit-test-sp
namespace/linkerd-trafficsplit-test-sp created

$ ./bin/linkerd inject ./test/integration/trafficsplit/testdata/application.yaml | k -n linkerd-trafficsplit-test-sp apply -f -

document missing "kind" field, skipped
deployment "backend" injected
service "backend-svc" skipped
deployment "failing" injected
service "failing-svc" skipped
deployment "slow-cooker" injected
service "slow-cooker" skipped

deployment.apps/backend created
service/backend-svc created
deployment.apps/failing created
service/failing-svc created
deployment.apps/slow-cooker created
service/slow-cooker created

$ k apply -f ./test/integration/trafficsplit/testdata/sp/updated-traffic-split-leaf-weights.yaml -n linkerd-trafficsplit-test-sp
serviceprofile.linkerd.io/backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local created

$ k describe sp -n linkerd-trafficsplit-test-sp
Name:         backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
Namespace:    linkerd-trafficsplit-test-sp
Labels:       <none>
Annotations:  <none>
API Version:  linkerd.io/v1alpha2
Kind:         ServiceProfile
Metadata:
  Creation Timestamp:  2021-07-01T11:05:06Z
  Generation:          1
  Managed Fields:
    API Version:  linkerd.io/v1alpha2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:dstOverrides:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2021-07-01T11:05:06Z
  Resource Version:  1398
  UID:               fce0a250-1396-4a14-9729-e19030048c7a
Spec:
  Dst Overrides:
    Authority:  backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
    Weight:     500m
    Authority:  failing-svc.linkerd-trafficsplit-test-sp.svc.cluster.local:8081
    Weight:     500m
Events:         <none>
```

### CLI Output

```bash
$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp
NAME                                                         APEX                                                         LEAF          WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc     500m   100.00%   0.9rps           1ms           2ms           2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   failing-svc     500m     0.00%   1.1rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker
NAME                                                         APEX                                                         LEAF          WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc     500m   100.00%   0.4rps           1ms           2ms           2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   failing-svc     500m     0.00%   0.6rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1
NAME                                                         APEX                                                         LEAF          WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc     500m   100.00%   0.5rps           1ms           2ms           2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   failing-svc     500m     0.00%   0.5rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/prometheus -n linkerd-viz
StatSummary API error: service only supported as a target on 'from' queries, or as a destination on 'to' queries


# With no `sp.dstOverrides`

$ k -n linkerd-trafficsplit-test-sp delete sp backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
serviceprofile.linkerd.io "backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local" deleted

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc        -   100.00%   1.2rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1 --from-namespace linkerd-trafficsplit-test-sp
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc        -   100.00%   0.6rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker --from-namespace linkerd-trafficsplit-test-sp
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc        -   100.00%   0.7rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat deploy/slow-cooker -n linkerd-trafficsplit-test-sp --to svc/backend-svc
No traffic found.

```

Note: _This means that we need documentation changes to
let the user know that `viz stat` metrics on a service are client-side
metrics and will be missing metrics from unmeshed
clients._


Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-08-09 23:05:14 +05:30
Alex Leong 10a36b010a
Refactor edges command (#6574)
Fixes https://github.com/linkerd/linkerd2/issues/3706

The implementation of the `linkerd viz edges` command works by gathering http and tcp metrics in both the inbound and outbound directions and combining this data in dubious ways.

We make the implementation simpler and more correct by instead doing the following:

* Gather tcp metrics only
  * (this drops support for very old proxy versions which do not expose the `tcp_open_connections` metric)
* Gather outbound metrics only
  * (all meshed edges will have a src in the mesh and will be present in the outbound metrics)
* Outbound metrics do not have a `client_id` label, so we fill in this missing data by inspecting the source pod via the k8s api and reconstructing that pod's TLS identity based on its service account name and namespace.
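
The identity reconstruction in the last step can be sketched as follows, assuming the usual Linkerd identity layout of `<service-account>.<namespace>.serviceaccount.identity.<control-plane-namespace>.<trust-domain>`. The `podIdentity` helper and its parameters are hypothetical and not the function used in this commit.

```go
package main

import "fmt"

// podIdentity rebuilds a pod's TLS identity from its service account and
// namespace. The control plane namespace and trust domain are passed in
// because they vary per installation.
func podIdentity(serviceAccount, namespace, controlPlaneNamespace, trustDomain string) string {
	return fmt.Sprintf("%s.%s.serviceaccount.identity.%s.%s",
		serviceAccount, namespace, controlPlaneNamespace, trustDomain)
}

func main() {
	// Example values only; a real caller would read these from the pod spec
	// and the control plane configuration.
	fmt.Println(podIdentity("web", "emojivoto", "linkerd", "cluster.local"))
}
```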

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-08-05 11:01:03 -07:00
Alex Leong 24792cfd1c
Remove core dependency on viz (#6497)
Fixes #5589 

The core control plane has a dependency on the viz package in order to use the `BuildResource` function.  This "backwards" dependency means that the viz source code needs to be included in core docker-builds and is bad for code hygiene.

We move the `BuildResource` function into the viz package.  In `cli/cmd/metrics.go` we replace a call to `BuildResource` with a call directly to `CanonicalResourceNameFromFriendlyName`.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-07-19 14:28:45 -07:00
dependabot[bot] 1dfd8b5bd7
Bump google.golang.org/protobuf from 1.27.0 to 1.27.1 (#6409)
* Bump google.golang.org/protobuf from 1.27.0 to 1.27.1

Bumps [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) from 1.27.0 to 1.27.1.
- [Release notes](https://github.com/protocolbuffers/protobuf-go/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf-go/blob/master/release.bash)
- [Commits](https://github.com/protocolbuffers/protobuf-go/compare/v1.27.0...v1.27.1)

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-07-01 14:50:04 -06:00
dependabot[bot] 94b5aa634e
Bump google.golang.org/protobuf from 1.26.0 to 1.27.0 (#6395)
* Bump google.golang.org/protobuf from 1.26.0 to 1.27.0

Bumps [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/protocolbuffers/protobuf-go/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf-go/blob/master/release.bash)
- [Commits](https://github.com/protocolbuffers/protobuf-go/compare/v1.26.0...v1.27.0)

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
2021-06-29 13:16:44 -06:00
Alex Leong 948f9a4ece
Update protoc (#6333)
Update protoc from 3.6.0 to 3.15.7

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-06-21 16:37:57 -07:00
dependabot[bot] f4dacaf27f
Bump google.golang.org/protobuf from 1.24.0 to 1.26.0 (#6304)
* Bump google.golang.org/protobuf from 1.24.0 to 1.26.0

Bumps [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) from 1.24.0 to 1.26.0.
- [Release notes](https://github.com/protocolbuffers/protobuf-go/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf-go/blob/master/release.bash)
- [Commits](https://github.com/protocolbuffers/protobuf-go/compare/v1.24.0...v1.26.0)

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update protobuf

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>

* Update go.sum

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-06-21 10:24:47 -07:00
dependabot[bot] 3bb1b6397d
Bump helm.sh/helm/v3 from 3.4.1 to 3.6.1 (#6286)
* Bump helm.sh/helm/v3 from 3.4.1 to 3.6.1

Bumps [helm.sh/helm/v3](https://github.com/helm/helm) from 3.4.1 to 3.6.1.
- [Release notes](https://github.com/helm/helm/releases)
- [Commits](https://github.com/helm/helm/compare/v3.4.1...v3.6.1)

---
updated-dependencies:
- dependency-name: helm.sh/helm/v3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2021-06-18 09:34:29 -06:00
Dennis Adjei-Baah c61f5af1e9
Fix namespace always showing up in topology graph (#6236)
* Fix namespace always showing up in topology graph

Fixes #6211

In #6091, code was added to include the namespace in the list of all
stat resource types. This was added so that we'd have a complete list
of resource types that could be suggested by the CLI autocompletion
code. However, this list was also used by the web frontend in a query
that gathered metrics from all resource types. This then caused the
query to inadvertently create an inbound metric for a deployment that
showed traffic from the given namespace.

This change breaks up the code so that we have a separate list for the
autocompletion code without the namespace value and the original list
used by the web frontend prior to #6091.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-06-09 15:31:11 +05:30
Oliver Gould da6d8e5272
Update Go to 1.16.4 (#6170)
Go 1.16.4 includes a fix for a denial-of-service in net/http: golang/go#45710

Go's error file-line formatting changed in 1.16.3, so this change
updates tests to only do suffix matching on these error strings.
2021-05-24 11:57:46 -07:00
Dennis Adjei-Baah 03438c6587
Add support for server resource aware completion (#6091)
Linkerd's CLI offers basic shell suggestions on most of its subcommands.
These suggestions are based on hardcoded suggestion lists; for example,
`viz stat` auto-suggests a list of all resource types supported by that
command, located in `k8s.go`. Although this provides basic suggestions
for k8s manifest resources, prior to this change there was no
way to get auto-suggested resources from the k8s cluster Linkerd is
installed in.

This change adds a new `CommandCompletion` module that reads arguments
and a substring, and queries the k8s API to determine what suggestions
to provide to the user. The current implementation makes the module
generic enough to query most Kubernetes resources and can be used for all
subcommands in the CLI.
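
The idea can be sketched as below: given the arguments typed so far and the partial word, suggest either resource types or resource names fetched from the cluster. The `lister` type stands in for the Kubernetes API query, and `complete` is a hypothetical name; the real `CommandCompletion` module may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lister is a hypothetical stand-in for a Kubernetes API call that returns
// the names of all resources of the given type.
type lister func(resourceType string) ([]string, error)

// complete returns suggestions that start with toComplete. With no args it
// suggests resource types; otherwise it suggests names of the type in args[0].
func complete(args []string, toComplete string, resourceTypes []string, list lister) ([]string, error) {
	candidates := resourceTypes
	if len(args) > 0 {
		names, err := list(args[0])
		if err != nil {
			return nil, err
		}
		candidates = names
	}

	var suggestions []string
	for _, c := range candidates {
		if strings.HasPrefix(c, toComplete) {
			suggestions = append(suggestions, c)
		}
	}
	sort.Strings(suggestions)
	return suggestions, nil
}

func main() {
	// A fake cluster lookup used only for this example.
	fake := func(resourceType string) ([]string, error) {
		return []string{"web", "voting", "vote-bot", "emoji"}, nil
	}
	types := []string{"deployment", "daemonset", "pod"}

	fmt.Println(complete(nil, "d", types, fake))
	fmt.Println(complete([]string{"deployment"}, "vo", types, fake))
}
```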

This change only applies this behavior to the `stat` command as a first
step. Adding auto completion for other commands will be done in a number
of follow-up PRs.

To test out the change on this branch:
- Build the CLI binaries on this branch
- Install the completion scripts for your shell environment. Running
`linkerd completion -h` should give you more info on how to do that.
- If not installed already, install `linkerd viz`
```
linkerd viz install | k apply -f -
```
- Test out completion by typing
```
linkerd viz stat [tab][tab]
```

Part of #5981 

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-05-18 09:59:59 -04:00
Kevin Leimkuhler 1071ec2e77
Add support for awaiting proxy readiness (#5967)
### What

This change adds the `config.linkerd.io/proxy-await` annotation which when set will delay application container start until the proxy is ready. This allows users to force application containers to wait for the proxy container to be ready without modifying the application's Docker image. This is different from the current use-case of [linkerd-await](https://github.com/olix0r/linkerd-await) which does require modifying the image.

---

To support this, Linkerd is using the fact that containers are started in the order that they appear in `spec.containers`. If `linkerd-proxy` is the first container, then it will be started first.

Kubernetes will start each container without waiting on the result of the previous container. However, if a container has a hook that is executed immediately after container creation, then Kubernetes will wait on the result of that hook before creating the next container. Using a `PostStart` hook in the `linkerd-proxy` container, the `linkerd-await` binary can be run and force Kubernetes to pause container creation until the proxy is ready. Once `linkerd-await` completes, the container hook completes and the application container is created.

Adding the `config.linkerd.io/proxy-await` annotation to a pod's metadata results in the `linkerd-proxy` container being the first container, as well as having the container hook:

```yaml
postStart:
  exec:
    command:
    - /usr/lib/linkerd/linkerd-await
```
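
The reordering described above can be sketched with simplified types. The `Container` struct and `awaitProxy` function below are hypothetical stand-ins for the Kubernetes pod spec types and the injector logic, which are more involved in practice.

```go
package main

import "fmt"

// Container is a hypothetical, minimal stand-in for a pod spec container.
type Container struct {
	Name      string
	PostStart []string // exec command for the postStart hook, if any
}

const proxyName = "linkerd-proxy"

// awaitProxy moves the proxy container to the front of the list and gives
// it a postStart hook that runs linkerd-await, so that container creation
// pauses until the proxy is ready.
func awaitProxy(containers []Container) []Container {
	ordered := make([]Container, 0, len(containers))
	var rest []Container
	for _, c := range containers {
		if c.Name == proxyName && len(ordered) == 0 {
			c.PostStart = []string{"/usr/lib/linkerd/linkerd-await"}
			ordered = append(ordered, c)
			continue
		}
		rest = append(rest, c)
	}
	return append(ordered, rest...)
}

func main() {
	pod := []Container{{Name: "emoji-svc"}, {Name: proxyName}}
	for i, c := range awaitProxy(pod) {
		fmt.Println(i, c.Name, c.PostStart)
	}
}
```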

---

### Update after draft

There has been some additional discussion both off GitHub as well as on this PR (specifically with @electrical).

First, we decided that this feature should be enabled by default. More often than not, it will prevent start-up ordering issues without any negative effect on the application. Additionally, it will be part of edge releases up until 2.11 (the next stable release), and having it enabled by default will allow us to check that it does not often conflict with applications. Once we are closer to 2.11, we'll be able to determine whether it should be disabled by default because it causes more issues than it prevents.

Second, this feature will remain configurable; if disabled, then upon injection the proxy container will not be made the first container in the pod manifest. This is important for the reasons discussed with @electrical about tools that make assumptions about app containers being the first container. For example, Rancher defaults to showing overview pages for the `0` index container, and if the proxy container was always `0` then this would defeat the purpose of the overview page.

### Testing

To test this I used the `sleep.sh` script and changed `Dockerfile-proxy` to use it as its `ENTRYPOINT`. This forces the container to sleep for 20 seconds before starting the proxy.

---

`sleep.sh`:

```bash
#!/bin/bash
echo "sleeping..."
sleep 20
/usr/bin/linkerd2-proxy-run
```

`Dockerfile-proxy`:

```dockerfile
...
COPY sleep.sh /sleep.sh
RUN ["chmod", "+x", "/sleep.sh"]
ENTRYPOINT ["/sleep.sh"]
```

---

```bash
# Build and install with the above changes
$ bin/docker-build
...
$ bin/image-load --k3d
...
$ bin/linkerd install |kubectl apply -f -
```

Annotate the `emoji` deployment so that it's the only workload that should wait for its proxy to be ready and inject it:

```bash
cat emojivoto.yaml |bin/linkerd inject - |kubectl apply -f -
```

You can then see that the `emoji` deployment is not starting its application container until the proxy is ready:

```bash
$ kubectl get -n emojivoto pods
NAME                        READY   STATUS            RESTARTS   AGE
voting-ff4c54b8d-sjlnz      1/2     Running           0          9s
emoji-f985459b4-7mkzt       0/2     PodInitializing   0          9s
web-5f86686c4d-djzrz        1/2     Running           0          9s
vote-bot-6d7677bb68-mv452   1/2     Running           0          9s
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-04-21 17:43:23 -04:00
Alejandro Pedraza 6980e45e1d
Remove the `linkerd-controller` pod (#6039)
* Remove the `linkerd-controller` pod

Now that we got rid of the `Version` API (#6000) and the destination API forwarding business in `linkerd-controller` (#5993), we can get rid of the `linkerd-controller` pod.

## Removals

- Deleted everything under `/controller/api/public` and `/controller/cmd/public-api`.
- Moved `/controller/api/public/test_helper.go` to `/controller/api/destination/test_helper.go` because those are really utils for destination testing. I also extracted from there the prometheus mock structs and put that under `/pkg/prometheus/test_helper.go`, which is now used by both the `linkerd diagnostics endpoints` and the `metrics-api` tests, removing some duplication.
- Deleted the `controller.yaml` and `controller-rbac.yaml` helm templates along with the `publicAPIResources` and `publicAPIProxyResources` helm values.

## Health checks

- Removed the `can initialize the client` check given such client is no longer needed. The `linkerd-api` section was left with only the check `control pods are ready`, so I moved that under the `linkerd-existence` section and got rid of the `linkerd-api` section altogether.
- In that same `linkerd-existence` section, got rid of the `controller pod is running` check.

## Other changes

- Fixed the Control Plane section of the dashboard, taking into account the disappearance of `linkerd-controller` and, previously, of `linkerd-sp-validator`.
2021-04-19 09:57:45 -05:00
Alejandro Pedraza c24585e6ea
Removed `Version` API from the public-api (#6000)
* Removed `Version` API from the public-api

This is a sibling PR to #5993, and it's the second step towards removing the `linkerd-controller` pod.

This one deals with a replacement for the `Version` API, fetching instead the `linkerd-config` CM and retrieving the `LinkerdVersion` value.

## Changes to the public-api

- Removal of the `publicPb.ApiClient` entry from the `Client` interface
- Removal of the `publicPb.ApiServer` entry from the `Server` interface
- Removal of the `Version` and related methods from `client.go`, `grpc_server.go` and `http_server.go`

## Changes to `linkerd version`

- Removal of all references to the public API.
- Call `healthcheck.GetServerVersion` to retrieve the version

## Changes to `linkerd check`

- Removal of the "can query the control API" check from the "linkerd-api" section
- Addition of a new "can retrieve the control plane version" check under the "control-plane-version" section

## Changes to `linkerd-web`

- The version is now retrieved from the `linkerd-config` CM instead of a public-API call.
- Removal of all references to the public API.
- Removal of the `data-go-version` global attribute on the dashboard, which wasn't being used.

## Other changes

- Added `ValuesFromConfigMap` function in `values.go` to convert the `linkerd-config` CM into a `*Values` struct instance
- Removal of the `public` protobuf
- Refactor 'linkerd repair' to use the refactored 'healthcheck.GetServerVersion()' function
2021-04-16 11:23:55 -05:00
Alex Leong 320df9e69a
Add CA certs to metrics-api container (#5972)
Fixes #5966 
Fixes #5955 

The metrics-api container in the Viz extension does not have the default set of system CA certificates installed. This means that it will fail to validate the certificate of an external Prometheus served over HTTPS.

We install the default CA certs into the container.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-03-30 09:26:40 -07:00
Matei David e798b33e2e
Add peer label to TCP read and write stat queries (#5903)
Add peer label to TCP read and write stat queries

Closes #5693

### Tests
---

After refactoring, `linkerd viz stat` behaves the same way (I haven't checked gateways or routes).

```
$ linkerd viz stat deploy/web -n emojivoto -o wide

NAME   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
web       1/1    91.91%   2.3rps           2ms           4ms           5ms          3         185.3B/s         5180.0B/s

# same value as before, latency seems to have dropped

time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"

time="2021-03-22T18:19:44Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"web\", direction=\"inbound\", namespace=\"emojivoto\", peer=\"src\"}[1m])) by (namespace, deployment)"

# queries show the peer label
---

$ linkerd viz stat deploy/web -n emojivoto --from deploy/vote-bot -o wide

NAME   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
web       1/1    93.16%   1.9rps           3ms           4ms           4ms          1        4503.4B/s          153.1B/s


# stats same as before except for latency which seems to have dropped a bit

time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_write_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"

time="2021-03-22T18:22:10Z" level=debug msg="Query request:\n\tsum(increase(tcp_read_bytes_total{deployment=\"vote-bot\", direction=\"outbound\", dst_deployment=\"web\", dst_namespace=\"emojivoto\", namespace=\"emojivoto\", peer=\"dst\"}[1m])) by (dst_namespace, dst_deployment)"

# queries show the right label
```
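
The query shape in the debug logs above can be reproduced with a small Go sketch. This is not the code from the PR: `peerFor` and `buildTCPQuery` are hypothetical helpers, and the mapping of `inbound` to `peer="src"` and `outbound` to `peer="dst"` is inferred only from the two logged examples.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// peerFor returns the peer label value seen in the logged queries:
// inbound queries use "src", outbound queries use "dst".
// (Hypothetical helper; the mapping is inferred from the logs above.)
func peerFor(direction string) string {
	if direction == "inbound" {
		return "src"
	}
	return "dst"
}

// buildTCPQuery renders a PromQL query with the same shape as the logged
// ones. Labels are sorted by name so the output is deterministic.
func buildTCPQuery(metric string, labels map[string]string, groupBy []string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("sum(increase(%s{%s}[1m])) by (%s)",
		metric, strings.Join(pairs, ", "), strings.Join(groupBy, ", "))
}

func main() {
	labels := map[string]string{
		"deployment": "web",
		"direction":  "inbound",
		"namespace":  "emojivoto",
		"peer":       peerFor("inbound"),
	}
	fmt.Println(buildTCPQuery("tcp_write_bytes_total", labels, []string{"namespace", "deployment"}))
}
```

Without the `peer` matcher, the selector would match series for both peers of the same connection, which is presumably why the label was added to the read and write queries.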

Signed-off-by: mateiidavid <matei.david.35@gmail.com>
2021-03-26 13:36:30 -04:00
Dennis Adjei-Baah 7f0529ed7c
update go.mod and docker images to go 1.16.2 (#5890)
* update go.mod and docker images to go 1.16.1

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>

* update test error messages for ParseDuration

* update go version to 1.16.2
2021-03-15 11:20:16 -05:00
Oliver Gould ab2a809e1b
docker: Avoid specifying TARGETARCH for await (#5835)
When introducing the `linkerd-await` helper, we provided a default value
for `TARGETARCH`. This appears to interfere with multi-arch image
builds, causing ARM builds to fetch amd64 binaries.

Unsetting this default appears to fix this issue.
2021-02-26 07:30:14 -05:00
Oliver Gould 9e9b40d5a2
Add the linkerd-await helper to all Linkerd containers (#5821)
When a container starts up, we generally want to wait for the proxy to
initialize before starting the controller (which may initiate outbound
connections, especially to the Kubernetes API). This is true for all
pods except the identity controller, which must start before its proxy.

This change adds the linkerd-await helper to all of our container
images. Its use is explicitly disabled in the identity controller, due
to startup ordering constraints, and the heartbeat controller, because
it does not run a proxy currently.
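
The startup ordering described above amounts to a poll-then-start loop. The Go sketch below only illustrates that idea and is not how `linkerd-await` is implemented: `awaitReady` and the fake probe are hypothetical names, and a real probe would query the proxy rather than count calls.

```go
package main

import (
	"fmt"
	"time"
)

// awaitReady calls probe up to attempts times, sleeping interval after
// each failed probe. It returns true as soon as probe reports ready.
func awaitReady(probe func() bool, interval time.Duration, attempts int) bool {
	for i := 0; i < attempts; i++ {
		if probe() {
			return true
		}
		time.Sleep(interval)
	}
	return false
}

func main() {
	calls := 0
	// Hypothetical probe standing in for a proxy readiness check:
	// it reports ready on the third call.
	probe := func() bool {
		calls++
		return calls >= 3
	}

	if awaitReady(probe, 10*time.Millisecond, 5) {
		fmt.Println("proxy ready; starting controller")
	} else {
		fmt.Println("proxy not ready; giving up")
	}
}
```

A container such as the identity controller, which must start before its proxy, would skip this wait entirely.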

Fixes #5819
2021-02-25 10:35:04 -08:00
Dennis Adjei-Baah 15d1809bd0
Remove linkerd prefix from extension resources (#5803)
* Remove linkerd prefix from extension resources

This change removes the `linkerd-` prefix on all non-cluster resources
in the jaeger and viz linkerd extensions. Removing the prefix makes all
linkerd extensions consistent in their naming.

Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
2021-02-25 11:01:31 -05:00
Oliver Gould 2774780fb8
Update Go to 1.14.15 (#5751)
The Go-1.14 release branch includes a number of important updates. This
change updates our containers' base image to the latest release, 1.14.15

See linkerd/linkerd2-proxy-init#32
Fixes #5655
2021-02-16 08:40:06 -08:00
Kevin Leimkuhler 75fcc9d623
Move tap from core into Viz extension (#5651)
Closes #5545.

This change moves all tap and tap-injector code into the viz directory. 

The tap and tap-injector components now also use a new tap image—separating
these components from the controller image that they are currently part of. This
means the controller image has removed all its build dependencies related to
tap.

Finally, the tap Protobuf has been separated from the metrics-api and moved into
its own `.proto` file and gen directory. This introduces a clear split between
metrics-api and tap Protobuf.

There is no change in behavior for the `viz tap` command.

### Reviewing

#### Docker images

All the bin directory scripts should be updated to build and load the tap image.
All the CI workflows should be updated to build and push the tap image.

#### Controller and pkg directories

This is primarily deletions. Most of the deleted code in this directory is now
in the tap directory of the Viz extension.

#### viz/tap

This is the location that all the tap related code now lives in. New files are
mostly moved from the controller and pkg directories. Imports have all been
updated to point at the right locations and Protobuf.

The Protobuf here is taken from metrics-api and contains all tap-related
Protobuf.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-09 12:43:21 -05:00
Alejandro Pedraza a04b30d2ab
Simplify SelfCheck API (#5665)
Fixes #5575

Now that only viz makes use of the `SelfCheck` api, merged the `healthcheck.proto` into `viz.proto`.

Also removed the "checkRPC" functionality that was used for handling multiple API responses and was only used by `SelfCheck`, because the extra complexity was not warranted. The self-check now goes back to the plain vanilla "check", simply concatenating the error responses.
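
As a rough illustration of concatenating error responses into a single check failure, here is a minimal Go sketch. The `checkResult` type and the `combine` function are hypothetical and are not taken from the PR. The message format mirrors the failure output shown further down.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkResult is a hypothetical stand-in for one self-check response.
// An empty Err means the sub-check succeeded.
type checkResult struct {
	Target string
	Err    string
}

// combine concatenates all failing results into one error, one message
// per line, or returns nil when every sub-check succeeded.
func combine(results []checkResult) error {
	var msgs []string
	for _, r := range results {
		if r.Err != "" {
			msgs = append(msgs, fmt.Sprintf("Error calling %s: %s", r.Target, r.Err))
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return errors.New(strings.Join(msgs, "\n"))
}

func main() {
	err := combine([]checkResult{
		{Target: "the Kubernetes API", Err: "someerror"},
		{Target: "Prometheus from the control plane", Err: "someerror"},
	})
	if err != nil {
		fmt.Println(err)
	}
}
```

A single error value is enough for a plain check to report, so no special handling of multiple responses is needed.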

## Success Output

```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
√ viz extension self-check
```

## Failure Examples

Failure when viz fails to connect to the k8s api:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling the Kubernetes API: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```

Failure when viz fails to connect to Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling Prometheus from the control plane: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```

Failure when viz fails to connect to both the k8s api and Prometheus:
```bash
$ bin/linkerd viz check
...
linkerd-viz
-----------
...
× viz extension self-check
    Error calling the Kubernetes API: someerror
    Error calling Prometheus from the control plane: someerror
    see https://linkerd.io/checks/#l5d-api-control-api for hints

Status check results are ×
```
2021-02-05 10:13:45 -05:00
Alejandro Pedraza 8ac5360041
Extract all the Prometheus dependencies from public-api and move them into a new viz component, 'linkerd-metrics-api' (#5560)
* Protobuf changes:
- Moved `healthcheck.proto` back from viz to `proto/common` as it is still used by the main `healthcheck.go` library (it was moved to viz by #5510).
- Extracted from `viz.proto` the IP-related types and put them in `/controller/gen/common/net` to be used by both the public and the viz APIs.

* Added chart templates for new viz linkerd-metrics-api pod

* Spin-off viz healthcheck:
- Created `viz/pkg/healthcheck/healthcheck.go` that wraps the original `pkg/healthcheck/healthcheck.go` while adding the `vizNamespace` and `vizAPIClient` fields which were removed from the core `healthcheck`. That way the core healthcheck doesn't have any dependencies on viz, and viz' healthcheck can now be used to retrieve viz api clients.
- The core and viz healthcheck libs are now abstracted out via the new `healthcheck.Runner` interface.
- Refactored the data plane checks so they don't rely on calling `ListPods`
- The checks in `viz/cmd/check.go` have been moved to `viz/pkg/healthcheck/healthcheck.go` as well, so `check.go`'s sole responsibility is dealing with command business. This command also now retrieves its viz api client through viz' healthcheck.

* Removed linkerd-controller dependency on Prometheus:
- Removed the `global.prometheusUrl` config in the core values.yml.
- Leave the Heartbeat's `-prometheus` flag hard-coded temporarily. TO-DO: have it automatically discover viz and pull Prometheus' endpoint (#5352).

* Moved observability gRPC from linkerd-controller to viz:
- Created a new gRPC server under `viz/metrics-api`, moving Prometheus-dependent functions out of the core gRPC server and into it (same thing for the accompanying HTTP server).
- Did the same for the `PublicAPIClient` (now called just `Client`) interface. The `VizAPIClient` interface disappears as it's enough to just rely on the viz `ApiClient` protobuf type.
- Moved the other files implementing the rest of the gRPC functions from `controller/api/public` to `viz/metrics-api` (`edge.go`, `stat_summary.go`, etc.).
- Also simplified some type names to avoid stuttering.

* Added linkerd-metrics-api bootstrap files. At the same time, we strip out of the public-api's `main.go` file the prometheus parameters and other no longer relevant bits.

* linkerd-web updates: it requires connecting with both the public-api and the viz api, so both addresses (and the viz namespace) are now provided as parameters to the container.

* CLI updates and other minor things:
- Changes to command files under `cli/cmd`:
  - Updated `endpoints.go` according to new API interface name.
  - Updated `version.go`, `dashboard` and `uninstall.go` to pull the viz namespace dynamically.
- Changes to command files under `viz/cmd`:
  - `edges.go`, `routes.go`, `stat.go` and `top.go`: point to dependencies that were moved from public-api to viz.
- Other changes to have tests pass:
  - Added `metrics-api` to list of docker images to build in actions workflows.
  - In `bin/fmt` exclude protobuf generated files instead of entire directories because directories could contain both generated and non-generated code (case in point: `viz/metrics-api`).

* Add retry to 'tap API service is running' check

* mc check shouldn't err when viz is not available. Also properly set the log in multicluster/cmd/root.go so that it properly displays messages when --verbose is used
2021-01-21 18:26:38 -05:00
Alejandro Pedraza f3b1ebfa99
Separate observability API (#5510)
* Separate observability API

Closes #5312

This is a preliminary step towards moving all the observability API into `/viz`, by first moving its protobuf into `viz/metrics-api`. This should facilitate review as the go files are not moved yet, which will happen in a followup PR. There are no user-facing changes here.

- Moved `proto/common/healthcheck.proto` to `viz/metrics-api/proto/healthcheck.proto`
- Moved the contents of `proto/public.proto` to `viz/metrics-api/proto/viz.proto` except for the `Version` stuff.
- Merged `proto/controller/tap.proto` into `viz/metrics-api/proto/viz.proto`
- `grpc_server.go` now temporarily exposes `PublicAPIServer` and `VizAPIServer` interfaces to separate both APIs. This will get properly split in a followup.
- The web server provides handlers for both interfaces.
- `cli/cmd/public_api.go` and `pkg/healthcheck/healthcheck.go` temporarily now have methods to access both APIs.
- Most of the CLI commands will use the Viz API, except for `version`.

The other changes in the go files are just changes in the imports to point to the new protobufs.

Other minor changes:
- Removed `git add controller/gen` from `bin/protoc-go.sh`
2021-01-13 14:34:54 -05:00