Author SHA1 Message Date
Tarun Pothulapati a330d20aa0
stat_summary: support service metrics using `authority` label (#6514)
Currently, `viz stat` on services is quite restricted because a
service is not a pod-owner resource. This PR fixes that by making
it query the Prometheus metrics with `direction="outbound", authority="svc"`.
This means that, for services, we can generate metrics from the
*meshed* client side.

`StatSummary` metrics on a service are further divided into
two cases:

### Service has no `ServiceProfiles.dstOverrides` 

In this case, we just return the metrics by
querying for `direction="outbound", authority="svc"`, along
with any `--from` resources specified as client query labels.
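To make the query shape above concrete, here is a minimal POSIX shell sketch that assembles the label selector. It is not the linkerd-viz implementation: `service_selector` is a hypothetical helper, and it assumes that the `authority` label holds the service FQDN without a port and that outbound series carry the client's `namespace` and workload (e.g. `deployment`) labels. `response_total` is a Linkerd proxy metric; the `1m` window is arbitrary.

```bash
# Hypothetical sketch: build a PromQL selector for a service's client-side
# (outbound) metrics, optionally narrowed to one --from client workload.
# Assumptions: authority = <svc>.<ns>.svc.cluster.local (no port), and the
# client's namespace/workload are available as Prometheus labels.
service_selector() {
  svc="$1"; ns="$2"; from="${3:-}"; from_ns="${4:-$2}"
  labels="direction=\"outbound\", authority=\"${svc}.${ns}.svc.cluster.local\""
  if [ -n "$from" ]; then
    kind="${from%%/*}"; name="${from#*/}"
    # Map short resource names onto the assumed Prometheus label names.
    case "$kind" in
      deploy|deployment) kind=deployment ;;
      sts|statefulset) kind=statefulset ;;
      ds|daemonset) kind=daemonset ;;
      po|pod) kind=pod ;;
    esac
    labels="${labels}, namespace=\"${from_ns}\", ${kind}=\"${name}\""
  fi
  printf '{%s}\n' "$labels"
}

# Request rate from meshed slow-cooker clients to backend-svc.
sel="$(service_selector backend-svc linkerd-trafficsplit-test-sp deploy/slow-cooker)"
echo "sum(rate(response_total${sel}[1m]))"
```

Without a third argument the helper returns only the service's outbound labels; passing a client such as `deploy/slow-cooker` (and optionally its namespace) narrows the series to that client, mirroring how `--from` is described above.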

We also gate this path so that it fails for requests that use a
service as `--from`, or for `svc/* --to xyz`, as they are invalid,
i.e., we can't render metrics with a service as the client.
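The gate can be pictured as a small check that runs before any query is built. This is a sketch under stated assumptions, not the code from this PR: `validate_stat_query` is a hypothetical name, its error strings are illustrative, and it only recognizes the `svc/` and `service/` prefixes.

```bash
# Hypothetical sketch of the validation gate described above.
# Arguments: target resource, --from resource (may be empty), --to resource (may be empty).
validate_stat_query() {
  target="${1:-}"; from="${2:-}"; to="${3:-}"
  # A service can never be the client side of a query.
  case "$from" in
    svc/*|service/*)
      echo "error: a service cannot be used with --from" >&2
      return 1 ;;
  esac
  # A service target combined with --to would also make the service the client.
  case "$target" in
    svc/*|service/*)
      if [ -n "$to" ]; then
        echo "error: a service target cannot be combined with --to" >&2
        return 1
      fi ;;
  esac
  return 0
}

validate_stat_query svc/backend-svc deploy/slow-cooker "" && echo "accepted: service target with --from"
validate_stat_query deploy/slow-cooker svc/backend-svc "" || echo "rejected: service as --from client"
```

Note that `deploy/slow-cooker --to svc/backend-svc` remains valid under this check, since the service is the destination rather than the client.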

### Service has `ServiceProfiles.dstOverrides` 

Here, we follow a path similar to the `TrafficSplit` one,
except that we use a `ServiceProfile` resource
object instead.
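In the `dstOverrides` case, the CLI output in the Testing section lists one row per override, with a short name in the `LEAF` column. The sketch below shows one plausible way to derive those rows from the overrides; it assumes the leaf name is everything before the first dot of each override's `authority`, and `leaf_rows` is a hypothetical helper, not code from this PR. It reads one `authority<TAB>weight` pair per line.

```bash
# Hypothetical sketch: turn dstOverrides entries ("authority<TAB>weight" per
# line on stdin) into APEX / LEAF / WEIGHT rows. The leaf name is assumed to
# be the part of the authority before the first dot (this also drops ports).
leaf_rows() {
  apex="$1"
  while IFS="$(printf '\t')" read -r authority weight; do
    [ -n "$authority" ] || continue   # skip blank lines
    leaf="${authority%%.*}"
    printf '%s\t%s\t%s\n' "$apex" "$leaf" "$weight"
  done
}

printf 'backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local\t500m\nfailing-svc.linkerd-trafficsplit-test-sp.svc.cluster.local:8081\t500m\n' \
  | leaf_rows backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
```

On a live cluster, the input could come from `kubectl get sp <name> -n <namespace> -o jsonpath='{range .spec.dstOverrides[*]}{.authority}{"\t"}{.weight}{"\n"}{end}'`.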

_The `TrafficSplit` path will be removed or merged into the
`Service` path in a separate PR for simplification._


## Testing

### Apply Traffic Splitting through `ServiceProfiles`

```bash
$ k create ns linkerd-trafficsplit-test-sp
namespace/linkerd-trafficsplit-test-sp created

$ ./bin/linkerd inject ./test/integration/trafficsplit/testdata/application.yaml | k -n linkerd-trafficsplit-test-sp apply -f -

document missing "kind" field, skipped
deployment "backend" injected
service "backend-svc" skipped
deployment "failing" injected
service "failing-svc" skipped
deployment "slow-cooker" injected
service "slow-cooker" skipped

deployment.apps/backend created
service/backend-svc created
deployment.apps/failing created
service/failing-svc created
deployment.apps/slow-cooker created
service/slow-cooker created

$ k apply -f ./test/integration/trafficsplit/testdata/sp/updated-traffic-split-leaf-weights.yaml -n linkerd-trafficsplit-test-sp
serviceprofile.linkerd.io/backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local created

$ k describe sp -n linkerd-trafficsplit-test-sp
Name:         backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
Namespace:    linkerd-trafficsplit-test-sp
Labels:       <none>
Annotations:  <none>
API Version:  linkerd.io/v1alpha2
Kind:         ServiceProfile
Metadata:
  Creation Timestamp:  2021-07-01T11:05:06Z
  Generation:          1
  Managed Fields:
    API Version:  linkerd.io/v1alpha2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:dstOverrides:
    Manager:         kubectl-client-side-apply
    Operation:       Update
    Time:            2021-07-01T11:05:06Z
  Resource Version:  1398
  UID:               fce0a250-1396-4a14-9729-e19030048c7a
Spec:
  Dst Overrides:
    Authority:  backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
    Weight:     500m
    Authority:  failing-svc.linkerd-trafficsplit-test-sp.svc.cluster.local:8081
    Weight:     500m
Events:         <none>
```

### CLI Output

```bash
$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp
NAME                                                         APEX                                                         LEAF          WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc     500m   100.00%   0.9rps           1ms           2ms           2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   failing-svc     500m     0.00%   1.1rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker
NAME                                                         APEX                                                         LEAF          WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc     500m   100.00%   0.4rps           1ms           2ms           2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   failing-svc     500m     0.00%   0.6rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1
NAME                                                         APEX                                                         LEAF          WEIGHT   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc     500m   100.00%   0.5rps           1ms           2ms           2ms
backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local   failing-svc     500m     0.00%   0.5rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/prometheus -n linkerd-viz
StatSummary API error: service only supported as a target on 'from' queries, or as a destination on 'to' queries


# With no `sp.dstOverrides`

$ k -n linkerd-trafficsplit-test-sp delete sp backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
serviceprofile.linkerd.io "backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local" deleted

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc        -   100.00%   1.2rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker-1 --from-namespace linkerd-trafficsplit-test-sp
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc        -   100.00%   0.6rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat svc/backend-svc -n linkerd-trafficsplit-test-sp --from deploy/slow-cooker --from-namespace linkerd-trafficsplit-test-sp
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
backend-svc        -   100.00%   0.7rps           1ms           2ms           2ms

$ ./bin/go-run cli viz stat deploy/slow-cooker -n linkerd-trafficsplit-test-sp --to svc/backend-svc
No traffic found.

```

Note: _This means that we need documentation changes to
let users know that `viz stat` on a service reports client-side
metrics, and therefore misses traffic from unmeshed
clients._


Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-08-09 23:05:14 +05:30