Commit Graph

679 Commits

Author SHA1 Message Date
Alejandro Pedraza f9f3ebefa9
Remove namespace from charts and split them into `linkerd-crds` and `linkerd-control-plane` (#6635)
Fixes #6584 #6620 #7405

# Namespace Removal

With this change, the `namespace.yaml` template is rendered only for CLI installs and not for Helm; likewise for the `namespace:` entry in the namespace-level objects (using a new `partials.namespace` helper).

The `installNamespace` and `namespace` entries in `values.yaml` have been removed.

In the templates where the namespace is required, we moved from `.Values.namespace` to `.Release.Namespace`, which Helm fills in automatically. For the CLI, `install.go` now explicitly defines the contents of the `Release` map alongside `Values`.
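
As a rough illustration of that last point (a sketch, not the actual `install.go` code), Helm v3's `chartutil` and `engine` packages let a CLI supply the `Release` map itself:

```go
package main

import (
	"fmt"

	"helm.sh/helm/v3/pkg/chart"
	"helm.sh/helm/v3/pkg/chartutil"
	"helm.sh/helm/v3/pkg/engine"
)

func main() {
	// A trivial chart whose template relies on .Release.Namespace,
	// as the linkerd templates do after this change.
	c := &chart.Chart{
		Metadata: &chart.Metadata{APIVersion: "v2", Name: "demo", Version: "0.1.0"},
		Templates: []*chart.File{
			{Name: "templates/ns.yaml", Data: []byte("namespace: {{ .Release.Namespace }}\n")},
		},
	}

	// The CLI fills in the Release map that Helm would normally provide.
	relOpts := chartutil.ReleaseOptions{Name: "linkerd", Namespace: "linkerd"}
	vals, err := chartutil.ToRenderValues(c, map[string]interface{}{}, relOpts, chartutil.DefaultCapabilities)
	if err != nil {
		panic(err)
	}

	rendered, err := engine.Render(c, vals)
	if err != nil {
		panic(err)
	}
	fmt.Print(rendered["demo/templates/ns.yaml"]) // namespace: linkerd
}
```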

The proxy-injector has a new `linkerd-namespace` argument because the namespace is no longer persisted in the `linkerd-config` ConfigMap, so it has to be passed in. To pass it further down to `injector.Inject()` without modifying the `Handler` signature, a closure was used.
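
A minimal sketch of the closure approach, using stand-in types (the real `Handler` and `injector.Inject` signatures differ):

```go
package main

import "fmt"

// Handler stands in for the webhook handler signature, which this
// change leaves untouched.
type Handler func(request string) (response string)

// inject stands in for injector.Inject, which now needs the controller
// namespace that used to live in the linkerd-config ConfigMap.
func inject(linkerdNamespace, request string) string {
	return fmt.Sprintf("injected %q using namespace %q", request, linkerdNamespace)
}

// makeInjectHandler closes over the namespace parsed from the new
// linkerd-namespace argument, producing a plain Handler.
func makeInjectHandler(linkerdNamespace string) Handler {
	return func(request string) string {
		return inject(linkerdNamespace, request)
	}
}

func main() {
	handler := makeInjectHandler("linkerd")
	fmt.Println(handler("pod/foo"))
}
```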

------------
Update: Merged-in #6638: Similar changes for the `linkerd-viz` chart:

Stop rendering `namespace.yaml` in the `linkerd-viz` chart.

The additional change here is the addition of the `namespace-metadata.yaml` template (and its RBAC), _not_ rendered in CLI installs, which is a Helm `post-install` hook consisting of a Job that executes a script adding the required annotations and labels to the viz namespace using a PATCH request against kube-api. The script first checks whether the namespace already has `annotations`/`labels` entries; if not, it has to add extra ops to the patch to create them.
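
For illustration, the shape of such a patch, in Go rather than the hook's actual shell script (the annotation key below is just an example):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single JSON-patch operation, as sent in the PATCH
// request against kube-api.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// buildNamespacePatch mirrors the hook's logic: if the namespace has no
// annotations map yet, an extra op must first create it before
// individual keys can be added.
func buildNamespacePatch(hasAnnotations bool) []patchOp {
	var ops []patchOp
	if !hasAnnotations {
		ops = append(ops, patchOp{Op: "add", Path: "/metadata/annotations", Value: map[string]string{}})
	}
	ops = append(ops, patchOp{
		Op:    "add",
		Path:  "/metadata/annotations/linkerd.io~1inject", // "/" escaped as "~1"
		Value: "enabled",
	})
	return ops
}

func main() {
	b, _ := json.MarshalIndent(buildNamespacePatch(false), "", "  ")
	fmt.Println(string(b))
}
```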

---------
Update: Merged-in the approved #6643, #6665 and #6669, which address the `linkerd2-cni`, `linkerd-multicluster` and `linkerd-jaeger` charts.

Additional changes from what's already mentioned above:
- Removes the install-namespace option from `linkerd install-cni`, which doesn't exist in `linkerd install` or `linkerd viz install` anyway, and would add some complexity to support.
- Added a dependency on the `partials` chart to the `linkerd-multicluster-link` chart, so that we can tap into the `partials.namespace` helper.
- We no longer restrict the multicluster objects to live in a namespace separate from linkerd's. It's still good practice, and that's the default for the CLI install, but the validation has been removed.


Finally, as a side-effect, the `linkerd mc allow` subcommand was fixed; it had apparently been broken for a while:

```console
$ linkerd mc allow --service-account-name foobar
Error: template: linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml:16:7: executing "linkerd-multicluster/templates/remote-access-service-mirror-rbac.yaml" at <include "partials.annotations.created-by" $>: error calling include: template: no template "partials.annotations.created-by" associated with template "gotpl"
```
---------
Update: see helm/helm#5465, which describes the current best practice.

# Core Helm Charts Split

This removes the `linkerd2` chart and replaces it with the `linkerd-crds` and `linkerd-control-plane` charts. Note that the viz and other extension charts are not affected by this change.

Also note the original `values.yaml` file has been split across the two charts accordingly.

### UX

```console
$ helm install linkerd-crds --namespace linkerd --create-namespace linkerd/linkerd-crds
...
# certs.yaml should contain identityTrustAnchorsPEM and the identity issuer values
$ helm install linkerd-control-plane --namespace linkerd -f certs.yaml linkerd/linkerd-control-plane
```

### Upgrade

As explained in #6635, this is a breaking change. Users will have to uninstall the `linkerd2` chart, install these two, and eventually roll out the proxies (they should continue to work during the transition anyway).

### CLI

The CLI install/upgrade code was updated to pick the templates from these new charts, but the CLI UX remains identical to before.

### Other changes

- The `linkerd-crds` and `linkerd-control-plane` charts now carry a version scheme independent of linkerd's own versioning, as explained in #7405.
- These charts are Helm v3, which is reflected in the `Chart.yaml` entries and in the removal of the `requirements.yaml` files.
- In the integration tests, replaced the `helm-chart` arg with `helm-charts` containing the path `./charts`, used to build the paths for both charts.

### Followups

- Now it's possible to add a `ServiceProfile` instance for Destination in the `linkerd-control-plane` chart.
2021-12-10 15:53:08 -05:00
Matei David ebe125df6f
Update `proxy-init` to `v1.5.2` (#7447)
New proxy init version adds the option to skip subnets 
from being redirected to the inbound proxy port.

Signed-off-by: Matei David <matei@buoyant.io>
2021-12-09 15:31:55 +00:00
Kevin Leimkuhler 147d85dc70
Update `GetProfile` clients with policy server updates (#7388)
### What

`GetProfile` clients do not receive destination profiles that consider Server protocol fields the way that `Get` clients do. If a Server exists for a `GetProfile` destination specifying that the protocol for that destination is `opaque`, this information is not passed back to the client.

#7184 added this for `Get` by subscribing clients to Endpoint/EndpointSlice updates. When there is an update, or there is a Server update, the endpoints watcher passes this information back to the endpoint translator which handles sending the update back to the client.

For `GetProfile` the situation is different. As with `Get`, we only consider Servers when dealing with Pod IPs, but this only occurs in two situations for `GetProfile`.

1. The destination is a Pod IP and port
2. The destination is an Instance ID and port

In both of these cases, we need to check if a Server already selects the endpoint, and we need to subscribe for Server updates in case a Server that selects the endpoint is added or deleted.

### How

First we check if there is already a Server which selects the endpoint. This is so that when the first destination profile is returned, the client knows whether the destination is `opaque` or not.

After sending that first update, we then subscribe the client for any future updates which will come from a Server being added or deleted.

This is handled by the new `ServerWatcher`, which watches for Server updates on the cluster; when an update occurs it sends it to the `endpointProfileTranslator`, which translates the protocol update into a DestinationProfile.

By introducing the `endpointProfileTranslator`, which only handles protocol updates, we're able to decouple the endpoint logic from `profileTranslator`; its `endpoint` field has been removed now that it only handles ServiceProfile updates for Services.
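
A rough sketch of the new flow, with illustrative names and a channel standing in for the gRPC stream:

```go
package main

import "fmt"

// DestinationProfile stands in for the gRPC response type.
type DestinationProfile struct {
	FullyQualifiedName string
	OpaqueProtocol     bool
}

// endpointProfileTranslator concerns itself only with protocol updates
// for a single endpoint, unlike profileTranslator, which handles
// ServiceProfile updates for Services.
type endpointProfileTranslator struct {
	stream chan<- *DestinationProfile
	name   string
}

// UpdateProtocol would be invoked by the Server watcher whenever a
// Server selecting this endpoint is added, updated, or deleted.
func (t *endpointProfileTranslator) UpdateProtocol(opaque bool) {
	t.stream <- &DestinationProfile{FullyQualifiedName: t.name, OpaqueProtocol: opaque}
}

func main() {
	updates := make(chan *DestinationProfile, 1)
	t := &endpointProfileTranslator{stream: updates, name: "pod.ns.svc.cluster.local"}
	t.UpdateProtocol(true) // a Server marked the port as opaque
	fmt.Printf("%+v\n", <-updates)
}
```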

### Testing

A unit test has been added and below are some manual testing instructions to see how it interacts with Server updates:

<details>
	<summary>app.yaml</summary>

	```yaml
	apiVersion: v1
	kind: Pod
	metadata:
	  name: pod
	  labels:
		app: pod
	spec:
	  containers:
	  - name: app
		image: nginx
		ports:
		  - name: http
			containerPort: 80
	---
	apiVersion: policy.linkerd.io/v1beta1
	kind: Server
	metadata:
	  name: srv
	  labels:
		policy: srv
	spec:
	  podSelector:
		matchLabels:
		  app: pod
	  port: 80
	  proxyProtocol: opaque
	```
</details>

```shell
$ go run ./controller/cmd/main.go destination
```

```shell
$ linkerd inject app.yaml | kubectl apply -f -
...
$ kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
pod    2/2     Running   0          53m   10.42.0.34   k3d-k3s-default-server-0   <none>           <none>
$ go run ./controller/script/destination-client/main.go -method getProfile -path 10.42.0.34:80
...
```

You can add/delete `srv` as well as edit its `proxyProtocol` field to observe the correct DestinationProfile updates.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2021-12-08 12:26:27 -07:00
Alex Leong 5703d25518
Add context token support to destination client script (#7404)
The destination API expects a json formatted context token which includes the node name of the requesting client.  e.g.

```
go run controller/script/destination-client/main.go -method get -path web-svc.emojivoto.svc.cluster.local:80  --token '{"nodeName":"alex-worker"}'
```

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-12-06 11:58:35 -08:00
Tarun Pothulapati 4170b49b33
smi: remove default functionality in linkerd (#7334)
Now that SMI functionality is fully being moved into the
[linkerd-smi](https://github.com/linkerd/linkerd-smi) extension, we can
stop supporting it by default.

This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it converts `TrafficSplit` objects into `ServiceProfiles`
that the destination component can understand and react to accordingly.

Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e. splits and weights) is also
surfaced through the `UI` (in the new `services` tab) and the `viz cmd`.
So we are not really losing any UI functionality here.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-12-03 12:07:30 +05:30
Alex Leong 18bfd0bd9e
default enableEndpointSlices to true (#7216)
EndpointSlices are enabled by default in our Kubernetes minimum version of 1.20.  Thus we can change the default behavior of the destination controller to use EndpointSlices instead of Endpoints.  This unblocks any functionality which is specific to EndpointSlices such as topology aware hints.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-12-01 15:56:44 -08:00
Eng Zer Jun f2fb35aa46
build: upgrade to Go 1.17 (#7371)
* build: upgrade to Go 1.17

This commit introduces three changes:
	1. Update the `go` directive in `go.mod` to 1.17
	2. Update all Dockerfiles from `golang:1.16.2` to
	   `golang:1.17.3`
	3. Update all CI to use Go 1.17

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* chore: run `go fmt ./...`

This commit synchronizes `//go:build` lines with `// +build` lines.

Reference: https://go.googlesource.com/proposal/+/master/design/draft-gobuild.md
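
For reference, a file carrying both constraint forms looks like this; gofmt on Go 1.17 keeps the two lines in sync:

```go
//go:build linux && !race
// +build linux,!race

// Package constrained compiles only when both build constraints are
// satisfied.
package constrained
```
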
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-11-30 15:36:11 -05:00
Krzysztof Dryś f92e77f7f0
Remove legacy upgrade and its references (#7309)
With [linkerd2#5008](https://github.com/linkerd/linkerd2/issues/5008) and associated PRs, we changed the way configuration is handled by storing a helm values struct inside of the configmap.

Now that we have had one stable release with the new configuration, we no longer use or need to maintain the legacy config. This commit removes all the associated logic, protobuf files, and references.

Changes Include:

- Removed [`proto/config/config.proto`](https://github.com/linkerd/linkerd2/blob/main/proto/config/config.proto)
- Changed [`bin/protoc-go.sh`](https://github.com/linkerd/linkerd2/blob/main/bin/protoc-go.sh) to not include `config.proto`
- Changed [`FetchLinkerdConfigMap()`](741fde679b/pkg/healthcheck/healthcheck.go (L1768)) in `healthcheck.go` to return only the configmap, with the pb type.
- Changed [`FetchCurrentConfiguration()`](741fde679b/pkg/healthcheck/healthcheck.go (L1647)) to only unmarshal and use the Helm values struct from the configmap (note there was already a TODO to refactor the function once the values struct became the default, which has now happened)
- Removed [`upgrade_legacy.go`](https://github.com/linkerd/linkerd2/blob/main/cli/cmd/upgrade_legacy.go)

Signed-off-by: Krzysztof Dryś <krzysztofdrys@gmail.com>
2021-11-29 20:08:58 +05:30
Kevin Leimkuhler 01cbe616f1
Honor Server `proxyProtocol` in destination service `Get` with policy CRD APIs (#7184)
This change ensures that if a Server exists with `proxyProtocol: opaque` that selects an endpoint backed by a pod, that destination requests for that pod reflect the fact that it handles opaque traffic.

Currently, the only way that opaque traffic is honored in the destination service is if the pod has the `config.linkerd.io/opaque-ports` annotation. With the introduction of Servers though, users can set `server.Spec.ProxyProtocol: opaque` to indicate that if a Server selects a pod, then traffic to that pod's `server.Spec.Port` should be opaque. Currently, the destination service does not take this into account.

There is an existing change up that _also_ adds this functionality; it takes a different approach by creating a policy server client for each endpoint that a destination has. For `Get` requests on a service, the number of clients scales with the number of endpoints that back that service.

This change fixes that issue by instead creating a Server watch in the endpoint watcher and sending updates through to the endpoint translator.

The two primary scenarios to consider are:

### A `Get` request for some service is streaming when a Server is created/updated/deleted
When a Server is created or updated, the endpoint watcher iterates through its endpoint watches (`servicePublisher` -> `portPublisher`) and if it selects any of those endpoints, the port publisher sends an update if the Server has marked that port as opaque.

When a Server is deleted, the endpoint watcher once again iterates through its endpoint watches and deletes the address set's `OpaquePodPorts` field—ensuring that updates have been cleared of Server overrides.

### A `Get` request for some service happens after a Server is created
When a `Get` request occurs (or new endpoints are added—they both take the same path), we must check if any of those endpoints are selected by some existing Server. If so, we have to take that into account when creating the address set.

This part of the change gives me a little concern as we first must get all the Servers on the cluster and then create a set of _all_ the pod-backed endpoints that they select in order to determine if any of these _new_ endpoints are selected.

## Testing
Right now this can be tested by starting up the destination service locally and running `Get` requests on a service that has endpoints selected by a Server.

**app.yaml**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: pod
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  selector:
    app: pod
  ports:
  - name: http
    port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  name: srv
  labels:
    policy: srv
spec:
  podSelector:
    matchLabels:
      app: pod
  port: 80
  proxyProtocol: HTTP/1
```

```bash
$ go run controller/script/destination-client/main.go -path svc.default.svc.cluster.local:80
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-11-23 20:35:53 -07:00
Zahari Dichev 40cdb7fc23
add mc crd to codegen (#7335)
Currently, the MC `Link` CRD is being handled using the dynamic k8s client. It would be useful for consumers of this API if there was a typed API for this CRD.

The solution is to update the codegen script to generate this code.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2021-11-23 15:49:14 -07:00
Matei David 690bc09c35
Stop using deprecated `beta.kubernetes.io/os` label (#7310)
In our chart values and (some) integration tests, we're using a deprecated
label for node selection. According to the warning messages we get during
installation, the label has been deprecated since k8s `v1.14`:

```
Warning: spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead
Warning: spec.jobTemplate.spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead
```

This PR replaces all occurrences of `beta.kubernetes.io/os` with
`kubernetes.io/os`.

Fixes #7225
2021-11-19 09:50:15 -08:00
Alex Leong ea5461f674
Fix identity overrides for endpoint slices (#7243)
When the `mirror.linkerd.io/remote-gateway-identity` and `mirror.linkerd.io/remote-svc-fq-name` annotations are set on an EndpointSlice object, the destination controller does not return the correct identity hints for that endpoint.

We fix an incorrect assignment to address this. We also fix some logic that could result in a nil pointer dereference, making it compare empty strings instead. We add a test case to exercise these fixes.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-11-09 16:38:46 -08:00
Kevin Leimkuhler ebb1ee8c4c
Deprecate `topologyKeys` and add support for endpoint slices `Hints`. (#6698)
## background

In order to upgrade `client-go` and other related libraries to `v0.22.0`, we had to address the deprecation of the service's `TopologyKeys` field. This field and its related feature have been deprecated and superseded by [Topology Aware Hints](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2433-topology-aware-hints/README.md).

The goal of topology aware hints is to provide a simpler way for users to prefer endpoints by basing decisions solely on the node's `topology.kubernetes.io/zone` label. If a node is in `zone-a`, then it should prefer endpoints that _should_ be consumed by clients in `zone-a`.

kube-proxy (and now the destination controller) know that an endpoint _should_ be consumed by clients in certain zones if its `Hints.ForZones` field is set with a zone value that matches that of the client.

For example, the endpoint slice controller may add the following hint to an endpoint:
```yaml
- addresses: ["1.1.1.1"]
  zone: "zone-a"
  hints:
    forZones:
      - name: "zone-b"
```

The above endpoint is an endpoint that is located in `zone-a` but should be consumed by clients in `zone-b`.

## changes

Now that topological preference is no longer a concept, we can remove it from the `servicePublisher` and `portPublisher` structs. The fields were only there so that it could be propagated down to individual addresses.

The `Hints` field is only present on endpoints that belong to an `EndpointSlice`, so use of this field is limited to the `endpointSliceToAddresses` function.

When endpoint slices are translated to an `AddressSet` now, for each address (endpoint) we make sure to copy the `Hints.ForZones` field if it is present. This field is only present if it's set by the endpoint slice controller and it has [several safeguards](https://kubernetes.io/docs/concepts/services-networking/topology-aware-hints/#safeguards).

After `endpointSliceToAddresses` has translated an endpoint slice into an `AddressSet` and updated the endpoint translator's `availableEndpoints`, filtering takes place and is the crux of this change.

For each potential address that we have to consider in `availableEndpoints`, we make sure to only return a set of addresses whose consumption zones (the zones in the `forZones` field) match the node's zone. That way, we only communicate with endpoints that have been labeled by the endpoint slice controller for the current node we're on.

This allows us to remove the ordering/hierarchy of topological regions and the special handling of the `*` value.
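
A sketch of the filtering idea (not the exact destination-controller code; keeping unhinted addresses is an assumption here, reflecting that the safeguards can disable hinting):

```go
package main

import "fmt"

// forZone mirrors the discovery.k8s.io EndpointHints.ForZones entries.
type forZone struct{ Name string }

type address struct {
	IP       string
	ForZones []forZone
}

// filterAddresses keeps only the addresses whose hints say they should
// be consumed from nodeZone; addresses without hints are kept as-is.
func filterAddresses(addrs []address, nodeZone string) []address {
	var out []address
	for _, a := range addrs {
		if len(a.ForZones) == 0 {
			out = append(out, a)
			continue
		}
		for _, z := range a.ForZones {
			if z.Name == nodeZone {
				out = append(out, a)
				break
			}
		}
	}
	return out
}

func main() {
	addrs := []address{
		{IP: "1.1.1.1", ForZones: []forZone{{Name: "west-1a"}}},
		{IP: "2.2.2.2", ForZones: []forZone{{Name: "west-1b"}}},
		{IP: "3.3.3.3"}, // no hints set by the controller
	}
	fmt.Println(filterAddresses(addrs, "west-1a")) // keeps 1.1.1.1 and 3.3.3.3
}
```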

## testing

I've added a unit test which creates an endpoint translator tied to a node in `west-1a` and asserts that it only handles updates for addresses that should be consumed by clients in `west-1a`.

Closes #6637

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-11-08 12:21:31 -07:00
Christian Schlotter 98533538e6
Allow proxy-init container to run as non-root (#7162)
The Linkerd proxy-init container is currently forced to run as root.

Removes the hardcoded `runAsNonRoot: false` and `runAsUser: 0`. This way
the container inherits the user ID from the proxy-init image instead,
which may allow it to run as non-root.

Fixes #5505

Signed-off-by: Schlotter, Christian <christian.schlotter@daimler.com>
2021-11-05 10:44:32 -05:00
Oliver Gould 170548443f
proxy-init: v1.5.1 (#7223)
This release updates the base image to alpine:3.14.2.
2021-11-04 17:11:20 -05:00
Alejandro Pedraza 281cc4aded
Upgrade proxy-init to v1.5.0 (#7203)
To include the changes from linkerd/linkerd2-proxy-init#49 (allow the
proxy-init image to be run as non-root)
2021-11-03 14:35:58 -05:00
Tarun Pothulapati 92421d047a
core: use serviceAccountToken volume for pod authentication (#7117)
Fixes #3260 

## Summary

Currently, Linkerd uses a service account token to validate a pod
during the `Certify` request with identity, through which identity
is established on the proxy. This works well, as Kubernetes
attaches the `default` service account token of a namespace as a volume
(unless overridden with a specific service account by the user). The catch
is that this token is intended for the application to talk to the
Kubernetes API, not specifically for Linkerd. This means that there
are [controls outside of Linkerd](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server) to manage this service token, which
users might want to use, [causing problems with Linkerd](https://github.com/linkerd/linkerd2/issues/3183)
as Linkerd might expect it to be present.

To have more granular control over the token, and to not rely on a
service token that can be managed externally, [Bound Service Account Tokens](https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/1205-bound-service-account-tokens)
can be used to generate tokens that are specifically for Linkerd,
that are bound to a specific pod, and that carry an expiry.
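
For illustration, such a projected volume can be constructed with the `k8s.io/api/core/v1` types; the audience, expiry, and names below are assumptions, not necessarily what the chart uses:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	expiry := int64(86400) // revalidated every 24 hours

	// A projected serviceAccountToken volume bound to the pod, with a
	// linkerd-specific audience.
	vol := corev1.Volume{
		Name: "linkerd-identity-token",
		VolumeSource: corev1.VolumeSource{
			Projected: &corev1.ProjectedVolumeSource{
				Sources: []corev1.VolumeProjection{{
					ServiceAccountToken: &corev1.ServiceAccountTokenProjection{
						Audience:          "identity.l5d.io",
						ExpirationSeconds: &expiry,
						Path:              "linkerd-identity-token",
					},
				}},
			},
		},
	}
	fmt.Printf("%+v\n", vol)
}
```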

## Background on Bound Service Account Tokens

This feature was GA'ed in Kubernetes 1.20, and is enabled by default
in most cloud provider distributions. Using this feature, Kubernetes can
be asked to issue specific tokens for Linkerd's usage (through audience-bound
configuration), with a specific expiry time (as the validation happens every
24 hours when establishing identity, we can follow the same), bound to
a specific pod (meaning verification fails if the pod object isn't available).

Because of all these bounds, and because the token can't be used for
anything else, this feels like the right thing to rely on to validate
a pod when issuing a certificate.

### Pod Identity Name

We still use the same service account name as the pod identity
(used with metrics, etc.) as these tokens are all generated from the
same base service account attached to the pod (be it `default`, or
the user-overridden one). This can be verified by looking at the `user`
field in the `TokenReview` response.

<details>

<summary>Sample TokenReview response</summary>

Here, the new token was created for the `vault` audience for a pod which
had a serviceAccount token volume projection and was using the `mine`
serviceAccount in the default namespace.

```json
{
  "kind": "TokenReview",
  "apiVersion": "authentication.k8s.io/v1",
  "metadata": {
    "creationTimestamp": null,
    "managedFields": [
      {
        "manager": "curl",
        "operation": "Update",
        "apiVersion": "authentication.k8s.io/v1",
        "time": "2021-10-19T19:21:40Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {"f:spec":{"f:audiences":{},"f:token":{}}}
      }
    ]
  },
  "spec": {
    "token": "....",
    "audiences": [
      "vault"
    ]
  },
  "status": {
    "authenticated": true,
    "user": {
      "username": "system:serviceaccount:default:mine",
      "uid": "889a81bd-e31c-4423-b542-98ddca89bfd9",
      "groups": [
        "system:serviceaccounts",
        "system:serviceaccounts:default",
        "system:authenticated"
      ],
      "extra": {
        "authentication.kubernetes.io/pod-name": [
          "nginx"
        ],
        "authentication.kubernetes.io/pod-uid": [
          "ebf36f80-40ee-48ee-a75b-96dcc21466a6"
        ]
      }
    },
    "audiences": [
      "vault"
    ]
  }
}
```

</details>


## Changes

- Update `proxy-injector` and install scripts to include the new
  projected Volume and VolumeMount.
- Update the `identity` pod to validate the token with the linkerd
  audience key.
- Added `identity.serviceAccountTokenProjection` to disable this
  feature.
- Updated the erroring logic around `autoMountServiceAccount: false`
  to fail only when this feature is disabled.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-11-03 02:03:39 +05:30
Zahari Dichev d1b444ee41
fix wrong group names in fake client (#7173)
The `Group` attribute of the `GroupVersionResource` is wrong for the fake clients.
This leads to tests failing as types are not registered and keyed correctly.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-10-29 16:02:06 -06:00
Kevin Leimkuhler 00e018d277
Add policy CRD APIs (#7095)
This adds the policy CRD APIs for `Server` and `ServerAuthorization` CRDs.

The structure of each (in their respective `types.go`) is based off the `policy-crd.yaml` specs for each CRD.

Unlike service profiles, servers and server authorizations use the `oneof` extensively so I encoded that as a struct with a pointer for each possible `oneof`. For example, a server's `PodSelector` is either `MatchExpressions` or `MatchLabels`. Therefore, a `PodSelector` is defined as:

```go
type PodSelector struct {
	MatchExpressions *MatchExpressions
	MatchLabels      *MatchLabels
}
```

Closes #6970 

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-10-22 15:54:09 -06:00
Alejandro Pedraza ca92182904
Bump proxy-init to v1.4.1 (#7010)
which contains @gusfcarvalho's logging improvements
2021-10-06 15:14:03 -05:00
Alejandro Pedraza e4a7c3b9b6
Log Warn instead of Err when prom not found in heartbeat (#7029)
Fixes #7013
2021-10-06 09:46:10 -05:00
Alejandro Pedraza 90f8c9ddf5
Remove `omitWebhookSideEffects` flag/setting (#6942)
* Remove `omitWebhookSideEffects` flag/setting

This was introduced back in #2963 to support k8s versions before 1.12 that didn't support the `sideEffects` property in webhooks. We haven't supported 1.12 for a while now, so we can safely drop this.
2021-09-22 17:03:26 -05:00
Bart Peeters 6bf507ad32
Change log level for watchers in destination svc
Make destination info logs clearer by changing the log level of the watchers' log messages 'Establishing watch', 'Starting watch' and 'Stopping watch' from info to debug (#6917)

Signed-off-by: Bart Peeters <birtpeeters@hotmail.com>
2021-09-20 09:44:32 +01:00
Oliver Gould 99d5819232
Enable TLS detection on port 443 (#6887)
We've previously handled inbound connections on 443 as opaque, meaning
that we don't do any TLS detection.

This prevents the proxy from reporting meaningful metadata on these TLS
connections--especially the connection's SNI value.

This change also makes the core control plane's configuration for
skipping outbound connections on 443 much simpler (and
documented!).
2021-09-15 16:55:28 -07:00
Tarun Pothulapati 45478b6db8
viz: support `stat` on new policy resources (#6785)
Fixes #6733

As policy resources provide a grouping, statistics summaries should
also be allowed on these groupings, which are useful to the user. Being
port-specific, they provide a great way to break down these metrics
further.

This PR adds support for the policy resources, i.e. `server` and `serverauthorization`,
in the `stat` command.

## Changes

This adds a new path in the `stat_summary.go` file to handle policy
objects. I tried to see if we could re-use some of the other paths,
but some of the labels seem to differ and hence a different path
had to be created. We can try to refactor and merge them, though.

We support both request and TCP metrics for the `server` resource,
but only the former for `serverauthorization` resources,
as that is how the metrics are generated.

This also adds these policy objects into the `k8s` package to
make them known resources.

For both policy resources, `--from` doesn't work, as these
metrics are not exposed on the outbound side, and there is no way to
query for the client workload from the inbound metrics. `--to`
is supported to get metrics specifically for a destination workload
(just like with a service).

## Testing

```bash
> curl -sL https://run.linkerd.io/emojivoto.yml | linkerd inject --proxy-log-level debug - | kubectl apply -f -

> kubectl apply -f 897de1a8d5/emojivoto-policy.yml


# Initial values
$ ./bin/go-run cli viz stat srv -A -owide
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.0rps   100.00%   1.8rps           1ms           1ms           3ms          1         188.6B/s         2072.9B/s
emojivoto   prom                0.0rps         -        -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps    80.70%   0.9rps           1ms           2ms           3ms          1          91.4B/s           52.7B/s
emojivoto   web-http            0.0rps    90.68%   2.0rps           2ms          10ms          28ms          1         153.7B/s         4509.4B/s

# After changing the `emoji-grpc` authz
$ ./bin/go-run cli viz stat srv -A -owide
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.3rps   100.00%   1.1rps           0ms           0ms           0ms          1         156.5B/s         1282.4B/s
emojivoto   prom                0.0rps         -        -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps    87.88%   0.6rps           0ms           0ms           0ms          1          53.5B/s           31.5B/s
emojivoto   web-http            0.0rps    61.18%   1.4rps           1ms           2ms           2ms          1         110.2B/s         2195.7B/s

# after changing the `web-http` authz

$ ./bin/go-run cli viz stat srv -A -owide
NAMESPACE   NAME          UNAUTHORIZED   SUCCESS   RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emojivoto   emoji-grpc          0.0rps         -     -             -             -             -          -                -                 -
emojivoto   prom                0.0rps         -     -             -             -             -          -                -                 -
emojivoto   voting-grpc         0.0rps         -     -             -             -             -          -                -                 -
emojivoto   web-http            1.0rps         -     -             -             -             -          -                -                 -

> linkerd  viz stat srv/emoji-grpc -n emojivoto -owide
NAME         SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emoji-grpc        100.00%   2.0rps           1ms           1ms           1ms          1         199.9B/s         2208.0B/s

> linkerd  viz stat srv/web-http -n emojivoto -owide
NAME      SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
web-http         94.02%   1.9rps           4ms           9ms          10ms          1         152.7B/s         4505.9B/s

> linkerd  viz stat srv -n emojivoto -o wide                                                      
NAME          MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TCP_CONN   READ_BYTES/SEC   WRITE_BYTES/SEC
emoji-grpc         -   100.00%   2.0rps           1ms           1ms           1ms          1         201.6B/s         2209.8B/s
prom               -         -        -             -             -             -          -                -                 -
voting-grpc        -    86.21%   1.0rps           1ms           1ms           1ms          1          98.3B/s           55.9B/s
web-http           -    91.67%   2.0rps           3ms           8ms          10ms          1         157.7B/s         4600.3B/s


> linkerd  viz stat serverauthorization/web-public -n emojivoto
NAME       MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99  
web-http        -    89.83%   2.0rps           3ms           9ms          10ms

> linkerd viz stat saz -n emojivoto
NAME          AUTHORIZATION     MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
emoji-grpc    emoji-grpc             -   100.00%   2.0rps           1ms           1ms           1ms
prom          prom-prometheus        -         -        -             -             -             -
voting-grpc   voting-grpc            -    89.83%   1.0rps           1ms           1ms           1ms
web-http      web-public             -    94.96%   2.0rps           1ms           5ms           9ms

> linkerd viz stat saz/web-public -n emojivoto                                                 
NAME       AUTHORIZATION   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99
web-http   web-public           -    90.00%   2.0rps           1ms           5ms           9ms
```

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-09-15 10:59:36 +05:30
Stepan Rabotkin 5e6a1b5508
Graceful shutdown for admin server (#6817)
* Graceful shutdown for admin server

Signed-off-by: Stepan Rabotkin <epicstyt@gmail.com>
2021-09-07 10:50:31 -05:00
Matei David ecd39700c4
Update proxy-init to v1.4.0 (#6790)
Updates linkerd2-proxy-init version to v1.4.0

The major change is the removal of the "redirect-non-loopback-traffic" rule; previously, packets with destination != 127.0.0.1 on lo originating from the proxy process would be sent to the inbound proxy port (when the application tries to talk to itself). This is no longer the case.

Signed-off-by: Matei David <matei@buoyant.io>
2021-09-01 15:45:12 +01:00
Kevin Leimkuhler d611af3647
Filter default opaque ports for pods and services (#6774)
#6719 changed the proxy injector so that it adds the `config.linkerd.io/opaque-ports` annotation to all pods and services if they or their namespace do not already contain the annotation. The value used is the default list of opaque ports—which is `25,443,587,3306,4444,5432,6379,9300,11211` unless otherwise specified by the user during installation.

Closes #6729

The main issue with this is that if a service exposes a service port `9090` that targets `3306`, the service _should_ have `9090` set as opaque since it targets a default opaque port, but it does not. This change ensures that services with this situation have `9090` set as opaque.

Additionally, services and pods do not need an annotation with the entire default opaque ports list if they don't expose those ports in the first place. This change will filter out ports from the default list if the service or pod does not expose them.
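
A sketch of the filtering idea using the example above (illustrative, not the injector's actual code):

```go
package main

import (
	"fmt"
	"sort"
)

// filterOpaquePorts keeps only the service ports whose target port is in
// the default opaque list, so a service exposing 9090 -> 3306 gets 9090
// marked opaque, while unrelated ports are dropped from the annotation.
func filterOpaquePorts(defaults []int, exposed map[int]int) []int {
	var out []int
	for svcPort, targetPort := range exposed {
		for _, d := range defaults {
			if targetPort == d {
				out = append(out, svcPort)
			}
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	defaults := []int{25, 443, 587, 3306, 4444, 5432, 6379, 9300, 11211}
	exposed := map[int]int{9090: 3306, 8080: 8080} // service port -> target port
	fmt.Println(filterOpaquePorts(defaults, exposed)) // [9090]
}
```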

### tests
I've added some unit tests that demonstrate the change in behavior, as explained in the original issue #6729.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-08-31 16:11:42 -06:00
Kevin Leimkuhler 152290e58d
proxy-injector: add `default-inbound-policy` annotation (#6750)
The proxy injector now adds the `config.linkerd.io/default-inbound-policy` annotation to all injected pods.

Closes #6720.

If the pod has the annotation before injection then that value is used. If the pod does not have the annotation but the namespace does, then it inherits that. If both the pod and the namespace do not have the annotation, then it defaults to `.Values.policyController.defaultAllowPolicy`.
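
The precedence can be sketched as follows (illustrative code; only the annotation name and the `defaultAllowPolicy` fallback come from this change):

```go
package main

import "fmt"

const policyAnnotation = "config.linkerd.io/default-inbound-policy"

// defaultInboundPolicy resolves the value the injector stamps on the
// pod: pod annotation, else namespace annotation, else the chart's
// defaultAllowPolicy.
func defaultInboundPolicy(pod, ns map[string]string, chartDefault string) string {
	if v, ok := pod[policyAnnotation]; ok {
		return v
	}
	if v, ok := ns[policyAnnotation]; ok {
		return v
	}
	return chartDefault
}

func main() {
	ns := map[string]string{policyAnnotation: "cluster-authenticated"}
	pod := map[string]string{} // no annotation: inherits from the namespace
	fmt.Println(defaultInboundPolicy(pod, ns, "all-unauthenticated"))
}
```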

Upon injecting the sidecar container into the pod, this annotation value is used to set the `LINKERD2_PROXY_INBOUND_DEFAULT_POLICY` environment variable. Additionally, `LINKERD2_PROXY_POLICY_CLUSTER_NETWORKS` is also set to the value of `.Values.clusterNetworks`.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-08-26 12:46:40 -06:00
Kevin Leimkuhler c7d54bb826
proxy-injector: always add the `opaque-ports` annotation (#6719)
In order to discover how a workload is configured without knowing the global defaults, the `opaque-ports` annotation is now added by the proxy injector to workloads, regardless of whether the list is the default or user-specified.

Closes #6689

#### core
Because core control plane components do not go through the proxy injector, the annotation is added to the `destination`, `identity`, and `proxy-injector` templates.

The `linkerd-destination` and `linkerd-proxy-injector` deployments both now just have the `opaque-ports: "8443"` annotation. The `linkerd-identity` deployment and service don't need this annotation since they don't expose anything in the default list.

#### non-core
All other resources go through the proxy injector; it decides whether or not services or pods (the two resources that it can add annotations to) should get the default list.

Workloads get the default list of opaque ports added if they and their namespace do not have the annotation already. So this boils down to:
1. If the workload already has the annotation, no patch is created
2. If the namespace has the annotation but the workload does not, a patch is generated
3. If the workload and namespace do not have the annotation, a patch is generated

#### tests
A unit test has been added and I performed the following manual tests:
1. Injected a pod with the annotation: a patch is generated but there is no change to opaque ports
2. Injected a pod with the namespace annotation: a patch is generated and opaque ports are copied down to the pod
3. Injected a pod with no annotation on it or the namespace: a patch is generated and the default opaque ports are added
4. Created a pod (not injected): a patch is generated (without the proxy) that adds the annotation (this holds true whether the pod or the namespace has the annotation)

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-08-26 11:38:40 -06:00
Alex Leong 9ed5a3cb3f
Add sleep binary to proxy image (#6734)
Fixes #6723

We add the sleep binary to the proxy image so that the waitBeforeExitSeconds will work.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-08-25 08:56:20 -07:00
Tarun Pothulapati a8b1cdd79f
injector: cleanup env variables in `_proxy.tpl` (#6711)
* injector: cleanup env variables in `_proxy.tpl`

This PR updates the `_proxy.tpl` file to remove the usage of the `_l5d_ns`
and `l5d_trustDomain` env variables, whose values can be rendered directly
instead. This also moves the referenced variables to the top for
simplicity.

These unused variables will be removed in a future release to
prevent race conditions during upgrades.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-08-25 11:55:56 +05:30
Tarun Pothulapati 9324195485
injector: move parent env variables to first (#6706)
Variable references are only expanded to previously defined
environment variables, as per https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#envvar-v1-core,
which means that for `LINKERD2_PROXY_POLICY_WORKLOAD` to work correctly,
`_pod_ns` and `_pod_name` must be present before they are used.
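
For illustration, the required ordering expressed with the `k8s.io/api/core/v1` types (the env variable names come from this change, the rest is assumed):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// _pod_ns and _pod_name must come first: the kubelet only expands
	// $(VAR) references to variables defined earlier in the list.
	env := []corev1.EnvVar{
		{
			Name: "_pod_ns",
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.namespace"},
			},
		},
		{
			Name: "_pod_name",
			ValueFrom: &corev1.EnvVarSource{
				FieldRef: &corev1.ObjectFieldSelector{FieldPath: "metadata.name"},
			},
		},
		{
			Name:  "LINKERD2_PROXY_POLICY_WORKLOAD",
			Value: "$(_pod_ns):$(_pod_name)",
		},
	}
	fmt.Printf("%+v\n", env)
}
```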

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-08-20 00:06:31 +05:30
Tarun Pothulapati 6ffc4970f5
injector: configure `policy` env variables (#6701)
Fixes #6688

This PR adds the new `LINKERD2_PROXY_POLICY_SVC_ADDR` and
`LINKERD2_PROXY_POLICY_SVC_NAME` env variables which are used to specify
the address and the identity (which is `linkerd-destination`) of the
policy server respectively.

This also adds the new `LINKERD2_PROXY_POLICY_WORKLOAD` env variable in the
format `$ns:$pod`, which is used to specify the identity of the workload itself.
A new `_pod_name` env variable has been added to get the name of the pod
through the Downward API.

These variables are only set if the `proxy.component` is not
`linkerd-identity`.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-08-19 10:25:40 -07:00
LiuDui 61d4fa80fc
remove unused constants (#6630)
Signed-off-by: liudui <1693291525@qq.com>
2021-08-09 16:23:48 +01:00
Alejandro Pedraza d9e9013cd9
Fix external-prometheus integration test flakyness (#6575)
Another attempt at fixing #6511

Even after #6524, we continued experiencing discrepancies in the
linkerd-edges integration test. The problem ended up being that the external
prometheus instance was not being injected. The injector logs revealed this:

```console
2021-07-29T13:57:10.2497460Z time="2021-07-29T13:54:15Z" level=info msg="caches synced"
2021-07-29T13:57:10.2498191Z time="2021-07-29T13:54:15Z" level=info msg="starting admin server on :9995"
2021-07-29T13:57:10.2498935Z time="2021-07-29T13:54:15Z" level=info msg="listening at :8443"
2021-07-29T13:57:10.2499945Z time="2021-07-29T13:54:18Z" level=info msg="received admission review request 2b7b4970-db40-4bda-895b-bb2e95e98265"
2021-07-29T13:57:10.2511751Z time="2021-07-29T13:54:18Z" level=debug msg="admission request: &AdmissionRequest{UID:2b7b4970-db40-4bda-895b-bb2e95e98265,Kind:/v1, Kind=Service,Resource:{ v1 services},SubResource:,Name:metrics-api,Namespace:linkerd-viz...
```

Usually one expects the webhook server to start first ("listening at
:8443") and then the admin server, but in this case it happened the
other way around. The admin server serves the readiness probe, so k8s
was signaled that the injector was ready before it could listen to
webhook requests. Given the WebhookFailurePolicy is Ignore by
default, this sometimes caused the prometheus pod creation
event to be missed, and we see in the log above that the injector starts by
processing the pods that are created afterwards, which are the viz ones.

In this fix we start the webhook server first, then block on the syncing
of the k8s API (which should give the webhook enough time to be up),
and finally we start the admin server.
2021-07-29 13:29:31 -05:00
Alex Leong ca1077bb08
Read trust roots from configmap (#6455)
Fixes #6452 

We add a `linkerd-identity-trust-roots` ConfigMap which contains the configured trust root bundle.  The proxy template partial is modified so that core control plane components load this bundle from the configmap through the downward API.

The identity controller is updated to mount this new configmap as a volume and read the trust root bundle at startup.

Similarly, the proxy-injector also mounts this new configmap.  For each pod it injects, it reads the trust root bundle file and sets it on the injected pod.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-07-28 13:23:15 -07:00
Alex Leong 24792cfd1c
Remove core dependency on viz (#6497)
Fixes #5589 

The core control plane has a dependency on the viz package in order to use the `BuildResource` function.  This "backwards" dependency means that the viz source code needs to be included in core docker-builds and is bad for code hygiene.

We move the `BuildResource` function into the viz package.  In `cli/cmd/metrics.go` we replace a call to `BuildResource` with a call directly to `CanonicalResourceNameFromFriendlyName`.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-07-19 14:28:45 -07:00
dependabot[bot] 789aeea561
Fix gRPC servers (#6510)
Bump github.com/linkerd/linkerd2-proxy-api from 0.1.18 to 0.2.0

Bumps [github.com/linkerd/linkerd2-proxy-api](https://github.com/linkerd/linkerd2-proxy-api) from 0.1.18 to 0.2.0.
- [Release notes](https://github.com/linkerd/linkerd2-proxy-api/releases)
- [Changelog](https://github.com/linkerd/linkerd2-proxy-api/blob/main/CHANGES.md)
- [Commits](https://github.com/linkerd/linkerd2-proxy-api/compare/v0.1.18...v0.2.0)

---
updated-dependencies:
- dependency-name: github.com/linkerd/linkerd2-proxy-api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Oliver Gould <olix0r@gmail.com>

Co-authored-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Oliver Gould <olix0r@gmail.com>
2021-07-19 10:24:23 -05:00
Alejandro Pedraza ae62d92f7d
Fixes unit tests indeterminism (#6496)
We were getting sporadic coverage differences on `controller/k8s/test_helper.go` and `pkg/healthcheck/healthcheck_test.go` on pushes unrelated to those files.

For the former, the problem was in tests in `controller/k8s/api_test.go` that compared slices of pods and services by sorting them. The `Sort` interface was implemented through the methods in `test_helper.go`. There is apparently indeterminism in that sorting at the go library level, in that the `Swap` method is not always called, which impacted the coverage report. The fix consists of comparing those slices item by item without needing to sort beforehand.

As for `healthcheck_test.go`, `validateControlPlanePods()` in `healthcheck.go` short-circuits on the first pod having all its containers ready. The unit tests iterate over maps, an iteration we know is not deterministic, so sometimes the short-circuiting meant the `!container.Ready` block was never covered, thus affecting the coverage report. This is fixed by adding a new small test that makes sure that block is covered.
2021-07-19 12:42:45 +05:30
Matei David 2689be893f
Add parent obj validation for ReplicaSets (#6458)
The problem is that for parent objects that are not supported in Linkerd, we
cannot get any metrics. For example, using a Rollout will not report any
metrics above the pod level.

To fix this, we add validation for ReplicaSet owners: if the owner is a valid parent,
we use the parent's Kind and Name, otherwise we use the ReplicaSet's.

Tested using CLI/UI

Interim solution for #6429 

Signed-off-by: Matei David <matei@buoyant.io>
2021-07-13 10:16:35 -06:00
Alex Leong d9315fa4ee
Add client-go cache size metrics (#6447)
Fixes #6354 

We add Prometheus gauges which track the client-go cache size for each resource type.  For example, the following metrics are added to the destination controller:

```
# HELP endpoint_cache_size Number of items in the client-go endpoint cache
# TYPE endpoint_cache_size gauge
endpoint_cache_size 21
# HELP job_cache_size Number of items in the client-go job cache
# TYPE job_cache_size gauge
job_cache_size 0
# HELP namespace_cache_size Number of items in the client-go namespace cache
# TYPE namespace_cache_size gauge
namespace_cache_size 8
# HELP node_cache_size Number of items in the client-go node cache
# TYPE node_cache_size gauge
node_cache_size 1
# HELP pod_cache_size Number of items in the client-go pod cache
# TYPE pod_cache_size gauge
pod_cache_size 23
# HELP replica_set_cache_size Number of items in the client-go replica_set cache
# TYPE replica_set_cache_size gauge
replica_set_cache_size 40
# HELP service_cache_size Number of items in the client-go service cache
# TYPE service_cache_size gauge
service_cache_size 18
# HELP service_profile_cache_size Number of items in the client-go service_profile cache
# TYPE service_profile_cache_size gauge
service_profile_cache_size 4
# HELP traffic_split_cache_size Number of items in the client-go traffic_split cache
# TYPE traffic_split_cache_size gauge
traffic_split_cache_size 0
```
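
One plausible way to implement such gauges (a sketch, not necessarily the controller's code) is a `GaugeFunc` that reads the informer's store size at scrape time:

```go
package metrics

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
	"k8s.io/client-go/tools/cache"
)

// registerCacheSizeGauge exposes the number of items in an informer's
// store as a gauge, evaluated on every scrape; metric names mirror the
// output above.
func registerCacheSizeGauge(reg prometheus.Registerer, resource string, inf cache.SharedIndexInformer) {
	reg.MustRegister(prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name: fmt.Sprintf("%s_cache_size", resource),
			Help: fmt.Sprintf("Number of items in the client-go %s cache", resource),
		},
		func() float64 { return float64(len(inf.GetStore().List())) },
	))
}
```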

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-07-09 13:36:43 -07:00
Alejandro Pedraza a4e35b7cc8
Set `LINKERD2_PROXY_INBOUND_PORTS` during injection (#6445)
* Set `LINKERD2_PROXY_INBOUND_PORTS` during injection

Fixes #6267

The `LINKERD2_PROXY_INBOUND_PORTS` env var will be set during injection,
containing a comma-separated list of the ports in the non-proxy containers in
the pod. For the identity, destination and injector pods, the var is set
manually in their Helm templates.

Since the proxy-injector isn't reinvoked, containers injected by a mutating
webhook after the injector has run won't be detected. As an escape hatch, the
`config.linkerd.io/pod-inbound-ports` annotation has been added to allow
explicit overrides.

Other changes:

- Removed `controller/proxy-injector/fake/data/inject-sidecar-container-spec.yaml`, which is no longer used.
- Fixed bad indentation in some fixture files under `controller/proxy-injector/fake/data`.
2021-07-09 11:52:20 -05:00
Jason Morgan 1e53bc6f87
Added ports to default configuration. (#6388)
Default Linkerd skip and opaque port configuration.

Adds the default ports that were missing relative to the docs, addressing "Add Redis to default list of Opaque ports" (#6132).

Once merged, the default install values will match the recommendations in Linkerd's TCP ports guide.

Fixes #6132

Signed-off-by: jasonmorgan <jmorgan@f9vs.com>
Co-authored-by: Alejandro Pedraza <alejandro.pedraza@gmail.com>
2021-07-09 09:58:47 -06:00
dependabot[bot] 1dfd8b5bd7
Bump google.golang.org/protobuf from 1.27.0 to 1.27.1 (#6409)
* Bump google.golang.org/protobuf from 1.27.0 to 1.27.1

Bumps [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) from 1.27.0 to 1.27.1.
- [Release notes](https://github.com/protocolbuffers/protobuf-go/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf-go/blob/master/release.bash)
- [Commits](https://github.com/protocolbuffers/protobuf-go/compare/v1.27.0...v1.27.1)

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-07-01 14:50:04 -06:00
dependabot[bot] 94b5aa634e
Bump google.golang.org/protobuf from 1.26.0 to 1.27.0 (#6395)
* Bump google.golang.org/protobuf from 1.26.0 to 1.27.0

Bumps [google.golang.org/protobuf](https://github.com/protocolbuffers/protobuf-go) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/protocolbuffers/protobuf-go/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf-go/blob/master/release.bash)
- [Commits](https://github.com/protocolbuffers/protobuf-go/compare/v1.26.0...v1.27.0)

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Leong <alex@buoyant.io>
2021-06-29 13:16:44 -06:00
Alex Leong 9a1468328c
Emit event when issuing leaf certificate (#6364)
We emit a Kubernetes event from the identity controller when successfully issuing a leaf certificate.  The events include the identity, expiry, and a hash of the certificate.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-06-25 11:19:16 -07:00
Alejandro Pedraza f976f0d6e5
Upgrade proxy-init to v.1.3.13 (#6367)
List of changes:

- Include more output in the `simulate` mode (thanks @liuerfire!)
- Log to `stdout` instead of `stderr` (thanks @mo4islona!)

Non user-facing changes:
- Added `dependabot.yml` to receive automated dependencies upgrades PRs (both for go and github actions). As a result, also upgraded a bunch of dependencies.
2021-06-23 20:23:00 -05:00
Tarun Pothulapati ebbb3182a9
checks: use caching with opaqueports check (#6292)
Fixes #6272

The opaqueports check is prone to failing with `context deadline exceeded`,
as there are numerous k8s API requests being performed.

This PR updates the pre-fetching logic to instead use
`controller/k8s` which provides a wrapper around `pkg/k8s` with
caching by using shared informers underneath!

This commit includes the following changes:
- Update `checkMisconfiguredOpaquePortAnnotations` to use
  `controllerk8s.KubeAPI` instead of `hc.kubeAPI`
- The `kubeAPI.Sync` fn also had to be updated, as it failed to check
whether the sp and ts shared informers are nil, which might be the
case in situations like this where they are not needed.

We had to use `controllerK8s.NewAPI` for the initialization
instead of `controllerk8s.InitializeAPI` so that it takes in
`hc.kubeAPI`, which supports unit testing, as `hc.kubeAPI`
is how we pass the fake resources in unit tests.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-06-23 10:28:15 +05:30
Matei David 06ef634a9b
Add endpoint in profile response for requests on pod DNS. (#6260)
Closes #6253 


### What
---

When we send a profile request with a pod IP, we get back an endpoint as part of the response. This has two advantages: we avoid building a load balancer and we can treat endpoint failure differently (with more of a fail fast approach). At the moment, when we use a pod DNS as the target of the profile lookup, we don't have an endpoint returned in the
response.

Through this change, the behaviour will be consistent. Whenever we look up a pod (either through IP or DNS name) we will get an endpoint back. The change also attempts to simplify some of the logic in GetProfile.


### How
---

We already have a way to build an endpoint and return it back to the client; I sought to re-use most of the code in an effort to also simplify `GetProfile()`. I extracted most of the code that would have been duplicated into a separate method that is responsible for building the address, looking at annotations for opaque ports and for sending the response back.

In addition, to support a pod DNS fqn I've expanded on the `else` branch of the topmost if statement -- if our host is not an IP, we parse the host to get the k8s fqn. If the parsing function returns an instance ID along with the ServiceID, then we know we are dealing directly with a pod -- if we do, we fetch the pod using the core informer and then return an endpoint for it.

### Tests
---

I've tested this mostly with the destination client script. For the tests, I used the following pods:
```
❯ kgp -n emojivoto -o wide

NAME                        READY   STATUS    RESTARTS   AGE     IP           NODE                NOMINATED NODE   READINESS GATES
voting-ff4c54b8d-zbqc4      2/2     Running   0          3m58s   10.42.0.53   k3d-west-server-0   <none>           <none>
web-0                       2/2     Running   0          3m58s   10.42.0.55   k3d-west-server-0   <none>           <none>
vote-bot-7d89964475-tfq7j   2/2     Running   0          3m58s   10.42.0.54   k3d-west-server-0   <none>           <none>
emoji-79cc56f589-57tsh      2/2     Running   0          3m58s   10.42.0.52   k3d-west-server-0   <none>           <none>

# emoji pod has an opaque port set to 8080.
# web-svc is a headless service and it backs a statefulset (which is why we have web-0).
# without a headless service we can't lookup based on pod DNS.
```

**`Responses before the change`**: 
```
# request on IP, this is how things work at the moment. I included this because there shouldn't be
# any diff between the response given here and the response we get with the change.
# note: this corresponds to the emoji pod which has opaque ports set to 8080.
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.52:8080
INFO[0000] opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524724} port:8080} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"emoji"} metric_labels:{key:"namespace" value:"emojivoto"} metric_labels:{key:"pod" value:"emoji-79cc56f589-57tsh"} metric_labels:{key:"pod_template_hash" value:"79cc56f589"} metric_labels:{key:"serviceaccount" value:"emoji"} tls_identity:{dns_like_identity:{name:"emoji.emojivoto.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{} opaque_transport:{inbound_port:4143}}}
INFO[0000]

# request web-0 by IP
# there shouldn't be any diff with the response we get after the change
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.55:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524727} port:8080} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"namespace" value:"emojivoto"} metric_labels:{key:"pod" value:"web-0"} metric_labels:{key:"serviceaccount" value:"web"} metric_labels:{key:"statefulset" value:"web"} tls_identity:{dns_like_identity:{name:"web.emojivoto.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
INFO[0000]

# request web-0 by DNS name -- will not work.
❯ go run controller/script/destination-client/main.go -method getProfile -path web-0.web-svc.emojivoto.svc.cluster.local:8080
INFO[0000] fully_qualified_name:"web-0.web-svc.emojivoto.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"web-svc.emojivoto.svc.cluster.local.:8080" weight:10000}
INFO[0000]
INFO[0000] fully_qualified_name:"web-0.web-svc.emojivoto.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"web-svc.emojivoto.svc.cluster.local.:8080" weight:10000}
INFO[0000]
# ^
# |
#  -->  no endpoint in the response
```

**`Responses after the change`**:

```
# request profile for emoji, we see opaque transport being set on the endpoint.
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.52:8080
INFO[0000] opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524724} port:8080} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"emoji"} metric_labels:{key:"namespace" value:"emojivoto"} metric_labels:{key:"pod" value:"emoji-79cc56f589-57tsh"} metric_labels:{key:"pod_template_hash" value:"79cc56f589"} metric_labels:{key:"serviceaccount" value:"emoji"} tls_identity:{dns_like_identity:{name:"emoji.emojivoto.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{} opaque_transport:{inbound_port:4143}}}
INFO[0000]

# request profile for web-0 with IP.
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.55:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524727} port:8080} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"namespace" value:"emojivoto"} metric_labels:{key:"pod" value:"web-0"} metric_labels:{key:"serviceaccount" value:"web"} metric_labels:{key:"statefulset" value:"web"} tls_identity:{dns_like_identity:{name:"web.emojivoto.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
INFO[0000]

# request profile for web-0 with pod DNS, resp contains endpoint.
❯ go run controller/script/destination-client/main.go -method getProfile -path web-0.web-svc.emojivoto.svc.cluster.local:8080
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524727} port:8080} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"namespace" value:"emojivoto"} metric_labels:{key:"pod" value:"web-0"} metric_labels:{key:"serviceaccount" value:"web"} metric_labels:{key:"statefulset" value:"web"} tls_identity:{dns_like_identity:{name:"web.emojivoto.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
INFO[0000]
```

Signed-off-by: Matei David <matei@buoyant.io>
2021-06-22 16:51:29 -06:00