In order for proxies to properly reflect the resources used to drive policy
decisions, the proxy API has been updated to include resource metadata on
ServiceProfile responses.
This change updates the profile translator to include ParentRef and ProfileRef
metadata when it is configured.
This change does not set backend or endpoint references.
https://github.com/linkerd/linkerd2/pull/11491 changed the EndpointTranslator to use a queue to avoid calling `Send` on a gRPC stream directly from an informer callback goroutine. This change updates the ProfileTranslator in the same way, adding a queue to ensure we do not block the informer thread.
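For reference, the pattern is roughly the following (a minimal, self-contained sketch; the names `queuedTranslator` and `profileUpdate` are illustrative and not the actual types in the destination service):
```go
package main

import (
	"fmt"
	"time"
)

// profileUpdate stands in for the protobuf message sent to the proxy.
type profileUpdate struct {
	fullyQualifiedName string
}

// queuedTranslator illustrates the pattern: informer callbacks enqueue
// updates onto a bounded channel, and a dedicated goroutine drains the
// channel and performs the (potentially slow) stream Send.
type queuedTranslator struct {
	updates chan profileUpdate
	stop    chan struct{}
}

func newQueuedTranslator() *queuedTranslator {
	return &queuedTranslator{
		updates: make(chan profileUpdate, 100), // bounded so a stuck client cannot grow memory unboundedly
		stop:    make(chan struct{}),
	}
}

// Update is safe to call from an informer callback: it never blocks on the
// gRPC stream, it only enqueues (dropping if the queue is full).
func (t *queuedTranslator) Update(u profileUpdate) {
	select {
	case t.updates <- u:
	default:
		// A real translator would surface overflow via metrics/logs.
		fmt.Println("queue full; dropping update")
	}
}

// Start drains the queue; send stands in for the gRPC stream's Send.
func (t *queuedTranslator) Start(send func(profileUpdate) error) {
	go func() {
		for {
			select {
			case u := <-t.updates:
				if err := send(u); err != nil {
					fmt.Println("send failed:", err)
					return
				}
			case <-t.stop:
				return
			}
		}
	}()
}

func (t *queuedTranslator) Stop() { close(t.stop) }

func main() {
	t := newQueuedTranslator()
	t.Start(func(u profileUpdate) error {
		time.Sleep(10 * time.Millisecond) // simulate a slow stream
		fmt.Println("sent", u.fullyQualifiedName)
		return nil
	})
	t.Update(profileUpdate{fullyQualifiedName: "web.default.svc.cluster.local"})
	time.Sleep(50 * time.Millisecond)
	t.Stop()
}
```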
Signed-off-by: Alex Leong <alex@buoyant.io>
The GetProfile API endpoint does not behave as expected: when a profile
watch is established, the API server starts two separate profile
watches--a primary watch in the client's namespace and a fallback watch
that ignores the client's namespace. These watches race to send data
back to the client. If the fallback watch updates first, its data may be
sent to clients before being corrected by a subsequent update. If the
primary watch updates with an empty value, the default profile may be
served before being corrected by an update from the fallback watch.
From the proxy's perspective, we'd much prefer that the API provide a
single authoritative response when possible. This avoids needless
corrective work being distributed across the system on every watch
initiation.
To fix this, we modify the fallbackProfileListener to behave
predictably: it only emits updates once both its primary and fallback
listeners have been updated. This avoids emitting updates based on a
partial understanding of the cluster state.
Furthermore, the opaquePortsAdaptor is updated to avoid synthesizing a
default ServiceProfile (surprising behavior); instead, this defaulting
logic is moved into a dedicated defaultProfileListener helper. A
dedupProfileListener is added to squelch obviously redundant updates.
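The defaulting and dedup adapters can be pictured like this (an illustrative sketch, not the actual defaultProfileListener/dedupProfileListener code):
```go
package main

import (
	"fmt"
	"reflect"
)

// profile stands in for a ServiceProfile spec; nil means "no profile exists".
type profile struct{ name string }

// listener is the minimal callback these adapters wrap.
type listener func(*profile)

// withDefault substitutes a default profile when the upstream reports that
// no ServiceProfile exists (nil), keeping the defaulting logic in one place.
func withDefault(next listener, def *profile) listener {
	return func(p *profile) {
		if p == nil {
			p = def
		}
		next(p)
	}
}

// withDedup squelches updates that are identical to the previous one.
func withDedup(next listener) listener {
	var last *profile
	var seen bool
	return func(p *profile) {
		if seen && reflect.DeepEqual(p, last) {
			return
		}
		last, seen = p, true
		next(p)
	}
}

func main() {
	sink := listener(func(p *profile) { fmt.Println("update:", p.name) })
	l := withDefault(withDedup(sink), &profile{name: "default"})
	l(nil)                   // emits "default"
	l(&profile{name: "web"}) // emits "web"
	l(&profile{name: "web"}) // squelched as redundant
}
```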
Finally, this newfound predictability allows us to simplify the API's
tests. Many of the API tests are not clear about what they test and
sometimes make assertions about "incorrect" profile updates.
Closes #7826
This adds the `gosec` and `errcheck` linters to the `golangci` configuration. Most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.
A significant number of these lints have been excluded by the various `exclude-rules` added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up via the `defer` keyword.
Aside from those, several other rules have been added, each with a comment explaining why it's okay to ignore the errors it covers.
Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
We use `fmt.Sprintf` to format URIs in several places where we could be using
`net.JoinHostPort`. `net.JoinHostPort` ensures that IPv6 addresses are
properly escaped in URIs.
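For example (a standalone illustration of the difference, not code from this change):
```go
package main

import (
	"fmt"
	"net"
)

func main() {
	host, port := "2001:db8::1", "8080"

	// fmt.Sprintf leaves the IPv6 literal unbracketed, producing an
	// ambiguous authority where the port blends into the address.
	fmt.Printf("%s:%s\n", host, port) // 2001:db8::1:8080

	// net.JoinHostPort brackets IPv6 literals as required for URIs.
	fmt.Println(net.JoinHostPort(host, port)) // [2001:db8::1]:8080
}
```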
Signed-off-by: Oliver Gould <ver@buoyant.io>
### What
`GetProfile` clients do not receive destination profiles that consider Server protocol fields the way that `Get` clients do. If a Server exists for a `GetProfile` destination that specifies that the protocol for that destination is `opaque`, this information is not passed back to the client.
#7184 added this for `Get` by subscribing clients to Endpoint/EndpointSlice updates. When there is an endpoint update or a Server update, the endpoints watcher passes this information to the endpoint translator, which handles sending the update back to the client.
For `GetProfile` the situation is different. As with `Get`, we only consider Servers when dealing with Pod IPs, but for `GetProfile` this occurs in only two situations:
1. The destination is a Pod IP and port
2. The destination is an Instance ID and port
In both of these cases, we need to check whether a Server already selects the endpoint, and we need to subscribe for Server updates in case a Server that selects the endpoint is added or deleted.
### How
First we check if there is already a Server that selects the endpoint. This is so that when the first destination profile is returned, the client knows whether the destination is `opaque` or not.
After sending that first update, we then subscribe the client for any future updates which will come from a Server being added or deleted.
This is handled by the new `ServerWatcher` which watches for Server updates on the cluster; when an update occurs it sends that to the `endpointProfileTranslator` which translates the protocol update into a DestinationProfile.
By introducing the `endpointProfileTranslator`, which only handles protocol updates, we're able to decouple the endpoint logic from `profileTranslator`; its `endpoint` field has been removed now that it only handles updates for ServiceProfiles for Services.
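Conceptually, the protocol-only translation looks something like the following sketch (types and names are illustrative, not the actual `endpointProfileTranslator`):
```go
package main

import "fmt"

// destinationProfile stands in for the protobuf DestinationProfile message.
type destinationProfile struct {
	opaqueProtocol bool
}

// endpointProfileTranslatorSketch mirrors the idea described above: it only
// tracks whether the endpoint's port is marked opaque and re-sends a profile
// whenever that changes.
type endpointProfileTranslatorSketch struct {
	port uint32
	send func(destinationProfile)
}

// UpdateProtocol would be called by the Server watcher with the set of ports
// that a selecting Server marks as opaque (empty when the Server is deleted).
func (t *endpointProfileTranslatorSketch) UpdateProtocol(opaquePorts map[uint32]struct{}) {
	_, opaque := opaquePorts[t.port]
	t.send(destinationProfile{opaqueProtocol: opaque})
}

func main() {
	t := &endpointProfileTranslatorSketch{
		port: 80,
		send: func(p destinationProfile) { fmt.Printf("profile update: %+v\n", p) },
	}
	t.UpdateProtocol(map[uint32]struct{}{80: {}}) // a Server marks port 80 opaque
	t.UpdateProtocol(nil)                         // the Server is deleted
}
```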
### Testing
A unit test has been added and below are some manual testing instructions to see how it interacts with Server updates:
<details>
<summary>app.yaml</summary>
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: pod
spec:
  containers:
    - name: app
      image: nginx
      ports:
        - name: http
          containerPort: 80
---
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: srv
  labels:
    policy: srv
spec:
  podSelector:
    matchLabels:
      app: pod
  port: 80
  proxyProtocol: opaque
```
</details>
```shell
$ go run ./controller/cmd/main.go destination
```
```shell
$ linkerd inject app.yaml |kubectl apply -f -
...
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod 2/2 Running 0 53m 10.42.0.34 k3d-k3s-default-server-0 <none> <none>
$ go run ./controller/script/destination-client/main.go -method getProfile -path 10.42.0.34:80
...
```
You can add/delete `srv` as well as edit its `proxyProtocol` field to observe the correct DestinationProfile updates.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
* Support Traffic Splitting through `ServiceProfile.dstOverrides`
This commit
- updates the destination logic to prevent `ServiceProfile.dstOverrides` from being overridden when a `TrafficSplit` is absent and no `dstOverrides` are set. This means that any `ServiceProfile.dstOverrides` set by the user are correctly propagated to the proxy, making traffic splitting through service profiles work.
- updates `profile_translator.toDstOverrides` to use the port from the `GetProfile` request when there is no port in `dstOverrides.authority` (sketched below).
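As a rough illustration of that port defaulting (a hypothetical helper, not the actual `toDstOverrides` code):
```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// withDefaultPort illustrates the behavior: if the dstOverrides authority has
// no explicit port, the port from the GetProfile request is appended.
func withDefaultPort(authority string, requestPort int) string {
	if _, _, err := net.SplitHostPort(authority); err == nil {
		return authority // authority already carries a port
	}
	return net.JoinHostPort(authority, strconv.Itoa(requestPort))
}

func main() {
	fmt.Println(withDefaultPort("backend-svc.ns.svc.cluster.local.", 8080))      // port appended
	fmt.Println(withDefaultPort("backend-svc.ns.svc.cluster.local.:9090", 8080)) // left as-is
}
```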
After these changes, the following `ServiceProfile` for performing
traffic splitting should work:
```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local
spec:
  dstOverrides:
    - authority: "backend-svc.linkerd-trafficsplit-test-sp.svc.cluster.local.:8080"
      weight: 500m
    - authority: "failing-svc.linkerd-trafficsplit-test-sp.svc.cluster.local.:8080"
      weight: 500m
```
This PR also adds an integration test, `TestTrafficSplitCliWithSP`, which checks
traffic splitting through Service Profiles. This integration test follows the same pattern
as the other traffic split integration tests, but instead we run `linkerd viz stat` on the
client and check that the expected server objects are present, since `linkerd viz stat ts`
does not _yet_ work with Service Profiles.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
## Summary
This changes the destination service to start indicating whether a profile is an
opaque protocol or not.
Currently, profiles returned by the destination service are built by chaining
together updates from Profile and Traffic Split watches.
With this change, we now also watch updates to opaque ports annotations on pods
and namespaces; if an update occurs, it is included when building a profile
update and is sent to the client.
## Details
Watching updates to Profiles and Traffic Splits is straightforward--we watch
those resources, and if an update occurs on one associated with a service we care
about, the update is passed through.
For opaque ports this is a little different, because the setting is an annotation
on pods or namespaces. To account for this, we watch the endpoints that we care
about.
### When host is a Pod IP
When getting the profile for a Pod IP, we check for the opaque ports annotation
on the pod and the pod's namespace. If one is found and the requested port is in
the annotation, we indicate that the profile is an opaque protocol.
We do not subscribe for updates to this pod IP. The only update we really care
about is the pod being deleted, and that is already handled by the proxy.
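The lookup can be sketched as follows; this assumes the `config.linkerd.io/opaque-ports` annotation key and only handles a simple comma-separated list of ports:
```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// opaquePortsAnnotation is the annotation checked on the pod first and then
// on its namespace.
const opaquePortsAnnotation = "config.linkerd.io/opaque-ports"

// isOpaque sketches the lookup described above: the pod's annotation takes
// precedence over the namespace's, and the requested port must be listed.
func isOpaque(podAnnotations, nsAnnotations map[string]string, port int) bool {
	ports, ok := podAnnotations[opaquePortsAnnotation]
	if !ok {
		ports, ok = nsAnnotations[opaquePortsAnnotation]
	}
	if !ok {
		return false
	}
	for _, p := range strings.Split(ports, ",") {
		if n, err := strconv.Atoi(strings.TrimSpace(p)); err == nil && n == port {
			return true
		}
	}
	return false
}

func main() {
	pod := map[string]string{opaquePortsAnnotation: "3306,4444"}
	ns := map[string]string{}
	fmt.Println(isOpaque(pod, ns, 3306)) // true
	fmt.Println(isOpaque(pod, ns, 8080)) // false
}
```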
### When host is a Service
When getting the profile for a Service, we subscribe for updates to the
endpoints of that service. We check whether the requested port is present in the
opaque ports annotation on any of the pods backing those endpoints.
Since the endpoints for a service can be added and removed, we do subscribe for
updates to the endpoints of the service.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Context: #5209
This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.
The `Endpoint` field is only set if the IP maps to a Pod--not a Service.
Additionally, in this scenario, the default Service Profile is used as the base
profile, so no other significant fields are set.
### Examples
```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090" weight:10000}
```
Before:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```
After:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524692}} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"fast-1"} metric_labels:{key:"pod" value:"fast-1-5cc87f64bc-9hx7h"} metric_labels:{key:"pod_template_hash" value:"5cc87f64bc"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## Motivation
Closes #5016
Depends on linkerd/linkerd2-proxy-api#44
## Solution
A `profileTranslator` exists for each service and now has a new
`fullyQualifiedName` field.
This field is used to set the `FullyQualifiedName` field of
`DestinationProfile`s each time an update is sent.
In the case that no service profile exists for a service, a default
`DestinationProfile` is created and we can use the field to set the correct
name.
In the case that a service profile does exist for a service, we still use this
field to set the name to keep it consistent.
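Sketched with illustrative types (not the actual `profileTranslator` code), the idea is simply that the name is stamped on every outgoing update:
```go
package main

import "fmt"

// destinationProfile stands in for the protobuf DestinationProfile message.
type destinationProfile struct {
	fullyQualifiedName string
	routes             []string
}

// toProfileUpdate shows the idea: whether or not a ServiceProfile exists,
// the translator's fullyQualifiedName is always set on the outgoing profile.
func toProfileUpdate(fqn string, routes []string) destinationProfile {
	return destinationProfile{
		fullyQualifiedName: fqn, // set for both default and user-defined profiles
		routes:             routes,
	}
}

func main() {
	// No ServiceProfile: a default (empty) profile still carries the name.
	fmt.Printf("%+v\n", toProfileUpdate("linkerd-identity.linkerd.svc.cluster.local", nil))
	// ServiceProfile present: routes come from the profile, the name stays consistent.
	fmt.Printf("%+v\n", toProfileUpdate("web.default.svc.cluster.local", []string{"GET /api"}))
}
```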
### Example
Install linkerd on a cluster and run the destination server:
```
go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
```
Get the IP of a service. Here, we'll get the IP for `linkerd-identity`:
```
> kubectl get -n linkerd svc/linkerd-identity
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
linkerd-identity ClusterIP 10.43.161.68 <none> 8080/TCP 4h25m
```
Get the profile of `linkerd-identity` from service name or IP and note the
`FullyQualifiedName` field:
```
> go run controller/script/destination-client/main.go -method getProfile -path 10.43.161.68:8080
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```
```
> go run controller/script/destination-client/main.go -method getProfile -path linkerd-identity.linkerd.svc.cluster.local
INFO[0000] fully_qualified_name:"linkerd-identity.linkerd.svc.cluster.local" ..
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
The openAPIV3Schema validation in the ServiceProfiles CRD is very limited in what it can validate and is obviated by more sophisticated validation done by the validating admission controller. Therefore, we would like to remove the openAPIV3Schema validation to reduce the size and complexity of the CRD object.
To do so, we must also bump the version of the ServiceProfile custom resource from v1alpha1 to v1alpha2. This ensures that when the controller is upgraded, it will attempt to watch the v1alpha2 resource. If it cannot (because, for example, the controller pod started before the ServiceProfile CRD was updated and therefore the v1alpha2 version does not exist) then it will go into a crash loop backoff until it can. This essentially means that the controller will wait for the CRD to be upgraded to include v1alpha2 before it will start.
Bumping the version is necessary because, if we did not, the controller could start before the CRD is updated (removing the validation). In that case, when the CRD is edited, the controller would lose its list watch on ServiceProfiles and stop getting updates.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change implements the DstOverrides feature of the destination profile API (aka traffic splitting).
We add a TrafficSplitWatcher to the destination service which watches for TrafficSplit resources and notifies subscribers about TrafficSplits for services that they are subscribed to. A new TrafficSplitAdaptor then merges the TrafficSplit logic into the DstOverrides field of the destination profile.
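A rough sketch of the adaptor's merge step (illustrative types and naming scheme, not the actual TrafficSplitAdaptor code):
```go
package main

import "fmt"

// weightedDst stands in for a DstOverrides entry in a DestinationProfile.
type weightedDst struct {
	authority string
	weight    uint32
}

// trafficSplitBackend stands in for a backend in a TrafficSplit spec.
type trafficSplitBackend struct {
	service string
	weight  uint32
}

// mergeTrafficSplit sketches the adaptor's job: when a TrafficSplit exists
// for the watched service, its backends become the profile's DstOverrides;
// otherwise the profile's existing overrides are left untouched.
func mergeTrafficSplit(existing []weightedDst, backends []trafficSplitBackend, port int) []weightedDst {
	if len(backends) == 0 {
		return existing
	}
	overrides := make([]weightedDst, 0, len(backends))
	for _, b := range backends {
		overrides = append(overrides, weightedDst{
			authority: fmt.Sprintf("%s.svc.cluster.local.:%d", b.service, port),
			weight:    b.weight,
		})
	}
	return overrides
}

func main() {
	backends := []trafficSplitBackend{
		{service: "backend-v1.default", weight: 900},
		{service: "backend-v2.default", weight: 100},
	}
	fmt.Printf("%+v\n", mergeTrafficSplit(nil, backends, 8080))
}
```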
Signed-off-by: Alex Leong <alex@buoyant.io>
This is a major refactor of the destination service. The goal of this refactor is to simplify the code for improved maintainability. In particular:
* Removed the "resolver" interfaces. These were a holdover from when our decision tree was more complex about how to handle different kinds of authorities. The current implementation only accepts fully qualified Kubernetes service names, so this was an unnecessary level of indirection.
* Moved the endpoints and profile watchers into their own package for a clearer separation of concerns. These watchers deal only in Kubernetes primitives and are agnostic to how they are used. This allows a cleaner layering when we use them from our gRPC service.
* Renamed the "listener" types to "translator" to make it clearer that the function of these structs is to translate Kubernetes updates from the watcher to gRPC messages.
Signed-off-by: Alex Leong <alex@buoyant.io>