Traffic that is meant for the destination workload can be sent over the opaque transport without issue. However, traffic intended for the proxy itself (metrics scraping, tap) need to be sent directly to the corresponding proxy port to prevent them from being forwarded to the workflow.
This adds in special cases for the admin and control ports, read directly from the environment variables on the pods, that excludes them from being sent over opaque transport.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Non-opaque meshed traffic currently flows over the original destination port, which requires the inbound proxy to do protocol detection.
This adds an option to the destination controller that configures all meshed traffic to flow to the inbound proxy's inbound port. This will allow us to include more session protocol information in the future, obviating the need for inbound protocol detection.
This doesn't do much in the way of testing, since the default behavior should be unchanged. When this default changes, more validation will be done on the behavior here.
Signed-off-by: Scott Fleener <scott@buoyant.io>
We add support for federated services to the destination controller by adding a new FederatedServiceWatcher. When the destination controller receives a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` and/or the `multicluster.linkerd.io/local-discovery` annotations, it subscribes to the FederatedServiceWatcher instead of subscribing to the EndpointsWatcher directly. The FederatedServiceWatcher watches the federated service for any changes to these annotations, and maintains the appropriate watches on the local EndpointWatcher and/or remote EndpointWatchers fetched through the ClusterStore.
This means that we will often have multiple EndpointTranslators writing to the same `Get` response stream. In order for a `NoEndpoints` message sent to one EndpointTranslator to not clobber the whole stream, we make a change where `NoEndpoints` messages are no longer sent to the response stream, but are replaced by a `Remove` message containing all of the addresses from that EndpointTranslator. This allows multiple EndpointTranslators to coexist on the same stream.
Signed-off-by: Alex Leong <alex@buoyant.io>
Currently, we don't have a simple way of checking if the endpoint a proxy is discovering is in the same zone or not.
This adds a "zone_locality" metric label to the outbound destination address metrics. Note that this does not increase the cardinality of the related metrics, as this label doesn't vary within an endpoint.
Validated by checking the prometheus metrics on a local cluster and verifying this label appears in the outbound transport metrics.
Signed-off-by: Scott Fleener <scott@buoyant.io>
Services in dual-stack mode result in the creation of two EndpointSlices, one for each IP family. Before this change, the Get Destination API would nondeterministically return the address for any of those ES, depending on which one was processed last by the controller because they would overwrite each other.
As part of the ongoing effort to support IPv6/dual-stack networks, this change fixes that behavior giving preference to IPv6 addresses whenever a service exposes both families.
There are a new set of unit tests in server_ipv6_test.go, and in the TestEndpointTranslatorForPods tests there's a couple of new cases to test the interaction with zone filtering.
Also the server unit tests were updated to segregate the tests and resources dealing with the IPv4/IPv6/dual-stack cases.
As part of the ongoing effort to support IPv6/dual-stack networks, this change
enables the proxy to properly forward IPv6 connections:
- Adds the new `LINKERD2_PROXY_OUTBOUND_LISTEN_ADDRS` environment variable when
injecting the proxy. This is supported as of proxy v2.228.0 which was just
pulled into the linkerd2 repo in #2d5085b56e465ef56ed4a178dfd766a3e16a631d.
This adds the IPv6 loopback address (`[::1]`) to the IPv4 one (`127.0.0.1`)
so the proxy can forward outbound connections received via IPv6. The injector
will still inject `LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR` to support the rare
case where the `proxy.image.version` value is overridden with an older
version. The new proxy still considers that variable, but it's superseded by
the new one. The old variable is considered deprecated and should be removed
in the future.
- The values for `LINKERD2_PROXY_CONTROL_LISTEN_ADDR`,
`LINKERD2_PROXY_ADMIN_LISTEN_ADDR` and `LINKERD2_PROXY_INBOUND_LISTEN_ADDR`
have been updated to point to the IPv6 wildcard address (`[::]`) instead of
the IPv4 one (`0.0.0.0`) for the same reason. Unlike with the loopback
address, the IPv6 wildcard address suffices to capture both IPv4 and IPv6
traffic.
- The endpoint translator's `getInboundPort()` has been updated to properly
parse the IPv6 loopback address retrieved from the proxy container manifest.
A unit test was added to validate the behavior.
This commit adds destination controller configuration that enables default
keep-alives for meshed HTTP/2 clients.
This is accomplished by encoding the raw protobuf message structure into the
helm values, and then encoding that as JSON in the destination controller's
command-line options. This allows operators to set any supported HTTP/2 client
configuration without having to modify the destination controller.
Closes#12395
Failing to iterate over init containers as well as regular containers for finding the proxy in various parts of the code when the proxy is injected as a native sidecar resulted in:
- `Get` Destination API failing in the presence of opaque ports
- Failure having the injector detecting already injected pods
- Various CLI issues
This PR is split into the following commits addressing each issue separately:
a8ebe76e3 - Fix injection check for existing sidecars
44e9625e0 - Fix 'linkerd uninject'
62694965d - Fix 'linkerd version --proxy'
42dbdaddf - Fix 'linkerd identity'
39db823fe - Fix 'linkerd check'
7359f371d - Fix 'linkerd dg proxy-metrics'
f8f73c47c - Fix destination controller
Fixes: #12311
When the endpoint translator receives a `remove` call, it was updating it's local traffic policy based on the address set passed to remove. However, since `remove` is only meant to remove addresses and not change the address metadata, the endpoints watcher was not setting local traffic policy on these calls to `remove`. This can result in calls to `remove` temporarily turning off local traffic policy which will cause non-local addresses to be sent to clients.
Since `remove` should not change address metadata, we now disregard any metadata in the call to `remove`, including any changes to the local traffic policy.
Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
The main change here is the refactoring of the address functions in `addr.go` that support the Destination controller and Viz's Tap controller. Some of those functions only worked for IPv4, so this change refactored them to make them IP family agnostic.
This enabled adding (and fixing) IPv6 unit tests as detailed in the following sections.
Other changes:
- The `ProxyAddressesToString()` function was no longer used, so it got removed.
- The `ProxyIPToString()` function was only used by the destination-client script, so that got stripped out.
## `addr_test.go`
We added IPv6 cases to each test, that would have failed previously.
## `endpoint_translator_test.go`
One of the test pods (pod3) was changed to have an IPv6. Without the other changes in this PR those tests would still have passed, but just because when comparing actual IPs with expected ones we weren't checking if they were both zero. So here we added checks against that.
## `server_test.go`
As above, we added checks against empty IPs. And in the mocked resources in `test_util.go` we added an IPv6 EndpointSlice.
The ExternalWorkload resource we introduced has a minor naming
inconsistency; `Tls` in `meshTls` is not capitalised. Other resources
that we have (e.g. authentication resources) capitalise TLS (and so does
Go, it follows a similar naming convention).
We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.
Upgrading the control plane version will continue to work without
downtime. However, if an existing resource exists, the policy controller
will not completely initialise. It will not enter a crashloop backoff,
but it will also not become ready until the resource is edited or
deleted.
Signed-off-by: Matei David <matei@buoyant.io>
When a meshed client attempts to establish a connection directly to the workload IP of an ExternalWorkload, the destination controller should return an endpoint profile for that ExternalWorkload with a single endpoint and the metadata associated with that ExternalWorkload including:
* mesh TLS identity
* workload metric labels
* opaque / protocol hints
Signed-off-by: Alex Leong <alex@buoyant.io>
When we do a `GetProfile` lookup for an unmeshed pod, we set the `weightedAddr.ProtocolHint` to an empty value `&pb.ProtocolHint{}` to indicate that the address is unmeshed and has no protocol hint. However, when the looked up port is in the default opaque list, we erroneously check if `weightedAddr.ProtocolHint != nil` to determine if we should attempt to get the inbound listen port for that pod. Since `&pb.ProtocolHint{} != nil`, we attempt to get the inbound listen port for the unmeshed pod. This results in an error, preventing any valid `GetProfile` responses from being returned.
We update the initialization logic for `weightedAddr.ProtocolHint` to only create a struct when a protocol hint is present and to leave it as `nil` if the pod is unmeshed.
We add a simple unit test for this behavior as well.
Signed-off-by: Alex Leong <alex@buoyant.io>
This change adds a runtime flag to the destination controller,
--experimental-endpoint-zone-weights=true, that causes endpoints in the
local zone to receive higher weights. This feature is disabled by
default, since the weight value is not honored by proxies. No helm
configuration is exposed yet, either.
This weighting is instrumented in the endpoint translator. Tests are
added to confirm that the behavior is feature-gated.
Additionally, this PR adds the "zone" metric label to endpoint metadata
responses.
When a grpc client of the destination.Get API initiates a request but then doesn't read off of that stream, the HTTP2 stream flow control window will fill up and eventually exert backpressure on the destination controller. This manifests as calls to `Send` on the stream blocking. Since `Send` is called synchronously from the client-go informer callback (by way of the endpoint translator), this blocks the informer callback and prevents all further informer calllbacks from firing. This causes the destination controller to stop sending updates to any of its clients.
We add a queue in the endpoint translator so that when it gets an update from the informer callback, that update is queued and we avoid potentially blocking the informer callback. Each endpoint translator spawns a goroutine to process this queue and call `Send`. If there is not capacity in this queue (e.g. because a client has stopped reading and we are experiencing backpressure) then we terminate the stream.
Signed-off-by: Alex Leong <alex@buoyant.io>
Followup to #11328
Implements a new pod watcher, instantiated along the other ones in the Destination server. It also watches on Servers and carries all the logic from ServerWatcher, which has now been decommissioned.
The `CreateAddress()` function has been moved into a function of the PodWatcher, because now we're calling it on every update given the pod associated to an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duped in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` got also moved for similar reasons.
Other things to note about PodWatcher:
- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` is due to only send stream updates when there are changes in the pod's readiness, to avoid sending duped messages on every pod lifecycle event.
-
Finally, endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) not being a static object anymore, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
When a service has it's internal traffic policy set to "local", we will perform filtering to only return local endpoints, as-per the ForZone hints in the endpoints. However, ForZone calculations do not take resources from remote clusters into account, therefore this type of filtering is not appropriate for remote discovery services.
We explicitly ignore any internal traffic policy when doing remote discovery.
Signed-off-by: Alex Leong <alex@buoyant.io>
Add support for enabling and disabling topology aware routing when hints are added/removed.
The testing setup is very involved because it involves so many moving parts
1) Setup a service which is layered over several availability zones.
1a) The best way to do this is one service object, with 3 replicasets explicitly forced to use a specific AZ each.
2) Add `service.kubernetes.io/topology-aware-hints: Auto` annotation to the Service object
3) Use a load tester like k6 to send meaningful traffic to your service but only in one AZ
3) Scale up your replica sets until k8s adds Hints to your endpointslices
4) Observe that traffic shifts to only hit pods in one AZ
5) Turn down the replicasets count until such time that K8s removes the hints from your endpointslices
6) Observe traffic shifts back to all pods across all AZ.
Currently, the proxy injector will expand lists of opaque port ranges
into lists of individual port numbers. This is because the proxy has
historically not accepted port ranges in the
`LINKERD2_PROXY_INBOUND_PORTS_DISABLE_PROTOCOL_DETECTION` environment
variable. However, when very large ranges are used, the size of the
injected manifest can be quite large, since each individual port number
in a range must be listed separately.
Proxy PR linkerd/linkerd2-proxy#2395 changed the proxy to accept ranges
as well as individual port numbers in the opaque ports environment
variable, and this change was included in the latest proxy release
(v2.200.0). This means that the proxy-injector no longer needs to expand
large port ranges into individual port numbers, and can now simply
forward the list of ranges to the proxy. This branch changes the proxy
injector to do this, resolving issues with manifest size due to large
port ranges.
Closes#9803
The outbound proxy handles endpoints with the `opaque_transport` flag by
opening a direct connection to the inbound proxy's inbound listener
port, and sending a ProtoBuf `TransportHeader` including the target port
of the originating outbound connection and an (optional)
`SessionProtocol` describing the protocol used on that connection.
Currently, outbound proxies initiating direct connections will *always*
send `SessionProtocol` values communicating the protocol as understood
by the outbound proxy. However, this is not always the desired behavior.
Direct connections with `TransportHeader`s are used in two cases: for
gateway connections, and for ports which are marked as opaque. When the
inbound port is marked as opaque, the presence of a `SessionProtocol`
tells the inbound proxy to handle that connection as the indicated
protocol, which results in incorrect behavior when the inbound proxy's
ServerPolicy configures the target port as opaque (see #9888).
Therefore, the `Destination` proxy API has been updated to add a new
`ProtocolHint`, `Opaque`, which indicates that an outbound proxy should
_not_ send a `SessionProtocol` when initiating a direct connection, even
if the outbound proxy handled the connection as HTTP. This hint was
added to the proxy API in linkerd/linkerd2-proxy-api#197, and released
in `linkerd2-proxy-api` v0.8.0.
This branch updates the Destination controller's dependency on
`linkerd2-proxy-api` to v0.8.0, and changes the controller to send an
`Opaque` protocol hint when the target port is marked as opaque on the
destination pod. This should override the `H2` protocol hint that is
added when the destination is meshed. I've also added a new test for
this behavior.
Fixes#9888 (along with linkerd/linkerd2-proxy#2209, which changes the
proxy to actually handle the `Opaque` protocol hint).
Fixes#9986
After reviewing the k8s API calls in Destination, it was concluded we
could only swap out the calls to the Node and RS resources to use the
metadata API, as all the other resources (Endpoints, EndpointSlices,
Services, Pod, ServiceProfiles, Server) required fields other than those
found in their metadata section.
This also required completing the `NewFakeAPI` implementation by adding
the missing annotations and labels entries.
## Testing Memory Consumption
The gains here aren't as big as in #9650. In order to test this we need
to push hard and create 4000 RS:
``` bash
for i in {0..4000}; do kubectl create deployment test-pod-$i --image=nginx; done
```
In edge-23.2.1 the destination pod's memory consumption goes from 40Mi
to 160Mi after all the RS were created. With this change, it went from
37Mi to 140Mi.
Closes#10130https://kubernetes.io/docs/concepts/services-networking/service-traffic-policy/
1. Update endpoints watcher to include additional field `localTrafficPolicy`.
Set to true when `.spec.internalTrafficPolicy` is set to `Local`
2. Update endpoints translater to filter by node when `localTrafficPolicy` is
set to true. Topology Aware Hints are not used when
`service.pec.internalTrafficPolicy` is set to local
Signed-off-by: Yu Cao <yc185050@ncr.com>
Fixes#9896
The maps in `endpointTranslator` weren't being guarded against
concurrent access, so we're adding locks at the `Add` and `Remove`
methods. Also these functions ultimately call the `SendMsg` method on
the gRPC `stream`, which is not
["thread-safe"](https://github.com/grpc/grpc-go/blob/master/stream.go#L122-L126),
so we're guarding against other problems as well.
A new unit test `TestConcurrency` was added that failed in the following
ways before this fix:
When running the test with the `-race` flag, we immediately get the data race warning:
```bash
$ go test ./controller/api/destination/... -run TestConcurrency -race
time="2022-11-25T16:48:52-05:00" level=info msg="waiting for caches to sync"
time="2022-11-25T16:48:52-05:00" level=info msg="caches synced"
==================
WARNING: DATA RACE
Read at 0x00c0000c0040 by goroutine 161:
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:80 +0x29c
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x92
Previous write at 0x00c0000c0040 by goroutine 162:
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).sendFilteredUpdate()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:95 +0x66
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:83 +0x330
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x92
Goroutine 161 (running) created at:
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x6f
testing.tRunner()
/usr/local/go/src/testing/testing.go:1439 +0x213
testing.(*T).Run.func1()
/usr/local/go/src/testing/testing.go:1486 +0x47
Goroutine 162 (running) created at:
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x6f
testing.tRunner()
/usr/local/go/src/testing/testing.go:1439 +0x213
testing.(*T).Run.func1()
/usr/local/go/src/testing/testing.go:1486 +0x47
```
If run without the `-race` flag, we get the `concurrent map writes` panic reported in #9896:
```bash
$ go test ./controller/api/destination/... -run TestConcurrency -count=1
time="2022-11-25T16:53:25-05:00" level=info msg="waiting for caches to sync"
time="2022-11-25T16:53:25-05:00" level=info msg="caches synced"
fatal error: concurrent map writes
goroutine 187 [running]:
runtime.throw({0x1b57bc4?, 0x500000000000000?})
/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0xc00013dc80 sp=0xc00013dc50 pc=0x43a5b1
runtime.mapassign(0xc00013dec8?, 0x2?, 0x0?)
/usr/local/go/src/runtime/map.go:595 +0x4d6 fp=0xc00013dd00 sp=0xc00013dc80 pc=0x4113b6
github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Add(...)
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator.go:80
github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency.func1()
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:338 +0x1a8 fp=0xc00013dfe0 sp=0xc00013dd00 pc=0x16d1da8
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00013dfe8 sp=0xc00013dfe0 pc=0x46d721
created by github.com/linkerd/linkerd2/controller/api/destination.TestConcurrency
/home/alpeb/pr/destination-panic/linkerd2/controller/api/destination/endpoint_translator_test.go:336 +0x3c
```
The destination controller can improperly handle updates by returning a
map reference instead of a new data structure. This breaks diffing logic,
as newly added endpoints appear to pre-exist.
This change ensures that a fresh data structure is used when handling
discovery updates.
Fixes#8143
Signed-off-by: Alex Leong <alex@buoyant.io>
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.
[gc]: https://github.com/go-critic/go-critic
Signed-off-by: Oliver Gould <ver@buoyant.io>
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.
This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).
[el]: https://github.com/polyfloyd/go-errorlint
Signed-off-by: Oliver Gould <ver@buoyant.io>
* Remove the `proxy.disableIdentity` config
Fixes#7724
Also:
- Removed the `linkerd.io/identity-mode` annotation.
- Removed the `config.linkerd.io/disable-identity` annotation.
- Removed the `linkerd.proxy.validation` template partial, which only
made sense when `proxy.disableIdentity` was `true`.
- TestInjectManualParams now requires to hit the cluster to retrieve the
trust root.
Fixes#6337
GetProfile can be called with a FQDN for a specific member of a service e.g.
```
web-0.foo.ns.svc.cluster.local
```
If that endpoint is not backed by a pod, `GetProfile` will not return an endpoint in the response.
We update the logic to return an endpoint in the response even when the endpoint is not backed by a pod.
Signed-off-by: Alex Leong <alex@buoyant.io>
### What
`GetProfile` clients do not receive destinatin profiles that consider Server protocol fields the way that `Get` clients do. If a Server exists for a `GetProfile` destination that specifies the protocol for that destination is `opaque`, this information is not passed back to the client.
#7184 added this for `Get` by subscribing clients to Endpoint/EndpointSlice updates. When there is an update, or there is a Server update, the endpoints watcher passes this information back to the endpoint translator which handles sending the update back to the client.
For `GetProfile` the situation is different. As with `Get`, we only consider Servers when dealing with Pod IPs, but this only occurs in two situations for `GetProfile`.
1. The destination is a Pod IP and port
2. The destionation is an Instance ID and port
In both of these cases, we need to check if a already Server selects the endpoint and we need to subscribe for Server updates incase one is added or deleted which selects the endpoint.
### How
First we check if there is already a Server which selects the endpoint. This is so that when the first destionation profile is returned, the client knows if the destination is `opaque` or not.
After sending that first update, we then subscribe the client for any future updates which will come from a Server being added or deleted.
This is handled by the new `ServerWatcher` which watches for Server updates on the cluster; when an update occurs it sends that to the `endpointProfileTranslator` which translates the protcol update into a DestinationProfile.
By introducing the `endpointProfileTranslator` which only handles protocol updates, we're able to decouple the endpoint logic from `profileTranslator`—it's `endpoint` field has been removed now that it only handles updates for ServiceProfiles for Services.
### Testing
A unit test has been added and below are some manual testing instructions to see how it interacts with Server updates:
<details>
<summary>app.yaml</summary>
```yaml
apiVersion: v1
kind: Pod
metadata:
name: pod
labels:
app: pod
spec:
containers:
- name: app
image: nginx
ports:
- name: http
containerPort: 80
---
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
name: srv
labels:
policy: srv
spec:
podSelector:
matchLabels:
app: pod
port: 80
proxyProtocol: opaque
```
</details>
```shell
$ go run ./controller/cmd/main.go destination
```
```shell
$ linkerd inject app.yaml |kubectl apply -f -
...
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod 2/2 Running 0 53m 10.42.0.34 k3d-k3s-default-server-0 <none> <none>
$ go run ./controller/script/destination-client/main.go -method getProfile -path 10.42.0.34:80
...
```
You can add/delete `srv` as well as edit its `proxyProtocol` field to observe the correct DestinationProfile updates.
Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
This change ensures that if a Server exists with `proxyProtocol: opaque` that selects an endpoint backed by a pod, that destination requests for that pod reflect the fact that it handles opaque traffic.
Currently, the only way that opaque traffic is honored in the destination service is if the pod has the `config.linkerd.io/opaque-ports` annotation. With the introduction of Servers though, users can set `server.Spec.ProxyProtocol: opaque` to indicate that if a Server selects a pod, then traffic to that pod's `server.Spec.Port` should be opaque. Currently, the destination service does not take this into account.
There is an existing change up that _also_ adds this functionality; it takes a different approach by creating a policy server client for each endpoint that a destination has. For `Get` requests on a service, the number of clients scales with the number of endpoints that back that service.
This change fixes that issue by instead creating a Server watch in the endpoint watcher and sending updates through to the endpoint translator.
The two primary scenarios to consider are
### A `Get` request for some service is streaming when a Server is created/updated/deleted
When a Server is created or updated, the endpoint watcher iterates through its endpoint watches (`servicePublisher` -> `portPublisher`) and if it selects any of those endpoints, the port publisher sends an update if the Server has marked that port as opaque.
When a Server is deleted, the endpoint watcher once again iterates through its endpoint watches and deletes the address set's `OpaquePodPorts` field—ensuring that updates have been cleared of Server overrides.
### A `Get` request for some service happens after a Server is created
When a `Get` request occurs (or new endpoints are added—they both take the same path), we must check if any of those endpoints are selected by some existing Server. If so, we have to take that into account when creating the address set.
This part of the change gives me a little concern as we first must get all the Servers on the cluster and then create a set of _all_ the pod-backed endpoints that they select in order to determine if any of these _new_ endpoints are selected.
## Testing
Right now this can be tested by starting up the destination service locally and running `Get` requests on a service that has endpoints selected by a Server
**app.yaml**
```yaml
apiVersion: v1
kind: Pod
metadata:
name: pod
labels:
app: pod
spec:
containers:
- name: app
image: nginx
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: svc
spec:
selector:
app: pod
ports:
- name: http
port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
name: srv
labels:
policy: srv
spec:
podSelector:
matchLabels:
app: pod
port: 80
proxyProtocol: HTTP/1
```
```bash
$ go run controller/script/destination-client/main.go -path svc.default.svc.cluster.local:80
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## background
In order to upgrade `client-go` and other related libraries to `v0.22.0`, we had to address the deprecation of the service's `TopologyKeys` field. This field and it's related feature have been deprecated and superseded by [Topology Aware Hints](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2433-topology-aware-hints/README.md).
The goal of topology aware hints is to to provide a simpler way for users to prefer endpoints by basing decisions soely off the node's `topology.kubernetes.io/zone` label. If a node is in `zone-a`, then it should prefer endpoints that _should_ be consumed by clients in `zone-a`.
kube-proxy (and now the destination controller) know that an endpoint _should_ be consumed by clients in certain zones if its `Hints.ForZones` field is set with a zone value that matches that of the client.
For example, the endpoint slice controller may add the following hint to an endpoint:
```
- addresses: ["1.1.1.1"]
zone: "zone-a"
hints:
zone: "zone-b"
```
The above endpoint is an endpoint that is located in `zone-a` but should be consumed by clients in `zone-b`.
## changes
Now that topological preference is not a concept, we can remove it from the `servicePublisher` and `portPublisher` structs. The fields were only there so that it could be populated down to individual addresses.
The `Hints` field is only present on endpoints that belong to an `EndpointSlice`, so use of this field is limited to the `endpointSliceToAddresses` function.
When endpoint slices are translated to an `AddressSet` now, for each address (endpoint) we make sure to copy the `Hints.ForZones` field if it is present. This field is only present if it's set by the endpoint slice controller and it has [several safeguards](https://kubernetes.io/docs/concepts/services-networking/topology-aware-hints/#safeguards).
After `endpointSliceToAddresses` has translated an endpoint slice into an `AddressSet` and updated the endpoint translator's `availableEndpoints`, filtering takes place and is the crux of this change.
For each potential address that we have to consider in `availableEndpoints`, we make sure to only return a set of addresses who's consumption zone (zones in `forZones` field) match that of the node's zone. That way, we only communicate with endpoints that have been labeled by the endpoint slice controller for the current node we're on.
This allows us to remove the ordering/hierarchy of topological region and considering the `*` value.
## testing
I've added a unit test which creates an endpoint translator tied to a node in `west-1a` and asserts that it only handles updates for addresses that should be consumed by clients in `west-1a`.
Closes#6637
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This PR corrects misspellings identified by the [check-spelling action](https://github.com/marketplace/actions/check-spelling).
The misspellings have been reported at 0d56327e6f (commitcomment-51603624)
The action reports that the changes in this PR would make it happy: 03a9c310aa
Note: this PR does not include the action. If you're interested in running a spell check on every PR and push, that can be offered separately.
Signed-off-by: Josh Soref <jsoref@users.noreply.github.com>
While uncommon, if H2 upgrades are disabled it's possible for an opaque workload
to not have it's hint.OpaqueTransport field set in it's destination profile
response.
This changes the H2 upgrade enabled check to be specific for setting the
hint.Protocol while allowing hint.OpaqueTransport to be set independent of
that value.
Signed-off-by: Kevin Leimkuhler kevin@kleimkuhler.com
Co-authored-by: Oliver Gould <ver@buoyant.io>
When a pod is configured with `skip-inbound-ports` annotation, a client
proxy trying to connect to that pod tries to connect to it via H2 and
also tries to initiate a TLS connection. This issue is caused by the
destination controller when it sends protocol and TLS hints to the
client proxy for that skipped port.
This change fixes the destination controller so that it no longer
sends protocol and TLS identity hints to outbound proxies resolving a
`podIP:port` that is on a skipped inbound port.
I've included a test that exhibits this error prior to this fix but you
can also test the prior behavior by:
```bash
curl https://run.linkerd.io/booksapp.yml > booksapp.yaml
# edit either the books or authors service to:
1: Configure a failure rate of 0.0
2: add the `skip-inbound-ports` config annotation
bin/linkerd viz stat pods webapp
There should be no successful requests on the webapp deployment
```
Fixes#5995
Signed-off-by: Dennis Adjei-Baah <dennis@buoyant.io>
* destination: pass opaque-ports through cmd flag
Fixes#5817
Currently, Default opaque ports are stored at two places i.e
`Values.yaml` and also at `opaqueports/defaults.go`. As these
ports are used only in destination, We can instead pass these
values as a cmd flag for destination component from Values.yaml
and remove defaultPorts in `defaults.go`.
This means that users if they override `Values.yaml`'s opauePorts
field, That change is propogated both for injection and also
discovery like expected.
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
This changes the destination service to always use a default set of opaque ports
for pods and services. This is so that after Linkerd is installed onto a
cluster, users can benefit from common opaque ports without having to annotate
the workloads that serve the applications.
After #5810 merges, the proxy containers will be have the default opaque ports
`25,443,587,3306,5432,11211`. This value on the proxy container does not affect
traffic though; it only configures the proxy.
In order for clients and servers to detect opaque protocols and determine opaque
transports, the pods and services need to have these annotations.
The ports `25,443,587,3306,5432,11211` are now handled opaquely when a pod or
service does not have the opaque ports annotation. If the annotation is present
with a different value, this is used instead of the default. If the annotation
is present but is an empty string, there are no opaque ports for the workload.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This change introduces an opaque ports annotation watcher that will send
destination profile updates when a service has its opaque ports annotation
change.
The user facing change introduced by this is that the opaque ports annotation is
now required on services when using the multicluster extension. This is because
the service mirror will create mirrored services in the source cluster, and
destination lookups in the source cluster need to discover that the workloads in
the target cluster are opaque protocols.
### Why
Closes#5650
### How
The destination server now has a new opaque ports annotation watcher. When a
client subscribes to updates for a service name or cluster IP, the `GetProfile`
method creates a profile translator stack that passes updates through resource
adaptors such as: traffic split adaptor, service profile adaptor, and now opaque
ports adaptor.
When the annotation on a service changes, the update is passed through to the
client where the `opaque_protocol` field will either be set to true or false.
A few scenarios to consider are:
- If the annotation is removed from the service, the client should receive
an update with no opaque ports set.
- If the service is deleted, the stream stays open so the client should
receive an update with no opaque ports set.
- If the service has the annotation added, the client should receive that
update.
### Testing
Unit test have been added to the watcher as well as the destination server.
An integration test has been added that tests the opaque port annotation on a
service.
For manual testing, using the destination server scripts is easiest:
```
# install Linkerd
# start the destination server
$ go run controller/cmd/main.go destination -kubeconfig ~/.kube/config
# Create a service or namespace with the annotation and inject it
# get the destination profile for that service and observe the opaque protocol field
$ go run controller/script/destination-client/main.go -method getProfile -path test-svc.default.svc.cluster.local:8080
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000}
INFO[0000]
INFO[0000] fully_qualified_name:"terminus-svc.default.svc.cluster.local" opaque_protocol:true retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"terminus-svc.default.svc.cluster.local.:8080" weight:10000}
INFO[0000]
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Getting information about node topology queries the k8s api directly.
In an environment with high traffic and high number of pods, the
k8s api server can become overwhelmed or start throttling requests.
This MR introduces a node informer to resolve the bottleneck and
fetch node information asynchronously.
Fixes#5684
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
The destination service now returns `OpaqueTransport` hint when the annotation
matches the resolve target port. This is different from the current behavior
which always sets the hint when a proxy is present.
Closes#5421
This happens by changing the endpoint watcher to set a pod's opaque port
annotation in certain cases. If the pod already has an annotation, then its
value is used. If the pod has no annotation, then it checks the namespace that
the endpoint belongs to; if it finds an annotation on the namespace then it
overrides the pod's annotation value with that.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
## What
When the destination service returns a destination profile for an endpoint,
indicate if the endpoint can receive opaque traffic.
## Why
Closes#5400
## How
When translating a pod address to a destination profile, the destination service
checks if the pod is controlled by any linkerd control plane. If it is, it can
set a protocol hint where we indicate that it supports H2 and opaque traffic.
If the pod supports opaque traffic, we need to get the port that it expects
inbound traffic on. We do this by getting the proxy container and reading it's
`LINKERD2_PROXY_INBOUND_LISTEN_ADDR` environment variable. If we successfully
parse that into a port, we can set the opaque transport field in the destination
profile.
## Testing
A test has been added to the destination server where a pod has a
`linkerd-proxy` container. We can expect the `OpaqueTransport` field to be set
in the returned destination profile's protocol hint.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
This fixes an issue where the protocol hint is always set on endpoint responses.
We now check the right value which determines if the pod has the required label.
A test for this has been added to #5266.
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Context: #5209
This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.
The `Endpoint` field is only set if the IP maps to a Pod--not a Service.
Additionally in this scenario, the default Service Profile is used as the base
profile so no other significant fields are set.
### Examples
```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local" retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090" weight:10000}
```
Before:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```
After:
```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}} endpoint:{addr:{ip:{ipv4:170524692}} weight:10000 metric_labels:{key:"control_plane_ns" value:"linkerd"} metric_labels:{key:"deployment" value:"fast-1"} metric_labels:{key:"pod" value:"fast-1-5cc87f64bc-9hx7h"} metric_labels:{key:"pod_template_hash" value:"5cc87f64bc"} metric_labels:{key:"serviceaccount" value:"default"} tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}} protocol_hint:{h2:{}}}
```
Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Fixes#4191#4993
This bumps Kubernetes client-go to the latest v0.19.2 (We had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5
This consists of the following changes:
- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface
Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
[Link to RFC](https://github.com/linkerd/rfc/pull/23)
### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :)
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).
### How
---
* **Collecting metadata**:
- Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing; as such, I decided to store them in an array, they will be taken exactly as they are found in the service spec, this ensures we respect the preference order.
- For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.
* **Filtering endpoints**:
- This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.
- To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.
- When we add a new set of addresses, we check to see if slices are enabled -- chances are if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if the preference is registered by the filter we strictly enforce it, otherwise nothing changes.
And that's pretty much it.
Signed-off-by: Matei David <matei.david.35@gmail.com>