Commit Graph

72 Commits

Author SHA1 Message Date
Oliver Gould 3c91fc64ce
fix(destination): avoid panic on missing managed fields timestamp (#13378)
We received a report of a panic:

    runtime error: invalid memory address or nil pointer dereference

    panic({0x1edb860?, 0x37a6050?}
        /usr/local/go/src/runtime/panic.go:785 +0x132

    github.com/linkerd/linkerd2/controller/api/destination/watcher.latestUpdated({0xc0006b2d80?, 0xc00051a540?, 0xc0008fa008?})
        /linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1612 +0x125

    github.com/linkerd/linkerd2/controller/api/destination/watcher.(*OpaquePortsWatcher).updateService(0xc0007d5480, {0x21fd160?, 0xc000d71688?}, {0x21fd160, 0xc000d71688})
        /linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/opaque_ports_watcher.go:141 +0x68

The `latestUpdated` function does not properly handle the case where a time is
omitted from a `ManagedFieldsEntry`.

    type ManagedFieldsEntry struct {
        // Time is the timestamp of when the ManagedFields entry was added. The
        // timestamp will also be updated if a field is added, the manager
        // changes any of the owned fields value or removes a field. The
        // timestamp does not update when a field is removed from the entry
        // because another manager took it over.
        // +optional
        Time *Time `json:"time,omitempty" protobuf:"bytes,4,opt,name=time"`

This change adds a check to avoid the nil dereference.
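
A minimal sketch of that kind of guard, reusing the `latestUpdated` name from the trace above (the surrounding code is illustrative, not the actual controller implementation):

```go
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// latestUpdated returns the most recent managed-fields timestamp,
// skipping entries whose optional Time field was omitted.
func latestUpdated(managed []metav1.ManagedFieldsEntry) time.Time {
	var latest time.Time
	for _, entry := range managed {
		if entry.Time == nil {
			continue // omitted timestamp; dereferencing it would panic
		}
		if latest.IsZero() || entry.Time.Time.After(latest) {
			latest = entry.Time.Time
		}
	}
	return latest
}

func main() {
	now := metav1.Now()
	fields := []metav1.ManagedFieldsEntry{
		{Manager: "kube-controller-manager"}, // Time omitted
		{Manager: "kubectl-client-side-apply", Time: &now},
	}
	fmt.Println(latestUpdated(fields))
}
```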
2024-11-22 15:21:09 -08:00
Alejandro Pedraza 71291fe7bc
Add `accessPolicy` field to Server CRD (#12845)
Followup to #12844

This new field defines the default policy for Servers, i.e. if a request doesn't match the policy associated with a Server then this policy applies. The values are the same as for `proxy.defaultInboundPolicy` and the `config.linkerd.io/default-inbound-policy` annotation (all-unauthenticated, all-authenticated, cluster-authenticated, cluster-unauthenticated, deny), plus a new value "audit". The default is "deny", thus remaining backwards-compatible.

This field is also exposed as an additional printer column.
2024-07-22 09:01:09 -05:00
Alejandro Pedraza df24ea6d96
Refactor ES addition logic in Destination (#12625)
This is a second take on #12427, which avoided a theoretical/correctness
issue around overwriting new ES addresses with stale data.

We had to revert that in #12589 because the change introduced a bug, by
returning early when the ES had no addresses and failing to properly
initialize `addresses` for the portPublisher.

This just removes the early return.
2024-05-22 09:12:19 -05:00
Alejandro Pedraza 9bd8c005da
Revert "Fix destination staleness issue when adding EndpointSlices (#12427)" (#12589)
This reverts commit 4fccf3e9ec.

The early return was causing `pp.addresses = newAddressSet` to not be run when the list of addresses is empty; but setting that is still necessary so that labels are tracked correctly.

This was caught by the tap (viz) integration test run in the release workflow.
2024-05-13 16:25:50 -07:00
Alejandro Pedraza 4fccf3e9ec
Fix destination staleness issue when adding EndpointSlices (#12427)
When updating the portPublisher's address set after a new EndpointSlice creation event is received, its addresses were getting overwritten with stale data whenever their IDs already existed in the current pp's address set.

There can be pathological cases in single-stack where this is a problem. For example, when an ES gets recycled but the deletion event is not caught for some reason, the new Address data will be overwritten by the old stale entry when the addition event is received.

## Other Changes

- Remove overriding `newAddressSet.LocalTrafficPolicy` as that is already taken care of inside `pp.endpointSliceToAddresses(slice)`.
- When there are no Add events to send, return early without updating state or metrics.
2024-05-08 09:12:51 -05:00
Alejandro Pedraza 137eac9df3
Add IPv6 support for the destination controller (#12428)
Services in dual-stack mode result in the creation of two EndpointSlices, one for each IP family. Before this change, the Get Destination API would nondeterministically return the addresses from either of those ES, depending on which one was processed last by the controller, because they would overwrite each other.

As part of the ongoing effort to support IPv6/dual-stack networks, this change fixes that behavior giving preference to IPv6 addresses whenever a service exposes both families.

There is a new set of unit tests in server_ipv6_test.go, and in the TestEndpointTranslatorForPods tests there are a couple of new cases to test the interaction with zone filtering.
Also the server unit tests were updated to segregate the tests and resources dealing with the IPv4/IPv6/dual-stack cases.
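
A minimal sketch of the family-preference rule, assuming a hypothetical `preferIPv6` helper operating on plain address strings rather than the controller's actual address sets:

```go
package main

import (
	"fmt"
	"net/netip"
)

// preferIPv6 returns the IPv6 addresses when both families are present,
// otherwise whatever family is available.
func preferIPv6(addrs []string) []string {
	var v4, v6 []string
	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			continue // skip unparseable entries
		}
		if ip.Is6() && !ip.Is4In6() {
			v6 = append(v6, a)
		} else {
			v4 = append(v4, a)
		}
	}
	if len(v6) > 0 {
		return v6
	}
	return v4
}

func main() {
	// Addresses as they might appear across a dual-stack service's two EndpointSlices.
	fmt.Println(preferIPv6([]string{"10.42.0.12", "fd00:10:244::c"}))
}
```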
2024-05-02 14:39:05 -05:00
knowmost 27bcdd1028
chore: fix function names in comment (#12512)
Signed-off-by: knowmost <knowmost@outlook.com>
2024-04-29 10:28:10 -07:00
Alex Leong 5915ef5a18
Don't send endpoint profile updates from Server updates when opaqueness doesn't change (#12013)
When the destination controller receives an update for a Server resource, we recompute opaqueness ports for all pods.  This results in a large number of updates to all endpoint profile watches, even if the opaqueness doesn't change.  In cases where there are many Server resources, this can result in a large number of updates being sent to the endpoint profile translator and overflowing the endpoint profile translator update queue.  This is especially likely to happen during an informer resync, since this will result in an informer callback for every Server in the cluster.

We refactor the workload watcher to not send these updates if the opaqueness has not changed.

This seemingly simple change in behavior requires a large code change because:
* the current opaqueness state is not stored on workload publishers and must be added so that we can determine if the opaqueness has changed
* storing the opaqueness in addition to the other state we're storing (pod, ip, port, etc.) means that we are not storing all of the data represented by the Address struct
* workload watcher uses a `createAddress` func to dynamically create an Address from the state it stores
* now that we are storing the Address as state, creating Addresses dynamically is no longer necessary and we can operate on the Address state directly
  * this makes the workload watcher more similar to other watchers and follow a common pattern
  * it also fixes some minor correctness issues:
    * pods that did not have the ready status condition were being considered when they should not have been
    * updates to ExternalWorkload labels were not being considered
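
A minimal sketch of the suppression logic described above, using hypothetical, pared-down types (`workloadPublisher`, `Listener`) rather than the real watcher code:

```go
package main

import "fmt"

// Address is a trimmed-down stand-in for the watcher's Address state,
// reduced here to the field that matters for this sketch.
type Address struct {
	IP             string
	OpaqueProtocol bool
}

// Listener receives updates for a workload; modeled as a function here.
type Listener func(Address)

// workloadPublisher is a hypothetical publisher that stores the Address it
// last published, so it can tell whether a Server update changes anything.
type workloadPublisher struct {
	addr      Address
	listeners []Listener
}

// updateOpaqueness only fans out to listeners when the opaqueness changed.
func (wp *workloadPublisher) updateOpaqueness(opaque bool) {
	if wp.addr.OpaqueProtocol == opaque {
		return // unchanged: no queue traffic for this Server update
	}
	wp.addr.OpaqueProtocol = opaque
	for _, l := range wp.listeners {
		l(wp.addr)
	}
}

func main() {
	wp := &workloadPublisher{
		addr:      Address{IP: "10.42.0.12"},
		listeners: []Listener{func(a Address) { fmt.Printf("update: %+v\n", a) }},
	}
	wp.updateOpaqueness(true) // published
	wp.updateOpaqueness(true) // suppressed: nothing changed
}
```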

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-03-19 10:24:02 -07:00
Matei David 98e38a66b6
Rename meshTls to meshTLS in ExternalWorkload CRD (#12098)
The ExternalWorkload resource we introduced has a minor naming
inconsistency; `Tls` in `meshTls` is not capitalised. Other resources
that we have (e.g. authentication resources) capitalise TLS (and so does
Go, it follows a similar naming convention).

We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.

Upgrading the control plane version will continue to work without
downtime. However, if an existing resource exists, the policy controller
will not completely initialise. It will not enter a crashloop backoff,
but it will also not become ready until the resource is edited or
deleted.

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-20 11:00:13 -08:00
Alex Leong 1d0f484a20
Limit service publisher updates from server by namespace (#12040)
A Server may only select workloads in its own namespace. Therefore, when the
destination controller receives an update for a Server, it only needs to
potentially send updates to watches on workloads in that same namespace. Taking
this into account allows us to avoid all opaqueness computations for workloads in
other namespaces.
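
A minimal sketch of the namespace short-circuit, with a hypothetical `notifyServerUpdate` helper and publisher map standing in for the real servicePublisher machinery:

```go
package main

import "fmt"

// ServiceID identifies a service publisher; Namespace is what matters here.
type ServiceID struct {
	Namespace string
	Name      string
}

// notifyServerUpdate only touches publishers in the Server's namespace,
// since a Server can never select workloads outside it.
func notifyServerUpdate(publishers map[ServiceID]func(), serverNamespace string) {
	for id, update := range publishers {
		if id.Namespace != serverNamespace {
			continue // different namespace: no opaqueness recomputation needed
		}
		update()
	}
}

func main() {
	pubs := map[ServiceID]func(){
		{Namespace: "emojivoto", Name: "web-svc"}: func() { fmt.Println("recompute emojivoto/web-svc") },
		{Namespace: "default", Name: "other"}:     func() { fmt.Println("recompute default/other") },
	}
	notifyServerUpdate(pubs, "emojivoto")
}
```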

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-08 17:00:50 +00:00
Alejandro Pedraza 8f8bd8f28f
Fix race condition in Destination's endpoints watcher (#12022)
Fixes #12010

## Problem

We're observing crashes in the destination controller in some scenarios, due to data race as described in #12010.

## Cause

The problem is that the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309) and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines, and the map is not guarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue.

## Solution

Make a shallow copy of `pp.addresses` in the endpoints watcher and only pass that to the listeners. It's a shallow copy because we avoid making copies of the pod reference in there, knowing it won't get mutated.
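
A minimal sketch of the shallow copy, with reduced stand-in types (only the `Pod` pointer is shared between the original map and the copy):

```go
package main

import "fmt"

// Pod stands in for the *corev1.Pod reference stored on each address.
type Pod struct{ Name string }

// Address is a reduced version of the watcher's Address for this sketch.
type Address struct {
	IP  string
	Pod *Pod
}

// shallowCopy duplicates the map so listeners can iterate it safely while the
// watcher keeps mutating its own copy; the Pod pointers are shared on purpose
// because they are never mutated.
func shallowCopy(addresses map[string]Address) map[string]Address {
	out := make(map[string]Address, len(addresses))
	for id, addr := range addresses {
		out[id] = addr
	}
	return out
}

func main() {
	orig := map[string]Address{"10.42.0.34:80": {IP: "10.42.0.34", Pod: &Pod{Name: "web"}}}
	copied := shallowCopy(orig)
	delete(orig, "10.42.0.34:80") // mutating the original no longer races with readers of the copy
	fmt.Println(len(copied))      // 1
}
```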

## Repro

Install the Linkerd core, inject emojivoto, and patch the endpoint translator to include a sleep call that helps surface the race (don't install the patch in the cluster; we'll only use it locally below):

<details>
  <summary>endpoint_translator.go diff</summary>

```diff
diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go
index d1018d5f9..7d5abd638 100644
--- a/controller/api/destination/endpoint_translator.go
+++ b/controller/api/destination/endpoint_translator.go
@@ -5,6 +5,7 @@ import (
        "reflect"
        "strconv"
        "strings"
+       "time"

        pb "github.com/linkerd/linkerd2-proxy-api/go/destination"
        "github.com/linkerd/linkerd2-proxy-api/go/net"
@@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) {
 }

 func (et *endpointTranslator) add(set watcher.AddressSet) {
        for id, address := range set.Addresses {
+               time.Sleep(1 * time.Second)
                et.availableEndpoints.Addresses[id] = address
        }
```
</details>

Then create these two Server manifests:

<details>
  <summary>emoji-web-server.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: HTTP/1
```
</details>

<details>
  <summary>emoji-web-server-opaque.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: opaque
```
</details>

In separate consoles run the patched destination service and a destination client:

```bash
HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
```

```bash
go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80
```

And run this to continuously switch the `proxyProtocol` field:

```bash
while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done
```

You'll see the following data race report in the Destination controller logs:

<details>
  <summary>destination logs</summary>

```console
==================
WARNING: DATA RACE
Write at 0x00c0006d30e0 by goroutine 178:
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm()
      <autogenerated>:1 +0x64
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/controller.go:246 +0x81
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x1f
  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:970 +0x1f4
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:226 +0x41
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:227 +0xbe
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:204 +0x10a
  k8s.io/apimachinery/pkg/util/wait.Until()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:161 +0x9b
  k8s.io/client-go/tools/cache.(*processorListener).run()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:966 +0x38
  k8s.io/client-go/tools/cache.(*processorListener).run-fm()
      <autogenerated>:1 +0x33
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/wait.go:72 +0x86

Previous read at 0x00c0006d30e0 by goroutine 360:
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45
```
</details>

## Extras

This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
2024-02-07 12:17:24 -05:00
Alex Leong 53287d40a8
Reflect changes to server selector in opaqueness (#12031)
Fixes #11995

If a Server is marking a Pod's port as opaque and then the Server's podSelector is updated to no longer select that Pod, then the Pod's port should no longer be marked as opaque. However, this update does not result in any updates from the destination API's Get stream and the port remains marked as opaque.

We fix this by updating the endpoint watcher's handling of Server updates to consider both the old and the new Server.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-05 12:25:16 -08:00
Matei David 3313bec072
Process ExternalWorkload targetRefs when updating slices (#11987)
If the readiness of an external workload endpoint changes while traffic
is being sent to it, the update will not be propagated to clients. This
can lead to issues where an endpoint that is marked as `notReady`
continues to be reported as `ready` by the endpoints watcher.

The issue stems from how endpoint slices are diffed. A utility function
responsible for processing addresses does not consider endpoints whose
targetRef is an external workload. We fix the problem and add two module
tests to validate readiness is propagated to clients correctly.

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-26 12:01:50 +00:00
Alex Leong 65f13de2ce
Add support for ExternalWorkloads in endpoint profiles (#11952)
When a meshed client attempts to establish a connection directly to the workload IP of an ExternalWorkload, the destination controller should return an endpoint profile for that ExternalWorkload with a single endpoint and the metadata associated with that ExternalWorkload including:
* mesh TLS identity
* workload metric labels
* opaque / protocol hints

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-23 09:43:12 -08:00
Zahari Dichev 027d49a9a6
discovery: handle endpoint slices from ExternalWorkload (#11939)
This alters the endpoints slices watcher to handle slices that reference ExternalWorkloads.

Testing
Add the following resources: 

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
addressType: IPv4
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    kubernetes.io/service-name: test-1
endpoints:
- addresses:
  - 172.21.0.5
  conditions:
    ready: true
    serving: true
    terminating: false
  targetRef:
    kind: ExternalWorkload
    name: my-external-workload
ports:
- port: 8080
  name: http
---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    app: test
spec:
  meshTls:
    identity: "test"
    serverName: "test"
  workloadIPs:
  - ip: 172.21.0.5
  ports:
  - port: 8080
    name: http
---
apiVersion: v1
kind: Service
metadata:
  name: test-1
  namespace: mixed-env
spec:
  selector:
    app: test
  type: ClusterIP
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    protocol: TCP

```

Observe endpoints:
```
linkerd dg endpoints test-1.mixed-env.svc.cluster.local:8080
```

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-17 15:43:20 -08:00
Zahari Dichev 391ce919f5
policy: regenerate Server go bindings (#11920)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-15 11:09:31 +02:00
Alex Leong 27a1a84a48
Only send server updates to listeners when the opaque protocol changes (#11907)
Whenever the destination controller's informer receives an update of a Server resource, it checks every portPublisher in the endpointsWatcher to see if the Server selects any pods in that servicePort and updates those pods' opaque protocol field.  Regardless of whether any pods were matched or whether the opaque protocol changed, an update is sent to each listener.  This results in an update to every endpointTranslator each time a Server is updated.  During a resync, we get an update for every Server in the cluster which results in N updates to each endpointTranslator where N is the number of Servers in the cluster.

If N is greater than 100, it becomes possible that these N updates could overflow the endpointTranslator update queue if the queue is not being drained fast enough.

We change this to only send the update for a Server if at least one of the servicePort addresses was selected by that server AND its opaque protocol field changed.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-10 14:18:36 -08:00
Oliver Gould 5e558ae3e5
destination: Add optional experimental endpoint weighting (#11795)
This change adds a runtime flag to the destination controller,
--experimental-endpoint-zone-weights=true, that causes endpoints in the
local zone to receive higher weights. This feature is disabled by
default, since the weight value is not honored by proxies. No helm
configuration is exposed yet, either.

This weighting is instrumented in the endpoint translator. Tests are
added to confirm that the behavior is feature-gated.

Additionally, this PR adds the "zone" metric label to endpoint metadata
responses.
2023-12-20 13:11:30 -08:00
Alex Leong 1e605ddf63
Add informer lag histograms (#11534)
In order to detect if the destination controller's k8s informers have fallen behind, we add a histogram for each resource type.  These histograms track the delta between when an update to a resource occurs and when the destination controller processes that update.  We do this by looking at the timestamps on the managed fields of the resource and looking for the most recent update and comparing that to the current time.

The histogram metrics are of the form `{kind}_informer_lag_ms_bucket_*`.

* We record a value only for updates, not for adds or deletes.  This is because when the controller starts up, it will populate its cache with an add for each resource in the cluster and the delta between the last updated time of that resource and the current time may be large.  This does not represent informer lag and should not be counted as such.
* When the informer performs resyncs, we get updates where the updated time of the old version is equal to the updated time of the new version.  This does not represent an actual update of the resource itself and so we do not record a value.
* Since we are comparing timestamps set on the managed fields of resources to the current time from the destination controller's system clock, the accuracy of these metrics depends on clock drift being minimal across the cluster.
* We use histogram buckets which range from 500ms to about 17 minutes.  In my testing, an informer lag of 500ms-1000ms is typical.  However, we wish to have enough buckets to identify cases where the informer is lagged significantly behind.
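
A minimal sketch of one such histogram for a single kind, assuming a hypothetical `recordLag` helper; the real metrics are registered per resource kind in the controller:

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// serviceInformerLag is a sketch of one per-kind histogram; buckets double
// from 500ms up to roughly 17 minutes, as described above.
var serviceInformerLag = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "service_informer_lag_ms",
	Help:    "Delta between a Service's last managed-fields update and when the informer handled it.",
	Buckets: prometheus.ExponentialBuckets(500, 2, 12), // 500ms .. ~17min, in ms
})

// recordLag observes the lag for an update, skipping resyncs where the
// newest managed-fields timestamp did not move.
func recordLag(oldUpdated, newUpdated metav1.Time, now time.Time) {
	if !newUpdated.After(oldUpdated.Time) {
		return // resync, not a real update
	}
	lag := now.Sub(newUpdated.Time)
	serviceInformerLag.Observe(float64(lag.Milliseconds()))
}

func main() {
	prometheus.MustRegister(serviceInformerLag)
	earlier := metav1.NewTime(time.Now().Add(-2 * time.Second))
	later := metav1.NewTime(time.Now().Add(-700 * time.Millisecond))
	recordLag(earlier, later, time.Now())
	fmt.Println("recorded informer lag sample")
}
```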

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-08 14:56:20 -05:00
Alejandro Pedraza 65ddba4e5d
dst: Update `GetProfile`'s stream when pod associated to HostPort lookup changes (#11334)
Followup to #11328

Implements a new pod watcher, instantiated alongside the other ones in the Destination server. It also watches on Servers and carries all the logic from ServerWatcher, which has now been decommissioned.

The `CreateAddress()` function has been moved into a function of the PodWatcher, because now we're calling it on every update, given that the pod associated with an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duplicated in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` also got moved for similar reasons.

Other things to note about PodWatcher:

- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` is due to only sending stream updates when there are changes in the pod's readiness, to avoid sending duplicate messages on every pod lifecycle event.

Finally, since endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) is no longer a static object, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
2023-09-28 08:57:52 -05:00
Alex Leong 368b63866d
Add support for remote discovery (#11224)
Adds support for remote discovery to the destination controller.

When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster.  The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name.  It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster.  It then streams back the endpoint data for that remote service.

Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.

---------

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-08-11 09:31:45 -07:00
Matei David c2780adde1
Introduce an EndpointsWatcher cache structure (#11190)
Currently, the control plane does not support indexing and discovering resources across cluster boundaries. In a multicluster set-up, it is desirable to have access to endpoint data (and by extension, any configuration associated with that endpoint) to support pod-to-pod communication. Linkerd's destination service should be extended to support reading discovery information from multiple sources (i.e. API Servers).

This change introduces an `EndpointsWatcher` cache. On creation, the cache registers a pair of event handlers:
* One handler to read `multicluster` secrets that embed kubeconfig files. For each such secret, the cache creates a corresponding `EndpointsWatcher` to read remote discovery information.
* Another handler to evict entries from the cache when a `multicluster` secret is deleted.

Additionally, a set of tests have been added that assert the behaviour of the cache when secrets are created and/or deleted. The cache will be used by the destination service to do look-ups for services that belong to another cluster, and that are running in a "remote discovery" mode.

---------

Signed-off-by: Matei David <matei@buoyant.io>
2023-08-04 13:59:21 -07:00
Alex Leong f7f1a95081
Add mutex lock to updateServer (#11169)
Fixes #11163

The `servicePublisher.updateServer` function will iterate through all registered listeners and update them.  However, a nil listener may temporarily be in the list of listeners if an unsubscribe is in progress.  This results in a nil pointer dereference.

All functions which result in updating the listeners must therefore be protected by the mutex so that we don't try to act on the list of listeners while it is being modified.
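
A minimal sketch of the locking pattern, with a pared-down `servicePublisher` whose listeners are plain functions:

```go
package main

import (
	"fmt"
	"sync"
)

// servicePublisher is a reduced sketch showing the locking pattern: every
// method that reads or writes listeners takes the same mutex.
type servicePublisher struct {
	mu        sync.Mutex
	listeners []func(string)
}

func (sp *servicePublisher) subscribe(l func(string)) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	sp.listeners = append(sp.listeners, l)
}

func (sp *servicePublisher) unsubscribe(i int) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	sp.listeners = append(sp.listeners[:i], sp.listeners[i+1:]...)
}

// updateServer iterates listeners under the lock so it can never observe the
// slice mid-modification (the source of the nil dereference).
func (sp *servicePublisher) updateServer(msg string) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	for _, l := range sp.listeners {
		l(msg)
	}
}

func main() {
	sp := &servicePublisher{}
	sp.subscribe(func(msg string) { fmt.Println("got:", msg) })
	sp.updateServer("opaqueness changed")
}
```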

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-07-27 11:17:53 -07:00
Mark Robinson 21209955c2
Fix bug where topology routing would not disable while service was under load. (#10925)
Add support for enabling and disabling topology aware routing when hints are added/removed.

The testing setup is very involved because there are so many moving parts:

1) Set up a service which is layered over several availability zones.
1a) The best way to do this is one service object, with 3 replicasets explicitly forced to use a specific AZ each.
2) Add the `service.kubernetes.io/topology-aware-hints: Auto` annotation to the Service object.
3) Use a load tester like k6 to send meaningful traffic to your service, but only in one AZ.
4) Scale up your replica sets until k8s adds Hints to your endpointslices.
5) Observe that traffic shifts to only hit pods in one AZ.

6) Turn down the replicaset count until K8s removes the hints from your endpointslices.
7) Observe that traffic shifts back to all pods across all AZs.
2023-05-26 10:31:14 -07:00
Oliver Gould c657aeabe8
destination: Split GetProfile into smaller functions (#10514)
Before changing any GetProfile behavior, this change splits the API
handler into some smaller scopes. This helps to clarify control flow and
reduce nested contexts. This change also adds relevant fields to log
contexts to improve diagnostics.
2023-03-13 11:36:39 -07:00
Yannick Utard 95a3d19b3f
Fix deleteEndpointSlice when there are multiple EndpointSlices (#10370)
When processing a `delete` event for an EndpointSlice, regardless of the outcome (whether there are still addresses/endpoints alive or whether we have no endpoints left) we make a call to `noEndpoints` on the subscriber. This will send a `Destination.Get` update to the listeners to advertise a `NoEndpoints` event.

If there are still addresses left, NoEndpoints { exists: true } will be sent (to signal the service still exists). Until an update is registered (i.e. a new add) there will be no available endpoints -- this is incorrect, since other EndpointSlices may exist. This change fixes the problem by handling `noEndpoints` in a more specialized way for EndpointSlices.

Signed-off-by: Yannick Utard <yannickutard@gmail.com>
2023-02-22 13:35:02 +00:00
Alejandro Pedraza 4a84f2cb32
Implement the k8s metadata API in the Destination controller (#10326)
Fixes #9986

After reviewing the k8s API calls in Destination, it was concluded we
could only swap out the calls to the Node and RS resources to use the
metadata API, as all the other resources (Endpoints, EndpointSlices,
Services, Pod, ServiceProfiles, Server) required fields other than those
found in their metadata section.

This also required completing the `NewFakeAPI` implementation by adding
the missing annotations and labels entries.

## Testing Memory Consumption

The gains here aren't as big as in #9650. In order to test this we need
to push hard and create 4000 RS:

``` bash
for i in {0..4000}; do kubectl create deployment test-pod-$i --image=nginx; done
```

In edge-23.2.1 the destination pod's memory consumption goes from 40Mi
to 160Mi after all the RS were created. With this change, it went from
37Mi to 140Mi.
2023-02-13 17:30:07 -05:00
Alejandro Pedraza 147c8dc07c
Add metrics to server and service watchers (#10213)
* Add metrics to server and service watchers

Closes #10202 and completes #2204

As a followup to #10201, I'm adding the following metric in `server_watcher.go`:

- `server_port_subscribers`: This tracks the number of subscribers to changes to Servers associated with a port in a pod. The metric's labels identify the namespace and name of the pod, and its targeted port.

Additionally, `opaque_ports.go` was missing metrics as well. I added `service_subscribers` which tracks the number of subscribers to a given Service, labeled by the Service's namespace and name.

`opaque_ports.go` was also leaking the subscriber's map key, so that got fixed as well.
2023-02-07 08:51:09 -05:00
Yu Cao e662e147ca
Support service internal traffic policy (#10186)
Closes #10130

https://kubernetes.io/docs/concepts/services-networking/service-traffic-policy/
1. Update the endpoints watcher to include an additional field `localTrafficPolicy`.
   Set to true when `.spec.internalTrafficPolicy` is set to `Local`
2. Update the endpoints translator to filter by node when `localTrafficPolicy` is
   set to true. Topology Aware Hints are not used when
   `service.spec.internalTrafficPolicy` is set to local (see the sketch below)
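
A minimal sketch of the node filter mentioned in step 2, using a hypothetical `filterLocal` helper and a reduced `Address` type:

```go
package main

import "fmt"

// Address is reduced to the fields needed for this sketch.
type Address struct {
	IP       string
	NodeName string
}

// filterLocal drops addresses on other nodes when the Service has
// internalTrafficPolicy: Local; otherwise it returns everything.
func filterLocal(addrs []Address, localTrafficPolicy bool, nodeName string) []Address {
	if !localTrafficPolicy {
		return addrs
	}
	var out []Address
	for _, a := range addrs {
		if a.NodeName == nodeName {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	addrs := []Address{
		{IP: "10.42.0.12", NodeName: "node-a"},
		{IP: "10.42.1.7", NodeName: "node-b"},
	}
	fmt.Println(filterLocal(addrs, true, "node-a")) // only the node-a endpoint
}
```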

Signed-off-by: Yu Cao <yc185050@ncr.com>
2023-02-06 13:53:07 -07:00
dependabot[bot] 62d6d7cd52
build(deps): bump sigs.k8s.io/gateway-api from 0.5.1 to 0.6.0 (#10038)
* build(deps): bump sigs.k8s.io/gateway-api from 0.5.1 to 0.6.0

Bumps [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api) from 0.5.1 to 0.6.0.
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/CHANGELOG.md)
- [Commits](https://github.com/kubernetes-sigs/gateway-api/compare/v0.5.1...v0.6.0)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Account for possible errors returned from `AddEventHandler`

In v0.26.0 client-go's `AddEventHandler` method for informers started
returning a registration handle (that we ignore) and an error that we
now surface up.

* client-go v0.26.0 removed the openstack plugin

* Temporary changes to trigger tests in k8s 1.21

- Adds an innocuous change to integration.yml so that all tests get
  triggered
- Hard-code k8s version in `k3d cluster create` invocation to v1.21

* Revert "Temporary changes to trigger tests in k8s 1.21"

This reverts commit 3e1fdd0e5e.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2023-01-16 09:38:09 -05:00
Alex Leong 768e04dd7e
Update endpoints watcher to not fetch pods for removed endpoints (#10013)
Fixes #10003

When endpoints are removed from an EndpointSlice resource, the destination controller builds a list of addresses to remove.  However, if any of the removed endpoints have a Pod as their targetRef, we will attempt to fetch that pod to build the address to remove.  If that pod has already been removed from the informer cache, this will fail and the endpoint will be skipped in the list of endpoints to be removed.  This results in stale endpoints being stuck in the address set and never being removed.

We update the endpoint watcher to construct only a list of endpoint IDs for endpoints to remove, rather than fetching the entire pod object.  Since we no longer attempt to fetch the pod, this operation is now infallible and endpoints will no longer be skipped during removal.
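
A minimal sketch of building the removal list from targetRefs alone, assuming a hypothetical `ID` key type; no pod fetch is involved, so the step cannot fail:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	discovery "k8s.io/api/discovery/v1"
)

// ID identifies an address in the watcher's address set; a sketch of a key
// built purely from the endpoint itself.
type ID struct {
	Namespace string
	Name      string
}

// endpointsToRemove collects IDs for endpoints backed by pods, using only the
// targetRef so removal cannot fail when the pod is already gone from the cache.
func endpointsToRemove(namespace string, endpoints []discovery.Endpoint) []ID {
	var ids []ID
	for _, ep := range endpoints {
		if ep.TargetRef == nil || ep.TargetRef.Kind != "Pod" {
			continue
		}
		ids = append(ids, ID{Namespace: namespace, Name: ep.TargetRef.Name})
	}
	return ids
}

func main() {
	// Illustrative endpoint; the pod name is hypothetical.
	eps := []discovery.Endpoint{
		{TargetRef: &corev1.ObjectReference{Kind: "Pod", Name: "web-5c9f"}},
	}
	fmt.Println(endpointsToRemove("emojivoto", eps))
}
```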

We also add a `TestEndpointSliceScaleDown` test to exercise this.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-03 10:04:02 -08:00
Jacob Henner 7d47639608
Remove kube-system exclusions from watchers (#8720)
Watch events for objects in the kube-system namespace were previously ignored.
In certain situations, this would cause the destination service to return
invalid (outdated) endpoints for services in kube-system - including unmeshed
services.

It [was suggested][1] that kube-system events were ignored to avoid handling
frequent Endpoint updates - specifically from [controllers using Endpoints for
leader elections][2]. As of Kubernetes 1.20, these controllers [default to using
Leases instead of Endpoints for their leader elections][3], obviating the need
to exclude (or filter) updates from kube-system. The exclusions have been
removed accordingly.

[1]: https://github.com/linkerd/linkerd2/pull/4133#issuecomment-594983588
[2]: https://github.com/kubernetes/kubernetes/issues/86286
[3]: https://github.com/kubernetes/kubernetes/pull/94603

Signed-off-by: Jacob Henner <code@ventricle.us>
2022-07-11 13:52:27 -06:00
Alex Leong b7a0b8adb4
Bump minimum kubernetes version to 1.21 (#8647)
Fixes #8592

Increase the minimum supported kubernetes version from 1.20 to 1.21.  This allows us to drop support for batch/v1beta1/CronJob and discovery/v1beta1/EndpointSlices, instead using only v1 of those resources.  This fixes the deprecation warnings about these resources that were printed by the CLI.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-06-14 15:15:28 -07:00
Alex Leong c5963fbbb1
Set targetCluster label even when serviceFqn is not set (#8542)
Fixes #8134

The multicluster probe-gateway service (which is used by the service-mirror controller to send health probes to the remote gateway) has `mirror.linkerd.io/cluster-name` label but it does not have a `mirror.linkerd.io/remote-svc-fq-name` annotation.  This makes sense because the probe-gateway service does not correspond to any target service on the remote cluster and instead targets the gateway itself.

We update the logic in the destination controller so that we can still add the `dst_target_cluster` metric label when only the `mirror.linkerd.io/cluster-name` label is present but not the `mirror.linkerd.io/remote-svc-fq-name` annotation.  This allows us to have the `dst_target_cluster` metric label for service-mirror controller probe traffic.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-05-31 16:09:51 -07:00
Oliver Gould f5876c2a98
go: Enable `errorlint` checking (#7885)
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.

This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).

[el]: https://github.com/polyfloyd/go-errorlint

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-16 18:32:19 -07:00
Oliver Gould 7776a50074
destination: Use net.JoinHostPort to format names (#7821)
We use `fmt.Sprintf` to format URIs in several places we could be using
`net.JoinHostPort`. `net.JoinHostPort` ensures that IPv6 addresses are
properly escaped in URIs.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-07 15:05:14 -08:00
Kevin Leimkuhler 147d85dc70
Update `GetProfile` clients with policy server updates (#7388)
### What

`GetProfile` clients do not receive destination profiles that consider Server protocol fields the way that `Get` clients do. If a Server exists for a `GetProfile` destination that specifies the protocol for that destination is `opaque`, this information is not passed back to the client.

#7184 added this for `Get` by subscribing clients to Endpoint/EndpointSlice updates. When there is an update, or there is a Server update, the endpoints watcher passes this information back to the endpoint translator which handles sending the update back to the client.

For `GetProfile` the situation is different. As with `Get`, we only consider Servers when dealing with Pod IPs, but this only occurs in two situations for `GetProfile`.

1. The destination is a Pod IP and port
2. The destination is an Instance ID and port

In both of these cases, we need to check if a Server already selects the endpoint, and we need to subscribe for Server updates in case one is added or deleted which selects the endpoint.

### How

First we check if there is already a Server which selects the endpoint. This is so that when the first destination profile is returned, the client knows if the destination is `opaque` or not.

After sending that first update, we then subscribe the client for any future updates which will come from a Server being added or deleted.

This is handled by the new `ServerWatcher` which watches for Server updates on the cluster; when an update occurs it sends that to the `endpointProfileTranslator` which translates the protocol update into a DestinationProfile.

By introducing the `endpointProfileTranslator`, which only handles protocol updates, we're able to decouple the endpoint logic from `profileTranslator`—its `endpoint` field has been removed now that it only handles updates for ServiceProfiles for Services.

### Testing

A unit test has been added and below are some manual testing instructions to see how it interacts with Server updates:

<details>
	<summary>app.yaml</summary>

	```yaml
	apiVersion: v1
	kind: Pod
	metadata:
	  name: pod
	  labels:
		app: pod
	spec:
	  containers:
	  - name: app
		image: nginx
		ports:
		  - name: http
			containerPort: 80
	---
	apiVersion: policy.linkerd.io/v1beta1
	kind: Server
	metadata:
	  name: srv
	  labels:
		policy: srv
	spec:
	  podSelector:
		matchLabels:
		  app: pod
	  port: 80
	  proxyProtocol: opaque
	```
</details>

```shell
$ go run ./controller/cmd/main.go destination
```

```shell
$ linkerd inject app.yaml |kubectl apply -f -
...
$ kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
pod    2/2     Running   0          53m   10.42.0.34   k3d-k3s-default-server-0   <none>           <none>
$ go run ./controller/script/destination-client/main.go -method getProfile -path 10.42.0.34:80
...
```

You can add/delete `srv` as well as edit its `proxyProtocol` field to observe the correct DestinationProfile updates.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2021-12-08 12:26:27 -07:00
Kevin Leimkuhler 01cbe616f1
Honor Server `proxyProtocol` in destination service `Get` with policy CRD APIs (#7184)
This change ensures that if a Server exists with `proxyProtocol: opaque` that selects an endpoint backed by a pod, that destination requests for that pod reflect the fact that it handles opaque traffic.

Currently, the only way that opaque traffic is honored in the destination service is if the pod has the `config.linkerd.io/opaque-ports` annotation. With the introduction of Servers though, users can set `server.Spec.ProxyProtocol: opaque` to indicate that if a Server selects a pod, then traffic to that pod's `server.Spec.Port` should be opaque. Currently, the destination service does not take this into account.

There is an existing change up that _also_ adds this functionality; it takes a different approach by creating a policy server client for each endpoint that a destination has. For `Get` requests on a service, the number of clients scales with the number of endpoints that back that service.

This change fixes that issue by instead creating a Server watch in the endpoint watcher and sending updates through to the endpoint translator.

The two primary scenarios to consider are

### A `Get` request for some service is streaming when a Server is created/updated/deleted
When a Server is created or updated, the endpoint watcher iterates through its endpoint watches (`servicePublisher` -> `portPublisher`) and if it selects any of those endpoints, the port publisher sends an update if the Server has marked that port as opaque.

When a Server is deleted, the endpoint watcher once again iterates through its endpoint watches and deletes the address set's `OpaquePodPorts` field—ensuring that updates have been cleared of Server overrides.

### A `Get` request for some service happens after a Server is created
When a `Get` request occurs (or new endpoints are added—they both take the same path), we must check if any of those endpoints are selected by some existing Server. If so, we have to take that into account when creating the address set.

This part of the change gives me a little concern as we first must get all the Servers on the cluster and then create a set of _all_ the pod-backed endpoints that they select in order to determine if any of these _new_ endpoints are selected.

## Testing
Right now this can be tested by starting up the destination service locally and running `Get` requests on a service that has endpoints selected by a Server

**app.yaml**
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  labels:
    app: pod
spec:
  containers:
  - name: app
    image: nginx
    ports:
    - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc
spec:
  selector:
    app: pod
  ports:
  - name: http
    port: 80
---
apiVersion: policy.linkerd.io/v1alpha1
kind: Server
metadata:
  name: srv
  labels:
    policy: srv
spec:
  podSelector:
    matchLabels:
      app: pod
  port: 80
  proxyProtocol: HTTP/1
```

```bash
$ go run controller/script/destination-client/main.go -path svc.default.svc.cluster.local:80
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-11-23 20:35:53 -07:00
Alex Leong ea5461f674
Fix identity overrides for endpoint slices (#7243)
When the `mirror.linkerd.io/remote-gateway-identity` and `mirror.linkerd.io/remote-svc-fq-name` annotations are set on an EndpointSlice object, the destination controller does not return the correct identity hints for that endpoint.

We fix an incorrect assignment to fix this.  We also fix some logic that can result in a nil pointer dereference instead of comparing empty strings.  We add a test case to exercise these.

Signed-off-by: Alex Leong <alex@buoyant.io>
2021-11-09 16:38:46 -08:00
Kevin Leimkuhler ebb1ee8c4c
Deprecate `topologyKeys` and add support for endpoint slices `Hints`. (#6698)
## background

In order to upgrade `client-go` and other related libraries to `v0.22.0`, we had to address the deprecation of the service's `TopologyKeys` field. This field and its related feature have been deprecated and superseded by [Topology Aware Hints](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/2433-topology-aware-hints/README.md).

The goal of topology aware hints is to provide a simpler way for users to prefer endpoints by basing decisions solely off the node's `topology.kubernetes.io/zone` label. If a node is in `zone-a`, then it should prefer endpoints that _should_ be consumed by clients in `zone-a`.

kube-proxy (and now the destination controller) know that an endpoint _should_ be consumed by clients in certain zones if its `Hints.ForZones` field is set with a zone value that matches that of the client.

For example, the endpoint slice controller may add the following hint to an endpoint:
```
- addresses: ["1.1.1.1"]
  zone: "zone-a"
  hints:
    zone: "zone-b"
```

The above endpoint is an endpoint that is located in `zone-a` but should be consumed by clients in `zone-b`.

## changes

Now that topological preference is not a concept, we can remove it from the `servicePublisher` and `portPublisher` structs. The fields were only there so that it could be populated down to individual addresses.

The `Hints` field is only present on endpoints that belong to an `EndpointSlice`, so use of this field is limited to the `endpointSliceToAddresses` function.

When endpoint slices are translated to an `AddressSet` now, for each address (endpoint) we make sure to copy the `Hints.ForZones` field if it is present. This field is only present if it's set by the endpoint slice controller and it has [several safeguards](https://kubernetes.io/docs/concepts/services-networking/topology-aware-hints/#safeguards).

After `endpointSliceToAddresses` has translated an endpoint slice into an `AddressSet` and updated the endpoint translator's `availableEndpoints`, filtering takes place and is the crux of this change.

For each potential address that we have to consider in `availableEndpoints`, we make sure to only return a set of addresses whose consumption zones (the zones in the `forZones` field) match the node's zone. That way, we only communicate with endpoints that have been labeled by the endpoint slice controller for the current node we're on.

This allows us to remove the ordering/hierarchy of topological regions and the special handling of the `*` value.
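
A minimal sketch of the zone filtering, with a reduced `Address` type and without modeling the endpoint slice controller's safeguards around partially-hinted slices:

```go
package main

import "fmt"

// Address carries the ForZones hints copied from its EndpointSlice endpoint.
type Address struct {
	IP       string
	ForZones []string
}

// filterByZone keeps only addresses hinted for the client's zone; if no
// address carries hints, everything is returned unchanged.
func filterByZone(addrs []Address, zone string) []Address {
	var filtered []Address
	hinted := false
	for _, a := range addrs {
		if len(a.ForZones) > 0 {
			hinted = true
		}
		for _, z := range a.ForZones {
			if z == zone {
				filtered = append(filtered, a)
				break
			}
		}
	}
	if !hinted {
		return addrs
	}
	return filtered
}

func main() {
	addrs := []Address{
		{IP: "1.1.1.1", ForZones: []string{"west-1a"}},
		{IP: "1.1.1.2", ForZones: []string{"west-1b"}},
	}
	fmt.Println(filterByZone(addrs, "west-1a")) // only 1.1.1.1
}
```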

## testing

I've added a unit test which creates an endpoint translator tied to a node in `west-1a` and asserts that it only handles updates for addresses that should be consumed by clients in `west-1a`.

Closes #6637

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-11-08 12:21:31 -07:00
Bart Peeters 6bf507ad32
Change log level for watchers in destination svc
Make destination info logs clearer by changing the log level of the watchers' log messages 'Establishing watch', 'Starting watch' and 'Stopping watch' from info to debug (#6917)

Signed-off-by: Bart Peeters <birtpeeters@hotmail.com>
2021-09-20 09:44:32 +01:00
Tarun Pothulapati fac28ff8a7
destination: Remove support for IP Queries in `Get` API (#6018)
* destination: Remove support for IP Queries in `Get` API

Fixes #5246

This PR updates the destination to report an error when `Get`
is called for IP Queries. As the issue mentions, The proxies
are not using this API anymore and it helps to simplify and
remove unnecessary logic.

This removes the relevant `IPWatcher` logic, along with
unit tests

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-04-21 12:40:40 +05:30
Kevin Leimkuhler 3f72c998b3
Handle pod lookups for pods that map to a host IP and host port (#5904)
This fixes an issue where pod lookups by host IP and host port fail even though
the cluster has a matching pod.

Usually these manifested as `FailedPrecondition` errors, but the messages were
too long and resulted in http/2 errors. This change depends on #5893 which fixes
that separate issue.

This changes how often those `FailedPrecondition` errors actually occur. The
destination service now considers pod host IPs and should reduce the frequency
of those errors.

Closes #5881 

---

Lookups like this happen when a pod is created with a host IP and host port set
in its spec. It still has a pod IP when running, but requests to
`hostIP:hostPort` will also be redirected to the pod. Combinations of host IP
and host Port are unique to the cluster and enforced by Kubernetes.

Currently, the destination service fails to find pods in this scenario because
we only keep an index of pods and their pod IPs, not pods and their host IPs.
To fix this, we now also keep an index of pods and their host IPs—if and only if
they have the host IP set.

Now when doing a pod lookup, we consider both the IP and the port. We perform
the following steps:

1. Do a lookup by IP in the pod podIP index
  - If only one pod is found then return it
2. 0 or more than 1 pods have the same pod IP
3. Do a lookup by IP in the pod hostIP index
  - If any number of pods were found, we know that IP maps to a node IP.
    Therefore, we search for a pod with a matching host Port. If one exists then
    return it; if not then there is no pod that matches `hostIP:port`
4. The IP does not map to a host IP
5. If multiple pods were found in `1`, then we know there are pods with
   conflicting podIPs and an error is returned
6. If no pods were found in `1` then there is no pod that matches `IP:port`
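
A minimal sketch of that lookup order, with hypothetical in-memory indexes standing in for the informer indexers:

```go
package main

import "fmt"

// Pod is reduced to what the lookup needs for this sketch.
type Pod struct {
	Name     string
	HostPort int
}

// lookupPod follows the order described above: pod IPs first, then host IPs
// combined with the host port.
func lookupPod(podIPIndex, hostIPIndex map[string][]Pod, ip string, port int) (*Pod, error) {
	byPodIP := podIPIndex[ip]
	if len(byPodIP) == 1 {
		return &byPodIP[0], nil
	}
	if byHostIP := hostIPIndex[ip]; len(byHostIP) > 0 {
		// The IP is a node IP: match on host port, or report no match.
		for i := range byHostIP {
			if byHostIP[i].HostPort == port {
				return &byHostIP[i], nil
			}
		}
		return nil, fmt.Errorf("no pod bound to %s:%d", ip, port)
	}
	if len(byPodIP) > 1 {
		return nil, fmt.Errorf("conflicting pod IPs for %s", ip)
	}
	return nil, fmt.Errorf("no pod found for %s:%d", ip, port)
}

func main() {
	// Illustrative data: one pod bound to a node IP and host port.
	hostIdx := map[string][]Pod{"172.18.0.2": {{Name: "tcp-echo-server", HostPort: 4445}}}
	pod, err := lookupPod(map[string][]Pod{}, hostIdx, "172.18.0.2", 4445)
	fmt.Println(pod, err)
}
```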

---

Aside from the additional IP watcher test being added, this can be tested with
the following steps:

1. Create a kind cluster. kind is required because its pods in `kube-system`
   have the same pod IPs; this is not the case with k3d: `bin/kind create cluster`
2. Install Linkerd with `4445` marked as opaque: `linkerd install --set
   proxy.opaquePorts="4445" |kubectl apply -f -`
3. Get the node IP: `kubectl get -o wide nodes`
4. Pull my fork of `tcp-echo`:

```
$ git clone https://github.com/kleimkuhler/tcp-echo
...
$ git checkout --track kleimkuhler/host-pod-repro
```

5. `helm package .`
6. Install `tcp-echo` with the server not injected and correct host IP: `helm
   install tcp-echo tcp-echo-0.1.0.tgz --set server.linkerdInject="false" --set
   hostIP="..."`
7. Looking at the client's proxy logs, you should not observe any errors or
   protocol detection timeouts.
8. Looking at the server logs, you should see all the requests coming through
   correctly.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-03-18 13:29:43 -04:00
Riccardo Freixo 66b89f55e7
Fix named port resolution mid roll out (#5912) (#5911)
# Problem

While rolling out, often not all pods will be ready on the same set of
ports, leading the Kubernetes Endpoints API to return multiple subsets,
each covering a different set of ports, with the end result that the
same address gets repeated across subsets.

The old code for endpointsToAddresses would loop through all subsets, and the
later occurrences of an address would overwrite previous ones, with the
last one prevailing.

If the last subset happened to be for an irrelevant port, and the port to
be resolved is named, resolveTargetPort would resolve to port 0, which would
return port 0 to clients, ultimately leading linkerd-proxy to forward
connections to port 0.

This only happens if the pods selected by a service expose > 1 port, the
service maps to > 1 of these ports, and at least one of these ports is named.

# Solution

Never write an address to set of addresses if resolved port is 0, which
indicates named port resolution failed.
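
A minimal sketch of the guard, with a stand-in `resolveTargetPort` that returns 0 on failure, as described above:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// resolveTargetPort is a stand-in for named-port resolution: it returns 0
// when the named port is not exposed by this subset.
func resolveTargetPort(port string, subset corev1.EndpointSubset) int32 {
	for _, p := range subset.Ports {
		if p.Name == port {
			return p.Port
		}
	}
	return 0
}

// collectAddresses skips subsets where resolution failed, so a later,
// irrelevant subset can no longer overwrite a good address with port 0.
func collectAddresses(subsets []corev1.EndpointSubset, port string) map[string]int32 {
	out := map[string]int32{}
	for _, subset := range subsets {
		resolved := resolveTargetPort(port, subset)
		if resolved == 0 {
			continue // named port resolution failed for this subset
		}
		for _, addr := range subset.Addresses {
			out[addr.IP] = resolved
		}
	}
	return out
}

func main() {
	subsets := []corev1.EndpointSubset{
		{Addresses: []corev1.EndpointAddress{{IP: "10.42.0.9"}}, Ports: []corev1.EndpointPort{{Name: "http", Port: 8080}}},
		{Addresses: []corev1.EndpointAddress{{IP: "10.42.0.9"}}, Ports: []corev1.EndpointPort{{Name: "admin", Port: 9990}}},
	}
	fmt.Println(collectAddresses(subsets, "http")) // 10.42.0.9 keeps 8080
}
```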

# Validation

Added a test case.

Signed-off-by: Riccardo Freixo <riccardofreixo@gmail.com>
2021-03-17 17:40:11 -04:00
Kevin Leimkuhler 5dc662ae97
Remove namespace inheritance of opaque ports annotation (#5739)
This change removes the namespace inheritance of the opaque ports annotation.
Now when setting opaque port related fields in destination profile responses, we
only look at the pod annotations.

This prepares for #5736 where the proxy-injector will add the annotation from
the namespace if the pod does not have it already.

Closes #5735

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-02-15 10:21:20 -05:00
Oleh Ozimok c416e78261
destination: Fix crash when EndpointSlices are enabled (#5543)
The Destination controller can panic due to a nil-deref when
the EndpointSlices API is enabled.

This change updates the controller to properly initialize values
to avoid this segmentation fault.

Fixes #5521

Signed-off-by: Oleg Ozimok <oleg.ozimok@corp.kismia.com>
2021-01-15 12:52:11 -08:00
Alejandro Pedraza d3d7f4e2e2
Destination should return `OpaqueTransport` hint when annotation matches resolved target port (#5458)
The destination service now returns `OpaqueTransport` hint when the annotation
matches the resolve target port. This is different from the current behavior
which always sets the hint when a proxy is present.

Closes #5421

This happens by changing the endpoint watcher to set a pod's opaque port
annotation in certain cases. If the pod already has an annotation, then its
value is used. If the pod has no annotation, then it checks the namespace that
the endpoint belongs to; if it finds an annotation on the namespace then it
overrides the pod's annotation value with that.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
2021-01-05 14:54:55 -05:00
Tarun Pothulapati d0caaa86c4
Bump k8s client-go to v0.19.2 (#5002)
Fixes #4191 #4993

This bumps Kubernetes client-go to the latest v0.19.2 (we had to switch directly to 1.19 because of this issue). Bumping to v0.19.2 required upgrading to smi-sdk-go v0.4.1. This also depends on linkerd/stern#5

This consists of the following changes:

- Fix ./bin/update-codegen.sh by adding the template path to the gen commands, as it is needed after we moved to GOMOD.
- Bump all k8s related dependencies to v0.19.2
- Generate CRD types, client code using the latest k8s.io/code-generator
- Use context.Context as the first argument, in all code paths that touch the k8s client-go interface

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2020-09-28 12:45:18 -05:00
Matei David f797ab1e65
service topologies: topology-aware service routing (#4780)
[Link to RFC](https://github.com/linkerd/rfc/pull/23)

### What
---
* PR that puts together all past pieces of the puzzle to deliver topology-aware service routing, as specified in the [Kubernetes docs](https://kubernetes.io/docs/concepts/services-networking/service-topology/) but with a much better load balancing algorithm and all the coolness of linkerd :) 
* The first piece of this PR is focused on adding topology metadata: topology preference for services and topology `<k,v>` pairs for endpoints.
* The second piece of this PR puts together the new context format and fetching the source node topology metadata in order to allow for endpoints filtering.
* The final part is doing the filtering -- passing all of the metadata to the listener and on every `Add` filtering endpoints based on the topology preference of the service, topology `<k,v>` pairs of endpoints and topology of the source (again `<k,v>` pairs).

### How
---

* **Collecting metadata**:
   -  Services do not have values for topology keys -- the topological keys defined in a service's spec are only there to dictate locality preference for routing; as such, I decided to store them in an array. They will be taken exactly as they are found in the service spec, which ensures we respect the preference order.

   - For EndpointSlices, we are using a map -- an EndpointSlice has locality information in the form of `<k,v>` pair, where the key is a topological key (similar to what's listed in the service) and the value is the locality information -- e.g `hostname: minikube`. For each address we now have a map of topology values which gets populated when we translate the endpoints to an address set. Because normal Endpoints do not have any topology information, we create each address with an empty map which is subsequently populated ONLY for slices in the `endpointSliceToAddressSet` function.

* **Filtering endpoints**:
  - This was a tricky part and filled me with doubts. I think there are a few ways to do this, but this is how I "envisioned" it. First, the `endpoint_translator.go` should be the one to do the filtering; this means that on subscription, we need to feed all of the relevant metadata to the listener. To do this, I created a new function `AddTopologyFilter` as part of the listener interface.

  - To complement the `AddTopologyFilter` function, I created a new `TopologyFilter` struct in `endpoints_watcher.go`. I then embedded this structure in all listeners that implement the interface. The structure holds the source topology (source node), a boolean to tell if slices are activated in case we need to double check (or write tests for the function) and the service preference. We create the filter on Subscription -- we have access to the k8s client here as well as the service, so it's the best point to collect all of this data together. Addresses all have their own topology added to them so they do not have to be collected by the filter.

  - When we add a new set of addresses, we check to see if slices are enabled -- chances are if slices are enabled, service topology might be too. This lets us skip this step if the latest version is not adopted. Prior to sending an `Add` we filter the endpoints -- if the preference is registered by the filter we strictly enforce it, otherwise nothing changes.

And that's pretty much it. 

Signed-off-by: Matei David <matei.david.35@gmail.com>
2020-08-18 11:11:09 -07:00
Alex Leong a1543b33e3
Add support for service-mirror selectors (#4795)
* Add selector support

Signed-off-by: Alex Leong <alex@buoyant.io>

* Removed unused labels

Signed-off-by: Alex Leong <alex@buoyant.io>
2020-07-30 10:07:14 -07:00