Commit Graph

109 Commits

Author SHA1 Message Date
Matei David 98e38a66b6
Rename meshTls to meshTLS in ExternalWorkload CRD (#12098)
The ExternalWorkload resource we introduced has a minor naming
inconsistency; `Tls` in `meshTls` is not capitalised. Other resources
that we have (e.g. authentication resources) capitalise TLS (and so does
Go, which follows a similar naming convention).

We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.
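
For illustration, the corrected field serializes with a capitalised tag in the Go bindings; the struct below is a rough sketch with hypothetical type names, not the actual generated code:

```go
// Rough sketch of the v1beta1 spec bindings; type and field names are
// hypothetical, not the actual generated code.
package v1beta1

// MeshTLS carries the workload's mesh identity settings.
type MeshTLS struct {
	Identity   string `json:"identity"`
	ServerName string `json:"serverName"`
}

// ExternalWorkloadSpec now serializes the field as `meshTLS`
// (previously `meshTls`).
type ExternalWorkloadSpec struct {
	MeshTLS MeshTLS `json:"meshTLS"`
}
```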

Upgrading the control plane version will continue to work without
downtime. However, if an existing resource exists, the policy controller
will not completely initialise. It will not enter a crashloop backoff,
but it will also not become ready until the resource is edited or
deleted.

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-20 11:00:13 -08:00
Alex Leong 5aef29ace1
Update workload watcher and server tests to use EndpointSlices (#12054)
Fixes #12032

The Destination controller server tests test the destination server with `enableEndpointSlices=false`.  The default for this value is true, meaning that these tests do not test the default configuration.

We update the tests to test with `enableEndpointSlices=true` and update the corresponding mock kubernetes Endpoints resources to be EndpointSlices instead.  We also fix an instance where the workload watcher was using Endpoints even when in EndpointSlices mode.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-09 11:33:06 -08:00
Alex Leong 1d0f484a20
Limit service publisher updates from server by namespace (#12040)
A Server may only select workloads in its own namespace. Therefore, when the
destination controller receives an update for a Server, it only needs to
potentially send updates to watches on workloads in that same namespace. Taking
this into account allows us to avoid all opaqueness computations for workloads in
other namespaces.
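
Roughly, the gate looks like the sketch below (types and names are illustrative, not the actual watcher code):

```go
// A rough sketch of the namespace gate; types and names are illustrative,
// not the actual controller code.
package watcher

type workloadSubscription struct {
	namespace string
	// listeners, pod reference, etc. omitted
}

// recomputeOpaqueness stands in for the real opaqueness computation.
func recomputeOpaqueness(sub workloadSubscription) {}

// handleServerUpdate only revisits subscriptions in the Server's own
// namespace, since a Server cannot select workloads elsewhere.
func handleServerUpdate(serverNamespace string, subs []workloadSubscription) {
	for _, sub := range subs {
		if sub.namespace != serverNamespace {
			continue // this Server cannot affect the workload
		}
		recomputeOpaqueness(sub)
	}
}
```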

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-08 17:00:50 +00:00
Alejandro Pedraza 8f8bd8f28f
Fix race condition in Destination's endpoints watcher (#12022)
Fixes #12010

## Problem

We're observing crashes in the destination controller in some scenarios, due to data race as described in #12010.

## Cause

The problem is that the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309) and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211). These run in different goroutines and the map is not guarded by a lock. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue.

## Solution

Make a shallow copy of `pp.addresses` in the endpoints watcher and only pass that to the listeners. It's a shallow copy because we avoid making copies of the pod reference in there, knowing it won't get mutated.
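
A minimal sketch of the shallow copy, using simplified stand-ins for the watcher types:

```go
// Sketch: hand each listener its own copy of the address map so the
// informer goroutine can keep mutating the original safely.
// Types are simplified stand-ins for the real watcher structures.
package watcher

type ID struct{ Namespace, Name string }

type Address struct {
	IP   string
	Port uint32
	// Pod pointers etc. are shared (shallow copy), not cloned.
}

type AddressSet struct {
	Addresses map[ID]Address
}

func shallowCopy(set AddressSet) AddressSet {
	addrs := make(map[ID]Address, len(set.Addresses))
	for id, addr := range set.Addresses {
		addrs[id] = addr // copies the struct; nested pointers stay shared
	}
	return AddressSet{Addresses: addrs}
}
```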

## Repro

Install linkerd core and injected emojivoto, and patch the endpoint translator to include a sleep call that will help surface the race (don't install the patch in the cluster; we'll only use it locally below):

<details>
  <summary>endpoint_translator.go diff</summary>

```diff
diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go
index d1018d5f9..7d5abd638 100644
--- a/controller/api/destination/endpoint_translator.go
+++ b/controller/api/destination/endpoint_translator.go
@@ -5,6 +5,7 @@ import (
        "reflect"
        "strconv"
        "strings"
+       "time"

        pb "github.com/linkerd/linkerd2-proxy-api/go/destination"
        "github.com/linkerd/linkerd2-proxy-api/go/net"
@@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) {
 }

 func (et *endpointTranslator) add(set watcher.AddressSet) {
        for id, address := range set.Addresses {
+               time.Sleep(1 * time.Second)
                et.availableEndpoints.Addresses[id] = address
        }
```
</details>

Then create these two Server manifests:

<details>
  <summary>emoji-web-server.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: HTTP/1
```
</details>

<details>
  <summary>emoji-web-server-opaque.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: opaque
```
</details>

In separate consoles run the patched destination service and a destination client:

```bash
HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
```

```bash
go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80
```

And run this to continuously switch the `proxyProtocol` field:

```bash
while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done
```

You'll see the following data race report in the Destination controller logs:

<details>
  <summary>destination logs</summary>

```console
==================
WARNING: DATA RACE
Write at 0x00c0006d30e0 by goroutine 178:
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm()
      <autogenerated>:1 +0x64
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/controller.go:246 +0x81
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x1f
  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:970 +0x1f4
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:226 +0x41
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:227 +0xbe
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:204 +0x10a
  k8s.io/apimachinery/pkg/util/wait.Until()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:161 +0x9b
  k8s.io/client-go/tools/cache.(*processorListener).run()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:966 +0x38
  k8s.io/client-go/tools/cache.(*processorListener).run-fm()
      <autogenerated>:1 +0x33
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/wait.go:72 +0x86

Previous read at 0x00c0006d30e0 by goroutine 360:
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45
```
</details>

## Extras

This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
2024-02-07 12:17:24 -05:00
Alex Leong 53287d40a8
Reflect changes to server selector in opaqueness (#12031)
Fixes #11995

If a Server is marking a Pod's port as opaque and then the Server's podSelector is updated to no longer select that Pod, then the Pod's port should no longer be marked as opaque. However, this update does not result in any updates from the destination API's Get stream and the port remains marked as opaque.

We fix this by updating the endpoint watcher's handling of Server updates to consider both the old and the new Server.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-05 12:25:16 -08:00
Alex Leong 3902b339e7
Only process server updates on workloads affected by the server (#12017)
When the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload.  This is because the Server update may have changed opaqueness for that workload.  However, this is very CPU intensive for the destination controller, especially during resyncs when we get Server updates for every Server resource in the cluster.

Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server.  If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.
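
A rough sketch of the selection check, using the standard label-selector machinery (names are illustrative, not the actual watcher code):

```go
// Sketch: only recompute opaqueness for a workload if it is selected by
// the old or the new version of the Server. Illustrative only.
package watcher

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

func serverSelects(sel *metav1.LabelSelector, podLabels map[string]string) bool {
	selector, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		return false
	}
	return selector.Matches(labels.Set(podLabels))
}

// affectedByUpdate reports whether a Server update can possibly change
// the workload's opaqueness.
func affectedByUpdate(oldSel, newSel *metav1.LabelSelector, podLabels map[string]string) bool {
	return serverSelects(oldSel, podLabels) || serverSelects(newSel, podLabels)
}
```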

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-01 10:42:50 -08:00
Matei David 3313bec072
Process ExternalWorkload targetRefs when updating slices (#11987)
If the readiness of an external workload endpoint changes while traffic
is being sent to it, the update will not be propagated to clients. This
can lead to issues where an endpoint that is marked as `notReady`
continues to be regarded as `ready` by the endpoints watcher.

The issue stems from how endpoint slices are diffed. A utility function
responsible for processing addresses does not consider endpoints whose
targetRef is an external workload. We fix the problem and add two module
tests to validate readiness is propagated to clients correctly.
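
A rough sketch of the kind of targetRef check involved (simplified; not the actual diffing utility):

```go
// Sketch: when diffing EndpointSlice endpoints, consider targetRefs that
// point at ExternalWorkloads as well as Pods. Illustrative only.
package watcher

import discoveryv1 "k8s.io/api/discovery/v1"

// hasTrackedTargetRef reports whether an endpoint's targetRef points at a
// resource kind the watcher should process.
func hasTrackedTargetRef(ep discoveryv1.Endpoint) bool {
	if ep.TargetRef == nil {
		return false
	}
	switch ep.TargetRef.Kind {
	case "Pod", "ExternalWorkload":
		return true
	default:
		return false
	}
}
```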

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-26 12:01:50 +00:00
Matei David 9c902dc6b4
Add an endpoints reconciler component for external workloads (#11948)
We introduced an endpoints controller that will be responsible for
managing EndpointSlices for services that select external workloads. As a
follow-up, we introduce the reconciler component of the controller, which
will be responsible for doing the writes and diffing.

Additionally, the controller is wired-up in the destination service's
main routine and will start if endpoint slice support is enabled.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-24 16:55:16 +00:00
Alex Leong 65f13de2ce
Add support for ExternalWorkloads in endpoint profiles (#11952)
When a meshed client attempts to establish a connection directly to the workload IP of an ExternalWorkload, the destination controller should return an endpoint profile for that ExternalWorkload with a single endpoint and the metadata associated with that ExternalWorkload including:
* mesh TLS identity
* workload metric labels
* opaque / protocol hints

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-23 09:43:12 -08:00
Zahari Dichev 027d49a9a6
discovery: handle endpoint slices from ExternalWorkload (#11939)
This alters the endpoint slices watcher to handle slices that reference ExternalWorkloads.

Testing
Add the following resources: 

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
addressType: IPv4
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    kubernetes.io/service-name: test-1
endpoints:
- addresses:
  - 172.21.0.5
  conditions:
    ready: true
    serving: true
    terminating: false
  targetRef:
    kind: ExternalWorkload
    name: my-external-workload
ports:
- port: 8080
  name: http
---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    app: test
spec:
  meshTls:
    identity: "test"
    serverName: "test"
  workloadIPs:
  - ip: 172.21.0.5
  ports:
  - port: 8080
    name: http
---
apiVersion: v1
kind: Service
metadata:
  name: test-1
  namespace: mixed-env
spec:
  selector:
    app: test
  type: ClusterIP
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    protocol: TCP

```

Observe endpoints:
```
linkerd dg endpoints test-1.mixed-env.svc.cluster.local:8080
```

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-17 15:43:20 -08:00
Zahari Dichev 391ce919f5
policy: regenerate Server go bindings (#11920)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-15 11:09:31 +02:00
Alex Leong 27a1a84a48
Only send server updates to listeners when the opaque protocol changes (#11907)
Whenever the destination controller's informer receives an update of a Server resource, it checks every portPublisher in the endpointsWatcher to see if the Server selects any pods in that servicePort and updates those pods' opaque protocol field.  Regardless of whether any pods were matched or whether the opaque protocol changed, an update is sent to each listener.  This results in an update to every endpointTranslator each time a Server is updated.  During a resync, we get an update for every Server in the cluster which results in N updates to each endpointTranslator where N is the number of Servers in the cluster.

If N is greater than 100, it becomes possible that these N updates could overflow the endpointTranslator update queue if the queue is not being drained fast enough.

We change this to only send the update for a Server if at least one of the servicePort addresses was selected by that server AND its opaque protocol field changed.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-10 14:18:36 -08:00
Oliver Gould b0fee16fb4
destination: Fix GetProfile endpoint buffering (#11815)
In 71635cb and 357a1d3 we updated the endpoint and profile translators
to prevent backpressure from stalling the informer tasks. This change
updates the endpoint profile translator with the same fix, so that
updates are buffered and can detect when a gRPC stream is stalled.

Furthermore, the update method is updated to use a protobuf-aware
comparison method instead of using serialization and string comparison.
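
A minimal sketch of such a protobuf-aware comparison (the helper name is illustrative; the concrete message type is whatever profile message the translator sends):

```go
// Sketch of a protobuf-aware "did anything change?" check; semantic
// equality replaces comparing serialized strings.
package destination

import "google.golang.org/protobuf/proto"

// shouldSend reports whether the next update differs from the last one
// that was sent.
func shouldSend(last, next proto.Message) bool {
	return last == nil || !proto.Equal(last, next)
}
```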

A test is added for the endpoint profile translator, since none existed
previously.
2023-12-22 09:25:12 -08:00
Oliver Gould 5e558ae3e5
destination: Add optional experimental endpoint weighting (#11795)
This change adds a runtime flag to the destination controller,
--experimental-endpoint-zone-weights=true, that causes endpoints in the
local zone to receive higher weights. This feature is disabled by
default, since the weight value is not honored by proxies. No helm
configuration is exposed yet, either.

This weighting is instrumented in the endpoint translator. Tests are
added to confirm that the behavior is feature-gated.

Additionally, this PR adds the "zone" metric label to endpoint metadata
responses.
2023-12-20 13:11:30 -08:00
Alex Leong 1e605ddf63
Add informer lag histograms (#11534)
In order to detect if the destination controller's k8s informers have fallen behind, we add a histogram for each resource type.  These histograms track the delta between when an update to a resource occurs and when the destination controller processes that update.  We do this by looking at the timestamps on the managed fields of the resource and looking for the most recent update and comparing that to the current time.

The histogram metrics are of the form `{kind}_informer_lag_ms_bucket_*`.

* We record a value only for updates, not for adds or deletes.  This is because when the controller starts up, it will populate its cache with an add for each resource in the cluster and the delta between the last updated time of that resource and the current time may be large.  This does not represent informer lag and should not be counted as such.
* When the informer performs resyncs, we get updates where the updated time of the old version is equal to the updated time of the new version.  This does not represent an actual update of the resource itself and so we do not record a value.
* Since we are comparing timestamps set on the managed fields of resources to the current time from the destination controller's system clock, the accuracy of these metrics depends on clock drift being minimal across the cluster.
* We use histogram buckets which range from 500ms to about 17 minutes.  In my testing, an informer lag of 500ms-1000ms is typical.  However, we wish to have enough buckets to identify cases where the informer is lagged significantly behind.
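
A rough sketch of how the lag can be derived from managed fields (the metric name and bucket layout below are illustrative only):

```go
// Sketch: derive informer lag from the most recent managedFields update
// timestamp and record it in a histogram. Illustrative only.
package watcher

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var informerLag = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "example_informer_lag_ms",
	Help:    "Delta between a resource's last update and its processing.",
	Buckets: prometheus.ExponentialBuckets(500, 2, 12), // ~500ms up to ~17min
})

// recordLag observes the delta between the most recent managedFields
// update time and now. Callers would skip adds, deletes, and resyncs.
func recordLag(obj metav1.Object) {
	var latest time.Time
	for _, mf := range obj.GetManagedFields() {
		if mf.Time != nil && mf.Time.Time.After(latest) {
			latest = mf.Time.Time
		}
	}
	if latest.IsZero() {
		return
	}
	informerLag.Observe(float64(time.Since(latest).Milliseconds()))
}
```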

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-08 14:56:20 -05:00
Alex Leong 8c42a509d7
Return NotFound for unknown pod names (#11540)
Fixes #11065

When an inbound proxy receives a request with a canonical name of the form `hostname.service.namespace.svc.cluster.domain`, we assume that `hostname` is the hostname of the pod as described in https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields.  However, pods are also addressable with `pod-ip.service.namespace.svc.cluster.domain`.  When the destination controller gets a profile request of this form, we attempt to find a pod with a hostname of `pod-ip` and return an error with gRPC status `Unknown`, since such a pod will not exist.

It is expected that this profile lookup will fail since we cannot have service profiles for individual pods.  However, returning a gRPC status `Unknown` for these requests brings the reported success rate of the destination controller down.  Instead we should return these as gRPC status `NotFound` so that these responses don't get reported as server errors.
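
A minimal sketch of the status change using the standard gRPC status package (the message text is illustrative):

```go
// Sketch: report a missing pod lookup as NotFound rather than Unknown,
// so it is not counted as a server-side failure.
package destination

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

func podNotFound(hostname, service, namespace string) error {
	return status.Errorf(codes.NotFound,
		"no pod found with hostname %q for %s.%s", hostname, service, namespace)
}
```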

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-01 15:44:54 -07:00
Oliver Gould 8cd165a1e4
destination: Index pods and service with multiple IPs (#11515)
The destination controller indexes Pods by their primary PodIP and their
primary HostIP (when applicable); and it indexes Services by their
primary ClusterIP.

In preparation for dual-stack (IPv6) support, this change updates the
destination controller indexers to consume all IPs for these resources.
Tests are updated to demonstrate a dual-stack setup (where these
resources have a primary IPv4 address and a secondary IPv6 address).

While exercising these tests, I had to replace the brittle host:port
parsing logic we use in the server, in favor of Go's standard
`net.SplitHostPort` utility.
2023-10-23 14:55:33 -07:00
Alejandro Pedraza 65ddba4e5d
dst: Update `GetProfile`'s stream when pod associated to HostPort lookup changes (#11334)
Followup to #11328

Implements a new pod watcher, instantiated alongside the other ones in the Destination server. It also watches Servers and carries all the logic from ServerWatcher, which has now been decommissioned.

The `CreateAddress()` function has been moved into a function of the PodWatcher, because we now call it on every update, given that the pod associated with an ip:port might change and we need to regenerate the Address object. That function also takes care of capturing opaque protocol info from associated Servers, which is not new and had some logic that was duplicated in the now defunct ServerWatcher. `getAnnotatedOpaquePorts()` also got moved for similar reasons.

Other things to note about PodWatcher:

- It publishes a new pair of metrics `ip_port_subscribers` and `ip_port_updates` leveraging the framework in `prometheus.go`.
- The complexity in `updatePod()` comes from only sending stream updates when the pod's readiness changes, to avoid sending duplicate messages on every pod lifecycle event.

Finally, since the endpointProfileTranslator's `endpoint` (*pb.WeightedAddr) is no longer a static object, the `Update()` function now receives an Address that allows it to rebuild the endpoint on the fly (and so `createEndpoint()` was converted into a method of endpointProfileTranslator).
2023-09-28 08:57:52 -05:00
Alejandro Pedraza 4720714df2
Fixed destination metrics warning when re-linking (#11253)
Similar to #11246, the destination controller was complaining about
trying to register duplicate metrics for the api cache gauges when a given
target cluster got re-linked. This change unregisters the gauges for the
target cluster when said cluster is removed.

Supersedes #11252
2023-08-16 15:27:31 -07:00
Alex Leong 196cd49fe9
Add cluster store size gauge (#11256)
We add a `cluster_store_size` gauge to the destination controller to track the number of entries in the remote discovery cluster store. If this is ever different from the number of cluster credentials secrets in the linkerd namespace, this indicates that there is a problem with a link that needs to be investigated further.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-08-16 15:14:22 -07:00
Alex Leong 368b63866d
Add support for remote discovery (#11224)
Adds support for remote discovery to the destination controller.

When the destination controller gets a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` label, this is an indication that the destination controller should discover the endpoints for this service from a remote cluster.  The destination controller will look for a remote cluster which has been linked to it (using the `linkerd multicluster link` command) with that name.  It will look at the `multicluster.linkerd.io/remote-discovery` label for the service name to look up in that cluster.  It then streams back the endpoint data for that remote service.
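
A rough sketch of the gate for remote discovery (the label key is as described above; the lookup of the linked cluster's credentials and of the remote service name is omitted):

```go
// Sketch: decide whether a Service should be resolved via a linked remote
// cluster, based on the remote-discovery label. Simplified.
package destination

import corev1 "k8s.io/api/core/v1"

const remoteDiscoveryLabel = "multicluster.linkerd.io/remote-discovery"

// remoteDiscoveryCluster returns the linked cluster named by the label,
// and whether remote discovery should be used at all.
func remoteDiscoveryCluster(svc *corev1.Service) (string, bool) {
	cluster, ok := svc.Labels[remoteDiscoveryLabel]
	return cluster, ok && cluster != ""
}
```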

Since we now have multiple client-go informers for the same resource types (one for the local cluster and one for each linked remote cluster) we add a `cluster` label onto the prometheus metrics for the informers and EndpointWatchers to ensure that each of these components' metrics are correctly tracked and don't overwrite each other.

---------

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-08-11 09:31:45 -07:00
Matei David 62a8407420
Introduce small improvements to ClusterStore (#11220)
There were a few improvements we could have made to a recent change that
added a ClusterStore concept to the destination service. This PR
introduces the small improvements:

* Sync dynamically created clients in separate go routines
* Refactor metadata API creation
* Remove redundant check in cluster_store_test
* Create a new runtime schema every time a fake metadata API client is
  created, to avoid racy behaviour.

Signed-off-by: Matei David <matei@buoyant.io>
2023-08-08 17:58:49 -07:00
Alejandro Pedraza 69ecb7f10d
Replace `server_port_subscribers` metric (#11206)
Fixes #10764

Removed the `server_port_subscribers` gauge, as it wasn't distinguishing
amongst different pods, and the subscriber counts for different pods were
conflicting with one another when updating the metric (see more details
[here](https://github.com/linkerd/linkerd2/issues/10764#issuecomment-1650835823)).

Besides carrying an invalid value, this was generating the warning
`unable to delete server_port_subscribers metric with labels`

The metric was replaced with the `server_port_subscribes` and
`server_port_unsubscribes` counters, which track the overall number of
subscribes and unsubscribes to the particular pod port.

🌮 to @adleong for the diagnosis and the fix!
2023-08-08 11:28:51 -05:00
Matei David c2780adde1
Introduce an EndpointsWatcher cache structure (#11190)
Currently, the control plane does not support indexing and discovering resources across cluster boundaries. In a multicluster set-up, it is desirable to have access to endpoint data (and by extension, any configuration associated with that endpoint) to support pod-to-pod communication. Linkerd's destination service should be extended to support reading discovery information from multiple sources (i.e. API Servers).

This change introduces an `EndpointsWatcher` cache. On creation, the cache registers a pair of event handlers:
* One handler to read `multicluster` secrets that embed kubeconfig files. For each such secret, the cache creates a corresponding `EndpointsWatcher` to read remote discovery information.
* Another handler to evict entries from the cache when a `multicluster` secret is deleted (a rough sketch of this wiring follows below).
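
A rough sketch of that wiring (types and names are illustrative only):

```go
// Rough sketch of the cache's secret handlers: create a remote watcher when
// a multicluster credentials secret appears, evict it when the secret is
// deleted. Types and names are illustrative only.
package watcher

import "sync"

type remoteWatcher interface{ Stop() }

type clusterStore struct {
	mu       sync.Mutex
	watchers map[string]remoteWatcher // keyed by cluster (secret) name
}

func (cs *clusterStore) addCluster(name string, w remoteWatcher) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	cs.watchers[name] = w
}

func (cs *clusterStore) evictCluster(name string) {
	cs.mu.Lock()
	defer cs.mu.Unlock()
	if w, ok := cs.watchers[name]; ok {
		w.Stop()
		delete(cs.watchers, name)
	}
}
```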

Additionally, a set of tests have been added that assert the behaviour of the cache when secrets are created and/or deleted. The cache will be used by the destination service to do look-ups for services that belong to another cluster, and that are running in a "remote discovery" mode.

---------

Signed-off-by: Matei David <matei@buoyant.io>
2023-08-04 13:59:21 -07:00
Alex Leong f7f1a95081
Add mutex lock to updateServer (#11169)
Fixes #11163

The `servicePublisher.updateServer` function will iterate through all registered listeners and update them.  However, a nil listener may temporarily be in the list of listeners if an unsubscribe is in progress.  This results in a nil pointer dereference.

All functions which result in updating the listeners must therefore be protected by the mutex so that we don't try to act on the list of listeners while it is being modified.
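
A minimal sketch of the invariant being enforced (illustrative types, not the actual publisher):

```go
// Sketch: every read or mutation of the listener slice happens under the
// same mutex, so updates never observe a half-modified list.
package watcher

import "sync"

type listener interface{ Update() }

type servicePublisher struct {
	mu        sync.Mutex
	listeners []listener
}

func (sp *servicePublisher) unsubscribe(target listener) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	for i, l := range sp.listeners {
		if l == target {
			sp.listeners = append(sp.listeners[:i], sp.listeners[i+1:]...)
			break
		}
	}
}

func (sp *servicePublisher) updateServer() {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	for _, l := range sp.listeners {
		l.Update()
	}
}
```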

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-07-27 11:17:53 -07:00
Mark Robinson 21209955c2
Fix bug where topology routing would not disable while service was under load. (#10925)
Add support for enabling and disabling topology aware routing when hints are added/removed.

The testing setup is quite involved because there are so many moving parts:

1) Set up a service which is layered over several availability zones.
1a) The best way to do this is one service object, with 3 replicasets explicitly forced to use a specific AZ each.
2) Add the `service.kubernetes.io/topology-aware-hints: Auto` annotation to the Service object.
3) Use a load tester like k6 to send meaningful traffic to your service, but only in one AZ.
4) Scale up your replica sets until k8s adds Hints to your endpointslices.
5) Observe that traffic shifts to only hit pods in one AZ.
6) Turn down the replicaset count until K8s removes the hints from your endpointslices.
7) Observe that traffic shifts back to all pods across all AZs.
2023-05-26 10:31:14 -07:00
Alejandro Pedraza 33adae2b7c
Fix `server_port_subscribers` metric in Destination (#10820)
Fixes #10764

`GetProfile` streams create a `server_port_subscribers` gauge that tracks the number of listeners interested in a given Server. Because of an oversight, the gauge was only being registered once a second listener was added; for just one listener the gauge was absent. But whenever the `GetProfile` stream ended, the gauge was deleted, which resulted in this error if it wasn't registered to begin with:

```
level=warning msg="unable to delete server_port_subscribers metric with labels map[name:voting namespace:emojivoto port:4191]" addr=":8086" component=server
```

One can check that the gauge wasn't being created by installing viz and emojivoto, and checking the following returns empty:

```bash
$ linkerd diagnostics controller-metrics | grep server_port_subscribers
```

After this fix, one can see the metric getting populated:

```bash
$ linkerd diagnostics controller-metrics | grep server_port_subscribers

# HELP server_port_subscribers Number of subscribers to Server changes associated with a pod's port.
# TYPE server_port_subscribers gauge
server_port_subscribers{name="emoji",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="4191"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9990"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9995"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9996"} 1
server_port_subscribers{name="linkerd",namespace="linkerd",port="9997"} 1
server_port_subscribers{name="metrics",namespace="linkerd-viz",port="4191"} 1
server_port_subscribers{name="metrics",namespace="linkerd-viz",port="9995"} 1
server_port_subscribers{name="tap",namespace="linkerd-viz",port="4191"} 1
server_port_subscribers{name="tap",namespace="linkerd-viz",port="9995"} 1
server_port_subscribers{name="tap",namespace="linkerd-viz",port="9998"} 1
server_port_subscribers{name="vote",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="voting",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="web",namespace="emojivoto",port="4191"} 1
server_port_subscribers{name="web",namespace="linkerd-viz",port="4191"} 1
server_port_subscribers{name="web",namespace="linkerd-viz",port="9994"} 1
```

And when scaling down the voting deployment, one can see how the metric with `name="voting"` is removed.
2023-04-26 16:48:50 -05:00
Oliver Gould 08f97cc26f
destination: Avoid sending spurious profile updates (#10517)
The GetProfile API endpoint does not behave as expected: when a profile
watch is established, the API server starts two separate profile
watches--a primary watch with the client's namespace and a fallback watch
ignoring the client's namespace. These watches race to send data back to
the client. If the backup watch updates first, it may be sent to clients
before being corrected by a subsequent update. If the primary watch
updates with an empty value, the default profile may be served before
being corrected by an update to the backup watch.

From the proxy's perspective, we'd much prefer that the API provide a
single authoritative response when possible. It avoids needless
corrective work from being distributed across the system on every watch
initiation.

To fix this, we modify the fallbackProfileListener to behave
predictably: it only emits updates once both its primary and fallback
listeners have been updated. This avoids emitting updates based on a
partial understanding of the cluster state.
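
A minimal sketch of a listener that withholds output until both sources have reported (simplified; not the actual implementation):

```go
// Sketch: only emit once both the primary and the fallback watch have
// produced a value, preferring the primary when it is non-nil.
package destination

import "sync"

type profile struct{ /* fields omitted */ }

type fallbackListener struct {
	mu          sync.Mutex
	primary     *profile
	fallback    *profile
	primarySet  bool
	fallbackSet bool
	send        func(*profile)
}

func (l *fallbackListener) updatePrimary(p *profile) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.primary, l.primarySet = p, true
	l.publish()
}

func (l *fallbackListener) updateFallback(p *profile) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.fallback, l.fallbackSet = p, true
	l.publish()
}

// publish must be called with the lock held.
func (l *fallbackListener) publish() {
	if !l.primarySet || !l.fallbackSet {
		return // wait until both watches have reported
	}
	if l.primary != nil {
		l.send(l.primary)
		return
	}
	l.send(l.fallback)
}
```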

Furthermore, the opaquePortsAdaptor is updated to avoid synthesizing a
default serviceprofile (surprising behavior) and, instead, this
defaulting logic is moved into a dedicated defaultProfileListener
helper. A dedupProfileListener is added to squelch obviously redundant
updates.

Finally, this newfound predictability allows us to simplify the API's
tests. Many of the API tests are not clear in what they test and
sometimes make assertions about the "incorrect" profile updates.
2023-03-13 13:36:18 -07:00
Oliver Gould c657aeabe8
destination: Split GetProfile into smaller functions (#10514)
Before changing any GetProfile behavior, this change splits the API
handler into some smaller scopes. This helps to clarify control flow and
reduce nested contexts. This change also adds relevant fields to log
contexts to improve diagnostics.
2023-03-13 11:36:39 -07:00
Yannick Utard 95a3d19b3f
Fix deleteEndpointSlice when there are multiple EndpointSlices (#10370)
When processing a `delete` event for an EndpointSlice, regardless of the outcome (whether there are still addresses/endpoints alive or whether we have no endpoints left) we make a call to `noEndpoints` on the subscriber. This will send a `Destination.Get` update to the listeners to advertise a `NoEndpoints` event.

If there are still addresses left, NoEndpoints { exists: true } will be sent (to signal the service still exists). Until an update is registered (i.e. a new add) there will be no available endpoints -- this is incorrect, since other EndpointSlices may exist. This change fixes the problem by handling `noEndpoints` in a more specialized way for EndpointSlices.

Signed-off-by: Yannick Utard <yannickutard@gmail.com>
2023-02-22 13:35:02 +00:00
Alejandro Pedraza 4a84f2cb32
Implement the k8s metadata API in the Destination controller (#10326)
Fixes #9986

After reviewing the k8s API calls in Destination, it was concluded we
could only swap out the calls to the Node and RS resources to use the
metadata API, as all the other resources (Endpoints, EndpointSlices,
Services, Pod, ServiceProfiles, Server) required fields other than those
found in their metadata section.

This also required completing the `NewFakeAPI` implementation by adding
the missing annotations and labels entries.

## Testing Memory Consumption

The gains here aren't as big as in #9650. In order to test this we need
to push hard and create 4000 RS:

``` bash
for i in {0..4000}; do kubectl create deployment test-pod-$i --image=nginx; done
```

In edge-23.2.1 the destination pod's memory consumption goes from 40Mi
to 160Mi after all the RS were created. With this change, it went from
37Mi to 140Mi.
2023-02-13 17:30:07 -05:00
Alex Leong 6ebcf19cfc
Don't use pods in map keys in server watcher (#10245)
Fixes #10205

The server watcher uses `*corev1.Pod` as the key of its subscriber map, which is potentially brittle since it's not obvious what exactly should count as equality for full pod resources.

More robust is to use a unique per-pod identifier as the key: the pod's name and namespace.  We therefore move the pod resource into the value type of the map, since we still need it when we receive a Server update so that we can iterate through all subscriptions and find pods which match the Server's pod selector.
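
A sketch of the key/value shape described here (illustrative names):

```go
// Sketch: key the subscriber map by a stable pod identifier instead of a
// *corev1.Pod, keeping the pod object itself in the value.
package destination

import corev1 "k8s.io/api/core/v1"

type podID struct {
	Namespace string
	Name      string
}

type podSubscription struct {
	pod       *corev1.Pod     // still needed to match Server podSelectors
	listeners []chan struct{} // stand-in for the real listener type
}

type serverWatcher struct {
	subscriptions map[podID]*podSubscription
}
```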

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-02-13 13:41:00 -08:00
Alejandro Pedraza 147c8dc07c
Add metrics to server and service watchers (#10213)
* Add metrics to server and service watchers

Closes #10202 and completes #2204

As a followup to #10201, I'm adding the following metric in `server_watcher.go`:

- `server_port_subscribers`: This tracks the number of subscribers to changes to Servers associated to a port in a pod. The metric's labels identify the namespace and name of the pod, and its targeted port.

Additionally, `opaque_ports.go` was missing metrics. I added `service_subscribers`, which tracks the number of subscribers to a given Service, labeled by the Service's namespace and name.

`opaque_ports.go` was also leaking the subscriber's map key, so that got fixed as well.
2023-02-07 08:51:09 -05:00
Yu Cao e662e147ca
Support service internal traffic policy (#10186)
Closes #10130

https://kubernetes.io/docs/concepts/services-networking/service-traffic-policy/
1. Update the endpoints watcher to include an additional field, `localTrafficPolicy`.
   Set to true when `.spec.internalTrafficPolicy` is set to `Local`.
2. Update the endpoints translator to filter by node when `localTrafficPolicy` is
   set to true (see the sketch below). Topology Aware Hints are not used when
   `service.spec.internalTrafficPolicy` is set to `Local`.
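
A rough sketch of the node filtering for `Local` traffic policy (simplified types, not the actual translator):

```go
// Sketch: when a Service's internalTrafficPolicy is Local, keep only the
// addresses backed by endpoints on the client's own node.
package destination

type address struct {
	IP       string
	Port     uint32
	NodeName string
}

func filterByTrafficPolicy(addrs []address, localTrafficPolicy bool, nodeName string) []address {
	if !localTrafficPolicy {
		return addrs
	}
	filtered := make([]address, 0, len(addrs))
	for _, a := range addrs {
		if a.NodeName == nodeName {
			filtered = append(filtered, a)
		}
	}
	return filtered
}
```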

Signed-off-by: Yu Cao <yc185050@ncr.com>
2023-02-06 13:53:07 -07:00
Alejandro Pedraza 8b29408db3
Fix memory leak in Server listeners (#10201)
Fixes #8270

When a listener unsubscribes from port updates in Servers, we were
removing the listener from the `ServerWatcher.subscriptions` map, leaving
the map's key (`podPort`, which holds the pod object and port) with an
empty value. In clusters where there's a lot of pod churn, those keys
with empty values were accumulating, so this change cleans that up.

The repro (basically constantly rolling emojivoto) is described in
#9947.

A followup will be up shortly adding metrics to track these subscriptions,
along with similar missing metrics from other parts of Destination.
2023-01-26 12:20:48 -05:00
dependabot[bot] 62d6d7cd52
build(deps): bump sigs.k8s.io/gateway-api from 0.5.1 to 0.6.0 (#10038)
* build(deps): bump sigs.k8s.io/gateway-api from 0.5.1 to 0.6.0

Bumps [sigs.k8s.io/gateway-api](https://github.com/kubernetes-sigs/gateway-api) from 0.5.1 to 0.6.0.
- [Release notes](https://github.com/kubernetes-sigs/gateway-api/releases)
- [Changelog](https://github.com/kubernetes-sigs/gateway-api/blob/main/CHANGELOG.md)
- [Commits](https://github.com/kubernetes-sigs/gateway-api/compare/v0.5.1...v0.6.0)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/gateway-api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Account for possible errors returned from `AddEventHandler`

In v0.26.0 client-go's `AddEventHandler` method for informers started
returning a registration handle (that we ignore) and an error that we
now surface.

* client-go v0.26.0 removed the openstack plugin

* Temporary changes to trigger tests in k8s 1.21

- Adds an innocuous change to integration.yml so that all tests get
  triggered
- Hard-code k8s version in `k3d cluster create` invocation to v1.21

* Revert "Temporary changes to trigger tests in k8s 1.21"

This reverts commit 3e1fdd0e5e.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2023-01-16 09:38:09 -05:00
Alex Leong 768e04dd7e
Update endpoints watcher to not fetch pods for removed endpoints (#10013)
Fixes #10003

When endpoints are removed from an EndpointSlice resource, the destination controller builds a list of addresses to remove.  However, if any of the removed endpoints have a Pod as their targetRef, we will attempt to fetch that pod to build the address to remove.  If that pod has already been removed from the informer cache, this will fail and the endpoint will be skipped in the list of endpoints to be removed.  This results in stale endpoints being stuck in the address set and never being removed.

We update the endpoint watcher to construct only a list of endpoint IDs for endpoints to remove, rather than fetching the entire pod object.  Since we no longer attempt to fetch the pod, this operation is now infallible and endpoints will no longer be skipped during removal.

We also add a `TestEndpointSliceScaleDown` test to exercise this.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-01-03 10:04:02 -08:00
Oliver Gould b2f22dee78
go: Copy port range utilities from the proxy-init repo (#9143)
The proxy-init repo is changing its structure and, as such, we want to
minimize cross-repo dependencies from linkerd2 to linkerd2-proxy-init.
(We expect the cni-plugin code to move in a followup change).

This change duplicates the port range parsing utility (about 50 lines,
plus tests). This avoids stray dependencies on linkerd2-proxy-init.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-08-12 10:34:02 -07:00
Jacob Henner 7d47639608
Remove kube-system exclusions from watchers (#8720)
Watch events for objects in the kube-system namespace were previously ignored.
In certain situations, this would cause the destination service to return
invalid (outdated) endpoints for services in kube-system - including unmeshed
services.

It [was suggested][1] that kube-system events were ignored to avoid handling
frequent Endpoint updates - specifically from [controllers using Endpoints for
leader elections][2]. As of Kubernetes 1.20, these controllers [default to using
Leases instead of Endpoints for their leader elections][3], obviating the need
to exclude (or filter) updates from kube-system. The exclusions have been
removed accordingly.

[1]: https://github.com/linkerd/linkerd2/pull/4133#issuecomment-594983588
[2]: https://github.com/kubernetes/kubernetes/issues/86286
[3]: https://github.com/kubernetes/kubernetes/pull/94603

Signed-off-by: Jacob Henner <code@ventricle.us>
2022-07-11 13:52:27 -06:00
Alex Leong b7a0b8adb4
Bump minimum kubernetes version to 1.21 (#8647)
Fixes #8592

Increase the minimum supported kubernetes version from 1.20 to 1.21.  This allows us to drop support for batch/v1beta1/CronJob and discovery/v1beta1/EndpointSlices, instead using only v1 of those resources.  This fixes the deprecation warnings about these resources printed by the CLI.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-06-14 15:15:28 -07:00
Alex Leong c5963fbbb1
Set targetCluster label even when serviceFqn is not set (#8542)
Fixes #8134

The multicluster probe-gateway service (which is used by the service-mirror controller to send health probes to the remote gateway) has `mirror.linkerd.io/cluster-name` label but it does not have a `mirror.linkerd.io/remote-svc-fq-name` annotation.  This makes sense because the probe-gateway service does not correspond to any target service on the remote cluster and instead targets the gateway itself.

We update the logic in the destination controller so that we can still add the `dst_target_cluster` metric label when only the `mirror.linkerd.io/cluster-name` label is present but not the `mirror.linkerd.io/remote-svc-fq-name` annotation.  This allows us to have the `dst_target_cluster` metric label for service-mirror controller probe traffic.

Signed-off-by: Alex Leong <alex@buoyant.io>
2022-05-31 16:09:51 -07:00
Oliver Gould fa8ddb4801
Use go-test/deep for comparisons in tests (#8427)
We frequently compare data structures--sometimes very large data
structures--that are difficult to compare visually. This change replaces
uses of `reflect.DeepEqual` with `deep.Equal`. `go-test`'s `deep.Equal`
returns a diff of values that are not equal.

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-05-05 09:31:07 -07:00
Matei David 1e9f734bcd
Support whitespaces in opaqueports annotation (#8355)
Opaque ports may be configured through annotations, where the value may
be either a range (e.g. 0 - 255) or a comma delimited string
("121,122"). When configured as a comma delimited string, our parsing
logic will trim any leading and trailing commas and split the value;
ports will be converted from string to an int counterpart.

If whitespaces are used, such that the value looks similar to "121,
122", the parsing logic will fail -- when attempting to convert the
string into an integer -- with the following error: "\" 122\" is not a
valid lower-bound". This can lead to confusion for users, whose services
and endpoints then function with undefined behaviour.

This change introduces logic to strip any leading or trailing
whitespaces from strings when tokenizing the annotation value; this way,
we are guaranteed not to experience parsing errors when converting
strings. To validate the behaviour, a new (unit) test has been added to
the opaque ports watcher with a multi-opaque-port annotation on a
service, where the value contains a space.
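
A sketch of the tokenizing step (simplified relative to the real range-aware parser):

```go
// Sketch: trim surrounding whitespace from each comma-separated token
// before converting it to a port number. Range values ("0 - 255") are
// handled similarly in the real parser; this sketch covers single ports.
package util

import (
	"strconv"
	"strings"
)

func parsePorts(annotation string) ([]int, error) {
	var ports []int
	for _, tok := range strings.Split(annotation, ",") {
		tok = strings.TrimSpace(tok)
		if tok == "" {
			continue
		}
		port, err := strconv.Atoi(tok)
		if err != nil {
			return nil, err
		}
		ports = append(ports, port)
	}
	return ports, nil
}
```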

Signed-off-by: Matei David <matei@buoyant.io>
2022-04-28 13:46:27 +03:00
Alejandro Pedraza 30e42f98f6
Fix race in destination unit test (#8065)
Fixes report https://github.com/linkerd/linkerd2/runs/5518386921
by guarding `BufferingProfileListener` with a mutex.
2022-03-14 11:26:05 -07:00
Kevin Leimkuhler 67bcd8f642
Add `gosec` and `errcheck` lints (#7954)
Closes #7826

This adds the `gosec` and `errcheck` lints to the `golangci` configuration. Most significant lints have been fixed by individual changes, but this enables them by default so that all future changes are caught ahead of time.

A significant number of these lints have been excluded by the various `exclude-rules` added to `.golangci.yml`. These include operations on files that generally do not fail, such as `Copy`, `Flush`, or `Write`. We also choose to ignore most errors when cleaning up functions via the `defer` keyword.

Aside from those, there are several other rules added that all have comments explaining why it's okay to ignore the errors that they cover.

Finally, several smaller fixes in the code have been made where it seems necessary to catch errors or at least log them.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2022-03-03 10:09:51 -07:00
Oliver Gould 425a43def5
Enable gocritic linting (#7906)
[gocritic][gc] helps to enforce some consistency and check for potential
errors. This change applies linting changes and enables gocritic via
golangci-lint.

[gc]: https://github.com/go-critic/go-critic

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-17 22:45:25 +00:00
Oliver Gould f5876c2a98
go: Enable `errorlint` checking (#7885)
Since Go 1.13, errors may "wrap" other errors. [`errorlint`][el] checks
that error formatting and inspection is wrapping-aware.

This change enables `errorlint` in golangci-lint and updates all error
handling code to pass the lint. Some comparisons in tests have been left
unchanged (using `//nolint:errorlint` comments).

[el]: https://github.com/polyfloyd/go-errorlint

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-16 18:32:19 -07:00
Oliver Gould 7776a50074
destination: Use net.JoinHostPort to format names (#7821)
We use `fmt.Sprintf` to format URIs in several places we could be using
`net.JoinHostPort`. `net.JoinHostPort` ensures that IPv6 addresses are
properly escaped in URIs.
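
For example, `net.JoinHostPort` brackets IPv6 literals automatically (a small standalone illustration):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// IPv4: "10.42.0.34:8080"
	fmt.Println(net.JoinHostPort("10.42.0.34", "8080"))
	// IPv6 is bracketed correctly: "[2001:db8::1]:8080"
	fmt.Println(net.JoinHostPort("2001:db8::1", "8080"))
}
```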

Signed-off-by: Oliver Gould <ver@buoyant.io>
2022-02-07 15:05:14 -08:00
Kevin Leimkuhler 147d85dc70
Update `GetProfile` clients with policy server updates (#7388)
### What

`GetProfile` clients do not receive destination profiles that consider Server protocol fields the way that `Get` clients do. If a Server exists for a `GetProfile` destination that specifies the protocol for that destination is `opaque`, this information is not passed back to the client.

#7184 added this for `Get` by subscribing clients to Endpoint/EndpointSlice updates. When there is an update, or there is a Server update, the endpoints watcher passes this information back to the endpoint translator which handles sending the update back to the client.

For `GetProfile` the situation is different. As with `Get`, we only consider Servers when dealing with Pod IPs, but this only occurs in two situations for `GetProfile`.

1. The destination is a Pod IP and port
2. The destination is an Instance ID and port

In both of these cases, we need to check if a Server already selects the endpoint, and we need to subscribe for Server updates in case one is added or deleted which selects the endpoint.

### How

First we check if there is already a Server which selects the endpoint. This is so that when the first destination profile is returned, the client knows if the destination is `opaque` or not.

After sending that first update, we then subscribe the client for any future updates which will come from a Server being added or deleted.

This is handled by the new `ServerWatcher` which watches for Server updates on the cluster; when an update occurs it sends that to the `endpointProfileTranslator` which translates the protocol update into a DestinationProfile.

By introducing the `endpointProfileTranslator` which only handles protocol updates, we're able to decouple the endpoint logic from `profileTranslator`; its `endpoint` field has been removed now that it only handles updates for ServiceProfiles for Services.

### Testing

A unit test has been added and below are some manual testing instructions to see how it interacts with Server updates:

<details>
	<summary>app.yaml</summary>

	```yaml
	apiVersion: v1
	kind: Pod
	metadata:
	  name: pod
	  labels:
		app: pod
	spec:
	  containers:
	  - name: app
		image: nginx
		ports:
		  - name: http
			containerPort: 80
	---
	apiVersion: policy.linkerd.io/v1beta1
	kind: Server
	metadata:
	  name: srv
	  labels:
		policy: srv
	spec:
	  podSelector:
		matchLabels:
		  app: pod
	  port: 80
	  proxyProtocol: opaque
	```
</details>

```shell
$ go run ./controller/cmd/main.go destination
```

```shell
$ linkerd inject app.yaml |kubectl apply -f -
...
$ kubectl get pods -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP           NODE                       NOMINATED NODE   READINESS GATES
pod    2/2     Running   0          53m   10.42.0.34   k3d-k3s-default-server-0   <none>           <none>
$ go run ./controller/script/destination-client/main.go -method getProfile -path 10.42.0.34:80
...
```

You can add/delete `srv` as well as edit its `proxyProtocol` field to observe the correct DestinationProfile updates.

Signed-off-by: Kevin Leimkuhler <kleimkuhler@icloud.com>
2021-12-08 12:26:27 -07:00
Tarun Pothulapati 4170b49b33
smi: remove default functionality in linkerd (#7334)
Now that SMI functionality is fully being moved into the
[linkerd-smi](www.github.com/linkerd/linkerd-smi) extension, we can
stop supporting its functionality by default.

This means that the `destination` component will stop reacting
to `TrafficSplit` objects. When `linkerd-smi` is installed,
it does the conversion of `TrafficSplit` objects to `ServiceProfiles`
that the destination component can understand, and will react accordingly.

Also, whenever a `ServiceProfile` with traffic splitting is associated
with a service, the same information (i.e. splits and weights) is also
surfaced through the UI (in the new `services` tab) and the `viz` command.
So we are not really losing any UI functionality here.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
2021-12-03 12:07:30 +05:30