Commit Graph

406 Commits

dependabot[bot] d42432914d
build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0 (#12593)
* build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.63.2 to 1.64.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.63.2...v1.64.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

I've replaced all the `grpc.Dial` calls with `grpc.NewClient`. There was one call to `grpc.DialContext(ctx,...)` in `viz/tap/api/grpc_server.go` that also got replaced with `grpc.NewClient`, which loses the ability to pass `ctx`, but that doesn't matter: since we're not using `WithBlock(true)`, that context wasn't being honored by `DialContext()` anyway.

https://github.com/grpc/grpc-go/blob/v1.64.0/clientconn.go#L216-L242
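For reference, a minimal sketch of that replacement (the target address and options here are illustrative, not the actual call sites):

```go
package main

import (
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Previously: grpc.Dial / grpc.DialContext(ctx, "localhost:8086", ...).
	// grpc.NewClient takes no context; connections are established lazily,
	// which matches the old behavior when WithBlock was not used.
	conn, err := grpc.NewClient("localhost:8086",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		fmt.Println("failed to create client:", err)
		return
	}
	defer conn.Close()
}
```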

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-05-22 14:40:04 -05:00
Alejandro Pedraza df24ea6d96
Refactor ES addition logic in Destination (#12625)
This is a second take on #12427, which avoided a theoretical/correctness
issue around overwriting new ES addresses with stale data.

We had to revert that in #12589 because the change introduced a bug: it
returned early when the ES had no addresses and failed to properly
initialize `addresses` for the portPublisher.

This just removes the early return.
2024-05-22 09:12:19 -05:00
Matei David 407df01ec3
chore(controller): Remove stream concurrency limits (#12598)
Our gRPC servers use the default gRPC server configuration, which
limits the number of concurrent streams to 100. Since the controllers
run with proxies, this provides a hard scaling limit for the number of
watches an application can have.

This change updates our gRPC server configuration to clear the default
concurrency limit, allowing the server to handle as many concurrent streams
as possible.
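As a rough sketch of what clearing the limit looks like (the exact option set used by the controllers may differ):

```go
package main

import (
	"math"

	"google.golang.org/grpc"
)

func newServer() *grpc.Server {
	return grpc.NewServer(
		// grpc-go defaults to 100 concurrent streams per connection; raise
		// the limit to its maximum so proxy watches are not capped by it.
		grpc.MaxConcurrentStreams(math.MaxUint32),
	)
}

func main() { _ = newServer() }
```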

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2024-05-15 18:07:15 +01:00
Alejandro Pedraza 9bd8c005da
Revert "Fix destination staleness issue when adding EndpointSlices (#12427)" (#12589)
This reverts commit 4fccf3e9ec.

The early return was causing `pp.addresses = newAddressSet` to not be run when the list of addresses is empty; but setting that is still necessary so that labels are tracked correctly.

This was caught by the tap (viz) integration test run in the release workflow.
2024-05-13 16:25:50 -07:00
Alejandro Pedraza 4fccf3e9ec
Fix destination staleness issue when adding EndpointSlices (#12427)
When updating the portPublisher's address set after a new EndpointSlice creation event is received, its addresses were getting overwritten with stale data whenever their IDs already existed in the current pp's address set.

There can be pathological cases in single-stack where this is a problem. For example, when an ES gets recycled but the deletion event is not caught for some reason, the Address data from the subsequent addition event will be overwritten by the old stale entry.

## Other Changes

- Remove overriding `newAddressSet.LocalTrafficPolicy`, as that is already taken care of inside `pp.endpointSliceToAddresses(slice)`.
- When there are no Add events to send, return early without updating state or metrics.
2024-05-08 09:12:51 -05:00
Alex Leong d3e227fbd7
Fix flakey Handles_overflow test (#12555)
The `Handles overflow` test for the endpoint profile translator writes updates into the updates queue until it is full and then tests that no more updates can be enqueued.  However, since the test also starts the profile translator, the translator is concurrently draining updates off of the queue.  This leads to unpredictable results and test flakiness.

We update the test to not start the translator so that updates are not drained off of the queue during the test.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-05-07 13:06:58 -07:00
Alejandro Pedraza 137eac9df3
Add IPv6 support for the destination controller (#12428)
Services in dual-stack mode result in the creation of two EndpointSlices, one for each IP family. Before this change, the Get Destination API would nondeterministically return the address for any of those ES, depending on which one was processed last by the controller because they would overwrite each other.

As part of the ongoing effort to support IPv6/dual-stack networks, this change fixes that behavior giving preference to IPv6 addresses whenever a service exposes both families.

There are a new set of unit tests in server_ipv6_test.go, and in the TestEndpointTranslatorForPods tests there's a couple of new cases to test the interaction with zone filtering.
Also the server unit tests were updated to segregate the tests and resources dealing with the IPv4/IPv6/dual-stack cases.
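A simplified illustration of the preference rule (hypothetical helper, not the controller's actual code): given a service's addresses, pick IPv6 whenever both families are present.

```go
package main

import (
	"fmt"
	"net"
)

// preferIPv6 returns an IPv6 address if one is present, falling back to the
// first address otherwise. Hypothetical helper illustrating the "prefer IPv6
// when a service exposes both families" rule.
func preferIPv6(addrs []string) string {
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip != nil && ip.To4() == nil {
			return a // IPv6
		}
	}
	if len(addrs) > 0 {
		return addrs[0]
	}
	return ""
}

func main() {
	fmt.Println(preferIPv6([]string{"10.0.0.1", "2001:db8::1"})) // 2001:db8::1
}
```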
2024-05-02 14:39:05 -05:00
Alejandro Pedraza 7cbe2f5ca6
Enable forwarding IPv6 connections through the proxy (#12495)
As part of the ongoing effort to support IPv6/dual-stack networks, this change
enables the proxy to properly forward IPv6 connections:

- Adds the new `LINKERD2_PROXY_OUTBOUND_LISTEN_ADDRS` environment variable when
  injecting the proxy. This is supported as of proxy v2.228.0 which was just
  pulled into the linkerd2 repo in #2d5085b56e465ef56ed4a178dfd766a3e16a631d.
  This adds the IPv6 loopback address (`[::1]`) to the IPv4 one (`127.0.0.1`)
  so the proxy can forward outbound connections received via IPv6. The injector
  will still inject `LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR` to support the rare
  case where the `proxy.image.version` value is overridden with an older
  version. The new proxy still considers that variable, but it's superseded by
  the new one. The old variable is considered deprecated and should be removed
  in the future.
- The values for `LINKERD2_PROXY_CONTROL_LISTEN_ADDR`,
  `LINKERD2_PROXY_ADMIN_LISTEN_ADDR` and `LINKERD2_PROXY_INBOUND_LISTEN_ADDR`
  have been updated to point to the IPv6 wildcard address (`[::]`) instead of
  the IPv4 one (`0.0.0.0`) for the same reason. Unlike with the loopback
  address, the IPv6 wildcard address suffices to capture both IPv4 and IPv6
  traffic.
- The endpoint translator's `getInboundPort()` has been updated to properly
  parse the IPv6 loopback address retrieved from the proxy container manifest.
  A unit test was added to validate the behavior.
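A minimal sketch of parsing such a listen address in a family-agnostic way (the address literal is just an example; the real `getInboundPort()` reads it from the proxy container manifest):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

func parseListenPort(addr string) (uint32, error) {
	// net.SplitHostPort understands both "127.0.0.1:4143" and "[::1]:4143".
	_, portStr, err := net.SplitHostPort(addr)
	if err != nil {
		return 0, err
	}
	port, err := strconv.ParseUint(portStr, 10, 32)
	if err != nil {
		return 0, err
	}
	return uint32(port), nil
}

func main() {
	port, err := parseListenPort("[::1]:4143")
	fmt.Println(port, err) // 4143 <nil>
}
```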
2024-05-02 16:39:19 +01:00
Oliver Gould aef8a02426
feat(destination): Add meshed HTTP/2 keep-alive settings (#12504)
This commit adds destination controller configuration that enables default
keep-alives for meshed HTTP/2 clients.

This is accomplished by encoding the raw protobuf message structure into the
helm values, and then encoding that as JSON in the destination controller's
command-line options. This allows operators to set any supported HTTP/2 client
configuration without having to modify the destination controller.
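The decoding side can be sketched roughly as follows; the flag name and JSON shape are assumptions, and `structpb.Struct` stands in for the actual HTTP/2 client-params protobuf message:

```go
package main

import (
	"flag"
	"fmt"
	"log"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// Hypothetical flag; the real destination controller flag name may differ.
	raw := flag.String("meshed-http2-client-params",
		`{"keepAlive":{"interval":"10s","timeout":"3s","whileIdle":true}}`,
		"JSON-encoded HTTP/2 client parameters")
	flag.Parse()

	// protojson decodes the JSON flag value into a protobuf message; here a
	// generic structpb.Struct is used in place of the real params message.
	var params structpb.Struct
	if err := protojson.Unmarshal([]byte(*raw), &params); err != nil {
		log.Fatalf("invalid HTTP/2 client params: %s", err)
	}
	fmt.Println(params.Fields["keepAlive"])
}
```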
2024-04-30 19:35:30 +00:00
knowmost 27bcdd1028
chore: fix function names in comment (#12512)
Signed-off-by: knowmost <knowmost@outlook.com>
2024-04-29 10:28:10 -07:00
Alejandro Pedraza 6db4bd667c
Fix issues with native sidecars (#12453)
Closes #12395

When the proxy is injected as a native sidecar, various parts of the code failed to iterate over init containers (in addition to regular containers) when looking for the proxy, which resulted in:

- `Get` Destination API failing in the presence of opaque ports
- The injector failing to detect already-injected pods
- Various CLI issues

This PR is split into the following commits addressing each issue separately:

a8ebe76e3 - Fix injection check for existing sidecars
44e9625e0 - Fix 'linkerd uninject'
62694965d - Fix 'linkerd version --proxy'
42dbdaddf - Fix 'linkerd identity'
39db823fe - Fix 'linkerd check'
7359f371d - Fix 'linkerd dg proxy-metrics'
f8f73c47c - Fix destination controller
2024-04-26 14:38:01 -05:00
Matei David 38c6d11832
Change injector overriding logic to be more generic (#12405)
The proxy-injector package has a `ResourceConfig` type that is
responsible for parsing resources, applying overrides, and serialising a
series of configuration values to a Kubernetes patch. The functionality
is very concrete in its assumption; it always relies on a pod spec and
it mutates inner state when deciding on which overrides to apply.

This is not a flexible way to handle injection and configuration
overriding for other types of resources. We change this by turning
methods previously defined on `ResourceConfig` into free-standing
functions. These functions can be applied for any type of resources in
order to compute a set of configuration values based on annotation
overrides. Through the change, the functions can be used to compute
static configuration for non-Pod types or can be used in tests.


Signed-off-by: Matei David <matei@buoyant.io>
2024-04-10 15:51:58 +01:00
hanghuge 78d42b230d
chore: fix function name in comment (#12396)
Fixed comments for `subscribeToServicesWithContext` and `reconcileByAddressType`. Previously,
the comments contained incorrect function names.

Signed-off-by: hanghuge <cmoman@outlook.com>
2024-04-10 15:46:45 +01:00
occupyhabit 6eeaea4d94
chore: Remove repetitive words (#12330)
Signed-off-by: occupyhabit <wangmengjiao@outlook.com>
2024-03-25 09:33:39 -07:00
Hirotaka Tagawa / wafuwafu13 9a5284f453
controller: Stop logging errors on shutdown (#12167)
When a controller is shut down, the admin server fails. This failure is logged
as an error, even when the shutdown was graceful.

This change updates the shutdown behavior to log more appropriately.

Signed-off-by: wafuwafu13 <jaruwafu@gmail.com>
2024-03-22 09:49:37 -07:00
Alex Leong 2c2a96bc73
Removes should not change local traffic policy (#12325)
Fixes: #12311

When the endpoint translator receives a `remove` call, it was updating its local traffic policy based on the address set passed to remove.  However, since `remove` is only meant to remove addresses and not change the address metadata, the endpoints watcher was not setting local traffic policy on these calls to `remove`.  This can result in calls to `remove` temporarily turning off local traffic policy, which will cause non-local addresses to be sent to clients.

Since `remove` should not change address metadata, we now disregard any metadata in the call to `remove`, including any changes to the local traffic policy.
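Schematically (hypothetical types, not the actual endpoint translator code), a `remove` now only deletes entries and never touches set-level metadata such as the local traffic policy:

```go
package main

import "fmt"

// addressSet is a simplified stand-in for watcher.AddressSet.
type addressSet struct {
	addresses          map[string]string
	localTrafficPolicy bool
}

// remove deletes the given IDs from the set. It deliberately ignores any
// metadata carried alongside the removal (including local traffic policy),
// since a remove must not change address metadata.
func (s *addressSet) remove(ids []string) {
	for _, id := range ids {
		delete(s.addresses, id)
	}
}

func main() {
	s := &addressSet{addresses: map[string]string{"a": "10.0.0.1"}, localTrafficPolicy: true}
	s.remove([]string{"a"})
	fmt.Println(len(s.addresses), s.localTrafficPolicy) // 0 true
}
```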

Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2024-03-22 09:42:13 -07:00
Alejandro Pedraza b697e285a0
Refactor IPv4-only functions to also work for IPv6 (#12303)
The main change here is the refactoring of the address functions in `addr.go` that support the Destination controller and Viz's Tap controller. Some of those functions only worked for IPv4, so this change refactored them to make them IP family agnostic.
This enabled adding (and fixing) IPv6 unit tests as detailed in the following sections.

Other changes:

- The `ProxyAddressesToString()` function was no longer used, so it got removed.
- The `ProxyIPToString()` function was only used by the destination-client script, so that got stripped out.

## `addr_test.go`

We added IPv6 cases to each test; these cases would have failed before this change.

## `endpoint_translator_test.go`

One of the test pods (pod3) was changed to have an IPv6 address. Without the other changes in this PR those tests would still have passed, but only because, when comparing actual IPs with expected ones, we weren't checking whether both were zero. So here we added checks against that.

## `server_test.go`

As above, we added checks against empty IPs. And in the mocked resources in `test_util.go` we added an IPv6 EndpointSlice.
2024-03-22 07:20:52 -05:00
Alex Leong 5915ef5a18
Don't send endpoint profile updates from Server updates when opaqueness doesn't change (#12013)
When the destination controller receives an update for a Server resource, we recompute opaqueness ports for all pods.  This results in a large number of updates to all endpoint profile watches, even if the opaqueness doesn't change.  In cases where there are many Server resources, this can result in a large number of updates being sent to the endpoint profile translator and overflowing the endpoint profile translator update queue.  This is especially likely to happen during an informer resync, since this will result in an informer callback for every Server in the cluster.

We refactor the workload watcher to not send these updates if the opaqueness has not changed.

This, seemingly simple, change in behavior requires a large code change because:
* the current opaqueness state is not stored on workload publishers and must be added so that we can determine if the opaqueness has changed
* storing the opaqueness in addition to the other state we're storing (pod, ip, port, etc.) means that we are not storing all of the data represented by the Address struct
* workload watcher uses a `createAddress` func to dynamically create an Address from the state it stores
* now that we are storing the Address as state, creating Addresses dynamically is no longer necessary and we can operate on the Address state directly
  * this makes the workload watcher more similar to other watchers and follow a common pattern
  * it also fixes some minor correctness issues:
    * pods that did not have the ready status condition were being considered when they should not have been
    * updates to ExternalWorkload labels were not being considered
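The gist of the guard can be sketched like this (field and type names are illustrative, not the watcher's actual structs):

```go
package main

import "fmt"

// workloadPublisher is a simplified stand-in for the watcher's per-workload
// publisher, now keeping the last-computed opaque-protocol state.
type workloadPublisher struct {
	opaqueProtocol bool
}

// updateOpaqueness recomputes opaqueness and reports whether listeners need
// to be notified; if nothing changed, no update is sent.
func (wp *workloadPublisher) updateOpaqueness(opaque bool) (changed bool) {
	if wp.opaqueProtocol == opaque {
		return false // no-op: avoid flooding the endpoint profile translator
	}
	wp.opaqueProtocol = opaque
	return true
}

func main() {
	wp := &workloadPublisher{}
	fmt.Println(wp.updateOpaqueness(false)) // false: unchanged, skip update
	fmt.Println(wp.updateOpaqueness(true))  // true: changed, notify listeners
}
```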

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-03-19 10:24:02 -07:00
Oliver Gould e1ab0a2e46
destination: Fix flakey TestEndpointProfileTranslator (#12182)
TestEndpointProfileTranslator could race against the consumer task
so that the final queue insertion would not necessarily fail. The send
buffer has been eliminated to avoid this race.
2024-02-29 08:59:44 -08:00
Matei David 98e38a66b6
Rename meshTls to meshTLS in ExternalWorkload CRD (#12098)
The ExternalWorkload resource we introduced has a minor naming
inconsistency; `Tls` in `meshTls` is not capitalised. Other resources
that we have (e.g. authentication resources) capitalise TLS (and so does
Go, it follows a similar naming convention).

We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.

Upgrading the control plane version will continue to work without
downtime. However, if an existing resource exists, the policy controller
will not completely initialise. It will not enter a crashloop backoff,
but it will also not become ready until the resource is edited or
deleted.

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-20 11:00:13 -08:00
Oliver Gould 2ab76b64c6
destination: Rename zone weighting flag to ext-endpoint-zone-weights (#12090) 2024-02-16 09:06:56 -05:00
Zahari Dichev bf7b039f41
controller: add counter for items dropped from workqueue (#12079)
Adds a metric that measures the number of items that have been discarded from the work queue in the external workloads controller due to the retries limit being exceeded.
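A hedged sketch of the pattern using client-go's workqueue and a Prometheus counter (the metric name and retry limit here are assumptions):

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"k8s.io/client-go/util/workqueue"
)

const maxRetries = 10 // assumed limit, for illustration only

var queueDrops = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "workqueue_dropped_items_total", // hypothetical metric name
	Help: "Items discarded from the work queue after exceeding the retry limit.",
})

// handleError requeues a failed item with rate limiting, or drops it (and
// increments the counter) once the retry limit has been exceeded.
func handleError(q workqueue.RateLimitingInterface, key interface{}) {
	if q.NumRequeues(key) < maxRetries {
		q.AddRateLimited(key)
		return
	}
	q.Forget(key)
	queueDrops.Inc()
}

func main() {
	prometheus.MustRegister(queueDrops)
	q := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	handleError(q, "external-workload/foo")
}
```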

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-02-15 16:19:09 +02:00
Alex Leong 5aef29ace1
Update workload watcher and server tests to use EndpointSlices (#12054)
Fixes #12032

The Destination controller server tests test the destination server with `enableEndpointSlices=false`.  The default for this value is true, meaning that these tests do not test the default configuration.

We update the tests to test with `enableEndpointSlices=true` and update the corresponding mock kubernetes Endpoints resources to be EndpointSlices instead.  We also fix an instance where the workload watcher was using Endpoints even when in EndpointSlices mode.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-09 11:33:06 -08:00
Alex Leong 1d0f484a20
Limit service publisher updates from server by namespace (#12040)
A Server may only select workloads in its own namespace. Therefore, when the
destination controller receives an update for a Server, it only needs to
potentially send updates to watches on workloads in that same namespace. Taking
this into account allows us avoid all opaqueness computations for workloads in
other namespaces.
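In sketch form (illustrative names), the service publisher can bail out early when a Server update comes from another namespace:

```go
package main

import "fmt"

// updateServer is a simplified stand-in for the servicePublisher's Server
// handler: a Server can only select workloads in its own namespace, so
// updates from other namespaces can be ignored outright.
func updateServer(serverNamespace, serviceNamespace string, recompute func()) {
	if serverNamespace != serviceNamespace {
		return // skip opaqueness computation entirely
	}
	recompute()
}

func main() {
	updateServer("emojivoto", "linkerd-viz", func() { fmt.Println("recomputed") }) // skipped
	updateServer("emojivoto", "emojivoto", func() { fmt.Println("recomputed") })   // recomputed
}
```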

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-08 17:00:50 +00:00
Alejandro Pedraza 8f8bd8f28f
Fix race condition in Destination's endpoints watcher (#12022)
Fixes #12010

## Problem

We're observing crashes in the destination controller in some scenarios, due to data race as described in #12010.

## Cause

The problem is that the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309) and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines with the map unguarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue.

## Solution

Make a shallow copy of `pp.addresses` in the endpoints watcher and only pass that to the listeners. It's a shallow copy because we avoid making copies of the pod reference in there, knowing it won't get mutated.
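The copy looks roughly like this (simplified types):

```go
package main

import "fmt"

type address struct{ ip string } // stand-in for watcher.Address (holds pod refs, etc.)

// shallowCopy duplicates the map so listeners iterate over their own copy,
// while the address values (and the pod pointers inside them) are shared,
// since those are not mutated by the watcher.
func shallowCopy(addresses map[string]address) map[string]address {
	out := make(map[string]address, len(addresses))
	for id, addr := range addresses {
		out[id] = addr
	}
	return out
}

func main() {
	orig := map[string]address{"pod-1": {ip: "10.0.0.1"}}
	copied := shallowCopy(orig)
	orig["pod-2"] = address{ip: "10.0.0.2"} // mutation no longer races with listeners
	fmt.Println(len(copied), len(orig))     // 1 2
}
```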

## Repro

Install the Linkerd core control plane and injected emojivoto, then patch the endpoint translator to include a sleep call that will help surface the race (don't install the patch in the cluster; we'll only use it locally below):

<details>
  <summary>endpoint_translator.go diff</summary>

```diff
diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go
index d1018d5f9..7d5abd638 100644
--- a/controller/api/destination/endpoint_translator.go
+++ b/controller/api/destination/endpoint_translator.go
@@ -5,6 +5,7 @@ import (
        "reflect"
        "strconv"
        "strings"
+       "time"

        pb "github.com/linkerd/linkerd2-proxy-api/go/destination"
        "github.com/linkerd/linkerd2-proxy-api/go/net"
@@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) {
 }

 func (et *endpointTranslator) add(set watcher.AddressSet) {
        for id, address := range set.Addresses {
+               time.Sleep(1 * time.Second)
                et.availableEndpoints.Addresses[id] = address
        }
```
</details>

Then create these two Server manifests:

<details>
  <summary>emoji-web-server.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: HTTP/1
```
</details>

<details>
  <summary>emoji-web-server-opaque.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: opaque
```
</details>

In separate consoles run the patched destination service and a destination client:

```bash
HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
```

```bash
go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80
```

And run this to continuously switch the `proxyProtocol` field:

```bash
while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done
```

You'll see the following data race report in the Destination controller logs:

<details>
  <summary>destination logs</summary>

```console
==================
WARNING: DATA RACE
Write at 0x00c0006d30e0 by goroutine 178:
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm()
      <autogenerated>:1 +0x64
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/controller.go:246 +0x81
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x1f
  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:970 +0x1f4
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:226 +0x41
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:227 +0xbe
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:204 +0x10a
  k8s.io/apimachinery/pkg/util/wait.Until()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:161 +0x9b
  k8s.io/client-go/tools/cache.(*processorListener).run()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:966 +0x38
  k8s.io/client-go/tools/cache.(*processorListener).run-fm()
      <autogenerated>:1 +0x33
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/wait.go:72 +0x86

Previous read at 0x00c0006d30e0 by goroutine 360:
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45
```
</details>

## Extras

This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
2024-02-07 12:17:24 -05:00
Alex Leong 53287d40a8
Reflect changes to server selector in opaqueness (#12031)
Fixes #11995

If a Server is marking a Pod's port as opaque and then the Server's podSelector is updated to no longer select that Pod, then the Pod's port should no longer be marked as opaque. However, this update does not result in any updates from the destination API's Get stream and the port remains marked as opaque.

We fix this by updating the endpoint watcher's handling of Server updates to consider both the old and the new Server.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-05 12:25:16 -08:00
Matei David d4f99b32ce
Revise leader election logic for endpoints controller (#12021)
Our leader election logic can result in updates being missed under certain
conditions. Leases expire after their duration is up, even if their current
holder has been terminated. During this dead time, any changes in the
system will be observed by other controllers, but will not be written to
the API Server.

For example, during a rollout, a controller that has come up will not be
able to acquire the lease for a maximum time of 30 seconds (lease
duration). Within this time frame, any changes to the system (e.g. modified
workloads, services, deleted endpointslices) will be observed but not acted
on by the newly created controller. Once the controller gets into a bad
state, it can only recover after 10 minutes (via service resyncs) or if any
resources are modified.

To address this, we change our leader election mechanism. Instead of
pushing leader election to the edge (i.e. when performing writes) we only
allow events to be observed when a controller is leading (i.e. by
registering callbacks). When a controller stops leading, all of its
callbacks will be de-registered.

NOTE:

 * controllers will have a grace period during which they can renew their
   lease. Their callbacks will be de-registered only if this fails. We will
   not register and de-register callbacks that often for a single
   controller.
 * we do not lose out on any state. Other informers will continue to run
   (e.g. destination readers). When callbacks are registered, we pass all of
   the cached objects through them. In other words, we do not issue API
   requests on registration, we process the state of the cluster as observed
   from the cache.
 * we make another change that's slightly orthogonal. Before we shutdown,
   we ensure to drain the queue. This should not be a race since we will
   first block until the queue is drained, then signal to the leader elector
   loop that we are done. This gives us some confidence that all events have
   been processed as soon as they were observed.
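A rough sketch of this mechanism using client-go's leader election and dynamic informer handler registration (lease name, durations, and handler bodies are illustrative):

```go
package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func run(ctx context.Context, cfg *rest.Config) {
	client := kubernetes.NewForConfigOrDie(cfg)
	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	svcInformer := factory.Core().V1().Services().Informer()
	factory.Start(ctx.Done())

	var reg cache.ResourceEventHandlerRegistration

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "external-endpoints-write", Namespace: "linkerd"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: os.Getenv("HOSTNAME")},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 30 * time.Second,
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Register callbacks only while leading; cached objects are
				// replayed through the handler on registration.
				reg, _ = svcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
					AddFunc:    func(obj interface{}) { /* enqueue */ },
					UpdateFunc: func(_, obj interface{}) { /* enqueue */ },
					DeleteFunc: func(obj interface{}) { /* enqueue */ },
				})
			},
			OnStoppedLeading: func() {
				// De-register callbacks (after draining the queue in the real
				// controller) once the lease can no longer be renewed.
				if reg != nil {
					_ = svcInformer.RemoveEventHandler(reg)
				}
			},
		},
	})
}

func main() {}
```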

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-01 17:46:22 -05:00
Alex Leong 3902b339e7
Only process server updates on workloads affected by the server (#12017)
When the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload.  This is because the Server update may have changed opaqueness for that workload.  However, this is very CPU intensive for the destination controller, especially during resyncs when we get Server updates for every Server resource in the cluster.

Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server.  If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.
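Conceptually (using label selectors directly; the real watcher tracks more state):

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// serverSelectsWorkload reports whether a Server's podSelector matches the
// given workload labels.
func serverSelectsWorkload(sel *metav1.LabelSelector, workloadLabels map[string]string) bool {
	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		return false
	}
	return s.Matches(labels.Set(workloadLabels))
}

// needsOpaquenessRecompute is true only if the old or the new version of the
// Server selects the workload; otherwise the update cannot have changed its
// opaqueness.
func needsOpaquenessRecompute(oldSel, newSel *metav1.LabelSelector, workloadLabels map[string]string) bool {
	return serverSelectsWorkload(oldSel, workloadLabels) || serverSelectsWorkload(newSel, workloadLabels)
}

func main() {
	oldSel := &metav1.LabelSelector{MatchLabels: map[string]string{"app": "web"}}
	newSel := &metav1.LabelSelector{MatchLabels: map[string]string{"app": "emoji"}}
	fmt.Println(needsOpaquenessRecompute(oldSel, newSel, map[string]string{"app": "voting"})) // false
}
```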

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-01 10:42:50 -08:00
Matei David 4a8e760d95
Fix how names are generated for external EndpointSlice resources (#12016)
Any slices generated for a group of external workloads follow a similar
convention: `linkerd-external-<svc-name>-<hash>`. Currently the hash is
appended directly to the service name, making it less readable. We add a
`-` to the generateName value so that the random hash is not run directly
into the service name. This is similar to the upstream implementation.
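For illustration (simplified; the real slice construction sets more fields):

```go
package main

import (
	"fmt"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newExternalSlice builds an EndpointSlice skeleton for a service selecting
// external workloads. The trailing "-" in GenerateName makes the API server
// append its random suffix after a separator, e.g.
// "linkerd-external-my-svc-abc12", instead of fusing it to the name.
func newExternalSlice(svcName, namespace string) *discoveryv1.EndpointSlice {
	return &discoveryv1.EndpointSlice{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "linkerd-external-" + svcName + "-",
			Namespace:    namespace,
			Labels:       map[string]string{discoveryv1.LabelServiceName: svcName},
		},
		AddressType: discoveryv1.AddressTypeIPv4,
	}
}

func main() {
	fmt.Println(newExternalSlice("my-svc", "default").GenerateName) // linkerd-external-my-svc-
}
```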

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-31 09:57:29 +00:00
Alex Leong 0261e4f086
Return INVALID_ARGUMENT status codes properly from GetProfile (#11980)
When the destination controller receives a GetProfile request for an ExternalName service, it should return gRPC status code INVALID_ARGUMENT to signal to the proxy that ExternalName services do not have endpoints and service discovery should not be used.  However, the destination controller is returning gRPC status code UNKNOWN instead.  This causes the proxy to retry these requests, resulting in a flurry of warning logs in the destination controller.

We fix the destination controller to properly return INVALID_ARGUMENT instead of UNKNOWN for ExternalName services.
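In gRPC terms the fix amounts to returning a typed status error, roughly:

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// externalNameError is the kind of error GetProfile should return for
// ExternalName services: INVALID_ARGUMENT tells the proxy not to retry
// service discovery for this name.
func externalNameError(svc string) error {
	return status.Errorf(codes.InvalidArgument, "ExternalName service %s has no endpoints", svc)
}

func main() {
	err := externalNameError("example.default.svc.cluster.local")
	fmt.Println(status.Code(err)) // InvalidArgument
}
```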

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-26 09:11:00 -08:00
Matei David 3313bec072
Process ExternalWorkload targetRefs when updating slices (#11987)
If the readiness of an external workload endpoint changes while traffic
is being sent to it, the update will not be propagated to clients. This
can lead to issues where an endpoint that is marked as `notReady`
continues to be reported as `ready` by the endpoints watcher.

The issue stems from how endpoint slices are diffed. A utility function
responsible for processing addresses does not consider endpoints whose
targetRef is an external workload. We fix the problem and add two module
tests to validate readiness is propagated to clients correctly.

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-26 12:01:50 +00:00
Matei David 9c902dc6b4
Add an endpoints reconciler component for external workloads (#11948)
We introduced an endpoints controller that will be responsible for
managing EndpointSlices for services that select external workloads. We
introduce as a follow-up the reconciler component of the controller that
will be responsible for doing the writes and diffing.

Additionally, the controller is wired-up in the destination service's
main routine and will start if endpoint slice support is enabled.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-24 16:55:16 +00:00
Alex Leong 65f13de2ce
Add support for ExternalWorkloads in endpoint profiles (#11952)
When a meshed client attempts to establish a connection directly to the workload IP of an ExternalWorkload, the destination controller should return an endpoint profile for that ExternalWorkload with a single endpoint and the metadata associated with that ExternalWorkload including:
* mesh TLS identity
* workload metric labels
* opaque / protocol hints

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-23 09:43:12 -08:00
Zahari Dichev 6f4cdf6617
discovery: add metrics to endpoints controller workqueue (#11958)
This PR adds metrics to the work queue that is used in the external workload endpoints controller. 

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-22 15:34:50 +02:00
Zahari Dichev 027d49a9a6
discovery: handle endpoint slices from ExternalWorkload (#11939)
This alters the endpoint slices watcher to handle slices that reference ExternalWorkloads.

Testing
Add the following resources: 

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
addressType: IPv4
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    kubernetes.io/service-name: test-1
endpoints:
- addresses:
  - 172.21.0.5
  conditions:
    ready: true
    serving: true
    terminating: false
  targetRef:
    kind: ExternalWorkload
    name: my-external-workload
ports:
- port: 8080
  name: http
---
apiVersion: workload.linkerd.io/v1alpha1
kind: ExternalWorkload
metadata:
  name: my-external-workload
  namespace: mixed-env
  labels:
    app: test
spec:
  meshTls:
    identity: "test"
    serverName: "test"
  workloadIPs:
  - ip: 172.21.0.5
  ports:
  - port: 8080
    name: http
---
apiVersion: v1
kind: Service
metadata:
  name: test-1
  namespace: mixed-env
spec:
  selector:
    app: test
  type: ClusterIP
  ports:
  - name: http
    port: 8080
    targetPort: 8080
    protocol: TCP

```

Observe endpoints:
```
linkerd dg endpoints test-1.mixed-env.svc.cluster.local:8080
```

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-17 15:43:20 -08:00
Matei David 983fc55abc
Introduce new external endpoints controller (#11905)
For mesh expansion, we need to register an ExternalWorkload's service
membership. Service memberships describe which Service objects an
ExternalWorkload is part of (i.e. which service can be used to route
traffic to an external endpoint).

Service membership will allow the control plane to discover
configuration associated with an external endpoint when performing
discovery on a service target.

To build these memberships, we introduce a new controller to the
destination service, responsible for watching Service and
ExternalWorkload objects, and for writing out EndpointSlice objects for
each Service that selects one or more external endpoints.

As a first step, we add a new externalworkload module and a new controller
in it that watches Services and ExternalWorkloads. In a follow-up change,
the ExternalEndpointManager will additionally perform
the necessary reconciliation by writing EndpointSlice objects.

Since Linkerd's control plane may run in HA, we also add a lease object
that will be used by the manager. When a lease is claimed, a flag is
turned on in the manager to let it know it may perform writes.

A more compact list of changes:
* Add a new externalworkload module
* Add an EndpointsController in the module along with necessary mechanisms to watch resources.
* Add RBAC rules to the destination service:
  * Allow policy and destination to read ExternalWorkload objects
  * Allow destination to create / update / read Lease objects

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-17 12:15:28 +00:00
Zahari Dichev 391ce919f5
policy: regenerate Server go bindings (#11920)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-15 11:09:31 +02:00
Alex Leong 27a1a84a48
Only send server updates to listeners when the opaque protocol changes (#11907)
Whenever the destination controller's informer receives an update of a Server resource, it checks every portPublisher in the endpointsWatcher to see if the Server selects any pods in that servicePort and updates those pods' opaque protocol field.  Regardless of whether any pods were matched or whether the opaque protocol changed, an update is sent to each listener.  This results in an update to every endpointTranslator each time a Server is updated.  During a resync, we get an update for every Server in the cluster, which results in N updates to each endpointTranslator where N is the number of Servers in the cluster.

If N is greater than 100, it becomes possible that these N updates could overflow the endpointTranslator update queue if the queue is not being drained fast enough.

We change this to only send the update for a Server if at least one of the servicePort addresses was selected by that server AND its opaque protocol field changed.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-10 14:18:36 -08:00
Oliver Gould b0fee16fb4
destination: Fix GetProfile endpoint buffering (#11815)
In 71635cb and 357a1d3 we updated the endpoint and profile translators
to prevent backpressure from stalling the informer tasks. This change
updates the endpoint profile translator with the same fix, so that
updates are buffered and we can detect when a gRPC stream is stalled.

Furthermore, the update method is updated to use a protobuf-aware
comparison method instead of using serialization and string comparison.

A test is added for the endpoint profile translator, since none existed
previously.
2023-12-22 09:25:12 -08:00
Alex Leong a92d17dbe9
Fix error for profile lookups on unmeshed pods with port in default opaque list (#11550)
When we do a `GetProfile` lookup for an unmeshed pod, we set the `weightedAddr.ProtocolHint` to an empty value `&pb.ProtocolHint{}` to indicate that the address is unmeshed and has no protocol hint.  However, when the looked up port is in the default opaque list, we erroneously check if `weightedAddr.ProtocolHint != nil` to determine if we should attempt to get the inbound listen port for that pod.  Since `&pb.ProtocolHint{} != nil`, we attempt to get the inbound listen port for the unmeshed pod.  This results in an error, preventing any valid `GetProfile` responses from being returned.

We update the initialization logic for `weightedAddr.ProtocolHint` to only create a struct when a protocol hint is present and to leave it as `nil` if the pod is unmeshed.
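The distinction at the heart of the bug, in miniature (local stand-in type; not the proxy-api `pb.ProtocolHint` itself):

```go
package main

import "fmt"

// protocolHint stands in for pb.ProtocolHint.
type protocolHint struct{ h2 bool }

// inboundPortLookupNeeded shows why `&pb.ProtocolHint{}` broke the check:
// a pointer to an empty struct is still non-nil, so unmeshed pods were being
// treated as meshed. Leaving the hint nil for unmeshed pods restores the
// intended behavior.
func inboundPortLookupNeeded(hint *protocolHint) bool {
	return hint != nil
}

func main() {
	var unmeshed *protocolHint // nil: skip inbound-port lookup
	empty := &protocolHint{}   // non-nil even though empty
	fmt.Println(inboundPortLookupNeeded(unmeshed)) // false
	fmt.Println(inboundPortLookupNeeded(empty))    // true (the old bug path)
}
```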

We add a simple unit test for this behavior as well.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-12-20 13:56:49 -08:00
Oliver Gould 5e558ae3e5
destination: Add optional experimental endpoint weighting (#11795)
This change adds a runtime flag to the destination controller,
--experimental-endpoint-zone-weights=true, that causes endpoints in the
local zone to receive higher weights. This feature is disabled by
default, since the weight value is not honored by proxies. No helm
configuration is exposed yet, either.

This weighting is instrumented in the endpoint translator. Tests are
added to confirm that the behavior is feature-gated.

Additionally, this PR adds the "zone" metric label to endpoint metadata
responses.
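Schematically, the feature-gated weighting amounts to something like this (flag plumbing and the boost factor are illustrative):

```go
package main

import "fmt"

const defaultWeight = 10000

// endpointWeight returns the weight for an endpoint, boosting endpoints in
// the local zone only when the experimental flag is enabled.
func endpointWeight(extEndpointZoneWeights bool, endpointZone, localZone string) uint32 {
	if extEndpointZoneWeights && endpointZone != "" && endpointZone == localZone {
		return defaultWeight * 10 // illustrative boost factor
	}
	return defaultWeight
}

func main() {
	fmt.Println(endpointWeight(true, "us-east-1a", "us-east-1a"))  // boosted
	fmt.Println(endpointWeight(false, "us-east-1a", "us-east-1a")) // default (feature-gated)
}
```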
2023-12-20 13:11:30 -08:00
Alejandro Pedraza c7db3b3b80
Remove k8sAPI/metadataAPI dependency from endpointProfileTranslator (#11587)
This removes `endpointProfileTranslator`'s dependency on the k8sAPI and
metadataAPI, which are not used. This was introduced in one of #11334's
refactorings, but ended up being not required. No functional changes
here.
2023-11-14 09:04:36 -05:00
Alex Leong 71635cbf3d
Add queuing to profile translator (#11546)
https://github.com/linkerd/linkerd2/pull/11491 changed the EndpointTranslator to use a queue to avoid calling `Send` on a gRPC stream directly from an informer callback goroutine.  This change updates the ProfileTranslator in the same way, adding a queue to ensure we do not block the informer thread.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-09 07:32:30 -05:00
Alex Leong 1e605ddf63
Add informer lag histograms (#11534)
In order to detect if the destination controller's k8s informers have fallen behind, we add a histogram for each resource type.  These histograms track the delta between when an update to a resource occurs and when the destination controller processes that update.  We do this by looking at the timestamps on the managed fields of the resource and looking for the most recent update and comparing that to the current time.

The histogram metrics are of the form `{kind}_informer_lag_ms_bucket_*`.

* We record a value only for updates, not for adds or deletes.  This is because when the controller starts up, it will populate its cache with an add for each resource in the cluster and the delta between the last updated time of that resource and the current time may be large.  This does not represent informer lag and should not be counted as such.
* When the informer performs resyncs, we get updates where the updated time of the old version is equal to the updated time of the new version.  This does not represent an actual update of the resource itself and so we do not record a value.
* Since we are comparing timestamps set on the managed fields of resources to the current time from the destination controller's system clock, the accuracy of these metrics depends on clock drift being minimal across the cluster.
* We use histogram buckets which range from 500ms to about 17 minutes.  In my testing, an informer lag of 500ms-1000ms is typical.  However, we wish to have enough buckets to identify cases where the informer is lagged significantly behind.
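A sketch of how the lag value can be derived from a resource's managed fields (metric name and buckets are illustrative):

```go
package main

import (
	"fmt"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var informerLag = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "service_informer_lag_ms", // illustrative; real metrics are per-kind
	Help:    "Delta between a resource's last update and when the informer processed it.",
	Buckets: prometheus.ExponentialBuckets(500, 2, 12), // 500ms up to ~17min
})

// recordLag finds the most recent managed-fields timestamp on an updated
// resource and records the delta to now. Adds and resyncs are skipped by the
// caller, as described above.
func recordLag(obj metav1.Object, now time.Time) {
	var latest time.Time
	for _, mf := range obj.GetManagedFields() {
		if mf.Time != nil && mf.Time.Time.After(latest) {
			latest = mf.Time.Time
		}
	}
	if latest.IsZero() {
		return
	}
	informerLag.Observe(float64(now.Sub(latest).Milliseconds()))
}

func main() {
	prometheus.MustRegister(informerLag)
	fmt.Println("lag histogram registered")
}
```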

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-08 14:56:20 -05:00
Alex Leong 8cf386355f
Fix GetProfiles error when address is opaque and unmeshed (#11556)
When we do a GetProfile lookup for an opaque port on an unmeshed pod,
we attempt to look up the inbound listen port of that pod's proxy. Since
that pod has no proxy, this fails and we return an error to the GetProfile
API call. This causes the proxy to fail to be able to resolve the profile and
be unable to route the traffic.

We revert to the previous behavior of only logging when we cannot look
up the inbound listen port instead of returning an error.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-02 06:59:39 -07:00
Alex Leong 8c42a509d7
Return NotFound for unknown pod names (#11540)
Fixes #11065

When an inbound proxy receives a request with a canonical name of the form `hostname.service.namespace.svc.cluster.domain`, we assume that `hostname` is the hostname of the pod as described in https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields.  However, pods are also addressable with `pod-ip.service.namespace.svc.cluster.domain`.  When the destination controller gets a profile request of this form, we attempt to find a pod with hostname of `pod-ip` and return an error with gRPC status `Unknown` since this will not exist.

It is expected that this profile lookup will fail since we cannot have service profiles for individual pods.  However, returning a gRPC status `Unknown` for these requests brings the reported success rate of the destination controller down.  Instead we should return these as gRPC status `NotFound` so that these responses don't get reported as server errors.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-11-01 15:44:54 -07:00
Alex Leong 4e7a588a2c
Add pod name to context token and logging (#11532)
When the destination controller logs about receiving or sending messages to a data plane proxy, there is no information in the log about which data plane pod it is communicating with.  This can make it difficult to diagnose issues which span the data plane and control plane.

We add a `pod` field to the context token that proxies include in requests to the destination controller.  We add this pod name to the logging context so that it shows up in log messages.  In order to accomplish this, we had to plumb through logging context in a few places where it previously had not been.  This gives us a more complete logging context and more information in each log message.

An example log message with this fuller logging context is:

```
time="2023-10-24T00:14:09Z" level=debug msg="Sending destination add: add:{addrs:{addr:{ip:{ipv4:183762990}  port:8080}  weight:10000  metric_labels:{key:\"control_plane_ns\"  value:\"linkerd\"}  metric_labels:{key:\"deployment\"  value:\"voting\"}  metric_labels:{key:\"pod\"  value:\"voting-7475cb974c-2crt5\"}  metric_labels:{key:\"pod_template_hash\"  value:\"7475cb974c\"}  metric_labels:{key:\"serviceaccount\"  value:\"voting\"}  tls_identity:{dns_like_identity:{name:\"voting.emojivoto.serviceaccount.identity.linkerd.cluster.local\"}}  protocol_hint:{h2:{}}}  metric_labels:{key:\"namespace\"  value:\"emojivoto\"}  metric_labels:{key:\"service\"  value:\"voting-svc\"}}" addr=":8086" component=endpoint-translator context-ns=emojivoto context-pod=web-767f4484fd-wmpvf remote="10.244.0.65:52786" service="voting-svc.emojivoto.svc.cluster.local:8080"
```

Note the `context-pod` field.

Additionally, we have tested this when no pod field is included in the context token (e.g. when handling requests from a pod which does not yet add this field) and confirmed that the `context-pod` log field is empty, but no errors occur.

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-10-25 13:48:42 -07:00
Oliver Gould 8cd165a1e4
destination: Index pods and service with multiple IPs (#11515)
The destination controller indexes Pods by their primary PodIP and their
primary HostIP (when applicable); and it indexes Services by their
primary ClusterIP.

In preparation for dual-stack (IPv6) support, this change updates the
destination controller indexers to consume all IPs for these resources.
Tests are updated to demonstrate a dual-stack setup (where these
resources have a primary IPv4 address and a secondary IPv6 address).
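A simplified indexer along these lines (the index name is illustrative), indexing a Pod under every PodIP rather than just the primary one:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// podIPIndexFunc indexes a Pod under all of its PodIPs so that dual-stack
// pods can be looked up by either their IPv4 or IPv6 address.
func podIPIndexFunc(obj interface{}) ([]string, error) {
	pod, ok := obj.(*corev1.Pod)
	if !ok {
		return nil, fmt.Errorf("object is not a Pod")
	}
	ips := make([]string, 0, len(pod.Status.PodIPs))
	for _, ip := range pod.Status.PodIPs {
		ips = append(ips, ip.IP)
	}
	return ips, nil
}

func main() {
	indexer := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{"podIPs": podIPIndexFunc})
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "web-0", Namespace: "default"},
		Status:     corev1.PodStatus{PodIPs: []corev1.PodIP{{IP: "10.0.0.1"}, {IP: "2001:db8::1"}}},
	}
	_ = indexer.Add(pod)
	byV6, _ := indexer.ByIndex("podIPs", "2001:db8::1")
	fmt.Println(len(byV6)) // 1
}
```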

While exercising these tests, I had to replace the brittle host:port
parsing logic we use in the server, in favor of Go's standard
`net.SplitHostPort` utility.
2023-10-23 14:55:33 -07:00
Alex Leong 357a1d32b2
Add update queue to endpoint translator (#11491)
When a grpc client of the destination.Get API initiates a request but then doesn't read off of that stream, the HTTP2 stream flow control window will fill up and eventually exert backpressure on the destination controller.  This manifests as calls to `Send` on the stream blocking.  Since `Send` is called synchronously from the client-go informer callback (by way of the endpoint translator), this blocks the informer callback and prevents all further informer callbacks from firing.  This causes the destination controller to stop sending updates to any of its clients.

We add a queue in the endpoint translator so that when it gets an update from the informer callback, that update is queued and we avoid potentially blocking the informer callback.  Each endpoint translator spawns a goroutine to process this queue and call `Send`.  If there is not capacity in this queue (e.g. because a client has stopped reading and we are experiencing backpressure) then we terminate the stream.
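The queueing behavior can be sketched with a bounded channel and a non-blocking send (schematic; not the translator's exact implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// translator is a schematic version of the endpoint translator's queue: the
// informer callback enqueues without blocking, and the stream is torn down
// if the queue overflows (i.e. the client has stopped reading).
type translator struct {
	updates chan string
	stop    chan struct{}
	once    sync.Once
}

func newTranslator(capacity int) *translator {
	t := &translator{updates: make(chan string, capacity), stop: make(chan struct{})}
	go func() {
		for {
			select {
			case u := <-t.updates:
				_ = u // the real translator calls Send on the gRPC stream here
			case <-t.stop:
				return
			}
		}
	}()
	return t
}

// enqueue never blocks the informer callback; on overflow it ends the stream.
func (t *translator) enqueue(update string) {
	select {
	case t.updates <- update:
	default:
		t.once.Do(func() { close(t.stop) }) // overflow: terminate instead of blocking
	}
}

func main() {
	t := newTranslator(1)
	t.enqueue("add")
	fmt.Println("update queued without blocking the informer callback")
}
```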

Signed-off-by: Alex Leong <alex@buoyant.io>
2023-10-18 12:34:38 -07:00
Alejandro Pedraza c67985def0
Extend unit test for HostPort subscriptions (#11439)
Followup to https://github.com/linkerd/linkerd2/pull/11334#issuecomment-1736093592

This extends the test introduced in #11334 to exercise upgrading a
Server associated to a pod's HostPort, and observing how the stream
updates the OpaqueProtocol field.

Helper functions were refactored a bit to allow retrieving the
l5dCRDClientSet used when building the fake API.
2023-10-02 13:51:19 -05:00