Commit Graph

423 Commits

Author SHA1 Message Date
Alex Leong 7edd886c91
test(multicluster): Wait for cluster store to be populated in test
The federated service watcher test has a race condition where we create a cluster store with a set of Kubernetes manifests and then immediately begin testing queries to that cluster store.  If these queries are executed before the cluster store's informers process the Kubernetes manifests, the queries can fail.

In the context of this test, this failure manifests as the read on the updates channel never returning, resulting in test timeouts.

We fix this by waiting for the cluster store to be populated before continuing with the test and issuing queries.
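
A minimal Go sketch of the waiting step (not the actual test code; `check` stands in for a cluster-store lookup such as fetching the expected cluster):

```go
package destination_test

import (
	"testing"
	"time"
)

// waitFor polls until check returns true or the deadline passes, giving the
// cluster store's informers time to process the fixture manifests.
func waitFor(t *testing.T, check func() bool) {
	t.Helper()
	deadline := time.Now().Add(10 * time.Second)
	for time.Now().Before(deadline) {
		if check() {
			return
		}
		time.Sleep(50 * time.Millisecond)
	}
	t.Fatal("timed out waiting for the cluster store to be populated")
}
```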

Signed-off-by: Alex Leong <alex@buoyant.io>
2025-04-11 14:41:56 -07:00
Oliver Gould 6501de61ed
fix(test): avoid duplicate registry errors (#13898)
The destination controller's cluster store registers a gauge in its constructor. When this constructor is called multiple times (i.e. in tests), this can lead to a panic.

To avoid this panic, this change updates NewClusterStoreWithDecoder to accept a prometheus registry. The NewClusterStore constructor (used by the application's main) continues to use the default registry, but tests now construct their own temporary registries to avoid duplicate registration errors.
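
A sketch of the pattern described above (the constructor and metric names here are illustrative, not the actual linkerd2 API): the constructor takes a prometheus.Registerer so tests can pass their own registry.

```go
package clusterstore

import "github.com/prometheus/client_golang/prometheus"

// ClusterStore is a stand-in for the real type; only the metric wiring is shown.
type ClusterStore struct {
	clusters prometheus.Gauge
}

// NewClusterStoreWithRegistry registers its gauge against the injected registry:
// main passes prometheus.DefaultRegisterer, tests pass prometheus.NewRegistry().
func NewClusterStoreWithRegistry(reg prometheus.Registerer) *ClusterStore {
	g := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "cluster_store_size", // hypothetical metric name
		Help: "Number of remote clusters tracked by the store.",
	})
	reg.MustRegister(g) // panics only on duplicate registration within this registry
	return &ClusterStore{clusters: g}
}
```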
2025-04-03 15:05:24 -05:00
Alejandro Pedraza 38ec59128e
refactor(multicluster): revert Link permissions back to Role (#13848)
#13783 moved the service mirror permissions on Links from a Role to a ClusterRole as a side-effect, and this change reverts that by refactoring the Links API to allow consuming a namespace-scoped API more easily.

- We introduce in our `k8s.API` type a field `L5dClient` alongside the broad `Client` one, which is constructed via the new function `NewL5dNamespacedAPI()`.
- In the service-mirror `main.go` we use that constructor to acquire `linksAPI`, which is used to configure the informer for handling Link events in this file.
- `linksAPI` is also passed down to instantiations of `RemoteClusterServiceWatcher`, where it's used for the direct kube-apiserver calls and for retrieving a Lister.
2025-03-22 05:44:35 -05:00
Zahari Dichev f57137b121
fix(dest): fallback to default proxy inbound port when one could not be discovered on an ExternalWorkload (#13840)
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2025-03-21 15:25:15 +02:00
Scott Fleener 5d0275e3f3
feat(dest): Default meshed traffic to inbound proxy port (#13715)
A follow up to https://github.com/linkerd/linkerd2/pull/13699, this default-enables the config option introduced in that PR. Now, all traffic between meshed pods should flow to the proxy's inbound port.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-11 15:25:40 -07:00
Scott Fleener 05d48f6a52
fix(destination): Do not send admin traffic over opaque transport (#13758)
Traffic that is meant for the destination workload can be sent over the opaque transport without issue. However, traffic intended for the proxy itself (metrics scraping, tap) needs to be sent directly to the corresponding proxy port to prevent it from being forwarded to the workload.

This adds in special cases for the admin and control ports, read directly from the environment variables on the pods, that excludes them from being sent over opaque transport.
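
A hedged sketch of the idea: read the admin/control listen addresses from the proxy container's environment and treat those ports as direct, never opaque-transport. The env var names follow the proxy variables referenced elsewhere in this log; the helper and its port parsing are illustrative.

```go
package destination

import (
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// proxyDirectPorts collects the admin and control ports from the proxy
// container's environment; traffic to these ports is excluded from opaque transport.
func proxyDirectPorts(pod *corev1.Pod) map[string]struct{} {
	ports := map[string]struct{}{}
	for _, c := range pod.Spec.Containers {
		for _, env := range c.Env {
			switch env.Name {
			case "LINKERD2_PROXY_ADMIN_LISTEN_ADDR", "LINKERD2_PROXY_CONTROL_LISTEN_ADDR":
				// Values look like "0.0.0.0:4191" or "[::]:4190"; keep the port.
				if i := strings.LastIndex(env.Value, ":"); i >= 0 {
					ports[env.Value[i+1:]] = struct{}{}
				}
			}
		}
	}
	return ports
}
```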

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-11 12:15:10 -07:00
Scott Fleener 156bf60ad7
feat(destination): introduce transport-protocol outbound TLS mode (#13699)
Non-opaque meshed traffic currently flows over the original destination port, which requires the inbound proxy to do protocol detection.

This adds an option to the destination controller that configures all meshed traffic to flow to the inbound proxy's inbound port. This will allow us to include more session protocol information in the future, obviating the need for inbound protocol detection.

This doesn't do much in the way of testing, since the default behavior should be unchanged. When this default changes, more validation will be done on the behavior here.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2025-03-05 13:51:21 -08:00
Tuomo ba8a84c960
fix(destination): GetProfile requests targeting pods directly should return endpoint data for running (not necessarily ready) pods (#13557)
* fix(destination): GetProfile requests targeting pods directly should return endpoint data for running (not necessarily ready) pods

Requiring Pods to pass readiness checks before allowing Pod-to-Pod communication disrupts communication in, e.g., clustered systems that require Pods to communicate with each other prior to establishing a ready state and accepting inbound traffic.

We relaxed the requirement and modified the workload watcher to only require that a Pod exists and is in the Running phase.
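
A minimal sketch of the relaxed check, assuming a corev1.Pod in hand (the helper names are illustrative):

```go
package watcher

import corev1 "k8s.io/api/core/v1"

// podEligible reflects the relaxed requirement: the Pod only has to exist and
// be Running; it no longer has to report the Ready condition.
func podEligible(pod *corev1.Pod) bool {
	return pod != nil && pod.Status.Phase == corev1.PodRunning
}

// podReady is the stricter check that direct pod lookups no longer require.
func podReady(pod *corev1.Pod) bool {
	for _, c := range pod.Status.Conditions {
		if c.Type == corev1.PodReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```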

Reproduced the issue with a test setup described in #13247.

Fixes #13247.

---------

Signed-off-by: Tuomo <tjorri@gmail.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2025-01-16 16:55:31 -05:00
Oliver Gould 3c91fc64ce
fix(destination): avoid panic on missing managed fields timestamp (#13378)
We received a report of a panic:

    runtime error: invalid memory address or nil pointer dereference

    panic({0x1edb860?, 0x37a6050?}
        /usr/local/go/src/runtime/panic.go:785 +0x132

    github.com/linkerd/linkerd2/controller/api/destination/watcher.latestUpdated({0xc0006b2d80?, 0xc00051a540?, 0xc0008fa008?})
        /linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1612 +0x125

    github.com/linkerd/linkerd2/controller/api/destination/watcher.(*OpaquePortsWatcher).updateService(0xc0007d5480, {0x21fd160?, 0xc000d71688?}, {0x21fd160, 0xc000d71688})
        /linkerd-build/vendor/github.com/linkerd/linkerd2/controller/api/destination/watcher/opaque_ports_watcher.go:141 +0x68

The `latestUpdated` function does not properly handle the case where a time is omitted from a `ManagedFieldsEntry`.

    type ManagedFieldsEntry struct {
        // Time is the timestamp of when the ManagedFields entry was added. The
        // timestamp will also be updated if a field is added, the manager
        // changes any of the owned fields value or removes a field. The
        // timestamp does not update when a field is removed from the entry
        // because another manager took it over.
        // +optional
        Time *Time `json:"time,omitempty" protobuf:"bytes,4,opt,name=time"`

This change adds a check to avoid the nil dereference.
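
A sketch of the guard (the real function's signature may differ; this only shows the nil handling for the optional Time field):

```go
package watcher

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// latestUpdated returns the most recent managed-fields timestamp, skipping
// entries whose optional Time is nil (the case that triggered the panic).
func latestUpdated(fields []metav1.ManagedFieldsEntry) time.Time {
	var latest time.Time
	for _, f := range fields {
		if f.Time == nil {
			continue
		}
		if f.Time.Time.After(latest) {
			latest = f.Time.Time
		}
	}
	return latest
}
```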
2024-11-22 15:21:09 -08:00
Alex Leong 752d1c9ea0
Add tests for federated service watcher (#13329)
Adds tests for the federated service watcher that exercise having remote and local clusters join and leave a federated service and ensuring that the correct proxy API updates are emitted.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-11-19 10:08:50 -08:00
Oliver Gould 5cbe45c86e
feat(destination): set parent and profile references (#13292)
In order for proxies to properly reflect the resources used to drive policy
decisions, the proxy API has been updated to include resource metadata on
ServiceProfile responses.

This change updates the profile translator to include ParentRef and ProfileRef
metadata when it is configured.

This change does not set backend or endpoint references.
2024-11-09 00:11:40 +00:00
Alex Leong c66f83e1f1
Add federated service watcher (#13267)
We add support for federated services to the destination controller by adding a new FederatedServiceWatcher.  When the destination controller receives a `Get` request for a Service with the `multicluster.linkerd.io/remote-discovery` and/or the `multicluster.linkerd.io/local-discovery` annotations, it subscribes to the FederatedServiceWatcher instead of subscribing to the EndpointsWatcher directly.  The FederatedServiceWatcher watches the federated service for any changes to these annotations, and maintains the appropriate watches on the local EndpointsWatcher and/or remote EndpointsWatchers fetched through the ClusterStore.

This means that we will often have multiple EndpointTranslators writing to the same `Get` response stream.  In order for a `NoEndpoints` message sent to one EndpointTranslator to not clobber the whole stream, we make a change where `NoEndpoints` messages are no longer sent to the response stream, but are replaced by a `Remove` message containing all of the addresses from that EndpointTranslator.  This allows multiple EndpointTranslators to coexist on the same stream.
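
A hedged sketch of the Remove-instead-of-NoEndpoints idea, assuming the message shapes from linkerd2-proxy-api's destination protobuf (address construction and bookkeeping are elided):

```go
package destination

import (
	pb "github.com/linkerd/linkerd2-proxy-api/go/destination"
	proxynet "github.com/linkerd/linkerd2-proxy-api/go/net"
)

// sendRemoveAll replaces a NoEndpoints message with a Remove containing this
// translator's own addresses, so endpoints contributed by other translators on
// the same stream are left untouched.
func sendRemoveAll(stream pb.Destination_GetServer, addrs []*proxynet.TcpAddress) error {
	return stream.Send(&pb.Update{
		Update: &pb.Update_Remove{
			Remove: &pb.AddrSet{Addrs: addrs},
		},
	})
}
```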

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-11-08 09:34:01 -08:00
Scott Fleener 958cfca666
Export zone locality in outbound destination metrics (#13129)
Currently, we don't have a simple way of checking if the endpoint a proxy is discovering is in the same zone or not.

This adds a "zone_locality" metric label to the outbound destination address metrics. Note that this does not increase the cardinality of the related metrics, as this label doesn't vary within an endpoint.

Validated by checking the prometheus metrics on a local cluster and verifying this label appears in the outbound transport metrics.

Signed-off-by: Scott Fleener <scott@buoyant.io>
2024-10-15 13:43:05 -07:00
dependabot[bot] 4baa94baac
build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0 (#12958)
* build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.30.3 to 0.31.0.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* To appease the linter, replaced deprecated workqueue interfaces with their typed alternatives. For the endpoints controller we can instantiate with a single concrete item type, but for the service mirror, given the queue can hold different event types, we have to instantiate with a broader one.
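
A hedged sketch of the typed workqueue constructors in client-go 0.31; the concrete type parameters used in this PR are not shown in this log, so a string-keyed queue and an `any` queue are used purely as examples.

```go
package queues

import "k8s.io/client-go/util/workqueue"

// newTypedQueues builds one queue keyed by strings and one that can hold
// heterogeneous event values.
func newTypedQueues() (workqueue.TypedRateLimitingInterface[string], workqueue.TypedRateLimitingInterface[any]) {
	keys := workqueue.NewTypedRateLimitingQueue(
		workqueue.DefaultTypedControllerRateLimiter[string](),
	)
	events := workqueue.NewTypedRateLimitingQueue(
		workqueue.DefaultTypedControllerRateLimiter[any](),
	)
	return keys, events
}
```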

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-09-04 09:04:04 -05:00
Alejandro Pedraza 567288a060
Dual-stack support for ExternalWorkloads (#12965)
* Dual-stack support for ExternalWorkloads

This changes the `workloadIPs.maxItems` field in the ExternalWorkload CRD from `1` to `2`, to accommodate an IPv4 and IPv6 pair. This is a backwards-compatible change, so there's no need to bump the CRD version.

The control plane already supports this, so this change is mainly about expansions to the unit tests to also account for the double stack case.
2024-08-30 13:23:56 -05:00
Alejandro Pedraza 71291fe7bc
Add `accessPolicy` field to Server CRD (#12845)
Followup to #12844

This new field defines the default policy for Servers, i.e. if a request doesn't match the policy associated with a Server, then this policy applies. The values are the same as for `proxy.defaultInboundPolicy` and the `config.linkerd.io/default-inbound-policy` annotation (all-unauthenticated, all-authenticated, cluster-authenticated, cluster-unauthenticated, deny), plus a new value "audit". The default is "deny", thus remaining backwards-compatible.

This field is also exposed as an additional printer column.
2024-07-22 09:01:09 -05:00
Alex Leong 3bd01cac9c
add nil check when reading endpoint hostname (thanks @acallejaszu) (#12689)
Fixes #12686

When an endpoint in an EndpointSlice resource does not contain a hostname field, the destination controller can panic while looking for an endpoint with a certain hostname.  This happens when doing a lookup with a pod dns name.

We add a nil check to avoid the panic.

We add such an endpoint to our test fixture to exercise this case.
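
A sketch of the nil check, assuming an EndpointSlice endpoint lookup by hostname (discovery/v1 exposes Hostname as a `*string`, so it may be absent):

```go
package watcher

import discoveryv1 "k8s.io/api/discovery/v1"

// findByHostname looks up an endpoint by hostname; a nil Hostname is simply
// skipped instead of being dereferenced.
func findByHostname(endpoints []discoveryv1.Endpoint, hostname string) *discoveryv1.Endpoint {
	for i := range endpoints {
		if ep := &endpoints[i]; ep.Hostname != nil && *ep.Hostname == hostname {
			return ep
		}
	}
	return nil
}
```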

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-06-10 10:45:31 -07:00
dependabot[bot] d42432914d
build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0 (#12593)
* build(deps): bump google.golang.org/grpc from 1.63.2 to 1.64.0

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.63.2 to 1.64.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.63.2...v1.64.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

I've replaced all the `grpc.Dial` calls with `grpc.NewClient`. There was one call to `grpc.DialContext(ctx, ...)` in `viz/tap/api/grpc_server.go` that also got replaced with `grpc.NewClient`, which loses the ability to pass `ctx`; but that doesn't matter, since we're not using `WithBlock(true)` and that context wasn't being honored by `DialContext()` anyway.

https://github.com/grpc/grpc-go/blob/v1.64.0/clientconn.go#L216-L242
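
A sketch of the shape of the migration; `grpc.NewClient` connects lazily, so the context formerly given to `DialContext` has no equivalent here. The target and credentials are placeholders.

```go
package client

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// newConn replaces the old DialContext pattern with NewClient.
func newConn(addr string) (*grpc.ClientConn, error) {
	// Before: conn, err := grpc.DialContext(ctx, addr, opts...)
	return grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
}
```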

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-05-22 14:40:04 -05:00
Alejandro Pedraza df24ea6d96
Refactor ES addition logic in Destination (#12625)
This is a second take on #12427, which avoided a theoretical/correctness
issue around overwriting new ES addresses with stale data.

We had to revert that in #12589 because the change introduced a bug: by
returning early when the ES had no addresses, it failed to properly
initialize `addresses` for the portPublisher.

This just removes the early return.
2024-05-22 09:12:19 -05:00
Matei David 407df01ec3
chore(controller): Remove stream concurrency limits (#12598)
Our gRPC servers use the default gRPC server configuration, which
limits the number of concurrent streams to 100. Since the controllers
run with proxies, this provides a hard scaling limit for the number of
watches an application can have.

This change updates our gRPC server configuration to clear the default
concurrency limit, allowing the server to handle as many streams as
possible.
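
A hedged sketch of clearing the default limit; the exact option value used by the controllers is an assumption here.

```go
package server

import (
	"math"

	"google.golang.org/grpc"
)

// newGRPCServer raises the per-connection concurrent stream cap (default 100)
// so watch-heavy clients are not throttled.
func newGRPCServer() *grpc.Server {
	return grpc.NewServer(
		grpc.MaxConcurrentStreams(math.MaxUint32),
	)
}
```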

Signed-off-by: Matei David <matei@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2024-05-15 18:07:15 +01:00
Alejandro Pedraza 9bd8c005da
Revert "Fix destination staleness issue when adding EndpointSlices (#12427)" (#12589)
This reverts commit 4fccf3e9ec.

The early return was causing `pp.addresses = newAddressSet` to not be run when the list of addresses is empty; but setting that is still necessary so that labels are tracked correctly.

This was caught by the tap (viz) integration test run in the release workflow.
2024-05-13 16:25:50 -07:00
Alejandro Pedraza 4fccf3e9ec
Fix destination staleness issue when adding EndpointSlices (#12427)
When updating the portPublisher's address set when a new EndpointSlice creation event is received, its addresses were getting overwritten with stale data whenever their IDs already existed in the current pp's address set.

There can be pathological cases in single-stack mode where this is a problem. For example, when an ES gets recycled but the deletion event is not caught for some reason, the Address data in the subsequent addition event will be overwritten by the old stale entry.

## Other Changes

- Remove overriding `newAddressSet.LocalTrafficPolicy`, as that is already taken care of inside `pp.endpointSliceToAddresses(slice)`.
- When there are no Add events to send, return early without updating state or metrics.
2024-05-08 09:12:51 -05:00
Alex Leong d3e227fbd7
Fix flakey Handles_overflow test (#12555)
The `Handles overflow` test for the endpoint profile translator writes updates into the updates queue until it is full and then tests that no more updates can be enqueued.  However, since the test also starts the profile translator, it is concurrently draining updates off of the queue as well.  This leads to unpredictable results and test flakiness.

We update the test to not start the translator so that updates are not drained off of the queue during the test.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-05-07 13:06:58 -07:00
Alejandro Pedraza 137eac9df3
Add IPv6 support for the destination controller (#12428)
Services in dual-stack mode result in the creation of two EndpointSlices, one for each IP family. Before this change, the Get Destination API would nondeterministically return the address for any of those ES, depending on which one was processed last by the controller because they would overwrite each other.

As part of the ongoing effort to support IPv6/dual-stack networks, this change fixes that behavior giving preference to IPv6 addresses whenever a service exposes both families.

There is a new set of unit tests in server_ipv6_test.go, and in the TestEndpointTranslatorForPods tests there are a couple of new cases to test the interaction with zone filtering.
Also the server unit tests were updated to segregate the tests and resources dealing with the IPv4/IPv6/dual-stack cases.
2024-05-02 14:39:05 -05:00
Alejandro Pedraza 7cbe2f5ca6
Enable forwarding IPv6 connections through the proxy (#12495)
As part of the ongoing effort to support IPv6/dual-stack networks, this change
enables the proxy to properly forward IPv6 connections:

- Adds the new `LINKERD2_PROXY_OUTBOUND_LISTEN_ADDRS` environment variable when
  injecting the proxy. This is supported as of proxy v2.228.0 which was just
  pulled into the linkerd2 repo in #2d5085b56e465ef56ed4a178dfd766a3e16a631d.
  This adds the IPv6 loopback address (`[::1]`) to the IPv4 one (`127.0.0.1`)
  so the proxy can forward outbound connections received via IPv6. The injector
  will still inject `LINKERD2_PROXY_OUTBOUND_LISTEN_ADDR` to support the rare
  case where the `proxy.image.version` value is overridden with an older
  version. The new proxy still considers that variable, but it's superseded by
  the new one. The old variable is considered deprecated and should be removed
  in the future.
- The values for `LINKERD2_PROXY_CONTROL_LISTEN_ADDR`,
  `LINKERD2_PROXY_ADMIN_LISTEN_ADDR` and `LINKERD2_PROXY_INBOUND_LISTEN_ADDR`
  have been updated to point to the IPv6 wildcard address (`[::]`) instead of
  the IPv4 one (`0.0.0.0`) for the same reason. Unlike with the loopback
  address, the IPv6 wildcard address suffices to capture both IPv4 and IPv6
  traffic.
- The endpoint translator's `getInboundPort()` has been updated to properly
  parse the IPv6 loopback address retrieved from the proxy container manifest.
  A unit test was added to validate the behavior.
2024-05-02 16:39:19 +01:00
Oliver Gould aef8a02426
feat(destination): Add meshed HTTP/2 keep-alive settings (#12504)
This commit adds destination controller configuration that enables default
keep-alives for meshed HTTP/2 clients.

This is accomplished by encoding the raw protobuf message structure into the
helm values, and then encoding that as JSON in the destination controller's
command-line options. This allows operators to set any supported HTTP/2 client
configuration without having to modify the destination controller.
2024-04-30 19:35:30 +00:00
knowmost 27bcdd1028
chore: fix function names in comment (#12512)
Signed-off-by: knowmost <knowmost@outlook.com>
2024-04-29 10:28:10 -07:00
Alejandro Pedraza 6db4bd667c
Fix issues with native sidecars (#12453)
Closes #12395

When the proxy is injected as a native sidecar, failing to iterate over init containers as well as regular containers when looking for the proxy in various parts of the code resulted in:

- `Get` Destination API failing in the presence of opaque ports
- The injector failing to detect already-injected pods
- Various CLI issues

This PR is split into the following commits addressing each issue separately:

a8ebe76e3 - Fix injection check for existing sidecars
44e9625e0 - Fix 'linkerd uninject'
62694965d - Fix 'linkerd version --proxy'
42dbdaddf - Fix 'linkerd identity'
39db823fe - Fix 'linkerd check'
7359f371d - Fix 'linkerd dg proxy-metrics'
f8f73c47c - Fix destination controller
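
A sketch of the core idea behind these fixes: look for the proxy in both init containers (native sidecars) and regular containers. The `"linkerd-proxy"` container name is the conventional one; the helper is illustrative.

```go
package inject

import corev1 "k8s.io/api/core/v1"

// findProxyContainer checks init containers as well as regular containers.
func findProxyContainer(pod *corev1.Pod) *corev1.Container {
	all := append([]corev1.Container{}, pod.Spec.InitContainers...)
	all = append(all, pod.Spec.Containers...)
	for i := range all {
		if all[i].Name == "linkerd-proxy" {
			return &all[i]
		}
	}
	return nil
}
```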
2024-04-26 14:38:01 -05:00
Matei David 38c6d11832
Change injector overriding logic to be more generic (#12405)
The proxy-injector package has a `ResourceConfig` type that is
responsible for parsing resources, applying overrides, and serialising a
series of configuration values to a Kubernetes patch. The functionality
is very concrete in its assumption; it always relies on a pod spec and
it mutates inner state when deciding on which overrides to apply.

This is not a flexible way to handle injection and configuration
overriding for other types of resources. We change this by turning
methods previously defined on `ResourceConfig` into free-standing
functions. These functions can be applied for any type of resources in
order to compute a set of configuration values based on annotation
overrides. Through the change, the functions can be used to compute
static configuration for non-Pod types or can be used in tests.


Signed-off-by: Matei David <matei@buoyant.io>
2024-04-10 15:51:58 +01:00
hanghuge 78d42b230d
chore: fix function name in comment (#12396)
Fixed comments for `subscribeToServicesWithContext` and `reconcileByAddressType`. Previously,
the comments contained incorrect function names.

Signed-off-by: hanghuge <cmoman@outlook.com>
2024-04-10 15:46:45 +01:00
occupyhabit 6eeaea4d94
chore: Remove repetitive words (#12330)
Signed-off-by: occupyhabit <wangmengjiao@outlook.com>
2024-03-25 09:33:39 -07:00
Hirotaka Tagawa / wafuwafu13 9a5284f453
controller: Stop logging errors on shutdown (#12167)
When a controller is shut down, the admin server fails. This failure is logged
as an error, even when the shutdown was graceful.

This change updates the shutdown behavior to log more appropriately.
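
A sketch of the usual pattern for not logging graceful shutdowns as errors, assuming a net/http admin server:

```go
package admin

import (
	"errors"
	"log"
	"net/http"
)

// serve treats http.ErrServerClosed (returned after a graceful Shutdown) as a
// normal exit rather than an error worth logging.
func serve(srv *http.Server) {
	if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Printf("admin server failed: %v", err)
	}
}
```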

Signed-off-by: wafuwafu13 <jaruwafu@gmail.com>
2024-03-22 09:49:37 -07:00
Alex Leong 2c2a96bc73
Removes should not change local traffic policy (#12325)
Fixes: #12311

When the endpoint translator receives a `remove` call, it was updating its local traffic policy based on the address set passed to remove.  However, since `remove` is only meant to remove addresses and not change the address metadata, the endpoints watcher was not setting local traffic policy on these calls to `remove`.  This can result in calls to `remove` temporarily turning off local traffic policy, which will cause non-local addresses to be sent to clients.

Since `remove` should not change address metadata, we now disregard any metadata in the call to `remove`, including any changes to the local traffic policy.

Signed-off-by: Alex Leong <alex@buoyant.io>
Co-authored-by: Oliver Gould <ver@buoyant.io>
2024-03-22 09:42:13 -07:00
Alejandro Pedraza b697e285a0
Refactor IPv4-only functions to also work for IPv6 (#12303)
The main change here is the refactoring of the address functions in `addr.go` that support the Destination controller and Viz's Tap controller. Some of those functions only worked for IPv4, so this change refactored them to make them IP family agnostic.
This enabled adding (and fixing) IPv6 unit tests as detailed in the following sections.

Other changes:

- The `ProxyAddressesToString()` function was no longer used, so it got removed.
- The `ProxyIPToString()` function was only used by the destination-client script, so that got stripped out.

## `addr_test.go`

We added IPv6 cases to each test, that would have failed previously.

## `endpoint_translator_test.go`

One of the test pods (pod3) was changed to have an IPv6 address. Without the other changes in this PR those tests would still have passed, but only because, when comparing actual IPs with expected ones, we weren't checking whether both were empty. So here we added checks against that.

## `server_test.go`

As above, we added checks against empty IPs. And in the mocked resources in `test_util.go` we added an IPv6 EndpointSlice.
2024-03-22 07:20:52 -05:00
Alex Leong 5915ef5a18
Don't send endpoint profile updates from Server updates when opaqueness doesn't change (#12013)
When the destination controller receives an update for a Server resource, we recompute opaque ports for all pods.  This results in a large number of updates to all endpoint profile watches, even if the opaqueness doesn't change.  In cases where there are many Server resources, this can result in a large number of updates being sent to the endpoint profile translator and overflowing its update queue.  This is especially likely to happen during an informer resync, since this results in an informer callback for every Server in the cluster.

We refactor the workload watcher to not send these updates if the opaqueness has not changed.

This seemingly simple change in behavior requires a large code change because:
* the current opaqueness state is not stored on workload publishers and must be added so that we can determine if the opaqueness has changed
* storing the opaqueness in addition to the other state we're storing (pod, ip, port, etc.) means that we are not storing all of the data represented by the Address struct
* workload watcher uses a `createAddress` func to dynamically create an Address from the state it stores
* now that we are storing the Address as state, creating Addresses dynamically is no longer necessary and we can operate on the Address state directly
  * this makes the workload watcher more similar to other watchers and follow a common pattern
  * it also fixes some minor correctness issues:
    * pods that did not have the ready status condition were being considered when they should not have been
    * updates to ExternalWorkload labels were not being considered

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-03-19 10:24:02 -07:00
Oliver Gould e1ab0a2e46
destination: Fix flakey TestEndpointProfileTranslator (#12182)
TestEndpointProfileTranslator could race against the consumer task
so that the final queue insertion would not necessarily fail. The send
buffer has been eliminated to avoid this race.
2024-02-29 08:59:44 -08:00
Matei David 98e38a66b6
Rename meshTls to meshTLS in ExternalWorkload CRD (#12098)
The ExternalWorkload resource we introduced has a minor naming
inconsistency; `Tls` in `meshTls` is not capitalised. Other resources
that we have (e.g. authentication resources) capitalise TLS (and so does
Go, it follows a similar naming convention).

We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.

Upgrading the control plane version will continue to work without
downtime. However, if an existing resource exists, the policy controller
will not completely initialise. It will not enter a crashloop backoff,
but it will also not become ready until the resource is edited or
deleted.

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-20 11:00:13 -08:00
Oliver Gould 2ab76b64c6
destination: Rename zone weighting flag to ext-endpoint-zone-weights (#12090) 2024-02-16 09:06:56 -05:00
Zahari Dichev bf7b039f41
controller: add counter for items dropped from workqueue (#12079)
Adds a metric that measures the number of items that have been discarded from the work queue in the external workloads controller due to the retry limit being exceeded.

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-02-15 16:19:09 +02:00
Alex Leong 5aef29ace1
Update workload watcher and server tests to use EndpointSlices (#12054)
Fixes #12032

The Destination controller server tests test the destination server with `enableEndpointSlices=false`.  The default for this value is true, meaning that these tests do not test the default configuration.

We update the tests to test with `enableEndpointSlices=true` and update the corresponding mock kubernetes Endpoints resources to be EndpointSlices instead.  We also fix an instance where the workload watcher was using Endpoints even when in EndpointSlices mode.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-09 11:33:06 -08:00
Alex Leong 1d0f484a20
Limit service publisher updates from server by namespace (#12040)
A Server may only select workloads in its own namespace. Therefore, when the
destination controller receives an update for a Server, it only needs to
potentially send updates to watches on workloads in that same namespace. Taking
this into account allows us to avoid all opaqueness computations for workloads in
other namespaces.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-08 17:00:50 +00:00
Alejandro Pedraza 8f8bd8f28f
Fix race condition in Destination's endpoints watcher (#12022)
Fixes #12010

## Problem

We're observing crashes in the destination controller in some scenarios, due to a data race as described in #12010.

## Cause

The problem is that the same instance of the `AddressSet.Addresses` map is getting mutated in the endpoints watcher Server [informer handler](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/watcher/endpoints_watcher.go#L1309) and iterated over in the endpoint translator [queue loop](https://github.com/linkerd/linkerd2/blob/edge-24.1.3/controller/api/destination/endpoint_translator.go#L197-L211), which run in different goroutines, and the map is not guarded. I believe this doesn't result in Destination returning stale data; it's more of a correctness issue.

## Solution

Make a shallow copy of `pp.addresses` in the endpoints watcher and only pass that to the listeners. It's a shallow copy because we avoid making copies of the pod reference in there, knowing it won't get mutated.
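
A minimal sketch of the shallow copy; `ID` and `Address` are simplified stand-ins for the watcher's types:

```go
package watcher

// ID and Address are simplified stand-ins for the endpoints watcher's types.
type ID struct{ Namespace, Name string }
type Address struct {
	IP   string
	Port uint32
}

// shallowCopy gives each listener its own map to iterate over; the Address
// values are copied, but any pointers inside them (e.g. the pod reference)
// remain shared, which is fine because they are not mutated.
func shallowCopy(addresses map[ID]Address) map[ID]Address {
	out := make(map[ID]Address, len(addresses))
	for id, addr := range addresses {
		out[id] = addr
	}
	return out
}
```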

## Repro

Install linkerd core and injected emojivoto, and patch the endpoint translator to include a sleep call that will help surface the race (don't install the patch in the cluster; we'll only use it locally below):

<details>
  <summary>endpoint_translator.go diff</summary>

```diff
diff --git a/controller/api/destination/endpoint_translator.go b/controller/api/destination/endpoint_translator.go
index d1018d5f9..7d5abd638 100644
--- a/controller/api/destination/endpoint_translator.go
+++ b/controller/api/destination/endpoint_translator.go
@@ -5,6 +5,7 @@ import (
        "reflect"
        "strconv"
        "strings"
+       "time"

        pb "github.com/linkerd/linkerd2-proxy-api/go/destination"
        "github.com/linkerd/linkerd2-proxy-api/go/net"
@@ -195,7 +196,9 @@ func (et *endpointTranslator) processUpdate(update interface{}) {
 }

 func (et *endpointTranslator) add(set watcher.AddressSet) {
        for id, address := range set.Addresses {
+               time.Sleep(1 * time.Second)
                et.availableEndpoints.Addresses[id] = address
        }
```
</details>

Then create these two Server manifests:

<details>
  <summary>emoji-web-server.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: HTTP/1
```
</details>

<details>
  <summary>emoji-web-server-opaque.yml</summary>

```yaml
apiVersion: policy.linkerd.io/v1beta2
kind: Server
metadata:
  namespace: emojivoto
  name: web-http
  labels:
    app.kubernetes.io/part-of: emojivoto
    app.kubernetes.io/name: web
    app.kubernetes.io/version: v11
spec:
  podSelector:
    matchLabels:
      app: web-svc
  port: http
  proxyProtocol: opaque
```
</details>

In separate consoles run the patched destination service and a destination client:

```bash
HOSTNAME=foobar go run -race ./controller/cmd/main.go destination -enable-h2-upgrade=true -enable-endpoint-slices=true -cluster-domain=cluster.local -identity-trust-domain=cluster.local -default-opaque-ports=25,587,3306,4444,5432,6379,9300,11211
```

```bash
go run ./controller/script/destination-client -path web-svc.emojivoto.svc.cluster.local:80
```

And run this to continuously switch the `proxyProtocol` field:

```bash
while true; do kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server.yml; kubectl apply -f ~/src/k8s/sample_yamls/emoji-web-server-opaque.yml ; done
```

You'll see the following data race report in the Destination controller logs:

<details>
  <summary>destination logs</summary>

```console
==================
WARNING: DATA RACE
Write at 0x00c0006d30e0 by goroutine 178:
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*portPublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:1310 +0x772
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*servicePublisher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:711 +0x150
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).addServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:514 +0x173
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/watcher/endpoints_watcher.go:528 +0x26f
  github.com/linkerd/linkerd2/controller/api/destination/watcher.(*EndpointsWatcher).updateServer-fm()
      <autogenerated>:1 +0x64
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/controller.go:246 +0x81
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnUpdate()
      <autogenerated>:1 +0x1f
  k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:970 +0x1f4
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:226 +0x41
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:227 +0xbe
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:204 +0x10a
  k8s.io/apimachinery/pkg/util/wait.Until()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/backoff.go:161 +0x9b
  k8s.io/client-go/tools/cache.(*processorListener).run()
      /home/alpeb/go/pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/shared_informer.go:966 +0x38
  k8s.io/client-go/tools/cache.(*processorListener).run-fm()
      <autogenerated>:1 +0x33
  k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      /home/alpeb/go/pkg/mod/k8s.io/apimachinery@v0.29.1/pkg/util/wait/wait.go:72 +0x86

Previous read at 0x00c0006d30e0 by goroutine 360:
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).add()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:200 +0x1ab
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).processUpdate()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:190 +0x166
  github.com/linkerd/linkerd2/controller/api/destination.(*endpointTranslator).Start.func1()
      /home/alpeb/pr/destination-race/linkerd2/controller/api/destination/endpoint_translator.go:174 +0x45
```
</details>

## Extras

This also removes the unused method `func (as *AddressSet) WithPort(port Port) AddressSet` in endpoints_watcher.go
2024-02-07 12:17:24 -05:00
Alex Leong 53287d40a8
Reflect changes to server selector in opaqueness (#12031)
Fixes #11995

If a Server is marking a Pod's port as opaque and then the Server's podSelector is updated to no longer select that Pod, then the Pod's port should no longer be marked as opaque. However, this update does not result in any updates from the destination API's Get stream and the port remains marked as opaque.

We fix this by updating the endpoint watcher's handling of Server updates to consider both the old and the new Server.

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-05 12:25:16 -08:00
Matei David d4f99b32ce
Revise leader election logic for endpoints controller (#12021)
Revise leader election logic for endpoints controller

Our leader election logic can result in updates being missed under certain
conditions. Leases expire after their duration is up, even if their current
holder has been terminated. During this dead time, any changes in the
system will be observed by other controllers, but will not be written to
the API Server.

For example, during a rollout, a controller that has come up will not be
able to acquire the lease for a maximum time of 30 seconds (lease
duration). Within this time frame, any changes to the system (e.g. modified
workloads, services, deleted endpointslices) will be observed but not acted
on by the newly created controller. Once the controller gets into a bad
state, it can only recover after 10 minutes (via service resyncs) or if any
resources are modified.

To address this, we change our leader election mechanism. Instead of
pushing leader election to the edge (i.e. when performing writes) we only
allow events to be observed when a controller is leading (i.e. by
registering callbacks). When a controller stops leading, all of its
callbacks will be de-registered.
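
A hedged sketch of gating event handling on leadership using client-go's leaderelection callbacks; the `register`/`deregister` helpers are assumptions standing in for the controller's own methods.

```go
package externalworkload

import (
	"context"

	"k8s.io/client-go/tools/leaderelection"
)

// callbacks wires event-handler registration to leadership transitions.
func callbacks(register func(), deregister func()) leaderelection.LeaderCallbacks {
	return leaderelection.LeaderCallbacks{
		OnStartedLeading: func(ctx context.Context) {
			// Only observe events while holding the lease; cached objects are
			// replayed through the handlers at registration time.
			register()
		},
		OnStoppedLeading: func() {
			// Stop acting on events once the lease is lost (after the renew
			// grace period fails).
			deregister()
		},
	}
}
```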

NOTE:

 * controllers will have a grace period during which they can renew their
   lease. Their callbacks will be de-registered only if this fails. We will
   not register and de-register callbacks that often for a single
   controller.
 * we do not lose out on any state. Other informers will continue to run
   (e.g. destination readers). When callbacks are registered, we pass all of
   the cached objects through them. In other words, we do not issue API
   requests on registration, we process the state of the cluster as observed
   from the cache.
 * we make another change that's slightly orthogonal: before we shut down,
   we ensure the queue is drained. This should not be a race since we will
   first block until the queue is drained, then signal to the leader elector
   loop that we are done. This gives us some confidence that all events have
   been processed as soon as they were observed.

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-01 17:46:22 -05:00
Alex Leong 3902b339e7
Only process server updates on workloads affected by the server (#12017)
When the destination controller's workload watcher receives an update for any Server resource, it recomputes opaqueness for every workload.  This is because the Server update may have changed opaqueness for that workload.  However, this is very CPU intensive for the destination controller, especially during resyncs, when we get Server updates for every Server resource in the cluster.

Instead, we only need to recompute opaqueness for workloads that are selected by the old version of the Server or by the new version of the Server.  If a workload is not selected by either the new or old version of the Server, then the Server update cannot have changed the workload's opaqueness.
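
A sketch of the "selected by the old or new Server" check using the standard label-selector machinery (errors are swallowed for brevity):

```go
package watcher

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// serverSelects reports whether a Server's podSelector matches a workload's labels.
func serverSelects(sel *metav1.LabelSelector, workloadLabels map[string]string) bool {
	s, err := metav1.LabelSelectorAsSelector(sel)
	if err != nil {
		return false
	}
	return s.Matches(labels.Set(workloadLabels))
}

// affectedByServerUpdate: recompute opaqueness only if the old or new selector
// matches the workload.
func affectedByServerUpdate(oldSel, newSel *metav1.LabelSelector, workloadLabels map[string]string) bool {
	return serverSelects(oldSel, workloadLabels) || serverSelects(newSel, workloadLabels)
}
```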

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-02-01 10:42:50 -08:00
Matei David 4a8e760d95
Fix how names are generated for external EndpointSlice resources (#12016)
Any slices generated for a group of external workloads follow a similar
convention: `linkerd-external-<svc-name>-<hash>`. Currently the hash is
appended directly to the service name, making it less readable. We add a
`-` to the `generateName` value so that the random hash is not joined directly
to the service name. This is similar to the upstream implementation.

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-31 09:57:29 +00:00
Alex Leong 0261e4f086
Return INVALID_ARGUMENT status codes properly from GetProfile (#11980)
When the destination controller receives a GetProfile request for an ExternalName service, it should return gRPC status code INVALID_ARGUMENT to signal to the proxy that ExternalName services do not have endpoints and service discovery should not be used.  However, the destination controller is returning gRPC status code UNKNOWN instead.  This causes the proxy to retry these requests, resulting in a flurry of warning logs in the destination controller.

We fix the destination controller to properly return INVALID_ARGUMENT instead of UNKNOWN for ExternalName services.
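
A sketch of returning the proper gRPC status, using google.golang.org/grpc's status and codes packages (the message text is illustrative):

```go
package destination

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// externalNameError returns INVALID_ARGUMENT so the proxy knows not to retry;
// previously the error surfaced as codes.Unknown.
func externalNameError(svc string) error {
	return status.Errorf(codes.InvalidArgument,
		"ExternalName service %s has no endpoints; service discovery should not be used", svc)
}
```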

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-26 09:11:00 -08:00
Matei David 3313bec072
Process ExternalWorkload targetRefs when updating slices (#11987)
If the readiness of an external workload endpoint changes while traffic
is being sent to it, the update will not be propagated to clients. This
can lead to issues where an endpoint that is marked as `notReady`
continues to be reported as `ready` by the endpoints watcher.

The issue stems from how endpoint slices are diffed. A utility function
responsible for processing addresses does not consider endpoints whose
targetRef is an external workload. We fix the problem and add two module
tests to validate readiness is propagated to clients correctly.

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-26 12:01:50 +00:00
Matei David 9c902dc6b4
Add an endpoints reconciler component for external workloads (#11948)
We introduced an endpoints controller that will be responsible for
managing EndpointSlices for services that select external workloads. As a
follow-up, we introduce the reconciler component of the controller, which
will be responsible for doing the writes and diffing.

Additionally, the controller is wired up in the destination service's
main routine and will start if endpoint slice support is enabled.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-24 16:55:16 +00:00
Alex Leong 65f13de2ce
Add support for ExternalWorkloads in endpoint profiles (#11952)
When a meshed client attempts to establish a connection directly to the workload IP of an ExternalWorkload, the destination controller should return an endpoint profile for that ExternalWorkload with a single endpoint and the metadata associated with that ExternalWorkload including:
* mesh TLS identity
* workload metric labels
* opaque / protocol hints

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-23 09:43:12 -08:00