11 Commits

Author SHA1 Message Date
dependabot[bot] 4baa94baac
build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0 (#12958)
* build(deps): bump k8s.io/client-go from 0.30.3 to 0.31.0

Bumps [k8s.io/client-go](https://github.com/kubernetes/client-go) from 0.30.3 to 0.31.0.
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.30.3...v0.31.0)

---
updated-dependencies:
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* To appease the linter, replaced deprecated workqueue interfaces with their typed alternatives. For the endpoints controller we can instantiate with . But for the service mirror, given the queue can hold different event types, we have to instantiate with .
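
The distinction drawn here, a queue that holds one kind of key versus a queue that has to accept several event types, can be sketched with a small generic FIFO. This is a stdlib-only illustration and not client-go's workqueue; the `queue` type, the event structs, and the type arguments `string` and `any` are assumptions chosen for the example and are not taken from the commit message.

```go
package main

import "fmt"

// queue is a minimal generic FIFO used only for illustration.
// It is NOT the client-go workqueue API.
type queue[T any] struct{ items []T }

// Add appends an item to the back of the queue.
func (q *queue[T]) Add(item T) { q.items = append(q.items, item) }

// Get removes and returns the item at the front of the queue.
func (q *queue[T]) Get() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

// Hypothetical event types for a queue that carries more than one kind of item.
type serviceAdded struct{ Name string }
type serviceDeleted struct{ Name string }

// describe recovers the concrete event type with a type switch, which is the
// price of using a broad element type.
func describe(ev any) string {
	switch e := ev.(type) {
	case serviceAdded:
		return "added " + e.Name
	case serviceDeleted:
		return "deleted " + e.Name
	default:
		return "unknown event"
	}
}

func main() {
	// Single key type: the compiler rejects anything that is not a string.
	keys := &queue[string]{}
	keys.Add("default/my-svc")
	key, _ := keys.Get()
	fmt.Println(key)

	// Mixed event types: the element type has to be broad (assumed `any` here).
	events := &queue[any]{}
	events.Add(serviceAdded{Name: "web"})
	events.Add(serviceDeleted{Name: "db"})
	for ev, ok := events.Get(); ok; ev, ok = events.Get() {
		fmt.Println(describe(ev))
	}
}
```

A narrowly typed queue removes the type assertion at the consumer; a broadly typed one keeps the assertion but lets one queue carry heterogeneous events.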

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alejandro Pedraza <alejandro@buoyant.io>
2024-09-04 09:04:04 -05:00
Alejandro Pedraza 567288a060
Dual-stack support for ExternalWorkloads (#12965)
* Dual-stack support for ExternalWorkloads

This changes the `workloadIPs.maxItems` field in the ExternalWorkload CRD from `1` to `2`, to accommodate an IPv4 and IPv6 pair. This is a backwards-compatible change, so there's no need to bump the CRD version.

The control plane already supports this, so this change mainly expands the unit tests to also cover the dual-stack case.
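
As a rough illustration of what the new limit allows, the sketch below checks a list of workload IPs against a maximum of two entries. The helper name `validateWorkloadIPs` and the "at most one address per family" rule are assumptions made for this example; the CRD change itself only raises `maxItems`.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// maxWorkloadIPs mirrors the new `workloadIPs.maxItems` value.
const maxWorkloadIPs = 2

// validateWorkloadIPs is a hypothetical helper. It enforces the item limit
// and, as an extra assumption of this sketch, at most one address per IP
// family.
func validateWorkloadIPs(ips []string) error {
	if len(ips) > maxWorkloadIPs {
		return fmt.Errorf("got %d workload IPs, at most %d allowed", len(ips), maxWorkloadIPs)
	}
	var v4, v6 int
	for _, raw := range ips {
		addr, err := netip.ParseAddr(raw)
		if err != nil {
			return fmt.Errorf("invalid IP %q: %w", raw, err)
		}
		// Unmap treats IPv4-mapped IPv6 addresses as IPv4.
		if addr.Unmap().Is4() {
			v4++
		} else {
			v6++
		}
	}
	if v4 > 1 || v6 > 1 {
		return errors.New("expected at most one address per IP family")
	}
	return nil
}

func main() {
	fmt.Println(validateWorkloadIPs([]string{"10.0.0.1"}))
	fmt.Println(validateWorkloadIPs([]string{"10.0.0.1", "fd00::1"}))
	fmt.Println(validateWorkloadIPs([]string{"10.0.0.1", "10.0.0.2"}))
}
```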
2024-08-30 13:23:56 -05:00
hanghuge 78d42b230d
chore: fix function name in comment (#12396)
Fixed comments for `subscribeToServicesWithContext` and `reconcileByAddressType`. Previously,
the comments contained incorrect function names.

Signed-off-by: hanghuge <cmoman@outlook.com>
2024-04-10 15:46:45 +01:00
Matei David 98e38a66b6
Rename meshTls to meshTLS in ExternalWorkload CRD (#12098)
The ExternalWorkload resource we introduced has a minor naming
inconsistency: `Tls` in `meshTls` is not fully capitalised. Other resources
that we have (e.g. authentication resources) capitalise TLS, as does Go,
which follows a similar naming convention.

We fix this in the workload resource by changing the field's name and
bumping the version to `v1beta1`.

Upgrading the control plane version will continue to work without
downtime. However, if a resource already exists, the policy controller
will not completely initialise. It will not enter a crashloop backoff,
but it will also not become ready until the resource is edited or
deleted.

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-20 11:00:13 -08:00
Zahari Dichev bf7b039f41
controller: add counter for items dropped from workqueue (#12079)
Adds a metric that counts the items discarded from the work queue in the external workloads controller because the retry limit was exceeded.
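
The idea can be sketched without any Kubernetes or Prometheus dependencies: retry an item a bounded number of times and increment a counter when it is finally given up on. Everything named below (`maxRetries`, `dropped`, `processWithRetries`, `errSync`) is hypothetical, and the synchronous retry loop stands in for the rate-limited requeueing a real work queue would do.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// maxRetries is a hypothetical limit; the real value is not stated here.
const maxRetries = 3

// dropped counts keys discarded after exhausting their retries. A real
// controller would expose this through its metrics library instead.
var dropped atomic.Int64

// errSync is a placeholder error used by the example.
var errSync = errors.New("sync failed")

// processWithRetries calls syncFn once and then retries up to maxRetries
// times. If every attempt fails, the key is dropped and the counter is
// incremented.
func processWithRetries(key string, syncFn func(string) error) bool {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err := syncFn(key); err == nil {
			return true
		}
	}
	dropped.Add(1)
	return false
}

func main() {
	ok := processWithRetries("default/my-svc", func(string) error { return errSync })
	fmt.Println("processed:", ok, "dropped:", dropped.Load())
}
```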

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-02-15 16:19:09 +02:00
Matei David d4f99b32ce
Revise leader election logic for endpoints controller (#12021)
Our leader election logic can result in updates being missed under certain
conditions. A lease expires only once its duration is up, even if its current
holder has already been terminated. During this dead time, any changes in the
system will be observed by other controllers, but will not be written to
the API Server.

For example, during a rollout, a controller that has come up will not be
able to acquire the lease for a maximum time of 30 seconds (lease
duration). Within this time frame, any changes to the system (e.g. modified
workloads, services, deleted endpointslices) will be observed but not acted
on by the newly created controller. Once the controller gets into a bad
state, it can only recover after 10 minutes (via service resyncs) or if any
resources are modified.

To address this, we change our leader election mechanism. Instead of
pushing leader election to the edge (i.e. when performing writes) we only
allow events to be observed when a controller is leading (i.e. by
registering callbacks). When a controller stops leading, all of its
callbacks will be de-registered.

NOTE:

 * controllers will have a grace period during which they can renew their
   lease. Their callbacks will be de-registered only if this fails. We will
   not register and de-register callbacks that often for a single
   controller.
 * we do not lose out on any state. Other informers will continue to run
   (e.g. destination readers). When callbacks are registered, we pass all of
   the cached objects through them. In other words, we do not issue API
   requests on registration, we process the state of the cluster as observed
   from the cache.
 * we make another change that's slightly orthogonal. Before we shut down,
   we make sure to drain the queue. This should not be a race since we will
   first block until the queue is drained, then signal to the leader elector
   loop that we are done. This gives us some confidence that all events have
   been processed as soon as they were observed.
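
The register-on-lead, de-register-on-loss behaviour described above can be sketched roughly as follows. This is a simplified, stdlib-only model: the `controller` type and its methods are hypothetical, the map stands in for an informer cache, and the actual lease handling is left out entirely.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// controller is a hypothetical stand-in for the endpoints controller.
type controller struct {
	mu      sync.Mutex
	cache   map[string]string // stands in for the informer cache; always updated
	handler func(key string)  // event callback; nil while not leading
	queue   []string          // keys waiting to be reconciled
}

func newController() *controller {
	return &controller{cache: map[string]string{}}
}

// observe models an informer event. The cache is updated regardless of
// leadership, but the event only reaches the queue if a callback is registered.
func (c *controller) observe(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cache[key] = value
	if c.handler != nil {
		c.handler(key)
	}
}

// onStartedLeading registers the callback and replays every cached object
// through it, so nothing observed while not leading is lost and no extra
// API requests are needed.
func (c *controller) onStartedLeading() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.handler = func(key string) { c.queue = append(c.queue, key) }
	keys := make([]string, 0, len(c.cache))
	for key := range c.cache {
		keys = append(keys, key)
	}
	sort.Strings(keys) // deterministic replay order for the example
	for _, key := range keys {
		c.handler(key)
	}
}

// onStoppedLeading de-registers the callback.
func (c *controller) onStoppedLeading() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.handler = nil
}

func main() {
	c := newController()
	c.observe("svc-a", "v1") // not leading: cached, not queued
	fmt.Println(len(c.queue))
	c.onStartedLeading() // cached state is replayed
	fmt.Println(c.queue)
	c.onStoppedLeading()
	c.observe("svc-b", "v1") // cached again, still not queued
	fmt.Println(c.queue)
}
```

In a client-go based controller, the two methods would typically be driven by the leader elector's start and stop callbacks; that wiring is omitted here.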

Signed-off-by: Matei David <matei@buoyant.io>
2024-02-01 17:46:22 -05:00
Matei David 4a8e760d95
Fix how names are generated for external EndpointSlice resources (#12016)
Any slices generated for a group of external workloads follow a similar
convention: `linkerd-external-<svc-name>-<hash>`. Currently the hash is
appended directly to the service name making it less readable. We add a
`-` to the `generateName` value so that random hashes are not part of the
service name. This is similar to the upstream implementation.
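
A small sketch of the naming convention: build the prefix with a trailing `-`, then append a random suffix the way the API server does for `metadata.generateName`. The helper names, the suffix length, and the alphabet below are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// generateNamePrefix returns the prefix for slices that belong to a service.
// The trailing "-" keeps the random suffix visually separate from the name.
func generateNamePrefix(svcName string) string {
	return "linkerd-external-" + svcName + "-"
}

// suffixChars and suffixLen are assumptions of this sketch; they imitate the
// random suffix an API server appends to a generateName prefix.
const (
	suffixChars = "bcdfghjklmnpqrstvwxz2456789"
	suffixLen   = 5
)

// withRandomSuffix simulates server-side name generation locally.
func withRandomSuffix(prefix string) string {
	var b strings.Builder
	b.WriteString(prefix)
	for i := 0; i < suffixLen; i++ {
		b.WriteByte(suffixChars[rand.Intn(len(suffixChars))])
	}
	return b.String()
}

func main() {
	prefix := generateNamePrefix("my-svc")
	fmt.Println(prefix)                   // linkerd-external-my-svc-
	fmt.Println(withRandomSuffix(prefix)) // prefix plus a 5-character suffix
}
```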

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-31 09:57:29 +00:00
Matei David 9c902dc6b4
Add an endpoints reconciler component for external workloads (#11948)
We introduced an endpoints controller that will be responsible for
managing EndpointSlices for services that select external workloads. We
introduce as a follow-up the reconciler component of the controller that
will be responsible for doing the writes and diffing.

Additionally, the controller is wired up in the destination service's
main routine and will start if endpoint slice support is enabled.

---------

Signed-off-by: Matei David <matei@buoyant.io>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Co-authored-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-24 16:55:16 +00:00
Alex Leong 65f13de2ce
Add support for ExternalWorkloads in endpoint profiles (#11952)
When a meshed client attempts to establish a connection directly to the workload IP of an ExternalWorkload, the destination controller should return an endpoint profile for that ExternalWorkload with a single endpoint and the metadata associated with that ExternalWorkload including:
* mesh TLS identity
* workload metric labels
* opaque / protocol hints
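
A rough sketch of that lookup: given a target IP, find the ExternalWorkload that owns it and return a single-endpoint profile carrying its metadata. The struct shapes, field names, and metric label keys below are hypothetical and do not reflect the destination controller's actual types.

```go
package main

import "fmt"

// externalWorkload is a hypothetical, trimmed-down view of the resource.
type externalWorkload struct {
	Name            string
	Namespace       string
	IPs             []string
	MeshTLSIdentity string
	OpaquePorts     map[uint32]bool
}

// endpointProfile is a hypothetical result type with a single endpoint.
type endpointProfile struct {
	IP           string
	Port         uint32
	TLSIdentity  string
	MetricLabels map[string]string
	Opaque       bool
}

// profileForIP returns the profile for the workload that owns ip, if any.
// IPs are compared as plain strings to keep the sketch short; real code
// would compare parsed addresses.
func profileForIP(workloads []externalWorkload, ip string, port uint32) (endpointProfile, bool) {
	for _, w := range workloads {
		for _, candidate := range w.IPs {
			if candidate != ip {
				continue
			}
			return endpointProfile{
				IP:          ip,
				Port:        port,
				TLSIdentity: w.MeshTLSIdentity,
				MetricLabels: map[string]string{ // label keys are assumptions
					"external_workload": w.Name,
					"namespace":         w.Namespace,
				},
				Opaque: w.OpaquePorts[port],
			}, true
		}
	}
	return endpointProfile{}, false
}

func main() {
	workloads := []externalWorkload{{
		Name:            "vm-1",
		Namespace:       "default",
		IPs:             []string{"192.0.2.10"},
		MeshTLSIdentity: "vm-1.default.example",
		OpaquePorts:     map[uint32]bool{3306: true},
	}}
	profile, found := profileForIP(workloads, "192.0.2.10", 3306)
	fmt.Println(found, profile.TLSIdentity, profile.Opaque)
}
```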

Signed-off-by: Alex Leong <alex@buoyant.io>
2024-01-23 09:43:12 -08:00
Zahari Dichev 6f4cdf6617
discovery: add metrics to endpoints controller workqueue (#11958)
This PR adds metrics to the work queue that is used in the external workload endpoints controller. 

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
2024-01-22 15:34:50 +02:00
Matei David 983fc55abc
Introduce new external endpoints controller (#11905)
For mesh expansion, we need to register an ExternalWorkload's service
membership. Service memberships describe which Service objects an
ExternalWorkload is part of (i.e. which service can be used to route
traffic to an external endpoint).

Service membership will allow the control plane to discover
configuration associated with an external endpoint when performing
discovery on a service target.

To build these memberships, we introduce a new controller to the
destination service, responsible for watching Service and
ExternalWorkload objects, and for writing out EndpointSlice objects for
each Service that selects one or more external endpoints.
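
Service membership can be illustrated with a plain label-selector match: a workload is a member of every service whose selector is satisfied by the workload's labels. The `service` and `workload` types and both helpers are hypothetical simplifications; treating an empty selector as selecting nothing is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// service and workload are hypothetical, trimmed-down stand-ins for the
// Service and ExternalWorkload objects.
type service struct {
	Name     string
	Selector map[string]string
}

type workload struct {
	Name   string
	Labels map[string]string
}

// selects reports whether every key/value pair in selector is present in
// labels. An empty selector selects nothing in this sketch.
func selects(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// memberships returns the sorted names of all services that select w.
func memberships(services []service, w workload) []string {
	var names []string
	for _, s := range services {
		if selects(s.Selector, w.Labels) {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	services := []service{
		{Name: "web", Selector: map[string]string{"app": "web"}},
		{Name: "db", Selector: map[string]string{"app": "db"}},
		{Name: "no-selector"},
	}
	vm := workload{Name: "vm-1", Labels: map[string]string{"app": "web", "zone": "a"}}
	fmt.Println(memberships(services, vm))
}
```

Each service in the resulting list is one for which a controller of this kind would write EndpointSlice objects that include the workload.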

As a first step, we add a new externalworkload module and a new controller in the
module that watches services and workloads. In a follow-up change,
the ExternalEndpointManager will additionally perform
the necessary reconciliation by writing EndpointSlice objects.

Since Linkerd's control plane may run in HA, we also add a lease object
that will be used by the manager. When a lease is claimed, a flag is
turned on in the manager to let it know it may perform writes.

A more compact list of changes:
* Add a new externalworkload module
* Add an EndpointsController in the module along with necessary mechanisms to watch resources.
* Add RBAC rules to the destination service:
  * Allow policy and destination to read ExternalWorkload objects
  * Allow destination to create / update / read Lease objects

---------

Signed-off-by: Matei David <matei@buoyant.io>
2024-01-17 12:15:28 +00:00