linkerd2

History

Matei David d4f99b32ce Revise leader election logic for endpoints controller (#12021)

Our leader election logic can result in updates being missed under certain conditions. Leases expire only after their duration is up, even if their current holder has been terminated. During this dead time, any changes in the system will be observed by other controllers, but will not be written to the API Server.

For example, during a rollout, a controller that has come up will not be able to acquire the lease for up to 30 seconds (the lease duration). Within this time frame, any changes to the system (e.g. modified workloads, services, deleted endpointslices) will be observed but not acted on by the newly created controller. Once the controller gets into a bad state, it can only recover after 10 minutes (via service resyncs) or if any resources are modified.

To address this, we change our leader election mechanism. Instead of pushing leader election to the edge (i.e. when performing writes), we only allow events to be observed when a controller is leading (i.e. by registering callbacks). When a controller stops leading, all of its callbacks will be de-registered.

NOTE:

* controllers will have a grace period during which they can renew their lease. Their callbacks will be de-registered only if this fails. We will not register and de-register callbacks that often for a single controller.
* we do not lose out on any state. Other informers will continue to run (e.g. destination readers). When callbacks are registered, we pass all of the cached objects through them. In other words, we do not issue API requests on registration; we process the state of the cluster as observed from the cache.
* we make another change that's slightly orthogonal. Before we shut down, we make sure to drain the queue. This should not be a race, since we will first block until the queue is drained, then signal to the leader elector loop that we are done. This gives us some confidence that all events have been processed as soon as they were observed.

Signed-off-by: Matei David <matei@buoyant.io>		2024-02-01 17:46:22 -05:00
..
api	Revise leader election logic for endpoints controller (#12021)	2024-02-01 17:46:22 -05:00
cmd	Add an endpoints reconciler component for external workloads (#11948)	2024-01-24 16:55:16 +00:00
gen	Relax validation for ExternalWorkload Status fields (#11979)	2024-01-24 14:12:32 +00:00
heartbeat	build(deps): bump linkerd/dev from 39 to 40 (#10825)	2023-05-09 10:57:19 -07:00
identity	core: use serviceAccountToken volume for pod authentication (#7117)	2021-11-03 02:03:39 +05:30
k8s	Add an endpoints reconciler component for external workloads (#11948)	2024-01-24 16:55:16 +00:00
proxy-injector	Bump proxy-init to v2.2.4 (#11988)	2024-01-26 09:28:14 -08:00
script	policy: regenerate Server go bindings (#11920)	2024-01-15 11:09:31 +02:00
sp-validator	Use metadata API in the proxy and tap injectors (#9650)	2022-11-16 09:21:39 -05:00
webhook	Add ability to configure client-go's `QPS` and `Burst` settings (#11644)	2023-11-28 15:25:05 -05:00
Dockerfile	dev: v42 (#11563)	2023-11-03 13:55:06 -07:00