Commit Graph

358 Commits

Author SHA1 Message Date
MenD32 3a2933a24c docs: added helpful comments
Signed-off-by: MenD32 <amit.mendelevitch@gmail.com>
2025-05-27 21:31:47 +03:00
MenD32 4e8bd0ada5 tests: added test to check that scaledowns work with topology spread constraints
Signed-off-by: MenD32 <amit.mendelevitch@gmail.com>
2025-05-24 15:34:19 +03:00
MenD32 ea1c308130 fix: hard topology spread constraints stop scaledown
Signed-off-by: MenD32 <amit.mendelevitch@gmail.com>
2025-05-24 15:09:38 +03:00
Kubernetes Prow Robot c85f22f7dd
Merge pull request #7798 from omerap12/migrate-claimReservedForPod
migrate claimReservedForPod to use upstream IsReservedForPod
2025-05-13 09:57:16 -07:00
Norbert Cyran 6ab7e2eb78 Prevent nil dereference of preFilterStatus 2025-05-07 10:38:20 +02:00
Piotr Betkier ac1c7b5463 use k8s.io/component-helpers/resource for pod request calculations 2025-04-22 17:36:17 +02:00
jinglinliang 25af21c515 Add unit test to allow draining when StatefulSet kind has custom API Group 2025-04-09 14:03:00 -07:00
jinglinliang cc3a9f5d10 Allow draining when StatefulSet kind has custom API Group 2025-04-09 14:03:00 -07:00
Omran 696af986ed
Add time-based drainability rule for non-pdb-assigned system pods 2025-03-24 12:47:16 +00:00
mendelski 68c7d1a84e
Force preempting system-node-critical daemon sets 2025-02-17 18:10:27 +00:00
Kushagra Nigam c47cb7083c ignore unexported fields 2025-02-12 11:57:28 +00:00
Omer Aplatony e02272a4f1 migrate claimReservedForPod to use upstream IsReservedForPod
Signed-off-by: Omer Aplatony <omerap12@gmail.com>
2025-02-02 21:09:08 +02:00
Maciej Skoczeń b36e3879a2 Fix data race while setting delta cluster state in parallel 2025-01-15 13:28:52 +00:00
Ismail Alidzhikov b676bb91ef logging: Start from a capital letter 2025-01-07 17:49:29 +02:00
Maciej Skoczeń 39882551f7 Parallelize cluster snapshot creation 2025-01-03 10:35:11 +00:00
Kubernetes Prow Robot 50c65906fd
Merge pull request #7530 from towca/jtuznik/dra-actual
CA: DRA integration MVP
2024-12-20 16:30:08 +01:00
Kuba Tużnik a45e6b7003 CA: implement DRA integration tests for StaticAutoscaler 2024-12-20 13:30:36 +01:00
Kuba Tużnik c5cb8a077d CA: add DRA object handling logic to PredicateSnapshot
All added logic is behind the DRA flag guard, this should be a no-op
if the flag is disabled.
2024-12-20 13:30:36 +01:00
Kuba Tużnik 714ab661ca CA: implement calculating utilization for DRA resources
The logic is very basic and will likely need to be revised, but it's
something for initial testing. Utilization of a given Pool is calculated
as the number of allocated devices in the pool divided by the number of
all devices in the pool. For scale-down purposes, the max utilization
of all Node-local Pools is used.

The new logic is mostly behind the DRA flag guard, so this should be a no-op
if the flag is disabled. The only difference should be that FilterOutUnremovable
marks a Node as unremovable if calculating utilization fails. Not sure
why this wasn't the case before, but I think we need it for DRA - if CA sees an
incomplete picture of a resource pool, we probably don't want to scale
the Node down.
2024-12-20 13:30:36 +01:00
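The utilization rule described in the commit above can be sketched as follows. This is a hedged illustration, not the actual Cluster Autoscaler code: the `pool`, `poolUtilization`, and `nodeDRAUtilization` names are invented for this example.

```go
package main

import "fmt"

// pool is a stand-in for a DRA resource pool: utilization is the number
// of allocated devices divided by the number of all devices in the pool.
type pool struct {
	allocated int
	total     int
}

func poolUtilization(p pool) float64 {
	return float64(p.allocated) / float64(p.total)
}

// nodeDRAUtilization mirrors the scale-down rule from the commit message:
// take the max utilization over all Node-local pools.
func nodeDRAUtilization(pools []pool) float64 {
	max := 0.0
	for _, p := range pools {
		if u := poolUtilization(p); u > max {
			max = u
		}
	}
	return max
}

func main() {
	pools := []pool{{allocated: 2, total: 8}, {allocated: 3, total: 4}}
	fmt.Println(nodeDRAUtilization(pools)) // the busiest pool (3/4) dominates
}
```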
Kuba Tużnik 4e68a0c6ef CA: sanitize and propagate DRA objects through NodeInfos in node_info utils 2024-12-20 13:30:36 +01:00
Kuba Tużnik 479d7ce3d6 CA: implement a Provider for dynamicresources.Snapshot
The Provider uses DRA object listers to create a Snapshot of the
DRA objects.
2024-12-20 13:30:36 +01:00
Kuba Tużnik 377639a8dc CA: implement dynamicresources.Snapshot for storing and modifying the state of DRA objects
The Snapshot can hold all DRA objects in the cluster, and expose them
to the scheduler framework via the SharedDRAManager interface.

The state of the objects can be modified during autoscaling simulations
using the provided methods.
2024-12-20 13:30:10 +01:00
Kuba Tużnik 66d0aeb3cb CA: implement utils for interacting with ResourceClaims
These utils will be used by various parts of the DRA logic in the
following commits.
2024-12-19 15:55:49 +01:00
Kubernetes Prow Robot c2972a8000
Merge pull request #7606 from towca/jtuznik/node-info-fix
CA: fix a nil map write in NodeInfo.AddPod()
2024-12-16 12:48:51 +01:00
Kuba Tużnik 410bd7cea5 CA: fix a nil map write in NodeInfo.AddPod()
If the NodeInfo is created via WrapSchedulerNodeInfo with nil
podExtraInfos, subsequent AddPod() calls panic on trying to add
extra info for the pod.
2024-12-13 16:21:37 +01:00
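The class of bug fixed above is a classic Go pitfall: writing to a nil map panics. A minimal sketch, with `nodeInfo` and `podExtraInfo` as illustrative stand-ins for the real types:

```go
package main

import "fmt"

// nodeInfo is a simplified stand-in for the wrapper described above: if it
// is constructed with a nil extra-info map, the first write must lazily
// initialize the map, otherwise the write panics.
type nodeInfo struct {
	podExtraInfo map[string]string
}

func (n *nodeInfo) addPod(name, info string) {
	if n.podExtraInfo == nil {
		n.podExtraInfo = map[string]string{} // the guard the fix adds
	}
	n.podExtraInfo[name] = info
}

func main() {
	n := &nodeInfo{} // podExtraInfo is nil, as after wrapping with nil extras
	n.addPod("pod-1", "extra")
	fmt.Println(n.podExtraInfo["pod-1"])
}
```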
Kuba Tużnik 4e283e34ee CA: don't error out in HintingSimulator if a hinted Node is gone
Don't error out if a hinted Node is no longer in the cluster snapshot
(e.g. it was a fake upcoming Node and the real one appeared).

This was introduced in the recent PredicateChecker->PredicateSnapshot
refactor. Previously, PredicateChecker.CheckPredicates() would return
an error if the hinted Node was gone, and HintingSimulator treated
this error the same as failing predicates - it would move on to the
non-hinting logic. After the refactor, HintingSimulator explicitly
errors out if it can't retrieve the hinted Node from the snapshot,
so the behavior changed.

I checked other CheckPredicates()/SchedulePod() callsites, and this is
the only one where ignoring the missing Node makes sense. For the others,
the Node is added to the snapshot just before the call, so it being
missing should cause an error.
2024-12-12 14:20:51 +01:00
Kubernetes Prow Robot 37b3da4e79
Merge pull request #7529 from towca/jtuznik/dra-prep
CA: prepare for DRA integration
2024-12-09 17:14:03 +00:00
Kuba Tużnik 0691512d27 CA: extend SchedulerPluginRunner with RunReserveOnNode
RunReserveOnNode runs the Reserve phase of schedulerframework,
which is necessary to obtain ResourceClaim allocations computed
by the DRA scheduler plugin.

RunReserveOnNode isn't used anywhere yet, so this should be a no-op.
2024-12-09 17:38:13 +01:00
Kuba Tużnik 307002eb42 CA: move NodeInfo methods from ClusterSnapshotStore to ClusterSnapshot
All the NodeInfo methods have to take DRA into account, and the logic
for that will be the same for different ClusterSnapshotStore implementations.
Instead of duplicating the new logic in Basic and Delta, the methods
are moved to ClusterSnapshot and the logic will be implemented once in
PredicateSnapshot.

PredicateSnapshot will use the DRA Snapshot exposed by its ClusterSnapshotStore
to implement these methods. The DRA Snapshot has to be stored in the
ClusterSnapshotStore layer, as we need to be able to fork/commit/revert it.

Lower-level methods for adding/removing just the schedulerframework.NodeInfo
parts are added to ClusterSnapshotStore. PredicateSnapshot utilizes these methods
to implement AddNodeInfo and RemoveNodeInfo.

This should be a no-op, it's just a refactor.
2024-12-09 17:38:04 +01:00
Kuba Tużnik eba5e08f6d CA: integrate BasicSnapshotStore with drasnapshot.Snapshot
Store the DRA snapshot inside the current internal data in
SetClusterState().

Retrieve the DRA snapshot from the current internal data in
DraSnapshot().

Clone the DRA snapshot whenever the internal data is cloned
during Fork(). This matches the forking logic that BasicSnapshotStore
uses, ensuring that the DRA object state is correctly
forked/committed/reverted during the corresponding ClusterSnapshot
operations.

This should be a no-op, as DraSnapshot() isn't called anywhere yet,
and no DRA snapshot is passed to SetClusterState() yet.
2024-12-09 17:14:45 +01:00
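The fork/commit/revert behaviour described above can be sketched with a stack of deep-cloned snapshots. All type and method names here (`draSnapshot`, `snapshotStore`) are invented for illustration; the real store keeps much more state:

```go
package main

import "fmt"

// draSnapshot is a toy stand-in for the DRA object snapshot: cloning it on
// Fork ensures simulated changes can be discarded by Revert.
type draSnapshot struct {
	claims map[string]string
}

func (s draSnapshot) clone() draSnapshot {
	c := draSnapshot{claims: map[string]string{}}
	for k, v := range s.claims {
		c.claims[k] = v
	}
	return c
}

// snapshotStore keeps a fork stack; the last element is the current state.
type snapshotStore struct {
	stack []draSnapshot
}

func (st *snapshotStore) Fork() {
	st.stack = append(st.stack, st.stack[len(st.stack)-1].clone())
}

func (st *snapshotStore) Revert() {
	st.stack = st.stack[:len(st.stack)-1]
}

func (st *snapshotStore) current() *draSnapshot {
	return &st.stack[len(st.stack)-1]
}

func main() {
	st := &snapshotStore{stack: []draSnapshot{{claims: map[string]string{}}}}
	st.Fork()
	st.current().claims["claim-a"] = "allocated" // simulated allocation
	st.Revert()
	fmt.Println(len(st.current().claims)) // the change was discarded
}
```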
Kuba Tużnik 466f94b780 CA: extend ClusterSnapshotStore to allow storing, retrieving and modifying DRA objects
A new DRA Snapshot type is introduced, for now with just dummy methods
to be implemented in later commits. The new type is intended to hold all
DRA objects in the cluster.

ClusterSnapshotStore.SetClusterState() is extended to take the new DRA Snapshot in
addition to the existing parameters.

ClusterSnapshotStore.DraSnapshot() is added to retrieve the DRA snapshot set by
SetClusterState() back. This will be used by PredicateSnapshot to implement DRA
logic later.

This should be a no-op, as DraSnapshot() is never called, and no DRA
snapshot is passed to SetClusterState() yet.
2024-12-09 17:14:45 +01:00
Kuba Tużnik 1e560274d5 CA: extend WrapSchedulerNodeInfo to allow passing DRA objects
This should be a no-op, as no DRA objects are passed yet.
2024-12-09 17:14:45 +01:00
Kuba Tużnik d0338fa301 CA: integrate simulator with schedulerframework.SharedDRAManager
Make SharedDRAManager a part of the ClusterSnapshotStore interface, and
implement dummy methods to satisfy the interface. Actual implementation
will come in later commits.

This is needed so that ClusterSnapshot can feed DRA objects to the DRA
scheduler plugin, and obtain ResourceClaim modifications back from it.

The integration is behind the DRA flag guard, this should be a no-op
if the flag is disabled.
2024-12-09 17:14:34 +01:00
Kuba Tużnik 8c7f3fadc6 CA: plumb the DRA flag guard to PredicateSnapshot 2024-12-06 13:40:47 +01:00
Kuba Tużnik 16983d2cdd CA: Fix a data race in framework.NewHandle
Multiple tests can call NewHandle() concurrently, because of
t.Parallel(). NewHandle calls schedulermetrics.InitMetrics()
which modifies global variables, so there's a race.

Wrapped the schedulermetrics.InitMetrics() call in a sync.Once.Do()
so that it's only done once, in a thread-safe manner.
2024-12-04 20:18:50 +01:00
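The race fix above is the standard `sync.Once` pattern: concurrent callers each attempt the initialization, but it runs exactly once. A minimal sketch, where `initMetrics` stands in for `schedulermetrics.InitMetrics` and `newHandle` for the real constructor:

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce  sync.Once
	initCount int // tracks how many times initialization actually ran
)

// initMetrics stands in for the global-mutating initialization that raced.
func initMetrics() { initCount++ }

// newHandle mirrors the fix: wrap the init call in sync.Once.Do so it is
// executed at most once, in a thread-safe manner.
func newHandle() {
	initOnce.Do(initMetrics)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ { // simulate parallel tests calling NewHandle
		wg.Add(1)
		go func() {
			defer wg.Done()
			newHandle()
		}()
	}
	wg.Wait()
	fmt.Println(initCount) // prints 1
}
```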
Kuba Tużnik 054d5d2e7c CA: refactor SchedulerBasedPredicateChecker into SchedulerPluginRunner
For DRA, this component will have to call the Reserve phase in addition
to just checking predicates/filters.

The new version also makes more sense in the context of
PredicateSnapshot, which is the only context now.

While refactoring, I noticed that CheckPredicates for some reason
doesn't check the provided Node against the eligible Nodes returned
from PreFilter (while FitsAnyNodeMatching does do that). This seems like
a bug, so the check is added.

The checks in FitsAnyNodeMatching are also reordered so that the
cheapest ones are checked earliest.
2024-12-04 14:33:51 +01:00
Kuba Tużnik 6876289228 CA: remove PredicateChecker, use the new ClusterSnapshot methods instead 2024-12-04 14:33:51 +01:00
Kuba Tużnik 0ace148d3d CA: rename BasicClusterSnapshot and DeltaClusterSnapshot to reflect the ClusterSnapshotStore change 2024-12-04 14:33:51 +01:00
Kuba Tużnik 67773a5509 CA: move BasicClusterSnapshot and DeltaClusterSnapshot to a dedicated subpkg 2024-12-04 14:33:51 +01:00
Kuba Tużnik 540725286f CA: migrate the codebase to use PredicateSnapshot 2024-12-04 14:33:51 +01:00
Kuba Tużnik 46e9f398fe CA: introduce PredicateSnapshot
PredicateSnapshot implements the ClusterSnapshot methods that need
to run predicates on top of a ClusterSnapshotStore.

testsnapshot pkg is introduced, providing functions abstracting away
the snapshot creation for tests.

ClusterSnapshot tests are moved near PredicateSnapshot, as it'll be
the only "full" implementation.
2024-12-04 14:33:50 +01:00
Kuba Tużnik ce185226d1 CA: extend ClusterSnapshot interface with predicate-checking methods
To handle DRA properly, scheduling predicates will need to be run
whenever Pods are scheduled in the snapshot.

PredicateChecker always needs a ClusterSnapshot to work, and ClusterSnapshot
scheduling methods need to run the predicates first. So it makes most
sense to have PredicateChecker be a dependency for ClusterSnapshot
implementations, and move the PredicateChecker methods to
ClusterSnapshot.

This commit mirrors PredicateChecker methods in ClusterSnapshot (with
the exception of FitsAnyNode which isn't used anywhere and is trivial to
do via FitsAnyNodeMatching). Further commits will remove the
PredicateChecker interface and move the implementation under
clustersnapshot.

Dummy methods are added to current ClusterSnapshot implementations to
get the tests to pass. Further commits will actually implement them.

PredicateError is refactored into a broader SchedulingError so that the
ClusterSnapshot methods can return a single error that the callers can
use to distinguish between a failing predicate and other, unexpected
errors.
2024-12-04 14:33:40 +01:00
Kuba Tużnik a35f830f1d CA: extract a Handle to schedulerframework.Framework out of PredicateChecker
This decouples PredicateChecker from the Framework initialization logic,
and allows creating multiple PredicateChecker instances while only
initializing the framework once.

This commit also fixes how CA integrates with Framework metrics. Instead
of Registering them they're only Initialized so that CA doesn't expose
scheduler metrics. And the initialization is moved from multiple
different places to the Handle constructor.
2024-12-03 16:47:54 +01:00
Kuba Tużnik eb26816ce9 CA: refactor utils related to NodeInfos
simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate,
and scheduler_utils.DeepCopyTemplateNode all had very similar logic
for sanitizing and copying NodeInfos. They're all consolidated to
one file in simulator, sharing common logic.

DeepCopyNodeInfo is changed to be a framework.NodeInfo method.

MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to
correlate Nodes to scheduled pods, instead of using a live Pod lister.
This means that the snapshot now has to be properly initialized in a
bunch of tests.
2024-11-27 12:51:30 +01:00
Kuba Tużnik 473a1a8ffc CA: remove Clear from ClusterSnapshot
It's now redundant - SetClusterState with empty arguments does the same
thing.
2024-11-19 15:28:27 +01:00
Kuba Tużnik f67db627e2 CA: rename ClusterSnapshot AddPod, RemovePod, RemoveNode
RemoveNode is renamed to RemoveNodeInfo for consistency with other
NodeInfo methods.

For DRA, the snapshot will have to potentially allocate ResourceClaims
when adding a Pod to a Node, and deallocate them when removing a Pod
from a Node. This will happen in new methods added to ClusterSnapshot
in later commits - SchedulePod and UnschedulePod. These new methods
should be the "default" way of moving pods around the snapshot going
forward.

However, we'll still need to be able to add and remove pods from the
snapshot "forcefully" to handle some corner cases (e.g. expendable pods).
AddPod is renamed to ForceAddPod, and RemovePod to ForceRemovePod to
highlight that these are no longer the "default" methods of moving pods
around the snapshot, and are bypassing something important.
2024-11-19 15:28:21 +01:00
Kuba Tużnik a81aa5c616 CA: remove AddNode from ClusterSnapshot
AddNodeInfo already provides the same functionality, and has to be used
in production code in order to propagate DRA objects correctly.

Uses in production are replaced with SetClusterState(), which will later
take DRA objects into account. Uses in the test code are replaced with
AddNodeInfo().
2024-11-19 15:28:16 +01:00
Kuba Tużnik 38603883db CA: remove redundant IsPVCUsedByPods from ClusterSnapshot
The method is already accessible via StorageInfos(), so it's
redundant.
2024-11-19 15:28:11 +01:00
Kuba Tużnik 517ecb992f CA: add SetClusterState to ClusterSnapshot, remove AddNodes
AddNodes() is redundant - it was intended for batch adding nodes,
probably with batch-specific optimizations in mind. However, it
has always been implemented as just iterating over AddNode(), and
is only used in test code.

Most of the uses in the test code were initializing the cluster state.
They are replaced with SetClusterState(), which will later be needed for
handling DRA anyway (we'll have to start tracking things that aren't
node- or pod-scoped). The other uses are replaced with inline loops over
AddNode().
2024-11-19 15:28:06 +01:00
Kuba Tużnik 269c7a339e CA: remove AddNodeWithPods from ClusterSnapshot, replace uses with AddNodeInfo
We need AddNodeInfo in order to propagate DRA objects through the
snapshot, which makes AddNodeWithPods redundant.
2024-11-19 15:27:59 +01:00