Commit Graph

82 Commits

Author SHA1 Message Date
Daniel Kłobuszewski 780e68f6d2 Fix incorrect usage of klog .*f functions
The .*f variants should only ever be called with arguments to format.
2025-03-13 13:24:52 +01:00
Kuba Tużnik 4e68a0c6ef CA: sanitize and propagate DRA objects through NodeInfos in node_info utils 2024-12-20 13:30:36 +01:00
Kuba Tużnik 6876289228 CA: remove PredicateChecker, use the new ClusterSnapshot methods instead 2024-12-04 14:33:51 +01:00
Kuba Tużnik 540725286f CA: migrate the codebase to use PredicateSnapshot 2024-12-04 14:33:51 +01:00
Kuba Tużnik a35f830f1d CA: extract a Handle to schedulerframework.Framework out of PredicateChecker
This decouples PredicateChecker from the Framework initialization logic,
and allows creating multiple PredicateChecker instances while only
initializing the framework once.

This commit also fixes how CA integrates with Framework metrics. Instead
of Registering them they're only Initialized so that CA doesn't expose
scheduler metrics. And the initialization is moved from multiple
different places to the Handle constructor.
2024-12-03 16:47:54 +01:00
Kuba Tużnik eb26816ce9 CA: refactor utils related to NodeInfos
simulator.BuildNodeInfoForNode, core_utils.GetNodeInfoFromTemplate,
and scheduler_utils.DeepCopyTemplateNode all had very similar logic
for sanitizing and copying NodeInfos. They're all consolidated to
one file in simulator, sharing common logic.

DeepCopyNodeInfo is changed to be a framework.NodeInfo method.

MixedTemplateNodeInfoProvider now correctly uses ClusterSnapshot to
correlate Nodes to scheduled pods, instead of using a live Pod lister.
This means that the snapshot now has to be properly initialized in a
bunch of tests.
2024-11-27 12:51:30 +01:00
Kuba Tużnik f67db627e2 CA: rename ClusterSnapshot AddPod, RemovePod, RemoveNode
RemoveNode is renamed to RemoveNodeInfo for consistency with other
NodeInfo methods.

For DRA, the snapshot will have to potentially allocate ResourceClaims
when adding a Pod to a Node, and deallocate them when removing a Pod
from a Node. This will happen in new methods added to ClusterSnapshot
in later commits - SchedulePod and UnschedulePod. These new methods
should be the "default" way of moving pods around the snapshot going
forward.

However, we'll still need to be able to add and remove pods from the
snapshot "forcefully" to handle some corner cases (e.g. expendable pods).
AddPod is renamed to ForceAddPod, and RemovePod to ForceRemovePod to
highlight that these are no longer the "default" methods of moving pods
around the snapshot, and are bypassing something important.
2024-11-19 15:28:21 +01:00
Kuba Tużnik a81aa5c616 CA: remove AddNode from ClusterSnapshot
AddNodeInfo already provides the same functionality, and has to be used
in production code in order to propagate DRA objects correctly.

Uses in production are replaced with SetClusterState(), which will later
take DRA objects into account. Uses in the test code are replaced with
AddNodeInfo().
2024-11-19 15:28:16 +01:00
Kuba Tużnik 269c7a339e CA: remove AddNodeWithPods from ClusterSnapshot, replace uses with AddNodeInfo
We need AddNodeInfo in order to propagate DRA objects through the
snapshot, which makes AddNodeWithPods redundant.
2024-11-19 15:27:59 +01:00
Kuba Tużnik 879c6a84a4 DRA: migrate all of CA to use the new internal NodeInfo/PodInfo
The new wrapper types should behave like the direct schedulerframework
types for most purposes, so most of the migration is just changing
the imported package.

Constructors look a bit different, so they have to be adapted -
mostly in test code. Accesses to the Pods field have to be changed
to a method call.

After this, the schedulerframework types are only used in the new
wrappers, and in the parts of simulator/ that directly interact with
the scheduler framework. The rest of the CA codebase operates on the new
wrapper types.
2024-11-05 16:43:43 +01:00
Bartłomiej Wróblewski 068ce78272 Register scheduler metrics 2024-10-23 16:47:34 +00:00
Daniel Gutowski 5aa6b2cb07 Introduce binpacking optimization for similar pods.
The optimization uses the fact that equivalent pods do not need to be
checked multiple times against already filled nodes.
This changes the time complexity from O(pods*nodes) to O(pods).
2024-04-04 10:15:47 -07:00
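Here is a minimal Go sketch of the idea. It assumes a hypothetical model in which every pod in a group requests the same amount of a single resource and all new nodes have the same capacity. `PodGroup` and `estimate` are illustrative names, not the estimator's API. Nodes only lose free capacity during packing, so a per-group cursor can skip nodes that already rejected an equivalent pod. Each node is therefore passed over at most once per group:

```go
package main

import "fmt"

// PodGroup is a hypothetical equivalence group: every pod in it requests the
// same amount of a single resource, so a node that rejects one pod from the
// group rejects all the others as well.
type PodGroup struct {
	Request int
	Count   int
}

// estimate packs the groups onto identical new nodes (first fit) and returns
// how many nodes are needed. For each group it keeps a cursor to the first
// node that may still fit the group's pods; nodes before the cursor are never
// checked again for that group, because nodes only lose capacity while packing.
func estimate(groups []PodGroup, nodeCapacity int) int {
	var free []int // remaining capacity of each new node, in creation order
	for _, g := range groups {
		if g.Request > nodeCapacity {
			continue // does not fit even on an empty node
		}
		first := 0
		for i := 0; i < g.Count; i++ {
			// Advance past nodes that are too full for this group.
			for first < len(free) && free[first] < g.Request {
				first++
			}
			if first == len(free) {
				free = append(free, nodeCapacity) // open a new node
			}
			free[first] -= g.Request
		}
	}
	return len(free)
}

func main() {
	groups := []PodGroup{{Request: 3, Count: 5}, {Request: 1, Count: 3}}
	fmt.Println(estimate(groups, 10)) // 2
}
```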
Daniel Gutowski 5d0c973652 Make the Estimate func accept grouped pods.
Pods should be grouped by scheduling equivalence, which makes it
possible to introduce optimizations to the binpacking.

Introduce a benchmark that estimates the capacity needed for 51k pods,
which can be grouped into two equivalence groups of 50k and 1k pods.
2024-04-02 02:23:44 -07:00
Aleksandra Gacek 4470430007 Fix klog formatting directives in cluster-autoscaler package. 2023-11-07 16:13:57 +01:00
Patrick Ohly ad98cb321a autoscaler: fix premature end of binpacking
When PermissionToAddNode gets called without actually adding a new node, the
node counter in the thresholdBasedEstimationLimiter gets out of sync with the
actual number of new nodes. This can happen when the "is the last node empty"
check triggers.

The solution used here is to rearrange the checks so that PermissionToAddNode
is followed by adding a new node. Alternatively, it might be possible
to pass the current number of nodes as a parameter.
2023-09-30 20:57:48 +02:00
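Here is a simplified Go sketch of the rearranged checks. It assumes a hypothetical limiter (`countingLimiter`) that counts granted permissions, plus a single-resource model of new nodes. It is not the estimator's actual code. The empty-last-node check runs before `PermissionToAddNode`, so the counter always equals the number of nodes actually added:

```go
package main

import "fmt"

// countingLimiter is a hypothetical stand-in for an estimation limiter: it
// counts granted permissions and refuses once maxNodes is reached
// (0 means no limit).
type countingLimiter struct {
	maxNodes int
	granted  int
}

func (l *countingLimiter) PermissionToAddNode() bool {
	if l.maxNodes > 0 && l.granted >= l.maxNodes {
		return false
	}
	l.granted++
	return true
}

type newNode struct {
	free int // remaining capacity
	pods int // pods placed on this node
}

// binpack places pods (given as resource requests) on new nodes of the given
// capacity. The "is the last node empty" check runs before the limiter is
// consulted, so every granted permission is followed by adding a node.
func binpack(pods []int, capacity int, l *countingLimiter) []newNode {
	var nodes []newNode
	for _, p := range pods {
		placed := false
		for i := range nodes {
			if nodes[i].free >= p {
				nodes[i].free -= p
				nodes[i].pods++
				placed = true
				break
			}
		}
		if placed {
			continue
		}
		// The last node is still empty and this pod did not fit on it, so
		// another fresh node would not help either.
		if n := len(nodes); n > 0 && nodes[n-1].pods == 0 {
			continue
		}
		if !l.PermissionToAddNode() {
			break
		}
		nodes = append(nodes, newNode{free: capacity})
		if last := &nodes[len(nodes)-1]; last.free >= p {
			last.free -= p
			last.pods++
		}
	}
	return nodes
}

func main() {
	l := &countingLimiter{}
	nodes := binpack([]int{15, 15, 15}, 10, l)
	fmt.Println(len(nodes), l.granted) // 1 1
}
```

Some pods cannot fit even on an empty node. With those, only one node is added and the limiter records exactly one permission. Asking for permission before the emptiness check would have recorded three.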
Artur Żyliński 21229d34ec Add EstimationAnalyserFunc to be run at the end of the estimation logic 2023-07-25 09:21:28 +02:00
Yuriy Stryuchkov b4213d8244 Add support for negative binpacking duration limit in threshold based estimation limiter 2023-07-05 10:56:33 +02:00
Yuriy Stryuchkov f3dfeeeb47 Make signature of GetDurationLimit consistent with GetNodeLimit
For SNG threshold, include capacity of the currently estimated node group (as it is not part of SNG itself)
Replace direct calls with getters in cluster capacity threshold
Rename getters, removing the verb Get
Replace EstimationContext struct with interface
Add support for negative threshold value in estimation limiter
2023-07-04 17:42:21 +02:00
Yuriy Stryuchkov 0d0e3fce38 Fix tests 2023-07-03 14:55:06 +02:00
Yuriy Stryuchkov a947ec1f57 Implement threshold interface for use by threshold based limiter
Add EstimationContext to take the runtime state of autoscaling into account in estimations
Implement static threshold
Implement cluster capacity threshold for Estimation Limiter
Implement similar node groups capacity threshold for Estimation Limiter
Set default estimation thresholds
2023-07-03 13:52:14 +02:00
Kubernetes Prow Robot f87dbe5fc6
Merge pull request #5715 from kisieland/revert-estimator-expansion
Revert "Add new method 'ReachedLimit' to EstimationLimiter"
2023-04-28 02:04:17 -07:00
Daniel Gutowski 4930cf3742 Revert commit 3ad77e8341
This was meant to be used as a signal that the estimation
did consider all of the pods, but x13n pointed out that
other k8s primitives may also limit it, so the best option
is to compare the list of estimated pods.
2023-04-27 06:04:11 -07:00
Kubernetes Prow Robot 8099d0db86
Merge pull request #5713 from jayantjain93/pod-group-processor
Binpacking Estimator pod priority
2023-04-27 05:46:15 -07:00
Jayant Jain 9624e7d11f Binpacking Estimator pod orderer
Refactor the binpacking estimator to make the pod sort extendable by moving the logic into decreasing_pod_orderer.go
Cleanup sorting logic/structs from binpacking_estimator.go
2023-04-27 12:16:03 +00:00
Daniel Gutowski 3ad77e8341 Add new method 'ReachedLimit' to EstimationLimiter
This method will allow CA to check whether the limiter
blocked the addition of any new Node.
2023-04-18 00:54:47 -07:00
Daniel Kłobuszewski b4a47c3295
Fix int formatting in threshold_based_limiter logs 2022-12-08 09:55:01 +01:00
Yaroslava Serdiuk 92bba5c93f Allow forking snapshot more than 1 time 2022-11-16 14:19:30 +00:00
Daniel Kłobuszewski 18f2e67c4f Split out code from simulator package 2022-10-18 11:51:44 +02:00
Maciek Pytel 5342f189f1 Check if pods fit on the new node in binpacking
Previously we just assumed a pod would always fit on a newly added node
during binpacking, because we had already checked that the pod fits on an
empty template node earlier in the scale-up logic.
This assumption is incorrect, as it doesn't take into account the potential
impact of other scheduling decisions made during binpacking. For pods using
zonal Filters (such as PodTopologySpreading with a zonal topology key), the
pod may no longer be able to schedule even on an empty node as a result
of earlier decisions we've made in binpacking.
2022-06-20 17:02:51 +02:00
Maciek Pytel ab891418f6 Limit binpacking based on #new_nodes or time
The binpacking algorithm is O(#pending_pods * #new_nodes) and
calculating a very large scale-up can get stuck for minutes or even
hours, leading to CA failing its health check and going down.
The new limiting prevents this scenario by stopping binpacking after
reaching the specified threshold. Any pods that remain pending as a result
of the shorter binpacking will be processed in the next autoscaler loop.

The thresholds used can be controlled with newly introduced flags:
--max-nodes-per-scaleup and --max-nodegroup-binpacking-duration. The
limiting can be disabled by setting both flags to 0 (not recommended,
especially for --max-nodegroup-binpacking-duration).
2022-06-20 17:02:51 +02:00
Maciek Pytel f599494f48 Add EstimationLimiter interface, update Estimator 2022-06-20 17:02:51 +02:00
Benjamin Pineau 030a2152b0 Fix templated nodeinfo names collisions in BinpackingNodeEstimator
Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now use
the same shared DeepCopyTemplateNode function and inherit its naming
pattern, which is great as that fixes a long-standing bug.

Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with
generated nodeinfos and nodes having predictable names (using template name
+ an incremental ordinal starting at 0) for upcoming nodes.

Later, when it looks for fitting nodes for unschedulable pods (when upcoming
nodes don't satisfy them, e.g. FitsAnyNodeMatching fails due to node capacity
or pod anti-affinity), the binpacking estimator will also build virtual
nodes and place them in a snapshot fork to evaluate scheduler predicates.

Those temporary virtual nodes are built using the same pattern (template name
and an index ordinal also starting at 0) as the one previously used by
`getUpcomingNodeInfos`, which means it will generate the same nodeinfos/nodes
names for nodegroups having upcoming nodes.

But adding nodes with the same name to an existing cluster snapshot isn't
allowed, and the evaluation attempt will fail.

In practice, this blocks further scale-ups of node groups that have upcoming
nodes, which can cause a significant delay.
2021-05-19 12:05:40 +02:00
Maciek Pytel 9831623810 Set different hostname label for upcoming nodes
Function copying template node to use for upcoming nodes was
not changing the hostname label, meaning that features relying on
this label (e.g. pod anti-affinity on hostname topology) would
treat all upcoming nodes as a single node.
This resulted in triggering too many scale-ups for pods
using such features. Analogous function in binpacking didn't
have the same bug (but it didn't set unique UID or pod names).
I extracted the functionality to a util function used in both
places to avoid the two functions getting out of sync again.
2021-02-12 19:41:04 +01:00
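Here is a minimal Go sketch of such a util. It assumes a hypothetical `Node` type reduced to a name and a label map. `copyTemplateNode` is an illustrative name, not the actual CA function. Each copy gets its own name and a matching `kubernetes.io/hostname` label, so hostname-scoped features can tell the copies apart:

```go
package main

import "fmt"

// Node is a hypothetical, minimal stand-in for a Kubernetes Node object.
type Node struct {
	Name   string
	Labels map[string]string
}

const hostnameLabel = "kubernetes.io/hostname"

// copyTemplateNode deep-copies a template node under a new name and points
// the hostname label at that name, so hostname-scoped features (such as pod
// anti-affinity on the hostname topology key) see each copy as its own node.
func copyTemplateNode(template Node, suffix string) Node {
	labels := make(map[string]string, len(template.Labels)+1)
	for k, v := range template.Labels {
		labels[k] = v
	}
	name := fmt.Sprintf("%s-%s", template.Name, suffix)
	labels[hostnameLabel] = name
	return Node{Name: name, Labels: labels}
}

func main() {
	template := Node{Name: "template", Labels: map[string]string{hostnameLabel: "template", "zone": "a"}}
	a := copyTemplateNode(template, "upcoming-0")
	b := copyTemplateNode(template, "upcoming-1")
	fmt.Println(a.Labels[hostnameLabel], b.Labels[hostnameLabel]) // template-upcoming-0 template-upcoming-1
}
```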
Bartłomiej Wróblewski 0fb897b839 Update imports after scheduler scheduler/framework/v1alpha1 removal 2020-11-30 10:48:52 +00:00
Maciek Pytel 9a5f38f484 Use FitsAnyNode in binpacking
This means that PreFilters are run once per pod in binpacking
instead of #pods*#nodes times. This makes a huge performance
difference in very large clusters.
2020-08-13 15:34:14 +02:00
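Here is a hypothetical Go sketch of why this matters. It uses a toy model where PreFilter turns a pod into reusable per-pod state and Filter checks that state against a single node. `checker`, `fitsNode` and `fitsAnyNode` are illustrative names, not the PredicateChecker API. Checking nodes one at a time re-runs PreFilter for every node, while the any-node variant runs it once per pod:

```go
package main

import "fmt"

// Hypothetical, simplified scheduling model.
type Pod struct {
	Name string
	CPU  int
}

type Node struct {
	Name    string
	FreeCPU int
}

// podState is the per-pod result of PreFilter, reused across nodes.
type podState struct{ cpu int }

type checker struct {
	preFilterCalls int
}

func (c *checker) preFilter(p Pod) podState {
	c.preFilterCalls++
	return podState{cpu: p.CPU}
}

func filter(s podState, n Node) bool { return n.FreeCPU >= s.cpu }

// fitsNode re-runs PreFilter for every (pod, node) pair it is asked about.
func (c *checker) fitsNode(p Pod, n Node) bool {
	return filter(c.preFilter(p), n)
}

// fitsAnyNode runs PreFilter once per pod, then Filter node by node until
// one passes.
func (c *checker) fitsAnyNode(p Pod, nodes []Node) (string, bool) {
	s := c.preFilter(p)
	for _, n := range nodes {
		if filter(s, n) {
			return n.Name, true
		}
	}
	return "", false
}

func main() {
	nodes := []Node{{"n1", 1}, {"n2", 2}, {"n3", 4}}
	pod := Pod{"p", 3}

	perNode := &checker{}
	for _, n := range nodes {
		if perNode.fitsNode(pod, n) {
			break
		}
	}
	once := &checker{}
	once.fitsAnyNode(pod, nodes)
	fmt.Println(perNode.preFilterCalls, once.preFilterCalls) // 3 1
}
```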
Maciek Pytel 655b4081f4 Migrate to klog v2 2020-06-05 17:22:26 +02:00
Jakub Tużnik 73a5cdf928 Address recent breaking changes in scheduler
The following things changed in scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
2020-04-24 17:54:47 +02:00
Aleksandra Malinowska 5d44b202bc Forget FakeNodeInfoForNodeName ever existed 2020-02-21 15:36:21 +01:00
Łukasz Osipiuk 7b67d3f582 klog.Fatalf on error from ClusterSnapshot.Revert() 2020-02-04 20:52:07 +01:00
Łukasz Osipiuk e5c60c81a9 Remove Estimator's upcoming nodes parameter 2020-02-04 20:52:04 +01:00
Łukasz Osipiuk 8f18bab081 Set kubernetes.io/hostname label on simulated node in BinpackingNodeEstimator 2020-02-04 20:51:47 +01:00
Łukasz Osipiuk b6e1b26f45 Use ClusterSnapshot in BinpackingNodeEstimator 2020-02-04 20:51:45 +01:00
Łukasz Osipiuk 30ce46cc28 Pass ClusterSnapshot to BinpackingNodeEstimator 2020-02-04 20:51:29 +01:00
Łukasz Osipiuk 6b2287af4f Pass ClusterSnapshot explicitly to PredicateChecker 2020-02-04 20:51:24 +01:00
Łukasz Osipiuk db85b1b6f1 Implement NewTestPredicateChecker 2020-02-04 20:51:20 +01:00
Łukasz Osipiuk 373c558303 Extract PredicateChecker interface 2020-02-04 20:51:18 +01:00
Łukasz Osipiuk 4a2b8c7dfc Remove use of PredicateMetadata 2020-02-04 20:51:05 +01:00
Łukasz Osipiuk cdcc693ab9 Remove OldBinpackingEstimator 2020-01-14 15:29:16 +01:00
Vivek Bagade 90aa28a077 Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements 2019-06-19 14:48:47 +02:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00