Commit Graph

38 Commits

Author SHA1 Message Date
Yaroslava Serdiuk 92bba5c93f Allow forking snapshot more than 1 time 2022-11-16 14:19:30 +00:00
Daniel Kłobuszewski 18f2e67c4f Split out code from simulator package 2022-10-18 11:51:44 +02:00
Maciek Pytel 5342f189f1 Check if pods fit on the new node in binpacking
Previously we've just assumed pod will always fit on a newly added node
during binpacking, because we've already checked that a pod fits on an
empty template node earlier in scale-up logic.
This assumption is incorrect, as it doesn't take into account potential
impact of other scheduling we've done in binpacking. For pods using
zonal Filters (such as PodTopologySpreading with zonal topology key) the
pod may no longer be able to schedule even on an empty node as a result
of earlier decisions we've made in binpacking.
2022-06-20 17:02:51 +02:00
Maciek Pytel ab891418f6 Limit binpacking based on #new_nodes or time
The binpacking algorithm is O(#pending_pods * #new_nodes) and
calculating a very large scale-up can get stuck for minutes or even
hours, leading to CA failing it's healthcheck and going down.
The new limiting prevents this scenario by stopping binpacking after
reaching specified threshold. Any pods that remain pending as a result
of shorter binpacking will be processed next autoscaler loop.

The thresholds used can be controlled with newly introduced flags:
--max-nodes-per-scaleup and --max-nodegroup-binpacking-duration. The
limiting can be disabled by setting both flags to 0 (not recommended,
especially for --max-nodegroup-binpacking-duration).
2022-06-20 17:02:51 +02:00
Maciek Pytel f599494f48 Add EstimationLimiter interface, update Estimator 2022-06-20 17:02:51 +02:00
Benjamin Pineau 030a2152b0 Fix templated nodeinfo names collisions in BinpackingNodeEstimator
Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now uses
the same shared DeepCopyTemplateNode function and inherits its naming
pattern, which is great as that fixes a long standing bug.

Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with
generated nodeinfos and nodes having predictable names (using template name
+ an incremental ordinal starting at 0) for upcoming nodes.

Later, when it looks for fitting nodes for unschedulable pods (when upcoming
nodes don't satisfy those (FitsAnyNodeMatching failing due to nodes capacity,
or pods antiaffinity, ...), the binpacking estimator will also build virtual
nodes and place them in a snapshot fork to evaluate scheduler predicates.

Those temporary virtual nodes are built using the same pattern (template name
and an index ordinal also starting at 0) as the one previously used by
`getUpcomingNodeInfos`, which means it will generate the same nodeinfos/nodes
names for nodegroups having upcoming nodes.

But adding nodes by the same name in an existing cluster snapshot isn't
allowed, and the evaluation attempt will fail.

Practically this blocks re-upscales for nodegroups having upcoming nodes,
which can cause a significant delay.
2021-05-19 12:05:40 +02:00
Maciek Pytel 9831623810 Set different hostname label for upcoming nodes
Function copying template node to use for upcoming nodes was
not chaning hostname label, meaning that features relying on
this label (ex. pod antiaffinity on hostname topology) would
treat all upcoming nodes as a single node.
This resulted in triggering too many scale-ups for pods
using such features. Analogous function in binpacking didn't
have the same bug (but it didn't set unique UID or pod names).
I extracted the functionality to a util function used in both
places to avoid the two functions getting out of sync again.
2021-02-12 19:41:04 +01:00
Bartłomiej Wróblewski 0fb897b839 Update imports after scheduler scheduler/framework/v1alpha1 removal 2020-11-30 10:48:52 +00:00
Maciek Pytel 9a5f38f484 Use FitsAnyNode in binpacking
This means that PreFilters are run once per pod in binpacking
instead of #pods*#nodes times. This makes a huge performance
difference in very large clusters.
2020-08-13 15:34:14 +02:00
Maciek Pytel 655b4081f4 Migrate to klog v2 2020-06-05 17:22:26 +02:00
Jakub Tużnik 73a5cdf928 Address recent breaking changes in scheduler
The following things changed in scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
2020-04-24 17:54:47 +02:00
Aleksandra Malinowska 5d44b202bc Forget FakeNodeInfoForNodeName ever existed 2020-02-21 15:36:21 +01:00
Łukasz Osipiuk 7b67d3f582 klog.Fatalf on error from ClusterSnapshot.Revert() 2020-02-04 20:52:07 +01:00
Łukasz Osipiuk e5c60c81a9 Remove Estimator's upcoming nodes paramter 2020-02-04 20:52:04 +01:00
Łukasz Osipiuk 8f18bab081 Set kubernetes.io/hostname label on simulated node in BinpackingNodeEstimator 2020-02-04 20:51:47 +01:00
Łukasz Osipiuk b6e1b26f45 Use ClusterSnapshot in BinpackingNodeEstimator 2020-02-04 20:51:45 +01:00
Łukasz Osipiuk 30ce46cc28 Pass ClusterSnapshot to BinpackingNodeEstimator 2020-02-04 20:51:29 +01:00
Łukasz Osipiuk 6b2287af4f Pass ClusterSnaphost explicitly to PredicateChecker 2020-02-04 20:51:24 +01:00
Łukasz Osipiuk 373c558303 Extract PredicateChecker interface 2020-02-04 20:51:18 +01:00
Łukasz Osipiuk 4a2b8c7dfc Remove use of PredicateMetadata 2020-02-04 20:51:05 +01:00
Vivek Bagade 90aa28a077 Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements 2019-06-19 14:48:47 +02:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
Vivek Bagade c6b87841ce Added a new method that uses pod packing to filter schedulable pods
filterOutSchedulableByPacking is an alternative to the older
filterOutSchedulable. filterOutSchedulableByPacking sorts pods in
unschedulableCandidates by priority and filters out pods that can be
scheduled on free capacity on existing nodes. It uses a basic packing
approach to do this. Pods with nominatedNodeName set are always
filtered out.

filterOutSchedulableByPacking is set to be used by default, but, this
can be toggled off by setting filter-out-schedulable-pods-uses-packing
flag to false, which would then activate the older and more lenient
filterOutSchedulable(now called filterOutSchedulableSimple).

Added test cases for both methods.
2019-01-25 16:09:51 +05:30
Aleksandra Malinowska bf6ff4be8e Clean up estimators 2018-11-06 14:15:42 +01:00
Aleksandra Malinowska f5690aab96 Make CheckPredicates return predicateError 2018-08-28 14:11:35 +02:00
Pengfei Ni be3dd85503 Update scheduler cache package 2018-06-11 13:54:12 +08:00
Łukasz Osipiuk 5344cc637c Use less verbose sort.Slice instead sort.Sort 2018-04-25 17:47:06 +02:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Sergey Lanzman 437a3f60e1 Small optimize code 2017-09-04 23:50:45 +03:00
Maciej Pytel 281afa7147 precompute predicateMetadata in scale-down 2017-08-29 16:29:45 +02:00
Maciej Pytel fb6ef75d12 Don't create verbose errors in predicates if we ignore them
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Marcin Wielgus 2ffaddb7c0 Cluster-autoscaler: lint 2017-03-02 15:15:07 +01:00
Marcin Wielgus 72a47dc2b2 Cluster-autoscaler: update code for 1.6 k8s sync 2017-03-02 14:34:49 +01:00
Marcin Wielgus 27d58c7f7c Cluster-autoscaler: add comming nodes to estimators 2016-12-30 15:19:42 +01:00
Marcin Wielgus 7b63b6c1f1 Cluster-autoscaler: update code to compile with K8S 1.5 2016-12-13 17:22:57 +01:00
Marcin Wielgus 0561d0ea84 ClusterAutoscaler: first fit decreasing estimate algorithm 2016-10-04 11:28:14 +02:00