
122 Commits

Author SHA1 Message Date
Kubernetes Prow Robot e3552bb95e
Merge pull request #5084 from grosser/grosser/ref
cluster-autoscaler: avoid goto in filterNodeGroupsByPods
2022-08-18 04:54:36 -07:00
Michael Grosser 87dba05278 cluster-autoscaler: avoid goto in filterNodeGroupsByPods 2022-08-11 10:33:32 -07:00
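As a rough illustration of this kind of refactor, the sketch below replaces a goto-style "skip to the next group" jump with a helper function and an ordinary continue. The types, the function name filterGroupsByPods, and the rule that a group is kept only when its expansion option fits every required pod are hypothetical simplifications, not the upstream code.

```go
package main

import "fmt"

// option is a hypothetical, simplified stand-in for an expansion option:
// the set of pod names a node group could schedule after scaling up.
type option struct {
	pods map[string]bool
}

// fitsAll reports whether every required pod appears in opt. Pulling the
// inner loop into a helper lets the caller skip a group with a plain
// "continue" instead of jumping to a label with goto.
func fitsAll(opt option, required []string) bool {
	for _, p := range required {
		if !opt.pods[p] {
			return false
		}
	}
	return true
}

// filterGroupsByPods keeps only groups that have an option fitting all required pods.
func filterGroupsByPods(groups []string, required []string, options map[string]option) []string {
	var result []string
	for _, g := range groups {
		opt, found := options[g]
		if !found || !fitsAll(opt, required) {
			continue
		}
		result = append(result, g)
	}
	return result
}

func main() {
	options := map[string]option{
		"ng-a": {pods: map[string]bool{"p1": true, "p2": true}},
		"ng-b": {pods: map[string]bool{"p1": true}},
	}
	fmt.Println(filterGroupsByPods([]string{"ng-a", "ng-b", "ng-c"}, []string{"p1", "p2"}, options))
}
```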
Kubernetes Prow Robot 07ea116616
Merge pull request #5036 from grosser/grosser/join
use strings.Join to build list of names
2022-08-10 02:06:31 -07:00
Michael Grosser a09f4d9c7f use strings.Join to build list of names 2022-08-08 11:55:01 -07:00
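A minimal sketch of the pattern named in this commit, assuming a hypothetical nodeGroup type with a name field: collect the names into a slice, then let strings.Join place the separators instead of tracking "is this the last element" by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeGroup is a hypothetical type standing in for anything with a name.
type nodeGroup struct{ name string }

// groupNames collects names and joins them with ", ". strings.Join handles
// separators (and the empty case) without manual concatenation bookkeeping.
func groupNames(groups []nodeGroup) string {
	names := make([]string, 0, len(groups))
	for _, g := range groups {
		names = append(names, g.name)
	}
	return strings.Join(names, ", ")
}

func main() {
	fmt.Println(groupNames([]nodeGroup{{"ng-a"}, {"ng-b"}, {"ng-c"}}))
}
```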
Michael McCune da9d307e57 add metric for skipped scaling events
This change adds a new metric, skipped_scale_events_count, which will
record the number of times that the CA has chosen to skip a scaling
event. The metric contains a label for the scaling direction (up or down)
and the reason.

This patch includes usages for the new metric based on CPU or memory
limits being reached in either a scale up or down.
2022-07-28 10:51:49 -04:00
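The metric described above is a counter with two labels. This stdlib-only sketch mimics that shape with a mutex-guarded map keyed by (direction, reason). It is not the upstream implementation, which likely goes through the project's Prometheus-based metrics, and the reason strings in the example are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// skipKey holds the two label values described for the metric:
// scaling direction ("up" or "down") and the reason for skipping.
type skipKey struct {
	direction string
	reason    string
}

// skippedScaleEvents is a hypothetical, stdlib-only stand-in for a
// labelled counter such as skipped_scale_events_count.
type skippedScaleEvents struct {
	mu     sync.Mutex
	counts map[skipKey]int
}

func newSkippedScaleEvents() *skippedScaleEvents {
	return &skippedScaleEvents{counts: make(map[skipKey]int)}
}

// Inc records one skipped scaling event for the given label pair.
func (s *skippedScaleEvents) Inc(direction, reason string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[skipKey{direction, reason}]++
}

// Get returns the current count for the given label pair (0 if never set).
func (s *skippedScaleEvents) Get(direction, reason string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.counts[skipKey{direction, reason}]
}

func main() {
	m := newSkippedScaleEvents()
	// Reason strings here are illustrative, not the exact upstream values.
	m.Inc("up", "CpuResourceLimit")
	m.Inc("up", "CpuResourceLimit")
	m.Inc("down", "MemoryResourceLimit")
	fmt.Println(m.Get("up", "CpuResourceLimit"), m.Get("down", "MemoryResourceLimit"))
}
```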
Maciek Pytel f599494f48 Add EstimationLimiter interface, update Estimator 2022-06-20 17:02:51 +02:00
Marwan Ahmed fd089c2d15 avoid double wrapping scale up error 2021-12-22 15:47:05 +02:00
Kubernetes Prow Robot dc4a818b69
Merge pull request #4014 from tomkerkhove/ScaledUpGroup-info
Improve ScaledUpGroup event info to include current & max nodes
2021-11-22 07:11:08 -08:00
Aleksandra Gacek a7660c1f5a Set PodsTriggeredScaleUp field for failed scale ups. 2021-10-15 12:38:51 +02:00
Aleksandra Gacek b5677acc80 Extend ScaleUpStatus with node groups that failed scale up. 2021-10-13 12:53:43 +02:00
Aleksandra Gacek b194c6f252 Extend ScaleUpStatus structure with ScaleUpError field. 2021-08-12 10:40:58 +02:00
Tom Kerkhove 245db5b6b1 Improve ScaledUpGroup event info to include current & max nodes
Signed-off-by: GitHub <noreply@github.com>
2021-04-14 13:32:24 +00:00
Kubernetes Prow Robot 6432771415
Merge pull request #3971 from BigDarkClown/feat/resource-processor
Separate and refactor custom resources logic
2021-04-07 04:41:52 -07:00
Bartłomiej Wróblewski 1698e0e583 Separate and refactor custom resources logic 2021-04-07 10:31:11 +00:00
Brett Elliott 013fa19be3 Log failed scale up metric based on string of AutoscalerErrorType. 2021-03-23 15:37:04 +01:00
Brett Elliott 4cddaed2f2 Support for reporting authorization errors during scale up 2021-03-17 14:56:03 +01:00
Bartłomiej Wróblewski 0fb897b839 Update imports after scheduler/framework/v1alpha1 removal 2020-11-30 10:48:52 +00:00
Marwan Ahmed a3bada3708 correctly classify error for failed scale ups 2020-09-13 21:14:27 -07:00
Maciek Pytel 9fb6cdc079 Fix go fmt errors 2020-06-08 13:52:24 +02:00
Maciek Pytel 2160e6d49e Rewrite glogx to work with klogv2 (+rename klogx) 2020-06-05 17:22:26 +02:00
Maciek Pytel 655b4081f4 Migrate to klog v2 2020-06-05 17:22:26 +02:00
Enxebre 49ce70acbd report MaxNodesTotal count during scale up
This change adds a warning event to signal when a planned scale up
operation would go over the maximum total nodes. This is being proposed
for two primary reasons: as an event that can be watched during
end-to-end testing, and as a signal to users when this condition is
occurring.
2020-05-14 10:28:18 -04:00
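A hypothetical sketch of the condition this commit reports on: if the planned increase would push the cluster past the configured total, trim it and return a warning message that a caller could attach to an event. The function name, parameters, and message text are assumptions for illustration.

```go
package main

import "fmt"

// capToMaxTotal trims a planned scale-up so the cluster stays within
// maxNodesTotal. A non-empty message means the plan had to be cut, which is
// the situation that would warrant a warning event. Names are hypothetical.
func capToMaxTotal(currentNodes, planned, maxNodesTotal int) (int, string) {
	if currentNodes+planned <= maxNodesTotal {
		return planned, ""
	}
	allowed := maxNodesTotal - currentNodes
	if allowed < 0 {
		allowed = 0
	}
	msg := fmt.Sprintf("planned scale-up of %d nodes exceeds limit of %d total nodes; capping to %d",
		planned, maxNodesTotal, allowed)
	return allowed, msg
}

func main() {
	n, msg := capToMaxTotal(8, 5, 10)
	fmt.Println(n, msg)
}
```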
Jakub Tużnik 73a5cdf928 Address recent breaking changes in scheduler
The following things changed in scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not *apiv1.Pod
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
2020-04-24 17:54:47 +02:00
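To illustrate the NodeInfo.Pods change, the sketch below uses local stand-in types (not the real scheduler packages), modelled on the described shape where Pods holds *PodInfo values that wrap the pod in a Pod field, and unwraps them back into a plain pod slice. The helper podsOf is hypothetical.

```go
package main

import "fmt"

// Pod and PodInfo are hypothetical local stand-ins for apiv1.Pod and
// schedulerframework.PodInfo; only the field needed here is modelled.
type Pod struct{ Name string }

type PodInfo struct{ Pod *Pod }

// NodeInfo mirrors the shape described in the commit: Pods is exposed as a
// field and holds *PodInfo rather than *Pod.
type NodeInfo struct{ Pods []*PodInfo }

// podsOf unwraps the PodInfo wrappers so code written against a plain pod
// slice can keep working.
func podsOf(ni *NodeInfo) []*Pod {
	pods := make([]*Pod, 0, len(ni.Pods))
	for _, pi := range ni.Pods {
		pods = append(pods, pi.Pod)
	}
	return pods
}

func main() {
	ni := &NodeInfo{Pods: []*PodInfo{{Pod: &Pod{Name: "web-1"}}, {Pod: &Pod{Name: "web-2"}}}}
	for _, p := range podsOf(ni) {
		fmt.Println(p.Name)
	}
}
```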
Julien Balestra 628128f65e cluster-autoscaler/taints: refactor current taint logic into the same package
Signed-off-by: Julien Balestra <julien.balestra@datadoghq.com>
2020-02-25 13:57:23 +01:00
Aleksandra Malinowska 5d44b202bc Forget FakeNodeInfoForNodeName ever existed 2020-02-21 15:36:21 +01:00
Łukasz Osipiuk 7b67d3f582 klog.Fatalf on error from ClusterSnapshot.Revert() 2020-02-04 20:52:07 +01:00
Łukasz Osipiuk e5c60c81a9 Remove Estimator's upcoming nodes parameter 2020-02-04 20:52:04 +01:00
Łukasz Osipiuk 9433ef9ffa Use ClusterSnapshot in ScaleUp 2020-02-04 20:51:43 +01:00
Łukasz Osipiuk 30ce46cc28 Pass ClusterSnapshot to BinpackingNodeEstimator 2020-02-04 20:51:29 +01:00
Łukasz Osipiuk 6b2287af4f Pass ClusterSnapshot explicitly to PredicateChecker 2020-02-04 20:51:24 +01:00
Łukasz Osipiuk b0c6d25182 Cleanup simulator.PredicateError 2020-02-04 20:51:11 +01:00
Łukasz Osipiuk 4a2b8c7dfc Remove use of PredicateMetadata 2020-02-04 20:51:05 +01:00
Aleksandra Malinowska c83d609352 Compute expansion options for additional created groups 2020-02-03 17:54:05 +01:00
Aleksandra Malinowska d6849e82b6 Simplify equivalence group usage 2020-01-15 19:40:45 +01:00
Łukasz Osipiuk 7f083d2393 Move core/utils.go to separate package and split into multiple files 2019-10-22 14:23:40 +02:00
Jakub Tużnik 43466ff837 Provide ScaleUpStatusProcessor with info about all rejected node groups
Previously, it had info only about the ones that actually exist.

The changes to the eventing processor are done to keep its previous
behavior the same.
2019-08-19 12:48:10 +02:00
Jakub Tużnik 935476a7e2 Provide more info to ScaleUpStatusProcessor
Add info about considered and created nodegroups to
ScaleUpStatusProcessor
2019-08-12 17:20:09 +02:00
Aleksandra Malinowska c27ae4eb24 Add resource limit type to NotTriggerScaleUp event 2019-07-03 16:38:46 +02:00
Łukasz Osipiuk a849ead286 Precompute inter pod equivalence groups in checkPodsSchedulableOnNode 2019-05-29 18:05:52 +02:00
Chris Bradfield 92ea680f1a Implement an --ignore-taint flag
This change adds support for a user to specify taints to ignore when
considering a node as a template for a node group.
2019-05-14 10:22:59 -07:00
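A minimal sketch of what ignoring taints on a template node could look like, assuming taints are matched by key only and the flag values have already been parsed into a set. The Taint type and filterIgnoredTaints function are hypothetical and simplified.

```go
package main

import "fmt"

// Taint is a hypothetical, trimmed-down version of a Kubernetes node taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// filterIgnoredTaints returns the taints that should remain on a template
// node, dropping any whose key the user asked to ignore (for example via
// repeated --ignore-taint values). Matching on the key alone is an
// assumption of this sketch.
func filterIgnoredTaints(taints []Taint, ignored map[string]bool) []Taint {
	kept := make([]Taint, 0, len(taints))
	for _, t := range taints {
		if ignored[t.Key] {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	taints := []Taint{{"dedicated", "gpu", "NoSchedule"}, {"startup-only", "true", "NoSchedule"}}
	fmt.Println(filterIgnoredTaints(taints, map[string]bool{"startup-only": true}))
}
```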
Jiaxin Shan 90666881d3 Move GPULabel and GPUTypes to cloud provider 2019-03-25 13:03:01 -07:00
Andrew McDermott 5ae76ea66e UPSTREAM: <carry>: fix max cluster size calculation on scale up
When scaling up, the calculation for computing the maximum cluster size
does not take into account any upcoming nodes, so it is
possible to grow the cluster beyond the maximum cluster
size (--max-nodes-total).

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1670695
2019-03-08 13:28:58 +00:00
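The arithmetic behind this fix can be sketched as follows, with hypothetical names: the room left under --max-nodes-total must subtract both registered and upcoming nodes, otherwise a scale-up issued while earlier nodes are still booting can overshoot the limit.

```go
package main

import "fmt"

// maxAllowedIncrease returns how many nodes a scale-up may add without the
// cluster exceeding maxNodesTotal. Upcoming nodes (requested but not yet
// registered) are counted alongside existing ones. Names are hypothetical.
func maxAllowedIncrease(existing, upcoming, maxNodesTotal int) int {
	remaining := maxNodesTotal - (existing + upcoming)
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	// 8 registered nodes, 2 still booting, limit of 10: nothing more may be added.
	fmt.Println(maxAllowedIncrease(8, 2, 10))
	// Ignoring the 2 upcoming nodes would wrongly allow 2 more.
	fmt.Println(maxAllowedIncrease(8, 0, 10))
}
```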
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
Vivek Bagade 79ef3a6940 unexporting methods in utils.go 2019-01-25 00:06:03 +05:30
Łukasz Osipiuk 85a83b62bd Pass nodeGroup->NodeInfo map to ClusterStateRegistry
Change-Id: Ie2a51694b5731b39c8a4135355a3b4c832c26801
2019-01-08 15:52:00 +01:00
Kubernetes Prow Robot 4002559a4c
Merge pull request #1516 from frobware/fix-max-nodes-total-upstream
fix calculation of max cluster size
2019-01-03 10:02:38 -08:00
Maciej Pytel 3f0da8947a Use listers in scale-up 2019-01-02 15:56:01 +01:00
Kubernetes Prow Robot ab7f1e69be
Merge pull request #1464 from losipiuk/lo/stockouts2
Better quota-exceeded/stockout handling
2018-12-31 05:28:08 -08:00
Łukasz Osipiuk 9689b30ee4 Do not use time.Now() in RegisterFailedScaleUp 2018-12-28 17:17:07 +01:00
Łukasz Osipiuk da5bef307b Allow updating Increase for ScaleUpRequest in ClusterStateRegistry 2018-12-28 17:17:07 +01:00