55 Commits

Author SHA1 Message Date
MenD32 81a348d0e3 fix: binpacking simulator scale up optimization on pods with topology spread constraint
The binpacking simulator now considers old nodes when trying to pack pods with topology spread constraints, in order to avoid unnecessary scale-ups. The previous behavior did not account for nodes that were once unschedulable within the pod equivalence group becoming schedulable for a later pod. This can happen with topology spread constraints, since node scale-ups can increase the global minimum, allowing existing nodes to schedule pods due to the increase in global_minimum+max_skew.

Signed-off-by: MenD32 <amit.mendelevitch@gmail.com>
2025-05-28 23:43:43 +03:00
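
The effect described in this commit message can be illustrated with a minimal Go sketch. It is not the autoscaler's code: `fitsSkew`, the zone names, and the pod counts are hypothetical, and the check only mirrors the usual topology spread rule that the matching pods in the target domain (including the new pod) minus the global minimum must not exceed the max skew.

```go
package main

import "fmt"

// fitsSkew reports whether adding one matching pod to the given domain keeps
// the skew (pods in that domain minus the global minimum) within maxSkew.
// Hypothetical helper for illustration only.
func fitsSkew(counts map[string]int, domain string, maxSkew int) bool {
	first := true
	globalMin := 0
	for _, c := range counts {
		if first || c < globalMin {
			globalMin = c
			first = false
		}
	}
	return counts[domain]+1-globalMin <= maxSkew
}

func main() {
	// An existing node in zone-a has spare capacity; the node in zone-b is full.
	counts := map[string]int{"zone-a": 2, "zone-b": 1}

	// Global minimum is 1, so a third pod in zone-a would give a skew of 2.
	fmt.Println(fitsSkew(counts, "zone-a", 1)) // false

	// A simulated scale-up adds a node in zone-b and a pending pod lands there.
	counts["zone-b"]++

	// Global minimum is now 2, so the old zone-a node becomes schedulable again.
	fmt.Println(fitsSkew(counts, "zone-a", 1)) // true
}
```

A simulator that permanently skips the zone-a node after the first rejection would add another new node for the next pod, even though the existing node can now take it.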
Norbert Cyran 6ab7e2eb78 Prevent nil dereference of preFilterStatus 2025-05-07 10:38:20 +02:00
Norbert Cyran 9a5e3d9f3d Allow using scheduled pods as samples in proactive scale up 2025-03-19 12:33:39 +01:00
Kuba Tużnik 377639a8dc CA: implement dynamicresources.Snapshot for storing and modifying the state of DRA objects
The Snapshot can hold all DRA objects in the cluster, and expose them
to the scheduler framework via the SharedDRAManager interface.

The state of the objects can be modified during autoscaling simulations
using the provided methods.
2024-12-20 13:30:10 +01:00
Kuba Tużnik 66d0aeb3cb CA: implement utils for interacting with ResourceClaims
These utils will be used by various parts of the DRA logic in the
following commits.
2024-12-19 15:55:49 +01:00
Kuba Tużnik 358f8c0d21 DRA: remove utils/test dependency on cloudprovider
utils/test is supposed to be usable in any CA package. Having a
dependency on cloudprovider makes it unusable in any package
that cloudprovider depends on because of import cycles.

The cloudprovider import is only needed by GetGpuConfigFromNode,
which is only used in info_test.go. This commit just moves
GetGpuConfigFromNode there as an unexported function.
2024-11-05 16:42:26 +01:00
Zihan Jiang e6ab3438fa refine log and test case 2024-04-26 20:10:48 -07:00
Zihan Jiang 15ca53f4a6 add unit test 2024-04-25 14:26:59 -07:00
Daniel Gutowski 5aa6b2cb07 Introduce binpacking optimization for similar pods.
The optimization uses the fact that pods which are equivalent do not
need to be checked multiple times against already filled nodes.
This changes the time complexity from O(pods*nodes) to O(pods).
2024-04-04 10:15:47 -07:00
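
The idea in this commit message can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: `pod`, `pack`, the single `cpu` resource, and the `firstCandidate` map are all assumptions made for the example, and every pod is assumed to fit on an empty node. Because free capacity only shrinks during packing, a node that rejected one pod of a group will also reject every later equivalent pod, so the scan can resume from the last node that accepted the group.

```go
package main

import "fmt"

// pod is a hypothetical, simplified pod: equivalent pods share a group key
// and have identical resource requests.
type pod struct {
	group string
	cpu   int
}

// pack places pods on nodes of capacity nodeCap, adding nodes as needed.
// It returns the remaining free CPU per node and the number of fit checks.
func pack(pods []pod, nodeCap int) (nodes []int, checks int) {
	// firstCandidate[group] is the first node index that may still fit the group.
	firstCandidate := map[string]int{}
	for _, p := range pods {
		placed := false
		for i := firstCandidate[p.group]; i < len(nodes); i++ {
			checks++
			if nodes[i] >= p.cpu {
				nodes[i] -= p.cpu
				firstCandidate[p.group] = i // earlier nodes rejected this group
				placed = true
				break
			}
		}
		if !placed {
			// No existing node fits: simulate a scale-up with one new node.
			nodes = append(nodes, nodeCap-p.cpu)
			firstCandidate[p.group] = len(nodes) - 1
		}
	}
	return nodes, checks
}

func main() {
	pods := []pod{{"a", 2}, {"a", 2}, {"a", 2}, {"a", 2}}
	nodes, checks := pack(pods, 4)
	fmt.Println(len(nodes), checks)
}
```

The shortcut relies on a rejected node staying rejected for the whole group, which holds for plain resource fit but not necessarily for constraints whose outcome depends on the state of other nodes.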
Walid Ghallab aada657452 Add BuildTestNodeWithAllocatable test utility method. 2024-02-27 19:26:48 +00:00
Kubernetes Prow Robot fc48d5c052 Merge pull request #6139 from damikag/priority-evictor
Implement priority based evictor
2023-12-21 18:18:53 +01:00
damikag 9ffbea4408 implement priority based evictor and refactor drain logic 2023-12-21 16:57:05 +00:00
Daniel Kłobuszewski e3d3303f89 [GCE] Support paginated instance listing 2023-12-15 09:16:09 +01:00
Mahmoud Atwa 4635a6dc04 Allow users to specify which schedulers to ignore 2023-11-22 11:18:44 +00:00
Piotr ffe6537163 Unifies pod listing. 2023-09-11 17:11:11 +00:00
vadasambar 7941bab214 feat: set `IgnoreDaemonSetsUtilization` per nodegroup
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-07-06 10:31:45 +05:30
Hakan Bostan 2b602fca9f Use GpuConfig in utilization calculations for scale-down
* Changed the `utilization.Calculate()` function to use GpuConfig
  instead of GPU label.
* Started using GpuConfig in utilization threshold calculations.
2023-02-15 08:28:24 +00:00
Bartłomiej Wróblewski c3d8e81b98 Don't add pods from drained nodes in scale-down 2022-12-09 16:26:54 +00:00
Aleksandra Gacek ab2cc2fb8a Bump k/k dependencies to v1.25.0 together with go.mod go version. 2022-08-26 13:38:07 +02:00
Yaroslava Serdiuk 1cbcfbcbe7 Add ephemeral storage price to PodPrice 2022-06-03 16:12:17 +00:00
Yaroslava Serdiuk 581f1d7bc6 Add ephemeral storage price to NodePrice 2022-06-03 16:12:17 +00:00
Marwan Ahmed 286f44e351 fix pod equivalency checks for pods with projected volumes 2021-12-21 17:02:30 +02:00
Brett Elliott 3b48a3193f Set cluster autoscaler-specific user agent.
Refactored mocks to remove redundancy.
2021-04-06 17:49:35 +02:00
Vivek Bagade 8c592f0c04 Fix bug where a node that becomes ready after 2 mins can be
treated as unready. Deprecated LongNotStarted

 In cases where node n1 would:
 1) Be created at t=0min
 2) Ready condition is true at t=2.5min
 3) Not ready taint is removed at t=3min
 the ready node is counted as unready

 Tested cases after fix:
 1) Case described above
 2) Nodes not starting even after 15mins still
 treated as unready
 3) Nodes created long ago that suddenly become unready are
 counted as unready.
2021-03-11 18:32:51 +01:00
Jason DeTiberus 0d8a4f9f97 Check content-type on response 2020-11-09 11:57:39 -05:00
Jayant Jain d987394c55 review fixes.
added explicit ignoring for write func
2020-10-01 14:35:43 +00:00
Jayant Jain ccd405e80a lint fix 2020-10-01 12:29:16 +00:00
Jayant Jain a632d33b9a added a new NodeGroupDoesNotExistError in errors.go
This is to support the condition where no nodepool exists, which crashes the autoscaler
2020-10-01 12:21:08 +00:00
Łukasz Osipiuk d7770e3044 Use ClusterSnapshot in ScaleDown 2020-02-04 20:51:48 +01:00
Łukasz Osipiuk 1469058470 Get rid of removed testapis 2020-02-04 20:51:03 +01:00
Łukasz Osipiuk 90a7e47123 Add GPU taint toleration for test pods requiring GPUs 2020-01-03 11:22:21 +01:00
Łukasz Osipiuk 1b6c75f4f9 Set StartTime in Pod objects used in tests 2019-11-22 14:08:09 +01:00
Łukasz Osipiuk a849ead286 Precompute inter pod equivalence groups in checkPodsSchedulableOnNode 2019-05-29 18:05:52 +02:00
Jiaxin Shan 83ae66cebc Consider GPU utilization in scaling down 2019-04-04 01:12:51 -07:00
Aleksandra Malinowska 364e2da764 Check for ready condition not true 2018-08-30 13:43:24 +02:00
Łukasz Osipiuk c406da4174 Support gpus in nodes and pods definitions in UT 2018-05-15 22:43:31 +02:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Krzysztof Jastrzebski 6b8b8b8fe1 Cloudprovider/gce/gce_manager.go unit tests. 2017-09-19 11:16:08 +02:00
Aleksandra Malinowska ac0d8388bc use OwnerReferences instead of deprecated created by annotation 2017-08-29 17:26:38 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Maciej Pytel 849b3a2712 Function to compare nodeinfos to find similar nodegroups 2017-05-31 13:21:27 +02:00
Marcin Wielgus 80bf191f02 GCE pricing model 2017-05-26 17:37:32 +02:00
Maciej Pytel 6b2ea76973 Added UT for CA simulator 2017-04-19 19:12:30 +02:00
Marcin Wielgus 2ffaddb7c0 Cluster-autoscaler: lint 2017-03-02 15:15:07 +01:00
Marcin Wielgus 72a47dc2b2 Cluster-autoscaler: update code for 1.6 k8s sync 2017-03-02 14:34:49 +01:00
Marcin Wielgus ce45c33d29 Cluster-autoscaler: update CA code for godep refresh 2017-01-20 14:46:34 +01:00
Marcin Wielgus 949cf37465 Cluster-autoscaler: support unready nodes in scale down 2017-01-03 14:17:59 +01:00