The binpacking simulator now considers old nodes when trying to pack pods with topology spread constraints, in order to avoid unnecessary scale-ups. The previous behavior did not account for the fact that nodes that were once unschedulable for a pod equivalence group can later become schedulable for a pod. This can happen with topology spread constraints: node scale-ups can increase the global minimum, which allows existing nodes to schedule pods because global_minimum+max_skew has increased.
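The effect can be illustrated with a minimal sketch. It assumes a simplified skew check (pods in the domain, plus the incoming pod, minus the global minimum, must not exceed max skew); `fitsSpread` and the zone names are hypothetical and are not part of the simulator's code.

```go
package main

import "fmt"

// fitsSpread reports whether one more matching pod can be placed in domain
// without exceeding maxSkew. Hypothetical helper: counts maps each topology
// domain to its number of matching pods.
func fitsSpread(counts map[string]int, domain string, maxSkew int) bool {
	globalMin, first := 0, true
	for _, c := range counts {
		if first || c < globalMin {
			globalMin, first = c, false
		}
	}
	// Skew after placement = (pods in domain + 1) - global minimum.
	return counts[domain]+1-globalMin <= maxSkew
}

func main() {
	counts := map[string]int{"zone-a": 2, "zone-b": 1}
	// An existing node in zone-a is rejected: 2+1-1 = 2 > 1.
	fmt.Println(fitsSpread(counts, "zone-a", 1))

	// A scale-up in zone-b receives a pod, raising the global minimum to 2.
	counts["zone-b"]++
	// The same existing node in zone-a now fits: 2+1-2 = 1 <= 1.
	fmt.Println(fitsSpread(counts, "zone-a", 1))
}
```

The node in zone-a did not change between the two checks; only the global minimum did, which is why previously rejected nodes need to be reconsidered.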
Signed-off-by: MenD32 <amit.mendelevitch@gmail.com>
The Snapshot can hold all DRA objects in the cluster, and expose them
to the scheduler framework via the SharedDRAManager interface.
The state of the objects can be modified during autoscaling simulations
using the provided methods.
utils/test is supposed to be usable in any CA package. Having a
dependency on cloudprovider makes it unusable in any package
that cloudprovider depends on because of import cycles.
The cloudprovider import is only needed by GetGpuConfigFromNode,
which is only used in info_test.go. This commit just moves
GetGpuConfigFromNode there as an unexported function.
The optimization uses the fact that pods which are equivalent do not
need to be checked multiple times against already filled nodes.
This changes the time complexity from O(pods*nodes) to O(pods).
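A rough sketch of the idea, assuming a single scalar capacity per node and one equivalence group; `node` and `packGroup` are hypothetical names and this is not the actual estimator code. Once a node rejects one pod of the group, it is never checked again for the remaining equivalent pods.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a simulated node with scalar capacity.
type node struct{ free int }

// packGroup places pods of one equivalence group (each needing size units)
// onto nodes. It returns how many pods were placed and how many fit checks ran.
func packGroup(nodes []node, pods, size int) (placed, checks int) {
	next := 0 // index of the first node not yet known to be full for this group
	for p := 0; p < pods; p++ {
		for next < len(nodes) {
			checks++
			if nodes[next].free >= size {
				nodes[next].free -= size
				placed++
				break
			}
			// Full for one pod of the group means full for all of them,
			// so this node is skipped for every later equivalent pod.
			next++
		}
	}
	return placed, checks
}

func main() {
	nodes := []node{{free: 2}, {free: 2}, {free: 2}}
	placed, checks := packGroup(nodes, 5, 1)
	fmt.Println(placed, checks)
}
```

In this sketch the number of fit checks is bounded by pods plus nodes, rather than by their product.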
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: test cases failing for actuator and scaledown/eligibility
- abstract default values into `config`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code
- there is no change in the flag name
- rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: reset help text for `ignore-daemonsets-utilization` flag
- because per nodegroup override is supported only for AWS ASG tags as of now
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: add info about overriding `--ignore-daemonsets-utilization` per ASG
- in AWS cloud provider README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface
- to limit the functions that can be used
- since we need it only for `GetIgnoreDaemonSetsUtilization`
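
A minimal sketch of the pattern, with simplified and hypothetical types (node groups are plain strings here, and `configProcessor`, `ignoreDsGetter` and `actuator` are illustrative names, not the real ones). The consumer declares only the one method it needs, so any wider processor satisfies it implicitly:

```go
package main

import "fmt"

// configProcessor stands in for a wide config processor (hypothetical, reduced).
type configProcessor struct {
	perGroup map[string]bool // per-node-group overrides
	global   bool            // fallback value
}

func (p configProcessor) GetIgnoreDaemonSetsUtilization(nodeGroup string) (bool, error) {
	if v, ok := p.perGroup[nodeGroup]; ok {
		return v, nil
	}
	return p.global, nil
}

// GetScaleDownUnneededSeconds is an extra method the actuator should not see.
func (p configProcessor) GetScaleDownUnneededSeconds(nodeGroup string) (int, error) {
	return 600, nil
}

// ignoreDsGetter is the limiting interface: it exposes a single method.
type ignoreDsGetter interface {
	GetIgnoreDaemonSetsUtilization(nodeGroup string) (bool, error)
}

// actuator depends only on the narrow interface.
type actuator struct{ cfg ignoreDsGetter }

func main() {
	a := actuator{cfg: configProcessor{perGroup: map[string]bool{"ng1": true}}}
	v1, _ := a.cfg.GetIgnoreDaemonSetsUtilization("ng1")
	v2, _ := a.cfg.GetIgnoreDaemonSetsUtilization("ng2")
	fmt.Println(v1, v2)
}
```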
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: tests failing for actuator
- rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter`
- move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: go lint errors for `MockNodeGroupConfigGetter`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: update node group config processor tests for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: update eligibility test cases for `IgnoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: run actuation tests for 2 NGS
- one with `IgnoreDaemonSetsUtilization`: `false`
- one with `IgnoreDaemonSetsUtilization`: `true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add tests for `IgnoreDaemonSetsUtilization` in actuator
- add helper to generate multiple ds pods dynamically
- get rid of mock config processor because it is not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: fix failing tests for actuator
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use `generateDsPods` instead of `generateDsPod`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
* Changed the `utilization.Calculate()` function to use GpuConfig
instead of GPU label.
* Started using GpuConfig in utilization threshold calculations.
treated as unready. Deprecated LongNotStarted
In the case where node n1:
1) Is created at t=0min
2) Has its Ready condition become true at t=2.5min
3) Has its not-ready taint removed at t=3min
the ready node is counted as unready
Tested cases after fix:
1) Case described above
2) Nodes not starting even after 15mins are still
treated as unready
3) Nodes created long ago that suddenly become unready are
counted as unready.