- this is so that scale-down is not blocked on local storage volumes
- for pods where it is okay to ignore local storage volumes
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: tests failing
- there was a problem in the logic
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add unit test for `IgnoreLocalStorageVolumeKey`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: use `IgnoreLocalStorageVolumeKey` in tests instead of hardcoding the annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: wording for test name
- `pod with EmptyDir but IgnoreLocalStorageVolumeKey annotation` -> `pod with EmptyDir and IgnoreLocalStorageVolumeKey annotation`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: simulator drain tests failing
- set local storage vol name (required)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: add support for multiple vals in `safe-to-evict-local-volume` annotation
- add more unit tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename ignore local vol key `safe-to-evict-local-volume` -> `safe-to-evict-local-volumes`
- abstract the code that processes the annotation into a separate fn
- shorten names for test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
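As a concrete illustration of the multi-value support added here, a minimal sketch of comma-separated annotation parsing; the helper name and exact semantics are assumptions, not the autoscaler's actual implementation (the full key is assumed to live under the `cluster-autoscaler.kubernetes.io` prefix):

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed full annotation key for the `safe-to-evict-local-volumes` name above.
const safeToEvictLocalVolumesKey = "cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes"

// ignorableLocalVolumes parses the annotation's comma-separated value into a
// set of local volume names that scale-down may ignore.
func ignorableLocalVolumes(annotations map[string]string) map[string]bool {
	ignored := map[string]bool{}
	for _, name := range strings.Split(annotations[safeToEvictLocalVolumesKey], ",") {
		if name = strings.TrimSpace(name); name != "" {
			ignored[name] = true
		}
	}
	return ignored
}

func main() {
	fmt.Println(ignorableLocalVolumes(map[string]string{
		safeToEvictLocalVolumesKey: "scratch, cache",
	})) // map[cache:true scratch:true]
}
```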
docs: update FAQ with info about `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: add the FAQ for `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: fix formatting for `safe-to-evict-local-volumes` in FAQ
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: format the `safe-to-evict-local-volumes` as a bullet
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: fix `Unless` -> `unless` to make it consistent with other lines
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test: add an extra test for mismatching local vol value in annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: make the wording clearer
- for `safe-to-evict-local-volumes` annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 144a64a402)
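To make the documented behavior concrete, a hedged example of the kind of pod the FAQ entry is about: its EmptyDir volume is listed in the annotation, so scale-down is not blocked by that local storage volume (check the FAQ for the exact annotation key in your release):

```go
// assumes apiv1 "k8s.io/api/core/v1" and metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
pod := &apiv1.Pod{
	ObjectMeta: metav1.ObjectMeta{
		Name: "example",
		Annotations: map[string]string{
			"cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes": "scratch",
		},
	},
	Spec: apiv1.PodSpec{
		Volumes: []apiv1.Volume{{
			Name:         "scratch", // must match a name listed in the annotation
			VolumeSource: apiv1.VolumeSource{EmptyDir: &apiv1.EmptyDirVolumeSource{}},
		}},
	},
}
```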
fix: set `replicated` to true if the controller ref's `Controller` field is `true`
- forgot to add this in the last commit
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit f8f458295d)
fix: remove `checkReferences`
- not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
(cherry picked from commit 5df6e31f8b)
test(drain): add test for custom controller pod
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
feat: add flag to allow scale down on custom controller pods
- set to `false` by default
- the default will be flipped to `true` in the future
- right now, we want to ensure backwards compatibility and make the feature available only if the flag is explicitly set to `true`
- TODO: this code might need unit tests; look into adding them
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
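For context, a hedged sketch of what a "custom controller pod" means here: the pod's controller OwnerReference points at a kind the autoscaler has no built-in support for (all names below are made up):

```go
// assumes apiv1 "k8s.io/api/core/v1" and metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
func customControllerPod() *apiv1.Pod {
	isController := true
	return &apiv1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name: "example",
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "example.com/v1", // hypothetical CRD group/version
				Kind:       "CustomWorkload", // not a built-in kind like ReplicaSet or StatefulSet
				Name:       "my-app",
				Controller: &isController, // marks the pod as controller-owned
			}},
		},
	}
}
```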
fix: remove `at` symbol in prefix of `vadasambar`
- to keep it consistent with previous such mentions in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test(utils): run all drain tests twice
- once for `allowScaleDownOnCustomControllerOwnedPods=false`
- and once for `allowScaleDownOnCustomControllerOwnedPods=true`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs(utils): add description for `testOpts` struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
test(utils/drain): fix failing tests
- refactor code to add custom controller pod test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: fix long code comments
- clean up print statements
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: move `expectFatal` right above where it is used
- makes the code easier to read
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: fix code comment wording
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: address PR comments
- abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future
- fix param info in the FAQ.md
- simplify tests and remove the global variable used in the tests
- rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods`
- refactor tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
docs: update flag info
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
fix: forgot to change flag name on a line in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
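A minimal sketch, assuming Go's standard `flag` package, of how such a boolean flag could be wired; the autoscaler has its own flag plumbing, and the default shown is only what the surrounding commits suggest (skipping preserves the legacy behavior):

```go
// assumes the standard library "flag" package
var skipNodesWithCustomControllerPods = flag.Bool(
	"skip-nodes-with-custom-controller-pods",
	true, // assumed default: keep skipping such nodes for backwards compatibility
	"If true, scale-down skips nodes running pods owned by custom controllers",
)
```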
refactor: use `ControllerRef()` directly instead of `controllerRef`
- we don't need an extra variable
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: create consolidated test cases
- by looping over and tweaking shared test cases
- so that we don't have to duplicate the shared test cases
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
refactor: append test flag to shared test description
- so that the failed test is easy to identify
- shallow copy tests and add comments so that others do the same
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
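Putting the last few test refactors together, a sketch of the resulting test shape; `drainTestCase` and the commented-out `runDrainTest` are illustrative names, not the real helpers:

```go
package drain

import (
	"fmt"
	"testing"
)

// drainTestCase is a stand-in for the shared test case struct.
type drainTestCase struct {
	description                       string
	skipNodesWithCustomControllerPods bool
}

func TestDrainSharedCases(t *testing.T) {
	sharedTests := []drainTestCase{
		{description: "pod owned by a custom controller"},
	}
	for _, skip := range []bool{false, true} {
		for _, shared := range sharedTests {
			tc := shared // shallow copy so per-flag tweaks don't leak across runs
			tc.skipNodesWithCustomControllerPods = skip
			// Append the flag value so a failing case is easy to identify.
			tc.description = fmt.Sprintf("%s (skipNodesWithCustomControllerPods=%v)",
				tc.description, skip)
			t.Run(tc.description, func(t *testing.T) {
				// runDrainTest(t, tc) // real assertions would go here
			})
		}
	}
}
```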
* Added the new `resource_name` field to `scaled_up/down_gpu_nodes_total`,
representing the resource name of the GPU.
* Changed metrics registrations to use GpuConfig
* Changed the `utilization.Calculate()` function to use GpuConfig
instead of the GPU label.
* Started using GpuConfig in utilization threshold calculations.
* Added GetNodeGpuConfig to cloud provider which returns a GpuConfig
struct containing the gpu label, type and resource name if the node
has a GPU.
* Added an initial implementation of GetNodeGpuConfig to all cloud
providers.
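A rough shape of the GpuConfig these bullets describe; the field set follows the bullets (label, type, resource name), but the exact field names and types are assumptions:

```go
// assumes apiv1 "k8s.io/api/core/v1"

// GpuConfig describes a node's GPU for metrics and utilization accounting.
type GpuConfig struct {
	Label        string             // node label that identifies the GPU
	Type         string             // GPU type (typically the label's value)
	ResourceName apiv1.ResourceName // extended resource name, e.g. "nvidia.com/gpu"
}

// GetNodeGpuConfig, as described above, would return nil for nodes without a GPU:
// func (p *provider) GetNodeGpuConfig(node *apiv1.Node) *GpuConfig
```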
Node state is refreshed and checked again before deleting the node.
This gives kube-scheduler time to acknowledge that the nodes' state has
changed and to stop scheduling pods on them.
This change brings in a new command line flag,
`--record-duplicated-events`, which allows a user to enable the
duplication of events, bypassing the 5-minute de-duplication window.
This change simplifies debugging GPU issues: without it, all nodes can
be Ready as far as the Kubernetes API is concerned, but CA will still report
some of them as unready if they are missing GPU resources. Explicitly calling
them out in the status ConfigMap points in the right direction.
Both upscale's `getUpcomingNodeInfos` and the binpacking estimator now use
the same shared DeepCopyTemplateNode function and inherit its naming
pattern, which is great, as that fixes a long-standing bug.
Due to that, `getUpcomingNodeInfos` will enrich the cluster snapshots with
generated nodeinfos and nodes having predictable names (using template name
+ an incremental ordinal starting at 0) for upcoming nodes.
Later, when it looks for nodes that fit unschedulable pods (when upcoming
nodes don't satisfy those, e.g. FitsAnyNodeMatching failing due to node
capacity or pod anti-affinity, ...), the binpacking estimator will also build
virtual nodes and place them in a snapshot fork to evaluate scheduler predicates.
Those temporary virtual nodes are built using the same pattern (template name
and an index ordinal also starting at 0) as the one previously used by
`getUpcomingNodeInfos`, which means it will generate the same nodeinfos/nodes
names for nodegroups having upcoming nodes.
But adding nodes with the same name to an existing cluster snapshot isn't
allowed, and the evaluation attempt will fail.
Practically this blocks re-upscales for nodegroups having upcoming nodes,
which can cause a significant delay.
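A self-contained toy sketch of the collision mechanics described above: both code paths derive names from the template plus an ordinal starting at 0, and the snapshot rejects a second node with an existing name (the map-based snapshot is a stand-in for the real one):

```go
package main

import "fmt"

// toySnapshot stands in for the cluster snapshot: node names must be unique.
type toySnapshot map[string]bool

func (s toySnapshot) AddNode(name string) error {
	if s[name] {
		return fmt.Errorf("node %q already in snapshot", name)
	}
	s[name] = true
	return nil
}

func main() {
	snapshot := toySnapshot{}
	// getUpcomingNodeInfos adds upcoming nodes: template name + ordinal from 0.
	_ = snapshot.AddNode("template-ng1-0")
	// The binpacking estimator later builds virtual nodes with the same
	// pattern, so the very first one collides and the evaluation fails.
	if err := snapshot.AddNode("template-ng1-0"); err != nil {
		fmt.Println(err) // node "template-ng1-0" already in snapshot
	}
}
```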
treated as unready. Deprecated LongNotStarted
In cases where node n1 would:
1) be created at t=0min
2) have its Ready condition turn true at t=2.5min
3) have its not-ready taint removed at t=3min
the ready node was counted as unready.
Tested cases after fix:
1) Case described above
2) Nodes not starting even after 15 mins are still
treated as unready
3) Nodes created long ago that suddenly become unready are
counted as unready.
The function copying the template node for upcoming nodes was not
changing the hostname label, meaning that features relying on this
label (e.g. pod anti-affinity on hostname topology) would treat all
upcoming nodes as a single node.
This resulted in triggering too many scale-ups for pods using such
features. The analogous function in binpacking didn't have the same
bug (but it didn't set a unique UID or pod names).
I extracted the functionality to a util function used in both
places to avoid the two functions getting out of sync again.
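A hedged sketch of what the extracted util function needs to do, per the description above: give each copy a unique name, UID, and hostname label so scheduler features see distinct nodes (the function name and naming pattern are illustrative):

```go
// assumes apiv1 "k8s.io/api/core/v1" and "k8s.io/apimachinery/pkg/types"
func copyTemplateNode(template *apiv1.Node, ordinal int) *apiv1.Node {
	node := template.DeepCopy()
	node.Name = fmt.Sprintf("%s-%d", template.Name, ordinal)
	node.UID = types.UID(node.Name) // unique UID per copy
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	// Without this, all copies share one hostname and pod anti-affinity on
	// hostname topology would treat them as a single node.
	node.Labels["kubernetes.io/hostname"] = node.Name
	return node
}
```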