Previously, if any of the nodes failed to delete, the processor would get a
ScaleDownError status. After this commit, it will get the list of nodes that
were successfully deleted.
Fix error format strings according to best practices from CodeReviewComments
Reverted incorrect change to an error format string
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Resolve conflict
Fix error strings in test cases to remedy failing tests
Fix more error strings to remedy failing tests
Explicitly handle nil as a return value for nodeGroup in
`calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called
for GPU nodes that don't exist. The current logic panics at runtime
with:
"reflect: call of reflect.Value.IsNil on zero Value"
Looking through the rest of the tree, all the other places that use
this pattern also explicitly check whether `nodeGroup == nil` first.
This change now completes the pattern in
`calculateScaleDownGpusTotal()`.
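The completed pattern can be sketched in isolation. `NodeGroup` and the concrete type below are illustrative stand-ins for the cloud provider's interface; the point is that `reflect.ValueOf` of a nil interface yields a zero `reflect.Value`, on which `IsNil()` panics, so the `== nil` check must come first:

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeGroup is a stand-in for the cloudprovider.NodeGroup interface.
type NodeGroup interface{ Id() string }

type fakeNodeGroup struct{ name string }

func (g *fakeNodeGroup) Id() string { return g.name }

// isNilNodeGroup shows the defensive pattern: check the interface
// itself for nil first, because reflect.ValueOf(nil).IsNil() panics
// with "reflect: call of reflect.Value.IsNil on zero Value".
func isNilNodeGroup(ng NodeGroup) bool {
	return ng == nil || reflect.ValueOf(ng).IsNil()
}

func main() {
	var untyped NodeGroup                      // nil interface: zero reflect.Value
	var typed NodeGroup = (*fakeNodeGroup)(nil) // non-nil interface holding a nil pointer

	fmt.Println(isNilNodeGroup(untyped))                    // true — short-circuits before reflect
	fmt.Println(isNilNodeGroup(typed))                      // true — caught by IsNil()
	fmt.Println(isNilNodeGroup(&fakeNodeGroup{name: "ng"})) // false
}
```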
Looking at the other occurrences of this pattern we see:
```
File: clusterstate/clusterstate.go
488:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/utils.go
231:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
322:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
394:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
461:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/scale_down.go
185:6: if reflect.ValueOf(nodeGroup).IsNil() {
608:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
747:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
1010:25: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
```
with the notable exception of core/scale_down.go:185, which is in
`calculateScaleDownGpusTotal()`.
With this change, and invoking the autoscaler with:
```
...
--max-nodes-total=24 \
--cores-total=8:128 \
--memory-total=4:256 \
--gpu-total=nvidia.com/gpu:0:16 \
--gpu-total=amd.com/gpu:0:4 \
...
```
I no longer see the panic.
Adds the flags --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(both defaulting to false); when enabled, they factor DaemonSet and mirror pods
out of the resource utilization calculation for a node.
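A minimal sketch of the idea behind these flags, using simplified stand-in types (these are not the autoscaler's real types, field names, or calculation):

```go
package main

import "fmt"

// Pod is a simplified stand-in for a scheduled pod.
type Pod struct {
	Name        string
	CPURequest  int64 // millicores
	IsDaemonSet bool
	IsMirror    bool
}

// nodeCPUUtilization excludes DaemonSet / mirror pod requests from the
// numerator when the corresponding ignore flag is set, so such pods no
// longer make a node look busier than it is for scale-down purposes.
func nodeCPUUtilization(pods []Pod, allocatable int64, ignoreDaemonSets, ignoreMirror bool) float64 {
	var requested int64
	for _, p := range pods {
		if ignoreDaemonSets && p.IsDaemonSet {
			continue
		}
		if ignoreMirror && p.IsMirror {
			continue
		}
		requested += p.CPURequest
	}
	return float64(requested) / float64(allocatable)
}

func main() {
	pods := []Pod{
		{Name: "app", CPURequest: 500},
		{Name: "fluentd", CPURequest: 200, IsDaemonSet: true},
		{Name: "kube-proxy", CPURequest: 100, IsMirror: true},
	}
	fmt.Println(nodeCPUUtilization(pods, 1000, false, false)) // 0.8
	fmt.Println(nodeCPUUtilization(pods, 1000, true, true))   // 0.5
}
```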
* Add GPU-related scaled_up & scaled_down metrics
* Fix name to match SD naming convention
* Fix import after master rebase
* Change the logic to include GPU-being-installed nodes
AutoscalingContext is basically configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state, and is thus moved to the other state-keeping
objects.
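The split can be sketched as follows, with illustrative field names (not the real types): configuration and handles stay in the context, while anything mutated per loop lives in a state-keeping object.

```go
package main

import "fmt"

// AutoscalingContext holds configuration and static handles: values set
// once at startup and read everywhere.
type AutoscalingContext struct {
	MaxNodesTotal int    // static configuration
	CloudProvider string // stand-in for an API handle
}

// ClusterStateRegistry holds mutable cluster state, updated every loop,
// and so is kept separate from the context.
type ClusterStateRegistry struct {
	unreadyNodes map[string]bool
}

func (r *ClusterStateRegistry) MarkUnready(node string) {
	r.unreadyNodes[node] = true
}

func main() {
	ctx := AutoscalingContext{MaxNodesTotal: 24, CloudProvider: "gce"}
	state := &ClusterStateRegistry{unreadyNodes: map[string]bool{}}
	state.MarkUnready("node-1")
	fmt.Println(ctx.MaxNodesTotal, len(state.unreadyNodes)) // 24 1
}
```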