Explicitly handle nil as a return value for nodeGroup in
`calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called
for GPU nodes that don't exist. The current logic calls
`reflect.ValueOf(nodeGroup).IsNil()` without that check and panics at
runtime:
"reflect: call of reflect.Value.IsNil on zero Value"
Everywhere else in the tree, code that uses this pattern explicitly
checks whether `nodeGroup == nil` first.
This change now completes the pattern in
`calculateScaleDownGpusTotal()`.
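A minimal sketch of why the order of the two checks matters. The `NodeGroup` interface, `fakeNodeGroup`, and both helper functions are hypothetical stand-ins for illustration only, not the autoscaler's actual types:

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeGroup is a hypothetical stand-in for the cloud provider's node group interface.
type NodeGroup interface {
	Id() string
}

// fakeNodeGroup is a hypothetical implementation used only for this illustration.
type fakeNodeGroup struct{ id string }

func (g *fakeNodeGroup) Id() string { return g.id }

// isNilNodeGroup reports whether ng is an untyped nil interface or an
// interface wrapping a nil pointer. The `ng == nil` check must come first:
// reflect.ValueOf(nil) yields the zero Value, and IsNil panics on it.
func isNilNodeGroup(ng NodeGroup) bool {
	return ng == nil || reflect.ValueOf(ng).IsNil()
}

// unguardedIsNil mirrors the check without the `== nil` guard and
// returns the message of any panic it triggers.
func unguardedIsNil(ng NodeGroup) (isNil bool, panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	return reflect.ValueOf(ng).IsNil(), ""
}

func main() {
	var missing NodeGroup       // untyped nil, as when no node group is found
	var typedNil *fakeNodeGroup // nil pointer that becomes a non-nil interface

	fmt.Println(isNilNodeGroup(missing))
	fmt.Println(isNilNodeGroup(typedNil))
	fmt.Println(isNilNodeGroup(&fakeNodeGroup{id: "gpu-pool"}))

	// Without the guard, the untyped nil case panics.
	_, msg := unguardedIsNil(missing)
	fmt.Println(msg)
}
```

The reflect check is still needed after the guard because a nil pointer stored in an interface does not compare equal to `nil`.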
Looking at the other occurrences of this pattern we see:
```
File: clusterstate/clusterstate.go
488:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/utils.go
231:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
322:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
394:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
461:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/scale_down.go
185:6: if reflect.ValueOf(nodeGroup).IsNil() {
608:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
747:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
1010:25: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
```
with the notable exception of core/scale_down.go:185, which is in
`calculateScaleDownGpusTotal()`.
With this change, and invoking the autoscaler with:
```
...
--max-nodes-total=24 \
--cores-total=8:128 \
--memory-total=4:256 \
--gpu-total=nvidia.com/gpu:0:16 \
--gpu-total=amd.com/gpu:0:4 \
...
```
I no longer see the runtime panic.
Adds the flags --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(both default to false). When enabled, they factor DaemonSet pods and mirror pods,
respectively, out of the calculation of a node's resource utilization.
* Add GPU-related scaled_up & scaled_down metrics
* Fix name to match SD naming convention
* Fix import after master rebase
* Change the logic to include GPU-being-installed nodes
AutoscalingContext is basically configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state and is therefore moved to the other
state-keeping objects.