Previously, if any of the nodes failed to delete, the processor got a
ScaleDownError status. After this commit, it will get the list of nodes
that were successfully deleted.
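The shape of that change can be sketched roughly as follows; the type and function names below are illustrative stand-ins, not the processor's actual API:
```go
// Minimal sketch, assuming a hypothetical result type: the caller learns
// which nodes were removed even when a later deletion fails.
package scaledown

import "fmt"

type result struct {
	deletedNodes []string // nodes removed before any failure
	err          error    // nil on full success
}

func deleteNodes(nodes []string, deleteFn func(string) error) result {
	res := result{}
	for _, node := range nodes {
		if err := deleteFn(node); err != nil {
			// Previously only an error status surfaced; now the partial
			// progress is reported alongside it.
			res.err = fmt.Errorf("failed to delete node %s: %v", node, err)
			return res
		}
		res.deletedNodes = append(res.deletedNodes, node)
	}
	return res
}
```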
Fix error format strings according to best practices from CodeReviewComments
Reverted incorrect change to error format string
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Resolve conflict
Fix error strings in test cases to remedy failing tests
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Fix more error strings to remedy failing tests
Signed-off-by: CodeLingo Bot <bot@codelingo.io>
Explicitly handle nil as a return value for nodeGroup in
`calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called
for GPU nodes that don't exist. The current logic triggers a runtime panic:
"reflect: call of reflect.Value.IsNil on zero Value"
Looking through the rest of the tree, every other place that uses this
pattern also explicitly checks whether `nodeGroup == nil` first.
This change now completes the pattern in
`calculateScaleDownGpusTotal()`.
Looking at the other occurrences of this pattern we see:
```
File: clusterstate/clusterstate.go
488:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/utils.go
231:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
322:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
394:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
461:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
File: core/scale_down.go
185:6: if reflect.ValueOf(nodeGroup).IsNil() {
608:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
747:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
1010:25: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
```
with the notable exception of core/scale_down.go:185, which is in
`calculateScaleDownGpusTotal()`.
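For reference, a small standalone program reproduces why both halves of the guard are needed; `NodeGroup` below is just a stand-in for the cloud provider interface:
```go
// reflect.ValueOf on a nil interface yields the zero reflect.Value, so
// calling IsNil on it panics; the nodeGroup == nil check must come first.
package main

import (
	"fmt"
	"reflect"
)

// NodeGroup stands in for the cloudprovider.NodeGroup interface.
type NodeGroup interface{ Id() string }

func main() {
	var nodeGroup NodeGroup // nil interface, as returned for unknown nodes

	// Without the nil check, reflect.ValueOf(nodeGroup).IsNil() panics with
	// "reflect: call of reflect.Value.IsNil on zero Value".
	// The IsNil half still catches interfaces wrapping a typed nil pointer.
	if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
		fmt.Println("node group is nil; skipping node")
		return
	}
	fmt.Println("node group:", nodeGroup.Id())
}
```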
With this change, and invoking the autoscaler with:
```
...
--max-nodes-total=24 \
--cores-total=8:128 \
--memory-total=4:256 \
--gpu-total=nvidia.com/gpu:0:16 \
--gpu-total=amd.com/gpu:0:4 \
...
```
I no longer see the runtime panic.
Adds the flags --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(both defaulting to false); when enabled, they factor DaemonSet and mirror pods out
of the resource utilization calculation for a node.
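A rough sketch of the idea (the helper names here are illustrative, not the actual simulator code): mirror pods are recognised by the `kubernetes.io/config.mirror` annotation and DaemonSet pods by their owner reference, and both are skipped when summing requests if the corresponding flag is set.
```go
// Sketch only: filter DaemonSet and mirror pods out of the request sum
// used for node utilization, gated by the two new flags.
package utilization

import (
	apiv1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

const mirrorPodAnnotation = "kubernetes.io/config.mirror"

func isMirrorPod(pod *apiv1.Pod) bool {
	_, ok := pod.Annotations[mirrorPodAnnotation]
	return ok
}

func isDaemonSetPod(pod *apiv1.Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "DaemonSet" {
			return true
		}
	}
	return false
}

// requestedCPU sums CPU requests of the pods that count towards utilization.
func requestedCPU(pods []*apiv1.Pod, ignoreDaemonSets, ignoreMirrorPods bool) *resource.Quantity {
	total := resource.NewMilliQuantity(0, resource.DecimalSI)
	for _, pod := range pods {
		if ignoreDaemonSets && isDaemonSetPod(pod) {
			continue
		}
		if ignoreMirrorPods && isMirrorPod(pod) {
			continue
		}
		for _, c := range pod.Spec.Containers {
			if req, ok := c.Resources.Requests[apiv1.ResourceCPU]; ok {
				total.Add(req)
			}
		}
	}
	return total
}
```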
* Add GPU-related scaled_up & scaled_down metrics
* Fix name to match SD naming convention
* Fix import after master rebase
* Change the logic to include GPU-being-installed nodes
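A hedged sketch of what GPU-labelled counters could look like with client_golang; the metric and label names below are assumptions, not necessarily the ones this change adds:
```go
// Illustrative only: GPU-labelled scale-up/scale-down counters.
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	gpuScaledUpNodes = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "cluster_autoscaler",
			Name:      "scaled_up_gpu_nodes_total",
			Help:      "Number of GPU nodes added by CA, by GPU resource name.",
		}, []string{"gpu_resource_name"},
	)
	gpuScaledDownNodes = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Namespace: "cluster_autoscaler",
			Name:      "scaled_down_gpu_nodes_total",
			Help:      "Number of GPU nodes removed by CA, by GPU resource name.",
		}, []string{"gpu_resource_name"},
	)
)

func init() {
	prometheus.MustRegister(gpuScaledUpNodes, gpuScaledDownNodes)
}

// RegisterGpuScaleUp records that count GPU nodes of the given type were added.
func RegisterGpuScaleUp(gpuResource string, count int) {
	gpuScaledUpNodes.WithLabelValues(gpuResource).Add(float64(count))
}
```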
AutoscalingContext is basically configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state and is thus moved to the other state-keeping
objects.
Currently we track whether those nodes can be removed and only
skip them at the execution step. Since checking whether a node is
unneeded is pretty expensive, it's better to filter them out early.
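The gist of the change, as a small sketch with illustrative names: drop nodes that can never be scaled down before running the expensive unneeded-node check, rather than skipping them only at deletion time.
```go
// Sketch: pre-filter nodes that cannot be removed so the costly
// "is this node unneeded?" simulation never runs for them.
package core

type node struct {
	name        string
	noScaleDown bool // e.g. marked as not removable
}

func candidateNodes(all []node) []node {
	out := make([]node, 0, len(all))
	for _, n := range all {
		if n.noScaleDown {
			continue // previously evaluated anyway and skipped only at execution
		}
		out = append(out, n)
	}
	return out
}
```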
Automatic merge from submit-queue
Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2
In part 1 of 2 we skip nodes that have a pod with 0 PodDisruptionsAllowed. Part 2/2 will delete pods using the eviction API.
cc: @jszczepkowski @MaciekPytel @davidopp @fgrzadkowski
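Roughly, the part-1 check looks like the sketch below; it uses current policy/v1 types for illustration (the original change targeted the API version of its time), and the helper name is hypothetical.
```go
// Sketch: a node is skipped if any pod on it is covered by a PDB that
// currently allows zero disruptions.
package drain

import (
	apiv1 "k8s.io/api/core/v1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// blockedByPdb reports whether any pod on the node matches a PDB with
// zero disruptions allowed, making the node unsafe to drain.
func blockedByPdb(podsOnNode []*apiv1.Pod, pdbs []*policyv1.PodDisruptionBudget) (bool, error) {
	for _, pdb := range pdbs {
		if pdb.Status.DisruptionsAllowed > 0 {
			continue
		}
		selector, err := metav1.LabelSelectorAsSelector(pdb.Spec.Selector)
		if err != nil {
			return false, err
		}
		for _, pod := range podsOnNode {
			if pod.Namespace == pdb.Namespace && selector.Matches(labels.Set(pod.Labels)) {
				return true, nil
			}
		}
	}
	return false, nil
}
```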
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.
The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.
Example usage:
```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler
```
The configmap would look like:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
```
Other notes:
* Make the namespace default to "kube-system"
according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267
* Trigger a full-recreate on a configuration change
according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410
* Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of the autoscaler there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663)
* Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler`
* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.
* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package accordingly, because they seem intended to live in the same package as the core of CA (which is now implemented as `Autoscaler`)
* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
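A condensed sketch of the `DynamicAutoscaler` wrapper described above; the interfaces and helpers are simplified stand-ins for the real ones:
```go
// Sketch: poll the ConfigMap every scan-interval and rebuild the wrapped
// Autoscaler when its contents change (full recreate on config change).
package core

import "time"

type Autoscaler interface {
	RunOnce(now time.Time) error
}

type dynamicAutoscaler struct {
	autoscaler   Autoscaler
	lastSettings string
	fetchConfig  func() (string, error)                 // reads the ConfigMap's "settings" key
	buildFromCfg func(settings string) (Autoscaler, error) // builds a fresh Autoscaler
}

// RunOnce reconfigures the wrapped Autoscaler if the ConfigMap changed,
// then delegates a single autoscaling iteration to it.
func (d *dynamicAutoscaler) RunOnce(now time.Time) error {
	settings, err := d.fetchConfig()
	if err != nil {
		return err
	}
	if settings != d.lastSettings {
		rebuilt, err := d.buildFromCfg(settings)
		if err != nil {
			return err
		}
		d.autoscaler = rebuilt
		d.lastSettings = settings
	}
	return d.autoscaler.RunOnce(now)
}
```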