Automatic merge from submit-queue
Cluster-Autoscaler: make status less confusing
Previously min and max in status were referring to
non-obvious internal variables, which was pretty confusing.
Automatic merge from submit-queue
Cluster-autoscaler: add information about which version is supported in which k8s
cc: @andrewsykim @MaciekPytel @fgrzadkowski
Automatic merge from submit-queue
Cluster-Autoscaler: reset unneededNodes list on cluster failure
If a node is marked as unneeded and the cluster goes into an unhealthy state shortly afterwards, the node will likely be deleted immediately on cluster recovery. This is because there is already an entry for it in the unneededNodes data structure, and the cluster downtime is counted towards the time the node has been unneeded.
It's not 100% obvious to me what should happen in this case, but I think it's better to play it safe and just wait the full 10 minutes after cluster recovery before we start to delete nodes. After a quick glance at the code I haven't spotted any other stuff that needs to be cleaned up in case of cluster failure, but maybe you have some other ideas @mwielgus?
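A minimal sketch of the idea, not the actual CA code: the unneededTracker type, its methods, and the timestamps are hypothetical stand-ins for the unneededNodes bookkeeping. It shows that after a reset the unneeded timer starts again from cluster recovery rather than from before the outage.

```go
package main

import (
	"fmt"
	"time"
)

// unneededTracker is a hypothetical stand-in for the unneededNodes
// bookkeeping: node name -> time the node was first seen as unneeded.
type unneededTracker struct {
	since map[string]time.Time
}

func newUnneededTracker() *unneededTracker {
	return &unneededTracker{since: make(map[string]time.Time)}
}

// markUnneeded records the first time a node was seen as unneeded.
func (t *unneededTracker) markUnneeded(node string, now time.Time) {
	if _, ok := t.since[node]; !ok {
		t.since[node] = now
	}
}

// reset forgets all entries; meant to be called when the cluster is unhealthy.
func (t *unneededTracker) reset() {
	t.since = make(map[string]time.Time)
}

// removable reports whether node has been unneeded for at least threshold.
func (t *unneededTracker) removable(node string, now time.Time, threshold time.Duration) bool {
	start, ok := t.since[node]
	return ok && now.Sub(start) >= threshold
}

// demoReset simulates a 15 minute outage right after a node was marked.
// It returns removability at recovery and 10 minutes after recovery.
func demoReset() (atRecovery, afterWait bool) {
	const threshold = 10 * time.Minute
	start := time.Date(2017, 1, 1, 12, 0, 0, 0, time.UTC) // arbitrary instant
	tr := newUnneededTracker()
	tr.markUnneeded("node-a", start)

	tr.reset() // cluster became unhealthy: drop stale entries
	recovered := start.Add(15 * time.Minute)
	tr.markUnneeded("node-a", recovered)

	return tr.removable("node-a", recovered, threshold),
		tr.removable("node-a", recovered.Add(threshold), threshold)
}

func main() {
	atRecovery, afterWait := demoReset()
	fmt.Println(atRecovery, afterWait) // prints "false true"
}
```

Without the reset call, the entry from before the outage would survive and the node would already be removable at recovery time.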
Automatic merge from submit-queue
Cluster-Autoscaler: update status configmap on errors
Previously the status was only updated after successfully completing the
main loop, meaning it wouldn't get updated unless the cluster was
healthy.
Automatic merge from submit-queue
Cluster-Autoscaler: consider node with unknown readiness unready
A node with a non-responsive kubelet seems to be marked as NodeReady: Unknown, which CA currently considers ready.
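A rough sketch of the intended check, using simplified local types instead of the real Kubernetes API types (nodeCondition and isReady are hypothetical names). Only an explicit Ready=True counts as ready; treating a missing Ready condition as unready is an extra assumption of this sketch.

```go
package main

import "fmt"

// conditionStatus mirrors the three values a Kubernetes condition can take.
type conditionStatus string

const (
	conditionTrue    conditionStatus = "True"
	conditionFalse   conditionStatus = "False"
	conditionUnknown conditionStatus = "Unknown"
)

// nodeCondition is a simplified, hypothetical stand-in for the API type.
type nodeCondition struct {
	Type   string
	Status conditionStatus
}

// isReady returns true only for an explicit Ready=True condition.
// Ready=False, Ready=Unknown, and a missing Ready condition are all unready.
func isReady(conditions []nodeCondition) bool {
	for _, c := range conditions {
		if c.Type == "Ready" {
			return c.Status == conditionTrue
		}
	}
	return false
}

func main() {
	for _, s := range []conditionStatus{conditionTrue, conditionFalse, conditionUnknown} {
		fmt.Println(s, isReady([]nodeCondition{{Type: "Ready", Status: s}}))
	}
}
```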
Automatic merge from submit-queue
Cluster-autoscaler: fix NotTriggerScaleUp event
This should fix a failing e2e test.
Also updated some scale_up unit tests to check created events and fixed a typo in a variable name.
Automatic merge from submit-queue
Cluster-Autoscaler: fix delete taint failing
It was using an old version of the node object (which in general is always going to be outdated, as we've likely modified it by adding the delete taint).
@mwielgus
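To illustrate why a stale node object fails, here is a small in-memory sketch of version-checked updates. fakeStore, node, the integer ResourceVersion, and the taint key are all hypothetical; the real API server does its own conflict detection. The update from the stale copy is rejected, while re-reading the node first succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

type node struct {
	Name            string
	ResourceVersion int
	Taints          []string
}

var errConflict = errors.New("conflict: object has been modified")

// fakeStore is a hypothetical in-memory store with version-checked updates.
type fakeStore struct {
	nodes map[string]node
}

// get returns a copy of the stored node.
func (s *fakeStore) get(name string) node {
	n := s.nodes[name]
	n.Taints = append([]string(nil), n.Taints...)
	return n
}

// update rejects writes based on an outdated ResourceVersion.
func (s *fakeStore) update(n node) error {
	cur, ok := s.nodes[n.Name]
	if !ok {
		return errors.New("not found")
	}
	if cur.ResourceVersion != n.ResourceVersion {
		return errConflict
	}
	n.ResourceVersion++
	n.Taints = append([]string(nil), n.Taints...)
	s.nodes[n.Name] = n
	return nil
}

func withoutTaint(taints []string, key string) []string {
	out := make([]string, 0, len(taints))
	for _, t := range taints {
		if t != key {
			out = append(out, t)
		}
	}
	return out
}

// removeTaintFresh re-reads the node before removing the taint.
func removeTaintFresh(s *fakeStore, name, key string) error {
	n := s.get(name)
	n.Taints = withoutTaint(n.Taints, key)
	return s.update(n)
}

// demoStaleVersusFresh removes a taint first via a stale copy, then via a fresh read.
func demoStaleVersusFresh() (staleErr, freshErr error) {
	const key = "example/to-be-deleted" // hypothetical taint key
	s := &fakeStore{nodes: map[string]node{"node-a": {Name: "node-a", ResourceVersion: 1}}}
	stale := s.get("node-a") // copy taken before the taint is added

	tainted := s.get("node-a")
	tainted.Taints = append(tainted.Taints, key)
	if err := s.update(tainted); err != nil {
		return err, err
	}

	stale.Taints = withoutTaint(stale.Taints, key)
	staleErr = s.update(stale)
	freshErr = removeTaintFresh(s, "node-a", key)
	return staleErr, freshErr
}

func main() {
	staleErr, freshErr := demoStaleVersusFresh()
	fmt.Println("stale:", staleErr)
	fmt.Println("fresh:", freshErr)
}
```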
Automatic merge from submit-queue
Cluster-Autoscaler: fix delete taint value format
Fix a bug where a non-compliant value format prevented the CA deletetaint from being created (which in turn caused CA node drain to fail).
Automatic merge from submit-queue
Cluster-Autoscaler: Update timestamps in status configmap
Update LastProbeTime and LastTransitionTime fields in ClusterStateRegistry (previously they weren't used and always showed as epoch in status). Update scale down part of status whenever list of unneeded nodes in CA changes.
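A hedged sketch of how the two timestamps might be maintained, assuming the usual condition semantics: LastProbeTime moves on every check, LastTransitionTime only when the status changes. The condition type, the observe method, and the "Healthy" status value are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a simplified, hypothetical status condition.
type condition struct {
	Status             string
	LastProbeTime      time.Time
	LastTransitionTime time.Time
}

// observe records a check at time now. The probe time always advances;
// the transition time changes only when the status value changes.
func (c *condition) observe(status string, now time.Time) {
	c.LastProbeTime = now
	if c.Status != status {
		c.Status = status
		c.LastTransitionTime = now
	}
}

// demoObserve checks the same status twice, one minute apart.
func demoObserve() (probeAdvanced, transitionKept bool) {
	t0 := time.Date(2017, 1, 1, 0, 0, 0, 0, time.UTC) // arbitrary instant
	t1 := t0.Add(time.Minute)
	c := &condition{}
	c.observe("Healthy", t0)
	c.observe("Healthy", t1)
	return c.LastProbeTime.Equal(t1), c.LastTransitionTime.Equal(t0)
}

func main() {
	probeAdvanced, transitionKept := demoObserve()
	fmt.Println(probeAdvanced, transitionKept) // prints "true true"
}
```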
Automatic merge from submit-queue
Cluster-Autoscaler: skip nodes currently under deletion in scale down
Currently we may try to delete the same node multiple times.
cc: @MaciekPytel @jszczepkowski @fgrzadkowski
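The filtering step could look roughly like this. scaleDownCandidates and the underDeletion set are hypothetical names for illustration, not the actual CA functions.

```go
package main

import "fmt"

// scaleDownCandidates drops nodes whose deletion is already in progress,
// so the same node is not picked for deletion twice.
func scaleDownCandidates(unneeded []string, underDeletion map[string]bool) []string {
	candidates := make([]string, 0, len(unneeded))
	for _, name := range unneeded {
		if underDeletion[name] {
			continue
		}
		candidates = append(candidates, name)
	}
	return candidates
}

func main() {
	unneeded := []string{"node-a", "node-b", "node-c"}
	underDeletion := map[string]bool{"node-b": true}
	fmt.Println(scaleDownCandidates(unneeded, underDeletion)) // prints "[node-a node-c]"
}
```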
Automatic merge from submit-queue
Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2
In part 1 of 2 we skip nodes that have a pod with 0 poddisruptionallowed. Part 2/2 will delete pods using evict.
cc: @jszczepkowski @MaciekPytel @davidopp @fgrzadkowski
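A simplified sketch of the part 1 check, with local types instead of the real API objects. pod, budget, DisruptionsAllowed, and canDrain are hypothetical names; namespaces are ignored, and an empty selector matches every pod in this sketch.

```go
package main

import "fmt"

type pod struct {
	Name   string
	Labels map[string]string
}

// budget is a simplified, hypothetical stand-in for a PodDisruptionBudget.
type budget struct {
	Selector           map[string]string
	DisruptionsAllowed int
}

// matches reports whether labels contain every key/value pair in selector.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// canDrain returns false if any pod on the node is covered by a budget
// that currently allows zero disruptions.
func canDrain(pods []pod, budgets []budget) bool {
	for _, p := range pods {
		for _, b := range budgets {
			if b.DisruptionsAllowed == 0 && matches(b.Selector, p.Labels) {
				return false
			}
		}
	}
	return true
}

func main() {
	pods := []pod{{Name: "web-1", Labels: map[string]string{"app": "web"}}}
	strict := []budget{{Selector: map[string]string{"app": "web"}, DisruptionsAllowed: 0}}
	relaxed := []budget{{Selector: map[string]string{"app": "web"}, DisruptionsAllowed: 1}}
	fmt.Println(canDrain(pods, strict), canDrain(pods, relaxed)) // prints "false true"
}
```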