As discussed with the Cluster API community [0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
`--balancing-ignore-label` flag. Due to the wide variety of provider
infrastructures that Cluster API can be deployed on, we would prefer
not to encode all of these labels in the autoscaler itself. See the
linked recording for more information.
[0] https://www.youtube.com/watch?v=jbhca_9oPuQ
Adds a new flag `--balance-label` which allows users to balance between
node groups exclusively via labels.
This gives users the flexibility to specify the similarity logic
themselves when `--balance-similar-node-groups` is in use.
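The label-driven comparison can be sketched roughly as follows; `labelsMatch` and its arguments are illustrative names for this note, not the autoscaler's actual implementation:

```go
package main

import "fmt"

// labelsMatch reports whether two nodes carry identical values for every
// label the user listed on the command line (sketch only).
func labelsMatch(balanceLabels []string, a, b map[string]string) bool {
	for _, l := range balanceLabels {
		if a[l] != b[l] {
			return false
		}
	}
	return true
}

func main() {
	n1 := map[string]string{"pool": "gpu", "zone": "us-east-1a"}
	n2 := map[string]string{"pool": "gpu", "zone": "us-east-1b"}
	// Only "pool" is compared, so the differing zones do not matter.
	fmt.Println(labelsMatch([]string{"pool"}, n1, n2))
}
```

With label-only balancing, anything not listed (zones, capacity, taints) simply drops out of the comparison.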
This change simplifies debugging GPU issues: without it, all nodes can
be Ready as far as the Kubernetes API is concerned, but CA will still
report some of them as unready if they are missing the GPU resource.
Explicitly calling them out in the status ConfigMap will point in the
right direction.
There's a race condition between DaemonSet pods getting scheduled to a
new node and Cluster Autoscaler caching that node for the sake of
predicting future nodes in a given node group. We can reduce the risk of
missing some DaemonSet pods by providing a grace period before accepting
nodes into the cache. One minute should be more than enough, except for
some pathological edge cases.
This allows the ClusterAPI provider to ignore the
`topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset
processor. It also adds unit tests to exercise the new processor.
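The core of such a processor is a set of labels that are stripped before node templates are compared; the names below (`ignoredLabels`, `comparableLabels`) are illustrative, not the actual processor API:

```go
package main

import "fmt"

// ignoredLabels sketches the set a clusterapi-specific nodegroupset
// processor might skip when comparing node group templates.
var ignoredLabels = map[string]bool{
	"topology.ebs.csi.aws.com/zone": true,
}

// comparableLabels returns a copy of labels with the ignored keys
// removed, so two otherwise-identical groups in different zones
// compare as similar.
func comparableLabels(labels map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range labels {
		if !ignoredLabels[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	filtered := comparableLabels(map[string]string{
		"topology.ebs.csi.aws.com/zone":    "us-east-1a",
		"node.kubernetes.io/instance-type": "m5.xlarge",
	})
	fmt.Println(filtered)
}
```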
This change adds the aforementioned label to the list of ignored labels
in the AWS nodegroupset processor. This change is being made in response
to the addition of this label by the aws-ebs-csi-driver. This label will
eventually be deprecated by the driver, but its use will prevent AWS
users from properly balancing similar node groups. It also adds a unit
test for the AWS processor.
ref: https://github.com/kubernetes/autoscaler/issues/3230
ref: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729
Every other processor (and, I think, every function in CA?) that takes
AutoscalingContext has it as its first parameter. This changes the new
processor to match, for consistency.
Supports providing different NodeInfos sources (either upstream or in
local forks, e.g. to properly implement variants like in #4000).
This also moves a large and specialized code chunk out of core, and removes
the need to maintain and pass the GetNodeInfosForGroups() cache from the side,
as processors can hold their state themselves.
No functional changes to GetNodeInfosForGroups(), outside mechanical changes
due to the move: call a few utils functions from the core/utils package,
pick attributes off the context (the processor takes the context as an
argument rather than ListerRegistry + PredicateChecker + CloudProvider), and
use the built-in cache rather than receiving it as an argument.
This is the first step of implementing
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
A new method was added to the cloudprovider interface. All existing
providers were updated with a no-op stub implementation that results in
no behavior change.
The config values specified per NodeGroup are not yet applied.
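A no-op stub of this shape keeps existing providers' behavior unchanged; the method name `PerGroupOptions` and the types below are illustrative stand-ins for this note, not the actual interface addition:

```go
package main

import "fmt"

// GroupOptions stands in for the per-NodeGroup configuration values.
type GroupOptions struct{ ScaleDownUtilizationThreshold float64 }

// NodeGroup sketches the provider-facing interface after the new method
// is added (hypothetical name and signature).
type NodeGroup interface {
	// PerGroupOptions returns per-group overrides, or nil to fall back
	// to the global defaults.
	PerGroupOptions(defaults GroupOptions) (*GroupOptions, error)
}

// legacyGroup shows the no-op stub existing providers received:
// returning nil overrides means the defaults are used, so behavior
// does not change.
type legacyGroup struct{}

func (legacyGroup) PerGroupOptions(defaults GroupOptions) (*GroupOptions, error) {
	return nil, nil
}

func main() {
	opts, err := legacyGroup{}.PerGroupOptions(GroupOptions{ScaleDownUtilizationThreshold: 0.5})
	fmt.Println(opts == nil, err == nil)
}
```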
Commit bb2eed1cff introduced a new `topology.gke.io/zone` label to
GCE node templates, for CSI needs.
That label holds the zone name, making nodeInfo templates dissimilar
for groups belonging to different zones. The CA otherwise tries to
ignore such zonal labels (i.e. it ignores the standard LabelZoneRegion
and LabelZoneFailureDomain) when it looks for node group similarities.
In testing, AWS M5 instances can occasionally display approximately a 1%
difference in memory capacity between availability zones, even when deployed
with the same launch configuration and the same AMI.
Allow a 1.5% tolerance to give some buffer over the observed discrepancy,
since in testing some examples were just over 1% (e.g. 1.05%, 1.1%).
Tests are included with capacity values taken from real instances to prevent future
regression.
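The tolerance check amounts to a ratio comparison between the smaller and larger capacity; the function name and the sample values below are illustrative, not taken from the actual processor:

```go
package main

import "fmt"

// maxCapacityRatio allows ~1.5% discrepancy in memory capacity between
// otherwise-identical node groups.
const maxCapacityRatio = 1.015

// similarCapacity reports whether two memory capacities (e.g. in MiB)
// fall within the tolerance, regardless of which one is larger.
func similarCapacity(a, b float64) bool {
	if a > b {
		a, b = b, a
	}
	return b <= a*maxCapacityRatio
}

func main() {
	// Roughly a 1.1% difference, of the kind observed between AZs.
	fmt.Println(similarCapacity(15944, 16120))
}
```

Comparing the ratio rather than an absolute byte difference keeps the check meaningful across instance sizes.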