As discussed with the Cluster API community [0], the nodegroupset
processor is being removed from the clusterapi provider implementation
in favor of instructing our community on the use of the
`--balancing-ignore-label` flag. Due to the wide variety of provider
infrastructures that Cluster API can be deployed on, we would prefer
not to encode all of these labels in the autoscaler itself. See the
linked recording for more information.
[0] https://www.youtube.com/watch?v=jbhca_9oPuQ
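For example, a deployment whose infrastructure attaches provider-specific
zonal labels could list them at startup; the second label below is
illustrative, not a real provider label:

```sh
cluster-autoscaler \
  --balance-similar-node-groups \
  --balancing-ignore-label=topology.ebs.csi.aws.com/zone \
  --balancing-ignore-label=my.provider.example/zone
```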
Adds a new flag `--balance-label` which allows users to balance between
node groups exclusively via labels.
This gives users the flexibility to specify the similarity logic
themselves when `--balance-similar-node-groups` is in use.
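A minimal invocation, pairing the new flag (named as described above)
with `--balance-similar-node-groups`; the label name is illustrative:

```sh
cluster-autoscaler \
  --balance-similar-node-groups \
  --balance-label=example.com/node-pool
```

With this configuration, similarity is determined solely by the listed
label rather than by the built-in similarity heuristics.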
This change adds a custom nodegroupset processor that allows the
ClusterAPI provider to ignore the `topology.ebs.csi.aws.com/zone`
label. It also adds unit tests to exercise the new processor.
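A minimal sketch of such a processor, assuming the nodegroupset
package's `BasicIgnoredLabels` map, `NodeInfoComparator` type, and
`IsCloudProviderNodeInfoSimilar` helper keep the shapes they had at
the time; the real implementation lives in the clusterapi provider:

```go
package nodegroupset

import (
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

// CreateClusterAPINodeInfoComparator returns a NodeInfoComparator that
// treats the zonal label added by the aws-ebs-csi-driver as irrelevant
// when deciding whether two node groups are similar.
func CreateClusterAPINodeInfoComparator() NodeInfoComparator {
	ignoredLabels := map[string]bool{}
	for k, v := range BasicIgnoredLabels {
		ignoredLabels[k] = v
	}
	// This label varies per availability zone, so it must not break
	// the similarity check.
	ignoredLabels["topology.ebs.csi.aws.com/zone"] = true

	return func(n1, n2 *schedulerframework.NodeInfo) bool {
		return IsCloudProviderNodeInfoSimilar(n1, n2, ignoredLabels)
	}
}
```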
This change adds the `topology.ebs.csi.aws.com/zone` label to the list
of ignored labels in the AWS nodegroupset processor. It is being made
in response to the addition of this label by the aws-ebs-csi-driver.
The label will eventually be deprecated by the driver, but until then
its presence prevents AWS users from properly balancing similar nodes.
This change also adds a unit test for the AWS processor.
ref: https://github.com/kubernetes/autoscaler/issues/3230
ref: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/729
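A sketch of the kind of unit test described, assuming a zero-argument
`CreateAwsNodeInfoComparator` constructor and a hypothetical
`buildTestNodeInfo` helper; the real tests use the package's own
fixtures:

```go
package nodegroupset

import "testing"

func TestAwsComparatorIgnoresEbsCsiZoneLabel(t *testing.T) {
	comparator := CreateAwsNodeInfoComparator()
	// buildTestNodeInfo is a hypothetical helper returning a
	// *schedulerframework.NodeInfo for a node with the given labels.
	n1 := buildTestNodeInfo("node-1", map[string]string{
		"topology.ebs.csi.aws.com/zone": "us-east-1a",
	})
	n2 := buildTestNodeInfo("node-2", map[string]string{
		"topology.ebs.csi.aws.com/zone": "us-east-1b",
	})
	if !comparator(n1, n2) {
		t.Error("nodes differing only in topology.ebs.csi.aws.com/zone should be similar")
	}
}
```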
Commit bb2eed1cff introduced a new `topology.gke.io/zone` label on
GCE node templates, for CSI needs.
That label holds the zone name, which makes nodeInfo templates
dissimilar for groups belonging to different zones. The CA otherwise
tries to ignore such zonal labels (i.e. it ignores the standard
LabelZoneRegion and LabelZoneFailureDomain labels) when it looks for
node group similarities.
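A sketch of the zonal labels the comparator skips, assuming the
deprecated label constants in k8s.io/api/core/v1; the GKE CSI label
now joins the standard failure-domain labels in the ignored set:

```go
package nodegroupset

import (
	apiv1 "k8s.io/api/core/v1"
)

// Zonal labels skipped when comparing node group templates.
var ignoredZonalLabels = map[string]bool{
	apiv1.LabelZoneRegion:        true, // failure-domain.beta.kubernetes.io/region
	apiv1.LabelZoneFailureDomain: true, // failure-domain.beta.kubernetes.io/zone
	"topology.gke.io/zone":       true, // zonal CSI label from commit bb2eed1cff
}
```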
In testing, AWS M5 instances deployed with the same launch
configuration and the same AMI can occasionally show a difference in
memory capacity of approximately 1% between availability zones.
Allow a 1.5% tolerance to give some buffer over the observed
discrepancy, since in testing some examples were just over 1%
(e.g. 1.05%, 1.1%).
Tests are included with capacity values taken from real instances to
prevent future regression.
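A minimal sketch of the ratio check, assuming capacities are compared
as a fraction of the larger value; the constant mirrors the 1.5%
tolerance described above:

```go
package nodegroupset

import "math"

// MaxCapacityMemoryDifferenceRatio is the maximum relative difference
// in memory capacity tolerated between otherwise similar node groups.
const MaxCapacityMemoryDifferenceRatio = 0.015

// similarMemoryCapacity reports whether two capacities, in bytes,
// differ by no more than 1.5% of the larger value.
func similarMemoryCapacity(capacity1, capacity2 int64) bool {
	larger := math.Max(float64(capacity1), float64(capacity2))
	diff := math.Abs(float64(capacity1) - float64(capacity2))
	return diff <= larger*MaxCapacityMemoryDifferenceRatio
}
```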
The following things changed in the scheduler and needed to be fixed:
* NodeInfo was moved to schedulerframework
* Some fields on NodeInfo are now exposed directly instead of via getters
* NodeInfo.Pods is now a list of *schedulerframework.PodInfo, not
  *apiv1.Pod (see the sketch after this list)
* SharedLister and NodeInfoLister were moved to schedulerframework
* PodLister was removed
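For example, callers that previously ranged over `NodeInfo.Pods` as
`*apiv1.Pod` now unwrap each `PodInfo`; a sketch, assuming the vendored
scheduler framework import path of the time:

```go
package utils

import (
	apiv1 "k8s.io/api/core/v1"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

// podsOnNode flattens the new []*schedulerframework.PodInfo slice back
// into the []*apiv1.Pod shape that older call sites expect.
func podsOnNode(nodeInfo *schedulerframework.NodeInfo) []*apiv1.Pod {
	pods := make([]*apiv1.Pod, 0, len(nodeInfo.Pods))
	for _, podInfo := range nodeInfo.Pods {
		pods = append(pods, podInfo.Pod)
	}
	return pods
}
```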
The current comparator expects memory capacity values to be identical.
However, across AWS, Azure and GCP I quite often see very small
differences in capacity, typically 8-16Ki. When this occurs, the
node groups are considered not equal when balancing is in effect,
which is unfortunate because in reality they are identical.
This change tolerates up to a 128Ki difference before memory
capacity values are considered unequal.
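A minimal sketch of the tolerance, assuming capacities are compared as
`resource.Quantity` values; the names below are illustrative:

```go
package nodegroupset

import "k8s.io/apimachinery/pkg/api/resource"

// maxMemoryDifferenceInBytes is the absolute memory capacity difference
// (128Ki) tolerated before two values are considered unequal.
const maxMemoryDifferenceInBytes = 128 * 1024

// memoryWithinTolerance compares two capacity quantities, allowing for
// the small per-provider variations described above.
func memoryWithinTolerance(c1, c2 resource.Quantity) bool {
	diff := c1.Value() - c2.Value()
	if diff < 0 {
		diff = -diff
	}
	return diff <= maxMemoryDifferenceInBytes
}
```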