Adds a new flag, `--balance-label`, which allows users to balance
node groups exclusively via labels.
This gives users the flexibility to specify the similarity logic
themselves when `--balance-similar-node-groups` is in use.
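For illustration, a minimal sketch of label-only comparison; the names (`nodesSimilar`, `balancingLabels`) are hypothetical, not the actual processor code:

```go
package sketch

import v1 "k8s.io/api/core/v1"

// nodesSimilar reports whether two node group template nodes should be
// considered similar, looking only at the user-supplied balancing labels.
func nodesSimilar(a, b *v1.Node, balancingLabels []string) bool {
	for _, label := range balancingLabels {
		if a.Labels[label] != b.Labels[label] {
			return false
		}
	}
	return true
}
```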
The binpacking algorithm is O(#pending_pods * #new_nodes), so
calculating a very large scale-up can get stuck for minutes or even
hours, leading to CA failing its healthcheck and going down.
The new limiting prevents this scenario by stopping binpacking once a
specified threshold is reached. Any pods that remain pending as a result
of the shortened binpacking will be processed in the next autoscaler loop.
The thresholds can be controlled with the newly introduced flags
`--max-nodes-per-scaleup` and `--max-nodegroup-binpacking-duration`. The
limiting can be disabled by setting both flags to 0 (not recommended,
especially for `--max-nodegroup-binpacking-duration`).
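A rough sketch of the capping idea, assuming a simple greedy loop and hypothetical names (`binpackCapped`, `fits`); the real estimator differs:

```go
package sketch

import "time"

// Pod is a stand-in for a pending pod (illustrative only).
type Pod struct{ Name string }

// binpackCapped runs a greedy binpacking loop but stops once either cap
// is hit; a cap of 0 disables that limit. Pods that were not placed are
// returned so the next autoscaler loop can retry them.
func binpackCapped(pending []Pod, fits func(Pod) bool, maxNewNodes int, maxDuration time.Duration) (newNodes int, leftover []Pod) {
	deadline := time.Now().Add(maxDuration)
	for i, pod := range pending {
		if maxNewNodes > 0 && newNodes >= maxNewNodes {
			return newNodes, pending[i:]
		}
		if maxDuration > 0 && time.Now().After(deadline) {
			return newNodes, pending[i:]
		}
		if !fits(pod) { // pod doesn't fit on the nodes added so far
			newNodes++ // open a new node for it
		}
	}
	return newNodes, nil
}
```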
This change introduces a new command-line flag,
`--record-duplicated-events`, which allows a user to record duplicated
events, bypassing the 5-minute deduplication window.
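Purely for illustration, a sketch of a deduplication window with a bypass switch; all names are hypothetical, and this is not client-go's actual event correlation code:

```go
package sketch

import "time"

// dedupWindow drops events whose key was already seen within the
// window, unless recordDuplicatedEvents is set.
type dedupWindow struct {
	window                 time.Duration // e.g. 5 * time.Minute
	recordDuplicatedEvents bool
	lastSeen               map[string]time.Time
}

func newDedupWindow(window time.Duration, recordDuplicates bool) *dedupWindow {
	return &dedupWindow{window: window, recordDuplicatedEvents: recordDuplicates, lastSeen: map[string]time.Time{}}
}

func (d *dedupWindow) shouldRecord(key string, now time.Time) bool {
	if d.recordDuplicatedEvents {
		return true // bypass deduplication entirely
	}
	if last, ok := d.lastSeen[key]; ok && now.Sub(last) < d.window {
		return false // duplicate within the window: drop it
	}
	d.lastSeen[key] = now
	return true
}
```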
Add a flag to allow the user to configure MaxPodEvictionTime to values
other than the default of 2m. This is needed in cases where a pod takes
more than 2 minutes to be evicted.
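As a sketch of what a configurable eviction timeout looks like (the polling helper is real apimachinery API; the surrounding names are hypothetical):

```go
package sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForEviction polls until podGone reports true or maxPodEvictionTime
// elapses, instead of a hard-coded 2-minute limit.
func waitForEviction(podGone func() (bool, error), maxPodEvictionTime time.Duration) error {
	return wait.PollImmediate(5*time.Second, maxPodEvictionTime, podGone)
}
```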
This allows the ClusterAPI provider to ignore the
`topology.ebs.csi.aws.com/zone` label by adding a custom nodegroupset
processor. It also adds unit tests to exercise the new processor.
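Conceptually, the processor adds this label to the set ignored during comparison; a simplified, illustrative sketch (not the actual ClusterAPI processor code):

```go
package sketch

import v1 "k8s.io/api/core/v1"

// AwsEbsCsiZoneLabel differs per zone by design, so it must not make
// otherwise-identical node groups look dissimilar.
const AwsEbsCsiZoneLabel = "topology.ebs.csi.aws.com/zone"

// stripIgnoredLabels returns a copy of the node's labels without the
// labels that must not influence the similarity comparison.
func stripIgnoredLabels(node *v1.Node, ignored map[string]bool) map[string]string {
	out := make(map[string]string, len(node.Labels))
	for k, v := range node.Labels {
		if !ignored[k] {
			out[k] = v
		}
	}
	return out
}
```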
Multiple expanders can now be specified. Expanders now "filter to those
tied for best" instead of "selecting the best", so the output of one
expander is fed as input to the next. Each expander may only be used
once, to disallow bad configurations. This should not change
functionality, since in the event of a tie the random expander is still
used.
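The chaining can be pictured like this (hypothetical types; not the actual expander package interfaces):

```go
package sketch

// Option represents a candidate scale-up option.
type Option struct{ Name string }

// Filter narrows a set of options down to those tied for best
// according to one strategy's criterion.
type Filter interface {
	BestOptions(options []Option) []Option
}

// chain feeds the output of each expander into the next; whatever
// survives the whole chain is resolved by a final random pick.
func chain(filters []Filter, options []Option) []Option {
	for _, f := range filters {
		if len(options) <= 1 {
			break // nothing left to narrow down
		}
		options = f.BestOptions(options)
	}
	return options
}
```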
This change adds four metrics that can be used to monitor the minimum
and maximum limits for CPU and memory, as well as the current totals in
cores and bytes, respectively.
The four metrics added are:
* `cluster_autoscaler_cpu_limits_cores`
* `cluster_autoscaler_cluster_cpu_current_cores`
* `cluster_autoscaler_memory_limits_bytes`
* `cluster_autoscaler_cluster_memory_current_bytes`
This change also adds the `max_cores_total` metric to the metrics
proposal doc, as it was previously not recorded there.
User story: As a cluster autoscaler user, I would like to monitor my
cluster through metrics to determine when the cluster is nearing its
limits for cores and memory usage.
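For reference, a minimal Prometheus sketch of how one such gauge could be declared and updated (simplified and illustrative; the help text and function name are assumptions, not the actual metrics package source):

```go
package sketch

import "github.com/prometheus/client_golang/prometheus"

// cpuCurrentCores backs the cluster_autoscaler_cluster_cpu_current_cores
// metric: namespace plus name compose the full metric name.
var cpuCurrentCores = prometheus.NewGauge(prometheus.GaugeOpts{
	Namespace: "cluster_autoscaler",
	Name:      "cluster_cpu_current_cores",
	Help:      "Current total of CPU cores in the cluster.",
})

func init() {
	prometheus.MustRegister(cpuCurrentCores)
}

// UpdateClusterCPUCurrentCores records the current core count.
func UpdateClusterCPUCurrentCores(cores int64) {
	cpuCurrentCores.Set(float64(cores))
}
```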
This allows us to run two instances of cluster-autoscaler in our
cluster, targeting two different types of autoscaling groups that
require different command-line settings to be passed.
This is the first step of implementing
https://github.com/kubernetes/autoscaler/issues/3583#issuecomment-743215343.
A new method was added to the cloudprovider interface. All existing
providers were updated with a no-op stub implementation that results in
no behavior change.
The config values specified per NodeGroup are not yet applied.
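The pattern, sketched with illustrative names (the entry does not name the new method, so `GetOptions` and the option fields below are assumptions):

```go
package sketch

// NodeGroupOptions is an illustrative stand-in for per-NodeGroup
// configuration overrides.
type NodeGroupOptions struct {
	ScaleDownUtilizationThreshold float64
}

// NodeGroup gains one extra method; returning nil means "use the
// global defaults", which is what every existing provider's stub does.
type NodeGroup interface {
	Id() string
	GetOptions(defaults NodeGroupOptions) (*NodeGroupOptions, error)
}

// noopNodeGroup shows the no-op stub: no per-group config, no behavior change.
type noopNodeGroup struct{}

func (n *noopNodeGroup) Id() string { return "example" }

func (n *noopNodeGroup) GetOptions(defaults NodeGroupOptions) (*NodeGroupOptions, error) {
	return nil, nil
}
```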
This won't completely fix the issue of 502s, as a node still has to live for some time even after cordoning in order to serve in-flight requests, but the load balancer can be configured to remove cordoned nodes from its healthy host list.
This feature is enabled by the `cordon-node-before-terminating` flag, which defaults to false to retain existing behavior.
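Cordoning itself amounts to marking the node unschedulable via the Kubernetes API; a minimal client-go sketch (the helper name is illustrative):

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cordonNode marks the node unschedulable before termination so the
// load balancer can drop it from the healthy host list.
func cordonNode(ctx context.Context, client kubernetes.Interface, name string) error {
	node, err := client.CoreV1().Nodes().Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	node.Spec.Unschedulable = true
	_, err = client.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```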
With 1.5k MIGs attached to a cluster, cluster-autoscaler needs about
40 minutes to start. Refreshing MIGs and instance templates concurrently
brings that down to about 5 minutes.
While bulk GCE API calls (triggered at startup and on Refresh() calls)
and a few stateless functions (called by GetMigInstanceTemplate) become
concurrent, cache accesses remain lock-protected. To that effect:
* RegenerateInstancesCache now runs RegenerateInstanceCacheForMig calls in
parallel (slightly adapted so the slow GceService.FetchMigInstances call
isn't made under lock)
* fetchAutoMigs now runs registerMig calls in parallel: GetMigInstanceTemplate
was reworked so the slow InstanceGroupManagers.Get and InstanceTemplates.Get
calls aren't made under lock
Tested on a large k8s cluster (> 1k MIGs) with intense scaling activity,
and on live clusters with `go build -race` cluster-autoscaler builds.
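The shape of the change, sketched with a hypothetical fetch callback: network calls run in parallel, and only the cache write takes the mutex:

```go
package sketch

import "sync"

// refreshMigs fetches instance data for every MIG concurrently; only
// the cache write is done under the mutex, so slow API calls no longer
// serialize the whole refresh.
func refreshMigs(migs []string, fetch func(mig string) ([]string, error)) map[string][]string {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		cache = make(map[string][]string, len(migs))
	)
	for _, mig := range migs {
		wg.Add(1)
		go func(mig string) {
			defer wg.Done()
			instances, err := fetch(mig) // slow network call, not locked
			if err != nil {
				return // a real implementation would record the error
			}
			mu.Lock()
			cache[mig] = instances
			mu.Unlock()
		}(mig)
	}
	wg.Wait()
	return cache
}
```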
Commit bb2eed1cff introduced a new `topology.gke.io/zone` label on
GCE node templates, for CSI needs.
That label holds the zone name, making nodeInfo templates dissimilar
for groups belonging to different zones. The CA otherwise tries to
ignore such zonal labels (i.e. it ignores the standard LabelZoneRegion
and LabelZoneFailureDomain) when it looks for node group similarities.
- Leverage `--cloud-config` to allow providing a separate kubeconfig for Cluster API management and workload cluster resources
- Fall back to the previous behavior when `--cloud-config` is not specified, for backward compatibility
- Provide a `--clusterapi-cloud-config-authoritative` flag to disable the above fallback behavior and allow both the management and workload cluster clients to use the in-cluster config
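A sketch of the fallback logic with client-go (the function name is illustrative, and the real provider wires this differently):

```go
package sketch

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// managementClient builds a client for Cluster API management resources
// from the --cloud-config kubeconfig, falling back to the in-cluster
// config when no path is given (the backward-compatible behavior).
func managementClient(cloudConfigPath string) (kubernetes.Interface, error) {
	var cfg *rest.Config
	var err error
	if cloudConfigPath != "" {
		cfg, err = clientcmd.BuildConfigFromFlags("", cloudConfigPath)
	} else {
		cfg, err = rest.InClusterConfig()
	}
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}
```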