Currently, the label used to identify the controller/master node is hard-coded to
`node-role.kubernetes.io/master`.
There have been some conversations centered around replacing the label
with `node-role.kubernetes.io/control-plane`.
In [Lokomotive](github.com/kinvolk/lokomotive), the label used to identify
the controller/master node is `node.kubernetes.io/master`; the reasoning
for this is explained in this [issue](https://github.com/kinvolk/lokomotive/issues/227).
This commit makes the label configurable via the `CONTROLLER_NODE_IDENTIFIER_LABEL`
environment variable in the deployment. If set, its value is used to identify
controller/master nodes; if not set, the existing behaviour is kept and the
existing label is used.
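Roughly, the lookup could behave like the sketch below; the environment variable
name is the one introduced here, while the constant and function names are
illustrative only.
```go
package example

import "os"

// defaultControllerNodeLabel mirrors the previously hard-coded value.
const defaultControllerNodeLabel = "node-role.kubernetes.io/master"

// controllerNodeLabel returns the label used to identify controller/master
// nodes, preferring CONTROLLER_NODE_IDENTIFIER_LABEL when it is set.
func controllerNodeLabel() string {
	if label := os.Getenv("CONTROLLER_NODE_IDENTIFIER_LABEL"); label != "" {
		return label
	}
	return defaultControllerNodeLabel
}
```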
Signed-off-by: Imran Pochi <imran@kinvolk.io>
This commit adds `equinixmetal://` as another provider ID prefix to consider,
alongside the existing `packet://` prefix.
When the Kubernetes API is queried for the providerID in the Node spec, some
machines return `packet://<uuid>` while others return `equinixmetal://<uuid>`.
Without handling the second prefix, the string is not trimmed properly, which
results in a 404 when the untrimmed string is sent to the Equinix Metal API
for device information.
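The trimming boils down to handling both prefixes; a minimal sketch (the helper
name is illustrative, not the actual code path):
```go
package example

import "strings"

// deviceIDFromProviderID strips either known prefix from a Node's
// spec.providerID so the bare device UUID can be sent to the Equinix Metal API.
func deviceIDFromProviderID(providerID string) string {
	id := strings.TrimPrefix(providerID, "packet://")
	id = strings.TrimPrefix(id, "equinixmetal://")
	return id
}
```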
Signed-off-by: Imran Pochi <imran@kinvolk.io>
In the latest version of cluster-autoscaler (cloudprovider: packet), the
code panics and the pods go into CrashLoopBackOff due to an assignment to an
entry in a nil map.
This commit fixes that by initializing the ConfigFile instance.
I believe this situation arises when the config file doesn't contain any
information about the nodepool and no `default` section is present either,
but that alone does not cover the case where a `[Global]` section is defined
in the config file.
Below is the error reproduced when `[Global]` is used in the config
file.
```
panic: assignment to entry in nil map
goroutine 131 [running]:
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet.createPacketManagerRest(0x44cf260, 0xc00085e448, 0xc000456670, 0x1, 0x1, 0x0, 0x0, 0x0, 0x3fe0000000000000, 0x3fe0000000000000, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet/packet_manager_rest.go:307 +0xaca
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet.createPacketManager(0x44cf260, 0xc00085e448, 0xc000456670, 0x1, 0x1, 0x0, 0x0, 0x0, 0x3fe0000000000000, 0x3fe0000000000000, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet/packet_manager.go:64 +0x179
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet.BuildPacket(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet/packet_cloud_provider.go:164 +0xe5
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.buildCloudProvider(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/builder_all.go:91 +0x31f
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.NewCloudProvider(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go:45 +0x1e6
k8s.io/autoscaler/cluster-autoscaler/core.initializeDefaultOptions(0xc0013876e0, 0x452ef01, 0xc000d80e20)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:101 +0x2fd
k8s.io/autoscaler/cluster-autoscaler/core.NewAutoscaler(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:65 +0x43
main.buildAutoscaler(0xc000313600, 0xc000d00000, 0x4496df, 0x7f9c7b60b4f0)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:337 +0x368
main.run(0xc00063e230)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:343 +0x39
main.main.func2(0x453b440, 0xc00029d380)
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:447 +0x2a
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:207 +0x113
```
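The panic above is the standard nil-map pitfall in Go; the sketch below
reproduces it and shows the kind of initialization the fix adds (type and field
names are made up for illustration, not the provider's actual ones).
```go
package main

// ConfigNodepool and ConfigFile stand in for the cloudprovider's config types.
type ConfigNodepool struct {
	ClusterName string
}

type ConfigFile struct {
	NodeGroups map[string]*ConfigNodepool
}

func main() {
	var cfg ConfigFile
	// Writing to the map while it is still nil panics:
	//   cfg.NodeGroups["pool"] = &ConfigNodepool{} // panic: assignment to entry in nil map
	if cfg.NodeGroups == nil {
		cfg.NodeGroups = map[string]*ConfigNodepool{}
	}
	cfg.NodeGroups["pool"] = &ConfigNodepool{ClusterName: "example"}
}
```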
Signed-off-by: Imran Pochi <imran@kinvolk.io>
The pricing JSON for us-east-1 is currently 129MB. Fetching it entirely into
memory and then parsing it results in a large memory footprint on startup, and
can lead to the autoscaler being OOMKilled.
Change the ReadAll/Unmarshal logic to a streaming decoder to significantly
reduce the memory use.
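The general shape of the change is sketched below; `pricingDoc` and
`fetchPricing` are placeholders rather than the provider's actual names.
```go
package example

import (
	"encoding/json"
	"net/http"
)

// pricingDoc is a placeholder for the real pricing response type.
type pricingDoc struct {
	Products map[string]json.RawMessage `json:"products"`
}

// fetchPricing decodes the response body with a streaming decoder instead of
// reading the full ~129MB payload into a byte slice and unmarshalling it,
// which avoids holding the raw buffer and the decoded value in memory at once.
func fetchPricing(url string) (*pricingDoc, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var doc pricingDoc
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return nil, err
	}
	return &doc, nil
}
```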
Magnum allows using the microversion string "latest",
and it will replace it internally with the highest
microversion that it supports.
This will let the autoscaler use microversion 1.10, which
allows scaling groups to 0 nodes, when it is available.
The autoscaler will still be able to use microversion 1.9
on older versions of Magnum.
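For reference, a sketch of where the microversion is set, assuming the Magnum
client is built with gophercloud (the function name is illustrative):
```go
package example

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack"
)

// newMagnumClient creates a container-infra (Magnum) client and requests the
// "latest" microversion, which Magnum resolves server-side to the highest
// microversion it supports (e.g. 1.10 when available, 1.9 on older Magnum).
func newMagnumClient(provider *gophercloud.ProviderClient, region string) (*gophercloud.ServiceClient, error) {
	client, err := openstack.NewContainerInfraV1(provider, gophercloud.EndpointOpts{Region: region})
	if err != nil {
		return nil, err
	}
	client.Microversion = "latest"
	return client, nil
}
```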
csidrivers.storage.k8s.io and csistoragecapacities.storage.k8s.io are available on EKS
1.21. Add permissions for them to the ClusterRole in the example to avoid the
error messages.
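A hedged sketch of the extra rule, expressed with the `k8s.io/api/rbac/v1`
types; the verb list here is an assumption (read-only access), so check it
against the example manifest.
```go
package example

import rbacv1 "k8s.io/api/rbac/v1"

// extraCSIRule grants read access to the CSI resources that trigger the
// error messages on EKS 1.21 when missing from the ClusterRole.
var extraCSIRule = rbacv1.PolicyRule{
	APIGroups: []string{"storage.k8s.io"},
	Resources: []string{"csidrivers", "csistoragecapacities"},
	Verbs:     []string{"get", "list", "watch"},
}
```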
Since Cluster Autoscaler versioning should be in sync with Kubernetes,
update-vendor.sh can simply set the version after a successful
dependency update.
Recent changes configured providers to set the stable node label names
exclusively (i.e. LabelTopologyZone and not LabelZoneFailureDomain, etc.),
with the older label names backfilled at nodeInfo template generation time
(from GetNodeInfoFromTemplate), which isn't invoked from most test cases.
GCE NodePrice() might have been dereferencing potentially missing labels.
Also run hack/update-gofmt.sh where hack/verify-all.sh fails, to pass CI,
and at the same time only set stable labels in all buildGenericLabels
implementations.
This fixes issues that occur when a node group has no nodes yet and node labels
are built using buildGenericLabels and the node-template labels.
Issues include (anti-)affinity rules and nodeSelectors on the given labels
producing false-negative results for candidate nodes, which leads to ASGs
never scaling up.
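In outline, the approach looks like the sketch below: set only the stable names
where labels are built, and mirror them onto the legacy names when the template
nodeInfo is generated (function names here are illustrative).
```go
package example

import apiv1 "k8s.io/api/core/v1"

// buildTopologyLabels sets only the stable topology label names, as
// buildGenericLabels implementations now do.
func buildTopologyLabels(region, zone string) map[string]string {
	return map[string]string{
		apiv1.LabelTopologyRegion: region,
		apiv1.LabelTopologyZone:   zone,
	}
}

// backfillDeprecatedLabels mirrors the stable labels onto their legacy names so
// selectors that still use the old labels keep matching template nodes.
func backfillDeprecatedLabels(labels map[string]string) {
	if zone, ok := labels[apiv1.LabelTopologyZone]; ok {
		labels[apiv1.LabelZoneFailureDomain] = zone
	}
	if region, ok := labels[apiv1.LabelTopologyRegion]; ok {
		labels[apiv1.LabelZoneRegion] = region
	}
}
```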
The new `CreateInstances()` upscale method, which replaces `Resize()` API
calls, generates new instance names based on the MIG's name (from
`mig.GceRef()`).
Before that change, `Resize()`-initiated upscales prompted MIGs to
spawn instances named after the MIG's `BaseInstanceName` attribute.
Accordingly, `GetMigForInstance()` (still) uses the MIG's `BaseInstanceName`
to map instances to their parent MIG and to discover which MIGs need an
immediate refresh.
Down the line, the `clusterstate.updateReadinessStats()` periodic
goroutines won't be able to map new ready nodes to their parent MIGs
(until the cache is backfilled from the k8s node's providerID, i.e.
by an hourly goroutine), and those MIGs will be considered non-ready
(because the MIG's size is > 0 while the MIG has no known ready instances).
So after a first upscale, MIGs whose `BaseInstanceName` differs from the
MIG's name won't be upscalable again for a while. Example symptoms:
```
cluster-autoscaler W0719 12:35:43.166563 6 clusterstate.go:447] Failed to find readiness information for https://www.googleapis.com/compute/v1/projects/REDACTED-PROJECT/zones/europe-west3-b/instanceGroups/REDACTED-MIGNAME
cluster-autoscaler W0719 12:35:43.193469 6 clusterstate.go:626] Readiness for node group https://www.googleapis.com/compute/v1/projects/REDACTED-PROJECT/zones/europe-west3-b/instanceGroups/REDACTED-MIGNAME not found
```
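A rough illustration of the mismatch (not the actual GCE provider code): the
readiness mapping only recognizes the `BaseInstanceName` prefix, while
`CreateInstances()` names new instances after the MIG itself.
```go
package example

import "strings"

// instanceMatchesMig sketches the prefix-based mapping: an instance created
// from the MIG name (e.g. "mymig-abcd") won't match a differing base
// instance name (e.g. "mybase"), so the node can't be mapped to its MIG.
func instanceMatchesMig(instanceName, baseInstanceName string) bool {
	return strings.HasPrefix(instanceName, baseInstanceName+"-")
}
```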
Besides the mapping cache issue, this changed the instance name prefixes for
some users, while it might make sense to keep using base names when
explicitly provided (they might be used for e.g. identification, or to stay
within name length limits) and to avoid a breaking change before
`CreateInstances` hits a release.