Commit Graph

5141 Commits

Author SHA1 Message Date
Kubernetes Prow Robot a498045443
Merge pull request #4237 from DataDog/autoscaling-options-azure
implement GetOptions for Azure
2021-08-24 09:47:14 -07:00
Kubernetes Prow Robot b0681fca7c
Merge pull request #4277 from filintod/fix-autoscaler-ns-permit
fix 4256 autoscaler permit
2021-08-24 01:43:15 -07:00
Benjamin Pineau 28cd49c09e implement GetOptions for Azure
Support per-VMSS (scaledown) settings as permitted by the
cloudprovider's interface `GetOptions()` method.
2021-08-24 09:48:51 +02:00
Filinto Duran 38ccc59458
fix 4256 autoscaler permit
it does not need update, only watch/list/get
2021-08-23 15:41:24 -05:00
Kubernetes Prow Robot d09b8931bb
Merge pull request #4236 from DataDog/autoscaling-options-gce
implement GetOptions for GCE
2021-08-23 03:48:00 -07:00
Kubernetes Prow Robot 0e6f5fb25d
Merge pull request #4278 from jmnote/patch-1
presources → resources
2021-08-22 22:38:00 -07:00
Jmnote 288b8aa4fe
presources → resources
presources → resources
2021-08-23 12:20:38 +09:00
Benjamin Pineau d905ec28dd implement GetOptions for GCE
Support per-MIG (scaledown) settings as permitted by the
cloudprovider's interface `GetOptions()` method.
2021-08-21 18:18:48 +02:00
Kubernetes Prow Robot fb8fdf819b
Merge pull request #4274 from kinvolk/imran/cloud-provider-packet-fix
Cloud provider[Packet] fixes
2021-08-19 11:35:25 -07:00
Imran Pochi 8e6f109dab packet: Add documentation regarding new env variable
Adds documentation regarding env variables introduced:

 - PACKET_CONTROLLER_NODE_IDENTIFIER_LABEL
 - INSTALLED_CCM

Signed-off-by: Imran Pochi <imran@kinvolk.io>
2021-08-19 21:04:53 +05:30
Imran Pochi 87beac1af7 packet: make controller node label configurable
Currently the label to identify controller/master node is hard coded to
`node-role.kubernetes.io/master`.

There have been some conversations centered around replacing the label
with `node-role.kubernetes.io/control-plane`.

In [Lokomotive](github.com/kinvolk/lokomotive), the label to identify
the controller/master node is `node.kubernetes.io/master`; the reasons
for this are mentioned in this [issue](https://github.com/kinvolk/lokomotive/issues/227).

This commit makes the label configurable by setting an env variable in
the deployment, `CONTROLLER_NODE_IDENTIFIER_LABEL`. If it is set, the value
of the env variable is used for identifying controller/master nodes. If it
is not set, the existing behaviour is kept and the existing label is
used.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
2021-08-19 21:04:53 +05:30
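The fallback described in commit 87beac1af7 can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the commit: the function `controllerNodeLabel`, the constants, and the injected `getenv` parameter are hypothetical names, and treating an empty value as "not set" is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

const (
	// Env variable name as given in the commit message.
	labelEnvVar = "CONTROLLER_NODE_IDENTIFIER_LABEL"
	// Previously hard-coded label, as given in the commit message.
	defaultControllerLabel = "node-role.kubernetes.io/master"
)

// controllerNodeLabel returns the label from the environment when it is set
// to a non-empty value, and the default label otherwise. The lookup function
// is injected (hypothetical design choice) so the logic is easy to test.
func controllerNodeLabel(getenv func(string) string) string {
	if v := getenv(labelEnvVar); v != "" {
		return v
	}
	return defaultControllerLabel
}

func main() {
	// In a real process the lookup would simply be os.Getenv.
	fmt.Println(controllerNodeLabel(os.Getenv))
}
```

Passing `os.Getenv` keeps the production call a one-liner, while a test can pass a closure that returns a fixed value.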
Imran Pochi 0041a393ec packet: Consider the string prefix equinixmetal://
This commit adds another string prefix to consider `equinixmetal://`
along with the existing prefix `packet://`.

When the K8s API is queried to get the providerID from the Node Spec, some
machines return `packet://<uuid>`, whereas others return `equinixmetal://`.
This causes an error because the string is not trimmed properly, which
results in a 404 when the untrimmed string is sent to the Equinix Metal API
for device information.

Signed-off-by: Imran Pochi <imran@kinvolk.io>
2021-08-19 21:04:52 +05:30
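The prefix handling described in commit 0041a393ec might look something like the sketch below. The helper `deviceID` and the variable `providerIDPrefixes` are hypothetical names chosen for illustration; only the two prefix strings come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// providerIDPrefixes lists the two prefixes named in the commit message.
var providerIDPrefixes = []string{"packet://", "equinixmetal://"}

// deviceID strips whichever known prefix the providerID carries and returns
// the remainder. An ID without a known prefix is returned unchanged.
func deviceID(providerID string) string {
	for _, p := range providerIDPrefixes {
		if strings.HasPrefix(providerID, p) {
			return strings.TrimPrefix(providerID, p)
		}
	}
	return providerID
}

func main() {
	fmt.Println(deviceID("packet://1234-abcd"))
	fmt.Println(deviceID("equinixmetal://1234-abcd"))
}
```

Both calls in `main` yield the same bare identifier, which is what a device lookup would need.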
Imran Pochi 6e95bccc96 packet: fix panic on entry in nil map
In the latest version of cluster-autoscaler (cloudprovider: packet), the
code panics and the pods go into CrashLoopBackOff due to an entry
assignment on a nil map.

This commit fixes that by initializing the ConfigFile instance.

I believe this situation arises when the config file doesn't contain
any information about the nodepool and the `default` config is also not
present, but this does not take care of the use case where the `Global`
section is defined in the config file.

Below is the error reproduced when `[Global]` is used in the config
file.

```
panic: assignment to entry in nil map

goroutine 131 [running]:
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet.createPacketManagerRest(0x44cf260, 0xc00085e448, 0xc000456670, 0x1, 0x1, 0x0, 0x0, 0x0, 0x3fe0000000000000, 0x3fe0000000000000, ...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet/packet_manager_rest.go:307 +0xaca
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet.createPacketManager(0x44cf260, 0xc00085e448, 0xc000456670, 0x1, 0x1, 0x0, 0x0, 0x0, 0x3fe0000000000000, 0x3fe0000000000000, ...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet/packet_manager.go:64 +0x179
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet.BuildPacket(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/packet/packet_cloud_provider.go:164 +0xe5
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.buildCloudProvider(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/builder_all.go:91 +0x31f
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.NewCloudProvider(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go:45 +0x1e6
k8s.io/autoscaler/cluster-autoscaler/core.initializeDefaultOptions(0xc0013876e0, 0x452ef01, 0xc000d80e20)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:101 +0x2fd
k8s.io/autoscaler/cluster-autoscaler/core.NewAutoscaler(0x3fe0000000000000, 0x3fe0000000000000, 0x1bf08eb000, 0x1176592e000, 0xa, 0x0, 0x4e200, 0x0, 0x186a0000000000, 0x0, ...)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:65 +0x43
main.buildAutoscaler(0xc000313600, 0xc000d00000, 0x4496df, 0x7f9c7b60b4f0)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:337 +0x368
main.run(0xc00063e230)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:343 +0x39
main.main.func2(0x453b440, 0xc00029d380)
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:447 +0x2a
created by k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
	/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:207 +0x113
```

Signed-off-by: Imran Pochi <imran@kinvolk.io>
2021-08-19 19:14:06 +05:30
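For readers unfamiliar with this class of panic, the sketch below reproduces the general failure mode from commit 6e95bccc96 and the usual remedy. The types `ConfigFile` and `NodePoolConfig`, the field `NodePools`, and the constructor `newConfigFile` are hypothetical stand-ins, not the actual types in the packet cloud provider.

```go
package main

import "fmt"

// NodePoolConfig and ConfigFile are simplified, hypothetical stand-ins.
type NodePoolConfig struct {
	ClusterName string
}

type ConfigFile struct {
	NodePools map[string]*NodePoolConfig
}

// newConfigFile returns a ConfigFile whose map is initialized, so that
// entries can be assigned without panicking.
func newConfigFile() *ConfigFile {
	return &ConfigFile{NodePools: make(map[string]*NodePoolConfig)}
}

func main() {
	var zero ConfigFile
	// The zero value holds a nil map. Reading from it is safe, but
	// zero.NodePools["default"] = ... would panic with
	// "assignment to entry in nil map".
	fmt.Println(zero.NodePools == nil)

	cfg := newConfigFile()
	cfg.NodePools["default"] = &NodePoolConfig{ClusterName: "example"}
	fmt.Println(len(cfg.NodePools))
}
```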
Kubernetes Prow Robot c64e97570a
Merge pull request #4234 from by211/by211-patch-1
Fix markdown code not showing correctly
2021-08-17 06:01:13 -07:00
Kubernetes Prow Robot 5766bb7137
Merge pull request #4210 from caogj/fix-usage
fixed flag usages
2021-08-16 05:29:17 -07:00
Kubernetes Prow Robot 62e79c3636
Merge pull request #4250 from PhdLoLi/master
Fill in the LastUpdateTime Field of VpaCheckpoint Status with Correct Time.
2021-08-16 04:31:48 -07:00
Kubernetes Prow Robot cac6d4be41
Merge pull request #4261 from tghartland/4254-magnum-microversion
Use highest available magnum microversion
2021-08-16 04:23:47 -07:00
Kubernetes Prow Robot b01f84c803
Merge pull request #4199 from aidy/optimise_generate_ec2
Optimise generate ec2
2021-08-16 04:05:47 -07:00
Adrian Lai 1177ee04f3
Optimise GenerateEC2InstanceTypes unmarshal memory usage
The pricing json for us-east-1 is currently 129MB. Currently fetching
this into memory and parsing results in a large memory footprint on
startup, and can lead to the autoscaler being OOMKilled.

Change the ReadAll/Unmarshal logic to a stream decoder to significantly
reduce the memory use.
2021-08-16 11:12:59 +01:00
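The change from ReadAll/Unmarshal to a stream decoder in commit 1177ee04f3 can be illustrated with the standard library's `json.Decoder`. The sketch below is not the commit's code: it assumes a document with a top-level `products` object whose entries carry `attributes.instanceType`, and the names `product`, `skipValue`, `instanceTypes`, and `instanceTypesFromString` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// product holds only the field this sketch cares about (assumed layout).
type product struct {
	Attributes struct {
		InstanceType string `json:"instanceType"`
	} `json:"attributes"`
}

// skipValue consumes exactly one JSON value token by token, without
// buffering it.
func skipValue(dec *json.Decoder) error {
	depth := 0
	for {
		tok, err := dec.Token()
		if err != nil {
			return err
		}
		if d, ok := tok.(json.Delim); ok {
			switch d {
			case '{', '[':
				depth++
			case '}', ']':
				depth--
			}
		}
		if depth == 0 {
			return nil
		}
	}
}

// instanceTypes walks the document and decodes one product at a time,
// instead of reading the whole body and unmarshalling it in one go.
func instanceTypes(r io.Reader) ([]string, error) {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // opening '{' of the document
		return nil, err
	}
	var types []string
	for dec.More() {
		key, err := dec.Token()
		if err != nil {
			return nil, err
		}
		if name, _ := key.(string); name != "products" {
			if err := skipValue(dec); err != nil {
				return nil, err
			}
			continue
		}
		if _, err := dec.Token(); err != nil { // opening '{' of products
			return nil, err
		}
		for dec.More() {
			if _, err := dec.Token(); err != nil { // product key
				return nil, err
			}
			var p product
			if err := dec.Decode(&p); err != nil {
				return nil, err
			}
			if t := p.Attributes.InstanceType; t != "" {
				types = append(types, t)
			}
		}
		if _, err := dec.Token(); err != nil { // closing '}' of products
			return nil, err
		}
	}
	return types, nil
}

// instanceTypesFromString is a small convenience wrapper for examples.
func instanceTypesFromString(doc string) ([]string, error) {
	return instanceTypes(strings.NewReader(doc))
}

func main() {
	doc := `{"products":{"SKU1":{"attributes":{"instanceType":"m5.large"}}}}`
	fmt.Println(instanceTypesFromString(doc))
}
```

In practice the reader would be an HTTP response body, so only the product currently being decoded needs to be resident in memory.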
Kubernetes Prow Robot ece1c4010a
Merge pull request #4246 from brightframe/cuppett/aws-add-storage-apis
Fix: Adding additional storage APIs discovered but unable to be watched on EKS
2021-08-16 01:07:47 -07:00
Kubernetes Prow Robot 5d754993c9
Merge pull request #3999 from DataDog/bump-asg-per-describe
aws: Set maxAsgNamesPerDescribe to the new maximum value
2021-08-16 01:01:47 -07:00
Thomas Hartland 13cd70eaa5 Use highest available magnum microversion
Magnum allows using the microversion string "latest",
and it will replace it internally with the highest
microversion that it supports.

This will let the autoscaler use microversion 1.10 which
allows scaling groups to 0 nodes, if it is available.

The autoscaler will still be able to use microversion 1.9
on older versions of magnum.
2021-08-13 13:14:51 +02:00
Kubernetes Prow Robot 0265412f69
Merge pull request #4243 from sherman-grewal/updater-deploy-add-ns-env-var
Add NAMESPACE as an environment variable to the updater deployment config
2021-08-13 03:16:23 -07:00
Kubernetes Prow Robot 0df0eecea4
Merge pull request #4257 from x13n/patch-2
Make CA version on HEAD match k8s version in go.mod
2021-08-12 06:09:47 -07:00
Daniel Kłobuszewski fcb0e665ef
Make CA version on HEAD match k8s version in go.mod
2021-08-12 11:46:52 +02:00
Kubernetes Prow Robot 9482d47f7e
Merge pull request #4253 from olagacek/master
Extend ScaleUpStatus structure with ScaleUpError field.
2021-08-12 02:17:47 -07:00
Aleksandra Gacek b194c6f252 Extend ScaleUpStatus structure with ScaleUpError field.
2021-08-12 10:40:58 +02:00
Lijing Wang 80cc8487d0
Merge branch 'kubernetes:master' into master
2021-08-10 12:16:05 +08:00
Kubernetes Prow Robot e9b605ae48
Merge pull request #4245 from x13n/update-vendor
Update Cluster Autoscaler version with vendor
2021-08-09 07:25:32 -07:00
Adrian Lai 329c6522b0
Break out unmarshal from GenerateEC2InstanceTypes
Refactor to allow for optimisation
2021-08-09 10:46:30 +01:00
Kubernetes Prow Robot 7e5f8f0045
Merge pull request #4179 from DataDog/aws-requests-metrics
Metrics for AWS API calls
2021-08-09 01:19:31 -07:00
Stephen Cuppett 409a5d9c78 Fix: Adding additional APIs discovered but unable to be watched on EKS
csidrivers.storage.k8s.io and csistoragecapacities.storage.k8s.io are available on EKS
1.21. Adding permissions to the ClusterRole in the example to avoid the error
messages.
2021-08-08 18:42:55 -04:00
Kubernetes Prow Robot d86a875927
Merge pull request #4222 from harshagv/priority-configmap-annotation
allow adding annotations for priority-expander configmap
2021-08-06 08:43:19 -07:00
Daniel Kłobuszewski b95a611216 Update Cluster Autoscaler version with vendor
Since Cluster Autoscaler versioning should be in sync with Kubernetes,
update-vendor.sh can simply set the version after a successful
dependency update.
2021-08-06 17:30:09 +02:00
Sherman Grewal 8b624757bf Add NAMESPACE as an environment variable to the updater deployment config
2021-08-05 23:32:10 -04:00
Kubernetes Prow Robot 2dd92cbf37 allow adding annotations for priority-expander configmap
2021-08-06 08:09:19 +08:00
Kubernetes Prow Robot ca49c2c7ae
Merge pull request #4050 from afirth/patch-1
Add example to AWS readme if taint has value
2021-08-05 13:29:20 -07:00
Kubernetes Prow Robot 4b4bc85aa1
Merge pull request #4046 from sylr/aws-log
Improve misleading log
2021-08-05 12:23:19 -07:00
Kubernetes Prow Robot 9d0946bccb
Merge pull request #4241 from towca/jtuznik/n2-pricing
GCE: add pricing info for new N2 instance types
2021-08-05 01:45:23 -07:00
Jakub Tużnik 19dffbc145 GCE: add pricing info for new N2 instance types
2021-08-05 10:29:38 +02:00
Kubernetes Prow Robot 9d54f7b782
Merge pull request #4239 from BigDarkClown/move-update-labels
Move UpdateDeprecatedTemplateLabels function
2021-08-04 07:49:24 -07:00
Bartłomiej Wróblewski 1e4cb1eafe Move UpdateDeprecatedTemplateLabels function
This is a useful function; we will benefit from
having it more accessible than it is currently.
2021-08-04 14:32:39 +00:00
Kubernetes Prow Robot c563a40a60
Merge pull request #4235 from DataDog/fix-tests-and-gcp-pricing
cluster-autoscaler: fix unit tests
2021-08-03 03:12:48 -07:00
Benjamin Pineau 79c63a7b3c cluster-autoscaler: fix tests and GCE NodePrice
Recent changes configured providers to set only the stable node label names
(i.e. LabelTopologyZone and not LabelZoneFailureDomain, etc.), with the
older label names backfilled when nodeInfo templates are generated
(from GetNodeInfoFromTemplate), which isn't invoked from most test cases.
GCE NodePrice() might have been dereferencing potentially missing labels.
Also run hack/update-gofmt.sh where hack/verify-all.sh fails, to pass CI.
2021-08-03 08:28:49 +02:00
Kubernetes Prow Robot 21fc0c1889
Merge pull request #4053 from codablock/old-labels
Also set new (non-beta/non-deprecated) labels in buildGenericLabels
2021-08-01 18:21:21 -07:00
by211 f2eefa9a26
Fix markdown code not showing correctly
2021-07-31 12:25:03 -05:00
Alexander Block 6d84abf0de Remove obsolete comment
arch is not hardcoded anymore
2021-07-29 16:45:09 +02:00
Alexander Block 8f11490c0c Introduce UpdateDeprecatedTemplateLabels to set beta/deprecated labels
And at the same time only set stable labels in all buildGenericLabels
implementations.

This fixes issues when a node group has 0 nodes yet and node labels are
built using buildGenericLabels and the node-template labels.

Issues include (anti-)affinity and nodeSelectors for the given labels,
giving false-negative results for candidate nodes, which leads to ASGs
never scaling up.
2021-07-29 16:45:08 +02:00
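The idea behind commit 8f11490c0c, setting only stable labels and deriving the deprecated ones from them, can be sketched as follows. This is an assumption-laden illustration rather than the actual `UpdateDeprecatedTemplateLabels`: the function `backfillDeprecatedLabels` and the map `deprecatedLabelFor` are hypothetical, and the choice not to overwrite an existing deprecated label is an assumption. The label keys themselves are the well-known Kubernetes node labels.

```go
package main

import "fmt"

// deprecatedLabelFor maps each stable well-known node label to its
// deprecated (beta) counterpart.
var deprecatedLabelFor = map[string]string{
	"topology.kubernetes.io/zone":      "failure-domain.beta.kubernetes.io/zone",
	"topology.kubernetes.io/region":    "failure-domain.beta.kubernetes.io/region",
	"node.kubernetes.io/instance-type": "beta.kubernetes.io/instance-type",
	"kubernetes.io/arch":               "beta.kubernetes.io/arch",
	"kubernetes.io/os":                 "beta.kubernetes.io/os",
}

// backfillDeprecatedLabels copies the value of every stable label that is
// present onto its deprecated key, unless that key is already set.
func backfillDeprecatedLabels(labels map[string]string) {
	for stable, deprecated := range deprecatedLabelFor {
		value, ok := labels[stable]
		if !ok {
			continue
		}
		if _, exists := labels[deprecated]; !exists {
			labels[deprecated] = value
		}
	}
}

func main() {
	labels := map[string]string{"topology.kubernetes.io/zone": "europe-west3-b"}
	backfillDeprecatedLabels(labels)
	fmt.Println(labels["failure-domain.beta.kubernetes.io/zone"])
}
```

With both keys present on the template node, a nodeSelector or affinity rule written against either the stable or the deprecated label can match a node group that currently has 0 nodes.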
Kubernetes Prow Robot 1ecc8b43e1
Merge pull request #4225 from DataDog/gce-createinstances-basename
GCE: CreateInstances() should use BaseInstanceName
2021-07-29 05:10:19 -07:00
Benjamin Pineau 655bc6fd4a GCE: CreateInstances() should use BaseInstanceName
The new `CreateInstances()` upscale method replacing `Resize()` API
calls generates new instance names based on the MIG's name (from
`mig.GceRef()`).

Before that change, `Resize()`-initiated upscales were prompting MIGs to
spawn instances named after MIG's `BaseInstanceName` attribute.

Accordingly, `GetMigForInstance()` (still) uses MIG's `BaseInstanceName`
to map instances to their parent MIG and discover which MIGs need an
immediate refresh.

Down the line the `clusterstate.updateReadinessStats()` periodic
goroutines won't be able to map new ready nodes to their parent MIGs
(until the cache is backfilled upward from k8s node's providerid, ie.
from an hourly goroutine), and those MIGs will be considered non-ready
(because MIG's size>0 while the MIG has no known ready instances).

So after a first upscale, MIGs (having a BaseInstanceName that is not
the MIG's Name) won't be re-upscalable for a while. Example symptoms:

```
cluster-autoscaler W0719 12:35:43.166563 6 clusterstate.go:447] Failed to find readiness information for https://www.googleapis.com/compute/v1/projects/REDACTED-PROJECT/zones/europe-west3-b/instanceGroups/REDACTED-MIGNAME
cluster-autoscaler W0719 12:35:43.193469 6 clusterstate.go:626] Readiness for node group https://www.googleapis.com/compute/v1/projects/REDACTED-PROJECT/zones/europe-west3-b/instanceGroups/REDACTED-MIGNAME not found
```

Besides the mapping cache issue, this changed the instance name prefixes for
some users, while it might make sense to keep using basenames when
explicitly provided (they might be used for e.g. identification, or name
length limits) and avoid a breaking change before `CreateInstances` hits
a release.
2021-07-29 12:41:12 +02:00
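To make the mapping problem from commit 655bc6fd4a concrete, here is a rough sketch of a name-based lookup. It is not the real `GetMigForInstance()`: the type `mig`, the function `migForInstance`, and the assumption that instance names have the form `<base instance name>-<suffix>` are all illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// mig is a simplified, hypothetical view of a managed instance group.
type mig struct {
	Name             string
	BaseInstanceName string
}

// migForInstance returns the first MIG whose BaseInstanceName, followed by
// a dash, prefixes the instance name.
func migForInstance(instance string, migs []mig) (mig, bool) {
	for _, m := range migs {
		if strings.HasPrefix(instance, m.BaseInstanceName+"-") {
			return m, true
		}
	}
	return mig{}, false
}

func main() {
	migs := []mig{{Name: "workers-mig", BaseInstanceName: "gke-workers"}}

	// An instance named after BaseInstanceName maps back to its MIG.
	_, ok := migForInstance("gke-workers-x7k2", migs)
	fmt.Println(ok)

	// An instance named after the MIG's Name does not.
	_, ok = migForInstance("workers-mig-x7k2", migs)
	fmt.Println(ok)
}
```

The second lookup fails because the instance was named after the MIG's `Name` while the lookup keys on `BaseInstanceName`, which is the mismatch the commit message describes.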