Daniel Kłobuszewski
26769e4c1b
Expose nodes with unready GPU in CA status
...
This change simplifies debugging GPU issues: without it, all nodes can
be Ready as far as the Kubernetes API is concerned, but CA will still report
some of them as unready if they are missing the GPU resource. Explicitly calling
them out in the status ConfigMap will point in the right direction.
2022-03-03 14:59:31 +01:00
Kubernetes Prow Robot
994fbac99f
Merge pull request #4661 from olagacek/master
...
Remove disable scale down callback if schedulable pods are found in filter_out_schedulable.
2022-02-03 09:37:46 -08:00
Jayant Jain
a906da2c6e
mig_info_provider.go:fillMigInstances will now use locking when calling the gce api.
...
This is to avoid multiple gce calls for the same mig during scale down (which is done in parallel).
2022-02-03 12:25:53 +00:00
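The locking described in a906da2c6e can be illustrated with a minimal sketch. It is not the actual `mig_info_provider.go` code: `migCache`, `instancesFor`, `fetch`, and `countFetches` are hypothetical names, and `fetch` merely stands in for the GCE API call. The sketch uses one mutex for the whole cache, which is simpler than what a real provider might do, but it shows how holding a lock around the check-then-fetch step keeps parallel callers from issuing duplicate calls for the same MIG.

```go
package main

import (
	"fmt"
	"sync"
)

// migCache is a hypothetical stand-in for a MIG instance cache.
type migCache struct {
	mu        sync.Mutex
	instances map[string][]string
	fetch     func(mig string) []string // stands in for the GCE API call
}

// instancesFor returns the cached instances for mig. The lock is held across
// both the cache lookup and the fetch, so concurrent callers asking for the
// same MIG trigger at most one fetch.
func (c *migCache) instancesFor(mig string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if got, ok := c.instances[mig]; ok {
		return got
	}
	got := c.fetch(mig)
	c.instances[mig] = got
	return got
}

// countFetches runs n concurrent lookups for one MIG and reports how many
// times the (fake) API was called.
func countFetches(n int) int {
	calls := 0
	c := &migCache{
		instances: map[string][]string{},
		fetch: func(mig string) []string {
			calls++ // only ever runs while c.mu is held
			return []string{mig + "-instance-0"}
		},
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.instancesFor("mig-a")
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println("fetch calls:", countFetches(8)) // prints "fetch calls: 1"
}
```

A single mutex serializes lookups for all MIGs; a per-MIG lock would allow more parallelism at the cost of extra bookkeeping.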
Aleksandra Gacek
834d02b2d5
Remove disable scale down callback if schedulable pods are found in
...
filter_out_schedulable.
2022-02-02 15:23:31 +01:00
Maciek Pytel
a8f4981f4f
Update import paths to clock utils library
2022-01-28 16:56:21 -08:00
Marwan Ahmed
b0da013ec2
update vendor directory
2022-01-28 16:47:06 -08:00
Marwan Ahmed
4d4ecbef02
increase azure clients polling delay to 30s
2022-01-28 13:58:23 -08:00
Marwan Ahmed
6689f92cbc
update delete async calls in scale sets
2022-01-28 13:58:15 -08:00
Marwan Ahmed
82c480d221
bump az cloudprovider version
2022-01-28 13:58:11 -08:00
Jayant Jain
a3db650c26
CA: Debugging snapshotter locking optimisation for better transactions
2022-01-27 11:36:19 +00:00
Kubernetes Prow Robot
b64d2949a5
Merge pull request #4633 from jayantjain93/debugging-snapshot-1
...
CA: Debugging snapshot adding a new field for TemplateNode.
2022-01-27 03:02:25 -08:00
Kubernetes Prow Robot
f508212e9d
Merge pull request #4641 from x13n/nodeinfocache
...
Don't cache NodeInfo for recently Ready nodes
2022-01-27 01:56:10 -08:00
Daniel Kłobuszewski
9944137fae
Don't cache NodeInfo for recently Ready nodes
...
There's a race condition between DaemonSet pods getting scheduled to a
new node and Cluster Autoscaler caching that node for the sake of
predicting future nodes in a given node group. We can reduce the risk of
missing some DaemonSet by providing a grace period before accepting nodes in the
cache. 1 minute should be more than enough, except for some pathological
edge cases.
2022-01-26 20:18:53 +01:00
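The grace period described in 9944137fae can be sketched as a simple time check. This is an illustration under assumptions, not the Cluster Autoscaler implementation: `cacheGracePeriod`, `shouldCache`, and `demo` are hypothetical names, and only the 1 minute value comes from the commit message.

```go
package main

import (
	"fmt"
	"time"
)

// cacheGracePeriod mirrors the "1 minute" from the commit message;
// the constant name is an assumption.
const cacheGracePeriod = time.Minute

// shouldCache reports whether a node that became Ready at readySince has been
// Ready long enough for its DaemonSet pods to have been scheduled, so that it
// is reasonable to use it as a template for future nodes.
func shouldCache(readySince, now time.Time) bool {
	return now.Sub(readySince) >= cacheGracePeriod
}

// demo evaluates a node that has been Ready for 30 seconds and one that has
// been Ready for 2 minutes.
func demo() (fresh, settled bool) {
	now := time.Date(2022, 1, 26, 20, 0, 0, 0, time.UTC)
	return shouldCache(now.Add(-30*time.Second), now),
		shouldCache(now.Add(-2*time.Minute), now)
}

func main() {
	fresh, settled := demo()
	fmt.Println(fresh, settled) // prints "false true"
}
```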
Kubernetes Prow Robot
44170bc038
Merge pull request #4648 from marwanad/moar-instances
...
update azure instances and template with np-series SKU
2022-01-25 16:38:26 -08:00
Marwan Ahmed
24537b1ab7
properly set FPGA capacity
2022-01-25 15:51:08 -08:00
Kubernetes Prow Robot
28f549e4d1
Merge pull request #4636 from nxtlytics/fix-aws-asg-tags
...
Allow colon in AWS ASG autodiscovery tag keys
2022-01-25 15:35:42 -08:00
Marwan Ahmed
21a758c635
update azure instances with np-series
2022-01-25 14:36:52 -08:00
Jayant Jain
537e07fdb1
CA: Debugging snapshot adding a new field for TemplateNode. This captures all the templates for nodegroups present
2022-01-24 17:12:57 +00:00
Tyler Montgomery
afc835a5dd
allow colon in aws asg discovery tag names, update documentation
2022-01-21 10:34:20 -06:00
Joel Speed
9f670d4ea8
Ensure ClusterAPI DeleteNodes accounts for out of band changes scale
...
Because the autoscaler assumes it can delete nodes in parallel, it
fetches nodegroups for each node in separate goroutines and then
instructs each nodegroup to delete a single node.
Because we don't share the nodegroup across goroutines, the cached
replica count in the scalableresource can become stale and as such, if
the autoscaler attempts to scale down multiple nodes at a time, the
cluster api provider only actually removes a single node.
To prevent this, we must ensure we have a fresh replica count for every
scale down attempt.
2022-01-21 16:08:00 +00:00
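The stale replica count described in 9f670d4ea8 can be reproduced with a small model. Everything here is hypothetical (`scalableResource`, `nodeGroup`, `deleteNodeStale`, `deleteNodeFresh`, `scaleDown`) and does not reflect the Cluster API provider's real types. It only shows why writing "cached count minus one" from several goroutines removes a single node, while re-reading the count before each decrement removes all of them.

```go
package main

import (
	"fmt"
	"sync"
)

// scalableResource is a hypothetical stand-in for the object that holds the
// authoritative replica count.
type scalableResource struct {
	mu       sync.Mutex
	replicas int
}

// nodeGroup mimics a per-goroutine node group that captured the replica
// count at the time it was created.
type nodeGroup struct {
	res    *scalableResource
	cached int
}

func newNodeGroup(res *scalableResource) *nodeGroup {
	res.mu.Lock()
	defer res.mu.Unlock()
	return &nodeGroup{res: res, cached: res.replicas}
}

// deleteNodeStale writes cached-1, so concurrent callers overwrite each other.
func (g *nodeGroup) deleteNodeStale() {
	g.res.mu.Lock()
	defer g.res.mu.Unlock()
	g.res.replicas = g.cached - 1
}

// deleteNodeFresh uses the current count instead of the cached one.
func (g *nodeGroup) deleteNodeFresh() {
	g.res.mu.Lock()
	defer g.res.mu.Unlock()
	g.res.replicas--
}

// scaleDown removes n nodes in parallel, one node group per goroutine, and
// returns the remaining replica count.
func scaleDown(start, n int, fresh bool) int {
	res := &scalableResource{replicas: start}
	groups := make([]*nodeGroup, n)
	for i := range groups {
		groups[i] = newNodeGroup(res) // every group sees the same starting count
	}
	var wg sync.WaitGroup
	for _, g := range groups {
		wg.Add(1)
		go func(g *nodeGroup) {
			defer wg.Done()
			if fresh {
				g.deleteNodeFresh()
			} else {
				g.deleteNodeStale()
			}
		}(g)
	}
	wg.Wait()
	return res.replicas
}

func main() {
	fmt.Println("stale:", scaleDown(5, 3, false)) // prints "stale: 4"
	fmt.Println("fresh:", scaleDown(5, 3, true))  // prints "fresh: 2"
}
```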
Kubernetes Prow Robot
5c741c881d
Merge pull request #4626 from lzhecheng/remove-deleteblob-ut
...
Remove TestDeleteBlob UT
2022-01-19 17:53:52 -08:00
Zhecheng Li
5b99b58ba1
Remove TestDeleteBlob UT
...
Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>
2022-01-20 09:28:18 +08:00
Kubernetes Prow Robot
f8266a5101
Merge pull request #4627 from yaroslava-serdiuk/templates
...
GCE: Add m2-megamem-416 price
2022-01-19 07:06:06 -08:00
Yaroslava Serdiuk
abacf124ad
GCE: Add m2-megamem-416 price
2022-01-19 14:51:22 +00:00
Kubernetes Prow Robot
698c02b17c
Merge pull request #4603 from yaroslava-serdiuk/templates
...
Introduce gce image types and remove *_containerd gce os distributions
2022-01-19 04:56:04 -08:00
Yaroslava Serdiuk
5380a9dd83
Cluster-Autoscaler: Introduce gce image types and remove *_containerd gce os distributions.
2022-01-19 12:26:36 +00:00
Kubernetes Prow Robot
91e8f8e40c
Merge pull request #4617 from kisieland/add_context_to_scale_down_processor
...
Add AutoscalingContext to the scale-down post-processor
2022-01-18 03:07:08 -08:00
Maciek Pytel
217d780160
Add FAQ entry about the go version used
2022-01-18 10:22:57 +01:00
Maciek Pytel
24f896cd9d
Add go:build tags matching existing +build tags
...
As of go1.17 both tags are expected to exist simultaneously.
Added tags in all cluster autoscaler files. Added verify-gomod.sh
exceptions for non-compliant autogenerated VPA files.
2022-01-18 10:22:57 +01:00
Daniel Gutowski
a230b47fec
Add AutoscalingContext to the scale-down post-processor
2022-01-18 07:58:53 +00:00
Benjamin Pineau
1aca77527a
azure: change a flaky test
...
It seems that test gets varying error messages, which prompted
Bartłomiej's previous fix, but I'm now seeing the original error
message string back ("Server failed to authenticate [...]"),
so the `TestDeleteBlob` test is failing again (other PRs' test
failures suggest it's not just my laptop).
Let's assume this was meant to check for an error until someone
can confirm; that might be better than potentially hiding other
PRs' real test failures.
2022-01-17 19:01:05 +01:00
Kubernetes Prow Robot
f5de590bea
Merge pull request #4580 from cprivite/Rename_Packet_to_Equinix_Metal
...
Rename packet to equinix metal
2022-01-13 08:04:30 -08:00
Kubernetes Prow Robot
441d7968fa
Merge pull request #4519 from kisieland/scale_down_candidate_select_processor
...
Introduce the scale down processor that picks the final scale down candidates
2022-01-13 08:02:30 -08:00
Kubernetes Prow Robot
b9bfdc1bbc
Merge pull request #4579 from randomvariable/remove-randomvariable-owners
...
Cluster API OWNERS: Remove randomvariable
2022-01-13 07:12:30 -08:00
Kubernetes Prow Robot
80574ca166
Merge pull request #4508 from aledbf/done-error
...
Cluster Autoscaler: GCE: check the result of the operation
2022-01-13 07:08:30 -08:00
Kubernetes Prow Robot
00721caf97
Merge pull request #4582 from cprivite/Use_Current_cluster-autoscaler_image_In_Example
...
use gcr hosted cluster-autoscaler image
2022-01-13 06:18:30 -08:00
Bartłomiej Wróblewski
f0a9ede345
Fix constant used in azure unit tests
2022-01-11 16:05:16 +00:00
Kubernetes Prow Robot
b3576e0cdc
Merge pull request #4507 from ByteAlex/hetzner-node-name
...
Shorten Hetzner's node names with hex repr
2022-01-09 19:09:12 -08:00
Chris Privitere
a220224889
use gcr hosted cluster-autoscaler image
...
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
2022-01-06 20:59:59 +00:00
Chris Privitere
c4e1aa247e
Add note to readme about the rename of Packet.
...
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
2022-01-05 20:26:16 +00:00
Chris Privitere
8f8d071b9e
Update example facility and machine plans to current versions.
...
Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>
2022-01-05 18:20:31 +00:00
Chris Privitere
0396f5c3c9
Rename packet to Equinix Metal
2022-01-05 17:45:48 +00:00
Naadir Jeewa
ee761bdc24
Cluster API OWNERS: Remove randomvariable
...
Signed-off-by: Naadir Jeewa <jeewan@vmware.com>
2022-01-05 15:11:21 +00:00
Daniel Gutowski
8064d6d1fd
Introduce the scale down processor that picks the final scale down candidates.
2022-01-03 16:05:36 +00:00
Jayant Jain
729038ff2d
Adding support for Debugging Snapshot
2021-12-30 09:08:05 +00:00
Qi Ni
dc64e41104
chore: remove a time-consuming unit test in provider azure
2021-12-27 10:37:52 +08:00
Kubernetes Prow Robot
6d19e3ddb9
Merge pull request #4441 from marwanad/fix-pod-equivalence-perf
...
fix pod equivalency checks for pods with projected volumes
2021-12-24 04:12:15 -08:00
Kubernetes Prow Robot
fca1dc0513
Merge pull request #4550 from marwanad/csi-topology-label-ignore-scale-from-zero
...
ignore azure csi topology label for similarity checks and populate it for scale from zero
2021-12-23 03:30:37 -08:00
Marwan Ahmed
fd089c2d15
avoid double wrapping scale up error
2021-12-22 15:47:05 +02:00
Kubernetes Prow Robot
7b19d33de7
Merge pull request #4345 from sergelogvinov/create-timeout
...
Increase server create timeout
2021-12-22 03:23:35 -08:00