autoscaler

Commit Graph

Author	SHA1	Message	Date
Daniel Kłobuszewski	26769e4c1b	Expose nodes with unready GPU in CA status This change simplifies debugging GPU issues: without it, all nodes can be Ready as far as the Kubernetes API is concerned, but CA will still report some of them as unready if they are missing the GPU resource. Explicitly calling them out in the status ConfigMap points in the right direction.	2022-03-03 14:59:31 +01:00
Kubernetes Prow Robot	994fbac99f	Merge pull request #4661 from olagacek/master Remove disable scale down callback if schedulable pods are found in filter_out_schedulable.	2022-02-03 09:37:46 -08:00
Jayant Jain	a906da2c6e	mig_info_provider.go:fillMigInstances will now use locking when calling the gce api. This is to avoid multiple gce calls for the same mig during scale down (which is done in parallel).	2022-02-03 12:25:53 +00:00
Aleksandra Gacek	834d02b2d5	Remove disable scale down callback if schedulable pods are found in filter_out_schedulable.	2022-02-02 15:23:31 +01:00
Maciek Pytel	a8f4981f4f	Update import paths to clock utils library	2022-01-28 16:56:21 -08:00
Marwan Ahmed	b0da013ec2	update vendor directory	2022-01-28 16:47:06 -08:00
Marwan Ahmed	4d4ecbef02	increase azure clients polling delay to 30s	2022-01-28 13:58:23 -08:00
Marwan Ahmed	6689f92cbc	update delete async calls in scale sets	2022-01-28 13:58:15 -08:00
Marwan Ahmed	82c480d221	bump az cloudprovider version	2022-01-28 13:58:11 -08:00
Jayant Jain	a3db650c26	CA: Debugging snapshotter locking optimisation for better transactions	2022-01-27 11:36:19 +00:00
Kubernetes Prow Robot	b64d2949a5	Merge pull request #4633 from jayantjain93/debugging-snapshot-1 CA: Debugging snapshot adding a new field for TemplateNode.	2022-01-27 03:02:25 -08:00
Kubernetes Prow Robot	f508212e9d	Merge pull request #4641 from x13n/nodeinfocache Don't cache NodeInfo for recently Ready nodes	2022-01-27 01:56:10 -08:00
Daniel Kłobuszewski	9944137fae	Don't cache NodeInfo for recently Ready nodes There's a race condition between DaemonSet pods getting scheduled to a new node and Cluster Autoscaler caching that node for the sake of predicting future nodes in a given node group. We can reduce the risk of missing some DaemonSet by providing a grace period before accepting nodes in the cache. 1 minute should be more than enough, except for some pathological edge cases.	2022-01-26 20:18:53 +01:00
Kubernetes Prow Robot	44170bc038	Merge pull request #4648 from marwanad/moar-instances update azure instances and template with np-series SKU	2022-01-25 16:38:26 -08:00
Marwan Ahmed	24537b1ab7	properly set FPGA capacity	2022-01-25 15:51:08 -08:00
Kubernetes Prow Robot	28f549e4d1	Merge pull request #4636 from nxtlytics/fix-aws-asg-tags Allow colon in AWS ASG autodiscovery tag keys	2022-01-25 15:35:42 -08:00
Marwan Ahmed	21a758c635	update azure instances with np-series	2022-01-25 14:36:52 -08:00
Jayant Jain	537e07fdb1	CA: Debugging snapshot adding a new field for TemplateNode. This captures all the templates for nodegroups present	2022-01-24 17:12:57 +00:00
Tyler Montgomery	afc835a5dd	allow colon in aws asg discovery tag names, update documentation	2022-01-21 10:34:20 -06:00
Joel Speed	9f670d4ea8	Ensure ClusterAPI DeleteNodes accounts for out of band changes scale Because the autoscaler assumes it can delete nodes in parallel, it fetches nodegroups for each node in separate go routines and then instructs each nodegroup to delete a single node. Because we don't share the nodegroup across go routines, the cached replica count in the scalableresource can become stale and as such, if the autoscaler attempts to scale down multiple nodes at a time, the cluster api provider only actually removes a single node. To prevent this, we must ensure we have a fresh replica count for every scale down attempt.	2022-01-21 16:08:00 +00:00
Kubernetes Prow Robot	5c741c881d	Merge pull request #4626 from lzhecheng/remove-deleteblob-ut Remove TestDeleteBlob UT	2022-01-19 17:53:52 -08:00
Zhecheng Li	5b99b58ba1	Remove TestDeleteBlob UT Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>	2022-01-20 09:28:18 +08:00
Kubernetes Prow Robot	f8266a5101	Merge pull request #4627 from yaroslava-serdiuk/templates GCE: Add m2-megamem-416 price	2022-01-19 07:06:06 -08:00
Yaroslava Serdiuk	abacf124ad	GCE: Add m2-megamem-416 price	2022-01-19 14:51:22 +00:00
Kubernetes Prow Robot	698c02b17c	Merge pull request #4603 from yaroslava-serdiuk/templates Introduce gce image types and remove *_containerd gce os distributions	2022-01-19 04:56:04 -08:00
Yaroslava Serdiuk	5380a9dd83	Cluster-Autoscaler: Introduce gce image types and remove *_containerd gce os distributions.	2022-01-19 12:26:36 +00:00
Kubernetes Prow Robot	91e8f8e40c	Merge pull request #4617 from kisieland/add_context_to_scale_down_processor Add AutoscalingContext to the scale-down post-processor	2022-01-18 03:07:08 -08:00
Maciek Pytel	217d780160	Add FAQ entry about the go version used	2022-01-18 10:22:57 +01:00
Maciek Pytel	24f896cd9d	Add go:build tags matching existing +build tags As of go1.17 both tags are expected to exist simultaneously. Added tags in all cluster autoscaler files. Added verify-gomod.sh exceptions for non-compliant autogenerated VPA files.	2022-01-18 10:22:57 +01:00
Daniel Gutowski	a230b47fec	Add AutoscalingContext to the scale-down post-processor	2022-01-18 07:58:53 +00:00
Benjamin Pineau	1aca77527a	azure: change a flaky test It seems that test gets varying error messages which prompted Bartłomiej's previous fix, but I'm now seeing the original error message string back ("Server failed to authenticate [...]"), so that `TestDeleteBlob` test is failing again (other PRs' test failures suggest that's not just my laptop). Let's assume this was meant to check for an error, until someone can confirm, that might be better than potentially hiding other PRs' real test failures.	2022-01-17 19:01:05 +01:00
Kubernetes Prow Robot	f5de590bea	Merge pull request #4580 from cprivite/Rename_Packet_to_Equinix_Metal Rename packet to equinix metal	2022-01-13 08:04:30 -08:00
Kubernetes Prow Robot	441d7968fa	Merge pull request #4519 from kisieland/scale_down_candidate_select_processor Introduce the scale down processor that picks the final scale down candidates	2022-01-13 08:02:30 -08:00
Kubernetes Prow Robot	b9bfdc1bbc	Merge pull request #4579 from randomvariable/remove-randomvariable-owners Cluster API OWNERS: Remove randomvariable	2022-01-13 07:12:30 -08:00
Kubernetes Prow Robot	80574ca166	Merge pull request #4508 from aledbf/done-error Cluster Autoscaler: GCE: check the result of the operation	2022-01-13 07:08:30 -08:00
Kubernetes Prow Robot	00721caf97	Merge pull request #4582 from cprivite/Use_Current_cluster-autoscaler_image_In_Example use gcr hosted cluster-autoscaler image	2022-01-13 06:18:30 -08:00
Bartłomiej Wróblewski	f0a9ede345	Fix constant used in azure unit tests	2022-01-11 16:05:16 +00:00
Kubernetes Prow Robot	b3576e0cdc	Merge pull request #4507 from ByteAlex/hetzner-node-name Shorten Hetzners node names with hex repr	2022-01-09 19:09:12 -08:00
Chris Privitere	a220224889	use gcr hosted cluster-autoscaler image Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>	2022-01-06 20:59:59 +00:00
Chris Privitere	c4e1aa247e	Add note to readme about the rename of Packet. Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>	2022-01-05 20:26:16 +00:00
Chris Privitere	8f8d071b9e	Update example facility and machine plans to current versions. Signed-off-by: Chris Privitere <cprivite@users.noreply.github.com>	2022-01-05 18:20:31 +00:00
Chris Privitere	0396f5c3c9	Rename packet to Equinix Metal	2022-01-05 17:45:48 +00:00
Naadir Jeewa	ee761bdc24	Cluster API OWNERS: Remove randomvariable Signed-off-by: Naadir Jeewa <jeewan@vmware.com>	2022-01-05 15:11:21 +00:00
Daniel Gutowski	8064d6d1fd	Introduce the scale down processor that picks the final scale down candidates.	2022-01-03 16:05:36 +00:00
Jayant Jain	729038ff2d	Adding support for Debugging Snapshot	2021-12-30 09:08:05 +00:00
Qi Ni	dc64e41104	chore: remove a time-consuming unit test in provider azure	2021-12-27 10:37:52 +08:00
Kubernetes Prow Robot	6d19e3ddb9	Merge pull request #4441 from marwanad/fix-pod-equivalence-perf fix pod equivalency checks for pods with projected volumes	2021-12-24 04:12:15 -08:00
Kubernetes Prow Robot	fca1dc0513	Merge pull request #4550 from marwanad/csi-topology-label-ignore-scale-from-zero ignore azure csi topology label for similarity checks and populate it for scale from zero	2021-12-23 03:30:37 -08:00
Marwan Ahmed	fd089c2d15	avoid double wrapping scale up error	2021-12-22 15:47:05 +02:00
Kubernetes Prow Robot	7b19d33de7	Merge pull request #4345 from sergelogvinov/create-timeout Increase server create timeout	2021-12-22 03:23:35 -08:00


3004 Commits