The logic is very basic and will likely need to be revised, but it's
something for initial testing. Utilization of a given Pool is calculated
as the number of allocated devices in the pool divided by the number of
all devices in the pool. For scale-down purposes, the max utilization
of all Node-local Pools is used.
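To make the formula concrete, here is a minimal sketch of the calculation using made-up types; the real code presumably derives the device counts from the ResourceSlices and ResourceClaim allocations tracked in the snapshot:

```go
package utilization

import "fmt"

// poolStats is an assumed aggregate for a single DRA Pool: how many devices
// the Pool exposes in total and how many are already allocated to claims.
type poolStats struct {
	allocated int
	total     int
}

// poolUtilization is the ratio described above: allocated devices divided by
// all devices in the Pool.
func poolUtilization(p poolStats) (float64, error) {
	if p.total == 0 {
		return 0, fmt.Errorf("pool has no devices")
	}
	return float64(p.allocated) / float64(p.total), nil
}

// nodeDRAUtilization returns the max utilization across all Node-local Pools,
// which is the value used for scale-down decisions.
func nodeDRAUtilization(pools []poolStats) (float64, error) {
	if len(pools) == 0 {
		return 0, fmt.Errorf("no Node-local Pools")
	}
	highest := 0.0
	for _, p := range pools {
		util, err := poolUtilization(p)
		if err != nil {
			return 0, err
		}
		if util > highest {
			highest = util
		}
	}
	return highest, nil
}
```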
The new logic is mostly behind the DRA flag guard, so this should be a no-op
if the flag is disabled. The only difference should be that FilterOutUnremovable
marks a Node as unremovable if calculating utilization fails. I'm not sure
why this wasn't the case before, but I think we need it for DRA: if CA sees an
incomplete picture of a resource pool, we probably don't want to scale
the Node down.
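Purely as an illustration of that error path (all names below are made up, not the actual CA scale-down code):

```go
package sketch

import "errors"

type node struct{ name string }

// calcDRAUtilization stands in for the real utilization calculation; it can
// fail when CA only sees an incomplete picture of a resource pool.
func calcDRAUtilization(n node) (float64, error) {
	return 0, errors.New("resource pool data incomplete")
}

// filterOutUnremovable keeps only Nodes whose utilization could be computed;
// any calculation error makes the Node unremovable for scale-down purposes.
func filterOutUnremovable(nodes []node) (removable, unremovable []node) {
	for _, n := range nodes {
		if _, err := calcDRAUtilization(n); err != nil {
			unremovable = append(unremovable, n)
			continue
		}
		removable = append(removable, n)
	}
	return removable, unremovable
}
```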
The Snapshot can hold all DRA objects in the cluster, and expose them
to the scheduler framework via the SharedDRAManager interface.
The state of the objects can be modified during autoscaling simulations
using the provided methods.
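As a rough illustration, the Snapshot could look something like the sketch below; the field layout, method name, and API version are assumptions rather than the actual type:

```go
package snapshot

import (
	"fmt"

	resourceapi "k8s.io/api/resource/v1beta1"
	"k8s.io/apimachinery/pkg/types"
)

// Snapshot holds all DRA objects in the cluster and is what gets exposed to
// the scheduler framework through the SharedDRAManager interface.
type Snapshot struct {
	resourceClaims map[types.NamespacedName]*resourceapi.ResourceClaim
	resourceSlices map[string][]*resourceapi.ResourceSlice // keyed by Node name
	deviceClasses  map[string]*resourceapi.DeviceClass
}

// ReserveClaimForPod is an example of a state-modifying method used during
// autoscaling simulations: it records a Pod reservation on the in-memory copy
// of a claim without touching the real cluster.
func (s *Snapshot) ReserveClaimForPod(claimRef types.NamespacedName, podName string, podUID types.UID) error {
	claim, found := s.resourceClaims[claimRef]
	if !found {
		return fmt.Errorf("ResourceClaim %s not tracked in the snapshot", claimRef)
	}
	claim.Status.ReservedFor = append(claim.Status.ReservedFor, resourceapi.ResourceClaimConsumerReference{
		Resource: "pods",
		Name:     podName,
		UID:      podUID,
	})
	return nil
}
```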
RunReserveOnNode runs the Reserve phase of schedulerframework,
which is necessary to obtain ResourceClaim allocations computed
by the DRA scheduler plugin.
RunReserveOnNode isn't used anywhere yet, so this should be a no-op.
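For reference, a sketch of roughly what RunReserveOnNode needs to do, assuming an initialized schedulerframework.Framework and a CycleState that already went through the PreFilter/Filter phases for the pod (the actual plumbing in CA differs):

```go
package predicate

import (
	"context"
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// runReserveOnNode runs the Reserve phase for pod on nodeName. For DRA, this
// is the phase in which the dynamicresources plugin computes ResourceClaim
// allocations and writes them back through the SharedDRAManager.
func runReserveOnNode(ctx context.Context, fwk schedulerframework.Framework, state *schedulerframework.CycleState, pod *apiv1.Pod, nodeName string) error {
	status := fwk.RunReservePluginsReserve(ctx, state, pod, nodeName)
	if !status.IsSuccess() {
		return fmt.Errorf("couldn't reserve node %s for pod %s/%s: %v", nodeName, pod.Namespace, pod.Name, status.Message())
	}
	return nil
}
```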
All the NodeInfo methods have to take DRA into account, and the logic
for that will be the same for different ClusterSnapshotStore implementations.
Instead of duplicating the new logic in Basic and Delta, the methods
are moved to ClusterSnapshot and the logic will be implemented once in
PredicateSnapshot.
PredicateSnapshot will use the DRA Snapshot exposed by its ClusterSnapshotStore
to implement these methods. The DRA Snapshot has to be stored in the
ClusterSnapshotStore layer, as we need to be able to fork/commit/revert it.
Lower-level methods for adding/removing just the schedulerframework.NodeInfo
parts are added to ClusterSnapshotStore. PredicateSnapshot utilizes these methods
to implement AddNodeInfo and RemoveNodeInfo.
This should be a no-op; it's just a refactor.
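A condensed sketch of the resulting split, with the method sets heavily trimmed and the lower-level method names assumed:

```go
package clustersnapshot

import schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"

// ClusterSnapshotStore only stores the scheduler framework NodeInfos (and,
// later, the DRA snapshot) and handles Fork/Commit/Revert.
type ClusterSnapshotStore interface {
	AddSchedulerNodeInfo(nodeInfo *schedulerframework.NodeInfo) error
	RemoveSchedulerNodeInfo(nodeName string) error
	Fork()
	Commit() error
	Revert()
}

// ClusterSnapshot adds the DRA-aware, higher-level methods on top of a store.
type ClusterSnapshot interface {
	ClusterSnapshotStore
	AddNodeInfo(nodeInfo *NodeInfo) error
	RemoveNodeInfo(nodeName string) error
}

// NodeInfo stands in for CA's own NodeInfo wrapper, which with DRA enabled
// also carries the Node-local ResourceSlices and Pod ResourceClaims.
type NodeInfo struct {
	SchedNodeInfo *schedulerframework.NodeInfo
}

// PredicateSnapshot implements the higher-level methods once, for any store
// (Basic or Delta), by delegating to the lower-level store methods and (in
// later commits) updating the DRA snapshot alongside.
type PredicateSnapshot struct {
	store ClusterSnapshotStore
}

func (s *PredicateSnapshot) AddNodeInfo(nodeInfo *NodeInfo) error {
	return s.store.AddSchedulerNodeInfo(nodeInfo.SchedNodeInfo)
}

func (s *PredicateSnapshot) RemoveNodeInfo(nodeName string) error {
	return s.store.RemoveSchedulerNodeInfo(nodeName)
}
```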
Store the DRA snapshot inside the current internal data in
SetClusterState().
Retrieve the DRA snapshot from the current internal data in
DraSnapshot().
Clone the DRA snapshot whenever the internal data is cloned
during Fork(). This matches the forking logic that BasicSnapshotStore
uses, ensuring that the DRA object state is correctly
forked/committed/reverted during the corresponding ClusterSnapshot
operations.
This should be a no-op, as DraSnapshot() isn't called anywhere yet,
and no DRA snapshot is passed to SetClusterState() yet.
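A simplified sketch of how this could look in the store, with assumed names and everything except the DRA snapshot stripped from the internal data:

```go
package store

// draSnapshot stands in for the real DRA Snapshot type; only Clone matters
// for this sketch (the real Clone would deep-copy the tracked objects).
type draSnapshot struct{}

func (s draSnapshot) Clone() draSnapshot { return s }

// internalData is one layer of the store's fork stack; NodeInfo maps and the
// other existing fields are omitted here.
type internalData struct {
	draSnapshot draSnapshot
}

type basicSnapshotStore struct {
	data []*internalData // the last element is the currently active fork
}

// SetClusterState stores the DRA snapshot in the current internal data
// (the existing node/pod parameters are left out of this sketch).
func (s *basicSnapshotStore) SetClusterState(dra draSnapshot) {
	s.data[len(s.data)-1].draSnapshot = dra
}

// DraSnapshot retrieves the DRA snapshot from the current internal data.
func (s *basicSnapshotStore) DraSnapshot() draSnapshot {
	return s.data[len(s.data)-1].draSnapshot
}

// Fork clones the current internal data, including the DRA snapshot, so the
// DRA object state is forked/committed/reverted together with everything else.
func (s *basicSnapshotStore) Fork() {
	current := s.data[len(s.data)-1]
	s.data = append(s.data, &internalData{draSnapshot: current.draSnapshot.Clone()})
}
```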
A new DRA Snapshot type is introduced, for now with just dummy methods
to be implemented in later commits. The new type is intended to hold all
DRA objects in the cluster.
ClusterSnapshotStore.SetClusterState() is extended to take the new DRA Snapshot in
addition to the existing parameters.
ClusterSnapshotStore.DraSnapshot() is added to retrieve the DRA snapshot previously
set by SetClusterState(). This will be used by PredicateSnapshot to implement DRA
logic later.
This should be a no-op, as DraSnapshot() is never called, and no DRA
snapshot is passed to SetClusterState() yet.
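Sketched with a stubbed-out snapshot type and trimmed parameter lists, the extended interface surface looks roughly like this:

```go
package clustersnapshot

import apiv1 "k8s.io/api/core/v1"

// DraSnapshot is a stand-in for the new DRA Snapshot type introduced above.
type DraSnapshot struct{}

type ClusterSnapshotStore interface {
	// SetClusterState now takes the DRA snapshot in addition to the existing
	// nodes and scheduled pods.
	SetClusterState(nodes []*apiv1.Node, scheduledPods []*apiv1.Pod, draSnapshot DraSnapshot) error
	// DraSnapshot returns the snapshot passed to the last SetClusterState call.
	// PredicateSnapshot will later use it to implement the DRA logic.
	DraSnapshot() DraSnapshot
	// Fork/Commit/Revert and the other existing methods are omitted here.
}
```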
Make SharedDRAManager a part of the ClusterSnapshotStore interface, and
implement dummy methods to satisfy the interface. Actual implementation
will come in later commits.
This is needed so that ClusterSnapshot can feed DRA objects to the DRA
scheduler plugin, and obtain ResourceClaim modifications back from it.
The integration is behind the DRA flag guard, so this should be a no-op
if the flag is disabled.
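A trimmed sketch of the interface change; the concrete method set of SharedDRAManager is left to the scheduler framework's definition, and the dummy methods only need to satisfy it:

```go
package clustersnapshot

import schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"

// ClusterSnapshotStore now doubles as the scheduler framework's
// SharedDRAManager, which is how the DRA scheduler plugin lists DRA objects
// from the snapshot and reports ResourceClaim modifications back to it.
type ClusterSnapshotStore interface {
	// Existing snapshot-store methods (SetClusterState, Fork, Commit, Revert,
	// DraSnapshot, ...) are omitted from this sketch.

	schedulerframework.SharedDRAManager
}
```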