autoscaler

Commit Graph

Author	SHA1	Message	Date
Kubernetes Prow Robot	160af8b0ba	Merge pull request #6556 from damikag/binpacking-time-limiter implement time limiter for binpacking	2024-06-17 06:51:31 -07:00
Damika Gamlath	0728d157c2	implement time limiter for binpacking	2024-06-13 11:50:13 +00:00
Rahul Rangith	333d438dbf	Default min/max sizes for Azure VMSSs return a struct	2024-06-11 13:55:38 -04:00
Aleksandra Malinowska	11de3075fe	Use the same processors for all currently supported provisioning classes	2024-06-06 12:11:34 +02:00
Yaroslava Serdiuk	4ed5df201b	Public ProvisioningClass interface	2024-06-04 09:34:44 +00:00
Yaroslava Serdiuk	08a49c0c66	Review remarks	2024-05-13 09:03:55 +00:00
Yaroslava Serdiuk	5f94f2c429	Add provreqOrchestrator that handle ProvReq classes (#6627) * Add provreqOrchestrator that handle ProvReq classes * Review remarks * Review remarks	2024-04-17 09:37:54 -07:00
Karol Wychowaniec	702883d72f	Add an option to Cluster Autoscaler that allows triggering new loops more frequently: based on new unschedulable pods and every time a previous iteration was productive.	2024-03-15 14:46:02 +00:00
Yaroslava Serdiuk	dffff4f557	Add ProvisioningRequest injector (#6529) * Add ProvisioningRequests injector * Add test case for Accepted conditions and add supported provreq classes list * Use Passive clock	2024-02-28 02:14:49 -08:00
Mahmoud Atwa	e7ff1cd90f	Introduce LocalSSDSizeProvider interface for GCE	2024-02-16 09:10:27 +00:00
Yaroslava Serdiuk	5286b3f770	Add ProvisioningRequestProcessor (#6488)	2024-02-14 05:14:46 -08:00
Yaroslava Serdiuk	ed6ebbe8ba	ScaleUp for check-capacity ProvisioningRequestClass (#6451) * ScaleUp for check-capacity ProvisioningRequestClass * update condition logic * Update tests * Naming update * Update cluster-autoscaler/core/scaleup/orchestrator/wrapper_orchestrator_test.go Co-authored-by: Bartek Wróblewski <bwroblewski@google.com> --------- Co-authored-by: Bartek Wróblewski <bwroblewski@google.com>	2024-01-30 02:36:59 -08:00
vadasambar	5de49a11fb	feat: support `--scale-down-delay-after-` per nodegroup Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: update scale down status after every scale up - move scaledown delay status to cluster state/registry - enable scale down if `ScaleDownDelayTypeLocal` is enabled - add new funcs on cluster state to get and update scale down delay status - use timestamp instead of booleans to track scale down delay status Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use existing fields on clusterstate - uses `scaleUpRequests`, `scaleDownRequests` and `scaleUpFailures` instead of `ScaleUpDelayStatus` - changed the above existing fields a little to make them more convenient for use - moved initializing scale down delay processor to static autoscaler (because clusterstate is not available in main.go) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove note saying only `scale-down-after-add` is supported - because we are supporting all the flags Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: evaluate `scaleDownInCooldown` the old way only if `ScaleDownDelayTypeLocal` is set to `false` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove line saying `--scale-down-delay-type-local` is only supported for `--scale-down-delay-after-add` - because it is not true anymore - we are supporting all `--scale-down-delay-after-` flags per nodegroup Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix clusterstate tests failing Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: move back initializing processors logic to from static autoscaler to main - we don't want to initialize processors in static autoscaler because anyone implementing an alternative to static_autoscaler has to initialize the processors - and initializing specific processors is making static autoscaler aware of an implementation detail which might not be the best practice Signed-off-by: vadasambar 
<surajrbanakar@gmail.com> refactor: revert changes related to `clusterstate` - since I am going with observer pattern Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add observer interface for state of scaling - to implement observer pattern for tracking state of scale up/downs (as opposed to using clusterstate to do the same) - refactor `ScaleDownCandidatesDelayProcessor` to use fields from the new observer Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove params passed to `clearScaleUpFailures` - not needed anymore Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: revert clusterstate tests - approach has changed - I am not making any changes in clusterstate now Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: add accidentally deleted lines for clusterstate test Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: implement `Add` fn for scale state observer - to easily add new observers - re-word comments - remove redundant params from `NewDefaultScaleDownCandidatesProcessor` Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: CI complaining because no comments on fn definitions Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: initialize parent `ScaleDownCandidatesProcessor` - instead of `ScaleDownCandidatesSortingProcessor` and `ScaleDownCandidatesDelayProcessor` separately Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: add scale state notifier to list of default processors - initialize processors for `NewDefaultScaleDownCandidatesProcessor` outside and pass them to the fn - this allows more flexibility Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: add observer interface - create a separate observer directory - implement `RegisterScaleUp` function in the clusterstate - TODO: resolve syntax errors Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: use `scaleStateNotifier` in place of `clusterstate` - delete leftover `scale_stateA_observer.go` (new one is 
already present in `observers` directory) - register `clustertstate` with `scaleStateNotifier` - use `Register` instead of `Add` function in `scaleStateNotifier` - fix `go build` - wip: fixing tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix syntax errors - add utils package `pointers` for converting `time` to pointer (without having to initialize a new variable) Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: wip track scale down failures along with scale up failures - I was tracking scale up failures but not scale down failures - fix copyright year 2017 -> 2023 for the new `pointers` package Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: register failed scale down with scale state notifier - wip writing tests for `scale_down_candidates_delay_processor` - fix CI lint errors - remove test file for `scale_down_candidates_processor` (there is not much to test as of now) Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: wip tests for `ScaleDownCandidatesDelayProcessor` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add unit tests for `ScaleDownCandidatesDelayProcessor` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: don't track scale up failures in `ScaleDownCandidatesDelayProcessor` - not needed Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: better doc comments for `TestGetScaleDownCandidates` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: don't ignore error in `NGChangeObserver` - return it instead and let the caller decide what to do with it Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: change pointers to values in `NGChangeObserver` interface - easier to work with - remove `expectedAddTime` param from `RegisterScaleUp` (not needed for now) - add tests for clusterstate's `RegisterScaleUp` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: conditions in `GetScaleDownCandidates` - set scale down in cool down if the number of scale down 
candidates is 0 Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: use `ng1` instead of `ng2` in existing test Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: wip static autoscaler tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: assign directly instead of using `sdProcessor` variable - variable is not needed Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: first working test for static autoscaler Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: continue working on static autoscaler tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: wip second static autoscaler test Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove `Println` used for debugging Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add static_autoscaler tests for scale down delay per nodegroup flags Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: rebase off the latest `master` - change scale state observer interface's `RegisterFailedScaleup` to reflect latest changes around clusterstate's `RegisterFailedScaleup` in `master` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix clusterstate test failing Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: fix failing orchestrator test Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `defaultScaleDownCandidatesProcessor` -> `combinedScaleDownCandidatesProcessor` - describes the processor better Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: replace `NGChangeObserver` -> `NodeGroupChangeObserver` - makes it easier to understand for someone not familiar with the codebase Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: reword code comment `after` -> `for which` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: don't return error from `RegisterScaleDown` - not needed as of now (no implementer function returns a non-nil error for this function) Signed-off-by: vadasambar 
<surajrbanakar@gmail.com> refactor: address review comments around ng change observer interface - change dir structure of nodegroup change observer package - stop returning errors wherever it is not needed in the nodegroup change observer interface - rename `NGChangeObserver` -> `NodeGroupChangeObserver` interface (makes it easier to understand) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: make nodegroupchange observer thread-safe Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add TODO to consider using multiple mutexes in nodegroupchange observer Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use `time.Now()` directly instead of assigning a variable to it Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: share code for checking if there was a recent scale-up/down/failure Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: convert `ScaleDownCandidatesDelayProcessor` into table tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: change scale state notifier's `Register()` -> `RegisterForNotifications()` - makes it easier to understand what the function does Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: replace scale state notifier `Register` -> `RegisterForNotifications` in test - to fix syntax errors since it is already renamed in the actual code Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove `clusterStateRegistry` from `delete_in_batch` tests - not needed anymore since we have `scaleStateNotifier` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: address PR review comments Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: add empty `RegisterFailedScaleDown` for clusterstate - fix syntax error in static autoscaler test Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2024-01-11 21:46:42 +05:30
Kubernetes Prow Robot	0024f2540e	Merge pull request #6290 from faan11/master Clarify Scale down rule in the documentation (FAQ.md)	2024-01-08 13:08:50 +01:00
faan11	6ec725638c	docs: clarifies scale down operation by CA in FAQ.md and main.go This commit clarifies the condition when a node can be scaled down by the Cluster Autoscaler (CA). The changes updates the section and flag description in the FAQ.md and main.go files.	2024-01-07 23:36:53 +01:00
Yaroslava Serdiuk	d29ffd03b9	Add ProvisioningRequestPodsFilter processor (#6386) * Introduce ProvisioningRequestPodsFilter processor * Review	2024-01-03 11:49:36 +01:00
Joachim Bartosik	a5e540d5da	Restore flags for setting QPS limit in CA Partially undo #6274. I noticed that with this change CA get rate limited and slows down significantly (especially during large scale downs).	2023-12-29 13:28:08 +00:00
Kubernetes Prow Robot	fc48d5c052	Merge pull request #6139 from damikag/priority-evictor Implement priority based evictor	2023-12-21 18:18:53 +01:00
damikag	9ffbea4408	implement priority based evictor and refactor drain logic	2023-12-21 16:57:05 +00:00
Kubernetes Prow Robot	b95adf1fa9	Merge pull request #6246 from prashantrewar/deprecate-unused-flag Deprecate unused node-autoprovisioning-enabled and max-autoprovisioned-node-group-count flags	2023-12-14 17:26:40 +01:00
Prashant Rewar	cbebabb452	deprecate unused node-autoprovisioning-enabled and max-autoprovisioned-node-group-count flags Signed-off-by: Prashant Rewar <108176843+prashantrewar@users.noreply.github.com>	2023-12-09 13:49:05 +05:30
Kubernetes Prow Robot	0a1d74f352	Merge pull request #6294 from vadasambar/refactor/kube-client refactor(*): move getKubeClient to utils/kubernetes	2023-11-29 19:28:44 +01:00
qianlei.qianl	ae18f05a61	refactor(*): move getKubeClient to utils/kubernetes (cherry picked from commit `b9f636d2ef`) Signed-off-by: qianlei.qianl <qianlei.qianl@bytedance.com> refactor: move logic to create client to utils/kubernetes pkg - expose `CreateKubeClient` as public function - make `GetKubeConfig` into a private `getKubeConfig` function (can be exposed as a public function in the future if needed) Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: CI failing because cloudproviders were not updated to use new autoscaling option fields Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: define errors as constants Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: pass kube client options by value Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-11-29 00:43:59 +05:30
Kubernetes Prow Robot	e23e63a192	Merge pull request #5820 from vadasambar/kwok-poc feat: implement `kwok` cloud provider	2023-11-27 15:15:57 +01:00
vadasambar	cfbee9a4d6	feat: implement kwok cloudprovider feat: wip implement `CloudProvider` interface boilerplate for `kwok` provider Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add builder for `kwok` - add logic to scale up and scale down nodes in `kwok` provider Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: wip parse node templates from file Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add short README Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: implement remaining things - to get the provider in a somewhat working state Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add in-cluster `kwok` as pre-requisite in the README Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: templates file not correctly marshalling into node list Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: `invalid leading UTF-8 octet` error during template parsing - remove encoding using `gob` - not required Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: use lister to get and list - instead of uncached kube client - add lister as a field on the provider and nodegroup struct Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: `did not find nodegroup annotation` error - CA was thinking the annotation is not present even though it is - fix a bug with parsing annotation Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: CA node recognizing fake nodegroups - add provider ID to nodes in the format `kwok:<node-name>` - fix invalid `KwokManagedAnnotation` - sanitize template nodes (remove `resourceVersion` etc.,) - not sanitizing the node leads to error during creation of new nodes - abstract code to get NG name into a separate function `getNGNameFromAnnotation` Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: node not getting deleted Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add empty test file Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: add OWNERS file 
Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: wip kwok provider config - add samples for static and dynamic template nodes Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: wip implement pulling node templates from cluster - add status field to kwok provider config - this is to capture how the nodes would be grouped by (can be annotation or label) - use kwok provider config status to get ng name from the node template Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: syntax error in calling `loadNodeTemplatesFromCluster` Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: first draft of dynamic node templates - this allows node templates to be pulled from the cluster - instead of having to specify static templates manually Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: syntax error Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: abstract out related code into separate files - use named constants instead of hardcoded values Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: cleanup kwok nodes when CA is exiting - so that the user doesn't have to cleanup the fake nodes themselves Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: return `nil` instead of err for `HasInstance` - because there is no underlying cloud provider (hence no reason to return `cloudprovider.ErrNotImplemented` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: start working on tests for kwok provider config Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add `gpuLabelKey` under `nodes` field in kwok provider config - fix validation for kwok provider config Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add motivation doc - update README with more details Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: update kwok provider config example to support pulling gpu labels and types from existing providers - still needs to be implemented in the code Signed-off-by: vadasambar 
<surajrbanakar@gmail.com> feat: wip update kwok provider config to get gpu label and available types Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: wip read gpu label and available types from specified provider - add available gpu types in kwok provider config status Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add validation for gpu fields in kwok provider config - load gpu related fields in kwok provider config status Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: implement `GetAvailableGPUTypes` Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add support to install and uninstall kwok - add option to disable installation - add option to manually specify kwok release tag - add future scope in readme Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add future scope 'evaluate adding support to check if kwok controller already exists' Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: vendor conflict and cyclic import - remove support to get gpu config from the specified provider (can't be used because leads to cyclic import) Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add a TODO 'get gpu config from other providers' Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename `file` -> `configmap` - load config and templates from configmap instead of file - move `nodes` and `nodegroups` config to top level - add helper to encode configmap data into `[]bytes` - add helper to get current pod namespace Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: add new options to the kwok provider config - auto install kwok only if the version is >= v0.4.0 - add test for `GPULabel()` - use `kubectl apply` way of installing kwok instead of kustomize - add test for kwok helpers - add test for kwok config - inject service account name in CA deployment - add example configmap for node templates and kwok provider config in CA helm chart - add permission to create `clusterrolebinding` (so that 
kwok provider can create a clusterrolebinding with `cluster-admin` role and create/delete upstream manifests) - update kwok provider sample configs - update `README` Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: update go.mod to use v1.28 packages Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: `go mod tidy` and `go mod vendor` (again) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: kwok installation code - add functions to create and delete clusterrolebinding to create kwok resources - refactor kwok install and uninstall fns - delete manifests in the opposite order of install ] - add cleaning up left-over kwok installation to future scope Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: nil ptr error - add `TODO` in README for adding docs around kwok config fields Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove code to automatically install and uninstall `kwok` - installing/uninstalling requires strong permissions to be granted to `kwok` - granting strong permissions to `kwok` means granting strong permissions to the entire CA codebase - this can pose a security risk - I have removed the code related to install and uninstall for now - will proceed after discussion with the community Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: run `go mod tidy` and `go mod vendor` Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: add permission to create nodes - to fix permissions error for kwok provider Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add more unit tests - add tests for kwok helpers - fix and update kwok config tests - fix a bug where gpu label was getting assigned to `kwokConfig.status.key` - expose `loadConfigFile` -> `LoadConfigFile` - throw error if templates configmap does not have `templates` key (value of which is node templates) - finish test for `GPULabel()` - add tests for `NodeGroupForNode()` - expose `loadNodeTemplatesFromConfigMap` -> 
`LoadNodeTemplatesFromConfigMap` - fix `KwokCloudProvider`'s kwok config was empty (this caused `GPULabel()` to return empty) Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: abstract provider ID code into `getProviderID` fn - fix provider name in test `kwok` -> `kwok:kind-worker-xxx` Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: run `go mod vendor` and `go mod tidy Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs(cloudprovider/kwok): update info on creating nodegroups based on `hostname/label` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor(charts): replace fromLabelKey value `"kubernetes.io/hostname"` -> `"kwok-nodegroup"` - `"kubernetes.io/hostname"` leads to infinite scale-up Signed-off-by: vadasambar <surajrbanakar@gmail.com> feat: support running CA with kwok provider locally Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use global informer factory Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: use `fromNodeLabelKey: "kwok-nodegroup"` in test templates Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: `Cleanup()` logic - clean up only nodes managed by the kwok provider Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix/refactor: nodegroup creation logic - fix issue where fake node was getting created which caused fatal error - use ng annotation to keep track of nodegroups - (when creating nodegroups) don't process nodes which don't have the right ng nabel - suffix ng name with unix timestamp Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor/test(cloudprovider/kwok): write tests for `BuildKwokProvider` and `Cleanup` - pass only the required node lister to cloud provider instead of the entire informer factory - pass the required configmap name to `LoadNodeTemplatesFromConfigMap` instead of passing the entire kwok provider config - implement fake node lister for testing Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add test case for dynamic 
templates in `TestNodeGroupForNode` - remove non-required fields from template node Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add tests for `NodeGroups()` - add extra node template without ng selector label to add more variability in the test Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: write tests for `GetNodeGpuConfig()` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add test for `GetAvailableGPUTypes` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test: add test for `GetResourceLimiter()` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): add tests for nodegroup's `IncreaseSize()` - abstract error msgs into variables to use them in tests Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): add test for ng `DeleteNodes()` fn - add check for deleting too many nodes - rename err msg var names to make them consistent Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): add tests for ng `DecreaseTargetSize()` - abstract error msgs into variables (for easy use in tests) Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): add test for ng `Nodes()` - add extra test case for `DecreaseTargetSize()` to check lister error Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): add test for ng `TemplateNodeInfo` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): improve tests for `BuildKwokProvider()` - add more test cases - refactor lister for `TestBuildKwokProvider()` and `TestCleanUp()` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): add test for ng `GetOptions` Signed-off-by: vadasambar <surajrbanakar@gmail.com> test(cloudprovider/kwok): unset `KWOK_CONFIG_MAP_NAME` at the end of the test - not doing so leads to failure in other tests - remove `kwokRelease` field from kwok config (not used anymore) - this was causing the tests to fail 
Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: bump CA chart version - this is because of changes made related to kwok - fix type `everwhere` -> `everywhere` Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: fix linting checks Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: address CI lint errors Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: generate helm docs for `kwokConfigMapName` - remove `KWOK_CONFIG_MAP_KEY` (not being used in the code) - bump helm chart version Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: revise the outline for README - add AEP link to the motivation doc Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: wip create an outline for the README - remove `kwok` field from examples (not needed right now) Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: add outline for ascii gifs Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: rename env variable `KWOK_CONFIG_MAP_NAME` -> `KWOK_PROVIDER_CONFIGMAP` Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: update README with info around installation and benefits of using kwok provider - add `Kwok` as a provider in main CA README Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: run `go mod vendor` - remove TODOs that are not needed anymore Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: finish first draft of README Signed-off-by: vadasambar <surajrbanakar@gmail.com> fix: env variable in chart `KWOK_CONFIG_MAP_NAME` -> `KWOK_PROVIDER_CONFIGMAP` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: remove redundant/deprecated code Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: bump chart version `9.30.1` -> `9.30.2` - because of kwok provider related changes Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: fix typo `offical` -> `official` Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: remove debug log msg Signed-off-by: vadasambar <surajrbanakar@gmail.com> 
docs: add links for getting help Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: fix type in log `external cluster` -> `cluster` Signed-off-by: vadasambar <surajrbanakar@gmail.com> chore: add newline in chart.yaml to fix CI lint Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: fix mistake `sig-kwok` -> `sig-scheduling` - kwok is a part if sig-scheduling (there is no sig-kwok) Signed-off-by: vadasambar <surajrbanakar@gmail.com> docs: fix type `release"` -> `"release"` Signed-off-by: vadasambar <surajrbanakar@gmail.com> refactor: pass informer instead of lister to cloud provider builder fn Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-11-25 00:22:47 +05:30
Kubernetes Prow Robot	39245a5613	Merge pull request #6235 from atwamahmoud/ignore-scheduler-processing Ignore scheduler processing	2023-11-22 13:54:30 +01:00
Mahmoud Atwa	a1ae4d3b57	Update flags, Improve tests readability & use Bypass instead of ignore in naming	2023-11-22 11:18:55 +00:00
Mahmoud Atwa	4635a6dc04	Allow users to specify which schedulers to ignore	2023-11-22 11:18:44 +00:00
Mahmoud Atwa	86ab017967	Fix multiple comments and update flags	2023-11-22 11:17:48 +00:00
Mahmoud Atwa	a1ab7b9e20	Add new pod list processors for clearing TPU requests & filtering out expendable pods Treat non-processed pods yet as unschedulable	2023-11-22 11:16:33 +00:00
Artur Żyliński	e836e47c1e	Remove gce-expander-ephemeral-storage-support flag Always enable the feature	2023-11-15 10:59:06 +01:00
Artur Żyliński	747d0b9af4	Cleanup: Remove separate client for k8s events Remove RateLimiting options - rely on APF for apiserver protection. Details: https://github.com/kubernetes/kubernetes/issues/111880	2023-11-14 11:20:36 +01:00
Artem Minyaylov	ab4c5cb8c7	Initialize default drainability rules	2023-10-19 15:55:34 +00:00
Artem Minyaylov	324a33ede8	Pass DeleteOptions once during default rule creation	2023-10-10 20:35:49 +00:00
Artem Minyaylov	a68b748fd7	Refactor NodeDeleteOptions for use in drainability rules	2023-09-29 17:55:19 +00:00
Kubernetes Prow Robot	e461782e27	Merge pull request #6162 from pohly/log-init-fix fix log initialization	2023-09-29 06:32:42 -07:00
Patrick Ohly	b9c0d91da7	fix log initialization InitLogs unconditionally disables contextual logging, while ValidateAndApply checks the feature gate for that. InitLogs must come first, otherwise --feature-gates=ContextualLogging=true doesn't work.	2023-09-29 14:36:02 +02:00
Eric Lin	74d1f7f349	Trim managedFields in shared informer factory Signed-off-by: Eric Lin <exlin@google.com>	2023-09-28 12:45:47 +00:00
Mahmoud Atwa	1bbbbd6036	fix typo	2023-09-27 09:00:32 +00:00
Mahmoud Atwa	267476fd06	Rename variables & methods to StartupTaint... instead of IgnoreTaint	2023-09-26 13:54:52 +00:00
Mahmoud Atwa	f9d3185f16	Rename IgnoreTaints to StartupTaints & deprecate --ignore-taints flag	2023-09-26 09:12:01 +00:00
Mahmoud Atwa	79a55bbb05	Add startup taint flag, prefix & add status taint prefix	2023-09-22 21:08:07 +00:00
Mahmoud Atwa	d2fe118db9	Add startup taint flag, prefix & add status taint prefix	2023-09-22 20:51:34 +00:00
Patrick Ohly	ade5e0814e	fix incomplete startup of informers Previously, SharedInformerFactory.Start was called before core.NewAutoscaler. That had the effect that any new informer created as part of core.NewAutoscaler, in particular in kubernetes.NewListerRegistryWithDefaultListers, never got started. One of them was the DaemonSet informer. This had the effect that the DaemonSet lister had an empty cache and scale down failed with: I0920 11:06:36.046889 31805 cluster.go:164] node gke-cluster-pohly-default-pool-c9f60a43-5rvz cannot be removed: daemonset for kube-system/pdcsi-node-7hnmc is not present, err: daemonset.apps "pdcsi-node" not found This was on a GKE cluster with cluster-autoscaler running outside of the cluster on a development machine.	2023-09-20 11:20:35 +02:00
Kubernetes Prow Robot	f9a7c7f73f	Merge pull request #6114 from linxiulei/cmd Allow setting content-type in command	2023-09-19 06:23:07 -07:00
Eric Lin	61d784a662	Allow setting content-type in command Signed-off-by: Eric Lin <exlin@google.com>	2023-09-19 08:58:59 +00:00
Youn Jae Kim	6cf41290b6	add elect-leader flag to the pflag	2023-09-15 14:35:56 -07:00
Eric Lin	0f02235e98	Use informer factory to reuse listers Signed-off-by: Eric Lin <exlin@google.com>	2023-09-15 13:57:15 +00:00
Damika Gamlath	aef5a2bffc	Disable dynamic-node-delete-delay-after-taint-enabled by default	2023-09-11 13:20:08 +00:00
Kubernetes Prow Robot	9622a50f55	Merge pull request #6019 from damikag/fix-rc-ca-sched Implement dynamic adjustment of NodeDeleteDelayAfterTaint based on round trip time between CA and api-server	2023-09-08 07:04:15 -07:00
Damika Gamlath	42691d3443	fix race condition between ca and scheduler. Implement dynamic adjustment of NodeDeleteDelayAfterTaint based on round trip time between ca and apiserver	2023-09-08 12:14:23 +00:00
Marc-Andre Dufresne	55c92e1025	add json logging support	2023-08-28 10:32:39 -04:00
Bartłomiej Wróblewski	e39d1b028d	Clean up NodeGroupConfigProcessor interface	2023-08-04 16:00:50 +00:00
Daniel Kłobuszewski	990cd6581c	Enable parallel drain by default.	2023-07-21 17:57:19 +02:00
vadasambar	23f03e112e	feat: support custom scheduler config for in-tree scheduler plugins (without extenders); refactor: rename `--scheduler-config` -> `--scheduler-config-file` to avoid confusion; fix: `goto` causing infinite loop - abstract out running extenders in a separate function; refactor: remove code around extenders - we decided not to use scheduler extenders for checking if a pod would fit on a node; refactor: move scheduler config to a `utils/scheduler` package - use default config as a fallback; test: fix static_autoscaler test; refactor: `GetSchedulerConfiguration` fn - remove falling back - add mechanism to detect if the scheduler config file flag was set; test: wip add tests for `GetSchedulerConfig` - tests are failing now; test: add tests for `GetSchedulerConfig` - abstract error messages so that we can use them in the tests - set api version explicitly (this is what upstream does as well); refactor: do a round of cleanup to make PR ready for review - make import names consistent; fix: use `pflag` to check if the `--scheduler-config-file` flag was set; docs: add comments for exported error constants; refactor: don't export error messages - exporting is not needed; fix: add underscore in test file name; test: fix test failing because of no comment on exported `SchedulerConfigFileFlag`; refactor: change name of flag variable `schedulerConfig` -> `schedulerConfigFile` - avoids confusion; test: add extra test cases for predicate checker - where the predicate checker uses custom scheduler config; refactor: remove `setFlags` variable - not needed anymore; refactor: abstract custom scheduler configs into `config` package - make them constants; test: fix linting error; refactor: introduce a new custom test predicate checker - instead of adding a param to the current one - this is so that we don't have to pass `nil` to the existing test predicate checker in many places; refactor: rename `NewCustomPredicateChecker` -> `NewTestPredicateCheckerWithCustomConfig` - latter narrows down meaning of the function better than former; refactor: rename `GetSchedulerConfig` -> `ConfigFromPath` - `scheduler.ConfigFromPath` is shorter and feels less vague than `scheduler.GetSchedulerConfig` - move test config to a new package `test` under `config` package; docs: add `TODO` for replacing code to parse scheduler config - with upstream function. Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-07-13 09:51:33 +05:30
Kubernetes Prow Robot	da96d89b17	Merge pull request #5890 from Bryce-Soghigian/bsoghigian/respecting-bulk-delete fix: setting maxEmptyBulkDelete, and maxScaleDownParallelism to be the same value	2023-07-12 09:45:12 -07:00
Kubernetes Prow Robot	c6893e9e28	Merge pull request #5672 from vadasambar/feat/5399/ignore-daemonsets-utilization-per-nodegroup feat: set `IgnoreDaemonSetsUtilization` per nodegroup for AWS	2023-07-12 07:43:12 -07:00
Damika Gamlath	0f8502c623	Refactor autoscaler.go and static_autoscaler.go to move declaration of the NodeDeletion option to main.go	2023-07-10 08:49:49 +00:00
bsoghigian	1e4507819a	fix: dynamic assignment of the scale down threshold flags. Set maxEmptyBulkDelete and maxScaleDownParallelism to the larger of the two flags when both are set	2023-07-08 14:43:41 -07:00
vadasambar	8a73d8ea42	refactor: remove comment line (not relevant anymore) Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-07-06 11:51:09 +05:30
vadasambar	7941bab214	feat: set `IgnoreDaemonSetsUtilization` per nodegroup; fix: test cases failing for actuator and scaledown/eligibility - abstract default values into `config`; refactor: rename global `IgnoreDaemonSetsUtilization` -> `GlobalIgnoreDaemonSetsUtilization` in code - there is no change in the flag name - rename `thresholdGetter` -> `configGetter` and tweak it to accommodate `GetIgnoreDaemonSetsUtilization`; refactor: reset help text for `ignore-daemonsets-utilization` flag - because per nodegroup override is supported only for AWS ASG tags as of now; docs: add info about overriding `--ignore-daemonsets-utilization` per ASG - in AWS cloud provider README; refactor: use a limiting interface in actuator in place of `NodeGroupConfigProcessor` interface - to limit the functions that can be used - since we need it only for `GetIgnoreDaemonSetsUtilization`; fix: tests failing for actuator - rename `staticNodeGroupConfigProcessor` -> `MockNodeGroupConfigGetter` - move `MockNodeGroupConfigGetter` to test/common so that it can be used in different tests; fix: go lint errors for `MockNodeGroupConfigGetter`; test: add tests for `IgnoreDaemonSetsUtilization` in cloud provider dir; test: update node group config processor tests for `IgnoreDaemonSetsUtilization`; test: update eligibility test cases for `IgnoreDaemonSetsUtilization`; test: run actuation tests for 2 NGS - one with `IgnoreDaemonSetsUtilization`: `false` - one with `IgnoreDaemonSetsUtilization`: `true`; test: add tests for `IgnoreDaemonSetsUtilization` in actuator - add helper to generate multiple ds pods dynamically - get rid of mock config processor because it is not required; test: fix failing tests for actuator; refactor: remove `GlobalIgnoreDaemonSetUtilization` autoscaling option - not required; fix: warn message `DefaultScaleDownUnreadyTimeKey` -> `DefaultIgnoreDaemonSetsUtilizationKey`; refactor: use `generateDsPods` instead of `generateDsPod`; refactor: `globaIgnoreDaemonSetsUtilization` -> `ignoreDaemonSetsUtilization`. Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-07-06 10:31:45 +05:30
mendelski	51f1660db7	deduplicate scale-up test setup	2023-05-11 12:55:59 +00:00
mendelski	3e2d48d86d	Add option parallel-scale-up	2023-05-11 07:49:34 +00:00
Maria Oparka	ca088d26c2	Move MaxNodeProvisionTime to NodeGroupAutoscalingOptions	2023-04-19 11:08:20 +02:00
Aleksandra Gacek	656f1919a8	Limit refresh rate of GCE MIG instances.	2023-04-18 15:08:06 +02:00
vadasambar	ff6fe5833d	feat: check only controller ref to decide if a pod is replicated (cherry picked from commit `144a64a402`); fix: set `replicated` to true if controller ref is set to `true` - forgot to add this in the last commit (cherry picked from commit `f8f458295d`); fix: remove `checkReferences` - not needed anymore (cherry picked from commit `5df6e31f8b`); test(drain): add test for custom controller pod; feat: add flag to allow scale down on custom controller pods - set to `false` by default - `false` will be set to `true` by default in the future - right now, we want to ensure backwards compatibility and make the feature available if the flag is explicitly set to `true` - TODO: this code might need some unit tests. Look into adding unit tests; fix: remove `at` symbol in prefix of `vadasambar` - to keep it consistent with previous such mentions in the code; test(utils): run all drain tests twice - once for `allowScaleDownOnCustomControllerOwnedPods=false` - and once for `allowScaleDownOnCustomControllerOwnedPods=true`; docs(utils): add description for `testOpts` struct; docs: update FAQ with info about `allow-scale-down-on-custom-controller-owned-pods` flag; refactor: rename `allow-scale-down-on-custom-controller-owned-pods` -> `skip-nodes-with-custom-controller-pods`; refactor: rename `allowScaleDownOnCustomControllerOwnedPods` -> `skipNodesWithCustomControllerPods`; test(utils/drain): fix failing tests - refactor code to add custom controller pod test; refactor: fix long code comments - clean-up print statements; refactor: move `expectFatal` right above where it is used - makes the code easier to read; refactor: fix code comment wording; refactor: address PR comments - abstract legacy code to check for replicated pods into a separate function so that it's easier to remove in the future - fix param info in the FAQ.md - simplify tests and remove the global variable used in the tests - rename `--skip-nodes-with-custom-controller-pods` -> `--scale-down-nodes-with-custom-controller-pods`; refactor: rename flag `--scale-down-nodes-with-custom-controller-pods` -> `--skip-nodes-with-custom-controller-pods` - refactor tests; docs: update flag info; fix: forgot to change flag name on a line in the code; refactor: use `ControllerRef()` directly instead of `controllerRef` - we don't need an extra variable; refactor: create consolidated test cases - from looping over and tweaking shared test cases - so that we don't have to duplicate shared test cases; refactor: append test flag to shared test description - so that the failed test is easy to identify - shallow copy tests and add comments so that others do the same. Signed-off-by: vadasambar <surajrbanakar@gmail.com>	2023-03-22 10:51:07 +05:30
Kubernetes Prow Robot	b1b7f39908	Merge pull request #5578 from yaroslava-serdiuk/empty-sorting Add empty nodes sorting for scale down candidates	2023-03-09 04:28:02 -08:00
Kubernetes Prow Robot	205293a7ca	Merge pull request #5537 from arrikto/feature-disable-unready-scaledown cluster-autoscaler: Add option to disable scale down of unready nodes	2023-03-08 07:55:11 -08:00
Yaroslava Serdiuk	cea9d1a73b	Add empty nodes sorting for scale down candidates	2023-03-08 15:43:22 +00:00
Grigoris Thanasoulas	6cf8c329da	cluster-autoscaler: Add option to disable scale down of unready nodes. Add flag '--scale-down-unready-enabled' to enable or disable scale-down of unready nodes. Default value set to true for backwards compatibility (i.e., allow scale-down of unready nodes). Signed-off-by: Grigoris Thanasoulas <gregth@arrikto.com>	2023-03-06 15:51:10 +02:00
Yaroslava Serdiuk	a35d6d2269	Fix RemovalSimulation for parallel scale down	2023-03-01 17:30:30 +00:00
tombokombo	93b9f4b8be	fix Signed-off-by: tombokombo <tombo@sysart.tech>	2023-02-07 14:35:23 +01:00
Kubernetes Prow Robot	e911e54f1e	Merge pull request #5214 from tombokombo/fix/asg-resource-tags Fix/asg resource tags	2023-02-07 01:35:01 -08:00
Bartłomiej Wróblewski	b608278386	Add force Daemon Sets option	2023-01-30 11:02:42 +00:00
Bartłomiej Wróblewski	d4b812e936	Add filtering out DS pods from scale-up, refactor default pod list processor	2023-01-23 17:14:46 +00:00
Kubernetes Prow Robot	f507519916	Merge pull request #5423 from yaroslava-serdiuk/sd-sorting Add scale down candidates observer	2023-01-19 10:14:16 -08:00
Yaroslava Serdiuk	541ce04e4b	Add previous scale down candidate sorting	2023-01-19 16:04:50 +00:00
michael mccune	955396e857	remove clusterapi nodegroupset processor. As discussed with the cluster api community [0], the nodegroupset processor is being removed from the clusterapi provider implementation in favor of instructing our community on the use of the --balancing-ignore-label flag. Due to the wide variety of provider infrastructures that clusterapi can be deployed on, we would prefer to not encode all of these labels in the autoscaler itself. See the linked recording for more information. [0] https://www.youtube.com/watch?v=jbhca_9oPuQ	2023-01-12 15:05:37 -05:00
Kubernetes Prow Robot	b94f340af5	Merge pull request #5402 from Bryce-Soghigian/bsoghigian/adding-configurable-difference-ratios adding configurable difference ratios	2023-01-10 04:03:25 -08:00
bsoghigian	0f8ed0b81f	Configurable difference ratios	2023-01-09 22:40:16 -08:00
Kubernetes Prow Robot	3785a2f82a	Merge pull request #5223 from grosser/grosser/burst cluster-autoscaler: allow setting kubernetes client burst and qps to avoid rate limiting	2022-12-30 06:21:30 -08:00
Michael Grosser	cd26bcfe60	allow setting kubernetes client burst and qps to avoid rate limiting	2022-12-29 13:54:04 -08:00
Bartłomiej Wróblewski	62c68e1280	Move PredicateChecker initialization before processors initialization	2022-12-27 15:21:41 +00:00
Kubernetes Prow Robot	a46a095fe2	Merge pull request #5362 from yasinlachiny/maxnodetotal set cluster_autoscaler_max_nodes_count dynamically	2022-12-19 00:33:44 -08:00
yasin.lachiny	6d9fed5211	set cluster_autoscaler_max_nodes_count dynamically Signed-off-by: yasin.lachiny <yasin.lachiny@gmail.com>	2022-12-11 00:18:03 +01:00
Bartłomiej Wróblewski	2e1b04ff69	Add default PodListProcessor wrapper	2022-12-09 16:26:56 +00:00
Yaroslava Serdiuk	ae45571af9	Create a Planner object if --parallelDrain=true	2022-12-07 11:36:05 +00:00
Aleksandra Gacek	bae587d20c	Break node categorization in scale down planner on timeout.	2022-12-05 11:34:53 +01:00
Bartłomiej Wróblewski	10d3f25996	Use scheduling package in filterOutSchedulable processor	2022-11-23 12:32:59 +00:00
Xintong Liu	524886fca5	Support scaling up node groups to the configured min size if needed	2022-11-02 21:47:00 -07:00
Daniel Kłobuszewski	18f2e67c4f	Split out code from simulator package	2022-10-18 11:51:44 +02:00
Kubernetes Prow Robot	dc73ea9076	Merge pull request #5235 from UiPath/fix_node_delete Add option to wait for a period of time after node tainting/cordoning	2022-10-17 04:29:07 -07:00
Kubernetes Prow Robot	d022e260a1	Merge pull request #4956 from damirda/feature/scale-up-delay-annotations Add podScaleUpDelay annotation support	2022-10-13 09:29:02 -07:00
Alexandru Matei	0ee2a359e7	Add option to wait for a period of time after node tainting/cordoning. Node state is refreshed and checked again before deleting the node. This gives kube-scheduler time to acknowledge that the nodes' state has changed and to stop scheduling pods on them.	2022-10-13 10:37:56 +03:00
tombokombo	17a09a0efe	fix asg resource tags Signed-off-by: tombokombo <tombo@sysart.tech>	2022-09-27 01:25:29 +02:00
Kubernetes Prow Robot	b3c6b60e1c	Merge pull request #5060 from yaroslava-serdiuk/deleting-in-batch Introduce NodeDeleterBatcher to ScaleDown actuator	2022-09-22 10:11:06 -07:00
Yaroslava Serdiuk	65b0d78e6e	Introduce NodeDeleterBatcher to ScaleDown actuator	2022-09-22 16:19:45 +00:00
Damir Markovic	11d150e920	Add podScaleUpDelay annotation support	2022-09-05 20:24:19 +02:00
James Ravn	1b98b3823a	Allow balancing by labels exclusively. Adds a new flag `--balance-label` which allows users to balance between node groups exclusively via labels. This gives users the flexibility to specify the similarity logic themselves when --balance-similar-node-groups is in use.	2022-07-06 10:34:18 +01:00
Maciek Pytel	ab891418f6	Limit binpacking based on #new_nodes or time. The binpacking algorithm is O(#pending_pods * #new_nodes) and calculating a very large scale-up can get stuck for minutes or even hours, leading to CA failing its healthcheck and going down. The new limiting prevents this scenario by stopping binpacking after reaching a specified threshold. Any pods that remain pending as a result of shorter binpacking will be processed in the next autoscaler loop. The thresholds used can be controlled with newly introduced flags: --max-nodes-per-scaleup and --max-nodegroup-binpacking-duration. The limiting can be disabled by setting both flags to 0 (not recommended, especially for --max-nodegroup-binpacking-duration).	2022-06-20 17:02:51 +02:00

313 Commits