Commit Graph

177 Commits

Author SHA1 Message Date
shubham82 5651d9d92d Bump CA Chart image to v1.32 2025-01-06 14:49:11 +05:30
Nicolas Laduguie 6e8a5483c6 Bump minor version 2024-12-28 14:46:50 +01:00
Nicolas Laduguie dc3491b15e Docs 2024-12-28 14:46:39 +01:00
Nicolas Laduguie 39890d7729 Docs 2024-12-28 14:46:39 +01:00
nicolas-laduguie 47fad26c3e feat(helm): custom arguments 2024-12-28 14:46:09 +01:00
soer3n 833af67cbd bump chart version
Signed-off-by: soer3n <srenhenning@googlemail.com>
2024-12-20 09:33:46 +01:00
soer3n 470ced91f5 fix typo for namespace condition in node group auto discovery helm deployment
Signed-off-by: soer3n <srenhenning@googlemail.com>
2024-12-12 08:21:17 +01:00
shubham82 7a73dd1db5 Add Shubham82 to the reviewer 2024-12-02 15:53:17 +05:30
shubham82 9963069589 Improvement: Added service.clusterIP 2024-11-01 15:57:58 +05:30
shubham82 058219f64d Fixed ExternalIPs service link. 2024-10-01 21:45:31 +05:30
Kellen Sappington dfb30e6cf1 Only render ServiceMonitor annotations if not empty 2024-09-30 20:49:20 +00:00
Henrik Gerdes c50e5a9236 feat: allow setting init containers for cluster-autoscaler
Signed-off-by: Henrik Gerdes <hegerdes@outlook.de>
2024-09-30 16:53:22 +02:00
Danny Seymour 5593f981bd feat: Add relabelings config to ServiceMonitor resource 2024-09-27 11:57:03 -07:00
Thomas Stadler 5abbb4a097 feat(cluster-autoscaler/exoscale): add support for --nodes
Signed-off-by: Thomas Stadler <thomas.stadler@whizus.com>
2024-09-24 13:18:58 +02:00
Kubernetes Prow Robot b644c1307b Merge pull request #7305 from gjtempleton/Charts-Add-Reviewer
Charts - Add JackFrancis to reviewers
2024-09-24 11:42:00 +01:00
Guy Templeton a0a475ca42 Merge branch 'master' into chore/securityContext 2024-09-23 23:46:28 +01:00
Guy Templeton fe9543b730 Bump chart version 2024-09-23 23:45:04 +01:00
Pierluigi Lenoci 378839bbef Merge branch 'master' into fix_5633 2024-09-24 00:23:47 +02:00
Pierluigi Lenoci af45a733aa Chart bump 2024-09-24 00:23:22 +02:00
Guy Templeton ddfa6b9ce5 Charts - Add JackFrancis to reviewers 2024-09-23 21:17:28 +01:00
Pierluigi Lenoci da9f8d6806 helm-docs fix 2024-09-05 16:38:38 +02:00
Pierluigi Lenoci 23cfd34e95 Added the ability to specify K8s extra objects 2024-09-05 16:32:39 +02:00
shubham82 8330997575 Bump CA Chart image to v1.31 2024-09-03 14:16:50 +05:30
blanchardma 68c3f90a65 chore(helm): enhance securityContext 2024-07-02 18:42:47 +02:00
shubham82 2044a41687 Bump CA Chart image to v1.30 2024-05-06 14:08:34 +05:30
Kubernetes Prow Robot 616cfb652e Merge pull request #5762 from jackfrancis/helm-chart-clusterapi-clusterNamespace
helm: enable clusterapi namespace autodiscovery
2024-03-18 04:07:30 -07:00
Kubernetes Prow Robot 165738a521 Merge pull request #6502 from NiklasRosenstein/master
feature: Support Hetzner cloud provider in Helm chart
2024-03-18 03:13:39 -07:00
Kubernetes Prow Robot 4bf83f1aaa Merge pull request #6447 from Jont828/edge-zone
Azure: add support for edge zones
2024-03-17 03:58:03 -07:00
Jack Francis dfcf7d61d9 helm: enable clusterapi namespace autodiscovery 2024-02-12 09:51:06 -08:00
Niklas Rosenstein c49eefeee9 Update Chart.yaml 2024-02-08 12:32:19 +01:00
Niklas Rosenstein b2e3b7aeb5 Update charts/cluster-autoscaler/Chart.yaml
Co-authored-by: Shubham <shubham.kuchhal@india.nec.com>
2024-02-07 11:32:17 +01:00
Niklas Rosenstein 9d73b59046 add missing line breaks 2024-02-06 23:34:18 +01:00
Niklas Rosenstein 14f4c27a2f use older helm-docs version and remove empty line in values comment 2024-02-06 23:32:52 +01:00
Niklas Rosenstein a8e00f454c bump chart version 2024-02-06 23:24:44 +01:00
Niklas Rosenstein 6b78974ce4 update README.md.gotmpl and added Helm docs for Hetzner Cloud 2024-02-06 23:21:14 +01:00
Niklas Rosenstein 7ec2259f9e Merge remote-tracking branch 'upstream/master' 2024-02-06 23:15:34 +01:00
Guy Templeton 26e918c601 Update Auto Labels of Subprojects 2024-02-05 23:00:59 +00:00
Niklas Rosenstein 1410185143 Update charts/cluster-autoscaler/README.md 2024-02-05 10:15:28 +01:00
Niklas Rosenstein 6f5810e9a4 Add instanceType/region support in Helm chart for Hetzner cloud provider 2024-02-03 17:12:35 +01:00
shubham82 3db3d224cb Bump CA Chart image to v1.29 2024-01-29 16:58:25 +05:30
Jont828 7212628934 Implement force delete in Azure provider 2024-01-10 18:34:12 -05:00
Jack Francis e28f9fdcb8 azure: fix chart bugs after AKS vmType deprecation
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2023-12-15 14:34:06 -08:00
Andrea Scarpino a70364d332 helm chart - update cluster-autoscaler to 1.28 2023-12-05 20:41:29 +01:00
Jack Francis 9e526aed3e Azure: Remove AKS vmType
Signed-off-by: Jack Francis <jackfrancis@gmail.com>
2023-11-27 07:17:06 -08:00
vadasambar cfbee9a4d6 feat: implement kwok cloudprovider
feat: wip implement `CloudProvider` interface boilerplate for `kwok` provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add builder for `kwok`
- add logic to scale up and scale down nodes in `kwok` provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip parse node templates from file
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add short README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement remaining things
- to get the provider in a somewhat working state
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add in-cluster `kwok` as pre-requisite in the README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: templates file not correctly marshalling into node list
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `invalid leading UTF-8 octet` error during template parsing
- remove encoding using `gob`
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: use lister to get and list
- instead of uncached kube client
- add lister as a field on the provider and nodegroup struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `did not find nodegroup annotation` error
- CA was thinking the annotation is not present even though it is
- fix a bug with parsing annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: CA node recognizing fake nodegroups
- add provider ID to nodes in the format `kwok:<node-name>`
- fix invalid `KwokManagedAnnotation`
- sanitize template nodes (remove `resourceVersion` etc.,)
- not sanitizing the node leads to error during creation of new nodes
- abstract code to get NG name into a separate function `getNGNameFromAnnotation`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: node not getting deleted
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add empty test file
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: add OWNERS file
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip kwok provider config
- add samples for static and dynamic template nodes
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip implement pulling node templates from cluster
- add status field to kwok provider config
- this is to capture how the nodes would be grouped by (can be annotation or label)
- use kwok provider config status to get ng name from the node template
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: syntax error in calling `loadNodeTemplatesFromCluster`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: first draft of dynamic node templates
- this allows node templates to be pulled from the cluster
- instead of having to specify static templates manually
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: syntax error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract out related code into separate files
- use named constants instead of hardcoded values
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: cleanup kwok nodes when CA is exiting
- so that the user doesn't have to cleanup the fake nodes themselves
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: return `nil` instead of err for `HasInstance`
- because there is no underlying cloud provider (hence no reason to return `cloudprovider.ErrNotImplemented`)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: start working on tests for kwok provider config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add `gpuLabelKey` under `nodes` field in kwok provider config
- fix validation for kwok provider config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add motivation doc
- update README with more details
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: update kwok provider config example to support pulling gpu labels and types from existing providers
- still needs to be implemented in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip update kwok provider config to get gpu label and available types
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip read gpu label and available types from specified provider
- add available gpu types in kwok provider config status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add validation for gpu fields in kwok provider config
- load gpu related fields in kwok provider config status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement `GetAvailableGPUTypes`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add support to install and uninstall kwok
- add option to disable installation
- add option to manually specify kwok release tag
- add future scope in readme
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add future scope 'evaluate adding support to check if kwok controller already exists'
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: vendor conflict and cyclic import
- remove support to get gpu config from the specified provider (can't be used because leads to cyclic import)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add a TODO 'get gpu config from other providers'
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `file` -> `configmap`
- load config and templates from configmap instead of file
- move `nodes` and `nodegroups` config to top level
- add helper to encode configmap data into `[]bytes`
- add helper to get current pod namespace
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add new options to the kwok provider config
- auto install kwok only if the version is >= v0.4.0
- add test for `GPULabel()`
- use `kubectl apply` way of installing kwok instead of kustomize
- add test for kwok helpers
- add test for kwok config
- inject service account name in CA deployment
- add example configmap for node templates and kwok provider config in CA helm chart
- add permission to create `clusterrolebinding` (so that kwok provider can create a clusterrolebinding with `cluster-admin` role and create/delete upstream manifests)
- update kwok provider sample configs
- update `README`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: update go.mod to use v1.28 packages
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: `go mod tidy` and `go mod vendor` (again)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: kwok installation code
- add functions to create and delete clusterrolebinding to create kwok resources
- refactor kwok install and uninstall fns
- delete manifests in the opposite order of install
- add cleaning up left-over kwok installation to future scope
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: nil ptr error
- add `TODO` in README for adding docs around kwok config fields
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove code to automatically install and uninstall `kwok`
- installing/uninstalling requires strong permissions to be granted to `kwok`
- granting strong permissions to `kwok` means granting strong permissions to the entire CA codebase
- this can pose a security risk
- I have removed the code related to install and uninstall for now
- will proceed after discussion with the community
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: run `go mod tidy` and `go mod vendor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add permission to create nodes
- to fix permissions error for kwok provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add more unit tests
- add tests for kwok helpers
- fix and update kwok config tests
- fix a bug where gpu label was getting assigned to `kwokConfig.status.key`
- expose `loadConfigFile` -> `LoadConfigFile`
- throw error if templates configmap does not have `templates` key (value of which is node templates)
- finish test for `GPULabel()`
- add tests for `NodeGroupForNode()`
- expose `loadNodeTemplatesFromConfigMap` -> `LoadNodeTemplatesFromConfigMap`
- fix `KwokCloudProvider`'s kwok config was empty (this caused `GPULabel()` to return empty)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract provider ID code into `getProviderID` fn
- fix provider name in test `kwok` -> `kwok:kind-worker-xxx`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: run `go mod vendor` and `go mod tidy`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs(cloudprovider/kwok): update info on creating nodegroups based on `hostname/label`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor(charts): replace fromLabelKey value `"kubernetes.io/hostname"` -> `"kwok-nodegroup"`
- `"kubernetes.io/hostname"` leads to infinite scale-up
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: support running CA with kwok provider locally
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use global informer factory
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `fromNodeLabelKey: "kwok-nodegroup"` in test templates
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `Cleanup()` logic
- clean up only nodes managed by the kwok provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix/refactor: nodegroup creation logic
- fix issue where fake node was getting created which caused fatal error
- use ng annotation to keep track of nodegroups
- (when creating nodegroups) don't process nodes which don't have the right ng label
- suffix ng name with unix timestamp
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor/test(cloudprovider/kwok): write tests for `BuildKwokProvider` and `Cleanup`
- pass only the required node lister to cloud provider instead of the entire informer factory
- pass the required configmap name to `LoadNodeTemplatesFromConfigMap` instead of passing the entire kwok provider config
- implement fake node lister for testing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add test case for dynamic templates in `TestNodeGroupForNode`
- remove non-required fields from template node
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `NodeGroups()`
- add extra node template without ng selector label to add more variability in the test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: write tests for `GetNodeGpuConfig()`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add test for `GetAvailableGPUTypes`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add test for `GetResourceLimiter()`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add tests for nodegroup's `IncreaseSize()`
- abstract error msgs into variables to use them in tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `DeleteNodes()` fn
- add check for deleting too many nodes
- rename err msg var names to make them consistent
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add tests for ng `DecreaseTargetSize()`
- abstract error msgs into variables (for easy use in tests)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `Nodes()`
- add extra test case for `DecreaseTargetSize()` to check lister error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `TemplateNodeInfo`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): improve tests for `BuildKwokProvider()`
- add more test cases
- refactor lister for `TestBuildKwokProvider()` and `TestCleanUp()`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `GetOptions`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): unset `KWOK_CONFIG_MAP_NAME` at the end of the test
- not doing so leads to failure in other tests
- remove `kwokRelease` field from kwok config (not used anymore) - this was causing the tests to fail
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: bump CA chart version
- this is because of changes made related to kwok
- fix typo `everwhere` -> `everywhere`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: fix linting checks
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: address CI lint errors
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: generate helm docs for `kwokConfigMapName`
- remove `KWOK_CONFIG_MAP_KEY` (not being used in the code)
- bump helm chart version
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: revise the outline for README
- add AEP link to the motivation doc
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: wip create an outline for the README
- remove `kwok` field from examples (not needed right now)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add outline for ascii gifs
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename env variable `KWOK_CONFIG_MAP_NAME` -> `KWOK_PROVIDER_CONFIGMAP`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update README with info around installation and benefits of using kwok provider
- add `Kwok` as a provider in main CA README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: run `go mod vendor`
- remove TODOs that are not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: finish first draft of README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: env variable in chart `KWOK_CONFIG_MAP_NAME` -> `KWOK_PROVIDER_CONFIGMAP`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove redundant/deprecated code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: bump chart version `9.30.1` -> `9.30.2`
- because of kwok provider related changes
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: fix typo `offical` -> `official`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: remove debug log msg
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add links for getting help
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix typo in log `external cluster` -> `cluster`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: add newline in chart.yaml to fix CI lint
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix mistake `sig-kwok` -> `sig-scheduling`
- kwok is a part of sig-scheduling (there is no sig-kwok)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix typo `release"` -> `"release"`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: pass informer instead of lister to cloud provider builder fn
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
2023-11-25 00:22:47 +05:30
Matt Dainty db80037c51 fix: Add revisionHistoryLimit override to cluster-autoscaler
Signed-off-by: Matt Dainty <matt@bodgit-n-scarper.com>
2023-11-17 12:46:12 +00:00
Mike Tougeron 841315f327 Template the autoDiscovery.clusterName variable in the Helm chart 2023-11-16 14:07:50 -08:00
Thomas Güttler 54bfbfae1f Update README.md: Link to Cluster-API
Add Link to Cluster API.
2023-11-15 16:52:56 +01:00
Guy Templeton bb623b0abb Update Chart.yaml 2023-11-14 23:12:48 +00:00
Guy Templeton 996bf0661d Merge branch 'master' into add-version-labels 2023-11-14 23:11:41 +00:00