feat: implement kwok cloudprovider

feat: wip implement `CloudProvider` interface boilerplate for `kwok` provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add builder for `kwok`
- add logic to scale up and scale down nodes in `kwok` provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip parse node templates from file
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add short README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement remaining things
- to get the provider in a somewhat working state
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add in-cluster `kwok` as pre-requisite in the README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: templates file not correctly marshalling into node list
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `invalid leading UTF-8 octet` error during template parsing
- remove encoding using `gob`
- not required
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: use lister to get and list
- instead of uncached kube client
- add lister as a field on the provider and nodegroup struct
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: `did not find nodegroup annotation` error
- CA thought the annotation was not present even though it was
- fix a bug with parsing annotation
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: CA node recognizing fake nodegroups
- add provider ID to nodes in the format `kwok:<node-name>`
- fix invalid `KwokManagedAnnotation`
- sanitize template nodes (remove `resourceVersion` etc.,)
- not sanitizing the node leads to error during creation of new nodes
- abstract code to get NG name into a separate function `getNGNameFromAnnotation`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: node not getting deleted
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add empty test file
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: add OWNERS file
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip kwok provider config
- add samples for static and dynamic template nodes
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip implement pulling node templates from cluster
- add status field to kwok provider config
- this is to capture how the nodes would be grouped by (can be annotation or label)
- use kwok provider config status to get ng name from the node template
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: syntax error in calling `loadNodeTemplatesFromCluster`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: first draft of dynamic node templates
- this allows node templates to be pulled from the cluster
- instead of having to specify static templates manually
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: syntax error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract out related code into separate files
- use named constants instead of hardcoded values
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: cleanup kwok nodes when CA is exiting
- so that the user doesn't have to cleanup the fake nodes themselves
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: return `nil` instead of err for `HasInstance`
- because there is no underlying cloud provider (hence no reason to return `cloudprovider.ErrNotImplemented`)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: start working on tests for kwok provider config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add `gpuLabelKey` under `nodes` field in kwok provider config
- fix validation for kwok provider config
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add motivation doc
- update README with more details
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: update kwok provider config example to support pulling gpu labels and types from existing providers
- still needs to be implemented in the code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip update kwok provider config to get gpu label and available types
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: wip read gpu label and available types from specified provider
- add available gpu types in kwok provider config status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add validation for gpu fields in kwok provider config
- load gpu related fields in kwok provider config status
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: implement `GetAvailableGPUTypes`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add support to install and uninstall kwok
- add option to disable installation
- add option to manually specify kwok release tag
- add future scope in readme
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add future scope 'evaluate adding support to check if kwok controller already exists'
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: vendor conflict and cyclic import
- remove support to get gpu config from the specified provider (can't be used because leads to cyclic import)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add a TODO 'get gpu config from other providers'
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename `file` -> `configmap`
- load config and templates from configmap instead of file
- move `nodes` and `nodegroups` config to top level
- add helper to encode configmap data into `[]byte`
- add helper to get current pod namespace
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: add new options to the kwok provider config
- auto install kwok only if the version is >= v0.4.0
- add test for `GPULabel()`
- use `kubectl apply` way of installing kwok instead of kustomize
- add test for kwok helpers
- add test for kwok config
- inject service account name in CA deployment
- add example configmap for node templates and kwok provider config in CA helm chart
- add permission to create `clusterrolebinding` (so that kwok provider can create a clusterrolebinding with `cluster-admin` role and create/delete upstream manifests)
- update kwok provider sample configs
- update `README`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: update go.mod to use v1.28 packages
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: `go mod tidy` and `go mod vendor` (again)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: kwok installation code
- add functions to create and delete clusterrolebinding to create kwok resources
- refactor kwok install and uninstall fns
- delete manifests in the opposite order of install
- add cleaning up left-over kwok installation to future scope
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: nil ptr error
- add `TODO` in README for adding docs around kwok config fields
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove code to automatically install and uninstall `kwok`
- installing/uninstalling requires strong permissions to be granted to `kwok`
- granting strong permissions to `kwok` means granting strong permissions to the entire CA codebase
- this can pose a security risk
- I have removed the code related to install and uninstall for now
- will proceed after discussion with the community
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: run `go mod tidy` and `go mod vendor`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: add permission to create nodes
- to fix permissions error for kwok provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add more unit tests
- add tests for kwok helpers
- fix and update kwok config tests
- fix a bug where gpu label was getting assigned to `kwokConfig.status.key`
- expose `loadConfigFile` -> `LoadConfigFile`
- throw error if templates configmap does not have `templates` key (value of which is node templates)
- finish test for `GPULabel()`
- add tests for `NodeGroupForNode()`
- expose `loadNodeTemplatesFromConfigMap` -> `LoadNodeTemplatesFromConfigMap`
- fix `KwokCloudProvider`'s kwok config was empty (this caused `GPULabel()` to return empty)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: abstract provider ID code into `getProviderID` fn
- fix provider name in test `kwok` -> `kwok:kind-worker-xxx`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: run `go mod vendor` and `go mod tidy`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs(cloudprovider/kwok): update info on creating nodegroups based on `hostname/label`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor(charts): replace fromLabelKey value `"kubernetes.io/hostname"` -> `"kwok-nodegroup"`
- `"kubernetes.io/hostname"` leads to infinite scale-up
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

feat: support running CA with kwok provider locally
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use global informer factory
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: use `fromNodeLabelKey: "kwok-nodegroup"` in test templates
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: `Cleanup()` logic
- clean up only nodes managed by the kwok provider
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix/refactor: nodegroup creation logic
- fix issue where fake node was getting created which caused fatal error
- use ng annotation to keep track of nodegroups
- (when creating nodegroups) don't process nodes which don't have the right ng label
- suffix ng name with unix timestamp
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor/test(cloudprovider/kwok): write tests for `BuildKwokProvider` and `Cleanup`
- pass only the required node lister to cloud provider instead of the entire informer factory
- pass the required configmap name to `LoadNodeTemplatesFromConfigMap` instead of passing the entire kwok provider config
- implement fake node lister for testing
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add test case for dynamic templates in `TestNodeGroupForNode`
- remove non-required fields from template node
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add tests for `NodeGroups()`
- add extra node template without ng selector label to add more variability in the test
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: write tests for `GetNodeGpuConfig()`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add test for `GetAvailableGPUTypes`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test: add test for `GetResourceLimiter()`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add tests for nodegroup's `IncreaseSize()`
- abstract error msgs into variables to use them in tests
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `DeleteNodes()` fn
- add check for deleting too many nodes
- rename err msg var names to make them consistent
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add tests for ng `DecreaseTargetSize()`
- abstract error msgs into variables (for easy use in tests)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `Nodes()`
- add extra test case for `DecreaseTargetSize()` to check lister error
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `TemplateNodeInfo`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): improve tests for `BuildKwokProvider()`
- add more test cases
- refactor lister for `TestBuildKwokProvider()` and `TestCleanUp()`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): add test for ng `GetOptions`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

test(cloudprovider/kwok): unset `KWOK_CONFIG_MAP_NAME` at the end of the test
- not doing so leads to failure in other tests
- remove `kwokRelease` field from kwok config (not used anymore) - this was causing the tests to fail
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: bump CA chart version
- this is because of changes made related to kwok
- fix typo `everwhere` -> `everywhere`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: fix linting checks
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: address CI lint errors
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: generate helm docs for `kwokConfigMapName`
- remove `KWOK_CONFIG_MAP_KEY` (not being used in the code)
- bump helm chart version
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: revise the outline for README
- add AEP link to the motivation doc
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: wip create an outline for the README
- remove `kwok` field from examples (not needed right now)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add outline for ascii gifs
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: rename env variable `KWOK_CONFIG_MAP_NAME` -> `KWOK_PROVIDER_CONFIGMAP`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: update README with info around installation and benefits of using kwok provider
- add `Kwok` as a provider in main CA README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: run `go mod vendor`
- remove TODOs that are not needed anymore
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: finish first draft of README
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

fix: env variable in chart `KWOK_CONFIG_MAP_NAME` -> `KWOK_PROVIDER_CONFIGMAP`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: remove redundant/deprecated code
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: bump chart version `9.30.1` -> `9.30.2`
- because of kwok provider related changes
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: fix typo `offical` -> `official`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: remove debug log msg
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: add links for getting help
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: fix typo in log `external cluster` -> `cluster`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

chore: add newline in chart.yaml to fix CI lint
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix mistake `sig-kwok` -> `sig-scheduling`
- kwok is a part of sig-scheduling (there is no sig-kwok)
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

docs: fix typo `release"` -> `"release"`
Signed-off-by: vadasambar <surajrbanakar@gmail.com>

refactor: pass informer instead of lister to cloud provider builder fn
Signed-off-by: vadasambar <surajrbanakar@gmail.com>
vadasambar 2023-05-30 11:52:11 +05:30
parent 8de60c98a5
commit cfbee9a4d6
43 changed files with 4757 additions and 14 deletions


@ -11,4 +11,4 @@ name: cluster-autoscaler
sources:
- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
type: application
version: 9.32.0
version: 9.32.1


@ -419,6 +419,7 @@ vpa:
| image.repository | string | `"registry.k8s.io/autoscaling/cluster-autoscaler"` | Image repository |
| image.tag | string | `"v1.27.2"` | Image tag |
| kubeTargetVersionOverride | string | `""` | Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands. |
| kwokConfigMapName | string | `"kwok-provider-config"` | configmap for configuring kwok provider |
| magnumCABundlePath | string | `"/etc/kubernetes/ca-bundle.crt"` | Path to the host's CA bundle, from `ca-file` in the cloud-config file. |
| magnumClusterName | string | `""` | Cluster name or ID in Magnum. Required if `cloudProvider=magnum` and not setting `autoDiscovery.clusterName`. |
| nameOverride | string | `""` | String to partially override `cluster-autoscaler.fullname` template (will maintain the release name) |


@ -42,6 +42,8 @@ rules:
verbs:
- watch
- list
- create
- delete
- get
- update
- apiGroups:
@ -120,6 +122,7 @@ rules:
verbs:
- list
- watch
- get
- apiGroups:
- coordination.k8s.io
resources:


@ -0,0 +1,416 @@
{{- if or (eq .Values.cloudProvider "kwok") }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Values.kwokConfigMapName | default "kwok-provider-config" }}
namespace: {{ .Release.Namespace }}
data:
config: |-
# if you see '\n' everywhere, remove all the trailing spaces
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "kwok-nodegroup"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
# gpuConfig:
# # to tell kwok provider what label should be considered as GPU label
# gpuLabelKey: "k8s.amazonaws.com/accelerator"
# availableGPUTypes:
# "nvidia-tesla-k80": {}
# "nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
kwok: {} # default: fetch latest release of kwok from github and install it
# # you can also manually specify which kwok release you want to install
# # for example:
# kwok:
# release: v0.3.0
# # you can also disable installing kwok in CA code (and install your own kwok release)
# kwok:
# install: false (true if not specified)
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kwok-provider-templates
namespace: {{ .Release.Namespace }}
data:
templates: |-
# if you see '\n' everywhere, remove all the trailing spaces
apiVersion: v1
items:
- apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:16Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-control-plane
kwok-nodegroup: control-plane
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: ""
node.kubernetes.io/exclude-from-external-load-balancers: ""
name: kind-control-plane
resourceVersion: "506"
uid: 86716ec7-3071-4091-b055-77b4361d1dca
spec:
podCIDR: 10.244.0.0/24
podCIDRs:
- 10.244.0.0/24
providerID: kind://docker/kind/kind-control-plane
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
status:
addresses:
- address: 172.18.0.2
type: InternalIP
- address: kind-control-plane
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:46Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: 96f8c8b8c8ae4600a3654341f207586e
operatingSystem: linux
osImage: Ubuntu 22.04.2 LTS
systemUUID: 111aa932-7f99-4bef-aaf7-36aa7fb9b012
- apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:57Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker
kwok-nodegroup: kind-worker
kubernetes.io/os: linux
name: kind-worker
resourceVersion: "577"
uid: 2ac0eb71-e5cf-4708-bbbf-476e8f19842b
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:40:05Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: a98a13ff474d476294935341f1ba9816
operatingSystem: linux
osImage: Ubuntu 22.04.2 LTS
systemUUID: 5f3c1af8-a385-4776-85e4-73d7f4252b44
- apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:57Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker2
kwok-nodegroup: kind-worker2
kubernetes.io/os: linux
name: kind-worker2
resourceVersion: "578"
uid: edc7df38-feb2-4089-9955-780562bdd21e
spec:
podCIDR: 10.244.1.0/24
podCIDRs:
- 10.244.1.0/24
providerID: kind://docker/kind/kind-worker2
status:
addresses:
- address: 172.18.0.4
type: InternalIP
- address: kind-worker2
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:40:08Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: fa9f4cd3b3a743bc867b04e44941dcb2
operatingSystem: linux
osImage: Ubuntu 22.04.2 LTS
systemUUID: f36c0f00-8ba5-4c8c-88bc-2981c8d377b9
kind: List
metadata:
resourceVersion: ""
{{- end }}


@ -125,6 +125,14 @@ spec:
{{- end }}
{{- end }}
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: SERVICE_ACCOUNT
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
{{- if and (eq .Values.cloudProvider "aws") (ne .Values.awsRegion "") }}
- name: AWS_REGION
value: "{{ .Values.awsRegion }}"
@ -207,6 +215,9 @@ spec:
secretKeyRef:
key: api-zone
name: {{ default (include "cluster-autoscaler.fullname" .) .Values.secretKeyRefNameOverride }}
{{- else if eq .Values.cloudProvider "kwok" }}
- name: KWOK_PROVIDER_CONFIGMAP
value: "{{.Values.kwokConfigMapName | default "kwok-provider-config"}}"
{{- end }}
{{- range $key, $value := .Values.extraEnv }}
- name: {{ $key }}


@ -244,6 +244,9 @@ image:
# kubeTargetVersionOverride -- Allow overriding the `.Capabilities.KubeVersion.GitVersion` check. Useful for `helm template` commands.
kubeTargetVersionOverride: ""
# kwokConfigMapName -- configmap for configuring kwok provider
kwokConfigMapName: "kwok-provider-config"
# magnumCABundlePath -- Path to the host's CA bundle, from `ca-file` in the cloud-config file.
magnumCABundlePath: "/etc/kubernetes/ca-bundle.crt"


@ -31,6 +31,7 @@ You should also take a look at the notes and "gotchas" for your specific cloud p
* [HuaweiCloud](./cloudprovider/huaweicloud/README.md)
* [IonosCloud](./cloudprovider/ionoscloud/README.md)
* [Kamatera](./cloudprovider/kamatera/README.md)
* [Kwok](./cloudprovider/kwok/README.md)
* [Linode](./cloudprovider/linode/README.md)
* [Magnum](./cloudprovider/magnum/README.md)
* [OracleCloud](./cloudprovider/oci/README.md)


@ -1,5 +1,5 @@
//go:build !gce && !aws && !azure && !kubemark && !alicloud && !magnum && !digitalocean && !clusterapi && !huaweicloud && !ionoscloud && !linode && !hetzner && !bizflycloud && !brightbox && !packet && !oci && !vultr && !tencentcloud && !scaleway && !externalgrpc && !civo && !rancher && !volcengine && !baiducloud && !cherry && !cloudstack && !exoscale && !kamatera && !ovhcloud
// +build !gce,!aws,!azure,!kubemark,!alicloud,!magnum,!digitalocean,!clusterapi,!huaweicloud,!ionoscloud,!linode,!hetzner,!bizflycloud,!brightbox,!packet,!oci,!vultr,!tencentcloud,!scaleway,!externalgrpc,!civo,!rancher,!volcengine,!baiducloud,!cherry,!cloudstack,!exoscale,!kamatera,!ovhcloud
//go:build !gce && !aws && !azure && !kubemark && !alicloud && !magnum && !digitalocean && !clusterapi && !huaweicloud && !ionoscloud && !linode && !hetzner && !bizflycloud && !brightbox && !packet && !oci && !vultr && !tencentcloud && !scaleway && !externalgrpc && !civo && !rancher && !volcengine && !baiducloud && !cherry && !cloudstack && !exoscale && !kamatera && !ovhcloud && !kwok
// +build !gce,!aws,!azure,!kubemark,!alicloud,!magnum,!digitalocean,!clusterapi,!huaweicloud,!ionoscloud,!linode,!hetzner,!bizflycloud,!brightbox,!packet,!oci,!vultr,!tencentcloud,!scaleway,!externalgrpc,!civo,!rancher,!volcengine,!baiducloud,!cherry,!cloudstack,!exoscale,!kamatera,!ovhcloud,!kwok
/*
Copyright 2018 The Kubernetes Authors.
@ -39,6 +39,7 @@ import (
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/huaweicloud"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/ionoscloud"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/kamatera"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/kwok"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/linode"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/magnum"
oci "k8s.io/autoscaler/cluster-autoscaler/cloudprovider/oci/instancepools"
@ -50,6 +51,7 @@ import (
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/volcengine"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/vultr"
"k8s.io/autoscaler/cluster-autoscaler/config"
"k8s.io/client-go/informers"
)
// AvailableCloudProviders supported by the cloud provider builder.
@ -72,6 +74,7 @@ var AvailableCloudProviders = []string{
cloudprovider.ClusterAPIProviderName,
cloudprovider.IonoscloudProviderName,
cloudprovider.KamateraProviderName,
cloudprovider.KwokProviderName,
cloudprovider.LinodeProviderName,
cloudprovider.BizflyCloudProviderName,
cloudprovider.BrightboxProviderName,
@ -87,7 +90,10 @@ var AvailableCloudProviders = []string{
// DefaultCloudProvider is GCE.
const DefaultCloudProvider = cloudprovider.GceProviderName
func buildCloudProvider(opts config.AutoscalingOptions, do cloudprovider.NodeGroupDiscoveryOptions, rl *cloudprovider.ResourceLimiter) cloudprovider.CloudProvider {
func buildCloudProvider(opts config.AutoscalingOptions,
do cloudprovider.NodeGroupDiscoveryOptions,
rl *cloudprovider.ResourceLimiter,
informerFactory informers.SharedInformerFactory) cloudprovider.CloudProvider {
switch opts.CloudProviderName {
case cloudprovider.BizflyCloudProviderName:
return bizflycloud.BuildBizflyCloud(opts, do, rl)
@ -129,6 +135,8 @@ func buildCloudProvider(opts config.AutoscalingOptions, do cloudprovider.NodeGro
return ionoscloud.BuildIonosCloud(opts, do, rl)
case cloudprovider.KamateraProviderName:
return kamatera.BuildKamatera(opts, do, rl)
case cloudprovider.KwokProviderName:
return kwok.BuildKwok(opts, do, rl, informerFactory)
case cloudprovider.LinodeProviderName:
return linode.BuildLinode(opts, do, rl)
case cloudprovider.OracleCloudProviderName:


@ -0,0 +1,43 @@
//go:build kwok
// +build kwok
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package builder
import (
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/kwok"
"k8s.io/autoscaler/cluster-autoscaler/config"
)
// AvailableCloudProviders supported by the cloud provider builder.
var AvailableCloudProviders = []string{
cloudprovider.KwokProviderName,
}
// DefaultCloudProvider for Kwok-only build is Kwok.
const DefaultCloudProvider = cloudprovider.KwokProviderName
func buildCloudProvider(opts config.AutoscalingOptions,
do cloudprovider.NodeGroupDiscoveryOptions,
rl *cloudprovider.ResourceLimiter,
informerFactory informers.SharedInformerFactory) cloudprovider.CloudProvider {
switch opts.CloudProviderName {
case cloudprovider.KwokProviderName:
return kwok.BuildKwok(opts, do, rl, informerFactory)
}
return nil
}


@ -20,12 +20,13 @@ import (
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
"k8s.io/autoscaler/cluster-autoscaler/context"
"k8s.io/client-go/informers"
klog "k8s.io/klog/v2"
)
// NewCloudProvider builds a cloud provider from provided parameters.
func NewCloudProvider(opts config.AutoscalingOptions) cloudprovider.CloudProvider {
func NewCloudProvider(opts config.AutoscalingOptions, informerFactory informers.SharedInformerFactory) cloudprovider.CloudProvider {
klog.V(1).Infof("Building %s cloud provider.", opts.CloudProviderName)
do := cloudprovider.NodeGroupDiscoveryOptions{
@ -42,7 +43,7 @@ func NewCloudProvider(opts config.AutoscalingOptions) cloudprovider.CloudProvide
return nil
}
provider := buildCloudProvider(opts, do, rl)
provider := buildCloudProvider(opts, do, rl, informerFactory)
if provider != nil {
return provider
}


@ -60,6 +60,8 @@ const (
KamateraProviderName = "kamatera"
// KubemarkProviderName gets the provider name of kubemark
KubemarkProviderName = "kubemark"
// KwokProviderName gets the provider name of kwok
KwokProviderName = "kwok"
// HuaweicloudProviderName gets the provider name of huaweicloud
HuaweicloudProviderName = "huaweicloud"
// IonoscloudProviderName gets the provider name of ionoscloud


@ -125,7 +125,7 @@ func main() {
},
UserAgent: "user-agent",
}
cloudProvider := cloudBuilder.NewCloudProvider(autoscalingOptions)
cloudProvider := cloudBuilder.NewCloudProvider(autoscalingOptions, nil)
srv := wrapper.NewCloudProviderGrpcWrapper(cloudProvider)
// listen


@ -0,0 +1,7 @@
approvers:
- vadasambar
reviewers:
- vadasambar
labels:
- area/provider/kwok


@ -0,0 +1,266 @@
With `kwok` provider you can:
* Run **CA** (cluster-autoscaler) in your terminal and connect it to a cluster (like a kubebuilder controller). You don't have to run CA in an actual cluster to test things out.
![](./docs/images/run-kwok-locally-1.png)
![](./docs/images/run-kwok-locally-2.png)
* Perform a "dry-run" to test autoscaling behavior of CA without creating actual VMs in your cloud provider.
* Run CA in your local kind cluster with nodes and workloads from a remote cluster (you can also use nodes from the same cluster).
![](./docs/images/kwok-as-dry-run-1.png)
![](./docs/images/kwok-as-dry-run-2.png)
* Test behavior of CA against a large number of fake nodes (of your choice) with metrics.
![](./docs/images/large-number-of-nodes-1.png)
![](./docs/images/large-number-of-nodes-2.png)
* etc.,
## What is `kwok` provider? Why `kwok` provider?
Check the doc around [motivation](./docs/motivation.md).
## How to use `kwok` provider
### In a Kubernetes cluster:
#### 1. Install `kwok` controller
Follow [the official docs to install `kwok`](https://kwok.sigs.k8s.io/docs/user/kwok-in-cluster/) in a cluster.
#### 2. Configure cluster-autoscaler to use `kwok` cloud provider
*Using helm chart*:
```shell
helm upgrade --install <release-name> charts/cluster-autoscaler \
--set "serviceMonitor.enabled"=true --set "serviceMonitor.namespace"=default \
--set "cloudprovider"=kwok --set "image.tag"="<image-tag>" \
--set "image.repository"="<image-repo>" \
--set "autoDiscovery.clusterName"="kind-kind" \
--set "serviceMonitor.selector.release"="prom"
```
Replace `<release-name>` with the release name you want.
Replace `<image-tag>` with the image tag you want. Replace `<image-repo>` with the image repo you want
(check [releases](https://github.com/kubernetes/autoscaler/releases) for the official image repos and tags)
Note that `kwok` provider doesn't use `autoDiscovery.clusterName`. You can use a fake value for `autoDiscovery.clusterName`.
Replace `"release"="prom"` with the label selector for `ServiceMonitor` in your grafana/prometheus installation.
For example, if you are using prometheus operator, you can find the service monitor label selector using
```shell
kubectl get prometheus -ojsonpath='{.items[*].spec.serviceMonitorSelector}' | jq # using jq is optional
```
Here's what it looks like
![](./docs/images/prom-match-labels.png)
The `helm upgrade ...` command above installs cluster-autoscaler with `kwok` cloud provider settings. The helm chart installs a default kwok provider configuration (`kwok-provider-config` ConfigMap) and sample template nodes (`kwok-provider-templates` ConfigMap) to get you started. Replace the content of these ConfigMaps according to your needs.
If you already have cluster-autoscaler running and don't want to use `helm ...`, you can make the following changes to get kwok provider working:
1. Create `kwok-provider-config` ConfigMap for kwok provider config
2. Create `kwok-provider-templates` ConfigMap for node templates
3. Set `POD_NAMESPACE` env variable in the CA Deployment (if it is not there already)
4. Set `--cloud-provider=kwok` in the CA Deployment
5. That's all.
For 1 and 2, you can refer to helm chart for the ConfigMaps. You can render them from the helm chart using:
```
helm template charts/cluster-autoscaler/ --set "cloudProvider"="kwok" -s templates/configmap.yaml --namespace=default
```
Replace `--namespace` with namespace where your CA pod is running.
If you want to temporarily revert to your previous cloud provider, just change `--cloud-provider=kwok` back to its previous value.
No other provider uses the `kwok-provider-config` and `kwok-provider-templates` ConfigMaps (you can keep them in the cluster, or delete them if you want to revert completely). `POD_NAMESPACE` is used only by the kwok provider (at the time of writing this).
#### 3. Configure `kwok` cloud provider
Decide if you want to use static template nodes or dynamic template nodes ([check the FAQ](#3-what-is-the-difference-between-static-template-nodes-and-dynamic-template-nodes) to understand the difference).
If you want to use static template nodes:
The `kwok-provider-config` ConfigMap in the helm chart is set to use static template nodes by default (`readNodesFrom` is set to `configmap`). The CA helm chart also installs a `kwok-provider-templates` ConfigMap with sample node yamls by default. If you want to use your own node yamls:
```shell
# delete the existing configmap
kubectl delete configmap kwok-provider-templates
# create a new configmap with your own node yamls
kubectl create configmap kwok-provider-templates --from-file=templates=template-nodes.yaml
```
Replace `template-nodes.yaml` with path to your template nodes file.
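For reference, a minimal `template-nodes.yaml` could look like the sketch below. It is a trimmed-down version of the sample templates shipped in the chart's `kwok-provider-templates` ConfigMap; the node name, label value and resource sizes here are only illustrative:
```yaml
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: Node
  metadata:
    name: template-node-m5-xlarge
    labels:
      kubernetes.io/arch: amd64
      kubernetes.io/os: linux
      # must match nodegroups.fromNodeLabelKey in kwok-provider-config
      kwok-nodegroup: m5-xlarge
  status:
    allocatable:
      cpu: "4"
      memory: 16Gi
      pods: "110"
    capacity:
      cpu: "4"
      memory: 16Gi
      pods: "110"
```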
If you are using your own template nodes in the `kwok-provider-templates` ConfigMap, make sure you have set the correct value for `nodegroups.fromNodeLabelKey`/`nodegroups.fromNodeAnnotation`. Not doing so will prevent CA from scaling up nodes (it won't throw any error either).
If you want to use dynamic template nodes:
Set `readNodesFrom` in the `kwok-provider-config` ConfigMap to `cluster`. This tells the kwok provider to use live nodes from the cluster as template nodes.
If you are using live nodes from the cluster as template nodes, make sure you have set the correct value for `nodegroups.fromNodeLabelKey`/`nodegroups.fromNodeAnnotation`. Not doing so will prevent CA from scaling up nodes (it won't throw any error either).
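A minimal `kwok-provider-config` for dynamic template nodes could look like the sketch below (the grouping label shown is just an example; pick whatever label or annotation groups your live nodes the way you want):
```yaml
apiVersion: v1alpha1
readNodesFrom: cluster # use live nodes from the cluster as template nodes
nodegroups:
  # group live nodes into nodegroups by the value of this label
  fromNodeLabelKey: "node.kubernetes.io/instance-type"
```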
### For local development
1. Point your kubeconfig to the cluster where you want to test your changes
Using [`kubectx`](https://github.com/ahmetb/kubectx):
```
kubectx <cluster-name>
```
Using `kubectl`:
```
kubectl config get-contexts
kubectl config use-context <context-name>
```
2. Create the `kwok-provider-config` and `kwok-provider-templates` ConfigMaps in the cluster where you want to test your changes.
This is important because even if you run CA locally with the kwok provider, the kwok provider still looks for the `kwok-provider-config` and `kwok-provider-templates` ConfigMaps (because by default `kwok-provider-config` has `readNodesFrom` set to `configmap`) in the cluster it connects to.
You can create both the ConfigMap resources from the helm chart like this:
```shell
helm template charts/cluster-autoscaler/ --set "cloudProvider"="kwok" -s templates/configmap.yaml --namespace=default | kubectl apply -f -
```
`--namespace` has to match `POD_NAMESPACE` env variable you set below.
3. Run CA locally
```shell
# replace `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT`
# with your kubernetes api server url
# you can find it with `kubectl cluster-info`
# example:
# $ kubectl cluster-info
# Kubernetes control plane is running at https://127.0.0.1:36357
# ...
export KUBERNETES_SERVICE_HOST=https://127.0.0.1
export KUBERNETES_SERVICE_PORT=36357
# POD_NAMESPACE is the namespace where you want to look for
# your `kwok-provider-config` and `kwok-provider-templates` ConfigMap
export POD_NAMESPACE=default
# KWOK_PROVIDER_MODE tells kwok provider that we are running CA locally
export KWOK_PROVIDER_MODE=local
# `2>&1` merges stderr into stdout so both get piped to VS Code (remove `| code -` if you don't use VS Code)
go run main.go --kubeconfig=/home/suraj/.kube/config --cloud-provider=kwok --namespace=default --logtostderr=true --stderrthreshold=info --v=5 2>&1 | code -
```
This is what it looks like in action:
![](./docs/images/run-kwok-locally-3.png)
## Tweaking the `kwok` provider
You can change the behavior of `kwok` provider by tweaking the kwok provider configuration in `kwok-provider-config` ConfigMap:
```yaml
# only v1alpha1 is supported right now
apiVersion: v1alpha1
# possible values: [cluster,configmap]
# cluster: use nodes from cluster as template nodes
# configmap: use node yamls from a configmap as template nodes
readNodesFrom: configmap
# nodegroups specifies nodegroup level config
nodegroups:
  # fromNodeLabelKey's value is used to group nodes together into nodegroups
  # For example, say you want to group nodes with same value for `node.kubernetes.io/instance-type`
  # label as a nodegroup. Here are the nodes you have:
  # node1: m5.xlarge
  # node2: c5.xlarge
  # node3: m5.xlarge
  # Your nodegroups will look like this:
  # nodegroup1: [node1,node3]
  # nodegroup2: [node2]
  fromNodeLabelKey: "node.kubernetes.io/instance-type"
  # fromNodeAnnotation's value is used to group nodes together into nodegroups
  # (basically same as `fromNodeLabelKey` except based on annotation)
  # you can specify either of `fromNodeLabelKey` OR `fromNodeAnnotation`
  # (both are not allowed)
  fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
# nodes specifies node level config
nodes:
  # skipTaint is used to enable/disable adding kwok provider taint on the template nodes
  # default is false so that even if you run the provider in a production cluster
  # you don't have to worry about production workload
  # getting accidentally scheduled on the fake nodes
  skipTaint: true # default: false
  # gpuConfig is used to specify gpu config for the node
  gpuConfig:
    # to tell kwok provider what label should be considered as GPU label
    gpuLabelKey: "k8s.amazonaws.com/accelerator"
    # availableGPUTypes is used to specify available GPU types
    availableGPUTypes:
      "nvidia-tesla-k80": {}
      "nvidia-tesla-p100": {}
# configmap specifies config map name and key which stores the kwok provider templates in the cluster
# Only applicable when `readNodesFrom: configmap`
configmap:
  name: kwok-provider-templates
  key: kwok-config # default: config
```
By default, kwok provider looks for `kwok-provider-config` ConfigMap. If you want to use a different ConfigMap name, set the env variable `KWOK_PROVIDER_CONFIGMAP` (e.g., `KWOK_PROVIDER_CONFIGMAP=kpconfig`). You can set this env variable in the helm chart using `kwokConfigMapName` OR you can set it directly in the cluster-autoscaler Deployment with `kubectl edit deployment ...`.
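For example, setting the env variable directly on the cluster-autoscaler Deployment could look like this (a sketch; `kpconfig` is the example ConfigMap name from above):
```yaml
env:
- name: KWOK_PROVIDER_CONFIGMAP
  value: "kpconfig"
```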
### FAQ
#### 1. What is the difference between `kwok` and `kwok` provider?
`kwok` is an open source project under `sig-scheduling`.
> KWOK is a toolkit that enables setting up a cluster of thousands of Nodes in seconds. Under the scene, all Nodes are simulated to behave like real ones, so the overall approach employs a pretty low resource footprint that you can easily play around on your laptop.
https://kwok.sigs.k8s.io/
`kwok` provider refers to the cloud provider extension/plugin in cluster-autoscaler which uses `kwok` to create fake nodes.
#### 2. What does a template node exactly mean?
A template node is the base node yaml the `kwok` provider uses to create a new node in the cluster.
#### 3. What is the difference between static template nodes and dynamic template nodes?
Static template nodes are created from the node yaml the user specifies in the `kwok-provider-templates` ConfigMap, while dynamic template nodes are based on the node yaml of nodes currently running in the cluster.
#### 4. Can I use both static and dynamic template nodes together?
As of now, no, you can't (but it's an interesting idea). If you have a specific use case, please create an issue and we can talk more there!
#### 5. What is the difference between kwok provider config and template nodes config?
The kwok provider config is configuration that changes the behavior of the kwok provider (not the underlying `kwok` toolkit), while the template nodes config is the ConfigMap you can use to specify static node templates.
### Gotchas
1. kwok provider by default taints the template nodes with a `kwok-provider: true` taint so that production workloads don't get scheduled on these nodes accidentally. You have to tolerate this taint to schedule your workload on the nodes created by the kwok provider (see the example toleration after this list). You can turn this off by setting `nodes.skipTaint: true` in the kwok provider config.
2. Make sure the label/annotation for `fromNodeLabelKey`/`fromNodeAnnotation` in kwok provider config is actually present on the template nodes. If it isn't present on the template nodes, kwok provider will not be able to create new nodes.
3. Note that kwok provider makes the following changes to all the template nodes:
(roughly, in Go)
```go
node.Status.NodeInfo.KubeletVersion = "fake"
node.Annotations["kwok.x-k8s.io/node"] = "fake"
node.Annotations["cluster-autoscaler.kwok.nodegroup/name"] = "<name-of-the-nodegroup>"
node.Spec.ProviderID = "kwok:<name-of-the-node>"
node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
	Key:    "kwok-provider",
	Value:  "true",
	Effect: corev1.TaintEffectNoSchedule,
})
```
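For gotcha 1 above, a workload that should be schedulable on kwok-managed nodes needs a toleration matching the default taint, for example (a sketch of the relevant pod spec snippet):
```yaml
tolerations:
- key: "kwok-provider"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
```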
## I have a problem/suggestion/question/idea/feature request. What should I do?
Awesome! Please:
* [Create a new issue](https://github.com/kubernetes/autoscaler/issues/new/choose) around it. Mention `@vadasambar` (I try to respond within a working day).
* Start a slack thread around it in the Kubernetes `#sig-autoscaling` channel (for an invitation, check [this](https://slack.k8s.io/)). Mention `@vadasambar` (I try to respond within a working day).
* Add it to the [weekly sig-autoscaling meeting agenda](https://docs.google.com/document/d/1RvhQAEIrVLHbyNnuaT99-6u9ZUMp7BfkPupT2LAZK7w/edit) (happens [on Mondays](https://github.com/kubernetes/community/tree/master/sig-autoscaling#meetings))
Please don't think too much about creating an issue. We can always close it if it doesn't make sense.
## What is not supported?
* Creating kwok nodegroups based on the `kubernetes.io/hostname` node label. Why? Imagine you have a `Deployment` (replicas: 2) with pod anti-affinity on the `kubernetes.io/hostname` label like this (a sketch of such a Deployment is shown at the end of this section):
![](./docs/images/kwok-provider-hostname-label.png)
Imagine you have only 2 unique values for the `kubernetes.io/hostname` node label in your cluster:
* `hostname1`
* `hostname2`
If you increase the number of replicas in the `Deployment` to 3, CA creates a fake node internally and runs simulations on it to decide if it should scale up. This fake node has `kubernetes.io/hostname` set to the name of the fake node, which looks like `template-node-xxxx-xxxx` (the second `xxxx` is random). Since the value of `kubernetes.io/hostname` on the fake node is not `hostname1` or `hostname2`, CA thinks it can schedule the `Pending` pod on the fake node and hence keeps scaling up to infinity (or until it can't).
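The kind of pod anti-affinity referred to in this example could look like the following Deployment snippet (a sketch of what the image above illustrates; all names are illustrative):
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: anti-affinity-example
spec:
  replicas: 2
  selector:
    matchLabels:
      app: anti-affinity-example
  template:
    metadata:
      labels:
        app: anti-affinity-example
    spec:
      affinity:
        podAntiAffinity:
          # at most one replica per unique kubernetes.io/hostname value
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: anti-affinity-example
            topologyKey: kubernetes.io/hostname
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.7
```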
## Troubleshooting
1. Pods are still stuck in `Running` even after CA has cleaned up all the kwok nodes
* `kwok` provider doesn't drain the nodes when it deletes them. It just deletes the nodes. You should see pods running on these nodes change from `Running` state to `Pending` state in a minute or two. But if you don't, try scaling down your workload and scaling it up again. If the issue persists, please create an issue :pray:.
## I want to contribute
Thank you ❤️
It is expected that you know how to build and run CA locally. If you don't, I recommend starting from the [`Makefile`](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/Makefile). Check the CA [FAQ](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md) to know more about CA in general ([including info around building CA and submitting a PR](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#developer)). CA is a big and complex project. If you have any questions or if you get stuck anywhere, [reach out for help](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/kwok/README.md#reach-out-for-help-if-you-get-stuck).
### Get yourself familiar with the `kwok` project
Check https://kwok.sigs.k8s.io/
### Try out the `kwok` provider
Go through [the README](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/kwok/README.md).
### Look for a good first issue
Check [this](https://github.com/kubernetes/autoscaler/issues?q=is%3Aopen+is%3Aissue+label%3Aarea%2Fprovider%2Fkwok+label%3A%22good+first+issue%22) filter for good first issues around `kwok` provider.
### Reach out for help if you get stuck
You can get help in the following ways:
* Mention `@vadasambar` in the issue/PR you are working on.
* Start a slack thread in `#sig-autoscaling` mentioning `@vadasambar` (to join Kubernetes slack click [here](https://slack.k8s.io/)).
* Add it to the weekly [sig-autoscaling meeting](https://github.com/kubernetes/community/tree/master/sig-autoscaling#meetings) agenda (happens on Mondays)

(Binary image files — the screenshots referenced in the README above, added under `docs/images/` — are not shown here.)

@ -0,0 +1,107 @@
# KWOK (Kubernetes without Kubelet) cloud provider
*This doc was originally a part of https://github.com/kubernetes/autoscaler/pull/5869*
## Introduction
> [KWOK](https://sigs.k8s.io/kwok) is a toolkit that enables setting up a cluster of thousands of Nodes in seconds. Under the scene, all Nodes are simulated to behave like real ones, so the overall approach employs a pretty low resource footprint that you can easily play around on your laptop.
https://kwok.sigs.k8s.io/
## Problem
### 1. It is hard to reproduce an issue happening at scale on a local machine
e.g., https://github.com/kubernetes/autoscaler/issues/5769
To reproduce such issues, we have the following options today:
### (a) set up [Kubemark](https://github.com/kubernetes/design-proposals-archive/blob/main/scalability/kubemark.md) on a public cloud provider and try reproducing the issue
You can [setup Kubemark](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-scalability/kubemark-guide.md) ([related](https://github.com/kubernetes/kubernetes/blob/master/test/kubemark/pre-existing/README.md)) and use the [`kubemark` cloudprovider](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/kubemark) (kubemark [proposal](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/kubemark_integration.md)) directly or [`cluster-api` cloudprovider with kubemark](https://github.com/kubernetes-sigs/cluster-api-provider-kubemark)
In either case,
> Every running Kubemark setup looks like the following:
> 1) A running Kubernetes cluster pointed to by the local kubeconfig
> 2) A separate VM where the kubemark master is running
> 3) Some hollow-nodes that run on the Kubernetes Cluster from #1
> 4) The hollow-nodes are configured to talk with the kubemark master at #2
https://github.com/kubernetes/kubernetes/blob/master/test/kubemark/pre-existing/README.md#introduction
You need to set up a separate VM (Virtual Machine) with master components to get Kubemark running.
> Currently we're running HollowNode with a limit of 0.09 CPU core/pod and 220MB of memory. However, if we also take into account the resources absorbed by default cluster addons and fluentD running on the 'external' cluster, this limit becomes ~0.1 CPU core/pod, thus allowing ~10 HollowNodes to run per core (on an "n1-standard-8" VM node).
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-scalability/kubemark-guide.md#starting-a-kubemark-cluster
Kubemark can mimic 10 nodes with 1 CPU core.
In reality it might be fewer than 10 nodes:
> Using Kubernetes and [kubemark](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/scalability/kubemark.md) on GCP we have created a following 1000 node cluster setup:
>* 1 master - 1-core VM
>* 17 nodes - 8-core VMs, each core running up to 8 Kubemark nodes.
>* 1 Kubemark master - 32-core VM
>* 1 dedicated VM for Cluster Autoscaler
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/scalability_tests.md#test-setup
This is a cheaper option than (c), but if you want to set up Kubemark on your local machine you will need a master node plus 1 core per 10 fake nodes, i.e., if you want to mimic 100 nodes, that's 10 CPU cores plus extra CPU for the master node. Unless you have 10-12 free cores on your local machine, it is hard to run scale tests with Kubemark for more than 100 nodes.
### (b) try to get as much information from the issue reporter as possible and try to reproduce the issue by tweaking our tests
This works well if the issue is easy to reproduce by tweaking tests, e.g., when you want to check why scale down is getting blocked on a particular pod. You can do so by mimicking the pod in the tests, adding an entry [here](https://github.com/kubernetes/autoscaler/blob/1009797f5585d7bf778072ba59fd12eb2b8ab83c/cluster-autoscaler/utils/drain/drain_test.go#L878-L887) and running
```
cluster-autoscaler/utils/drain$ go test -run TestDrain
```
But when you want to test an issue related to scale, e.g., CA being slow to scale up, this approach is hard to apply.
### (c) try reproducing the issue using the same CA setup as the user, with actual nodes in a public cloud provider
e.g., if the issue reporter has a 200 node cluster in AWS, try creating a 200 node cluster in AWS and use the same CA flags as the issue reporter.
This is a viable option if you already have a running cluster of a similar size, but otherwise creating a big cluster just to reproduce the issue is costly.
### 2. It is hard to confirm the behavior of CA at scale
For example, a user with a big Kubernetes cluster (> 100-200 nodes) wants to check if adding scheduling properties to their workloads (node affinity, pod affinity, node selectors etc.,) leads to better utilization of the nodes (which saves cost). To give a more concrete example, imagine a situation like this:
1. There is a cluster with > 100 nodes. cpu to memory ratio for the nodes is 1:1, 1:2, 1:8 and 1:16
2. It is observed that 1:16 nodes are underutilized on memory
3. It is observed that workloads with cpu to memory ratio of 1:7 are getting scheduled on 1:16 nodes thereby leaving some memory unused
e.g.,
A 1:16 node looks like this:
  CPUs: 8 cores
  Memory: 128Gi
A workload (1:7 cpu:memory ratio) looks like this:
  CPUs: 1 core
  Memory: 7Gi
Resources wasted on the node (leftover after packing whole workloads): 8 % 1 CPU(s) + 128 % 7 Gi
= 0 CPUs + 2Gi memory = 2Gi of wasted memory
A 1:8 node looks like this:
  CPUs: 8 cores
  Memory: 64Gi
A workload (1:7 cpu:memory ratio) looks like this:
  CPUs: 1 core
  Memory: 7Gi
Resources wasted on the node: 8 % 1 CPU(s) + 64 % 7 Gi
= 0 CPUs + 1Gi memory = 1Gi of wasted memory
If the 1:7 workloads could somehow be scheduled on the 1:8 nodes using a node selector or required node affinity, the wastage would go down. The user wants to add required node affinity to the 1:7 workloads and see how CA would behave, without creating actual nodes in a public cloud provider. The goal is to check whether the theory holds and whether there are any side effects.
This can be done with Kubemark today, but a public cloud provider would be needed to mimic a cluster of this size; it can't be done on a local cluster (kind/minikube etc.,).
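To make the arithmetic above easy to replay, here is a tiny Go sketch (not part of the provider); it mirrors the same simplified leftover calculation, i.e., it deliberately ignores CPU as a packing constraint, just like the example:

```go
package main

import "fmt"

// leftoverMemoryGi mirrors the simplified arithmetic above: the memory left on
// a node after carving it into as many whole workload-sized memory chunks as fit.
func leftoverMemoryGi(nodeMemGi, workloadMemGi int) int {
	return nodeMemGi % workloadMemGi
}

func main() {
	// 1:16 node (8 cores, 128Gi) vs 1:8 node (8 cores, 64Gi),
	// both packed with 1-core / 7Gi workloads.
	fmt.Printf("1:16 node: %dGi of wasted memory\n", leftoverMemoryGi(128, 7)) // 2Gi
	fmt.Printf("1:8 node:  %dGi of wasted memory\n", leftoverMemoryGi(64, 7))  // 1Gi
}
```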
### How does it look in action?
You can check it [here](https://github.com/kubernetes/autoscaler/issues/5769#issuecomment-1590541506).
### FAQ
1. **Will this be patched back to older releases of Kubernetes?**
As of writing this, the plan is to release it as a part of Kubernetes 1.28 and backport it to 1.27 and 1.26.
2. **Why did we not use gRPC or the cluster-api provider to implement this?**
The idea was to enable users/contributors to scale-test issues around different cloud providers (e.g., https://github.com/kubernetes/autoscaler/issues/5769). Implementing the `kwok` provider in-tree means we are closer to the actual implementation of our most-used cloud providers (adding gRPC communication in between would add extra latency that is not there in our in-tree cloud providers). Although only the in-tree provider is a part of this proposal, the overall plan is to:
* Implement in-tree provider to cover most of the common use-cases
* Implement `kwok` provider for `clusterapi` provider so that we can provision `kwok` nodes using `clusterapi` provider ([someone is already working on this](https://kubernetes.slack.com/archives/C8TSNPY4T/p1685648610609449))
* Implement gRPC provider if there is user demand
3. **How performant is the `kwok` provider really compared to the `kubemark` provider?**
The `kubemark` provider seems to need 1 core per 8-10 nodes (based on our [last scale tests](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/scalability_tests.md#test-setup)). This means we need roughly 10 cores to simulate 100 nodes with `kubemark`.
The `kwok` provider can simulate 385 nodes using 122m of CPU and 521Mi of memory. CPU-wise, `kwok` can simulate 385 / 0.122 =~ 3155 nodes per 1 core of CPU.
![](images/kwok-provider-grafana.png)
![](images/kwok-provider-in-action.png)
4. **Can I think of `kwok` as a dry-run for my actual `cloudprovider`?**
That is the goal but note that the definition of what exactly `dry-run` means is not very clear and can mean different things for different users. You can think of it as something similar to a `dry-run`.

View File

@ -0,0 +1,153 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"context"
"errors"
"fmt"
"os"
"strings"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/yaml"
kubeclient "k8s.io/client-go/kubernetes"
klog "k8s.io/klog/v2"
)
const (
defaultConfigName = "kwok-provider-config"
configKey = "config"
)
// based on https://github.com/kubernetes/kubernetes/pull/63707/files
func getCurrentNamespace() string {
currentNamespace := os.Getenv("POD_NAMESPACE")
if strings.TrimSpace(currentNamespace) == "" {
klog.Info("env variable 'POD_NAMESPACE' is empty")
klog.Info("trying to read current namespace from serviceaccount")
// Fall back to the namespace associated with the service account token, if available
if data, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/namespace"); err == nil {
if ns := strings.TrimSpace(string(data)); len(ns) > 0 {
currentNamespace = ns
} else {
klog.Fatal("couldn't get current namespace from serviceaccount")
}
} else {
klog.Fatal("couldn't read serviceaccount to get current namespace")
}
}
klog.Infof("got current pod namespace '%s'", currentNamespace)
return currentNamespace
}
func getConfigMapName() string {
configMapName := os.Getenv("KWOK_PROVIDER_CONFIGMAP")
if strings.TrimSpace(configMapName) == "" {
klog.Infof("env variable 'KWOK_PROVIDER_CONFIGMAP' is empty (defaulting to '%s')", defaultConfigName)
configMapName = defaultConfigName
}
return configMapName
}
// LoadConfigFile loads kwok provider config from k8s configmap
func LoadConfigFile(kubeClient kubeclient.Interface) (*KwokProviderConfig, error) {
configMapName := getConfigMapName()
currentNamespace := getCurrentNamespace()
c, err := kubeClient.CoreV1().ConfigMaps(currentNamespace).Get(context.Background(), configMapName, v1.GetOptions{})
if err != nil {
return nil, fmt.Errorf("failed to get configmap '%s': %v", configMapName, err)
}
decoder := yaml.NewYAMLOrJSONDecoder(strings.NewReader(c.Data[configKey]), 4096)
kwokConfig := KwokProviderConfig{}
if err := decoder.Decode(&kwokConfig); err != nil {
return nil, fmt.Errorf("failed to decode kwok config: %v", err)
}
if kwokConfig.status == nil {
kwokConfig.status = &GroupingConfig{}
}
switch kwokConfig.ReadNodesFrom {
case nodeTemplatesFromConfigMap:
if kwokConfig.ConfigMap == nil {
return nil, fmt.Errorf("please specify a value for 'configmap' in kwok config (currently empty or undefined)")
}
if strings.TrimSpace(kwokConfig.ConfigMap.Name) == "" {
return nil, fmt.Errorf("please specify 'configmap.name' in kwok config (currently empty or undefined)")
}
case nodeTemplatesFromCluster:
default:
return nil, fmt.Errorf("'readNodesFrom' in kwok config is invalid (expected: '%s' or '%s'): %s",
nodeTemplatesFromConfigMap, nodeTemplatesFromCluster,
kwokConfig.ReadNodesFrom)
}
if kwokConfig.Nodegroups == nil {
return nil, fmt.Errorf("please specify a value for 'nodegroups' in kwok config (currently empty or undefined)")
}
if strings.TrimSpace(kwokConfig.Nodegroups.FromNodeLabelKey) == "" &&
strings.TrimSpace(kwokConfig.Nodegroups.FromNodeLabelAnnotation) == "" {
return nil, fmt.Errorf("please specify either 'nodegroups.fromNodeLabelKey' or 'nodegroups.fromNodeAnnotation' in kwok provider config (currently empty or undefined)")
}
if strings.TrimSpace(kwokConfig.Nodegroups.FromNodeLabelKey) != "" &&
strings.TrimSpace(kwokConfig.Nodegroups.FromNodeLabelAnnotation) != "" {
return nil, fmt.Errorf("please specify either 'nodegroups.fromNodeLabelKey' or 'nodegroups.fromNodeAnnotation' in kwok provider config (you can't use both)")
}
if strings.TrimSpace(kwokConfig.Nodegroups.FromNodeLabelKey) != "" {
kwokConfig.status.groupNodesBy = groupNodesByLabel
kwokConfig.status.key = kwokConfig.Nodegroups.FromNodeLabelKey
} else {
kwokConfig.status.groupNodesBy = groupNodesByAnnotation
kwokConfig.status.key = kwokConfig.Nodegroups.FromNodeLabelAnnotation
}
if kwokConfig.Nodes == nil {
kwokConfig.Nodes = &NodeConfig{}
} else {
if kwokConfig.Nodes.GPUConfig == nil {
klog.Warningf("nodes.gpuConfig is empty or undefined")
} else {
if kwokConfig.Nodes.GPUConfig.GPULabelKey != "" &&
kwokConfig.Nodes.GPUConfig.AvailableGPUTypes != nil {
kwokConfig.status.availableGPUTypes = kwokConfig.Nodes.GPUConfig.AvailableGPUTypes
kwokConfig.status.gpuLabel = kwokConfig.Nodes.GPUConfig.GPULabelKey
} else {
return nil, errors.New("nodes.gpuConfig.gpuLabelKey or nodes.gpuConfig.availableGPUTypes is empty")
}
}
}
if kwokConfig.Kwok == nil {
kwokConfig.Kwok = &KwokConfig{}
}
return &kwokConfig, nil
}
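For orientation, below is a minimal sketch (not part of this PR) of how `LoadConfigFile` could be wired up with an in-cluster client. It assumes `POD_NAMESPACE` is set on the pod (e.g., via the downward API) and that the provider is importable at the path shown; both are assumptions for illustration only:

```go
package main

import (
	"fmt"

	"k8s.io/autoscaler/cluster-autoscaler/cloudprovider/kwok"
	kubeclient "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Build an in-cluster client; LoadConfigFile only needs a kube client.
	restCfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubeclient.NewForConfigOrDie(restCfg)

	// Reads the configmap named by KWOK_PROVIDER_CONFIGMAP
	// (default "kwok-provider-config") from the pod's namespace.
	cfg, err := kwok.LoadConfigFile(client)
	if err != nil {
		panic(fmt.Errorf("loading kwok provider config: %w", err))
	}
	fmt.Println("readNodesFrom:", cfg.ReadNodesFrom)
}
```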

View File

@ -0,0 +1,285 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"testing"
"os"
"github.com/stretchr/testify/assert"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/kubernetes/fake"
core "k8s.io/client-go/testing"
)
var testConfigs = map[string]string{
defaultConfigName: testConfig,
"without-kwok": withoutKwok,
"with-static-kwok-release": withStaticKwokRelease,
"skip-kwok-install": skipKwokInstall,
}
// with node templates from configmap
const testConfig = `
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "kwok-nodegroup"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
kwok: {}
`
// with node templates from configmap
const testConfigSkipTaint = `
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "kwok-nodegroup"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
skipTaint: true
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
kwok: {}
`
const testConfigDynamicTemplates = `
apiVersion: v1alpha1
readNodesFrom: cluster # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "kwok-nodegroup"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
kwok: {}
`
const testConfigDynamicTemplatesSkipTaint = `
apiVersion: v1alpha1
readNodesFrom: cluster # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "kwok-nodegroup"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
skipTaint: true
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
kwok: {}
`
const withoutKwok = `
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "node.kubernetes.io/instance-type"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
`
const withStaticKwokRelease = `
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "node.kubernetes.io/instance-type"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
kwok:
release: "v0.2.1"
configmap:
name: kwok-provider-templates
`
const skipKwokInstall = `
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "node.kubernetes.io/instance-type"
# you can either specify fromNodeLabelKey OR fromNodeAnnotation
# (both are not allowed)
# fromNodeAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
kwok:
skipInstall: true
`
func TestLoadConfigFile(t *testing.T) {
defer func() {
os.Unsetenv("KWOK_PROVIDER_CONFIGMAP")
}()
fakeClient := &fake.Clientset{}
fakeClient.Fake.AddReactor("get", "configmaps", func(action core.Action) (bool, runtime.Object, error) {
getAction := action.(core.GetAction)
if getAction == nil {
return false, nil, nil
}
cmName := getConfigMapName()
if getAction.GetName() == cmName {
return true, &v1.ConfigMap{
Data: map[string]string{
configKey: testConfigs[cmName],
},
}, nil
}
return true, nil, errors.NewNotFound(v1.Resource("configmaps"), "whatever")
})
os.Setenv("POD_NAMESPACE", "kube-system")
kwokConfig, err := LoadConfigFile(fakeClient)
assert.Nil(t, err)
assert.NotNil(t, kwokConfig)
assert.NotNil(t, kwokConfig.status)
assert.NotEmpty(t, kwokConfig.status.gpuLabel)
os.Setenv("KWOK_PROVIDER_CONFIGMAP", "without-kwok")
kwokConfig, err = LoadConfigFile(fakeClient)
assert.Nil(t, err)
assert.NotNil(t, kwokConfig)
assert.NotNil(t, kwokConfig.status)
assert.NotEmpty(t, kwokConfig.status.gpuLabel)
os.Setenv("KWOK_PROVIDER_CONFIGMAP", "with-static-kwok-release")
kwokConfig, err = LoadConfigFile(fakeClient)
assert.Nil(t, err)
assert.NotNil(t, kwokConfig)
assert.NotNil(t, kwokConfig.status)
assert.NotEmpty(t, kwokConfig.status.gpuLabel)
os.Setenv("KWOK_PROVIDER_CONFIGMAP", "skip-kwok-install")
kwokConfig, err = LoadConfigFile(fakeClient)
assert.Nil(t, err)
assert.NotNil(t, kwokConfig)
assert.NotNil(t, kwokConfig.status)
assert.NotEmpty(t, kwokConfig.status.gpuLabel)
}
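For reference, the test above can be run on its own with the standard Go tooling (directory assumed from where the provider sits in the cluster-autoscaler tree):

```
cluster-autoscaler/cloudprovider/kwok$ go test -run TestLoadConfigFile
```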

View File

@ -0,0 +1,163 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
const (
// ProviderName is the cloud provider name for kwok
ProviderName = "kwok"
// NGNameAnnotation is the annotation the kwok provider uses to track the nodegroups
NGNameAnnotation = "cluster-autoscaler.kwok.nodegroup/name"
// NGMinSizeAnnotation is the annotation on template nodes which specifies the min size of the nodegroup
NGMinSizeAnnotation = "cluster-autoscaler.kwok.nodegroup/min-count"
// NGMaxSizeAnnotation is the annotation on template nodes which specifies the max size of the nodegroup
NGMaxSizeAnnotation = "cluster-autoscaler.kwok.nodegroup/max-count"
// NGDesiredSizeAnnotation is the annotation on template nodes which specifies the desired size of the nodegroup
NGDesiredSizeAnnotation = "cluster-autoscaler.kwok.nodegroup/desired-count"
// KwokManagedAnnotation is the default annotation
// that kwok manages to decide if it should manage
// a node it sees in the cluster
KwokManagedAnnotation = "kwok.x-k8s.io/node"
groupNodesByAnnotation = "annotation"
groupNodesByLabel = "label"
// // GPULabel is the label added to nodes with GPU resource.
// GPULabel = "cloud.google.com/gke-accelerator"
// for kwok provider config
nodeTemplatesFromConfigMap = "configmap"
nodeTemplatesFromCluster = "cluster"
)
const testTemplates = `
apiVersion: v1
items:
- apiVersion: v1
kind: Node
metadata:
annotations: {}
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker
kwok-nodegroup: kind-worker
kubernetes.io/os: linux
k8s.amazonaws.com/accelerator: "nvidia-tesla-k80"
name: kind-worker
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
- apiVersion: v1
kind: Node
metadata:
annotations: {}
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker-2
kubernetes.io/os: linux
k8s.amazonaws.com/accelerator: "nvidia-tesla-k80"
name: kind-worker-2
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker-2
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker-2
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
kind: List
metadata:
resourceVersion: ""
`
// yaml version of fakeNode1, fakeNode2 and fakeNode3
const testTemplatesMinimal = `
apiVersion: v1
items:
- apiVersion: v1
kind: Node
metadata:
annotations:
cluster-autoscaler.kwok.nodegroup/name: ng1
labels:
kwok-nodegroup: ng1
name: node1
spec: {}
- apiVersion: v1
kind: Node
metadata:
annotations:
cluster-autoscaler.kwok.nodegroup/name: ng2
labels:
kwok-nodegroup: ng2
name: node2
spec: {}
- apiVersion: v1
kind: Node
metadata:
annotations: {}
labels: {}
name: node3
spec: {}
kind: List
metadata:
resourceVersion: ""
`

View File

@ -0,0 +1,278 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"bufio"
"context"
"errors"
"fmt"
"io"
"log"
"strconv"
"strings"
"time"
apiv1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/runtime/serializer"
"k8s.io/apimachinery/pkg/util/yaml"
kube_util "k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes"
"k8s.io/client-go/kubernetes"
clientscheme "k8s.io/client-go/kubernetes/scheme"
v1lister "k8s.io/client-go/listers/core/v1"
klog "k8s.io/klog/v2"
)
const (
templatesKey = "templates"
defaultTemplatesConfigName = "kwok-provider-templates"
)
type listerFn func(lister v1lister.NodeLister, filter func(*apiv1.Node) bool) kube_util.NodeLister
func loadNodeTemplatesFromCluster(kc *KwokProviderConfig,
kubeClient kubernetes.Interface,
lister kube_util.NodeLister) ([]*apiv1.Node, error) {
if lister != nil {
return lister.List()
}
nodeList, err := kubeClient.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
if err != nil {
return nil, err
}
nos := []*apiv1.Node{}
// note: not using _, node := range nodeList.Items here because it leads to unexpected behavior
// more info: https://stackoverflow.com/a/38693163/6874596
for i := range nodeList.Items {
nos = append(nos, &(nodeList.Items[i]))
}
return nos, nil
}
// LoadNodeTemplatesFromConfigMap loads template nodes from a k8s configmap
// check https://github.com/vadafoss/node-templates for more info on the parsing logic
func LoadNodeTemplatesFromConfigMap(configMapName string,
kubeClient kubernetes.Interface) ([]*apiv1.Node, error) {
currentNamespace := getCurrentNamespace()
nodeTemplates := []*apiv1.Node{}
c, err := kubeClient.CoreV1().ConfigMaps(currentNamespace).Get(context.Background(), configMapName, v1.GetOptions{})
if err != nil {
return nil, fmt.Errorf("failed to get configmap '%s': %v", configMapName, err)
}
if c.Data[templatesKey] == "" {
return nil, fmt.Errorf("configmap '%s' doesn't have 'templates' key", configMapName)
}
scheme := runtime.NewScheme()
clientscheme.AddToScheme(scheme)
decoder := serializer.NewCodecFactory(scheme).UniversalDeserializer()
multiDocReader := yaml.NewYAMLReader(bufio.NewReader(strings.NewReader(c.Data[templatesKey])))
objs := []runtime.Object{}
for {
buf, err := multiDocReader.Read()
if err != nil {
if err == io.EOF {
break
}
return nil, err
}
obj, _, err := decoder.Decode(buf, nil, nil)
if err != nil {
return nil, err
}
objs = append(objs, obj)
}
if len(objs) > 1 {
for _, obj := range objs {
if node, ok := obj.(*apiv1.Node); ok {
nodeTemplates = append(nodeTemplates, node)
}
}
} else if nodelist, ok := objs[0].(*apiv1.List); ok {
for _, item := range nodelist.Items {
o, _, err := decoder.Decode(item.Raw, nil, nil)
if err != nil {
return nil, err
}
if node, ok := o.(*apiv1.Node); ok {
nodeTemplates = append(nodeTemplates, node)
}
}
} else {
return nil, errors.New("invalid templates file (found something other than nodes in the file)")
}
return nodeTemplates, nil
}
func createNodegroups(nodes []*apiv1.Node, kubeClient kubernetes.Interface, kc *KwokProviderConfig, initCustomLister listerFn,
allNodeLister v1lister.NodeLister) []*NodeGroup {
ngs := map[string]*NodeGroup{}
// note: not using _, node := range nodes here because it leads to unexpected behavior
// more info: https://stackoverflow.com/a/38693163/6874596
for i := range nodes {
belongsToNg := ((kc.status.groupNodesBy == groupNodesByAnnotation &&
nodes[i].GetAnnotations()[kc.status.key] != "") ||
(kc.status.groupNodesBy == groupNodesByLabel &&
nodes[i].GetLabels()[kc.status.key] != ""))
if !belongsToNg {
continue
}
ngName := getNGName(nodes[i], kc)
if ngs[ngName] != nil {
ngs[ngName].targetSize += 1
continue
}
ng := parseAnnotations(nodes[i], kc)
ng.name = getNGName(nodes[i], kc)
sanitizeNode(nodes[i])
prepareNode(nodes[i], ng.name)
ng.nodeTemplate = nodes[i]
filterFn := func(no *apiv1.Node) bool {
return no.GetAnnotations()[NGNameAnnotation] == ng.name
}
ng.kubeClient = kubeClient
ng.lister = initCustomLister(allNodeLister, filterFn)
ngs[ngName] = ng
}
result := []*NodeGroup{}
for i := range ngs {
result = append(result, ngs[i])
}
return result
}
// sanitizeNode cleans the node
func sanitizeNode(no *apiv1.Node) {
no.ResourceVersion = ""
no.Generation = 0
no.UID = ""
no.CreationTimestamp = v1.Time{}
no.Status.NodeInfo.KubeletVersion = "fake"
}
// prepareNode prepares node as a kwok template node
func prepareNode(no *apiv1.Node, ngName string) {
// add prefix in the name to make it clear that this node is different
// from the ones already existing in the cluster (in case there is a name clash)
no.Name = fmt.Sprintf("kwok-fake-%s", no.GetName())
no.Annotations[KwokManagedAnnotation] = "fake"
no.Annotations[NGNameAnnotation] = ngName
no.Spec.ProviderID = getProviderID(no.GetName())
}
func getProviderID(nodeName string) string {
return fmt.Sprintf("kwok:%s", nodeName)
}
func parseAnnotations(no *apiv1.Node, kc *KwokProviderConfig) *NodeGroup {
min := 0
max := 200
target := min
if no.GetAnnotations()[NGMinSizeAnnotation] != "" {
if mi, err := strconv.Atoi(no.GetAnnotations()[NGMinSizeAnnotation]); err == nil {
min = mi
} else {
klog.Fatalf("invalid value for annotation key '%s' for node '%s'", NGMinSizeAnnotation, no.GetName())
}
}
if no.GetAnnotations()[NGMaxSizeAnnotation] != "" {
if ma, err := strconv.Atoi(no.GetAnnotations()[NGMaxSizeAnnotation]); err == nil {
max = ma
} else {
klog.Fatalf("invalid value for annotation key '%s' for node '%s'", NGMaxSizeAnnotation, no.GetName())
}
}
if no.GetAnnotations()[NGDesiredSizeAnnotation] != "" {
if ta, err := strconv.Atoi(no.GetAnnotations()[NGDesiredSizeAnnotation]); err == nil {
target = ta
} else {
klog.Fatalf("invalid value for annotation key '%s' for node '%s'", NGDesiredSizeAnnotation, no.GetName())
}
}
if max < min {
log.Fatalf("max-count '%d' cannot be less than min-count '%d' for the node '%s'", max, min, no.GetName())
}
if target > max || target < min {
log.Fatalf("desired-count '%d' cannot be less than min-count '%d' or greater than max-count '%d' for the node '%s'", target, min, max, no.GetName())
}
return &NodeGroup{
minSize: min,
maxSize: max,
targetSize: target,
}
}
func getNGName(no *apiv1.Node, kc *KwokProviderConfig) string {
if no.GetAnnotations()[NGNameAnnotation] != "" {
return no.GetAnnotations()[NGNameAnnotation]
}
var ngName string
switch kc.status.groupNodesBy {
case groupNodesByAnnotation:
ngName = no.GetAnnotations()[kc.status.key]
case groupNodesByLabel:
ngName = no.GetLabels()[kc.status.key]
default:
klog.Fatal("grouping criteria for nodes is not set (expected: 'annotation' or 'label')")
}
if ngName == "" {
klog.Fatalf("%s '%s' for node '%s' not present in the manifest",
kc.status.groupNodesBy, kc.status.key,
no.GetName())
}
ngName = fmt.Sprintf("%s-%v", ngName, time.Now().Unix())
return ngName
}
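To make the grouping and sizing logic above concrete, here is a small illustrative sketch written as if it were an example test inside this package (it is not part of the PR); the node name, label value and sizes are made up, while the constants and helpers come from the code above:

```go
package kwok

import (
	"fmt"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func Example_parseAnnotations() {
	// Grouping configuration equivalent to `fromNodeLabelKey: "kwok-nodegroup"`.
	kc := &KwokProviderConfig{
		status: &GroupingConfig{
			groupNodesBy: groupNodesByLabel,
			key:          "kwok-nodegroup",
		},
	}

	// A template node carrying the nodegroup sizing annotations.
	node := &apiv1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name:   "template-node",
			Labels: map[string]string{"kwok-nodegroup": "ng1"},
			Annotations: map[string]string{
				NGMinSizeAnnotation:     "1",
				NGMaxSizeAnnotation:     "5",
				NGDesiredSizeAnnotation: "2",
			},
		},
	}

	ng := parseAnnotations(node, kc)
	ng.name = getNGName(node, kc) // e.g. "ng1-<unix timestamp>"

	fmt.Println(ng.minSize, ng.maxSize, ng.targetSize)
	// Output: 1 5 2
}
```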

View File

@ -0,0 +1,890 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"os"
"testing"
"github.com/stretchr/testify/assert"
apiv1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/kubernetes/fake"
core "k8s.io/client-go/testing"
)
const multipleNodes = `
apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:16Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-control-plane
kwok-nodegroup: control-plane
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: ""
node.kubernetes.io/exclude-from-external-load-balancers: ""
name: kind-control-plane
resourceVersion: "603"
uid: 86716ec7-3071-4091-b055-77b4361d1dca
spec:
podCIDR: 10.244.0.0/24
podCIDRs:
- 10.244.0.0/24
providerID: kind://docker/kind/kind-control-plane
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
status:
addresses:
- address: 172.18.0.2
type: InternalIP
- address: kind-control-plane
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:29Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:29Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:29Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:29Z"
lastTransitionTime: "2023-05-31T04:39:46Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
- docker.io/kindest/local-path-provisioner@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: 96f8c8b8c8ae4600a3654341f207586e
operatingSystem: linux
osImage: Ubuntu
systemUUID: 111aa932-7f99-4bef-aaf7-36aa7fb9b012
---
apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:57Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker
kwok-nodegroup: kind-worker
kubernetes.io/os: linux
name: kind-worker
resourceVersion: "577"
uid: 2ac0eb71-e5cf-4708-bbbf-476e8f19842b
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:40:05Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: a98a13ff474d476294935341f1ba9816
operatingSystem: linux
osImage: Ubuntu
systemUUID: 5f3c1af8-a385-4776-85e4-73d7f4252b44
`
const nodeList = `
apiVersion: v1
items:
- apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:16Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-control-plane
kwok-nodegroup: control-plane
kubernetes.io/os: linux
node-role.kubernetes.io/control-plane: ""
node.kubernetes.io/exclude-from-external-load-balancers: ""
name: kind-control-plane
resourceVersion: "506"
uid: 86716ec7-3071-4091-b055-77b4361d1dca
spec:
podCIDR: 10.244.0.0/24
podCIDRs:
- 10.244.0.0/24
providerID: kind://docker/kind/kind-control-plane
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
status:
addresses:
- address: 172.18.0.2
type: InternalIP
- address: kind-control-plane
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:13Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:39:58Z"
lastTransitionTime: "2023-05-31T04:39:46Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: 96f8c8b8c8ae4600a3654341f207586e
operatingSystem: linux
osImage: Ubuntu
systemUUID: 111aa932-7f99-4bef-aaf7-36aa7fb9b012
- apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:57Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker
kwok-nodegroup: kind-worker
kubernetes.io/os: linux
name: kind-worker
resourceVersion: "577"
uid: 2ac0eb71-e5cf-4708-bbbf-476e8f19842b
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:40:05Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: a98a13ff474d476294935341f1ba9816
operatingSystem: linux
osImage: Ubuntu
systemUUID: 5f3c1af8-a385-4776-85e4-73d7f4252b44
kind: List
metadata:
resourceVersion: ""
`
const wrongIndentation = `
apiVersion: v1
items:
- apiVersion: v1
# everything below should be in-line with apiVersion above
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:57Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker
kwok-nodegroup: kind-worker
kubernetes.io/os: linux
name: kind-worker
resourceVersion: "577"
uid: 2ac0eb71-e5cf-4708-bbbf-476e8f19842b
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:40:05Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: a98a13ff474d476294935341f1ba9816
operatingSystem: linux
osImage: Ubuntu 22.04.2 LTS
systemUUID: 5f3c1af8-a385-4776-85e4-73d7f4252b44
kind: List
metadata:
resourceVersion: ""
`
const noGPULabel = `
apiVersion: v1
items:
- apiVersion: v1
kind: Node
metadata:
annotations:
kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: "0"
volumes.kubernetes.io/controller-managed-attach-detach: "true"
creationTimestamp: "2023-05-31T04:39:57Z"
labels:
beta.kubernetes.io/arch: amd64
beta.kubernetes.io/os: linux
kubernetes.io/arch: amd64
kubernetes.io/hostname: kind-worker
kwok-nodegroup: kind-worker
kubernetes.io/os: linux
name: kind-worker
resourceVersion: "577"
uid: 2ac0eb71-e5cf-4708-bbbf-476e8f19842b
spec:
podCIDR: 10.244.2.0/24
podCIDRs:
- 10.244.2.0/24
providerID: kind://docker/kind/kind-worker
status:
addresses:
- address: 172.18.0.3
type: InternalIP
- address: kind-worker
type: Hostname
allocatable:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
capacity:
cpu: "12"
ephemeral-storage: 959786032Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 32781516Ki
pods: "110"
conditions:
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient memory available
reason: KubeletHasSufficientMemory
status: "False"
type: MemoryPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has no disk pressure
reason: KubeletHasNoDiskPressure
status: "False"
type: DiskPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:39:57Z"
message: kubelet has sufficient PID available
reason: KubeletHasSufficientPID
status: "False"
type: PIDPressure
- lastHeartbeatTime: "2023-05-31T04:40:17Z"
lastTransitionTime: "2023-05-31T04:40:05Z"
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
daemonEndpoints:
kubeletEndpoint:
Port: 10250
images:
- names:
- registry.k8s.io/etcd:3.5.6-0
sizeBytes: 102542580
- names:
- docker.io/library/import-2023-03-30@sha256:ba097b515c8c40689733c0f19de377e9bf8995964b7d7150c2045f3dfd166657
- registry.k8s.io/kube-apiserver:v1.26.3
sizeBytes: 80392681
- names:
- docker.io/library/import-2023-03-30@sha256:8dbb345de79d1c44f59a7895da702a5f71997ae72aea056609445c397b0c10dc
- registry.k8s.io/kube-controller-manager:v1.26.3
sizeBytes: 68538487
- names:
- docker.io/library/import-2023-03-30@sha256:44db4d50a5f9c8efbac0d37ea974d1c0419a5928f90748d3d491a041a00c20b5
- registry.k8s.io/kube-proxy:v1.26.3
sizeBytes: 67217404
- names:
- docker.io/library/import-2023-03-30@sha256:3dd2337f70af979c7362b5e52bbdfcb3a5fd39c78d94d02145150cd2db86ba39
- registry.k8s.io/kube-scheduler:v1.26.3
sizeBytes: 57761399
- names:
- docker.io/kindest/kindnetd:v20230330-48f316cd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
- docker.io/kindest/kindnetd@sha256:c19d6362a6a928139820761475a38c24c0cf84d507b9ddf414a078cf627497af
sizeBytes: 27726335
- names:
- docker.io/kindest/local-path-provisioner:v0.0.23-kind.0@sha256:f2d0a02831ff3a03cf51343226670d5060623b43a4cfc4808bd0875b2c4b9501
sizeBytes: 18664669
- names:
- registry.k8s.io/coredns/coredns:v1.9.3
sizeBytes: 14837849
- names:
- docker.io/kindest/local-path-helper:v20230330-48f316cd@sha256:135203f2441f916fb13dad1561d27f60a6f11f50ec288b01a7d2ee9947c36270
sizeBytes: 3052037
- names:
- registry.k8s.io/pause:3.7
sizeBytes: 311278
nodeInfo:
architecture: amd64
bootID: 2d71b318-5d07-4de2-9e61-2da28cf5bbf0
containerRuntimeVersion: containerd://1.6.19-46-g941215f49
kernelVersion: 5.15.0-72-generic
kubeProxyVersion: v1.26.3
kubeletVersion: v1.26.3
machineID: a98a13ff474d476294935341f1ba9816
operatingSystem: linux
osImage: Ubuntu 22.04.2 LTS
systemUUID: 5f3c1af8-a385-4776-85e4-73d7f4252b44
kind: List
metadata:
resourceVersion: ""
`
func TestLoadNodeTemplatesFromConfigMap(t *testing.T) {
var testTemplatesMap = map[string]string{
"wrongIndentation": wrongIndentation,
defaultTemplatesConfigName: testTemplates,
"multipleNodes": multipleNodes,
"nodeList": nodeList,
}
testTemplateName := defaultTemplatesConfigName
fakeClient := &fake.Clientset{}
fakeClient.Fake.AddReactor("get", "configmaps", func(action core.Action) (bool, runtime.Object, error) {
getAction := action.(core.GetAction)
if getAction == nil {
return false, nil, nil
}
if getAction.GetName() == defaultConfigName {
return true, &apiv1.ConfigMap{
Data: map[string]string{
configKey: testConfig,
},
}, nil
}
if testTemplatesMap[testTemplateName] != "" {
return true, &apiv1.ConfigMap{
Data: map[string]string{
templatesKey: testTemplatesMap[testTemplateName],
},
}, nil
}
return true, nil, errors.NewNotFound(apiv1.Resource("configmaps"), "whatever")
})
fakeClient.Fake.AddReactor("list", "nodes", func(action core.Action) (bool, runtime.Object, error) {
getAction := action.(core.GetAction)
if getAction == nil {
return false, nil, nil
}
return true, &apiv1.NodeList{Items: []apiv1.Node{}}, errors.NewNotFound(apiv1.Resource("nodes"), "whatever")
})
os.Setenv("POD_NAMESPACE", "kube-system")
kwokConfig, err := LoadConfigFile(fakeClient)
assert.Nil(t, err)
// happy path
testTemplateName = defaultTemplatesConfigName
nos, err := LoadNodeTemplatesFromConfigMap(kwokConfig.ConfigMap.Name, fakeClient)
assert.Nil(t, err)
assert.NotEmpty(t, nos)
assert.Greater(t, len(nos), 0)
testTemplateName = "wrongIndentation"
nos, err = LoadNodeTemplatesFromConfigMap(kwokConfig.ConfigMap.Name, fakeClient)
assert.Error(t, err)
assert.Empty(t, nos)
assert.Equal(t, len(nos), 0)
// multiple nodes is something like []*Node{node1, node2, node3, ...}
testTemplateName = "multipleNodes"
nos, err = LoadNodeTemplatesFromConfigMap(kwokConfig.ConfigMap.Name, fakeClient)
assert.Nil(t, err)
assert.NotEmpty(t, nos)
assert.Greater(t, len(nos), 0)
// node list is something like []*List{Items:[]*Node{node1, node2, node3, ...}}
testTemplateName = "nodeList"
nos, err = LoadNodeTemplatesFromConfigMap(kwokConfig.ConfigMap.Name, fakeClient)
assert.Nil(t, err)
assert.NotEmpty(t, nos)
assert.Greater(t, len(nos), 0)
// fake client which returns configmap with wrong key
fakeClient = &fake.Clientset{}
fakeClient.Fake.AddReactor("get", "configmaps", func(action core.Action) (bool, runtime.Object, error) {
getAction := action.(core.GetAction)
if getAction == nil {
return false, nil, nil
}
return true, &apiv1.ConfigMap{
Data: map[string]string{
"foo": testTemplatesMap[testTemplateName],
},
}, nil
})
fakeClient.Fake.AddReactor("list", "nodes", func(action core.Action) (bool, runtime.Object, error) {
getAction := action.(core.GetAction)
if getAction == nil {
return false, nil, nil
}
return true, &apiv1.NodeList{Items: []apiv1.Node{}}, errors.NewNotFound(apiv1.Resource("nodes"), "whatever")
})
// throw error if configmap data key is not `templates`
nos, err = LoadNodeTemplatesFromConfigMap(kwokConfig.ConfigMap.Name, fakeClient)
assert.Error(t, err)
assert.Empty(t, nos)
assert.Equal(t, len(nos), 0)
}

View File

@ -0,0 +1,221 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"context"
"fmt"
apiv1 "k8s.io/api/core/v1"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/rand"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
klog "k8s.io/klog/v2"
schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)
var (
sizeIncreaseMustBePositiveErr = "size increase must be positive"
maxSizeReachedErr = "size increase too large"
minSizeReachedErr = "min size reached, nodes will not be deleted"
belowMinSizeErr = "can't delete nodes because nodegroup size would go below min size"
notManagedByKwokErr = "can't delete node '%v' because it is not managed by kwok"
sizeDecreaseMustBeNegativeErr = "size decrease must be negative"
attemptToDeleteExistingNodesErr = "attempt to delete existing nodes"
)
// MaxSize returns maximum size of the node group.
func (nodeGroup *NodeGroup) MaxSize() int {
return nodeGroup.maxSize
}
// MinSize returns minimum size of the node group.
func (nodeGroup *NodeGroup) MinSize() int {
return nodeGroup.minSize
}
// TargetSize returns the current TARGET size of the node group. It is possible that the
// number is different from the number of nodes registered in Kubernetes.
func (nodeGroup *NodeGroup) TargetSize() (int, error) {
return nodeGroup.targetSize, nil
}
// IncreaseSize increases NodeGroup size.
func (nodeGroup *NodeGroup) IncreaseSize(delta int) error {
if delta <= 0 {
return fmt.Errorf(sizeIncreaseMustBePositiveErr)
}
size := nodeGroup.targetSize
newSize := int(size) + delta
if newSize > nodeGroup.MaxSize() {
return fmt.Errorf("%s, desired: %d max: %d", maxSizeReachedErr, newSize, nodeGroup.MaxSize())
}
klog.V(5).Infof("increasing size of nodegroup '%s' to %v (old size: %v, delta: %v)", nodeGroup.name, newSize, size, delta)
schedNode, err := nodeGroup.TemplateNodeInfo()
if err != nil {
return fmt.Errorf("couldn't create a template node for nodegroup %s", nodeGroup.name)
}
for i := 0; i < delta; i++ {
node := schedNode.Node()
node.Name = fmt.Sprintf("%s-%s", nodeGroup.name, rand.String(5))
node.Spec.ProviderID = getProviderID(node.Name)
_, err := nodeGroup.kubeClient.CoreV1().Nodes().Create(context.Background(), node, v1.CreateOptions{})
if err != nil {
return fmt.Errorf("couldn't create new node '%s': %v", node.Name, err)
}
}
nodeGroup.targetSize = newSize
return nil
}
// DeleteNodes deletes the specified nodes from the node group.
func (nodeGroup *NodeGroup) DeleteNodes(nodes []*apiv1.Node) error {
size := nodeGroup.targetSize
if size <= nodeGroup.MinSize() {
return fmt.Errorf(minSizeReachedErr)
}
if size-len(nodes) < nodeGroup.MinSize() {
return fmt.Errorf(belowMinSizeErr)
}
for _, node := range nodes {
// TODO(vadasambar): check if there's a better way than returning an error here
if node.GetAnnotations()[KwokManagedAnnotation] != "fake" {
return fmt.Errorf(notManagedByKwokErr, node.GetName())
}
// TODO(vadasambar): proceed to delete the next node if the current node deletion errors
// TODO(vadasambar): collect all the errors and return them after attempting to delete all the nodes to be deleted
err := nodeGroup.kubeClient.CoreV1().Nodes().Delete(context.Background(), node.GetName(), v1.DeleteOptions{})
if err != nil {
return err
}
}
return nil
}
// DecreaseTargetSize decreases the target size of the node group. This function
// doesn't permit to delete any existing node and can be used only to reduce the
// request for new nodes that have not been yet fulfilled. Delta should be negative.
func (nodeGroup *NodeGroup) DecreaseTargetSize(delta int) error {
if delta >= 0 {
return fmt.Errorf(sizeDecreaseMustBeNegativeErr)
}
size := nodeGroup.targetSize
nodes, err := nodeGroup.getNodeNamesForNodeGroup()
if err != nil {
return err
}
newSize := int(size) + delta
if newSize < len(nodes) {
return fmt.Errorf("%s, targetSize: %d delta: %d existingNodes: %d",
attemptToDeleteExistingNodesErr, size, delta, len(nodes))
}
nodeGroup.targetSize = newSize
return nil
}
// getNodeNamesForNodeGroup returns the names of the nodes belonging to the nodegroup
func (nodeGroup *NodeGroup) getNodeNamesForNodeGroup() ([]string, error) {
names := []string{}
nodeList, err := nodeGroup.lister.List()
if err != nil {
return names, err
}
for _, no := range nodeList {
names = append(names, no.GetName())
}
return names, nil
}
// Id returns nodegroup name.
func (nodeGroup *NodeGroup) Id() string {
return nodeGroup.name
}
// Debug returns a debug string for the nodegroup.
func (nodeGroup *NodeGroup) Debug() string {
return fmt.Sprintf("%s (%d:%d)", nodeGroup.Id(), nodeGroup.MinSize(), nodeGroup.MaxSize())
}
// Nodes returns a list of all nodes that belong to this node group.
func (nodeGroup *NodeGroup) Nodes() ([]cloudprovider.Instance, error) {
instances := make([]cloudprovider.Instance, 0)
nodeNames, err := nodeGroup.getNodeNamesForNodeGroup()
if err != nil {
return instances, err
}
for _, nodeName := range nodeNames {
instances = append(instances, cloudprovider.Instance{Id: getProviderID(nodeName), Status: &cloudprovider.InstanceStatus{
State: cloudprovider.InstanceRunning,
ErrorInfo: nil,
}})
}
return instances, nil
}
// TemplateNodeInfo returns a node template for this node group.
func (nodeGroup *NodeGroup) TemplateNodeInfo() (*schedulerframework.NodeInfo, error) {
nodeInfo := schedulerframework.NewNodeInfo(cloudprovider.BuildKubeProxy(nodeGroup.Id()))
nodeInfo.SetNode(nodeGroup.nodeTemplate)
return nodeInfo, nil
}
// Exist checks if the node group really exists on the cloud provider side.
// Since a kwok nodegroup is not backed by anything on the cloud provider side,
// we can safely return `true` here
func (nodeGroup *NodeGroup) Exist() bool {
return true
}
// Create creates the node group on the cloud provider side.
// Left unimplemented because Create is not used anywhere
// in the core autoscaler as of this writing
func (nodeGroup *NodeGroup) Create() (cloudprovider.NodeGroup, error) {
return nil, cloudprovider.ErrNotImplemented
}
// Delete deletes the node group on the cloud provider side.
// Left unimplemented because Delete is not used anywhere
// in the core autoscaler as of this writing
func (nodeGroup *NodeGroup) Delete() error {
return cloudprovider.ErrNotImplemented
}
// Autoprovisioned returns true if the node group is autoprovisioned.
func (nodeGroup *NodeGroup) Autoprovisioned() bool {
return false
}
// GetOptions returns NodeGroupAutoscalingOptions that should be used for this particular
// NodeGroup. Returning a nil will result in using default options.
func (nodeGroup *NodeGroup) GetOptions(defaults config.NodeGroupAutoscalingOptions) (*config.NodeGroupAutoscalingOptions, error) {
return &defaults, nil
}
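For illustration, `IncreaseSize` above creates Node objects directly through the kube client (name = `<nodegroup name>-<random suffix>`, provider ID from `getProviderID`, annotations copied from the template). A rough sketch of such a node; the annotation key and the provider ID format are assumptions here, since only the `fake` value and the `kwok-provider` taint are visible elsewhere in this change:

```yaml
apiVersion: v1
kind: Node
metadata:
  # <nodegroup name>-<5 character random suffix>
  name: ng-x7k2c
  annotations:
    # marks the node as managed by kwok; key assumed, only the "fake" value appears in this diff
    kwok.x-k8s.io/node: fake
  labels:
    node.kubernetes.io/instance-type: m5.xlarge
spec:
  # provider ID format assumed to be kwok:<node name>
  providerID: kwok:ng-x7k2c
  taints:
  - key: kwok-provider
    value: "true"
    effect: NoSchedule
```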


@ -0,0 +1,360 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"fmt"
"testing"
"time"
"github.com/stretchr/testify/assert"
apiv1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
kube_util "k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes"
"k8s.io/client-go/kubernetes/fake"
core "k8s.io/client-go/testing"
)
func TestIncreaseSize(t *testing.T) {
fakeClient := &fake.Clientset{}
nodes := []*apiv1.Node{}
fakeClient.Fake.AddReactor("create", "nodes",
func(action core.Action) (bool, runtime.Object, error) {
createAction := action.(core.CreateAction)
if createAction == nil {
return false, nil, nil
}
nodes = append(nodes, createAction.GetObject().(*apiv1.Node))
return true, nil, nil
})
ng := NodeGroup{
name: "ng",
kubeClient: fakeClient,
lister: kube_util.NewTestNodeLister(nil),
nodeTemplate: &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "template-node-ng",
},
},
minSize: 0,
targetSize: 2,
maxSize: 3,
}
// usual case
err := ng.IncreaseSize(1)
assert.Nil(t, err)
assert.Len(t, nodes, 1)
assert.Equal(t, 3, ng.targetSize)
for _, n := range nodes {
assert.Contains(t, n.Spec.ProviderID, "kwok")
assert.Contains(t, n.GetName(), ng.name)
}
// delta is negative
nodes = []*apiv1.Node{}
err = ng.IncreaseSize(-1)
assert.NotNil(t, err)
assert.Contains(t, err.Error(), sizeIncreaseMustBePositiveErr)
assert.Len(t, nodes, 0)
// delta is greater than max size
nodes = []*apiv1.Node{}
err = ng.IncreaseSize(ng.maxSize + 1)
assert.NotNil(t, err)
assert.Contains(t, err.Error(), maxSizeReachedErr)
assert.Len(t, nodes, 0)
}
func TestDeleteNodes(t *testing.T) {
fakeClient := &fake.Clientset{}
deletedNodes := make(map[string]bool)
fakeClient.Fake.AddReactor("delete", "nodes", func(action core.Action) (bool, runtime.Object, error) {
deleteAction := action.(core.DeleteAction)
if deleteAction == nil {
return false, nil, nil
}
deletedNodes[deleteAction.GetName()] = true
return true, nil, nil
})
ng := NodeGroup{
name: "ng",
kubeClient: fakeClient,
lister: kube_util.NewTestNodeLister(nil),
nodeTemplate: &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "template-node-ng",
},
},
minSize: 0,
targetSize: 1,
maxSize: 3,
}
nodeToDelete1 := &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "node-to-delete-1",
Annotations: map[string]string{
KwokManagedAnnotation: "fake",
},
},
}
nodeToDelete2 := &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "node-to-delete-2",
Annotations: map[string]string{
KwokManagedAnnotation: "fake",
},
},
}
nodeWithoutKwokAnnotation := &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "node-to-delete-3",
Annotations: map[string]string{},
},
}
// usual case
err := ng.DeleteNodes([]*apiv1.Node{nodeToDelete1})
assert.Nil(t, err)
assert.True(t, deletedNodes[nodeToDelete1.GetName()])
// min size reached
deletedNodes = make(map[string]bool)
ng.targetSize = 0
err = ng.DeleteNodes([]*apiv1.Node{nodeToDelete1})
assert.NotNil(t, err)
assert.Contains(t, err.Error(), minSizeReachedErr)
assert.False(t, deletedNodes[nodeToDelete1.GetName()])
ng.targetSize = 1
// too many nodes to delete - goes below ng's minSize
deletedNodes = make(map[string]bool)
err = ng.DeleteNodes([]*apiv1.Node{nodeToDelete1, nodeToDelete2})
assert.NotNil(t, err)
assert.Contains(t, err.Error(), belowMinSizeErr)
assert.False(t, deletedNodes[nodeToDelete1.GetName()])
assert.False(t, deletedNodes[nodeToDelete2.GetName()])
// kwok annotation is not present on the node to delete
deletedNodes = make(map[string]bool)
err = ng.DeleteNodes([]*apiv1.Node{nodeWithoutKwokAnnotation})
assert.NotNil(t, err)
assert.Contains(t, err.Error(), "not managed by kwok")
assert.False(t, deletedNodes[nodeWithoutKwokAnnotation.GetName()])
}
func TestDecreaseTargetSize(t *testing.T) {
fakeClient := &fake.Clientset{}
fakeNodes := []*apiv1.Node{
{
ObjectMeta: metav1.ObjectMeta{
Name: "node-1",
},
},
{
ObjectMeta: metav1.ObjectMeta{
Name: "node-2",
},
},
}
ng := NodeGroup{
name: "ng",
kubeClient: fakeClient,
lister: kube_util.NewTestNodeLister(fakeNodes),
nodeTemplate: &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "template-node-ng",
},
},
minSize: 0,
targetSize: 3,
maxSize: 4,
}
// usual case
err := ng.DecreaseTargetSize(-1)
assert.Nil(t, err)
assert.Equal(t, 2, ng.targetSize)
// delta is positive
ng.targetSize = 3
err = ng.DecreaseTargetSize(1)
assert.NotNil(t, err)
assert.Contains(t, err.Error(), sizeDecreaseMustBeNegativeErr)
assert.Equal(t, 3, ng.targetSize)
// attempt to delete existing nodes
err = ng.DecreaseTargetSize(-2)
assert.NotNil(t, err)
assert.Contains(t, err.Error(), attemptToDeleteExistingNodesErr)
assert.Equal(t, 3, ng.targetSize)
// error from lister
ng.lister = &ErroneousNodeLister{}
err = ng.DecreaseTargetSize(-1)
assert.NotNil(t, err)
assert.Equal(t, cloudprovider.ErrNotImplemented.Error(), err.Error())
assert.Equal(t, 3, ng.targetSize)
ng.lister = kube_util.NewTestNodeLister(fakeNodes)
}
func TestNodes(t *testing.T) {
fakeClient := &fake.Clientset{}
fakeNodes := []*apiv1.Node{
{
ObjectMeta: metav1.ObjectMeta{
Name: "node-1",
},
},
{
ObjectMeta: metav1.ObjectMeta{
Name: "node-2",
},
},
}
ng := NodeGroup{
name: "ng",
kubeClient: fakeClient,
lister: kube_util.NewTestNodeLister(fakeNodes),
nodeTemplate: &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "template-node-ng",
},
},
minSize: 0,
targetSize: 2,
maxSize: 3,
}
// usual case
cpInstances, err := ng.Nodes()
assert.Nil(t, err)
assert.Len(t, cpInstances, 2)
for i := range cpInstances {
assert.Contains(t, cpInstances[i].Id, fakeNodes[i].GetName())
assert.Equal(t, &cloudprovider.InstanceStatus{
State: cloudprovider.InstanceRunning,
ErrorInfo: nil,
}, cpInstances[i].Status)
}
// error from lister
ng.lister = &ErroneousNodeLister{}
cpInstances, err = ng.Nodes()
assert.NotNil(t, err)
assert.Len(t, cpInstances, 0)
assert.Equal(t, cloudprovider.ErrNotImplemented.Error(), err.Error())
}
func TestTemplateNodeInfo(t *testing.T) {
fakeClient := &fake.Clientset{}
ng := NodeGroup{
name: "ng",
kubeClient: fakeClient,
lister: kube_util.NewTestNodeLister(nil),
nodeTemplate: &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "template-node-ng",
},
},
minSize: 0,
targetSize: 2,
maxSize: 3,
}
// usual case
ti, err := ng.TemplateNodeInfo()
assert.Nil(t, err)
assert.NotNil(t, ti)
assert.Len(t, ti.Pods, 1)
assert.Contains(t, ti.Pods[0].Pod.Name, fmt.Sprintf("kube-proxy-%s", ng.name))
assert.Equal(t, ng.nodeTemplate, ti.Node())
}
func TestGetOptions(t *testing.T) {
fakeClient := &fake.Clientset{}
ng := NodeGroup{
name: "ng",
kubeClient: fakeClient,
lister: kube_util.NewTestNodeLister(nil),
nodeTemplate: &apiv1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: "template-node-ng",
},
},
minSize: 0,
targetSize: 2,
maxSize: 3,
}
// dummy values
autoscalingOptions := config.NodeGroupAutoscalingOptions{
ScaleDownUtilizationThreshold: 50.0,
ScaleDownGpuUtilizationThreshold: 50.0,
ScaleDownUnneededTime: time.Minute * 5,
ScaleDownUnreadyTime: time.Minute * 5,
MaxNodeProvisionTime: time.Minute * 5,
ZeroOrMaxNodeScaling: true,
IgnoreDaemonSetsUtilization: true,
}
// usual case
opts, err := ng.GetOptions(autoscalingOptions)
assert.Nil(t, err)
assert.Equal(t, autoscalingOptions, *opts)
}
// ErroneousNodeLister is used to check that the calling function returns an error
// when the lister returns an error
type ErroneousNodeLister struct {
}
func (e *ErroneousNodeLister) List() ([]*apiv1.Node, error) {
return nil, cloudprovider.ErrNotImplemented
}
func (e *ErroneousNodeLister) Get(name string) (*apiv1.Node, error) {
return nil, cloudprovider.ErrNotImplemented
}


@ -0,0 +1,257 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
"context"
"fmt"
"os"
"strings"
apiv1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
"k8s.io/autoscaler/cluster-autoscaler/utils/errors"
"k8s.io/autoscaler/cluster-autoscaler/utils/gpu"
kube_util "k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes"
"k8s.io/client-go/informers"
kubeclient "k8s.io/client-go/kubernetes"
"k8s.io/client-go/rest"
"k8s.io/client-go/tools/clientcmd"
klog "k8s.io/klog/v2"
)
// Name returns name of the cloud provider.
func (kwok *KwokCloudProvider) Name() string {
return ProviderName
}
// NodeGroups returns all node groups configured for this cloud provider.
func (kwok *KwokCloudProvider) NodeGroups() []cloudprovider.NodeGroup {
result := make([]cloudprovider.NodeGroup, 0, len(kwok.nodeGroups))
for _, nodegroup := range kwok.nodeGroups {
result = append(result, nodegroup)
}
return result
}
// NodeGroupForNode returns the node group for the given node.
func (kwok *KwokCloudProvider) NodeGroupForNode(node *apiv1.Node) (cloudprovider.NodeGroup, error) {
// Skip nodes that are not managed by kwok cloud provider.
if !strings.HasPrefix(node.Spec.ProviderID, ProviderName) {
klog.V(2).Infof("ignoring node '%s' because it is not managed by kwok", node.GetName())
return nil, nil
}
for _, nodeGroup := range kwok.nodeGroups {
if nodeGroup.name == getNGName(node, kwok.config) {
klog.V(5).Infof("found nodegroup '%s' for node '%s'", nodeGroup.name, node.GetName())
return nodeGroup, nil
}
}
return nil, nil
}
// HasInstance returns whether a given node has a corresponding instance in this cloud provider
// Since there is no underlying cloud provider instance, return true
func (kwok *KwokCloudProvider) HasInstance(node *apiv1.Node) (bool, error) {
return true, nil
}
// Pricing returns pricing model for this cloud provider or error if not available.
func (kwok *KwokCloudProvider) Pricing() (cloudprovider.PricingModel, errors.AutoscalerError) {
return nil, cloudprovider.ErrNotImplemented
}
// GetAvailableMachineTypes get all machine types that can be requested from the cloud provider.
// Implementation optional.
func (kwok *KwokCloudProvider) GetAvailableMachineTypes() ([]string, error) {
return []string{}, cloudprovider.ErrNotImplemented
}
// NewNodeGroup builds a theoretical node group based on the node definition provided.
func (kwok *KwokCloudProvider) NewNodeGroup(machineType string, labels map[string]string, systemLabels map[string]string,
taints []apiv1.Taint,
extraResources map[string]resource.Quantity) (cloudprovider.NodeGroup, error) {
return nil, cloudprovider.ErrNotImplemented
}
// GetResourceLimiter returns struct containing limits (max, min) for resources (cores, memory etc.).
func (kwok *KwokCloudProvider) GetResourceLimiter() (*cloudprovider.ResourceLimiter, error) {
return kwok.resourceLimiter, nil
}
// GPULabel returns the label added to nodes with GPU resource.
func (kwok *KwokCloudProvider) GPULabel() string {
// GPULabel() might get called before the config is loaded
if kwok.config == nil || kwok.config.status == nil {
return ""
}
return kwok.config.status.gpuLabel
}
// GetAvailableGPUTypes returns all available GPU types the cloud provider supports
func (kwok *KwokCloudProvider) GetAvailableGPUTypes() map[string]struct{} {
// GetAvailableGPUTypes() might get called before the config is loaded
if kwok.config == nil || kwok.config.status == nil {
return map[string]struct{}{}
}
return kwok.config.status.availableGPUTypes
}
// GetNodeGpuConfig returns the label, type and resource name for the GPU added to node. If node doesn't have
// any GPUs, it returns nil.
func (kwok *KwokCloudProvider) GetNodeGpuConfig(node *apiv1.Node) *cloudprovider.GpuConfig {
return gpu.GetNodeGPUFromCloudProvider(kwok, node)
}
// Refresh is called before every main loop and can be used to dynamically update cloud provider state.
// In particular the list of node groups returned by NodeGroups can change as a result of CloudProvider.Refresh().
// TODO(vadasambar): implement this
func (kwok *KwokCloudProvider) Refresh() error {
// TODO(vadasambar): causes CA to not recognize kwok nodegroups
// needs better implementation
// nodeList, err := kwok.lister.List()
// if err != nil {
// return err
// }
// ngs := []*NodeGroup{}
// for _, no := range nodeList {
// ng := parseAnnotationsToNodegroup(no)
// ng.kubeClient = kwok.kubeClient
// ngs = append(ngs, ng)
// }
// kwok.nodeGroups = ngs
return nil
}
// Cleanup cleans up all resources before the cloud provider is removed
func (kwok *KwokCloudProvider) Cleanup() error {
for _, ng := range kwok.nodeGroups {
nodeNames, err := ng.getNodeNamesForNodeGroup()
if err != nil {
return fmt.Errorf("error cleaning up: %v", err)
}
for _, node := range nodeNames {
err := kwok.kubeClient.CoreV1().Nodes().Delete(context.Background(), node, v1.DeleteOptions{})
if err != nil {
klog.Errorf("error cleaning up kwok provider nodes '%v'", node)
}
}
}
return nil
}
// BuildKwok builds kwok cloud provider.
func BuildKwok(opts config.AutoscalingOptions,
do cloudprovider.NodeGroupDiscoveryOptions,
rl *cloudprovider.ResourceLimiter,
informerFactory informers.SharedInformerFactory) cloudprovider.CloudProvider {
var restConfig *rest.Config
var err error
if os.Getenv("KWOK_PROVIDER_MODE") == "local" {
// Check and load kubeconfig from the path set
// in KUBECONFIG env variable (if not use default path of ~/.kube/config)
apiConfig, err := clientcmd.NewDefaultClientConfigLoadingRules().Load()
if err != nil {
klog.Fatal(err)
}
// Create rest config from kubeconfig
restConfig, err = clientcmd.NewDefaultClientConfig(*apiConfig, &clientcmd.ConfigOverrides{}).ClientConfig()
if err != nil {
klog.Fatal(err)
}
} else {
restConfig, err = rest.InClusterConfig()
if err != nil {
klog.Fatalf("failed to get kubeclient config for cluster: %v", err)
}
}
// TODO: switch to using the same kube/rest config as the core CA after
// https://github.com/kubernetes/autoscaler/pull/6180/files is merged
kubeClient := kubeclient.NewForConfigOrDie(restConfig)
p, err := BuildKwokProvider(&kwokOptions{
kubeClient: kubeClient,
autoscalingOpts: &opts,
discoveryOpts: &do,
resourceLimiter: rl,
ngNodeListerFn: kube_util.NewNodeLister,
allNodesLister: informerFactory.Core().V1().Nodes().Lister()})
if err != nil {
klog.Fatal(err)
}
return p
}
// BuildKwokProvider builds the kwok provider
func BuildKwokProvider(ko *kwokOptions) (*KwokCloudProvider, error) {
kwokConfig, err := LoadConfigFile(ko.kubeClient)
if err != nil {
return nil, fmt.Errorf("failed to load kwok provider config: %v", err)
}
var nodegroups []*NodeGroup
var nodeTemplates []*apiv1.Node
switch kwokConfig.ReadNodesFrom {
case nodeTemplatesFromConfigMap:
if nodeTemplates, err = LoadNodeTemplatesFromConfigMap(kwokConfig.ConfigMap.Name, ko.kubeClient); err != nil {
return nil, err
}
case nodeTemplatesFromCluster:
if nodeTemplates, err = loadNodeTemplatesFromCluster(kwokConfig, ko.kubeClient, nil); err != nil {
return nil, err
}
}
if !kwokConfig.Nodes.SkipTaint {
for _, no := range nodeTemplates {
no.Spec.Taints = append(no.Spec.Taints, kwokProviderTaint())
}
}
nodegroups = createNodegroups(nodeTemplates, ko.kubeClient, kwokConfig, ko.ngNodeListerFn, ko.allNodesLister)
return &KwokCloudProvider{
nodeGroups: nodegroups,
kubeClient: ko.kubeClient,
resourceLimiter: ko.resourceLimiter,
config: kwokConfig,
}, nil
}
func kwokProviderTaint() apiv1.Taint {
return apiv1.Taint{
Key: "kwok-provider",
Value: "true",
Effect: apiv1.TaintEffectNoSchedule,
}
}
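Since `kwokProviderTaint()` above taints every template node with `kwok-provider=true:NoSchedule` (unless `skipTaint` is set in the provider config), test workloads that should land on the fake nodes need a matching toleration. A minimal sketch of such a pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scale-test
spec:
  tolerations:
  - key: kwok-provider
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "1"
        memory: 1Gi
```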

File diff suppressed because it is too large


@ -0,0 +1,107 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package kwok
import (
apiv1 "k8s.io/api/core/v1"
"k8s.io/autoscaler/cluster-autoscaler/cloudprovider"
"k8s.io/autoscaler/cluster-autoscaler/config"
kube_util "k8s.io/autoscaler/cluster-autoscaler/utils/kubernetes"
"k8s.io/client-go/kubernetes"
listersv1 "k8s.io/client-go/listers/core/v1"
)
// KwokCloudProvider implements CloudProvider interface for kwok
type KwokCloudProvider struct {
nodeGroups []*NodeGroup
config *KwokProviderConfig
resourceLimiter *cloudprovider.ResourceLimiter
// kubeClient is to be used only for create, delete and update
kubeClient kubernetes.Interface
}
type kwokOptions struct {
kubeClient kubernetes.Interface
autoscalingOpts *config.AutoscalingOptions
discoveryOpts *cloudprovider.NodeGroupDiscoveryOptions
resourceLimiter *cloudprovider.ResourceLimiter
// TODO(vadasambar): look into abstracting kubeClient
// and lister into a single client
// allNodesLister lists all the nodes in the cluster
allNodesLister listersv1.NodeLister
// ngNodeListerFn returns a lister for the nodes managed by kwok for a specific nodegroup
ngNodeListerFn listerFn
}
// NodeGroup implements NodeGroup interface.
type NodeGroup struct {
name string
kubeClient kubernetes.Interface
lister kube_util.NodeLister
nodeTemplate *apiv1.Node
minSize int
targetSize int
maxSize int
}
// NodegroupsConfig defines options for creating nodegroups
type NodegroupsConfig struct {
FromNodeLabelKey string `json:"fromNodeLabelKey" yaml:"fromNodeLabelKey"`
FromNodeLabelAnnotation string `json:"fromNodeLabelAnnotation" yaml:"fromNodeLabelAnnotation"`
}
// NodeConfig defines config options for the nodes
type NodeConfig struct {
GPUConfig *GPUConfig `json:"gpuConfig" yaml:"gpuConfig"`
SkipTaint bool `json:"skipTaint" yaml:"skipTaint"`
}
// ConfigMapConfig allows setting the kwok provider configmap name
type ConfigMapConfig struct {
Name string `json:"name" yaml:"name"`
Key string `json:"key" yaml:"key"`
}
// GPUConfig defines GPU related config for the node
type GPUConfig struct {
GPULabelKey string `json:"gpuLabelKey" yaml:"gpuLabelKey"`
AvailableGPUTypes map[string]struct{} `json:"availableGPUTypes" yaml:"availableGPUTypes"`
}
// KwokConfig is the struct to define kwok specific config
// (needs to be implemented; currently empty)
type KwokConfig struct {
}
// KwokProviderConfig is the struct to hold kwok provider config
type KwokProviderConfig struct {
APIVersion string `json:"apiVersion" yaml:"apiVersion"`
ReadNodesFrom string `json:"readNodesFrom" yaml:"readNodesFrom"`
Nodegroups *NodegroupsConfig `json:"nodegroups" yaml:"nodegroups"`
Nodes *NodeConfig `json:"nodes" yaml:"nodes"`
ConfigMap *ConfigMapConfig `json:"configmap" yaml:"configmap"`
Kwok *KwokConfig `json:"kwok" yaml:"kwok"`
status *GroupingConfig
}
// GroupingConfig defines how nodes are grouped into nodegroups (by annotation or label) and the GPU label config
type GroupingConfig struct {
groupNodesBy string // [annotation, label]
key string // annotation or label key
gpuLabel string // gpu label key
availableGPUTypes map[string]struct{} // available gpu types
}


@ -0,0 +1,28 @@
apiVersion: v1alpha1
readNodesFrom: cluster # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "node.kubernetes.io/instance-type"
# you can either specify fromNodeLabelKey OR fromNodeLabelAnnotation
# (specifying both is not allowed)
# fromNodeLabelAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
# kwok provider adds a taint on the template nodes
# so that even if you run the provider in a production cluster
# you don't have to worry about production workload
# getting accidentally scheduled on the fake nodes
# use skipTaint: true to disable this behavior (false by default)
# skipTaint: false (default)
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}


@ -0,0 +1,30 @@
apiVersion: v1alpha1
readNodesFrom: configmap # possible values: [cluster,configmap]
nodegroups:
# to specify how to group nodes into a nodegroup
# e.g., you want to treat nodes with same instance type as a nodegroup
# node1: m5.xlarge
# node2: c5.xlarge
# node3: m5.xlarge
# nodegroup1: [node1,node3]
# nodegroup2: [node2]
fromNodeLabelKey: "node.kubernetes.io/instance-type"
# you can either specify fromNodeLabelKey OR fromNodeLabelAnnotation
# (specifying both is not allowed)
# fromNodeLabelAnnotation: "eks.amazonaws.com/nodegroup"
nodes:
# kwok provider adds a taint on the template nodes
# so that even if you run the provider in a production cluster
# you don't have to worry about production workload
# getting accidentally scheduled on the fake nodes
# use skipTaint: true to disable this behavior (false by default)
# skipTaint: false (default)
gpuConfig:
# to tell kwok provider what label should be considered as GPU label
gpuLabelKey: "k8s.amazonaws.com/accelerator"
availableGPUTypes:
"nvidia-tesla-k80": {}
"nvidia-tesla-p100": {}
configmap:
name: kwok-provider-templates
key: kwok-config # default: config
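The `configmap` section above points the provider at a separate ConfigMap that holds the node templates. A minimal sketch of that ConfigMap, using the `kwok-provider-templates` name and `kwok-config` key from the sample above (the template node itself is illustrative; per the tests earlier in this change, either a multi-document node YAML or a `kind: List` works):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kwok-provider-templates
  # assumed: same namespace the cluster-autoscaler runs in (POD_NAMESPACE)
  namespace: kube-system
data:
  kwok-config: |
    apiVersion: v1
    kind: List
    items:
    - apiVersion: v1
      kind: Node
      metadata:
        name: template-node-1
        labels:
          node.kubernetes.io/instance-type: m5.xlarge
```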


@ -73,8 +73,8 @@ type Autoscaler interface {
}
// NewAutoscaler creates an autoscaler of an appropriate type according to the parameters
-func NewAutoscaler(opts AutoscalerOptions) (Autoscaler, errors.AutoscalerError) {
-	err := initializeDefaultOptions(&opts)
+func NewAutoscaler(opts AutoscalerOptions, informerFactory informers.SharedInformerFactory) (Autoscaler, errors.AutoscalerError) {
+	err := initializeDefaultOptions(&opts, informerFactory)
if err != nil {
return nil, errors.ToAutoscalerError(errors.InternalError, err)
}
@ -97,7 +97,7 @@ func NewAutoscaler(opts AutoscalerOptions) (Autoscaler, errors.AutoscalerError)
}
// Initialize default options if not provided.
-func initializeDefaultOptions(opts *AutoscalerOptions) error {
+func initializeDefaultOptions(opts *AutoscalerOptions, informerFactory informers.SharedInformerFactory) error {
if opts.Processors == nil {
opts.Processors = ca_processors.DefaultProcessors(opts.AutoscalingOptions)
}
@ -111,7 +111,7 @@ func initializeDefaultOptions(opts *AutoscalerOptions) error {
opts.RemainingPdbTracker = pdb.NewBasicRemainingPdbTracker()
}
if opts.CloudProvider == nil {
-	opts.CloudProvider = cloudBuilder.NewCloudProvider(opts.AutoscalingOptions)
+	opts.CloudProvider = cloudBuilder.NewCloudProvider(opts.AutoscalingOptions, informerFactory)
}
if opts.ExpanderStrategy == nil {
expanderFactory := factory.NewFactory()


@ -941,6 +941,8 @@ func (a *StaticAutoscaler) ExitCleanUp() {
}
utils.DeleteStatusConfigMap(a.AutoscalingContext.ClientSet, a.AutoscalingContext.ConfigNamespace, a.AutoscalingContext.StatusConfigMapName)
a.CloudProvider.Cleanup()
a.clusterStateRegistry.Stop()
}


@ -192,7 +192,7 @@ require (
k8s.io/kms v0.29.0-alpha.3 // indirect
k8s.io/kube-openapi v0.0.0-20231010175941-2dd684a91f00 // indirect
k8s.io/kube-scheduler v0.0.0 // indirect
-	k8s.io/kubectl v0.0.0 // indirect
+	k8s.io/kubectl v0.28.0 // indirect
k8s.io/mount-utils v0.26.0-alpha.0 // indirect
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.28.0 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect


@ -510,7 +510,7 @@ func buildAutoscaler(debuggingSnapshotter debuggingsnapshot.DebuggingSnapshotter
metrics.UpdateMemoryLimitsBytes(autoscalingOptions.MinMemoryTotal, autoscalingOptions.MaxMemoryTotal)
// Create autoscaler.
-	autoscaler, err := core.NewAutoscaler(opts)
+	autoscaler, err := core.NewAutoscaler(opts, informerFactory)
if err != nil {
return nil, err
}


@ -1852,7 +1852,7 @@ k8s.io/kube-openapi/pkg/validation/strfmt/bson
## explicit; go 1.21.3
k8s.io/kube-scheduler/config/v1
k8s.io/kube-scheduler/extender/v1
-# k8s.io/kubectl v0.0.0 => k8s.io/kubectl v0.29.0-alpha.3
+# k8s.io/kubectl v0.28.0 => k8s.io/kubectl v0.29.0-alpha.3
## explicit; go 1.21.3
k8s.io/kubectl/pkg/scale
# k8s.io/kubelet v0.29.0-alpha.3 => k8s.io/kubelet v0.29.0-alpha.3