Commit Graph

98 Commits

Author SHA1 Message Date
Krzysztof Jastrzebski 6944f3fc56 Delete zero values from deletionsInProgress map in NodeDeletionTracker. 2019-05-28 14:34:56 +02:00
Krzysztof Jastrzebski 4247c8b032 Implement functionality which delays node deletion when node has
annotation with  prefix
'delay-deletion.cluster-autoscaler.kubernetes.io/'.
2019-05-17 16:06:17 +02:00
Kubernetes Prow Robot a6c109f8f5
Merge pull request #1967 from towca/jtuznik/delete-empty-nodes-behaviour-fix
Modify the info passed to ScaleDownStatusProcessor when empty nodes a…
2019-04-30 05:25:37 -07:00
Jakub Tużnik b92f971326 Provide ScaleDownStatusProcessor with more info about scale-down results 2019-04-30 13:49:06 +02:00
Jakub Tużnik 402c643851 Modify the info passed to ScaleDownStatusProcessor when empty nodes are deleted
Previously, if any of the nodes fails to delete, the processor gets
a ScaleDownError status. After this commit, it will get the list of
nodes that were successfully deleted.
2019-04-26 15:54:11 +02:00
Jiaxin Shan 83ae66cebc Consider GPU utilization in scaling down 2019-04-04 01:12:51 -07:00
Jiaxin Shan 90666881d3 Move GPULabel and GPUTypes to cloud provider 2019-03-25 13:03:01 -07:00
Marcin Wielgus 99f1dcf9d2
Merge branch 'master' into crc-fix-error-format 2019-02-01 17:22:57 +01:00
Vivek Bagade 79ef3a6940 unexporting methods in utils.go 2019-01-25 00:06:03 +05:30
Jacek Kaniuk 0c64e0932a Tainting unneeded nodes as PreferNoSchedule 2019-01-21 13:06:50 +01:00
CodeLingo Bot c0603afdeb Fix error format strings according to best practices from CodeReviewComments
Fix error format strings according to best practices from CodeReviewComments

Fix error format strings according to best practices from CodeReviewComments

Reverted incorrect change to with error format string

Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Resolve conflict

Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Fix error strings in testscases to remedy failing tests

Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Fix more error strings to remedy failing tests

Signed-off-by: CodeLingo Bot <bot@codelingo.io>
2019-01-11 09:10:31 +13:00
Maciej Pytel 9060014992 Use listers in scale-down 2018-12-31 14:55:38 +01:00
lsytj0413 672dddd23a refactor(*): fix golint warning 2018-12-19 10:04:08 +08:00
Andrew McDermott fd3fd85f26 UPSTREAM: <carry>: handle nil nodeGroup in calculateScaleDownGpusTotal
Explicitly handle nil as a return value for nodeGroup in
`calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called
for GPU nodes that don't exist. The current logic generates a runtime
exception:

    "reflect: call of reflect.Value.IsNil on zero Value"

Looking through the rest of the tree all the other places that use
this pattern additionally and explicitly check whether `nodeGroup ==
nil` first.

This change now completes the pattern in
`calculateScaleDownGpusTotal()`.

Looking at the other occurrences of this pattern we see:

```
File: clusterstate/clusterstate.go
488:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {

File: core/utils.go
231:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
322:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
394:27:			if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
461:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {

File: core/scale_down.go
185:6:		if reflect.ValueOf(nodeGroup).IsNil() {
608:27:			if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
747:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
1010:25:	if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
```

with the notable exception at core/scale_down.go:185 which is
`calculateScaleDownGpusTotal()`.

With this change, and invoking the autoscaler with:

```
...
      --max-nodes-total=24 \
      --cores-total=8:128 \
      --memory-total=4:256 \
      --gpu-total=nvidia.com/gpu:0:16 \
      --gpu-total=amd.com/gpu:0:4 \
...
```

I no longer see a runtime exception.
2018-12-05 18:54:07 +00:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead github.com/golang/glog 2018-11-26 17:30:31 +01:00
Alex Price 4ae7acbacc add flags to ignore daemonsets and mirror pods when calculating resource utilization of a node
Adds the flag --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(defaults to false) and when enabled, factors DaemonSet and mirror pods out when
calculating the resource utilization of a node.
2018-11-23 15:24:25 +11:00
Łukasz Osipiuk 55fc1e2f00 Store NodeGroup in ScaleUpRequest and ScaleDownRequest 2018-10-30 18:03:04 +01:00
Jakub Tużnik 71111da20c Add a scale down status processor, refactor so that there's more scale down info available to it 2018-09-12 14:52:20 +02:00
Pengfei Ni 1dd0147d9e Add more events for CA 2018-07-09 15:42:05 +08:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Marcin Wielgus f2e76e2592
Merge pull request #1008 from krzysztof-jastrzebski/master
Move removing unneeded autoprovisioned node groups to node group manager
2018-06-22 21:01:36 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski 2df2568841 Move removing unneeded autoprovisioned node groups to node group manager 2018-06-22 14:26:12 +02:00
Nic Doye ebadbda2b2 issues/933 Consider making UnremovableNodeRecheckTimeout configurable 2018-06-18 11:54:14 +01:00
Łukasz Osipiuk b7323bc0d1 Respect GPU limits in scale_up 2018-06-14 15:46:58 +02:00
Łukasz Osipiuk 9f75099d2c Restructure checking resource limits in scale_up.go
Preparatory work for before introducing GPU limits
2018-06-13 19:00:37 +02:00
Łukasz Osipiuk 087a5cc9a9 Respect GPU limits in scale_down 2018-06-13 14:19:59 +02:00
Łukasz Osipiuk 1fa44a4d3a Fix bug resulting resource limits not being enforced in scale_down 2018-06-11 16:39:07 +02:00
Łukasz Osipiuk 519064e1ec Extract isNodeBeingDeleted function 2018-06-11 14:21:07 +02:00
Łukasz Osipiuk 6c57a01fc9 Restructure checking resource limits in scale_down.go 2018-06-11 14:02:40 +02:00
Łukasz Osipiuk 9c61477d25 Do not return error when getting cpu/memory capacity of node 2018-06-08 15:04:57 +02:00
Krzysztof Jastrzebski adad14c2c9 Delete autoprovisioned node pool after all nodes are deleted. 2018-05-28 14:22:18 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically a configuration and few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Aleksandra Malinowska ffeebde8d8 Add support for rescheduled pods with the same name in drain 2018-05-10 12:00:56 +02:00
Marcin Wielgus 9c5728fd74
Merge pull request #836 from kgolab/kg-clean-up-004
Use timestamp argument
2018-05-08 20:24:37 +02:00
Karol Gołąb 53b1c6a394 Use timestamp argument 2018-05-08 13:08:30 +02:00
Karol Gołąb da16642bcf Make the code slightly more idiomatic go 2018-05-08 11:35:01 +02:00
Beata Skiba 054f6d8650
Merge pull request #794 from krzysztof-jastrzebski/pods
Refactor cluster autoscaler builder and add pod list processor.
2018-04-26 13:08:56 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 3d599bfabe Rephrase unremovable node warning 2018-04-18 13:43:32 +02:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
anniedy bf59e3daa5 Typo fix unneded->[unneeded] (#623)
* Update clusterstate.md

* Update scale_down.go

* Update static_autoscaler.go
2018-02-07 17:36:58 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Krzysztof Jastrzebski 56ac572666 Adds resource limits to cloud provider. 2017-10-23 16:06:56 +02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Aleksandra Malinowska 4c31a57374 fix leaking taints in case of cloud provider error on node deletion 2017-09-22 17:55:48 +02:00