Commit Graph

89 Commits

Author SHA1 Message Date
Vivek Bagade 79ef3a6940 unexporting methods in utils.go 2019-01-25 00:06:03 +05:30
Jacek Kaniuk 0c64e0932a Tainting unneeded nodes as PreferNoSchedule 2019-01-21 13:06:50 +01:00
Maciej Pytel 9060014992 Use listers in scale-down 2018-12-31 14:55:38 +01:00
lsytj0413 672dddd23a refactor(*): fix golint warning 2018-12-19 10:04:08 +08:00
Andrew McDermott fd3fd85f26 UPSTREAM: <carry>: handle nil nodeGroup in calculateScaleDownGpusTotal
Explicitly handle nil as a return value for nodeGroup in
`calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called
for GPU nodes that don't exist. The current logic panics at runtime
with:

    "reflect: call of reflect.Value.IsNil on zero Value"

Looking through the rest of the tree, all the other places that use
this pattern also explicitly check whether `nodeGroup == nil` first.

This change now completes the pattern in
`calculateScaleDownGpusTotal()`.

Looking at the other occurrences of this pattern we see:

```
File: clusterstate/clusterstate.go
488:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {

File: core/utils.go
231:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
322:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
394:27:			if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
461:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {

File: core/scale_down.go
185:6:		if reflect.ValueOf(nodeGroup).IsNil() {
608:27:			if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
747:26:		if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
1010:25:	if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() {
```

with the notable exception at core/scale_down.go:185 which is
`calculateScaleDownGpusTotal()`.
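
The difference between the guarded and unguarded forms of the check can be reproduced in isolation. The following is a minimal sketch, not the autoscaler's code: `NodeGroup`, `fakeGroup`, `isNilNodeGroup`, and `unguardedPanics` are hypothetical names introduced only for illustration. It assumes the node group is held in an interface value, which can be either an untyped nil or a typed nil pointer.

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeGroup is a hypothetical stand-in for the real node group interface.
type NodeGroup interface {
	Id() string
}

// fakeGroup is a hypothetical implementation used only for this sketch.
type fakeGroup struct{ id string }

func (g *fakeGroup) Id() string { return g.id }

// isNilNodeGroup is the guarded form: the `ng == nil` test short-circuits
// before reflect is asked about a zero Value.
func isNilNodeGroup(ng NodeGroup) bool {
	return ng == nil || reflect.ValueOf(ng).IsNil()
}

// unguardedPanics runs the reflect-only check and reports whether it panicked.
func unguardedPanics(ng NodeGroup) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = reflect.ValueOf(ng).IsNil()
	return false
}

func main() {
	var untyped NodeGroup // interface holding nothing at all
	var typed *fakeGroup  // nil pointer; becomes a non-nil interface when assigned

	fmt.Println(unguardedPanics(untyped)) // true: IsNil is called on a zero Value
	fmt.Println(unguardedPanics(typed))   // false: reflect sees a nil pointer
	fmt.Println(isNilNodeGroup(untyped))  // true
	fmt.Println(isNilNodeGroup(typed))    // true
	fmt.Println(isNilNodeGroup(&fakeGroup{id: "ng-1"})) // false
}
```

The reflect call is still needed for the typed nil pointer case, which compares unequal to `nil` once stored in an interface; the plain `== nil` test only covers the untyped case.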

With this change, and invoking the autoscaler with:

```
...
      --max-nodes-total=24 \
      --cores-total=8:128 \
      --memory-total=4:256 \
      --gpu-total=nvidia.com/gpu:0:16 \
      --gpu-total=amd.com/gpu:0:4 \
...
```

I no longer see a runtime exception.
2018-12-05 18:54:07 +00:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead of github.com/golang/glog 2018-11-26 17:30:31 +01:00
Alex Price 4ae7acbacc add flags to ignore daemonsets and mirror pods when calculating resource utilization of a node
Adds the flags --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(both default to false). When enabled, they factor DaemonSet and mirror pods out
when calculating the resource utilization of a node.
2018-11-23 15:24:25 +11:00
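
The effect of the two flags described in the commit above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the autoscaler's implementation: the `Pod` type, its fields, and `cpuUtilization` are hypothetical, only CPU requests are considered, and the real calculation may treat allocatable resources differently.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical pod representation for this sketch.
type Pod struct {
	Name        string
	CPUMilli    int64 // requested CPU in millicores
	IsDaemonSet bool
	IsMirror    bool
}

// cpuUtilization returns requested CPU divided by allocatable CPU,
// skipping DaemonSet and/or mirror pods when the matching option is set.
func cpuUtilization(pods []Pod, allocatableMilli int64, ignoreDaemonSets, ignoreMirrorPods bool) float64 {
	if allocatableMilli <= 0 {
		return 0
	}
	var requested int64
	for _, p := range pods {
		if ignoreDaemonSets && p.IsDaemonSet {
			continue
		}
		if ignoreMirrorPods && p.IsMirror {
			continue
		}
		requested += p.CPUMilli
	}
	return float64(requested) / float64(allocatableMilli)
}

func main() {
	pods := []Pod{
		{Name: "app", CPUMilli: 500},
		{Name: "log-agent", CPUMilli: 250, IsDaemonSet: true},
		{Name: "static-proxy", CPUMilli: 250, IsMirror: true},
	}
	fmt.Printf("%.3f\n", cpuUtilization(pods, 2000, false, false)) // 0.500
	fmt.Printf("%.3f\n", cpuUtilization(pods, 2000, true, false))  // 0.375
	fmt.Printf("%.3f\n", cpuUtilization(pods, 2000, true, true))   // 0.250
}
```

With both options enabled, a node that only runs system pods alongside a small workload reports lower utilization and is more likely to be considered for scale-down.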
Łukasz Osipiuk 55fc1e2f00 Store NodeGroup in ScaleUpRequest and ScaleDownRequest 2018-10-30 18:03:04 +01:00
Jakub Tużnik 71111da20c Add a scale down status processor, refactor so that there's more scale down info available to it 2018-09-12 14:52:20 +02:00
Pengfei Ni 1dd0147d9e Add more events for CA 2018-07-09 15:42:05 +08:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Marcin Wielgus f2e76e2592
Merge pull request #1008 from krzysztof-jastrzebski/master
Move removing unneeded autoprovisioned node groups to node group manager
2018-06-22 21:01:36 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski 2df2568841 Move removing unneeded autoprovisioned node groups to node group manager 2018-06-22 14:26:12 +02:00
Nic Doye ebadbda2b2 issues/933 Consider making UnremovableNodeRecheckTimeout configurable 2018-06-18 11:54:14 +01:00
Łukasz Osipiuk b7323bc0d1 Respect GPU limits in scale_up 2018-06-14 15:46:58 +02:00
Łukasz Osipiuk 9f75099d2c Restructure checking resource limits in scale_up.go
Preparatory work before introducing GPU limits
2018-06-13 19:00:37 +02:00
Łukasz Osipiuk 087a5cc9a9 Respect GPU limits in scale_down 2018-06-13 14:19:59 +02:00
Łukasz Osipiuk 1fa44a4d3a Fix bug resulting in resource limits not being enforced in scale_down 2018-06-11 16:39:07 +02:00
Łukasz Osipiuk 519064e1ec Extract isNodeBeingDeleted function 2018-06-11 14:21:07 +02:00
Łukasz Osipiuk 6c57a01fc9 Restructure checking resource limits in scale_down.go 2018-06-11 14:02:40 +02:00
Łukasz Osipiuk 9c61477d25 Do not return error when getting cpu/memory capacity of node 2018-06-08 15:04:57 +02:00
Krzysztof Jastrzebski adad14c2c9 Delete autoprovisioned node pool after all nodes are deleted. 2018-05-28 14:22:18 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically a configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state and is thus moved to the other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Aleksandra Malinowska ffeebde8d8 Add support for rescheduled pods with the same name in drain 2018-05-10 12:00:56 +02:00
Marcin Wielgus 9c5728fd74
Merge pull request #836 from kgolab/kg-clean-up-004
Use timestamp argument
2018-05-08 20:24:37 +02:00
Karol Gołąb 53b1c6a394 Use timestamp argument 2018-05-08 13:08:30 +02:00
Karol Gołąb da16642bcf Make the code slightly more idiomatic go 2018-05-08 11:35:01 +02:00
Beata Skiba 054f6d8650
Merge pull request #794 from krzysztof-jastrzebski/pods
Refactor cluster autoscaler builder and add pod list processor.
2018-04-26 13:08:56 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 3d599bfabe Rephrase unremovable node warning 2018-04-18 13:43:32 +02:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
anniedy bf59e3daa5 Typo fix unneded->[unneeded] (#623)
* Update clusterstate.md

* Update scale_down.go

* Update static_autoscaler.go
2018-02-07 17:36:58 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Krzysztof Jastrzebski 56ac572666 Adds resource limits to cloud provider. 2017-10-23 16:06:56 +02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Aleksandra Malinowska 4c31a57374 fix leaking taints in case of cloud provider error on node deletion 2017-09-22 17:55:48 +02:00
Marcin Wielgus f04113d746 Remove TargetSize() from loops iterating over nodes 2017-09-13 22:33:17 +02:00
Aleksandra Malinowska 197b05b180 respect minimum cores/memory limit during scale down 2017-09-13 10:10:47 +02:00
Aleksandra Malinowska 187c02693e Taint empty nodes to be deleted 2017-09-12 17:40:05 +02:00
Marcin Wielgus 3039a0e813 Merge pull request #319 from krzysztof-jastrzebski/core-test
Core/static_autoscaler.go unit tests.
2017-09-12 13:11:11 +02:00
Beata Skiba eba0fa2f95 Remove nodes that are not in the cluster from unremovableNodes 2017-09-11 20:01:02 +02:00
Krzysztof Jastrzebski 0aec68a46d Core/static_autoscaler.go unit tests. Current time usage refactoring. 2017-09-11 15:07:21 +02:00
Marcin Wielgus db63ac3a18 Merge pull request #324 from aleksandra-malinowska/scale-down-pod-not-found
Add checking for pod not found error on eviction
2017-09-11 15:10:08 +05:30
Beata Skiba 6e5784a519 Always add empty nodes to unneeded nodes 2017-09-08 15:55:18 +02:00
Aleksandra Malinowska fbc8462b10 Add checking for not found error 2017-09-08 15:45:44 +02:00