Commit Graph

103 Commits

Author SHA1 Message Date
Jakub Tużnik 8179e4e716 Refactor the scale-(up|down) status processors so that they have more info available
Replace the simple boolean ScaledUp property of ScaleUpStatus with a more
comprehensive ScaleUpResult. Add more possible values to ScaleDownResult.
Refactor the processors execution so that they are always executed every
iteration, even if RunOnce exits earlier.
2018-09-20 17:12:02 +02:00
Steve Scaffidi 88d857222d Renamed one more variable for consistency
Change-Id: Idf42fd58089a1e75f3291ab7cc583735c68735f2
2018-09-17 14:08:10 -04:00
Steve Scaffidi 56b5456269 Fixing nits: renamed newPodScaleUpBuffer -> newPodScaleUpDelay, deleted redundant comment
Change-Id: I7969194d8e07e2fb34029d0d7990341c891d0623
2018-09-17 10:38:28 -04:00
Steve Scaffidi 33b93cbc5f Add configurable delay for pod age before considering for scale-up
- This is intended to address the issue described in https://github.com/kubernetes/autoscaler/issues/923
  - the delay is configurable via a CLI option
  - in production (on AWS) we set this to a value of 2m
  - the delay could possibly be set as low as 30s and still be effective depending on your workload and environment
  - the default of 0 for the CLI option results in no change to the CA's behavior from defaults.

Change-Id: I7e3f36bb48641faaf8a392cca01a12b07fb0ee35
2018-09-14 13:55:09 -04:00
Jakub Tużnik 71111da20c Add a scale down status processor, refactor so that there's more scale down info available to it 2018-09-12 14:52:20 +02:00
Jakub Tużnik 054f0b3b90 Add AutoscalingStatusProcessor 2018-08-07 14:47:06 +02:00
Aleksandra Malinowska 90e8a7a2d9 Move initializing defaults out of main 2018-08-02 14:04:03 +02:00
Aleksandra Malinowska 6f9b6f8290 Move ListerRegistry to context 2018-07-26 13:31:49 +02:00
Aleksandra Malinowska 07e52e6c79 Move creating cloud provider out of context 2018-07-25 13:43:47 +02:00
Aleksandra Malinowska 0976d2aa07 Move autoscaling options out of static 2018-07-25 10:52:37 +02:00
Aleksandra Malinowska 6b94d7172d Move AutoscalingOptions to config/static 2018-07-23 15:52:27 +02:00
Krzysztof Jastrzebski 2df2568841 Move removing unneeded autoprovisioned node groups to node group manager 2018-06-22 14:26:12 +02:00
Beata Skiba b8ae6df5d3 Add post scale up status processor. 2018-06-06 13:34:49 +02:00
Maciej Pytel 856855987b Move some GKE-specific logic outside core
No change in actual logic being executed. Added a new
NodeGroupListProcessor interface to encapsulate the existing logic.
Moved PodListProcessor and refactor how it's passed around
to make it consistent and easy to add similar interfaces.
2018-05-29 12:57:19 +02:00
Maciej Pytel 5faa41e683 Move PodListProcessor to new directory
It's not really a util and with more processors
coming it makes more sense to keep them in dedicated place.
2018-05-29 12:00:47 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically a configuration and few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Karol Gołąb 5bfab7d9b2 Return value moved to the caller 2018-05-18 14:59:15 +02:00
Karol Gołąb fa6f25a70a Extract ClusterStateRegistry update with its soft dependency 2018-05-18 10:25:15 +02:00
Karol Gołąb dc34b43a40 Extract another tiny method 2018-05-18 10:10:51 +02:00
Karol Gołąb 34f6a45a04 Extract method to hide a tiny bit of complexity 2018-05-18 10:01:52 +02:00
Aleksandra Malinowska d7dc3616f7
Merge pull request #868 from kgolab/kg-clean-up-010
Move metrics update to proper place
2018-05-17 14:52:18 +02:00
Karol Gołąb e31bf0bb58 Move metrics.Autoscaling after all Node-level operations & checks 2018-05-17 14:37:43 +02:00
Aleksandra Malinowska 3b6cfc7c2b
Merge pull request #870 from kgolab/kg-clean-up-012
Set lastScaleDownFailTime properly
2018-05-17 12:09:15 +02:00
MaciekPytel 444201d1e7
Merge pull request #871 from kgolab/kg-clean-up-013
Extract duplicate code into a single method
2018-05-17 11:49:49 +02:00
Karol Gołąb 400147a075 Extract duplicate code into a single method 2018-05-17 10:01:04 +02:00
Karol Gołąb b8cbdf4178 Set lastScaleDownFailTime properly - the ScaleDownError check was unreachable 2018-05-17 09:50:22 +02:00
Karol Gołąb 38a5951e22 Check glog.V once 2018-05-17 09:47:52 +02:00
Karol Gołąb ccca078a2b Move metrics update to proper place 2018-05-17 09:46:25 +02:00
MaciekPytel bc39d4dcd5
Merge pull request #842 from kgolab/kg-clean-up-008
Merge two variables into one.
2018-05-14 10:54:43 +02:00
Aleksandra Malinowska b52ec59b05 Fix cleaning up taints 2018-05-11 12:00:48 +02:00
Karol Gołąb f1f92f065e Merge two variables into one. 2018-05-10 14:32:37 +02:00
Karol Gołąb ae203ed517 Removed unused CloudProvider() method. 2018-05-08 11:23:55 +02:00
Karol Gołąb 854fcc1ff8 Remove implementation details (CleanUp) from the interface.
The CleanUp method is instead called directly from the implementation,
when required.
Test updated in a quick way since the mock we're using does not support
AtLeast(1) - thus Times(2).
2018-05-07 15:24:14 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 7e1353a865 Ignore TPU resource in simulations 2018-04-11 12:26:22 +02:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
anniedy bf59e3daa5 Typo fix unneded->[unneeded] (#623)
* Update clusterstate.md

* Update scale_down.go

* Update static_autoscaler.go
2018-02-07 17:36:58 +01:00
Beata Skiba 346a5c26a9 Remove old unregistered nodes before checking cluster healthiness 2018-02-01 16:34:50 +01:00
Aleksandra Malinowska b17b6c3ec5 Wait before publishing no nodes ready after start 2018-01-16 19:04:38 +01:00
Aleksandra Malinowska 27efa05b1d Publish ClusterUnhealthy events 2018-01-16 16:56:36 +01:00
Aleksandra Malinowska 1b728d411b Publish status and metrics for empty cluster 2018-01-16 16:07:29 +01:00
Marcin Wielgus 15b10c8f67 Skip iteration if pending pods are too new 2017-12-28 16:55:44 +01:00
Marcin Wielgus ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Maciej Pytel 07511f444a Add Refresh method to cloud provider
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00