132 Commits

Author SHA1 Message Date
k8s-ci-robot 03283328a7 Merge pull request #1306 from losipiuk/lo/fluentd-ds-ready
Ignore lo/fluentd-ds-ready when checking node similarity
2018-10-10 03:55:57 -07:00
Łukasz Osipiuk e3891ba025 Ignore lo/fluentd-ds-ready when checking node similarity 2018-10-10 09:57:47 +02:00
Alexey Ermakov 9e8d026b19 deletetaint: retry on conflicts
Signed-off-by: Alexey Ermakov <alexey.ermakov@zalando.de>
2018-10-08 11:22:07 +02:00
Łukasz Osipiuk 52aaac362f Remove GetGpuRequests function 2018-09-05 11:58:46 +02:00
Krishnakumar R a6b81a6ca2 Code cleanup - use const from the package. 2018-08-30 22:30:44 -07:00
Aleksandra Malinowska 364e2da764 Check for ready condition not true 2018-08-30 13:43:24 +02:00
Aleksandra Malinowska f5690aab96 Make CheckPredicates return predicateError 2018-08-28 14:11:35 +02:00
Aleksandra Malinowska 8b89c8f9cd Merge pull request #1168 from yguo0905/preemptible-tpu
Ignore resources with Cloud TPU prefix
2018-08-21 09:56:14 +01:00
Yang Guo b41c9828d9 Ignore resources with Cloud TPU prefix 2018-08-20 17:20:44 -07:00
Pengfei Ni 74045053f5 Fix potential panic 2018-07-23 11:13:09 +08:00
Arto Jantunen 11402b5ca7 Allow using the PodSafeToEvictKey annotation in reverse
Adding the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
annotation to a pod prevents the cluster autoscaler from touching it.
2018-07-11 09:32:56 +03:00
Arto Jantunen 9f3c17d153 Add test to use the PodSafeToEvictKey in reverse
When this is set to false instead of true, the pod should not be evicted by
the autoscaler.
2018-07-11 09:30:20 +03:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb 553db2c9fc Separated errors 2018-07-05 11:30:12 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Łukasz Osipiuk 57ea19599e Explicitly return AutoscalerError from GetNodeTargetGpus 2018-06-14 15:46:58 +02:00
Aleksandra Malinowska bc526e71e8 Merge pull request #960 from krzysztof-jastrzebski/backoff
Move backoff mechanism to utils.
2018-06-14 14:35:33 +02:00
Krzysztof Jastrzebski dd1db7a0ac Move backoff mechanism to utils. 2018-06-13 15:32:25 +02:00
Łukasz Osipiuk 087a5cc9a9 Respect GPU limits in scale_down 2018-06-13 14:19:59 +02:00
Pengfei Ni 5fd15fc96b Set enableEquivalenceClassCache for schedulerConfigFactory and fix GPU resource name for unit tests 2018-06-11 15:17:46 +08:00
Pengfei Ni be3dd85503 Update scheduler cache package 2018-06-11 13:54:12 +08:00
MaciekPytel c41dc43704 Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes
Use bytes instead of MB for memory limits
2018-06-08 14:47:22 +02:00
Maciej Pytel 5faa41e683 Move PodListProcessor to new directory
It's not really a util and with more processors
coming it makes more sense to keep them in dedicated place.
2018-05-29 12:00:47 +02:00
Clayton Coleman 6146e0dbc1 Autoscaler doesn't drain nodes that have terminal pods
Terminal pods (Succeeded or Failed phase) are drained by kubectl drain,
and autoscaler should also drain them.
2018-05-28 22:35:37 -04:00
Karol Gołąb bada827839 Simplify the code by removing superfluous variable 2018-05-18 09:38:47 +02:00
Aleksandra Malinowska 3ccfa5be23 Move universal constants to separate module 2018-05-17 18:36:43 +02:00
Łukasz Osipiuk c406da4174 Support gpus in nodes and pods definitions in UT 2018-05-15 22:43:31 +02:00
Aleksandra Malinowska b2ad790121 Merge pull request #830 from aleksandra-malinowska/stateful-set-drain
Add support for rescheduled pods with the same name in drain
2018-05-11 13:47:53 +02:00
Aleksandra Malinowska 44ba1c719f Fix log message 2018-05-11 12:58:32 +02:00
Karol Gołąb f877f5a64e Remove unused error handling 2018-05-10 12:15:42 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 7e1353a865 Ignore TPU resource in simulations 2018-04-11 12:26:22 +02:00
AdamDang d4ba9120e3 correct the returned message in ready.go
feadiness->readiness
2018-04-08 23:00:02 +08:00
Aleksandra Malinowska 8ae3636ccf Fix method name 2018-03-28 13:23:38 +02:00
Aleksandra Malinowska feb4ad9e14 Add utility for limiting logging 2018-03-22 12:57:22 +01:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
AdamDang 5c4693f95f Typo fix "typ"->"type"
line 31 and line 82: "the typ of AutoscalerError"
here should be type
2018-03-13 19:50:19 +08:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Maciej Pytel d876d74912 Ignore unfitness in price expander if using GPU 2018-03-02 15:50:43 +01:00
Maciej Pytel b7f8622eb2 Create node groups with GPU in scale-up.go
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Maciej Pytel 6554919700 Helper function to calculate GPU requests for NAP 2017-12-11 13:12:22 +01:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Marcin Wielgus 26960b49df Merge pull request #460 from sergeylanzman/replace-depricate-func
Replace deprecate kubernetes client functions
2017-11-22 15:58:01 +01:00
Marcin Wielgus ded016dfd8 Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Sergey Lanzman eb546b87a0 Replace deprecate kubernetes client functions 2017-11-09 19:49:41 +02:00
Marcin Wielgus 439fd3c9ec Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00