208 Commits

Author SHA1 Message Date
Jiaxin Shan 83ae66cebc Consider GPU utilization in scaling down 2019-04-04 01:12:51 -07:00
Jiaxin Shan 90666881d3 Move GPULabel and GPUTypes to cloud provider 2019-03-25 13:03:01 -07:00
Łukasz Osipiuk ea0d61f93d Migrate to using api-specific REST clients 2019-03-07 21:38:00 +01:00
Pengfei Ni 128729bae9 Move schedulercache to package nodeinfo 2019-02-21 12:41:08 +08:00
Jacek Kaniuk f054c53c46 Account for kernel reserved memory in capacity calculations 2019-02-08 17:04:07 +01:00
Kubernetes Prow Robot bd84757b7e
Merge pull request #1596 from vivekbagade/improve-filterout-logic
Added better checks for filterSchedulablePods and added a tunable fla…
2019-01-27 13:00:31 -08:00
Vivek Bagade c6b87841ce Added a new method that uses pod packing to filter schedulable pods
filterOutSchedulableByPacking is an alternative to the older
filterOutSchedulable. filterOutSchedulableByPacking sorts pods in
unschedulableCandidates by priority and filters out pods that can be
scheduled on free capacity on existing nodes. It uses a basic packing
approach to do this. Pods with nominatedNodeName set are always
filtered out.

filterOutSchedulableByPacking is used by default. It can be turned off
by setting the filter-out-schedulable-pods-uses-packing flag to false,
which activates the older and more lenient filterOutSchedulable (now
called filterOutSchedulableSimple).

Added test cases for both methods.
2019-01-25 16:09:51 +05:30
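
The packing approach described in the commit message above can be sketched roughly as follows. This is an illustrative Python sketch, not the autoscaler's Go implementation: the `Pod` and `Node` types, the single CPU resource, the highest-priority-first ordering, and the first-fit placement are all assumptions made to keep the example short. Only the overall behavior (sort by priority, pack onto free capacity, always drop pods with a nominated node) comes from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pod:
    name: str
    priority: int
    cpu_request: int  # millicores; hypothetical single-resource model
    nominated_node_name: Optional[str] = None


@dataclass
class Node:
    name: str
    free_cpu: int  # millicores not yet claimed by running pods


def filter_out_schedulable_by_packing(pods: List[Pod], nodes: List[Node]) -> List[Pod]:
    """Return the pods that still need new capacity after packing."""
    free = {node.name: node.free_cpu for node in nodes}  # copy, inputs stay untouched
    remaining: List[Pod] = []
    # Assumption: higher-priority pods claim free capacity first.
    for pod in sorted(pods, key=lambda p: p.priority, reverse=True):
        if pod.nominated_node_name:
            continue  # pods with a nominated node are always filtered out
        # First-fit: take the first node with enough free CPU.
        target = next(
            (name for name, cpu in free.items() if cpu >= pod.cpu_request), None
        )
        if target is None:
            remaining.append(pod)  # no room on existing nodes
        else:
            free[target] -= pod.cpu_request  # reserve the capacity
    return remaining
```

Reserving capacity as each pod is placed is what makes this stricter than a check that tests every pod against the same unreserved free capacity, which is presumably why the older method is described as more lenient.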
Jacek Kaniuk d05dbb9ec4 Refactor tests of tainting
Refactor scale down and deletetaint tests
Speed up deletetaint tests
2019-01-25 09:21:41 +01:00
Vivek Bagade 8fff0f6556 Removing nominatedNodeName annotation and moving to pod.Status.NominatedNodeName 2019-01-25 00:06:03 +05:30
Jacek Kaniuk d00af2373c Tainting nodes - update first, refresh on conflict 2019-01-24 16:57:27 +01:00
Jacek Kaniuk 0c64e0932a Tainting unneeded nodes as PreferNoSchedule 2019-01-21 13:06:50 +01:00
Łukasz Osipiuk b5f9a9505c Extend backoff interface with NodeInfo and error information 2019-01-09 11:25:34 +01:00
Maciej Pytel b64139d3cb Use listers in simulator 2019-01-02 15:55:13 +01:00
Maciej Pytel 9060014992 Use listers in scale-down 2018-12-31 14:55:38 +01:00
Maciej Pytel 39551df790 Reenable statefulset drain test 2018-12-31 14:54:41 +01:00
Maciej Pytel e1f09b012b Migrate utils/drain to use listers 2018-12-31 14:54:41 +01:00
Maciej Pytel ed2e3bff52 Add functions for testing new listers 2018-12-31 11:38:42 +01:00
Maciej Pytel 60babe7158 Use kubernetes lister for daemonset instead of custom one
Also migrate to using apps/v1.DaemonSet instead of old
extensions/v1beta1.
2018-12-28 13:55:41 +01:00
Maciej Pytel 40811c2f8b Add listers for more controllers 2018-12-28 13:31:21 +01:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead github.com/golang/glog 2018-11-26 17:30:31 +01:00
Łukasz Osipiuk 991873c237 Fix gofmt errors 2018-11-26 15:39:59 +01:00
mooncake 812549592b Fix typos: reqest->request, approporiate->appropriate
Signed-off-by: mooncake <xcoder@tenxcloud.com>
2018-11-10 20:29:34 +08:00
k8s-ci-robot 7008fb50be
Merge pull request #1380 from losipiuk/lo/backoff
Make Backoff interface
2018-11-07 05:13:43 -08:00
Łukasz Osipiuk 0e2c3739b7 Use NodeGroup as key in Backoff 2018-10-30 18:17:26 +01:00
Łukasz Osipiuk e462d4420c Extract Backoff interface 2018-10-29 23:02:13 +01:00
Maciej Pytel 6f5e6aab6f Move node group balancing to processor
The goal is to allow customization of this logic
for different use cases and cloud providers.
2018-10-25 14:04:05 +02:00
k8s-ci-robot 03283328a7
Merge pull request #1306 from losipiuk/lo/fluentd-ds-ready
Ignore lo/fluentd-ds-ready when checking node similarity
2018-10-10 03:55:57 -07:00
Łukasz Osipiuk e3891ba025 Ignore lo/fluentd-ds-ready when checking node similarity 2018-10-10 09:57:47 +02:00
Alexey Ermakov 9e8d026b19 deletetaint: retry on conflicts
Signed-off-by: Alexey Ermakov <alexey.ermakov@zalando.de>
2018-10-08 11:22:07 +02:00
Łukasz Osipiuk 52aaac362f Remove GetGpuRequests function 2018-09-05 11:58:46 +02:00
Krishnakumar R a6b81a6ca2 Code cleanup - use const from the package. 2018-08-30 22:30:44 -07:00
Aleksandra Malinowska 364e2da764 Check for ready condition not true 2018-08-30 13:43:24 +02:00
Aleksandra Malinowska f5690aab96 Make CheckPredicates return predicateError 2018-08-28 14:11:35 +02:00
Aleksandra Malinowska 8b89c8f9cd
Merge pull request #1168 from yguo0905/preemptible-tpu
Ignore resources with Cloud TPU prefix
2018-08-21 09:56:14 +01:00
Yang Guo b41c9828d9 Ignore resources with Cloud TPU prefix 2018-08-20 17:20:44 -07:00
Pengfei Ni 74045053f5 Fix potential panic 2018-07-23 11:13:09 +08:00
Arto Jantunen 11402b5ca7 Allow using the PodSafeToEvictKey annotation in reverse
Adding the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
annotation to a pod prevents the cluster autoscaler from touching it.
2018-07-11 09:32:56 +03:00
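
A minimal Python sketch of how the annotation can be read as a three-way override. The annotation key is taken from the commit message; the function name, the exact string comparison, and the `None` result for "no override" are assumptions for illustration and do not reflect the autoscaler's actual Go code.

```python
from typing import Mapping, Optional

# Annotation key as quoted in the commit message.
SAFE_TO_EVICT_KEY = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def safe_to_evict_override(annotations: Mapping[str, str]) -> Optional[bool]:
    """Return True or False when the pod states a preference, else None."""
    value = annotations.get(SAFE_TO_EVICT_KEY)
    if value == "true":
        return True  # pod may be evicted despite other constraints
    if value == "false":
        return False  # pod must not be evicted by the autoscaler
    return None  # annotation absent or unrecognized: fall back to normal rules
```

A caller would check for `False` first and skip the node, then treat `True` as permission to bypass the other eviction checks.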
Arto Jantunen 9f3c17d153 Add test to use the PodSafeToEvictKey in reverse
When this is set to false instead of true, the pod should not be evicted by
the autoscaler.
2018-07-11 09:30:20 +03:00
Aleksandra Malinowska 800ee56b34 Refactor and extend GPU metrics error types 2018-07-05 13:13:11 +02:00
Karol Gołąb 553db2c9fc Separated errors 2018-07-05 11:30:12 +02:00
Karol Gołąb aae4d1270a Make GetGpuTypeForMetrics more robust 2018-06-26 21:35:16 +02:00
Karol Gołąb 5eb7021f82 Add GPU-related scaled_up & scaled_down metrics (#974)
* Add GPU-related scaled_up & scaled_down metrics

* Fix name to match SD naming convention

* Fix import after master rebase

* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Łukasz Osipiuk 57ea19599e Explicitly return AutoscalerError from GetNodeTargetGpus 2018-06-14 15:46:58 +02:00
Aleksandra Malinowska bc526e71e8
Merge pull request #960 from krzysztof-jastrzebski/backoff
Move backoff mechanism to utils.
2018-06-14 14:35:33 +02:00
Krzysztof Jastrzebski dd1db7a0ac Move backoff mechanism to utils. 2018-06-13 15:32:25 +02:00
Łukasz Osipiuk 087a5cc9a9 Respect GPU limits in scale_down 2018-06-13 14:19:59 +02:00
Pengfei Ni 5fd15fc96b Set enableEquivalenceClassCache for schedulerConfigFactory and fix GPU resource name for unit tests 2018-06-11 15:17:46 +08:00
Pengfei Ni be3dd85503 Update scheduler cache package 2018-06-11 13:54:12 +08:00
MaciekPytel c41dc43704
Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes
Use bytes instead of MB for memory limits
2018-06-08 14:47:22 +02:00
Maciej Pytel 5faa41e683 Move PodListProcessor to new directory
It's not really a util and with more processors
coming it makes more sense to keep them in dedicated place.
2018-05-29 12:00:47 +02:00
Clayton Coleman 6146e0dbc1
Autoscaler doesn't drain nodes that have terminal pods
Terminal pods (Succeeded or Failed phase) are drained by kubectl drain,
and autoscaler should also drain them.
2018-05-28 22:35:37 -04:00
Karol Gołąb bada827839 Simplify the code by removing superfluous variable 2018-05-18 09:38:47 +02:00
Aleksandra Malinowska 3ccfa5be23 Move universal constants to separate module 2018-05-17 18:36:43 +02:00
Łukasz Osipiuk c406da4174 Support gpus in nodes and pods definitions in UT 2018-05-15 22:43:31 +02:00
Aleksandra Malinowska b2ad790121
Merge pull request #830 from aleksandra-malinowska/stateful-set-drain
Add support for rescheduled pods with the same name in drain
2018-05-11 13:47:53 +02:00
Aleksandra Malinowska 44ba1c719f Fix log message 2018-05-11 12:58:32 +02:00
Karol Gołąb f877f5a64e Remove unused error handling 2018-05-10 12:15:42 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 7e1353a865 Ignore TPU resource in simulations 2018-04-11 12:26:22 +02:00
AdamDang d4ba9120e3
correct the returned message in ready.go
feadiness->readiness
2018-04-08 23:00:02 +08:00
Aleksandra Malinowska 8ae3636ccf Fix method name 2018-03-28 13:23:38 +02:00
Aleksandra Malinowska feb4ad9e14 Add utility for limiting logging 2018-03-22 12:57:22 +01:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
AdamDang 5c4693f95f
Typo fix "typ"->"type"
line 31 and line 82: "the typ of AutoscalerError"
here should be type
2018-03-13 19:50:19 +08:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Maciej Pytel d876d74912 Ignore unfitness in price expander if using GPU 2018-03-02 15:50:43 +01:00
Maciej Pytel b7f8622eb2 Create node groups with GPU in scale-up.go
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Maciej Pytel 6554919700 Helper function to calculate GPU requests for NAP 2017-12-11 13:12:22 +01:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Marcin Wielgus 26960b49df
Merge pull request #460 from sergeylanzman/replace-depricate-func
Replace deprecate kubernetes client functions
2017-11-22 15:58:01 +01:00
Marcin Wielgus ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Sergey Lanzman eb546b87a0 Replace deprecate kubernetes client functions 2017-11-09 19:49:41 +02:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
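
The workaround described above amounts to copying the GPU count from capacity into allocatable before running scheduling simulations. A rough Python sketch, assuming node resources are plain dictionaries and using `nvidia.com/gpu` as the GPU resource name (an assumption; the commit message does not name the resource):

```python
from typing import Dict

GPU_RESOURCE = "nvidia.com/gpu"  # assumed resource name, not from the commit


def with_gpu_allocatable_from_capacity(
    capacity: Dict[str, int], allocatable: Dict[str, int]
) -> Dict[str, int]:
    """Return a copy of allocatable with the GPU count taken from capacity."""
    patched = dict(allocatable)  # leave the caller's dict unchanged
    gpus = capacity.get(GPU_RESOURCE, 0)
    if gpus > 0:
        # Treat GPUs that are still initializing as already usable, so that
        # pending GPU pods count as schedulable on this node.
        patched[GPU_RESOURCE] = gpus
    return patched
```

With this in place, a node that reports GPU capacity but no allocatable GPUs yet still absorbs pending GPU pods in simulation, which avoids the second scale-up.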
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Henrique Rodrigues 56135db3b0 Annotation which indicates that a pod is safe to evict despite other constraints 2017-10-26 09:29:50 -02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Aleksandra Malinowska 7e36ea61c0 Keep graceful termination timeout consistent 2017-09-21 12:54:11 +02:00
Krzysztof Jastrzebski 6b8b8b8fe1 Cloudprovider/gce/gce_manager.go unit tests. 2017-09-19 11:16:08 +02:00
Krzysztof Jastrzebski 0aec68a46d Core/static_autoscaler.go unit tests. Current time usage refactoring. 2017-09-11 15:07:21 +02:00
Aleksandra Malinowska d43029c180 implement blocking scale up beyond max cores & memory 2017-09-08 12:50:00 +02:00
Sergey Lanzman 437a3f60e1 Small optimize code 2017-09-04 23:50:45 +03:00
Sergey Lanzman 415f53cdea Change from deprecated Core to CoreV1 for kube client 2017-09-04 22:16:21 +03:00
Marcin Wielgus 2d8f59e23d Set verbosity for each of the glog.Info logs 2017-09-01 12:34:29 +02:00
Marcin Wielgus fbf0d6f499 Merge pull request #271 from aleksandra-malinowska/creator-ref
Use OwnerReferences in place of deprecated created by annotation
2017-08-30 04:21:58 +05:30
Aleksandra Malinowska ac0d8388bc use OwnerReferences instead of deprecated created by annotation 2017-08-29 17:26:38 +02:00
Maciej Pytel 281afa7147 precompute predicateMetadata in scale-down 2017-08-29 16:29:45 +02:00
Maciej Pytel fb6ef75d12 Don't create verbose errors in predicates if we ignore them
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Marcin Wielgus 33f3fcdef9 NAP - pick best labels for pods 2017-08-17 10:47:15 +02:00
Marcin Wielgus b8c1fc2b01 Fix listers in CA after godep update 2017-08-14 00:14:31 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Aleksandra Malinowska c159a90f04 rename test provider package 2017-07-06 16:23:15 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Marcin Wielgus 0fd87aeca7 Merge pull request #100 from aleksandra-malinowska/evict-kube-system-pods
Add allowing eviction of kube-system pods with PDB
2017-06-02 10:33:07 -07:00
Maciej Pytel 7c327a951f Function to balance scale-up between node groups 2017-06-02 15:53:50 +02:00