Jiaxin Shan
83ae66cebc
Consider GPU utilization in scaling down
2019-04-04 01:12:51 -07:00
Jiaxin Shan
90666881d3
Move GPULabel and GPUTypes to cloud provider
2019-03-25 13:03:01 -07:00
Łukasz Osipiuk
ea0d61f93d
Migrate to using api-specific REST clients
2019-03-07 21:38:00 +01:00
Pengfei Ni
128729bae9
Move schedulercache to package nodeinfo
2019-02-21 12:41:08 +08:00
Jacek Kaniuk
f054c53c46
Account for kernel reserved memory in capacity calculations
2019-02-08 17:04:07 +01:00
Kubernetes Prow Robot
bd84757b7e
Merge pull request #1596 from vivekbagade/improve-filterout-logic
...
Added better checks for filterSchedulablePods and added a tunable fla…
2019-01-27 13:00:31 -08:00
Vivek Bagade
c6b87841ce
Added a new method that uses pod packing to filter schedulable pods
...
filterOutSchedulableByPacking is an alternative to the older
filterOutSchedulable. filterOutSchedulableByPacking sorts pods in
unschedulableCandidates by priority and filters out pods that can be
scheduled on free capacity on existing nodes. It uses a basic packing
approach to do this. Pods with nominatedNodeName set are always
filtered out.
filterOutSchedulableByPacking is used by default, but this can be
toggled off by setting the filter-out-schedulable-pods-uses-packing
flag to false, which activates the older and more lenient
filterOutSchedulable (now called filterOutSchedulableSimple).
Added test cases for both methods.
2019-01-25 16:09:51 +05:30
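The packing approach described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the autoscaler's actual code: the `pod` and `node` types, the single CPU dimension, the descending priority order, and the first-fit placement are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// pod and node are simplified, hypothetical stand-ins for the Kubernetes types.
type pod struct {
	name              string
	priority          int
	cpuMilli          int64
	nominatedNodeName string
}

type node struct {
	name         string
	freeCPUMilli int64
}

// filterByPacking returns the candidates that still do not fit on the free
// capacity of existing nodes. Higher-priority pods are packed first
// (assumed ordering), using first-fit placement (assumed strategy).
func filterByPacking(candidates []pod, nodes []node) []pod {
	// Track remaining free capacity without mutating the caller's slice.
	free := make([]int64, len(nodes))
	for i, n := range nodes {
		free[i] = n.freeCPUMilli
	}

	sorted := append([]pod(nil), candidates...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].priority > sorted[j].priority
	})

	var unschedulable []pod
	for _, p := range sorted {
		// Pods with a nominated node are always filtered out.
		if p.nominatedNodeName != "" {
			continue
		}
		placed := false
		for i := range free {
			if free[i] >= p.cpuMilli {
				free[i] -= p.cpuMilli
				placed = true
				break
			}
		}
		if !placed {
			unschedulable = append(unschedulable, p)
		}
	}
	return unschedulable
}

func main() {
	nodes := []node{{"n1", 1000}}
	pods := []pod{
		{"low", 1, 600, ""},
		{"high", 10, 600, ""},
		{"nominated", 5, 2000, "n2"},
	}
	for _, p := range filterByPacking(pods, nodes) {
		fmt.Println(p.name)
	}
}
```

In this sketch a pod consumes capacity once it is placed, so a later pod cannot reuse the same free space. That is the sense in which packing is stricter than checking each pod against the free capacity independently.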
Jacek Kaniuk
d05dbb9ec4
Refactor tests of tainting
...
Refactor scale down and deletetaint tests
Speed up deletetaint tests
2019-01-25 09:21:41 +01:00
Vivek Bagade
8fff0f6556
Removing nominatedNodeName annotation and moving to pod.Status.NominatedNodeName
2019-01-25 00:06:03 +05:30
Jacek Kaniuk
d00af2373c
Tainting nodes - update first, refresh on conflict
2019-01-24 16:57:27 +01:00
Jacek Kaniuk
0c64e0932a
Tainting unneeded nodes as PreferNoSchedule
2019-01-21 13:06:50 +01:00
Łukasz Osipiuk
b5f9a9505c
Extend backoff interface with NodeInfo and error information
2019-01-09 11:25:34 +01:00
Maciej Pytel
b64139d3cb
Use listers in simulator
2019-01-02 15:55:13 +01:00
Maciej Pytel
9060014992
Use listers in scale-down
2018-12-31 14:55:38 +01:00
Maciej Pytel
39551df790
Reenable statefulset drain test
2018-12-31 14:54:41 +01:00
Maciej Pytel
e1f09b012b
Migrate utils/drain to use listers
2018-12-31 14:54:41 +01:00
Maciej Pytel
ed2e3bff52
Add functions for testing new listers
2018-12-31 11:38:42 +01:00
Maciej Pytel
60babe7158
Use kubernetes lister for daemonset instead of custom one
...
Also migrate to using apps/v1.DaemonSet instead of old
extensions/v1beta1.
2018-12-28 13:55:41 +01:00
Maciej Pytel
40811c2f8b
Add listers for more controllers
2018-12-28 13:31:21 +01:00
Łukasz Osipiuk
016bf7fc2c
Use k8s.io/klog instead of github.com/golang/glog
2018-11-26 17:30:31 +01:00
Łukasz Osipiuk
991873c237
Fix gofmt errors
2018-11-26 15:39:59 +01:00
mooncake
812549592b
Fix typos: reqest->request, approporiate->appropriate
...
Signed-off-by: mooncake <xcoder@tenxcloud.com>
2018-11-10 20:29:34 +08:00
k8s-ci-robot
7008fb50be
Merge pull request #1380 from losipiuk/lo/backoff
...
Make Backoff interface
2018-11-07 05:13:43 -08:00
Łukasz Osipiuk
0e2c3739b7
Use NodeGroup as key in Backoff
2018-10-30 18:17:26 +01:00
Łukasz Osipiuk
e462d4420c
Extract Backoff interface
2018-10-29 23:02:13 +01:00
Maciej Pytel
6f5e6aab6f
Move node group balancing to processor
...
The goal is to allow customization of this logic
for different use cases and cloud providers.
2018-10-25 14:04:05 +02:00
k8s-ci-robot
03283328a7
Merge pull request #1306 from losipiuk/lo/fluentd-ds-ready
...
Ignore lo/fluentd-ds-ready when checking node similarity
2018-10-10 03:55:57 -07:00
Łukasz Osipiuk
e3891ba025
Ignore lo/fluentd-ds-ready when checking node similarity
2018-10-10 09:57:47 +02:00
Alexey Ermakov
9e8d026b19
deletetaint: retry on conflicts
...
Signed-off-by: Alexey Ermakov <alexey.ermakov@zalando.de>
2018-10-08 11:22:07 +02:00
Łukasz Osipiuk
52aaac362f
Remove GetGpuRequests function
2018-09-05 11:58:46 +02:00
Krishnakumar R
a6b81a6ca2
Code cleanup - use const from the package.
2018-08-30 22:30:44 -07:00
Aleksandra Malinowska
364e2da764
Check for ready condition not true
2018-08-30 13:43:24 +02:00
Aleksandra Malinowska
f5690aab96
Make CheckPredicates return predicateError
2018-08-28 14:11:35 +02:00
Aleksandra Malinowska
8b89c8f9cd
Merge pull request #1168 from yguo0905/preemptible-tpu
...
Ignore resources with Cloud TPU prefix
2018-08-21 09:56:14 +01:00
Yang Guo
b41c9828d9
Ignore resources with Cloud TPU prefix
2018-08-20 17:20:44 -07:00
Pengfei Ni
74045053f5
Fix potential panic
2018-07-23 11:13:09 +08:00
Arto Jantunen
11402b5ca7
Allow using the PodSafeToEvictKey annotation in reverse
...
Adding the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
annotation to a pod prevents the cluster autoscaler from touching it.
2018-07-11 09:32:56 +03:00
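The annotation semantics described above (an explicit "false" blocks eviction, while "true" marks a pod as safe to evict) could be modeled as a three-way decision. The sketch below is illustrative only: the annotation key is taken from the commit message, but the `evictDecision` type and the function name are hypothetical and are not the autoscaler's API.

```go
package main

import "fmt"

// Annotation key as quoted in the commit message.
const podSafeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// evictDecision is a hypothetical type used only for this illustration.
type evictDecision int

const (
	useDefaultRules evictDecision = iota // annotation absent or unrecognized
	alwaysEvictable                      // annotation set to "true"
	neverEvictable                       // annotation set to "false"
)

// decisionFromAnnotations maps a pod's annotations to an eviction decision.
// Reading from a nil map is safe in Go and yields the empty string.
func decisionFromAnnotations(annotations map[string]string) evictDecision {
	switch annotations[podSafeToEvictKey] {
	case "true":
		return alwaysEvictable
	case "false":
		return neverEvictable
	default:
		return useDefaultRules
	}
}

func main() {
	blocked := map[string]string{podSafeToEvictKey: "false"}
	fmt.Println(decisionFromAnnotations(blocked) == neverEvictable)
	fmt.Println(decisionFromAnnotations(nil) == useDefaultRules)
}
```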
Arto Jantunen
9f3c17d153
Add test to use the PodSafeToEvictKey in reverse
...
When this is set to false instead of true, the pod should not be evicted by
the autoscaler.
2018-07-11 09:30:20 +03:00
Aleksandra Malinowska
800ee56b34
Refactor and extend GPU metrics error types
2018-07-05 13:13:11 +02:00
Karol Gołąb
553db2c9fc
Separated errors
2018-07-05 11:30:12 +02:00
Karol Gołąb
aae4d1270a
Make GetGpuTypeForMetrics more robust
2018-06-26 21:35:16 +02:00
Karol Gołąb
5eb7021f82
Add GPU-related scaled_up & scaled_down metrics (#974)
...
* Add GPU-related scaled_up & scaled_down metrics
* Fix name to match SD naming convention
* Fix import after master rebase
* Change the logic to include GPU-being-installed nodes
2018-06-22 21:00:52 +02:00
Łukasz Osipiuk
57ea19599e
Explicitly return AutoscalerError from GetNodeTargetGpus
2018-06-14 15:46:58 +02:00
Aleksandra Malinowska
bc526e71e8
Merge pull request #960 from krzysztof-jastrzebski/backoff
...
Move backoff mechanism to utils.
2018-06-14 14:35:33 +02:00
Krzysztof Jastrzebski
dd1db7a0ac
Move backoff mechanism to utils.
2018-06-13 15:32:25 +02:00
Łukasz Osipiuk
087a5cc9a9
Respect GPU limits in scale_down
2018-06-13 14:19:59 +02:00
Pengfei Ni
5fd15fc96b
Set enableEquivalenceClassCache for schedulerConfigFactory and fix GPU resource name for unit tests
2018-06-11 15:17:46 +08:00
Pengfei Ni
be3dd85503
Update scheduler cache package
2018-06-11 13:54:12 +08:00
MaciekPytel
c41dc43704
Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes
...
Use bytes instead of MB for memory limits
2018-06-08 14:47:22 +02:00
Maciej Pytel
5faa41e683
Move PodListProcessor to new directory
...
It's not really a util, and with more processors
coming it makes more sense to keep them in a dedicated place.
2018-05-29 12:00:47 +02:00
Clayton Coleman
6146e0dbc1
Autoscaler doesn't drain nodes that have terminal pods
...
Terminal pods (Succeeded or Failed phase) are drained by kubectl drain,
and autoscaler should also drain them.
2018-05-28 22:35:37 -04:00
Karol Gołąb
bada827839
Simplify the code by removing superfluous variable
2018-05-18 09:38:47 +02:00
Aleksandra Malinowska
3ccfa5be23
Move universal constants to separate module
2018-05-17 18:36:43 +02:00
Łukasz Osipiuk
c406da4174
Support gpus in nodes and pods definitions in UT
2018-05-15 22:43:31 +02:00
Aleksandra Malinowska
b2ad790121
Merge pull request #830 from aleksandra-malinowska/stateful-set-drain
...
Add support for rescheduled pods with the same name in drain
2018-05-11 13:47:53 +02:00
Aleksandra Malinowska
44ba1c719f
Fix log message
2018-05-11 12:58:32 +02:00
Karol Gołąb
f877f5a64e
Remove unused error handling
2018-05-10 12:15:42 +02:00
Krzysztof Jastrzebski
88b769b324
Refactor cluster autoscaler builder and add pod list processor.
2018-04-26 12:37:51 +02:00
Aleksandra Malinowska
7e1353a865
Ignore TPU resource in simulations
2018-04-11 12:26:22 +02:00
AdamDang
d4ba9120e3
correct the returned message in ready.go
...
feadiness->readiness
2018-04-08 23:00:02 +08:00
Aleksandra Malinowska
8ae3636ccf
Fix method name
2018-03-28 13:23:38 +02:00
Aleksandra Malinowska
feb4ad9e14
Add utility for limiting logging
2018-03-22 12:57:22 +01:00
Marcin Wielgus
04bec08e84
Compilation fix
2018-03-20 20:11:36 +01:00
Aleksandra Malinowska
4c594db7f8
Run spellchecker
2018-03-15 15:47:49 +01:00
AdamDang
5c4693f95f
Typo fix "typ"->"type"
...
line 31 and line 82: "the typ of AutoscalerError"
here should be type
2018-03-13 19:50:19 +08:00
Maciej Pytel
abbc45da2e
Delay scale-up including GPU request
...
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Maciej Pytel
d876d74912
Ignore unfitness in price expander if using GPU
2018-03-02 15:50:43 +01:00
Maciej Pytel
b7f8622eb2
Create node groups with GPU in scale-up.go
...
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Maciej Pytel
6554919700
Helper function to calculate GPU requests for NAP
2017-12-11 13:12:22 +01:00
Marcin Wielgus
f8c0e20ad9
Source fix after godep update
2017-11-28 14:01:43 +01:00
Marcin Wielgus
26960b49df
Merge pull request #460 from sergeylanzman/replace-depricate-func
...
Replace deprecated kubernetes client functions
2017-11-22 15:58:01 +01:00
Marcin Wielgus
ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
...
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel
d81dca5991
Mark nodes with uninitialized GPUs as unready
2017-11-10 17:56:10 +01:00
Sergey Lanzman
eb546b87a0
Replace deprecated kubernetes client functions
2017-11-09 19:49:41 +02:00
Marcin Wielgus
439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
...
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba
2b28ac1a04
Add a workaround for scaling of VMs with GPUs
...
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
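The workaround described in the commit above, treating a node's GPU capacity as already allocatable, might look roughly like the sketch below. Resources are modeled as plain maps from resource name to quantity, and the `gpuResource` name and the function name are assumptions for this example rather than identifiers from the autoscaler.

```go
package main

import "fmt"

// gpuResource is a hypothetical resource name used only for this sketch.
const gpuResource = "example.com/gpu"

// withGPUAllocatableFromCapacity returns a copy of allocatable in which the
// GPU entry is overwritten with the node's GPU capacity, so a node whose GPUs
// are not yet reported as allocatable still counts as able to host GPU pods.
func withGPUAllocatableFromCapacity(capacity, allocatable map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(allocatable)+1)
	for name, qty := range allocatable {
		out[name] = qty
	}
	if gpus, ok := capacity[gpuResource]; ok {
		out[gpuResource] = gpus
	}
	return out
}

func main() {
	capacity := map[string]int64{"cpu": 4, gpuResource: 2}
	allocatable := map[string]int64{"cpu": 4} // GPUs not reported yet
	adjusted := withGPUAllocatableFromCapacity(capacity, allocatable)
	fmt.Println(adjusted[gpuResource])
}
```

Returning a copy instead of mutating the input keeps the node's reported state intact, which matters if the same data is also used to decide node readiness.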
Edward Tsang
4104a91991
more spelling fixes
2017-11-02 14:21:36 -07:00
Henrique Rodrigues
56135db3b0
Annotation which indicates that a pod is safe to evict despite other constraints
2017-10-26 09:29:50 -02:00
Krzysztof Jastrzebski
d9c00e5ce1
Adds priority preemption support to cluster autoscaler.
2017-10-23 09:54:56 +02:00
Maciej Pytel
ff21b0b00c
Keep track of nodes that failed to register for a long time
...
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Aleksandra Malinowska
7e36ea61c0
Keep graceful termination timeout consistent
2017-09-21 12:54:11 +02:00
Krzysztof Jastrzebski
6b8b8b8fe1
Cloudprovider/gce/gce_manager.go unit tests.
2017-09-19 11:16:08 +02:00
Krzysztof Jastrzebski
0aec68a46d
Core/static_autoscaler.go unit tests. Current time usage refactoring.
2017-09-11 15:07:21 +02:00
Aleksandra Malinowska
d43029c180
implement blocking scale up beyond max cores & memory
2017-09-08 12:50:00 +02:00
Sergey Lanzman
437a3f60e1
Small code optimization
2017-09-04 23:50:45 +03:00
Sergey Lanzman
415f53cdea
Change from deprecated Core to CoreV1 for kube client
2017-09-04 22:16:21 +03:00
Marcin Wielgus
2d8f59e23d
Set verbosity for each of the glog.Info logs
2017-09-01 12:34:29 +02:00
Marcin Wielgus
fbf0d6f499
Merge pull request #271 from aleksandra-malinowska/creator-ref
...
Use OwnerReferences in place of deprecated created by annotation
2017-08-30 04:21:58 +05:30
Aleksandra Malinowska
ac0d8388bc
use OwnerReferences instead of deprecated created by annotation
2017-08-29 17:26:38 +02:00
Maciej Pytel
281afa7147
precompute predicateMetadata in scale-down
2017-08-29 16:29:45 +02:00
Maciej Pytel
fb6ef75d12
Don't create verbose errors in predicates if we ignore them
...
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Marcin Wielgus
33f3fcdef9
NAP - pick best labels for pods
2017-08-17 10:47:15 +02:00
Marcin Wielgus
b8c1fc2b01
Fix listers in CA after godep update
2017-08-14 00:14:31 +02:00
Marcin Wielgus
9116e4c08c
Compilation fix for CA after godeps update
2017-08-11 17:56:47 +02:00
Aleksandra Malinowska
c159a90f04
rename test provider package
2017-07-06 16:23:15 +02:00
Marcin Wielgus
fc43808149
Godeps bump for CA
2017-07-03 22:05:11 +02:00
Marcin Wielgus
1bedee5707
Update GODEPS
2017-06-13 14:48:24 +02:00
Marcin Wielgus
69c77791a2
Fix error types
2017-06-12 21:26:50 +02:00
Marcin Wielgus
0fd87aeca7
Merge pull request #100 from aleksandra-malinowska/evict-kube-system-pods
...
Allow eviction of kube-system pods with PDB
2017-06-02 10:33:07 -07:00
Maciej Pytel
7c327a951f
Function to balance scale-up between node groups
2017-06-02 15:53:50 +02:00