Commit Graph

208 Commits

Author SHA1 Message Date
Clayton Coleman 6146e0dbc1
Autoscaler doesn't drain nodes that have terminal pods
Terminal pods (Succeeded or Failed phase) are drained by kubectl drain,
and autoscaler should also drain them.
2018-05-28 22:35:37 -04:00
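The rule described in this commit (pods in the Succeeded or Failed phase should not stop a node from being drained) can be sketched roughly as below. This is a minimal illustration, not the autoscaler's actual code: the `Pod` struct, its `Replicated` field, and the `isTerminal` and `blockingPods` helpers are all hypothetical stand-ins, and the real drain logic checks many more conditions.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	Name       string
	Phase      string // "Pending", "Running", "Succeeded", "Failed" or "Unknown"
	Replicated bool   // assumption: true if a controller would recreate the pod elsewhere
}

// isTerminal reports whether the pod has finished and will not run again.
func isTerminal(p Pod) bool {
	return p.Phase == "Succeeded" || p.Phase == "Failed"
}

// blockingPods returns the pods that would prevent a node from being drained.
// Terminal pods never block. In this simplified model the only other blocker
// is a pod that no controller would recreate.
func blockingPods(pods []Pod) []Pod {
	var blocking []Pod
	for _, p := range pods {
		if isTerminal(p) {
			continue // finished pods can simply be removed with the node
		}
		if !p.Replicated {
			blocking = append(blocking, p)
		}
	}
	return blocking
}

func main() {
	pods := []Pod{
		{Name: "job-done", Phase: "Succeeded"},
		{Name: "crashed", Phase: "Failed"},
		{Name: "web-1", Phase: "Running", Replicated: true},
	}
	fmt.Println("blocking pods:", len(blockingPods(pods)))
}
```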
Karol Gołąb bada827839 Simplify the code by removing superfluous variable 2018-05-18 09:38:47 +02:00
Aleksandra Malinowska 3ccfa5be23 Move universal constants to separate module 2018-05-17 18:36:43 +02:00
Łukasz Osipiuk c406da4174 Support gpus in nodes and pods definitions in UT 2018-05-15 22:43:31 +02:00
Aleksandra Malinowska b2ad790121
Merge pull request #830 from aleksandra-malinowska/stateful-set-drain
Add support for rescheduled pods with the same name in drain
2018-05-11 13:47:53 +02:00
Aleksandra Malinowska 44ba1c719f Fix log message 2018-05-11 12:58:32 +02:00
Karol Gołąb f877f5a64e Remove unused error handling 2018-05-10 12:15:42 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska 7e1353a865 Ignore TPU resource in simulations 2018-04-11 12:26:22 +02:00
AdamDang d4ba9120e3
correct the returned message in ready.go
feadiness->readiness
2018-04-08 23:00:02 +08:00
Aleksandra Malinowska 8ae3636ccf Fix method name 2018-03-28 13:23:38 +02:00
Aleksandra Malinowska feb4ad9e14 Add utility for limiting logging 2018-03-22 12:57:22 +01:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
AdamDang 5c4693f95f
Typo fix "typ"->"type"
line 31 and line 82: "the typ of AutoscalerError"
here should be type
2018-03-13 19:50:19 +08:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make a more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Maciej Pytel d876d74912 Ignore unfitness in price expander if using GPU 2018-03-02 15:50:43 +01:00
Maciej Pytel b7f8622eb2 Create node groups with GPU in scale-up.go
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Maciej Pytel 6554919700 Helper function to calculate GPU requests for NAP 2017-12-11 13:12:22 +01:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Marcin Wielgus 26960b49df
Merge pull request #460 from sergeylanzman/replace-depricate-func
Replace deprecated Kubernetes client functions
2017-11-22 15:58:01 +01:00
Marcin Wielgus ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel d81dca5991 Mark nodes with uninitialized GPUs as unready 2017-11-10 17:56:10 +01:00
Sergey Lanzman eb546b87a0 Replace deprecated Kubernetes client functions 2017-11-09 19:49:41 +02:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
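The workaround in this commit, treating a node's GPU capacity as already allocatable, can be sketched as follows. This is a simplified illustration under stated assumptions: the `Node` struct, the `gpuResource` name, and the `withGPUAllocatableFromCapacity` helper are hypothetical and use plain maps in place of the Kubernetes resource types.

```go
package main

import "fmt"

// gpuResource is a placeholder resource name chosen for this sketch.
const gpuResource = "example.com/gpu"

// Node is a hypothetical, minimal view of a node's resource status.
type Node struct {
	Capacity    map[string]int64
	Allocatable map[string]int64
}

// withGPUAllocatableFromCapacity returns a copy of the node in which the GPU
// allocatable value is set to the GPU capacity. Nodes without GPU capacity
// are returned unchanged. The input node's maps are never modified.
func withGPUAllocatableFromCapacity(n Node) Node {
	gpus, ok := n.Capacity[gpuResource]
	if !ok || gpus == 0 {
		return n
	}
	alloc := make(map[string]int64, len(n.Allocatable)+1)
	for name, qty := range n.Allocatable {
		alloc[name] = qty
	}
	alloc[gpuResource] = gpus
	return Node{Capacity: n.Capacity, Allocatable: alloc}
}

func main() {
	// A freshly registered GPU node: capacity is reported, allocatable is not yet.
	node := Node{
		Capacity:    map[string]int64{"cpu": 8, gpuResource: 2},
		Allocatable: map[string]int64{"cpu": 8, gpuResource: 0},
	}
	patched := withGPUAllocatableFromCapacity(node)
	fmt.Println("allocatable GPUs used in simulation:", patched.Allocatable[gpuResource])
}
```

Working on a copy keeps the adjustment local to scheduling simulation, so the node object observed from the cluster is left as reported.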
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
Henrique Rodrigues 56135db3b0 Annotation which indicates that a pod is safe to evict despite other constraints 2017-10-26 09:29:50 -02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Aleksandra Malinowska 7e36ea61c0 Keep graceful termination timeout consistent 2017-09-21 12:54:11 +02:00
Krzysztof Jastrzebski 6b8b8b8fe1 Cloudprovider/gce/gce_manager.go unit tests. 2017-09-19 11:16:08 +02:00
Krzysztof Jastrzebski 0aec68a46d Core/static_autoscaler.go unit tests. Current time usage refactoring. 2017-09-11 15:07:21 +02:00
Aleksandra Malinowska d43029c180 implement blocking scale up beyond max cores & memory 2017-09-08 12:50:00 +02:00
Sergey Lanzman 437a3f60e1 Small optimize code 2017-09-04 23:50:45 +03:00
Sergey Lanzman 415f53cdea Change from deprecated Core to CoreV1 for kube client 2017-09-04 22:16:21 +03:00
Marcin Wielgus 2d8f59e23d Set verbosity for each of the glog.Info logs 2017-09-01 12:34:29 +02:00
Marcin Wielgus fbf0d6f499 Merge pull request #271 from aleksandra-malinowska/creator-ref
Use OwnerReferences in place of deprecated created by annotation
2017-08-30 04:21:58 +05:30
Aleksandra Malinowska ac0d8388bc use OwnerReferences instead of deprecated created by annotation 2017-08-29 17:26:38 +02:00
Maciej Pytel 281afa7147 precompute predicateMetadata in scale-down 2017-08-29 16:29:45 +02:00
Maciej Pytel fb6ef75d12 Don't create verbose errors in predicates if we ignore them
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Marcin Wielgus 33f3fcdef9 NAP - pick best labels for pods 2017-08-17 10:47:15 +02:00
Marcin Wielgus b8c1fc2b01 Fix listers in CA after godep update 2017-08-14 00:14:31 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Aleksandra Malinowska c159a90f04 rename test provider package 2017-07-06 16:23:15 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Marcin Wielgus 0fd87aeca7 Merge pull request #100 from aleksandra-malinowska/evict-kube-system-pods
Add allowing eviction of kube-system pods with PDB
2017-06-02 10:33:07 -07:00
Maciej Pytel 7c327a951f Function to balance scale-up between node groups 2017-06-02 15:53:50 +02:00
Aleksandra Malinowska 8ca8b24d3d Add allowing eviction of kube-system pods with PDB 2017-06-01 18:24:42 +02:00
Maciej Pytel 95dc2c53e1 Function to find similar nodegroups 2017-06-01 12:20:11 +02:00
Maciej Pytel 849b3a2712 Function to compare nodeinfos to find similar nodegroups 2017-05-31 13:21:27 +02:00
Marcin Wielgus e9ebfb1c35 Preferred node in price-preferred expander 2017-05-30 20:33:10 +02:00
Marcin Wielgus 80bf191f02 GCE pricing model 2017-05-26 17:37:32 +02:00
Maciej Pytel 7f5c7ed3a2 Used typed errors in scale up code
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel f716a7e496 Add typed errors; add errors_total metric
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Matthew Walter f64a73429a Correct typos for `deamon` -> `daemon` as in `DaemonSet` 2017-05-12 13:27:40 -04:00
Marcin Wielgus f015ef1853 Merge pull request #62 from mwielgus/zero-4
Daemonset helper function that returns pods that DaemonSet controller would start on the given node.
2017-05-12 11:59:04 +02:00
Marcin Wielgus 6d578132b9 Daemonset helper functions 2017-05-12 11:37:11 +02:00
Marcin Wielgus 0a0129f511 Daemonset listers 2017-05-11 12:30:27 +02:00
Yusuke Kuoka 5304e9af21 cluster-autoscaler: Fix typos in comments 2017-05-10 11:22:15 +09:00
Maciej Pytel 6b2ea76973 Added UT for CA simulator 2017-04-19 19:12:30 +02:00
Maciej Pytel 4d40222b63 Fix gofmt 2017-04-18 16:45:27 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Maciej Pytel 1590789292 Cluster-Autoscaler: "unknown" readiness -> unready 2017-03-15 11:16:17 +01:00
Maciej Pytel 0379a73828 Cluster-Autoscaler: fix delete taint failing 2017-03-10 12:02:52 +01:00
Kubernetes Submit Queue b171566401 Merge pull request https://github.com/kubernetes/contrib/pull/2461 from mwielgus/lister-fix
Automatic merge from submit-queue

Cluster-autoscaler: ready node lister fix

cc: @MaciekPytel @jszczepkowski
2017-03-09 08:32:45 -08:00
Marcin Wielgus 95bad10311 Cluster-autoscaler: ready node lister fix 2017-03-09 19:18:49 +03:00
Kubernetes Submit Queue 7fcab2d18e Merge pull request https://github.com/kubernetes/contrib/pull/2460 from mwielgus/pdb-typo
Automatic merge from submit-queue

Cluster-autoscaler: fix typo in pdb listener

cc: @MaciekPytel @jszczepkowski
2017-03-09 07:53:12 -08:00
Marcin Wielgus 10f848b049 Cluster-autoscaler: fix typo in pdb listener 2017-03-09 18:38:50 +03:00
Maciej Pytel d305a0021a Cluster-Autoscaler: fix delete taint value format 2017-03-09 15:24:52 +01:00
Marcin Wielgus 27b797f541 Cluster-Autoscaler: skip nodes currently under deletion in scale down 2017-03-07 14:59:15 +01:00
Marcin Wielgus 5b4441083a Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2 2017-03-06 17:15:04 +01:00
Marcin Wielgus 2ffaddb7c0 Cluster-autoscaler: lint 2017-03-02 15:15:07 +01:00
Marcin Wielgus 72a47dc2b2 Cluster-autoscaler: update code for 1.6 k8s sync 2017-03-02 14:34:49 +01:00
fate-grand-order 82e148507f fix misspell "being" in drain.go 2017-02-28 18:20:06 +08:00
Yusuke Kuoka baee799524 cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.

The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.

Example usage:

```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler
```

The configmap would look like:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
```

Other notes:

* Make the namespace default to "kube-system", according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267

* Trigger a full recreate on a configuration change, according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410

* Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of autoscaler into there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663)

* Extracted the core of CA (`func Run()` in `main.go`) into `Autoscaler`

* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.

* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`)

* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
2017-02-24 20:36:47 +09:00
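The `settings` value in the ConfigMap shown in this commit is a JSON document, so reading it could look roughly like the sketch below. The JSON field names come from the example above. The `Settings` and `NodeGroupSpec` types, the `parseSettings` helper, and the validation rules are assumptions made for illustration and are not the commit's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeGroupSpec mirrors one entry of "nodeGroups" in the example ConfigMap.
type NodeGroupSpec struct {
	MinSize int    `json:"minSize"`
	MaxSize int    `json:"maxSize"`
	Name    string `json:"name"`
}

// Settings mirrors the top-level JSON object stored under the "settings" key.
type Settings struct {
	NodeGroups []NodeGroupSpec `json:"nodeGroups"`
}

// parseSettings decodes the raw "settings" string and applies basic sanity
// checks (hypothetical rules: non-empty name, 0 <= minSize <= maxSize).
func parseSettings(raw string) (Settings, error) {
	var s Settings
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return Settings{}, fmt.Errorf("parsing settings: %w", err)
	}
	for _, g := range s.NodeGroups {
		if g.Name == "" || g.MinSize < 0 || g.MaxSize < g.MinSize {
			return Settings{}, fmt.Errorf("invalid node group %q: minSize=%d maxSize=%d",
				g.Name, g.MinSize, g.MaxSize)
		}
	}
	return s, nil
}

func main() {
	raw := `{
	  "nodeGroups": [
	    {"minSize": 1, "maxSize": 2, "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"}
	  ]
	}`
	settings, err := parseSettings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range settings.NodeGroups {
		fmt.Printf("%s: min=%d max=%d\n", g.Name, g.MinSize, g.MaxSize)
	}
}
```

A polling loop could compare the newly read `settings` string with the previous one and rebuild the autoscaler only when they differ, which matches the full-recreate behavior noted in the commit message.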
Marcin Wielgus 89a370de1a Cluster-autoscaler: expand half-deleted pod skip logic in drain 2017-02-23 16:43:32 +01:00
Marcin Wielgus e3395fdae7 Cluster-autoscaler: skip pods that are being deleted in node drain 2017-02-22 14:09:39 +01:00
Marcin Wielgus ce45c33d29 Cluster-autoscaler: update CA code for godep refresh 2017-01-20 14:46:34 +01:00
Marcin Wielgus b57ab3b48a Cluster-autoscaler: add NodeReadyPredicate and allow unready nodes in CA 2017-01-18 15:09:59 +01:00
Marcin Wielgus 5095b12f27 Cluster-autoscaler: add stop channel to listers 2017-01-05 12:57:14 +01:00
Marcin Wielgus 36cdafad45 Cluster-autoscaler: check if cluster is healthy and add new node lister 2017-01-05 11:54:10 +01:00
Marcin Wielgus 949cf37465 Cluster-autoscaler: support unready nodes in scale down 2017-01-03 14:17:59 +01:00
Marcin Wielgus 4023750b98 Cluster-autoscaler: move ToBeDeleted taint functions to utils 2016-12-30 14:14:37 +01:00
Marcin Wielgus 5b3f67d9e1 Cluster-autoscaler: add self link to BuildTestPod/Node functions 2016-12-23 23:43:56 +01:00
Marcin Wielgus befd959c0e Cluster autoscaler: more unit tests for scale down 2016-12-22 19:48:16 +01:00
Marcin Wielgus d7a9a70bbf Cluster-autoscaler: add stateful set support in drain. 2016-12-21 12:11:32 +01:00
Marcin Wielgus 7b63b6c1f1 Cluster-autoscaler: update code to compile with K8S 1.5 2016-12-13 17:22:57 +01:00
Kubernetes Submit Queue 9f3900f466 Merge pull request https://github.com/kubernetes/contrib/pull/1909 from mwielgus/own_drain
Automatic merge from submit-queue

Cluster-autoscaler: own drain

This PR adds its own drain logic to ClusterAutoscaler. Previously we were using the code from the kubectl drain command, but:
* due to refactorings it was hard to update the CA dependencies.
* we had to add some extra logic on top of it in fast drain evaluation.
* a recent feature request to check the number of replicas in rs/rc was impossible to fulfill.

cc: @fgrzadkowski @davidopp @piosz
2016-10-27 08:31:11 -07:00
Marcin Wielgus df078c9101 Cluster-autoscaler: own drain 2016-10-27 16:58:59 +02:00
Marcin Wielgus 2560c686d5 Cluster-autoscaler: allow unschedulable nodes 2016-10-27 12:45:43 +02:00
Jan Chaloupka e028312170 Remove "All rights reserved" from all the headers 2016-09-08 13:02:39 +02:00
Piotr Szczesniak 2aaf07486c Initial implementation of rescheduler 2016-07-22 13:59:23 +02:00
Piotr Szczesniak 1275f7bfac [CA] Moved listers to util package 2016-07-22 10:36:31 +02:00
Marcin Wielgus 50f57321ff Cluster-autoscaler: use cloud provider interface in the code 2016-07-11 16:40:03 +02:00
Marcin Wielgus 1655d87caa Merge pull request https://github.com/kubernetes/contrib/pull/1282 from mwielgus/mig-node
Cluster-autoscaler: fix for multi-mig autoscaling
2016-06-28 21:15:40 +02:00
Marcin Wielgus 25fd38ccb0 Cluster-autoscaler: fix for multi-mig autoscaling 2016-06-28 21:00:44 +02:00
Piotr Szczesniak df7d010861 Use default token when tokenUrl not specified 2016-06-15 14:28:33 +02:00
Piotr Szczesniak a522004066 Added support for reading GCE token from config file 2016-05-31 12:14:08 +02:00
Piotr Szczesniak bdb8987db6 Implemented unit test for FilterOutSchedulable function 2016-05-24 20:06:29 +02:00
Marcin Wielgus e6c3b766a9 Cluster autoscaler: Fix zone for operation 2016-05-06 13:33:51 +02:00
Marcin Wielgus e5b5aa3912 Cluster-autoscaler: mig node fix 2016-05-06 12:37:35 +02:00
Marcin Wielgus 2c16e8a407 Cluster-autoscaler: relax url format 2016-05-05 16:06:52 +02:00
Marcin Wielgus cb9f8b493e Cluster-autoscaler: check node/mig state 2016-04-22 19:50:12 +02:00
Piotr Szczesniak 28d7b61cf9 Cluster-autoscaler: Implemented waiting for operation in GCE Manager 2016-04-22 14:12:08 +02:00
Piotr Szczesniak c3583b3047 Implemented operations on MIGs in GCE 2016-04-22 10:05:46 +02:00