45 Commits

Author SHA1 Message Date
MaciekPytel c41dc43704
Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes
Use bytes instead of MB for memory limits
2018-06-08 14:47:22 +02:00
Krzysztof Jastrzebski 6761d7f354 Execute predicates only for similar pods. 2018-05-29 09:36:11 +02:00
Karol Gołąb 4c710950de Move ClusterStateRegistry to StaticAutoscaler
AutoscalingContext is basically a configuration plus a few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Aleksandra Malinowska fcc3d004f5 Use bytes instead of MB for memory limits 2018-05-17 17:35:39 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska feb4ad9e14 Add utility for limiting logging 2018-03-22 12:57:22 +01:00
Marcin Wielgus 04bec08e84 Compilation fix 2018-03-20 20:11:36 +01:00
Aleksandra Malinowska 4c594db7f8 Run spellchecker 2018-03-15 15:47:49 +01:00
Maciej Pytel abbc45da2e Delay scale-up including GPU request
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make a more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Aleksandra Malinowska 9cc322a61d Disable checking inter pod affinity predicate if only preferred or node affinity used 2018-02-14 14:40:02 +01:00
Aleksandra Malinowska 3894ecb470 Export unregistered node count metric 2018-01-16 16:56:40 +01:00
Aleksandra Malinowska 1b728d411b Publish status and metrics for empty cluster 2018-01-16 16:07:29 +01:00
Aleksandra Malinowska 3d33b64599 Export long unregistered node count metric 2018-01-16 16:07:24 +01:00
Marcin Wielgus 15b10c8f67 Skip iteration if pending pods are too new 2017-12-28 16:55:44 +01:00
Marcin Wielgus f8c0e20ad9 Source fix after godep update 2017-11-28 14:01:43 +01:00
Aleksandra Malinowska 2ff962e53e Remove --unregistered-node-removal-time flag 2017-11-15 11:11:30 +01:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Maciej Pytel 098ebbee09 Log event when removing unregistered node 2017-09-22 22:48:07 +02:00
Maciej Pytel 5e05c84cf0 Add metric counting failed scale-ups
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Marcin Wielgus f04113d746 Remove TargetSize() from loops iterating over nodes 2017-09-13 22:33:17 +02:00
Aleksandra Malinowska 197b05b180 respect minimum cores/memory limit during scale down 2017-09-13 10:10:47 +02:00
Clayton Coleman e84807e828
Do not include ToBeDeleted taint when constructing a template
This results in the simulator being unable to place candidate pods
because the taint blocks all scheduling.
2017-09-10 22:31:39 -04:00
Maciej Pytel 69c5ea03ce Disable MatchInterPodAffinity if there are no pods using affinity 2017-08-30 16:18:31 +02:00
Aleksandra Malinowska ac0d8388bc use OwnerReferences instead of deprecated created by annotation 2017-08-29 17:26:38 +02:00
Maciej Pytel 2f6dd8aefc Skip nodes in min-sized groups in scale-down simulation
Currently we track whether those nodes can be removed and only
skip them at the execution step. Since checking whether a node is
unneeded is pretty expensive, it's better to filter them out
early.
2017-08-28 15:48:41 +02:00
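The early filtering described in the commit above could look roughly like the following Go sketch. It is not the actual cluster-autoscaler code: the `NodeGroup` and `Node` types and the `FilterScaleDownCandidates` function are hypothetical stand-ins, and treating nodes without a group as non-candidates is an assumption made for this illustration.

```go
package main

import "fmt"

// NodeGroup is a hypothetical, simplified stand-in for a cloud provider node group.
type NodeGroup struct {
	Name       string
	MinSize    int
	TargetSize int
}

// Node is a hypothetical, simplified node; Group is nil for unmanaged nodes.
type Node struct {
	Name  string
	Group *NodeGroup
}

// FilterScaleDownCandidates drops nodes that could never be removed before the
// expensive "is this node unneeded" simulation runs: nodes without a group
// (an assumption of this sketch) and nodes whose group is already at its
// minimum size.
func FilterScaleDownCandidates(nodes []Node) []Node {
	candidates := make([]Node, 0, len(nodes))
	for _, n := range nodes {
		if n.Group == nil {
			continue // unmanaged node, nothing to scale down
		}
		if n.Group.TargetSize <= n.Group.MinSize {
			continue // group is min-sized, removal would be rejected later anyway
		}
		candidates = append(candidates, n)
	}
	return candidates
}

func main() {
	minSized := &NodeGroup{Name: "pool-a", MinSize: 1, TargetSize: 1}
	roomy := &NodeGroup{Name: "pool-b", MinSize: 1, TargetSize: 3}
	nodes := []Node{
		{Name: "a-1", Group: minSized},
		{Name: "b-1", Group: roomy},
		{Name: "standalone", Group: nil},
	}
	for _, n := range FilterScaleDownCandidates(nodes) {
		fmt.Println("simulate removal of", n.Name)
	}
}
```

Filtering first keeps the cost of the simulation proportional to the number of nodes that can actually be removed.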
Maciej Pytel d2faf11482 Re-use results for similar pods in FilterOutSchedulable 2017-08-21 16:32:14 +02:00
Maciej Pytel 95b5b4be94 Remove --verify-unschedulable-pods flag
This flag was true in default setups for every platform, and
we haven't heard of any user changing it to false. After
removing the check on the PodScheduled condition, setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel ef1241b3c6 Remove checking and resetting PodSchedulable condition
The performance cost was too high and the pods should
be filtered out by follow up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Aleksandra Malinowska d9eed646f1 add taints to GCE node template 2017-07-11 16:05:30 +02:00
Marcin Wielgus 7cbf295b7f Merge pull request #161 from mwielgus/godeps-020717
Godeps bump for CA
2017-07-04 11:41:00 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Maciej Pytel 39dfced56b Strip rescheduler taint from node templates 2017-07-03 14:57:17 +02:00
Marcin Wielgus 2cd532ebfe Don't calculate utilization and run scale down simulations for unmanaged nodes 2017-06-20 16:57:30 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Maciej Pytel 7f5c7ed3a2 Used typed errors in scale up code
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Marcin Wielgus ea7bd81681 Prefer using ready nodes and cloudprovider template nodes over unready/unschedulable nodes in scale-up 2017-05-16 13:06:19 +02:00
Marcin Wielgus d9bf5aacd7 Use TemplateNodeInfo in scale up 2017-05-16 11:45:05 +02:00
Marcin Wielgus 6f5d52e3a7 Overwrite pod.spec.nodename and node.name in template nodes for scale up 2017-04-28 17:57:02 +02:00
Marcin Wielgus e1c89f8fe2 Override hostname label when building a template node 2017-04-27 17:17:01 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Maciej Pytel 10d560dae6 Cluster-Autoscaler: handle nil node group
In a few places we assumed it's non-nil, leading
to segfaults.
2017-03-13 14:46:11 +01:00
Yusuke Kuoka baee799524 cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.

The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.

Example usage:

```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler
```

The configmap would look like:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
```

Other notes:

* Make the namespace default to "kube-system", according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267

* Trigger a full recreate on a configuration change, according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410

* Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of the autoscaler there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663)

* Extracted the core of CA (= `func Run()` in `main.go`) into `Autoscaler`

* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.

* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`)

* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
2017-02-24 20:36:47 +09:00
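
The reconfiguration flow described in the commit message above (read the `settings` JSON, then recreate the autoscaler when it changes) can be sketched in Go as follows. This is a minimal illustration, not the code from the commit: only the names `Autoscaler` and `DynamicAutoscaler` come from the message, and their shapes here are simplified. `NodeGroupSpec`, `Settings`, `ParseSettings`, `Reconfigure` and the validation rules are assumptions. Reading the ConfigMap from the API server is left out; the sketch takes the settings string as input.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeGroupSpec mirrors one entry of "nodeGroups" in the example ConfigMap.
type NodeGroupSpec struct {
	Name    string `json:"name"`
	MinSize int    `json:"minSize"`
	MaxSize int    `json:"maxSize"`
}

// Settings is the JSON document stored under the ConfigMap's "settings" key.
type Settings struct {
	NodeGroups []NodeGroupSpec `json:"nodeGroups"`
}

// ParseSettings decodes the settings JSON and applies basic sanity checks
// (the checks are an assumption of this sketch).
func ParseSettings(raw string) (Settings, error) {
	var s Settings
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return Settings{}, fmt.Errorf("decoding settings: %w", err)
	}
	for _, g := range s.NodeGroups {
		if g.Name == "" {
			return Settings{}, fmt.Errorf("node group without a name")
		}
		if g.MinSize < 0 || g.MaxSize < g.MinSize {
			return Settings{}, fmt.Errorf("node group %q: invalid size range [%d, %d]",
				g.Name, g.MinSize, g.MaxSize)
		}
	}
	return s, nil
}

// Autoscaler is a simplified stand-in for the autoscaler core.
type Autoscaler struct {
	Settings Settings
}

// DynamicAutoscaler wraps an Autoscaler and rebuilds it when settings change.
type DynamicAutoscaler struct {
	lastRaw   string
	current   *Autoscaler
	Recreated int // how many times an Autoscaler has been built
}

// Reconfigure is meant to be called once per scan interval with the latest
// settings string. It reports whether a new Autoscaler was created. On invalid
// input the previous Autoscaler is kept.
func (d *DynamicAutoscaler) Reconfigure(raw string) (bool, error) {
	if d.current != nil && raw == d.lastRaw {
		return false, nil // nothing changed since the last poll
	}
	s, err := ParseSettings(raw)
	if err != nil {
		return false, err
	}
	d.current = &Autoscaler{Settings: s}
	d.lastRaw = raw
	d.Recreated++
	return true, nil
}

func main() {
	const v1 = `{"nodeGroups":[{"minSize":1,"maxSize":2,"name":"kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"}]}`
	const v2 = `{"nodeGroups":[{"minSize":1,"maxSize":5,"name":"kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"}]}`

	d := &DynamicAutoscaler{}
	// Simulate three polls: initial load, unchanged settings, changed settings.
	for i, raw := range []string{v1, v1, v2} {
		changed, err := d.Reconfigure(raw)
		fmt.Printf("poll %d: recreated=%v err=%v\n", i, changed, err)
	}
}
```

Comparing the raw string is the simplest possible change detection. A real implementation might compare a resource version or the parsed structure instead.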