Commit Graph

174 Commits

Author SHA1 Message Date
Maciej Pytel d2faf11482 Re-use results for similar pods in FilterOutSchedulable 2017-08-21 16:32:14 +02:00
Beata Skiba 14df1b808b Drill down scale down metrics
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
Maciej Pytel 95b5b4be94 Remove --verify-unschedulabe-pods flag
This flag was true in default setups for every platform,
we haven't heard about any user changing it to false and
after removing check on PodScheduled condition setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel ef1241b3c6 Remove checking and resetting PodSchedulable condition
The performance cost was too high and the pods should
be filtered out by follow up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus 998b3f1acd Merge pull request #198 from MaciekPytel/support_zone_failures
Backoff for node group after failed scale-up
2017-08-16 20:46:45 +05:30
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Marcin Wielgus 4580e1dc45 Fix getEmptyNodes function in CA 2017-08-07 22:21:41 +02:00
Maciej Pytel 6aacbb5bf7 Backoff for node group after failed scale-up 2017-08-04 15:40:23 +02:00
Ivan Towlson 902d2414b7 Fixed typoes of name 'Kubernetes' 2017-08-03 14:20:23 +12:00
Marcin Wielgus 55d750196c Add a flag to turn off pod status condition reseting for performance tests 2017-07-24 15:53:45 +02:00
Aleksandra Malinowska ab8323e8dc fix some logs in scale down 2017-07-20 10:33:42 +02:00
Aleksandra Malinowska 2de8ccc8e1 Change scope of scaleUp metric 2017-07-18 12:17:51 +02:00
Hanfei Shen 2dff7466f8 fix typo for logging 2017-07-14 13:14:27 +08:00
MaciekPytel 2ac2535a48 Merge pull request #169 from aleksandra-malinowska/test-provider-package-name
Rename testprovider package
2017-07-13 12:20:30 +02:00
fate-grand-order 5b230a45ee correct some misspells for cluster-autoscaler/core 2017-07-13 17:53:59 +08:00
Aleksandra Malinowska d9eed646f1 add taints to GCE node template 2017-07-11 16:05:30 +02:00
Aleksandra Malinowska aa1771107e change scope of findUnneeded metric 2017-07-07 16:30:59 +02:00
Aleksandra Malinowska c159a90f04 rename test provider package 2017-07-06 16:23:15 +02:00
Aleksandra Malinowska 9f54934229 add annotation 2017-07-06 14:47:32 +02:00
Marcin Wielgus 7cbf295b7f Merge pull request #161 from mwielgus/godeps-020717
Godeps bump for CA
2017-07-04 11:41:00 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Maciej Pytel 39dfced56b Strip rescheduler taint from node templates 2017-07-03 14:57:17 +02:00
Yusuke Kuoka 7697d5345a cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled
By fixing CA not to reset `StaticAutoscaler` state before each iteration so that it remembers last scale-up/down time which is used to throttle scale-down, which is causing the issue.
2017-06-22 10:25:37 +09:00
Marcin Wielgus 2cd532ebfe Don't calculate utilization and run scale down simulations for unmanaged nodes 2017-06-20 16:57:30 +02:00
Marcin Wielgus 63e679a74f Merge pull request #120 from MaciekPytel/fix_graceful_flag
Fix typos related to max-graceful-termination-sec
2017-06-14 14:42:35 +02:00
Maciej Pytel 767367c866 Fix typos related to max-graceful-termination-sec 2017-06-14 14:14:21 +02:00
Maciej Pytel fe514ed75d Make status configmap respect namespace parameter 2017-06-14 14:07:13 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Marcin Wielgus e2e171b7b7 Enable pricing in expander factory 2017-06-09 11:09:43 -07:00
Marcin Wielgus be0d16a57f Move Autoscaler Builder to a new file 2017-06-09 10:02:44 -07:00
Maciej Pytel cd186f3ebc Balance sizes of similar nodegroups in scale-up 2017-06-06 00:52:38 +02:00
Maciej Pytel 58cdfa1702 Updated log levels in main loop 2017-05-18 14:09:15 +02:00
Maciej Pytel 3f8ca51768 Use typed errors in scale down 2017-05-18 14:09:15 +02:00
Maciej Pytel 7f5c7ed3a2 Used typed errors in scale up code
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel f716a7e496 Add typed errors; add errors_total metric
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Marcin Wielgus ea7bd81681 Prefer using ready nodes and cloudprovider template nodes over unready/unschedulable nodes in scale-up 2017-05-16 13:06:19 +02:00
Marcin Wielgus d9bf5aacd7 Use TemplateNodeInfo in scale up 2017-05-16 11:45:05 +02:00
Maciej Pytel 7a21a68b56 Add metrics counting CA operations 2017-05-15 13:03:00 +02:00
Maciej Pytel 4cdf06ea94 Added CA metrics related to autoscaler execution 2017-05-11 14:51:04 +02:00
Maciej Pytel 83ef3d2be3 Added CA metrics related to cluster state 2017-05-11 13:54:04 +02:00
Marcin Wielgus 0a0129f511 Daemonset listers 2017-05-11 12:30:27 +02:00
Marcin Wielgus 30cb7a52e5 Merge pull request #11 from mumoshu/node-group-auto-discovery-with-asg-tag
cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
2017-05-10 11:07:58 +02:00
Yusuke Kuoka 5304e9af21 cluster-autoscaler: Fix typos in comments 2017-05-10 11:22:15 +09:00
Yusuke Kuoka e9c7cd0733 cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
This is an alternative implementation of https://github.com/kubernetes/contrib/pull/1982

Notable differences from the original PR are:

* A new flag named `--node-group-auto-discovery` is introduced for opting in to enable the auto-discovery feature.
  * For example, specifying `--cloud-provider aws --node-group-auto-discovery asg:tag=k8s.io/cluster-autoscaler/enabled` instructs CA to auto-discover ASGs tagged with `k8s.io/cluster-autoscaler/enabled` to be used as target node groups
* The new code path introduced by this PR is executed only when `node-group-auto-discovery` is specified. There is relatively less chance to break existing features by introducing this change

Resolves https://github.com/kubernetes/contrib/issues/1956

---

Other notes:

* We rely mainly on the `DescribeTags` API rather than `DescribeAutoScalingGroups` so that AWS can filter out unnecessary ASGs which doesn't belong to the k8s cluster, for us.
  * If we relied on `DescribeAutoScalingGroups` here, as it doesn't support `Filter`ing, we'd need to iterate over ALL the ASGs available in an AWS account, which isn't desirable due to unnecessary excessive API calls and network usages

* Update cloudprovider/aws/README for the new configuration

* Warn abount invalid combination of flags
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113713138

* Emit a validation error when both --nodes and --node-group-auto-discovery are specified
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113958080

TODO/Possible future improvements before recommending this to everyone:

* Cache the result of an auto-discovery for a configurable period, so that we won't invoke DescribeTags and DescribeAutoScalingGroup APIs too many times
2017-05-10 08:36:02 +09:00
Marcin Wielgus 42c177b68f Add deletion safety margin to node drain 2017-05-08 11:47:33 +02:00
Marcin Wielgus 6f5d52e3a7 Overwrite pod.spec.nodename and node.name in template nodes for scale up 2017-04-28 17:57:02 +02:00
Marcin Wielgus 6bafa2a940 Merge pull request #25 from mwielgus/label-fix
Override hostname label when building a template node
2017-04-27 17:25:43 +02:00
Marcin Wielgus e1c89f8fe2 Override hostname label when building a template node 2017-04-27 17:17:01 +02:00
Maciej Pytel 7e4212478a Fix error handling for updating node status 2017-04-25 17:34:23 +02:00
Maciej Pytel 6b2ea76973 Added UT for CA simulator 2017-04-19 19:12:30 +02:00
Maciej Pytel 4d40222b63 Fix gofmt 2017-04-18 16:45:27 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Maciej Pytel 0b74a3bd25 Cluster-Autoscaler: update event name 2017-04-10 14:03:21 +02:00
Marcin Wielgus eb3e6173d1 Cluster-autoscaler: Fix isNodeStarting 2017-03-27 23:27:14 +02:00
Maciej Pytel 72c885b800 Cluster-Autoscaler: reset scale-down on unready cluster 2017-03-22 17:17:59 +01:00
Maciej Pytel c71668a8d8 Cluster-Autoscaler: update status configmap on errors
Previously it would only update after successfully completing the main
loop, meaning the status wouldn't get updated unless cluster was
healthy.
2017-03-15 13:22:24 +01:00
Kubernetes Submit Queue ac5f7634d8 Merge pull request https://github.com/kubernetes/contrib/pull/2464 from MaciekPytel/ca_drain_evictions
Automatic merge from submit-queue

Cluster-Autoscaler: evict pods instead of deleting them

This should make CA respect PodDisruptionBudget.
2017-03-15 04:27:27 -07:00
Maciej Pytel 7d5488898c Cluster-autoscaler: fix NotTriggerScaleUp event
This should fix a failing e2e test
2017-03-14 14:54:36 +01:00
Maciej Pytel 10d560dae6 Cluster-Autoscaler: handle nil node group
In a few place we assumed it's not-nil, leading
to segfaults.
2017-03-13 14:46:11 +01:00
Maciej Pytel 39162f0860 Cluster-Autoscaler: evict pods instead of deleting them 2017-03-10 16:18:47 +01:00
Maciej Pytel 5d2c675c8e Cluster-Autoscaler: update scale down status 2017-03-08 11:51:20 +01:00
Marcin Wielgus 27b797f541 Cluster-Autoscaler: skip nodes currently under deletion in scale down 2017-03-07 14:59:15 +01:00
Kubernetes Submit Queue 39fa783ad7 Merge pull request https://github.com/kubernetes/contrib/pull/2451 from mwielgus/pdb-ca
Automatic merge from submit-queue

Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2

In part 1 or 2 we skip nodes that have a pod with 0 poddisruptionallowed. Part 2/2 will delete pods using evict.

cc: @jszczepkowski @MaciekPytel @davidopp @fgrzadkowski
2017-03-06 09:27:50 -08:00
Marcin Wielgus 5b4441083a Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2 2017-03-06 17:15:04 +01:00
Maciej Pytel d3bf5d3d51 Cluster-Autoscaler: log events on status configmap 2017-03-06 12:21:24 +01:00
Maciej Pytel 84f19c1e1e Cluster-Autoscaler: add map to disable status configmap 2017-03-02 15:35:00 +01:00
Marcin Wielgus 2ffaddb7c0 Cluster-autoscaler: lint 2017-03-02 15:15:07 +01:00
Marcin Wielgus 72a47dc2b2 Cluster-autoscaler: update code for 1.6 k8s sync 2017-03-02 14:34:49 +01:00
Maciej Pytel d0196c9e1b Cluster-Autoscaler: Delete status configmap on exit 2017-02-28 17:19:23 +01:00
Maciej Pytel 497d2800ea Cluster-Autoscaler: Write status to configmap 2017-02-28 09:59:40 +01:00
Maciej Pytel 637e750246 Cluster-Autoscaler: fix segfault
StaticAutoscaler.kubeClient was uninitialized,
leading to segfaults when trying to use it. It was
also a duplicate since the client is already available
through AutoscalingContext.
2017-02-27 14:13:54 +01:00
Marcin Wielgus 83fdeb184f Cluster-autoscaler: use listers from ListersRegistry 2017-02-24 20:40:53 +01:00
Yusuke Kuoka baee799524 cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.

The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.

Example usage:

```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler --logtostderr
```

The configmap would look like:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
 ```

Other notes:

* Make namespace defaults to "kube-system"
according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267

* Trigger a full-recreate on a configuration change

according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410

* Introduced `autoscaler/` and moved  all the dynamic/recreatable-at-runtime parts of autoscaler into there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663)

* Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler`

* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.

* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`)

* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
2017-02-24 20:36:47 +09:00