Commit Graph

172 Commits

Author SHA1 Message Date
Marcin Wielgus db63ac3a18 Merge pull request #324 from aleksandra-malinowska/scale-down-pod-not-found
Add checking for pod not found error on eviction
2017-09-11 15:10:08 +05:30
Clayton Coleman e84807e828
Do not include ToBeDeleted taint when constructing a template
This results in the simulator being unable to place candidate pods
because the taint blocks all scheduling.
2017-09-10 22:31:39 -04:00
Beata Skiba 1d10a14aa0 Merge pull request #318 from bskiba/fix-empty
Always add empty nodes to unneeded nodes
2017-09-08 16:31:19 +02:00
Beata Skiba 6e5784a519 Always add empty nodes to unneeded nodes 2017-09-08 15:55:18 +02:00
Aleksandra Malinowska fbc8462b10 Add checking for not found error 2017-09-08 15:45:44 +02:00
Aleksandra Malinowska d43029c180 implement blocking scale up beyond max cores & memory 2017-09-08 12:50:00 +02:00
Marcin Wielgus fc599bd08c Merge pull request #310 from krzysztof-jastrzebski/core-test
Core/utils.go unit tests
2017-09-07 17:15:58 +05:30
Krzysztof Jastrzebski 2295d9bcc4 Core/utils.go unit tests 2017-09-07 13:24:12 +02:00
Marcin Wielgus f9cabf3a1a Merge pull request #297 from bskiba/additional-k
Only consider up to 10% of the nodes as additional candidates for scale down
2017-09-07 04:34:23 +05:30
Marcin Wielgus e85e94510d Tests for add autoprovisioned node groups 2017-09-06 02:44:16 +02:00
Marcin Wielgus 1ad8d9e10c Build template NodeInfo for node autoprovisioning 2017-09-05 17:28:49 +02:00
Sergey Lanzman 437a3f60e1 Small optimize code 2017-09-04 23:50:45 +03:00
Sergey Lanzman 44195b39a2 Fix small typos 2017-09-04 22:18:07 +03:00
Sergey Lanzman 415f53cdea Change from deprecated Core to CoreV1 for kube client 2017-09-04 22:16:21 +03:00
Beata Skiba a6c18b87d2 Only consider up to 10% of the nodes as additional candidates for scale down. 2017-09-04 17:37:02 +02:00
Aleksandra Malinowska 7ae64de0af Merge pull request #291 from mwielgus/nap-cleanup
Clean up empty autoprovisioned node groups
2017-09-04 15:03:26 +02:00
Marcin Wielgus bcc8cded64 Clean up empty autoprovisioned node groups 2017-09-04 13:53:07 +02:00
Marcin Wielgus ae00f0544b Merge pull request #290 from mwielgus/max-nap-groups
Limit autoprovisioned groups to 15
2017-09-01 23:49:33 +05:30
Marcin Wielgus de524a6688 Limit autoprovisioned groups to 15 2017-09-01 18:25:28 +02:00
Maciej Pytel a440d92a60 Log event on scale-up timeout 2017-09-01 14:19:14 +02:00
Maciej Pytel a86268f114 Write event on scale-up failure 2017-09-01 13:34:20 +02:00
Marcin Wielgus c0b48e4a15 Merge pull request #285 from mwielgus/loglevel
Set verbosity for each of the glog.Info logs
2017-09-01 16:42:11 +05:30
Marcin Wielgus 021a2fdf5d Merge pull request #286 from mwielgus/exist-no-error
Do not return error from exist
2017-09-01 16:05:52 +05:30
Marcin Wielgus 2d8f59e23d Set verbosity for each of the glog.Info logs 2017-09-01 12:34:29 +02:00
Marcin Wielgus f217d4ac93 Do not return error from exist 2017-09-01 00:24:01 +02:00
Beata Skiba 576e4105db Make ScaleDownNonEmptyCandidatesCount a flag. 2017-08-31 15:05:06 +02:00
Beata Skiba 4560cc0a85 Keep maximum 30 candidates for scale down with drain 2017-08-31 14:58:40 +02:00
Marcin Wielgus e9261a249c Merge pull request #284 from mwielgus/nap-5
Node autoprovisioning in scale up
2017-08-31 17:47:25 +05:30
Marcin Wielgus 22f856d4da Small refactoring in ScaleUp 2017-08-31 13:21:20 +02:00
Marcin Wielgus 6b9e56f0f9 Node autoprovisioning in scale up 2017-08-31 01:33:52 +02:00
Marcin Wielgus 19507aa0de Node autoprovisioning flag 2017-08-31 00:48:54 +02:00
Maciej Pytel 69c5ea03ce Disable MatchInterPodAffinity if there are no pods using affinity 2017-08-30 16:18:31 +02:00
Marcin Wielgus fbf0d6f499 Merge pull request #271 from aleksandra-malinowska/creator-ref
Use OwnerReferences in place of deprecated created by annotation
2017-08-30 04:21:58 +05:30
Aleksandra Malinowska ac0d8388bc use OwnerReferences instead of deprecated created by annotation 2017-08-29 17:26:38 +02:00
Maciej Pytel 281afa7147 precompute predicateMetadata in scale-down 2017-08-29 16:29:45 +02:00
Marcin Wielgus 51a5ad58c0 GKE NodePool support for NAP - get NP/Migs via api - part 1 2017-08-28 20:50:02 +02:00
Marcin Wielgus 191d140107 Don't increase pod graceful termination 2017-08-28 16:54:19 +02:00
Marcin Wielgus 6ad7ca21e8 Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size
Skip nodes in min-sized groups in scale-down simulation
2017-08-28 19:40:53 +05:30
Marcin Wielgus 9e2c76551f Merge pull request #263 from mwielgus/delete-in-goroutine
Run node drain/delete in a separate goroutine
2017-08-28 19:39:57 +05:30
Maciej Pytel 2f6dd8aefc Skip nodes in min-sized groups in scale-down simulation
Currently we track if those nodes can be removed and only
skip them at the execution step. Since checking if node is
unneeded is pretty expensive it's better to filter them out
early.
2017-08-28 15:48:41 +02:00
Marcin Wielgus 718e5db78e Run node drain/delete in a separate goroutine 2017-08-28 12:12:31 +02:00
Marcin Wielgus 71b4ca5461 Dont block stale downs if no nodes can be removed 2017-08-26 16:29:50 +02:00
Maciej Pytel fa53e52ed9 Skip node in scale-down if it was recently found unremovable 2017-08-25 17:21:08 +02:00
Maciej Pytel fb6ef75d12 Don't create verbose errors in predicates if we ignore them
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Beata Skiba edeb522274 Add measuring of FilterOutSchedulable 2017-08-22 18:36:13 +02:00
Beata Skiba 2ae609b93a Merge pull request #237 from bskiba/split_scale_down
Drill down scale down metrics
2017-08-22 16:41:55 +02:00
Beata Skiba 43c9b6b06b Add cleaner function labels for metrics exporting. 2017-08-22 16:09:42 +02:00
Beata Skiba 44f69c6706 Extract deleting empty nodes to a separate function. 2017-08-22 16:09:42 +02:00
Maciej Pytel d2faf11482 Re-use results for similar pods in FilterOutSchedulable 2017-08-21 16:32:14 +02:00
Beata Skiba 14df1b808b Drill down scale down metrics
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
Maciej Pytel 95b5b4be94 Remove --verify-unschedulabe-pods flag
This flag was true in default setups for every platform,
we haven't heard about any user changing it to false and
after removing check on PodScheduled condition setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel ef1241b3c6 Remove checking and resetting PodSchedulable condition
The performance cost was too high and the pods should
be filtered out by follow up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus 998b3f1acd Merge pull request #198 from MaciekPytel/support_zone_failures
Backoff for node group after failed scale-up
2017-08-16 20:46:45 +05:30
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Marcin Wielgus 4580e1dc45 Fix getEmptyNodes function in CA 2017-08-07 22:21:41 +02:00
Maciej Pytel 6aacbb5bf7 Backoff for node group after failed scale-up 2017-08-04 15:40:23 +02:00
Ivan Towlson 902d2414b7 Fixed typoes of name 'Kubernetes' 2017-08-03 14:20:23 +12:00
Marcin Wielgus 55d750196c Add a flag to turn off pod status condition reseting for performance tests 2017-07-24 15:53:45 +02:00
Aleksandra Malinowska ab8323e8dc fix some logs in scale down 2017-07-20 10:33:42 +02:00
Aleksandra Malinowska 2de8ccc8e1 Change scope of scaleUp metric 2017-07-18 12:17:51 +02:00
Hanfei Shen 2dff7466f8 fix typo for logging 2017-07-14 13:14:27 +08:00
MaciekPytel 2ac2535a48 Merge pull request #169 from aleksandra-malinowska/test-provider-package-name
Rename testprovider package
2017-07-13 12:20:30 +02:00
fate-grand-order 5b230a45ee correct some misspells for cluster-autoscaler/core 2017-07-13 17:53:59 +08:00
Aleksandra Malinowska d9eed646f1 add taints to GCE node template 2017-07-11 16:05:30 +02:00
Aleksandra Malinowska aa1771107e change scope of findUnneeded metric 2017-07-07 16:30:59 +02:00
Aleksandra Malinowska c159a90f04 rename test provider package 2017-07-06 16:23:15 +02:00
Aleksandra Malinowska 9f54934229 add annotation 2017-07-06 14:47:32 +02:00
Marcin Wielgus 7cbf295b7f Merge pull request #161 from mwielgus/godeps-020717
Godeps bump for CA
2017-07-04 11:41:00 +02:00
Marcin Wielgus fc43808149 Godeps bump for CA 2017-07-03 22:05:11 +02:00
Maciej Pytel 39dfced56b Strip rescheduler taint from node templates 2017-07-03 14:57:17 +02:00
Yusuke Kuoka 7697d5345a cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled
By fixing CA not to reset `StaticAutoscaler` state before each iteration so that it remembers last scale-up/down time which is used to throttle scale-down, which is causing the issue.
2017-06-22 10:25:37 +09:00
Marcin Wielgus 2cd532ebfe Don't calculate utilization and run scale down simulations for unmanaged nodes 2017-06-20 16:57:30 +02:00
Marcin Wielgus 63e679a74f Merge pull request #120 from MaciekPytel/fix_graceful_flag
Fix typos related to max-graceful-termination-sec
2017-06-14 14:42:35 +02:00
Maciej Pytel 767367c866 Fix typos related to max-graceful-termination-sec 2017-06-14 14:14:21 +02:00
Maciej Pytel fe514ed75d Make status configmap respect namespace parameter 2017-06-14 14:07:13 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Marcin Wielgus 69c77791a2 Fix error types 2017-06-12 21:26:50 +02:00
Marcin Wielgus e2e171b7b7 Enable pricing in expander factory 2017-06-09 11:09:43 -07:00
Marcin Wielgus be0d16a57f Move Autoscaler Builder to a new file 2017-06-09 10:02:44 -07:00
Maciej Pytel cd186f3ebc Balance sizes of similar nodegroups in scale-up 2017-06-06 00:52:38 +02:00
Maciej Pytel 58cdfa1702 Updated log levels in main loop 2017-05-18 14:09:15 +02:00
Maciej Pytel 3f8ca51768 Use typed errors in scale down 2017-05-18 14:09:15 +02:00
Maciej Pytel 7f5c7ed3a2 Used typed errors in scale up code
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel f716a7e496 Add typed errors; add errors_total metric
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Marcin Wielgus ea7bd81681 Prefer using ready nodes and cloudprovider template nodes over unready/unschedulable nodes in scale-up 2017-05-16 13:06:19 +02:00
Marcin Wielgus d9bf5aacd7 Use TemplateNodeInfo in scale up 2017-05-16 11:45:05 +02:00
Maciej Pytel 7a21a68b56 Add metrics counting CA operations 2017-05-15 13:03:00 +02:00
Maciej Pytel 4cdf06ea94 Added CA metrics related to autoscaler execution 2017-05-11 14:51:04 +02:00
Maciej Pytel 83ef3d2be3 Added CA metrics related to cluster state 2017-05-11 13:54:04 +02:00
Marcin Wielgus 0a0129f511 Daemonset listers 2017-05-11 12:30:27 +02:00
Marcin Wielgus 30cb7a52e5 Merge pull request #11 from mumoshu/node-group-auto-discovery-with-asg-tag
cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
2017-05-10 11:07:58 +02:00
Yusuke Kuoka 5304e9af21 cluster-autoscaler: Fix typos in comments 2017-05-10 11:22:15 +09:00
Yusuke Kuoka e9c7cd0733 cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
This is an alternative implementation of https://github.com/kubernetes/contrib/pull/1982

Notable differences from the original PR are:

* A new flag named `--node-group-auto-discovery` is introduced for opting in to enable the auto-discovery feature.
  * For example, specifying `--cloud-provider aws --node-group-auto-discovery asg:tag=k8s.io/cluster-autoscaler/enabled` instructs CA to auto-discover ASGs tagged with `k8s.io/cluster-autoscaler/enabled` to be used as target node groups
* The new code path introduced by this PR is executed only when `node-group-auto-discovery` is specified. There is relatively less chance to break existing features by introducing this change

Resolves https://github.com/kubernetes/contrib/issues/1956

---

Other notes:

* We rely mainly on the `DescribeTags` API rather than `DescribeAutoScalingGroups` so that AWS can filter out unnecessary ASGs which doesn't belong to the k8s cluster, for us.
  * If we relied on `DescribeAutoScalingGroups` here, as it doesn't support `Filter`ing, we'd need to iterate over ALL the ASGs available in an AWS account, which isn't desirable due to unnecessary excessive API calls and network usages

* Update cloudprovider/aws/README for the new configuration

* Warn abount invalid combination of flags
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113713138

* Emit a validation error when both --nodes and --node-group-auto-discovery are specified
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113958080

TODO/Possible future improvements before recommending this to everyone:

* Cache the result of an auto-discovery for a configurable period, so that we won't invoke DescribeTags and DescribeAutoScalingGroup APIs too many times
2017-05-10 08:36:02 +09:00
Marcin Wielgus 42c177b68f Add deletion safety margin to node drain 2017-05-08 11:47:33 +02:00
Marcin Wielgus 6f5d52e3a7 Overwrite pod.spec.nodename and node.name in template nodes for scale up 2017-04-28 17:57:02 +02:00
Marcin Wielgus 6bafa2a940 Merge pull request #25 from mwielgus/label-fix
Override hostname label when building a template node
2017-04-27 17:25:43 +02:00
Marcin Wielgus e1c89f8fe2 Override hostname label when building a template node 2017-04-27 17:17:01 +02:00
Maciej Pytel 7e4212478a Fix error handling for updating node status 2017-04-25 17:34:23 +02:00
Maciej Pytel 6b2ea76973 Added UT for CA simulator 2017-04-19 19:12:30 +02:00
Maciej Pytel 4d40222b63 Fix gofmt 2017-04-18 16:45:27 +02:00