Commit Graph

313 Commits

Author SHA1 Message Date
t-qini 89a09ccf00 Refactor the corresponding code. 2019-07-22 08:58:51 +08:00
t-qini f7c563ab06 Modify the code as the simple solution proposed by MaciekPytel. 2019-07-18 23:58:05 +08:00
t-qini 622a838c2c Modify nodal similarity rules. 2019-07-09 16:04:40 +08:00
Vivek Bagade 90aa28a077 Move pod packing in upcoming nodes to RunOnce from Estimator for performance improvements 2019-06-19 14:48:47 +02:00
Krzysztof Jastrzebski 4831d76288 Cache cloud provider node instances in cluster state. 2019-05-31 10:11:51 +02:00
Krzysztof Jastrzebski 4247c8b032 Implement functionality which delays node deletion when node has
annotation with prefix 'delay-deletion.cluster-autoscaler.kubernetes.io/'.
2019-05-17 16:06:17 +02:00
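The annotation check described in the commit above could look roughly like the following. This is a minimal sketch, not the autoscaler's code: the function name `hasDelayDeletionAnnotation` is hypothetical, and a plain map stands in for the node's annotations. Only the prefix string comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// delayDeletionPrefix is the annotation prefix quoted in the commit message.
const delayDeletionPrefix = "delay-deletion.cluster-autoscaler.kubernetes.io/"

// hasDelayDeletionAnnotation (hypothetical name) reports whether any
// annotation key on a node starts with the delay-deletion prefix.
func hasDelayDeletionAnnotation(annotations map[string]string) bool {
	for key := range annotations {
		if strings.HasPrefix(key, delayDeletionPrefix) {
			return true
		}
	}
	return false
}

func main() {
	// The "backup" suffix is an invented example key.
	annotations := map[string]string{delayDeletionPrefix + "backup": ""}
	fmt.Println(hasDelayDeletionAnnotation(annotations))
}
```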
Kubernetes Prow Robot c756ed3953
Merge pull request #1963 from cjbradfield/ignore-taints
add --ignore-taint flag and ignore taints added by TaintNodesByCondition
2019-05-15 02:18:21 -07:00
Chris Bradfield 92ea680f1a Implement an --ignore-taint flag
This change adds support for a user to specify taints to ignore when
considering a node as a template for a node group.
2019-05-14 10:22:59 -07:00
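One way to picture the taint filtering described above is the sketch below. The `taint` struct is a hypothetical stand-in for the Kubernetes taint type, `dropIgnoredTaints` is an invented name, and matching on the taint key alone is an assumption. The commit message does not say how ignored taints are matched.

```go
package main

import "fmt"

// taint is a hypothetical stand-in for the Kubernetes taint type.
type taint struct {
	Key    string
	Value  string
	Effect string
}

// dropIgnoredTaints (hypothetical name) returns the taints whose keys are
// not listed in ignored. Matching by key only is an assumption.
func dropIgnoredTaints(taints []taint, ignored map[string]bool) []taint {
	var kept []taint
	for _, t := range taints {
		if ignored[t.Key] {
			continue
		}
		kept = append(kept, t)
	}
	return kept
}

func main() {
	// Both taint keys are invented examples.
	taints := []taint{
		{Key: "example.com/maintenance", Effect: "NoSchedule"},
		{Key: "dedicated", Value: "gpu", Effect: "NoSchedule"},
	}
	ignored := map[string]bool{"example.com/maintenance": true}
	fmt.Println(dropIgnoredTaints(taints, ignored))
}
```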
Thomas Hartland 80aa40bda7 Move CA version to own package 2019-05-06 11:30:08 +02:00
Łukasz Osipiuk db4c6f1133 Migrate filter out schedulable to PodListProcessor 2019-04-15 16:59:13 +02:00
Jiaxin Shan 83ae66cebc Consider GPU utilization in scaling down 2019-04-04 01:12:51 -07:00
Aleksandra Malinowska 600ba8ad10 Fix default scale down delay after delete 2019-04-02 12:47:10 +02:00
Łukasz Osipiuk 34a4262ad8 Remove GKE specific node group comparator
Change-Id: I33131fec9b7972780cffde605a087cd2ad002752
2019-03-11 17:49:59 +01:00
Kubernetes Prow Robot 8944afd901
Merge pull request #1720 from aleksandra-malinowska/events-client
Use separate client for events
2019-02-26 12:00:19 -08:00
Aleksandra Malinowska f304722a1f Use separate client for events 2019-02-25 13:58:54 +01:00
Pengfei Ni 2546d0d97c Move leaderelection options to new packages 2019-02-21 13:45:46 +08:00
Pengfei Ni 4f7600911f Update flag package to k8s.io/component-base/cli/flag 2019-02-21 11:45:33 +08:00
Jacek Kaniuk f054c53c46 Account for kernel reserved memory in capacity calculations 2019-02-08 17:04:07 +01:00
Marcin Wielgus 99f1dcf9d2
Merge branch 'master' into crc-fix-error-format 2019-02-01 17:22:57 +01:00
Vivek Bagade c6b87841ce Added a new method that uses pod packing to filter schedulable pods
filterOutSchedulableByPacking is an alternative to the older
filterOutSchedulable. filterOutSchedulableByPacking sorts pods in
unschedulableCandidates by priority and filters out pods that can be
scheduled on free capacity on existing nodes. It uses a basic packing
approach to do this. Pods with nominatedNodeName set are always
filtered out.

filterOutSchedulableByPacking is used by default, but this can be
toggled off by setting the filter-out-schedulable-pods-uses-packing
flag to false, which activates the older and more lenient
filterOutSchedulable (now called filterOutSchedulableSimple).

Added test cases for both methods.
2019-01-25 16:09:51 +05:30
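The packing approach described in this commit can be illustrated with a simplified first-fit sketch. It is not the autoscaler's implementation: the `pod` struct, the single CPU dimension, the highest-priority-first ordering, and the name `filterOutSchedulableByPackingSketch` are all assumptions made for illustration. The real code evaluates scheduler predicates rather than a single resource.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, simplified pod with a single CPU request.
type pod struct {
	Name          string
	Priority      int
	CPUMilli      int64
	NominatedNode string
}

// filterOutSchedulableByPackingSketch returns the pods that still do not fit
// after first-fit packing onto the free CPU of existing nodes.
// Assumption: pods are tried from highest to lowest priority.
func filterOutSchedulableByPackingSketch(pods []pod, freeCPUMilli []int64) []pod {
	free := append([]int64(nil), freeCPUMilli...) // copy so the caller's slice is untouched
	sorted := append([]pod(nil), pods...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})

	var unschedulable []pod
	for _, p := range sorted {
		if p.NominatedNode != "" {
			continue // pods with a nominated node are always filtered out
		}
		placed := false
		for i := range free {
			if free[i] >= p.CPUMilli {
				free[i] -= p.CPUMilli // reserve capacity on the first node that fits
				placed = true
				break
			}
		}
		if !placed {
			unschedulable = append(unschedulable, p)
		}
	}
	return unschedulable
}

func main() {
	pods := []pod{
		{Name: "low", Priority: 1, CPUMilli: 600},
		{Name: "high", Priority: 10, CPUMilli: 600},
		{Name: "nominated", CPUMilli: 5000, NominatedNode: "node-1"},
	}
	fmt.Println(filterOutSchedulableByPackingSketch(pods, []int64{1000}))
}
```

Because each placement consumes capacity, two pods that would each fit on their own may not both be filtered out. That is presumably what makes the older filterOutSchedulableSimple "more lenient" by comparison.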
Jacek Kaniuk 0c64e0932a Tainting unneeded nodes as PreferNoSchedule 2019-01-21 13:06:50 +01:00
CodeLingo Bot c0603afdeb Fix error format strings according to best practices from CodeReviewComments
Reverted incorrect change to with error format string

Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Resolve conflict

Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingoBot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <hello@codelingo.io>
Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Fix error strings in test cases to remedy failing tests

Signed-off-by: CodeLingo Bot <bot@codelingo.io>

Fix more error strings to remedy failing tests

Signed-off-by: CodeLingo Bot <bot@codelingo.io>
2019-01-11 09:10:31 +13:00
Łukasz Osipiuk d53928a11d Initialize klog 2018-11-26 20:21:23 +01:00
Łukasz Osipiuk 016bf7fc2c Use k8s.io/klog instead of github.com/golang/glog 2018-11-26 17:30:31 +01:00
Alex Price 4ae7acbacc add flags to ignore daemonsets and mirror pods when calculating resource utilization of a node
Adds the flags --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization
(both default to false). When enabled, they factor DaemonSet and mirror pods out when
calculating the resource utilization of a node.
2018-11-23 15:24:25 +11:00
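A rough sketch of what "factoring out" these pods might mean for a CPU utilization figure is shown below. The `podInfo` struct and `cpuUtilization` function are hypothetical. The sketch assumes utilization is the sum of CPU requests divided by allocatable CPU, with excluded pods simply skipped in the sum. The commit message does not spell out the exact formula.

```go
package main

import "fmt"

// podInfo is a hypothetical, simplified view of a pod running on a node.
type podInfo struct {
	CPUMilli    int64
	IsDaemonSet bool
	IsMirror    bool
}

// cpuUtilization (hypothetical) returns requested CPU divided by allocatable
// CPU, optionally skipping DaemonSet and mirror pods in the sum.
func cpuUtilization(pods []podInfo, allocatableMilli int64, ignoreDaemonSets, ignoreMirrorPods bool) float64 {
	if allocatableMilli <= 0 {
		return 0 // avoid dividing by zero for nodes without allocatable CPU
	}
	var requested int64
	for _, p := range pods {
		if ignoreDaemonSets && p.IsDaemonSet {
			continue
		}
		if ignoreMirrorPods && p.IsMirror {
			continue
		}
		requested += p.CPUMilli
	}
	return float64(requested) / float64(allocatableMilli)
}

func main() {
	pods := []podInfo{
		{CPUMilli: 500},
		{CPUMilli: 250, IsDaemonSet: true},
	}
	fmt.Println(cpuUtilization(pods, 1000, false, false))
	fmt.Println(cpuUtilization(pods, 1000, true, false))
}
```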
SataQiu a110adf4fb fix typo: posistive -> positive 2018-11-15 15:48:08 +08:00
Aleksandra Malinowska bf6ff4be8e Clean up estimators 2018-11-06 14:15:42 +01:00
Maciej Pytel 01a56a8d73 Add GKE-specific NodeGroupSet processor
Also refactor Balancing processor a bit to make it easily extensible.
2018-10-25 18:50:17 +02:00
Steve Scaffidi 56b5456269 Fixing nits: renamed newPodScaleUpBuffer -> newPodScaleUpDelay, deleted redundant comment
Change-Id: I7969194d8e07e2fb34029d0d7990341c891d0623
2018-09-17 10:38:28 -04:00
Steve Scaffidi 33b93cbc5f Add configurable delay for pod age before considering for scale-up
- This is intended to address the issue described in https://github.com/kubernetes/autoscaler/issues/923
  - the delay is configurable via a CLI option
  - in production (on AWS) we set this to a value of 2m
  - the delay could possibly be set as low as 30s and still be effective depending on your workload and environment
  - the default of 0 for the CLI option results in no change to the CA's behavior from defaults.

Change-Id: I7e3f36bb48641faaf8a392cca01a12b07fb0ee35
2018-09-14 13:55:09 -04:00
Łukasz Osipiuk 01a2e4d3cf Update leader election configuration after godeps update 2018-09-05 16:54:15 +02:00
Aleksandra Malinowska 90e8a7a2d9 Move initializing defaults out of main 2018-08-02 14:04:03 +02:00
Aleksandra Malinowska 3b1b731c91 Move constructing cloud provider dynamic config structs into cloud provider builder 2018-07-25 13:43:47 +02:00
Aleksandra Malinowska 07e52e6c79 Move creating cloud provider out of context 2018-07-25 13:43:47 +02:00
Aleksandra Malinowska 0976d2aa07 Move autoscaling options out of static 2018-07-25 10:52:37 +02:00
Aleksandra Malinowska 6b94d7172d Move AutoscalingOptions to config/static 2018-07-23 15:52:27 +02:00
Sheldon Kwok 20293c2365 Bump kubernetes.sync and fix main.go with new k8 Godeps 2018-07-17 02:54:35 -07:00
Aleksandra Malinowska 82fa2df52f Lower default expendable pod priority cutoff to -10 2018-07-04 13:45:32 +02:00
Nic Doye ebadbda2b2 issues/933 Consider making UnremovableNodeRecheckTimeout configurable 2018-06-18 11:54:14 +01:00
Łukasz Osipiuk 087a5cc9a9 Respect GPU limits in scale_down 2018-06-13 14:19:59 +02:00
MaciekPytel 705eeb0a7b
Merge pull request #934 from losipiuk/lukaszos/cleanup-how-mult-string-flags-are-handled-in-main-1fdd5
Cleanup how multi-string flags are handled in main()
2018-06-08 14:53:12 +02:00
Łukasz Osipiuk 53fc344eca Cleanup how multi-string flags are handled in main() 2018-06-08 13:36:52 +02:00
Aleksandra Malinowska 3ccfa5be23 Move universal constants to separate module 2018-05-17 18:36:43 +02:00
Aleksandra Malinowska fcc3d004f5 Use bytes instead of MB for memory limits 2018-05-17 17:35:39 +02:00
Aleksandra Malinowska 820f688d2a Update max unready nodes to 45% 2018-05-17 12:51:45 +02:00
Beata Skiba f3a242cc8a Small refactor of main.go 2018-05-15 12:39:33 +02:00
Karol Gołąb 74b540fdab Remove DynamicAutoscaler since it's unused (#851)
* Remove DynamicAutoscaler since it's unused

* Remove configmap flag with its unused-elsewhere dependencies

* gofmt
2018-05-14 20:22:42 +02:00
Karol Gołąb 854fcc1ff8 Remove implementation details (CleanUp) from the interface.
The CleanUp method is instead called directly from the implementation,
when required.
Test updated in a quick way since the mock we're using does not support
AtLeast(1) - thus Times(2).
2018-05-07 15:24:14 +02:00
Krzysztof Jastrzebski 88b769b324 Refactor cluster autoscaler builder and add pod list processor. 2018-04-26 12:37:51 +02:00
Aleksandra Malinowska f98e953eb4 Add regional flag 2018-03-12 14:15:56 +01:00
yank1 ee3f3881b9 fix typo in main file 2018-02-07 00:27:10 +08:00
Marcin Wielgus 88d97c2254
Merge pull request #462 from negz/gcedisco
Support autodetection of GCE managed instance groups by name prefix
2017-12-18 21:08:22 +01:00
Aleksandra Malinowska 312f989c15 Don't register metrics unless on leading master 2017-12-14 16:08:20 +01:00
Nic Cope e96ff07896 Replace the Polling Autoscaler
Node group discovery is now handled by cloudprovider.Refresh() in all cases.
Additionally, explicit node groups can now be used alongside autodiscovery.
2017-12-11 13:09:56 -08:00
Nic Cope 6a704a6cf4 Break down cloud provider builder by provider
The Build method was getting pretty big, this hopefully makes it a little
more readable. It also fixes a few minor error shadowing bugs.
2017-12-11 13:09:56 -08:00
Nic Cope 982f9e41a3 Support autodetection of GCE managed instance groups by name prefix
This commit adds a new usage of the --node-group-auto-discovery flag intended
for use with the GCE cloud provider. GCE instance groups can be automatically
discovered based on a prefix of their group name. Example usage:

--node-group-auto-discovery=mig:prefix=k8s-mig,minNodes=0,maxNodes=10

Note that unlike the existing AWS ASG autodetection functionality we must
specify the min and max nodes in the flag. This is because MIGs store only
a target size in the GCE API - they do not have a min and max size we can
infer via the API.

In order to alleviate this limitation a little we allow multiple uses of the
autodiscovery flag. For example to discover two classes (big and small) of
instance groups with different size limits:

./cluster-autoscaler \
  --node-group-auto-discovery=mig:prefix=k8s-a-small,minNodes=1,maxNodes=10 \
  --node-group-auto-discovery=mig:prefix=k8s-a-big,minNodes=1,maxNodes=100

Zonal clusters (i.e. multizone = false in the cloud config) will detect all
managed instance groups within the cluster's zone. Regional clusters will
detect all matching (zonal) managed instance groups within any of that region's
zones.
2017-12-11 13:09:56 -08:00
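To make the flag format above concrete, here is a minimal parsing sketch for values such as `mig:prefix=k8s-mig,minNodes=0,maxNodes=10`. The type `migDiscoverySpec`, the function `parseMIGDiscoverySpec`, and the validation rules are assumptions for illustration, not the autoscaler's actual parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// migDiscoverySpec is a hypothetical holder for one auto-discovery flag value.
type migDiscoverySpec struct {
	Prefix   string
	MinNodes int
	MaxNodes int
}

// parseMIGDiscoverySpec (hypothetical) parses "mig:prefix=...,minNodes=N,maxNodes=M".
func parseMIGDiscoverySpec(spec string) (migDiscoverySpec, error) {
	var out migDiscoverySpec
	if !strings.HasPrefix(spec, "mig:") {
		return out, fmt.Errorf("unsupported discoverer in %q", spec)
	}
	for _, pair := range strings.Split(strings.TrimPrefix(spec, "mig:"), ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return out, fmt.Errorf("malformed key=value pair %q", pair)
		}
		switch kv[0] {
		case "prefix":
			out.Prefix = kv[1]
		case "minNodes", "maxNodes":
			n, err := strconv.Atoi(kv[1])
			if err != nil {
				return out, fmt.Errorf("invalid %s: %v", kv[0], err)
			}
			if kv[0] == "minNodes" {
				out.MinNodes = n
			} else {
				out.MaxNodes = n
			}
		default:
			return out, fmt.Errorf("unknown key %q", kv[0])
		}
	}
	// Assumed validation: a prefix is required and the size range must be sane.
	if out.Prefix == "" {
		return out, fmt.Errorf("prefix is required in %q", spec)
	}
	if out.MaxNodes < out.MinNodes {
		return out, fmt.Errorf("maxNodes (%d) is below minNodes (%d)", out.MaxNodes, out.MinNodes)
	}
	return out, nil
}

func main() {
	spec, err := parseMIGDiscoverySpec("mig:prefix=k8s-mig,minNodes=0,maxNodes=10")
	fmt.Println(spec, err)
}
```

Since the flag may be given several times, a caller would presumably run this once per flag value and collect the results.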
Pengfei Ni 8f7d35b4e0 Enable azure options for autoscaler 2017-11-16 21:31:49 +08:00
Marcin Wielgus 2589c43a61
Merge pull request #469 from aleksandra-malinowska/single-unregistered-flag
Remove --unregistered-node-removal-time flag
2017-11-16 13:07:52 +01:00
Aleksandra Malinowska 2ff962e53e Remove --unregistered-node-removal-time flag 2017-11-15 11:11:30 +01:00
Aleksandra Malinowska 11a7d9f137 Fix typos in FAQ 2017-11-14 14:46:09 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Maciej Pytel 9ded6f9c9e Rename clusterName flag to cluster-name for consistency 2017-10-16 14:11:27 +02:00
Matt Terry 63310ef41a Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay 2017-09-18 17:09:44 -07:00
Aleksandra Malinowska 197b05b180 respect minimum cores/memory limit during scale down 2017-09-13 10:10:47 +02:00
Aleksandra Malinowska d43029c180 implement blocking scale up beyond max cores & memory 2017-09-08 12:50:00 +02:00
Marcin Wielgus f9cabf3a1a Merge pull request #297 from bskiba/additional-k
Only consider up to 10% of the nodes as additional candidates for scale down
2017-09-07 04:34:23 +05:30
Sergey Lanzman 415f53cdea Change from deprecated Core to CoreV1 for kube client 2017-09-04 22:16:21 +03:00
Beata Skiba a6c18b87d2 Only consider up to 10% of the nodes as additional candidates for scale down. 2017-09-04 17:37:02 +02:00
Clayton Coleman f411e38bb8
Support resource-lock type configmap for leader election
The lock type parameter was being ignored. Use the new factory method to
instantiate the lock type.
2017-09-03 18:14:46 -04:00
Marcin Wielgus de524a6688 Limit autoprovisioned groups to 15 2017-09-01 18:25:28 +02:00
Marcin Wielgus c0b48e4a15 Merge pull request #285 from mwielgus/loglevel
Set verbosity for each of the glog.Info logs
2017-09-01 16:42:11 +05:30
Marcin Wielgus 2d8f59e23d Set verbosity for each of the glog.Info logs 2017-09-01 12:34:29 +02:00
Beata Skiba 576e4105db Make ScaleDownNonEmptyCandidatesCount a flag. 2017-08-31 15:05:06 +02:00
Marcin Wielgus 19507aa0de Node autoprovisioning flag 2017-08-31 00:48:54 +02:00
Mark Janssen f53fb8b6ed Minor fixes 2017-08-29 23:11:35 +02:00
Marcin Wielgus 81e9226d17 Merge pull request #267 from mwielgus/gke-cp-1
Add GKE mode to GCE cloud provider
2017-08-29 18:26:07 +05:30
Marcin Wielgus 3d55a669ce Merge pull request #268 from drinktee/master
add kubeconfig flag to create kube-client
2017-08-29 16:14:36 +05:30
chenguoyan01 403cd8a11e add kubeconfig flag to create kube-client 2017-08-29 15:41:32 +08:00
Marcin Wielgus 51a5ad58c0 GKE NodePool support for NAP - get NP/Migs via api - part 1 2017-08-28 20:50:02 +02:00
Marcin Wielgus 718e5db78e Run node drain/delete in a separate goroutine 2017-08-28 12:12:31 +02:00
Zach Gardner 8c23346c72 Update main.go
Fix a typo (`waints` -> `waits`)
2017-08-24 05:19:24 -07:00
Beata Skiba 2ae609b93a Merge pull request #237 from bskiba/split_scale_down
Drill down scale down metrics
2017-08-22 16:41:55 +02:00
Beata Skiba 43c9b6b06b Add cleaner function labels for metrics exporting. 2017-08-22 16:09:42 +02:00
Beata Skiba 596b165808 Cloud Provider Interface for Kubemark
This allows running Cluster Autoscaler on Kubemark.
See autoscaler/cluster-autoscaler/proposals/kubemark_integration.md
for more details.
2017-08-22 15:19:10 +02:00
Beata Skiba 14df1b808b Drill down scale down metrics
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
Marcin Wielgus 6df186aeac Remove Azure support 2017-08-17 22:36:31 +02:00
Maciej Pytel 95b5b4be94 Remove --verify-unschedulable-pods flag
This flag was true in default setups for every platform,
we haven't heard about any user changing it to false and
after removing check on PodScheduled condition setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Marcin Wielgus f8541bdb6d Unexport leader election functions 2017-08-11 18:13:26 +02:00
Marcin Wielgus 9116e4c08c Compilation fix for CA after godeps update 2017-08-11 17:56:47 +02:00
Ivan Towlson 902d2414b7 Fixed typos of name 'Kubernetes' 2017-08-03 14:20:23 +12:00
Yusuke Kuoka 3e8cc02243 cluster-autoscaler: Fix node group auto discovery for AWS not to mix up ASGs from different k8s clusters 2017-06-22 15:59:53 +09:00
Marcin Wielgus 63e679a74f Merge pull request #120 from MaciekPytel/fix_graceful_flag
Fix typos related to max-graceful-termination-sec
2017-06-14 14:42:35 +02:00
Maciej Pytel 767367c866 Fix typos related to max-graceful-termination-sec 2017-06-14 14:14:21 +02:00
Maciej Pytel fe514ed75d Make status configmap respect namespace parameter 2017-06-14 14:07:13 +02:00
Marcin Wielgus 1bedee5707 Update GODEPS 2017-06-13 14:48:24 +02:00
Marcin Wielgus e2e171b7b7 Enable pricing in expander factory 2017-06-09 11:09:43 -07:00
Maciej Pytel cd186f3ebc Balance sizes of similar nodegroups in scale-up 2017-06-06 00:52:38 +02:00
Aleksandra Malinowska 972772440a Add failing health check if autoscaler loop consistently returns error 2017-05-29 11:31:57 +02:00
Aleksandra Malinowska 7c94367099 Add health check 2017-05-25 11:37:44 +02:00
Maciej Pytel f716a7e496 Add typed errors; add errors_total metric
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Maciej Pytel 4cdf06ea94 Added CA metrics related to autoscaler execution 2017-05-11 14:51:04 +02:00
Yusuke Kuoka e9c7cd0733 cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
This is an alternative implementation of https://github.com/kubernetes/contrib/pull/1982

Notable differences from the original PR are:

* A new flag named `--node-group-auto-discovery` is introduced for opting in to enable the auto-discovery feature.
  * For example, specifying `--cloud-provider aws --node-group-auto-discovery asg:tag=k8s.io/cluster-autoscaler/enabled` instructs CA to auto-discover ASGs tagged with `k8s.io/cluster-autoscaler/enabled` to be used as target node groups
* The new code path introduced by this PR is executed only when `node-group-auto-discovery` is specified. There is relatively less chance to break existing features by introducing this change

Resolves https://github.com/kubernetes/contrib/issues/1956

---

Other notes:

* We rely mainly on the `DescribeTags` API rather than `DescribeAutoScalingGroups` so that AWS can filter out, for us, unnecessary ASGs which don't belong to the k8s cluster.
  * If we relied on `DescribeAutoScalingGroups` here, as it doesn't support `Filter`ing, we'd need to iterate over ALL the ASGs available in an AWS account, which isn't desirable due to unnecessary API calls and network usage

* Update cloudprovider/aws/README for the new configuration

* Warn about invalid combination of flags
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113713138

* Emit a validation error when both --nodes and --node-group-auto-discovery are specified
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113958080

TODO/Possible future improvements before recommending this to everyone:

* Cache the result of an auto-discovery for a configurable period, so that we won't invoke the DescribeTags and DescribeAutoScalingGroups APIs too many times
2017-05-10 08:36:02 +09:00
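The flag validation mentioned in this commit (an error when both --nodes and --node-group-auto-discovery are specified) could be sketched as follows. The function name `validateNodeGroupFlags` and the error text are hypothetical. The sketch reflects only this commit: the later "Replace the Polling Autoscaler" entry in this log states that explicit node groups can be used alongside autodiscovery.

```go
package main

import (
	"errors"
	"fmt"
)

// validateNodeGroupFlags (hypothetical) rejects the combination of explicit
// node groups and auto-discovery, as this commit describes.
func validateNodeGroupFlags(nodes []string, autoDiscovery string) error {
	if len(nodes) > 0 && autoDiscovery != "" {
		return errors.New("--nodes and --node-group-auto-discovery cannot be specified together")
	}
	return nil
}

func main() {
	// "1:10:my-asg" is an invented example value for --nodes.
	err := validateNodeGroupFlags(
		[]string{"1:10:my-asg"},
		"asg:tag=k8s.io/cluster-autoscaler/enabled",
	)
	fmt.Println(err)
}
```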
Maciej Pytel e8440ee15e Fix PVC informer issue 2017-04-24 14:12:27 +02:00
Marcin Wielgus 34eb4973f8 Fix imports in cluster autoscaler after migrating it from contrib 2017-04-18 15:42:04 +02:00
Maciej Pytel c87d10f042 Cluster-Autoscaler: fix ignoring node groups config 2017-03-03 17:21:24 +01:00
Marcin Wielgus 1cd861227a Cluster-autoscaler: precheck that the api server link is ok 2017-03-03 14:39:23 +01:00
Maciej Pytel 84f19c1e1e Cluster-Autoscaler: add map to disable status configmap 2017-03-02 15:35:00 +01:00
Marcin Wielgus 2ffaddb7c0 Cluster-autoscaler: lint 2017-03-02 15:15:07 +01:00
Marcin Wielgus 72a47dc2b2 Cluster-autoscaler: update code for 1.6 k8s sync 2017-03-02 14:34:49 +01:00
Maciej Pytel d0196c9e1b Cluster-Autoscaler: Delete status configmap on exit 2017-02-28 17:19:23 +01:00
Yusuke Kuoka baee799524 cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.

The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.

Example usage:

```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler
```

The configmap would look like:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
```

Other notes:

* Make namespace default to "kube-system"
according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267

* Trigger a full-recreate on a configuration change

according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410

* Introduced `autoscaler/` and moved  all the dynamic/recreatable-at-runtime parts of autoscaler into there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663)

* Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler`

* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.

* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package accordingly because they seemed to be meant to be collocated in the same package as the core of CA (which is now implemented as `Autoscaler`)

* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
2017-02-24 20:36:47 +09:00
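The `settings` payload shown in the ConfigMap example above is plain JSON, so decoding it might look like the sketch below. The type and function names (`nodeGroupSpec`, `settings`, `parseSettings`) and the validation rules are assumptions. Only the JSON field names come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeGroupSpec mirrors one entry of "nodeGroups" in the example ConfigMap.
type nodeGroupSpec struct {
	MinSize int    `json:"minSize"`
	MaxSize int    `json:"maxSize"`
	Name    string `json:"name"`
}

// settings mirrors the JSON document stored under the "settings" key.
type settings struct {
	NodeGroups []nodeGroupSpec `json:"nodeGroups"`
}

// parseSettings (hypothetical) decodes the JSON and applies assumed sanity checks.
func parseSettings(raw string) (settings, error) {
	var s settings
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return settings{}, fmt.Errorf("parsing settings: %w", err)
	}
	for _, g := range s.NodeGroups {
		if g.Name == "" || g.MinSize < 0 || g.MaxSize < g.MinSize {
			return settings{}, fmt.Errorf("invalid node group %+v", g)
		}
	}
	return s, nil
}

func main() {
	raw := `{"nodeGroups":[{"minSize":1,"maxSize":2,"name":"kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"}]}`
	s, err := parseSettings(raw)
	fmt.Println(s, err)
}
```

A reconfiguration loop in the spirit of `DynamicAutoscaler` would presumably compare the freshly parsed settings with the previous ones on each scan and recreate the `Autoscaler` only when they differ.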