autoscaler

Commit Graph

Author	SHA1	Message	Date
Andrew McDermott	5bc77f051c	UPSTREAM: <carry>: fix calculation of max cluster size When scaling up, the calculation for the maximum size of the cluster based on `--max-nodes-total` doesn't take into account any nodes that are in the process of coming up. This allows the cluster to grow beyond the size specified. With this change I now see: scale_up.go:266] 21 other pods are also unschedulable scale_up.go:423] Best option to resize: openshift-cluster-api/amcdermo-ca-worker-us-east-2b scale_up.go:427] Estimated 18 nodes needed in openshift-cluster-api/amcdermo-ca-worker-us-east-2b scale_up.go:432] Capping size to max cluster total size (23) static_autoscaler.go:275] Failed to scale up: max node total count already reached	2018-12-18 17:05:19 +00:00
Zhenhai Gao	df10e5f5c2	Fix log output detailed warning info Signed-off-by: Zhenhai Gao <gaozh1988@live.com>	2018-12-07 17:25:54 +08:00
Andrew McDermott	fd3fd85f26	UPSTREAM: <carry>: handle nil nodeGroup in calculateScaleDownGpusTotal Explicitly handle nil as a return value for nodeGroup in `calculateScaleDownGpusTotal()` when `NodeGroupForNode()` is called for GPU nodes that don't exist. The current logic generates a runtime exception: "reflect: call of reflect.Value.IsNil on zero Value" Looking through the rest of the tree all the other places that use this pattern additionally and explicitly check whether `nodeGroup == nil` first. This change now completes the pattern in `calculateScaleDownGpusTotal()`. Looking at the other occurrences of this pattern we see: ``` File: clusterstate/clusterstate.go 488:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { File: core/utils.go 231:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 322:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 394:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 461:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { File: core/scale_down.go 185:6: if reflect.ValueOf(nodeGroup).IsNil() { 608:27: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 747:26: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { 1010:25: if nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil() { ``` with the notable exception at core/scale_down.go:185 which is `calculateScaleDownGpusTotal()`. With this change, and invoking the autoscaler with: ``` ... --max-nodes-total=24 \ --cores-total=8:128 \ --memory-total=4:256 \ --gpu-total=nvidia.com/gpu:0:16 \ --gpu-total=amd.com/gpu:0:4 \ ... ``` I no longer see a runtime exception.	2018-12-05 18:54:07 +00:00
Thomas Hartland	d0dd00c602	Fix logged error in static autoscaler	2018-12-04 16:59:57 +01:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead of github.com/golang/glog	2018-11-26 17:30:31 +01:00
Łukasz Osipiuk	991873c237	Fix gofmt errors	2018-11-26 15:39:59 +01:00
Alex Price	4ae7acbacc	add flags to ignore daemonsets and mirror pods when calculating resource utilization of a node Adds the flag --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization (defaults to false) and when enabled, factors DaemonSet and mirror pods out when calculating the resource utilization of a node.	2018-11-23 15:24:25 +11:00
Łukasz Osipiuk	5962354c81	Inject Backoff instance to ClusterStateRegistry on creation	2018-11-13 14:25:16 +01:00
k8s-ci-robot	7008fb50be	Merge pull request #1380 from losipiuk/lo/backoff Make Backoff interface	2018-11-07 05:13:43 -08:00
Aleksandra Malinowska	6febc1ddb0	Fix formatted log messages	2018-11-06 14:51:43 +01:00
Aleksandra Malinowska	bf6ff4be8e	Clean up estimators	2018-11-06 14:15:42 +01:00
Łukasz Osipiuk	0e2c3739b7	Use NodeGroup as key in Backoff	2018-10-30 18:17:26 +01:00
Łukasz Osipiuk	55fc1e2f00	Store NodeGroup in ScaleUpRequest and ScaleDownRequest	2018-10-30 18:03:04 +01:00
Maciej Pytel	6f5e6aab6f	Move node group balancing to processor The goal is to allow customization of this logic for different use cases and cloud providers.	2018-10-25 14:04:05 +02:00
Łukasz Osipiuk	a266420f6a	Recalculate clusterStateRegistry after adding multiple node groups	2018-10-02 17:15:20 +02:00
Łukasz Osipiuk	437efe4af6	If possible use nodeInfo based on created node group	2018-10-02 15:46:45 +02:00
Jakub Tużnik	8179e4e716	Refactor the scale-(up|down) status processors so that they have more info available Replace the simple boolean ScaledUp property of ScaleUpStatus with a more comprehensive ScaleUpResult. Add more possible values to ScaleDownResult. Refactor the processors execution so that they are always executed every iteration, even if RunOnce exits earlier.	2018-09-20 17:12:02 +02:00
k8s-ci-robot	556029ad8d	Merge pull request #1255 from towca/feat/jtuznik/original-reasons Add the ability to retrieve the original reasons from a PredicateError	2018-09-20 07:12:37 -07:00
Jakub Tużnik	8a7338e6d8	Add the ability to retrieve the original reasons from a PredicateError	2018-09-19 17:31:34 +02:00
Łukasz Osipiuk	bf8cfef10b	NodeGroupManager.CreateNodeGroup can return extra created node groups.	2018-09-19 13:55:51 +02:00
k8s-ci-robot	d56bb24b71	Merge pull request #1244 from losipiuk/lo/muzon Call CheckPodsSchedulableOnNode in scale_up.go via caching layer	2018-09-18 02:16:35 -07:00
Steve Scaffidi	88d857222d	Renamed one more variable for consistency Change-Id: Idf42fd58089a1e75f3291ab7cc583735c68735f2	2018-09-17 14:08:10 -04:00
Steve Scaffidi	56b5456269	Fixing nits: renamed newPodScaleUpBuffer -> newPodScaleUpDelay, deleted redundant comment Change-Id: I7969194d8e07e2fb34029d0d7990341c891d0623	2018-09-17 10:38:28 -04:00
Łukasz Osipiuk	705a6d87e2	fixup! Call CheckPodsSchedulableOnNode in scale_up.go via caching layer	2018-09-17 13:01:19 +02:00
Steve Scaffidi	33b93cbc5f	Add configurable delay for pod age before considering for scale-up - This is intended to address the issue described in https://github.com/kubernetes/autoscaler/issues/923 - the delay is configurable via a CLI option - in production (on AWS) we set this to a value of 2m - the delay could possibly be set as low as 30s and still be effective depending on your workload and environment - the default of 0 for the CLI option results in no change to the CA's behavior from defaults. Change-Id: I7e3f36bb48641faaf8a392cca01a12b07fb0ee35	2018-09-14 13:55:09 -04:00
Łukasz Osipiuk	0ad4efe920	Call CheckPodsSchedulableOnNode in scale_up.go via caching layer	2018-09-13 17:01:15 +02:00
Jakub Tużnik	71111da20c	Add a scale down status processor, refactor so that there's more scale down info available to it	2018-09-12 14:52:20 +02:00
mikeweiwei	7ed0599b42	Fix delete node event (#1229) * Add more event. When node is deleted and then add event * move eventf above return and change type to warning	2018-09-07 14:31:57 +02:00
Łukasz Osipiuk	84d8f6fd31	Remove obsolete implementations of node-related processors	2018-09-05 11:58:46 +02:00
Aleksandra Malinowska	b88e6019f7	code review fixes 3	2018-08-28 18:11:04 +02:00
Aleksandra Malinowska	5620f76c62	Pass NoScaleUpInfo to ScaleUpStatus processor	2018-08-28 14:26:03 +02:00
Aleksandra Malinowska	cd9808185e	Report reason why pod didn't trigger scale-up	2018-08-28 14:11:36 +02:00
Aleksandra Malinowska	f5690aab96	Make CheckPredicates return predicateError	2018-08-28 14:11:35 +02:00
Jakub Tużnik	054f0b3b90	Add AutoscalingStatusProcessor	2018-08-07 14:47:06 +02:00
Aleksandra Malinowska	90e8a7a2d9	Move initializing defaults out of main	2018-08-02 14:04:03 +02:00
Aleksandra Malinowska	6f9b6f8290	Move ListerRegistry to context	2018-07-26 13:31:49 +02:00
Aleksandra Malinowska	7225a0fcab	Move all Kubernetes API clients to AutoscalingKubeClients	2018-07-26 13:31:48 +02:00
Aleksandra Malinowska	07e52e6c79	Move creating cloud provider out of context	2018-07-25 13:43:47 +02:00
Aleksandra Malinowska	0976d2aa07	Move autoscaling options out of static	2018-07-25 10:52:37 +02:00
Aleksandra Malinowska	6b94d7172d	Move AutoscalingOptions to config/static	2018-07-23 15:52:27 +02:00
Aleksandra Malinowska	f7352500d7	Merge pull request #1080 from aleksandra-malinowska/refactor-cp-3 Remove not-so-useful type check test	2018-07-23 12:00:10 +02:00
Aleksandra Malinowska	1c09fdfe6a	Remove not-so-useful type check test	2018-07-23 11:32:24 +02:00
Aleksandra Malinowska	398a1ac153	Fix error on node info not found for group	2018-07-23 11:16:12 +02:00
Aleksandra Malinowska	3b90694191	Remove autoscaler builder	2018-07-19 15:22:30 +02:00
Aleksandra Malinowska	54f8497079	Remove unused dynamic.Config	2018-07-19 14:53:09 +02:00
Pengfei Ni	1dd0147d9e	Add more events for CA	2018-07-09 15:42:05 +08:00
Aleksandra Malinowska	800ee56b34	Refactor and extend GPU metrics error types	2018-07-05 13:13:11 +02:00
Karol Gołąb	aae4d1270a	Make GetGpuTypeForMetrics more robust	2018-06-26 21:35:16 +02:00
Marcin Wielgus	f2e76e2592	Merge pull request #1008 from krzysztof-jastrzebski/master Move removing unneeded autoprovisioned node groups to node group manager	2018-06-22 21:01:36 +02:00
Karol Gołąb	5eb7021f82	Add GPU-related scaled_up & scaled_down metrics (#974) * Add GPU-related scaled_up & scaled_down metrics * Fix name to match SD naming convention * Fix import after master rebase * Change the logic to include GPU-being-installed nodes	2018-06-22 21:00:52 +02:00
Krzysztof Jastrzebski	2df2568841	Move removing unneeded autoprovisioned node groups to node group manager	2018-06-22 14:26:12 +02:00
Nic Doye	ebadbda2b2	issues/933 Consider making UnremovableNodeRecheckTimeout configurable	2018-06-18 11:54:14 +01:00
Aleksandra Malinowska	ed5e82d85d	Merge pull request #956 from krzysztof-jastrzebski/master Create NodeGroupManager which is responsible for creating…	2018-06-14 17:25:32 +02:00
Łukasz Osipiuk	51d628c2f1	Add test to check if nodes from not autoscaled groups are used in max-nodes limit	2018-06-14 16:17:51 +02:00
Krzysztof Jastrzebski	99c8c51bb3	Create NodeGroupManager which is responsible for creating/deleting node groups.	2018-06-14 16:11:32 +02:00
Łukasz Osipiuk	b7323bc0d1	Respect GPU limits in scale_up	2018-06-14 15:46:58 +02:00
Łukasz Osipiuk	dfcbedb41f	Take into consideration nodes from not autoscaled groups when enforcing resource limits	2018-06-14 15:31:40 +02:00
Łukasz Osipiuk	b1db155c50	Remove duplicated test case	2018-06-13 19:00:37 +02:00
Łukasz Osipiuk	9f75099d2c	Restructure checking resource limits in scale_up.go Preparatory work before introducing GPU limits	2018-06-13 19:00:37 +02:00
Łukasz Osipiuk	087a5cc9a9	Respect GPU limits in scale_down	2018-06-13 14:19:59 +02:00
Łukasz Osipiuk	1fa44a4d3a	Fix bug resulting in resource limits not being enforced in scale_down	2018-06-11 16:39:07 +02:00
Łukasz Osipiuk	519064e1ec	Extract isNodeBeingDeleted function	2018-06-11 14:21:07 +02:00
Łukasz Osipiuk	6c57a01fc9	Restructure checking resource limits in scale_down.go	2018-06-11 14:02:40 +02:00
Pengfei Ni	be3dd85503	Update scheduler cache package	2018-06-11 13:54:12 +08:00
Łukasz Osipiuk	9c61477d25	Do not return error when getting cpu/memory capacity of node	2018-06-08 15:04:57 +02:00
MaciekPytel	c41dc43704	Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes Use bytes instead of MB for memory limits	2018-06-08 14:47:22 +02:00
Beata Skiba	b8ae6df5d3	Add post scale up status processor.	2018-06-06 13:34:49 +02:00
Maciej Pytel	856855987b	Move some GKE-specific logic outside core No change in actual logic being executed. Added a new NodeGroupListProcessor interface to encapsulate the existing logic. Moved PodListProcessor and refactor how it's passed around to make it consistent and easy to add similar interfaces.	2018-05-29 12:57:19 +02:00
Maciej Pytel	5faa41e683	Move PodListProcessor to new directory It's not really a util and with more processors coming it makes more sense to keep them in dedicated place.	2018-05-29 12:00:47 +02:00
Krzysztof Jastrzebski	6761d7f354	Execute predicates only for similar pods.	2018-05-29 09:36:11 +02:00
Krzysztof Jastrzebski	adad14c2c9	Delete autoprovisioned node pool after all nodes are deleted.	2018-05-28 14:22:18 +02:00
Karol Gołąb	4c710950de	Move ClusterStateRegistry to StaticAutoscaler AutoscalingContext is basically a configuration and few static helpers and API handles. ClusterStateRegistry is state and thus moved to other state-keeping objects.	2018-05-24 13:03:01 +02:00
Marcin Wielgus	494c2aff1b	Merge pull request #883 from kgolab/kg-clean-up-016 Reorder & extract initial parts of RunOnce	2018-05-22 10:06:27 +02:00
Karol Gołąb	5bfab7d9b2	Return value moved to the caller	2018-05-18 14:59:15 +02:00
Joachim Bartosik	bfb70e40ee	Allow passing taints to Node Group creation.	2018-05-18 14:33:33 +02:00
Karol Gołąb	fa6f25a70a	Extract ClusterStateRegistry update with its soft dependency	2018-05-18 10:25:15 +02:00
Karol Gołąb	dc34b43a40	Extract another tiny method	2018-05-18 10:10:51 +02:00
Karol Gołąb	34f6a45a04	Extract method to hide a tiny bit of complexity	2018-05-18 10:01:52 +02:00
Aleksandra Malinowska	3ccfa5be23	Move universal constants to separate module	2018-05-17 18:36:43 +02:00
Aleksandra Malinowska	fcc3d004f5	Use bytes instead of MB for memory limits	2018-05-17 17:35:39 +02:00
Aleksandra Malinowska	d7dc3616f7	Merge pull request #868 from kgolab/kg-clean-up-010 Move metrics update to proper place	2018-05-17 14:52:18 +02:00
Karol Gołąb	e31bf0bb58	Move metrics.Autoscaling after all Node-level operations & checks	2018-05-17 14:37:43 +02:00
Aleksandra Malinowska	3b6cfc7c2b	Merge pull request #870 from kgolab/kg-clean-up-012 Set lastScaleDownFailTime properly	2018-05-17 12:09:15 +02:00
MaciekPytel	444201d1e7	Merge pull request #871 from kgolab/kg-clean-up-013 Extract duplicate code into a single method	2018-05-17 11:49:49 +02:00
Karol Gołąb	400147a075	Extract duplicate code into a single method	2018-05-17 10:01:04 +02:00
Karol Gołąb	b8cbdf4178	Set lastScaleDownFailTime properly - the ScaleDownError check was unreachable	2018-05-17 09:50:22 +02:00
Karol Gołąb	38a5951e22	Check glog.V once	2018-05-17 09:47:52 +02:00
Karol Gołąb	ccca078a2b	Move metrics update to proper place	2018-05-17 09:46:25 +02:00
Łukasz Osipiuk	eb6eff282a	Add gpu related tests to scale_up_test	2018-05-15 22:43:31 +02:00
Łukasz Osipiuk	c406da4174	Support gpus in nodes and pods definitions in UT	2018-05-15 22:43:31 +02:00
Łukasz Osipiuk	be381facfb	Introduce asserting expanding strategy for scale_up_test	2018-05-15 17:01:31 +02:00
Łukasz Osipiuk	c1073fe23a	Model expected scale up in scale_up_test with struct	2018-05-15 17:01:30 +02:00
Łukasz Osipiuk	8bdc6a1bdc	Move common structs from scale_up_test.go to scale_test_common.go	2018-05-15 17:00:45 +02:00
Karol Gołąb	74b540fdab	Remove DynamicAutoscaler since it's unused (#851) * Remove DynamicAutoscaler since it's unused * Remove configmap flag with its unused-elsewhere dependencies * gofmt	2018-05-14 20:22:42 +02:00
MaciekPytel	bc39d4dcd5	Merge pull request #842 from kgolab/kg-clean-up-008 Merge two variables into one.	2018-05-14 10:54:43 +02:00
Aleksandra Malinowska	b52ec59b05	Fix cleaning up taints	2018-05-11 12:00:48 +02:00
Karol Gołąb	f1f92f065e	Merge two variables into one.	2018-05-10 14:32:37 +02:00
Aleksandra Malinowska	ffeebde8d8	Add support for rescheduled pods with the same name in drain	2018-05-10 12:00:56 +02:00
Marcin Wielgus	9c5728fd74	Merge pull request #836 from kgolab/kg-clean-up-004 Use timestamp argument	2018-05-08 20:24:37 +02:00
Karol Gołąb	53b1c6a394	Use timestamp argument	2018-05-08 13:08:30 +02:00
MaciekPytel	e5659e7c57	Merge pull request #835 from kgolab/kg-clean-up-003 Make the code slightly more idiomatic go	2018-05-08 12:58:14 +02:00
Karol Gołąb	da16642bcf	Make the code slightly more idiomatic go	2018-05-08 11:35:01 +02:00
Karol Gołąb	ae203ed517	Removed unused CloudProvider() method.	2018-05-08 11:23:55 +02:00
Karol Gołąb	854fcc1ff8	Remove implementation details (CleanUp) from the interface. The CleanUp method is instead called directly from the implementation, when required. Test updated in a quick way since the mock we're using does not support AtLeast(1) - thus Times(2).	2018-05-07 15:24:14 +02:00
Beata Skiba	054f6d8650	Merge pull request #794 from krzysztof-jastrzebski/pods Refactor cluster autoscaler builder and add pod list processor.	2018-04-26 13:08:56 +02:00
Krzysztof Jastrzebski	88b769b324	Refactor cluster autoscaler builder and add pod list processor.	2018-04-26 12:37:51 +02:00
Aleksandra Malinowska	3d599bfabe	Rephrase unremovable node warning	2018-04-18 13:43:32 +02:00
Aleksandra Malinowska	7e1353a865	Ignore TPU resource in simulations	2018-04-11 12:26:22 +02:00
Aleksandra Malinowska	feb4ad9e14	Add utility for limiting logging	2018-03-22 12:57:22 +01:00
Marcin Wielgus	04bec08e84	Compilation fix	2018-03-20 20:11:36 +01:00
Aleksandra Malinowska	4c594db7f8	Run spellchecker	2018-03-15 15:47:49 +01:00
Aleksandra Malinowska	f98e953eb4	Add regional flag	2018-03-12 14:15:56 +01:00
Maciej Pytel	abbc45da2e	Delay scale-up including GPU request Nodes with GPU are expensive and it's likely a bunch of pods using them will be created in a batch. In this case we can wait a bit for all pods to be created to make more efficient scale-up decision.	2018-03-02 15:55:04 +01:00
Aleksandra Malinowska	9cc322a61d	Disable checking inter pod affinity predicate if only preferred or node affinity used	2018-02-14 14:40:02 +01:00
anniedy	bf59e3daa5	Typo fix unneded->[unneeded] (#623) * Update clusterstate.md * Update scale_down.go * Update static_autoscaler.go	2018-02-07 17:36:58 +01:00
Beata Skiba	346a5c26a9	Remove old unregistered nodes before checking cluster healthiness	2018-02-01 16:34:50 +01:00
Aleksandra Malinowska	b17b6c3ec5	Wait before publishing no nodes ready after start	2018-01-16 19:04:38 +01:00
Aleksandra Malinowska	3894ecb470	Export unregistered node count metric	2018-01-16 16:56:40 +01:00
Aleksandra Malinowska	27efa05b1d	Publish ClusterUnhealthy events	2018-01-16 16:56:36 +01:00
Aleksandra Malinowska	1b728d411b	Publish status and metrics for empty cluster	2018-01-16 16:07:29 +01:00
Aleksandra Malinowska	3d33b64599	Export long unregistered node count metric	2018-01-16 16:07:24 +01:00
Marcin Wielgus	d5f091a886	Merge pull request #508 from mwielgus/wait-for-pods Skip iteration if pending pods are too new	2017-12-28 17:22:38 +01:00
Marcin Wielgus	15b10c8f67	Skip iteration if pending pods are too new	2017-12-28 16:55:44 +01:00
Nic Cope	19607bd285	Remove the Polling Autoscaler.	2017-12-11 13:09:56 -08:00
Nic Cope	982f9e41a3	Support autodetection of GCE managed instance groups by name prefix This commit adds a new usage of the --node-group-auto-discovery flag intended for use with the GCE cloud provider. GCE instance groups can be automatically discovered based on a prefix of their group name. Example usage: --node-group-auto-discovery=mig:prefix=k8s-mig,minNodes=0,maxNodes=10 Note that unlike the existing AWS ASG autodetection functionality we must specify the min and max nodes in the flag. This is because MIGs store only a target size in the GCE API - they do not have a min and max size we can infer via the API. In order to alleviate this limitation a little we allow multiple uses of the autodiscovery flag. For example to discover two classes (big and small) of instance groups with different size limits: ./cluster-autoscaler \ --node-group-auto-discovery=mig:prefix=k8s-a-small,minNodes=1,maxNodes=10 \ --node-group-auto-discovery=mig:prefix=k8s-a-big,minNodes=1,maxNodes=100 Zonal clusters (i.e. multizone = false in the cloud config) will detect all managed instance groups within the cluster's zone. Regional clusters will detect all matching (zonal) managed instance groups within any of that region's zones.	2017-12-11 13:09:56 -08:00
Maciej Pytel	b7f8622eb2	Create node groups with GPU in scale-up.go This is still not implemented in cloudprovider. Extended NewNodeGroup interface to have a way of passing parameters for more complex resources.	2017-12-11 13:12:22 +01:00
Marcin Wielgus	f8c0e20ad9	Source fix after godep update	2017-11-28 14:01:43 +01:00
Marcin Wielgus	2589c43a61	Merge pull request #469 from aleksandra-malinowska/single-unregistered-flag Remove --unregistered-node-removal-time flag	2017-11-16 13:07:52 +01:00
Krzysztof Jastrzebski	6c8d3aa37d	Fix static autoscaler unit tests.	2017-11-15 16:13:18 +01:00
Aleksandra Malinowska	2ff962e53e	Remove --unregistered-node-removal-time flag	2017-11-15 11:11:30 +01:00
Marcin Wielgus	ded016dfd8	Merge pull request #461 from MaciekPytel/gpu_unready_fix Consider GPU nodes unready until allocatable GPU is > 0	2017-11-13 15:29:27 +01:00
Maciej Pytel	d81dca5991	Mark nodes with uninitialized GPUs as unready	2017-11-10 17:56:10 +01:00
Marcin Wielgus	439fd3c9ec	Merge pull request #411 from krzysztof-jastrzebski/priority Adds priority preemption support to cluster autoscaler.	2017-11-08 09:09:26 +01:00
Beata Skiba	2b28ac1a04	Add a workaround for scaling of VMs with GPUs When a machine with GPU becomes ready it can take up to 15 minutes before it reports that GPU is allocatable. This can cause Cluster Autoscaler to trigger a second unnecessary scale up. The workaround sets allocatable to capacity for GPU so that a node that waits for GPUs to become ready to use will be considered as a place where pods requesting GPUs can be scheduled.	2017-11-06 16:04:22 +01:00
Edward Tsang	4104a91991	more spelling fixes	2017-11-02 14:21:36 -07:00
mmerrill3	3d043f73cb	Renaming the interface function to Cleanup() for CloudProvider type	2017-11-01 12:41:13 -04:00
mmerrill3	77aa30a5c1	Fixing for issue 252 by implementing a channel to stop the go routine	2017-11-01 11:00:00 -04:00
Maciej Pytel	c376ef3c87	Add metrics for autoprovisioning	2017-10-31 17:42:58 +01:00
Maciej Pytel	9c2ebccbfe	Write events when autoprovisioned nodegroup is created / deleted	2017-10-25 17:39:30 +02:00
Maciej Pytel	07511f444a	Add Refresh method to cloud provider This can be used to dynamically update cloud provider config (in particular list of managed NodeGroups and their min/max constraints). Add GKE implementation.	2017-10-24 18:36:29 +02:00
Marcin Wielgus	596f478e63	Merge pull request #414 from krzysztof-jastrzebski/resource_limit Adds resource limits to cloud provider.	2017-10-23 20:38:04 +02:00
Krzysztof Jastrzebski	56ac572666	Adds resource limits to cloud provider.	2017-10-23 16:06:56 +02:00
Maciej Pytel	7b95e71315	Use GKE alpha client when autoprovisioning is enabled	2017-10-23 15:21:02 +02:00
Krzysztof Jastrzebski	d9c00e5ce1	Adds priority preemption support to cluster autoscaler.	2017-10-23 09:54:56 +02:00
Maciej Pytel	02ccba3338	Update clusterstate after scale-up	2017-10-17 16:11:25 +02:00
Maciej Pytel	3498507220	Handle nodegroup id changing upon creation	2017-10-17 14:02:46 +02:00
Marcin Wielgus	f658450b16	Merge pull request #379 from MaciekPytel/long_unregistered_node Keep track of nodes that failed to register for a long time	2017-09-28 15:01:32 +02:00
Maciej Pytel	ff21b0b00c	Keep track of nodes that failed to register for a long time Previously a node that failed to register and couldn't be deleted basically broke CA.	2017-09-27 16:32:04 +02:00
Marcin Wielgus	9631f0f136	Merge pull request #375 from MaciekPytel/failed_scale_up_reason Add failed scale-up reason in metric	2017-09-26 19:23:47 +02:00
Maciej Pytel	e12ee88f5f	Add failed scale-up reason in metric	2017-09-26 13:40:34 +02:00
Krzysztof Jastrzebski	16e9106c07	Fix setting target size for group in core/static_autoscaler_test.go.	2017-09-26 10:58:00 +02:00
Krzysztof Jastrzebski	80a7577399	Unit tests.	2017-09-25 11:37:24 +02:00
Maciej Pytel	098ebbee09	Log event when removing unregistered node	2017-09-22 22:48:07 +02:00
Marcin Wielgus	32c4a7ba5c	Merge pull request #360 from aleksandra-malinowska/leaking-taints Fix leaking taints in case of cloud provider error on node deletion	2017-09-22 21:43:55 +01:00
Maciej Pytel	5e05c84cf0	Add metric counting failed scale-ups A minor refactor was required to avoid cyclic imports	2017-09-22 18:12:50 +02:00
Aleksandra Malinowska	4c31a57374	fix leaking taints in case of cloud provider error on node deletion	2017-09-22 17:55:48 +02:00
Matt Terry	63310ef41a	Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay	2017-09-18 17:09:44 -07:00
Marcin Wielgus	f04113d746	Remove TargetSize() from loops iterating over nodes	2017-09-13 22:33:17 +02:00
Marcin Wielgus	303f86c163	Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix Move calculateUnneededOnly check after unneeded calculations	2017-09-13 11:14:51 +02:00
Marcin Wielgus	4bed50d290	Merge pull request #331 from aleksandra-malinowska/min-cluster-cpu-memory Respect minimum cores/memory limit during scale down	2017-09-13 11:12:29 +02:00
Aleksandra Malinowska	197b05b180	respect minimum cores/memory limit during scale down	2017-09-13 10:10:47 +02:00
Krzysztof Jastrzebski	d8db14701e	Core/static_autoscaler_test.go unit tests.	2017-09-13 09:52:07 +02:00
Matt Terry	43943cdeb4	Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start	2017-09-12 21:38:29 -07:00
Aleksandra Malinowska	187c02693e	Taint empty nodes to be deleted	2017-09-12 17:40:05 +02:00
Marcin Wielgus	ef730e19c5	Merge pull request #332 from krzysztof-jastrzebski/scale_up2 Fix filtering for autoprovisioned node groups and add unit test.	2017-09-12 16:40:30 +02:00
Krzysztof Jastrzebski	b1396c3cd1	Fix filtering for autoprovisioned node groups and add unit test.	2017-09-12 16:20:23 +02:00
Marcin Wielgus	738fb640e1	Merge pull request #330 from krzysztof-jastrzebski/core-test4 Core/autoscaling_context_test.go unit tests.	2017-09-12 15:07:22 +02:00
Marcin Wielgus	9d3e52551c	Merge pull request #329 from krzysztof-jastrzebski/scale_down2 Core/scale_down.go unit tests.	2017-09-12 13:12:46 +02:00
Marcin Wielgus	3039a0e813	Merge pull request #319 from krzysztof-jastrzebski/core-test Core/static_autoscaler.go unit tests.	2017-09-12 13:11:11 +02:00
Krzysztof Jastrzebski	001ade48c9	Core/autoscaling_context_test.go unit tests.	2017-09-12 11:04:18 +02:00
Krzysztof Jastrzebski	1db2513f1f	Core/scale_down.go unit tests.	2017-09-12 10:41:19 +02:00
Beata Skiba	eba0fa2f95	Remove nodes that are not in the cluster from unremovableNodes	2017-09-11 20:01:02 +02:00
Krzysztof Jastrzebski	0aec68a46d	Core/static_autoscaler.go unit tests. Current time usage refactoring.	2017-09-11 15:07:21 +02:00
Marcin Wielgus	db63ac3a18	Merge pull request #324 from aleksandra-malinowska/scale-down-pod-not-found Add checking for pod not found error on eviction	2017-09-11 15:10:08 +05:30
Clayton Coleman	e84807e828	Do not include ToBeDeleted taint when constructing a template This results in the simulator being unable to place candidate pods because the taint blocks all scheduling.	2017-09-10 22:31:39 -04:00
Beata Skiba	1d10a14aa0	Merge pull request #318 from bskiba/fix-empty Always add empty nodes to unneeded nodes	2017-09-08 16:31:19 +02:00
Beata Skiba	6e5784a519	Always add empty nodes to unneeded nodes	2017-09-08 15:55:18 +02:00
Aleksandra Malinowska	fbc8462b10	Add checking for not found error	2017-09-08 15:45:44 +02:00
Aleksandra Malinowska	d43029c180	implement blocking scale up beyond max cores & memory	2017-09-08 12:50:00 +02:00
Marcin Wielgus	fc599bd08c	Merge pull request #310 from krzysztof-jastrzebski/core-test Core/utils.go unit tests	2017-09-07 17:15:58 +05:30
Krzysztof Jastrzebski	2295d9bcc4	Core/utils.go unit tests	2017-09-07 13:24:12 +02:00
Marcin Wielgus	f9cabf3a1a	Merge pull request #297 from bskiba/additional-k Only consider up to 10% of the nodes as additional candidates for scale down	2017-09-07 04:34:23 +05:30
Marcin Wielgus	e85e94510d	Tests for add autoprovisioned node groups	2017-09-06 02:44:16 +02:00
Marcin Wielgus	1ad8d9e10c	Build template NodeInfo for node autoprovisioning	2017-09-05 17:28:49 +02:00
Sergey Lanzman	437a3f60e1	Small code optimization	2017-09-04 23:50:45 +03:00
Sergey Lanzman	44195b39a2	Fix small typos	2017-09-04 22:18:07 +03:00
Sergey Lanzman	415f53cdea	Change from deprecated Core to CoreV1 for kube client	2017-09-04 22:16:21 +03:00
Beata Skiba	a6c18b87d2	Only consider up to 10% of the nodes as additional candidates for scale down.	2017-09-04 17:37:02 +02:00
Aleksandra Malinowska	7ae64de0af	Merge pull request #291 from mwielgus/nap-cleanup Clean up empty autoprovisioned node groups	2017-09-04 15:03:26 +02:00
Marcin Wielgus	bcc8cded64	Clean up empty autoprovisioned node groups	2017-09-04 13:53:07 +02:00
Marcin Wielgus	ae00f0544b	Merge pull request #290 from mwielgus/max-nap-groups Limit autoprovisioned groups to 15	2017-09-01 23:49:33 +05:30
Marcin Wielgus	de524a6688	Limit autoprovisioned groups to 15	2017-09-01 18:25:28 +02:00
Maciej Pytel	a440d92a60	Log event on scale-up timeout	2017-09-01 14:19:14 +02:00
Maciej Pytel	a86268f114	Write event on scale-up failure	2017-09-01 13:34:20 +02:00
Marcin Wielgus	c0b48e4a15	Merge pull request #285 from mwielgus/loglevel Set verbosity for each of the glog.Info logs	2017-09-01 16:42:11 +05:30
Marcin Wielgus	021a2fdf5d	Merge pull request #286 from mwielgus/exist-no-error Do not return error from exist	2017-09-01 16:05:52 +05:30
Marcin Wielgus	2d8f59e23d	Set verbosity for each of the glog.Info logs	2017-09-01 12:34:29 +02:00
Marcin Wielgus	f217d4ac93	Do not return error from exist	2017-09-01 00:24:01 +02:00
Beata Skiba	576e4105db	Make ScaleDownNonEmptyCandidatesCount a flag.	2017-08-31 15:05:06 +02:00
Beata Skiba	4560cc0a85	Keep maximum 30 candidates for scale down with drain	2017-08-31 14:58:40 +02:00
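
Commit 5bc77f051c above describes capping a scale-up so that nodes still coming up count against `--max-nodes-total`. The following is a minimal sketch of that idea, not the autoscaler's actual code: the function name `capNewNodes`, its parameters, and the treatment of a non-positive limit as "no limit" are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errMaxNodesReached reuses the wording of the log line quoted in the commit message.
var errMaxNodesReached = errors.New("max node total count already reached")

// capNewNodes (hypothetical helper) limits a scale-up estimate so that
// registered nodes + upcoming nodes + new nodes never exceeds maxTotal.
// registered: nodes already present in the cluster.
// upcoming:   nodes requested earlier that have not registered yet.
// Assumption: maxTotal <= 0 means no limit is configured.
func capNewNodes(estimated, registered, upcoming, maxTotal int) (int, error) {
	if maxTotal <= 0 {
		return estimated, nil
	}
	// Room left once both existing and in-flight nodes are counted.
	room := maxTotal - registered - upcoming
	if room <= 0 {
		return 0, errMaxNodesReached
	}
	if estimated > room {
		return room, nil
	}
	return estimated, nil
}

func main() {
	// Illustrative numbers only; they are not taken from the commit.
	n, err := capNewNodes(18, 5, 10, 23)
	fmt.Println(n, err) // 8 <nil>

	n, err = capNewNodes(18, 3, 20, 23)
	fmt.Println(n, err) // 0 max node total count already reached
}
```

Leaving `upcoming` out of the subtraction reproduces the bug the commit describes: each iteration would see spare room that in-flight nodes have already claimed.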

445 Commits
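
Commit fd3fd85f26 above relies on the `nodeGroup == nil || reflect.ValueOf(nodeGroup).IsNil()` pattern. The sketch below shows why both halves are needed in Go. The `NodeGroup` interface and `fakeGroup` type here are simplified, hypothetical stand-ins, not the cloud provider types from the repository.

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeGroup is a hypothetical, reduced stand-in for a node group interface.
type NodeGroup interface {
	Id() string
}

// fakeGroup is an illustrative implementation with a pointer receiver.
type fakeGroup struct{ id string }

func (g *fakeGroup) Id() string { return g.id }

// isNilNodeGroup applies the two-step check quoted in the commit message.
// The first test catches a nil interface; without it, reflect.ValueOf
// returns a zero Value and IsNil panics. The second test catches an
// interface that is non-nil but wraps a nil pointer.
func isNilNodeGroup(ng NodeGroup) bool {
	return ng == nil || reflect.ValueOf(ng).IsNil()
}

func main() {
	var untyped NodeGroup         // nil interface value
	var ptr *fakeGroup            // nil pointer
	var wrapped NodeGroup = ptr   // non-nil interface holding a nil pointer

	fmt.Println(untyped == nil)          // true
	fmt.Println(wrapped == nil)          // false
	fmt.Println(isNilNodeGroup(untyped)) // true
	fmt.Println(isNilNodeGroup(wrapped)) // true
	fmt.Println(isNilNodeGroup(&fakeGroup{id: "workers"})) // false
}
```

The helper assumes implementations are pointers (or other nilable kinds); `IsNil` panics for kinds such as structs, so a value-receiver implementation would need a different check.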