autoscaler

Commit Graph

Author	SHA1	Message	Date
Jiaxin Shan	83ae66cebc	Consider GPU utilization in scaling down	2019-04-04 01:12:51 -07:00
Jiaxin Shan	90666881d3	Move GPULabel and GPUTypes to cloud provider	2019-03-25 13:03:01 -07:00
Łukasz Osipiuk	ea0d61f93d	Migrate to using api-specific REST clients	2019-03-07 21:38:00 +01:00
Pengfei Ni	128729bae9	Move schedulercache to package nodeinfo	2019-02-21 12:41:08 +08:00
Jacek Kaniuk	f054c53c46	Account for kernel reserved memory in capacity calculations	2019-02-08 17:04:07 +01:00
Kubernetes Prow Robot	bd84757b7e	Merge pull request #1596 from vivekbagade/improve-filterout-logic Added better checks for filterSchedulablePods and added a tunable fla…	2019-01-27 13:00:31 -08:00
Vivek Bagade	c6b87841ce	Added a new method that uses pod packing to filter schedulable pods filterOutSchedulableByPacking is an alternative to the older filterOutSchedulable. filterOutSchedulableByPacking sorts pods in unschedulableCandidates by priority and filters out pods that can be scheduled on free capacity on existing nodes. It uses a basic packing approach to do this. Pods with nominatedNodeName set are always filtered out. filterOutSchedulableByPacking is set to be used by default, but, this can be toggled off by setting filter-out-schedulable-pods-uses-packing flag to false, which would then activate the older and more lenient filterOutSchedulable(now called filterOutSchedulableSimple). Added test cases for both methods.	2019-01-25 16:09:51 +05:30
Jacek Kaniuk	d05dbb9ec4	Refactor tests of tainting Refactor scale down nad deletetaint tests Speed up deletetaint tests	2019-01-25 09:21:41 +01:00
Vivek Bagade	8fff0f6556	Removing nominatedNodeName annotation and moving to pod.Status.NominatedNodeName	2019-01-25 00:06:03 +05:30
Jacek Kaniuk	d00af2373c	Tainting nodes - update first, refresh on conflict	2019-01-24 16:57:27 +01:00
Jacek Kaniuk	0c64e0932a	Tainting unneeded nodes as PreferNoSchedule	2019-01-21 13:06:50 +01:00
Łukasz Osipiuk	b5f9a9505c	Extend backoff interface with NodeInfo and error information	2019-01-09 11:25:34 +01:00
Maciej Pytel	b64139d3cb	Use listers in simulator	2019-01-02 15:55:13 +01:00
Maciej Pytel	9060014992	Use listers in scale-down	2018-12-31 14:55:38 +01:00
Maciej Pytel	39551df790	Reenable statefulset drain test	2018-12-31 14:54:41 +01:00
Maciej Pytel	e1f09b012b	Migrate utils/drain to use listers	2018-12-31 14:54:41 +01:00
Maciej Pytel	ed2e3bff52	Add functions for testing new listers	2018-12-31 11:38:42 +01:00
Maciej Pytel	60babe7158	Use kubernetes lister for daemonset instead of custom one Also migrate to using apps/v1.DaemonSet instead of old extensions/v1beta1.	2018-12-28 13:55:41 +01:00
Maciej Pytel	40811c2f8b	Add listers for more controllers	2018-12-28 13:31:21 +01:00
Łukasz Osipiuk	016bf7fc2c	Use k8s.io/klog instead github.com/golang/glog	2018-11-26 17:30:31 +01:00
Łukasz Osipiuk	991873c237	Fix gofmt errors	2018-11-26 15:39:59 +01:00
mooncake	812549592b	Fix typos: reqest->request, approporiate->appropriate Signed-off-by: mooncake <xcoder@tenxcloud.com>	2018-11-10 20:29:34 +08:00
k8s-ci-robot	7008fb50be	Merge pull request #1380 from losipiuk/lo/backoff Make Backoff interface	2018-11-07 05:13:43 -08:00
Łukasz Osipiuk	0e2c3739b7	Use NodeGroup as key in Backoff	2018-10-30 18:17:26 +01:00
Łukasz Osipiuk	e462d4420c	Extract Backoff interface	2018-10-29 23:02:13 +01:00
Maciej Pytel	6f5e6aab6f	Move node group balancing to processor The goal is to allow customization of this logic for different use-case and cloudproviders.	2018-10-25 14:04:05 +02:00
k8s-ci-robot	03283328a7	Merge pull request #1306 from losipiuk/lo/fluentd-ds-ready Ignore lo/fluentd-ds-ready when checking node similarity	2018-10-10 03:55:57 -07:00
Łukasz Osipiuk	e3891ba025	Ignore lo/fluentd-ds-ready when checking node similarity	2018-10-10 09:57:47 +02:00
Alexey Ermakov	9e8d026b19	deletetaint: retry on conflicts Signed-off-by: Alexey Ermakov <alexey.ermakov@zalando.de>	2018-10-08 11:22:07 +02:00
Łukasz Osipiuk	52aaac362f	Remove GetGpuRequests function	2018-09-05 11:58:46 +02:00
Krishnakumar R	a6b81a6ca2	Code cleanup - use const from the package.	2018-08-30 22:30:44 -07:00
Aleksandra Malinowska	364e2da764	Check for ready condition not true	2018-08-30 13:43:24 +02:00
Aleksandra Malinowska	f5690aab96	Make CheckPredicates return predicateError	2018-08-28 14:11:35 +02:00
Aleksandra Malinowska	8b89c8f9cd	Merge pull request #1168 from yguo0905/preemptible-tpu Ignore resources with Cloud TPU prefix	2018-08-21 09:56:14 +01:00
Yang Guo	b41c9828d9	Ignore resources with Cloud TPU prefix	2018-08-20 17:20:44 -07:00
Pengfei Ni	74045053f5	Fix potential panic	2018-07-23 11:13:09 +08:00
Arto Jantunen	11402b5ca7	Allow using the PodSafeToEvictKey annotation in reverse Adding the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation to a pod prevents the cluster autoscaler from touching it.	2018-07-11 09:32:56 +03:00
Arto Jantunen	9f3c17d153	Add test to use the PodSafeToEvictKey in reverse When this is set to false instead of true, the pod should not be evicted by the autoscaler.	2018-07-11 09:30:20 +03:00
Aleksandra Malinowska	800ee56b34	Refactor and extend GPU metrics error types	2018-07-05 13:13:11 +02:00
Karol Gołąb	553db2c9fc	Separated errors	2018-07-05 11:30:12 +02:00
Karol Gołąb	aae4d1270a	Make GetGpuTypeForMetrics more robust	2018-06-26 21:35:16 +02:00
Karol Gołąb	5eb7021f82	Add GPU-related scaled_up & scaled_down metrics (#974 ) * Add GPU-related scaled_up & scaled_down metrics * Fix name to match SD naming convention * Fix import after master rebase * Change the logic to include GPU-being-installed nodes	2018-06-22 21:00:52 +02:00
Łukasz Osipiuk	57ea19599e	Explicitly return AutoscalerError from GetNodeTargetGpus	2018-06-14 15:46:58 +02:00
Aleksandra Malinowska	bc526e71e8	Merge pull request #960 from krzysztof-jastrzebski/backoff Move backoff mechanism to utils.	2018-06-14 14:35:33 +02:00
Krzysztof Jastrzebski	dd1db7a0ac	Move backoff mechanism to utils.	2018-06-13 15:32:25 +02:00
Łukasz Osipiuk	087a5cc9a9	Respect GPU limits in scale_down	2018-06-13 14:19:59 +02:00
Pengfei Ni	5fd15fc96b	Set enableEquivalenceClassCache for schedulerConfigFactory and fix GPU resource name for unit tests	2018-06-11 15:17:46 +08:00
Pengfei Ni	be3dd85503	Update scheduler cache package	2018-06-11 13:54:12 +08:00
MaciekPytel	c41dc43704	Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes Use bytes instead of MB for memory limits	2018-06-08 14:47:22 +02:00
Maciej Pytel	5faa41e683	Move PodListProcessor to new directory It's not really a util and with more processors coming it makes more sense to keep them in dedicated place.	2018-05-29 12:00:47 +02:00
Clayton Coleman	6146e0dbc1	Autoscaler doesn't drain nodes that have terminal pods Terminal pods (Succeeded or Failed phase) are drained by kubectl drain, and autoscaler should also drain them.	2018-05-28 22:35:37 -04:00
Karol Gołąb	bada827839	Simplify the code by removing superfluous variable	2018-05-18 09:38:47 +02:00
Aleksandra Malinowska	3ccfa5be23	Move universal constants to separate module	2018-05-17 18:36:43 +02:00
Łukasz Osipiuk	c406da4174	Support gpus in nodes and pods definitions in UT	2018-05-15 22:43:31 +02:00
Aleksandra Malinowska	b2ad790121	Merge pull request #830 from aleksandra-malinowska/stateful-set-drain Add support for rescheduled pods with the same name in drain	2018-05-11 13:47:53 +02:00
Aleksandra Malinowska	44ba1c719f	Fix log message	2018-05-11 12:58:32 +02:00
Karol Gołąb	f877f5a64e	Remove unused error handling	2018-05-10 12:15:42 +02:00
Krzysztof Jastrzebski	88b769b324	Refactor cluster autoscaler builder and add pod list processor.	2018-04-26 12:37:51 +02:00
Aleksandra Malinowska	7e1353a865	Ignore TPU resource in simulations	2018-04-11 12:26:22 +02:00
AdamDang	d4ba9120e3	correct the returned message in ready.go feadiness->readiness	2018-04-08 23:00:02 +08:00
Aleksandra Malinowska	8ae3636ccf	Fix method name	2018-03-28 13:23:38 +02:00
Aleksandra Malinowska	feb4ad9e14	Add utility for limiting logging	2018-03-22 12:57:22 +01:00
Marcin Wielgus	04bec08e84	Compilation fix	2018-03-20 20:11:36 +01:00
Aleksandra Malinowska	4c594db7f8	Run spellchecker	2018-03-15 15:47:49 +01:00
AdamDang	5c4693f95f	Typo fix "typ"->"type" line 31 and line 82: "the typ of AutoscalerError" here shoule be type	2018-03-13 19:50:19 +08:00
Maciej Pytel	abbc45da2e	Delay scale-up including GPU request Nodes with GPU are expensive and it's likely a bunch of pods using them will be created in a batch. In this case we can wait a bit for all pods to be created to make more efficient scale-up decision.	2018-03-02 15:55:04 +01:00
Maciej Pytel	d876d74912	Ignore unfitness in price expander if using GPU	2018-03-02 15:50:43 +01:00
Maciej Pytel	b7f8622eb2	Create node groups with GPU in scale-up.go This is still not implemented in cloudprovider. Extended NewNodeGroup inteface to have a way of passing parameters for more complex resources.	2017-12-11 13:12:22 +01:00
Maciej Pytel	6554919700	Helper function to calculate GPU requests for NAP	2017-12-11 13:12:22 +01:00
Marcin Wielgus	f8c0e20ad9	Source fix after godep update	2017-11-28 14:01:43 +01:00
Marcin Wielgus	26960b49df	Merge pull request #460 from sergeylanzman/replace-depricate-func Replace deprecate kubernetes client functions	2017-11-22 15:58:01 +01:00
Marcin Wielgus	ded016dfd8	Merge pull request #461 from MaciekPytel/gpu_unready_fix Consider GPU nodes unready until allocatable GPU is > 0	2017-11-13 15:29:27 +01:00
Maciej Pytel	d81dca5991	Mark nodes with uninitialized GPUs as unready	2017-11-10 17:56:10 +01:00
Sergey Lanzman	eb546b87a0	Replace deprecate kubernetes client functions	2017-11-09 19:49:41 +02:00
Marcin Wielgus	439fd3c9ec	Merge pull request #411 from krzysztof-jastrzebski/priority Adds priority preemption support to cluster autoscaler.	2017-11-08 09:09:26 +01:00
Beata Skiba	2b28ac1a04	Add a workaround for scaling of VMs with GPUs When a machine with GPU becomes ready it can take up to 15 minutes before it reports that GPU is allocatable. This can cause Cluster Autoscaler to trigger a second unnecessary scale up. The workaround sets allocatable to capacity for GPU so that a node that waits for GPUs to become ready to use will be considered as a place where pods requesting GPUs can be scheduled.	2017-11-06 16:04:22 +01:00
Edward Tsang	4104a91991	more spelling fixes	2017-11-02 14:21:36 -07:00
Henrique Rodrigues	56135db3b0	Annotation which indicates that a pod is safe to evict despite other constraints	2017-10-26 09:29:50 -02:00
Krzysztof Jastrzebski	d9c00e5ce1	Adds priority preemption support to cluster autoscaler.	2017-10-23 09:54:56 +02:00
Maciej Pytel	ff21b0b00c	Keep track of nodes that failed to register for a long time Previously a node that failed to register and couldn't be deleted basically broke CA.	2017-09-27 16:32:04 +02:00
Aleksandra Malinowska	7e36ea61c0	Keep graceful termination timeout consistent	2017-09-21 12:54:11 +02:00
Krzysztof Jastrzebski	6b8b8b8fe1	Cloudprovider/gce/gce_manager.go unit tests.	2017-09-19 11:16:08 +02:00
Krzysztof Jastrzebski	0aec68a46d	Core/static_autoscaler.go unit tests. Current time usage refactoring.	2017-09-11 15:07:21 +02:00
Aleksandra Malinowska	d43029c180	implement blocking scale up beyond max cores & memory	2017-09-08 12:50:00 +02:00
Sergey Lanzman	437a3f60e1	Small optimize code	2017-09-04 23:50:45 +03:00
Sergey Lanzman	415f53cdea	Change from deprecated Core to CoreV1 for kube client	2017-09-04 22:16:21 +03:00
Marcin Wielgus	2d8f59e23d	Set verbosity for each of the glog.Info logs	2017-09-01 12:34:29 +02:00
Marcin Wielgus	fbf0d6f499	Merge pull request #271 from aleksandra-malinowska/creator-ref Use OwnerReferences in place of deprecated created by annotation	2017-08-30 04:21:58 +05:30
Aleksandra Malinowska	ac0d8388bc	use OwnerReferences instead of deprecated created by annotation	2017-08-29 17:26:38 +02:00
Maciej Pytel	281afa7147	precompute predicateMetadata in scale-down	2017-08-29 16:29:45 +02:00
Maciej Pytel	fb6ef75d12	Don't create verbose errors in predicates if we ignore them Turns out all this string formatting is pretty damn expensive.	2017-08-24 15:18:38 +02:00
Marcin Wielgus	33f3fcdef9	NAP - pick best labels for pods	2017-08-17 10:47:15 +02:00
Marcin Wielgus	b8c1fc2b01	Fix listers in CA after godep update	2017-08-14 00:14:31 +02:00
Marcin Wielgus	9116e4c08c	Compilation fix for CA after godeps update	2017-08-11 17:56:47 +02:00
Aleksandra Malinowska	c159a90f04	rename test provider package	2017-07-06 16:23:15 +02:00
Marcin Wielgus	fc43808149	Godeps bump for CA	2017-07-03 22:05:11 +02:00
Marcin Wielgus	1bedee5707	Update GODEPS	2017-06-13 14:48:24 +02:00
Marcin Wielgus	69c77791a2	Fix error types	2017-06-12 21:26:50 +02:00
Marcin Wielgus	0fd87aeca7	Merge pull request #100 from aleksandra-malinowska/evict-kube-system-pods Add allowing eviction of kube-system pods with PDB	2017-06-02 10:33:07 -07:00
Maciej Pytel	7c327a951f	Function to balance scale-up between node groups	2017-06-02 15:53:50 +02:00


208 Commits
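Commit c6b87841ce describes filterOutSchedulableByPacking in prose. It sorts the unschedulable candidates by priority, always filters out pods that have nominatedNodeName set, and uses basic packing to filter out pods that fit on free capacity on existing nodes. The Go sketch below illustrates that packing idea under simplifying assumptions. The Pod and Node types, their CPU and memory fields, and the first-fit placement rule are hypothetical stand-ins. They are not the Kubernetes API types or the scheduler predicate checks the real implementation relies on.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, simplified pod with only the fields this sketch needs
// (not the Kubernetes apiv1.Pod type).
type Pod struct {
	Name              string
	Priority          int32
	CPUMilli          int64
	MemBytes          int64
	NominatedNodeName string
}

// Node is a hypothetical summary of an existing node's remaining free capacity.
type Node struct {
	Name         string
	FreeCPUMilli int64
	FreeMemBytes int64
}

// filterOutSchedulableByPacking returns the pods that still cannot be placed
// after greedily packing higher-priority pods onto free node capacity.
// Assumption: a simple first-fit check on CPU and memory stands in for the
// real scheduler predicates.
func filterOutSchedulableByPacking(candidates []Pod, nodes []Node) []Pod {
	// Work on copies so the caller's slices are not mutated.
	pods := append([]Pod(nil), candidates...)
	free := append([]Node(nil), nodes...)

	// Highest priority first; stable so equal-priority pods keep input order.
	sort.SliceStable(pods, func(i, j int) bool { return pods[i].Priority > pods[j].Priority })

	var unschedulable []Pod
	for _, p := range pods {
		// Pods already nominated to a node are always filtered out.
		if p.NominatedNodeName != "" {
			continue
		}
		placed := false
		for i := range free {
			if free[i].FreeCPUMilli >= p.CPUMilli && free[i].FreeMemBytes >= p.MemBytes {
				// Reserve the capacity so later pods see the reduced free space.
				free[i].FreeCPUMilli -= p.CPUMilli
				free[i].FreeMemBytes -= p.MemBytes
				placed = true
				break
			}
		}
		if !placed {
			unschedulable = append(unschedulable, p)
		}
	}
	return unschedulable
}

func main() {
	nodes := []Node{{Name: "n1", FreeCPUMilli: 1000, FreeMemBytes: 1 << 30}}
	pods := []Pod{
		{Name: "low", Priority: 0, CPUMilli: 600, MemBytes: 256 << 20},
		{Name: "high", Priority: 100, CPUMilli: 600, MemBytes: 256 << 20},
		{Name: "nominated", Priority: 50, CPUMilli: 600, MemBytes: 256 << 20, NominatedNodeName: "n1"},
	}
	// Prints the names of pods that remain unschedulable.
	for _, p := range filterOutSchedulableByPacking(pods, nodes) {
		fmt.Println(p.Name)
	}
}
```

Because capacity is reserved as each pod is placed, the higher-priority pod claims the node first and the lower-priority pod is reported as unschedulable. A per-pod check against the original free capacity would have accepted both, which is why the commit calls the older filterOutSchedulableSimple more lenient.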