Krzysztof Jastrzebski
2df2568841
Move removing unneeded autoprovisioned node groups to node group manager
2018-06-22 14:26:12 +02:00
Nic Doye
ebadbda2b2
issues/933 Consider making UnremovableNodeRecheckTimeout configurable
2018-06-18 11:54:14 +01:00
Aleksandra Malinowska
ed5e82d85d
Merge pull request #956 from krzysztof-jastrzebski/master
...
Create NodeGroupManager which is responsible for creating…
2018-06-14 17:25:32 +02:00
Łukasz Osipiuk
51d628c2f1
Add test to check if nodes from not autoscaled groups are used in max-nodes limit
2018-06-14 16:17:51 +02:00
Krzysztof Jastrzebski
99c8c51bb3
Create NodeGroupManager which is responsible for creating/deleting node groups.
2018-06-14 16:11:32 +02:00
Łukasz Osipiuk
b7323bc0d1
Respect GPU limits in scale_up
2018-06-14 15:46:58 +02:00
Łukasz Osipiuk
dfcbedb41f
Take into consideration nodes from not autoscaled groups when enforcing resource limits
2018-06-14 15:31:40 +02:00
Łukasz Osipiuk
b1db155c50
Remove duplicated test case
2018-06-13 19:00:37 +02:00
Łukasz Osipiuk
9f75099d2c
Restructure checking resource limits in scale_up.go
...
Preparatory work before introducing GPU limits
2018-06-13 19:00:37 +02:00
Łukasz Osipiuk
087a5cc9a9
Respect GPU limits in scale_down
2018-06-13 14:19:59 +02:00
Łukasz Osipiuk
1fa44a4d3a
Fix bug resulting in resource limits not being enforced in scale_down
2018-06-11 16:39:07 +02:00
Łukasz Osipiuk
519064e1ec
Extract isNodeBeingDeleted function
2018-06-11 14:21:07 +02:00
Łukasz Osipiuk
6c57a01fc9
Restructure checking resource limits in scale_down.go
2018-06-11 14:02:40 +02:00
Pengfei Ni
be3dd85503
Update scheduler cache package
2018-06-11 13:54:12 +08:00
Łukasz Osipiuk
9c61477d25
Do not return error when getting cpu/memory capacity of node
2018-06-08 15:04:57 +02:00
MaciekPytel
c41dc43704
Merge pull request #495 from aleksandra-malinowska/resource-limiter-bytes
...
Use bytes instead of MB for memory limits
2018-06-08 14:47:22 +02:00
Beata Skiba
b8ae6df5d3
Add post scale up status processor.
2018-06-06 13:34:49 +02:00
Maciej Pytel
856855987b
Move some GKE-specific logic outside core
...
No change in actual logic being executed. Added a new
NodeGroupListProcessor interface to encapsulate the existing logic.
Moved PodListProcessor and refactored how it's passed around
to make it consistent and easy to add similar interfaces.
2018-05-29 12:57:19 +02:00
Maciej Pytel
5faa41e683
Move PodListProcessor to new directory
...
It's not really a util, and with more processors
coming it makes more sense to keep them in a dedicated place.
2018-05-29 12:00:47 +02:00
Krzysztof Jastrzebski
6761d7f354
Execute predicates only for similar pods.
2018-05-29 09:36:11 +02:00
Krzysztof Jastrzebski
adad14c2c9
Delete autoprovisioned node pool after all nodes are deleted.
2018-05-28 14:22:18 +02:00
Karol Gołąb
4c710950de
Move ClusterStateRegistry to StaticAutoscaler
...
AutoscalingContext is basically a configuration and a few static helpers
and API handles.
ClusterStateRegistry is state and thus moved to other state-keeping
objects.
2018-05-24 13:03:01 +02:00
Marcin Wielgus
494c2aff1b
Merge pull request #883 from kgolab/kg-clean-up-016
...
Reorder & extract initial parts of RunOnce
2018-05-22 10:06:27 +02:00
Karol Gołąb
5bfab7d9b2
Return value moved to the caller
2018-05-18 14:59:15 +02:00
Joachim Bartosik
bfb70e40ee
Allow passing taints to Node Group creation.
2018-05-18 14:33:33 +02:00
Karol Gołąb
fa6f25a70a
Extract ClusterStateRegistry update with its soft dependency
2018-05-18 10:25:15 +02:00
Karol Gołąb
dc34b43a40
Extract another tiny method
2018-05-18 10:10:51 +02:00
Karol Gołąb
34f6a45a04
Extract method to hide a tiny bit of complexity
2018-05-18 10:01:52 +02:00
Aleksandra Malinowska
3ccfa5be23
Move universal constants to separate module
2018-05-17 18:36:43 +02:00
Aleksandra Malinowska
fcc3d004f5
Use bytes instead of MB for memory limits
2018-05-17 17:35:39 +02:00
Aleksandra Malinowska
d7dc3616f7
Merge pull request #868 from kgolab/kg-clean-up-010
...
Move metrics update to proper place
2018-05-17 14:52:18 +02:00
Karol Gołąb
e31bf0bb58
Move metrics.Autoscaling after all Node-level operations & checks
2018-05-17 14:37:43 +02:00
Aleksandra Malinowska
3b6cfc7c2b
Merge pull request #870 from kgolab/kg-clean-up-012
...
Set lastScaleDownFailTime properly
2018-05-17 12:09:15 +02:00
MaciekPytel
444201d1e7
Merge pull request #871 from kgolab/kg-clean-up-013
...
Extract duplicate code into a single method
2018-05-17 11:49:49 +02:00
Karol Gołąb
400147a075
Extract duplicate code into a single method
2018-05-17 10:01:04 +02:00
Karol Gołąb
b8cbdf4178
Set lastScaleDownFailTime properly - the ScaleDownError check was unreachable
2018-05-17 09:50:22 +02:00
Karol Gołąb
38a5951e22
Check glog.V once
2018-05-17 09:47:52 +02:00
Karol Gołąb
ccca078a2b
Move metrics update to proper place
2018-05-17 09:46:25 +02:00
Łukasz Osipiuk
eb6eff282a
Add gpu related tests to scale_up_test
2018-05-15 22:43:31 +02:00
Łukasz Osipiuk
c406da4174
Support gpus in nodes and pods definitions in UT
2018-05-15 22:43:31 +02:00
Łukasz Osipiuk
be381facfb
Introduce asserting expanding strategy for scale_up_test
2018-05-15 17:01:31 +02:00
Łukasz Osipiuk
c1073fe23a
Model expected scale up in scale_up_test with struct
2018-05-15 17:01:30 +02:00
Łukasz Osipiuk
8bdc6a1bdc
Move common structs from scale_up_test.go to scale_test_common.go
2018-05-15 17:00:45 +02:00
Karol Gołąb
74b540fdab
Remove DynamicAutoscaler since it's unused ( #851 )
...
* Remove DynamicAutoscaler since it's unused
* Remove configmap flag with its unused-elsewhere dependencies
* gofmt
2018-05-14 20:22:42 +02:00
MaciekPytel
bc39d4dcd5
Merge pull request #842 from kgolab/kg-clean-up-008
...
Merge two variables into one.
2018-05-14 10:54:43 +02:00
Aleksandra Malinowska
b52ec59b05
Fix cleaning up taints
2018-05-11 12:00:48 +02:00
Karol Gołąb
f1f92f065e
Merge two variables into one.
2018-05-10 14:32:37 +02:00
Aleksandra Malinowska
ffeebde8d8
Add support for rescheduled pods with the same name in drain
2018-05-10 12:00:56 +02:00
Marcin Wielgus
9c5728fd74
Merge pull request #836 from kgolab/kg-clean-up-004
...
Use timestamp argument
2018-05-08 20:24:37 +02:00
Karol Gołąb
53b1c6a394
Use timestamp argument
2018-05-08 13:08:30 +02:00
MaciekPytel
e5659e7c57
Merge pull request #835 from kgolab/kg-clean-up-003
...
Make the code slightly more idiomatic go
2018-05-08 12:58:14 +02:00
Karol Gołąb
da16642bcf
Make the code slightly more idiomatic go
2018-05-08 11:35:01 +02:00
Karol Gołąb
ae203ed517
Removed unused CloudProvider() method.
2018-05-08 11:23:55 +02:00
Karol Gołąb
854fcc1ff8
Remove implementation details (CleanUp) from the interface.
...
The CleanUp method is instead called directly from the implementation,
when required.
Test updated in a quick way since the mock we're using does not support
AtLeast(1) - thus Times(2).
2018-05-07 15:24:14 +02:00
Beata Skiba
054f6d8650
Merge pull request #794 from krzysztof-jastrzebski/pods
...
Refactor cluster autoscaler builder and add pod list processor.
2018-04-26 13:08:56 +02:00
Krzysztof Jastrzebski
88b769b324
Refactor cluster autoscaler builder and add pod list processor.
2018-04-26 12:37:51 +02:00
Aleksandra Malinowska
3d599bfabe
Rephrase unremovable node warning
2018-04-18 13:43:32 +02:00
Aleksandra Malinowska
7e1353a865
Ignore TPU resource in simulations
2018-04-11 12:26:22 +02:00
Aleksandra Malinowska
feb4ad9e14
Add utility for limiting logging
2018-03-22 12:57:22 +01:00
Marcin Wielgus
04bec08e84
Compilation fix
2018-03-20 20:11:36 +01:00
Aleksandra Malinowska
4c594db7f8
Run spellchecker
2018-03-15 15:47:49 +01:00
Aleksandra Malinowska
f98e953eb4
Add regional flag
2018-03-12 14:15:56 +01:00
Maciej Pytel
abbc45da2e
Delay scale-up including GPU request
...
Nodes with GPU are expensive and it's likely a bunch of pods
using them will be created in a batch. In this case we can
wait a bit for all pods to be created to make more efficient
scale-up decision.
2018-03-02 15:55:04 +01:00
Aleksandra Malinowska
9cc322a61d
Disable checking inter pod affinity predicate if only preferred or node affinity used
2018-02-14 14:40:02 +01:00
anniedy
bf59e3daa5
Typo fix unneded->[unneeded] ( #623 )
...
* Update clusterstate.md
* Update scale_down.go
* Update static_autoscaler.go
2018-02-07 17:36:58 +01:00
Beata Skiba
346a5c26a9
Remove old unregistered nodes before checking cluster healthiness
2018-02-01 16:34:50 +01:00
Aleksandra Malinowska
b17b6c3ec5
Wait before publishing no nodes ready after start
2018-01-16 19:04:38 +01:00
Aleksandra Malinowska
3894ecb470
Export unregistered node count metric
2018-01-16 16:56:40 +01:00
Aleksandra Malinowska
27efa05b1d
Publish ClusterUnhealthy events
2018-01-16 16:56:36 +01:00
Aleksandra Malinowska
1b728d411b
Publish status and metrics for empty cluster
2018-01-16 16:07:29 +01:00
Aleksandra Malinowska
3d33b64599
Export long unregistered node count metric
2018-01-16 16:07:24 +01:00
Marcin Wielgus
d5f091a886
Merge pull request #508 from mwielgus/wait-for-pods
...
Skip iteration if pending pods are too new
2017-12-28 17:22:38 +01:00
Marcin Wielgus
15b10c8f67
Skip iteration if pending pods are too new
2017-12-28 16:55:44 +01:00
Nic Cope
19607bd285
Remove the Polling Autoscaler.
2017-12-11 13:09:56 -08:00
Nic Cope
982f9e41a3
Support autodetection of GCE managed instance groups by name prefix
...
This commit adds a new usage of the --node-group-auto-discovery flag intended
for use with the GCE cloud provider. GCE instance groups can be automatically
discovered based on a prefix of their group name. Example usage:
--node-group-auto-discovery=mig:prefix=k8s-mig,minNodes=0,maxNodes=10
Note that unlike the existing AWS ASG autodetection functionality we must
specify the min and max nodes in the flag. This is because MIGs store only
a target size in the GCE API - they do not have a min and max size we can
infer via the API.
In order to alleviate this limitation a little we allow multiple uses of the
autodiscovery flag. For example to discover two classes (big and small) of
instance groups with different size limits:
./cluster-autoscaler \
--node-group-auto-discovery=mig:prefix=k8s-a-small,minNodes=1,maxNodes=10 \
--node-group-auto-discovery=mig:prefix=k8s-a-big,minNodes=1,maxNodes=100
Zonal clusters (i.e. multizone = false in the cloud config) will detect all
managed instance groups within the cluster's zone. Regional clusters will
detect all matching (zonal) managed instance groups within any of that region's
zones.
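The flag format described above can be illustrated with a minimal Go sketch. This is not the parser from the commit itself: the `migDiscoverySpec` type and the `parseMIGDiscoverySpec` function are hypothetical names introduced only for illustration, and the validation rules (all three keys required, non-negative sizes, max not below min) are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// migDiscoverySpec is a hypothetical holder for one value of the
// --node-group-auto-discovery flag in its "mig:" form.
type migDiscoverySpec struct {
	Prefix   string
	MinNodes int
	MaxNodes int
}

// parseMIGDiscoverySpec parses strings such as
// "mig:prefix=k8s-mig,minNodes=0,maxNodes=10".
// Assumption: all three keys are required, because MIGs expose only a
// target size and the bounds cannot be inferred from the GCE API.
func parseMIGDiscoverySpec(spec string) (migDiscoverySpec, error) {
	var out migDiscoverySpec
	kind, rest, ok := strings.Cut(spec, ":")
	if !ok || kind != "mig" {
		return out, fmt.Errorf("expected mig:key=value,..., got %q", spec)
	}
	seen := map[string]bool{}
	for _, pair := range strings.Split(rest, ",") {
		key, value, ok := strings.Cut(pair, "=")
		if !ok {
			return out, fmt.Errorf("malformed key=value pair %q", pair)
		}
		switch key {
		case "prefix":
			out.Prefix = value
		case "minNodes", "maxNodes":
			n, err := strconv.Atoi(value)
			if err != nil {
				return out, fmt.Errorf("invalid %s %q: %v", key, value, err)
			}
			if n < 0 {
				return out, fmt.Errorf("%s must not be negative, got %d", key, n)
			}
			if key == "minNodes" {
				out.MinNodes = n
			} else {
				out.MaxNodes = n
			}
		default:
			return out, fmt.Errorf("unknown key %q", key)
		}
		seen[key] = true
	}
	for _, required := range []string{"prefix", "minNodes", "maxNodes"} {
		if !seen[required] {
			return out, fmt.Errorf("missing required key %q", required)
		}
	}
	if out.MaxNodes < out.MinNodes {
		return out, fmt.Errorf("maxNodes %d is below minNodes %d", out.MaxNodes, out.MinNodes)
	}
	return out, nil
}

func main() {
	// The flag may be repeated, so each occurrence is parsed on its own.
	flags := []string{
		"mig:prefix=k8s-a-small,minNodes=1,maxNodes=10",
		"mig:prefix=k8s-a-big,minNodes=1,maxNodes=100",
	}
	for _, f := range flags {
		spec, err := parseMIGDiscoverySpec(f)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", spec)
	}
}
```

Parsing each occurrence independently mirrors the repeated-flag usage shown above, where the small and big classes carry different size limits.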
2017-12-11 13:09:56 -08:00
Maciej Pytel
b7f8622eb2
Create node groups with GPU in scale-up.go
...
This is still not implemented in cloudprovider.
Extended NewNodeGroup interface to have a way of passing
parameters for more complex resources.
2017-12-11 13:12:22 +01:00
Marcin Wielgus
f8c0e20ad9
Source fix after godep update
2017-11-28 14:01:43 +01:00
Marcin Wielgus
2589c43a61
Merge pull request #469 from aleksandra-malinowska/single-unregistered-flag
...
Remove --unregistered-node-removal-time flag
2017-11-16 13:07:52 +01:00
Krzysztof Jastrzebski
6c8d3aa37d
Fix static autoscaler unit tests.
2017-11-15 16:13:18 +01:00
Aleksandra Malinowska
2ff962e53e
Remove --unregistered-node-removal-time flag
2017-11-15 11:11:30 +01:00
Marcin Wielgus
ded016dfd8
Merge pull request #461 from MaciekPytel/gpu_unready_fix
...
Consider GPU nodes unready until allocatable GPU is > 0
2017-11-13 15:29:27 +01:00
Maciej Pytel
d81dca5991
Mark nodes with uninitialized GPUs as unready
2017-11-10 17:56:10 +01:00
Marcin Wielgus
439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
...
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Beata Skiba
2b28ac1a04
Add a workaround for scaling of VMs with GPUs
...
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
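The workaround described above can be sketched in Go roughly as follows. The `nodeResources` type, the `withGPUAllocatableFromCapacity` function, and the `example.com/gpu` resource name are all hypothetical placeholders, not the types or names used by Cluster Autoscaler or the Kubernetes API.

```go
package main

import "fmt"

// gpuResourceName is a placeholder; the real resource name depends on
// the Kubernetes version and device plugin in use.
const gpuResourceName = "example.com/gpu"

// nodeResources is a simplified, hypothetical stand-in for a node's
// capacity and allocatable resource lists.
type nodeResources struct {
	Capacity    map[string]int64
	Allocatable map[string]int64
}

// withGPUAllocatableFromCapacity returns a copy of n in which the
// allocatable GPU count is set to the GPU capacity. A node whose GPUs
// are not yet reported as allocatable is then still treated as a place
// where GPU-requesting pods can be scheduled.
func withGPUAllocatableFromCapacity(n nodeResources) nodeResources {
	out := nodeResources{
		Capacity:    n.Capacity,
		Allocatable: make(map[string]int64, len(n.Allocatable)+1),
	}
	// Copy the map so the caller's view of the node is left untouched.
	for name, qty := range n.Allocatable {
		out.Allocatable[name] = qty
	}
	if gpus := n.Capacity[gpuResourceName]; gpus > 0 {
		out.Allocatable[gpuResourceName] = gpus
	}
	return out
}

func main() {
	fresh := nodeResources{
		Capacity:    map[string]int64{"cpu": 8, gpuResourceName: 2},
		Allocatable: map[string]int64{"cpu": 8}, // GPUs not reported yet
	}
	patched := withGPUAllocatableFromCapacity(fresh)
	fmt.Println("allocatable GPUs before:", fresh.Allocatable[gpuResourceName])
	fmt.Println("allocatable GPUs after: ", patched.Allocatable[gpuResourceName])
}
```

Working on a copy keeps the adjustment local to the simulation, which is a design assumption of this sketch rather than something the commit message states.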
2017-11-06 16:04:22 +01:00
Edward Tsang
4104a91991
more spelling fixes
2017-11-02 14:21:36 -07:00
mmerrill3
3d043f73cb
Renaming the interface function to Cleanup() for CloudProvider type
2017-11-01 12:41:13 -04:00
mmerrill3
77aa30a5c1
Fixing for issue 252 by implementing a channel to stop the go routine
2017-11-01 11:00:00 -04:00
Maciej Pytel
c376ef3c87
Add metrics for autoprovisioning
2017-10-31 17:42:58 +01:00
Maciej Pytel
9c2ebccbfe
Write events when autoprovisioned nodegroup is created / deleted
2017-10-25 17:39:30 +02:00
Maciej Pytel
07511f444a
Add Refresh method to cloud provider
...
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
Marcin Wielgus
596f478e63
Merge pull request #414 from krzysztof-jastrzebski/resource_limit
...
Adds resource limits to cloud provider.
2017-10-23 20:38:04 +02:00
Krzysztof Jastrzebski
56ac572666
Adds resource limits to cloud provider.
2017-10-23 16:06:56 +02:00
Maciej Pytel
7b95e71315
Use GKE alpha client when autoprovisioning is enabled
2017-10-23 15:21:02 +02:00
Krzysztof Jastrzebski
d9c00e5ce1
Adds priority preemption support to cluster autoscaler.
2017-10-23 09:54:56 +02:00
Maciej Pytel
02ccba3338
Update clusterstate after scale-up
2017-10-17 16:11:25 +02:00
Maciej Pytel
3498507220
Handle nodegroup id changing upon creation
2017-10-17 14:02:46 +02:00
Marcin Wielgus
f658450b16
Merge pull request #379 from MaciekPytel/long_unregistered_node
...
Keep track of nodes that failed to register for a long time
2017-09-28 15:01:32 +02:00
Maciej Pytel
ff21b0b00c
Keep track of nodes that failed to register for a long time
...
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Marcin Wielgus
9631f0f136
Merge pull request #375 from MaciekPytel/failed_scale_up_reason
...
Add failed scale-up reason in metric
2017-09-26 19:23:47 +02:00
Maciej Pytel
e12ee88f5f
Add failed scale-up reason in metric
2017-09-26 13:40:34 +02:00
Krzysztof Jastrzebski
16e9106c07
Fix setting target size for group in core/static_autoscaler_test.go.
2017-09-26 10:58:00 +02:00
Krzysztof Jastrzebski
80a7577399
Unit tests.
2017-09-25 11:37:24 +02:00
Maciej Pytel
098ebbee09
Log event when removing unregistered node
2017-09-22 22:48:07 +02:00
Marcin Wielgus
32c4a7ba5c
Merge pull request #360 from aleksandra-malinowska/leaking-taints
...
Fix leaking taints in case of cloud provider error on node deletion
2017-09-22 21:43:55 +01:00
Maciej Pytel
5e05c84cf0
Add metric counting failed scale-ups
...
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Aleksandra Malinowska
4c31a57374
fix leaking taints in case of cloud provider error on node deletion
2017-09-22 17:55:48 +02:00
Matt Terry
63310ef41a
Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay
2017-09-18 17:09:44 -07:00
Marcin Wielgus
f04113d746
Remove TargetSize() from loops iterating over nodes
2017-09-13 22:33:17 +02:00
Marcin Wielgus
303f86c163
Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
...
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Marcin Wielgus
4bed50d290
Merge pull request #331 from aleksandra-malinowska/min-cluster-cpu-memory
...
Respect minimum cores/memory limit during scale down
2017-09-13 11:12:29 +02:00
Aleksandra Malinowska
197b05b180
respect minimum cores/memory limit during scale down
2017-09-13 10:10:47 +02:00
Krzysztof Jastrzebski
d8db14701e
Core/static_autoscaler_test.go unit tests.
2017-09-13 09:52:07 +02:00
Matt Terry
43943cdeb4
Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start
2017-09-12 21:38:29 -07:00
Aleksandra Malinowska
187c02693e
Taint empty nodes to be deleted
2017-09-12 17:40:05 +02:00
Marcin Wielgus
ef730e19c5
Merge pull request #332 from krzysztof-jastrzebski/scale_up2
...
Fix filtering for autoprovisioned node groups and add unit test.
2017-09-12 16:40:30 +02:00
Krzysztof Jastrzebski
b1396c3cd1
Fix filtering for autoprovisioned node groups and add unit test.
2017-09-12 16:20:23 +02:00
Marcin Wielgus
738fb640e1
Merge pull request #330 from krzysztof-jastrzebski/core-test4
...
Core/autoscaling_context_test.go unit tests.
2017-09-12 15:07:22 +02:00
Marcin Wielgus
9d3e52551c
Merge pull request #329 from krzysztof-jastrzebski/scale_down2
...
Core/scale_down.go unit tests.
2017-09-12 13:12:46 +02:00
Marcin Wielgus
3039a0e813
Merge pull request #319 from krzysztof-jastrzebski/core-test
...
Core/static_autoscaler.go unit tests.
2017-09-12 13:11:11 +02:00
Krzysztof Jastrzebski
001ade48c9
Core/autoscaling_context_test.go unit tests.
2017-09-12 11:04:18 +02:00
Krzysztof Jastrzebski
1db2513f1f
Core/scale_down.go unit tests.
2017-09-12 10:41:19 +02:00
Beata Skiba
eba0fa2f95
Remove nodes that are not in the cluster from unremovableNodes
2017-09-11 20:01:02 +02:00
Krzysztof Jastrzebski
0aec68a46d
Core/static_autoscaler.go unit tests. Current time usage refactoring.
2017-09-11 15:07:21 +02:00
Marcin Wielgus
db63ac3a18
Merge pull request #324 from aleksandra-malinowska/scale-down-pod-not-found
...
Add checking for pod not found error on eviction
2017-09-11 15:10:08 +05:30
Clayton Coleman
e84807e828
Do not include ToBeDeleted taint when constructing a template
...
This results in the simulator being unable to place candidate pods
because the taint blocks all scheduling.
2017-09-10 22:31:39 -04:00
Beata Skiba
1d10a14aa0
Merge pull request #318 from bskiba/fix-empty
...
Always add empty nodes to unneeded nodes
2017-09-08 16:31:19 +02:00
Beata Skiba
6e5784a519
Always add empty nodes to unneeded nodes
2017-09-08 15:55:18 +02:00
Aleksandra Malinowska
fbc8462b10
Add checking for not found error
2017-09-08 15:45:44 +02:00
Aleksandra Malinowska
d43029c180
implement blocking scale up beyond max cores & memory
2017-09-08 12:50:00 +02:00
Marcin Wielgus
fc599bd08c
Merge pull request #310 from krzysztof-jastrzebski/core-test
...
Core/utils.go unit tests
2017-09-07 17:15:58 +05:30
Krzysztof Jastrzebski
2295d9bcc4
Core/utils.go unit tests
2017-09-07 13:24:12 +02:00
Marcin Wielgus
f9cabf3a1a
Merge pull request #297 from bskiba/additional-k
...
Only consider up to 10% of the nodes as additional candidates for scale down
2017-09-07 04:34:23 +05:30
Marcin Wielgus
e85e94510d
Tests for add autoprovisioned node groups
2017-09-06 02:44:16 +02:00
Marcin Wielgus
1ad8d9e10c
Build template NodeInfo for node autoprovisioning
2017-09-05 17:28:49 +02:00
Sergey Lanzman
437a3f60e1
Small code optimization
2017-09-04 23:50:45 +03:00
Sergey Lanzman
44195b39a2
Fix small typos
2017-09-04 22:18:07 +03:00
Sergey Lanzman
415f53cdea
Change from deprecated Core to CoreV1 for kube client
2017-09-04 22:16:21 +03:00
Beata Skiba
a6c18b87d2
Only consider up to 10% of the nodes as additional candidates for scale down.
2017-09-04 17:37:02 +02:00
Aleksandra Malinowska
7ae64de0af
Merge pull request #291 from mwielgus/nap-cleanup
...
Clean up empty autoprovisioned node groups
2017-09-04 15:03:26 +02:00
Marcin Wielgus
bcc8cded64
Clean up empty autoprovisioned node groups
2017-09-04 13:53:07 +02:00
Marcin Wielgus
ae00f0544b
Merge pull request #290 from mwielgus/max-nap-groups
...
Limit autoprovisioned groups to 15
2017-09-01 23:49:33 +05:30
Marcin Wielgus
de524a6688
Limit autoprovisioned groups to 15
2017-09-01 18:25:28 +02:00
Maciej Pytel
a440d92a60
Log event on scale-up timeout
2017-09-01 14:19:14 +02:00
Maciej Pytel
a86268f114
Write event on scale-up failure
2017-09-01 13:34:20 +02:00
Marcin Wielgus
c0b48e4a15
Merge pull request #285 from mwielgus/loglevel
...
Set verbosity for each of the glog.Info logs
2017-09-01 16:42:11 +05:30
Marcin Wielgus
021a2fdf5d
Merge pull request #286 from mwielgus/exist-no-error
...
Do not return error from exist
2017-09-01 16:05:52 +05:30
Marcin Wielgus
2d8f59e23d
Set verbosity for each of the glog.Info logs
2017-09-01 12:34:29 +02:00
Marcin Wielgus
f217d4ac93
Do not return error from exist
2017-09-01 00:24:01 +02:00
Beata Skiba
576e4105db
Make ScaleDownNonEmptyCandidatesCount a flag.
2017-08-31 15:05:06 +02:00
Beata Skiba
4560cc0a85
Keep maximum 30 candidates for scale down with drain
2017-08-31 14:58:40 +02:00
Marcin Wielgus
e9261a249c
Merge pull request #284 from mwielgus/nap-5
...
Node autoprovisioning in scale up
2017-08-31 17:47:25 +05:30
Marcin Wielgus
22f856d4da
Small refactoring in ScaleUp
2017-08-31 13:21:20 +02:00
Marcin Wielgus
6b9e56f0f9
Node autoprovisioning in scale up
2017-08-31 01:33:52 +02:00
Marcin Wielgus
19507aa0de
Node autoprovisioning flag
2017-08-31 00:48:54 +02:00
Maciej Pytel
69c5ea03ce
Disable MatchInterPodAffinity if there are no pods using affinity
2017-08-30 16:18:31 +02:00
Marcin Wielgus
fbf0d6f499
Merge pull request #271 from aleksandra-malinowska/creator-ref
...
Use OwnerReferences in place of deprecated created by annotation
2017-08-30 04:21:58 +05:30
Aleksandra Malinowska
ac0d8388bc
use OwnerReferences instead of deprecated created by annotation
2017-08-29 17:26:38 +02:00
Maciej Pytel
281afa7147
precompute predicateMetadata in scale-down
2017-08-29 16:29:45 +02:00
Marcin Wielgus
51a5ad58c0
GKE NodePool support for NAP - get NP/Migs via api - part 1
2017-08-28 20:50:02 +02:00
Marcin Wielgus
191d140107
Don't increase pod graceful termination
2017-08-28 16:54:19 +02:00
Marcin Wielgus
6ad7ca21e8
Merge pull request #265 from MaciekPytel/ignore_unneded_if_min_size
...
Skip nodes in min-sized groups in scale-down simulation
2017-08-28 19:40:53 +05:30
Marcin Wielgus
9e2c76551f
Merge pull request #263 from mwielgus/delete-in-goroutine
...
Run node drain/delete in a separate goroutine
2017-08-28 19:39:57 +05:30
Maciej Pytel
2f6dd8aefc
Skip nodes in min-sized groups in scale-down simulation
...
Currently we track if those nodes can be removed and only
skip them at the execution step. Since checking if node is
unneeded is pretty expensive it's better to filter them out
early.
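A minimal Go sketch of this kind of early filtering, under stated assumptions: the `nodeGroup` and `candidate` types and the `filterOutMinSizedGroups` function are hypothetical, and skipping nodes with no node group is an assumption of the sketch, not something this commit message describes.

```go
package main

import "fmt"

// nodeGroup is a hypothetical, simplified view of a node group.
type nodeGroup struct {
	Name        string
	MinSize     int
	CurrentSize int
}

// candidate pairs a node name with the group it belongs to
// (nil when the node is not managed by any group).
type candidate struct {
	NodeName string
	Group    *nodeGroup
}

// filterOutMinSizedGroups drops candidates that could never be removed
// because their group is already at (or below) its minimum size. Doing
// this before the expensive "is this node unneeded" simulation avoids
// wasted work.
func filterOutMinSizedGroups(candidates []candidate) []candidate {
	kept := make([]candidate, 0, len(candidates))
	for _, c := range candidates {
		if c.Group == nil {
			continue // assumption: unmanaged nodes are never scaled down
		}
		if c.Group.CurrentSize <= c.Group.MinSize {
			continue // group cannot shrink any further
		}
		kept = append(kept, c)
	}
	return kept
}

func main() {
	atMin := &nodeGroup{Name: "pool-a", MinSize: 3, CurrentSize: 3}
	aboveMin := &nodeGroup{Name: "pool-b", MinSize: 1, CurrentSize: 4}
	kept := filterOutMinSizedGroups([]candidate{
		{NodeName: "a-1", Group: atMin},
		{NodeName: "b-1", Group: aboveMin},
		{NodeName: "orphan", Group: nil},
	})
	for _, c := range kept {
		fmt.Println("simulate removal of", c.NodeName)
	}
}
```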
2017-08-28 15:48:41 +02:00
Marcin Wielgus
718e5db78e
Run node drain/delete in a separate goroutine
2017-08-28 12:12:31 +02:00
Marcin Wielgus
71b4ca5461
Don't block scale downs if no nodes can be removed
2017-08-26 16:29:50 +02:00
Maciej Pytel
fa53e52ed9
Skip node in scale-down if it was recently found unremovable
2017-08-25 17:21:08 +02:00
Maciej Pytel
fb6ef75d12
Don't create verbose errors in predicates if we ignore them
...
Turns out all this string formatting is pretty damn expensive.
2017-08-24 15:18:38 +02:00
Beata Skiba
edeb522274
Add measuring of FilterOutSchedulable
2017-08-22 18:36:13 +02:00
Beata Skiba
2ae609b93a
Merge pull request #237 from bskiba/split_scale_down
...
Drill down scale down metrics
2017-08-22 16:41:55 +02:00
Beata Skiba
43c9b6b06b
Add cleaner function labels for metrics exporting.
2017-08-22 16:09:42 +02:00
Beata Skiba
44f69c6706
Extract deleting empty nodes to a separate function.
2017-08-22 16:09:42 +02:00
Maciej Pytel
d2faf11482
Re-use results for similar pods in FilterOutSchedulable
2017-08-21 16:32:14 +02:00
Beata Skiba
14df1b808b
Drill down scale down metrics
...
Split scale down duration into three parts:
1. Find nodes to remove
2. Node deletion
3. Misc operations
2017-08-18 14:17:02 +02:00
Maciej Pytel
95b5b4be94
Remove --verify-unschedulable-pods flag
...
This flag was true in default setups for every platform,
we haven't heard about any user changing it to false and
after removing check on PodScheduled condition setting it
to false would basically break CA.
2017-08-16 17:31:59 +02:00
Maciej Pytel
ef1241b3c6
Remove checking and resetting PodSchedulable condition
...
The performance cost was too high and the pods should
be filtered out by follow up checks anyway.
Check out https://github.com/kubernetes/autoscaler/issues/187
for details.
2017-08-16 17:30:11 +02:00
Marcin Wielgus
998b3f1acd
Merge pull request #198 from MaciekPytel/support_zone_failures
...
Backoff for node group after failed scale-up
2017-08-16 20:46:45 +05:30
Marcin Wielgus
9116e4c08c
Compilation fix for CA after godeps update
2017-08-11 17:56:47 +02:00
Marcin Wielgus
4580e1dc45
Fix getEmptyNodes function in CA
2017-08-07 22:21:41 +02:00
Maciej Pytel
6aacbb5bf7
Backoff for node group after failed scale-up
2017-08-04 15:40:23 +02:00
Ivan Towlson
902d2414b7
Fixed typos in the name 'Kubernetes'
2017-08-03 14:20:23 +12:00
Marcin Wielgus
55d750196c
Add a flag to turn off pod status condition resetting for performance tests
2017-07-24 15:53:45 +02:00
Aleksandra Malinowska
ab8323e8dc
fix some logs in scale down
2017-07-20 10:33:42 +02:00
Aleksandra Malinowska
2de8ccc8e1
Change scope of scaleUp metric
2017-07-18 12:17:51 +02:00
Hanfei Shen
2dff7466f8
fix typo for logging
2017-07-14 13:14:27 +08:00
MaciekPytel
2ac2535a48
Merge pull request #169 from aleksandra-malinowska/test-provider-package-name
...
Rename testprovider package
2017-07-13 12:20:30 +02:00
fate-grand-order
5b230a45ee
correct some misspellings for cluster-autoscaler/core
2017-07-13 17:53:59 +08:00
Aleksandra Malinowska
d9eed646f1
add taints to GCE node template
2017-07-11 16:05:30 +02:00
Aleksandra Malinowska
aa1771107e
change scope of findUnneeded metric
2017-07-07 16:30:59 +02:00
Aleksandra Malinowska
c159a90f04
rename test provider package
2017-07-06 16:23:15 +02:00
Aleksandra Malinowska
9f54934229
add annotation
2017-07-06 14:47:32 +02:00
Marcin Wielgus
7cbf295b7f
Merge pull request #161 from mwielgus/godeps-020717
...
Godeps bump for CA
2017-07-04 11:41:00 +02:00
Marcin Wielgus
fc43808149
Godeps bump for CA
2017-07-03 22:05:11 +02:00
Maciej Pytel
39dfced56b
Strip rescheduler taint from node templates
2017-07-03 14:57:17 +02:00
Yusuke Kuoka
7697d5345a
cluster-autoscaler: Fix scale-down when the node group auto-discovery feature is enabled
...
CA no longer resets `StaticAutoscaler` state before each iteration, so it remembers the last scale-up/down times that are used to throttle scale-down. Resetting that state was causing the issue.
2017-06-22 10:25:37 +09:00
Marcin Wielgus
2cd532ebfe
Don't calculate utilization and run scale down simulations for unmanaged nodes
2017-06-20 16:57:30 +02:00
Marcin Wielgus
63e679a74f
Merge pull request #120 from MaciekPytel/fix_graceful_flag
...
Fix typos related to max-graceful-termination-sec
2017-06-14 14:42:35 +02:00
Maciej Pytel
767367c866
Fix typos related to max-graceful-termination-sec
2017-06-14 14:14:21 +02:00
Maciej Pytel
fe514ed75d
Make status configmap respect namespace parameter
2017-06-14 14:07:13 +02:00
Marcin Wielgus
1bedee5707
Update GODEPS
2017-06-13 14:48:24 +02:00
Marcin Wielgus
69c77791a2
Fix error types
2017-06-12 21:26:50 +02:00
Marcin Wielgus
e2e171b7b7
Enable pricing in expander factory
2017-06-09 11:09:43 -07:00
Marcin Wielgus
be0d16a57f
Move Autoscaler Builder to a new file
2017-06-09 10:02:44 -07:00
Maciej Pytel
cd186f3ebc
Balance sizes of similar nodegroups in scale-up
2017-06-06 00:52:38 +02:00
Maciej Pytel
58cdfa1702
Updated log levels in main loop
2017-05-18 14:09:15 +02:00
Maciej Pytel
3f8ca51768
Use typed errors in scale down
2017-05-18 14:09:15 +02:00
Maciej Pytel
7f5c7ed3a2
Use typed errors in scale up code
...
Updated some of the functions called by scale up
to return new errors as required.
2017-05-18 14:09:15 +02:00
Maciej Pytel
f716a7e496
Add typed errors; add errors_total metric
...
To keep reasonable commit size only top-level files use
new errors. Will add them in other files in next commits.
2017-05-18 14:09:15 +02:00
Marcin Wielgus
ea7bd81681
Prefer using ready nodes and cloudprovider template nodes over unready/unschedulable nodes in scale-up
2017-05-16 13:06:19 +02:00
Marcin Wielgus
d9bf5aacd7
Use TemplateNodeInfo in scale up
2017-05-16 11:45:05 +02:00
Maciej Pytel
7a21a68b56
Add metrics counting CA operations
2017-05-15 13:03:00 +02:00
Maciej Pytel
4cdf06ea94
Added CA metrics related to autoscaler execution
2017-05-11 14:51:04 +02:00
Maciej Pytel
83ef3d2be3
Added CA metrics related to cluster state
2017-05-11 13:54:04 +02:00
Marcin Wielgus
0a0129f511
Daemonset listers
2017-05-11 12:30:27 +02:00
Marcin Wielgus
30cb7a52e5
Merge pull request #11 from mumoshu/node-group-auto-discovery-with-asg-tag
...
cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
2017-05-10 11:07:58 +02:00
Yusuke Kuoka
5304e9af21
cluster-autoscaler: Fix typos in comments
2017-05-10 11:22:15 +09:00
Yusuke Kuoka
e9c7cd0733
cluster-autoscaler: Re: AWS Autoscaler autodiscover ASG names and sizes
...
This is an alternative implementation of https://github.com/kubernetes/contrib/pull/1982
Notable differences from the original PR are:
* A new flag named `--node-group-auto-discovery` is introduced for opting in to enable the auto-discovery feature.
* For example, specifying `--cloud-provider aws --node-group-auto-discovery asg:tag=k8s.io/cluster-autoscaler/enabled` instructs CA to auto-discover ASGs tagged with `k8s.io/cluster-autoscaler/enabled` to be used as target node groups
* The new code path introduced by this PR is executed only when `node-group-auto-discovery` is specified. There is relatively less chance to break existing features by introducing this change
Resolves https://github.com/kubernetes/contrib/issues/1956
---
Other notes:
* We rely mainly on the `DescribeTags` API rather than `DescribeAutoScalingGroups` so that AWS can filter out, for us, unnecessary ASGs which don't belong to the k8s cluster.
* If we relied on `DescribeAutoScalingGroups` here, as it doesn't support `Filter`ing, we'd need to iterate over ALL the ASGs available in an AWS account, which isn't desirable due to unnecessary excessive API calls and network usages
* Update cloudprovider/aws/README for the new configuration
* Warn about invalid combination of flags
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113713138
* Emit a validation error when both --nodes and --node-group-auto-discovery are specified
according to the review comment https://github.com/kubernetes/autoscaler/pull/11#discussion_r113958080
TODO/Possible future improvements before recommending this to everyone:
* Cache the result of an auto-discovery for a configurable period, so that we won't invoke DescribeTags and DescribeAutoScalingGroup APIs too many times
2017-05-10 08:36:02 +09:00
Marcin Wielgus
42c177b68f
Add deletion safety margin to node drain
2017-05-08 11:47:33 +02:00
Marcin Wielgus
6f5d52e3a7
Overwrite pod.spec.nodename and node.name in template nodes for scale up
2017-04-28 17:57:02 +02:00
Marcin Wielgus
6bafa2a940
Merge pull request #25 from mwielgus/label-fix
...
Override hostname label when building a template node
2017-04-27 17:25:43 +02:00
Marcin Wielgus
e1c89f8fe2
Override hostname label when building a template node
2017-04-27 17:17:01 +02:00
Maciej Pytel
7e4212478a
Fix error handling for updating node status
2017-04-25 17:34:23 +02:00
Maciej Pytel
6b2ea76973
Added UT for CA simulator
2017-04-19 19:12:30 +02:00
Maciej Pytel
4d40222b63
Fix gofmt
2017-04-18 16:45:27 +02:00
Marcin Wielgus
34eb4973f8
Fix imports in cluster autoscaler after migrating it from contrib
2017-04-18 15:42:04 +02:00
Maciej Pytel
0b74a3bd25
Cluster-Autoscaler: update event name
2017-04-10 14:03:21 +02:00
Marcin Wielgus
eb3e6173d1
Cluster-autoscaler: Fix isNodeStarting
2017-03-27 23:27:14 +02:00
Maciej Pytel
72c885b800
Cluster-Autoscaler: reset scale-down on unready cluster
2017-03-22 17:17:59 +01:00
Maciej Pytel
c71668a8d8
Cluster-Autoscaler: update status configmap on errors
...
Previously it would only update after successfully completing the main
loop, meaning the status wouldn't get updated unless cluster was
healthy.
2017-03-15 13:22:24 +01:00
Kubernetes Submit Queue
ac5f7634d8
Merge pull request https://github.com/kubernetes/contrib/pull/2464 from MaciekPytel/ca_drain_evictions
...
Automatic merge from submit-queue
Cluster-Autoscaler: evict pods instead of deleting them
This should make CA respect PodDisruptionBudget.
2017-03-15 04:27:27 -07:00
Maciej Pytel
7d5488898c
Cluster-autoscaler: fix NotTriggerScaleUp event
...
This should fix a failing e2e test
2017-03-14 14:54:36 +01:00
Maciej Pytel
10d560dae6
Cluster-Autoscaler: handle nil node group
...
In a few places we assumed it was non-nil, leading
to segfaults.
2017-03-13 14:46:11 +01:00
Maciej Pytel
39162f0860
Cluster-Autoscaler: evict pods instead of deleting them
2017-03-10 16:18:47 +01:00
Maciej Pytel
5d2c675c8e
Cluster-Autoscaler: update scale down status
2017-03-08 11:51:20 +01:00
Marcin Wielgus
27b797f541
Cluster-Autoscaler: skip nodes currently under deletion in scale down
2017-03-07 14:59:15 +01:00
Kubernetes Submit Queue
39fa783ad7
Merge pull request https://github.com/kubernetes/contrib/pull/2451 from mwielgus/pdb-ca
...
Automatic merge from submit-queue
Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2
In part 1 of 2 we skip nodes that have a pod with 0 poddisruptionallowed. Part 2/2 will delete pods using evict.
cc: @jszczepkowski @MaciekPytel @davidopp @fgrzadkowski
2017-03-06 09:27:50 -08:00
Marcin Wielgus
5b4441083a
Cluster-autoscaler: include PodDisruptionBudget in drain - part 1/2
2017-03-06 17:15:04 +01:00
Maciej Pytel
d3bf5d3d51
Cluster-Autoscaler: log events on status configmap
2017-03-06 12:21:24 +01:00
Maciej Pytel
84f19c1e1e
Cluster-Autoscaler: add map to disable status configmap
2017-03-02 15:35:00 +01:00
Marcin Wielgus
2ffaddb7c0
Cluster-autoscaler: lint
2017-03-02 15:15:07 +01:00
Marcin Wielgus
72a47dc2b2
Cluster-autoscaler: update code for 1.6 k8s sync
2017-03-02 14:34:49 +01:00
Maciej Pytel
d0196c9e1b
Cluster-Autoscaler: Delete status configmap on exit
2017-02-28 17:19:23 +01:00
Maciej Pytel
497d2800ea
Cluster-Autoscaler: Write status to configmap
2017-02-28 09:59:40 +01:00
Maciej Pytel
637e750246
Cluster-Autoscaler: fix segfault
...
StaticAutoscaler.kubeClient was uninitialized,
leading to segfaults when trying to use it. It was
also a duplicate since the client is already available
through AutoscalingContext.
2017-02-27 14:13:54 +01:00
Marcin Wielgus
83fdeb184f
Cluster-autoscaler: use listers from ListersRegistry
2017-02-24 20:40:53 +01:00
Yusuke Kuoka
baee799524
cluster-autoscaler: Dynamic Reconfiguration via ConfigMaps
...
Adds a new optional flag named `configmap` to specify the name of a configmap containing node group specs.
The configmap is polled every `scan-interval` seconds to reconfigure cluster-autoscaler dynamically at runtime.
Example usage:
```
./cluster-autoscaler --v=4 --cloud-provider=aws --skip-nodes-with-local-storage=false --logtostderr --leader-elect=false --configmap=cluster-autoscaler
```
The configmap would look like:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  settings: |-
    {
      "nodeGroups": [
        {
          "minSize": 1,
          "maxSize": 2,
          "name": "kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"
        }
      ]
    }
```
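As a rough sketch, the `settings` JSON above could be decoded and sanity-checked along these lines. The type names `Settings` and `NodeGroupSpec` and the function `ParseSettings` are hypothetical; only the JSON keys come from the example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeGroupSpec mirrors one entry of the "nodeGroups" array.
// The Go names are assumptions made for this sketch.
type NodeGroupSpec struct {
	MinSize int    `json:"minSize"`
	MaxSize int    `json:"maxSize"`
	Name    string `json:"name"`
}

// Settings mirrors the top-level object stored under data.settings.
type Settings struct {
	NodeGroups []NodeGroupSpec `json:"nodeGroups"`
}

// ParseSettings decodes the raw JSON and checks each node group's size range.
func ParseSettings(raw string) (*Settings, error) {
	var s Settings
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return nil, fmt.Errorf("decoding settings: %w", err)
	}
	for _, ng := range s.NodeGroups {
		if ng.Name == "" {
			return nil, fmt.Errorf("node group with empty name")
		}
		if ng.MinSize < 0 || ng.MaxSize < ng.MinSize {
			return nil, fmt.Errorf("node group %q: invalid size range %d..%d", ng.Name, ng.MinSize, ng.MaxSize)
		}
	}
	return &s, nil
}

func main() {
	raw := `{"nodeGroups":[{"minSize":1,"maxSize":2,"name":"kubeawstest-nodepool1-AutoScaleWorker-1VWD4GAVG35L5"}]}`
	s, err := ParseSettings(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ng := range s.NodeGroups {
		fmt.Printf("%s: min=%d max=%d\n", ng.Name, ng.MinSize, ng.MaxSize)
	}
}
```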
Other notes:
* Make namespace default to "kube-system"
according to https://github.com/kubernetes/contrib/pull/2226#discussion_r94144267
* Trigger a full-recreate on a configuration change
according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-269617410
* Introduced `autoscaler/` and moved all the dynamic/recreatable-at-runtime parts of autoscaler into there (Update: the package is now named `core` according to https://github.com/kubernetes/contrib/pull/2226#issuecomment-273071663 )
* Extracted the core of CA(=`func Run()` in `main.go`) into `Autoscaler`
* `DynamicAutoscaler` is a wrapper around `Autoscaler` which achieves reconfiguration of CA by recreating an `Autoscaler` instance on a configmap change.
* Moved `scale_down*.go`, `scale_up*.go` and `utils*.go` into the `autoscaler` package as well, because they appear to belong in the same package as the core of CA (which is now implemented as `Autoscaler`)
* Moved the `createEventRecorder` func from the `main` package to the `utils/kubernetes` package to make it importable from both `main` and `autoscaler`
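The wrapper pattern described in these notes can be sketched roughly as follows. This is not the actual implementation: the `fetch` and `build` callbacks, the `staticAutoscaler` stand-in, and the `Rebuilds` counter are hypothetical, and the configmap contents are reduced to a plain string:

```go
package main

import "fmt"

// Autoscaler stands in for the core loop extracted from main.go.
type Autoscaler interface {
	RunOnce() string
}

// staticAutoscaler is a placeholder implementation bound to one config.
type staticAutoscaler struct{ config string }

func (a *staticAutoscaler) RunOnce() string { return "running with " + a.config }

// DynamicAutoscaler wraps an Autoscaler and recreates it whenever the
// fetched configuration differs from the one last seen.
type DynamicAutoscaler struct {
	fetch    func() string           // hypothetical: returns the current configmap contents
	build    func(string) Autoscaler // hypothetical: constructs a fresh Autoscaler
	current  Autoscaler
	lastCfg  string
	Rebuilds int
}

func NewDynamicAutoscaler(fetch func() string, build func(string) Autoscaler) *DynamicAutoscaler {
	return &DynamicAutoscaler{fetch: fetch, build: build}
}

// RunOnce polls the configuration, rebuilds on change, then delegates.
func (d *DynamicAutoscaler) RunOnce() string {
	cfg := d.fetch()
	if d.current == nil || cfg != d.lastCfg {
		d.current = d.build(cfg) // full recreate on any configuration change
		d.lastCfg = cfg
		d.Rebuilds++
	}
	return d.current.RunOnce()
}

func main() {
	configs := []string{"v1", "v1", "v2"}
	i := 0
	fetch := func() string {
		c := configs[i]
		if i < len(configs)-1 {
			i++
		}
		return c
	}
	build := func(c string) Autoscaler { return &staticAutoscaler{config: c} }

	d := NewDynamicAutoscaler(fetch, build)
	for range configs {
		fmt.Println(d.RunOnce())
	}
	fmt.Println("rebuilds:", d.Rebuilds)
}
```

In a real loop, `RunOnce` would be called once per `scan-interval`, so an unchanged configmap costs only a comparison.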
2017-02-24 20:36:47 +09:00