Commit Graph

798 Commits

Author SHA1 Message Date
Krzysztof Jastrzebski 0484a7ff0c Adds KUBE_AUTOSCALER_ENABLE_SCALE_DOWN export to e2e test question in CA FAQ. 2017-11-09 13:12:18 +01:00
Krzysztof Jastrzebski 5ee699584a Fix typo in CA FAQ. 2017-11-09 11:08:31 +01:00
Krzysztof Jastrzebski ef077e69b2 Adds Priority and Pod Preemption to Cluster Autoscaler FAQ. 2017-11-09 10:37:47 +01:00
Marcin Wielgus 439fd3c9ec
Merge pull request #411 from krzysztof-jastrzebski/priority
Adds priority preemption support to cluster autoscaler.
2017-11-08 09:09:26 +01:00
Maciej Pytel e1eabe5986 Update metrics documentation 2017-11-07 17:37:10 +01:00
Beata Skiba b2676c7e64 Fix release notes for 1.0.2 2017-11-07 11:42:51 +01:00
Beata Skiba 2b28ac1a04 Add a workaround for scaling of VMs with GPUs
When a machine with GPU becomes ready it can take
up to 15 minutes before it reports that GPU is allocatable.
This can cause Cluster Autoscaler to trigger a second
unnecessary scale up.
The workaround sets allocatable to capacity for GPU so that
a node that waits for GPUs to become ready to use will be
considered as a place where pods requesting GPUs can be
scheduled.
2017-11-06 16:04:22 +01:00
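The workaround described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `Node` struct, the `gpuResource` key, and the `withGPUAllocatable` helper are hypothetical stand-ins for the real Kubernetes types and resource names.

```go
package main

import "fmt"

// gpuResource is a hypothetical resource key used only in this sketch.
const gpuResource = "gpu"

// Node is a simplified stand-in for a Kubernetes node, not the real API type.
type Node struct {
	Name        string
	Capacity    map[string]int64
	Allocatable map[string]int64
}

// withGPUAllocatable returns a copy of the node whose GPU allocatable is set
// to its GPU capacity. Other resources are copied unchanged, and the input
// node's Allocatable map is not modified.
func withGPUAllocatable(n Node) Node {
	out := Node{
		Name:        n.Name,
		Capacity:    n.Capacity,
		Allocatable: make(map[string]int64, len(n.Allocatable)+1),
	}
	for k, v := range n.Allocatable {
		out.Allocatable[k] = v
	}
	if gpus, ok := n.Capacity[gpuResource]; ok && gpus > 0 {
		out.Allocatable[gpuResource] = gpus
	}
	return out
}

func main() {
	// A freshly started node that has GPU capacity but does not yet
	// report any allocatable GPUs.
	fresh := Node{
		Name:        "gpu-node-1",
		Capacity:    map[string]int64{"cpu": 8, gpuResource: 2},
		Allocatable: map[string]int64{"cpu": 8},
	}
	fixed := withGPUAllocatable(fresh)
	fmt.Println("allocatable GPUs used for scheduling simulation:", fixed.Allocatable[gpuResource])
}
```

With the adjusted copy, a scheduling simulation would treat the new node as able to host GPU pods, so pending GPU pods would not trigger a second scale-up while the node is still warming up.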
Maciej Pytel 267bedcbfc Update README with release notes for 1.0.1 and 1.0.2 2017-11-06 14:06:53 +01:00
Edward Tsang 4104a91991 more spelling fixes 2017-11-02 14:21:36 -07:00
mmerrill3 3d043f73cb Renaming the interface function to Cleanup() for CloudProvider type 2017-11-01 12:41:13 -04:00
mmerrill3 77aa30a5c1 Fixing for issue 252 by implementing a channel to stop the go routine 2017-11-01 11:00:00 -04:00
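The pattern named in the two commits above (a channel that stops a background goroutine, triggered from a `Cleanup()` method) might look like the sketch below. The `refresher` type and `startRefresher` function are hypothetical names chosen for illustration; the actual change for issue 252 may be structured differently.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// refresher runs a periodic task in a goroutine until Cleanup is called.
// It is a hypothetical type for this sketch only.
type refresher struct {
	stop chan struct{} // closed to ask the goroutine to exit
	done chan struct{} // closed by the goroutine when it has exited
	once sync.Once     // makes Cleanup safe to call more than once
}

// startRefresher launches the background loop and returns a handle to it.
func startRefresher(interval time.Duration, refresh func()) *refresher {
	r := &refresher{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(r.done)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				refresh()
			case <-r.stop:
				return
			}
		}
	}()
	return r
}

// Cleanup signals the goroutine to stop and waits until it has returned.
func (r *refresher) Cleanup() {
	r.once.Do(func() { close(r.stop) })
	<-r.done
}

func main() {
	r := startRefresher(10*time.Millisecond, func() { fmt.Println("refreshing") })
	time.Sleep(35 * time.Millisecond)
	r.Cleanup()
	fmt.Println("stopped")
}
```

Closing the channel rather than sending on it means the stop signal cannot block, and waiting on `done` ensures no goroutine is left running after `Cleanup()` returns.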
Marcin Wielgus 5d599ca678
Merge pull request #429 from MaciekPytel/update_nap_api_again
update GKE cloudprovider to match changes in GKE API
2017-11-01 13:37:36 +08:00
Marcin Wielgus da224d4db9
Merge pull request #440 from MaciekPytel/nap_metrics
Add metrics for autoprovisioning
2017-11-01 13:36:53 +08:00
Maciej Pytel b21a11cebe Add a check against dereferencing nil in GKE cloudprovider 2017-10-31 21:38:50 +01:00
Maciej Pytel c376ef3c87 Add metrics for autoprovisioning 2017-10-31 17:42:58 +01:00
SafPlusPlus 1350a77253 Trivial typo fix 2017-10-31 16:27:33 +01:00
Beata Skiba c25a0a6ec2 Call location API for regional clusters. 2017-10-31 10:24:00 +01:00
Maciej Pytel 50587992ae Update fetching ResourceLimits from GKE to use GB not MB 2017-10-30 19:02:41 +01:00
Maciej Pytel df63369967 Update GKE alpha client 2017-10-30 17:31:18 +01:00
Beata Skiba 26b2de9b63 Cluster Autoscaler integration with regional clusters.
Make GKE cloud provider work with locations instead of zones.
Use beta GKE endpoint for regional clusters.
2017-10-30 14:48:59 +01:00
MaciekPytel 8b8599e5af
Merge pull request #423 from krzysztof-jastrzebski/resource_limit3
Gets resource limits from GKE API.
2017-10-30 12:21:35 +01:00
Seth Pollack ea9aa6fe14 update aws instance types 2017-10-28 21:47:24 -04:00
Henrique Rodrigues 46db208e17 Observation about safe-to-evict annotation on FAQ 2017-10-26 13:37:17 -02:00
Krzysztof Jastrzebski aee8682cb7 Gets resource limits from GKE API. 2017-10-26 14:05:56 +02:00
Henrique Rodrigues 56135db3b0 Annotation which indicates that a pod is safe to evict despite other constraints 2017-10-26 09:29:50 -02:00
Krzysztof Jastrzebski e8e6ad1c7a Adds gke-api-endpoint flag to GCE cloud provider. 2017-10-26 10:32:00 +02:00
Maciej Pytel 9c2ebccbfe Write events when autoprovisioned nodegroup is created / deleted 2017-10-25 17:39:30 +02:00
Marcin Wielgus c7aa56a82a Merge pull request #416 from MaciekPytel/sync_cloudprovider
Add Refresh method to cloud provider
2017-10-25 06:38:33 +02:00
Maciej Pytel 3fe334b7e6 Fix a bug with parsing int64 in GKE alpha client 2017-10-24 18:51:09 +02:00
Maciej Pytel 07511f444a Add Refresh method to cloud provider
This can be used to dynamically update cloud provider
config (in particular list of managed NodeGroups and their
min/max constraints).
Add GKE implementation.
2017-10-24 18:36:29 +02:00
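As a rough illustration of the idea in this commit message, the sketch below shows a cloud provider whose `Refresh` method reloads the list of managed node groups and their min/max constraints. Everything here is an assumption made for the example: the two-method `CloudProvider` interface is a reduced stand-in, and `NodeGroupConfig`, `fakeProvider`, and `errUnavailable` are hypothetical names, not the project's real types.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// NodeGroupConfig is a hypothetical, simplified view of a managed node group.
type NodeGroupConfig struct {
	Name    string
	MinSize int
	MaxSize int
}

// CloudProvider is a reduced stand-in interface for this sketch.
type CloudProvider interface {
	NodeGroups() []NodeGroupConfig
	Refresh() error
}

// errUnavailable simulates a failing backend call in the demo below.
var errUnavailable = errors.New("backend unavailable")

// fakeProvider caches node group config and reloads it on Refresh.
type fakeProvider struct {
	mu     sync.Mutex
	fetch  func() ([]NodeGroupConfig, error) // source of fresh config
	groups []NodeGroupConfig
}

// Compile-time check that fakeProvider satisfies the interface.
var _ CloudProvider = (*fakeProvider)(nil)

// NodeGroups returns a copy of the cached configuration.
func (p *fakeProvider) NodeGroups() []NodeGroupConfig {
	p.mu.Lock()
	defer p.mu.Unlock()
	return append([]NodeGroupConfig(nil), p.groups...)
}

// Refresh replaces the cached configuration. On error the previous
// configuration is kept so callers can continue with stale data.
func (p *fakeProvider) Refresh() error {
	groups, err := p.fetch()
	if err != nil {
		return fmt.Errorf("refreshing node groups: %w", err)
	}
	p.mu.Lock()
	p.groups = groups
	p.mu.Unlock()
	return nil
}

func main() {
	calls := 0
	p := &fakeProvider{fetch: func() ([]NodeGroupConfig, error) {
		calls++
		if calls == 2 {
			return nil, errUnavailable
		}
		return []NodeGroupConfig{{Name: "pool-a", MinSize: 1, MaxSize: 3 * calls}}, nil
	}}
	for i := 0; i < 3; i++ {
		if err := p.Refresh(); err != nil {
			fmt.Println("refresh failed, keeping previous config:", err)
		}
		fmt.Println(p.NodeGroups())
	}
}
```

Calling `Refresh` once per autoscaler loop iteration would be one way to pick up configuration changes without restarting the process; keeping the old config on failure is a design choice of this sketch, not something the commit message states.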
Maciej Pytel 84775f0e12 Enable cluster-level CA config in GKE alpha client 2017-10-24 12:56:28 +02:00
Marcin Wielgus 596f478e63 Merge pull request #414 from krzysztof-jastrzebski/resource_limit
Adds resource limits to cloud provider.
2017-10-23 20:38:04 +02:00
Krzysztof Jastrzebski 56ac572666 Adds resource limits to cloud provider. 2017-10-23 16:06:56 +02:00
Maciej Pytel 7b95e71315 Use GKE alpha client when autoprovisioning is enabled 2017-10-23 15:21:02 +02:00
Maciej Pytel a67afc54a2 Add v1alpha1 GKE client to vendor 2017-10-23 12:03:04 +02:00
Krzysztof Jastrzebski d9c00e5ce1 Adds priority preemption support to cluster autoscaler. 2017-10-23 09:54:56 +02:00
Beata Skiba f752fdfa8a fix typo 2017-10-20 10:02:09 +02:00
Beata Skiba 68647fa22b Fix GPU resource name. 2017-10-19 09:26:34 +02:00
Marcin Wielgus 2418dba233 Merge pull request #401 from bskiba/gpu
Support GPUs in scale from 0
2017-10-18 11:28:45 +02:00
Maciej Pytel 02ccba3338 Update clusterstate after scale-up 2017-10-17 16:11:25 +02:00
Beata Skiba 70287e486f Support GPUs in scale to/from 0 2017-10-17 16:11:00 +02:00
Maciej Pytel 3498507220 Handle nodegroup id changing upon creation 2017-10-17 14:02:46 +02:00
Maciej Pytel 91759d41cf GKE cloudprovider bugfixes
* Make autoprovisioned node pool names shorter (previously the
  name exceeded the max name length).
* Set autoscaling fields when creating node pool
2017-10-17 14:02:46 +02:00
Marcin Wielgus 2ce9ea2428 Merge pull request #395 from MaciekPytel/gke_fixit
Gke fixit
2017-10-16 06:43:37 -07:00
Maciej Pytel 9ded6f9c9e Rename clusterName flag to cluster-name for consistency 2017-10-16 14:11:27 +02:00
Maciej Pytel 0d791cda3c Fix some bugs in GKE cloudprovider
- Mig.TargetSize tried to query non-existing mig if autoprovisioning
  was enabled.
- GKE client was created using hosted master project instead of user's
  project.
2017-10-16 14:03:40 +02:00
Beata Skiba c0c566a3dc Add scalability testing report to 1.0.0 release notes 2017-10-13 13:55:02 +02:00
Marcin Wielgus ecc1b97e12 SLO for CA in FAQ 2017-10-05 09:32:39 +02:00
Marcin Wielgus a5162b6c54 Merge pull request #385 from mwielgus/faq-ga
CA in GA - FAQ
2017-10-03 17:31:21 +02:00
Marcin Wielgus d4cfd3423f CA in GA - FAQ 2017-10-03 17:25:42 +02:00
Maciej Pytel fa7f71ac4b Refresh mig cache if node pool config has changed 2017-10-03 16:30:47 +02:00
Maciej Pytel 598fb7458e Update README with CA release 1.0 info 2017-10-02 13:32:16 +02:00
Marcin Wielgus f658450b16 Merge pull request #379 from MaciekPytel/long_unregistered_node
Keep track of nodes that failed to register for a long time
2017-09-28 15:01:32 +02:00
Maciej Pytel ff21b0b00c Keep track of nodes that failed to register for a long time
Previously a node that failed to register and couldn't be deleted
basically broke CA.
2017-09-27 16:32:04 +02:00
Marcin Wielgus f7b7755bbd Merge pull request #377 from bskiba/nap-allocatable
Compute allocatable for auto-provisioned migs
2017-09-27 12:51:50 +02:00
Beata Skiba 30ae3fa837 Compute allocatable for auto-provisioned migs 2017-09-27 12:07:23 +02:00
Marcin Wielgus 9631f0f136 Merge pull request #375 from MaciekPytel/failed_scale_up_reason
Add failed scale-up reason in metric
2017-09-26 19:23:47 +02:00
Beata Skiba ac8004f41a Cluster Autoscaler scalability testing report 2017-09-26 18:43:02 +02:00
Maciej Pytel e12ee88f5f Add failed scale-up reason in metric 2017-09-26 13:40:34 +02:00
Krzysztof Jastrzebski 16e9106c07 Fix setting target size for group in core/static_autoscaler_test.go. 2017-09-26 10:58:00 +02:00
Nikhita Raghunath b82adcdff7 Fix link after design proposal move 2017-09-26 00:26:26 +05:30
Marcin Wielgus d6c661c61f Mark Cluster Autoscaler as GA (1.0.0) 2017-09-25 19:40:34 +02:00
Marcin Wielgus f512749d63 Merge pull request #370 from MaciekPytel/add_metric_label
Add reason field to failed_scale_ups_total metric
2017-09-25 17:36:57 +02:00
Maciej Pytel 7f7243ea98 Add reason field to failed_scale_ups_total metric
For now it's just a placeholder, will add proper logic
for next release
2017-09-25 16:33:49 +02:00
Krzysztof Jastrzebski 80a7577399 Unit tests. 2017-09-25 11:37:24 +02:00
Marcin Wielgus 56625fb10a Merge pull request #366 from mwielgus/0.7.0-beta2
Bump CA version to 0.7.0-beta2
2017-09-23 00:00:16 +01:00
Marcin Wielgus fd6e093a9c Merge pull request #363 from mwielgus/godep-sync-0.7-rc
Godep sync 0.7 rc
2017-09-23 00:00:07 +01:00
Marcin Wielgus 6868288ee4 Bump CA version to 0.7.0-beta2 2017-09-22 23:06:40 +01:00
Marcin Wielgus e205577cf8 Merge pull request #364 from MaciekPytel/cleanup_findunneeded_logs
Move spammy logs to V(5)
2017-09-22 22:53:58 +01:00
Marcin Wielgus 8779ed3ad9 Remove Microsoft hcsshim and go-winio 2017-09-22 22:52:00 +01:00
Marcin Wielgus 113ecf574e Merge pull request #362 from MaciekPytel/event_unregistered_node_delete
Log event when removing unregistered node
2017-09-22 22:21:39 +01:00
Marcin Wielgus 8f5a0ab7d2 Compilation fix after godeps update 2017-09-22 22:19:50 +01:00
Maciej Pytel 9f7cdb8b56 Remove overriding allocatable in simulations 2017-09-22 22:52:12 +02:00
Maciej Pytel 098ebbee09 Log event when removing unregistered node 2017-09-22 22:48:07 +02:00
Marcin Wielgus 32c4a7ba5c Merge pull request #360 from aleksandra-malinowska/leaking-taints
Fix leaking taints in case of cloud provider error on node deletion
2017-09-22 21:43:55 +01:00
Maciej Pytel 38f424fb2f Move spammy logs to V(5) 2017-09-22 19:33:09 +02:00
Maciej Pytel 5e05c84cf0 Add metric counting failed scale-ups
A minor refactor was required to avoid cyclic imports
2017-09-22 18:12:50 +02:00
Marcin Wielgus 9122c57282 Godep update to K8S 1.8 branch 2017-09-22 17:07:04 +01:00
Marcin Wielgus 6cf095c56b Add file with the exact kubernetes version used 2017-09-22 16:57:02 +01:00
Aleksandra Malinowska 4c31a57374 fix leaking taints in case of cloud provider error on node deletion 2017-09-22 17:55:48 +02:00
Aleksandra Malinowska 7e36ea61c0 Keep graceful termination timeout consistent 2017-09-21 12:54:11 +02:00
Krzysztof Jastrzebski 1d0c237adc Cloudprovider/gce/gce_cloud_provider.go unit tests. 2017-09-20 07:57:43 +02:00
Marcin Wielgus 012909d64e Merge pull request #354 from electronicarts/feature/mtcode/scale-down-delay-flags
Introduce new flags to control scale down behavior
2017-09-19 18:35:37 +02:00
Krzysztof Jastrzebski 6b8b8b8fe1 Cloudprovider/gce/gce_manager.go unit tests. 2017-09-19 11:16:08 +02:00
Matt Terry 63310ef41a Introduce new flags to control scale down behavior: scale-down-delay-after-delete and scale-down-delay-after-failure, replacing scale-down-trial-interval. scale-down-delay-after-add replaces scale-down-delay 2017-09-18 17:09:44 -07:00
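One plausible reading of how the three delay flags named in this commit could gate scale-down is sketched below. The flag names come from the commit message; the default values, the `scaleDownDelays` struct, and the `scaleDownAllowed` helper are assumptions made for illustration and may not match the real implementation.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// scaleDownDelays groups the three delays named in the commit message.
// The struct itself is hypothetical.
type scaleDownDelays struct {
	AfterAdd     time.Duration
	AfterDelete  time.Duration
	AfterFailure time.Duration
}

// scaleDownAllowed reports whether enough time has passed since the last
// scale-up, the last node deletion, and the last failed scale-down.
func scaleDownAllowed(sinceAdd, sinceDelete, sinceFailure time.Duration, d scaleDownDelays) bool {
	return sinceAdd >= d.AfterAdd &&
		sinceDelete >= d.AfterDelete &&
		sinceFailure >= d.AfterFailure
}

func main() {
	var d scaleDownDelays
	// Default values here are illustrative assumptions only.
	flag.DurationVar(&d.AfterAdd, "scale-down-delay-after-add", 10*time.Minute,
		"how long after a scale-up before scale-down evaluation resumes")
	flag.DurationVar(&d.AfterDelete, "scale-down-delay-after-delete", 10*time.Second,
		"how long after a node deletion before scale-down evaluation resumes")
	flag.DurationVar(&d.AfterFailure, "scale-down-delay-after-failure", 3*time.Minute,
		"how long after a failed scale-down before scale-down evaluation resumes")
	flag.Parse()

	// Example: 15m since last scale-up, 1m since last delete, 5m since last failure.
	fmt.Println("scale-down allowed:", scaleDownAllowed(15*time.Minute, time.Minute, 5*time.Minute, d))
}
```

Splitting one interval into separate per-event delays lets an operator, for example, wait longer after a failed scale-down than after a successful deletion.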
Krzysztof Jastrzebski 1368f51d53 Fix calculating autoprovisioned property for node pool. 2017-09-15 17:12:56 +02:00
Marcin Wielgus 7cf36de42b Merge pull request #343 from bskiba/allocatable
Compute allocatable for scale up from 0 based on kube-reserved
2017-09-15 16:33:42 +02:00
Beata Skiba 90389930b7 Compute allocatable based on kube-reserved 2017-09-14 16:21:51 +02:00
Marcin Wielgus 64c83e262f Merge pull request #341 from mwielgus/target-removal
Remove TargetSize() from loops iterating over nodes
2017-09-14 10:14:38 +02:00
Marcin Wielgus f04113d746 Remove TargetSize() from loops iterating over nodes 2017-09-13 22:33:17 +02:00
Marcin Wielgus 290f20838e Bump CA version to 0.7.0-beta1 2017-09-13 13:51:33 +02:00
Marcin Wielgus 303f86c163 Merge pull request #336 from electronicarts/feature/matt/unneeded-check-fix
Move calculateUnneededOnly check after unneeded calculations
2017-09-13 11:14:51 +02:00
Marcin Wielgus 4bed50d290 Merge pull request #331 from aleksandra-malinowska/min-cluster-cpu-memory
Respect minimum cores/memory limit during scale down
2017-09-13 11:12:29 +02:00
Krzysztof Jastrzebski 530ce16417 Cloudprovider unit tests. 2017-09-13 10:46:56 +02:00
Aleksandra Malinowska 197b05b180 respect minimum cores/memory limit during scale down 2017-09-13 10:10:47 +02:00
Krzysztof Jastrzebski d8db14701e Core/static_autoscaler_test.go unit tests. 2017-09-13 09:52:07 +02:00
Matt Terry 43943cdeb4 Move calculateUnneededOnly check after unneeded calculations, add log message to main loop start 2017-09-12 21:38:29 -07:00
Aleksandra Malinowska 187c02693e Taint empty nodes to be deleted 2017-09-12 17:40:05 +02:00
Marcin Wielgus ef730e19c5 Merge pull request #332 from krzysztof-jastrzebski/scale_up2
Fix filtering for autoprovisioned node groups and add unit test.
2017-09-12 16:40:30 +02:00
Krzysztof Jastrzebski b1396c3cd1 Fix filtering for autoprovisioned node groups and add unit test. 2017-09-12 16:20:23 +02:00