Commit Graph

69 Commits

Author SHA1 Message Date
justinsb f2d4eeb104 reconcile: wait for apiserver to respond before trying rolling-update
The rolling-update requires the apiserver (when called without --cloudonly),
so reconcile should wait for apiserver to start responding.

Implement this by reusing "validate cluster", but filtering to only the instance groups
and pods that we expect to be online.
2025-01-13 17:47:48 -05:00
justinsb ebcfebe50e chore: add context to rolling update functions
Move it out of the struct, and into the function parameters.

This is more idiomatic Go.
2024-12-27 14:22:51 -05:00
Peter Rifel f05284a2f9 Migrate Instance Group management to aws-sdk-go-v2 2024-04-13 16:01:41 -04:00
Peter Rifel d4d39eb0fe Migrate autoscaling to aws-sdk-go-v2 2024-03-31 23:04:06 -05:00
justinsb 49dfdabb79 cloudmock: Add context functions to mock 2023-11-09 08:17:10 -05:00
Ciprian Hacman 65c24a9f3d Add missing mock functions 2023-11-09 08:17:10 -05:00
Jack Andersen 6efd68f428 Remove optionality and exit when specific error prefix is matched
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:14 -08:00
Jack Andersen f9ea9b3ef8 Add a flag to rolling update to fail immediately on IG error
Signed-off-by: Jack Andersen <jandersen@plaid.com>
2022-12-21 09:30:13 -08:00
John Gardiner Myers d39ba74bd7 Change the control-plane IG role to "ControlPlane" in v1alpha3 API 2022-11-22 17:05:29 -08:00
Ciprian Hacman 8f79c9bd68 Replace fi.Bool/Float*/Int*/String() with fi.PtrTo() 2022-11-19 03:45:22 +02:00
Ciprian Hacman cb99db0757 Run make goimports 2022-08-17 07:03:33 +03:00
Ole Markus With 982463683d Remove checks that don't work when we do not delete the node object 2022-03-06 07:34:52 +01:00
Ciprian Hacman ea7df00719 Run hack/update-gofmt.sh 2021-12-01 22:39:50 +02:00
John Gardiner Myers 4396270d74 Fix out of bounds error when instance detach fails 2021-11-08 23:00:28 -08:00
John Gardiner Myers d46ee9c883 Exclude nodes from load balancers upon cordoning 2021-04-20 17:58:26 -07:00
Ole Markus With 09615935fd Make kOps CLI handle ASG warm pools 2021-04-15 11:10:23 +02:00
Ole Markus With 2659a30280 Make get instances respect needs-update annotation
Make it possible for addons to set needs-update annotation

Use onDelete update strategy for cilium and set needs-update annotation

Rename node roles
2020-11-16 08:26:17 +01:00
Bharath Vedartham 1e18a5d344 rollingupdate_test: add tests for rolling update
The tests create a cluster with 2 node instance groups and 1 master and bastion instance groups.
Only one node instance group requires rolling update.

instanceGroupNodeSpecificErrorClusterValidator mocks a validation failure for a given node group.
rolling update should not fail if the cluster validator reports an error in an unrelated instance group.
2020-10-31 19:17:45 +05:30
Ole Markus With 63f13322d5 Don't pass ctx and cluster everywhere 2020-09-23 08:30:24 +02:00
Ole Markus With 0ec71686b9 Refactor cloudinstancegroupmember in a more independent cloud instance representation
Apply suggestions from code review

Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
2020-08-30 21:37:03 +02:00
Ciprian Hacman 5a9cc3d216 Fix int to string conversions 2020-07-26 09:09:52 +03:00
John Gardiner Myers cc2b647d06 Create separate field for disabling rolling updates 2020-06-19 22:19:26 -07:00
John Gardiner Myers af90ecdddf Reduce test flakiness 2020-05-22 19:33:01 -07:00
Justin Santa Barbara ffb6cd61aa Rolling-update validation harmonization
This is a follow-on to #8868; I believe the intent of that was to
expose the option to do more (or fewer) retries.

We previously had a single retry to prevent flapping; this basically
unifies the previous behaviour with the idea of making it
configurable.

* validate-count=0 effectively turns off validation.

* validate-count=1 will do a single validation, without flapping
  detection.

* validate-count>=2 will require N successful validations in a row,
  waiting ValidateSuccessDuration in between.

A nice side-effect of this is that the tests now explicitly specify
ValidateCount=1 instead of setting ValidateSuccessDuration=0, which
had the side effect of doing the equivalent to ValidateCount=1.
2020-04-17 01:40:02 -04:00
Justin Santa Barbara 31bb16d4d1 Add context.Context to most signatures
The client-go signature for most methods adds a context.Context
object, and also makes Options mandatory.  Feed through a
context.Context through many of our methods (but use context.TODO to
stop it getting totally out of hand!)
2020-04-11 14:44:17 -04:00
Jesse Haka 11eaacd53e validationtimes -> validationcount 2020-04-08 13:55:29 +03:00
Jesse Haka e1e79790ef validate cluster n times in rolling update 2020-04-08 13:55:24 +03:00
John Gardiner Myers 33e23166e4 Support the kops.k8s.io/needs-update annotation on nodes 2020-03-09 22:43:09 -07:00
John Gardiner Myers 99100dc4a0 Fix flaky test 2020-03-03 20:54:22 -08:00
John Gardiner Myers ebfcf5d909 Implement recovery from previous failed surge rolling updates 2020-01-27 20:45:16 -08:00
John Gardiner Myers cee662d521 Implement MaxSurge happy path 2020-01-27 20:45:16 -08:00
John Gardiner Myers 640f5f5b74 Terminate AWS instances through EC2 instead of Autoscaling 2020-01-27 20:15:10 -08:00
John Gardiner Myers 5f72d12132 Reduce test flakiness 2020-01-12 21:27:55 -08:00
John Gardiner Myers 10d6416b8e Allow MaxConcurrency for masters and bastions 2020-01-11 18:50:35 -08:00
John Gardiner Myers 0c3651c9c8 Implement MaxUnavailable 2020-01-05 12:09:55 -08:00
John Gardiner Myers 01dd793604 Specify number of NotReady instances in makeGroup() parameter 2020-01-04 10:47:08 -08:00
John Gardiner Myers 39f849271b Fold setUpCloud() into getGroups() 2020-01-04 09:08:00 -08:00
John Gardiner Myers 612e4ae484 Extract creation of the CloudInstanceGroup 2020-01-04 09:08:00 -08:00
John Gardiner Myers cba59afac4 Change taint key per review comment 2020-01-03 10:07:21 -08:00
John Gardiner Myers 97ad2c3b54 Taint nodes needing update 2019-12-30 16:06:00 -08:00
John Gardiner Myers 5189cc1ef6 Add a third instance to each nodes group in rolling update tests 2019-12-30 13:48:37 -08:00
John Gardiner Myers 92581ab4a1 Create nodes for instances in rolling update tests 2019-12-30 13:48:37 -08:00
John Gardiner Myers 77769855af Return groups from getTestSetup() 2019-12-30 13:48:34 -08:00
Justin Santa Barbara 84835ce0ba Update pkg/instancegroups/rollingupdate_test.go
Co-Authored-By: John Gardiner Myers <jgmyers@proofpoint.com>
2019-12-17 21:25:18 -05:00
Peter Rifel a24d9b6455 remove more trailing whitespace 2019-12-17 13:03:16 -06:00
Peter Rifel 85a1d23c18 remove trailing whitespace that was breaking gofmt 2019-12-17 12:49:20 -06:00
Justin Santa Barbara 8373c9fc4d tests: increase timeout in rolling update tests
We never know when e.g. a GC is going to delay us, so we need a lot
more padding on these timeouts.
2019-12-17 09:59:21 -05:00
John Gardiner Myers 19e165759b Add unit test for flapping validation 2019-12-13 13:45:21 -08:00
Jesse Haka 44183aef7f validate cluster twice 2019-12-12 08:48:15 +02:00
John Gardiner Myers 1239c05e71 Validate after updating bastion 2019-12-09 18:45:51 -08:00