John Gardiner Myers
70f7d9bdb2
Use function to get cloud provider from cluster spec
2022-03-02 21:59:47 -08:00
Bronson Mirafuentes
86b0ef0d0c
add drain-timeout flag to rolling-update cluster
2022-01-20 14:05:55 -08:00
Jesse Haka
b88d110f58
Drain OpenStack loadbalancers
2021-12-31 13:16:02 +02:00
Ole Markus With
5e944f1a15
Do not try to detach karpenter nodes from ASGs
2021-12-15 09:56:33 +01:00
Ole Markus With
b785965c50
Rename InstanceManager to Manager
2021-12-13 09:14:24 +01:00
Ole Markus With
1ccb7840ac
make rolling update work
2021-12-12 19:33:41 +01:00
Ciprian Hacman
ea7df00719
Run hack/update-gofmt.sh
2021-12-01 22:39:50 +02:00
Kubernetes Prow Robot
ec7fe88868
Merge pull request #12730 from johngmyers/fix-deprecated
...
Fix use of deprecated method
2021-11-13 23:22:46 -08:00
John Gardiner Myers
c5914d6ddb
Fix use of deprecated method
2021-11-13 20:29:52 -08:00
John Gardiner Myers
4396270d74
Fix out of bounds error when instance detach fails
2021-11-08 23:00:28 -08:00
John Gardiner Myers
d935a419f8
Simplify AddSSHPublicKey() interface
2021-07-24 08:59:57 -07:00
John Gardiner Myers
e0915887ed
Move asset copying out of apply_cluster
2021-06-05 21:17:50 -07:00
John Gardiner Myers
d46ee9c883
Exclude nodes from load balancers upon cordoning
2021-04-20 17:58:26 -07:00
Ole Markus With
09615935fd
Make kOps CLI handle ASG warm pools
2021-04-15 11:10:23 +02:00
Ole Markus With
ab1b85818d
Pass ctx to drain helper
...
In some rare cases, we hit a nil pointer dereference because the k8s code
tries to use the ctx we are not passing.
2021-03-26 10:29:11 +01:00
Ole Markus With
20bd724f5e
Add support for scaling out the control plane with dedicated apiserver nodes
...
Ensure apiserver role can only be used on AWS (because of firewalling)
Apply api-server label to CP as well
Consolidate node not ready validation message
Guard apiserver nodes with a feature flag
Rename Apiserver role to APIServer
Add an integration test for apiserver nodes
Enumerate all roles in rolling update docs
Apply suggestions from code review
Co-authored-by: Steven E. Harris <seh@panix.com>
2021-03-20 20:57:00 +01:00
Markos Chandras
0a49650c70
aws: Graceful handling of EC2 detach errors
...
Sometimes, we observe the following error during a rolling update:
error detaching instance "i-XXXX", node "ip-10-X-X-X.ec2.internal": error detaching instance "i-XXXX": ValidationError: The instance i-XXXX is not part of Auto Scaling group XXXXX
The sequence of events that leads to this problem is the following:
- A new ASG object is being built from the launch template
- Existing instances are being added to it
- An existing instance is being ignored because it's already terminating
W0205 08:01:32.593377 191 aws_cloud.go:791] ignoring instance as it is terminating: i-XXXX in autoscaling group: XXXX
- Due to maxSurge, a detach from the autoscaling group is attempted on
the terminating instance, and it fails.
As such, in case of EC2 ASG detach failures we can simply try to detach
the next node instead of aborting the whole update operation.
2021-03-05 15:01:30 +02:00
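The "try the next node" behaviour described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the kOps implementation: `detachFunc`, `detachUpTo`, and `errNotInGroup` are hypothetical names, and the real code talks to the AWS Auto Scaling API instead of a stub.

```go
package main

import (
	"errors"
	"fmt"
)

// detachFunc is a hypothetical stand-in for the cloud call that detaches
// a single instance from its autoscaling group.
type detachFunc func(instanceID string) error

// errNotInGroup imitates the ValidationError quoted in the commit message.
var errNotInGroup = errors.New("instance is not part of the Auto Scaling group")

// detachUpTo walks the candidates in order until `want` instances have been
// detached. A failed detach is recorded and skipped instead of aborting the
// whole operation, which is the behaviour the commit message argues for.
func detachUpTo(candidates []string, want int, detach detachFunc) (detached []string, skipped []error) {
	for _, id := range candidates {
		if len(detached) >= want {
			break
		}
		if err := detach(id); err != nil {
			skipped = append(skipped, fmt.Errorf("detaching %q: %w", id, err))
			continue // try the next node
		}
		detached = append(detached, id)
	}
	return detached, skipped
}

func main() {
	// Stub: the already-terminating instance cannot be detached.
	detach := func(id string) error {
		if id == "i-terminating" {
			return errNotInGroup
		}
		return nil
	}
	detached, skipped := detachUpTo([]string{"i-terminating", "i-a", "i-b"}, 1, detach)
	fmt.Println(detached, len(skipped))
}
```

Collecting the skipped errors rather than discarding them keeps the failure visible to the caller, who can still log it as a warning.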
Jesse Haka
46de9f145e
update gophercloud dependency
2021-01-11 14:48:22 +02:00
Ole Markus With
5a2f1274fb
Don't try to detach masters
2020-11-28 09:44:42 +01:00
Kubernetes Prow Robot
0b5646e94a
Merge pull request #10266 from rifelpet/k8s120
...
Update k8s dependencies to 1.20.0-beta.2
2020-11-18 10:48:07 -08:00
Peter Rifel
47354ce010
Update kubectl drain fields for 1.20
2020-11-18 11:55:03 -06:00
Kubernetes Prow Robot
92911d7dcf
Merge pull request #10167 from olemarkus/cilium-ondelete
...
Make it possible to use OnDelete update strategy on addon daemonset
2020-11-16 12:38:03 -08:00
Ole Markus With
2659a30280
Make get instances respect needs-update annotation
...
Make it possible for addons to set needs-update annotation
Use onDelete update strategy for cilium and set needs-update annotation
Rename node roles
2020-11-16 08:26:17 +01:00
Bharath Vedartham
208199ba85
instancegroups: Clear out the TODO comment
...
Now that we are able to associate pod validation failures with the
instance groups, we can remove the TODO comment.
2020-11-15 11:07:45 +05:30
Jesse Haka
bd2dcc93ca
fix test
2020-11-06 11:17:23 +02:00
Kubernetes Prow Robot
7b26ec4b6d
Merge pull request #10065 from bharath-123/feature/instancegroup-specific-validation
...
Avoid waiting on validation during rolling update for inapplicable instance groups
2020-11-05 22:38:50 -08:00
zouyu
2e6b50f9e4
Some typos
...
Signed-off-by: zouyu <zouy.fnst@cn.fujitsu.com>
2020-11-03 16:28:30 +08:00
Bharath Vedartham
1e18a5d344
rollingupdate_test: add tests for rolling update
...
The tests create a cluster with two node instance groups, one master instance group, and one bastion instance group.
Only one node instance group requires a rolling update.
instanceGroupNodeSpecificErrorClusterValidator mocks a validation failure for a given node group.
The rolling update should not fail if the cluster validator reports an error in an unrelated instance group.
2020-10-31 19:17:45 +05:30
Bharath Vedartham
7067f5f47a
instancegroups: Ignore validation errors in unrelated instance groups
...
When unrelated instance groups produce validation errors, validation fails for the
instance group being updated, and its rolling update is forced to wait before continuing.
This can be avoided, as failures in other node instance groups usually don't affect
the instance group being updated in any way.
2020-10-31 19:17:24 +05:30
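The idea in the commit above, ignoring validation failures that belong to other instance groups, can be sketched as a simple filter. The `failure` type and `relevantFailures` function are hypothetical simplifications, and treating failures with no associated group as always relevant is an assumption of this sketch, not something the commit message states.

```go
package main

import "fmt"

// failure is a hypothetical, simplified validation failure record.
type failure struct {
	InstanceGroup string // empty when the failure is not tied to a group
	Message       string
}

// relevantFailures keeps only the failures that should block the rolling
// update of the group named `updating`: failures in that group, plus
// failures not tied to any group (an assumption of this sketch).
func relevantFailures(all []failure, updating string) []failure {
	var out []failure
	for _, f := range all {
		if f.InstanceGroup == "" || f.InstanceGroup == updating {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	all := []failure{
		{InstanceGroup: "nodes-a", Message: "node not ready"},
		{InstanceGroup: "nodes-b", Message: "pod pending"},
	}
	// Only the nodes-a failure blocks an update of nodes-a.
	fmt.Println(len(relevantFailures(all, "nodes-a")))
}
```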
Ciprian Hacman
e0332177b3
Skip failing test
2020-10-15 07:46:47 +03:00
Srikanth Rao
4d251fe900
[Digital Ocean] Implement Delete Instance logic for rolling update (#10000)
...
* Add delete Instance implementation for DO
* Add warning for DeleteInstance usage
* Use reconcile option for rolling update
* Update pkg/instancegroups/instancegroups.go
Co-authored-by: Ciprian Hacman <ciprianhacman@gmail.com>
2020-10-13 10:06:27 -07:00
Ole Markus With
aa66c4f6d8
Add rolling upgrade to openstack
2020-10-01 20:07:44 +02:00
Ole Markus With
a39beb20c8
Rolling update test for OS
2020-10-01 20:07:44 +02:00
Ole Markus With
63f13322d5
Don't pass ctx and cluster everywhere
2020-09-23 08:30:24 +02:00
Ole Markus With
0ec71686b9
Refactor cloudinstancegroupmember in a more independent cloud instance representation
...
Apply suggestions from code review
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
2020-08-30 21:37:03 +02:00
Ole Markus With
ff6c04938d
Add kops delete instance command
...
Add support for deleting instance by k8s node name
Add yes flag
2020-08-28 08:43:30 +02:00
Peter Rifel
4d9f0128a3
Upgrade to klog2
...
This splits up the kubernetes 1.19 PR to make it easier to keep up to date until we get it sorted out.
2020-08-16 20:56:48 -05:00
Ciprian Hacman
5a9cc3d216
Fix int to string conversions
2020-07-26 09:09:52 +03:00
Kubernetes Prow Robot
219147e2f4
Merge pull request #9348 from johngmyers/rollingupdate-disable
...
Create separate field for disabling rolling updates
2020-07-02 09:08:47 -07:00
John Gardiner Myers
494209884a
Rolling update instance groups in consistent order
2020-06-20 10:58:06 -07:00
John Gardiner Myers
cc2b647d06
Create separate field for disabling rolling updates
2020-06-19 22:19:26 -07:00
ZouYu
2fc52ec6be
fix some go-lint warning
...
Signed-off-by: ZouYu <zouy.fnst@cn.fujitsu.com>
2020-06-09 08:52:50 +08:00
Kubernetes Prow Robot
2a613f1331
Merge pull request #9165 from johngmyers/retry-initial
...
Try validating multiple times before updating instancegroup
2020-05-29 12:07:33 -07:00
John Gardiner Myers
091893fd20
Simplify rolling update internal methods
2020-05-29 10:52:03 -07:00
John Gardiner Myers
dd884a6a64
fix missing space
...
Co-authored-by: Peter Rifel <rifelpet@users.noreply.github.com>
2020-05-29 10:35:15 -07:00
John Gardiner Myers
7756be7fbc
Try validating multiple times before updating instancegroup
2020-05-22 20:26:02 -07:00
John Gardiner Myers
af90ecdddf
Reduce test flakiness
2020-05-22 19:33:01 -07:00
John Gardiner Myers
df7e0b18b6
Ignore already-deleted nodes during rolling update
2020-04-26 21:41:54 -07:00
Justin Santa Barbara
ffb6cd61aa
Rolling-update validation harmonization
...
This is a follow-on to #8868; I believe the intent of that was to
expose the option to do more (or fewer) retries.
We previously had a single retry to prevent flapping; this basically
unifies the previous behaviour with the idea of making it
configurable.
* validate-count=0 effectively turns off validation.
* validate-count=1 will do a single validation, without flapping
detection.
* validate-count>=2 will require N successful validations in a row,
waiting ValidateSuccessDuration in between.
A nice side-effect of this is that the tests now explicitly specify
ValidateCount=1 instead of setting ValidateSuccessDuration=0, which
had the side effect of doing the equivalent to ValidateCount=1.
2020-04-17 01:40:02 -04:00
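The validate-count semantics listed in the commit above can be illustrated with a small loop. This is a sketch under stated assumptions, not the kOps code: `validateFunc`, `validateN`, and `errNotReady` are hypothetical names, and the `maxAttempts` bound is an invention of this example to keep it terminating.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// validateFunc is a hypothetical stand-in for a cluster validator.
type validateFunc func() error

// errNotReady is a placeholder validation error for the example.
var errNotReady = errors.New("cluster not ready")

// validateN sketches the described semantics:
//   - count == 0: validation is skipped entirely
//   - count == 1: a single successful validation is enough
//   - count >= 2: count successes in a row are required, waiting
//     successDelay between them; any failure resets the streak
//
// maxAttempts is an assumption of this sketch, not a kOps setting.
func validateN(validate validateFunc, count, maxAttempts int, successDelay time.Duration) error {
	if count == 0 {
		return nil
	}
	streak := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err := validate(); err != nil {
			streak = 0 // flapping detected: start over
			continue
		}
		streak++
		if streak >= count {
			return nil
		}
		time.Sleep(successDelay)
	}
	return errors.New("cluster did not validate in time")
}

func main() {
	// A validator that flaps once before settling.
	results := []bool{true, false, true, true}
	i := 0
	flapping := func() error {
		ok := results[i%len(results)]
		i++
		if !ok {
			return errNotReady
		}
		return nil
	}
	// With count=2 the single early success does not count.
	fmt.Println(validateN(flapping, 2, 10, 0), "after", i, "validations")
}
```

With a count of 1 the same validator would pass on its first call, which is why the commit describes that setting as having no flapping detection.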
Justin Santa Barbara
31bb16d4d1
Add context.Context to most signatures
...
The client-go signature for most methods adds a context.Context
object, and also makes Options mandatory. Feed a context.Context
through many of our methods (but use context.TODO to
stop it getting totally out of hand!)
2020-04-11 14:44:17 -04:00
Jesse Haka
11eaacd53e
validationtimes -> validationcount
2020-04-08 13:55:29 +03:00
Jesse Haka
e1e79790ef
validate cluster n times in rolling update
2020-04-08 13:55:24 +03:00
John Gardiner Myers
ea3b8d7710
make gomod
2020-04-05 10:22:51 -07:00
John Gardiner Myers
6844eef4ca
Switch to the k/k implementation of drain.Helper
2020-04-05 10:22:49 -07:00
Peter Rifel
a999b3ea61
fix OWNERS labels format
...
These need to be lists
2020-03-10 22:47:50 -05:00
Kubernetes Prow Robot
8ecc5edb73
Merge pull request #8272 from johngmyers/need-update-annotation
...
Support the kops.k8s.io/needs-update annotation on nodes
2020-03-10 09:01:35 -07:00
Kubernetes Prow Robot
db435ee7cd
Merge pull request #8717 from rifelpet/owners-labels
...
Add labels to OWNERS files
2020-03-10 08:23:51 -07:00
Peter Rifel
237a125f2c
Add labels to OWNERS files
...
This will automatically label PRs that touch these directories.
This makes it easier to query GitHub for PRs that affect certain areas of the code.
I mostly used existing labels but created some new ones as well.
2020-03-10 08:35:58 -05:00
John Gardiner Myers
33e23166e4
Support the kops.k8s.io/needs-update annotation on nodes
2020-03-09 22:43:09 -07:00
John Gardiner Myers
03eb8246c7
Refactor/simplify rolling update
2020-03-09 11:05:58 -07:00
John Gardiner Myers
e104cdb982
Default maxSurge to 1 on AWS
2020-03-04 19:41:51 -08:00
John Gardiner Myers
99100dc4a0
Fix flaky test
2020-03-03 20:54:22 -08:00
John Gardiner Myers
ed73726195
Address review comments
2020-02-28 21:05:43 -08:00
John Gardiner Myers
38b7219b14
Remove code made unnecessary by apimachinery validation
2020-01-27 20:45:17 -08:00
John Gardiner Myers
ebfcf5d909
Implement recovery from previous failed surge rolling updates
2020-01-27 20:45:16 -08:00
John Gardiner Myers
cee662d521
Implement MaxSurge happy path
2020-01-27 20:45:16 -08:00
John Gardiner Myers
4ddc58ca5e
Add MaxSurge to resolveSettings
2020-01-27 20:45:16 -08:00
John Gardiner Myers
640f5f5b74
Terminate AWS instances through EC2 instead of Autoscaling
2020-01-27 20:15:10 -08:00
John Gardiner Myers
d56ad41334
Address review comments
2020-01-26 14:18:51 -08:00
John Gardiner Myers
5f72d12132
Reduce test flakiness
2020-01-12 21:27:55 -08:00
John Gardiner Myers
10d6416b8e
Allow MaxConcurrency for masters and bastions
2020-01-11 18:50:35 -08:00
John Gardiner Myers
0c3651c9c8
Implement MaxUnavailable
2020-01-05 12:09:55 -08:00
John Gardiner Myers
0952374027
Extract maybeValidate
2020-01-05 12:09:55 -08:00
John Gardiner Myers
91f4920537
Extract drainTerminateAndWait()
2020-01-05 12:09:55 -08:00
John Gardiner Myers
adaf903b90
Create resolveSettings
2020-01-05 12:09:54 -08:00
Kubernetes Prow Robot
121d9f461f
Merge pull request #7909 from johngmyers/remove-drain-feature-flag
...
Remove DrainAndValidateRollingUpdate feature flag
2020-01-05 11:15:40 -08:00
Kubernetes Prow Robot
a22af4fa80
Merge pull request #8239 from johngmyers/simplify-rolling
...
Simplify code for rolling updates of nodes
2020-01-04 13:13:40 -08:00
John Gardiner Myers
01dd793604
Specify number of NotReady instances in makeGroup() parameter
2020-01-04 10:47:08 -08:00
John Gardiner Myers
39f849271b
Fold setUpCloud() into getGroups()
2020-01-04 09:08:00 -08:00
John Gardiner Myers
612e4ae484
Extract creation of the CloudInstanceGroup
2020-01-04 09:08:00 -08:00
John Gardiner Myers
cba59afac4
Change taint key per review comment
2020-01-03 10:07:21 -08:00
John Gardiner Myers
cd499f6f09
Remove unused code
2019-12-31 14:33:05 -08:00
John Gardiner Myers
0cbd76ecfb
Simplify code for rolling updates of nodes
2019-12-31 10:25:55 -08:00
John Gardiner Myers
97ad2c3b54
Taint nodes needing update
2019-12-30 16:06:00 -08:00
John Gardiner Myers
5189cc1ef6
Add a third instance to each nodes group in rolling update tests
2019-12-30 13:48:37 -08:00
John Gardiner Myers
92581ab4a1
Create nodes for instances in rolling update tests
2019-12-30 13:48:37 -08:00
John Gardiner Myers
77769855af
Return groups from getTestSetup()
2019-12-30 13:48:34 -08:00
John Gardiner Myers
1d3d5c1d2f
pkg/instancegroups - fix static check
2019-12-22 20:56:27 -08:00
Justin Santa Barbara
84835ce0ba
Update pkg/instancegroups/rollingupdate_test.go
...
Co-Authored-By: John Gardiner Myers <jgmyers@proofpoint.com>
2019-12-17 21:25:18 -05:00
Peter Rifel
a24d9b6455
remove more trailing whitespace
2019-12-17 13:03:16 -06:00
Peter Rifel
85a1d23c18
remove trailing whitespace that was breaking gofmt
2019-12-17 12:49:20 -06:00
Justin Santa Barbara
8373c9fc4d
tests: increase timeout in rolling update tests
...
We never know when e.g. a GC is going to delay us, so we need a lot
more padding on these timeouts.
2019-12-17 09:59:21 -05:00
John Gardiner Myers
2850826a52
Improve logging of cluster revalidation
2019-12-13 13:48:47 -08:00
John Gardiner Myers
19e165759b
Add unit test for flapping validation
2019-12-13 13:45:21 -08:00
Jesse Haka
44183aef7f
validate cluster twice
2019-12-12 08:48:15 +02:00
John Gardiner Myers
1239c05e71
Validate after updating bastion
2019-12-09 18:45:51 -08:00
John Gardiner Myers
2e36124f77
Expose ValidateTickDuration for use by unit tests
2019-12-09 18:43:20 -08:00
Kashif Saadat
fcf6f0098c
Canal Typha spec and apimachinery
2019-12-06 15:36:48 +00:00
John Gardiner Myers
4eccd3d53f
Remove DrainAndValidateRollingUpdate feature flag
2019-12-05 22:50:04 -08:00
John Gardiner Myers
38b19e53b4
Add a second master to rolling update tests
2019-11-19 16:55:39 -08:00