Ensure apiserver role can only be used on AWS (because of firewalling)
Apply api-server label to CP as well
Consolidate node not ready validation message
Guard apiserver nodes with a feature flag
Rename Apiserver role to APIServer
Add an integration test for apiserver nodes
Rename Apiserver role to APIServer
Enumerate all roles in rolling update docs
Apply suggestions from code review
Co-authored-by: Steven E. Harris <seh@panix.com>
Sometimes, we observe the following error during a rolling update:
error detaching instance "i-XXXX", node "ip-10-X-X-X.ec2.internal": error detaching instance "i-XXXX": ValidationError: The instance i-XXXX is not part of Auto Scaling group XXXXX
The sequence of events that lead to this problem is the following:
- A new ASG object is being built from the launch template
- Existing instances are being added to it
- An existing instance is being ignored because it's already terminating
W0205 08:01:32.593377 191 aws_cloud.go:791] ignoring instance as it is terminating: i-XXXX in autoscaling group: XXXX
- Due to maxSurge, the terminating instance is trying to be detached
from the autoscaling group and fails.
As such, in case of EC@ ASG deatch failures we can simply try to detach
the next node instead of aborting the whole update operation.
The tests create a cluster with 2 node instance groups and 1 master and bastion instance groups.
Only one node instance group requires rolling update.
instanceGroupNodeSpecificErrorClusterValidator mocks a validation failure for a given node group.
rolling update should not fail if the cluster validator reports an error in an unrelated instance group.
When unrelated instance groups produce validation errors, the instance group
being updated produces a failure and is forced to wait for rolling update to continue.
This can be avoided as failures in different node instance groups usually don't affect
the instance group being affected in any way.
This is a follow-on to #8868; I believe the intent of that was to
expose the option to do more (or fewer) retries.
We previously had a single retry to prevent flapping; this basically
unifies the previous behaviour with the idea of making it
configurable.
* validate-count=0 effectively turns off validation.
* validate-count=1 will do a single validation, without flapping
detection.
* validate-count>=2 will require N succesful validations in a row,
waiting ValidateSuccessDuration in between.
A nice side-effect of this is that the tests now explicitly specify
ValidateCount=1 instead of setting ValidateSuccessDuration=0, which
had the side effect of doing the equivalent to ValidateCount=1.
The client-go signature for most methods adds a context.Context
object, and also makes Options mandatory. Feed through a
context.Context through many of our methods (but use context.TODO to
stop it getting totally out of hand!)
This will automatically label PRs that touch these directories.
This makes it easier to query GitHub for PRs that affect certain areas of the code.
I mostly used existing labels but created some new ones as well.