We're aiming to use this for testing immediately and better
logging/tracing in future, but to make the changes manageable breaking
them into a smaller series that don't directly achieve much.
Ensure apiserver role can only be used on AWS (because of firewalling)
Apply api-server label to CP as well
Consolidate node not ready validation message
Guard apiserver nodes with a feature flag
Rename Apiserver role to APIServer
Add an integration test for apiserver nodes
Rename Apiserver role to APIServer
Enumerate all roles in rolling update docs
Apply suggestions from code review
Co-authored-by: Steven E. Harris <seh@panix.com>
Sometimes, we observe the following error during a rolling update:
error detaching instance "i-XXXX", node "ip-10-X-X-X.ec2.internal": error detaching instance "i-XXXX": ValidationError: The instance i-XXXX is not part of Auto Scaling group XXXXX
The sequence of events that lead to this problem is the following:
- A new ASG object is being built from the launch template
- Existing instances are being added to it
- An existing instance is being ignored because it's already terminating
W0205 08:01:32.593377 191 aws_cloud.go:791] ignoring instance as it is terminating: i-XXXX in autoscaling group: XXXX
- Due to maxSurge, the terminating instance is trying to be detached
from the autoscaling group and fails.
As such, in case of EC@ ASG deatch failures we can simply try to detach
the next node instead of aborting the whole update operation.
The tests create a cluster with 2 node instance groups and 1 master and bastion instance groups.
Only one node instance group requires rolling update.
instanceGroupNodeSpecificErrorClusterValidator mocks a validation failure for a given node group.
rolling update should not fail if the cluster validator reports an error in an unrelated instance group.
When unrelated instance groups produce validation errors, the instance group
being updated produces a failure and is forced to wait for rolling update to continue.
This can be avoided as failures in different node instance groups usually don't affect
the instance group being affected in any way.