This makes use of the interface attach method when reconciling server
ports.
The difference between this and simply setting the `DeviceID` on the Port
is that the attachment process also validates the server, which means,
for example, that a Port cannot be attached to a server in the `ERROR`
state.
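The validation that attachment adds can be illustrated with a minimal sketch; the `server`, `port`, and `attachPort` names here are hypothetical stand-ins, not the kOps implementation (which goes through the Nova interface-attach API):

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified model of a Nova server and a Neutron port.
type server struct {
	ID     string
	Status string
}

type port struct {
	ID       string
	DeviceID string
}

// attachPort mimics attach semantics: unlike directly setting
// port.DeviceID, the server's state is validated first.
func attachPort(s *server, p *port) error {
	if s.Status == "ERROR" {
		return errors.New("cannot attach port to server in ERROR state")
	}
	p.DeviceID = s.ID
	return nil
}

func main() {
	p := &port{ID: "port-1"}
	if err := attachPort(&server{ID: "srv-1", Status: "ERROR"}, p); err != nil {
		fmt.Println("attach rejected:", err)
	}
	if err := attachPort(&server{ID: "srv-2", Status: "ACTIVE"}, p); err == nil {
		fmt.Println("attached to:", p.DeviceID)
	}
}
```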
We want to be able to use "dns=none" (without peer-to-peer gossip)
even for clusters that have the k8s.local extension. These were
previously called "gossip clusters", but that is really an
implementation detail; what actually matters to users is that they don't
rely on writing records into a DNS zone (such as Route53).
This will allow kOps to build an OpenStack InstanceGroup with a missing
fixed IP.
In essence this will get rid of `interface name X not found` errors
when there are servers present that do not have an interface attached
or are in a state (e.g. `ERROR`) that does not allow the lookup.
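The skip-instead-of-fail behavior can be sketched roughly as follows; the `server` type and `fixedIPs` helper are illustrative assumptions, not the actual kOps code:

```go
package main

import "fmt"

// Simplified server model: interface name -> fixed IP.
type server struct {
	ID         string
	Status     string
	Interfaces map[string]string
}

// fixedIPs collects the fixed IPs for ifName, skipping servers that do
// not have the interface attached or are in a state such as ERROR,
// instead of failing with "interface name X not found".
func fixedIPs(servers []server, ifName string) []string {
	var ips []string
	for _, s := range servers {
		if s.Status == "ERROR" {
			continue // server state does not allow interface lookup
		}
		ip, ok := s.Interfaces[ifName]
		if !ok {
			continue // no interface attached yet; skip rather than error
		}
		ips = append(ips, ip)
	}
	return ips
}

func main() {
	servers := []server{
		{ID: "a", Status: "ACTIVE", Interfaces: map[string]string{"eth0": "10.0.0.5"}},
		{ID: "b", Status: "ERROR"},
		{ID: "c", Status: "ACTIVE", Interfaces: map[string]string{}},
	}
	fmt.Println(fixedIPs(servers, "eth0")) // only server "a" contributes
}
```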
With this change the external network info will be set even if load
balancer support is not enabled. Otherwise, kOps fails with an error
when it creates a network with a router.
This will make use of the kOps task engine to retry failed servers.
The former approach had the side effect that kOps did not fail when the
last retry failed: because a server was now present (although in an
erroneous state), the "instance task" that the task engine retried
reconciled the server (port, floating IP) instead of recreating it.
Letting the task engine retry the failed servers handles this
correctly.
* Add a mutex lock to 'awsCloudInstances' map
We're using the Terraform kOps provider to manage our AWS kOps clusters.
From time to time we hit a race condition whose stack trace points to
the `awsup.NewAWSCloud` function, caused by a concurrent write to the
map that associates regions with `AWSCloud` objects.
This PR moves the variable into a new type that wraps the map in its own
struct, with access guarded by a mutex.
Let me know if that makes sense to you all.
Thanks for building this awesome project!
* lock on reads as well
* cosmetic change, removing empty line
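The mutex-wrapped map described above can be sketched like this; the `cloudCache` type is illustrative (the real kOps cache maps regions to `awsup.AWSCloud` clients rather than strings):

```go
package main

import (
	"fmt"
	"sync"
)

// cloudCache wraps the region -> cloud map in its own struct; all
// access, reads included, goes through the mutex.
type cloudCache struct {
	mu     sync.Mutex
	clouds map[string]string // region -> cloud handle (string for illustration)
}

func (c *cloudCache) Get(region string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	cloud, ok := c.clouds[region]
	return cloud, ok
}

func (c *cloudCache) Put(region, cloud string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.clouds[region] = cloud
}

func main() {
	cache := &cloudCache{clouds: map[string]string{}}
	var wg sync.WaitGroup
	// Concurrent writers no longer trip the race detector.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			cache.Put(fmt.Sprintf("region-%d", i%4), "cloud")
			cache.Get("region-0")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(cache.clouds)) // prints: 4
}
```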
The deployment manifest of snapshot-validation-deployment was missing a
service account and hence used the default one that exists in the
kube-system namespace.
This caused it to log `Failed to watch *v1.VolumeSnapshotClass` errors.
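A fix along these lines typically adds a dedicated ServiceAccount and references it from the pod spec. The manifest fragment below is a hedged sketch: the `snapshot-validation` account name is illustrative and the Deployment is abbreviated, not the exact upstream manifest:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: snapshot-validation        # illustrative name
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: snapshot-validation-deployment
  namespace: kube-system
spec:
  # selector/containers omitted for brevity
  template:
    spec:
      serviceAccountName: snapshot-validation  # previously missing, so the pod fell back to "default"
```

The dedicated account can then be bound to a Role/ClusterRole that grants watch access to VolumeSnapshotClass objects.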