When using a "mixed instance policy"[^1] instance group, spot and onDemand nodes are part of the same
ASG, and the ASG handles the percentage of spot vs onDemand instances. There are no annotations, EC2 tags or labels to identify which
instances are onDemand and which are spot. However, `EC2.DescribeInstances` exposes a field called `InstanceLifecycle`.
This field is set only on `spot` and `scheduled` AWS EC2 instances.
This PR introduces a new label that is attached to AWS EC2 spot and scheduled nodes.
Depending on the instance lifecycle, the label is:
```
node-role.kubernetes.io/spot-worker: "true"
```
or
```
node-role.kubernetes.io/scheduled-worker: "true"
```
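As a rough illustration of the mapping described above, the sketch below turns an `InstanceLifecycle` value into the corresponding label key. The function name `lifecycleLabel` and the constants are hypothetical and not taken from the PR; the sketch also assumes the lifecycle arrives as a plain string and is empty for onDemand instances.

```go
package main

// Label keys as given in the description above.
const (
	spotWorkerLabel      = "node-role.kubernetes.io/spot-worker"
	scheduledWorkerLabel = "node-role.kubernetes.io/scheduled-worker"
)

// lifecycleLabel (hypothetical helper) maps an InstanceLifecycle value to
// the node label key that should be set to "true". It returns ok=false for
// any other value, including the empty string assumed for onDemand
// instances, so that those nodes get no extra label.
func lifecycleLabel(lifecycle string) (key string, ok bool) {
	switch lifecycle {
	case "spot":
		return spotWorkerLabel, true
	case "scheduled":
		return scheduledWorkerLabel, true
	default:
		return "", false
	}
}
```

A caller would look up the instance once, call `lifecycleLabel` with the reported value, and add `key: "true"` to the node labels only when `ok` is true.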
[^1]: https://github.com/kubernetes/kops/blob/master/docs/instance_groups.md#mixedinstancepolicy-aws-only
kube-apiserver doesn't expose the healthcheck via a dedicated
endpoint, instead relying on anonymous-access being enabled. That
has previously forced us to enable the unauthenticated endpoint on
127.0.0.1:8080.
Instead we now run a small sidecar container, which
proxies only /healthz and /readyz requests, adding appropriate
authentication using a client certificate.
This will also enable better load balancer checks in future, as these
have previously been hampered by the custom CA certificate.
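The idea of the sidecar can be sketched as a reverse proxy with a path allowlist. This is a minimal sketch, not the actual sidecar: `isHealthPath` and `newHealthProxy` are hypothetical names, and the caller is assumed to supply a `tls.Config` that already holds the client certificate and the cluster CA.

```go
package main

import (
	"crypto/tls"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// isHealthPath reports whether a request path is one of the two
// health endpoints the sidecar is willing to forward.
func isHealthPath(path string) bool {
	return path == "/healthz" || path == "/readyz"
}

// newHealthProxy (hypothetical) returns a handler that forwards health
// requests to target and rejects everything else with 404. tlsConfig is
// assumed to carry the client certificate used to authenticate to
// kube-apiserver.
func newHealthProxy(target *url.URL, tlsConfig *tls.Config) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{TLSClientConfig: tlsConfig}

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !isHealthPath(r.URL.Path) {
			// Never proxy arbitrary API paths with our credentials.
			http.NotFound(w, r)
			return
		}
		proxy.ServeHTTP(w, r)
	})
}
```

Keeping the allowlist in its own function makes the security-relevant part easy to test without starting a server.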
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
This is a follow-on to #8868; I believe the intent of that was to
expose the option to do more (or fewer) retries.
We previously had a single retry to prevent flapping; this basically
unifies the previous behaviour with the idea of making it
configurable.
* validate-count=0 effectively turns off validation.
* validate-count=1 will do a single validation, without flapping
detection.
* validate-count>=2 will require N successful validations in a row,
waiting ValidateSuccessDuration in between.
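
The three cases above can be sketched as a single loop. This is a simplified illustration under stated assumptions, not the code in this change: `validateN` is a hypothetical name, `maxAttempts` stands in for whatever overall timeout the real code uses, and a failed validation is assumed to reset the run of successes.

```go
package main

import (
	"errors"
	"time"
)

// validateN (hypothetical) returns nil once `count` consecutive calls to
// validate have succeeded, sleeping successWait between successes.
// A failure resets the streak (assumption). maxAttempts is an illustrative
// bound replacing the real validation timeout.
func validateN(validate func() error, count, maxAttempts int, successWait time.Duration) error {
	if count <= 0 {
		// validate-count=0: validation is effectively turned off.
		return nil
	}
	streak := 0
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err := validate(); err != nil {
			streak = 0 // flapping detected, start counting again
			continue
		}
		streak++
		if streak >= count {
			// validate-count=1 returns here after the first success.
			return nil
		}
		time.Sleep(successWait)
	}
	return errors.New("cluster did not validate within the allowed attempts")
}
```

With `count` set to 1 the sleep is never reached, which matches the single validation without flapping detection described above.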
A nice side-effect of this is that the tests now explicitly specify
ValidateCount=1 instead of setting ValidateSuccessDuration=0, which
was implicitly equivalent to ValidateCount=1.
The client-go signature for most methods adds a context.Context
object, and also makes Options mandatory. Feed a
context.Context through many of our methods (but use context.TODO to
stop it getting totally out of hand!)
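The pattern looks roughly like the sketch below. To keep it self-contained it does not import client-go: `nodeLister` and `fakeLister` are hypothetical stand-ins for a client whose methods take a `context.Context` as their first argument, and the function names are illustrative only.

```go
package main

import "context"

// nodeLister is a hypothetical stand-in for a client whose methods now
// take a context.Context as their first argument.
type nodeLister interface {
	ListNodes(ctx context.Context) ([]string, error)
}

// fakeLister is a trivial in-memory implementation for illustration.
type fakeLister struct{ nodes []string }

func (f fakeLister) ListNodes(ctx context.Context) ([]string, error) {
	// Honour cancellation, as a real client would.
	if err := ctx.Err(); err != nil {
		return nil, err
	}
	return f.nodes, nil
}

// countNodes has been converted: it accepts the caller's context and
// feeds it through to the client.
func countNodes(ctx context.Context, c nodeLister) (int, error) {
	nodes, err := c.ListNodes(ctx)
	if err != nil {
		return 0, err
	}
	return len(nodes), nil
}

// legacyCountNodes has not been converted yet: it uses context.TODO as a
// placeholder so the change does not ripple through every caller.
func legacyCountNodes(c nodeLister) (int, error) {
	return countNodes(context.TODO(), c)
}
```

Using `context.TODO` rather than `context.Background` marks the call sites that still need a real context fed in later.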
Currently the images have a timestamp of epoch 0:
```
$ docker inspect kope/kops-controller:1.18.0-alpha.2 -f '{{ .Created }}'
1970-01-01T00:00:00Z
```
The `container_image` bazel rule [0] mentions that `creation_time` has a default value of 0 unless `stamp = True`, so `stamp = True` should be set on all `container_image` rules that are pushed to a docker registry.
[0] https://github.com/bazelbuild/rules_docker#container_image-1
It can't be done anyway; instead we make it work (as far as we can),
and we document the workaround (which is to access it via the ELB DNS
name).
In future we could make it easier to discover this DNS name!
Issue #2881