Node Problem Detector aims to make various node problems visible to
the upstream layers in the cluster management stack. It is a daemon
that runs on each node, detects node problems and reports them to apiserver
so to avoid scheduling new pods on bad nodes and also easily identify
which are the problems on underlying nodes.
Project Home: https://github.com/kubernetes/node-problem-detector
Signed-off-by: dntosas <ntosas@gmail.com>
Node termination handler as all daemonSets may play a critical role in
capacity planning, define resource policy for chosing instanceType etc.
In this commit, we enable users to define resources themselves to meet
their needs and also removed limits to convey with the chosen strategy
to avoid limits on such components.
Signed-off-by: dntosas <ntosas@gmail.com>
After upgrading Cilium to 1.8 via kops one of our clusters had a total
outage due to cilium reporting errors as below:
```
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=592 error="Failed to load tc filter: exit status 1" identity=40147 ipv4= ipv6= k8sPodName=/ subsys=endpoint
```
upon searching Cilium slack we found the below thread:
https://cilium.slack.com/archives/C1MATJ5U5/p1616400216167600
which recommended setting `enable-host-reachable-services` to true will
address the problems. We set the field and it fixed our issues too,
however we observed that kops does not have a means to configure this
hence this PR.
We will like to have this backported after it has been merged.
- Update manifests
- Bump components version
- Add API capability of setting Version + VolumeLimit
- Remove snapshot-controller resources as it should be independent from
any CSI driver
Signed-off-by: dntosas <ntosas@gmail.com>
Cilium as a CNI is a critical component for the cluster so it would be safe
to have some guaranteed resources as well as allowing the users to
define them based on their needs.
In this commit, we init default requested resources and add the
capability of user-defined values.
Signed-off-by: dntosas <ntosas@gmail.com>
Generate and upload keys.json + discovery.json to public store
Don't enable anonymous auth on publicjwks
Remove tests that won't work using FS VFS anymore
As with the Cluster-level "spec.updatePolicy" field, add a similar
field at the InstanceGroup level, allowing overriding of the
cluster-level choice in each InstanceGroup.
Introduce a new value for the field ("automatic") as equivalent to the
default value applied when the field is absent. Honoring this new
value allows disabling automatic updates at the cluster level, but
then enabling them again for particular InstanceGroups. Without such a
positive affirmation, it's not possible to override a cluster-level
"external" policy at the InstanceGroup level, as there's no way to
specify positively that you want to recover the default
value. Instead, expressing the explicit "automatic" value is clear and
unambiguous.