After upgrading Cilium to 1.8 via kops, one of our clusters had a total
outage, with Cilium reporting errors like the following:
```
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=592 error="Failed to load tc filter: exit status 1" identity=40147 ipv4= ipv6= k8sPodName=/ subsys=endpoint
```
Upon searching the Cilium Slack we found the thread below:
https://cilium.slack.com/archives/C1MATJ5U5/p1616400216167600
which recommended that setting `enable-host-reachable-services` to true
would address the problem. Setting that field fixed our issues as well;
however, we observed that kops has no means of configuring it, hence
this PR.
We would like to have this backported after it has been merged.
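For reference, a minimal sketch of the resulting cluster spec, assuming the new field ends up exposed as `enableHostReachableServices` under the Cilium networking config (the field name is an assumption):
```
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  networking:
    cilium:
      # Assumed field: maps to Cilium's enable-host-reachable-services
      enableHostReachableServices: true
```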
- Update manifests
- Bump component versions
- Add API capability for setting Version + VolumeLimit (see the sketch below)
- Remove snapshot-controller resources, as it should be independent of
any CSI driver
Signed-off-by: dntosas <ntosas@gmail.com>
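For context, a minimal sketch of what the new API surface could look like; the `awsEBSCSIDriver` block and the `version`/`volumeAttachLimit` field names are assumptions:
```
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
      # Assumed fields: pin the driver version and cap volumes per node
      version: v1.0.0
      volumeAttachLimit: 25
```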
Apply suggestions from code review
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
Apply suggestions from code review
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
Cilium, as the CNI, is a critical component of the cluster, so it is
safer to give it guaranteed resources, while also allowing users to
define those resources based on their needs.
In this commit, we initialize default resource requests and add the
capability to set user-defined values.
Signed-off-by: dntosas <ntosas@gmail.com>
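A minimal sketch of the user-defined override, assuming the requests are exposed as `cpuRequest`/`memoryRequest` on the Cilium networking spec (both field names are assumptions):
```
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  networking:
    cilium:
      # Assumed fields: override the default guaranteed requests
      cpuRequest: 100m
      memoryRequest: 128Mi
```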
Ensure apiserver role can only be used on AWS (because of firewalling)
Apply api-server label to the control plane as well
Consolidate node not ready validation message
Guard apiserver nodes with a feature flag
Rename Apiserver role to APIServer
Add an integration test for apiserver nodes
Enumerate all roles in rolling update docs
Apply suggestions from code review
Co-authored-by: Steven E. Harris <seh@panix.com>
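For reference, a minimal sketch of an InstanceGroup using the new role; the `KOPS_FEATURE_FLAGS=APIServerNodes` flag name and the exact spec below are assumptions:
```
# Enable the feature flag first, e.g.:
#   export KOPS_FEATURE_FLAGS=APIServerNodes  (assumed flag name)
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: example.k8s.local
  name: apiserver-1
spec:
  role: APIServer      # new role; AWS only because of firewalling
  machineType: t3.medium
  minSize: 1
  maxSize: 1
```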
Generate and upload keys.json + discovery.json to public store
Don't enable anonymous auth on publicjwks
Remove tests that will no longer work with the FS VFS
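A minimal sketch of how publishing to the public store might be wired into the cluster spec; the `serviceAccountIssuerDiscovery` block and its fields are assumptions:
```
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  # Assumed block: keys.json + discovery.json get published here
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://publicly-readable-bucket/example.k8s.local
    enableAWSOIDCProvider: true
```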
As with the Cluster-level "spec.updatePolicy" field, add a similar
field at the InstanceGroup level, allowing overriding of the
cluster-level choice in each InstanceGroup.
Introduce a new value for the field ("automatic"), equivalent to the
default applied when the field is absent. Honoring this new value
allows disabling automatic updates at the cluster level while
re-enabling them for particular InstanceGroups. Without such a
positive affirmation, a cluster-level "external" policy cannot be
overridden at the InstanceGroup level, as there is no way to state
explicitly that you want to recover the default value. Expressing the
explicit "automatic" value makes the intent clear and unambiguous.
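A minimal sketch of the interaction, using the field and values named above:
```
# Cluster-level: disable automatic updates everywhere by default.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: example.k8s.local
spec:
  updatePolicy: external
---
# InstanceGroup-level: opt this group back in with the new explicit value.
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: example.k8s.local
  name: nodes
spec:
  role: Node
  updatePolicy: automatic   # new value, equivalent to the default
```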