This is a needed configuration option for users that want to combine
Cilium with a service mesh. By default, Cilium load-balances requests
at the CNI layer, meaning that the service mesh sidecar proxies are not
able to apply load balancing themselves, thus losing the ability to
apply their traffic-management features.
Ref issue: https://github.com/istio/istio/issues/35531
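For illustration, a minimal cluster spec sketch of how such a knob might look; the kops field name below is an assumption, mirroring Cilium's `bpf-lb-sock-hostns-only` agent option:
```
spec:
  networking:
    cilium:
      # Hypothetical field name: restrict eBPF socket-level load balancing to the host
      # network namespace, so pod traffic still reaches the service mesh sidecar proxies.
      bpfLBSockHostNSOnly: true
```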
Signed-off-by: dntosas <ntosas@gmail.com>
Annotations are pretty useful when you need a third-party tool to add additional behavior
to a Kubernetes resource.
Lots of auto-discovery tools are based on these annotations.
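For example, the widely used Prometheus scrape convention attaches monitoring behavior to a workload purely through annotations (the values below are illustrative):
```
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
```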
CNI chaining allows Cilium to be used in combination with other CNI plugins.
With Cilium CNI chaining, the base network connectivity and IP address management are handled by the non-Cilium CNI plugin, while Cilium attaches eBPF programs to the network devices created by the non-Cilium plugin to provide L3/L4 network visibility, policy enforcement and other advanced features.
https://docs.cilium.io/en/v1.9/gettingstarted/cni-chaining/#cni-chaining
In our case, to be able to use the `HostPort` feature in our cluster, we need to enable the `portmap` plugin.
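For reference, a sketch of how the Cilium docs describe enabling portmap chaining through Helm values; the exact wiring done by this addon differs but aims at the equivalent result:
```
cni:
  # Append the portmap plugin to the CNI chain so HostPort mappings work.
  chainingMode: portmap
```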
As per
3b17e06879,
the node-local-dns addon is now built on the latest CoreDNS base (v1.8),
which brings greater consistency between the cache and upstream servers in
terms of configuration, metric naming conventions, etc.
So in this commit, we bump the node-local-dns image to the latest v1.20.0,
which is built on the latest CoreDNS, and we also add support for overriding
this field.
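A sketch of the resulting override in the cluster spec; the image field name is an assumption:
```
spec:
  kubeDNS:
    nodeLocalDNS:
      enabled: true
      # override the default node-local-dns image, e.g. to pin v1.20.0 or use a mirror
      image: k8s.gcr.io/dns/k8s-dns-node-cache:1.20.0
```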
Signed-off-by: dntosas <ntosas@gmail.com>
In this commit, we enable users to choose WireGuard as their preferred
encryption type, leveraging this new feature from Cilium.
Ref: https://cilium.io/blog/2021/05/20/cilium-110#wireguard
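A minimal cluster spec sketch, assuming the new field sits next to the existing enableEncryption option:
```
spec:
  networking:
    cilium:
      enableEncryption: true
      encryptionType: wireguard
```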
Signed-off-by: dntosas <ntosas@gmail.com>
Node Problem Detector aims to make various node problems visible to
the upstream layers in the cluster management stack. It is a daemon
that runs on each node, detects node problems, and reports them to the
apiserver, so that scheduling new pods on bad nodes can be avoided and
the problems on the underlying nodes can be easily identified.
Project Home: https://github.com/kubernetes/node-problem-detector
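A minimal sketch of enabling the addon via the cluster spec; the field name is an assumption modelled on other kops addon specs:
```
spec:
  nodeProblemDetector:
    enabled: true
```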
Signed-off-by: dntosas <ntosas@gmail.com>
The node termination handler, like all DaemonSets, may play a critical role
in capacity planning, in defining the resource policy for choosing instance
types, etc.
In this commit, we enable users to define the resources themselves to meet
their needs, and we also remove the limits to comply with the chosen strategy
of avoiding limits on such components.
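A sketch of what user-defined requests could look like; the field names are assumptions in line with other kops addon specs, and no limits are set on purpose:
```
spec:
  nodeTerminationHandler:
    enabled: true
    cpuRequest: 50m
    memoryRequest: 64Mi
```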
Signed-off-by: dntosas <ntosas@gmail.com>
After upgrading Cilium to 1.8 via kops, one of our clusters had a total
outage due to Cilium reporting errors like the one below:
```
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=592 error="Failed to load tc filter: exit status 1" identity=40147 ipv4= ipv6= k8sPodName=/ subsys=endpoint
```
Upon searching the Cilium Slack, we found the thread below:
https://cilium.slack.com/archives/C1MATJ5U5/p1616400216167600
which recommended that setting `enable-host-reachable-services` to true
would address the problem. We set the field and it fixed our issues too;
however, we observed that kops does not provide a means to configure this,
hence this PR.
We would like to have this backported after it has been merged.
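A sketch of the cluster spec knob this PR adds, assuming the field follows kops' existing Cilium naming:
```
spec:
  networking:
    cilium:
      # maps to Cilium's enable-host-reachable-services agent option
      enableHostReachableServices: true
```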
- Update manifests
- Bump components version
- Add API capability of setting Version + VolumeLimit (see the sketch after this list)
- Remove snapshot-controller resources, as it should be independent of
  any CSI driver
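A sketch of the new API fields; the exact field names and the version shown are assumptions:
```
spec:
  cloudConfig:
    awsEBSCSIDriver:
      enabled: true
      version: v0.10.1        # illustrative version
      volumeAttachLimit: 25   # maps to the driver's --volume-attach-limit flag
```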
Signed-off-by: dntosas <ntosas@gmail.com>
Apply suggestions from code review
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
Apply suggestions from code review
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
Cilium, as a CNI, is a critical component of the cluster, so it is safer
to have some guaranteed resources, as well as to allow users to
define them based on their needs.
In this commit, we initialize default resource requests and add the
capability of setting user-defined values.
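A sketch of the user-facing knobs; the request field names are assumptions in line with other kops addon specs:
```
spec:
  networking:
    cilium:
      cpuRequest: 25m
      memoryRequest: 128Mi
```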
Signed-off-by: dntosas <ntosas@gmail.com>
Generate and upload keys.json + discovery.json to public store
Don't enable anonymous auth on publicjwks
Remove tests that won't work using FS VFS anymore
As with the Cluster-level "spec.updatePolicy" field, add a similar
field at the InstanceGroup level, allowing overriding of the
cluster-level choice in each InstanceGroup.
Introduce a new value for the field ("automatic") as equivalent to the
default value applied when the field is absent. Honoring this new
value allows disabling automatic updates at the cluster level, but
then enabling them again for particular InstanceGroups. Without such a
positive affirmation, it's not possible to override a cluster-level
"external" policy at the InstanceGroup level, as there's no way to
specify positively that you want to recover the default
value. Instead, expressing the explicit "automatic" value is clear and
unambiguous.
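A sketch of the intended usage, with the InstanceGroup field assumed to mirror the cluster-level one:
```
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  # disable automatic updates cluster-wide
  updatePolicy: external
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
spec:
  # ...but re-enable them for this particular InstanceGroup
  updatePolicy: automatic
```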
Similar to how we can configure the CoreDNS image, we would like to configure
the cluster-proportional-autoscaler image so that we can use our internal Docker
registry rather than gcr.io.
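A sketch of the override; the cpaImage field name and the registry shown are assumptions, placed next to the existing CoreDNS image override:
```
spec:
  kubeDNS:
    coreDNSImage: registry.example.com/coredns/coredns:1.8.3
    cpaImage: registry.example.com/cpa/cluster-proportional-autoscaler:1.8.3
```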
- Resources
We enable users to set the resources they need for the cluster-autoscaler addon.
There are edge cases, especially in big clusters, where the autoscaler needs
to reconcile a large number of objects and thus may need increased memory or
CPU to avoid saturation.
- Metrics
Cluster autoscaler provides valuable insights for monitoring the capacity
allocation and scheduling aspects of a cluster. In this commit, we
add the proper annotations on the deployment to enable Prometheus to scrape metrics.
We also bump the patch version of the container images.
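A sketch of the resource knobs (field names are assumptions) together with the Prometheus convention the deployment gets annotated with; 8085 is cluster-autoscaler's default metrics port:
```
spec:
  clusterAutoscaler:
    cpuRequest: 100m
    memoryRequest: 300Mi
# resulting deployment annotations (illustrative):
#   prometheus.io/scrape: "true"
#   prometheus.io/port: "8085"
```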
Signed-off-by: dntosas <ntosas@gmail.com>
add io2 case and correct IOPS minimum value check
add gp3 case
add io2 and gp3 parameter ratio validation logic
add volumeThroughput parameter for disks that support it
add volumeThroughput components throughout ebs structs
add volumeThroughput to versioned api
updated api machinery and crds
apimachinery update
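An InstanceGroup sketch of the new volume options; the rootVolumeThroughput field name is an assumption alongside kops' existing rootVolume* fields:
```
spec:
  rootVolumeType: gp3
  rootVolumeSize: 64
  rootVolumeIops: 4000        # gp3/io2 allow provisioned IOPS
  rootVolumeThroughput: 200   # MiB/s, gp3 only; must respect the IOPS/throughput ratio
```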
Introduce a new "encapsulationMode" field in Calico's portion of the
Cluster specification to allow switching between the IP-in-IP and
VXLAN encapsulation protocols. For now, we accept the values "ipip"
and "vxlan," and forgo a possible "none" value that would disable
encapsulation altogether (at least for the default Calico IP pool).
Augment the default-populating procedure for Calico to take this field
into account when deciding both which networking backend to use and
whether to use IP-in-IP or VXLAN encapsulation for the default IP
pool. Note that the values supplied for the "CALICO_IPV4POOL_IPIP"
and "CALICO_IPV4POOL_VXLAN" environment variables in the "calico-node"
DaemonSet pod spec only matter for creating the "default" IPPool
object when no such objects already exist.
Generalize the documentation for the "crossSubnet" field to cover
environments more broad than just AWS, as Calico can employ this
selective encapsulation in any environment in which it can detect
boundaries between subnets.
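A sketch of selecting VXLAN encapsulation with the new field:
```
spec:
  networking:
    calico:
      encapsulationMode: vxlan
```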
A new field is added to the InstanceGroup spec with two sub-fields,
HTTPPutResponseHopLimit and HTTPTokens. These fields enable the user
to disable IMDSv1 for instances within an instance group.
By default, both IMDSv1 and IMDSv2 are enabled on instances in an instance group.
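An InstanceGroup sketch; the parent field name is an assumption, while the sub-fields follow the HTTPPutResponseHopLimit and HTTPTokens fields described above:
```
spec:
  instanceMetadata:
    httpPutResponseHopLimit: 1
    httpTokens: required   # requiring tokens effectively disables IMDSv1
```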
* refactor TargetLoadBalancer to use DNSTarget interface instead of LoadBalancer
* add LoadBalancerClass fields into api (see the spec sketch after this list)
* make api machinery
* WIP: Implemented API loadbalancer class, allowing NLB and ELB support on AWS for new clusters.
* perform vendoring related tasks and apply fixes identified from hack/
* disallow spotinst + nlb
* remove reflection in status_discovery.go
* Add precreated additional security groups to the Master nodes in case of NLB
* Remove support for attaching individual instances to NLB; only rely on ASG attachments
* Don't specify Classic loadbalancer in GCE integration test
* add utility function to the kops model context to make LoadBalancer comparisons simpler
* use DNSTarget interface when locating DNSName of API ELB
* wip: create target group task
* Consolidate TargetGroup tasks
* Use context helper for determining api load balancer type to avoid nil pointers
* Update NLB creation to use target group ARN from separate task rather than creating a TG in-line
* Address staticcheck and bazel failures
* Removing NLB Attachment tasks because they're not used since we switched to defining them as a part of the ASGs
* Address PR review feedback
* Only set LB Class field for AWS clusters, fix nil pointer
* Move target group attributes from NLB task to TG task, removing unused attributes
* Add terraform and cloudformation support for NLBs, listeners, and target groups
* Update integration test for NLB support
* Fix NLB name format to pass terraform validation
* Preserve security group rule names when switching ELB to NLB to reduce destructive terraform changes
* Use elbv2 enums and address some TODOs
* Set healthcheck values in target group
* Find TG tags, fix NLB name detection
* Fix more spurious changes reported by lifecycle integration test
* Fix spotinst validation, more code cleanup
* Address more PR feedback
* ReconcileTargetGroups unit test + more code simplification
* Addressing PR feedback: Renaming task: awstasks.LoadBalancer -> awstasks.ClassicLoadBalancer
* Addressing PR feedback: Renaming task: ELBName() -> CLBName() / LinkToELB() -> LinkToCLB()
* Addressing PR feedback: Various text changes
* fix export of kubecfg
* address TargetGroup should have the same name as the NLB
* should address error when fetching tags due to missing ARN
* Update expected and crds
* Add feature table to NLB docs
* Address more feedback and remove some TODOs that aren't applicable anymore
* Update spotinst validation error message
Co-authored-by: Peter Rifel <pgrifel@gmail.com>
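A sketch of how the LoadBalancerClass field added above could be used to request an NLB for the API, assuming the accepted classes are Classic and Network:
```
spec:
  api:
    loadBalancer:
      class: Network
      type: Public
```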
In this commit, we introduce support for the --request-timeout flag in
the KubeAPIServer configuration, so as to enable users to set the
timeout duration value for all kinds of handlers.
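A sketch of the new setting; the kops field name is an assumption mapping to the --request-timeout flag:
```
spec:
  kubeAPIServer:
    requestTimeout: 5m0s
```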
Signed-off-by: dntosas <ntosas@gmail.com>
Use provider-agnostic node definition for cas instead of aws auto-discovery
Validate clusterAutoscalerSpec
Add spec documentation
Add cas docs
Make CRDs
Apply suggestions from code review
Co-authored-by: John Gardiner Myers <jgmyers@proofpoint.com>
Add enabled flag to cas config
Apply suggestions from code review
Co-authored-by: Guy Templeton <guyjtempleton@googlemail.com>
Add support for custom cas image
Support more k8s versions
Use full image names
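A sketch of the resulting addon configuration; the field names and image tag are assumptions based on the changes listed above:
```
spec:
  clusterAutoscaler:
    enabled: true
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.19.1   # illustrative tag
```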
Adding support for 'secret-name' flag
Adding instructions to enable encryption
Updating docs for cli
Addressing comments
Adding ciliumpassword subcommand to 'kops create secret'
Updating command to generate ciliumpassword secret