Currently, kOps uses cgroupfs cgroup driver for the kubelet and CRIs. This PR defaults
the cgroup driver to systemd for clusters created with k8s versions >= 1.20.
Using systemd as the cgroup-driver is the recommended way as per
https://kubernetes.io/docs/setup/production-environment/container-runtimes/
- Resources
We enable users to set their desired capacity for cluster-autoscaler addon.
There are edge cases, especially in big clusters, where autoscaler needs
to reconcile a large number of objects thus may need increased memory or
increased cpu to avoid saturation.
- Metrics
Cluster autoscaler provides valuable insights for monitoring capacity
allocation and scheduling aspects of a cluster. In this commit, we
add proper annotation on deployment to enable Prometheus scrape metrics.
We also bump patch version of container images.
Signed-off-by: dntosas <ntosas@gmail.com>
kops-controller can now serve the instance group & cluster config to
nodes, as part of the bootstrap process.
This enables nodes to boot without access to the state
store (i.e. without S3 / GCS / etc permissions)
Feature-flagged behind the KopsControllerStateStore feature-flag.
These are needed by protokube to create the kops-controller DNS record to allow nodes to bootstrap.
See these logs: https://storage.googleapis.com/kubernetes-jenkins/logs/e2e-kops-grid-scenario-public-jwks/1345956556562239488/artifacts/ip-172-20-48-1.sa-east-1.compute.internal/protokube.log
```
I0104 05:03:51.264472 6482 dnscache.go:74] querying all DNS zones (no cached results)
I0104 05:03:51.264570 6482 route53.go:53] AWS request: route53 ListHostedZones
W0104 05:03:51.389485 6482 dnscontroller.go:124] Unexpected error in DNS controller, will retry: error querying for zones: error querying for DNS zones: AccessDenied: User: arn:aws:sts::768319786644:assumed-role/masters.e2e-kops-scenario-public-jwks.test-cncf-aws.k8s.io/i-05b1db10d1a5b8637 is not authorized to perform: route53:ListHostedZones
```
and the nodeup logs on nodes that couldn't join the cluster:
```
Jan 04 04:55:53.500187 ip-172-20-38-84 nodeup[2070]: W0104 04:55:53.500117 2070 executor.go:131] error running task "BootstrapClient/BootstrapClient" (9m52s remaining to succeed): Post "https://kops-controller.internal.e2e-kops-scenario-public-jwks.test-cncf-aws.k8s.io:3988/bootstrap": dial tcp: lookup kops-controller.internal.e2e-kops-scenario-public-jwks.test-cncf-aws.k8s.io on 127.0.0.53:53: no such host
```
add io2 case and correct IOPS minimum value check
add gp3 case
add io2 and gp3 parameter ratio validation logic
add volumeThroughput parameter for disks that support it
add volumeThroughput components throughout ebs structs
add volumeThroughput to versioned api
updated api machinery and crds
apimachinery update
When using an AWS NLB in front of the Kubernetes API servers, we can't
attach the EC2 security groups nominated in the Cluster
"spec.api.loadBalancer.additionalSecurityGroups" field directly to the
load balancer, as NLBs don't have associated security groups. Instead,
we intend to attach those nominated security groups to the machines
that will receive network traffic forwarded from the NLB's
listeners. For the API servers, since that program runs only on the
master or control plane machines, we need only attach those security
groups to the machines that will host the "kube-apiserver" program, by
way of the ASG launch templates that come from kOps InstanceGroups of
role "master."
We were mistakenly including these security groups in launch templates
derived from InstanceGroups of all of our three current roles:
"bastion," "master," and "node." Instead, skip InstanceGroups of the
"bastion" and "node" roles and only target those of role "master."
Introduce a new "encapsulationMode" field in Calico's portion of the
Cluster specification to allow switching between the the IP-in-IP and
VXLAN encapsulation protocols. For now, we accept the values "ipip"
and "vxlan," and forgo a possible "none" value that would disable
encapsulation altogether (at least for the default Calico IP pool).
Augment the default-populating procedure for Calico to take this field
into account when deciding both which networking backend to use and
whether to use IP-in-IP or VXLAN encapsulation for the default IP
pool. Note that these values supplied for the "CALICO_IPV4POOL_IPIP"
and "CALICO_IPV4POOL_VXLAN" environment variables in the "calico-node"
DaemonSet pod spec only matter for creating the "default" IPPool pool
object when no such objects already exist.
Generalize the documentation for the "crossSubnet" field to cover
environments more broad than just AWS, as Calico can employ this
selective encapsulation in any environment in which it can detect
boundaries between subnets.
The maximum IAM role name length is 64 characters, which we hit much
more often now that we are constructing complex names. Use our normal
strategy of adding a hash when we truncate.
This is not a breaking change, because these names were not valid
previously.
A new field is add to the InstanceGroup spec with 2 sub fields,
HTTPPutResponseHopLimit and HTTPTokens. These fields enable the user
to disable IMDv1 for instances within an instance group.
By default, both IMDv1 and IMDv2 are enabled in instances in an instance group.
Release notes for 3.0.20201117:
* Release notes for 3.0.20200531
* Adds support for using OS application credentials
* Fixes usage of OpenStack Swift reauthentication
* Move from debian-hyperkube-base to debian-base
* Add license headers to each file
* Fix some typos picked up by verify-spelling
* Fix some problems with trailing spaces
* Add support for etcd 3.4.13
* Switch to gcr.io/cloud-marketplace-containers/google/debian10 - Fix
for #340 option 1
* Support for ARM64
* BUG: OpenStack ignore AvailabilityZone in discovery
* Added full cinder ID to candidateDeviceNodes
* feat(etcd-manager-ctl): use backupname to delete backup instead of timestamp
* Update kops to pick up AllowAuth Openstack
* Build base image by raw expansion of deb packages
* Switch the cloudbuild docker image, locking to 2.2.0
* Fix build on case-insensitive file systems (MacOS)
* Set AltNames on server certificates
* govet: Fix a log message
In order to let kops fully control the rules for each security group we need to be able to generate names from the info in AWS. This is similar to the approach we used for openstack
Update pkg/model/firewall.go
Co-authored-by: Ciprian Hacman <ciprianhacman@gmail.com>
This way the security group rule task doesn't need to be aware of VPCs, since we know the VPC CIDR ahead of time via cluster spec.
This also fixes the terraform and cloudformation rendering of this rule (see the added cidr block in the integration test outputs)
These rules are for NLB's health checks. The AWS docs recommend allowing access from the entire VPC CIDRs
Also add rules for additionalNetworkCIDRs, supporting VPCs with multiple CIDR blocks.
* refactor TargetLoadBalancer to use DNSTarget interface instead of LoadBalancer
* add LoadBalancerClass fields into api
* make api machinery
* WIP: Implemented API loadbalancer class, allowing NLB and ELB support on AWS for new clusters.
* perform vendoring related tasks and apply fixes identified from hack/
dissallow spotinst + nlb
remove reflection in status_discovery.go
Add precreated additional security groups to the Master nodes in case of NLB
Remove support for attaching individual instances to NLB; only rely on ASG attachments
Don't specify Classic loadbalancer in GCE integration test
* add utility function to the kops model context to make LoadBalancer comparisons simpler
* use DNSTarget interface when locating DNSName of API ELB
* wip: create target group task
* Consolidate TargetGroup tasks
* Use context helper for determining api load balancer type to avoid nil pointers
* Update NLB creation to use target group ARN from separate task rather than creating a TG in-line
* Address staticcheck and bazel failures
* Removing NLB Attachment tasks because they're not used since we switched to defining them as a part of the ASGs
* Address PR review feedback
* Only set LB Class field for AWS clusters, fix nil pointer
* Move target group attributes from NLB task to TG task, removing unused attributes
* Add terraform and cloudformation support for NLBs, listeners, and target groups
* Update integration test for NLB support
* Fix NLB name format to pass terraform validation
* Preserve security group rule names when switching ELB to NLB to reduce destructive terraform changes
* Use elbv2 enums and address some TODOs
* Set healthcheck values in target group
* Find TG tags, fix NLB name detection
* Fix more spurious changes reported by lifecycle integration test
* Fix spotinst validation, more code cleanup
* Address more PR feedback
* ReconcileTargetGroups unit test + more code simplification
* Addressing PR feedback Renaming task 1. awstasks.LoadBalancer -> awstasks.ClassicLoadBalancer
* Addressing PR feedback Renaming task: ELBName() -> CLBName() / LinkToELB() -> LinkToCLB()
* Addressing PR feedback: Various text changes
* fix export of kubecfg
* address TargetGroup should have the same name as the NLB
* should address error when fetching tags due to missing ARN
* Update expected and crds
* Add feature table to NLB docs
* Address more feedback and remove some TODOs that arent applicable anymore
* Update spotinst validation error message
Co-authored-by: Peter Rifel <pgrifel@gmail.com>
The ./hack/update-expected.sh script generates some fields which are
required to be string fields and hence results in linting errors.
This PR changes those fields to string/*string and removes lint
warnings.