Automatic merge from submit-queue
Add documentation on handling node resources
At a minimum, this is meant to give more context on why the feature in https://github.com/kubernetes/kops/pull/2982 was added, and to give some recommendations on what to consider when evaluating node system resources.
I hope this spurs some discussion and that the recommendations I make may be assessed further. For example, one of the links I referenced advises us to set `system-reserved` **only if we know what we are doing** (which I can't say I do 💯% ... 🤷♂️) and even warns us to set it only if we really need to.
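For context, the Kubernetes "reserve compute resources" documentation defines a node's allocatable memory as capacity minus `kube-reserved`, `system-reserved` and the hard eviction threshold. A minimal Go sketch of that arithmetic, assuming all values are in MiB; the `Reservations` type and `allocatableMi` helper are hypothetical and not part of kops or the kubelet:

```go
package main

import "fmt"

// Reservations holds memory (in MiB) carved out of a node before pods are scheduled.
// Field names mirror the kubelet flags --kube-reserved, --system-reserved and
// --eviction-hard, but this struct is purely illustrative.
type Reservations struct {
	KubeReservedMi   int64
	SystemReservedMi int64
	EvictionHardMi   int64
}

// allocatableMi returns capacity - kube-reserved - system-reserved - eviction threshold,
// or an error if the reservations exceed the node's capacity.
func allocatableMi(capacityMi int64, r Reservations) (int64, error) {
	reserved := r.KubeReservedMi + r.SystemReservedMi + r.EvictionHardMi
	if reserved > capacityMi {
		return 0, fmt.Errorf("reservations (%d Mi) exceed capacity (%d Mi)", reserved, capacityMi)
	}
	return capacityMi - reserved, nil
}
```

For example, an 8192 Mi node with 512 Mi each for `kube-reserved` and `system-reserved` and a 100 Mi eviction threshold leaves 7068 Mi for pods.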
Automatic merge from submit-queue
Cluster Hooks Enhancement
The current implementation is limited to docker exec, without ordering or any other bells and whistles. This PR extends the functionality of the hook spec as follows:
- added ordering to the hooks; users can set the `Requires` and `Before` of the unit
- cleaned up the manifest code, added tests and allowed setting a raw section
- added the ability to filter hooks via master and node roles
- updated the documentation to reflect the changes
- extended the hooks to permit adding hooks per instanceGroup as well as per cluster
- @note, instanceGroup hooks are permitted to override the cluster-wide ones for ease of testing
- along the way, tried to fix some Go idioms such as import ordering, comments on exported globals, etc.
- @question: v1alpha1 doesn't appear to have Subnet fields; are these different versions being used anywhere?
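The role filtering and instanceGroup override described in the list above might be combined roughly as follows. This is a sketch under stated assumptions: the `Hook` type is a hypothetical, trimmed-down hook, an empty `Roles` list is assumed to mean "all roles", and an instanceGroup hook is assumed to override a cluster hook with the same `Name`:

```go
package main

import "sort"

// Hook is a hypothetical, trimmed-down hook definition.
type Hook struct {
	Name  string
	Roles []string // e.g. "Master", "Node"; empty is assumed to mean all roles
}

// appliesTo reports whether the hook should run on an instance with the given role.
func appliesTo(h Hook, role string) bool {
	if len(h.Roles) == 0 {
		return true
	}
	for _, r := range h.Roles {
		if r == role {
			return true
		}
	}
	return false
}

// effectiveHooks merges cluster-wide and instanceGroup hooks for one role.
// Assumption: an instanceGroup hook replaces a cluster hook with the same Name.
func effectiveHooks(cluster, group []Hook, role string) []Hook {
	byName := map[string]Hook{}
	for _, h := range cluster {
		byName[h.Name] = h
	}
	for _, h := range group {
		byName[h.Name] = h // instanceGroup wins on a name clash
	}
	var out []Hook
	for _, h := range byName {
		if appliesTo(h, role) {
			out = append(out, h)
		}
	}
	// Map iteration order is random, so sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}
```

Keying on the name keeps the override rule simple; a real implementation might instead prefer to append instanceGroup hooks and resolve conflicts explicitly.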
Automatic merge from submit-queue
Etcd v3 Support
The current implementation is running v2.2.1, which is two years old and end of life. This PR adds the ability to use etcd v3 and to set the versions if required. Note that at the moment the image still comes from the gcr.io registry and, much like the Etcd TLS PR, there is presently no 'automated' migration path from v2 to v3.
- the feature is gated behind the version of the etcd cluster; both clusters, events and main, must use the same storage type
- the version for v2 is unchanged and pinned at v2.2.1, with v3 using v3.0.17
- @question: we should consider allowing the user to override the images, though I think this should be addressed generically rather than with one-offs here and there. I know @chrislovecnm is working on an asset registry??
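The constraint that the main and events clusters share a storage type could be checked along these lines. This is a sketch assuming that the storage type follows from the major version and, per the later commit note, that every cluster must set its version explicitly; `EtcdCluster` and `validateEtcdVersions` are hypothetical names, not the kops API:

```go
package main

import (
	"fmt"
	"strings"
)

// EtcdCluster is a hypothetical stand-in for one etcd cluster entry (e.g. "main", "events").
type EtcdCluster struct {
	Name    string
	Version string // e.g. "2.2.1" or "3.0.17"
}

// majorVersion returns the part of the version before the first dot ("3.0.17" -> "3").
func majorVersion(v string) string {
	return strings.SplitN(strings.TrimPrefix(v, "v"), ".", 2)[0]
}

// validateEtcdVersions requires every cluster to set a version and all clusters
// to share the same major version, i.e. the same storage backend.
func validateEtcdVersions(clusters []EtcdCluster) error {
	want := ""
	for _, c := range clusters {
		if c.Version == "" {
			return fmt.Errorf("etcd cluster %q: version must be set", c.Name)
		}
		got := majorVersion(c.Version)
		if want == "" {
			want = got
		} else if got != want {
			return fmt.Errorf("etcd cluster %q uses v%s but other clusters use v%s", c.Name, got, want)
		}
	}
	return nil
}
```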
Automatic merge from submit-queue
Initial cloud interface for DigitalOcean
This is just setup code for DigitalOcean; I'm trying to keep my PRs as small as possible. Upcoming PRs will include tasks to create droplets, block storage (for etcd), etc.
- removing the StorageType on the etcd cluster spec (sticking with the Version field only)
- changed the protokube flag back to -etcd-image
- users have to explicitly set the etcd version now; the latest version in gcr.io is 3.0.17
- reverted the ordering on the populate spec
The current implementation is running v2.2.1, which is two years old and end of life. This PR adds the ability to use etcd v3 and to set the versions if required. Note that at the moment the image still comes from the gcr.io registry. As with TLS, there is presently no 'automated' migration path from v2 to v3.
- the feature is gated behind the storageType of the etcd cluster; both clusters, events and main, must use the same storage type
- the version for v2 is unchanged and pinned at v2.2.1, with v3 using v3.0.17
- @question: we should consider allowing the user to override the images, though I think this should be addressed more generically rather than with one-offs here and there. I know chris is working on an asset registry??
Automatic merge from submit-queue
Tighten down S3 IAM policy statements
This PR contains updates to:
- Remove default `s3:*` IAM policy for master and compute nodes
- Allow all nodes to list bucket contents
- Allow master nodes to get all bucket contents
- Allow compute nodes to get specific bucket contents (certain private key files are disallowed)
- Add unit tests around the S3 policy build function
- switched to using an array of roles rather than boolean flags for node selection
- fixed up the README to reflect the changes
- added `docker.service` as a `Requires` to all docker exec hooks
- extended the hooks to permit adding hooks per instanceGroup as well
- @note, instanceGroup hooks are permitted to override the cluster-wide ones for ease of testing
- updated the documentation to reflect the changes
- along the way, tried to fix some Go idioms such as import ordering, comments on exported globals, etc.
- @question: v1alpha1 doesn't appear to have Subnet fields; are these different versions being used anywhere?
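The S3 policy tightening from the "Tighten down S3 IAM policy statements" PR (list for all roles, read everything on masters, read only an allow-list on compute nodes, no `s3:*`) can be sketched as below. The bucket layout, role names and the node allow-list are hypothetical; the real kops policy builder and its exact key list are not reproduced here:

```go
package main

import "encoding/json"

// Statement and Policy model just enough of the IAM policy grammar for this sketch.
type Statement struct {
	Effect   string
	Action   []string
	Resource []string
}

type Policy struct {
	Version   string
	Statement []Statement
}

// buildS3Policy returns a read-only state-store policy for one role.
// nodeKeys is a HYPOTHETICAL allow-list of object paths that compute nodes may read.
func buildS3Policy(role, bucket, basePath string, nodeKeys []string) Policy {
	objectARN := func(key string) string {
		return "arn:aws:s3:::" + bucket + "/" + basePath + "/" + key
	}
	// Every role may list the bucket contents.
	stmts := []Statement{{
		Effect:   "Allow",
		Action:   []string{"s3:ListBucket"},
		Resource: []string{"arn:aws:s3:::" + bucket},
	}}
	get := Statement{Effect: "Allow", Action: []string{"s3:GetObject"}}
	if role == "Master" {
		// Masters may read everything under the cluster prefix.
		get.Resource = []string{objectARN("*")}
	} else {
		// Compute nodes read only the listed keys, keeping private keys out of reach.
		for _, k := range nodeKeys {
			get.Resource = append(get.Resource, objectARN(k))
		}
	}
	if len(get.Resource) > 0 {
		stmts = append(stmts, get)
	}
	return Policy{Version: "2012-10-17", Statement: stmts}
}

// policyJSON renders the policy as the JSON document IAM expects.
func policyJSON(p Policy) (string, error) {
	b, err := json.MarshalIndent(p, "", "  ")
	return string(b), err
}
```

Building statements per role, rather than toggling booleans, also makes the policy easy to unit test, which matches the intent of the added tests.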
The present implementation of hooks only supports docker exec, which isn't very flexible. This PR lets the user customize the systemd units on the instances much more freely.
- cleaned up the manifest code, added tests and allowed setting a raw section
- added the ability to filter hooks via master and node roles
- updated the documentation to reflect the changes
- cleaned up some of the `go vet` issues
The current implementation does not permit the user to order the hooks. This PR adds optional Requires, Before and Documentation fields to the HookSpec, which are added to the systemd unit if specified.
Automatic merge from submit-queue
Add cluster spec to node user data so component config changes are detected
Related to #3076
Some cluster changes such as component config modifications are not picked up when performing updates (nodes are not marked as `NEEDUPDATE`). This change introduces the ability to:
1. Include certain cluster specs within the node user data file ~(`enableClusterSpecInUserData: true`)~
2. ~Encode the cluster spec string before placing within the user data file (`enableClusterSpecInUserData: true`)~
~The above flags default to false so shouldn't cause any changes to existing clusters.~
Following feedback, I've removed the optional API flags, so component config is now included by default within the user data. This WILL cause all nodes to have a required update to their bootstrap scripts.
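The reasoning is that once component config is embedded in the user data, any config change also changes the rendered user data, so nodes launched from the old version differ from the new one. A toy Go sketch of that effect, using a hash as a stand-in for "the launch configuration changed"; `renderUserData` and the heredoc layout are hypothetical, and kops's actual `NEEDUPDATE` detection is not shown:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// renderUserData is a hypothetical stand-in for the bootstrap script generator:
// it appends the component config to the script body as a heredoc.
func renderUserData(script, componentConfigYAML string) string {
	return fmt.Sprintf("%s\ncat > cluster_spec.yaml << '__EOF_CLUSTER_SPEC'\n%s\n__EOF_CLUSTER_SPEC\n", script, componentConfigYAML)
}

// userDataHash fingerprints the rendered user data; a different hash means
// nodes built from the previous user data are out of date.
func userDataHash(userData string) string {
	sum := sha256.Sum256([]byte(userData))
	return hex.EncodeToString(sum[:])
}
```

Changing, for example, a kubelet setting in the embedded YAML changes the hash, whereas before this PR the same change left the user data, and therefore the node, untouched.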
Automatic merge from submit-queue
better error messages with docker api
Got `DOCKER_HOST="unix:///var/run/docker.sock"` running on my Mac. I do not have the time to figure out how to make the socket connection work, but the improved error message will help users.
Automatic merge from submit-queue
Kubelet API Certificate
A while back, options to secure communication from the kube-apiserver to the kubelet API were added in [PR2831](https://github.com/kubernetes/kops/pull/2831), using server.cert and server.key as a testing ground. This PR formalizes the options and generates a client certificate on the user's behalf (note: server{.cert,key} can no longer be used post 1.7, because the certificate usage is checked, i.e. it's not a client cert). Users now only need to add `anonymousAuth: false` to enable secure API access to the kubelet. I'd like to make this the default for all new builds, but I'm not sure where to place it.
- updated the security.md to reflect the changes
- issued a new kubelet-api client certificate used to secure and authorize communication between the API server and the kubelet
- fixed any formatting issues I came across along the way
Automatic merge from submit-queue
Clarify docs: rename spec/specification into desired configuration
The cluster state in S3 has (among others) two files: `cluster.spec` and `config`.
When the documentation mentioned "create or update cluster spec" for example, it was confusing what was actually updated. It's not the cluster.spec file.
As I understand it, `cluster.spec` should only be created/updated after `kops update --yes` is run.
I changed the docs for `kops get`, `kops create`, `kops replace`, `kops edit`.
I did NOT change these files: `kops_rolling-update.md`, `kops_rolling-update_cluster.md`, as I think those actually use `cluster.spec`.
Automatic merge from submit-queue
Specify initial period in gossip-based cluster name pattern
This is the most trivial change ever, but I actually got bitten by this and had to grep the source code to figure out that the initial period needed to be in the cluster name suffix.
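kops treats a cluster as gossip-based when its name ends with `.k8s.local`, and the leading period is part of that suffix. A minimal sketch of such a check; `isGossipClusterName` is a hypothetical helper, and trimming a trailing dot is an assumption rather than documented kops behaviour:

```go
package main

import "strings"

// gossipSuffix includes the leading period: "mycluster.k8s.local" matches,
// "myclusterk8s.local" does not.
const gossipSuffix = ".k8s.local"

// isGossipClusterName reports whether name selects gossip-based discovery,
// ignoring a trailing dot on a fully-qualified name.
func isGossipClusterName(name string) bool {
	return strings.HasSuffix(strings.TrimSuffix(name, "."), gossipSuffix)
}
```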
Automatic merge from submit-queue
Use SSL in ELB API server health check
This switch causes the ELB to perform an SSL handshake and makes the
`I0427 03:57:55.059255 1 logs.go:41] http: TLS handshake error from IP:PORT: EOF`
disappear from the apiserver logs.
Tested manually and everything looks ✅
Inspiration from https://github.com/kubernetes-incubator/kube-aws/pull/604
In the S3 bucket, the file cluster.spec is not actually the spec, but the
actual configuration. The file config is the spec. To avoid confusion,
this commit changes spec/specification into 'desired configuration' in
the documentation, to avoid associating cluster.spec with a cluster
'specification' that the users should use.
Another step towards working totally offline (which may never be fully
achievable, because of the need to hash assets). It should also ensure
that when we update the stable channel, we are testing against that
version in the tests; otherwise it is easy to break master.
- fixed the various issues highlighted in https://github.com/kubernetes/kops/pull/3125
- reworded the documentation so it makes more sense
- changed the logic of UseSecureKubelet to return early
Automatic merge from submit-queue
Add support for cluster using http forward proxy #2481
Adds support for running a cluster where access to external resources must go through an HTTP forward proxy. This adds a new element to the ClusterSpec, `EgressProxy`, and then sets up environment variables where appropriate. Access to the API servers is also assumed to go through the proxy; in particular, this is necessary for AWS VPCs with private topology and egress by proxy (no NAT), at least until Amazon implements VPC Endpoints for the APIs.
Additionally, see my notes in #2481
TODOs
- [x] Consider editing files from nodeup rather than cloudup
- [x] Add support for RHEL
- [x] Validate on RHEL
- [x] ~Add support for CoreOS~ See #3032
- [x] ~Add support for vSphere~ See #3071
- [x] Minimize services affected
- [x] ~Support separate https_proxy configuration~ See #3069
- [x] ~Remove unvalidated proxy auth support (save for future PR)~ See #3070
- [x] Add Documentation
- [x] Fill in some sensible default exclusions for the user, allow the user to extend this list
- [x] Address PR review comments
- [x] Either require port or handle nil
- [x] ~Do API validation (or file an issue for validation)~ See #3077
- [x] Add uppercase versions of proxy env vars to cover our bases
- [x] ~File an issue for unit tests~ 😬 See #3072
- [x] Validate cluster upgrades and updates
- [x] Remove ftp_proxy (nothing uses)
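Several items above (require a port, default exclusions the user can extend, uppercase variants, no `ftp_proxy`) can be illustrated together. This is a sketch only: `EgressProxy` here is a simplified hypothetical shape whose field names may not match the real ClusterSpec, and the default exclusion list is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// EgressProxy is a hypothetical, simplified shape for the proxy settings.
type EgressProxy struct {
	Host     string
	Port     int      // 0 means "not set"
	Excludes []string // user additions to no_proxy
}

// defaultNoProxy is an ILLUSTRATIVE set of destinations that bypass the proxy
// (loopback and the EC2 metadata endpoint).
var defaultNoProxy = []string{"127.0.0.1", "localhost", "169.254.169.254"}

// proxyEnv returns lower- and upper-case proxy variables, de-duplicating the
// merged exclusion list. It rejects a missing host or port instead of handling nil later.
func proxyEnv(p EgressProxy) (map[string]string, error) {
	if p.Host == "" || p.Port == 0 {
		return nil, fmt.Errorf("egress proxy host and port are required")
	}
	proxyURL := fmt.Sprintf("http://%s:%d", p.Host, p.Port)
	seen := map[string]bool{}
	var noProxy []string
	for _, e := range append(append([]string{}, defaultNoProxy...), p.Excludes...) {
		if e != "" && !seen[e] {
			seen[e] = true
			noProxy = append(noProxy, e)
		}
	}
	env := map[string]string{}
	for _, k := range []string{"http_proxy", "https_proxy", "no_proxy"} {
		v := proxyURL // https_proxy reuses the same forward proxy
		if k == "no_proxy" {
			v = strings.Join(noProxy, ",")
		}
		env[k] = v
		env[strings.ToUpper(k)] = v // some tools only read the uppercase form
	}
	return env, nil
}
```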
Automatic merge from submit-queue
Kops Replace Command - create unprovisioned
The current `kops replace` fails if the resource does not exist, which is annoying if you want to use the feature to drive your CI. This PR adds a `--create` option to create any resource that does not exist. At the moment, this is limited to instanceGroups. Perhaps this command should also be renamed to `kops apply`?
- added a `--create` command-line option to the replace command to create unprovisioned resources
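The replace-or-create behaviour amounts to an upsert gated by a flag. A sketch using an in-memory map as a stand-in for the state store; `InstanceGroup`, `store` and `replace` are hypothetical and not the kops client API:

```go
package main

import "fmt"

// InstanceGroup is a hypothetical stand-in for the kops InstanceGroup resource.
type InstanceGroup struct {
	Name    string
	MaxSize int
}

// store is an in-memory stand-in for the state store.
type store map[string]InstanceGroup

// replace updates an existing instance group; when create is set (the --create
// behaviour), a missing group is created instead of returning an error.
func (s store) replace(ig InstanceGroup, create bool) error {
	if _, ok := s[ig.Name]; !ok && !create {
		return fmt.Errorf("instancegroup %q not found", ig.Name)
	}
	s[ig.Name] = ig
	return nil
}
```

Making creation opt-in keeps the default `replace` semantics unchanged for anyone relying on the "not found" error.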