mirror of https://github.com/kubernetes/kops.git
Merge pull request #11972 from johngmyers/doc-rotation
Add documentation for keypair rotation
commit
655a70a9b5
|
@ -508,7 +508,7 @@ To prepare the customized client-ca file on master nodes, the user can either us
|
|||
|
||||
If a customized client-ca file is used, the Kubernetes CA (`/srv/kubernetes/ca/crt`) commonly needs to be appended to the end of that file. One way to do so is to write a [kOps hook](https://kops.sigs.k8s.io/cluster_spec/#hooks) that performs the append.
|
||||
|
||||
Kops will have [CA rotation](https://kops.sigs.k8s.io/rotate-secrets/) feature soon, which would refresh the kubernetes cert files, including the ca.crt. If a customized client-ca file is used, when kops cert rotation happens, the user is responsible to update the ca.crt in the customized client-ca file. The refresh ca.crt logic can also be achieved by writing a kops hook.
|
||||
kOps has a [CA rotation](operations/rotate-secrets.md) feature, which refreshes the Kubernetes certificate files, including the ca.crt. If a customized client-ca file is used, the user is responsible for updating the ca.crt in that file whenever kOps rotates certificates. The ca.crt refresh logic can also be implemented by writing a kOps hook.
|
||||
|
||||
See also [Kubernetes certificates](https://kubernetes.io/docs/concepts/cluster-administration/certificates/)
|
||||
|
||||
|
|
|
@ -0,0 +1,248 @@
|
|||
# How to rotate all secrets / credentials
|
||||
|
||||
There are two types of credentials managed by kOps:
|
||||
|
||||
* "secrets" are symmetric credentials.
|
||||
|
||||
* "keypairs" are pairs of X.509 certificates and their corresponding private keys.
|
||||
The exceptions are "service-account" keypairs, which are stored as
|
||||
certificate and private key pairs, but do not use any part of the certificates
|
||||
other than the public keys.
|
||||
|
||||
Keypairs are grouped into named "keysets", according to their use. For example,
|
||||
the "kubernetes-ca" keyset is used for the cluster's Kubernetes general CA.
|
||||
Each keyset has a single primary keypair, which is the one whose private key
|
||||
is used. The remaining, secondary keypairs are either trusted or distrusted.
|
||||
The trusted keypairs, including the primary keypair, have their certificates
|
||||
included in relevant trust stores.
|
||||
|
||||
## Rotating keypairs
|
||||
|
||||
{{ kops_feature_table(kops_added_default='1.22') }}
|
||||
|
||||
You may gracefully rotate keypairs of keysets that are either Certificate Authorities
|
||||
or are "service-account" by performing the following procedure. Other keypairs will be
|
||||
automatically reissued by a non-dryrun `kops update cluster` when their issuing
|
||||
CA is rotated.
|
||||
|
||||
### Create and stage new keypair
|
||||
|
||||
Create a new keypair for each keyset that you are going to rotate.
|
||||
Then update the cluster and perform a rolling update.
|
||||
To stage all rotatable keysets, run:
|
||||
|
||||
```shell
|
||||
kops create keypair all
|
||||
kops update cluster --yes
|
||||
kops rolling-update cluster --yes
|
||||
```
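
The three staging commands can be wrapped so that a failure in one step stops the rollout before the next begins. The following is a minimal sketch, not part of kOps: `stage_keypairs` and the `KOPS` override are hypothetical names introduced here, and the trailing `kops validate cluster --wait 10m` is an optional extra check borrowed from the legacy procedure.

```shell
# Sketch only: KOPS is a hypothetical override, not a kops feature.
# Setting KOPS="echo kops" prints the commands instead of running them.
stage_keypairs() {
  kops_cmd=${KOPS:-kops}
  # Chain with && so that a failed step prevents the following ones.
  $kops_cmd create keypair all &&
    $kops_cmd update cluster --yes &&
    $kops_cmd rolling-update cluster --yes &&
    $kops_cmd validate cluster --wait 10m
}
```

Running `KOPS="echo kops" stage_keypairs` first gives a preview of what would be executed; plain `stage_keypairs` runs the real commands.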
|
||||
|
||||
#### Rollback procedure:
|
||||
|
||||
A failure at this stage is unlikely. To roll back this change:
|
||||
|
||||
* Use `kops get keypairs` to get the IDs of the newly created keypairs.
|
||||
* Then use `kops distrust keypair` to distrust each of them by keyset and ID.
|
||||
* Then use `kops update cluster --yes`
|
||||
* Then use `kops rolling-update cluster --yes`
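
The rollback steps above could be scripted per keypair roughly as follows. This is a sketch under assumptions: `distrust_staged` and the `KOPS` override are hypothetical names, and the argument order (keyset first, then ID) is taken from the "by keyset and ID" wording above, so check `kops distrust keypair --help` for your version.

```shell
# distrust_staged KEYSET ID
# Sketch only: distrust one newly created keypair, then roll the change out.
# KOPS is a hypothetical override; KOPS="echo kops" prints the commands.
distrust_staged() {
  [ "$#" -eq 2 ] || { echo "usage: distrust_staged KEYSET ID" >&2; return 2; }
  kops_cmd=${KOPS:-kops}
  $kops_cmd distrust keypair "$1" "$2" &&
    $kops_cmd update cluster --yes &&
    $kops_cmd rolling-update cluster --yes
}
```

Call it once for each keyset that was staged, passing the ID reported by `kops get keypairs`.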
|
||||
|
||||
### Export and distribute new kubeconfig certificate-authority-data
|
||||
|
||||
If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
|
||||
you are not using a load balancer for the Kubernetes API with its own separate
|
||||
certificate, export a new kubeconfig with the new CA certificate
|
||||
included in the `certificate-authority-data` field for the cluster:
|
||||
|
||||
```shell
|
||||
kops export kubecfg
|
||||
```
|
||||
|
||||
Distribute the new `certificate-authority-data` to all clients of that cluster's
|
||||
Kubernetes API.
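
While both the previous and the new CA are trusted, the exported `certificate-authority-data` would be expected to hold more than one certificate. A minimal sketch for checking this before distributing the file, assuming a single-file kubeconfig and `awk`, `base64`, and `grep` on the PATH; `count_ca_certs` is a hypothetical helper name, not part of kOps:

```shell
# count_ca_certs KUBECONFIG_FILE
# Print how many PEM certificates are bundled in the first
# certificate-authority-data field of the given kubeconfig file.
count_ca_certs() {
  awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1" |
    base64 --decode |
    grep -c -- '-----BEGIN CERTIFICATE-----'
}
```

For example, `count_ca_certs "$HOME/.kube/config"` inspects the default kubeconfig location. Only the first cluster entry in the file is examined.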
|
||||
|
||||
#### Rollback procedure:
|
||||
|
||||
To roll back this change, distribute the previous kubeconfig `certificate-authority-data`.
|
||||
|
||||
### Promote the new keypairs
|
||||
|
||||
Promote the new keypairs to primary with:
|
||||
|
||||
```shell
|
||||
kops promote keypair all
|
||||
kops update cluster --yes
|
||||
kops rolling-update cluster --force --yes
|
||||
```
|
||||
|
||||
As of the writing of this document, rolling-update will not necessarily identify all
|
||||
relevant nodes as needing update, so it should be invoked with the `--force` flag.
|
||||
|
||||
#### Rollback procedure:
|
||||
|
||||
The most likely failure at this stage would be a client of the Kubernetes API that
|
||||
did not get the new `certificate-authority-data` and thus does not trust the
|
||||
new TLS server certificate.
|
||||
|
||||
To roll back this change:
|
||||
|
||||
* Use `kops get keypairs` to get the IDs of the previous primary keypairs,
|
||||
most likely by identifying the issue dates.
|
||||
* Then use `kops promote keypair` to promote each of them by keyset and ID.
|
||||
* Then use `kops update cluster --yes`
|
||||
* Then use `kops rolling-update cluster --force --yes`
|
||||
|
||||
### Export and distribute new kubeconfig admin credentials
|
||||
|
||||
If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
|
||||
have kubeconfigs with cluster admin credentials, export new kubeconfigs
|
||||
with new admin credentials for the cluster:
|
||||
|
||||
```shell
|
||||
kops export kubecfg --admin=DURATION
|
||||
```
|
||||
|
||||
where `DURATION` is the desired lifetime of the admin credential.
|
||||
|
||||
Distribute the new credentials to all clients that require them.
|
||||
|
||||
#### Rollback procedure:
|
||||
|
||||
To roll back this change, distribute the previous kubeconfig admin credentials.
|
||||
|
||||
### Distrust the previous keypairs
|
||||
|
||||
Remove trust in the previous keypairs with:
|
||||
|
||||
```shell
|
||||
kops distrust keypair all
|
||||
kops update cluster --yes
|
||||
kops rolling-update cluster --yes
|
||||
```
|
||||
|
||||
#### Rollback procedure:
|
||||
|
||||
The most likely failure at this stage would be a client of the Kubernetes API that
|
||||
is still using a credential issued by the previous keypair.
|
||||
|
||||
To roll back this change:
|
||||
|
||||
* Use `kops get keypairs --distrusted` to get the IDs of the previously trusted keypairs,
|
||||
most likely by identifying the distrust dates.
|
||||
* Then use `kops trust keypair` to trust each of them by keyset and ID.
|
||||
* Then use `kops update cluster --yes`
|
||||
* Then use `kops rolling-update cluster --force --yes`
|
||||
|
||||
### Export and distribute new kubeconfig certificate-authority-data
|
||||
|
||||
If you are rotating the Kubernetes general CA ("kubernetes-ca" or "all") and
|
||||
you are not using a load balancer for the Kubernetes API with its own separate
|
||||
certificate, export a new kubeconfig with the previous CA certificate
|
||||
removed from the `certificate-authority-data` field for the cluster:
|
||||
|
||||
```shell
|
||||
kops export kubecfg
|
||||
```
|
||||
|
||||
Distribute the new `certificate-authority-data` to all clients of that cluster's
|
||||
Kubernetes API.
|
||||
|
||||
#### Rollback procedure:
|
||||
|
||||
To roll back this change, distribute the previous kubeconfig `certificate-authority-data`.
|
||||
|
||||
## Rotating encryptionconfig
|
||||
|
||||
See [the Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/#rotating-a-decryption-key)
|
||||
for information on how to gracefully rotate keys in the encryptionconfig.
|
||||
|
||||
Use `kops create secret encryptionconfig --force` to update the encryptionconfig secret.
|
||||
Following that, use `kops update cluster --yes` and `kops rolling-update cluster --yes`.
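
The Kubernetes documentation linked above generates each new key as 32 random bytes, base64-encoded. A small sketch of that step; `new_encryption_key` is a hypothetical helper name:

```shell
# new_encryption_key: print a random 32-byte key, base64-encoded,
# suitable for the "secret" field of a key entry in the encryptionconfig.
new_encryption_key() {
  head -c 32 /dev/urandom | base64
}
```

The printed value goes into the encryptionconfig file as a new key entry, following the ordering rules in the Kubernetes documentation. The edited file is then supplied to `kops create secret encryptionconfig --force`, assuming your kOps version accepts it through `-f`.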
|
||||
|
||||
## Rotating other secrets
|
||||
|
||||
[TODO: cilium_encryptionconfig, dockerconfig, weave_encryptionconfig]
|
||||
|
||||
## Legacy procedure
|
||||
|
||||
The following is the procedure to rotate secrets and keypairs in kOps versions
|
||||
prior to 1.22.
|
||||
|
||||
**This is a disruptive procedure.**
|
||||
|
||||
### Delete all secrets
|
||||
|
||||
Delete all secrets & keypairs that kOps is holding:
|
||||
|
||||
```shell
|
||||
kops get secrets | grep '^Secret' | awk '{print $2}' | xargs -I {} kops delete secret secret {}
|
||||
|
||||
kops get secrets | grep '^Keypair' | awk '{print $2}' | xargs -I {} kops delete secret keypair {}
|
||||
```
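
Because the pipelines above delete everything they match, it may help to preview the names first. The sketch below reuses the same column assumption as the `grep`/`awk` stages (kind in the first column, name in the second); `names_of_kind` is a hypothetical helper name, and unlike `grep '^Secret'` it matches the first column exactly rather than by prefix.

```shell
# names_of_kind KIND
# Read `kops get secrets` output on stdin and print the second column
# of every row whose first column is exactly KIND.
names_of_kind() {
  awk -v kind="$1" '$1 == kind { print $2 }'
}
```

For example, `kops get secrets | names_of_kind Keypair` lists the keypairs that the second deletion pipeline would remove.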
|
||||
|
||||
### Recreate all secrets
|
||||
|
||||
Now run `kops update` to regenerate the secrets & keypairs.
|
||||
```shell
|
||||
kops update cluster
|
||||
kops update cluster --yes
|
||||
```
|
||||
|
||||
kOps may fail to recreate all the keys on the first try. If you get errors about the CA key for 'ca' not being found, run `kops update cluster --yes` once more.
|
||||
|
||||
### Force cluster to use new secrets
|
||||
|
||||
Now you will have to remove the etcd certificates from every master.
|
||||
|
||||
Find all the master IPs. One easy way of doing that is to run:
|
||||
|
||||
```shell
|
||||
kops toolbox dump
|
||||
```
|
||||
|
||||
Then SSH into each master and run:
|
||||
|
||||
```shell
|
||||
sudo find /mnt/ -name 'server.*' | xargs -I {} sudo rm {}
|
||||
sudo find /mnt/ -name 'me.*' | xargs -I {} sudo rm {}
|
||||
```
|
||||
|
||||
You need to reboot every node (using a rolling-update). You have to use `--cloudonly` because the keypair no longer matches.
|
||||
|
||||
```shell
|
||||
kops rolling-update cluster --cloudonly --force --yes
|
||||
```
|
||||
|
||||
Re-export kubecfg with new settings:
|
||||
|
||||
```shell
|
||||
kops export kubecfg
|
||||
```
|
||||
|
||||
### Recreate all service accounts
|
||||
|
||||
Now the service account tokens will need to be regenerated inside the cluster:
|
||||
|
||||
Run `kops toolbox dump` and find a master IP.
|
||||
|
||||
Then `ssh admin@${IP}` and run this to delete all the service account tokens:
|
||||
|
||||
```shell
|
||||
# Delete all service account tokens in all namespaces
|
||||
NS=$(kubectl get namespaces -o 'jsonpath={.items[*].metadata.name}')
|
||||
for i in ${NS}; do kubectl get secrets --namespace=${i} --no-headers | grep "kubernetes.io/service-account-token" | awk '{print $1}' | xargs -I {} kubectl delete secret --namespace=$i {}; done
|
||||
|
||||
# Allow for new secrets to be created
|
||||
sleep 60
|
||||
|
||||
# Bounce all pods to make use of the new service tokens
|
||||
pkill -f kube-controller-manager
|
||||
kubectl delete pods --all --all-namespaces
|
||||
```
|
||||
|
||||
### Verify the cluster is back up
|
||||
|
||||
The last command from the previous section will take some time. Meanwhile you can check validation to see the cluster gradually coming back online.
|
||||
|
||||
```shell
|
||||
kops validate cluster --wait 10m
|
||||
```
|
|
@ -28,6 +28,9 @@ spec:
|
|||
This feature may be temporarily disabled by turning off the `TerraformManagedFiles` feature flag
|
||||
using `export KOPS_FEATURE_FLAGS="-TerraformManagedFiles"`.
|
||||
|
||||
* kOps now implements graceful rotation of its Certificate Authorities and the service
|
||||
account signing key. See the documentation on [How to rotate all secrets / credentials](../operations/rotate-secrets.md).
|
||||
|
||||
* New clusters running Kubernetes 1.22 will have AWS EBS CSI driver enabled by default.
|
||||
|
||||
# Breaking changes
|
||||
|
|
|
@ -1,81 +0,0 @@
|
|||
# How to rotate all secrets / credentials
|
||||
|
||||
**This is a disruptive procedure.**
|
||||
|
||||
## Delete all secrets
|
||||
|
||||
Delete all secrets & keypairs that kOps is holding:
|
||||
|
||||
```shell
|
||||
kops get secrets | grep '^Secret' | awk '{print $2}' | xargs -I {} kops delete secret secret {}
|
||||
|
||||
kops get secrets | grep '^Keypair' | awk '{print $2}' | xargs -I {} kops delete secret keypair {}
|
||||
```
|
||||
|
||||
## Recreate all secrets
|
||||
|
||||
Now run `kops update` to regenerate the secrets & keypairs.
|
||||
```
|
||||
kops update cluster
|
||||
kops update cluster --yes
|
||||
```
|
||||
|
||||
kOps may fail to recreate all the keys on first try. If you get errors about ca key for 'ca' not being found, run `kops update cluster --yes` once more.
|
||||
|
||||
## Force cluster to use new secrets
|
||||
|
||||
Now you will have to remove the etcd certificates from every master.
|
||||
|
||||
Find all the master IPs. One easy way of doing that is running
|
||||
|
||||
```
|
||||
kops toolbox dump
|
||||
```
|
||||
|
||||
Then SSH into each node and run
|
||||
|
||||
```
|
||||
sudo find /mnt/ -name server.* | xargs -I {} sudo rm {}
|
||||
sudo find /mnt/ -name me.* | xargs -I {} sudo rm {}
|
||||
```
|
||||
|
||||
You need to reboot every node (using a rolling-update). You have to use `--cloudonly` because the keypair no longer matches.
|
||||
|
||||
```
|
||||
kops rolling-update cluster --cloudonly --force --yes
|
||||
```
|
||||
|
||||
Re-export kubecfg with new settings:
|
||||
|
||||
```
|
||||
kops export kubecfg
|
||||
```
|
||||
|
||||
## Recreate all service accounts
|
||||
|
||||
Now the service account tokens will need to be regenerated inside the cluster:
|
||||
|
||||
`kops toolbox dump` and find a master IP
|
||||
|
||||
Then `ssh admin@${IP}` and run this to delete all the service account tokens:
|
||||
|
||||
```shell
|
||||
# Delete all service account tokens in all namespaces
|
||||
NS=`kubectl get namespaces -o 'jsonpath={.items[*].metadata.name}'`
|
||||
for i in ${NS}; do kubectl get secrets --namespace=${i} --no-headers | grep "kubernetes.io/service-account-token" | awk '{print $1}' | xargs -I {} kubectl delete secret --namespace=$i {}; done
|
||||
|
||||
# Allow for new secrets to be created
|
||||
sleep 60
|
||||
|
||||
# Bounce all pods to make use of the new service tokens
|
||||
pkill -f kube-controller-manager
|
||||
kubectl delete pods --all --all-namespaces
|
||||
```
|
||||
|
||||
## Verify the cluster is back up
|
||||
|
||||
The last command from the previous section will take some time. Meanwhile you can check validation to see the cluster gradually coming back online.
|
||||
|
||||
```
|
||||
kops validate cluster --wait 10m
|
||||
```
|
|
@ -85,6 +85,7 @@ nav:
|
|||
- GPU setup: "gpu.md"
|
||||
- Label management: "labels.md"
|
||||
- Secret management: "secrets.md"
|
||||
- Rotate Secrets: "operations/rotate-secrets.md"
|
||||
- Service Account Token Volume: "operations/service_account_token_volumes.md"
|
||||
- Moving from a Single Master to Multiple HA Masters: "single-to-multi-master.md"
|
||||
- Running kOps in a CI environment: "continuous_integration.md"
|
||||
|
@ -131,7 +132,6 @@ nav:
|
|||
- Egress Proxy: "http_proxy.md"
|
||||
- Node Authorization: "node_authorization.md"
|
||||
- Node Resource Allocation: "node_resource_handling.md"
|
||||
- Rotate Secrets: "rotate-secrets.md"
|
||||
- Terraform: "terraform.md"
|
||||
- Authentication: "authentication.md"
|
||||
- Contributing:
|
||||
|
|