Postpone Deletion of a Persistent Volume Claim in case It Is Used by a Pod

Proposal for postponing deletion of Persistent Volume Claim in case it's used by a pod.

It will fix issue https://github.com/kubernetes/kubernetes/issues/45143
pospispa 2017-10-11 17:10:45 +02:00
parent 425e57bb34
commit b378919dd8
1 changed files with 106 additions and 0 deletions

@ -0,0 +1,106 @@
# Postpone Deletion of a Persistent Volume Claim in case It Is Used by a Pod
Status: Proposal
Version: GA
Implementation Owner: @pospispa
## Motivation
A user can delete a Persistent Volume Claim (PVC) that is being used by a pod. This may have a negative impact on the pod and may result in data loss.
For more details see issue https://github.com/kubernetes/kubernetes/issues/45143
## Proposal
Postpone the PVC deletion until the PVC is no longer used by any pod.
## User Experience
### Use Cases
1. A user deletes a PVC that is being used by a pod. This may have a negative impact on the pod and may result in data loss. As a user, I want PVC deletion to have no negative impact on any pod. As a user, I do not want to experience data loss.
#### Scenarios for data loss
Depending on the storage type, data loss occurs in one of the following scenarios:
- When dynamic provisioning is used and the reclaim policy is `Delete`, the PVC deletion triggers deletion of the associated storage asset and PV.
- The same applies when static provisioning is used with the `Delete` reclaim policy.
## Implementation
### API Server, PVC Admission Controller, PVC Create
A new plugin for the PVC admission controller will be created. The plugin will automatically add the finalizer to the metadata of every newly created PVC.
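The mutation performed by such a plugin could look roughly like the sketch below. It is only an illustration under stated assumptions: the `PVC` struct is a simplified stand-in rather than the real API type, `admitCreate` is a hypothetical function name, and the finalizer string is a placeholder because this proposal does not name it.

```go
package main

import "fmt"

// pvcProtectionFinalizer is a placeholder value; the proposal does not
// specify the actual finalizer string.
const pvcProtectionFinalizer = "example.io/pvc-protection"

// PVC is a simplified, hypothetical stand-in for a PersistentVolumeClaim.
type PVC struct {
	Name       string
	Finalizers []string
}

// hasFinalizer reports whether the PVC already carries the given finalizer.
func hasFinalizer(pvc *PVC, name string) bool {
	for _, f := range pvc.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// admitCreate mimics the mutating step on PVC creation: it adds the
// finalizer unless it is already present, so repeated calls are harmless.
func admitCreate(pvc *PVC) {
	if !hasFinalizer(pvc, pvcProtectionFinalizer) {
		pvc.Finalizers = append(pvc.Finalizers, pvcProtectionFinalizer)
	}
}

func main() {
	pvc := &PVC{Name: "data"}
	admitCreate(pvc)
	admitCreate(pvc) // second call must not add a duplicate
	fmt.Println(pvc.Name, pvc.Finalizers)
}
```

Making the step idempotent keeps it safe if the same object passes through the plugin more than once.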
### Scheduler
The scheduler will check whether a pod uses any PVC that has `deletionTimestamp` set. If so, an error will be logged: "PVC (%pvcName) is in scheduled for deletion state", and the scheduler will behave as if the PVC was not found.
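A minimal sketch of this check, assuming a simplified `PVC` struct and a plain map in place of the scheduler's PVC lookup. The names `checkPodPVCs`, `errPVCNotFound` and `markDeleted` are hypothetical and exist only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// PVC is a simplified, hypothetical stand-in for a PersistentVolumeClaim.
type PVC struct {
	Name              string
	DeletionTimestamp *time.Time // nil unless deletion has been requested
}

// errPVCNotFound is a hypothetical sentinel for the "PVC not found" path.
var errPVCNotFound = errors.New("persistent volume claim not found")

// markDeleted simulates a delete request by setting deletionTimestamp.
func markDeleted(pvc *PVC) {
	now := time.Now()
	pvc.DeletionTimestamp = &now
}

// checkPodPVCs returns nil only if every referenced claim exists and none
// of them is being deleted. A claim with deletionTimestamp set is logged
// and then reported exactly like a missing claim.
func checkPodPVCs(claimNames []string, pvcs map[string]*PVC) error {
	for _, name := range claimNames {
		pvc, ok := pvcs[name]
		if !ok {
			return fmt.Errorf("%w: %s", errPVCNotFound, name)
		}
		if pvc.DeletionTimestamp != nil {
			log.Printf("PVC (%s) is in scheduled for deletion state", name)
			return fmt.Errorf("%w: %s", errPVCNotFound, name)
		}
	}
	return nil
}

func main() {
	pvcs := map[string]*PVC{"data": {Name: "data"}, "logs": {Name: "logs"}}
	fmt.Println(checkPodPVCs([]string{"data", "logs"}, pvcs))
	markDeleted(pvcs["logs"])
	fmt.Println(checkPodPVCs([]string{"data", "logs"}, pvcs))
}
```

Returning the same error for both cases is one way to express "behave as if the PVC was not found": callers need no extra branch for claims that are being deleted.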
### Kubelet
Kubelet currently performs a live lookup of the PVC(s) that are used by a pod.
If any PVC used by the pod has `deletionTimestamp` set, kubelet won't start the pod and will report an error: "can't start pod (%pod) because it's using PVC (%pvcName) that is being deleted". Kubelet will follow the same code path as if the PVC(s) did not exist.
### PVC Finalizing Controller
The PVC finalizing controller is a new internal controller.
It watches both PVC and pod events and processes them as described below:
1. PVC add/update/delete events:
   - If `deletionTimestamp` is `nil` and finalizer is missing, the PVC is added to PVC queue.
   - If `deletionTimestamp` is `non-nil` and finalizer is present, the PVC is added to PVC queue.
2. Pod add events:
   - If pod is terminated, all referenced PVCs are added to PVC queue.
3. Pod update events:
   - If pod is changing from non-terminated to terminated state, all referenced PVCs are added to PVC queue.
4. Pod delete events:
   - All referenced PVCs are added to PVC queue.
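The event rules above can be sketched as pure functions that return the queue keys to add. This is an illustration only: the `PVC` and `Pod` structs are simplified stand-ins, the `namespace/name` key format and all function names are hypothetical, the finalizer string is a placeholder, and treating the `Succeeded` and `Failed` phases as "terminated" is an assumption this proposal does not spell out.

```go
package main

import (
	"fmt"
	"time"
)

// Placeholder finalizer name; the proposal does not specify it.
const pvcProtectionFinalizer = "example.io/pvc-protection"

// PVC and Pod are simplified, hypothetical stand-ins for the API objects.
type PVC struct {
	Namespace, Name   string
	DeletionTimestamp *time.Time
	Finalizers        []string
}

type Pod struct {
	Namespace  string
	Phase      string // e.g. "Pending", "Running", "Succeeded", "Failed"
	ClaimNames []string
}

func markDeleted(pvc *PVC) {
	now := time.Now()
	pvc.DeletionTimestamp = &now
}

func hasFinalizer(pvc *PVC) bool {
	for _, f := range pvc.Finalizers {
		if f == pvcProtectionFinalizer {
			return true
		}
	}
	return false
}

// isTerminated assumes that a finished pod is one in phase Succeeded or Failed.
func isTerminated(pod *Pod) bool {
	return pod.Phase == "Succeeded" || pod.Phase == "Failed"
}

// claimKeys returns a namespace/name key for every PVC the pod references.
func claimKeys(pod *Pod) []string {
	keys := make([]string, 0, len(pod.ClaimNames))
	for _, c := range pod.ClaimNames {
		keys = append(keys, pod.Namespace+"/"+c)
	}
	return keys
}

// pvcEventKeys implements rule 1: enqueue a PVC that still needs the
// finalizer, or a PVC that is being deleted and still has it.
func pvcEventKeys(pvc *PVC) []string {
	deleting := pvc.DeletionTimestamp != nil
	protected := hasFinalizer(pvc)
	if (!deleting && !protected) || (deleting && protected) {
		return []string{pvc.Namespace + "/" + pvc.Name}
	}
	return nil
}

// podAddKeys implements rule 2.
func podAddKeys(pod *Pod) []string {
	if isTerminated(pod) {
		return claimKeys(pod)
	}
	return nil
}

// podUpdateKeys implements rule 3: only the transition into a terminated
// state enqueues the referenced PVCs.
func podUpdateKeys(oldPod, newPod *Pod) []string {
	if !isTerminated(oldPod) && isTerminated(newPod) {
		return claimKeys(newPod)
	}
	return nil
}

// podDeleteKeys implements rule 4.
func podDeleteKeys(pod *Pod) []string {
	return claimKeys(pod)
}

func main() {
	running := &Pod{Namespace: "ns", Phase: "Running", ClaimNames: []string{"data"}}
	done := &Pod{Namespace: "ns", Phase: "Succeeded", ClaimNames: []string{"data"}}
	fmt.Println(podAddKeys(running), podUpdateKeys(running, done), podDeleteKeys(done))
}
```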
PVC and pod information is kept in a cache, which an informer maintains inherently.
The PVC queue holds PVCs that need to be processed according to the rules below:
- If the PVC is not found in the cache, it is skipped.
- If the PVC is in the cache with a `nil` `deletionTimestamp` and the finalizer is missing, the finalizer is added to the PVC. If adding the finalizer fails, the PVC is re-queued into the PVC queue.
- If the PVC is in the cache with a `non-nil` `deletionTimestamp` and the finalizer is present, a live pod list is performed for the PVC's namespace. If every pod referencing the PVC is either not yet bound to a node or terminated, removal of the finalizer is attempted. If the finalizer removal fails, the PVC is re-queued.
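The processing rules above can be sketched as a decision function plus a thin wrapper that reports whether the PVC must be re-queued. All names here (`decide`, `syncPVC`, the `Action` values, `errConflict`) are hypothetical, the structs are simplified stand-ins, and the finalizer string is a placeholder. The `pods` argument stands for the result of the live pod list in the PVC's namespace, and a `nil` PVC stands for "not found in cache". The `Wait` outcome is an interpretation: the rules do not say what happens while a pod still uses the PVC, so the sketch leaves the PVC untouched and relies on a later pod event to enqueue it again.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Placeholder finalizer name; the proposal does not specify it.
const pvcProtectionFinalizer = "example.io/pvc-protection"

// PVC and Pod are simplified, hypothetical stand-ins for the API objects.
type PVC struct {
	Name              string
	DeletionTimestamp *time.Time
	Finalizers        []string
}

type Pod struct {
	NodeName   string // empty until the pod is bound to a node
	Terminated bool
	ClaimNames []string
}

// Action names the outcome of processing one queue item.
type Action string

const (
	Skip            Action = "skip"
	AddFinalizer    Action = "add-finalizer"
	RemoveFinalizer Action = "remove-finalizer"
	Wait            Action = "wait" // PVC is still used by a bound, running pod
)

// errConflict is a hypothetical error used to simulate a failed API update.
var errConflict = errors.New("update conflict")

func markDeleted(pvc *PVC) {
	now := time.Now()
	pvc.DeletionTimestamp = &now
}

func hasFinalizer(pvc *PVC) bool {
	for _, f := range pvc.Finalizers {
		if f == pvcProtectionFinalizer {
			return true
		}
	}
	return false
}

// blocksDeletion reports whether the pod is bound to a node, not terminated,
// and references the given claim.
func blocksDeletion(pod Pod, claim string) bool {
	if pod.NodeName == "" || pod.Terminated {
		return false
	}
	for _, c := range pod.ClaimNames {
		if c == claim {
			return true
		}
	}
	return false
}

// decide applies the three queue rules to one PVC.
func decide(pvc *PVC, pods []Pod) Action {
	if pvc == nil {
		return Skip // not found in cache
	}
	deleting := pvc.DeletionTimestamp != nil
	switch {
	case !deleting && !hasFinalizer(pvc):
		return AddFinalizer
	case deleting && hasFinalizer(pvc):
		for _, p := range pods {
			if blocksDeletion(p, pvc.Name) {
				return Wait
			}
		}
		return RemoveFinalizer
	}
	return Skip
}

// syncPVC hands finalizer changes to apply and returns true when the PVC
// must be re-queued because that operation failed.
func syncPVC(pvc *PVC, pods []Pod, apply func(Action) error) bool {
	action := decide(pvc, pods)
	if action != AddFinalizer && action != RemoveFinalizer {
		return false
	}
	return apply(action) != nil
}

func main() {
	pvc := &PVC{Name: "data", Finalizers: []string{pvcProtectionFinalizer}}
	markDeleted(pvc)
	pods := []Pod{{NodeName: "node-1", ClaimNames: []string{"data"}}}
	fmt.Println(decide(pvc, pods))
	pods[0].Terminated = true
	fmt.Println(decide(pvc, pods))
}
```

Keeping the decision separate from the API call makes the rules easy to test without a cluster; only `apply` would need to talk to the API server.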
### CLI
If a PVC has `deletionTimestamp` set, the commands `kubectl get pvc` and `kubectl describe pvc` will display that the PVC is in the terminating state.
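The display rule amounts to a small override of the reported status. The sketch below assumes a simplified `PVC` struct and a hypothetical `displayStatus` helper. The literal string `Terminating` is also an assumption, since the proposal only says that the terminating state is shown.

```go
package main

import (
	"fmt"
	"time"
)

// PVC is a simplified, hypothetical stand-in for a PersistentVolumeClaim.
type PVC struct {
	Name              string
	Phase             string // e.g. "Pending" or "Bound"
	DeletionTimestamp *time.Time
}

// markDeleted simulates a delete request by setting deletionTimestamp.
func markDeleted(pvc *PVC) {
	now := time.Now()
	pvc.DeletionTimestamp = &now
}

// displayStatus returns the status column value: the regular phase, or
// "Terminating" once deletionTimestamp is set.
func displayStatus(pvc *PVC) string {
	if pvc.DeletionTimestamp != nil {
		return "Terminating"
	}
	return pvc.Phase
}

func main() {
	pvc := &PVC{Name: "data", Phase: "Bound"}
	fmt.Println(pvc.Name, displayStatus(pvc))
	markDeleted(pvc)
	fmt.Println(pvc.Name, displayStatus(pvc))
}
```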
### Client/Server Backwards/Forwards compatibility
N/A
## Alternatives considered
1. Check in the admission controller whether a PVC can be deleted by listing all pods and checking whether the PVC is used by a pod. This was discussed and rejected in PR https://github.com/kubernetes/kubernetes/pull/46573

Other alternatives were discussed in issue https://github.com/kubernetes/kubernetes/issues/45143
### Scheduler Live Lookups PVC(s) Instead of Kubelet
The implementation proposes that kubelet performs a live lookup of the PVC(s) used by a pod before it starts the pod, so that it does not start a pod that uses a PVC with `deletionTimestamp` set.
An alternative is that the scheduler performs a live lookup of the PVC(s) used by a pod, so that it does not schedule a pod that uses a PVC with `deletionTimestamp` set.
However, a live lookup carries a performance penalty. As this penalty is already present in kubelet, it is better to do the live lookup in kubelet.
### Scheduler Maintains PVCUsedByPod Information in PVC
The scheduler will maintain information on both pods and PVCs from the API server.
When a pod is being scheduled and uses PVCs that do not have the PVCUsedByPod condition set, the scheduler will set this condition on these PVCs.
When a pod is terminated and was using PVCs, the scheduler will update the PVCUsedByPod condition of these PVCs accordingly.
The PVC finalizing controller won't watch pods, because the information on whether a PVC is used by a pod is now maintained by the scheduler.
When the PVC finalizing controller gets an update of a PVC that has `deletionTimestamp` set, it will do a live lookup of this PVC to get the up-to-date value of its PVCUsedByPod condition. If PVCUsedByPod is not true, it will remove the finalizer from this PVC.
### Scheduler In the Role of PVC Finalizing Controller
The scheduler will be responsible for removing the finalizer from PVCs that are being deleted.
To do so, the scheduler will watch pods and PVCs and maintain an internal cache of both.
When a PVC is deleted, the scheduler will do one of the following:
- If the PVC is used by a pod, it will add the PVC to its internal set of PVCs that are waiting for deletion.
- If the PVC is not used by a pod, it will remove the finalizer from the PVC metadata.
Note: the scheduler is the source of truth for pods that are being started. Its information on active pods may be slightly outdated (a pod may still be active in the scheduler while it is already terminated in the API server), which may postpone deletion of a PVC, but this does not cause any harm.
The disadvantage is that the scheduler becomes responsible for postponing PVC deletion, which makes the scheduler bigger.