Merge pull request #1174 from pospispa/postpone-pvc-deletion-if-used-in-a-pod
Automatic merge from submit-queue. Postpone Deletion of a Persistent Volume Claim in case It Is Used by a Pod Proposal for postponing deletion of Persistent Volume Claim in case it's used by a pod. Implementation will fix issue https://github.com/kubernetes/kubernetes/issues/45143 /sig storage /cc @kubernetes/sig-storage-proposals
This commit is contained in:
commit
6c96f9a2e5
|
@ -0,0 +1,106 @@
|
|||
# Postpone Deletion of a Persistent Volume Claim in case It Is Used by a Pod
|
||||
|
||||
Status: Proposal
|
||||
|
||||
Version: GA
|
||||
|
||||
Implementation Owner: @pospispa
|
||||
|
||||
## Motivation
|
||||
|
||||
User can delete a Persistent Volume Claim (PVC) that is being used by a pod. This may have negative impact on the pod and it may result in data loss.
|
||||
|
||||
For more details see issue https://github.com/kubernetes/kubernetes/issues/45143
|
||||
|
||||
## Proposal
|
||||
|
||||
Postpone the PVC deletion until the PVC is not used by any pod.
|
||||
|
||||
## User Experience
|
||||
|
||||
### Use Cases
|
||||
|
||||
1. User deletes a PVC that is being used by a pod. This may have negative impact on the pod and may result in data loss. As a user, I want that any PVC deletion does not have any negative impact on any pod. As a user, I do not want to experience data loss.
|
||||
|
||||
#### Scenarios for data loss
|
||||
Depending on the storage type the data loss occurs in one of the below scenarios:
|
||||
- in case the dynamic provisioning is used and reclaim policy is `Delete` the PVC deletion triggers deletion of the associated storage asset and PV.
|
||||
- the same as above applies for the static provisioning and `Delete` reclaim policy.
|
||||
|
||||
## Implementation
|
||||
|
||||
### API Server, PVC Admission Controller, PVC Create
|
||||
A new plugin for PVC admission controller will be created. The plugin will automatically add finalizer information into newly created PVC's metadata.
|
||||
|
||||
### Scheduler
|
||||
Scheduler will check if a pod uses a PVC and if any of the PVCs has `deletionTimestamp` set. In case this is true an error will be logged: "PVC (%pvcName) is in scheduled for deletion state" and scheduler will behave as if PVC was not found.
|
||||
|
||||
### Kubelet
|
||||
Kubelet does currently live lookup of PVC(s) that are used by a pod.
|
||||
|
||||
In case any of the PVC(s) used by the pod has the `deletionTimestamp` set kubelet won't start the pod but will report and error: "can't start pod (%pod) because it's using PVC (%pvcName) that is being deleted". Kubelet will follow the same code path as if PVC(s) do not exist.
|
||||
|
||||
### PVC Finalizing Controller
|
||||
PVC finalizing controller is a new internal controller.
|
||||
|
||||
PVC finalizing controller watches for both PVC and pod events that are processed as described below:
|
||||
1. PVC add/update/delete events:
|
||||
- If `deletionTimestamp` is `nil` and finalizer is missing, the PVC is added to PVC queue.
|
||||
- If `deletionTimestamp` is `non-nil` and finalizer is present, the PVC is added to PVC queue.
|
||||
2. Pod add events:
|
||||
- If pod is terminated, all referenced PVCs are added to PVC queue.
|
||||
3. Pod update events:
|
||||
- If pod is changing from non-terminated to terminated state, all referenced PVCs are added to PVC queue.
|
||||
4. Pod delete events:
|
||||
- All referenced PVCs are added to PVC queue.
|
||||
|
||||
PVC and pod information are kept in a cache that is done inherently for an informer.
|
||||
|
||||
The PVC queue holds PVCs that need to be processed according to the below rules:
|
||||
- If PVC is not found in cache, the PVC is skipped.
|
||||
- If PVC is in cache with `nil` `deletionTimestamp` and missing finalizer, finalizer is added to the PVC. In case the adding finalizer operation fails, the PVC is re-queued into the PVC queue.
|
||||
- If PVC is in cache with `non-nil` `deletionTimestamp` and finalizer is present, live pod list is done for the PVC namespace. If all pods referencing the PVC are not yet bound to a node or are terminated, the finalizer removal is attempted. In case the finalizer removal operation fails the PVC is re-queued.
|
||||
|
||||
### CLI
|
||||
In case a PVC has the `deletionTimestamp` set the commands `kubectl get pvc` and `kubectl describe pvc` will display that the PVC is in terminating state.
|
||||
|
||||
### Client/Server Backwards/Forwards compatibility
|
||||
|
||||
N/A
|
||||
|
||||
## Alternatives considered
|
||||
|
||||
1. Check in admission controller whether PVC can be deleted by listing all pods and checking if the PVC is used by a pod. This was discussed and rejected in PR https://github.com/kubernetes/kubernetes/pull/46573
|
||||
|
||||
There were alternatives discussed in issue https://github.com/kubernetes/kubernetes/issues/45143
|
||||
|
||||
### Scheduler Live Lookups PVC(s) Instead of Kubelet
|
||||
The implementation proposes that kubelet live updates PVC(s) used by a pod before it starts the pod in order not to start a pod that uses a PVC that has the `deletionTimestamp` set.
|
||||
|
||||
An alternative is that scheduler will live update PVC(s) used by a pod in order not to schedule a pod that uses a PVC that has the `deletionTimestamp` set.
|
||||
|
||||
But live update represents a performance penalty. As the live update performance penalty is already present in the kubelet it's better to do the live update in kubelet.
|
||||
|
||||
### Scheduler Maintains PVCUsedByPod Information in PVC
|
||||
Scheduler will maintain information on both pods and PVCs from API server.
|
||||
|
||||
In case a pod is being scheduled and is using PVCs that do not have condition PVCUsedByPod set it will set this condition for these PVCs.
|
||||
|
||||
In case a pod is terminated and was using PVCs the scheduler will update PVCUsedByPod condition for these PVCs accordingly.
|
||||
|
||||
PVC finalizing controller won't watch pods because the information whether a PVC is used by a pod or not is now maintained by the scheduler.
|
||||
|
||||
In case PVC finalizing controller gets an update of a PVC and this PVC has `deletionTimestamp` set it will do live PVC update for this PVC in order to get up-to-date value of its PVCUsedByPod field. In case the PVCUsedByPod is not true it will remove the finalizer information from this PVC.
|
||||
|
||||
### Scheduler In the Role of PVC Finalizing Controller
|
||||
Scheduler will be responsible for removing the finalizer information from PVCs that are being deleted.
|
||||
|
||||
So scheduler will watch pods and PVCs and will maintain internal cache of pods and PVCs.
|
||||
|
||||
In case a PVC is deleted scheduler will do one of the below:
|
||||
- In case the PVC is used by a pod it will add the PVC into its internal set of PVCs that are waiting for deletion.
|
||||
- In case the PVC is not used by a pod it will remove the finalizer information from the PVC metadata.
|
||||
|
||||
Note: scheduler is the source of truth of pods that are being started. The information on active pods may be a little bit outdated that causes that deletion of a PVC may be postponed (pod status in schedular is active while the pod is terminated in API server), but this does not cause any harm.
|
||||
|
||||
The disadvantage is that scheduler will become responsible for PVC deletion postponing that will make scheduler bigger.
|
Loading…
Reference in New Issue