# Persistent Storage

This document proposes a model for managing persistent, cluster-scoped storage
for applications requiring long lived data.

### Abstract

Two new API kinds:

A `PersistentVolume` (PV) is a storage resource provisioned by an administrator.
It is analogous to a node. See [Persistent Volume Guide](../user-guide/persistent-volumes/)
for how to use it.

A `PersistentVolumeClaim` (PVC) is a user's request for a persistent volume to
use in a pod. It is analogous to a pod.

One new system component:

`PersistentVolumeClaimBinder` is a singleton running in master that watches all
PersistentVolumeClaims in the system and binds them to the closest matching
available PersistentVolume. The volume manager watches the API for newly created
volumes to manage.

One new volume:

`PersistentVolumeClaimVolumeSource` references the user's PVC in the same
namespace. This volume finds the bound PV and mounts that volume for the pod. A
`PersistentVolumeClaimVolumeSource` is, essentially, a wrapper around another
type of volume that is owned by someone else (the system).

Kubernetes makes no guarantees at runtime that the underlying storage exists or
is available. High availability is left to the storage provider.

### Goals

* Allow administrators to describe available storage.
* Allow pod authors to discover and request persistent volumes to use with pods.
* Enforce security through access control lists and by restricting storage to
  the same namespace as the pod volume.
* Enforce quotas through admission control.
* Enforce scheduler rules by resource counting.
* Ensure developers can rely on storage being available without being closely
  bound to a particular disk, server, network, or storage device.

#### Describe available storage

Cluster administrators use the API to manage *PersistentVolumes*. A custom store
`NewPersistentVolumeOrderedIndex` will index volumes by access modes and sort by
storage capacity. The `PersistentVolumeClaimBinder` watches for new claims for
storage and binds them to an available volume by matching the volume's
characteristics (AccessModes and storage size) to the user's request.

PVs are system objects and, thus, have no namespace.

Many means of dynamic provisioning will eventually be implemented for various
storage types.

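The ordered index described in this section can be pictured with a small sketch. This is only an illustration under stated assumptions: `OrderedVolumeIndex`, `add`, and `smallest_fit` are hypothetical names, capacities are plain byte counts, and only exact access-mode matches are looked up. It is not the `NewPersistentVolumeOrderedIndex` implementation.

```python
from bisect import insort
from collections import defaultdict


class OrderedVolumeIndex:
    """Hypothetical index: volumes grouped by access modes, sorted by capacity."""

    def __init__(self):
        # frozenset of access modes -> sorted list of (capacity_bytes, name)
        self._by_modes = defaultdict(list)

    def add(self, name, access_modes, capacity_bytes):
        # insort keeps each per-mode list ordered by capacity (then by name).
        insort(self._by_modes[frozenset(access_modes)], (capacity_bytes, name))

    def smallest_fit(self, access_modes, requested_bytes):
        """Return the smallest volume with exactly these modes that is large
        enough for the request, or None if there is none."""
        for capacity, name in self._by_modes.get(frozenset(access_modes), []):
            if capacity >= requested_bytes:
                return name
        return None
```

Because each list is kept sorted, the first volume that is large enough is also the smallest one that satisfies the request.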
##### PersistentVolume API

| Action | HTTP Verb | Path | Description |
| ---- | ---- | ---- | ---- |
| CREATE | POST | /api/{version}/persistentvolumes/ | Create instance of PersistentVolume |
| GET | GET | /api/{version}/persistentvolumes/{name} | Get instance of PersistentVolume with {name} |
| UPDATE | PUT | /api/{version}/persistentvolumes/{name} | Update instance of PersistentVolume with {name} |
| DELETE | DELETE | /api/{version}/persistentvolumes/{name} | Delete instance of PersistentVolume with {name} |
| LIST | GET | /api/{version}/persistentvolumes | List instances of PersistentVolume |
| WATCH | GET | /api/{version}/watch/persistentvolumes | Watch for changes to a PersistentVolume |

#### Request Storage

Kubernetes users request persistent storage for their pod by creating a
```PersistentVolumeClaim```. Their request for storage is described by their
requirements for resources and mount capabilities.

Requests for volumes are bound to available volumes by the volume manager, if a
suitable match is found. Requests for resources can go unfulfilled.

Users attach their claim to their pod using a new
```PersistentVolumeClaimVolumeSource``` volume source.

##### PersistentVolumeClaim API

| Action | HTTP Verb | Path | Description |
| ---- | ---- | ---- | ---- |
| CREATE | POST | /api/{version}/namespaces/{ns}/persistentvolumeclaims/ | Create instance of PersistentVolumeClaim in namespace {ns} |
| GET | GET | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Get instance of PersistentVolumeClaim in namespace {ns} with {name} |
| UPDATE | PUT | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Update instance of PersistentVolumeClaim in namespace {ns} with {name} |
| DELETE | DELETE | /api/{version}/namespaces/{ns}/persistentvolumeclaims/{name} | Delete instance of PersistentVolumeClaim in namespace {ns} with {name} |
| LIST | GET | /api/{version}/namespaces/{ns}/persistentvolumeclaims | List instances of PersistentVolumeClaim in namespace {ns} |
| WATCH | GET | /api/{version}/watch/namespaces/{ns}/persistentvolumeclaims | Watch for changes to PersistentVolumeClaim in namespace {ns} |

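The path layout used by the PersistentVolume and PersistentVolumeClaim tables follows one pattern: an optional `watch` segment, an optional namespace, the resource, and an optional name. The helper below is a hypothetical sketch of that pattern (the function name and parameters are illustrative and not part of any client library).

```python
def resource_path(version, resource, name=None, namespace=None, watch=False):
    """Build a REST path following the layout of the PV and PVC tables."""
    parts = ["api", version]
    if watch:
        parts.append("watch")  # watch paths insert a segment after the version
    if namespace is not None:
        parts += ["namespaces", namespace]  # PVCs are namespaced, PVs are not
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/" + "/".join(parts)


# Cluster-scoped PV and a namespaced PVC watch:
print(resource_path("v1", "persistentvolumes", name="pv0001"))
print(resource_path("v1", "persistentvolumeclaims", namespace="default", watch=True))
```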
#### Scheduling constraints

Scheduling constraints are to be handled similarly to pod resource constraints.
Pods will need to be annotated or decorated with the number of resources they
require on a node. Similarly, a node will need to list how many it has used or
has available.

TBD

#### Events

The implementation of persistent storage will not require events to communicate
to the user the state of their claim. The CLI output for bound claims contains a
reference to the backing persistent volume. This reference is always present in
the API and CLI, which makes an event communicating the same information
unnecessary.

Events that communicate the state of a mounted volume are left to the volume
plugins.

### Example

#### Admin provisions storage

An administrator provisions storage by posting PVs to the API. Various ways to
automate this task can be scripted. Dynamic provisioning is a future feature
that can maintain levels of PVs.

```yaml
POST:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv0001
spec:
  capacity:
    storage: 10Gi
  persistentDisk:
    pdName: "abc123"
    fsType: "ext4"
```

```console
$ kubectl get pv

NAME      LABELS    CAPACITY      ACCESSMODES   STATUS    CLAIM     REASON
pv0001    map[]     10737418240   RWO           Pending
```

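The CAPACITY column above reports 10737418240, which is 10 GiB expressed in bytes (10 × 2^30). A minimal sketch of that conversion follows. It assumes binary suffixes such as `Gi` and treats an unsuffixed value as bytes; `quantity_to_bytes` is a hypothetical helper and handles only a small subset of quantity notation.

```python
# Binary (power-of-two) suffixes; a simplified, assumed subset.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity):
    """Convert a string such as '10Gi' to bytes; a bare integer is bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


print(quantity_to_bytes("10Gi"))  # 10737418240
```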
#### Users request storage

A user requests storage by posting a PVC to the API. Their request contains the
AccessModes they wish their volume to have and the minimum size needed.

The user must be within a namespace to create PVCs.

```yaml
POST:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```

```console
$ kubectl get pvc

NAME        LABELS    STATUS    VOLUME
myclaim-1   map[]     pending
```

#### Matching and binding

The ```PersistentVolumeClaimBinder``` attempts to find an available volume that
most closely matches the user's request. If one exists, the claim and the volume
are bound by putting a reference on the PV to the PVC. Requests can go
unfulfilled if a suitable match is not found.

```console
$ kubectl get pv

NAME      LABELS    CAPACITY      ACCESSMODES   STATUS    CLAIM                                              REASON
pv0001    map[]     10737418240   RWO           Bound     myclaim-1 / f4b3d283-c0ef-11e4-8be4-80e6500a981e

$ kubectl get pvc

NAME        LABELS    STATUS    VOLUME
myclaim-1   map[]     Bound     b16e91d6-c0ef-11e4-8be4-80e6500a981e
```

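The binding step shown in the output above can be sketched as a small state change: the PV receives a reference to the claim and both objects move to `Bound`. The `Claim` and `Volume` classes and the `bind` function are hypothetical simplifications for illustration, not the binder's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    namespace: str
    name: str
    uid: str
    status: str = "Pending"


@dataclass
class Volume:
    name: str
    status: str = "Pending"
    claim_ref: Optional[Claim] = None  # the reference placed on the PV


def bind(volume: Volume, claim: Claim) -> None:
    """Bind a claim to a volume by recording the claim on the volume."""
    if volume.claim_ref is not None:
        raise ValueError(f"{volume.name} is already bound")
    volume.claim_ref = claim
    volume.status = "Bound"
    claim.status = "Bound"
```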
A claim must request access modes and storage capacity. This is because internally PVs are
indexed by their `AccessModes`, and target PVs are, to some degree, sorted by their capacity.
A claim may request one or more of the following attributes to better match a PV: volume name, selectors,
and volume class (currently implemented as an annotation).

A PV may define a `ClaimRef` which can greatly influence (but does not absolutely guarantee) which
PVC it will match.
A PV may also define labels, annotations, and a volume class (currently implemented as an
annotation) to better target PVCs.

As of Kubernetes version 1.4, the following algorithm describes in more detail how a claim is
matched to a PV:

1. Only PVs with `accessModes` equal to or greater than the claim's requested `accessModes` are considered.
   "Greater" here means that the PV has defined more modes than needed by the claim, but it also defines
   the mode requested by the claim.

1. The potential PVs above are considered in order of the closest access mode match, with the best case
   being an exact match, and a worse case being more modes than requested by the claim.

1. Each PV above is processed. If the PV has a `claimRef` matching the claim, *and* the PV's capacity
   is not less than the storage being requested by the claim then this PV will bind to the claim. Done.

1. Otherwise, if the PV has the "volume.alpha.kubernetes.io/storage-class" annotation defined then it is
   skipped and will be handled by Dynamic Provisioning.

1. Otherwise, if the PV has a `claimRef` defined, which can specify a different claim or simply be a
   placeholder, then the PV is skipped.

1. Otherwise, if the claim is using a selector but it does *not* match the PV's labels (if any) then the
   PV is skipped. But, even if a claim has selectors which match a PV that does not guarantee a match
   since capacities may differ.

1. Otherwise, if the PV's "volume.beta.kubernetes.io/storage-class" annotation (which is a placeholder
   for a volume class) does *not* match the claim's annotation (same placeholder) then the PV is skipped.
   If the annotations for the PV and PVC are empty they are treated as being equal.

1. Otherwise, what remains is a list of PVs that may match the claim. Within this list of remaining PVs,
   the PV with the smallest capacity that is also equal to or greater than the claim's requested storage
   is the matching PV and will be bound to the claim. Done. In the case of two or more PVs matching all
   of the above criteria, the first PV (remember the PV order is based on `accessModes`) is the winner.

*Note:* if no PV matches the claim and the claim defines a `StorageClass` (or a default
`StorageClass` has been defined) then a volume will be dynamically provisioned.

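The matching steps above can be sketched in a few lines. This is a simplified illustration and not the binder's code: PVs and claims are plain dicts, `claimRef` is reduced to a claim name, capacities are integers, and `find_matching_pv` is a hypothetical function. Dynamic provisioning from the note is out of scope, so the sketch simply returns `None` when nothing matches.

```python
ALPHA_CLASS = "volume.alpha.kubernetes.io/storage-class"
BETA_CLASS = "volume.beta.kubernetes.io/storage-class"


def find_matching_pv(pvs, claim):
    """Return the PV dict chosen for the claim, or None if nothing matches."""
    wanted = set(claim["accessModes"])
    # Steps 1-2: keep PVs offering at least the requested modes,
    # closest match (fewest extra modes) first. The sort is stable.
    candidates = [pv for pv in pvs if wanted <= set(pv["accessModes"])]
    candidates.sort(key=lambda pv: len(set(pv["accessModes"]) - wanted))

    remaining = []
    for pv in candidates:
        ref = pv.get("claimRef")
        annotations = pv.get("annotations", {})
        # Step 3: a PV pre-bound to this claim wins if it is large enough.
        if ref == claim["name"] and pv["capacity"] >= claim["request"]:
            return pv
        # Step 4: alpha-annotated PVs are left to dynamic provisioning.
        if ALPHA_CLASS in annotations:
            continue
        # Step 5: any other claimRef (different claim or placeholder) is skipped.
        if ref is not None:
            continue
        # Step 6: a selector, if present, must match the PV's labels.
        selector = claim.get("selector") or {}
        labels = pv.get("labels", {})
        if any(labels.get(key) != value for key, value in selector.items()):
            continue
        # Step 7: class annotations must agree; both empty counts as equal.
        claim_class = claim.get("annotations", {}).get(BETA_CLASS, "")
        if annotations.get(BETA_CLASS, "") != claim_class:
            continue
        remaining.append(pv)

    # Step 8: smallest capacity that satisfies the request; on a tie,
    # min() keeps the first PV in access-mode order.
    fitting = [pv for pv in remaining if pv["capacity"] >= claim["request"]]
    return min(fitting, key=lambda pv: pv["capacity"], default=None)
```

Sorting by the number of extra access modes is one way to express "closest access mode match"; the actual ordering used by the binder may differ.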
#### Claim usage

The claim holder can use their claim as a volume. The ```PersistentVolumeClaimVolumeSource``` knows to fetch the PV backing the claim
and mount its volume for a pod.

The claim holder owns the claim and its data for as long as the claim exists.
The pod using the claim can be deleted, but the claim remains in the user's
namespace. It can be used again and again by many pods.

```yaml
POST:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - image: nginx
      name: myfrontend
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      source:
        persistentVolumeClaim:
          accessMode: ReadWriteOnce
          claimRef:
            name: myclaim-1
```

#### Releasing a claim and Recycling a volume

When a claim holder is finished with their data, they can delete their claim.

```console
$ kubectl delete pvc myclaim-1
```

The ```PersistentVolumeClaimBinder``` will reconcile this by removing the claim
reference from the PV and changing the PV's status to 'Released'.

Admins can script the recycling of released volumes. Future dynamic provisioners
will understand how a volume should be recycled.

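The reconciliation described above amounts to clearing the claim reference and updating the status. A minimal sketch, assuming the PV is represented as a plain dict with `claimRef` and `status` keys (`release` is a hypothetical helper, not part of the binder):

```python
def release(volume):
    """Return a copy of the volume with its claim reference removed
    and its status set to 'Released'."""
    if volume.get("claimRef") is None:
        raise ValueError("volume is not bound to a claim")
    return dict(volume, claimRef=None, status="Released")


bound = {"name": "pv0001", "status": "Bound", "claimRef": "myclaim-1"}
print(release(bound)["status"])  # Released
```

Returning a copy leaves the original dict untouched, which keeps the before and after states easy to compare in a recycling script.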