Redesign mount propagation

Jan Safranek 2017-08-01 12:46:13 +02:00
parent 4b82fb1ac0
commit baf0dcb0bd
1 changed file with 47 additions and 8 deletions


@@ -65,6 +65,12 @@ and references to network namespaces persist.
 The new `VolumeMount` will look like:
 ```go
+const (
+	PropagationRShared PropagationMode = "RShared"
+	PropagationRSlave PropagationMode = "RSlave"
+	PropagationPrivate PropagationMode = "Private"
+)
 type VolumeMount struct {
 	// Required: This must match the Name of a Volume [above].
 	Name string `json:"name"`
@@ -73,13 +79,18 @@ type VolumeMount struct {
 	// Required.
 	MountPath string `json:"mountPath"`
 	// Optional.
-	Propagation string `json:"propagation"`
+	Propagation PropagationMode `json:"propagation,omitempty"`
 }
 ```
+Default would be `RSlave`, which should not break backward compatibility;
+`RShared` must be explicitly requested.
 Opinion against this:
-1. This will affect all volumes, while only HostPath need this.
+1. This will affect all volumes, while only HostPath need this. It could be
+   checked during validation and any non-HostPath volumes with non-default
+   propagation could be rejected.
 1. This needs an API change, which is discouraged.
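The validation idea in the first objection can be sketched roughly as follows. This is a minimal illustration, not the Kubernetes implementation: the `Volume` type with its `IsHostPath` flag and the helpers `effectivePropagation` and `validatePropagation` are hypothetical names introduced here. Only `PropagationMode`, its constants, and the `VolumeMount` fields come from the proposal, and the empty-value default of `RSlave` follows the text above.

```go
package main

import "fmt"

// PropagationMode and its constants mirror the proposal.
type PropagationMode string

const (
	PropagationRShared PropagationMode = "RShared"
	PropagationRSlave  PropagationMode = "RSlave"
	PropagationPrivate PropagationMode = "Private"
)

// VolumeMount is trimmed to the fields shown in the proposal.
type VolumeMount struct {
	Name        string
	MountPath   string
	Propagation PropagationMode
}

// Volume is a hypothetical stand-in for the real API type; IsHostPath is an
// assumption made for this sketch.
type Volume struct {
	Name       string
	IsHostPath bool
}

// effectivePropagation applies the proposed default: an unset field means RSlave.
func effectivePropagation(m VolumeMount) PropagationMode {
	if m.Propagation == "" {
		return PropagationRSlave
	}
	return m.Propagation
}

// validatePropagation rejects unknown modes and any non-default propagation
// requested on a volume that is not a HostPath volume.
func validatePropagation(m VolumeMount, volumes []Volume) error {
	mode := effectivePropagation(m)
	switch mode {
	case PropagationRShared, PropagationRSlave, PropagationPrivate:
		// Known mode, continue with the volume check.
	default:
		return fmt.Errorf("mount %q: unknown propagation mode %q", m.Name, mode)
	}
	for _, v := range volumes {
		if v.Name != m.Name {
			continue
		}
		if !v.IsHostPath && mode != PropagationRSlave {
			return fmt.Errorf("mount %q: propagation %q is only allowed on HostPath volumes", m.Name, mode)
		}
		return nil
	}
	return fmt.Errorf("mount %q: no volume with that name", m.Name)
}

func main() {
	volumes := []Volume{
		{Name: "host-dev", IsHostPath: true},
		{Name: "scratch", IsHostPath: false},
	}
	mounts := []VolumeMount{
		{Name: "host-dev", MountPath: "/dev", Propagation: PropagationRShared},
		{Name: "scratch", MountPath: "/tmp/scratch", Propagation: PropagationRShared},
		{Name: "scratch", MountPath: "/tmp/scratch"},
	}
	for _, m := range mounts {
		fmt.Printf("%s (%s): %v\n", m.Name, effectivePropagation(m), validatePropagation(m, volumes))
	}
}
```

Checking the effective mode rather than the raw field means an omitted `propagation` passes validation on every volume type, which is what keeps existing pod specs valid under this sketch.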
@@ -89,8 +100,8 @@ The new `HostPathVolumeSource` will look like:
 ```go
 const (
-	PropagationShared PropagationMode = "Shared"
-	PropagationSlave PropagationMode = "Slave"
+	PropagationRShared PropagationMode = "RShared"
+	PropagationRSlave PropagationMode = "RSlave"
 	PropagationPrivate PropagationMode = "Private"
 )
@@ -142,13 +153,41 @@ distros.
 1. (From @euank) Changing those mountflags may make docker even less stable;
    this may lock up the kernel accidentally or potentially leak mounts.
+1. (From @jsafrane) A typical container that needs to mount something needs to
+   see the host's `/dev` and `/sys` as HostPath volumes. This would make them
+   shared without any way to opt out. Docker creates a new `/dev/shm` in the
+   container, which gets propagated to the host, shadowing the host's `/dev/shm`.
+   Similarly, systemd running in a container is very picky about `/sys/fs/cgroup`
+   and something prevents it from starting if `/sys` is shared.
 ## Decision
-We will take 'Make HostPath shared for privileged containers, slave for
-non-privileged', an environment check and an WARNING log will be emitted about
-whether propagation mode is supported.
+* We will take 'Add an option in VolumeMount API' (with an annotation during
+  alpha instead of a real VolumeMount field):
+  * With validation that it can be used only with HostPath volumes.
+  * With validation that shared propagation can be used only in privileged
+    containers.
+  * kubernetes/kubernetes#46444
+* Kubelet will make sure that at least `/var/lib/kubelet` can be shared into
+  containers, and it will refuse to start if that fails.
+  * kubernetes/kubernetes#45724
+* Kubelet's Docker shim layer will check during startup that it is able to run
+  a container with shared mount propagation on `/var/lib/kubelet`, and refuse
+  to start otherwise. This ensures that both Docker and kubelet see the same
+  `/var/lib/kubelet` and that it can be shared into containers.
+  E.g. Google COS-58 runs Docker in a separate mount namespace with slave
+  propagation and thus can't run a container with shared propagation on
+  anything. Other container engines should follow suit.
+* Node conformance suite will check that mount propagation in
+  `/var/lib/kubelet` works.
+* During alpha, all the behavior above must be explicitly enabled by
+  `kubelet --experimental-enable-mount-propagation`.
+  The flag will be used only for testing of volume plugins in e2e tests and
+  will be marked as deprecated from the beginning.
+  Developers / testers can enable it in their clusters manually.
+  Mount propagation may be redesigned or even removed in any future release.
+* The default mount propagation will be `rslave`, which differs from the
+  current `private`. Extensive testing is needed!
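The startup checks in the decision depend on knowing how a mount point such as `/var/lib/kubelet` propagates. On Linux this is visible in the optional fields of `/proc/self/mountinfo` (`shared:N`, `master:N`). The sketch below shows one way to classify a single mountinfo line. It is not kubelet's actual check: `propagationOf` is a hypothetical helper name, and the sample lines are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// propagationOf (hypothetical helper) returns the mount point and a coarse
// propagation type for one mountinfo line: "shared", "slave", or "private".
// A mount carrying both shared: and master: tags is reported as "shared",
// because this sketch only cares whether mounts propagate out of it.
func propagationOf(line string) (mountPoint, propagation string, err error) {
	fields := strings.Fields(line)
	// Fixed fields: mount ID, parent ID, major:minor, root, mount point,
	// mount options. Optional fields follow, terminated by a lone "-".
	if len(fields) < 7 {
		return "", "", fmt.Errorf("malformed mountinfo line: %q", line)
	}
	mountPoint = fields[4]
	propagation = "private"
	for _, f := range fields[6:] {
		if f == "-" {
			return mountPoint, propagation, nil
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			propagation = "shared"
		case strings.HasPrefix(f, "master:") && propagation != "shared":
			propagation = "slave"
		}
	}
	return "", "", fmt.Errorf("missing separator in mountinfo line: %q", line)
}

func main() {
	// Made-up sample lines in mountinfo format, not taken from a real host.
	lines := []string{
		"36 35 98:0 / /var/lib/kubelet rw,relatime shared:1 - ext4 /dev/sda1 rw",
		"37 35 98:0 / /mnt rw,relatime master:1 - ext4 /dev/sda1 rw",
		"38 35 0:40 / /tmp rw,nosuid - tmpfs tmpfs rw",
	}
	for _, l := range lines {
		mp, prop, err := propagationOf(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s is %s\n", mp, prop)
	}
}
```

A check built on this would read `/proc/self/mountinfo`, find the entry for the directory in question, and refuse to continue unless it is reported as shared. Whether kubelet inspects mountinfo or probes by actually mounting is left open by the text above.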
 ## Extra Concerns