Require shared PID namespace in CRI & plan rollout
Parent: d3b09aa70d
Commit: d4789e1112

@@ -86,7 +86,7 @@ container setup that are not currently trackable as Pod constraints, e.g.,
 filesystem setup, container image pulling, etc.*

 A container in a PodSandbox maps to an application in the Pod Spec. For Linux
-containers, they are expected to share at least network and IPC namespaces,
+containers, they are expected to share at least network, PID and IPC namespaces,
 with sharing more namespaces discussed in [#1615](https://issues.k8s.io/1615).

@@ -1,70 +0,0 @@
# Shared PID Namespace for the Docker Runtime

Pods share many namespaces, but the ability to share a PID namespace was not
supported by Docker until version 1.12. SIG Node approved a change to the
default behavior contingent on a brief rollout plan, which is this document.
Please refer to [#1615](https://issues.k8s.io/1615) for full technical details.

## Motivation

Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615),
and enables:

1. signaling between containers, which is useful for side cars (e.g. for
   signaling a daemon process after rotating logs).
2. easier troubleshooting of pods.
3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
   infra container.

## Goals and Non-Goals

Goals include:
- Changing default behavior in the Kubernetes Docker runtime

Non-goals include:
- Creating an init solution that works for all runtimes
- Supporting isolated PID namespace indefinitely
- Addressing the larger issue of requiring shared namespaces in all runtimes

Kubernetes does not currently specify how runtimes must support a PID namespace,
but many runtimes (e.g. cri-o & rkt) already support a shared namespace. This
rolls out support for Docker.

## Rollout Plan

Sharing the PID namespace changes an implicit behavior of the Docker runtime
whereby the command run by the container image is always PID 1. This is a side
effect of isolated namespaces rather than intentional behavior, but users may
have built upon this assumption so we should change the default behavior over
the course of multiple releases. (The following release numbers are earliest
possible releases and may change based on implementation and community
feedback.)

1. Release 1.6: Enable the shared PID namespace for pods annotated with
   `docker.kubernetes.io/shared-pid: true` (i.e. opt-in) when running with
   Docker >= 1.12. Pods with this annotation will fail to start with older
   Docker versions rather than failing to meet a user's expectation.
2. Release 1.7: Enable the shared PID namespace for pods unless annotated
   with `docker.kubernetes.io/shared-pid: false` (i.e. opt-out) when running
   with Docker >= 1.12.
3. Release 1.8: Remove the annotation. All pods receive a shared PID
   namespace when running with Docker >= 1.12.
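
The three phases above amount to a small decision table. The following is a minimal sketch of that table only, not kubelet code: the function name, the tuple encoding of releases, and the `docker_supports_shared_pid` flag are assumptions made for illustration, and the release numbers are the tentative ones listed above.

```python
ANNOTATION = "docker.kubernetes.io/shared-pid"


def shared_pid_enabled(release, annotations, docker_supports_shared_pid):
    """Decide whether a pod gets a shared PID namespace.

    release: (major, minor) tuple, e.g. (1, 6). Hypothetical encoding.
    annotations: dict of pod annotations.
    docker_supports_shared_pid: True when Docker >= 1.12 is in use.
    """
    value = annotations.get(ANNOTATION)

    if release < (1, 6):
        return False  # before the rollout starts

    if release == (1, 6):
        # Opt-in phase: only annotated pods share a PID namespace.
        if value != "true":
            return False
        if not docker_supports_shared_pid:
            # The pod fails to start rather than silently running isolated.
            raise RuntimeError("shared PID namespace requires Docker >= 1.12")
        return True

    if release == (1, 7):
        # Opt-out phase: shared unless explicitly disabled.
        return docker_supports_shared_pid and value != "false"

    # 1.8 and later: the annotation is ignored.
    return docker_supports_shared_pid
```

Raising an error in the opt-in phase mirrors the stated intent that an annotated pod should fail to start on older Docker versions.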

With each step we will add a release note that clearly describes the change.
After each release we will poll kubernetes-users to determine what, if any,
applications were impacted by this change. If we discover a use case which
cannot be accommodated by a shared PID namespace, we will abort step 3 and
instead formalize a shared-pid field into the pod spec.

## Alternatives Considered

Changing this behavior over the course of 6 months is a bit conservative. We
could instead change the behavior in 2 releases by omitting the first step, but
the opt-in phase allows users to test the change with fewer surprises.

[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/

@@ -0,0 +1,78 @@
# Shared PID Namespace

Pods share namespaces where possible, but a requirement for sharing the PID
namespace has not been defined due to lack of support in Docker. Docker began
supporting a shared PID namespace in 1.12, and other Kubernetes runtimes (rkt,
cri-o, hyper) have already implemented a shared PID namespace.

This proposal defines a shared PID namespace as a requirement of the Container
Runtime Interface and links its rollout in Docker to that of the CRI.

## Motivation

Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615),
and enables:

1. signaling between containers, which is useful for side cars (e.g. for
   signaling a daemon process after rotating logs).
2. easier troubleshooting of pods.
3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
   infra container.
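
As an illustration of the first item, a log-rotation sidecar in a shared PID namespace could locate the daemon through `/proc` and signal it. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the daemon is assumed to reload on `SIGHUP`, and the sidecar is assumed to have permission to signal it.

```python
import os
import signal


def find_pids_by_name(name, proc_root="/proc"):
    """Return sorted PIDs whose command name (read from <pid>/comm) equals name.

    Note: the kernel truncates comm to 15 characters.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def signal_daemon(name, sig=signal.SIGHUP, proc_root="/proc"):
    """Send sig to every process named name and return the PIDs signaled."""
    pids = find_pids_by_name(name, proc_root)
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

With isolated PID namespaces the sidecar's `/proc` lists only its own processes, so the lookup would find nothing. That is the limitation a shared namespace removes.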

## Goals and Non-Goals

Goals include:
- Changing default behavior in the Docker runtime as implemented by the CRI
- Making Docker behavior compatible with the other Kubernetes runtimes

Non-goals include:
- Creating an init solution that works for all runtimes
- Supporting isolated PID namespace indefinitely

## Modification to the Docker Runtime

We will modify the Docker implementation of the CRI to use a shared PID
namespace when running with a version of Docker >= 1.12. The legacy
`dockertools` implementation will not be changed.
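
The version condition can be sketched as a numeric comparison on the major and minor components. This is an illustrative sketch, not the CRI implementation: the function names are hypothetical and the version string is assumed to start with `major.minor`.

```python
import re

# Minimum Docker version with shared PID namespace support, per this proposal.
MIN_SHARED_PID_VERSION = (1, 12)


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.12.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Docker version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def use_shared_pid_namespace(docker_version):
    """Return True when the Docker version is at least 1.12."""
    return parse_major_minor(docker_version) >= MIN_SHARED_PID_VERSION
```

Comparing integer tuples rather than strings matters here: as strings, `"1.9"` sorts after `"1.12"`, which would wrongly enable the feature on Docker 1.9.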

Linking this change to the CRI means that Kubernetes users who care to test such
changes can test the combined changes at once. Users who do not care to test
such changes will be insulated by Kubernetes not recommending Docker >= 1.12
until after switching to the CRI.

Other changes that must be made to support this change:

1. Ensure all containers restart if the infra container responsible for the
   PodSandbox dies. (Note: With Docker 1.12, if the source of the PID namespace
   dies, all containers sharing that namespace are killed as well.)
2. Modify the infra container used by the Docker runtime to reap orphaned
   zombies ([#36853](https://pr.k8s.io/36853)).
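
For the second item, the core of zombie reaping is a non-blocking `waitpid` loop. The sketch below shows that technique in general terms and is not the change made in #36853; the function name is hypothetical. It assumes the caller is PID 1 of the shared namespace, so that orphaned processes are reparented to it and become its children.

```python
import os


def reap_zombies():
    """Reap every child that has already exited, without blocking.

    Returns the list of PIDs reaped in this pass.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

An init-style process would typically call such a function each time it receives `SIGCHLD`, looping because several children may exit before one signal is delivered.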

## Rollout Plan

SIG Node is planning to switch to the CRI as a default in 1.6, at which point
users with Docker >= 1.12 will be able to test shared PID namespaces. Switching
back to isolated PID namespaces will require disabling the CRI.

At some point, say 1.7, SIG Node will remove support for disabling the CRI.
After this point users must roll back to a previous version of Kubernetes or
Docker to achieve PID namespace isolation. This is acceptable because:

* No one has been able to identify a concrete use case requiring isolated PID
  namespaces.
* The lack of use cases means we can't justify the complexity required to make
  PID namespace type configurable.
* Users will already be looking for issues due to the major version upgrade and
  prepared for a rollback to the previous release.

Alternatively, we could create a flag in the kubelet to disable shared PID
namespace, but this wouldn't be especially useful to users of a hosted
Kubernetes cluster.

[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/