Require shared PID namespace in CRI & plan rollout

2017-01-18 17:27:53 -08:00 · 2017-01-18 17:27:53 -08:00 · d4789e1112
parent d3b09aa70d
commit d4789e1112
3 changed files with 79 additions and 71 deletions
--- a/contributors/design-proposals/container-runtime-interface-v1.md
+++ b/contributors/design-proposals/container-runtime-interface-v1.md
@@ -86,7 +86,7 @@ container setup that are not currently trackable as Pod constraints, e.g.,
 filesystem setup, container image pulling, etc.*

 A container in a PodSandbox maps to an application in the Pod Spec. For Linux
-containers, they are expected to share at least network and IPC namespaces,
+containers, they are expected to share at least network, PID and IPC namespaces,
 with sharing more namespaces discussed in [#1615](https://issues.k8s.io/1615).


--- a/contributors/design-proposals/pod-pid-namespace-docker.md
+++ b/contributors/design-proposals/pod-pid-namespace-docker.md
@@ -1,70 +0,0 @@
-# Shared PID Namespace for the Docker Runtime
-
-Pods share many namespaces, but the ability to share a PID namespace was not
-supported by Docker until version 1.12. SIG Node approved a change to the
-default behavior contingent on a brief rollout plan, which is this document.
-Please refer to [#1615](https://issues.k8s.io/1615) for full technical details.
-
-## Motivation
-
-Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615),
-and enables:
-
-  1. signaling between containers, which is useful for side cars (e.g. for
-     signaling a daemon process after rotating logs).
-  2. easier troubleshooting of pods.
-  3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
-     infra container.
-
-## Goals and Non-Goals
-
-Goals include:
-  - Changing default behavior in the Kubernetes Docker runtime
-
-Non-goals include:
-  - Creating an init solution that works for all runtimes
-  - Supporting isolated PID namespace indefinitely
-  - Addressing the larger issue of requiring shared namespaces in all runtimes
-
-Kubernetes does not currently specify how runtimes must support a PID namespace,
-but many runtimes (e.g. cri-o & rkt) already support a shared namespace. This
-rolls out support for Docker.
-
-## Rollout Plan
-
-Sharing the PID namespace changes an implicit behavior of the Docker runtime
-whereby the command run by the container image is always PID 1. This is a side
-effect of isolated namespaces rather than intentional behavior, but users may
-have built upon this assumption so we should change the default behavior over
-the course of multiple releases. (The following release numbers are earliest
-possible releases and may change based on implementation and community
-feedback.)
-
-  1. Release 1.6: Enable the shared PID namespace for pods annotated with
-     `docker.kubernetes.io/shared-pid: true` (i.e. opt-in) when running with
-     Docker >= 1.12. Pods with this annotation will fail to start with older
-     Docker versions rather than failing to meet a user's expectation.
-  2. Release 1.7: Enable the shared PID namespace for pods unless annotated
-     with `docker.kubernetes.io/shared-pid: false` (i.e. opt-out) when running
-     with Docker >= 1.12.
-  3. Release 1.8: Remove the annotation. All pods receive a shared PID
-     namespace when running with Docker >= 1.12.
-
-With each step we will add a release note that clearly describes the change.
-After each release we will poll kubernetes-users to determine what, if any,
-applications were impacted by this change. If we discover a use case which
-cannot be accommodated by a shared PID namespace, we will abort step 3 and
-instead formalize a shared-pid field into the pod spec.
-
-## Alternatives Considered
-
-Changing this behavior over the course of 6 months is a bit conservative. We
-could instead change the behavior in 2 releases by omitting the first step, but
-the opt-in phase allows users to test the change with fewer surprises.
-
-[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
-
-
-<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
-[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-pid-namespace.md?pixel)]()
-<!-- END MUNGE: GENERATED_ANALYTICS -->
--- a/contributors/design-proposals/pod-pid-namespace.md
+++ b/contributors/design-proposals/pod-pid-namespace.md
@@ -0,0 +1,78 @@
+# Shared PID Namespace
+
+Pods share namespaces where possible, but a requirement for sharing the PID
+namespace has not been defined due to lack of support in Docker. Docker began
+supporting a shared PID namespace in 1.12, and other Kubernetes runtimes (rkt,
+cri-o, hyper) have already implemented a shared PID namespace.
+
+This proposal defines a shared PID namespace as a requirement of the Container
+Runtime Interface and links its rollout in Docker to that of the CRI.
+
+## Motivation
+
+Sharing a PID namespace is discussed in [#1615](https://issues.k8s.io/1615),
+and enables:
+
+  1. signaling between containers, which is useful for side cars (e.g. for
+     signaling a daemon process after rotating logs).
+  2. easier troubleshooting of pods.
+  3. addressing [Docker's zombie problem][1] by reaping orphaned zombies in the
+     infra container.
+
+## Goals and Non-Goals
+
+Goals include:
+  - Changing default behavior in the Docker runtime as implemented by the CRI
+  - Making Docker behavior compatible with the other Kubernetes runtimes
+
+Non-goals include:
+  - Creating an init solution that works for all runtimes
+  - Supporting isolated PID namespace indefinitely
+
+## Modification to the Docker Runtime
+
+We will modify the Docker implementation of the CRI to use a shared PID
+namespace when running with a version of Docker >= 1.12. The legacy
+`dockertools` implementation will not be changed.
+
+Linking this change to the CRI means that Kubernetes users who care to test such
+changes can test the combined changes at once. Users who do not care to test
+such changes will be insulated by Kubernetes not recommending Docker >= 1.12
+until after switching to the CRI.
+
+Other changes that must be made to support this change:
+
+1. Ensure all containers restart if the infra container responsible for the
+   PodSandbox dies. (Note: With Docker 1.12, if the source of the PID namespace
+   dies, all containers sharing that namespace are killed as well.)
+2. Modify the Infra container used by the Docker runtime to reap orphaned
+   zombies ([#36853](https://pr.k8s.io/36853)).
+
+## Rollout Plan
+
+SIG Node is planning to switch to the CRI as a default in 1.6, at which point
+users with Docker >= 1.12 will be able to test shared PID namespaces. Switching
+back to isolated PID namespaces will require disabling the CRI.
+
+At some point, say 1.7, SIG Node will remove support for disabling the CRI.
+After this point users must roll back to a previous version of Kubernetes or
+Docker to achieve PID namespace isolation. This is acceptable because:
+
+* No one has been able to identify a concrete use case requiring isolated PID
+  namespaces.
+* The lack of use cases means we can't justify the complexity required to make
+  the PID namespace type configurable.
+* Users will already be looking for issues due to the major version upgrade and
+  will be prepared for a rollback to the previous release.
+
+Alternatively, we could create a flag in the kubelet to disable shared PID
+namespace, but this wouldn't be especially useful to users of a hosted
+Kubernetes cluster.
+
+
+[1]: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
+
+
+<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
+[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/proposals/pod-pid-namespace.md?pixel)]()
+<!-- END MUNGE: GENERATED_ANALYTICS -->
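
Reviewer note on "Modification to the Docker Runtime" in `pod-pid-namespace.md`: the version gate described there (share the PID namespace only when Docker >= 1.12) might look roughly like the sketch below. This is an illustration, not the actual CRI code. The function names and the constant are hypothetical, and the sketch assumes the sandbox (infra) container ID is already known. The `container:<id>` value follows Docker's `--pid=container:<id>` syntax; an empty string is used here to mean "leave Docker's default isolated PID namespace in place".

```python
import re

# Hypothetical constant: first Docker release the proposal names as
# supporting a shared PID namespace.
MIN_SHARED_PID_VERSION = (1, 12)


def parse_docker_version(version):
    """Return (major, minor) from a string such as '1.12.6' or '1.13.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized Docker version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def pid_mode_for_container(docker_version, sandbox_container_id):
    """Pick the PID mode for an application container in a pod.

    Hypothetical helper: joins the sandbox container's PID namespace when
    Docker is new enough, otherwise returns '' so the container keeps its
    own isolated PID namespace.
    """
    if parse_docker_version(docker_version) >= MIN_SHARED_PID_VERSION:
        return "container:" + sandbox_container_id
    return ""
```

For example, `pid_mode_for_container("1.12.6", "abc123")` yields `"container:abc123"`, while the same call with `"1.11.2"` yields an empty string, which matches the proposal's statement that older Docker versions keep isolated PID namespaces.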