19 KiB
Pod Preset
Abstract
Describes a policy resource that allows for the loose coupling of a Pod's definition from additional runtime requirements for that Pod. For example, mounting of Secrets, or setting additional environment variables, may not be known at Pod deployment time, but may be required at Pod creation time.
Motivation
Consuming a service involves more than just connectivity. In addition to coordinates to reach the service, credentials and non-secret configuration parameters are typically needed to use the service. The primitives for this already exist, but a gap exists where loose coupling is desired: it should be possible to inject pods with the information they need to use a service on a service-by-service basis, without the pod authors having to incorporate the information into every pod spec where it is needed.
Constraints and Assumptions
- Future work might require new mechanisms to be made to work with existing controllers such as deployments and replicasets that create pods. Existing controllers that create pods should recreate their pods when a new Pod Injection Policy is added that would effect them.
Use Cases
- As a user, I want to be able to provision a new pod without needing to know the application configuration primitives the services my pod will consume.
- As a cluster admin, I want specific configuration items of a service to be withheld visibly from a developer deploying a service, but not to block the developer from shipping.
- As an app developer, I want to provision a Cloud Spanner instance and then access it from within my Kubernetes cluster.
- As an app developer, I want the Cloud Spanner provisioning process to configure my Kubernetes cluster so the endpoints and credentials for my Cloud Spanner instance are implicitly injected into Pods matching a label selector (without me having to modify the PodSpec to add the specific Configmap/Secret containing the endpoint/credential data).
Specific Example:
- Database Administrator provisions a MySQL service for their cluster.
- Database Administrator creates secrets for the cluster containing the database name, username, and password.
- Database Administrator creates a
PodPresetdefining the database port as an enviornment variable, as well as the secrets. See Examples below for various examples. - Developer of an application can now label their pod with the specified
Selectorthe Database Administrator tells them, and consume the MySQL database without needing to know any of the details from step 2 and 3.
Summary
The use case we are targeting is to automatically inject into Pods the information required to access non-Kubernetes-Services, such as accessing an instances of Cloud Spanner. Accessing external services such as Cloud Spanner may require the Pods to have specific credential and endpoint data.
Using a Pod Preset allows pod template authors to not have to explicitly set information for every pod. This way authors of pod templates consuming a specific service do not need to know all the details about that service.
Prior Art
Internally for Kubernetes we already support accessing the Kubernetes api from all Pods by injecting the credentials and endpoint data automatically - e.g. injecting the serviceaccount credentials into a volume (via secret) using an admission controller, and injecting the Service endpoints into environment variables. This is done without the Pod explicitly mounting the serviceaccount secret.
Objectives
The goal of this proposal is to generalize these capabilities so we can introduce similar support for accessing Services running external to the Kubernetes cluster. We can assume that an appropriate Secret and Configmap have already been created as part of the provisioning process of the external service. The need then is to provide a mechanism for injecting the Secret and Configmap into Pods automatically.
The ExplicitServiceLinks proposal, will allow us to decouple where a Service's credential and endpoint information is stored in the Kubernetes cluster from a Pod's intent to access that Service (e.g. in declaring it wants to access a Service, a Pod is automatically injected with the credential and endpoint data required to do so).
Proposed Changes
PodPreset API object
This resource is alpha. The policy itself is immutable. The API group will be
added to new group settings and the version is v1alpha1.
// PodPreset is a policy resource that defines additional runtime
// requirements for a Pod.
type PodPreset struct {
unversioned.TypeMeta
ObjectMeta
// +optional
Spec PodPresetSpec
}
// PodPresetSpec is a description of a pod preset.
type PodPresetSpec struct {
// Selector is a label query over a set of resources, in this case pods.
// Required.
Selector unversioned.LabelSelector
// Env defines the collection of EnvVar to inject into containers.
// +optional
Env []EnvVar
// EnvFrom defines the collection of EnvFromSource to inject into
// containers.
// +optional
EnvFrom []EnvFromSource
// Volumes defines the collection of Volume to inject into the pod.
// +optional
Volumes []Volume `json:omitempty`
// VolumeMounts defines the collection of VolumeMount to inject into
// containers.
// +optional
VolumeMounts []VolumeMount
}
Validations
In order for the Pod Preset to be valid it must fulfill the following constraints:
- The
Selectorfield must be defined. This is how we know which pods to inject so therefore it is required and cannot be empty. - The policy must define at least 1 of
Env,EnvFrom, orVolumeswith correspondingVolumeMounts. - If you define a
Volume, it has to define aVolumeMount. - For
Env,EnvFrom,Volumes, andVolumeMountsall existing API validations are applied.
This resource will be immutable, if you want to change something you can delete the old policy and recreate a new one. We can change this to be mutable in the future but by disallowing it now, we will not break people in the future.
Conflicts
There are a number of edge conditions that might occur at the time of injection. These are as follows:
- Merging lists with no conflicts: if a pod already has a
Volume,VolumeMountorEnvVardefined exactly as defined in the PodPreset. No error will occur since they are the exact same. The motivation behind this is if services have no quite converted to using pod injection policies yet and have duplicated information and an error should obviously not be thrown if the items that need to be injected already exist and are exactly the same. - Merging lists with conflicts: if a PIP redefines an
EnvVaror aVolume, an event on the pod showing the error on the conflict will be thrown and nothing will be injected. - Conflicts between
EnvandEnvFrom: this would throw an error with an event on the pod showing the error on the conflict. Nothing would be injected.
Note: In the case of a conflict nothing will be injected. The entire policy is ignored and an event is thrown on the pod detailing the conflict.
AdmissionControl Plug-in: PodPreset
The PodPreset plug-in introspects all incoming pod creation
requests and injects the pod based off a Selector with the desired
attributes.
For the initial alpha, the order of precedence for applying multiple
PodPreset specs is from oldest to newest. All Pod Injection
Policies in a namespace should be order agnostic; the order of application is
unspecified. Users should ensure that policies do not overlap.
However we can use merge keys to detect some of the conflicts that may occur.
This will not be enabled by default for all clusters, but once GA will be a part of the set of strongly recommended plug-ins documented here.
Why not an Initializer?
This will be first implemented as an AdmissionControl plug-in then can be converted to an Initializer once that is fully ready. The proposal for Initializers can be found at kubernetes/community#132.
Behavior
This will modify the pod spec. The supported changes to
Env, EnvFrom, and VolumeMounts apply to the container spec for
all containers in the pod with the specified matching Selector. The
changes to Volumes apply to the pod spec for all pods matching Selector.
The resultant modified pod spec will be annotated to show that it was modified by
the PodPreset. This will be of the form
podpreset.admission.kubernetes.io/<pip name>": "<resource version>".
Why modify all containers in a pod?
Currently there is no concept of labels on specific containers in a pod which would be necessary for per-container pod injections. We could add labels for specific containers which would allow this and be the best solution to not injecting all. Container labels have been discussed various times through multiple issues and proposals, which all congregate to this thread on the kubernetes-sig-node mailing list. In the future, even if container labels were added, we would need to be careful about not making breaking changes to the current behavior.
Other solutions include basing the container to inject based off
matching its name to another field in the PodPreset spec, but
this would not scale well and would cause annoyance with configuration
management.
In the future we might question whether we need or want containers to express that they expect injection. At this time we are deferring this issue.
Examples
Simple Pod Spec Example
This is a simple example to show how a Pod spec is modified by the Pod Injection Policy.
User submitted pod spec:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
ports:
- containerPort: 80
Example Pod Preset:
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
Pod spec after admission controller:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
ports:
- containerPort: 80
env:
- name: DB_PORT
value: 6379
volumes:
- name: cache-volume
emptyDir: {}
Pod Spec with ConfigMap Example
This is an example to show how a Pod spec is modified by the Pod Injection
Policy that defines a ConfigMap for Environment Variables.
User submitted pod spec:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
ports:
- containerPort: 80
User submitted ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
name: etcd-env-config
data:
number_of_members: "1"
initial_cluster_state: new
initial_cluster_token: DUMMY_ETCD_INITIAL_CLUSTER_TOKEN
discovery_token: DUMMY_ETCD_DISCOVERY_TOKEN
discovery_url: http://etcd_discovery:2379
etcdctl_peers: http://etcd:2379
duplicate_key: FROM_CONFIG_MAP
REPLACE_ME: "a value"
Example Pod Preset:
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
- name: duplicate_key
value: FROM_ENV
- name: expansion
value: $(REPLACE_ME)
envFrom:
- configMapRef:
name: etcd-env-config
volumeMounts:
- mountPath: /cache
name: cache-volume
- mountPath: /etc/app/config.json
readOnly: true
name: secret-volume
volumes:
- name: cache-volume
emptyDir: {}
- name: secret-volume
secretName: config-details
Pod spec after admission controller:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
- mountPath: /etc/app/config.json
readOnly: true
name: secret-volume
ports:
- containerPort: 80
env:
- name: DB_PORT
value: 6379
- name: duplicate_key
value: FROM_ENV
- name: expansion
value: $(REPLACE_ME)
envFrom:
- configMapRef:
name: etcd-env-config
volumes:
- name: cache-volume
emptyDir: {}
- name: secret-volume
secretName: config-details
ReplicaSet with Pod Spec Example
The following example shows that only the pod spec is modified by the Pod Injection Policy.
User submitted ReplicaSet:
apiVersion: settings/v1alpha1
kind: ReplicaSet
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
tier: frontend
matchExpressions:
- {key: tier, operator: In, values: [frontend]}
template:
metadata:
labels:
app: guestbook
tier: frontend
spec:
containers:
- name: php-redis
image: gcr.io/google_samples/gb-frontend:v3
resources:
requests:
cpu: 100m
memory: 100Mi
env:
- name: GET_HOSTS_FROM
value: dns
ports:
- containerPort: 80
Example Pod Preset:
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
tier: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
Pod spec after admission controller:
kind: Pod
metadata:
labels:
app: guestbook
tier: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
spec:
containers:
- name: php-redis
image: gcr.io/google_samples/gb-frontend:v3
resources:
requests:
cpu: 100m
memory: 100Mi
volumeMounts:
- mountPath: /cache
name: cache-volume
env:
- name: GET_HOSTS_FROM
value: dns
- name: DB_PORT
value: 6379
ports:
- containerPort: 80
volumes:
- name: cache-volume
emptyDir: {}
Multiple PodPreset Example
This is an example to show how a Pod spec is modified by multiple Pod Injection Policies.
User submitted pod spec:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
ports:
- containerPort: 80
Example Pod Preset:
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: cache-volume
volumes:
- name: cache-volume
emptyDir: {}
Another Pod Preset:
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: proxy
namespace: myns
spec:
selector:
matchLabels:
role: frontend
volumeMounts:
- mountPath: /etc/proxy/configs
name: proxy-volume
volumes:
- name: proxy-volume
emptyDir: {}
Pod spec after admission controller:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
annotations:
podpreset.admission.kubernetes.io/allow-database: "resource version"
podpreset.admission.kubernetes.io/proxy: "resource version"
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
- mountPath: /etc/proxy/configs
name: proxy-volume
ports:
- containerPort: 80
env:
- name: DB_PORT
value: 6379
volumes:
- name: cache-volume
emptyDir: {}
- name: proxy-volume
emptyDir: {}
Conflict Example
This is a example to show how a Pod spec is not modified by the Pod Injection Policy when there is a conflict.
User submitted pod spec:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
ports:
volumes:
- name: cache-volume
emptyDir: {}
- containerPort: 80
Example Pod Preset:
kind: PodPreset
apiVersion: settings/v1alpha1
metadata:
name: allow-database
namespace: myns
spec:
selector:
matchLabels:
role: frontend
env:
- name: DB_PORT
value: 6379
volumeMounts:
- mountPath: /cache
name: other-volume
volumes:
- name: other-volume
emptyDir: {}
Pod spec after admission controller will not change because of the conflict:
apiVersion: v1
kind: Pod
metadata:
name: website
labels:
app: website
role: frontend
spec:
containers:
- name: website
image: ecorp/website
volumeMounts:
- mountPath: /cache
name: cache-volume
ports:
volumes:
- name: cache-volume
emptyDir: {}
- containerPort: 80
If we run kubectl describe... we can see the event:
$ kubectl describe ...
....
Events:
FirstSeen LastSeen Count From SubobjectPath Reason Message
Tue, 07 Feb 2017 16:56:12 -0700 Tue, 07 Feb 2017 16:56:12 -0700 1 {podpreset.admission.kubernetes.io/allow-database } conflict Conflict on pod preset. Duplicate mountPath /cache.