Extension of Admission Control via Initializers and External Admission Enforcement
Admission control is the primary business-logic policy and enforcement subsystem in Kubernetes. It provides synchronous hooks for all API operations and allows an integrator to impose additional controls on the system - rejecting, altering, or reacting to changes to core objects. Today each of these plugins must be compiled into Kubernetes. As Kubernetes grows, the requirement that all policy enforcement beyond coarse grained access control be done through in-tree compilation and distribution becomes unwieldy and limits administrators and the growth of the ecosystem.
This proposal covers changes to the admission control subsystem that allow extension of admission without recompilation and dynamic admission control configuration in ways that resemble existing controller behavior.
Background
The four core systems in Kubernetes are:
- API servers with persistent storage, providing basic object validation, defaulting, and CRUD operations
- Authentication and authorization layers that identify an actor and constrain the coarse actions that actor can take on API objects
- Admission controller layers that can control and limit the CRUD operations clients perform synchronously.
- Controllers which watch the API and react to changes made by other users asynchronously (scheduler, replication controller, kubelet, kube-proxy, and ingress are all examples of controllers).
Admission control supports a wide range of policy and behavior enforcement for cluster administrators and integrators.
Types of Admission Control
In Kubernetes 1.5 and OpenShift 1.4, the following types of functionality have been implemented through admission
(all file references are relative to plugin/pkg/admission, or simply identified by name for OpenShift). Many of the
Kubernetes admission controllers originated in OpenShift and are listed in both for history.
Resource Control
These admission controllers take resource usage for pods into account to ensure namespaces cannot abuse the cluster by consuming more than their fair share of resources. These perform security or defaulting type roles.
Kubernetes
| Name | Code | Description |
|---|---|---|
| InitialResources | initialresources/admission.go | Default the resources for a container based on past usage |
| LimitRanger | limitranger/admission.go | Set defaults for container requests and limits, or enforce upper bounds on certain resources (no more than 2GB of memory, default to 512MB). Implements the behavior of a v1 API (LimitRange). |
| ResourceQuota | resourcequota/admission.go | Calculate and deny number of objects (pods, rc, service load balancers) or total consumed resources (cpu, memory, disk) in a namespace. Implements the behavior of a v1 API (ResourceQuota). |
OpenShift
| Name | Code | Description |
|---|---|---|
| ClusterResourceOverride | clusterresourceoverride/admission.go | Allows administrators to override the user's container request for CPU or memory as a percentage of their request (the administrator's target overcommit number), or to default a limit based on a request. Allows cluster administrators to control overcommit on a cluster. |
| ClusterResourceQuota | clusterresourcequota/admission.go | Performs quota calculations over a set of namespaces with a shared quota. Can be used in conjunction with resource quota for hard and soft limits. |
| ExternalIPRanger | externalip_admission.go | Prevents users from creating services with externalIPs inside of fixed CIDR ranges, including the pod network, service network, or node network CIDRs to prevent hijacking of connections. |
| ImageLimitRange | admission.go | Performs LimitRanging on images that are pushed into the integrated image registry |
| OriginResourceQuota | resourcequota/admission.go | Performs quota calculations for API resources exposed by OpenShift. Demonstrates how quota would be implemented for API extensions. |
| ProjectRequestLimit | requestlimit/admission.go | A quota on how many namespaces may be created by any individual user. Has a global default and also a per user override. |
| RunOnceDuration | runonceduration/admission.go | Enforces a maximum ActiveDeadlineSeconds value on all RestartNever pods in a namespace. This ensures that users are defaulted to have a deadline if they did not request it (which prevents pathological resource consumption) |
Quota is typically last in the admission chain, to give all other components a chance to reject or modify the resource.
Security
These controllers defend against specific, potentially dangerous actions within a resource that the authorization system cannot prevent.
Kubernetes
| Name | Code | Description |
|---|---|---|
| AlwaysPullImages | alwayspullimages/admission.go | Forces the Kubelet to pull images to prevent pods from accessing private images that another user with credentials has already pulled to the node. |
| LimitPodHardAntiAffinityTopology | antiaffinity/admission.go | Defends the cluster against abusive anti-affinity topology rules that might hang the scheduler. |
| DenyEscalatingExec | exec/admission.go | Prevent users from executing into pods that have higher privileges via their service account than allowed by their policy (regular users can't exec into admin pods). |
| DenyExecOnPrivileged | exec/admission.go | Blanket ban exec access to pods with host level security. Superseded by DenyEscalatingExec |
| OwnerReferencesPermissionEnforcement | gc/gc_admission.go | Require that a user who sets an owner reference (which could result in garbage collection) has permission to delete the object, to prevent abuse. |
| ImagePolicyWebhook | imagepolicy/admission.go | Invoke a remote API to determine whether an image is allowed to run on the cluster. |
| PodNodeSelector | podnodeselector/admission.go | Default and limit what node selectors may be used within a namespace by reading a namespace annotation and a global configuration. |
| PodSecurityPolicy | security/podsecuritypolicy/admission.go | Control what security features pods are allowed to run as based on the end user launching the pod or the service account. Sophisticated policy rules. |
| SecurityContextDeny | securitycontext/scdeny/admission.go | Blanket deny setting any security context settings on a pod. |
OpenShift
| Name | Code | Description |
|---|---|---|
| BuildByStrategy | strategyrestrictions/admission.go | Control which types of image builds a user can create by checking for a specific virtual authorization rule (field level authorization), since some build types have security implications. |
| OriginPodNodeEnvironment | nodeenv/admission.go | Predecessor to PodNodeSelector. |
| PodNodeConstraints | podnodeconstraints/admission.go | Prevent users from setting nodeName directly unless they can invoke the bind resource on pods (same as a scheduler). This prevents users from attacking nodes by repeatedly creating pods that target a specific node and forcing it to reject those pods. (field level authorization) |
| RestrictedEndpointsAdmission | endpoint_admission.go | In a multitenant network setup where namespaces are isolated like OpenShift SDN, service endpoints must not allow a user to probe other namespaces. If a user edits the endpoints object and sets IPs that fall within the pod network CIDR, the user must have create permission on a virtual resource endpoints/restricted. The service controller is granted this permission by default. |
| SecurityContextConstraint | admission.go | Predecessor to PodSecurityPolicy. |
| SCCExecRestrictions | scc_exec.go | Predecessor to DenyEscalatingExec. |
Many other controllers have been proposed, including but not limited to:
- Control over what taints and tolerations a user can set on a pod
- Control over which labels and annotations can be set or changed
- Generic control over which fields certain users may set (field level access control)
Defaulting / Injection
These controllers inject namespace or cluster context into pods and other resources at runtime to decouple application config from runtime config (separate the user's pod settings from environmental controls)
Kubernetes
| Name | Code | Description |
|---|---|---|
| ServiceAccount | serviceaccount/admission.go | Bind mount the service account token for a pod into the pod at a specific location. |
| PersistentVolumeLabel | persistentvolume/label/admission.go | Lazily bind persistent volume claims to a given zone when a pod is scheduled. |
| DefaultStorageClass | storageclass/default/admission.go | Set a default storage class on any PVC created without a storage class. |
Many other controllers have been proposed, including but not limited to:
- ServiceInjectionPolicy to inject environment, configmaps, and secrets into pods that reference those services
- Namespace level environment injection (all pods in this namespace should have env var ENV=PROD)
- Label selector based resource defaults (all pods with these labels get these default resources)
Referential Consistency
These controllers enforce certain guarantees of the system related to integrity.
Kubernetes
| Name | Code | Description |
|---|---|---|
| NamespaceAutoProvision | namespace/autoprovision/admission.go | When users create resources in a namespace that does not exist, ensure the namespace is created so it can be seen with kubectl get namespaces |
| NamespaceExists | namespace/exists/admission.go | Require that a namespace object exist prior to a resource being created. |
| NamespaceLifecycle | namespace/lifecycle/admission.go | More powerful and flexible version of NamespaceExists. |
OpenShift
| Name | Code | Description |
|---|---|---|
| JenkinsBootstrapper | jenkinsbootstrapper/admission.go | Spawn a Jenkins instance in any project where a Build is defined that references a Jenkins pipeline. Checks that the creating user has permission to act-as an editor in the project to prevent escalation within a namespace. |
| ImagePolicy | imagepolicy/imagepolicy.go | Performs policy functions like ImagePolicyWebhook, but also is able to mutate the image reference from a tag to a digest (fully qualified spec), look up additional information about the image from the OpenShift Image API and potentially enforce resource consumption or placement decisions based on the image. May also be used to deny images from being used that don't resolve to image metadata that OpenShift tracks. |
| OriginNamespaceLifecycle | lifecycle/admission.go | Controls accepting resources for namespaces. |
Patterns
In a study of all known admission controllers, the following patterns were seen most often:
- Defaulting on creation
- Synchronous validation on creation
- Synchronous validation on update - side-effect free
Other patterns seen less frequently include:
- Defaulting on update
- Resolving / further specifying values on update (ImagePolicy)
- Creating resources in response to user action with the correct permission check (JenkinsBootstrapper)
- Policy decisions based on who is doing the action (OwnerReferencesPermissionEnforcement, PodSecurityPolicy, JenkinsBootstrapper)
- Synchronous validation on update - with side effects (quota)
While admission controllers can operate on all verbs, resources, and sub resource types, in practice they mostly deal with create and update on primary resources. Most sub resources are highly privileged operations and so are typically covered by authorization policy. Other controllers like quota tend to be per apiserver and therefore are not required to be extensible.
Building enforcement
In order to implement custom admission, an admin, integrator, or distribution of Kubernetes must compile their
admission controller(s) into the Kubernetes kube-apiserver binary. As Kubernetes is intended to be a
modular layered system this means core components must be upgraded to effect policy changes and only a fixed
list of plugins can be used. It also prevents experimentation and prototyping of policy, or "quick fix"
solutions applied on site. As we add additional APIs that are not hosted in the main binary (either as third
party resources or API extension servers), these APIs have many of the same security and policy needs that
the core resources do, but must compile in their own subsets of admission.
Further, distributions of Kubernetes like OpenShift that wish to offer complete solutions (such as OpenShift's multi-tenancy model) have no mechanism for running on top of Kubernetes without recompilation of the core or for extending the core with additional policy. This prevents the formation of an open ecosystem for tools around Kubernetes, forcing all changes to policy to go through the Kubernetes codebase review gate (when such review is unnecessary or disruptive to Kubernetes itself).
Ordering of admission
Previous work has described a logical ordering for admission:
- defaulting (PodPreset)
- mutation (ClusterResourceOverride)
- validation (PodSecurityPolicy)
- transactional (ResourceQuota)
Most controllers fit cleanly into one of these buckets. Controllers that need to act in multiple phases are often best split into separate admission controllers, although today we offer no code mechanism to share a request local cache. Extension may need to occur at each of these phases.
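The logical ordering above can be pictured as a stable sort over phases. The following is a minimal sketch under that assumption, not the existing plugin machinery; the `Phase` type, the `plugin` struct, and `orderPlugins` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Phase is a hypothetical enum for the four logical admission buckets.
type Phase int

const (
	Defaulting Phase = iota
	Mutation
	Validation
	Transactional
)

// plugin pairs an admission controller name with the phase it belongs to.
type plugin struct {
	Name  string
	Phase Phase
}

// orderPlugins returns a copy sorted by phase. Plugins that share a phase
// keep their configured relative order because the sort is stable.
func orderPlugins(in []plugin) []plugin {
	out := append([]plugin(nil), in...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Phase < out[j].Phase })
	return out
}

func main() {
	chain := orderPlugins([]plugin{
		{"ResourceQuota", Transactional},
		{"PodSecurityPolicy", Validation},
		{"PodPreset", Defaulting},
		{"ClusterResourceOverride", Mutation},
	})
	for _, p := range chain {
		fmt.Println(p.Name)
	}
}
```

A controller that needs to act in two phases would appear in this list twice, once per phase, which matches the suggestion to split such controllers.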
Design
It should be possible to perform holistic policy enforcement in Kubernetes without recompiling the core project, using plugins that can be added to and removed from a stock Kubernetes release. That extension of admission control should leverage our existing controller patterns and codebase where possible. Extension must be as performant and reliable as other core mechanisms.
Requirements
- Easy Initialization: Privileged components should be able to easily participate in the initialization of a new object.
- Synchronous Validation: Synchronous rejection of initialized objects or mutations must be possible outside of the kube-apiserver binary.
- Backwards Compatible: Existing API clients must see no change in behavior to external admission other than increased latency.
- Easy Installation: Administrators should be able to easily write a new admission plugin and deploy it in the cluster.
- Performant: External admission must not significantly regress performance in large and dense clusters.
- Reliable: External admission should be capable of being "production-grade" for deployment in an extremely large and dense cluster.
- Internally Consistent: Developing an admission controller should reuse as much infrastructure and tools as possible from building custom controllers so as to reduce the cost of extension.
Specification
Based on observation of the actual admission control implementations the majority of mutation occurs as part of creation, and a large chunk of the remaining controllers are for side-effect free validation of creation and updates. Therefore we propose the following changes to Kubernetes:
- Allow some controllers to act as "initializers" - watching the API and mutating the object before it is visible to normal clients.
  This would reuse the majority of the infrastructure in place for controllers. Because creation is one-way, the object can be "revealed" to regular clients once a set list of initializers is consumed. These controllers could run on the cluster as pods. Because initialization is a non-backwards compatible API change, some care must be taken to shield old clients from observing the scenario.
- Add a generic external admission webhook controller that is non-mutating (thus parallelizable).
  This generic webhook API would resemble admission.Interface and be given the input object (for create) and the previous object (for update/patch). After initialization or on any update, these hooks would be invoked in parallel against the remote servers and any rejection would reject the mutation.
- Make the registration of both initializers and admission webhooks dynamic via the API (a configmap or cluster scoped resource).
  Administrators should be able to dynamically add or remove hooks and initializers on demand to the cluster. Configuration would be similar to registering new API group versions and include config like "fail open" or "fail closed".
Some admission controller types would not be possible with these extensions, or would need special handling:
- Mutating admission webhooks are not part of the initial implementation but are desired in the future.
- Admission controllers that need access to the acting user can receive that via the external webhook.
- Admission controllers that "react" to the acting user can couple the information received via a webhook and then act if they observe mutation succeed (tuple combining resource UID and resource generation).
- Quota will continue to be a core plugin per API server, so extension is not critical.
Implications:
- Initializers and generic admission controllers are highly privileged, so while some separation is valuable they are effectively cluster scoped
- This mechanism would allow dedicated infrastructure to host admission for multiple clusters, and allow some expensive admission to be centralized (like quota which is hard to performantly distribute)
- There is no way to build initializers for updates without a much more complicated model, but we anticipate initializers to work best on creation.
- Ordering will probably be necessary on initializers because defaulting in the wild requires ordering. Non-mutating validation on the other hand can be fully parallel.
- Some admission depends on knowing the identity of the actor - we will likely need to include the creator as information to initializers.
- Quota must still run after all validators are invoked. We may need to make quota extensible in the future.
Initializers
An initializer must allow some API clients to perceive creations prior to other apps. For backwards compatibility, uninitialized objects must be invisible to legacy clients. In addition, initializers must be recognized as participating in the initialization process and therefore a list of initializers must be populated onto each new object. Like finalizers, each initializer should perform an update and remove itself from the object.
Every API object will have the following field added:
type ObjectMeta struct {
...
// Initializers is a list of initializers that must run prior to this object being visible to
// normal clients. Only highly privileged users may modify this field. If this field is set,
// then normal clients will receive a 202 Accepted and a Status object if directly retrieved
// by name, and it will not be visible via listing or watching.
Initializers *Initializers `json:"initializers"`
...
}
// Initializers tracks the progress of initialization.
type Initializers struct {
// Pending is a list of initializers that must execute in order before this object is visible.
// When the last pending initializer is removed, and no failing result is set, the initializers
// struct will be set to nil and the object is considered as initialized and visible to all
// clients.
Pending []Initializer `json:"pending"`
// If result is set with the Failure field, the object will be persisted to etcd and then deleted,
// ensuring that other clients can observe the deletion.
Result *metav1.Status `json:"status"`
}
// Initializer records a single pending initializer. It is a struct for future extension.
type Initializer struct {
// Name is the name of the process that owns this initializer
Name string `json:"name"`
}
On creation, a compiled in admission controller defaults the initializers field (if nil) to a value from
the system configuration that may vary by resource type. If initializers is set the admission controller
will check whether the user has the ability to run the initialize verb on the current resource type, and
reject the entry with 403 if not. This allows a privileged user to bypass initialization by setting
initializers to the empty struct.
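The create-time rule above can be sketched as a small function. This is an illustration under stated assumptions, not the proposed implementation: `admitCreate`, `errForbidden`, and the `canInitialize` flag (standing in for an authorization check on the initialize verb) are hypothetical, and treating an empty struct as "already initialized" is one reading of the bypass described above.

```go
package main

import (
	"errors"
	"fmt"
)

type Initializer struct{ Name string }

type Initializers struct{ Pending []Initializer }

// errForbidden stands in for an HTTP 403 response.
var errForbidden = errors.New("403: user may not set initializers")

// admitCreate returns the initializers value to persist. requested is what
// the client sent (nil if unset), defaults are the configured names for the
// resource type, and canInitialize reports whether the user may run the
// initialize verb on that resource type.
func admitCreate(requested *Initializers, defaults []string, canInitialize bool) (*Initializers, error) {
	if requested == nil {
		if len(defaults) == 0 {
			return nil, nil // nothing configured: object is visible at once
		}
		out := &Initializers{}
		for _, name := range defaults {
			out.Pending = append(out.Pending, Initializer{Name: name})
		}
		return out, nil
	}
	if !canInitialize {
		return nil, errForbidden
	}
	if len(requested.Pending) == 0 {
		// Assumption: an empty struct from a privileged user means "skip
		// initialization", so the stored field is nil.
		return nil, nil
	}
	return requested, nil
}

func main() {
	out, err := admitCreate(nil, []string{"PodAutoSizer"}, false)
	fmt.Println(len(out.Pending), err)

	_, err = admitCreate(&Initializers{}, []string{"PodAutoSizer"}, false)
	fmt.Println(err)
}
```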
Once created, an object is not visible to clients unless one of the following conditions is satisfied:
- The initializers field is null.
- The client provides a special option to GET, LIST, or WATCH indicating that the user wishes to see uninitialized objects.
The apiserver that accepts the incoming creation should hold the response until the object is
initialized or the timeout is exceeded. This increases latency, but allows clients to avoid breaking
semantics. If the apiserver reaches the timeout it must return an appropriate error that includes the
resource version of the object and UID so that clients can perform a watch. If an initializer reports
a Result with a failure, it must return that to the user (all failures result in deletion).
Each initializer is a controller or other client agent that watches for new objects with an initializer
whose first position matches its assigned name (e.g. PodAutoSizer) and then operates on them. These
clients would use the ?includeUninitialized=true query param (working name) and observe all
objects.
The initializer would perform a normal update on the object to carry out its function, and then
remove its entry from initializers (or add more entries). If an error occurs during initialization
that must terminate initialization, the Result field of the initializers struct should be set instead of removing
the initializer entry and then the initializer should delete the object. The client would receive this
status as the response to their creation (as described above).
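The per-initializer bookkeeping can be sketched with three small helpers. This is an illustrative sketch only: `isMyTurn`, `finish`, and `fail` are hypothetical names, the types are trimmed copies of the structs shown earlier, and the real update would go through a normal API update rather than in-memory values.

```go
package main

import "fmt"

type Status struct{ Message string }

type Initializer struct{ Name string }

type Initializers struct {
	Pending []Initializer
	Result  *Status
}

// isMyTurn reports whether name occupies the first pending position.
func isMyTurn(in *Initializers, name string) bool {
	return in != nil && len(in.Pending) > 0 && in.Pending[0].Name == name
}

// finish removes the first pending entry. It must only be called when
// isMyTurn returned true. When nothing is pending and no failure is
// recorded, it returns nil, which marks the object as initialized.
func finish(in *Initializers) *Initializers {
	rest := append([]Initializer(nil), in.Pending[1:]...)
	if len(rest) == 0 && in.Result == nil {
		return nil
	}
	return &Initializers{Pending: rest, Result: in.Result}
}

// fail records a terminal failure and leaves the pending list untouched,
// so the object never becomes visible to normal clients.
func fail(in *Initializers, msg string) *Initializers {
	out := *in
	out.Result = &Status{Message: msg}
	return &out
}

func main() {
	in := &Initializers{Pending: []Initializer{{Name: "PodAutoSizer"}, {Name: "EnvInjector"}}}
	if isMyTurn(in, "PodAutoSizer") {
		in = finish(in)
	}
	fmt.Println(in.Pending[0].Name)
}
```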
During initialization, resources may have relaxed validation requirements, which means initializers must handle incomplete objects. The create call will perform normal defaulting so that initializers are not providing their own defaulting, including UID and creationTimestamp. At all phases the object must be valid, so resources that wish to use initializers should consider how defaulting would complicate initializers.
To allow naive clients to avoid having to deal with uninitialized objects, the API will automatically
filter uninitialized objects out of LIST and WATCH. Explicit GETs to that object should return the
appropriate status code 202 Accepted indicating that the resource is reserved and including a Status
response with the correct resource version, but not the object. Clients specifying
includeUninitialized will see all updates, but shared code like caches and informers may need
to implement layered filters to handle multiple clients requesting both variants. A CREATE to an
uninitialized object should report the same status as before, and DELETE is always allowed.
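The visibility rules for naive clients can be sketched as two helpers. This is a simplified model under stated assumptions: the `object` type collapses the initializers field into a boolean, and `filterList` and `getStatus` are hypothetical names rather than apiserver functions.

```go
package main

import (
	"fmt"
	"net/http"
)

// object is a simplified stand-in; Initialized is true when the
// initializers field is nil.
type object struct {
	Name        string
	Initialized bool
}

// filterList drops uninitialized objects unless the caller opted in with
// the includeUninitialized option.
func filterList(items []object, includeUninitialized bool) []object {
	var out []object
	for _, o := range items {
		if o.Initialized || includeUninitialized {
			out = append(out, o)
		}
	}
	return out
}

// getStatus returns the HTTP status code for a GET by name: 200 with the
// object, or 202 Accepted with only a Status body while it is reserved.
func getStatus(o object, includeUninitialized bool) int {
	if o.Initialized || includeUninitialized {
		return http.StatusOK
	}
	return http.StatusAccepted
}

func main() {
	items := []object{{"ready", true}, {"pending", false}}
	fmt.Println(len(filterList(items, false)), len(filterList(items, true)))
	fmt.Println(getStatus(items[1], false))
}
```

Shared caches that serve both kinds of client would likely store the unfiltered set and apply a filter like this per consumer.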
There is no current error case for a timeout that exactly matches existing behavior except a 5xx timeout if etcd does not respond quickly. We should return that error if CREATE exceeds the timeout, but return an appropriate status cause that lets a client determine what the outcome was.
Initializers are allowed to set other initializers or finalizers.
Example flow:
This flow shows the information moving across the system during a successful creation
Client            APIServer (req)   APIServer (other)   Initializer 1         Initializer 2
----------------- ----------------- ------------------- --------------------- -------------------
                                    listen <----------- WATCH /pods?init=0
                                    listen <----------------|---------------- WATCH /pods?init=0
                                                            |                     |
POST /pods -----> validate                                  |                     |
                  admission(init):                          |                     |
                    default                                 v                     |
                  save etcd --------------------------> observe                   |
                  WATCH /pods/1                             v                     |
                    |                                   change resources          |
                    |                                   clear initializer         |
                    |               validate <--------- PUT /pods/1               |
                    |               admission(init):                              |
                    |                 check authz                                 v
                    |               save etcd ------------------------------> observe
                    |                                                         change env vars
                    |                                                         clear initializer
                    |               validate <------------------------------- PUT /pods
                    |               admission(init):
                    v                 check authz
                  observe <-------- save etcd
response <------- handle
An example flow where initialization fails:
Client            APIServer (req)   APIServer (other)   Initializer 1         Initializer 2
----------------- ----------------- ------------------- --------------------- -------------------
                                    listen <----------- WATCH /pods?init=0
                                    listen <----------------|---------------- WATCH /pods?init=0
                                                            |                     |
POST /pods -----> validate                                  |                     |
                  admission(init):                          |                     |
                    default                                 v                     |
                  save etcd --------------------------> observe                   |
                  WATCH /pods/1                             v                     |
                    |                                   change resources          |
                    |                                   clear initializer         |
                    |               validate <--------- PUT /pods/1               |
                    |               admission(init):                              |
                    |                 check authz                                 v
                    |               save etcd ------------------------------> observe
                    |                                                         failed to retrieve object
                    |                                                         set result to failure
                    |               validate <------------------------------- PUT /pods
                    |               admission(init):
                    |                 check authz
                    v                 signal failure
                  observe <-------- save etcd
response <------- handle              |
                                      v
                                    delete object
Failure
If the apiserver crashes before the initialization is complete, it may be necessary for the apiserver or another controller to complete deletion.
- Alternatively, upon receiving a failed status update during initialization, the apiserver could delete the object at that time.
- The garbage collector controller could be responsible for cleaning up resources that failed initialization.
- Read repair could be performed when listing resources from the apiserver.
Quota
Quota consists of two distinct parts - object count, which prevents abuse for some limited resource, and sub-object consumption which may not be known until after all initializers are executed.
Object count quota should be applied prior to the object being persisted to etcd (prior to initialization). All other quota should be applied when the object completes initialization. Compensation should run at the normal spot.
Bypassing initializers
Initializers, like external admission hooks, raise the potential for a cluster that cannot make progress or heal itself. An initializer on pods could block an emergency fallback scheduler from launching a new scheduler pod. An initializer on the endpoints resource could prevent masters from registering themselves, blocking an extension API server from observing endpoints changes to allow it to watch endpoints. In general, new initializers must be careful to not create circular dependencies with the masters.
The ability to set an empty initializer list allows cluster level components to make progress in the face of extension. Additional information may need to be returned with an object creation error to indicate which component failed to initialize.
Generic external admission webhook
Existing webhooks demonstrate specific admission callouts for image policy. In general, the admission.Interface
already defines a reasonable pattern for mutations. Admission is much less common for retrieval operations.
Add a new GenericAdmissionWebhook that is a list of endpoints that will receive POST operations. The schema
of the object is modelled after SubjectAccessReview and admission.Interface. The caller posts the object
to the server and expects a 200, 204 or 400 response. If the response is 200 or 204, the caller proceeds. If
the response is 400, the client should interpret the response body to determine the "status" sub field and
then use that as the response.
POST [arbitrary_url]
{
"kind": "AdmissionReview",
"apiVersion": "admission.k8s.io/v1",
"spec": {
"resource": "pods",
"subresource": "",
"operation": "Create",
"object": {...},
"oldObject": null,
"userInfo": {
"name": "user1",
"uid": "cn=user,dn=ou",
"groups": ["g=1"]
}
},
"status": {
"status": "Failure",
"message": "...",
"reason": "Forbidden",
"code": 403,
...
}
}
400 Bad Request
{
"kind": "AdmissionReview",
"apiVersion": "admission.k8s.io/v1",
"spec": {
...
},
"status": {
"status": "Failure",
"message": "...",
"reason": "Forbidden",
"code": 403,
...
}
}
Clients may return 204 as an optimization. oldObject may be sent if this is an update operation. This
API explicitly does not allow mutations, but leaves open the possibility for that to be added in the future.
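The caller-side handling of the three response codes can be sketched as follows. This is a minimal illustration, assuming JSON bodies for readability (the proposal expects protobuf on the wire); `interpret` and the trimmed `AdmissionReview` and `Status` types are hypothetical and cover only the fields shown in the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Status mirrors the fields shown in the example "status" object.
type Status struct {
	Status  string `json:"status"`
	Message string `json:"message"`
	Reason  string `json:"reason"`
	Code    int    `json:"code"`
}

// AdmissionReview is trimmed to what the caller needs from a response.
type AdmissionReview struct {
	Kind   string  `json:"kind"`
	Status *Status `json:"status,omitempty"`
}

// interpret maps a webhook HTTP response to an admission decision.
// 200 and 204 admit the request; 400 rejects it with the returned status;
// anything else is an unrecognized error left to the failure policy.
func interpret(code int, body []byte) (bool, *Status, error) {
	switch code {
	case http.StatusOK, http.StatusNoContent:
		return true, nil, nil
	case http.StatusBadRequest:
		var review AdmissionReview
		if err := json.Unmarshal(body, &review); err != nil {
			return false, nil, fmt.Errorf("decoding rejection: %w", err)
		}
		if review.Status == nil {
			return false, nil, fmt.Errorf("rejection carried no status")
		}
		return false, review.Status, nil
	default:
		return false, nil, fmt.Errorf("unexpected response code %d", code)
	}
}

func main() {
	body := []byte(`{"kind":"AdmissionReview","status":{"status":"Failure","reason":"Forbidden","code":403}}`)
	allowed, st, err := interpret(http.StatusBadRequest, body)
	fmt.Println(allowed, st.Reason, st.Code, err)
}
```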
Each webhook must be non-mutating and will be invoked in parallel. The first failure will terminate processing, and the caller may choose to retry calls and so submission must be idempotent.
Because admission is performance critical, the following considerations are taken:
- Protobuf will be the expected serialization format - JSON is shown above for readability, but the content-type wrapper will be used to encode bytes.
- The admission object will be serialized once and sent to all callers
- To minimize fan-out variance, future implementations may set strict timeouts and dispatch multiple requests.
- Admission controllers MAY omit the spec field in returning a response to the caller, specifically the object and oldObject fields.
Future work:
- It should be possible to bypass expensive parts of the serialization action by potentially passing the input bytes directly to the webhook, but mutation and defaulting may complicate this goal.
Admission is a high security operation, so end-to-end TLS encryption is expected and the remote endpoint should be authorized via strong signing, mutual-auth, or high security.
Upgrade of a cluster with external admission
The current order of cluster upgrade is apiserver -> controller -> nodes. External admission controllers
would typically need to be upgraded first in order to ensure new semantic changes in objects are not ignored.
This would include fields like PodSecurityContext - adding that prior to admission is necessary because it
allows escalation that was previously impossible.
Dynamic configuration
Trusted clients should be able to modify the initialization and external admission hooks on the fly and expect that configuration is updated quickly. Extension API servers should also be able to leverage central configuration, but may opt for alternate mechanisms.
The initializer admission controller and the generic webhook admission controller should dynamically load config
from a ConfigMap or a net new API object holding the following configuration schema:
type AdmissionControlConfiguration struct {
TypeMeta // although this object could simply be serialized like ComponentConfig
// ResourceInitializers is a list of resources and their default initializers
ResourceInitializers []ResourceDefaultInitializer
ExternalAdmissionHooks []ExternalAdmissionHook
}
type ResourceDefaultInitializer struct {
// Resource identifies the type of resource that should be initialized
Resource GroupResource
// Initializers are the default names that will be registered to this resource
Initializers []string
}
type ExternalAdmissionHook struct {
// Operations is the list of operations this hook will be invoked on - Create, Update, or *
// for all operations. Defaults to '*'.
Operations []string
// Resources are the resources this hook should be invoked on. '*' is all resources.
Resources []string
// Subresources are the list of subresources this hook should be invoked on. '*' is all subresources.
Subresources []string
// TODO define client configuration
// FailurePolicy defines how unrecognized errors from the admission endpoint are handled -
// allowed values are Ignore, Retry, Fail. Default value is Fail
FailurePolicy FailurePolicyType
}
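How a hook entry might be matched against an incoming request can be sketched as below. This is a hedged illustration: `matches` and `contains` are hypothetical helpers, the struct is a trimmed copy of `ExternalAdmissionHook`, and treating a request on the primary resource (empty subresource) as ignoring the `Subresources` list is an assumption the schema does not state.

```go
package main

import "fmt"

// ExternalAdmissionHook is trimmed to the matching fields from the schema.
type ExternalAdmissionHook struct {
	Operations   []string
	Resources    []string
	Subresources []string
}

// contains reports whether list holds want or the wildcard "*".
func contains(list []string, want string) bool {
	for _, s := range list {
		if s == "*" || s == want {
			return true
		}
	}
	return false
}

// matches decides whether the hook should be invoked for a request.
func matches(h ExternalAdmissionHook, op, resource, subresource string) bool {
	ops := h.Operations
	if len(ops) == 0 {
		ops = []string{"*"} // documented default
	}
	if !contains(ops, op) || !contains(h.Resources, resource) {
		return false
	}
	if subresource == "" {
		return true // assumption: primary resource requests skip this check
	}
	return contains(h.Subresources, subresource)
}

func main() {
	h := ExternalAdmissionHook{Resources: []string{"pods"}, Subresources: []string{"exec"}}
	fmt.Println(matches(h, "Create", "pods", ""))
	fmt.Println(matches(h, "Update", "pods", "attach"))
}
```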
All changes to this config must be made in a safe manner - when adding a new hook or initializer, first verify the new agent is online before allowing it to come into rotation. Removing a hook or initializer must occur before disabling the remote endpoint, and all queued items must complete.
Alternatives considered
The following are all viable alternatives to this specification, but have some downsides against the requirements above. There should be no reason these could not be implemented for specific use cases.
- Admission controller that can run shell commands inside its context to mutate objects.
  - Limits on performance and reliability
  - Requires the masters be updated (can't be done dynamically)
- Admission controller that can run a scripting language like Lua or JavaScript in process.
  - Limits on performance and reliability
  - Not consistent with existing tools and infrastructure
  - Requires that masters be updated and has limits on dynamic behavior
- Direct external call outs for object mutation (RPC to initialize objects)
  - Requires a new programming model
  - Duplicates our create - watch - update logic from controllers
- Make it easy to recompile Kubernetes to have new admission controllers
  - Limits administrators to using Go
  - Prevents easy installation and dynamic reconfiguration
Future Work
Mutating admission controllers
Allow webhook admission controllers to return a mutated object that is then sent to others. Requires some ordering / dependency tree in order to control the set of changes. Is necessary for some forms of controller, such as those that resolve fields into more specific values (converting an update of a pod image from a tag reference to a digest reference, or pointing to a proxy server).
Bypassing external admission hooks
There may be scenarios where an external admission hook blocks a system critical loop in a non-obvious way - by preventing node updates, which in turn prevents a new admission pod from being created, for instance. One option is to allow administrators to request fail open on specific calls, or to require that certain special resource paths (initializers dynamic config path) are always fail open. Alternatively, it may be desirable for administrative users to bypass admission completely.
Some options:
- Some namespaces are opted out of external admission (kube-system)
- Certain service accounts can bypass external admission checks
An easily accessible policy engine for moderately complex scenarios
It should be easy for a novice Kubernetes administrator to apply simple policy rules to the cluster. In the future it is desirable to have many such policy engines enabled via extension to enable quick policy customization to meet specific needs.