diff --git a/contributors/design-proposals/architecture/resource-management.md b/contributors/design-proposals/architecture/resource-management.md index 472c3a565..6573f9390 100644 --- a/contributors/design-proposals/architecture/resource-management.md +++ b/contributors/design-proposals/architecture/resource-management.md @@ -16,7 +16,7 @@ In Kubernetes, declarative abstractions are primary, rather than layered on top Kubernetes supports declarative control by recording user intent as the desired state in its API resources. This enables a single API schema for each resource to serve as a declarative data model, as both a source and a target for automated components (e.g., autoscalers), and even as an intermediate representation for resource transformations prior to instantiation. -The intent is carried out by asynchronous [controllers](https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md), which interact through the Kubernetes API. Controllers don’t access the state store, etcd, directly, and don’t communicate via private direct APIs. Kubernetes itself does expose some features similar to key-value stores such as etcd and [Zookeeper](https://zookeeper.apache.org/), however, in order to facilitate centralized [state and configuration management and distribution](https://sysgears.com/articles/managing-configuration-of-distributed-system-with-apache-zookeeper/) to decentralized components. +The intent is carried out by asynchronous [controllers](/contributors/devel/sig-api-machinery/controllers.md), which interact through the Kubernetes API. Controllers don’t access the state store, etcd, directly, and don’t communicate via private direct APIs. 
Kubernetes itself does expose some features similar to key-value stores such as etcd and [Zookeeper](https://zookeeper.apache.org/), however, in order to facilitate centralized [state and configuration management and distribution](https://sysgears.com/articles/managing-configuration-of-distributed-system-with-apache-zookeeper/) to decentralized components. Controllers continuously strive to make the observed state match the desired state, and report back their status to the apiserver asynchronously. All of the state, desired and observed, is made visible through the API to users and to other controllers. The API resources serve as coordination points, common intermediate representation, and shared state. @@ -125,4 +125,4 @@ And get: Kubernetes API resource specifications are designed for humans to directly author and read as declarative configuration data, as well as to enable composable configuration tools and automated systems to manipulate them programmatically. We chose this simple approach of using literal API resource specifications for configuration, rather than other representations, because it was natural, given that we designed the API to support CRUD on declarative primitives. The API schema must already be well defined, documented, and supported. With this approach, there’s no other representation to keep up to date with new resources and versions, or to require users to learn. [Declarative configuration](https://goo.gl/T66ZcD) is only one client use case; there are also CLIs (e.g., kubectl), UIs, deployment pipelines, etc. The user will need to interact with the system in terms of the API in these other scenarios, and knowledge of the API transfers to other clients and tools. Additionally, configuration, macro/substitution, and templating languages are generally more difficult to manipulate programmatically than pure data, and involve complexity/expressiveness tradeoffs that prevent one solution being ideal for all use cases.
Such languages/tools could be layered over the native API schemas, if desired, but they should not assume exclusive control over all API fields, because doing so obstructs automation and creates undesirable coupling with the configuration ecosystem. -The Kubernetes Resource Model encourages separation of concerns by supporting multiple distinct configuration sources and preserving declarative intent while allowing automatically set attributes. Properties not explicitly declaratively managed by the user are free to be changed by other clients, enabling the desired state to be cooperatively determined by both users and systems. This is achieved by an operation, called [**Apply**](https://docs.google.com/document/d/1q1UGAIfmOkLSxKhVg7mKknplq3OTDWAIQGWMJandHzg/edit#heading=h.xgjl2srtytjt) ("make it so"), that performs a 3-way merge of the previous configuration, the new configuration, and the live state. A 2-way merge operation, called [strategic merge patch](https://github.com/kubernetes/community/blob/master/contributors/devel/strategic-merge-patch.md), enables patches to be expressed using the same schemas as the resources themselves. Such patches can be used to perform automated updates without custom mutation operations, common updates (e.g., container image updates), combinations of configurations of orthogonal concerns, and configuration customization, such as for overriding properties of variants. +The Kubernetes Resource Model encourages separation of concerns by supporting multiple distinct configuration sources and preserving declarative intent while allowing automatically set attributes. Properties not explicitly declaratively managed by the user are free to be changed by other clients, enabling the desired state to be cooperatively determined by both users and systems. 
This is achieved by an operation, called [**Apply**](https://docs.google.com/document/d/1q1UGAIfmOkLSxKhVg7mKknplq3OTDWAIQGWMJandHzg/edit#heading=h.xgjl2srtytjt) ("make it so"), that performs a 3-way merge of the previous configuration, the new configuration, and the live state. A 2-way merge operation, called [strategic merge patch](https://git.k8s.io/community/contributors/devel/sig-api-machinery/strategic-merge-patch.md), enables patches to be expressed using the same schemas as the resources themselves. Such patches can be used to perform automated updates without custom mutation operations, common updates (e.g., container image updates), combinations of configurations of orthogonal concerns, and configuration customization, such as for overriding properties of variants. diff --git a/contributors/design-proposals/cli/multi-fields-merge-key.md b/contributors/design-proposals/cli/multi-fields-merge-key.md index 9db3d5497..857deb25e 100644 --- a/contributors/design-proposals/cli/multi-fields-merge-key.md +++ b/contributors/design-proposals/cli/multi-fields-merge-key.md @@ -6,7 +6,7 @@ Support multi-fields merge key in Strategic Merge Patch. ## Background -Strategic Merge Patch is covered in this [doc](/contributors/devel/strategic-merge-patch.md). +Strategic Merge Patch is covered in this [doc](/contributors/devel/sig-api-machinery/strategic-merge-patch.md). In Strategic Merge Patch, we use Merge Key to identify the entries in the list of non-primitive types. It must always be present and unique to perform the merge on the list of non-primitive types, and will be preserved.
diff --git a/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md b/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md index 7f6c67d71..1d3c2484a 100644 --- a/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md +++ b/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md @@ -4,7 +4,7 @@ Author: @mengqiy ## Motivation -Background of the Strategic Merge Patch is covered [here](../devel/strategic-merge-patch.md). +Background of the Strategic Merge Patch is covered [here](/contributors/devel/sig-api-machinery/strategic-merge-patch.md). The Kubernetes API may apply semantic meaning to the ordering of items within a list, however the strategic merge patch does not keep the ordering of elements. diff --git a/contributors/design-proposals/storage/csi-snapshot.md b/contributors/design-proposals/storage/csi-snapshot.md index 19c3c38b6..db9abf4f1 100644 --- a/contributors/design-proposals/storage/csi-snapshot.md +++ b/contributors/design-proposals/storage/csi-snapshot.md @@ -292,7 +292,7 @@ As the figure below shows, the CSI snapshot controller architecture consists of * External snapshotter uses ControllerGetCapabilities to find out if CSI driver supports CREATE_DELETE_SNAPSHOT calls. It degrades to trivial mode if not. -* External snapshotter is responsible for creating/deleting snapshots and binding snapshot and SnapshotContent objects. It follows [controller](https://github.com/kubernetes/community/blob/master/contributors/devel/controllers.md) pattern and uses informers to watch for `VolumeSnapshot` and `VolumeSnapshotContent` create/update/delete events. It filters out `VolumeSnapshot` instances with `Snapshotter==` and processes these events in workqueues with exponential backoff. +* External snapshotter is responsible for creating/deleting snapshots and binding snapshot and SnapshotContent objects. 
It follows [controller](/contributors/devel/sig-api-machinery/controllers.md) pattern and uses informers to watch for `VolumeSnapshot` and `VolumeSnapshotContent` create/update/delete events. It filters out `VolumeSnapshot` instances with `Snapshotter==` and processes these events in workqueues with exponential backoff. * For dynamically created snapshot, it should have a VolumeSnapshotClass associated with it. User can explicitly specify a VolumeSnapshotClass in the VolumeSnapshot API object. If user does not specify a VolumeSnapshotClass, a default VolumeSnapshotClass created by the admin will be used. This is similar to how a default StorageClass created by the admin will be used for the provisioning of a PersistentVolumeClaim. diff --git a/contributors/devel/api-conventions.md b/contributors/devel/api-conventions.md index 0e122c5f1..5579aec6d 100644 --- a/contributors/devel/api-conventions.md +++ b/contributors/devel/api-conventions.md @@ -1,3 +1,1372 @@ +<<<<<<< HEAD This file has moved to https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md. +======= +API Conventions +=============== + +Updated: 3/7/2017 + +*This document is oriented at users who want a deeper understanding of the +Kubernetes API structure, and developers wanting to extend the Kubernetes API. 
+An introduction to using resources with kubectl can be found in [the object management overview](https://kubernetes.io/docs/tutorials/object-management-kubectl/object-management/).* + +**Table of Contents** + + + - [Types (Kinds)](#types-kinds) + - [Resources](#resources) + - [Objects](#objects) + - [Metadata](#metadata) + - [Spec and Status](#spec-and-status) + - [Typical status properties](#typical-status-properties) + - [References to related objects](#references-to-related-objects) + - [Lists of named subobjects preferred over maps](#lists-of-named-subobjects-preferred-over-maps) + - [Primitive types](#primitive-types) + - [Constants](#constants) + - [Unions](#unions) + - [Lists and Simple kinds](#lists-and-simple-kinds) + - [Differing Representations](#differing-representations) + - [Verbs on Resources](#verbs-on-resources) + - [PATCH operations](#patch-operations) + - [Strategic Merge Patch](#strategic-merge-patch) + - [Idempotency](#idempotency) + - [Optional vs. Required](#optional-vs-required) + - [Defaulting](#defaulting) + - [Late Initialization](#late-initialization) + - [Concurrency Control and Consistency](#concurrency-control-and-consistency) + - [Serialization Format](#serialization-format) + - [Units](#units) + - [Selecting Fields](#selecting-fields) + - [Object references](#object-references) + - [HTTP Status codes](#http-status-codes) + - [Success codes](#success-codes) + - [Error codes](#error-codes) + - [Response Status Kind](#response-status-kind) + - [Events](#events) + - [Naming conventions](#naming-conventions) + - [Label, selector, and annotation conventions](#label-selector-and-annotation-conventions) + - [WebSockets and SPDY](#websockets-and-spdy) + - [Validation](#validation) + + +The conventions of the [Kubernetes API](https://kubernetes.io/docs/api/) (and related APIs in the +ecosystem) are intended to ease client development and ensure that configuration +mechanisms can be implemented that work across a diverse set of use cases 
+consistently. + +The general style of the Kubernetes API is RESTful - clients create, update, +delete, or retrieve a description of an object via the standard HTTP verbs +(POST, PUT, DELETE, and GET) - and those APIs preferentially accept and return +JSON. Kubernetes also exposes additional endpoints for non-standard verbs and +allows alternative content types. All of the JSON accepted and returned by the +server has a schema, identified by the "kind" and "apiVersion" fields. Where +relevant HTTP header fields exist, they should mirror the content of JSON +fields, but the information should not be represented only in the HTTP header. + +The following terms are defined: + +* **Kind** the name of a particular object schema (e.g. the "Cat" and "Dog" +kinds would have different attributes and properties) +* **Resource** a representation of a system entity, sent or retrieved as JSON +via HTTP to the server. Resources are exposed via: + * Collections - a list of resources of the same type, which may be queryable + * Elements - an individual resource, addressable via a URL +* **API Group** a set of resources that are exposed together. The group name, +along with the version, is exposed in the "apiVersion" field as +"GROUP/VERSION", e.g. "policy.k8s.io/v1". + +Each resource typically accepts and returns data of a single kind. A kind may be +accepted or returned by multiple resources that reflect specific use cases. For +instance, the kind "Pod" is exposed as a "pods" resource that allows end users +to create, update, and delete pods, while a separate "pod status" resource (that +acts on the "Pod" kind) allows automated processes to update a subset of the +fields in that resource. + +Resources are bound together in API groups - each group may have one or more +versions that evolve independently of other API groups, and each version within +the group has one or more resources.
Group names are typically in domain name +form - the Kubernetes project reserves use of the empty group, all single +word names ("extensions", "apps"), and any group name ending in "*.k8s.io" for +its sole use. When choosing a group name, we recommend selecting a subdomain +your group or organization owns, such as "widget.mycompany.com". + +Resource collections should be all lowercase and plural, whereas kinds are +CamelCase and singular. Group names must be lower case and be valid DNS +subdomains. + + +## Types (Kinds) + +Kinds are grouped into three categories: + +1. **Objects** represent a persistent entity in the system. + + Creating an API object is a record of intent - once created, the system will +work to ensure that resource exists. All API objects have common metadata. + + An object may have multiple resources that clients can use to perform +specific actions that create, update, delete, or get. + + Examples: `Pod`, `ReplicationController`, `Service`, `Namespace`, `Node`. + +2. **Lists** are collections of **resources** of one (usually) or more +(occasionally) kinds. + + The name of a list kind must end with "List". Lists have a limited set of +common metadata. All lists use the required "items" field to contain the array +of objects they return. Any kind that has the "items" field must be a list kind. + + Most objects defined in the system should have an endpoint that returns the +full set of resources, as well as zero or more endpoints that return subsets of +the full list. Some objects may be singletons (the current user, the system +defaults) and may not have lists. + + In addition, all lists that return objects with labels should support label +filtering (see [the labels documentation](https://kubernetes.io/docs/user-guide/labels/)), and most +lists should support filtering by fields. + + Examples: `PodLists`, `ServiceLists`, `NodeLists`. + + TODO: Describe field filtering below or in a separate doc. + +3. 
**Simple** kinds are used for specific actions on objects and for +non-persistent entities. + + Given their limited scope, they have the same set of limited common metadata +as lists. + + For instance, the "Status" kind is returned when errors occur and is not +persisted in the system. + + Many simple resources are "subresources", which are rooted at API paths of +specific resources. When resources wish to expose alternative actions or views +that are closely coupled to a single resource, they should do so using new +sub-resources. Common subresources include: + + * `/binding`: Used to bind a resource representing a user request (e.g., Pod, +PersistentVolumeClaim) to a cluster infrastructure resource (e.g., Node, +PersistentVolume). + * `/status`: Used to write just the status portion of a resource. For +example, the `/pods` endpoint only allows updates to `metadata` and `spec`, +since those reflect end-user intent. An automated process should be able to +modify status for users to see by sending an updated Pod kind to the server to +the "/pods/<name>/status" endpoint - the alternate endpoint allows +different rules to be applied to the update, and access to be appropriately +restricted. + * `/scale`: Used to read and write the count of a resource in a manner that +is independent of the specific resource schema. + + Two additional subresources, `proxy` and `portforward`, provide access to +cluster resources as described in +[accessing the cluster](https://kubernetes.io/docs/user-guide/accessing-the-cluster/). + +The standard REST verbs (defined below) MUST return singular JSON objects. Some +API endpoints may deviate from the strict REST pattern and return resources that +are not singular JSON objects, such as streams of JSON objects or unstructured +text log data. + +A common set of "meta" API objects are used across all API groups and are +thus considered part of the server group named `meta.k8s.io`. 
These types may +evolve independent of the API group that uses them and API servers may allow +them to be addressed in their generic form. Examples are `ListOptions`, +`DeleteOptions`, `List`, `Status`, `WatchEvent`, and `Scale`. For historical +reasons these types are part of each existing API group. Generic tools like +quota, garbage collection, autoscalers, and generic clients like kubectl +leverage these types to define consistent behavior across different resource +types, like the interfaces in programming languages. + +The term "kind" is reserved for these "top-level" API types. The term "type" +should be used for distinguishing sub-categories within objects or subobjects. + +### Resources + +All JSON objects returned by an API MUST have the following fields: + +* kind: a string that identifies the schema this object should have +* apiVersion: a string that identifies the version of the schema the object +should have + +These fields are required for proper decoding of the object. They may be +populated by the server by default from the specified URL path, but the client +likely needs to know the values in order to construct the URL path. + +### Objects + +#### Metadata + +Every object kind MUST have the following metadata in a nested object field +called "metadata": + +* namespace: a namespace is a DNS compatible label that objects are subdivided +into. The default namespace is 'default'. See +[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more. +* name: a string that uniquely identifies this object within the current +namespace (see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)). +This value is used in the path when retrieving an individual object. 
+* uid: a unique in time and space value (typically an RFC 4122 generated +identifier, see [the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/)) +used to distinguish between objects with the same name that have been deleted +and recreated + +Every object SHOULD have the following metadata in a nested object field called +"metadata": + +* resourceVersion: a string that identifies the internal version of this object +that can be used by clients to determine when objects have changed. This value +MUST be treated as opaque by clients and passed unmodified back to the server. +Clients should not assume that the resource version has meaning across +namespaces, different kinds of resources, or different servers. (See +[concurrency control](#concurrency-control-and-consistency), below, for more +details.) +* generation: a sequence number representing a specific generation of the +desired state. Set by the system and monotonically increasing, per-resource. May +be compared, such as for RAW and WAW consistency. +* creationTimestamp: a string representing an RFC 3339 date of the date and time +an object was created +* deletionTimestamp: a string representing an RFC 3339 date of the date and time +after which this resource will be deleted. This field is set by the server when +a graceful deletion is requested by the user, and is not directly settable by a +client. The resource will be deleted (no longer visible from resource lists, and +not reachable by name) after the time in this field except when the object has +a finalizer set. In case the finalizer is set the deletion of the object is +postponed at least until the finalizer is removed. +Once the deletionTimestamp is set, this value may not be unset or be set further +into the future, although it may be shortened or the resource may be deleted +prior to this time. 
+* labels: a map of string keys and values that can be used to organize and +categorize objects (see [the labels docs](https://kubernetes.io/docs/user-guide/labels/)) +* annotations: a map of string keys and values that can be used by external +tooling to store and retrieve arbitrary metadata about this object (see +[the annotations docs](https://kubernetes.io/docs/user-guide/annotations/)) + +Labels are intended for organizational purposes by end users (select the pods +that match this label query). Annotations enable third-party automation and +tooling to decorate objects with additional metadata for their own use. + +#### Spec and Status + +By convention, the Kubernetes API makes a distinction between the specification +of the desired state of an object (a nested object field called "spec") and the +status of the object at the current time (a nested object field called +"status"). The specification is a complete description of the desired state, +including configuration settings provided by the user, +[default values](#defaulting) expanded by the system, and properties initialized +or otherwise changed after creation by other ecosystem components (e.g., +schedulers, auto-scalers), and is persisted in stable storage with the API +object. If the specification is deleted, the object will be purged from the +system. The status summarizes the current state of the object in the system, and +is usually persisted with the object by automated processes but may be +generated on the fly. At some cost and perhaps some temporary degradation in +behavior, the status could be reconstructed by observation if it were lost. + +When a new version of an object is POSTed or PUT, the "spec" is updated and +available immediately. Over time the system will work to bring the "status" into +line with the "spec". The system will drive toward the most recent "spec" +regardless of previous versions of that stanza.
For example, if a value is +changed from 2 to 5 in one PUT and then back down to 3 in another PUT, the system +is not required to 'touch base' at 5 before changing the "status" to 3. In other +words, the system's behavior is *level-based* rather than *edge-based*. This +enables robust behavior in the presence of missed intermediate state changes. + +The Kubernetes API also serves as the foundation for the declarative +configuration schema for the system. In order to facilitate level-based +operation and expression of declarative configuration, fields in the +specification should have declarative rather than imperative names and +semantics -- they represent the desired state, not actions intended to yield the +desired state. + +The PUT and POST verbs on objects MUST ignore the "status" values, to avoid +accidentally overwriting the status in read-modify-write scenarios. A `/status` +subresource MUST be provided to enable system components to update statuses of +resources they manage. + +Otherwise, PUT expects the whole object to be specified. Therefore, if a field +is omitted it is assumed that the client wants to clear that field's value. The +PUT verb does not accept partial updates. Modification of just part of an object +may be achieved by GETting the resource, modifying part of the spec, labels, or +annotations, and then PUTting it back. See +[concurrency control](#concurrency-control-and-consistency), below, regarding +read-modify-write consistency when using this pattern. Some objects may expose +alternative resource representations that allow mutating the status or +performing custom actions on the object. + +All objects that represent a physical resource whose state may vary from the +user's desired intent SHOULD have a "spec" and a "status". Objects whose state +cannot vary from the user's desired intent MAY have only "spec", and MAY rename +"spec" to a more appropriate name.
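The level-based behavior described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions: `Widget`, `WidgetSpec`, `WidgetStatus`, and `reconcile` are hypothetical names invented for this example and are not part of the Kubernetes API or of any controller library.

```go
package main

import "fmt"

// WidgetSpec and WidgetStatus are hypothetical types that mirror the
// spec/status split: spec is the desired state, status is the observed state.
type WidgetSpec struct{ Replicas int }
type WidgetStatus struct{ Replicas int }

// Widget is a hypothetical object used only for this sketch.
type Widget struct {
	Spec   WidgetSpec
	Status WidgetStatus
}

// reconcile is level-based: it compares only the latest desired and observed
// values and takes one step toward the desired value. It never looks at the
// history of writes that produced the current spec.
func reconcile(w *Widget) string {
	switch {
	case w.Status.Replicas < w.Spec.Replicas:
		w.Status.Replicas++
		return "scale up"
	case w.Status.Replicas > w.Spec.Replicas:
		w.Status.Replicas--
		return "scale down"
	default:
		return "in sync"
	}
}

func main() {
	w := &Widget{Spec: WidgetSpec{Replicas: 2}, Status: WidgetStatus{Replicas: 2}}

	// Two writes land before the controller runs: 2 -> 5, then 5 -> 3.
	w.Spec.Replicas = 5
	w.Spec.Replicas = 3

	for {
		action := reconcile(w)
		fmt.Printf("%s: status.replicas=%d\n", action, w.Status.Replicas)
		if action == "in sync" {
			break
		}
	}
}
```

Because `reconcile` reads only the current spec, the status moves from 2 to 3 without ever visiting 5, which is the "missed intermediate state" case the text describes.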
+ +Objects that contain both spec and status should not contain additional +top-level fields other than the standard metadata fields. + +Some objects which are not persisted in the system - such as `SubjectAccessReview` +and other webhook style calls - may choose to add spec and status to encapsulate +a "call and response" pattern. The spec is the request (often a request for +information) and the status is the response. For these RPC like objects the only +operation may be POST, but having a consistent schema between submission and +response reduces the complexity of these clients. + + +##### Typical status properties + +**Conditions** represent the latest available observations of an object's +state. They are an extension mechanism intended to be used when the details of +an observation are not a priori known or would not apply to all instances of a +given Kind. For observations that are well known and apply to all instances, a +regular field is preferred. An example of a Condition that probably should +have been a regular field is Pod's "Ready" condition - it is managed by core +controllers, it is well understood, and it applies to all Pods. + +Objects may report multiple conditions, and new types of conditions may be +added in the future or by 3rd party controllers. Therefore, conditions are +represented using a list/slice, where all have similar structure. 
+ +The `FooCondition` type for some resource type `Foo` may include a subset of the +following fields, but must contain at least `type` and `status` fields: + +```go + Type FooConditionType `json:"type" description:"type of Foo condition"` + Status ConditionStatus `json:"status" description:"status of the condition, one of True, False, Unknown"` + + // +optional + Reason *string `json:"reason,omitempty" description:"one-word CamelCase reason for the condition's last transition"` + // +optional + Message *string `json:"message,omitempty" description:"human-readable message indicating details about last transition"` + + // +optional + LastHeartbeatTime *unversioned.Time `json:"lastHeartbeatTime,omitempty" description:"last time we got an update on a given condition"` + // +optional + LastTransitionTime *unversioned.Time `json:"lastTransitionTime,omitempty" description:"last time the condition transitioned from one status to another"` +``` + +Additional fields may be added in the future. + +Do not use fields that you don't need - simpler is better. + +Use of the `Reason` field is encouraged. + +Use the `LastHeartbeatTime` with great caution - frequent changes to this field +can cause a large fan-out effect for some resources. + +Conditions should be added to explicitly convey properties that users and +components care about rather than requiring those properties to be inferred from +other observations. Once defined, the meaning of a Condition cannot be +changed arbitrarily - it becomes part of the API, and has the same backwards- +and forwards-compatibility concerns as any other part of the API. + +Condition status values may be `True`, `False`, or `Unknown`. The absence of a +condition should be interpreted the same as `Unknown`. How controllers handle +`Unknown` depends on the Condition in question. + +Condition types should indicate state in the "abnormal-true" polarity.
For +example, if the condition indicates when a policy is invalid, the "is valid" +case is probably the norm, so the condition should be called "Invalid". + +The thinking around conditions has evolved over time, so there are several +non-normative examples in wide use. + +In general, condition values may change back and forth, but some condition +transitions may be monotonic, depending on the resource and condition type. +However, conditions are observations and not, themselves, state machines, nor do +we define comprehensive state machines for objects, nor behaviors associated +with state transitions. The system is level-based rather than edge-triggered, +and should assume an Open World. + +An example of an oscillating condition type is `Ready` (despite it running +afoul of current guidance), which indicates the object was believed to be fully +operational at the time it was last probed. A possible monotonic condition +could be `Failed`. A `True` status for `Failed` would imply failure with no +retry. An object that was still active would generally not have a `Failed` +condition. + +Some resources in the v1 API contain fields called **`phase`**, and associated +`message`, `reason`, and other status fields. The pattern of using `phase` is +deprecated. Newer API types should use conditions instead. Phase was +essentially a state-machine enumeration field, that contradicted [system-design +principles](../design-proposals/architecture/principles.md#control-logic) and +hampered evolution, since [adding new enum values breaks backward +compatibility](api_changes.md). Rather than encouraging clients to infer +implicit properties from phases, we prefer to explicitly expose the individual +conditions that clients need to monitor. Conditions also have the benefit that +it is possible to create some conditions with uniform meaning across all +resource types, while still exposing others that are unique to specific +resource types. 
See [#7856](http://issues.k8s.io/7856) for more details and +discussion. + +In condition types, and everywhere else they appear in the API, **`Reason`** is +intended to be a one-word, CamelCase representation of the category of cause of +the current status, and **`Message`** is intended to be a human-readable phrase +or sentence, which may contain specific details of the individual occurrence. +`Reason` is intended to be used in concise output, such as one-line +`kubectl get` output, and in summarizing occurrences of causes, whereas +`Message` is intended to be presented to users in detailed status explanations, +such as `kubectl describe` output. + +Historical information status (e.g., last transition time, failure counts) is +only provided with reasonable effort, and is not guaranteed to not be lost. + +Status information that may be large (especially proportional in size to +collections of other resources, such as lists of references to other objects -- +see below) and/or rapidly changing, such as +[resource usage](../design-proposals/scheduling/resources.md#usage-data), should be put into separate +objects, with possibly a reference from the original object. This helps to +ensure that GETs and watch remain reasonably efficient for the majority of +clients, which may not need that data. + +Some resources report the `observedGeneration`, which is the `generation` most +recently observed by the component responsible for acting upon changes to the +desired state of the resource. This can be used, for instance, to ensure that +the reported status reflects the most recent desired status. + +#### References to related objects + +References to loosely coupled sets of objects, such as +[pods](https://kubernetes.io/docs/user-guide/pods/) overseen by a +[replication controller](https://kubernetes.io/docs/user-guide/replication-controller/), are usually +best referred to using a [label selector](https://kubernetes.io/docs/user-guide/labels/). 
In order to +ensure that GETs of individual objects remain bounded in time and space, these +sets may be queried via separate API queries, but will not be expanded in the +referring object's status. + +References to specific objects, especially specific resource versions and/or +specific fields of those objects, are specified using the `ObjectReference` type +(or other types representing strict subsets of it). Unlike partial URLs, the +ObjectReference type facilitates flexible defaulting of fields from the +referring object or other contextual information. + +References in the status of the referee to the referrer may be permitted, when +the references are one-to-one and do not need to be frequently updated, +particularly in an edge-based manner. + +#### Lists of named subobjects preferred over maps + +Discussed in [#2004](http://issue.k8s.io/2004) and elsewhere. There are no maps +of subobjects in any API objects. Instead, the convention is to use a list of +subobjects containing name fields. + +For example: + +```yaml +ports: + - name: www + containerPort: 80 +``` + +vs. + +```yaml +ports: + www: + containerPort: 80 +``` + +This rule maintains the invariant that all JSON/YAML keys are fields in API +objects. The only exceptions are pure maps in the API (currently, labels, +selectors, annotations, data), as opposed to sets of subobjects. + +#### Primitive types + +* Avoid floating-point values as much as possible, and never use them in spec. + Floating-point values cannot be reliably round-tripped (encoded and + re-decoded) without changing, and have varying precision and representations + across languages and architectures. +* All numbers (e.g., uint32, int64) are converted to float64 by Javascript and + some other languages, so any field which is expected to exceed that either in + magnitude or in precision (specifically integer values > 53 bits) should be + serialized and accepted as strings. 
+* Do not use unsigned integers, due to inconsistent support across languages and + libraries. Just validate that the integer is non-negative if that's the case. +* Do not use enums. Use aliases for string instead (e.g., `NodeConditionType`). +* Look at similar fields in the API (e.g., ports, durations) and follow the + conventions of existing fields. +* All public integer fields MUST use the Go `(u)int32` or Go `(u)int64` types, + not `(u)int` (which is ambiguous depending on target platform). Internal + types may use `(u)int`. +* Think twice about `bool` fields. Many ideas start as boolean but eventually + trend towards a small set of mutually exclusive options. Plan for future + expansions by describing the policy options explicitly as a string type + alias (e.g. `TerminationMessagePolicy`). + +#### Constants + +Some fields will have a list of allowed values (enumerations). These values will +be strings, and they will be in CamelCase, with an initial uppercase letter. +Examples: `ClusterFirst`, `Pending`, `ClientIP`. + +#### Unions + +Sometimes, at most one of a set of fields can be set. For example, the +[volumes] field of a PodSpec has 17 different volume type-specific fields, such +as `nfs` and `iscsi`. All fields in the set should be +[Optional](#optional-vs-required). + +Sometimes, when a new type is created, the API designer may anticipate that a +union will be needed in the future, even if only one field is allowed initially. +In this case, be sure to make the field [Optional](#optional-vs-required). +In the validation, you may still return an error if the sole field is +unset. Do not set a default value for that field. + +### Lists and Simple kinds + +Every list or simple kind SHOULD have the following metadata in a nested object +field called "metadata": + +* resourceVersion: a string that identifies the common version of the objects +returned in a list.
This value MUST be treated as opaque by clients and +passed unmodified back to the server. A resource version is only valid within a +single namespace on a single kind of resource. + +Every simple kind returned by the server, and any simple kind sent to the server +that must support idempotency or optimistic concurrency should return this +value. Since simple resources are often used as input to alternate actions that +modify objects, the resource version of the simple resource should correspond to +the resource version of the object. + + +## Differing Representations + +An API may represent a single entity in different ways for different clients, or +transform an object after certain transitions in the system occur. In these +cases, one request object may have two representations available as different +resources, or different kinds. + +An example is a Service, which represents the intent of the user to group a set +of pods with common behavior on common ports. When Kubernetes detects a pod +matches the service selector, the IP address and port of the pod are added to an +Endpoints resource for that Service. The Endpoints resource exists only if the +Service exists, but exposes only the IPs and ports of the selected pods. The +full service is represented by two distinct resources - under the original +Service resource the user created, as well as in the Endpoints resource. + +As another example, a "pod status" resource may accept a PUT with the "pod" +kind, with different rules about what fields may be changed. + +Future versions of Kubernetes may allow alternative encodings of objects beyond +JSON. + + +## Verbs on Resources + +API resources should use the traditional REST pattern: + +* GET /<resourceNamePlural> - Retrieve a list of type +<resourceName>, e.g. GET /pods returns a list of Pods. +* POST /<resourceNamePlural> - Create a new resource from the JSON object +provided by the client.
+* GET /<resourceNamePlural>/<name> - Retrieve a single resource +with the given name, e.g. GET /pods/first returns a Pod named 'first'. Should be +constant time, and the resource should be bounded in size. +* DELETE /<resourceNamePlural>/<name> - Delete the single resource +with the given name. DeleteOptions may specify gracePeriodSeconds, the optional +duration in seconds before the object should be deleted. Individual kinds may +declare fields which provide a default grace period, and different kinds may +have differing kind-wide default grace periods. A user-provided grace period +overrides a default grace period, including the zero grace period ("now"). +* PUT /<resourceNamePlural>/<name> - Update or create the resource +with the given name with the JSON object provided by the client. +* PATCH /<resourceNamePlural>/<name> - Selectively modify the +specified fields of the resource. See more information [below](#patch-operations). +* GET /<resourceNamePlural>?watch=true - Receive a stream of JSON +objects corresponding to changes made to any resource of the given kind over +time. + +### PATCH operations + +The API supports three different PATCH operations, determined by their +corresponding Content-Type header: + +* JSON Patch, `Content-Type: application/json-patch+json` + * As defined in [RFC6902](https://tools.ietf.org/html/rfc6902), a JSON Patch is +a sequence of operations that are executed on the resource, e.g. `{"op": "add", +"path": "/a/b/c", "value": [ "foo", "bar" ]}`. For more details on how to use +JSON Patch, see the RFC. +* Merge Patch, `Content-Type: application/merge-patch+json` + * As defined in [RFC7386](https://tools.ietf.org/html/rfc7386), a Merge Patch +is essentially a partial representation of the resource. The submitted JSON is +"merged" with the current resource to create a new one, then the new one is +saved. For more details on how to use Merge Patch, see the RFC.
+* Strategic Merge Patch, `Content-Type: application/strategic-merge-patch+json` + * Strategic Merge Patch is a custom implementation of Merge Patch. For a +detailed explanation of how it works and why it needed to be introduced, see +below. + +#### Strategic Merge Patch + +Details of Strategic Merge Patch are covered [here](/contributors/devel/sig-api-machinery/strategic-merge-patch.md). + +## Idempotency + +All compatible Kubernetes APIs MUST support "name idempotency" and respond with +an HTTP status code 409 when a request is made to POST an object that has the +same name as an existing object in the system. See +[the identifiers docs](https://kubernetes.io/docs/user-guide/identifiers/) for details. + +Names generated by the system may be requested using `metadata.generateName`. +GenerateName indicates that the name should be made unique by the server prior +to persisting it. A non-empty value for the field indicates the name will be +made unique (and the name returned to the client will be different than the name +passed). The value of this field will be combined with a unique suffix on the +server if the Name field has not been provided. The provided value must be valid +within the rules for Name, and may be truncated by the length of the suffix +required to make the value unique on the server. If this field is specified, and +Name is not present, the server will NOT return a 409 if the generated name +exists - instead, it will either return 201 Created or 504 with Reason +`ServerTimeout` indicating a unique name could not be found in the time +allotted, and the client should retry (optionally after the time indicated in +the Retry-After header). + +## Optional vs. Required + +Fields must be either optional or required. + +Optional fields have the following properties: + +- They have the `+optional` comment tag in Go. +- They are a pointer type in the Go definition (e.g. `awesomeFlag *bool`) or +have a built-in `nil` value (e.g. maps and slices).
+- The API server should allow POSTing and PUTing a resource with this field +unset. + +In most cases, optional fields should also have the `omitempty` struct tag (the +`omitempty` option specifies that the field should be omitted from the JSON +encoding if the field has an empty value). However, if you want to have +different logic for an optional field which is not provided vs. provided with +empty values, do not use `omitempty` (e.g. https://github.com/kubernetes/kubernetes/issues/34641). + +Note that for backward compatibility, any field that has the `omitempty` struct +tag will be considered to be optional, but this may change in the future, and having +the `+optional` comment tag is highly recommended. + +Required fields have the opposite properties, namely: + +- They do not have an `+optional` comment tag. +- They do not have an `omitempty` struct tag. +- They are not a pointer type in the Go definition (e.g. `otherFlag bool`). +- The API server should not allow POSTing or PUTing a resource with this field +unset. + +Using the `+optional` or the `omitempty` tag causes OpenAPI documentation to +reflect that the field is optional. + +Using a pointer allows distinguishing unset from the zero value for that type. +There are some cases where, in principle, a pointer is not needed for an +optional field since the zero value is forbidden, and thus implies unset. There +are examples of this in the codebase. However: + +- it can be difficult for implementors to anticipate all cases where an empty +value might need to be distinguished from a zero value +- structs are not omitted from encoder output even where omitempty is specified, +which is messy; +- having a pointer consistently imply optional is clearer for users of the Go +language client, and any other clients that use corresponding types + +Therefore, we ask that pointers always be used with optional fields that do not +have a built-in `nil` value.
+ + +## Defaulting + +Default resource values are API version-specific, and they are applied during +the conversion from API-versioned declarative configuration to internal objects +representing the desired state (`Spec`) of the resource. Subsequent GETs of the +resource will include the default values explicitly. + +Incorporating the default values into the `Spec` ensures that `Spec` depicts the +full desired state so that it is easier for the system to determine how to +achieve the state, and for the user to know what to anticipate. + +API version-specific default values are set by the API server. + +## Late Initialization + +Late initialization is when resource fields are set by a system controller +after an object is created/updated. + +For example, the scheduler sets the `pod.spec.nodeName` field after the pod is +created. + +Late-initializers should only make the following types of modifications: + - Setting previously unset fields + - Adding keys to maps + - Adding values to arrays which have mergeable semantics +(`patchStrategy:"merge"` attribute in the type definition). + +These conventions: + 1. allow a user (with sufficient privilege) to override any system-default + behaviors by setting the fields that would otherwise have been defaulted. + 1. enables updates from users to be merged with changes made during late +initialization, using strategic merge patch, as opposed to clobbering the +change. + 1. allow the component which does the late-initialization to use strategic +merge patch, which facilitates composition and concurrency of such components. + +Although the apiserver Admission Control stage acts prior to object creation, +Admission Control plugins should follow the Late Initialization conventions +too, to allow their implementation to be later moved to a 'controller', or to +client libraries. + +## Concurrency Control and Consistency + +Kubernetes leverages the concept of *resource versions* to achieve optimistic +concurrency. 
All Kubernetes resources have a "resourceVersion" field as part of +their metadata. This resourceVersion is a string that identifies the internal +version of an object that can be used by clients to determine when objects have +changed. When a record is about to be updated, its version is checked against a +pre-saved value, and if it doesn't match, the update fails with a StatusConflict +(HTTP status code 409). + +The resourceVersion is changed by the server every time an object is modified. +If resourceVersion is included with the PUT operation, the system will verify +that there have not been other successful mutations to the resource during a +read/modify/write cycle, by verifying that the current value of resourceVersion +matches the specified value. + +The resourceVersion is currently backed by [etcd's +modifiedIndex](https://coreos.com/etcd/docs/latest/v2/api.html). +However, it's important to note that the application should *not* rely on the +implementation details of the versioning system maintained by Kubernetes. We may +change the implementation of resourceVersion in the future, such as to change it +to a timestamp or per-object counter. + +The only way for a client to know the expected value of resourceVersion is to +have received it from the server in response to a prior operation, typically a +GET. This value MUST be treated as opaque by clients and passed unmodified back +to the server. Clients should not assume that the resource version has meaning +across namespaces, different kinds of resources, or different servers. +Currently, the value of resourceVersion is set to match etcd's sequencer. You +could think of it as a logical clock the API server can use to order requests. +However, we expect the implementation of resourceVersion to change in the +future, such as in the case we shard the state by kind and/or namespace, or port +to another storage system.
+ +In the case of a conflict, the correct client action at this point is to GET the +resource again, apply the changes afresh, and try submitting again. This +mechanism can be used to prevent races like the following: + +``` +Client #1 Client #2 +GET Foo GET Foo +Set Foo.Bar = "one" Set Foo.Baz = "two" +PUT Foo PUT Foo +``` + +When these sequences occur in parallel, either the change to Foo.Bar or the +change to Foo.Baz can be lost. + +On the other hand, when specifying the resourceVersion, one of the PUTs will +fail, since whichever write succeeds changes the resourceVersion for Foo. + +resourceVersion may be used as a precondition for other operations (e.g., GET, +DELETE) in the future, such as for read-after-write consistency in the presence +of caching. + +"Watch" operations specify resourceVersion using a query parameter. It is used +to specify the point at which to begin watching the specified resources. This +may be used to ensure that no mutations are missed between a GET of a resource +(or list of resources) and a subsequent Watch, even if the current version of +the resource is more recent. This is currently the main reason that list +operations (GET on a collection) return resourceVersion. + + +## Serialization Format + +APIs may return alternative representations of any resource in response to an +Accept header or under alternative endpoints, but the default serialization for +input and output of API responses MUST be JSON. + +A protobuf encoding is also accepted for built-in resources. As proto is not +self-describing, there is an envelope wrapper which describes the type of +the contents. + +All dates should be serialized as RFC3339 strings. + +## Units + +Units must either be explicit in the field name (e.g., `timeoutSeconds`), or +must be specified as part of the value (e.g., `resource.Quantity`). Which +approach is preferred is TBD, though currently we use the `fooSeconds` +convention for durations. 
+ +Duration fields must be represented as integer fields with units being +part of the field name (e.g. `leaseDurationSeconds`). We don't use Duration +in the API since that would require clients to implement Go-compatible parsing. + +## Selecting Fields + +Some APIs may need to identify which field in a JSON object is invalid, or to +reference a value to extract from a separate resource. The current +recommendation is to use standard JavaScript syntax for accessing that field, +assuming the JSON object was transformed into a JavaScript object, without the +leading dot, such as `metadata.name`. + +Examples: + +* Find the field "current" in the object "state" in the second item in the array +"fields": `fields[1].state.current` + +## Object references + +Object references should either be called `fooName` if referring to an object of +kind `Foo` by just the name (within the current namespace, if a namespaced +resource), or should be called `fooRef`, and should contain a subset of the +fields of the `ObjectReference` type. + + +TODO: Plugins, extensions, nested kinds, headers + + +## HTTP Status codes + +The server will respond with HTTP status codes that match the HTTP spec. See the +section below for a breakdown of the types of status codes the server will send. + +The following HTTP status codes may be returned by the API. + +#### Success codes + +* `200 StatusOK` + * Indicates that the request completed successfully. +* `201 StatusCreated` + * Indicates that the request to create kind completed successfully. +* `204 StatusNoContent` + * Indicates that the request completed successfully, and the response contains +no body. + * Returned in response to HTTP OPTIONS requests. + +#### Error codes + +* `307 StatusTemporaryRedirect` + * Indicates that the address for the requested resource has changed. + * Suggested client recovery behavior: + * Follow the redirect. + + +* `400 StatusBadRequest` + * Indicates that the request is invalid.
+ * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `401 StatusUnauthorized` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because the client must provide +authorization. If the client has provided authorization, the server is +indicating the provided authorization is unsuitable or invalid. + * Suggested client recovery behavior: + * If the user has not supplied authorization information, prompt them for +the appropriate credentials. If the user has supplied authorization information, +inform them their credentials were rejected and optionally prompt them again. + + +* `403 StatusForbidden` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because it is configured to deny access for +some reason to the requested resource by the client. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `404 StatusNotFound` + * Indicates that the requested resource does not exist. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `405 StatusMethodNotAllowed` + * Indicates that the action the client attempted to perform on the resource +was not supported by the code. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `409 StatusConflict` + * Indicates that either the resource the client attempted to create already +exists or the requested update operation cannot be completed due to a conflict. + * Suggested client recovery behavior: + * * If creating a new resource: + * * Either change the identifier and try again, or GET and compare the +fields in the pre-existing object and issue a PUT/update to modify the existing +object. + * * If updating an existing resource: + * See `Conflict` from the `status` response section below on how to +retrieve more information about the nature of the conflict. 
+ * GET and compare the fields in the pre-existing object, merge changes (if +still valid according to preconditions), and retry with the updated request +(including `ResourceVersion`). + + +* `410 StatusGone` + * Indicates that the item is no longer available at the server and no +forwarding address is known. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `422 StatusUnprocessableEntity` + * Indicates that the requested create or update operation cannot be completed +due to invalid data provided as part of the request. + * Suggested client recovery behavior: + * Do not retry. Fix the request. + + +* `429 StatusTooManyRequests` + * Indicates that either the client rate limit has been exceeded or the +server has received more requests than it can process. + * Suggested client recovery behavior: + * Read the `Retry-After` HTTP header from the response, and wait at least +that long before retrying. + + +* `500 StatusInternalServerError` + * Indicates that the server can be reached and understood the request, but +either an unexpected internal error occurred and the outcome of the call is +unknown, or the server cannot complete the action in a reasonable time (this may +be due to temporary server load or a transient communication issue with another +server). + * Suggested client recovery behavior: + * Retry with exponential backoff. + + +* `503 StatusServiceUnavailable` + * Indicates that a required service is unavailable. + * Suggested client recovery behavior: + * Retry with exponential backoff. + + +* `504 StatusServerTimeout` + * Indicates that the request could not be completed within the given time. +Clients can get this response ONLY when they specified a timeout param in the +request. + * Suggested client recovery behavior: + * Increase the value of the timeout param and retry with exponential +backoff. + +## Response Status Kind + +Kubernetes will always return the `Status` kind from any API endpoint when an +error occurs.
Clients SHOULD handle these types of objects when appropriate. + +A `Status` kind will be returned by the API in two cases: + * When an operation is not successful (i.e. when the server would return a non +2xx HTTP status code). + * When an HTTP `DELETE` call is successful. + +The status object is encoded as JSON and provided as the body of the response. +The status object contains fields for humans and machine consumers of the API to +get more detailed information for the cause of the failure. The information in +the status object supplements, but does not override, the HTTP status code's +meaning. When fields in the status object have the same meaning as generally +defined HTTP headers and that header is returned with the response, the header +should be considered as having higher priority. + +**Example:** + +```console +$ curl -v -k -H "Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc" https://10.240.122.184:443/api/v1/namespaces/default/pods/grafana + +> GET /api/v1/namespaces/default/pods/grafana HTTP/1.1 +> User-Agent: curl/7.26.0 +> Host: 10.240.122.184 +> Accept: */* +> Authorization: Bearer WhCDvq4VPpYhrcfmF6ei7V9qlbqTubUc +> + +< HTTP/1.1 404 Not Found +< Content-Type: application/json +< Date: Wed, 20 May 2015 18:10:42 GMT +< Content-Length: 232 +< +{ + "kind": "Status", + "apiVersion": "v1", + "metadata": {}, + "status": "Failure", + "message": "pods \"grafana\" not found", + "reason": "NotFound", + "details": { + "name": "grafana", + "kind": "pods" + }, + "code": 404 +} +``` + +The `status` field contains one of two possible values: +* `Success` +* `Failure` + +`message` may contain a human-readable description of the error. + +`reason` may contain a machine-readable, one-word, CamelCase description of why +this operation is in the `Failure` status. If this value is empty there is no +information available. The `reason` clarifies an HTTP status code but does not +override it. + +`details` may contain extended data associated with the reason.
Each reason may +define its own extended details. This field is optional and the data returned is +not guaranteed to conform to any schema except that defined by the reason type. + +Possible values for the `reason` and `details` fields: +* `BadRequest` + * Indicates that the request itself was invalid, because the request doesn't +make any sense, for example deleting a read-only object. + * This is different than `status reason` `Invalid` below, which indicates that +the API call could possibly succeed, but the data was invalid. + * API calls that return BadRequest can never succeed. + * HTTP status code: `400 StatusBadRequest` + + +* `Unauthorized` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action without the client providing appropriate +authorization. If the client has provided authorization, this error indicates +the provided credentials are insufficient or invalid. + * Details (optional): + * `kind string` + * The kind attribute of the unauthorized resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the unauthorized resource. + * HTTP status code: `401 StatusUnauthorized` + + +* `Forbidden` + * Indicates that the server can be reached and understood the request, but +refuses to take any further action, because it is configured to deny access for +some reason to the requested resource by the client. + * Details (optional): + * `kind string` + * The kind attribute of the forbidden resource (on some operations may +differ from the requested resource). + * `name string` + * The identifier of the forbidden resource. + * HTTP status code: `403 StatusForbidden` + + +* `NotFound` + * Indicates that one or more resources required for this operation could not +be found. + * Details (optional): + * `kind string` + * The kind attribute of the missing resource (on some operations may +differ from the requested resource).
+ * `name string` + * The identifier of the missing resource. + * HTTP status code: `404 StatusNotFound` + + +* `AlreadyExists` + * Indicates that the resource you are creating already exists. + * Details (optional): + * `kind string` + * The kind attribute of the conflicting resource. + * `name string` + * The identifier of the conflicting resource. + * HTTP status code: `409 StatusConflict` + +* `Conflict` + * Indicates that the requested update operation cannot be completed due to a +conflict. The client may need to alter the request. Each resource may define +custom details that indicate the nature of the conflict. + * HTTP status code: `409 StatusConflict` + + +* `Invalid` + * Indicates that the requested create or update operation cannot be completed +due to invalid data provided as part of the request. + * Details (optional): + * `kind string` + * The kind attribute of the invalid resource. + * `name string` + * The identifier of the invalid resource. + * `causes` + * One or more `StatusCause` entries indicating the data in the provided +resource that was invalid. The `reason`, `message`, and `field` attributes will +be set. + * HTTP status code: `422 StatusUnprocessableEntity` + + +* `Timeout` + * Indicates that the request could not be completed within the given time. +Clients may receive this response if the server has decided to rate limit the +client, or if the server is overloaded and cannot process the request at this +time. + * HTTP status code: `429 StatusTooManyRequests` + * The server should set the `Retry-After` HTTP header and return +`retryAfterSeconds` in the details field of the object. A value of `0` is the +default. + + +* `ServerTimeout` + * Indicates that the server can be reached and understood the request, but +cannot complete the action in a reasonable time. This may be due to temporary +server load or a transient communication issue with another server.
+ * Details (optional): + * `kind string` + * The kind attribute of the resource being acted on. + * `name string` + * The operation that is being attempted. + * The server should set the `Retry-After` HTTP header and return +`retryAfterSeconds` in the details field of the object. A value of `0` is the +default. + * HTTP status code: `504 StatusServerTimeout` + + +* `MethodNotAllowed` + * Indicates that the action the client attempted to perform on the resource +was not supported by the code. + * For instance, attempting to delete a resource that can only be created. + * API calls that return MethodNotAllowed can never succeed. + * HTTP status code: `405 StatusMethodNotAllowed` + + +* `InternalError` + * Indicates that an unexpected internal error occurred and the outcome +of the call is unknown. + * Details (optional): + * `causes` + * The original error. + * HTTP status code: `500 StatusInternalServerError` + +`code` may contain the suggested HTTP return code for this status. + + +## Events + +Events are complementary to status information, since they can provide some +historical information about status and occurrences in addition to current or +previous status. Generate events for situations users or administrators should +be alerted about. + +Choose a unique, specific, short, CamelCase reason for each event category. For +example, `FreeDiskSpaceInvalid` is a good event reason because it is likely to +refer to just one situation, but `Started` is not a good reason because it +doesn't sufficiently indicate what started, even when combined with other event +fields. + +`Error creating foo` or `Error creating foo %s` would be appropriate for an +event message, with the latter being preferable, since it is more informational. + +Accumulate repeated events in the client, especially for frequent events, to +reduce data volume, load on the system, and noise exposed to users. + +## Naming conventions + +* Go field names must be CamelCase.
JSON field names must be camelCase. Other +than capitalization of the initial letter, the two should almost always match. +No underscores nor dashes in either. +* Field and resource names should be declarative, not imperative (DoSomething, +SomethingDoer, DoneBy, DoneAt). +* Use `Node` where referring to +the node resource in the context of the cluster. Use `Host` where referring to +properties of the individual physical/virtual system, such as `hostname`, +`hostPath`, `hostNetwork`, etc. +* `FooController` is a deprecated kind naming convention. Name the kind after +the thing being controlled instead (e.g., `Job` rather than `JobController`). +* The name of a field that specifies the time at which `something` occurs should +be called `somethingTime`. Do not use `stamp` (e.g., `creationTimestamp`). +* We use the `fooSeconds` convention for durations, as discussed in the [units +subsection](#units). + * `fooPeriodSeconds` is preferred for periodic intervals and other waiting +periods (e.g., over `fooIntervalSeconds`). + * `fooTimeoutSeconds` is preferred for inactivity/unresponsiveness deadlines. + * `fooDeadlineSeconds` is preferred for activity completion deadlines. +* Do not use abbreviations in the API, except where they are extremely commonly +used, such as "id", "args", or "stdin". +* Acronyms should similarly only be used when extremely commonly known. All +letters in the acronym should have the same case, using the appropriate case for +the situation. For example, at the beginning of a field name, the acronym should +be all lowercase, such as "httpGet". Where used as a constant, all letters +should be uppercase, such as "TCP" or "UDP". +* The name of a field referring to another resource of kind `Foo` by name should +be called `fooName`. The name of a field referring to another resource of kind +`Foo` by ObjectReference (or subset thereof) should be called `fooRef`. 
+* More generally, include the units and/or type in the field name if they could +be ambiguous and they are not specified by the value or value type. +* The name of a field expressing a boolean property called 'fooable' should be +called `Fooable`, not `IsFooable`. + +### Namespace Names +* The name of a namespace must be a +[DNS_LABEL](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/architecture/identifiers.md). +* The `kube-` prefix is reserved for Kubernetes system namespaces, e.g. `kube-system` and `kube-public`. +* See +[the namespace docs](https://kubernetes.io/docs/user-guide/namespaces/) for more information. + +## Label, selector, and annotation conventions + +Labels are the domain of users. They are intended to facilitate organization and +management of API resources using attributes that are meaningful to users, as +opposed to meaningful to the system. Think of them as user-created mp3 or email +inbox labels, as opposed to the directory structure used by a program to store +its data. The former enables the user to apply an arbitrary ontology, whereas +the latter is implementation-centric and inflexible. Users will use labels to +select resources to operate on, display label values in CLI/UI columns, etc. +Users should always retain full power and flexibility over the label schemas +they apply to labels in their namespaces. + +However, we should support conveniences for common cases by default. For +example, what we now do in ReplicationController is automatically set the RC's +selector and labels to the labels in the pod template by default, if they are +not already set. That ensures that the selector will match the template, and +that the RC can be managed using the same labels as the pods it creates. Note +that once we generalize selectors, it won't necessarily be possible to +unambiguously generate labels that match an arbitrary selector. 
+ +If the user wants to apply additional labels to the pods that they don't select +upon, such as to facilitate adoption of pods or in the expectation that some +label values will change, they can set the selector to a subset of the pod +labels. Similarly, the RC's labels could be initialized to a subset of the pod +template's labels, or could include additional/different labels. + +For disciplined users managing resources within their own namespaces, it's not +that hard to consistently apply schemas that ensure uniqueness. One just needs +to ensure that at least one value of some label key in common differs compared +to all other comparable resources. We could/should provide a verification tool +to check that. However, development of conventions similar to the examples in +[Labels](https://kubernetes.io/docs/user-guide/labels/) makes uniqueness straightforward. Furthermore, +relatively narrowly used namespaces (e.g., per environment, per application) can +be used to reduce the set of resources that could potentially cause overlap. + +In cases where users could be running miscellaneous examples with inconsistent schemas, +or where tooling or components need to programmatically generate new objects to +be selected, there needs to be a straightforward way to generate unique label +sets. A simple way to ensure uniqueness of the set is to ensure uniqueness of a +single label value, such as by using a resource name, uid, resource hash, or +generation number. + +Problems with uids and hashes, however, include that they have no semantic +meaning to the user, are not memorable nor readily recognizable, and are not +predictable. Lack of predictability obstructs use cases such as creation of a +replication controller from a pod, such as people want to do when exploring the +system, bootstrapping a self-hosted cluster, or deletion and re-creation of a +new RC that adopts the pods of the previous one, such as to rename it.
+Generation numbers are more predictable and much clearer, assuming there is a +logical sequence. Fortunately, for deployments that's the case. For jobs, use of +creation timestamps is common internally. Users should always be able to turn +off auto-generation, in order to permit some of the scenarios described above. +Note that auto-generated labels will also become one more field that needs to be +stripped out when cloning a resource, within a namespace, in a new namespace, in +a new cluster, etc., and will need to be ignored when updating a resource +via a patch or read-modify-write sequence. + +Inclusion of a system prefix in a label key is fairly hostile to UX. A prefix is +only necessary in the case that the user cannot choose the label key, in order +to avoid collisions with user-defined labels. However, I firmly believe that the +user should always be allowed to select the label keys to use on their +resources, so it should always be possible to override default label keys. + +Therefore, resources supporting auto-generation of unique labels should have a +`uniqueLabelKey` field, so that the user could specify the key if they wanted +to, but if unspecified, it could be set by default, such as to the resource +type, like job, deployment, or replicationController. The value would need to be +at least spatially unique, and perhaps temporally unique in the case of job. + +Annotations have very different intended usage from labels. They are +primarily generated and consumed by tooling and system extensions, or are used +by end-users to engage non-standard behavior of components. For example, an +annotation might be used to indicate that an instance of a resource expects +additional handling by non-Kubernetes controllers. Annotations may carry +arbitrary payloads, including JSON documents. Like labels, annotation keys can +be prefixed with a governing domain (e.g. `example.com/key-name`). Unprefixed +keys (e.g. `key-name`) are reserved for end-users.
Third-party components must +use prefixed keys. Key prefixes under the "kubernetes.io" and "k8s.io" domains +are reserved for use by the kubernetes project and must not be used by +third-parties. + +In early versions of Kubernetes, some in-development features represented new +API fields as annotations, generally with the form `something.alpha.kubernetes.io/name` or +`something.beta.kubernetes.io/name` (depending on our confidence in it). This +pattern is deprecated. Some such annotations may still exist, but no new +annotations may be defined. New API fields are now developed as regular fields. + +Other advice regarding use of labels, annotations, taints, and other generic map keys by +Kubernetes components and tools: + - Key names should be all lowercase, with words separated by dashes instead of camelCase + - For instance, prefer `foo.kubernetes.io/foo-bar` over `foo.kubernetes.io/fooBar`, prefer + `desired-replicas` over `DesiredReplicas` + - Unprefixed keys are reserved for end-users. All other labels and annotations must be prefixed. + - Key prefixes under "kubernetes.io" and "k8s.io" are reserved for the Kubernetes + project. + - Such keys are effectively part of the kubernetes API and may be subject + to deprecation and compatibility policies. + - Key names, including prefixes, should be precise enough that a user could + plausibly understand where it came from and what it is for. + - Key prefixes should carry as much context as possible. + - For instance, prefer `subsystem.kubernetes.io/parameter` over `kubernetes.io/subsystem-parameter` + - Use annotations to store API extensions that the controller responsible for +the resource doesn't need to know about, experimental fields that aren't +intended to be generally used API fields, etc. Beware that annotations aren't +automatically handled by the API conversion machinery. 
+ +## WebSockets and SPDY + +Some of the API operations exposed by Kubernetes involve transfer of binary +streams between the client and a container, including attach, exec, portforward, +and logging. The API therefore exposes certain operations over upgradeable HTTP +connections ([described in RFC 2817](https://tools.ietf.org/html/rfc2817)) via +the WebSocket and SPDY protocols. These actions are exposed as subresources with +their associated verbs (exec, log, attach, and portforward) and are requested +via a GET (to support JavaScript in a browser) and POST (semantically accurate). + +There are two primary protocols in use today: + +1. Streamed channels + + When dealing with multiple independent binary streams of data such as the +remote execution of a shell command (writing to STDIN, reading from STDOUT and +STDERR) or forwarding multiple ports the streams can be multiplexed onto a +single TCP connection. Kubernetes supports a SPDY based framing protocol that +leverages SPDY channels and a WebSocket framing protocol that multiplexes +multiple channels onto the same stream by prefixing each binary chunk with a +byte indicating its channel. The WebSocket protocol supports an optional +subprotocol that handles base64-encoded bytes from the client and returns +base64-encoded bytes from the server and character based channel prefixes ('0', +'1', '2') for ease of use from JavaScript in a browser. + +2. Streaming response + + The default log output for a channel of streaming data is an HTTP Chunked +Transfer-Encoding, which can return an arbitrary stream of binary data from the +server. Browser-based JavaScript is limited in its ability to access the raw +data from a chunked response, especially when very large amounts of logs are +returned, and in future API calls it may be desirable to transfer large files. 
+The streaming API endpoints support an optional WebSocket upgrade that provides +a unidirectional channel from the server to the client and chunks data as binary +WebSocket frames. An optional WebSocket subprotocol is exposed that base64 +encodes the stream before returning it to the client. + +Clients should use the SPDY protocols if their clients have native support, or +WebSockets as a fallback. Note that WebSockets is susceptible to Head-of-Line +blocking and so clients must read and process each message sequentially. In +the future, an HTTP/2 implementation will be exposed that deprecates SPDY. + + +## Validation + +API objects are validated upon receipt by the apiserver. Validation errors are +flagged and returned to the caller in a `Failure` status with `reason` set to +`Invalid`. In order to facilitate consistent error messages, we ask that +validation logic adheres to the following guidelines whenever possible (though +exceptional cases will exist). + +* Be as precise as possible. +* Telling users what they CAN do is more useful than telling them what they +CANNOT do. +* When asserting a requirement in the positive, use "must". Examples: "must be +greater than 0", "must match regex '[a-z]+'". Words like "should" imply that +the assertion is optional, and must be avoided. +* When asserting a formatting requirement in the negative, use "must not". +Example: "must not contain '..'". Words like "should not" imply that the +assertion is optional, and must be avoided. +* When asserting a behavioral requirement in the negative, use "may not". +Examples: "may not be specified when otherField is empty", "only `name` may be +specified". +* When referencing a literal string value, indicate the literal in +single-quotes. Example: "must not contain '..'". +* When referencing another field name, indicate the name in back-quotes. +Example: "must be greater than `request`". +* When specifying inequalities, use words rather than symbols. 
Examples: "must +be less than 256", "must be greater than or equal to 0". Do not use words +like "larger than", "bigger than", "more than", "higher than", etc. +* When specifying numeric ranges, use inclusive ranges when possible. +>>>>>>> Fixing links pointing to the new location of strategic-merge-patch.md This file is a placeholder to preserve links. Please remove by April 24, 2019 or the release of kubernetes 1.13, whichever comes first. \ No newline at end of file diff --git a/contributors/devel/controllers.md b/contributors/devel/controllers.md index 268e0d103..0b69d0beb 100644 --- a/contributors/devel/controllers.md +++ b/contributors/devel/controllers.md @@ -1,191 +1,3 @@ -# Writing Controllers +This file has moved to https://git.k8s.io/community/contributors/devel/sig-api-machinery/controllers.md. -A Kubernetes controller is an active reconciliation process. That is, it watches some object for the world's desired state, and it watches the world's actual state, too. Then, it sends instructions to try and make the world's current state be more like the desired state. - -The simplest implementation of this is a loop: - -```go -for { - desired := getDesiredState() - current := getCurrentState() - makeChanges(desired, current) -} -``` - -Watches, etc, are all merely optimizations of this logic. - -## Guidelines - -When you're writing controllers, there are few guidelines that will help make sure you get the results and performance you're looking for. - -1. Operate on one item at a time. If you use a `workqueue.Interface`, you'll be able to queue changes for a particular resource and later pop them in multiple “worker” gofuncs with a guarantee that no two gofuncs will work on the same item at the same time. - - Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers can collapse those into a queue of “check this X” based on relationships. 
For instance, a ReplicaSet controller needs to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing those. - -1. Random ordering between resources. When controllers queue off multiple types of resources, there is no guarantee of ordering amongst those resources. - - Distinct watches are updated independently. Even with an objective ordering of “created resourceA/X” and “created resourceB/Y”, your controller could observe “created resourceB/Y” and “created resourceA/X”. - -1. Level driven, not edge driven. Just like having a shell script that isn't running all the time, your controller may be off for an indeterminate amount of time before running again. - - If an API object appears with a marker value of `true`, you can't count on having seen it turn from `false` to `true`, only that you now observe it being `true`. Even an API watch suffers from this problem, so be sure that you're not counting on seeing a change unless your controller is also marking the information it last made the decision on in the object's status. - -1. Use `SharedInformers`. `SharedInformers` provide hooks to receive notifications of adds, updates, and deletes for a particular resource. They also provide convenience functions for accessing shared caches and determining when a cache is primed. - - Use the factory methods down in https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/informers/factory.go to ensure that you are sharing the same instance of the cache as everyone else. - - This saves us connections against the API server, duplicate serialization costs server-side, duplicate deserialization costs controller-side, and duplicate caching costs controller-side. - - You may see other mechanisms like reflectors and deltafifos driving controllers. Those were older mechanisms that we later used to build the `SharedInformers`. You should avoid using them in new controllers. - -1. Never mutate original objects! 
Caches are shared across controllers, this means that if you mutate your "copy" (actually a reference or shallow copy) of an object, you'll mess up other controllers (not just your own). - - The most common point of failure is making a shallow copy, then mutating a map, like `Annotations`. Use `api.Scheme.Copy` to make a deep copy. - -1. Wait for your secondary caches. Many controllers have primary and secondary resources. Primary resources are the resources that you'll be updating `Status` for. Secondary resources are resources that you'll be managing (creating/deleting) or using for lookups. - - Use the `framework.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync functions. This will make sure that things like a Pod count for a ReplicaSet isn't working off of known out of date information that results in thrashing. - -1. There are other actors in the system. Just because you haven't changed an object doesn't mean that somebody else hasn't. - - Don't forget that the current state may change at any moment--it's not sufficient to just watch the desired state. If you use the absence of objects in the desired state to indicate that things in the current state should be deleted, make sure you don't have a bug in your observation code (e.g., act before your cache has filled). - -1. Percolate errors to the top level for consistent re-queuing. We have a `workqueue.RateLimitingInterface` to allow simple requeuing with reasonable backoffs. - - Your main controller func should return an error when requeuing is necessary. When it isn't, it should use `utilruntime.HandleError` and return nil instead. This makes it very easy for reviewers to inspect error handling cases and to be confident that your controller doesn't accidentally lose things it should retry for. - -1. Watches and Informers will “sync”. Periodically, they will deliver every matching object in the cluster to your `Update` method. 
This is good for cases where you may need to take additional action on the object, but sometimes you know there won't be more work to do. - - In cases where you are *certain* that you don't need to requeue items when there are no new changes, you can compare the resource version of the old and new objects. If they are the same, you skip requeuing the work. Be careful when you do this. If you ever skip requeuing your item on failures, you could fail, not requeue, and then never retry that item again. - -1. If the primary resource your controller is reconciling supports ObservedGeneration in its status, make sure you correctly set it to metadata.Generation whenever the values between the two fields mismatches. - - This lets clients know that the controller has processed a resource. Make sure that your controller is the main controller that is responsible for that resource, otherwise if you need to communicate observation via your own controller, you will need to create a different kind of ObservedGeneration in the Status of the resource. - -1. Consider using owner references for resources that result in the creation of other resources (eg. a ReplicaSet results in creating Pods). Thus you ensure that children resources are going to be garbage-collected once a resource managed by your controller is deleted. For more information on owner references, read more [here](/contributors/design-proposals/api-machinery/controller-ref.md). - - Pay special attention in the way you are doing adoption. You shouldn't adopt children for a resource when either the parent or the children are marked for deletion. If you are using a cache for your resources, you will likely need to bypass it with a direct API read in case you observe that an owner reference has been updated for one of the children. Thus, you ensure your controller is not racing with the garbage collector. - - See [k8s.io/kubernetes/pull/42938](https://github.com/kubernetes/kubernetes/pull/42938) for more information. 
- -## Rough Structure - -Overall, your controller should look something like this: - -```go -type Controller struct { - // pods gives cached access to pods. - pods informers.PodLister - podsSynced cache.InformerSynced - - // queue is where incoming work is placed to de-dup and to allow "easy" - // rate limited requeues on errors - queue workqueue.RateLimitingInterface -} - -func NewController(pods informers.PodInformer) *Controller { - c := &Controller{ - pods: pods.Lister(), - podsSynced: pods.Informer().HasSynced, - queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "controller-name"), - } - - // register event handlers to fill the queue with pod creations, updates and deletions - pods.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{ - AddFunc: func(obj interface{}) { - key, err := cache.MetaNamespaceKeyFunc(obj) - if err == nil { - c.queue.Add(key) - } - }, - UpdateFunc: func(old interface{}, new interface{}) { - key, err := cache.MetaNamespaceKeyFunc(new) - if err == nil { - c.queue.Add(key) - } - }, - DeleteFunc: func(obj interface{}) { - // IndexerInformer uses a delta nodeQueue, therefore for deletes we have to use this - // key function. - key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj) - if err == nil { - c.queue.Add(key) - } - }, - },) - - return c -} - -func (c *Controller) Run(threadiness int, stopCh chan struct{}) { - // don't let panics crash the process - defer utilruntime.HandleCrash() - // make sure the work queue is shutdown which will trigger workers to end - defer c.queue.ShutDown() - - glog.Infof("Starting controller") - - // wait for your secondary caches to fill before starting your work - if !cache.WaitForCacheSync(stopCh, c.podsSynced) { - return - } - - // start up your worker threads based on threadiness. Some controllers - // have multiple kinds of workers - for i := 0; i < threadiness; i++ { - // runWorker will loop until "something bad" happens. 
The .Until will - // then rekick the worker after one second - go wait.Until(c.runWorker, time.Second, stopCh) - } - - // wait until we're told to stop - <-stopCh - glog.Infof("Shutting down controller") -} - -func (c *Controller) runWorker() { - // hot loop until we're told to stop. processNextWorkItem will - // automatically wait until there's work available, so we don't worry - // about secondary waits - for c.processNextWorkItem() { - } -} - -// processNextWorkItem deals with one key off the queue. It returns false -// when it's time to quit. -func (c *Controller) processNextWorkItem() bool { - // pull the next work item from queue. It should be a key we use to lookup - // something in a cache - key, quit := c.queue.Get() - if quit { - return false - } - // you always have to indicate to the queue that you've completed a piece of - // work - defer c.queue.Done(key) - - // do your work on the key. This method will contains your "do stuff" logic - err := c.syncHandler(key.(string)) - if err == nil { - // if you had no error, tell the queue to stop tracking history for your - // key. This will reset things like failure counts for per-item rate - // limiting - c.queue.Forget(key) - return true - } - - // there was a failure so be sure to report it. This method allows for - // pluggable error handling which can be used for things like - // cluster-monitoring - utilruntime.HandleError(fmt.Errorf("%v failed with : %v", key, err)) - - // since we failed, we should requeue the item to work on later. This - // method will add a backoff to avoid hotlooping on particular items - // (they're probably still not going to work right away) and overall - // controller protection (everything I've done is broken, this controller - // needs to calm down or it can starve other useful work) cases. - c.queue.AddRateLimited(key) - - return true -} -``` +This file is a placeholder to preserve links. 
Please remove by April 24, 2019 or the release of kubernetes 1.13, whichever comes first. \ No newline at end of file diff --git a/contributors/devel/generating-clientset.md b/contributors/devel/generating-clientset.md index bf12e92cb..69478a80e 100644 --- a/contributors/devel/generating-clientset.md +++ b/contributors/devel/generating-clientset.md @@ -1,50 +1,3 @@ -# Generation and release cycle of clientset +This file has moved to https://git.k8s.io/community/contributors/devel/sig-api-machinery/generating-clientset.md. -Client-gen is an automatic tool that generates [clientset](../design-proposals/api-machinery/client-package-structure.md#high-level-client-sets) based on API types. This doc introduces the use of client-gen, and the release cycle of the generated clientsets. - -## Using client-gen - -The workflow includes three steps: - -**1.** Marking API types with tags: in `pkg/apis/${GROUP}/${VERSION}/types.go`, mark the types (e.g., Pods) that you want to generate clients for with the `// +genclient` tag. If the resource associated with the type is not namespace scoped (e.g., PersistentVolume), you need to append the `// +genclient:nonNamespaced` tag as well. - -The following `// +genclient` are supported: - -- `// +genclient` - generate default client verb functions (*create*, *update*, *delete*, *get*, *list*, *update*, *patch*, *watch* and depending on the existence of `.Status` field in the type the client is generated for also *updateStatus*). -- `// +genclient:nonNamespaced` - all verb functions are generated without namespace. -- `// +genclient:onlyVerbs=create,get` - only listed verb functions will be generated. -- `// +genclient:skipVerbs=watch` - all default client verb functions will be generated **except** *watch* verb. -- `// +genclient:noStatus` - skip generation of *updateStatus* verb even thought the `.Status` field exists. - -In some cases you want to generate non-standard verbs (eg. for sub-resources). 
To do that you can use the following generator tag: - -- `// +genclient:method=Scale,verb=update,subresource=scale,input=k8s.io/api/extensions/v1beta1.Scale,result=k8s.io/api/extensions/v1beta1.Scale` - in this case a new function `Scale(string, *v1beta.Scale) *v1beta.Scale` will be added to the default client and the body of the function will be based on the *update* verb. The optional *subresource* argument will make the generated client function use subresource `scale`. Using the optional *input* and *result* arguments you can override the default type with a custom type. If the import path is not given, the generator will assume the type exists in the same package. - -In addition, the following optional tags influence the client generation: - -- `// +groupName=policy.authorization.k8s.io` – used in the fake client as the full group name (defaults to the package name), -- `// +groupGoName=AuthorizationPolicy` – a CamelCase Golang identifier to de-conflict groups with non-unique prefixes like `policy.authorization.k8s.io` and `policy.k8s.io`. These would lead to two `Policy()` methods in the clientset otherwise (defaults to the upper-case first segement of the group name). - -**2a.** If you are developing in the k8s.io/kubernetes repository, you just need to run hack/update-codegen.sh. - -**2b.** If you are running client-gen outside of k8s.io/kubernetes, you need to use the command line argument `--input` to specify the groups and versions of the APIs you want to generate clients for, client-gen will then look into `pkg/apis/${GROUP}/${VERSION}/types.go` and generate clients for the types you have marked with the `genclient` tags. For example, to generated a clientset named "my_release" including clients for api/v1 objects and extensions/v1beta1 objects, you need to run: - -``` -$ client-gen --input="api/v1,extensions/v1beta1" --clientset-name="my_release" -``` - -**3.** ***Adding expansion methods***: client-gen only generates the common methods, such as CRUD. 
You can manually add additional methods through the expansion interface. For example, this [file](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/internalversion/pod_expansion.go) adds additional methods to Pod's client. As a convention, we put the expansion interface and its methods in file ${TYPE}_expansion.go. In most cases, you don't want to remove existing expansion files. So to make life easier, instead of creating a new clientset from scratch, ***you can copy and rename an existing clientset (so that all the expansion files are copied)***, and then run client-gen. - -## Output of client-gen - -- clientset: the clientset will be generated at `pkg/client/clientset_generated/` by default, and you can change the path via the `--clientset-path` command line argument. - -- Individual typed clients and client for group: They will be generated at `pkg/client/clientset_generated/${clientset_name}/typed/generated/${GROUP}/${VERSION}/` - -## Released clientsets - -If you are contributing code to k8s.io/kubernetes, try to use the generated clientset [here](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset). - -If you need a stable Go client to build your own project, please refer to the [client-go repository](https://github.com/kubernetes/client-go). - -We are migrating k8s.io/kubernetes to use client-go as well, see issue [#35159](https://github.com/kubernetes/kubernetes/issues/35159). +This file is a placeholder to preserve links. Please remove by April 24, 2019 or the release of kubernetes 1.13, whichever comes first. \ No newline at end of file diff --git a/contributors/devel/sig-api-machinery/controllers.md b/contributors/devel/sig-api-machinery/controllers.md new file mode 100644 index 000000000..268e0d103 --- /dev/null +++ b/contributors/devel/sig-api-machinery/controllers.md @@ -0,0 +1,191 @@ +# Writing Controllers + +A Kubernetes controller is an active reconciliation process. 
That is, it watches some object for the world's desired state, and it watches the world's actual state, too. Then, it sends instructions to try to make the world's current state be more like the desired state. + +The simplest implementation of this is a loop: + +```go +for { + desired := getDesiredState() + current := getCurrentState() + makeChanges(desired, current) +} +``` + +Watches, etc., are all merely optimizations of this logic. + +## Guidelines + +When you're writing controllers, there are a few guidelines that will help make sure you get the results and performance you're looking for. + +1. Operate on one item at a time. If you use a `workqueue.Interface`, you'll be able to queue changes for a particular resource and later pop them in multiple “worker” gofuncs with a guarantee that no two gofuncs will work on the same item at the same time. + + Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers can collapse those into a queue of “check this X” based on relationships. For instance, a ReplicaSet controller needs to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing those. + +1. Random ordering between resources. When controllers queue off multiple types of resources, there is no guarantee of ordering amongst those resources. + + Distinct watches are updated independently. Even with an objective ordering of “created resourceA/X” and “created resourceB/Y”, your controller could observe “created resourceB/Y” and “created resourceA/X”. + +1. Level driven, not edge driven. Just like having a shell script that isn't running all the time, your controller may be off for an indeterminate amount of time before running again. + + If an API object appears with a marker value of `true`, you can't count on having seen it turn from `false` to `true`, only that you now observe it being `true`.
Even an API watch suffers from this problem, so be sure that you're not counting on seeing a change unless your controller is also marking the information it last made the decision on in the object's status. + +1. Use `SharedInformers`. `SharedInformers` provide hooks to receive notifications of adds, updates, and deletes for a particular resource. They also provide convenience functions for accessing shared caches and determining when a cache is primed. + + Use the factory methods in https://git.k8s.io/kubernetes/staging/src/k8s.io/client-go/informers/factory.go to ensure that you are sharing the same instance of the cache as everyone else. + + This saves us connections against the API server, duplicate serialization costs server-side, duplicate deserialization costs controller-side, and duplicate caching costs controller-side. + + You may see other mechanisms like reflectors and deltafifos driving controllers. Those were older mechanisms that we later used to build the `SharedInformers`. You should avoid using them in new controllers. + +1. Never mutate original objects! Caches are shared across controllers, which means that if you mutate your "copy" (actually a reference or shallow copy) of an object, you'll mess up other controllers (not just your own). + + The most common point of failure is making a shallow copy, then mutating a map, like `Annotations`. Use `api.Scheme.Copy` to make a deep copy. + +1. Wait for your secondary caches. Many controllers have primary and secondary resources. Primary resources are the resources that you'll be updating `Status` for. Secondary resources are resources that you'll be managing (creating/deleting) or using for lookups. + + Use the `framework.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync functions. This will make sure that things like a Pod count for a ReplicaSet aren't working off of known out-of-date information that results in thrashing. + +1.
There are other actors in the system. Just because you haven't changed an object doesn't mean that somebody else hasn't. + + Don't forget that the current state may change at any moment--it's not sufficient to just watch the desired state. If you use the absence of objects in the desired state to indicate that things in the current state should be deleted, make sure you don't have a bug in your observation code (e.g., acting before your cache has filled). + +1. Percolate errors to the top level for consistent re-queuing. We have a `workqueue.RateLimitingInterface` to allow simple requeuing with reasonable backoffs. + + Your main controller func should return an error when requeuing is necessary. When it isn't, it should use `utilruntime.HandleError` and return nil instead. This makes it very easy for reviewers to inspect error handling cases and to be confident that your controller doesn't accidentally lose things it should retry for. + +1. Watches and Informers will “sync”. Periodically, they will deliver every matching object in the cluster to your `Update` method. This is good for cases where you may need to take additional action on the object, but sometimes you know there won't be more work to do. + + In cases where you are *certain* that you don't need to requeue items when there are no new changes, you can compare the resource version of the old and new objects. If they are the same, you skip requeuing the work. Be careful when you do this. If you ever skip requeuing your item on failures, you could fail, not requeue, and then never retry that item again. + +1. If the primary resource your controller is reconciling supports ObservedGeneration in its status, make sure you correctly set it to metadata.Generation whenever the values of the two fields do not match. + + This lets clients know that the controller has processed a resource.
Make sure that your controller is the main controller that is responsible for that resource, otherwise if you need to communicate observation via your own controller, you will need to create a different kind of ObservedGeneration in the Status of the resource. + +1. Consider using owner references for resources that result in the creation of other resources (eg. a ReplicaSet results in creating Pods). Thus you ensure that children resources are going to be garbage-collected once a resource managed by your controller is deleted. For more information on owner references, read more [here](/contributors/design-proposals/api-machinery/controller-ref.md). + + Pay special attention in the way you are doing adoption. You shouldn't adopt children for a resource when either the parent or the children are marked for deletion. If you are using a cache for your resources, you will likely need to bypass it with a direct API read in case you observe that an owner reference has been updated for one of the children. Thus, you ensure your controller is not racing with the garbage collector. + + See [k8s.io/kubernetes/pull/42938](https://github.com/kubernetes/kubernetes/pull/42938) for more information. + +## Rough Structure + +Overall, your controller should look something like this: + +```go +type Controller struct { + // pods gives cached access to pods. 
+ pods informers.PodLister + podsSynced cache.InformerSynced + + // queue is where incoming work is placed to de-dup and to allow "easy" + // rate limited requeues on errors + queue workqueue.RateLimitingInterface +} + +func NewController(pods informers.PodInformer) *Controller { + c := &Controller{ + pods: pods.Lister(), + podsSynced: pods.Informer().HasSynced, + queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "controller-name"), + } + + // register event handlers to fill the queue with pod creations, updates and deletions + pods.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{ + AddFunc: func(obj interface{}) { + key, err := cache.MetaNamespaceKeyFunc(obj) + if err == nil { + c.queue.Add(key) + } + }, + UpdateFunc: func(old interface{}, new interface{}) { + key, err := cache.MetaNamespaceKeyFunc(new) + if err == nil { + c.queue.Add(key) + } + }, + DeleteFunc: func(obj interface{}) { + // IndexerInformer uses a delta nodeQueue, therefore for deletes we have to use this + // key function. + key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj) + if err == nil { + c.queue.Add(key) + } + }, + },) + + return c +} + +func (c *Controller) Run(threadiness int, stopCh chan struct{}) { + // don't let panics crash the process + defer utilruntime.HandleCrash() + // make sure the work queue is shutdown which will trigger workers to end + defer c.queue.ShutDown() + + glog.Infof("Starting controller") + + // wait for your secondary caches to fill before starting your work + if !cache.WaitForCacheSync(stopCh, c.podsSynced) { + return + } + + // start up your worker threads based on threadiness. Some controllers + // have multiple kinds of workers + for i := 0; i < threadiness; i++ { + // runWorker will loop until "something bad" happens. 
The .Until will + // then rekick the worker after one second + go wait.Until(c.runWorker, time.Second, stopCh) + } + + // wait until we're told to stop + <-stopCh + glog.Infof("Shutting down controller") +} + +func (c *Controller) runWorker() { + // hot loop until we're told to stop. processNextWorkItem will + // automatically wait until there's work available, so we don't worry + // about secondary waits + for c.processNextWorkItem() { + } +} + +// processNextWorkItem deals with one key off the queue. It returns false +// when it's time to quit. +func (c *Controller) processNextWorkItem() bool { + // pull the next work item from queue. It should be a key we use to lookup + // something in a cache + key, quit := c.queue.Get() + if quit { + return false + } + // you always have to indicate to the queue that you've completed a piece of + // work + defer c.queue.Done(key) + + // do your work on the key. This method will contain your "do stuff" logic + err := c.syncHandler(key.(string)) + if err == nil { + // if you had no error, tell the queue to stop tracking history for your + // key. This will reset things like failure counts for per-item rate + // limiting + c.queue.Forget(key) + return true + } + + // there was a failure so be sure to report it. This method allows for + // pluggable error handling which can be used for things like + // cluster-monitoring + utilruntime.HandleError(fmt.Errorf("%v failed with: %v", key, err)) + + // since we failed, we should requeue the item to work on later. This + // method will add a backoff to avoid hotlooping on particular items + // (they're probably still not going to work right away) and overall + // controller protection (everything I've done is broken, this controller + // needs to calm down or it can starve other useful work) cases.
+ c.queue.AddRateLimited(key) + + return true +} +``` diff --git a/contributors/devel/sig-api-machinery/generating-clientset.md b/contributors/devel/sig-api-machinery/generating-clientset.md new file mode 100644 index 000000000..bf12e92cb --- /dev/null +++ b/contributors/devel/sig-api-machinery/generating-clientset.md @@ -0,0 +1,50 @@ +# Generation and release cycle of clientset + +Client-gen is an automatic tool that generates [clientset](../design-proposals/api-machinery/client-package-structure.md#high-level-client-sets) based on API types. This doc introduces the use of client-gen, and the release cycle of the generated clientsets. + +## Using client-gen + +The workflow includes three steps: + +**1.** Marking API types with tags: in `pkg/apis/${GROUP}/${VERSION}/types.go`, mark the types (e.g., Pods) that you want to generate clients for with the `// +genclient` tag. If the resource associated with the type is not namespace scoped (e.g., PersistentVolume), you need to append the `// +genclient:nonNamespaced` tag as well. + +The following `// +genclient` tags are supported: + +- `// +genclient` - generate default client verb functions (*create*, *update*, *delete*, *get*, *list*, *patch*, *watch* and, if the type the client is generated for has a `.Status` field, also *updateStatus*). +- `// +genclient:nonNamespaced` - all verb functions are generated without namespace. +- `// +genclient:onlyVerbs=create,get` - only listed verb functions will be generated. +- `// +genclient:skipVerbs=watch` - all default client verb functions will be generated **except** *watch* verb. +- `// +genclient:noStatus` - skip generation of *updateStatus* verb even though the `.Status` field exists. + +In some cases you want to generate non-standard verbs (e.g., for sub-resources).
To do that you can use the following generator tag: + +- `// +genclient:method=Scale,verb=update,subresource=scale,input=k8s.io/api/extensions/v1beta1.Scale,result=k8s.io/api/extensions/v1beta1.Scale` - in this case a new function `Scale(string, *v1beta1.Scale) *v1beta1.Scale` will be added to the default client and the body of the function will be based on the *update* verb. The optional *subresource* argument will make the generated client function use subresource `scale`. Using the optional *input* and *result* arguments you can override the default type with a custom type. If the import path is not given, the generator will assume the type exists in the same package. + +In addition, the following optional tags influence the client generation: + +- `// +groupName=policy.authorization.k8s.io` – used in the fake client as the full group name (defaults to the package name). +- `// +groupGoName=AuthorizationPolicy` – a CamelCase Golang identifier to de-conflict groups with non-unique prefixes like `policy.authorization.k8s.io` and `policy.k8s.io`. These would lead to two `Policy()` methods in the clientset otherwise (defaults to the upper-case first segment of the group name). + +**2a.** If you are developing in the k8s.io/kubernetes repository, you just need to run hack/update-codegen.sh. + +**2b.** If you are running client-gen outside of k8s.io/kubernetes, you need to use the command line argument `--input` to specify the groups and versions of the APIs you want to generate clients for; client-gen will then look into `pkg/apis/${GROUP}/${VERSION}/types.go` and generate clients for the types you have marked with the `genclient` tags. For example, to generate a clientset named "my_release" including clients for api/v1 objects and extensions/v1beta1 objects, you need to run: + +``` +$ client-gen --input="api/v1,extensions/v1beta1" --clientset-name="my_release" +``` + +**3.** ***Adding expansion methods***: client-gen only generates the common methods, such as CRUD.
You can manually add additional methods through the expansion interface. For example, this [file](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset/typed/core/internalversion/pod_expansion.go) adds additional methods to Pod's client. As a convention, we put the expansion interface and its methods in file ${TYPE}_expansion.go. In most cases, you don't want to remove existing expansion files. So to make life easier, instead of creating a new clientset from scratch, ***you can copy and rename an existing clientset (so that all the expansion files are copied)***, and then run client-gen. + +## Output of client-gen + +- clientset: the clientset will be generated at `pkg/client/clientset_generated/` by default, and you can change the path via the `--clientset-path` command line argument. + +- Individual typed clients and client for group: They will be generated at `pkg/client/clientset_generated/${clientset_name}/typed/generated/${GROUP}/${VERSION}/` + +## Released clientsets + +If you are contributing code to k8s.io/kubernetes, try to use the generated clientset [here](https://git.k8s.io/kubernetes/pkg/client/clientset_generated/internalclientset). + +If you need a stable Go client to build your own project, please refer to the [client-go repository](https://github.com/kubernetes/client-go). + +We are migrating k8s.io/kubernetes to use client-go as well, see issue [#35159](https://github.com/kubernetes/kubernetes/issues/35159). diff --git a/contributors/devel/sig-api-machinery/strategic-merge-patch.md b/contributors/devel/sig-api-machinery/strategic-merge-patch.md new file mode 100644 index 000000000..4f45ef8e1 --- /dev/null +++ b/contributors/devel/sig-api-machinery/strategic-merge-patch.md @@ -0,0 +1,449 @@ +Strategic Merge Patch +===================== + +# Background + +Kubernetes supports a customized version of JSON merge patch called strategic merge patch. 
This +patch format is used by `kubectl apply`, `kubectl edit` and `kubectl patch`, and contains +specialized directives to control how specific fields are merged. + +In the standard JSON merge patch, JSON objects are always merged but lists are +always replaced. Often that isn't what we want. Let's say we start with the +following Pod: + +```yaml +spec: + containers: + - name: nginx + image: nginx-1.0 +``` + +and we POST that to the server (as JSON). Then let's say we want to *add* a +container to this Pod. + +```yaml +PATCH /api/v1/namespaces/default/pods/pod-name +spec: + containers: + - name: log-tailer + image: log-tailer-1.0 +``` + +If we were to use standard Merge Patch, the entire container list would be +replaced with the single log-tailer container. However, our intent is for the +container lists to merge together based on the `name` field. + +To solve this problem, Strategic Merge Patch uses the go struct tag of the API +objects to determine what lists should be merged and which ones should not. +The metadata is available as struct tags on the API objects +themselves and also available to clients as [OpenAPI annotations](https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/README.md#x-kubernetes-patch-strategy-and-x-kubernetes-patch-merge-key). +In the above example, the `patchStrategy` metadata for the `containers` +field would be `merge` and the `patchMergeKey` would be `name`. + + +# Basic Patch Format + +Strategic Merge Patch supports special operations through directives. + +There are multiple directives: + +- replace +- merge +- delete +- delete from primitive list + +`replace`, `merge` and `delete` are mutual exclusive. + +## `replace` Directive + +### Purpose + +`replace` directive indicates that the element that contains it should be replaced instead of being merged. + +### Syntax + +`replace` directive is used in both patch with directive marker and go struct tags. 
+ +Example usage in the patch: + +``` +$patch: replace +``` + +### Example + +`replace` directive can be used on both map and list. + +#### Map + +To indicate that a map should not be merged and instead should be taken literally: + +```yaml +$patch: replace # recursive and applies to all fields of the map it's in +containers: +- name: nginx + image: nginx-1.0 +``` + +#### List of Maps + +To override the container list to be strictly replaced, regardless of the default: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: replace # any further $patch operations nested in this list will be ignored +``` + + +## `delete` Directive + +### Purpose + +`delete` directive indicates that the element that contains it should be deleted. + +### Syntax + +`delete` directive is used only in the patch with directive marker. +It can be used on both map and list of maps. +``` +$patch: delete +``` + +### Example + +#### List of Maps + +To delete an element of a list that should be merged: + +```yaml +containers: + - name: nginx + image: nginx-1.0 + - $patch: delete + name: log-tailer # merge key and value goes here +``` + +Note: Delete operation will delete all entries in the list that match the merge key. + +#### Maps + +One way to delete a map is using `delete` directive. +Applying this patch will delete the rollingUpdate map. +```yaml +rollingUpdate: + $patch: delete +``` + +An equivalent way to delete this map is +```yaml +rollingUpdate: null +``` + +## `merge` Directive + +### Purpose + +`merge` directive indicates that the element that contains it should be merged instead of being replaced. + +### Syntax + +`merge` directive is used only in the go struct tags. + + +## `deleteFromPrimitiveList` Directive + +### Purpose + +We have two patch strategies for lists of primitives: replace and merge. 
+Replace is the default patch strategy for lists: it replaces the whole list on update and preserves the order, +while the merge strategy treats the list as an unordered set. We call a primitive list with merge strategy an unordered set. +The patch strategy is defined in the go struct tag of the API objects. + +`deleteFromPrimitiveList` directive indicates that the elements in this list should be deleted from the original primitive list. + +### Syntax + +It is used only as the prefix of the key in the patch. +``` +$deleteFromPrimitiveList/<keyOfPrimitiveList>: [a primitive list] +``` + +### Example + +#### List of Primitives (Unordered Set) + +`finalizers` uses `merge` as patch strategy. +```go +Finalizers []string `json:"finalizers,omitempty" patchStrategy:"merge" protobuf:"bytes,14,rep,name=finalizers"` +``` + +Suppose we have defined a `finalizers` and we call it the original finalizers: + +```yaml +finalizers: + - a + - b + - c +``` + +To delete items "b" and "c" from the original finalizers, the patch will be: + +```yaml +# The directive consists of the prefix $deleteFromPrimitiveList, +# followed by a '/' and the name of the list. +# The values in this list will be deleted after applying the patch. +$deleteFromPrimitiveList/finalizers: + - b + - c +``` + +After applying the patch on the original finalizers, it will become: + +```yaml +finalizers: + - a +``` + +Note: When merging two sets, the primitives are first deduplicated and then merged. +In an erroneous case, the set may be created with duplicates. Deleting an +item that has duplicates will delete all matching items. + +## `setElementOrder` Directive + +### Purpose + +`setElementOrder` directive provides a way to specify the order of a list. +The relative order specified in this directive will be retained. +Please refer to [proposal](/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md) for more information. + +### Syntax + +It is used only as the prefix of the key in the patch.
+``` +$setElementOrder/<keyOfList>: [a list] +``` + +### Example + +#### List of Primitives + +Suppose we have a list of `finalizers`: +```yaml +finalizers: + - a + - b + - c +``` + +To reorder the elements in the list, we can send a patch: +```yaml +# The directive consists of the prefix $setElementOrder, +# followed by a '/' and the name of the list. +$setElementOrder/finalizers: + - b + - c + - a +``` + +After applying the patch, it will be: +```yaml +finalizers: + - b + - c + - a +``` + +#### List of Maps + +Suppose we have a list of `containers` whose `mergeKey` is `name`: +```yaml +containers: + - name: a + ... + - name: b + ... + - name: c + ... +``` + +To reorder the elements in the list, we can send a patch: +```yaml +# each map in the list should only include the mergeKey +$setElementOrder/containers: + - name: b + - name: c + - name: a +``` + +After applying the patch, it will be: +```yaml +containers: + - name: b + ... + - name: c + ... + - name: a + ... +``` + + +## `retainKeys` Directive + +### Purpose + +`retainKeys` directive provides a mechanism for union types to clear mutually exclusive fields. +When this directive is present in the patch, all the fields not in this directive will be cleared. +Please refer to [proposal](/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md) for more information. + +### Syntax + +``` +$retainKeys: [a list of field keys] +``` + +### Example + +#### Map + +Suppose we have a union type: +``` +union: + foo: a + other: b +``` + +And we have a patch: +``` +union: + $retainKeys: + - another + - bar + another: d + bar: c +``` + +After applying this patch, we get: +``` +union: + # Fields foo and other have been cleared without explicitly setting them to null.
+ another: d + bar: c +``` + +# Changing patch format + +As issues and limitations have been discovered with the strategic merge +patch implementation, it has been necessary to change the patch format +to support additional semantics - such as merging lists of +primitives and defining order when merging lists. + +## Requirements for any changes to the patch format + +**Note:** Changes to the strategic merge patch must be backwards compatible such +that patch requests valid in previous versions continue to be valid. +That is, old patch formats sent by old clients to new servers +must continue to function correctly. + +Previously valid patch requests do not need to keep the exact same +behavior, but do need to behave correctly. + +**Example:** if a patch request previously randomized the order of elements +in a list and we want to provide a deterministic order, we must continue +to support the old patch format, but we can make the ordering deterministic +for the old format. + +### Client version skew + +Because the server does not publish which patch versions it supports, +and it silently ignores patch directives that it does not recognize, +new patches should behave correctly when sent to old servers that +may not support all of the patch directives. + +While the patch API must be backwards compatible, it must also +be forward compatible for 1 version. This is needed because `kubectl` must +support talking to older and newer server versions without knowing what +parts of the patch are supported on each, and generate patches that work correctly on both. + +## Strategies for introducing new patch behavior + +#### 1. Add optional semantic meaning to the existing patch format. + +**Note:** Must not require new data or elements to be present that were not required before. The new meaning must not break the old interpretation of old patches.
+ +**Good Example:** + +Old format + - ordering of elements in patch had no meaning and the final ordering was arbitrary + +New format + - ordering of elements in patch has meaning and the final ordering is deterministic based on the ordering in the patch + +**Bad Example:** + +Old format + - fields not present in a patch for Kind foo are ignored + - unmodified fields for Kind foo are optional in patch request + +New format + - fields not present in a patch for Kind foo are cleared + - unmodified fields for Kind foo are required in patch request + +This example won't work, because old patch formats will contain data that is now +considered required. To support this, introduce a new directive to guard the +new patch format. + +#### 2. Add support for new directives in the patch format + +- Optional directives may be introduced to change how the patch is applied by the server - **backwards compatible** (old patch against newer server). + - May control how the patch is applied + - May contain patch information - such as elements to delete from a list + - Must NOT impose new requirements on the old patch format + +- New patch requests should be a superset of old patch requests - **forwards compatible** (newer patch against older server) + - *Old servers will ignore directives they do not recognize* + - Must include the full patch that would have been sent before the new directives were added. + - Must NOT rely on the directive being supported by the server + +**Good Example:** + +Old format + - fields not present in a patch for Kind foo are ignored + - unmodified fields for Kind foo are optional in patch request + +New format *without* directive + - Same as old + +New format *with* directive + - fields not present in a patch for Kind foo are cleared + - unmodified fields for Kind foo are required in patch request + +In this example, the behavior was unchanged when the directive was missing, +retaining the old behavior for old patch requests. 
+ +**Bad Example:** + +Old format + - fields not present in a patch for Kind foo are ignored + - unmodified fields for Kind foo are optional in patch request + +New format *with* directive + - Same as old + +New format *without* directive + - fields not present in a patch for Kind foo are cleared + - unmodified fields for Kind foo are required in patch request + +In this example, the behavior was changed when the directive was missing, +breaking compatibility. + +## Alternatives + +The previous strategy is necessary because there is no notion of +patch versions. Having the client negotiate the patch version +with the server would allow changing the patch format, but at +the cost of supporting multiple patch formats in the server and client. +Using client provided directives to evolve how a patch is merged +provides some limited support for multiple versions. + diff --git a/contributors/devel/strategic-merge-patch.md b/contributors/devel/strategic-merge-patch.md index 4f45ef8e1..b76e28de7 100644 --- a/contributors/devel/strategic-merge-patch.md +++ b/contributors/devel/strategic-merge-patch.md @@ -1,449 +1,3 @@ -Strategic Merge Patch -===================== - -# Background - -Kubernetes supports a customized version of JSON merge patch called strategic merge patch. This -patch format is used by `kubectl apply`, `kubectl edit` and `kubectl patch`, and contains -specialized directives to control how specific fields are merged. - -In the standard JSON merge patch, JSON objects are always merged but lists are -always replaced. Often that isn't what we want. Let's say we start with the -following Pod: - -```yaml -spec: - containers: - - name: nginx - image: nginx-1.0 -``` - -and we POST that to the server (as JSON). Then let's say we want to *add* a -container to this Pod. 
- -```yaml -PATCH /api/v1/namespaces/default/pods/pod-name -spec: - containers: - - name: log-tailer - image: log-tailer-1.0 -``` - -If we were to use standard Merge Patch, the entire container list would be -replaced with the single log-tailer container. However, our intent is for the -container lists to merge together based on the `name` field. - -To solve this problem, Strategic Merge Patch uses the go struct tag of the API -objects to determine what lists should be merged and which ones should not. -The metadata is available as struct tags on the API objects -themselves and also available to clients as [OpenAPI annotations](https://github.com/kubernetes/kubernetes/blob/master/api/openapi-spec/README.md#x-kubernetes-patch-strategy-and-x-kubernetes-patch-merge-key). -In the above example, the `patchStrategy` metadata for the `containers` -field would be `merge` and the `patchMergeKey` would be `name`. - - -# Basic Patch Format - -Strategic Merge Patch supports special operations through directives. - -There are multiple directives: - -- replace -- merge -- delete -- delete from primitive list - -`replace`, `merge` and `delete` are mutual exclusive. - -## `replace` Directive - -### Purpose - -`replace` directive indicates that the element that contains it should be replaced instead of being merged. - -### Syntax - -`replace` directive is used in both patch with directive marker and go struct tags. - -Example usage in the patch: - -``` -$patch: replace -``` - -### Example - -`replace` directive can be used on both map and list. 
- -#### Map - -To indicate that a map should not be merged and instead should be taken literally: - -```yaml -$patch: replace # recursive and applies to all fields of the map it's in -containers: -- name: nginx - image: nginx-1.0 -``` - -#### List of Maps - -To override the container list to be strictly replaced, regardless of the default: - -```yaml -containers: - - name: nginx - image: nginx-1.0 - - $patch: replace # any further $patch operations nested in this list will be ignored -``` - - -## `delete` Directive - -### Purpose - -`delete` directive indicates that the element that contains it should be deleted. - -### Syntax - -`delete` directive is used only in the patch with directive marker. -It can be used on both map and list of maps. -``` -$patch: delete -``` - -### Example - -#### List of Maps - -To delete an element of a list that should be merged: - -```yaml -containers: - - name: nginx - image: nginx-1.0 - - $patch: delete - name: log-tailer # merge key and value goes here -``` - -Note: Delete operation will delete all entries in the list that match the merge key. - -#### Maps - -One way to delete a map is using `delete` directive. -Applying this patch will delete the rollingUpdate map. -```yaml -rollingUpdate: - $patch: delete -``` - -An equivalent way to delete this map is -```yaml -rollingUpdate: null -``` - -## `merge` Directive - -### Purpose - -`merge` directive indicates that the element that contains it should be merged instead of being replaced. - -### Syntax - -`merge` directive is used only in the go struct tags. - - -## `deleteFromPrimitiveList` Directive - -### Purpose - -We have two patch strategies for lists of primitives: replace and merge. -Replace is the default patch strategy for list, which will replace the whole list on update and it will preserve the order; -while merge strategy works as an unordered set. We call a primitive list with merge strategy an unordered set. 
-The patch strategy is defined in the go struct tag of the API objects. - -`deleteFromPrimitiveList` directive indicates that the elements in this list should be deleted from the original primitive list. - -### Syntax - -It is used only as the prefix of the key in the patch. -``` -$deleteFromPrimitiveList/: [a primitive list] -``` - -### Example - -##### List of Primitives (Unordered Set) - -`finalizers` uses `merge` as patch strategy. -```go -Finalizers []string `json:"finalizers,omitempty" patchStrategy:"merge" protobuf:"bytes,14,rep,name=finalizers"` -``` - -Suppose we have defined a `finalizers` and we call it the original finalizers: - -```yaml -finalizers: - - a - - b - - c -``` - -To delete items "b" and "c" from the original finalizers, the patch will be: - -```yaml -# The directive includes the prefix $deleteFromPrimitiveList and -# followed by a '/' and the name of the list. -# The values in this list will be deleted after applying the patch. -$deleteFromPrimitiveList/finalizers: - - b - - c -``` - -After applying the patch on the original finalizers, it will become: - -```yaml -finalizers: - - a -``` - -Note: When merging two set, the primitives are first deduplicated and then merged. -In an erroneous case, the set may be created with duplicates. Deleting an -item that has duplicates will delete all matching items. - -## `setElementOrder` Directive - -### Purpose - -`setElementOrder` directive provides a way to specify the order of a list. -The relative order specified in this directive will be retained. -Please refer to [proposal](/contributors/design-proposals/cli/preserve-order-in-strategic-merge-patch.md) for more information. - -### Syntax - -It is used only as the prefix of the key in the patch. 
-``` -$setElementOrder/: [a list] -``` - -### Example - -#### List of Primitives - -Suppose we have a list of `finalizers`: -```yaml -finalizers: - - a - - b - - c -``` - -To reorder the elements order in the list, we can send a patch: -```yaml -# The directive includes the prefix $setElementOrder and -# followed by a '/' and the name of the list. -$setElementOrder/finalizers: - - b - - c - - a -``` - -After applying the patch, it will be: -```yaml -finalizers: - - b - - c - - a -``` - -#### List of Maps - -Suppose we have a list of `containers` whose `mergeKey` is `name`: -```yaml -containers: - - name: a - ... - - name: b - ... - - name: c - ... -``` - -To reorder the elements order in the list, we can send a patch: -```yaml -# each map in the list should only include the mergeKey -$setElementOrder/containers: - - name: b - - name: c - - name: a -``` - -After applying the patch, it will be: -```yaml -containers: - - name: b - ... - - name: c - ... - - name: a - ... -``` - - -## `retainKeys` Directive - -### Purpose - -`retainKeys` directive provides a mechanism for union types to clear mutual exclusive fields. -When this directive is present in the patch, all the fields not in this directive will be cleared. -Please refer to [proposal](/contributors/design-proposals/api-machinery/add-new-patchStrategy-to-clear-fields-not-present-in-patch.md) for more information. - -### Syntax - -``` -$retainKeys: [a list of field keys] -``` - -### Example - -#### Map - -Suppose we have a union type: -``` -union: - foo: a - other: b -``` - -And we have a patch: -``` -union: - retainKeys: - - another - - bar - another: d - bar: c -``` - -After applying this patch, we get: -``` -union: - # Field foo and other have been cleared w/o explicitly set them to null. 
-  another: d
-  bar: c
-```
-
-# Changing patch format
-
-As issues and limitations have been discovered with the strategic merge
-patch implementation, it has been necessary to change the patch format
-to support additional semantics - such as merging lists of
-primitives and defining order when merging lists.
-
-## Requirements for any changes to the patch format
-
-**Note:** Changes to the strategic merge patch must be backwards compatible such
-that patch requests valid in previous versions continue to be valid.
-That is, old patch formats sent by old clients to new servers with
-must continue to function correctly.
-
-Previously valid patch requests do not need to keep the exact same
-behavior, but do need to behave correctly.
-
-**Example:** if a patch request previously randomized the order of elements
-in a list and we want to provide a deterministic order, we must continue
-to support old patch format but we can make the ordering deterministic
-for the old format.
-
-### Client version skew
-
-Because the server does not publish which patch versions it supports,
-and it silently ignores patch directives that it does not recognize,
-new patches should behave correctly when sent to old servers that
-may not support all of the patch directives.
-
-While the patch API must be backwards compatible, it must also
-be forward compatible for 1 version. This is needed because `kubectl` must
-support talking to older and newer server versions without knowing what
-parts of patch are supported on each, and generate patches that work correctly on both.
-
-## Strategies for introducing new patch behavior
-
-#### 1. Add optional semantic meaning to the existing patch format.
-
-**Note:** Must not require new data or elements to be present that was not required before. Meaning must not break old interpretation of old patches.
-
-**Good Example:**
-
-Old format
-  - ordering of elements in patch had no meaning and the final ordering was arbitrary
-
-New format
-  - ordering of elements in patch has meaning and the final ordering is deterministic based on the ordering in the patch
-
-**Bad Example:**
-
-Old format
-  - fields not present in a patch for Kind foo are ignored
-  - unmodified fields for Kind foo are optional in patch request
-
-New format
-  - fields not present in a patch for Kind foo are cleared
-  - unmodified fields for Kind foo are required in patch request
-
-This example won't work, because old patch formats will contain data that is now
-considered required. To support this, introduce a new directive to guard the
-new patch format.
-
-#### 2. Add support for new directives in the patch format
-
-- Optional directives may be introduced to change how the patch is applied by the server - **backwards compatible** (old patch against newer server).
-  - May control how the patch is applied
-  - May contain patch information - such as elements to delete from a list
-  - Must NOT impose new requirements on the old patch format
-
-- New patch requests should be a superset of old patch requests - **forwards compatible** (newer patch against older server)
-  - *Old servers will ignore directives they do not recognize*
-  - Must include the full patch that would have been sent before the new directives were added.
-  - Must NOT rely on the directive being supported by the server
-
-**Good Example:**
-
-Old format
-  - fields not present in a patch for Kind foo are ignored
-  - unmodified fields for Kind foo are optional in patch request
-
-New format *without* directive
-  - Same as old
-
-New format *with* directive
-  - fields not present in a patch for Kind foo are cleared
-  - unmodified fields for Kind foo are required in patch request
-
-In this example, the behavior was unchanged when the directive was missing,
-retaining the old behavior for old patch requests.
-
-**Bad Example:**
-
-Old format
-  - fields not present in a patch for Kind foo are ignored
-  - unmodified fields for Kind foo are optional in patch request
-
-New format *with* directive
-  - Same as old
-
-New format *without* directive
-  - fields not present in a patch for Kind foo are cleared
-  - unmodified fields for Kind foo are required in patch request
-
-In this example, the behavior was changed when the directive was missing,
-breaking compatibility.
-
-## Alternatives
-
-The previous strategy is necessary because there is no notion of
-patch versions. Having the client negotiate the patch version
-with the server would allow changing the patch format, but at
-the cost of supporting multiple patch formats in the server and client.
-Using client provided directives to evolve how a patch is merged
-provides some limited support for multiple versions.
+This file has moved to https://git.k8s.io/community/contributors/devel/sig-api-machinery/strategic-merge-patch.md.
+This file is a placeholder to preserve links. Please remove by April 24, 2019 or the release of kubernetes 1.13, whichever comes first.
\ No newline at end of file