From ad19d350a4b9b0eef79fe292aae4c0cdce318efc Mon Sep 17 00:00:00 2001 From: Kenneth Owens Date: Mon, 3 Apr 2017 16:18:15 -0700 Subject: [PATCH] Change Revision to Generation for consistency Make reference to fact that Deployment and Stateful set both update from a specific revision to a specific revision --- .../design-proposals/statefulset-update.md | 158 +++++++++--------- 1 file changed, 80 insertions(+), 78 deletions(-) diff --git a/contributors/design-proposals/statefulset-update.md b/contributors/design-proposals/statefulset-update.md index 962e66488..fac75b77e 100644 --- a/contributors/design-proposals/statefulset-update.md +++ b/contributors/design-proposals/statefulset-update.md @@ -54,7 +54,7 @@ compensate for the application's scaling. - StatefulSet update must support singleton StatefulSets. However, an update in this case will cause a temporary outage. This is acceptable as a single process application is, by definition, not highly available. - - Disruption in Kubernetes is controlled by PodDistruptionBugets. As + - Disruption in Kubernetes is controlled by PodDistruptionBudgets. As StatefulSet updates progress one Pod at a time, and only occur when all other Pods have a Status of Running and a Ready Condition, they can not violate reasonable PodDisrutptionBugdets. @@ -81,8 +81,10 @@ compensate for the application's scaling. - Stateful applications are likely to evolve wire protocols and storage formats between versions. In most cases, when updating the application's Pod's containers, it will not be safe to roll back or forward to an arbitrary - version. StatefulSet update should work well when rolling out an update, - or performing a rollback, between two specific revisions of the StatefulSet. + version. Controller based Pod update should work well when rolling out an + update, or performing a rollback, between two specific revisions of the + controlled API object. This is how Deployment functions, and this property is, + perhaps, even more critical for stateful applications. ## Requirements This design is based on the following requirements. @@ -109,26 +111,26 @@ The following modifications will be made to the StatefulSetStatus API object. type StatefulSetStatus struct { // ObservedGeneration and Replicas fields are ommitted for brevity. - // TemplateRevision, if not nil, is the revision of the PodTemplate that was + // TemplateGeneration, if not nil, is the generation of the PodTemplate that was // used to create Pods with ordinals in the sequence // [0,CurrentReplicas). - TemplateRevision *int64 `json:"templateRevision,omitempty"` + TemplateGeneration *int64 `json:"templateGeneration,omitempty"` - // TargetTemplateRevision, if not nil, is the revision of the PodTemplate + // TargetTemplateGeneration, if not nil, is the generation of the PodTemplate // that was used to create Pods with ordinals in the sequence // [Replicas - UpdatedReplicas, Replicas). - TargetTemplateRevision *int64 `json:"targetTemplateRevision,omitempty"` + TargetTemplateGeneration *int64 `json:"targetTemplateGeneration,omitempty"` // ReadyReplicas is the current number of Pods, created by the StatefulSet // controller, that have a Status for Running and a Ready Condition. ReadyReplicas int32 `json:"readyReplicas,omitempty"` - // CurrentRevisionReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by CurrentTemplateRevision. + // CurrentGenerationReplicas is the number of Pods created by the StatefulSet + // controller from the PodTemplateSpec indicated by CurrentTemplateGeneration. CurrentReplicas int32 `json:"currentReplicas,omitempty"` // UpdatedReplicas is the number of Pods created by the StatefulSet - // controller from the PodTemplateSpec indicated by TargetTemplateRevision. + // controller from the PodTemplateSpec indicated by TargetTemplateGeneration. UpdatedReplicas int32 `json:"taretReplicas,omitempty"` } ``` @@ -141,25 +143,25 @@ type StatefulSetSpec struct { // ommitted for brevity. v1.PodTemplateSpec `json:"template"` - // TemplateRevision is a monotonically increasing, 64 bit, integer used to + // TemplateGeneration is a monotonically increasing, 64 bit, integer used to // indicate the version of the of the PodTemplateSpec. If nil, the // StatefulSetController has not initialized its revision history, // change tracking is not enabled, and all Pods will be created from // Template. - TemplateRevision *int64 `json:"templateRevision"` + TemplateGeneration *int64 `json:"templateGeneration"` - // RevisionPartition partitions the Pods in the StatefulSet by ordinal such + // GenerationPartition partitions the Pods in the StatefulSet by ordinal such // that all Pods with a lower ordinal will be created from the PodTemplate that // represents the current revision of the StatefulSet's revision history and // all Pods with an a greater or equal ordinal will be created from the // PodTemplate that represents the target revision of the StatefulSet's // revision history. - RevisionPartition *int32 `json:"revisionPartition,omitempty` + GenerationPartition *int32 `json:"generationPartition,omitempty` // RevisionHistoryLimit is the maximum number of PodTemplates that will // be maintained in the StatefulSet's revision history. It must be at // least two. - RevisionHisotryLimit int32 `json:historyRevisionDepth,omitempty` + RevisionHisotryLimit int32 `json:revisionHistoryLimit,omitempty` } ``` @@ -171,10 +173,10 @@ Additionally, we introduce the following constants. // history. const StatefulSetPodTemplateLabel = "created-by-statefulset" -// StatefulSetTemplateRevisionLabel is the label applied to a PodTemplate or +// StatefulSetTemplateGenerationLabel is the label applied to a PodTemplate or // Pod to indicate the position of the object's Template in the revision // history of a StatefulSet. -const StatefulSetTemplateRevisionLabel = "statefulset-template-revision" +const StatefulSetTemplateGenerationLabel = "statefulset-template-revision" ``` ## StatefulSet Controller @@ -197,23 +199,23 @@ conform to the other. The conditions that define this state in terms of the StatefulSet's StatefulSetSpec and StatefulSetStatus are below. 1. The StatefulSet contains exactly `[0,.Spec.Replicas)` Pods. -1. If StatefulSet's `.Spec.RevisionPartition` is nil, then the following is true. - 1. The StatefulSet's `.Status.TemplateRevision` is equal to its - `.Status.TargetRevision`. +1. If StatefulSet's `.Spec.GenerationPartition` is nil, then the following is true. + 1. The StatefulSet's `.Status.TemplateGeneration` is equal to its + `.Status.TargetGeneration`. 1. All Pods in the StatefulSet have been generated from the PodTemplate - labeled with a `StatefulSetTemplateRevisionLabel` equal to its - `.Status.TemplateRevision`. -1. If the StatefulSet's `.Spec.RevisionPartition` is not nil, then the following + labeled with a `StatefulSetTemplateGenerationLabel` equal to its + `.Status.TemplateGeneration`. +1. If the StatefulSet's `.Spec.GenerationPartition` is not nil, then the following is true. - 1. All Pods with ordinals is the sequence `[0,.Spec.RevisionPartition)` have + 1. All Pods with ordinals is the sequence `[0,.Spec.GenerationPartition)` have been generated from the PodTemplate in the StatefulSet's revision history - that is labeled with a `StatefulSetTemplateRevisionLabel` equal to - `.Status.TemplateRevision`. + that is labeled with a `StatefulSetTemplateGenerationLabel` equal to + `.Status.TemplateGeneration`. 1. All Pods with ordinals in the sequence - `[Spec.RevisionParition,.Spec.Replicas)` have been created with the + `[Spec.GenerationParition,.Spec.Replicas)` have been created with the PodTemplate in the StatefulSet's revision history that is labeled with a - `StatefulSetTemplateRevisionLabel` equal to - `.Status.TargetTemplateRevision`. + `StatefulSetTemplateGenerationLabel` equal to + `.Status.TargetTemplateGeneration`. ### Revised Controller Algorithm The StatefulSet controller will use the following algorithm to continue to @@ -274,10 +276,10 @@ StatefulSet's `.Spec.Template` field. `StatefulSetPodTemplateLabel` set to the StatefulSet's `.Name` to allow for selection of the PodTemplates that comprise the StatefulSet's revision history. 1. The controller will label the PodTemplate with a -`StaefulSetTemplateRevisionLabel` set to the StatefulSet's -`.Spec.TemplateRevision`. +`StaefulSetTemplateGenerationLabel` set to the StatefulSet's +`.Spec.TemplateGeneration`. 1. The controller will set the Name of the PodTemplate to a concatenation of the -`.Name` of the StatefulSet and the `.Spec.TemplateRevision`. +`.Name` of the StatefulSet and the `.Spec.TemplateGeneration`. 1. The controller will then create the PodTemplate. #### PodTemplate Deletion @@ -297,7 +299,7 @@ and it will allow the PodTemplate to be deleted via garbage collection. In order to reconstruct the history of revisions to a StatefulSet, the StatefulSet controller will do the following. -1. If the StatefulSet's `.Spec.TemplateRevision` is nil, the StatefulSet +1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, the StatefulSet has never been updated, and its history has never been initialized. This is the state the object will be in when a cluster is first upgraded from a version that does not support StatefulSet update to a version that does. In this case, @@ -311,12 +313,12 @@ ControllerRef matching the the StatefulSet. If the controller selects PodTemplates that it does not own, it will report an error, but it will continue reconstructing the StatefulSet's history. 1. The controller will filter out all PodTemplates that do not have a -`StatefulSetTemplateRevisionLabel` mapped to a valid revision. This can only +`StatefulSetTemplateGenerationLabel` mapped to a valid revision. This can only occur if the user purposefully deletes the label. In this case, the controller will report an error, but it will continue reconstructing the StatefulSet's revision history. 1. For all the remaining PodTemplates, the controller will sort them in -ascending order by the value mapped to their `StatefulSetTemplateRevisionLabel`. +ascending order by the value mapped to their `StatefulSetTemplateGenerationLabel`. This will reconstruct a list of PodTemplates from oldest to newest. Note that, as the revision is monotonically increasing for an individual StatefulSet, and as we use ControllerRef to mitigate selector overlap, the StatefulSet's history @@ -331,13 +333,13 @@ the oldest PodTemplates from the StatefulSet's revision history. [reconstruct the revision history](#history-reconstruction) of the StatefulSet. 1. If the number of PodTemplates in the StatefulSet's revision history is -greater than the StatefulSet's `.Spec.RevisionHistoryLimit, the +greater than the StatefulSet's `.Spec.RevisionHistoryLimit`, the StatefulSet controller will delete PodTemplates, starting with the head of the revision history, until the limit of the revision history is equal to the StatefulSet's `.Spec.RevisionHistoryLimit`. 1. As a StatefulSet's `.Spec.RevisionHistoryLimit` is always at least two, and -as the PodTemplates corresponding to `.Status.TemplateRevision` -or `.Status.TargetTemplateRevision` are always the most recent PodTemplates +as the PodTemplates corresponding to `.Status.TemplateGeneration` +or `.Status.TargetTemplateGeneration` are always the most recent PodTemplates in the revision history, the StatefulSet controller will not delete any `PodTemplates` that represent the current or target revisions. @@ -346,21 +348,21 @@ The StatefulSet controller will create PodTemplates upon mutation of the `.Spec.Template` of a StatefulSet. 1. When the StafefulSet controller observes a mutation to a StatefulSet's - `.Spec.Template` it will compare the `.Spec.TemplateRevision` to the - `.Status.TargetTemplateRevision`. -1. If the `.Spec.TemplateRevision` is equivalent to the -`.Status.TargetTemplateRevision`, no update has occurred. Note that, in the + `.Spec.Template` it will compare the `.Spec.TemplateGeneration` to the + `.Status.TargetTemplateGeneration`. +1. If the `.Spec.TemplateGeneration` is equivalent to the +`.Status.TargetTemplateGeneration`, no update has occurred. Note that, in the event that both are nil, they are considered to be equivalent, and we expect this to occur after an initial upgrade to a version of Kubernetes that supports StatefulSet update form one that does not. -1. If the `.Status.TemplateRevision` field is nil, and the -`.Spec.TemplateRevision` is not nil, then the StatefulSet has no revision +1. If the `.Status.TemplateGeneration` field is nil, and the +`.Spec.TemplateGeneration` is not nil, then the StatefulSet has no revision history. To initialize its revision history, the StatefulSet controller will -set both `.Status.TemplateRevision` and `.Status.TargetTemplateRevision` -to `.Spec.TemplateRevision` and +set both `.Status.TemplateGeneration` and `.Status.TargetTemplateGeneration` +to `.Spec.TemplateGeneration` and [create a new PodTemplate](#podtemplate-creation). -1. If the `.Status.TemplateRevision` is not nil, and if the -`.Spec.TemplateRevision` is not equal to the `.Status.TargetTemplateRevision`, +1. If the `.Status.TemplateGeneration` is not nil, and if the +`.Spec.TemplateGeneration` is not equal to the `.Status.TargetTemplateGeneration`, the StatefulSet controller will do the following. 1. The controller will [reconstruct the revision history](#history-reconsturction) of the @@ -382,49 +384,49 @@ Pod. These criteria allow the controller to continue to make progress toward its target state, while respecting its guarantees and allowing for rolling updates back and forward. -1. If the StatefulSet's `.Spec.TemplateRevision` is nil, then the cluster +1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, then the cluster has been upgraded from a version that does not support StatefulSet update to a version that does. 1. In this case the `.Spec.Template` is the current revision, and no Pods in the StatefulSet should be labeled with a - `StatefulSetPodTemplateRevision` label. + `StatefulSetPodTemplateGeneration` label. 1. The StatefulSet will initialize its revision history on the first update to its `.Spec.Template`. -1. If the StatefulSet's `.Spec.TemplateRevision` is equal to its -`.Status.TemplateRevision`, then there is no update in progress and all +1. If the StatefulSet's `.Spec.TemplateGeneration` is equal to its +`.Status.TemplateGeneration`, then there is no update in progress and all Pods will be created from the PodTemplate matching this revision. 1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`, then it was previously created from the PodTemplate matching the -StatefulSet's `.Status.TemplateRevision`, and it will be recreated +StatefulSet's `.Status.TemplateGeneration`, and it will be recreated from this PodTemplate. 1. If the Pod's ordinal is in the sequence `[.Spec.Replicas-.Status.UpdatedReplicas,.Spec.Replicas)`, then it was previously created from the PodTemplate matching the StatefulSet's, - `.Status.TargetTemplateRevision`, and it will be recreated from this + `.Status.TargetTemplateGeneration`, and it will be recreated from this PodTemplate. 1. If the ordinal does not meet either of the prior two conditions, and -if ordinal is in the sequence `[0, .Spec.RevisionPartition)`, it will be created +if ordinal is in the sequence `[0, .Spec.GenerationPartition)`, it will be created from the PodTemplate matching the StatefulSet's -`.Status.TemplateRevision`. +`.Status.TemplateGeneration`. 1. Otherwise, the Pod is created from the PodTemplate matching the -StatefulSet's `.Status.TargetTemplateRevision`. +StatefulSet's `.Status.TargetTemplateGeneration`. ### Update Completion A StatefulSet update is complete when the following conditions are met. 1. All Pods with ordinals in the sequence `[0,.Spec.Replicas)` have a Status of Running and a Ready Condition. -1. The StatefulSet's `.Spec.RevisionPartition` is equal to `0`. +1. The StatefulSet's `.Spec.GenerationPartition` is equal to `0`. 1. All Pods in the StatefulSet are labeled with a -`StatefulSetTemplateRevisionLabel` equal to the StatefulSet's -`.Status.TargetTemplateRevision` (This implies they have been created from +`StatefulSetTemplateGenerationLabel` equal to the StatefulSet's +`.Status.TargetTemplateGeneration` (This implies they have been created from the PodTemplate at that revision). When a StatefulSet update is complete, the controller will signal completion by doing the following. -1. The controller will set the StatefulSet's `.Status.TemplateRevision` to its -`.Status.TargetTemplateRevision`. +1. The controller will set the StatefulSet's `.Status.TemplateGeneration` to its +`.Status.TargetTemplateGeneration`. 1. The controller will set the StatefulSet's `Status.CurrentReplicas` to its `Status.UpdatedReplicas`. 1. The controller will set the StatefulSet's `Status.UpdatedReplicas` to 0. @@ -449,8 +451,8 @@ the `.Generation` of the StatefulSet object that was observed. created Pods. 1. The controller will set the `.Status.ReadyReplicas` to the current number of Pods that have a Status of Running and a ReadyCondition. -1. The controller will set the `.Status.TemplateRevision` and -`.Status.TargetTemplateRevision` +1. The controller will set the `.Status.TemplateGeneration` and +`.Status.TargetTemplateGeneration` in accordance with [maintaining its revision history](#history-maintenance) and the status of any [complete updates](#update-completion). 1. The controller will set the `.Status.CurrentReplicas` to the number of @@ -464,7 +466,7 @@ communicate it to observers. ## API Server The API Server will perform validation for StatefulSet updates and ensure that -a StatefulSet's `.Spec.TemplateRevision` is a generator for a strictly +a StatefulSet's `.Spec.TemplateGeneration` is a generator for a strictly monotonically increasing sequence. ### StatefulSet Validation @@ -473,13 +475,13 @@ fields of the StatefulSet object other than `.Spec.Replicas` and `.Spec.Template.Containers`. This design imposes the following, additional constraints. -1. The `.Spec.RevisionHistoryDepty` must be greater than or equal to `2`. +1. The `.Spec.RevisionHistoryLimit` must be greater than or equal to `2`. 1. The `.Spec.PositionOrdinal` must be in the sequence `[0,.Spec.Replicas)`. -### TemplateRevision Maintenance +### TemplateGeneration Maintenance It will be the responsibility of the API Server to enforce that updates to StatefulSet's `.Spec.Template` atomically increment the -`.Spec.TemplateRevision` counter. There is no need for the value to be +`.Spec.TemplateGeneration` counter. There is no need for the value to be strictly sequential, but it must be strictly, monotonically increasing. As validation will not allow mutation to any field other than the `.Spec.Template.Containers` field, the API Server need not track all fields of @@ -599,7 +601,7 @@ kubectl apply -f web.yaml ### Canaries Users can create a canary using `kubectl apply`. The only difference between a [rolling update](#rolling-out-an-update) and a canary is that the - `.Spec.RevisionPartition` is set to `.Spec.Replicas - 1`. + `.Spec.GenerationPartition` is set to `.Spec.Replicas - 1`. ```yaml apiVersion: apps/v1beta1 @@ -623,7 +625,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - revisionPartition: 2 + generationParition: 2 volumeClaimTemplates: - metadata: name: www @@ -661,7 +663,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - partitionOrdinal: 3 + generationParition: 3 volumeClaimTemplates: - metadata: name: www @@ -676,7 +678,7 @@ spec: ### Staged Roll Outs Users can create a canary using `kubectl apply`. The only difference between a - [canary](#canaries) and a staged roll out is that the `.Spec.RevisionPartition` + [canary](#canaries) and a staged roll out is that the `.Spec.GenerationPartition` is set to value less than `.Spec.Replicas - 1`. ```yaml @@ -701,7 +703,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - revisionParition: 2 + generationParition: 2 volumeClaimTemplates: - metadata: name: www @@ -718,7 +720,7 @@ Staged roll outs can be used to roll out a configuration, image, or resource update to some portion of the fleet maintained by the StatefulSet prior to updating the entire fleet. It is useful to support linear, geometric, and exponential roll out of an update. Users can modify the -`.Spec.RevisionPartition` to allow the roll out to progress. +`.Spec.GenerationPartition` to allow the roll out to progress. ```yaml apiVersion: apps/v1beta1 @@ -742,7 +744,7 @@ spec: volumeMounts: - name: www mountPath: /usr/share/nginx/html - revisionPartition: 1 + generationParition: 1 volumeClaimTemplates: - metadata: name: www @@ -771,8 +773,8 @@ Rolling back is usually the safest, and often the fastest, strategy to mitigate deployment failure, but rolling forward is sometimes the only practical solution for stateful applications (e.g. A users has a minor configuration error but has already modified the storage format for the application). Users can use -sequential `kubectl apply`'s to update the `.Status.TargetRevision` of a -StatefulSet. This will respect the `.Spec.RevisionPartition` with respect to the +sequential `kubectl apply`'s to update the `.Status.TargetGeneration` of a +StatefulSet. This will respect the `.Spec.GenerationPartition` with respect to the target state, and it therefor interacts well with canaries and staged roll outs. Note that, while users can update the target template revision, they can not update the current template revision. The only way to advance the current @@ -829,7 +831,7 @@ would for these events for StatefulSet update. While this proposal does not address [VolumeTemplateSpec updates](https://github.com/kubernetes/kubernetes/issues/41015), this would be a valuable feature for production users of storage systems that use -intermittent compaction as a form of garbage collection. Application that use +intermittent compaction as a form of garbage collection. Applications that use log structured merge trees with size tiered compaction (e.g Cassandra) or append only B(+/*) Trees (e.g Couchbase) can temporarily double their storage usage when compacting their on disk storage. If there is insufficient space for compaction