Change Revision to Generation for consistency

Make reference to fact that Deployment and Stateful set both update from
a specific revision to a specific revision
This commit is contained in:
Kenneth Owens 2017-04-03 16:18:15 -07:00
parent 708300f8ca
commit ad19d350a4
1 changed files with 80 additions and 78 deletions

View File

@ -54,7 +54,7 @@ compensate for the application's scaling.
- StatefulSet update must support singleton StatefulSets. However, an update in
this case will cause a temporary outage. This is acceptable as a single
process application is, by definition, not highly available.
- Disruption in Kubernetes is controlled by PodDistruptionBugets. As
- Disruption in Kubernetes is controlled by PodDistruptionBudgets. As
StatefulSet updates progress one Pod at a time, and only occur when all
other Pods have a Status of Running and a Ready Condition, they can not
violate reasonable PodDisrutptionBugdets.
@ -81,8 +81,10 @@ compensate for the application's scaling.
- Stateful applications are likely to evolve wire protocols and storage formats
between versions. In most cases, when updating the application's Pod's
containers, it will not be safe to roll back or forward to an arbitrary
version. StatefulSet update should work well when rolling out an update,
or performing a rollback, between two specific revisions of the StatefulSet.
version. Controller based Pod update should work well when rolling out an
update, or performing a rollback, between two specific revisions of the
controlled API object. This is how Deployment functions, and this property is,
perhaps, even more critical for stateful applications.
## Requirements
This design is based on the following requirements.
@ -109,26 +111,26 @@ The following modifications will be made to the StatefulSetStatus API object.
type StatefulSetStatus struct {
// ObservedGeneration and Replicas fields are ommitted for brevity.
// TemplateRevision, if not nil, is the revision of the PodTemplate that was
// TemplateGeneration, if not nil, is the generation of the PodTemplate that was
// used to create Pods with ordinals in the sequence
// [0,CurrentReplicas).
TemplateRevision *int64 `json:"templateRevision,omitempty"`
TemplateGeneration *int64 `json:"templateGeneration,omitempty"`
// TargetTemplateRevision, if not nil, is the revision of the PodTemplate
// TargetTemplateGeneration, if not nil, is the generation of the PodTemplate
// that was used to create Pods with ordinals in the sequence
// [Replicas - UpdatedReplicas, Replicas).
TargetTemplateRevision *int64 `json:"targetTemplateRevision,omitempty"`
TargetTemplateGeneration *int64 `json:"targetTemplateGeneration,omitempty"`
// ReadyReplicas is the current number of Pods, created by the StatefulSet
// controller, that have a Status for Running and a Ready Condition.
ReadyReplicas int32 `json:"readyReplicas,omitempty"`
// CurrentRevisionReplicas is the number of Pods created by the StatefulSet
// controller from the PodTemplateSpec indicated by CurrentTemplateRevision.
// CurrentGenerationReplicas is the number of Pods created by the StatefulSet
// controller from the PodTemplateSpec indicated by CurrentTemplateGeneration.
CurrentReplicas int32 `json:"currentReplicas,omitempty"`
// UpdatedReplicas is the number of Pods created by the StatefulSet
// controller from the PodTemplateSpec indicated by TargetTemplateRevision.
// controller from the PodTemplateSpec indicated by TargetTemplateGeneration.
UpdatedReplicas int32 `json:"taretReplicas,omitempty"`
}
```
@ -141,25 +143,25 @@ type StatefulSetSpec struct {
// ommitted for brevity.
v1.PodTemplateSpec `json:"template"`
// TemplateRevision is a monotonically increasing, 64 bit, integer used to
// TemplateGeneration is a monotonically increasing, 64 bit, integer used to
// indicate the version of the of the PodTemplateSpec. If nil, the
// StatefulSetController has not initialized its revision history,
// change tracking is not enabled, and all Pods will be created from
// Template.
TemplateRevision *int64 `json:"templateRevision"`
TemplateGeneration *int64 `json:"templateGeneration"`
// RevisionPartition partitions the Pods in the StatefulSet by ordinal such
// GenerationPartition partitions the Pods in the StatefulSet by ordinal such
// that all Pods with a lower ordinal will be created from the PodTemplate that
// represents the current revision of the StatefulSet's revision history and
// all Pods with an a greater or equal ordinal will be created from the
// PodTemplate that represents the target revision of the StatefulSet's
// revision history.
RevisionPartition *int32 `json:"revisionPartition,omitempty`
GenerationPartition *int32 `json:"generationPartition,omitempty`
// RevisionHistoryLimit is the maximum number of PodTemplates that will
// be maintained in the StatefulSet's revision history. It must be at
// least two.
RevisionHisotryLimit int32 `json:historyRevisionDepth,omitempty`
RevisionHisotryLimit int32 `json:revisionHistoryLimit,omitempty`
}
```
@ -171,10 +173,10 @@ Additionally, we introduce the following constants.
// history.
const StatefulSetPodTemplateLabel = "created-by-statefulset"
// StatefulSetTemplateRevisionLabel is the label applied to a PodTemplate or
// StatefulSetTemplateGenerationLabel is the label applied to a PodTemplate or
// Pod to indicate the position of the object's Template in the revision
// history of a StatefulSet.
const StatefulSetTemplateRevisionLabel = "statefulset-template-revision"
const StatefulSetTemplateGenerationLabel = "statefulset-template-revision"
```
## StatefulSet Controller
@ -197,23 +199,23 @@ conform to the other. The conditions that define this state in terms of the
StatefulSet's StatefulSetSpec and StatefulSetStatus are below.
1. The StatefulSet contains exactly `[0,.Spec.Replicas)` Pods.
1. If StatefulSet's `.Spec.RevisionPartition` is nil, then the following is true.
1. The StatefulSet's `.Status.TemplateRevision` is equal to its
`.Status.TargetRevision`.
1. If StatefulSet's `.Spec.GenerationPartition` is nil, then the following is true.
1. The StatefulSet's `.Status.TemplateGeneration` is equal to its
`.Status.TargetGeneration`.
1. All Pods in the StatefulSet have been generated from the PodTemplate
labeled with a `StatefulSetTemplateRevisionLabel` equal to its
`.Status.TemplateRevision`.
1. If the StatefulSet's `.Spec.RevisionPartition` is not nil, then the following
labeled with a `StatefulSetTemplateGenerationLabel` equal to its
`.Status.TemplateGeneration`.
1. If the StatefulSet's `.Spec.GenerationPartition` is not nil, then the following
is true.
1. All Pods with ordinals is the sequence `[0,.Spec.RevisionPartition)` have
1. All Pods with ordinals is the sequence `[0,.Spec.GenerationPartition)` have
been generated from the PodTemplate in the StatefulSet's revision history
that is labeled with a `StatefulSetTemplateRevisionLabel` equal to
`.Status.TemplateRevision`.
that is labeled with a `StatefulSetTemplateGenerationLabel` equal to
`.Status.TemplateGeneration`.
1. All Pods with ordinals in the sequence
`[Spec.RevisionParition,.Spec.Replicas)` have been created with the
`[Spec.GenerationParition,.Spec.Replicas)` have been created with the
PodTemplate in the StatefulSet's revision history that is labeled with a
`StatefulSetTemplateRevisionLabel` equal to
`.Status.TargetTemplateRevision`.
`StatefulSetTemplateGenerationLabel` equal to
`.Status.TargetTemplateGeneration`.
### Revised Controller Algorithm
The StatefulSet controller will use the following algorithm to continue to
@ -274,10 +276,10 @@ StatefulSet's `.Spec.Template` field.
`StatefulSetPodTemplateLabel` set to the StatefulSet's `.Name` to allow for
selection of the PodTemplates that comprise the StatefulSet's revision history.
1. The controller will label the PodTemplate with a
`StaefulSetTemplateRevisionLabel` set to the StatefulSet's
`.Spec.TemplateRevision`.
`StaefulSetTemplateGenerationLabel` set to the StatefulSet's
`.Spec.TemplateGeneration`.
1. The controller will set the Name of the PodTemplate to a concatenation of the
`.Name` of the StatefulSet and the `.Spec.TemplateRevision`.
`.Name` of the StatefulSet and the `.Spec.TemplateGeneration`.
1. The controller will then create the PodTemplate.
#### PodTemplate Deletion
@ -297,7 +299,7 @@ and it will allow the PodTemplate to be deleted via garbage collection.
In order to reconstruct the history of revisions to a StatefulSet, the
StatefulSet controller will do the following.
1. If the StatefulSet's `.Spec.TemplateRevision` is nil, the StatefulSet
1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, the StatefulSet
has never been updated, and its history has never been initialized. This is
the state the object will be in when a cluster is first upgraded from a version
that does not support StatefulSet update to a version that does. In this case,
@ -311,12 +313,12 @@ ControllerRef matching the the StatefulSet. If the controller selects
PodTemplates that it does not own, it will report an error, but it will continue
reconstructing the StatefulSet's history.
1. The controller will filter out all PodTemplates that do not have a
`StatefulSetTemplateRevisionLabel` mapped to a valid revision. This can only
`StatefulSetTemplateGenerationLabel` mapped to a valid revision. This can only
occur if the user purposefully deletes the label. In this case, the
controller will report an error, but it will continue reconstructing the
StatefulSet's revision history.
1. For all the remaining PodTemplates, the controller will sort them in
ascending order by the value mapped to their `StatefulSetTemplateRevisionLabel`.
ascending order by the value mapped to their `StatefulSetTemplateGenerationLabel`.
This will reconstruct a list of PodTemplates from oldest to newest. Note that,
as the revision is monotonically increasing for an individual StatefulSet, and
as we use ControllerRef to mitigate selector overlap, the StatefulSet's history
@ -331,13 +333,13 @@ the oldest PodTemplates from the StatefulSet's revision history.
[reconstruct the revision history](#history-reconstruction)
of the StatefulSet.
1. If the number of PodTemplates in the StatefulSet's revision history is
greater than the StatefulSet's `.Spec.RevisionHistoryLimit, the
greater than the StatefulSet's `.Spec.RevisionHistoryLimit`, the
StatefulSet controller will delete PodTemplates, starting with the head of
the revision history, until the limit of the revision history is equal to
the StatefulSet's `.Spec.RevisionHistoryLimit`.
1. As a StatefulSet's `.Spec.RevisionHistoryLimit` is always at least two, and
as the PodTemplates corresponding to `.Status.TemplateRevision`
or `.Status.TargetTemplateRevision` are always the most recent PodTemplates
as the PodTemplates corresponding to `.Status.TemplateGeneration`
or `.Status.TargetTemplateGeneration` are always the most recent PodTemplates
in the revision history, the StatefulSet controller will not delete any
`PodTemplates` that represent the current or target revisions.
@ -346,21 +348,21 @@ The StatefulSet controller will create PodTemplates upon mutation of the
`.Spec.Template` of a StatefulSet.
1. When the StafefulSet controller observes a mutation to a StatefulSet's
`.Spec.Template` it will compare the `.Spec.TemplateRevision` to the
`.Status.TargetTemplateRevision`.
1. If the `.Spec.TemplateRevision` is equivalent to the
`.Status.TargetTemplateRevision`, no update has occurred. Note that, in the
`.Spec.Template` it will compare the `.Spec.TemplateGeneration` to the
`.Status.TargetTemplateGeneration`.
1. If the `.Spec.TemplateGeneration` is equivalent to the
`.Status.TargetTemplateGeneration`, no update has occurred. Note that, in the
event that both are nil, they are considered to be equivalent, and we expect
this to occur after an initial upgrade to a version of Kubernetes that supports
StatefulSet update form one that does not.
1. If the `.Status.TemplateRevision` field is nil, and the
`.Spec.TemplateRevision` is not nil, then the StatefulSet has no revision
1. If the `.Status.TemplateGeneration` field is nil, and the
`.Spec.TemplateGeneration` is not nil, then the StatefulSet has no revision
history. To initialize its revision history, the StatefulSet controller will
set both `.Status.TemplateRevision` and `.Status.TargetTemplateRevision`
to `.Spec.TemplateRevision` and
set both `.Status.TemplateGeneration` and `.Status.TargetTemplateGeneration`
to `.Spec.TemplateGeneration` and
[create a new PodTemplate](#podtemplate-creation).
1. If the `.Status.TemplateRevision` is not nil, and if the
`.Spec.TemplateRevision` is not equal to the `.Status.TargetTemplateRevision`,
1. If the `.Status.TemplateGeneration` is not nil, and if the
`.Spec.TemplateGeneration` is not equal to the `.Status.TargetTemplateGeneration`,
the StatefulSet controller will do the following.
1. The controller will
[reconstruct the revision history](#history-reconsturction) of the
@ -382,49 +384,49 @@ Pod. These criteria allow the controller to continue to make progress toward
its target state, while respecting its guarantees and allowing for rolling
updates back and forward.
1. If the StatefulSet's `.Spec.TemplateRevision` is nil, then the cluster
1. If the StatefulSet's `.Spec.TemplateGeneration` is nil, then the cluster
has been upgraded from a version that does not support StatefulSet update to
a version that does.
1. In this case the `.Spec.Template` is the current revision,
and no Pods in the StatefulSet should be labeled with a
`StatefulSetPodTemplateRevision` label.
`StatefulSetPodTemplateGeneration` label.
1. The StatefulSet will initialize its revision history on the first
update to its `.Spec.Template`.
1. If the StatefulSet's `.Spec.TemplateRevision` is equal to its
`.Status.TemplateRevision`, then there is no update in progress and all
1. If the StatefulSet's `.Spec.TemplateGeneration` is equal to its
`.Status.TemplateGeneration`, then there is no update in progress and all
Pods will be created from the PodTemplate matching this revision.
1. If the Pod's ordinal is in the sequence `[0,.Status.CurrentReplicas)`,
then it was previously created from the PodTemplate matching the
StatefulSet's `.Status.TemplateRevision`, and it will be recreated
StatefulSet's `.Status.TemplateGeneration`, and it will be recreated
from this PodTemplate.
1. If the Pod's ordinal is in the sequence
`[.Spec.Replicas-.Status.UpdatedReplicas,.Spec.Replicas)`, then it was
previously created from the PodTemplate matching the StatefulSet's,
`.Status.TargetTemplateRevision`, and it will be recreated from this
`.Status.TargetTemplateGeneration`, and it will be recreated from this
PodTemplate.
1. If the ordinal does not meet either of the prior two conditions, and
if ordinal is in the sequence `[0, .Spec.RevisionPartition)`, it will be created
if ordinal is in the sequence `[0, .Spec.GenerationPartition)`, it will be created
from the PodTemplate matching the StatefulSet's
`.Status.TemplateRevision`.
`.Status.TemplateGeneration`.
1. Otherwise, the Pod is created from the PodTemplate matching the
StatefulSet's `.Status.TargetTemplateRevision`.
StatefulSet's `.Status.TargetTemplateGeneration`.
### Update Completion
A StatefulSet update is complete when the following conditions are met.
1. All Pods with ordinals in the sequence `[0,.Spec.Replicas)` have a Status of
Running and a Ready Condition.
1. The StatefulSet's `.Spec.RevisionPartition` is equal to `0`.
1. The StatefulSet's `.Spec.GenerationPartition` is equal to `0`.
1. All Pods in the StatefulSet are labeled with a
`StatefulSetTemplateRevisionLabel` equal to the StatefulSet's
`.Status.TargetTemplateRevision` (This implies they have been created from
`StatefulSetTemplateGenerationLabel` equal to the StatefulSet's
`.Status.TargetTemplateGeneration` (This implies they have been created from
the PodTemplate at that revision).
When a StatefulSet update is complete, the controller will signal completion by
doing the following.
1. The controller will set the StatefulSet's `.Status.TemplateRevision` to its
`.Status.TargetTemplateRevision`.
1. The controller will set the StatefulSet's `.Status.TemplateGeneration` to its
`.Status.TargetTemplateGeneration`.
1. The controller will set the StatefulSet's `Status.CurrentReplicas` to its
`Status.UpdatedReplicas`.
1. The controller will set the StatefulSet's `Status.UpdatedReplicas` to 0.
@ -449,8 +451,8 @@ the `.Generation` of the StatefulSet object that was observed.
created Pods.
1. The controller will set the `.Status.ReadyReplicas` to the current number of
Pods that have a Status of Running and a ReadyCondition.
1. The controller will set the `.Status.TemplateRevision` and
`.Status.TargetTemplateRevision`
1. The controller will set the `.Status.TemplateGeneration` and
`.Status.TargetTemplateGeneration`
in accordance with [maintaining its revision history](#history-maintenance)
and the status of any [complete updates](#update-completion).
1. The controller will set the `.Status.CurrentReplicas` to the number of
@ -464,7 +466,7 @@ communicate it to observers.
## API Server
The API Server will perform validation for StatefulSet updates and ensure that
a StatefulSet's `.Spec.TemplateRevision` is a generator for a strictly
a StatefulSet's `.Spec.TemplateGeneration` is a generator for a strictly
monotonically increasing sequence.
### StatefulSet Validation
@ -473,13 +475,13 @@ fields of the StatefulSet object other than `.Spec.Replicas` and
`.Spec.Template.Containers`. This design imposes the following, additional
constraints.
1. The `.Spec.RevisionHistoryDepty` must be greater than or equal to `2`.
1. The `.Spec.RevisionHistoryLimit` must be greater than or equal to `2`.
1. The `.Spec.PositionOrdinal` must be in the sequence `[0,.Spec.Replicas)`.
### TemplateRevision Maintenance
### TemplateGeneration Maintenance
It will be the responsibility of the API Server to enforce that updates to
StatefulSet's `.Spec.Template` atomically increment the
`.Spec.TemplateRevision` counter. There is no need for the value to be
`.Spec.TemplateGeneration` counter. There is no need for the value to be
strictly sequential, but it must be strictly, monotonically increasing.
As validation will not allow mutation to any field other than the
`.Spec.Template.Containers` field, the API Server need not track all fields of
@ -599,7 +601,7 @@ kubectl apply -f web.yaml
### Canaries
Users can create a canary using `kubectl apply`. The only difference between a
[rolling update](#rolling-out-an-update) and a canary is that the
`.Spec.RevisionPartition` is set to `.Spec.Replicas - 1`.
`.Spec.GenerationPartition` is set to `.Spec.Replicas - 1`.
```yaml
apiVersion: apps/v1beta1
@ -623,7 +625,7 @@ spec:
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
revisionPartition: 2
generationParition: 2
volumeClaimTemplates:
- metadata:
name: www
@ -661,7 +663,7 @@ spec:
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
partitionOrdinal: 3
generationParition: 3
volumeClaimTemplates:
- metadata:
name: www
@ -676,7 +678,7 @@ spec:
### Staged Roll Outs
Users can create a canary using `kubectl apply`. The only difference between a
[canary](#canaries) and a staged roll out is that the `.Spec.RevisionPartition`
[canary](#canaries) and a staged roll out is that the `.Spec.GenerationPartition`
is set to value less than `.Spec.Replicas - 1`.
```yaml
@ -701,7 +703,7 @@ spec:
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
revisionParition: 2
generationParition: 2
volumeClaimTemplates:
- metadata:
name: www
@ -718,7 +720,7 @@ Staged roll outs can be used to roll out a configuration, image, or resource
update to some portion of the fleet maintained by the StatefulSet prior to
updating the entire fleet. It is useful to support linear, geometric, and
exponential roll out of an update. Users can modify the
`.Spec.RevisionPartition` to allow the roll out to progress.
`.Spec.GenerationPartition` to allow the roll out to progress.
```yaml
apiVersion: apps/v1beta1
@ -742,7 +744,7 @@ spec:
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
revisionPartition: 1
generationParition: 1
volumeClaimTemplates:
- metadata:
name: www
@ -771,8 +773,8 @@ Rolling back is usually the safest, and often the fastest, strategy to mitigate
deployment failure, but rolling forward is sometimes the only practical solution
for stateful applications (e.g. A users has a minor configuration error but has
already modified the storage format for the application). Users can use
sequential `kubectl apply`'s to update the `.Status.TargetRevision` of a
StatefulSet. This will respect the `.Spec.RevisionPartition` with respect to the
sequential `kubectl apply`'s to update the `.Status.TargetGeneration` of a
StatefulSet. This will respect the `.Spec.GenerationPartition` with respect to the
target state, and it therefor interacts well with canaries and staged roll outs.
Note that, while users can update the target template revision, they can not
update the current template revision. The only way to advance the current
@ -829,7 +831,7 @@ would for these events for StatefulSet update.
While this proposal does not address
[VolumeTemplateSpec updates](https://github.com/kubernetes/kubernetes/issues/41015),
this would be a valuable feature for production users of storage systems that use
intermittent compaction as a form of garbage collection. Application that use
intermittent compaction as a form of garbage collection. Applications that use
log structured merge trees with size tiered compaction (e.g Cassandra) or append
only B(+/*) Trees (e.g Couchbase) can temporarily double their storage usage when
compacting their on disk storage. If there is insufficient space for compaction