Vertical Pod Autoscaler
=======================
**Authors:** kgrygiel, mwielgus
**Contributors:** DirectXMan12, fgrzadkowski, jszczepkowski, smarterclayton
Vertical Pod Autoscaler
([#10782](https://github.com/kubernetes/kubernetes/issues/10782)),
later referred to as VPA (a.k.a. "rightsizing" or "autopilot"), is an
infrastructure service that automatically sets the resource requirements of
Pods and dynamically adjusts them at runtime, based on analysis of historical
resource utilization, the amount of resources available in the cluster, and
real-time events, such as OOMs.
- [Introduction](#introduction)
- [Background](#background)
- [Purpose](#purpose)
- [Related features](#related-features)
- [Requirements](#requirements)
- [Functional](#functional)
- [Availability](#availability)
- [Extensibility](#extensibility)
- [Design](#design)
- [Overview](#overview)
- [Architecture overview](#architecture-overview)
- [API](#api)
- [Admission Controller](#admission-controller)
- [Recommender](#recommender)
- [Updater](#updater)
- [Recommendation model](#recommendation-model)
- [History Storage](#history-storage)
- [Open questions](#open-questions)
- [Future work](#future-work)
- [Pods that require VPA to start](#pods-that-require-vpa-to-start)
- [Combining vertical and horizontal scaling](#combining-vertical-and-horizontal-scaling)
- [Batch workloads](#batch-workloads)
- [Alternatives considered](#alternatives-considered)
- [Pods point at VPA](#pods-point-at-vpa)
- [VPA points at Deployment](#vpa-points-at-deployment)
- [Actuation using the Deployment update mechanism](#actuation-using-the-deployment-update-mechanism)
------------
Introduction
------------
### Background ###
* [Compute resources](https://kubernetes.io/docs/user-guide/compute-resources/)
* [Resource QoS](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-qos.md)
* [Admission Controllers](https://kubernetes.io/docs/admin/admission-controllers/)
* [External Admission Webhooks](https://kubernetes.io/docs/admin/extensible-admission-controllers/#external-admission-webhooks)
### Purpose ###
Vertical scaling has two objectives:
1. Reducing the maintenance cost by automating the configuration of resource
requirements.
2. Improving the utilization of cluster resources, while minimizing the risk of
containers running out of memory or getting CPU-starved.
### Related features ###
#### Horizontal Pod Autoscaler ####
["Horizontal Pod Autoscaler"](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
(often abbreviated to HPA) is an infrastructure service that dynamically adjusts
the number of Pods in a replication controller based on real-time analysis of
CPU utilization or other, user-specified signals.
Usually the user will choose horizontal scaling for stateless workloads and
vertical scaling for stateful ones. In some cases both solutions could be combined
([see more](#combining-vertical-and-horizontal-scaling)).
#### Cluster Autoscaler ####
["Cluster Autoscaler"](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
is a tool that automatically adjusts the size of the Kubernetes cluster based on
the overall cluster utilization.
Cluster Autoscaler and Pod Autoscalers (vertical or horizontal) are
complementary features. Combined together they provide a fully automatic scaling
solution.
#### Initial resources ####
["Initial Resources"](https://github.com/kgrygiel/community/blob/master/contributors/design-proposals/initial-resources.md)
is a very preliminary, proof-of-concept feature that provides an initial request
based on historical utilization. It is designed to only kick in on Pod creation.
VPA is intended to supersede this feature.
#### In-place updates ####
In-place Pod updates
([#5774](https://github.com/kubernetes/kubernetes/issues/5774)) is a planned
feature to allow changing the resources (request/limit) of existing containers
without killing them, assuming sufficient free resources are available on the
node.
Vertical Pod Autoscaler will greatly benefit from this ability; however, it is
not considered a blocker for the MVP.
#### Resource estimation ####
Resource estimation is another planned feature, meant to improve node resource
utilization by temporarily reclaiming unused resources of running containers.
It is different from Vertical Autoscaling in that it operates on a shorter
timeframe (using only local, short-term history), re-offers resources at a
lower quality, and does not provide initial resource predictions.
VPA and resource estimation are complementary. Details will follow once
Resource Estimation is designed.
------------
Requirements
------------
### Functional ###
1. VPA is capable of setting container resources (CPU & memory request/limit) at
Pod submission time.
2. VPA is capable of adjusting container resources of existing Pods, in
particular reacting to CPU starvation and container OOM events.
3. When VPA restarts Pods, it respects the disruption budget.
4. It is possible for the user to configure VPA with fixed constraints on
resources, specifically: min & max request.
5. VPA is compatible with Pod controllers, at least with Deployments.
In particular:
* Updates of resources do not interfere/conflict with spec updates.
* It is possible to do a rolling update of the VPA policy (e.g. min resources)
on an existing Deployment.
6. It is possible to create Pod(s) that start following the VPA policy
immediately. In particular such Pods must not be scheduled until the VPA policy
is applied.
7. Disabling VPA is easy and fast ("panic button"), without disrupting existing
Pods.
### Availability ###
1. Downtime of heavy-weight components (database/recommender) must not block
recreating existing Pods. Components on the critical path for Pod creation
(the admission controller) are designed to be highly available.
### Extensibility ###
1. VPA is capable of performing in-place updates once they become available.
------
Design
------
### Overview ###
(see further sections for details and justification)
1. We introduce a new type of **API resource**:
`VerticalPodAutoscaler`. It consists of a **label selector** to match Pods,
the **resources policy** (controls how VPA computes the resources), the
**update policy** (controls how changes are applied to Pods) and the
recommended Pod resources (an output field).
2. **VPA Recommender** is a new component which **consumes utilization signals
and OOM events** for all Pods in the cluster from the
[Metrics Server](https://github.com/kubernetes-incubator/metrics-server).
3. VPA Recommender **watches all Pods**, keeps calculating fresh recommended
resources for them and **stores the recommendations in the VPA objects**.
4. Additionally the Recommender **exposes a synchronous API** that takes a Pod
description and returns recommended resources.
5. All Pod creation requests go through the VPA **Admission Controller**.
If the Pod is matched by any VerticalPodAutoscaler object, the admission
controller **overrides resources** of containers in the Pod with the
recommendation provided by the VPA Recommender. If the Recommender is not
available, it falls back to the recommendation cached in the VPA object.
6. **VPA Updater** is a component responsible for **real-time updates** of Pods.
If a Pod uses VPA in `"Auto"` mode, the Updater can decide to update it with
recommended resources.
In MVP this is realized by just evicting the Pod in order to have it
recreated with new resources. This approach requires the Pod to belong to a
ReplicaSet (or some other owner capable of recreating it).
In the future the Updater will take advantage of in-place updates, which would
most likely lift this constraint.
Because restarting/rescheduling Pods is disruptive to the service, it must be
rare.
7. VPA only controls the resource **request** of containers. It sets the limit
to infinity. The request is calculated based on analysis of the current and
previous runs (see [Recommendation model](#recommendation-model) below).
8. **History Storage** is a component that consumes utilization signals and OOMs
(same data as the Recommender) from the API Server and stores them persistently.
It is used by the Recommender to **initialize its state on startup**.
It can be backed by an arbitrary database. The first implementation will use
[Prometheus](https://github.com/kubernetes/charts/tree/master/stable/prometheus),
at least for the resource utilization part.
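To make the overview concrete, here is a minimal, hypothetical example of a VPA
object constructed in Go, assuming the API types defined in the [API](#api)
section below (the names and values are illustrative only, not part of the
proposal):

```go
package autoscaler

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleVPA sketches a VerticalPodAutoscaler that matches all Pods labeled
// app=frontend and lets VPA both set initial resources and update running Pods.
func exampleVPA() VerticalPodAutoscaler {
	return VerticalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "frontend-vpa"},
		Spec: VerticalPodAutoscalerSpec{
			// Target: a label selector matching the controlled Pods.
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "frontend"},
			},
			// Update policy: "Auto" allows eviction-based actuation.
			UpdatePolicy: PodUpdatePolicy{UpdateMode: UpdateModeAuto},
		},
	}
}
```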
### Architecture overview ###
![VPA Architecture Diagram](images/vpa-architecture.png "VPA architecture overview")
### API ###
We introduce a new type of API object `VerticalPodAutoscaler`, which
consists of the target, i.e. a [label selector](https://kubernetes.io/docs/api-reference/v1.5/#labelselector-unversioned)
for matching Pods, and two policy sections: the update policy and the resource
policy.
Additionally it holds the most recent recommendation computed by VPA.
#### VPA API object overview ####
```go
// VerticalPodAutoscalerSpec is the specification of the behavior of the autoscaler.
type VerticalPodAutoscalerSpec struct {
	// A label query that determines the set of pods controlled by the Autoscaler.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
	Selector *metav1.LabelSelector

	// Describes the rules on how changes are applied to the pods.
	// +optional
	UpdatePolicy PodUpdatePolicy

	// Controls how the autoscaler computes recommended resources.
	// +optional
	ResourcePolicy PodResourcePolicy
}

// VerticalPodAutoscalerStatus describes the runtime state of the autoscaler.
type VerticalPodAutoscalerStatus struct {
	// The time when the status was last refreshed.
	LastUpdateTime metav1.Time

	// The most recently computed amount of resources recommended by the
	// autoscaler for the controlled pods.
	// +optional
	Recommendation RecommendedPodResources

	// A free-form human readable message describing the status of the autoscaler.
	StatusMessage string
}
```
The complete API definition is included [below](#complete-vpa-api-object-definition).
#### Label Selector ####
The label selector determines which Pods will be scaled according to the given
VPA policy. The Recommender will aggregate signals for all Pods matched by a
given VPA, so it is important that the user set labels to group similarly
behaving Pods under one VPA.
It is yet to be determined how to resolve conflicts, i.e. when the Pod is
matched by more than one VPA (this is not a VPA-specific problem though).
#### Update Policy ####
The update policy controls how VPA applies changes. In MVP it consists of a
single field `mode` that controls how the autoscaler actuates its recommendations.
```json
"updatePolicy" {
"mode": "",
}
```
Mode can be set to one of the following:
1. `"Initial"`: VPA only assigns resources on Pod creation and does not
change them during lifetime of the Pod.
2. `"Auto"` (default): VPA assigns resources on Pod creation and
additionally can update them during lifetime of the Pod, including evicting /
rescheduling the Pod.
3. `"Off"`: VPA never changes Pod resources. The recommender still sets the
recommended resources in the VPA object. This can be used for a “dry run”.
To disable VPA updates the user can do any of the following: (1) change the
update policy mode to `"Off"`, (2) delete the VPA, or (3) change the Pod labels
to no longer match the VPA selector.
Note: disabling VPA prevents it from making further changes, but does not revert
the resources of running Pods until they are updated.
For example, when running a Deployment, the user would need to perform an update
to revert the Pods to their originally specified resources.
#### Resource Policy ####
The resource policy controls how VPA computes the recommended resources.
In MVP it consists of optional lower and upper bounds on the request of each
container.
The resource policy could later be extended with additional knobs to let the
user tune the recommendation algorithm to their specific use case.
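For illustration, a resource policy that constrains the memory request of one
container might look roughly as follows (a sketch assuming the
`PodResourcePolicy` and `ContainerResourcePolicy` types defined
[below](#complete-vpa-api-object-definition); the container name and values are
hypothetical):

```go
// Constrain the "server" container's recommended memory request to the range
// [256Mi, 4Gi]; all other containers fall under the default (unconstrained)
// policy. Assumes `api` is the core API package and `resource` is
// k8s.io/apimachinery/pkg/api/resource.
var policy = PodResourcePolicy{
	ContainerPolicies: []ContainerResourcePolicy{
		{
			Name: "server",
			MinAllowed: api.ResourceRequirements{
				Requests: api.ResourceList{api.ResourceMemory: resource.MustParse("256Mi")},
			},
			MaxAllowed: api.ResourceRequirements{
				Requests: api.ResourceList{api.ResourceMemory: resource.MustParse("4Gi")},
			},
		},
		{Name: DefaultContainerResourcePolicy},
	},
}
```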
#### Recommendation ####
The VPA resource has an output-only field keeping a recent recommendation,
filled by the Recommender. This field can be used to obtain a recent
recommendation even during a temporary unavailability of the Recommender.
The recommendation consists of the recommended target amount of resources as
well as a range (min..max), which can be used by the Updater to make decisions
on when to update the Pod.
In the case of a resource crunch the Updater may decide to squeeze Pod resources
towards the recommended minimum.
The width of the (min..max) range also reflects the confidence of a
recommendation. For example, for a workload with a very spiky usage it is much
harder to determine the optimal balance between performance and resource
utilization, compared to a workload with stable usage.
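For illustration, the Updater could use the range along these lines (a sketch
only; `RecommendedContainerResources` is defined below, while the helper itself
is hypothetical):

```go
// needsUpdate reports whether a container's current requests fall outside the
// recommended (min..max) range, in which case an update is worth considering.
func needsUpdate(current api.ResourceList, rec RecommendedContainerResources) bool {
	for name, cur := range current {
		if min, ok := rec.MinRecommended.Requests[name]; ok && cur.Cmp(min) < 0 {
			return true // below the recommended minimum: risk of OOM / starvation
		}
		if max, ok := rec.MaxRecommended.Requests[name]; ok && cur.Cmp(max) > 0 {
			return true // above the recommended maximum: likely wasting resources
		}
	}
	return false
}
```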
#### Complete VPA API object definition ####
```go
// VerticalPodAutoscaler is the configuration for a vertical pod
// autoscaler, which automatically manages pod resources based on historical and
// real time resource utilization.
type VerticalPodAutoscaler struct {
	metav1.TypeMeta
	// Standard object metadata.
	// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
	// +optional
	metav1.ObjectMeta

	// Specification of the behavior of the autoscaler.
	// More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.
	// +optional
	Spec VerticalPodAutoscalerSpec

	// Current information about the autoscaler.
	// +optional
	Status VerticalPodAutoscalerStatus
}

// VerticalPodAutoscalerSpec is the specification of the behavior of the autoscaler.
type VerticalPodAutoscalerSpec struct {
	// A label query that determines the set of pods controlled by the Autoscaler.
	// More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
	Selector *metav1.LabelSelector

	// Describes the rules on how changes are applied to the pods.
	// +optional
	UpdatePolicy PodUpdatePolicy

	// Controls how the autoscaler computes recommended resources.
	// +optional
	ResourcePolicy PodResourcePolicy
}

// VerticalPodAutoscalerStatus describes the runtime state of the autoscaler.
type VerticalPodAutoscalerStatus struct {
	// The time when the status was last refreshed.
	LastUpdateTime metav1.Time

	// The most recently computed amount of resources recommended by the
	// autoscaler for the controlled pods.
	// +optional
	Recommendation RecommendedPodResources

	// A free-form human readable message describing the status of the autoscaler.
	StatusMessage string
}

// UpdateMode controls when the autoscaler applies changes to the pod resources.
type UpdateMode string

const (
	// UpdateModeOff means that autoscaler never changes Pod resources.
	// The recommender still sets the recommended resources in the
	// VerticalPodAutoscaler object. This can be used for a "dry run".
	UpdateModeOff UpdateMode = "Off"

	// UpdateModeInitial means that autoscaler only assigns resources on pod
	// creation and does not change them during the lifetime of the pod.
	UpdateModeInitial UpdateMode = "Initial"

	// UpdateModeAuto means that autoscaler assigns resources on pod creation
	// and additionally can update them during the lifetime of the pod,
	// including evicting / rescheduling the pod.
	UpdateModeAuto UpdateMode = "Auto"
)

// PodUpdatePolicy describes the rules on how changes are applied to the pods.
type PodUpdatePolicy struct {
	// Controls when the autoscaler applies changes to the pod resources.
	// +optional
	UpdateMode UpdateMode
}

const (
	// DefaultContainerResourcePolicy can be passed as
	// ContainerResourcePolicy.Name to specify the default policy.
	DefaultContainerResourcePolicy = "*"
)

// ContainerResourcePolicy controls how autoscaler computes the recommended
// resources for a specific container.
type ContainerResourcePolicy struct {
	// Name of the container or DefaultContainerResourcePolicy, in which
	// case the policy is used by the containers that don't have their own
	// policy specified.
	Name string

	// Whether autoscaler is enabled for the container. Defaults to "On".
	// +optional
	Mode ContainerScalingMode

	// Specifies the minimal amount of resources that will be recommended
	// for the container.
	// +optional
	MinAllowed api.ResourceRequirements

	// Specifies the maximum amount of resources that will be recommended
	// for the container.
	// +optional
	MaxAllowed api.ResourceRequirements
}

// PodResourcePolicy controls how autoscaler computes the recommended resources
// for containers belonging to the pod.
type PodResourcePolicy struct {
	// Per-container resource policies.
	ContainerPolicies []ContainerResourcePolicy
}

// ContainerScalingMode controls whether autoscaler is enabled for a specific
// container.
type ContainerScalingMode string

const (
	// ContainerScalingModeOn means autoscaling is enabled for a container.
	ContainerScalingModeOn ContainerScalingMode = "On"

	// ContainerScalingModeOff means autoscaling is disabled for a container.
	ContainerScalingModeOff ContainerScalingMode = "Off"
)

// RecommendedPodResources is the recommendation of resources computed by
// autoscaler.
type RecommendedPodResources struct {
	// Resources recommended by the autoscaler for each container.
	ContainerRecommendations []RecommendedContainerResources
}

// RecommendedContainerResources is the recommendation of resources computed by
// autoscaler for a specific container. Respects the container resource policy
// if present in the spec.
type RecommendedContainerResources struct {
	// Name of the container.
	Name string

	// Recommended amount of resources.
	Target api.ResourceRequirements

	// Minimum recommended amount of resources.
	// Running the application with less resources is likely to have
	// significant impact on performance/availability.
	// +optional
	MinRecommended api.ResourceRequirements

	// Maximum recommended amount of resources.
	// Any resources allocated beyond this value are likely wasted.
	// +optional
	MaxRecommended api.ResourceRequirements
}
```
### Admission Controller ###
VPA Admission Controller intercepts Pod creation requests. If the Pod is matched
by a VPA config with mode not set to `"Off"`, the controller rewrites the request
by applying recommended resources to the Pod spec. Otherwise it leaves the Pod
spec unchanged.
The controller gets the recommended resources by calling the
[Recommender API](#recommender-api). If the call times out or fails, the
controller falls back to the recommendation cached in the VPA object.
If this is also not available, the controller lets the request pass through
with the originally specified resources.
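The fallback logic could be sketched as follows (the helper names, the
`recommender` client, and the timeout value are hypothetical, not part of the
proposal):

```go
import "time"

const shortTimeout = 200 * time.Millisecond // hypothetical cutoff for the live call

// recommendationFor returns the resources to apply to a newly created Pod,
// or nil if the Pod should be admitted with its original resources.
func (ac *AdmissionController) recommendationFor(pod *core.Pod, vpa *VerticalPodAutoscaler) *RecommendedPodResources {
	// 1. Ask the live Recommender, with a short timeout.
	if rec, err := ac.recommender.GetRecommendation(pod, shortTimeout); err == nil {
		return rec
	}
	// 2. Fall back to the recommendation cached in the VPA object.
	if cached := vpa.Status.Recommendation; len(cached.ContainerRecommendations) > 0 {
		return &cached
	}
	// 3. Neither available: let the Pod pass through unchanged.
	return nil
}
```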
Note: in the future it will be possible to (optionally) enforce using VPA by marking
the Pod as "requiring VPA". This will disallow scheduling the Pod before a
corresponding VPA config is created. The Admission Controller will reject such
Pods if it finds no matching VPA config. This ability will be convenient for the
user who wants to create the VPA config together with submitting the Pod.
The VPA Admission Controller will be implemented as an
[External Admission Hook](https://kubernetes.io/docs/admin/extensible-admission-controllers/#external-admission-webhooks).
Note however that this depends on the proposed feature to allow
[mutating webhook admission controllers](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/admission_control_extension.md#future-work).
### Recommender ###
Recommender is the main component of the VPA. It is responsible for
computing recommended resources. On startup the recommender fetches
historical resource utilization of all Pods (regardless of whether
they use VPA) together with the history of Pod OOM events from the
History Storage. It aggregates this data and keeps it in memory.
During normal operation the recommender consumes real time updates of
resource utilization and new events via the Metrics API from
the [Metrics Server](https://github.com/kubernetes-incubator/metrics-server).
Additionally it watches all Pods and all VPA objects in the
cluster. For every Pod that is matched by some VPA selector the
Recommender computes the recommended resources and sets the
recommendation on the VPA object.
It is important to realize that one VPA object has one recommendation.
The user is expected to use one VPA to control Pods with similar
resource usage patterns, typically a group of replicas or shards of
a single workload.
The Recommender acts as an
[extension-apiserver](https://kubernetes.io/docs/concepts/api-extension/apiserver-aggregation/),
exposing a synchronous method that takes a Pod Spec and the Pod metadata
and returns recommended resources.
#### Recommender API ####
```POST /recommendationQuery```
Request body:
```go
// RecommendationQuery obtains resource recommendation for a pod.
type RecommendationQuery struct {
	metav1.TypeMeta
	// +optional
	metav1.ObjectMeta

	// Spec is filled in by the caller to request a recommendation.
	Spec RecommendationQuerySpec

	// Status is filled in by the server with the recommended pod resources.
	// +optional
	Status RecommendationQueryStatus
}

// RecommendationQuerySpec is a request of recommendation for a pod.
type RecommendationQuerySpec struct {
	// Pod for which to compute the recommendation. Does not need to exist.
	Pod core.Pod
}

// RecommendationQueryStatus is a response to the recommendation request.
type RecommendationQueryStatus struct {
	// Recommendation holds recommended resources for the pod.
	// +optional
	Recommendation autoscaler.RecommendedPodResources

	// Error indicates that the recommendation was not available. Either
	// Recommendation or Error must be present.
	// +optional
	Error string
}
```
Notice that this API method may be called for an existing Pod, as well as for a
yet-to-be-created Pod.
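For illustration, a client could call this endpoint over plain HTTP/JSON
roughly as follows (a sketch only; in practice the call would go through the
aggregated API server, and the base URL here is hypothetical):

```go
import (
	"bytes"
	"encoding/json"
	"errors"
	"net/http"
)

// queryRecommendation asks the Recommender for resources for the given Pod,
// which may be an existing Pod or one that has not been created yet.
func queryRecommendation(client *http.Client, baseURL string, pod core.Pod) (*RecommendationQueryStatus, error) {
	query := RecommendationQuery{Spec: RecommendationQuerySpec{Pod: pod}}
	body, err := json.Marshal(&query)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+"/recommendationQuery", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var result RecommendationQuery
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return nil, err
	}
	// Either Recommendation or Error must be present in the response.
	if result.Status.Error != "" {
		return nil, errors.New(result.Status.Error)
	}
	return &result.Status, nil
}
```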
### Updater ###
VPA Updater is a component responsible for applying recommended resources to
existing Pods.
It monitors all VPA objects and Pods in the cluster, periodically fetching
recommendations for the Pods that are controlled by VPA by calling the
Recommender API.
When recommended resources significantly diverge from actually configured
resources, the Updater may decide to update a Pod.
In MVP (until in-place updates of Pod resources are available)
this means evicting Pods in order to have them recreated with the recommended
resources.
The Updater relies on other mechanisms (such as a ReplicaSet) to recreate a
deleted Pod. However it does not verify whether such a mechanism is actually
configured for the Pod. Such checks could be implemented in the CLI to warn
the user when a VPA would match Pods that are not automatically restarted.
While terminating Pods is disruptive and generally undesired, it is sometimes
justified in order to (1) avoid CPU starvation, (2) reduce the risk of correlated
OOMs across multiple Pods at a random time, or (3) save resources over long
periods of time.
Apart from its own policy on how often a Pod can be evicted, the Updater also
respects the Pod disruption budget, by using the Eviction API to evict Pods.
The Updater only acts on Pods that are matched by a VPA with updatePolicy.mode
set to `"Auto"`.
The Updater will also need to understand how to adjust the recommendation before
applying it to a Pod, based on the current state of the cluster (e.g. quota,
space available on nodes or other scheduling constraints).
Otherwise it may deschedule a Pod permanently. This mechanism is not yet
designed.
### Recommendation model ###
VPA controls the request (memory and CPU) of containers. In MVP it always sets
the limit to infinity. It is not yet clear whether there is a use-case for VPA
setting the limit.
The request is calculated based on analysis of the current and previous runs of
the container and other containers with similar properties (name, image,
command, args).
The recommendation model (MVP) assumes that the memory and CPU consumption are
independent random variables with distribution equal to the one observed in the
last N days (recommended value is N=8 to capture weekly peaks).
A more advanced model in future could attempt to detect trends, periodicity and
other time-related patterns.
For CPU the objective is to **keep the fraction of time when the container usage
exceeds a high percentage (e.g. 95%) of request below a certain threshold**
(e.g. 1% of time).
In this model the "CPU usage" is defined as the mean usage measured over a short
interval. The shorter the measurement interval, the better the quality of
recommendations for spiky, latency-sensitive workloads. The minimum reasonable
resolution is one sample per minute; one sample per second is recommended.
For memory the objective is to **keep the probability of the container usage
exceeding the request in a specific time window below a certain threshold**
(e.g. below 1% in 24h). The window must be long (≥ 24h) to ensure that evictions
caused by OOMs do not visibly affect (a) the availability of serving applications
or (b) the progress of batch computations (a more advanced model could allow the
user to specify an SLO to control this).
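A deliberately naive percentile estimator illustrating this idea is sketched
below (it simplifies the stated objectives; the real model would additionally
weight samples by age, aggregate across similar containers, and add a safety
margin; the example numbers are hypothetical):

```go
import "sort"

// recommendRequest returns the given percentile (0..1) of the observed usage
// samples, scaled by a margin factor, as the recommended request.
func recommendRequest(samples []float64, percentile, margin float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	idx := int(percentile * float64(len(sorted)-1))
	return sorted[idx] * margin
}

// Example: base the CPU request on the 95th percentile of per-second usage
// samples, with a 15% margin on top.
// cpuRequest := recommendRequest(cpuSamples, 0.95, 1.15)
```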
#### Handling OOMs ####
When a container is evicted due to exceeding the available memory, its actual
memory requirement is not known (the amount consumed obviously gives a lower
bound). This is modelled by translating OOM events to artificial memory usage
samples, obtained by applying a "safety margin" multiplier to the last observed
usage.
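In code this could be as simple as the following sketch (the multiplier value
is hypothetical):

```go
// oomSafetyMargin is a hypothetical "safety margin" multiplier applied to the
// last usage observed before an OOM, since the true requirement is only
// lower-bounded by that value.
const oomSafetyMargin = 1.2

// oomToSample translates an OOM event into an artificial memory usage sample
// that can be fed into the recommendation model alongside regular samples.
func oomToSample(lastObservedUsageBytes int64) int64 {
	return int64(float64(lastObservedUsageBytes) * oomSafetyMargin)
}
```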
### History Storage ###
VPA defines a data access API for providers of historical events and resource
utilization. Initially we will use Prometheus as the reference implementation of
this API, at least for the resource utilization part. The historical events
could be backed by another solution, e.g.
[Infrastore](https://github.com/kubernetes/kubernetes/issues/44095).
Users will be able to plug in their own implementations.
History Storage is populated with real-time updates of resource utilization and
events, similarly to the Recommender. The storage keeps at least 8 days of data.
This data is only used to initialize the Recommender on startup.
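The provider API has not been specified yet; conceptually it could be an
interface along these lines (entirely illustrative, including all type names):

```go
import "time"

// ContainerID identifies a container across Pod restarts.
type ContainerID struct {
	Namespace, PodName, ContainerName string
}

// UtilizationSample is a single resource usage measurement.
type UtilizationSample struct {
	Time     time.Time
	CPUCores float64
	MemBytes int64
}

// OOMEvent records an out-of-memory kill of a container.
type OOMEvent struct {
	Time time.Time
	// Memory usage observed just before the OOM (a lower bound on the need).
	MemBytes int64
}

// HistoryProvider is a pluggable source of cluster history; a
// Prometheus-backed implementation would be the reference for the
// utilization part.
type HistoryProvider interface {
	// GetClusterHistory returns utilization samples and OOM events for all
	// containers, covering at least the last 8 days.
	GetClusterHistory() (map[ContainerID][]UtilizationSample, map[ContainerID][]OOMEvent, error)
}
```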
### Open questions ###
1. How to resolve conflicts if multiple VPA objects match a Pod.
2. How to adjust the recommendation before applying it to a specific pod,
based on the current state of the cluster (e.g. quota, space available on
nodes or other scheduling constraints).
-----------
Future work
-----------
### Pods that require VPA to start ###
In the current proposal the Pod will be scheduled with originally configured
resources if no matching VPA config is present at the Pod admission time.
This may be undesired behavior. In particular the user may want to create the
VPA config together with submitting the Pod, which leads to a race condition:
the outcome depends on which resource (VPA or the Pod) is processed first.
In order to address this problem we propose to allow marking Pods with a special
annotation ("requires VPA") that prevents the Admission Controller from allowing
the Pod if a corresponding VPA is not available.
An alternative would be to introduce a VPA Initializer serving the same purpose.
### Combining vertical and horizontal scaling ###
In principle it may be possible to use both vertical and horizontal scaling for
a single workload (group of Pods), as long as the two mechanisms operate on
different resources.
The right approach is to let the
[Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
scale the group based on the _bottleneck_ resource. The Vertical Pod Autoscaler
could then control other resources. Examples:
1. A CPU-bound workload can be scaled horizontally based on the CPU utilization
while using vertical scaling to adjust memory.
2. An IO-bound workload can be scaled horizontally based on the IO throughput
while using vertical scaling to adjust both memory and CPU.
However this is a more advanced form of autoscaling and it is not well supported
by the MVP version of Vertical Pod Autoscaler. The difficulty comes from the
fact that changing the number of instances affects not only the utilization of
the bottleneck resource (which is the principle of horizontal scaling) but
potentially also non-bottleneck resources that are controlled by VPA.
The VPA model will have to be extended to take the size of the group into account
when aggregating the historical resource utilization and when producing a
recommendation, in order to allow combining it with HPA.
### Batch workloads ###
Batch workloads have different CPU requirements than latency sensitive
workloads. Instead of latency they care about throughput, which means VPA should
base the CPU requirements on average CPU consumption rather than high
percentiles of CPU distribution.
TODO: describe the recommendation model for the batch workloads and how VPA will
distinguish between batch and serving. A possible approach is to look at
`PodSpec.restartPolicy`.
An alternative would be to let the user specify the latency requirements of the
workload in the `PodResourcePolicy`.
-----------------------
Alternatives considered
-----------------------
### Pods point at VPA ###
*REJECTED BECAUSE IT REQUIRES MODIFYING THE POD SPEC*
#### proposal: ####
Instead of VPA using label selectors, Pod Spec is extended with an optional
field `verticalPodAutoscalerPolicy`,
a [reference](https://kubernetes.io/docs/api-reference/v1/definitions/#_v1_localobjectreference)
to the VPA config.
#### pros: ####
* Consistency is enforced at the API level:
* At most one VPA can point to a given Pod.
* It is always clear at admission stage whether the Pod should use
VPA or not. No race conditions.
* It is cheap to find the VPA for a given Pod.
#### cons: ####
* Requires changing the core part of the API (Pod Spec).
### VPA points at Deployment ###
#### proposal: ####
VPA has a reference to a Deployment object. It doesn't use a label selector to
match Pods.
#### pros: ####
* More consistent with HPA.
#### cons: ####
* Extending VPA support from Deployment to other abstractions that manage Pods
requires additional work. VPA must be aware of all such abstractions.
* It is not possible to do a rolling update of the VPA config.
For example setting `max_memory` in the VPA config will apply to the whole
Deployment immediately.
* VPA can't be shared between Deployments.
### Actuation using the Deployment update mechanism ###
In this solution the Deployment itself is responsible for actuating VPA
decisions.
#### Actuation by update of spec ####
In this variant changes of resources are applied similarly to normal changes of
the spec, i.e. using the Deployment rolling update mechanism.
**pros:** existing clean API (and implementation), one common update policy
(e.g. max surge, max unavailable).
**cons:** conflicts with user (config) updates: updates of resources and spec
are tied together (they are executed at the same rate); problems with rollbacks;
problems with pause. It is not clear how to handle in-place updates (this
problem has to be solved regardless of VPA, though).
#### Dedicated method for resource update ####
In this variant Deployment still uses the rolling update mechanism for updating
resources, but update of resources is treated in a special way, so that it can
be performed in parallel with config update.
**pros:** handles concurrent resources and spec updates, solves resource updates
without VPA, more consistent with HPA, all update logic lives in one place (less
error-prone).
**cons:** specific to Deployment; high complexity (multiple ReplicaSets created
underneath are exposed to the user, which can be confusing and error-prone).