[es] Creating first disruptions.md file

---
reviewers:
- electrocucaracha
- raelga
title: Disruptions
content_type: concept
weight: 60
---

<!-- overview -->

This guide is for application owners who want to build highly available applications, and who therefore need to understand what types of disruptions can happen to Pods.

It is also for cluster administrators who want to perform automated cluster actions, like upgrading and autoscaling clusters.

<!-- body -->

## Voluntary and involuntary disruptions

Pods do not disappear until something (a person or a controller) destroys them, or there is an unavoidable hardware or system software error.

We call these unavoidable cases *involuntary disruptions* to an application. Some examples are:

- a hardware failure of the physical machine backing the node
- a cluster administrator deletes a VM (instance) by mistake
- a cloud provider or hypervisor failure makes the VM disappear
- a kernel panic
- the node disappears from the cluster due to a cluster network partition
- eviction of a pod because the node is [out-of-resources](/docs/concepts/scheduling-eviction/node-pressure-eviction/).

Except for the out-of-resources condition, all these conditions should be familiar to most users; they are not specific to Kubernetes.

We call the other cases *voluntary disruptions*. These include both actions initiated by the application owner and those initiated by a cluster administrator. Typical application owner actions include:

- deleting the deployment or other controller that manages the pod
- updating a deployment's pod template, causing a restart
- directly deleting a pod (for example, by accident)

Cluster administrator actions include:

- [Draining a node](/docs/tasks/administer-cluster/safely-drain-node/) for repair or upgrade (see the sketch after this list).
- Draining a node from the cluster to scale the cluster down (learn about [Cluster Autoscaling](https://github.com/kubernetes/autoscaler/#readme)).
- Removing a pod from a node to permit something else to fit on that node.
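
As a rough illustration of the repair case, draining usually follows a drain/repair/uncordon cycle. This is a minimal sketch, not a complete runbook; the node name `node-1` is a placeholder and the flags you need depend on your workloads:

```shell
# Mark the node unschedulable and evict its pods, respecting PodDisruptionBudgets.
kubectl drain node-1 --ignore-daemonsets

# ...repair or upgrade the node...

# Allow pods to be scheduled onto the node again.
kubectl uncordon node-1
```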

These actions might be taken directly by the cluster administrator, by automation run by the cluster administrator, or by your cluster hosting provider.

Ask your cluster administrator, or consult your cloud provider or distribution documentation, to determine whether any sources of voluntary disruptions are enabled for your cluster. If none are enabled, you can skip creating Pod Disruption Budgets.

{{< caution >}}
Not all voluntary disruptions are constrained by Pod Disruption Budgets. For example,
deleting deployments or pods bypasses Pod Disruption Budgets.
{{< /caution >}}

## Dealing with disruptions

Here are some ways to mitigate involuntary disruptions:

- Ensure your pod [requests the resources](/docs/tasks/configure-pod-container/assign-memory-resource) it needs.
- Replicate your application if you need higher availability. (Learn about running replicated [stateless](/docs/tasks/run-application/run-stateless-application-deployment/) and [stateful](/docs/tasks/run-application/run-replicated-stateful-application/) applications.)
- For even higher availability when running replicated applications, spread applications across racks (using [anti-affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity)) or across zones (if using a [multi-zone cluster](/docs/setup/multiple-zones)). A sketch combining the first and last points follows this list.
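
As a sketch of the first and last points above, the Deployment below requests the resources its pods need and spreads replicas across nodes with anti-affinity. Every name, label, and value here is hypothetical; to spread across zones instead, `topology.kubernetes.io/zone` could be used as the topology key:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx       # placeholder image
        resources:
          requests:        # the resources this pod needs
            cpu: 500m
            memory: 256Mi
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname   # at most one replica per node
```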

The frequency of voluntary disruptions varies. On a basic Kubernetes cluster, there are no automated voluntary disruptions (only user-triggered ones). However, your cluster administrator or hosting provider may run some additional services which cause voluntary disruptions. For example, rolling out node software updates can cause voluntary disruptions. Also, some implementations of cluster (node) autoscaling may cause voluntary disruptions to defragment and compact nodes. Your cluster administrator or hosting provider should have documented what level of voluntary disruptions, if any, to expect. Certain configuration options, such as [using PriorityClasses](/docs/concepts/scheduling-eviction/pod-priority-preemption/) in your pod spec, can also cause voluntary (and involuntary) disruptions.
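
For reference, a PriorityClass is a small cluster-scoped object; pods that reference a higher-priority class can preempt (evict) lower-priority pods when the scheduler needs room, which is one way such disruptions arise. A minimal sketch with a hypothetical name and value:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority      # hypothetical name
value: 1000000             # pods with higher values may preempt lower-priority pods
globalDefault: false
description: "Pods in this class can preempt others, disrupting the preempted pods."
```

A pod opts in by setting `priorityClassName: high-priority` in its spec.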

## Pod disruption budgets

{{< feature-state for_k8s_version="v1.21" state="stable" >}}

Kubernetes offers features to help you run highly available applications even when you introduce frequent voluntary disruptions.

As an application owner, you can create a PodDisruptionBudget (PDB) for each application. A PDB limits the number of Pods of a replicated application that are down simultaneously from voluntary disruptions. For example, a quorum-based application would like to ensure that the number of replicas running is never brought below the number needed for a quorum. A web front end might want to ensure that the number of replicas serving load never falls below a certain percentage of the total.
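
For example, a minimal PDB for a hypothetical web front end whose pods are labeled `app: web-frontend` could require that at least 90% of the replicas stay up during voluntary disruptions (the name and label are placeholders):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb   # hypothetical name
spec:
  minAvailable: 90%        # an absolute number such as 2 also works
  selector:
    matchLabels:
      app: web-frontend    # must match the pods of the workload being protected
```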

Cluster managers and hosting providers should use tools which respect PodDisruptionBudgets by calling the [Eviction API](/docs/tasks/administer-cluster/safely-drain-node/#eviction-api) instead of directly deleting pods or deployments.
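
Under the hood, an eviction is a request POSTed to the pod's `eviction` subresource rather than a plain delete. A sketch of the request body, with a placeholder pod name and namespace:

```yaml
apiVersion: policy/v1
kind: Eviction
metadata:
  name: web-frontend-5c9b8   # the pod to evict (placeholder)
  namespace: default
```

If carrying out the eviction would violate the pod's PDB, the API server refuses the request and the pod keeps running; this is the kind of rejection that draining tools retry.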

For example, the `kubectl drain` subcommand lets you mark a node as going out of service. When you run `kubectl drain`, the tool tries to evict all of the Pods on the Node you're taking out of service. The eviction request that `kubectl` submits on your behalf may be temporarily rejected, so the tool periodically retries all failed requests until all Pods on the target node are terminated, or until a configurable timeout is reached.
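
That timeout is set on the command line; for instance (the node name is a placeholder):

```shell
# Evict all pods from node-1, retrying rejected evictions for up to 5 minutes.
kubectl drain node-1 --ignore-daemonsets --timeout=5m
```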

A PDB specifies the number of replicas that an application can tolerate having, relative to how many it is intended to have. For example, a Deployment which has a `.spec.replicas: 5` is supposed to have 5 pods at any given time. If its PDB allows for there to be 4 at a time, then the Eviction API will allow voluntary disruption of one (but not two) pods at a time.
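
Expressed as a PDB, that example corresponds to something like the following sketch (names and labels are hypothetical); with 5 intended replicas and `minAvailable: 4`, at most 5 - 4 = 1 pod may be voluntarily disrupted at a time:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb         # hypothetical name
spec:
  minAvailable: 4          # with .spec.replicas: 5, leaves room for 1 disruption
  selector:
    matchLabels:
      app: my-app          # same labels as the Deployment's pods
```

The same budget can be written as `maxUnavailable: 1`, which is stated relative to the intended replica count.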

The group of pods that comprise the application is specified using a label selector, the same as the one used by the application's controller (deployment, stateful-set, etc).

The "intended" number of pods is computed from the `.spec.replicas` of the workload resource that is managing those pods. The control plane discovers the owning workload resource by examining the `.metadata.ownerReferences` of the Pod.
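
You can inspect that ownership chain yourself; for instance (the pod name is a placeholder):

```shell
# Show which workload resource owns a given pod.
kubectl get pod web-frontend-5c9b8 -o jsonpath='{.metadata.ownerReferences}'
```

For a pod managed by a Deployment, this typically names a ReplicaSet, whose own `.metadata.ownerReferences` in turn points at the Deployment.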

[Involuntary disruptions](#voluntary-and-involuntary-disruptions) cannot be prevented by PDBs; however they do count against the budget.

Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but workload resources (such as Deployment and StatefulSet) are not limited by PDBs when doing rolling upgrades. Instead, the handling of failures during application updates is configured in the spec for the specific workload resource.

When a pod is evicted using the eviction API, it is gracefully [terminated](/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination), honoring the `terminationGracePeriodSeconds` setting in its [PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
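
That setting lives directly in the pod spec; a fragment, with the default value of 30 seconds shown and a placeholder container:

```yaml
spec:
  terminationGracePeriodSeconds: 30   # time allowed for graceful shutdown after SIGTERM
  containers:
  - name: app        # hypothetical container
    image: nginx     # placeholder image
```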

## PodDisruptionBudget example {#pdb-example}

Consider a cluster with 3 nodes, `node-1` through `node-3`. The cluster is running several applications. One of them has 3 replicas initially called `pod-a`, `pod-b`, and `pod-c`. Another, unrelated pod without a PDB, called `pod-x`, is also shown. Initially, the pods are laid out as follows:

| node-1 | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| pod-a *available* | pod-b *available* | pod-c *available* |
| pod-x *available* | | |

All 3 pods are part of a deployment, and they collectively have a PDB which requires there be at least 2 of the 3 pods to be available at all times.
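
That budget could look roughly like the sketch below, assuming the three pods carry a hypothetical label `app: example`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-pdb    # hypothetical name
spec:
  minAvailable: 2      # at least 2 of the 3 pods must stay available
  selector:
    matchLabels:
      app: example     # hypothetical label on pod-a, pod-b, and pod-c
```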

For example, assume the cluster administrator wants to reboot into a new kernel version to fix a bug in the kernel. The cluster administrator first tries to drain `node-1` using the `kubectl drain` command. That tool tries to evict `pod-a` and `pod-x`. This succeeds immediately. Both pods go into the `terminating` state at the same time. This puts the cluster in this state:

| node-1 *draining* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* | | |

The deployment notices that one of the pods is terminating, so it creates a replacement called `pod-d`. Since `node-1` is cordoned, it lands on another node. Something has also created `pod-y` as a replacement for `pod-x`.

(Note: for a StatefulSet, `pod-a`, which would be called something like `pod-0`, would need to terminate completely before its replacement, which is also called `pod-0` but has a different UID, could be created. Otherwise, the example applies to a StatefulSet as well.)

Now the cluster is in this state:

| node-1 *draining* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| pod-a *terminating* | pod-b *available* | pod-c *available* |
| pod-x *terminating* | pod-d *starting* | pod-y |

At some point, the pods terminate, and the cluster looks like this:

| node-1 *drained* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| | pod-b *available* | pod-c *available* |
| | pod-d *starting* | pod-y |

At this point, if an impatient cluster administrator tries to drain `node-2` or `node-3`, the drain command will block, because there are only 2 available pods for the deployment, and its PDB requires at least 2. After some time passes, `pod-d` becomes available.
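
While waiting, the administrator can check how much budget remains; the `ALLOWED DISRUPTIONS` column shows how many more pods may be evicted right now (the PDB name is the hypothetical one from the sketch above):

```shell
# While only 2 pods are available, ALLOWED DISRUPTIONS is 0.
kubectl get poddisruptionbudgets example-pdb
```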

The cluster state now looks like this:

| node-1 *drained* | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| | pod-b *available* | pod-c *available* |
| | pod-d *available* | pod-y |

Now, the cluster administrator tries to drain `node-2`. The drain command will try to evict the two pods in some order, say `pod-b` first and then `pod-d`. It will succeed at evicting `pod-b`. But, when it tries to evict `pod-d`, it will be refused because that would leave only one pod available for the deployment.

The deployment creates a replacement for `pod-b` called `pod-e`. Because there are not enough resources in the cluster to schedule `pod-e`, the drain will again block. The cluster may end up in this state:

| node-1 *drained* | node-2 | node-3 | *no node* |
|:--------------------:|:-------------------:|:------------------:|:------------------:|
| | pod-b *terminating* | pod-c *available* | pod-e *pending* |
| | pod-d *available* | pod-y | |

At this point, the cluster administrator needs to add a node back to the cluster to proceed with the upgrade.

You can see how Kubernetes varies the rate at which disruptions can happen, according to:

- how many replicas an application needs
- how long it takes to gracefully shutdown an instance
- how long it takes a new instance to start up
- the type of controller
- the cluster's resource capacity

## Separating Cluster Owner and Application Owner Roles

Often, it is useful to think of the Cluster Manager and Application Owner as separate roles with limited knowledge of each other. This separation of responsibilities may make sense in these scenarios:

- when there are many application teams sharing a Kubernetes cluster, and there is natural specialization of roles
- when third-party tools or services are used to automate cluster management

Pod Disruption Budgets support this separation of roles by providing an interface between the roles.

If you do not have such a separation of responsibilities in your organization, you may not need to use Pod Disruption Budgets.

## How to perform Disruptive Actions on your Cluster

If you are a Cluster Administrator, and you need to perform a disruptive action on all the nodes in your cluster, such as a node or system software upgrade, here are some options:

- Accept downtime during the upgrade.
- Failover to another complete replica cluster.
  - No downtime, but may be costly both for the duplicated nodes and for human effort to orchestrate the switchover.
- Write disruption tolerant applications and use PDBs.
  - No downtime.
  - Minimal resource duplication.
  - Allows more automation of cluster administration.
  - Writing disruption-tolerant applications is tricky, but the work to tolerate voluntary disruptions largely overlaps with work to support autoscaling and tolerating involuntary disruptions.

## {{% heading "whatsnext" %}}

* Follow steps to protect your application by [configuring a Pod Disruption Budget](/docs/tasks/run-application/configure-pdb/).

* Learn more about [draining nodes](/docs/tasks/administer-cluster/safely-drain-node/)

* Learn about [updating a deployment](/docs/concepts/workloads/controllers/deployment/#updating-a-deployment) including steps to maintain its availability during the rollout.