---
title: Resource Management for Pods and Containers
content_type: concept
weight: 40
feature:
  title: Automatic bin packing
  description: >
    Automatically places containers based on their resource requirements and other constraints, while not sacrificing availability.
    Mix critical and best-effort workloads in order to drive up utilization and save even more resources.
---

<!-- overview -->

When you specify a {{< glossary_tooltip term_id="pod" >}}, you can optionally specify how
much of each resource a {{< glossary_tooltip text="container" term_id="container" >}} needs.
The most common resources to specify are CPU and memory (RAM); there are others.

When you specify the resource _request_ for containers in a Pod, the
{{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}} uses this
information to decide which node to place the Pod on. When you specify a resource _limit_
for a container, the kubelet enforces those limits so that the running container is not
allowed to use more of that resource than the limit you set. The kubelet also reserves
at least the _request_ amount of that system resource specifically for that container
to use.

<!-- body -->

## Requests and limits

If the node where a Pod is running has enough of a resource available, it's possible (and
allowed) for a container to use more resource than its `request` for that resource specifies.
However, a container is not allowed to use more than its resource `limit`.

For example, if you set a `memory` request of 256 MiB for a container, and that container is in
a Pod scheduled to a Node with 8GiB of memory and no other Pods, then the container can try to use
more RAM.

If you set a `memory` limit of 4GiB for that container, the kubelet (and
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}) enforce the limit.
The runtime prevents the container from using more than the configured resource limit. For example:
when a process in the container tries to consume more than the allowed amount of memory,
the system kernel terminates the process that attempted the allocation, with an out of memory
(OOM) error.

Limits can be implemented either reactively (the system intervenes once it sees a violation)
or by enforcement (the system prevents the container from ever exceeding the limit). Different
runtimes can have different ways to implement the same restrictions.

{{< note >}}
If a container specifies its own memory limit, but does not specify a memory request, Kubernetes
automatically assigns a memory request that matches the limit. Similarly, if a container specifies its own
CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches
the limit.
{{< /note >}}
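
For example, this sketch (the Pod name is a placeholder; the image matches the examples
later on this page) sets only limits, so Kubernetes treats the container as if it had
also requested `memory: "128Mi"` and `cpu: "500m"`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limits-only          # hypothetical name
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      limits:
        memory: "128Mi"      # the memory request defaults to 128Mi
        cpu: "500m"          # the CPU request defaults to 500m
```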

## Resource types

*CPU* and *memory* are each a *resource type*. A resource type has a base unit.
CPU represents compute processing and is specified in units of [Kubernetes CPUs](#meaning-of-cpu).
Memory is specified in units of bytes.
For Linux workloads, you can specify _huge page_ resources.
Huge pages are a Linux-specific feature where the node kernel allocates blocks of memory
that are much larger than the default page size.

For example, on a system where the default page size is 4KiB, you could specify a limit,
`hugepages-2Mi: 80Mi`. If the container tries allocating over 40 2MiB huge pages (a
total of 80 MiB), that allocation fails.

{{< note >}}
You cannot overcommit `hugepages-*` resources.
This is different from the `memory` and `cpu` resources.
{{< /note >}}
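
For example, a minimal sketch of a Pod that uses huge pages (the name is hypothetical;
because huge pages cannot be overcommitted, the request must equal the limit):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example    # hypothetical name
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        hugepages-2Mi: 80Mi  # 40 huge pages of 2MiB each
        memory: 64Mi
      limits:
        hugepages-2Mi: 80Mi  # must match the request
        memory: 64Mi
```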

CPU and memory are collectively referred to as *compute resources*, or *resources*. Compute
resources are measurable quantities that can be requested, allocated, and
consumed. They are distinct from
[API resources](/docs/concepts/overview/kubernetes-api/). API resources, such as Pods and
[Services](/docs/concepts/services-networking/service/), are objects that can be read and modified
through the Kubernetes API server.

## Resource requests and limits of Pod and container

For each container, you can specify resource limits and requests,
including the following:

* `spec.containers[].resources.limits.cpu`
* `spec.containers[].resources.limits.memory`
* `spec.containers[].resources.limits.hugepages-<size>`
* `spec.containers[].resources.requests.cpu`
* `spec.containers[].resources.requests.memory`
* `spec.containers[].resources.requests.hugepages-<size>`

Although you can only specify requests and limits for individual containers,
it is also useful to think about the overall resource requests and limits for
a Pod.
For a particular resource, a *Pod resource request/limit* is the sum of the
resource requests/limits of that type for each container in the Pod.

## Resource units in Kubernetes

### CPU resource units {#meaning-of-cpu}

Limits and requests for CPU resources are measured in *cpu* units.
In Kubernetes, 1 CPU unit is equivalent to **1 physical CPU core**,
or **1 virtual core**, depending on whether the node is a physical host
or a virtual machine running inside a physical machine.

Fractional requests are allowed. When you define a container with
`spec.containers[].resources.requests.cpu` set to `0.5`, you are requesting half
as much CPU time compared to if you asked for `1.0` CPU.
For CPU resource units, the [quantity](/docs/reference/kubernetes-api/common-definitions/quantity/)
expression `0.1` is equivalent to the expression `100m`, which can be read as
"one hundred millicpu". Some people say "one hundred millicores", and this is
understood to mean the same thing.

CPU resource is always specified as an absolute amount of resource, never as a relative amount.
For example, `500m` CPU represents roughly the same amount of computing power whether that
container runs on a single-core, dual-core, or 48-core machine.

{{< note >}}
Kubernetes doesn't allow you to specify CPU resources with a precision finer than
`1m`. Because of this, it's useful to specify CPU units less than `1.0` or `1000m` using
the milliCPU form; for example, `5m` rather than `0.005`.
{{< /note >}}
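
As an illustration, the following fragment (the enclosing Pod spec is omitted) requests
one tenth of a CPU using the recommended milliCPU form:

```yaml
resources:
  requests:
    cpu: "100m"   # equivalent to cpu: "0.1"
```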

### Memory resource units {#meaning-of-memory}

Limits and requests for `memory` are measured in bytes. You can express memory as
a plain integer or as a fixed-point number using one of these
[quantity](/docs/reference/kubernetes-api/common-definitions/quantity/) suffixes:
E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi,
Mi, Ki. For example, the following represent roughly the same value:

```shell
128974848, 129e6, 129M, 128974848000m, 123Mi
```

Take care about case for suffixes. If you request `400m` of memory, this is a request
for 0.4 bytes. Someone who types that probably meant to ask for 400 mebibytes (`400Mi`)
or 400 megabytes (`400M`).
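
For example, a fragment that asks for 400 mebibytes (the enclosing Pod spec is omitted):

```yaml
resources:
  requests:
    memory: "400Mi"   # not "400m", which would mean 0.4 bytes
```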

## Container resources example {#example-1}

The following Pod has two containers. Both containers are defined with a request for
0.25 CPU and 64 MiB (2<sup>26</sup> bytes) of memory. Each container has a limit of
0.5 CPU and 128 MiB of memory. You can say the Pod has a request of 0.5 CPU and
128 MiB of memory, and a limit of 1 CPU and 256 MiB of memory.

```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
```

## How Pods with resource requests are scheduled

When you create a Pod, the Kubernetes scheduler selects a node for the Pod to
run on. Each node has a maximum capacity for each of the resource types: the
amount of CPU and memory it can provide for Pods. The scheduler ensures that,
for each resource type, the sum of the resource requests of the scheduled
containers is less than the capacity of the node.
Note that although actual memory or CPU resource usage on nodes is very low,
the scheduler still refuses to place a Pod on a node if the capacity check fails.
This protects against a resource shortage on a node when resource usage later
increases, for example, during a daily peak in request rate.

## How Kubernetes applies resource requests and limits {#how-pods-with-resource-limits-are-run}

When the kubelet starts a container as part of a Pod, the kubelet passes that container's
requests and limits for memory and CPU to the container runtime.

On Linux, the container runtime typically configures
kernel {{< glossary_tooltip text="cgroups" term_id="cgroup" >}} that apply and enforce the
limits you defined (a sketch of how these settings appear on a node follows the list below).

- The CPU limit defines a hard ceiling on how much CPU time the container can use.
  During each scheduling interval (time slice), the Linux kernel checks to see if this
  limit is exceeded; if so, the kernel waits before allowing that cgroup to resume execution.
- The CPU request typically defines a weighting. If several different containers (cgroups)
  want to run on a contended system, workloads with larger CPU requests are allocated more
  CPU time than workloads with small requests.
- The memory request is mainly used during (Kubernetes) Pod scheduling. On a node that uses
  cgroups v2, the container runtime might use the memory request as a hint to set
  `memory.min` and `memory.low`.
- The memory limit defines a memory limit for that cgroup. If the container tries to
  allocate more memory than this limit, the Linux kernel out-of-memory subsystem activates
  and, typically, intervenes by stopping one of the processes in the container that tried
  to allocate memory. If that process is the container's PID 1, and the container is marked
  as restartable, Kubernetes restarts the container.
- The memory limit for the Pod or container can also apply to pages in memory-backed
  volumes, such as an `emptyDir`. The kubelet tracks `tmpfs` emptyDir volumes as container
  memory use, rather than as local ephemeral storage.
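
On a node that uses cgroups v2 with the systemd cgroup driver (an assumption; the exact
paths vary by runtime and cgroup driver), you can read these settings from the Pod's
cgroup files. The slice path and `<uid>` below are hypothetical:

```shell
# For a 500m CPU limit, cpu.max shows a quota of 50000 per 100000 microsecond period:
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/cpu.max
# For a 128Mi memory limit, memory.max shows the limit in bytes (134217728):
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod<uid>.slice/memory.max
```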

If a container exceeds its memory request and the node that it runs on becomes short of
memory overall, it is likely that the Pod the container belongs to will be
{{< glossary_tooltip text="evicted" term_id="eviction" >}}.

A container might or might not be allowed to exceed its CPU limit for extended periods of time.
However, container runtimes don't terminate Pods or containers for excessive CPU usage.

To determine whether a container cannot be scheduled or is being killed due to resource limits,
see the [Troubleshooting](#troubleshooting) section.

### Monitoring compute & memory resource usage

The kubelet reports the resource usage of a Pod as part of the Pod
[`status`](/docs/concepts/overview/working-with-objects/kubernetes-objects/#object-spec-and-status).

If optional [tools for monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
are available in your cluster, then Pod resource usage can be retrieved either
from the [Metrics API](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-api)
directly or from your monitoring tools.
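
For example, if the Metrics API is available (typically via a metrics-server add-on),
you can view current usage with `kubectl top`; the Pod name here matches the earlier example:

```shell
kubectl top pod frontend
```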

## Local ephemeral storage

<!-- feature gate LocalStorageCapacityIsolation -->
{{< feature-state for_k8s_version="v1.10" state="beta" >}}

Nodes have local ephemeral storage, backed by
locally-attached writeable devices or, sometimes, by RAM.
"Ephemeral" means that there is no long-term guarantee about durability.

Pods use ephemeral local storage for scratch space, caching, and for logs.
The kubelet can provide scratch space to Pods using local ephemeral storage to
mount [`emptyDir`](/docs/concepts/storage/volumes/#emptydir)
{{< glossary_tooltip term_id="volume" text="volumes" >}} into containers.

The kubelet also uses this kind of storage to hold
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level),
container images, and the writable layers of running containers.

{{< caution >}}
If a node fails, the data in its ephemeral storage can be lost.
Your applications cannot expect any performance SLAs (disk IOPS for example)
from local ephemeral storage.
{{< /caution >}}

As a beta feature, Kubernetes lets you track, reserve and limit the amount
of ephemeral local storage a Pod can consume.

### Configurations for local ephemeral storage

Kubernetes supports two ways to configure local ephemeral storage on a node:
{{< tabs name="local_storage_configurations" >}}
{{% tab name="Single filesystem" %}}
In this configuration, you place all different kinds of ephemeral local data
(`emptyDir` volumes, writeable layers, container images, logs) into one filesystem.
The most effective way to configure the kubelet is to dedicate this filesystem
to Kubernetes (kubelet) data.

The kubelet also writes
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level)
and treats these similarly to ephemeral local storage.

The kubelet writes logs to files inside its configured log directory (`/var/log`
by default), and has a base directory for other locally stored data
(`/var/lib/kubelet` by default).

Typically, both `/var/lib/kubelet` and `/var/log` are on the system root filesystem,
and the kubelet is designed with that layout in mind.

Your node can have as many other filesystems, not used for Kubernetes,
as you like.
{{% /tab %}}
{{% tab name="Two filesystems" %}}
You have a filesystem on the node that you're using for ephemeral data that
comes from running Pods: logs, and `emptyDir` volumes. You can use this filesystem
for other data (for example: system logs not related to Kubernetes); it can even
be the root filesystem.

The kubelet also writes
[node-level container logs](/docs/concepts/cluster-administration/logging/#logging-at-the-node-level)
into the first filesystem, and treats these similarly to ephemeral local storage.

You also use a separate filesystem, backed by a different logical storage device.
In this configuration, the directory where you tell the kubelet to place
container image layers and writeable layers is on this second filesystem.

The first filesystem does not hold any image layers or writeable layers.

Your node can have as many other filesystems, not used for Kubernetes,
as you like.
{{% /tab %}}
{{< /tabs >}}

The kubelet can measure how much local storage it is using. It does this provided
that:

- the `LocalStorageCapacityIsolation`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  is enabled (the feature is on by default), and
- you have set up the node using one of the supported configurations
  for local ephemeral storage.

If you have a different configuration, then the kubelet does not apply resource
limits for ephemeral local storage.

{{< note >}}
The kubelet tracks `tmpfs` emptyDir volumes as container memory use, rather
than as local ephemeral storage.
{{< /note >}}
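
For example, a volume fragment (part of a Pod spec) that places an `emptyDir` on `tmpfs`;
anything written there counts against the containers' memory rather than ephemeral storage:

```yaml
volumes:
- name: scratch
  emptyDir:
    medium: Memory     # tmpfs-backed; tracked as memory use
    sizeLimit: 128Mi   # optional cap on the volume's size
```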

### Setting requests and limits for local ephemeral storage

You can specify `ephemeral-storage` for managing local ephemeral storage. Each
container of a Pod can specify either or both of the following:

* `spec.containers[].resources.limits.ephemeral-storage`
* `spec.containers[].resources.requests.ephemeral-storage`

Limits and requests for `ephemeral-storage` are measured in byte quantities.
You can express storage as a plain integer or as a fixed-point number using one of these suffixes:
E, P, T, G, M, k. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi,
Mi, Ki. For example, the following quantities all represent roughly the same value:

- `128974848`
- `129e6`
- `129M`
- `123Mi`

In the following example, the Pod has two containers. Each container has a request of
2GiB of local ephemeral storage. Each container has a limit of 4GiB of local ephemeral
storage. Therefore, the Pod has a request of 4GiB of local ephemeral storage, and
a limit of 8GiB of local ephemeral storage.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  - name: log-aggregator
    image: images.my-company.example/log-aggregator:v6
    resources:
      requests:
        ephemeral-storage: "2Gi"
      limits:
        ephemeral-storage: "4Gi"
    volumeMounts:
    - name: ephemeral
      mountPath: "/tmp"
  volumes:
    - name: ephemeral
      emptyDir: {}
```

### How Pods with ephemeral-storage requests are scheduled

When you create a Pod, the Kubernetes scheduler selects a node for the Pod to
run on. Each node has a maximum amount of local ephemeral storage it can provide for Pods.
For more information, see
[Node Allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).

The scheduler ensures that the sum of the resource requests of the scheduled containers is
less than the capacity of the node.

### Ephemeral storage consumption management {#resource-emphemeralstorage-consumption}

If the kubelet is managing local ephemeral storage as a resource, then the
kubelet measures storage use in:

- `emptyDir` volumes, except _tmpfs_ `emptyDir` volumes
- directories holding node-level logs
- writeable container layers

If a Pod is using more ephemeral storage than you allow it to, the kubelet
sets an eviction signal that triggers Pod eviction.

For container-level isolation, if a container's writable layer and log
usage exceeds its storage limit, the kubelet marks the Pod for eviction.

For pod-level isolation the kubelet works out an overall Pod storage limit by
summing the limits for the containers in that Pod. In this case, if the sum of
the local ephemeral storage usage from all containers and also the Pod's `emptyDir`
volumes exceeds the overall Pod storage limit, then the kubelet also marks the Pod
for eviction.

{{< caution >}}
If the kubelet is not measuring local ephemeral storage, then a Pod
that exceeds its local storage limit will not be evicted for breaching
local storage resource limits.

However, if the filesystem space for writeable container layers, node-level logs,
or `emptyDir` volumes falls low, the node
{{< glossary_tooltip text="taints" term_id="taint" >}} itself as short on local storage
and this taint triggers eviction for any Pods that don't specifically tolerate the taint.

See the supported [configurations](#configurations-for-local-ephemeral-storage)
for ephemeral local storage.
{{< /caution >}}

The kubelet supports different ways to measure Pod storage use:

{{< tabs name="resource-emphemeralstorage-measurement" >}}
{{% tab name="Periodic scanning" %}}
The kubelet performs regular, scheduled checks that scan each
`emptyDir` volume, container log directory, and writeable container layer.

The scan measures how much space is used.

{{< note >}}
In this mode, the kubelet does not track open file descriptors
for deleted files.

If you (or a container) create a file inside an `emptyDir` volume,
something then opens that file, and you delete the file while it is
still open, then the inode for the deleted file stays until you close
that file, but the kubelet does not categorize the space as in use.
{{< /note >}}
{{% /tab %}}
{{% tab name="Filesystem project quota" %}}

{{< feature-state for_k8s_version="v1.15" state="alpha" >}}

Project quotas are an operating-system level feature for managing
storage use on filesystems. With Kubernetes, you can enable project
quotas for monitoring storage use. Make sure that the filesystem
backing the `emptyDir` volumes, on the node, provides project quota support.
For example, XFS and ext4fs offer project quotas.

{{< note >}}
Project quotas let you monitor storage use; they do not enforce limits.
{{< /note >}}

Kubernetes uses project IDs starting from `1048576`. The IDs in use are
registered in `/etc/projects` and `/etc/projid`. If project IDs in
this range are used for other purposes on the system, those project
IDs must be registered in `/etc/projects` and `/etc/projid` so that
Kubernetes does not use them.

Quotas are faster and more accurate than directory scanning. When a
directory is assigned to a project, all files created under that
directory are created in that project, and the kernel merely has to
keep track of how many blocks are in use by files in that project.
If a file is created and deleted, but has an open file descriptor,
it continues to consume space. Quota tracking records that space accurately,
whereas directory scans overlook the storage used by deleted files.

If you want to use project quotas, you should:

* Enable the `LocalStorageCapacityIsolationFSQuotaMonitoring=true`
  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
  using the `featureGates` field in the
  [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/)
  or the `--feature-gates` command line flag.

* Ensure that the root filesystem (or optional runtime filesystem)
  has project quotas enabled. All XFS filesystems support project quotas.
  For ext4 filesystems, you need to enable the project quota tracking feature
  while the filesystem is not mounted.

  ```bash
  # For ext4, with /dev/block-device not mounted
  sudo tune2fs -O project -Q prjquota /dev/block-device
  ```

* Ensure that the root filesystem (or optional runtime filesystem) is
  mounted with project quotas enabled. For both XFS and ext4fs, the
  mount option is named `prjquota`, as shown in the sketch below.
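
For example, a hedged sketch of such a mount (the mount point is hypothetical; it
reuses the `/dev/block-device` placeholder from the `tune2fs` step above):

```bash
sudo mount -o prjquota /dev/block-device /var/lib/kubelet
```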

{{% /tab %}}
{{< /tabs >}}

## Extended resources

Extended resources are fully-qualified resource names outside the
`kubernetes.io` domain. They allow cluster operators to advertise, and users to
consume, resources that are not built into Kubernetes.

There are two steps required to use extended resources. First, the cluster
operator must advertise an extended resource. Second, users must request the
extended resource in Pods.

### Managing extended resources

#### Node-level extended resources

Node-level extended resources are tied to nodes.

##### Device plugin managed resources

See [Device Plugin](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/)
for how to advertise device plugin managed resources on each node.

##### Other resources

To advertise a new node-level extended resource, the cluster operator can
submit a `PATCH` HTTP request to the API server to specify the available
quantity in the `status.capacity` for a node in the cluster. After this
operation, the node's `status.capacity` will include a new resource. The
`status.allocatable` field is updated automatically with the new resource
asynchronously by the kubelet.

Because the scheduler uses the node's `status.allocatable` value when
evaluating Pod fitness, the scheduler only takes account of the new value after
that asynchronous update. There may be a short delay between patching the
node capacity with a new resource and the time when the first Pod that requests
the resource can be scheduled on that node.

**Example:**

Here is an example showing how to use `curl` to form an HTTP request that
advertises five "example.com/foo" resources on node `k8s-node-1` whose master
is `k8s-master`.

```shell
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/example.com~1foo", "value": "5"}]' \
http://k8s-master:8080/api/v1/nodes/k8s-node-1/status
```

{{< note >}}
In the preceding request, `~1` is the encoding for the character `/`
in the patch path. The operation path value in JSON-Patch is interpreted as a
JSON-Pointer. For more details, see
[IETF RFC 6901, section 3](https://tools.ietf.org/html/rfc6901#section-3).
{{< /note >}}
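
After the patch is applied and the kubelet has updated `status.allocatable`, you can
confirm that the node advertises the new resource; for example:

```shell
kubectl describe node k8s-node-1 | grep example.com/foo
```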

#### Cluster-level extended resources

Cluster-level extended resources are not tied to nodes. They are usually managed
by scheduler extenders, which handle the resource consumption and resource quota.

You can specify the extended resources that are handled by scheduler extenders
in the [scheduler configuration](/docs/reference/config-api/kube-scheduler-config.v1beta3/).

**Example:**

The following configuration for a scheduler policy indicates that the
cluster-level extended resource "example.com/foo" is handled by the scheduler
extender.

- The scheduler sends a Pod to the scheduler extender only if the Pod requests
  "example.com/foo".
- The `ignoredByScheduler` field specifies that the scheduler does not check
  the "example.com/foo" resource in its `PodFitsResources` predicate.

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "<extender-endpoint>",
      "bindVerb": "bind",
      "managedResources": [
        {
          "name": "example.com/foo",
          "ignoredByScheduler": true
        }
      ]
    }
  ]
}
```

### Consuming extended resources

Users can consume extended resources in Pod specs, like CPU and memory.
The scheduler takes care of the resource accounting so that no more than the
available amount is simultaneously allocated to Pods.

The API server restricts quantities of extended resources to whole numbers.
Examples of _valid_ quantities are `3`, `3000m` and `3Ki`. Examples of
_invalid_ quantities are `0.5` and `1500m`.

{{< note >}}
Extended resources replace Opaque Integer Resources.
Users can use any domain name prefix other than `kubernetes.io`, which is reserved.
{{< /note >}}

To consume an extended resource in a Pod, include the resource name as a key
in the `spec.containers[].resources.limits` map in the container spec.

{{< note >}}
Extended resources cannot be overcommitted, so request and limit
must be equal if both are present in a container spec.
{{< /note >}}

A Pod is scheduled only if all of the resource requests are satisfied, including
CPU, memory and any extended resources. The Pod remains in the `PENDING` state
as long as the resource request cannot be satisfied.

**Example:**

The Pod below requests 2 CPUs and 1 "example.com/foo" (an extended resource).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: myimage
    resources:
      requests:
        cpu: 2
        example.com/foo: 1
      limits:
        example.com/foo: 1
```

## PID limiting

Process ID (PID) limits allow for the configuration of a kubelet
to limit the number of PIDs that a given Pod can consume. See
[PID Limiting](/docs/concepts/policy/pid-limiting/) for information.

## Troubleshooting

### My Pods are pending with event message `FailedScheduling`

If the scheduler cannot find any node where a Pod can fit, the Pod remains
unscheduled until a place can be found. An
[Event](/docs/reference/kubernetes-api/cluster-resources/event-v1/) is produced
each time the scheduler fails to find a place for the Pod. You can use `kubectl`
to view the events for a Pod; for example:

```shell
kubectl describe pod frontend | grep -A 9999999999 Events
```

```
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  23s   default-scheduler  0/42 nodes available: insufficient cpu
```

In the preceding example, the Pod named "frontend" fails to be scheduled due to
insufficient CPU resource on any node. Similar error messages can also suggest
failure due to insufficient memory (PodExceedsFreeMemory). In general, if a Pod
is pending with a message of this type, there are several things to try:

- Add more nodes to the cluster.
- Terminate unneeded Pods to make room for pending Pods.
- Check that the Pod is not larger than all the nodes. For example, if all the
  nodes have a capacity of `cpu: 1`, then a Pod with a request of `cpu: 1.1` will
  never be scheduled.
- Check for node taints. If most of your nodes are tainted, and the new Pod does
  not tolerate that taint, the scheduler only considers placements onto the
  remaining nodes that don't have that taint.

You can check node capacities and amounts allocated with the
`kubectl describe nodes` command. For example:

```shell
kubectl describe nodes e2e-test-node-pool-4lw4
```
```
Name:            e2e-test-node-pool-4lw4
[ ... lines removed for clarity ...]
Capacity:
 cpu:                               2
 memory:                            7679792Ki
 pods:                              110
Allocatable:
 cpu:                               1800m
 memory:                            7474992Ki
 pods:                              110
[ ... lines removed for clarity ...]
Non-terminated Pods:        (5 in total)
  Namespace    Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------    ----                                  ------------  ----------  ---------------  -------------
  kube-system  fluentd-gcp-v1.38-28bv1               100m (5%)     0 (0%)      200Mi (2%)       200Mi (2%)
  kube-system  kube-dns-3297075139-61lj3             260m (13%)    0 (0%)      100Mi (1%)       170Mi (2%)
  kube-system  kube-proxy-e2e-test-...               100m (5%)     0 (0%)      0 (0%)           0 (0%)
  kube-system  monitoring-influxdb-grafana-v4-z1m12  200m (10%)    200m (10%)  600Mi (8%)       600Mi (8%)
  kube-system  node-problem-detector-v0.1-fj7m3      20m (1%)      200m (10%)  20Mi (0%)        100Mi (1%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests    CPU Limits    Memory Requests    Memory Limits
  ------------    ----------    ---------------    -------------
  680m (34%)      400m (20%)    920Mi (11%)        1070Mi (13%)
```

In the preceding output, you can see that if a Pod requests more than 1.120 CPUs
or more than 6.23Gi of memory, that Pod will not fit on the node.

By looking at the "Pods" section, you can see which Pods are taking up space on
the node.

The amount of resources available to Pods is less than the node capacity because
system daemons use a portion of the available resources. Within the Kubernetes API,
each Node has a `.status.allocatable` field
(see [NodeStatus](/docs/reference/kubernetes-api/cluster-resources/node-v1/#NodeStatus)
for details).

The `.status.allocatable` field describes the amount of resources that are available
to Pods on that node (for example: 15 virtual CPUs and 7538 MiB of memory).
For more information on node allocatable resources in Kubernetes, see
[Reserve Compute Resources for System Daemons](/docs/tasks/administer-cluster/reserve-compute-resources/).

You can configure [resource quotas](/docs/concepts/policy/resource-quotas/)
to limit the total amount of resources that a namespace can consume.
Kubernetes enforces quotas for objects in a particular namespace when there is a
ResourceQuota in that namespace.
For example, if you assign specific namespaces to different teams, you
can add ResourceQuotas into those namespaces. Setting resource quotas helps to
prevent one team from using so much of any resource that this over-use affects other teams.
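
For example, a minimal ResourceQuota sketch (the name and namespace are hypothetical)
that caps the total requests and limits in one namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota     # hypothetical name
  namespace: team-a      # hypothetical namespace
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
```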

You should also consider what access you grant to that namespace:
**full** write access to a namespace allows someone with that access to remove any
resource, including a configured ResourceQuota.

### My container is terminated

Your container might get terminated because it is resource-starved. To check
whether a container is being killed because it is hitting a resource limit, call
`kubectl describe pod` on the Pod of interest:

```shell
kubectl describe pod simmemleak-hra99
```

The output is similar to:
```
Name:                           simmemleak-hra99
Namespace:                      default
Image(s):                       saadali/simmemleak
Node:                           kubernetes-node-tf0f/10.240.216.66
Labels:                         name=simmemleak
Status:                         Running
Reason:
Message:
IP:                             10.244.2.75
Containers:
  simmemleak:
    Image:  saadali/simmemleak:latest
    Limits:
      cpu:          100m
      memory:       50Mi
    State:          Running
      Started:      Tue, 07 Jul 2019 12:54:41 -0700
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 07 Jul 2019 12:54:30 -0700
      Finished:     Fri, 07 Jul 2019 12:54:33 -0700
    Ready:          False
    Restart Count:  5
Conditions:
  Type      Status
  Ready     False
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  42s   default-scheduler  Successfully assigned simmemleak-hra99 to kubernetes-node-tf0f
  Normal  Pulled     41s   kubelet            Container image "saadali/simmemleak:latest" already present on machine
  Normal  Created    41s   kubelet            Created container simmemleak
  Normal  Started    40s   kubelet            Started container simmemleak
  Normal  Killing    32s   kubelet            Killing container with id ead3fb35-5cf5-44ed-9ae1-488115be66c6: Need to kill Pod
```

In the preceding example, the `Restart Count:  5` indicates that the `simmemleak`
container in the Pod was terminated and restarted five times (so far).
The `OOMKilled` reason shows that the container tried to use more memory than its limit.

Your next step might be to check the application code for a memory leak. If you
find that the application is behaving how you expect, consider setting a higher
memory limit (and possibly request) for that container.

## {{% heading "whatsnext" %}}

* Get hands-on experience [assigning Memory resources to containers and Pods](/docs/tasks/configure-pod-container/assign-memory-resource/).
* Get hands-on experience [assigning CPU resources to containers and Pods](/docs/tasks/configure-pod-container/assign-cpu-resource/).
* Read how the API reference defines a [container](/docs/reference/kubernetes-api/workload-resources/pod-v1/#Container)
  and its [resource requirements](/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources)
* Read about [project quotas](https://xfs.org/index.php/XFS_FAQ#Q:_Quota:_Do_quotas_work_on_XFS.3F) in XFS
* Read more about the [kube-scheduler configuration reference (v1beta3)](/docs/reference/config-api/kube-scheduler-config.v1beta3/)