Merge pull request #43214 from shannonxtreme/apparmor-seccomp
Add new page for kernel-level constraints
commit
ea4444a849
|
|
@@ -0,0 +1,290 @@
|
|||
---
|
||||
title: Linux kernel security constraints for Pods and containers
|
||||
description: >
|
||||
Overview of Linux kernel security modules and constraints that you can use to
|
||||
harden your Pods and containers.
|
||||
content_type: concept
|
||||
weight: 100
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
This page describes some of the security features that are built into the Linux
|
||||
kernel that you can use in your Kubernetes workloads. To learn how to apply
|
||||
these features to your Pods and containers, refer to
|
||||
[Configure a SecurityContext for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
|
||||
You should already be familiar with Linux and with the basics of Kubernetes
|
||||
workloads.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Run workloads without root privileges {#run-without-root}
|
||||
|
||||
When you deploy a workload in Kubernetes, use the Pod specification to restrict
|
||||
that workload from running as the root user on the node. You can use the Pod
|
||||
`securityContext` to define the specific Linux user and group for the processes in
|
||||
the Pod, and explicitly restrict containers from running as root users. Setting
|
||||
these values in the Pod manifest takes precedence over similar values in the
|
||||
container image, which is especially useful if you're running images that you
|
||||
don't own.
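
As a minimal sketch of the settings described above, the following manifest sets a
user and group at the Pod level and refuses to run containers as root. The Pod
name, image, and numeric IDs are placeholder assumptions; choose IDs that exist
in, and work with, your own image.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-example              # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true                # containers that would run as UID 0 fail to start
    runAsUser: 1000                   # assumed UID; overrides the USER in the image
    runAsGroup: 3000                  # assumed primary GID for all container processes
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
```

Setting these fields at the Pod level applies them to every container in the
Pod unless a container-level `securityContext` overrides them.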
|
||||
|
||||
{{< caution >}}
|
||||
Ensure that the user or group that you assign to the workload has the permissions
|
||||
required for the application to function correctly. Changing the user or group
|
||||
to one that doesn't have the correct permissions could lead to file access
|
||||
issues or failed operations.
|
||||
{{< /caution >}}
|
||||
|
||||
Configuring the kernel security features on this page provides fine-grained
|
||||
control over the actions that processes in your cluster can take, but managing
|
||||
these configurations can be challenging at scale. Running containers as
|
||||
non-root, or in user namespaces if you need root privileges, helps to reduce the
|
||||
chance that you'll need to enforce your configured kernel security capabilities.
|
||||
|
||||
## Security features in the Linux kernel {#linux-security-features}
|
||||
|
||||
Kubernetes lets you configure and use Linux kernel features to improve isolation
|
||||
and harden your containerized workloads. Common features include the following:
|
||||
|
||||
* **Secure computing mode (seccomp)**: Filter which system calls a process can
|
||||
make
|
||||
* **AppArmor**: Restrict the access privileges of individual programs
|
||||
* **Security Enhanced Linux (SELinux)**: Assign security labels to objects for
|
||||
more manageable security policy enforcement
|
||||
|
||||
To configure settings for one of these features, the operating system that you
|
||||
choose for your nodes must enable the feature in the kernel. For example,
|
||||
Ubuntu 7.10 and later enable AppArmor by default. To learn whether your OS
|
||||
enables a specific feature, consult the OS documentation.
|
||||
|
||||
You use the `securityContext` field in your Pod specification to define the
|
||||
constraints that apply to those processes. The `securityContext` field also
|
||||
supports other security settings, such as specific Linux capabilities or file
|
||||
access permissions using UIDs and GIDs. To learn more, refer to
|
||||
[Configure a SecurityContext for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
|
||||
|
||||
### seccomp
|
||||
|
||||
Some of your workloads might need privileges to perform specific actions as the
|
||||
root user on your node's host machine. Linux uses *capabilities* to divide the
|
||||
available privileges into categories, so that processes can get the privileges
|
||||
required to perform specific actions without being granted all privileges. Each
|
||||
capability governs a set of privileged operations that a process can request through system calls (syscalls). seccomp
|
||||
lets you restrict these individual syscalls. <!--Copied from seccomp tutorial-->
|
||||
It can be used to sandbox the privileges of a process, restricting the calls it
|
||||
is able to make from userspace into the kernel.<!--End copy-->
|
||||
|
||||
In Kubernetes, you use a *container runtime* on each node to run your
|
||||
containers. Example runtimes include CRI-O, Docker, or containerd. Each runtime
|
||||
allows only a subset of Linux capabilities by default. You can further limit the
|
||||
allowed syscalls individually by using a seccomp profile. Container runtimes
|
||||
usually include a default seccomp profile. <!--Copied from seccomp tutorial-->
|
||||
Kubernetes lets you automatically
|
||||
apply seccomp profiles loaded onto a node to your Pods and containers.<!--End copy-->
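
For illustration, a Pod that opts in to the container runtime's default seccomp
profile might look like the following sketch. The Pod name and image are
placeholders, and the commented-out `Localhost` variant assumes that a profile
file with that relative path has already been loaded onto the node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-example               # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault            # use the profile bundled with the container runtime
      # Alternative, assuming a custom profile exists on the node:
      # type: Localhost
      # localhostProfile: profiles/audit.json   # path relative to the kubelet's seccomp directory
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
```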
|
||||
|
||||
{{< note >}}
|
||||
Kubernetes also has the `allowPrivilegeEscalation` setting for Pods and
|
||||
containers. When set to `false`, this prevents processes from gaining new
|
||||
capabilities and restricts unprivileged users from changing the applied seccomp
|
||||
profile to a more permissive profile.
|
||||
{{< /note >}}
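
A sketch of the `allowPrivilegeEscalation` setting, shown here in the
container-level `securityContext`. The container name and image are
placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-escalation-example         # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      allowPrivilegeEscalation: false # processes can't gain more privileges than their parent
```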
|
||||
|
||||
To learn how to implement seccomp in Kubernetes, refer to
|
||||
[Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/).
|
||||
|
||||
To learn more about seccomp, see
|
||||
[Seccomp BPF](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html)
|
||||
in the Linux kernel documentation.
|
||||
|
||||
#### Considerations for seccomp {#seccomp-considerations}
|
||||
|
||||
seccomp is a low-level security configuration that you should only configure
|
||||
yourself if you require fine-grained control over Linux syscalls. Using
|
||||
seccomp, especially at scale, has the following risks:
|
||||
|
||||
* Configurations might break during application updates
|
||||
* Attackers can still use allowed syscalls to exploit vulnerabilities
|
||||
* Profile management for individual applications becomes challenging at scale
|
||||
|
||||
**Recommendation**: Use the default seccomp profile that's bundled with your
|
||||
container runtime. If you need a more isolated environment, consider using a
|
||||
sandbox, such as gVisor. Sandboxes solve the preceding risks with custom
|
||||
seccomp profiles, but require more compute resources on your nodes and might
|
||||
have compatibility issues with GPUs and other specialized hardware.
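
If you choose a sandbox, you typically select it for each Pod through a
RuntimeClass. The following sketch assumes that gVisor is already installed on
your nodes and that your container runtime is configured with a handler named
`runsc`; the handler name, object names, and image are assumptions about your
environment rather than fixed values.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor                        # hypothetical RuntimeClass name
handler: runsc                        # assumed handler name in the node's runtime configuration
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-example             # hypothetical name
spec:
  runtimeClassName: gvisor            # run this Pod's containers in the sandbox
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
```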
|
||||
|
||||
### AppArmor and SELinux: policy-based mandatory access control {#policy-based-mac}
|
||||
|
||||
You can use Linux policy-based mandatory access control (MAC) mechanisms, such
|
||||
as AppArmor and SELinux, to harden your Kubernetes workloads.
|
||||
|
||||
#### AppArmor
|
||||
|
||||
<!-- Original text from https://kubernetes.io/docs/tutorials/security/apparmor/ -->
|
||||
|
||||
[AppArmor](https://apparmor.net/) is a Linux kernel security module that
|
||||
supplements the standard Linux user and group based permissions to confine
|
||||
programs to a limited set of resources. AppArmor can be configured for any
|
||||
application to reduce its potential attack surface and provide greater defense
in depth. It is configured through profiles tuned to allow the access needed by a
|
||||
specific program or container, such as Linux capabilities, network access, and
|
||||
file permissions. Each profile can be run in either enforcing mode, which blocks
|
||||
access to disallowed resources, or complain mode, which only reports violations.
|
||||
|
||||
AppArmor can help you to run a more secure deployment by restricting what
|
||||
containers are allowed to do, and/or provide better auditing through system
|
||||
logs. The container runtime that you use might ship with a default AppArmor
|
||||
profile, or you can use a custom profile.
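
As a sketch, the following manifest applies the runtime's default AppArmor
profile to the Pod and a custom profile to one container. It assumes Kubernetes
v1.30 or later, where the `appArmorProfile` field is available (earlier
versions used annotations instead), and it assumes that a profile named
`k8s-apparmor-example-deny-write` is already loaded on the node.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: apparmor-example              # hypothetical name
spec:
  securityContext:
    appArmorProfile:
      type: RuntimeDefault            # default profile from the container runtime
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
    securityContext:
      appArmorProfile:
        type: Localhost               # use a profile loaded on the node
        localhostProfile: k8s-apparmor-example-deny-write   # assumed profile name
```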
|
||||
|
||||
To learn how to use AppArmor in Kubernetes, refer to
|
||||
[Restrict a Container's Access to Resources with AppArmor](/docs/tutorials/security/apparmor/).
|
||||
|
||||
#### SELinux
|
||||
|
||||
SELinux is a Linux kernel security module that lets you restrict the access
|
||||
that a specific *subject*, such as a process, has to the files on your system.
|
||||
You define security policies that apply to subjects that have specific SELinux
|
||||
labels. When a process that has an SELinux label attempts to access a file, the
|
||||
SELinux security server checks whether that process's security policy allows the access
|
||||
and makes an authorization decision.
|
||||
|
||||
In Kubernetes, you can set an SELinux label in the `securityContext` field of
|
||||
your manifest. The specified labels are assigned to those processes. If you
|
||||
have configured security policies that affect those labels, the host OS kernel
|
||||
enforces these policies.
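
For example, a manifest that assigns an SELinux level to all containers in a
Pod might look like the following sketch. The level value, Pod name, and image
are illustrative assumptions; the labels that make sense depend on the policy
loaded on your nodes.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-example               # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"           # assumed MCS level; must be valid for the node's policy
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
```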
|
||||
|
||||
To learn how to use SELinux in Kubernetes, refer to
|
||||
[Assign SELinux labels to a container](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container).
|
||||
|
||||
#### Differences between AppArmor and SELinux {#apparmor-selinux-diff}
|
||||
|
||||
The operating system on your Linux nodes usually includes either
|
||||
AppArmor or SELinux. Both mechanisms provide similar types of protection, but
|
||||
have differences such as the following:
|
||||
|
||||
* **Configuration**: AppArmor uses profiles to define access to resources.
|
||||
SELinux uses policies that apply to specific labels.
|
||||
* **Policy application**: In AppArmor, you define resources using file paths.
|
||||
SELinux uses the index node (inode) of a resource to identify the resource.
|
||||
|
||||
### Summary of features {#summary}
|
||||
|
||||
The following table describes the use cases and scope of each security control.
|
||||
You can use all of these controls together to build a more hardened system.
|
||||
|
||||
<table>
|
||||
<caption>Summary of Linux kernel security features</caption>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Security feature</th>
|
||||
<th>Description</th>
|
||||
<th>How to use</th>
|
||||
<th>Example</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td>seccomp</td>
|
||||
<td>Restrict the individual syscalls that userspace processes can make to the kernel. Reduces the
|
||||
likelihood that a vulnerability that uses a restricted syscall would
|
||||
compromise the system.</td>
|
||||
<td>Specify a loaded seccomp profile in the Pod or container specification
|
||||
to apply its constraints to the processes in the Pod.</td>
|
||||
<td>Reject the <code>unshare</code> syscall, which was used in
|
||||
<a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0185">CVE-2022-0185</a>.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>AppArmor</td>
|
||||
<td>Restrict program access to specific resources. Reduces the attack
|
||||
surface of the program. Improves audit logging.</td>
|
||||
<td>Specify a loaded AppArmor profile in the container specification.</td>
|
||||
<td>Restrict a read-only program from writing to any file path
|
||||
in the system.</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>SELinux</td>
|
||||
<td>Restrict access to resources such as files, applications, ports, and
|
||||
processes using labels and security policies.</td>
|
||||
<td>Specify access restrictions for specific labels. Tag processes with
|
||||
those labels to enforce the access restrictions related to the label.</td>
|
||||
<td>Restrict a container from accessing files outside its own filesystem.</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
{{< note >}}
|
||||
Mechanisms like AppArmor and SELinux can provide protection that extends beyond
|
||||
the container. For example, you can use SELinux to help mitigate
|
||||
[CVE-2019-5736](https://access.redhat.com/security/cve/cve-2019-5736).
|
||||
{{< /note >}}
|
||||
|
||||
### Considerations for managing custom configurations {#considerations-custom-configurations}
|
||||
|
||||
seccomp, AppArmor, and SELinux usually have a default configuration that offers
|
||||
basic protections. You can also create custom profiles and policies that meet
|
||||
the requirements of your workloads. Managing and distributing these custom
|
||||
configurations at scale might be challenging, especially if you use all three
|
||||
features together. To help you to manage these configurations at scale, use a
|
||||
tool like the
|
||||
[Kubernetes Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).
|
||||
|
||||
## Kernel-level security features and privileged containers {#kernel-security-features-privileged-containers}
|
||||
|
||||
Kubernetes lets you specify that some trusted containers can run in
|
||||
*privileged* mode. Any container in a Pod can run in privileged mode to use
|
||||
operating system administrative capabilities that would otherwise be
|
||||
inaccessible. This is available for both Windows and Linux.
|
||||
|
||||
Privileged containers explicitly override some of the Linux kernel constraints
|
||||
that you might use in your workloads, as follows:
|
||||
|
||||
* **seccomp**: Privileged containers run with the `Unconfined` seccomp profile,
|
||||
overriding any seccomp profile that you specified in your manifest.
|
||||
* **AppArmor**: Privileged containers ignore any applied AppArmor profiles.
|
||||
* **SELinux**: Privileged containers run in the `unconfined_t` domain.
|
||||
|
||||
### Privileged containers {#privileged-containers}
|
||||
|
||||
<!-- Content from https://kubernetes.io/docs/concepts/workloads/pods/#privileged-mode-for-containers -->
|
||||
|
||||
Any container in a Pod can enable *Privileged mode* if you set the
|
||||
`privileged: true` field in the
|
||||
[`securityContext`](/docs/tasks/configure-pod-container/security-context/)
|
||||
field for the container. Privileged containers override or undo many other
hardening settings such as the applied seccomp profile, AppArmor profile, or
|
||||
SELinux constraints. Privileged containers are given all Linux capabilities,
|
||||
including capabilities that they don't require. For example, a root user in a
|
||||
privileged container might be able to use the `CAP_SYS_ADMIN` and
|
||||
`CAP_NET_ADMIN` capabilities on the node, bypassing the runtime seccomp
|
||||
configuration and other restrictions.
|
||||
|
||||
In most cases, you should avoid using privileged containers, and instead grant
|
||||
the specific capabilities required by your container using the `capabilities`
|
||||
field in the `securityContext` field. Only use privileged mode if you need a
capability that you can't grant with the `securityContext`. Privileged mode is
useful for containers that need operating system administrative capabilities,
such as manipulating the network stack or accessing hardware devices.
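
As a sketch of the alternative to privileged mode, the following manifest drops
all capabilities and adds back only the one that the container needs. The
choice of `NET_ADMIN`, the names, and the image are assumptions for
illustration. Kubernetes capability names omit the `CAP_` prefix.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capabilities-example          # hypothetical name
spec:
  containers:
  - name: net-tool
    image: registry.example/net-tool:1.0   # placeholder image
    securityContext:
      privileged: false               # keep seccomp, AppArmor, and SELinux constraints in effect
      capabilities:
        drop: ["ALL"]                 # start from no capabilities
        add: ["NET_ADMIN"]            # assumed requirement: manage the network stack
```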
|
||||
|
||||
In Kubernetes version 1.26 and later, you can also run Windows containers in a
|
||||
similarly privileged mode by setting the `windowsOptions.hostProcess` flag on
|
||||
the security context of the Pod spec. For details and instructions, see
|
||||
[Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/).
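
A minimal sketch of a HostProcess Pod follows. The Pod name, image, and the
account in `runAsUserName` are assumptions; refer to the linked task page for
the authoritative requirements.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostprocess-example           # hypothetical name
spec:
  securityContext:
    windowsOptions:
      hostProcess: true               # all containers in the Pod run as HostProcess containers
      runAsUserName: "NT AUTHORITY\\SYSTEM"   # assumed account
  hostNetwork: true                   # HostProcess Pods use the host's network namespace
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: admin-task
    image: registry.example/win-admin:1.0   # placeholder image
```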
|
||||
|
||||
## Recommendations and best practices {#recommendations-best-practices}
|
||||
|
||||
* Before configuring kernel-level security capabilities, you should consider
|
||||
implementing network-level isolation. For more information, read the
|
||||
[Security Checklist](/docs/concepts/security/security-checklist/#network-security).
|
||||
* Unless necessary, run Linux workloads as non-root by setting specific user and
|
||||
group IDs in your Pod manifest and by specifying `runAsNonRoot: true`.
|
||||
|
||||
Additionally, you can run workloads in user namespaces by setting
|
||||
`hostUsers: false` in your Pod manifest. This lets you run containers as root
|
||||
users in the user namespace, but as non-root users in the host namespace on the
|
||||
node. User namespace support is still in early stages of development and might not have the level
|
||||
of support that you need. For instructions, refer to
|
||||
[Use a User Namespace With a Pod](/docs/tasks/configure-pod-container/user-namespaces/).
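
For illustration, a Pod that runs in its own user namespace might look like the
following sketch. It assumes that your cluster version, node OS, and container
runtime support user namespaces; the Pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-example                # hypothetical name
spec:
  hostUsers: false                    # root in the container maps to an unprivileged user on the node
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
```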
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
* [Learn how to use AppArmor](/docs/tutorials/security/apparmor/)
|
||||
* [Learn how to use seccomp](/docs/tutorials/security/seccomp/)
|
||||
* [Learn how to use SELinux](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container)
|
||||
|
|
@@ -276,30 +276,34 @@ Containers within the Pod see the system hostname as being the same as the confi
|
|||
`name` for the Pod. There's more about this in the [networking](/docs/concepts/cluster-administration/networking/)
|
||||
section.
|
||||
|
||||
## Privileged mode for containers
|
||||
## Pod security settings {#pod-security}
|
||||
|
||||
{{< note >}}
|
||||
Your {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} must support the concept of a privileged container for this setting to be relevant.
|
||||
{{< /note >}}
|
||||
To set security constraints on Pods and containers, you use the
|
||||
`securityContext` field in the Pod specification. This field gives you
|
||||
granular control over what a Pod or individual containers can do. For example:
|
||||
|
||||
Any container in a pod can run in privileged mode to use operating system administrative capabilities
|
||||
that would otherwise be inaccessible. This is available for both Windows and Linux.
|
||||
* Drop specific Linux capabilities to avoid the impact of a CVE.
|
||||
* Force all processes in the Pod to run as a non-root user or as a specific
|
||||
user or group ID.
|
||||
* Set a specific seccomp profile.
|
||||
* Set Windows security options, such as whether containers run as HostProcess.
|
||||
|
||||
### Linux privileged containers
|
||||
{{< caution >}}
|
||||
You can also use the Pod securityContext to enable
|
||||
[_privileged mode_](/docs/concepts/security/linux-kernel-security-constraints/#privileged-containers)
|
||||
in Linux containers. Privileged mode overrides many of the other security
|
||||
settings in the securityContext. Avoid using this setting unless you can't grant
|
||||
the equivalent permissions by using other fields in the securityContext.
|
||||
In Kubernetes 1.26 and later, you can run Windows containers in a similarly
|
||||
privileged mode by setting the `windowsOptions.hostProcess` flag on the
|
||||
security context of the Pod spec. For details and instructions, see
|
||||
[Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/).
|
||||
{{< /caution >}}
|
||||
|
||||
In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag
|
||||
on the [security context](/docs/tasks/configure-pod-container/security-context/) of the
|
||||
container spec. This is useful for containers that want to use operating system administrative
|
||||
capabilities such as manipulating the network stack or accessing hardware devices.
|
||||
|
||||
### Windows privileged containers
|
||||
|
||||
{{< feature-state for_k8s_version="v1.26" state="stable" >}}
|
||||
|
||||
In Windows, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod) by setting the
|
||||
`windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these
|
||||
pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used
|
||||
to perform administrative tasks as is done with Linux privileged containers.
|
||||
* To learn about kernel-level security constraints that you can use,
|
||||
see [Linux kernel security constraints for Pods and containers](/docs/concepts/security/linux-kernel-security-constraints).
|
||||
* To learn more about the Pod security context, see
|
||||
[Configure a Security Context for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
|
||||
|
||||
## Static Pods
|
||||
|
||||
|
|
|
|||
|
|
@@ -10,22 +10,10 @@ weight: 30
|
|||
|
||||
{{< feature-state feature_gate_name="AppArmor" >}}
|
||||
|
||||
|
||||
[AppArmor](https://apparmor.net/) is a Linux kernel security module that supplements the standard Linux user and group based
|
||||
permissions to confine programs to a limited set of resources. AppArmor can be configured for any
|
||||
application to reduce its potential attack surface and provide greater in-depth defense. It is
|
||||
configured through profiles tuned to allow the access needed by a specific program or container,
|
||||
such as Linux capabilities, network access, file permissions, etc. Each profile can be run in either
|
||||
*enforcing* mode, which blocks access to disallowed resources, or *complain* mode, which only reports
|
||||
violations.
|
||||
|
||||
On Kubernetes, AppArmor can help you to run a more secure deployment by restricting what containers are allowed to
|
||||
do, and/or provide better auditing through system logs. However, it is important to keep in mind
|
||||
that AppArmor is not a silver bullet and can only do so much to protect against exploits in your
|
||||
application code. It is important to provide good, restrictive profiles, and harden your
|
||||
applications and cluster from other angles as well.
|
||||
|
||||
|
||||
This page shows you how to load AppArmor profiles on your nodes and enforce
|
||||
those profiles in Pods. To learn more about how Kubernetes can confine Pods using
|
||||
AppArmor, see
|
||||
[Linux kernel security constraints for Pods and containers](/docs/concepts/security/linux-kernel-security-constraints/#apparmor).
|
||||
|
||||
## {{% heading "objectives" %}}
|
||||
|
||||
|
|
|
|||