Merge pull request #43214 from shannonxtreme/apparmor-seccomp

Add new page for kernel-level constraints
2024-04-30 04:56:15 -07:00 · 2024-04-30 04:56:15 -07:00 · ea4444a849
parent 23b054d2e1 7416c9c4d2
commit ea4444a849
3 changed files with 318 additions and 36 deletions
--- a/content/en/docs/concepts/security/linux-kernel-security-constraints.md
+++ b/content/en/docs/concepts/security/linux-kernel-security-constraints.md
@ -0,0 +1,290 @@
 ---
 title: Linux kernel security constraints for Pods and containers
 description: >
  Overview of Linux kernel security modules and constraints that you can use to
  harden your Pods and containers.
 content_type: concept
 weight: 100
 ---
 <!-- overview -->
 This page describes some of the security features that are built into the Linux
 kernel that you can use in your Kubernetes workloads. To learn how to apply
 these features to your Pods and containers, refer to
 [Configure a SecurityContext for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
 You should already be familiar with Linux and with the basics of Kubernetes
 workloads.
 <!-- body -->
 ## Run workloads without root privileges {#run-without-root}
 When you deploy a workload in Kubernetes, use the Pod specification to restrict
 that workload from running as the root user on the node. You can use the Pod
 `securityContext` to define the specific Linux user and group for the processes in
 the Pod, and explicitly restrict containers from running as root users. Setting
 these values in the Pod manifest takes precedence over similar values in the
 container image, which is especially useful if you're running images that you
 don't own.
 {{< caution >}}
 Ensure that the user or group that you assign to the workload has the permissions
 required for the application to function correctly. Changing the user or group
 to one that doesn't have the correct permissions could lead to file access
 issues or failed operations.
 {{< /caution >}}
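 As a minimal sketch of the settings described above, the following manifest
 sets a user and group for every container in the Pod and rejects containers
 that would run as root. The Pod name, image, and numeric IDs are placeholder
 assumptions; pick IDs that match the file permissions your application needs.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: non-root-demo              # hypothetical name
 spec:
   securityContext:
     runAsNonRoot: true             # containers that would run as UID 0 fail to start
     runAsUser: 1000                # assumed UID, overrides the image's USER
     runAsGroup: 3000               # assumed primary GID
   containers:
   - name: app
     image: registry.example/app:1.0   # placeholder image
 ```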
 Configuring the kernel security features on this page provides fine-grained
 control over the actions that processes in your cluster can take, but managing
 these configurations can be challenging at scale. Running containers as
 non-root, or in user namespaces if you need root privileges, helps to reduce the
 chance that you'll need to enforce your configured kernel security capabilities.
 ## Security features in the Linux kernel {#linux-security-features}
 Kubernetes lets you configure and use Linux kernel features to improve isolation
 and harden your containerized workloads. Common features include the following:
 * **Secure computing mode (seccomp)**: Filter which system calls a process can
  make
 * **AppArmor**: Restrict the access privileges of individual programs
 * **Security Enhanced Linux (SELinux)**: Assign security labels to objects for
  more manageable security policy enforcement
 To configure settings for one of these features, the operating system that you
 choose for your nodes must enable the feature in the kernel. For example,
 Ubuntu 7.10 and later enable AppArmor by default. To learn whether your OS
 enables a specific feature, consult the OS documentation.
 You use the `securityContext` field in your Pod specification to define the
 constraints that apply to the processes in your Pods and containers. The
 `securityContext` field also
 supports other security settings, such as specific Linux capabilities or file
 access permissions using UIDs and GIDs. To learn more, refer to
 [Configure a SecurityContext for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
 ### seccomp
 Some of your workloads might need privileges to perform specific actions as the
 root user on your node's host machine. Linux uses *capabilities* to divide the
 available privileges into categories, so that processes can get the privileges
 required to perform specific actions without being granted all privileges. Each
 capability has a set of system calls (syscalls) that a process can make. seccomp
 lets you restrict these individual syscalls. <!--Copied from seccomp tutorial-->
 It can be used to sandbox the privileges of a process, restricting the calls it
 is able to make from userspace into the kernel.<!--End copy-->
 In Kubernetes, you use a *container runtime* on each node to run your
 containers. Example runtimes include CRI-O, Docker, and containerd. Each runtime
 allows only a subset of Linux capabilities by default. You can further limit the
 allowed syscalls individually by using a seccomp profile. Container runtimes
 usually include a default seccomp profile. <!--Copied from seccomp tutorial-->
 Kubernetes lets you automatically
 apply seccomp profiles loaded onto a node to your Pods and containers.<!--End copy-->
 {{<note>}}
 Kubernetes also has the `allowPrivilegeEscalation` setting for Pods and
 containers. When set to `false`, this prevents processes from gaining new
 capabilities and restricts unprivileged users from changing the applied seccomp
 profile to a more permissive profile.
 {{</note>}}
 To learn how to implement seccomp in Kubernetes, refer to
 [Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/).
 To learn more about seccomp, see
 [Seccomp BPF](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html)
 in the Linux kernel documentation.
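 For illustration, a Pod that applies a seccomp profile already loaded onto the
 node might look like the following sketch. The profile path
 `profiles/audit.json` is an assumption; it is resolved relative to the seccomp
 profile directory that the kubelet is configured to use, and the file must
 exist on every node that can run the Pod. The Pod name and image are
 placeholders.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: seccomp-localhost-demo     # hypothetical name
 spec:
   securityContext:
     seccompProfile:
       type: Localhost
       localhostProfile: profiles/audit.json   # assumed profile file on the node
   containers:
   - name: app
     image: registry.example/app:1.0           # placeholder image
     securityContext:
       allowPrivilegeEscalation: false         # see the note above
 ```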
 #### Considerations for seccomp {#seccomp-considerations}
 seccomp is a low-level security configuration that you should only configure
 yourself if you require fine-grained control over Linux syscalls. Using
 seccomp, especially at scale, has the following risks:
 * Configurations might break during application updates
 * Attackers can still use allowed syscalls to exploit vulnerabilities
 * Profile management for individual applications becomes challenging at scale
 **Recommendation**: Use the default seccomp profile that's bundled with your
 container runtime. If you need a more isolated environment, consider using a
 sandbox, such as gVisor. Sandboxes mitigate the preceding risks of custom
 seccomp profiles, but require more compute resources on your nodes and might
 have compatibility issues with GPUs and other specialized hardware.
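 The recommendation above could be expressed as the following sketch, which
 asks for the container runtime's default seccomp profile. The commented-out
 `runtimeClassName` line shows where a sandbox would be selected; the name
 `gvisor` is an assumption and only works if a cluster administrator has
 created a matching RuntimeClass and installed the sandbox on the nodes.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: seccomp-default-demo       # hypothetical name
 spec:
   # runtimeClassName: gvisor       # assumed RuntimeClass name, uncomment to use a sandbox
   securityContext:
     seccompProfile:
       type: RuntimeDefault         # use the profile bundled with the container runtime
   containers:
   - name: app
     image: registry.example/app:1.0   # placeholder image
 ```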
 ### AppArmor and SELinux: policy-based mandatory access control {#policy-based-mac}
 You can use Linux policy-based mandatory access control (MAC) mechanisms, such
 as AppArmor and SELinux, to harden your Kubernetes workloads.
 #### AppArmor
 <!-- Original text from https://kubernetes.io/docs/tutorials/security/apparmor/ -->
 [AppArmor](https://apparmor.net/) is a Linux kernel security module that
 supplements the standard Linux user and group based permissions to confine
 programs to a limited set of resources. AppArmor can be configured for any
 application to reduce its potential attack surface and provide greater defense
 in depth. It is configured through profiles tuned to allow the access needed by a
 specific program or container, such as Linux capabilities, network access, and
 file permissions. Each profile can be run in either enforcing mode, which blocks
 access to disallowed resources, or complain mode, which only reports violations.
 AppArmor can help you to run a more secure deployment by restricting what
 containers are allowed to do, and/or provide better auditing through system
 logs. The container runtime that you use might ship with a default AppArmor
 profile, or you can use a custom profile.
 To learn how to use AppArmor in Kubernetes, refer to
 [Restrict a Container's Access to Resources with AppArmor](/docs/tutorials/security/apparmor/).
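 As a rough sketch, a Pod that runs under a custom AppArmor profile could look
 like the following. This assumes a Kubernetes version that supports the
 `appArmorProfile` field in the security context (v1.30 or later); earlier
 versions use a Pod annotation instead. The profile name
 `k8s-apparmor-example-deny-write` is an assumption and must already be loaded
 on the node.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: hello-apparmor             # hypothetical name
 spec:
   securityContext:
     appArmorProfile:
       type: Localhost
       localhostProfile: k8s-apparmor-example-deny-write   # assumed profile loaded on the node
   containers:
   - name: hello
     image: busybox:1.28
     command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]
 ```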
 #### SELinux
 SELinux is a Linux kernel security module that lets you restrict the access
 that a specific *subject*, such as a process, has to the files on your system.
 You define security policies that apply to subjects that have specific SELinux
 labels. When a process that has an SELinux label attempts to access a file, the
 SELinux server checks whether the security policy for that process allows the
 access and makes an authorization decision.
 In Kubernetes, you can set an SELinux label in the `securityContext` field of
 your manifest. The specified labels are assigned to the processes in the
 corresponding Pod or container. If you
 have configured security policies that affect those labels, the host OS kernel
 enforces these policies.
 To learn how to use SELinux in Kubernetes, refer to
 [Assign SELinux labels to a container](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container).
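 A minimal sketch of assigning an SELinux label follows. The level value is an
 arbitrary example, and the label only has an effect if SELinux is enabled on
 the node's operating system. The Pod name and image are placeholders.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: selinux-demo               # hypothetical name
 spec:
   securityContext:
     seLinuxOptions:
       level: "s0:c123,c456"        # example MCS level, applied to all containers in the Pod
   containers:
   - name: app
     image: registry.example/app:1.0   # placeholder image
 ```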
 #### Differences between AppArmor and SELinux {#apparmor-selinux-diff}
 The operating system on your Linux nodes usually includes one of either
 AppArmor or SELinux. Both mechanisms provide similar types of protection, but
 have differences such as the following:
 * **Configuration**: AppArmor uses profiles to define access to resources.
  SELinux uses policies that apply to specific labels.
 * **Policy application**: In AppArmor, you define resources using file paths.
  SELinux uses the index node (inode) of a resource to identify the resource.
 ### Summary of features {#summary}
 The following table describes the use cases and scope of each security control.
 You can use all of these controls together to build a more hardened system.
 <table>
  <caption>Summary of Linux kernel security features</caption>
  <thead>
    <tr>
      <th>Security feature</th>
      <th>Description</th>
      <th>How to use</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>seccomp</td>
       <td>Restrict individual system calls that processes make from userspace
       into the kernel. Reduces the likelihood that a vulnerability that uses a
       restricted syscall would compromise the system.</td>
      <td>Specify a loaded seccomp profile in the Pod or container specification
      to apply its constraints to the processes in the Pod.</td>
      <td>Reject the <code>unshare</code> syscall, which was used in
      <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-0185">CVE-2022-0185</a>.</td>
    </tr>
    <tr>
      <td>AppArmor</td>
      <td>Restrict program access to specific resources. Reduces the attack
      surface of the program. Improves audit logging.</td>
      <td>Specify a loaded AppArmor profile in the container specification.</td>
      <td>Restrict a read-only program from writing to any file path
      in the system.</td>
    </tr>
    <tr>
      <td>SELinux</td>
      <td>Restrict access to resources such as files, applications, ports, and
      processes using labels and security policies.</td>
      <td>Specify access restrictions for specific labels. Tag processes with
      those labels to enforce the access restrictions related to the label.</td>
      <td>Restrict a container from accessing files outside its own filesystem.</td>
    </tr>
  </tbody>
 </table>
 {{< note >}}
 Mechanisms like AppArmor and SELinux can provide protection that extends beyond
 the container. For example, you can use SELinux to help mitigate
 [CVE-2019-5736](https://access.redhat.com/security/cve/cve-2019-5736).
 {{< /note >}}
 ### Considerations for managing custom configurations {#considerations-custom-configurations}
 seccomp, AppArmor, and SELinux usually have a default configuration that offers
 basic protections. You can also create custom profiles and policies that meet
 the requirements of your workloads. Managing and distributing these custom
 configurations at scale might be challenging, especially if you use all three
 features together. To help you to manage these configurations at scale, use a
 tool like the
 [Kubernetes Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).
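 For example, with the Security Profiles Operator installed, a seccomp profile
 can be declared as a cluster object instead of being copied to each node by
 hand. The following is a sketch only: the API version shown is an assumption
 that can differ between operator releases, and the names are hypothetical.
 Check the operator's documentation for the schema that your installed version
 supports.
 ```yaml
 apiVersion: security-profiles-operator.x-k8s.io/v1beta1   # assumed, verify for your release
 kind: SeccompProfile
 metadata:
   name: log-only-profile           # hypothetical name
   namespace: my-namespace          # hypothetical namespace
 spec:
   defaultAction: SCMP_ACT_LOG      # log every syscall instead of blocking it
 ```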
 ## Kernel-level security features and privileged containers {#kernel-security-features-privileged-containers}
 Kubernetes lets you specify that some trusted containers can run in
 *privileged* mode. Any container in a Pod can run in privileged mode to use
 operating system administrative capabilities that would otherwise be
 inaccessible. This is available for both Windows and Linux.
 Privileged containers explicitly override some of the Linux kernel constraints
 that you might use in your workloads, as follows:
 * **seccomp**: Privileged containers run with the `Unconfined` seccomp profile,
  overriding any seccomp profile that you specified in your manifest.
 * **AppArmor**: Privileged containers ignore any applied AppArmor profiles.
 * **SELinux**: Privileged containers run as the `unconfined_t` domain.
 ### Privileged containers {#privileged-containers}
 <!-- Content from https://kubernetes.io/docs/concepts/workloads/pods/#privileged-mode-for-containers  -->
 Any container in a Pod can enable *Privileged mode* if you set the
 `privileged: true` field in the
 [`securityContext`](/docs/tasks/configure-pod-container/security-context/)
 field for the container. Privileged containers override or undo many other
 hardening settings such as the applied seccomp profile, AppArmor profile, or
 SELinux constraints. Privileged containers are given all Linux capabilities,
 including capabilities that they don't require. For example, a root user in a
 privileged container might be able to use the `CAP_SYS_ADMIN` and
 `CAP_NET_ADMIN` capabilities on the node, bypassing the runtime seccomp
 configuration and other restrictions.
 In most cases, you should avoid using privileged containers, and instead grant
 the specific capabilities required by your container using the `capabilities`
 field in the `securityContext` field. Only use privileged mode if your container
 needs a capability that you can't grant with the `securityContext`. Privileged
 mode is useful for containers that need operating system administrative
 capabilities, such as manipulating the network stack or accessing hardware
 devices.
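 As a sketch of the alternative to privileged mode, the following container
 drops every capability and then adds back only the one it needs. The choice of
 `NET_ADMIN` is an illustrative assumption for a workload that configures
 networking; Kubernetes capability names omit the `CAP_` prefix. The Pod name
 and image are placeholders.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: net-admin-demo             # hypothetical name
 spec:
   containers:
   - name: net-tool
     image: registry.example/net-tool:1.0   # placeholder image
     securityContext:
       privileged: false
       allowPrivilegeEscalation: false
       capabilities:
         drop: ["ALL"]              # start from no capabilities
         add: ["NET_ADMIN"]         # grant only what the workload needs
 ```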
 In Kubernetes version 1.26 and later, you can also run Windows containers in a
 similarly privileged mode by setting the `windowsOptions.hostProcess` flag on
 the security context of the Pod spec. For details and instructions, see
 [Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/).
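 A HostProcess Pod might be declared as in the following sketch. The Pod name,
 image, and command are placeholders, and the user name is one possible choice
 of Windows account. HostProcess Pods must also use the host network.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: hostprocess-demo           # hypothetical name
 spec:
   securityContext:
     windowsOptions:
       hostProcess: true
       runAsUserName: "NT AUTHORITY\\SYSTEM"   # assumed account
   hostNetwork: true                # required for HostProcess Pods
   containers:
   - name: admin-task
     image: registry.example/win-admin:1.0     # placeholder image
     command: ["ping", "-t", "127.0.0.1"]
   nodeSelector:
     kubernetes.io/os: windows
 ```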
 ## Recommendations and best practices {#recommendations-best-practices}
 * Before configuring kernel-level security capabilities, you should consider
  implementing network-level isolation. For more information, read the
  [Security Checklist](/docs/concepts/security/security-checklist/#network-security).
 * Unless necessary, run Linux workloads as non-root by setting specific user and
  group IDs in your Pod manifest and by specifying `runAsNonRoot: true`.
 Additionally, you can run workloads in user namespaces by setting
 `hostUsers: false` in your Pod manifest. This lets you run containers as root
 users in the user namespace, but as non-root users in the host namespace on the
 node. This feature is still in the early stages of development and might not have
 the level of support that you need. For instructions, refer to
 [Use a User Namespace With a Pod](/docs/tasks/configure-pod-container/user-namespaces/).
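 A minimal sketch of a Pod that opts in to a user namespace follows. The Pod
 name and image are placeholders, and the Pod only starts if the node's
 container runtime and kernel support user namespaces.
 ```yaml
 apiVersion: v1
 kind: Pod
 metadata:
   name: userns-demo                # hypothetical name
 spec:
   hostUsers: false                 # run the Pod in its own user namespace
   containers:
   - name: shell
     image: debian                  # placeholder image
     command: ["sleep", "infinity"]
 ```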
 ## {{% heading "whatsnext" %}}
 * [Learn how to use AppArmor](/docs/tutorials/security/apparmor/)
 * [Learn how to use seccomp](/docs/tutorials/security/seccomp/)
 * [Learn how to use SELinux](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container)
--- a/content/en/docs/concepts/workloads/pods/_index.md
+++ b/content/en/docs/concepts/workloads/pods/_index.md
@ -276,30 +276,34 @@ Containers within the Pod see the system hostname as being the same as the confi
 `name` for the Pod. There's more about this in the [networking](/docs/concepts/cluster-administration/networking/)
 section.
-## Privileged mode for containers
+## Pod security settings {#pod-security}
-{{< note >}}
+To set security constraints on Pods and containers, you use the
-Your {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} must support the concept of a privileged container for this setting to be relevant.
+`securityContext` field in the Pod specification. This field gives you
-{{< /note >}}
+granular control over what a Pod or individual containers can do. For example:
-Any container in a pod can run in privileged mode to use operating system administrative capabilities
+* Drop specific Linux capabilities to avoid the impact of a CVE.
-that would otherwise be inaccessible. This is available for both Windows and Linux.
+* Force all processes in the Pod to run as a non-root user or as a specific
  user or group ID.
 * Set a specific seccomp profile.
 * Set Windows security options, such as whether containers run as HostProcess.
-### Linux privileged containers
+{{< caution >}}
 You can also use the Pod securityContext to enable
 [_privileged mode_](/docs/concepts/security/linux-kernel-security-constraints/#privileged-containers)
 in Linux containers. Privileged mode overrides many of the other security
 settings in the securityContext. Avoid using this setting unless you can't grant
 the equivalent permissions by using other fields in the securityContext.
 In Kubernetes 1.26 and later, you can run Windows containers in a similarly
 privileged mode by setting the `windowsOptions.hostProcess` flag on the
 security context of the Pod spec. For details and instructions, see
 [Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/).
 {{< /caution >}}
-In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag
+* To learn about kernel-level security constraints that you can use,
-on the [security context](/docs/tasks/configure-pod-container/security-context/) of the
+  see [Linux kernel security constraints for Pods and containers](/docs/concepts/security/linux-kernel-security-constraints).
-container spec. This is useful for containers that want to use operating system administrative
+* To learn more about the Pod security context, see
-capabilities such as manipulating the network stack or accessing hardware devices.
+  [Configure a Security Context for a Pod or Container](/docs/tasks/configure-pod-container/security-context/).
 ### Windows privileged containers
 {{< feature-state for_k8s_version="v1.26" state="stable" >}}
 In Windows, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod) by setting the 
 `windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these
 pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used
 to perform administrative tasks as is done with Linux privileged containers.
 ## Static Pods
--- a/content/en/docs/tutorials/security/apparmor.md
+++ b/content/en/docs/tutorials/security/apparmor.md
@ -10,22 +10,10 @@ weight: 30
 {{< feature-state feature_gate_name="AppArmor" >}}
-
+This page shows you how to load AppArmor profiles on your nodes and enforce
-[AppArmor](https://apparmor.net/) is a Linux kernel security module that supplements the standard Linux user and group based
+those profiles in Pods. To learn more about how Kubernetes can confine Pods using
-permissions to confine programs to a limited set of resources. AppArmor can be configured for any
+AppArmor, see
-application to reduce its potential attack surface and provide greater in-depth defense. It is
+[Linux kernel security constraints for Pods and containers](/docs/concepts/security/linux-kernel-security-constraints/#apparmor).
 configured through profiles tuned to allow the access needed by a specific program or container,
 such as Linux capabilities, network access, file permissions, etc. Each profile can be run in either
 *enforcing* mode, which blocks access to disallowed resources, or *complain* mode, which only reports
 violations.
 On Kubernetes, AppArmor can help you to run a more secure deployment by restricting what containers are allowed to
 do, and/or provide better auditing through system logs. However, it is important to keep in mind
 that AppArmor is not a silver bullet and can only do so much to protect against exploits in your
 application code. It is important to provide good, restrictive profiles, and harden your
 applications and cluster from other angles as well.
 ## {{% heading "objectives" %}}