Merge pull request #21651 from dvdksn/seccomp-freshness

seccomp freshness
David Karlsson 2024-12-18 09:36:52 +01:00 committed by GitHub
commit 54fcccd75e
GPG Key ID: B5690EEEBB952194
3 changed files with 66 additions and 62 deletions

@@ -17,6 +17,7 @@ exceptions:
- AWS - AWS
- BIOS - BIOS
- BPF - BPF
- BSD
- CI - CI
- CISA - CISA
- CLI - CLI
@@ -73,6 +74,7 @@ exceptions:
- NFS - NFS
- NOTE - NOTE
- NTLM - NTLM
- NUMA
- NVDA - NVDA
- OCI - OCI
- OS - OS

@@ -20,8 +20,8 @@ Couchbase
Datadog Datadog
Ddosify Ddosify
Debootstrap Debootstrap
Dev Environments?
Dev Dev
Dev Environments?
Django Django
Docker Build Cloud Docker Build Cloud
Docker Business Docker Business
@@ -73,8 +73,8 @@ Nuxeo
OAuth OAuth
OTel OTel
Okta Okta
Paketo
PKG PKG
Paketo
Postgres Postgres
PowerShell PowerShell
Python Python
@@ -98,8 +98,9 @@ WireMock
Zscaler Zscaler
Zsh Zsh
[Aa]utobuild [Aa]utobuild
[Bb]uildx [Aa]llowlist
[Bb]uildpack(s)? [Bb]uildpack(s)?
[Bb]uildx
[Cc]odenames? [Cc]odenames?
[Cc]ompose [Cc]ompose
[Dd]istroless [Dd]istroless
@@ -134,6 +135,10 @@ Zsh
[Ss]ysfs [Ss]ysfs
[Tt]oolchains? [Tt]oolchains?
[Uu]narchived? [Uu]narchived?
[Uu]ngated
[Uu]ntrusted
[Uu]serland
[Uu]serspace
[Vv]irtiofs [Vv]irtiofs
[Vv]irtualize [Vv]irtualize
[Ww]alkthrough [Ww]alkthrough
@@ -178,8 +183,5 @@ systemd
tmpfs tmpfs
ufw ufw
umask umask
ungated
userland
untrusted
vSphere vSphere
vpnkit vpnkit

@@ -26,8 +26,8 @@ protective while providing wide application compatibility. The default Docker
profile can be found profile can be found
[here](https://github.com/moby/moby/blob/master/profiles/seccomp/default.json). [here](https://github.com/moby/moby/blob/master/profiles/seccomp/default.json).
In effect, the profile is an allowlist which denies access to system calls by In effect, the profile is an allowlist that denies access to system calls by
default, then allowlists specific system calls. The profile works by defining a default and then allows specific system calls. The profile works by defining a
`defaultAction` of `SCMP_ACT_ERRNO` and overriding that action only for specific `defaultAction` of `SCMP_ACT_ERRNO` and overriding that action only for specific
system calls. The effect of `SCMP_ACT_ERRNO` is to cause a `Permission Denied` system calls. The effect of `SCMP_ACT_ERRNO` is to cause a `Permission Denied`
error. Next, the profile defines a specific list of system calls which are fully error. Next, the profile defines a specific list of system calls which are fully
@@ -53,61 +53,61 @@ $ docker run --rm \
Docker's default seccomp profile is an allowlist which specifies the calls that Docker's default seccomp profile is an allowlist which specifies the calls that
are allowed. The table below lists the significant (but not all) syscalls that are allowed. The table below lists the significant (but not all) syscalls that
are effectively blocked because they are not on the Allowlist. The table includes are effectively blocked because they are not on the allowlist. The table includes
the reason each syscall is blocked rather than white-listed. the reason each syscall is blocked rather than white-listed.
| Syscall | Description | | Syscall | Description |
|---------------------|---------------------------------------------------------------------------------------------------------------------------------------| | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `acct` | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. | | `acct` | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. |
| `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. | | `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. |
| `bpf` | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`. | | `bpf` | Deny loading potentially persistent BPF programs into kernel, already gated by `CAP_SYS_ADMIN`. |
| `clock_adjtime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | | `clock_adjtime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
| `clock_settime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | | `clock_settime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_NEWUSER`. | | `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE\_\* flags, except `CLONE_NEWUSER`. |
| `create_module` | Deny manipulation and functions on kernel modules. Obsolete. Also gated by `CAP_SYS_MODULE`. | | `create_module` | Deny manipulation and functions on kernel modules. Obsolete. Also gated by `CAP_SYS_MODULE`. |
| `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | | `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. |
| `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | | `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. |
| `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. Obsolete. | | `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. Obsolete. |
| `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | | `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. |
| `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | | `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. |
| `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | | `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. |
| `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | | `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. |
| `kcmp` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | | `kcmp` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. |
| `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. Also gated by `CAP_SYS_BOOT`. | | `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. Also gated by `CAP_SYS_BOOT`. |
| `kexec_load` | Deny loading a new kernel for later execution. Also gated by `CAP_SYS_BOOT`. | | `kexec_load` | Deny loading a new kernel for later execution. Also gated by `CAP_SYS_BOOT`. |
| `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. | | `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. |
| `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by `CAP_SYS_ADMIN`. | | `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by `CAP_SYS_ADMIN`. |
| `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | | `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. |
| `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. | | `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. |
| `move_pages` | Syscall that modifies kernel memory and NUMA settings. | | `move_pages` | Syscall that modifies kernel memory and NUMA settings. |
| `nfsservctl` | Deny interaction with the kernel nfs daemon. Obsolete since Linux 3.1. | | `nfsservctl` | Deny interaction with the kernel NFS daemon. Obsolete since Linux 3.1. |
| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. | | `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. |
| `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. | | `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. |
| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. | | `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulnerabilities. |
| `pivot_root` | Deny `pivot_root`, should be privileged operation. | | `pivot_root` | Deny `pivot_root`, should be privileged operation. |
| `process_vm_readv` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | | `process_vm_readv` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. |
| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | | `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. |
| `ptrace` | Tracing/profiling syscall. Blocked in Linux kernel versions before 4.8 to avoid seccomp bypass. Tracing/profiling arbitrary processes is already blocked by dropping `CAP_SYS_PTRACE`, because it could leak a lot of information on the host. | | `ptrace` | Tracing/profiling syscall. Blocked in Linux kernel versions before 4.8 to avoid seccomp bypass. Tracing/profiling arbitrary processes is already blocked by dropping `CAP_SYS_PTRACE`, because it could leak a lot of information on the host. |
| `query_module` | Deny manipulation and functions on kernel modules. Obsolete. | | `query_module` | Deny manipulation and functions on kernel modules. Obsolete. |
| `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. | | `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. |
| `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. | | `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. |
| `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. | | `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. |
| `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | | `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. |
| `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. | | `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. |
| `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | | `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
| `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | | `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
| `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | | `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. |
| `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | | `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. |
| `sysfs` | Obsolete syscall. | | `sysfs` | Obsolete syscall. |
| `_sysctl` | Obsolete, replaced by /proc/sys. | | `_sysctl` | Obsolete, replaced by /proc/sys. |
| `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | | `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. |
| `umount2` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | | `umount2` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. |
| `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. | | `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. |
| `uselib` | Older syscall related to shared libraries, unused for a long time. | | `uselib` | Older syscall related to shared libraries, unused for a long time. |
| `userfaultfd` | Userspace page fault handling, largely needed for process migration. | | `userfaultfd` | Userspace page fault handling, largely needed for process migration. |
| `ustat` | Obsolete syscall. | | `ustat` | Obsolete syscall. |
| `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | | `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. |
| `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | | `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. |
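
The allowlist behavior described above (a `defaultAction` of `SCMP_ACT_ERRNO`, overridden only for listed system calls) can be illustrated with a minimal Python sketch. The `PROFILE` dictionary and the `action_for` helper are hypothetical names invented for this illustration. The profile is a heavily trimmed stand-in for the real default profile, and the lookup ignores the argument filters and capability-based conditions that the real profile also uses.

```python
import json

# Hypothetical, heavily trimmed profile in the same JSON shape as Docker's
# default seccomp profile. The real profile allows several hundred syscalls.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"},
    ],
}


def action_for(profile, syscall):
    """Return the action a profile assigns to a syscall name.

    Simplification: only the "names" and "action" fields are consulted.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # Anything not explicitly listed falls through to the default action.
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ("read", "mount", "keyctl"):
        print(f"{name}: {action_for(PROFILE, name)}")
    # The same structure can be written out as a JSON profile file.
    print(json.dumps(PROFILE, indent=2))
```

In this sketch `read` resolves to `SCMP_ACT_ALLOW`, while `mount` and `keyctl` fall through to `SCMP_ACT_ERRNO`, which mirrors why the syscalls in the table are effectively blocked: they are simply absent from the allowlist.
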
## Run without the default seccomp profile ## Run without the default seccomp profile
@@ -115,6 +115,6 @@ You can pass `unconfined` to run a container without the default seccomp
profile. profile.
```console ```console
$ docker run --rm -it --security-opt seccomp=unconfined debian:jessie \ $ docker run --rm -it --security-opt seccomp=unconfined debian:latest \
unshare --map-root-user --user sh -c whoami unshare --map-root-user --user sh -c whoami
``` ```