mirror of https://github.com/docker/docs.git
Merge pull request #21651 from dvdksn/seccomp-freshness
seccomp freshness
commit 54fcccd75e
@ -17,6 +17,7 @@ exceptions:
 - AWS
 - BIOS
 - BPF
+- BSD
 - CI
 - CISA
 - CLI
@ -73,6 +74,7 @@ exceptions:
 - NFS
 - NOTE
 - NTLM
+- NUMA
 - NVDA
 - OCI
 - OS

@ -20,8 +20,8 @@ Couchbase
 Datadog
 Ddosify
 Debootstrap
-Dev Environments?
 Dev
+Dev Environments?
 Django
 Docker Build Cloud
 Docker Business
@ -73,8 +73,8 @@ Nuxeo
 OAuth
 OTel
 Okta
-Paketo
 PKG
+Paketo
 Postgres
 PowerShell
 Python
@ -98,8 +98,9 @@ WireMock
 Zscaler
 Zsh
 [Aa]utobuild
-[Bb]uildx
+[Aa]llowlist
 [Bb]uildpack(s)?
+[Bb]uildx
 [Cc]odenames?
 [Cc]ompose
 [Dd]istroless
@ -134,6 +135,10 @@ Zsh
 [Ss]ysfs
 [Tt]oolchains?
 [Uu]narchived?
+[Uu]ngated
+[Uu]ntrusted
+[Uu]serland
+[Uu]serspace
 [Vv]irtiofs
 [Vv]irtualize
 [Ww]alkthrough
@ -178,8 +183,5 @@ systemd
 tmpfs
 ufw
 umask
-ungated
-userland
-untrusted
 vSphere
 vpnkit

@ -26,8 +26,8 @@ protective while providing wide application compatibility. The default Docker
 profile can be found
 [here](https://github.com/moby/moby/blob/master/profiles/seccomp/default.json).

-In effect, the profile is an allowlist which denies access to system calls by
-default, then allowlists specific system calls. The profile works by defining a
+In effect, the profile is an allowlist that denies access to system calls by
+default and then allows specific system calls. The profile works by defining a
 `defaultAction` of `SCMP_ACT_ERRNO` and overriding that action only for specific
 system calls. The effect of `SCMP_ACT_ERRNO` is to cause a `Permission Denied`
 error. Next, the profile defines a specific list of system calls which are fully
@ -53,61 +53,61 @@ $ docker run --rm \

 Docker's default seccomp profile is an allowlist which specifies the calls that
 are allowed. The table below lists the significant (but not all) syscalls that
-are effectively blocked because they are not on the Allowlist. The table includes
+are effectively blocked because they are not on the allowlist. The table includes
 the reason each syscall is blocked rather than white-listed.

 | Syscall | Description |
-|---------------------|---------------------------------------------------------------------------------------------------------------------------------------|
+| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `acct` | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. |
 | `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. |
-| `bpf` | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`. |
+| `bpf` | Deny loading potentially persistent BPF programs into kernel, already gated by `CAP_SYS_ADMIN`. |
 | `clock_adjtime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
 | `clock_settime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
-| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_NEWUSER`. |
+| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE\_\* flags, except `CLONE_NEWUSER`. |
 | `create_module` | Deny manipulation and functions on kernel modules. Obsolete. Also gated by `CAP_SYS_MODULE`. |
 | `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. |
 | `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. |
 | `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. Obsolete. |
 | `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. |
 | `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. |
 | `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. |
 | `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. |
 | `kcmp` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. |
 | `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. Also gated by `CAP_SYS_BOOT`. |
 | `kexec_load` | Deny loading a new kernel for later execution. Also gated by `CAP_SYS_BOOT`. |
 | `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. |
 | `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by `CAP_SYS_ADMIN`. |
 | `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. |
 | `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. |
 | `move_pages` | Syscall that modifies kernel memory and NUMA settings. |
-| `nfsservctl` | Deny interaction with the kernel nfs daemon. Obsolete since Linux 3.1. |
+| `nfsservctl` | Deny interaction with the kernel NFS daemon. Obsolete since Linux 3.1. |
 | `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. |
 | `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. |
-| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. |
+| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulnerabilities. |
 | `pivot_root` | Deny `pivot_root`, should be privileged operation. |
 | `process_vm_readv` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. |
 | `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. |
 | `ptrace` | Tracing/profiling syscall. Blocked in Linux kernel versions before 4.8 to avoid seccomp bypass. Tracing/profiling arbitrary processes is already blocked by dropping `CAP_SYS_PTRACE`, because it could leak a lot of information on the host. |
 | `query_module` | Deny manipulation and functions on kernel modules. Obsolete. |
 | `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. |
 | `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. |
 | `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. |
 | `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. |
 | `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. |
 | `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
 | `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. |
 | `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. |
 | `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. |
 | `sysfs` | Obsolete syscall. |
 | `_sysctl` | Obsolete, replaced by /proc/sys. |
 | `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. |
 | `umount2` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. |
 | `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. |
 | `uselib` | Older syscall related to shared libraries, unused for a long time. |
 | `userfaultfd` | Userspace page fault handling, largely needed for process migration. |
 | `ustat` | Obsolete syscall. |
 | `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. |
 | `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. |

 ## Run without the default seccomp profile

@ -115,6 +115,6 @@ You can pass `unconfined` to run a container without the default seccomp
 profile.

 ```console
-$ docker run --rm -it --security-opt seccomp=unconfined debian:jessie \
+$ docker run --rm -it --security-opt seccomp=unconfined debian:latest \
 unshare --map-root-user --user sh -c whoami
 ```
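
The allowlist behavior described in the seccomp hunk (a `defaultAction` of `SCMP_ACT_ERRNO`, overridden only for specific system calls) can be illustrated with a minimal Python sketch. The profile below is a hypothetical, heavily trimmed stand-in written in the same JSON shape as Docker's `default.json`, and `action_for` is an illustrative helper, not part of Docker or of this commit. It ignores per-argument conditions and capability filters, which the real profile also uses.

```python
import json

# Hypothetical, trimmed profile in the JSON shape used by Docker's default.json.
# The real default profile allows several hundred syscalls.
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "exit_group"], "action": "SCMP_ACT_ALLOW"}
  ]
}
"""


def action_for(profile, syscall):
    """Return the action the profile assigns to a syscall name.

    Simplification: only the "names" and "action" fields are considered;
    "args", "includes" and "excludes" conditions are ignored.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    # Anything not explicitly listed falls back to the default action.
    return profile["defaultAction"]


profile = json.loads(PROFILE_JSON)
for name in ("read", "mount", "unshare"):
    print(name, action_for(profile, name))
```

In this sketch `mount` and `unshare` fall through to the default action because they are absent from the allowlist, which mirrors why the syscalls in the table are described as effectively blocked.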