17 Commits

Author SHA1 Message Date
Giuseppe Scrivano 339f5cbdb9 seccomp: allow pkey_*
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-06-16 12:16:41 +02:00
Giuseppe Scrivano 24114130c2 seccomp: let io_uring_* fail with ENOSYS
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-06-16 12:15:05 +02:00
Giuseppe Scrivano d4fd05c527 seccomp: allow clone3
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-06-16 12:14:26 +02:00
Giuseppe Scrivano 526b9a36e7 seccomp: switch default to ENOSYS
Add the currently blocked syscalls to a deny-list and switch the
default action to ENOSYS.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-06-14 19:08:07 +02:00
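
A minimal Python sketch of the deny-list layout that 526b9a36e7 describes. The key names (`defaultAction`, `defaultErrnoRet`, `syscalls`, `names`, `action`, `errnoRet`) are assumed to follow the OCI runtime-spec seccomp layout; the syscall names, the EPERM choice for the deny-list, and the `resolve` helper are hypothetical and not taken from the real seccomp.json.

```python
import errno

# Hypothetical, heavily trimmed profile: anything not matched by a rule
# falls through to the default action and fails with ENOSYS.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": errno.ENOSYS,
    "syscalls": [
        {"names": ["read", "write", "clone3"], "action": "SCMP_ACT_ALLOW"},
        # Deny-list (illustrative names): explicitly blocked syscalls,
        # assumed here to fail with EPERM rather than the ENOSYS default.
        {
            "names": ["kexec_load", "init_module"],
            "action": "SCMP_ACT_ERRNO",
            "errnoRet": errno.EPERM,
        },
    ],
}


def resolve(profile, syscall):
    """Return (action, errno or None) for a syscall name.

    Simplification: argument filters and capability conditions are ignored.
    """
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"], rule.get("errnoRet")
    return profile["defaultAction"], profile.get("defaultErrnoRet")
```

With this layout, a syscall the profile has never heard of resolves to ENOSYS, which userspace can treat as "not implemented" and fall back to an older syscall, while the explicitly listed ones keep a distinct error.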
Jan Palus 0583bac499 seccomp: allow timer_settime64
Allow the time64 variant of timer_settime, which was missed in 4405585.

Signed-off-by: Jan Palus <jpalus@fastmail.com>
2021-06-14 12:55:55 +02:00
Jan Palus e50fdde382 seccomp: allow more *_time64 syscalls
Add the missing time64 equivalents of already allowed syscalls. 32-bit
platforms use these 64-bit time variants to counter the Y2038 problem.

Fixes #593

Signed-off-by: Jan Palus <jpalus@fastmail.com>
2021-06-01 18:05:14 +02:00
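
The kind of gap fixed in e50fdde382 and 0583bac499 can be checked mechanically. The sketch below is an illustration only: the pair table is a small, non-exhaustive subset of the legacy/time64 syscall names, and `missing_time64` is a hypothetical helper, not part of containers/common.

```python
# Subset of legacy -> time64 syscall pairs on 32-bit Linux (not exhaustive).
TIME64_VARIANTS = {
    "clock_gettime": "clock_gettime64",
    "clock_settime": "clock_settime64",
    "timer_gettime": "timer_gettime64",
    "timer_settime": "timer_settime64",
    "futex": "futex_time64",
    "ppoll": "ppoll_time64",
}


def missing_time64(allowed):
    """Return time64 variants whose legacy counterpart is allowed
    but which are themselves absent from the allowed names."""
    allowed = set(allowed)
    return sorted(
        new
        for old, new in TIME64_VARIANTS.items()
        if old in allowed and new not in allowed
    )
```

For example, an allow-list containing `timer_settime` but not `timer_settime64` would be reported, which is the situation 0583bac499 corrects.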
Daniel J Walsh a482b92f4a Add setns to default seccomp.json
In order to run containers within containers via podman
and do a podman exec, we need to allow setns syscalls.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2021-04-19 06:21:02 -04:00
Aleksa Sarai 1478f9331d seccomp: update profile to Linux 5.11 list
This mirrors the Docker and containerd changes, with the caveat that
because mount(2) is permitted under podman for all containers we
therefore add all of the v2 mount API syscalls as available to all
containers.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-01-27 21:40:48 +11:00
Aleksa Sarai 624d0aa703 seccomp: deduplicate default profile
Several syscalls were enabled globally (SCMP_ACT_ALLOW without any
conditions for all containers), but also had conditional rules later in
the profile (likely inherited from Docker). The following syscalls do
not need special casing because they were globally enabled:

 * clone, unshare, mount, umount, umount2 all had special CAP_SYS_ADMIN
   restrictions but those don't make sense since they were also enabled
   for all containers.
 * reboot was permitted for CAP_SYS_BOOT and all containers.
 * name_to_handle_at was permitted for CAP_SYS_ADMIN, CAP_SYS_NICE(?),
   and all containers.

And certain syscalls had globally-enabled rules when they shouldn't
have:

 * socket has special rules for CAP_AUDIT_WRITE but it also had a global
   "allow unconditionally" rule. It turns out that libseccomp will
   override unconditional rules with conditional ones but this is
   somewhat of an implementation detail and it's much safer to remove
   the rule and use the existing cases.

Now the only syscalls remaining with complicated rules (meaning they
appear more than once in the profile) are:

 * sync_file_range2 which is architecture specific (though in principle
   we could move it to enabled-without-rules because runc ignores
   unknown syscalls).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
2021-01-27 21:39:54 +11:00
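
A rough sketch of how the duplicates described in 624d0aa703 could be found: count how many rules mention each syscall name and report those that appear more than once. The `syscalls`/`names`/`action`/`args` keys are assumed to match the profile's JSON layout; `EXAMPLE` and `repeated_syscalls` are hypothetical.

```python
from collections import Counter

# Hypothetical fragment: "socket" has both an unconditional rule
# and a conditional one, as in the case the commit message describes.
EXAMPLE = {
    "syscalls": [
        {"names": ["read", "socket"], "action": "SCMP_ACT_ALLOW"},
        {
            "names": ["socket"],
            "action": "SCMP_ACT_ALLOW",
            # Illustrative argument filter, not copied from the real profile.
            "args": [{"index": 0, "value": 16, "op": "SCMP_CMP_EQ"}],
        },
    ]
}


def repeated_syscalls(profile):
    """Return the sorted syscall names that appear in more than one rule."""
    counts = Counter(
        name
        for rule in profile.get("syscalls", [])
        for name in set(rule.get("names", []))  # count each rule once per name
    )
    return sorted(name for name, n in counts.items() if n > 1)
```

Running this against a loaded profile (for example via `json.load`) would list the syscalls that still have "complicated rules" in the sense used above.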
Giuseppe Scrivano 10e862731c seccomp: drop 'vmsplice' from the allowed list
More details: https://lore.kernel.org/linux-mm/X+PoXCizo392PBX7@redhat.com/

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-01-08 13:43:54 +01:00
Daniel J Walsh 297a9ab8d6 Add pidfd_open syscall by default
This syscall actually allows processes to be more secure, so it should be
allowed by default.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-12-15 05:46:02 -05:00
Daniel J Walsh 83bda5699e Move buildah/pkg/secrets to common/pkg/subscriptions
Since secrets is shared by buildah, podman and cri-o, we need
to move it to containers/common.

Also move containers-mounts.conf.5.md to common from podman,
since this is common to all packages.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-11-19 10:31:58 -05:00
Daniel J Walsh 4405585d9e Add time64 syscalls to seccomp.json
Twelve new syscalls have been added for handling 64-bit time.
Containers on newer kernels break when these syscalls are not allowed.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-10-21 17:38:10 -04:00
Daniel J Walsh 47ef35244c remove fchmodat2 from seccomp.json file
This syscall is proposed for the kernel but does not exist yet. Having it in
the default syscall table is causing crun to print warning messages.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-09-21 07:57:43 -04:00
Daniel J Walsh d3e2a9fb55 Allow pidfd_getfd by default in seccomp.json
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-09-12 07:44:52 -04:00
Daniel J Walsh 746c707914 Add new syscalls to allowed seccomp.json
faccessat2, openat2, and fchmodat2 are all new syscalls that help eliminate
race conditions. Current containers get the older versions of these syscalls,
so adding them by default makes sense.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-09-11 07:11:16 -04:00
Daniel J Walsh 826c76f723 Update default seccomp rules to match fedora rules
Add the following default syscalls:
"clock_adjtime"  --  Already allow adjtimex
"clone"          --  Needed so we can use a user namespace within a container.
                     Since this is allowed for non root users, it should be safe
                     to use, and can allow us to support containers/user namespaces
                     within locked down containers.
"pivot_root"     --  Can be used by containers within containers

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-09-09 15:32:50 -04:00