In the default seccomp rule, allow use of 32 bit syscalls on
64 bit architectures, so you can run x86 Linux images on x86_64
without disabling seccomp or using a custom rule.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
Being able to obtain a file handle is no use as we cannot perform
any operation in it, and it may leak kernel state.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
This change is done so that driver_unsupported.go and driver_unsupported_nocgo.go
declare the same signature for NewDriver as driver.go.
Fixes#19032
Signed-off-by: Lukas Waslowski <cr7pt0gr4ph7@gmail.com>
This can be allowed because it should only restrict more per the seccomp docs, and multiple apps use it today.
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
Block kcmp, procees_vm_readv, process_vm_writev.
All these require CAP_PTRACE, and are only used for ptrace related
actions, so are not useful as we block ptrace.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
The bpf syscall can load code into the kernel which may
persist beyond container lifecycle. Requires CAP_SYS_ADMIN
already.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
These provide an in kernel virtual machine for x86 real mode on x86
used by one very early DOS emulator. Not required for any normal use.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
The stime syscall is a legacy syscall on some architectures
to set the clock, should be blocked as time is not namespaced.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
clock_adjtime is the new posix style version of adjtime allowing
a specific clock to be specified. Time is not namespaced, so do
not allow.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
This is a new version of init_module that takes a file descriptor
rather than a file name.
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
The set_robust_list syscall sets the list of futexes which are
cleaned up on thread exit, and are needed to avoid mutexes
being held forever on thread exit.
See for example in Musl libc mutex handling:
http://git.musl-libc.org/cgit/musl/tree/src/thread/pthread_mutex_trylock.c#n22
Signed-off-by: Justin Cormack <justin.cormack@unikernel.com>
It's used for updating properties of one or more containers, we only
support resource configs for now. It can be extended in the future.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This restores the behavior that existed prior to #16235 for setting
OOMKilled, while retaining the additional benefits it introduced around
emitting the oom event.
This also adds a test for the most obvious OOM cases which would have
caught this regression.
Fixes#18510
Signed-off-by: Euan <euank@amazon.com>
Whether a shared/slave volume propagation will work or not also depends on
where source directory is mounted on and what are the propagation properties
of that mount point. For example, for shared volume mount to work, source
mount point should be shared. For slave volume mount to work, source mount
point should be either shared/slave.
This patch determines the mount point on which directory is mounted and
checks for desired minimum propagation properties of that mount point. It
errors out of configuration does not seem right.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Allow passing mount propagation option shared, slave, or private as volume
property.
For example.
docker run -ti -v /root/mnt-source:/root/mnt-dest:slave fedora bash
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Ubuntu 14.04 LTS is on apparmor 2.8.95.
This enables `ps` inside a container without causing
audit log entries on the host.
Signed-off-by: Joel Hansson <joel.hansson@ecraft.com>