Add blog post about recording seccomp profiles in edge scenarios
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This commit is contained in:
parent
26f5f6b64e
commit
af4cec641c
|
|
@ -0,0 +1,282 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Having fun with seccomp profiles on the edge"
|
||||
date: 2023-05-18
|
||||
slug: seccomp-profiles-edge
|
||||
---
|
||||
|
||||
**Author**: Sascha Grunert
|
||||
|
||||
The [Security Profiles Operator (SPO)][spo] is a feature-rich
|
||||
[operator][operator] for Kubernetes to make managing seccomp, SELinux and
|
||||
AppArmor profiles easier than ever. Recording those profiles from scratch is one
|
||||
of the key features of this operator, which usually involves the integration
|
||||
into large CI/CD systems. Being able to test the recording capabilities of the
|
||||
operator in edge cases is one of the recent development efforts of the SPO and
|
||||
makes it excitingly easy to play around with seccomp profiles.
|
||||
|
||||
[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
|
||||
[operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator
|
||||
|
||||
## Recording seccomp profiles with `spoc record`
|
||||
|
||||
The [v0.8.0][spo-latest] release of the Security Profiles Operator shipped a new
|
||||
command line interface called `spoc`, a little helper tool for recording and
|
||||
replaying seccomp profiles among various other things that are out of scope of
|
||||
this blog post.
|
||||
|
||||
[spo-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/v0.8.0
|
||||
|
||||
Recording a seccomp profile requires a binary to be executed, which can be a
|
||||
simple golang application which just calls [`uname(2)`][uname]:
|
||||
|
||||
```go
|
||||
package main
|
||||
|
||||
import (
|
||||
"syscall"
|
||||
)
|
||||
|
||||
func main() {
|
||||
utsname := syscall.Utsname{}
|
||||
if err := syscall.Uname(&utsname); err != nil {
|
||||
panic(err)
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
[uname]: https://man7.org/linux/man-pages/man2/uname.2.html
|
||||
|
||||
Building a binary from that code can be done by:
|
||||
|
||||
```console
|
||||
> go build -o main main.go
|
||||
> ldd ./main
|
||||
not a dynamic executable
|
||||
```
|
||||
|
||||
Now it's possible to download the latest binary of [`spoc` from
|
||||
GitHub][spoc-latest] and run the application on Linux with it:
|
||||
|
||||
[spoc-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/download/v0.8.0/spoc.amd64
|
||||
|
||||
```console
|
||||
> sudo ./spoc record ./main
|
||||
10:08:25.591945 Loading bpf module
|
||||
10:08:25.591958 Using system btf file
|
||||
libbpf: loading object 'recorder.bpf.o' from buffer
|
||||
…
|
||||
libbpf: prog 'sys_enter': relo #3: patched insn #22 (ALU/ALU64) imm 16 -> 16
|
||||
10:08:25.610767 Getting bpf program sys_enter
|
||||
10:08:25.610778 Attaching bpf tracepoint
|
||||
10:08:25.611574 Getting syscalls map
|
||||
10:08:25.611582 Getting pid_mntns map
|
||||
10:08:25.613097 Module successfully loaded
|
||||
10:08:25.613311 Processing events
|
||||
10:08:25.613693 Running command with PID: 336007
|
||||
10:08:25.613835 Received event: pid: 336007, mntns: 4026531841
|
||||
10:08:25.613951 No container ID found for PID (pid=336007, mntns=4026531841, err=unable to find container ID in cgroup path)
|
||||
10:08:25.614856 Processing recorded data
|
||||
10:08:25.614975 Found process mntns 4026531841 in bpf map
|
||||
10:08:25.615110 Got syscalls: read, close, mmap, rt_sigaction, rt_sigprocmask, madvise, nanosleep, clone, uname, sigaltstack, arch_prctl, gettid, futex, sched_getaffinity, exit_group, openat
|
||||
10:08:25.615195 Adding base syscalls: access, brk, capget, capset, chdir, chmod, chown, close_range, dup2, dup3, epoll_create1, epoll_ctl, epoll_pwait, execve, faccessat2, fchdir, fchmodat, fchown, fchownat, fcntl, fstat, fstatfs, getdents64, getegid, geteuid, getgid, getpid, getppid, getuid, ioctl, keyctl, lseek, mkdirat, mknodat, mount, mprotect, munmap, newfstatat, openat2, pipe2, pivot_root, prctl, pread64, pselect6, readlink, readlinkat, rt_sigreturn, sched_yield, seccomp, set_robust_list, set_tid_address, setgid, setgroups, sethostname, setns, setresgid, setresuid, setsid, setuid, statfs, statx, symlinkat, tgkill, umask, umount2, unlinkat, unshare, write
|
||||
10:08:25.616293 Wrote seccomp profile to: /tmp/profile.yaml
|
||||
10:08:25.616298 Unloading bpf module
|
||||
```
|
||||
|
||||
I have to execute `spoc` as root because it will internally run an [ebpf][ebpf]
|
||||
program by reusing the same code parts from the Security Profiles Operator
|
||||
itself. I can see that the bpf module got loaded successfully and `spoc`
|
||||
attached the required tracepoint to it. Then it will track the main application
|
||||
by using its [mount namespace][mntns] and process the recorded syscall data. The
|
||||
nature of ebpf programs is that they see the whole context of the Kernel, which
|
||||
means that `spoc` tracks all syscalls of the system, but does not interfere with
|
||||
their execution.
|
||||
|
||||
[ebpf]: https://ebpf.io
|
||||
[mntns]: https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
|
||||
|
||||
The logs indicate that `spoc` found the syscalls `read`, `close`,
|
||||
`mmap` and so on, including `uname`. All other syscalls than `uname` are coming
|
||||
from the golang runtime and its garbage collection, which already adds overhead
|
||||
to a basic application like in our demo. I can also see from the log line
|
||||
`Adding base syscalls: …` that `spoc` adds a bunch of base syscalls to the
|
||||
resulting profile. Those are used by the OCI runtime (like [runc][runc] or
|
||||
[crun][crun]) in order to be able to run a container. This means that `spoc`
|
||||
can be used to record seccomp profiles which then can be containerized directly.
|
||||
This behavior can be disabled in `spoc` by using the `--no-base-syscalls`/`-n`
|
||||
or customized via the `--base-syscalls`/`-b` command line flags This can be
|
||||
helpful in cases where different OCI runtimes other than crun and runc are used,
|
||||
or if I just want to record the seccomp profile for the application and stack
|
||||
it with another [base profile][base].
|
||||
|
||||
[runc]: https://github.com/opencontainers/runc
|
||||
[crun]: https://github.com/containers/crun
|
||||
[base]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#base-syscalls-for-a-container-runtime
|
||||
|
||||
The resulting profile is now available in `/tmp/profile.yaml`, but the default
|
||||
location can be changed using the `--output-file value`/`-o` flag:
|
||||
|
||||
```console
|
||||
> cat /tmp/profile.yaml
|
||||
```
|
||||
|
||||
```yaml
|
||||
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
|
||||
kind: SeccompProfile
|
||||
metadata:
|
||||
creationTimestamp: null
|
||||
name: main
|
||||
spec:
|
||||
architectures:
|
||||
- SCMP_ARCH_X86_64
|
||||
defaultAction: SCMP_ACT_ERRNO
|
||||
syscalls:
|
||||
- action: SCMP_ACT_ALLOW
|
||||
names:
|
||||
- access
|
||||
- arch_prctl
|
||||
- brk
|
||||
- …
|
||||
- uname
|
||||
- …
|
||||
status: {}
|
||||
```
|
||||
|
||||
The seccomp profile Custom Resource Definition (CRD) can be directly used
|
||||
together with the Security Profiles Operator for managing it within Kubernetes.
|
||||
`spoc` is also capable of producing raw seccomp profiles (as JSON), by using the
|
||||
`--type`/`-t` `raw-seccomp` flag:
|
||||
|
||||
```console
|
||||
> sudo ./spoc record --type raw-seccomp ./main
|
||||
…
|
||||
52.628827 Wrote seccomp profile to: /tmp/profile.json
|
||||
```
|
||||
|
||||
```
|
||||
> jq . /tmp/profile.json
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"defaultAction": "SCMP_ACT_ERRNO",
|
||||
"architectures": ["SCMP_ARCH_X86_64"],
|
||||
"syscalls": [
|
||||
{
|
||||
"names": ["access", "…", "write"],
|
||||
"action": "SCMP_ACT_ALLOW"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The utility `spoc record` allows us to record complex seccomp profiles directly
|
||||
from binary invocations in any Linux system which is capable of running the ebpf
|
||||
code within the Kernel. But it can do more: How about modifying the seccomp
|
||||
profile and then testing it by using `spoc run`.
|
||||
|
||||
## Running seccomp profiles with `spoc run`
|
||||
|
||||
`spoc` is also able to run binaries with applied seccomp profiles, making it
|
||||
easy to test any modification to it. To do that, just run:
|
||||
|
||||
```console
|
||||
> sudo ./spoc run ./main
|
||||
10:29:58.153263 Reading file /tmp/profile.yaml
|
||||
10:29:58.153311 Assuming YAML profile
|
||||
10:29:58.154138 Setting up seccomp
|
||||
10:29:58.154178 Load seccomp profile
|
||||
10:29:58.154189 Starting audit log enricher
|
||||
10:29:58.154224 Enricher reading from file /var/log/audit/audit.log
|
||||
10:29:58.155356 Running command with PID: 437880
|
||||
>
|
||||
```
|
||||
|
||||
It looks like that the application exited successfully, which is anticipated
|
||||
because I did not modify the previously recorded profile yet. I can also
|
||||
specify a custom location for the profile by using the `--profile`/`-p` flag,
|
||||
but this was not necessary because I did not modify the default output location
|
||||
from the record. `spoc` will automatically determine if it's a raw (JSON) or CRD
|
||||
(YAML) based seccomp profile and then apply it to the process.
|
||||
|
||||
The Security Profiles Operator supports a [log enricher feature][enricher],
|
||||
which provides additional seccomp related information by parsing the audit logs.
|
||||
`spoc run` uses the enricher in the same way to provide more data to the end
|
||||
users when it comes to debugging seccomp profiles.
|
||||
|
||||
[enricher]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#using-the-log-enricher
|
||||
|
||||
Now I have to modify the profile to see anything valuable in the output. For
|
||||
example, I could remove the allowed `uname` syscall:
|
||||
|
||||
```console
|
||||
> jq 'del(.syscalls[0].names[] | select(. == "uname"))' /tmp/profile.json > /tmp/no-uname-profile.json
|
||||
```
|
||||
|
||||
And then try to run it again with the new profile `/tmp/no-uname-profile.json`:
|
||||
|
||||
```
|
||||
> sudo ./spoc run -p /tmp/no-uname-profile.json ./main
|
||||
10:39:12.707798 Reading file /tmp/no-uname-profile.json
|
||||
10:39:12.707892 Setting up seccomp
|
||||
10:39:12.707920 Load seccomp profile
|
||||
10:39:12.707982 Starting audit log enricher
|
||||
10:39:12.707998 Enricher reading from file /var/log/audit/audit.log
|
||||
10:39:12.709164 Running command with PID: 480512
|
||||
panic: operation not permitted
|
||||
|
||||
goroutine 1 [running]:
|
||||
main.main()
|
||||
/path/to/main.go:10 +0x85
|
||||
10:39:12.713035 Unable to run: launch runner: wait for command: exit status 2
|
||||
```
|
||||
|
||||
Alright, that was expected! The applied seccomp profile blocks the `uname`
|
||||
syscall, which results in an "operation not permitted" error. This error is
|
||||
pretty generic and does not provide any hint on what got blocked by seccomp.
|
||||
It is generally extremely difficult to predict how applications behave if single
|
||||
syscalls are forbidden by seccomp. It could be possible that the application
|
||||
terminates like in our simple demo, but it could also lead to a strange
|
||||
misbehavior and the application does not stop at all.
|
||||
|
||||
If I now change the default seccomp action of the profile from `SCMP_ACT_ERRNO`
|
||||
to `SCMP_ACT_LOG` like this:
|
||||
|
||||
```console
|
||||
> jq '.defaultAction = "SCMP_ACT_LOG"' /tmp/no-uname-profile.json > /tmp/no-uname-profile-log.json
|
||||
```
|
||||
|
||||
Then the log enricher will give us a hint that the `uname` syscall got blocked
|
||||
when using `spoc run`:
|
||||
|
||||
```
|
||||
> sudo ./spoc run -p /tmp/no-uname-profile-log.json ./main
|
||||
10:48:07.470126 Reading file /tmp/no-uname-profile-log.json
|
||||
10:48:07.470234 Setting up seccomp
|
||||
10:48:07.470245 Load seccomp profile
|
||||
10:48:07.470302 Starting audit log enricher
|
||||
10:48:07.470339 Enricher reading from file /var/log/audit/audit.log
|
||||
10:48:07.470889 Running command with PID: 522268
|
||||
10:48:07.472007 Seccomp: uname (63)
|
||||
```
|
||||
|
||||
The application will not terminate any more, but seccomp will log the behavior
|
||||
to `/var/log/audit/audit.log` and `spoc` will parse the data to correlate it
|
||||
directly to our program. Generating the log messages to the audit subsystem
|
||||
comes with a large performance overhead and should be handled with care in
|
||||
production systems. It also comes with a security risk when running untrusted
|
||||
apps in audit mode in production environments.
|
||||
|
||||
This demo should give you an impression how to debug seccomp profile issues with
|
||||
applications, probably by using our shiny new helper tool powered by the
|
||||
features of the Security Profiles Operator. `spoc` is a flexible and portable
|
||||
binary suitable for edge cases where resources are limited and even Kubernetes
|
||||
itself may not be available with its full capabilities.
|
||||
|
||||
Thank you for reading this blog post! If you're interested in more, providing
|
||||
feedback or asking for help, then feel free to get in touch with us directly via
|
||||
[Slack (#security-profiles-operator)][slack] or the [mailing list][mail].
|
||||
|
||||
[slack]: https://kubernetes.slack.com/messages/security-profiles-operator
|
||||
[mail]: https://groups.google.com/forum/#!forum/kubernetes-dev
|
||||
Loading…
Reference in New Issue