Commit Graph

2970 Commits

Author SHA1 Message Date
Paul Holzinger 3af19917a1
Add failing run test for netavark
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 17:50:10 +01:00
Paul Holzinger fe90a45e0d
Add flag to overwrite network backend from config
To make testing easier we can overwrite the network backend with the
global `--network-backend` option.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 17:30:27 +01:00
Paul Holzinger 8041d44c93
Add network backend to podman info
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 16:49:46 +01:00
Paul Holzinger b2f7430b67
Add more netavark tests
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 16:49:28 +01:00
Paul Holzinger 1c88f741a7
select network backend based on config
You can change the network backendend in containers.conf supported
values are "cni" and "netavark".

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 16:26:47 +01:00
Paul Holzinger 3fe0c49174
Fix RUST_LOG envar for netavark
THe rust netlink library is very verbose. It contains way to much debug
and trave logs. We can set `RUST_LOG=netavark=<level>` to make sure this
log level only applies to netavark and not the libraries.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 16:25:52 +01:00
Paul Holzinger 4febe55769
netavark IPAM assignment
Add a new boltdb to handle IPAM assignment.

The db structure is the following:
Each network has their own bucket with the network name as bucket key.
Inside the network bucket there is an ID bucket which maps the container ID (key)
to a json array of ip addresses (value).
The network bucket also has a bucket for each subnet, the subnet is used as key.
Inside the subnet bucket an ip is used as key and the container ID as value.

The db should be stored on a tmpfs to ensure we always have a clean
state after a reboot.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 16:25:19 +01:00
Paul Holzinger eaae294628
netavark network interface
Implement a new network interface for netavark.
For now only bridge networking is supported.
The interface can create/list/inspect/remove networks. For setup and
teardown netavark will be invoked.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 15:54:02 +01:00
Paul Holzinger 12c62b92ff
Make networking code reusable
To prevent code duplication when creating new network backends move
reusable code into a separate internal package.

This allows all network backends to use the same code as long as they
implement the new NetUtil interface.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-11 15:54:02 +01:00
Paul Holzinger 3690532b3b
network reload return error if we cannot reload ports
As rootless we have to reload the port mappings. If it fails we should
return an error instead of the warning.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-10 21:16:30 +01:00
Paul Holzinger 27de152b5a
network reload without ports should not reload ports
When run as rootless the podman network reload command tries to reload
the rootlessport ports because the childIP could have changed.
However if the containers has no ports we should skip this instead of
printing a warning.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-10 21:16:08 +01:00
OpenShift Merge Robot 5437568fcd
Merge pull request #12227 from Luap99/net-setup
Fix rootless networking with userns and ports
2021-11-09 21:11:30 +01:00
OpenShift Merge Robot 43bd57c7fb
Merge pull request #12195 from boaz0/closes_11998
podman-generate-kube - remove empty structs from YAML
2021-11-09 19:46:31 +01:00
Paul Holzinger 216e2cb366
Fix rootless networking with userns and ports
A rootless container created with a custom userns and forwarded ports
did not work. I refactored the network setup to make the setup logic
more clear.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-09 15:58:57 +01:00
Ian Wienand 72cf389685 shm_lock: Handle ENOSPC better in AllocateSemaphore
When starting a container libpod/runtime_pod_linux.go:NewPod calls
libpod/lock/lock.go:AllocateLock ends up in here.  If you exceed
num_locks, in response to a "podman run ..." you will see:

 Error: error allocating lock for new container: no space left on device

As noted inline, this error is technically true as it is talking about
the SHM area, but for anyone who has not dug into the source (i.e. me,
before a few hours ago :) your initial thought is going to be that
your disk is full.  I spent quite a bit of time trying to diagnose
what disk, partition, overlay, etc. was filling up before I realised
this was actually due to leaking from failing containers.

This overrides this case to give a more explicit message that
hopefully puts people on the right track to fixing this faster.  You
will now see:

 $ ./bin/podman run --rm -it fedora bash
 Error: error allocating lock for new container: allocation failed; exceeded num_locks (20)

[NO NEW TESTS NEEDED] (just changes an existing error message)

Signed-off-by: Ian Wienand <iwienand@redhat.com>
2021-11-09 18:34:21 +11:00
Valentin Rothberg 6444f24028 pod/container create: resolve conflicts of generated names
Address the TOCTOU when generating random names by having at most 10
attempts to assign a random name when creating a pod or container.

[NO TESTS NEEDED] since I do not know a way to force a conflict with
randomly generated names in a reasonable time frame.

Fixes: #11735
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-11-08 13:33:30 +01:00
OpenShift Merge Robot 865653b661
Merge pull request #12184 from adrianreber/2021-11-05-stats-dump
Add 'stats-dump' file to exported checkpoint
2021-11-08 09:29:56 +01:00
Boaz Shuster f3fab1e17c podman-generate-kube - remove empty structs from YAML
[NO NEW TESTS NEEDED]

Signed-off-by: Boaz Shuster <boaz.shuster.github@gmail.com>
2021-11-07 16:33:38 +02:00
OpenShift Merge Robot abbd6c167e
Merge pull request #11890 from Luap99/ports
libpod: deduplicate ports in db
2021-11-06 10:39:16 +01:00
Paul Holzinger 02f67181a2
Fix swagger definition for the new mac address type
The new mac address type broke the api docs. While we could
successfully generate the swagger file it could not be viewed in a
browser.

The problem is that the swagger generation create two type definitions
with the name `HardwareAddr` and this pointed back to itself. Thus the
render process was stucked in an endless loop. To fix this manually
rename the new type to MacAddress and overwrite the types to string
because the json unmarshaller accepts the mac as string.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-05 19:25:40 +01:00
Adrian Reber 6b8fc3bd1d
Add 'stats-dump' file to exported checkpoint
There was the question about how long it takes to create a checkpoint.
CRIU already provides some statistics about how long it takes to create
a checkpoint and similar.

With this change the file 'stats-dump' is included in the checkpoint
archive and the tool checkpointctl can be used to display these
statistics:

./checkpointctl show -t /tmp/cp.tar --print-stats

Displaying container checkpoint data from /tmp/dump.tar

[...]
CRIU dump statistics
+---------------+-------------+--------------+---------------+---------------+---------------+
| FREEZING TIME | FROZEN TIME | MEMDUMP TIME | MEMWRITE TIME | PAGES SCANNED | PAGES WRITTEN |
+---------------+-------------+--------------+---------------+---------------+---------------+
| 105405 us     | 1376964 us  | 504399 us    | 446571 us     |        492153 |         88689 |
+---------------+-------------+--------------+---------------+---------------+---------------+

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-11-05 16:15:00 +00:00
Paul Holzinger 7f433df7e7
rename rootless cni ns to rootless netns
Since we want to use the rootless cni ns also for netavark we should
pick a more generic name. The name is now "rootless network namespace"
or short "rootless netns".

The rename might cause some issues after the update but when the
all containers are restarted or the host is rebooted it should work
correctly.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-05 15:44:37 +01:00
Paul Holzinger 58f8c3d743
mount full XDG_RUNTIME_DIR in rootless cni ns
We should mount the full runtime directory into the namespace instead of
just the netns dir. This allows more use cases.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-05 15:41:04 +01:00
Paul Holzinger 614c6f5970
Fix rootless cni netns cleanup logic
The check if cleanup is needed reads all container and checks if there
are running containers with bridge networking. If we do not find any we
have to cleanup the ns. However there was a problem with this because
the state is empty by default so the running check never worked.
Fortunately the was a second check which relies on the CNI files so we
still did cleanup anyway.

With netavark I noticed that this check is broken because the CNI files
were not present.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-05 00:20:10 +01:00
Paul Holzinger 001d48929d
MAC address json unmarshal should allow strings
Create a new mac address type which supports json marshal/unmarshal from
and to string. This change is backwards compatible with the previous
versions as the unmarshal method still accepts the old byte array or
base64 encoded string.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-11-03 15:30:16 +01:00
OpenShift Merge Robot 1305902ff4
Merge pull request #12127 from vrothberg/bz-2014149
volumes: be more tolerant and fix infinite loop
2021-10-29 13:30:29 +00:00
OpenShift Merge Robot d2147bada6
Merge pull request #12117 from adrianreber/2021-10-27-set-checkpointed-false-after-restore
Set Checkpointed state to false after restore
2021-10-28 18:06:25 +00:00
Valentin Rothberg c5f0a5d788 volumes: be more tolerant and fix infinite loop
Make Podman more tolerant when parsing image volumes during container
creation and further fix an infinite loop when checking them.

Consider `VOLUME ['/etc/foo', '/etc/bar']` in a Containerfile.  While
it looks correct to the human eye, the single quotes are wrong and yield
the two volumes to be `[/etc/foo,` and `/etc/bar]` in Podman and Docker.

When running the container, it'll create a directory `bar]` in `/etc`
and a directory `[` in `/` with two subdirectories `etc/foo,`.  This
behavior is surprising to me but how Docker behaves.  We may improve on
that in the future.  Note that the correct way to syntax for volumes in
a Containerfile is `VOLUME /A /B /C` or `VOLUME ["/A", "/B", "/C"]`;
single quotes are not supported.

This change restores this behavior without breaking container creation
or ending up in an infinite loop.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=2014149
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-10-28 16:37:33 +02:00
OpenShift Merge Robot 3bc449371c
Merge pull request #12126 from giuseppe/fix-race-warning-message
runtime: change PID existence check
2021-10-28 12:16:24 +00:00
Giuseppe Scrivano 960831f9c8
runtime: change PID existence check
commit 6b3b0a17c6 introduced a check for
the PID file before attempting to move the PID to a new scope.

This is still vulnerable to TOCTOU race condition though, since the
PID file or the PID can be removed/killed after the check was
successful but before it was used.

Closes: https://github.com/containers/podman/issues/12065

[NO NEW TESTS NEEDED] it fixes a CI flake

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-10-28 11:18:48 +02:00
Giuseppe Scrivano 9e5cd32056
oci: rename sub-cgroup to runtime instead of supervisor
we are having a hard time figuring out a failure in the CI:

https://github.com/containers/podman/issues/11191

Rename the sub-cgroup created here, so we can be certain the error is
caused by this part.

[NO NEW TESTS NEEDED] we need this for the CI.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-10-28 09:18:08 +02:00
Paul Holzinger 0136a66a83
libpod: deduplicate ports in db
The OCICNI port format has one big problem: It does not support ranges.
So if a users forwards a range of 1k ports with podman run -p 1001-2000
we have to store each of the thousand ports individually as array element.
This bloats the db and makes the JSON encoding and decoding much slower.
In many places we already use a better port struct type which supports
ranges, e.g. `pkg/specgen` or the new network interface.

Because of this we have to do many runtime conversions between the two
port formats. If everything uses the new format we can skip the runtime
conversions.

This commit adds logic to replace all occurrences of the old format
with the new one. The database will automatically migrate the ports
to new format when the container config is read for the first time
after the update.

The `ParsePortMapping` function is `pkg/specgen/generate` has been
reworked to better work with the new format. The new logic is able
to deduplicate the given ports. This is necessary the ensure we
store them efficiently in the DB. The new code should also be more
performant than the old one.

To prove that the code is fast enough I added go benchmarks. Parsing
1 million ports took less than 0.5 seconds on my laptop.

Benchmark normalize PortMappings in specgen:
Please note that the 1 million ports are actually 20x 50k ranges
because we cannot have bigger ranges than 65535 ports.
```
$ go test -bench=. -benchmem  ./pkg/specgen/generate/
goos: linux
goarch: amd64
pkg: github.com/containers/podman/v3/pkg/specgen/generate
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkParsePortMappingNoPorts-12             480821532                2.230 ns/op           0 B/op          0 allocs/op
BenchmarkParsePortMapping1-12                      38972             30183 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMapping100-12                    18752             60688 ns/op          141088 B/op        315 allocs/op
BenchmarkParsePortMapping1k-12                      3104            331719 ns/op          223840 B/op       3018 allocs/op
BenchmarkParsePortMapping10k-12                      376           3122930 ns/op         1223650 B/op      30027 allocs/op
BenchmarkParsePortMapping1m-12                         3         390869926 ns/op        124593840 B/op   4000624 allocs/op
BenchmarkParsePortMappingReverse100-12             18940             63414 ns/op          141088 B/op        315 allocs/op
BenchmarkParsePortMappingReverse1k-12               3015            362500 ns/op          223841 B/op       3018 allocs/op
BenchmarkParsePortMappingReverse10k-12               343           3318135 ns/op         1223650 B/op      30027 allocs/op
BenchmarkParsePortMappingReverse1m-12                  3         403392469 ns/op        124593840 B/op   4000624 allocs/op
BenchmarkParsePortMappingRange1-12                 37635             28756 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange100-12               39604             28935 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange1k-12                38384             29921 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange10k-12               29479             40381 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange1m-12                  927           1279369 ns/op          143022 B/op        164 allocs/op
PASS
ok      github.com/containers/podman/v3/pkg/specgen/generate    25.492s
```

Benchmark convert old port format to new one:
```
go test -bench=. -benchmem  ./libpod/
goos: linux
goarch: amd64
pkg: github.com/containers/podman/v3/libpod
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
Benchmark_ocicniPortsToNetTypesPortsNoPorts-12          663526126                1.663 ns/op           0 B/op          0 allocs/op
Benchmark_ocicniPortsToNetTypesPorts1-12                 7858082               141.9 ns/op            72 B/op          2 allocs/op
Benchmark_ocicniPortsToNetTypesPorts10-12                2065347               571.0 ns/op           536 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts100-12                138478              8641 ns/op            4216 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts1k-12                   9414            120964 ns/op           41080 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts10k-12                   781           1490526 ns/op          401528 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts1m-12                      4         250579010 ns/op        40001656 B/op          4 allocs/op
PASS
ok      github.com/containers/podman/v3/libpod  11.727s
```

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-10-27 18:59:56 +02:00
OpenShift Merge Robot 6caf5e3b7c
Merge pull request #12111 from giuseppe/fix-warning-move-pause-process
runtime: check for pause pid existence
2021-10-27 15:07:59 +00:00
Adrian Reber dcbf5cae12
Set Checkpointed state to false after restore
A restored container still had the state set to 'Checkpointed: true'
which seems wrong if it running again.

[NO NEW TESTS NEEDED]

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-10-27 13:40:54 +00:00
OpenShift Merge Robot 979b631228
Merge pull request #11956 from vrothberg/pause
remove need to download pause image
2021-10-27 10:22:56 +00:00
Giuseppe Scrivano 6b3b0a17c6
runtime: check for pause pid existence
check that the pause pid exists before trying to move it to a separate
scope.

Closes: https://github.com/containers/podman/issues/12065

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-10-27 11:24:50 +02:00
OpenShift Merge Robot ed3aa2acaf
Merge pull request #12098 from Luap99/slirp-dad
Slirp4netns with ipv6 set net.ipv6.conf.default.accept_dad=0
2021-10-26 20:54:27 +00:00
OpenShift Merge Robot 1243954372
Merge pull request #12067 from hshiina/logs-journal-tail
Fix a few problems in 'podman logs --tail' with journald driver
2021-10-26 20:33:26 +00:00
OpenShift Merge Robot 420ac5d13d
Merge pull request #12088 from adrianreber/2021-10-25-fix-label-ipc-host
Allow 'container restore' with '--ipc host'
2021-10-26 16:38:54 +00:00
Paul Holzinger 008075ce54
Slirp4netns with ipv6 set net.ipv6.conf.default.accept_dad=0
Duplicate Address Detection slows the ipv6 setup down for 1-2 seconds.
Since slirp4netns is run it is own namespace and not directly routed
we can skip this to make the ipv6 address immediately available.
We change the default to make sure the slirp tap interface gets the
correct value assigned so DAD is disabled for it.
Also make sure to change this value back to the original after slirp4netns
is ready in case users rely on this sysctl.

Fixes #11062

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-10-26 18:27:30 +02:00
Hironori Shiina c723e6b978 Fix a few problems in 'podman logs --tail' with journald driver
The following problems regarding `logs --tail` with the journald log
driver are fixed:
- One more line than a specified value is displayed.
- '--tail 0' displays all lines while the other log drivers displays
  nothing.
- Partial lines are not considered.
- If the journald events backend is used and a container has exited,
  nothing is displayed.

Integration tests that should have detected the bugs are also fixed. The
tests are executed with json-file log driver three times without this
fix.

Signed-off-by: Hironori Shiina <shiina.hironori@jp.fujitsu.com>
2021-10-26 12:18:57 -04:00
Adrian Reber bf8fd943ef
Allow 'container restore' with '--ipc host'
Trying to restore a container that was started with '--ipc host' fails
with:

Error: error creating container storage: ProcessLabel and Mountlabel must either not be specified or both specified

We already fixed this exact same error message for containers started
with '--privileged'. The previous fix was to check if the to be restored
container is a privileged container (c.config.Privileged). Unfortunately
this does not work for containers started with '--ipc host'.

This commit changes the check for a privileged container to check if
both the ProcessLabel and the MountLabel is actually set and only then
re-uses those labels.

Signed-off-by: Adrian Reber <areber@redhat.com>
2021-10-26 14:42:32 +00:00
Paul Holzinger efd1c080bf
Document to not set K8S envars for CNI
Setting these environment variables can cause issues with custom CNI
plugins, see #12083.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-10-26 16:11:46 +02:00
Valentin Rothberg 75f478c08b pod create: remove need for pause image
So far, the infra containers of pods required pulling down an image
rendering pods not usable in disconnected environments.  Instead, build
an image locally which uses local pause binary.

Fixes: #10354
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-10-26 13:51:45 +02:00
Valentin Rothberg 2e3611d61f overlay root fs: create mount on runtime dir
Make sure to create the mounts for containers with an overlay root FS in
the runtime dir (e.g., /run/user/1000/...) to guarantee that we can
actually overlay mount on the specific path which is not the case for
the graph root.

[NO NEW TESTS NEEDED] since it is not a user-facing change.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-10-26 13:51:45 +02:00
Daniel J Walsh a42c131c80
Update vendor github.com/opencontainers/runtime-tools
This will change mount of /dev within container to noexec, making
containers slightly more secure.

[NO NEW TESTS NEEDED]

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2021-10-25 16:50:45 -04:00
Stefan Weil d7662edf66 [NO NEW TESTS NEEDED] Fix off-by-one index comparision (reported by LGTM)
LGTM alert:

    Off-by-one index comparison against length may lead to out-of-bounds read.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-25 10:52:01 +02:00
Stefan Weil 22270fb845 Replace 'an user' => 'a user'
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2021-10-24 22:27:39 +02:00
OpenShift Merge Robot 5dd211f91b
Merge pull request #11991 from rhatdan/size
Allow API to specify size and inode quota
2021-10-22 14:18:45 +00:00
OpenShift Merge Robot 833d92d709
Merge pull request #12021 from rhatdan/kube
Generate Kube should not print default structs
2021-10-22 14:12:44 +00:00