317 Commits

Author SHA1 Message Date
Paul Holzinger 5490be67b3
network db rewrite: migrate existing settings
The new network db structure stores everything in the networks bucket.
Previously some network settings were not written to the networks bucket
and only stored in the container config.
Instead of the old format, which used the container ID as the value in the
networks bucket, we now use the PerNetworkOptions struct there.

To migrate existing users we use the state.GetNetworks() function. If it
fails to read the new format it will automatically migrate the old
config format to the new one. This allows a flawless migration path.
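
The read-then-migrate idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `perNetworkOptions` is a trimmed-down, hypothetical stand-in for the real struct, and `readNetworkEntry` is an invented helper, not the actual `state.GetNetworks()` code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// perNetworkOptions is a hypothetical, reduced stand-in for the per-network
// settings struct mentioned in the commit message.
type perNetworkOptions struct {
	InterfaceName string   `json:"interface_name"`
	Aliases       []string `json:"aliases,omitempty"`
}

// readNetworkEntry decodes one value from the networks bucket. If the value
// does not decode as the new struct, it is assumed to be in the old format
// (a bare container ID) and the options taken from the container config are
// returned instead. The bool reports whether a migration happened, so the
// caller knows it has to write the new value back.
func readNetworkEntry(raw []byte, fromConfig perNetworkOptions) (perNetworkOptions, bool) {
	var opts perNetworkOptions
	if err := json.Unmarshal(raw, &opts); err == nil {
		return opts, false
	}
	return fromConfig, true
}

func main() {
	fallback := perNetworkOptions{InterfaceName: "eth0"}
	for _, raw := range []string{`{"interface_name":"eth1"}`, "0123abcd"} {
		opts, migrated := readNetworkEntry([]byte(raw), fallback)
		fmt.Printf("%+v migrated=%v\n", opts, migrated)
	}
}
```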

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-12-14 15:23:20 +01:00
Giuseppe Scrivano 0afaf78378
container, cgroup: detect pid termination
If the /proc/$PID/cgroup file doesn't exist, then it is likely the
container was terminated in the meantime, so report ErrCtrStopped, which
is already handled, instead of ENOENT.

commit a66f40b4df introduced the regression.

Closes: https://github.com/containers/podman/issues/12457

[NO NEW TESTS NEEDED] it solves a race in the CI that is difficult to reproduce.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-12-01 13:42:59 +01:00
Giuseppe Scrivano e648122b29
libpod: improve heuristic to detect cgroup
improve the heuristic to detect the scope that was created for the container.
This is necessary with systemd running as PID 1, since it moves itself
to a different sub-cgroup, thus stats would not account for other
processes in the same container.

Closes: https://github.com/containers/podman/issues/12400

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-11-24 14:50:12 +01:00
Aditya Rajan 014cc4b9d9
secret: honor custom target for secrets with run
Honor custom `target` if specified while running or creating containers
with secret `type=mount`.

Example:
`podman run -it --secret token,type=mount,target=TOKEN ubi8/ubi:latest
bash`
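
How such an option string could be split up is sketched below. The parser is hypothetical and is not podman's own code; the defaults it applies (`type=mount`, target falling back to the secret name) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// secretOpts holds the parsed pieces of a --secret value (illustrative only).
type secretOpts struct {
	Name   string
	Type   string
	Target string
}

// parseSecret splits "name,key=value,..." into its parts. Unknown keys and
// malformed pairs are rejected.
func parseSecret(val string) (secretOpts, error) {
	parts := strings.Split(val, ",")
	opts := secretOpts{Name: parts[0], Type: "mount"} // assumed default type
	if opts.Name == "" || strings.Contains(opts.Name, "=") {
		return secretOpts{}, fmt.Errorf("secret name must come first in %q", val)
	}
	for _, kv := range parts[1:] {
		pair := strings.SplitN(kv, "=", 2)
		if len(pair) != 2 {
			return secretOpts{}, fmt.Errorf("option %q is not key=value", kv)
		}
		switch pair[0] {
		case "type":
			opts.Type = pair[1]
		case "target":
			opts.Target = pair[1]
		default:
			return secretOpts{}, fmt.Errorf("unknown secret option %q", pair[0])
		}
	}
	if opts.Target == "" {
		opts.Target = opts.Name // assumed fallback: target equals the name
	}
	return opts, nil
}

func main() {
	opts, err := parseSecret("token,type=mount,target=TOKEN")
	fmt.Printf("%+v %v\n", opts, err)
}
```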

Signed-off-by: Aditya Rajan <arajan@redhat.com>
2021-11-15 23:19:27 +05:30
Paul Holzinger 0136a66a83
libpod: deduplicate ports in db
The OCICNI port format has one big problem: It does not support ranges.
So if a user forwards a range of 1k ports with podman run -p 1001-2000
we have to store each of the thousand ports individually as an array element.
This bloats the db and makes the JSON encoding and decoding much slower.
In many places we already use a better port struct type which supports
ranges, e.g. `pkg/specgen` or the new network interface.

Because of this we have to do many runtime conversions between the two
port formats. If everything uses the new format we can skip the runtime
conversions.

This commit adds logic to replace all occurrences of the old format
with the new one. The database will automatically migrate the ports
to new format when the container config is read for the first time
after the update.
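
To illustrate the kind of conversion involved, here is a rough sketch that collapses per-port entries into ranges and drops duplicates. The `oldPort` and `portRange` types are simplified, hypothetical stand-ins (host IPs are ignored), and the algorithm is a plain sort-and-merge, not necessarily what the commit implements.

```go
package main

import (
	"fmt"
	"sort"
)

// oldPort mimics a one-port-per-entry mapping (hypothetical, simplified).
type oldPort struct {
	HostPort      int
	ContainerPort int
	Protocol      string
}

// portRange mimics a mapping that can cover Range consecutive ports.
type portRange struct {
	HostPort      int
	ContainerPort int
	Range         int
	Protocol      string
}

// toRanges sorts the single-port entries and merges neighbours whose host and
// container ports both advance by one. Exact duplicates are dropped.
func toRanges(ports []oldPort) []portRange {
	sorted := append([]oldPort(nil), ports...)
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.Protocol != b.Protocol {
			return a.Protocol < b.Protocol
		}
		if a.HostPort != b.HostPort {
			return a.HostPort < b.HostPort
		}
		return a.ContainerPort < b.ContainerPort
	})

	var out []portRange
	for _, p := range sorted {
		if n := len(out); n > 0 {
			last := &out[n-1]
			offset := p.HostPort - last.HostPort
			aligned := p.Protocol == last.Protocol &&
				p.ContainerPort-last.ContainerPort == offset
			if aligned && offset >= 0 && offset < last.Range {
				continue // already covered by the current range
			}
			if aligned && offset == last.Range {
				last.Range++ // extends the current range by one port
				continue
			}
		}
		out = append(out, portRange{
			HostPort: p.HostPort, ContainerPort: p.ContainerPort,
			Range: 1, Protocol: p.Protocol,
		})
	}
	return out
}

func main() {
	var ports []oldPort
	for p := 1001; p <= 2000; p++ {
		ports = append(ports, oldPort{HostPort: p, ContainerPort: p, Protocol: "tcp"})
	}
	fmt.Printf("%d entries -> %+v\n", len(ports), toRanges(ports))
}
```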

The `ParsePortMapping` function in `pkg/specgen/generate` has been
reworked to better work with the new format. The new logic is able
to deduplicate the given ports. This is necessary to ensure we
store them efficiently in the DB. The new code should also be more
performant than the old one.

To prove that the code is fast enough I added go benchmarks. Parsing
1 million ports took less than 0.5 seconds on my laptop.

Benchmark normalize PortMappings in specgen:
Please note that the 1 million ports are actually 20x 50k ranges
because we cannot have bigger ranges than 65535 ports.
```
$ go test -bench=. -benchmem  ./pkg/specgen/generate/
goos: linux
goarch: amd64
pkg: github.com/containers/podman/v3/pkg/specgen/generate
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
BenchmarkParsePortMappingNoPorts-12             480821532                2.230 ns/op           0 B/op          0 allocs/op
BenchmarkParsePortMapping1-12                      38972             30183 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMapping100-12                    18752             60688 ns/op          141088 B/op        315 allocs/op
BenchmarkParsePortMapping1k-12                      3104            331719 ns/op          223840 B/op       3018 allocs/op
BenchmarkParsePortMapping10k-12                      376           3122930 ns/op         1223650 B/op      30027 allocs/op
BenchmarkParsePortMapping1m-12                         3         390869926 ns/op        124593840 B/op   4000624 allocs/op
BenchmarkParsePortMappingReverse100-12             18940             63414 ns/op          141088 B/op        315 allocs/op
BenchmarkParsePortMappingReverse1k-12               3015            362500 ns/op          223841 B/op       3018 allocs/op
BenchmarkParsePortMappingReverse10k-12               343           3318135 ns/op         1223650 B/op      30027 allocs/op
BenchmarkParsePortMappingReverse1m-12                  3         403392469 ns/op        124593840 B/op   4000624 allocs/op
BenchmarkParsePortMappingRange1-12                 37635             28756 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange100-12               39604             28935 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange1k-12                38384             29921 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange10k-12               29479             40381 ns/op          131584 B/op          9 allocs/op
BenchmarkParsePortMappingRange1m-12                  927           1279369 ns/op          143022 B/op        164 allocs/op
PASS
ok      github.com/containers/podman/v3/pkg/specgen/generate    25.492s
```

Benchmark convert old port format to new one:
```
go test -bench=. -benchmem  ./libpod/
goos: linux
goarch: amd64
pkg: github.com/containers/podman/v3/libpod
cpu: Intel(R) Core(TM) i7-10850H CPU @ 2.70GHz
Benchmark_ocicniPortsToNetTypesPortsNoPorts-12          663526126                1.663 ns/op           0 B/op          0 allocs/op
Benchmark_ocicniPortsToNetTypesPorts1-12                 7858082               141.9 ns/op            72 B/op          2 allocs/op
Benchmark_ocicniPortsToNetTypesPorts10-12                2065347               571.0 ns/op           536 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts100-12                138478              8641 ns/op            4216 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts1k-12                   9414            120964 ns/op           41080 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts10k-12                   781           1490526 ns/op          401528 B/op          4 allocs/op
Benchmark_ocicniPortsToNetTypesPorts1m-12                      4         250579010 ns/op        40001656 B/op          4 allocs/op
PASS
ok      github.com/containers/podman/v3/libpod  11.727s
```

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-10-27 18:59:56 +02:00
Valentin Rothberg 30bf31010e libpod: add execSessionNoCopy
To avoid creating an expensive deep copy, create an internal function to
access the exec session.

[NO TESTS NEEDED]

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-09-29 13:44:55 +02:00
Valentin Rothberg 98176f0018 libpod: do not call (*container).Spec()
Access the container's spec field directly inside of libpod instead of
calling Spec() which in turn creates expensive JSON deep copies.

Accessing the field directly drops memory consumption of a simple
podman run --rm busybox true from ~700kB to ~600kB.
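
The difference between the two access paths can be sketched like this. The `spec` and `container` types are tiny hypothetical stand-ins, and the JSON round trip is an assumed way of producing the deep copy, chosen to match the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// spec and container are minimal stand-ins for the real types.
type spec struct {
	Args []string `json:"args"`
}

type container struct {
	spec *spec
}

// Spec returns a deep copy made through a JSON round trip, so callers outside
// the package cannot mutate the container's own spec. Every call allocates.
func (c *container) Spec() (*spec, error) {
	raw, err := json.Marshal(c.spec)
	if err != nil {
		return nil, err
	}
	out := new(spec)
	if err := json.Unmarshal(raw, out); err != nil {
		return nil, err
	}
	return out, nil
}

// firstArg is package-internal code that only reads the spec, so it can use
// the field directly and skip the copy.
func (c *container) firstArg() string {
	if len(c.spec.Args) == 0 {
		return ""
	}
	return c.spec.Args[0]
}

func main() {
	c := &container{spec: &spec{Args: []string{"true"}}}
	cp, err := c.Spec()
	if err != nil {
		panic(err)
	}
	cp.Args[0] = "false" // only changes the copy
	fmt.Println(c.firstArg(), cp.Args[0])
}
```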

[NO TESTS NEEDED]

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-09-29 13:44:39 +02:00
Paul Holzinger af49810a6e
Bump CNI to v1.0.1
Update CNI so we can match wrapped errors. This should silence ENOENT
warnings when trying to read the cni conflist files.

Fixes #10926

Because CNI v1.0.0 contains breaking changes we have to change some
import paths. Also we cannot update the CNI version used for the
conflist files created by `podman network create` because this would
require at least containernetworking-plugins v1.0.1 and an updated dnsname
plugin. Because this will take a while until it lands in most distros
we should not use this version. So keep using v0.4.0 for now.

The update from checkpoint-restore/checkpointctl is also required to
make sure it no longer uses CNI to read the network status.

[NO TESTS NEEDED]

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-09-22 11:51:40 +02:00
cdoern 8fac34b8ff Pod Device Support
added support for pod devices. The device gets added to the infra container and
recreated in all containers that join the pod.

This required a new container config item to keep track of the original device passed in by the user before
the path was parsed into the container device.

Signed-off-by: cdoern <cdoern@redhat.com>
2021-09-20 23:22:43 -04:00
Paul Holzinger b906b9d858
Drop OCICNI dependency
We do not use the ocicni code anymore so let's get rid of it. Only the
port struct is used but we can copy this into libpod network types so
we can debloat the binary.

The next step is to remove the OCICNI port mapping from the container
config and use the better PortMapping struct everywhere.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-09-15 20:00:28 +02:00
Paul Holzinger 85e8fbf7f3
Wire network interface into libpod
Make use of the new network interface in libpod.

This commit contains several breaking changes:
- podman network create only outputs the new network name and not file
  path.
- podman network ls shows the network driver instead of the cni version
  and plugins.
- podman network inspect outputs the new network struct and not the cni
  conflist.
- The bindings and libpod api endpoints have been changed to use the new
  network structure.

The container network status is stored in a new field in the state. The
status should be retrieved with the new `c.getNetworkStatus`. This will
migrate the old status to the new format. Therefore old containers should
continue to work correctly in all cases even when network connect/
disconnect is used.

New features:
- podman network reload keeps the ip and mac for more than one network.
- podman container restore keeps the ip and mac for more than one
  network.
- The network create compat endpoint can now use more than one ipam
  config.

The man pages and the swagger doc are updated to reflect the latest
changes.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2021-09-15 20:00:20 +02:00
Urvashi Mohnani f5e4ffb5e4 Add init containers to generate and play kube
Kubernetes has a concept of init containers that run and exit before
the regular containers in a pod are started. We added init containers
to podman pods as well. This patch adds support for generating init
containers in the kube yaml when a pod we are converting had init
containers. When playing a kube yaml, it detects an init container
and creates such a container in podman accordingly.
Note, only init containers created with the init type set to "always"
will be generated as the "once" option deletes the init container after
it has run and exited. Play kube will always create init containers
with the "always" init container type.

Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2021-09-10 09:37:46 -04:00
Matthew Heon bfcd83ecd6 Add Checkpointed bool to Inspect
When inspecting a container, we now report whether the container
was stopped by a `podman checkpoint` operation via a new bool in
the State portion of the inspect output, `Checkpointed`.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2021-09-07 14:16:01 -04:00
Daniel J Walsh c22f3e8b4e Implement SD-NOTIFY proxy in conmon
This leverages conmon's ability to proxy the SD-NOTIFY socket.
This prevents locking caused by OCI runtime blocking, waiting for
SD-NOTIFY messages, and instead passes the messages directly up
to the host.

NOTE: Also re-enable the auto-update tests which have been disabled due
to flakiness.  With this change, Podman properly integrates into
systemd.

Fixes: #7316
Signed-off-by: Joseph Gooch <mrwizard@dok.org>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-08-20 11:12:05 +02:00
Daniel J Walsh 404488a087
Run codespell to fix spelling
[NO TESTS NEEDED] Just fixing spelling.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2021-08-11 16:41:45 -04:00
Daniel J Walsh 221b1add74 Add support for pod inside of user namespace.
Add the --userns flag to podman pod create and keep
track of the userns setting that pod was created with
so that all containers created within the pod will inherit
that userns setting.

Specifically we need to be able to launch a pod with
--userns=keep-id

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
2021-08-09 15:17:22 -04:00
flouthoc 2a484e782a ps: support the container notation for ps --filter network=...
Signed-off-by: flouthoc <flouthoc.git@gmail.com>
2021-07-30 19:31:05 +05:30
Giuseppe Scrivano 3b6cb8fabb
container: ignore named hierarchies
when looking up the container cgroup, ignore named hierarchies since
containers running systemd as payload will create a sub-cgroup and
move themselves there.
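
A small sketch of such a filter, assuming the usual `hierarchy-ID:controller-list:path` line format of `/proc/<pid>/cgroup`, in which named hierarchies carry a `name=` prefix in the controller field. The helper names are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// isNamedHierarchy reports whether a /proc/<pid>/cgroup line belongs to a
// named hierarchy such as "name=systemd".
func isNamedHierarchy(line string) bool {
	fields := strings.SplitN(line, ":", 3)
	return len(fields) == 3 && strings.HasPrefix(fields[1], "name=")
}

// dropNamedHierarchies keeps only the lines that refer to real controllers.
func dropNamedHierarchies(lines []string) []string {
	var kept []string
	for _, l := range lines {
		if !isNamedHierarchy(l) {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	lines := []string{
		"2:cpu,cpuacct:/machine.slice/libpod-abc.scope",
		"1:name=systemd:/machine.slice/libpod-abc.scope/init.scope",
	}
	for _, l := range dropNamedHierarchies(lines) {
		fmt.Println(l)
	}
}
```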

Closes: https://github.com/containers/podman/issues/10602

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2021-06-10 14:49:58 +02:00
Matthew Heon 533d88b656 Add the option of Rootless CNI networking by default
When the containers.conf field "NetNS" is set to "Bridge" and the
"RootlessNetworking" field is set to "cni", Podman will now
handle rootless in the same way it does root - all containers
will be joined to a default CNI network, instead of exclusively
using slirp4netns.

If no CNI default network config is present for the user, one
will be auto-generated (this also works for root, but it won't be
nearly as common there since the package should already ship a
config).

I eventually hope to remove the "NetNS=Bridge" bit from
containers.conf, but let's get something in for Brent to work
with.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2021-05-26 15:03:30 -04:00
OpenShift Merge Robot 9a9118b831
Merge pull request #10366 from ashley-cui/secretoptions
Support uid,gid,mode options for secrets
2021-05-17 16:24:20 -04:00
Ashley Cui cf30f160ad Support uid,gid,mode options for secrets
Support UID, GID, Mode options for mount type secrets. Also, change
default secret permissions to 444 so all users can read secret.

Signed-off-by: Ashley Cui <acui@redhat.com>
2021-05-17 14:35:55 -04:00
Baron Lenardson c8dfcce6db Add host.containers.internal entry into container's etc/hosts
This change adds the entry `host.containers.internal` to the `/etc/hosts`
file within a new container's filesystem. The IP address is determined by
the container's networking configuration and points to the gateway address
for the container's network namespace.
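
As a rough illustration, an entry of this kind could be built from a gateway address as shown below. `hostEntry` is a hypothetical helper, and the tab-separated layout is an assumption about formatting, not taken from the commit.

```go
package main

import (
	"fmt"
	"net"
)

// hostEntry returns an /etc/hosts line that maps host.containers.internal to
// the given gateway address. The address is validated first.
func hostEntry(gateway string) (string, error) {
	ip := net.ParseIP(gateway)
	if ip == nil {
		return "", fmt.Errorf("invalid gateway address %q", gateway)
	}
	return fmt.Sprintf("%s\thost.containers.internal\n", ip), nil
}

func main() {
	line, err := hostEntry("10.88.0.1")
	if err != nil {
		panic(err)
	}
	fmt.Print(line)
}
```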

Closes #5651

Signed-off-by: Baron Lenardson <lenardson.baron@gmail.com>
2021-05-17 08:21:22 -05:00
Paul Holzinger 57e8c66322 Do not leak libpod package into the remote client
Some packages used by the remote client imported the libpod package.
This is not wanted because it adds unnecessary bloat to the client and
also causes problems with platform-specific code (Linux only), see #9710.

The solution is to move the used functions/variables into extra packages
which do not import libpod.

This change shrinks the remote client size more than 6MB compared to the
current master.

[NO TESTS NEEDED]
I have no idea how to test this properly but with #9710 the cross
compile should fail.

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
2021-03-15 14:02:04 +01:00
Valentin Rothberg a090301bbb podman cp: support copying on tmpfs mounts
Traditionally, the path resolution for containers has been resolved on
the *host*; relative to the container's mount point or relative to
specified bind mounts or volumes.

While this works nicely for non-running containers, it poses a problem
for running ones.  In that case, certain kinds of mounts (e.g., tmpfs)
will not resolve correctly.  A tmpfs is held in memory and hence cannot
be resolved relatively to the container's mount point.  A copy operation
will succeed but the data will not show up inside the container.

To support these kinds of mounts, we need to join the *running*
container's mount namespace (and PID namespace) when copying.

Note that this change implies moving the copy and stat logic into
`libpod` since we need to keep the container locked to avoid race
conditions.  The immediate benefit is that all logic is now inside
`libpod`; the code isn't scattered anymore.

Further note that Docker does not support copying to tmpfs mounts.

Tests have been extended to cover *both* path resolutions for running
and created containers.  New tests have been added to exercise the
tmpfs-mount case.

For the record: Some tests could be improved by using `start -a` instead
of a start-exec sequence.  Unfortunately, `start -a` is flaky in the CI
which forced me to use the more expensive start-exec option.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-03-04 15:43:12 +01:00
Eduardo Vega 874f2327e6 Add U volume flag to chown source volumes
Signed-off-by: Eduardo Vega <edvegavalerio@gmail.com>
2021-02-22 22:55:19 -06:00
Valentin Rothberg 5dded6fae7 bump go module to v3
We missed bumping the go module, so let's do it now :)

* Automated go code with github.com/sirkon/go-imports-rename
* Manually via `vgrep podman/v2` the rest

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2021-02-22 09:03:51 +01:00
Paul Holzinger 78c8a87362 Enable whitespace linter
Use the whitespace linter and fix the reported problems.

[NO TESTS NEEDED]

Signed-off-by: Paul Holzinger <paul.holzinger@web.de>
2021-02-11 23:01:56 +01:00
Ashley Cui 832a69b0be Implement Secrets
Implement podman secret create, inspect, ls, rm
Implement podman run/create --secret
Secrets are blobs of data that are sensitive.
Currently, the only secret driver supported is filedriver, which means creating a secret stores it in base64 unencrypted in a file.
After creating a secret, a user can use the --secret flag to expose the secret inside the container at /run/secrets/[secretname]
This secret will not be committed to an image on a podman commit.

Signed-off-by: Ashley Cui <acui@redhat.com>
2021-02-09 09:13:21 -05:00
Milivoje Legenovic cdbbc6120b podman generate kube ignores --network=host
Signed-off-by: Milivoje Legenovic <m.legenovic@gmail.com>
2021-01-30 09:08:36 +01:00
Giuseppe Scrivano 64571ea0a4
libpod: handle single user mapped as root
if a single user is mapped in the user namespace, handle it as root.

It is needed for running unprivileged containers with a single user
available without being forced to run with euid and egid set to 0.

Needs: https://github.com/containers/storage/pull/794

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-12-24 13:39:15 +01:00
Valentin Rothberg 055248ce98 container cgroup path
Before querying for a container's cgroup path, make sure that the
container is synced.  Also make sure to error out if the container
isn't running.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-12-07 15:16:20 +01:00
Valentin Rothberg ccbca0b4ab rewrite podman-cp
* Add a new `pkg/copy` to centralize all container-copy related code.

* The new code is based on Buildah's `copier` package.

* The compat `/archive` endpoints use the new `copy` package.

* Update docs and add several new tests.

* Includes many fixes, most notably, the look-up of volumes and mounts.

Breaking changes:

 * Podman is now expecting that container-destination paths exist.
   Before, Podman created the paths if needed.  Docker does not do
   that and I believe Podman should not either as it's a recipe for
   masking errors.  These errors may be user induced (e.g., a path
   typo), or internal typos (e.g., when the destination may be a
   mistakenly unmounted volume).  Let's keep the magic low for such
   a security sensitive feature.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-12-04 14:39:55 +01:00
Matthew Heon ce775248ad Make c.networks() list include the default network
This makes things a lot more clear - if we are actually joining a
CNI network, we are guaranteed to get a non-zero length list of
networks.

We do, however, need to know if the network we are joining is the
default network for inspecting containers as it determines how we
populate the response struct. To handle this, add a bool to
indicate that the network listed was the default network, and
only the default network.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2020-11-20 14:03:24 -05:00
Valentin Rothberg 1efb9b5e17 fix container cgroup lookup
When running on cgroups v1, `/proc/{PID}/cgroup` has multiple entries,
each pointing potentially to a different cgroup.  Some may be empty,
some may point to parents.

The one we really need is the libpod-specific one, which always is the
longest path.  So instead of looking at the first entry, look at all and
select the longest one.
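
The selection rule can be sketched as follows, assuming each line of the file has the form `hierarchy-ID:controller-list:path`. The function name and the sample paths are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// longestCgroupPath scans the contents of a /proc/<pid>/cgroup file and
// returns the longest path among all entries. Malformed lines are skipped.
func longestCgroupPath(content string) string {
	longest := ""
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.SplitN(scanner.Text(), ":", 3)
		if len(fields) != 3 {
			continue
		}
		if len(fields[2]) > len(longest) {
			longest = fields[2]
		}
	}
	return longest
}

func main() {
	content := "3:memory:/\n" +
		"2:cpu,cpuacct:/machine.slice\n" +
		"1:pids:/machine.slice/libpod-abc.scope\n"
	fmt.Println(longestCgroupPath(content))
}
```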

Fixes: #8397
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-11-20 11:31:12 +01:00
baude a3e0b7d117 add network connect|disconnect compat endpoints
this enables the ability to connect and disconnect a container from a
given network. it is only for the compatibility layer. some code had to
be refactored to avoid circular imports.

additionally, tests are being deferred temporarily due to some
incompatibility/bug in either docker-py or our stack.

Signed-off-by: baude <bbaude@redhat.com>
2020-11-19 08:16:19 -06:00
baude d3e794bda3 add network connect|disconnect compat endpoints
this enables the ability to connect and disconnect a container from a
given network. it is only for the compatibility layer. some code had to
be refactored to avoid circular imports.

additionally, tests are being deferred temporarily due to some
incompatibility/bug in either docker-py or our stack.

Signed-off-by: baude <bbaude@redhat.com>
2020-11-17 14:22:39 -06:00
Valentin Rothberg 39bf07694c use container cgroups path
When looking up a container's cgroup path, parse /proc/[PID]/cgroup.
This will work across all cgroup managers and configurations and is
supported on cgroups v1 and v2.

Fixes: #8265
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-11-17 12:29:50 +01:00
Matthew Heon 8d56eb5342 Add support for network connect / disconnect to DB
Convert the existing network aliases set/remove code to network
connect and disconnect. We can no longer modify aliases for an
existing network, but we can add and remove entire networks. As
part of this, we need to add a new function to retrieve current
aliases the container is connected to (we had a table for this
as of the first aliases PR, but it was not externally exposed).

At the same time, remove all deconflicting logic for aliases.
Docker does absolutely no checks of this nature, and allows two
containers to have the same aliases, aliases that conflict with
container names, etc - it's just left to DNS to return all the
IP addresses, and presumably we round-robin from there? Most
tests for the existing code had to be removed because of this.

Convert all uses of the old container config.Networks field,
which previously included all networks in the container, to use
the new DB table. This ensures we actually get an up-to-date list
of in-use networks. Also, add network aliases to the output of
`podman inspect`.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-11-11 16:37:54 -05:00
Valentin Rothberg 65a618886e new "image" mount type
Add a new "image" mount type to `--mount`.  The source of the mount is
the name or ID of an image.  The destination is the path inside the
container.  Image mounts further support an optional `rw,readwrite`
parameter which if set to "true" will yield the mount writable inside
the container.  Note that no changes are propagated to the image mount
on the host (which in any case is read only).

Mounts are overlay mounts.  To support read-only overlay mounts, vendor
a non-release version of Buildah.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-10-29 15:06:22 +01:00
Matthew Heon 4d800a5f45 Store cgroup manager on a per-container basis
When we create a container, we assign a cgroup parent based on
the current cgroup manager in use. This parent is only usable
with the cgroup manager the container is created with, so if the
default cgroup manager is later changed or overridden, the
container will not be able to start.

To solve this, store the cgroup manager that created the
container in container configuration, so we can guarantee a
container with a systemd cgroup parent will always be started
with systemd cgroups.

Unfortunately, this is very difficult to test in CI, due to the
fact that we hard-code cgroup manager on all invocations of
Podman in CI.

Fixes #7830

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-10-08 15:25:06 -04:00
Daniel J Walsh a5e37ad280
Switch all references to github.com/containers/libpod -> podman
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-07-28 08:23:45 -04:00
louis 10c4ab1149 Refactor container config
This commit handles the TODO task of breaking the Container
config into smaller sub-configs

Signed-off-by: ldelossa <ldelossa@redhat.com>
2020-07-23 10:18:14 -04:00
Ashley Cui d4d3fbc155 Add --umask flag for create, run
--umask sets the umask inside the container
Defaults to 0022
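
A umask value like this is an octal string. A minimal sketch of parsing and validating one, using a hypothetical `parseUmask` helper:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseUmask interprets s as an octal permission mask and rejects values
// outside the range 0000 to 0777.
func parseUmask(s string) (uint32, error) {
	v, err := strconv.ParseUint(s, 8, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid umask %q: %w", s, err)
	}
	if v > 0777 {
		return 0, fmt.Errorf("umask %q is out of range", s)
	}
	return uint32(v), nil
}

func main() {
	mask, err := parseUmask("0022")
	if err != nil {
		panic(err)
	}
	// A file requested with mode 0666 ends up with 0666 &^ mask.
	fmt.Printf("umask %04o, new files get %04o\n", mask, 0666&^mask)
}
```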

Co-authored-by: Daniel J Walsh <dwalsh@redhat.com>
Signed-off-by: Ashley Cui <acui@redhat.com>
2020-07-21 14:22:30 -04:00
Qi Wang 020d81f113 Add support for overlay volume mounts in podman.
Add -v support for overlay volume mounts in podman.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

Signed-off-by: Qi Wang <qiwan@redhat.com>
2020-07-20 09:48:55 -04:00
Giuseppe Scrivano 9be7029cdd
libpod: pass down network options
do not pass network specific options through the network namespace.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-16 22:37:27 +02:00
Daniel J Walsh 6c6670f12a
Add username to /etc/passwd inside of container if --userns keep-id
If I enter a container with --userns keep-id, my UID will be present
inside of the container, but most likely my user will not be defined.

This patch will take information about the user and stick it into the
container.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-07-07 08:34:31 -04:00
Joseph Gooch 0b1c1ef461 Implement --sdnotify cmdline option to control sd-notify behavior
--sdnotify container|conmon|ignore
With "conmon", we send the MAINPID, and clear the NOTIFY_SOCKET so the OCI
runtime doesn't pass it into the container. We also advertise "ready" when the
OCI runtime finishes to advertise the service as ready.

With "container", we send the MAINPID, and leave the NOTIFY_SOCKET so the OCI
runtime passes it into the container for initialization, and let the container advertise further metadata.
This is the default, which is closest to the behavior podman has done in the past.

The "ignore" option removes NOTIFY_SOCKET from the environment, so neither podman nor
any child processes will talk to systemd.

This removes the need for hardcoded CID and PID files in the command line, and
the PIDFile directive, as the pid is advertised directly through sd-notify.
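
The effect of the three modes on the environment handed to the OCI runtime can be sketched as below. `runtimeEnv` is a hypothetical helper that only models whether NOTIFY_SOCKET is kept or dropped; sending MAINPID and READY is left out.

```go
package main

import (
	"fmt"
)

const notifySocket = "NOTIFY_SOCKET"

// runtimeEnv returns a copy of env as it would be passed to the OCI runtime
// for the given --sdnotify mode. Only "container" keeps NOTIFY_SOCKET.
func runtimeEnv(mode string, env map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(env))
	for k, v := range env {
		out[k] = v
	}
	switch mode {
	case "container":
		// The container talks to systemd itself, so the socket stays.
	case "conmon", "ignore":
		delete(out, notifySocket)
	default:
		return nil, fmt.Errorf("invalid sdnotify mode %q", mode)
	}
	return out, nil
}

func main() {
	env := map[string]string{notifySocket: "/run/systemd/notify", "PATH": "/usr/bin"}
	for _, mode := range []string{"container", "conmon", "ignore"} {
		out, err := runtimeEnv(mode, env)
		if err != nil {
			panic(err)
		}
		_, kept := out[notifySocket]
		fmt.Printf("%-9s keeps %s: %v\n", mode, notifySocket, kept)
	}
}
```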

Signed-off-by: Joseph Gooch <mrwizard@dok.org>
2020-07-06 17:47:18 +00:00
OpenShift Merge Robot 9532509c50
Merge pull request #6836 from ashley-cui/tzlibpod
Add --tz flag to create, run
2020-07-06 13:28:20 -04:00
Valentin Rothberg 8489dc4345 move go module to v2
With the advent of Podman 2.0.0 we crossed the magical barrier of go
modules.  While we were able to continue importing all packages inside
of the project, the project could not be vendored anymore from the
outside.

Move the go module to new major version and change all imports to
`github.com/containers/libpod/v2`.  The renaming of the imports
was done via `gomove` [1].

[1] https://github.com/KSubedi/gomove

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-07-06 15:50:12 +02:00
Ashley Cui 9a1543caec Add --tz flag to create, run
--tz flag sets timezone inside container
Can be set to IANA timezone as well as `local` to match host machine

Signed-off-by: Ashley Cui <acui@redhat.com>
2020-07-02 13:30:59 -04:00
Giuseppe Scrivano 6ee5f740a4
podman: add new cgroup mode split
When running under systemd there is no need to create yet another
cgroup for the container.

With conmon-delegated the current cgroup will be split in two sub
cgroups:

- supervisor
- container

The supervisor cgroup will hold conmon and the podman process, while
the container cgroup is used by the OCI runtime (using the cgroupfs
backend).

Closes: https://github.com/containers/libpod/issues/6400

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-25 17:16:12 +02:00
Qi Wang f61a7f25a8 Add --preservefds to podman run
Add --preservefds to podman run. close https://github.com/containers/libpod/issues/6458

Signed-off-by: Qi Wang <qiwan@redhat.com>
2020-06-19 09:40:13 -04:00
Matthew Heon 6f1440a3ec Add support for the unless-stopped restart policy
We initially believed that implementing this required support for
restarting containers after reboot, but this is not the case.
The unless-stopped restart policy acts identically to the always
restart policy except in cases related to reboot (which we do not
support yet), but it does not require that support for us to
implement it.

Changes themselves are quite simple, we need a new restart policy
constant, we need to remove existing checks that block creation
of containers when unless-stopped was used, and we need to update
the manpages.

Fixes #6508

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-06-17 11:16:12 -04:00
Valentin Rothberg f269be3a31 add {generate,play} kube
Add the `podman generate kube` and `podman play kube` command.  The code
has largely been copied from Podman v1 but restructured to not leak the
K8s core API into the (remote) client.

Both commands are added in the same commit to allow for enabling the
tests at the same time.

Move some exports from `cmd/podman/common` to the appropriate places in
the backend to avoid circular dependencies.

Move definitions of label annotations to `libpod/define` and set the
security-opt labels in the frontend to make kube tests pass.

Implement rest endpoints, bindings and the tunnel interface.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-05-06 17:08:22 +02:00
Brent Baude ba430bfe5e podman v2 remove bloat v2
rid ourselves of libpod references in v2 client

Signed-off-by: Brent Baude <bbaude@redhat.com>
2020-04-16 12:04:46 -05:00
Daniel J Walsh c4ca3c71ff
Add support for selecting kvm and systemd labels
In order to better support kata containers and systemd containers
container-selinux has added new types. Podman should execute the
container with an SELinux process label to match the container type.

Traditional Container process : container_t
KVM Container Process: container_kvm_t
PID 1 Init process: container_init_t

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-04-15 16:52:16 -04:00
Daniel J Walsh 4352d58549
Add support for containers.conf
vendor in c/common config pkg for containers.conf

Signed-off-by: Qi Wang <qiwan@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2020-03-27 14:36:03 -04:00
OpenShift Merge Robot aa6c8c2e55
Merge pull request #5088 from mheon/begin_exec_rework
Begin exec rework
2020-03-19 22:09:40 +01:00
Matthew Heon 118e78c5d6 Add structure for new exec session tracking to DB
As part of the rework of exec sessions, we need to address them
independently of containers. In the new API, we need to be able
to fetch them by their ID, regardless of what container they are
associated with. Unfortunately, our existing exec sessions are
tied to individual containers; there's no way to tell what
container a session belongs to and retrieve it without getting
every exec session for every container.

This adds a pointer to the container an exec session is
associated with to the database. The sessions themselves are
still stored in the container.

Exec-related APIs have been restructured to work with the new
database representation. The originally monolithic API has been
split into a number of smaller calls to allow more fine-grained
control of lifecycle. Support for legacy exec sessions has been
retained, but in a deprecated fashion; we should remove this in
a few releases.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-03-18 11:02:14 -04:00
Matthew Heon f138405b46 Populate ExecSession with all required fields
As part of the rework of exec sessions, we want to split Create
and Start - and, as a result, we need to keep everything needed
to start exec sessions in the struct, not just the bare minimum
for tracking running ones.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2020-03-18 11:02:14 -04:00
Valentin Rothberg f4e873c4e1 auto updates
Add support to auto-update containers running in systemd units as
generated with `podman generate systemd --new`.

`podman auto-update` looks up containers with a specified
"io.containers.autoupdate" label (i.e., the auto-update policy).

If the label is present and set to "image", Podman reaches out to the
corresponding registry to check if the image has been updated.  We
consider an image to be updated if the digest in the local storage is
different than the one of the remote image.  If an image must be
updated, Podman pulls it down and restarts the container.  Note that the
restarting sequence relies on systemd.

At container-creation time, Podman looks up the "PODMAN_SYSTEMD_UNIT"
environment variable and stores it verbatim in the container's label.
This variable is now set by all systemd units generated by
`podman-generate-systemd` and is set to `%n` (i.e., the name of systemd
unit starting the container).  This data is then being used in the
auto-update sequence to instruct systemd (via DBUS) to restart the unit
and hence to restart the container.

Note that this implementation of auto-updates relies on systemd and
requires a fully-qualified image reference to be used to create the
container.  This enforcement is necessary to know which image to
actually check and pull.  If we used an image ID, we would not know
which image to check/pull anymore.
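
The update decision described above can be sketched in Go as follows. This is a minimal illustration, not the code from this commit: `needsUpdate` is a hypothetical helper, and the digests are passed in as plain strings instead of being fetched from local storage and the registry.

```go
package main

import "fmt"

// autoUpdateLabel is the label named in the commit message.
const autoUpdateLabel = "io.containers.autoupdate"

// needsUpdate is a hypothetical helper: a container qualifies only if its
// auto-update label is set to "image" and the local digest differs from
// the digest of the remote image.
func needsUpdate(labels map[string]string, localDigest, remoteDigest string) bool {
	if labels[autoUpdateLabel] != "image" {
		return false
	}
	return localDigest != remoteDigest
}

func main() {
	labels := map[string]string{autoUpdateLabel: "image"}
	// Placeholder digests, for illustration only.
	fmt.Println(needsUpdate(labels, "sha256:aaa", "sha256:bbb"))
	fmt.Println(needsUpdate(labels, "sha256:aaa", "sha256:aaa"))
}
```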

Fixes: #3575
Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-03-17 17:18:56 +01:00
Matthew Heon e3a549b7b1 Remove ImageVolumes from database
Before Libpod supported named volumes, we approximated image
volumes by bind-mounting in per-container temporary directories.
This was handled by Libpod, and had a corresponding database
entry to enable/disable it.

However, when we enabled named volumes, we completely rewrote the
old implementation; none of the old bind mount implementation
still exists, save one flag in the database. With nothing
remaining to use it, it has no further purpose.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2020-02-21 09:37:30 -05:00
Matthew Heon 4567f39800 Initial implementation of a spec generator package
The current Libpod pkg/spec has become a victim of the better
part of three years of development that tied it extremely closely
to the current Podman CLI. Defaults are spread across multiple
places, there is no easy way to produce a CreateConfig that will
actually produce a valid container, and the logic for generating
configs has sprawled across at least three packages.

This is an initial pass at a package that generates OCI specs
that will supersede large parts of the current pkg/spec. The
CreateConfig will still exist, but will effectively turn into a
parsed CLI. This will be compiled down into the new SpecGenerator
struct, which will generate the OCI spec and Libpod create
options.

The preferred integration point for plugging into Podman's Go API
to create containers will be the new CreateConfig, as it's less
tied to Podman's command line. CRI-O, for example, will likely
tie in here.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2020-02-04 08:10:23 -06:00
Giuseppe Scrivano ba0a6f34e3
podman: add new option --cgroups=no-conmon
it allows disabling cgroup creation only for the conmon process.

A new cgroup is created for the container payload.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-01-16 18:56:51 +01:00
Valentin Rothberg 67165b7675 make lint: enable gocritic
`gocritic` is a powerful linter that helps in preventing certain kinds
of errors as well as enforcing a coding style.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2020-01-13 14:27:02 +01:00
Giuseppe Scrivano 71341a1948
log: support --log-opt tag=
support a custom tag to add to each log for the container.

It is currently supported only by the journald backend.

Closes: https://github.com/containers/libpod/issues/3653

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-01-10 10:35:19 +01:00
Akihiro Suda da7595a69f rootless: use RootlessKit port forwarder
RootlessKit port forwarder has a lot of advantages over the slirp4netns port forwarder:

* Very high throughput.
  Benchmark result on Travis: socat: 5.2 Gbps, slirp4netns: 8.3 Gbps, RootlessKit: 27.3 Gbps
  (https://travis-ci.org/rootless-containers/rootlesskit/builds/597056377)

* Connections from the host are treated as 127.0.0.1 rather than 10.0.2.2 in the namespace.
  No UDP issue (#4586)

* No tcp_rmem issue (#4537)

* Probably works with IPv6. Even if not, it is trivial to support IPv6.  (#4311)

* Easily extensible for future support of SCTP

* Easily extensible for future support of `lxc-user-nic` SUID network

RootlessKit port forwarder has already been adopted as the default port forwarder by Rootless Docker/Moby,
and no issue has been reported AFAIK.

As the port forwarder is imported as a Go package, no `rootlesskit` binary is required for Podman.

Fix #4586
May-fix #4559
Fix #4537
May-fix #4311

See https://github.com/rootless-containers/rootlesskit/blob/v0.7.0/pkg/port/builtin/builtin.go

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-08 19:35:17 +09:00
Valentin Rothberg 437bc61f4e container config: add CreateCommand
Store the full command plus arguments of the process the container has
been created with.  Expose this data as a `Config.CreateCommand` field
in the container-inspect data as well.

This information can be useful for debugging, as we can find out which
command created the container and, if it was created via the Podman
CLI, exactly which flags it was created with.

The immediate motivation for this change is to use this information for
`podman-generate-systemd` to generate systemd-service files that allow
for creating new containers (in contrast to only starting existing
ones).

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2019-12-13 14:39:45 +01:00
Giuseppe Scrivano 3f1675d902
libpod: fix stats for rootless pods
honor the systemd parent directory when specified.

Closes: https://github.com/containers/libpod/issues/4634

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-12-04 11:13:40 +01:00
Matthew Heon b0b9103cca Allow chained network namespace containers
The code currently assumes that the container we delegate network
namespace to will never further delegate to another container, so
when looking up things like /etc/hosts and /etc/resolv.conf we
won't pull the correct files from the chained dependency. The
changes to resolve this are relatively simple - just need to keep
looking until we find a container without NetNsCtr set.
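
A minimal Go sketch of that lookup loop, under stated assumptions: the `container` struct and `resolveNetNsOwner` are hypothetical simplifications, and only the `NetNsCtr` field name comes from the message above. The cycle guard is an addition for safety in the sketch, not something the commit describes.

```go
package main

import (
	"errors"
	"fmt"
)

// container is a hypothetical, simplified model of a container.
type container struct {
	ID       string
	NetNsCtr string // ID of the container whose network namespace is shared; empty if none
}

// resolveNetNsOwner follows NetNsCtr references until it reaches a
// container that does not delegate its network namespace any further.
func resolveNetNsOwner(ctrs map[string]*container, id string) (*container, error) {
	seen := map[string]bool{}
	for {
		c, ok := ctrs[id]
		if !ok {
			return nil, fmt.Errorf("no such container %q", id)
		}
		if c.NetNsCtr == "" {
			return c, nil
		}
		if seen[id] {
			return nil, errors.New("cycle in network namespace dependencies")
		}
		seen[id] = true
		id = c.NetNsCtr
	}
}

func main() {
	ctrs := map[string]*container{
		"a": {ID: "a", NetNsCtr: "b"},
		"b": {ID: "b", NetNsCtr: "c"},
		"c": {ID: "c"},
	}
	owner, err := resolveNetNsOwner(ctrs, "a")
	fmt.Println(owner.ID, err)
}
```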

Fixes #4626

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-12-03 10:27:15 -05:00
Jakub Filak 2497b6c77b
podman: add support for specifying MAC
I basically copied and adapted the statements for setting IP.

Closes #1136

Signed-off-by: Jakub Filak <jakub.filak@sap.com>
2019-11-06 16:22:19 +01:00
Valentin Rothberg 11c282ab02 add libpod/config
Refactor the `RuntimeConfig` along with related code from libpod into
libpod/config.  Note that this is a first step of consolidating code
into more coherent packages to make the code more maintainable and less
prone to regressions in the long run.

Some libpod definitions were moved to `libpod/define` to resolve
circular dependencies.

Signed-off-by: Valentin Rothberg <rothberg@redhat.com>
2019-10-31 17:42:37 +01:00
Nalin Dahyabhai a4a70b4506 bump containers/image to v5.0.0, buildah to v1.11.4
Move to containers/image v5 and containers/buildah to v1.11.4.

Replace an equality check with a type assertion when checking for a
docker.ErrUnauthorizedForCredentials in `podman login`.
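
To illustrate why a type assertion is needed once an error is a struct type instead of a single sentinel value, here is a hedged Go sketch. The `ErrUnauthorizedForCredentials` type below is a local stand-in defined only for this example; it is not the containers/image definition.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUnauthorizedForCredentials is a local stand-in type for this sketch.
// Each value carries its own detail, so comparing against one fixed
// sentinel value with == cannot identify every instance.
type ErrUnauthorizedForCredentials struct {
	Err error
}

func (e ErrUnauthorizedForCredentials) Error() string {
	return fmt.Sprintf("unauthorized: %v", e.Err)
}

// errOther is an unrelated error used for comparison.
var errOther = errors.New("connection refused")

// isUnauthorized uses a type assertion, which matches any value of the
// error type regardless of the detail it wraps.
func isUnauthorized(err error) bool {
	_, ok := err.(ErrUnauthorizedForCredentials)
	return ok
}

func main() {
	var err error = ErrUnauthorizedForCredentials{Err: errors.New("bad password")}
	fmt.Println(isUnauthorized(err))
	fmt.Println(isUnauthorized(errOther))
}
```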

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2019-10-29 13:35:18 -04:00
Matthew Heon 5f8bf3d07d Add ensureState helper for checking container state
We have a lot of checks for container state scattered throughout
libpod. Many of these need to ensure the container is in one of a
given set of states so an operation may safely proceed.
Previously there was no set way of doing this, so we'd use unique
boolean logic for each one. Introduce a helper to standardize
state checks.

Note that this is only intended to replace checks for multiple
states. A simple check for one state (ContainerStateRunning, for
example) should remain a straight equality, and not use this new
helper.
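
A minimal sketch of such a helper in Go, assuming a simplified state type. The `ContainerStatus` type and its constants are stand-ins for this example, and the real helper's signature may differ.

```go
package main

import "fmt"

// ContainerStatus is a simplified stand-in for the container state type.
type ContainerStatus int

const (
	ContainerStateConfigured ContainerStatus = iota
	ContainerStateCreated
	ContainerStateRunning
	ContainerStateStopped
)

// ensureState reports whether current is one of the allowed states.
func ensureState(current ContainerStatus, allowed ...ContainerStatus) bool {
	for _, s := range allowed {
		if current == s {
			return true
		}
	}
	return false
}

func main() {
	state := ContainerStateCreated
	// Multi-state check: use the helper.
	fmt.Println(ensureState(state, ContainerStateConfigured, ContainerStateCreated))
	// Single-state check: a plain equality stays as it is.
	fmt.Println(state == ContainerStateRunning)
}
```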

Signed-off-by: Matthew Heon <mheon@redhat.com>
2019-10-28 13:09:01 -04:00
Matthew Heon 6f630bc09b Move OCI runtime implementation behind an interface
For future work, we need multiple implementations of the OCI
runtime, not just a Conmon-wrapped runtime matching the runc CLI.

As part of this, do some refactoring on the interface for exec
(move to a struct, not a massive list of arguments). Also, add
'all' support to Kill and Stop (supported by runc and used a bit
internally for removing containers).

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-10-10 10:19:32 -04:00
Miloslav Trmač d3f59bedb3 Update c/image to v4.0.1 and buildah to 1.11.3
This requires updating all import paths throughout, and a matching
buildah update to interoperate.

I can't figure out the reason for go.mod tracking
	github.com/containers/image v3.0.2+incompatible // indirect
((go mod graph) lists it as a direct dependency of libpod, but
(go list -json -m all) lists it as an indirect dependency),
but at least looking at the vendor subdirectory, it doesn't seem
to be actually used in the built binaries.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
2019-10-04 20:18:23 +02:00
Matthew Heon c2284962c7 Add support for launching containers without CGroups
This is mostly used with Systemd, which really wants to manage
CGroups itself when managing containers via unit file.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-09-10 10:52:37 -04:00
Gabi Beyer ef8834aeab Add comment to describe postConfigureNetNS
Provide information stating what the postConfigureNetNS option
is used for.

Signed-off-by: Gabi Beyer <gabrielle.n.beyer@intel.com>
2019-07-30 23:28:52 +00:00
baude db826d5d75 golangci-lint round #3
this is the third round of preparing to use the golangci-lint on our
code base.

Signed-off-by: baude <bbaude@redhat.com>
2019-07-21 14:22:39 -05:00
baude a78c885397 golangci-lint pass number 2
clean up and prepare to migrate to the golangci-linter

Signed-off-by: baude <bbaude@redhat.com>
2019-07-11 09:13:06 -05:00
OpenShift Merge Robot edc7f52c95
Merge pull request #3425 from adrianreber/restore-mount-label
Set correct SELinux label on restored containers
2019-07-08 20:31:59 +02:00
Matthew Heon a1bb1987cc Store Conmon's PID in our state and display in inspect
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-07-02 18:52:55 -04:00
baude 8561b99644 libpod removal from main (phase 2)
this is phase 2 for the removal of libpod from main.

Signed-off-by: baude <bbaude@redhat.com>
2019-06-27 07:56:24 -05:00
Giuseppe Scrivano e27fef335a
stats: fix cgroup path for rootless containers
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-06-26 13:17:06 +02:00
baude dd81a44ccf remove libpod from main
the compilation demands of having libpod in main are a burden for the
remote client compilations.  to combat this, we should move the use of
libpod structs, vars, constants, and functions into the adapter code
where it will only be compiled by the local client.

this should result in cleaner code organization and smaller binaries. it
should also help if we ever need to compile the remote client on
non-Linux operating systems natively (not cross-compiled).

Signed-off-by: baude <bbaude@redhat.com>
2019-06-25 13:51:24 -05:00
Adrian Reber 94e2a0cd63
Track if a container is restored from an exported checkpoint
Instead of only tracking that a container is restored from
a checkpoint locally in runtime_ctr.go this adds a flag to the
Container structure.

Upcoming patches to correctly label the root file-system mount-point
need also to know if a container is restored from a checkpoint.

Instead of passing a parameter around a lot of functions, this
adds that information to the Container structure.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-06-25 14:55:11 +02:00
Matthew Heon 92bae8d308 Begin adding support for multiple OCI runtimes
Allow Podman containers to request to use a specific OCI runtime
if multiple runtimes are configured. This is the first step to
properly supporting containers in a multi-runtime environment.

The biggest changes are that all OCI runtimes are now initialized
when Podman creates its runtime, and containers now use the
runtime requested in their configuration (instead of always the
default runtime).

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-06-19 17:08:43 -04:00
Matthew Heon 7b7853d8c7 Purge all use of easyjson and ffjson in libpod
We're no longer using either of these JSON libraries, dropped
them in favor of jsoniter. We can't completely remove ffjson as
c/storage uses it and can't easily migrate, but we can make sure
that libpod itself isn't doing anything with them anymore.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-06-13 11:03:20 -04:00
Peter Hunt 51bdf29f04 Address comments
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-05-28 11:10:57 -04:00
Peter Hunt f61fa28d39 Added --log-driver and journald logging
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2019-05-28 11:10:57 -04:00
Matthew Heon 7ba1b609aa Move to using constants for valid restart policy types
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-03 10:36:16 -04:00
Matthew Heon f4db6d5cf6 Add support for retry count with --restart flag
The on-failure restart option supports restarting only a given
number of times. To do this, we need one additional field in the
DB to track restart count (which conveniently fills a field in
Inspect we weren't populating), plus some plumbing logic.
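
The plumbing logic could look roughly like the Go sketch below. All names here are hypothetical, the treatment of a zero retry limit as "unlimited" is an assumption made for the example, and the `StoppedByUser` check models the field introduced in a neighboring commit of this series.

```go
package main

import "fmt"

// Hypothetical constant names for the restart policy values.
const (
	restartPolicyNo        = "no"
	restartPolicyAlways    = "always"
	restartPolicyOnFailure = "on-failure"
)

// ctrState is a hypothetical, simplified view of the relevant state.
type ctrState struct {
	RestartPolicy string
	MaxRetries    uint // assumption: 0 means no limit
	RestartCount  uint // number of restarts performed so far
	ExitCode      int
	StoppedByUser bool
}

// shouldRestart decides whether an exited container is restarted.
func shouldRestart(c ctrState) bool {
	if c.StoppedByUser {
		return false
	}
	switch c.RestartPolicy {
	case restartPolicyAlways:
		return true
	case restartPolicyOnFailure:
		if c.ExitCode == 0 {
			return false
		}
		return c.MaxRetries == 0 || c.RestartCount < c.MaxRetries
	default:
		return false
	}
}

func main() {
	c := ctrState{RestartPolicy: restartPolicyOnFailure, MaxRetries: 3, RestartCount: 2, ExitCode: 1}
	fmt.Println(shouldRestart(c))
	c.RestartCount++
	fmt.Println(shouldRestart(c))
}
```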

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-03 10:36:16 -04:00
Matthew Heon 0d73ee40b2 Add container restart policy to Libpod & Podman
This initial version does not support restart count, but it works
as advertised otherwise.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-03 10:36:16 -04:00
Matthew Heon 3fb52f4fbb Add a StoppedByUser field to the DB
This field indicates that a container was explicitly stopped by
an API call, and did not exit naturally. It's used when
implementing restart policy for containers.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-05-03 10:36:16 -04:00
Matthew Heon f7951c8776 Use GetContainer instead of LookupContainer for full ID
All IDs in libpod are stored as a full container ID. We can get a
container by full ID faster with GetContainer (which directly
retrieves) than LookupContainer (which finds a match, then
retrieves). No reason to use Lookup when we have full IDs present
and available.
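
The difference can be modelled with a small Go sketch. The `store` type and both methods are hypothetical and only mimic the distinction described above: a direct keyed retrieval versus a scan that first has to find a match.

```go
package main

import (
	"fmt"
	"strings"
)

// store is a hypothetical in-memory model: full container ID -> name.
type store struct {
	names map[string]string
}

// getContainer retrieves directly by full ID with a single map access.
func (s *store) getContainer(fullID string) (string, bool) {
	name, ok := s.names[fullID]
	return name, ok
}

// lookupContainer accepts a name or an ID prefix, so it must scan every
// entry to find a unique match before returning the full ID.
func (s *store) lookupContainer(idOrName string) (string, error) {
	match := ""
	for id, name := range s.names {
		if name == idOrName || strings.HasPrefix(id, idOrName) {
			if match != "" {
				return "", fmt.Errorf("%q is ambiguous", idOrName)
			}
			match = id
		}
	}
	if match == "" {
		return "", fmt.Errorf("no container matches %q", idOrName)
	}
	return match, nil
}

func main() {
	s := &store{names: map[string]string{"abc123": "web", "abd456": "db"}}
	fmt.Println(s.getContainer("abc123"))
	fmt.Println(s.lookupContainer("abc"))
}
```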

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-04-12 10:59:00 -04:00
Ed Santiago a07b2c2c60 (minor): fix misspelled 'Healthcheck'
Signed-off-by: Ed Santiago <santiago@redhat.com>
2019-04-10 09:43:56 -06:00
baude 23cd1928ec podman-remote ps
add the ability to run ps on containers using the remote client.

Signed-off-by: baude <bbaude@redhat.com>
2019-04-09 15:00:35 -05:00
Matthew Heon 1fdc89f616 Drop LocalVolumes from the database
We were never using it. It's actually a potentially quite sizable
field (very expensive to decode an array of structs!). Removing
it should do no harm.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-04-04 12:27:20 -04:00
Matthew Heon d245c6df29 Switch Libpod over to new explicit named volumes
This swaps the previous handling (parse all volume mounts on the
container and look for ones that might refer to named volumes)
for the new, explicit named volume lists stored per-container.

It also deprecates force-removing volumes that are in use. I
don't know how we want to handle this yet, but leaving containers
that depend on a volume that no longer exists is definitely not
correct.

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-04-04 12:26:29 -04:00
Matthew Heon 11799f4e0e Add named volumes for each container to database
Signed-off-by: Matthew Heon <matthew.heon@pm.me>
2019-04-04 12:26:29 -04:00