By default, the cgroup setting in libcontainer's configs.Cgroup for
memory swappiness will default to 0, which is a valid choice for memory
swappiness, but that means by default every container's memory
swappiness will be set to zero instead of the default 60, which is
probably not what users are expecting.
When the swappiness UI PR comes into Docker, there will be docker run
controls to set this per container, but for now we want to make sure
*not* to change the default, as well as work around an older kernel
issue that refuses to allow it to be set when cgroup hiearchies are in
use.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This is breaking various setups where the host's rootfs is mount shared
correctly and breaks live migration with bind mounts.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
* Don't AllocateNetwork when network is disabled
* Don't createNetwork in execdriver when network is disabled
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
As part of this some generic packages like iptables, etchosts and resolvconf
have also been moved to libnetwork. Even though they can still be
consumed in a generic fashion they will reside and be maintained
from within the libnetwork project.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
- Updated Dockerfile to satisfy libnetwork GOPATH requirements.
- Reworked daemon to allocate network resources using libnetwork.
- Reworked remove link code to also update network resources in libnetwork.
- Adjusted the exec driver command population to reflect libnetwork design.
- Adjusted the exec driver create command steps.
- Updated a few test cases to reflect the change in design.
- Removed the dns setup code from docker as resolv.conf is entirely managed
in libnetwork.
- Integrated with lxc exec driver.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Generation based on CAP_LAST_CAP, I hardcoded
capability.CAP_BLOCK_SUSPEND as last for systems which has no
/proc/sys/kernel/cap_last_cap
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add tests for mounting into /proc and /sys
These two locations should be prohibited from mounting volumes into
those destinations.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
`lxc-stop` does not support sending arbitrary signals.
By default, `lxc-stop -n <id>` would send `SIGPWR`.
The lxc driver was always sending `lxc-stop -n <id> -k`, which always
sends `SIGKILL`. In this case `lxc-start` returns an exit code of `0`,
regardless of what the container actually exited with.
Because of this we must send signals directly to the process when we
can.
Also need to set quiet mode on `lxc-start` otherwise it reports an error
on `stderr` when the container exits cleanly (ie, we didn't SIGKILL it),
this error is picked up in the container logs... and isn't really an
error.
Also cleaned up some potential races for waitblocked test.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
The `--userland-proxy` daemon flag makes it possible to rely on hairpin
NAT and additional iptables routes instead of userland proxy for port
publishing and inter-container communication.
Usage of the userland proxy remains the default as hairpin NAT is
unsupported by older kernels.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
To help avoid version mismatches between libcontainer and Docker, this updates libcontainer to be the source of truth for which version of logrus the project is using. This should help avoid potential incompatibilities in the future, too. 👍
Signed-off-by: Andrew "Tianon" Page <admwiggin@gmail.com>
This also moves `exec -i` test to _unix_test.go because it seems to need a
pty to reliably reproduce the behavior.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
Updates most of the instances of HTTP urls in the engine's
comments. Does not account for any use in the code itself,
documentation, contrib, or project files.
Signed-off-by: Eric Windisch <eric@windisch.us>
Preventing the test execution to pollute the deterministic runtime environment
by seeding the global rand.Random.
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
When working with Go channels you must not set it to nil or else the
channel will block forever. It will not panic reading from a nil chan
but it blocks. The correct way to do this is to create the channel then
close it as the correct results to the caller will be returned.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
container.
docker run -v /dev:/dev should stop mounting other default mounts in i
libcontainer otherwise directories and devices like /dev/ptx get mishandled.
We want to be able to run libvirtd for launching vms and it needs
access to the hosts /dev. This is a key componant of OpenStack.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This ensures that the libcontainer state is fully removed for a
container after it is terminated.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The default for rlimit handling should be to inherit the rlimit of the
daemon unless explicitly set.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Cgroup resources are host dependent, they should be in hostConfig.
For backward compatibility, we just copy it to hostConfig, and leave it in
Config for now, so there is no regressions, but the right way to use this
throught json is to put it in HostConfig, like:
{
"Hostname": "",
...
"HostConfig": {
"CpuShares": 512,
"Memory": 314572800,
...
}
}
As we will add CpusetMems, CpusetCpus is definitely a better name, but some
users are already using Cpuset in their http APIs, we also make it compatible.
The main idea is keep using Cpuset in Config Struct, and make it has the same
value as CpusetCpus, but not always, some scenarios:
- Users use --cpuset in docker command, it can setup cpuset.cpus and can
get Cpuset field from docker inspect or other http API which will get
config info.
- Users use --cpuset-cpus in docker command, ditto.
- Users use Cpuset field in their http APIs, ditto.
- Users use CpusetCpus field in their http APIs, they won't get Cpuset field
in Config info, because by then, they should already know what happens
to Cpuset.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This fixes various tests by checking for non zero exit code, accounting for lxc-specific base-diffs, and by removing lxc specific environment vars.
It also adds the --share-ipc option to lxc-start for shared ipc namespaces.
Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
Docker-DCO-1.1-Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
Set lxc.auto.mount = proc:mixed in unprivilged mode. This ensures that lxc mounts sys and proc/sysrq-trigger as readonly.
Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
Docker-DCO-1.1-Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
Sending capability ids instead of capability names ot LXC for --cap-add and --cap-drop.
Also fixed tests.
Docker-DCO-1.1-Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
Add a --readonly flag to allow the container's root filesystem to be
mounted as readonly. This can be used in combination with volumes to
force a container's process to only write to locations that will be
persisted. This is useful in many cases where the admin controls where
they would like developers to write files and error on any other
locations.
Closes#7923Closes#8752
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
We want to be able to use container without the PID namespace. We basically
want containers that can manage the host os, which I call Super Privileged
Containers. We eventually would like to get to the point where the only
namespace we use is the MNT namespace to bring the Apps userspace with it.
By eliminating the PID namespace we can get better communication between the
host and the clients and potentially tools like strace and gdb become easier
to use. We also see tools like libvirtd running within a container telling
systemd to place a VM in a particular cgroup, we need to have communications of the PID.
I don't see us needing to share PID namespaces between containers, since this
is really what docker exec does.
So currently I see us just needing docker run --pid=host
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This fixes the issue where an lxc.conf override of lxc.network.ipv4 was not being honored.
Docker-DCO-1.1-Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
This commit contains changes for docker:
* user.GetGroupFile to user.GetGroupPath docker/libcontainer#301
* Add systemd support for OOM docker/libcontainer#307
* Support for custom namespaces docker/libcontainer#279, docker/libcontainer#312
* Fixes#9699docker/libcontainer#308
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
Some workloads rely on IPC for communications with other processes. We
would like to split workloads between two container but still allow them
to communicate though shared IPC.
This patch mimics the --net code to allow --ipc=host to not split off
the IPC Namespace. ipc=container:CONTAINERID to share ipc between containers
If you share IPC between containers, then you need to make sure SELinux labels
match.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This passed the --net=container:CONTINER_ID to lxc-start as --share-net
Docker-DCO-1.1-Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
Since the containers can handle the out of memory kernel kills gracefully, docker
will only provide out of memory information as an additional metadata as part of
container status.
Docker-DCO-1.1-Signed-off-by: Vishnu Kannan <vishnuk@google.com> (github: vishh)
Lxc driver was throwing errors for mounts where the mount point does not exist in the container.
This adds a create=dir/file mount option to the lxc template, to alleviate this issue.
Docker-DCO-1.1-Signed-off-by: Abin Shahab <ashahab@altiscale.com> (github: ashahab-altiscale)
We removed the syncpipe package and replaced it with specific calls to
create a new *os.File from a specified fd passed to the process. This
reduced code and an extra object to manage the container's init
lifecycle.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Right now, MAC addresses are randomly generated by the kernel when
creating the veth interfaces.
This causes different issues related to ARP, such as #4581, #5737 and #8269.
This change adds support for consistent MAC addresses, guaranteeing that
an IP address will always end up with the same MAC address, no matter
what.
Since IP addresses are already guaranteed to be unique by the
IPAllocator, MAC addresses will inherit this property as well for free.
Consistent mac addresses is also a requirement for stable networking (#8297)
since re-using the same IP address on a different MAC address triggers the ARP
issue.
Finally, this change makes the MAC address accessible through docker
inspect, which fixes#4033.
Signed-off-by: Andrea Luzzardi <aluzzardi@gmail.com>
security-opts will allow you to customise the security subsystem.
For example the labeling system like SELinux will run on a container.
--security-opt="label:user:USER" : Set the label user for the container
--security-opt="label:role:ROLE" : Set the label role for the container
--security-opt="label:type:TYPE" : Set the label type for the container
--security-opt="label:level:LEVEL" : Set the label level for the container
--security-opt="label:disabled" : Turn off label confinement for the container
Since we are passing a list of string options instead of a space separated
string of options, I will change function calls to use InitLabels instead of
GenLabels. Genlabels interface is Depracated.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
This also removes dead code in the native driver for a past feature that
was never fully implemented.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
commit 4aa5da278f moves `Console` from Command to
ProcessConfig, but missed the change in lxc_template. Therefore creating a
container with tty using lxc driver with fail with error
template: lxc:60:20: executing "lxc" at <.Console>: Console is not a field of
struct type struct { *execdriver.Command; AppArmor bool; ProcessLabel string; MountLabel string }
This changes lxc_console template to refers to `.ProcessConfig.Console`
Docker-DCO-1.1-Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com> (github: dqminh)
This changes the way the exec drivers work by not specifing a -driver
flag on reexec. For each of the exec drivers they register their own
functions that will be matched aginst the argv 0 on exec and called if
they match.
This also allows any functionality to be added to docker so that the
binary can be reexec'd and any type of function can be called. I moved
the flag parsing on docker exec to the specific initializers so that the
implementations do not bleed into one another. This also allows for
more flexability within reexec initializers to specify their own flags
and options.
Signed-off-by: Michael Crosby <michael@docker.com>
lxc is special in that we cannot create the master outside of the
container without opening the slave because we have nothing to provide to the
cmd. We have to open both then do the crazy setup on command right now instead of
passing the console path to lxc and telling it to open up that console. we save a couple of
openfiles in the native driver because we can do this.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)
This uses "," instead of spaces so that the flags are parsed correctly
and also does not do a strings.Split on an empty string because
strings.Split will return a slice with one element, and empty string
causing parsing to fail when it validates that the cap exists.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)
as a maintainer.
Best of luck on your e-commerce business Guillaume, and thanks for all
the great contributions!
Docker-DCO-1.1-Signed-off-by: Solomon Hykes <solomon@docker.com> (github: shykes)
This patch adds pause/unpause to the command line, api, and drivers
for use on containers. This is implemented using the cgroups/freeze
utility in libcontainer and lxc freeze/unfreeze.
Co-Authored-By: Eric Windisch <ewindisch@docker.com>
Co-Authored-By: Chris Alfonso <calfonso@redhat.com>
Docker-DCO-1.1-Signed-off-by: Ian Main <imain@redhat.com> (github: imain)
This is a fix for a race condition in the LXC driver. This is described
more in issue #6092.
Closes#6092
Docker-DCO-1.1-Signed-off-by: Shane Canon <scanon@lbl.gov> (github: scanon)
This also makes sure that devices are pointers to avoid copies
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
We now have one place that keeps track of (most) devices that are allowed and created within the container. That place is pkg/libcontainer/devices/devices.go
This fixes several inconsistencies between which devices were created in the lxc backend and the native backend. It also fixes inconsistencies between wich devices were created and which were allowed. For example, /dev/full was being created but it was not allowed within the cgroup. It also declares the file modes and permissions of the default devices, rather than copying them from the host. This is in line with docker's philosphy of not being host dependent.
Docker-DCO-1.1-Signed-off-by: Timothy Hobbs <timothyhobbs@seznam.cz> (github: https://github.com/timthelion)
Add specific types for Required and Optional DeviceNodes
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
Fixes#5692
This change requires lxc 1.0+ to work and breaks lxc versions less than
1.0 for host networking. We think that this is a find tradeoff by
bumping docker to only support lxc 1.0
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
We need SETFCAP to be able to mark files as having caps, which is
heavily used by fedora.
See https://github.com/dotcloud/docker/issues/5928
We also need SETPCAP, for instance systemd needs this to set caps
on its childen.
Both of these are safe in the sense that they can never ever
result in a process with a capability not in the bounding set of the
container.
We also add NET_BIND_SERVICE caps, to be able to bind to ports lower
than 1024.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
those that were specified in the config. This commit also explicitly
adds a set of capabilities that we were silently not dropping and were
assumed by the tests.
Docker-DCO-1.1-Signed-off-by: Victor Marmol <vmarmol@google.com> (github: vmarmol)
All modern distros set up /run to be a tmpfs, see for instance:
https://wiki.debian.org/ReleaseGoals/RunDirectory
Its a very useful place to store pid-files, sockets and other things
that only live at runtime and that should not be stored in the image.
This is also useful when running systemd inside a container, as it
will try to mount /run if not already mounted, which will fail for
non-privileged container.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)