The ZFS driver should raise proper errors when the ZFS utility is
missing or when there's no zfs partition active on the system. Raising the
proper errors make possible to silently ignore the ZFS storage
driver when no default storage driver is specified.
Previous to this commit it was no longer possible to start the
docker daemon in that way:
docker -d --storage-opt dm.loopdatasize=2GB
The above command resulted in an exit error because the ZFS driver
tried to use the storage options.
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Closes#14621
This one grew to be much more than I expected so here's the story... :-)
- when a bad port string (e.g. xxx80) is passed into container.create()
via the API it wasn't being checked until we tried to start the container.
- While starting the container we trid to parse 'xxx80' in nat.Int()
and would panic on the strconv.ParseUint(). We should (almost) never panic.
- In trying to remove the panic I decided to make it so that we, instead,
checked the string during the NewPort() constructor. This means that
I had to change all casts from 'string' to 'Port' to use NewPort() instead.
Which is a good thing anyway, people shouldn't assume they know the
internal format of types like that, in general.
- This meant I had to go and add error checks on all calls to NewPort().
To avoid changing the testcases too much I create newPortNoError() **JUST**
for the testcase uses where we know the port string is ok.
- After all of that I then went back and added a check during container.create()
to check the port string so we'll report the error as soon as we get the
data.
- If, somehow, the bad string does get into the metadata we will generate
an error during container.start() but I can't test for that because
the container.create() catches it now. But I did add a testcase for that.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Current default basesize is 10G. Change it to 100G. Reason being that for
some people 10G is turning out to be too small and we don't have capabilities
to grow it dyamically.
This is just overcommitting and no real space is allocated till container
actually writes data. And this is no different then fs based graphdrivers
where virtual size of a container root is unlimited.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Replaced github.com/docker/libcontainer with
github.com/opencontainers/runc/libcontaier.
Also I moved AppArmor profile generation to docker.
Main idea of this update is to fix mounting cgroups inside containers.
After updating docker on CI we can even remove dind.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
- Refactor opts.ValidatePath and add an opts.ValidateDevice
ValidePath will now accept : containerPath:mode, hostPath:containerPath:mode
and hostPath:containerPath.
ValidateDevice will have the same behavior as current.
- Refactor opts.ValidateEnv, opts.ParseEnvFile
Environment variables will now be validated with the following
definition :
> Environment variables set by the user must have a name consisting
> solely of alphabetics, numerics, and underscores - the first of
> which must not be numeric.
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Memory swappiness option takes 0-100, and helps to tune swappiness
behavior per container.
For example, When a lower value of swappiness is chosen
the container will see minimum major faults. When no value is
specified for memory-swappiness in docker UI, it is inherited from
parent cgroup. (generally 60 unless it is changed).
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
When a container is removed but it had an exec, that still hasn't been
GC'd per PR #14476, and someone tries to inspect the exec we should
return a 404, not a 500+container not running. Returning "..not running" is
not only misleading because it could lead people to think the container is
actually still around, but after 5 minutes the error will change to a 404
after the GC. This means that we're externalizing our internall soft-deletion/GC
logic which shouldn't be any of the end user's concern. They should get the
same results immediate or after 5 minutes.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Libcontainer already supported mount container's own cgroup into
container, with this patch, we can see container's own cgroup info
in container.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This takes the final removal for exec commands in two steps. The first
GC tick will mark the exec commands for removal and then the second tick
will remove the config from the daemon.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds an event loop for running a GC cleanup for exec command
references that are on the daemon. These cannot be cleaned up
immediately because processes may need to get the exit status of the
exec command but it should not grow out of bounds. The loop is set to a
default 5 minute interval to perform cleanup.
It should be safe to perform this cleanup because unless the clients are
remembering the exec id of the process they launched they can query for
the status and see that it has exited. If they don't save the exec id
they will have to do an inspect on the container for all exec instances
and anything that is not live inside that container will not be returned
in the container inspect.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes the exec config from the container after the command exits
so that dead exec commands are not displayed in the container inspect.
The commands are still kept on the daemon so that when you inspect the
exec command, not the container, you are still able to get it's exit
status.
This also changes the ProcessConfig to a pointer.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The ability to save and verify base device UUID (#13896) introduced a
situation where the initialization would panic when removing the device
returns EBUSY.
Functions `verifyBaseDeviceUUID` and `saveBaseDeviceUUID` now take the
lock on the `DeviceSet`, which solves the problem as `removeDevice`
assumes it owns the lock.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
Often it happens that docker is not able to shutdown/remove the thin
pool it created because some device has leaked into some mount name
space. That means device is in use and that means pool can't be removed.
Docker will leave pool as it is and exit. Later when user starts the
docker, it finds pool is already there and docker uses it. But docker
does not know it is same pool which is using the loop devices. Now
docker thinks loop devices are not being used. That means it does not
display the data correctly in "docker info", giving user wrong information.
This patch tries to detect if loop devices as created by docker are
being used for pool and fills in the right details in "docker info".
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
If a container is read-only, also set /proc, /sys,
& /dev to read-only. This should apply to both privileged and
unprivileged containers.
Note that when /dev is read-only, device files may still be
written to. This change will simply prevent the device paths
from being modified, or performing mknod of new devices within
the /dev path.
Tests are included for all cases. Also adds a test to ensure
that /dev/pts is always mounted read/write, even in the case of a
read-write rootfs. The kernel restricts writes here naturally and
bad things will happen if we mount it ro.
Signed-off-by: Eric Windisch <eric@windisch.us>
DeviceMapper must be explicitly selected because the Docker binary might not be linked to the right devmapper library.
With this change, Docker fails fast if the driver detection finds the devicemapper directory but the driver is not the default option.
The option `override_udev_sync_check` doesn't make sense anymore, since the user must be explicit to select devicemapper, so it's being removed.
Docker fails to use devicemapper only if Docker has been built statically unless the option was explicit.
Signed-off-by: David Calavera <david.calavera@gmail.com>
- Container networking statistics are no longer
retrievable from libcontainer after the introduction
of libnetwork. This change adds the missing code
for docker daemon to retireve the nw stats from
Endpoint.
Signed-off-by: Alessandro Boch <aboch@docker.com>
libnetwork host, none and bridge driver initialization is incorrectly
disabled if the daemon flag --bridge=none. The expected behavior of
setting --bridge as none is to disable the bridge driver alone and let
all other modes to be operational.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
By convention /pkg is safe to use from outside the docker tree, for example
if you're building a docker orchestrator.
/nat currently doesn't have any dependencies outside of /pkg, so it seems
reasonable to move it there.
This rename was performed with:
```
gomvpkg -vcs_mv_cmd="git mv {{.Src}} {{.Dst}}" \
-from github.com/docker/docker/nat \
-to github.com/docker/docker/pkg/nat
```
Signed-off-by: Peter Waller <p@pwaller.net>
Export metadata for container and image in docker-inspect when overlay
graphdriver is in use. Right now it is done only for devicemapper graph
driver.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
When a container is started with `--net=host` with
a particular name and it is subsequently destroyed,
then all subsequent creations of the container with
the same name will fail. This is because in `--net=host`
the namespace is shared i.e the host namespace so
trying to destroy the host namespace by calling
`LeaveAll` will fail and the endpoint is left with
the dangling state. So the fix is, for this mode, do
not attempt to destroy the namespace but just cleanup
the endpoint state and return.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
With publish-service and default-network support, a container could be
connected to a user-defined network that is backed by any driver/plugin.
But if the user uses port mapping or expose commands, the expectation
for that container is to behave like existing bridge network.
Thanks to the Libnetwork's CNM model, containers can be connected
to the bridge network as a secondary network in addition to the
user-specified network.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
- brings in vxlan based native multihost networking
- added a daemon flag required by libkv for dist kv operations
- moved the daemon flags to experimental
Signed-off-by: Madhu Venugopal <madhu@docker.com>
This commit makes use of the CNM model supported by LibNetwork and
provides an ability to let a container to publish a specified service.
Behind the scenes, if a service with the given name doesnt exist, it is
automatically created on appropriate network and attach the container.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Signed-off-by: Madhu Venugopal <madhu@docker.com>
By default, the cgroup setting in libcontainer's configs.Cgroup for
memory swappiness will default to 0, which is a valid choice for memory
swappiness, but that means by default every container's memory
swappiness will be set to zero instead of the default 60, which is
probably not what users are expecting.
When the swappiness UI PR comes into Docker, there will be docker run
controls to set this per container, but for now we want to make sure
*not* to change the default, as well as work around an older kernel
issue that refuses to allow it to be set when cgroup hiearchies are in
use.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This commit also brings in the ability to specify a default network and its
corresponding driver as daemon flags. This helps in existing clients to
make use of newer networking features provided by libnetwork.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
It is easy for one to use docker for a while, shut it down and restart
docker with different set of storage options for device mapper driver
which will effectively change the thin pool. That means any of the
metadata stored in /var/lib/docker/devicemapper/metadata/ is not valid
for the new pool and user will run into various kind of issues like
container not found in the pool etc.
Users think that their images or containers are lost but it might just
be the case of configuration issue. People might use wrong metadata
with wrong pool.
To detect such situations, save UUID of base image and once docker
starts later, query and compare the UUID of base image with the
stored one. If they don't match, fail the initialization with the
error that UUID failed to match.
That way user will be forced to cleanup /var/lib/docker/ directory
and start docker again.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
An inspection of the graph package showed this function to be way out of place.
It is only depended upon by the daemon code. The function prepares a top-level
readonly layer used to provide a consistent runtime environment for docker
images.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
If no Mtu value is provided to the docker daemon, get the mtu from the
default route's interface. If there is no default route, default to a
mtu of 1500.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Export image/container metadata stored in graph driver. Right now 3 fields
DeviceId, DeviceSize and DeviceName are being exported from devicemapper.
Other graph drivers can export fields as they see fit.
This data can be used to mount the thin device outside of docker and tools
can look into image/container and do some kind of inspection.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
This PR brings the vendored libnetwork code to
3be488927db8d719568917203deddd630a194564, which pulls in quite a few
fixes to support kvstore, windows daemon compilation fixes,
multi-network support for Bridge driver, etc...
Signed-off-by: Madhu Venugopal <madhu@docker.com>