Expand the godoc documentation for the graph package.
Centralize DefaultTag in the graphs/tag package instead of defining it
twice.
Remove some unnecessary "config" structs that are only used to pass
a few parameters to a function.
Simplify the GetParentsSize function - there's no reason for it to take
an accumulator argument.
Unexport some functions that aren't needed outside the package.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
[pkg/archive] Update archive/copy path handling
- Remove unused TarOptions.Name field.
- Add new TarOptions.RebaseNames field.
- Update some of the logic around path dir/base splitting.
- Update some of the logic behind archive entry name rebasing.
[api/types] Add LinkTarget field to PathStat
[daemon] Fix stat, archive, extract of symlinks
These operations *should* resolve symlinks that are in the path but if the
resource itself is a symlink then it *should not* be resolved. This patch
puts this logic into a common function `resolvePath` which resolves symlinks
of the path's dir in scope of the container rootfs but does not resolve the
final element of the path. Now archive, extract, and stat operations will
return symlinks if the path is indeed a symlink.
[api/client] Update cp path hanling
[docs/reference/api] Update description of stat
Add the linkTarget field to the header of the archive endpoint.
Remove path field.
[integration-cli] Fix/Add cp symlink test cases
Copying a symlink should do just that: copy the symlink NOT
copy the target of the symlink. Also, the resulting file from
the copy should have the name of the symlink NOT the name of
the target file.
Copying to a symlink should copy to the symlink target and not
modify the symlink itself.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[v2: a separate aufs commit is merged into this one]
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
The 'deny ptrace' statement was supposed to only ignore
ptrace failures in the AUDIT log. However, ptrace was implicitly
allowed from unconfined processes (such as the docker daemon and
its integration tests) due to the abstractions/base include.
This rule narrows the definition such that it will only ignore
the failures originating inside of the container and will not
cause denials when the daemon or its tests ptrace inside processes.
Introduces positive and negative tests for ptrace /w apparmor.
Signed-off-by: Eric Windisch <eric@windisch.us>
Integration tests were failing due to proc filter behavior
changes with new apparmor policies.
Also include the missing docker-unconfined policy resolving
potential startup errors. This policy is complain-only so
it should behave identically to the standard unconfined policy,
but will not apply system path-based policies within containers.
Signed-off-by: Eric Windisch <eric@windisch.us>
- downcase and privatize exported variables that were unused
- make accurate an error message
- added package comments
- remove unused var ReadLogsNotSupported
- enable linter
- some spelling corrections
Signed-off-by: Morgan Bauer <mbauer@us.ibm.com>
If I run two containers with the same network they share the same /etc/resolv.conf.
The current code changes the labels of the /etc/resolv.conf currently to the
private label which causes it to be unusable in the first container.
This patch changes the labels to a shared label if more then one container
will use the content.
Docker-DCO-1.1-Signed-off-by: Dan Walsh dwalsh@redhat.com (github: rhatdan)
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
Will attempt to load profiles automatically. If loading fails
but the profiles are already loaded, execution will continue.
A hard failure will only occur if Docker cannot load
the profiles *and* they have not already been loaded via
some other means.
Also introduces documentation for AppArmor.
Signed-off-by: Eric Windisch <eric@windisch.us>
The `ApplyDiff` function takes a tar archive stream that is
automagically decompressed later. This was causing a double
decompression, and when the layer was empty, that causes an early EOF.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Return an error when the container is stopped only in api versions
equal or greater than 1.20 (docker 1.8).
Signed-off-by: David Calavera <david.calavera@gmail.com>
daemon_test.go supposted to be unit test for daemon, so
don't see reason why we need another daemon_unit_test.go.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
In 1.6.2 we were decoding inspect API response into interface{}.
time.Time fields were JSON encoded as RFC3339Nano in the response
and when decoded into interface{} they were just strings so the inspect
template treated them as just strings.
From 1.7 we are decoding into types.ContainerJSON and when the template
gets executed it now gets a time.Time and it's formatted as
2015-07-22 05:02:38.091530369 +0000 UTC.
This patch brings back the old behavior by typing time.Time fields
as string so they gets formatted as they were encoded in JSON -- RCF3339Nano
Signed-off-by: Antonio Murdaca <runcom@linux.com>
* Add godoc documentation where it was missing
* Change identifier names that don't match Go style, such as INDEX_NAME
* Rename RegistryInfo to PingResult, which more accurately describes
what this structure is for. It also has the benefit of making the name
not stutter if used outside the package.
Updates #14756
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
There is no option validation for "journald" log-driver, so it makes no
sense to fail in that case.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
This patch creates a new cli package that allows to combine both client
and daemon commands (there is only one daemon command: docker daemon).
The `-d` and `--daemon` top-level flags are deprecated and a special
message is added to prompt the user to use `docker daemon`.
Providing top-level daemon-specific flags for client commands result
in an error message prompting the user to use `docker daemon`.
This patch does not break any old but correct usages.
This also makes `-d` and `--daemon` flags, as well as the `daemon`
command illegal in client-only binaries.
Signed-off-by: Tibor Vass <tibor@docker.com>
Prevent the docker daemon from mounting the created network files over
those provided by the user via -v command line option. This would otherwise
hide the one provide by the user.
The benefit of this is that a user can provide these network files using the
-v command line option and place them in a size-limited filesystem.
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
By using the 'unconfined' policy for privileged
containers, we have inherited the host's apparmor
policies, which really make no sense in the
context of the container's filesystem.
For instance, policies written against
the paths of binaries such as '/usr/sbin/tcpdump'
can be easily circumvented by moving the binary
within the container filesystem.
Fixes GH#5490
Signed-off-by: Eric Windisch <eric@windisch.us>
When we are creating a container, first we call into graph driver to take
snapshot of image and create root for container-init. Then we write some
files to it and call into graph driver again to create container root
from container-init as base.
Once we have written files to container-init root, we don't unmount it
before taking a snapshot of it. Looks like with XFS it leaves it in such
a state that when we mount the container root, it goes into log recovery
path.
Jul 22 10:24:54 vm2-f22 kernel: XFS (dm-6): Mounting V4 Filesystem
Jul 22 10:24:54 vm2-f22 kernel: XFS (dm-6): Starting recovery (logdev: internal)
Jul 22 10:24:54 vm2-f22 kernel: XFS (dm-6): Ending recovery (logdev: internal)
This should not be required. So let us unmount container-init before use
it as a base for container root and then XFS does not go into this
internal recovery path.
Somebody had raised this issue for ext4 sometime back and proposed the same
change. I had shot it down at that point of time. I think now time has
come for this change.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
It's introduced in
68ba5f0b69 (Execdriver implementation on new libcontainer API)
But I don't see reson why we need it.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
The following methods will deprecate the Copy method and introduce
two new, well-behaved methods for creating a tar archive of a resource
in a container and for extracting a tar archive into a directory in a
container.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
The automatic installation of AppArmor policies prevents the
management of custom, site-specific apparmor policies for the
default container profile. Furthermore, this change will allow
a future policy for the engine itself to be written without demanding
the engine be able to arbitrarily create and manage AppArmor policies.
- Add deb package suggests for apparmor.
- Ubuntu postinst use aa-status & fix policy path
- Add the policies to the debian packages.
- Add apparmor tests for writing proc files
Additional restrictions against modifying files in proc
are enforced by AppArmor. Ensure that AppArmor is preventing
access to these files, not simply Docker's configuration of proc.
- Remove /proc/k?mem from AA policy
The path to mem and kmem are in /dev, not /proc
and cannot be restricted successfully through AppArmor.
The device cgroup will need to be sufficient here.
- Load contrib/apparmor during integration tests
Note that this is somewhat dirty because we
cannot restore the host to its original configuration.
However, it should be noted that prior to this patch
series, the Docker daemon itself was loading apparmor
policy from within the tests, so this is no dirtier or
uglier than the status-quo.
Signed-off-by: Eric Windisch <eric@windisch.us>
There are several bug reports on this error happening, and error is
not helpful unless you read the code. Google brings up removing
the repositories.btrfs file.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
The ZFS driver should raise proper errors when the ZFS utility is
missing or when there's no zfs partition active on the system. Raising the
proper errors make possible to silently ignore the ZFS storage
driver when no default storage driver is specified.
Previous to this commit it was no longer possible to start the
docker daemon in that way:
docker -d --storage-opt dm.loopdatasize=2GB
The above command resulted in an exit error because the ZFS driver
tried to use the storage options.
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Closes#14621
This one grew to be much more than I expected so here's the story... :-)
- when a bad port string (e.g. xxx80) is passed into container.create()
via the API it wasn't being checked until we tried to start the container.
- While starting the container we trid to parse 'xxx80' in nat.Int()
and would panic on the strconv.ParseUint(). We should (almost) never panic.
- In trying to remove the panic I decided to make it so that we, instead,
checked the string during the NewPort() constructor. This means that
I had to change all casts from 'string' to 'Port' to use NewPort() instead.
Which is a good thing anyway, people shouldn't assume they know the
internal format of types like that, in general.
- This meant I had to go and add error checks on all calls to NewPort().
To avoid changing the testcases too much I create newPortNoError() **JUST**
for the testcase uses where we know the port string is ok.
- After all of that I then went back and added a check during container.create()
to check the port string so we'll report the error as soon as we get the
data.
- If, somehow, the bad string does get into the metadata we will generate
an error during container.start() but I can't test for that because
the container.create() catches it now. But I did add a testcase for that.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Current default basesize is 10G. Change it to 100G. Reason being that for
some people 10G is turning out to be too small and we don't have capabilities
to grow it dyamically.
This is just overcommitting and no real space is allocated till container
actually writes data. And this is no different then fs based graphdrivers
where virtual size of a container root is unlimited.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Replaced github.com/docker/libcontainer with
github.com/opencontainers/runc/libcontaier.
Also I moved AppArmor profile generation to docker.
Main idea of this update is to fix mounting cgroups inside containers.
After updating docker on CI we can even remove dind.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
- Refactor opts.ValidatePath and add an opts.ValidateDevice
ValidePath will now accept : containerPath:mode, hostPath:containerPath:mode
and hostPath:containerPath.
ValidateDevice will have the same behavior as current.
- Refactor opts.ValidateEnv, opts.ParseEnvFile
Environment variables will now be validated with the following
definition :
> Environment variables set by the user must have a name consisting
> solely of alphabetics, numerics, and underscores - the first of
> which must not be numeric.
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Memory swappiness option takes 0-100, and helps to tune swappiness
behavior per container.
For example, When a lower value of swappiness is chosen
the container will see minimum major faults. When no value is
specified for memory-swappiness in docker UI, it is inherited from
parent cgroup. (generally 60 unless it is changed).
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
When a container is removed but it had an exec, that still hasn't been
GC'd per PR #14476, and someone tries to inspect the exec we should
return a 404, not a 500+container not running. Returning "..not running" is
not only misleading because it could lead people to think the container is
actually still around, but after 5 minutes the error will change to a 404
after the GC. This means that we're externalizing our internall soft-deletion/GC
logic which shouldn't be any of the end user's concern. They should get the
same results immediate or after 5 minutes.
Signed-off-by: Doug Davis <dug@us.ibm.com>
Libcontainer already supported mount container's own cgroup into
container, with this patch, we can see container's own cgroup info
in container.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This takes the final removal for exec commands in two steps. The first
GC tick will mark the exec commands for removal and then the second tick
will remove the config from the daemon.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds an event loop for running a GC cleanup for exec command
references that are on the daemon. These cannot be cleaned up
immediately because processes may need to get the exit status of the
exec command but it should not grow out of bounds. The loop is set to a
default 5 minute interval to perform cleanup.
It should be safe to perform this cleanup because unless the clients are
remembering the exec id of the process they launched they can query for
the status and see that it has exited. If they don't save the exec id
they will have to do an inspect on the container for all exec instances
and anything that is not live inside that container will not be returned
in the container inspect.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes the exec config from the container after the command exits
so that dead exec commands are not displayed in the container inspect.
The commands are still kept on the daemon so that when you inspect the
exec command, not the container, you are still able to get it's exit
status.
This also changes the ProcessConfig to a pointer.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The ability to save and verify base device UUID (#13896) introduced a
situation where the initialization would panic when removing the device
returns EBUSY.
Functions `verifyBaseDeviceUUID` and `saveBaseDeviceUUID` now take the
lock on the `DeviceSet`, which solves the problem as `removeDevice`
assumes it owns the lock.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
Often it happens that docker is not able to shutdown/remove the thin
pool it created because some device has leaked into some mount name
space. That means device is in use and that means pool can't be removed.
Docker will leave pool as it is and exit. Later when user starts the
docker, it finds pool is already there and docker uses it. But docker
does not know it is same pool which is using the loop devices. Now
docker thinks loop devices are not being used. That means it does not
display the data correctly in "docker info", giving user wrong information.
This patch tries to detect if loop devices as created by docker are
being used for pool and fills in the right details in "docker info".
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
If a container is read-only, also set /proc, /sys,
& /dev to read-only. This should apply to both privileged and
unprivileged containers.
Note that when /dev is read-only, device files may still be
written to. This change will simply prevent the device paths
from being modified, or performing mknod of new devices within
the /dev path.
Tests are included for all cases. Also adds a test to ensure
that /dev/pts is always mounted read/write, even in the case of a
read-write rootfs. The kernel restricts writes here naturally and
bad things will happen if we mount it ro.
Signed-off-by: Eric Windisch <eric@windisch.us>
DeviceMapper must be explicitly selected because the Docker binary might not be linked to the right devmapper library.
With this change, Docker fails fast if the driver detection finds the devicemapper directory but the driver is not the default option.
The option `override_udev_sync_check` doesn't make sense anymore, since the user must be explicit to select devicemapper, so it's being removed.
Docker fails to use devicemapper only if Docker has been built statically unless the option was explicit.
Signed-off-by: David Calavera <david.calavera@gmail.com>
- Container networking statistics are no longer
retrievable from libcontainer after the introduction
of libnetwork. This change adds the missing code
for docker daemon to retireve the nw stats from
Endpoint.
Signed-off-by: Alessandro Boch <aboch@docker.com>
libnetwork host, none and bridge driver initialization is incorrectly
disabled if the daemon flag --bridge=none. The expected behavior of
setting --bridge as none is to disable the bridge driver alone and let
all other modes to be operational.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
By convention /pkg is safe to use from outside the docker tree, for example
if you're building a docker orchestrator.
/nat currently doesn't have any dependencies outside of /pkg, so it seems
reasonable to move it there.
This rename was performed with:
```
gomvpkg -vcs_mv_cmd="git mv {{.Src}} {{.Dst}}" \
-from github.com/docker/docker/nat \
-to github.com/docker/docker/pkg/nat
```
Signed-off-by: Peter Waller <p@pwaller.net>
Export metadata for container and image in docker-inspect when overlay
graphdriver is in use. Right now it is done only for devicemapper graph
driver.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
When a container is started with `--net=host` with
a particular name and it is subsequently destroyed,
then all subsequent creations of the container with
the same name will fail. This is because in `--net=host`
the namespace is shared i.e the host namespace so
trying to destroy the host namespace by calling
`LeaveAll` will fail and the endpoint is left with
the dangling state. So the fix is, for this mode, do
not attempt to destroy the namespace but just cleanup
the endpoint state and return.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
With publish-service and default-network support, a container could be
connected to a user-defined network that is backed by any driver/plugin.
But if the user uses port mapping or expose commands, the expectation
for that container is to behave like existing bridge network.
Thanks to the Libnetwork's CNM model, containers can be connected
to the bridge network as a secondary network in addition to the
user-specified network.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
- brings in vxlan based native multihost networking
- added a daemon flag required by libkv for dist kv operations
- moved the daemon flags to experimental
Signed-off-by: Madhu Venugopal <madhu@docker.com>
This commit makes use of the CNM model supported by LibNetwork and
provides an ability to let a container to publish a specified service.
Behind the scenes, if a service with the given name doesnt exist, it is
automatically created on appropriate network and attach the container.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Signed-off-by: Madhu Venugopal <madhu@docker.com>
By default, the cgroup setting in libcontainer's configs.Cgroup for
memory swappiness will default to 0, which is a valid choice for memory
swappiness, but that means by default every container's memory
swappiness will be set to zero instead of the default 60, which is
probably not what users are expecting.
When the swappiness UI PR comes into Docker, there will be docker run
controls to set this per container, but for now we want to make sure
*not* to change the default, as well as work around an older kernel
issue that refuses to allow it to be set when cgroup hiearchies are in
use.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This commit also brings in the ability to specify a default network and its
corresponding driver as daemon flags. This helps in existing clients to
make use of newer networking features provided by libnetwork.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
It is easy for one to use docker for a while, shut it down and restart
docker with different set of storage options for device mapper driver
which will effectively change the thin pool. That means any of the
metadata stored in /var/lib/docker/devicemapper/metadata/ is not valid
for the new pool and user will run into various kind of issues like
container not found in the pool etc.
Users think that their images or containers are lost but it might just
be the case of configuration issue. People might use wrong metadata
with wrong pool.
To detect such situations, save UUID of base image and once docker
starts later, query and compare the UUID of base image with the
stored one. If they don't match, fail the initialization with the
error that UUID failed to match.
That way user will be forced to cleanup /var/lib/docker/ directory
and start docker again.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
An inspection of the graph package showed this function to be way out of place.
It is only depended upon by the daemon code. The function prepares a top-level
readonly layer used to provide a consistent runtime environment for docker
images.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
If no Mtu value is provided to the docker daemon, get the mtu from the
default route's interface. If there is no default route, default to a
mtu of 1500.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Export image/container metadata stored in graph driver. Right now 3 fields
DeviceId, DeviceSize and DeviceName are being exported from devicemapper.
Other graph drivers can export fields as they see fit.
This data can be used to mount the thin device outside of docker and tools
can look into image/container and do some kind of inspection.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
This PR brings the vendored libnetwork code to
3be488927db8d719568917203deddd630a194564, which pulls in quite a few
fixes to support kvstore, windows daemon compilation fixes,
multi-network support for Bridge driver, etc...
Signed-off-by: Madhu Venugopal <madhu@docker.com>
This is breaking various setups where the host's rootfs is mount shared
correctly and breaks live migration with bind mounts.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When the daemon is going down trigger immediate
garbage collection of libnetwork resources deleted
like namespace path since there will be no way to
remove them when the daemon restarts.
Signed-off-by: Jana Radhakrishnan <mrjana@docker.com>
Move graph related functions in image to graph package.
Consolidating graph functionality is the first step in refactoring graph into an image store model.
Subsequent refactors will involve breaking up graph into multiple types with a strongly defined interface.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)