This is a better fix for futimes() on kernels not supporting O_PATH.
The previous fix broke when copying a device, as it tried to open it
and got an error.
Older kernels can't handle O_PATH in open(), so this will
fail on dirs and symlinks. For dirs we can fall back to
the normal Utimes, but for symlinks there is not much to do
but ignore their timestamps.
This creates a container by copying the corresponding files
from the layers into the container. This is not going to be very useful
on a developer setup, as there is no copy-on-write or general disk space
sharing. It also makes container instantiation slower.
However, it may be useful in deployments where we don't have a lot
of containers running (long-running daemons) and where we don't
do a lot of docker commits.
There are some changes here that make the file metadata better match
the layer files:
* Set the mode of the file after the chown, as otherwise the setuid/setgid
flags and e.g. the sticky bit are lost
* Use lchown instead of chown
* Delay mtime updates to after all other changes so that later file
creation doesn't change the mtime for the parent directory
* Use Futimes in combination with O_PATH|O_NOFOLLOW to set mtime on symlinks
The init layer needs to be topmost to make sure certain files
are always there (for instance, the ubuntu:12.10 image wrongly
has /dev/shm being a symlink to /run/shm, and we need to override
that). However, previously the devmapper code implemented the
init layer by putting it in the base devmapper device, which meant
layers above it could override these files (which broke
ubuntu:12.10).
So instead we put the init layer in *each* image's devmapper device.
This is "safe" because we still have the pristine layer data
in the layer directory. Also, it means we diff the container
against the image with the init layer applied, so it won't show
up in diffs/commits.
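Why the position of the init layer matters can be shown with a toy union lookup in which the highest layer containing a path wins. The `entry`, `layer`, `resolve` and `withInitOnTop` names are hypothetical, and layers are modelled as in-memory maps rather than devmapper devices.

```go
package main

// entry is a simplified stand-in for a file in a layer.
type entry struct {
	kind   string // "dir", "file" or "symlink"
	target string // symlink target, if any
}

// layer maps absolute paths to entries.
type layer map[string]entry

// resolve looks path up through a stack ordered bottom first:
// the highest layer that contains the path wins.
func resolve(stack []layer, path string) (entry, bool) {
	for i := len(stack) - 1; i >= 0; i-- {
		if e, ok := stack[i][path]; ok {
			return e, true
		}
	}
	return entry{}, false
}

// withInitOnTop appends the init layer above the image layers,
// making it topmost.
func withInitOnTop(imageLayers []layer, initLayer layer) []layer {
	stack := make([]layer, 0, len(imageLayers)+1)
	stack = append(stack, imageLayers...)
	return append(stack, initLayer)
}
```

With the init layer at the bottom, the ubuntu:12.10 `/dev/shm` symlink shadows the init layer's directory; with it on top, the directory wins.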
Right now this does nothing but add a new layer, but it means
that all DeviceMounts are paired with DeviceUnmounts so that we
can track (and clean up) active mounts.
To do diffing we just compare file metadata, so this relies
on things like size and mtime/ctime to catch any changes.
It's *possible* to trick this by updating a file without
changing the size and setting back the mtime/ctime, but
that seems pretty unlikely to happen in reality, and lets
us avoid comparing the actual file data.
Without this, there is really no way to map back from the device-mapper
devices to the actual docker image/container IDs if the JSON file
somehow gets lost.
This supports creating images from layers and mounting them
for running a container.
Not yet supported:
* Creating diffs between images/containers
* Creating layers for new images from a device-mapper container