This fixes errors in ownership on directory creation during build that
can cause inaccessible files depending on the paths in the Dockerfile
and non-existing directories in the starting image.
Add tests for the mkdir variants in pkg/idtools
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
The race is between pools.Put which calls buf.Reset and exec.Cmd
doing io.Copy from the buffer; it caused a runtime crash, as
described in #16924:
``` docker-daemon cat the-tarball.xz | xz -d -c -q | docker-untar /path/to/... (aufs ) ```
When docker-untar side fails (like try to set xattr on aufs, or a broken
tar), invokeUnpack will be responsible to exhaust all input, otherwise
`xz` will be write pending for ever.
this change add a receive only channel to cmdStream, and will close it
to notify it's now safe to close the input stream;
in CmdStream the change to use Stdin / Stdout / Stderr keeps the
code simple, os/exec.Cmd will spawn goroutines and call io.Copy automatically.
the CmdStream is actually called in the same file only, change it
lowercase to mark as private.
[...]
INFO[0000] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=aufs version=1.8.2
DEBU[0006] Calling POST /build
INFO[0006] POST /v1.20/build?cgroupparent=&cpuperiod=0&cpuquota=0&cpusetcpus=&cpusetmems=&cpushares=0&dockerfile=Dockerfile&memory=0&memswap=0&rm=1&t=gentoo-x32&ulimits=null
DEBU[0008] [BUILDER] Cache miss
DEBU[0009] Couldn't untar /home/lib-docker-v1.8.2-tmp/tmp/docker-build316710953/stage3-x32-20151004.tar.xz to /home/lib-docker-v1.8.2-tmp/aufs/mnt/d909abb87150463939c13e8a349b889a72d9b14f0cfcab42a8711979be285537: Untar re-exec error: exit status 1: output: operation not supported
DEBU[0009] CopyFileWithTar(/home/lib-docker-v1.8.2-tmp/tmp/docker-build316710953/stage3-x32-20151004.tar.xz, /home/lib-docker-v1.8.2-tmp/aufs/mnt/d909abb87150463939c13e8a349b889a72d9b14f0cfcab42a8711979be285537/)
panic: runtime error: slice bounds out of range
goroutine 42 [running]:
bufio.(*Reader).fill(0xc208187800)
/usr/local/go/src/bufio/bufio.go:86 +0x2db
bufio.(*Reader).WriteTo(0xc208187800, 0x7ff39602d150, 0xc2083f11a0, 0x508000, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:449 +0x27e
io.Copy(0x7ff39602d150, 0xc2083f11a0, 0x7ff3960261f8, 0xc208187800, 0x0, 0x0, 0x0)
/usr/local/go/src/io/io.go:354 +0xb2
github.com/docker/docker/pkg/archive.func·006()
/go/src/github.com/docker/docker/pkg/archive/archive.go:817 +0x71
created by github.com/docker/docker/pkg/archive.CmdStream
/go/src/github.com/docker/docker/pkg/archive/archive.go:819 +0x1ec
goroutine 1 [chan receive]:
main.(*DaemonCli).CmdDaemon(0xc20809da30, 0xc20800a020, 0xd, 0xd, 0x0, 0x0)
/go/src/github.com/docker/docker/docker/daemon.go:289 +0x1781
reflect.callMethod(0xc208140090, 0xc20828fce0)
/usr/local/go/src/reflect/value.go:605 +0x179
reflect.methodValueCall(0xc20800a020, 0xd, 0xd, 0x1, 0xc208140090, 0x0, 0x0, 0xc208140090, 0x0, 0x45343f, ...)
/usr/local/go/src/reflect/asm_amd64.s:29 +0x36
github.com/docker/docker/cli.(*Cli).Run(0xc208129fb0, 0xc20800a010, 0xe, 0xe, 0x0, 0x0)
/go/src/github.com/docker/docker/cli/cli.go:89 +0x38e
main.main()
/go/src/github.com/docker/docker/docker/docker.go:69 +0x428
goroutine 5 [syscall]:
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:21 +0x1f
created by os/signal.init·1
/usr/local/go/src/os/signal/signal_unix.go:27 +0x35
Signed-off-by: Derek Ch <denc716@gmail.com>
this allows jsonfile logger to collect extra metadata from containers with
`--log-opt labels=label1,label2 --log-opt env=env1,env2`.
Extra attributes are saved into `attrs` attributes for each log data.
Signed-off-by: Daniel Dao <dqminh@cloudflare.com>
All the go-lint work forced any existing "Uid" -> "UID", but seems to
not have the same rules for Gid, so stat package has calls UID() and
Gid().
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Adds support for the daemon to handle user namespace maps as a
per-daemon setting.
Support for handling uid/gid mapping is added to the builder,
archive/unarchive packages and functions, all graphdrivers (except
Windows), and the test suite is updated to handle user namespace daemon
rootgraph changes.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
The `pkg/idtools` package supports the creation of user(s) for
retrieving /etc/sub{u,g}id ranges and creation of the UID/GID mappings
provided to clone() to add support for user namespaces in Docker.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This leverages recent additions to libkv enabling client
authentication via TLS so the discovery back-end can be locked
down with mutual TLS. Example usage:
docker daemon [other args] \
--cluster-advertise 192.168.122.168:2376 \
--cluster-store etcd://192.168.122.168:2379 \
--cluster-store-opt kv.cacertfile=/path/to/ca.pem \
--cluster-store-opt kv.certfile=/path/to/cert.pem \
--cluster-store-opt kv.keyfile=/path/to/key.pem
Signed-off-by: Daniel Hiltgen <daniel.hiltgen@docker.com>
progressreader.Broadcaster becomes broadcaster.Buffered and
broadcastwriter.Writer becomes broadcaster.Unbuffered.
The package broadcastwriter is thus renamed to broadcaster.
Signed-off-by: Tibor Vass <tibor@docker.com>
This patch creates interfaces in builder/ for building Docker images.
It is a first step in a series of patches to remove the daemon
dependency on builder and later allow a client-side Dockerfile builder
as well as potential builder plugins.
It is needed because we cannot remove the /build API endpoint, so we
need to keep the server-side Dockerfile builder, but we also want to
reuse the same Dockerfile parser and evaluator for both server-side and
client-side.
builder/dockerfile/ and api/server/builder.go contain implementations
of those interfaces as a refactoring of the current code.
Signed-off-by: Tibor Vass <tibor@docker.com>
Finally here is the patch to implement deferred deletion functionality.
Deferred deleted devices are marked as "Deleted" in device meta file.
First we try to delete the device and only if deletion fails and user has
enabled deferred deletion, device is marked for deferred deletion.
When docker starts up again, we go through list of deleted devices and
try to delete these again.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Although having a request ID available throughout the codebase is very
valuable, the impact of requiring a Context as an argument to every
function in the codepath of an API request, is too significant and was
not properly understood at the time of the review.
Furthermore, mixing API-layer code with non-API-layer code makes the
latter usable only by API-layer code (one that has a notion of Context).
This reverts commit de41640435, reversing
changes made to 7daeecd42d.
Signed-off-by: Tibor Vass <tibor@docker.com>
Conflicts:
api/server/container.go
builder/internals.go
daemon/container_unix.go
daemon/create.go
This fixes the case where directory is removed in
aufs and then the same layer is imported to a
different graphdriver.
Currently when you do `rm -rf /foo && mkdir /foo`
in a layer in aufs the files under `foo` would
only be be hidden on aufs.
The problems with this fix:
1) When a new diff is recreated from non-aufs driver
the `opq` files would not be there. This should not
mean layer differences for the user but still
different content in the tar (one would have one
`opq` file, the others would have `.wh.*` for every
file inside that folder). This difference also only
happens if the tar-split file isn’t stored for the
layer.
2) New files that have the filenames before `.wh..wh..opq`
when they are sorted do not get picked up by non-aufs
graphdrivers. Fixing this would require a bigger
refactoring that is planned in the future.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Before this patch libcontainer badly errored out with `invalid
argument` or `numerical result out of range` while trying to write
to cpuset.cpus or cpuset.mems with an invalid value provided.
This patch adds validation to --cpuset-cpus and --cpuset-mems flag along with
validation based on system's available cpus/mems before starting a container.
Signed-off-by: Antonio Murdaca <runcom@linux.com>
Absorb Swarm's discovery package in order to provide a common node
discovery mechanism to be used by both Swarm and networking code.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
This PR adds a "request ID" to each event generated, the 'docker events'
stream now looks like this:
```
2015-09-10T15:02:50.000000000-07:00 [reqid: c01e3534ddca] de7c5d4ca927253cf4e978ee9c4545161e406e9b5a14617efb52c658b249174a: (from ubuntu) create
```
Note the `[reqID: c01e3534ddca]` part, that's new.
Each HTTP request will generate its own unique ID. So, if you do a
`docker build` you'll see a series of events all with the same reqID.
This allow for log processing tools to determine which events are all related
to the same http request.
I didn't propigate the context to all possible funcs in the daemon,
I decided to just do the ones that needed it in order to get the reqID
into the events. I'd like to have people review this direction first, and
if we're ok with it then I'll make sure we're consistent about when
we pass around the context - IOW, make sure that all funcs at the same level
have a context passed in even if they don't call the log funcs - this will
ensure we're consistent w/o passing it around for all calls unnecessarily.
ping @icecrime @calavera @crosbymichael
Signed-off-by: Doug Davis <dug@us.ibm.com>
The tests added cover the case when the Writer field returns and error
and, related to that, when the number of written bytes is less than 0.
Signed-off-by: Federico Gimenez <fgimenez@coit.es>
This way provide both Time and TimeNano in the event. For the display of
the JSONMessage, use either, but prefer TimeNano Proving only TimeNano
would break Subscribers that are using the `Time` field, so both are set
for backwards compatibility.
The events logging uses nano formatting, but only provides a Unix()
time, therefor ordering may get lost in the output. Example:
```
2015-09-15T14:18:51.000000000-04:00 ee46febd64ac629f7de9cd8bf58582e6f263d97ff46896adc5b508db804682da: (from busybox) resize
2015-09-15T14:18:51.000000000-04:00 a78c9149b1c0474502a117efaa814541926c2ae6ec3c76607e1c931b84c3a44b: (from busybox) resize
```
By having a field just for Nano time, when set, the marshalling back to
`time.Unix(sec int64, nsec int64)` has zeros exactly where it needs to.
This does not break any existing use of jsonmessage.JSONMessage, but now
allows for use of `UnixNano()` and get event formatting that has
distinguishable order. Example:
```
2015-09-15T15:37:23.810295632-04:00 6adcf8ed9f5f5ec059a915466cd1cde86a18b4a085fc3af405e9cc9fecbbbbaf: (from busybox) resize
2015-09-15T15:37:23.810412202-04:00 6b7c5bfdc3f902096f5a91e628f21bd4b56e32590c5b4b97044aafc005ddcb0d: (from busybox) resize
```
Including tests for TimeNano and updated event API reference doc.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Also, use the channel to determine if the broadcaster is closed,
removing the redundant isClosed variable.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Before, this only waited for the download to complete. There was no
guarantee that the layer had been registered in the graph and was ready
use. This is especially problematic with v2 pulls, which wait for all
downloads before extracting layers.
Change Broadcaster to allow an error value to be propagated from Close
to the waiters.
Make the wait stop when the extraction is finished, rather than just the
download.
This also fixes v2 layer downloads to prefix the pool key with "layer:"
instead of "img:". "img:" is the wrong prefix, because this is what v1
uses for entire images. A v1 pull waiting for one of these operations to
finish would only wait for that particular layer, not all its
dependencies.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Allow to set the signal to stop a container in `docker run`:
- Use `--stop-signal` with docker-run to set the default signal the container will use to exit.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Without this change, there was a narrow race condition that would allow
writers to finish when there was still data left to write. This is
likely to be what was causing some integration tests to fail with
truncated pull output.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Allows people to create out-of-process graphdrivers that can be used
with Docker.
Extensions must be started before Docker otherwise Docker will fail to
start.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- utils_test.go and docker_utils_test.go
- Moved docker related function to docker_utils.go
- add a test for integration-cli/checker
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Calling runtime.Stack requires the buffer to be big enough to fit the
goroutines dump. If it's not big enough the dump will be truncated and
the value returned will be the same size as the buffer.
The code was changed to handle this situation and try again with a
bigger buffer. Each time the dump doesn't fit in the buffer its size is
doubled.
Signed-off-by: Cezar Sa Espinola <cezarsa@gmail.com>
Avoid duplicate definitions of NewSqliteConn when cgo isn't enabled, so
that we can at least build the daemon.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com> (github: nalind)
This means the writing to a WriteFlusher will flush in the same places
as it would if the broadcaster wasn't sitting in front of it.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
- Rename to Broadcaster
- Document exported types
- Change Wait function to just wait. Writing a message to the writer and
adding the writer to the observers list are now handled by separate
function calls.
- Avoid importing logrus (the condition where it was used should never
happen, anyway).
- Make writes non-blocking
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
Based on #12874 from Sam Abed <sam.abed@gmail.com>. His original commit
was brought up to date by manually porting the changes in pull.go into
the new code in pull_v1.go and pull_v2.go.
Fixes#8385
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
The practice of buffering to a tempfile during a pushing contributes massively
to slow V2 push performance perception. The protocol was actually designed to
avoid precalculation, supporting cut-through data push. This means we can
assemble the layer, calculate its digest and push to the remote endpoint, all
at the same time.
This should increase performance massively on systems with slow disks or IO
bottlenecks.
Signed-off-by: Stephen J Day <stephen.day@docker.com>
Maps rely on the keys being comparable.
Using an interface type as the map key is dangerous,
because some interface types are not comparable.
I talked about this in my "Stupid Gopher Tricks" talk:
https://talks.golang.org/2015/tricks.slide
In this case, if the user-provided writer is backed by a slice
(such as io.MultiWriter) then the code will panic at run time.
Signed-off-by: Andrew Gerrand <adg@golang.org>
This patch makes it such that plugin initialization is synchronized
based on the plugin name and not globally
Signed-off-by: Darren Shepherd <darren@rancher.com>
Signed-off-by: Don Kjer <don.kjer@gmail.com>
Changing vendor/src/github.com/docker/libnetwork to match lindenlab/libnetwork custom-host-port-ranges-1.7 branch
sysinfo struct was initialized at daemon startup to make sure
kernel configs such as device cgroup are present and error out if not.
The struct was embedded in daemon struct making impossible to detect
if some system config is changed at daemon runtime (i.e. someone
umount the memory cgroup). This leads to container's starts failure if
some config is changed at daemon runtime.
This patch moves sysinfo out of daemon and initilize and check it when
needed (daemon startup, containers creation, contaienrs startup for
now).
Signed-off-by: Antonio Murdaca <runcom@linux.com>
(cherry picked from commit 472b6f66e03f9a85fe8d23098dac6f55a87456d8)
Carried: #14015
If kernel is compiled with CONFIG_FAIR_GROUP_SCHED disabled cpu.shares
doesn't exist.
If kernel is compiled with CONFIG_CFQ_GROUP_IOSCHED disabled blkio.weight
doesn't exist.
If kernel is compiled with CONFIG_CPUSETS disabled cpuset won't be
supported.
We need to handle these conditions by checking sysinfo and verifying them.
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Some structures use int for sizes and UNIX timestamps. On some
platforms, int is 32 bits, so this can lead to the year 2038 issues and
overflows when dealing with large containers or layers.
Consistently use int64 to store sizes and UNIX timestamps in
api/types/types.go. Update related to code accordingly (i.e.
strconv.FormatInt instead of strconv.Itoa).
Use int64 in progressreader package to avoid integer overflow when
dealing with large quantities. Update related code accordingly.
Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
[pkg/archive] Update archive/copy path handling
- Remove unused TarOptions.Name field.
- Add new TarOptions.RebaseNames field.
- Update some of the logic around path dir/base splitting.
- Update some of the logic behind archive entry name rebasing.
[api/types] Add LinkTarget field to PathStat
[daemon] Fix stat, archive, extract of symlinks
These operations *should* resolve symlinks that are in the path but if the
resource itself is a symlink then it *should not* be resolved. This patch
puts this logic into a common function `resolvePath` which resolves symlinks
of the path's dir in scope of the container rootfs but does not resolve the
final element of the path. Now archive, extract, and stat operations will
return symlinks if the path is indeed a symlink.
[api/client] Update cp path hanling
[docs/reference/api] Update description of stat
Add the linkTarget field to the header of the archive endpoint.
Remove path field.
[integration-cli] Fix/Add cp symlink test cases
Copying a symlink should do just that: copy the symlink NOT
copy the target of the symlink. Also, the resulting file from
the copy should have the name of the symlink NOT the name of
the target file.
Copying to a symlink should copy to the symlink target and not
modify the symlink itself.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[v2: a separate aufs commit is merged into this one]
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kir@openvz.org>