Following #19995 and #17409 this PR enables skipping userns re-mapping
when creating a container (or when executing a command). Thus, enabling
privileged containers running side by side with userns remapped
containers.
The feature is enabled by specifying ```--userns:host```, which will not
remapped the user if userns are applied. If this flag is not specified,
the existing behavior (which blocks specific privileged operation)
remains.
Signed-off-by: Liron Levin <liron@twistlock.com>
When I use `docker exec -ti test ls`, I got error:
```
ERRO[0035] Handler for POST /v1.23/exec/9677ecd7aa9de96f8e9e667519ff266ad26a5be80e80021a997fff6084ed6d75/resize returned error: bad file descriptor
```
It's because `POST /exec/<id>/start` and
`POST /exec/<id>/resize` are asynchronous, it is
possible that exec process finishes and ternimal
is closed before resize. Then `console.Fd()` will
get a large invalid number and we got the above
error.
Fix it by adding synchronization between exec and
resize.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Now what we provide dynamic binaries for all plaforms,
we shouldn't try to run docker without udev sync support.
This change changes the previous warning to an Error,
unless the user explicitly overrides the warning, in
which case they're at their own risk.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Attach can hang forever if there is no data to send. This PR adds notification
of Attach goroutine about container stop.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
This change centralizes the template manipulation in a single package
and adds basic string functions to their execution.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Use token handler options for initialization.
Update auth endpoint to set identity token in response.
Update credential store to match distribution interface changes.
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
Previously docker used obsolete rfc3164 syslog format for syslog. rfc3164 explicitly
uses semicolon as a separator between 'TAG' and 'Content' section of the log message.
Docker uses semicolon as a separator between image name and version tag.
When {{.ImageName}} was used as a tag expression and contained ":" syslog parser mistreated
"tag" part of the image name as syslog message body, which resulted in incorrect "syslogtag" been reported by syslog
daemon.
Use of rfc5424 log format partually fixes the issue as it does not use semicolon as a separator.
However using default rfc5424 syslog format itroduces backward incompatability because rsyslog template keyword %syslogtag%
is parsed differently. In rfc3164 it uses the "TAG" part reported before the "pid" part. In rfc5424 it uses "appname" part reported
before the pid part, while tag part is introduced by %msgid% part.
For more information on rsyslog configuration properties see: http://www.rsyslog.com/doc/master/configuration/properties.html
Added two options to specify logging in either rfc5424, rfc3164 format or unix format omitting hostname in order to keep backwards compatability with
previous versions.
Signed-off-by: Solganik Alexander <solganik@gmail.com>
Correct creation of a non-existing WORKDIR during docker build to use
remapped root uid/gid on mkdir
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
I can reproduce this easily on one of my servers,
`docker exec -ti my_cont ls` will not print anything,
without `-t` it acts normally.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Once thin pool gets full, bad things can happen. Especially in case of xfs
it is possible that xfs keeps on retrying IO infinitely (for certain kind
of IO) and container hangs.
One way to mitigate the problem is that once thin pool is about to get full,
start failing some of the docker operations like pulling new images or
creation of new containers. That way user will get warning ahead of time
and can try to rectify it by creating more free space in thin pool. This
can be done either by deleting existing images/containers or by adding more
free space to thin pool.
This patch adds a new option dm.min_free_space to devicemapper graph
driver. Say one specifies dm.min_free_space=10%. This means atleast
10% of data and metadata blocks should be free in pool before new device
creation is allowed, otherwise operation will fail.
By default min_free_space is 10%. User can change it by specifying
dm.min_free_space=X% on command line. A value of 0% will disable the
check.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
This fixes an issue that caused the client to hang forever if the
process died before the code arrived to exit the `Kill` function.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Only register a container once it's successfully started. This avoids a
race condition where the daemon is killed while in the process of
calling `libcontainer.Container.Start`, and ends up killing -1.
There is a time window where the container `initProcess` is not set, and
its PID unknown. This commit fixes the race Engine side.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
Check whether or not the file system type of a mountpoint is aufs
by calling statfs() instead of parsing mountinfo. This assumes
that aufs graph driver does not allow aufs as a backing file
system.
Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
Previously, Windows layer diffs were written using a Windows-internal
format based on the BackupRead/BackupWrite Win32 APIs. This caused
problems with tar-split and tarsum and led to performance problems
in implementing methods such as DiffPath. It also was just an
unnecessary differentiation point between Windows and Linux.
With this change, Windows layer diffs look much more like their
Linux counterparts. They use AUFS-style whiteout files for files
that have been removed, and they encode all metadata directly in
the tar file.
This change only affects Windows post-TP4, since changes to the Windows
container storage APIs were necessary to make this possible.
Signed-off-by: John Starks <jostarks@microsoft.com>
This change adds "KernelMemory" to the /info endpoint and
shows a warning if KernelMemory is not supported by the kernel.
This makes it more consistent with the other memory-limit
options.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This allows a graph driver to provide a custom FileGetter for tar-split
to use. Windows will use this to provide a more efficient implementation
in a follow-up change.
Signed-off-by: John Starks <jostarks@microsoft.com>
I noticied an inconsistency when reviewing docker/pull/20692.
Changing Ip to IP and Nf to NF.
More info: The golang folks recommend that you keep the initials consistent:
https://github.com/golang/go/wiki/CodeReviewComments#initialisms.
Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
Moving all strings to the errors package wasn't a good idea after all.
Our custom implementation of Go errors predates everything that's nice
and good about working with errors in Go. Take as an example what we
have to do to get an error message:
```go
func GetErrorMessage(err error) string {
switch err.(type) {
case errcode.Error:
e, _ := err.(errcode.Error)
return e.Message
case errcode.ErrorCode:
ec, _ := err.(errcode.ErrorCode)
return ec.Message()
default:
return err.Error()
}
}
```
This goes against every good practice for Go development. The language already provides a simple, intuitive and standard way to get error messages, that is calling the `Error()` method from an error. Reinventing the error interface is a mistake.
Our custom implementation also makes very hard to reason about errors, another nice thing about Go. I found several (>10) error declarations that we don't use anywhere. This is a clear sign about how little we know about the errors we return. I also found several error usages where the number of arguments was different than the parameters declared in the error, another clear example of how difficult is to reason about errors.
Moreover, our custom implementation didn't really make easier for people to return custom HTTP status code depending on the errors. Again, it's hard to reason about when to set custom codes and how. Take an example what we have to do to extract the message and status code from an error before returning a response from the API:
```go
switch err.(type) {
case errcode.ErrorCode:
daError, _ := err.(errcode.ErrorCode)
statusCode = daError.Descriptor().HTTPStatusCode
errMsg = daError.Message()
case errcode.Error:
// For reference, if you're looking for a particular error
// then you can do something like :
// import ( derr "github.com/docker/docker/errors" )
// if daError.ErrorCode() == derr.ErrorCodeNoSuchContainer { ... }
daError, _ := err.(errcode.Error)
statusCode = daError.ErrorCode().Descriptor().HTTPStatusCode
errMsg = daError.Message
default:
// This part of will be removed once we've
// converted everything over to use the errcode package
// FIXME: this is brittle and should not be necessary.
// If we need to differentiate between different possible error types,
// we should create appropriate error types with clearly defined meaning
errStr := strings.ToLower(err.Error())
for keyword, status := range map[string]int{
"not found": http.StatusNotFound,
"no such": http.StatusNotFound,
"bad parameter": http.StatusBadRequest,
"conflict": http.StatusConflict,
"impossible": http.StatusNotAcceptable,
"wrong login/password": http.StatusUnauthorized,
"hasn't been activated": http.StatusForbidden,
} {
if strings.Contains(errStr, keyword) {
statusCode = status
break
}
}
}
```
You can notice two things in that code:
1. We have to explain how errors work, because our implementation goes against how easy to use Go errors are.
2. At no moment we arrived to remove that `switch` statement that was the original reason to use our custom implementation.
This change removes all our status errors from the errors package and puts them back in their specific contexts.
IT puts the messages back with their contexts. That way, we know right away when errors used and how to generate their messages.
It uses custom interfaces to reason about errors. Errors that need to response with a custom status code MUST implementent this simple interface:
```go
type errorWithStatus interface {
HTTPErrorStatusCode() int
}
```
This interface is very straightforward to implement. It also preserves Go errors real behavior, getting the message is as simple as using the `Error()` method.
I included helper functions to generate errors that use custom status code in `errors/errors.go`.
By doing this, we remove the hard dependency we have eeverywhere to our custom errors package. Yes, you can use it as a helper to generate error, but it's still very easy to generate errors without it.
Please, read this fantastic blog post about errors in Go: http://dave.cheney.net/2014/12/24/inspecting-errors
Signed-off-by: David Calavera <david.calavera@gmail.com>
The execdriver pipes setup uses OS pipes with fds so that they can be
chown'ed to the remapped root user for proper access. Recent flakiness
in certain short-lived tests (usually via the "exec" path) reveals that
the copy routines are not completing before exit/tear-down.
This fix adds synchronization and proper closure such that these
routines exit successfully.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This enhancement is to fix the wrong list results on
`docker ps` before and since filters specifying the non-running container.
Fixes issue #20431
Signed-off-by: Wen Cheng Ma <wenchma@cn.ibm.com>
Because devices will be bind-mounted instead of using `mknod`, we need
to make sure the source exists and filter the list by only those whose
source is a valid path/current device entry.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
There are five options 'debug' 'labels' 'cluster-store' 'cluster-store-opts'
and 'cluster-advertise' that can be reconfigured, configure any of these
options should not affect other options which may have configured in flags.
But this is not true, for example, I start a daemon with -D to enable the
debugging, and after a while, I want reconfigure the 'label', so I add a file
'/etc/docker/daemon.json' with content '"labels":["test"]' and send SIGHUP to daemon
to reconfigure the daemon, it work, but the debugging of the daemon is also diabled.
I don't think this is a expeted behaviour.
This patch also have some minor refactor of reconfiguration of cluster-advertiser.
Enable user to reconfigure cluster-advertiser without cluster-store in config file
since cluster-store could also be already set in flag, and we only want to reconfigure
the cluster-advertiser.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
When checking if we have the development files for libsystemd's journal
APIs, check for either 'libsystemd >= 209' and 'libsystemd-journal'. If
we find 'libsystemd', define the 'journald' tag, which defaults to using
the 'libsystemd.pc' file. If we find the older 'libsystemd-journal',
define both the 'journald' and 'journald_compat' tags, which causes the
'libsystemd-journal.pc' file to be consulted instead.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com> (github: nalind)
- Allow to filter containers by volume with `--filter volume=name` and `filter volume=/dest`.
- Show their names in the list with the custom format `{{ .Mounts }}`.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Add `--restart` flag for `update` command, so we can change restart
policy for a container no matter it's running or stopped.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
When the value for a configuration option in the file is `false`,
and the default value for a flag is `true`, we should not
take the value from the later as final value for the option,
because the user explicitly set `false`.
This change overrides the default value in the flagSet with
the value in the configuration file so we get the correct
result when we merge the two configurations together.
Signed-off-by: David Calavera <david.calavera@gmail.com>
This corrects `docker cp` behavior when user namespaces are enabled.
Instead of chown'ing copied-in files to real root (0,0), the code
queries for the remapped root uid & gid and sets the chown option
properly.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
- It reverts fa163f5619 plus a small change
in order to allow passing the global scope datastore
to libnetwork after damon boot.
Signed-off-by: Alessandro Boch <aboch@docker.com>
Also changes missing storage layer for container
RWLayer to a soft failure.
Fixes#20147
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
(cherry picked from commit 2798d7a6a681aee8995e87c9b68128e54876d2b5)
If user specifies --read-only flag it should not effect /dev/mqueue.
This is causing SELinux issues in docker-1.10. --read-only blows up
on SELinux enabled machines. Mounting /dev/mqueue read/only would also
blow up any tool that was going to use /dev/mqueue.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
inotify event is trigged immediately there's data written to disk.
But at the time that the inotify event is received, the json line might
not fully saved to disk. If the json decoder tries to decode in such
case, an io.UnexpectedEOF will be trigged.
We used to retry for several times to mitigate the io.UnexpectedEOF error.
But there are still flaky tests caused by the partial log entries.
The daemon knows exactly when there are new log entries emitted. We can
use the pubsub package to notify all the log readers instead of inotify.
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
try to fix broken test. will squash once tests pass
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
Fixes issues with layer remounting (e.g. a running container which then
has `docker cp` used to copy files in or out) by applying the same
refcounting implementation that exists in other graphdrivers like
overlay and aufs.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This fixes an issue where `docker run -v foo:/bar --volume-driver
<remote driver>` -> daemon restart -> `docker run -v foo:/bar` would
make a `local` volume after the restart instead of using the existing
volume from the remote driver.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Currently, when running a container with --ipc=host, if /dev/mqueue is
a standard directory on the hos the daemon will bind mount it allowing
the container to create/modify files on the host.
This commit forces /dev/mqueue to always be of type mqueue except when
the user explicitely requested something to be bind mounted to
/dev/mqueue.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Filters should not include stopped container if `-a` is not specified.
Right now, before and since filter are acting as --before and --since
deprecated flags. This commit is fixing that.
Signed-off-by: Vincent Demeester <vincent@sbr.pm>
Save was failing file integrity checksums due to bugs in both
Windows and Docker. This commit includes fixes to file time handling
in tarexport and system.chtimes that are necessary along with
the Windows platform fixes to correctly support save. With this
change, sysfile_backups for windowsfilter driver are no longer
needed, so that code is removed.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
This is done by moving the following types to api/types/config.go:
- ContainersConfig
- ContainerAttachWithLogsConfig
- ContainerWsAttachWithLogsConfig
- ContainerLogsConfig
- ContainerStatsConfig
Remove dependency on "version" package from types.ContainerStatsConfig.
Decouple the "container" router from the "daemon/exec" implementation.
* This is done by making daemon.ContainerExecInspect() return an interface{}
value. The same trick is already used by daemon.ContainerInspect().
Improve documentation for router packages.
Extract localRoute and router into separate files.
Move local.router to image.imageRouter.
Changes:
- Move local/image.go to image/image_routes.go.
- Move local/local.go to image/image.go
- Rename router to imageRouter.
- Simplify imports for image/image.go (remove alias for router package).
Merge router/local package into router package.
Decouple the "image" router from the actual daemon implementation.
Add Daemon.GetNetworkByID and Daemon.GetNetworkByName.
Decouple the "network" router from the actual daemon implementation.
This is done by replacing the daemon.NetworkByName constant with
an explicit GetNetworkByName method.
Remove the unused Daemon.GetNetwork method and the associated constants NetworkByID and NetworkByName.
Signed-off-by: Lukas Waslowski <cr7pt0gr4ph7@gmail.com>
Signed-off-by: David Calavera <david.calavera@gmail.com>
Fix root directory of the mountpoint being owned by real root. This is
unique to ZFS because of the way file mountpoints are created using the
ZFS tooling, and the remapping that happens at layer unpack doesn't
impact this root (already created) holding directory for the layer.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
mqueue can not be mounted on the host os and then shared into the container.
There is only one mqueue per mount namespace, so current code ends up leaking
the /dev/mqueue from the host into ALL containers. Since SELinux changes the
label of the mqueue, only the last container is able to use the mqueue, all
other containers will get a permission denied. If you don't have SELinux protections
sharing of the /dev/mqueue allows one container to interact in potentially hostile
ways with other containers.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
For btrfs driver, in d.Create(), Get() of parentDir is called but not followed
by Put().
If we apply SElinux mount label, we need to mount btrfs subvolumes in d.Get(),
without a Put() would end up with a later Remove() failure on
"Device resourse is busy".
This calls the subvolume helper function directly in d.Create().
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Fix error message for `--net container:b` and `--ipc container:b`,
container `b` is a restarting container.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
When the daemon shutdown ungracefully, it will left the running
containers' rootfs still be mounted. This will cause some error
when trying to remove the containers.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
duplicate dot in user namespaces error message:
$ docker run -ti --net=host ubuntu /bin/bash
docker: Error response from daemon: Cannot share the host or a
container's network namespace when user namespaces are enabled..
Signed-off-by: Liron Levin <liron@twistlock.com>
Currently some commands including `kill`, `pause`, `restart`, `rm`,
`rmi`, `stop`, `unpause`, `udpate`, `wait` will print a lot of error
message on client side, with a lot of redundant messages, this commit is
trying to remove the unuseful and redundant information for user.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
This adds an npipe protocol option for Windows hosts, akin to unix
sockets for Linux hosts. This should become the default transport
for Windows, but this change does not yet do that.
It also does not add support for the client side yet since that
code is in engine-api, which will have to be revendored separately.
Signed-off-by: John Starks <jostarks@microsoft.com>
Currently, daemonbuilder package (part of daemon) implemented the
builder backend. However, it was a very thin wrapper around daemon
methods and caused an implementation dependency for api/server build
endpoint. api/server buildrouter should only know about the backend
implementing the /build API endpoint.
Removing daemonbuilder involved moving build specific methods to
respective files in the daemon, where they fit naturally.
Signed-off-by: Anusha Ragunathan <anusha@docker.com>
Most storage drivers call graphdriver.GetFSMagic(home),
it is more clean to easy to maintain. So btrfs need to
adopt such change.
Signed-off-by: Kai Qiang Wu(Kennan) <wkqwu@cn.ibm.com>
Updated documentation to reflect the new State property in the inspect remote api
Updated API changes for 1.23
Signed-off-by: Marius Gundersen <me@mariusgundersen.net>
SetMaxThreads from runtime/debug in Golang is called to set max threads
value to 90% of /proc/sys/kernel/threads-max
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
The restart container has already prepared the mountpoint, there is
no need to do that again. This can speed up the daemon start if
the restart container has a volume and the volume driver is not
available.
Signed-off-by: Lei Jitang <leijitang@huawei.com>
Signed-off-by: Jussi Nummelin <jussi.nummelin@gmail.com>
Changed buffer size to 1M and removed unnecessary fmt call
Signed-off-by: Jussi Nummelin <jussi.nummelin@gmail.com>
Updated docs for the new fluentd opts
Signed-off-by: Jussi Nummelin <jussi.nummelin@gmail.com>
Currently if we exec a restarting container, client will fail silently,
and daemon will print error that container can't be found which is not a
very meaningful prompt to user.
This commit will stop user from exec a restarting container and gives
more explicit error message.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
We cannot rely on the tar command for this type of operation because tar
versions, flags, and functionality can very from distro to distro.
Since this is in the container execution path it is not safe to have
this as a dependency from dockers POV where the user cannot change the
fact that docker is adding these pre and post mount commands.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
daemon cache was getting the whole image map and then iterating through
it to find children. This information is already stored in the image
store.
Prior to this change building the docker repo with a full cache took 30
seconds.
After it takes between 15 seconds or less (As low as 9 seconds).
This is an improvement on docker 1.9.1 which hovered around 17 seconds.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
dockerinit has been around for a very long time. It was originally used
as a way for us to do configuration for LXC containers once the
container had started. LXC is no longer supported, and /.dockerinit has
been dead code for quite a while. This removes all code and references
in code to dockerinit.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
https://github.com/docker/libnetwork/pull/810 provides the more complete
solution for moving the Port-mapping ownership away from endpoint and
into Sandbox. But, this PR makes the best use of existing libnetwork
design and get a step closer to the gaol.
Signed-off-by: Madhu Venugopal <madhu@docker.com>
This makes it so when calling `docker run --rm`, or `docker rm -v`, only
volumes specified without a name, e.g. `docker run -v /foo` instead of
`docker run -v awesome:/foo` are removed.
Note that all volumes are named, some are named by the user, some get a
generated name. This is specifically about how the volume was specified
on `run`, assuming that if the user specified it with a name they expect
it to persist after the container is cleaned up.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- Return an error if any of the keys don't match valid flags.
- Fix an issue ignoring merged values as named values.
- Fix tlsverify configuration key.
- Fix bug in mflag to avoid panics when one of the flag set doesn't have any flag.
Signed-off-by: David Calavera <david.calavera@gmail.com>