WIP: Osxfs caching updates (#2883)

* Update the osxfs documentation with recent details about caching.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Add a note about 'cached' to the volumes tutorial.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Add a document giving a detailed specification of osxfs caching.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Americanize spelling in the osxfs-caching document.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* More osxfs benchmark details for 'go list'.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Remove mentions of the blog post.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* editorial, topic structure changes, x-refs

Signed-off-by: Victoria Bialas <victoria.bialas@docker.com>

* linked to new topic also where bind mounts are discussed in Namespaces

Signed-off-by: Victoria Bialas <victoria.bialas@docker.com>

* added more tags

Signed-off-by: Victoria Bialas <victoria.bialas@docker.com>

* Decapitalize descriptions

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* "releease" ~> "release"

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Use @avsm's suggested rewording for the osxfs-caching introduction.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Add an example showing how to use `cached`, `consistent`, etc.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* Add a link to the user-guided caching blog post.

Signed-off-by: Jeremy Yallop <yallop@docker.com>

* added examples heading, more x-refs to blog post, docker run

Signed-off-by: Victoria Bialas <victoria.bialas@docker.com>

* escaped second dash in --volume long version of command

Signed-off-by: Victoria Bialas <victoria.bialas@docker.com>

* fixed double dashes on volume option to render properly

Signed-off-by: Victoria Bialas <victoria.bialas@docker.com>
This commit is contained in:
yallop 2017-05-05 19:49:00 +01:00 committed by Victoria Bialas
parent cfea1e9ba0
commit 42d9c55a1e
4 changed files with 304 additions and 66 deletions


@ -2208,6 +2208,8 @@ manuals:
title: Networking
- path: /docker-for-mac/osxfs/
title: File system sharing
- path: /docker-for-mac/osxfs-caching/
title: Performance tuning for volume mounts (shared filesystems)
- path: /docker-for-mac/troubleshoot/
title: Logs and troubleshooting
- path: /docker-for-mac/faqs/


@ -0,0 +1,226 @@
---
description: Osxfs caching
keywords: mac, osxfs, volume mounts, docker run -v, performance
title: Performance tuning for volume mounts (shared filesystems)
toc_max: 4
toc_min: 2
---
[Docker 17.04 CE
Edge](https://docs.docker.com/edge/#docker-ce-edge-new-features) adds support
for two new flags to the [docker run `-v`,
`--volume`](https://docs.docker.com/engine/reference/run/#volume-shared-filesystems)
option, `cached` and `delegated`, that can significantly improve the performance
of mounted volume access on Docker for Mac. These options begin to solve some of
the challenges discussed in [Performance issues, solutions, and
roadmap](/docker-for-mac/osxfs.md#performance-issues-solutions-and-roadmap).
> **Tip:** Release notes for Docker CE Edge 17.04 are [here](https://github.com/moby/moby/releases/tag/v17.04.0-ce), and the associated pull request for the additional `docker run -v` flags is [here](https://github.com/moby/moby/pull/31047).
## Performance implications of host-container file system consistency
With Docker distributions now available for an increasing number of
platforms, including macOS and Windows, generalizing mount semantics
during container run is a necessity to enable workload optimizations.
The current implementations of mounts on Linux provide a consistent
view of a host directory tree inside a container: reads and writes
performed either on the host or in the container are immediately
reflected in the other environment, and file system events (`inotify`,
`FSEvents`) are consistently propagated in both directions.
On Linux, these guarantees carry no overhead, since the underlying VFS is
shared directly between host and container. However, on macOS (and
other non-Linux platforms) there are significant overheads to
guaranteeing perfect consistency, since messages describing file system
actions must be passed synchronously between container and host. The
current implementation is sufficiently efficient for most tasks, but
with certain types of workloads the overhead of maintaining perfect
consistency can result in significantly worse performance than a
native (non-Docker) environment. For example,
* running `go list ./...` in the bind-mounted `docker/docker` source tree
takes around 26 seconds
* writing 100MB in 1k blocks into a bind-mounted directory takes
around 23 seconds
* running `ember build` on a freshly created (i.e. empty) application
involves around 70000 sequential syscalls, each of which translates
into a request and response passed between container and host.
Optimizations to reduce latency throughout the stack have brought
significant improvements to these workloads, and a few further
optimization opportunities remain. However, even when latency is
minimized, the constraints of maintaining consistency mean that these
workloads remain unacceptably slow for some use cases.
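The write benchmark above can be reproduced with a short sketch like the following; the directory paths are illustrative:

```shell
# Establish a native (non-Docker) baseline for the write benchmark above:
# 100MB written in 1kB blocks to a host directory.
mkdir -p /tmp/osxfs-bench
time dd if=/dev/zero of=/tmp/osxfs-bench/out bs=1024 count=102400

# The same write through a bind mount can then be timed for comparison
# (requires Docker for Mac; the image and paths are illustrative):
#   docker run --rm -v /tmp/osxfs-bench:/bench alpine \
#     dd if=/dev/zero of=/bench/out bs=1024 count=102400
```

Comparing the two timings gives a rough measure of the bind-mount overhead on your own machine.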
## Tuning with consistent, cached, and delegated configurations
**_Fortunately, in many cases where the performance degradation is most
severe, perfect consistency between container and host is unnecessary._**
In particular, in many cases there is no need for writes performed in a
container to be immediately reflected on the host. For example, while
interactive development requires that writes to a bind-mounted directory
on the host immediately generate file system events within a container,
there is no need for writes to build artifacts within the container to
be immediately reflected on the host file system. Distinguishing between
these two cases makes it possible to significantly improve performance.
There are three broad scenarios to consider, which let you dial in the level
of consistency you need. In each case, the container has an internally
consistent view of bind-mounted directories, but in two cases temporary
discrepancies are allowed between container and host.
* `consistent`: perfect consistency
(host and container have an identical view of the mount at all times)
* `cached`: the host's view is authoritative
(permit delays before updates on the host appear in the container)
* `delegated`: the container's view is authoritative
(permit delays before updates on the container appear in the host)
## Examples
Each of these configurations (`consistent`, `cached`, `delegated`) can be specified as a suffix to the [`-v`](https://docs.docker.com/engine/reference/run/#volume-shared-filesystems)
option of [`docker run`](https://docs.docker.com/engine/reference/run.md).
For example, to bind-mount `/Users/yallop/project` in a container under
the path `/project`, you might run the following command:
```bash
docker run -v /Users/yallop/project:/project:cached alpine command
```
The caching configuration can be varied independently for each bind mount,
so you can mount each directory in a different mode:
```bash
docker run -v /Users/yallop/project:/project:cached \
  -v /host/another-path:/mount/another-point:consistent \
  alpine command
```
## Semantics
The semantics of each configuration are described as a set of guarantees
relating to the observable effects of file system operations. In this
specification, "host" refers to the file system of the user's Docker
client.
### delegated
The `delegated` configuration provides the weakest set of guarantees.
For directories mounted with `delegated` the container's view of the
file system is authoritative, and writes performed by containers may not
be immediately reflected on the host file system. As with NFS
asynchronous mode, for example, if a running container with a `delegated`
bind mount crashes, then writes may be lost.
However, by relinquishing consistency, `delegated` mounts can offer
significantly better performance than the other configurations. Where
the data written is ephemeral or readily reproducible (e.g. scratch
space or build artifacts) `delegated` may be optimal for a user's
workload.
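As a sketch of this pattern, a build that reads sources and writes artifacts might combine two configurations; the paths and the build command here are illustrative placeholders, not part of any real project:

```shell
# Sources: host edits should be visible in the container promptly,
# so 'cached' (host authoritative) is appropriate.
# Artifacts: ephemeral and reproducible, so 'delegated'
# (container authoritative) is a good fit.
docker run --rm \
  -v /Users/yallop/project/src:/project/src:cached \
  -v /Users/yallop/project/_build:/project/_build:delegated \
  alpine sh -c 'your-build-command'   # placeholder build command
```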
A `delegated` mount offers the following guarantees, which are presented
as constraints on the container run-time:
1. If the implementation offers file system events, the container state
as it relates to a specific event **_must_** reflect the host file system
state at the time the event was generated if no container modifications
pertain to related file system state.
2. If flush or sync operations are performed, relevant data **_must_** be
written back to the host file system. Between flush or sync
operations containers **_may_** cache data written, metadata modifications,
and directory structure changes.
3. All containers hosted by the same runtime **_must_** share a consistent
cache of the mount.
4. When any container sharing a `delegated` mount terminates, changes
to the mount **_must_** be written back to the host file system. If this
writeback fails, the container's execution **_must_** fail via exit code
and/or Docker event channels.
5. If a `delegated` mount is shared with a `cached` or a `consistent`
mount, those portions that overlap **_must_** obey `cached` or `consistent`
mount semantics, respectively.
Besides these constraints, the `delegated` configuration offers the
container runtime a degree of flexibility:
6. Containers **_may_** retain file data and metadata (including directory
structure, existence of nodes, etc) indefinitely and this cache **_may_**
desynchronize from the file system state of the host. Implementors are
encouraged to expire caches when host file system changes occur but,
due to platform limitations, may be unable to do this in any specific
timeframe.
7. If changes to the mount source directory are present on the host
file system, those changes **_may_** be lost when the `delegated` mount
synchronizes with the host source directory.
8. Behaviors 6-7 **do not** apply to the file types of socket, pipe, or device.
### cached
The `cached` configuration provides all the guarantees of the `delegated`
configuration, and some additional guarantees around the visibility of writes
performed by containers. As such, `cached` typically improves the performance
of read-heavy workloads, at the cost of some temporary inconsistency between the
host and the container.
For directories mounted with `cached`, the host's view of
the file system is authoritative; writes performed by containers are immediately
visible to the host, but there may be a delay before writes performed on the
host are visible within containers.
>**Tip:** To learn more about `cached`, see the article on
[User-guided caching in Docker for Mac](https://blog.docker.com/2017/05/user-guided-caching-in-docker-for-mac/).
1. Implementations **_must_** obey `delegated` Semantics 1-5.
2. If the implementation offers file system events, the container state
as it relates to a specific event **_must_** reflect the host file system
state at the time the event was generated.
3. Container mounts **_must_** perform metadata modifications, directory
structure changes, and data writes consistently with the host file
system, and **_must not_** cache data written, metadata modifications, or
directory structure changes.
4. If a `cached` mount is shared with a `consistent` mount, those portions
that overlap **_must_** obey `consistent` mount semantics.
Some of the flexibility of the `delegated` configuration is retained,
namely:
5. Implementations **_may_** permit `delegated` Semantics 6.
### consistent
The `consistent` configuration places the most severe restrictions on
the container run-time. For directories mounted with `consistent` the
container and host views are always synchronized: writes performed
within the container are immediately visible on the host, and writes
performed on the host are immediately visible within the container.
The `consistent` configuration most closely reflects the behavior of
bind mounts on Linux. However, the overheads of providing strong
consistency guarantees make it unsuitable for use cases where
performance is a priority and perfect consistency is not.
1. Implementations **_must_** obey `cached` Semantics 1-4.
2. Container mounts **_must_** reflect metadata modifications, directory
structure changes, and data writes on the host file system immediately.
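For example, interactive development, where host edits must immediately generate `inotify` events inside the container, calls for `consistent`; this sketch uses an illustrative path and a hypothetical watcher command:

```shell
# Host edits under /Users/yallop/project must appear in the container
# immediately so that the watcher can react to file system events.
docker run --rm \
  -v /Users/yallop/project:/project:consistent \
  alpine sh -c 'your-watch-command /project'   # placeholder watcher
```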
### default
The `default` configuration is identical to the `consistent`
configuration except for its name. Crucially, this means that `cached`
Semantics 4 and `delegated` Semantics 5 that require strengthening
overlapping directories do not apply to `default` mounts. This is the
default configuration if no caching flags are supplied.


@ -1,5 +1,5 @@
---
description: Osxfs
keywords: mac, osxfs
redirect_from:
- /mackit/osxfs/
@ -27,8 +27,8 @@ Mac software dubiously relies on case-insensitivity to function.
### Access control
`osxfs`, and therefore Docker, can access only those file system resources that
the Docker for Mac user has access to. `osxfs` does not run as `root`. If the macOS
user is an administrator, `osxfs` inherits those administrator privileges. We
are still evaluating which privileges to drop in the file system process to
balance security and ease-of-use. `osxfs` performs no additional permissions
checks and enforces no extra access control on accesses made through it. All
@ -69,6 +69,8 @@ VM, an attempt to bind mount it will fail rather than create it in the VM. Paths
that already exist in the VM and contain files are reserved by Docker and cannot
be exported from macOS.
>Please see **[Performance tuning for volume mounts (shared filesystems)](/docker-for-mac/osxfs-caching.md)** to learn about new configuration options available with the Docker 17.04 CE Edge release.
### Ownership
Initially, any containerized process that requests ownership metadata of
@ -149,17 +151,18 @@ between macOS userspace processes and the macOS kernel.
### Performance issues, solutions, and roadmap
>Please see **[Performance tuning for volume mounts (shared filesystems)](/docker-for-mac/osxfs-caching.md)** to learn about new configuration options available with the Docker 17.04 CE Edge release.
With regard to reported performance issues ([GitHub issue 77: File access in
mounted volumes extremely slow](https://github.com/docker/for-mac/issues/77)),
and a similar thread on [Docker for Mac forums on topic: File access in mounted
volumes extremely
slow](https://forums.docker.com/t/file-access-in-mounted-volumes-extremely-slow-cpu-bound/),
this topic provides an explanation of the issues, recent progress in addressing
them, how the community can help us, and what you can expect in the
future. This explanation derives from a [post about understanding
performance](https://forums.docker.com/t/file-access-in-mounted-volumes-extremely-slow-cpu-bound/8076/158?u=orangesnap)
by David Sheets (@dsheets) on the [Docker development
team](https://forums.docker.com/groups/Docker) to the forum topic just
mentioned. We want to surface it in the documentation for wider reach.
@ -172,7 +175,7 @@ file system server in Docker for Mac. File system APIs are very wide (20-40
message types) with many intricate semantics involving on-disk state, in-memory
cache state, and concurrent access by multiple processes. Additionally, `osxfs`
integrates a mapping between macOS's FSEvents API and Linux's `inotify` API
which is implemented inside of the file system itself, complicating matters
further (cache behavior in particular).
At the highest level, there are two dimensions to file system performance:
@ -186,65 +189,64 @@ Latency is the time it takes for a file system call to complete. For instance,
the time between a thread issuing a write in a container and resuming with the
number of bytes written. With a classical block-based file system, this latency
is typically under 10μs (microseconds). With `osxfs`, latency is presently
around 130μs for most operations or 13× slower. For workloads which demand many
sequential roundtrips, this results in significant observable slowdown.
Reducing the latency requires shortening the data path from a Linux system call to
macOS and back again. This requires tuning each component in the data path in
turn -- some of which require significant engineering effort. Even if we achieve
a huge latency reduction of 65μs/roundtrip, we will still "only" see a doubling
of performance. This is typical of performance engineering, which requires
significant effort to analyze slowdowns and develop optimized components. We
know a number of approaches that will probably reduce the roundtrip time but we
haven't implemented all those improvements yet (more on this below in
[What you can do](osxfs.md#what-you-can-do)).
A second approach to improving performance is to reduce the number of
roundtrips by caching data. Recent versions of Docker for Mac (17.04 onwards)
include caching support that brings significant (2-4×) improvements to many
applications. Much of the overhead of osxfs arises from the requirement to
keep the container's and the host's view of the file system consistent, but
full consistency is not necessary for all applications and relaxing the
constraint opens up a number of opportunities for improved performance.
At present there is support for read caching, with which the container's view
of the file system can temporarily drift apart from the authoritative view on
the host. Further caching developments, including support for write caching,
are planned.
A [detailed description of the behavior in various caching configurations](osxfs-caching)
is available.
#### What we are doing
We continue to actively work on increasing caching and on reducing the
file system data path latency. This requires significant analysis of file
system traces and speculative development of system improvements to try to
address specific performance issues. Perhaps surprisingly, application
workload can have a huge effect on performance. As an example, here are two
different use cases contributed on the
[forum topic](https://forums.docker.com/t/file-access-in-mounted-volumes-extremely-slow-cpu-bound/)
and how their performance differs and suffers due to latency, caching, and
coherence:
1. A rake example (see below) appears to attempt to access 37000+
different files that don't exist on the shared volume. Even with a 2× speedup
via latency reduction this use case will still seem "slow".
With caching enabled the performance increases around 3.5×, as described in
the [user-guided caching post](https://blog.docker.com/2017/05/user-guided-caching-in-docker-for-mac/).
We expect to see further performance improvements for rake with a "negative dcache" that
keeps track of, in the Linux kernel itself, the files that do not exist.
However, even this is not sufficient for the first time rake is run on a
shared directory. To handle that case, we actually need to develop a Linux
kernel patch which negatively caches all directory entries not in a
specified set -- and this cache must be kept up-to-date in real-time with the macOS
file system state even in the presence of missing macOS FSEvents messages and
so must be invalidated if macOS ever reports an event delivery failure.
2. Running `ember build` in a shared file system results in ember creating many
different temporary directories and performing lots of intermediate activity
within them. An empty ember project is over 300MB. This usage pattern does not
require coherence between Linux and macOS, and will be significantly improved by
write caching.
These two examples come from performance use cases contributed by users and they
are incredibly helpful in prioritizing aspects of file system performance to
@ -254,24 +256,17 @@ work on next.
Under development, we have:
1. A growing performance test suite of real world use cases (more on this below
in What you can do)
2. Further caching improvements, including negative, structural, and write-back
caching, and lazy cache invalidation.
3. A Linux kernel patch to reduce data path latency by 2/7 copies and 2/5
context switches
4. Increased macOS integration to reduce the latency between the hypervisor and
the file system server
#### What you can do
@ -310,16 +305,16 @@ can be easily tracked.
#### What you can expect
We will continue to work toward an optimized shared file system implementation
on the Edge channel of Docker for Mac.
You can expect some of the performance improvement work mentioned above to reach
the Edge channel in the coming release cycles.
In due course, we will open source all of our shared file system components. At
that time, we would be very happy to collaborate with you on improving the
implementation of `osxfs` and related software.
We also plan to write up and publish further details of shared file system
performance analysis and improvement on the Docker blog. Look for or nudge
@dsheets about those articles, which should serve as a jumping off point for
understanding the system, measuring it, or contributing to it.


@ -151,6 +151,21 @@ $ docker run -d -P --name web -v /src/webapp:/webapp:ro training/webapp python a
Here you've mounted the same `/src/webapp` directory but you've added the `ro`
option to specify that the mount should be read-only.
You can also relax the consistency requirements of a mounted directory
to improve performance by adding the `cached` option:
```bash
$ docker run -d -P --name web -v /src/webapp:/webapp:cached training/webapp python app.py
```
The `cached` option typically improves the performance of read-heavy workloads
on Docker for Mac, at the cost of some temporary inconsistency between the host
and the container. On other platforms, `cached` currently has no effect. The
article [User-guided caching in Docker for
Mac](https://blog.docker.com/2017/05/user-guided-caching-in-docker-for-mac/)
gives more details about the behavior of `cached` on macOS.
>**Note**: The host directory is, by its nature, host-dependent. For this
>reason, you can't mount a host directory from `Dockerfile`; the `VOLUME`
>instruction does not support passing a `host-dir`, because built images