build: add cache introduction docs

Signed-off-by: Justin Chadwell <me@jedevc.com>
2022-09-06 12:15:10 +01:00 · 2022-09-06 12:15:10 +01:00 · 55ebeda0cc
parent cb275a78ef
commit 55ebeda0cc
2 changed files with 285 additions and 0 deletions
--- a/_data/toc.yaml
+++ b/_data/toc.yaml
@@ -1391,6 +1391,8 @@ manuals:
      section:
        - path: /build/building/packaging/
          title: Packaging your software
+        - path: /build/building/cache/
+          title: Optimizing builds with cache management
        - sectiontitle: Choosing a build driver
          section:
            - path: /build/building/drivers/
--- a/build/building/cache.md
+++ b/build/building/cache.md
@ -0,0 +1,283 @@
+---
+title: Optimizing builds with cache management
+description: Improve your build speeds by taking advantage of the built-in cache
+keywords: build, buildx, buildkit, dockerfile, image layers, build instructions, build context
+---
+
+You'll rarely build a Docker image just once. Most of the time, you'll want to
+build it again at some point, whether that's for the next release of your
+software or, more likely, on your local development machine for testing.
+Because building images is such a frequent operation, Docker provides several
+tools to speed up your builds for when you inevitably need to run them again.
+
+The main approach to improving your build's speed is to take advantage of
+Docker's build cache.
+
+## How does the build cache work?
+
+Docker's build cache is quite simple to understand. First, remember the
+instructions that make up your Dockerfile. For example, this build could be
+used to create a C/C++ program:
+
+```dockerfile
+FROM ubuntu:latest
+
+RUN apt-get update && apt-get install -y build-essential
+COPY . /src/
+WORKDIR /src/
+RUN make build
+```
+
+Each instruction in this Dockerfile (roughly) translates into a layer in your
+final image. You can think of layers as a stack, with each layer adding more
+content to the filesystem on top of the layer before it:
+
+```
+stack diagram
+```
+
+Now, suppose one of the layers changes. For example, you make a change to your
+C/C++ program in `main.c`. Because the files being copied have changed, the
+cache for the `COPY` layer is invalidated, and the `COPY` instruction has to
+run again to pick up your change:
+
+```
+stack diagram with COPY layer cache invalidated
+```
+
+Since that file changed, we now also need to run our `make build` step again,
+so that the changes are built into our program. Because the cache for `COPY`
+was invalidated, the cache for all the layers after it is invalidated too,
+including our `RUN make build`, so it will run again:
+
+```
+stack diagram with COPY + other layer cache invalidated
+```
+
+That's pretty much all there is to understanding the cache: once a layer
+changes, all the layers after it need to be rebuilt as well (even if they
+wouldn't produce anything different, they still need to re-run).
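+
+To make the rule concrete, here's a Dockerfile like the one above, annotated
+with what we'd expect to happen on the next build if the only change is an edit
+to `main.c`. The comments are illustrative, not real builder output, and assume
+that nothing else in the build context or the base image has changed:
+
+```dockerfile
+# Base image unchanged: reused as-is
+FROM ubuntu:latest
+# Nothing before this step changed: the cached layer is reused
+RUN apt-get update && apt-get install -y build-essential
+# main.c changed, so this layer's cache is invalidated and COPY runs again
+COPY . /src/
+# Every step after the invalidated COPY layer runs again, even though
+# these instructions are identical to the previous build
+WORKDIR /src/
+RUN make build
+```
+
+When building with BuildKit, the steps that were reused are typically marked
+`CACHED` in the build output, which is a quick way to check which layers hit
+the cache.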
+
+> **Note**
+>
+> Suppose you have a `RUN apt-get update && apt-get upgrade -y` step in your
+> Dockerfile to upgrade all the software packages in your Debian-based image to
+> the latest version.
+>
+> Unfortunately, this doesn't mean that the images you build are *always* up to
+> date! If you built the image a week ago, then the results of your `apt-get`
+> were cached, and will be re-used if you re-run the build now! To force a
+> re-run, make sure that a layer before it has changed, for example, by pulling
+> the latest version of the image used in `FROM` (`docker build --pull`), or
+> disable the cache for the whole build with `docker build --no-cache`.
+
+## How can I use the cache efficiently?
+
+Now that we've seen how the cache works, we can look at how to make the most
+of it. While the cache automatically works on any build that you run, you can
+often refactor your Dockerfile to get even better performance and save
+precious seconds (or even minutes) off of your builds!
+
+### Order your layers
+
+Putting the commands in your Dockerfile into a logical order is a great place
+to start. Because a change in an earlier step will rebuild all the later steps,
+we want to put our most expensive steps near the beginning, and our most
+frequently changing steps near the end, to avoid unnecessarily rebuilding
+layers that haven't changed.
+
+Let's take a simple example, a Dockerfile snippet that runs a JavaScript build
+from the source files in the current directory:
+
+```dockerfile
+FROM node
+WORKDIR /app
+COPY . .
+RUN npm install
+RUN npm run build
+```
+
+This isn't very efficient. If we update our `package.json` file, we'll install
+all of our dependencies and run the build from scratch, as intended. But if we
+update `src/main.js`, we'll install all of our dependencies again, even though
+they haven't changed!
+
+We can improve this so that dependencies are only reinstalled when the
+relevant files have changed:
+
+```dockerfile
+FROM node
+WORKDIR /app
+COPY package.json package-lock.json ./
+RUN npm install
+COPY . .
+RUN npm run build
+```
+
+What we've done is divide up our `COPY` command to only copy over our
+`package.json` and `package-lock.json` before the `npm install`. This means
+that we'll only re-run `npm install` when those files change, rather than
+whenever any file in our local directory changes!
+
+### Keep layers small
+
+One of the easiest things you can do to keep your images building quickly is to
+just put less stuff into your build! This keeps your image layers thin and
+lean, which means that not only will your cache stay smaller, but there should
+be fewer things that could be out-of-date and need rebuilding!
+
+To get started, here are a few tips and tricks:
+
+- Don't `COPY` unnecessary files into your build environment!
+
+  Running a command like `COPY . /src` will `COPY` your entire build context
+  into the image! If you've got logs, package manager artifacts, or even
+  previous build results in your current directory, those will also be copied
+  over, which will make your image larger than it needs to be (especially as
+  those files are usually not helpful)!
+  
+  You can avoid copying these files over by `COPY`ing only the files and
+  directories that you want. For example, you might only need a `Makefile`
+  and your `src` directory. If that's all you need, you can split up your
+  `COPY` into `COPY ./Makefile /src/` and `COPY ./src /src/src/`. If you do want
+  the entire current directory, but want to ignore the unnecessary files in it,
+  you can set up a [`.dockerignore` file](https://docs.docker.com/engine/reference/builder/#dockerignore-file)
+  to make sure that those files won't be copied over!
+
+- Use your package manager wisely!
+
+  No matter what operating system or programming language you choose to use as
+  your build's base image, most Docker images have some sort of package manager
+  to help install software into your image. For example, `debian` has `apt`,
+  `alpine` has `apk`, `python` has `pip`, `node` has `npm`, and so on.
+
+  Be careful when installing packages! Only install the packages that you
+  need: if you're not going to use them, don't install them. Remember that
+  this might be a different list for your local development environment and
+  your production environment. You can use multi-stage builds (which we'll
+  cover later) to split these up efficiently.
+
+- Try using the `RUN` command's dedicated cache!
+
+  The `RUN` command supports a specialized cache, which can be used when you
+  need a more fine-grained cache between runs. For example, when installing
+  packages, you don't always need to fetch all of your packages from the
+  internet each time; you only need the ones that have changed!
+
+  To solve this problem, you can use `RUN --mount=type=cache`. For example, for
+  your `debian`-based image you might use the following:
+
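+  ```dockerfile
+  # The official Debian and Ubuntu images delete downloaded packages after
+  # each install; remove that hook so the cache mount has something to keep
+  RUN rm -f /etc/apt/apt.conf.d/docker-clean
+  RUN \
+      --mount=type=cache,target=/var/cache/apt \
+      apt-get update && apt-get install -y git
+  ```
+
+  Using an explicit cache with the `--mount` flag preserves the contents of the
+  `target` directory between builds, so when this layer needs to be rebuilt,
+  it can use `apt`'s own cache of downloaded packages in `/var/cache/apt`.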
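+
+As a sketch of how these tips fit together with the layer ordering from
+earlier, here's a variant of the Node.js example that copies only what the
+build needs and adds a cache mount for npm. It assumes a `package-lock.json`
+lockfile, a `build` script in `package.json`, sources in a `src` directory, and
+npm's default cache location for the root user (`/root/.npm`); adjust these for
+your own project. The first line is a parser directive selecting the Dockerfile
+1.x frontend, so that `RUN --mount` is available even on Docker versions whose
+built-in frontend predates it:
+
+```dockerfile
+# syntax=docker/dockerfile:1
+FROM node
+WORKDIR /app
+# Copy only the dependency manifests first, so the install below is only
+# invalidated when the dependencies change
+COPY package.json package-lock.json ./
+# Keep npm's download cache between builds (assumes the default cache
+# directory for the root user, /root/.npm)
+RUN --mount=type=cache,target=/root/.npm npm install
+# Copy only the sources the build needs, rather than the whole build context
+COPY ./src ./src
+RUN npm run build
+```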
+
+### Minimize the number of layers
+
+Keeping your layers small is a good step towards quick builds. The logical
+next step is to reduce the number of layers that you have! Fewer layers mean
+that you have less to rebuild when something in your Dockerfile changes, so
+your build will complete faster!
+
+Here are some more tips you can use:
+
+- Use an appropriate base image!
+
+  Docker provides over 170 pre-built [official images](https://hub.docker.com/search?q=&image_filter=official)
+  for almost every common development scenario! For example, if you're building
+  a Java web server, then while you could install `java` into any image you
+  like, it's much quicker (and easier to manage updates) if you use a dedicated
+  image, for example, [`openjdk`](https://hub.docker.com/_/openjdk/). Even if
+  there's not an official image for what you might want, Docker provides images
+  from [verified publishers](https://hub.docker.com/search?q=&image_filter=store)
+  and [open source partners](https://hub.docker.com/search?q=&image_filter=open_source)
+  that can help you on your way, and the community often produces third-party
+  images to use as well.
+  
+  These pre-built images save you from manually installing and managing the
+  software, which saves valuable build time as well as disk space.
+
+- Use multi-stage builds to run builds in parallel!
+
+  <!-- x-link to multi-stage builds once we have some reworked content for that --> 
+
+  Multi-stage builds let you split up your Dockerfile into multiple distinct
+  stages, and then provide the tools to combine them all back together again.
+  The docker builder works out the dependencies between the stages and runs
+  them using the most efficient strategy, even running independent stages at
+  the same time!
+  
+  To use a multi-stage build, you can simply use multiple `FROM` commands. For
+  example, suppose you want to build a simple web server that serves HTML from
+  your `docs` directory in Git:
+  
+  ```dockerfile
+  FROM alpine AS git
+  RUN apk add git
+  
+  FROM git AS fetch
+  WORKDIR /repo
+  RUN git clone https://github.com/your/repository.git .
+  
+  FROM nginx AS site
+  COPY --from=fetch /repo/docs/ /usr/share/nginx/html
+  ```
+  
+  This build has 3 stages: `git`, `fetch` and `site`. In this example, we've
+  used `git` as the base for the `fetch` stage, and used `COPY`'s `--from`
+  flag to copy the data from the `docs/` directory into the NGINX server
+  directory.
+  
+  Each stage has only a few instructions. Here the stages form a chain (`site`
+  needs `fetch`, which needs `git`), so they mostly run in order, but when
+  stages don't depend on each other, docker can run them in parallel.
+  Additionally, only the layers from the final `site` stage end up in our
+  image, so we won't have our entire `git` history (or even `git` itself)
+  embedded into the final result, which helps keep our images small and
+  secure.
+
+- Combine your commands together wherever possible!
+
+  Commands in your Dockerfile can often be joined together, so that a single
+  instruction does multiple things at once! For example, it's fairly common to
+  see `RUN` instructions being used like this:
+
+  ```dockerfile
+  RUN echo "the first command"
+  RUN echo "the second command"
+  ```
+
+  But we can actually run both of these commands inside a single `RUN`, which
+  means that they produce a single layer and share the same cache! We can do
+  this by using the `&&` shell operator to run one command after another (the
+  second only runs if the first succeeds):
+
+  ```dockerfile
+  RUN echo "the first command" && echo "the second command"
+  # or to split to multiple lines
+  RUN echo "the first command" && \
+      echo "the second command"
+  ```
+
+  We can also use [heredocs]() to simplify complex multiline scripts (note the
+  `set -e` command to exit immediately after any command fails, instead of
+  continuing):
+
+  ```dockerfile
+  RUN <<EOF
+  set -e
+  echo "the first command"
+  echo "the second command"
+  EOF
+  ```
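+
+In the `git`/`fetch`/`site` example above, each stage builds on the one before
+it. The builder gets the most out of parallelism when stages don't depend on
+each other. As a sketch, here's a variant that fetches from two repositories;
+the second repository URL, the `fetch-docs` and `fetch-theme` stage names, and
+the `css/` path are hypothetical placeholders:
+
+```dockerfile
+FROM alpine AS git
+RUN apk add git
+
+# fetch-docs and fetch-theme both depend only on git, not on each other,
+# so the builder is free to run them at the same time
+FROM git AS fetch-docs
+WORKDIR /repo
+RUN git clone https://github.com/your/repository.git .
+
+FROM git AS fetch-theme
+WORKDIR /theme
+RUN git clone https://github.com/your/theme.git .
+
+# The final stage waits for both fetches, then copies in only what it needs
+FROM nginx AS site
+COPY --from=fetch-docs /repo/docs/ /usr/share/nginx/html
+COPY --from=fetch-theme /theme/css/ /usr/share/nginx/html/css
+```
+
+Running stages concurrently is a BuildKit feature; the legacy builder works
+through the stages one at a time.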
+
+## Other resources
+
+For more information on using cache to do efficient builds:
+
+<!-- x-link to dedicated cache exporter content once that's written -->
+
+- [Export your build cache](https://github.com/moby/buildkit#export-cache)