WIP: TLS Docs for Swarm

Struct edit pass thru conceptual material
Updating with comments from Mike
Tweaking menu layout
Updating for Nigel
Updating with local images, formatting fixes
Updating with the comments from review

Signed-off-by: Mary Anthony <mary@docker.com>
Mary Anthony 2016-01-27 13:17:35 -08:00
parent 342611313e
commit f93d787e3b
24 changed files with 1887 additions and 1 deletions

594
docs/configure-tls.md Normal file

@ -0,0 +1,594 @@
<!--[metadata]>
+++
title = "Configure Docker Swarm for TLS"
description = "Swarm and transport layer security"
keywords = ["docker, swarm, TLS, discovery, security, certificates"]
[menu.main]
parent="workw_swarm"
weight=55
+++
<![end-metadata]-->
# Configure Docker Swarm for TLS
In this procedure you create a two-node Swarm cluster, a Docker Engine CLI, a
Swarm Manager, and a Certificate Authority as shown below. All the Docker Engine
hosts (`client`, `swarm`, `node1`, and `node2`) have a copy of the
CA's certificate as well as their own key-pair signed by the CA.
![](images/tls-1.jpg)
You will complete the following steps in this procedure:
- [Step 1: Set up the prerequisites](#step-1-set-up-the-prerequisites)
- [Step 2: Create a Certificate Authority (CA) server](#step-2-create-a-certificate-authority-ca-server)
- [Step 3: Create and sign keys](#step-3-create-and-sign-keys)
- [Step 4: Install the keys](#step-4-install-the-keys)
- [Step 5: Configure the Engine daemon for TLS](#step-5-configure-the-engine-daemon-for-tls)
- [Step 6: Create a Swarm cluster](#step-6-create-a-swarm-cluster)
- [Step 7: Create the Swarm Manager using TLS](#step-7-create-the-swarm-manager-using-tls)
- [Step 8: Test the Swarm manager configuration](#step-8-test-the-swarm-manager-configuration)
- [Step 9: Configure the Engine CLI to use TLS](#step-9-configure-the-engine-cli-to-use-tls)
### Before you begin
The article includes steps to create your own CA using OpenSSL. This is similar
to operating your own internal corporate CA and PKI. However, it must **not**
be used as a guide to building a production-worthy internal CA and PKI. These
steps are included for demonstration purposes only, so that readers without
access to an existing CA and set of certificates can follow along and configure
Docker Swarm to use TLS.
## Step 1: Set up the prerequisites
To complete this procedure you must stand up five Linux servers. These servers
can be any mix of physical and virtual servers; they may be on premises or in
the public cloud. The following table lists each server name and its purpose.
| Server name | Description |
|-------------|------------------------------------------------|
| `ca` | Acts as the Certificate Authority (CA) server. |
| `swarm` | Acts as the Swarm Manager. |
| `node1`     | Acts as a Swarm node.                           |
| `node2`     | Acts as a Swarm node.                           |
| `client`    | Acts as a remote Docker Engine client.          |
Make sure that you have SSH access to all 5 servers and that they can communicate with each other using DNS name resolution. In particular:
- Open TCP port 2376 between the Swarm Manager and Swarm nodes
- Open TCP port 3376 between the Docker Engine client and the Swarm Manager
You can choose different ports if these are already in use; however, this
example assumes you use the ports above.
Each server must run an operating system compatible with Docker Engine. For
simplicity, the steps that follow assume all servers are running Ubuntu 14.04
LTS.
## Step 2: Create a Certificate Authority (CA) server
>**Note**: If you already have access to a CA and certificates, and are comfortable working with them, you should skip this step and go to the next.
In this step, you configure a Linux server as a CA. You use this CA to create
and sign keys. This step is included so that readers without access to an
existing CA (external or corporate) and certificates can follow along and
complete the later steps that require installing and using certificates. It is
**not** intended as a model for how to deploy a production-worthy CA.
1. Log on to the terminal of your CA server and elevate to root.
$ sudo su
2. Create a private key called `ca-priv-key.pem` for the CA:
$ sudo openssl genrsa -out ca-priv-key.pem 2048
Generating RSA private key, 2048 bit long modulus
...........................................................+++
.....+++
e is 65537 (0x10001)
3. Create a public key called `ca.pem` for the CA.
The public key is based on the private key created in the previous step.
$ sudo openssl req -config /usr/lib/ssl/openssl.cnf -new -key ca-priv-key.pem -x509 -days 1825 -out ca.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
<output truncated>
You have now configured a CA server with a public and private keypair. You can inspect the contents of each key. To inspect the private key:
```
$ sudo openssl rsa -in ca-priv-key.pem -noout -text
```
To inspect the public key (cert):
```
$ sudo openssl x509 -in ca.pem -noout -text
```
The following command shows the partial contents of the CA's public key.
$ sudo openssl x509 -in ca.pem -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 17432010264024107661 (0xf1eaf0f9f41eca8d)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, ST=CA, L=Sanfrancisco, O=Docker Inc
Validity
Not Before: Jan 16 18:28:12 2016 GMT
Not After : Jan 13 18:28:12 2026 GMT
Subject: C=US, ST=CA, L=San Francisco, O=Docker Inc
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:d1:fe:6e:55:d4:93:fc:c9:8a:04:07:2d:ba:f0:
55:97:c5:2c:f5:d7:1d:6a:9b:f0:f0:55:6c:5d:90:
<output truncated>
Later, you'll use this certificate to sign keys for the other servers in your
infrastructure.
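If you want to confirm that the two files form a matching key pair, one quick
check is to compare the modulus of each (a sketch using standard OpenSSL
commands; the file names are the ones created above):
```
$ sudo openssl rsa -in ca-priv-key.pem -noout -modulus | openssl md5
$ sudo openssl x509 -in ca.pem -noout -modulus | openssl md5
```
The two digests should be identical if the private key and certificate belong
together.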
## Step 3: Create and sign keys
Now that you have a working CA, you need to create key pairs for the Swarm
Manager, Swarm nodes, and remote Docker Engine client. The commands and process
to create key pairs are identical for all servers. You'll create the following keys:
<table>
<tr>
<th>File</th>
<th>Description</th>
</tr>
<tr>
<td><code>ca-priv-key.pem</code></td>
<td>The CA's private key. It must be kept secure, as it is used later to sign new keys for the other nodes in the environment. Together with the <code>ca.pem</code> file, this makes up the CA's key pair.</td>
</tr>
<tr>
<td><code>ca.pem</code></td>
<td>The CA's public key (also called certificate). This is installed on all nodes in the environment so that all nodes trust certificates signed by the CA. Together with the <code>ca-priv-key.pem</code> file, this makes up the CA's key pair.</td>
</tr>
<tr>
<td><code><i>node</i>.csr</code></td>
<td>A certificate signing request (CSR). A CSR is effectively an application to the CA for a new certificate for a particular node. The CA takes the information provided in the CSR and uses it to generate and sign the node's certificate.</td>
</tr>
<tr>
<td><code><i>node</i>-priv-key.pem</code></td>
<td>The node's private key. The node uses this key to authenticate itself with remote Docker Engines. Together with the <code><i>node</i>-cert.pem</code> file, this makes up a node's key pair.</td>
</tr>
<tr>
<td><code><i>node</i>-cert.pem</code></td>
<td>A certificate signed by the CA. The node presents this certificate to remote Docker Engines to identify itself. Together with the <code><i>node</i>-priv-key.pem</code> file, this makes up a node's key pair.</td>
</tr>
</table>
The commands below show how to create keys for all of your nodes. You perform this procedure in a working directory located on your CA server.
1. Log on to the terminal of your CA server and elevate to root.
$ sudo su
2. Create a private key `swarm-priv-key.pem` for your Swarm Manager.
$ sudo openssl genrsa -out swarm-priv-key.pem 2048
Generating RSA private key, 2048 bit long modulus
............................................................+++
........+++
e is 65537 (0x10001)
3. Generate a certificate signing request (CSR) `swarm.csr` using the private key you created in the previous step.
$ sudo openssl req -subj "/CN=swarm" -new -key swarm-priv-key.pem -out swarm.csr
Remember, this is only for demonstration purposes. The process to create a
CSR will be slightly different in real-world production environments.
4. Create the certificate `swarm-cert.pem` based on the CSR created in the previous step.
$ sudo openssl x509 -req -days 1825 -in swarm.csr -CA ca.pem -CAkey ca-priv-key.pem -CAcreateserial -out swarm-cert.pem -extensions v3_req -extfile /usr/lib/ssl/openssl.cnf
<snip>
$ sudo openssl rsa -in swarm-priv-key.pem -out swarm-priv-key.pem
You now have a keypair for the Swarm Manager.
5. Repeat the steps above for the remaining nodes in your infrastructure (`node1`, `node2`, and `client`).
Remember to replace the `swarm`-specific values with the values relevant to the node you are creating the key pair for (a scripted version of this repetition appears at the end of this section).
<table>
<tr>
<th>Server name</th>
<th>Private key</th>
<th>CSR</th>
<th>Certificate</th>
</tr>
<tr>
<td><code>node1 </code></td>
<td><code>node1-priv-key.pem</code></td>
<td><code>node1.csr</code></td>
<td><code>node1-cert.pem</code></td>
</tr>
<tr>
<td><code>node2</code></td>
<td><code>node2-priv-key.pem</code></td>
<td><code>node2.csr</code></td>
<td><code>node2-cert.pem</code></td>
</tr>
<tr>
<td><code>client</code></td>
<td><code>client-priv-key.pem</code></td>
<td><code>client.csr</code></td>
<td><code>client-cert.pem</code></td>
</tr>
</table>
6. Verify that your working directory contains the following files:
# ls -l
total 64
-rw-r--r-- 1 root root 1679 Jan 16 18:27 ca-priv-key.pem
-rw-r--r-- 1 root root 1229 Jan 16 18:28 ca.pem
-rw-r--r-- 1 root root 17 Jan 18 09:56 ca.srl
-rw-r--r-- 1 root root 1086 Jan 18 09:56 client-cert.pem
-rw-r--r-- 1 root root 887 Jan 18 09:55 client.csr
-rw-r--r-- 1 root root 1679 Jan 18 09:56 client-priv-key.pem
-rw-r--r-- 1 root root 1082 Jan 18 09:44 node1-cert.pem
-rw-r--r-- 1 root root 887 Jan 18 09:43 node1.csr
-rw-r--r-- 1 root root 1675 Jan 18 09:44 node1-priv-key.pem
-rw-r--r-- 1 root root 1082 Jan 18 09:49 node2-cert.pem
-rw-r--r-- 1 root root 887 Jan 18 09:49 node2.csr
-rw-r--r-- 1 root root 1675 Jan 18 09:49 node2-priv-key.pem
-rw-r--r-- 1 root root 1082 Jan 18 09:42 swarm-cert.pem
-rw-r--r-- 1 root root 887 Jan 18 09:41 swarm.csr
-rw-r--r-- 1 root root 1679 Jan 18 09:42 swarm-priv-key.pem
You can inspect the contents of each of the keys. To inspect a private key:
```
openssl rsa -in <key-name> -noout -text
```
To inspect a public key (cert):
```
openssl x509 -in <key-name> -noout -text
```
The following command shows the partial contents of the Swarm Manager's public
key (`swarm-cert.pem`).
```
$ sudo openssl x509 -in swarm-cert.pem -noout -text
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 9590646456311914051 (0x8518d2237ad49e43)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, ST=CA, L=Sanfrancisco, O=Docker Inc
Validity
Not Before: Jan 18 09:42:16 2016 GMT
Not After : Jan 15 09:42:16 2026 GMT
Subject: CN=swarm
<output truncated>
```
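If you prefer to script the repetition described in step 5 above, a loop like
the following produces the same files (a sketch only; run it from the same
working directory on the CA server):
```
$ for node in node1 node2 client; do
    sudo openssl genrsa -out ${node}-priv-key.pem 2048
    sudo openssl req -subj "/CN=${node}" -new -key ${node}-priv-key.pem -out ${node}.csr
    sudo openssl x509 -req -days 1825 -in ${node}.csr -CA ca.pem -CAkey ca-priv-key.pem \
      -CAcreateserial -out ${node}-cert.pem -extensions v3_req -extfile /usr/lib/ssl/openssl.cnf
  done
```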
## Step 4: Install the keys
In this step, you install the keys on the relevant servers in the
infrastructure. Each server needs three files:
- A copy of the Certificate Authority's public key (`ca.pem`)
- Its own private key
- Its own public key (cert)
The procedure below shows you how to copy these files from the CA server to each
server using `scp`. As part of the copy procedure, you'll rename each file as
follows on each node:
| Original name | Copied name |
|-------------------------|-------------|
| `ca.pem` | `ca.pem` |
| `<server>-cert.pem` | `cert.pem` |
| `<server>-priv-key.pem` | `key.pem` |
1. Log on to the terminal of your CA server and elevate to root.
$ sudo su
2. Create a `~/.certs` directory on the Swarm manager.
$ ssh ubuntu@swarm 'mkdir -p /home/ubuntu/.certs'
3. Copy the keys from the CA to the Swarm Manager server.
$ scp ./ca.pem ubuntu@swarm:/home/ubuntu/.certs/ca.pem
$ scp ./swarm-cert.pem ubuntu@swarm:/home/ubuntu/.certs/cert.pem
$ scp ./swarm-priv-key.pem ubuntu@swarm:/home/ubuntu/.certs/key.pem
>**Note**: You may need to provide authentication for the `scp` commands to work. For example, AWS EC2 instances use certificate-based authentication. To copy the files to an EC2 instance associated with a public key called `nigel.pem`, modify the `scp` command as follows: `scp -i /path/to/nigel.pem ./ca.pem ubuntu@swarm:/home/ubuntu/.certs/ca.pem`.
4. Repeat steps 2 and 3 for each remaining server in the infrastructure (a scripted version of this repetition appears after these steps).
* `node1`
* `node2`
* `client`
5. Verify your work.
When the copying is complete, each machine should have the following keys.
![](images/tls-2.jpeg)
Each node in your infrastructure should have the following files in the
`/home/ubuntu/.certs/` directory:
# ls -l /home/ubuntu/.certs/
total 16
-rw-r--r-- 1 ubuntu ubuntu 1229 Jan 18 10:03 ca.pem
-rw-r--r-- 1 ubuntu ubuntu 1082 Jan 18 10:06 cert.pem
-rw-r--r-- 1 ubuntu ubuntu 1679 Jan 18 10:06 key.pem
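If you would rather script the copy described in step 4, a loop like the
following does the same work (a sketch only; it assumes the same `ubuntu` user
and that SSH access to each server is already in place):
```
$ for server in node1 node2 client; do
    ssh ubuntu@${server} 'mkdir -p /home/ubuntu/.certs'
    scp ./ca.pem ubuntu@${server}:/home/ubuntu/.certs/ca.pem
    scp ./${server}-cert.pem ubuntu@${server}:/home/ubuntu/.certs/cert.pem
    scp ./${server}-priv-key.pem ubuntu@${server}:/home/ubuntu/.certs/key.pem
  done
```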
## Step 5: Configure the Engine daemon for TLS
In the last step, you created and installed the necessary keys on each of your
Swarm nodes. In this step, you configure the Docker Engine daemons on those
nodes to listen on the network and accept only connections that use TLS. Once
you complete this step, your Swarm nodes listen on TCP port 2376 and accept
only TLS connections.
On `node1` and `node2` (your Swarm nodes), do the following:
1. Open a terminal on `node1` and elevate to root.
$ sudo su
2. Edit the Docker Engine configuration file.
If you are following along with these instructions and using Ubuntu 14.04
LTS, the configuration file is `/etc/default/docker`. The Docker Engine
configuration file may be different depending on the Linux distribution you
are using.
3. Add the following options to the `DOCKER_OPTS` line (the complete resulting line is shown in the example after these steps).
-H tcp://0.0.0.0:2376 --tlsverify --tlscacert=/home/ubuntu/.certs/ca.pem --tlscert=/home/ubuntu/.certs/cert.pem --tlskey=/home/ubuntu/.certs/key.pem
4. Restart the Docker Engine daemon.
$ service docker restart
5. Repeat the procedure on `node2`.
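For reference, on Ubuntu 14.04 LTS the resulting line in `/etc/default/docker`
might look like the following (a sketch only; adjust the paths if you stored
your keys somewhere other than `/home/ubuntu/.certs`):
```
DOCKER_OPTS="-H tcp://0.0.0.0:2376 --tlsverify --tlscacert=/home/ubuntu/.certs/ca.pem --tlscert=/home/ubuntu/.certs/cert.pem --tlskey=/home/ubuntu/.certs/key.pem"
```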
## Step 6: Create a Swarm cluster
Next create a Swarm cluster. In this procedure you create a two-node Swarm
cluster using the default *hosted discovery* backend. The default hosted
discovery backend uses Docker Hub and is not recommended for production use.
1. Log on to the terminal of your Swarm manager node.
2. Create the cluster and export its unique ID to the `TOKEN` environment variable.
$ export TOKEN=$(sudo docker run --rm swarm create)
Unable to find image 'swarm:latest' locally
latest: Pulling from library/swarm
d681c900c6e3: Pulling fs layer
<snip>
986340ab62f0: Pull complete
a9975e2cc0a3: Pull complete
Digest: sha256:c21fd414b0488637b1f05f13a59b032a3f9da5d818d31da1a4ca98a84c0c781b
Status: Downloaded newer image for swarm:latest
3. Join `node1` to the cluster.
Be sure to specify TCP port `2376` and not `2375`.
$ sudo docker run -d swarm join --addr=node1:2376 token://$TOKEN
7bacc98536ed6b4200825ff6f4004940eb2cec891e1df71c6bbf20157c5f9761
4. Join `node2` to the cluster.
$ sudo docker run -d swarm join --addr=node2:2376 token://$TOKEN
db3f49d397bad957202e91f0679ff84f526e74d6c5bf1b6734d834f5edcbca6c
## Step 7: Create the Swarm Manager using TLS
To configure and run a containerized Swarm Manager process using TLS, you
need to create a custom Swarm image that contains the Swarm Manager's keys and
the CA's trusted public key.
1. Log on to the terminal of your Swarm manager node.
2. Create a build directory and change into it.
$ mkdir build && cd build
3. Copy the Swarm manager's keys into the build directory.
$ cp /home/ubuntu/.certs/{ca,cert,key}.pem /home/ubuntu/build
4. Create a new `Dockerfile` file with the following contents:
FROM swarm
COPY ca.pem /etc/tlsfiles/ca.pem
COPY cert.pem /etc/tlsfiles/cert.pem
COPY key.pem /etc/tlsfiles/key.pem
This Dockerfile builds a new image, based on the official `swarm` image, that
contains copies of the required keys.
5. Build a new image from the `Dockerfile`.
$ sudo docker build -t nigel/swarm-tls:latest .
6. Launch a new container from your new `swarm-tls:latest` image.
The container runs the `swarm manage` command:
$ docker run -d -p 3376:2376 nigel/swarm-tls manage --tlsverify --tlscacert=/etc/tlsfiles/ca.pem --tlscert=/etc/tlsfiles/cert.pem --tlskey=/etc/tlsfiles/key.pem --host=0.0.0.0:2376 token://$TOKEN
The command above launches a new container based on the `swarm-tls:latest`
image. It also maps port `3376` on the server to port `2376` inside the
container. This mapping ensures that Docker Engine commands sent to the host
on port `3376` are passed on to port `2376` inside the container. The
container runs the Swarm `manage` process with the `--tlsverify`,
`--tlscacert`, `--tlscert` and `--tlskey` options specified. These options
force TLS verification and specify the location of the Swarm manager's TLS
keys.
7. Run a `docker ps` command to verify that your Swarm manager container is up
and running.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
035dbf57b26e nigel/swarm-tls "/swarm manage --tlsv" 7 seconds ago Up 7 seconds 2375/tcp, 0.0.0.0:3376->2376/tcp compassionate_lovelace
Your Swarm cluster is now configured to use TLS.
## Step 8: Test the Swarm manager configuration
Now that you have a Swarm cluster built and configured to use TLS, you'll test that it works with a Docker Engine CLI.
1. Open a terminal onto your `client` server.
2. Issue the `docker version` command.
When issuing the command, you must pass it the location of the client's certificates.
$ sudo docker --tlsverify --tlscacert=/home/ubuntu/.certs/ca.pem --tlscert=/home/ubuntu/.certs/cert.pem --tlskey=/home/ubuntu/.certs/key.pem -H swarm:3376 version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:12:04 UTC 2015
OS/Arch: linux/amd64
Server:
Version: swarm/1.0.1
API version: 1.21
Go version: go1.5.2
Git commit: 744e3a3
Built:
OS/Arch: linux/amd64
The output above shows the `Server` version as "swarm/1.0.1". This means
that the command was successfully issued against the Swarm manager.
3. Verify that the same command does not work without TLS.
This time, do not pass your certs to the Swarm manager.
$ sudo docker -H swarm:3376 version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:12:04 UTC 2015
OS/Arch: linux/amd64
Get http://swarm:3376/v1.21/version: malformed HTTP response "\x15\x03\x01\x00\x02\x02".
* Are you trying to connect to a TLS-enabled daemon without TLS?
The output above shows that the command was rejected by the server. This is
because the server (Swarm manager) is configured to only accept connections
from authenticated clients using TLS.
## Step 9: Configure the Engine CLI to use TLS
You can configure the Engine so that you don't have to pass the TLS options when
you issue a command. To do this, you'll configure the Docker Engine host and
TLS settings as defaults on your Docker Engine client.
You do this by placing the client's keys in the `~/.docker` configuration folder. If you have other users on your system using the Engine command line, you'll need to configure their `~/.docker` folders as well. The procedure below shows how to do this for the `ubuntu` user on
your Docker Engine client.
1. Open a terminal onto your `client` server.
2. If it doesn't exist, create a `.docker` directory in the `ubuntu` user's home directory.
$ mkdir /home/ubuntu/.docker
3. Copy the Docker Engine client's keys from `/home/ubuntu/.certs` to
`/home/ubuntu/.docker`
$ cp /home/ubuntu/.certs/{ca,cert,key}.pem /home/ubuntu/.docker
4. Edit the account's `~/.bash_profile`.
5. Set the following variables:
<table>
<tr>
<th>Variable</th>
<th>Description</th>
</tr>
<tr>
<td><code>DOCKER_HOST</code></td>
<td>Sets the Docker host and TCP port to send all Engine commands to.</td>
</tr>
<tr>
<td><code>DOCKER_TLS_VERIFY</code></td>
<td>Tells the Engine to use TLS.</td>
</tr>
<tr>
<td><code>DOCKER_CERT_PATH</code></td>
<td>Specifies the location of TLS keys.</td>
</tr>
</table>
For example:
export DOCKER_HOST=tcp://swarm:3376
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/home/ubuntu/.docker/
6. Save and close the file.
7. Source the file to pick up the new variables.
$ source ~/.bash_profile
8. Verify that the procedure worked by issuing a `docker version` command.
$ docker version
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:12:04 UTC 2015
OS/Arch: linux/amd64
Server:
Version: swarm/1.0.1
API version: 1.21
Go version: go1.5.2
Git commit: 744e3a3
Built:
OS/Arch: linux/amd64
The server portion of the output above shows that your Docker client is
issuing commands to the Swarm Manager and using TLS.
Congratulations! You have configured a Docker Swarm cluster to use TLS.
## Related Information
* [Secure Docker Swarm with TLS](secure-swarm-tls.md)
* [Docker security](https://docs.docker.com/engine/security/security/)

BIN docs/images/interlock.jpg Normal file
BIN docs/images/proxy-test.jpg Normal file
BIN docs/images/review-work.jpg Normal file
BIN docs/images/tls-1.jpg Normal file
BIN docs/images/tls-2.jpeg Normal file
(Additional binary image files added under docs/images/ are not shown.)

@ -5,6 +5,7 @@ description = "Swarm and container networks"
keywords = ["docker, swarm, clustering, networking"]
[menu.main]
parent="workw_swarm"
weight=3
+++
<![end-metadata]-->

353
docs/plan-for-production.md Normal file

@ -0,0 +1,353 @@
<!--[metadata]>
+++
title = "Plan for Swarm in production"
description = "Plan for Swarm in production"
keywords = ["docker, swarm, scale, voting, application, plan"]
[menu.main]
parent="workw_swarm"
weight=70
+++
<![end-metadata]-->
# Plan for Swarm in production
This article provides guidance to help you plan, deploy, and manage Docker
Swarm clusters in business critical production environments. The following high
level topics are covered:
- [Security](#security)
- [High Availability](#high-availability-ha)
- [Performance](#performance)
- [Cluster ownership](#ownership-of-swarm-clusters)
## Security
There are many aspects to securing a Docker Swarm cluster. This section covers:
- Authentication using TLS
- Network access control
These topics are not exhaustive. They form part of a wider security architecture
that includes: security patching, strong password policies, role based access
control, technologies such as SELinux and AppArmor, strict auditing, and more.
### Configure Swarm for TLS
All nodes in a Swarm cluster must bind their Docker Engine daemons to a network
port. This brings with it all of the usual network-related security
implications, such as man-in-the-middle attacks. These risks are compounded when
the network in question is untrusted, such as the internet. To mitigate these
risks, Swarm and the Engine support Transport Layer Security (TLS) for
authentication.
Engine daemons, including the Swarm manager, that are configured to use TLS
only accept commands from Docker Engine clients that sign their
communications. The Engine and Swarm support external third-party Certificate
Authorities (CAs) as well as internal corporate CAs.
The default Engine and Swarm ports for TLS are:
- Engine daemon: 2376/tcp
- Swarm manager: 3376/tcp
For more information on configuring Swarm for TLS, see
[Configure Docker Swarm for TLS](configure-tls.md).
### Network access control
Production networks are complex, and usually locked down so that only allowed
traffic can flow on the network. The list below shows the network ports that
the different components of a Swarm cluster listen on. You should use these to
configure your firewalls and other network access control lists (an example set
of firewall rules appears after this list).
- **Swarm manager.**
- **Inbound 80/tcp (HTTP)**. This allows `docker pull` commands to work. If you will be pulling from Docker Hub, you need to allow connections on port 80 from the internet.
- **Inbound 2375/tcp**. This allows Docker Engine CLI commands direct to the Engine daemon.
- **Inbound 3375/tcp**. This allows Engine CLI commands to the Swarm manager.
- **Inbound 22/tcp**. This allows remote management via SSH.
- **Service Discovery**:
- **Inbound 80/tcp (HTTP)**. This allows `docker pull` commands to work. If you will be pulling from Docker Hub, you need to allow connections on port 80 from the internet.
- **Inbound *Discovery service port***. This needs setting to the port that the backend discovery service listens on (consul, etcd, or zookeeper).
- **Inbound 22/tcp**. This allows remote management via SSH.
- **Swarm nodes**:
- **Inbound 80/tcp (HTTP)**. This allows `docker pull` commands to work. If you will be pulling from Docker Hub, you need to allow connections on port 80 from the internet.
- **Inbound 2375/tcp**. This allows Engine CLI commands direct to the Docker daemon.
- **Inbound 22/tcp**. This allows remote management via SSH.
- **Custom, cross-host container networks**:
- **Inbound 7946/tcp** Allows for discovering other container networks.
- **Inbound 7946/udp** Allows for discovering other container networks.
- **Inbound <store-port>/tcp** Network key-value store service port.
- **4789/udp** For the container overlay network.
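As an illustration only, the rules for a Swarm manager might look like the
following sketch using `ufw` on Ubuntu 14.04 (the tool and the decision to
allow traffic from any source are assumptions; apply your own firewall tooling
and source restrictions):
```
$ sudo ufw allow 80/tcp      # docker pull over HTTP
$ sudo ufw allow 2375/tcp    # Engine CLI to the Engine daemon (use 2376 with TLS)
$ sudo ufw allow 3375/tcp    # Engine CLI to the Swarm manager (use 3376 with TLS)
$ sudo ufw allow 22/tcp      # remote management over SSH
$ sudo ufw enable
```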
If your firewalls and other network devices are connection state aware, they
will allow responses to established TCP connections. If your devices are not
state aware, you will need to open up ephemeral ports from 32768-65535. For
added security you can configure the ephemeral port rules to only allow
connections from interfaces on known Swarm devices.
If your Swarm cluster is configured for TLS, replace `2375` with `2376`, and
`3375` with `3376`.
The ports listed above are just for Swarm cluster operations, such as cluster
creation, cluster management, and scheduling of containers against the cluster.
You may need to open additional network ports for application-related
communications.
It is possible for different components of a Swarm cluster to exist on separate
networks. For example, many organizations operate separate management and
production networks. Some Docker Engine clients may exist on a management
network, while Swarm managers, discovery service instances, and nodes might
exist on one or more production networks. To offset against network failures,
you can deploy Swarm managers, discovery services, and nodes across multiple
production networks. In all of these cases you can use the list of ports above
to assist the work of your network infrastructure teams to efficiently and
securely configure your network.
## High Availability (HA)
All production environments should be highly available, meaning they are
continuously operational over long periods of time. To achieve high
availability, an environment must the survive failures of its individual
component parts.
The following sections discuss some technologies and best practices that can
enable you to build resilient, highly available Swarm clusters. You can then use
these clusters to run your most demanding production applications and workloads.
### Swarm manager HA
The Swarm manager is responsible for accepting all commands coming in to a Swarm
cluster, and scheduling resources against the cluster. If the Swarm manager
becomes unavailable, some cluster operations cannot be performed until the Swarm
manager becomes available again. This is unacceptable in large-scale business
critical scenarios.
Swarm provides HA features to mitigate against possible failures of the Swarm
manager. You can use Swarm's HA feature to configure multiple Swarm managers for
a single cluster. These Swarm managers operate in an active/passive formation
with a single Swarm manager being the *primary*, and all others being
*secondaries*.
Swarm secondary managers operate as *warm standbys*, meaning they run in the
background of the primary Swarm manager. The secondary Swarm managers are online
and accept commands issued to the cluster, just like the primary Swarm manager.
However, any commands received by the secondaries are forwarded to the primary
where they are executed. Should the primary Swarm manager fail, a new primary is
elected from the surviving secondaries.
When creating HA Swarm managers, you should take care to distribute them over as
many *failure domains* as possible. A failure domain is a network section that
can be negatively affected if a critical device or service experiences problems.
For example, if your cluster is running in the Ireland Region of Amazon Web
Services (eu-west-1) and you configure three Swarm managers (1 x primary, 2 x
secondary), you should place one in each availability zone as shown below.
![](http://farm2.staticflickr.com/1657/24581727611_0a076b79de_b.jpg)
In this configuration, the Swarm cluster can survive the loss of any two
availability zones. For your applications to survive such failures, they must
also be architected across multiple failure domains.
For Swarm clusters serving high-demand, line-of-business applications, you
should have 3 or more Swarm managers. This configuration allows you to take one
manager down for maintenance, suffer an unexpected failure, and still continue
to manage and operate the cluster.
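As an illustration, each replicated manager can be started with the Swarm
`--replication` and `--advertise` options (a sketch only; the port, manager
address, and Consul address are placeholders for your own values):
```
$ docker run -d -p 4000:4000 swarm manage -H :4000 --replication \
    --advertise <manager-ip>:4000 consul://<consul-ip>:8500
```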
### Discovery service HA
The discovery service is a key component of a Swarm cluster. If the discovery
service becomes unavailable, this can prevent certain cluster operations. For
example, without a working discovery service, operations such as adding new
nodes to the cluster and making queries against the cluster configuration fail.
This is not acceptable in business critical production environments.
Swarm supports four backend discovery services:
- Hosted (not for production use)
- Consul
- etcd
- Zookeeper
Consul, etcd, and Zookeeper are all suitable for production, and should be
configured for high availability. You should use each service's existing tools
and best practices to configure these for HA.
For Swarm clusters serving high-demand, line-of-business applications, it is
recommended to have 5 or more discovery service instances. This is because the
replication/HA technologies they use (such as Paxos/Raft) require a strong
quorum. Having 5 instances allows you to take one down for maintenance, suffer
an unexpected failure, and still be able to achieve a strong quorum.
When creating a highly available Swarm discovery service, you should take care
to distribute each discovery service instance over as many failure domains as
possible. For example, if your cluster is running in the Ireland Region of
Amazon Web Services (eu-west-1) and you configure three discovery service
instances, you should place one in each availability zone.
The diagram below shows a Swarm cluster configured for HA. It has three Swarm
managers and three discovery service instances spread over three failure
domains (availability zones). It also has Swarm nodes balanced across all three
failure domains. The loss of two availability zones in the configuration shown
below does not cause the Swarm cluster to go down.
![](http://farm2.staticflickr.com/1675/24380252320_999687d2bb_b.jpg)
It is possible to share the same Consul, etcd, or Zookeeper containers between
the Swarm discovery and Engine container networks. However, for best
performance and availability you should deploy dedicated instances &ndash; a
discovery instance for Swarm and another for your container networks.
### Multiple clouds
You can architect and build Swarm clusters that stretch across multiple cloud
providers, and even across public cloud and on premises infrastructures. The
diagram below shows an example Swarm cluster stretched across AWS and Azure.
![](http://farm2.staticflickr.com/1493/24676269945_d19daf856c_b.jpg)
While such architectures may appear to provide the ultimate in availability,
there are several factors to consider. Network latency can be problematic, as
can partitioning. As such, you should seriously consider technologies that
provide reliable, high speed, low latency connections into these cloud
platforms &ndash; technologies such as AWS Direct Connect and Azure
ExpressRoute.
If you are considering a production deployment across multiple infrastructures
like this, make sure you have good test coverage over your entire system.
### Isolated production environments
It is possible to run multiple environments, such as development, staging, and
production, on a single Swarm cluster. You accomplish this by tagging Swarm
nodes and using constraints to filter containers onto nodes tagged as
`production` or `staging` etc. However, this is not recommended. The recommended
approach is to air-gap production environments, especially high performance
business critical production environments.
For example, many companies not only deploy dedicated, isolated infrastructures
for production &ndash; such as networks, storage, compute, and other systems &ndash;
they also deploy separate management systems and policies. This results in
things like users having separate accounts for logging on to production systems.
In these types of environments, it is mandatory to deploy dedicated
production Swarm clusters that operate on the production hardware infrastructure
and follow thorough production management, monitoring, audit and other policies.
### Operating system selection
You should give careful consideration to the operating system that your Swarm
infrastructure relies on. This consideration is vital for production
environments.
It is not unusual for a company to use one operating system in development
environments, and a different one in production. A common example of this is to
use CentOS in development environments, but then to use Red Hat Enterprise Linux
(RHEL) in production. This decision is often a balance between cost and support.
CentOS Linux can be downloaded and used for free, but commercial support options
are few and far between. Whereas RHEL has an associated support and license
cost, but comes with world class commercial support from Red Hat.
When choosing the production operating system to use with your Swarm clusters,
you should choose one that closely matches what you have used in development and
staging environments. Although containers abstract much of the underlying OS,
some things are mandatory. For example, Docker container networks require Linux
kernel 3.16 or higher. Operating a 4.x kernel in development and staging and
then 3.14 in production will certainly cause issues.
You should also consider procedures and channels for deploying and potentially
patching your production operating systems.
## Performance
Performance is critical in environments that support business critical line of
business applications. The following sections discuss some technologies and
best practices that can help you build high performance Swarm clusters.
### Container networks
Docker Engine container networks are overlay networks and can be created across
multiple Engine hosts. For this reason, a container network requires a key-value
(KV) store to maintain network configuration and state. This KV store can be
shared in common with the one used by the Swarm cluster discovery service.
However, for best performance and fault isolation, you should deploy individual
KV store instances for container networks and Swarm discovery. This is
especially so in demanding business critical production environments.
Engine container networks also require version 3.16 or higher of the Linux
kernel. Higher kernel versions are usually preferred, but carry an increased
risk of instability because of the newness of the kernel. Where possible, you
should use a kernel version that is already approved for use in your production
environment. If you do not have a 3.16 or higher Linux kernel version approved
for production, you should begin the process of getting one as early as
possible.
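For illustration only (a sketch; the KV store address and network name are
placeholders), each Engine daemon points at the dedicated KV store, and the
overlay network is then created once from any host in the cluster:
```
# Added to each Engine daemon's startup options (e.g. DOCKER_OPTS on Ubuntu)
--cluster-store=consul://<kv-store-ip>:8500 --cluster-advertise=eth0:2375

# Run once, from any Engine attached to the same KV store
$ docker network create --driver overlay app-network
```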
### Scheduling strategies
<!-- NIGEL: This reads like an explanation of specific scheduling strategies rather than guidance on which strategy to pick for production or with consideration of a production architecture choice. For example, is spread a problem in a multiple clouds or random not good for XXX type application for YYY reason?
Or perhaps there is nothing to consider when it comes to scheduling strategy and network / HA architecture, application, os choice etc. that good?
-->
Scheduling strategies are how Swarm decides which nodes on a cluster to start
containers on. Swarm supports the following strategies:
- spread
- binpack
- random (not for production use)
You can also write your own.
**Spread** is the default strategy. It attempts to balance the number of
containers evenly across all nodes in the cluster. This is a good choice for
high performance clusters, as it spreads container workload across all
resources in the cluster. These resources include CPU, RAM, storage, and
network bandwidth.
If your Swarm nodes are balanced across multiple failure domains, the spread
strategy evenly balances containers across those failure domains. However,
spread on its own is not aware of the roles of any of those containers, so it
has no intelligence to spread multiple instances of the same service across
failure domains. To achieve this, you should use tags and constraints; a brief
example follows.
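A sketch of both ideas follows (the image name, node label, and Consul address
are placeholders): the scheduling strategy is chosen when the Swarm manager
starts, while constraints rely on labels that you set on each Engine daemon.
```
# Choose the strategy when starting the Swarm manager (spread is the default)
$ docker run -d -p 3375:2375 swarm manage --strategy spread consul://<consul-ip>:8500/

# Schedule a container only onto nodes whose Engine was started with --label zone=a
$ docker run -d -e constraint:zone==a my-web-image
```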
The **binpack** strategy runs as many containers as possible on a node,
effectively filling it up, before scheduling containers on the next node.
This means that binpack does not use all cluster resources until the cluster
fills up. As a result, applications running on Swarm clusters that operate the
binpack strategy might not perform as well as those that operate the spread
strategy. However, binpack is a good choice for minimizing infrastructure
requirements and cost. For example, imagine you have a 10-node cluster where
each node has 16 CPUs and 128GB of RAM. However, your container workload across
the entire cluster is only using the equivalent of 6 CPUs and 64GB RAM. The
spread strategy would balance containers across all nodes in the cluster.
However, the binpack strategy would fit all containers on a single node,
potentially allowing you to turn off the additional nodes and save on cost.
## Ownership of Swarm clusters
The question of ownership is vital in production environments. You should
therefore consider and agree on all of the following when planning,
documenting, and deploying your production Swarm clusters.
- Whose budget does the production Swarm infrastructure come out of?
- Who owns the accounts that can administer and manage the production Swarm
cluster?
- Who is responsible for monitoring the production Swarm infrastructure?
- Who is responsible for patching and upgrading the production Swarm
infrastructure?
- Who has on-call responsibility, and what are the escalation procedures?
The above is not a complete list, and the answers to the questions will vary
depending on how your organization and teams are structured. Some companies
are a long way down the DevOps route, while others are not. Whatever situation
your company is in, it is important that you factor all of the above into the
planning, deployment, and ongoing management of your production Swarm clusters.
## Related information
* [Try Swarm at scale](swarm_at_scale.md)
* [Swarm and container networks](networking.md)
* [High availability in Docker Swarm](multi-manager-setup.md)
* [Universal Control plane](https://www.docker.com/products/docker-universal-control-plane)


@ -6,7 +6,7 @@ keywords = ["docker, swarm, clustering, scheduling"]
[menu.main]
identifier="swarm_sched"
parent="workw_swarm"
weight=80
weight=5
+++
<![end-metadata]-->

167
docs/secure-swarm-tls.md Normal file

@ -0,0 +1,167 @@
<!--[metadata]>
+++
title = "Overview Docker Swarm with TLS"
description = "Swarm and transport layer security"
keywords = ["docker, swarm, TLS, discovery, security, certificates"]
[menu.main]
parent="workw_swarm"
weight=50
+++
<![end-metadata]-->
# Overview Swarm with TLS
All nodes in a Swarm cluster must bind their Docker daemons to a network port.
This has obvious security implications. These implications are compounded when
the network in question is untrusted such as the internet. To mitigate these
risks, Docker Swarm and the Docker Engine daemon support Transport Layer Security
(TLS).
> **Note**: TLS is the successor to SSL (Secure Sockets Layer) and the two
> terms are often used interchangeably. Docker uses TLS, and that is the term
> used throughout this article.
## Learn the TLS concepts
Before going further, it is important to understand the basic concepts of TLS
and public key infrastructure (PKI).
Public key infrastructure is a combination of security-related technologies,
policies, and procedures that are used to create and manage digital
certificates. These certificates and infrastructure secure digital
communication using mechanisms such as authentication and encryption.
The following analogy may be useful. It is common practice that passports are
used to verify an individual's identity. Passports usually contain a photograph
and biometric information that identify the owner. A passport also lists the
country that issued it, as well as *valid from* and *valid to* dates. Digital
certificates are very similar. The text below is an extract from a digital
certificate:
```
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 9590646456311914051 (0x8518d2237ad49e43)
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=US, ST=CA, L=Sanfrancisco, O=Docker Inc
Validity
Not Before: Jan 18 09:42:16 2016 GMT
Not After : Jan 15 09:42:16 2026 GMT
Subject: CN=swarm
```
This certificate identifies a computer called **swarm**. The certificate is valid between January 2016 and January 2026 and was issued by Docker Inc based in the state of California in the US.
Just as passports authenticate individuals as they board flights and clear
customs, digital certificates authenticate computers on a network.
Public key infrastructure (PKI) is the combination of technologies, policies,
and procedures that work behind the scenes to enable digital certificates. Some
of the technologies, policies and procedures provided by PKI include:
- Services to securely request certificates
- Procedures to authenticate the entity requesting the certificate
- Procedures to determine the entity's eligibility for the certificate
- Technologies and processes to issue certificates
- Technologies and processes to revoke certificates
## How does Docker Engine authenticate using TLS
In this section, you'll learn how Docker Engine and Swarm use PKI and
certificates to increase security.
<!--[metadata]>Need to know about encryption too<![end-metadata]-->
You can configure both the Docker Engine CLI and the Engine daemon to require
TLS for authentication. Configuring TLS means that all communications between
the Engine CLI and the Engine daemon must be accompanied by, and signed with, a
trusted digital certificate. The Engine CLI must provide its digital certificate
before the Engine daemon will accept incoming commands from it.
The Engine daemon must also trust the certificate that the Engine CLI uses.
This trust is usually established by way of a trusted third party. The Engine
CLI and daemon in the diagram below are configured to require TLS
authentication.
![](images/trust-diagram.jpg)
The trusted third party in this diagram is the Certificate Authority (CA)
server. Like the country in the passport example, a CA creates, signs, issues,
and revokes certificates. Trust is established by installing the CA's root
certificate on the host running the Engine daemon. The Engine CLI then requests
its own certificate from the CA server, which the CA server signs and issues to
the client.
The Engine CLI sends its certificate to the Engine daemon before issuing
commands. The daemon inspects the certificate, and because the daemon trusts the CA,
the daemon automatically trusts any certificates signed by the CA. Assuming the
certificate is in order (the certificate has not expired or been revoked etc.)
the Engine daemon accepts commands from this trusted Engine CLI.
The Docker Engine CLI is simply a client that uses the Docker Remote API to
communicate with the Engine daemon. Any client that uses this Docker Remote API can use
TLS. For example, other Engine clients such as Docker Universal Control Plane
(UCP) have TLS support built in. Other third-party products that use Docker's
Remote API can also be configured this way.
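As a quick illustration of what this looks like from the client side (a sketch
only; the host name and certificate paths are placeholders), a TLS-enabled CLI
call presents its certificate and key with every command:
```
$ docker --tlsverify \
    --tlscacert /path/to/ca.pem \
    --tlscert /path/to/cert.pem \
    --tlskey /path/to/key.pem \
    -H tcp://<daemon-host>:2376 info
```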
## TLS modes with Docker and Swarm
Now that you know how certificates are used by Docker Engine for authentication,
it's important to be aware of the three TLS configurations possible with Docker
Engine and its clients:
- External 3rd party CA
- Internal corporate CA
- Self-signed certificates
These configurations are differentiated by the type of entity acting as the Certificate Authority (CA).
### External 3rd party CA
An external CA is a trusted 3rd party company that provides a means of creating,
issuing, revoking, and otherwise managing certificates. They are *trusted* in
the sense that they have to fulfill specific conditions and maintain high levels
of security and business practices to win your business. You also have to
install the external CA's root certificates for your computers and services to
*trust* them.
When you use an external 3rd party CA, they create, sign, issue, revoke and
otherwise manage your certificates. They normally charge a fee for these
services, but are considered an enterprise-class scalable solution that
provides a high degree of trust.
### Internal corporate CA
Many organizations choose to implement their own Certificate Authorities and
PKI. Common examples are using OpenSSL and Microsoft Active Directory. In this
case, your company is its own Certificate Authority with all the work it
entails. The benefit is, as your own CA, you have more control over your PKI.
Running your own CA and PKI requires you to provide all of the services offered
by external 3rd party CAs. These include creating, issuing, revoking, and
otherwise managing certificates. Doing all of this yourself has its own costs
and overheads. However, for a large corporation, it still may reduce costs in
comparison to using an external 3rd party service.
Assuming you operate and manage your own internal CAs and PKI properly, an
internal, corporate CA can be a highly scalable and highly secure option.
### Self-signed certificates
As the name suggests, self-signed certificates are certificates that are signed
with their own private key rather than a trusted CA. This is a low cost and
simple to use option. If you implement and manage self-signed certificates
correctly, they can be better than using no certificates.
Because self-signed certificates lack a full-blown PKI, they do not scale
well and lack many of the advantages offered by the other options. One of their
disadvantages is you cannot revoke self-signed certificates. Due to this, and
other limitations, self-signed certificates are considered the least secure of
the three options. Self-signed certificates are not recommended for public
facing production workloads exposed to untrusted networks.
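For completeness, a self-signed certificate and key can be produced with a
single OpenSSL command (a sketch only; the key size, validity period, and
subject name are placeholder choices):
```
$ openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout key.pem -out cert.pem -subj "/CN=myhost"
```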
## Related information
* [Configure Docker Swarm for TLS](configure-tls.md)
* [Docker security](https://docs.docker.com/engine/security/security/)

771
docs/swarm_at_scale.md Normal file

@ -0,0 +1,771 @@
<!--[metadata]>
+++
title = "Try Swarm at scale"
description = "Try Swarm at scale"
keywords = ["docker, swarm, scale, voting, application, certificates"]
[menu.main]
parent="workw_swarm"
weight=75
+++
<![end-metadata]-->
# Try Swarm at scale
Using this example, you'll deploy a voting application on a Swarm cluster. The
example walks you through creating a Swarm cluster and deploying the application
against the cluster. This walk through is intended to illustrate one example of
a typical development process.
After building and manually deploying the voting application, you'll construct a
Docker Compose file. You (or others) can use the file to deploy and scale the
application further. The article also provides a troubleshooting section you can
use while developing or deploying the voting application.
## About the example
Your company is a pet food company that has bought a commercial during the
Super Bowl. The commercial drives viewers to a web survey that asks users to vote &ndash; cats or dogs. You are developing the web survey. Your survey must ensure that
millions of people can vote concurrently without your website becoming
unavailable. You don't need real-time results because a company press release
will announce the results. However, you do need confidence that every vote is counted.
The example assumes you are deploying the application to a Docker Swarm cluster
running on top of Amazon Web Services (AWS). AWS is an example only. There is
nothing about this application or deployment that requires it. You could deploy
the application to a Docker Swarm cluster running on a different cloud provider
such as Microsoft Azure, on premises in your own physical data center, or in a
development environment on your laptop.
The example requires you to perform the following high-level steps:
- [Deploy your infrastructure](#deploy-your-infrastructure)
- [Create the Swarm cluster](#create-the-swarm-cluster)
- [Overlay a container network on the cluster](#overlay-a-container-network-on-the-cluster)
- [Deploy the voting application](#deploy-the-voting-application)
- [Test the application](#test-the-application)
Before working through the sample, make sure you understand the application and Swarm cluster architecture.
### Application architecture
The voting application is a Dockerized microservice application. It uses a
parallel web frontend that sends jobs to asynchronous background workers. The
application's design can accommodate arbitrarily large scale. The diagram below
shows the high level architecture of the application.
![](images/app-architecture.jpg)
The application is fully Dockerized with all services running inside of
containers.
The frontend consists of an Interlock load balancer with *n* frontend web
servers and associated queues. The load balancer can handle an arbitrary number
of web containers behind it (`frontend01`- `frontendN`). The web containers run
a simple Python Flask application. Each container accepts votes and queues them
to a Redis container on the same node. Each web container and Redis queue pair
operates independently.
The load balancer together with the independent pairs allows the entire
application to scale to an arbitrary size as needed to meet demand.
Behind the frontend is a worker tier which runs on separate nodes. This tier:
* scans the Redis containers
* dequeues votes
* deduplicates votes to prevent double voting
* commits the results to a Postgres container running on a separate node
Just like the front end, the worker tier can also scale arbitrarily.
### Swarm Cluster Architecture
To support the application, the design calls for a Swarm cluster with a single Swarm manager and 4 nodes, as shown below.
![](images/swarm-cluster-arch.jpg)
All four nodes in the cluster are running the Docker daemon, as is the Swarm
manager and the Interlock load balancer. The Swarm manager exists on a Docker
host that is not part of the cluster and is considered out of band for the
application. The Interlock load balancer could be placed inside of the cluster,
but for this demonstration it is not.
The diagram below shows the application architecture overlayed on top of the
Swarm cluster architecture. After completing the example and deploying your
application, this is what your environment should look like.
![](images/final-result.jpg)
As the previous diagram shows, each node in the cluster runs the following containers:
- `frontend01`:
- Container: Python Flask web app (frontend01)
- Container: Redis (redis01)
- `frontend02`:
- Container: Python Flask web app (frontend02)
- Container: Redis (redis02)
- `worker01`: vote worker app (worker01)
- `store`:
- Container: Postgres (pg)
- Container: results app (results-app)
## Deploy your infrastructure
As previously stated, this article will walk you through deploying the
application to a Swarm cluster in an AWS Virtual Private Cloud (VPC). However,
you can reproduce the environment design on whatever platform you wish. For
example, you could place the application on another public cloud platform such
as DigitalOcean, on premises in your data center, or even in a test
environment on your laptop.
Deploying the AWS infrastructure requires that you first build the VPC and then
apply the [CloudFormation
template](https://github.com/docker/swarm-demo-voting-app/blob/master/AWS/cloudformation.json).
While you could create the entire VPC and all instances via a CloudFormation
template, splitting the deployment into two steps allows the CloudFormation
template to be easily used to build instances in *existing VPCs*.
The diagram below shows the VPC infrastructure required to run the
CloudFormation template.
![](images/cloud-formation-tmp.jpg)
The AWS configuration is a single VPC with a single public subnet. The VPC must
be in the `us-west-1` Region (N. California). This Region is required for this
particular CloudFormation template to work. The VPC network address space is
`192.168.0.0/16`, and a single 24-bit public subnet is carved out as
`192.168.33.0/24`. The subnet must be configured with a default route to the
internet via the VPC's internet gateway. All 6 EC2 instances are deployed into
this public subnet.
Once the VPC is created you can deploy the EC2 instances using the
CloudFormation template located
[here](https://github.com/docker/swarm-demo-voting-app/blob/master/AWS/cloudformation.json).
>**Note**: If you are not deploying to AWS, or are not using the CloudFormation template mentioned above, make sure your Docker hosts are running a 3.16 or higher kernel. This kernel is required by Docker's container networking feature.
### Step 1. Build and configure the VPC
This step assumes you know [how to configure a VPC](link here) either manually
or using the VPC wizard on Amazon. You can build the VPC manually or by using
the VPC Wizard. If you use the wizard, be sure to choose the **VPC with a
Single Public Subnet** option.
Configure your VPC with the following values:
- **Region**: N. California (us-west-1)
- **VPC Name**: Swarm-scale
- **VPC Network (CIDR)**: 192.168.0.0/16
- **DNS resolution**: Yes
- **Subnet name**: PublicSubnet
- **Subnet type**: Public (with route to the internet)
- **Subnet network (CIDR)**: 192.168.33.0/24
- **Auto-assign public IP**: Yes
- **Availability Zone**: Any
- **Router**: A single router with a route for *local* traffic and default route for traffic to the internet
- **Internet gateway**: A single internet gateway used as default route for the subnet's routing table
You'll configure the remaining AWS settings in the next section as part of the
CloudFormation template.
### Step 2. Apply the CloudFormation template
Before you can apply the CloudFormation template, you will need to have created
a VPC as per instructions in the previous section. You will also need access to
the private key of an EC2 KeyPair associated with your AWS account in the
`us-west-1` Region. Follow the steps below to build the remainder of the AWS
infrastructure using the CloudFormation template.
1. Choose **Create Stack** from the CloudFormation page in the AWS Console
2. Click the **Choose file** button under the **Choose a template** section
3. Select the **swarm-scale.json** CloudFormation template available from the [application's GitHub repo](https://github.com/docker/swarm-demo-voting-app/blob/master/AWS/cloudformation.json)
4. Click **Next**
5. Give the Stack a name. You can name the stack whatever you want, though it is recommended to use a meaningful name
6. Select a KeyPair from the dropdown list
7. Select the correct **Subnetid** (PublicSubnet) and **Vpcid** (SwarmCluster) from the dropdowns
8. Click **Next**
9. Click **Next** again
10. Review your settings and click **Create**
AWS displays the progress of your stack as it is created.
### Step 3. Check your deployment
When it completes, the CloudFormation template populates your VPC with the following six EC2 instances:
- `manager`: t2.micro / 192.168.33.11
- `interlock`: t2.micro / 192.168.33.12
- `frontend01`: t2.micro / 192.168.33.20
- `frontend02`: t2.micro / 192.168.33.21
- `worker01`: t2.micro / 192.168.33.200
- `store`: m3.medium / 192.168.33.250
Your AWS infrastructure should look like this.
![](images/aws-infrastructure.jpg)
All instances are based on the `ami-56f59e36` AMI. This is an Ubuntu 14.04
image with a 3.16 kernel and Docker Engine 1.9.1 installed. It also has
the following parameters added to the `DOCKER_OPTS` line in
`/etc/default/docker`:
```
--cluster-store=consul://192.168.33.11:8500 --cluster-advertise=eth0:2375 -H=tcp://0.0.0.0:2375 -H=unix:///var/run/docker.sock
```
Once your stack is created successfully you are ready to progress to the next
step and build the Swarm cluster. From this point, the instructions refer to the
AWS EC2 instances as "nodes".
## Create the Swarm cluster
Now that your underlying network infrastructure is built, you are ready to build and configure the Swarm cluster.
### Step 1: Construct the cluster
The steps below construct a Swarm cluster by:

* using Consul as the discovery backend
* joining the `frontend`, `worker`, and `store` EC2 instances to the cluster
* using the `spread` scheduling strategy
Perform all of the following commands from the `manager` node.
1. Start a new Consul container that listens on TCP port 8500
$ sudo docker run --restart=unless-stopped -d -p 8500:8500 -h consul progrium/consul -server -bootstrap
This starts a Consul container for use as the Swarm discovery service. This
backend is also used as the K/V store for the container network that you
overlay on the Swarm cluster in a later step.
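    Before moving on, you can optionally confirm that Consul is up and has
    elected a leader by querying its HTTP API (this assumes `curl` is installed
    on the `manager` node):

        $ curl http://192.168.33.11:8500/v1/status/leader

    A non-empty, quoted address in the response means Consul is ready; an empty
    response means it is still starting up.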
2. Start a Swarm manager container.
    This command maps port 3375 on the `manager` node to port 2375 inside the
    Swarm manager container.
$ sudo docker run --restart=unless-stopped -d -p 3375:2375 swarm manage consul://192.168.33.11:8500/
This Swarm manager container is the heart of your Swarm cluster. It is
responsible for receiving all Docker commands sent to the cluster, and for
scheduling resources against the cluster. In a real-world production
deployment you would configure additional replica Swarm managers as
secondaries for high availability (HA).
3. Set the `DOCKER_HOST` environment variable.
    This ensures that, by default, Docker commands are sent to the Swarm manager listening on port 3375 of the `manager` node.
$ export DOCKER_HOST="tcp://192.168.33.11:3375"
4. While still on the `manager` node, join the nodes to the cluster.
    You can run these commands from the `manager` node because the `-H` flag
    sends the commands to the Docker daemons on the nodes. The command joins a
    node to the cluster and registers it with the Consul discovery service.
sudo docker -H=tcp://<node-private-ip>:2375 run -d swarm join --advertise=<node-private-ip>:2375 consul://192.168.33.11:8500/
    Substitute `<node-private-ip>` in the command with the private IP of the
    node you are adding. Repeat this step for every node you are adding to the
    cluster: `frontend01`, `frontend02`, `worker01`, and `store`.
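    If you prefer not to repeat the command by hand for each node, the
    following convenience sketch loops over the private IPs listed earlier in
    this article and runs the same join command against each one:

        for ip in 192.168.33.20 192.168.33.21 192.168.33.200 192.168.33.250; do
            sudo docker -H=tcp://${ip}:2375 run -d swarm join --advertise=${ip}:2375 consul://192.168.33.11:8500/
        done

    Afterwards, `sudo docker -H tcp://192.168.33.11:3375 info` should report
    all four nodes.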
### Step 2: Review your work
The diagram below shows the Swarm cluster that you created.
![](images/review-work.jpg)
The diagram shows the `manager` node is running two containers: `consul` and
`swarm`. The `consul` container is providing the Swarm discovery service. This
is where nodes and services register themselves and discover each other. The
`swarm` container is running the `swarm manage` process which makes it act as
the cluster manager. The manager is responsible for accepting Docker commands
issued against the cluster and scheduling resources on the cluster.
You mapped port 3375 on the `manager` node to port 2375 inside the `swarm`
container. As a result, Docker clients (for example the CLI) wishing to issue
commands against the cluster must send them to the `manager` node on port
3375. The `swarm` container then executes those commands against the relevant
node(s) in the cluster over port 2375.
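For example, with `DOCKER_HOST` still set to `tcp://192.168.33.11:3375` (or
with `-H` passed explicitly), ordinary Docker commands are answered by the
Swarm manager on behalf of the whole cluster:

```
$ sudo docker -H tcp://192.168.33.11:3375 info    # cluster-wide summary, including the list of nodes
$ sudo docker -H tcp://192.168.33.11:3375 ps      # containers running anywhere in the cluster
```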
Now that you have your Swarm cluster configured, you'll overlay the container
network that the application containers will be part of.
## Overlay a container network on the cluster
All containers that are part of the voting application belong to a container
network called `mynet`. This will be an overlay network that allows all
application containers to easily communicate irrespective of the underlying
network that each node is on.
### Step 1: Create the network
You can create the network and join the containers from any node in your VPC
that is running Docker Engine. However, best practice when using Docker Swarm is
to execute commands from the `manager` node, as this is where all management
tasks happen.
1. Open a terminal on your `manager` node.
2. Create the overlay network with the `docker network` command
$ sudo docker network create --driver overlay mynet
An overlay container network is visible to all Docker daemons that use the
same discovery backend. As all Swarm nodes in your environment are
configured to use the Consul discovery service at
`consul://192.168.33.11:8500`, they all should see the new overlay network.
Verify this with the next step.
3. Log onto each node in your Swarm cluster and verify the `mynet` network is running.
$ sudo docker network ls
NETWORK ID NAME DRIVER
72fa20d0663d mynet overlay
bd55c57854b8 host host
25e34427f6ff bridge bridge
8eee5d2130ab none null
You should see an entry for the `mynet` network using the `overlay` driver as shown above.
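    For more detail than `docker network ls` provides, you can also inspect the
    network from any node. Among other things, the output lists the containers
    attached to the network (none yet at this point):

        $ sudo docker network inspect mynet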
### Step 2: Review your work
The diagram below shows the complete cluster configuration, including the
overlay container network `mynet`. The `mynet` network is shown in red and is
available to all Docker hosts using the Consul discovery backend. Later in the
procedure you will connect containers to this network.
![](images/overlay-review.jpg)
> **Note**: The `swarm` and `consul` containers on the `manager` node are not attached to the `mynet` overlay network.
Your cluster is now built and you are ready to build and run your application on
it.
## Deploy the voting application
Now it's time to configure the application.
Some of the containers in the application are launched from custom images you
must build. Others are launched from existing images pulled directly from Docker
Hub. Deploying the application requires that you:
- Understand the custom images
- Build custom images
- Pull stock images from Docker Hub
- Launch application containers
### Step 1: Understand the custom images
The list below shows which containers use custom images and which do not:
- Web containers: custom built image
- Worker containers: custom built image
- Results containers: custom built image
- Load balancer container: stock image (`ehazlett/interlock`)
- Redis containers: stock image (official `redis` image)
- Postgres (PostgreSQL) containers: stock image (official `postgres` image)
All custom built images are built using Dockerfiles pulled from the [application's public GitHub repository](https://github.com/docker/swarm-demo-voting-app).
1. Log into the Swarm manager node.
2. Clone the [application's GitHub repo](https://github.com/docker/swarm-demo-voting-app)
$ sudo git clone https://github.com/docker/swarm-demo-voting-app
This command creates a new directory structure inside of your working
directory. The new directory contains all of the files and folders required
to build the voting application images.
The `AWS` directory contains the `cloudformation.json` file used to deploy
the EC2 instances. The `Vagrant` directory contains files and instructions
required to deploy the application using Vagrant. The `results-app`,
`vote-worker`, and `web-vote-app` directories contain the Dockerfiles and
other files required to build the custom images for those particular
components of the application.
3. Change directory into the `swarm-demo-voting-app/web-vote-app` directory and inspect the contents of the `Dockerfile`
$ cd swarm-demo-voting-app/web-vote-app/
$ cat Dockerfile
FROM python:2.7
WORKDIR /app
ADD requirements.txt /app/requirements.txt
RUN pip install -r requirements.txt
ADD . /app
EXPOSE 80
CMD ["python", "app.py"]
    As you can see, the image is based on the official `python:2.7` image. It
    adds a requirements file into the `/app` directory, installs the
    requirements, copies the files from the build context into the container,
    exposes port `80`, and tells the container which command to run.
### Step 2. Build custom images
1. Log into the swarm manager node if you haven't already.
2. Change to the root of your swarm-demo-voting app clone.
3. Build the `web-vote-app` image on `frontend01` and `frontend02`
$ sudo docker -H tcp://192.168.33.20:2375 build -t web-vote-app ./web-vote-app
$ sudo docker -H tcp://192.168.33.21:2375 build -t web-vote-app ./web-vote-app
These commands build the `web-vote-app` image on the `frontend01` and
`frontend02` nodes. To accomplish the operation, each command copies the
contents of the `swarm-demo-voting-app/web-vote-app` sub-directory from the
`manager` node to each frontend node. The command then instructs the
Docker daemon on each frontend node to build the image and store it locally.
It may take a minute or so for each image to build. Wait for the builds to finish.
4. Build the `vote-worker` image on the `worker01` node
$ sudo docker -H tcp://192.168.33.200:2375 build -t vote-worker ./vote-worker
It may take a minute or so for the image to build. Wait for the build to finish.
5. Build the `results-app` image on the `store` node
$ sudo docker -H tcp://192.168.33.250:2375 build -t results-app ./results-app
Each of the *custom images* required by the application is now built and stored locally on the nodes that will use them.
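If you want to confirm that each image landed on the node you expect, you can
list it on that node, for example:

```
$ sudo docker -H tcp://192.168.33.20:2375 images web-vote-app
$ sudo docker -H tcp://192.168.33.21:2375 images web-vote-app
$ sudo docker -H tcp://192.168.33.200:2375 images vote-worker
$ sudo docker -H tcp://192.168.33.250:2375 images results-app
```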
### Step 3. Pull stock images from Docker Hub
For performance reasons, it is always better to pull any required Docker Hub images locally on each instance that needs them. This ensures that containers based on those images can start quickly.
1. Log into the Swarm `manager` node.
2. Pull the `redis` image to `frontend01` and `frontend02`
$ sudo docker -H tcp://192.168.33.20:2375 pull redis
$ sudo docker -H tcp://192.168.33.21:2375 pull redis
3. Pull the `postgres` image to the `store` node
$ sudo docker -H tcp://192.168.33.250:2375 pull postgres
4. Pull the `ehazlett/interlock` image to the `interlock` node
$ sudo docker -H tcp://192.168.33.12:2375 pull ehazlett/interlock
Each node in the cluster, as well as the `interlock` node, now has the required images stored locally as shown below.
![](images/interlock.jpg)
Now that all images are built, pulled, and stored locally, the next step is to start the application.
### Step 4. Start the voting application
The following steps guide you through starting the application:
* Start the `interlock` load balancer container on `interlock`
* Start the `redis` containers on `frontend01` and `frontend02`
* Start the `web-vote-app` containers on `frontend01` and `frontend02`
* Start the `postgres` container on `store`
* Start the `worker` container on `worker01`
* Start the `results-app` container on `store`
Do the following:
1. Log into the Swarm `manager` node.
2. Start the `interlock` container on the `interlock` node
$ sudo docker -H tcp://192.168.33.12:2375 run --restart=unless-stopped -p 80:80 --name interlock -d ehazlett/interlock --swarm-url tcp://192.168.33.11:3375 --plugin haproxy start
    This command is issued against the `interlock` instance and maps port 80 on the instance to port 80 inside the container. This allows the container to load balance connections coming in over port 80 (HTTP). The command also applies the `--restart=unless-stopped` policy to the container, telling Docker to restart the container if it exits unexpectedly.
3. Start a `redis` container on `frontend01` and `frontend02`
$ sudo docker run --restart=unless-stopped --env="constraint:node==frontend01" -p 6379:6379 --name redis01 --net mynet -d redis
$ sudo docker run --restart=unless-stopped --env="constraint:node==frontend02" -p 6379:6379 --name redis02 --net mynet -d redis
    These two commands are issued against the Swarm cluster. The commands specify *node constraints*, forcing Swarm to start the containers on `frontend01` and `frontend02`. Port 6379 on each instance is mapped to port 6379 inside of each container for debugging purposes. The command also applies the `--restart=unless-stopped` policy to the containers and attaches them to the `mynet` overlay network.
4. Start a `web-vote-app` container on `frontend01` and `frontend02`
$ sudo docker run --restart=unless-stopped --env="constraint:node==frontend01" -d -p 5000:80 -e WEB_VOTE_NUMBER='01' --name frontend01 --net mynet --hostname votingapp.local web-vote-app
$ sudo docker run --restart=unless-stopped --env="constraint:node==frontend02" -d -p 5000:80 -e WEB_VOTE_NUMBER='02' --name frontend02 --net mynet --hostname votingapp.local web-vote-app
    These two commands are issued against the Swarm cluster. The commands specify *node constraints*, forcing Swarm to start the containers on `frontend01` and `frontend02`. Port 5000 on each node is mapped to port 80 inside of each container. This allows connections to come in to each node on port 5000 and be forwarded to port 80 inside of each container. Both containers are attached to the `mynet` overlay network and both containers are given the `votingapp.local` hostname. The `--restart=unless-stopped` policy is also applied to these containers.
5. Start the `postgres` container on the `store` node
$ sudo docker run --restart=unless-stopped --env="constraint:node==store" --name pg -e POSTGRES_PASSWORD=pg8675309 --net mynet -p 5432:5432 -d postgres
This command is issued against the Swarm cluster and starts the container on `store`. It maps port 5432 on the `store` node to port 5432 inside the container and attaches the container to the `mynet` overlay network. It also inserts the database password into the container via the POSTGRES_PASSWORD environment variable and applies the `--restart=unless-stopped` policy to the container. Sharing passwords like this is not recommended for production use cases.
6. Start the `worker01` container on the `worker01` node
$ sudo docker run --restart=unless-stopped --env="constraint:node==worker01" -d -e WORKER_NUMBER='01' -e FROM_REDIS_HOST=1 -e TO_REDIS_HOST=2 --name worker01 --net mynet vote-worker
This command is issued against the Swarm manager and uses a constraint to start the container on the `worker01` node. It passes configuration data into the container via environment variables, telling the worker container to clear the queues on `frontend01` and `frontend02`. It adds the container to the `mynet` overlay network and applies the `--restart=unless-stopped` policy to the container.
7. Start the `results-app` container on the `store` node
$ sudo docker run --restart=unless-stopped --env="constraint:node==store" -p 80:80 -d --name results-app --net mynet results-app
This command starts the results-app container on the `store` node by means of a *node constraint*. It maps port 80 on the `store` node to port 80 inside the container. It adds the container to the `mynet` overlay network and applies the `--restart=unless-stopped` policy to the container.
The application is now fully deployed as shown in the diagram below.
![](images/fully-deployed.jpg)
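As a quick sanity check, you can list everything the Swarm manager has
scheduled. The `interlock` container was started directly against the
`interlock` node rather than through Swarm, so list it separately:

```
$ sudo docker -H tcp://192.168.33.11:3375 ps      # containers scheduled through the Swarm manager
$ sudo docker -H tcp://192.168.33.12:2375 ps      # the interlock container on the interlock node
```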
## Test the application
Now that the application is deployed and running, it's time to test it.
1. Configure a DNS mapping on your local machine for browsing.
You configure a DNS mapping on the machine where you are running your web browser. This maps the "votingapp.local" DNS name to the public IP address of the `interlock` node.
    - On Windows machines, add `<interlock-public-ip> votingapp.local` to the `C:\Windows\System32\Drivers\etc\hosts` file. Modifying this file requires administrator privileges. To open the file with administrator privileges, right-click `C:\Windows\System32\notepad.exe` and select `Run as administrator`. Once Notepad is open, click `File` > `Open`, open the file, and make the edit.
    - On OS X machines, add `<interlock-public-ip> votingapp.local` to `/private/etc/hosts`.
    - On most Linux machines, add `<interlock-public-ip> votingapp.local` to `/etc/hosts`.

    Be sure to replace `<interlock-public-ip>` with the public IP address of your `interlock` node. You can find the `interlock` node's public IP by selecting your `interlock` EC2 instance in the AWS EC2 console.
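    For example, if the public IP address of your `interlock` node were
    `203.0.113.10` (an illustrative address only), the hosts file entry would
    look like this:

        203.0.113.10    votingapp.local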
2. Verify that the mapping works with a ping command from the machine running your web browser
C:\Users\nigelpoulton>ping votingapp.local
Pinging votingapp.local [54.183.164.230] with 32 bytes of data:
Reply from 54.183.164.230: bytes=32 time=164ms TTL=42
Reply from 54.183.164.230: bytes=32 time=163ms TTL=42
Reply from 54.183.164.230: bytes=32 time=169ms TTL=42
3. Now that name resolution is configured and you have successfully pinged `votingapp.local`, point your web browser to [http://votingapp.local](http://votingapp.local)
![](images/vote-app-test.jpg)
Notice the text at the bottom of the web page. This shows which web
container serviced the request. In the diagram above, this is `frontend02`.
If you refresh your web browser you should see this change as the Interlock
load balancer shares incoming requests across both web containers.
To see more detailed load balancer data from the Interlock service, point your web browser to [http://stats:interlock@votingapp.local/haproxy?stats](http://stats:interlock@votingapp.local/haproxy?stats)
![](images/proxy-test.jpg)
4. Cast your vote. It is recommended to choose "Dogs" ;-)
5. To see the results of the poll, you can point your web browser at the public IP of the `store` node
![](images/poll-results.jpg)
Congratulations. You have successfully walked through manually deploying a microservice-based application to a Swarm cluster.
## Troubleshooting the application
It's a fact of life that things fail. With this in mind, it's important to
understand what happens when failures occur and how to mitigate them. The
following sections cover different failure scenarios:
- [Swarm manager failures](#swarm-manager-failures)
- [Consul (discovery backend) failures](#consul-discovery-backend-failures)
- [Interlock load balancer failures](#interlock-load-balancer-failures)
- [Web (web-vote-app) failures](#web-web-vote-app-failures)
- [Redis failures](#redis-failures)
- [Worker (vote-worker) failures](#worker-vote-worker-failures)
- [Postgres failures](#postgres-failures)
- [Results-app failures](#results-app-failures)
- [Infrastructure failures](#infrastructure-failures)
### Swarm manager failures
In it's current configuration, the Swarm cluster only has single manager
container running on a single node. If the container exits or the node fails,
you will not be able to administer the cluster until you either; fix it, or
replace it.
If the failure is the Swarm manager container unexpectedly exiting, Docker will
automatically attempt to restart it. This is because the container was started
with the `--restart=unless-stopped` switch.
While the Swarm manager is unavailable, the application will continue to work in
its current configuration. However, you will not be able to provision more nodes
or containers until you have a working Swarm manager.
Docker Swarm supports high availability for Swarm managers. This allows a single
Swarm cluster to have two or more managers. One manager is elected as the
primary manager and all others operate as secondaries. In the event that the
primary manager fails, one of the secondaries is elected as the new primary, and
cluster operations continue gracefully. If you are deploying multiple Swarm
managers for high availability, you should consider spreading them across
multiple failure domains within your infrastructure.
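As a rough sketch only (see [High availability in Docker Swarm](multi-manager-setup.md)
for the full procedure), each manager is started with the `--replication` flag
and pointed at the same discovery backend; the replica's address below is a
placeholder:

```
# On the existing manager node -- illustrative sketch only
$ sudo docker run --restart=unless-stopped -d -p 3375:2375 swarm manage --replication --advertise 192.168.33.11:3375 consul://192.168.33.11:8500/

# On an additional manager node (replace <replica-private-ip> with its private IP)
$ sudo docker run --restart=unless-stopped -d -p 3375:2375 swarm manage --replication --advertise <replica-private-ip>:3375 consul://192.168.33.11:8500/
```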
### Consul (discovery backend) failures
The Swarm cluster that you have deployed has a single Consul container on a
single node performing the cluster discovery service. In this setup, if the
Consul container exits or the node fails, the application will continue to
operate in its current configuration. However, certain cluster management
operations will fail. These include registering new containers in the cluster
and making lookups against the cluster configuration.
If the failure is the `consul` container unexpectedly exiting, Docker will
automatically attempt to restart it. This is because the container was started
with the `--restart=unless-stopped` switch.
The `Consul`, `etcd`, and `ZooKeeper` discovery service backends support various
options for high availability, including Raft or Paxos quorums. You should
follow existing best practices for deploying HA configurations of your chosen
discovery service backend. If you are deploying multiple discovery service
instances for high availability, you should consider spreading them across
multiple failure domains within your infrastructure.
If you operate your Swarm cluster with a single discovery backend service and
this service fails and is unrecoverable, you can start a new empty instance of
the discovery backend and the Swarm agents on each node in the cluster will
repopulate it.
#### Handling failures
There are many reasons why containers can fail. However, Swarm does not attempt
to restart failed containers.
One way to automatically restart failed containers is to explicitly start them
with the `--restart=unless-stopped` flag. This tells the local Docker daemon
to attempt to restart the container if it unexpectedly exits. This only
works in situations where the node hosting the container and its Docker daemon
are still up. It cannot restart a container if the node hosting it has failed,
or if the Docker daemon itself has failed.
Another way is to have an external tool (external to the cluster) monitor the
state of your application, and make sure that certain service levels are
maintained. These service levels can include things like "have at least 10 web
server containers running". In this scenario, if the number of web containers
drops below 10, the tool will attempt to start more.
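The sketch below illustrates the idea in its simplest form; it is not a
production monitoring tool. It checks, via the Swarm manager, that the two web
containers from this article are running and tries to start any that are not:

```
#!/bin/bash
# Illustrative sketch only: verify the named web containers are running and
# try to start any that are not.
SWARM="tcp://192.168.33.11:3375"

for name in frontend01 frontend02; do
    if ! docker -H "$SWARM" ps --filter "name=$name" --filter "status=running" -q | grep -q .; then
        echo "$name is not running; attempting to start it"
        docker -H "$SWARM" start "$name" || echo "$name could not be started; manual intervention required"
    fi
done
```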
In our simple voting-app example, the front end is scalable and serviced by a
load balancer. In the event that one of the two web containers fails (or the
AWS instance that is hosting it), the load balancer stops routing requests
to it and sends all requests to the surviving web container. This solution is
highly scalable, meaning you can have up to *n* web containers behind the load
balancer.
### Interlock load balancer failures
The environment that you have provisioned has a single
[interlock](https://github.com/ehazlett/interlock) load balancer container
running on a single node. In this setup, if the container exits or the node fails,
the application will no longer be able to service incoming requests and the
application will be unavailable.
If the failure is the `interlock` container unexpectedly exiting, Docker will
automatically attempt to restart it. This is because the container was started
with the `--restart=unless-stopped` switch.
It is possible to build an HA Interlock load balancer configuration. One such
way is to have multiple Interlock containers on multiple nodes. You can then use
DNS round robin, or other technologies, to load balance across each Interlock
container. That way, if one Interlock container or node goes down, the others
will continue to service requests.
If you deploy multiple interlock load balancers, you should consider spreading
them across multiple failure domains within your infrastructure.
### Web (web-vote-app) failures
The environment that you have configured has two web-vote-app containers running
on two separate nodes. They operate behind an Interlock load balancer that
distributes incoming connections across both.
In the event that one of the web containers or nodes fails, the load balancer
will start directing all incoming requests to the surviving instance. Once the
failed instance is back up, or a replacement is added, the load balancer will
add it to the configuration and start sending a portion of the incoming requests
to it.
For highest availability you should deploy the two frontend web services
(`frontend01` and `frontend02`) in different failure zones within your
infrastructure. You should also consider deploying more.
### Redis failures
If a `redis` container fails, its partnered `web-vote-app` container will
not function correctly. The best solution in this instance might be to configure
health monitoring that verifies the ability to write to each Redis instance. If
an unhealthy `redis` instance is encountered, remove the `web-vote-app` and
`redis` combination and attempt remedial actions.
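A very simple version of such a check, assuming `redis-cli` is available on the
machine running it and using the published port 6379 from earlier in this
article, might look like this:

```
# Illustrative health check sketch: write and read back a key on each Redis instance.
for ip in 192.168.33.20 192.168.33.21; do
    if redis-cli -h "$ip" -p 6379 set healthcheck ok > /dev/null && \
       [ "$(redis-cli -h "$ip" -p 6379 get healthcheck)" = "ok" ]; then
        echo "redis on $ip passed the write/read check"
    else
        echo "redis on $ip failed the write/read check"
    fi
done
```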
### Worker (vote-worker) failures
If the worker container exits, or the node hosting it fails, the redis
containers will queue votes until the worker container comes back up. This
situation can persist indefinitely, though a worker eventually needs to come
back online to process the queued votes.
If the failure is the `worker01` container unexpectedly exiting, Docker will
automatically attempt to restart it. This is because the container was started
with the `--restart=unless-stopped` switch.
### Postgres failures
This application does not implement any form of HA or replication for Postgres.
Therefore, losing the Postgres container would cause the application to fail and
potentially lose or corrupt data. A better solution would be to implement some
form of Postgres HA or replication.
### Results-app failures
If the results-app container exits, you will not be able to browse to the
results of the poll until the container is back up and running. Results will
continue to be collected and counted; you just won't be able to view them
until the container is back up and running.
The results-app container was started with the `--restart=unless-stopped` flag
meaning that the Docker daemon will automatically attempt to restart it unless
it was administratively stopped.
### Infrastructure failures
There are many ways in which the infrastructure underpinning your applications
can fail. However, there are a few best practices that can be followed to help
mitigate and offset these failures.
One of these is to deploy infrastructure components over as many failure domains
as possible. On a service such as AWS, this often translates into balancing
infrastructure and services across multiple AWS Availability Zones (AZ) within a
Region.
To increase the availability of our Swarm cluster you could:
* Configure the Swarm manager for HA and deploy HA nodes in different AZs
* Configure the Consul discovery service for HA and deploy HA nodes in different AZs
* Deploy all scalable components of the application across multiple AZs
This configuration is shown in the diagram below.
![](images/infrastructure-failures.jpg)
This will allow us to lose an entire AZ and still have our cluster and
application operate.
But it doesn't have to stop there. Some applications can be balanced across AWS
Regions. In our example, we might deploy parts of our cluster and application in
the `us-west-1` Region and the rest in `us-east-1`. It is even becoming possible
to deploy services across cloud providers, or to balance services across public
cloud providers and your on-premises data centers.
The diagram below shows parts of the application and infrastructure deployed
across AWS and Microsoft Azure. But you could just as easily replace one of
those cloud providers with your own on premises data center. In these scenarios,
network latency and reliability is key to a smooth and workable solution.
![](images/deployed-across.jpg)
## Related information
The application in this example could be deployed on Docker Universal Control Plane (UCP), which is currently in Beta release. To try the application on UCP in your environment, [request access to the UCP Beta release](https://www.docker.com/products/docker-universal-control-plane). Other useful documentation:
* [Plan for Swarm in production](plan-for-production.md)
* [Swarm and container networks](networking.md)
* [High availability in Docker Swarm](multi-manager-setup.md)