---
advisory: swarm-standalone
hide_from_sitemap: true
description: High availability in Swarm
keywords: docker, swarm,  clustering
title: High availability in Docker Swarm
---

In Docker Swarm, the **swarm manager** is responsible for the entire cluster and manages the resources of multiple *Docker hosts* at scale. If the swarm manager dies, you must create a new one and deal with an interruption of service.

The *High Availability* feature allows a swarm to gracefully handle the failover of a manager instance. Using this feature, you can create a single **primary manager** instance and multiple **replica** instances.

A primary manager is the main point of contact with the swarm cluster. You can also create and talk to replica instances that act as backups. Requests issued on a replica are automatically proxied to the primary manager. If the primary manager fails, a replica takes over the lead. In this way, you always keep a point of contact with the cluster.

## Set up primary and replicas

This section explains how to set up Docker Swarm using multiple **managers**.

### Assumptions

You need a `Consul`, `etcd`, or `Zookeeper` cluster. This procedure assumes a `Consul` server running at `192.168.42.10:8500`. All hosts have a Docker Engine configured to listen on port 2375. The managers operate on port 4000. The sample swarm configuration has three manager machines:

- `manager-1` on `192.168.42.200`
- `manager-2` on `192.168.42.201`
- `manager-3` on `192.168.42.202`

### Create the primary manager

You use the `swarm manage` command with the `--replication` and `--advertise` flags to create a primary manager.

    user@manager-1 $ swarm manage -H :4000 <tls-config-flags> --replication --advertise 192.168.42.200:4000 consul://192.168.42.10:8500/nodes
    INFO[0000] Listening for HTTP                            addr=:4000 proto=tcp
    INFO[0000] Cluster leadership acquired
    INFO[0000] New leader elected: 192.168.42.200:4000
    [...]


The `--replication` flag tells Swarm that the manager is part of a multi-manager configuration and that it competes with other manager instances for the primary role. The primary manager has the authority to manage the cluster, replicate logs, and replicate events happening inside the cluster.

The `--advertise` option specifies the address of this manager. Swarm advertises this address to the cluster when the node is elected as the primary. As the command's output shows, the address you provided is now the address of the elected primary manager.


### Create two replicas

Now that you have a primary manager, you can create replicas.

    user@manager-2 $ swarm manage -H :4000 <tls-config-flags> --replication --advertise 192.168.42.201:4000 consul://192.168.42.10:8500/nodes
    INFO[0000] Listening for HTTP                            addr=:4000 proto=tcp
    INFO[0000] Cluster leadership lost
    INFO[0000] New leader elected: 192.168.42.200:4000
    [...]

This command creates a replica manager on `192.168.42.201:4000`, which treats `192.168.42.200:4000` as the primary manager.

Create an additional, third *manager* instance:

    user@manager-3 $ swarm manage -H :4000 <tls-config-flags> --replication --advertise 192.168.42.202:4000 consul://192.168.42.10:8500/nodes
    INFO[0000] Listening for HTTP                            addr=:4000 proto=tcp
    INFO[0000] Cluster leadership lost
    INFO[0000] New leader elected: 192.168.42.200:4000
    [...]

Once you have established your primary manager and the replicas, create **swarm agents** as you normally would.
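As a rough illustration of that last step, the sketch below prints a `swarm join` command for each agent host. It assumes the agents use the same discovery path as the managers and the Engine port 2375 from the assumptions above. The agent addresses are taken from the sample `docker info` output later in this document, and `join_command` is a hypothetical helper name.

```shell
#!/bin/sh
# Discovery backend from the Assumptions section (same path the managers use).
DISCOVERY="consul://192.168.42.10:8500/nodes"

# Hypothetical helper: print the join command to run on one agent host.
# $1 is the agent's IP address; 2375 is the Engine port assumed in this document.
join_command() {
    echo "swarm join --advertise=$1:2375 $DISCOVERY"
}

# Agent addresses as they appear in the sample `docker info` output.
for agent in 192.168.42.100 192.168.42.101 192.168.42.102; do
    join_command "$agent"
done
```

The script only prints the commands. Run each printed command on the matching agent host.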


### List machines in the cluster

Running `docker info` should give you output similar to the following:


    user@my-machine $ export DOCKER_HOST=192.168.42.200:4000 # Points to manager-1
    user@my-machine $ docker info
    Containers: 0
    Images: 25
    Storage Driver:
    Role: Primary  <--------- manager-1 is the Primary manager
    Primary: 192.168.42.200
    Strategy: spread
    Filters: affinity, health, constraint, port, dependency
    Nodes: 3
     swarm-agent-0: 192.168.42.100:2375
      └ Containers: 0
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 2.053 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-49-generic, operatingsystem=Ubuntu 14.04.2 LTS, storagedriver=aufs
     swarm-agent-1: 192.168.42.101:2375
      └ Containers: 0
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 2.053 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-49-generic, operatingsystem=Ubuntu 14.04.2 LTS, storagedriver=aufs
     swarm-agent-2: 192.168.42.102:2375
      └ Containers: 0
      └ Reserved CPUs: 0 / 1
      └ Reserved Memory: 0 B / 2.053 GiB
      └ Labels: executiondriver=native-0.2, kernelversion=3.13.0-49-generic, operatingsystem=Ubuntu 14.04.2 LTS, storagedriver=aufs
    Execution Driver:
    Kernel Version:
    Operating System:
    CPUs: 3
    Total Memory: 6.158 GiB
    Name:
    ID:
    Http Proxy:
    Https Proxy:
    No Proxy:

This information shows that `manager-1` is the current primary and supplies the address to use to contact this primary.
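To check the role of every manager in one pass, you could wrap `docker info` in a small helper. The following is a minimal sketch. `role_from_info` and `list_roles` are hypothetical names, the parsing assumes the `Role:` line looks like the sample output above (the exact spelling of the value may differ between versions), and the `docker` client is assumed to reach the managers without extra TLS flags.

```shell
#!/bin/sh
# Hypothetical helper: extract the value of the "Role:" line from `docker info`
# output read on stdin, keeping only the first word after the label.
role_from_info() {
    sed -n 's/^[[:space:]]*Role:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: print "<address> <role>" for every manager address
# given as an argument. A manager that does not answer, or whose output has
# no "Role:" line, is reported as "unreachable".
list_roles() {
    for manager in "$@"; do
        role=$(docker -H "tcp://$manager" info 2>/dev/null | role_from_info)
        echo "$manager ${role:-unreachable}"
    done
}
```

For the sample configuration, you would call `list_roles 192.168.42.200:4000 192.168.42.201:4000 192.168.42.202:4000`.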

## Test the failover mechanism

To test the failover mechanism, shut down the designated primary manager.
Press `Ctrl-C` in its terminal or `kill` the process of the current primary manager (`manager-1`).

### Wait for automated failover

After a short time, the other instances detect the failure and a replica takes the *lead* to become the primary manager.

For example, look at `manager-2`'s logs:

    user@manager-2 $ swarm manage -H :4000 <tls-config-flags> --replication --advertise 192.168.42.201:4000 consul://192.168.42.10:8500/nodes
    INFO[0000] Listening for HTTP                            addr=:4000 proto=tcp
    INFO[0000] Cluster leadership lost
    INFO[0000] New leader elected: 192.168.42.200:4000
    INFO[0038] New leader elected: 192.168.42.201:4000
    INFO[0038] Cluster leadership acquired               <--- We have been elected as the new Primary Manager
    [...]

Because the primary manager, `manager-1`, failed right after it was elected, the replica with the address `192.168.42.201:4000`, `manager-2`, recognized the failure and attempted to take over the lead. Because `manager-2` was fast enough, it won the election and became the primary manager of the cluster.

If you look at `manager-3`, you should see the following logs:

    user@manager-3 $ swarm manage -H :4000 <tls-config-flags> --replication --advertise 192.168.42.202:4000 consul://192.168.42.10:8500/nodes
    INFO[0000] Listening for HTTP                            addr=:4000 proto=tcp
    INFO[0000] Cluster leadership lost
    INFO[0000] New leader elected: 192.168.42.200:4000
    INFO[0036] New leader elected: 192.168.42.201:4000   <--- manager-3 sees the new Primary Manager
    [...]


At this point, you need to export the new `DOCKER_HOST` value.


### Switch the primary

To point `DOCKER_HOST` at `manager-2`, the new primary, do the following:

    user@my-machine $ export DOCKER_HOST=192.168.42.201:4000 # Points to manager-2
    user@my-machine $ docker info
    Containers: 0
    Images: 25
    Storage Driver:
    Role: Primary  <--------- manager-2 is the Primary manager
    Primary: 192.168.42.201
    Strategy: spread
    Filters: affinity, health, constraint, port, dependency
    Nodes: 3

You can run `docker` commands against the primary manager or any replica.

If you like, you can use custom mechanisms to always point `DOCKER_HOST` to the current primary manager. Then, you never lose contact with your swarm in the event of a failover.
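One possible custom mechanism is to ask the managers themselves. The sketch below queries each manager in turn and reads the `Primary:` line from `docker info`. The helper names `primary_from_info` and `find_primary` are hypothetical, the manager list and port come from the sample configuration, and the `docker` client is assumed to reach the managers without extra TLS flags.

```shell
#!/bin/sh
# Manager addresses and port from the sample configuration in this document.
MANAGERS=${MANAGERS:-"192.168.42.200 192.168.42.201 192.168.42.202"}
PORT=${PORT:-4000}

# Hypothetical helper: extract the address on the "Primary:" line of
# `docker info` output read on stdin. "Role: Primary" does not match because
# the pattern requires "Primary:" at the start of the line.
primary_from_info() {
    sed -n 's/^[[:space:]]*Primary:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: ask each manager until one reports a primary, then
# print that primary as "<ip>:<port>". Returns non-zero if no manager answers.
find_primary() {
    for manager in $MANAGERS; do
        primary=$(docker -H "tcp://$manager:$PORT" info 2>/dev/null | primary_from_info)
        if [ -n "$primary" ]; then
            # Drop any port the manager reported and use the known manager port.
            echo "${primary%%:*}:$PORT"
            return 0
        fi
    done
    return 1
}
```

After sourcing these functions, `export DOCKER_HOST=$(find_primary)` points the client at whichever manager currently reports itself or a peer as primary. During an election, a replica may briefly report the previous primary, so a retry may be needed.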