network: Update the description of iptables rules

- describe Docker's custom chains
- describe the new direct routing options
- use documentation address ranges in examples

Signed-off-by: Rob Murray <rob.murray@docker.com>
Rob Murray 2024-06-12 10:23:00 +01:00 committed by David Karlsson
parent abef550be1
commit cbbb68a4eb
3 changed files with 213 additions and 78 deletions


@ -91,9 +91,9 @@ $ docker run --rm -it --network container:redis example/redis-cli -h 127.0.0.1
## Published ports
By default, when you create or run a container using `docker create` or `docker run`,
the container doesn't expose any of its ports to the outside world.
containers on bridge networks don't expose any ports to the outside world.
Use the `--publish` or `-p` flag to make a port available to services
outside of Docker.
outside the bridge network.
This creates a firewall rule in the host,
mapping a container port to a port on the Docker host that is reachable from the outside world.
Here are some examples:
@ -111,11 +111,12 @@ Here are some examples:
> a container's ports it becomes available not only to the Docker host, but to
> the outside world as well.
>
> If you include the localhost IP address (`127.0.0.1`) with the publish flag,
> only the Docker host can access the published container port.
> If you include the localhost IP address (`127.0.0.1`, or `::1`) with the
> publish flag, only the Docker host and its containers can access the
> published container port.
>
> ```console
> $ docker run -p 127.0.0.1:8080:80 nginx
> $ docker run -p 127.0.0.1:8080:80 -p '[::1]:8080:80' nginx
> ```
>
> > **Warning**
@ -132,6 +133,14 @@ it isn't necessary to publish the container's ports.
You can enable inter-container communication by connecting the containers to the
same network, usually a [bridge network](./drivers/bridge.md).
Ports on the host's IPv6 addresses will map to the container's IPv4 address
if no host IP is given in a port mapping, the bridge network is IPv4-only,
and `--userland-proxy=true` is set (the default).
For more information about port mapping, including how to disable it and use
direct routing to containers, see
[packet filtering and firewalls](./packet-filtering-firewalls.md).
## IP address and hostname
By default, the container gets an IP address for every Docker network it attaches to.


@ -105,11 +105,12 @@ The following table describes the driver-specific options that you can pass to
`--option` when creating a custom network using the `bridge` driver.
| Option | Default | Description |
| ------------------------------------------------ | -------------- | --------------------------------------------------------------------------------------------- |
|-------------------------------------------------------------------------------------------------|-----------------------------|------------------------------------------------------------------------------------------------|
| `com.docker.network.bridge.name` | | Interface name to use when creating the Linux bridge. |
| `com.docker.network.bridge.enable_ip_masquerade` | `true` | Enable IP masquerading. |
| `com.docker.network.bridge.gateway_mode_ipv4`<br/>`com.docker.network.bridge.gateway_mode_ipv6` | `nat` | Enable NAT and masquerading (`nat`), or only allow direct routing to the container (`routed`). |
| `com.docker.network.bridge.enable_icc` | `true` | Enable or disable inter-container connectivity. |
| `com.docker.network.bridge.host_binding_ipv4` | | Default IP when binding container ports. |
| `com.docker.network.bridge.host_binding_ipv4` | all IPv4 and IPv6 addresses | Default IP when binding container ports. |
| `com.docker.network.driver.mtu` | `0` (no limit) | Set the container's network Maximum Transmission Unit (MTU). |
| `com.docker.network.container_iface_prefix` | `eth` | Set a custom prefix for container interfaces. |
| `com.docker.network.bridge.inhibit_ipv4` | `false` | Prevent Docker from [assigning an IP address](#skip-ip-address-configuration) to the network. |


@ -6,34 +6,50 @@ aliases:
- /network/iptables/
---
On Linux, Docker manipulates `iptables` rules to provide network isolation.
While this is an implementation detail and you should not modify the rules
Docker inserts into your `iptables` policies, it does have some implications
on what you need to do if you want to have your own policies in addition to
those managed by Docker.
On Linux, Docker creates `iptables` and `ip6tables` rules to implement network
isolation, port publishing and filtering.
If you're running Docker on a host that is exposed to the Internet, you will
probably want to have iptables policies in place that prevent unauthorized
access to containers or other services running on your host. This page
describes how to achieve that, and what caveats you need to be aware of.
Because these rules are required for the correct functioning of Docker bridge
networks, you should not modify the rules created by Docker.
## Add iptables policies before Docker's rules
But, if you are running Docker on a host exposed to the internet, you will
probably want to add iptables policies that prevent unauthorized access to
containers or other services running on your host. This page describes how
to achieve that, and the caveats you need to be aware of.
Docker installs two custom `iptables` chains named `DOCKER-USER` and `DOCKER`,
and it ensures that incoming packets are always checked by these two chains
first. These chains are part of the `FORWARD` chain.
> **Note**
>
> Docker creates `iptables` rules for bridge networks.
>
> No `iptables` rules are created for `ipvlan`, `macvlan` or `host` networking.
All of Docker's `iptables` rules are added to the `DOCKER` chain. Do not
manipulate this chain manually. If you need to add rules which load before
Docker's rules, add them to the `DOCKER-USER` chain. These rules are applied
before any rules Docker creates automatically.
## Docker and iptables chains
Other rules added to the `FORWARD` chain, either manually, or by another
iptables-based firewall, are evaluated after the `DOCKER-USER` and `DOCKER` chains.
This means that if you publish a port through Docker,
this port gets published no matter what rules your firewall has configured.
If you want rules to apply even when a port gets published through Docker,
you must add these rules to the `DOCKER-USER` chain.
In the `filter` table, Docker sets the default policy to `DROP`, and creates the
following custom `iptables` chains:
* `DOCKER-USER`
* A placeholder for user-defined rules that will be processed before rules
in the `DOCKER` chain.
* `DOCKER`
* Rules that determine whether a packet that is not part of an established
connection should be accepted, based on the port forwarding configuration
of running containers.
* `DOCKER-ISOLATION-STAGE-1` and `DOCKER-ISOLATION-STAGE-2`
* Rules to isolate Docker networks from each other.
In the `FORWARD` chain, Docker adds rules that pass packets that are not related
to established connections to these custom chains, as well as rules to accept
packets that are part of established connections.
In the `nat` table, Docker creates chain `DOCKER` and adds rules to implement
masquerading and port-mapping.
### Add iptables policies before Docker's rules
Packets that get accepted or rejected by rules in these custom chains will not
be seen by user-defined rules appended to the `FORWARD` chain. So, to add
additional rules to filter these packets, use the `DOCKER-USER` chain.
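To see where a rule in `DOCKER-USER` sits relative to Docker's own rules, it can help to list the chains named above. The following is a minimal sketch rather than an official procedure: the helper name `docker_chain_cmds` is hypothetical, and it only prints `iptables` commands so they can be reviewed before being run as root.

```shell
# Hypothetical helper: print (not run) commands that list Docker's chains.
# Pass "iptables" or "ip6tables" as the first argument (defaults to iptables).
docker_chain_cmds() {
    ipt="${1:-iptables}"
    # filter table: the FORWARD chain and the custom chains Docker creates
    for chain in FORWARD DOCKER-USER DOCKER DOCKER-ISOLATION-STAGE-1 DOCKER-ISOLATION-STAGE-2; do
        echo "$ipt -t filter -S $chain"
    done
    # nat table: the DOCKER chain that Docker creates there
    echo "$ipt -t nat -S DOCKER"
}

docker_chain_cmds iptables
```

Printing the commands instead of executing them keeps the sketch safe to try on any machine; each printed line can be run with `sudo` once it has been checked.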
### Match the original IP and ports for requests
@ -49,7 +65,7 @@ For example:
```console
$ sudo iptables -I DOCKER-USER -p tcp -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
$ sudo iptables -I DOCKER-USER -p tcp -m conntrack --ctorigsrc 1.2.3.4 --ctorigdstport 80 -j ACCEPT
$ sudo iptables -I DOCKER-USER -p tcp -m conntrack --ctorigdst 198.51.100.2 --ctorigdstport 80 -j ACCEPT
```
> **Important**
@ -57,76 +73,173 @@ $ sudo iptables -I DOCKER-USER -p tcp -m conntrack --ctorigsrc 1.2.3.4 --ctorigd
> Using the `conntrack` extension may result in degraded performance.
{ .important }
### Restrict connections to the Docker host
## Port publishing and mapping
By default, for both IPv4 and IPv6, the daemon blocks access to ports that have not
been published. Published container ports are mapped to host IP addresses.
To do this, it uses iptables to perform Network Address Translation (NAT),
Port Address Translation (PAT), and masquerading.
For example, `docker run -p 8080:80 [...]` creates a mapping
between port 8080 on any address on the Docker host, and the container's
port 80. Outgoing connections from the container will masquerade, using
the Docker host's IP address.
### Restrict external connections to containers
By default, all external source IPs are allowed to connect to ports that have
been published to the Docker host's addresses.
By default, all external source IPs are allowed to connect to the Docker host.
To allow only a specific IP or network to access the containers, insert a
negated rule at the top of the `DOCKER-USER` filter chain. For example, the
following rule restricts external access from all IP addresses except `192.168.1.1`:
following rule drops packets from all IP addresses except `192.0.2.2`:
```console
$ iptables -I DOCKER-USER -i ext_if ! -s 192.168.1.1 -j DROP
$ iptables -I DOCKER-USER -i ext_if ! -s 192.0.2.2 -j DROP
```
You will need to change `ext_if` to correspond with your
host's actual external interface. You could instead allow connections from a
source subnet. The following rule only allows access from the subnet `192.168.1.0/24`:
source subnet. The following rule only allows access from the subnet `192.0.2.0/24`:
```console
$ iptables -I DOCKER-USER -i ext_if ! -s 192.168.1.0/24 -j DROP
$ iptables -I DOCKER-USER -i ext_if ! -s 192.0.2.0/24 -j DROP
```
Finally, you can specify a range of IP addresses to accept using `--src-range`
(Remember to also add `-m iprange` when using `--src-range` or `--dst-range`):
```console
$ iptables -I DOCKER-USER -m iprange -i ext_if ! --src-range 192.168.1.1-192.168.1.3 -j DROP
$ iptables -I DOCKER-USER -m iprange -i ext_if ! --src-range 192.0.2.1-192.0.2.3 -j DROP
```
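Because Docker creates both `iptables` and `ip6tables` rules, an IPv6 source restriction would go through `ip6tables` instead. The sketch below is an illustration under assumptions, not part of Docker: the helper name `docker_user_allow_only` is hypothetical, `ext_if` is the same placeholder interface used above, and the IPv6 subnet is taken from the documentation range. It prints the rule rather than applying it.

```shell
# Hypothetical helper: print a DOCKER-USER rule that drops traffic arriving on
# an external interface unless it comes from the allowed source.
# A source containing ":" is treated as IPv6 and uses ip6tables.
docker_user_allow_only() {
    iface="$1"
    source_net="$2"
    case "$source_net" in
        *:*) ipt=ip6tables ;;
        *)   ipt=iptables ;;
    esac
    echo "$ipt -I DOCKER-USER -i $iface ! -s $source_net -j DROP"
}

docker_user_allow_only ext_if 192.0.2.0/24
docker_user_allow_only ext_if 2001:db8:1::/64
```

The printed commands have the same shape as the rules shown above, so they can be reviewed and then run as root.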
You can combine `-s` or `--src-range` with `-d` or `--dst-range` to control both
the source and destination. For instance, if the Docker daemon listens on both
`192.168.1.99` and `10.1.2.3`, you can make rules specific to `10.1.2.3` and leave
`192.168.1.99` open.
the source and destination. For instance, if the Docker host has addresses
`2001:db8:1111::2` and `2001:db8:2222::2`, you can make rules specific to
`2001:db8:1111::2` and leave `2001:db8:2222::2` open.
`iptables` is complicated and more complicated rules are out of scope for this
topic. See the [Netfilter.org HOWTO](https://www.netfilter.org/documentation/HOWTO/NAT-HOWTO.html)
for a lot more information.
`iptables` is complicated. There is a lot more information at [Netfilter.org HOWTO](https://www.netfilter.org/documentation/HOWTO/NAT-HOWTO.html).
## Docker on a router
### Direct routing
Docker also sets the policy for the `FORWARD` chain to `DROP`. If your Docker
host also acts as a router, this will result in that router not forwarding
any traffic anymore. If you want your system to continue functioning as a
router, you can add explicit `ACCEPT` rules to the `DOCKER-USER` chain to
allow it:
Port mapping ensures that published ports are accessible on the host's
network addresses, which are likely to be routable for any external
clients. No routes are normally set up in the host's network for container
addresses that exist within a host.
But, particularly with IPv6, you may prefer to avoid using NAT and instead
arrange for external routing to container addresses.
To access containers on a bridge network from outside the Docker host,
you must set up routing to the bridge network via an address on the Docker
host. This can be achieved using static routes, Border Gateway Protocol
(BGP), or any other means appropriate for your network.
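As one illustration of the static route option, an upstream Linux router could be given a route to the bridge network's subnet via the Docker host. This is a sketch under assumptions: the helper name `container_subnet_route` is hypothetical, the addresses come from documentation ranges, and the right command depends on your router. It prints an `ip route` command rather than changing any routing table.

```shell
# Hypothetical helper: print the static route an upstream Linux router would
# need so that a container subnet is reachable via the Docker host.
container_subnet_route() {
    subnet="$1"      # bridge network subnet, for example 2001:db8::/64
    docker_host="$2" # address of the Docker host on the shared network
    case "$subnet" in
        *:*) echo "ip -6 route add $subnet via $docker_host" ;;
        *)   echo "ip route add $subnet via $docker_host" ;;
    esac
}

container_subnet_route 2001:db8::/64 2001:db8:1111::2
```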
The bridge network driver has options
`com.docker.network.bridge.gateway_mode_ipv6=<nat|routed>` and
`com.docker.network.bridge.gateway_mode_ipv4=<nat|routed>`.
The default is `nat`, in which NAT and masquerading rules are set up for each
published container port. With mode `routed`, no NAT or masquerading rules
are set up, but `iptables` rules are still set up so that only published container
ports are accessible.
In `routed` mode, a host port in a `-p` or `--publish` port mapping is
not used, and the host address is only used to decide whether to apply
the mapping to IPv4 or IPv6. So, when a mapping only applies to `routed`
mode, only addresses `0.0.0.0` or `::` are allowed, and a host port
must not be given.
Mapped container ports, in `nat` or `routed` mode, are accessible from
any remote address, if routing is set up in the network, unless the
Docker host's firewall has additional restrictions.
#### Example
Create a network suitable for direct routing for IPv6, with NAT enabled
for IPv4:
```console
$ iptables -I DOCKER-USER -i src_if -o dst_if -j ACCEPT
$ docker network create --ipv6 --subnet 2001:db8::/64 -o com.docker.network.bridge.gateway_mode_ipv6=routed mynet
```
## Prevent Docker from manipulating iptables
Create a container with a published port:
```console
$ docker run --network=mynet -p 8080:80 myimage
```
It is possible to set the `iptables` key to `false` in the Docker engine's configuration file at `/etc/docker/daemon.json`, but this option is not appropriate for most users. It is not possible to completely prevent Docker from creating `iptables` rules, and creating them after-the-fact is extremely involved and beyond the scope of these instructions. Setting `iptables` to `false` will more than likely break container networking for the Docker engine.
Then:
- Only container port 80 will be open, for IPv4 and IPv6. It is accessible
from anywhere, if there is routing to the container's address, and access
is not blocked by the host's firewall.
- For IPv6, using `routed` mode, port 80 will be open on the container's IP
address. Port 8080 will not be opened on the host's IP addresses, and
outgoing packets will use the container's IP address.
- For IPv4, using the default `nat` mode, the container's port 80 will be
accessible via port 8080 on the host's IP addresses, as well as directly.
Connections originating from the container will masquerade, using the
host's IP address.
For system integrators who wish to build the Docker runtime into other applications, explore the [`moby` project](https://mobyproject.org/).
In `docker inspect`, this port mapping will be shown as follows. Note that
there is no `HostPort` for IPv6, because it is using `routed` mode:
```console
$ docker container inspect <id> --format "{{json .NetworkSettings.Ports}}"
{"80/tcp":[{"HostIp":"0.0.0.0","HostPort":"8080"},{"HostIp":"::","HostPort":""}]}
```
## Setting the default bind address for containers
Alternatively, to make the mapping IPv6-only, disabling IPv4 access to the
container's port 80, use the unspecified IPv6 address `[::]` and do not
include a host port number:
```console
$ docker run --network mynet -p '[::]::80' myimage
```
By default, the Docker daemon binds published container ports to the `0.0.0.0`
address. When you publish a container's ports as follows:
### Setting the default bind address for containers
By default, when a container's ports are mapped without any specific host
address, the Docker daemon binds published container ports to all host
addresses (`0.0.0.0` and `[::]`).
For example, the following command publishes port 8080 to all network
interfaces on the host, on both IPv4 and IPv6 addresses, potentially
making them available to the outside world.
```console
$ docker run -p 8080:80 nginx
```
This publishes port 8080 to all network interfaces on the host, potentially
making them available to the outside world. Unless you've disabled IPv6 at the
kernel level, the port gets published on both IPv4 and IPv6.
You can change the default binding address for published container ports so that
they're only accessible to the Docker host by default. To do that, you can
configure the daemon to use the loopback address (`127.0.0.1`) instead.
To do so, configure the `"ip"` key in the `daemon.json` configuration file:
> **Warning**
>
> Hosts within the same L2 segment (for example, hosts connected to the same
> network switch) can reach ports published to localhost.
> For more information, see
> [moby/moby#45610](https://github.com/moby/moby/issues/45610)
{ .warning }
To configure this setting for user-defined bridge networks, use
the `com.docker.network.bridge.host_binding_ipv4`
[driver option](./drivers/bridge.md#options) when you create the network.
```console
$ docker network create mybridge \
-o "com.docker.network.bridge.host_binding_ipv4=127.0.0.1"
```
> **Note**
>
> - Setting the default binding address to `::` means port bindings with no host
> address specified will work for any IPv6 address on the host. But, `0.0.0.0`
> means any IPv4 or IPv6 address.
> - Changing the default bind address doesn't have any effect on Swarm services.
> Swarm services are always exposed on the `0.0.0.0` network interface.
#### Default bridge
To set the default binding for the default bridge network, configure the `"ip"`
key in the `daemon.json` configuration file:
```json
{
@ -139,20 +252,32 @@ ports on the default bridge network.
Restart the daemon for this change to take effect.
Alternatively, you can use the `dockerd --ip` flag when starting the daemon.
> **Note**
>
> Changing the default bind address doesn't have any effect on Swarm services.
> Swarm services are always exposed on the `0.0.0.0` network interface.
## Docker on a router
To configure this setting for user-defined bridge networks, use
the `com.docker.network.bridge.host_binding_ipv4`
[driver option](./drivers/bridge.md#options) when you create the network.
Docker sets the policy for the `FORWARD` chain to `DROP`. This will prevent
your Docker host from acting as a router.
If you want your system to function as a router, you must add explicit
`ACCEPT` rules to the `DOCKER-USER` chain. For example:
```console
$ docker network create mybridge \
-o "com.docker.network.bridge.host_binding_ipv4=127.0.0.1"
$ iptables -I DOCKER-USER -i src_if -o dst_if -j ACCEPT
```
## Prevent Docker from manipulating iptables
It is possible to set the `iptables` or `ip6tables` keys to `false` in the
[daemon configuration](https://docs.docker.com/reference/cli/dockerd/), but
this option is not appropriate for most users. It is likely to break
container networking for the Docker Engine.
All ports of all containers will be accessible from the network, and none
will be mapped from Docker host IP addresses.
It is not possible to completely prevent Docker from creating `iptables`
rules, and creating rules after-the-fact is extremely involved and beyond
the scope of these instructions.
## Integration with firewalld
If you are running Docker with the `iptables` option set to `true`, and