Docker Tip: Customising Docker CLI Output


Docker provides a comprehensive API and CLI to its platform. This article is concerned with customising the output returned by Docker CLI commands.

There are a large number of Docker client CLI commands, which provide information relating to various Docker objects on a given Docker host or Swarm cluster. Generally, this output is provided in a tabular format. An example, which all Docker users will have come across, is the docker container ls command, which provides a list of running containers:

$ docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                  NAMES  
43195e559b42        wordpress           "docker-entrypoint..."   47 seconds ago      Up 46 seconds>80/tcp   wp  
f7926468281f        mariadb             "docker-entrypoint..."   2 minutes ago       Up 2 minutes        3306/tcp               mysql  

Customising Command Output

Sometimes, all of this information is too much, and you may find yourself wanting to format the output just how you'd like it. You might want to do this to de-clutter the output, for aesthetic purposes, or to format output as input to scripts. This is quite straightforward to do, as a large number of CLI commands have a config option, --format, just for this purpose. The format of the output needs to be specified using a Go template, which translates a JSON object into the desired format. For example, if we're only interested in the container ID, image, status, exposed ports and name, we could get this with the following (the \t specifies a tab):

$ docker container ls --format '{{.ID}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}\t{{.Names}}'
43195e559b42    wordpress   Up 41 minutes>80/tcp    wp  
f7926468281f    mariadb Up 43 minutes   3306/tcp    mysql  

This provides us with the reduced amount of information we specified, but it looks a bit shoddy. We can add the table directive to improve the look:

$ docker container ls --format 'table {{.ID}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}\t{{.Names}}'
CONTAINER ID        IMAGE               STATUS              PORTS                  NAMES  
43195e559b42        wordpress           Up About an hour>80/tcp   wp  
f7926468281f        mariadb             Up About an hour    3306/tcp               mysql  
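
When format strings like the ones above are needed in scripts, it can be handy to assemble them from a list of field names. The following is a minimal sketch: build_format is a hypothetical helper (it is not part of Docker), and the field names are simply the ones used in this article. It only constructs the string, so it runs without a Docker daemon.

```shell
# build_format: hypothetical helper that joins field names into a
# 'table' Go template string suitable for the --format option.
# Note: plain sh has no local variables, so out/sep/field are global.
build_format() {
    out="table"
    sep=" "
    for field in "$@"; do
        out="${out}${sep}{{.${field}}}"
        sep='\t'    # a literal backslash-t, which Docker treats as a tab
    done
    printf '%s\n' "$out"
}

fmt=$(build_format ID Image Status Ports Names)
printf '%s\n' "$fmt"
```

The resulting string could then be passed to a command on a host with a running Docker daemon, for example docker container ls --format "$fmt".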

Docker actually uses a template applied to a JSON object, to generate the default output you see when no user-defined formatting is applied. The default table format for listing all of the container objects is:

table {{.ID}}\t{{.Image}}\t{{.Command}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}\t{{.Names}}  

These are not the complete set of fields available in the output, however. We can find all of the fields associated with the container object, with:

$ docker container ls --format '{{json .}}' | jq '.'
{
  "Command": "\"docker-entrypoint...\"",
  "CreatedAt": "2017-07-24 16:23:25 +0100 BST",
  "ID": "43195e559b42",
  "Image": "wordpress",
  "Labels": "",
  "LocalVolumes": "1",
  "Mounts": "c71e998f250e...",
  "Names": "wp",
  "Networks": "wp",
  "Ports": ">80/tcp",
  "RunningFor": "About an hour ago",
  "Size": "0B",
  "Status": "Up About an hour"
}
{
  "Command": "\"docker-entrypoint...\"",
  "CreatedAt": "2017-07-24 16:21:33 +0100 BST",
  "ID": "f7926468281f",
  "Image": "mariadb",
  "Labels": "",
  "LocalVolumes": "1",
  "Mounts": "acaa1732009a...",
  "Names": "mysql",
  "Networks": "wp",
  "Ports": "3306/tcp",
  "RunningFor": "About an hour ago",
  "Size": "0B",
  "Status": "Up About an hour"
}

Notice that some keys in each of the objects are missing from the default output: Labels, LocalVolumes, Mounts and Networks, to name a few. Hence, we could customise our output further by replacing the Status field with the Networks field:

$ docker container ls --format 'table {{.ID}}\t{{.Image}}\t{{.Networks}}\t{{.Ports}}\t{{.Names}}'
CONTAINER ID        IMAGE               NETWORKS            PORTS                  NAMES  
43195e559b42        wordpress           bridgey,wp>80/tcp   wp  
f7926468281f        mariadb             wp                  3306/tcp               mysql  

Making a Customisation Permanent

The --format config option is great if you want to customise the output in a specific way for a particular use case. It would be a significant PITA, however, if you had to remember this syntax each time you issued a command, just to have perpetually customised output. You could, of course, create an alias or a script. Docker, however, allows you to make this customisation more permanent, with the use of a configuration file.

When a user on a Docker host logs in to the Docker Hub for the very first time, using the docker login command, a file called config.json is created in a directory called .docker in the user's home directory. This file is used by Docker to hold JSON encoded properties, including a user's credentials. It can also be used to hold the format template for the docker container ls command, using the psFormat property. The property is called psFormat after the old version of the command name, docker ps. A config.json file might look like this:

$ cat config.json
{
    "auths": {},
    "psFormat": "table {{.ID}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}\t{{.Names}}\t{{.Networks}}"
}

The psFormat property is the JSON key, whilst the value is the required template for configuring the command output.

With the psFormat property defined, every time you use the docker container ls command, you'll get the customised output you desire. It's possible to override the customisation on a case by case basis, simply by using the --format config option, which takes precedence. Take care when editing the config file; incorrect syntax could render all properties invalid.
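
To experiment with psFormat without touching the real ~/.docker/config.json (and any credentials it holds), one option is to point the Docker client at a scratch configuration directory using the DOCKER_CONFIG environment variable. This is a sketch under that assumption: the directory is a temporary one, and the template is only an example.

```shell
# Create a scratch configuration directory rather than editing ~/.docker
config_dir=$(mktemp -d)

# Write a config.json holding only an example psFormat template.
# The quoted 'EOF' stops the shell from expanding anything in the body.
cat > "$config_dir/config.json" <<'EOF'
{
    "psFormat": "table {{.ID}}\t{{.Image}}\t{{.Status}}\t{{.Names}}"
}
EOF

printf 'wrote %s\n' "$config_dir/config.json"

# To try it out (requires a running Docker daemon):
#   DOCKER_CONFIG="$config_dir" docker container ls
```

Keeping the experiment in a separate directory means a syntax mistake can't invalidate the properties in your everyday configuration file.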

Valid Command Customisation Properties

Whilst the output for a large number of commands can be formatted using the --format config option, permanent customisation via a property defined in the config.json file is mainly reserved for commands listing particular objects. A complete list of the commands, their relevant config properties, and default templates is provided in the table below:

Command               Property          Default Template
docker container ls   psFormat          table {{.ID}}\t{{.Image}}\t{{.Command}}\t{{.RunningFor}}\t{{.Status}}\t{{.Ports}}\t{{.Names}}
docker image ls       imagesFormat      table {{.Repository}}\t{{.Tag}}\t{{.ID}}\t{{.CreatedSince}}\t{{.Size}}
docker network ls     networksFormat    table {{.ID}}\t{{.Name}}\t{{.Driver}}\t{{.Scope}}
docker node ls        nodesFormat       table {{.ID}} {{if .Self}}*{{else}} {{end}}\t{{.Hostname}}\t{{.Status}}\t{{.Availability}}\t{{.ManagerStatus}}
docker plugin ls      pluginsFormat     table {{.ID}}\t{{.Name}}\t{{.Description}}\t{{.Enabled}}
docker secret ls      secretFormat      table {{.ID}}\t{{.Name}}\t{{.CreatedAt}}\t{{.UpdatedAt}}
docker service ls     servicesFormat    table {{.ID}}\t{{.Name}}\t{{.Mode}}\t{{.Replicas}}\t{{.Image}}\t{{.Ports}}
docker service ps     tasksFormat       table {{.ID}}\t{{.Name}}\t{{.Image}}\t{{.Node}}\t{{.DesiredState}}\t{{.CurrentState}}\t{{.Error}}\t{{.Ports}}
docker volume ls      volumesFormat     table {{.Driver}}\t{{.Name}}

The output of a couple of additional Docker CLI commands can also be defined in the config.json file. The first of these is the format associated with the output of the docker stats command. This command provides rudimentary, real-time resource consumption for running containers, and the statsFormat property allows for customising which metrics are displayed:

Command               Property          Default Template
docker stats          statsFormat       table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}\t{{.PIDs}}

The second additional property available is used to format the output associated with the docker service inspect command. Historically, inspect commands, for example docker container inspect, provide JSON output. Docker's maintainers decided that, whilst the docker service inspect command warranted having its output rendered in a more readable format than JSON, they didn't want to break the expected behaviour associated with the inspect commands for other objects. As a compromise, in addition to providing a --pretty config option for the command itself, it's also possible to set the default output to pretty using the serviceInspectFormat property in the config.json file:

Command                  Property               Useful Template
docker service inspect   serviceInspectFormat   pretty

Secrets Come to Docker


The provision of secure, authenticated access to sensitive data on IT systems is an integral component of systems design. The secrets that users or peer IT services employ for accessing sensitive data published by an IT service come in a variety of guises: passwords, X.509 certificates, SSL/TLS keys, GPG keys, SSH keys and so on. Managing and controlling these secrets in service-oriented environments is non-trivial. With the continued advance in the adoption of the microservices architecture pattern for software applications, and their common implementation as distributed, immutable containers, this challenge has been exacerbated. How do you de-couple the secret from the template (image) of the container? How do you provide the container with the secret without compromising it? Where will the container be running, so as to provide it with the secret? How do you change the secret without interrupting the consumption of the service?

Docker Engine 1.13.0 introduced a new primary object, the secret, when it was released recently. In conjunction with new API endpoints and CLI commands, the new secret object is designed for handling secrets in a multi-container, multi-node environment - a 'swarm mode' cluster. It is not intended or available for use outside of a swarm mode cluster. Whilst the management of secrets is an oft-requested feature for Docker (particularly in the context of building Docker images), it's unclear if or when a secrets solution will be implemented for the standalone Docker host context. For now, people have been encouraged to use the 'service' abstraction in place of deploying individual containers. This requires bootstrapping a swarm mode cluster, even if it only contains a single node, and the service you deploy only comprises a single task. It's a good job it's as simple as:

$ docker swarm init

How are secrets created?

Creating a secret with the Docker client is a straightforward exercise:

$ < /dev/urandom tr -dc 'a-z0-9' | head -c 32 | docker secret create db_pw -

In this simple example, the Docker CLI reads the content of the secret from STDIN, but it could equally well be a file. The content of the secret can be anything, provided its size is no more than the secret limit of 500 KB. As with all Docker objects, there are API endpoints and CLI commands for inspecting, listing and removing secrets: docker secret inspect, docker secret ls, docker secret rm. Inspecting the secret provides the following:

$ docker secret inspect db_pw
[
    {
        "ID": "joptoh9y7x8galitn4ztnk86r",
        "Version": {
            "Index": 44
        },
        "CreatedAt": "2017-01-23T13:52:35.810853263Z",
        "UpdatedAt": "2017-01-23T13:52:35.810853263Z",
        "Spec": {
            "Name": "db_pw"
        }
    }
]

Inspecting the secret doesn't (obvs) show you the content of the secret. It shows the creation time of the secret, and whilst the output displays an UpdatedAt key, secrets cannot be updated by the CLI at present. There is, however, an API endpoint for updating secrets.

The Spec key provides some detail about the secret, just the name in the above example. Like most objects in Docker, it is possible to associate labels with secrets when they are created, and labels appear as part of the value of the Spec key.
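
The random-string pipeline used to create db_pw can be wrapped in a small function if several secrets are needed. This is a sketch only: gen_secret is a hypothetical helper name, and LC_ALL=C is added on the assumption that some tr implementations complain about raw bytes from /dev/urandom in multi-byte locales.

```shell
# gen_secret: print a random lowercase alphanumeric string.
# $1 is the length in characters (defaults to 32).
gen_secret() {
    LC_ALL=C tr -dc 'a-z0-9' < /dev/urandom | head -c "${1:-32}"
}

secret=$(gen_secret 32)
printf 'generated a secret of %s characters\n' "${#secret}"

# On a swarm manager, the value could then be piped to the Docker CLI:
#   printf '%s' "$secret" | docker secret create db_pw -
```

Using printf '%s' rather than echo avoids adding a trailing newline to the stored secret.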

How are secrets consumed?

Secrets are consumed by services through explicit association. Services are implemented with tasks (individual containers), which can be scheduled on any node within the swarm cluster. If a service comprises multiple tasks, an associated secret is accessible to any of the tasks, whichever node they are running on.

A service can be created with access to a secret, using the --secret flag:

$ docker service create --name app --secret db_pw my_app:1.0

In addition, a previously created service can be granted access to an additional secret or have secrets revoked, using the --secret-add and --secret-rm flags in conjunction with docker service update.

Where are secrets kept?

A swarm mode cluster uses the Raft Consensus Algorithm in order to ensure that nodes participating in the management of the cluster, agree on the state of the cluster. Part of this process involves the replication of the state to all management nodes in the form of a log.

The implementation of secrets in Docker swarm mode takes advantage of the highly consistent, distributed nature of Raft by writing secrets to the Raft log, which means they are replicated to each of the manager nodes. The Raft log on each manager node is held in memory whilst the cluster is operating, and is encrypted in Docker 1.13.0+.

How does a container access a secret?

A container that is a task associated with a service that has access to a secret, has the secret mounted onto its filesystem under /run/secrets, which is a tmpfs filesystem residing in memory. For example, if the secret is called db_pw, it's available inside the container at /var/run/secrets/db_pw for as long as the container is running (/var/run is a symlink to /run). If the container is halted for any reason, /run/secrets is no longer a component of the container's filesystem, and the secret is also flushed from the hosting node's memory.
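
From the application's point of view, consuming the secret is just a matter of reading a file. The following is a minimal sketch of how a container's entrypoint script might do that; read_secret and the SECRETS_DIR override are hypothetical names introduced here so the example can be tried outside a container, where /run/secrets doesn't exist.

```shell
# read_secret: print the content of the secret named by $1.
# SECRETS_DIR is a hypothetical override; inside a service task the
# default of /run/secrets applies.
read_secret() {
    secret_file="${SECRETS_DIR:-/run/secrets}/$1"
    if [ ! -r "$secret_file" ]; then
        echo "secret '$1' is not available at $secret_file" >&2
        return 1
    fi
    cat "$secret_file"
}

# Demonstration outside a container: a temporary directory stands in
# for /run/secrets, and the secret value is made up for the example.
SECRETS_DIR=$(mktemp -d)
printf 's3cr3t' > "$SECRETS_DIR/db_pw"

DB_PASSWORD=$(read_secret db_pw)
printf 'read a secret of %s characters\n' "${#DB_PASSWORD}"
```

Failing loudly when the file is missing makes it obvious when a service has been created without the --secret flag.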

The secrets user interface provides some flexibility regarding a service's consumption of the secret. The secret can be mounted with a different name to the one provided during its creation, and it's possible to set the UID, GID and mode for the secret. For example, the db_pw secret could be made available inside container tasks with the following attributes:

$ docker service create --name app --secret source=db_pw,target=password,uid=2000,gid=3000,mode=0400 my_app:1.0

Inside the container, this would yield:

root@a61281217232:~# ls -l /var/run/secrets  
total 8  
-r--r--r-- 1 root root 32 Jan 23 11:49 my_secret
-r-------- 1 2000 3000 32 Jan 23 11:49 password

How are secrets updated?

By design, secrets in Docker swarm mode are immutable. If a secret needs to be rotated, it must first be removed from the service, before being replaced with a new secret. The replacement secret can be mounted in the same location. Let's take a look at an example. First we'll create a secret, and use a version number in the secret name, before adding it to a service as password:

$ < /dev/urandom tr -dc 'a-z0-9' | head -c 32 | docker secret create my_secret_v1.0 -
$ docker service create --name nginx --secret source=my_secret_v1.0,target=password nginx

Once the task is running, the secret will be available in the container at /var/run/secrets/password. If the secret is changed, the service can be updated to reflect this:

$ < /dev/urandom tr -dc 'a-z0-9' | head -c 32 | docker secret create my_secret_v1.1 -
$ docker service update --secret-rm my_secret_v1.0 --secret-add source=my_secret_v1.1,target=password nginx

Each service update results in the replacement of existing tasks based on the update policy defined by the --update-parallelism and --update-delay flags (1 and 0s by default, respectively). If the service comprises multiple tasks, and the update is configured to be applied over a period of time, then some tasks will be using the old secret, whilst the updated tasks will be using the new secret. Clearly, some co-ordination needs to take place between service providers and consumers when secrets are changed!

After the update, the new secret is available to all tasks that make up the service, and the old secret can be removed (if desired). A secret can't be removed whilst a service is still using it:

$ docker secret rm my_secret_v1.0
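
If secrets are rotated regularly, the update step can be captured in a small function. This is a sketch rather than a definitive procedure: rotate_secret and the DOCKER variable are hypothetical names, and by default the function only prints the command it would run, so it can be tried without a swarm.

```shell
# DOCKER defaults to "echo docker", so the command is printed (dry run).
# Set DOCKER=docker to run it against a real swarm manager.
DOCKER="${DOCKER:-echo docker}"

# rotate_secret: swap one secret for another at the same target name.
# $1: service, $2: old secret, $3: new secret, $4: target inside the container
rotate_secret() {
    $DOCKER service update --secret-rm "$2" --secret-add "source=$3,target=$4" "$1"
}

rotate_secret nginx my_secret_v1.0 my_secret_v1.1 password
```

The new secret (my_secret_v1.1 here) still has to be created beforehand with docker secret create, as in the example above.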

Introduced in Docker 1.13.0:

  • A new secrets object, along with API endpoints and CLI commands
  • Available in swarm mode only
  • Secrets are stored in the Raft log associated with the swarm cluster
  • Mounted in tmpfs inside a container

Explaining Docker Image IDs

When Docker v1.10 came along, there was a fairly seismic change in the way the Docker Engine handles images. Whilst this was publicised well, and there was little impact on the general usage of Docker (image migration aside), there were some UI changes which sparked some confusion. So, what was the change, and why does the docker history command show some IDs as <missing>?

$ docker history debian
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT  
1742affe03b5        10 days ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B  
<missing>           10 days ago         /bin/sh -c #(nop) ADD file:5d8521419ad6cfb695   125.1 MB  

First, some background. A Docker image is a read-only template for creating containers, and provides a filesystem based on an ordered union of multiple layers of files and directories, which can be shared with other images and containers. Sharing of image layers is a fundamental component of the Docker platform, and is possible through the implementation of a copy-on-write (COW) mechanism. During its lifetime, if a container needs to change a file from the read-only image that provides its filesystem, it copies the file up to its own private read-write layer before making the change.

A layer or 'diff' is created during the Docker image build process, and results when commands are run in a container, which produce new or modified files and directories. These new or modified files and directories are 'committed' as a new layer. The output of the docker history command above shows that the debian image has two layers.

Historical Perspective

Historically (pre Docker v1.10), each time a new layer was created as a result of a commit action, Docker also created a corresponding image, which was identified by a randomly generated 256-bit UUID, usually referred to as an image ID (presented in the UI as either a short 12-digit hex string, or a long 64-digit hex string). Docker stored the layer contents in a directory with a name synonymous with the image ID. Internally, the image consisted of a configuration object, which held the characteristics of the image, including its ID, and the ID of the image's parent image. In this way, Docker was able to construct a filesystem for a container, with each image in turn referencing its parent and the corresponding layer content, until the base image was reached which had no parent. Optionally, each image could also be tagged with a meaningful name (e.g. my_image:1.0), but this was usually reserved for the leaf image. This is depicted in the diagram below:

Using the docker inspect command would yield:

$ docker inspect my_image:1.0
        "Id": "ca1f5f48ef431c0818d5e8797dfe707557bdc728fe7c3027c75de18f934a3b76",
        "Parent": "91bac885982d2d564c0e1869e8b8827c435eead714c06d4c670aaae616c1542c"

This method served Docker well for a sustained period, but over time was perceived to be sub-optimal for a variety of reasons. One of the big drivers for change, came from the lack of a means of detecting whether an image's contents had been tampered with during a push to or pull from a registry, such as the Docker Hub. This led to robust criticism from the community at large, and led to a series of changes, culminating in content addressable IDs.

Content Addressable IDs

Since Docker v1.10, generally, images and layers are no longer synonymous. Instead, an image directly references one or more layers that eventually contribute to a derived container's filesystem.

Layers are now identified by a digest, which takes the form algorithm:hex, that is, the name of the hash algorithm (sha256), followed by a colon and a 64-character hexadecimal string.

The hex element is calculated by applying the algorithm (SHA256) to a layer's content. If the content changes, then the computed digest will also change, meaning that Docker can check the retrieved contents of a layer with its published digest in order to verify its content. Layers have no notion of an image or of belonging to an image, they are merely collections of files and directories.
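
The idea of content addressing can be tried out with nothing more than a hashing tool. The sketch below is an illustration, not Docker's implementation: digest_of and verify_digest are hypothetical helpers, an ordinary file stands in for a layer archive, and sha256sum is assumed to be available (as it is on most Linux systems).

```shell
# digest_of: print the digest of the file named by $1 in algorithm:hex form
digest_of() {
    printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}

# verify_digest: succeed only if file $1 hashes to the published digest $2
verify_digest() {
    [ "$(digest_of "$1")" = "$2" ]
}

# An ordinary file stands in for a layer's archived content
layer=$(mktemp)
printf 'hello' > "$layer"
published=$(digest_of "$layer")

verify_digest "$layer" "$published" && echo "content verified"

# Any change to the content changes the digest, so verification now fails
printf ' world' >> "$layer"
verify_digest "$layer" "$published" || echo "content does not match the published digest"
```

Because the identifier is derived from the content itself, anyone holding the published digest can detect tampering without trusting the source of the bytes.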

A Docker image now consists of a configuration object, which (amongst other things) contains an ordered list of layer digests, which enables the Docker Engine to assemble a container's filesystem with reference to layer digests rather than parent images. The image ID is also a digest, and is a computed SHA256 hash of the image configuration object, which contains the digests of the layers that contribute to the image's filesystem definition. The following diagram depicts the relationship between image and layers post Docker v1.10:
The digests for the image and layers have been shortened for readability.

The diff directory for storing the layer content, is now named after a randomly generated 'cache ID', and the Docker Engine maintains the link between the layer and its cache ID, so that it knows where to locate the layer's content on disk.

So, when a Docker image is pulled from a registry, and the docker history command is used to reveal its contents, the output provides something similar to:

$ docker history swarm
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT  
c54bba046158        9 days ago          /bin/sh -c #(nop) CMD ["--help"]                0 B  
<missing>           9 days ago          /bin/sh -c #(nop) ENTRYPOINT &{["/swarm"]}      0 B  
<missing>           9 days ago          /bin/sh -c #(nop) VOLUME [/.swarm]              0 B  
<missing>           9 days ago          /bin/sh -c #(nop) EXPOSE 2375/tcp               0 B  
<missing>           9 days ago          /bin/sh -c #(nop) ENV SWARM_HOST=:2375          0 B  
<missing>           9 days ago          /bin/sh -c #(nop) COPY dir:b76b2255a3b423981a   0 B  
<missing>           9 days ago          /bin/sh -c #(nop) COPY file:5acf949e76228329d   277.2 kB  
<missing>           9 days ago          /bin/sh -c #(nop) COPY file:a2157cec2320f541a   19.06 MB  

The command provides detail about the image and the layers it is composed of. The <missing> value in the IMAGE field for all but one of the layers of the image, is misleading and a little unfortunate. It conveys the suggestion of an error, but there is no error as layers are no longer synonymous with a corresponding image and ID. I think it would have been more appropriate to have left the field blank. Also, the image ID appears to be associated with the uppermost layer, but in fact, the image ID doesn't 'belong' to any of the layers. Rather, the layers collectively belong to the image, and provide its filesystem definition.

Locally Built Images

Whilst this narrative for content addressable images holds true for all Docker images post Docker v1.10, locally built images on a Docker host are treated slightly differently. The generic content of an image built locally remains the same - it is a configuration object containing configuration items, including an ordered list of layer digests.

However, when a layer is committed during an image build on a local Docker host, an 'intermediate' image is created at the same time. Just like all other images, it has a configuration item which is a list of the layer digests that are to be incorporated as part of the image, and its ID or digest contains a hash of the configuration object. Intermediate images aren't tagged with a name, but they do have a 'Parent' key, which contains the ID of the parent image.

The purpose of the intermediate images and the reference to parent images, is to facilitate the use of Docker's build cache. The build cache is another important feature of the Docker platform, and is used to help the Docker Engine make use of pre-existing layer content, rather than regenerating the content needlessly for an identical build command. It makes the build process more efficient. When an image is built locally, the docker history command might provide output similar to the following:

$ docker history jbloggs/my_image:latest 
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT  
26cca5b0c787        52 seconds ago      /bin/sh -c #(nop) CMD ["/bin/sh" "-c" "/bin/b   0 B  
97e47fb9e0a6        52 seconds ago      /bin/sh -c apt-get update &&     apt-get inst   16.98 MB  
1742affe03b5        13 days ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B  
<missing>           13 days ago         /bin/sh -c #(nop) ADD file:5d8521419ad6cfb695   125.1 MB  

In this example, the top two layers are created during the local image build, whilst the bottom layers came from the base image for the build (e.g. Dockerfile instruction FROM debian). We can use the docker inspect command to review the layer digests associated with the image:

$ docker inspect jbloggs/my_image:latest
        "RootFS": {
            "Type": "layers",
            "Layers": [

The docker history command shows the image as having four layers, but docker inspect suggests just three layers. This is because the two CMD instructions produce metadata for the image, don't add any content, and therefore the 'diff' is empty. The digest 5f70bf18a086 is the SHA256 hash of an empty layer, and is shared by both of the layers in question.
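
The shared digest of the empty layer can be reproduced outside Docker. This sketch assumes, as the Docker Engine source describes it, that the empty layer is a tar archive consisting of 1024 NUL bytes, and that sha256sum is available (as it is on most Linux systems).

```shell
# Hash 1024 NUL bytes, the assumed content of an empty tar archive
empty_layer_digest="sha256:$(head -c 1024 /dev/zero | sha256sum | cut -d' ' -f1)"
printf '%s\n' "$empty_layer_digest"
```

If the assumption holds, the first twelve hex characters match the 5f70bf18a086 reported as "Layer already exists" when the image is pushed.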

When a locally built image is pushed to a registry, it is only the leaf image that is uploaded along with its constituent layers, and a subsequent pull by another Docker host will not yield any intermediate parent images. This is because once the image is made available to other potential users on different Docker hosts via a registry, it effectively becomes read-only, and the components that support the build cache are no longer required. Instead of the image ID, <missing> is inserted into the history output in its place.

Pushing the image to a registry might yield:

$ docker push jbloggs/my_image:latest
The push refers to a repository []  
f22bfbc1df82: Pushed  
5f70bf18a086: Layer already exists  
4dcab49015d4: Layer already exists  
latest: digest: sha256:7f63e3661b1377e2658e458ac1ff6d5e0079f0cfd9ff2830786d1b45ae1bb820 size: 3147  

In this example, only one layer has been pushed, as two of the layers already exist in the registry, referenced by one or more other images which use the same content.

A Final Twist

The digests that Docker uses for layer 'diffs' on a Docker host, contain the sha256 hash of the tar archived content of the diff. Before the layer is uploaded to a registry as part of a push, it is compressed for bandwidth efficiency. A manifest is also created to describe the contents of the image, and it contains the digests of the compressed layer content. Consequently, the digests for the layers in the manifest are different to those generated in their uncompressed state. The manifest is also pushed to the registry.

The digest of a compressed layer diff can be referred to as a 'distribution digest', whilst the digest for the uncompressed layer diff can be referred to as a 'content digest'. Hence, when we pull our example image on a different Docker host, the docker pull command gives the following output:

$ docker pull jbloggs/my_image
Using default tag: latest  
latest: Pulling from jbloggs/my_image

51f5c6a04d83: Pull complete  
a3ed95caeb02: Pull complete  
9a246d793396: Pull complete  
Digest: sha256:7f63e3661b1377e2658e458ac1ff6d5e0079f0cfd9ff2830786d1b45ae1bb820  
Status: Downloaded newer image for jbloggs/my_image:latest  

The distribution digests in the output of the docker pull command, are very different to the digests reported by the docker push command. But, the pull will decompress the layers, and the output of a docker inspect command will provide the familiar content digests that we saw after the image build.
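
The relationship between the two kinds of digest can be demonstrated with tar, gzip and sha256sum. This is an illustrative sketch, not what the Docker Engine literally runs: the directory and file names are made up, and the tools are assumed to be the usual Linux ones.

```shell
workdir=$(mktemp -d)

# A directory of files stands in for a layer 'diff'
mkdir "$workdir/diff"
printf 'hello\n' > "$workdir/diff/file.txt"

# Archive the diff, then compress the archive as a push would
tar -C "$workdir/diff" -cf "$workdir/layer.tar" .
gzip -n -c "$workdir/layer.tar" > "$workdir/layer.tar.gz"

# Digest of the uncompressed archive (the 'content digest')
content_digest="sha256:$(sha256sum "$workdir/layer.tar" | cut -d' ' -f1)"
# Digest of the compressed archive (the 'distribution digest')
distribution_digest="sha256:$(sha256sum "$workdir/layer.tar.gz" | cut -d' ' -f1)"
# Decompressing, as a pull does, recovers the original content digest
roundtrip_digest="sha256:$(gzip -dc "$workdir/layer.tar.gz" | sha256sum | cut -d' ' -f1)"

printf 'content:      %s\n' "$content_digest"
printf 'distribution: %s\n' "$distribution_digest"
printf 'round trip:   %s\n' "$roundtrip_digest"
```

The first and third lines of output are identical, whilst the second differs, which mirrors the mismatch between the digests shown by docker push and docker pull.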


Following the changes to image and layer handling in Docker v1.10:

  • A Docker image provides a filesystem for a derived container based on the references it stores to layer diffs
  • Layer diffs are referenced using a digest, which contains an SHA256 hash of an archive of the diff's contents
  • A Docker image's ID is a digest, which contains an SHA256 hash of the image's JSON configuration object
  • Docker creates intermediate images during a local image build, for the purposes of maintaining a build cache
  • An image manifest is created and pushed to a Docker registry when an image is pushed
  • An image manifest contains digests of the image's layers, which contain the SHA256 hashes of the compressed, archived diff contents

Docker Overlay Networking

Everyone knows that in the early days of the Docker platform's existence, more emphasis was placed on the Dev side of the DevOps equation. Effectively, that meant that Docker provided a good experience for developing software applications, but a sub-optimal one for running those applications in production. No more so, than with the native networking capabilities provided, that limited inter-container communication to a local Docker host (unless you employed some creative glue and sticky tape).

That all changed with Docker's acquisition of SocketPlane, and the subsequent release of Docker 1.9 in November 2015. The team from SocketPlane helped to completely overhaul the platform's networking capabilities, with the introduction of a networking library called libnetwork. Libnetwork implements Docker's Container Network Model (CNM), and via its API, specific networking drivers provide container networking capabilities based on the CNM abstraction. Docker has in-built drivers, but also supports third party plugin drivers, such as Weave Net and Calico.

One of the in-built drivers is the overlay driver, which provides one of the hitherto most sought after features - cross-host Docker networking for containers. It's based on the VXLAN principle, which encapsulates layer 2 ethernet frames in layer 4 (UDP) packets to enable overlay networking. Let's see how to set this up in Docker.

To demonstrate the use of overlay networks in Docker, I'll use a variation of Dj Walker-Morgan's goredchat application, a simple chat application that uses the Redis database engine to register chat users, and for routing chat messages. We'll create a Redis instance, and two client sessions using goredchat (all running in containers), but on different Docker hosts and connected via an overlay network.

Establish a Key/Value Store

The first thing we need to do is establish a key/value store that Docker's overlay networking requires - it's used to hold network state. We will use HashiCorp's Consul key/value store (other choices are CoreOS' etcd and the Apache Software Foundation's ZooKeeper), and the easiest way to do this is to run it in a container on a dedicated VM using VirtualBox. To simplify things, we'll use Docker Machine to create the VM.

Create the VM:

$ docker-machine create -d virtualbox kv-store

Next, we need to point our Docker client at the Docker daemon running on the kv-store VM:

$ eval $(docker-machine env kv-store)

Now we need to start a container running Consul, and we'll use the popular progrium/consul Docker image from the Docker Hub. The container needs some ports forwarded to its VM host, and can be started with the following command:

$ docker run -it -d --restart unless-stopped -p 8400:8400 -p 8500:8500 \
> -p 8600:53/udp -h consul progrium/consul -server -bootstrap

Consul will run in the background, and will be available for storing key/value pairs relating to the state of Docker overlay networks for Docker hosts using the store. When one or more Docker hosts are configured to make use of a key/value store, they often do so as part of a cluster arrangement, but being part of a formal cluster (e.g. Docker Swarm) is not a requirement for participating in overlay networks.

Create Three Additional Docker Hosts

Next, we'll create three more VMs, each running a Docker daemon, which we'll use to host containers that will be connected to the overlay network. Each of the Docker daemons running on these machines needs to be made aware of the KV store, and of each other. To achieve this, we need to configure each Docker daemon with the --cluster-store and --cluster-advertise configuration options, which need to be supplied via the Docker Machine --engine-opt configuration option:

$ for i in {1..3}; do docker-machine create -d virtualbox \
> --engine-opt "cluster-store=consul://$(docker-machine ip kv-store):8500" \
> --engine-opt "cluster-advertise=eth1:2376" \
> host0${i}; done
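
One way to check that a daemon has picked up these options is to look for the cluster settings in the output of docker info. This is a sketch only: the function name cluster_settings is made up for this example, and it assumes that the daemon reports its cluster store and advertise address in docker info, which may not hold for every Docker version:

```shell
#!/bin/sh
# cluster_settings HOST (hypothetical helper): print the lines of `docker info`
# that mention the cluster configuration of the daemon on HOST.
cluster_settings() {
    host=$1
    # docker-machine config prints the connection flags for the named host;
    # the substitution is left unquoted so that the shell splits the flags
    # into separate arguments for the docker client.
    docker $(docker-machine config "$host") info 2>/dev/null | grep -i 'cluster'
}

# Usage:
#   cluster_settings host01
```

If nothing is printed, the options were probably not applied to the daemon on that host.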

To see if all the VMs are running as expected:

$ docker-machine ls -f "table {{.Name}}  \t{{.State}}\t{{.URL}}\t{{.DockerVersion}}"
NAME         STATE     URL                         DOCKER  
host01       Running   tcp://   v1.10.2  
host02       Running   tcp://   v1.10.2  
host03       Running   tcp://   v1.10.2  
kv-store     Running   tcp://   v1.10.2  

Create an Overlay Network

We now need to run some Docker CLI commands on each of the Docker hosts we have created. There are several ways of doing this:

  1. Establish an ssh session on the VM in question, using the docker-machine ssh command
  2. Point the local Docker client at the Docker host in question, using the docker-machine env command
  3. Run one-time commands against the particular Docker host using the docker-machine config command
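
As a rough illustration, the three approaches listed above might look like this when running docker network ls against a single host. The function names are made up for this example:

```shell
#!/bin/sh
# 1. Run the command inside the VM itself, over ssh.
via_ssh() {
    docker-machine ssh "$1" docker network ls
}

# 2. Point the local client at the host, then run the command. The subshell
#    stops the DOCKER_* variables from leaking into the calling shell.
via_env() {
    ( eval "$(docker-machine env "$1")" && docker network ls )
}

# 3. Supply the connection flags for this one invocation only. The
#    substitution is unquoted so the flags become separate arguments.
via_config() {
    docker $(docker-machine config "$1") network ls
}

# Usage:
#   via_ssh host01
#   via_env host01
#   via_config host01
```

The third form is the one used for the rest of this walkthrough, as it leaves the local environment untouched and makes the target host explicit in every command.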

The overlay network can be created using any of the Docker hosts:

$ docker $(docker-machine config host01) network create -d overlay my_overlay

We can check that the overlay network my_overlay can be seen from each of the Docker hosts (substituting the name of each host for host01):

$ docker $(docker-machine config host01) network ls -f name=my_overlay
NETWORK ID          NAME                DRIVER  
0223fc182bd3        my_overlay          overlay  
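
Rather than repeating the command by hand, the check can be wrapped in a loop. The following is a minimal sketch; the function name check_overlay is made up for this example, and the extra grep is there because the name filter also matches networks whose names merely contain the given string:

```shell
#!/bin/sh
# check_overlay NETWORK HOST... (hypothetical helper): report whether NETWORK
# is visible from the Docker daemon on each of the named hosts.
check_overlay() {
    net=$1
    shift
    for host in "$@"; do
        if docker $(docker-machine config "$host") network ls -f name="$net" |
            grep -qw "$net"; then
            echo "$host: $net visible"
        else
            echo "$host: $net MISSING"
        fi
    done
}

# Usage:
#   check_overlay my_overlay host01 host02 host03
```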

Create a Container Running the Redis KV Database Engine

Having created the overlay network, we now need to start a Redis server running in a container on Docker host host01, using the library image found on the Docker Hub registry.

$ docker $(docker-machine config host01) run -d --restart unless-stopped \
> --net-alias redis_svr --net my_overlay redis:alpine redis-server \
> --appendonly yes

The library Redis image will be pulled from the Docker Hub registry, and the Docker CLI will return the ID of the new container.

In order to connect the container to the my_overlay network, the docker run client command is given the --net my_overlay configuration option, along with --net-alias redis_svr, which provides a network-specific alias for the container, redis_svr. The alias can be used to look up the container. The Redis library image exposes port 6379, which will be accessible to containers connected to the my_overlay network. To test whether the redis_svr container is listening for connections, we can run the Redis client in an ephemeral container, also on host01:

$ docker $(docker-machine config host01) run --rm --net my_overlay \
> redis:alpine redis-cli -h redis_svr ping

Notice that Docker's embedded DNS server resolves the redis_svr name, and the Redis server responds to the Redis client ping, with a PONG.
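
If the Redis container has only just been started, the first ping may fail. When scripting these steps, a small retry loop around the same command can help. This is a sketch under the assumptions of this walkthrough (the my_overlay network and the redis_svr alias); the function name wait_for_redis is made up:

```shell
#!/bin/sh
# wait_for_redis HOST [TRIES] (hypothetical helper): ping redis_svr from an
# ephemeral container on HOST, retrying once a second until it answers PONG
# or the attempts run out (10 by default).
wait_for_redis() {
    host=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        reply=$(docker $(docker-machine config "$host") run --rm \
            --net my_overlay redis:alpine redis-cli -h redis_svr ping 2>/dev/null)
        [ "$reply" = "PONG" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Usage:
#   wait_for_redis host01 && echo "redis_svr is ready"
```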

Create a Container Running goredchat on host02

Now that we have established that a Redis server container is connected to the my_overlay network and listening on port 6379, we can attempt to consume its service from a container running on a different Docker host.

The goredchat image can be found on the Docker Hub registry, and can be run like a binary, with command line options. To find out how it functions, I can run the following command:

$ docker $(docker-machine config host02) run --rm --net my_overlay nbrown/goredchat --help
Usage: /goredchat [-r URL] username  
  e.g. /goredchat -r redis://redis_svr:6379 antirez

  If -r URL is not used, the REDIS_URL env must be set instead

Now let's run a container using the -r configuration option to address the Redis server. If you're re-creating these steps in your own environment, don't forget to add the -it configuration options to enable you to interact with the container:

$ docker $(docker-machine config host02) run -it --rm --net my_overlay \
> nbrown/goredchat -r redis://redis_svr:6379 bill

Welcome to goredchat bill! Type /who to see who's online, /exit to exit.  

I can find out who's online by typing /who.


The goredchat client has created a TCP socket connection from the container on host02 to the Redis server container on host01 over the my_overlay network, and queried the Redis database engine.

Create a Container Running goredchat on host03

In another bash command shell, we can start another instance of goredchat, this time on host03.

$ docker $(docker-machine config host03) run -it --rm --net my_overlay \
> nbrown/goredchat -r redis://redis_svr:6379 brian

Welcome to goredchat brian! Type /who to see who's online, /exit to exit.


Brian and Bill can chat through the goredchat client, which uses a Redis server for subscribing and publishing to a message channel. All components of the chat service run in containers, on different Docker hosts, but connected to the same overlay network.

Docker's new networking capabilities have some detractors, but libnetwork is a significant step forward, is evolving, and is designed to support multiple use cases, whilst maintaining a consistent and familiar user experience.

The Overlay Filesystem

The overlay filesystem (formerly known as overlayfs) was merged into the mainline Linux kernel at version 3.18 in December 2014. Whilst other, similar union mount filesystems have been around for many years (notably, aufs), overlay is the first to become integrated into the Linux kernel.

An overlay sits on top of an existing filesystem, and combines an upper and a lower directory tree (which can be from different filesystems), in order to present a unified representation of both directory trees. Where objects with the same name exist in both directory trees, their treatment depends on the object type:

  • File: the object in the upper directory tree appears in the overlay, whilst the object in the lower directory tree is hidden
  • Directory: the contents of each directory object are merged to create a combined directory object in the overlay
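
The two rules above can be simulated without mounting anything. The sketch below walks a lower and an upper tree and reports which tree would supply each path in the overlay. It is an illustration of the naming rules only, not of how the kernel implements them; it ignores whiteouts and the case where a name is a file in one tree and a directory in the other. The function name merged_view and the fruit files are made up for this example, and find's -mindepth option is assumed to be available (it is in GNU and BSD find):

```shell
#!/bin/sh
# merged_view LOWER UPPER (hypothetical helper): list every path that would
# appear in an overlay of the two trees, tagged with where it comes from.
merged_view() {
    lower=$1
    upper=$2
    {
        (cd "$lower" && find . -mindepth 1)
        (cd "$upper" && find . -mindepth 1)
    } | sort -u | while IFS= read -r path; do
        if [ -d "$upper/$path" ] && [ -d "$lower/$path" ]; then
            echo "merged $path"     # directory in both trees: contents merged
        elif [ -e "$upper/$path" ]; then
            echo "upper  $path"     # upper object hides any lower namesake
        else
            echo "lower  $path"     # only present in the lower tree
        fi
    done
}

# Build two small trees to try it out.
demo=$(mktemp -d)
mkdir -p "$demo/dir1/fruit" "$demo/dir2/fruit"
echo "lower" > "$demo/dir1/fruit/apple"
echo "lower" > "$demo/dir1/fruit/grape"
echo "lower" > "$demo/dir1/fruit/mango"
echo "upper" > "$demo/dir2/fruit/apple"

merged_view "$demo/dir1" "$demo/dir2"
```

Here the fruit directory is reported as merged, apple is supplied by the upper tree, and grape and mango come from the lower tree.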

The lower directory can be read-only, and could be an overlay itself, whilst the upper directory is normally writeable. In order to create an overlay of two directories, dir1 and dir2, we can use the following mount command:

mount -t overlay -o lowerdir=./dir1,upperdir=./dir2,workdir=./work overlay ./dir3  

A union of the two directories is created as an overlay in the dir3 directory. The workdir option is required, and used to prepare files before they are switched to the overlay destination in an atomic action (the workdir needs to be on the same filesystem as the upperdir). The following illustrates a simple example of the overlay mount above:
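
For anyone wanting to try this, the mount can be wrapped in a small helper that also creates the directories it needs. This is a sketch only: the function names are made up for this example, the lower directory is assumed to exist already, and the mount itself needs root privileges and a kernel with overlay support:

```shell
#!/bin/sh
# overlay_opts LOWER UPPER WORK (hypothetical helper): build the -o option
# string for an overlay mount with a writeable upper directory.
overlay_opts() {
    printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$1" "$2" "$3"
}

# mount_overlay LOWER UPPER WORK TARGET (hypothetical helper): create the
# upper, work and target directories if they are missing, then mount the
# overlay on TARGET. Must be run as root.
mount_overlay() {
    mkdir -p "$2" "$3" "$4" || return 1
    mount -t overlay -o "$(overlay_opts "$1" "$2" "$3")" overlay "$4"
}

# Usage (as root):
#   mount_overlay ./dir1 ./dir2 ./work ./dir3
#   umount ./dir3
```

Keeping the work directory alongside the upper directory is a simple way to satisfy the requirement that both live on the same filesystem.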

[Figure: Overlay Mount]

When a file or directory that originates in the upper directory is removed from the overlay, it's also removed from the upper directory. If a file or directory that originates in the lower directory is removed from the overlay, it remains in the lower directory, but a 'whiteout' is created in the upper directory. A whiteout takes the form of a character device with device number 0/0, and a name identical to the removed object. The result of the whiteout creation means that the object in the lower directory is ignored, whilst the whiteout itself is not visible in the overlay. The following illustrates the creation of a whiteout in the upperdir on removal of the file mango:

$ ls -l ./dir3/fruit
total 72  
-rw-rw-r-- 1 bill bill  1320 May 20 12:39 apple
-rw-rw-r-- 1 bill bill    92 May 20 11:53 grape
-rw-rw-r-- 1 bill bill 63456 May 20 11:53 mango
$ rm ./dir3/fruit/mango
$ ls -l ./dir3/fruit
total 8  
-rw-rw-r-- 1 bill bill  1320 May 20 12:39 apple
-rw-rw-r-- 1 bill bill    92 May 20 11:53 grape
$ ls -l ./dir2/fruit
total 4  
-rw-rw-r-- 1 bill bill 1320 May 20 12:39 apple
c--------- 1 bill bill 0, 0 May 20 17:38 mango  
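
Since a whiteout is just a character device with device number 0/0, the whiteouts in an upper directory can be found with standard tools. The following sketch assumes GNU find and GNU stat (the -c format option is not portable), and the function name list_whiteouts is made up for this example:

```shell
#!/bin/sh
# list_whiteouts UPPERDIR (hypothetical helper): print the path of every
# whiteout beneath UPPERDIR, i.e. every character device whose major and
# minor device numbers are both zero.
list_whiteouts() {
    # %t and %T are the major and minor device numbers in hex, %n the name.
    find "$1" -type c -exec stat -c '%t:%T %n' {} + |
        awk '$1 == "0:0" { print substr($0, 5) }'
}

# Usage:
#   list_whiteouts ./dir2
```

Run against the upper directory in the example above, this would be expected to report the whiteout for mango and nothing else.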

Linux kernel 4.0 further extends the overlay capabilities, to enable multiple lower directories to be specified, separated by a :, with the rightmost lower directory on the bottom, and the leftmost lower directory on the top of the union. For example:

mount -t overlay -o lowerdir=./dir3:./dir2:./dir1 overlay ./dir4  

In this extended version, the upperdir is optional, and if it is omitted, then the workdir option is also optional, and will be ignored in any case. In this scenario, the overlay will be read-only.
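
The ordering of the lower directories is easy to get wrong, so it can help to build the option string from a list given top first. A minimal sketch, with made-up function names, and assuming a 4.0 or later kernel and root privileges for the mount itself:

```shell
#!/bin/sh
# lower_stack TOP ... BOTTOM (hypothetical helper): join the directories into
# a lowerdir value, leftmost on top of the union and rightmost at the bottom.
lower_stack() {
    stack=$1
    shift
    for dir in "$@"; do
        stack="$stack:$dir"
    done
    printf '%s\n' "$stack"
}

# mount_readonly_overlay TARGET TOP ... BOTTOM (hypothetical helper): mount a
# read-only overlay of the lower directories on TARGET. With no upperdir,
# no workdir is needed. Must be run as root.
mount_readonly_overlay() {
    target=$1
    shift
    mount -t overlay -o "lowerdir=$(lower_stack "$@")" overlay "$target"
}

# Usage (as root):
#   mount_readonly_overlay ./dir4 ./dir3 ./dir2 ./dir1
```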

At the time of writing, Linux kernel version 4.0 is very new, and will not have found its way into many Linux distributions.

Use Cases

Union filesystems are often used for Live CD creation, where a read-only image is augmented with a writeable layer in tmpfs, thereby enabling a dynamic, but ephemeral session.

Effectively, this is 'copy-on-write', where read-only data is used until such time as the data requires changing, whereupon it is copied and altered in the read-write layer. This copy-on-write mechanism is used in the creation of filesystems for Linux containers, used by container runtime environments like Docker or rkt. It's not the only option for assembling container filesystems, but it is one of the more performant, because it allows pages in the kernel's page cache to be shared between containers - an option which is not available with block device copy-on-write mechanisms, such as the device mapper framework with the thinp target, or the btrfs filesystem. Docker's overlay graphdriver is currently the last in the queue for automatic selection (vfs is used for testing), behind aufs, btrfs and the devicemapper graphdrivers, but as the remaining issues are closed out, I expect it to become the default.
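
To find out which graphdriver a particular Docker daemon has selected, and whether the running kernel currently offers the overlay filesystem, something like the following could be used. It is a sketch: the function names are made up, the --format option of docker info is assumed to be available in the Docker version in use, and /proc/filesystems only lists overlay once the kernel module has been loaded:

```shell
#!/bin/sh
# storage_driver (hypothetical helper): print the name of the graphdriver in
# use by the Docker daemon that the local client is pointed at.
storage_driver() {
    docker info --format '{{.Driver}}'
}

# kernel_has_overlay (hypothetical helper): succeed if the running kernel
# currently lists the overlay filesystem type.
kernel_has_overlay() {
    grep -qw overlay /proc/filesystems
}

# Usage:
#   storage_driver
#   kernel_has_overlay && echo "overlay is available"
```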