Containers

Diminishing the Powers of a Container Workload

Managing the privileges available to a container is important to the ongoing integrity of the container and the host on which it runs. With privilege comes power, and the potential to abuse that power, wittingly or unwittingly.

A simple container example serves to illustrate:

$ sudo docker container run -itd --name test alpine sh
8588dfbfc89fc5761c11ebff6c9319fb655da92a1134cd5810031149e5cfc6e0  
$ sudo docker container top test -eo pid
PID  
2140  
$ ps -fp 2140
UID        PID  PPID  C STIME TTY          TIME CMD  
root      2140  2109  0 10:31 pts/0    00:00:00 sh  

A container is started in detached mode; we retrieve the process ID from the perspective of the default (i.e. the host's) PID namespace, list the process, and find that the UID (user ID) associated with the container's process is root. It turns out that the set of UIDs and GIDs (group IDs) is the same for the container and the host, because containers are started as the privileged user with UID/GID=0 (aka root, or the superuser). It's a big ask, but if the container's process were able to break out of the confines of the container, it would have root access on the host.
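
A contrived sketch shows why the shared UID space matters (deliberately dangerous; run it only on a disposable test host). Because the container's root is the host's root, a simple bind mount is enough for the container to read root-only host files:

# /etc/shadow is readable only by root on the host, yet the
# container's UID 0 process can read it straight through a bind mount
$ sudo docker container run --rm -v /etc/shadow:/host_shadow alpine head -1 /host_shadow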

There are lots of things we can do to mitigate this risk. Docker removes a lot of potentially pernicious privileges by dropping capabilities and applying other security mechanisms, in order to minimize the potential attack surface. We can even make use of user namespaces, by configuring the Docker daemon to map a UID/GID range from the host onto another range in the container. This means a container's process, running as the privileged root user, will map to a non-privileged user on the host.
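
A minimal sketch of enabling the mapping follows; the 'default' value instructs the daemon to create a dockremap user and use its subordinate UID/GID ranges, and the daemon needs restarting afterwards (the systemctl invocation assumes a systemd-based host):

# /etc/docker/daemon.json
{
    "userns-remap": "default"
}

$ sudo systemctl restart docker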

If you're able to make use of the --userns-remap config option on the daemon to perform this mapping, you absolutely should. Unfortunately, it's not always possible or desirable to do so - another story, another post! This puts us back to square one; what can we do to minimize the risk? The simple answer is that we should always be guided by the principle of least privilege. Often, containers need privileges that are associated with the root user, but if they don't, you should take action to run your containers as a benign user. How do you achieve this?

A Simple Example

Let's take a simple Dockerfile example, which defines a Docker image for the AWS CLI. This use case might be more suited to a local developer's laptop than to a sensitive, production environment, but it will serve as an illustration. The image enables us to install and run AWS CLI commands in a container, rather than on the host itself:

FROM alpine:latest

# Define build argument for AWS CLI version
ARG VERSION

# Install dependencies, AWS CLI and clean up.
RUN set -ex                                     && \  
    apk add --no-cache                             \
        python                                     \
        groff                                      \
        less                                       \
        py-pip                                  && \
    pip --no-cache-dir install awscli==$VERSION && \
    apk del py-pip

CMD ["help"]  
ENTRYPOINT ["aws"]  

Assuming the contents of the above are in a file called Dockerfile, located in the current working directory, we can use the docker image build command to build this image. Assuming we have made the local user a member of the docker group, which, for convenience, provides unfettered access to the Docker CLI (something that should only ever be done in a development environment), the following will create the image:

$ docker image build --build-arg VERSION="1.14.38" -t aws:v1 .

We can then check that the image works as intended, by running a container derived from it. This is equivalent to running the command aws --version in a non-containerized environment:

$ docker container run --rm --name aws aws:v1 --version
aws-cli/1.14.38 Python/2.7.14 Linux/4.4.0-112-generic botocore/1.8.42  

This is all well and good, but as we didn't take any action to curtail privileges, the container ran as the root user, with UID/GID=0. This level of privilege is not necessary for running AWS CLI commands, so let's do something about it!

Using a Non-privileged User

To fix this, we can add a non-privileged user to the image, and then set the user for the image to that non-privileged user, so that a derived container's process is no longer privileged. The changes to the Dockerfile might look something like this:

FROM alpine:latest

# Define build argument for AWS CLI version
ARG VERSION

# Install dependencies, AWS CLI and clean up.
RUN set -ex                                     && \  
    apk add --no-cache                             \
        python                                     \
        groff                                      \
        less                                       \
        py-pip                                  && \
    pip --no-cache-dir install awscli==$VERSION && \
    apk del py-pip                              && \
    addgroup aws                                && \
    adduser -D -G aws aws

USER aws

WORKDIR /home/aws

CMD ["help"]  
ENTRYPOINT ["aws"]  

All we've done is add two commands to the RUN instruction: one to add a group called aws, and one to add a user called aws that belongs to the aws group. In order to make use of the aws user, however, we also have to set the user with the USER Dockerfile instruction, and whilst we're at it, we'll set the working context in the filesystem to the user's home directory, courtesy of the WORKDIR instruction. We can re-build the image, tagging it as v2 this time:

$ docker image build --build-arg VERSION="1.14.38" -t aws:v2 .

Now that we have a new variant of the aws image, we'll run up a new container, but we'll not specify any command line arguments, which means the argument for the aws command will be help, as specified with the CMD instruction in the Dockerfile:

$ docker container run --rm -it --name aws aws:v2

Unsurprisingly, this lists the help for the AWS CLI, which is piped to less, giving us the opportunity to poke around whilst the container is still running. In another terminal on the host, if we repeat the exercise we carried out earlier, looking for the container's process(es), we get the following:

$ docker container top aws -eo pid
PID  
2436  
2487  
$ ps -fp 2436,2487
UID        PID  PPID  C STIME TTY          TIME CMD  
rackham   2436  2407  0 14:27 pts/0    00:00:00 /usr/bin/python2 /usr/bin/aws he  
rackham   2487  2436  0 14:27 pts/0    00:00:00 less -R  

It reports that the processes are running with the UID associated with the user rackham. In actual fact, the UID 1000 is associated with the user rackham on the host, but in the container, the UID 1000 is associated with the user aws:

$ id -u rackham
1000  
$ docker container exec -it aws id
uid=1000(aws) gid=1000(aws) groups=1000(aws)  

What really matters is the UID, not the user name it translates to, as the kernel works with the UID when it comes to access control. With these trivial changes to the image, our container is happily running as a non-privileged user, which should provide us with some peace of mind.

IDs and Bind Mounts

There is something missing from the AWS CLI image, however. In order to do anything meaningful, the AWS CLI commands need access to the user's AWS configuration and credentials, in order to access the AWS API. Obviously, we shouldn't bake these into the image, especially if we intend to share the image with others! We could pass them as environment variables, but whilst this might be a means of injecting configuration items into a container, it's not safe for sensitive data such as credentials. If you allow others access to the same Docker daemon, without limiting access using an access authorization plugin, environment variables will be exposed to those users via the docker container inspect command (demonstrated below). Another approach is to bind mount the files containing the relevant data into the container at run time. In fact, if we want to make use of the aws configure command to update our local AWS configuration, this is the only way we can update those files when using a container.
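
To see the exposure for yourself (the container name and variable here are hypothetical, purely for illustration):

# Any user with access to the daemon can read a container's environment
$ docker container run -d --env DB_PASSWORD=supersecret --name db-test alpine sleep 60
$ docker container inspect --format '{{.Config.Env}}' db-test
# ...the output includes DB_PASSWORD=supersecret in plain text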

On Linux, the AWS config and credentials files are normally located in $HOME/.aws, so we need to bind mount this directory inside the container at /home/aws/.aws, the corresponding location for the container's user. We need to do this each time we want to execute an AWS CLI command using a container. Let's try this out, attempting to list the instances running in the default region, which is specified in the AWS config file located in /home/aws/.aws. This command is equivalent to running aws ec2 describe-instances:

$ docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws \
--name aws aws:v2 ec2 describe-instances
You must specify a region. You can also configure your region by running "aws configure".  

That didn't go too well! The error message suggests that the aws command can't find the files. Having ascertained that the local user's UID/GID is 1001, if we run another container, overriding the container's entrypoint to run ls -l ./.aws, we can see the reason for the error:

$ id
uid=1001(baxter) gid=1001(baxter) groups=1001(baxter),27(sudo),999(docker)  
$ docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws \
--entrypoint ls --name aws aws:v2 -l ./.aws
total 8  
-rw-------    1 1001     1001           149 Feb 13 16:20 config
-rw-------    1 1001     1001           229 Feb 13 15:42 credentials

The files are present inside the container, but they are owned by UID/GID=1001. Remember, whilst we didn't specify a deterministic UID/GID for the container's user in the image, the addgroup and adduser commands created the aws user with UID/GID=1000. There is a mismatch between the UID/GIDs, and the file permissions are such that the container's user cannot read or write the files.

This is a big problem. We've been careful to ensure that our container runs with diminished privileges, but have ended up with a problem to resolve as a consequence.

We could try to circumvent this problem by using the --user config option to the docker container run command, specifying that the container runs with UID/GID=1001 instead of 1000:

$ docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws \
--user 1001:1001 --name aws aws:v2 ec2 describe-instances
You must specify a region. You can also configure your region by running "aws configure".  

This error message is starting to become familiar. The reason, this time, is that there is no 'environment' ($HOME, to be precise) for a user with UID/GID=1001, which the AWS CLI needs in order to locate the config and credentials files. This is because there is no user configured in the container's filesystem with UID/GID=1001. We might be tempted to pass a HOME environment variable to the docker container run command, or even to alter the Dockerfile to provide deterministic values for the UID/GID. If we succumb to these seductions, we're in danger of making the image very specific to a given host, and relying too much on the consumers of our image to work around these idiosyncrasies. A better option would be to add the aws user after the container has been created, which gives us the ability to add the user with the required UID/GID. Let's see how to do this.

Defer Stepping Down to a Non-privileged User

The image for the AWS CLI is immutable, so we can't define a 'variable' aws user in the Dockerfile. Instead, we can make use of an entrypoint script, which is executed when the container starts. It replaces the aws command that was specified as the entrypoint in the previous Dockerfile. Here's a revised Dockerfile:

FROM alpine:latest

# Define build time argument for AWS CLI version
ARG VERSION

# Add default UID for 'aws' user
ENV AWS_UID=1000

# Install dependencies, AWS CLI and clean up.
RUN set -ex                                     && \  
    apk add --no-cache                             \
        python                                     \
        groff                                      \
        less                                       \
        py-pip                                     \
        su-exec                                 && \
    pip --no-cache-dir install awscli==$VERSION && \
    apk del py-pip                              && \
    mkdir -p /home/aws

COPY docker-entrypoint.sh /usr/local/bin/

WORKDIR /home/aws

CMD ["help"]  
ENTRYPOINT ["docker-entrypoint.sh"]  

In addition to changing the entrypoint and copying the script from the build context with the COPY instruction, we've added an environment variable specifying a default UID for the aws user (in case the user neglects to provide one), removed the commands from the RUN instruction for creating the user, and added a command to create the mount point for the bind mount. We've also added a utility to the image, called su-exec, which enables our script to step down from the root user to the aws user at the last moment.

Let's get to the entrypoint script, itself:

#!/bin/sh

# If --user is used on command line, cut straight to aws command.
# The command will fail, unless the AWS region and profile have
# been provided as command line arguments or envs.
if [ "$(id -u)" != '0' ]; then  
    exec aws "$@"
fi

# Add 'aws' user using $AWS_UID and $AWS_GID
if [ ! -z "${AWS_GID+x}" ] && [ "$AWS_GID" != "$AWS_UID" ]; then  
    addgroup -g $AWS_GID aws
    adduser -D -G aws -u $AWS_UID aws
else  
    adduser -D -u $AWS_UID aws
fi

# Step down from root to aws, and run command
exec su-exec aws aws "$@"  

When the script is invoked, it runs with the all-powerful UID/GID=0, unless the user has invoked the container using the --user config option. As the script needs root privileges to create the aws user, if it's invoked as any other user, it won't be possible to create the aws user. Hence, a check is made early on in the script, and if the user associated with the container's process is not UID=0, we simply use exec to replace the script with the aws command, along with any arguments passed at the end of the command which invoked the container (e.g. ec2 describe-instances). In this scenario, the command will fail unless the default region and credentials have been provided by some other means, such as command line arguments or environment variables.
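
For illustration, an invocation along these lines would bypass the user-creation step entirely (this assumes the credentials and region are set as environment variables on the host; Docker's bare --env flags pass the host's values through):

$ docker container run --rm -it --user "$(id -u):$(id -g)" \
    --env AWS_DEFAULT_REGION --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY \
    --name aws aws:v3 ec2 describe-instances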

What we would prefer the user to do instead is specify an environment variable, AWS_UID (and optionally, AWS_GID), on the command line, which reflects the owner of the AWS config and credentials files on the host. Using this variable, the script creates the aws user with the corresponding UID/GID, before the script is replaced with the desired AWS CLI command, which is executed as the aws user, courtesy of the su-exec utility. First we must re-build the image, and when that's done, let's also create an alias for invoking the AWS CLI container:

$ docker image build --build-arg VERSION="1.14.38" -t aws:v3 .
$ alias aws='docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws --env AWS_UID=$UID --name aws aws:v3'

In the Docker CLI command we've aliased, we've defined the AWS_UID environment variable for use inside the container, set to the UID of the user invoking the container. All that's left to do is test the new configuration, using the alias:

$ aws ec2 describe-instances --query 'Reservations[*].Instances[*].[InstanceId,State.Name]'
[
    [
        [
            "i-04d3a022e5cc0a140", 
            "terminated"
        ]
    ], 
    [
        [
            "i-009efe47f59402b4e", 
            "terminated"
        ]
    ], 
    [
        [
            "i-0ad081df0fbe1d9e4", 
            "running"
        ]
    ]
]

This time we're successful!

Stepping down from the root user for our containerized AWS CLI is a fairly trivial example use case. The technique of stepping down to a non-privileged user in an entrypoint script, however, is very common for applications that require privileges to perform some initialisation prior to invoking the application associated with the container. You might want to create a database, for example, or apply some configuration based on the characteristics of the host, or on command line arguments provided at run time.
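
A minimal sketch of that general pattern might look like the following (the app user, the /var/lib/app path and the chown initialisation step are illustrative assumptions, not taken from the AWS CLI image):

#!/bin/sh
# Generic 'initialise as root, run as non-privileged user' entrypoint sketch

if [ "$(id -u)" = '0' ]; then
    # Privileged initialisation, e.g. fixing ownership of a bind-mounted data dir
    chown -R app:app /var/lib/app
    # Re-execute this script as the non-privileged 'app' user
    exec su-exec app "$0" "$@"
fi

# Now running as 'app'; replace the script with the application itself
exec "$@"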

Summary

If we hadn't undertaken this exercise to reduce the privileges available inside a container derived from our AWS CLI image, the task of creating the image would have been quite straightforward. However, by taking the time and expending a little effort, we have taken a considerable step towards minimizing the risk of privilege escalation inside the container, which in turn helps to reduce the risk of compromising the host itself. Running containers with a non-privileged user is one of many steps we can take to secure the containers we run, especially when they are deployed to a production environment.

If you want to find out what else you can do to make your containers more secure, check out my hosted training course - Securing Docker Container Workloads.

Secrets Come to Docker

The provision of secure, authenticated access to sensitive data on IT systems is an integral component of systems design. The secrets that users or peer IT services employ for accessing sensitive data published by an IT service come in a variety of guises: passwords, X.509 certificates, SSL/TLS keys, GPG keys, SSH keys and so on. Managing and controlling these secrets in service-oriented environments is non-trivial. With the continued advance in the adoption of the microservices architecture pattern for software applications, and their common implementation as distributed, immutable containers, this challenge has been exacerbated. How do you de-couple the secret from the template (image) of the container? How do you provide the container with the secret without compromising it? Where will the container be running, so as to provide it with the secret? How do you change the secret without interrupting consumption of the service?

Docker Engine 1.13.0, released recently, introduced a new primary object: the secret. In conjunction with new API endpoints and CLI commands, the secret object is designed for handling secrets in a multi-container, multi-node environment - a 'swarm mode' cluster. It is not intended or available for use outside of a swarm mode cluster. Whilst the management of secrets is an oft-requested feature for Docker (particularly in the context of building Docker images), it's unclear if or when a secrets solution will be implemented for the standalone Docker host context. For now, people are encouraged to use the 'service' abstraction in place of deploying individual containers. This requires bootstrapping a swarm mode cluster, even if it contains only a single node, and the service you deploy comprises only a single task. It's a good job it's as simple as:

$ docker swarm init

How are secrets created?

Creating a secret with the Docker client is a straightforward exercise,

$ < /dev/urandom tr -dc 'a-z0-9' | head -c 32 | docker secret create db_pw -
joptoh9y7x8galitn4ztnk86r  

In this simple example, the Docker CLI reads the content of the secret from STDIN, but it could equally well be read from a file. The content of the secret can be anything, provided its size is no more than the 500 KB limit. As with all Docker objects, there are API endpoints and CLI commands for inspecting, listing and removing secrets: docker secret inspect, docker secret ls, docker secret rm. Inspecting the secret provides the following:

$ docker secret inspect db_pw
[
    {
        "ID": "joptoh9y7x8galitn4ztnk86r",
        "Version": {
            "Index": 44
        },
        "CreatedAt": "2017-01-23T13:52:35.810853263Z",
        "UpdatedAt": "2017-01-23T13:52:35.810853263Z",
        "Spec": {
            "Name": "db_pw"
        }
    }
]

Inspecting the secret doesn't (obvs) show you the content of the secret. It shows the creation time of the secret, and whilst the output displays an UpdatedAt key, secrets cannot currently be updated via the CLI. There is, however, an API endpoint for updating secrets.

The Spec key provides some detail about the secret - just the name, in the above example. Like most objects in Docker, it is possible to associate labels with secrets when they are created, and the labels appear as part of the value of the Spec key.
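
For example (the secret name, content, and label key/value here are hypothetical):

$ printf '%s' "s3cr3t" | docker secret create --label env=dev db_pw2 -
$ docker secret inspect --format '{{.Spec.Labels}}' db_pw2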

How are secrets consumed?

Secrets are consumed by services through explicit association. Services are implemented as tasks (individual containers), which can be scheduled on any node within the swarm cluster. If a service comprises multiple tasks, an associated secret is accessible to any of the tasks, whichever node they are running on.

A service can be created with access to a secret, using the --secret flag:

$ docker service create --name app --secret db_pw my_app:1.0

In addition, a previously created service can be granted access to an additional secret, or have secrets revoked, using the --secret-add and --secret-rm flags in conjunction with docker service update.

Where are secrets kept?

A swarm mode cluster uses the Raft consensus algorithm to ensure that the nodes participating in the management of the cluster agree on the cluster's state. Part of this process involves the replication of the state to all manager nodes in the form of a log.

The implementation of secrets in Docker swarm mode takes advantage of the highly consistent, distributed nature of Raft by writing secrets to the Raft log, which means they are replicated to each of the manager nodes. The Raft log on each manager node is held in memory whilst the cluster is operating, and is encrypted in Docker 1.13.0+.
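
As an optional, related hardening step (a hedged aside, not a requirement), Docker 1.13.0 can also 'autolock' the swarm, so that a restarted manager cannot decrypt its Raft log until the unlock key is supplied:

$ docker swarm update --autolock=true
# This prints an unlock key; a restarted manager then needs
# 'docker swarm unlock' before it can rejoin the cluster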

How does a container access a secret?

A container that is a task of a service with access to a secret has the secret mounted onto its filesystem under /run/secrets, which is a tmpfs filesystem residing in memory. For example, if the secret is called db_pw, it's available inside the container at /var/run/secrets/db_pw for as long as the container is running (/var/run is a symlink to /run). If the container is halted for any reason, /run/secrets is no longer a component of the container's filesystem, and the secret is also flushed from the hosting node's memory.
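
From the application's point of view, consuming the secret is just a file read. A sketch of an entrypoint script (the my_app command and the failure handling are illustrative assumptions):

#!/bin/sh
# Fail fast if the secret hasn't been mounted into the container
if [ ! -f /run/secrets/db_pw ]; then
    echo "db_pw secret is not available" >&2
    exit 1
fi

# Read the password and hand it to the (hypothetical) application
DB_PW="$(cat /run/secrets/db_pw)"
exec my_app --db-password "$DB_PW"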

The secrets user interface provides some flexibility regarding a service's consumption of a secret. The secret can be mounted with a different name to the one provided during its creation, and it's possible to set the UID, GID and mode for the secret. For example, the db_pw secret could be made available inside container tasks with the following attributes:

$ docker service create --name app --secret source=db_pw,target=password,uid=2000,gid=3000,mode=0400 my_app:1.0

Inside the container, this would yield:

root@a61281217232:~# ls -l /var/run/secrets  
total 8  
-r--r--r-- 1 root root 32 Jan 23 11:49 my_secret
-r-------- 1 2000 3000 32 Jan 23 11:49 password

How are secrets updated?

By design, secrets in Docker swarm mode are immutable. If a secret needs to be rotated, it must first be removed from the service, before being replaced with a new secret. The replacement secret can be mounted at the same location. Let's take a look at an example. First, we'll create a secret, using a version number in its name, before adding it to a service as password:

$ < /dev/urandom tr -dc 'a-z0-9' | head -c 32 | docker secret create my_secret_v1.0 -
o8ozmi3sc0clf55p90oo7unaj  
$ docker service create --name nginx --secret source=my_secret_v1.0,target=password nginx
t19vuui8u7le66ct0z9cwshlx  

Once the task is running, the secret will be available in the container at /var/run/secrets/password. If the secret needs to be changed, we create a new version and update the service to reflect this:

$ < /dev/urandom tr -dc 'a-z0-9' | head -c 32 | docker secret create my_secret_v1.1 -
p4zugztwx00jf48zz9drv2ov0  
$ docker service update --secret-rm my_secret_v1.0 --secret-add source=my_secret_v1.1,target=password nginx
nginx  

Each service update results in the replacement of existing tasks, according to the update policy defined by the --update-parallelism and --update-delay flags (1 and 0s by default, respectively). If the service comprises multiple tasks, and the update is configured to be applied over a period of time, some tasks will be using the old secret whilst the updated tasks use the new one. Clearly, some co-ordination needs to take place between service providers and consumers when secrets are changed!
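
For instance, the rotation above could be staggered across a multi-task service like so (the timings are illustrative):

$ docker service update --update-parallelism 1 --update-delay 30s \
    --secret-rm my_secret_v1.0 --secret-add source=my_secret_v1.1,target=password nginx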

After the update, the new secret is available to all tasks that make up the service, and the old secret can be removed (if desired). A secret can't be removed whilst a service is using it:

$ docker secret rm my_secret_v1.0
o8ozmi3sc0clf55p90oo7unaj  

Summary

Introduced in Docker 1.13.0:

  • A new secrets object, along with API endpoints and CLI commands
  • Available in swarm mode only
  • Secrets are stored in the Raft log associated with the swarm cluster
  • Mounted in tmpfs inside a container

Container Mania

Nearly 60 years after Malcolm McLean developed the intermodal shipping container for the transportation of goods, it seems its computing equivalent has arrived, attracting considerable interest, much enthusiasm, and the occasional controversy.

Containers are a form of virtualisation, but unlike the traditional virtual machine, they are very light in terms of footprint and resource usage. Applications running in containers don't need a full-blown guest operating system to function; they just need the bare minimum in terms of OS binaries and libraries, and they share the host's kernel with other containers and the host itself. The light nature of containers, and the very quick speeds with which container workloads can be provisioned, must however be contrasted with a reduction in the level of isolation when compared to a traditional virtual machine, and they are currently only available on the Linux OS. Equivalent capabilities exist in other *nix operating systems, such as Solaris (Zones) and FreeBSD (Jails), but not on the Windows platform ... yet. When it comes to choosing between containers and virtual machines, it's a matter of horses for courses.

So, why now? What has provoked the current interest and activity? The sudden popularity of containers has much to do with technological maturity, inspired innovation, and an evolving need.

Maturity:
Whilst some aspects of the technology that provides Linux containers have been around for a number of years, it's true to say that the 'total package' has only recently matured to a level where it's inherent in the kernels shipped with most Linux distributions.

Innovation:
Containers are an abstraction of Linux kernel namespaces and control groups (or cgroups), and as such, creating and invoking a container requires some effort and knowledge on the part of the user. This has inhibited their uptake as a means of isolating workloads. Enter, stage left, the likes of Docker (with its libcontainer library), Rocket, LXC and lmctfy, all of which serve to commoditise the container. Docker, in particular, has captured the hearts and minds of the DevOps community, with its platform for delivering distributed applications in containers.
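
To get a feel for the raw ingredients these tools commoditise, util-linux's unshare command can create namespaces by hand - a sketch of one building block, not a complete container:

# Start a shell in new PID and mount namespaces, with its own private /proc;
# ps inside will see only the processes belonging to the new namespace
$ sudo unshare --pid --fork --mount-proc sh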

Need:
Containers are a perfect fit for the recent trend of architecting software applications as small, discrete, independent microservices. Whilst there is no formal definition of a microservice, it is generally considered to be a highly de-coupled, independent process with a specific function, which often communicates via a RESTful HTTP API. It's entirely possible to use containers to run multiple processes (as is the case with LXC), but the approach taken by Docker and Rocket is to encourage a single process per container, fitting neatly with the microservice aspiration.

The fact that all major cloud and operating system vendors, including Microsoft, are busy developing their container capabilities is evidence enough that containers will have a big part to play in workload deployment in the coming years. This means the stakes are high for the organisations behind the different technologies, which has led to some differences of opinion. On the whole, however, the majority of the technologies are being developed in the open, using a community-based model, which should significantly aid continued innovation, maturity, and adoption.

This article serves as an introduction to a series of articles that examine the fundamental building blocks for containers: namespaces and cgroups.