Managing the privileges available to a container is important to the ongoing integrity of the container and the host on which it runs. With privilege comes power, and the potential to abuse that power, wittingly or unwittingly.

A simple container example serves to illustrate:

$ sudo docker container run -itd --name test alpine sh
8588dfbfc89fc5761c11ebff6c9319fb655da92a1134cd5810031149e5cfc6e0  
$ sudo docker container top test -eo pid
PID  
2140  
$ ps -fp 2140
UID        PID  PPID  C STIME TTY          TIME CMD  
root      2140  2109  0 10:31 pts/0    00:00:00 sh  

A container is started in detached mode; we retrieve its process ID from the perspective of the default (or host's) PID namespace, list the process, and find that the UID (user ID) associated with the container's process is root. It turns out that the sets of UIDs and GIDs (group IDs) are the same for the container and the host, because containers are started, by default, as the privileged user with UID/GID=0 (aka root or superuser). The upshot is that if the container's process were able to break out of the confines of the container, it would have root access on the host.
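If you want to confirm this from inside the container rather than from the host, you can exec the id command in the same test container; it should report uid=0(root) gid=0(root), along with root's supplementary groups:

$ sudo docker container exec test id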

There are lots of things we can do to mitigate this risk. Docker removes many potentially pernicious privileges by dropping capabilities and applying other security mechanisms, in order to minimize the potential attack surface. We can even make use of user namespaces, by configuring the Docker daemon to map the range of UIDs/GIDs used inside containers onto a different, subordinate range on the host. This means a container's process, running as the privileged root user, will map to a non-privileged user on the host.
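For reference, this remapping is enabled through daemon-level configuration rather than anything in the image itself. A minimal sketch, assuming you're content for Docker to create and manage subordinate ranges for a dedicated dockremap user in /etc/subuid and /etc/subgid, and that the daemon is managed by systemd:

$ cat /etc/docker/daemon.json
{
    "userns-remap": "default"
}
$ sudo systemctl restart docker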

If you're able to make use of the --userns-remap config option on the daemon to perform this mapping, you absolutely should. Unfortunately, it's not always possible or desirable to do so - another story, another post! This puts us back to square one: what can we do to minimize the risk? The simple answer is that we should always be guided by the principle of least privilege. Often, containers need privileges associated with the root user, but when they don't, you should take action to run your containers as a benign, non-privileged user. How do you achieve this?

A Simple Example

Let's take a simple Dockerfile example, which defines a Docker image for the AWS CLI. This use case might be better suited to a local developer's laptop than to a sensitive, production environment, but it will serve as an illustration. The image enables us to install and run AWS CLI commands in a container, rather than on the host itself:

FROM alpine:latest

# Define build argument for AWS CLI version
ARG VERSION

# Install dependencies, AWS CLI and clean up.
RUN set -ex                                     && \  
    apk add --no-cache                             \
        python                                     \
        groff                                      \
        less                                       \
        py-pip                                  && \
    pip --no-cache-dir install awscli==$VERSION && \
    apk del py-pip

CMD ["help"]  
ENTRYPOINT ["aws"]  

Assuming the contents of the above are in a file called Dockerfile, located in the current working directory, we can use the docker image build command to build the image. Provided we have made the local user a member of the docker group, which, for convenience, provides unfettered access to the Docker CLI (something that should only ever be done in a development environment), the following will create the image:

$ docker image build --build-arg VERSION="1.14.38" -t aws:v1 .

We can then check that the image works as intended by running a container derived from it. This is equivalent to running the command aws --version in a non-containerized environment:

$ docker container run --rm --name aws aws:v1 --version
aws-cli/1.14.38 Python/2.7.14 Linux/4.4.0-112-generic botocore/1.8.42  

This is all well and good, but as we didn't take any action to curtail its privileges, the container ran as the root user, with UID/GID=0. This level of privilege is not necessary for running AWS CLI commands, so let's do something about it!

Using a Non-privileged User

To fix this, we can add a non-privileged user to the image, and then set that user as the image's default user, so that a derived container's process is no longer privileged. The changes to the Dockerfile might look something like this:

FROM alpine:latest

# Define build argument for AWS CLI version
ARG VERSION

# Install dependencies, AWS CLI and clean up.
RUN set -ex                                     && \  
    apk add --no-cache                             \
        python                                     \
        groff                                      \
        less                                       \
        py-pip                                  && \
    pip --no-cache-dir install awscli==$VERSION && \
    apk del py-pip                              && \
    addgroup aws                                && \
    adduser -D -G aws aws

USER aws

WORKDIR /home/aws

CMD ["help"]  
ENTRYPOINT ["aws"]  

All we've done is add two commands to the RUN instruction: one to add a group called aws, and one to add a user called aws that belongs to the aws group. In order to make use of the aws user, however, we also have to set the user with the USER Dockerfile instruction, and whilst we're at it, we'll set the working directory to the user's home directory, courtesy of the WORKDIR instruction. We can re-build the image, tagging it as v2 this time:

$ docker image build --build-arg VERSION="1.14.38" -t aws:v2 .

Now that we have a new variant of the aws image, we'll run a new container, but this time we won't specify any command line arguments, which means the argument passed to the aws command will be help, as specified by the CMD instruction in the Dockerfile:

$ docker container run --rm -it --name aws aws:v2

Unsurprisingly, this will list the help for the AWS CLI, piped to less, which gives us the opportunity to poke around whilst the container is still running. If we repeat the exercise we carried out earlier, looking for the container's process(es) from another terminal on the host, we get the following:

$ docker container top aws -eo pid
PID  
2436  
2487  
$ ps -fp 2436,2487
UID        PID  PPID  C STIME TTY          TIME CMD  
rackham   2436  2407  0 14:27 pts/0    00:00:00 /usr/bin/python2 /usr/bin/aws he  
rackham   2487  2436  0 14:27 pts/0    00:00:00 less -R  

It reports that the processes are running with the UID associated with the user rackham. In actual fact, the UID 1000 is associated with the user rackham on the host, but in the container, the UID 1000 is associated with the user aws:

$ id -u rackham
1000  
$ docker container exec -it aws id
uid=1000(aws) gid=1000(aws) groups=1000(aws)  

What really matters is the UID, not the user that it translates to, as the kernel works with the UID when it comes to access control. With the trivial changes made to the image, our container is happily running as a non-privileged user, which should provide us with some peace of mind.

IDs and Bind Mounts

There is something missing from the AWS CLI image, however. In order to do anything meaningful, the AWS CLI commands need access to the user's AWS configuration and credentials, so that they can authenticate against the AWS API. Obviously, we shouldn't bake these into the image, especially if we intend to share the image with others! We could pass them as environment variables, but whilst this might be a means of injecting configuration items into a container, it's not safe for sensitive data such as credentials. If you allow others access to the same Docker daemon, without limiting what they can do using an access authorization plugin, your environment variables will be exposed to them via the docker container inspect command. Another approach is to bind mount the files containing the relevant data into the container at run time. In fact, if we want to make use of the aws configure command to update our local AWS configuration, this is the only way we can update those files when using a container.
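To see why, consider what anyone with access to the same daemon can retrieve with docker container inspect. A quick, hypothetical sketch (the container name and variable value are made up, and the output is illustrative):

$ docker container run -d -e AWS_SECRET_ACCESS_KEY=wibble --name leaky alpine sleep 300
$ docker container inspect leaky --format '{{.Config.Env}}'
[AWS_SECRET_ACCESS_KEY=wibble PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin]

Bind mounting the files avoids exposing their contents in this way.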

On Linux, the AWS config and credentials files are normally located in $HOME/.aws, so we need to bind mount this directory into the container at /home/aws/.aws, the corresponding location for the container's user. We need to do this each time we want to execute an AWS CLI command using a container. Let's try this out by listing the instances running in the default region, which is specified in the AWS config file we've just mounted at /home/aws/.aws. This command is equivalent to running aws ec2 describe-instances:

$ docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws \
--name aws aws:v2 ec2 describe-instances
You must specify a region. You can also configure your region by running "aws configure"  

That didn't go too well! The error message suggests that the aws command can't find the files. Having ascertained that the local user's UID/GID is 1001, if we run another container, override its entrypoint, and run ls -l ./.aws, we can see the reason for the error:

$ id
uid=1001(baxter) gid=1001(baxter) groups=1001(baxter),27(sudo),999(docker)  
$ docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws \
--entrypoint ls --name aws aws:v2 -l ./.aws
total 8  
-rw-------    1 1001     1001           149 Feb 13 16:20 config
-rw-------    1 1001     1001           229 Feb 13 15:42 credentials

The files are present inside the container, but they are owned by UID/GID=1001. Remember, whilst we didn't specify a deterministic UID/GID for the container's user in the image, the addgroup and adduser commands created the aws user with UID/GID=1000. There is a mismatch between the UID/GIDs, and the file permissions are such that the container's user cannot read or write the files.
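You can confirm the mismatch from the host's side by listing the same directory with numeric IDs:

$ ls -ln $HOME/.aws
total 8
-rw------- 1 1001 1001 149 Feb 13 16:20 config
-rw------- 1 1001 1001 229 Feb 13 15:42 credentials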

This is a big problem. We've been careful to ensure that our container runs with diminished privileges, but have ended up with a problem to resolve as a consequence.

We could try to circumvent this problem by using the --user config option to the docker container run command, specifying that the container runs with UID/GID=1001 instead of 1000:

$ docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws \
--user 1001:1001 --name aws aws:v2 ec2 describe-instances
You must specify a region. You can also configure your region by running "aws configure".  

This error message is starting to become familiar. The reason, this time, is that there is no 'environment' ($HOME, to be precise) for a user with UID/GID=1001, which the AWS CLI needs in order to locate the config and credentials files. This is because no user with UID/GID=1001 is configured in the container's filesystem. We might be tempted to pass a HOME environment variable to the docker container run command, or even to alter the Dockerfile to provide deterministic values for the UID/GID. If we succumb to these seductions, we're in danger of making the image very specific to a given host, and of relying too much on a consumer of our image to figure out how to work around these idiosyncrasies. A better option would be to add the aws user after the container has been created, which gives us the ability to create the user with the required UID/GID. Let's see how to do this.

Defer Stepping Down to a Non-privileged User

The image for the AWS CLI is immutable, so we can't define a 'variable' aws user in the Dockerfile. Instead, we can make use of an entrypoint script, which gets executed when the container starts, and which replaces the aws command as the entrypoint specified in the Dockerfile. Here's a revised Dockerfile:

FROM alpine:latest

# Define build time argument for AWS CLI version
ARG VERSION

# Add default UID for 'aws' user
ENV AWS_UID=1000

# Install dependencies, AWS CLI and clean up.
RUN set -ex                                     && \  
    apk add --no-cache                             \
        python                                     \
        groff                                      \
        less                                       \
        py-pip                                     \
        su-exec                                 && \
    pip --no-cache-dir install awscli==$VERSION && \
    apk del py-pip                              && \
    mkdir -p /home/aws

COPY docker-entrypoint.sh /usr/local/bin/

WORKDIR /home/aws

CMD ["help"]  
ENTRYPOINT ["docker-entrypoint.sh"]  

In addition to changing the entrypoint and copying the script from the build context with the COPY instruction, we've added an environment variable specifying a default UID for the aws user (in case the person running the container neglects to provide one), removed the commands for creating the user from the RUN instruction, and added a command to create the mount point for the bind mount. We've also added a utility called su-exec to the image, which will enable our script to step down from the root user to the aws user at the last moment. Note, too, that the USER instruction has gone; the container must now start as root so that the entrypoint script can create the user.

Let's get to the entrypoint script, itself:

#!/bin/sh

# If --user is used on command line, cut straight to aws command.
# The command will fail, unless the AWS region and profile have
# been provided as command line arguments or envs.
if [ "$(id -u)" != '0' ]; then  
    exec aws "$@"
fi

# Add 'aws' user using $AWS_UID and $AWS_GID
if [ ! -z "${AWS_GID+x}" ] && [ "$AWS_GID" != "$AWS_UID" ]; then  
    addgroup -g $AWS_GID aws
    adduser -D -G aws -u $AWS_UID aws
else  
    adduser -D -u $AWS_UID aws
fi

# Step down from root to aws, and run command
exec su-exec aws aws "$@"  

When the script is invoked, it is running with the all-powerful UID/GID=0, unless the container was invoked with the --user config option. The script needs root privileges to create the aws user, so if it's invoked as any other user, creating the aws user won't be possible. Hence, a check is made early on in the script: if the user associated with the container's process is not UID=0, we simply use exec to replace the script with the aws command, plus any arguments passed at the end of the command that invoked the container (e.g. ec2 describe-instances). In this scenario, the command will fail unless a region and credentials have been provided by some other means, such as command line arguments or environment variables.
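If you do take the --user route, the AWS CLI's standard environment variables are one way of supplying what's missing. A purely hypothetical sketch, using AWS's documented example credentials; bear in mind the earlier caveat about passing secrets as environment variables:

$ docker container run --rm -it --user $(id -u):$(id -g) \
--env AWS_DEFAULT_REGION=eu-west-1 \
--env AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
--env AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
--name aws aws:v3 ec2 describe-instances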

What we would prefer the user to do instead is specify an environment variable, AWS_UID (and optionally AWS_GID), on the command line, reflecting the owner of the AWS config and credentials files on the host. Using this variable, the script creates the aws user with the corresponding UID/GID, before being replaced with the desired AWS CLI command, which is executed as the aws user courtesy of the su-exec utility. First we must re-build the image, and when that's done, let's also create an alias for invoking the AWS CLI container:

$ docker image build --build-arg VERSION="1.14.38" -t aws:v3 .
$ alias aws='docker container run --rm -it --mount type=bind,source=$HOME/.aws,target=/home/aws/.aws --env AWS_UID=$UID --name aws aws:v3'

In the Docker CLI command we've aliased, we've defined the AWS_UID environment variable for use inside the container, setting it to the UID of the user invoking the container (via the shell's $UID variable). All that's left to do is test the new configuration, using the alias:

$ aws ec2 describe-instances --query 'Reservations[*].Instances[*].[InstanceId,State.Name]'
[
    [
        [
            "i-04d3a022e5cc0a140", 
            "terminated"
        ]
    ], 
    [
        [
            "i-009efe47f59402b4e", 
            "terminated"
        ]
    ], 
    [
        [
            "i-0ad081df0fbe1d9e4", 
            "running"
        ]
    ]
]

This time we're successful!

Stepping down from the root user for our containerized AWS CLI is a fairly trivial example use case. The technique of stepping down to a non-privileged user in an entrypoint script, however, is very common for applications that require privileges to perform some initialisation prior to invoking the application associated with the container. You might want to create a database, for example, or apply some configuration based on the characteristics of the host or the command line arguments provided at run time.
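As a rough, generic sketch of that pattern (the application, paths, and user here are entirely hypothetical, and it assumes an image with su-exec installed and an app user already baked in), such an entrypoint script might look like this:

#!/bin/sh
set -e

# Initialisation that genuinely needs root, e.g. fixing the ownership
# of a volume that was mounted into the container as root
if [ "$(id -u)" = '0' ]; then
    chown -R app:app /var/lib/app
    # Step down to the non-privileged 'app' user and re-run this script
    exec su-exec app "$0" "$@"
fi

# By this point we're running as 'app', so hand over to the application
exec /usr/local/bin/app "$@"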

Summary

If we hadn't undertaken this exercise to reduce the privileges available inside a container derived from our AWS CLI image, the task of creating the image would have been quite straightforward. However, by taking the time and expending a little effort, we have taken a considerable step towards minimizing the risk of privilege escalation inside the container, which in turn helps to reduce the risk of compromising the host itself. Running containers as a non-privileged user is one of many steps we can take to secure the containers we run, especially when they are deployed to a production environment.

If you want to find out what else you can do to make your containers more secure, check out my hosted training course - Securing Docker Container Workloads.