Part 7 - A Basic Container

We've looked at five of the six available namespaces provided by the Linux kernel in a series of previous articles, and we'll take a look at the final namespace, the USER namespace, in a future article. This article looks at how we can combine a number of the namespaces with a specially prepared directory, in which we'll 'jail' our process using the chroot system call. Although our implementation will be missing a few key features that normally accompany container implementations (e.g. cgroups), the resulting environment in which our process will run, can be considered a very rudimentary container of sorts. It isolates the process from several different system resources, and contains the process within a limited filesystem.

The first thing we need to do is to prepare a directory on the host which will become the root filesystem for the container, which will be located at /var/local/jail. We're going to provide just a few binaries for the container to use; env, bash, ps, ls and top.

It's not just a simple matter of copying the binaries to /var/local/jail, each binary relies on shared libraries, and we also need to ensure they are available in the appropriate directory in the container's filesystem. To do this, we can make use of the ldd command, whose purpose is to provide information regarding the shared libraries used by a particular binary. I've created a script called binlibdepcp.sh, which takes care of determining the library dependencies for a binary, and then copying them along with the relevant libraries to the correct locations in the container's filesystem. It also copies the ld.so.cache file, which is the list of directories that is searched for libraries, in the event that a required library does not reside in /lib or /usr/lib. The script is available on GitHub.

Let's demonstrate this for the env binary, which is located at /usr/bin/env. Having previously created the /var/local/jail directory, the env binary and libraries are copied to the correct location under /var/local/jail with the following:

$ sudo ./binlibdepcp.sh /usr/bin/env /var/local/jail
[sudo] password for wolf:
Copying ...

                      env : [OK]
                libc.so.6 : [OK]
     ld-linux-x86-64.so.2 : [OK]
              ld.so.cache : [OK]

...... Done

We can repeat this exercise for the other binaries we intend to use within the container, which, more likely than not, are located in either /usr/bin or /bin. Additionally, so that our commands will display nicely when we run them inside the container, we need to provide the relevant portion of the terminfo database. Assuming we have an xterm, we can copy this into the jail (the location of the terminfo database varies from Linux distro to distro, so make sure to copy the files to the correct directory under /var/local/jail):

$ sudo mkdir -p /var/local/jail/lib/terminfo/x
[sudo] password for wolf:
$ sudo cp -p /lib/terminfo/x/* /var/local/jail/lib/terminfo/x

That's the container's filesystem prepared. Now we need to amend the program we have slowly been developing whilst we've been looking at the properties of containers. The changes are available in the invoke_ns6.c source file, which can be found here.

The first change is to add a new command line option, -c which must be accompanied with a directory path, which will be the root of the jail:

// Parse command line options and construct arguments
// to be passed to childFunction
while ((option = getopt(argc, argv, "+hvpmu:ni:c:")) != -1) {  
    switch (option) {
        case 'c':
            args.jail = 1;
            args.path = malloc(sizeof(char *) * (strlen(optarg) + 1));
            strcpy(args.path, optarg);
        case 'i':
            if (strcmp("no", optarg) != 0 && strcmp("yes", optarg) != 0) {
                fprintf(stderr, "%s: option requires valid argument -- 'i'\n", argv[0]);
                if (strcmp("yes", optarg) == 0)
                    flags |= CLONE_NEWIPC;
            args.ipc = 1;
        case 'n':
            flags |= CLONE_NEWNET;
        case 'u':
            flags |= CLONE_NEWUTS;
            args.hostname = malloc(sizeof(char *) * (strlen(optarg) + 1));
            strcpy(args.hostname, optarg);
        case 'm':
            flags |= CLONE_NEWNS;
        case 'p':
            flags |= CLONE_NEWPID;
        case 'v':
            args.verbose = 1;
        case 'h':

The other main change is to add some code to ensure that if the -c option is supplied, the cloned child process is jailed inside the directory with the chroot system call. The chroot system call changes the root directory of the child process, and we change the current working directory to that root directory, and then create a /proc directory within it:

// If specified, place process in chroot jail
if (args->jail) {  
    if (args->verbose)
        printf(" Child: creating chroot jail\n");
    if (chroot(args->path) == -1) {
        perror(" Child: childFunction: chroot");
    else {
        if (args->verbose)
            printf(" Child: changing directory into chroot jail\n");
        if (chdir("/") == -1) {
            perror(" Child: childFunction: chdir");
        if (access("/proc", F_OK) != 0)
            if (mkdir("/proc", 0555) == -1) {
                perror(" Child: childFunction: mkdir");

We can now invoke our container (with a customised command prompt) with the following command:

$ sudo ./invoke_ns -vpmu calculus -c /var/local/jail env PS1="\h [\W] " TERM=$TERM bash --norc
[sudo] password for wolf:
calculus [/]  

Now that we have an interactive bash command shell running inside the container, we can use the ls, ps and top commands to verify we have a very minimal operating environment, if not a very useful one! It doesn't take much imagination, however, to see the possibilities for containing independent workloads in minimal, lightweight containers.

In reality, a process inside a container needs a few more things than we have provided in our rudimentary version. Thankfully, the excellent work that has been conducted in the open source community with projects like Docker and LXC, have taken the hard work out of creating and manipulating workloads within containers.

Container Mania

Nearly 60 years after Malcolm McLean developed the intermodal shipping container for the transportation of goods, it seems its computing equivalent has arrived with considerable interest, much expression, and with some occasional controversy.

Containers are a form of virtualisation, but unlike the traditional virtual machine, they are very light in terms of footprint and resource usage. Applications running in containers don't need a full blown guest operating system to function, they just need the bare minimum in terms of OS binaries and libraries, and share the host's kernel with other containers and the host itself. The light nature of containers, and the very quick speeds with which container workloads can be provisioned, can however, be contrasted with a reduction in the level of isolation when compared to a traditional virtual machine, and they are currently only available on the Linux OS. Equivalent capabilities exist in other *nix operating systems, such as Solaris (Zones) and FreeBSD (Jails), but not the Windows platform .... yet. When it comes to choosing whether to plump for containers or virtual machines, it's a matter of horses for courses.

So, why now? What has provoked the current interest and activity? The sudden popularity of containers has much to do with technological maturity, inspired innovation, and an evolving need.

Whilst some aspects of the technology that provides Linux containers has been around for a number of years, it's true to say that the 'total package' has only recently matured to a level where its inherent in the kernels shipped with most Linux distributions.

Containers are an abstraction of Linux kernel namespaces and control groups (or cgroups), and as such it requires some effort and knowledge on the part of the user to create and invoke a container. This has inhibited their take up as a means of isolating workloads. Enter stage left, the likes of Docker (libcontainer library), Rocket, LXC and lmctfy, all of which serve to commodotise the container. Docker, in particular, has captured the hearts and minds of the Devops community, with its platform for delivering distributed applications in containers.

Containers are a perfect fit for a recent trend in architecting software applications as small, discrete, independent microservices. Whilst there is no formal definition of a microservice, it is generally considered that a microservice is a highly de-coupled, independent process with a specific function, which often communicates via a RESTful HTTP API. It's entirely possible to use containers to run multiple processes (as is the case with LXC), but the approach taken by Docker and Rocket is to encourage a single process per container, fitting neatly with the microservice aspiration.

The fact that all major cloud and operating system vendors, including Microsoft, are busy developing their capabilities regarding containers, is evidence enough that containers will have a big part to play in workload deployment in the coming years. This means the stakes are high for the organisations behind the different technologies, which has led to some differences of opinion. On the whole, however, the majority of the technologies are being developed in the open, using a community-based model, which should significantly aid continued innovation, maturity, and adoption.

This article serves as an introduction to a series of articles that examine the fundamental building blocks for containers; namespaces and cgroups.