A Basic Container

We’ve looked at five of the six namespaces provided by the Linux kernel in a series of previous articles, and we’ll look at the final one, the USER namespace, in a future article. This article shows how we can combine several of those namespaces with a specially prepared directory, in which we’ll ‘jail’ our process using the chroot system call. Although our implementation will lack a few key features that normally accompany container implementations (e.g. cgroups), the resulting environment can be considered a very rudimentary container of sorts: it isolates the process from several system resources and confines it to a limited filesystem.

The first thing we need to do is prepare a directory on the host, /var/local/jail, which will become the root filesystem for the container. We’re going to provide just a few binaries for the container to use: env, bash, ps, ls and top¹.

It’s not simply a matter of copying the binaries to /var/local/jail. Each binary relies on shared libraries, and we need to ensure these are also available in the appropriate directories in the container’s filesystem. To do this, we can make use of the ldd command, which reports the shared libraries a particular binary requires. I’ve created a script called binlibdepcp.sh, which determines the library dependencies for a binary, and then copies the binary along with those libraries to the correct locations in the container’s filesystem. It also copies the ld.so.cache file, the cache the dynamic linker consults to find libraries that do not reside in /lib or /usr/lib. The script is available on GitHub.
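To make the dependency step more concrete, here is a minimal sketch in C of how one line of ldd output could be turned into a path to copy. It is not taken from binlibdepcp.sh (which is a shell script); the function name ldd_line_to_path is hypothetical, and the sketch assumes the usual glibc ldd output format, where a line either reads "name => /path (address)" or starts with an absolute path.

```c
#include <string.h>

// Hypothetical helper: extract the resolved library path from one
// line of ldd output. Returns 1 and fills 'out' when the line names
// a file on disk, 0 otherwise (e.g. linux-vdso or "not found").
int ldd_line_to_path(const char *line, char *out, size_t outlen)
{
    const char *start = strstr(line, "=> ");
    if (start != NULL) {
        start += 3;                      // the path follows the arrow
    } else {
        start = line;
        while (*start == ' ' || *start == '\t')
            start++;                     // skip leading indentation
    }
    if (*start != '/')
        return 0;                        // no absolute path on this line

    // The path ends at the first whitespace, before the load address.
    // Assumption: library paths contain no spaces.
    size_t len = strcspn(start, " \t\n");
    if (len >= outlen)
        return 0;                        // caller's buffer is too small
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

A caller would typically run ldd through popen, read its output line by line with fgets, and copy every path for which this function returns 1 to the same relative location under the jail directory.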

Let’s demonstrate this for the env binary, which is located at /usr/bin/env. With the /var/local/jail directory already created, the env binary and its libraries are copied to the correct locations under it with the following:

$ sudo ./binlibdepcp.sh /usr/bin/env /var/local/jail
[sudo] password for wolf:
Copying ...

                      env : [OK]
                libc.so.6 : [OK]
     ld-linux-x86-64.so.2 : [OK]
              ld.so.cache : [OK]

...... Done

We can repeat this exercise for the other binaries we intend to use within the container, which, more likely than not, are located in either /usr/bin or /bin. Additionally, so that our commands will display nicely when we run them inside the container, we need to provide the relevant portion of the terminfo database. Assuming we have an xterm, we can copy this into the jail (the location of the terminfo database varies from Linux distro to distro, so make sure to copy the files to the correct directory under /var/local/jail):

$ sudo mkdir -p /var/local/jail/lib/terminfo/x
[sudo] password for wolf:
$ sudo cp -p /lib/terminfo/x/* /var/local/jail/lib/terminfo/x
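The x in the path above is not arbitrary: on a typical Linux ncurses installation, each terminfo entry lives in a subdirectory named after the first character of the terminal name. The following C sketch builds that path for a given TERM value. The function name terminfo_entry_path is hypothetical, and the base directory is passed in by the caller precisely because it differs between distros.

```c
#include <stdio.h>

// Hypothetical helper: build the path of the terminfo entry for
// 'term' under the database directory 'base'.
// Returns 0 on success, -1 on bad input or if 'out' is too small.
int terminfo_entry_path(const char *base, const char *term,
                        char *out, size_t outlen)
{
    if (base == NULL || term == NULL || term[0] == '\0')
        return -1;
    // Layout assumed here: <base>/<first character of term>/<term>
    int n = snprintf(out, outlen, "%s/%c/%s", base, term[0], term);
    if (n < 0 || (size_t)n >= outlen)
        return -1;                       // encoding error or truncation
    return 0;
}
```

With a base of /lib/terminfo and a TERM of xterm, this yields /lib/terminfo/x/xterm, which is one of the files the cp command above places in the jail.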

That’s the container’s filesystem prepared. Now we need to amend the program we have slowly been developing whilst we’ve been looking at the properties of containers. The changes are available in the invoke_ns6.c source file, which can be found here.

The first change is to add a new command line option, -c, which must be accompanied by a directory path that will become the root of the jail:

// Parse command line options and construct arguments
// to be passed to childFunction
while ((option = getopt(argc, argv, "+hvpmu:ni:c:")) != -1) {
    switch (option) {
        case 'c':
            args.jail = 1;
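            // Allocate one byte per character plus the terminating NUL
            args.path = malloc(strlen(optarg) + 1);
            if (args.path == NULL) {
                perror("malloc");
                exit(EXIT_FAILURE);
            }
            strcpy(args.path, optarg);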
            break;
        case 'i':
            if (strcmp("no", optarg) != 0 && strcmp("yes", optarg) != 0) {
                fprintf(stderr, "%s: option requires valid argument -- 'i'\n", argv[0]);
                usage(argv[0]);
                exit(EXIT_FAILURE);
            }
            else
                if (strcmp("yes", optarg) == 0)
                    flags |= CLONE_NEWIPC;
            args.ipc = 1;
            break;
        case 'n':
            flags |= CLONE_NEWNET;
            break;
        case 'u':
            flags |= CLONE_NEWUTS;
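            // Allocate one byte per character plus the terminating NUL
            args.hostname = malloc(strlen(optarg) + 1);
            if (args.hostname == NULL) {
                perror("malloc");
                exit(EXIT_FAILURE);
            }
            strcpy(args.hostname, optarg);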
            break;
        case 'm':
            flags |= CLONE_NEWNS;
            break;
        case 'p':
            flags |= CLONE_NEWPID;
            break;
        case 'v':
            args.verbose = 1;
            break;
        case 'h':
            usage(argv[0]);
            exit(EXIT_SUCCESS);
        default:
            usage(argv[0]);
            exit(EXIT_FAILURE);
    }
}

The other main change is to add some code to ensure that, if the -c option is supplied, the cloned child process is jailed inside the directory with the chroot system call. chroot changes the root directory of the child process but leaves its current working directory untouched, so we then change the working directory to the new root, and finally create a /proc directory within it if one does not already exist:

// If specified, place process in chroot jail
if (args->jail) {
    if (args->verbose)
        printf(" Child: creating chroot jail\n");
    if (chroot(args->path) == -1) {
        perror(" Child: childFunction: chroot");
        exit(EXIT_FAILURE);
    }
    else {
        if (args->verbose)
            printf(" Child: changing directory into chroot jail\n");
        if (chdir("/") == -1) {
            perror(" Child: childFunction: chdir");
            exit(EXIT_FAILURE);
        }
        if (access("/proc", F_OK) != 0)
            if (mkdir("/proc", 0555) == -1) {
                perror(" Child: childFunction: mkdir");
                exit(EXIT_FAILURE);
            }
    }
}
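The /proc directory is only useful once a procfs instance is mounted on it, since ps and top read their process information from there. The excerpt above does not show that step, so the following is a sketch of how it might be done rather than the code from invoke_ns6.c. The function name mount_proc_at is hypothetical, and the call needs the privileges we already have from running under sudo.

```c
#include <sys/mount.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: make sure 'dir' exists, then mount procfs on it.
// Returns 0 on success, -1 on failure with errno set by the failing call.
int mount_proc_at(const char *dir)
{
    // Create the mount point only if it is missing
    if (access(dir, F_OK) != 0 && mkdir(dir, 0555) == -1)
        return -1;
    // The source name "proc" is a convention; the type argument selects procfs
    if (mount("proc", dir, "proc", 0, NULL) == -1)
        return -1;
    return 0;
}
```

Called as mount_proc_at("/proc") after the chroot and chdir calls, and with the child in its own PID namespace, the mounted procfs would show only the processes that belong to that namespace.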

We can now invoke our container (with a customised command prompt) with the following command:

$ sudo ./invoke_ns -vpmu calculus -c /var/local/jail env PS1="\h [\W] " TERM=$TERM bash --norc
[sudo] password for wolf:
calculus [/]

Now that we have an interactive bash command shell running inside the container, we can use the ls, ps and top commands to verify we have a very minimal operating environment, if not a very useful one! It doesn’t take much imagination, however, to see the possibilities for containing independent workloads in minimal, lightweight containers.

In reality, a process inside a container needs a few more things than we have provided in our rudimentary version. Thankfully, the excellent work conducted in the open source community, in projects like Docker and LXC, has taken the hard work out of creating and manipulating workloads within containers.

¹ You could also use the contents of a root filesystem obtained from a suitable distro, e.g. Ubuntu.