UAO (User Access Override) as a mitigation against addr_limit overwrites

by Vitaly Nikolenko


Posted on September 4, 2020 at 6:10PM


Introduction

This post describes UAO (User Access Override) implementation details in the Linux/Android kernel and demonstrates its effectiveness against the common addr_limit overwrite kernel exploitation technique.

Most recent Android devices on the market are based on the ARMv8.2 architecture extension, which introduced the UAO feature. UAO allows unprivileged load and store (ldtr*/sttr*) instructions to behave as standard privileged ldr*/str* instructions.
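To make the distinction concrete, here is a minimal sketch (not kernel code; load_user_byte is a made-up name) of an unprivileged load at the instruction level. At EL1 with PSTATE.UAO == 0, ldtrb performs the access with EL0 permissions; with PSTATE.UAO == 1 it behaves exactly like a privileged ldrb:

static inline unsigned char load_user_byte(const unsigned char *uptr)
{
        unsigned char val;

        /* unprivileged byte load: checked against EL0 permissions,
           unless PSTATE.UAO == 1 */
        asm volatile("ldtrb %w0, [%1]" : "=r"(val) : "r"(uptr));
        return val;
}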

Implementation details

On systems without UAO support, standard privileged ldr*/str* instructions are used when accessing user-space addresses:

 Dump of assembler code for function __arch_copy_to_user:
...
     0xffffffc000405ea0 <+32>:    sub     x2, x2, x4
     0xffffffc000405ea4 <+36>:    tbz     w4, #0, 0xffffffc000405eb4 <__arch_copy_to_user+52>
-->  0xffffffc000405ea8 <+40>:    ldrb    w3, [x1],#1
-->  0xffffffc000405eac <+44>:    strb    w3, [x6],#1
     0xffffffc000405eb0 <+48>:    nop
     0xffffffc000405eb4 <+52>:    tbz     w4, #1, 0xffffffc000405ec4 <__arch_copy_to_user+68>
-->  0xffffffc000405eb8 <+56>:    ldrh    w3, [x1],#2
-->  0xffffffc000405ebc <+60>:    strh    w3, [x6],#2
     0xffffffc000405ec0 <+64>:    nop
     0xffffffc000405ec4 <+68>:    tbz     w4, #2, 0xffffffc000405ed4 <__arch_copy_to_user+84>
-->  0xffffffc000405ec8 <+72>:    ldr     w3, [x1],#4
-->  0xffffffc000405ecc <+76>:    str     w3, [x6],#4
     0xffffffc000405ed0 <+80>:    nop
     0xffffffc000405ed4 <+84>:    tbz     w4, #3, 0xffffffc000405ee4 <__arch_copy_to_user+100>
-->  0xffffffc000405ed8 <+88>:    ldr     x3, [x1],#8
-->  0xffffffc000405edc <+92>:    str     x3, [x6],#8
...

If PAN (Privileged Access Never) is enabled, these privileged instructions generate faults when reading/writing from/to user-space addresses. Hence, the standard procedure in all user-space accessor functions (copy_to/from_user, get/put_user, memdup_user, etc.) is to temporarily disable PAN (specifically, clear PSTATE.PAN), perform the user-space data access and then re-enable PAN.

ENTRY(__arch_copy_to_user)
        uaccess_enable_not_uao x3, x4, x5
        add     end, x0, x2
#include "copy_template.S"                                [1]
        uaccess_disable_not_uao x3, x4
        mov     x0, #0
        ret
ENDPROC(__arch_copy_to_user)

Functions uaccess_enable_not_uao and uaccess_disable_not_uao are simply no-ops when UAO is supported by the processor:

static inline void uaccess_disable_not_uao(void)
{
        __uaccess_disable(ARM64_ALT_PAN_NOT_UAO);
}

static inline void uaccess_enable_not_uao(void)
{
        __uaccess_enable(ARM64_ALT_PAN_NOT_UAO);
}

#define __uaccess_disable(alt)                                          \
do {                                                                    \
        if (!uaccess_ttbr0_disable())                                   \
-->             asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), alt,          \
                                CONFIG_ARM64_PAN));                     \
} while (0)

#define __uaccess_enable(alt)                                           \
do {                                                                    \
        if (!uaccess_ttbr0_enable())                                    \
-->             asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), alt,          \
                                CONFIG_ARM64_PAN));                     \
} while (0)

The above macros ensure that PAN is disabled and re-enabled around user-space data accesses only when there's no UAO support, i.e., when the ARM64_ALT_PAN_NOT_UAO capability applies. Otherwise, the whole block becomes a nop.
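The patched-in SET_PSTATE_PAN instruction is simply the MSR-immediate form of the PSTATE.PAN toggle. Below is a hedged sketch of the pre-UAO pattern (the helper names are made up; the real kernel inlines the toggle via the ALTERNATIVE blocks above, and .arch_extension pan assumes an ARMv8.1-aware assembler):

static inline void pan_disable(void)
{
        /* clear PSTATE.PAN: EL1 accesses to user mappings are allowed */
        asm volatile(".arch_extension pan\n\tmsr pan, #0" ::: "memory");
}

static inline void pan_enable(void)
{
        /* set PSTATE.PAN: privileged accesses to user mappings fault again */
        asm volatile(".arch_extension pan\n\tmsr pan, #1" ::: "memory");
}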

The actual data copy implementation in [1] then uses macro-wrapped load and store instructions:

...
        mov     dst, dstin
        cmp     count, #16
        /*When memory length is less than 16, the accessed are not aligned.*/
        b.lo    .Ltiny15

        neg     tmp2, src
        ands    tmp2, tmp2, #15/* Bytes to reach alignment. */
        b.eq    .LSrcAligned
        sub     count, count, tmp2
        /*
        * Copy the leading memory data from src to dst in an increasing
        * address order.By this way,the risk of overwritting the source
        * memory data is eliminated when the distance between src and
        * dst is less than 16. The memory accesses here are alignment.
        */
        tbz     tmp2, #0, 1f
-->     ldrb1   tmp1w, src, #1
-->     strb1   tmp1w, dst, #1
1:
        tbz     tmp2, #1, 2f
-->     ldrh1   tmp1w, src, #2
-->     strh1   tmp1w, dst, #2
2:
        tbz     tmp2, #2, 3f
...

These macro load and store instructions behave differently depending on the UAO status:

       .macro ldrb1 ptr, regB, val
        ldrb  \ptr, [\regB], \val
        .endm

        .macro strb1 ptr, regB, val
-->     uao_user_alternative 9998f, strb, sttrb, \ptr, \regB, \val
        .endm

        .macro ldrh1 ptr, regB, val
        ldrh  \ptr, [\regB], \val
        .endm

        .macro strh1 ptr, regB, val
-->     uao_user_alternative 9998f, strh, sttrh, \ptr, \regB, \val
        .endm

        .macro ldr1 ptr, regB, val
        ldr \ptr, [\regB], \val
        .endm

        .macro str1 ptr, regB, val
-->     uao_user_alternative 9998f, str, sttr, \ptr, \regB, \val
        .endm
...

For example, in the case of copy_to_user, the store macros above use uao_user_alternative to replace privileged str* instructions with unprivileged sttr* instructions [2] when CONFIG_ARM64_UAO is enabled and the processor supports UAO:

        .macro uao_user_alternative l, inst, alt_inst, reg, addr, post_inc
                alternative_if_not ARM64_HAS_UAO
8888:                   \inst   \reg, [\addr], \post_inc;
                        nop;
                alternative_else
                        \alt_inst       \reg, [\addr];                  [2]
                        add             \addr, \addr, \post_inc;
                alternative_endif

                .section __ex_table,"a";
                .align  3;
                .quad   8888b,\l;
                .previous
        .endm
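The trailing __ex_table entry pairs the address of the potentially faulting privileged store (local label 8888) with the fixup label passed in as \l, so an invalid user-space access is redirected to fixup code instead of crashing the kernel. With 64-bit absolute entries, matching the .quad pair above, each entry corresponds to the classic layout:

struct exception_table_entry {
        unsigned long insn;     /* address of the potentially faulting insn */
        unsigned long fixup;    /* address to branch to if it faults */
};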

Enabling UAO

The key implementation detail is that unprivileged load/store instructions (unlike their privileged counterparts) don't generate faults when accessing user-space memory from EL1, even if PAN is enabled and supported by the processor. This means that on systems with UAO support there's no need to disable and re-enable PAN around user-space accesses in the accessor functions - PAN is always enabled.

The UAO bit then allows control over how these unprivileged load/store instructions operate at EL1. If the UAO bit is 0 (disabled), ldtr*/sttr* don't generate faults on user-space accesses, as stated above. However, when this bit is set, ldtr*/sttr* instructions behave as the equivalent privileged ldr*/str* instructions.
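Like PAN, the UAO bit is toggled with an MSR-immediate instruction. A hedged sketch (the helper names are made up, and the .arch directive assumes an ARMv8.2-aware assembler; the kernel instead emits the raw instruction encoding via SET_PSTATE_UAO):

static inline void uao_set(void)
{
        /* unprivileged loads/stores now behave as privileged ones */
        asm volatile(".arch armv8.2-a\n\tmsr uao, #1" ::: "memory");
}

static inline void uao_clear(void)
{
        /* unprivileged loads/stores are checked against EL0 permissions */
        asm volatile(".arch armv8.2-a\n\tmsr uao, #0" ::: "memory");
}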

In order to catch invalid user-space accesses in the kernel user-space accessor functions, the UAO bit is enabled when the addr_limit of the current task is set to KERNEL_DS (i.e., (unsigned long)-1):

void uao_thread_switch(struct task_struct *next)
{
        if (IS_ENABLED(CONFIG_ARM64_UAO)) {
                if (task_thread_info(next)->addr_limit == KERNEL_DS)
                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(1), ARM64_HAS_UAO));
                else
                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0), ARM64_HAS_UAO));
        }
}

The above function is called from __switch_to [3] which is executed on every context switch:

struct task_struct *__switch_to(struct task_struct *prev,
                                struct task_struct *next)
{
        struct task_struct *last;

        fpsimd_thread_switch(next);
        tls_thread_switch(next);
        hw_breakpoint_thread_switch(next);
        contextidr_thread_switch(next);
#ifdef CONFIG_THREAD_INFO_IN_TASK
        entry_task_switch(next);
#endif
        uao_thread_switch(next);                                        [3]

        /*
         * Complete any pending TLB or cache maintenance on this CPU in case
         * the thread migrates to a different CPU.
         */
        dsb(ish);

        /* the actual thread switch */
        last = cpu_switch_to(prev, next);

        return last;
}
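The UAO state is thus per-task and re-applied on every context switch. Note that the KERNEL_DS case is also hit by legitimate kernel code: the long-standing set_fs() idiom runs with addr_limit == KERNEL_DS so that the user accessors accept kernel pointers. A hedged sketch of that idiom (read_into_kernel_buf is a made-up name):

static ssize_t read_into_kernel_buf(struct file *file, void *kbuf,
                                    size_t len, loff_t *pos)
{
        mm_segment_t old_fs = get_fs();
        ssize_t ret;

        set_fs(KERNEL_DS);      /* addr_limit = (unsigned long)-1 */
        ret = vfs_read(file, (char __user *)kbuf, len, pos);
        set_fs(old_fs);
        return ret;
}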

addr_limit overwrite

Prior to UAO, a common exploitation technique was overwriting the addr_limit of the current task and then using something like a pipe to achieve arbitrary kernel read/write:

int pipefds[2];
pipe(pipefds);

// arbitrary kernel read 8 bytes
unsigned long val;
write(pipefds[1], 0xffffff80..., 8);
read(pipefds[0], &val, 8);

// arbitrary kernel write 8 bytes
val = some_val; 
write(pipefds[1], &val, 8);
read(pipefds[0], 0xffffff80..., 8);

When UAO is enabled, this specific exploitation technique is no longer usable. For example, in the case of the arbitrary read above, the data passes through a temporary pipe buffer in the kernel (default size is 16 pages) in two steps (sketched in code after the list):

  1. The value stored at some kernel address 0xffffff80... (8 bytes) is written to this temporary buffer via copy_from_user(kern_tmp_buf, 0xffffff80..., 8).
  2. The value from the temporary buffer is written back to user space using copy_to_user(&val, kern_tmp_buf, 8).
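A hedged sketch of those two steps (greatly simplified; pipe_tmp_buf, pipe_write_sketch and pipe_read_sketch are made-up names, not the kernel's pipe implementation). The point is that both directions funnel through the standard user accessors, which is exactly where UAO bites:

struct pipe_tmp_buf {
        char data[4096];        /* one page of the temporary pipe buffer */
};

/* step 1: pipe write - "user" pointer (really a kernel address once
   addr_limit == KERNEL_DS) -> temporary kernel buffer */
static ssize_t pipe_write_sketch(struct pipe_tmp_buf *buf,
                                 const void __user *src, size_t n)
{
        return copy_from_user(buf->data, src, n) ? -EFAULT : n;
}

/* step 2: pipe read - temporary kernel buffer -> destination pointer */
static ssize_t pipe_read_sketch(struct pipe_tmp_buf *buf,
                                void __user *dst, size_t n)
{
        return copy_to_user(dst, buf->data, n) ? -EFAULT : n;
}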

With UAO enabled, the first step above succeeds; the second step, however, fails. When the UAO bit is set, the aforementioned unprivileged load and store instructions behave as privileged ones, and since PAN stays enabled, they trigger faults on user-space accesses - here, the sttr* store to &val performed by copy_to_user(). Similarly, the arbitrary kernel write would fail when accessing val in user space.

UAO makes it possible to detect invalid user-space accesses while running with addr_limit == KERNEL_DS. As a side effect, it also presents an effective mitigation against the addr_limit overwrite technique. You might be thinking: what if I set addr_limit to something other than -1, e.g., -2 (since the check in uao_thread_switch() is specifically for -1)? Well, in that case you would be using unprivileged ldtr*/sttr* instructions when executing the copy_to/from_user functions. When executed at EL1 with the UAO bit unset, these instructions behave as if executed at EL0. If we take the arbitrary read example above, we're doing copy_from_user() when writing to the pipe - an ldtr* instruction is used to fetch data from the provided kernel address 0xffffff80.... Since it's a kernel address, the ldtr* access faults, the __ex_table fixup shown earlier handles the fault, and the copy bails out without loading any data or crashing the kernel.
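Continuing the user-space example, and assuming addr_limit was overwritten with -2 (kernel_addr below is a hypothetical variable holding the target kernel address), the expected failure mode is an error return rather than a crash:

/* hedged sketch: expected behaviour with addr_limit == -2 under UAO */
ssize_t n = write(pipefds[1], (void *)kernel_addr, 8);
if (n < 0)
        perror("write");        /* expected: EFAULT - the ldtr* access
                                   faulted and copy_from_user() bailed */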

What's the easiest UAO bypass you can think of? Assuming you can overwrite addr_limit in the first place, there might be some ways to bypass it ;)