This post describes UAO (User Access Override) implementation details in the Linux/Android kernel and demonstrates its effectiveness against the common addr_limit
overwrite kernel exploitation technique.
Most recent Android devices on the market are based on the ARMv8.2 architecture extension, which introduced the UAO feature. UAO allows unprivileged load and store (ldtr*/sttr*) instructions to behave as standard privileged ldr*/str* instructions.
On systems without UAO support, standard privileged ldr*/str* instructions are used when accessing user-space addresses:
Dump of assembler code for function __arch_copy_to_user:
    ...
    0xffffffc000405ea0 <+32>:   sub   x2, x2, x4
    0xffffffc000405ea4 <+36>:   tbz   w4, #0, 0xffffffc000405eb4 <__arch_copy_to_user+52>
--> 0xffffffc000405ea8 <+40>:   ldrb  w3, [x1],#1
--> 0xffffffc000405eac <+44>:   strb  w3, [x6],#1
    0xffffffc000405eb0 <+48>:   nop
    0xffffffc000405eb4 <+52>:   tbz   w4, #1, 0xffffffc000405ec4 <__arch_copy_to_user+68>
--> 0xffffffc000405eb8 <+56>:   ldrh  w3, [x1],#2
--> 0xffffffc000405ebc <+60>:   strh  w3, [x6],#2
    0xffffffc000405ec0 <+64>:   nop
    0xffffffc000405ec4 <+68>:   tbz   w4, #2, 0xffffffc000405ed4 <__arch_copy_to_user+84>
--> 0xffffffc000405ec8 <+72>:   ldr   w3, [x1],#4
--> 0xffffffc000405ecc <+76>:   str   w3, [x6],#4
    0xffffffc000405ed0 <+80>:   nop
    0xffffffc000405ed4 <+84>:   tbz   w4, #3, 0xffffffc000405ee4 <__arch_copy_to_user+100>
--> 0xffffffc000405ed8 <+88>:   ldr   x3, [x1],#8
--> 0xffffffc000405edc <+92>:   str   x3, [x6],#8
    ...
If PAN (Privileged Access Never) is enabled, these privileged instructions fault when reading from or writing to user-space addresses. Hence, the standard procedure in all user-space accessor functions (copy_to/from_user, get/put_user, memdup_user, etc.) is to temporarily disable PAN (specifically PSTATE.PAN), perform the user-space data access and then re-enable PAN.
ENTRY(__arch_copy_to_user)
        uaccess_enable_not_uao x3, x4, x5
        add     end, x0, x2
#include "copy_template.S"                      [1]
        uaccess_disable_not_uao x3, x4
        mov     x0, #0
        ret
ENDPROC(__arch_copy_to_user)
Functions uaccess_enable_not_uao and uaccess_disable_not_uao are simply no-ops when UAO is supported by the processor:
static inline void uaccess_disable_not_uao(void)
{
        __uaccess_disable(ARM64_ALT_PAN_NOT_UAO);
}

static inline void uaccess_enable_not_uao(void)
{
        __uaccess_enable(ARM64_ALT_PAN_NOT_UAO);
}

#define __uaccess_disable(alt)                                          \
do {                                                                    \
        if (!uaccess_ttbr0_disable())                                   \
-->             asm(ALTERNATIVE("nop", SET_PSTATE_PAN(1), alt,          \
                                CONFIG_ARM64_PAN));                     \
} while (0)

#define __uaccess_enable(alt)                                           \
do {                                                                    \
        if (!uaccess_ttbr0_enable())                                    \
-->             asm(ALTERNATIVE("nop", SET_PSTATE_PAN(0), alt,          \
                                CONFIG_ARM64_PAN));                     \
} while (0)
The above macros ensure that PAN is disabled and re-enabled around user-space data accesses only when there's no UAO support, i.e., when ARM64_ALT_PAN_NOT_UAO is specified. Otherwise, the whole block becomes a nop.
The actual data copy implementation in [1] then uses overloaded/macro load and store instructions:
        ...
        mov     dst, dstin
        cmp     count, #16
        /*When memory length is less than 16, the accessed are not aligned.*/
        b.lo    .Ltiny15

        neg     tmp2, src
        ands    tmp2, tmp2, #15/* Bytes to reach alignment. */
        b.eq    .LSrcAligned
        sub     count, count, tmp2
        /*
        * Copy the leading memory data from src to dst in an increasing
        * address order.By this way,the risk of overwritting the source
        * memory data is eliminated when the distance between src and
        * dst is less than 16. The memory accesses here are alignment.
        */
        tbz     tmp2, #0, 1f
-->     ldrb1   tmp1w, src, #1
-->     strb1   tmp1w, dst, #1
1:
        tbz     tmp2, #1, 2f
-->     ldrh1   tmp1w, src, #2
-->     strh1   tmp1w, dst, #2
2:
        tbz     tmp2, #2, 3f
        ...
These macro load and store instructions behave differently depending on the UAO status:
        .macro ldrb1 ptr, regB, val
        ldrb  \ptr, [\regB], \val
        .endm

        .macro strb1 ptr, regB, val
-->     uao_user_alternative 9998f, strb, sttrb, \ptr, \regB, \val
        .endm

        .macro ldrh1 ptr, regB, val
        ldrh  \ptr, [\regB], \val
        .endm

        .macro strh1 ptr, regB, val
-->     uao_user_alternative 9998f, strh, sttrh, \ptr, \regB, \val
        .endm

        .macro ldr1 ptr, regB, val
        ldr \ptr, [\regB], \val
        .endm

        .macro str1 ptr, regB, val
-->     uao_user_alternative 9998f, str, sttr, \ptr, \regB, \val
        .endm
        ...
For example, in the case of copy_to_user, the store instructions above use the uao_user_alternative macro to replace privileged str* instructions with unprivileged sttr* instructions [2] when UAO is enabled and supported by the processor:
        .macro uao_user_alternative l, inst, alt_inst, reg, addr, post_inc
                alternative_if_not ARM64_HAS_UAO
8888:                   \inst           \reg, [\addr], \post_inc;
                        nop;
                alternative_else
                        \alt_inst       \reg, [\addr];                  [2]
                        add             \addr, \addr, \post_inc;
                alternative_endif

                .section __ex_table,"a";
                .align  3;
                .quad   8888b,\l;
                .previous
        .endm
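The two patching decisions described so far (whether PAN is toggled around the copy, and whether the stores become sttr*) can be summarized in a few lines of C. This is only a sketch of the decision logic, not kernel code: the struct and function names are made up for illustration, it assumes ARM64_ALT_PAN_NOT_UAO means "PAN present and UAO absent", and it ignores the uaccess_ttbr0_* software path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary type for illustration; not a kernel structure. */
struct uaccess_plan {
    bool toggles_pan; /* uaccess_{enable,disable}_not_uao do real work */
    bool uses_sttr;   /* uao_user_alternative selects sttr* over str* */
};

/*
 * Model of the patching decisions for copy_to_user.
 * Assumption: ARM64_ALT_PAN_NOT_UAO == "CPU has PAN and does not have UAO".
 */
struct uaccess_plan plan_copy_to_user(bool cpu_has_pan, bool cpu_has_uao)
{
    struct uaccess_plan plan;

    /* PAN is only toggled when the privileged str* path is in use. */
    plan.toggles_pan = cpu_has_pan && !cpu_has_uao;
    /* With UAO available, the unprivileged store replaces str*. */
    plan.uses_sttr = cpu_has_uao;
    return plan;
}
```

For a CPU with both PAN and UAO, plan_copy_to_user(true, true) reports no PAN toggling and sttr* stores, which matches the no-op behavior of the uaccess_*_not_uao helpers described above.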
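The key implementation detail is that unprivileged ldtr*/sttr* instructions executed at EL1 are treated as if they were executed at EL0, so they don't generate faults on user-space accesses even when PAN is enabled. The UAO bit then controls how these unprivileged load/store instructions operate in EL1. If the UAO bit is 0 (disabled), ldtr*/sttr* don't generate faults on user-space accesses, as stated above. However, when this bit is set, ldtr*/sttr* instructions behave as the equivalent privileged ldr*/str* instructions.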
In order to catch invalid user-space accesses in the kernel user-space accessor functions, the UAO bit is enabled when the addr_limit of the current task is set to KERNEL_DS (i.e., (unsigned long)-1):
void uao_thread_switch(struct task_struct *next)
{
        if (IS_ENABLED(CONFIG_ARM64_UAO)) {
                if (task_thread_info(next)->addr_limit == KERNEL_DS)
                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(1), ARM64_HAS_UAO));
                else
                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0), ARM64_HAS_UAO));
        }
}
The above function is called from __switch_to [3] which is executed on every context switch:
struct task_struct *__switch_to(struct task_struct *prev,
                                struct task_struct *next)
{
        struct task_struct *last;

        fpsimd_thread_switch(next);
        tls_thread_switch(next);
        hw_breakpoint_thread_switch(next);
        contextidr_thread_switch(next);
#ifdef CONFIG_THREAD_INFO_IN_TASK
        entry_task_switch(next);
#endif
        uao_thread_switch(next);                        [3]

        /*
         * Complete any pending TLB or cache maintenance on this CPU in case
         * the thread migrates to a different CPU.
         */
        dsb(ish);

        /* the actual thread switch */
        last = cpu_switch_to(prev, next);

        return last;
}
Prior to UAO, a common exploitation technique was overwriting the addr_limit of the current task and then using something like a pipe to achieve arbitrary kernel read/write:
int pipefds[2];
pipe(pipefds);

// arbitrary kernel read 8 bytes
unsigned long val;
write(pipefds[1], 0xffffff80..., 8);
read(pipefds[0], &val, 8);

// arbitrary kernel write 8 bytes
val = some_val;
write(pipefds[1], &val, 8);
read(pipefds[0], 0xffffff80..., 8);
When UAO is enabled, this specific exploitation technique is no longer usable. For example, in case of the arbitrary read above, there's a temporary pipe buffer in the kernel (default size is 16 pages):
- The value stored at some kernel address 0xffffff80... (8 bytes) is written to this temporary buffer via copy_from_user(kern_tmp_buf, 0xffffff80..., 8).
- The value from the temporary buffer is written back to user space using copy_to_user(&val, kern_tmp_buf, 8).
With UAO enabled, the first step above succeeds; however, the second step fails. When the UAO bit is set, the aforementioned unprivileged load and store instructions behave as privileged ones, so with PAN enabled they fault on user-space accesses. Similarly, the arbitrary kernel write would fail on accessing val in user space.
UAO makes it possible to detect invalid user-space accesses while running with addr_limit == KERNEL_DS. As a side effect, it is also an effective exploitation mitigation against addr_limit overwrite. You might be thinking: what if I set addr_limit to something other than -1, e.g., -2 (since the check in uao_thread_switch() is specifically for -1)? Well, in that case you would be using unprivileged ldtr*/sttr* instructions when executing the copy_to/from_user functions. When executed at EL1 (with the UAO bit unset), these instructions behave as if executed at EL0. If we take our arbitrary read example above, we're doing copy_from_user() when writing to the pipe, so an ldtr* instruction is used to fetch data from the provided kernel address 0xffffff80.... Since it's a kernel address, ldtr* just fails without loading any data: the resulting fault is caught by the exception table fixup, so the copy fails cleanly instead of crashing the kernel.
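The effect of different addr_limit values on the first step of the arbitrary read can be sketched as follows. Everything here is a simplified model with made-up names: the range check only approximates what access_ok() does, MODEL_USER_LIMIT and MODEL_KADDR are arbitrary illustrative values (not the address elided above), and the UAO bit is assumed to already reflect addr_limit as uao_thread_switch() would set it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_KERNEL_DS  UINT64_MAX                   /* (unsigned long)-1 */
#define MODEL_USER_LIMIT ((UINT64_C(1) << 39) - 1)    /* hypothetical user limit */
#define MODEL_KADDR      UINT64_C(0xffffff8000001000) /* hypothetical kernel address */

enum read_outcome { READ_REJECTED, READ_FAULTS, READ_SUCCEEDS };

/* Model of copy_from_user() reading `size` bytes at `addr`. */
enum read_outcome model_copy_from_addr(uint64_t addr_limit,
                                       uint64_t addr, uint64_t size)
{
    uint64_t last;
    bool uao;

    assert(size > 0);
    last = addr + size - 1;

    /* Simplified range check in the spirit of access_ok(). */
    if (last < addr || last > addr_limit)
        return READ_REJECTED;

    /* uao_thread_switch() sets UAO only for addr_limit == KERNEL_DS. */
    uao = (addr_limit == MODEL_KERNEL_DS);

    /* With UAO clear, ldtr* is checked like an EL0 access. */
    if (!uao && addr > MODEL_USER_LIMIT)
        return READ_FAULTS;

    return READ_SUCCEEDS;
}
```

In this model an addr_limit of -2 (MODEL_KERNEL_DS - 1) passes the range check but yields READ_FAULTS for MODEL_KADDR, whereas an ordinary user limit never gets past the range check in the first place.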
What's the easiest UAO bypass you can think of? Assuming you can overwrite addr_limit in the first place, there might be some ways to bypass it ;)