                            ==Phrack Inc.==

              Volume 0x10, Issue 0x46, Phile #0x0a of 0x0f

|=-----------------------------------------------------------------------=|
|=----------------------=[ Hypervisor Necromancy; ]=---------------------=|
|=----------------=[ Reanimating Kernel Protectors, or ]=----------------=|
|=-----------------------------------------------------------------------=|
|=--------=[ On emulating hypervisors; a Samsung RKP case study ]=-------=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ Aris Thallas ]=--------------------------=|
|=--------------------=[ athallas.phrack@gmail.com ]=--------------------=|
|=-----------------------------------------------------------------------=|

--[ Table of Contents

0 - Introduction
1 - Overview
    1.1 - ARM Architecture & Virtualization Extensions
    1.2 - Samsung Hypervisor
    1.3 - Workspace Environment
2 - Framework Implementation & RKP Analysis
    2.1 - System Bootstrap
        2.1.1 - EL1
    2.2 - EL2 Bootstrap
        2.2.1 - Stage 2 translation & Concatenated tables
        2.2.2 - EL2 bootstrap termination and EL1 physical address
    2.3 - RKP Initialization Functions
        2.3.1 - RKP Exception Handlers
        2.3.2 - RKP Initialization
        2.3.3 - RKP Deferred Initialization
        2.3.4 - Miscellaneous Initializations
    2.4 - Final Notes
3 - Fuzzing
    3.1 - Dummy fuzzer
        3.1.1 - Handling Aborts
        3.1.2 - Handling Hangs
    3.2 - AFL with QEMU full system emulation
        3.2.1 - Introduction
        3.2.2 - Implementation
            3.2.2.1 - QEMU patches
            3.2.2.2 - Framework support
            3.2.2.3 - Handling parent translations
            3.2.2.4 - Handling hangs and aborts
            3.2.2.5 - Demonstration
    3.3 - Final Comments
4 - Conclusions
5 - Thanks
6 - References
7 - Source code

--[ 0 - Introduction

Until recently, attackers looking to compromise an entire system at
runtime found and exploited kernel vulnerabilities.
This allowed them to perform a variety of actions: executing malicious
code in the context of the kernel, modifying kernel data structures to
elevate privileges, accessing protected data, etc. Various mitigations
have been introduced to protect against such actions, and hypervisors
have also been utilized towards this goal, apart from their traditional
usage for virtualization support. In the Android ecosystem this has been
facilitated by the ARM virtualization extensions, which allow
vendors/OEMs to implement their own protection functionality/logic.

On the other hand, Android devices have universally been a major PITA to
debug, due to the large diversity of OEMs and vendors that introduce
endless customizations, the lack of public tools, debug interfaces, etc.
To the author's understanding, setting up a proper debug environment is
usually one of the most important and time consuming tasks, and can make
a world of difference in understanding the system or application under
examination in depth (especially true if no source code is available),
identifying 0day vulnerabilities and exploiting them.

In this (rather long) article we will be investigating methods to
emulate proprietary hypervisors under QEMU, which allows researchers to
interact with them in a controlled manner and debug them. Specifically,
we will be presenting a minimal framework developed to bootstrap the
Samsung S8+ proprietary hypervisor as a demonstration, providing details
and insights on key concepts of ARM low level development and the
virtualization extensions, so that interested readers can create their
own frameworks and Actually Compile And Boot them ;). Finally, we will
be investigating fuzzing implementations under this setup.

The article is organized as follows. The first section provides
background information on ARM, Samsung hypervisors and QEMU to properly
define our development setup.
Next, we will elaborate on the framework implementation while dealing
with the various ARM virtualization and Samsung implementation nuances.
We will continue by demonstrating how to implement custom dummy fuzzers
under this setup and, finally, for more intelligent fuzzing, how to
incorporate AFL a.k.a. "NFL or something by some chap called Cameltuft"
:p

On a final note, any code snippets, memory offsets or other information
presented throughout this article refer to Samsung version
G955FXXU4CRJ5, QEMU version 4.1.0 and AFL version 2.56b.

--[ 1 - Overview

----[ 1.1 - ARM Architecture & Virtualization Extensions

As stated in "Arm Architecture Reference Manual Armv8, for Armv8-A
architecture profile - Issue E.a" (AARM), Armv8 defines a set of
Exception Levels (EL, also referred to as Execution Levels) EL0 to EL3
and two security states, Secure and Non-secure aka Normal World. The
higher the exception level, the higher the software execution privilege.
EL3 represents the highest execution/privilege level, provides support
for switching between the two security states and can access all system
resources for all ELs in both security states. EL2 provides support for
virtualization; in the latest version, Armv8.5, support for Secure World
EL2 was introduced. EL1 is the Operating System kernel EL, typically
described as _privileged_, and EL0 is the EL of userland applications,
called _unprivileged_.

   ---------------------------------------------------
   |               Secure Monitor (EL3)              |
   ---------------------------------------------------
   |  Hypervisor (EL2)*    |  Sec Hypervisor (sEL2)  |
   ---------------------------------------------------
   |       OS (EL1)        |   Trusted OS (sEL1)     |
   ---------------------------------------------------
   |  Userland App (EL0)   |    Secure App (sEL0)    |
   ---------------------------------------------------
          Normal World            Secure World

Switching between ELs is only allowed via taking an exception or
returning from one.
Taking an exception leads to a higher or the same EL, while returning
from one (via `eret`) leads to a lower or the same EL. To invoke EL1,
the `svc` (SuperVisor Call) instruction is used, which triggers a
synchronous exception that is then handled by the corresponding OS
kernel exception vector entry. Similarly, EL2 is invoked via the `hvc`
(HyperVisor Call) instruction and EL3 via the `smc` (Secure Monitor
Call) instruction. Switching between security states is only done by
EL3.

When a hypervisor is present in the system it can control various
aspects of EL1 behavior, such as trapping certain operations
traditionally handled by EL1 to the hypervisor, allowing the latter to
decide how to handle the operation. The Hypervisor Configuration
Register (HCR_EL2) is the system register that allows hypervisors to
define which of these behaviors they would like to enable.

Last but not least, a core feature of the virtualization extensions is
the Stage 2 (S2) translation. As depicted below, this feature splits the
standard translation process into two steps. First, using the EL1
translation tables (stored at Translation Table Base Registers
TTBRn_EL1), which are controlled by EL1, the Virtual Address (VA) is
translated to an Intermediate Physical Address (IPA), instead of the
Physical Address (PA) of the standard process. The IPA is then
translated to a PA by the hypervisor using the Stage 2 translation table
(stored at the Virtual Translation Table Base Register VTTBR_EL2),
which is fully controlled by EL2 and not accessible by EL1. Note that
once S2 translation is enabled, EL1 never accesses physical memory
directly and every IPA must always be translated via the S2 tables for
the actual PA access. Of course, EL2 and EL3 maintain their own Stage 1
translation tables for their code and data VAs, which perform the
traditional VA to PA mapping.
 Guest OS Memory Map        Intermediate Physical       Real Physical
 (Virtual Addresses)        Memory Map (IPA)            Memory Map

 +--------------+                                      +-----------+
 |   OS (EL1)   |  Stage 1     +-----------+           |   Flash   |
 |              +------------->|   Flash   |  Stage 2  +-----------+
 |  APP (EL0)   |  TTBRn_EL1   +-----------+---------->|    RAM    |
 +--------------+              |    RAM    | VTTBR_EL2 |           |
                               +-----------+     +---->+-----------+
 +--------------+                                |
 |  Hyp (EL2)   +--------------------------------+
 +--------------+  Stage 1, TTBR0_EL2

In this article we will be focusing on the Normal World, implementing
the EL3 and EL1 framework to bootstrap a proprietary EL2 implementation.

----[ 1.2 - Samsung Hypervisor

As part of its ecosystem Samsung implements a security platform named
Samsung Knox [01] which, among others, comprises a hypervisor
implementation called Real-Time Kernel Protection (RKP). RKP aims to
achieve various security features [02], such as the prevention of
unauthorized privileged code execution, the protection of critical
kernel data (i.e. process credentials), etc.

Previous versions of the Samsung hypervisor have been targeted before,
with [03] being the most notable example. There, the Samsung S7
hypervisor was analyzed in great detail and the article provided
valuable information. Moreover, the Samsung S8+ hypervisor is stripped
and its strings are obfuscated whereas the S7 one is not, making the
latter a valuable resource for binary diffing and string comparison.
Finally, the S8+ hypervisor under examination shares many similarities
with older models regarding the system architecture, similarities which
have slowly begun disappearing in the latest models such as the Samsung
S10. One of the most obvious differences is the location of the binary
and the bootstrap process. In sum, for the S8+ the hypervisor binary is
embedded in the kernel image and the precompiled binary can be found in
the kernel source tree under init/vmm.elf (the kernel sources are
available at [04]). The kernel is also responsible for bootstrapping and
initializing RKP. On the other hand, the S10+ hypervisor binary resides
in a separate partition, is bootstrapped by the bootloader and then
initialized by the kernel. We will provide more details in the
corresponding sections that follow.

All these reasons contributed to the selection of the S8 hypervisor as
the target binary, as they ease the analysis process, remove undesired
complexity from secondary features/functionalities and allow focusing on
the core knowledge required for our demonstration. Ultimately, though,
it was an arbitrary decision and other hypervisors could have been
selected.

----[ 1.3 - Workspace Environment

As aforementioned, the targeted Samsung version is G955FXXU4CRJ5 and the
QEMU version is 4.1.0. Both the hypervisor and our framework are 64-bit
ARM binaries. QEMU was configured to only support AArch64 targets and
built with gcc version 7.4.0, while the framework was built with
aarch64-linux-gnu-gcc version 8.3.0. For debugging purposes we used
aarch64-eabi-linux-gdb version 7.11.

$ git clone git://git.qemu-project.org/qemu.git
$ cd qemu
$ git checkout v4.1.0
$ ./configure --target-list=aarch64-softmmu --enable-debug
$ make -j8

AFL version is 2.56b and is also compiled with gcc version 7.4.0.
$ git clone https://github.com/google/afl
$ cd afl
$ git checkout v2.56b
$ make

--[ 2 - Framework Implementation & RKP Analysis

The first important thing to mention regarding the framework is that it
is compiled as an AArch64 ELF executable and treated as a kernel image,
since QEMU allows booting directly from ELF kernel images in EL3 and
handles the image loading process. This greatly simplifies the boot
process, as we are not required to implement a separate firmware binary
to handle image loading. Function `_reset()`, found in
framework/boot64.S, is the starting execution function and its physical
address is 0x80000000 (as specified in the linker script
framework/kernel.ld) instead of the default value of 0x40000000 for our
QEMU setup (the reasoning behind this is explained later when the
framework physical memory layout is discussed).

We are now ready to start executing and debugging the framework, which
is contained in the compilation output kernel.elf. We use the virt
platform, a cortex-a57 cpu with a single core, 3GB of RAM (the reason
for this size is clarified during the memory layout discussion later),
with Secure mode (EL3) and virtualization mode (EL2) enabled, and wait
for gdb to attach.

$ qemu-system-aarch64 \
    -machine virt \
    -cpu cortex-a57 \
    -smp 1 \
    -m 3G \
    -kernel kernel.elf \
    -machine gic-version=3 \
    -machine secure=true \
    -machine virtualization=true \
    -nographic \
    -S -s

$ aarch64-eabi-linux-gdb kernel.elf -q
Reading symbols from kernel.elf...done.
(gdb) target remote :1234
Remote debugging using :1234
_Reset () at boot64.S:15
15              ldr x30, =stack_top_el3
(gdb) disassemble
Dump of assembler code for function _Reset:
=> 0x0000000080000000 <+0>:     ldr  x30, 0x80040000
   0x0000000080000004 <+4>:     mov  sp, x30
   ...

The framework boot sequence is presented below. We will explain the
individual steps in the following sections. Note that we will not be
following the graph in a linear manner.
 +-------+                  +-------+                   +-------+
 |  EL3  |                  |  EL2  |                   |  EL1  |
 +-------+                  +-------+                   +-------+
     |                          .                           .
  _reset                        .                           .
     |                          .                           .
  copy_vmm                      .                           .
     |                          .                           .
  eret -----------------------------------------------> start_el1
     .                          .                           |
     .                          .                     __enable_mmu
     .                          .                           |
 handle_interrupt_el3 <------------------------ smc(CINT_VMM_INIT)
     |                          .                           .
 _vmm_init_el3                  .                           .
     |                          .                           .
  eret(0xb0101000) --------> start                          .
     .                          |                           .
 handle_interrupt_el3 <--- smc(0xc2000401)                  .
     |                          .                           .
 _reset_and_drop_el1_main       .                           .
     |                          .                           .
  eret -----------------------------------------------> _el1_main
     .                          .                           |
     .                          .                        el1_main
     .                          .                           |
     .                          .                        rkp_init
     .                          .                           |
     .                          .                        rkp_call
     .                          .                           |
     .     vmm_dispatch <----------------------------- hvc(RKP_INIT)
     .                          |                           .
     .     vmm_synchronous_handler                          .
     .                          |                           .
     .                       rkp_main                       .
     .                          |                           .
     .                  my_handle_cmd_init                  .
     .                          |                           .
     .               various init functions...              .
     .                          |                           .
     .                   rkp_paging_init                    .
     .                          |                           .
     .             process el1 page tables                  .
     .                          |                           .
     .                   eret -------------------------> el1_main
     .                          .                           |
     .                          .                         +---+
     .                          .                         |   |<--+

----[ 2.1 - System Bootstrap

The first thing to do after a reset is to define the stack pointers and
exception vectors. Since the EL2 system register values are handled by
RKP during its initialization, we will be skipping the EL2 registers to
avoid affecting RKP configurations, except for any required reserved
values as dictated by the AARM. Moreover, various available oracles,
which will be discussed later, can be examined to verify the validity of
the system configuration after initializations are complete.

Stack pointers (SP_ELn) are set to predefined regions, arbitrarily sized
8kB each. Vector tables in AArch64 comprise 16 entries of 0x80 bytes
each, must be 2kB aligned and are set in the VBAR_ELx system
configuration registers, where x denotes the EL (for details refer to
AARM section "D1.10 Exception entry" and "Bare-metal Boot Code for
ARMv8-A Processors").
 | Exception taken from EL   | Synchronous |  IRQ  |  FIQ  | SError |
 -------------------------------------------------------------------
 | Current EL (SP_EL0)       |    0x000    | 0x080 | 0x100 | 0x180  |
 | Current EL (SP_ELx, x>0)  |    0x200    | 0x280 | 0x300 | 0x380  |
 | Lower EL AArch64          |    0x400    | 0x480 | 0x500 | 0x580  |
 | Lower EL AArch32          |    0x600    | 0x680 | 0x700 | 0x780  |

In our minimal implementation we will not be enabling IRQs or FIQs.
Moreover, we will not be implementing any EL0 applications or performing
`svc` calls from our kernel; as a result, all VBAR_EL1 entries are set
to lead to system hangs (infinite loops). Similarly, for EL3 we only
expect synchronous exceptions from lower level AArch64 modes. As a
result, only the corresponding `vectors_el3` entry (+0x400) is set and
all others lead to a system hang, as with the EL1 vectors. The exception
handler saves the current processor state (general purpose and state
registers) and invokes the second stage handler. We follow the `smc`
calling convention [05], storing the function identifier in the W0
register and the arguments in registers X1-X6 (even though we only use
one argument). If the function identifier is unknown, the system hangs,
a decision of importance in the fuzzing setup.

// framework/vectors.S
.align 11
.global vectors
vectors:
    /*
     * Current EL with SP0
     */
    .align 7
    b .     /* Synchronous */
    .align 7
    b .     /* IRQ/vIRQ */
    ...

.align 11
.global vectors_el3
vectors_el3:
    ...
    /*
     * Lower EL, aarch64
     */
    .align 7
    b el3_synch_low_64
    ...

el3_synch_low_64:
    build_exception_frame
    bl handle_interrupt_el3
    cmp x0, #0
    b.eq 1f
    b .
1:
    restore_exception_frame
    eret
    ...

Processors enter EL3 after reset, and in order to drop to a lower EL we
must initialize the execution state and control registers of the desired
EL and construct a fake state in the desired EL to return to via `eret`.
Even though we will be dropping from EL3 directly to EL1, to allow the
proprietary EL2 implementation to define its own state, we still have to
set some EL2 state register values to initialize the EL1 execution
state. Failure to comply with the minimal configuration results in the
`eret` invocation having no effect on the executing exception level (at
least in QEMU); in other words, we cannot drop to lower ELs.

In detail, to drop from EL3 to EL2 we have to define the EL2 state in
the Secure Configuration Register (SCR_EL3). We set SCR_EL3.NS (bit 0)
to specify that we are in the Normal World, SCR_EL3.RW (bit 10) to
specify that EL2 is AArch64, and any required reserved bits.
Additionally, we set SCR_EL3.HCE (bit 8) to enable the `hvc` instruction
here, although this could also be performed at later steps. Next, to be
able to drop to EL1 we modify the Hypervisor Configuration Register
(HCR_EL2) to set HCR_EL2.RW (bit 31), specifying that EL1 is AArch64,
and any other required reserved bits. To be as close as possible to the
original setup we set some more bits here, such as HCR_EL2.SWIO (bit 1)
which dictates the cache invalidation behavior. These additional values
are available to us via the aforementioned oracles which will be
presented later in the article.

// framework/boot64.S
.global _reset
_reset:
    // setup EL3 stack
    ldr x30, =stack_top_el3
    mov sp, x30

    // setup EL1 stack
    ldr x30, =stack_top_el1
    msr sp_el1, x30
    ...
    // Setup exception vectors for EL1 and EL3 (EL2 is setup by vmm)
    ldr x1, =vectors
    msr vbar_el1, x1
    ldr x1, =vectors_el3
    msr vbar_el3, x1
    ...
    // Initialize EL3 register values
    ldr x0, =AARCH64_SCR_EL3_BOOT_VAL
    msr scr_el3, x0

    // Initialize required EL2 register values
    mov x0, #( AARCH64_HCR_EL2_RW )
    orr x0, x0, #( AARCH64_HCR_EL2_SWIO )
    msr hcr_el2, x0
    ...
    /*
     * DROP TO EL1
     */
    mov x0, #( AARCH64_SPSR_FROM_AARCH64 | AARCH64_SPSR_MODE_EL1 | \
               AARCH64_SPSR_SP_SEL_N)
    msr spsr_el3, x0

    // drop to function start_el1
    adr x0, start_el1
    msr elr_el3, x0
    eret

For the fake lower level state, the Exception Link Register (ELR_EL3)
holds the exception return address, therefore we set it to the desired
function (`start_el1()`). The Saved Process Status Register (SPSR_EL3)
holds the processor state (PSTATE) value before the exception, so we set
its values so that the fake exception came from EL1 (SPSR_EL3.M
bits[3:0]), using SP_EL1 (SPSR_EL3.M bit 0) and executing in AArch64
mode (SPSR_EL3.M bit 4). `eret` then takes us to `start_el1()` in EL1.
The final register related to exceptions is the Exception Syndrome
Register (ESR_ELx), which holds information regarding the nature of the
exception (syndrome information); as such it has no value to the
returning EL and can be ignored.

------[ 2.1.1 - EL1

As aforementioned, our goal is to provide a minimal setup. At the same
time, there is also the need to be as close as possible to the original
setup. Our EL1 configuration is defined with those requirements in mind;
to achieve this we used system configuration register values from both
the kernel source and the EL2 oracles that will be presented in the
following sections, but for now we can safely assume these are
arbitrarily chosen values. We will be presenting details regarding some
critical system register values, but for detailed descriptions please
refer to AARM section "D13.2 General system control registers".

start_el1:
    // initialize EL1 required register values
    ldr x0, =AARCH64_TCR_EL1_BOOT_VAL
    msr tcr_el1, x0
    ldr x0, =AARCH64_SCTLR_EL1_BOOT_VAL
    msr sctlr_el1, x0
    ...
#define AARCH64_TCR_EL1_BOOT_VAL ( \
    ( AARCH64_TCR_IPS_1TB  << AARCH64_TCR_EL1_IPS_SHIFT  ) | \
    ( AARCH64_TCR_TG1_4KB  << AARCH64_TCR_EL1_TG1_SHIFT  ) | \
    ( AARCH64_TCR_TSZ_512G << AARCH64_TCR_EL1_T1SZ_SHIFT ) | \
    ( AARCH64_TCR_TG0_4KB  << AARCH64_TCR_EL1_TG0_SHIFT  ) | \
    ( AARCH64_TCR_TSZ_512G << AARCH64_TCR_EL1_T0SZ_SHIFT ) | \
    ... \
)

As the Translation Control Register (TCR_EL1) values suggest, we use a
40-bit, 1TB sized Intermediate Physical Address space (TCR_EL1.IPS
bits[34:32]), a 4kB Translation Granule size for both TTBR0_EL1 and
TTBR1_EL1 (TCR_EL1.TG1 bits[31:30] and TCR_EL1.TG0 bits[15:14]
respectively) and a size offset of 25, which means that there is a
64-25=39 bit, or 512GB, region of input VAs for each TTBRn_EL1
(TCR_EL1.T1SZ bits[21:16] and TCR_EL1.T0SZ bits[5:0]).

By using a 4kB granule, each translation table is 4kB in size and each
entry is a 64-bit descriptor, hence 512 entries per table. So at Level 3
we have 512 entries, each pointing to a 4kB page; in other words one
Level 3 table can map a 2MB space. Similarly, Level 2 has 512 entries,
each pointing to a 2MB space, summing up to a 1GB address space, and
Level 1 entries point to 1GB spaces, summing up to a 512GB address
space. In this setup, where there are 39-bit input VAs, we do not
require a Level 0 table, as shown in the translation graph. For more
details refer to AARM section "D5.2 The VMSAv8-64 address translation
system".
 +---------+---------+---------+-----------+
 | [38:30] | [29:21] | [20:12] |  [11:0]   |  VA segmentation with
 |         |         |         |           |  4kB Translation Granule
 | Level 1 | Level 2 | Level 3 | Block off |  512GB input address space
 +---------+---------+---------+-----------+

 VA translation demonstration with 4kB Granule, 512GB Input VA Space,
 1TB IPS:

            VA
 +------+---------+---------+---------+---------+-----------+
 | [55] | [47:39] | [38:30] | [29:21] | [20:12] |  [11:0]   |
 +------+---------+---------+---------+---------+-----------+
  TTBRn  (Level 0)  Level 1   Level 2   Level 3   PA offset
  select   index     index     index     index

                Level 1 tlb      Level 2 tlb      Level 3 tlb
 +-------+    +-----------+    +-----------+    +-----------+
 | TTBRn +--->| 1GB block |    | 2MB block |    |           |
 +-------+    |   entry   |    |   entry   |    |           |
              +-----------+    +-----------+    +-----------+
              | Tbl entry +--->| Tbl entry +--->| Pg entry  +--+
              +-----------+    +-----------+    +-----------+  |
                                                               v
                     Physical Address    +--------------+-----------+
                                         |   [39:12]    |  [11:0]   |
                                         +--------------+-----------+

For Levels 1 and 2, every entry can either point to the next translation
table level (table entry) or to the actual physical address (block
entry), effectively ending the translation. The entry type is defined in
bits[1:0], where bit 0 identifies whether the descriptor is valid (1
denotes a valid descriptor) and bit 1 identifies the type, value 0 being
used for block entries and 1 for table entries. As a result, an entry
type value of 3 identifies table entries and a value of 1 block entries.
Level 1 block entries point to 1GB memory regions, with VA bits[29:0]
being used as the PA offset, and Level 2 block entries point to 2MB
regions, with bits[20:0] used as the offset. Last but not least, Level 3
translation tables can only have page entries (similar to block entries,
but with descriptor type value 3, like the table entries of the previous
levels).

  61           51                           11         2  1:0
 +------------+-----------------------------+----------+------+
 | Upper Attr |             ...             | Low Attr | Type |
 +------------+-----------------------------+----------+------+
                                           Block Entry, Stage 1
                                           Translation

 | bits | Attr      | Description                  |
 ---------------------------------------------------
 | 4:2  | AttrIndex | MAIR_EL1 index               |
 | 7:6  | AP        | Access permissions           |
 | 53   | PXN       | Privileged execute never     |
 | 54   | (U)XN     | (Unprivileged) execute never |

 | AP | EL0 Access | EL1/2/3 Access |    Block entry attributes
 -------------------------------------  for Stage 1 translation
 | 00 | None       | Read Write     |
 | 01 | Read Write | Read Write     |
 | 10 | None       | Read Only      |
 | 11 | Read Only  | Read Only      |

  61       59                                          2  1:0
 +--------+--------------------------------------------+------+
 |  Attr  |                    ...                     | Type |
 +--------+--------------------------------------------+------+
                                           Table Entry, Stage 1
                                           Translation

 | bits  | Attr | Description                |
 ---------------------------------------------
 | 59    | PXN  | Privileged execute never   |
 | 60    | U/XN | Unprivileged execute never |
 | 62:61 | AP   | Access permissions         |

 | AP | Effect in subsequent lookup levels |    Table entry attributes
 -------------------------------------------   for Stage 1 translation
 | 00 | No effect                          |
 | 01 | EL0 access not permitted           |
 | 10 | Write disabled                     |
 | 11 | Write disabled, EL0 Read disabled  |

In our setup we use 2MB regions to map the kernel and create two
mappings. Firstly, an identity mapping (VAs are equal to the PAs they
are mapped to), set to TTBR0_EL1 and used mainly when the system
transitions from not using the MMU to enabling it.
Secondly, the TTBR1_EL1 mapping, where PAs are mapped to VA_OFFSET + PA,
which means that getting the PA from a TTBR1_EL1 VA or vice versa is
simply done by subtracting or adding VA_OFFSET correspondingly. This
will be of importance during the RKP initialization.

#define VA_OFFSET 0xffffff8000000000
#define __pa(x) ((uint64_t)x - VA_OFFSET)
#define __va(x) ((uint64_t)x + VA_OFFSET)

The code to create the page tables and enable the MMU borrows heavily
from the Linux kernel implementation. We use one Level 1 entry and the
required amount of Level 2 block entries, with the two tables residing
in contiguous preallocated (defined in the linker script) physical
pages.

The Level 1 entry is created by the macro `create_table_entry`. First,
the entry index is extracted from VA bits[38:30]. The entry value is the
next level table PA ORed with the valid table entry type value. This
also implicitly defines the table entry attributes, where (U)XN is
disabled and the Access Permissions (AP) have no effect in subsequent
levels of lookup. For additional details regarding the memory attributes
and their hierarchical control over memory accesses refer to AARM
section "D5.3.3 Memory attribute fields in the VMSAv8-64 translation
table format descriptors".

A similar process is followed for Level 2, but in a loop to map all
required VAs, in the macro `create_block_map`. The entry value is the PA
we want to map ORed with the block entry attribute values defined by
AARCH64_BLOCK_DEF_FLAGS. The flag value used denotes a non-secure memory
region, (U/P)XN disabled, Normal memory as defined in the Memory
Attribute Indirection Register (MAIR_EL1) and Access Permissions (AP)
that allow Read/Write access for EL1 and no access for EL0. As with
table entries, for a detailed description refer to AARM section
"D5.3.3". Finally, MAIR_ELx serves as a table holding
information/attributes of memory regions; readers may refer to AARM
section "B2.7 Memory types and attributes" for more information.
// framework/aarch64.h
/*
 * Block default flags for initial MMU setup
 *
 * block entry
 * attr index 4
 * NS = 0
 * AP = 0 (EL0 no access, EL1 rw)
 * (U/P)XN disabled
 */
#define AARCH64_BLOCK_DEF_FLAGS ( \
    AARCH64_PGTBL_BLK_ENTRY | \
    0x4 << AARCH64_PGTBL_BLK_ENT_STAGE1_LOW_ATTR_IDX_SHIFT | \
    AARCH64_PGTBL_BLK_ENT_STAGE1_LOW_ATTR_AP_RW_ELHIGH << \
        AARCH64_PGTBL_BLK_ENT_STAGE1_LOW_ATTR_AP_SHIFT | \
    AARCH64_PGTBL_BLK_ENT_STAGE1_LOW_ATTR_SH_INN_SH << \
        AARCH64_PGTBL_BLK_ENT_STAGE1_LOW_ATTR_SH_SHIFT | \
    1 << AARCH64_PGTBL_BLK_ENT_STAGE1_LOW_ATTR_AF_SHIFT \
)

// framework/mmu.S
__enable_mmu:
    ...
    bl __create_page_tables
    isb
    mrs x0, sctlr_el1
    orr x0, x0, #(AARCH64_SCTLR_EL1_M)
    msr sctlr_el1, x0
    ...

__create_page_tables:
    mov x7, AARCH64_BLOCK_DEF_FLAGS
    ...
    // x25 = swapper_pg_dir / x20 = VA_OFFSET
    mov x0, x25
    adrp x1, _text
    add x1, x1, x20
    create_table_entry x0, x1, #(LEVEL1_4K_INDEX_SHIFT), \
        #(PGTBL_ENTRIES), x4, x5
    adrp x1, _text
    add x2, x20, x1
    adrp x3, _etext
    add x3, x3, x20
    create_block_map x0, x7, x1, x2, x3
    ...

.macro create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
    lsr \tmp1, \virt, \shift
    and \tmp1, \tmp1, \ptrs - 1                  // table entry index
    add \tmp2, \tbl, #PAGE_SIZE                  // next page table PA
    orr \tmp2, \tmp2, #AARCH64_PGTBL_TBL_ENTRY   // valid table entry
    str \tmp2, [\tbl, \tmp1, lsl #3]             // store new entry
    add \tbl, \tbl, #PAGE_SIZE                   // next level table page
.endm

.macro create_block_map, tbl, flags, phys, start, end
    lsr \phys, \phys, #LEVEL2_4K_INDEX_SHIFT
    lsr \start, \start, #LEVEL2_4K_INDEX_SHIFT
    and \start, \start, #LEVEL_4K_INDEX_MASK     // table index
    orr \phys, \flags, \phys, lsl #LEVEL2_4K_INDEX_SHIFT // table entry
    lsr \end, \end, #LEVEL2_4K_INDEX_SHIFT       // block entries counter
    and \end, \end, #LEVEL_4K_INDEX_MASK         // table end index
1:  str \phys, [\tbl, \start, lsl #3]            // store the entry
    add \start, \start, #1                       // next entry
    add \phys, \phys, #LEVEL2_4K_BLK_SIZE        // next block
    cmp \start, \end
    b.ls 1b
.endm
...
As a demonstration we perform a manual table walk for VA
0xffffff8080000000, which should be the TTBR1_EL1 VA of function
`_reset()`. The Level 1 table index (1) is 2 and the entry value is
0x8008a003, which denotes a valid table descriptor pointing to PA
0x8008a000. The Level 2 entry index (2) is 0 and the value of the entry
is 0x80000711, which denotes a block entry at physical address
0x80000000. The remaining VA bits, setting the PA offset, are zero, and
examining the resulting PA shows it is of course the start of function
`_reset()`.

Note that since we have not yet enabled the MMU (as shown in the
disassembly, this is performed in the next instructions), all memory
accesses with gdb refer to PAs, which is why we can directly examine the
page tables and the resulting PA. In our setup that would be true even
with the MMU enabled, due to the identity mapping; however, this should
not be assumed to apply to every system.

(gdb) disas
Dump of assembler code for function __enable_mmu:
   0x00000000800401a0 <+0>:     mov  x28, x30
   0x00000000800401a4 <+4>:     adrp x25, 0x80089000 // TTBR1_EL1
   0x00000000800401a8 <+8>:     adrp x26, 0x8008c000
   0x00000000800401ac <+12>:    bl   0x80040058 <__create_page_tables>
=> 0x00000000800401b0 <+16>:    isb
   0x00000000800401b4 <+20>:    mrs  x0, sctlr_el1
   0x00000000800401b8 <+24>:    orr  x0, x0, #0x1
End of assembler dump.
(gdb) p/x ((0xffffff8000000000 + 0x80000000) >> 30) & 0x1ff /* (1) */
$19 = 0x2
(gdb) x/gx ($TTBR1_EL1 + 2*8)
0x80089010:     0x000000008008a003
(gdb) p/x ((0xffffff8000000000 + 0x80000000) >> 21) & 0x1ff /* (2) */
$20 = 0x0
(gdb) x/gx 0x000000008008a000
0x8008a000:     0x0000000080000711
(gdb) x/10i 0x0000000080000000
   0x80000000 <_reset>:     ldr  x30, 0x80040000
   0x80000004 <_reset+4>:   mov  sp, x30
   0x80000008 <_reset+8>:   mrs  x0, currentel

Finally, with the MMU enabled, we are ready to enable RKP. Since the EL2
exception vector tables are not set, the only way to do that is to drop
to EL2 from EL3, as we did for EL1.
We invoke `smc` with function identifier CINT_VMM_INIT, which the EL3
interrupt handler redirects to function `_vmm_init_el3()`.

----[ 2.2 - EL2 Bootstrap

The RKP binary is embedded in our kernel image using the `incbin`
assembler directive, as shown below, and before dropping to EL2 we must
place the binary at its expected physical address. Since RKP is an ELF
file, we can easily obtain the PA and entry point, which for this
specific RKP version are 0xb0100000 and 0xb0101000 respectively. The
`copy_vmm()` function copies the binary from its position in the kernel
to the expected PA during the system initialization in function
`_reset()`.

// framework/boot64.S
...
.global _svmm
_svmm:
    .incbin "vmm-G955FXXU4CRJ5.elf"
.global _evmm
_evmm:
...

$ readelf -l vmm-G955FXXU4CRJ5.elf

Elf file type is EXEC (Executable file)
Entry point 0xb0101000
There are 2 program headers, starting at offset 64

Program Headers:
  Type   Offset             VirtAddr           PhysAddr
         FileSiz            MemSiz              Flags  Align
  LOAD   0x0000000000000000 0x00000000b0100000 0x00000000b0100000
         0x000000000003e2e0 0x000000000003e6c0  RWE    0x10000
...

At long last we are ready to drop to EL2. Similarly to dropping to EL1,
we set ELR_EL3 to the RKP entry point and SPSR_EL3 so that the fake
exception came from EL2 executing in AArch64 mode. We additionally set
X0 and X1 to the RKP start PA and reserved size. These values are
dictated by the Samsung kernel implementation and the oracles, and are
required by the EL2 implementation, as will be explained shortly.
Readers interested in the Samsung kernel implementation can refer to
kernel function `vmm_init()` at kernel/init/vmm.c, which is called
during the kernel initialization in function `start_kernel()`.
// framework/boot64.S .global _vmm_init_el3 .align 2 _vmm_init_el3: // return to vmm.elf entry (RKP_VMM_START + 0x1000) mov x0, #RKP_VMM_START add x0, x0, #0x1000 msr elr_el3, x0 mov x0, #(AARCH64_SPSR_FROM_AARCH64 | AARCH64_SPSR_MODE_EL2 | \ AARCH64_SPSR_SP_SEL_N) msr spsr_el3, x0 // these are required for the correct hypervisor setup mov x0, #RKP_VMM_START mov x1, #RKP_VMM_SIZE eret .inst 0xdeadc0de //crash for sure ENDPROC(_vmm_init_el3) One valuable source of information at this point is the Linux kernel procfs entry /proc/sec_log as it provides information about the aforementioned values during Samsung kernel `vmm_init()` invocation. This procfs entry is part of the Exynos-SnapShot debugging framework and more information can be found in the kernel source at kernel/drivers/trace/exynos-ss.c. A sample output with RKP related values is displayed below. Apart from the RKP related values we can see the kernel memory layout which will be helpful in creating our framework memory layout to satisfy the plethora of criteria introduced by RKP which will be presented later. 
RKP: rkp_reserve_mem, base:0xaf400000, size:0x600000 RKP: rkp_reserve_mem, base:0xafc00000, size:0x500000 RKP: rkp_reserve_mem, base:0xb0100000, size:0x100000 RKP: rkp_reserve_mem, base:0xb0200000, size:0x40000 RKP: rkp_reserve_mem, base:0xb0400000, size:0x7000 RKP: rkp_reserve_mem, base:0xb0407000, size:0x1000 RKP: rkp_reserve_mem, base:0xb0408000, size:0x7f8000 software IO TLB [mem 0x8f9680000-0x8f9a80000] (4MB) mapped at [ffffffc879680000-ffffffc879a7ffff] Memory: 3343540K/4136960K available (11496K kernel code, 3529K rwdata, 7424K rodata, 6360K init, 8406K bss, 637772K reserved, 155648K cma-reserved) Virtual kernel memory layout: modules : 0xffffff8000000000 - 0xffffff8008000000 ( 128 MB) vmalloc : 0xffffff8008000000 - 0xffffffbdbfff0000 ( 246 GB) .init : 0xffffff8009373000 - 0xffffff80099a9000 ( 6360 KB) .text : 0xffffff80080f4000 - 0xffffff8008c2f000 ( 11500 KB) .rodata : 0xffffff8008c2f000 - 0xffffff8009373000 ( 7440 KB) .data : 0xffffff80099a9000 - 0xffffff8009d1b5d8 ( 3530 KB) vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000 ( 8 GB maximum) 0xffffffbdc0000000 - 0xffffffbde2000000 ( 544 MB actual) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1 RKP: vmm_reserved .base=ffffffc030100000 .size=1048576 .bss=ffffffc03013e2e0 .bss_size=992 .text_head=ffffffc030101000 .text_head_size=192 RKP: vmm_kimage .base=ffffff8009375a10 .size=255184 RKP: vmm_start=b0100000, vmm_size=1048576 RKP: entry point=00000000b0101000 RKP: status=0 in rkp_init, swapper_pg_dir : ffffff800a554000 The entry point eventually leads to RKP function `vmm_main()` (0xb0101818). The function initially checks whether RKP has already been initialized (3) and if true it returns, or else proceeds with the initialization and sets the initialization flag. Immediately after this, `memory_init()` function (0xb0101f24) is called where a flag is set indicating that memory is active and a 0x1f000 sized buffer at 0xb0220000 is initialized to zero. 
// vmm-G955FXXU4CRJ5.elf
int64_t vmm_main(int64_t hyp_base_arg, int64_t hyp_size_arg, char **stacks)
{
    ...
    if ( !initialized_ptr ) /* (3) */
    {
        initialized_ptr = 1;
        memory_init();
        log_message("RKP_cdb5900c %sRKP_b826bc5a %s\n",
                    "Jul 11 2018", "11:19:43");
        /* various log messages and misc initializations */
        heap_init(base, size);
        stacks = memalign(8, 0x10000) + 0x2000;
        vmm_init();
        ...
        if (hyp_base_arg != 0xB0100000)
            return -1;
        ...
        set_ttbr0_el2(&_static_s1_page_tables_start___ptr);
        s1_enable();
        set_vttbr_el2(&_static_s2_page_tables_start___ptr);
        s2_enable();
    }
    ...
    return result;
}

This buffer is the RKP log, and along with the RKP debug log at 0xb0200000, which will be presented later, they comprise the EL2 oracles. Both of them are made available via procfs entry /proc/rkp_log and interested readers can check kernel/drivers/rkp/rkp_debug_log.c for more information from the kernel perspective. The RKP log is written to by the `log_message()` function (0xb0102e94) among others; an edited sample output from `vmm_main()` is shown below, with the deobfuscated strings added as comments with the help of the S7 hypervisor binary, as mentioned before.

RKP_1f22e931 0xb0100000 RKP_dd15365a 40880  // file base: %p size %s
RKP_be7bb431 0xb0100000 RKP_dd15365a 100000 // region base: %p size %s
RKP_2db69dc3 0xb0220000 RKP_dd15365a 1f000  // memory log base: %p size %s
RKP_2c60d5a7 0xb0141000 RKP_dd15365a bf000  // heap base: %p size %s

During the initialization the heap is initialized and memory is allocated for the stack, which has been temporarily set to a reserved region during compilation. Next, in `vmm_init()` (0xb0109758) two critical actions are performed. First, the EL2 exception vector table (0xb010b800) is set in VBAR_EL2, enabling us to invoke RKP from EL1 via `hvc`.
Finally, HCR_EL2.TVM (bit 26) is set, trapping EL1 writes to the virtual memory control registers (SCTLR_EL1, TTBRn_EL1, TCR_EL1, etc.) to EL2 with Exception Class (ESR_EL2.EC bits [31:26]) value 0x18 (more on this while discussing the EL2 synchronous exception handler). At this point we clarify one of the aforementioned constraints; that of the RKP bootstrap arguments. The RKP PA is compared at this point with the hardcoded value 0xb0100000 and if there is a mismatch the bootstrap process terminates and -1 is returned denoting failure. Furthermore, the PA is stored and used later during the paging initialization, also discussed later. If the RKP PA check is satisfied, the final bootstrap steps comprise enabling the MMU and the memory translations. First, EL2 Stage 1 translations are enabled. TTBR0_EL2 is set to the predefined static tables at 0xb011a000 and the `s1_enable()` function (0xb0103dcc) is called. First, MAIR_EL2 is set to define two memory attributes (one for normal memory and one for device memory). Next, TCR_EL2 is ORed with 0x23518, which defines a 40-bit or 1TB physical address size (TCR_EL2.PS bits[18:16]), a 4kB granule size (TCR_EL2.TG0 bits[15:14]) and a size offset of 24 (TCR_EL2.T0SZ bits[5:0]), which corresponds to a 64-24=40 bit or 1TB input address space for TTBR0_EL2. To conclude `s1_enable()`, SCTLR_EL2 is set, with the important values being SCTLR_EL2.WXN (bit 19), which enables the behavior where write permission implies XN, and SCTLR_EL2.M (bit 0), which enables the MMU. Last but not least, Stage 2 translation is enabled. VTTBR_EL2, which holds the Stage 2 translation tables, is set to the predefined static tables at 0xb012a000. Next, the Virtualization Translation Control Register (VTCR_EL2) is set, which, as the name dictates, controls the Stage 2 translation process similarly to TCR_ELx for Stage 1 translations.
Its value defines a 40-bit or 1TB physical address size (VTCR_EL2.PS bits[18:16]), a 4kB granule size (VTCR_EL2.TG0 bits[15:14]), and a size offset of 24 (VTCR_EL2.T0SZ bits[5:0]), which corresponds to a 64-24=40 bit or 1TB input address space for VTTBR_EL2. Moreover, the starting level of the Stage 2 translation, controlled by VTCR_EL2.SL0 (bits[7:6]), is set to 1, and since VTCR_EL2.TG0 is set to 4kB, Stage 2 translations start at Level 1 with concatenated tables, which will be explained in detail next. Finally, HCR_EL2.VM (bit 0) is set to enable Stage 2 translation.

------[ 2.2.1 - Stage 2 translation & Concatenated tables

As the AARM states, "for a stage 2 translation, up to 16 translation tables can be concatenated at the initial lookup level. For certain input address sizes, concatenating tables in this way means that the lookup starts at a lower level than would otherwise be the case". We are going to demonstrate this in our current setup, but for more details refer to section "D5.2.6 Overview of the VMSAv8-64 address translation stages" of the AARM. Since we have a 40-bit input address range, only bit 39 of the input IPA is used to index the translation table at Level 0 and as a result only two Level 1 tables exist. Instead of the default setup, ARM allows concatenating the two tables in contiguous physical pages and starting the translation at Level 1. To index the Level 1 tables, IPA bits[39:30] are used instead of the traditional bits[38:30].
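To make these encodings concrete, the following sketch (plain C on the host; the helper names are our own and not RKP symbols) decodes the 0x23518 control value and computes a concatenated Level 1 index using the 10-bit IPA[39:30] field:

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for the TCR_EL2/VTCR_EL2 layout described above. */
static inline unsigned tcr_ps(uint64_t tcr)   { return (tcr >> 16) & 0x7; }
static inline unsigned tcr_tg0(uint64_t tcr)  { return (tcr >> 14) & 0x3; }
static inline unsigned tcr_t0sz(uint64_t tcr) { return tcr & 0x3f;        }

/* With SL0=1 and two concatenated Level 1 tables, IPA bits [39:30]
 * (10 bits instead of the usual 9) select the Level 1 entry. */
static inline uint64_t concat_l1_index(uint64_t ipa)
{
    return (ipa >> 30) & 0x3ff;
}
```

With the two Level 1 tables concatenated, an IPA with bit 39 set simply yields an index of 0x200 or above into the combined table, instead of requiring a Level 0 lookup.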
+---------+---------+---------+---------+-----------+ Default approach | 39 | [38:30] | [29:21] | [20:12] | [11:0] | Stage 2 translation | | | | | | IPA segmentation | Level 0 | Level 1 | Level 2 | Level 3 | Block off | 4kB Granule +---------+---------+---------+---------+-----------+ 40-bit IPS +-------------+---------+---------+-----------+ Concatenated Tables | [39:30] | [29:21] | [20:12] | [11:0] | IPA segmentation | | | | | 4kB Granule | Level 1 | Level 2 | Level 3 | Block off | 40-bit IPS +-------------+---------+---------+-----------+ VTCR_EL2.SL0 = 1 We have included a gdb script to dump the Stage 2 translation tables based on tools from [03] and [06]. The script reads the table PA from VTTBR_EL2 and is configured for our setup only and not the generic approach. Moreover, it needs to be called from EL2 or EL3, for which `switchel <#>` command can be used. Finally, our analysis indicates that there is a 1:1 mapping between IPAs and PAs. (gdb) switchel $cpsr = 0x5 (EL1) (gdb) switchel 2 Moving to EL2 $cpsr = 0x9 (gdb) pagewalk ################################################ # Dump Second Stage Translation Tables # ################################################ PA Size: 40-bits Starting Level: 1 IPA range: 0x000000ffffffffff Page Size: 4KB ... Third level: 0x1c07d000-0x1c07e000: S2AP=11, XN=10 Third level: 0x1c07e000-0x1c07f000: S2AP=11, XN=10 ... second level block: 0xbfc00000-0xbfe00000: S2AP=11, XN=0 second level block: 0xbfe00000-0xc0000000: S2AP=11, XN=0 first level block: 0xc0000000-0x100000000: S2AP=11, XN=0 first level block: 0x880000000-0x8c0000000: S2AP=11, XN=0 ... (gdb) switchel 1 Moving to EL1 $cpsr = 0x5 (EL1) ------[ 2.2.2 - EL2 bootstrap termination and EL1 physical address Now that the hypervisor is setup we can resume with the framework setup. The bootstrap process terminates via an `smc` command thus returning to EL3. X0 holds the special value 0xc2000401 and X1 the return value of the operation (zero denoting success). 
If the bootstrap process fails, `handle_interrupt_el3()` fails (5) and the system hangs (4).

// framework/vectors.S
el3_synch_low_64:
    build_exception_frame
    bl      handle_interrupt_el3
    cmp     x0, #0              /* (4) */
    b.eq    1f
    b       .
1:
    restore_exception_frame
    eret
...

// framework/interrupt-handler.c
int handle_interrupt_el3(uint64_t value, uint64_t status)
{
    int ret = 0;
    switch (value) {
    case 0xc2000401:
        // special return value from vmm initialization
        if (status == 0) {
            _reset_and_drop_el1_main();
        } else {
            ret = -1;           /* (5) */
        }
    ...
}

Careful readers might have noticed that the EL2 `smc` invocation causes a new exception frame to be stored in EL3 and in order to return to EL1 we must properly restore the state. Well, due to the framework's minimal nature no information needs to be saved before or after the EL2 bootstrap. As a result we simply reset the state (i.e. stack pointers) and drop to the EL1 function `_el1_main()` which in turn leads to `el1_main()`.

// framework/boot64.S
...
_reset_and_drop_el1_main:
    /*
     * We have initialized vmm. Jump to EL1 main since HVC is now enabled,
     * and EL1 does not require EL3 to interact with hypervisor
     */
    // setup EL3 stack
    ldr     x30, =stack_top_el3
    mov     sp, x30
    // setup EL1 stack
    ldr     x30, =stack_top_el1
    msr     sp_el1, x30
    mov     x0, #(AARCH64_SPSR_FROM_AARCH64 | AARCH64_SPSR_MODE_EL1 | \
                  AARCH64_SPSR_SP_SEL_N)
    msr     spsr_el3, x0
    // drop to function _el1_main
    adr     x0, _el1_main
    msr     elr_el3, x0
    eret                        /* (6) */
...
_el1_main:
    mov     x20, #-1
    lsl     x20, x20, #VA_BITS
    adr     x0, el1_main
    add     x0, x0, x20
    blr     x0
...

Here we explain another system constraint. Our framework was arbitrarily placed at PA 0x80000000. The reason should by now be obvious. After enabling Stage 2 translation, every EL1 IPA is translated through the Stage 2 tables to find the PA. Examining the hypervisor static maps reveals that the region starting at 0x80000000 satisfies the criteria required for lower EL execution. Specifically, the eXecute Never (XN) field is unset and there are no write permissions.
Should the kernel be placed, during the framework initialization, in a region that is unmapped or non-executable in the Stage 2 translation, then returning from EL3 to EL1 (6) results in a translation error.

(gdb) pagewalk
################################################
#     Dump Second Stage Translation Tables     #
################################################
...
Third level: 0x1c07e000-0x1c07f000: S2AP=11, XN=10
Third level: 0x1c07f000-0x1c080000: S2AP=11, XN=10
Third level: 0x80000000-0x80001000: S2AP=1, XN=0
Third level: 0x80001000-0x80002000: S2AP=1, XN=0
...

 54          51                            10         2    1:0
+------------+-----------------------------+----------+------+  Block Entry
| Upper Attr |            ....             | Low Attr | Type |  Stage 2
+------------+-----------------------------+----------+------+  Translation

| bits  | Attr      | Description        |
------------------------------------------
| 5:2   | AttrIndex | MAIR_EL2 index     |
| 7:6   | S2AP      | Access permissions |
| 54:53 | XN        | Execute never      |

Block entry attributes    | S2AP | EL1/EL0 Access |  | XN | Allow Exec  |
for Stage 2 translation   -------------------------  --------------------
                          | 00   | None           |  | 00 | EL0/EL1     |
                          | 01   | Read Only      |  | 01 | EL0 not EL1 |
                          | 10   | Write Only     |  | 10 | None        |
                          | 11   | Read Write     |  | 11 | EL1 not EL0 |

----[ 2.3 - RKP Initialization Functions

The first thing performed in `el1_main()` is to initialize RKP. There are numerous steps that comprise the RKP initialization and we will present them in the following sections. Before explaining the initialization process though, we will describe the RKP exception handlers.

------[ 2.3.1 - RKP Exception Handlers

As explained during the EL2 bootstrap, VBAR_EL2 is set at 0xb010b800, where each handler first creates the exception frame storing all general registers and then calls function `vmm_dispatch()` (0xb010aa44) with the three arguments being the offset indicating the EL from which the exception was taken, the exception type and the exception frame address respectively.
`vmm_dispatch()` is designed to only handle synchronous exceptions and simply returns otherwise. Function `vmm_synchronous_handler()` (0xb010a678) handles, as the name suggests, the synchronous exceptions, and only the exception frame (third) argument is of importance.

    stp  X1, X0, [SP,#exception_frame]!
    ...
    mov  X0, #0x400     // Lower AArch64
    mov  X1, #0         // Synchronous Exception
    mov  X2, SP         // Exception frame, holding args from EL1
    bl   vmm_dispatch
    ...
    ldp  X1, X0, [SP+0x10+exception_frame],#0x10
    clrex
    eret

As shown in the following snippet, the handler first evaluates ESR_EL2.EC. Data and Instruction Aborts from the current EL (ECs 0x21 and 0x25) are not recoverable and the handler calls the `vmm_panic()` function (0xb010a4cc), which leads to a system hang. Data and Instruction Aborts from a lower EL (ECs 0x20 and 0x24) are handled directly by the handler. Furthermore, as mentioned before, by setting HCR_EL2.TVM during the RKP bootstrap, EL1 writes to virtual memory control registers are trapped to EL2 with EC 0x18 and are handled here by function `other_msr_mrs_system()` (0xb010a24c). `hvc` commands either from AArch32 or AArch64 (ECs 0x12 and 0x16) are our main focus and will be explained shortly. Finally, any other EC returns -1, which leads `vmm_dispatch()` to `vmm_panic()`.

// vmm-G955FXXU4CRJ5.elf
int64_t vmm_synchronous_handler(int64_t from_el_offset,
                                int64_t exception_type,
                                exception_frame *exception_frame)
{
    esr_el2 = get_esr_el2();
    ...
    switch ( esr_el2 >> 26 ) /* Exception Class */
    {
    case 0x12: /* HVC from AArch32 */
    case 0x16: /* HVC from AArch64 */
        if ((exception_frame->x0 & 0xFFF00000) == 0x83800000) /* (7) */
            rkp_main(exception_frame->x0, exception_frame);
        ...
        return 0;
    case 0x18: /* Trapped MSR, MRS or System instruction execution */
        v7 = other_msr_mrs_system(exception_frame);
        ...
    case 0x20: /* Instruction Abort from a lower Exception level */
        ...
    case 0x21: /* Instruction Abort Current Exception Level */
        vmm_panic(from_el_offset, exception_type, ...);
    case 0x24: /* Data Abort from a lower Exception level */
        ...
    case 0x25: /* Data Abort Current Exception Level */
        vmm_panic(from_el_offset, exception_type, ...);
    default:
        return -1;
    }
}

Before moving to `hvc` we will briefly introduce the `msr`/`mrs` handling (for details regarding the values of ESR_EL2 discussed here refer to AARM section "D13.2.37"). First, the operation direction is checked via ESR_EL2.ISS bit 0. As mentioned, only writes are supposed to be trapped (the direction bit value must be 0) and if somehow a read was trapped, the handler ends up in `vmm_panic()`. The general purpose register used for the transfer is discovered from the value of ESR_EL2.ISS.Rt (bits [9:5]). The rest of the ESR_EL2.ISS values are used to identify the system register accessed by `msr`, and in RKP each system register is handled differently. For example, the SCTLR_EL1 handler does not allow disabling the MMU or changing the endianness, and the TCR_EL1 handler does not allow modification of the granule size. We will not be examining every case in this (already long) article, but interested readers should by now have more than enough information to start investigating function `other_msr_mrs_system()`. The first argument (X0) of an RKP `hvc` invocation is the function identifier and, as shown in (7), it must abide by a specific format for function `rkp_main()` (0xb010d000), the `hvc` handler, to be invoked. Specifically, each command is expected to have a prefix value of 0x83800000. Furthermore, to form a command, the command index is shifted left by 12 and then ORed with the prefix (readers may also refer to kernel/include/linux/rkp.h). This format is also expected by `rkp_main()` as explained next.
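The command format just described can be captured with a couple of helpers (a sketch mirroring the RKP_CMDID macro from kernel/include/linux/rkp.h and the decoding performed at the EL2 side; helper names are ours):

```c
#include <assert.h>
#include <stdint.h>

/* Encoding: command index shifted left by 12, ORed with the prefix. */
#define RKP_PREFIX      0x83800000u
#define RKP_CMDID(idx)  (((idx) << 12) | RKP_PREFIX)

/* Decoding, as done by rkp_main(): (command >> 12) & 0xff. */
static inline unsigned rkp_cmd_index(uint32_t command)
{
    return (command >> 12) & 0xff;
}

/* Prefix check, as done at (7) in vmm_synchronous_handler(). */
static inline int rkp_has_prefix(uint32_t command)
{
    return (command & 0xfff00000u) == 0x83800000u;
}
```

Encoding index 0x21 (rkp_pgd_set) yields 0x83821000, and decoding that value recovers 0x21, matching the command table presented below.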
// vmm-G955FXXU4CRJ5.elf
void rkp_main(uint64_t command, exception_frame *exception_frame)
{
    hvc_cmd = (command >> 12) & 0xFF;       /* (8) */
    if ( hvc_cmd && !is_rkp_activated )     /* (9) */
        lead_to_policy_violation(hvc_cmd);
    ...
    my_check_hvc_command(hvc_cmd);
    switch ( hvc_cmd )
    {
    case 0:
        ...
        if ( is_rkp_activated )             /* (10) */
            rkp_policy_violation(2, 0, 0, 0);
        rkp_init(exception_frame);
        ...
        break;
    ...

void my_check_hvc_command(uint64_t cmd_index)
{
    if ( cmd_index > 0x9F )
        rkp_policy_violation(3, cmd_index, 0, 0);
    prev_counter = my_cmd_counter[cmd_index];
    if ( prev_counter != 0xFF )
    {
        cur_counter = (prev_counter - 1);
        if ( cur_counter > 1 )
            rkp_policy_violation(3, cmd_index, prev_counter, 0);
        my_cmd_counter[cmd_index] = cur_counter;
    }
}

`rkp_main()` first extracts the command index (8) and then calls function `my_check_hvc_command()` (0xb0113510). Two things happen there. First, the index must not exceed 0x9f. Second, RKP maintains an array with command counters. The counter for the RKP initialization command is 1 during the array definition and is set again, along with all the other values, at runtime in function `my_initialize_hvc_cmd_counter()` (0xb011342c) during the initialization. If any of these checks fails, `rkp_policy_violation()` (0xb010dba4) is called, which can be considered an assertion error and leads to a system hang. Finally, before allowing any command invocation except for the initialization, a global flag indicating whether RKP is initialized is checked (9). This flag is obviously set after a successful initialization, as explained in the following section. Before continuing with the initialization process we will present some commands as examples to better demonstrate their usage. The first initialization function (presented next) is `rkp_init()` with command id 0, which corresponds to command 0x83800000.
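To clarify the counter semantics, below is a stand-alone C model of `my_check_hvc_command()` (our reconstruction from the decompiled code above; where the real function calls `rkp_policy_violation()` this model returns -1):

```c
#include <assert.h>
#include <stdint.h>

static uint8_t cmd_counter[0xa0];

/* Model of my_check_hvc_command(): returns 0 on success, -1 where the
 * real handler would raise a policy violation. */
static int check_hvc_command(uint64_t cmd_index)
{
    uint8_t prev, cur;

    if (cmd_index > 0x9f)
        return -1;              /* index out of range */
    prev = cmd_counter[cmd_index];
    if (prev == 0xff)
        return 0;               /* 0xff marks an unlimited command */
    cur = prev - 1;             /* 8-bit arithmetic: 0 - 1 == 0xff */
    if (cur > 1)
        return -1;              /* counter exhausted */
    cmd_counter[cmd_index] = cur;
    return 0;
}
```

Note how the unsigned 8-bit underflow (0 - 1 = 0xff) is what turns an exhausted counter into a violation, while the sentinel value 0xff marks commands that may be invoked arbitrarily often.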
During definition, as mentioned above, its counter is set to 1 so that it can be called once before invoking `my_initialize_hvc_cmd_counter()`. Similarly, command id 1 corresponds to the deferred initialization function (also presented next) and can be reached with command 0x83801000; since its counter is set to 1, it can only be called once. Commands with counter value -1, like the ones shown in the table below for handling page tables (commands 0x21 and 0x22 for levels 1 and 2 respectively), can be called an arbitrary number of times.

| Function     | ID   | Command    | Counter |
----------------------------------------------
| rkp_init     | 0x0  | 0x83800000 | 0       |
| rkp_def_init | 0x1  | 0x83801000 | 1       |
...
| rkp_pgd_set  | 0x21 | 0x83821000 | -1      |
| rkp_pmd_set  | 0x22 | 0x83822000 | -1      |
...

------[ 2.3.2 - RKP Initialization

With this information, we are now ready to initialize RKP. In the snippet below we demonstrate the framework process to initialize RKP (with RKP command id 0). We also show the `rkp_init_t` struct values used in the framework during the invocation and we will be elaborating more on them while examining the RKP initialization function `rkp_init()` (0xb0112f40). Interested readers can also study and compare the `framework_rkp_init()` function with Samsung kernel function `rkp_init()` in kernel/init/main.c, and the initialization values presented here against some of the values from the sample sec_log output above.

// framework/main.c
void el1_main(void)
{
    framework_rkp_init();
    ...
}

// framework/vmm.h
#define RKP_PREFIX        (0x83800000)
#define RKP_CMDID(CMD_ID) (((CMD_ID) << 12 ) | RKP_PREFIX)
#define RKP_INIT          RKP_CMDID(0x0)
...
// framework/vmm.c void framework_rkp_init(void) { struct rkp_init_t init; init.magic = RKP_INIT_MAGIC; init._text = (uint64_t)__va(&_text); init._etext = (uint64_t)__va(&_etext); init.rkp_pgt_bitmap = (uint64_t)&rkp_pgt_bitmap; init.rkp_dbl_bitmap = (uint64_t)&rkp_map_bitmap; init.rkp_bitmap_size = 0x20000; init.vmalloc_start = (uint64_t)__va(&_text); init.vmalloc_end = (uint64_t)__va(&_etext+0x1000); init.init_mm_pgd = (uint64_t)&swapper_pg_dir; init.id_map_pgd = (uint64_t)&id_pg_dir; init.zero_pg_addr = (uint64_t)&zero_page; init.extra_memory_addr = RKP_EXTRA_MEM_START; init.extra_memory_size = RKP_EXTRA_MEM_SIZE; init._srodata = (uint64_t)__va(&_srodata); init._erodata = (uint64_t)__va(&_erodata); rkp_call(RKP_INIT, &init, (uint64_t)VA_OFFSET, 0, 0, 0); } // framework/util.S rkp_call: hvc #0 ret ENDPROC(rkp_call) magic : 0x000000005afe0001 vmalloc_start : 0xffffff8080000000 vmalloc_end : 0xffffff8080086000 init_mm_pgd : 0x0000000080088000 id_map_pgd : 0x000000008008b000 zero_pg_addr : 0x000000008008e000 rkp_pgt_bitmap : 0x0000000080044000 rkp_dbl_bitmap : 0x0000000080064000 rkp_bitmap_size : 0x0000000000020000 _text : 0xffffff8080000000 _etext : 0xffffff8080085000 extra_mem_addr : 0x00000000af400000 extra_mem_size : 0x0000000000600000 physmap_addr : 0x0000000000000000 _srodata : 0xffffff8080085000 _erodata : 0xffffff8080086000 large_memory : 0x0000000000000000 fimc_phys_addr : 0x00000008fa080000 fimc_size : 0x0000000000780000 tramp_pgd : 0x0000000000000000 Before everything else, the debug log at 0xb0200000 is initialized (11). This is the second EL2 oracle and we will be discussing it shortly as it will provide valuable information to help create correct memory mapping for the initialization to be successful. Evidently, there are two modes of RKP operation which are decided upon during the initialization; normal and test mode. Test mode disables some of the aforementioned `hvc` command invocation counters and enables some command indices/functions. 
As the name suggests these are used for testing purposes and while they may assist and ease the reversing process, we will not be analyzing them in depth, because they are not encountered in real world setups. The mode is selected by the struct magic field, whose value can either be 0x5afe0001 (normal mode) or 0x5afe0002 (test mode). It would be possible to change to test mode via a second `rkp_init()` invocation while hoping not to break any other configurations; however, this is not possible via normal system interaction. As shown in (12), after a successful initialization the global flag `is_rkp_activated` is set. This flag is then checked (10) before calling `rkp_init()` in the `rkp_main()` function, as demonstrated in the previously presented snippet.

// vmm-G955FXXU4CRJ5.elf
void rkp_init(exception_frame *exception_frame)
{
    ...
    rkp_init_values = maybe_rkp_get_pa(exception_frame->x1);
    rkp_debug_log_init();                           /* (11) */
    ...
    if ( rkp_init_values->magic - 0x5AFE0001 <= 1 ){
        if ( rkp_init_values->magic == 0x5AFE0002 )
        {
            /* enable test mode */
        }
        /* store all rkp_init_t struct values */
        rkp_physmap_init();
        ...
        if ( rkp_bitmap_init() )
        {
            /* misc initializations and debug logs */
            rkp_debug_log("RKP_6398d0cb", hcr_el2, sctlr_el2,
                          rkp_init_values->magic);
            /* more debug logs */
            if ( rkp_paging_init() )
            {
                is_rkp_activated = 1;               /* (12) */
                ...
                my_initialize_hvc_cmd_counter();
                ...
            }
        }
        ...
    }
    ...
}

RKP maintains a struct storing all required information. During the initialization in RKP function `rkp_init()`, the values passed via the `rkp_init_t` struct, along with the VA_OFFSET, are stored there to be used later. Next, various memory regions such as physmap and the bitmaps are initialized. We are not going to expand on those regions since they are implementation specific, but due to their heavy usage by RKP (especially physmap) we are going to briefly explain them.
Physmap contains information about physical regions, such as whether a region belongs to EL2 or EL1, and is stored in a predefined EL2-only accessible region, as explained next; RKP uses this information to decide if certain actions are allowed on specific regions. Two bitmaps exist in this specific RKP implementation, rkp_pgt_bitmap and rkp_dbl_bitmap, and their physical regions are provided by the EL1 kernel. They are both written to by RKP. rkp_pgt_bitmap provides information to EL1 on whether addresses are protected by S2 mappings and as such accesses should be handled by RKP. rkp_dbl_bitmap is used to track and prevent unauthorized mappings from being used for page tables. The `rkp_bitmap_init()` success requires only the pointers to not be zero, however additional restrictions are defined later during the `rkp_paging_init()` function (0xb010e4c4). Next, we see the RKP debug log being used, dumping system registers and thus providing important information regarding the system state/configuration, which has helped us understand the system and configure the framework. Below a (processed) sample output is displayed with the various registers annotated. Finally, Samsung allows OEM unlock for the device model under examination, which allows us to patch vmm.elf, build and boot the kernel with the patched RKP and retrieve additional information. The final snippet line contains the debug log from a separate execution, where the MAIR_ELn registers were replaced with SCTLR_EL1 and VTCR_EL2 respectively. How to build a custom kernel and boot a Samsung device with it is left as an exercise to the reader.

0000000000000000 neoswbuilder-DeskTop RKP64_01aa4702
0000000000000000 Jul 11 2018
0000000000000000 11:19:42
/* hcr_el2 */   /* sctlr_el2 */
   84000003        30cd1835     5afe0001 RKP_6398d0cb
/* tcr_el2 */   /* tcr_el1 */
   80823518        32b5593519   5afe0001 RKP_64996474
/* mair_el2 */       /* mair_el1 */
   21432b2f914000ff     0000bbff440c0400 5afe0001 RKP_bd1f621f
...
/* sctlr_el1 */ /* vtcr_el2 */
   34d5591d        80023f58     5afe0001 RKP_patched

Finally, one of the most important functions in the RKP initialization follows; `rkp_paging_init()`. Numerous checks are performed in this function and the system memory layout must satisfy them all for RKP to be initialized successfully. Furthermore, physmap, the bitmaps and the EL2 Stage 1 and 2 tables are set or processed. We will explain some key points but will not go over every trivial check. Finally, we must ensure that any RKP required regions are reserved. The physical memory layout used in the framework, aiming to satisfy the minimum requirements to achieve a proper RKP initialization, is shown below. Obviously, more complex layouts can be used to implement more feature rich frameworks. The diagram also explains the previously presented size selection of 3GBs for the emulation system RAM. This size ensures that the framework has a sufficiently large PA space to position executables at their expected PAs.

+---------+ 0x80000000 text, vmalloc
|         |
|         |
|         |
|         |
+---------+ 0x80044000 rkp_pgt_bitmap
|         |
|         |
+---------+ 0x80064000 rkp_map_bitmap
|         |
|         |
+---------+ 0x80085000 _etext, srodata
|         |
+---------+ 0x80086000 _erodata, vmalloc_end
|         |
|         |
+---------+ 0x80088000 swapper_pg_dir
|         |
|         |
+---------+ 0x8008b000 id_pg_dir
|         |
|         |
+---------+ 0x8008e000 zero_page
|         |
    ...
|         |
+---------+ 0xaf400000 rkp_extra_mem_start
|         |
|         |
+---------+ 0xafa00000 rkp_extra_mem_end
|         |
+---------+ 0xafc00000 rkp_phys_map_start
|         |
|         |
+---------+ 0xb0100000 rkp_phys_map_end, hyp_base

To sum up the process, after alignment and layout checks, the EL1 kernel region is set in physmap (13) and mapped in the EL2 Stage 1 translation tables (14). The two bitmap regions are checked (15) and if they are not incorporated in the kernel text, their Stage 2 (S2) entries are changed to read-only and non-executable (16) and finally physmap is updated with the two bitmap regions.
FIMC region, which will be discussed shortly, is processed next (17) in function `my_process_fimc_region()` (0xb0112df0). Continuing, kernel text is set as RWX in S2 translation tables (18) which will change later during the initialization to read-only. Last but not least, physmap and extra memory address are unmapped from S2 (19) and (21) rendering them inaccessible from EL1 and their physmap regions are set (20) and (22). // vmm-G955FXXU4CRJ5.elf int64_t rkp_paging_init(void) { /* alignment checks */ v2 = my_rkp_physmap_set_region(text_pa, etext - text, 4); /* (13) */ if ( !v2 ) return v2; /* alignment checks */ res = s1_map(text_pa, etext_pa - text_pa, 9); /* (14) */ ... /* * bitmap alignment checks /* (15) */ * might lead to label do_not_process_bitmap_regions */ res = rkp_s2_change_range_permission(rkp_pgt_bitmap, /* (16) */ bitmap_size + rkp_pgt_bitmap, 0x80, 0, 1); // RO, XN ... res = rkp_s2_change_range_permission(rkp_map_bitmap, bitmap_size + rkp_map_bitmap, 0x80, 0, 1); // RO, XN ... do_not_process_bitmap_regions: if ( !my_rkp_physmap_set_region(rkp_pgt_bitmap, bitmap_size, 4) ) return 0; res = my_rkp_physmap_set_region(rkp_map_bitmap, bitmap_size, 4); if ( res ) { res = my_process_fimc_region(); /* (17) */ if ( res ) { res = rkp_s2_change_range_permission( /* (18) */ text_pa, etext_pa, 0, 1, 1); // RW, X ... /* (19) */ res = maybe_s2_unmap(physmap_addr, physmap_size + 0x100000); ... res = my_rkp_physmap_set_region(physmap_addr, /* (20) */ physmap_size + 0x100000, 8); ... /* (21) */ res = maybe_s2_unmap(extra_memory_addr, extra_memory_size); ... res = my_rkp_physmap_set_region(extra_memory_addr, /* (22) */ extra_memory_size, 8); ... } } return res; } FIMC refers to Samsung SoC Camera Subsystem and during the kernel initialization, regions are allocated and binaries are loaded from the disk. There is only one relevant `hvc` call, related to the FIMC binaries verification (command id 0x71). 
RKP modifies the related memory region permissions and then invokes EL3 to handle the verification in function `sub_B0101BFC()`. Since we are implementing our own EL3 and are interested in the EL2 functionality, we will be ignoring this region. However, we still reserve it for completeness reasons and function `my_process_fimc_region()` simply processes the S2 mappings for this region. By invoking `hvc` with command id 0x71, even if every other condition is met and the `smc` is reached, as discussed above EL3 will hang because there is no handler for `smc` command id 0xc200101d in our setup.

// vmm-G955FXXU4CRJ5.elf
sub_B0101BFC
    ...
    mov  X0, #0xC200101D
    mov  X1, #0xC
    mov  X2, X19    // holds info about fimc address, size, etc.
    mov  X3, #0
    dsb  SY
    smc  #0
    ...

Although, as mentioned, simply reserving the region will suffice for this specific combination of hypervisor and subsystem, it is indicative of the considerations needed when examining hypervisors, even if more complex actions are required by other hypervisors and/or subsystems. For example, the verification might have been incorporated in the initialization procedure, in which case this could be handled by our framework EL3 component. At this point we have successfully performed the first step of the RKP initialization. After some tasks, such as the `hvc` command counters initialization and the setting of the `is_rkp_activated` global flag, `rkp_init()` returns. We can now invoke other `hvc` commands.

------[ 2.3.3 - RKP Deferred Initialization

The next step is the deferred initialization, which is handled by function `rkp_def_init()` (0xb01131dc) and whose main purpose is to set the kernel S2 translation permissions.

// vmm-G955FXXU4CRJ5.elf
void rkp_def_init(void)
{
    ...
    if ( srodata_pa >= etext_pa )
    {
        if (!rkp_s2_change_range_permission(text_pa, etext_pa, 0x80, 1, 1))
            // Failed to make Kernel range ROX
            rkp_debug_log("RKP_ab1e86d9", 0, 0, 0);
    }
    else
    {
        res = rkp_s2_change_range_permission(text_pa, srodata_pa,
                                             0x80, 1, 1); // RO, X
        ...
        res = rkp_s2_change_range_permission(srodata_pa, etext_pa,
                  0x80, 0, 1); // RO, XN
        ...
    }
    rkp_l1pgt_process_table(swapper_pg_dir, 1, 1);
    RKP_DISALLOW_DEBUG = 1;
    rkp_debug_log("RKP_8bf62beb", 0, 0, 0);
}

As demonstrated below, after the `rkp_s2_change_range_permission()`
invocation the kernel region is set to read-only. Finally, in
`rkp_l1pgt_process_table()`, swapper_pg_dir (TTBR1_EL1) and its subtables
are set to read-only and non-executable.

// EL1 text before rkp_s2_change_range_permission()
Third level: 0x80000000-0x80001000: S2AP=11, XN=0
...
// EL1 text after rkp_s2_change_range_permission()
Third level: 0x80000000-0x80001000: S2AP=1, XN=0
...

// swapper_pg_dir before rkp_l1pgt_process_table()
Third level: 0x80088000-0x80089000: S2AP=11, XN=0
Third level: 0x80089000-0x8008a000: S2AP=11, XN=0
...
// swapper_pg_dir after rkp_l1pgt_process_table()
Third level: 0x80088000-0x80089000: S2AP=1, XN=10
Third level: 0x80089000-0x8008a000: S2AP=1, XN=10
...

------[ 2.3.4 - Miscellaneous Initializations

In our approach, we have not followed the original kernel initialization
to the letter. Specifically, we skip various routines initializing values
regarding kernel structs such as credentials, etc., which are devoid of
meaning in our minimal framework. Moreover, these are application specific
and do not provide any information required by the ARM architecture to
properly define the EL2 state. However, we briefly present them here for
completeness; as our understanding of the system improves and the
framework's supported functionality requirements increase (for example to
improve the fuzzing discussed next), they can be incorporated in the
framework.

Command 0x40 is used to pass information about cred and task struct
offsets, and then command 0x42 passes the cred sizes during the credential
initialization in the kernel's `cred_init()` function.
Next, in `mnt_init()`, command 0x41 is used to inform EL2 about vfsmount
struct offsets; then, when rootfs is mounted in `init_mount_tree()`,
information regarding the vfsmount is sent via command 0x55. This command
is also used later for the /system partition mount.

These commands can only be called once (with the exception of command
0x55, whose counter is 2) and, as mentioned above, are used in the
original kernel initialization process. Incorporating them into the
framework requires understanding their usage from both the kernel and the
hypervisor perspective; this is left as an exercise to the reader, who can
start by studying the various `rkp_call()` kernel invocations.

----[ 2.4 - Final Notes

At this point we have performed most of the expected RKP initialization
routines. We now have a fully functional minimal framework which can be
easily edited to test and study the RKP hypervisor behavior. More
importantly, we have introduced the fundamental concepts readers need to
implement their own setups and reach the current system state, which
allows us to interact with it and start investigating fuzzing
implementations.

On a final note, some of the original kernel initialization routines were
omitted since their actions lack meaning in our framework. They were
briefly introduced, and interested readers can study the various
`rkp_call()` kernel invocations and alter the framework state at will.
Additionally, this allows the fuzzers to investigate various configuration
scenarios not restricted by our own assumptions.

--[ 3 - Fuzzing

In this section we describe our approaches towards setting up fuzzing
campaigns under the setup presented above. We begin with a naive setup
aiming to introduce the system concepts we need to be aware of, as well as
an initial interaction with the QEMU source code and functionality. We
then expand on this knowledge, incorporating AFL in our setup for more
intelligent fuzzing.
To verify the validity of the fuzzing setups presented here we evidently
require a bug that would crash the system. For this purpose we will be
relying on a hidden RKP command with id 0x9b. This command leads to
function `sub_B0113AA8()` which, as shown in the snippet, adds our second
argument (register X1) to value 0x4080000000 and uses the result as an
address to store a QWORD. As you might imagine, simply passing 0 as our
second argument results in a data abort ;)

// vmm-G955FXXU4CRJ5.elf
int64_t sub_B0113AA8(exception_frame *exc_frame)
{
    *(exc_frame->x1 + 0x4080000000) = qword_B013E6B0;
    rkp_debug_log("RKP_5675678c", qword_B013E6B0, 0, 0);
    return 0;
}

To demonstrate the framework usage we are going to trigger this exception
with a debugger attached. We start the framework and set a breakpoint in
the handler for `hvc` command 0x9b, at the instruction writing the QWORD
to the evaluated address. Single stepping from there raises an exception
which, combined with the previous information about the RKP exception
handlers, we can identify as a synchronous exception from the same EL.
Continuing execution from there, we end up in the synchronous handler for
data aborts (EC 0x25), which leads to `vmm_panic()` :)

(gdb) target remote :1234
_reset () at boot64.S:15
15          ldr x30, =stack_top_el3
(gdb) continue
...
Breakpoint 1, 0x00000000b0113ac4 in ?? ()
(gdb) x/4i $pc-0x8
   0xb0113abc:  mov   x0, #0x80000000
   0xb0113ac0:  movk  x0, #0x40, lsl #32
=> 0xb0113ac4:  str   x1, [x2,x0]
   0xb0113ac8:  adrp  x0, 0xb0116000
(gdb) info registers x0 x1 x2
x0   0x4080000000     277025390592
x1   0x0              0
x2   0x1              1
(gdb) stepi
0x00000000b010c1f4 in ?? ()
(gdb) x/20i $pc
=> 0xb010c1f4:  stp   x1, x0, [sp,#-16]!
...
   0xb010c234:  mov   x0, #0x200      // Current EL
   0xb010c238:  mov   x1, #0x0        // Synchronous
   0xb010c23c:  mov   x2, sp
   0xb010c240:  bl    0xb010aa44      // vmm_dispatch
(gdb) continue
Continuing.

Breakpoint 5, 0x00000000b010a80c in ??
()    // EC 0x25 handler
(gdb) x/7i $pc
=> 0xb010a80c:  mov   x0, x22
   0xb010a810:  mov   x1, x21
   0xb010a814:  mov   x2, x19
   0xb010a818:  adrp  x3, 0xb0115000
   0xb010a81c:  add   x3, x3, #0x4d0
   0xb010a820:  bl    0xb010a4cc      // vmm_panic

----[ 3.1 - Dummy fuzzer

To implement the dummy fuzzer we decided to abuse the `brk` instruction,
which generates a Breakpoint Instruction exception. The exception is
recorded in ESR_ELx, and the value of the immediate argument in the
instruction specific syndrome field (ESR_ELx.ISS, bits[24:0]). In QEMU,
this information is stored in the `CPUARMState.exception` structure as
shown in the following snippet.

// qemu/target/arm/cpu.h
typedef struct CPUARMState {
    ...
    /* Regs for A64 mode.  */
    uint64_t xregs[32];
    ...
    /* Information associated with an exception about to be taken:
     * code which raises an exception must set cs->exception_index and
     * the relevant parts of this structure; the cpu_do_interrupt function
     * will then set the guest-visible registers as part of the exception
     * entry process.
     */
    struct {
        uint32_t syndrome; /* AArch64 format syndrome register */
        ...
    } exception;
    ...
}

The `arm_cpu_do_interrupt()` function handles the exceptions in QEMU, and
we can intercept the `brk` invocation by checking the
`CPUState.exception_index` variable as shown in (23). There we can
introduce our fuzzing logic and set up the system state with our fuzzed
values for the guest to access, as discussed next. Finally, to avoid
actually handling the exception (calling the exception vector handler,
changing ELs, etc.), which would disrupt our program flow, we simply
advance `pc` to the next instruction and return from the function. This
effectively turns `brk` into a fuzzing instruction.

// qemu/target/arm/helper.c
/* Handle a CPU exception for A and R profile CPUs.
 ...
 */
void arm_cpu_do_interrupt(CPUState *cs)
{
    ARMCPU *cpu = ARM_CPU(cs);
    CPUARMState *env = &cpu->env;
    ...
    // Handle the break instruction
    if (cs->exception_index == EXCP_BKPT) {                    /* (23) */
        handle_brk(cs, env);
        env->pc += 4;
        return;
    }
    ...
    arm_cpu_do_interrupt_aarch64(cs);
    ...
}

We utilize the syndrome field as a function identifier; specifically,
immediate value 0x1 is used to call the dummy fuzzing functionality. There
are numerous different harnesses that can be implemented here. In our demo
approach we only use a single argument (via X0) which points to a guest
buffer where fuzzed data could be placed. The framework registers, and
hence the arguments that `rkp_call_fuzz()` will pass to EL2 after
`__break_fuzz()` returns, are set by our harness in function
`handle_brk()`.

// framework/main.c
void el1_main(void)
{
    framework_rkp_init();
    rkp_call(RKP_DEF_INIT, 0, 0, 0, 0, 0);

    for(; ;){               // fuzzing loop
        __break_fuzz();     // create fuzzed values
        rkp_call_fuzz();    // invoke RKP
    }
}

// framework/util.S
__break_fuzz:
    ldr   x0, =rand_buf
    brk   #1
    ret
ENDPROC(__break_fuzz)

rkp_call_fuzz:
    hvc   #0
    ret
ENDPROC(rkp_call_fuzz)

We will not be presenting complex harnesses here, since this is beyond the
scope of this article and is left as an exercise for the reader. We will,
however, describe a simple harness to fuzz RKP commands. Moreover, since
most RKP handlers expect the second argument (X1 register) to point to a
valid buffer, we will be using the `rand_buf` pointer as shown above for
that purpose. The logic should be rather straightforward. We get a random
byte (24), which is eventually encoded into X0 (25) to be used as the RKP
command index. Next, we read a page of random data, copy it to the guest
buffer `rand_buf` (using function `cpu_memory_rw_debug()`) and use it as
the second argument by placing the buffer address in X1 (26).
// qemu/target/arm/patch.c
int handle_brk(CPUState *cs, CPUARMState *env)
{
    uint8_t syndrome = env->exception.syndrome & 0xFF;
    int l = 0x1000;
    uint8_t buf[l];

    switch (syndrome) {
    case 0:                 // break to gdb
        if (gdbserver_running()) {
            qemu_log_mask(CPU_LOG_INT, "[!] breaking to gdb\n");
            vm_stop(RUN_STATE_DEBUG);
        }
        break;
    case 1: ;               // dummy fuzz
        uint8_t cmd = random() & 0xFF;                         /* (24) */

        /* write random data to buffer buf */

        /*
         * Write host buffer buf to guest buffer pointed to
         * by register X0 during brk invocation
         */
        if (cpu_memory_rw_debug(cs, env->xregs[0], buf, l, 1) < 0) {
            fprintf(stderr, " Cannot access memory\n");
            return -1;
        }

        fuzz_cpu_state.xregs[0] = 0x83800000 | (cmd << 12);
        fuzz_cpu_state.xregs[1] = env->xregs[0];

        env->xregs[0] = fuzz_cpu_state.xregs[0];               /* (25) */
        env->xregs[1] = fuzz_cpu_state.xregs[1];               /* (26) */
        break;
    default: ;
    }

    return 0;
}

As you might expect, after compiling the modified QEMU and executing the
fuzzer, nothing happens! We elaborate more on this next.

------[ 3.1.1 - Handling Aborts

Since this is a bare metal implementation, there is nothing to "crash".
Once an abort happens, the abort exception handler is invoked and both our
framework and RKP end up in an infinite loop. To identify aborts we simply
intercept them in `arm_cpu_do_interrupt()`, similarly to `brk`.

// qemu/target/arm/helper.c
void arm_cpu_do_interrupt(CPUState *cs)
{
    ...
    // Handle the instruction or data abort
    if (cs->exception_index == EXCP_PREFETCH_ABORT ||
        cs->exception_index == EXCP_DATA_ABORT ) {
        if(handle_abort(cs, env) == -1) {
            qemu_system_shutdown_request(SHUTDOWN_CAUSE_HOST_ERROR);
        }
        // reset system
        qemu_system_reset_request(SHUTDOWN_CAUSE_HOST_QMP_SYSTEM_RESET);
    }
    ...
}

When a data or instruction abort exception is generated, we create a crash
log in `handle_abort()` and then request QEMU to either reset and restart
fuzzing, or to shut down if `handle_abort()` fails; the latter effectively
terminates the campaign, since we can no longer handle aborts.
We use QEMU functions to dump the system state, such as the faulting
addresses, system registers and memory dumps, in text log files located in
directory crashes/.

int handle_abort(CPUState *cs, CPUARMState *env)
{
    FILE* dump_file;

    if (open_crash_log(&dump_file) == -1)
        return -1;

    const char *fmt_str =
        "********* Data\\Instruction abort! *********\n"
        "FAR = 0x%llx\t ELR = 0x%llx\n"
        "Fuzz x0 = 0x%llx\t Fuzz x1 = 0x%llx\n";
    fprintf(dump_file, fmt_str, env->exception.vaddress, env->pc,
            fuzz_cpu_state.xregs[0], fuzz_cpu_state.xregs[1]);

    fprintf(dump_file, "\n********** CPU State **********\n");
    cpu_dump_state(cs, dump_file, CPU_DUMP_CODE);

    fprintf(dump_file, "\n********** Disassembly **********\n");
    target_disas(dump_file, cs, env->pc-0x20, 0x40);

    fprintf(dump_file, "\n********** Memory Dump **********\n");
    dump_extra_reg_data(cs, env, dump_file);

    fprintf(dump_file, "\n********** End of report **********\n");
    fclose(dump_file);

    return 0;
}

A sample trimmed crash log is presented below. We can see that the
faulting command is 0x8389b000 (or command index 0x9b ;) as well as the
faulting address and the code where the abort happened. You can create
your own logs by executing the dummy fuzzer ;)