                              ==Phrack Inc.==

                Volume 0x0d, Issue 0x42, Phile #0x04 of 0x11

|=-----------------------------------------------------------------------=|
|=-------=[ The Objective-C Runtime: Understanding and Abusing ]=-------=|
|=-----------------------------------------------------------------------=|
|=----------------------=[ nemo@felinemenace.org ]=----------------------=|
|=-----------------------------------------------------------------------=|

--[ Contents

  1 - Introduction
  2 - What is Objective-C?
  3 - The Objective-C Runtime
    3.1 - libobjc.A.dylib
    3.2 - The __OBJC Segment
  4 - Reverse Engineering Objective-C Applications
    4.1 - Static analysis toolset
    4.2 - Runtime analysis toolset
    4.3 - Cracking
    4.4 - Objective-C Binary infection
  5 - Exploiting Objective-C Applications
    5.1 - Side note: Updated shared_region technique
  6 - Conclusion
  7 - References
  8 - Appendix A: Source code

--[ 1 - Introduction

Hello reader. I am writing this paper to document some research which I undertook on Mac OS X around 3 years ago. When I did this research, I gave a talk on it at Ruxcon. It was a pretty terrible talk, dry and technical, and it demotivated me a little. Unfortunately, because of this I didn't keep the slides.

Around this time my laptop broke and Apple refused to fix it, which drove me away from Mac OS X for a while. A week ago we tried again with another Apple store, just in case, and they seem to have fixed the problem. So I'm back on OS X and giving the documentation of this research another try. I'm hoping it transfers a little more smoothly in .txt format; however, you be the judge.

The topic of this research is the Objective-C runtime on Mac OS X. In this paper I will look at how the Objective-C runtime works, both in a binary and in memory. I will then look at how we can manipulate the runtime to our advantage from a reverse engineering, exploit development and binary infection perspective.

--[ 2 - What is Objective-C?
Before we look at the Objective-C runtime, let's take a look at what Objective-C actually is. Objective-C is a reflective programming language which aims to bring object-oriented concepts and Smalltalk-esque messaging to C. GCC provides a compiler for Objective-C. However, because of the rich library support on OpenStep-based operating systems (Mac OS X, iPhone, GNUstep), it is really only used on those platforms.

Objective-C is implemented as an augmentation of the C language. It is a superset of C, which means that any Objective-C compiler can also compile C. To learn more about Objective-C, you can read [1] and [2] in the references.

To illustrate what Objective-C looks like as a language, we'll look at a simple Hello World example from [3]. This tutorial shows how to compile a basic Hello World style Objective-C app from the command line. If you're already familiar with Objective-C, just go ahead and skip to the next section. ;-)

So first we make a directory for our project...

    -[dcbz@megatron:~/code]$ mkdir HelloWorld
    -[dcbz@megatron:~/code]$ mkdir HelloWorld/build

...and create the header file for our new class (Talker).

    -[dcbz@megatron:~/code]$ cat > HelloWorld/Talker.h
    #import <Foundation/Foundation.h>

    @interface Talker : NSObject
    - (void) say: (STR) phrase;
    @end
    ^D

As you can see, Objective-C projects use the .h extension for headers, just like C. This header looks quite different from a typical C-style header, though. The "@interface Talker : NSObject" line tells the compiler that a "Talker" class exists and that it's derived from the NSObject class. The "- (void) say: (STR) phrase;" line declares a public instance method of that class called "say:". The leading "-" marks an instance method. This method takes a (STR) argument called "phrase".

Now that the header file exists and our class is declared, we need to implement the meat of the class. Objective-C implementation files typically have the file extension ".m".
    -[dcbz@megatron:~/code]$ cat > HelloWorld/Talker.m
    #import "Talker.h"

    @implementation Talker

    - (void) say: (STR) phrase {
        printf("%s\n", phrase);
    }

    @end
    ^D

The implementation of the Talker class is straightforward. The say: method takes the string "phrase" and prints it with printf(). With the class in place, we need a small main() function to use it.

    -[dcbz@megatron:~/code]$ cat > HelloWorld/hello.m
    #import "Talker.h"

    int main(void) {
        Talker *talker = [[Talker alloc] init];
        [talker say: "Hello, World!"];
        [talker release];
    }

From this example you can see that the syntax for calling methods of an Objective-C class is not quite the same as your typical C or C++ code. It looks far more like Smalltalk messaging, or Lisp:

    [<object> <method>: <argument>];

Objective-C programmers typically alloc and init on the same line, as shown in the example. I know this generally sets off alarm bells that a NULL pointer dereference can occur. However, objc_msgSend() checks for a NULL (nil) receiver and simply returns nil in that case, so this condition is caught. (See the objc_msgSend() source later in this paper.)

Now we just build the project. The -framework option to gcc lets us specify a framework to link against.

    -[dcbz@megatron:~/code]$ cd HelloWorld/
    -[dcbz@megatron:~/code/HelloWorld]$ gcc -o build/hello Talker.m hello.m -framework Foundation
    -[dcbz@megatron:~/code/HelloWorld]$ cd build/
    -[dcbz@megatron:~/code/HelloWorld/build]$ ./hello
    Hello, World!

As you can see, the produced binary outputs "Hello, World!" as expected. Unfortunately, this example showcases just about all the skill I have with Objective-C as a language; I've spent way more time auditing it than writing it. Fortunately, you don't need a deep understanding of Objective-C to follow the rest of the paper.
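The nil-receiver check mentioned above is what makes the one-line [[Talker alloc] init] idiom safe. The sketch below is a toy model in plain C, not the real runtime. toy_object, toy_send() and the two handlers are hypothetical names, and a single handler stands in for a whole method table. It only shows the idea: a message sent to NULL does nothing and yields NULL, so a failed alloc passes nil through init instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not libobjc: a hypothetical "object" whose single handler
 * stands in for its whole method table. */
typedef struct toy_object toy_object;
typedef toy_object *(*toy_handler)(toy_object *self, const char *sel);

struct toy_object {
    toy_handler handler;
};

/* Mimics the nil check at the top of objc_msgSend(): a message sent to a
 * NULL receiver does nothing and evaluates to NULL (nil). */
static toy_object *toy_send(toy_object *receiver, const char *sel)
{
    if (receiver == NULL)
        return NULL;
    assert(receiver->handler != NULL);  /* every live toy object has a handler */
    return receiver->handler(receiver, sel);
}

/* Hypothetical "alloc" that fails, as if memory were exhausted. */
static toy_object *failing_alloc(toy_object *self, const char *sel)
{
    (void)self;
    (void)sel;
    return NULL;
}

/* Hypothetical "init" that succeeds by returning the receiver itself. */
static toy_object *init_returns_self(toy_object *self, const char *sel)
{
    (void)sel;
    return self;
}
```

For example, if a toy class object uses failing_alloc as its handler, then toy_send(toy_send(&cls, "alloc"), "init") evaluates to NULL. The NULL result of the failed alloc is never dereferenced.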
--[ 3 - The Objective-C Runtime

Now that we're intimately familiar with Objective-C as a language ;-), we can focus on its most interesting aspect: the runtime that allows it to function.

As I mentioned in the previous section, Objective-C is a reflective language. The following quote explains this more clearly than I could (in a very academic manner :( ).

    """
    Reflection is the ability of a program to manipulate as data something
    representing the state of the program during its own execution. There
    are two aspects of such manipulation : introspection and intercession.
    Introspection is the ability of a program to observe and therefore
    reason about its own state. Intercession is the ability of a program to
    modify its own execution state or alter its own interpretation or
    meaning. Both aspects require a mechanism for encoding execution state
    as data; providing such an encoding is called reification.
    """ - [4]

Basically, this means that at runtime Objective-C classes are designed to be aware of their own state and to be capable of altering their own implementation. As you can imagine, this information and functionality can be quite useful from a hacking perspective.

So how is this implemented on Mac OS X? Firstly, when gcc compiles our hello.m application, it is linked with the "libobjc.A.dylib" library.

    """
    -[dcbz@megatron:~/code/HelloWorld/build]$ otool -L hello
    hello:
        /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 677.22.0)
        /usr/lib/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
        /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 111.1.3)
        /usr/lib/libobjc.A.dylib (compatibility version 1.0.0, current version 227.0.0)
        /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 476.17.0)
    """

The source code for this dylib is available from [5].
This library contains the code for manipulating our Objective-C classes at runtime. In addition, at compile time gcc is responsible for storing all the information required by libobjc.A.dylib inside the binary. This is accomplished by creating the __OBJC segment.

I don't plan to cover the Mach-O file format in this paper, as it's been done to death [6]. We're more interested in what the various sections contain. Here's a list of the sections contained (logically) in the __OBJC segment of our binary:

    LC_SEGMENT.__OBJC.__cat_cls_meth
    LC_SEGMENT.__OBJC.__cat_inst_meth
    LC_SEGMENT.__OBJC.__string_object
    LC_SEGMENT.__OBJC.__cstring_object
    LC_SEGMENT.__OBJC.__message_refs
    LC_SEGMENT.__OBJC.__sel_fixup
    LC_SEGMENT.__OBJC.__cls_refs
    LC_SEGMENT.__OBJC.__class
    LC_SEGMENT.__OBJC.__meta_class
    LC_SEGMENT.__OBJC.__cls_meth
    LC_SEGMENT.__OBJC.__inst_meth
    LC_SEGMENT.__OBJC.__protocol
    LC_SEGMENT.__OBJC.__category
    LC_SEGMENT.__OBJC.__class_vars
    LC_SEGMENT.__OBJC.__instance_vars
    LC_SEGMENT.__OBJC.__module_info
    LC_SEGMENT.__OBJC.__symbols

As you can see, quite a lot of information is stored in the file and is therefore available at runtime. We'll look at both the in-memory components of the Objective-C runtime and the file contents in more detail in the following sections.

------[ 3.1 - libobjc.A.dylib

As mentioned previously, libobjc.A.dylib is the library on Mac OS X which provides the in-memory runtime functionality of the Objective-C language. The source code for this library is available from the Apple website [5].

Apple have documented the mechanics of this library quite well in the papers [7] and [8]. These papers cover versions 1.0 and 2.0 of the runtime. When I last looked at the runtime 3 years ago, version 2.0 was the latest. However, it seems that 3.0 is the standard now, and things have changed quite dramatically. I actually wrote a large portion of this section based on how things used to be, and I had to go back and rewrite most of it.
Hopefully there aren't any errors due to this, but please forgive me if there are.

Probably the first and most important function in this library is objc_msgSend(). objc_msgSend() sends messages to an object in memory. Essentially every method call on an Objective-C object at runtime is dispatched through this function. Instance variables, as we'll see, are accessed directly by offset from within the class's own methods. Here is the description of this function, taken from the Objective-C 2.0 Runtime Reference [7]:

    """
    objc_msgSend():

    Sends a message with a simple return value to an instance of a class.

    id objc_msgSend(id theReceiver, SEL theSelector, ...)

    Parameters:
    theReceiver
        A pointer that points to the instance of the class that is to
        receive the message.
    theSelector
        The selector of the method that handles the message.
    ...
        A variable argument list containing the arguments to the method.

    ReturnValue
        The return value of the method.
    """

To understand this function, we first need to understand the structures it uses. The first argument to objc_msgSend() is an "id". The definition of this type is in the file /usr/include/objc/objc.h.

    typedef struct objc_class *Class;

    typedef struct objc_object {
        Class isa;
    } *id;

    struct objc_class {
        struct objc_class*         isa;
        struct objc_class*         super_class;
        const char*                name;
        long                       version;
        long                       info;
        long                       instance_size;
        struct objc_ivar_list*     ivars;
        struct objc_method_list**  methodLists;
        struct objc_cache*         cache;
        struct objc_protocol_list* protocols;
    };

As you can see, an id is a pointer to an objc_object. Its first (and only declared) member, isa, points to the objc_class structure describing the object's class. I will now run through some of the more interesting elements of the objc_class struct.

The isa element: for an object, isa points to its class. For a class, it points to the class's metaclass, which holds the class methods.

The super_class element is a pointer to the parent (super) class of this class.

The name element is a pointer to the name of the class. This is mostly useful from a higher-level perspective.

The ivars element represents all the instance variables of an object in memory.
It consists of a pointer to an objc_ivar_list struct, which contains a count followed by an array of ivar_count objc_ivar structs.

    struct objc_ivar_list {
        int ivar_count;
        /* variable length structure */
        struct objc_ivar ivar_list[1];
    };

Each objc_ivar struct holds the name and the type of the variable, both of which are simply char *, followed by its offset:

    struct objc_ivar {
        char *ivar_name;
        char *ivar_type;
        int   ivar_offset;
    };

The ivar_offset value indicates how far into an instance of the class (that is, into the object's allocation) the data for this variable lives. We'll confirm this in gdb later in this section.

The methodLists element is a list of the methods supported by the class. The objc_method_list struct is made up of an obsolete pointer, an integer giving the number of methods, and an array of method_count struct objc_method entries.

    struct objc_method_list {
        struct objc_method_list *obsolete;
        int method_count;
        /* variable length structure */
        struct objc_method method_list[1];
    };

    typedef struct objc_method *Method;

The objc_method struct contains a SEL giving the method_name (the SEL is also our second argument to objc_msgSend(), which we'll get to soon) and a string describing the method's argument and return types. Finally, this struct contains a function pointer to the method itself, of type IMP.

    struct objc_method {
        SEL   method_name;
        char *method_types;
        IMP   method_imp;
    };

    typedef id (*IMP)(id, SEL, ...);

The IMP signature indicates that the first argument is the receiver's "self" pointer, the id of the object the message was sent to. The second argument is the method's SEL (selector).

For now that's all that's interesting to us about the id data type. Later in this paper we'll look at how method caching works, and how it can negatively affect us.

Now let's look at the mysterious data type "SEL" that we've been hearing so much about, the second argument to objc_msgSend():

    typedef struct objc_selector *SEL;

And what is an objc_selector struct, you ask? It turns out it's just a char * string that has been registered (uniqued) by the runtime. Two selectors with the same name are therefore the same pointer and can be compared with a simple pointer comparison.
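To make the SEL and IMP conventions concrete, here is a small sketch in plain C. toy_register_sel() is a hypothetical stand-in for the runtime's selector registration (it is not sel_registerName()). toy_method mirrors the three fields of struct objc_method. The sketch shows two things. First, because selectors are uniqued, finding a method by selector is a pointer comparison. Second, the IMP is called with the receiver and the selector as its first two arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model with hypothetical names, not the libobjc API. A selector is a
 * pointer to a uniqued C string, so comparing selectors compares pointers. */
typedef const char *toy_sel;

#define TOY_MAX_SELS 64

static const char *toy_sel_table[TOY_MAX_SELS];
static size_t toy_sel_count;

/* Return the canonical pointer for `name`, registering it on first use.
 * The string must outlive the table; string literals are fine. */
static toy_sel toy_register_sel(const char *name)
{
    for (size_t i = 0; i < toy_sel_count; i++)
        if (strcmp(toy_sel_table[i], name) == 0)
            return toy_sel_table[i];
    assert(toy_sel_count < TOY_MAX_SELS);
    toy_sel_table[toy_sel_count] = name;
    return toy_sel_table[toy_sel_count++];
}

/* Mirrors struct objc_method: name, type string, implementation pointer.
 * Like an IMP, the function takes the receiver (self) and selector (_cmd) first. */
typedef struct { int value; } toy_object;
typedef int (*toy_imp)(toy_object *self, toy_sel _cmd, int arg);

struct toy_method {
    toy_sel     method_name;
    const char *method_types;   /* unused here; real encodings also carry offsets */
    toy_imp     method_imp;
};

/* An example IMP that behaves like a setter and returns the stored value. */
static int set_value_imp(toy_object *self, toy_sel _cmd, int arg)
{
    (void)_cmd;
    self->value = arg;
    return arg;
}

/* Linear search of a method list; selectors are compared by pointer only. */
static toy_imp toy_find_imp(const struct toy_method *list, size_t count, toy_sel sel)
{
    for (size_t i = 0; i < count; i++)
        if (list[i].method_name == sel)
            return list[i].method_imp;
    return NULL;
}
```

In use, registering "setValue:" from a literal and again from a separately built buffer yields the same pointer. toy_find_imp() then returns set_value_imp without ever calling strcmp() during the lookup itself.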
objc_msgSend() is implemented in assembly. To read its implementation, browse to the runtime/Messengers.subproj directory in the objc-runtime source tree. The file objc-msg-i386.s is the Intel implementation.

Now that we're somewhat familiar with the runtime, let's load the sample "hello" application we wrote earlier into a debugger and verify our progress. The most commonly used debugger on Mac OS X is, obviously, gdb. I've spent so much time in the Windows world lately that I'm inclined towards Intel syntax; I apologize in advance. Regardless, let's fire up gdb and take a look at the disassembly of our main function.

    -[dcbz@megatron:~/code/HelloWorld/build]$ gdb ./hello
    GNU gdb 6.3.50-20050815 (Apple version gdb-768) (Tue Oct 2 04:07:49 UTC 2007)
    Copyright 2004 Free Software Foundation, Inc.
    (gdb) set disassembly-flavor intel
    (gdb) disas main
    Dump of assembler code for function main:
    0x00001f3d <main+0>: push ebp
    0x00001f3e <main+1>: mov ebp,esp
    0x00001f40 <main+3>: push ebx
    0x00001f41 <main+4>: sub esp,0x24
    0x00001f44 <main+7>: call 0x1f49 <main+12>
    0x00001f49 <main+12>: pop ebx
    0x00001f4a <main+13>: lea eax,[ebx+0x117b]
    0x00001f50 <main+19>: mov eax,DWORD PTR [eax]
    0x00001f52 <main+21>: mov edx,eax
    0x00001f54 <main+23>: lea eax,[ebx+0x1177]
    0x00001f5a <main+29>: mov eax,DWORD PTR [eax]
    0x00001f5c <main+31>: mov DWORD PTR [esp+0x4],eax
    0x00001f60 <main+35>: mov DWORD PTR [esp],edx
    0x00001f63 <main+38>: call 0x4005 <dyld_stub_objc_msgSend>
    0x00001f68 <main+43>: mov edx,eax
    0x00001f6a <main+45>: lea eax,[ebx+0x1173]
    0x00001f70 <main+51>: mov eax,DWORD PTR [eax]
    0x00001f72 <main+53>: mov DWORD PTR [esp+0x4],eax
    0x00001f76 <main+57>: mov DWORD PTR [esp],edx
    0x00001f79 <main+60>: call 0x4005 <dyld_stub_objc_msgSend>
    0x00001f7e <main+65>: mov DWORD PTR [ebp-0xc],eax
    0x00001f81 <main+68>: mov ecx,DWORD PTR [ebp-0xc]
    0x00001f84 <main+71>: lea eax,[ebx+0x116f]
    0x00001f8a <main+77>: mov edx,DWORD PTR [eax]
    0x00001f8c <main+79>: lea eax,[ebx+0x96]
    0x00001f92 <main+85>: mov DWORD PTR [esp+0x8],eax
    0x00001f96 <main+89>: mov DWORD PTR [esp+0x4],edx
    0x00001f9a <main+93>: mov DWORD PTR [esp],ecx
    0x00001f9d <main+96>: call 0x4005 <dyld_stub_objc_msgSend>
    0x00001fa2 <main+101>: mov edx,DWORD PTR [ebp-0xc]
    0x00001fa5 <main+104>: lea eax,[ebx+0x116b]
    0x00001fab <main+110>: mov eax,DWORD PTR [eax]
    0x00001fad <main+112>: mov DWORD PTR [esp+0x4],eax
    0x00001fb1 <main+116>: mov DWORD PTR [esp],edx
    0x00001fb4 <main+119>: call 0x4005 <dyld_stub_objc_msgSend>
    0x00001fb9 <main+124>: add esp,0x24
    0x00001fbc <main+127>: pop ebx
    0x00001fbd <main+128>: leave
    0x00001fbe <main+129>: ret

As you can see, our main function consists of only 4 calls to objc_msgSend(). There are no direct calls to our actual methods here. Here is a listing of the source code again, to jog your memory.

    int main(void) {
        Talker *talker = [[Talker alloc] init];
        [talker say: "Hello World!"];
        [talker release];
    }

Each call to objc_msgSend() corresponds to one message send in our source:

    receiver | method
    ---------+---------
    Talker   | alloc
    talker   | init
    talker   | say:
    talker   | release

To verify this, we can put a breakpoint on the objc_msgSend() function.

    (gdb) break objc_msgSend
    Breakpoint 2 at 0x9470d670
    (gdb) c
    Continuing.

    Breakpoint 2, 0x9470d670 in objc_msgSend ()
    (gdb) x/2i $pc
    0x9470d670 <objc_msgSend>: mov ecx,DWORD PTR [esp+0x8]
    0x9470d674 <objc_msgSend+4>: mov eax,DWORD PTR [esp+0x4]

As you can see, the first two instructions in objc_msgSend() move the selector into ecx and the id into eax. To verify this, let's step once and print the string at ecx.

    (gdb) stepi
    0x9470d674 in objc_msgSend ()
    (gdb) x/s $ecx
    0x9470e66c <objc_msgSend_stub+828>: "alloc"

As predicted, "alloc" was the first method called. Now we can delete our breakpoints and add a breakpoint at the current location, then use gdb's "commands" feature to print the string at ecx every time this breakpoint is hit.
    (gdb) break
    Breakpoint 3 at 0x9470d674
    (gdb) commands
    Type commands for when breakpoint 3 is hit, one per line.
    End with a line saying just "end".
    >x/s $ecx
    >c
    >end
    (gdb) c
    Continuing.

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x94722d20 <__FUNCTION__.12370+80320>: "defaultCenter"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x9470e83c <objc_msgSend_stub+1292>: "self"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x94772d28 <__FUNCTION__.12370+408008>: "addObserver:selector:name:object:"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x9470e66c <objc_msgSend_stub+828>: "alloc"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x9470e680 <objc_msgSend_stub+848>: "initialize"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x9477f158 <__FUNCTION__.12370+458232>: "allocWithZone:"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x9470e858 <objc_msgSend_stub+1320>: "init"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x1fd0 <main+147>: "say:"
    Hello World!

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x947a9334 <__FUNCTION__.12370+630740>: "release"

    Breakpoint 8, 0x9470d674 in objc_msgSend ()
    0x9474e514 <__FUNCTION__.12370+258484>: "dealloc"

This works as expected. However, we were flooded with messages unrelated to our class, sent by Foundation (the NS* classes) as it loaded. Let's try to implement something that shows which class each message was sent to. Remembering back to our objc_class struct:

    struct objc_class {
        struct objc_class* isa;
        struct objc_class* super_class;
        const char*        name;
        ...

8 bytes into the struct (after two 4-byte pointers on i386) there's a 4-byte pointer to the class's name. To verify this, we can restart the process with our breakpoint in the same place.

    Breakpoint 6, 0x9470d674 in objc_msgSend ()
    (gdb) printf "%s\n", *(long*)($eax+8)
    NSNotificationCenter

This time when the breakpoint is hit, we dereference the pointer at $eax+8 and print it to find out the class name. Again, we can script this with the "commands" feature to automate the process.
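The target here is a 32-bit i386 process. The offsets used with gdb in this section (name at 8, and later ivars at 24 and methodLists at 28) can be double-checked with a sketch that models each pointer and long as a 4-byte integer. objc_class32, objc_method32 and read_u32() are hypothetical names for this illustration. The static assertions fail to compile if the arithmetic is off. read_u32() decodes a little-endian word from raw bytes, which is what gdb's x/x displays on i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit mirror of struct objc_class. Pointers and longs are
 * modelled as 4-byte integers so the offsets match the i386 process being
 * debugged, whatever the word size of the machine compiling this sketch. */
struct objc_class32 {
    uint32_t isa;            /* struct objc_class *         */
    uint32_t super_class;    /* struct objc_class *         */
    uint32_t name;           /* const char *                */
    int32_t  version;        /* long                        */
    int32_t  info;           /* long                        */
    int32_t  instance_size;  /* long                        */
    uint32_t ivars;          /* struct objc_ivar_list *     */
    uint32_t methodLists;    /* struct objc_method_list **  */
    uint32_t cache;          /* struct objc_cache *         */
    uint32_t protocols;      /* struct objc_protocol_list * */
};

/* struct objc_method on i386: three 4-byte fields. */
struct objc_method32 {
    uint32_t method_name;    /* SEL    */
    uint32_t method_types;   /* char * */
    uint32_t method_imp;     /* IMP    */
};

/* Compile-time checks of the offsets used with gdb. */
_Static_assert(offsetof(struct objc_class32, name) == 8, "name is 8 bytes in");
_Static_assert(offsetof(struct objc_class32, ivars) == 24, "ivars is 24 bytes in");
_Static_assert(offsetof(struct objc_class32, methodLists) == 28, "methodLists is 28 bytes in");
_Static_assert(sizeof(struct objc_method32) == 12, "method entries are 12 bytes apart");

/* Read the little-endian 32-bit word at byte offset `off` of a raw memory
 * image, i.e. what `x/x base+off` would show on i386. */
static uint32_t read_u32(const unsigned char *image, size_t image_len, size_t off)
{
    assert(off + 4 <= image_len);
    return (uint32_t)image[off]
         | ((uint32_t)image[off + 1] << 8)
         | ((uint32_t)image[off + 2] << 16)
         | ((uint32_t)image[off + 3] << 24);
}
```

As a usage example, fill a struct objc_class32 with the ten words dumped from 0x3000 later in this section, then read offset 8 of its bytes. The result is 0x1f7e, the address of the "Fraction" string. This assumes a little-endian host, as i386 is.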
But let's change our script so that, rather than using printf, it uses one of the functions exported by the Objective-C runtime:

    call (char *)class_getName($eax)

Given a Class pointer, this function does the work of finding the name for us.

    (gdb) b *0x9470d674
    Breakpoint 1 at 0x9470d674
    (gdb) commands
    Type commands for when breakpoint 1 is hit, one per line.
    End with a line saying just "end".
    >call (char *)class_getName($eax)
    >x/s $ecx
    >c
    >end
    (gdb) run
    ...
    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $107 = 0x6e6f5a68 <Address 0x6e6f5a68 out of bounds>
    0x9477f158 <__FUNCTION__.12370+458232>: "allocWithZone:"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $108 = 0x0
    0x94772d28 <__FUNCTION__.12370+408008>: "addObserver:selector:name:object:"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $109 = 0x916e0318 "NSNotificationCenter"
    0x94722d20 <__FUNCTION__.12370+80320>: "defaultCenter"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $110 = 0x916e0318 "NSNotificationCenter"
    0x9470e83c <objc_msgSend_stub+1292>: "self"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $111 = 0x0
    0x94772d28 <__FUNCTION__.12370+408008>: "addObserver:selector:name:object:"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $112 = 0x77656e <Address 0x77656e out of bounds>
    0x9470e66c <objc_msgSend_stub+828>: "alloc"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $113 = 0x1fc9 "Talker"
    0x9470e680 <objc_msgSend_stub+848>: "initialize"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $114 = 0x1fc9 "Talker"
    0x9477f158 <__FUNCTION__.12370+458232>: "allocWithZone:"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $115 = 0x6b617761 <Address 0x6b617761 out of bounds>
    0x9470e858 <objc_msgSend_stub+1320>: "init"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $116 = 0x21646c72 <Address 0x21646c72 out of bounds>
    0x1fd0 <main+147>: "say:"
    Hello World!
    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $117 = 0x6470755f <Address 0x6470755f out of bounds>
    0x947a9334 <__FUNCTION__.12370+630740>: "release"

    Breakpoint 2, 0x9470d674 in objc_msgSend ()
    $118 = 0x615f4943 <Address 0x615f4943 out of bounds>
    0x9474e514 <__FUNCTION__.12370+258484>: "dealloc"

As you can see, this works as a sort of makeshift Objective-C message tracing system. However, in some cases eax does not actually contain an id, and this does not work. Hence we get messages like:

    $118 = 0x615f4943 <Address 0x615f4943 out of bounds>

This is partly because objc_msgSend() is not always an entry point, so we can't guarantee that every time our breakpoint is hit we are actually seeing a call to objc_msgSend(). Note also that our breakpoint at 0x9470d674 (objc_msgSend+4) fires before the "mov eax,DWORD PTR [esp+0x4]" at that address has executed. At that point eax has not yet been loaded with the receiver and still holds whatever the caller left there. In main(), for instance, eax holds the address of the "Hello World!" string just before the say: call (see main+79 above). The value 0x21646c72 is simply the bytes "rld!", read from 8 bytes into that string.

To make our tracer work more reliably, we can put a breakpoint on 0x4005 <dyld_stub_objc_msgSend> instead. At that point the return address is at esp, so we use esp+0x8 for our SEL and esp+0x4 for our id. We can use the statement:

    printf "[%s %s]\n", *(long *)((*(long*)($esp+4))+8),*(long *)($esp+8)

to print our object and method nicely. This works pretty well, but sometimes the class name we read is still NULL. This typically happens when the receiver is an instance rather than a class, because offset 8 then holds instance data rather than a name pointer. In this case we take the isa (dereference the first pointer in the struct) and get the name of that instead. The following gdb script handles this, and also skips messages sent to nil, which would otherwise make the read at $id+8 fail and stop the process:

    #
    # Trace Objective-C messages. - nemo 2009
    #
    b dyld_stub_objc_msgSend
    commands
      silent
      set $id  = *(long *)($esp+4)
      set $sel = *(long *)($esp+8)
      if ($id != 0)
        if (*(long *)($id+8) != 0)
          printf "[%s %s]\n", *(long *)($id+8), $sel
        else
          set $isx = *(long *)($id)
          printf "[%s %s]\n", *(long *)($isx+8), $sel
        end
      end
      continue
    end

We could also implement this with DTrace on Mac OS X quite easily. The predicate skips nil receivers, and the 4-byte reads use uint32_t so that they match the 32-bit target:

    #!/usr/sbin/dtrace -qs
    /* usage: objcdump.d <pid> */

    pid$1::objc_msgSend:entry
    /arg0 != 0/
    {
        this->isa = *(uint32_t *)copyin(arg0, 4);
        printf("-[%s %s]\n",
            copyinstr(*(uint32_t *)copyin(this->isa + 8, 4)),
            copyinstr(arg1));
    }

Let me correct myself on that: we /should/ be able to implement this with DTrace on Mac OS X quite easily.
However, DTrace is kind of like looking at a beautiful painting through a kid's kaleidoscope toy. Thanks a lot to twiz for helping me out with implementing this. As you can see, the output of this script is the same as that of our gdb script, but the process runs orders of magnitude faster.

Now that we're hopefully familiar with how calls to objc_msgSend() work, we can look at how ivars and methods are accessed. To investigate this, we can extend our hello.m example to include some attributes. To demonstrate this I will use the fraction example from [10]. (I'm getting uncreative in my old age ;-).)

    -[dcbz@megatron:~/code/fraction]$ ls -lsa
    total 24
    0 drwxr-xr-x   5 dcbz  dcbz   170 Mar 27 10:28 .
    0 drwxr-xr-x  33 dcbz  dcbz  1122 Mar 27 10:17 ..
    8 -rwxr-----   1 dcbz  dcbz   231 Mar 23  2004 Fraction.h
    8 -rwxr-----   1 dcbz  dcbz   339 Mar 24  2004 Fraction.m
    8 -rwxr-----   1 dcbz  dcbz   386 Mar 27  2004 main.m

As you can see, this project is pretty similar to our earlier hello.m example.

    -[dcbz@megatron:~/code/fraction]$ cat Fraction.h
    #import <Foundation/NSObject.h>

    @interface Fraction: NSObject {
        int numerator;
        int denominator;
    }

    -(void) print;
    -(void) setNumerator: (int) d;
    -(void) setDenominator: (int) d;
    -(int) numerator;
    -(int) denominator;
    @end

Our header file defines a simple interface to a "Fraction" class. This class represents the numerator and denominator of a fraction. It exports the methods setNumerator: and setDenominator: to modify these values, and the methods numerator and denominator to read them.
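It can help to picture the @interface block above as an ordinary C struct. The sketch below assumes a 32-bit (i386) process. The isa pointer inherited from NSObject comes first, followed by the ivars in declaration order. struct fraction32, store_ivar_int() and load_ivar_int() are hypothetical names. The two helpers illustrate, roughly, that a compiled setter or getter for an ivar is just a store to or a load from self plus a fixed offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical picture of a Fraction instance in a 32-bit (i386) process:
 * the isa pointer inherited from NSObject comes first, then the ivars in
 * declaration order. isa is deliberately modelled as a 4-byte integer. */
struct fraction32 {
    uint32_t isa;            /* Class: points at the Fraction class structure */
    int32_t  numerator;
    int32_t  denominator;
};

_Static_assert(offsetof(struct fraction32, numerator) == 4, "numerator at +4");
_Static_assert(offsetof(struct fraction32, denominator) == 8, "denominator at +8");
_Static_assert(sizeof(struct fraction32) == 12, "instance size is 12 bytes");

/* Roughly what a compiled setter such as -setDenominator: does: store the
 * argument at self + ivar_offset. memcpy sidesteps alignment and aliasing. */
static void store_ivar_int(void *self, size_t ivar_offset, int32_t value)
{
    assert(self != NULL);
    memcpy((unsigned char *)self + ivar_offset, &value, sizeof value);
}

/* ...and what a getter such as -numerator does: load from self + offset. */
static int32_t load_ivar_int(const void *self, size_t ivar_offset)
{
    int32_t value;
    assert(self != NULL);
    memcpy(&value, (const unsigned char *)self + ivar_offset, sizeof value);
    return value;
}
```

Storing 1 at offset 4 and 3 at offset 8 of a zeroed fraction32 produces the 1/3 fraction that main.m builds below. Each store goes through a raw offset, not a field name.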
    -[dcbz@megatron:~/code/fraction]$ cat Fraction.m
    #import "Fraction.h"
    #import <stdio.h>

    @implementation Fraction
    -(void) print {
        printf( "%i/%i", numerator, denominator );
    }

    -(void) setNumerator: (int) n {
        numerator = n;
    }

    -(void) setDenominator: (int) d {
        denominator = d;
    }

    -(int) denominator {
        return denominator;
    }

    -(int) numerator {
        return numerator;
    }
    @end

The implementation of these methods is pretty much what you would expect from any OOP language: get methods return the object's attribute, set methods set it.

    -[dcbz@megatron:~/code/fraction]$ cat main.m
    #import <stdio.h>
    #import "Fraction.h"

    int main( int argc, const char *argv[] ) {
        // create a new instance
        Fraction *frac = [[Fraction alloc] init];

        // set the values
        [frac setNumerator: 1];
        [frac setDenominator: 3];

        // print it
        printf( "The fraction is: " );
        [frac print];
        printf( "\n" );

        // free memory
        [frac release];

        return 0;
    }

As you can see, our main.m file instantiates the class. It then sets the numerator to 1 and the denominator to 3, and prints the fraction. Pretty straightforward stuff.

    -[dcbz@megatron:~/code/fraction]$ gcc -o fraction Fraction.m main.m -framework Foundation
    -[dcbz@megatron:~/code/fraction]$ ./fraction
    The fraction is: 1/3

Before we fire up gdb and look at this from a debugging perspective, let's take a quick look through the source code for what happens after objc_msgSend() is called.
    ENTRY _objc_msgSend
        CALL_MCOUNTER

    // load receiver and selector
        movl    selector(%esp), %ecx
        movl    self(%esp), %eax

    // check whether selector is ignored
        cmpl    $ kIgnore, %ecx
        je      LMsgSendDone            // return self from %eax

    // check whether receiver is nil
        testl   %eax, %eax
        je      LMsgSendNilSelf

    // receiver (in %eax) is non-nil: search the cache
    LMsgSendReceiverOk:
        movl    isa(%eax), %edx         // class = self->isa
        CacheLookup WORD_RETURN, MSG_SEND, LMsgSendCacheMiss
        movl    $kFwdMsgSend, %edx      // flag word-return for _objc_msgForward
        jmp     *%eax                   // goto *imp

    // cache miss: go search the method lists
    LMsgSendCacheMiss:
        MethodTableLookup WORD_RETURN, MSG_SEND
        movl    $kFwdMsgSend, %edx      // flag word-return for _objc_msgForward
        jmp     *%eax                   // goto *imp

As you can see, objc_msgSend() first moves the receiver and selector into eax and ecx respectively. It then tests whether the selector is kIgnore ("?"), and if so, simply returns the receiver (id). Next it checks whether the receiver is nil, in which case it branches to LMsgSendNilSelf. If the receiver is not NULL, its isa is loaded into edx and a cache lookup is performed for the method in question. If the method is found in the cache, the cached IMP is simply jumped to. We'll look into the cache in more detail later in the exploitation section.

If the method's address is not in the cache, the "MethodTableLookup" macro is used.

    .macro MethodTableLookup
        subl    $4, %esp                // 16-byte align the stack
        // push args (class, selector)
        pushl   %ecx
        pushl   %eax
        CALL_EXTERN(__class_lookupMethodAndLoadCache)
        addl    $12, %esp               // pop parameters and alignment
    .endmacro

From the code above we can see that this macro simply aligns the stack and calls __class_lookupMethodAndLoadCache. This function checks the cache of the class (and of its superclass) again for the method in question. If the method is definitely not in the cache, it walks the method lists of the class and tests each method individually for a match. If this is not successful, the parent of the class is checked, and so forth up the hierarchy. If the method is found, its IMP is returned in eax and objc_msgSend() jumps to it (the "jmp *%eax" above).

Let's look at this process in gdb. We hit our breakpoint in objc_msgSend().
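The lookup order just described can be sketched as a toy model in plain C: the cache first, then the class's method list, then each superclass in turn. It is a simplification, not libobjc. The real cache is a hash table, the kIgnore and nil checks are left out, and superclass caches are not consulted. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the lookup order described above; every name is hypothetical.
 * Selectors are assumed to be uniqued, so they are compared by pointer. */
typedef const char *toy_sel;

struct toy_object;
typedef int (*toy_imp)(struct toy_object *self, toy_sel _cmd);

struct toy_method {
    toy_sel name;
    toy_imp imp;
};

#define TOY_CACHE_SLOTS 4

struct toy_class {
    const struct toy_class  *super_class;
    const struct toy_method *methods;
    size_t                   method_count;
    struct { toy_sel sel; toy_imp imp; } cache[TOY_CACHE_SLOTS]; /* empty slots are NULL */
    size_t                   cache_next;    /* round-robin slot to overwrite next */
};

struct toy_object {
    struct toy_class *isa;
};

static toy_imp cache_lookup(const struct toy_class *cls, toy_sel sel)
{
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
        if (cls->cache[i].sel == sel)
            return cls->cache[i].imp;
    return NULL;
}

static void cache_fill(struct toy_class *cls, toy_sel sel, toy_imp imp)
{
    cls->cache[cls->cache_next].sel = sel;
    cls->cache[cls->cache_next].imp = imp;
    cls->cache_next = (cls->cache_next + 1) % TOY_CACHE_SLOTS;
}

/* Cache first; on a miss walk the class's methods, then each superclass's,
 * and remember the hit in the receiver's class cache. */
static toy_imp toy_lookup(struct toy_class *cls, toy_sel sel)
{
    toy_imp imp = cache_lookup(cls, sel);
    if (imp != NULL)
        return imp;
    for (const struct toy_class *c = cls; c != NULL; c = c->super_class) {
        for (size_t i = 0; i < c->method_count; i++) {
            if (c->methods[i].name == sel) {
                cache_fill(cls, sel, c->methods[i].imp);
                return c->methods[i].imp;
            }
        }
    }
    return NULL;  /* the real runtime would go on to message forwarding */
}

/* objc_msgSend-like entry: find the IMP via self->isa, then call it. */
static int toy_msg_send(struct toy_object *self, toy_sel sel)
{
    toy_imp imp = toy_lookup(self->isa, sel);
    assert(imp != NULL);   /* the toy has no forwarding path */
    return imp(self, sel);
}

/* Two example method implementations for the usage below. */
static int returns_one(struct toy_object *self, toy_sel _cmd) { (void)self; (void)_cmd; return 1; }
static int returns_two(struct toy_object *self, toy_sel _cmd) { (void)self; (void)_cmd; return 2; }
```

Consider an object whose class inherits "hello" from its superclass. The first toy_msg_send() walks up to the superclass and caches the hit on the subclass. The second send of the same selector is then answered by cache_lookup() alone.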
    Breakpoint 7, 0x9470d670 in objc_msgSend ()
    (gdb) stepi
    0x9470d674 in objc_msgSend ()
    (gdb) stepi
    0x9470d678 in objc_msgSend ()

We step over the first two instructions to populate ecx and eax, for our convenience.

    (gdb) x/s $ecx
    0x1f8d <main+244>: "setNumerator:"

We can see that the method being called (from the SEL argument) is setNumerator:.

    (gdb) x/x $eax
    0x103240: 0x00003000

We take the isa...

    (gdb) x/x 0x00003000
    0x3000 <.objc_class_name_Fraction>: 0x00003040
    (gdb)
    0x3004 <.objc_class_name_Fraction+4>: 0xa07fccc0
    (gdb)
    0x3008 <.objc_class_name_Fraction+8>: 0x00001f7e

...and offset it by 8 bytes to find the class name.

    (gdb) x/s 0x00001f7e
    0x1f7e <main+229>: "Fraction"

So this is a call to -[Fraction setNumerator:] (obviously).

    struct objc_class {
        struct objc_class*         isa;
        struct objc_class*         super_class;
        const char*                name;
        long                       version;
        long                       info;
        long                       instance_size;
        struct objc_ivar_list*     ivars;
        struct objc_method_list**  methodLists;
        struct objc_cache*         cache;
        struct objc_protocol_list* protocols;
    };

Remembering our objc_class struct from earlier, we know that the methodLists pointer is 28 bytes in (seven 4-byte fields precede it on i386).

    (gdb) set $classbase=0x3000
    (gdb) x/x $classbase+28
    0x301c <.objc_class_name_Fraction+28>: 0x00103250

So the address of our method list is 0x00103250.

    struct objc_method_list {
        struct objc_method_list *obsolete;
        int method_count;
        /* variable length structure */
        struct objc_method method_list[1];
    };

The method_count field follows the 4-byte obsolete pointer. As you can see, our method_count is 5.
    (gdb) x/x 0x00103250+4
    0x103254: 0x00000005

    typedef struct objc_method *Method;

    struct objc_method {
        SEL   method_name;
        char *method_types;
        IMP   method_imp;
    };

    (gdb) x/3x 0x00103250+8
    0x103258: 0x00001fb7 0x00001fd2 0x00001e8b
    (gdb) x/s 0x00001fb7
    0x1fb7 <main+286>: "numerator"
    (gdb) x/7i 0x00001e8b
    0x1e8b <-[Fraction numerator]>: push ebp
    0x1e8c <-[Fraction numerator]+1>: mov ebp,esp
    0x1e8e <-[Fraction numerator]+3>: sub esp,0x8
    0x1e91 <-[Fraction numerator]+6>: mov eax,DWORD PTR [ebp+0x8]
    0x1e94 <-[Fraction numerator]+9>: mov eax,DWORD PTR [eax+0x4]
    0x1e97 <-[Fraction numerator]+12>: leave
    0x1e98 <-[Fraction numerator]+13>: ret

Now that we can see how methods are stored, we can write a small gdb script to dump them. Each objc_method entry is 12 bytes (three 4-byte fields), with the name at offset 0 and the IMP at offset 8:

    (gdb) set $methods = 0x00103250 + 8
    (gdb) set $i = 1
    (gdb) while($i <= 5)
     >printf "name: %s\n", *(long *)$methods
     >printf "addr: 0x%x\n", *(long *)($methods+8)
     >set $methods += 12
     >set $i++
     >end
    name: numerator
    addr: 0x1e8b
    name: denominator
    addr: 0x1e7d
    name: setDenominator:
    addr: 0x1e6c
    name: setNumerator:
    addr: 0x1e5b
    name: print
    addr: 0x1e26

We can now display all our methods, so let's take a look at how our set and get methods actually work. First, the setDenominator: method.

    (gdb) x/8i 0x1e6c
    0x1e6c <-[Fraction setDenominator:]>: push ebp
    0x1e6d <-[Fraction setDenominator:]+1>: mov ebp,esp
    0x1e6f <-[Fraction setDenominator:]+3>: sub esp,0x8
    0x1e72 <-[Fraction setDenominator:]+6>: mov edx,DWORD PTR [ebp+0x8]
    0x1e75 <-[Fraction setDenominator:]+9>: mov eax,DWORD PTR [ebp+0x10]
    0x1e78 <-[Fraction setDenominator:]+12>: mov DWORD PTR [edx+0x8],eax
    0x1e7b <-[Fraction setDenominator:]+15>: leave
    0x1e7c <-[Fraction setDenominator:]+16>: ret

As you can see from the implementation, this function takes a pointer to the instance of our Fraction class (self, at ebp+0x8; _cmd sits at ebp+0xc). It stores the argument we pass to it (at ebp+0x10) at offset 0x8 into the object.
    0x1e5b <-[Fraction setNumerator:]>: push ebp
    0x1e5c <-[Fraction setNumerator:]+1>: mov ebp,esp
    0x1e5e <-[Fraction setNumerator:]+3>: sub esp,0x8
    0x1e61 <-[Fraction setNumerator:]+6>: mov edx,DWORD PTR [ebp+0x8]
    0x1e64 <-[Fraction setNumerator:]+9>: mov eax,DWORD PTR [ebp+0x10]
    0x1e67 <-[Fraction setNumerator:]+12>: mov DWORD PTR [edx+0x4],eax
    0x1e6a <-[Fraction setNumerator:]+15>: leave
    0x1e6b <-[Fraction setNumerator:]+16>: ret

Our setNumerator: method is almost identical, except that it uses offset 0x4 instead. This is all pretty straightforward.

So what, you ask, is the ivars pointer that we saw earlier in our objc_class struct for?

    struct objc_class {
        struct objc_class*         isa;
        struct objc_class*         super_class;
        const char*                name;
        long                       version;
        long                       info;
        long                       instance_size;
        struct objc_ivar_list*     ivars;
        struct objc_method_list**  methodLists;
        struct objc_cache*         cache;
        struct objc_protocol_list* protocols;
    };

Our ivars pointer (24 bytes into the objc_class struct) is required because of the reflective properties of the Objective-C language. It points to all the information about the instance variables of the class. We can explore this further in gdb with our Fraction class. First, let's put a breakpoint on one of our objc_msgSend() calls:

    (gdb) break *0x00001f3b
    Breakpoint 2 at 0x1f3b
    (gdb) c
    Continuing.

Once it's hit, we use the stepi command a few times to step through the stub and into objc_msgSend().

    Breakpoint 2, 0x00001f3b in main ()
    (gdb) stepi
    0x00004005 in dyld_stub_objc_msgSend ()
    (gdb)
    0x94e0c670 in objc_msgSend ()
    (gdb)
    0x94e0c674 in objc_msgSend ()

Now our eax register contains a pointer to our Fraction instance. (Strictly speaking, at objc_msgSend+4 only ecx has been loaded, but here eax already happens to hold the receiver, as the dump below confirms.)

    (gdb) x/x $eax
    0x103230: 0x00003000

We display the first 4 bytes at eax to retrieve the isa pointer. Then we dump a bunch of bytes at that address.
    (gdb) x/10x 0x3000
    0x3000 <.objc_class_name_Fraction>:     0x00003040  0xa06e3cc0  0x00001f7e  0x00000000
    0x3010 <.objc_class_name_Fraction+16>:  0x00ba4001  0x0000000c  0x000030c4  0x00103240
    0x3020 <.objc_class_name_Fraction+32>:  0x001048d0  0x00000000

According to our previous logic, 24 bytes in we should find the ivars
pointer, so in this case our ivars pointer is 0x000030c4. (Note also the
instance_size field just before it: 0xc, i.e. 12 bytes for the isa
pointer plus two 4-byte ints.)

Before we continue dumping memory here, let's take a look at the struct
definitions for what we're seeing. The pointer we just found points to a
struct of type objc_ivar_list, which looks like this:

    struct objc_ivar_list {
        int ivar_count;
        /* variable length structure */
        struct objc_ivar ivar_list[1];
    };

So we can trivially dump the count in gdb:

    (gdb) x/x 0x000030c4
    0x30c4 <.objc_class_name_Fraction+196>: 0x00000002

This shows that our Fraction class has 2 ivars, which makes sense:
numerator and denominator. Following the count is an array of objc_ivar
structs, one for each instance variable of the class. The definition for
this struct is as follows:

    struct objc_ivar {
        char *ivar_name;
        char *ivar_type;
        int   ivar_offset;
    };

So let's start dumping our ivars and see where it takes us.

    (gdb)
    0x30c8 <.objc_class_name_Fraction+200>: 0x00001fb7      // ivar_name
    (gdb)
    0x30cc <.objc_class_name_Fraction+204>: 0x00001fd9      // ivar_type
    (gdb)
    0x30d0 <.objc_class_name_Fraction+208>: 0x00000004      // ivar_offset

If we dump the name and type, we can see that the first instance variable
we are looking at is the numerator.

    (gdb) x/s 0x00001fb7
    0x1fb7 <main+286>:       "numerator"
    (gdb) x/s 0x00001fd9
    0x1fd9 <main+320>:       "i"

The "i" in the type string means that we're looking at an integer. The
ivar_offset is 0x4, which means that when a Fraction object is allocated,
the numerator lives 4 bytes into the allocation. This matches the code in
our setNumerator: method and makes sense. We can repeat the process with
the next element to verify our logic.
    (gdb)
    0x30d4 <.objc_class_name_Fraction+212>: 0x00001fab
    (gdb)
    0x30d8 <.objc_class_name_Fraction+216>: 0x00001fd9
    (gdb)
    0x30dc <.objc_class_name_Fraction+220>: 0x00000008
    (gdb) x/s 0x00001fab
    0x1fab <main+274>:       "denominator"
    (gdb) x/s 0x00001fd9
    0x1fd9 <main+320>:       "i"

Again, as we can see, the denominator is an integer and lives 0x8 bytes
into the allocation for this object. Hopefully that makes the layout of
the Objective-C runtime in memory relatively clear.


------[ 3.2 - The __OBJC Segment

In this section I will go over how the data mentioned in the previous
section is stored inside the Mach-O binary. I'm going to try to avoid
going into the Mach-O format as much as possible. It has already been
covered to death; if you need to read about the file format, check out
[6].

Basically, files containing Objective-C code have an extra Mach-O segment
called __OBJC. This segment consists of a bunch of different sections,
each containing different information pertinent to the Objective-C
runtime. The output below from the otool -l command shows the sizes, load
addresses, flags etc. of the __OBJC sections in the hello binary we
compiled earlier in the paper.

    -[dcbz@megatron:~/code/HelloWorld/build]$ otool -l hello
    ...
    Load command 3
          cmd LC_SEGMENT
      cmdsize 668
      segname __OBJC
       vmaddr 0x00003000
       vmsize 0x00001000
      fileoff 8192
     filesize 4096
      maxprot 0x00000007
     initprot 0x00000003
       nsects 9
        flags 0x0
    Section
      sectname __class
       segname __OBJC
          addr 0x00003000
          size 0x00000030
        offset 8192
         align 2^5 (32)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __meta_class
       segname __OBJC
          addr 0x00003040
          size 0x00000030
        offset 8256
         align 2^5 (32)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __inst_meth
       segname __OBJC
          addr 0x00003080
          size 0x00000020
        offset 8320
         align 2^5 (32)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __instance_vars
       segname __OBJC
          addr 0x000030a0
          size 0x00000010
        offset 8352
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __module_info
       segname __OBJC
          addr 0x000030b0
          size 0x00000020
        offset 8368
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __symbols
       segname __OBJC
          addr 0x000030d0
          size 0x00000010
        offset 8400
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __message_refs
       segname __OBJC
          addr 0x000030e0
          size 0x00000010
        offset 8416
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000005
     reserved1 0
     reserved2 0
    Section
      sectname __cls_refs
       segname __OBJC
          addr 0x000030f0
          size 0x00000004
        offset 8432
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __image_info
       segname __OBJC
          addr 0x000030f4
          size 0x00000008
        offset 8436
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0

This output shows us where in the file each section resides, where that
portion will be mapped into the address space of the process, and the
size of each mapping.

The first section in the __OBJC segment we will look at is the __class
section. To understand it, we'll take a quick look at how IDA displays
this section.
    __class:00003000 ; ===========================================================================
    __class:00003000
    __class:00003000 ; Segment type: Pure data
    __class:00003000 ; Segment alignment '32byte' can not be represented in assembly
    __class:00003000 __class         segment para public 'DATA' use32
    __class:00003000                 assume cs:__class
    __class:00003000                 ;org 3000h
    __class:00003000                 public _objc_class_name_Talker
    __class:00003000 _objc_class_name_Talker __class_struct <offset stru_3040, offset aNsobject, offset aTalker, 0,\
    __class:00003000                                         ; DATA XREF: __symbols:000030B0o
    __class:00003000                 1, 4, 0, offset dword_3070, 0, 0> ; "NSObject"
    __class:00003028                 align 10h
    __class:00003028 __class         ends
    __class:00003028

From IDA's dump of this section (from our hello binary) we can see that
this section is pretty much where our objc_class structs are stored.

    struct objc_class {
        struct objc_class*         isa;
        struct objc_class*         super_class;
        const char*                name;
        long                       version;
        long                       info;
        long                       instance_size;
        struct objc_ivar_list*     ivars;
        struct objc_method_list**  methodLists;
        struct objc_cache*         cache;
        struct objc_protocol_list* protocols;
    };

More precisely, this is where the class structs that each instance's isa
pointer refers to are stored: in the Fraction example above, the
instance's isa pointer was 0x3000, the start of this section.

An interesting note is that, from what I've seen, gcc almost always picks
0x3000 for this section, so it's fairly reliable to use this area in an
exploit if the need arises.

The next section we'll look at is the __meta_class section.
    __meta_class:00003040 ; ===========================================================================
    __meta_class:00003040
    __meta_class:00003040 ; Segment type: Pure data
    __meta_class:00003040 ; Segment alignment '32byte' can not be represented in assembly
    __meta_class:00003040 __meta_class    segment para public 'DATA' use32
    __meta_class:00003040                 assume cs:__meta_class
    __meta_class:00003040                 ;org 3040h
    __meta_class:00003040 stru_3040       __class_struct <offset aNsobject, offset aNsobject, offset aTalker, 0,\
    __meta_class:00003040                                         ; DATA XREF: __class:_objc_class_name_Talkero
    __meta_class:00003040                 2, 30h, 0, 0, 0, 0> ; "NSObject"
    __meta_class:00003068                 align 10h
    __meta_class:00003068 __meta_class    ends
    __meta_class:00003068

Again, as you can see, this section is filled with objc_class structs.
This time, however, the structs are the metaclasses: the isa field of
the class struct in the __class section points here (offset stru_3040
above), and the metaclass describes the class object itself, including
its class methods. Note that in the file the isa and super_class fields
still point at the class name string "NSObject"; the runtime resolves
these names when the image is loaded.

The __inst_meth section (shown below) contains the instance method lists
used by the classes. Each list starts with an obsolete field (0 here) and
a method count (1), followed by objc_method entries holding the selector
name, the type string and the method_imp pointer to the implementation.
These implementation pointers can be changed to gain control of
execution.

    __inst_meth:00003070 ; ===========================================================================
    __inst_meth:00003070
    __inst_meth:00003070 ; Segment type: Pure data
    __inst_meth:00003070 __inst_meth     segment dword public 'DATA' use32
    __inst_meth:00003070                 assume cs:__inst_meth
    __inst_meth:00003070                 ;org 3070h
    __inst_meth:00003070 dword_3070      dd 0                    ; DATA XREF: __class:_objc_class_name_Talkero
    __inst_meth:00003074                 dd 1
    __inst_meth:00003078                 dd offset aSay, offset aV12@048, offset __Talker_say__ ; "say:"
    __inst_meth:00003078 __inst_meth     ends
    __inst_meth:00003078

The __message_refs section simply contains pointers to all the selector
names used throughout the application. The strings themselves are stored
in the __cstring section; __message_refs holds the pointers to them.
    __message_refs:000030B4 ; ===========================================================================
    __message_refs:000030B4
    __message_refs:000030B4 ; Segment type: Pure data
    __message_refs:000030B4 __message_refs  segment dword public 'DATA' use32
    __message_refs:000030B4                 assume cs:__message_refs
    __message_refs:000030B4                 ;org 30B4h
    __message_refs:000030B4 off_30B4        dd offset aRelease      ; DATA XREF: _main+68o
    __message_refs:000030B4                                         ; "release"
    __message_refs:000030B8 off_30B8        dd offset aSay          ; DATA XREF: _main+47o
    __message_refs:000030B8                                         ; "say:"
    __message_refs:000030BC off_30BC        dd offset aInit         ; DATA XREF: _main+2Do
    __message_refs:000030BC                                         ; "init"
    __message_refs:000030C0 off_30C0        dd offset aAlloc        ; DATA XREF: _main+17o
    __message_refs:000030C0 __message_refs  ends                    ; "alloc"
    __message_refs:000030C0

The __cls_refs section contains pointers to the names of the classes in
our application. Again, the strings themselves are stored in the __cstring
section; __cls_refs simply contains an array of pointers to them. IDA
displays the single entry here as raw bytes; read as a little-endian
dword, C9 1F 00 00 is the pointer 0x00001fc9.

    __cls_refs:000030C4 ; ===========================================================================
    __cls_refs:000030C4
    __cls_refs:000030C4 ; Segment type: Regular
    __cls_refs:000030C4 __cls_refs      segment dword public '' use32
    __cls_refs:000030C4                 assume cs:__cls_refs
    __cls_refs:000030C4                 ;org 30C4h
    __cls_refs:000030C4                 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, gs:nothing
    __cls_refs:000030C4 unk_30C4        db 0C9h ; +             ; DATA XREF: _main+Do
    __cls_refs:000030C5                 db  1Fh
    __cls_refs:000030C6                 db    0
    __cls_refs:000030C7                 db    0
    __cls_refs:000030C7 __cls_refs      ends
    __cls_refs:000030C7

I'm not really sure what the __image_info section is used for, but it's
good for us to use in our binary infector.
:P

    __image_info:000030C8 ; ===========================================================================
    __image_info:000030C8
    __image_info:000030C8 ; Segment type: Regular
    __image_info:000030C8 __image_info    segment dword public '' use32
    __image_info:000030C8                 assume cs:__image_info
    __image_info:000030C8                 ;org 30C8h
    __image_info:000030C8                 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, gs:nothing
    __image_info:000030C8                 align 10h
    __image_info:000030C8 __image_info    ends
    __image_info:000030C8

One section that was missing from the hello binary in the IDA listings
above, but is present in most compiled Objective-C files, is the
__instance_vars section. The example below comes from our Fraction
binary: its address, 0x30c4, is the ivars pointer we found in memory in
the previous section, and its size, 0x1c (28 bytes), is a 4-byte count
plus two 12-byte objc_ivar entries.

    Section
      sectname __instance_vars
       segname __OBJC
          addr 0x000030c4
          size 0x0000001c
        offset 8388
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0

The reason this section was omitted from our hello binary is that the
program has no classes with instance variables: Talker simply has a
method which takes a string and prints it. (The otool output in this
paper appears to come from a build of hello whose Talker did have an
instance variable; otool -o in section 4.1 shows an ivar_count of 1.)

The __instance_vars section holds the ivar lists described at the end of
section 3.1: each begins with a count, followed by an array of objc_ivar
structs.

    struct objc_ivar {
        char *ivar_name;
        char *ivar_type;
        int   ivar_offset;
    };

I skipped a few of the self-explanatory sections, like __symbols, but
hopefully this served as an introduction to the information available to
us in the binary. In the next sections we'll look at tools to turn this
information into something more human-readable.


--[ 4 - Reverse Engineering Objective-C Applications.

As I'm sure you can imagine having read this far, with such a variety of
information present in the binary and in memory at runtime, reverse
engineering Objective-C applications is quite a bit easier than reverse
engineering their C or C++ counterparts. In the following section I will
run through some of the tools and methods that help when reverse
engineering Objective-C applications on Mac OS X, both on disk and at
runtime.
------[ 4.1 - Static analysis toolset

First up, let's take a look at how we can access this information
statically from disk. A variety of tools exist to help us with this task.

The first tool is one we've used previously in this paper: otool. Otool
on Mac OS X is basically the equivalent of objdump on other platforms
(NOTE: objdump can obviously be compiled for Mac OS X too). Otool will
not only dump assembly code for particular sections and header
information for Mach-O files, but it can display our Objective-C
information as well.

The "-o" flag tells otool to dump the Objective-C segment in a readable
fashion. The output below shows this command run against our hello
binary from earlier.

    -[dcbz@megatron:~/code/HelloWorld/build]$ otool -o hello
    hello:
    Objective-C segment
    Module 0x30b0
        version 7
           size 16
           name 0x00001fa8
         symtab 0x000030d0
        sel_ref_cnt 0
        refs 0x00000000 (not in an __OBJC section)
        cls_def_cnt 1
        cat_def_cnt 0
        Class Definitions
        defs[0] 0x00003000
                  isa 0x00003040
          super_class 0x00001fa9
                 name 0x00001fb2
              version 0x00000000
                 info 0x00000001
        instance_size 0x00000008
                ivars 0x000030a0
           ivar_count 1
            ivar_name 0x00001fc6
            ivar_type 0x00001fde
          ivar_offset 0x00000004
              methods 0x00003080
             obsolete 0x00000000
         method_count 2
          method_name 0x00001fc1
         method_types 0x00001fd4
           method_imp 0x00001f13
          method_name 0x00001fb9
         method_types 0x00001fca
           method_imp 0x00001f02
                cache 0x00000000
            protocols 0x00000000 (not in an __OBJC section)
        Meta Class
                  isa 0x00001fa9
          super_class 0x00001fa9
                 name 0x00001fb2
              version 0x00000000
                 info 0x00000002
        instance_size 0x00000030
                ivars 0x00000000 (not in an __OBJC section)
              methods 0x00000000 (not in an __OBJC section)
                cache 0x00000000
            protocols 0x00000000 (not in an __OBJC section)
    Module 0x30c0
        version 7
           size 16
           name 0x00001fa8
         symtab 0x00002034 (not in an __OBJC section)
    Contents of (__OBJC,__image_info) section
      version 0
        flags 0x0 RR

As you can see, this output provides us with a variety of information
such as the addresses of our class definitions, the names, types and
instance offsets of their instance variables, and the names, type strings
and implementation addresses of their methods.

Most of the time, however, it is more useful to see a human-readable
interface description for our binary. This can be arranged using the
class-dump tool available from [14].

    -[dcbz@megatron:~/code/HelloWorld/build]$ /Volumes/class-dump-3.1.2/class-dump hello
    /*
     *     Generated by class-dump 3.1.2.
     *
     *     class-dump is Copyright (C) 1997-1998, 2000-2001, 2004-2007 by Steve
     *     Nygard.
     */

    /*
     * File: hello
     * Arch: Intel 80x86 (i386)
     */

    @interface Talker : NSObject
    {
    }

    - (void)say:(char *)fp8;

    @end

The output above shows class-dump being run against our small hello
binary from the previous sections. Our example is pretty tiny, but it
still demonstrates the format in which class-dump displays its
information. Running the tool against Safari gives a clearer picture of
the kind of information class-dump can give us.

    /*
     *     Generated by class-dump 3.1.2.
     *
     *     class-dump is Copyright (C) 1997-1998, 2000-2001, 2004-2007 by Steve
     *     Nygard.
     */

    struct AliasRecord;

    struct CGAffineTransform {
        float a;
        float b;
        float c;
        float d;
        float tx;
        float ty;
    };

    struct CGColor;

    struct CGImage;

    struct CGPoint {
        float x;
        float y;
    };
    ...
    @protocol NSDraggingInfo
    - (id)draggingDestinationWindow;
    - (unsigned int)draggingSourceOperationMask;
    - (struct _NSPoint)draggingLocation;
    - (struct _NSPoint)draggedImageLocation;
    - (id)draggedImage;
    - (id)draggingPasteboard;
    - (id)draggingSource;
    - (int)draggingSequenceNumber;
    - (void)slideDraggedImageTo:(struct _NSPoint)fp8;
    - (id)namesOfPromisedFilesDroppedAtDestination:(id)fp8;
    @end
    ...

Class-dump is a very valuable tool and definitely one of the first things
I run when trying to understand the purpose of an Objective-C binary.

Back when the earth was flat and Mac OS X ran mostly on the PowerPC
architecture, Braden started work on a really cool tool called
"code-dump".
Code-dump was built on top of the class-dump source and, rather than just
dumping class definitions, was designed to decompile Objective-C code.
Unfortunately code-dump was never updated after that, but to me the idea
is still very sound. It would be really cool to see Objective-C support
added to Hex-Rays in the future; I think you could get some really
reliable output from it.

However, until someone bothers to write a real decompiler for Intel
Objective-C binaries, the closest thing we have is OTX.app. OTX (hosted
on one of the coolest domains ever) [15] is a GUI tool for Mac OS X which
takes a Mach-O binary as input and uses otool output to produce an
assembly listing. It queries the Objective-C sections of the binary for
information and populates the assembly with comments. Let's take a look
at the output from OTX running against the Safari web browser.

    -(id)[AppController(FileInternal) _closeMenuItem]
        +0   00003f70  55                    pushl     %ebp
        +1   00003f71  89e5                  movl      %esp,%ebp
        +3   00003f73  83ec18                subl      $0x18,%esp
        +6   00003f76  a1cc6c1e00            movl      0x001e6ccc,%eax    _fileMenu
        +11  00003f7b  89442404              movl      %eax,0x04(%esp)
        +15  00003f7f  8b4508                movl      0x08(%ebp),%eax
        +18  00003f82  890424                movl      %eax,(%esp)
        +21  00003f85  e812ee2000            calll     0x00212d9c         -[(%esp,1) _fileMenu]
        +26  00003f8a  8b15bc6c1e00          movl      0x001e6cbc,%edx    performClose:
        +32  00003f90  c744240800000000      movl      $0x00000000,0x08(%esp)
        +40  00003f98  8954240c              movl      %edx,0x0c(%esp)
        +44  00003f9c  8b15c46c1e00          movl      0x001e6cc4,%edx    itemWithTarget:andAction:
        +50  00003fa2  890424                movl      %eax,(%esp)
        +53  00003fa5  89542404              movl      %edx,0x04(%esp)
        +57  00003fa9  e8eeed2000            calll     0x00212d9c         -[(%esp,1) itemWithTarget:andAction:]
        +62  00003fae  c9                    leave
        +63  00003faf  c3                    ret

The comments in the above output are pretty clear: they show the name of
the method, the selector loaded by each instruction that reads from
__message_refs, and the message sent by each call.
Unfortunately, working from a .txt file containing assembly is still
pretty painful; these days most people use IDA Pro to navigate an
assembly listing. Back when I was first doing this research, I wrote an
IDAPython script which parsed the .txt output from OTX, stole all the
comments and added them to IDA. It also took the method names, renamed
the functions appropriately and added cross-references where appropriate.
Unfortunately I haven't been able to locate this script since I got back
from my forced time off :( If I do find it, I'll put it up on
felinemenace in case anyone is interested.

Thankfully, while I've been away, a few people seem to have written IDC
scripts to pull information from the __OBJC segment and populate the
IDB. I'm sure you can google around and find them yourselves, but a
couple are available at [16] and [17].


------[ 4.2 - Runtime analysis toolset

In the previous section we explored how to access the Objective-C
information present in the binary without executing it. In this section
I will cover how to interact with the Objective-C runtime in a live
process in order to understand program flow and assist in reverse
engineering.

The first tool we'll look at lives in the libobjc.A.dylib library
itself. By setting the OBJC_HELP environment variable to anything
non-zero and then running an Objective-C application, we can see the
options that are available to us.
    % OBJC_HELP=1 ./build/Debug/HelloWorld
    objc: OBJC_HELP: describe Objective-C runtime environment variables
    objc: OBJC_PRINT_OPTIONS: list which options are set
    objc: OBJC_PRINT_IMAGES: log image and library names as the runtime loads them
    objc: OBJC_PRINT_CONNECTION: log progress of class and category connections
    objc: OBJC_PRINT_LOAD_METHODS: log class and category +load methods as they are called
    objc: OBJC_PRINT_RTP: log initialization of the Objective-C runtime pages
    objc: OBJC_PRINT_GC: log some GC operations
    objc: OBJC_PRINT_SHARING: log cross-process memory sharing
    objc: OBJC_PRINT_CXX_CTORS: log calls to C++ ctors and dtors for instance variables
    objc: OBJC_DEBUG_UNLOAD: warn about poorly-behaving bundles when unloaded
    objc: OBJC_DEBUG_FRAGILE_SUPERCLASSES: warn about subclasses that may have been broken by subsequent changes to superclasses
    objc: OBJC_USE_INTERNAL_ZONE: allocate runtime data in a dedicated malloc zone
    objc: OBJC_ALLOW_INTERPOSING: allow function interposing of objc_msgSend()
    objc: OBJC_FORCE_GC: force GC ON, even if the executable wants it off
    objc: OBJC_FORCE_NO_GC: force GC OFF, even if the executable wants it on
    objc: OBJC_CHECK_FINALIZERS: warn about classes that implement -dealloc but not -finalize
    2006-04-22 12:08:17.544 HelloWorld[4831] Hello, World!

This help is pretty self-explanatory: to use any of this functionality,
you simply set the appropriate environment variable before running your
Objective-C application, and the runtime does the rest.

Another environment variable which is useful for runtime analysis of
Objective-C applications is NSObjCMessageLoggingEnabled. If this
variable is set to "Yes", all objc_msgSend calls are logged to the file
/tmp/msgSends-<pid>. This is also obeyed for suid Objective-C apps, which
is very useful. The output below demonstrates the use of this variable to
log objc_msgSend calls for our "HelloWorld" application.
    -[dcbz@megatron:~/code/HelloWorld/build]$ NSObjCMessageLoggingEnabled=Yes ./hello
    Hello World!
    -[dcbz@megatron:~/code/HelloWorld/build]$ cat /tmp/msgSends-6686
    + NSRecursiveLock NSObject initialize
    + NSRecursiveLock NSObject new
    + NSRecursiveLock NSObject alloc
    ....
    + Talker NSObject initialize
    + Talker NSObject alloc
    + Talker NSObject allocWithZone:
    - Talker NSObject init
    - Talker Talker say:
    - Talker NSObject release
    - Talker NSObject dealloc

From this output it is easy to see exactly what our application was doing
when we ran it.

To take message tracing further, dtrace can be used to spy on
Objective-C methods. Taken straight from the dtrace man page, dtrace
supports an Objective-C provider with the following syntax:

    """
    OBJECTIVE C PROVIDER
         The Objective C provider is similar to the pid provider, and allows
         instrumentation of Objective C classes and methods. Objective C probe
         specifiers use the following format:

               objcpid:[class-name[(category-name)]]:[[+|-]method-name]:[name]

         pid           The id number of the process.

         class-name    The name of the Objective C class.

         category-name The name of the category within the Objective C class.

         method-name   The name of the Objective C method.

         name          The name of the probe, entry, return, or an integer
                       instruction offset within the method.

    OBJECTIVE C PROVIDER EXAMPLES
         objc123:NSString:-*:entry
               Every instance method of class NSString in process 123.

         objc123:NSString(*)::entry
               Every method on every category of class NSString in process 123.

         objc123:NSString(foo):+*:entry
               Every class method in NSString's foo category in process 123.

         objc123::-*:entry
               Every instance method in every class and category in process 123.

         objc123:NSString(foo):-dealloc:entry
               The dealloc method in the foo category of class NSString in
               process 123.

         objc123::method?with?many?colons:entry
               The method method:with:many:colons in every class in process 123. (A ?
               wildcard must be used to match colon characters inside of
               Objective C method names, as they would otherwise be parsed as
               the provider field separators.)
    """

This can be used as a message tracer for a particular class. You could
even use it to write a simple fuzzer. There are plenty of tutorials out
on the interwebz about writing .d scripts, and honestly I'm still very
new to it, so I'm going to leave this topic for now.

I'd imagine that most people reading this paper are already pretty
familiar with gdb. On Mac OS X, Apple have modified gdb slightly to
better support Objective-C objects. The first notable change I can think
of is the print-object command:

    (gdb) help print-object
    Ask an Objective-C object to print itself.

To show an example of this, we can fire up gdb on our hello example
Objective-C application..

    -[dcbz@megatron:~/code/HelloWorld/build]$ gdb hello
    GNU gdb 6.3.50-20050815 (Apple version gdb-768)
    (gdb) set disassembly-flavor intel
    (gdb) disas main
    Dump of assembler code for function main:
    0x00001f3d <main+0>:    push   ebp
    0x00001f3e <main+1>:    mov    ebp,esp
    0x00001f40 <main+3>:    push   ebx
    [...]
    0x00001f96 <main+89>:   mov    DWORD PTR [esp+0x4],edx
    0x00001f9a <main+93>:   mov    DWORD PTR [esp],ecx
    0x00001f9d <main+96>:   call   0x4005 <dyld_stub_objc_msgSend>
    0x00001fa2 <main+101>:  mov    edx,DWORD PTR [ebp-0xc]
    0x00001fa5 <main+104>:  lea    eax,[ebx+0x116b]
    [...]
    0x00001fb9 <main+124>:  add    esp,0x24
    0x00001fbc <main+127>:  pop    ebx
    0x00001fbd <main+128>:  leave
    0x00001fbe <main+129>:  ret
    End of assembler dump.

.. and stick a breakpoint on one of the calls to objc_msgSend() from
main().
    (gdb) b *0x00001f9d
    Breakpoint 1 at 0x1f9d
    (gdb) r
    Starting program: /Users/dcbz/code/HelloWorld/build/hello

    Breakpoint 1, 0x00001f9d in main ()
    (gdb) stepi
    0x00004005 in dyld_stub_objc_msgSend ()
    (gdb)
    0x94e0c670 in objc_msgSend ()
    (gdb)
    0x94e0c674 in objc_msgSend ()
    (gdb)
    0x94e0c678 in objc_msgSend ()

We stepi a few instructions so that objc_msgSend() loads the receiver
(id) into eax and the selector into ecx, as we've done previously in this
paper.

    (gdb) po $eax
    <Talker: 0x103240>

Then we use the "po" command on our object pointer, which shows that we
have an instance of the Talker class at 0x103240 on the heap.

    (gdb) x/x $eax
    0x103240:       0x00003000
    (gdb) po 0x3000
    Talker

As you can see, if you use the "po" command on an isa pointer, it simply
prints the name of the class.

Some of the coolest techniques I've seen for manipulating the
Objective-C runtime involve injecting an interpreter for the language of
your choice into the address space of the running process, and then
manipulating the classes in memory from there. None of the
implementations of this I've seen come anywhere near F-Script Anywhere
[18]. It's hard to explain this tool in .txt format, but if you have a
Mac you should grab it and check it out.

Basically, when you run F-Script Anywhere you are presented with a list
of all the running Objective-C applications on the system. You can
select one and click the Install button to inject the F-Script
interpreter into that process.

On Leopard, however, before you use this tool you must make it setgid
procmod, due to the debugging restrictions around task_for_pid(). To do
this, just:

    -[root@megatron:/Applications/F-Script Anywhere.app/Contents/MacOS]$ chgrp procmod F-Script\ Anywhere
    -[root@megatron:/Applications/F-Script Anywhere.app/Contents/MacOS]$ chmod g+s F-Script\ Anywhere

Once the F-Script interpreter has been injected into your application, an
"FSA" menu will appear in the menu bar at the top of your screen.
This menu gives you the following options:

    - New F-Script Workspace.
    - Browser for target.

If you select "New F-Script Workspace", you are presented with a small
terminal in which to execute F-Script commands. The F-Script language is
very simple and documented on their website [18]; it looks very similar
to Objective-C itself. The interpreter window runs in the context of the
application, so any F-Script statements you execute can manipulate the
classes etc. within the target Objective-C application.

But what if you don't know the name of the class you want to manipulate?
The "Browser" button at the bottom of the terminal opens an object
browser for the target application. Clicking the "Classes" button at the
top of this window lists all the classes in the address space down the
side. Clicking on any of the classes brings up all the attributes and
methods for that class (methods are indicated with a colon, e.g.
"say:"). Double-clicking on any of the methods in this window calls the
method; if arguments are required, a window pops up prompting you to
supply them. This is very useful for exploring and testing the
functionality of your target.

Alternatively, rather than clicking the "New F-Script Workspace" option
in the FSA menu, you can select the "Browser for target" option. This
will change your cursor into some kind of weird clover/target thing. Once
this happens, clicking on any object in the GUI pops up an object browser
for that particular instance. This way we can call methods, view
attributes, see the object's address, etc.

You can do a lot more with F-Script Anywhere, but the best place to learn
is the website [18] itself.


------[ 4.3 - Cracking

I'm not going to spend too much time on this topic, as it's been covered
pretty well by curious in [19], and I've published a little on it
before in [13].
However, when attempting to crack Objective-C apps, it's always worth
running class-dump before you do anything else and reading over the
output. I can't count the number of times I've seen an application with a
method like createRegistrationKey() which you can call from F-Script
Anywhere, or isRegistered() which is easily noppable.

With all the Objective-C information at your disposal, cracking the
majority of applications on Mac OS X becomes quite trivial. Honestly,
let's face it: people writing applications for Mac OS X care about the
pretty GUI, not the binary protection schemes available.


------[ 4.4 - Objective-C Binary Infection

Again, I won't spend too much time on this section. Dino let me know
recently that Vincenzo Iozzo (snagg@openssl.it) apparently gave a talk
at DeepSec last year on infecting the Objective-C structures in a Mach-O
binary. I couldn't find any information on it on Google, so I'll release
my technique; however, if you want to read about a (probably much, much
better) technique, look up Vincenzo's work.

The method I propose is quite simple: it involves looking for sections
in the __OBJC segment that are followed by padding, writing our
shellcode into that padding, and then overwriting a method's
implementation pointer with the address of the start of our shellcode.
When the shellcode finishes executing, the original implementation
address is called.

While this method is more complicated/convoluted than other Mach-O
infection techniques, it makes no attempt to modify the entry point,
which makes it harder for the uninitiated to detect.

In order to demonstrate this procedure, I wrote the following tiny
assembly code.
    -[dcbz@megatron:~/code]$ cat infected.asm
    BITS 32

    SECTION .text

    _main:
            xor     eax,eax
            push    byte 0xa
            jmp     short down
    up:
            push    eax
            mov     al,0x04
            push    eax             ; fake
            int     0x80
            jmp     short end
    down:
            call    up
            db      "infected!",0x0a,0x00
    end:
            int3
    -[dcbz@megatron:~/code]$ cat tst.c
    char sc[] =
    "\x31\xc0\x6a\x0a\xeb\x08\x50\xb0\x04\x50\xcd\x80\xeb\x10\xe8\xf3"
    "\xff\xff\xff\x69\x6e\x66\x65\x63\x74\x65\x64\x21\x0a\x00\xcc";

    int main(int ac, char **av)
    {
            void (*fp)() = sc;
            fp();
    }
    -[dcbz@megatron:~/code]$ gcc tst.c -o tst
    tst.c: In function 'main':
    tst.c:7: warning: initialization from incompatible pointer type
    -[dcbz@megatron:~/code]$ ./tst
    infected!
    Trace/BPT trap

As you can see, when executed this code simply prints the string
"infected!\n" using the write() system call. This will be the parasite
code; our poor little HelloWorld project will be the host.

The first step in our infection process is to locate a little slab of
space in the file where we can stick our code. Our code is around 30
bytes long (31 to be exact), so we'll need around 36 bytes in order to
also call the old address and complete the hook.

Looking at the first two sections in our __OBJC segment, the first has an
offset of 8192 and a size of 0x30, and the second has an offset of 8256.

    Section
      sectname __class
       segname __OBJC
          addr 0x00003000
          size 0x00000030
        offset 8192
         align 2^5 (32)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0
    Section
      sectname __meta_class
       segname __OBJC
          addr 0x00003040
          size 0x00000030
        offset 8256
         align 2^5 (32)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0

If we do the math on the first one:

    >>> 8192 + 0x30
    8240

This means there are 16 bytes of padding in the file (8256 - 8240) that
we could use to store code if needed. However, since our code is quite a
bit bigger than this, it would be painful to squeeze it in here.
Fortunately, we can use the __OBJC.__image_info section: there is a ton
of padding straight after it.
    Section
      sectname __image_info
       segname __OBJC
          addr 0x000030c8
          size 0x00000008
        offset 8392
         align 2^2 (4)
        reloff 0
        nreloc 0
         flags 0x00000000
     reserved1 0
     reserved2 0

So this is where we can store our code. But first, we need to increase
the size of this section in the header. We can do this using HTE [20].