                             ==Phrack Inc.==

               Volume 0x0f, Issue 0x45, Phile #0x09 of 0x10

|=-----------------------------------------------------------------------=|
|=-----------=[ Modern Objective-C Exploitation Techniques ]=------------=|
|=-----------------------------------------------------------------------=|
|=----------------------------=[ by nemo ]=------------------------------=|
|=-----------------------=[ nemo@felinemenace.org ]=---------------------=|
|=-----------------------------------------------------------------------=|

--[ Introduction

Hello again reader. Over the years the exploitation process has obviously
shifted in complexity. What once began with the straightforward case of
turning a single bug into a reliable exploit has now evolved towards
combining vulnerability primitives together in an attempt to bypass each of
the memory protection hurdles present on a modern day operating system.
With this in mind, let's jump once again into the exploitation of
Objective-C based memory corruption vulnerabilities in a modern time.

Back in Phrack 0x42 (Phile #0x04) I wrote a paper documenting a way to turn
the most common Objective-C memory corruption primitive (an attacker
controlled Objective-C method call) into control of EIP. If you have not
read this paper, or if it's been a while and you need to refresh, it's
probably wise to do so now, as the first half of this paper only builds on
the techniques covered in the original [1]. Contrary to the beliefs of Ian
Beer, the techniques in the original paper are still alive and kicking in
modern times, however some adjustment is needed depending on the context of
the vulnerability.

--[ Dangling Objective-C Method Calls

As you're aware since you read my paper in [1], Objective-C method calls
are implemented by passing "messages" to the receiver (object) via the
objc_msgSend() API call. When Objective-C objects are allocated, storage
for their instance variables is allocated on the native heap with malloc().
The first element in this space is a pointer to the class definition in the
binary. This is typically referred to as the "ISA" pointer. As in: "an
NSString 'IS-A' NSObject".

When dealing with bugs in Objective-C applications it is extremely common
for this ISA pointer to be attacker controlled, resulting in an Objective-C
method call being performed on an attacker controlled memory location. This
can occur when dealing with use-after-free conditions, heap overflows into
Objective-C objects, and even format string bugs using the %@ format
character.

In my original paper [1] I wrote about how to utilize this construct to
perform a successful cache lookup for the selector value, resulting in
control of EIP. An alternative route to gain EIP control is to make the
Objective-C runtime think that it's finished looking through the entire
cache and found no match for the SEL value passed in. In this case the
runtime will attempt to resolve the method's address via the class
definition (through the controlled ISA pointer) and once again use an EIP
value from memory controlled by us. This method is longer, however, and
adds little benefit. But I digress; both of these methods are still
completely valid in the most current version of Mac OS X at this time,
Yosemite (10.10).

While, at the time of the Phrack 0x42 release, this technique was fairly
useful by itself, in modern times EIP/RIP control is only a small victory
and in no way wins the battle of process control. This is due to the fact
that even with direct control of EIP, modern NX and ASLR make it difficult
to know a reliable absolute location in which we can store a payload and
return to execute it.
From what I've seen, the most commonly used technique to bypass this
currently is to combine an EIP control primitive with an information leak
of a .text address in order to construct a ROP chain (returning repeatedly
into the text segment) which either executes the needed functionality,
mprotect()'s some shellcode before executing it, or loads an existing
executable or shared library.

Under the right conditions, it is possible to skip some of these steps and
turn a dangling Objective-C method call into both an information leak and
execution control. In order to use this technique, we must first know the
exact binary version in use on the target. Thankfully on Mac OS X this is
usually pretty easy, as automatic updates mean that most people are running
the same binary version.

The specifics of the technique differ depending on the architecture of the
target system, as well as the location of the particular SEL string which
is used in the dangling method call construct. Since we are already
familiar with 32-bit internals, we will begin our investigation of dangling
objc_msgSend() exploitation with the 32-bit runtime, before moving on to
look at the changes in the new runtime on 64-bit.

--[ 32-bit dangling objc_msgSend()

Firstly, 32-bit processes utilize the old Objective-C runtime, so the
specifics of the internals are identical to what is documented in my
original paper. However, depending on the location of the module containing
the selector string, the technique varies slightly.

----[ 32-bit Shared Region

The shared region is a mapping which is common to all processes on the
system. The file '/var/db/dyld/dyld_shared_cache_i386' is mapped into this
space. This file is generated by the "update_dyld_shared_cache" utility
during system update, and contains a large selection of libraries which are
commonly used on the system. The .paths files in
"/var/db/dyld/shared_region_roots" dictate which files are contained
within.
The order in which each library is added to this file is randomized,
therefore the offset into the file for a particular library cannot be
relied on. Reading the file '/var/db/dyld/dyld_shared_cache_i386.map' shows
the order of these files. For 32-bit processes, this file is mapped at the
fixed address 0x90000000. At this location there is a structure which
describes the contents of the shared region.

This technique, once again, revolves around the ability to control the ISA
pointer, and to point it at a fake class struct in memory. In order to
demonstrate how this works, a small sample Objective-C class was created
(shown below). The complete example of this technique is included at the
end of this paper in the uuencoded files blob.

[leakme.m]

    #import "leakme.h"

    @implementation leakme

    -(void) log
    {
        printf("lol\n");
    }

    @end

In main.m, we create an instance of this object, and then use sprintf() to
write out a string representation of the object's address, before
converting it back with atol(). This is pretty confusing, but it's
basically an easy way to trick the compiler into giving us a void pointer
to the object. Type casting the object pointer directly will not compile
with gcc.

    printf("[+] Class @ 0x%lx\n",l);
    sprintf(num,"%li",l);
    long *ptr = atol(num);
    ...
    printf("[+] Overwriting object\n");
    *ptr = &fc; // isa ptr

By overwriting the ISA pointer with the address of an allocation we
control, we can easily simulate a vulnerable scenario. Obviously in the
real world things are not that easy. We need to know the address of an
allocation which we control. There are a variety of ways this can be
accomplished. Some examples of these are:

  - Using a leak to read a pointer out of memory.
  - Abusing language weaknesses to infer an address. [2]
  - Abusing the predictable nature of large allocations.

However, these concepts are the topic of many other discussions and not
relevant to this particular technique.
As a quick refresher, the first thing the Objective-C runtime does when
attempting to call a method for an object (objc_msgSend()) is to retrieve
the location of the method cache for the object. This is done by offsetting
the ISA pointer by 0x20 and reading the pointer at this location. To
control this cache pointer we use the following structure:

    struct fakecache {
        char pad[0x20];
        long cache_ptr;
    };

In the example code we use a separate allocation for the fakecache struct
and the cache itself. However, in a real scenario the address of the cache
itself would most likely be the same address as the fakecache offset by
0x24. This would allow us to use a single allocation, and therefore a
single address, reducing the constraints of the exploit. Also, in a real
world case we could leak the address of the cache_ptr, then subtract 0x20
from its address. This would allow us to shave 0x20 bytes off of the buffer
we need to control.

Next, objc_msgSend() traverses the cache looking for a cached method call
matching the desired implementation. This is done by iterating through a
series of pointers to cache entries. Each entry contains a SEL which
matches the cached method SEL in the .text segment of the Objective-C
binary. By comparing this SEL value with the SEL value passed to
objc_msgSend(), the matching entry can be located and used.

Rather than iterating through every pointer to find the appropriate cache
entry each time, however, a mask is applied to the selector pointer. The
masked off bits are then shifted and used as an index into the cache table
entry pointer array. After this index is used, each entry is inspected.
This means that multiple entries can have the same index, however it
greatly reduces the search time of the cache. Controlling the mask provides
us with the mechanism we need to create a leak.

Ok, so going back to the mask. In my original Objective-C paper, we set the
mask to 0.
This forced the runtime to look directly past the mask regardless of what
value the SEL had. In this case, however, we want to abuse the mask in
order to isolate the "randomized" unpredictable bits in the selector
pointer value (SEL). Below, we can see a "real" SEL value from a 10.10
system, which is located in the shared region.

    (lldb) x/s $ecx
    0x90f3f86e: "length"

Since we know that the shared region begins at 0x90000000 we know that the
first nibble will always be 0x9. We also know that the offset into the page
which contains the SEL will always be the same, therefore the last 3
nibbles, 0x86e, will be the same for the binary version we retrieve the SEL
value from. However, we cannot count on the rest of the SEL value being the
same on the system we are running our exploit against.

For the value 0x90f3f86e we can see the bit pattern looks as follows:

       9    0    f    3    f    8    6    e
    1001 0000 1111 0011 1111 1000 0110 1110 : 0x90f3f86e

Based on what we just discussed, the mask which would retrieve the bits we
care about looks as follows:

       0    f    f    f    f    0    0    0
    0000 1111 1111 1111 1111 0000 0000 0000 : 0x0ffff000

However, since objc_msgSend() shifts the SEL 2 to the right prior to
applying the mask, we must shift our mask to account for this. This leaves
us with:

       0    3    f    f    f    c    0    0
    0000 0011 1111 1111 1111 1100 0000 0000 : 0x03fffc00

As you remember, objc_msgSend() applies the following calculation to
generate the index into the cache entries:

    index = (SEL >> 2) & mask

Filling in the values for this leaves us with an index like:

    index = (0x90f3f86e >> 2) & 0x03fffc00 == 0x3cfc00

This means that for our particular SEL value the runtime will index
0x3cfc00 * 4 (0xf3f000) bytes forward, and take the bucket pointer from
this location. It will then dereference the pointer and check for a SEL
match at that location. By creating a giant cache slide, containing entries
for all permutations of the randomized bits, we can make sure that this
location contains the right value regardless of the slide.
In the 32-bit runtime (the old runtime) the cache index is used to retrieve
a pointer to a cache_entry from an array of pointers (buckets). In our
example code (main.m) we set the buckets array up as follows:

    long *buckets = malloc((CACHESIZE + 1) * sizeof(void *));

However, in a typical exploitation scenario, this array would be part of
the single large allocation which we control. For each of the buckets
pointers, a cache entry must be allocated. In the example code we can use
the following struct for each of these entries:

    struct cacheentry {
        long sel;
        long pad;
        long eip;
    };

Each of these structures must be populated with a different SEL and EIP
value depending on its index into the table. For each of the possible index
values, we add the (unshifted) randomized bits to the SEL base. This way
the appropriate SEL is guaranteed to match after the mask is applied and
used to index the table.

For the EIP value, we can utilize the fact that the string table containing
the SEL string is always going to be relative to the .text segment within
the same binary. The diagram below shows this more clearly.

    ,_______________,<--- Mach-O base address
    |               |
    | mach-o header |
    +---------------+
    |               |<--- SEL in string table, relative to base
    | string table  |        /\  Relative offset
    +---------------+        \/  from SEL to ROP gadgets
    |               |<--- ROP gadget in .text segment
    | .text segment |
    '---------------'

For each possible entry in the table, the EIP value must be set to the
appropriate address relative to the SEL value used. The quickest way I know
to calculate these values is to break on the objc_msgSend function and dump
the current SEL value. In lldb this is simply a case of using
"reg read ecx". Next, "target module list -a $ecx" provides us with the
module base. By subtracting the module base from the absolute SEL address
we can get the relative offset within the module. This can be repeated for
the gadget address within the same module.
Next, when populating the table, we simply need to add these two relative
offsets to our potential module base candidate. We increment the module
base candidate for each entry in the table. By populating our cache slide
in this way we are guaranteed the execution of a single ROP gadget within
the module that our SEL is in. This can be enough for us to succeed. We
will look into ways to use this construct later.

Obviously the allocation used for this 32-bit technique is very large. To
calculate the size of the cache slide which we need to generate we need to
look at the size of the shared region. The shared region always starts at
0x90000000, but the first module inside the shared region starts at
0x90008000. The end of the shared region depends on the number of modules
loaded in the shared region. On the latest version of Mac OS X at this
time, the end of the shared region is located at 0x9c391000. The bit
patterns for these are shown below.

    10010000 00000000 10000000 00000000 :: SR START      -- 0x90008000
    10011100 00111001 00010000 00000000 :: SR END        -- 0x9C391000
    00001111 11111111 11110000 00000000 :: MASK UNSHIFTED

If we compare this to the unshifted mask, and mask off the bits we care
about, we get the following values for our potential index values.

    00000000 00000000 00100000 00000000 -- smallest index value - 0x2000
    00000011 00001110 01000100 00000000 -- biggest index value  - 0x30E4400

Since the buckets array is an array of 4 byte pointer values we can
multiply the largest index by 4, giving us 0xc391000. Each cache entry
pointed to by a bucket is 12 bytes in size. This means that the size of the
cache entry array is 0x24ab3000. By adding these two values together we get
the total size of our cache slide, 0x30e44000 bytes.

Allocations of this size can be difficult to make depending on the target
application. However, also due to the size, they are predictably placed
within the address space. This buffer can be made from JavaScript for
example.
----[ Uncommon 32-bit Libraries

Libraries which are not contained within the shared region are mapped in by
the linker when an executable is loaded that requires them as a dependency.
The location of these modules is always relative to the end of the
executable file, and they are loaded in the order specified by the
LC_LOAD_DYLIB headers.

When loading the executable file, the kernel generates a randomized slide
value for ASLR. This value is added to the desired segment load addresses
in the executable (if it's compiled with PIE) and then the executable is
re-based to that location.

    uintptr_t requestedLoadAddress = segPreferredLoadAddress(i) + slide;

The slide value is calculated by the kernel and then passed to the main
function of the dynamic loader. The following algorithm is responsible for
generating the slide value.

    aslr_offset = (unsigned int)random();
    max_slide_pages = vm_map_get_max_aslr_slide_pages(map);
    aslr_offset %= max_slide_pages;
    aslr_offset <<= vm_map_page_shift(map);

where:

    uint64_t
    vm_map_get_max_aslr_slide_pages(vm_map_t map)
    {
        return (1 << (vm_map_is_64bit(map) ? 16 : 8));
    }

    int
    vm_map_page_shift(vm_map_t map)
    {
        return VM_MAP_PAGE_SHIFT(map);
    }

    #define VM_MAP_PAGE_SHIFT(map) \
        ((map) ? (map)->hdr.page_shift : PAGE_SHIFT)

    #define PAGE_SHIFT      I386_PGSHIFT
    #define I386_PGSHIFT    12

So for example, a random() value of 0xdeadbeef would end up as the value
0xef000, with the following calculation:

    slide = ((0xdeadbeef % (1<<8)) << 12)
    slide = 0xef000

The gcc and llvm compilers both (by default) use a load address of 0x1000
for the text section of an executable. So for the slide value 0xef000 the
executable file would be based at 0x1000 + 0xef000 = 0xf0000. This means
that for the most part, you're dealing with roughly 1 byte of unpredictable
bits. Depending on the number of libraries loaded which are outside of the
shared region, this fluctuates; however, libraries are always loaded in the
order stipulated by the executable itself, so this is fairly predictable.
For our dangling objc_msgSend() technique this means that our mask
fluctuates depending on the target. In the best case, masking off the
single byte in the address can be achieved by using the mask
(0x000ff000 >> 2) == 0x3fc00.

--[ 64-bit dangling objc_msgSend()

The 64-bit version of this technique is quite different from its 32-bit
brethren. This is mostly due to the fact that 64-bit processes use a whole
new version of the runtime. In the new runtime, the objc_class structure is
no longer a basic C structure. Instead it uses C++ intrinsics to include
methods. The memory footprint for the new class is shown below.

    struct objc_class : objc_object {
        // Class ISA;
        Class superclass;
        cache_t cache;          // formerly cache pointer and vtable
        class_data_bits_t bits; // class_rw_t * plus custom rr/alloc flags
        ...
    }

The cache_t struct looks as follows:

    struct cache_t {
        struct bucket_t *_buckets;
        mask_t _mask;
        mask_t _occupied;
        ...
    }

and a bucket_t struct looks like:

    struct bucket_t {
    private:
        cache_key_t _key;
        IMP _imp;
        ...
    }

Putting this together: the main thing that has changed regarding the cache
lookup is that, rather than an array of pointers to cache entries, there is
simply a single pointer (at offset 0x10 into the structure) to an array of
SEL + method address entries. Following this, there's the mask, followed by
an occupied field indicating that entries in the cache exist.

The critical difference in the runtime is the way the mask is used to index
into this table. Rather than the (SEL >> 2) value in the 32-bit runtime,
the index is calculated via ((SEL & mask) << 4). This means, if we were to
abuse the mask in a similar way to the 32-bit technique, we would need a
mask of 0xffff0000 in order to isolate the randomized bits.
Obviously, even if we were able to make an allocation big enough to contain
the cache slide necessary for this, it would be such a time consuming act
to populate 4gb worth of cache entries to catch the index that this is not
really a feasible process. Instead we must utilize an additional
characteristic of the new runtime. The objc_msgSend() call at a high level
looks as follows:

    ISA = *class_ptr;
    offset = ((SEL & ISA->mask) << 4);

    while(ISA->buckets[offset].SEL != 0) {
        if(ISA->buckets[offset].SEL == SEL) {
            return ISA->buckets[offset].method(args);
        } else {
            offset--;
            continue;
        }
    }

This means that if we once again create a large slide containing entries
for all possible randomized bits, we simply need to point (using the index
we control) the runtime to the end of our slide, and let it walk backwards
until it finds a match.

----[ 64-bit Shared Region

In order to investigate this technique, we will begin again by looking at
the shared region in 64-bit processes. The shared region starts at the
address 0x7FFF80000000. Once again a cache file is mapped in, this time
from /var/db/dyld/dyld_shared_cache_x86_64. This file is, once again,
randomized upon creation; however, in 64-bit processes there is also a
random slide added to the file when it is mapped in. This is calculated
using sizeof(shared_region) - sizeof(cache file) as the max. As far as our
technique goes, however, this does not really change very much.

Calculating the mask value for this technique can be challenging. There are
a few constraints which we must work against in order to index our bucket
list to the last entry. To investigate this we will take a typical SEL
value:

    0x00007fff99f88447

The bit pattern is broken down below.

    SEL: 0x00 00 7f ff 99 f8 84 47

    00000000 00000000 01111111 11111111
    10011001 11111000 10000100 01000111

Unfortunately the mask variable is only 4 bytes long. This means that the
predictable bits in the upper 32 bits of the SEL are not available to us.
Also, the last 12 static bits (offset into page - 0x447) would result in an
index that is too small. If we used those bits we would not have a large
enough offset to index to the end of the slide. Luckily, we have one single
static bit in position 33 (bit 31, counting from zero at the least
significant bit) which we can count on being set. We can take advantage of
this bit with the following mask.

    Mask: 0x00 00 00 00 80 00 00 00

    00000000 00000000 00000000 00000000
    10000000 00000000 00000000 00000000

Applying this mask to any SEL value within the shared region will guarantee
the offset 0x80000000. Clearly this value is way beyond the end of our
required slide; however, since we also control the pointer to the bucket
slide, we can subtract (0x80000000 - sizeof(cache)) from the pointer value
to force it to point to the right location.

The example code main64.m demonstrates this technique. In this code, we use
a fakecache structure to control the initial cache lookup. A pad is used to
correctly position the bucket pointer and mask.

    struct fakecache {
        char pad[0x10];
        long bucketptr;
        long mask;
    };

Next, we allocate an array of cache entry structs in order to hold our SEL
slide. Obviously in a real attack all these elements would be in a single
allocation; however, for this example we will split them up for clarity.

    struct cacheentry {
        long sel;
        long rip;
    };

    struct cacheentry *buckets =
        malloc((NUMBUCKETS+1) * sizeof(struct cacheentry));

Initializing each of these elements is simply a case of incrementing the
random value added to the SEL each time, and populating each entry. Again,
the RIP value is calculated by adding a relative offset to the SEL in order
to locate our ROP gadget.

    for(slide = 0; slide < NUMBUCKETS; slide++) {
        buckets[slide].sel = BASESEL + (slide * 0x1000);
        buckets[slide].rip = buckets[slide].sel - 75654446;
    }

----[ Uncommon 64-bit Libraries

Once again, libraries which are not within the shared region are mapped
directly after the executable image in memory.
Typically the text segment address generated by the compiler is
0x100000000. The same code that we looked at earlier in the 32-bit section
is used to generate the slide. Here is an example of a slide for a 64-bit
process with the random() value of 0xdeadbeef.

    slide = ((0xdeadbeef % (1<<16)) << 12)
    slide = 0xbeef000

    example SEL = 0x10beef447

As you can see, in this example, there is no predictable bit in the lower
32 bits of the SEL which we can rely on to index to the end of our table.
Our only option here is to utilize the random bits in the SEL. We can do
this by repeating the entire spectrum of randomized values in our slide
multiple times. This way, depending on the value of the random bits, a
different offset will occur into the slide; however, in most scenarios it
will result in finding one instance of the correct entry.

--[ Single Gadget Exploitation Strategies

Now that we've looked at how to get execution to a predictable location of
our choice, the next step is to look at some ways to utilize this to our
advantage. Obviously there is an abundance of ways that this can be
utilized, but the following 3 methods are ways that I have seen succeed in
real life.

----[ Return SEL Gadget

At the moment when we gain execution control using this technique, a
register contains the SEL pointer value. We can use this fact to our
advantage. For example, for 32-bit code, the following gadget could take
advantage of this.

    00000000  89C8    mov eax,ecx
    00000002  5E      pop esi
    00000003  5D      pop ebp
    00000004  C3      ret

The gadget above moves the SEL pointer value into the eax register; on
function return this register is treated as the return value. Next it
restores ESI and EBP from the stack and uses the ret instruction to return
from the function. This results in the SEL value being returned, rather
than the expected return value for whatever Objective-C method was
dangling.
This is only a useful approach if we are able to retrieve the value from
this context and utilize it to re-trigger the bug. In the example code
provided, the use of this gadget causes the SEL value to be printed, rather
than the length of the NSString which is intended. You can see the result
of this below.

    -[nemo@objcbox:code]$ ./leak
    [+] buckets is 0x10000000 size.
    [+] cacheentry is 0x30000000 size.
    [+] Setting up buckets
    [+] Done
    [+] Class @ 0x78622240
    [+] Overwriting object
    [+] Calling method
    String length: 0x93371b88

Likewise, in some cases it may not make sense to return the SEL directly.
If it is not possible to retrieve the leaked value upon return, it may make
more sense to execute a gadget which writes ecx somewhere in memory. For
example, in a web browser context, writing the ecx register into a
JavaScript array which is attacker controlled may result in the ability to
"collect" this value from JavaScript context and re-trigger the bug.

----[ Self Modifying ROP

Another potential use of the single gadget execution primitive is to use
the ecx register containing SEL to modify the rest of a ROP chain prior to
pivoting to use it. I have never personally been successful with this;
however, I have seen this done in a friend's exploit. Finding a gadget
which accomplishes all this is extremely challenging.

----[ Arbitrary Write Gadget

The final method for using a single gadget to continue the exploitation
process is to turn the execution primitive into an arbitrary write
primitive. It is usually fairly straightforward to find a gadget which
allows you to write any high value to a fixed location. By positioning
something at this location (eg 0x0d0d0d0d) this single write can be
leveraged to escalate the available functionality. For example, in a web
context, positioning a JavaScript array or string at this location and then
writing to the length field can be enough to gain an arbitrary read/write
primitive from JavaScript.
This is easily enough to finish the exploitation process. Outside of the
browser context there are still a variety of length encoded data types
which can be used for this. Specific to Objective-C, the
NSMutableArray/NSArray classes work this way.

--[ Tagged Pointers

One of the new features added to the Objective-C runtime is the usage of
"tagged pointers" to conserve resources. Tagged pointers take advantage of
the fact that the system memory allocator will align pointers handed out on
natural alignment boundaries. This means that the low bit will never be
set.

    (lldb) print (long)malloc_good_size(1)
    (long) $0 = 16

The runtime takes advantage of this low bit in order to indicate that the
pointer value is not to be treated as a regular pointer; instead, the three
bits directly above it are used as an index into a table of potential ISA
pointers registered with the system. This means the remaining 60 bits can
then be used to store the object payload itself inline.

    Tagged pointer layout

    11111111 11111111 11111111 11111111
    11111111 11111111 11111111 1111[111][1]
                                     |   |
                                 index   tag

As mentioned, the index bits index into a table of potential object types.
The default types registered with the runtime are shown below.

    OBJC_TAG_NSAtom            = 0,
    OBJC_TAG_1                 = 1,
    OBJC_TAG_NSString          = 2,
    OBJC_TAG_NSNumber          = 3,
    OBJC_TAG_NSIndexPath       = 4,
    OBJC_TAG_NSManagedObjectID = 5,
    OBJC_TAG_NSDate            = 6,
    OBJC_TAG_7                 = 7

It is possible for a developer to add their own types to the table, however
it is very uncommon for anyone to do this. The guide at [3] clearly
illustrates the mechanics of tagged pointers, if you require more
information.

Now that we've looked at how tagged pointers work, we will investigate some
of them from an exploitation perspective.

----[ Tagged NSAtom

NSAtom is an extremely handy object type for exploitation. In order to use
a tagged NSAtom, we simply need the low bit set, indicating a tagged
pointer, and then no bits set in the index bits. The value 0x1 by itself,
for example, will satisfy this.
The beautiful thing about the NSAtom class is that calling any method name
on this class will result in success. The example code below simply calls
the method initWithUTF8String on the object 0x1. Clearly this is not a
valid pointer, and instead is treated as an NSAtom. Any method name could
be used and the result would still be 1.

    int main(int argc, const char * argv[])
    {
        printf("[+] NSAtom returned: %u\n",[1 initWithUTF8String:"lol"]);
        return 0;
    }

    $ ./nsatom
    [+] NSAtom returned: 1

As you can imagine, this behavior can be extremely useful for CoE
(continuation of execution) or general exploitation. An example scenario
would be: if you are forced to write through several Objective-C object
pointers on the path to an overwrite target, any method call on those
objects would require valid pointers/fake object setup. However, with the
NSAtom tagged pointer type, simply replacing these pointers with the value
0x1 can be enough to stop the crash and take advantage of the overwrite
target. Also, in extremely specific cases, the fact that this object
returns true can be used to manipulate the path of the program.

----[ Tagged NSString

The next tagged pointer type we will investigate is the tagged NSString.
With the new runtime, when an NSString is created, the size of the string
during initialization dictates the type of storage for the string. Strings
which are greater than 7 bytes in length are stored on the heap in a
typical Objective-C NSString object. However, for strings of 7 bytes or
less, a tagged pointer with the index 2 is used. The bit pattern for a
tagged NSString is shown below. It is comprised of 7 bytes of string data,
followed by 4 bits for the length, 3 bits for the index into the tagged
pointer types array, and finally the low bit to indicate a tagged pointer.
    <----------------[ String Data ]---------------->
    11111111 11111111 11111111 11111111
    11111111 11111111 11111111 [1111][010][1]
                                  |    |   |
                             length  index tag
                                     (02)

The first scenario in which we can abuse the properties of a tagged
NSString is a partial overwrite into an untagged NSString. The example code
included with this paper (nsstring1.m) demonstrates this. In this code
(shown below) we create an NSString (s) using the C string contents
"thisisaverylongstringnottagged". Since this is not 7 or fewer bytes in
length, this string is stored on the heap, and the object pointer points to
it. We use the character pointer (ptr) to simulate a 1 byte write into the
least significant byte of the object pointer. This condition can occur from
either a controlled overflow, or an actual 1 byte off-by-one. We write the
value 0xf5 to this byte, and then print the length and contents of the
string.

    int main(int argc, const char * argv[])
    {
        NSString *s = [[NSString alloc]
            initWithUTF8String:"thisisaverylongstringnottagged"];
        char *ptr = (char *)&s;
        *ptr = 0xf5; // NSString Tagged
        printf("[+] NSString @ 0x%lx\n",(unsigned long)s);
        printf("[+] String length: 0x%lx\n",(unsigned long)[s length]);
        NSLog(@"%@",s);
        return 0;
    }

The value 0xf5 in the least significant byte has the following bit pattern:

    [1111][010][1]

As you can see, this leaves us with a string length of 0xf, an index of 0x2
and the LSB set to indicate a tagged pointer. By only using a partial
overwrite, we have left the other 7 bytes of the pointer untouched. As you
can see from the output below, the length of the string is 0xf (15) after
this overwrite. This means that when NSLog() attempts to print the string
contents, 15 bytes of data are pulled out starting from the inline data.
This leaks the address of the object. If our target allows us to retrieve a
string value and use it, we can turn a one byte overwrite into an info leak
primitive.
    $ ./nsstring1
    [+] NSString @ 0x7fc0db4116f5
    [+] String length: 0xf
    2015-04-04 07:47:26.815 nsstring1[13335:92489992] eeeeeee 3eIjuaj

The next scenario which we will investigate involves overflowing into a
tagged NSString, rather than an untagged variant. The example code
nsstring2.m demonstrates this. In this code, we initialize an NSString with
the contents "AAAAAAA". Since this is only 7 bytes of C string, it
guarantees that the NSString will be a tagged type. This means it will
contain the value:

    0x4141414141414175

Essentially the first 7 bytes are taken up with our "A" contents. The last
byte contains the length (7) followed by the bit pattern indicating the
NSString type of tagged pointer.

Next, we once again simulate a single byte overflow into the object
pointer. This time we write the value 0x00, which is a common primitive in
real life due to off-by-one string operations. This forcefully unsets the
tagged LSB in the pointer, turning the tagged string into an untagged type.
Finally we call the length method on the object.

    int main(int argc, const char * argv[])
    {
        NSString *s = [[NSString alloc] initWithUTF8String:"AAAAAAA"];
        char *ptr = (char *)&s;
        *ptr = 0x00; // un-tag
        printf("[+] NSString @ 0x%lx\n",(unsigned long)s);
        printf("[+] String length: 0x%lx\n",(unsigned long)[s length]);
        NSLog(@"%@",s);
        return 0;
    }

As you can imagine, the runtime now treats our tagged object as untagged.
This means that the tagged pointer is now treated as a real pointer. If we
were able to control the contents of the NSString on initialization, this
would present us with direct control over the object cache lookup, allowing
us to use the construct presented earlier in the paper to turn this into
code execution.

    (lldb) r
    Process 13636 launched: './nsstring2' (x86_64)
    [+] NSString @ 0x4141414141414100
    Process 13636 stopped