                             ==Phrack Inc.==

               Volume 0x0b, Issue 0x3a, Phile #0x04 of 0x0e

|=------------=[ The advanced return-into-lib(c) exploits: ]=------------=|
|=------------------------=[ PaX case study ]=---------------------------=|
|=-----------------------------------------------------------------------=|
|=----------------=[ by Nergal <nergal@owl.openwall.com> ]=--------------=|


        May this night carry my will
        And may these old mountains forever remember this night
        May the forest whisper my name
        And may the storm bring these words to the end of all worlds

                        Ihsahn, "Alsvartr"


--[ 1 - Intro

    1 - Intro

    2 - Classical return-into-libc

    3 - Chaining return-into-libc calls
      3.1 - Problems with the classical approach
      3.2 - "esp lifting" method
      3.3 - frame faking
      3.4 - Inserting null bytes
      3.5 - Summary
      3.6 - The sample code

    4 - PaX features
      4.1 - PaX basics
      4.2 - PaX and return-into-lib exploits
      4.3 - PaX and mmap base randomization

    5 - The dynamic linker's dl-resolve() function
      5.1 - A few ELF data types
      5.2 - A few ELF data structures
      5.3 - How dl-resolve() is called from PLT
      5.4 - The conclusion

    6 - Defeating PaX
      6.1 - Requirements
      6.2 - Building the exploit

    7 - Misc
      7.1 - Portability
      7.2 - Other types of vulnerabilities
      7.3 - Other non-exec solutions
      7.4 - Improving existing non-exec schemes
      7.5 - The versions used

    8 - Referenced publications and projects

This article can be roughly divided into two parts. First, the advanced return-into-lib(c) techniques are described. Some of the presented ideas, or rather similar ones, have already been published by others. However, the available pieces of information are dispersed, usually platform-specific, somewhat limited, and the accompanying source code is not instructive enough (or at all). Therefore I have decided to assemble the available bits and a few of my thoughts into a single document, which should be useful as a convenient reference.
Judging by the contents of many posts on security lists, the presented information is by no means common knowledge.

The second part is devoted to methods of bypassing PaX in case of stack buffer overflow (other types of vulnerabilities are discussed at the end). The recent PaX improvements, namely randomization of the addresses at which the stack and the libraries are mmapped, pose a nontrivial challenge for an exploit coder. An original technique of calling the dynamic linker's symbol resolution procedure directly is presented. This method is very generic and the conditions required for successful exploitation are usually satisfied.

Because PaX is Intel platform specific, the sample source code has been prepared for Linux i386 glibc systems. PaX is not considered sufficiently stable by most people; however, the presented techniques (described for the Linux on i386 case) should be portable to other OSes/architectures and can possibly be used to evade other non-executability schemes, including ones implemented by hardware.

The reader is expected to be familiar with standard exploit techniques. Articles [1] and [2] should probably be assimilated before further reading. [12] contains a practical description of ELF internals.


--[ 2 - Classical return-into-libc

The classical return-into-libc technique is well described in [2], so just a short summary here. This method is most commonly used to evade protection offered by the non-executable stack. Instead of returning into code located within the stack, the vulnerable function should return into a memory area occupied by a dynamic library. It can be achieved by overflowing a stack buffer with the following payload:

<- stack grows this way
   addresses grow this way ->

------------------------------------------------------------------
| buffer fill-up(*)| function_in_lib | dummy_int32 | arg_1 | arg_2 | ...
------------------------------------------------------------------
                        ^
                        |
                        - this int32 should overwrite saved return address
                          of a vulnerable function

(*) buffer fill-up should overwrite saved %ebp placeholder as well, if the
    latter is used

When the function containing the overflown buffer returns, the execution will resume at function_in_lib, which should be the address of a library function. From this function's point of view, dummy_int32 will be the return address, and arg_1, arg_2 and the following words - the arguments. Typically, function_in_lib will be the libc system() function address, and arg_1 will point to "/bin/sh".


--[ 3 - Chaining return-into-libc calls

----[ 3.1 - Problems with the classical approach

The previous technique has two essential limitations. First, it is impossible to call another function, which requires arguments, after function_in_lib. Why ? When function_in_lib returns, the execution will resume at address dummy_int32. Well, it can be another library function, yet its arguments would have to occupy the same place that function_in_lib's arguments do. Sometimes this is not a problem (see [3] for a generic example).

Observe that the need for more than one function call is frequent. If a vulnerable application temporarily drops privileges (for example, a setuid application can do seteuid(getuid())), an exploit must regain privileges (usually with a call to setuid(something)) before calling system().

The second limitation is that the arguments to function_in_lib cannot contain null bytes (in case of a typical overflow caused by string manipulation routines). There are two methods to chain multiple library calls.


----[ 3.2 - "esp lifting" method

This method is designed for attacking binaries compiled with the -fomit-frame-pointer flag. In such case, the typical function epilogue looks this way:

eplg:
        addl    $LOCAL_VARS_SIZE,%esp
        ret

Suppose f1 and f2 are addresses of functions located in a library.
We build the following overflow string (I have skipped buffer fill-up to save space):

<- stack grows this way
   addresses grow this way ->

---------------------------------------------------------------------------
| f1 | eplg | f1_arg1 | f1_arg2 | ... | f1_argn| PAD | f2 | dmm | f2_args...
---------------------------------------------------------------------------
^           ^                                        ^
|           |                                        |
|           |<----------LOCAL_VARS_SIZE------------->|
|
|-- this int32 should overwrite return address
    of a vulnerable function

PAD is a padding (consisting of irrelevant nonzero bytes), whose length, added to the amount of space occupied by f1's arguments, should equal LOCAL_VARS_SIZE.

How does it work ? The vulnerable function will return into f1, which will see arguments f1_arg1, f1_arg2 etc - OK. f1 will return into eplg. The "addl $LOCAL_VARS_SIZE,%esp" instruction will move the stack pointer by LOCAL_VARS_SIZE, so that it will point to the place where the f2 address is stored. The "ret" instruction will return into f2, which will see arguments f2_args. Voila. We called two functions in a row.

A similar technique was shown in [5]. Instead of returning into a standard function epilogue, one has to find the following sequence of instructions in a program (or library) image:

pop-ret:
        popl    any_register
        ret

Such a sequence may be created as a result of a compiler optimization of a standard epilogue. It is pretty common.

Now, we can construct the following payload:

<- stack grows this way
   addresses grow this way ->

------------------------------------------------------------------------------
| buffer fill-up | f1 | pop-ret | f1_arg | f2 | dmm | f2_arg1 | f2_arg2 ...
------------------------------------------------------------------------------
                   ^
                   |
                   - this int32 should overwrite return address
                     of a vulnerable function

It works very similarly to the previous example. Instead of moving the stack pointer by LOCAL_VARS_SIZE, we move it by 4 bytes with the "popl any_register" instruction.
Therefore, all arguments passed to f1 can occupy at most 4 bytes. If we found a sequence

pop-ret2:
        popl    any_register_1
        popl    any_register_2
        ret

then we could pass to f1 two arguments of 4 bytes size each.

The problem with the latter technique is that it is usually impossible to find a "pop-ret" sequence with more than three pops. Therefore, from now on we will use only the previous variation.

In [6] one can find similar ideas, unfortunately with some errors and chaotically explained.

Note that we can chain an arbitrary number of functions this way. Another note: observe that we do not need to know the exact location of our payload (that is, we don't need to know the exact value of the stack pointer). Of course, if any of the called functions requires a pointer as an argument, and if this pointer should point within our payload, we will need to know its location.


----[ 3.3 - frame faking (see [4])

This second technique is designed to attack programs compiled _without_ the -fomit-frame-pointer option. An epilogue of a function in such a binary looks like this:

leaveret:
        leave
        ret

Regardless of the optimization level used, gcc will always precede "ret" with "leave". Therefore, we will not find in such a binary a useful "esp lifting" sequence (but see later the end of 3.5).

In fact, sometimes the libgcc.a archive contains objects compiled with the -fomit-frame-pointer option. During compilation, libgcc.a is linked into an executable by default. Therefore it is possible that a few "add $imm, %esp; ret" sequences can be found in an executable. However, we will not rely on this gcc feature, as it depends on too many factors (gcc version, compiler options used and others).

Instead of returning into an "esp lifting" sequence, we will return into "leaveret". The overflow payload will consist of logically separated parts; usually, the exploit code will place them adjacently.

<- stack grows this way
   addresses grow this way ->

                          saved FP   saved vuln. function's return address
--------------------------------------------
| buffer fill-up(*) | fake_ebp0 | leaveret |
-------------------------|------------------
                         |
   +---------------------+         (*) this time, buffer fill-up must not
   |                                   overwrite the saved frame pointer !
   v
-----------------------------------------------
| fake_ebp1 | f1 | leaveret | f1_arg1 | f1_arg2 ...
-----|-----------------------------------------
     |                       the first frame
   +-+
   |
   v
------------------------------------------------
| fake_ebp2 | f2 | leaveret | f2_arg1 | f2_arg2 ...
-----|------------------------------------------
     |                       the second frame
     +-- ...

fake_ebp0 should be the address of the "first frame", fake_ebp1 - the address of the second frame, etc.

Now, some imagination is needed to visualize the flow of execution.

1) The vulnerable function's epilogue (that is, leave;ret) puts fake_ebp0 into %ebp and returns into leaveret.
2) The next 2 instructions (leave;ret) put fake_ebp1 into %ebp and return into f1. f1 sees appropriate arguments.
3) f1 executes, then returns.

Steps 2) and 3) repeat, with f2, f3, ..., fn substituted for f1.

In [4] returning into a function epilogue is not used. Instead, the author proposed the following. The stack should be prepared so that the code would return into the place just after F's prologue, not into the function F itself. This works very similarly to the presented solution. However, we will soon face the situation when F is reachable only via PLT. In such case, it is impossible to return into the address F+something; only the technique presented here will work. (BTW, the PLT acronym means "procedure linkage table". This term will be referenced a few more times; if it does not sound familiar, have a look at the beginning of [3] for a quick introduction or at [12] for a more systematic description).

Note that in order to use this technique, one must know the precise location of fake frames, because fake_ebp fields must be set accordingly.
If all the frames are located after the buffer fill-up, then one must know the value of %esp after the overflow. However, if we manage somehow to put fake frames into a known location in memory (in a static variable preferably), there is no need to guess the stack pointer value.

It is possible to use this technique against programs compiled with -fomit-frame-pointer. In such case, we won't find a leave&ret code sequence in the program code, but usually it can be found in the startup routines (from crtbegin.o) linked with the program. Also, we must change the "zeroth" chunk to

-------------------------------------------------------
| buffer fill-up(*) | leaveret | fake_ebp0 | leaveret |
-------------------------------------------------------
                      ^
                      |
                      |-- this int32 should overwrite return address
                          of a vulnerable function

Two leaverets are required, because the vulnerable function will not set up %ebp for us on return. As the "fake frames" method has some advantages over "esp lifting", sometimes it is necessary to use this trick even when attacking a binary compiled with -fomit-frame-pointer.


----[ 3.4 - Inserting null bytes

One problem remains: passing to a function an argument which contains 0. But when multiple function calls are available, there is a simple solution. The first few called functions should insert 0s into the place occupied by the parameters to the next functions.

Strcpy is the most generic function which can be used. Its second argument should point to a null byte (located at some fixed place, probably in the program image), and the first argument should point to the byte which is to be nullified. Thus we can nullify a single byte per function call. If there is a need to zero a few int32 locations, perhaps other solutions will be more space-effective. For example, sprintf(some_writable_addr,"%n%n%n%n",ptr1, ptr2, ptr3, ptr4); will nullify a byte at some_writable_addr and nullify int32 locations at ptr1, ptr2, ptr3, ptr4.
Many other functions can be used for this purpose, scanf being one of them (see [5]).

Note that this trick solves one potential problem. If all libraries are mmapped at addresses which contain 0 (as in the case of the Solar Designer non-exec stack patch), we can't return into a library directly, because we can't pass null bytes in the overflow payload. But if strcpy (or sprintf, see [3]) is used by the attacked program, there will be the appropriate PLT entry, which we can use. The first few calls should be the calls to strcpy (precisely, to its PLT entry), which will nullify not the bytes in the function's parameters, but the bytes in the function address itself. After this preparation, we can call arbitrary functions from libraries again.


----[ 3.5 - Summary

Both presented methods are similar. The idea is to return from a called function not directly into the next one, but into some function epilogue, which will adjust the stack pointer accordingly (possibly with the help of the frame pointer), and transfer the control to the next function in the chain.

In both cases we looked for an appropriate epilogue in the executable body. Usually, we may use epilogues of library functions as well. However, sometimes the library image is not directly reachable. One such case has already been mentioned (libraries can be mmapped at addresses which contain a null byte); we will face another case soon. The executable's image is not position independent; it must be mmapped at a fixed location (in case of Linux, at 0x08048000), so we may safely return into it.


----[ 3.6 - The sample code

The attached files, ex-move.c and ex-frames.c, are the exploits for the vuln.c program. The exploits chain a few strcpy calls and a mmap call. The additional explanations are given in the following chapter (see 4.2); anyway, one can use these files as templates for creating return-into-lib exploits.


--[ 4 - PaX features

----[ 4.1 - PaX basics

If you have never heard of the PaX Linux kernel patch, you are advised to visit the project homepage [7]. Below there are a few quotations from the PaX documentation.

    "this document discusses the possibility of implementing non-executable pages for IA-32 processors (i.e. pages which user mode code can read or write, but cannot execute code in). since the processor's native page table/directory entry format has no provision for such a feature, it is a non-trivial task."

    "[...] there is a desire to provide some sort of programmatic way for protecting against buffer overflow based attacks. one such idea is the implementation of non-executable pages which eliminates the possibility of executing code in pages which are supposed to hold data only[...]"

    "[...] possible to write [kernel mode] code which will cause an inconsistent state in the DTLB and ITLB entries.[...] this very same mechanism would allow for creating another kind of inconsistent state where only data read/write accesses would be allowed and code execution prohibited. and this is what is needed for protecting against (many) buffer overflow based attacks."

To sum up, a buffer overflow exploit usually tries to run code smuggled within some data passed to the attacked process. The main PaX functionality is to disallow execution of all data areas - thus PaX renders typical exploit techniques useless.


----[ 4.2 - PaX and return-into-lib exploits

Initially, non-executable data areas were the only feature of PaX. As you may have already guessed, this is not enough to stop return-into-lib exploits. Such exploits run code located within libraries or the binary itself - perfectly "legitimate" code. Using the techniques described in chapter 3, one is able to run multiple library functions, which is usually more than enough to take advantage of the exploited program's privileges.

Even worse, the following code will run successfully on a PaX protected system:

    char shellcode[] = "arbitrary code here";
        mmap(0xaa011000, some_length, PROT_EXEC|PROT_READ|PROT_WRITE,
                MAP_FIXED|MAP_PRIVATE|MAP_ANON, -1, some_offset);
        strcpy(0xaa011000+1, shellcode);
        return into 0xaa011000+1;

A quick explanation: the mmap call will allocate a memory region at 0xaa011000. It is not related to any file object, thanks to the MAP_ANON flag, combined with the file descriptor equal to -1. The code located at 0xaa011000 can be executed even on PaX (because PROT_EXEC was set in the mmap arguments). As we see, the arbitrary code placed in "shellcode" will be executed.

Time for code examples. The attached file vuln.c is a simple program with an obvious stack overflow. Compile it with:

$ gcc -o vuln-omit -fomit-frame-pointer vuln.c
$ gcc -o vuln vuln.c

The attached files, ex-move.c and ex-frames.c, are the exploits for the vuln-omit and vuln binaries, respectively. The exploits attempt to run a sequence of strcpy() and mmap() calls. Consult the comments in the README.code for further instructions.

If you plan to test these exploits on a system protected with a recent version of PaX, you have to disable randomization of the mmap base with

$ chpax -r vuln; chpax -r vuln-omit


----[ 4.3 - PaX and mmap base randomization

In order to combat return-into-lib(c) exploits, a cute feature was added to PaX. If the appropriate option (CONFIG_PAX_RANDMMAP) is set during kernel configuration, the first loaded library will be mmapped at a random location (the next libraries will be mmapped after the first one). The same applies to the stack. The first library will be mmapped at 0x40000000+random*4k, the stack top will be equal to 0xc0000000-random*16; in both cases, "random" is a pseudo random unsigned 16-bit integer, obtained with a call to get_random_bytes(), which yields cryptographically strong data.

One can test this behavior by running the "ldd some_binary" command twice, or by executing "cat /proc/$$/maps" from within two invocations of a shell. Under PaX, the two calls yield different results:

nergal@behemoth 8 > ash
$ cat /proc/$$/maps
08048000-08058000 r-xp 00000000 03:45 77590      /bin/ash
08058000-08059000 rw-p 0000f000 03:45 77590      /bin/ash
08059000-0805c000 rw-p 00000000 00:00 0
4b150000-4b166000 r-xp 00000000 03:45 107760     /lib/ld-2.1.92.so
4b166000-4b167000 rw-p 00015000 03:45 107760     /lib/ld-2.1.92.so
4b167000-4b168000 rw-p 00000000 00:00 0
4b16e000-4b289000 r-xp 00000000 03:45 107767     /lib/libc-2.1.92.so
4b289000-4b28f000 rw-p 0011a000 03:45 107767     /lib/libc-2.1.92.so
4b28f000-4b293000 rw-p 00000000 00:00 0
bff78000-bff7b000 rw-p ffffe000 00:00 0
$ exit
nergal@behemoth 9 > ash
$ cat /proc/$$/maps
08048000-08058000 r-xp 00000000 03:45 77590      /bin/ash
08058000-08059000 rw-p 0000f000 03:45 77590      /bin/ash
08059000-0805c000 rw-p 00000000 00:00 0
48b07000-48b1d000 r-xp 00000000 03:45 107760     /lib/ld-2.1.92.so
48b1d000-48b1e000 rw-p 00015000 03:45 107760     /lib/ld-2.1.92.so
48b1e000-48b1f000 rw-p 00000000 00:00 0
48b25000-48c40000 r-xp 00000000 03:45 107767     /lib/libc-2.1.92.so
48c40000-48c46000 rw-p 0011a000 03:45 107767     /lib/libc-2.1.92.so
48c46000-48c4a000 rw-p 00000000 00:00 0
bff76000-bff79000 rw-p ffffe000 00:00 0

The CONFIG_PAX_RANDMMAP feature makes it impossible to simply return into a library. The address of a particular function will be different each time a binary is run.

This feature has some obvious weaknesses; some of them can (and should) be fixed:

1) In case of a local exploit, the addresses at which the libraries and the stack are mmapped can be obtained from the world-readable /proc/pid_of_attacked_process/maps pseudofile. If the data overflowing the buffer can be prepared and passed to the victim after the victim process has started, an attacker has all the information required to construct the overflow data.
For example, if the overflowing data comes from program arguments or environment, a local attacker loses; if the data comes from some I/O operation (usually a socket or file read), the local attacker wins.
Solution: restrict access to /proc files, just like it is done in many other security patches.

2) One can bruteforce the mmap base. Usually (see the end of 6.1) it is enough to guess the libc base. After a few tens of thousands of tries, an attacker has a fair chance of guessing right. Sure, each failed attempt is logged, but even a large amount of logs at 2 am prevents nothing :)
Solution: deploy segvguard [8]. It is a daemon which is notified by the kernel each time a process crashes with SIGSEGV or similar. Segvguard is able to temporarily disable execution of programs (which prevents bruteforcing), and has a few more interesting features. It is worth using even without PaX.

3) The information on the library and stack addresses can leak due to format bugs. For example, in case of the wuftpd vulnerability, one could explore the stack with the command
site exec [eat stack]%x.%x.%x...
The automatic variables' pointers buried in the stack will reveal the stack base. The dynamic linker and libc startup routines leave on the stack some pointers (and return addresses) to the library objects, so it is possible to deduce the libraries' base as well.

4) Sometimes, one can find a suitable function in an attacked binary (which is not position-independent and can't be mmapped randomly). For example, "su" has a function (called after successful authentication) which acquires root privileges and executes a shell - nothing more is needed.

5) All library functions used by a vulnerable program can be called via their PLT entry. Just like the binary, PLT must be present at a fixed address. Vulnerable programs are usually large and call many functions, so there is some probability of finding interesting stuff in PLT.

In fact only the last three problems cannot be fixed, and none of them is guaranteed to manifest in a manner allowing successful exploitation (the fourth is very rare). We certainly need more generic methods.

In the following chapter I will describe the interface to the dynamic linker's dl-resolve() function. If it is passed appropriate arguments, one of them being an asciiz string holding a function name, it will determine the actual function address. This functionality is similar to the dlsym() function. Using the dl-resolve() function, we are able to build a return-into-lib exploit, which will return into a function, whose address is not known at exploit's build time. [12] also describes a method of acquiring a function address by its name, but the presented technique is useless for our purposes.


--[ 5 - The dynamic linker's dl-resolve() function

This chapter is simplified as much as possible. For the detailed description, see [9] and glibc sources, especially the file dl-runtime.c. See also [12].

----[ 5.1 - A few ELF data types

The following definitions are taken from the include file elf.h:

typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;

typedef struct
{
  Elf32_Addr    r_offset;               /* Address */
  Elf32_Word    r_info;                 /* Relocation type and symbol index */
} Elf32_Rel;

/* How to extract and insert information held in the r_info field.  */
#define ELF32_R_SYM(val)                ((val) >> 8)
#define ELF32_R_TYPE(val)               ((val) & 0xff)

typedef struct
{
  Elf32_Word    st_name;                /* Symbol name (string tbl index) */
  Elf32_Addr    st_value;               /* Symbol value */
  Elf32_Word    st_size;                /* Symbol size */
  unsigned char st_info;                /* Symbol type and binding */
  unsigned char st_other;               /* Symbol visibility under glibc>=2.2 */
  Elf32_Section st_shndx;               /* Section index */
} Elf32_Sym;

The fields st_size, st_info and st_shndx are not used during symbol resolution.


----[ 5.2 - A few ELF data structures

The ELF executable file contains a few data structures (arrays mainly) which are of some interest for us.
The location of these structures can be retrieved from the executable's dynamic section. "objdump -x file" will display the contents of the dynamic section:

$ objdump -x some_executable
... some other interesting stuff...

Dynamic Section:
...
  STRTAB      0x80484f8   the location of string table (type char *)
  SYMTAB      0x8048268   the location of symbol table (type Elf32_Sym*)
....
  JMPREL      0x8048750   the location of table of relocation entries
                          related to PLT (type Elf32_Rel*)
...
  VERSYM      0x80486a4   the location of array of version table indices
                          (type uint16_t*)

"objdump -x" will also reveal the location of the .plt section, 0x08048894 in the example below:

 11 .plt          00000230  08048894  08048894  00000894  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE


----[ 5.3 - How dl-resolve() is called from PLT

A typical PLT entry (when the elf format is elf32-i386) looks this way:

(gdb) disas some_func
Dump of assembler code for function some_func:
0x804xxx4 <some_func>:          jmp    *some_func_dyn_reloc_entry
0x804xxxa <some_func+6>:        push   $reloc_offset
0x804xxxf <some_func+11>:       jmp    beginning_of_.plt_section

PLT entries differ only by the $reloc_offset value (and the value of some_func_dyn_reloc_entry, but the latter is not used by the symbol resolution algorithm).

As we see, this piece of code pushes $reloc_offset onto the stack and jumps to the beginning of the .plt section. After a few instructions, the control is passed to the dl-resolve() function, reloc_offset being one of its arguments (the second one, of type struct link_map *, is irrelevant for us). The following is the simplified dl-resolve() algorithm:

1) calculate some_func's relocation entry
        Elf32_Rel * reloc = JMPREL + reloc_offset;

2) calculate some_func's symtab entry
        Elf32_Sym * sym = &SYMTAB[ ELF32_R_SYM (reloc->r_info) ];

3) sanity check
        assert (ELF32_R_TYPE(reloc->r_info) == R_386_JMP_SLOT);

4) late glibc 2.1.x (2.1.92 for sure) or newer, including 2.2.x, performs another check. If (sym->st_other & 3) != 0, the symbol is presumed to have been resolved before, and the algorithm goes another way (and probably ends with SIGSEGV in our case). We must ensure that (sym->st_other & 3) == 0.

5) if symbol versioning is enabled (it usually is), determine the version table index
        uint16_t ndx = VERSYM[ ELF32_R_SYM (reloc->r_info) ];
   and find version information
        const struct r_found_version *version = &l->l_versions[ndx];
   where l is the link_map parameter. The important part here is that ndx must be a legal value, preferably 0, which means "local symbol".

6) the function name (an asciiz string) is determined:
        name = STRTAB + sym->st_name;

7) The gathered information is sufficient to determine some_func's address. The results are cached in two variables of type Elf32_Addr, located at reloc->r_offset and sym->st_value.

8) The stack pointer is adjusted, some_func is called.

Note: in case of glibc, this algorithm is performed by the fixup() function, called by dl-runtime-resolve().


----[ 5.4 - The conclusion

Suppose we overflow a stack buffer with the following payload

--------------------------------------------------------------------------
| buffer fill-up | .plt start | reloc_offset | ret_addr | arg1 | arg2 ...
--------------------------------------------------------------------------
                   ^
                   |
                   - this int32 should overwrite saved return address
                     of a vulnerable function

If we prepare appropriate sym and reloc variables (of type Elf32_Sym and Elf32_Rel, respectively), and calculate the appropriate reloc_offset, the control will be passed to the function whose name is found at STRTAB + sym->st_name (we control it of course). Arguments arg1, arg2 will be placed appropriately, and we still have the opportunity to return into another function (ret_addr).

The attached dl-resolve.c is sample code which implements the described technique. Beware, you have to compile it twice (see the comments in the README.code).


--[ 6 - Defeating PaX

----[ 6.1 - Requirements

In order to use the "ret-into-dl" technique described in chapter 5, we need to position a few structures at appropriate locations. We will need a function which is capable of moving bytes to a selected place. The obvious choice is strcpy; strncpy, sprintf or similar would do as well. So, just like in [3], we will require that there is a PLT entry for strcpy in an attacked program's image.

"Ret-into-dl" solves the problem of randomly mmapped libraries; however, the problem of the stack remains. If the overflow payload resides on the stack, its address will be unknown, and we will be unable to insert 0s into it with strcpy (see 3.4). Unfortunately, I haven't come up with a generic solution (anyone?). Two methods are possible:

1) if the scanf() function is available in PLT, we may try to execute something like

        scanf("%s\n",fixed_location)

   which will copy the appropriate payload from stdin into fixed_location. When using the "fake frames" technique, the stack frames can be disjoint, so we will be able to use fixed_location as frames.

2) if the attacked binary is compiled with -fomit-frame-pointer, we can chain multiple strcpy calls with the "esp lifting" method even if %esp is unknown (see the note at the end of 3.2). The nth strcpy would have the following arguments:

        strcpy(fixed_location+n, a_pointer_within_program_image)

   This way we can construct, byte by byte, appropriate frames at fixed_location. When it is done, we switch from "esp lifting" to "fake frames" with the trick described at the end of 3.3.

More similar workarounds can be devised, but in fact they usually will not be needed. It is very likely that even a small program will copy some user-controlled data into a static or malloced variable, thus saving us the work described above.

To sum up, we will require two (fairly probable) conditions to be met:

6.1.1) strcpy (or strncpy, sprintf or similar) is available via PLT
6.1.2) during the normal course of execution, the attacked binary copies user-provided data into a static (preferably) or malloced variable.


----[ 6.2 - Building the exploit

We will try to emulate the code in the dl-resolve.c sample exploit. When a rwx memory area is prepared with mmap (we will call mmap with the help of ret-into-dl), we will strcpy the shellcode there and return into the copied shellcode. We discuss the case of the attacked binary having been compiled without -fomit-frame-pointer and the "frame faking" method.

We need to make sure that three related structures are placed properly:

1) Elf32_Rel reloc
2) Elf32_Sym sym
3) unsigned short verind (which should be 0)

How are the addresses of verind and sym related ? Let's assign to "real_index" the value of ELF32_R_SYM (reloc->r_info); then

        sym    is at SYMTAB+real_index*sizeof(Elf32_Sym)
        verind is at VERSYM+real_index*sizeof(short)

It looks natural to place verind at some place in the .data or .bss section and nullify it with two strcpy calls. Unfortunately, in such case real_index tends to be rather large. As sizeof(Elf32_Sym)=16, which is larger than sizeof(short), sym would likely be assigned an address beyond the process' data space. That is why in the dl-resolve.c sample program (though it is very small) we have to allocate a few tens of thousands (RQSIZE) of bytes.

Well, we can arbitrarily enlarge a process' data space by setting the MALLOC_TOP_PAD_ environment variable (remember the traceroute exploit ?), but this would work only in case of a local exploit. Instead, we will choose a more generic (and cheaper) method. We will place verind lower, usually within a read-only mmapped region, so we need to find a null short there. The exploit will relocate the "sym" structure to an address determined by the verind location.

Where to look for this null short ?
First, we should determine (by consulting /proc/pid/maps just before the attacked program crashes) the bounds of the memory region which is mmapped writable (the executable's data area) when the overflow occurs. Say, these are the addresses within [low_addr,hi_addr]. We will copy the "sym" structure there. A simple calculation tells us that real_index must be within [(low_addr-SYMTAB)/16,(hi_addr-SYMTAB)/16], so we have to look for a null short within [VERSYM+(low_addr-SYMTAB)/8, VERSYM+(hi_addr-SYMTAB)/8].

Having found a suitable verind, we have to check additionally that

1) sym's address won't intersect our fake frames
2) sym's address won't overwrite any internal linker data (like strcpy's GOT entry)
3) remember that the stack pointer will be moved to the static data area. There must be enough room for stack frames allocated by the dynamic linker procedures. So, it's best (though not necessary) to place "sym" after our fake frames.

A piece of advice: it is better to look for a suitable null short with gdb than to analyze "objdump -s" output. The latter does not display memory placed after the .rodata section.

The attached ex-pax.c file is a sample exploit against pax.c. The only difference between vuln.c and pax.c is that the latter copies another environment variable into a static buffer (so 6.1.2 is satisfied).


--[ 7 - Misc

----[ 7.1 - Portability

Because PaX is designed for Linux, throughout this document we focused on this OS. However, the presented techniques are OS independent. Stack and frame pointers, C calling conventions, ELF specification - all these definitions are widely used. In particular, I have successfully run dl-resolve.c on Solaris i386 and FreeBSD. To be exact, mmap's fourth argument had to be adjusted (it looks like MAP_ANON has a different value on BSD systems). In case of these two OSes, the dynamic linker does not feature symbol versions, so ret-into-dl is even easier to accomplish.


----[ 7.2 - Other types of vulnerabilities

All presented techniques are based on stack buffer overflow. All return-into-something exploits rely on the fact that with a single overflow we can not only modify %eip, but also place function arguments (after the return address) at the stack top.

Let's consider two other large classes of vulnerabilities: malloc control structures corruption and format string attacks. In case of the former, we may at most count on overwriting an arbitrary int with an arbitrary value - it is too little to bypass PaX protection generically. In case of the latter, we may usually alter an arbitrary number of bytes. If we could overwrite the saved %ebp and %eip of any function, we wouldn't need anything more; but because the stack base is randomized, there is no way to determine the address of any frame.