💾 Archived View for aphrack.org › issues › phrack58 › 5.gmi captured on 2021-12-03 at 14:04:38. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
==Phrack Inc.==
Volume 0x0b, Issue 0x3a, Phile #0x05 of 0x0e
|=----=[ Armouring the ELF: Binary encryption on the UNIX platform ]=----=|
|=-----------------------------------------------------------------------=|
|=-------=[ grugq <grugq@lokmail.net>, scut <scut@team-teso.net> ]=------=|
--[ Contents
- Introduction
- Why encrypt?
- What is binary encryption?
- The threat
- ELF format
- ELF headers
- ELF sections
- ELF segments
- ELF support and history
- ELF loading
- ELF loading - Linux
- ELF Linux - auxiliary vectors
- ELF mapping
- Binary encryption theory
- Runtime decryption techniques
- ELF parasite approach
- Packing/Userspace ELF loader
- The future
- References
--[ Introduction
The UNIX world has lagged far behind the Microsoft world (including both
MS-DOS and MS Windows) in the twin realms of binary protection and reverse
engineering.
The variety and types of binary protection are a major area of difference.
MS Windows PE binaries can be encrypted, packed, wrapped, and thoroughly
obfuscated, and then decrypted, unpacked, unwrapped, and reconstructed.
Conversely, the best that can be done to a UNIX ELF binary is stripping the
debugging symbol table. There are no deconstructors, no wrappers, no
encrypters, and only a single packer (UPX [12], aimed at decreasing disk
space, not increasing protection) for the ELF. Clearly the UNIX ELF binary
is naked compared to the powerful protections afforded the Windows PE binary
format.
The quantity and quality of reverse engineering tools are another area of
significant difference. The runtime environment of the PE binary, and indeed
the very operating system it executes on, is at the mercy of the brilliant
debugger SoftICE. Meanwhile the running ELF can only be examined one word
at a time via the crippled system call ptrace(), imperfectly interfaced via
adb and its brain dead cousin: gdb. The procfs, on those systems on which
it is present, typically only provides the ability to examine a process
rather than control it. Indeed, the UNIX world is an unrealised nightmare
for the UNIX reverse engineer. Unrealised because up until now no one has
bothered to protect an ELF binary.
--[ Why encrypt?
The prime motivator for protecting files on MS platforms has been to enforce
copy protection in a failed attempt to ensure payment for shareware
applications. As of now, there is no such motivation on the UNIX side, but
there are other reasons to protect binaries.
From the viewpoint of an attacker the reasons to protect binaries can be
listed as:
- hindering forensic analysis in case of detection
- hindering copying of confidential data (possibly by other
attackers or commercially motivated forensic investigators*)
- adding functionality to the protected binary
From the point of view of a defender, there are also good reasons to
protect binaries. These can be enumerated as:
- adding a level of authorization checks
- hindering analysis of customised intrusion detection tools (tools
that an attacker might figure out how to evade, were they to
discover their purpose)
- adding functionality to the protected binary
The need to protect binaries from analysis in the UNIX world has clearly
surfaced.
* Certain big five companies sell their collections of recovered exploits
for an annual fee.
--[ What is binary encryption?
The reasons to protect a binary are clear; now we have to come up with a
good design for the protection itself. When we talk of protecting binaries
it is important to know what sort of protection we expect to achieve; we
must define our requirements. The requirements for this implementation are
as follows:
- Only authorised individuals may execute the binary.
- The on disk binary must be immune to all methods of static
analysis which might reveal anything substantial about the
purposes/methods of the binary.
- The process image of the binary, something that unfortunately
cannot be hidden, must obscure the purposes/methods of the
binary.
- The mechanism for protecting the binary must be production
quality, being both robust and reliable.
The best mechanism to fulfill all of these requirements is some form of
encryption. We now know enough about what we want to define the term
"binary encryption" as the process of protecting a binary from reverse
engineering and analysis, while keeping it intact and executable to the
underlying operating system. Thus, when we talk of binary encryption we refer
to a robust security mechanism for protecting binaries.
--[ The threat
Today most of the so-called "forensic analysts" have very few tools, and
too little knowledge, to counter anything more sophisticated than rm, strip
and a careless attacker. This was demonstrated in the public analysis of
the x2 binary [14]: two seminal forensic investigators were completely
stumped by a relatively simple binary protection. It is worth mentioning
that two private reverse engineers reversed the x2 binary to C source code
in approximately one day.
The Unix forensic investigator has an extremely limited range of tools at
her disposal for analysis of a compromised machine. These tools tend to
be targeted at debugging a misbehaving system, rather than analysing a
compromised system. While locate, find, lsof and netstat are fine when
attempting to keep a production system from falling over, when it comes to
investigating a break-in, they fall short on usefulness. Even TCT is severely
limited in its capabilities (although that is the subject of another
paper).
If the broad analysis of an entire system is so impaired, binary analysis
is even more so. The forensic analyst is equipped with tools designed to
debug binaries straight from the back end of an accommodating compiler, not
the hostile binaries packaged by a crafty attacker. The list of tools is
short, but for completeness presented here: strings, objdump, readelf,
ltrace, strace, and gdb. These tools are all based on two flawed interfaces:
libbfd and ptrace(). There are superior tools currently in development, but
they are primarily intended for, and used by, Unix reverse engineers and
other individuals with "alternative" motivations.
Barring these private reverse engineering applications, no Unix tools exist
to tackle sophisticated hostile code. This is because the basic Unix
debugging hooks are very limited. The ubiquitous ptrace() can be easily
subverted and confused, and while the /proc interface is more feature rich,
it is not uniform across platforms. Additionally, the /proc debugging
interface typically provides only information about the runtime environment
of a process, not control over its execution. Even the most sophisticated
procfs need not be of any help to the analyst, if the binary is sufficiently
protected.
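As an illustration of how easily ptrace() can be turned against the analyst, here is a minimal sketch of the well-known PTRACE_TRACEME check, assuming Linux and glibc. A process can have only one tracer, so if the request fails with EPERM, a debugger such as gdb or strace is probably attached already. The name tracer_present() is hypothetical, and the check itself is just as easily defeated by an analyst who patches it out or preloads a fake ptrace().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>

/* Classic anti-debugging check (hypothetical helper name).
 * A process can only be traced by one tracer at a time, so a failing
 * PTRACE_TRACEME request with EPERM suggests a debugger is already attached.
 * Returns 1 if a tracer is suspected, 0 otherwise.
 * Caveats: on success the parent becomes this process's tracer, and a
 * sandbox that forbids ptrace() also yields EPERM (a false positive). */
int tracer_present(void)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
        return errno == EPERM;
    return 0;
}
```

In a protected binary such a check would typically run early in the decryption engine, before any plaintext is produced.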
That said, there has been some slight improvement in the quality of analysis
tools. The powerful Windows-only disassembler - IDA - now provides complete
support for the ELF binary format. Indeed, with the latest release IDA can
finally handle ELF binaries without a section header table (thanks Ilfak).
These improvements in the available tools are meaningless, however, unless
there is an accompanying increase in knowledge and skill for the forensic
analysts. Given that there are almost no skilled reverse engineers in
forensic analysis (based on the published material one could easily conclude
that there are none), the hackers will have the upper hand at the start of
this arms race.
As the underground world struggles with the issue of leaking exploits
and full vs. non-disclosure, more hackers will see binary encryption as a
means of securing their intellectual property. Simultaneously the security
community is going to be exposed to more encrypted binaries, and will have
to learn to analyse a hostile binary.
--[ ELF format
The 'Executable and Linking Format' is a standardized file format for
executable code. It is mostly used for executable files (ET_EXEC) or for
shared libraries (ET_DYN). Currently almost all modern Unix variants
support the ELF format for its portability, standardized features and
designed-from-scratch cleanliness. The current version of the ELF standard
is 1.2. There are multiple documents covering the standard, see [1].
The ELF binary format was designed to meet the requirements of both linkers
(typically used during compile time) and loaders (typically used only
during run time). This necessitated the incorporation of two distinct
interfaces to describe the data contained within the binary file. These two
interfaces have no dependency on each other. This section will act as a
brief introduction to both interfaces of the ELF.
--[ ELF headers
An ELF file must contain at a minimum an ELF header. The ELF header
contains information regarding how the contents of the binary file should
be interpreted, as well as the locations of the other structures describing
the binary. The ELF header starts at offset 0 within the file, and has the
following format:
#define EI_NIDENT (16)
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;
The fields are explained in detail below:
* e_ident has certain known offsets that contain information about how to
treat and interpret the binary. Be warned that Linux defines additional
indices and values that are not contained in the SysV ABI, and are
therefore non-portable. These are the official known offsets, and their
potential values:
#define EI_MAG0 0 /* File identification byte 0 index */
#define ELFMAG0 0x7f /* Magic number byte 0 */
#define EI_MAG1 1 /* File identification byte 1 index */
#define ELFMAG1 'E' /* Magic number byte 1 */
#define EI_MAG2 2 /* File identification byte 2 index */
#define ELFMAG2 'L' /* Magic number byte 2 */
#define EI_MAG3 3 /* File identification byte 3 index */
#define ELFMAG3 'F' /* Magic number byte 3 */
#define EI_CLASS 4 /* File class byte index */
#define ELFCLASSNONE 0 /* Invalid class */
#define ELFCLASS32 1 /* 32-bit objects */
#define ELFCLASS64 2 /* 64-bit objects */
#define EI_DATA 5 /* Data encoding byte index */
#define ELFDATANONE 0 /* Invalid data encoding */
#define ELFDATA2LSB 1 /* 2's complement, little endian */
#define ELFDATA2MSB 2 /* 2's complement, big endian */
#define EI_VERSION 6 /* File version byte index */
#define EV_CURRENT 1 /* Value must be EV_CURRENT */
* e_type describes how the binary is intended to be utilised. The following
are legal values:
#define ET_NONE 0 /* No file type */
#define ET_REL 1 /* Relocatable file */
#define ET_EXEC 2 /* Executable file */
#define ET_DYN 3 /* Shared object file */
#define ET_CORE 4 /* Core file */
* e_machine indicates for which architecture the object file is
intended. The following is a short list of the most common values:
#define EM_SPARC 2 /* SUN SPARC */
#define EM_386 3 /* Intel 80386 */
#define EM_SPARCV9 43 /* SPARC v9 64-bit */
#define EM_IA_64 50 /* Intel Merced */
* e_version indicates which version of ELF the object file conforms to.
Currently it must be set to EV_CURRENT, identical to
e_ident[EI_VERSION].
* e_entry contains the relative virtual address of the entry point to the
binary. This is traditionally the function _start() which is located at
the start of the .text section (see below). This field only has meaning
for ET_EXEC objects.
* e_phoff contains the offset from the start of the file to the first
Program Header (see below). This field is only meaningful in ET_EXEC and
ET_DYN objects.
* e_shoff contains the offset from the start of the file to the first
Section Header (see below). This field is always useful to the reverse
engineer, but only required on ET_REL files.
* e_flags contains processor specific flags. This field is not used on
i386 or SPARC systems, so it can be safely ignored.
* e_ehsize contains the size of the ELF header. This is for error checking
and should be set to sizeof(Elf32_Ehdr).
* e_phentsize contains the size of a Program Header. This is for error
checking and should be set to sizeof(Elf32_Phdr).
* e_phnum contains the number of Program headers. The program header table
is an array of Elf32_Phdr with e_phnum elements.
* e_shentsize contains the size of a Section Header. This is for error
checking and should be set to sizeof(Elf32_Shdr).
* e_shnum contains the number of Section headers. The section header table
is an array of Elf32_Shdr with e_shnum elements.
* e_shstrndx contains the index within the section header table of the
section containing the string table of section names (see below).
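The size fields above exist for error checking, and a loader or analysis tool can combine them with e_ident to reject malformed input early. The following is a minimal sketch, assuming a Linux system whose <elf.h> provides the structures and constants listed above, and a file whose byte order (EI_DATA) matches the host; check_elf32_header() is a hypothetical helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Sanity check a 32-bit ELF header held in memory (hypothetical helper).
 * Returns 0 if it looks like an ELF32 executable or shared object, -1 if not.
 * Assumes the file's byte order matches the host's. */
int check_elf32_header(const unsigned char *buf, size_t len)
{
    Elf32_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, buf, sizeof(eh));               /* avoid unaligned access */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)  /* 0x7f 'E' 'L' 'F' */
        return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    if (eh.e_ident[EI_VERSION] != EV_CURRENT || eh.e_version != EV_CURRENT)
        return -1;
    if (eh.e_type != ET_EXEC && eh.e_type != ET_DYN)
        return -1;
    /* The size fields are there precisely for this kind of cross-check. */
    if (eh.e_ehsize != sizeof(Elf32_Ehdr))
        return -1;
    if (eh.e_phnum != 0 && eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;
    if (eh.e_shnum != 0 && eh.e_shentsize != sizeof(Elf32_Shdr))
        return -1;
    return 0;
}
```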
The following two sections describe in detail the linking interface and the
execution interface to the ELF, respectively.
--[ ELF Sections
The interface used when linking multiple object files together is the Section
interface. The binary file is viewed as a collection of sections; each an
array of bytes of which no byte may reside in more than one section. The
contents of a section may be interpreted in any way by the inspecting
application, although there is helper information to enable an application
to correctly interpret a section's contents. Each section is described by a
section header, contained within a section header table typically located
at the end of the object. The section header table is an array of section
headers in arbitrary order, although usually in the same order as they
appear in the file, with the only exception being that the zeroth entry is
the NULL section: a section header which is set to 0 and doesn't describe any
part of the binary. Each section header has the following format:
typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;
The fields of the section header have the following meanings:
* sh_name contains an index into the section contents of the e_shstrndx
string table. This index is the start of a null terminated string to
be used as the name of the section. There are reserved names, the
most important being:
.text Executable object code
.rodata Read only strings
.data Initialised "static" data
.bss Zero initialized "static" data, and the
base of the heap
* sh_type contains the section type, helping the inspecting application
to determine how to interpret the section's contents. The following
are legal values:
#define SHT_NULL 0 /* Section header table entry unused */
#define SHT_PROGBITS 1 /* Program data */
#define SHT_SYMTAB 2 /* Symbol table */
#define SHT_STRTAB 3 /* String table */
#define SHT_RELA 4 /* Relocation entries with addends */
#define SHT_HASH 5 /* Symbol hash table */
#define SHT_DYNAMIC 6 /* Dynamic linking information */
#define SHT_NOTE 7 /* Notes */
#define SHT_NOBITS 8 /* Program space with no data (bss) */
#define SHT_REL 9 /* Relocation entries, no addends */
#define SHT_SHLIB 10 /* Reserved */
#define SHT_DYNSYM 11 /* Dynamic linker symbol table */
* sh_flags contains a bitmap defining how the contents of the section
are to be treated at run time. Any bitwise OR'd value of the
following is legal:
#define SHF_WRITE (1 << 0) /* Writable */
#define SHF_ALLOC (1 << 1) /* Occupies memory during execution */
#define SHF_EXECINSTR (1 << 2) /* Executable */
* sh_addr contains the relative virtual address of the section during
runtime.
* sh_offset contains the offset from the start of the file to the first
byte of the section.
* sh_size contains the size in bytes of the section.
* sh_link is used to link associated sections together. This is
typically used to link a string table to a section whose contents
require a string table for correct interpretation, e.g. symbol tables.
* sh_info is used to contain extra information to aid in link
editing. This field has two main uses: indicating which section a
relocation applies to for SHT_REL[A] sections, and, for symbol tables,
holding one greater than the symbol table index of the last local symbol.
* sh_addralign contains the alignment requirement of section contents,
typically 0/1 (both meaning no alignment) or 4.
* sh_entsize, if the section holds a table, contains the size of each
element. Used for error checking.
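Resolving a section by name ties these fields together: e_shoff and e_shnum locate the table, e_shstrndx selects the name string table, and each sh_name is an offset into it. A minimal sketch, assuming the whole file is loaded into a buffer that is suitably aligned for Elf32_Shdr and uses the host byte order; find_section() is a hypothetical name. Note that a binary without a section header table (e_shoff == 0) simply yields NULL, which is exactly why such binaries frustrate section-based tools.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Find a section header by name in an in-memory ELF32 image (hypothetical
 * helper). Returns NULL if there is no section header table or no match. */
const Elf32_Shdr *find_section(const unsigned char *buf, size_t len,
                               const char *name)
{
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)buf;
    const Elf32_Shdr *sh, *strsec;
    const char *strtab;
    size_t i;

    if (len < sizeof(*eh) || eh->e_shoff == 0 || eh->e_shstrndx >= eh->e_shnum)
        return NULL;
    if (eh->e_shoff + (size_t)eh->e_shnum * sizeof(*sh) > len)
        return NULL;

    sh = (const Elf32_Shdr *)(buf + eh->e_shoff);
    strsec = &sh[eh->e_shstrndx];               /* section name string table */
    if ((size_t)strsec->sh_offset + strsec->sh_size > len)
        return NULL;
    strtab = (const char *)(buf + strsec->sh_offset);

    for (i = 1; i < eh->e_shnum; i++) {         /* entry 0 is the NULL section */
        Elf32_Word off = sh[i].sh_name;

        /* the name must start, and be NUL terminated, inside the table */
        if (off < strsec->sh_size &&
            memchr(strtab + off, '\0', strsec->sh_size - off) != NULL &&
            strcmp(strtab + off, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```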
--[ ELF Segments
The ELF segment interface is used during the creation of a process
image. Each segment, a contiguous stream of bytes, (not to be confused with
a memory segment, i.e. one page) is described by a program header. The
program headers are contained in a program header table described by the
ELF header. This table can be located anywhere, but is typically located
immediately after the ELF header *. The program header is now described in
depth:
typedef struct
{
Elf32_Word p_type; /* Segment type */
Elf32_Off p_offset; /* Segment file offset */
Elf32_Addr p_vaddr; /* Segment virtual address */
Elf32_Addr p_paddr; /* Segment physical address */
Elf32_Word p_filesz; /* Segment size in file */
Elf32_Word p_memsz; /* Segment size in memory */
Elf32_Word p_flags; /* Segment flags */
Elf32_Word p_align; /* Segment alignment */
} Elf32_Phdr;
The fields have the following meanings:
* p_type describes how to treat the contents of a segment. The
following are legal values:
#define PT_NULL 0 /* Program header table entry unused */
#define PT_LOAD 1 /* Loadable program segment */
#define PT_DYNAMIC 2 /* Dynamic linking information */
#define PT_INTERP 3 /* Program interpreter */
#define PT_NOTE 4 /* Auxiliary information */
#define PT_SHLIB 5 /* Reserved */
#define PT_PHDR 6 /* Entry for header table itself */
* p_offset contains the offset within the file of the first byte of the
segment.
* p_vaddr contains the relative virtual address at which the segment
expects to be loaded into memory.
* p_paddr contains the physical address at which the segment expects to be
loaded into memory. This field has no meaning unless the hardware
supports and requires this information. Typically this field is set to
either 0 or the same value as p_vaddr.
* p_filesz contains the size in bytes of the segment within the file.
* p_memsz contains the size in bytes of the segment once loaded into
memory. If the segment has a larger p_memsz than p_filesz, the
remaining space is initialised to 0. This is the mechanism used to
create the .bss during program loading.
* p_flags contains the memory protection flags for the segment once
loaded. Any bitwise OR'd combination of the following is legal:
#define PF_X (1 << 0) /* Segment is executable */
#define PF_W (1 << 1) /* Segment is writable */
#define PF_R (1 << 2) /* Segment is readable */
* p_align contains the alignment for the segment in memory. If the
segment is of type PT_LOAD, then the alignment will be the expected
page size.
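When a PT_LOAD segment is mapped, p_flags has to be translated into the protection bits used by mmap() and mprotect(). A minimal sketch, assuming POSIX <sys/mman.h>; pflags_to_prot() is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <sys/mman.h>

/* Translate ELF segment permissions (p_flags) into mmap()/mprotect()
 * protection bits (hypothetical helper). */
int pflags_to_prot(Elf32_Word p_flags)
{
    int prot = PROT_NONE;

    if (p_flags & PF_R)
        prot |= PROT_READ;
    if (p_flags & PF_W)
        prot |= PROT_WRITE;
    if (p_flags & PF_X)
        prot |= PROT_EXEC;
    return prot;
}
```

A typical text segment (PF_R | PF_X) becomes PROT_READ | PROT_EXEC, and a typical data segment (PF_R | PF_W) becomes PROT_READ | PROT_WRITE. Note that a runtime decryptor working on the text segment must first make it writable with mprotect().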
* FreeBSD's dynamic linker requires the program header table to be located
within the first page (4096 bytes) of the binary.
--[ ELF format - support and history
The ELF format has gained wide acceptance as a reliable and mature
executable format. It is flexible, being able to support different
architectures, 32 and 64 bit alike, without compromising too much of its
design.
As of now, the following systems support the ELF format:
DGUX | ELF, ?, ?
FreeBSD | ELF, 32/64 bit, little/big endian
IRIX | ELF, 64 bit, big endian
Linux | ELF, 32/64 bit, little/big endian
NetBSD | ELF, 32/64 bit, little/big endian
Solaris | ELF, 32/64 bit, little/big endian
UnixWare | ELF, 32 bit, little endian
The 32/64 bit and endianness differences within a single system are due to
the different architectures the operating system is able to run on.
--[ ELF loading
An ELF binary is loaded by mapping all PT_LOAD segments into memory at the
correct locations (p_vaddr). The binary is then checked for library
dependencies, and any libraries it needs are loaded. Finally, any required
relocations are performed, and control is transferred to the main
executable's entry point. The accompanying code in load.c demonstrates one
method of doing this (based on the GNU dynamic linker).
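The mapping step involves a little page arithmetic. Because mmap() only works on page boundaries, each PT_LOAD segment is mapped from its page-aligned address and file offset, and the p_memsz - p_filesz tail (the .bss) is zero-filled afterwards. This is a sketch of the arithmetic only, not the load.c code; plan_segment() and struct seg_map are hypothetical names, and the caller is assumed to pass the system page size.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Parameters a loader would use for one PT_LOAD segment (hypothetical). */
struct seg_map {
    uint32_t map_addr;    /* page-aligned virtual address to map at */
    uint32_t map_off;     /* page-aligned file offset to map from */
    uint32_t map_len;     /* length of the file-backed mapping */
    uint32_t zero_start;  /* first byte that must read as zero (.bss) */
    uint32_t zero_end;    /* end of the segment in memory */
};

/* mmap() needs page-aligned address and offset, so both are rounded down
 * by the same amount and the length grows by that slack. This relies on
 * p_vaddr and p_offset being congruent modulo the page size. */
int plan_segment(const Elf32_Phdr *ph, uint32_t page, struct seg_map *m)
{
    uint32_t slack;

    assert(page != 0 && (page & (page - 1)) == 0);   /* power of two */
    if (ph->p_type != PT_LOAD || ph->p_vaddr % page != ph->p_offset % page)
        return -1;

    slack = ph->p_vaddr & (page - 1);
    m->map_addr = ph->p_vaddr - slack;
    m->map_off = ph->p_offset - slack;
    m->map_len = ph->p_filesz + slack;
    m->zero_start = ph->p_vaddr + ph->p_filesz;
    m->zero_end = ph->p_vaddr + ph->p_memsz;
    return 0;
}
```

For example, a data segment with p_offset 0x1f00, p_vaddr 0x08049f00, p_filesz 0x120 and p_memsz 0x200 would be mapped at 0x08049000 from file offset 0x1000 with length 0x1020, and 0x0804a020 up to 0x0804a100 would then have to read as zero (a real loader clears the rest of the last file-backed page and maps anonymous pages beyond it).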
--[ ELF loading - Linux
Once the userspace receives control, we have this situation:
- All PT_LOAD segments of the binary, and, if it is dynamically linked,
those of the dynamic linker, are mapped properly
- Entry point: In case there is a PT_INTERP segment, the program
counter is set to the entry point of the program interpreter.
- Entry point: In case there is no PT_INTERP segment, the program
counter is initialized to the ELF header's entry point.
- The top of the stack is initialized with important data, see
below.
When the userspace receives control, the stack layout has a fixed format.
The rough order is this:
<arguments> <environ> <auxv> <string data>
The detailed layout, assuming IA32 architecture, is this (Linux kernel
series 2.2/2.4):
position content size (bytes) + comment
------------------------------------------------------------------------
stack pointer -> [ argc = number of args ] 4
[ argv[0] (pointer) ] 4 (program name)
[ argv[1] (pointer) ] 4
[ argv[..] (pointer) ] 4 * x
[ argv[n - 1] (pointer) ] 4
[ argv[n] (pointer) ] 4 (= NULL)
[ envp[0] (pointer) ] 4
[ envp[1] (pointer) ] 4
[ envp[..] (pointer) ] 4
[ envp[term] (pointer) ] 4 (= NULL)
[ auxv[0] (Elf32_auxv_t) ] 8
[ auxv[1] (Elf32_auxv_t) ] 8
[ auxv[..] (Elf32_auxv_t) ] 8
[ auxv[term] (Elf32_auxv_t) ] 8 (= AT_NULL vector)
[ padding ] 0 - 16
[ argument ASCIIZ strings ] >= 0
[ environment ASCIIZ str. ] >= 0
(0xbffffffc) [ end marker ] 4 (= NULL)
(0xc0000000) < top of stack > 0 (virtual)
------------------------------------------------------------------------
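Walking this layout from the initial stack pointer only requires skipping the two NULL-terminated pointer arrays. A minimal sketch, using machine words (uintptr_t) so that it applies to the 4-byte IA32 layout above as well as to 64-bit systems, where each slot is 8 bytes; parse_initial_stack() and struct initial_stack are hypothetical names, and each auxv entry is treated as a (type, value) pair of words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Views into the initial process stack (hypothetical structure). */
struct initial_stack {
    uintptr_t argc;
    uintptr_t *argv;   /* argv[i] holds a char *, argv[argc] == 0 */
    uintptr_t *envp;   /* envp[i] holds a char *, terminated by 0 */
    uintptr_t *auxv;   /* pairs of words: a_type, a_val */
};

/* Decode argc, argv, envp and auxv starting at the initial stack pointer. */
void parse_initial_stack(uintptr_t *sp, struct initial_stack *st)
{
    uintptr_t *p;

    st->argc = sp[0];
    st->argv = sp + 1;
    st->envp = st->argv + st->argc + 1;   /* skip the NULL argv[argc] */
    for (p = st->envp; *p != 0; p++)      /* find the NULL ending envp */
        ;
    st->auxv = p + 1;                     /* auxv starts right after it */
}
```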
When the runtime linker (rtld) has done its duty of mapping and resolving
all the required libraries and symbols, it does some initialization work
and afterwards hands control over to the real program entry point. As
this happens, the conditions are:
- All required libraries mapped from 0x40000000 on
- All CPU registers set to zero, except the stack pointer ($sp) and
the program counter ($eip/$ip or $pc). The ABI may specify
further initial values; the i386 ABI requires that %edx holds a
function pointer (the dynamic linker's termination routine) that the
application should register with atexit().
--[ ELF loading - auxiliary vectors (Elf32_auxv_t).
The stack initialization is somewhat familiar to a C programmer, since he
knows the argc, argv and environment pointers from the parameters of his
'main' function. It gets called by the C compiler support code with exactly
these parameters:
main (argc, &argv[0], &envp[0]);
However, what is more of a mystery, and usually not discussed at all, is
the array of 'Elf32_auxv_t' vectors. The structure is defined in the elf.h
include file:
typedef struct
{
int a_type; /* Entry type */
union
{
long int a_val; /* Integer value */
void *a_ptr; /* Pointer value */
void (*a_fcn) (void); /* Function pointer value */
} a_un;
} Elf32_auxv_t;
It is a generic type-to-value relationship structure used to transfer very
important data from kernelspace to userspace. The array is initialized on
any successful execution, but normally it is used only by the program
interpreter. Let's take a look at the 'a_type' values, which define what
kind of data the structure contains. The types are found in the 'elf.h'
file, and although each architecture implementing the ELF standard is
free to define them, there are a lot of similarities among them. The
following list is from a Linux 2.4 kernel.
/* Legal values for a_type (entry type). */
#define AT_NULL 0 /* End of vector */
#define AT_IGNORE 1 /* Entry should be ignored */
#define AT_EXECFD 2 /* File descriptor of program */
#define AT_PHDR 3 /* Program headers for program */
#define AT_PHENT 4 /* Size of program header entry */
#define AT_PHNUM 5 /* Number of program headers */
#define AT_PAGESZ 6 /* System page size */
#define AT_BASE 7 /* Base address of interpreter */
#define AT_FLAGS 8 /* Flags */
#define AT_ENTRY 9 /* Entry point of program */
#define AT_NOTELF 10 /* Program is not ELF */
#define AT_UID 11 /* Real uid */
#define AT_EUID 12 /* Effective uid */
#define AT_GID 13 /* Real gid */
#define AT_EGID 14 /* Effective gid */
#define AT_CLKTCK 17 /* Frequency of times() */
Some types are mandatory for the runtime dynamic linker, while some are
merely candy and remain unused. Also, the kernel does not have to use every
type; in fact, the order and occurrence of the elements are subject to change
across different kernel versions. This turns out to be important when
writing our own userspace ELF loader, since the runtime dynamic linker may
expect a certain format, or even worse, the vectors we ourselves receive
from the kernel are in a different order on different systems (Linux 2.2 to
2.4 changed behaviour, for example). Anyway, if we stick to a few simple
rules when parsing and setting up the headers, few things can go wrong:
- Always skip sizeof(Elf32_auxv_t) bytes at a time
- Skip any unknown AT_* type
- Ignore AT_IGNORE types
- Stop processing only at AT_NULL vector
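These rules translate directly into a lookup loop. A minimal sketch, assuming glibc's <elf.h>, where Elf32_auxv_t keeps only the a_val member in its union (the pointer members shown above were dropped there because they break when 32-bit definitions are used on 64-bit platforms); auxv_get() is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Look up one auxiliary vector entry (hypothetical helper), following the
 * rules above: step one Elf32_auxv_t at a time, ignore AT_IGNORE, skip
 * unknown types, and stop only at AT_NULL.
 * Returns 0 and stores the value on success, -1 if the type is absent. */
int auxv_get(const Elf32_auxv_t *auxv, uint32_t type, uint32_t *val)
{
    const Elf32_auxv_t *a;

    for (a = auxv; a->a_type != AT_NULL; a++) {
        if (a->a_type == AT_IGNORE)
            continue;
        if (a->a_type == type) {
            *val = a->a_un.a_val;
            return 0;
        }
        /* any other, possibly unknown, type: keep scanning */
    }
    return -1;
}
```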
On Linux, the runtime linker requires the following Elf32_auxv_t
structures:
AT_PHDR, a pointer to the program headers of the executable
AT_PHENT, set to 'e_phentsize' element of the ELF header (constant)
AT_PHNUM, number of program headers, 'e_phnum' from ELF header
AT_PAGESZ, set to constant 'PAGE_SIZE' (4096 on x86)
AT_ENTRY, real entry point of the executable (from ELF header)
On other architectures there are similar requirements for very important
auxiliary vectors, without which the runtime linker would not be able to
work. Some further details about the way Linux starts up an executable can
be found at [11].
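A userspace loader that hands control to the Linux runtime linker therefore has to construct at least the entries listed above itself. A minimal sketch; build_auxv() is a hypothetical helper, and phdr_addr (the address at which the loader mapped the executable's program headers) and the page size are assumed to be known to the caller. Copying the result onto a fresh stack behind argv and envp is not shown.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Fill 'out' with the auxiliary vectors the Linux runtime linker requires
 * (hypothetical helper). Returns the number of entries written, including
 * the terminating AT_NULL, or -1 if 'out' has fewer than that many slots. */
int build_auxv(Elf32_auxv_t *out, size_t n, const Elf32_Ehdr *eh,
               uint32_t phdr_addr, uint32_t page)
{
    const Elf32_auxv_t v[] = {
        { AT_PHDR,   { phdr_addr } },
        { AT_PHENT,  { eh->e_phentsize } },
        { AT_PHNUM,  { eh->e_phnum } },
        { AT_PAGESZ, { page } },
        { AT_ENTRY,  { eh->e_entry } },
        { AT_NULL,   { 0 } },
    };
    size_t i, count = sizeof(v) / sizeof(v[0]);

    if (n < count)
        return -1;
    for (i = 0; i < count; i++)
        out[i] = v[i];
    return (int)count;
}
```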
--[ Binary encryption theory
There is nothing new about encrypting binaries; indeed, since the 1980s
there have been various mechanisms developed for protecting binaries on
personal computers. The most active developers of binary protections have
been virus writers and shareware developers. While these techniques have
evolved with advances in processing power and operating system architecture,
most of the basic concepts remain the same. Essentially a plaintext
decryption engine will execute first and it will decrypt the next encrypted
section of code, this might be the main .text, or it might be another
decryption engine.
Barring a flawed and easily cracked encryption technique (e.g. XOR with a
fixed value), the first plaintext decryptor is usually the weak point of
any encrypted binary. Due to this weakness, a number of methods have
been developed for making the initial decryption engine as difficult to
reverse engineer as possible.
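To see why XOR with a fixed value is so easily cracked, recall that every ELF file starts with the known bytes 0x7f 'E' 'L' 'F'. With a single-byte key, one known plaintext byte is enough to recover the key. A minimal sketch; recover_xor_key() and xor_buf() are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Recover a single-byte XOR key from an encrypted ELF image: the first
 * plaintext byte is always ELFMAG0 (0x7f). Hypothetical helper. */
unsigned char recover_xor_key(const unsigned char *enc)
{
    return enc[EI_MAG0] ^ ELFMAG0;
}

/* XOR is its own inverse: the same routine encrypts and decrypts. */
void xor_buf(unsigned char *buf, size_t len, unsigned char key)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] ^= key;
}
```

Longer repeating keys fall the same way, since the ELF header supplies many more known bytes.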
The following is just a brief list of methods that have been used to
protect the initial decryption engine:
* Self Modifying Code: Code which alters itself during run time, so that
analysis of the binary file on disk is different from analysis of the
memory image.
* Polymorphic Engines: Creates a unique decryption engine each time it is
used so that it is more difficult to compare two files. Also, it is
slightly more difficult to reverse engineer.
* Anti-Disassembling/Debugging tricks: Tricks which attempt to confuse
the tools being used by the reverse engineer. This makes it difficult
for the analyst to discover what the object code is doing.
The following is a short list of encryption methods that have been used to
protect the main object code of the executable:
* XOR: The favourite of any aspiring hacker, xor is frequently used to
obfuscate code with a simple encryption. These are usually very easily
broken, but extend slightly the time it takes to reverse engineer.
* Stream Ciphers: Ideal for binary encryption, these are usually strong,
small and can decrypt an arbitrary number of bytes. A binary properly
encrypted with a stream cipher is impregnable to analysis.
* Block Ciphers: These are more awkward to use for binary encryption
because of the block alignment requirements.
* Virtual CPUs: A painstaking and powerful method of securing a binary.
The object code actually runs on a virtual CPU that needs to be
independently analysed first. Very painful for a reverse engineer (and
also the developer).
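To illustrate why stream ciphers suit binary encryption, here is a minimal sketch of RC4, the cipher the dacryfile parasite described below uses. Because the keystream is simply XORed over the data, the same routine encrypts and decrypts, and the data can be processed in chunks of any size without padding or block alignment. struct rc4, rc4_init() and rc4_crypt() are hypothetical names; this is textbook RC4, not code taken from dacryfile.

```c
#include <assert.h>
#include <stddef.h>

/* RC4 state: a permutation of 0..255 and two indices. */
struct rc4 {
    unsigned char S[256];
    unsigned char i, j;
};

/* Key scheduling algorithm (KSA). */
void rc4_init(struct rc4 *c, const unsigned char *key, size_t keylen)
{
    unsigned int i;
    unsigned char j = 0, t;

    assert(keylen > 0);
    for (i = 0; i < 256; i++)
        c->S[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (unsigned char)(j + c->S[i] + key[i % keylen]);
        t = c->S[i]; c->S[i] = c->S[j]; c->S[j] = t;
    }
    c->i = c->j = 0;
}

/* Keystream generation (PRGA), XORed over buf in place. The state carries
 * over between calls, so data may be processed in arbitrary chunks. */
void rc4_crypt(struct rc4 *c, unsigned char *buf, size_t len)
{
    size_t n;
    unsigned char t;

    for (n = 0; n < len; n++) {
        c->i = (unsigned char)(c->i + 1);
        c->j = (unsigned char)(c->j + c->S[c->i]);
        t = c->S[c->i]; c->S[c->i] = c->S[c->j]; c->S[c->j] = t;
        buf[n] ^= c->S[(unsigned char)(c->S[c->i] + c->S[c->j])];
    }
}
```

With the key "Key", the nine bytes of "Plaintext" encrypt to BB F3 16 E8 D9 40 AF 0A D3, the commonly published test vector.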
There are even mechanisms to keep the plaintext as safe as possible in
memory. Here is a partial list of some of these mechanisms:
* Running Line Code: This is when only the code immediately needed is
decrypted, and then encrypted again after use. CPU intensive, but
extremely difficult to analyse.
* Proprietary Binary Formats: If the object code is stored in an unknown
format, it is quite difficult for the reverse engineer to determine what
is data and what is text.
--[ Runtime encryption techniques
--[ The virus approach
Adding code to an ELF executable is far from being new. There have been
known ELF viruses since about 1997, and Silvio was the first to publish
about it [2], [3].
One nasty property of the ELF format is its "easy loading" design
goal. The program headers and the associated segments map directly into
memory, speeding up the preparation of the executable when executing it.
The way it is implemented in the ELF format makes it difficult to change the
file layout after linking. Adding code or modifying the basic structure
becomes nearly impossible, since a lot of hardcoded values cannot be
adjusted without knowing the pre-linking information, such as relocation
information, symbols, section headers and the like. But most of this
information is either gone from the binary or incomplete.
Even with such information, modifying the structure of the ELF
executable is difficult (without using a sophisticated library such as
libbfd). For an in-depth discussion about reducing the pain when modifying
shared libraries with most of the symbol information intact, klog has
written an article about it [4].
Because of these difficulties, most attempts in the past have focused on
exploiting 'gaps' within the ELF binary, that get mapped into memory when
loading it, but remain unused. Such areas are needed to align the memory on
pages. As mentioned earlier, ELF has been designed for fast loading, and
this alignment in the file guarantees a one-to-one mapping of the file into
the memory. Also, as we will see below, this alignment allows easy
implementation of page-wise granularity for read, write and execution
permission.
So the 'usual' ELF virus searches through the host executable for such
gaps, and in case a sufficiently large area has been found it writes a copy
of itself into it. Afterwards it redirects the execution flow of the
program to its own area, often by just modifying the program entry point in
the ELF header. There have been numerous examples of such viruses, most
notably the 'VIT' [5] and 'Brundle-Fly' [6] virii.
While this approach works moderately well in practice, it cannot infect
every ET_EXEC ELF executable. The page size (PAGE_SIZE) on a UNIX system is
often 4096 bytes, and since the padding can be anything from zero bytes up
to almost a whole page, the chance of finding a usable gap depends on both
the virus size and the host executable. An average virus of this type takes
about 2000 bytes and hence can infect only about 50 percent of all
executables. For viruses this adds some non-deterministic fun and does not
really matter, but for reliable binary encryption it is a serious drawback.
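The 50 percent estimate can be reproduced under a simple assumption of our
own, not one stated by the original tools: the end of the text segment lands
at a uniformly random offset within its last page, so the padding p is
equally likely to be any value from 0 to S-1, where S is PAGE_SIZE. A virus
of V bytes fits when p >= V:

```latex
P(p \ge V) = \frac{S - V}{S},
\qquad
P(p \ge 2000)\Big|_{S = 4096} = \frac{4096 - 2000}{4096} = \frac{2096}{4096} \approx 0.51
```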
However, some mad people have used this approach for basic binary
encryption. The program that does this is called dacryfile, and a
demonstration copy of dacryfile* is available from [7]. Dacryfile uses a
data-injected parasite to perform the runtime decryption of the host file.
While dacryfile is undocumented, a limited amount of information is
provided here for the curious.
Dacryfile is a collection of tools that implements the following concept.
The host file is encrypted from the start of the .text section to the end
of the text segment. The file now has its object code and its read-only
data protected by encryption, while all its data and dynamic objects remain
open to inspection. The host file is then injected with a parasite that
performs the runtime decryption. This parasite can be of arbitrary size
because it is appended to the end of the data segment.
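Expressed in terms of section and program headers, the encrypted range
described above could be computed as follows. This is a sketch that assumes
the .text section header and the text segment's PT_LOAD header have already
been located; text_encrypt_range is a hypothetical helper, not part of
dacryfile.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Range from the file offset of .text to the end of the text segment's
 * file image. Everything in the segment before .text (ELF header, program
 * headers and the dynamic linking sections placed ahead of .text) stays in
 * the clear. Returns 0 and fills *off / *len, or -1 if .text does not lie
 * inside the segment.
 */
int text_encrypt_range(const Elf32_Shdr *text, const Elf32_Phdr *textseg,
                       Elf32_Off *off, Elf32_Word *len)
{
    Elf32_Off seg_end = textseg->p_offset + textseg->p_filesz;

    assert(off != NULL && len != NULL);
    if (text->sh_offset < textseg->p_offset || text->sh_offset >= seg_end)
        return -1;
    *off = text->sh_offset;
    *len = seg_end - text->sh_offset;
    return 0;
}
```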
In the default link map of a gcc-produced Linux ELF, the .dynamic section
is the last one before the .bss section. The .dynamic section is an array
of Elf32_Dyn structures, terminated by an entry whose d_tag is DT_NULL.
Therefore, regardless of how big the .dynamic section is, processing of its
contents halts when the terminating Elf32_Dyn entry is encountered. A
parasite can thus be injected at the end of the section without damaging
the host file in any way. The dacryfile program "inject" appends the .text
section of a parasite object file to the .dynamic section of a host
binary.
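Why the appended bytes are harmless can be shown with a loop of the same
shape a dynamic linker uses: it walks the Elf32_Dyn array only up to the
first DT_NULL entry. dynamic_used_entries is a hypothetical helper for
illustration; its max argument bounds the scan to the section size.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Count the entries that are actually processed: the scan stops at the
 * first DT_NULL tag, so anything stored after the terminator (such as an
 * injected parasite) is never interpreted as Elf32_Dyn data.
 */
size_t dynamic_used_entries(const Elf32_Dyn *dyn, size_t max)
{
    size_t i;

    assert(dyn != NULL);
    for (i = 0; i < max; i++)
        if (dyn[i].d_tag == DT_NULL)
            return i;
    return max; /* no terminator within the section: malformed */
}
```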
The parasite itself is fairly simple: it uses the Subversive Dynamic
Linking library [15] to access libc functions, and RC4 to decrypt the host.
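RC4 is small enough to fit comfortably into such a parasite. The following
is a generic textbook implementation (key schedule followed by keystream
generation), not dacryfile's own code; rc4_crypt is a hypothetical name.
Since RC4 is a stream cipher, the same call both encrypts and decrypts.

```c
#include <assert.h>
#include <stddef.h>

/* XOR the RC4 keystream for 'key' over 'buf' in place. */
void rc4_crypt(const unsigned char *key, size_t keylen,
               unsigned char *buf, size_t len)
{
    unsigned char S[256], tmp;
    unsigned int i, j = 0;
    size_t k;

    assert(key != NULL && keylen > 0);
    for (i = 0; i < 256; i++)
        S[i] = (unsigned char)i;
    /* key-scheduling algorithm (KSA): permute S under the key */
    for (i = 0; i < 256; i++) {
        j = (j + S[i] + key[i % keylen]) & 0xff;
        tmp = S[i]; S[i] = S[j]; S[j] = tmp;
    }
    /* pseudo-random generation (PRGA): XOR one keystream byte per byte */
    i = 0;
    j = 0;
    for (k = 0; k < len; k++) {
        i = (i + 1) & 0xff;
        j = (j + S[i]) & 0xff;
        tmp = S[i]; S[i] = S[j]; S[j] = tmp;
        buf[k] ^= S[(S[i] + S[j]) & 0xff];
    }
}
```

With the key "Key", the plaintext "Plaintext" encrypts to the well-known
test vector BB F3 16 E8 D9 40 AF 0A D3.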
The dacryfile collection is unsupported and undocumented; it, like all
other first-generation binary encryptors, is a dead end. However, a
dacryfile-protected binary is highly resistant to the recent, pitiful
attempts at reverse engineering by forensic experts. Provided the
encryption passphrase remains secret and is strong enough to withstand a
brute force attack, a dacryfile-protected binary keeps its object code and
read-only data secure from examination. The dynamic string table is still
available, but it provides only limited information about the
functionality of the binary.
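To illustrate how little the unencrypted dynamic string table gives away,
the sketch below prints the DT_NEEDED entries of a .dynamic array, which
name the shared libraries the binary depends on. It assumes the .dynamic
array and the dynamic string table have already been located in the file;
print_needed is a hypothetical helper.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>

/*
 * Print every DT_NEEDED library name; d_val is an offset into the dynamic
 * string table. Returns the number of names printed to 'out'.
 */
size_t print_needed(const Elf32_Dyn *dyn, const char *dynstr, FILE *out)
{
    size_t count = 0;

    assert(dyn != NULL && dynstr != NULL);
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_NEEDED) {
            fprintf(out, "NEEDED %s\n", dynstr + dyn->d_un.d_val);
            count++;
        }
    }
    return count;
}
```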
Also included with the article is a stripped-down but functional loader of
the burneye runtime encryption program. It is commented and should work
just fine.
* dacryphilia is a fetish in which one gains sexual arousal through the
tears of one's partner.
--[ Packing/Userspace ELF loader
The most flexible approach to wrapping an executable was invented by the
developers of the UPX packer [12], by John Reiser to be exact :). They load
the binary in userspace, much as the kernel does. When done properly, the
wrapped program shows no visible change in behaviour, and unlike the
techniques described before, this approach places no constraints on either
the wrapper or the wrapped executable. This is how we want to encrypt
binaries: by loading them from userspace.
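The core of such a userspace loader is the same per-segment arithmetic the
kernel performs. This is a minimal illustration assuming 32-bit ELF headers
and a power-of-two page size; struct seg_map and plan_segment are
hypothetical names. It only computes the mapping parameters and does not
call mmap() itself.

```c
#include <assert.h>
#include <elf.h>
#include <sys/mman.h>

/* Hypothetical description of one mapping a loader would create. */
struct seg_map {
    unsigned long addr;    /* page-aligned start address (before any load bias) */
    unsigned long off;     /* page-aligned file offset to map from */
    unsigned long filelen; /* bytes from 'addr' that are backed by the file */
    unsigned long memlen;  /* total bytes incl. .bss, rounded up to a page */
    int prot;              /* PROT_* bits derived from p_flags */
};

/* Translate ELF segment flags into mmap()/mprotect() protection bits. */
static int pflags_to_prot(Elf32_Word flags)
{
    return ((flags & PF_R) ? PROT_READ : 0) |
           ((flags & PF_W) ? PROT_WRITE : 0) |
           ((flags & PF_X) ? PROT_EXEC : 0);
}

/* Compute the page-granular mapping for one PT_LOAD segment. */
void plan_segment(const Elf32_Phdr *ph, unsigned long page, struct seg_map *m)
{
    unsigned long mask = page - 1;
    unsigned long skew = ph->p_vaddr & mask; /* offset within first page */

    assert(ph->p_type == PT_LOAD);
    assert(page != 0 && (page & mask) == 0);
    /* p_vaddr and p_offset must agree modulo the page size */
    assert((ph->p_offset & mask) == skew);
    m->addr = ph->p_vaddr - skew;
    m->off = ph->p_offset - skew;
    m->filelen = skew + ph->p_filesz;
    m->memlen = (skew + ph->p_memsz + mask) & ~mask;
    m->prot = pflags_to_prot(ph->p_flags);
}
```

A loader would then mmap() filelen bytes of the file at off to addr (plus a
load bias for position-independent objects) with the computed protection.
If p_memsz exceeds p_filesz, it must also zero the rest of the last
file-backed page and map anonymous zero-filled pages for the remainder of
memlen, which is where .bss lives.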
Normally the kernel is responsible for loading the ELF executable into
memory, setting page permissions and allocating storage. Then it passes
control to the code in the executable.
On today's systems this is no longer entirely true. The kernel still does
a lot of the initial work, but then hands over to a userspace runtime
linker (rtld), which resolves library dependencies and symbols and
prepares the linking. Only after the rtld has done all this backstage work
is control passed to the real program's entry point. The program finds
itself in a healthy environment with all library symbols resolved, a
well-prepared memory layout and a carefully watching runtime linker in the
background.
In normal system use this is a very hidden operation, and since it works
so smoothly nobody really cares. But as we are going to write a userspace
ELF loader, we have to deal with the details. To get a rough impression,
write a simple "hello world" program in C, compile it, and instead of just
running it, run it under strace. Ever wondered why your one-line program
issues so many syscalls?
This is the runtime linker in action: it maps the entire C library into
memory, prepares the page permissions and then resolves your 'printf'
symbol.
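A program can find out which runtime linker the kernel started on its
behalf by reading its own PT_INTERP program header, located through the
auxiliary vector entries AT_PHDR and AT_PHNUM. The sketch assumes a libc
that provides getauxval() (glibc 2.16 or later) and the ElfW() macro from
<link.h>; running_interp is a hypothetical helper name.

```c
#include <assert.h>
#include <link.h>     /* ElfW() */
#include <stddef.h>
#include <sys/auxv.h> /* getauxval() */

/*
 * Return the PT_INTERP path of the running program (the runtime linker
 * the kernel loaded before main), or NULL for a static executable.
 */
const char *running_interp(void)
{
    const ElfW(Phdr) *ph = (const ElfW(Phdr) *)getauxval(AT_PHDR);
    size_t n = getauxval(AT_PHNUM);
    ElfW(Addr) bias = 0;
    size_t i;

    assert(ph != NULL && n > 0);
    /* PT_PHDR records where the headers were linked to be; the difference
     * to AT_PHDR is the load bias (non-zero for PIE executables). */
    for (i = 0; i < n; i++)
        if (ph[i].p_type == PT_PHDR)
            bias = (ElfW(Addr))ph - ph[i].p_vaddr;
    for (i = 0; i < n; i++)
        if (ph[i].p_type == PT_INTERP)
            return (const char *)(bias + ph[i].p_vaddr);
    return NULL;
}
```

For a dynamically linked program this returns the path stored in its
.interp section; for a static executable it returns NULL.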
A lot of interesting details about the history of linkers and program
loading can be found in [8].
--[ The future
Forensic work on binary executables will become very difficult, and most
of the people who do forensics nowadays will drop out of the field. Most
likely some people from the reverse engineering 'scene' will move into
network security and become forensic analysts.
There are promising approaches to incorporating decompilation and
data/code flow analysis techniques into binary encryption to implement
further protections against tampering, analyzing and deprotecting such
binaries.
The strength of the next generation of protections will rely on the lack,
on most UNIX systems, of debug interfaces able to deal with hostile code.
The generation of protections that follows will rely solely on
sophisticated obfuscation to defeat static and dead-listing analysis.
There are approaches to replace the overtaxed ptrace interface [9] with
more powerful debug interfaces that can deal with hostile code. Work has
also been done on kernel-space debuggers, such as the PrivateICE debugger
[10].
Aside from poor debugging tools and bad debugging hooks, the only thing
that can be used to armour the runtime binary is heavy obfuscation that
makes it harder for a reverse engineer to see what is actually going on.
Remember that a reverse engineer can see each atomic operation that is
performed, as well as what is going on in memory (e.g. changed variables,
new mmaps, read()s, etc.). If this is to be defeated, they need to be
swamped with information. They need to be so badly off that they cry each
time they have to restart their debuggers!
--[ References
[1] Tool Interface Standard, Executable and Linking Format, Version 1.2
http://segfault.net/~scut/cpu/generic/TIS-ELF_v1.2.pdf
http://www.caldera.com/developers/gabi/latest/contents.html
http://www.caldera.com/developers/devspecs/gabi41.pdf
additional per-architecture information is available from
http://www.caldera.com/developers/devspecs/
[2] Silvio Cesare, Unix viruses
http://www.big.net.au/~silvio/unix-viruses.txt
[3] Silvio Cesare, Unix ELF parasites and virus
http://www.big.net.au/~silvio/elf-pv.txt
[4] klog, Phrack #56 article 9, Backdooring binary objects
http://www.phrack.org/show.php?p=56&a=9
[5] Silvio Cesare, The 'VIT' virus
http://www.big.net.au/~silvio/vit.html
[6] Konrad Rieck, Konrad Kretschmer
'Brundle-Fly', a good-natured Linux ELF virus
http://www.roqe.org/brundle-fly/
[7] The grugq, dacryfile binary encryptor
http://hcunix.7350.org/grugq/src/dacryfile.tgz
[8] John R. Levine, Linkers & Loaders
ISBN 1-55860-496-0
[9] Linux ptrace man page (see if you can catch the three errors)
http://www.die.net/doc/linux/man/man2/ptrace.2.html
[10] PrivateICE Linux system level symbolic source debugger
http://pice.sourceforge.net/
[11] Konstantin Boldyshev, Startup state of Linux/i386 ELF binary
http://linuxassembly.org/startup.html
[12] UPX, the Ultimate Packer for eXecutables
http://upx.sourceforge.net/
[13] GNU binutils
ftp://ftp.gnu.org
[14] Forensic analysis of a burneye protected binary
http://www.incidents.org/papers/ssh_exploit.pdf
http://staff.washington.edu/dittrich/misc/ssh-analysis.txt
[15] The grugq, Subversive Dynamic Linking
http://hcunix.7350.org/grugq/doc/subversivedl.pdf
begin 644 binary-encryption.tar.gz
M'XL(`#^^+#P``^S\97`>3;,V"%K,S,S,8#%8S,QD,3,S6,S,;+$LM%@6,S,S
M,S/K7MG/^[QPSG=F-F:^V(V-G9:C[^KLK*S$*ZM^M`W,;?0=W.F,;0P=W.V<
MS&UM&#[];[\8&5D9V=D_?_S^N?[K[U]C=E8F%B86]@_>3XQ,+,PLGS\1?O[?
MK\I_OYP=G?0="`D_.=C:.OT?\?V?O?__T<O@O\7?RE;?B-[P?^<:C$R,C&QL
MK/]3_-G86?X9?S9V-J8/?E86%M9/A(S_.Y7XGZ[_/X\_B;F-H96SD3$AC[&5
M";T9'_2_"":&-DY6_TER=#(RM_UO)"MS@_^D.=N8?Y#_"Y^[(\-_6^(WT=I:
MW^:_4S^BXO3?J4[N=L:.O\G0)$;&)N8VQH3R`F(B2A*:(E`BTJ*Z(NHB0KI_
M4P@9J`E9&3G9"*D9H*&AS6V<H$WTG?2M*`W-/N)-[4@%[0D-96?LX&#K0.E(
MQ0T-9>QF[D3)_#'RAH9VMG$T-[4Q-B*TLK4QA?Y=$;H?NE-^""$T,:(E%+$R
M86'6%3$S^A!D_''_FR+_AV+WA_+7,M0.QDZ_I_Y9[3^D0A$2VOP6;&AMY$C(
M2\A("PT%]4&SUK>S,K8Q=3+[T.@/VP=-W\C(X>/Q8_7?3^9_<UKKFECIF_Z>
M+",@KRNO**$JH"SRKY=V#K9.'^\HY17EE'45102$O?Z,U!0EE$5^V_M;02@H
M:L(/[73-K?5-C3]H_S+CXX7=;QT<G1R<#9T(_Z$IX8<5?ZGUH>?OPG'ZL]['
M@[&-T9^AT8>/_Q[K6UG9&GX\?(B!LC4QT77ZS?@Q^'CV_ENBH]9O!]+Q&>O:
MF=DX6^O0$E(36G%#0T-9&UL[&CM1DO_-1TO(Z,;(2$OH:.YA;&M"^9]Z47W,
M^@\Y5+]%F-@Z$%*:_W'NAS$?O[\CPTUH3LCSG\P?)!J:WRPT-%1_+'1T-7<R
M-".DM#.CX[/3_9UT?]$-]1T_,DY95UI.0)CKMX6_B5#_U4?4AA]+_5-OK7]&
MF89&A_L/OR$=W]_N^Z/4[T5<?@>9D)S0YQ^+ZEM]I`HA'2$3U;_/^7#F[Y!2
M_OL<FG](,#&W,G;T(*3YPP[USRKX+>*/W/^@_$OJ/P+V7Q3Y3Z'_9/X[HO\#
M]^^8>?R[OA_!_B?KQ_@CH/^SB=X?MX^BUG>V<OKC6P,'8WW+WV\^7GR4))35
MAZ2_7?DG0?ZNE'^C_\O9O^7JT/]37SI"JW\Y_??L#W`P_%C`R9A0G_!#;5L'
M]]^E1VAK0NAD9DSH:F9K94QH9*QO]1L]H,Q-/GS^SSKY6.\#L^PH956DI6G_
M5;"T?Q<=[3]+D_8/6C!24?'R_F:F^C#F`P^<'3ZL9OJMQ!_G?<3S=T51_=L"
M_Z'MGU3^9[%[_57MHA+J(L(?SG$U^PC0AX"/E/Y?9)S.7VG[6WVK?^8/WW\(
M_U.^OZVA=+$U-Z*FHORM$\V_<]`2_FON?RCVI\:A_@>CK?Z._Q_U_U;AG_'X
MH\0_4N\O)7]'A-[`T9'0T=CP]S[DH]+-;)VMC`@-C`D]C!ULC8WH">U,3$R<
MB/Z$Y$^Z6-'0</^5'!^3U8P)_^'<WQ'\;=,'TGYTA0\(^$/Z"#'5GZE_8_)O
MS_^%TE3_CH'_$/)O0.S]5_LP<K:VT_WH$7_-^><46L+_P/4_>/H;I/Y`_M^8
M_4]L%?D+6W_CSW_'VX^^\=L3_X#>_]HNC&V<=/^@YX>]O^?_UO_?.]%_6&'W
M'PQ_-:9_3S&:?X/`OZ/T?P4O_Q';?X-*0E[>OS'R`WC("8G^O?RI_JVR/ZSX
MRZ*/E?Z6_T'YJ$2Z?T>7WXQV#A^.-*$D%OGS^B]17(1D5LZTA+I_X>AOCW]0
M?A/,;<S_]:QM0TQ+^'O1?ZSU;PWD8YJNHYFQE175/XF_I_Z#]L<CT/],!\9_
MIL%?L*'[NV9^[P=^N_1WPO^5%;;.3KHV^M;&_\M^_W=F_*N7_V8W,?IG,R8D
MM/N(C=;?2*WSCVSXF/<1LP\O?62SC(#8OZC_I6__G^36G\3Y$ZR_EOT0:/M1
MU91_ZTQ+**>K**RFZ"6G*_2Q7U#^0"XVQ@_P^AU0.J;?D?MK`T4L]+LN;2B<
M_DS_;8.=\\>^Z`.(B/\X[?<2'SXRHOS;-79_JN0?+O[]\"&3B/<_"?^2_GOJ
M'QPV=OR0JO\AUN@ON?_+G+?[RP&_U_R`<4-K.\J_4\G<Z"/DM/]P'BVATE_.
MH_HW$*:D^]-Z_L(.&^./0/U&C@\T,W70MR8T^]##V,&1T,GV[T[Q!U?^ZA5_
M%=%O-/E?%YK=_U1C_VPEOZW_1^?ZL[?\VUE_[2?_VD.2_S/&'Q[C(60D_*>;
M""F)Q9VMG6U,;9T="?]L8O](,O_`G]^0]]>I[E_A^!NZ_@VT_@E3OP7_F_M_
ML_X6\]O8WS']CQ#\EN7J8.YD_(\4HB7\7TK\".Z_4/!?DLW_)(VKK8/E7]+^
M#@,CU3^+RUK_`ZS_U)6^@ZGA/W?1'P\N_\130L)_E<P_2NA#0RWFSVPZOU/\
M+RU_3_^M!_-?W>5O`'%V_%#U`Q<</XXWOXW[".Y'(?-]8,3O);08=?[L1?YY
M%/C36_Z$[%_E\H>12>>O:I&3E=;X7U3(G\+XRW&$OU'U=R[_9;.CS3]4^5OO
M?Q;&WP0J0CH.6D)B,D?Z#\T^>MC?NC']T>W?T.>W^_\YB?N_0-7_N^>__W[^
M-]+_&/Z62?^!D/2F'O_WSYC_Q^=_%B8F)M;_<OYG^W_.__\?N@@B(#_EH5+S
M?`(YY7ES7!Z,?T)[0\.KD>CLQ[Q<R,(K3M;CW$)CX^O,>$3&D!,',0.)SR8]
MO;1Q]'UQ?HAL_[0RU-[KJ8*@LB3.PKOV\<?"RTNG$7PT$7(O;,7_9*Y66*/%
MN;C2LR%YIU<EV)%T?$X/^'$BL:?7J+E2H6J1=)P@6:^ZQ.T&[,D?4[B;#O9]
M"X0'ZZEVBO\\",7G$A.6]]RX[YE@<5RNYMO%-.`<_LC7U]2V`C#0A/RF97#\
MC@ORB/^F9K^I]H/[L64>8*@-\USLNZ?W:FBN_"/IY5979(WF30W^WM7L?6.M
M=#;[R]Y`S<_G9_8%0&TQSQ6$H8PGFTM`]>GS/>4"#DPNS@H3^9=YA7G6&<BM
MSYC&*%_;_'UT;7/N7Y995I@H6*0ZRPCKK!IY?"K,*2<GV;IP\[4=<1B##0<[
M4!+",OBG65>@ZDOUN>6;[*W+2MYBDDF^!+:0-3U4\O^$E!V1^'3I9<5,NYL#
M<<GC]CPTE**G[;]"W+WKA33N):S255REV9O`QBRSZ$!T#L]4\:2TB'V=13[Z
M$^:IX7A#_RN-[4!U'&:@80WJY4*.Y_?J1$QIAB!V_]EAX^0\8-EA-"I>P?C)
M[6539.2!36"Z%2QHQS1*X%XZXO2&&VY_Q8T\[,7K3S1K71<KV5LXQNP&.\*,
M/BM89%Z(1:@{body}gt;MCP'.=TL*"@WVT]LZLE$2^MH62`;09$*6T"=>E3ZCN67186
M.GS(;@"T=12^3\^Z/1$+@+5CCS??2$L!@`,;509_(G@%X$5Q@^96EY=C<2EL
M'I#UQOL1*OSTW@^N0$M/H0`XLZ9U7(PZ>\@@422+$E1(>P&#M$2H7+ZT3R"@
MFS")O5'5)9D_N+IKS+TQ<X&:X3&CP2AIM+L)[4S;J9I`U:A[<GHF[XJ!*M==
M./;[0'W7-,35BRJ-$:QOF1#%H&!U_2A>2_&FB0^?SJ.LMV1"(Q!8V)RFLIKZ
M@F"E%WY:6:`BHENPHVT1K:F-P)2#;<*39$MC>"=3'>.WH,1S"ZI'??,44I%$
M#P1E&!"Q%4U$[(;7\W^$E!%,;DG9@PI+1^G)WJ:6X+`8B23L2A0F%JW.F8=-
M>^%L]3LM[#9JWK>BSIP,36R(>K;[N1Y"57G9;BWM+WQ9SBB8>0[N:E,\VJ"O
M?H@2Z/*%-.%>^!-0J1TE[.V<\!<+BILO[<THW[AA;16LP^F209&OR/N]OE,$
M/V'A)815B_I?XO]`_\3;36*,X7\MAM*K][V%V7XJYEN75]</UY$O/BF&TB**
M`:\3$I.[P$3%E/(8O"PC=V&6.9`R:XS;:G%PL>&V&[$]P7-+V2*J\-\HI&OL
M'<R''KUK;"@$A>?Q027+E1<]%TX%&[V7/74O#OT>,++N;<=0WD#7M?4+X>]&
MUEM<(-[?'SPPRG3;`Y^?H-YU%`S?>D3$+ER`3C),,2K(&P19LY[=JTI=I29Y
M;.OHY=(/;*4N]VL@#L[PZ])-Q_#][Y;)NF7%D[4$7U#';V5W\/CF-UBUQM;1
ME'BM[\?#5JFFRWI$VF8LV>BU[N"UM5WHEW7Z1=LXS%UP*CQNI/FI0)B_!0H`
M9/@O!!YTVN8D'XZCHG'G-NJ:J/)Y.[KW\=JF2.X36L##0=/,DW(]O[.G.^7[
M3L;=>!S\