---[  Phrack Magazine   Volume 8, Issue 52 January 26, 1998, article 08 of 20

-------------------------[  Steganography Thumbprinting

--------[  The HackLab (http://www.hacklab.com)

Steg`a*nog"ra*phy (?), n. [Gr. covered (fr. to cover closely) + -graphy.] The art of writing in cipher, or in characters which are not intelligible except to persons who have the key; cryptography.

i. Introduction

While this may be a general description of cryptography, steganography has come to describe not only the act of encrypting data, but also that of hiding its very existence. Steganography (or "stego") uses techniques to store a "message" file within a "container" file by altering the container file in such a way that it _appears_ unchanged. The resulting file, referred to as the stego file, contains the message file enclosed in a close approximation of the original container file.

Several tools exist (mostly for DOS/Windows/NT) that automate these functions, using DES, DES3 or IDEA as encryption methods and BMP, GIF, JPG, WAV, VOC and even ASCII files as containers. Using these tools, data can be hidden within images, sounds, and even other data files. However, these tools leave perceptible traces on their container files and offer nowhere near the level of obfuscation the user assumes.

This article provides the reader with a fundamental understanding of basic stego techniques and highlights some of the "thumbprints" left by modern steganographic toolsets, specifically on graphic images. It is not intended to challenge the cryptographic strength or the perceptible mathematical variances of current steganographic techniques; instead, it gives the reader a basic understanding of stego and suggests low-budget methods for detecting and cracking basic steganographic techniques. Also presented is a program that can be used to brute-force two of the most popular stego toolsets.

I. Basic Steganography

Simply put, steganography involves the hiding of messages.
While the various tools employ many techniques, the common denominator among most toolsets is the modification of some of the Least Significant Bits (LSBs) of the container file's individual bytes. In the simplest example, consider the following binary representations of the numbers 20 through 27:

    10100 10101 10110 10111 11000 11001 11010 11011

By modifying the LSBs of these binary numbers, we can hide the binary representation of the number 200 (11001000) across the above bytestream:

    10101 10101 10110 10110 11001 11000 11010 11010

By reading back the LSBs of this bytestream in order, we recover the number 200 (11001000). In this example, the original bytestream of the numbers 20-27 is the container, while the number 200 is the message file. It is a poor example, since the resulting stego file is not an accurate representation of the original: after modification to include the message file, the numbers 20-27 now read:

    21 21 22 22 25 24 26 26

In most stego applications, however, the container file does not contain bytestreams that are rendered useless by modifying LSB information. Instead, container files typically contain various levels of "noise" at the LSB level which, when viewed apart from the rest of the byte, can appear random. A sound (.WAV) file, for example, contains mostly inaudible background noise at the LSB level. An 8-bit graphic file will contain minor color differences at the LSB level, while a 24-bit image will contain color changes that are nearly imperceptible to the human eye. A very common container format is a 256-color, 8-bit image such as a GIF or BMP file.

II. Stego Techniques

In an 8-bit image such as a GIF or BMP, each pixel is described as a number from 0 to 255 which refers to an actual color in the "color lookup table", or palette.
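As a quick check of the LSB arithmetic from section I, the Python sketch below hides 200 in the bytes 20 through 27 (most significant bit first) and reads it back. It is illustrative only: the helper names embed_lsb, extract_lsb, int_to_bits and bits_to_int are hypothetical and are not taken from any stego tool.

```python
def embed_lsb(container, message_bits):
    """Overwrite the LSB of each container byte with one message bit."""
    if len(message_bits) > len(container):
        raise ValueError("container too small for message")
    stego = list(container)
    for i, bit in enumerate(message_bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def extract_lsb(stego, nbits):
    """Read back the LSBs of the first nbits bytes."""
    return [b & 1 for b in stego[:nbits]]


def int_to_bits(value, width=8):
    """Most significant bit first, e.g. 200 -> [1, 1, 0, 0, 1, 0, 0, 0]."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def bits_to_int(bits):
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


container = list(range(20, 28))             # the numbers 20..27 from the text
stego = embed_lsb(container, int_to_bits(200))
print(stego)                                # [21, 21, 22, 22, 25, 24, 26, 26]
print(bits_to_int(extract_lsb(stego, 8)))   # 200
```

Real tools spread the bits over far more bytes and usually encrypt the message first; this only shows the bit manipulation itself.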
A common misconception is that all images simply contain strings of bytes that describe individual colors, and that the graphic file simply lists these colors in left-to-right, top-to-bottom order. This is only partially true for 8-bit images. The palette lists every color used in the image (plus extra entries if fewer than 256 colors are actually used), and the image data itself is stored as a series of numbers from 0 to 255, each referencing an entry in the palette. The image is reconstructed by performing a palette lookup to determine the color to place at each pixel location.

To hide data within an 8-bit GIF or BMP container, most existing tools use one of two techniques, which I will term LSB palette reference modification and RGB element LSB modification.

LSB palette reference modification involves changing the LSB(s) of a _palette_reference_ (0 - 255) in order to hide the data contained in the message. Remember that a palette reference is simply a number from 0 - 255 that references a color, or entry, in the palette. To hide data, a program using palette reference modification may decide which color to point to based on the color's LSBs. Such a program pays no attention to how similar the colors are, only to whether the LSBs serve its purpose of data hiding. If adjacent colors in the palette have dissimilar LSBs, they are well suited for data hiding and become good candidates for storing hidden text in the final stegoed container. If a 0 (zero) is to be hidden, the stego program inserts the palette index of the color whose LSB is 0 (zero), and vice versa for hiding a 1 (one).

RGB element LSB modification involves modifying the pixel's _actual_color_ by changing the LSB of the Red, Green or Blue elements of the color in the color table.
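Setting the RGB technique aside for a moment, LSB palette reference modification can be sketched as below. This is an illustration under stated assumptions, not the code of Steganos or any other tool: it treats entries 2k and 2k+1 (indices differing only in their LSB) as a pair and re-points each pixel to whichever member of its pair has the required LSB, leaving the palette itself untouched. The usage example hides "This is a test." in a 50x50 all-black image that references only entry 0; all helper names are hypothetical.

```python
def text_to_bits(text):
    """ASCII text -> list of bits, most significant bit of each byte first."""
    return [(byte >> s) & 1 for byte in text.encode("ascii") for s in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits; trailing bits that do not fill a byte are ignored."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return out.decode("ascii")


def hide_in_references(pixels, bits):
    """pixels: palette indices (0-255). Only which entry each pixel points
    to changes; the palette colours are never modified."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Pick the neighbouring entry (2k or 2k+1) whose index LSB equals the bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def read_references(pixels, nbits):
    """Recover the hidden bits from the LSBs of the palette indices."""
    return [p & 1 for p in pixels[:nbits]]


pixels = [0] * (50 * 50)               # all-black image: every pixel uses entry 0
message = "This is a test."
stego = hide_in_references(pixels, text_to_bits(message))
print(sorted(set(stego)))              # [0, 1]: entry 0 and its neighbour, entry 1
print(bits_to_text(read_references(stego, len(message) * 8)))  # This is a test.
```

The output image now alternates between two palette neighbours, which is why a single-color container turns into a two-color "checkerboard" under this kind of hiding.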
For example, the color "white" is represented by the RGB values 255,255,255, which in binary equates to 11111111 11111111 11111111 (listed in RGB order). By altering the LSB of each element, we can hide data by making almost identical copies of colors that differ only in their LSBs. Since the color is changed by only one or two LSBs, the resulting colors are very close, perhaps indistinguishable to the human eye. As a result, nearly identical colors end up referenced by multiple table entries. This becomes extremely obvious when the palette is viewed sorted by luminance (relative brightness) in a product such as Paint Shop Pro: these similar colors are grouped right next to each other.

Using this technique, a binary 1 in the message file can be represented in the stego file by replacing a color in the container file with an altered version of that color whose R, G or B element ends with a binary 1. Likewise, a binary 0 in the message file can be represented by replacing the original color with an altered version whose R, G or B element ends with a binary 0.

III. Steganographic Thumbprints

Several tools are available that apply these techniques to files on several different platforms. I will focus on two specific toolsets: Steganos and S-Tools v4.0. Steganos is perhaps the most versatile and powerful of the toolsets, while S-Tools seems to be the easiest to use and the most widely used (not to mention that I like S-Tools; it's been around for a long time and is very well done). Other available toolsets offer similar functionality and hiding techniques.

In order to discover what the tools actually do when they hide data, it's best to use a simple BMP container file.
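The RGB element LSB modification described in section II can be sketched as follows. This is a simplified illustration under stated assumptions: only the blue LSB is used by default (a tool may use R, G or B), palette entries are plain (r, g, b) tuples, and appending a near-duplicate color stands in for overwriting the palette's slack entries. The names hide_in_rgb and read_rgb are hypothetical.

```python
def hide_in_rgb(palette, pixels, bits, channel=2):
    """palette: list of (r, g, b) tuples (at most 256); pixels: palette indices.
    Returns a new palette and pixel list in which the LSB of the chosen
    channel (0=R, 1=G, 2=B) of each pixel's colour carries one message bit."""
    palette = list(palette)
    stego = list(pixels)
    for i, bit in enumerate(bits):
        colour = list(palette[stego[i]])
        if (colour[channel] & 1) == bit:
            continue                      # colour already carries the right bit
        colour[channel] ^= 1              # near-identical copy, one LSB apart
        colour = tuple(colour)
        if colour not in palette:
            if len(palette) >= 256:
                raise ValueError("palette full")
            palette.append(colour)        # stands in for reusing slack entries
        stego[i] = palette.index(colour)
    return palette, stego


def read_rgb(palette, pixels, nbits, channel=2):
    """Recover the hidden bits from the chosen channel's LSB of each pixel."""
    return [palette[p][channel] & 1 for p in pixels[:nbits]]


palette = [(255, 255, 255)]               # white, as in the example above
pixels = [0] * 8
new_palette, stego = hide_in_rgb(palette, pixels, [1, 1, 0, 0, 1, 0, 0, 0])
print(new_palette)                        # [(255, 255, 255), (255, 255, 254)]
print(read_rgb(new_palette, stego, 8))    # [1, 1, 0, 0, 1, 0, 0, 0]
```

Hiding a single byte in white pixels already adds (255,255,254) to the palette, exactly the kind of near-duplicate color that stands out in a luminance-sorted palette view.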
The RGB BMP file uses a palette scheme identical to that of a GIF for the purposes of our tests, and all the reviewed toolsets can use BMP files as containers. For example, consider a container image that is 50 pixels by 50 pixels and contains only black (0,0,0) pixels. This image references palette entry 0 (zero) as its only color. I will use the shareware painting program Paint Shop Pro v4.10 (PSP) to create and analyze the base images. When creating this image, PSP used a default palette with 216 unique palette entries and 40 "filler" entries at the end of the palette, all of which contain the value (0,0,0), or pure black. Our message file is simply a text file containing the phrase "This is a test."

A. S-Tools

When the message file is hidden using S-Tools, the resulting 8-bit image appears identical to the original to the human eye. However, closer scrutiny reveals some oddities about the file. Since S-Tools uses RGB element LSB modification as its hiding technique, the palette has distinct and very obvious characteristics: many of the palette's colors are offset by a single bit in the R, G or B element. This is very obvious when the palette is sorted by luminance (brightness) and viewed with PSP. The first sixteen (and only original) colors in this palette are:

    (51,1,1)  (51,1,0)  (50,1,0)  (51,0,1)
    (51,0,0)  (50,0,1)  (50,0,0)  (1,1,0)
    (1,1,0)   (0,1,1)   (0,1,0)   (1,0,1)
    (1,0,1)   (1,0,0)   (0,0,1)   (0,0,0)

Notice that the RGB elements are offset by only 1 bit. This is an imperceptible color change and a very wasteful use of the palette; remember, there are only 256 colors to work with. Most 8-bit image creation programs are very careful when deciding which colors to include in the palette, and almost all use standard palettes that contain the most commonly used colors. Seeing a palette with this many _nearly_ identical colors is odd. Also, the palette has been adjusted to contain fewer colors.
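The single-bit-offset thumbprint can also be tested without a paint program. The sketch below (hypothetical function names) groups palette colors that become identical once every R, G and B LSB is cleared; any group holding more than one distinct color is a set of colors differing only in their LSBs. Applied to the sixteen S-Tools colors listed above, it flags every one of them. As a clean comparison it uses a 6x6x6 cube of multiples of 51 plus 40 black fillers, which is only assumed here to resemble PSP's 216-entry default palette; that palette produces no groups.

```python
from collections import defaultdict


def lsb_twin_groups(palette):
    """Group palette colours that become identical once every R, G and B
    LSB is cleared. Groups with more than one distinct colour are colours
    that differ only in their LSBs (the S-Tools-style thumbprint)."""
    groups = defaultdict(set)
    for r, g, b in palette:
        groups[(r & 0xFE, g & 0xFE, b & 0xFE)].add((r, g, b))
    return {base: sorted(cols) for base, cols in groups.items() if len(cols) > 1}


def lsb_twin_ratio(palette):
    """Fraction of distinct palette colours that have at least one LSB twin."""
    distinct = set(palette)
    flagged = sum(len(cols) for cols in lsb_twin_groups(palette).values())
    return flagged / len(distinct) if distinct else 0.0


stools = [(51, 1, 1), (51, 1, 0), (50, 1, 0), (51, 0, 1), (51, 0, 0), (50, 0, 1),
          (50, 0, 0), (1, 1, 0), (1, 1, 0), (0, 1, 1), (0, 1, 0), (1, 0, 1),
          (1, 0, 1), (1, 0, 0), (0, 0, 1), (0, 0, 0)]
cube = [(r, g, b) for r in range(0, 256, 51)
        for g in range(0, 256, 51) for b in range(0, 256, 51)] + [(0, 0, 0)] * 40

print(len(lsb_twin_groups(stools)))   # 2 groups: around (50,0,0) and (0,0,0)
print(lsb_twin_ratio(stools))         # 1.0
print(lsb_twin_groups(cube))          # {}
```

A ratio near zero is what a normal hand-drawn 8-bit image should show; a high ratio is the "many nearly identical colors" oddity described above.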
The standard colors selected by PSP have been replaced by some of the colors listed above. As is typical of this type of hiding, the slack space at the end of the palette has been reduced to make room for the new copies of existing colors. This type of hiding will always betray itself through single-bit offsets in one or more of the LSBs. Since this thumbprint is so easily identifiable, we will concentrate our efforts on the harder-to-detect palette reference method used by Steganos.

B. Steganos

Steganos kindly reminds you that 8-bit images don't make terribly secure containers. It's a good thing, too, because when the message file is hidden using Steganos, the resulting 8-bit image has a major anomaly: the stego image is completely different from the original! Instead of an all-black image, it now resembles a black-and-blue checkerboard. However, this difference is only obvious if you have access to the original image. Since an interceptor will most likely not have a copy of the original image, we will examine other methods of detection.

When the palette of the image is checked for single-bit offset colors (as in the stego image created with S-Tools), none can be found. Also, there is no more or less slack space at the end of the palette than existed in the original palette. Steganos does not alter the palette in any way when hiding data; it uses the LSB palette reference technique described above. However, there are very distinctive ways of determining whether this technique has been used to hide data, specifically by looking at _how_ the palette's colors are used. In this simple case, a histogram shows exactly the type of modification we are looking for. In the words of the PSP Help documentation, "A histogram is a graph of image color values, typically RGB values and/or luminance.
In a histogram, the spectrum for a color component appears on the horizontal axis, and the vertical axis indicates the portion of the image's color that matches each point on the component's spectrum." In a nutshell, this simply means a graph is generated showing how the color(s) are used in an image, and how similar (in shade) they are. When viewing the "blue" histogram for the Steganos-hidden file, we see something like this:

    [Blue-channel histogram (PSP): X axis = blue value 0-255, Y axis scaled
     0-100 where 100 = 1272 pixels; two bars reaching the top of the scale,
     one at blue = 0 and one at blue = 51]

The X axis shows the spectrum for the color blue (from 0 to 255). The Y axis shows the number of pixels in the image that match that color. When displaying a histogram, the 100 on the Y axis is not a percentage but a MAX value (in this case 1272), which indicates the greatest number of pixels used for _any_one_color_. Since only two colors are actually _used_ in this stego image, there are only two vertical bars. These bars indicate that in the blue color family only two colors are used: one with a blue value of zero, and another with a blue value of approximately 50 (51 to be exact).

Upon examining the color table for this image sorted in _palette_order_, it is evident that the only thing these two referenced colors have in common is that they sit right next to one another in the palette. The two colors are (0,0,0) and (0,0,51), or black and very, very dark blue. The image is mostly black, and Steganos probably picked the very dark blue color (blue = 00110011) as the 1 for some hidden data and black (00000000) as the 0, since these colors are _right_ next to each other in a palette-index-order color table listing. Although they reside next to each other in the palette, the colors are not very similar, which makes the final stego file appear discolored.
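The usage pattern described here, two palette-index neighbors referenced almost equally often, can be checked mechanically. The sketch below is an illustrative heuristic, not a feature of any tool, and rests on stated assumptions: it pairs entries as (2k, 2k+1), i.e. indices differing only in their LSB, and its 0.8 count-ratio threshold is arbitrary. The usage example models the 50x50 Steganos image as 1272 pixels on entry 0 and 1228 on entry 1, assuming the dark blue sits at index 1. Function names are hypothetical.

```python
from collections import Counter


def neighbour_pairs(pixels, min_ratio=0.8):
    """Flag palette-index pairs (2k, 2k+1) that are both used and whose
    usage counts are within min_ratio of each other. min_ratio is an
    arbitrary illustrative threshold, not a value taken from any tool."""
    counts = Counter(pixels)
    suspicious = []
    for even in range(0, 256, 2):
        a, b = counts[even], counts[even + 1]
        if a and b and min(a, b) / max(a, b) >= min_ratio:
            suspicious.append((even, even + 1, a, b))
    return suspicious


def used_pair_fraction(pixels, min_ratio=0.8):
    """Share of the used palette entries that sit in a suspicious pair."""
    used = len(set(pixels))
    return 2 * len(neighbour_pairs(pixels, min_ratio)) / used if used else 0.0


stego = [0] * 1272 + [1] * 1228          # 2500 pixels = 50 x 50
print(neighbour_pairs(stego))            # [(0, 1, 1272, 1228)]
print(used_pair_fraction(stego))         # 1.0
print(neighbour_pairs([0] * 2500))       # [] for the untouched all-black image
```

A genuine shaded image can also use neighboring entries, so a hit is a reason to look closer rather than proof; the telling sign is that nearly every used color has a similarly popular index neighbor.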
Steganos does not modify any of the colors; instead, it modifies how the original palette is used by making nearly equal references to a color and its neighbor (when sorted by palette index). Bottom line: this image uses neighboring palette colors a nearly identical number of times. 1272 pixels were used for black and 1228 pixels for the very dark blue. This would not be unusual if not for the fact that the colors are palette index neighbors. If the designer of the image were using some sort of shading effect, there would be many more than just two shades involved in this 256-color image, and the shading offsets would be greater. These two colors don't even appear as shades of one another when placed side by side. A skilled interceptor will know immediately that something is not quite right with these images. They both display typical signs of data hiding.

IV. Real-World Example

Intercepting a single-color image and determining that it is stegoed is a trivial task. The reader may think that increasing the number of colors used, within the boundaries of the 256-color palette, would obfuscate the hidden message file. However, by applying a few simple methodologies, a pattern emerges that can increase the odds of detecting a stegoed image. For example, if a two-color image is created using only black (0,0,0) and white (255,255,255), and data is hidden in the file using Steganos, the results show that Steganos used not only black and white but also two more colors from the palette, with values of (0,0,51) and (255,255,51) respectively. These newly used colors adjoin the original two colors in the palette listing, have differing LSBs, and are referenced nearly as often in the new image as the original colors are. A similar situation arises when a 6-color image is created. After Steganos hides the data, the original 6 colors and their palette neighbors will be used in the new file.
The 6 new colors become alternate representations of the original 6 colors in terms of their LSBs. This methodology holds true all the way up to images containing 256 different colors. By understanding these patterns, all 8-bit Steganos images can be detected without access to the original image.

When attempting to detect the use of steganography in 16- or 24-bit images, a great deal of pattern analysis must be used. 24-bit stego detection is not for the faint of heart, but it can be done. Standard "randomization" solutions fall well short of solving this problem, since LSB data produced by image creation programs is hardly random: it follows a pronounced pattern when viewed as part of a whole, an 8-bit number. Most standard graphics effects do not use random data; they use patterns to create and maintain a certain graphic illusion. Inserting "random" data, even at the LSB level, can become fuel for the analyst's fire.

In many 24-bit stego programs, the bits of the secret text are inserted with roughly even spacing between them, and then random "noise" is added to make the secret bits seem less obvious. The random noise would (should!) have a random interval between differing bits. The contrast of even spacing against random spacing may be enough not only to alert an analyst, but also to separate the secret bits from the random ones. The bottom line is that 24-bit detection is doable, just not practical for an amateur (yet!).

V. The Future

Steganography is in its infancy, but several new technologies are emerging, including selection and construction methods of data hiding and continuing research in the area of random distribution.

Selection involves generating a large number of copies of the same container file that differ slightly. In the case of an image file, you may make minor adjustments in hue, saturation and RGB levels with the goal that your secret message eventually _appears_ in the LSBs of the data!
Although difficult to generate, this type of data hiding is nearly impossible to detect, since the image's characteristics are not altered at all.

Construction simply involves modeling the characteristics of the original container when creating your message. In simplest terms, mold your message around the existing container instead of molding the container to your message. If, for example, the original image were left unchanged and a key were developed to create the message _from_ the image, detection would be impossible without the key.

Several advances are being made in the area of random distribution, notably by Tuomas Aura at the Helsinki University of Technology. His paper "Practical Invisibility in Digital Communication" presents a technique called "pseudorandom permutation", which brings steganography up to the technical level of cryptography and properly addresses the issue of randomness from a data hiding perspective. His paper is excellent reading and can be found at http://deadlock.hut.fi/ste/ste_html.html

Interesting research (and proof-of-concept work) is being done on using stego techniques in the reserved fields of TCP, UDP and ICMP packets. This research shows that steganography has merit and applications beyond sound and image files. Unfortunately, using stego where there was nothing before (i.e. within typically blank reserved fields) can raise a flag in and of itself.

Use encryption and compression to further protect data. It really doesn't matter if the secret data is discovered if the underlying crypto is secure.

VI. Conclusion

Detecting stego in an 8-bit image is fairly easy. Actually gaining access to the secret text is a bit harder, yet one simple and often overlooked method is to brute-force the application that created the file (see the S_BRUTE.WBT program below). On the other hand, 24-bit image analysis requires quite a bit of work.
If you choose to employ data hiding techniques, use 24-bit images and compress and encrypt your message file, bearing in mind that 24-bit images can raise flags simply due to their size.

When attempting to identify stego files in 8-bit images, keep in mind the following pointers: