---------------------------------------------------------------------------
           SCAN STRINGS, HOW THEY WORK, AND HOW TO AVOID THEM
---------------------------------------------------------------------------
                             By Dark Angel
---------------------------------------------------------------------------

Scan strings are the scourge of the virus author and the friend of anti-virus wanna-bes. The virus author must find encryption techniques which can successfully evade easy detection. This article will show you several such techniques.

Scan strings, as you are well aware, are a collection of bytes which an anti-viral product uses to identify a virus. The important thing to keep in mind is that these scan strings represent actual code and can NEVER contain code which could occur in a "normal" program. The trick is to use this to your advantage.

When a scanner checks a file for a virus, it searches for the scan string which could be located ANYWHERE IN THE FILE. The scanner doesn't care where it is. Thus, a file which consists solely of the scan string and nothing else would be detected as infected by a virus. A scanner is basically an overblown "hex searcher" looking for 1000 signatures. Interesting, but there's not much you can do to exploit this. The only thing you can do is to write code so generic that it could be located in any program (by chance).

Try creating a file with the following debug script and scanning it. This demonstrates the fact that the scan string may be located at any position in the file.

---------------------------------------------------------------------------
n marauder.com
e 0100 E8 00 00 5E 81 EE 0E 01 E8 05 00 E9
rcx
000C
w
q
---------------------------------------------------------------------------

Although scanners normally search for decryption/encryption routines, in Marauder's case, SCAN looks for the "setup" portion of the code, i.e. setting up BP (to the "delta offset"), calling the decryption routine, and finally jumping to program code.

What you CAN do is to either minimise the scannable code or to have the code constantly mutate into something different.
The reasons are readily apparent.

The simplest technique is having multiple encryption engines. A virus utilising this technique has a database of encryption/decryption engines and uses a random one each time it infects. For example, there could be various forms of XOR encryption or perhaps another form of mathematical encryption. The trick is to simply replace the code for the encryption routine each time with the new encryption routine.

Mark Washburn used this in his V2PX series of virii. In it, he used six different encryption/decryption algorithms, and some mutations are impossible to detect with a mere scan string. More on those later.

Recently, there has been talk of the so-called MTE, or mutating engine, from Bulgaria (where else?). It utilises the multiple encryption engine technique. Pogue Mahone used the MTE and it took McAfee several days to find a scan string. Vesselin Bontchev, the McAfee-wanna-be of Bulgaria, marvelled at the engineering of this engine. It is distributed as an OBJ file designed to be linked into any virus. Supposedly, SCANV89 will be able to detect any virus using the encryption engine, so it is worthless except for those who have an academic interest in such matters (such as virus authors).

However, there is a serious limitation to the multiple encryption technique, namely that scan strings may still be found. Still, scan strings must be isolated for each different encryption mechanism. An additional benefit is the possibility that the antivirus software developers will miss some of the encryption mechanisms, so not all the strains of the virus will be caught by the scanner.

Now we get to a much better (and sort of obvious) method: minimising scan code length. There are several viable techniques which may be used, but I shall discuss but three of them.

The one mentioned before which Mark Washburn used in V2P6 was interesting.
He first filled the space to be filled in with the encryption mechanism with dummy one byte op-codes such as CLC, STC, etc. As you can see, the flag manipulation op-codes were exploited. Next, he randomly placed the parts of his encryption mechanism in parts of this buffer, i.e. the gaps between the "real" instructions were filled in with random dummy op-codes. In this manner, no generic scan string could be located for this encryption mechanism of this virus. However, the disadvantage of this method is the sheer size of the code necessary to perform the encryption.

A second method is much simpler than this and possibly just as effective. To minimise scan code length, all you have to do is change certain bytes at various intervals. The best way to do this can be explained with the following code fragment:

        mov     si, 1234h               ; Starting location of encryption
        mov     cx, 1234h               ; Virus size / 2 + variable number
loop_thing:
        xor     word ptr cs:[si], 1234h ; Decrypt the value
        add     si, 2
        loop    loop_thing

In this code fragment, all the values which can be changed are set to 1234h for the sake of clarity. Upon infection, all you have to do is to set these variable values to whatever is appropriate for the file. For example, mov si, 1234h would have to be changed to have the encryption start wherever the virus would be loaded into memory (huh?). Ponder this for a few moments and all shall become clear. To substitute new values into the code, all you have to do is something akin to:

        mov     [bp+scratch+1], cx

where scratch is the label of an instruction. The exact value to add to scratch depends on the coding of the op-code. Some op-codes take their argument as the second byte, others take the third. Regardless, it will take some tinkering before it is perfect.

In the above case, the "permanent" code is limited to under five or six bytes. Additionally, these five or six bytes could theoretically occur in ANY PROGRAM WHATSOEVER, so it would not be prudent for scanners to search for these strings.
However, scanners often use scan strings with wild-card-ish scan string characters, so it is still possible for a scan string to be found.

The important thing to keep in mind when using this method is that it is best for the virus to use separate encryption and decryption engines. In this manner, shorter decryption routines may be found and thus shorter scan strings will be needed. In any case, using separate encryption and decryption engines increases the size of the code by at most 50 bytes.

The last method detailed is theft of decryption engines. Several shareware products utilise decryption engines in their programs to prevent simple "cracks" of their products. This is, of course, not a deterrent to any programmer worth his salt, but it is useful for virus authors. If you combine the method above with this technique, the scan string would identify the product as being infected with the virus, which is a) bad PR for the company and b) unsuitable for use as a scan string. This technique requires virtually no effort, as the decryption engine is already written for you by some unsuspecting PD programmer.

All the methods described are viable scan string avoidance techniques suitable for use in any virus. After a few practice tries, scan string avoidance should become second nature and will help tremendously in prolonging the effective life of your virus in the wild.
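
For reference, the scanner behaviour this article relies on (a scan string matched at any offset in a file, optionally with wild-card positions for bytes that vary) can be sketched roughly as follows. This is a minimal illustration in Python, not how SCAN or any real product is implemented. The function name find_signature and the choice of which bytes to wild-card are assumptions made for the example; only the twelve bytes themselves come from the debug script earlier in the article.

```python
from typing import Optional, Sequence

# A pattern is a sequence of byte values; None marks a wild-card position.
Pattern = Sequence[Optional[int]]


def find_signature(data: bytes, pattern: Pattern) -> int:
    """Return the first offset at which pattern matches data, or -1."""
    n = len(pattern)
    for offset in range(len(data) - n + 1):
        # Every non-wild-card position must equal the byte in the file.
        if all(p is None or data[offset + i] == p
               for i, p in enumerate(pattern)):
            return offset
    return -1


# The twelve bytes written by the debug script in this article.
MARAUDER_BYTES = bytes.fromhex("E800005E81EE0E01E80500E9")

# Exact scan string: every byte must match.
exact = list(MARAUDER_BYTES)

# Hypothetical wild-card variant: the 16-bit immediate (bytes 6 and 7)
# is allowed to vary. Which bytes a real scanner ignores is an assumption.
wild = list(MARAUDER_BYTES)
wild[6] = None
wild[7] = None

# The scan string is found no matter where it sits in the file.
padded = b"\x90" * 37 + MARAUDER_BYTES + b"\x00" * 5

# Changing the immediate defeats the exact string but not the wild-card one.
altered = bytearray(padded)
altered[37 + 6] = 0x22
altered[37 + 7] = 0x03
altered = bytes(altered)

if __name__ == "__main__":
    print(find_signature(padded, exact))
    print(find_signature(altered, exact))
    print(find_signature(altered, wild))
```

The wild-card case is the reason changing a handful of operand bytes does not by itself guarantee that no scan string exists: the fixed op-code bytes around the variable ones can still be matched.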