������������������������������
                         SCAN STRINGS, HOW THEY WORK,
                             AND HOW TO AVOID THEM
                        ������������������������������
                                 By Dark Angel
                        ������������������������������

  Scan strings  are the  scourge of  the virus author and the friend of anti-
  virus wanna-bes.   The  virus author  must find encryption techniques which
  can successfully  evade easy detection.  This article will show you several
  such techniques.

  Scan strings,  as you  are well  aware, are  a collection of bytes which an
  anti-viral product  uses to  identify a virus.  The important thing to keep
  in mind  is that  these scan  strings represent  actual code  and can NEVER
  contain code  which could occur in a "normal" program.  The trick is to use
  this to your advantage.

  When a  scanner checks  a file for a virus, it searches for the scan string
  which could  be located  ANYWHERE IN  THE FILE.   The  scanner doesn't care
  where it  is.   Thus, a  file which  consists solely of the scan string and
  nothing else  would be  detected as  infected by  a virus.   A  scanner  is
  basically  an   overblown  "hex  searcher"  looking  for  1000  signatures.
  Interesting, but  there's not  much you  can do  to exploit this.  The only
  thing you  can do  is to  write code so generic that it could be located in
  any program  (by chance).   Try  creating a  file with  the following debug
  script and  scanning it.   This  demonstrates the fact that the scan string
  may be located at any position in the file.

  ---------------------------------------------------------------------------

  n marauder.com
  e 0100  E8 00 00 5E 81 EE 0E 01 E8 05 00 E9

  rcx
  000C
  w
  q

  ---------------------------------------------------------------------------

  Although scanners  normally search  for decryption/encryption  routines, in
  Marauder's case,  SCAN looks  for the  "setup" portion  of the  code,  i.e.
  setting up  BP (to the "delta offset"), calling the decryption routine, and
  finally jumping to program code.

  What you  CAN do  is to  either minimise  the scannable code or to have the
  code constantly  mutate into  something different.  The reasons are readily
  apparent.

  The simplest  technique is  having multiple  encryption engines.   A  virus
  utilising this  technique has  a database  of encryption/decryption engines
  and uses  a random  one each  time it infects.  For example, there could be
  various forms  of XOR  encryption or  perhaps another  form of mathematical
  encryption.   The trick  is to  simply replace  the code for the encryption
  routine each time with the new encryption routine.

  Mark Washburn  used this  in his  V2PX series of virii.  In it, he used six
  different  encryption/decryption   algorithms,  and   some  mutations   are
  impossible to detect with a mere scan string.  More on those later.

  Recently, there  has been  talk of  the so-called  MTE, or mutating engine,
  from Bulgaria  (where else?).   It  utilises the multiple encryption engine
  technique.   Pogue Mahone  used the  MTE and it took McAfee several days to
  find a  scan string.   Vesselin  Bontchev, the McAfee-wanna-be of Bulgaria,
  marvelled the engineering of this engine.  It is distributed as an OBJ file
  designed to  be able to be linked into any virus.  Supposedly, SCANV89 will
  be able to detect any virus using the encryption engine, so it is worthless
  except for  those who  have an  academic interest  in such matters (such as
  virus authors).

  However,  there   is  a  serious  limitation  to  the  multiple  encryption
  technique, namely  that scan  strings may  still be  found.   However, scan
  strings must  be isolated  for each  different encryption  mechanism.    An
  additional  benefit   is  the   possibility  that  the  antivirus  software
  developers will  miss some  of the  encryption mechanisms  so not  all  the
  strains of the virus will be caught by the scanner.

  Now we  get to  a much better (and sort of obvious) method: minimising scan
  code length.   There are several viable techniques which may be used, but I
  shall discuss but three of them.

  The one  mentioned before which Mark Washburn used in V2P6 was interesting.
  He first  filled the  space to  be filled  in with the encryption mechanism
  with dummy  one byte  op-codes such  as CLC, STC, etc.  As you can see, the
  flag manipulation  op-codes were  exploited.   Next, he randomly placed the
  parts of  his encryption  mechanism in  parts of this buffer, i.e. the gaps
  between the  "real" instructions were filled in with random dummy op-codes.
  In this manner, no generic scan string could be located for this encryption
  mechanism of  this virus.   However, the disadvantage of this method is the
  sheer size of the code necessary to perform the encryption.

  A second  method is  much simpler than this and possibly just as effective.
  To minimise scan code length, all you have to do is change certain bytes at
  various intervals.   The  best way  to do  this can  be explained  with the
  following code fragment:

    mov si, 1234h                     ; Starting location of encryption
    mov cx, 1234h                     ; Virus size / 2 + variable number
  loop_thing:
    xor word ptr cs:[si], 1234h       ; Decrypt the value
    add si, 2
    loop loop_thing

  In this code fragment, all the values which can be changed are set to 1234h
  for the  sake of  clarity.   Upon infection,  all you  have to do is to set
  these variable  values to  whatever is  appropriate  for  the  file.    For
  example, mov  bx, 1234h  would have  to be  changed to  have the encryption
  start at the wherever the virus would be loaded into memory (huh?).  Ponder
  this for  a few  moments and  all shall  become clear.   To  substitute new
  values into the code, all you have to do is something akin to:

    mov [bp+scratch+1], cx

  Where scratch is an instruction.  The exact value to add to scratch depends
  on the  coding of  the op-code.   Some  op-codes take their argument as the
  second byte,  others take  the  third.    Regardless,  it  will  take  some
  tinkering before it is perfect.  In the above case, the "permanent" code is
  limited to  under five or six bytes.  Additionally, these five or six bytes
  could theoretically  occur in  ANY PROGRAM  WHATSOEVER, so  it would not be
  prudent for  scanners to search for these strings.  However, scanners often
  use scan  strings with wild-card-ish scan string characters, so it is still
  possible for a scan string to be found.

  The important  thing to  keep in  mind when using this method is that it is
  best for  the virus  to use separate encryption and decryption engines.  In
  this manner, shorter decryption routines may be found and thus shorter scan
  strings will  be needed.   In  any  case,  using  separate  encryption  and
  decryption engines increases the size of the code by at most 50 bytes.

  The last method detailed is theft of decryption engines.  Several shareware
  products utilise  decryption engines  in their  programs to  prevent simple
  "cracks" of  their products.   This  is, of  course, not a deterrent to any
  programmer worth  his salt,  but it  is useful  for virus  authors.  If you
  combine the  method above  with  this  technique,  the  scan  string  would
  identify the  product as  being infected with the virus, which is a) bad PR
  for the company and b) unsuitable for use as a scan string.  This technique
  requires virtually  no effort,  as the decryption engine is already written
  for you by some unsuspecting PD programmer.

  All the  methods described  are viable  scan  string  avoidance  techniques
  suitable for  use in  any virus.   After  a few practice tries, scan string
  avoidance should  become  second  nature  and  will  help  tremendously  in
  prolonging the effective life of your virus in the wild.