Last Change 7/17/93.  Please send updates directly to Harald.


86BUGS.LST   revision 1.0
By Harald Feldmann (harald.feldmann@almac.co.uk), mail address:
Hamarsoft, p.o. box 91, 6114 ZH  Susteren, The Netherlands.
(Please retain my name and address in the document)

This file lists undocumented and buggy instructions of the Intel 80x86
family of processors. Some of the information was obtained from the book
"Programmer's Technical Reference: The Processor and Coprocessor" by
Robert L. Hummel (Ziff-Davis Press, ISBN 1-56276-016-5), which is highly
recommended. Note that Intel does not support these special features and
may decide to drop opcode variants and instructions in future products.

All mentioned trademarks and/or tradenames are owned by the respective
owners and are acknowledged.

Undocumented instructions and undocumented features of Intel and IIT
processors:

AAD:	    OPCODE: d5,0a     OPCODE VARIANT

	    This instruction regularly performs the following action:
		- unpacked BCD in AX	  example (AX = 0104h)
		- AL = AH * 10d + AL		  (AL = 0eh )
		- AH = 00			  (AH = 00h )

	    The normal opcode decodes as follows: d5,0a
	    The encoding is an opcode byte (d5) plus an immediate
	    operand byte (0a). By replacing the second byte with any
	    number in the range 00 - ff we can build our own AAD
	    instruction for other number bases in that range. For
	    example by coding d5,10 we achieve an instruction that
	    performs: AL = AH * 16d + AL.

	    Note: the variant is not supported on all 80x86-compatible
	    CPUs, notably the NEC V-series, because some hard-code the
	    multiplier at 0Ah.
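
The effect of the operand byte can be checked away from the CPU with a
small model. The Python sketch below is not part of the original list:
the function name aad is made up for illustration, and it assumes the
generalised behaviour described above (AL = AH * base + AL, AH = 0),
with the result kept in 8 bits.

```python
def aad(ax, base=0x0A):
    """Model of AAD with an arbitrary operand byte (hypothetical helper).

    ax   -- 16-bit value of AX before the instruction
    base -- second opcode byte (0x0A for the documented form)
    Returns the 16-bit value of AX afterwards.
    """
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    al = (ah * base + al) & 0xFF   # the sum is truncated to 8 bits
    return al                      # AH is cleared, so AX equals AL


print(hex(aad(0x0104)))        # documented form d5,0a
print(hex(aad(0x0104, 0x10)))  # variant d5,10
```

With AX = 0104h the documented form gives 0eh, as in the example
above, and the d5,10 variant gives 14h.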

AAM:	    OPCODE: d4,0a     OPCODE VARIANT

	    This instruction regularly performs the following action:
		- binary number in AL
		- AH = AL / 10d
		- AL = AL MOD 10d

	    Thus creating an unpacked BCD in AX.
	    The normal opcode decodes as follows: d4,0a
	    The encoding is an opcode byte (d4) plus an immediate
	    operand byte (0a). By replacing the second byte with any
	    number in the range 00 - ff we can build our own AAM
	    instruction for other number bases in that range. For
	    example by coding d4,07 we achieve an instruction that
	    performs: AH = AL / 07d, AL = AL MOD 07d.

	    The AAD and AAM opcode variants have been found in Future
	    Domain SCSI controller ROMs.
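
The AAM variant can be modelled the same way. This Python sketch is an
illustration only: the name aam is hypothetical, and it assumes the
operand byte simply replaces the divisor 10d. An operand of 00 is a
division by zero; the model lets Python's ZeroDivisionError stand in
for the divide error a real CPU would raise.

```python
def aam(al, base=0x0A):
    """Model of AAM with an arbitrary operand byte (hypothetical helper).

    al   -- 8-bit binary value in AL before the instruction
    base -- second opcode byte (0x0A for the documented form)
    Returns the 16-bit value of AX afterwards (AH = quotient, AL = remainder).
    """
    ah, al = divmod(al & 0xFF, base)   # base 0 raises ZeroDivisionError
    return (ah << 8) | al


print(hex(aam(0x0E)))        # documented form d4,0a
print(hex(aam(0x0E, 0x07)))  # variant d4,07
```

For AL = 0eh the documented form yields AX = 0104h (unpacked BCD 14),
and the d4,07 variant yields AX = 0200h, since 14 = 2 * 7 + 0.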


LOADALL:    OPCODE: 0f,05 (i80286) & 0f,07 (i80386 & i80486)
	    UNDOCUMENTED

	    Load _ALL_ processor registers. Does exactly as the name
	    suggests; separate versions exist for the i80286 and the
	    i80386. The i80286 LOADALL instruction reads a block of 102
	    bytes into the chip, starting at address 000800 hex. The
	    i80286 LOADALL takes 195 clocks to execute.
	    The layout is as follows (Hex address, Bytes, Register):

		0800: 6 N/A
		0806: 2 MSW (Machine Status Word)
		0808: 14 N/A
		0816: 2 TR (Task Register)
		0818: 2 FLAGS (Flags)
		081a: 2 IP (Instruction Pointer)
		081c: 2 LDT (Local Descriptor Table)
		081e: 2 DS (Data Segment)
		0820: 2 SS (Stack Segment)
		0822: 2 CS (Code Segment)
		0824: 2 ES (Extra Segment)
		0826: 2 DI (Destination Index)
		0828: 2 SI (Source Index)
		082a: 2 BP (Base Pointer)
		082c: 2 SP (Stack Pointer)
		082e: 2 BX (BX register)
		0830: 2 DX (DX register)
		0832: 2 CX (CX register)
		0834: 2 AX (AX register)
		0836: 6 ES cache (ES descriptor _cache_)
		083c: 6 CS cache (CS descriptor _cache_)
		0842: 6 SS cache (SS descriptor _cache_)
		0848: 6 DS cache (DS descriptor _cache_)
		084e: 6 GDTR (Global Descriptor Table)
		0854: 6 LDT cache (Local Descriptor Table _cache_)
		085a: 6 IDTR (Interrupt Descriptor table)
		0860: 6 TSS cache (Task State Segment _cache_)

	    Descriptor cache entries are internal copies of the
	    original registers (the LDT cache is normally a copy of the
	    last regularly _loaded_ LDT). Note that after executing
	    LOADALL, the chip will use the _cache_ registers without
	    re-checking the caches against the regular registers. That
	    means that cache and register do not have to be the same.
	    Caches are updated when the original register is loaded
	    again. Both will then contain the same value.

	    Descriptor caches layout:
	    3 bytes	   24 bit physical address of segment
	    1 byte	   access rights byte, mapped as access right
			   byte in a regular descriptor. The present
			   bit now represents a valid bit. If this bit
			   is cleared (zero) the segment is invalid and
			   accessing it will trigger exception 0dh. The
			   DPL (Descriptor Privilege Level) fields of
			   the CS and SS descriptor caches determine
			   the CPL (Current Privilege Level).
	    2 bytes	   16 bit segment limit.
	    This layout is the same for the GDTR and IDTR registers,
	    except that the access rights byte must be zero.
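
The i80286 block can be assembled in a buffer before it is copied to
000800 hex. The Python sketch below only illustrates the layout given
above. Its names (pack_cache, build_block) are made up, and it assumes
that every field is stored little-endian and that the descriptor cache
fields appear in the order listed (base, access rights, limit).

```python
import struct

# Offsets relative to 0800h, taken from the register table above.
WORD_FIELDS = {
    "MSW": 0x06, "TR": 0x16, "FLAGS": 0x18, "IP": 0x1A, "LDT": 0x1C,
    "DS": 0x1E, "SS": 0x20, "CS": 0x22, "ES": 0x24, "DI": 0x26,
    "SI": 0x28, "BP": 0x2A, "SP": 0x2C, "BX": 0x2E, "DX": 0x30,
    "CX": 0x32, "AX": 0x34,
}
CACHE_FIELDS = {
    "ES": 0x36, "CS": 0x3C, "SS": 0x42, "DS": 0x48,
    "GDTR": 0x4E, "LDT": 0x54, "IDTR": 0x5A, "TSS": 0x60,
}
BLOCK_SIZE = 102


def pack_cache(base, access, limit):
    """6-byte cache entry: 24-bit base, access rights byte, 16-bit limit."""
    if not 0 <= base < (1 << 24):
        raise ValueError("base must fit in 24 bits")
    # Assumption: little-endian byte order, fields in the listed order.
    return struct.pack("<I", base)[:3] + struct.pack("<BH", access, limit)


def build_block(words, caches):
    """Return the 102-byte image; fields that are not given stay zero."""
    block = bytearray(BLOCK_SIZE)
    for name, value in words.items():
        struct.pack_into("<H", block, WORD_FIELDS[name], value)
    for name, (base, access, limit) in caches.items():
        offset = CACHE_FIELDS[name]
        block[offset:offset + 6] = pack_cache(base, access, limit)
    return bytes(block)


# Illustrative values only: CS selector and a CS cache that disagree,
# which LOADALL permits according to the text above.
block = build_block(
    {"CS": 0xF000, "IP": 0xFFF0},
    {"CS": (0x0F0000, 0x9B, 0xFFFF)},
)
print(len(block), block[0x3C:0x42].hex())
```

The last cache entry starts at offset 60h and is 6 bytes long, which
is where the total of 66h = 102 bytes comes from.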


	    i80386 LOADALL:
	    The i80386 variant loads 204 (dec) bytes from the address at
	    ES:EDI and resumes execution in the specified state.
	    No timing information available.

	    relative offset: Bytes:   Registers:
		0000:	     4	      CR0
		0004:	     4	      EFLAGS
		0008:	     4	      EIP
		000c:	     4	      EDI
		0010:	     4	      ESI
		0014:	     4	      EBP
		0018:	     4	      ESP
		001c:	     4	      EBX
		0020:	     4	      EDX
		0024:	     4	      ECX
		0028:	     4	      EAX
		002c:	     4	      DR6
		0030:	     4	      DR7
		0034:	     4	      TR
		0038:	     4	      LDT
		003c:	     4	      GS (zero extended)
		0040:	     4	      FS (zero extended)
		0044:	     4	      DS (zero extended)
		0048:	     4	      SS (zero extended)
		004c:	     4	      CS (zero extended)
		0050:	     4	      ES (zero extended)
		0054:	    12	      TSS descriptor cache
		0060:	    12	      IDT descriptor cache
		006c:	    12	      GDT descriptor cache
		0078:	    12	      LDT descriptor cache
		0084:	    12	      GS descriptor cache
		0090:	    12	      FS descriptor cache
		009c:	    12	      DS descriptor cache
		00a8:	    12	      SS descriptor cache
		00b4:	    12	      CS descriptor cache
		00c0:	    12	      ES descriptor cache

	    Descriptor caches layout:
	    1 byte	   zero
	    1 byte	   access rights byte, same as i80286
	    2 bytes	   zero
	    4 bytes	   32 bit physical base address of segment
	    4 bytes	   32 bit segment limit
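
The i80386 table is regular enough that its offsets can be derived
from the field order alone. The Python sketch below is an illustration
of the layout above, not a tested LOADALL driver: the helper names are
made up, and little-endian storage of each field is an assumption.

```python
import struct

# Field order taken from the table above: 21 dwords, then 10 caches.
DWORD_FIELDS = ["CR0", "EFLAGS", "EIP", "EDI", "ESI", "EBP", "ESP",
                "EBX", "EDX", "ECX", "EAX", "DR6", "DR7", "TR", "LDT",
                "GS", "FS", "DS", "SS", "CS", "ES"]
CACHE_FIELDS = ["TSS", "IDT", "GDT", "LDT", "GS", "FS", "DS", "SS",
                "CS", "ES"]
CACHE_START = 4 * len(DWORD_FIELDS)              # 0054h
TABLE_SIZE = CACHE_START + 12 * len(CACHE_FIELDS)


def dword_offset(name):
    return 4 * DWORD_FIELDS.index(name)


def cache_offset(name):
    return CACHE_START + 12 * CACHE_FIELDS.index(name)


def pack_cache(base, access, limit):
    """12-byte entry: zero, access rights, two zero bytes, base, limit."""
    return struct.pack("<BBHII", 0, access, 0, base, limit)


def build_table(dwords, caches):
    """Return the 204-byte image; fields that are not given stay zero."""
    table = bytearray(TABLE_SIZE)
    for name, value in dwords.items():
        struct.pack_into("<I", table, dword_offset(name), value)
    for name, (base, access, limit) in caches.items():
        offset = cache_offset(name)
        table[offset:offset + 12] = pack_cache(base, access, limit)
    return bytes(table)


# Illustrative values only.
table = build_table({"EIP": 0x0000FFF0, "CS": 0xF000},
                    {"CS": (0xFFFF0000, 0x9B, 0x0000FFFF)})
print(len(table), hex(cache_offset("ES")))
```

Deriving the offsets this way reproduces the table: the ES descriptor
cache lands at 00c0h and the total comes to 204 bytes.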


UNKNOWN:    OPCODE: 0f,04     UNDOCUMENTED

	    This instruction is likely to be an alias for the LOADALL on
	    the i80286. It is not documented and is even marked as
	    unused in the 'Programmer's technical reference'. Still it
	    executes on the i80286. >> info wanted <<


SETALC:     OPCODE: d6	      UNDOCUMENTED

	    This instruction copies the Carry Flag to every bit of the
	    AL register. When the Carry Flag is set (CY), AL becomes
	    ffh. When the Carry Flag is cleared, AL becomes 00.
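
Since the result is either all ones or all zeros, it can serve as a
bit mask. The Python sketch below models that. The names setalc and
select are made up, and the branch-free selection is only one
plausible use, not something the original list describes.

```python
def setalc(carry_flag):
    """Model of opcode d6: AL becomes ffh when CF is set, 00 otherwise."""
    return 0xFF if carry_flag else 0x00


def select(carry_flag, if_set, if_clear):
    """Pick one of two bytes with the mask instead of a branch."""
    mask = setalc(carry_flag)
    return (if_set & mask) | (if_clear & ~mask & 0xFF)


print(hex(setalc(True)), hex(setalc(False)))
print(hex(select(True, 0x12, 0x34)), hex(select(False, 0x12, 0x34)))
```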


Floating Point special instructions:

FMUL4X4:    OPCODE: db,f1     IIT ONLY

	    This instruction is available only on the IIT (Integrated
	    Information Technology Inc.) math processors.
	    Takes 242 clocks.
	    The instruction performs a 4x4 matrix multiply in one
	    instruction using four banks of 8 floating point registers:
	    a 4x4 coefficient matrix A is multiplied by the 4-element
	    vector (Xo, Yo, Zo, Vo). The operands must be loaded to a
	    specific bank in a specific order. The equation solved can
	    be represented by:

	    Xn = (A00 * Xo) + (A01 * Yo) + (A02 * Zo) + (A03 * Vo)
	    Yn = (A10 * Xo) + (A11 * Yo) + (A12 * Zo) + (A13 * Vo)
	    Zn = (A20 * Xo) + (A21 * Yo) + (A22 * Zo) + (A23 * Vo)
	    Vn = (A30 * Xo) + (A31 * Yo) + (A32 * Zo) + (A33 * Vo)

	    Where Xo stands for the original X value and Xn for the
	    result. Operands must be loaded to the following registers
	    in the specified banks in the specified order.

                        Before FMUL4X4            After FMUL4X4

            Register:   bank 0  bank 1  bank 2    bank 0

                ST(0)   Xo      A33     A31       Xn
                ST(1)   Yo      A23     A21       Yn
                ST(2)   Zo      A13     A11       Zn
                ST(3)   Vo      A03     A01       Vn
                ST(4)           A32     A30       ?
                ST(5)           A22     A20       ?
                ST(6)           A12     A10       ?
                ST(7)           A02     A00       ?
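
The register assignment is easy to get wrong by hand, so it can help
to generate it. The Python sketch below is an illustration only: the
names fmul4x4 and bank_layout are made up, and it assumes the operation
is an ordinary 4x4 matrix times 4-element vector product computed with
Python floats rather than the chip's internal format.

```python
def fmul4x4(a, v):
    """Reference result: a is a 4x4 list of rows, v is [Xo, Yo, Zo, Vo]."""
    return [sum(a[r][c] * v[c] for c in range(4)) for r in range(4)]


def bank_layout(a):
    """Return (bank1, bank2) as lists indexed by ST(i), per the table.

    Bank 1 holds columns 3 and 2 of the matrix, bank 2 holds columns
    1 and 0, each column stored from row 3 down to row 0.
    """
    bank1 = [a[3 - i][3] for i in range(4)] + [a[3 - i][2] for i in range(4)]
    bank2 = [a[3 - i][1] for i in range(4)] + [a[3 - i][0] for i in range(4)]
    return bank1, bank2


# Label matrix, so the layout can be compared with the table by eye.
labels = [["A%d%d" % (r, c) for c in range(4)] for r in range(4)]
bank1, bank2 = bank_layout(labels)
print(bank1)
print(bank2)

# A scaling matrix doubles every component of the vector.
scale = [[2.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
print(fmul4x4(scale, [1.0, 2.0, 3.0, 4.0]))
```

Because FLD pushes onto the register stack, the values would have to
be loaded in reverse order, ST(7) first, to end up in these positions.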



	    All four banks can be selected by using the bankswitching
	    instructions, but only banks 0, 1 and 2 make sense since
	    bank 3 is an internal scratchpad. Each bank can contain 8
	    floating point values and may be re-used with normal
	    instructions. Each bank acts like an independent i80287,
	    except that the initial status is not maintained when
	    bankswitching in between.

	    Pseudo-multichip operation can be performed in each bank
	    and even in multiple banks at the same time (although only
	    one instruction will operate on one register at any given
	    time), provided that the active register and top register
	    are not changed after switching from bank to bank.


	    EXAMPLE:
		FINIT			     ; reset control word
		FSBP1			     ; select bank 1
		FLD DWORD PTR es:[si]	     ; first original
		FLD DWORD PTR es:[si+4]      ; second original
		FLD DWORD PTR es:[si+8]      ; third original
		FSTCW WORD PTR [bx]	     ; save FPU control status
		FSBP2			     ; NOTE ! you will see three
					     ; active registers in this
					     ; bank when using a
					     ; debugger
		FINIT			     ; nothing visible
		FLD DWORD PTR [si]	     ; new value
		FLD DWORD PTR [si+4]	     ; second new value
		FADD ST,ST(1)		     ; two values visible
		FSTP DWORD PTR [si+8]	     ; one value visible
		FSBP1			     ; one original visible
		FLDCW WORD PTR [bx]	     ; restore FPU status to the
					     ; one active in bank 1,
					     ; causing original three
					     ; values to be visible
					     ; again in correct
					     ; sequence

		; ... simply continue with what you wanted to do with
		; those numbers from es:[si], they are still there.

		FLD DWORD PTR [si+8]	     ; for instance...


	    This feature of the IIT chips can be used to perform complex
	    operations in registers with many components remaining the
	    same for a large dataset, only saving intermediary results
	    to ONE memory location, bankswitching to the next series of
	    operands, loading that ONE operand and continuing the
	    calculation with the next set of operands already in that
	    bank. This does require another read into the new bank but
	    may save time and memory space compared to memory based
	    operands or multiple pass algorithms with multiple arrays of
	    intermediary results.



BANKSWITCH INSTRUCTIONS:

FSBP0:	    OPCODE: db,e8     IIT ONLY
	    Selects the original bank. (default) (6 clocks)


FSBP1:	    OPCODE: db,eb     IIT ONLY

	    Selects bank 1 from FMUL4X4 instruction diagram (6 clocks)


FSBP2:	    OPCODE: db,ea     IIT ONLY

	    Selects bank 2 from FMUL4X4 instruction diagram (6 clocks)

FSBP3:	    OPCODE: db,e9     IIT ONLY	 UNDOCUMENTED
	    Selects the scratchpad bank3 used by the FMUL4X4 internally.
	    Not very useful but funny to look at... How-to: load
	    any value into bank 0,1 or 2 until you have a full 8
	    registers, then execute this bankswitch. Using a
	    debugger like CodeView you are now able to inspect the
	    bank3 registers. (most likely to take 6 clocks)
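
An assembler may not know the IIT mnemonics, in which case the opcode
bytes have to be emitted as data. The Python sketch below generates
such lines from the opcodes listed in this file. The helper name
db_line and the MASM-style DB syntax are assumptions, so adapt the
output format to the assembler in use.

```python
# Opcode bytes as given in this file for the IIT-only instructions.
IIT_OPCODES = {
    "FMUL4X4": (0xDB, 0xF1),
    "FSBP0": (0xDB, 0xE8),
    "FSBP1": (0xDB, 0xEB),
    "FSBP2": (0xDB, 0xEA),
    "FSBP3": (0xDB, 0xE9),
}


def db_line(mnemonic):
    """Return a DB directive that encodes the given IIT instruction."""
    opcode = IIT_OPCODES[mnemonic]
    operands = ",".join("0%02Xh" % byte for byte in opcode)
    return "DB %s\t; %s" % (operands, mnemonic)


for name in ("FSBP1", "FSBP2", "FMUL4X4", "FSBP0"):
    print(db_line(name))
```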



TRIGONOMETRIC FUNCTIONS:

	    Apparently the IIT 2c87 recognises and executes some
	    i80387 trigonometric functions. UNDOCUMENTED
	    FSIN (sine) and FCOS (cosine) have been tested and function
	    according to the Intel 80387 specifications. FSINCOS
	    (available on the Intel 80287XL, 80387 and up) does not
	    work.

FSIN:	    OPCODE: d9,fe     IIT 2c87+ (also Intel 80387+) UNDOCUMENTED
	    Calculates the sine of the input in radians in ST(0). After
	    calculation, ST(0) contains the sine. Takes approximately
	    120 clocks.

FCOS:	    OPCODE: d9,ff     IIT 2c87+ (also Intel 80387+) UNDOCUMENTED
	    Calculates the cosine of the input in radians in ST(0).
	    After calculation, ST(0) contains the cosine. Takes
	    approximately 120 clocks.


... CUT HERE FOR FIRST REVISION, next part is to be revised ...



Instructions by mnemonic:

mnemonic:	opcode: processor:  remark & remedy:

AAA	       i80286 & i80386 & i80486

CMPS			i80286
CMPXCHG 		i80486
FINIT
FSTSW
FSTCW


INS			i80286 &
			i80386 &
			i80486

INVD			i80486



MOV to SS	 n/a	early 8088  Some early 8088s would not properly
				    disable interrupts after a move to
				    the SS register. A workaround is
				    to explicitly disable interrupts
				    (CLI), update SS and SP and then
				    re-enable interrupts (STI).
				    Typically this would occur when
				    relocating a stack in memory to
				    more than 64Kb from the original
				    one, updating both SS and SP as in:
				      MOV SS,AX  ; should disable
						 ; interrupts
						 ; automatically during
						 ; this and the next
						 ; instruction.
				      MOV SP,DX  ; interrupts disabled
				      ...	 ; interrupts enabled.


multiple prefixes
with REPx		8088 & 8086 They would not properly restart at
				    the first prefix byte after an
				    interrupt when more than one
				    prefix is used, e.g. LOCK REP MOVSW
				    CS:[BX]. Because of the CS
				    override, the REP and LOCK prefixes
				    would not be recognised to be part
				    of the instruction and the REP MOVSW
				    would be aborted. A workaround is
				    to test after the instruction for
				    CX==0:
				      here: LOCK REP MOVSW CS:[BX]
					    OR CX,CX
					    JNZ here
				    This also seems to be the case for
				    a REP MOVSW CS:[BX]. Note that this
				    also implies that REPZ and REPNZ
				    are affected, in SCASW for
				    instance.
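
The retry loop in the workaround can be illustrated with a toy model.
The Python sketch below is a deliberate simplification and not a CPU
emulation: both function names are made up, and it assumes only that
an interrupted pass ends early with CX still nonzero, as described
above.

```python
def rep_movsw_pass(cx, words_before_interrupt):
    """One pass of the prefixed string instruction on an affected CPU.

    words_before_interrupt is the number of words moved before an
    interrupt arrives, or None when the pass is not interrupted.
    Returns the CX value left when the pass ends.
    """
    if words_before_interrupt is None or words_before_interrupt >= cx:
        return 0                         # ran to completion
    return cx - words_before_interrupt   # aborted, CX is still nonzero


def copy_with_retry(cx, interrupts):
    """Repeat the pass until CX is zero; return the number of passes."""
    pending = list(interrupts)
    passes = 0
    while True:
        budget = pending.pop(0) if pending else None
        cx = rep_movsw_pass(cx, budget)
        passes += 1
        if cx == 0:                      # the OR CX,CX / JNZ here test
            return passes


print(copy_with_retry(100, []))
print(copy_with_retry(100, [30, 20]))
```

Without interrupts a single pass finishes the copy; with two
interrupts the loop needs three passes, which is the case the CX test
exists to catch.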