Last Change 7/17/93. Please send updates directly to Harald.

86BUGS.LST revision 1.0
By Harald Feldmann (harald.feldmann@almac.co.uk),
mail address: Hamarsoft, p.o. box 91, 6114 ZH Susteren, The Netherlands.
(Please retain my name and address in the document)

This file lists undocumented and buggy instructions of the Intel 80x86
family of processors. Some of the information was obtained from the book
"Programmer's technical reference, the processor and coprocessor" by
Robert L. Hummel; Ziff-Davis Press; ISBN 1-56276-016-5, which is highly
recommended. Note that Intel does not support these special features and
may decide to drop opcode variants and instructions in future products.

All mentioned trademarks and/or tradenames are owned by the respective
owners and are acknowledged.

Undocumented instructions and undocumented features of Intel and IIT
processors:

AAD:            OPCODE: d5,0a                           OPCODE VARIANT

This instruction regularly performs the following action:
 - unpacked BCD in AX            example (AX = 0104h)
 - AL = AH * 10d + AL                    (AL = 0eh  )
 - AH = 00                               (AH = 00h  )
The normal opcode decodes as follows: d5,0a
The instruction itself is an instruction plus operand. By replacing the
second byte with any number in the range 00 - ff we can build our own
AAD instruction for various number systems in that range.
For example by coding d5,10 we achieve an instruction that performs:
AL = AH * 16d + AL.
Note: the variant is not supported on all 80x86-compatible CPUs, notably
the NEC V-series, because some hard-code the multiplier at 0ah.

AAM:            OPCODE: d4,0a                           OPCODE VARIANT

This instruction regularly performs the following action:
 - binary number in AL
 - AH = AL / 10d
 - AL = AL MOD 10d
Thus creating an unpacked BCD in AX.
The normal opcode decodes as follows: d4,0a
The instruction itself is an instruction plus operand. By replacing the
second byte with any number in the range 01 - ff (an operand of 00 raises
a divide error) we can build our own AAM instruction for various number
systems in that range.
For example by coding d4,07 we achieve an instruction that performs:
AH = AL / 07d, AL = AL MOD 07d

The AAD and AAM opcode variants have been found in Future Domain SCSI
controller ROMS.

LOADALL:        OPCODE: 0f,05 (i80286) & 0f,07 (i80386 & i80486)   UNDOCUMENTED

Load _ALL_ processor registers. Does exactly as the name suggests,
separate versions for i80286 and i80386 exist. The i80286 LOADALL
instruction reads a block of 102 bytes into the chip, starting at address
000800 hex. The i80286 LOADALL takes 195 clocks to execute.
The sequence is as follows (Hex address, Bytes, Register):

 0800:   6  N/A
 0806:   2  MSW        (Machine Status Word)
 0808:  14  N/A
 0816:   2  TR         (Task Register)
 0818:   2  FLAGS      (Flags)
 081a:   2  IP         (Instruction Pointer)
 081c:   2  LDT        (Local Descriptor Table)
 081e:   2  DS         (Data Segment)
 0820:   2  SS         (Stack Segment)
 0822:   2  CS         (Code Segment)
 0824:   2  ES         (Extra Segment)
 0826:   2  DI         (Destination Index)
 0828:   2  SI         (Source Index)
 082a:   2  BP         (Base Pointer)
 082c:   2  SP         (Stack Pointer)
 082e:   2  BX         (BX register)
 0830:   2  DX         (DX register)
 0832:   2  CX         (CX register)
 0834:   2  AX         (AX register)
 0836:   6  ES cache   (ES descriptor _cache_)
 083c:   6  CS cache   (CS descriptor _cache_)
 0842:   6  SS cache   (SS descriptor _cache_)
 0848:   6  DS cache   (DS descriptor _cache_)
 084e:   6  GDTR       (Global Descriptor Table)
 0854:   6  LDT cache  (Local Descriptor Table _cache_)
 085a:   6  IDTR       (Interrupt Descriptor Table)
 0860:   6  TSS cache  (Task State Segment _cache_)

Descriptor cache entries are internal copies of the original registers
(the LDT cache is normally a copy of the last regularly _loaded_ LDT).
Note that after executing LOADALL, the chip will use the _cache_ registers
without re-checking the caches against the regular registers. That means
that cache and register do not have to be the same. Caches are updated
when the original register is loaded again. Both will then contain the
same value.
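
The i80286 table above can be cross-checked mechanically. The following Python sketch is only an illustration under stated assumptions: the names LOADALL_286_FIELDS, field_offsets and build_image are hypothetical, and an image built this way is not a usable CPU state (the descriptor caches are left zero). It recomputes each address from the byte counts and packs 16 bit register values in little-endian order.

```python
import struct

LOADALL_286_BASE = 0x0800  # start address given in the text

# (name, size in bytes), in the order listed in the table above.
# The "reserved" names stand for the two N/A areas (hypothetical names).
LOADALL_286_FIELDS = [
    ("reserved0", 6), ("MSW", 2), ("reserved1", 14), ("TR", 2),
    ("FLAGS", 2), ("IP", 2), ("LDT", 2), ("DS", 2), ("SS", 2),
    ("CS", 2), ("ES", 2), ("DI", 2), ("SI", 2), ("BP", 2), ("SP", 2),
    ("BX", 2), ("DX", 2), ("CX", 2), ("AX", 2),
    ("ES_cache", 6), ("CS_cache", 6), ("SS_cache", 6), ("DS_cache", 6),
    ("GDTR", 6), ("LDT_cache", 6), ("IDTR", 6), ("TSS_cache", 6),
]


def field_offsets():
    """Return ({name: absolute address}, total size in bytes)."""
    offsets = {}
    addr = LOADALL_286_BASE
    for name, size in LOADALL_286_FIELDS:
        offsets[name] = addr
        addr += size
    return offsets, addr - LOADALL_286_BASE


def build_image(values):
    """Pack 16 bit register values into a zero-filled LOADALL image."""
    offsets, total = field_offsets()
    sizes = dict(LOADALL_286_FIELDS)
    image = bytearray(total)
    for name, value in values.items():
        if sizes[name] != 2:
            raise ValueError(name + " is not a 16 bit register field")
        # x86 stores words low byte first.
        struct.pack_into("<H", image, offsets[name] - LOADALL_286_BASE, value)
    return bytes(image)


if __name__ == "__main__":
    offsets, total = field_offsets()
    print(total, hex(offsets["AX"]))
```

The computed total matches the 102 bytes stated above, and every computed address matches the corresponding table entry.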
Descriptor caches layout:
   3 bytes   24 bit physical address of segment
   1 byte    access rights byte, mapped as the access rights byte in a
             regular descriptor. The present bit now represents a valid
             bit. If this bit is cleared (zero) the segment is invalid and
             accessing it will trigger exception 0dh. The DPL (Descriptor
             Privilege Level) fields of the CS and SS descriptor caches
             determine the CPL (Current Privilege Level).
   2 bytes   16 bit segment limit.
This layout is the same for the GDTR and IDTR registers, except that the
access rights byte must be zero.

i80386 LOADALL:
The i80386 variant loads 204 (dec) bytes from the address at ES:EDI and
resumes execution in the specified state.
No timing information available.

 relative
 offset:  Bytes:  Registers:
 0000:      4     CR0
 0004:      4     EFLAGS
 0008:      4     EIP
 000c:      4     EDI
 0010:      4     ESI
 0014:      4     EBP
 0018:      4     ESP
 001c:      4     EBX
 0020:      4     EDX
 0024:      4     ECX
 0028:      4     EAX
 002c:      4     DR6
 0030:      4     DR7
 0034:      4     TR
 0038:      4     LDT
 003c:      4     GS (zero extended)
 0040:      4     FS (zero extended)
 0044:      4     DS (zero extended)
 0048:      4     SS (zero extended)
 004c:      4     CS (zero extended)
 0050:      4     ES (zero extended)
 0054:     12     TSS descriptor cache
 0060:     12     IDT descriptor cache
 006c:     12     GDT descriptor cache
 0078:     12     LDT descriptor cache
 0084:     12     GS descriptor cache
 0090:     12     FS descriptor cache
 009c:     12     DS descriptor cache
 00a8:     12     SS descriptor cache
 00b4:     12     CS descriptor cache
 00c0:     12     ES descriptor cache

Descriptor caches layout:
   1 byte    zero
   1 byte    access rights byte, same as i80286
   2 bytes   zero
   4 bytes   32 bit physical base address of segment
   4 bytes   32 bit segment limit

UNKNOWN:        OPCODE: 0f,04                           UNDOCUMENTED

This instruction is likely to be an alias for the LOADALL on the i80286.
It is not documented and is even marked as unused in the 'Programmer's
technical reference'. Still it executes on the i80286.
>> info wanted <<

SETALC:         OPCODE: d6                              UNDOCUMENTED

This instruction copies the Carry Flag to the AL register. If the Carry
Flag is set, AL becomes ffh. If the Carry Flag is clear, AL becomes 00h.
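
The arithmetic of the AAD and AAM operand variants and of SETALC, as described in this section, can be modelled in a few lines. The Python sketch below is only an illustration: the function names aad, aam and setalc are hypothetical, registers are plain integers in the range 00 - ff, and flag results are not modelled.

```python
def aad(ah, al, base=10):
    """Model of opcode d5,<base>: AL = AH * base + AL, AH = 00.

    Returns the new (AH, AL). AL wraps at 8 bits.
    """
    return 0x00, (ah * base + al) & 0xFF


def aam(al, base=10):
    """Model of opcode d4,<base>: AH = AL / base, AL = AL MOD base.

    Returns the new (AH, AL). An operand of 00 raises a divide error on
    the processor; here that is modelled with ZeroDivisionError.
    """
    if base == 0:
        raise ZeroDivisionError("AAM with operand 00 raises a divide error")
    return al // base, al % base


def setalc(carry):
    """Model of opcode d6: AL = ffh if the Carry Flag is set, else 00h."""
    return 0xFF if carry else 0x00


if __name__ == "__main__":
    print(aad(0x01, 0x04))      # the AX = 0104h example from the AAD entry
    print(aad(0x01, 0x04, 16))  # the d5,10 variant
    print(aam(20, 7))           # the d4,07 variant
    print(hex(setalc(True)))
```

With the default operand, aad(0x01, 0x04) reproduces the AX = 0104h example (AL = 0eh, AH = 00h), and aam(0x0e) turns it back into the unpacked BCD pair AH = 01h, AL = 04h.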
Floating Point special instructions:

FMUL4X4:        OPCODE: db,f1                           IIT ONLY

This instruction is available only on the IIT (Integrated Information
Technology Inc.) math processors. Takes 242 clocks.
The instruction performs a 4x4 matrix multiply in one instruction using
four banks of 8 floating point registers. The operands must be loaded to
a specific bank in a specific order. The equation solved can be
represented by:

 Xn = (A00 * Xo) + (A01 * Yo) + (A02 * Zo) + (A03 * Vo)
 Yn = (A10 * Xo) + (A11 * Yo) + (A12 * Zo) + (A13 * Vo)
 Zn = (A20 * Xo) + (A21 * Yo) + (A22 * Zo) + (A23 * Vo)
 Vn = (A30 * Xo) + (A31 * Yo) + (A32 * Zo) + (A33 * Vo)

Where Xo stands for the original X value and Xn for the result.
Operands must be loaded to the following registers in the specified banks
in the specified order.

                 Before FMUL4X4        After FMUL4X4
                      bank                 bank
 Register:       0     1     2              0
 ST(0)           Xo    A33   A31            Xn
 ST(1)           Yo    A23   A21            Yn
 ST(2)           Zo    A13   A11            Zn
 ST(3)           Vo    A03   A01            Vn
 ST(4)                 A32   A30            ?
 ST(5)                 A22   A20            ?
 ST(6)                 A12   A10            ?
 ST(7)                 A02   A00            ?

All four banks can be selected by using the bankswitching instructions,
but only banks 0, 1 and 2 make sense since bank 3 is an internal
scratchpad. The separate banks can contain 8 floating point values each
and may be re-used with normal instructions. Each bank acts like an
independent i80287, except that the initial status is not maintained when
bankswitching in between. Pseudo-multichip operation can be performed in
each bank and even in multiple banks at the same time (although only one
instruction will operate on one register at any given time), provided
that the active register and top register are not changed after switching
from bank to bank.

EXAMPLE:
        FINIT                     ; reset control word
        FSBP1                     ; select bank 1
        FLD   DWORD PTR es:[si]   ; first original
        FLD   DWORD PTR es:[si+4] ; second original
        FLD   DWORD PTR es:[si+8] ; third original
        FSTCW WORD PTR [bx]       ; save FPU control status
        FSBP2                     ; NOTE !
                                  ; you will see three active registers in
                                  ; this bank when using a debugger
        FINIT                     ; nothing visible
        FLD   DWORD PTR [si]      ; new value
        FLD   DWORD PTR [si+4]    ; second new value
        FADD  ST,ST(1)            ; two values visible
        FSTP  DWORD PTR [si+8]    ; one value visible
        FSBP1                     ; one original visible
        FLDCW WORD PTR [bx]       ; restore FPU status to the one active in
                                  ; bank 1, causing original three values
                                  ; to be visible again in correct sequence
        ...                       ; simply continue with what you wanted to
                                  ; do with those numbers from es:[si],
                                  ; they are still there.
        FLD   DWORD PTR [si+8]    ; for instance...

This feature of the IIT chips can be used to perform complex operations
in registers when many components remain the same across a large dataset:
save only the intermediary result to ONE memory location, bankswitch to
the next series of operands, load that ONE operand and continue the
calculation with the next set of operands already in that bank. This does
require another read into the new bank but may save time and memory space
compared to memory based operands or multiple pass algorithms with
multiple arrays of intermediary results.

BANKSWITCH INSTRUCTIONS:

FSBP0:          OPCODE: db,e8                           IIT ONLY

Selects the original bank. (default) (6 clocks)

FSBP1:          OPCODE: db,eb                           IIT ONLY

Selects bank 1 from FMUL4X4 instruction diagram (6 clocks)

FSBP2:          OPCODE: db,ea                           IIT ONLY

Selects bank 2 from FMUL4X4 instruction diagram (6 clocks)

FSBP3:          OPCODE: db,e9                           IIT ONLY UNDOCUMENTED

Selects the scratchpad bank 3 used by the FMUL4X4 internally. Not very
useful but funny to look at...
How-to: load any value into bank 0, 1 or 2 until you have a full 8
registers, then execute this bankswitch. Using a debugger like CodeView
you are now able to inspect the bank 3 registers.
(most likely to take 6 clocks)

TRIGONOMETRIC FUNCTIONS:                                UNDOCUMENTED

Apparently the IIT 2c87 recognises and executes some i80387 trigonometric
functions.
FSIN (sine) and FCOS (cosine) have been tested and function according to
the Intel 80387 specifications.
FSINCOS (available on the Intel 80287XL, 80387 and up) does not work.

FSIN:           OPCODE: d9,fe   IIT 2c87+ (also Intel 80387+)   UNDOCUMENTED

Calculates the sine of the input in radians in ST(0). After calculation,
ST(0) contains the sine. Takes approximately 120 clocks.

FCOS:           OPCODE: d9,ff   IIT 2c87+ (also Intel 80387+)   UNDOCUMENTED

Calculates the cosine of the input in radians in ST(0). After calculation,
ST(0) contains the cosine. Takes approximately 120 clocks.

... CUT HERE FOR FIRST REVISION, next part is to be revised ...

Instructions by mnemonic

 mnemonic:      opcode:  processor:                 remark & remedy:

 AAA                     i80286 & i80386 & i80486
 CMPS                    i80286
 CMPXCHG                 i80486
 FINIT
 FSTSW
 FSTCW
 INS                     i80286 & i80386 & i80486
 INVD                    i80486

 MOV to SS      n/a      early 8088
        Some early 8088 would not properly disable interrupts after a move
        to the SS register. The workaround is to explicitly clear the
        interrupts, update SS and SP and then re-enable the interrupts.
        Typically this would occur in a situation where one would relocate
        a stack in memory, more than 64Kb from the original one, updating
        both SS and SP like in:

        MOV SS,AX       ; would disable interrupts automatically during
                        ; this and next instruction.
        MOV SP,DX       ; interrupts disabled
        ...             ; interrupts enabled.

 multiple prefixes with REPx     8088 & 8086
        These processors would not properly restart at the first prefix
        byte after an interrupt when more than one prefix is used,
        e.g. LOCK REP MOVSW CS:[BX]. A workaround is to test after the
        instruction for CX==0:

 here:  LOCK REP MOVSW CS:[BX]
        OR   CX,CX
        JNZ  here

        Because of the CS override, the REP and LOCK prefixes would not be
        recognised as part of the instruction and the REP MOVSW would be
        aborted. This also seems to be the case for a REP MOVSW CS:[BX].
        Note that this also implies that REPZ and REPNZ are affected, in
        SCASW for instance.
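
The control flow of the CX==0 retry workaround can be illustrated with a toy model. The Python sketch below is not a CPU emulator and its names (rep_movsw, copy_with_retry, interrupt_after) are hypothetical. It models only what the text above states, namely that an interrupted REP MOVSW is aborted with CX still non-zero; whatever else the real hardware does on restart is not modelled.

```python
def rep_movsw(src, dst, si, di, cx, interrupt_after=None):
    """Toy REP MOVSW: copy up to cx words from src[si:] to dst[di:].

    If interrupt_after is given, an interrupt is assumed to arrive after
    that many words and the copy is abandoned (assumption taken from the
    text: the REP prefix is not honoured on restart).
    Returns the updated (si, di, cx).
    """
    copied = 0
    while cx > 0:
        if interrupt_after is not None and copied == interrupt_after:
            break  # modelled bug: the string instruction is not resumed
        dst[di] = src[si]
        si, di, cx = si + 1, di + 1, cx - 1
        copied += 1
    return si, di, cx


def copy_with_retry(src, dst, count, interrupts=()):
    """Mirror of: here: REP MOVSW / OR CX,CX / JNZ here.

    interrupts lists, per pass, after how many words an interrupt hits.
    Returns the number of passes that were needed.
    """
    si = di = 0
    cx = count
    pending = list(interrupts)
    passes = 0
    while True:
        hit = pending.pop(0) if pending else None
        si, di, cx = rep_movsw(src, dst, si, di, cx, hit)
        passes += 1
        if cx == 0:  # OR CX,CX sets ZF when CX is zero; JNZ loops otherwise
            return passes


if __name__ == "__main__":
    source = list(range(10))
    target = [None] * 10
    print(copy_with_retry(source, target, 10, interrupts=[3, 4]), target)
```

Because SI, DI and CX keep their updated values across the aborted instruction, each retry continues where the previous pass stopped, so the loop finishes the copy without restarting from the beginning.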