o Rechecking opcodes w/ Intel document stopped at BOUND
o Add special exceptions information, if any
o Correct all incorrect displays of the auxiliary flag (a) that say (a - Same as c) and fix the
  code in the instruction accordingly.

8086
    Initial 16bit CPU
8088
    A cheap alternative to the 8086 that used only an 8bit bus.
8087
    Floating point co-processor for the 8086
80186
    Enhanced 8086
80286
    Identical in speed (cycles per instruction) to the 186.
    Introduced protected mode
80287
    Floating point co-processor for the 80286
80386
    Introduced 32bit mode, as well as upgrading all general registers to support a 32bit
    equivalent
80387
    Floating point co-processor for the 80386
80486
    Greatly enhanced cycle operations and more instructions.
    Also incorporated the floating point instructions as a part of the CPU.
    Added the following instructions: BSWAP, XADD, CMPXCHG, INVD, WBINVD, INVLPG
    Later 80486s also had the CPUID instruction
Pentium
    Introduced the U and V pipelines (two unrelated instructions can run in parallel)
    The original Pentium had a bug in the FDIV instruction (what exactly was it???)
    Added the following instructions: CMPXCHG8B, CPUID (later 486s also had it), RDTSC, RDMSR,
    WRMSR, RSM
    The form of MOV used to access test registers was removed, and no later Intel processor
    supports it
    Added the CR4 register.
Pentium Pro
    First processor that uses the micro-ops concept. It does NOT include the MMX instructions
    Added the following instructions: CMOVcc, FCMOVcc, FCOMI, FCOMIP, FUCOMI, FUCOMIP, RDPMC, UD2
Pentium MMX
    These later Pentiums included the MMX (Multi-Media eXtension) instructions.
    They do NOT include the new Pentium Pro instructions, and use the same timing as the
    original Pentium
Pentium 2
    Combines the Pentium MMX and the Pentium Pro architectures.
Celeron
Pentium 3
    Introduced Streaming SIMD Extensions (SSE)
Pentium 4
    Introduced SSE2
Pentium 4 Prescott
    Introduced SSE3
Core 2
    Introduced SSSE3 (Supplemental SSE3; this is not SSE4, which arrived later)

MMX Instructions:
    EMMS    - Empty MMX State
    MOVx    - Move (DWord, QWord)
    PACKx   - Pack (Signed Saturation for Word to Byte, Signed Saturation for DWord to Word,
              Unsigned Saturation for Word to Byte)
    PADDx   - Packed Add (Byte, Word, DWord, Saturation by Byte, Saturation by Word,
              Unsigned Saturation by Byte, Unsigned Saturation by Word)
    PAND    - Bitwise AND
    PANDN   - Bitwise AND NOT
    PCMPccx - Packed Compare (Equality by Byte, Equality by Word, Equality by DWord,
              Greater Than by Byte, Greater Than by Word, Greater Than by DWord)
    PMADDWD - Packed Multiply Add Word to DWord
    PMULx   - Packed Multiply (High Word, Low Word)
    POR     - Bitwise OR
    PSLLx   - Packed Shift Left Logical (Word, DWord, QWord)
    PSRAx   - Packed Shift Right Arithmetic (Word, DWord)
    PSRLx   - Packed Shift Right Logical (Word, DWord, QWord)
    PSUBx   - Packed Subtract (Byte, Word, DWord, Saturation by Byte, Saturation by Word,
              Unsigned Saturation by Byte, Unsigned Saturation by Word)
    PUNPCKx - Unpack Packed Data (High Byte to Word, High Word to DWord, High DWord to QWord,
              Low Byte to Word, Low Word to DWord, Low DWord to QWord)
    PXOR    - Bitwise XOR

SSE Instructions:
    ADDx    - Add (Packed Single, Scalar Single)
    ANDNPS  - Bitwise Logical AND NOT for Single-FP
    ANDPS   - Bitwise Logical AND for Single-FP
    CMPx    - Compare (Packed Single, Scalar Single)
    COMISS  - Scalar Ordered Single-FP Compare and set EFlags
    CVTx    - Conversion (Packed Signed INT32 to Packed Single, Packed Single to Packed Signed
              INT32, Scalar Signed INT32 to Scalar Single, Scalar Single to Scalar Signed INT32,
              Truncated Packed Single to Packed Signed INT32, Truncated Scalar Single to Scalar
              INT32)
    DIVx    - Divide (Packed Single, Scalar Single)
    FXRSTOR - Restore FP/MMX and Streaming SIMD Extensions State
    FXSAVE  - Store FP/MMX and Streaming SIMD Extensions State
    LDMXCSR - Load Streaming SIMD Extensions Technology
              Control/Status Register
    MASKMOVQ - Byte Mask Write
    MAXx    - Maximum (Packed Single, Scalar Single)
    MINx    - Minimum (Packed Single, Scalar Single)
    MOVx    - Move (Aligned Packed Single, High to Low Packed Single, High Packed Single,
              Low to High Packed Single, Low Packed Single, Mask to Integer, Non Temporal
              Aligned Packed Single, Non Temporal QWord, Scalar Single, Unaligned Packed Single)
    MULx    - Multiply (Packed Single, Scalar Single)
    ORPS    - Bitwise Logical OR for Single-FP Data
    PAVGx   - Packed Average Byte/Word
    PEXTRW  - Extract Word
    PINSRW  - Insert Word
    PMAXx   - Packed Integer Maximum (Unsigned Byte, Signed Word)
    PMINx   - Packed Integer Minimum (Unsigned Byte, Signed Word)
    PMOVMSKB - Move Byte Mask to Integer
    PMULHUW - Packed Multiply (High Unsigned Word)
    PSADBW  - Packed Sum of Absolute Differences
    PSHUFW  - Packed Shuffle Word
    RCPPS   - Packed Single-FP Reciprocal
    RSQRTx  - Square Root Reciprocal (Packed Single, Scalar Single)
    SFENCE  - Store Fence
    SHUFPS  - Shuffle Single-FP
    SQRTx   - Square Root (Packed Single, Scalar Single)
    STMXCSR - Store Streaming SIMD Extensions Technology Control/Status Register
    SUBx    - Subtract (Packed Single, Scalar Single)
    UCOMISS - Unordered Scalar Single-FP Compare and Set EFlags
    UNPCKx  - Unpack (High Packed Single, Low Packed Single)
    XORPS   - Bitwise Logical XOR for Single-FP Data
    :
---------------------------------------------------------------------------------------------------
AMD Family
    K5
    K6
    K6-2
    K6-3
    K7/Athlon
    Athlon64
    AthlonFX
***************************************************************************************************
Registers
---------------------------------------------------------------------------------------------------
Integer Registers
___________________________________________________________________________________________________
Flags/EFlags
16bit (286 and below) or 32bit (386 and above). Holds processor state and comparison flags.
BITS
  0    - 00000001 - CF: Carry Flag
         This is set when an operation carries out of (or borrows into) the high bit of the
         values used. For example, if you are adding two bytes and the result should be 256,
         carry will be set and the result will be 0.
  1    - 00000002 - RESERVED. Always 1
  2    - 00000004 - PF: Parity Flag
         In operations where this flag is affected, it is set if the low byte of the result has
         an even number of bits set, clear otherwise
  3    - 00000008 - RESERVED. Always 0
  4    - 00000010 - AF: Auxiliary carry flag
         This is set when bit 3 overflows into bit 4. It is useful only for BCD math.
  5    - 00000020 - RESERVED. Always 0
  6    - 00000040 - ZF: Zero flag
         In operations where this flag is affected, it is set only if the result is zero
  7    - 00000080 - SF: Sign flag
         In operations where this flag is affected, it is set only if the high bit of the result
         is set
  8    - 00000100 - TF: Trap flag
         If set, a debug exception occurs after every instruction. Setting this flag with POPF,
         POPFD or IRET will cause the very next instruction to cause a debug exception.
  9    - 00000200 - IF: Interrupt enable flag
         Normal hardware interrupts only occur if this flag is set. It may be cleared with CLI
         and set with STI. CPL, IOPL and the state of the VME flag in CR4 determine whether this
         flag may be modified. KERBLUH - What are the exact conditions?
 10    - 00000400 - DF: Direction flag
         String opcodes, such as MOVS, increment memory pointers if this flag is clear,
         decrement if it is set. It may be cleared with CLD and set with STD
 11    - 00000800 - OF: Overflow flag
         In operations where this flag is affected, it is set only if the sign of the result
         changes incorrectly, representing a signed overflow
 12,13 - 00003000 - IOPL: Input/Output Privilege level
         This 2bit value (0 to 3) is the numerically highest privilege level (CPL) the system
         can be in and still access hardware (and other special operations) directly.
         It may only be modified via POPF or IRET in a program that is at CPL of 0.
 14    - 00004000 - NT: Nested Task
         KERBLUH
 15    - 00008000 - RESERVED.
         Always 0
 16    - 00010000 - RF: Resume flag
         Used for debug management. When this flag is set, the next instruction executed cannot
         cause a debugging exception, then the flag is cleared. It is useful to set this flag
         before an IRET/IRETD of a debug exception routine to let the program execute the next
         opcode, ignoring breakpoints. Without this flag, an instruction with a breakpoint
         would always generate the exception and could not be bypassed.
 17    - 00020000 - VM: Virtual-86 Mode
         Set to enable Virtual-86 mode, clear for normal operation
         KERBLUH - need more info
 18    - 00040000 - AC: Alignment Check
         An alignment exception occurs if a memory reference is not aligned (eg, a word access
         on an odd address) if this flag is set, the AM flag in CR0 is set, CPL is 3, and the
         processor is in protected mode (PE flag in CR0 is set). If any one of these conditions
         is not met, then alignment exceptions do not occur.
 19    - 00080000 - VIF: Virtual Interrupt flag
         KERBLUH
 20    - 00100000 - VIP: Virtual Interrupt Pending
         KERBLUH
 21    - 00200000 - ID: ID flag
         If you can modify this flag, then the CPUID instruction is available.
 22    - 00400000 - RESERVED. Always 0
 23    - 00800000 - RESERVED. Always 0
 24    - 01000000 - RESERVED. Always 0
 25    - 02000000 - RESERVED. Always 0
 26    - 04000000 - RESERVED. Always 0
 27    - 08000000 - RESERVED. Always 0
 28    - 10000000 - RESERVED. Always 0
 29    - 20000000 - RESERVED. Always 0
 30    - 40000000 - RESERVED. Always 0
 31    - 80000000 - RESERVED. Always 0
---------------------------------------------------------------------------------------------------
Floating Point Registers
___________________________________________________________________________________________________
FPUControl   aka FPU Control Register or FPU Control Word
16bit. This register holds settings for the FPU, determining which exceptions are masked, what
kind of rounding to use, the precision used on the processor and how infinity is handled.
BITS
  0    - 0001 - IM. Invalid Operation exception mask
  1    - 0002 - DM.
         Denormalized Operand exception mask
  2    - 0004 - ZM. Zero Divide exception mask
  3    - 0008 - OM. Overflow exception mask
  4    - 0010 - UM. Underflow exception mask
  5    - 0020 - PM. Precision exception mask
  6    - 0040 - RESERVED. Always 1 (according to [2])
  7    - 0080 - RESERVED. Always 0 (according to [2])
  8-9  - 0300 - PC. Precision Control:
                0000 - Single Precision (24bit [including implied integer bit])
                0100 - RESERVED (KERBLUH - What happens if this is used?)
                0200 - Double Precision (53bit [including implied integer bit])
                0300 - Extended Precision (64bit)
 10-11 - 0C00 - RC. Rounding Control:
                0000 - Round to nearest (even)
                0400 - Round down (towards -infinity)
                0800 - Round up (towards +infinity)
                0C00 - Round toward zero (truncate)
 12    - 1000 - X. Infinity Control
 13    - 2000 - RESERVED. Always 0
 14    - 4000 - RESERVED. Always 0
 15    - 8000 - RESERVED. Always 0

Bits 0 to 5 are exception masks that correspond to the six exception types (and six exception
bits in FPUStatus). When a mask bit is set, that exception does not occur, though the exception
bit in FPUStatus is still set. When the mask bit is clear, that exception may occur and the
corresponding exception bit in FPUStatus will be set.

Precision control affects how many bits are used for significand values on the FPU. When reduced
precision is used (ie, not Extended Precision), the unused low bits of the significand are simply
cleared to 0. This only affects the following instructions: FADD, FADDP, FSUB, FSUBP, FSUBR,
FSUBRP, FMUL, FMULP, FDIV, FDIVP, FDIVR, FDIVRP, and FSQRT.

Rounding control affects how rounding occurs when a single or double precision result is used.
KERBLUH

The FPU control word becomes 037Fh (round to nearest, all exceptions masked, 64bit precision)
after FINIT/FNINIT or FSAVE/FNSAVE.
___________________________________________________________________________________________________
FPURegx   aka FPU Data Registers or FPU Registers or MMX Registers
80bit. 8 unique registers.
The eight FPU data registers are used to store floating point values OR MMX values.

The eight ST(x) FPU stack registers map to physical FPURegx values by this equation:
    ST(x) = 1;
is the same as:
    FPUReg[(x + ((FPUStatus >> 11) & 7)) & 7] = 1;
For emulation purposes, it'd be easiest to just keep the TOP portion of FPUStatus as a separate
value to simplify this to:
    FPUReg[(x + FPUStatusTOP) & 7] = 1;

The eight MMX registers map directly to the corresponding physical FPU registers. Note that they
only use the low 64bits of each FPUReg, and that before using the FPURegs again for floating
point, the EMMS instruction needs to be executed. KERBLUH - What happens if we don't?

See KERBLUH for a description of how data is stored in these registers
___________________________________________________________________________________________________
FPUIP   aka FPU Instruction Pointer
48bit [KERBLUH, what about older x87s?].
___________________________________________________________________________________________________
FPUOpcode
11bit. This register holds an 11bit value of the last non-control instruction executed on the
FPU. It is created by placing the low 3bits of the first byte of the opcode into bits 8-10
(since only D8-DF are used for FPU), and the full second byte of the opcode into bits 0-7. This
completely ignores any prefixes.

For example, if the last non-control instruction was "FADDP ST(1), ST(0)", then the actual opcode
would be: DE C1. FPUOpcode would take the low 3 bits of DE (110b), store them into bits 8-10
(0600h) and store the second byte, C1, in bits 0-7 for a final value of 06C1h.
___________________________________________________________________________________________________
FPUOperand   aka FPU Operand Pointer or FPU Data Pointer
48bit [KERBLUH, what about older x87s?].
___________________________________________________________________________________________________
FPUStatus   aka FPU Status Register or FPU Status Word
16bit.
This value holds flags for exceptions, condition values and the 3bit floating point stack
pointer.
BITS
  0    - 0001 - IE. Invalid operation exception
  1    - 0002 - DE. Denormalized operand exception
  2    - 0004 - ZE. Zero-Divide exception
  3    - 0008 - OE. Overflow exception
  4    - 0010 - UE. Underflow exception
  5    - 0020 - PE. Precision exception
  6    - 0040 - SF. Stack Fault. Indicates that TOP has overflowed or underflowed.
  7    - 0080 - ES. Error/Exception Summary Status. Set when any of the other exception flags
                is set and it is unmasked.
  8    - 0100 - C0. Condition 0
  9    - 0200 - C1. Condition 1
 10    - 0400 - C2. Condition 2
 11-13 - 3800 - TOP. 3bit pointer to the top of the stack. This points to which physical
                register is currently ST(0). The next highest physical register is ST(1), the
                next ST(2), etc. For example, if TOP is 6, then ST(0) is FPU register 6, ST(1)
                is FPU register 7, ST(2) is FPU register 0, etc.
 14    - 4000 - C3. Condition 3
 15    - 8000 - B. FPU Busy. Reflects the contents of ES. For 8087 compatibility only.

Bits 0 to 7 are "sticky", in that as soon as any of them is set, they remain set until
specifically cleared by software. This may be accomplished with FINIT/FNINIT (Initialize FPU),
FCLEX/FNCLEX (Clear Exceptions, which only clears these flags), or FSAVE/FNSAVE.
___________________________________________________________________________________________________
FPUTag   aka FPU Tag Register or FPU Tag Word
16bit. Holds eight 2bit values that determine what kind of value is in the eight floating point
registers.
These are one of the following:
    Value (binary)  Description
    00              Valid
    01              Zero
    10              Special: Invalid (NaN/Unsupported), Infinity or Denormal
    11              Empty

The tags apply to the physical registers, not to the ST(x) stack positions; to find the tag for
ST(x), offset by the top-of-stack value (TOP) in the FPU status word:
BITS
  0,1   - 0003 - TAG for Physical Register 0
  2,3   - 000C - TAG for Physical Register 1
  4,5   - 0030 - TAG for Physical Register 2
  6,7   - 00C0 - TAG for Physical Register 3
  8,9   - 0300 - TAG for Physical Register 4
 10,11  - 0C00 - TAG for Physical Register 5
 12,13  - 3000 - TAG for Physical Register 6
 14,15  - C000 - TAG for Physical Register 7

The only way to access this register is through one of the environment storage mnemonics
(FSTENV, FNSTENV, FSAVE or FNSAVE)
---------------------------------------------------------------------------------------------------
Operating System Registers

KERBLUH - misc notes about OS-level registers

In protected mode, attempting to read or write to CR0 to CR4 with the MOV instructions when CPL
is not zero will cause a #GP(0) fault.
KERBLUH - One chart in [4] says that reading from the CR* registers is allowed in normal
programs. This contradicts what it says elsewhere. Therefore, we need to figure out which it is!
___________________________________________________________________________________________________
CR0   aka Control Register 0 aka Machine Status Word (286)
16bit (286) or 32bit (386 and above). Contains flags that control what mode the processor is
running in and various states of the processor. Special 286 instructions LMSW and SMSW are used
to load and store the 16bit Machine Status Word respectively. These instructions exist on later
x86 processors for downward compatibility.
BITS
  0    - 00000001 - PE: Protection Enable
         If set, the processor is running in protected mode. If clear, the processor is running
         in real mode. Note that paging is not necessarily activated when running in protected
         mode because it is determined by the PG flag.
         KERBLUH - Information on using PE to switch to/from protected mode
  1    - 00000002 - MP: Monitor Coprocessor
         If set AND TS is set, then a WAIT/FWAIT will throw a device-not-available exception
         (#NM). Otherwise, WAIT/FWAIT execute as normal.
  2    - 00000004 - EM: Emulation (for Floating Point coprocessor)
         If set, a device-not-available exception (#NM) is thrown whenever a floating point
         instruction, with the exception of WAIT/FWAIT, is used. WAIT/FWAIT are designed to
         wait for the coprocessor and since this assumes there is not one, these instructions
         simply do nothing. This also throws an invalid opcode exception (#UD) when an MMX or
         any of the SSE instructions is used, with the exception of PREFETCH and SFENCE, which
         are not affected by EM.
         If clear, then the coprocessor responds as normal to floating point instructions, and
         MMX and SSE instructions are used as normal.
  3    - 00000008 - TS: Task Switched
         This flag is set just after every task switch. While it is set, the next
         floating-point or MMX instruction will throw a device-not-available (#NM) exception.
         WAIT/FWAIT will not throw the exception unless MP is also set. This is designed so the
         operating system does not need to preserve the current floating point environment
         unless the current task actually uses floating point or MMX instructions.
  4    - 00000010 - ET: Extension Type
         Reserved for Pentium and higher (always 1 for Pentium Pro??? and above). On 486 and
         386, if this bit is set, then a 387 coprocessor is available.
  5    - 00000020 - NE: Numeric Error
         KERBLUH
  6-15 - 0000FFC0 - Reserved (should always be 0???)
 16    - 00010000 - WP: Write Protect
         When set, supervisor-level code is not allowed to write to user-level read-only pages.
         [4] states: "This flag facilitates implementation of the copy-on-write method of
         creating a new process (forking) used by operating systems such as Unix*". What this
         means, or what that asterisk is for, is not clarified.
         KERBLUH - Need more details, like what exception occurs, if any...
 17    - 00020000 - Reserved (should always be 0???)
 18    - 00040000 - AM: Alignment Mask
         An alignment exception occurs if a memory reference is not aligned (eg, a word access
         on an odd address) if this flag is set, the AC flag in EFLAGS is set, CPL is 3, and the
         processor is in protected mode (PE flag in CR0 is set). If any one of these conditions
         is not met, then alignment exceptions do not occur.
 19-28 - 1FF80000 - Reserved (should always be 0???)
 29    - 20000000 - NW: Not Write-Through
         KERBLUH
 30    - 40000000 - CD: Cache Disable
         KERBLUH
 31    - 80000000 - PG: Paging
         386 and above
         If set, paging is enabled. If clear, paging is disabled. When paging is disabled, all
         linear addresses are treated the same as physical addresses. Note that this only has
         an effect if the program is in protected mode (PE is set). If an attempt is made to
         set this flag when not in protected mode, a general-protection fault (#GP) is
         generated.
         KERBLUH - More info? Or should we just refer them to the paging section of the doc?
___________________________________________________________________________________________________
CR1   aka Control Register 1
32bit. This register is completely reserved. KERBLUH - So... what's the point of it?!?
___________________________________________________________________________________________________
CR2   aka Control Register 2 aka Page-Fault Linear Address
32bit. Contains the page-fault linear address, that is, the linear address whose access caused
the most recent page fault.
___________________________________________________________________________________________________
CR3   aka Control Register 3 aka PDBR aka Page-Directory Base Register
32bit. The top 20bits of this register are the top 20bits of the Page-Directory Base address;
the lower 12bits of that address are always assumed to be 0, meaning the page directory must
align to a 4KB boundary. The lower 12bits of the register, however, contain extra flags:
BITS
  0-2  - 007 - Reserved (must be 0???)
  3    - 008 - PWT: Page-level Write-Through
         KERBLUH
  4    - 010 - PCD: Page-level Cache Disable
         KERBLUH
  5-11 - FE0 - Reserved (must be 0???)
___________________________________________________________________________________________________
CR4   aka Control Register 4
32bit. Contains a set of flags that enable architectural extensions as well as signify virtual
modes and the level of Streaming SIMD extensions (SSE instructions) the OS supports.
BITS
  0    - 00000001 - VME: Virtual-86 Mode Extensions
         When this flag is set, interrupt and exception handling in Virtual-86 mode
         automatically uses the 8086 program's handlers instead of calling the Virtual-86
         monitor, allowing an increase in speed. It also permits use of the Virtual Interrupt
         Flag (VIF) to help improve 16bit software compatibility. See Virtual-86 Mode under
         "Modes of Operation" for more details.
  1    - 00000002 - PVI: Protected-Mode Virtual Interrupts
         When set, this flag permits use of the hardware Virtual Interrupt Flag (VIF) in a
         protected mode program.
         KERBLUH - More info? Or discuss this elsewhere... ?
  2    - 00000004 - TSD: Time Stamp Disable
         If set, only programs running at a CPL of 0 are permitted to use RDTSC. If clear, all
         programs may use the RDTSC mnemonic.
  3    - 00000008 - DE: Debugging Extensions
         If set, debug registers DR4 and DR5 throw an undefined-opcode exception (#UD) if used.
         If clear, these registers are aliased to DR6 and DR7 respectively.
         KERBLUH - Need more info
  4    - 00000010 - PSE: Page Size Extensions
         Pentium and above
         Enables 4MB pages when set, only 4KB pages are used when clear
  5    - 00000020 - PAE: Physical Address Extensions
         Pentium Pro and above
         Enables the paging mechanism to reference 36bit physical addresses when set, only
         32bit addresses are used when clear
  6    - 00000040 - MCE: Machine Check Enable
         Enables the machine check exception (#MC, vector 18) when set, disables it when clear.
         KERBLUH - What's this for?
  7    - 00000080 - PGE: Page Global Enable
         Enables the "Global Page" setting in the page tables if set, ignores it when clear.
         Global pages remain cached in the TLB after a task switch or after CR3 has changed.
         Note that this should be cleared before enabling paging via CR0's PG flag.
  8    - 00000100 - PCE: Performance-Monitoring Counter Enable
         If set, all programs are permitted to use RDPMC. If clear, only programs with a CPL of
         0 may use RDPMC.
  9    - 00000200 - OSFXSR: Operating System FXSAVE/FXRSTOR Support
         The OS should set this bit if both the CPU and the OS support the use of FXSAVE and
         FXRSTOR during context switches. If this flag is clear, then all SSE instructions will
         throw an undefined opcode exception (#UD) (KERBLUH - That's what the chart says, double
         check this!)
 10    - 00000400 - OSXMMEXCPT: Operating System Unmasked Exception Support
         The OS should set this bit if the OS supports unmasked SSE floating point exceptions.
 11-31 - FFFFF800 - Reserved (must be 0)
___________________________________________________________________________________________________
GDTR   aka Global Descriptor Table Register
48bit. This register holds the 32bit base address and 16bit limit for the GDT (Global Descriptor
Table). The address is the linear address of the first byte of the GDT; the limit is the offset
of the last valid byte of the GDT (the table size in bytes, minus 1). On power up, the base
address is 0 and the limit is 0FFFFh. Before switching to protected mode, the GDTR should be
changed.

This register may be loaded with LGDT (Load Global Descriptor Table) and stored with SGDT (Store
Global Descriptor Table). LGDT may only be run by code running at a CPL of 0.
___________________________________________________________________________________________________
IDTR   aka Interrupt Descriptor Table Register
48bit. This register holds the 32bit base address and 16bit limit for the IDT (Interrupt
Descriptor Table).
The address is the linear address of the first byte of the IDT; the limit is the offset of the
last valid byte of the IDT (the table size in bytes, minus 1). On power up, the base address is 0
and the limit is 0FFFFh. Before switching to protected mode, the IDTR should be changed.

This register may be loaded with LIDT (Load Interrupt Descriptor Table) and stored with SIDT
(Store Interrupt Descriptor Table). LIDT may only be run by code running at a CPL of 0.
___________________________________________________________________________________________________
LDTR   aka Local Descriptor Table Register
80bit. This register holds the current 16bit segment selector, 32bit base address, 16bit segment
limit and descriptor attributes for the LDT (Local Descriptor Table). The segment must be one of
the segments described in the GDT.

This register may be loaded with LLDT (Load Local Descriptor Table) and stored with SLDT (Store
Local Descriptor Table). LLDT may only be run by code running at a CPL of 0.

When a task switch occurs, the new segment selector is used to load the LDT with the local data.
The contents of the old LDTR are not automatically saved when modifying LDTR.

On power up, the base address is 0 and the limit is 0FFFFh.
___________________________________________________________________________________________________
TR   aka Task Register
80bit. This register holds the current 16bit segment selector, 32bit base address, 16bit segment
limit and descriptor attributes for the TSS. The segment must be one of the segments described
in the GDT.

This register may be loaded with LTR (Load Task Register) and stored with STR (Store Task
Register). LTR may only be run by code running at a CPL of 0.

When a task switch occurs, the new segment selector is used to load the TSS with the local data.
The contents of the old TR are not automatically saved when modifying TR.

On power up, the base address is 0 and the limit is 0FFFFh.
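
For emulation or debugging purposes, the 48bit GDTR/IDTR values described above can be unpacked
from the 6 bytes that SGDT/SIDT store to memory. The following is only a rough sketch: the type
and function names (DescTableReg, decode_dtr, dtr_entry_count) are made up for illustration, it
assumes a 386 or later (32bit base), it assumes the limit is the offset of the last valid byte
(size minus 1), and it assumes 8-byte descriptors.

```c
#include <stdint.h>

/* Hypothetical in-memory view of a 48bit GDTR or IDTR value.
   The names here are illustrative, not taken from any reference. */
typedef struct {
    uint16_t limit;  /* offset of the last valid byte of the table */
    uint32_t base;   /* linear address of the first byte of the table */
} DescTableReg;

/* Decode the 6 bytes written by SGDT/SIDT on a 386 or later:
   a little-endian 16bit limit followed by a little-endian 32bit base. */
DescTableReg decode_dtr(const uint8_t raw[6])
{
    DescTableReg r;
    r.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    r.base  = (uint32_t)raw[2]
            | ((uint32_t)raw[3] << 8)
            | ((uint32_t)raw[4] << 16)
            | ((uint32_t)raw[5] << 24);
    return r;
}

/* Number of complete descriptors the limit covers,
   assuming 8-byte descriptors (32bit protected mode). */
uint32_t dtr_entry_count(DescTableReg r)
{
    return ((uint32_t)r.limit + 1u) / 8u;
}
```

With the power-up limit of 0FFFFh mentioned above, dtr_entry_count() reports 8192 possible
descriptors, which is the most a 16bit limit can describe.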
***************************************************************************************************
Modes of Operation

For the 286 and above, there are multiple possible modes of operation. To act like older x86
processors, they all start in Real Mode, which is identical to how the 8088, 8086 and 80186
operate. All of the modes may be toggled by software, with the exception of SMM (System
Management Mode) which requires a special hardware signal to activate.

Switching to a new mode requires quite a bit of configuration ahead of time since, aside from
Real Mode, a number of descriptor tables and settings need to be in place before one may switch.
---------------------------------------------------------------------------------------------------
Real Mode

This is the default mode of all x86 processors.

The default operand size is ALWAYS 16bit.

Segment registers are simple 16bit values that supply the top 16 bits of the 20bit final
address. Selectors are not available. This means that the GDT, LDT, IDT, etc. as well as any
other selector-based extensions such as paging are also unavailable.

Addressing is 20bit: the specified segment is shifted left by 4 bits and the specified 16bit
effective address is added to it (address = segment * 16 + offset), so 12 bits of the two values
overlap. Whether a sum above 0FFFFFh wraps around (default behavior???) or extends and uses the
A20 line (to allow an extra 64KB to be accessed in real mode), depends on KERBLUH.

There is no memory, hardware or process level protection available in real mode. Every program
is allowed to access every piece of hardware and memory directly, without limitations. For 286
and above, this is essentially the same as if CPL and IOPL were both always 0.

Most post-186 instructions are available in real mode, except where specified in the opcode
listing.
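
As a small illustration of the real mode address calculation described above, here is a sketch
in C. The function name real_mode_address and the a20_enabled parameter are inventions for this
example; in particular, how the wrap-around vs. A20 behavior is actually selected is still an
open question in these notes, so the flag is only an assumption used to show both outcomes.

```c
#include <stdint.h>

/* Hypothetical helper: compute the address a real mode segment:offset
   pair refers to. The segment is shifted left 4 bits (multiplied by 16)
   and the 16bit offset is added to it.
   a20_enabled is an assumed switch: when zero, the result wraps at 1MB
   as on the 8086/8088; when non-zero, the 21st bit is kept, exposing
   the extra (almost) 64KB above 0FFFFFh. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    if (!a20_enabled)
        addr &= 0xFFFFFu;  /* keep only 20 bits */
    return addr;
}
```

For example, 1234h:5678h gives 179B8h either way, while FFFFh:0010h gives 00000h when the
address wraps and 100000h when it does not.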
---------------------------------------------------------------------------------------------------
Protected Mode
286 and above

Though this is not the default for the processor, it is considered the native mode. In protected
mode, all processor features (with the exception of the AMD 64bit extensions) are available.

The default operand size is usually 32bit (386 and above), but it may also be set to 16bit
(always 16bit on the 286).

KERBLUH
---------------------------------------------------------------------------------------------------
System Management Mode   aka SMM
386SL and above

This is a special mode that is only accessible through hardware by activating the SMI# pin on
the processor. It is generally used for power management features, as well as OEM specific
features.

After saving the state of the current process, the mode switches and the processor accesses the
SMM code in another address space (KERBLUH - where is this address space and how does it work?)

After SMM code is done, the RSM instruction returns the processor to the exact mode and settings
it was in before the SMM code began. (KERBLUH - exact details on this!)

KERBLUH - Need more details on what registers are available, CPL vs IOPL, 16bit vs 32bit,
segments vs selectors, etc.
---------------------------------------------------------------------------------------------------
Virtual-86 Mode   aka V86
386 and above

Designed to allow old 16bit code to execute in a protected mode environment, Virtual-86 Mode
causes interrupts/exceptions whenever code running in this mode attempts to access hardware so
that the hardware may be emulated in software.
KERBLUH
***************************************************************************************************
Integer Instructions

    o   Overflow Flag           Set if result is too large a positive number or too small a
                                negative number (excluding sign bit) to fit in the destination
                                operand
    d   Direction flag          Used to control block move instructions
    i   Interrupt Enable flag   Used to enable interrupts
    t   Trap flag
    s   Sign flag               Generally set if the high bit of a result is set
    z   Zero flag               Generally set if a result is 0
    a   Auxiliary Carry flag
    p   Parity flag             Generally set if the low-order 8bits of a result contain an even
                                number of 1 bits
    c   Carry flag              Generally set on high-order bit carry/borrow
---------------------------------------------------------------------------------------------------
How opcodes are stored in memory:

    | Instruction  | Opcode      | ModR/M    | SIB       | Displacement  | Immediate     |
    | Prefix       |             |           |           |               |               |
      Up to four     1 or 2 byte   1 byte      1 byte      Address         Immediate
      prefixes of    opcode        (if used)   (if used)   displacement    data
      1 byte each                                          of 1, 2 or 4    of 1, 2 or 4
      (optional)                                           bytes or none   bytes or none

For the Opcode and ModR/M bytes, specific bits may be assigned specific values:

    Name   Bits  Description
    reg    3     Refers to a general register:
                     Value  8bit  16/32bit
                     000    AL    (E)AX
                     001    CL    (E)CX
                     010    DL    (E)DX
                     011    BL    (E)BX
                     100    AH    (E)SP
                     101    CH    (E)BP
                     110    DH    (E)SI
                     111    BH    (E)DI
                 8bit values are used only on instructions that support the w (word or byte
                 designation) bit and if the bit is clear.
                 These three bits usually appear in bits 0-2 or 3-5 of the ModR/M byte, but
                 also appear in a few instructions as bits 0-2 of the last opcode byte.
    w      1     Specifies if the data is a byte (clear) or 16/32bit (set)
                 This appears as bit 0 of the last opcode byte, where used.
    s      1     Sign extend flag. If this is set, then the immediate value is 8bit and sign
                 extended to 16/32bit.
                 This appears as bit 1 of the last opcode byte, where used.
    sreg2   2       A 2bit representation of a segment register:
                        Value   Register
                        00      ES
                        01      CS
                        10      SS
                        11      DS
                    This appears as bits 3-4 in the POP and PUSH opcode bytes only.
    sreg3   3       A 3bit representation of a segment register:
                        Value   Register
                        000     ES
                        001     CS
                        010     SS
                        011     DS
                        100     FS
                        101     GS
                        110     unused (??? what happens if it is used)
                        111     unused (??? what happens if it is used)
    eee     3       Special purpose registers (CRx or DRx):
                        Value   CRx         DRx
                        000     CR0         DR0
                        001     CR1*        DR1
                        010     CR2         DR2
                        011     CR3         DR3
                        100     CR4         DR4*
                        101     unused*     DR5*
                        110     unused*     DR6
                        111     unused*     DR7
                    * If not unused, this is a special Control/Debug register
                      (KERBLUH - Why? What happens if we use them?)
    cond    4       Condition test. The top three bits are the condition type. The bottom bit is
                    a "not" bit, testing for the opposite of the normal value. The following
                    table shows all 16 possibilities and their meanings:
                        Value   Mnemonic(s)     Meaning                                 Actual Test
                        0000    O               Overflow                                o
                        0001    NO              Not overflow                            !o
                        0010    B, NAE, C       Below, Not Above or Equal, Carry        c
                        0011    NB, AE, NC      Not Below, Above or Equal, Not Carry    !c
                        0100    E, Z            Equal, Zero                             z
                        0101    NE, NZ          Not Equal, Not Zero                     !z
                        0110    BE, NA          Below or Equal, Not Above               c || z
                        0111    NBE, A          Not Below or Equal, Above               !c && !z
                        1000    S               Sign                                    s
                        1001    NS              Not sign                                !s
                        1010    P, PE           Parity, Parity Even                     p
                        1011    NP, PO          Not Parity, Parity Odd                  !p
                        1100    L, NGE          Less than, Not Greater than or Equal    s != o
                        1101    NL, GE          Not Less than, Greater than or Equal    s == o
                        1110    LE, NG          Less than or Equal, Not Greater than    z || s != o
                        1111    NLE, G          Not Less than or Equal, Greater than    !z && s == o
    d

.......................................................................................................................
Instruction Prefixes:

    Intel groups prefixes into four categories.

    Lock/Repeat prefixes:
        KERBLUH - what are the cycle costs?

        F0h     LOCK prefix
                This prefix locks the memory access while operating, preventing other processors
                from accessing it (???)
        F2h     REPNE/REPNZ prefix
                This prefix is used with string operations, such as SCASB.
                It causes a repeat loop on the instruction, decrementing (E)CX every loop, until
                either (E)CX is zero or the Zero flag is set. If (E)CX is already zero when the
                instruction starts, the string operation is not executed at all.
        F3h     REP prefix
                This prefix is used with string operations, such as MOVSB.
                It causes a repeat loop on the instruction, decrementing (E)CX every loop, until
                (E)CX is zero.
        F3h     REPE/REPZ prefix
                This prefix is used with string operations, such as CMPSB.
                It causes a repeat loop on the instruction, decrementing (E)CX every loop, until
                either (E)CX is zero or the Zero flag is clear.
        F3h     Streaming SIMD Extensions prefix (Pentium 3+ only)
                KERBLUH - Need more information

    Segment Override prefixes:
        KERBLUH - what are the cycle costs?

        2Eh     CS segment override
                Causes the instruction to use CS instead of whatever default segment it would
                normally use
        36h     SS segment override
                Causes the instruction to use SS instead of whatever default segment it would
                normally use
        3Eh     DS segment override
                Causes the instruction to use DS instead of whatever default segment it would
                normally use
        26h     ES segment override
                Causes the instruction to use ES instead of whatever default segment it would
                normally use
        64h     FS segment override (386+ Only)
                Causes the instruction to use FS instead of whatever default segment it would
                normally use
        65h     GS segment override (386+ Only)
                Causes the instruction to use GS instead of whatever default segment it would
                normally use

    Operand-size Override prefix:
        66h     Operand-size override
                Overrides the default operand size for the instruction, costing 1 cycle.
                If the system is in 16bit mode, this instruction will use 32bit operands.
                Likewise, if the system is in 32bit mode, this instruction will use 16bit
                operands.

    Address-size Override prefix:
        67h     Address-size override
                Overrides the default address size for the instruction, costing 1 cycle.
                If the system is in 16bit addressing mode, this instruction will use 32bit addressing.
Likewise, if the system is in 32bit addressing mode, this instruction will use 16bit addressing. One prefix from each category is possible on a single instruction though only a few are applicable in general cases to any given instruction. ....................................................................................................................... Opcode: Most opcodes are static values, they have no internal meaning. However, many opcode values are derived from assigning various bits to different meanings. ....................................................................................................................... ModR/M (Modifier Register/Memory): ...................................................................................................................... SIB (Scale, Index, Base): ....................................................................................................................... Exceptions that occur on any instruction Any Mode: #UD If EM in CR0 is set #NM If TS in CR0 is set #MF If there is a pending FPU exception Protected Mode only: #GP(0) If a destination operand is a nonwritable segment; or If a memory operand effective address is outside the CS, DS, ES, FS or GS segment limit #SS(0) If a memory operand effective address is outside the SS segment limit #PF(code) If a page fault occurs #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current priviledge level is 3 Virtual 8086 Mode only: #GP If any part of the operand lies outside of the effective address space from 0 to 0FFFFh #PF(code) If a page fault occurs #AC(0) If alignment checking is enabled and an unaligned memory reference is made Real Mode only: #GP If any part of the operand lies outside of the effective address space from 0 to 0FFFFh --------------------------------------------------------------------------------------------------- Code Each opcode will have AnyScript style code to demonstrate how the opcodes 
function. AnyScript is very similar to C/C++, but is very good at representing alternating data
type memory structures and specific data access types, whereas C/C++ is not. The biggest
differences are type casting and alternate data type accessing.

Functions

    ConvertS32ToF32(x)
        Converts the specified S32 to a single (F32) according to the processor and MXCSR.

    ConvertF32ToS32(x)
        Converts the specified F32 to an S32 according to the processor and MXCSR. If the floating
        point value is outside the range an S32 can hold, 0x80000000 is used.

    ConvertF32ToS32Truncate(x)
        Converts the specified F32 to an S32 according to the processor. Inexact results are
        truncated, meaning they are rounded toward zero regardless of the rounding mode in MXCSR.
        If the floating point value is outside the range an S32 can hold, 0x80000000 is used.

    Exception(x)
        Causes the listed exception.

    GetParity(x)
        Returns 1 if an even number of bits is set in the value, 0 if an odd number of bits is
        set in the value. This matches the parity flag, so "p = GetParity(result)" can be used
        directly. On the x86 processors, parity is only calculated on the low 8 bits of any value.

    Saturate(x, DataType)
        This function saturates the data using a signed algorithm, based on the specified data
        type. Saturate means that we use the highest or lowest value that can fit into the
        specified data type. For example, if the data type specified is S8, then the result
        would be:
            If (x < -128)
                result = -128;
            Else If (x > 127)
                result = 127;
            Else
                result = x;
        Now, if the data type was U8 instead, then the result would be:
            If (x > 255)
                result = 255;
            Else
                result = x;

    TypeCast(x, DataType)
        This casts the value to the data type specified.
        If the new data type is smaller than the value (for example, TypeCast(x, U8) when x is a
        U32), then the value is truncated.
        If the new data type is the same size (for example, TypeCast(x, U8) when x is an S8), then no change occurs.
        If the new data type is larger than the value (for example, TypeCast(x, S32) when x is a
        U16), then the value is zero extended if it was unsigned originally, sign extended if it
        was signed originally.

    Read8bits
    Read16bits
    Read32bits
    Read48bits
    Read64bits
    OpcodeOperandSize
    OperandSize
    SegmentOperandSize
    StackPop16
    StackPop32
    StackPush16
    StackPush32

---------------------------------------------------------------------------------------------------
Samples

Each opcode will have a sample or two demonstrating how the opcode is used. These samples will
have comments on the side explaining what each opcode is doing, as well as pipe and cycle count
information. The cycle counts will be for Pentium and Pentium MMX, with piping considered for
Pentium w/ MMX. Since Pentiums and Pentiums w/ MMX have slightly different piping considerations
(usually dealing with prefixes), the actual results may be off. It is assumed that the processor
is running in 32bit protected mode unless otherwise specified.

Each sample line will use this format:

        OPCODE                      ;PPxxx      Comment

Example: (Note: This code is meaningless, it's just to show a sample of the sample format!)

        mov     eax, ecx            ;U 1        EAX = ECX
        mov     edx, [somevar]      ; V1        EDX = somevar
        div     ebx                 ;--41       Divide EDX:EAX by EBX
        test    edx, edx            ;U 1        Test if EDX is zero (no remainder)
        jz      SomeLabel           ;!V1(+4?)   Jump to SomeLabel if there was no remainder
        rep movsd                   ;--3+1n     Move ECX DWords over
        mov     ax, [somevar2]      ;U 1+1      Get somevar2 in AX
        xor     eax, ebx            ;U*1        XOR the new value with EBX
        shl     ebx, 1              ;U!1        Multiply EBX by 2
    SomeLabel:

    OPCODE  This will be the opcode and any parameters it uses.

    PP      This will contain piping information and always takes two characters. The only time two instructions may run in parallel is when a U (U , U* or U!) is followed by a V ( V or !V).
            (two spaces) - Though the instruction may pipe in both U and V, it is unknown where
                 it will be. This will generally be used on the first opcode when the second
                 opcode does not rely on the first opcode being in either pipe.
            U  - This instruction may pipe in both U and V, but is currently in the U pipe.
                 Often this will be used on the first opcode even though we do not know what pipe
                 it will be on. It is shown assuming that we are starting on the U pipe simply so
                 we can show how the following instructions pipe relative to it.
             V - This instruction may pipe in both U and V, but is currently in the V pipe.
            U! - This instruction may only pipe in the U pipe.
            !V - Though this instruction theoretically can be in either the U or the V pipe, it
                 can ONLY pair in the V pipe, so for our purposes it is considered to always be in
                 the V pipe. However, if it is in the U pipe and it has a mispredicted branch
                 penalty, that penalty will be 1 cycle higher (+5) instead of the normal +4.
            U* - This instruction may pipe in both U and V, but because of a conflict with the
                 previous opcode, it must be in the U pipe. This usually occurs when two opcodes
                 in a row need to modify the same register.
            -- - This instruction does not support UV piping at all.

    xxx     This contains timing information. There are a few different timing types, based on
            the opcode:
            x      - A single value means that it is always expected to use the same number of
                     cycles
            x-y    - The opcode may take anywhere from x to y cycles
            x+y    - The +y is usually a +1, the amount of extra cycles necessary because of
                     prefixes on the opcode. This usually occurs because of 16bit operations
            x+?    - This is used only for function calls, meaning we don't know how many cycles
                     the call will add
            x+yn   - x cycles plus y cycles per iteration. This is generally used for strings,
                     such as rep movsb
            x/y    - If the result of the opcode is true, x cycles are used. If it is false, y
                     cycles are used
            x(+y?) - Used only on conditional branches, the y value is the penalty for a
                     mispredicted branch.

    Comment This contains the description of the opcode.

___________________________________________________________________________________________________
AAA     ASCII Adjust after Addition

Description:
    ASCII Adjusts AL after addition. The ADD is expected to have been between two unpacked BCD
    values. This adjusts the binary result back to an ASCII adjusted result.
    If a decimal carry occurs, c and a are set and AH is incremented. If no carry occurs, c and a
    are cleared and AH is unaffected. In any case, the high nibble in AL is set to 0.
    To convert AL to ASCII, simply do "or al, 030h".
    Note that this instruction does not guarantee that AH does not overflow from a valid BCD
    value.

Flags:
    oszapc
    ???*?*

    c - Was the low nibble of AL greater than 9, or was a set?
    a - Was the low nibble of AL greater than 9, or was a set?

Special Exceptions:

Code:
    If ((AL & 0x0F) > 0x09 || a)
    {
        AL += 0x06;
        AH++;
        a = c = 1;
    }
    Else
    {
        a = c = 0;
    }

    // Make certain only the low nibble is set
    AL &= 0x0F;

Sample:
    ; BCD Add:
    ; BCDValue1 - 16bit unpacked BCD Value
    ; BCDValue2 - 8bit unpacked BCD Value
    ; BCDValue1 += BCDValue2;   // Unpacked BCD Version
        mov     ax, BCDValue1       ; The BCD value is in AX
        add     al, BCDValue2       ; Add BCD value to AL
        aaa                         ; Adjust back to BCD
        cmp     ah, 09h             ; |
        jbe     Done                ; +-> We have to manually adjust AH
        sub     ah, 0Ah             ; |
    Done:
        mov     BCDValue1, ax       ; Move it back

___________________________________________________________________________________________________
AAD     ASCII Adjust before Division

Description:
    Used to prepare two unpacked BCD digits (the least significant digit in AL, most significant
    in AH) for a division operation that will yield an unpacked result. This basically just makes
    AL = AL + (10 * AH) and sets AH to 0. At this point AX is the binary equivalent of the BCD
    value.
    The actual opcode used by AAD has a hidden feature, in that the second byte of the opcode is actually an immediate value used in the modification.
    In this way, AAD is actually an unsigned 8bit MULADD imm8, where we get AL += AH * x, where x
    is the immediate value. To use this form, remove the second byte that is normally 0Ah (10),
    and replace it with your own immediate value.

Flags:
    oszapc
    ?**?*?

    s, z and p are set from the resulting value in AL; o, a and c are undefined.
    s = Highbit of AL set?
    z = AL = 0?
    p = Parity of AL

Code:
    //AAD version:
    AL += AH * 10;
    AH = 0;
    z = !AL;
    s = AL & 0x80;
    p = GetParity(AL);

    //Actual Opcode:
    AL += AH * operand;
    AH = 0;
    z = !AL;
    s = AL & 0x80;
    p = GetParity(AL);

Sample:
    ; BCD Divide:
    ; BCDValue1 - 16bit BCD Value
    ; BCDValue2 - 16bit BCD Value
    ; BCDValue1 /= BCDValue2;
        mov     ax, BCDValue2       ; |
        aad                         ; |
        mov     cx, ax              ; +-> AL and CL are now the binary values to divide
        mov     ax, BCDValue1       ; |   (AL = BCDValue1, CL = BCDValue2)
        aad                         ; |
        div     cl                  ; Do the divide. AH is 0 at this point, so no worries about it
                                    ; being used in the DIV
                                    ; AAM works after divide as well as multiply since it does the
                                    ; opposite of AAD
        aam                         ; ASCII adjust back
        mov     BCDValue1, ax       ; Store the result

___________________________________________________________________________________________________
AAM     ASCII Adjust after Multiply

Description:
    Use AAM after a MUL between two unpacked BCD digits that leaves the result in AX. Since the
    result is less than 100, it is contained entirely in AL. This basically makes AH = AL / 10
    and AL = AL % 10.
    The actual opcode used by AAM has a hidden feature, in that the second byte of the opcode is
    actually an immediate value used in the modification. In this way, AAM is an unsigned DIV
    imm8, where we get AH = AL / x, AL = AL % x. To use this form, remove the second byte that is
    normally 0Ah (10), and replace it with your own immediate value.

Flags:
    oszapc
    ?**?*?

    p - Parity of AL
    z - AL = 0?
    s - The high bit of AL is set?
    (KERBLUH - Just a guess)

Code:
    // AAM
    AH = AL / 10;
    AL = AL % 10;
    s = AL & 0x80;
    z = !AL;
    p = GetParity(AL);

    // Actual Opcode:
    AH = AL / operand;
    AL = AL % operand;
    s = AL & 0x80;
    z = !AL;
    p = GetParity(AL);

Sample:
    ; BCD Multiply:
    ; BCDValue1 - 8bit BCD Value
    ; BCDValue2 - 8bit BCD Value
    ; BCDValue3 - 16bit BCD Value
    ; BCDValue3 = BCDValue1 * BCDValue2;
        mov     al, BCDValue1       ; |
        mov     cl, BCDValue2       ; +-> Get two BCD values and multiply
        mul     cl                  ; |
        aam                         ; ASCII Adjust
        mov     BCDValue3, ax       ; Store the 2 byte value

___________________________________________________________________________________________________
AAS     ASCII Adjust after Subtract

Description:
    Use AAS after a SUB that leaves the byte result in the AL register. The operands in the SUB
    must have been between 0 and 9. If decimal carry occurred, AH is decremented and a and c are
    set. If no decimal carry occurred, a and c are set to 0. In either case, AL is left with its
    top nibble set to 0. To convert AL to ASCII, use "or al, 030h".

Flags:
    oszapc
    ???*?*

    c - Did decimal carry occur?
    a - Did decimal carry occur?

Code:
    If ((AL & 0x0F) > 9 || a)
    {
        AL -= 6;
        AH--;
        c = a = 1;
    }
    Else
    {
        c = a = 0;
    }
    AL &= 0x0F;

Sample:
    ; BCD Subtract:
    ; BCDValue1 - 16bit BCD Value
    ; BCDValue2 - 8bit BCD Value
    ; BCDValue1 -= BCDValue2;
        mov     ax, BCDValue1       ; The BCD value is in AX
        sub     al, BCDValue2       ; Subtract BCD value from AL
        aas                         ; Adjust back to BCD
        mov     BCDValue1, ax       ; Move it back

___________________________________________________________________________________________________
ADC     ADd with Carry

Description:
    Performs an addition on the two values, adding one if c is set. It is designed for use with
    higher-order additions than the processor can handle natively. For example, to add two 64bit
    values together, you can ADD the lower 32bits of both, then ADC the upper 32bits of both, and
    c is used to "carry" a one over.
    Opcode 83 /2 uses an 8bit value, but adds it to a 16/32 bit value. The 8bit value is sign extended before the ADC operation.
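
The ADD/ADC pairing described above can be sketched in C. This is only an illustration under assumptions: the `u64_pair` type and the `add64_with_carry` name are made up for this example, and the carry flag is modelled as a plain local variable.

```c
#include <stdint.h>

/* Hypothetical type: a 64bit value held as two 32bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Adds b to a using only 32bit arithmetic, the way ADD followed by ADC does. */
u64_pair add64_with_carry(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t c;

    r.lo = a.lo + b.lo;         /* ADD: low halves, result wraps at 32 bits   */
    c = r.lo < a.lo;            /* carry flag: set when the low add wrapped   */
    r.hi = a.hi + b.hi + c;     /* ADC: high halves plus the carried one      */
    return r;
}
```

Adding 0x00000001:FFFFFFFF and 0x00000002:00000001 this way wraps the low half to 0 and carries one into the high half, giving 0x00000004:00000000.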

Flags:
    oszapc
    ******

    c - Is the result too large to fit into the destination?
    p - Parity
    a - Was there a carry out of bit 3 (the low nibble)?
    z - Is the result 0?
    s - Is the highbit set in the result?
    o - Did both operands have the same highbit and the result had a different highbit?

Code:
    result = operand1 + operand2;
    If (c)
    {
        result++;
        c = result <= operand1;
    }
    Else
    {
        c = result < operand1;
    }
    a = ((operand1 ^ operand2 ^ result) & 0x10) != 0;
    p = GetParity(result);
    z = !result;
    s = result & highbit;
    o = ((operand1 ^ operand2 ^ highbit) & (operand1 ^ result)) & highbit;
    operand1 = result;

Sample:
    ; To do: ValueA64bit += ValueB64bit;
        mov     eax, ValueB64bitL   ; Get the low 32 bits of operand2
        mov     edx, ValueB64bitH   ; Get the high 32 bits of operand2
        add     ValueA64bitL, eax   ; Add the lower part of the two 64bit values together
        adc     ValueA64bitH, edx   ; Add the upper part of the two 64bit values with carry

___________________________________________________________________________________________________
ADD     ADD

Description:
    Integer addition, adding the second operand to the first.
    Opcode 83 /0 uses an 8bit value, but adds it to a 16/32 bit value. The 8bit value is sign
    extended before the ADD operation.

Flags:
    oszapc
    ******

    c - Is the result too large to fit into the destination?
    p - Parity
    a - Was there a carry out of bit 3 (the low nibble)?
    z - Is the result 0?
    s - Is the highbit set in the result?
    o - Did both operands have the same highbit and the result had a different highbit?

Code:
    result = operand1 + operand2;
    c = result < operand1;
    a = ((operand1 ^ operand2 ^ result) & 0x10) != 0;
    p = GetParity(result);
    z = !result;
    s = result & highbit;
    o = ((operand1 ^ operand2 ^ highbit) & (operand1 ^ result)) & highbit;
    operand1 = result;

Sample:
    ; value1 += value2;
        mov     eax, value2         ;
        add     value1, eax         ; Add value2 to value1

___________________________________________________________________________________________________
AND     Logical AND

Description:
    Perform a logical AND. For each bit of the result, it is set only if both corresponding bits
    in the operands are also set.
    Opcode 83 /4 uses an 8bit value, but ands it to a 16/32 bit value.
The 8bit value is sign extended before the AND operation. AND affects bits in the following manner: Dest Source Result 0 0 0 0 1 0 1 0 0 1 1 1 Flags: oszapc 0**?*0 p - Parity z - Is the result 0? s - Is the highbit set in the result? Code: operand1 &= operand2; o = c = 0; p = GetParity(operand1); z = !operand1; s = operand1 & highbit; Sample: ;if (val1 & val2) ; val3++; mov eax, val1 ; and eax, val2 ; eax = val1 & val2 jz NotTrue ; Skip the increment if (val1 & val2) is 0 inc val3 ; Increase val3 NotTrue: ___________________________________________________________________________________________________ ARPL Adjust RPL field of selector 286+ Description: Direct from [1]: The ARPL instruction has two operands. The first operand is a 16bit memory variable or word register that contains the value of a selector. The second operand is a word register. If the RPL field ("request privilege level" -- bottom two bits) of the first operand is less than the RPL field of the second operand, the zero flag is set to 1 and the RPL field of the first operand is increased to match the second operand. Otherwise, the zero flag is set to 0 and no change is made to the first operand. ARPL appears in operating system software, not in application programs. It is used to guarantee that a selector parameter to a subroutine does not request more privilege than the caller is allowed. The second operand of ARPL is normally a register that contains the CS selector value of the caller. 
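
As a rough illustration of the ARPL behaviour described above, here is a minimal C sketch. The function name `arpl` and its signature are assumptions made for this example only; the zero flag is returned through a pointer rather than being a real processor flag.

```c
#include <stdint.h>

/* Returns the (possibly adjusted) selector; *zf receives the zero flag. */
uint16_t arpl(uint16_t selector, uint16_t caller_cs, int *zf)
{
    uint16_t rpl_selector = selector & 3u;   /* RPL = bottom two bits */
    uint16_t rpl_caller = caller_cs & 3u;

    if (rpl_selector < rpl_caller) {
        /* The selector asks for more privilege than the caller has:
           raise its RPL to the caller's and report the change. */
        *zf = 1;
        return (uint16_t)((selector & ~3u) | rpl_caller);
    }
    *zf = 0;
    return selector;
}
```

For instance, a ring 3 caller (CS = 001Bh) passing selector 0010h (RPL 0) gets 0013h back with z set, while a selector that already carries RPL 3 is returned unchanged with z clear.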

Flags:
    oszapc
    --*---

    z - Set if the RPL field (bottom two bits) of the first operand is less than the RPL field of
        the second operand

Code:
    If ((operand1 & 0x03) < (operand2 & 0x03))
    {
        operand1 &= ~0x03;                  // Turn off the bottom two bits
        operand1 |= (operand2 & 0x03);      // Get the bottom two bits from operand2
        z = 1;
    }
    Else
    {
        z = 0;
    }

Sample:
    ; KERBLUH- I don't have a good sample for this instruction

___________________________________________________________________________________________________
BOUND   Array BOUND check   186+

Description:
    BOUND checks that a signed array index is within the limits specified by a block of memory
    consisting of a lower and an upper bound. This checks that the passed in register is between
    the first and second signed value pointed to by the memory pointer. If the value is less than
    the first value or greater than the second value, it throws a BOUND Range Exceeded exception
    (#BR). Since this is a standard fault, the return EIP points to the BOUND instruction.

Flags:
    oszapc
    ------

Code:
    If (operand1.S < operand2[0].S || operand1.S > operand2[1].S)
        Exception(#BR);

Sample:
    ; I do not currently have a good example of this instruction since it seems to have been
    ; designed for use BEFORE protected mode

___________________________________________________________________________________________________
BSF     Bit Scan Forward    386+

Description:
    Scans the bits in the second operand, starting with bit 0. If all bits are clear, z is set
    and the first operand is undefined. If a set bit is found, z is cleared and the first operand
    is loaded with the index of the first (lowest) set bit.

Flags:
    oszapc
    --*---

    z - Set if no bit is found (the second operand is 0)

Code:
    z = 1;
    For (temp = 0; temp < bitdepth; temp++)
    {
        If (operand2 & (1 << temp))
        {
            z = 0;
            operand1 = temp;
            Break;
        }
    }

Sample:
    ; Get the first bit set in a value
    ; value1 = getfirstbit(value2);   // use -1 if not found
        bsf     eax, value2         ;--7-43     Get the bit if any
        jnz     BitFound            ;!V1(+5?)
        mov     value1, -1          ;U 1        Use -1 if not found
        jmp     Done                ;!V1
    BitFound:
        mov     value1, eax         ;U 1        Use the bit we found
    Done:

___________________________________________________________________________________________________
BSR     Bit Scan Reverse    386+

Description:
    Scans the bits in the second operand, starting with the highest bit. If all bits are clear, z
    is set and the first operand is undefined. If a set bit is found, z is cleared and the first
    operand is loaded with the index of the last (highest) set bit.

Flags:
    oszapc
    --*---

    z - Set if no bit is found (the second operand is 0)

Code:
    z = 1;
    For (temp = bitdepth - 1; temp >= 0; temp--)
    {
        If (operand2 & (1 << temp))
        {
            z = 0;
            operand1 = temp;
            Break;
        }
    }

Sample:
    ; Get the last bit set in a value
    ; value1 = getlastbit(value2);   // use -1 if not found
        bsr     eax, value2         ;--7-104    Get the bit if any
        jnz     BitFound            ;!V1(+5?)
        mov     value1, -1          ;U 1        Use -1 if not found
        jmp     Done                ;!V1
    BitFound:
        mov     value1, eax         ;U 1        Use the bit we found
    Done:

___________________________________________________________________________________________________
BSWAP   Byte SWAP   486+

Description:
    Reverses the byte order of a 32bit register, effectively converting its endianness. It's
    essentially the equivalent of (for EAX):
        xchg    al, ah
        rol     eax, 16
        xchg    al, ah
    If this is used on a 16bit register, the result is undefined.

Flags:
    oszapc
    ------

Code:
    result.U8[0] = operand.U8[3];
    result.U8[1] = operand.U8[2];
    result.U8[2] = operand.U8[1];
    result.U8[3] = operand.U8[0];
    operand = result;

Sample:
    ; Simply switches a 32bit value's endianness
    ; value1 = SwitchEndian(value1);
        mov     eax, value1         ;  1        |
        bswap   eax                 ;--1        +-> Swap the endianness of the value
        mov     value1, eax         ;U 1        |

___________________________________________________________________________________________________
BT      Bit Test    386+

Description:
    If the bit specified in operand 2 is set in operand 1, the carry flag is set. Otherwise,
    carry is clear.
    For example:
        bt      eax, 3              ; c = (EAX & 0x08) ? 1 : 0;

Flags:
    oszapc
    -----*

    c - Set if the bit is on

Code:
    c = (operand1 & (1 << operand2)) != 0;

Sample:
    ; If (a & (1 << b))
    ;     c = 2;
        mov     eax, a              ;U 1        |
        mov     ecx, b              ; V1        +-> Move both to a register and do the BT
        bt      eax, ecx            ;--3        |   (a BT with a memory operand takes too long)
        jnc     Done                ;!V1(+5?)   If carry is not set, skip the mov
        mov     c, 2                ;U 1        c = 2
    Done:

___________________________________________________________________________________________________
BTC     Bit Test and Complement     386+

Description:
    If the bit specified in operand 2 is set in operand 1, the carry flag is set, otherwise it's
    clear. Then it complements that bit in operand 1.
    For example:
        btc     eax, 3
    means:
        c = (EAX & 0x08) ? 1 : 0;   // 0x08 is bit 3
        EAX ^= 0x08;

Flags:
    oszapc
    -----*

    c - Set if the bit was on before complementing

Code:
    temp = (1 << operand2);
    c = (operand1 & temp) != 0;
    operand1 ^= temp;

Sample:
    ; Quick way to do:
    ; If (a & (1 << b))
    ;     c = 2;
    ; a ^= (1 << b);
        mov     eax, a              ;U 1        |   Move both to a register and do the BTC
        mov     ecx, b              ; V1        +-> (a BTC with a memory operand takes too long)
        btc     eax, ecx            ;--6        |   This also does the "a ^= (1 << b);" part
        jnc     Done                ;!V1(+5?)   If carry is not set, skip the mov
        mov     c, 2                ;U 1        c = 2
    Done:

___________________________________________________________________________________________________
BTR     Bit Test and Reset  386+

Description:
    If the bit specified in operand 2 is set in operand 1, the carry flag is set, otherwise it's
    clear. Then it clears that bit in operand 1.
    For example:
        btr     eax, 3
    means:
        c = (EAX & 0x08) ? 1 : 0;   // 0x08 is bit 3
        EAX &= ~0x08;

Flags:
    oszapc
    -----*

    c - Set if the bit was set before resetting

Code:
    temp = (1 << operand2);
    c = (operand1 & temp) != 0;
    operand1 &= ~temp;

Sample:
    ; If (a & (1 << b))
    ;     c = 2;
    ; a &= ~(1 << b);
        mov     eax, a              ;U 1        |   Move both to a register and do the BTR
        mov     ecx, b              ; V1        +-> (a BTR with a memory operand takes too long)
        btr     eax, ecx            ;--6        |   This also does the "a &= ~(1 << b);" part
        jnc     Done                ;!V1(+5?)   If carry is not set, skip the mov
        mov     c, 2                ;U 1        c = 2
    Done:

___________________________________________________________________________________________________
BTS     Bit Test and Set    386+

Description:
    If the bit specified in operand 2 is set in operand 1, the carry flag is set, otherwise it's
    clear. Then it sets that bit in operand 1.
    For example:
        bts     eax, 3
    means:
        c = (EAX & 0x08) ? 1 : 0;   // 0x08 is bit 3
        EAX |= 0x08;

Flags:
    oszapc
    -----*

    c - Set if the bit was set before setting

Code:
    temp = (1 << operand2);
    c = (operand1 & temp) != 0;
    operand1 |= temp;

Sample:
    ; If (a & (1 << b))
    ;     c = 2;
    ; a |= (1 << b);
        mov     eax, a              ;U 1        |   Move both to a register and do the BTS
        mov     ecx, b              ; V1        +-> (a BTS with a memory operand takes too long)
        bts     eax, ecx            ;--6        |   This also does the "a |= (1 << b);" part
        jnc     Done                ;!V1(+5?)   If carry is not set, skip the mov
        mov     c, 2                ;U 1        c = 2
    Done:

___________________________________________________________________________________________________
CALL    CALL procedure

Description:
    There are two main types of calls: near calls (to an address in the same code segment) and
    far calls (to an address in another code segment).

    Near calls are the simplest and work the same in protected mode, real mode and virtual 86
    mode. The (E)IP register is pushed onto the stack, then the new address is loaded into (E)IP.
    If the operand is a relative operand (E8 opcode), it is a signed offset from the position of
    (E)IP for the instruction after this CALL. CS is never changed, so if the relative operand
    would cause overflow, the address wraps around. This makes it so that in 16bit or 32bit mode,
    a relative address can access every possible address with a 16bit or 32bit signed value
    (respectively).
    Setting the operand size to 16/32bit on a CALL instruction causes it to use only the specified size of values/registers.
Using a 16bit CALL always pushes 16bit IP and clears the high 16bits of EIP, therefore 16bit calls should only ever be used in the lower 64KB of a segment (which is generally never a problem in real mode since only 16bit IP is ever used). Far calls act very differently in real mode and virtual 86 mode when compared to protected mode. In real/virtual 86 mode, a far call simply pushes CS:(E)IP onto the stack (CS first, then (E)IP) then does a far jump to the specified address. Again, like near calls, the operand size affects all values and registers involved: in 16bit mode, only IP is pushed and the high 16bits of EIP are cleared. Far calls in protected mode ... KERBLUH Flags: oszapc ------ Code: KERBLUH Sample: ___________________________________________________________________________________________________ CBW Convert signed Byte to Word aka CWDE Convert signed Word to Dword (Extended) Description: Sign extends AL into AX. Essentially, this just makes all bits of AH equal to the high bit of AL. If the operand size is 32bits instead of 16, this instruction is called CWDE and instead AX is sign extended to EAX. Flags: oszapc ------ Code: If (OperandSize == 16) AX = SignExtend(AL); Else EAX = SignExtend(AX); Sample: ___________________________________________________________________________________________________ CDQ Convert signed Dword to Qword see CWD ___________________________________________________________________________________________________ CLC CLear Carry flag Description: Clears the carry flag (sets it to zero). Flags: oszapc -----0 Code: c = 0; Sample: ___________________________________________________________________________________________________ CLD CLear Direction flag Description: Clears the direction flag (sets it to zero). Essentially, this makes all future string operations (MOVS, CMPS, etc) increment memory on each repetition. 
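
To show what the direction flag changes, here is a small C model of REP MOVSB. It is a sketch under assumptions, not processor-exact: `rep_movsb` is a made-up name, memory is a plain byte array, and esi/edi/ecx are ordinary parameters standing in for the registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Models REP MOVSB over a flat byte array: copies ecx bytes from mem[esi]
   to mem[edi], stepping both indexes up when d == 0 (after CLD) or down
   when d == 1 (after STD). */
void rep_movsb(uint8_t *mem, size_t esi, size_t edi, size_t ecx, int d)
{
    while (ecx != 0) {
        mem[edi] = mem[esi];
        if (d) {
            esi--;
            edi--;
        } else {
            esi++;
            edi++;
        }
        ecx--;
    }
}
```

The direction matters when the source and destination overlap. Moving the four bytes 1,2,3,4 up by one position with d clear overwrites each source byte before it is read, leaving 1,1,1,1,1; starting at the last byte with d set gives the intended 1,1,2,3,4.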

Flags:
    oszapc
    ------

Code:
    d = 0;

Sample:
    ; A very very simple memory move example
        mov     ecx, BytesToMove    ; Get the number of bytes to move
        les     edi, Buffer2        ; Get the destination
        lds     esi, Buffer1        ; Get the source
        cld                         ; Make certain we move it forwards
        rep movsb                   ; Move ECX bytes from DS:ESI to ES:EDI

___________________________________________________________________________________________________
CLI     CLear Interrupt flag

Description:
    Clears the interrupt flag (sets it to zero). When the interrupt flag is clear, maskable
    interrupts no longer occur. This does not stop non-maskable interrupts or exceptions/faults
    from occurring.
    This is only allowed to occur if the program is in real mode, or it is in protected mode and
    CPL <= IOPL, or it is in virtual 86 mode and IOPL is 3. Otherwise, #GP(0) occurs.

Flags:
    oszapc
    ------

Code:
    // If:
    //    We're in real mode
    // or We're in protected mode and CPL is less than or equal to IOPL
    // or We're in virtual 86 and IOPL is 3
    // only then can we clear the interrupt flag!
    If (PE == 0 || (VM == 0 && CPL <= IOPL) || (VM == 1 && IOPL == 3))
        i = 0;
    Else
        Exception(GP0);

Sample:

___________________________________________________________________________________________________
CLTS    CLear Task Switch flag

Description:
    Clears the task switch flag in CR0. This is designed only for use in operating systems and
    will throw an exception in protected or virtual 86 mode if CPL is greater than 0.

Flags:
    oszapc
    ------

Special Exceptions:
    #GP(0) if in protected/v86 mode and CPL is greater than 0

Code:
    ts = 0;

Sample:

___________________________________________________________________________________________________
CMC     CoMplement Carry flag

Description:
    Complements (switches) the carry flag.

Flags:
    oszapc
    -----*

    c - Becomes !c

Code:
    c ^= 1;

Sample:

___________________________________________________________________________________________________
CMOVcc  Conditional Move

Description:
    Performs a 16 or 32bit move only if the specified condition is true.
    This instruction is fairly limited compared to the MOV instruction, in that only registers may
    be the destination.

    This instruction is not supported on all processors (it was introduced with the Pentium Pro).
    Support can be detected using the CPUID instruction (the CMOV feature bit).

Flags:
    oszapc
    ------

Code:
    If (ConditionTrue)
        operand1 = operand2;

Sample:

___________________________________________________________________________________________________
CMP             CoMPare

Description:
    This instruction sets the conditional flags (carry, zero, auxiliary carry, parity, sign and
    overflow) by subtracting the second operand from the first operand and throwing away the
    result (neither operand is changed). The resulting flags are affected the same as if a SUB
    instruction was used. This is generally used before a Jcc, CMOVcc or SETcc instruction.

    When an 8bit immediate value is used as the second operand and the first operand is 16 or
    32bit, it is sign-extended.

Flags:
    oszapc
    ******
    o - Set if the operands had different highbits and the result's highbit differs from
        operand1's highbit (signed overflow of the subtraction)
    s - Set if the high bit is set in the result
    z - Set if the result is zero
    a - Set if there is borrow in the low four bits
    p - Set if the low 8bits have an even number of bits set
    c - Set if the result is greater than operand1

Code:
    temp = operand2;
    If (OperandSize == 32 && ImmediateSize == 8)
        temp = TypeCast(temp, S32);
    Else If (OperandSize == 16 && ImmediateSize == 8)
        temp = TypeCast(temp, S16);
    result = operand1 - temp;           // Use the sign-extended value
    c = operand1 < result;
    p = GetParity(result);
    a = ((operand1 ^ temp ^ result) & 0x10) != 0;
    z = !result;
    s = result & highbit;
    // Subtraction overflows when the operand signs differ and the result sign differs from operand1
    o = ((operand1 ^ temp) & (operand1 ^ result)) & highbit;

Sample:

___________________________________________________________________________________________________
CMPPS           CoMPare Packed Singles                                      SSE Instruction
 CMPEQPS        CoMPare for EQual Packed Single (sets operand3 to 0)
 CMPLTPS        CoMPare for Less Than Packed Single (sets operand3 to 1)
 CMPLEPS        CoMPare for Less than or Equal Packed Single (sets operand3 to 2)
 CMPUNORDPS     CoMPare for UNORDered Packed Single (sets operand3 to 3)
 CMPNEPS        CoMPare for Not Equal Packed Single (sets operand3 to 4)
 CMPNLTPS       CoMPare for Not Less Than Packed Single (sets operand3 to 5)
 CMPNLEPS       CoMPare for Not Less than or Equal Packed Single (sets operand3 to 6)
 CMPORDPS       CoMPare for ORDered Packed Single (sets operand3 to 7)

Description:
    Compares the packed single values in operand1 with the values in operand2. After the
    comparison, if it is true, the value becomes a 32bit integer -1 (0xFFFFFFFF); if it is false,
    it becomes a 32bit integer 0.
    Which comparison operation is used is based on the immediate operand3, which can only be
    between 0 and 7:

    Operand3  Description          Result if NaN  Q/SNaN Operand
    (hex)                          (in either)    Signals Invalid
    00        Equal                False          No
    01        Less                 False          Yes
    02        Less or Equal        False          Yes
    03        Unordered            True           No
    04        Not Equal            True           No
    05        Not Less             True           Yes
    06        Not Less or Equal    True           Yes
    07        Ordered              False          No

    [3] states that you should swap the operands manually, protecting the second operand if you do
    not want to lose it, to use greater, greater or equal, not greater or not greater or equal.
    However, aside from how NaN is handled, using not less is the same as greater or equal, and
    not less or equal is the same as greater.

    Going by the table above, "unordered" is true when either operand is a NaN, and "ordered" is
    true when neither operand is a NaN.

Flags:
    oszapc
    ------

Code:
    KERBLUH

Sample:

___________________________________________________________________________________________________
CMPSx           CoMPare String
 CMPSB          CoMPare String by Byte
 CMPSW          CoMPare String by Word
 CMPSD          CoMPare String by Dword

Description:
    These instructions perform a compare with DS:(E)SI as operand1 and ES:(E)DI as operand2, using
    the size specified by the opcode (1, 2 or 4 bytes). Then, both (E)SI and (E)DI are incremented
    (if d is 0) or decremented (if d is 1) by that number of bytes. The comparison is identical to
    the CMP instruction, in that a temporary value holds the result of operand1 - operand2 and the
    flags are set accordingly.

    If the address size is 16bit, then DS:SI and ES:DI are used, otherwise DS:ESI and ES:EDI are
    used. If a segment override is used, it overrides DS; ES cannot be overridden.

    REPE/REPZ and REPNE/REPNZ repeat prefixes may be used with this instruction.

Flags:
    oszapc
    ******
    o - Set if the operands had different highbits and the result's highbit differs from
        operand1's highbit (signed overflow of the subtraction)
    s - Set if the high bit is set in the result
    z - Set if the result is zero
    a - Set if there is borrow in the low four bits
    p - Set if the low 8bits have an even number of bits set
    c - Set if the result is greater than operand1

Code:
    result = operand1 - operand2;
    If (AddressSize == 16)
    {
        If (d)
        {
            SI -= OpcodeOperandSize;
            DI -= OpcodeOperandSize;
        }
        Else
        {
            SI += OpcodeOperandSize;
            DI += OpcodeOperandSize;
        }
    }
    Else
    {
        If (d)
        {
            ESI -= OpcodeOperandSize;
            EDI -= OpcodeOperandSize;
        }
        Else
        {
            ESI += OpcodeOperandSize;
            EDI += OpcodeOperandSize;
        }
    }
    c = operand1 < result;
    p = GetParity(result);
    a = ((operand1 ^ operand2 ^ result) & 0x10) != 0;
    z = !result;
    s = result & highbit;
    // Subtraction overflows when the operand signs differ and the result sign differs from operand1
    o = ((operand1 ^ operand2) & (operand1 ^ result)) & highbit;

Sample:

___________________________________________________________________________________________________
CMPSS           CoMPare Scalar Single                                       SSE Instruction
 CMPEQSS        CoMPare for EQual Scalar Single (sets operand3 to 0)
 CMPLTSS        CoMPare for Less Than Scalar Single (sets operand3 to 1)
 CMPLESS        CoMPare for Less than or Equal Scalar Single (sets operand3 to 2)
 CMPUNORDSS     CoMPare for UNORDered Scalar Single (sets operand3 to 3)
 CMPNESS        CoMPare for Not Equal Scalar Single (sets operand3 to 4)
 CMPNLTSS       CoMPare for Not Less Than Scalar Single (sets operand3 to 5)
 CMPNLESS       CoMPare for Not Less than or Equal Scalar Single (sets operand3 to 6)
 CMPORDSS       CoMPare for ORDered Scalar Single (sets operand3 to 7)

Description:
    Compares the low scalar single value in operand1 with the low value in operand2. After the
    comparison, if it is true, the low 32bits of operand1 become a 32bit integer -1 (0xFFFFFFFF);
    if it is false, they become a 32bit integer 0.
    Which comparison operation is used is based on the immediate operand3, which can only be
    between 0 and 7:

    Operand3  Description          Result if NaN  Q/SNaN Operand
    (hex)                          (in either)    Signals Invalid
    00        Equal                False          No
    01        Less                 False          Yes
    02        Less or Equal        False          Yes
    03        Unordered            True           No
    04        Not Equal            True           No
    05        Not Less             True           Yes
    06        Not Less or Equal    True           Yes
    07        Ordered              False          No

    [3] states that you should swap the operands manually, protecting the second operand if you do
    not want to lose it, to use greater, greater or equal, not greater or not greater or equal.
    However, aside from how NaN is handled, using not less is the same as greater or equal, and
    not less or equal is the same as greater.

    Going by the table above, "unordered" is true when either operand is a NaN, and "ordered" is
    true when neither operand is a NaN.

Flags:
    oszapc
    ------

Code:
    KERBLUH

Sample:

___________________________________________________________________________________________________
CMPXCHG         CoMPare and eXCHanGe

Description:
    Operand1 is compared to AL/AX/EAX (depending on the opcode operand size). If they are equal,
    operand1 becomes operand2; otherwise, operand1 is loaded into AL/AX/EAX. The flags are set as
    if AL/AX/EAX had been compared with operand1 using CMP.

    [3] states that, even if operand1 is not changed, if it is a memory value it will be written
    back to memory to simplify the operation.

Flags:
    oszapc
    ******
    o - Set if the operands had different highbits and the result's highbit differs from the
        highbit of AL/AX/EAX (signed overflow of the subtraction)
    s - Set if the high bit is set in the result
    z - Set if the result is zero
    a - Set if there is borrow in the low four bits
    p - Set if the low 8bits have an even number of bits set
    c - Set if the result is greater than AL/AX/EAX

Code:
    // 32bit version
    result = EAX - operand1;
    z = !result;
    c = EAX < result;
    p = GetParity(result);
    a = ((EAX ^ operand1 ^ result) & 0x10) != 0;
    s = result & highbit;
    o = ((EAX ^ operand1) & (EAX ^ result)) & highbit;
    If (z)
        operand1 = operand2;
    Else
        EAX = operand1;

Sample:

___________________________________________________________________________________________________
CMPXCHG8B       CoMPare and eXCHanGe 8 Bytes                                Pentium and above

Description:
    Operand1 is compared to EDX:EAX.
    If they are equal, operand1 becomes ECX:EBX; otherwise, operand1 is loaded into EDX:EAX.

    [3] states that, even if operand1 is not changed, it will be written back to memory to
    simplify the operation.

Flags:
    oszapc
    --*---
    z - Set if the values are equal

Code:
    z = (EAX == operand1.U32 && EDX == operand1.U32[1]);
    If (z)
    {
        operand1.U32 = EBX;
        operand1.U32[1] = ECX;
    }
    Else
    {
        EAX = operand1.U32;
        EDX = operand1.U32[1];
    }

Sample:

___________________________________________________________________________________________________
COMISS          COMpare Single Scalar (where the I goes is anybody's guess) SSE Instruction

Description:
    Compares the low 32bit single in operand1 with the low 32bit single in operand2, setting
    overflow, sign and auxiliary carry to zero.

Flags:
    oszapc
    00*0**
    z - Set if the values are equal OR if a NaN is present
    p - Set if a NaN is present
    c - Set if operand1 < operand2 OR if a NaN is present

Code:
    If (IsNaN(operand1.F32) || IsNaN(operand2.F32))
    {
        z = 1;
        p = 1;
        c = 1;
    }
    Else
    {
        p = 0;
        z = operand1.F32 == operand2.F32;
        c = operand1.F32 < operand2.F32;
    }
    o = 0;
    s = 0;
    a = 0;

Sample:

___________________________________________________________________________________________________
CPUID           CPU IDentification              Pentium and higher (available on some 486s)

Description:
    This instruction permits a program to identify features of the current processor. Functions
    are called by putting a value into EAX before using the CPUID opcode. EAX, ECX, EDX and EBX
    are all modified by most of the CPUID functions.

    This instruction also guarantees serialization of all previous instructions, guaranteeing that
    all flag, register and memory updates for previous instructions are completed before the next
    instruction executes.

    [3] states that all registers are undefined if EAX is higher than the maximum supported.
    EAX func  Description
    0         EAX - Maximum EAX function supported (1 for 486 and Pentium, 2 for P2)
              EBX:EDX:ECX - Processor signature as a 12 char string
                  756E6547:49656E69:6C65746E  'GenuineIntel' for Intel processors
                  ??????41:????????:????????  'AuthenticAMD' for AMD processors
                  ????????:????????:????????  '????????????' for Cyrix processors

    1         EAX - Version Information (Type, Family, Model and Stepping ID):
                  BITS
                  0-3   - 0000000F - Stepping ID (??? what does this mean?)
                  4-7   - 000000F0 - Model (Starts on 1) (??? what does this mean?)
                  8-11  - 00000F00 - Family (KERBLUH - Need a full list):
                                       0000 -
                                       0600 - Pentium Pro
                  12-13 - 00003000 - Processor Type:
                                       0000 - Original OEM Processor
                                       1000 - Intel OverDrive Processor
                                       2000 - Dual Processor (not applicable to 386 and 486)
                                       3000 - Reserved
              EBX - Reserved (KERBLUH - What does it contain?)
              ECX - Reserved (KERBLUH - What does it contain?)
              EDX - Feature Information: If any of these bits is set, the feature is available
                  BITS
                  0  - 00000001 - FPU: Floating Point Unit on chip
                         Processor contains an FPU unit w/ 387 instruction set
                  1  - 00000002 - VME: Virtual-8086 Mode Enhancements
                         Processor supports the following Virtual86 mode enhancements:
                         o CR4.VME enables virtual86 mode extensions
                         o CR4.PVI enables protected-mode virtual interrupts
                         o Expansion of the TSS with software indirection bitmap
                         o EFLAGS.VIF bit (virtual interrupt flag)
                         o EFLAGS.VIP bit (virtual interrupt pending flag)
                  2  - 00000004 - DE: Debug Extensions
                         Debugging extensions are available:
                         o CR4.DE for enabling debug extensions
                         o Optional trapping of access to DR4 and DR5
                  3  - 00000008 - PSE: Page Size Extensions
                         Processor supports 4MB pages:
                         o CR4.PSE bit enabling page size extensions
                         o PDEs or page directory entries, and their modified bit
                         o PTEs or page table entries
                  4  - 00000010 - TSC: Time Stamp Counter
                         o RDTSC (ReaD Time Stamp Counter) instruction
                         o CR4.TSD which, along with CPL, controls whether the time stamp counter
                           may be read
                  5  - 00000020 - MSR: Model Specific Registers
                         o RDMSR (ReaD Model Specific Register) instruction
                         o WRMSR (WRite Model Specific Register) instruction
                  6  - 00000040 - PAE: Physical Address Extension
                         Support for physical addresses greater than 32bits:
                         o CR4.PAE bit to enable this feature
                         o Extended page table entry format
                         o Extra level in the page translation tables
                         o 2MB Pages
                         Number of address bits is implementation specific. The Pentium Pro
                         supports 36bits of addressing when CR4.PAE is set.
                         KERBLUH- How do other procs handle this? Or do they ignore it?
                  7  - 00000080 - MCE: Machine Check Exception
                         o CR4.MCE bit to enable this feature
                         o Allows Machine Check Exceptions, which are model specific
                  8  - 00000100 - CX8: CMPXCHG8B Instruction
                         Support for the CMPXCHG8B instruction
                  9  - 00000200 - APIC: Advanced Programmable Interrupt Controller
                         Support for the APIC is available on this proc
                  10 - 00000400 - Reserved (0)
                  11 - 00000800 - SEP: Fast System Calls
                         Supports fast system calls:
                         o SYSENTER instruction
                         o SYSEXIT instruction
                  12 - 00001000 - MTRR: Memory Type Range Registers
                         Supports Memory Type Range Registers (MTRR)
                         These machine-specific registers may be used to gather more information
                         about them
                  13 - 00002000 - PGE: PTE Global Flag
                         o CR4.PGE bit to enable this feature
                         o Global bit support for PDEs and PTEs, used to indicate translation
                           lookaside buffer (TLB) entries that are common to different tasks and
                           need not be flushed when CR3 is written
                  14 - 00004000 - MCA: Machine Check Architecture
                         Supports the MCG_CAP (machine check global capability) MSR which has the
                         number of banks of error reporting MSRs the cpu supports
                  15 - 00008000 - CMOV: Conditional Move and Compare Instructions
                         o CMOVcc instructions
                         o If FPU (bit 0), then FCMOVcc and FCOMI instructions also
                  16 - 00010000 - FGPAT: Page Attribute Table
                         [3] does not describe this, it has a copy of the text for CMOV here which
                         is probably a mistake KERBLUH
                  17 - 00020000 - PSE-36: 36bit Page Size Extension
                         Support for 4MB pages w/ 36bit physical addresses
                  18 - 00040000 - PN: Processor Number
                         Support for 96bit processor number
                  19-22 - Reserved (0)
                  23 - 00800000 - MMX
                         CPU supports the MMX instructions
                  24 - 01000000 - FXSR: Fast FP/MMX/SSE Save/Restore
                         o CR4.OSFXSR bit to enable
                         o FXSAVE instruction
                         o FXRSTOR instruction
                  25 - 02000000 - XMM: SSE
                         Support for SSE instructions
                  26-31 - Reserved (0)

    2         EAX - Cache and TLB Information
              EBX - Cache and TLB Information
              ECX - Cache and TLB Information
              EDX - Cache and TLB Information

              When first calling function 2, the low byte of EAX (AL) will contain the number of
              calls to CPUID function 2 that will be necessary to get all the TLB values. This
              number includes the first call, so if AL is 1, then no subsequent calls to CPUID are
              necessary.

              For each register, refer to the high bit (bit 31). If it is clear, the register
              contains valid TLB information. If it is set, the register is reserved (KERBLUH-
              does it contain anything useful, or is it random, or what?)

              For each byte in each register that had valid TLB information, refer to the
              following table:

              HEX  Description
              00   Null descriptor
              01   Instruction TLB: 4K-Byte Pages, 4-way set associative, 32 entries
              02   Instruction TLB: 4M-Byte Pages, fully associative, two entries
              03   Data TLB: 4K-Byte Pages, 4-way set associative, 64 entries
              04   Data TLB: 4M-Byte Pages, 4-way set associative, eight entries
              06   Instruction cache (L1): 8K Bytes, 4-way set associative, 32 byte line size
              08   Instruction cache (L1): 16K Bytes, 4-way set associative, 32 byte line size
              0A   Data cache (L1): 8K Bytes, 2-way set associative, 32 byte line size
              0C   Data cache (L1): 16K Bytes, 2-way or 4-way set associative, 32 byte line size
              40   No L2 Cache
              41   L2 Unified cache: 128K Bytes, 4-way set associative, 32 byte line size
              42   L2 Unified cache: 256K Bytes, 4-way set associative, 32 byte line size
              43   L2 Unified cache: 512K Bytes, 4-way set associative, 32 byte line size
              44   L2 Unified cache: 1M Byte, 4-way set associative, 32 byte line size
              45   L2 Unified cache: 2M Byte, 4-way set associative, 32 byte line size
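Once CPUID has returned, decoding the function 0 and function 1 results described above is plain bit manipulation. The following Python sketch works on register values that have already been captured; it does not execute CPUID itself. The helper names (vendor_string, decode_version, decode_features) and the sample register values are made up for illustration, and FEATURE_BITS lists only part of the EDX table.

```python
# Subset of the function 1 EDX feature bits listed above (bit number -> name)
FEATURE_BITS = {
    0: "FPU", 1: "VME", 2: "DE", 3: "PSE", 4: "TSC", 5: "MSR", 6: "PAE",
    7: "MCE", 8: "CX8", 9: "APIC", 11: "SEP", 12: "MTRR", 13: "PGE",
    14: "MCA", 15: "CMOV", 23: "MMX", 24: "FXSR", 25: "XMM",
}


def vendor_string(ebx, edx, ecx):
    """Function 0: join EBX, EDX, ECX (in that order) into the 12 character signature."""
    return b"".join(r.to_bytes(4, "little") for r in (ebx, edx, ecx)).decode("ascii")


def decode_version(eax):
    """Function 1: split EAX into the fields of the version information."""
    return {
        "stepping": eax & 0xF,
        "model": (eax >> 4) & 0xF,
        "family": (eax >> 8) & 0xF,
        "type": (eax >> 12) & 0x3,
    }


def decode_features(edx):
    """Function 1: names of the feature bits that are set in EDX, lowest bit first."""
    return [name for bit, name in sorted(FEATURE_BITS.items()) if edx & (1 << bit)]


vendor = vendor_string(0x756E6547, 0x49656E69, 0x6C65746E)
version = decode_version(0x00000633)     # hypothetical EAX value
features = decode_features(0x00800011)   # hypothetical EDX value: bits 0, 4 and 23 set
```

The registers are stored least significant byte first, which is why the signature is assembled with little-endian byte order.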
Flags:
    oszapc
    ------

Code:
    // KERBLUH

Sample:

___________________________________________________________________________________________________
CVTPI2PS        ConVerT Packed Integers to (2) Packed Singles               SSE Instruction

Description:
    The two low packed singles in operand1 become floating point equivalents of the two signed
    32bit integers in operand2. Rounding is done according to MXCSR.

    [3]'s wording is not exactly clear, but it appears as though the MMX register (operand2), if
    not a memory operand, will become modified to change it to a floating point value. ???

Flags:
    oszapc
    ------

Special Exceptions:
    #MF - If there is a pending x87 fault.
    KERBLUH - Maybe more?

Code:
    operand1.F32 = ConvertS32ToF32(operand2.S32);
    operand1.F32[1] = ConvertS32ToF32(operand2.S32[1]);

Sample:

___________________________________________________________________________________________________
CVTPS2PI        ConVerT Packed Singles to (2) Packed Integers               SSE Instruction

Description:
    The two packed signed integers in operand1 become integer equivalents of the two low packed
    singles in operand2. Rounding is done according to MXCSR. If the converted values are larger
    than the maximums of an S32, 0x80000000 is used.

    [3] again has confusing comments. More information is required. ???

Flags:
    oszapc
    ------

Special Exceptions:
    #MF - If there is a pending x87 fault.
    KERBLUH - Maybe more?

Code:
    operand1.U32 = ConvertF32ToS32(operand2.F32);
    operand1.U32[1] = ConvertF32ToS32(operand2.F32[1]);

Sample:

___________________________________________________________________________________________________
CVTSI2SS        ConVerT Scalar Integer to (2) Scalar Single                 SSE Instruction

Description:
    The low single in operand1 becomes the floating point equivalent of the signed 32bit integer
    operand2. Rounding is done according to MXCSR.
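Rounding matters here because a single has only 24 bits of precision, so not every 32bit integer has an exact floating point equivalent. The sketch below uses Python's struct module to round a value to single precision; it assumes the round-to-nearest mode only and does not model the other MXCSR rounding modes. The function name s32_to_f32 is hypothetical.

```python
import struct


def s32_to_f32(value):
    """Nearest single-precision value to a signed 32bit integer, as a Python float."""
    # Packing as "f" rounds the value to single precision; unpacking reads it back
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


exact = s32_to_f32(16777216)      # 2^24 fits in 24 bits and converts exactly
rounded = s32_to_f32(16777217)    # 2^24 + 1 does not fit and is rounded
```

Integers with a magnitude of 2^24 or less always convert exactly; above that, only some values do.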
Flags:
    oszapc
    ------

Code:
    operand1.F32 = ConvertS32ToF32(operand2.S32);

Sample:

___________________________________________________________________________________________________
CVTSS2SI        ConVerT Scalar Single to (2) Scalar Integer                 SSE Instruction

Description:
    Operand1 becomes the integer equivalent of the low floating point value in operand2. Rounding
    is done according to MXCSR. If the converted value is outside the range of an S32, 0x80000000
    is used.

Flags:
    oszapc
    ------

Code:
    operand1.S32 = ConvertF32ToS32(operand2.F32);

Sample:

___________________________________________________________________________________________________
CVTTPS2PI       ConVerT Truncated Packed Singles to (2) Packed Integers     SSE Instruction

Description:
    The two packed signed integers in operand1 become integer equivalents of the two low packed
    singles in operand2. If the conversion is inexact, truncation is used (the value is rounded
    toward zero, so the fractional portion is simply dropped). If the converted values are outside
    the range of an S32, 0x80000000 is used.

    [3] again has confusing comments. More information is required. ???

Flags:
    oszapc
    ------

Special Exceptions:
    #MF - If there is a pending x87 fault.
    KERBLUH - Maybe more?

Code:
    operand1.U32 = ConvertF32ToS32Truncate(operand2.F32);
    operand1.U32[1] = ConvertF32ToS32Truncate(operand2.F32[1]);

Sample:

___________________________________________________________________________________________________
CVTTSS2SI       ConVerT Truncated Scalar Single to (2) Scalar Integer       SSE Instruction

Description:
    Operand1 becomes the integer equivalent of the low floating point value in operand2. If the
    conversion is inexact, then the value is truncated (rounded toward zero, so the fractional
    portion is dropped). If the converted value is outside the range of an S32, 0x80000000 is
    used.
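The difference between the rounding conversion (CVTSS2SI) and the truncating conversion (CVTTSS2SI) shows up whenever the source value has a fractional part. The Python sketch below is only an approximation under stated assumptions: it uses Python's double-precision floats in place of singles, it assumes MXCSR is set to round-to-nearest (ties to even) for the rounding form, and the function names are hypothetical.

```python
import math

S32_MIN, S32_MAX = -2**31, 2**31 - 1
INDEFINITE = -2**31     # the bit pattern 0x80000000 read as a signed 32bit integer


def _fit(n):
    """Return n if it fits in an S32, otherwise the 0x80000000 value."""
    return n if S32_MIN <= n <= S32_MAX else INDEFINITE


def cvtss2si(x):
    """Rounding conversion, assuming round-to-nearest (ties go to the even integer)."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE
    return _fit(round(x))


def cvttss2si(x):
    """Truncating conversion: the fractional portion is dropped (round toward zero)."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE
    return _fit(math.trunc(x))
```

For example, 2.9 converts to 3 with the rounding form and to 2 with the truncating form, while -2.9 converts to -3 and -2 respectively.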
Flags:
    oszapc
    ------

Code:
    operand1.S32 = ConvertF32ToS32Truncate(operand2.F32);

Sample:

___________________________________________________________________________________________________
CWD             Convert signed Word to Dword
 aka
CDQ             Convert signed Dword to Qword

Description:
    Sign extends AX into DX:AX. Essentially, this just makes all bits of DX equal to the high bit
    of AX. If the operand size is 32bits instead of 16, this instruction is called CDQ and instead
    EAX is sign extended to EDX:EAX.

Flags:
    oszapc
    ------

Code:
    If (OperandSize == 16)
        DX:AX = SignExtend(AX);
    Else
        EDX:EAX = SignExtend(EAX);

Sample:

___________________________________________________________________________________________________
CWDE            Convert signed Word to Dword (Extended)

    see CBW

___________________________________________________________________________________________________
DAA             Decimal Adjust after Addition

Description:
    Decimal adjusts AL after an addition. This instruction is used after adding two packed BCD
    bytes together into AL. If the low nibble of AL is greater than 9 or auxiliary carry is set,
    then 6 is added, a is set and c is or'd with 1 if full carry occurred. If not, then a is
    cleared. Then, if the original value of AL was greater than 99h or carry was set on entry,
    then 60h is added and c is set. If not, c is cleared.

Flags:
    oszapc
    ?*****
    s - If the high bit of AL is set (useless)
    z - If AL is zero
    a - If 06h had to be added to the result
    p - Parity of AL
    c - If 60h had to be added to the result

Code:
    oldAL = AL;
    oldC = c;
    // First nibble
    If ((AL & 0x0F) > 0x09 || a)
    {
        AL += 6;
        a = 1;
        If (AL < oldAL)
            c = 1;
    }
    Else
    {
        a = 0;
    }
    // Second nibble (tested against the values from before the first adjustment)
    If (oldAL > 0x99 || oldC)
    {
        AL += 0x60;
        c = 1;
    }
    Else
    {
        c = 0;
    }
    s = AL & 0x80;
    z = !AL;
    p = GetParity(AL);

Sample:

___________________________________________________________________________________________________
DAS             Decimal Adjust after Subtraction

Description:
    Decimal adjusts AL after a subtraction.
    This instruction is used after subtracting two packed BCD bytes into AL. If the low nibble of
    AL is greater than 9 or auxiliary carry is set, then 6 is subtracted, a is set and c is or'd
    with 1 if full borrow occurred. If not, then a is cleared. Then, if the original value of AL
    was greater than 99h or carry was set on entry, then 60h is subtracted and c is set. If not, c
    is only set if the first step borrowed.

Flags:
    oszapc
    ?*****
    s - If the high bit of AL is set (useless)
    z - If AL is zero
    a - If 06h had to be subtracted from the result
    p - Parity of AL
    c - If 60h had to be subtracted from the result, or if subtracting 06h borrowed

Code:
    oldAL = AL;
    oldC = c;
    c = 0;
    // First nibble
    If ((AL & 0x0F) > 0x09 || a)
    {
        AL -= 6;
        a = 1;
        If (AL > oldAL)             // Borrow out of the whole byte
            c = 1;
    }
    Else
    {
        a = 0;
    }
    // Second nibble (tested against the values from before the first adjustment)
    If (oldAL > 0x99 || oldC)
    {
        AL -= 0x60;
        c = 1;
    }
    s = AL & 0x80;
    z = !AL;
    p = GetParity(AL);

Sample:

___________________________________________________________________________________________________
DEC             DECrement by 1

Description:
    Decreases the operand by one. Unlike other math opcodes, this does NOT set the carry flag as
    it was designed to work as a loop control more than a math operation.
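The flag behavior of an 8bit DEC, including the untouched carry flag, can be sketched as follows. This is a small Python model written for illustration rather than a definitive implementation; the function name dec8 and the dictionary used to return the flags are assumptions.

```python
def dec8(operand, c):
    """Model of an 8bit DEC: returns (result, flags), with carry passed through unchanged."""
    result = (operand - 1) & 0xFF
    flags = {
        "o": operand == 0x80,                    # 80h -> 7Fh is the only signed overflow
        "s": bool(result & 0x80),                # high bit of the result
        "z": result == 0,
        "a": (operand & 0x0F) == 0,              # borrow out of the low nibble
        "p": bin(result).count("1") % 2 == 0,    # even number of set bits
        "c": c,                                  # DEC never changes carry
    }
    return result, flags


wrapped, wrapped_flags = dec8(0x00, 0)   # 00h wraps to FFh, yet carry stays 0
```

A SUB of 1 from 00h would set carry. DEC leaves it alone, which lets a loop counter be decremented in the middle of a multi-precision ADC/SBB chain without disturbing the carry.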
Flags:
    oszapc
    *****-
    o - If there was signed overflow
    s - If the high bit of the result is set
    z - If the result is zero
    a - If there was borrow on the low nibble
    p - Parity of the result

Code:
    // Assumes result is the same bit depth as operand
    result = operand - 1;
    a = (result & 0x0F) > (operand & 0x0F);
    p = GetParity(result);
    z = !result;
    s = result & highbit;
    // Overflow only when the most negative value wraps to the most positive value
    o = (operand & (operand ^ result)) & highbit;
    operand = result;

Sample:

___________________________________________________________________________________________________
DIV             DIVide (unsigned integer)

Description:
    Performs an 8bit, 16bit or 32bit unsigned division according to the following:

    Bits  Destination  High Val  Division            Quotient  Remainder
    8     AX           AH        AX / operand        AL        AH
    16    DX:AX        DX        DX:AX / operand     AX        DX
    32    EDX:EAX      EDX       EDX:EAX / operand   EAX       EDX

    If, in any of the cases, operand is zero OR if an overflow would occur from dividing (ie, if
    the High Val is greater or equal to operand), a #DE (Divide Error) exception is thrown.

Flags:
    oszapc
    ??????

Special Exceptions:
    #DE - If overflow occurs (if the operand is less than or equal to the high byte/word/dword)
          OR the operand is 0

Code:
    8bit version:
        If (operand == 0 || operand <= AH)
        {
            Exception(#DE);
        }
        Else
        {
            temp = AX;              // Keep the dividend, since AH is part of AX
            AH = temp % operand;
            AL = temp / operand;
        }

    16bit version:
        If (operand == 0 || operand <= DX)
        {
            Exception(#DE);
        }
        Else
        {
            temp = DX:AX;
            DX = temp % operand;
            AX = temp / operand;
        }

    32bit version:
        If (operand == 0 || operand <= EDX)
        {
            Exception(#DE);
        }
        Else
        {
            temp = EDX:EAX;
            EDX = temp % operand;
            EAX = temp / operand;
        }

Sample:

___________________________________________________________________________________________________
DIVPS           DIVide Packed Singles                                       SSE Instruction

Description:
    Divides each of the packed single values from operand1 by the respective values in operand2.
Flags:
    oszapc
    ------

Special Exceptions:
    Overflow, Underflow, Invalid, Divide-By-Zero, Precision, Denormal

Code:
    operand1.F32 /= operand2.F32;
    operand1.F32[1] /= operand2.F32[1];
    operand1.F32[2] /= operand2.F32[2];
    operand1.F32[3] /= operand2.F32[3];

Sample:

___________________________________________________________________________________________________
DIVSS           DIVide Scalar Single                                        SSE Instruction

Description:
    Divides the low single value from operand1 by the low single value in operand2.

Flags:
    oszapc
    ------

Special Exceptions:
    Overflow, Underflow, Invalid, Divide-By-Zero, Precision, Denormal

Code:
    operand1.F32 /= operand2.F32;

Sample:

___________________________________________________________________________________________________
EMMS            Empty MMx State                                             MMX Instruction

Description:
    From [3]:
    This instruction sets the values of all the tags in the FPU tag word to empty (all ones). This
    operation marks the MMX technology registers as available, so they can subsequently be used by
    floating-point instructions. All other MMX instructions (other than the EMMS instruction) set
    all the tags in FPU tag word to valid (all zeroes).

    The EMMS instruction must be used to clear the MMX technology state at the end of all MMX
    technology routines and before calling other procedures or subroutines that may execute
    floating-point instructions. If a floating-point instruction loads one of the registers in the
    FPU register stack before the FPU tag word has been reset by the EMMS instruction, a floating
    point stack overflow can occur that will result in a floating-point exception or incorrect
    result.

Flags:
    oszapc
    ------

Code:
    FPUTag = 0xFFFF;

Sample:

___________________________________________________________________________________________________
ENTER           ENTER a subroutine

Description:
    Creates a stack frame for a procedure by pushing (E)BP onto the stack, then subtracting the
    number specified in operand1 from (E)SP.
    operand2, if non-zero, specifies the nesting level: an additional number of frame pointers to
    push onto the stack, up to 31 (only the low 5 bits are read).

    This, in essence, pushes (E)BP, keeps the value of (E)SP in a temp variable, continually
    decrements (E)BP (by 2 if operand size is 16bit, 4 if it's 32bit) and pushes the value stored
    at SS:(E)BP (a frame pointer saved by an enclosing procedure) (operand2 & 31) - 1 times, then,
    if (operand2 & 31) is non-zero, pushes the temp value onto the stack (again, based on the
    operand size) and finally sets (E)BP to the temp value and subtracts operand1 from the temp
    value and stores it into (E)SP.

    Operand size only takes effect if operand2 & 31 is non-zero.

Flags:
    oszapc
    ------

Code:
    temp8 = operand2 & 0x1F;            // 0x1F = 31
    StackSize = SegmentOperandSize(SS);
    If (StackSize == 32)
    {
        StackPush32(EBP);
        temp32 = ESP;
    }
    Else
    {
        StackPush16(BP);
        temp32 = ZeroExtend(SP);
    }
    If (temp8)
    {
        // Loop one less time than temp8, copying the enclosing frame pointers
        For (q = 1; q < temp8; q++)
        {
            If (OperandSize == 32)
            {
                If (StackSize == 32)
                {
                    EBP -= 4;
                    StackPush32(SS:[EBP]);
                }
                Else
                {
                    BP -= 4;
                    StackPush32(SS:[BP]);
                }
            }
            Else
            {
                If (StackSize == 32)
                {
                    EBP -= 2;
                    StackPush16(SS:[EBP]);
                }
                Else
                {
                    BP -= 2;
                    StackPush16(SS:[BP]);
                }
            }
        }
        // Push the final frame position
        If (OperandSize == 32)
        {
            StackPush32(temp32);
        }
        Else
        {
            StackPush16(temp32);
        }
    }
    If (StackSize == 32)
    {
        EBP = temp32;
        ESP = temp32 - operand1;
    }
    Else
    {
        BP = temp32;
        SP = temp32 - operand1;
    }

Sample:

___________________________________________________________________________________________________
HLT             HaLT

Description:
    Halts the processor until an enabled interrupt, NMI or a reset causes it to resume execution.
    The saved CS:EIP/CS:IP points to the instruction after the HLT, so that after a halt ended by
    an interrupt or NMI, the program resumes on the instruction immediately following it.

    This is a privileged instruction. In protected mode and virtual-86 mode, the program must be
    running at a privilege level of 0 to use it, otherwise a general protection fault exception
    (#GP) is thrown.
Flags:
    oszapc
    ------

Code:
    EnterHaltState();

Sample:

___________________________________________________________________________________________________
IDIV            Integer DIVide

Description:
    Performs an 8bit, 16bit or 32bit signed division according to the following:

    Bits  Destination  High Val  Division            Quotient  Remainder
    8     AX           AH        AX / operand        AL        AH
    16    DX:AX        DX        DX:AX / operand     AX        DX
    32    EDX:EAX      EDX       EDX:EAX / operand   EAX       EDX

    The sign of the remainder is always the same as the sign of the dividend (AX, DX:AX or
    EDX:EAX).

    If, in any of the cases, operand is zero OR if an overflow would occur from dividing (ie, if
    the quotient does not fit in the quotient register as a signed value), a #DE (Divide Error)
    exception is thrown.

Flags:
    oszapc
    ??????

Special Exceptions:
    #DE - If overflow occurs (if the quotient does not fit in the signed byte/word/dword)
          OR the operand is 0

Code:
    // The quotient is truncated toward zero; temp holds the dividend since the high
    // register is part of it
    8bit version:
        temp = AX.S;
        If (operand == 0 || (temp / operand.S) > 0x7F || (temp / operand.S) < -0x80)
        {
            Exception(#DE);
        }
        Else
        {
            AH = temp % operand.S;
            AL = temp / operand.S;
        }

    16bit version:
        temp = DX:AX.S;
        If (operand == 0 || (temp / operand.S) > 0x7FFF || (temp / operand.S) < -0x8000)
        {
            Exception(#DE);
        }
        Else
        {
            DX = temp % operand.S;
            AX = temp / operand.S;
        }

    32bit version:
        temp = EDX:EAX.S;
        If (operand == 0 || (temp / operand.S) > 0x7FFFFFFF || (temp / operand.S) < -0x80000000)
        {
            Exception(#DE);
        }
        Else
        {
            EDX = temp % operand.S;
            EAX = temp / operand.S;
        }

Sample:

___________________________________________________________________________________________________
IMUL            Integer MULtiply

Description:
    Performs a signed integer multiplication. There are three forms of this instruction:

    o One Operand: This version uses a single operand value. The 8bit version multiplies the
      operand with AL, storing the final 16bit result into AX. The 16bit version multiplies the
      operand with AX, storing the final 32bit result into DX:AX. The 32bit version multiplies the
      operand with EAX, storing the final 64bit result into EDX:EAX.

    o Two Operands: Multiplies the two operands together, storing the result into the first
      operand.
      The first operand must be a 16bit or 32bit register, and the second operand must be a 16bit
      or 32bit register or memory value.
      NOTE: Some assemblers support a short version of the Three Operands version of IMUL where
      the first and second operand are the same, by showing this as a Two Operand instruction.
      For example: "IMUL EAX, EAX, 24" is shortened to "IMUL EAX, 24". These can easily be
      recognized since the only form of IMUL that allows immediate values is the Three Operands
      version.

    o Three Operands: Multiplies the second and third operands together, storing the result into
      the first operand. The first operand must be a 16bit or 32bit register, the second operand
      must be a 16bit or 32bit register or memory value, and the last operand must be an immediate
      value. The immediate value may be represented in the opcode as an 8bit value, in which case
      it is sign extended to match the size of the other operands.
      NOTE: See the NOTE under Two Operands above about how this form may be shown in an assembler
      as a Two Operands version.

    In all three forms, the processor calculates the result using double the bits passed in. If
    the full result differs from the sign extension of its lower half (the result cannot fit into
    the lower half as a signed value), the carry and overflow flags are set. Both flags are
    cleared otherwise.

    Only the One Operand form stores the full double bit-depth value. The Two Operands and Three
    Operands versions only store the lower half.

    Note that the Two Operands and Three Operands versions may be used for unsigned multiplication
    as well due to the fact that the upper half is not used, though the carry and overflow flags
    are not reliable for this use.

Flags:
    oszapc
    *????*
    c - Set if the result is larger than the base operand size
    o - Set if the result is larger than the base operand size

Code:
    // KERBLUH

Sample:

___________________________________________________________________________________________________
POP             POP from the stack

Description:
    Pops a value from the stack.
    Essentially, this copies the memory value (32 or 16 bits at a time) that SS:(E)SP is pointing
    to, to the operand specified, then adds the size of the value popped (4 or 2 bytes
    respectively) to (E)SP.

    In 32bit mode, popping into a segment register still removes a full 32 bits from the stack;
    only the low 16 bits are loaded into the segment register. CS cannot be modified by POP. NULL
    may be popped into a segment register without a protection fault, but trying to use that
    segment afterwards causes faults as normal.

    KERBLUH: Translate this from [3]:
    "If the ESP register is used as a base register for addressing a destination operand in
    memory, the POP instruction computes the effective address of the operand after it increments
    the ESP register. For the case of a 16-bit stack where ESP wraps to 0h as a result of the POP
    instruction, the resulting location of the memory write is processor-family-specific.

    "The POP ESP instruction increments the stack pointer (ESP) before data at the old top of
    stack is written into the destination.

    "A POP SS instruction inhibits all interrupts, including the NMI interrupt, until after
    execution of the next instruction. This action allows sequential execution of POP SS and MOV
    ESP, EBP instructions without the danger of having an invalid stack during an interrupt.
    However, use of the LSS instruction is the preferred method of loading the SS and ESP
    registers."

Flags:
    oszapc
    ------

Special Exceptions:
    If we are popping a value into SS, then:
    #GP(0) will occur if the value popped is a NULL selector

Code:
    If (OperandSize == 32)
    {
        operand = StackPop32();
    }
    Else
    {
        operand = StackPop16();
    }

Sample:

___________________________________________________________________________________________________
POPA            POP All general registers
 aka
POPAD           POP All Dword general registers

Description:
    Pops all general registers off of the stack in the following order: (E)DI, (E)SI, (E)BP, (E)SP
    [this is actually skipped and ignored], (E)BX, (E)DX, (E)CX, (E)AX.
Some assemblers will use POPA for a 16bit pop and POPAD for a 32bit pop.

Flags:
    oszapc
    ------

Code:
    If (OperandSize == 16) {
        DI = StackPop16();
        SI = StackPop16();
        BP = StackPop16();
        // Skip the stored SP value
        If (SegmentOperandSize(SS) == 16) {
            SP += 2;
        } Else {
            ESP += 2;
        }
        BX = StackPop16();
        DX = StackPop16();
        CX = StackPop16();
        AX = StackPop16();
    } Else {
        EDI = StackPop32();
        ESI = StackPop32();
        EBP = StackPop32();
        // Skip the stored ESP value
        If (SegmentOperandSize(SS) == 16) {
            SP += 4;
        } Else {
            ESP += 4;
        }
        EBX = StackPop32();
        EDX = StackPop32();
        ECX = StackPop32();
        EAX = StackPop32();
    }

Sample:

___________________________________________________________________________________________________
POPAD       POP All Dword general registers
    see POPA

___________________________________________________________________________________________________
POPF        POP Flags
    aka POPFD   POP Flags Dword

Description:
Pops either the low 16 bits of the flags register if the operand size is 16bit, or the full 32bit flags register if the operand size is 32bit. Reserved bits are never modified.
When operating in real mode or in protected mode at privilege level 0, every flag is loaded from the popped value except VIP, VIF and VM (VIP and VIF are cleared and VM is left unchanged).
When operating in protected mode above privilege level 0 but at a level less than or equal to IOPL, IOPL is also not modified.
When operating in protected mode and the privilege level is greater than IOPL, IF is also not modified.
When operating in Virtual86 mode, IOPL must be 3, and RF is also not modified (though IF is, since IOPL would match the privilege level of V86 mode). If IOPL is not 3, a general-protection exception occurs (#GP).

Flags:
    oszapc
    ******
    All flags are set to the flags value popped off of the stack except those listed above.
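
The privilege rules in the POPF description can be sketched in Python. This is only an illustration of those rules, not the processor's exact algorithm: the function name popf, its parameters and the GeneralProtectionFault class are hypothetical, real mode is modelled as privilege level 0, and the Virtual86 branch assumes that VM, VIF and VIP are left untouched there, which the description does not state outright.

```python
# Hypothetical sketch of the POPF flag-update rules described above.
CF, PF, AF, ZF, SF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7
TF, IF, DF, OF = 1 << 8, 1 << 9, 1 << 10, 1 << 11
IOPL_MASK = 3 << 12
NT, RF, VM, AC = 1 << 14, 1 << 16, 1 << 17, 1 << 18
VIF, VIP, ID = 1 << 19, 1 << 20, 1 << 21

# Every non-reserved flag except VM, VIF and VIP.
LOADABLE = (CF | PF | AF | ZF | SF | TF | IF | DF | OF |
            IOPL_MASK | NT | RF | AC | ID)


class GeneralProtectionFault(Exception):
    """Stands in for #GP in this sketch."""


def popf(old_flags, popped, operand_size=32, cpl=0, v86=False):
    """Return the new EFLAGS value after POPF/POPFD.

    Real mode is modelled as cpl=0 with v86=False.
    """
    iopl = (old_flags & IOPL_MASK) >> 12
    if v86:
        if iopl < 3:
            raise GeneralProtectionFault("POPF in V86 mode with IOPL < 3")
        writable = LOADABLE & ~(IOPL_MASK | RF)
        cleared = 0  # assumption: VIF/VIP are left alone in V86 mode
    else:
        writable = LOADABLE
        cleared = VIF | VIP
        if cpl > 0:
            writable &= ~IOPL_MASK   # only privilege level 0 may change IOPL
        if cpl > iopl:
            writable &= ~IF          # changing IF requires CPL <= IOPL
    if operand_size == 16:
        writable &= 0xFFFF           # the upper half is not touched at all
        cleared = 0
    new_flags = (old_flags & ~writable) | (popped & writable)
    return new_flags & ~cleared
```

For example, popf(0x202, 0, cpl=3) leaves IF set, because privilege level 3 is greater than an IOPL of 0.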

Special Exceptions:
    #GP - If this instruction is used in Virtual86 mode and IOPL is less than 3

Code:
    If (OperandSize == 16) {
        temp16 = StackPop16();
    } Else {
        temp32 = StackPop32();
    }
    If (InV86Mode) {
        If (IOPL < 3) {
            Exception(GP);
        } Else {
            // KERBLUH
        }
    } Else {
        // KERBLUH
    }

Sample:

___________________________________________________________________________________________________
PREFETCH*   PREFETCH
    PREFETCHT0  - PREFETCH Temporal to 0th cache level and above (all cache levels)
    PREFETCHT1  - PREFETCH Temporal to 1st cache level and above (all but 0th level)
    PREFETCHT2  - PREFETCH Temporal to 2nd cache level and above (all but 0th and 1st level)
    PREFETCHNTA - PREFETCH Non-Temporal to All cache levels

Description:
This instruction is merely a "hint" that tells the processor which data it should try to prefetch into the specified cache levels. The amount of data, and even whether the data is prefetched at all, is processor implementation-dependent. If a prefetch is done at all, it will be a minimum of 32 bytes.
Prefetches to uncacheable or WC memory (UC or WCF) will be ignored.
The only exception these instructions may cause is for break points.
NOTE: Some documentation calls this an SSE instruction, but it is not really a SIMD instruction.

Flags:
    oszapc
    ------

Code:
    // Prefetches the information, not exactly something we can show a sample for without adding
    // cache level support to our code samples, which really wouldn't be very useful.
    // Since the instruction specifies that often a prefetch doesn't even occur, it almost makes
    // this instruction completely worthless since it is unreliable.

Sample:

___________________________________________________________________________________________________
PUSH        PUSH onto the stack

Description:
This instruction pushes a single value onto the stack. If the operand size is 16bit, the stack pointer is decremented by 2 and then the 16bit value is stored at that location.
If the operand size is 32bit, the stack pointer is decremented by 4 and then the 32bit value is stored at that location. If the address size of the SS segment/selector is 32bit, ESP is used as the stack pointer, otherwise SP is used.
Note that pushing a 16bit value is not recommended when operating in 32bit mode because it will cause the stack to be misaligned.
If an 8bit immediate is pushed onto the stack, it is sign extended to the desired operand size.
When pushing (E)SP onto the stack, the value pushed is the value of (E)SP BEFORE the decrement, EXCEPT on processors before the 286 (8088, 8086, 80186, etc), where the value pushed is SP AFTER the decrement.
In real mode, if (E)SP is 1 when a push is executed, the processor shuts down due to a lack of stack space. KERBLUH - Does it lock up? Or reboot? Or... what?

Flags:
    oszapc
    ------

Code:
    PUSH r/m16/32 | PUSH r16/32 | PUSH imm16/32:
        If (OperandSize == 16) {
            StackPush16(operand);
        } Else {
            StackPush32(operand);
        }
    PUSH imm8:
        If (OperandSize == 16) {
            temp = TypeCast(operand, S16);
            StackPush16(temp);
        } Else {
            temp = TypeCast(operand, S32);
            StackPush32(temp);
        }
    PUSH seg:
        If (OperandSize == 16) {
            StackPush16(operand);
        } Else {
            // The stack grows downward, so the upper (zero) word is pushed first;
            // this leaves the selector in the low 16 bits of the 32bit slot.
            StackPush16(0);
            StackPush16(operand);
        }
    PUSH (E)SP:
        If (CPU >= 286) {
            If (OperandSize == 16 || SegmentOperandSize(SS) == 16) {
                StackPush16(SP);
            } Else {
                StackPush32(ESP);
            }
        } Else {
            StackPush16(SP - 2);
        }

Sample:

___________________________________________________________________________________________________
WAIT        WAIT for floating point exceptions
    see FWAIT under Floating Point Instructions

___________________________________________________________________________________________________
xxx         The XxX instruction

Description:

Flags:
    oszapc
    ------

Code:

Sample:

***************************************************************************************************
Floating Point Instructions
___________________________________________________________________________________________________
F2XM1       Floating point 2 to the X
Minus 1     x87 Instruction

Description:
This calculates 2 raised to the power of ST(0), minus one. ST(0) must be between -1.0 and 1.0 before calling this instruction, or the result is undefined. The result is stored in ST(0).

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred
         If #P is generated: 0 = not roundup, 1 = roundup

Code:
    ST(0) = Exponent(2, ST(0)) - 1;

Sample:

___________________________________________________________________________________________________
FABS        Floating point ABSolute value       x87 Instruction

Description:
Clears the sign bit of the floating point value in ST(0), converting it to its absolute value.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred
         [3] is confusing and says "Set to 0 if stack underflow occurred; otherwise cleared to 0"

Code:
    // The sign bit is the most significant bit (bit 79) of the 80bit extended real
    ST(0) &= ~FPSignBit;

Sample:

___________________________________________________________________________________________________
FADD        Floating point ADD                  x87 Instruction
    FADDP - Floating point ADD then Pop
    FIADD - Floating point Integer ADD

Description:
Adds two values together. The FADDP instructions also pop ST(0) off the stack after the operation. The FIADD instructions convert the integer operand into an extended real before performing the addition; an integer of 0 becomes +0 in floating point.
When no operands are specified, FADD/FADDP ST(1), ST(0) is assumed. If one operand is specified, ST(0) is the destination and the specified operand is the source.
When the sum of two operands with opposite signs is 0, the result is +0, unless rounding towards -infinity is used, in which case the result is -0.
Anything plus +/-infinity is +/-infinity, except when two infinity values with opposite signs are added together, which causes a #IA exception.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred
         If #P is generated: 0 = not roundup, 1 = roundup

Code:
    FADD (one operand):
        ST(0) += TypeCast(operand, F80);
    FADD:
        operand1 += operand2;
    FADDP:
        operand1 += operand2;
        FPStackPop();
    FIADD:
        ST(0) += TypeCast(operand, F80);

Sample:

___________________________________________________________________________________________________
FADDP       Floating point ADD then Pop         x87 Instruction
    See FADD

___________________________________________________________________________________________________
FBLD        Floating point Binary coded decimal LoaD    x87 Instruction

Description:
Converts the BCD operand to extended real, pushing it onto the floating point stack. The sign of the operand is preserved (including for -0).
Invalid digits (nibbles 0x0A to 0x0F) are not checked for; loading a value that contains them gives an undefined result.
The source is an 80bit packed BCD value: 18 decimal digits stored two per byte in bytes 0 to 8 (least significant digits first), with the sign in bit 7 of byte 9. The remaining bits of byte 9 are unused.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Set to 1 if stack overflow occurred, otherwise cleared to 0

Code:
    FPStackPush(ConvertBCD80ToF80(operand));

Sample:

___________________________________________________________________________________________________
FBSTP       Floating point Binary coded decimal STore and Pop   x87 Instruction

Description:
Converts the value in ST(0) to an 18-digit BCD integer, stores the result into the operand and pops the floating point stack. If ST(0) is not integral, it is rounded to an integer according to the RC field of the FPU control word.
If ST(0) is +/-infinity, a NaN, or too large to fit into an 18-digit BCD value, then a #IA exception is generated.
The destination is written in the 80bit packed BCD format that FBLD reads: 18 digits (9 bytes) plus a sign bit in the top bit of the tenth byte.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred
         If #P is generated: 0 = not roundup, 1 = roundup

Code:
    operand = ConvertF80ToBCD80(ST(0));
    FPStackPop();

Sample:

___________________________________________________________________________________________________
FCHS        Floating point CHange Sign          x87 Instruction

Description:
Changes the sign of the floating point value in ST(0). This has no effect on NaN values.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred
         If #P is generated: 0 = not roundup, 1 = roundup

Code:
    ST(0) ^= FPSignBit;

Sample:

___________________________________________________________________________________________________
FCLEX       Floating point CLear EXception flags    x87 Instruction
    FNCLEX - Floating point No wait for exceptions before CLearing EXception flags

Description:
This instruction clears the following floating point flags: PE, UE, OE, ZE, DE, IE, ES, SF and B. FCLEX allows unmasked exceptions to be handled before execution, FNCLEX does not.
Note: On the Pentium and 486 processors, there is a slight chance that an interrupt may occur on an FNCLEX instruction, prior to it being executed, to handle a pending exception. On the Pentium Pro and above, this problem does not exist.
Note: FCLEX is actually an FNCLEX with a FWAIT before it.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  ?  ?
    PE UE OE ZE DE IE ES SF B
    0  0  0  0  0  0  0  0  0

Code:
    // Clears bits 0-7 (IE, DE, ZE, OE, UE, PE, SF, ES) and bit 15 (B)
    FPUStatus &= ~0x80FF;

Sample:

___________________________________________________________________________________________________
FCMOVcc     Floating point Conditional MOVe     x87 Instruction
    FCMOVB   - Floating point Conditional MOVe if Below
    FCMOVBE  - Floating point Conditional MOVe if Below or Equal
    FCMOVE   - Floating point Conditional MOVe if Equal
    FCMOVNB  - Floating point Conditional MOVe if Not Below
    FCMOVNBE - Floating point Conditional MOVe if Not Below or Equal
    FCMOVNE  - Floating point Conditional MOVe if Not Equal
    FCMOVNU  - Floating point Conditional MOVe if Not Unordered (parity clear)
    FCMOVU   - Floating point Conditional MOVe if Unordered (parity set)

Description:
Tests whether the specified condition is true in EFLAGS. If it is, then operand2 is copied into operand1.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred

Code:
    FCMOVB:
        If (c)
            operand1 = operand2;

Sample:

___________________________________________________________________________________________________
FCOM        Floating point COMpare              x87 Instruction
    FCOMP   - Floating point COMpare then Pop
    FCOMPP  - Floating point COMpare then Pop then Pop again
    FICOM   - Floating point Integer COMpare
    FICOMP  - Floating point Integer COMpare then Pop
    FUCOM   - Floating point Unordered COMpare
    FUCOMP  - Floating point Unordered COMpare then Pop
    FUCOMPP - Floating point Unordered COMpare then Pop then Pop again

Description:
Compares the operand with ST(0). If no operand is given, then ST(1) is used. The C0, C2 and C3 flags in the floating point status register are set according to the result of the comparison. The sign of zero is ignored: -0.0 and +0.0 are treated as identical.
The FICOMx instructions first convert the operand to an extended floating point value; 0 becomes +0.0.
If either value is NaN or an unsupported value, #IA is raised and the flags are not modified. If #IA is masked, then all three flags (C0, C2, C3) are set to 1 for unordered.
In the FUCOMx instructions, #IA is only raised if either value is unsupported or an sNaN. qNaN values set the flags to 1 for unordered in FUCOMx.
FCOMP, FICOMP and FUCOMP pop the stack once after the comparison.
FCOMPP and FUCOMPP pop the stack twice after the comparison.
[3] shows that the exception is still raised even if it is masked. Is this always the case???
[3] shows that unordered is set for FICOM regardless of #IA being masked

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    *  *  *  *
    C0 - Set if ST(0) < operand OR values are unordered and #IA is masked
    C1 - Cleared if stack underflow occurred
    C2 - Set if values are unordered and #IA is masked
    C3 - Set if ST(0) = operand OR values are unordered and #IA is masked
    FUCOMx also sets C0, C2 and C3 if either value is a qNaN but neither is unsupported or an sNaN.

Code:
    FCOM:
        If (FPIsInvalidOrNaN(ST(0)) || FPIsInvalidOrNaN(operand)) {
            Exception(#IA);
            If (#IA Is Masked)
                C0 = C2 = C3 = 1;
        } Else {
            C2 = 0;
            C0 = ST(0) < operand;
            C3 = ST(0) == operand; // Doesn't test sign if they're both zero
        }
    // FCOMP, FCOMPP add:
        FPStackPop();
    // FCOMPP add:
        FPStackPop();

Sample:

___________________________________________________________________________________________________
FCOMI       Floating point COMpare and set Integer flags    x87 Instruction
    FCOMIP  - Floating point COMpare and set Integer flags then Pop
    FUCOMI  - Floating point Unordered COMpare and set Integer flags
    FUCOMIP - Floating point Unordered COMpare and set Integer flags then Pop

Description:
Compares operand2 (always ST(i)) to operand1 (always ST or ST(0), same thing) and sets EFLAGS according to the result. Carry is set if operand1 is less than operand2. Zero is set if both operands are equal (-0.0 and +0.0 are considered equal).
If either value is NaN or an unsupported value, #IA is raised and the flags are not modified. If #IA is masked, then all three flags (Carry, Parity and Zero) are set to 1 for unordered.
In the FUCOMIx instructions, #IA is only raised if either value is unsupported or an sNaN. qNaN values set the flags to 1 for unordered in FUCOMIx.
FCOMIP and FUCOMIP pop ST(0) off the floating point stack after the comparison.
[3] differs in how the flags are set if #IA occurs. At one point, it says, "Flags are set regardless, whether there is an unmasked invalid-arithmetic-operand (#IA) exception generated or not." In another place, it says, "If invalid operation exception is unmasked, the status flags are not set if the invalid-arithmetic-operand exception is generated." And in its code example, it only sets the flags if #IA is masked. Therefore, since there are two cases where the flags ARE NOT set on #IA and one where they ARE, we are assuming they are not.
[3] shows that the exception is still raised even if it is masked. Is this always the case???

Flags:
    oszapc
    --*-**
    z - Set if ST(0) = operand OR values are unordered and #IA is masked
    p - Set if values are unordered and #IA is masked
    c - Set if ST(0) < operand OR values are unordered and #IA is masked
    FUCOMIx also sets c, p and z if either value is a qNaN but neither is unsupported or an sNaN.
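
The flag rules above can be tried out with a small Python sketch. It is an illustration only: the function name fcomi_flags and the ia_masked parameter are hypothetical, Python floats cannot tell a qNaN from an sNaN (so every NaN is treated the way FCOMI treats it), and it follows this document's assumption that the flags are left alone when #IA is unmasked.

```python
import math


def fcomi_flags(st0, sti, ia_masked=True):
    """Return (z, p, c) for an FCOMI-style compare of ST(0) with ST(i).

    Returns None when the operands are unordered and #IA is unmasked,
    meaning the flags are left unmodified (this document's assumption).
    """
    if math.isnan(st0) or math.isnan(sti):
        # Unordered: all three flags are set, but only if #IA is masked.
        return (1, 1, 1) if ia_masked else None
    z = int(st0 == sti)   # -0.0 == +0.0 is True, matching the description
    c = int(st0 < sti)
    return (z, 0, c)
```

A following JB, JE or JP could then branch on c, z or p respectively, which is the main convenience of FCOMI over FCOM.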

FPU Flags:
    C3 C2 C1 C0
    -  -  *  -
    C1 - Cleared if stack underflow occurred

Code:
    FCOMI:
        If (FPIsInvalidOrNaN(ST(0)) || FPIsInvalidOrNaN(operand)) {
            Exception(#IA);
            If (#IA Is Masked)
                p = c = z = 1;
        } Else {
            p = 0;
            c = ST(0) < operand;
            z = ST(0) == operand; // Doesn't test sign if they're both zero
        }
    // FCOMIP and FUCOMIP add:
        FPStackPop();

Sample:

___________________________________________________________________________________________________
FCOMIP      Floating point COMpare and set Integer flags then Pop   x87 Instruction
    See FCOMI

___________________________________________________________________________________________________
FCOMP       Floating point COMpare then Pop     x87 Instruction
    See FCOM

___________________________________________________________________________________________________
FCOMPP      Floating point COMpare then Pop then Pop again  x87 Instruction
    See FCOM

___________________________________________________________________________________________________
FCOS        Floating point COSine               x87 Instruction
                                                80387 and higher

Description:
Calculates the cosine of ST(0) (in radians) and stores the result in ST(0). ST(0) must be between -2^63 and 2^63 for a value to be generated. If ST(0) is outside of this range, C2 is set and ST(0) is unchanged. The value may be brought into range by using FPREM with a divisor of 2*pi OR by subtracting/adding a large multiple of 2*pi appropriate to the value.
If ST(0) is +/-infinity, an sNaN or an unsupported value then #IA is thrown. If ST(0) is qNaN, then nothing happens (does it count as out of range???).
Additional cycles are required by the opcode if Abs(ST(0)) > pi/4.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  *  *  ?
    C1 - Cleared if stack underflow occurred
         If #P is generated: 0 = not roundup, 1 = roundup
         If C2 was set, this value is undefined
    C2 - Set to 1 if ST(0) was outside the range -2^63 to 2^63, 0 otherwise

Code:
    If (Math.Abs(ST(0)) < Math.Exponent(2, 63)) {
        C2 = 0;
        ST(0) = Math.Cos(ST(0));
    } Else {
        C2 = 1;
    }

Sample:

___________________________________________________________________________________________________
FDECSTP     Floating point DECrement STack Pointer  x87 Instruction

Description:
This instruction decrements the floating point stack pointer. This in essence rotates the stack pointer by one.
[3] states that "The C1 flag is set to 0; otherwise, cleared to 0." Not really certain what the difference between setting C1 to 0 and clearing it to 0 is, but there is no condition to set or clear it so I can only assume it means that C1 is simply always cleared.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - 0

Code:
    FPStackPointer = (FPStackPointer - 1) & 0x07;

Sample:

___________________________________________________________________________________________________
FDISI       Floating point DISable Interrupts   x87 Instruction
                                                8087 Only
    FNDISI - Floating point No wait for exceptions DISable Interrupts

Description:
KERBLUH
80287 and above: This instruction acts just like a FNOP.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  ?  ?

Code:
    // KERBLUH

Sample:

___________________________________________________________________________________________________
FDIV        Floating point DIVide               x87 Instruction
    FDIVP  - Floating point DIVide then Pop
    FDIVR  - Floating point DIVide Reverse
    FDIVRP - Floating point DIVide Reverse then Pop
    FIDIV  - Floating point Integer DIVide
    FIDIVR - Floating point Integer DIVide Reverse

Description:
Divides one operand by another. The FDIVP/FDIVRP instructions also pop ST(0) off the stack after the operation. The FIDIV instructions convert the integer operand into an extended real before performing the division; an integer of 0 becomes +0 in floating point.
The FDIV/FDIVP/FIDIV instructions use the source as the divisor and the destination as the dividend, storing the result in the destination. The FDIVR/FDIVRP/FIDIVR instructions use the destination as the divisor and the source as the dividend, again storing the result in the destination.
When no operands are specified, FDIV/FDIVP/FDIVR/FDIVRP ST(1), ST(0) is assumed. If one operand is specified, ST(0) is the destination and the specified operand is the source.
If either value is an sNaN or an unsupported value, #IA is thrown. If both values are +/-infinity or both values are +/-0, then #IA is thrown.
If the divisor is +/-0 and the dividend is a non-zero floating point number, then #Z is generated. In this case, if #Z is masked, a result of +/-infinity is stored in the destination (with the sign appropriate to the result).
KERBLUH - The original Pentium had a bug in this instruction. What exactly did that do?
[3] shows that dividing an infinite value by 0 does not throw an exception but instead just modifies the sign, if appropriate. Is this actually the case ???

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred
         If #P is generated: 0 = not roundup, 1 = roundup

Floating Point Exceptions:
    #IA - If either operand is sNaN or unsupported, if both operands are +/-0, or if both operands are +/-infinity
    #Z  - If the divisor is +/-0 and the dividend is NOT +/-0

Code:
    If (Instruction == FDIV or FDIVP or FIDIV) {
        If (Instruction == FDIV, memory operand)
            divisor = TypeCast(operand2, F80);
        Else If (Instruction == FIDIV)
            divisor = TypeCast(operand2, F80);
        Else
            divisor = operand2;
        dividend = operand1;
    } Else {
        If (Instruction == FDIVR, memory operand)
            dividend = TypeCast(operand2, F80);
        Else If (Instruction == FIDIVR)
            dividend = TypeCast(operand2, F80);
        Else
            dividend = operand2;
        divisor = operand1;
    }
    If (divisor == 0) {
        If (dividend == 0) {
            Exception(#IA);
        } Else {
            Exception(#Z);
            If (IsMasked(#Z))
                operand1 = (appropriate sign) infinity;
        }
    } Else If ((FPIsInvalidOrsNaN(dividend) || FPIsInvalidOrsNaN(divisor)) ||
               (FPIsInfinity(dividend) && FPIsInfinity(divisor))) {
        Exception(#IA);
    } Else If (FPIsNaN(dividend) || FPIsNaN(divisor)) {
        operand1 = qNaN;
    } Else If (FPIsInfinity(dividend)) {
        operand1 = (appropriate sign) infinity;
    } Else If (FPIsInfinity(divisor)) {
        // A finite value divided by infinity is zero, not infinity
        operand1 = (appropriate sign) 0;
    } Else {
        operand1 = dividend / divisor;
    }
    If (Instruction == FDIVP or FDIVRP)
        FPStackPop();

Sample:

___________________________________________________________________________________________________
FDIVP       Floating point DIVide then Pop      x87 Instruction
    See FDIV

___________________________________________________________________________________________________
FDIVR       Floating point DIVide Reverse       x87 Instruction
    See FDIV

___________________________________________________________________________________________________
FDIVRP      Floating point DIVide Reverse then Pop  x87 Instruction
    See FDIV

___________________________________________________________________________________________________
FENI        Floating point ENable Interrupts    x87 Instruction
                                                8087 Only
    FNENI - Floating point No wait for
            exceptions ENable Interrupts

Description:
KERBLUH
80287 and above: This instruction acts just like a FNOP.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  ?  ?

Code:
    // KERBLUH

Sample:

___________________________________________________________________________________________________
FFREE       Floating point FREE register        x87 Instruction

Description:
Sets the tag in the FPU tag register associated with the specified floating point register to free (11b). The contents of the register and the floating point stack are unaffected.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  ?  ?

Code:
    FPSetTagFree(operand);

Sample:

___________________________________________________________________________________________________
FIADD       Floating point Integer ADD          x87 Instruction
    See FADD

___________________________________________________________________________________________________
FICOM       Floating point Integer COMpare      x87 Instruction
    See FCOM

___________________________________________________________________________________________________
FICOMP      Floating point Integer COMpare then Pop     x87 Instruction
    See FCOM

___________________________________________________________________________________________________
FIDIV       Floating point Integer DIVide       x87 Instruction
    See FDIV

___________________________________________________________________________________________________
FIDIVR      Floating point Integer DIVide Reverse   x87 Instruction
    See FDIV

___________________________________________________________________________________________________
FILD        Floating point Integer LoaD         x87 Instruction

Description:
Converts the signed integer operand to an extended floating point value then pushes it onto the floating point stack. A value of 0 becomes +0.0.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Set to 1 if stack overflow, 0 otherwise

Code:
    FPStackPush(TypeCast(operand, F80));

Sample:

___________________________________________________________________________________________________
FIMUL       Floating point Integer MULtiply     x87 Instruction
    See FMUL

___________________________________________________________________________________________________
FINCSTP     Floating point INCrement STack Pointer  x87 Instruction

Description:
This instruction increments the floating point stack pointer without first marking ST(0) as unused. This in essence rotates the stack pointer by one.
[3] states that "The C1 flag is set to 0; otherwise, cleared to 0." Not really certain what the difference between setting C1 to 0 and clearing it to 0 is, but there is no condition to set or clear it so I can only assume it means that C1 is simply always cleared.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  0  ?

Code:
    FPStackPointer = (FPStackPointer + 1) & 0x07;

Sample:

___________________________________________________________________________________________________
FINIT       Floating point INITialize           x87 Instruction
    FNINIT - Floating point No wait for exceptions before INITialize

Description:
Resets the FPU control, status, tag, instruction pointer and data pointer registers to their default states. The FPU control word becomes 037Fh (round to nearest, all exceptions masked, 64bit precision). The status word is cleared (no exception flags, TOP set to 0). Instruction and data pointers are cleared. Data registers are not affected but they're all tagged as empty (11b).
FINIT checks for and handles any pending floating point exceptions before executing whereas FNINIT does not.
On the x387, these instructions do not clear the instruction and data pointers (KERBLUH - What about pre x387?)
There are unusual circumstances where FNINIT would handle exceptions on a Pentium or 486 in "MS-DOS Compatibility Mode" (KERBLUH - What does this mean??? in real mode?), according to [3].
Note: FINIT is actually an FNINIT with a FWAIT before it.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    0  0  0  0

Code:
    // direct from [3]:
    FPUControl = 0x037F;
    FPUStatus = 0;
    FPUTag = 0xFFFF;
    FPUDataPointer = 0;
    FPUInstructionPointer = 0;
    FPULastInstructionOpcode = 0;

Sample:

___________________________________________________________________________________________________
FIST        Floating point Integer STore        x87 Instruction
    FISTP - Floating point Integer STore then Pop

Description:
Converts ST(0) to a signed integer and stores the result into the operand. Rounding is done according to the RC field of the FPU control word.
If ST(0) is too large, +/-infinity, NaN, or an unsupported value, #IA is thrown. If #IA is not masked, then the destination is not modified. If #IA is masked, then "integer indefinite" is stored (0x8000 for 16bit, 0x80000000 for 32bit, etc).
The FISTP instruction pops ST(0) off of the stack after execution.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Set to 0 if stack underflow occurred.
         Indicates rounding direction if #P occurs: 0 = not roundup, 1 = roundup
         [3] says "Cleared to 0 otherwise", which doesn't make sense here

Code:
    // 16bit version
    If (FPIsInvalidOrNaN(ST(0)) || FPIsInfinity(ST(0)) || FPIsTooLarge(ST(0), S16)) {
        Exception(#IA);
        If (FP_IAIsMasked())
            operand = 0x8000;
    } Else {
        operand = FPConvertF80ToS16(ST(0));
    }
    // FISTP adds:
    FPStackPop();

Sample:

___________________________________________________________________________________________________
FISTP       Floating point Integer STore then Pop   x87 Instruction
    See FIST

___________________________________________________________________________________________________
FISUB       Floating point Integer SUBtract     x87 Instruction
    See FSUB

___________________________________________________________________________________________________
FISUBR      Floating point Integer SUBtract Reverse     x87 Instruction
    See FSUB

___________________________________________________________________________________________________
FLD         Floating point LoaD                 x87 Instruction

Description:
Pushes the operand onto the floating point stack, essentially loading ST(0) after bumping all ST(x) registers up an index. If the operand is an F32 or F64, then it is first converted to an extended real.
#IA is thrown if the source is an sNaN or an unsupported format.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Set to 1 if stack overflow occurred, otherwise cleared to 0.

Code:
    FPStackPush(operand);

Sample:

___________________________________________________________________________________________________
FLD*        Floating point LoaD constant        x87 Instruction
    FLD1   - Floating point LoaD 1
    FLDL2E - Floating point LoaD Log2 of e
    FLDL2T - Floating point LoaD Log2 of Ten
    FLDLG2 - Floating point LoaD Log10 of 2
    FLDLN2 - Floating point LoaD Loge of 2
    FLDPI  - Floating point LoaD PI
    FLDZ   - Floating point LoaD Zero

Description:
Pushes the specified constant onto the floating point stack, essentially loading ST(0) after bumping all ST(x) registers up an index.
Each constant is stored on the processor as a 66bit constant which is rounded (as specified by RC in the FPU control word) to extended real format. #P is never generated from the rounding.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Set to 1 if stack overflow occurred, otherwise cleared to 0.

Code:
    FPStackPush(constant);

Sample:

___________________________________________________________________________________________________
FLDCW       Floating point LoaD Control Word    x87 Instruction

Description:
Sets the FPU control word to the operand.
If one or more exception flags are set in the FPU status word before this instruction, and the new control word unmasks any of those exceptions, they will be generated on the next floating point instruction (except those that do not wait for exceptions). To avoid this situation, use FCLEX or FNCLEX before calling this instruction.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  ?  ?

Code:
    FPUControl = operand;

Sample:

___________________________________________________________________________________________________
FLDENV      Floating point LoaD ENVironment     x87 Instruction

Description:
This instruction restores a floating point environment stored previously with the FSTENV or FNSTENV instruction. Depending on operand size, either 14 bytes (16bit operand size) or 28 bytes (32bit operand size) are loaded.
There are a total of four different formats in which the data may be loaded: 14 and 28 byte versions for real mode and 14 and 28 byte versions for protected mode. Virtual-86 Mode and System Management Mode (SMM) both use the real mode versions. See "FPU Stored Environment" below under "Data Formats" for full details on how the data is stored.
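
As a rough illustration of the two protected mode layouts, the following Python sketch unpacks a 14 or 28 byte environment image. The field order is an assumption taken from Intel's documented protected mode format (control, status and tag words first, then the instruction pointer and the operand pointer), not from the "Data Formats" section of this document, and the names FpuEnv and parse_fpu_env_protected are hypothetical. The real mode layouts pack the pointers differently and are not covered.

```python
import struct
from collections import namedtuple

# Hypothetical container for the decoded fields.
FpuEnv = namedtuple(
    "FpuEnv",
    "control status tag ip_offset ip_selector opcode dp_offset dp_selector",
)


def parse_fpu_env_protected(data):
    """Decode a protected mode FSTENV image (assumed layout, see above)."""
    if len(data) == 14:
        # Seven little-endian 16bit words; no opcode field in this layout.
        cw, sw, tw, ip, cs, dp, ds = struct.unpack("<7H", data)
        return FpuEnv(cw, sw, tw, ip, cs, None, dp, ds)
    if len(data) == 28:
        # Each 16bit register sits in the low half of a dword; the opcode
        # shares a dword with the code segment selector.
        cw, _, sw, _, tw, _, ip, cs, op, dp, ds, _ = struct.unpack(
            "<6HIHHIHH", data
        )
        return FpuEnv(cw, sw, tw, ip, cs, op & 0x07FF, dp, ds)
    raise ValueError("expected a 14 or 28 byte environment image")
```

Keeping the size check explicit mirrors the description: the operand size alone decides how many bytes FLDENV reads.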

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    *  *  *  *
    The flags are loaded from the stored environment

Code:
    // KERBLUH

Sample:

___________________________________________________________________________________________________
FMUL        Floating point MULtiply             x87 Instruction
    FMULP - Floating point MULtiply then Pop
    FIMUL - Floating point Integer MULtiply

Description:
Multiplies two values together, storing the result in the destination. The FMULP instruction also pops ST(0) off the stack after the operation. The FIMUL instruction converts the integer operand into an extended real before performing the multiplication; an integer of 0 becomes +0 in floating point.
When no operands are specified, FMUL/FMULP ST(1), ST(0) is assumed. If one operand is specified, ST(0) is the destination and the specified operand is the source.
The sign of the result is always the exclusive OR (XOR) of the two sign bits, even if one or more of the values used is 0 or infinity.
Multiplying any value with a qNaN value causes a qNaN result. If either value is an sNaN, the #IA exception is thrown.
If one value is +/-0 and the other is +/-infinity, the #IA exception is thrown.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
C1 - Cleared if stack underflow occurred. If #P is generated: 0 = not roundup, 1 = roundup Floating Point Exceptions: #IA - If either operand is sNaN or unsupported, or if one operand is +/- 0 and the other is +/-infinity
Code:
  If (Instruction == FMUL, memory operand) sourceval = TypeCast(operand2, F80);
  Else If (Instruction == FIMUL) sourceval = TypeCast(operand2, F80);
  Else sourceval = operand2;
  result = operand1;
  If ((FPIsInvalidOrsNaN(result) || FPIsInvalidOrsNaN(sourceval)) || (FPIsInfinity(result) && FPIsZero(sourceval)) || (FPIsZero(result) && FPIsInfinity(sourceval))) {
    Exception(#IA);
  } Else If (FPIsNaN(result) || FPIsNaN(sourceval)) {
    operand1 = qNaN;
  } Else If (FPIsInfinity(result) || FPIsInfinity(sourceval)) {
    operand1 = (appropriate sign) infinity;
  } Else {
    operand1 = result * sourceval;
  }
  If (Instruction == FMULP) FPStackPop();
Sample: ___________________________________________________________________________________________________ FMULP Floating point MULtiply then Pop x87 Instruction See FMUL ___________________________________________________________________________________________________ FNOP Floating point No OPeration x87 Instruction Description: Performs no operation. [3] says that all of the FPU flags are undefined after this instruction. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? ? ?
Code: // Do nothing Sample: ___________________________________________________________________________________________________ FNCLEX Floating point No wait for exceptions before CLearing EXception flags x87 Instruction See FCLEX ___________________________________________________________________________________________________ FNDISI Floating point No wait for exceptions DISable Interrupts x87 Instruction 8087 Only See FDISI ___________________________________________________________________________________________________ FNENI Floating point No wait for exceptions ENable Interrupts x87 Instruction 8087 Only See FENI ___________________________________________________________________________________________________ FNINIT Floating point No wait for exceptions before INITialize x87 Instruction See FINIT ___________________________________________________________________________________________________ FNSAVE Floating point No wait for exceptions before SAVE fpu state x87 Instruction See FSAVE ___________________________________________________________________________________________________ FNSTCW Floating point No wait for exceptions STore Control Word x87 Instruction See FSTCW ___________________________________________________________________________________________________ FNSTENV Floating point No wait for exceptions STore fpu ENVironment x87 Instruction See FSTENV ___________________________________________________________________________________________________ FNSTSW Floating point No exception before STore Status Word x87 Instruction See FSTSW ___________________________________________________________________________________________________ FPATAN Floating point Partial ArcTANgent x87 Instruction Description: Calculates a partial arctangent. This is done by dividing ST(1) by ST(0), taking the arctangent of the result, then storing that final result into ST(1). Lastly, ST(0) is popped off of the stack.
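The operation just described behaves like the two-argument arctangent found in most math libraries. A minimal Python sketch, assuming math.atan2 as a stand-in for the FPU and a plain list as the register stack (index 0 is ST(0)); the name fpatan is only for this example, and exceptions, flags and rounding control are not modelled:

```python
import math

def fpatan(stack):
    """Model FPATAN on a list-based stack where stack[0] is ST(0)."""
    x = stack.pop(0)                    # old ST(0): the X coordinate, popped
    stack[0] = math.atan2(stack[0], x)  # old ST(1) is Y; the angle replaces it
    return stack

print(fpatan([1.0, 1.0]))    # X = 1, Y = 1: an angle of pi/4
print(fpatan([-0.0, 0.0]))   # X = -0, Y = +0: an angle of +pi
```

Because atan2 looks at the signs of both inputs instead of only their quotient, it can return angles in all four quadrants and still gives a result for 0/0 and infinity/infinity.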
The result will always have the same sign as the source ST(1), and will be between +/- 0 and +/- pi. This function is generally used to calculate the angle between the X axis and a line whose X,Y coordinates are placed into ST(0) (the abscissa or X coordinate) and ST(1) (the ordinate or Y coordinate). Since the process does not actually use a division, it is possible for 0/0 or infinity/infinity to return real results. This means that any valid value, +/- 0 to infinity, will return a meaningful result. However, a qNaN in either value will result in a qNaN. sNaN and undefined values will throw #IA as usual. This chart shows what occurs for different values. Note that, for the purpose of calculating the angle as described above, this chart returns the correct results:

                                         ST(0)
  ST(1)    -inf     -real           -0      +0      +real           +inf
  -inf     -3pi/4   -pi/2           -pi/2   -pi/2   -pi/2           -pi/4
  -real    -pi      -pi to -pi/2    -pi/2   -pi/2   -pi/2 to -0     -0
  -0       -pi      -pi             -pi     -0      -0              -0
  +0       +pi      +pi             +pi     +0      +0              +0
  +real    +pi      +pi/2 to +pi    +pi/2   +pi/2   +0 to +pi/2     +0
  +inf     +3pi/4   +pi/2           +pi/2   +pi/2   +pi/2           +pi/4

287 only: Both values must be less than infinity and the absolute value of ST(1) must be less than the absolute value of ST(0). On the later processors, this restriction does not exist. KERBLUH - What happens if these conditions are not met??? Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? * ? C1 - Cleared if stack underflow occurred. If #P is generated: 0 = not roundup, 1 = roundup
Code:
  // KERBLUH
  // Simple version:
  ST(1) = Math.ArcTan(ST(1) / ST(0));
  FPStackPop();
Sample: ___________________________________________________________________________________________________ FPREM Floating point Partial REMainder x87 Instruction Description: Computes the remainder from dividing ST(0) by ST(1), storing the result into ST(0).
This may be thought of as: ST(0) = ST(0) - (TruncateTowardZero(ST(0) / ST(1)) * ST(1)); where TruncateTowardZero is a function that only keeps the whole number portion of the returned floating point value (truncates it towards 0). The sign of the remainder is always the sign of the dividend, even if the result is zero. This instruction always produces an exact result; the precision exception will never be thrown and rounding is ignored. If the divisor is +/- 0 and the dividend is a valid floating point between +/- 0 and infinity but NOT +/- 0 or infinity, a divide-by-zero exception (#Z) is thrown. If the divisor is +/- 0 and the dividend is +/- 0, an invalid-arithmetic exception (#IA) is thrown. If the dividend is +/-infinity, an invalid-arithmetic exception (#IA) is thrown. If the divisor is +/-infinity, the value in ST(0) is not changed. If either value is sNaN or an unsupported value, #IA is thrown. If either value is qNaN at the beginning, the result is qNaN. The "partial remainder" part of the instruction comes into play because the instruction arrives at the remainder via iterative subtraction, but it cannot reduce the exponent of ST(0) by more than 63 in a single execution. If the instruction does compute the full remainder, it will clear the C2 flag. Otherwise, C2 will be set and the result in ST(0) will be a partial remainder, the value of which will have an exponent at least 32 less than the original dividend. This instruction may be used in a loop until the C2 flag is cleared to guarantee a full remainder. [3] states that the amount of reduction per execution is implementation-dependent and is always between 32 and 63. KERBLUH - Is it possible to get an implementation list? According to [3]: "An important use of the FPREM instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word.
This information is important in argument reduction for the tangent function (using a modulus of pi/4), because it locates the original angle in the correct one of eight sectors of the unit circle." Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 * * * *
  C0 - Set to bit 2 of the quotient
  C1 - Cleared if stack underflow occurred, otherwise set to bit 0 of the quotient
  C2 - Cleared if full remainder is calculated, set otherwise
  C3 - Set to bit 1 of the quotient
  [3] shows that C0, C1 and C3 are ONLY set if a full remainder is calculated
Code:
  If (FPIsInvalidOrsNaN(ST(0)) || FPIsInvalidOrsNaN(ST(1)) || FPIsInfinity(ST(0)) || (FPIsZero(ST(0)) && FPIsZero(ST(1)))) {
    ThrowException(#IA);
  } Else If (FPIsZero(ST(1))) {
    ThrowException(#Z);
  } Else If (FPIsInfinity(ST(1))) {
    C2 = 0;
  } Else If (FPIsqNaN(ST(0)) || FPIsqNaN(ST(1))) {
    ST(0) = qNaN;
    C2 = 0;
  } Else {
    // This is almost copied verbatim from [3] since it is fairly strange.
    deltaexponent = GetExponent(ST(0)) - GetExponent(ST(1));
    If (deltaexponent < 64) {
      temp = TruncateTowardZero(ST(0) / ST(1));
      ST(0) -= temp * ST(1);
      C2 = 0;
      C0 = temp & 0x04;
      C1 = temp & 0x01;
      C3 = temp & 0x02;
    } Else {
      exponent = deltaexponent - (implementation_dependent_maximum_between_32_and_63);
      temp = TruncateTowardZero(ST(0) / ST(1) / Math.Exponent(2, exponent));
      ST(0) -= temp * ST(1) * Math.Exponent(2, exponent);
      C2 = 1;
    }
  }
Sample: ___________________________________________________________________________________________________ FPREM1 Floating point Partial REMainder (ieee version) x87 Instruction 80387 and higher Description: The only difference between this function and FPREM is that FPREM1 rounds (ST(0) / ST(1)) to the nearest integer and FPREM truncates it. FPREM1 is the IEEE 754 method of calculating a partial remainder. Computes the remainder from dividing ST(0) by ST(1), storing the result into ST(0).
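The truncate-versus-round difference is easiest to see with numbers. A small Python sketch, assuming math.fmod as a stand-in for a fully reduced FPREM and math.remainder (Python 3.7 and later) for a fully reduced FPREM1; the wrapper names are only for this example, and the flags and the partial-remainder behaviour are not modelled:

```python
import math

def fprem(dividend, divisor):
    """Remainder with the quotient truncated toward zero (FPREM style)."""
    return math.fmod(dividend, divisor)

def fprem1(dividend, divisor):
    """Remainder with the quotient rounded to nearest, ties to even (FPREM1 style)."""
    return math.remainder(dividend, divisor)

for a, b in [(5.0, 3.0), (-5.0, 3.0), (7.0, 2.0)]:
    print(a, b, fprem(a, b), fprem1(a, b))
```

For 5 and 3 the truncating form keeps a quotient of 1 and leaves 2, while the rounding form uses a quotient of 2 and leaves -1, so the two instructions can disagree in both magnitude and sign.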
This may be thought of as: ST(0) = ST(0) - (RoundToInteger(ST(0) / ST(1)) * ST(1)); where RoundToInteger is a function that rounds the floating point value to the nearest integer. Because the quotient is rounded rather than truncated, the remainder may have the opposite sign from the dividend; when a full remainder is computed, its magnitude is at most half that of the divisor. This instruction always produces an exact result; the precision exception will never be thrown and rounding is ignored. If the divisor is +/- 0 and the dividend is a valid floating point between +/- 0 and infinity but NOT +/- 0 or infinity, a divide-by-zero exception (#Z) is thrown. If the divisor is +/- 0 and the dividend is +/- 0, an invalid-arithmetic exception (#IA) is thrown. If the dividend is +/-infinity, an invalid-arithmetic exception (#IA) is thrown. If the divisor is +/-infinity, the value in ST(0) is not changed. If either value is sNaN or an unsupported value, #IA is thrown. If either value is qNaN at the beginning, the result is qNaN. The "partial remainder" part of the instruction comes into play because the instruction arrives at the remainder via iterative subtraction, but it cannot reduce the exponent of ST(0) by more than 63 in a single execution. If the instruction does compute the full remainder, it will clear the C2 flag. Otherwise, C2 will be set and the result in ST(0) will be a partial remainder, the value of which will have an exponent at least 32 less than the original dividend. This instruction may be used in a loop until the C2 flag is cleared to guarantee a full remainder. [3] states that the amount of reduction per execution is implementation-dependent and is always between 32 and 63. KERBLUH - Is it possible to get an implementation list? According to [3]: "An important use of the FPREM1 instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word.
This information is important in argument reduction for the tangent function (using a modulus of pi/4), because it locates the original angle in the correct one of eight sectors of the unit circle." Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 * * * *
  C0 - Set to bit 2 of the quotient
  C1 - Cleared if stack underflow occurred, otherwise set to bit 0 of the quotient
  C2 - Cleared if full remainder is calculated, set otherwise
  C3 - Set to bit 1 of the quotient
  [3] shows that C0, C1 and C3 are ONLY set if a full remainder is calculated
Code:
  If (FPIsInvalidOrsNaN(ST(0)) || FPIsInvalidOrsNaN(ST(1)) || FPIsInfinity(ST(0)) || (FPIsZero(ST(0)) && FPIsZero(ST(1)))) {
    ThrowException(#IA);
  } Else If (FPIsZero(ST(1))) {
    ThrowException(#Z);
  } Else If (FPIsInfinity(ST(1))) {
    C2 = 0;
  } Else If (FPIsqNaN(ST(0)) || FPIsqNaN(ST(1))) {
    ST(0) = qNaN;
    C2 = 0;
  } Else {
    // This is almost copied verbatim from [3] since it is fairly strange.
    deltaexponent = GetExponent(ST(0)) - GetExponent(ST(1));
    If (deltaexponent < 64) {
      temp = RoundToInteger(ST(0) / ST(1));
      ST(0) -= temp * ST(1);
      C2 = 0;
      C0 = temp & 0x04;
      C1 = temp & 0x01;
      C3 = temp & 0x02;
    } Else {
      exponent = deltaexponent - (implementation_dependent_maximum_between_32_and_63);
      // [3] shows that TruncateTowardZero is used here instead of RoundToInteger, which
      // I believe is a mistake. ???
      temp = RoundToInteger(ST(0) / ST(1) / Math.Exponent(2, exponent));
      ST(0) -= temp * ST(1) * Math.Exponent(2, exponent);
      C2 = 1;
    }
  }
Sample: ___________________________________________________________________________________________________ FPTAN Floating point Partial TANgent x87 Instruction 80387 and higher Description: If ST(0) is between -2^63 and +2^63, then ST(0) becomes the tangent of ST(0) (using radians), a 1.0 is pushed onto the stack, and C2 is cleared. If ST(0) is out of bounds, then C2 is set and nothing else occurs. No exception is thrown from the value being out of bounds. If ST(0) is +/-infinity, sNaN, or an unsupported value, #IA is thrown.
If ST(0) is qNaN, then nothing happens. Additional cycles are required by the opcode if Abs(ST(0)) > pi/4. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? * * ? C1 - Cleared if stack underflow occurred, set if stack overflow occurred If #P is generated: 0 = not roundup, 1 = roundup C2 - Set if ST(0) was out of bounds, cleared otherwise.
Code:
  If (FPIsInvalidOrsNaN(ST(0)) || FPIsInfinity(ST(0))) {
    ThrowException(#IA);
  } Else If (Math.Abs(ST(0)) >= Math.Exponent(2, 63)) {
    C2 = 1;
  } Else {
    ST(0) = Math.Tan(ST(0));
    FPStackPush(1.0);
    C2 = 0;
  }
Sample: ___________________________________________________________________________________________________ FRNDINT Floating point RouND to INTeger x87 Instruction Description: Rounds the value in ST(0) to an integer using the current floating point rounding mode (RC in the FPU control word), storing the result into ST(0). If ST(0) is not already an integral value, that is, if rounding changes it, #P is thrown. If ST(0) is +/-infinity or qNaN, it is unchanged. If ST(0) is sNaN or an unsupported value, #IA is thrown. KERBLUH - [3] says "If the source value is not an integral value, the floating-point inexact result exception (#P) is generated." This is read here as: #P occurs whenever ST(0) has a fractional part before rounding. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? * ? C1 - Cleared if stack underflow occurred. If #P is thrown: 0 - not round up, 1 - round up
Code:
  If (FPIsInvalidOrsNaN(ST(0))) {
    ThrowException(#IA);
  } Else If (!FPIsInfinity(ST(0)) && !FPIsqNaN(ST(0))) {
    ST(0) = FPRoundToInteger(ST(0));
  }
  // KERBLUH - #P (raised when the rounded value differs from the original) is not
  // shown here
Sample: ___________________________________________________________________________________________________ FRSTOR Floating point ReSTORe state x87 Instruction Description: Loads (restores) the floating point state from the memory region specified.
The state is generally written to memory using the FSAVE/FNSAVE opcodes. The first 14/28 bytes of the state are the same as those used by FLDENV/FSTENV. See "FPU Stored Environment" below under "Data Formats" for full details on how the data is stored. The next 80 bytes are the FPU/MMX registers, stored in the order they are in the stack (ST(0) then ST(1), etc). This instruction never throws any floating point exceptions. However, if an existing exception that was masked becomes unmasked as a result of this instruction, the exception will be thrown at the end of this instruction. Though this does affect MMX and 3D Now! registers (which really are the floating point registers), the SSE SIMD registers are unaffected. * 486 and below: An FWAIT instruction should be used before using this instruction to guarantee that the previously saved data set is fully written to memory. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 * * * * The flags are loaded from the stored environment Code: // KERBLUH - Show the restore Sample: ___________________________________________________________________________________________________ FSAVE Floating point SAVE fpu state x87 Instruction FNSAVE - Floating point No wait for exceptions before SAVE fpu state Description: Stores the floating point environment, just like FSTENV. This is 14/28 bytes of data, depending on the mode of the processor. See "FPU Stored Environment" below under "Data Formats" for full details on how the data is stored. Then, all 80 bytes of the FPU register set are stored, ST(0) first going down to ST(7). Note that since the MMX and 3D Now! registers share the FPU registers, they are stored as well. Then, the FPU is initialized in exactly the same way as by FINIT/FNINIT: the FPU control, status, tag, instruction pointer and data pointer registers are set to their default states. The FPU control word becomes 037Fh (round to nearest, all exceptions masked, 64bit precision).
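The value 037Fh can be checked field by field. A minimal Python sketch, assuming the usual control word layout (exception masks in bits 0 to 5, precision control in bits 8 and 9, rounding control in bits 10 and 11); the function name and the returned labels are made up for this example:

```python
def decode_control_word(cw):
    """Break an FPU control word into the three fields discussed above."""
    precision = {0b00: "24bit", 0b01: "reserved", 0b10: "53bit", 0b11: "64bit"}
    rounding = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}
    return {
        "all_exceptions_masked": (cw & 0x3F) == 0x3F,  # IM, DM, ZM, OM, UM, PM
        "precision": precision[(cw >> 8) & 0x3],       # PC field
        "rounding": rounding[(cw >> 10) & 0x3],        # RC field
    }

print(decode_control_word(0x037F))
```

Running it on 0x037F reports all exceptions masked, 64bit precision and round to nearest, which agrees with the description of the initialized state.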
The status word is cleared (no exception flags, TOP set to 0). Instruction and data pointers are cleared. Data registers are not affected but they're all tagged as empty (11b). FSAVE checks for and handles any pending floating point exceptions before executing whereas FNSAVE does not. On the x387, these instructions do not clear the instruction and data pointers (KERBLUH - What about pre x387?) There are unusual circumstances where FNSAVE would handle exceptions on a Pentium or 486 in "MS-DOS Compatibility Mode" (KERBLUH - What does this mean??? in real mode?), according to [3]. Though this instruction does store MMX and 3D Now! registers (which really are the floating point registers), the SSE SIMD registers are not stored. Note: FSAVE is actually an FNSAVE with a FWAIT before it. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 0 0 0 0 Code: // KERBLUH - Show the save // direct from [3]: FPUControl = 0x037F; FPUStatus = 0; FPUTag = 0xFFFF; FPUDataPointer = 0; FPUInstructionPointer = 0; FPULastInstructionOpcode = 0; Sample: ___________________________________________________________________________________________________ FSCALE Floating point SCALE x87 Instruction Description: Multiplies ST(0) by 2 to the power of ST(1), after ST(1) has been converted to an integer. The conversion to an integer simply truncates the value in ST(1) towards zero. Essentially this adds ST(1) to ST(0)'s exponent value, though if ST(0) was a denormal value before the adjustment, its significand is also changed. In a case of overflow or underflow, the significand may also be changed to correctly represent the new value. If ST(0) is +/- 0, +/-infinity or a qNaN, then it is unaffected. If either value is an sNaN or unsupported value, an #IA exception is thrown. If ST(1) is +/-infinity, then overflow or underflow will occur and the appropriate exception is thrown. KERBLUH - What happens if ST(1) is a qNaN? FSCALE is also used to undo the effect of FXTRACT.
This is accomplished by using FSCALE followed by FSTP ST(1). Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? * ? C1 - Cleared if stack underflow occurred, ??? set otherwise. If #P was thrown, this means: 0 - not roundup, 1 - roundup Code: // KERBLUH Sample: ___________________________________________________________________________________________________ FSETPM Floating point SET Protected Mode x87 Instruction 80287 Only Description: KERBLUH Acts like a FNOP on 80387 and above Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? ? ? Code: // KERBLUH Sample: ___________________________________________________________________________________________________ FSIN Floating point SINe x87 Instruction 80387 and higher Description: Calculates the sine of ST(0) (in radians) and stores the result in ST(0). ST(0) must be between -2^63 and 2^63 for a value to be generated. If ST(0) is outside of this range, C2 is set and ST(0) is unchanged. The correct range may be attained by using FPREM with a divisor of 2*pi OR by subtracting/adding a large multiple of 2*pi appropriate to the value. If ST(0) is +/-infinity, an sNaN or an unsupported value then #IA is thrown. If ST(0) is qNaN, then nothing happens (does it count as out of range???). Additional cycles are required by the opcode if Abs(ST(0)) > pi/4. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? * * ? C1 - Cleared if stack underflow occurred If #P is generated: 0 = not roundup, 1 = roundup If C2 was set, this value is undefined C2 - Set to 1 if ST(0) was out of the range of -2^63 and 2^63, 0 otherwise Code: If (Math.Abs(ST(0)) < Math.Exponent(2, 63)) { C2 = 0; ST(0) = Math.Sin(ST(0)); } Else { C2 = 1; } Sample: ___________________________________________________________________________________________________ FSINCOS Floating point SINe and COSine x87 Instruction 80387 and higher Description: Calculates the sine and the cosine of ST(0) (in radians), storing the sine in ST(0) and pushing the cosine onto the stack.
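The stack effect is easy to get backwards, so here is a small Python sketch of it. It assumes a plain list as the register stack (index 0 is ST(0)) and the same 2^63 range limit as FSIN; the name fsincos is only for this example, and NaN and infinity handling is deliberately left out:

```python
import math

def fsincos(stack):
    """Model FSINCOS on a list-based stack; returns the resulting C2 flag."""
    x = stack[0]
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and infinity are not modelled in this sketch")
    if abs(x) >= 2.0 ** 63:
        return 1                      # out of range: stack left untouched
    stack[0] = math.sin(x)            # the sine replaces the old ST(0)
    stack.insert(0, math.cos(x))      # the cosine is pushed and becomes ST(0)
    return 0

regs = [0.0]
print(fsincos(regs), regs)
```

After a successful call the cosine sits in ST(0) and the sine in ST(1), so an FDIVP at that point would divide the sine by the cosine and leave the tangent.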
ST(0) must be between -2^63 and 2^63 for a value to be generated. If ST(0) is outside of this range, a NaN, +/-infinity, or an unsupported value, C2 is set, ST(0) is unchanged and nothing is pushed onto the stack. The correct range may be attained by using FPREM with a divisor of 2*pi OR by subtracting/adding a large multiple of 2*pi appropriate to the value. If ST(0) is +/-infinity, an sNaN or an unsupported value then #IA is thrown. If ST(0) is qNaN, then nothing happens (does it count as out of range???). Additional cycles are required by the opcode if Abs(ST(0)) > pi/4. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? * * ? C1 - Cleared if stack underflow occurred If #P is generated: 0 = not roundup, 1 = roundup If C2 was set, this value is undefined C2 - Set to 1 if ST(0) was out of the range of -2^63 and 2^63, 0 otherwise Code: If (Math.Abs(ST(0)) < Math.Exponent(2, 63)) { C2 = 0; tempF80 = Math.Cos(ST(0)); ST(0) = Math.Sin(ST(0)); FPStackPush(tempF80); } Else { C2 = 1; } Sample: ___________________________________________________________________________________________________ FSQRT Floating point SQuare RooT x87 Instruction Description: Calculates the square root of ST(0), storing the result into ST(0). If ST(0) is negative (except -0, which returns -0), sNaN, -infinity or an unsupported value, an #IA exception is thrown. If ST(0) is +infinity or qNaN, it does not change. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? * ? C1 - Cleared if stack underflow occurred, ??? 
set otherwise If #P is thrown, 0 - not roundup, 1 - roundup Code: If (FPIsInvalidOrsNaN(ST(0)) || IsMinusInfinity(ST(0)) || ST(0) < -0) { ThrowException(#IA); } Else If (!IsNaN(ST(0)) && !IsInfinity(ST(0)) && ST(0) != 0) { ST(0) = Math.SquareRoot(ST(0)); } Sample: ___________________________________________________________________________________________________ FST Floating point STore value x87 Instruction FSTP - Floating point STore value and Pop Description: Stores ST(0) to the specified location. The FSTP variants of this instruction then pop ST(0) off of the FPU stack. Only FSTP supports storing a full 80bit extended-real value to memory. If the value is a normal/denormal real value, it is reduced according to the rounding mode of the RC field of the control word. If that value was too large to place into the destination, an overflow exception (#O) is thrown. If the value was too small or would result in a denormal value, an underflow exception (#U) is thrown (#D is never thrown). If the value being stored is +/- 0, +/-infinity, or a NaN, the least-significant bits of the significand and exponent are truncated to fit the destination format. KERBLUH - Doesn't this make it possible for an sNaN to become +/-infinity?!? If the destination is a register and it is a non-empty register, #IA is not thrown. KERBLUH - So... what happens? Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? * ? C1 - Cleared if stack underflow occurred, ??? 
set otherwise If #P is thrown, 0 - not roundup, 1 - roundup Code: // KERBLUH Sample: ___________________________________________________________________________________________________ FSTP Floating point STore value and Pop x87 Instruction See FST ___________________________________________________________________________________________________ FSTCW Floating point STore Control Word x87 Instruction FNSTCW - Floating point No wait for exceptions STore Control Word Description: Stores the floating point control word to the specified 16bit address. FSTCW allows floating point exceptions to occur before the instruction occurs, FNSTCW does not. Note: On the 486 and Pentium processors in MS-DOS compatibility mode (??? what is this), there are certain unusual circumstances where FNSTCW may be interrupted before executing by floating point exceptions. This does not affect any other processors. Note: FSTCW is actually an FNSTCW with a FWAIT before it. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? ? ? Code: operand = FPUControl; Sample: ___________________________________________________________________________________________________ FSTENV Floating point STore fpu ENVironment x87 Instruction FNSTENV - Floating point No wait for exceptions STore fpu ENVironment Description: Stores the floating point environment. This is 14/28 bytes of data, depending on the mode of the processor. See "FPU Stored Environment" below under "Data Formats" for full details on how the data is stored. This includes the FPU control word, status word, tag word, instruction pointer, data pointer and last opcode. After the store, all floating point exceptions are then masked. FSTENV throws any pending floating point exceptions before executing, FNSTENV does not. The state of the FPU that is stored is based after the exceptions are thrown. Note: On the 486 and Pentium processors in MS-DOS compatibility mode (??? 
what is this), there are certain unusual circumstances where FNSTENV may be interrupted before executing by floating point exceptions. This does not affect any other processors. Note: FSTENV is actually an FNSTENV with a FWAIT before it. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? ? ? Code: // KERBLUH - Show the store FPUControl |= 0x3F; // Mask all exceptions Sample: ___________________________________________________________________________________________________ FSTSW Floating point STore Status Word x87 Instruction FNSTSW - Floating point No exception before STore Status Word Description: Stores the FPU Status Word to the specified 16bit location. Unlike other FST* instructions, this instruction allows AX to be a destination. When AX is the destination, only the FNSTSW instruction guarantees that AX is set before the floating point processor executes any further instructions. FSTSW throws any pending floating point exceptions before executing, FNSTSW does not. The state of the FPU that is stored is based after the exceptions are thrown. Note: On the 486 and Pentium processors in MS-DOS compatibility mode (??? what is this), there are certain unusual circumstances where FNSTSW may be interrupted before executing by floating point exceptions. This does not affect any other processors. Note: FSTSW is actually an FNSTSW with a FWAIT before it. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? ? ? Code: operand = FPUStatus; Sample: ___________________________________________________________________________________________________ FSUB Floating point SUBtract x87 Instruction FISUB - Floating point Integer SUBtract FISUBR - Floating point Integer SUBtract Reverse FSUBP - Floating point SUBtract then Pop FSUBR - Floating point SUBtract Reverse FSUBRP - Floating point SUBtract Reverse then Pop Description: Subtracts one value from another. The normal versions (FISUB, FSUB, FSUBP) subtract the source operand from the destination, storing the result in the destination. 
The reverse versions (FISUBR, FSUBR, FSUBRP) subtract the destination operand from the source, storing the result in the destination. The integer versions (FISUB and FISUBR) convert the signed integer value to a floating point value before executing. The pop versions (FSUBP and FSUBRP) pop ST(0) off of the stack after executing. The no operand version is assembly shorthand for ST(1), ST(0) (for FSUBP and FSUBRP [though some assemblers use FSUB and FSUBR for the no operand version, a pop still occurs]) and doesn't actually exist as its own opcode. The single operand versions, those that use a memory value, use ST(0) for the destination and the operand specified for the source. The two operand versions are always two floating point values, the first is the destination and the second is the source (like normal x86 instructions). When two values of like sign result in a value of 0, the result is +0 unless round towards -infinity mode is in use, in which case the result is -0. +0 - -0 results in +0 and -0 - +0 results in -0. Integer operands with a value of 0 are treated as +0. If either value is infinity, the sign of the result is as expected. Subtracting infinity from any valid value results in infinity with a reversed sign. Subtracting any valid value from infinity results in infinity with the same sign. Subtracting +infinity from +infinity or -infinity from -infinity throws an invalid operation exception (#IA). However, subtracting -infinity from +infinity results in +infinity and subtracting +infinity from -infinity results in -infinity. If either value is an sNaN or unsupported value, #IA is thrown as normal. If either value is a qNaN, the result is a qNaN. Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 ? ? * ? C1 - Cleared if stack underflow occurred, ???
set otherwise If #P is thrown, 0 - not roundup, 1 - roundup
Code:
  // operand1 = Destination
  // operand2 = Source
  // Get the source as an extended-real
  source = TypeCast(operand2, F80);
  // Is either an sNaN or invalid?
  If (FPIsInvalidOrsNaN(operand1) || FPIsInvalidOrsNaN(source)) {
    ThrowException(#IA);
  // Are they both infinity and the same sign?
  } Else If (FPIsInfinity(operand1) && FPIsInfinity(source) && FPSignOf(operand1) == FPSignOf(source)) {
    ThrowException(#IA);
  // Is either of them a qNaN?
  } Else If (FPIsqNaN(operand1) || FPIsqNaN(source)) {
    operand1 = qNaN;
  // Is it a normal opcode?
  } Else If (Opcode == FSUB or FISUB or FSUBP) {
    // Is the destination infinity?
    If (FPIsInfinity(operand1)) {
      // No change then (infinity - value keeps its sign)
    // Otherwise, is the source infinity?
    } Else If (FPIsInfinity(source)) {
      operand1 = -source;
    // KERBLUH - Show weird zero treatments
    // Otherwise they're two normal values
    } Else {
      operand1 -= source;
    }
  // Otherwise, it's a reverse opcode
  } Else {
    // Is the destination infinity?
    If (FPIsInfinity(operand1)) {
      operand1 = -operand1;
    // Otherwise, is the source infinity?
    } Else If (FPIsInfinity(source)) {
      operand1 = source;
    // KERBLUH - Show weird zero treatments
    // Otherwise they're two normal values
    } Else {
      operand1 = source - operand1;
    }
  }
  If (Opcode == FSUBP or FSUBRP) FPStackPop();
Sample: ___________________________________________________________________________________________________ FSUBP Floating point SUBtract then Pop x87 Instruction See FSUB ___________________________________________________________________________________________________ FSUBR Floating point SUBtract Reverse x87 Instruction See FSUB ___________________________________________________________________________________________________ FSUBRP Floating point SUBtract Reverse then Pop x87 Instruction See FSUB ___________________________________________________________________________________________________ FTST Floating point TeST x87 Instruction Description: Tests ST(0) by comparing it with 0, setting the FPU flags according to the comparison. If ST(0) is +/-0, then C3 is set and C0 and C2 are cleared. If ST(0) is less than 0, C0 is set and C2 and C3 are cleared. If ST(0) is greater than 0, C0, C2 and C3 are all cleared. If ST(0) is NaN or undefined, C0, C2 and C3 are all set.
  C3 C2 C0
   0  0  0   ST(0) > 0
   0  0  1   ST(0) < 0
   1  0  0   ST(0) = +/-0
   1  1  1   ST(0) is NaN or undefined
Flags: oszapc ------ FPU Flags: C3 C2 C1 C0 * * * * C1 - Cleared if stack underflow occurred, ???
set otherwise If #P is thrown, 0 - not roundup, 1 - roundup C3 C2 C0 - set according to the chart below:
  C3 C2 C0
   0  0  0   ST(0) > 0
   0  0  1   ST(0) < 0
   1  0  0   ST(0) = +/-0
   1  1  1   ST(0) is NaN or undefined
Code:
  If (FPIsInvalidOrNaN(ST(0))) {
    C0 = 1; C2 = 1; C3 = 1;
  } Else {
    If (ST(0) == 0) {
      C0 = 0; C2 = 0; C3 = 1;
    } Else If (ST(0) < 0) {
      C0 = 1; C2 = 0; C3 = 0;
    } Else {
      C0 = 0; C2 = 0; C3 = 0;
    }
  }
Sample: ___________________________________________________________________________________________________ FUCOM Floating point Unordered COMpare x87 Instruction 80387 and above See FCOM ___________________________________________________________________________________________________ FUCOMI Floating point Unordered COMpare and set Integer flags x87 Instruction See FCOMI ___________________________________________________________________________________________________ FUCOMIP Floating point Unordered COMpare and set Integer flags then Pop x87 Instruction See FCOMI ___________________________________________________________________________________________________ FUCOMP Floating point Unordered COMpare then Pop x87 Instruction 80387 and above See FCOM ___________________________________________________________________________________________________ FUCOMPP Floating point Unordered COMpare then Pop then Pop again x87 Instruction 80387 and above See FCOM ___________________________________________________________________________________________________ FWAIT Floating point WAIT for exceptions aka WAIT - WAIT for floating point exceptions Description: This instruction causes the processor to wait for all pending, unmasked floating point exceptions to be handled before proceeding. It is useful as a synchronization instruction, guaranteeing that the FPU is done handling exceptions before relying on the results of previous floating point instructions. NOTE: Many instructions are prefixed with an FWAIT in assembly language.
    These are: FCLEX, FINIT, FSAVE, FSTCW, FSTENV, FSTSW

    NOTE: Though this is not technically a floating point/x87 instruction, it is only useful in
    conjunction with floating point opcodes, and therefore I am placing it here.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  ?  ?

Code:
    FPWaitForPendingUnmaskedExceptions();

Sample:

___________________________________________________________________________________________________
FXAM        Floating point eXAMine
    x87 Instruction

Description:
    Classifies the type of value that is in ST(0), storing the result into C0, C1, C2 and C3
    according to the table in the FPU Flags section.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    *  *  *  *
    C1 - Always set to the sign of ST(0) regardless of whether the register is empty or full.

    C3 C2 C0   Type of value in ST(0)
    0  0  0    Unsupported
    0  0  1    NaN
    0  1  0    Normal finite value
    0  1  1    Infinity
    1  0  0    Zero
    1  0  1    Empty
    1  1  0    Denormal number

Code:
    C1 = ST(0) & highbit;
    If (FPIsUnsupported(ST(0))) {
        temp = 0;
    } Else If (FPIsNaN(ST(0))) {
        temp = 1;
    } Else If (FPIsInfinity(ST(0))) {
        temp = 3;
    } Else If (FPIsZero(ST(0))) {
        temp = 4;
    } Else If (FPIsEmpty(ST(0))) {
        temp = 5;
    } Else If (FPIsDenormal(ST(0))) {
        temp = 6;
    } Else {
        temp = 2;
    }
    // Each flag receives a single bit of the classification code
    C3 = (temp >> 2) & 1;
    C2 = (temp >> 1) & 1;
    C0 = temp & 1;

Sample:

___________________________________________________________________________________________________
FXCH        Floating point eXCHange
    x87 Instruction

Description:
    Exchanges the specified floating point register with ST(0). Some assemblers allow this
    instruction without an operand. In this case, ST(1) is automatically assumed.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred, ??? set otherwise
         ??? [3] says "Set to 0 if stack underflow occurred; otherwise, cleared to 0".

Code:
    temp = ST(0);
    ST(0) = operand;
    operand = temp;

Sample:

___________________________________________________________________________________________________
FXRSTOR     Floating point eXtended ReSTORe
    x87 Instruction

Description:
    Reloads the floating point/MMX state as well as SSE state from a 512 byte block of memory that
    was previously saved by FXSAVE. This instruction does not throw pending FPU exceptions; use an
    FWAIT before this instruction to guarantee that they are flushed.

    The data block is expected to be aligned to a 16 byte block of memory. SSE fields (the XMM
    registers and MXCSR) will not be loaded into the processor if the OSFXSR bit of CR4 is not set.
    All reserved bits in MXCSR must be 0 on the data load, otherwise a general protection exception
    will be thrown. Also, unmasked exceptions in MXCSR will NOT cause those exceptions to be thrown
    after reading the new value in.

    See "FPU Extended Stored Environment" below under "Data Formats" for full details on how the
    data is stored in memory.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    *  *  *  *
    All flags are loaded from the stored state.

Code:
    // KERBLUH - Show the restore

Sample:

___________________________________________________________________________________________________
FXSAVE      Floating point eXtended SAVE
    x87 Instruction

Description:
    Saves the floating point/MMX state as well as SSE state to a 512 byte block of memory for later
    restoration by FXRSTOR. Like FNSAVE, this instruction does not throw pending FPU exceptions.
    Unlike FNSAVE, however, none of the saved registers are affected by this instruction. It is
    designed for the maximum speed possible to save the full register set.

    See "FPU Extended Stored Environment" below under "Data Formats" for full details on how the
    data is stored in memory.

    If this instruction is immediately after a floating point instruction that does not use a
    memory operand, the DP field written to memory is not updated in the same image.

    If the memory structure is not aligned on a 16 byte boundary, a general protection exception is
    thrown. In some cases, this overrides the #AC exception even if it is enabled. According to
    [3], this behavior is implementation dependent. Therefore, #AC should never be relied on when
    using this instruction.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    -  -  -  -
    All flags are unaffected.

Code:
    // KERBLUH - Show the save

Sample:

___________________________________________________________________________________________________
FXTRACT     Floating point eXTRACT exponent and significand
    x87 Instruction

Description:
    Separates out the exponent and significand of ST(0) by storing the exponent into ST(0) and
    pushing the significand onto the stack. The exponent value is the operand's exponent expressed
    as a real number; the significand is set with an exponent of 0 (3FFFh biased). The sign of the
    significand is the same as the sign of the original value.

    A value of +/-0 will throw a Division by Zero exception (#Z), unless that exception is masked,
    in which case the exponent is set to -infinity and the significand becomes 0 with the same sign
    as the original.

    KERBLUH - What happens if the value in ST(0) is +/-infinity or it's a qNaN???

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred, set to 1 if stack overflow occurred, ???
         otherwise unchanged

Code:
    If (FPIsInvalidOrsNaN(ST(0))) {
        ThrowException(#IA);
    } Else If (ST(0) == 0) {
        If (FPExceptionMasked(#Z)) {
            temp = ST(0);
            ST(0) = -infinity;
            FPStackPush(temp);
        } Else {
            // A zero operand raises division by zero, as stated in the description
            ThrowException(#Z);
        }
    } Else If (FPIsInfinity(ST(0)) || FPIsqNaN(ST(0))) {
        // KERBLUH - No clue what happens here
    } Else {
        temp = ST(0);
        ST(0) = Exponent(temp);
        FPStackPush(Significand(temp));
    }

Sample:

___________________________________________________________________________________________________
FYL2X       Floating point Y times Log2 of X
    x87 Instruction

Description:
    Calculates (ST(1) * log2(ST(0))), storing the result into ST(1) and popping ST(0) off of the
    stack.

    If either value is unsupported or an sNaN, an invalid operation exception (#IA) is thrown.
    If either value is a qNaN, the result is a qNaN.
    If ST(0) is less than zero OR ST(0) is +infinity and ST(1) is zero OR both ST(0) and ST(1) are
    zero OR ST(0) is +/-1 and ST(1) is +/-infinity, then an invalid operation exception (#IA) is
    thrown.
    If ST(0) is 0 and ST(1) is a normal or denormal value (non-zero), then a division by zero
    exception (#Z) is thrown.
    If ST(0) is greater than or equal to zero, but less than 1 (0 <= ST(0) < 1), and ST(1) is
    +/-infinity, the result is infinity with the opposite sign of ST(1).
    If ST(0) is +infinity and ST(1) is not zero, the result is infinity with the same sign as
    ST(1).
    If ST(0) is greater than 1 and ST(1) is +/-infinity, ST(1) remains unchanged (it stays
    +/-infinity of the same sign).

    If you caught all of that, it means that this function only gives a useful result when ST(0) is
    greater than 0 but less than +infinity, and when ST(1) is any normal or denormal value or 0.

    If the divide by zero exception (#Z) should be thrown but it is masked, the result will be
    infinity with the opposite sign of ST(1).

    This instruction is designed to optimize calculating logarithms with an arbitrary base b:
        logb(x) = (log2(b) ** -1) * log2(x);

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred, ??? otherwise not changed
         If an inexact result exception (#P) is thrown: 0 - not round up, 1 - round up

Code:
    If (FPIsInvalidOrsNaN(ST(0)) || FPIsInvalidOrsNaN(ST(1))) {
        ThrowException(#IA);
    } Else If (FPIsqNaN(ST(0)) || FPIsqNaN(ST(1))) {
        ST(1) = qNaN;
        FPStackPop();
    } Else If (ST(0) < 0 ||
               (FPIsPositiveInfinity(ST(0)) && ST(1) == 0) ||
               (ST(0) == 0 && ST(1) == 0) ||
               (Math.Abs(ST(0)) == 1 && FPIsInfinity(ST(1)))) {
        ThrowException(#IA);
    // ST(1) == 0 was already handled above, so only normal and denormal values remain here
    } Else If (ST(0) == 0 && (FPIsNormal(ST(1)) || FPIsDenormal(ST(1)))) {
        If (FPExceptionMasked(#Z)) {
            ST(1) = -Infinity * FPSignOf(ST(1));
            FPStackPop();
        } Else {
            ThrowException(#Z);
        }
    } Else If (ST(0) >= 0 && ST(0) < 1 && FPIsInfinity(ST(1))) {
        ST(1) = -Infinity * FPSignOf(ST(1));
        FPStackPop();
    } Else If (FPIsPositiveInfinity(ST(0)) && ST(1) != 0) {
        ST(1) = Infinity * FPSignOf(ST(1));
        FPStackPop();
    } Else If (ST(0) > 1 && FPIsInfinity(ST(1))) {
        // ST(1) is unchanged
        FPStackPop();
    } Else {
        ST(1) = ST(1) * Math.Log2(ST(0));
        FPStackPop();
    }

Sample:

___________________________________________________________________________________________________
FYL2XP1     Floating point Y times Log2 of X Plus 1
    x87 Instruction

Description:
    Calculates the value of (ST(1) * (log2(ST(0) + 1))), storing the result into ST(1) and popping
    ST(0) off of the FPU stack.

    ST(0) must be between -(1 - (SQRT(2) / 2)) and (1 - (SQRT(2) / 2)) inclusive (about -0.293 to
    0.293), but ST(1) may be any value between -infinity and infinity inclusive. If ST(0) is
    outside of this range, the result is undefined and each processor implementation may throw
    different exceptions, meaning that exceptions here may not be relied upon.

    KERBLUH - What does each processor do?

    If either value is unsupported or an sNaN, an invalid operation exception (#IA) is thrown.
    If either value is qNaN, the result is a qNaN.
    If ST(0) is +/-0 and ST(1) is +/-infinity, an invalid operation exception (#IA) is thrown.
    If ST(0) < 0 and ST(1) is +/-infinity, the result is infinity with the opposite sign of ST(1).
    If ST(0) > 0 and ST(1) is +/-infinity, ST(1) is unchanged (the result is infinity with the same
    sign as ST(1)).

    This instruction is designed for optimal accuracy of values of ST(0) that are close to 0. When
    ST(0) is very small, this instruction is more accurate than an FADD of 1.0 followed by FYL2X,
    since the intermediate value does not have to hold both the 1.0 and the small fraction.

Flags:
    oszapc
    ------

FPU Flags:
    C3 C2 C1 C0
    ?  ?  *  ?
    C1 - Cleared if stack underflow occurred, ??? otherwise not changed
         If an inexact result exception (#P) is thrown: 0 - not round up, 1 - round up

Code:
    If (FPIsInvalidOrsNaN(ST(0)) || FPIsInvalidOrsNaN(ST(1))) {
        ThrowException(#IA);
    } Else If (FPIsqNaN(ST(0)) || FPIsqNaN(ST(1))) {
        ST(1) = qNaN;
        FPStackPop();
    } Else If (ST(0) == 0 && FPIsInfinity(ST(1))) {
        ThrowException(#IA);
    } Else If (ST(0) < 0 && FPIsInfinity(ST(1))) {
        ST(1) = -Infinity * FPSignOf(ST(1));
        FPStackPop();
    } Else If (ST(0) > 0 && FPIsInfinity(ST(1))) {
        // ST(1) is unchanged
        FPStackPop();
    } Else {
        ST(1) = ST(1) * Math.Log2(ST(0) + 1.0);
        FPStackPop();
    }

Sample:

***************************************************************************************************
SIMD Instructions

This section includes all SIMD instructions, which are the MMX and SSE instructions.

___________________________________________________________________________________________________
ADDPS       ADD Packed Singles
    SSE Instruction

Description:
    Adds each of the four 32bit floating point values in both operands together, storing the result
    into the first operand.

Flags:
    oszapc
    ------

Code:
    operand1.F32 += operand2.F32;
    operand1.F32[1] += operand2.F32[1];
    operand1.F32[2] += operand2.F32[2];
    operand1.F32[3] += operand2.F32[3];

Sample:

___________________________________________________________________________________________________
ADDSS       ADD Single Scalar
    SSE Instruction

Description:
    Adds the lower 32bit floating point values in both operands together, storing the result into
    the first operand.

Flags:
    oszapc
    ------

Code:
    operand1.F32 += operand2.F32;

Sample:

___________________________________________________________________________________________________
ANDNPS      AND Not Packed Singles
    SSE Instruction

Description:
    Takes the complement of the first operand, bitwise ANDs it with the second operand, and stores
    the result into the first operand. This essentially clears every bit in operand2 that is set in
    operand1. This instruction is NOT generally useful for floating point SIMD, but instead for
    128bit integer bitwise operations.

Flags:
    oszapc
    ------

Code:
    operand1 = (~operand1) & operand2;

Sample:

___________________________________________________________________________________________________
MOVD        MOVe Double word
    MMX Instruction

Description:
    Transfers a DWord from memory or a general register to an MMX register, or from an MMX register
    to memory or a general register. If an MMX register is the source, only the lower 32bits are
    transferred. If the MMX register is the destination, the upper 32bits of the MMX register
    become 0.

Flags:
    oszapc
    ------

Code:
    If (destination_type == MMX_Register)
        operand1 = TypeCast(operand2.U32, U64);
    Else
        operand1 = operand2.U32;

Sample:

___________________________________________________________________________________________________
MOVQ        MOVe Quad word
    MMX Instruction

Description:
    Transfers a QWord from memory to an MMX register or from an MMX register to another MMX
    register or memory.

Flags:
    oszapc
    ------

Code:
    operand1 = operand2;

Sample:

___________________________________________________________________________________________________
PACKx       PACK
    PACKSSDW - PACK Signed Saturation Dword to Word data
    PACKSSWB - PACK Signed Saturation Word to Byte data
    PACKUSWB - PACK Unsigned Saturation Word to Byte data
    MMX Instruction

Description:
    Packs the values from both the destination and the source operands into half the data space
    (DWords to Words or Words to Bytes) and stores the result into the destination operand. The
    destination's values fill the low half of the result and the source's values fill the high
    half.

    Signed saturation versions (PACKSSDW, PACKSSWB) use signed saturation on the resulting values
    before storing the results. Unsigned saturation versions (PACKUSWB) read the source values as
    signed words and use unsigned saturation on the resulting values before storing the results.

Flags:
    oszapc
    ------

Code:
    PACKSSDW:
        // Assumes result is a U64
        result.S16[0] = Saturate(operand1.S32[0], S16);
        result.S16[1] = Saturate(operand1.S32[1], S16);
        result.S16[2] = Saturate(operand2.S32[0], S16);
        result.S16[3] = Saturate(operand2.S32[1], S16);
        operand1 = result;

    PACKSSWB:
        // Assumes result is a U64
        result.S8[0] = Saturate(operand1.S16[0], S8);
        result.S8[1] = Saturate(operand1.S16[1], S8);
        result.S8[2] = Saturate(operand1.S16[2], S8);
        result.S8[3] = Saturate(operand1.S16[3], S8);
        result.S8[4] = Saturate(operand2.S16[0], S8);
        result.S8[5] = Saturate(operand2.S16[1], S8);
        result.S8[6] = Saturate(operand2.S16[2], S8);
        result.S8[7] = Saturate(operand2.S16[3], S8);
        operand1 = result;

    PACKUSWB:
        // Assumes result is a U64
        // The inputs are signed words; negative values saturate to 0
        result.U8[0] = Saturate(operand1.S16[0], U8);
        result.U8[1] = Saturate(operand1.S16[1], U8);
        result.U8[2] = Saturate(operand1.S16[2], U8);
        result.U8[3] = Saturate(operand1.S16[3], U8);
        result.U8[4] = Saturate(operand2.S16[0], U8);
        result.U8[5] = Saturate(operand2.S16[1], U8);
        result.U8[6] = Saturate(operand2.S16[2], U8);
        result.U8[7] = Saturate(operand2.S16[3], U8);
        operand1 = result;

Sample:
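
The packing step can also be modelled outside of assembly. What follows is a minimal Python
sketch, not an assembly sample. It assumes each operand is given as a list of four signed words
in low to high order. The helper names (saturate, packsswb, packuswb) are hypothetical and only
model the arithmetic, not the instruction encoding.

```python
def saturate(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def packsswb(op1_words, op2_words):
    """Model PACKSSWB: eight signed words become eight signed bytes."""
    # The destination's words fill the low four bytes, the source's the high four.
    return [saturate(w, -128, 127) for w in op1_words + op2_words]


def packuswb(op1_words, op2_words):
    """Model PACKUSWB: eight signed words become eight unsigned bytes."""
    return [saturate(w, 0, 255) for w in op1_words + op2_words]


dest = [1, -2, 300, -300]      # operand1, low word first (assumed layout)
src = [127, 128, -129, 0]      # operand2, low word first (assumed layout)

signed_result = packsswb(dest, src)
unsigned_result = packuswb(dest, src)
print(signed_result)
print(unsigned_result)
```

With these inputs, 300 and 128 clamp to 127 in the signed version, while every negative word
clamps to 0 in the unsigned version.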
___________________________________________________________________________________________________
PADDx       Packed ADD
    PADDB   - Packed ADD Bytes
    PADDW   - Packed ADD Words
    PADDD   - Packed ADD DWords
    PADDSB  - Packed ADD with Saturation by Bytes
    PADDSW  - Packed ADD with Saturation by Words
    PADDUSB - Packed ADD with Unsigned Saturation by Bytes
    PADDUSW - Packed ADD with Unsigned Saturation by Words
    MMX Instruction

Description:
    Each value in the second operand is added to the matching value in the first operand.

    The normal versions of this instruction (PADDB, PADDW, PADDD) truncate the value to the size
    specified just like the normal ADD opcode.

    Saturation versions of this instruction (PADDSB, PADDSW) will convert values outside of the
    maximum or minimum signed values permitted for their data sizes (-128 to 127 for bytes, -32768
    to 32767 for words) to the maximum or minimum values respectively.

    Unsigned saturated versions (PADDUSB, PADDUSW) are the same as the saturated versions, except
    they use unsigned boundaries (0 to 255 for bytes, 0 to 65535 for words) instead of signed
    boundaries.

    For example, if we use PADDSB and it ends up adding 120 to 50 (120 + 50 = 170), since 170 is
    greater than 127, 127 will be the result. If we used PADDUSB instead, the result would have
    been 170 since 170 is still in the appropriate range.
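
The three behaviors can be compared on a single byte lane. This is a minimal Python sketch under
the assumption that the inputs are already valid for the chosen mode. The function padd_byte and
its mode strings are hypothetical names used only for illustration.

```python
def padd_byte(a, b, mode):
    """Add one byte lane the way PADDB, PADDSB or PADDUSB would.

    mode is "wrap" (PADDB), "signed" (PADDSB) or "unsigned" (PADDUSB).
    """
    total = a + b
    if mode == "wrap":
        # Truncate to 8 bits, like the normal ADD opcode
        return total & 0xFF
    if mode == "signed":
        # Clamp to the signed byte range
        return max(-128, min(127, total))
    if mode == "unsigned":
        # Clamp to the unsigned byte range
        return max(0, min(255, total))
    raise ValueError("unknown mode: " + mode)


print(padd_byte(120, 50, "signed"))     # the PADDSB case from the text
print(padd_byte(120, 50, "unsigned"))   # the PADDUSB case from the text
print(padd_byte(200, 100, "wrap"))      # 300 truncated to 8 bits
print(padd_byte(200, 100, "unsigned"))  # 300 clamped to the unsigned maximum
```

The first two calls reproduce the 127 and 170 results from the example in the description.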

Flags:
    oszapc
    ------

Code:
    PADDB:
        For (q = 0; q < 8; q++)
            operand1.U8[q] += operand2.U8[q];

    PADDW:
        For (q = 0; q < 4; q++)
            operand1.U16[q] += operand2.U16[q];

    PADDD:
        operand1.U32[0] += operand2.U32[0];
        operand1.U32[1] += operand2.U32[1];

    PADDSB:
        For (q = 0; q < 8; q++)
            operand1.S8[q] = Saturate(operand1.S8[q] + operand2.S8[q], S8);

    PADDSW:
        For (q = 0; q < 4; q++)
            operand1.S16[q] = Saturate(operand1.S16[q] + operand2.S16[q], S16);

    PADDUSB:
        For (q = 0; q < 8; q++)
            operand1.U8[q] = Saturate(operand1.U8[q] + operand2.U8[q], U8);

    PADDUSW:
        For (q = 0; q < 4; q++)
            operand1.U16[q] = Saturate(operand1.U16[q] + operand2.U16[q], U16);

Sample:

___________________________________________________________________________________________________
PAND        Packed bitwise AND
    MMX Instruction

Description:
    This does a 64bit AND on both operands, storing the result in operand1. Since there is no
    carry, overflow, underflow or other considerations, this works properly no matter what data
    types are in the operands.

    Each bit in operand1 is set only if the corresponding bit in both operand1 and operand2 were
    set.

    AND affects bits in the following manner:
        Dest  Source  Result
        0     0       0
        0     1       0
        1     0       0
        1     1       1

Flags:
    oszapc
    ------

Code:
    operand1 &= operand2;

Sample:

___________________________________________________________________________________________________
PANDN       Packed bitwise AND Not
    MMX Instruction

Description:
    First performs a bitwise NOT on operand1, then ANDs operand2 with operand1. Since there is no
    carry, overflow, underflow or other considerations, this works properly no matter what data
    types are in the operands.

    The end result is that every bit that was cleared in the first operand and set in the second
    operand ends up being set, otherwise the resulting bit is cleared.

    [3] has an example of this that does not make any sense.
    It shows the same bit pattern three times, for operand1, operand2 and the result, which would
    not even be true if it happened that way because of how this mnemonic functions.

    This instruction affects bits in the following manner:
        Dest  Source  Result
        0     0       0
        0     1       1
        1     0       0
        1     1       0

Flags:
    oszapc
    ------

Code:
    operand1 = (~operand1) & operand2;

Sample:
    ; We have two 64bit values:
    ;   One is a bitmap of flags (Flags)
    ;   The other is a bitmap of those flags that we want to turn off in Flags (FlagsDisable)
    movq    mm0, [FlagsDisable]     ; Get the flags to disable
    pandn   mm0, [Flags]            ; NOT the flags to disable then AND in the Flags
    movq    [Flags], mm0            ; Store the result into Flags

___________________________________________________________________________________________________
PAVGx       Packed AVeraGe
    PAVGB - Packed AVeraGe by Byte
    PAVGW - Packed AVeraGe by Word
    SSE Instruction

Description:
    Calculates an average for every value in operand1 with the corresponding value in operand2.
    This is done by adding the two values together, adding one, then bit shifting it right by one.
    This essentially causes the average to "round up".

    [3] states that if both values are 1, then their average somehow becomes 2. I find this very
    wrong (how do you get 2 for an average with 1 and 1???) and can find no explanation for it,
    since (1 + 1 + 1) >> 1 = 3 >> 1 = 1. Even using the pseudo code from Intel, 1 and 1 would have
    an average of 1 according to this instruction.
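
The rounding rule is easy to check by hand. This is a minimal Python sketch of the byte form,
assuming both operands are given as lists of eight unsigned bytes. The name pavgb is used here
for a hypothetical helper function, not for the instruction itself.

```python
def pavgb(a_bytes, b_bytes):
    """Model PAVGB: per-byte average that rounds up on a tie."""
    # Python integers do not overflow, so no widening to 16 bits is needed here.
    return [(a + b + 1) >> 1 for a, b in zip(a_bytes, b_bytes)]


averages = pavgb([1, 1, 0, 255, 254, 10, 3, 0],
                 [1, 2, 0, 255, 255, 20, 4, 1])
print(averages)
```

The first lane is the 1 and 1 case discussed above and yields 1. The second lane (1 and 2)
shows the round up, yielding 2 instead of 1.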

Flags:
    oszapc
    ------

Code:
    PAVGB:
        For (q = 0; q < 8; q++)
            operand1.U8[q] = (TypeCast(operand1.U8[q], U16) + TypeCast(operand2.U8[q], U16) + 1) >> 1;

    PAVGW:
        For (q = 0; q < 4; q++)
            operand1.U16[q] = (TypeCast(operand1.U16[q], U32) + TypeCast(operand2.U16[q], U32) + 1) >> 1;

Sample:

___________________________________________________________________________________________________
PCMPccx     Packed CoMPare
    PCMPEQB - Packed CoMPare for EQuality by Byte
    PCMPEQW - Packed CoMPare for EQuality by Word
    PCMPEQD - Packed CoMPare for EQuality by Dword
    PCMPGTB - Packed CoMPare for Greater Than by Byte (Signed)
    PCMPGTW - Packed CoMPare for Greater Than by Word (Signed)
    PCMPGTD - Packed CoMPare for Greater Than by Dword (Signed)
    MMX Instruction

Description:
    Compares each Byte/Word/DWord in operand1 with the matching Byte/Word/DWord in operand2 for the
    condition specified. If the condition is true, that element of operand1 becomes -1 (all 1s),
    otherwise it becomes 0 (all 0s).

Flags:
    oszapc
    ------

Code:
    PCMPEQB:
        // Assumes result is a U64
        For (q = 0; q < 8; q++)
            result.S8[q] = (operand1.U8[q] == operand2.U8[q]) ? -1 : 0;
        operand1 = result;

    PCMPEQW:
        // Assumes result is a U64
        For (q = 0; q < 4; q++)
            result.S16[q] = (operand1.U16[q] == operand2.U16[q]) ? -1 : 0;
        operand1 = result;

    PCMPEQD:
        // Assumes result is a U64
        For (q = 0; q < 2; q++)
            result.S32[q] = (operand1.U32[q] == operand2.U32[q]) ? -1 : 0;
        operand1 = result;

    PCMPGTB:
        // Assumes result is a U64
        For (q = 0; q < 8; q++)
            result.S8[q] = (operand1.S8[q] > operand2.S8[q]) ? -1 : 0;
        operand1 = result;

    PCMPGTW:
        // Assumes result is a U64
        For (q = 0; q < 4; q++)
            result.S16[q] = (operand1.S16[q] > operand2.S16[q]) ? -1 : 0;
        operand1 = result;

    PCMPGTD:
        // Assumes result is a U64
        For (q = 0; q < 2; q++)
            result.S32[q] = (operand1.S32[q] > operand2.S32[q]) ? -1 : 0;
        operand1 = result;

Sample:

___________________________________________________________________________________________________
PEXTRW      Packed EXTRact Word
    SSE Instruction

Description:
    Extracts the specified word from the MMX register, zero extends it, and stores it into the
    specified general register.
    Only the low two bits of the immediate are used.

Flags:
    oszapc
    ------

Code:
    temp = operand3 & 0x03;
    operand1 = TypeCast(operand2.U16[temp], U32);

Sample:

___________________________________________________________________________________________________
PINSRW      Packed INSeRt Word
    SSE Instruction

Description:
    Inserts the specified word to the MMX register, using the low word of the specified register or
    the specified 16bit value. Only the low two bits of the immediate are used.

Flags:
    oszapc
    ------

Code:
    temp = operand3 & 0x03;
    operand1.U16[temp] = operand2.U16;

Sample:

___________________________________________________________________________________________________
PMADDWD     Packed Multiply ADD Word to Dword
    MMX Instruction

Description:
    Multiplies the four signed 16bit values in the first operand with the four signed 16bit values
    in the second operand, which results in four signed 32bit values. The low two 32bit values are
    added together, then the high two 32bit values are added together, and the result is stored in
    the first operand.

Flags:
    oszapc
    ------

Exceptions:

Code:
    // Assume that temp is: S32 temp[4];
    For (q = 0; q < 4; q++) {
        temp[q] = operand1.S16[q] * operand2.S16[q];
    }
    operand1.S32[0] = temp[0] + temp[1];
    operand1.S32[1] = temp[2] + temp[3];

Sample:

___________________________________________________________________________________________________
PMAXx       Packed MAXimum
    PMAXSW - Packed MAXimum Signed Word
    PMAXUB - Packed MAXimum Unsigned Byte
    SSE Instruction

Description:
    The first operand's 16bit or 8bit values all become the maximum of those values compared to the
    values of the second operand.

Flags:
    oszapc
    ------

Code:
    PMAXSW:
        For (q = 0; q < 4; q++) {
            If (operand1.S16[q] < operand2.S16[q])
                operand1.S16[q] = operand2.S16[q];
        }

    PMAXUB:
        For (q = 0; q < 8; q++) {
            If (operand1.U8[q] < operand2.U8[q])
                operand1.U8[q] = operand2.U8[q];
        }

Sample:

___________________________________________________________________________________________________
PMINx       Packed MINimum
    PMINSW - Packed MINimum Signed Word
    PMINUB - Packed MINimum Unsigned Byte
    SSE Instruction

Description:
    The first operand's 16bit or 8bit values all become the minimum of those values compared to the
    values of the second operand.

Flags:
    oszapc
    ------

Code:
    PMINSW:
        For (q = 0; q < 4; q++) {
            If (operand1.S16[q] > operand2.S16[q])
                operand1.S16[q] = operand2.S16[q];
        }

    PMINUB:
        For (q = 0; q < 8; q++) {
            If (operand1.U8[q] > operand2.U8[q])
                operand1.U8[q] = operand2.U8[q];
        }

Sample:

___________________________________________________________________________________________________
PMOVMSKB    Packed MOVe MaSK Byte
    SSE Instruction

Description:
    Takes the high bit of each byte in the second operand and stores it as a zero extended byte in
    the first operand. The bits are stored from the low bit (from the low byte) to the high bit
    (from the high byte).

Flags:
    oszapc
    ------

Code:
    result = 0;
    For (q = 0; q < 8; q++) {
        If (operand2.U8[q] & 0x80)
            result |= 1 << q;
    }
    operand1 = result;

Sample:

___________________________________________________________________________________________________
PMULx       Packed MULtiply
    PMULHUW - Packed MULtiply High Unsigned Word (SSE Instruction)
    PMULHW  - Packed MULtiply High signed Word (MMX Instruction)
    PMULLW  - Packed MULtiply Low Word (MMX Instruction)
    MMX/SSE Instruction

Description:
    Multiplies all of the words in the two values together, keeping only one word (high or low) of
    the result.

    PMULHUW performs an unsigned multiply and keeps the high words in the first operand.
    PMULHW performs a signed multiply and keeps the high words.
    PMULLW performs an unsigned multiply and keeps the low words.

    [3] shows in their example that the low word is kept for PMULHUW, but the text and example for
    the opcode (as well as the name of it) suggest that the high word is kept. Since it makes no
    sense for it to duplicate the functionality of PMULLW, I am assuming that it uses the high
    unsigned word.

Flags:
    oszapc
    ------

Code:
    PMULHUW:
        For (q = 0; q < 4; q++) {
            temp = operand1.U16[q] * operand2.U16[q];
            operand1.U16[q] = temp.U16[1];
        }

    PMULHW:
        For (q = 0; q < 4; q++) {
            temp = operand1.S16[q] * operand2.S16[q];
            operand1.S16[q] = temp.S16[1];
        }

    PMULLW:
        For (q = 0; q < 4; q++)
            operand1.U16[q] = (operand1.U16[q] * operand2.U16[q]).U16;

Sample:

___________________________________________________________________________________________________
POR         Packed bitwise OR
    MMX Instruction

Description:
    This causes a 64bit OR of operand2 to operand1. Since carry is not necessary, this works for
    all data types.

    Each bit in operand1 is set if the corresponding bit is set in either operand1 or operand2.

    OR affects bits in the following manner:
        Dest  Source  Result
        0     0       0
        0     1       1
        1     0       1
        1     1       1

Flags:
    oszapc
    ------

Code:
    operand1 |= operand2;

Sample:

___________________________________________________________________________________________________
PSADBW      Packed Sum of Absolute Differences Byte to Word
    SSE Instruction

Description:
    This instruction takes the absolute value of the difference between each unsigned byte in
    operand1 and the matching byte in operand2, then adds all eight values together, storing the
    result into operand1. Since the maximum is 0x07F8 (255, or 0xFF, times 8), only the low word of
    operand1 has a value. The upper three words of operand1 are cleared to zero.
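
The bound on the result can be confirmed with a small model. This is a minimal Python sketch,
assuming both operands are lists of eight unsigned bytes and that the result is returned as four
words in low to high order. The name psadbw is used here for a hypothetical helper function.

```python
def psadbw(a_bytes, b_bytes):
    """Model PSADBW: sum of absolute byte differences in the low word."""
    total = sum(abs(a - b) for a, b in zip(a_bytes, b_bytes))
    # Only the low word carries the sum; the upper three words are zero.
    return [total, 0, 0, 0]


worst_case = psadbw([0] * 8, [255] * 8)
typical = psadbw([10, 20, 30, 40, 50, 60, 70, 80],
                 [80, 70, 60, 50, 40, 30, 20, 10])
print(hex(worst_case[0]))
print(typical[0])
```

The worst case (every byte differing by 255) gives the 0x07F8 maximum quoted in the description.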

Flags:
    oszapc
    ------

Code:
    // Assumes result is 64bit
    result = 0;
    For (q = 0; q < 8; q++)
        result += Abs(operand1.U8[q] - operand2.U8[q]);
    operand1 = result;

Sample:

___________________________________________________________________________________________________
PSHUFW      Packed SHUFfle Word
    SSE Instruction

Description:
    Using the immediate byte value, every word of operand1 is replaced with a specific word in
    operand2 using 2 bit selectors:

    Immediate value:
        BITS
        0,1 - 03 - Which word from operand2 will go into word 0 of operand1
        2,3 - 0C - Which word from operand2 will go into word 1 of operand1
        4,5 - 30 - Which word from operand2 will go into word 2 of operand1
        6,7 - C0 - Which word from operand2 will go into word 3 of operand1

Flags:
    oszapc
    ------

Code:
    //value = Immediate
    operand1.U16[0] = operand2.U16[value & 0x03];
    operand1.U16[1] = operand2.U16[(value & 0x0C) >> 2];
    operand1.U16[2] = operand2.U16[(value & 0x30) >> 4];
    operand1.U16[3] = operand2.U16[value >> 6];

Sample:

___________________________________________________________________________________________________
PSLLx       Packed Shift Left Logical
    PSLLW - Packed Shift Left Logical Word
    PSLLD - Packed Shift Left Logical Dword
    PSLLQ - Packed Shift Left Logical Qword
    MMX Instruction

Description:
    Shifts each data element in operand1 to the left by the number of bits specified in operand2,
    shifting in 0s at the bottom.

    Note that unlike most MMX instructions, if the second value is a memory or MMX register, the
    WHOLE register is used to shift each value.

    Unlike most shift operations, if the shift is greater than or equal to the number of bits in
    each element (16 or more for 16bit, 32 or more for 32bit, etc), then the result is 0.
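
The contrast with a normal shift can be shown side by side. This is a minimal Python sketch,
assuming operand1 is a list of four unsigned words in low to high order. The helpers psllw and
shl32 are hypothetical names. shl32 models the scalar 32bit SHL, which masks its count to 5 bits
on the 80286 and above.

```python
def psllw(words, count):
    """Model PSLLW: shift four 16-bit lanes left by the same count."""
    if count >= 16:
        # A count that covers the whole element clears every lane
        return [0, 0, 0, 0]
    return [(w << count) & 0xFFFF for w in words]


def shl32(value, count):
    """Model scalar 32-bit SHL, where only the low 5 bits of the count are used."""
    return (value << (count & 31)) & 0xFFFFFFFF


shifted = psllw([0x0001, 0x8001, 0x00FF, 0xFFFF], 4)
cleared = psllw([0x0001, 0x8001, 0x00FF, 0xFFFF], 16)
scalar = shl32(1, 32)   # count 32 is masked to 0, so the value is unchanged
print([hex(w) for w in shifted])
print(cleared)
print(scalar)
```

A count of 16 clears every word in the packed form, whereas the scalar shift by 32 leaves its
operand untouched because of the count masking.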

Flags:
    oszapc
    ------

Code:
    PSLLW:
        If (operand2 >= 16) {
            operand1 = 0;
        } Else {
            operand1.U16 <<= operand2;
            operand1.U16[1] <<= operand2;
            operand1.U16[2] <<= operand2;
            operand1.U16[3] <<= operand2;
        }

    PSLLD:
        If (operand2 >= 32) {
            operand1 = 0;
        } Else {
            operand1.U32 <<= operand2;
            operand1.U32[1] <<= operand2;
        }

    PSLLQ:
        If (operand2 >= 64)
            operand1 = 0;
        Else
            operand1 <<= operand2;

Sample:

___________________________________________________________________________________________________
PSRAx       Packed Shift Right Arithmetic
    PSRAW - Packed Shift Right Arithmetic Word
    PSRAD - Packed Shift Right Arithmetic Dword
    MMX Instruction

Description:
    Shifts each data element in operand1 to the right by the number of bits specified in operand2,
    preserving the high bit after each shift.

    Note that unlike most MMX instructions, if the second value is a memory or MMX register, the
    WHOLE register is used to shift each value.

    Unlike most shift operations, if the shift is greater than or equal to the number of bits in
    each element (16 or more for 16bit, 32 or more for 32bit, etc), then every bit of the result is
    set the same as the high bit.

Flags:
    oszapc
    ------

Code:
    // ->>= is taken here to mean an arithmetic right shift (the high bit is preserved)
    PSRAW:
        temp = (operand2 >= 16) ? 15 : operand2;
        operand1.U16 ->>= temp;
        operand1.U16[1] ->>= temp;
        operand1.U16[2] ->>= temp;
        operand1.U16[3] ->>= temp;

    PSRAD:
        temp = (operand2 >= 32) ? 31 : operand2;
        operand1.U32 ->>= temp;
        operand1.U32[1] ->>= temp;

Sample:

___________________________________________________________________________________________________
PSRLx       Packed Shift Right Logical
    PSRLW - Packed Shift Right Logical Word
    PSRLD - Packed Shift Right Logical Dword
    PSRLQ - Packed Shift Right Logical Qword
    MMX Instruction

Description:
    Shifts each data element in operand1 to the right by the number of bits specified in operand2,
    shifting in 0s at the top.

    Note that unlike most MMX instructions, if the second value is a memory or MMX register, the
    WHOLE register is used to shift each value.

    Unlike most shift operations, if the shift is greater than or equal to the number of bits in
    each element (16 or more for 16bit, 32 or more for 32bit, etc), then the result is 0.

Flags:
    oszapc
    ------

Code:
    PSRLW:
        If (operand2 >= 16) {
            operand1 = 0;
        } Else {
            operand1.U16 >>= operand2;
            operand1.U16[1] >>= operand2;
            operand1.U16[2] >>= operand2;
            operand1.U16[3] >>= operand2;
        }

    PSRLD:
        If (operand2 >= 32) {
            operand1 = 0;
        } Else {
            operand1.U32 >>= operand2;
            operand1.U32[1] >>= operand2;
        }

    PSRLQ:
        If (operand2 >= 64)
            operand1 = 0;
        Else
            operand1 >>= operand2;

Sample:

___________________________________________________________________________________________________
PSUBx       Packed SUBtract
    PSUBB   - Packed SUBtract with wrap-around by Byte
    PSUBW   - Packed SUBtract with wrap-around by Word
    PSUBD   - Packed SUBtract with wrap-around by Dword
    PSUBSB  - Packed SUBtract with Saturation by Byte
    PSUBSW  - Packed SUBtract with Saturation by Word
    PSUBUSB - Packed SUBtract with Unsigned Saturation by Byte
    PSUBUSW - Packed SUBtract with Unsigned Saturation by Word
    MMX Instruction

Description:
    Each value in the second operand is subtracted from the matching value in the first operand.

    The normal versions of this instruction (PSUBB, PSUBW, PSUBD) truncate the value to the size
    specified just like the normal SUB opcode.

    Saturation versions of this instruction (PSUBSB, PSUBSW) will convert values outside of the
    maximum or minimum signed values permitted for their data sizes (-128 to 127 for bytes, -32768
    to 32767 for words) to the maximum or minimum values respectively.

    Unsigned saturated versions (PSUBUSB, PSUBUSW) are the same as the saturated versions, except
    they use unsigned boundaries (0 to 255 for bytes, 0 to 65535 for words) instead of signed
    boundaries.

    For example, if we use PSUBSB and it ends up subtracting 120 from -50 (-50 - 120 = -170), since
    -170 is less than -128, -128 will be the result. If we use PSUBUSB and subtract 120 from 50
    (50 - 120 = -70), since -70 is less than 0, 0 will be the result.

Flags:
    oszapc
    ------

Code:
    PSUBB:
        For (q = 0; q < 8; q++)
            operand1.U8[q] -= operand2.U8[q];

    PSUBW:
        For (q = 0; q < 4; q++)
            operand1.U16[q] -= operand2.U16[q];

    PSUBD:
        operand1.U32[0] -= operand2.U32[0];
        operand1.U32[1] -= operand2.U32[1];

    PSUBSB:
        For (q = 0; q < 8; q++)
            operand1.S8[q] = Saturate(operand1.S8[q] - operand2.S8[q], S8);

    PSUBSW:
        For (q = 0; q < 4; q++)
            operand1.S16[q] = Saturate(operand1.S16[q] - operand2.S16[q], S16);

    PSUBUSB:
        For (q = 0; q < 8; q++)
            operand1.U8[q] = Saturate(operand1.U8[q] - operand2.U8[q], U8);

    PSUBUSW:
        For (q = 0; q < 4; q++)
            operand1.U16[q] = Saturate(operand1.U16[q] - operand2.U16[q], U16);

Sample:

___________________________________________________________________________________________________
PUNPCKx     Packed UNPaCK
    PUNPCKHBW - Packed UNPaCK High Byte to Word
    PUNPCKHWD - Packed UNPaCK High Word to Dword
    PUNPCKHDQ - Packed UNPaCK High Dword to Qword
    PUNPCKLBW - Packed UNPaCK Low Byte to Word
    PUNPCKLWD - Packed UNPaCK Low Word to Dword
    PUNPCKLDQ - Packed UNPaCK Low Dword to Qword
    MMX Instruction

Description:
    Each of these interleaves bytes/words/dwords of data from each operand into a single 64bit
    value. The high opcodes (PUNPCKHBW, PUNPCKHWD, PUNPCKHDQ) use the high 32bits of both operands,
    whereas the low opcodes (PUNPCKLBW, PUNPCKLWD, PUNPCKLDQ) use the low 32bits of both operands.
    Also, the low opcodes only read the low 32bits of the second operand if it is a memory operand;
    the high opcodes read the whole 64bit value, even though they only use the upper 32bits.

    The interleaving starts reading from the first operand first, then the second, storing the
    result from low to high order.

    For example, if we are using PUNPCKHBW, operand1 contains (from low byte to high byte)
    01 02 03 04 05 06 07 08 and operand2 contains 11 12 13 14 15 16 17 18, the result would be
    (from low byte to high byte) 05 15 06 16 07 17 08 18. Notice that the low 4 bytes in both
    values are ignored.
Using the same two operands, PUNPCKLWD would result in (from low byte to high byte) 01 02 11 12 03 04 13 14. Flags: oszapc ------ Code: PUNPCKHBW: // Assumes result is a U64 result.U8 = operand1.U8[4]; result.U8[1] = operand2.U8[4]; result.U8[2] = operand1.U8[5]; result.U8[3] = operand2.U8[5]; result.U8[4] = operand1.U8[6]; result.U8[5] = operand2.U8[6]; result.U8[6] = operand1.U8[7]; result.U8[7] = operand2.U8[7]; operand1 = result; PUNPCKHWD: // Assumes result is a U64 result.U16 = operand1.U16[2]; result.U16[1] = operand2.U16[2]; result.U16[2] = operand1.U16[3]; result.U16[3] = operand2.U16[3]; operand1 = result; PUNPCKHDQ: // Assumes result is a U64 result.U32 = operand1.U32[1]; result.U32[1] = operand2.U32[1]; operand1 = result; PUNPCKLBW: // Assumes result is a U64 result.U8 = operand1.U8; result.U8[1] = operand2.U8; result.U8[2] = operand1.U8[1]; result.U8[3] = operand2.U8[1]; result.U8[4] = operand1.U8[2]; result.U8[5] = operand2.U8[2]; result.U8[6] = operand1.U8[3]; result.U8[7] = operand2.U8[3]; operand1 = result; PUNPCKLWD: // Assumes result is a U64 result.U16 = operand1.U16; result.U16[1] = operand2.U16; result.U16[2] = operand1.U16[1]; result.U16[3] = operand2.U16[1]; operand1 = result; PUNPCKLDQ: // Assumes result is a U64 result.U32 = operand1.U32; result.U32[1] = operand2.U32; operand1 = result; Sample: ___________________________________________________________________________________________________ PXOR Packed bitwise eXclusive OR MMX Instruction Description: operand1 is XOR'd by operand2. Since no carry is necessary from one operand to the next, this single instruction works for all data sizes. Each bit in operand1 is set only if one or the other, but not both, bits are set in the two operands. 
XOR affects bits in the following manner: Dest Source Result 0 0 0 0 1 1 1 0 1 1 1 0 Flags: oszapc ------ Code: operand1 ^= operand2; Sample: ___________________________________________________________________________________________________ xxx The XxX instruction Description: Flags: oszapc ------ Code: Sample: *************************************************************************************************** Data Formats There are numerous data formats and tables used in the x86 processor family, in particular because of protected mode (286 and above). Note: Many documents show "Top Down" charts or tables for many of the data structures listed below, meaning they start at the highest address and work their way down. Since 99% of the time, I think in a bottom up format for memory addressing, this is confusing and backwards to me. Because of this, *ALL* of the tables/charts below are listed in a bottom up format (starting from the lowest address and moving up) and may appear backwards if you are accustomed to how other documentation displays them (such as [4]). All addresses in the chart are represented with hexadecimal instead of decimal. --------------------------------------------------------------------------------------------------- Integer Values KERBLUH - List the various integer value types that may occur --------------------------------------------------------------------------------------------------- Binary Coded Decimal (BCD) To make representing decimal values easier, the Intel processors have a number of instructions for converting values to and from binary coded decimal (KERBLUH - List of instructions). There are two general types of BCD: unpacked BCD values and packed BCD values. Unpacked BCD values are bytes that store a single digit as a value between 0 and 9, and are most useful for converting numbers to and from ASCII for display. 
Packed BCD values store two digits in a single byte, using 0 through 9 for each nibble (4 bits), which halves the storage needed compared to unpacked BCD. Adding 1 to a packed BCD value of 0x09 results in 0x10 (as if it were decimal) instead of 0x0A (the hexadecimal equivalent). Because these differ from binary values, special instructions are generally necessary either before or after every arithmetic operation since arithmetic instructions themselves do not support BCD values. The Intel x87 floating point processors also support a special 18 digit BCD format for loading from and storing to the FPU. These values use 80 bits of memory. The top bit is a sign bit and the bottom 72 bits are the 18 BCD digits. Bits 72 to 78 are unused (KERBLUH - Does anything interesting happen if you use them? Or are they simply ignored?) --------------------------------------------------------------------------------------------------- Floating Point Values The Intel x87 processors (and 486 and above) use IEEE 754 and 854 standard floating point values. They use three data sizes which differ primarily in bit depth (and therefore level of precision): 32bit or single precision, 64bit or double precision, and 80bit or extended. 80bit values are designed only for the extended precision used by the FPU itself and not for normal calculations. Each value has a single bit for a sign, a number of bits for an exponent, and the remaining bits for a significand. In some cases, the high bit of the significand is called the Integer or J-bit. These are given a different number of bits per data type: Bit Exponent Significand Exp Bias Range Range Depth Bits Bits Constant (binary) (decimal) 32 8 23 127 2^-126 to 2^127 1.18e-38 to 3.40e+38 64 11 52 1023 2^-1022 to 2^1023 2.23e-308 to 1.79e+308 80 15 64 16383 2^-16382 to 2^16383 3.37e-4932 to 1.18e+4932 [2] argues with itself on how large significands are!
It says in one place that 32bit and 64bit floating point values have 24 bits and 53 bits respectively for significands (see section 7.3.4.2), yet most other data says they have 23 bits and 52 bits for the significand. I believe these are meant that there is an implied extra bit of precision for non-denormalized values. Also, [2] states that the Integer bit is only implied in 32bit and 64bit values, but is always present in 80bit values. There are a number of values that may be stored in a floating point value: +/-0, +/-infinity, normalized finite values, denormalized finite values, sNaN and qNaN. To demonstrate samples of each, we are using binary on 32bit values: Value Sample Binary (32bit float) S Exponent Significand +0 0 00000000 00000000000000000000000 -0 1 00000000 00000000000000000000000 +Denormal 0 00000000 0********************** (At least one of the *s must be non-zero) -Denormal 1 00000000 0********************** (At least one of the *s must be non-zero) +Integer 0 00000000 1********************** -Integer 1 00000000 1********************** +Normal 0 ******** *********************** (exponent non-zero and not maximum) -Normal 1 ******** *********************** (exponent non-zero and not maximum) +Infinity 0 11111111 00000000000000000000000 -Infinity 1 11111111 00000000000000000000000 Indefinite 1 11111111 10000000000000000000000 (This is often treated as a normal qNaN as well) sNaN * 11111111 0********************** (At least one of the *s must be non-zero) qNaN * 11111111 1********************** The sign bit works just like the sign bit in integer values. If it is clear, the value is positive; if it is set, the value is negative. The sign bit has no effect on NaN values. The exponent is a biased value to describe where the binary point is located. For 32bit values, the bias is 127; for 64bit, it is 1023; for 80bit, it is 16383. 
The bias value is used to determine where E2+0 is located so that a single unsigned value may be used to represent the exponent. If the exponent is all 0s, then the value is +/-0 (if the significand is all 0s), a denormalized value (if the significand is non-zero, but the high bit of it is clear) or ??? an integer (if the high bit of the significand is set). If the exponent is all 1s, then the value is +/-infinity (if the significand is all 0s), an sNaN (if the significand is non-zero, but the high bit of it is clear), or a qNaN (if the significand is non-zero and the high bit of it is set). The significand holds the actual binary numerical information for the value. For normalized values, an implied 1 (set bit) is used at the beginning of the significand. For denormalized values, the implied 1 does not exist. For all other values, only a zero or non-zero significand matters; its actual bit data is insignificant. KERBLUH - Add info on the integer bit. Because this is confusing, some real world examples are listed below (in 32bit format, for readability purposes): Decimal Binary S Exponent Significand Values without a significand: 0.125 1.0E2-3 0 01111100 (124) 00000000000000000000000 -0.125 -1.0E2-3 1 01111100 (124) 00000000000000000000000 1.0 1.0 0 01111111 (127) 00000000000000000000000 2.0 1.0E2+1 0 10000000 (128) 00000000000000000000000 128.0 1.0E2+7 0 10000110 (134) 00000000000000000000000 Values with a significand: 178.125 1.0110010001E2+7 0 10000110 (134) 01100100010000000000000 4371 1.000100010011E2+12 Values that binary cannot represent properly: 0.1 1.100110011001100110011E2-4 0 01111011 (123) 10011001100110011001100 (this is actually 0.0999999940395355224609375) 1.1 1.000110011001100110011 0 01111111 (127) 00011001100110011001100 (this is actually 1.099999904632568359375) All of this for 0.1: 00011001100110011001100 which is really: 0.09999942779541015625 After normalizing: 0.099999904632568359375 All of this for 1/3: 01010101010101010101010 +0 is
represented by all bits being cleared, -0 by all bits but the sign bit cleared. In most operations the two are identical and it will be noted in the instructions themselves if they are not. Normalized values are how most floating point values are represented. They have an "assumed" 1 at the beginning of the significand. The exponent is between 1 and one less than the highest possible exponent value and they may be positive or negative depending on the sign bit. For example, with a 32bit floating point value, an exponent between 1 and 254 would be for a normalized value. 1.0 would be represented with an exponent of 127 and a significand of all 0s since the 1 is implied. Denormal values are values that are too small to be held in normalized form and are only used in an underflow situation. Unlike normalized values, they do NOT have an "assumed" 1 at the beginning and they may be positive or negative depending on the sign bit. For example, if we had 0001b E-129 as a 32bit value, this would be stored with an exponent of 0 and with the second highest bit of the significand set to 1, all others cleared to 0. Infinity values are those values that are so large they cannot be properly held by the value. They are represented with the largest possible exponent and a significand of all 0s and they may be positive or negative depending on the sign bit. Most math operations on infinity have a result of infinity. Exceptions are listed in the individual instructions. NaN (Not a Number) values are not real numbers. There are two kinds: Quiet (or silently ignored) NaNs or qNaNs, and Signaling (or throws an exception) NaNs or sNaNs. qNaNs are allowed to be used through most arithmetic operations without throwing processor exceptions, while sNaNs will throw an exception the moment they are used.
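To make the encodings above easier to check by hand, here is a small C sketch that splits a 32bit value into its fields and sorts it into the categories from the table. It follows the plain IEEE 754 reading, where any non-zero significand with an exponent of 0 is a denormal; that is an assumption of this sketch, and it does not try to model the questionable "Integer" rows. The names f32_class, classify_bits, unbiased_exponent and float_bits are made up for the example.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    F32_ZERO, F32_DENORMAL, F32_NORMAL, F32_INFINITY, F32_SNAN, F32_QNAN
} f32_class;

/* Classify a raw 32bit pattern: 1 sign bit, 8 exponent bits, 23 significand bits. */
static f32_class classify_bits(uint32_t bits)
{
    uint32_t exponent    = (bits >> 23) & 0xFFu;
    uint32_t significand = bits & 0x7FFFFFu;

    if (exponent == 0)
        return (significand == 0) ? F32_ZERO : F32_DENORMAL;
    if (exponent == 0xFFu) {
        if (significand == 0)
            return F32_INFINITY;
        /* The high bit of the significand separates qNaN (set) from sNaN (clear). */
        return (significand & 0x400000u) ? F32_QNAN : F32_SNAN;
    }
    return F32_NORMAL;
}

/* Remove the bias of 127 to get the real power of two (normalized values only). */
static int unbiased_exponent(uint32_t bits)
{
    return (int)((bits >> 23) & 0xFFu) - 127;
}

/* Copy the bytes of a float into an integer without breaking aliasing rules. */
static uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, float_bits(178.125f) gives 0x43322000, which splits into sign 0, exponent 134 and the significand shown in the examples, and 0xFFC00000 (the Indefinite) classifies as a qNaN.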
qNaNs are represented with the largest possible exponent and the highest bit of the significand being set (1); the sign bit is ignored (except in the case of the Indefinite, see below). sNaNs are represented with the largest possible exponent and a non-zero significand with the highest bit of the significand being clear (0); the sign bit is ignored. sNaNs are only set by software; the processor will never cause a value to become an sNaN. Aside from requiring that at least one bit of an sNaN's significand is set (to distinguish them from infinity), software may use the significand of either type of NaN for other purposes, if desired. qNaNs may be generated as follows:

Source Values                          Result
An sNaN and a qNaN                     The qNaN
Two sNaNs                              The sNaN with the larger significand, converted to a qNaN
Two qNaNs                              The qNaN with the larger significand
An sNaN and a real                     The sNaN converted to a qNaN
A qNaN and a real                      The qNaN
Either is a NaN, and #IA is thrown     The qNaN real indefinite

For each data type, there is also an "Indefinite" encoding. This is a qNaN stored with the sign bit set and the significand being all 0s except the highest bit.
---------------------------------------------------------------------------------------------------
FPU Stored Environment
There are four distinct formats that the FPU Stored Environment may use. These are based on the operand size used for the store/load instructions, as well as what mode the processor is running in. In Virtual-86 and System Management Mode, the real mode versions are used. KERBLUH - Are all the reserved values 0s?
___________________________________________________________________________________________________
Protected Mode 32bit
28 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Control Word | 00 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Status Word | 04 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Tag Word | 08 +-----------------------------------------------+-----------------------------------------------+ | 32bit Instruction Pointer Offset | 0C +--+--+--+--+--+--------------------------------+-----------------------------------------------+ | 0| 0| 0| 0| 0| 11bit Opcode | 16bit Instruction Pointer Selector | 10 +--+--+--+--+--+--------------------------------+-----------------------------------------------+ | 32bit Operand Pointer Offset | 14 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Operand Pointer Selector | 18 +-----------------------------------------------+-----------------------------------------------+ KERBLUH - [2] shows only four 0s above the 11bit Opcode, but there would have to be five, right? ___________________________________________________________________________________________________ Protected Mode 16bit 14 bytes. 
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +---------------------------------------------------------------+ | 16bit Control Word | 00 +---------------------------------------------------------------+ | 16bit Status Word | 02 +---------------------------------------------------------------+ | 16bit Tag Word | 04 +---------------------------------------------------------------+ | 16bit Instruction Pointer Offset | 06 +---------------------------------------------------------------+ | 16bit Instruction Pointer Selector | 08 +---------------------------------------------------------------+ | 16bit Operand Pointer Offset | 0A +---------------------------------------------------------------+ | 16bit Operand Pointer Selector | 0C +---------------------------------------------------------------+ ___________________________________________________________________________________________________ Real Mode 32bit 28 bytes. 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Control Word | 00 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Status Word | 04 +-----------------------------------------------+-----------------------------------------------+ | Reserved | 16bit Tag Word | 08 +-----------------------------------------------+-----------------------------------------------+ | Reserved | Bits 0 to 15 of 32bit Instruction Pointer | 0C +--+--+--+--+-----------------------------------+-----------+--+--------------------------------+ | 0| 0| 0| 0| Bits 16 to 31 of 32bit Instruction Pointer | 0| 11bit Opcode | 10 +--+--+--+--+-----------------------------------+-----------+--+--------------------------------+ | Reserved | Bits 0 to 15 of 32bit Operand Pointer | 14 +--+--+--+--+-----------------------------------+-----------+--+--+--+--+--+--+--+--+--+--+--+--+ | 0| 
0| 0| 0| Bits 16 to 31 of 32bit Operand Pointer | 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 0| 18 +--+--+--+--+-----------------------------------------------+--+--+--+--+--+--+--+--+--+--+--+--+ ___________________________________________________________________________________________________ Real Mode 16bit 14 bytes. 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +---------------------------------------------------------------+ | 16bit Control Word | 00 +---------------------------------------------------------------+ | 16bit Status Word | 02 +---------------------------------------------------------------+ | 16bit Tag Word | 04 +---------------------------------------------------------------+ | Bits 0 to 15 of 20bit Instruction Pointer | 06 +---------------+---+-------------------------------------------+ | IP*,Bits 16-19| 0 | 11bit Opcode | 08 +---------------+---+-------------------------------------------+ | Bits 0 to 15 of 20bit Operand Pointer | 0A +---------------+---+---+---+---+---+---+---+---+---+---+---+---+ |OP*,Bits 16-19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0C +---------------+---+---+---+---+---+---+---+---+---+---+---+---+ * IP for 20bit FPU Instruction Pointer OP for 20bit FPU Operand Pointer Offset --------------------------------------------------------------------------------------------------- FPU Extended Stored Environment Byte Offsets (NOT Bits like other charts) F E D C B A 9 8 7 6 5 4 3 2 1 0 +-------+-------+---------------+-------+-------+-------+-------+ | Rsrvd | CS | IP | FOP | FTW | FSW | FCW | 0000 +-------+-------+---------------+-------+-------+-------+-------+ | Reserved | MXCSR | Rsrvd | DS | DP | 0010 +---------------+-------+-------+-------+-------+---------------+ | Reserved | ST0/MM0 | 0020 +-----------------------+---------------------------------------+ | Reserved | ST1/MM1 | 0030 +-----------------------+---------------------------------------+ | Reserved | ST2/MM2 | 0040 
+-----------------------+---------------------------------------+ | Reserved | ST3/MM3 | 0050 +-----------------------+---------------------------------------+ | Reserved | ST4/MM4 | 0060 +-----------------------+---------------------------------------+ | Reserved | ST5/MM5 | 0070 +-----------------------+---------------------------------------+ | Reserved | ST6/MM6 | 0080 +-----------------------+---------------------------------------+ | Reserved | ST7/MM7 | 0090 +-----------------------+---------------------------------------+ | XMM0 | 00A0 +---------------------------------------------------------------+ | XMM1 | 00B0 +---------------------------------------------------------------+ | XMM2 | 00C0 +---------------------------------------------------------------+ | XMM3 | 00D0 +---------------------------------------------------------------+ | XMM4 | 00E0 +---------------------------------------------------------------+ | XMM5 | 00F0 +---------------------------------------------------------------+ | XMM6 | 0100 +---------------------------------------------------------------+ | XMM7 | 0110 +---------------------------------------------------------------+ | Reserved | 0120 +---------------------------------------------------------------+ | Reserved | 0130 +---------------------------------------------------------------+ | Reserved | 0140 +---------------------------------------------------------------+ | Reserved | 0150 +---------------------------------------------------------------+ | Reserved | 0160 +---------------------------------------------------------------+ | Reserved | 0170 +---------------------------------------------------------------+ | Reserved | 0180 +---------------------------------------------------------------+ | Reserved | 0190 +---------------------------------------------------------------+ | Reserved | 01A0 +---------------------------------------------------------------+ | Reserved | 01B0 
+---------------------------------------------------------------+ | Reserved | 01C0 +---------------------------------------------------------------+ | Reserved | 01D0 +---------------------------------------------------------------+ | Reserved | 01E0 +---------------------------------------------------------------+ | Reserved | 01F0 +---------------------------------------------------------------+ All reserved values and bits should be 0. FCW FPU Control Word. FSW FPU Status Word. FTW FPU Tag Word. Only the lower 8 bits of this value are used and they are a simplified version of the normal FPU tag word. Instead of having eight 2bit values, this is only 8 1bit values. Each bit is either 0 (for empty, which is 11b in the normal FPU Tag Word), or 1 (for valid, which is 00b, 01b or 10b in the normal FPU Tag Word). Also, the order of the bits ignores the current FPU Top value, storing the values in the order of the physical registers on the processor, not the order of the FPU stack (similar to how the MMX instructions work). If FPU Top is 4, for example, then bit 0 is for ST4, bit 1 is for ST5, bit 2 is for ST6, bit 3 is for ST7, bit 4 is for ST0, etc. FXRSTOR tests specific bits in the actual FPU registers to restore the full FPU Tag Word when this is read back in. FOP FPU Opcode. The lower 11 bits are the register's value, the top 5 bits are reserved (0) IP In 32bit mode, this holds the lower 32 bits of the FPU Instruction Pointer. In 16bit mode, it only holds the lower 16 bits and the upper 16 bits are reserved (0). ??? - Or is this the real IP CS The CS register (same as the high 16 bits of the FPU Instruction Pointer ???) DP In 32bit mode, this holds the lower 32 bits of the FPU Operand Pointer. In 16bit mode, it only holds the lower 16 bits and the upper 16 bits are reserved (0). DS The DS register ??? MXCSR The SIMD floating-point Control/Status register. Reserved bits are 0 and must be 0 when FXRSTOR is used or it will throw a general protection exception. 
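The FTW simplification described above can be sketched in C as follows. This assumes the normal tag word keeps the 2bit tag for physical register n in bits 2n and 2n+1, which is how I read it; the function names abridge_tag_word and physical_register are made up for this sketch and the processor obviously does this internally.

```c
#include <stdint.h>

/* Collapse the normal 16bit tag word (eight 2bit tags) into the 8bit form:
   a tag of 11b (empty) becomes 0, any other tag (00b, 01b, 10b) becomes 1. */
static uint8_t abridge_tag_word(uint16_t full_tag)
{
    uint8_t abridged = 0;
    for (int reg = 0; reg < 8; reg++) {
        unsigned tag = (full_tag >> (reg * 2)) & 3u;
        if (tag != 3u)
            abridged |= (uint8_t)(1u << reg);
    }
    return abridged;
}

/* The bits are in physical register order, so ST(i) lives in bit (Top + i) mod 8. */
static unsigned physical_register(unsigned top, unsigned st_index)
{
    return (top + st_index) & 7u;
}
```

With Top at 4, physical_register(4, 0) is 4 and physical_register(4, 4) is 0, matching the ST4/ST0 example given for FTW.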
KERBLUH - Are the ST0/MM0 registers stored in stack order, or real order? ST0/MM0 seems to be a confused representation since MM0 is not necessarily ST0. --------------------------------------------------------------------------------------------------- Task State Segment (TSS) This structure is used whenever the processor switches tasks. The structure should not sit on the border between two pages when paging is used for its selector, unless those two pages are ALWAYS guaranteed to be sequential in RAM and to always be in RAM. The processor only calculates the initial physical address and reads the entire structure from that point. All reserved values should contain 0s. 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 +-----------------------------------------------+-----------------------------------------------+ | Reserved | Previous Task Link | 00 +-----------------------------------------------+-----------------------------------------------+ | ESP0 | 04 +-----------------------------------------------+-----------------------------------------------+ | Reserved | SS0 | 08 +-----------------------------------------------+-----------------------------------------------+ | ESP1 | 0C +-----------------------------------------------+-----------------------------------------------+ | Reserved | SS1 | 10 +-----------------------------------------------+-----------------------------------------------+ | ESP2 | 14 +-----------------------------------------------+-----------------------------------------------+ | Reserved | SS2 | 18 +-----------------------------------------------+-----------------------------------------------+ | CR3 (PDBR) | 1C +-----------------------------------------------------------------------------------------------+ | EIP | 20 +-----------------------------------------------------------------------------------------------+ | EFLAGS | 24
+-----------------------------------------------------------------------------------------------+ | EAX | 28 +-----------------------------------------------------------------------------------------------+ | ECX | 2C +-----------------------------------------------------------------------------------------------+ | EDX | 30 +-----------------------------------------------------------------------------------------------+ | EBX | 34 +-----------------------------------------------------------------------------------------------+ | ESP | 38 +-----------------------------------------------------------------------------------------------+ | EBP | 3C +-----------------------------------------------------------------------------------------------+ | ESI | 40 +-----------------------------------------------------------------------------------------------+ | EDI | 44 +-----------------------------------------------+-----------------------------------------------+ | Reserved | ES | 48 +-----------------------------------------------+-----------------------------------------------+ | Reserved | CS | 4C +-----------------------------------------------+-----------------------------------------------+ | Reserved | SS | 50 +-----------------------------------------------+-----------------------------------------------+ | Reserved | DS | 54 +-----------------------------------------------+-----------------------------------------------+ | Reserved | FS | 58 +-----------------------------------------------+-----------------------------------------------+ | Reserved | GS | 5C +-----------------------------------------------+-----------------------------------------------+ | Reserved | LDT Segment Selector | 60 +-----------------------------------------------+--------------------------------------------+--+ | I/O Map Base Address | Reserved | T| 64 +-----------------------------------------------+--------------------------------------------+--+ Previous Task Link [Dynamic] Holds the 
segment selector of the previous task, useful for IRET instructions. SS0:ESP0, SS1:ESP1, SS2:ESP2 [Static] These are used for the stack pointers when this task is accessed in privilege levels 0, 1 and 2 respectively. They are designed for use when the task switches to a higher level privileged mode, usually for a hardware interrupt routine. CR3 (aka PDBR) [Static] This is the base physical address of the page directory to be used by this task. It is loaded into CR3 when the task is switched to. EIP, EFLAGS, EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, ES, CS, SS, DS, FS, GS [Dynamic] Stores the states of the registers prior to the task switch. T Bit (Debug Trap) [Static] If this bit is set, a debug exception is thrown when a task switch to this task occurs. I/O Map Base Address [Static] A 16bit offset from the base of the TSS to the I/O permission bitmap and interrupt redirection bitmap. --------------------------------------------------------------------------------------------------- Segment Selectors In protected mode, the segment registers are actually selectors, pointers into either the GDT or LDT. These use the following format: F E D C B A 9 8 7 6 5 4 3 2 1 0 +---------------------------------------------------+---+-------+ | Index | TI| RPL | +---------------------------------------------------+---+-------+ Index 13bit. This is a value from 0 to 8191 and is the index into the GDT or LDT that is used for this segment. By simply clearing the bottom three bits of the selector, this also is conveniently the byte offset into the appropriate table. Note that in the GDT, an index of 0 is a NULL table entry and may be used for segments that are not initialized. Accessing memory through a segment register that holds a NULL selector, or loading CS or SS with a NULL selector, will result in a general protection exception (#GP). TI 1bit. Table Indicator, which determines which table this selector is using: 0 - GDT, 1 - LDT RPL 2bit. Requested Privilege Level. KERBLUH - What happens if we request more privilege than we have?
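As a quick illustration of the selector format above, this C sketch pulls the three fields out of a 16bit selector and shows the byte offset trick (clearing the bottom three bits of the selector, since each table entry is 8 bytes). The struct and function names are made up for the example.

```c
#include <stdint.h>

typedef struct {
    unsigned index; /* bits 3 to 15: entry number in the GDT or LDT */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT                      */
    unsigned rpl;   /* bits 0 and 1: requested privilege level      */
} selector_fields;

static selector_fields decode_selector(uint16_t selector)
{
    selector_fields fields;
    fields.index = selector >> 3;
    fields.ti    = (selector >> 2) & 1u;
    fields.rpl   = selector & 3u;
    return fields;
}

/* Masking off TI and RPL leaves index * 8, the byte offset of the table entry. */
static unsigned selector_byte_offset(uint16_t selector)
{
    return selector & 0xFFF8u;
}
```

A selector of 0023h, for instance, decodes to index 4 in the GDT with an RPL of 3, and its descriptor sits at byte offset 20h in the table.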
---------------------------------------------------------------------------------------------------
Segment Descriptors

Reserved values should be 0s. All descriptors should be aligned on 8 byte boundaries for maximum
speed/efficiency.

KERBLUH - What does all this crap mean?

___________________________________________________________________________________________________
Segment Descriptor

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+-----------------------------------------------+-----------------------------------------------+
|    Bits 0 to 15 of the 32bit Base Address     |    Bits 0 to 15 of the 20bit Segment Limit    | 00
+-----------------------+--+--+--+--+-----------+--+-----+--+-----------+-----------------------+
|    Base bits 24-31    | G|DB| 0|Av|SLbits16-19| P| DPL | S|   Type    |    Base bits 16-23    | 04
+-----------------------+--+--+--+--+-----------+--+-----+--+-----------+-----------------------+

Segment Limit: 20bit. Bits 0 to 15 are the word at 00h; bits 16 to 19 are bits 0 to 3 of byte 06h.
This value, along with the G flag, specifies the limit of the segment. If G is cleared, the limit
is counted in bytes, allowing 1 byte to 1MB. If G is set, it is counted in 4KB pages, allowing 4KB
to 4GB. In both cases the stored value is the last valid unit, so the segment holds limit + 1
bytes or pages.

Base Address: 32bit. Bits 0 to 15 are the word at 02h, bits 16 to 23 are byte 04h, and bits 24 to
31 are byte 07h.

Type: 4bit. Segment Type, bits 0 to 3 of byte 05h. Along with the system flag, this determines the
type of the descriptor:

  S cleared (system descriptor):
    0000 - Reserved
    0001 - 16bit TSS (Available)
    0010 - LDT
    0011 - 16bit TSS (Busy)
    0100 - 16bit Call Gate
    0101 - Task Gate
    0110 - 16bit Interrupt Gate
    0111 - 16bit Trap Gate
    1000 - Reserved
    1001 - 32bit TSS (Available)
    1010 - Reserved
    1011 - 32bit TSS (Busy)
    1100 - 32bit Call Gate
    1101 - Reserved
    1110 - 32bit Interrupt Gate
    1111 - 32bit Trap Gate

  S set (code or data segment):
    BIT - VALUE - MEANING
    0 - 1 - A (Accessed) flag. When this segment has been accessed (loaded into a segment
            register), the processor sets this flag. The flag is sticky, meaning it stays set
            until software specifically clears it. This is useful for memory management software
            to know what segments have been accessed in a given time frame, and it is also useful
            for debugging software.
    1 - 2 - Data Segments: W (Write) flag. If this is set, this data segment may be written to.
            Otherwise, it's read only.
            Code Segments: R (Read) flag. If this is set, this code segment may be read from.
            Otherwise, it's execute only.
    2 - 4 - Data Segments: E (Expand Down) flag. If this is set, this data segment is an expand
            down segment: valid offsets run from the limit plus 1 up to FFFFh (if it's 16bit) or
            FFFFFFFFh (if it's 32bit), instead of from 0 up to the limit.
            Code Segments: C (Conforming) flag. If this is set, this code segment is a conforming
            segment, meaning that a less privileged program (with a higher CPL) may transfer
            directly to code in this segment without having to go through a call gate. The code
            then runs at the caller's CPL, not at the segment's DPL. This is useful for system
            code that does not need to be run at a high privilege level, and that will be shared
            with application software.
    3 - 8 - If this is clear, this is a data segment. If this is set, this is a code segment.

  The stack segment (SS) must be a read/write capable data segment. If SS is loaded with a
  read-only data segment or a code segment, a general protection exception (#GP) will occur.

S: 1bit. System Flag, bit 4 of byte 05h. If it is clear, then it is a system descriptor. If it is
set, then this is a code or data segment. See Type above for full details.

DPL: 2bit. Descriptor Privilege Level, bits 5 and 6 of byte 05h. This is the privilege level the
segment runs at. (see CPL for more information)

P: 1bit. Segment Present, bit 7 of byte 05h. If this bit is set, this descriptor is valid and may
be used. When it is clear, it is an empty or "not present" segment and may use the format
described below under "Not Present Descriptor". If a program loads or accesses a "not present"
segment, a segment not present exception (#NP) is thrown. This is useful for memory management
that supports virtual memory, since it can clear the P flag of a segment that has been moved out
of memory and, upon receiving the #NP exception, reload the segment as necessary.

Av: 1bit. Available, bit 4 of byte 06h. It may be used for anything you want it to mean. The
processor simply ignores it.

DB: (aka D/B) 1bit. Default operand size, bit 6 of byte 06h. If this is clear, then 16bit operands
and addressing are the default for this segment. If it is set, then 32bit operands and addressing
are the default. Depending on the type of segment this is, it may also have other meanings:
  o Executable Code Segment: The flag is called D (Default) and works exactly as described above.
  o Stack Segment (used by SS): The flag is called B (Big) and specifies whether SP is used (if it
    is clear) or ESP is used (if it is set). If the stack segment is an expand-down data segment,
    this also controls the upper bound.
  o Expand-down Data Segment: The flag is called B and specifies whether the upper bound of the
    segment is at the 16bit upper limit (0FFFFh, when clear) or 32bit upper limit (0FFFFFFFFh,
    when set).

G: 1bit. Granularity, bit 7 of byte 06h. If this is clear, the segment limit is counted in bytes,
allowing 1 byte to 1MB. If it is set, the segment limit is counted in 4KB pages, allowing 4KB to
4GB.
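To make the byte layout above concrete, here is a rough sketch (not from any of the sources) that packs the fields of a code or data Segment Descriptor into its 8 bytes. The function and parameter names (pack_segment_descriptor, effective_limit, seg_type, avl) are made up for this example, and effective_limit only covers normal segments, not expand-down ones.

```python
import struct

def pack_segment_descriptor(base, limit, seg_type, s=1, dpl=0, present=1,
                            avl=0, db=1, g=0):
    """Return the 8 descriptor bytes (hypothetical helper, layout as in the diagram)."""
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must fit in 32 bits")
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    # Byte 05h: P (bit 7), DPL (bits 5-6), S (bit 4), Type (bits 0-3)
    access = (present << 7) | (dpl << 5) | (s << 4) | (seg_type & 0xF)
    # Byte 06h: G (bit 7), DB (bit 6), reserved 0 (bit 5), Av (bit 4), limit bits 16-19
    flags_limit = (g << 7) | (db << 6) | (avl << 4) | (limit >> 16)
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,        # word at 00h: limit bits 0-15
        base & 0xFFFF,         # word at 02h: base bits 0-15
        (base >> 16) & 0xFF,   # byte 04h: base bits 16-23
        access,                # byte 05h
        flags_limit,           # byte 06h
        (base >> 24) & 0xFF,   # byte 07h: base bits 24-31
    )

def effective_limit(limit, g):
    """Highest valid byte offset of a normal (not expand-down) segment."""
    return (limit << 12) | 0xFFF if g else limit

# Flat 4GB ring 0 code segment: type 1010b = code, readable, not conforming.
flat_code = pack_segment_descriptor(base=0, limit=0xFFFFF, seg_type=0b1010, db=1, g=1)
print(flat_code.hex())                   # ffff0000009acf00
print(hex(effective_limit(0xFFFFF, 1)))  # 0xffffffff
```

The bytes come out in memory order (byte 00h first), so the result can be copied straight into a table image.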
___________________________________________________________________________________________________
Not Present Descriptor

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+-----------------------------------------------------------------------------------------------+
|                                           Available                                           | 00
+-----------------------------------------------+--+-----+--+-----------+-----------------------+
|                   Available                   | 0| DPL | S|   Type    |       Available       | 04
+-----------------------------------------------+--+-----+--+-----------+-----------------------+

___________________________________________________________________________________________________
Task State Segment (TSS) Descriptor

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+-----------------------------------------------+-----------------------------------------------+
|    Bits 0 to 15 of the 32bit Base Address     |    Bits 0 to 15 of the 20bit Segment Limit    | 00
+-----------------------+--+--+--+--+-----------+--+-----+--+--+--+--+--+-----------------------+
|    Base bits 24-31    | G| 0| 0|Av|SLbits16-19| P| DPL | 0| 1| 0| B| 1|    Base bits 16-23    | 04
+-----------------------+--+--+--+--+-----------+--+-----+--+--+--+--+--+-----------------------+

See the normal Segment Descriptor for a description of most fields.

B: 1bit. Busy flag KERBLUH

___________________________________________________________________________________________________
Task Gate Descriptor

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+-----------------------------------------------+-----------------------------------------------+
|             TSS Segment Selector              |                   Reserved                    | 00
+-----------------------------------------------+--+-----+--+--+--+--+--+-----------------------+
|                   Reserved                    | P| DPL | 0| 0| 1| 0| 1|       Reserved        | 04
+-----------------------------------------------+--+-----+--+--+--+--+--+-----------------------+

___________________________________________________________________________________________________
Interrupt Gate Descriptor

IDT only. Very similar to a call gate and a trap gate, this points to a specific procedure entry
point for an interrupt.

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+-----------------------------------------------+-----------------------------------------------+
|               Segment Selector                |       Bits 0 to 15 of the 32bit Offset        | 00
+-----------------------------------------------+--+-----+--+--+--+--+--+--+--+--+--------------+
|       Bits 16 to 31 of the 32bit Offset       | P| DPL | 0| D| 1| 1| 0| 0| 0| 0|   Reserved   | 04
+-----------------------------------------------+--+-----+--+--+--+--+--+--+--+--+--------------+

Segment Selector: 16bit. The selector of the code segment where the interrupt handler resides.

Offset: 32bit. Offset of the procedure entry point within the segment named by the Segment
Selector.

D: 1bit. Size of gate: 1 - 32bit, 0 - 16bit

___________________________________________________________________________________________________
Trap Gate Descriptor

IDT only. Very similar to a call gate and an interrupt gate, this points to a specific procedure
entry point for an interrupt.

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
+-----------------------------------------------+-----------------------------------------------+
|               Segment Selector                |       Bits 0 to 15 of the 32bit Offset        | 00
+-----------------------------------------------+--+-----+--+--+--+--+--+--+--+--+--------------+
|       Bits 16 to 31 of the 32bit Offset       | P| DPL | 0| D| 1| 1| 1| 0| 0| 0|   Reserved   | 04
+-----------------------------------------------+--+-----+--+--+--+--+--+--+--+--+--------------+

Segment Selector: 16bit. The selector of the code segment where the interrupt handler resides.

Offset: 32bit. Offset of the procedure entry point within the segment named by the Segment
Selector.

D: 1bit. Size of gate: 1 - 32bit, 0 - 16bit

***************************************************************************************************
Exceptions and Interrupts

The x86 family has 256 interrupt vectors (00h to 0FFh). The first 32 (00h to 1Fh) are reserved for
processor specific faults, exceptions or special interrupts.

KERBLUH - Add a section about the order simultaneous exceptions are handled

In real mode, interrupts are KERBLUH

In protected mode, interrupts are called based on the IDT (Interrupt Descriptor Table) which is
pointed to by the IDTR (Interrupt Descriptor Table Register). This table should be aligned on an
8 byte boundary, like all other descriptor tables. When an interrupt is called, the processor
multiplies the interrupt vector by 8, then uses that as the offset into the IDT. Unlike the GDT,
the first descriptor of the IDT should be defined (for Int 0 or Divide by Zero). Each IDT entry
may be a Task Gate (same format as used in the GDT or LDT), an Interrupt Gate or a Trap Gate.
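As a sanity check on the two gate diagrams and the "vector times 8" rule, here is a small sketch that builds an Interrupt Gate or Trap Gate and computes where a vector's entry sits in the IDT. The names pack_gate, idt_entry_offset, INTERRUPT_GATE and TRAP_GATE are my own for this example and do not come from the Intel manuals.

```python
import struct

INTERRUPT_GATE = 0b110   # low three type bits of byte 05h in the Interrupt Gate diagram
TRAP_GATE = 0b111        # low three type bits of byte 05h in the Trap Gate diagram

def pack_gate(selector, offset, kind, dpl=0, present=1, d=1):
    """Return the 8 bytes of an Interrupt Gate or Trap Gate (hypothetical helper)."""
    if kind not in (INTERRUPT_GATE, TRAP_GATE):
        raise ValueError("kind must be INTERRUPT_GATE or TRAP_GATE")
    # Byte 05h: P (bit 7), DPL (bits 5-6), 0 (bit 4), D (bit 3), type bits (0-2)
    access = (present << 7) | (dpl << 5) | (d << 3) | kind
    return struct.pack(
        "<HHBBH",
        offset & 0xFFFF,          # word at 00h: offset bits 0-15
        selector & 0xFFFF,        # word at 02h: segment selector
        0,                        # byte 04h: reserved and fixed 0 bits
        access,                   # byte 05h
        (offset >> 16) & 0xFFFF,  # word at 06h: offset bits 16-31
    )

def idt_entry_offset(vector):
    """Byte offset of a vector's descriptor from the start of the IDT."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must be 00h to 0FFh")
    return vector * 8

gate = pack_gate(selector=0x0008, offset=0x00123456, kind=INTERRUPT_GATE)
print(gate.hex())                   # 56340800008e1200
print(hex(idt_entry_offset(0x0D)))  # 0x68, the #GP entry
```

Passing TRAP_GATE instead changes only byte 05h (8Fh instead of 8Eh), which matches the single bit of difference between the two diagrams.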
Since descriptors always take 8 bytes each, and there are only 256 interrupts available, the IDT
is generally 2048 (8 * 256) bytes long, though it may be shorter since descriptors are only
required for interrupts and exceptions that may occur (note, this doesn't prevent software from
using any of the undefined interrupts, such as INT 0FFh). All empty descriptor slots should have
the present flag for the descriptor cleared. If an interrupt for a vector beyond the limit of the
IDT occurs, a general protection exception (#GP) is thrown.

KERBLUH - What happens when an interrupt attempts to access an interrupt that is not defined? And
what happens if we don't have enough vectors for a #GP to be thrown on an invalid interrupt?

IDT entries may be a Task Gate, Interrupt Gate or Trap Gate. Interrupt Gates and Trap Gates are
identical, except that entering through an Interrupt Gate clears the IF flag to prevent other
(maskable) interrupts, while entering through a Trap Gate leaves the IF flag unchanged.

KERBLUH - Describe the permissions, differences between the three, how IRET is used to preserve
EFLAGS, etc.

---------------------------------------------------------------------------------------------------
Exception/Interrupt Types

Interrupts are caused by hardware or software and allow the processor to "react" almost
immediately to outside influences. For example, when you press a key on the keyboard, an
interrupt goes off on the processor to tell it to handle the new key press.

Exceptions are caused by instructions throwing specific errors. These include Divide by Zero
(from DIV and IDIV), General Protection faults (accessing privileged code without the proper
permission, for example), and Stack Segment faults (when the stack overflows).

In both cases, all x86 processors guarantee that exceptions and interrupts are processed in the
order in which they are received, regardless of the "out of order" nature of the instructions
executed.

Note that interrupts, debug exceptions and single-step trap exceptions are not allowed to occur
just after a MOV SS, r/m16 or POP SS instruction.

___________________________________________________________________________________________________
Interrupt

Normal interrupts may be caused by hardware (INTR, NMI, etc) or by software (INT n instruction).

KERBLUH - More info on Interrupts.

The return address for the interrupt handler points to the next instruction to be executed. If
the instruction that had just completed was a jump or branch style instruction (JMP, Jcc, CALL,
etc), then the branch target will be the address returned to.

KERBLUH - Add info about NMI and masking normal interrupts

___________________________________________________________________________________________________
Fault Exception

A fault is an error that can generally be corrected, after which the program may continue
execution. The instruction that caused the fault is not executed before the exception handler
runs (except in rare cases, see below); it is executed again after the handler returns.

The return address on the stack for faults points to the instruction that caused the fault, not
to the next instruction. This allows the exception handler to know what instruction caused the
fault. This also means that, after the exception, the instruction is attempted again. Because
there are many cases where this might cause an infinite loop, if the exception handler cannot
correct the problem, the program should be terminated. This behavior is very useful for the Page
Fault exceptions (#PF) which occur when a program references a memory address that is not
currently in memory (such as virtual memory that may be stored on the hard drive when not in
use).

Some exceptions also push an error code (interrupts 08h, 0Ah, 0Bh, 0Ch, 0Dh, 0Eh and 11h; see the
Error Code column under Specific Exceptions). The error code is pushed onto the stack after
CS:(E)IP (KERBLUH - need more info on error codes). Software may call these exceptions using
INT n, but in that case no error code is pushed onto the stack. The handler will then improperly
pop off and discard EIP instead of the error code, and it will not return to the correct location
in memory (causing a real exception, generally #GP).

In extremely rare cases, a fault may occur WHILE an instruction is executing and as such, it will
have partly been executed. [4] gives a POPA as an example, where the POPA would cause the stack
to underflow, throwing a Stack Segment fault (#SS). In this case, some of the registers may have
been updated by the POPA instruction. These are considered programming errors and the operating
system should terminate the offending program in such cases.

KERBLUH - Is there a way to detect this kind of error?
KERBLUH - Get the information on the error code

___________________________________________________________________________________________________
Trap Exception

Traps are reported immediately after the instruction that caused the trap. They are used mostly
for debugging purposes, and allow the program to be continued without affecting the flow of the
program (aside from the time to take the trap).

The return address for the trap handler points to the next instruction to be executed, instead of
the instruction that caused the trap. If the trap was caused by a jump or branch style
instruction (JMP, Jcc, CALL, etc), then the branch target will be the address returned to.

___________________________________________________________________________________________________
Abort Exception

Abort exceptions are severe errors, such as hardware errors or illegal values in system tables.
They are not designed to permit the program to continue execution after they are called and the
handler should instead abort the program, perhaps after collecting diagnostic information about
the state of the processor when the abort happened.
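The INT n problem described under Fault Exception is easier to see with a toy model of the stack. The sketch below is only an illustration: it assumes a same-privilege, 32bit frame (EFLAGS, CS, EIP, then the optional error code), models the stack as a Python list with the top at the end, and uses made-up names (deliver, handler_with_error_code) and made-up values.

```python
def deliver(stack, return_eip, cs, eflags, error_code=None):
    """Push the frame the processor builds; the last item pushed ends up on top."""
    stack.extend([eflags, cs, return_eip])
    if error_code is not None:
        stack.append(error_code)

def handler_with_error_code(stack):
    """A handler written for a vector that normally pushes an error code."""
    stack.pop()            # discard what the handler believes is the error code
    eip = stack.pop()      # IRET then pops EIP, CS and EFLAGS, in that order
    cs = stack.pop()
    eflags = stack.pop()
    return eip, cs, eflags

# Real #GP: the processor pushes an error code, so the handler returns correctly.
stack = [0xDEADBEEF]       # whatever the program already had on its stack
deliver(stack, return_eip=0x1000, cs=0x08, eflags=0x202, error_code=0)
print([hex(v) for v in handler_with_error_code(stack)])
# ['0x1000', '0x8', '0x202']

# INT 0Dh from software: no error code, so EIP is discarded and IRET gets garbage.
stack = [0xDEADBEEF]
deliver(stack, return_eip=0x1000, cs=0x08, eflags=0x202)
print([hex(v) for v in handler_with_error_code(stack)])
# ['0x8', '0x202', '0xdeadbeef']
```

In the second case the handler "returns" to offset 8 in segment 0202h, which is why the real processor generally ends up throwing a genuine exception right afterwards.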
---------------------------------------------------------------------------------------------------
Specific Exceptions

Vector  Excep  Type        Error  Description/Source
(hex)                      Code
00      #DE    Fault       No     Divide Error, from DIV or IDIV
01      #DB    Fault/Trap  No     Debug, from any code/data reference or an INT 1
02      ---    Interrupt   No     NMI/Non-Maskable Hardware Interrupt
03      #BP    Trap        No     Breakpoint, called by INT 3
04      #OF    Trap        No     Overflow, called by INTO
05      #BR    Fault       No     BOUND range exceeded, from the BOUND instruction
06      #UD    Fault       No     Invalid/Undefined Opcode (see UD0, UD1 or UD2 to specifically
                                  cause this exception)
07      #NM    Fault       No     Device not Available (No FPU), from floating point instructions
                                  or WAIT/FWAIT
08      #DF    Abort       Yes-0  Double fault, may happen from any instruction that may generate
                                  an exception, NMI or INTR
09      ---    Fault       No     Coprocessor Segment Overrun (reserved), from floating point
                                  instructions (387 and below only)
0A      #TS    Fault       Yes    Invalid TSS, from a Task switch or TSS access
0B      #NP    Fault       Yes    Segment Not Present, attempting to access system segments or
                                  invalid segments
0C      #SS    Fault       Yes    Stack-Segment Fault, caused when the stack overflows/underflows
                                  or an invalid SS is used
0D      #GP    Fault       Yes    General Protection Fault, invalid memory references or other
                                  protection error
0E      #PF    Fault       Yes    Page Fault, references to memory that is not present
0F      ---    N/A         No     Reserved
10      #MF    Fault       No     Floating point error (Math Fault), from floating point
                                  instructions or WAIT/FWAIT
11      #AC    Fault       Yes-0  486 and above. Alignment Check, misaligned memory references
                                  when enabled
12      #MC    Abort       No     Pentium and above. Machine Check, error codes (if used) and
                                  source are model dependent
13      #XF    Fault       No     Pentium 3 and above. SSE, from SSE instructions
14-1F   ---    N/A         No     Reserved
20-FF   ---    Interrupt   No     User Defined, External interrupt or INT n instruction

Yes-0 means an error code is pushed, but it is always 0.

KERBLUH - List per exception/interrupt details

***************************************************************************************************
Sources

[1] Borland Turbo Assembler 5.0 Quick Reference
    (c) 1995, Borland International, Inc.
    I used this guide to cross reference timing values that were specified in the 80x86 and 80x87
    guides by M Schmit. Since there were a few holes in the Schmit guides, this helped me plug
    many of them, but it only contained accurate data for the 8086, 80286, 80386 and 80486. 80186
    and Pentium timing doesn't really exist for the most part.

[2] Intel - Volume 1 - Basic Architecture (243190)
    (c) 1999, Intel Corporation

[3] Intel - Volume 2 - Instruction Set Reference (243191)
    (c) 1999, Intel Corporation

[4] Intel - Volume 3 - System Programming (243192)
    (c) 1999, Intel Corporation

[5] 80x86 Integer Instruction Set (8088 - Pentium)
    (c) ????, by M Schmit aka Quantasm

[6] Timing for the 80x87 Instruction Set (x87 - Pentium)
    (c) ????, M Schmit aka Quantasm