Undocumented 8086 instructions, explained by the microcode

2023-07-16 16:06:17

What happens if you give the Intel 8086 processor an instruction that doesn’t exist?
A modern microprocessor (80186 and later) will generate an exception, indicating that an illegal instruction
was executed.
However, early microprocessors didn’t include the circuitry to detect illegal instructions, since the chips didn’t have
transistors to spare. Instead these processors would do something,
but the results weren’t specified.1

The 8086 has a number of undocumented instructions.
Most of them are simply duplicates of regular instructions, but a few have unexpected behavior, such as revealing the
values of internal, hidden registers.
In the 8086, most instructions are implemented in microcode, so examining the 8086’s microcode can explain why these instructions
behave the way they do.

The photo below shows the 8086 die under a microscope, with the important functional blocks labeled. The metal layer is visible, while the underlying silicon and polysilicon wiring is mostly hidden.
The microcode ROM and the microcode address decoder are in the lower right.
The Group Decode ROM (upper center) is also important, as it performs the first step of instruction decoding.

The 8086 die under a microscope, with main functional blocks labeled.

Microcode and 8086 instruction decoding

You might think that machine instructions are the basic steps that a computer performs.
However, instructions usually require multiple steps inside the processor.
One way of expressing these multiple steps is through microcode, a technique dating back to 1951.
To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.
In other words, microcode forms another layer between the machine instructions and the hardware.
The main advantage of microcode is that it turns the processor’s control logic into a programming task instead of a difficult logic design task.

The 8086’s microcode ROM holds 512 micro-instructions, each 21 bits wide.
Each micro-instruction performs two actions in parallel. First is a move between a source and a destination, typically registers.
Second is an operation that can range from an arithmetic (ALU) operation to a memory access.
The diagram below shows the structure of a 21-bit micro-instruction, divided into six types.

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

When executing a machine instruction, the 8086 performs a decoding step.
Although the 8086 is a 16-bit processor, its instructions are based on bytes. In most cases, the first byte specifies the
opcode, which may be followed by additional instruction bytes.
In other cases, the byte is a “prefix” byte, which changes the behavior of the following instruction.
The first byte is analyzed
by something called the Group Decode ROM.
This circuit categorizes the first byte of the instruction into about 35 categories that control how the instruction is
decoded and executed.
One category is “1-byte logic”; this indicates a one-byte instruction or prefix that is simple and implemented by logic circuitry in the 8086.
For instructions in this category, microcode is not involved, while
the remaining instructions are implemented in microcode.
Many of these instructions are in the “two-byte ROM” category, indicating that the instruction has a second byte
that also needs to be decoded by microcode.
This second byte, called the ModR/M byte, specifies the memory addressing mode or registers that the instruction uses.

The next step is the microcode’s address decoder circuit, which determines where to start executing microcode based on
the opcode.
Conceptually, you can think of the microcode as stored in a ROM, indexed by the instruction opcode and a few sequence bits.
However, since many instructions can use the same microcode, it would be inefficient to store duplicate copies of these routines.
Instead, the microcode address decoder permits multiple instructions to reference the same entries in the ROM.
This decoding circuitry is similar to a PLA (Programmable Logic Array), so it matches bit patterns to determine a particular starting point.
This turns out to be important for undocumented instructions since undocumented instructions often match the pattern for a “real” instruction, making the undocumented instruction an alias.

The 8086 has several internal registers that are invisible to the programmer but are used by the microcode.
Memory accesses use the Indirect (IND) and Operand (OPR) registers; the IND register holds the address in the segment,
while the OPR register holds the data value that is read or written.
Although these registers are normally not accessible by the programmer, some undocumented instructions provide access to these registers, as will be described later.

The Arithmetic/Logic Unit (ALU) performs arithmetic, logical, and shift operations in the 8086.
The ALU uses three internal registers: tmpA, tmpB, and tmpC. An ALU operation requires two micro-instructions.
The first micro-instruction specifies the operation (such as ADD) and the temporary register that holds one argument (e.g. tmpA);
the second argument is always in tmpB.
A following micro-instruction can access the ALU result through the pseudo-register Σ (sigma).

The ModR/M byte

A fundamental part of the 8086 instruction format is the ModR/M byte, a byte that specifies addressing for many instructions.
The 8086 has a variety of addressing modes, so the ModR/M byte is somewhat complicated.
Normally it specifies one memory address and one register. The memory address is specified through one of eight addressing
modes (below) along with an optional 8- or 16-bit displacement in the instruction.
Instead of a memory address, the ModR/M byte can also specify a second register.
For a few opcodes, the ModR/M byte selects what instruction to execute rather than a register.

The 8086's addressing modes. From MCS-86 Assembly Language Reference Guide.

The implementation of the ModR/M byte plays an important role in the behavior of undocumented instructions.
Support for this byte is implemented in both microcode and hardware.
The various memory address modes above are implemented by microcode subroutines, which compute the appropriate memory address and
perform a read if necessary.
The subroutine leaves the memory address in the IND register, and if a read is performed, the value is in the OPR register.

The hardware hides the ModR/M byte’s selection of memory versus register by making the value available through the pseudo-register M, while the second register is available through N.
Thus, the microcode for an instruction doesn’t need to know if the value was in memory or a register, or which register was selected.
The Group Decode ROM examines the first byte of the instruction to determine if a ModR/M byte is present, and if a read
is required.
If the ModR/M byte specifies memory, the Translation ROM determines which micro-subroutines to call before handling the
instruction itself.
For more on the ModR/M byte, see my post on Reverse-engineering the ModR/M addressing microcode.
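To make the byte's layout concrete, here is a minimal Python sketch that splits a ModR/M byte into its three fields. The layout used (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) is the standard x86 encoding rather than something spelled out above, and the function name `split_modrm` is made up for illustration.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11    # bits 7-6: memory mode, or 0b11 for register
    reg = (byte >> 3) & 0b111   # bits 5-3: register, or opcode extension
    rm = byte & 0b111           # bits 2-0: addressing mode or second register
    return mod, reg, rm


# 0xD8 is 11 011 000 in binary.
print(split_modrm(0xD8))  # (3, 3, 0)
```

The reg field is the one that matters most for the rest of this article, since it doubles as an opcode extension for a few instructions.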

Holes in the opcode table

The first byte of the instruction is a value from 00 to FF in hex.
Almost all of these opcode values correspond to documented 8086 instructions, but there are a few exceptions, “holes” in the opcode table.
The table below shows the 256 first-byte opcodes for the 8086, from hex 00 to FF. Valid opcodes for the 8086 are in white;
the colored opcodes are undefined and interesting to examine.
Orange, yellow, and green opcodes were given meaning in the 80186, 80286, and 80386 respectively.
The purple opcode is unusual: it was implemented in the 8086 and later processors but not documented.2
In this section, I’ll examine the microcode for these opcode holes.

This table shows the 256 opcodes for the 8086, where the white ones are valid instructions.

D6: SALC

The opcode D6 (purple above) performs a well-known but undocumented operation that is typically called SALC, for Set AL to Carry.
This instruction sets the AL register to 0 if the carry flag is 0, and sets the AL register to FF if the carry flag is 1.
The curious thing about this undocumented instruction is that it exists in all x86 CPUs, but Intel didn’t mention it until 2017.
Intel probably put this instruction into the processor deliberately as a copyright trap.
The idea is that if a company created a copy of the 8086 processor and the processor included the SALC instruction, this
would prove that the company had copied Intel’s microcode and thus had potentially violated Intel’s copyright on the microcode.
This came to light when NEC created improved versions of the 8086, the NEC V20 and V30 microprocessors, and was sued by Intel.
Intel analyzed NEC’s microcode but was disappointed to find that NEC’s chip did not include the hidden instruction, showing
that NEC hadn’t copied the microcode.3
Although a Federal judge ruled in 1989 that NEC hadn’t infringed
Intel’s copyright, the 5-year trial ruined NEC’s market momentum.

The SALC instruction is implemented with three micro-instructions, shown below.4
The first micro-instruction jumps if the carry (CY) is set.
If not, the next instruction moves 0 to the AL register. RNI (Run Next Instruction) ends the microcode execution,
causing the next machine instruction to run.
If the carry was set, all-ones (i.e. FF hex) is moved to the AL register and RNI ends the microcode sequence.

           JMPS CY 2 SALC: jump on carry
ZERO → AL  RNI       Move 0 to AL, run next instruction
ONES → AL  RNI       2: Move FF to AL, run next instruction
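The three micro-instructions map almost line for line onto a tiny function. This is only a sketch of the visible behavior in Python; the function name `salc` and the boolean carry argument are illustrative choices.

```python
def salc(carry_flag):
    """Return the value SALC leaves in AL for a given carry flag."""
    if carry_flag:      # JMPS CY 2: jump to label 2 when carry is set
        return 0xFF     # 2: ONES -> AL, then RNI
    return 0x00         # ZERO -> AL, then RNI


print(hex(salc(True)), hex(salc(False)))  # 0xff 0x0
```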

0F: POP CS

The 0F opcode is the first hole in the opcode table.
The 8086 has instructions to push and pop the four segment registers, except opcode 0F is undefined where POP CS should be.
This opcode performs POP CS successfully, so the question is why is it undefined?
The reason is that POP CS is essentially useless and doesn’t do what you’d expect, so Intel figured it was best not
to document it.

To understand why POP CS is useless, I need to step back and explain the 8086’s segment registers.
The 8086 has a 20-bit address space, but 16-bit registers.
To make this work, the 8086 has the concept of segments: memory is accessed in 64K chunks called segments, which are positioned
in the 1-megabyte address space.
Specifically, there are four segments: Code Segment, Stack Segment, Data Segment, and Extra Segment,
with four segment registers that define the start of the segment: CS, SS, DS, and ES.
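As a worked example of how a 16-bit segment register and a 16-bit offset reach a 20-bit address, here is a short Python sketch. The shift-by-4 rule is the standard 8086 address computation and is not stated explicitly above; the function name `physical_address` is hypothetical.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and 16-bit offset into a 20-bit address."""
    # The segment value is shifted left 4 bits (multiplied by 16) and the
    # offset is added; the result is truncated to the 20 address bits.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

Changing only the segment, as POP CS does, moves the computed address by a multiple of 16 bytes while the offset stays where it was.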

An inconvenient part of segment addressing is that if you want to access more than 64K, you need to change the segment register.
So you might push the data segment register, change it temporarily so you can access a new part of memory, and then pop the old data segment
register value off the stack.
This would use the PUSH DS and POP DS instructions.
But why not POP CS?

The 8086 executes code from the code segment, with the instruction pointer (IP) tracking the location in the code segment.
The main problem with POP CS is that it changes the code segment, but not the instruction pointer, so now you are executing
code at the old offset in a new segment.
Unless you line up your code extremely carefully, the result is that you’re jumping to an unexpected place in memory.
(Normally, you want to change CS and the instruction pointer at the same time, using a CALL or JMP instruction.)

The second problem with POP CS is prefetching.
For efficiency, the 8086 prefetches instructions before they are needed, storing them in a 6-byte prefetch queue.
When you perform a jump, for instance, the microcode flushes the prefetch queue so execution will continue with the
new instructions, rather than the old instructions.
However, the instructions that pop a segment register don’t flush the prefetch buffer.
Thus, POP CS not only jumps to an unexpected location in memory, but it will execute an unpredictable number of instructions
from the old code path.

The POP segment register microcode below packs a lot into three micro-instructions.
The first micro-instruction pops a value from the stack.
Specifically, it moves the stack pointer (SP) to the Indirect (IND) register.
The Indirect register is an internal register, invisible to the programmer, that holds the address offset for memory
accesses.
The first micro-instruction also performs a memory read (R) from the stack segment (SS) and then increments IND
by 2 (P2, plus 2).
The second micro-instruction moves IND to the stack pointer, updating the stack pointer with the new value.
It also tells the microcode engine that this micro-instruction is the next-to-last (NXT) and the next machine instruction
can be started.
The final micro-instruction moves the value read from memory to the appropriate segment register and runs the next instruction.
Specifically, reads and writes put data in the internal OPR (Operand) register.
The hardware uses the register N to indicate the register specified by the instruction.
That is, the value will be stored in the CS, DS, ES, or SS register, depending on the bit pattern in the instruction.
Thus, the same microcode works for all four segment registers.
This is why POP CS works even though POP CS wasn’t explicitly implemented in the microcode; it uses the common code.

SP → IND  R SS,P2 POP sr: read from stack, compute IND plus 2
IND → SP  NXT     Put updated value in SP, start next instruction.
OPR → N   RNI     Put stack value in specified segment register

But why does POP CS run this microcode in the first place?
The microcode to execute is selected based on the instruction, but multiple instructions can execute the same microcode.
You can think of the address decoder as pattern-matching on the instruction’s bit patterns, where some of the bits can be ignored.
In this case, the POP sr microcode above is run by any instruction with the bit pattern 000??111, where a question mark
can be either a 0 or a 1.
You can verify that this pattern matches POP ES (07), POP SS (17), and POP DS (1F).
However, it also matches 0F, which is why the 0F opcode runs the above microcode and performs POP CS.
In other words, to make 0F do something other than POP CS would require additional circuitry, so it was easier to
leave the action implemented but undocumented.
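The pattern match is easy to check mechanically. The Python sketch below models only the "don't care" matching described above, not the actual decoder circuitry, and the helper name `matches` is made up for illustration.

```python
def matches(pattern, opcode):
    """True if the 8-bit opcode fits a pattern of '0', '1', and '?' bits."""
    bits = format(opcode, "08b")
    # A '?' matches either bit value; otherwise the bits must be equal.
    return all(p in ("?", b) for p, b in zip(pattern, bits))


pop_sr = [f"{op:02X}" for op in range(256) if matches("000??111", op)]
print(pop_sr)  # ['07', '0F', '17', '1F']
```

Exactly four opcodes match, and 0F is the only one without a documented meaning.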

60-6F: conditional jumps

One whole row of the opcode table is unused: values 60 to 6F.
These opcodes simply act the same as 70 to 7F, the conditional jump instructions.

The conditional jumps use the following microcode.
It fetches the jump offset from the instruction prefetch queue (Q) and puts the value into the ALU’s tmpBL register,
the low byte of the tmpB register.
It tests the condition in the instruction (XC) and jumps to the RELJMP micro-subroutine if satisfied.
The RELJMP code (not shown) updates the program counter to perform the jump.

Q → tmpBL                Jcond cb: Get offset from prefetch queue
           JMP XC RELJMP Test condition, if true jump to RELJMP routine
           RNI           No jump: run next instruction

This code is executed for any instruction matching the bit pattern 011?????, i.e. anything from 60 to 7F.
The condition is specified by the four low bits of the instruction.
The result is that any instruction 60-6F is an alias for the corresponding conditional jump 70-7F.
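Since the two rows differ only in bit 4, which the decoder ignores, the aliasing can be written as a single bit operation. This is a small Python sketch; both function names are hypothetical.

```python
def jcond_alias(opcode):
    """Map an undocumented 60-6F opcode to its documented 70-7F twin."""
    if not 0x60 <= opcode <= 0x6F:
        raise ValueError("expected an opcode in the range 60-6F")
    return opcode | 0x10    # set bit 4, the only bit that differs


def condition_code(opcode):
    """The four low bits of the opcode select the jump condition."""
    return opcode & 0x0F


print(hex(jcond_alias(0x64)))  # 0x74
```

Both members of a pair carry the same condition bits, so they test the same condition.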

C0, C8: RET/RETF imm

These undocumented opcodes act like a return instruction, specifically RET imm16 (source).
Specifically, the instruction C0 is the same as C2, near return, while C8 is the same as CA, far return.

The microcode below is executed for the instruction bits 1100?0?0, so it is executed for C0, C2, C8, and CA.
It gets two bytes from the instruction prefetch queue (Q) and puts them in the tmpA register.
Next, it calls FARRET, which performs either a near return (popping PC from the stack) or a far return (popping PC and CS
from the stack). Finally, it adds the original argument to the SP, equivalent to popping that many bytes.

Q → tmpAL    ADD tmpA    RET/RETF iw: Get word from prefetch, set up ADD
Q → tmpAH    CALL FARRET Call Far Return micro-subroutine
IND → tmpB               Move SP (in IND) to tmpB for ADD
Σ → SP       RNI         Put sum in Stack Pointer, end

One tricky part is that the FARRET micro-subroutine examines bit 3 of the instruction to determine whether it does a near
return or a far return.
This is why documented instruction C2 is a near return and CA is a far return.
Since C0 and C8 run the same microcode, they will perform the same actions, a near return and a far return respectively.
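The combination of the 1100?0?0 pattern and the bit 3 test can be sketched in a few lines of Python. This models only the classification described above; the function name `ret_imm_kind` is made up for illustration.

```python
def ret_imm_kind(opcode):
    """Classify C0, C2, C8, CA as a 'near' or 'far' return with immediate."""
    # Pattern 1100?0?0: bits 7-4 are 1100, bits 2 and 0 are 0,
    # and bits 3 and 1 are ignored by the address decoder.
    if opcode & 0b11110101 != 0b11000000:
        raise ValueError("opcode does not match 1100?0?0")
    # FARRET looks at bit 3 to pick a near or far return.
    return "far" if opcode & 0b00001000 else "near"


for op in (0xC0, 0xC2, 0xC8, 0xCA):
    print(f"{op:02X}", ret_imm_kind(op))
```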

C1: RET

The undocumented C1 opcode is identical to the documented C3, near return instruction.
The microcode below is executed for instruction bits 110000?1, i.e. C1 and C3.
The first micro-instruction reads from the Stack Pointer, incrementing IND by 2.
Prefetching is suspended and the prefetch queue is flushed, since execution will continue at a new location.
The Program Counter is updated with the value from the stack, read into the OPR register.
Finally, the updated address is put in the Stack Pointer and execution ends.

SP → IND  R SS,P2  RET:  Read from stack, increment by 2
          SUSP     Suspend prefetching
OPR → PC  FLUSH    Update PC from stack, flush prefetch queue
IND → SP  RNI      Update SP, run next instruction

C9: RET

The undocumented C9 opcode is identical to the documented CB, far return instruction.
This microcode is executed for instruction bits 110010?1, i.e. C9 and CB.
The microcode below simply calls the FARRET micro-subroutine to pop the Program Counter and CS register.
Then the new value is stored into the Stack Pointer.
One subtlety is that FARRET looks at bit 3 of the instruction to switch between a near return and a far return, as
described earlier.
Since C9 and CB both have bit 3 set, they both perform a far return.

          CALL FARRET  RETF: call FARRET routine
IND → SP  RNI          Update stack pointer, run next instruction

F1: LOCK prefix

The final hole in the opcode table is F1.
This opcode is different because it is implemented in logic rather than microcode.
The Group Decode ROM indicates that F1 is a prefix, one-byte logic, and LOCK.
The Group Decode outputs are the same as F0, so F1 also acts as a LOCK prefix.

Holes in two-byte opcodes

For most of the 8086 instructions, the first byte specifies the instruction.
However, the 8086 has a few instructions where the second byte specifies the instruction: the reg field of the ModR/M byte provides an opcode extension that selects the instruction.5
These fall into four categories which Intel labeled “Immed”, “Shift”, “Group 1”, and “Group 2”, corresponding to opcodes 80-83, D0-D3,
F6-F7, and FE-FF.
The table below shows how the second byte selects the instruction.
Note that “Shift”, “Group 1”, and “Group 2” all have gaps, resulting in undocumented values.

Meaning of the reg field in two-byte opcodes. From MCS-86 Assembly Language Reference Guide.

These sets of instructions are implemented in two completely different ways.
The “Immed” and “Shift” instructions run microcode in the standard way, selected by the first byte.
For a typical arithmetic/logic instruction such as ADD, bits 5-3 of the first instruction byte are latched into the X register to indicate
which ALU operation to perform.
The microcode specifies a generic ALU operation, while the X register controls whether the operation is an ADD, SUB, XOR, or
so forth.
However, the Group Decode ROM indicates that for the special “Immed” and “Shift” instructions, the X register latches the bits
from the second byte.
Thus, when the microcode executes a generic ALU operation, it ends up with the one specified in the second byte.6

The “Group 1” and “Group 2” instructions (F6-F7, FE-FF), however, run different microcode for each instruction.
Bits 5-3 of the second byte replace bits 2-0 of the instruction before executing the microcode.
Thus, F6 and F7 act as if they are opcodes in the range F0-F7, while FE and FF act as if they are opcodes in the range F8-FF.
Thus, each instruction specified by the second byte can have its own microcode, unlike the “Immed” and “Shift” instructions.
The trick that makes this work is that all the “real” opcodes in the range F0-FF are implemented in logic, not microcode,
so there are no collisions.
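The bit substitution is simple enough to sketch in Python. This models only the opcode value that the microcode address decoder would see, under the description above; the function name `effective_opcode` is hypothetical.

```python
def effective_opcode(opcode, modrm):
    """Replace bits 2-0 of the opcode with bits 5-3 (reg) of the ModR/M byte."""
    reg = (modrm >> 3) & 0b111
    return (opcode & 0b11111000) | reg


# F6 with reg = 001 is decoded as if it were F1.
print(f"{effective_opcode(0xF6, 0b00001000):02X}")  # F1
# FF with reg = 111 is decoded as if it were FF.
print(f"{effective_opcode(0xFF, 0b00111000):02X}")  # FF
```

Note that the substitution overwrites bit 0 of the original opcode, so F6 and F7 with the same reg field produce the same value here.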

The hole in “Shift”: SETMO, D0..D3/6

There is a “hole” in the list of shift operations when the second byte has the bits 110 (6).
(This is typically expressed as D0/6 and so forth; the value after the slash is the opcode-selection bits in the ModR/M byte.)
Internally, this value selects the ALU’s SETMO (Set Minus One) operation, which simply returns FF or FFFF, for a byte or word operation respectively.7
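In Python terms, the visible result is just an all-ones value of the operand width. This sketch assumes that bit 0 of the opcode selects byte versus word for D0-D3, which is the usual convention but is an assumption here; the function name `setmo_result` is made up.

```python
def setmo_result(opcode):
    """Value produced by SETMO for opcodes D0-D3 with reg field 110."""
    if not 0xD0 <= opcode <= 0xD3:
        raise ValueError("expected a Shift-group opcode, D0-D3")
    word = opcode & 1    # assumption: bit 0 is the byte/word bit
    return 0xFFFF if word else 0xFF


print(hex(setmo_result(0xD0)), hex(setmo_result(0xD1)))  # 0xff 0xffff
```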

The microcode below is executed for 1101000? bit patterns (D0 and D1).
The first instruction gets the value from the M register and sets up the ALU to do whatever operation was
specified in the instruction (indicated by XI).
Thus, the same microcode is used for all the “Shift” instructions, including SETMO.
The result is written back to M. If no writeback to memory is required (NWB), then RNI runs the next instruction, ending
the microcode sequence.
However, if the result is going to memory, then the last line writes the value to memory.

M → tmpB  XI tmpB, NXT  rot rm, 1: get argument, set up ALU
Σ → M     NWB,RNI F     Store result, maybe run next instruction
          W DS,P0 RNI   Write result to memory

The D2 and D3 instructions (1101001?) perform a variable number of shifts, specified by the CL register, so they use different microcode (below).
This microcode loops the number of times specified by CL, but the control flow is a bit tricky to avoid shifting if
the initial counter value is 0.
The code sets up the ALU to pass the counter (in tmpA) unmodified the first time (PASS) and jumps to 4, which
updates the counter and sets up the ALU for the shift operation (XI).
If the counter is not zero, it jumps back to 3, which performs the previously-specified shift and sets up
the ALU to decrement the counter (DEC).
This time, the code at 4 decrements the counter.
The loop continues until the counter reaches zero. The microcode stores the result as in the previous microcode.

ZERO → tmpA               rot rm,CL: 0 to tmpA
CX → tmpAL   PASS tmpA    Get count to tmpAL, set up ALU to pass through
M → tmpB     JMPS 4       Get value, jump to loop (4)
Σ → tmpB     DEC tmpA F   3: Update result, set up decrement of count
Σ → tmpA     XI tmpB      4: update count in tmpA, set up ALU
             JMPS NZ 3    Loop if count not zero
tmpB → M     NWB,RNI      Store result, maybe run next instruction
             W DS,P0 RNI  Write result to memory

The hole in “group 1”: TEST, F6/1 and F7/1

The F6 and F7 opcodes are in “group 1”, with the specific instruction specified by bits 5-3 of the second byte.
The second-byte table showed a hole for the 001 bit sequence.
As explained earlier, these bits replace the low-order bits of the instruction, so F6 with 001 is processed as if it were
the opcode F1.
The microcode below matches against instruction bits 1111000?, so F6/1 and F7/1 have the same effect as F6/0 and F7/0 respectively,
that is, the byte and word TEST instructions.

The microcode below gets one or two bytes from the prefetch queue (Q); the L8 condition tests if the operation is
an 8-bit (i.e. byte) operation and skips the second micro-instruction.
The third micro-instruction ANDs the argument and the fetched value.
The condition flags (F) are set based on the result, but the result itself is discarded.
Thus, the TEST instruction tests a value against a mask, seeing if any bits are set.

Q → tmpBL    JMPS L8 2     TEST rm,i: Get byte, jump if operation length = 8
Q → tmpBH                  Get second byte from the prefetch queue
M → tmpA     AND tmpA, NXT 2: Get argument, AND with fetched value
Σ → no dest  RNI F         Discard result but set flags.

I explained the processing of these “Group 3” instructions in more detail in my microcode article.

The hole in “group 2”: PUSH, FE/7 and FF/7

The FE and FF opcodes are in “group 2”, which has a hole for the 111 bit sequence in the second byte.
After replacement, this will be processed as the FF opcode, which matches the pattern 1111111?.
In other words, the instruction will be processed the same as the 110 bit pattern, which is PUSH.
The microcode gets the Stack Pointer and sets up the ALU to decrement it by 2.
The new value is written to SP and IND. Finally, the register value is written to stack memory.

SP → tmpA  DEC2 tmpA   PUSH rm: set up decrement SP by 2
Σ → IND                Decremented SP to IND
Σ → SP                 Decremented SP to SP
M → OPR    W SS,P0 RNI Write the value to memory, done

82 and 83 “Immed” group

Opcodes 80-83 are the “Immed” group, performing one of eight arithmetic operations, specified in the ModR/M byte.
The four opcodes differ in the size of the values: opcode 80 applies an 8-bit immediate value to an 8-bit register, 81 applies a 16-bit
value to a 16-bit register, 82 applies an 8-bit value to an 8-bit register, and 83 applies an 8-bit value to a 16-bit register.
The opcode 82 has the strange situation that some sources say it is undocumented, but it shows up in some Intel documentation as a valid bit combination (e.g. below).
Note that 80 and 82 have the 8-bit to 8-bit action, so the 82 opcode is redundant.

ADC is one of the instructions with opcode 80-83. From the 8086 datasheet, page 27.

The microcode below is used for all four opcodes.
If the ModR/M byte specifies memory, the appropriate micro-subroutine is called to compute the effective address in IND,
and fetch the byte or word into OPR.
The first two instructions below get the two immediate data bytes from the prefetch queue; for an 8-bit operation, the second byte
is skipped.
Next, the second argument M is loaded into tmpA and the desired ALU operation (XI) is configured.
The result Σ is stored into the specified register M and the operation may terminate with RNI.
But if the ModR/M byte specified memory, the following write micro-operation saves the value to memory.

Q → tmpBL  JMPS L8 2    alu rm,i: get byte, test if 8-bit op
Q → tmpBH               Maybe get second byte
M → tmpA   XI tmpA, NXT 2: get argument, set up ALU operation
Σ → M      NWB,RNI F    Save result, update flags, done if no memory writeback
           W DS,P0 RNI  Write result to memory if needed

The tricky part of this is the L8 condition, which tests if the operation is 8-bit.
You might think that bit 0 acts as the byte/word bit in a nice, orthogonal way, and it typically does select between a byte and a word operation, but the 8086 has a bunch of special cases.
The Group Decode ROM creates a signal indicating if bit 0 should be used as the byte/word bit.
But it generates a second signal indicating that an instruction should be forced to operate on bytes, for instructions
such as DAA and XLAT.
Another Group Decode ROM signal indicates that bit 3 of the instruction should select byte or word; this
is used for the MOV instructions with opcodes Bx.
Yet another Group Decode ROM signal indicates that inverted bit 1 of the instruction should select byte or word;
this is used for a few opcodes, including 80-87.

The important thing here is that for the opcodes under discussion (80-83), the L8 micro-condition uses both bits 0 and 1
to determine if the instruction is 8 bits or not.
The result is that only opcode 81 is considered 16-bit by the L8 test, so it is the only one that uses two immediate bytes
from the instruction.
However, the register operations use only bit 0 to select a byte or word transfer.
The result is that opcode 83 has the unusual behavior of using an 8-bit immediate operand with a 16-bit register.
In this case, the 8-bit value is sign-extended to form a 16-bit value. That is, the top bit of the 8-bit value fills
the entire upper half of the 16-bit value,
converting an 8-bit signed value to a 16-bit signed value (e.g. -1 is FF, which becomes FFFF).
This makes sense for arithmetic operations, but not much sense for logical operations.
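The two size decisions and the sign extension can be summarized in a short Python sketch. It models only the outcome described in the text (which opcode fetches how many immediate bytes, and how wide the register operand is), not the Group Decode ROM signals themselves; both function names are hypothetical.

```python
def immed_sizes(opcode):
    """Return (immediate_bytes, operand_bits) for the Immed opcodes 80-83."""
    if not 0x80 <= opcode <= 0x83:
        raise ValueError("expected an opcode in the range 80-83")
    # L8 looks at bits 1 and 0: only 81 (bits 01) fetches two immediate bytes.
    immediate_bytes = 2 if opcode & 0b11 == 0b01 else 1
    # The register transfer looks only at bit 0.
    operand_bits = 16 if opcode & 1 else 8
    return immediate_bytes, operand_bits


def sign_extend8(value):
    """Sign-extend an 8-bit value to 16 bits by copying bit 7 upward."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value


for op in range(0x80, 0x84):
    print(f"{op:02X}", immed_sizes(op))
print(hex(sign_extend8(0xFF)))  # 0xffff
```

Opcode 83 is the only case where the two sizes disagree, which is where the sign extension comes in.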

Intel documentation is inconsistent about which opcodes are listed for which instructions.
Intel opcode maps generally define opcodes 80-83.
However, lists of specific instructions show opcodes 80, 81, and 83 for arithmetic operations but only 80 and 81 for logical operations.8
That is, Intel omits the redundant 82 opcode as well as omitting logic operations that perform sign-extension (83).

More FE holes

For the “group 2” instructions, the FE opcode performs a byte operation while FF performs a word operation.
Many of these operations don’t make sense for bytes: CALL, JMP, and PUSH.
(The only instructions supported for FE are INC and DEC.) But what happens if you use the unsupported instructions?
The remainder of this section examines these cases and shows that the results are not useful.

CALL: FE/2

This instruction performs an indirect subroutine call within a segment, reading the target address from the memory location specified by the ModR/M byte.

The microcode below is a bit convoluted because the code falls through into the shared NEARCALL routine, so there is
some unnecessary register movement.
Before this microcode executes, the appropriate ModR/M micro-subroutine will read the target address from memory.
The code below copies the destination address from M to tmpB and stores it into the PC later in the code
to transfer execution.
The code suspends prefetching, corrects the PC to cancel the offset from prefetching, and flushes the prefetch queue.
Finally, it decrements the SP by two and writes the old PC to the stack.

M → tmpB    SUSP        CALL rm: read value, suspend prefetch
SP → IND    CORR        Get SP, correct PC
PC → OPR    DEC2 tmpC   Get PC to write, set up decrement
tmpB → PC   FLUSH       NEARCALL: Update PC, flush prefetch
IND → tmpC              Get SP to decrement
Σ → IND                 Decremented SP to IND
Σ → SP      W SS,P0 RNI Update SP, write old PC to stack

This code will mess up in two ways when executed as a byte instruction.
First, when the destination address is read from memory, only a byte will be read, so the destination address will be corrupted.
(I think that the behavior here depends on the bus hardware. The 8086 will ask for a byte from memory but will
read the word that is placed on the bus.
Thus, if memory returns a word, this part may operate correctly.
The 8088’s behavior will be different because of its 8-bit bus.)
The second issue is writing the old PC to the stack: only a byte of the PC will be written.
Thus, when the code returns from the subroutine call, the return address will be corrupt.

CALL: FE/3

This instruction performs an indirect subroutine call between segments, reading the target address from the memory location specified by the ModR/M byte.

IND → tmpC  INC2 tmpC    CALL FAR rm: set up IND+2
Σ → IND     R DS,P0      Read new CS, update IND
OPR → tmpA  DEC2 tmpC    New CS to tmpA, set up SP-2
SP → tmpC   SUSP         FARCALL: Suspend prefetch
Σ → IND     CORR         FARCALL2: Update IND, correct PC
CS → OPR    W SS,M2      Push old CS, decrement IND by 2
tmpA → CS   PASS tmpC    Update CS, set up for NEARCALL
PC → OPR    JMP NEARCALL Continue with NEARCALL

As in the previous CALL, this microcode will fail in multiple ways when executed in byte mode.
The new CS and PC addresses will be read from memory as bytes, which may or may not work.
Only a byte of the old CS and PC will be pushed to the stack.

JMP: FE/4

This instruction performs an indirect jump within a segment, reading the target address from the memory location specified by the ModR/M byte.
The microcode is short, since the ModR/M micro-subroutine does most of the work.
I believe this will have the same problem as the previous CALL instructions: it will attempt to read a byte from
memory instead of a word.

        SUSP       JMP rm: Suspend prefetch
M → PC  FLUSH RNI  Update PC with new address, flush prefetch, done

JMP: FE/5

This instruction performs an indirect jump between segments, reading the new PC and CS values from the memory location specified by the ModR/M byte.
The ModR/M micro-subroutine reads the new PC address. This microcode increments IND and suspends prefetching.
It updates the PC, reads the new CS value from memory, and updates the CS.
As before, the reads from memory will read bytes instead of words, so this code will not meaningfully work in byte mode.

IND → tmpC  INC2 tmpC   JMP FAR rm: set up IND+2
Σ → IND     SUSP        Update IND, suspend prefetch
tmpB → PC   R DS,P0     Update PC, read new CS from memory
OPR → CS    FLUSH RNI   Update CS, flush prefetch, done

PUSH: FE/6

This instruction pushes the register or memory value specified by the ModR/M byte.
It decrements the SP by 2 and then writes the value to the stack.
In byte mode, it will write one byte to the stack but still decrement the SP by 2,
so one byte of old stack data will be on the stack along with the data byte.

SP → tmpA  DEC2 tmpA    PUSH rm: Set up SP decrement
Σ → IND                 Decremented value to IND
Σ → SP                  Decremented value to SP
M → OPR    W SS,P0 RNI  Write the data to the stack
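
The effect on the stack can be pictured with a small Python model. This is only a sketch under assumptions: memory is a flat byte array, segments are ignored, and the single byte is assumed to land at the new SP address (the low byte of the slot). The function names are hypothetical.

```python
def push_word(mem, sp, value):
    """Normal FF/6 PUSH: SP decreases by 2 and both bytes are written."""
    sp -= 2
    mem[sp] = value & 0xFF             # low byte at the new SP
    mem[sp + 1] = (value >> 8) & 0xFF  # high byte above it
    return sp

def push_byte_fe6(mem, sp, value):
    """FE/6 sketch: SP still decreases by 2, but only one byte is written."""
    sp -= 2
    mem[sp] = value & 0xFF  # assumed: the byte goes to the new SP address
    return sp               # mem[sp + 1] keeps whatever was there before

# Fill the toy stack with AA so stale data is easy to spot.
mem = bytearray([0xAA] * 16)
sp = push_byte_fe6(mem, 8, 0x34)

# A later word-sized POP would combine the new byte with the stale one.
popped = mem[sp] | (mem[sp + 1] << 8)
```

In this model the popped word is AA34: the pushed byte 34 paired with the stale byte AA.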

Undocumented instruction values

The next category of undocumented instructions is where the first byte indicates a valid instruction, but
there is something wrong with the second byte.

AAM: ASCII Adjust after Multiply

The AAM instruction is a fairly obscure one, designed to support binary-coded decimal
arithmetic (BCD).
After multiplying two BCD digits, you end up with a binary value between 0 and 81 (0×0 to 9×9).
If you want a BCD result, the AAM instruction converts this binary value to BCD, for instance splitting 81 into the
decimal digits 8 and 1, where the upper digit is 81 divided by 10, and the lower digit is 81 modulo 10.

The interesting thing about AAM is that the 2-byte instruction is D4 0A. You might notice that hex 0A is 10, and this
is not a coincidence.
There wasn’t an easy way to get the value 10 in the microcode, so instead they made the instruction
provide that value in the second byte.
The undocumented (but well-known) part is that if you provide a value other than 10, the instruction will convert the binary input into
digits in that base. For example, if you provide 8 as the second byte, the instruction returns the value divided by 8
and the value modulo 8.
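
The arithmetic can be summarized in a few lines of Python. This is a sketch of the result only, not of the microcode; the function name is made up, flags are not modeled, and a zero divisor is simply rejected rather than modeling what the processor does.

```python
def aam(al, base=10):
    """Return (AH, AL) after AAM, where base is the instruction's second byte."""
    if base == 0:
        # Divide-by-zero is a corner case; its handling is not modeled here.
        raise ZeroDivisionError("AAM with a zero second byte is not modeled")
    al &= 0xFF
    return al // base, al % base  # quotient to AH, remainder to AL

print(aam(81))      # documented form, D4 0A: digits 8 and 1
print(aam(81, 8))   # undocumented: 81 split by 8 instead of 10
```

With a second byte of 16 (D4 10), the same operation splits AL into its two hex nibbles.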

The microcode for AAM, below, sets up the registers, calls
the CORD (Core Division) micro-subroutine to perform the division,
and then puts the results into AH and AL.
In more detail, the CORD routine divides tmpA/tmpC by tmpB, putting the complement of the quotient in tmpC and leaving the remainder in tmpA.
(If you want to know how CORD works internally, see my division post.)
The important step is that the AAM microcode gets the divisor from the prefetch queue (Q).
After calling CORD, it sets up the ALU to perform a 1’s complement of tmpC and puts the result (Σ) into AH.
It sets up the ALU to pass tmpA through unchanged, puts the result (Σ) into AL, and updates the flags accordingly (F).

Q → tmpB                    AAM: Move byte from prefetch to tmpB
ZERO → tmpA                 Move 0 to tmpA
AL → tmpC    CALL CORD      Move AL to tmpC, call CORD.
             COM1 tmpC      Set ALU to complement
Σ → AH       PASS tmpA, NXT Complement AL to AH
Σ → AL       RNI F          Pass tmpA through ALU to set flags

The interesting thing is why this code has undocumented behavior.
The 8086’s microcode only has support for the constants 0 and all-1’s (FF or FFFF), but the microcode needs to divide by 10.
One solution would be to implement an additional micro-instruction and more circuitry to provide the constant 10, but every
transistor was precious back then.
Instead, the designers took the approach of simply putting the number 10 as the second byte of the instruction and loading the
constant from there.
Since the AAM instruction is not used very much, making the instruction two bytes long wasn’t much of a drawback.
But if you put a different number in the second byte, that is the divisor the microcode will use.
(Of course you could add circuitry to verify that the number is 10, but then the implementation is no longer simple.)

Intel could have documented the full behavior, but that creates several problems.
First, Intel would be stuck supporting the full behavior into the future.
Second, there are corner cases to deal with, such as divide-by-zero.
Third, testing the chip would become harder because all these cases would need to be tested.
Fourth, the documentation would become long and confusing.
It is not surprising that Intel left the full behavior undocumented.

AAD: ASCII Adjust before Division

The AAD instruction is analogous to AAM but used for BCD division.
In this case, you want to divide a two-digit BCD number by something, where the BCD digits are in AH and AL.
The AAD instruction converts the two-digit BCD number to binary by computing AH×10+AL, before you perform
the division.

The microcode for AAD is shown below. The microcode sets up the registers, calls the multiplication micro-subroutine
CORX (Core Times), and
then puts the results in AH and AL.
In more detail, the multiplier comes from the instruction prefetch queue Q.
The CORX routine multiplies tmpC by tmpB, putting the result in tmpA/tmpC.
Then the microcode adds the low BCD digit (AL) to the product (tmpB + tmpC), putting the sum (Σ) into AL,
clearing AH and setting the status flags F appropriately.

One interesting thing is that the second-last micro-instruction jumps to AAEND, which is the last
micro-instruction of the AAM microcode above.
By reusing the micro-instruction from AAM, the microcode is one micro-instruction shorter, but
the jump adds one cycle to the execution time.
(The CORX routine is used for integer multiplication; I discuss the internals in this post.)

Q → tmpC              AAD: Get byte from prefetch queue.
AH → tmpB   CALL CORX Call CORX
AL → tmpB   ADD tmpC  Set ALU for ADD
ZERO → AH   JMP AAEND Zero AH, jump to AAEND
...
Σ → AL      RNI F     AAEND: Sum to AL, done.

As with AAM, the constant 10 is provided in the second byte of the instruction.
The microcode accepts any value here, but values other than 10 are undocumented.
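
As with AAM, the net effect is easy to state in Python. This sketch models only the register results (AL receives the low 8 bits of AH × multiplier + AL, and AH is cleared); the function name is invented and flags are not modeled.

```python
def aad(ah, al, multiplier=10):
    """Return (AH, AL) after AAD, where multiplier is the second byte."""
    total = (ah & 0xFF) * (multiplier & 0xFF) + (al & 0xFF)
    return 0, total & 0xFF  # AH is zeroed; the sum is truncated to 8 bits

print(aad(8, 1))      # documented form: BCD digits 8 and 1 become binary 81
print(aad(1, 1, 16))  # undocumented: treat AH and AL as base-16 digits
```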

8C, 8E: MOV sr

The opcodes 8C and 8E perform a MOV register to or from the specified segment register, using the register specification
field in the ModR/M byte.
There are four segment registers and three selection bits, so an invalid segment register can be specified.
However, the hardware that decodes the register number ignores instruction bit 5 for a segment register. Thus,
specifying a segment register 4 to 7 is the same as specifying a segment register 0 to 3.
For more details, see my article on 8086 register codes.
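
In other words, only the low two bits of the three-bit register field matter. A short Python sketch of that aliasing follows; the names are illustrative, and the register order ES, CS, SS, DS is the standard 8086 segment register encoding.

```python
# Standard 8086 segment register numbering: 0=ES, 1=CS, 2=SS, 3=DS.
SEGMENT_REGISTERS = ("ES", "CS", "SS", "DS")

def segment_register(modrm):
    """Name the segment register selected by the ModR/M byte of 8C/8E."""
    reg = (modrm >> 3) & 0b111             # three-bit register field (bits 5-3)
    return SEGMENT_REGISTERS[reg & 0b011]  # bit 5 is ignored by the hardware

print(segment_register(0xD8))  # reg field 3: DS
print(segment_register(0xF8))  # reg field 7: also DS
```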

Unexpected REP prefix

REP IMUL / IDIV

The REP prefix is used with string operations to cause the operation to be repeated across a block of memory.
However, if you use this prefix with an IMUL or IDIV instruction, it has the unexpected behavior
of negating the product or the quotient (source).

The reason for this behavior is that the string operations use an internal flag called F1 to indicate that a REP
prefix has been applied.
The multiply and divide code reuses this flag to track the sign of the input values, toggling F1 for each negative value.
If F1 is set, the value at the end is negated. (This handles “two negatives make a positive.”)
The consequence is that the REP prefix puts the flag in the 1 state when the multiply/divide starts, so the computed sign
will be wrong at the end and the result is the negative of the expected result.
The microcode is fairly complex, so I won’t show it here; I explain it in detail in this blog post.
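
The sign-tracking idea can be sketched in Python. This is a heavily simplified model of the description above, not of the actual microcode: it uses unbounded Python integers, covers only multiplication, and the function name is hypothetical.

```python
def imul_model(a, b, rep_prefix=False):
    """Signed multiply using a toggled sign flag, as described in the text."""
    f1 = rep_prefix  # a REP prefix leaves F1 set before the multiply starts
    if a < 0:        # toggle F1 for each negative operand, then use magnitudes
        f1 = not f1
        a = -a
    if b < 0:
        f1 = not f1
        b = -b
    product = a * b  # unsigned multiply of the magnitudes
    return -product if f1 else product  # negate at the end if F1 is set

print(imul_model(-3, 4))                  # normal: sign comes out negative
print(imul_model(-3, 4, rep_prefix=True)) # with REP: sign is flipped
```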

REP RET

Wikipedia lists
REP RET (i.e. RET with a REP prefix) as a way to implement a two-byte return instruction.
This is kind of trivial; the RET microcode (like almost every instruction) doesn’t use the F1 internal flag,
so the REP prefix has no effect.

REPNZ MOVS/STOS

Wikipedia mentions that
the use of the REPNZ prefix (as opposed to REPZ) is undefined with string operations other than CMPS/SCAS.
An internal flag called F1Z distinguishes between the REPZ and REPNZ prefixes.
This flag is only used by CMPS/SCAS. Since the other string instructions ignore this flag, they will ignore the
difference between REPZ and REPNZ.
I wrote about string operations in more detail in this post.

Using a register instead of memory.

Some instructions are documented as requiring a memory operand. However, the ModR/M byte can specify a register.
The behavior in these cases can be highly unusual, providing access to hidden registers.
Examining the microcode shows how this happens.
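
For reference, the ModR/M byte signals a register operand through its top two bits: a mod field of 11 selects a register, while the other three values select memory addressing modes. A small Python sketch of the standard field layout follows; the function names are invented for the example.

```python
def split_modrm(modrm):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (modrm >> 6) & 0b11   # bits 7-6: addressing mode
    reg = (modrm >> 3) & 0b111  # bits 5-3: register or opcode extension
    rm = modrm & 0b111          # bits 2-0: register or memory operand
    return mod, reg, rm

def operand_is_register(modrm):
    """True when the r/m operand names a register rather than memory."""
    mod, _, _ = split_modrm(modrm)
    return mod == 0b11

print(operand_is_register(0x07))  # mod=00: a memory operand
print(operand_is_register(0xC3))  # mod=11: a register operand
```

The cases in this section are the ones where `operand_is_register` would be true for an instruction that is documented only with a memory operand.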

LEA reg

Many instructions have a ModR/M byte that indicates the memory address that the instruction should use, perhaps through
a complicated addressing mode.
The LEA (Load Effective Address) instruction is different: it doesn’t access the memory location but returns the address itself.
The undocumented part is that the ModR/M byte can specify a register instead of a memory location. In that case,
what does the LEA instruction do? Clearly it can’t return the address of a register, but it needs to return something.

The behavior of LEA is explained by how the 8086 handles the ModR/M byte.
Before running the microcode corresponding to the instruction, the microcode engine calls a short micro-subroutine
for the particular addressing mode.
This micro-subroutine puts the desired memory address (the effective address) into the tmpA register.
The effective address is copied to the IND (Indirect) register and the value is loaded from memory if needed.
On the other hand, if the ModR/M byte specified a register instead of memory, no micro-subroutine is called.
(I explain ModR/M handling in more detail in this article.)

The microcode for LEA itself is just one line. It stores the effective address in the IND register into the specified destination register, indicated by N.
This assumes that the appropriate ModR/M micro-subroutine was called before this code, putting the effective address into IND.

IND → N   RNI  LEA: store IND register in destination, done

But if a register was specified instead of a memory location, no ModR/M micro-subroutine gets called.
Instead, the LEA instruction will return whatever value was left
in IND from before, typically the previous memory location that was accessed.
Thus, LEA can be used to read the value of the IND register, which is normally hidden from the programmer.
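
A toy Python model makes the stale-register effect concrete. Everything here is a simplification for illustration: the class and method names are hypothetical, IND is assumed to start at 0, and passing no address stands in for a ModR/M byte that names a register.

```python
class HiddenRegisters:
    """Toy model of the hidden IND register (assumed initial value 0)."""

    def __init__(self):
        self.ind = 0

    def modrm_subroutine(self, effective_address):
        # Runs only for memory operands: leaves the effective address in IND.
        self.ind = effective_address & 0xFFFF

    def lea(self, effective_address=None):
        # None stands for a register operand, in which case no
        # micro-subroutine runs and IND is left untouched.
        if effective_address is not None:
            self.modrm_subroutine(effective_address)
        return self.ind  # microcode: IND -> N

cpu = HiddenRegisters()
documented = cpu.lea(0x1234)  # normal LEA: returns the effective address
cpu.modrm_subroutine(0x5678)  # a later instruction accesses memory at 5678
leaked = cpu.lea()            # LEA reg: returns the stale IND value
```

Here `leaked` ends up holding 5678, the address left behind by the intervening memory access.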

LDS reg, LES reg

The LDS and LES instructions load a far pointer from memory into the specified segment register and general-purpose register.
The microcode below assumes that the appropriate ModR/M micro-subroutine has set up IND and read the first value into OPR.
The microcode updates the destination register, increments IND by 2, reads the next value, and updates DS.
(The microcode for LES is a copy of this, but updates ES.)

OPR → N               LDS: Copy OPR to dest register
IND → tmpC  INC2 tmpC Set up incrementing IND by 2
Σ → IND     R DS,P0   Update IND, read next location
OPR → DS    RNI       Update DS

If the LDS instruction specifies a register instead of memory, a micro-subroutine will not be called, so IND and OPR
will have values from a previous instruction.
OPR will be stored in the destination register, while the DS value will be read from the address IND+2.
Thus, these instructions provide a mechanism to access the hidden OPR register.
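
The four micro-instructions can be traced in a small Python sketch. This is an illustrative model under assumptions: memory is a dictionary from addresses to 16-bit words, segments are ignored, and the class, function, and sample values are all made up.

```python
class HiddenState:
    """Hidden IND and OPR registers, holding leftovers from earlier work."""

    def __init__(self, ind=0, opr=0):
        self.ind = ind
        self.opr = opr

def lds(state, memory, operand_address=None):
    """Return (dest, ds). operand_address=None means a register operand."""
    def read_word(address):
        return memory.get(address, 0)

    if operand_address is not None:
        # ModR/M micro-subroutine: set up IND and read the first word.
        state.ind = operand_address & 0xFFFF
        state.opr = read_word(state.ind)

    dest = state.opr                      # OPR -> N
    state.ind = (state.ind + 2) & 0xFFFF  # IND + 2 -> IND
    state.opr = read_word(state.ind)      # read next location
    ds = state.opr                        # OPR -> DS
    return dest, ds

memory = {0x0100: 0x1111, 0x0102: 0x2222, 0x0200: 0xAAAA, 0x0202: 0xBBBB}
normal = lds(HiddenState(), memory, operand_address=0x0100)
leaky = lds(HiddenState(ind=0x0200, opr=0xAAAA), memory)
```

In the second call, the destination register receives the stale OPR value AAAA and DS is loaded from the word at the stale IND plus 2.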

JMP FAR rm

The JMP FAR rm instruction normally jumps to the far address stored in memory at the location indicated by the ModR/M byte.
(That is, the ModR/M byte indicates where the new PC and CS values are stored.)
But, as with LEA, the behavior is undocumented if the ModR/M byte specifies a register, since a register doesn’t hold
a four-byte value.

The microcode explains what happens.
As with LEA, the code expects a micro-subroutine to put the address into the IND register.
In this case, the micro-subroutine also loads the value at that address (i.e. the destination PC) into tmpB.
The microcode increments IND by 2 to point to the CS word in memory and reads that into CS.
Meanwhile, it updates the PC with tmpB.
It suspends prefetching and flushes the queue, so instruction fetching will restart at the new address.

IND → tmpC  INC2 tmpC   JMP FAR rm: set up to add 2 to IND
Σ → IND     SUSP        Update IND, suspend prefetching
tmpB → PC   R DS,P0     Update PC with tmpB. Read new CS from specified address
OPR → CS    FLUSH RNI   Update CS, flush queue, done

If you specify a register instead of memory, the micro-subroutine won’t get called.
Instead, the program counter will be loaded with whatever value was in tmpB and the CS segment register will
be loaded from the memory location two bytes after the location that IND was referencing.
Thus, this undocumented use of the instruction gives access to the otherwise-hidden tmpB register.

The end of undocumented instructions

Microprocessor manufacturers soon realized that undocumented instructions were a problem, since
programmers find them and often use them.
This creates an issue for future processors, or even revisions of the current processor:
if you eliminate an undocumented instruction, previously-working code that used the instruction will break,
and it will seem like the new processor is faulty.

The solution was for processors to detect undocumented instructions and prevent them from executing.
By the early 1980s, processors had enough transistors (thanks to Moore’s law) that they could include
the circuitry to block unsupported instructions.
In particular, the 80186/80188 and the 80286 generated a trap of type 6 when an unused opcode was executed,
blocking use of the instruction.9
This trap is also known as #UD (Undefined instruction trap).10

Conclusions

The 8086, like many early microprocessors, has undocumented instructions but no traps to stop them from executing.11
For the 8086, these fall into several categories.
Many undocumented instructions simply mirror existing instructions.
Some instructions are implemented but not documented for one reason or another, such as SALC and POP CS.
Other instructions can be used outside their normal range, such as AAM and AAD.
Some instructions are intended to work only with a memory address, so specifying a register can have
strange effects such as revealing the values of the hidden IND and OPR registers.

Keep in mind that my analysis is based on transistor-level simulation and examining the microcode; I haven’t verified the behavior on a
physical 8086 processor. Please let me know if you see any errors in my analysis or undocumented instructions that I have
overlooked.
Also note that the behavior could change between different versions of the 8086; in particular, some versions by different manufacturers
(such as the NEC V20 and V30) are known to be different.

I plan to write more about the 8086, so
follow me on Twitter @kenshirriff or RSS for updates.
I’ve also started experimenting with Mastodon recently as @[email protected]
and Bluesky as @righto.com so you can follow me there too.

Notes and references


