Reverse-engineering the ModR/M addressing microcode within the Intel 8086 processor
One interesting aspect of a computer's instruction set is its addressing modes, how the computer determines the address for a memory access.
The Intel 8086 (1978) used the ModR/M byte, a special byte following the opcode, to select the addressing mode.1
The ModR/M byte has persisted into the modern x86 architecture, so it's interesting to look at its roots and original implementation.
In this post, I look at the hardware and microcode in the 8086 that implements ModR/M2
and how the 8086 designers fit multiple addressing modes into the 8086's limited microcode ROM.
One technique was a hybrid approach that combined generic microcode with hardware logic that filled in the details for a particular instruction.
A second technique was modular microcode, with subroutines for various parts of the task.
I've been reverse-engineering the 8086 starting with the silicon die.
The die photo below shows the chip under a microscope.
The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath.
Around the edges of the die, bond wires connect pads to the chip's 40 external pins.
I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below.
Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below.
The BIU handles bus and memory activity as well as instruction prefetching, while the Execution Unit (EU) executes instructions and microcode.
Both units play important roles in memory addressing.
The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath.
8086 addressing modes
Let's start with an addition instruction, ADD dst,src, which adds a source value to a destination value and stores the result in the destination.3
What are the source and destination? Memory? Registers? The addressing mode answers this question.
You can use a register as the source and another register as the destination.
The instruction below uses the AX register as the destination and the BX register as the source. Thus, it adds BX to AX and puts the result in AX.
ADD AX, BX Add the contents of the BX register to the AX register
A memory access is indicated with square brackets around the "effective address"4 to access.
For instance, [1234] means the memory location with address 1234, while [BP] means the memory location that the BP register points to.
For a more complicated addressing mode, [BP+SI+1234] means the memory location is determined by adding the BP and SI registers to the constant 1234 (known as the displacement).
On the 8086, you can use memory for the source or the destination, but not both.
Here are some examples of using memory as a source:
ADD AX, [1234]         Add the contents of memory location 1234 to AX register
ADD CX, [BP]           Add memory pointed to by BP register to CX register
ADD DX, [BX+SI+1234]   Source memory address is BX + SI + constant 1234
Here are examples with memory as the destination:
ADD [1234], AX         Add AX to the contents of memory location 1234
ADD [BP], CX           Add CX to memory pointed to by BP register
ADD [BX+SI+1234], DX   Destination memory address is BX + SI + constant 1234
You can also operate on bytes instead of words, using a byte register and accessing a memory byte:
ADD AL, [SI+1234]      Add to the low byte of AX register
ADD AH, [BP+DI+1234]   Add to the high byte of AX register
As you can see, the 8086 supports many different addressing schemes.
To understand how they are implemented, we must first look at how instructions encode the addressing schemes in the ModR/M byte.
The ModR/M byte
The ModR/M byte follows many opcodes to specify the addressing mode.
This byte is fairly complicated but I'll try to explain it in this section.
The diagram below shows how the byte is split into three fields:5
mod selects the overall mode, reg selects a register, and r/m selects either a register or memory mode.
I'll start with the register-register mode, where the mod bits are 11 and the reg and r/m fields each select one of eight registers, as shown below.
The instruction ADD AX,BX would use reg=011 to select BX and r/m=000 to select AX, so the ModR/M byte would be 11 011 000.
(The register assignment depends on whether the instruction operates on words, bytes, or segment registers.
For instance, in a word instruction, 001 selects the CX register, while in a byte instruction, 001 selects the CL register, the low byte of CX.)
The next addressing mode specifies a memory argument and a register argument. In this case, the mod bits are 00, the reg field specifies a register as described above, and the r/m field specifies a memory address according to the table below.
For example, the instruction ADD [SI],CX would use reg=001 to select CX and r/m=100 to select [SI], so the ModR/M byte would be 00 001 100.
r/m | Operand Address |
---|---|
000 | [BX+SI] |
001 | [BX+DI] |
010 | [BP+SI] |
011 | [BP+DI] |
100 | [SI] |
101 | [DI] |
110 | [BP] |
111 | [BX] |
The next mode, 01, adds an 8-bit signed displacement to the address. This displacement consists of one byte following the ModR/M byte. This supports addressing modes such as [BP+5].
The mode 10 is similar except the displacement is two bytes long, for addressing modes such as [BP+DI+0x1234].
The table below shows the meaning of all 256 values for the ModR/M byte.
The mod bits are colored red, the reg bits green, and the r/m bits blue.
Note the special case "disp16" to support a 16-bit fixed address.
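The field layout described above can be sketched in a few lines of Python. This is only an illustration of the encoding, not anything taken from the 8086 itself; the function and table names (`split_modrm`, `describe_modrm`, `MEM_MODES`) are made up for this example, and the displacement is shown symbolically rather than read from the instruction stream.

```python
# Illustrative sketch of ModR/M decoding; all names here are hypothetical.
WORD_REGS = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
MEM_MODES = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe_modrm(byte, word=True):
    """Return (reg operand, r/m operand) as assembly-style strings."""
    mod, reg, rm = split_modrm(byte)
    regs = WORD_REGS if word else BYTE_REGS
    if mod == 0b11:                      # register-register mode
        return regs[reg], regs[rm]
    if mod == 0b00 and rm == 0b110:      # special case: fixed 16-bit address
        return regs[reg], "[disp16]"
    suffix = {0b00: "", 0b01: "+disp8", 0b10: "+disp16"}[mod]
    return regs[reg], "[" + MEM_MODES[rm] + suffix + "]"


print(describe_modrm(0xD8))  # the ADD AX,BX example: reg=BX, r/m=AX
print(describe_modrm(0x0C))  # the ADD [SI],CX example: reg=CX, r/m=[SI]
```

Which of the two operands is the destination is not visible here; that comes from the opcode, not the ModR/M byte.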
The register combinations for memory accesses may seem random but they were designed to support the needs of high-level languages, such as arrays and data structures.
The idea is to add a base register, an index register, and/or a fixed displacement to determine the address.6
The base register can indicate the start of an array, the index register holds the offset in the array, and the displacement provides the offset of a field in the array entry.
The base register is BX for data or BP for information on the stack.
The index registers are SI (Source Index) and DI (Destination Index).7
Some addressing features are handled by the opcode, not the ModR/M byte.
For instance, the ModR/M byte doesn't distinguish between ADD AX,[SI] and ADD [SI],AX.
Instead, the two variants are distinguished by bit 1 of the instruction, the D or "direction" bit.8
Moreover, many instructions have one opcode that operates on words and another that operates on bytes, distinguished by bit 0 of the opcode, the W or word bit.
The D and W bits are an example of orthogonality in the 8086 instruction set, allowing features to be combined in various combinations.
For instance, the addressing modes combine 8 types of offset computation with three sizes of displacements and eight target registers.
Arithmetic instructions combine these addressing modes with eight ALU operations, each of which can act on a byte or a word, with two possible memory directions.
All of these combinations are implemented with one block of microcode, implementing a large instruction set with a small amount of microcode.
(The orthogonality of the 8086 shouldn't be overstated, though; it has many special cases and things that don't quite fit.)
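To make the D and W bits concrete, here is a small Python sketch that pulls them out of the register/memory ALU opcodes, which follow the bit pattern 00ooo0dw (ooo selects one of the eight ALU operations). The function name `decode_alu_opcode` and the dictionary it returns are just conventions for this example.

```python
# The eight ALU operations, in the order of their 3-bit opcode field.
ALU_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def decode_alu_opcode(opcode):
    """Decode an opcode of the form 00ooo0dw (ALU op followed by ModR/M)."""
    if opcode & 0b11000100:
        raise ValueError("not a register/memory ALU opcode")
    return {
        "op": ALU_OPS[(opcode >> 3) & 0b111],
        "d": (opcode >> 1) & 1,   # 1: the reg field is the destination
        "w": opcode & 1,          # 1: word operation, 0: byte operation
    }


print(decode_alu_opcode(0x01))  # ADD, memory/rm destination, word
print(decode_alu_opcode(0x03))  # ADD, register destination, word
```

The point of the sketch is that three independent fields of one byte select the operation, the direction, and the size, so 32 opcodes fall out of a single pattern.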
An outline of 8086 microcode
Most people think of machine instructions as the basic steps that a computer performs.
However, many processors (including the 8086) have another layer of software underneath: microcode.
With microcode, instead of building the control circuitry from complex logic gates, the control logic is largely replaced with code.
To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.
The 8086 uses a hybrid approach: although it uses microcode, much of the instruction functionality is implemented with gate logic.
This approach removed duplication from the microcode and kept the microcode small enough for 1978 technology.
In a sense, the microcode is parameterized.
For instance, the microcode can specify a generic Arithmetic/Logic Unit (ALU) operation and a generic register.
The gate logic examines the instruction to determine which specific operation to perform and the appropriate register.
A micro-instruction in the 8086 is encoded into 21 bits as shown below.
Every micro-instruction has a move from a source register to a destination register, each specified with 5 bits.
The meaning of the remaining bits depends on the type field.
A "short jump" is a conditional jump within the current block of 16 micro-instructions.
An ALU operation sets up the arithmetic-logic unit to perform an operation.
Bookkeeping operations are anything from flushing the prefetch queue to ending the current instruction.
A memory operation triggers a bus cycle to read or write memory.
A "long jump" is a conditional jump to any of 16 fixed microcode locations (specified in an external table called the Translation ROM).
Finally, a "long call" is a conditional subroutine call to one of 16 locations.
For more about 8086 microcode, see my microcode blog post.
Some examples of microcode for addressing
In this section, I'll take a close look at a few addressing modes and how they are implemented in microcode.
In the next section, I'll summarize all the microcode for addressing modes.
A register-register operation
Let's start by looking at a register-to-register instruction, before we get into the complications of memory accesses: ADD BX,AX
which adds AX to BX, storing the result in BX. This instruction has the opcode value 01 and ModR/M value C3 (hex).
Before the microcode starts, the hardware performs some decoding of the opcode.
The Group Decode ROM (below) classifies an instruction into multiple categories:
this instruction contains a D bit, a W bit, and an ALU operation, and has a ModR/M byte.
Fields from the opcode and ModR/M bytes are extracted and stored in various internal registers.
The ALU operation type (ADD) is stored in the ALU opr register.
From the ModR/M byte, the reg register code (AX) is stored in the N register, and the r/m register code (BX) is stored in the M register.
(The M and N registers are internal registers that are invisible to the programmer; each holds a 5-bit register code that specifies a register.9)
This diagram shows the Group Decode ROM. The Group Decode ROM is more of a PLA (programmable logic array) with two layers of NOR gates. Its input lines are at the lower left and its outputs are at the upper right.
Once the preliminary decoding is done, the microcode below for this ALU instruction is executed.10
(There are three micro-instructions, so the instruction takes three clock cycles.)
Each micro-instruction contains a move and an action.
First, the register specified by M (i.e. BX) is moved to the ALU's temporary A register (tmpA).
Meanwhile, the ALU is configured to perform the appropriate operation on tmpA; XI indicates that the ALU operation is specified by the instruction bits, i.e. ADD.
The second instruction moves the register specified by N (i.e. AX) to the ALU's tmpB register.
The action NX indicates that this is the next-to-last micro-instruction so the microcode engine can start processing the next machine instruction.
The last micro-instruction stores the ALU's result (Σ) in the register indicated by M (i.e. BX).
The status flags are updated because of the F.
WB,RNI (Run Next Instruction) indicates that this is the end and the microcode engine can process the next machine instruction.
The WB prefix would skip the actions if a memory writeback were pending (which is not the case).
move        action
M → tmpA    XI tmpA     ALU rm↔r: BX to tmpA
N → tmpB    WB,NX       AX to tmpB
Σ → M       WB,RNI F    result to BX, run next instruction.
This microcode packs a lot into three micro-instructions.
Note that it is very generic: the microcode doesn't know what ALU operation is being performed or which registers are being used.
Instead, the microcode deals with abstract registers and operations, while the hardware fills in the details using bits from the instructions.
The same microcode is used for eight different ALU operations. And as we'll see, it supports multiple addressing modes.
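One way to picture this division of labor is a small Python model in which the three micro-instructions are fixed code and the instruction-specific details arrive as parameters. This is a loose sketch rather than a faithful simulation: the names `run_alu_rm_r`, `m`, `n`, and `alu_opr` are invented stand-ins for the M, N, and ALU opr registers, only a few of the eight ALU operations are modeled, and flags and timing are ignored.

```python
import operator

MASK16 = 0xFFFF
# A few of the ALU operations that the hardware selects from instruction bits.
ALU_OPS = {"ADD": operator.add, "SUB": operator.sub, "AND": operator.and_,
           "OR": operator.or_, "XOR": operator.xor}


def run_alu_rm_r(regs, m, n, alu_opr):
    """Run the three-step 'ALU rm<->r' sequence on a dict of registers.

    m, n, and alu_opr model values the decode hardware has already latched;
    the sequence itself never names a specific register or operation.
    """
    tmp_a = regs[m]                                   # M -> tmpA   XI tmpA
    tmp_b = regs[n]                                   # N -> tmpB   WB,NX
    sigma = ALU_OPS[alu_opr](tmp_a, tmp_b) & MASK16   # ALU computes the result
    regs[m] = sigma                                   # Sigma -> M  WB,RNI F
    return regs


regs = {"AX": 0x0005, "BX": 0x0010}
run_alu_rm_r(regs, m="BX", n="AX", alu_opr="ADD")     # models ADD BX,AX
print(hex(regs["BX"]))                                # 0x15
```

Calling the same function with `alu_opr="SUB"` or different register names exercises a different instruction without changing the "microcode" at all.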
Using memory as the destination
Memory operations on the 8086 involve both microcode and hardware.
A memory operation uses two internal registers: IND (Indirect) holds the memory address, while OPR (Operand) holds the word that is read or written.
A typical memory micro-instruction is R DS,P0, which starts a read from the Data Segment with a "Plus 0" on the IND register afterward. The Bus Interface Unit carries out this operation by adding the segment register to compute the physical address, and then running the memory bus cycles.
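The address arithmetic that the Bus Interface Unit performs can be written out as follows. On the 8086 the 16-bit segment value is shifted left four bits before the 16-bit offset is added, giving a 20-bit physical address; the function name `physical_address` is just a label for this sketch.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    return ((segment << 4) + offset) & 0xFFFFF   # wraps at 1 megabyte


# With DS = 0x1234 and the effective address 0x0010 in IND:
print(hex(physical_address(0x1234, 0x0010)))     # 0x12350
```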
With that background, let's look at the instruction ADD [SI],AX, which adds AX to the memory location indexed by SI.
As before, the hardware performs some analysis of the instruction (hex 01 04).
In the ModR/M byte, mod=00 (memory, no displacement), reg=000 (AX), and R/M=100 ([SI]).
The N register is loaded with the code for AX as before.
The M register, however, is loaded with OPR (the memory data register) since the Group Decode ROM determines that the instruction has a memory addressing mode.
The microcode below starts in an effective address microcode subroutine for the [SI] mode.
The first line of the microcode subroutine computes the effective address simply by loading the tmpA register with SI. It jumps to the micro-routine EAOFFSET which ends up at EALOAD (for reasons that will be described below), which loads the value from memory.
Specifically, EALOAD puts the address in IND, reads the value from memory, puts the value into tmpB, and returns from the subroutine.
SI → tmpA     JMP EAOFFSET    [SI]: put SI in tmpA
tmpA → IND    R DS,P0         EALOAD: read memory
OPR → tmpB    RTN
M → tmpA      XI tmpA         ALU rm↔r: OPR to tmpA
N → tmpB      WB,NX           AX to tmpB
Σ → M         WB,RNI F        result to OPR (RNI is skipped because a writeback is pending)
              W DS,P0 RNI     writes result to memory
Microcode execution continues with the ALU rm↔r routine described above, but with a few differences.
The M register indicates OPR, so the value read from memory is put into tmpA.
As before, the N register specifies AX, so that register is put into tmpB.
In this case, the WB,NX determines that the result will be written back to memory so it skips the NX operation.
The ALU's result (Σ) is stored in OPR as directed by M.
The WB,RNI is skipped so microcode execution continues.
The W DS,P0 micro-instruction writes the result (in OPR) to the memory address in IND.
At this point, RNI terminates the microcode sequence.
A lot is happening here to add two numbers! The main point is that the same microcode runs as in the register case, but the results are different due to the M register and the conditional WB code.
By running different subroutines, different effective address computations can be performed.
Using memory as the source
Now let's look at how the microcode uses memory as a source, as in the instruction ADD AX,[SI].
This instruction (hex 03 04) has the same ModR/M byte as before, so the N register holds AX and the M register holds OPR.
However, because the opcode has the D bit set, the M and N registers are swapped when accessed.
Thus, when the microcode uses M, it gets the value AX from N, and vice versa. (Yes, this is confusing.)
The microcode starts the same as the previous example, reading [SI] into tmpB and returning to the ALU code.
However, since the meanings of M and N are reversed, the AX value goes into tmpA while the memory value goes into tmpB.
(This swap doesn't matter for addition, but would matter for subtraction.)
An important difference is that there is no writeback to memory, so WB,NX starts processing the next machine instruction.
In the last micro-instruction, the result is written to M, indicating the AX register. Finally, WB,RNI runs the next machine instruction.
SI → tmpA     JMP EAOFFSET    [SI]: put SI in tmpA
tmpA → IND    R DS,P0         EALOAD: read memory
OPR → tmpB    RTN
M → tmpA      XI tmpA         ALU rm↔r: AX to tmpA
N → tmpB      WB,NX           OPR to tmpB
Σ → M         WB,RNI F        result to AX, run next instruction.
The main point is that the same microcode handles memory as a source and a destination, simply by setting the D bit.
First, the D bit reverses the operands by swapping M and N.
Second, the WB conditionals prevent the writeback to memory that happened in the previous case.
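The effect of the D bit and the WB conditional can be mimicked in Python with a single routine that serves both ADD [SI],AX and ADD AX,[SI]. This is a simplified model under several assumptions: the helper name `run_add`, the dictionary-based registers and memory, and the way the writeback test is expressed are all inventions for illustration, and the effective address is passed in already computed.

```python
MASK16 = 0xFFFF


def run_add(regs, memory, addr, reg, d_bit):
    """Model a word ADD between a register and memory[addr].

    d_bit=0: memory is the destination; d_bit=1: the register is.
    """
    opr = memory[addr]                  # EALOAD: read the operand into OPR
    m, n = "OPR", reg                   # as latched by the decode hardware
    if d_bit:
        m, n = n, m                     # the D bit swaps M and N on access
    values = dict(regs, OPR=opr)
    tmp_a = values[m]                   # M -> tmpA
    tmp_b = values[n]                   # N -> tmpB
    sigma = (tmp_a + tmp_b) & MASK16
    if m == "OPR":                      # writeback pending: WB,RNI is skipped
        memory[addr] = sigma            # W DS,P0 then RNI
    else:
        regs[m] = sigma                 # Sigma -> M, WB,RNI ends the sequence
    return sigma


regs, memory = {"AX": 3}, {0x0100: 4}
run_add(regs, memory, 0x0100, "AX", d_bit=0)   # ADD [addr],AX: memory updated
run_add(regs, memory, 0x0100, "AX", d_bit=1)   # ADD AX,[addr]: AX updated
print(regs, memory)
```

The body of the routine is identical for both directions; only the swap and the final conditional differ, which is the same trick the microcode relies on.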
Using a displacement
The memory addressing modes optionally support a signed displacement of one or two bytes.
Let's look at the instruction ADD AX,[SI+0x1234].
In hex, this instruction is 03 84 34 12, where the last two bytes are the displacement, reversed because the 8086 uses little-endian numbers.
The mod bits are 10, indicating a 16-bit displacement, but the other bits are the same as in the previous example.
Microcode execution again starts with the [SI] subroutine.
However, the jump to EAOFFSET goes to [i] this time, to handle the displacement offset. (I'll explain how, shortly.)
This code loads the offset as two bytes from the instruction prefetch queue (Q) into the tmpB register.
It adds the offset to the previous address in tmpA and puts the sum Σ in tmpA, computing the effective address. Then it jumps to EAFINISH (EALOAD).
From there, the code continues as earlier, reading an argument from memory and computing the sum.
SI → tmpA     JMP EAOFFSET    [SI]: put SI in tmpA
Q → tmpBL     JMPS MOD1 12    [i]: load from queue, conditional jump
Q → tmpBH
Σ → tmpA      JMP EAFINISH    12:
tmpA → IND    R DS,P0         EALOAD: read memory
OPR → tmpB    RTN
M → tmpA      XI tmpA         ALU rm↔r: AX to tmpA
N → tmpB      WB,NX           OPR to tmpB
Σ → M         WB,RNI F        result to AX, run next instruction.
For the one-byte displacement case, the conditional MOD1 will jump over the fetch of the second displacement byte.
When the first byte is loaded into the low byte of tmpB, it is sign-extended into the high byte.14
Thus, the one-byte displacement case uses the same microcode but ends up with a sign-extended 1-byte displacement in tmpB.
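The displacement handling can be checked with a short Python sketch. It models the displacement bytes as a list standing in for the prefetch queue; the names `sign_extend8`, `read_displacement`, and `effective_address` are hypothetical, and the fixed-address special case (mod=00 with r/m=110) is deliberately left out.

```python
def sign_extend8(byte):
    """Sign-extend an 8-bit value to 16 bits, as happens when loading tmpBL."""
    return (byte | 0xFF00) if (byte & 0x80) else byte


def read_displacement(mod, queue_bytes):
    """Return the 16-bit displacement selected by the mod field."""
    if mod == 0b01:                           # one byte: MOD1 skips the second fetch
        return sign_extend8(queue_bytes[0])
    if mod == 0b10:                           # two bytes, little-endian
        return queue_bytes[0] | (queue_bytes[1] << 8)
    return 0                                  # mod 00: no displacement


def effective_address(base, mod, queue_bytes):
    """Add the displacement to a base value (e.g. SI), wrapping at 16 bits."""
    return (base + read_displacement(mod, queue_bytes)) & 0xFFFF


# ADD AX,[SI+0x1234] is 03 84 34 12: the displacement bytes are 34 12.
print(hex(effective_address(0x2000, 0b10, [0x34, 0x12])))   # 0x3234
# A one-byte displacement of 0xFB means -5.
print(hex(effective_address(0x1000, 0b01, [0xFB])))         # 0xffb
```

Because the addition wraps at 16 bits, adding the sign-extended value 0xFFFB has the same effect as subtracting 5.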
The Translation ROM
Now let's take a closer look at the jumps to EAOFFSET, EAFINISH, and the effective address subroutines, which use something called the Translation ROM.
The Translation ROM converts the 5-bit jump tag in a micro-instruction into a 13-bit microcode address.
It also provides the addresses of the effective address subroutines.
As will be seen below, there are some complications.11
The Translation ROM as it appears on the die. The metal layer has been removed to expose the silicon and polysilicon underneath. The left half decodes the inputs to select a row. The right half outputs the corresponding microcode address.
The effective address micro-routines
Register calculations
The Translation ROM has an entry for the addressing mode calculations such as [SI] and [BP+DI], generally indicated by the r/m bits, the three low bits of the ModR/M byte.
Each routine computes the effective address and puts it into the ALU's temporary A register and jumps to EAOFFSET, which adds any displacement offset.
The microcode below shows the four simplest effective address calculations, which just load the appropriate register into tmpA.
SI → tmpA    JMP EAOFFSET    [SI]: load SI into tmpA
DI → tmpA    JMP EAOFFSET    [DI]: load DI into tmpA
BP → tmpA    JMP EAOFFSET    [BP]: load BP into tmpA
BX → tmpA    JMP EAOFFSET    [BX]: load BX into tmpA
For the cases below, an addition is required, so the registers are loaded into the ALU's temporary A and temporary B registers.
The effective address is the sum (indicated by Σ), which is moved to temporary A.12
These routines are carefully arranged in memory so [BX+DI] and [BP+SI] each execute one micro-instruction and then jump into the middle of the other routines, saving code.13
BX → tmpA                    [BX+SI]: get regs
SI → tmpB                    1:
Σ → tmpA     JMP EAOFFSET
BP → tmpA                    [BP+DI]: get regs
DI → tmpB                    4:
Σ → tmpA     JMP EAOFFSET
BX → tmpA    JMPS 4          [BX+DI]: short jump to 4
BP → tmpA    JMPS 1          [BP+SI]: short jump to 1
The EAOFFSET and EAFINISH targets
After computing the register portion of the effective address, the routines above jump to EAOFFSET, but this is not a fixed target.
Instead, the Translation ROM selects one of three target microcode addresses based on the instruction and the ModR/M byte:
If there's a displacement, the microcode jumps to [i] to add the displacement value.
If there is no displacement but a memory read should occur, the microcode jumps to EALOAD to load the memory contents.
If there is no displacement and no memory read should occur, the microcode jumps to EADONE.
In other words, the microcode jump is a three-way branch that is implemented by the Translation ROM and is transparent to the microcode.
For a displacement, the [i] immediate code below loads a 1-byte or 2-byte displacement into the tmpB register and adds it to the tmpA register, as described earlier.
At the end of a displacement calculation, the microcode jumps to the EAFINISH tag, which is another branching target.
Based on the instruction, the Translation ROM selects one of two microcode targets: EALOAD to load from memory, or EADONE to skip the load.
Q → tmpBL    JMPS MOD1 12    [i]: get byte(s)
Q → tmpBH
Σ → tmpA     JMP EAFINISH    12: add displacement
The EALOAD microcode below reads the value from memory, using the effective address in tmpA. It puts the result in tmpB.
The RTN micro-instruction returns to the microcode that implements the original machine instruction.
tmpA → IND    R DS,P0    EALOAD: read from tmpA address
OPR → tmpB    RTN        store result in tmpB, return
The EADONE routine puts the effective address in IND, but it doesn't read from the memory location.
This supports machine instructions such as MOV (some moves) and LEA (Load Effective Address) that don't read from memory.
tmpA → IND    RTN    EADONE: store effective address in IND
To summarize, the microcode runs different subroutines and different paths, depending on the addressing mode, executing the appropriate code.
The Translation ROM selects the appropriate control flow path.
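The branching described in this section can be summarized as a small lookup written in Python. It is a behavioral sketch only: the real Translation ROM maps tags to microcode addresses, whereas the hypothetical `resolve_target` below returns routine names, and its two boolean inputs stand in for conditions the hardware derives from the opcode and ModR/M byte.

```python
def resolve_target(tag, has_displacement, reads_memory):
    """Pick the concrete micro-routine for a symbolic jump tag."""
    if tag == "EAOFFSET" and has_displacement:
        return "[i]"                                   # add the displacement first
    if tag in ("EAOFFSET", "EAFINISH"):
        return "EALOAD" if reads_memory else "EADONE"
    raise ValueError("unknown tag: " + tag)


def trace_path(has_displacement, reads_memory):
    """List the routines visited after the register calculation."""
    path = [resolve_target("EAOFFSET", has_displacement, reads_memory)]
    if path[0] == "[i]":                               # [i] ends with JMP EAFINISH
        path.append(resolve_target("EAFINISH", has_displacement, reads_memory))
    return path


print(trace_path(has_displacement=False, reads_memory=True))   # e.g. ADD AX,[SI]
print(trace_path(has_displacement=True, reads_memory=True))    # e.g. ADD AX,[SI+0x1234]
print(trace_path(has_displacement=True, reads_memory=False))   # e.g. LEA with a displacement
```

The microcode itself only ever says "EAOFFSET" or "EAFINISH"; every decision in `resolve_target` corresponds to logic outside the microcode.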
Special cases
There are a couple of special cases in addressing that I will discuss in this section.
Supporting a fixed address
It is common to access a fixed memory address, but the standard addressing modes use a base or index register.
The 8086 replaces the mode of [BP] with no displacement with 16-bit fixed addressing.
In other words, a ModR/M byte with the pattern 00xxx110 is treated specially.
(This special case is the orange disp16 line in the ModR/M table earlier.)
This is implemented in the Translation ROM which has additional rows to detect this pattern and execute the immediate word [iw] microcode below instead.
This microcode fetches a word from the instruction prefetch queue (Q) into the tmpA register, a byte at a time.
It jumps to EAFINISH instead of EAOFFSET because it doesn't make sense to add another displacement.
Q → tmpAL [iw]: get bytes
Q → tmpAH JMP EAFINISH
Selecting the segment
Memory accesses in the 8086 are relative to one of the 64-kilobyte segments: Data Segment, Code Segment, Stack Segment, or Extra Segment.
Most addressing modes use the Data Segment by default.
However, addressing modes that use the BP register use the Stack Segment by default.
This is a sensible choice since the BP (Base Pointer) register is intended for accessing values on the stack.
This special case is implemented in the Translation ROM.
It has an extra output bit that indicates that the addressing mode should use the Stack Segment.
Since the Translation ROM is already decoding the addressing mode to select the right microcode routine, adding one more output bit is straightforward.
This bit goes to the segment register selection circuitry, changing the default segment.
This circuitry also handles prefixes that change the segment.
Thus, segment register selection is handled in hardware without any action by the microcode.
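As a rough software analogy for this selection logic, the default segment can be derived from the mod and r/m fields alone. The function `default_segment` and its `override` parameter (standing in for a segment prefix) are names chosen for this sketch; the 8086 does this with a ROM output bit and selection circuitry, not a function call.

```python
def default_segment(mod, rm, override=None):
    """Return the segment register name used for a memory operand."""
    if mod == 0b11:
        raise ValueError("register mode has no memory operand")
    if override is not None:            # a segment prefix wins over the default
        return override
    if mod == 0b00 and rm == 0b110:     # fixed 16-bit address replaces [BP]
        return "DS"
    uses_bp = rm in (0b010, 0b011, 0b110)   # [BP+SI], [BP+DI], [BP+disp]
    return "SS" if uses_bp else "DS"


print(default_segment(0b00, 0b100))                  # [SI]      -> DS
print(default_segment(0b01, 0b110))                  # [BP+disp8] -> SS
print(default_segment(0b00, 0b010, override="ES"))   # ES:[BP+SI] -> ES
```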
Conclusions
I hope you've enjoyed this tour through the depths of 8086 microcode.
The effective address calculation in the 8086 uses a combination of microcode and logic circuitry to implement a variety of addressing methods.
Special cases make the addressing modes more useful, but make the circuitry more complicated.
This shows the CISC (Complex Instruction Set Computer) philosophy of x86, making the instructions complicated but highly functional. In contrast, the RISC (Reduced Instruction Set Computer) philosophy takes the opposite approach, making the instructions simpler but allowing the processor to run faster.
RISC vs. CISC was a big debate of the 1980s, but isn't as relevant nowadays.
People often ask if microcode could be updated on the 8086. Microcode was hardcoded into the ROM, so it could not be modified.
This became a big problem for Intel with the famous Pentium floating-point division bug.
The Pentium chip turned out to have a bug that resulted in rare but serious errors when dividing.
Intel recalled the defective processors in 1994 and replaced them at a cost of $475 million.
Starting with the Pentium Pro (1995), microcode could be patched at boot time, a useful feature that persists in modern CPUs.
I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die, so follow me on Twitter @kenshirriff or RSS for updates.