Reverse-engineering the register codes for the 8086 processor’s microcode
Like most processors, the Intel 8086 (1978) provides registers that are faster than main memory.
In addition to the registers that are visible to the programmer, the 8086 has a handful of internal registers that are hidden from the user.
Internally, the 8086 has a complicated scheme to select which register to use, with a combination of microcode and hardware.
Registers are assigned a 5-bit identifying number, either from the machine instruction or from the microcode.
In this blog post, I explain how this register system works.
My analysis is based on reverse-engineering the 8086 from die photos. The die photo below shows the chip under a microscope.
For this die photo, I removed the metal and polysilicon layers, revealing the silicon underneath.
I’ve labeled the key functional blocks; the ones that are important to this post are darker.
In particular, the registers and the Arithmetic/Logic Unit (ALU) are at the left and the microcode ROM is in the lower right.
Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below.
The BIU handles bus and memory activity as well as instruction prefetching, while the Execution Unit (EU) executes the instructions.
The 8086 die under a microscope, with main functional blocks labeled.
Microcode
Most people think of machine instructions as the basic steps that a computer performs.
However, many processors (including the 8086) have another layer of software underneath: microcode.
With microcode, instead of building the control circuitry from complex logic gates, the control logic is largely replaced with code.
To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.
The 8086 uses a hybrid approach: although it uses microcode, much of the instruction functionality is implemented with gate logic.
This approach removed duplication from the microcode and kept the microcode small enough for 1978 technology.
In a sense, the microcode is parameterized.
For instance, the microcode can specify a generic Arithmetic/Logic Unit (ALU) operation and a generic register.
The gate logic examines the instruction to determine which specific operation to perform and the appropriate register.
A micro-instruction in the 8086 is encoded into 21 bits as shown below.
Every micro-instruction has a move from a source register to a destination register, each specified with 5 bits; this encoding is the main topic of this blog post.
The meaning of the remaining bits depends on the type field and can be anything from an ALU operation to a memory read or write to a change of microcode control flow.
For more about 8086 microcode, see my microcode blog post.
Let’s look at how the machine instruction XCHG AX,reg is implemented in microcode.
This instruction exchanges AX with the register specified in the low 3 bits of the instruction.[1]
The microcode for this instruction consists of three micro-instructions, so the instruction takes three clock cycles.
Each micro-instruction contains a move, which is the interesting part.[2]
The specified register is moved to the ALU’s temporary B register, the AX register is moved to the specified register, and finally the temporary B is moved to AX, completing the swap.
move | action | comment |
---|---|---|
M → tmpB | | XCHG AX,rw: move reg to tmpB |
AX → M | NXT | move AX to reg, Next to last |
tmpB → AX | RNI | move tmpB to AX, Run Next Instruction |
The key part for this discussion is how M indicates the desired register.
Suppose the instruction is XCHG AX,DX. The bottom three bits of the instruction are 010, indicating DX.
During the first clock cycle of instruction execution, the opcode byte is transferred from the prefetch queue over the queue bus.
The M register is loaded with the DX number (which happens to be 26), based on the bottom three bits of the instruction.
After a second clock cycle, the microcode starts.
The first micro-instruction puts M’s value (26) onto the source bus and the number for tmpB (13) on the destination bus, causing the transfer from DX to tmpB.
The second micro-instruction puts the AX number (24) onto the source bus and the M value (26) onto the destination bus, causing the transfer from AX to DX.
The third micro-instruction puts the tmpB number (13) onto the source bus and the AX number (24) onto the destination bus, causing the transfer from tmpB to AX.
Thus, the values on the source and destination bus control the data transfer during each micro-instruction. Microcode can either specify these values explicitly (as for AX and tmpB) or can specify the M register to use the register defined in the instruction.
As a result, the same microcode implements all the XCHG instructions and the microcode doesn’t need to know which register is involved.
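The XCHG walkthrough above can be sketched in a few lines of Python. This is only an illustrative model, not the chip's actual logic: the register file is a plain dictionary keyed by register number, the `M` marker and the `run_xchg` helper are hypothetical names, and the expansion of the low three opcode bits into a 16-bit register number (binary 11 followed by the three bits) is assumed from the values quoted above (010 gives 26 for DX).

```python
# Register numbers quoted in the walkthrough above.
AX, DX, TMPB = 24, 26, 13
M = "M"  # hypothetical marker meaning "use the number held in the M register"

def run_xchg(regs, opcode):
    """Model the three XCHG AX,reg micro-instructions on a dict of register values."""
    # Hardware step (assumed form): expand the low three opcode bits
    # into a 5-bit number for a 16-bit register.
    m_register = 0b11000 | (opcode & 0b111)
    microcode = [(M, TMPB), (AX, M), (TMPB, AX)]  # (source, destination) moves
    for src, dst in microcode:
        src_num = m_register if src == M else src  # number on the source bus
        dst_num = m_register if dst == M else dst  # number on the destination bus
        regs[dst_num] = regs[src_num]
    return regs

# 0x92 is the opcode for XCHG AX,DX (low three bits 010).
regs = run_xchg({AX: 0x1111, DX: 0x2222, TMPB: 0}, 0x92)
```

The list of moves never names DX; only the value loaded into the M register changes from one XCHG variant to another.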
The register encoding
The microcode above illustrated how different numbers specified different registers.
The table below shows how the numbers 0-31 map onto registers.
Some numbers have a different meaning for a source register or a destination register; a slash separates these entries.
No. | Register | No. | Register | No. | Register | No. | Register |
---|---|---|---|---|---|---|---|
0 | ES | 8 | AL | 16 | AH | 24 | AX |
1 | CS | 9 | CL | 17 | CH | 25 | CX |
2 | SS | 10 | DL | 18 | DH (M) | 26 | DX |
3 | DS | 11 | BL | 19 | BH (N) | 27 | BX |
4 | PC | 12 | tmpA | 20 | Σ/tmpAL | 28 | SP |
5 | IND | 13 | tmpB | 21 | ONES/tmpBL | 29 | BP |
6 | OPR | 14 | tmpC | 22 | CR/tmpAH | 30 | SI |
7 | Q/- | 15 | F | 23 | ZERO/tmpBH | 31 | DI |
Most of these entries are programmer-visible registers: the segment registers are in green, the 8-bit registers in blue, and the 16-bit registers in red.
Some internal registers and pseudo-registers are also accessible:
IND (Indirect register), holding the memory address for a read or write;
OPR (Operand register), holding the data for a read or write;
Q (Queue), reading a byte from the instruction prefetch queue;
ALU temporary registers A, B, and C, along with low (L) and high (H) bytes;
F, Flags register;
Σ, the ALU output;
ONES, all ones;
CR, the three low bits of the microcode address;
and ZERO, the value zero.
The M and N entries can only be specified from microcode, taking the place of DH and BH.
The table is kind of complicated, but there are reasons for its structure.
First, machine instructions in the 8086 encode registers according to the system below.
The 5-bit register number above is essentially an extension of the instruction encoding.
Moreover, the AX/CX/DX/BX registers (red) are lined up with their upper-byte and lower-byte versions (blue).
This simplifies the hardware since the low three bits of the register number select the register, while the upper two bits perform the byte versus word selection.[3]
The internal registers fit into available spots in the table.
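To make the table easier to experiment with, here is a small lookup sketch in Python. The list simply transcribes the 32 entries above; the `register_name` function and its `as_source` parameter are hypothetical names, and the special microcode meanings of 18 and 19 (M and N) are left out for simplicity.

```python
# The 32 table entries in numeric order; "source/destination" where they differ.
NAMES = [
    "ES", "CS", "SS", "DS", "PC", "IND", "OPR", "Q/-",
    "AL", "CL", "DL", "BL", "tmpA", "tmpB", "tmpC", "F",
    "AH", "CH", "DH", "BH", "Σ/tmpAL", "ONES/tmpBL", "CR/tmpAH", "ZERO/tmpBH",
    "AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI",
]

def register_name(number, as_source=True):
    """Return the table entry for a 5-bit register number."""
    entry = NAMES[number]
    if "/" in entry:
        source, destination = entry.split("/")
        return source if as_source else destination
    return entry

# AL, AH and AX share the same low three bits; only the upper two bits differ.
a_family = [(n, register_name(n), n & 0b111, n >> 3) for n in (8, 16, 24)]
```

Here `a_family` lists AL, AH and AX with low bits 000 and upper bits 01, 10 and 11 respectively, which is the alignment the paragraph above describes.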
The ModR/M byte
Many of the 8086 instructions use a second byte called the ModR/M byte to specify the addressing modes.[4]
The ModR/M byte gives the 8086 a lot of flexibility in how an instruction accesses its operands.
The byte specifies a register for one operand and either a register or memory for the other operand.
The diagram below shows how the byte is split into three fields: mod selects the overall mode, reg selects a register, and r/m selects either a register or memory mode.
For a ModR/M byte, the reg and the r/m fields are read into the N and M registers respectively, so the registers specified in the ModR/M byte can be accessed by the microcode.
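As a quick illustration of the three fields, the following Python sketch splits a ModR/M byte using the standard x86 layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0). The function name `split_modrm` is just a label chosen for this example.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, r/m) fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6: overall mode
    reg = (byte >> 3) & 0b111  # bits 5-3: register field (feeds the N register)
    rm = byte & 0b111          # bits 2-0: register or memory mode (feeds the M register)
    return mod, reg, rm

# 0xD8 is binary 11 011 000: register mode, reg = 011, r/m = 000.
fields = split_modrm(0xD8)
```

For a word instruction, reg = 011 and r/m = 000 correspond to BX and AX in the 8086's register numbering.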
Let’s look at the instruction SUB AX,BX which subtracts BX from AX.
In the 8086, some important processing steps take place before the microcode starts.
In particular, the “Group Decode ROM” categorizes the instruction into over a dozen categories that affect how it is processed, such as instructions that are implemented without microcode, one-byte instructions, or instructions with a ModR/M byte.
The Group Decode ROM also indicates the structure of instructions, such as instructions that have a W bit selecting byte versus word operations, or a D bit reversing the direction of the operands.
In this case, the Group Decode ROM classifies the instruction as containing a D bit, a W bit, an ALU operation, and a ModR/M byte.
Based on the Group Decode ROM’s signals, fields from the opcode and ModR/M bytes are extracted and stored in various internal registers.
The ALU operation type (SUB) is stored in the ALU opr register.
The ModR/M byte specifies BX in the reg field and AX in the r/m field, so the reg register number (BX, 27) is stored in the N register, and the r/m register number (AX, 24) is stored in the M register.
Once the preliminary decoding is done, the microcode below for this ALU instruction is executed.[5]
There are three micro-instructions, so the instruction takes three clock cycles.
First, the register specified by M (i.e. AX) is moved to the ALU’s temporary A register (tmpA).
Meanwhile, XI configures the ALU to perform the operation specified by the instruction bits, i.e. SUB.
The second micro-instruction moves the register specified by N (i.e. BX) to the ALU’s tmpB register.
The last micro-instruction stores the ALU’s result (Σ, number 20) in the register indicated by M (i.e. AX).
move | action | comment |
---|---|---|
M → tmpA | XI tmpA | ALU rm↔r: AX to tmpA |
N → tmpB | NXT | BX to tmpB |
Σ → M | RNI F | result to AX, update flags |
One of the interesting features of the 8086 is that many instructions contain a D bit that reverses the direction of the operation, swapping the source and the destination.
If we keep the ModR/M byte but use the SUB instruction with the D bit set, the instruction becomes SUB BX,AX, subtracting AX from BX, the opposite of before.
(Swapping the source and destination is more useful when one argument is in memory. But I’ll use an example with two registers to keep it simple.)
This instruction runs exactly the same microcode as before. The difference is that when the microcode accesses M, due to the direction bit it gets the value in N, i.e. BX instead of AX. The access to N is similarly swapped.
The result is that AX is subtracted from BX, and the change of direction is transparent to the microcode.
The M and N registers
Now let’s take a closer look at how the M and N registers are implemented.
Each register holds a 5-bit register number, expanded from the three bits of the instruction.
The M register is loaded from the three least significant bits of the instruction or ModR/M byte, while the N register is loaded with bits three through five.
Most commonly, the registers are specified by the ModR/M byte, but some instructions specify the register in the opcode.[6]
The table below shows how the bits in the instruction’s opcode or ModR/M byte (i5, i4, i3) are converted to a 5-bit number for the N register.
There are three cases: a 16-bit register, an 8-bit register, and a segment register.
The mappings below may seem random, but they result in the entries shown in the 5-bit register encoding table earlier.
(A prime, as in i5′, denotes the inverted bit.)
I’ve colored the entries so you can see the correspondence.
Mode | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|
16-bit reg | 1 | 1 | i5 | i4 | i3 |
8-bit reg | i5 | i5′ | 0 | i4 | i3 |
segment reg | 0 | 0 | 0 | i4 | i3 |
I’ll go through the three cases in more detail.
Many 8086 instructions have two versions, one that acts on bytes and one that acts on words, distinguished by the W bit (bit 0) in the instruction.
If the Group Decode ROM indicates that the instruction has a W bit and the W bit is 0, then the instruction is a byte instruction.[7]
If the instruction has a ModR/M byte and the instruction operates on a byte, the N register is loaded with the 5-bit number for the specified byte register.
This happens during “Second Clock”, the clock cycle when the ModR/M byte is fetched from the instruction queue.
The second case is similar; if the instruction operates on a word, the N register is loaded with the number for the word register specified in the ModR/M byte.
The third case handles a segment register.
The N register is loaded with a segment register number during Second Clock if the Group Decode ROM indicates the instruction has a ModR/M byte with a segment-register field (specifically the segment register MOV instructions).
A bit surprisingly, a segment register number is also loaded during First Clock.
This supports the PUSH and POP segment register instructions, which have the segment register encoded in bits 3 and 4 of the opcode.[8]
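The three N-register cases can be written out as a small Python function. This is a sketch of the bit mapping in the table only; it says nothing about timing (First Clock versus Second Clock), and the function name `n_register` and the mode strings are illustrative choices.

```python
def n_register(byte, mode):
    """Expand bits 5-3 of an opcode or ModR/M byte into a 5-bit register number."""
    i5, i4, i3 = (byte >> 5) & 1, (byte >> 4) & 1, (byte >> 3) & 1
    if mode == "word":       # 16-bit reg: 1 1 i5 i4 i3
        bits = (1, 1, i5, i4, i3)
    elif mode == "byte":     # 8-bit reg: i5 i5' 0 i4 i3
        bits = (i5, 1 - i5, 0, i4, i3)
    elif mode == "segment":  # segment reg: 0 0 0 i4 i3
        bits = (0, 0, 0, i4, i3)
    else:
        raise ValueError(f"unknown mode: {mode}")
    number = 0
    for bit in bits:         # assemble bits 4..0 into an integer
        number = (number << 1) | bit
    return number

word_reg = n_register(0xD8, "word")    # reg field 011 as a word register
byte_reg = n_register(0xD8, "byte")    # reg field 011 as a byte register
seg_reg = n_register(0x1E, "segment")  # 0x1E is PUSH DS; bits 4-3 are 11
```

With these inputs the function yields 27 (BX), 11 (BL) and 3 (DS), matching the entries in the register encoding table.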
The table below shows how the bits are assigned in the M register, which uses instruction bits i2, i1, and i0.
The cases are a bit more complicated than for the N register.
First, a 16-bit register number is loaded from the opcode byte during First Clock to support instructions that specify the register in the low bits.
During Second Clock, this value may be replaced.
For a ModR/M byte using register mode, the M register is reloaded with the specified 8-bit or 16-bit register, depending on the byte mode signal described earlier.
However, for a ModR/M byte that uses a memory mode, the M register is loaded with OPR (Operand), the internal register that holds the word that is read or written to memory.
Mode | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|
16-bit reg | 1 | 1 | i2 | i1 | i0 |
8-bit reg | i2 | i2′ | 0 | i1 | i0 |
OPR | 0 | 0 | 1 | 1 | 0 |
AX/AL | byte′ | 1 | 0 | 0 | 0 |
convert to 8-bit | m2 | m2′ | 0 | m1 | m0 |
Many instructions use the AX or AL register, such as the ALU immediate instructions, the input and output instructions, and string instructions.
For these, the Group Decode ROM triggers the AX or AL register number specifically to be loaded into the M register during Second Clock.
The top bit is set for a word operation and cleared for a byte operation, providing AX or AL as appropriate.
The final M register case is a bit tricky. For an immediate move instruction such as MOV BX,imm, bit 3 switches between a byte and a word move (rather than bit 0), because bits 2-0 specify the register.
Unfortunately, the Group Decode ROM outputs aren’t available during First Clock to indicate this case.
Instead, M is loaded during First Clock with the assumption of a 16-bit register. If that turns out to be wrong, the M register is converted to an 8-bit register number during Second Clock by shuffling a few bits.
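The M-register rows can be sketched the same way in Python. The function names below are hypothetical, and each function models just one row of the table; the sequencing across First Clock and Second Clock is only hinted at in the example at the end.

```python
OPR = 0b00110  # number loaded into M for a memory-mode ModR/M byte

def m_register_16(byte):
    """16-bit reg row: 1 1 i2 i1 i0."""
    return 0b11000 | (byte & 0b111)

def m_ax_or_al(is_byte):
    """AX/AL row: byte' 1 0 0 0 (top bit set for a word operation)."""
    return ((0 if is_byte else 1) << 4) | 0b01000

def convert_to_8bit(m):
    """Convert-to-8-bit row: m2 m2' 0 m1 m0, applied to a 16-bit register number."""
    m2 = (m >> 2) & 1
    return (m2 << 4) | ((1 - m2) << 3) | (m & 0b011)

# MOV BL,imm has opcode 0xB3: M is first loaded as if the register were 16-bit,
# then converted once the instruction is known to be a byte move.
first_clock = m_register_16(0xB3)
second_clock = convert_to_8bit(first_clock)
```

For this opcode the First Clock value is 27 (BX) and the converted value is 11 (BL); the bit shuffle moves bit 2 up to bit 4, puts its complement in bit 3, and clears bit 2.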
Producing the source and destination values
There are three cases for the number that goes on the source or destination buses:
the register number can come from the micro-instruction, the value can come from the M or N register as specified in the micro-instruction,
or the value can come from the M and N registers with the roles swapped by the D bit.
(Note that the source and destination can be different cases and are selected with separate circuitry.)
The first case is the default case, where the 5 bits from the micro-instruction source or destination specify the register directly.
For instance, in the micro-instruction tmpB→AX, the microcode knows which registers are being used and specifies them directly.
The second and third cases involve more logic. Consider the source in M→tmpB.
For an instruction without a D bit, the register number is taken from M. Likewise if the D bit is 0. But if the instruction uses a D bit and the D bit is 1, then the register number is taken from N.
Multiplexers between the M and N registers select the appropriate register to put on the bus.
The M and N registers as they appear on the die. The metal layer has been removed from this image to show the silicon and polysilicon underneath.
The diagram above shows how the M and N register circuitry is implemented on the die, with the N register at the top and the M register below.
Each register has an input multiplexer that implements the tables above, selecting the appropriate 5 bits depending on the mode. The registers themselves are implemented as dynamic latches driven by the clock.
In the middle, a crossover multiplexer drives the source and destination buses, selecting the M and N registers as appropriate and amplifying the signals with relatively large transistors.
The third output from the multiplexer, the bits from the micro-instruction, is implemented in circuitry physically separated and closer to the microcode ROM.
The register selection hardware
How does the 5-bit number select a register?
The 8086 has a bunch of logic that turns a register number into a control line that enables reading or writing of the register.
For the most part, this logic is implemented with NOR gates that match a particular register number and generate a select signal.
The signal goes through a special bootstrap driver to boost its voltage since it needs to control 16 register bits.
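As a rough illustration of how a NOR gate can match one register number, here is a Python sketch. It assumes a full 5-bit match in which each bit that must be 1 is fed to the gate inverted; the real circuitry may decode only some of the bits or share terms between gates, so treat `select_line` as a hypothetical model rather than the actual gate wiring.

```python
def nor(*inputs):
    """A NOR gate: output is 1 only when every input is 0."""
    return int(not any(inputs))

def select_line(number, target):
    """Model a NOR gate wired to recognize one 5-bit register number."""
    inputs = []
    for position in range(5):
        bit = (number >> position) & 1
        wanted = (target >> position) & 1
        # Feed the bit directly if it must be 0, inverted if it must be 1,
        # so that a matching number presents all zeros to the gate.
        inputs.append(1 - bit if wanted else bit)
    return nor(*inputs)

# Which of the 32 possible numbers raise the select line wired for AX (24)?
selected = [n for n in range(32) if select_line(n, 24)]
```

Only the number 24 activates the line in this model, since any mismatched bit puts a 1 on a gate input and forces the output low.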
The 8086 registers are separated into two main groups. The “upper registers” are in the upper left of the chip, in the Bus Interface Unit.
These are the registers that are directly involved with memory accesses.
The “lower registers” are in the lower left of the chip, in the Execution Unit.
From bottom to top, they are AX, CX, DX, BX, SP, BP, SI, and DI; their physical order matches their order in the instruction set.[9]
A separate PLA (Programmable Logic Array) selects the ALU temporary registers or flags as destination.
Just below it, a PLA selects the source from ALU temporary registers, flags, or the ALU result (Σ).[10]
I’ve written about the 8086’s registers and their low-level implementation here if you want more information.
Some history
The 8086’s system of selecting registers with 3-bit codes originates with the Datapoint 2200,[11] a desktop computer announced in 1970.
The processor of the Datapoint 2200 was implemented with a board of TTL integrated circuits, since this was before microprocessors.
Many of the Datapoint’s instructions used a 3-bit code to select a register, with a destination register specification in bits 5-3 of the instruction and a source register in bits 2-0. (This layout is essentially the same as in 8086 instructions and the ModR/M byte.)[12]
The eight values of this code selected one of 7 registers, with the eighth value indicating a memory access.
Intel copied the Datapoint 2200 architecture for the 8008[13] microprocessor (1972) and cleaned it up for the 8080 (1974), but kept the basic instruction layout and register/memory selection bits.
The 8086’s use of a numbering system for all the registers goes considerably beyond this pattern, partly because its registers function both as general-purpose registers and special-purpose registers.[14]
Many instructions can act on the AX, BX, etc. registers interchangeably, treating them as general-purpose registers.
But these registers each have their own specific purposes for other instructions, so the microcode must be able to access them specifically.
This motivates the 8086’s approach where registers can be treated as general-purpose registers that are selected from instruction bits, or as special-purpose registers selected by the microcode.
The Motorola 68000 (1979) makes an interesting comparison to the 8086 since they were competitors.
The 68000 uses much wider microcode (85-bit microinstructions compared to 21 bits in the 8086).
It has two main internal buses, but instead of providing generic source and destination transfers like the 8086, the 68000 has a much more complicated system: about two dozen microcode fields that connect registers and other components to the bus in various ways.[15]
Conclusions
Internally, the 8086 represents registers with a 5-bit number. This is unusual compared to previous microprocessors, which usually selected registers directly from the instruction or control circuitry.
Three factors motivated this design in the 8086. First, it used microcode, so a uniform method of specifying registers (both programmer-visible and internal) was useful.
Second, being able to swap the source and destination in an instruction motivated a level of indirection in register specification, provided by the M and N registers.
Finally, the flexibility of the ModR/M byte, in particular supporting byte, word, and segment registers, meant that the register specification needed 5 bits.
I’ve written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die, so follow me on Twitter @kenshirriff or RSS for updates.
I’ve also started experimenting with Mastodon recently as @[email protected].