The 8086 processor’s first step of instruction decoding
A key part of any processor is instruction decoding: analyzing a numeric opcode and determining
what actions have to be taken.
The Intel 8086 processor (1978) has a complex instruction set, making instruction decoding a challenge.
The first step in decoding an 8086 instruction is something called the Group Decode ROM, which categorizes
instructions into about 35 types that control how the instruction is decoded and executed.
For instance, the Group Decode ROM determines if an instruction is executed in hardware or in microcode.
It also indicates how the instruction is structured: if the instruction has a bit specifying a byte or word operation,
if the instruction has a byte that specifies the addressing mode, and so forth.
The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip with the metal and polysilicon removed, revealing the silicon underneath.
The diagram above shows the location of the Group Decode ROM on the silicon die, as well as other key functional blocks.
The 8086 chip is partitioned into a Bus Interface Unit that communicates with external components such as memory,
and the Execution Unit that executes instructions.
Machine instructions are fetched from memory by the Bus Interface Unit and stored in the prefetch queue registers,
which hold 6 bytes of instructions.
To execute an instruction, the queue bus transfers an instruction byte from the prefetch queue to the instruction register, under control of a state machine called the Loader.
Next, the Group Decode ROM categorizes the instruction according to its structure.
In most cases, the machine instruction is implemented in low-level microcode. The instruction byte is transferred
to the Microcode Address Register, where the Microcode Address Decoder selects the appropriate microcode routine
that implements the instruction.
The microcode provides the micro-instructions that control the Arithmetic/Logic Unit (ALU), registers, and other
components to execute the instruction.
In this blog post, I’ll focus on a small part of this process: how the Group Decode ROM decodes instructions.
Be warned that this post gets down into the weeds, so you might want to start with one of my higher-level
posts, such as how the 8086’s microcode engine works.
Microcode
Most instructions in the 8086 are implemented in microcode.
Most people think of machine instructions as the basic steps that a computer performs.
However, many processors have another layer of software underneath: microcode.
With microcode, instead of building the CPU’s control circuitry from complex logic gates, the control logic is largely replaced with code.
To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.
Microcode is only used if the Group Decode ROM indicates that the instruction is implemented in microcode.
In that case, the microcode address register is loaded with the instruction and the address decoder selects
the appropriate microcode routine.
However, there is a complication. If the second byte of the instruction is a Mod R/M byte, the Group Decode ROM
indicates this and causes a memory addressing micro-subroutine to be called.
Some simple instructions are implemented entirely in hardware and don't use microcode.
These are known as 1-byte logic instructions (1BL) and are also indicated by the Group Decode ROM.
The Group Decode ROM’s structure
The Group Decode ROM takes an 8-bit instruction as input, along with an interrupt signal.
It produces 15 outputs that control how the instruction is handled.
In this section I'll discuss the physical implementation of the Group Decode ROM; the various outputs
are discussed in a later section.
Although the Group Decode ROM is called a ROM, its implementation is really a PLA (Programmable Logic Array),
two levels of highly-structured logic gates.
The idea of a PLA is to create two levels of NOR gates, each in a grid.
This structure has the advantages that it implements the logic densely and is easy to modify.
Although physically two levels of NOR gates, a PLA can be thought of as an AND layer followed by an OR layer.
The AND layer matches particular bit patterns and then the OR layer combines multiple values from the first
layer to produce arbitrary outputs.
The Group Decode ROM. This photo shows the metal layer on top of the die.
Since the output values are highly structured, a PLA implementation is considerably more efficient than a ROM, because in a sense
it combines multiple entries.
In the case of the Group Decode ROM, using a ROM structure would require 256 columns (one for each 8-bit instruction pattern),
while the PLA implementation requires just 36 columns, about 1/7 the size.
The diagram below shows how one column of the Group Decode ROM is wired in the “AND” plane.
In this die photo, I removed the metal layer with acid to reveal the polysilicon and silicon underneath.
The vertical lines show where the metal line for ground and the column output had been.
The basic idea is that each column implements a NOR gate, with a subset of the input lines selected as inputs to the
gate.
The pull-up resistor at the top pulls the column line high by default. But if any of the selected inputs are high,
the corresponding transistor turns on, connecting the column line to ground and pulling it low.
Thus, this implements a NOR gate.
However, it is more useful to think of it as an AND of the complemented inputs (via De Morgan’s Law):
if all the inputs are “correct”, the output is high.
In this way, each column matches a particular bit pattern.
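The NOR-column behavior described above can be modeled in a few lines of Python. This is only a sketch, not the actual wiring of the chip: the line names (`b7`, `not_b7`, and so on) are hypothetical, and I'm assuming that each instruction bit is available to the plane in both true and complemented form. The example column has transistors on the true lines for bits 7, 6, and 2, so it goes high only when all three bits are 0.

```python
def input_lines(opcode):
    """Return each instruction bit in true and complemented form.

    The line names are hypothetical labels, not names from the 8086 die.
    """
    lines = {}
    for bit in range(8):
        value = bool((opcode >> bit) & 1)
        lines[f"b{bit}"] = value
        lines[f"not_b{bit}"] = not value
    return lines


def nor_column(lines, connected):
    """Model one column: pulled high unless any connected input is high."""
    return not any(lines[name] for name in connected)


# Hypothetical column matching 00XXX0XX: transistors on the true lines
# for bits 7, 6, and 2, so each of those bits must be 0.
ALU_COLUMN = ["b7", "b6", "b2"]

print(nor_column(input_lines(0x00), ALU_COLUMN))  # True: bits 7, 6, 2 are all 0
print(nor_column(input_lines(0x04), ALU_COLUMN))  # False: bit 2 is set
```

To require a bit to be 1 instead, the transistor would sit on the complemented line (for example `not_b7`), which is why the NOR gate acts as an AND over the "correct" bit values.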
Closeup of a column within the Group Decode ROM.
The structure of the ROM is implemented through the silicon doping pattern, which is visible above.
A transistor is formed where a polysilicon wire crosses a doped silicon region: the polysilicon forms the gate, turning the transistor on or off.
At each intersection point, a transistor can be created or not, depending on the doping pattern.
If a particular transistor is created, then the corresponding input must be 0 to produce a high output.
At the top of the diagram above, the column outputs are switched from the metal layer to polysilicon wires and become the inputs to the upper “OR”
plane.
This plane is implemented in a similar fashion as a grid of NOR gates.
The plane is rotated 90 degrees, with the inputs vertical and each row forming an output.
Intermediate decoding in the Group Decode ROM
The first plane of the Group Decode ROM categorizes instructions into 36 types based on the instruction bit pattern.
The table below shows the 256 instruction values, colored according to their categorization.
For instance, the first blue block consists of the 32 ALU instructions
corresponding to the bit pattern 00XXX0XX, where X indicates that the bit can be 0 or 1.
These instructions are all decoded and executed in a similar way.
Almost all instructions have a single category, that is, they activate a single column line in the Group Decode ROM. However, a few instructions activate two lines and have two colors below.
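To make the bit-pattern notation concrete, here is a small Python sketch that tests opcodes against a pattern written with 0, 1, and X. The `matches` helper is my own illustration, not something taken from the chip; it simply confirms that a pattern with five X positions covers 2^5 = 32 of the 256 opcode values.

```python
def matches(opcode, pattern):
    """Return True if an 8-bit opcode fits a pattern of '0', '1', and 'X'.

    The pattern is written most significant bit first, e.g. "00XXX0XX".
    """
    bits = format(opcode, "08b")
    return all(p in ("X", b) for p, b in zip(pattern, bits))


alu_opcodes = [op for op in range(256) if matches(op, "00XXX0XX")]

print(len(alu_opcodes))                      # 32
print([hex(op) for op in alu_opcodes[:4]])   # ['0x0', '0x1', '0x2', '0x3']
```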
Grid of 8086 instructions, colored according to the first level of the Group Decode ROM.
Note that the instructions do not have arbitrary numeric opcodes, but are assigned in a way that makes decoding simpler.
Because these blocks correspond to bit patterns, there is little flexibility.
One of the challenges of instruction set design for early microprocessors was to assign numeric values to the opcodes
in a way that made decoding straightforward.
It’s a bit like a jigsaw puzzle, fitting the instructions into the 256 available values, while making them easy to decode.
Outputs from the Group Decode ROM
The Group Decode ROM has 15 outputs, one for each row of the upper half.
In this section, I'll briefly discuss these outputs and their roles in the 8086.
For an interactive exploration of these signals, see this page,
which shows the outputs that are triggered by each instruction.
Out 0 indicates an IN or OUT instruction.
This signal controls the M/IO (S2) status line, which distinguishes between a memory read/write and an I/O read/write.
Apart from this, memory and I/O accesses are basically the same.
Out 1 indicates (inverted) that the instruction has a Mod R/M byte and performs a read/modify/write on its argument. This signal is used by the Translation ROM when dispatching
an address handler (details).
(This signal distinguishes between, say, ADD [BX],AX and MOV [BX],AX.
The former both reads and writes [BX], while the latter only writes to it.)
Out 2 indicates a “group 3/4/5” opcode, an instruction where the second byte specifies the particular instruction,
and thus decoding needs to wait for the second byte.
This controls the loading of the microcode address register.
Out 3 indicates an instruction prefix (segment, LOCK, or REP).
This causes the next byte to be decoded as a new instruction, while blocking interrupt handling.
Out 4 indicates (inverted) a two-byte ROM instruction (2BR), i.e. an instruction that is handled by the microcode ROM, but
requires the second byte for decoding.
This is an instruction with a Mod R/M byte.
This signal controls the loader, indicating that it needs to fetch the second byte.
This signal is almost the same as output 1, with a few differences.
Out 5 specifies the top bit for an ALU operation. The 8086 uses a 5-bit field to specify an ALU operation.
If not specified explicitly by the microcode, the field uses bits 5 through 3 of the opcode.
(These bits distinguish, say, an ADD instruction from AND or SUB.)
This control line sets the top bit of the ALU field for instructions such as DAA, DAS, AAA, AAS, INC, and DEC
that fall into a different set from the “regular” ALU instructions.
Out 6 indicates an instruction that sets or clears a condition code directly: CLC, STC, CLI, STI, CLD, or STD
(but not CMC). This signal is used by the flag circuitry to update the condition code.
Out 7 indicates an instruction that uses the AL or AX register, depending on the instruction’s size bit.
(For instance MOVSB vs MOVSW.)
This signal is used by the register selection circuitry, the M register specifically.
Out 8 indicates a MOV instruction that uses a segment register.
This signal is used by the register selection circuitry, the N register specifically.
Out 9 indicates the instruction has a d bit, where bit 1 of the instruction swaps the source and destination.
This signal is used by the register selection circuitry, swapping the roles of the M and N registers according to the d bit.
Out 10 indicates a one-byte logic (1BL) instruction, a one-byte instruction that is implemented in logic, not microcode. These instructions are the prefixes, HLT, and the condition-code instructions.
This signal controls the loader, causing it to move to the next instruction.
Out 11 indicates instructions where bit 0 is the byte/word indicator. This signal controls the register handling
and the ALU functionality.
Out 12 indicates an instruction that operates only on a byte: DAA, DAS, AAA, AAS, AAM, AAD, and XLAT.
This signal operates in conjunction with the previous output to select a byte versus word.
Out 13 forces the instruction to use a byte argument if instruction bit 1 is set, overriding the regular byte/word pattern. Specifically, it forces the L8 (length 8 bits) condition
for the JMP direct-within-segment and the ALU instructions that are immediate with sign extension (details).
Out 14 allows a carry update. This mechanism prevents the carry flag from being updated by the INC and DEC operations.
This signal is used by the flag circuitry.
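Several of the outputs above tell the rest of the chip which opcode bits are meaningful: bit 0 as the byte/word indicator, bit 1 as the d bit, and bits 5 through 3 as the ALU operation. The Python sketch below pulls those fields out of an opcode byte. The function name and dictionary keys are my own, and the operation table is the standard 8086 encoding for the regular ALU instructions; the sketch does not model the Group Decode ROM itself, which is what decides whether each field applies to a given instruction.

```python
# Standard 8086 encoding of opcode bits 5-3 for the regular ALU instructions.
ALU_OPS = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def opcode_fields(opcode):
    """Extract the bit fields discussed above from an opcode byte.

    The fields are only meaningful when the Group Decode ROM says the
    instruction has them; this helper extracts them unconditionally.
    """
    return {
        "alu_op": ALU_OPS[(opcode >> 3) & 0b111],  # bits 5 through 3
        "d": (opcode >> 1) & 1,                    # bit 1: direction
        "w": opcode & 1,                           # bit 0: byte (0) or word (1)
    }


print(opcode_fields(0x01))  # {'alu_op': 'ADD', 'd': 0, 'w': 1}
print(opcode_fields(0x2A))  # {'alu_op': 'SUB', 'd': 1, 'w': 0}
```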
Columns
Most of the Group Decode ROM’s column signals are used to derive the outputs listed above.
However, some column outputs are also used as control signals directly. These are listed below.
Column 10 indicates an immediate MOV instruction. These instructions use instruction bit 3 (rather than bit 0) to select byte versus word, because the three low bits specify the register.
This signal affects the L8 condition described earlier and also causes the M register selection to be converted from a word register to a byte register if necessary.
Column 12 indicates an instruction with bits 5-3 specifying the ALU instruction.
This signal causes the X register to be loaded with
the bits in the instruction that specify the ALU operation. (To be precise, this signal prevents the X register
from being reloaded from the second instruction byte.)
Column 13 indicates the CMC (Complement Carry) instruction. This signal is used by the flags circuitry to complement the carry flag (details).
Column 14 indicates the HLT (Halt) instruction. This signal stops instruction processing by blocking the instruction queue.
Column 31 indicates a REP prefix. This signal causes the REPZ/NZ latch to be loaded with instruction bit 0 to
indicate if the prefix is REPNZ or REPZ. It also sets the REP latch.
Column 32 indicates a segment prefix. This signal loads the segment latches with the desired segment type.
Column 33 indicates a LOCK prefix. It sets the LOCK latch, locking the bus.
Column 34 indicates a CLI instruction. This signal immediately blocks interrupt handling to avoid an interrupt between the CLI
instruction and when the interrupt flag bit is cleared.
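The prefix columns above can be illustrated with a short Python sketch that classifies a byte as a segment, REP, or LOCK prefix. The masks come from the documented 8086 opcode encodings (0x26, 0x2E, 0x36, 0x3E for the segment overrides, 0xF2 and 0xF3 for REPNZ and REPZ, 0xF0 for LOCK), not from the actual column wiring, and `classify_prefix` is a hypothetical helper of my own.

```python
SEGMENTS = ["ES", "CS", "SS", "DS"]  # selected by opcode bits 4-3


def classify_prefix(opcode):
    """Classify a byte as a prefix, or return None if it is not one.

    A sketch based on documented opcode values, not on the ROM's columns.
    """
    if opcode & 0b11100111 == 0b00100110:       # pattern 001SS110
        return ("segment", SEGMENTS[(opcode >> 3) & 0b11])
    if opcode & 0b11111110 == 0b11110010:       # pattern 1111001Z
        # Bit 0 distinguishes the two forms, as the REPZ/NZ latch does.
        return ("rep", "REPZ" if opcode & 1 else "REPNZ")
    if opcode == 0xF0:
        return ("lock", None)
    return None


print(classify_prefix(0x2E))  # ('segment', 'CS')
print(classify_prefix(0xF3))  # ('rep', 'REPZ')
print(classify_prefix(0x90))  # None
```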
Timing
One important aspect of the Group Decode ROM is that its outputs are not instantaneous.
It takes a clock cycle to get the outputs from the Group Decode ROM.
In particular, when instruction decoding starts, the timing signal FC (First Clock) is activated to indicate the first clock
cycle. However, the Group Decode ROM’s outputs are not available until the Second Clock SC.
One consequence of this is that even the simplest instruction (such as a flag operation) takes two clock cycles, as does a prefix.
The problem is that even though the instruction could be performed in one clock cycle, it takes two clock cycles
for the Group Decode ROM to determine that the instruction only needs one cycle.
This illustrates how a complex instruction format impacts performance.
The FC and SC timing signals are generated by a state machine called the Loader.
These signals may seem trivial, but there are a few complications.
First, the prefetch queue may run empty, in which case the FC and/or SC signal is delayed until the prefetch queue has a byte available.
Second, to increase performance, the 8086 can start decoding an instruction during the last clock cycle of the previous instruction.
Thus, if the microcode indicates that there is one cycle left, the Loader can proceed with the next instruction.
Likewise, for a one-byte instruction implemented in hardware (one-byte logic or 1BL), the loader proceeds
as soon as possible.
The diagram below shows the timing of an ADD instruction. Each line is half of a clock cycle.
Execution is pipelined: the instruction is fetched during the first clock cycle (First Clock).
During Second Clock, the Group Decode ROM produces its output. The microcode address register also generates
the micro-address for the instruction’s microcode.
The microcode ROM supplies a micro-instruction during the third clock cycle and execution of the micro-instruction
takes place during the fourth clock cycle.
This diagram shows the execution of an ADD instruction and what is happening in various parts of the 8086. The arrows show the flow from step to step. The character µ is short for “micro”.
The Group Decode ROM’s outputs during Second Clock control the decoding.
Most importantly, the ADD imm instruction uses microcode; it is not a one-byte logic instruction (1BL).
Moreover, it does not have a Mod R/M byte, so it does not need two bytes for decoding (2BR).
For a 1BL instruction, microcode execution would be blocked and the next instruction would be immediately fetched.
On the other hand, for a 2BR instruction, the loader would tell the prefetch queue that it was done with the
second byte during the second half of Second Clock.
Microcode execution would be blocked during the third cycle and the fourth cycle would execute a microcode
subroutine to determine the memory address.
For more details, see my article on the 8086 pipeline.
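The three cases just described (ordinary microcode, 1BL, and 2BR) can be summarized as a rough Python sketch. It only restates the cycle-by-cycle description above under simplifying assumptions: the prefetch queue is never empty and there is no overlap with the previous instruction. The function name and wording of each step are my own, not signals from the chip.

```python
def decode_timeline(is_1bl=False, is_2br=False):
    """List what happens in each clock cycle for one instruction.

    A simplified restatement of the text: assumes a full prefetch queue
    and ignores overlap with the previous instruction.
    """
    timeline = [
        "cycle 1 (First Clock): instruction byte fetched from the queue",
        "cycle 2 (Second Clock): Group Decode ROM outputs available",
    ]
    if is_1bl:
        # Implemented in logic: microcode is blocked and the loader moves on.
        timeline.append("microcode blocked; next instruction fetched")
    elif is_2br:
        # Mod R/M byte present: the addressing subroutine runs first.
        timeline.append("cycle 3: microcode execution blocked")
        timeline.append("cycle 4: addressing micro-subroutine executes")
    else:
        timeline.append("cycle 3: microcode ROM supplies a micro-instruction")
        timeline.append("cycle 4: micro-instruction executes")
    return timeline


for step in decode_timeline():           # e.g. the ADD imm case
    print(step)
```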
Interrupts
The Group Decode ROM takes the 8 bits of the instruction as inputs, but it has an additional input indicating that
an interrupt is being handled.
This signal blocks most of the Group Decode ROM outputs.
This prevents the current instruction’s outputs from interfering with interrupt handling.
I wrote about the 8086’s interrupt handling in detail here, so I won't go into more detail in this post.
Conclusions
The Group Decode ROM indicates one of the key differences between CISC processors (Complex Instruction Set Computer) such as the 8086 and the RISC processors (Reduced Instruction Set Computer) that became popular a few years later.
A RISC instruction set is designed to make instruction decoding very easy, with a small number of uniform instruction types.
On the other hand, the 8086’s CISC instruction set was designed for compactness and high code density.
As a result, instructions are squeezed into the available opcode space.
Although there is a lot of structure to the 8086 opcodes, this structure is full of special cases and any patterns only apply to a subset of the instructions.
The Group Decode ROM brings some order to this chaotic jumble of instructions, and the number of outputs
from the Group Decode ROM is a measure of the instruction set’s complexity.
The 8086’s instruction set was extended over the decades to become the x86 instruction set in use today.
During that time, more layers of complexity were added to the instruction set.
Now, an x86 instruction can be up to 15 bytes long with multiple prefixes.
Some prefixes change the register encoding or indicate a completely different instruction set such as VEX
(Vector Extensions) or SSE (Streaming SIMD Extensions).
Thus, x86 instruction decoding is very difficult, especially when trying to decode multiple instructions in parallel.
This has an impact in modern systems, where x86 processors typically have 4 complex instruction decoders while Apple’s ARM processors have 8 simpler decoders; this is said to give Apple a performance benefit.
Thus, architectural decisions from 45 years ago are still impacting the performance of modern processors.
I’ve written numerous posts on the 8086 so far and
plan to continue reverse-engineering the 8086 die, so
follow me on Twitter @kenshirriff or RSS for updates.
I’ve also started experimenting with Mastodon recently as @[email protected].
Thanks to Arjan Holscher for suggesting this topic.