Reverse-engineering the Intel 8086 processor’s HALT circuits
The 8086 processor was introduced in 1978 and has greatly influenced modern computing through the x86 architecture.
One unusual instruction in this processor is HLT, which stops the processor and puts it into a halt state.
In this blog post, I explain in detail how the halt circuitry is implemented and how it interacts with the 8086’s architecture.
The die photo below shows the 8086 microprocessor under a microscope.
The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath.
Around the edges of the die, bond wires connect pads to the chip’s 40 external pins.
I’ve labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below.
Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below.
The BIU handles memory accesses, while the EU executes instructions.
Both units are stopped by a halt instruction.
The 8086 die under a microscope, with the main functional blocks labeled. This photo shows the chip’s single metal layer; the polysilicon and silicon are underneath.
Halt processing in the Execution Unit
In this section, I’ll explain how the HLT instruction is decoded and handled in the Execution Unit.
The 8086 uses a combination of lookup ROMs, logic, and microcode to implement instructions.
The process starts with the loader, a state machine that provides synchronization between the prefetch queue and the decoding circuitry.
When an instruction byte is available, the loader provides a signal called First Clock that loads the instruction into the Instruction Register and starts the instruction decoding process.
Before microcode gets involved, the Group Decode ROM classifies instructions by generating about 15 signals, indicating properties such as instructions with a Mod R/M byte, instructions with a byte/word bit, instructions that always act on a byte, and so forth.
For the HLT instruction, the Group Decode ROM provides two important signals.
The first is one-byte logic (1BL), indicating that the instruction is one byte long and is implemented with logic circuitry rather than microcode.[1] The second signal is produced specifically for the HLT instruction and generates the internal HALT signal.
This signal travels to various parts of the 8086 to halt the processor.
The Group Decode ROM. The yellow rectangle detects the HLT instruction, with an output at the bottom. The red rectangle generates the 1BL (one-byte logic) signal.
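As a rough illustration, the two HLT-related outputs can be modeled in Python as a lookup from opcode to signals. This is a minimal sketch, not the ROM’s real contents: 0xF4 is the actual 8086 encoding of HLT, but the function name `group_decode`, the signal dictionary, and the `ONE_BYTE_LOGIC` set are hypothetical, and the set lists only HLT even though the real ROM flags other one-byte instructions as 1BL too.

```python
HLT_OPCODE = 0xF4  # 8086 encoding of HLT

# Hypothetical, deliberately incomplete: only HLT is listed here, although the
# real Group Decode ROM asserts 1BL for other logic-implemented instructions too.
ONE_BYTE_LOGIC = {HLT_OPCODE}


def group_decode(opcode: int) -> dict:
    """Sketch of two Group Decode ROM outputs for a single opcode."""
    opcode &= 0xFF
    return {
        "1BL": opcode in ONE_BYTE_LOGIC,  # implemented by logic, not microcode
        "HALT": opcode == HLT_OPCODE,     # internal HALT signal, HLT only
    }


print(group_decode(0xF4))  # {'1BL': True, 'HALT': True}
print(group_decode(0x90))  # {'1BL': False, 'HALT': False}
```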
In the Execution Unit, the HALT signal blocks the reading of new instructions from the prefetch queue.
This causes the loader to wait indefinitely, stopping the execution of new instructions.
Since no new instruction replaces HLT, the Group Decode ROM continues to generate the HALT signal.
The HALT signal also blocks most of the other outputs from the Group Decode ROM, preventing other decoding actions.
Thus, the Execution Unit sits idle as a result of the HLT instruction, unable to start a new instruction.
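The effect on the loader can be sketched with a toy model in which the loader moves one byte from the prefetch queue into the Instruction Register per clock unless HALT is active. The function `run_execution_unit` is hypothetical, and the assumption that every instruction completes in one clock is a deliberate oversimplification so that only the HLT blocking is visible.

```python
from collections import deque

HLT_OPCODE = 0xF4


def run_execution_unit(prefetched: list[int], clocks: int) -> list[int]:
    """Return the opcodes the loader moves into the Instruction Register.

    Simplification (assumption): one instruction starts per clock and
    finishes immediately, so only the HALT blocking is modeled.
    """
    queue = deque(prefetched)
    instruction_register = None
    started = []
    for _ in range(clocks):
        # While HLT sits in the Instruction Register, the Group Decode ROM
        # keeps asserting HALT, which blocks reads from the prefetch queue.
        halt = instruction_register == HLT_OPCODE
        if queue and not halt:
            instruction_register = queue.popleft()  # First Clock
            started.append(instruction_register)
    return started


print([hex(op) for op in run_execution_unit([0x90, 0xF4, 0x90, 0x90], 10)])
# ['0x90', '0xf4']: the NOPs queued after HLT are never started
```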
Modern processors often have low-power halt modes, where part of the processor is shut down or a clock domain is stopped to reduce power consumption.
The 8086, however, doesn’t do anything clever to minimize power consumption in the halt mode, since this wasn’t a concern for processors in the 1970s.
Halt processing in the Bus Interface Unit
Memory and I/O devices are connected to the 8086 chip by a bus that transmits address, data, and control information.
The 8086’s Bus Interface Unit handles reads and writes over this bus, operating independently from the Execution Unit.
A complete bus cycle for a read or write takes four clock periods, called T1, T2, T3, and T4,[2] with specific signals on the bus for each time state.
A HLT instruction stops the Bus Interface Unit, but this takes several steps.
First, the Bus Interface Unit must complete any currently-running bus cycle. Next, any new bus cycle must be blocked.
Finally, the processor indicates the HALT state to any devices on the bus by issuing a special T1 cycle over the bus.
The main HALT control signal inside the Bus Interface Unit is something I call halt-not-hold, indicating that a HALT is active, but not a HOLD. (Ignore the HOLD part for now.)
This signal is activated by the HLT instruction signal from the Group Decode ROM, except that it is blocked by any bus operations in progress. Once any current bus operation reaches T2, halt-not-hold gets activated and starts the halt process while the current bus cycle winds up.
To prevent new bus activity, the halt-not-hold signal blocks new prefetch requests.
The only other source of bus activity is an instruction that performs reads or writes.
But the current instruction is HLT, so it won’t generate any bus traffic.
Thus, the Bus Interface Unit will remain idle.
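The gating described above can be summarized as two small Boolean functions. This is a hedged sketch: the names are hypothetical, and treating a cycle in T0 or T1 as “blocking” is my reading of the statement that halt-not-hold becomes active once the current operation reaches T2; the real circuit has more conditions.

```python
from typing import Optional


def halt_not_hold(halt: bool, hold: bool, bus_state: Optional[str]) -> bool:
    """Sketch of the halt-not-hold condition.

    bus_state is the state of a read/write cycle already in progress
    ("T0" through "T4"), or None when the bus is idle. Assumption: such a
    cycle blocks the signal until it reaches T2.
    """
    early_in_cycle = bus_state in ("T0", "T1")
    return halt and not hold and not early_in_cycle


def allow_prefetch(wants_prefetch: bool, hnh: bool) -> bool:
    """New prefetch requests are suppressed while halt-not-hold is active."""
    return wants_prefetch and not hnh


hnh = halt_not_hold(halt=True, hold=False, bus_state="T3")
print(hnh, allow_prefetch(True, hnh))  # True False
```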
The read/write control circuitry on the die with the flip-flops labeled. Metal and polysilicon were removed to show the underlying silicon.
The circuitry to control the bus cycle is complicated, with many flip-flops and logic gates; the diagram above shows the flip-flops.
I plan to write about the bus cycle circuitry in detail later, but for now, I’ll give an extremely simplified description.
Internally, there’s a T0 state before T1 to provide a cycle to set up the bus operation.
The bus timing states are controlled by a chain of flip-flops configured like a shift register with additional logic: the output from the T0 flip-flop is connected to the input of the T1 flip-flop, and likewise with T2 and T3, forming a chain.
A bus cycle is started by putting a 1 into the input of the T0 flip-flop.[3]
When the CPU’s clock transitions, the flip-flop latches this signal, indicating the (internal) T0 bus state.
On the next clock cycle, this 1 signal moves from the T0 flip-flop to the T1 flip-flop, creating the externally-visible T1 state.
Likewise, the signal passes to the T2 and T3 flip-flops in sequence, creating the bus cycle.
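The flip-flop chain behaves like a shift register, which a few lines of Python can simulate. In this simplified sketch (the function `simulate_bus_cycle` is hypothetical), a single 1 injected at the T0 input appears in T0, T1, T2, and T3 on successive clocks; T4 and the additional control logic are not modeled.

```python
STATES = ("T0", "T1", "T2", "T3")


def simulate_bus_cycle(start_clocks: set[int], clocks: int) -> list[tuple[str, ...]]:
    """Shift a 1 through a chain of T-state flip-flops (simplified model)."""
    flip_flops = [False] * len(STATES)  # flip_flops[i] is the Ti output
    trace = []
    for clk in range(clocks):
        start = clk in start_clocks  # a 1 at the T0 input starts a bus cycle
        # On each clock edge every flip-flop latches its input at once:
        # T0 takes the start request and each later stage takes the
        # previous stage's output, exactly like a shift register.
        flip_flops = [start] + flip_flops[:-1]
        trace.append(tuple(s for s, on in zip(STATES, flip_flops) if on))
    return trace


for clk, active in enumerate(simulate_bus_cycle({0}, 5)):
    print(clk, active or "idle")
```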
A slightly different path is used to generate the special T1 signal that indicates a HALT.
Once any bus activity is completed, the halt-not-hold signal puts a 1 into the T1 flip-flop through some gates.
This generates the T1 signal, bypassing T0.
Moreover, this signal doesn’t propagate to the T2 flip-flop because it is blocked by halt-not-hold and some gates.
Another flip-flop blocks this T1 cycle after the first cycle so halt-not-hold doesn’t repeatedly trigger it.
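The special HALT path can be added to the same kind of model. In this sketch, halt-not-hold feeds the T1 flip-flop directly when the bus is idle, blocks the T1-to-T2 transfer, and a one-shot flip-flop stops it from firing again. The function name and the rule that the one-shot clears when halt-not-hold drops are assumptions for illustration, not a transcription of the gates on the die.

```python
def simulate_halt_t1(halt_not_hold: list[bool]) -> list[str]:
    """Sketch of the special HALT T1 path, one list entry per clock.

    Assumptions: no read/write cycles are requested, and the one-shot
    flip-flop that suppresses repeats clears whenever halt-not-hold drops.
    """
    t0 = t1 = t2 = t3 = False
    fired = False  # hypothetical one-shot flip-flop
    trace = []
    for hnh in halt_not_hold:
        bus_idle = not (t0 or t1 or t2 or t3)
        # halt-not-hold feeds the T1 flip-flop directly, bypassing T0,
        # but only once per halt and only when the bus is idle.
        special_t1 = hnh and bus_idle and not fired
        next_t0 = False                 # no new bus cycles are started
        next_t1 = t0 or special_t1
        next_t2 = t1 and not hnh        # halt-not-hold blocks T1 -> T2
        next_t3 = t2
        fired = hnh and (fired or special_t1)
        t0, t1, t2, t3 = next_t0, next_t1, next_t2, next_t3
        trace.append("T1" if t1 else "T2" if t2 else "T3" if t3 else "idle")
    return trace


print(simulate_halt_t1([True] * 4))  # ['T1', 'idle', 'idle', 'idle']
```

With halt-not-hold held high, the sketch issues exactly one T1 and then leaves the bus idle; under the stated assumption, `simulate_halt_t1([True, True, False, True])` produces a second T1 once the signal drops and returns.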
Overall, this special HALT T1 state looks like a special case that was hacked into the circuitry.
One complication is the bus hold feature.
The 8086 supports complex bus configurations, where external devices may take control of the bus.
For instance, peripherals may use the bus for direct memory access, bypassing the CPU.
A device can request control of the bus, a “bus hold”, through the 8086’s HOLD pin.[4]
This causes the 8086 to electrically stop putting signals on the bus (i.e., a high-impedance, tri-state off state). This allows another device to use the bus until it releases HOLD.
Even when the CPU is halted, it still has “ownership” of the bus and drives the bus with idle signals.[5]
If a device requests a bus hold while the CPU is halted, the halt-not-hold signal is blocked.
When the device releases the hold, halt-not-hold is unblocked.
This causes the 8086 to go through the special T1 cycle again, using the same flip-flop process described above.
This lets listeners on the bus know that the CPU is still halted.
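The hold interaction can be reduced to watching for rising edges of halt-not-hold, assuming the bus is otherwise idle. In this minimal sketch with hypothetical names, halt-not-hold is computed as HALT and not HOLD, and a special T1 is reported each time it goes from low to high.

```python
def halt_t1_clocks(halt: list[bool], hold: list[bool]) -> list[int]:
    """Clocks at which a special HALT T1 cycle would be issued.

    Simplified sketch (assumption): the bus is otherwise idle, so a special
    T1 occurs on each rising edge of halt-not-hold = HALT and not HOLD.
    """
    clocks = []
    previous = False
    for clk, (h, hd) in enumerate(zip(halt, hold)):
        hnh = h and not hd
        if hnh and not previous:  # low-to-high transition
            clocks.append(clk)
        previous = hnh
    return clocks


halt = [True] * 8
hold = [False, False, False, True, True, False, False, False]
print(halt_t1_clocks(halt, hold))  # [0, 5]: once at HLT, again after HOLD is released
```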
Exiting the halt state
The processor exits the halt state when it receives a reset, interrupt, or non-maskable interrupt.
To implement this, an interrupt unblocks the instruction decoder by overriding the queue-unavailable signal.
This causes the loader, which controls instruction decoding, to move into the First Clock state.
Meanwhile, the interrupt causes the microcode address register to be loaded with the hardcoded microcode address of the appropriate interrupt routine.
Thus, the microcode engine starts running the interrupt handler microcode.
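The loader’s decision can be sketched as a single function. This is a simplified model with hypothetical names: `queue_unavailable` stands for the blocked-queue condition, `interrupt` stands for a pending interrupt request, and the returned string is only a placeholder for the hardcoded microcode address, whose actual value is not modeled.

```python
from typing import Optional, Tuple


def exit_halt_step(queue_has_byte: bool, halt: bool,
                   interrupt: bool) -> Tuple[bool, Optional[str]]:
    """Return (first_clock, microcode_entry) for one loader decision.

    The entry label is a hypothetical placeholder for the hardcoded
    microcode address of the interrupt routine.
    """
    queue_unavailable = halt or not queue_has_byte
    # An interrupt overrides queue-unavailable, so the loader reaches First Clock.
    first_clock = interrupt or not queue_unavailable
    microcode_entry = "interrupt routine" if interrupt else None
    return first_clock, microcode_entry


print(exit_halt_step(queue_has_byte=True, halt=True, interrupt=False))  # (False, None)
print(exit_halt_step(queue_has_byte=True, halt=True, interrupt=True))   # (True, 'interrupt routine')
```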
The Instruction Register holds the 8-bit opcode that is currently being processed.
It has a ninth bit that indicates whether an interrupt is being processed.
The Instruction Register (along with the interrupt bit) is loaded on First Clock (described above).
It outputs the instruction and interrupt bit to the Group Decode ROM one clock cycle later.
The interrupt bit blocks regular instruction decoding by the Group Decode ROM.
In particular, the HLT instruction will no longer be decoded, dropping the HALT signal throughout the CPU.
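The one-clock delay and the ninth bit can be sketched with a small class. This is a hedged illustration: `InstructionRegister` and `halt_signal` are hypothetical names, and reloading 0xF4 as the opcode alongside the interrupt bit is an arbitrary choice, since a set interrupt bit suppresses decoding regardless of the opcode.

```python
HLT_OPCODE = 0xF4


class InstructionRegister:
    """Sketch of a 9-bit Instruction Register (8-bit opcode + interrupt bit).

    Assumption for illustration: a value loaded on First Clock reaches the
    Group Decode ROM on the following clock.
    """

    def __init__(self, opcode: int = 0x00, interrupt: bool = False):
        self.latched = (opcode & 0xFF, interrupt)  # captured on First Clock
        self.output = self.latched                 # seen by the Group Decode ROM

    def clock(self, first_clock: bool = False, opcode: int = 0x00,
              interrupt: bool = False) -> None:
        self.output = self.latched  # one-clock delay to the decoder
        if first_clock:
            self.latched = (opcode & 0xFF, interrupt)


def halt_signal(ir_output: tuple) -> bool:
    """HALT is decoded only when the interrupt bit is clear."""
    opcode, interrupt = ir_output
    return opcode == HLT_OPCODE and not interrupt


ir = InstructionRegister(HLT_OPCODE)   # processor sitting in HLT
print(halt_signal(ir.output))          # True
ir.clock(first_clock=True, opcode=HLT_OPCODE, interrupt=True)
print(halt_signal(ir.output))          # True: the new value is not visible yet
ir.clock()
print(halt_signal(ir.output))          # False: HALT drops
```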
In the Execution Unit, this reactivates the prefetch queue. This will allow instruction execution once the microcode finishes executing the interrupt handling code.
In the Bus Interface Unit, dropping the HALT signal causes halt-not-hold to drop.
This allows bus activity from the Bus Interface Unit.[6]
History of HALT and x86
Historically, computers usually had some sort of “stop” or “wait” instruction to stop execution at the end of a program.
This goes back to the electromechanical Harvard Mark I (1944), EDSAC (1949), and Univac I (1951), among other machines.
Most (but not all) mainframes and minicomputers continued this approach.[7]
The HLT instruction in the 8086, like many other features, derives from the Datapoint 2200, and there’s an interesting story behind that.
The Datapoint 2200 was a desktop computer announced in 1970 and sold as a “programmable terminal”.
The processor of the Datapoint 2200 was implemented with a board of TTL integrated circuits, since this was before microprocessors.
The Datapoint’s manufacturer talked to Intel and Texas Instruments about replacing the board of chips with a single processor chip.
Texas Instruments produced the TMX 1795 microprocessor chip and Intel produced the 8008 shortly after,[8] both copying the Datapoint 2200’s architecture and instruction set.
Datapoint didn’t like the performance of these chips and decided to stick with a TTL-based processor.
Texas Instruments couldn’t find a customer for the TMX 1795 and abandoned it.
Intel, on the other hand, sold the 8008 as an 8-bit microprocessor, creating the microprocessor market in the process.
Intel improved the 8008 to create the popular 8080 microprocessor (1974). Zilog produced the more powerful Z80 (1976), which was backward-compatible with the 8080.
The Datapoint 2200. This is the later Model II with an improved TTL processor using the 74181 ALU chip.
Intel started designing the iAPX 432 in 1975 to be their high-end 32-bit processor, a “micromainframe” that supported garbage collection and objects in the processor.
The iAPX 432 was too complex for the time, and as the schedule slipped, Intel decided to produce a stopgap 16-bit processor to compete with Zilog and Motorola: this processor became the 8086.
To make it easier for Intel customers to move to the 8086, the processor was designed for compatibility with 8080 assembly language, so it inherited much of the 8080’s architecture and instruction set, although extended from 8 bits to 16 bits.[9]
The consequence of this history is that the 8086 inherited many features from the Datapoint 2200.
The Datapoint 2200 used cheaper shift-register memory, so it had a serial processor that operated on one bit at a time.
This required the Datapoint 2200 to be little-endian, a feature that lives on in the x86 architecture.
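The reasoning behind little-endian order is that a bit-serial adder must process the least-significant bit first so that each carry is ready for the next bit; storing the low byte at the lower address lets multi-byte values stream out of memory in that same order. The following Python sketch shows a generic bit-serial adder working on little-endian byte strings. It illustrates the principle only and is not a model of the Datapoint 2200’s actual circuitry; `serial_add` is a hypothetical name.

```python
def serial_add(a: bytes, b: bytes) -> tuple[bytes, int]:
    """Add two equal-length little-endian numbers one bit at a time.

    Bytes are consumed in address order (lowest address first) and bits
    within each byte least-significant first, so every carry is already
    known when the next bit arrives.
    """
    carry = 0
    result = bytearray()
    for byte_a, byte_b in zip(a, b):
        out = 0
        for bit in range(8):
            x = (byte_a >> bit) & 1
            y = (byte_b >> bit) & 1
            out |= (x ^ y ^ carry) << bit          # sum bit uses the incoming carry
            carry = (x & y) | (carry & (x ^ y))    # carry for the next bit
        result.append(out)
    return bytes(result), carry


total, carry_out = serial_add((0x00FF).to_bytes(2, "little"), (0x0001).to_bytes(2, "little"))
print(hex(int.from_bytes(total, "little")), carry_out)  # 0x100 0
```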
Since the Datapoint 2200 was marketed as a programmable terminal, it had parity calculation built into the hardware.
Thus, the 8008 and its descendants have a parity flag, in contrast to contemporary processors such as the 6800 and 6502 that omitted this fairly complex feature.
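For reference, the x86 parity flag (PF) is set when the low 8 bits of a result contain an even number of 1 bits; only the low byte counts, even for 16-bit operations. A minimal Python helper (the name `x86_parity_flag` is hypothetical) makes the definition concrete:

```python
def x86_parity_flag(result: int) -> int:
    """Return PF as x86 defines it: 1 if the low 8 bits of the result
    contain an even number of 1 bits, otherwise 0."""
    ones = bin(result & 0xFF).count("1")
    return 1 if ones % 2 == 0 else 0


print([x86_parity_flag(v) for v in (0x00, 0x03, 0x07, 0x1FF)])  # [1, 1, 0, 1]
```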
The use of I/O ports instead of memory-mapped I/O is another feature of the Datapoint 2200 that persists in the x86, but was not used in the 6800 and 6502 or their descendants.
Finally, the Datapoint 2200’s HALT instruction was copied exactly by the 8008[10] and persists in x86.
Conclusions
The HLT instruction seems like a simple function, but its implementation touches many parts of the 8086.
It is implemented in logic circuitry, completely bypassing the microcode.
The implementation became more complicated because of the 8086’s four-step bus protocol, as well as the interaction between halting and the bus hold feature.
This illustrates how complexity creates more complexity, something the RISC processors of the 1980s tried to counter.
I’ve written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die, so follow me on Twitter @kenshirriff or RSS for updates.
I’ve also started experimenting with Mastodon recently.
Thanks to monocasa for suggesting this topic.