Reverse-engineering Ethernet backoff on the Intel 82586 network chip's die

2023-10-31 10:59:57

Introduced in 1973, Ethernet is the predominant way of wiring computers together.
Chips were soon introduced to handle the low-level aspects of Ethernet: converting data packets into bits,
implementing checksums, and handling network collisions.
In 1982, Intel announced the i82586 Ethernet LAN coprocessor chip, which went much further by offloading most of the
data movement from the main processor to an on-chip coprocessor.
Modern Ethernet networks handle a gigabit of data per second or more, but at the time, the Intel chip's support for 10 Mb/s
Ethernet put it on the cutting edge.
(Ethernet was surprisingly expensive, about $2000 at the time, but expected to drop under $1000 with the Intel chip.)
In this blog post, I focus on a specific part of the coprocessor chip: how it handles network collisions and implements
exponential backoff.

The die photo below shows the i82586 chip. This photo shows the metal layer on top of the chip, which hides the underlying
polysilicon wiring and silicon transistors.
Around the edge of the chip, square bond pads provide the link to the chip's 48 external pins.
I've labeled the function blocks based on my reverse engineering and published descriptions. The left side of the chip is called the "receive unit" and
handles the low-level networking,
with circuitry for the network transmitter and receiver.
The left side also contains low-level control and status registers.
The right side is called the "command unit" and interfaces to memory and the main processor.
The right side contains a simple processor controlled by a microinstruction ROM.[1]
Data is transmitted between the two halves of the chip by 16-byte FIFOs (first in, first out queues).

The die of the Intel 82586 with the main functional blocks labeled. Click this image (or any other) for a larger version.

The 82586 chip is more complex than the typical Ethernet chip of the time.
It was designed to improve system performance by moving most of the Ethernet processing from the main processor to
the coprocessor, allowing the main processor and the coprocessor to operate in parallel.
The coprocessor provides four DMA channels to move data between memory and the network without the main processor's involvement.
The main processor and the coprocessor communicate through complex data structures[2] in shared memory: the main processor puts control blocks in memory
to tell the I/O coprocessor what to do, specifying the locations of transmit and receive buffers in memory.
In response, the I/O coprocessor puts status blocks in memory.
The processor onboard the 82586 chip allows the chip to handle these complex data structures in software.
Meanwhile, the transmit/receive circuitry on the left side of the chip uses dedicated circuitry to handle the low-level,
high-speed aspects of Ethernet.

Ethernet and collisions

A key problem with a shared network is how to prevent multiple computers from trying to send data on the network at the same
time.
Instead of a centralized control mechanism, Ethernet allows computers to transmit whenever they want.[3]
If two computers transmit at the same time, the "collision" is detected and the computers try again, hoping to
avoid a collision the next time.
Although this may sound inefficient, it turns out to work remarkably well.[4]
To avoid a second collision, each computer waits a random amount of time before retransmitting the packet.
If a collision happens again (which is likely on a busy network), an exponential backoff algorithm is used, with each
computer waiting longer and longer after each collision.
This automatically balances the retransmission delay to minimize collisions and maximize throughput.
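
To make the idea concrete, here is a minimal Python sketch of truncated binary exponential backoff as the IEEE 802.3 standard defines it. It is not taken from the 82586 itself. The constants (a 512-bit slot time, a range that stops doubling after 10 collisions, and a limit of 16 attempts) come from the standard rather than from this article, and the function names are made up for illustration.

```python
import random

SLOT_TIME_BITS = 512   # one slot time on 10 Mb/s Ethernet, in bit times
BACKOFF_LIMIT = 10     # the random range stops doubling after 10 collisions
ATTEMPT_LIMIT = 16     # the frame is abandoned after 16 attempts


def max_backoff_slots(collisions):
    """Largest possible wait, in slot times, after the given collision count."""
    return 2 ** min(collisions, BACKOFF_LIMIT) - 1


def backoff_slots(collisions, rng=random):
    """Pick a random wait, in slot times, after the given collision count."""
    if not 1 <= collisions < ATTEMPT_LIMIT:
        raise ValueError("collision count must be between 1 and 15")
    # Uniform over 0 .. 2**k - 1, where k is capped at BACKOFF_LIMIT.
    return rng.randrange(max_backoff_slots(collisions) + 1)


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 15):
        slots = backoff_slots(n)
        print(f"collision {n}: wait {slots} slots "
              f"({slots * SLOT_TIME_BITS} bit times, max {max_backoff_slots(n)} slots)")
```

The range of possible delays doubles with each collision, which is where the 10-bit width of the registers described below comes from: ten bits cover the largest range of 0 to 1023 slots.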

I traced out a bunch of circuitry to determine how the exponential backoff logic is implemented.
To summarize, exponential backoff is implemented with a 10-bit counter to provide a pseudorandom number, a 10-bit mask register to get an exponentially sized
delay, and a delay counter to count down the delay.
I'll discuss how these are implemented, starting with the 10-bit counter.

The 10-bit counter

A 10-bit counter may seem trivial, but it still takes up a substantial area of the chip.
The straightforward way of implementing a counter is to hook up 10 latches as a "ripple counter".
The counter is controlled by a clock signal that indicates that the counter should increment.
The clock toggles the lowest bit of the counter.
If this bit flips from 1 to 0, the next higher bit is toggled.
The process is repeated from bit to bit, toggling a bit if there is a carry.
The problem with this approach is that the carry "ripples" through the counter.
Each bit is delayed by the lower bit, so the bits don't all flip at the same time.
This limits the speed of the counter, as the top bit isn't settled until the carry has propagated through the nine lower bits.
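
The ripple behavior can be illustrated with a short Python model. This is only a behavioral sketch, not a description of any real circuit: the function name is hypothetical, and the number of stages that toggle in sequence stands in for the propagation delay.

```python
def ripple_increment(bits):
    """Increment a counter held as a list of bits, lowest bit first.

    Returns the new bits and the number of stages that toggled one after
    another, which stands in for the ripple delay.
    """
    bits = list(bits)
    stages = 0
    for i in range(len(bits)):
        bits[i] ^= 1          # this stage toggles
        stages += 1
        if bits[i] == 1:      # flipped 0 -> 1: no carry, so the ripple stops
            break
    return bits, stages


if __name__ == "__main__":
    # Best case: only the lowest bit changes.
    print(ripple_increment([0] * 10))
    # Worst case: nine low ones force the carry through every stage.
    print(ripple_increment([1] * 9 + [0]))
```

In the worst case the carry passes through all ten stages before the top bit is valid, which is the delay the chip's counter design avoids.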

The counter in the chip uses a different approach with additional circuitry to improve performance.
Each bit has logic to check if all the lower bits are ones. If so, the clock signal toggles the bit.
All the bits toggle at the same time, rapidly incrementing the counter in response to the clock signals.
The drawback of this approach is that it requires much more logic.

The diagram below shows how the carry logic is implemented.
The circuitry is optimized to balance speed and complexity.
In particular, bits are examined in groups of three, allowing some of the logic to be shared across multiple bits.
For instance, instead of using a 9-input gate to examine the nine lower bits, separate gates test bits 0-2 and 3-5.
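
The toggle rule can be sketched in Python as follows. This is a logical model only. The grouping into bits 0-2, 3-5, and 6-8 is an assumption extrapolated from the example above, and the function names are hypothetical; the actual gate arrangement is what the die diagram shows.

```python
WIDTH = 10


def toggle_signals(count):
    """Return one flag per bit: bit i toggles when bits 0..i-1 are all ones."""
    bits = [(count >> i) & 1 for i in range(WIDTH)]
    # Shared terms, one per group of three bits (assumed groups 0-2, 3-5, 6-8).
    group = [all(bits[g:g + 3]) for g in range(0, WIDTH - 1, 3)]
    toggles = []
    for i in range(WIDTH):
        full_groups = all(group[: i // 3])        # complete groups below bit i
        partial = all(bits[(i // 3) * 3 : i])     # lower bits in bit i's own group
        toggles.append(full_groups and partial)
    return toggles


def increment(count):
    """Apply every toggle at once, as a synchronous counter would."""
    new = count
    for i, toggle in enumerate(toggle_signals(count)):
        if toggle:
            new ^= 1 << i
    return new


if __name__ == "__main__":
    print(format(increment(0b0111111111), "010b"))   # every bit toggles together
```

Because each toggle signal depends only on the current count, all the bits can change in the same clock step. Sharing the group terms means the upper bits reuse results that the lower bits already need.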

The circuitry to generate the toggle signals for each bit of the counter.

The implementation of the latches is also interesting.
Each latch is implemented with dynamic logic, using the circuit's capacitance to store each bit.
The input is connected to the output with two inverters.
When the clock is high, the transistor turns on, connecting the inverters in a loop that holds the value.
When the clock is low, the transistor turns off. However, the 0 or 1 value will still remain on the input to
the first inverter, held by the charge on the transistor's gate.
At this time, an input can be fed into the latch, overriding the old value.

The basic dynamic latch circuit.

The latch has some additional circuitry to make it useful.
To toggle the latch, the output is inverted before feeding it back to the input. The toggle control signal selects
the inverted output through another pass transistor.
The toggle signal is only activated when the clock is low, ensuring that the
circuit doesn't repeatedly toggle, oscillating out of control.

One bit of the counter.

The image below shows how the counter circuit is implemented on the die. I have removed the metal layer to show the underlying transistors; the circles are contacts where the metal was connected to the underlying silicon.
The pinkish regions are doped silicon. The pink-gray lines are polysilicon wiring. When polysilicon crosses doped silicon, it
creates a transistor.
The blue color swirls are not significant; they are bits of oxide remaining on the die.

The counter circuitry on the die.

The 10-bit mask register

The mask register has a particular number of low bits set, providing a mask of length 0 to 10.
For instance, with 4 bits set, the mask register is 0000001111.
The mask register can be updated in two ways. First, it can be set to length 1-8 with a three-bit length input.[5]
Second, the mask can be lengthened by one bit, for example going from 0000001111 to 0000011111 (length 4 to 5).

The mask register is implemented with dynamic latches similar to the counter, but the inputs to the latches are different.
To load the mask to a particular length, each bit has logic to determine if the bit should be set based on the three-bit input.
For example, bit 3 is cleared if the specified length is 0 to 3, and set otherwise.
The lengthening feature is implemented by shifting the mask value to the left by one bit and inserting a 1 into the lowest bit.
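
The two mask operations are easy to express in Python. This is a minimal sketch with hypothetical function names. It takes the length directly as a number from 1 to 8 and does not model how the three-bit input encodes that length, since that detail isn't spelled out here.

```python
WIDTH = 10
ALL_ONES = (1 << WIDTH) - 1


def load_mask(length):
    """Mask with the given number of low bits set (loadable range 1 to 8)."""
    if not 1 <= length <= 8:
        raise ValueError("loadable lengths are 1 to 8")
    return (1 << length) - 1


def lengthen_mask(mask):
    """Shift left by one bit and insert a 1 into the lowest bit."""
    return ((mask << 1) | 1) & ALL_ONES


def show(mask):
    """Format a mask as a 10-digit binary string."""
    return format(mask, "010b")


if __name__ == "__main__":
    mask = load_mask(4)
    print(show(mask))                  # 0000001111
    print(show(lengthen_mask(mask)))   # 0000011111
```

Lengthening a mask that is already all ones leaves it unchanged, so in this model the mask saturates at 10 bits. Lengths 9 and 10 can only be reached by lengthening, never by loading.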

The schematic below shows one bit of the mask register. At the center is a two-inverter latch as seen before.
When the clock is high, it holds its value. When the clock is low, the latch can be loaded with a new value.
The "shift" line causes the bit from the previous stage to be shifted in. The "load" line loads the mask bit generated from
the input length. The "reset" line clears the mask.
At the right is the NAND gate that applies the mask to the count and inverts the result.
As will be seen below, these NAND gates are unusually large.

One stage of the mask register.

The logic to set a mask bit based on the length input is shown below.[6]
The three-bit "sel" input selects the mask length from 1 to 8 bits; note that the mask0 bit is always set while bits
8 and 9 are cleared.[7]
Each set of gates energizes the corresponding mask line for the appropriate inputs.

The control logic to enable mask bits based on length.

The diagram below shows the mask register on the die. I removed the metal layer to show the underlying
silicon and polysilicon, so the transistors are visible.
On the left are the NAND gates that combine each bit of the counter with the mask. Note the large snake-like
transistors; these larger transistors provide enough current to drive the signal over the long bus to
the delay counter register at the bottom of the chip.
Bit 0 of the mask is always set, so it doesn't have a latch. Bits 8 and 9 of the mask are only set by
shifting, not by selecting a mask length, so they don't have mask logic.[8]

The mask register on the die.

The delay counter register

To generate the pseudorandom exponential backoff, the counter register and the mask register are NANDed together.
This generates a number of the desired binary length, which is stored in the delay counter.
Note that the NAND operation inverts the result, making it negative.
Thus, as the delay counter counts up, it counts toward zero, reaching zero after the desired number of clock ticks.
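
A small Python model shows how the arithmetic works out. This is a sketch under assumptions: the function names are hypothetical, the values are treated as plain 10-bit numbers that wrap around, and the text doesn't say exactly how the chip detects the end of the count, so the loop simply runs until the value is zero.

```python
WIDTH = 10
ALL_ONES = (1 << WIDTH) - 1


def initial_delay_count(counter, mask):
    """Bitwise NAND of the counter and the mask, kept to 10 bits."""
    return ~(counter & mask) & ALL_ONES


def ticks_until_zero(start):
    """Count up from start, wrapping at 10 bits, until the value is zero."""
    value, ticks = start, 0
    while value != 0:
        value = (value + 1) & ALL_ONES
        ticks += 1
    return ticks


if __name__ == "__main__":
    counter = 0b1011010110           # arbitrary free-running counter value
    mask = 0b0000001111              # mask of length 4
    start = initial_delay_count(counter, mask)
    print(format(start, "010b"), ticks_until_zero(start))
```

Here the masked value is 6 and the NAND result is 1111111001, which is 1023 - 6. In this model the count wraps to zero after 7 ticks, one more than the masked value, because inverting a 10-bit number r gives 1023 - r rather than 1024 - r. The only exception is a masked value of 1023, where the NAND result is already zero.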

The implementation of the delay counter is similar to the first counter, so I won't include a schematic.
However, the delay counter is attached to the register bus, allowing its value to be read by the chip's CPU.
Control lines allow the delay counter's value to pass onto the register bus.

The diagram below shows the locations of the counter, mask, and delay register on the die.
In this era, something as simple as a 10-bit register occupied a significant part of the die.
Also note the distance between the counter and mask and the delay register at the bottom of the chip.
The NAND gates for the counter and mask required large transistors to drive the signal across this large distance.

The die, with counter, mask, and delay register.

The Intel Ethernet chip provides an interesting example of how a real-world circuit is implemented on a chip.
Exponential backoff is a key part of the Ethernet standard.
This chip implements backoff with a simple but optimized circuit.[9]

A high-resolution image of the die with the metal removed. (Click for a larger version.) Some of the oxide layer remains, causing colored regions due to thin-film interference.

For more chip reverse engineering,
follow me on Twitter @kenshirriff or RSS for updates.
I'm also on Mastodon occasionally as @[email protected].
Acknowledgments: Thanks to Robert Garner for providing the chip and questions.

Notes and references
