16-bit Serial Homebrew CPU – 2023

2023-08-20 16:26:22

Building a homebrew CPU from scratch takes a lot of logic chips. It’s understandable that implementing registers, a program counter, an ALU, and the other elements of a CPU in TTL or CMOS logic requires a considerable number of chips. But how many exactly?

I’ve tried to optimize my homebrew CPU for the lowest possible number of logic chips and to answer a question:
How few ICs are required for a Turing-complete CPU with no CPU?

My answer is a 16-bit serial CPU with only 8 ICs, including memory and clock. It has 128kB of SRAM and 768kB of FLASH, and it can be clocked at up to 10MHz. It contains only a 1-bit ALU, but most of its 52 instructions operate on 16-bit values (serially). At its maximum speed, it executes roughly 12k instructions per second (0.012 MIPS) and, among other things, it can stream video to a PCD8544-based (Nokia 5110) LCD at ~10 FPS.

Depending on where you draw the line between a state machine and a CPU, my 16-bit system might actually be the CPU with the lowest number of ICs. Other contestants include Jeff Laughton’s 1-bit computer with 1 instruction and 1 bit of memory, and Daniel Thornburgh’s Simple CPU with a single byte-byte-jump instruction and memory simulated on a Raspberry Pi.

Hardware:

The architecture is inspired by other CPU builds like James Sharman’s JAM-1, Ben Eater’s SAP-1, Warren’s 4-bit Crazy Small CPU, its 8-bit version, and others. All of them, and many similar builds, use a “control” EEPROM, EPROM, or ROM to generate the control signals for the CPU components. Because this is much easier than generating them with logic circuits alone, and because it leaves more flexibility for the future, I’ve also decided to use such a “control” memory, specifically an EPROM. Unlike the builds mentioned above, I’ve aimed for the lowest possible chip count, so I’ve tried to “squeeze” as much data processing into the memory as possible, either to lower the demands on the other CPU components or, better yet, to eliminate them completely. Here are the key steps taken:

  • Eliminating the ALU completely and implementing it as a lookup. Because most EPROMs have only an 8-bit output and the system also needs other control signals, the data width of the ALU has to be drastically limited. Not to worry: it can be reduced all the way down to a single bit, since 1-bit computing is actually all we need.
  • To get any meaningful computation done, the output of the 1-bit ALU has to be serialized. That is a good use case for a serial SRAM, which also brings other benefits. First, it eliminates the need for registers, since all ALU operations can be performed directly on the data in SRAM. Second, serial SRAMs are also addressed serially, so there is no need to latch the source and destination addresses. Third, an arbitrary data processing width can be achieved simply by choosing the number of SRAM clock cycles. I chose 16 bits (16 SRAM clock cycles per ALU operation) as a nice compromise between utility and speed.
  • At least 2 serial SRAM chips are required: one has to provide a serialized input to the 1-bit ALU while, at the same time, the other stores the result.
  • For ALU operations with 2 operands (like ADD/AND/XOR…), 2 serialized inputs are needed. Adding a third SRAM could certainly be an option (2 for the ALU inputs, 1 for the result), but there’s a better solution. If a serial FLASH memory is used instead of an SRAM, the same benefits remain (already serialized data, serialized address), and the FLASH can store the instructions/program as well as provide the second ALU input.
  • There’s no need to add any hardware for a program counter, as there’s already plenty of space inside the SRAMs where its value can be stored.
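To make the lookup-based, bit-serial ALU concrete, here is a minimal Python sketch. It stores a 1-bit full adder (plus AND and XOR) as a table and applies it once per clock, LSB first, for 16 cycles, keeping only the carry bit from one cycle to the next. The table layout (operation, operand bits, and carry as the “address”; result and carry-out as the “data”) and the names `ALU` and `serial_op` are hypothetical illustrations, not the encoding used in the actual microcode EPROM.

```python
from typing import Dict, Tuple

# Hypothetical lookup: (op, a_bit, b_bit, carry_in) -> (result_bit, carry_out).
# In hardware these would be EPROM address and data bits; here it is a dict.
OPS = ("ADD", "AND", "XOR")


def build_alu_table() -> Dict[Tuple[str, int, int, int], Tuple[int, int]]:
    table = {}
    for op in OPS:
        for a in (0, 1):
            for b in (0, 1):
                for c in (0, 1):
                    if op == "ADD":
                        s = a + b + c                      # 1-bit full adder
                        table[(op, a, b, c)] = (s & 1, s >> 1)
                    elif op == "AND":
                        table[(op, a, b, c)] = (a & b, 0)
                    else:  # XOR
                        table[(op, a, b, c)] = (a ^ b, 0)
    return table


ALU = build_alu_table()


def serial_op(op: str, x: int, y: int, width: int = 16) -> Tuple[int, int]:
    """Apply the 1-bit lookup ALU `width` times, streaming bits LSB first.

    Returns (result, carry out of the last bit).
    """
    result, carry = 0, 0
    for i in range(width):              # one iteration per serial clock cycle
        a = (x >> i) & 1                # bit shifted out of the first source
        b = (y >> i) & 1                # bit shifted out of the second source
        bit, carry = ALU[(op, a, b, carry)]
        result |= bit << i              # bit shifted into the destination
    return result, carry
```

For example, `serial_op("ADD", 0xFFFF, 0x0001)` returns `(0, 1)`: a wrapped 16-bit sum with a carry out of the top bit.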

Even with these dramatic simplifications, some additional hardware is still required. However, everything can be built with just 8 chips in total, following the schematic below:

The circuit is built around a 128kB M27C1001-15 EPROM, running at 5V and combining a control state machine with a 1-bit ALU. Its output lines are latched by a 74HC574 every clock cycle and control the two 23LCV512 64kB serial SRAMs and one W25Q80 1MB serial FLASH. There are not enough outputs to control each memory individually, so the memories share the data line and partially also the chip select line; only the clock lines are kept separate. I couldn’t find a 5V serial FLASH memory, so resistors R3, R4, and R5 limit the current and form a bridge from 5V to 3.3V. I don’t count the MCP1703 3.3V voltage regulator as part of the CPU (I consider it part of the power supply), but with it, my CPU contains 9 chips.

The current instruction is stored in a 74HC595 buffered shift register, whose control lines are also partially shared with the memories. Every instruction takes many clock cycles to complete, so the progress within an instruction is tracked by a 74HC393 “microcode” counter. When the instruction completes, the “Counter_reset” line resets the “microcode” counter and starts the execution of the next instruction buffered in the 74HC595.
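The sequencing described above can be modelled in a few lines of Python. This is a sketch under stated assumptions: the control EPROM is a dictionary keyed by (opcode, step), the 8-bit `step` counter stands in for the 74HC393, and bit 7 of the control word is a hypothetical position for “Counter_reset”. The real control-word layout is not described in the article.

```python
from typing import Dict, List, Tuple

COUNTER_RESET = 0x80  # hypothetical bit for the "Counter_reset" line in the control word


def run_instruction(microcode: Dict[Tuple[int, int], int], opcode: int) -> List[int]:
    """Step one instruction through its micro-steps until Counter_reset is asserted.

    Returns the control words latched by the 74HC574, one per clock cycle.
    """
    step = 0                                  # "microcode" counter right after a reset
    latched = []
    while True:
        word = microcode[(opcode, step)]      # control EPROM addressed by opcode and step
        latched.append(word)                  # outputs latched on this clock edge
        if word & COUNTER_RESET:              # last micro-step: the counter returns to 0
            return latched                    # and the next buffered opcode takes over
        step = (step + 1) & 0xFF              # 8-bit counter advances on the other clock edge
        if step == 0:
            raise RuntimeError("instruction exceeded 256 micro-steps")
```

The 256-step guard mirrors the 8-bit counter: an instruction whose microcode never asserts the reset would simply wrap around.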

The 74HC574 and the 74HC393 “microcode” counter use opposite clock edges, so the 74HC14 clock generator supplies an inverted clock signal to the 74HC393 to keep them both synchronous.

16-bit Serial Homebrew CPU Schematic

Inputs and Outputs:

One thing I was not able to implement reasonably in my CPU is self-programming of the FLASH memory. A bootloader is therefore not possible, and uploading a new program to the serial FLASH must be done externally. For this purpose, I used an Attiny13 microcontroller that listens to a set of commands over UART, so any USB-UART adapter is sufficient for uploading new code. When programming, it disables the outputs of the 74HC574 via the “Prog_en” line and programs the FLASH memory directly. The microcontroller is used only for uploading a new program; the CPU runs happily without it.
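For a feel of what programming the FLASH directly involves, the sketch below builds the SPI transactions for writing a program image at 0x040000, where the program counter starts, using the standard Winbond W25Q-series Write Enable (0x06) and Page Program (0x02) commands. It is not the author’s Attiny13 firmware or its UART command set. The function name is hypothetical, and sector erase and BUSY polling are left out.

```python
from typing import List

WRITE_ENABLE = 0x06      # W25Q-series Write Enable
PAGE_PROGRAM = 0x02      # W25Q-series Page Program: opcode, 24-bit address, 1..256 data bytes
PAGE_SIZE = 256
PROGRAM_BASE = 0x040000  # program counter value after RESET


def page_program_frames(image: bytes, base: int = PROGRAM_BASE) -> List[bytes]:
    """Split `image` into the SPI transactions that program it at `base`.

    Every page write gets its own Write Enable, because the flash clears its
    write-enable latch after each program operation. Chunks never cross a
    256-byte page boundary, since a page program wraps around within its page.
    """
    frames = []
    addr, offset = base, 0
    while offset < len(image):
        room = PAGE_SIZE - (addr % PAGE_SIZE)    # bytes left in the current page
        chunk = image[offset:offset + room]
        frames.append(bytes([WRITE_ENABLE]))
        # 24-bit address is sent MSB first, followed by the data bytes
        header = bytes([PAGE_PROGRAM, (addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF])
        frames.append(header + chunk)
        addr += len(chunk)
        offset += len(chunk)
    return frames
```

A 300-byte image at 0x040000 becomes two Write Enable/Page Program pairs: 256 bytes, then 44 bytes at 0x040100.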

The only available outputs are the 2 upper bits of the 74HC595 instruction shift register. I used one of these lines, inverted, as a chip select, which allows the CPU to talk to an SPI-like device. For example, a 3.3V PCD8544-based SPI LCD (Nokia 5110) can be connected directly, with the other upper instruction bit acting as the LCD data/command selector.
Another option is to connect an additional 74HC595 shift register instead of an LCD to get general-purpose digital output lines.
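A rough budget for the ~10 FPS video figure, assuming each frame is pushed as a raw 504-byte PCD8544 bitmap (84×48 pixels, 8 pixels per byte) with one OUT instruction per byte. The article does not describe the video encoding, so treat this only as an upper bound. The constants come from the instruction table; the helper names are hypothetical.

```python
PCD8544_WIDTH, PCD8544_HEIGHT = 84, 48   # Nokia 5110 resolution, 1 bit per pixel
OUT_TOTAL_CYCLES = 795                     # "Total" cycles of OUT from the instruction table
CLOCK_HZ = 10_000_000                      # maximum clock frequency of this build


def frame_bytes(width: int = PCD8544_WIDTH, height: int = PCD8544_HEIGHT) -> int:
    """Bytes in one full frame: the controller packs 8 vertical pixels into each byte."""
    return width * height // 8


def max_raw_fps(clock_hz: int = CLOCK_HZ) -> float:
    """Upper bound on frame rate if each frame byte costs one OUT and nothing else runs."""
    cycles_per_frame = frame_bytes() * OUT_TOTAL_CYCLES
    return clock_hz / cycles_per_frame
```

The bound comes out just under 25 FPS. Under that assumption, ~10 FPS leaves roughly 60% of the cycles for fetching and preparing each frame.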

The only available inputs are the 2 memory data/input signals connected to the EPROM address lines (A9, A11). The serial memories keep these signals at high impedance when they are not in use, so they can be sampled as general-purpose digital inputs while the memories are idle. Note that the input signal must not interfere with the memory data, so a high resistance is required between the input signal and the memory data line (R6, R7). Side note: reading input signals on the memory data lines works only for clock frequencies up to about 8MHz. At higher frequencies, the sampled data appear erratic and the CPU is prone to stalling.

16-bit Serial Homebrew CPU Peripherals

You could already see my CPU playing the “Bad Apple!!” music video on a PCD8544 LCD somewhere at the top of this page. In the video below, I demonstrate controlling general-purpose digital outputs by adding another 74HC595. The same circuit can produce 8-bit music at up to 4300 samples per second if an R-2R ladder is used instead of LEDs, and it’s the same circuit I used to produce the soundtrack for the “Bad Apple!!” video.
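A short sketch of the numbers behind the audio output. An ideal, unloaded 8-bit R-2R ladder turns a byte into Vref × code / 256. At 10MHz, a 4300 samples-per-second stream leaves about 2325 clock cycles per sample, just under three OUT instructions at 795 cycles each. The 5 V reference is an assumption about the shift register’s supply, and the helper names are hypothetical.

```python
CLOCK_HZ = 10_000_000
SAMPLE_RATE = 4300        # maximum playback rate quoted above
OUT_TOTAL_CYCLES = 795    # "Total" cycles of OUT from the instruction table


def r2r_voltage(code: int, vref: float = 5.0, bits: int = 8) -> float:
    """Ideal output of an unloaded R-2R ladder driven by `bits` logic outputs swinging to `vref`."""
    if not 0 <= code < (1 << bits):
        raise ValueError("code out of range")
    return vref * code / (1 << bits)


def cycles_per_sample(clock_hz: int = CLOCK_HZ, rate: int = SAMPLE_RATE) -> float:
    """Clock cycles available between two consecutive samples."""
    return clock_hz / rate
```

Mid-scale (code 128) gives 2.5 V, and full scale (255) stops one step short of Vref.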

Memory map:

The CPU has no dedicated registers, but it has two SRAMs that it can read from and write to. The downside is that every time the CPU wants to access data, it has to write the full 16-bit address to the serial SRAM. The upside is that, because it has to write the full 16-bit address anyway, the CPU (and instructions in general) can access all 64kB of the SRAM in constant time.
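The constant-time trade-off can be seen in the bits of a single serial SRAM read. In the sketch below, only the READ opcode (0x03) and the MSB-first 16-bit address come from the 23LCV512 datasheet. The helper names are hypothetical, and chip-select handling is left out.

```python
from typing import List

SRAM_READ = 0x03  # 23LCV512 READ instruction


def msb_first(value: int, bits: int) -> List[int]:
    """Bits of `value`, most significant first, the order the serial memories expect."""
    return [(value >> i) & 1 for i in reversed(range(bits))]


def read_header(addr: int) -> List[int]:
    """Bits clocked into the SRAM before data comes out: 8-bit opcode, then the full 16-bit address."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address out of range")
    return msb_first(SRAM_READ, 8) + msb_first(addr, 16)


def clocks_for_read(addr: int, data_bits: int = 16) -> int:
    """SCK cycles for one sequential-mode read of `data_bits` bits; independent of `addr`."""
    return len(read_header(addr)) + data_bits
```

Every access pays the same 24-clock header, whether the address is 0x0000 or 0xFFFF, so a 16-bit read always costs 40 clocks.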

I’ve chosen one SRAM (U8/RAM1) to hold the program data, and all arithmetic and logical operations are designed to work on values within this memory. The second SRAM (U7/RAM2) is meant to be used for the stack, so only a few instructions can access and modify its contents.
The first few bytes of both memories are reserved for the internal CPU state, such as the program counter, the flag bit, the stack pointer, an intermediate result, the source/destination addresses, and other internally used values. The approximate memory map is in the table below:

| Address | 0x0 ~ 0xD (internal CPU state, in order) | 0x000E ~ 0xFFFF |
|---|---|---|
| RAM1 | Flag & Input, Program counter (PC), Program counter reversed, Stack pointer (SP), Stack value (SPVAL) | Registers and user data |
| RAM2 | Flag, Program counter (PC), Destination address, Instruction’s result | Stack and user data |

One thing I want to mention is the method of using the FLASH memory as the second ALU input. Because the FLASH is quite large (1MB), it can hold a full 16-bit lookup table in which each entry contains the same 16-bit value as its own address (an identity table). With this 128kB lookup, it’s possible to write a 16-bit value to the FLASH as an address and read the same 16-bit value back as data, which can then be used as an ALU input.

A slight inconvenience of the serial memories is that they’re addressed MSB-first, whereas the 1-bit ALU naturally computes LSB-first. To get working memory addressing, we need to reverse the bits from the LSB-first format the CPU works with to the MSB-first format the memories work with. Reversing bits with a 1-bit ALU is not easy, so I’ve reserved another 128kB of FLASH memory for a “reversed-values” lookup table to make the operation faster. It works the same way as the previous lookup: a value is written to the FLASH memory as an address, and its bit-reversed representation is read back as data.

These two 16-bit lookup tables are the reason my CPU has only 768kB of usable FLASH memory, and why the program counter (PC) starts at address 0x040000 instead of zero.
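The two tables can be generated offline. The sketch below assumes the identity table sits at 0x000000 and the reversed table at 0x020000, with each entry stored as two big-endian bytes. The article only states that together the tables fill the 256kB (0x040000 bytes) below the program. How the microcode maps an LSB-first value onto the MSB-first flash address and data is not modelled, and all names are hypothetical.

```python
TABLE_ENTRIES = 1 << 16      # one entry per 16-bit value
ENTRY_BYTES = 2              # each entry holds a 16-bit value
IDENTITY_BASE = 0x000000     # assumed placement of the identity table
REVERSED_BASE = 0x020000     # assumed placement of the bit-reversed table
PROGRAM_BASE = 0x040000      # program counter value after RESET
FLASH_SIZE = 1 << 20         # W25Q80: 1MB


def reverse16(v: int) -> int:
    """Reverse the bit order of a 16-bit value (LSB-first <-> MSB-first)."""
    return int(format(v & 0xFFFF, "016b")[::-1], 2)


def put16(image: bytearray, offset: int, value: int) -> None:
    """Store a 16-bit value big-endian (an assumption of this sketch)."""
    image[offset:offset + ENTRY_BYTES] = value.to_bytes(ENTRY_BYTES, "big")


def build_tables() -> bytes:
    """Return the first 256kB of a flash image: identity table, then reversed table."""
    image = bytearray(PROGRAM_BASE)
    for v in range(TABLE_ENTRIES):
        put16(image, IDENTITY_BASE + ENTRY_BYTES * v, v)
        put16(image, REVERSED_BASE + ENTRY_BYTES * v, reverse16(v))
    return bytes(image)
```

Each table is 65536 × 2 bytes = 128kB, so the program can start no lower than 0x040000, leaving 1MB − 256kB = 768kB for code and data.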

Instruction set:

The limited hardware places some restrictions on the instruction set. The CPU is capable of only 64 unique instructions/operations, each of which has to fit into at most 256 micro-instruction steps and get by with only a 1-bit ALU and 1 Flag bit. But even with these limitations, surprisingly, it’s possible to create quite a nice-looking instruction set:

| OP code | Name | Operands | Width | Flag | Cycles | Total | Description |
|---|---|---|---|---|---|---|---|
| 0x00 | INIT | | | clear | 256 | 256 | Wait for clock to stabilize, then initialize RAM ICs to sequential mode |
| 0x01 | RESET | | | clear | 235 | 235 | Set program counter PC = 0x040000 and stack pointer SP = 0x000A |
| 0x02 | | | | | 158 | 414 | Shadow instruction: Fetch |
| 0x03 | | | | | 256 | 414 | Shadow instruction: Fetch continuation |
| 0x04 | | | | | 129 | 129 | Shadow instruction: Increment program counter PC = PC + 3 |
| 0x05 | | | | | 129 | 129 | Shadow instruction: Increment program counter PC = PC + 5 |
| 0x06 | | | | | 129 | 129 | Shadow instruction: Increment program counter PC = PC + 7 |
| 0x07 | | | | | 129 | 129 | Shadow instruction: Increment program counter PC = PC + 8 |
| 0x08 | | | | | 162 | 291 | Shadow instruction: Copy 32-bit result |
| 0x09 | | | | | 130 | 259 | Shadow instruction: Copy 16-bit result |
| 0x0A | | | | | 113 | 113 | Shadow instruction: Copy program counter |
| 0x0B | | | | | 167 | 296 | Shadow instruction: Store to RAM indirect |
| 0x0C | | | | | 151 | 280 | Shadow instruction: Store to RAM indirect |
| 0x0D | | | | | 173 | 587 | Shadow instruction: Arithmetic instruction dispatch |
| 0x0E | STF | | | set | 132 | 546 | Set FLAG |
| 0x0F | CLF | | | clear | 132 | 546 | Clear FLAG |
| 0x10 | NOP | | | | 132 | 546 | No operation |
| 0x11 | MOV | addr16 <- addr16 | 16 | | 231 | 774 | Move 16-bit value |
| 0x12 | MOVW | addr16 <- addr16 | 32 | | 146 | 851 | Move 32-bit value |
| 0x13 | INC | addr16 <- addr16 | 16 | overflow | 231 | 774 | Increment |
| 0x14 | DEC | addr16 <- addr16 | 16 | overflow | 231 | 774 | Decrement |
| 0x15 | COM | addr16 <- addr16 | 16 | zero | 231 | 774 | One’s complement (NOT) |
| 0x16 | NEG | addr16 <- addr16 | 16 | zero | 231 | 774 | Two’s complement |
| 0x17 | LSL | addr16 <- addr16 | 16 | overflow | 233 | 776 | Left shift (<<) |
| 0x18 | LSR | addr16 <- addr16 | 16 | overflow | 233 | 776 | Right shift (>>) |
| 0x19 | ROL | addr16 <- addr16 | 16 | overflow | 233 | 776 | Left shift with carry |
| 0x1A | ROR | addr16 <- addr16 | 16 | overflow | 255 | 798 | Right shift with carry |
| 0x1B | ASR | addr16 <- addr16 | 16 | overflow | 235 | 778 | Arithmetic right shift (keeps sign bit) |
| 0x1C | REV | addr16 <- addr16 | 16 | | 238 | 781 | Bit reverse |
| 0x1D | ADDI | addr16 <- addr16, val16 | 16 | overflow | 231 | 774 | Add immediate |
| 0x1E | ADCI | addr16 <- addr16, val16 | 16 | overflow | 231 | 774 | Add immediate with carry |
| 0x1F | SUBI | addr16 <- addr16, val16 | 16 | overflow | 231 | 774 | Subtract immediate |
| 0x20 | SBCI | addr16 <- addr16, val16 | 16 | overflow | 231 | 774 | Subtract immediate with carry |
| 0x21 | ANDI | addr16 <- addr16, val16 | 16 | zero | 231 | 774 | Logical AND with immediate |
| 0x22 | ORI | addr16 <- addr16, val16 | 16 | zero | 231 | 774 | Logical OR with immediate |
| 0x23 | XORI | addr16 <- addr16, val16 | 16 | zero | 231 | 774 | Logical XOR with immediate |
| 0x24 | ADD | addr16 <- addr16, addr16 | 16 | overflow | 171 | 887 | Add register |
| 0x25 | ADC | addr16 <- addr16, addr16 | 16 | overflow | 171 | 887 | Add register with carry |
| 0x26 | SUB | addr16 <- addr16, addr16 | 16 | overflow | 171 | 887 | Subtract register |
| 0x27 | SBC | addr16 <- addr16, addr16 | 16 | overflow | 171 | 887 | Subtract register with carry |
| 0x28 | AND | addr16 <- addr16, addr16 | 16 | zero | 171 | 887 | Logical AND with register |
| 0x29 | OR | addr16 <- addr16, addr16 | 16 | zero | 171 | 887 | Logical OR with register |
| 0x2A | XOR | addr16 <- addr16, addr16 | 16 | zero | 171 | 887 | Logical XOR with register |
| 0x2B | JMP | addr24 | | | 197 | 611 | Jump to address |
| 0x2C | CALL | addr24 | 32 | | 221 | 748 | Copy the following instruction’s address (PC + 4) and the current FLAG to SPVAL, then jump |
| 0x2D | RET | | 32 | restore | 138 | 552 | Move SPVAL to PC & FLAG (effectively returns from CALL and restores the previous FLAG) |
| 0x2E | BRFS | addr24 | | | 160 | 625/574 | Branch if FLAG set |
| 0x2F | BRFC | addr24 | | | 160 | 625/574 | Branch if FLAG cleared |
| 0x30 | BREQ | addr16, addr24 | 16 | | 243 | 708/657 | Branch if register is zero |
| 0x31 | BRNE | addr16, addr24 | 16 | | 243 | 708/657 | Branch if register is not zero |
| 0x32 | LDI | addr16 <- value16 | 16 | | 81 | 624 | Load 16-bit immediate |
| 0x33 | LDIW | addr16 <- value32 | 32 | | 113 | 656 | Load 32-bit immediate |
| 0x34 | LD | addr16 <- [addr16] | 16 | | 238 | 911 | Indirect load 16 bits from address |
| 0x35 | LDB | addr16 <- [addr16] | 8 | | 238 | 911 | Indirect load 8 bits from address, set upper 8 bits to 0 |
| 0x36 | ST | [addr16] <- addr16 | 16 | | 163 | 873 | Indirect store 16 bits to address |
| 0x37 | STB | [addr16] <- addr16 | 8 | | 163 | 857 | Indirect store 8 bits to address |
| 0x38 | LD2W | [addr16] | 32 | | 256 | 799 | Indirect load 32 bits from address in RAM2 to SPVAL register |
| 0x39 | LD2 | [addr16] | 16 | | 224 | 767 | Indirect load 16 bits from address in RAM2 to SPVAL register |
| 0x3A | ST2W | [addr16] | 32 | | 256 | 799 | Indirect store 32 bits from SPVAL register to address in RAM2 |
| 0x3B | ST2 | [addr16] | 16 | | 224 | 767 | Indirect store 16 bits from SPVAL register to address in RAM2 |
| 0x3C | LPM | addr16 <- [addr16] | 16 | | 211 | 884 | Indirect load 16 bits from address in FLASH |
| 0x3D | LPB | addr16 <- [addr16] | 8 | | 211 | 884 | Indirect load 8 bits from address in FLASH, set upper 8 bits to 0 |
| 0x3E | OUT | addr16 | 8 | | 252 | 795 | Output 8 bits over SPI |
| 0x3F | HALT | | | clear | 14 | 428 | Stop execution |

The first instructions, INIT and RESET, are executed at power-up or when the RESET button is pressed. The “shadow” instructions are not accessible to the user; they are mainly used for repeated operations such as fetching an instruction, incrementing the program counter, writing back a result, and similar.

Arithmetic and logical operations use the single Flag bit as either a Carry/Overflow or a Zero flag. As mentioned above, there is no performance penalty for accessing the full address space, so all these instructions can specify any source/destination address within the 64kB SRAM address space. Indirect addressing is not supported directly by arithmetic operations; it has to be done with LD/ST (load/store) instructions.

The second set of instructions, LD2/ST2, accesses the second SRAM. They are meant to be used for the stack, but any data can be stored there. PUSH and POP instructions are not implemented, but they can be built from LD2/ST2 and INC/DEC instructions.

An average instruction takes about 800 clock cycles, including the fetch operation and the program counter increment. At the maximum clock frequency of 10MHz, the CPU can execute about 12k instructions per second.
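The throughput figure is the clock frequency divided by the average total cycle count. Here is a small Python sketch of that arithmetic, using the “Total” column for a hypothetical equal mix of six common instructions. The mix is an illustration, not a measured program profile.

```python
from typing import Dict

CLOCK_HZ = 10_000_000  # maximum clock frequency of this build


def instructions_per_second(total_cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Throughput when every instruction costs `total_cycles` (fetch and PC increment included)."""
    return clock_hz / total_cycles


def mix_rate(totals: Dict[str, int], clock_hz: float = CLOCK_HZ) -> float:
    """Throughput for an equal mix of instructions.

    Average the cycle counts, not the per-instruction rates: it is the time
    per instruction that adds up in a running program.
    """
    mean_cycles = sum(totals.values()) / len(totals)
    return clock_hz / mean_cycles


# "Total" column values for a few common instructions, taken from the table above
TOTALS = {"MOV": 774, "ADD": 887, "LDI": 624, "JMP": 611, "LD": 911, "OUT": 795}
```

800 cycles per instruction gives exactly 12,500 instructions per second. This particular mix averages 767 cycles, or about 13k instructions per second.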

Writing code in assembler:

I use Lorenzi’s customasm tool to generate binary files from assembly source code. The binary files can then be uploaded with a small python3 utility to the Attiny13 programming microcontroller, which writes the binary into the FLASH.

Below are two examples of small subroutines written in assembly for my CPU. The first subroutine returns the 32-bit product of two 16-bit values. The second writes an ASCII string stored in the FLASH memory to the LCD.

; Returns FA32 = FA16 * FB16
; FB is expected to be smaller
Multiply32_16x16:
    ;PUSH_PC        ; Not necessary
    LDIW FC, 0      ; Clear result
    LDI FA+2, 0     ; Cast FA16 to FA32
.loop:
    ANDI TMP, FB, 1
    BRFS .skip_add
    ADD FC, FA      ; Add FC32 += FA32
    ADC FC+2, FA+2  ; Add FC32 += FA32
.skip_add:
    LSL FA          ; Shift FA32 << 1
    ROL FA+2        ; Shift FA32 << 1
    LSR FB          ; Shift FB16 >> 1
    BRNE FB, .loop
    MOVW FA, FC     ; Copy result
    ;POP_PC         ; Not necessary
    RET

; Write string in Flash
; input: FA32 <- Address of the string in Flash
LCD_WriteStrF:
    PUSH_PC             ; Save return address
    PUSHW RA            ; Save RA 32-bit
    MOVW RA, FA
.loop:
    LPB FA, RA          ; Load character from Flash
    BREQ FA, .stop      ; Test "\0" character
    REV FA              ; MSB-first -> LSB-first
    ANDI FA, FA+1, 0xFF ; Cast to 8 bits
    CALL LCD_WriteChar  ; Write character
    ADDI RA, 1          ; Increase 32-bit pointer
    ADCI RA+2, 0        ; Increase 32-bit pointer
    JMP .loop
.stop:
    POPW RA             ; Restore RA 32-bit
    POP_PC              ; Restore return address
    RET
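To check the shift-and-add logic of Multiply32_16x16 on a PC, here is a Python mirror of the routine, with each step annotated with the instruction it corresponds to. It is a testing sketch, not part of the author’s toolchain. It also counts loop iterations, which shows why FB is expected to be the smaller operand: the loop runs once per significant bit of FB (and at least once).

```python
from typing import Tuple


def multiply32_16x16(fa: int, fb: int) -> Tuple[int, int]:
    """Mirror of Multiply32_16x16: returns (FA32 result, number of loop iterations)."""
    if not (0 <= fa <= 0xFFFF and 0 <= fb <= 0xFFFF):
        raise ValueError("operands must be unsigned 16-bit values")
    fc = 0                                # LDIW FC, 0
    # LDI FA+2, 0: FA is zero-extended to 32 bits (already true for a Python int)
    iterations = 0
    while True:                           # .loop (the body runs at least once)
        iterations += 1
        if fb & 1:                        # ANDI TMP, FB, 1 / BRFS .skip_add
            fc = (fc + fa) & 0xFFFFFFFF   # ADD FC, FA / ADC FC+2, FA+2
        fa = (fa << 1) & 0xFFFFFFFF       # LSL FA / ROL FA+2
        fb >>= 1                          # LSR FB
        if fb == 0:                       # BRNE FB, .loop falls through
            break
    return fc, iterations                 # MOVW FA, FC
```

Multiplying 1000 by 3 takes 2 iterations, while 3 by 1000 takes 10, even though both return 3000.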

Maximum Frequency and the Critical Path:

According to the datasheets, the total propagation delay along the critical path is:

  • 12ns at the 74HC14 from “Clock_pos” to “Clock_neg”,
  • 54ns at the 74HC393 to ripple through to the last (eighth) bit (12+3×5+12+3×5 ns),
  • 150ns access time of the M27C1001-15 EPROM,
  • 2ns setup time at the 74HC574 for the inputs to stabilize before the clock edge.

Putting it together, the circuit should only be able to run at ~4.6MHz. My particular build, however, works flawlessly up to 10MHz and becomes unstable only above ~10.5MHz. For a circuit built on a breadboard with plenty of parasitic capacitance, I consider that quite impressive. The maximum clock rate could be improved further with a faster binary counter or a faster EPROM.
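The ~4.6MHz estimate is the sum of the listed delays, converted to a frequency. The sketch below redoes that arithmetic and, as a what-if tied to the improvements suggested in the retrospective, swaps in a 35 ns EPROM. The 35 ns figure is an assumption about the faster M27C1001 grade, and the other delays are left unchanged.

```python
from typing import Dict

# Worst-case delays along the critical path, in nanoseconds, as listed above
CRITICAL_PATH_NS: Dict[str, float] = {
    "74HC14 Clock_pos -> Clock_neg": 12,
    "74HC393 ripple to the eighth bit": 12 + 3 * 5 + 12 + 3 * 5,
    "EPROM access time": 150,            # M27C1001-15
    "74HC574 setup time": 2,
}


def max_clock_mhz(delays_ns: Dict[str, float]) -> float:
    """Highest clock rate (MHz) at which one period still covers the summed delays."""
    return 1_000.0 / sum(delays_ns.values())


# What-if: the same path with an assumed 35 ns EPROM
FASTER_EPROM = {**CRITICAL_PATH_NS, "EPROM access time": 35}
```

The original path sums to 218 ns (about 4.59MHz). With the faster EPROM it drops to 103 ns, and the same estimate rises to about 9.7MHz.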

Conclusion and Retrospective:

I’m really happy with the finished CPU. It has a nice, “easy-to-work-with” instruction set with all the basic instructions present. It’s powerful enough to stream video to a small LCD screen, play audio (albeit with an external “sound card”), and generally perform the simple input/output tasks it was originally intended for. Finally, it successfully demonstrates that it’s possible to build a functional homebrew CPU with only a handful of ICs.

There are, however, a few small improvements possible:

  • The 74HC393 ripple counter is a significant bottleneck in the critical path. Replacing it with a carry-lookahead adder (“fast adder”) or with a buffered counter like the 74HC590 would increase the maximum clock speed.
  • The same goes for the M27C1001-15 EPROM. Using a faster memory like the M27C1001-35 EPROM or the SST39SF020A-70 FLASH would also allow a higher clock rate.
  • A larger EPROM with more than 17 address lines could be used either to increase the instruction count or to provide the additional address lines as general-purpose digital inputs.
  • Adding instructions for erasing and programming the internal FLASH memory would have made a bootloader possible, which would make the Attiny13 programming circuit unnecessary.
  • The system can execute code only from the FLASH memory. It would be possible to write an emulator that lives in the FLASH and executes code from SRAM, but making the CPU execute code from SRAM natively would require a different instruction fetching process, possibly including a duplicate set of instructions just for SRAM execution.

It remains to be seen whether any of these improvements are worth implementing. In the meantime, if you like the project and want to dive deeper, you can browse through the source code available here. It contains a simulator, an EPROM microcode generator, the Attiny13 programmer firmware, and all of my assembly code.

Update:

I’ve implemented a minimalistic 3D wireframe object projection engine using 16-bit fixed-point arithmetic. Multiplying matrices on my 0.012 MIPS CPU is quite slow, so 3D games are probably not coming anytime soon.
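The article does not say which fixed-point format the 3D engine uses. Assuming Q8.8 (8 integer bits, 8 fractional bits), a multiply needs the full 32-bit product of two 16-bit values, like the one Multiply32_16x16 returns (signed handling aside), followed by a shift that drops 8 fractional bits. Here is a minimal Python sketch of that and of a rotation about the Z axis. Wrap-around to 16 bits is not modelled, and the names are hypothetical.

```python
import math
from typing import Tuple

FRAC_BITS = 8             # assumed Q8.8 format
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to the nearest Q8.8 value."""
    return int(round(x * ONE))


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q8.8 values: full product, then drop the extra fractional bits.

    Python's >> rounds toward minus infinity on negative numbers, like an arithmetic shift.
    """
    return (a * b) >> FRAC_BITS


def rotate_z(x: int, y: int, angle: float) -> Tuple[int, int]:
    """Rotate the Q8.8 point (x, y) about the Z axis by `angle` radians."""
    c, s = to_fixed(math.cos(angle)), to_fixed(math.sin(angle))
    return fixed_mul(x, c) - fixed_mul(y, s), fixed_mul(x, s) + fixed_mul(y, c)
```

Rotating the point (1.0, 0) by 90 degrees yields (0, 1.0) in Q8.8, that is, `(0, 256)`.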

I’m also slowly growing the list of hardware my CPU supports directly. I’ve added an SPI alphanumeric LCD that I salvaged from an old HP printer.

I have also been able to “bit-bang” the serial interface of the DS1302 real-time clock. The software has to use some special instruction sequences to produce the required signals, but it works and doesn’t require any additional hardware.
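The article does not show the instruction sequences used for the DS1302, so this sketch covers only the chip’s side of the protocol as given in its datasheet. It shows the command-byte layout, the LSB-first bit order on the I/O line (the same order the 1-bit ALU computes in), and BCD decoding of the seconds register. Function names are hypothetical.

```python
from typing import List


def ds1302_command(address: int, read: bool, ram: bool = False) -> int:
    """Build a DS1302 command byte.

    Bit 7 must be 1, bit 6 selects RAM (1) or clock/calendar (0),
    bits 5..1 hold the register address, bit 0 selects read (1) or write (0).
    """
    if not 0 <= address <= 31:
        raise ValueError("address must fit in 5 bits")
    return 0x80 | (int(ram) << 6) | (address << 1) | int(read)


def lsb_first(byte: int) -> List[int]:
    """Bits in the order the DS1302 shifts them on its I/O line: LSB first."""
    return [(byte >> i) & 1 for i in range(8)]


def decode_seconds(raw: int) -> int:
    """Seconds register: bit 7 is clock halt (CH), bits 6..4 tens, bits 3..0 units (BCD)."""
    return ((raw >> 4) & 0x07) * 10 + (raw & 0x0F)
```

Reading the seconds register, for example, means shifting out command 0x81 LSB first and then clocking in one BCD byte.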

Update 2:

The CPU now supports the PCF8833 LCD driver, although one frame takes about 96 seconds to render.

Homebuilt CPUs WebRing

Definitely check out other awesome homebrew CPU builds on Warren’s https://www.homebrewcpuring.org
