Writing axle’s GameBoy emulator | Phillip Tennen
The Nintendo GameBoy is an exceptionally well-documented system, perfect for anyone who’d like to take a crack at emulation. Publicly-maintained resources like the Pan Docs make it both approachable and convenient to get an overview of the GameBoy’s address space layout, to understand the mechanics of hardware peripherals, and to learn about various edge-case behaviors that some games depend on.
Over the past three weeks, I’ve been writing a GameBoy emulator of my very own. This isn’t a novel project! I was inspired by the blog post of another adventurer doing the same thing. What’s the special sauce of this post, then? Rather than stopping at writing an emulator alone, I threw an extra requirement into the mix: the emulator should run and be playable under axle, my from-scratch and home-grown operating system.
Before I dive into the nitty-gritty, it’s worth asking: is this even feasible in axle’s current state?
Ah! Well then, off we go.
axle doesn’t (yet!) have the debugging facilities and development niceties that make modern software development possible, let alone pleasant. Therefore, we’ll need to work with some kind of dual approach: it should be easy both to run the emulator on my macOS machine, for quick iteration, and to run it under axle when we’re ready to test the whole shebang.
Most software I develop for axle gets the simple only-runnable-within-axle approach, partly because most software that runs within axle uses axle-specific platform APIs for things like window/event-loop management and message passing. However, I’ve chosen the multi-platform approach for more complicated bits of software: the first one where this dual-pronged approach paid off was axle’s first iteration of a browser.
Okay, let’s get started! I’m not entirely new to emulation (I’ve written a reasonably comprehensive AArch64 simulator), but one thing that’s new to me is emulating a whole system, complete with hardware peripherals, rather than just CPU and memory. Still, the beating heart (and great starting point!) of any emulator will be the CPU implementation. Let’s have a think about what our goals are here.
The CPU
All CPUs operate via the same basic cycle: fetch, decode, execute. Modern CPUs throw some spectacularly complicated wrenches into the mix, but for our purposes we’ll keep it simple.
Let’s break it down.
The fundamental data type of a computer is a fixed-size integer (these days, 64 bits). Again, we’re going to gloss over details like SIMD and hardware floating-point. If you know more, think of this as your moment to feel smug. Every song you’ve ever digitally consumed, PDF you’ve browsed, game you’ve played, or message you’ve sent boils down to streams of these numbers. This isn’t a particularly esoteric idea! What’s slightly more interesting, though, is that the code that’s facilitating these tasks is also just a stream of numbers.
Where is this code and data stored? In some kind of memory, typically RAM, that the CPU can access and operate on.
Let’s take a concrete example. Say our RAM contains some numbers like the following (expressed in hexadecimal):
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21
The meaning of these numbers is entirely contextual! It could be some kind of data, sure: a piece of encoded text, a few encoded pixels within an image, a list of salaries (let’s hope not!), etc. Or, it could be code: raw instructions that the CPU knows how to interpret to perform an operation. The way the bytes are interpreted, and therefore the meaning that gets derived from them, depends on the context in which the bytes are used.
In the example above, these bytes are the result of me encoding some data (specifically, the string “Hello, world!”) character-by-character into numbers, using the ASCII character encoding. Let’s take another example:
48 89 e8
Again, there’s no absolute meaning to these bytes, but the context in which I intend for them to be understood is as a piece of code to be run on an x86_64 CPU. An x86_64 CPU would understand that this sequence of bytes is telling it to move a number from one register to another. Another way of expressing the same thing as these bytes would be to write mov rax, rbp, but the details aren’t particularly important right now.
We can also have a sequence of bytes that is perfectly valid when understood as either code or data:
6a 6b
If an x86_64 CPU tried to interpret this sequence of bytes as an instruction, it’d understand it as a directive (or instruction) to “push 0x6b to the stack”. If we tried to interpret it as ASCII-encoded text, we’d instead understand this sequence as the text “jk”.
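To make the “same bytes, different meaning” point concrete, here’s a small Python sketch that takes the two byte sequences shown above and chooses to read them as ASCII text. The helper name as_ascii is made up for illustration; nothing here is specific to the emulator.

```python
def as_ascii(hex_bytes: str) -> str:
    """Interpret a string of space-separated hex bytes as ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


greeting = as_ascii("48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21")
short = as_ascii("6a 6b")
print(greeting)  # Hello, world!
print(short)     # jk
```

The bytes themselves never change; only the decision to run them through an ASCII decoder (rather than, say, an instruction decoder) gives them this particular meaning.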
These mappings are entirely conventional! Why does 6a correspond to j when interpreted as ASCII text? There’s no ‘fundamental’ reason. By accidents of history, one particular way of mapping bytes to English characters proliferated enough for everybody to agree to formalize the mapping into a ‘dictionary’ called ASCII. Similarly, how does an x86_64 CPU “know” that 6a should mean PUSH? Just the same: AMD and Intel defined what an x86_64 CPU should be able to understand. When an assembler is generating code for this platform, it knows to emit this byte when it wants the CPU to run a PUSH instruction. When CPUs are manufactured in their billion-dollar fabrication plants, circuitry is lasered into the silicon that ensures that when the CPU reads the 6a byte from memory, the CPU will do the corresponding operations associated with what we’d expect PUSH to do.
Okay, where were we going with this? Yes! CPUs like to fetch, decode, and execute. The fundamental loop of a CPU is:
- Fetch the next byte from memory.
- Decode the byte, interpreting it as an instruction the CPU knows how to understand.
- Execute the instruction, updating the system with the results of the operation indicated by the instruction.
How does the CPU know where to fetch from? All CPUs contain an instruction pointer denoting where it’s currently fetching a byte from. This pointer would otherwise move along in a linear fashion, but one of the central tricks of computers is that this pointer can also be redirected, jumping back and forth as the logic of the code dictates. This enables some neat use cases, such as winning WWII.
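As a rough illustration of the loop and the instruction pointer, here’s a toy machine in Python. To be clear about the assumptions: the three opcodes below (HALT, INCREMENT, JUMP) are invented for this sketch and have nothing to do with the GameBoy’s real instruction set.

```python
# A made-up three-instruction machine, NOT the GameBoy's instruction set.
HALT, INCREMENT, JUMP = 0x00, 0x01, 0x02


def run(memory: bytes, max_steps: int = 100) -> int:
    """Run the toy program in `memory` and return the final counter value."""
    instruction_pointer = 0
    counter = 0
    for _ in range(max_steps):
        opcode = memory[instruction_pointer]  # Fetch
        instruction_pointer += 1
        if opcode == HALT:                    # Decode, then execute
            break
        elif opcode == INCREMENT:
            counter += 1
        elif opcode == JUMP:
            # The byte after the opcode is the address to continue from.
            instruction_pointer = memory[instruction_pointer]
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")
    return counter


# INCREMENT, JUMP over the INCREMENT at address 3, INCREMENT, HALT.
program = bytes([INCREMENT, JUMP, 0x04, INCREMENT, INCREMENT, HALT])
print(run(program))  # 2
```

The JUMP is the interesting part: by overwriting the instruction pointer, the program skips one of its own instructions, so the counter ends at 2 rather than 3.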
Back to the GameBoy. It’s got a CPU, so we know it’ll be running a fetch, decode, execute loop. But what’s it fetching its code from? The game cartridge slotted into the device! This cartridge will hold all the code and data necessary to run the game.
Hold on a second. We know that the Fetch stage gets the next byte from memory. What about the very first byte? How is the CPU going to start running code located in the game cartridge at all?
The boot process
Earlier, I mentioned that code and data is stored in memory, typically RAM. Not always. There are special cases that vary based on the system, whether it be a GameBoy, a UEFI-based computer, an iPhone, etc.
When a device first boots, it needs some way to set itself up enough to move on to something more advanced: this next step might be loading an operating system, handing off control to a game cartridge, or something else. The code that does this setup is purpose-built for the boot process, and it’s also typically etched into the silicon of the system when it’s manufactured. In other words, it’s read-only memory, or ROM. Thus, this piece of code is often referred to as the boot ROM.
Hrm. We’ve established that the boot ROM is the first thing that runs, but really we’ve just pushed our question back a bit further. How does the CPU begin running the boot ROM when the system first starts up?
When the CPU is initialized, the aforementioned instruction pointer will contain the value 0. That is, the CPU will kick off its fetch-decode-execute lifecycle using whatever bytes are at the very beginning of memory.
So, our system needs some way to arrange things such that when the CPU reads byte #00, it gets the first byte contained within the boot ROM, and so on for byte #01, byte #02, etc.
Enter the MMU. One way to think about memory is as a big list of bytes. Reading byte #00 is equivalent to reading the first element of the list, while reading byte #100 is equivalent to reading the element at index 100₁₆ (the 257th element, since counting starts at 0). Pretty simple. These byte indexes are alternatively called ‘addresses’, and the list itself is alternatively called a ‘space’. Thus, we get the term address space.
The CPU can read (or write, sometimes!) to each of these addresses, but where is the “other end”? Where is the data at these addresses stored? A first guess might be RAM, which is certainly on the right track. That said, it’s not the whole story.
The MMU, or memory management unit, mediates the CPU’s access to different hardware peripherals that are connected to the address bus, of which RAM is just one. When the CPU tries to access a given address, it’s the MMU’s job to determine which piece of underlying hardware that particular address belongs to, and to forward the access to that device as appropriate. The MMU even handles when the CPU tries to access a piece of memory that no device is providing. Different systems implement different behavior when something like this happens: some will generate an internal event that invalid memory was accessed (x86_64 does this), some will have the MMU return all zeroes or all ones when an invalid address is read (the GameBoy does something like this).
With that knowledge in hand, it gets a lot easier to answer our question about how the boot ROM is executed: the MMU on the GameBoy has things set up such that when the CPU reads address 0, it will receive the first of the boot ROM’s bytes that were etched into the silicon when the GameBoy was manufactured.
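Here’s a minimal sketch of that routing idea in Python. The class name Mmu, the map_region helper, the region addresses, and the byte contents are all assumptions chosen for illustration; this isn’t the emulator’s actual design, and the real GameBoy memory map has many more regions.

```python
class Mmu:
    """Routes each address to whichever region 'owns' it."""

    def __init__(self):
        # Each entry is (first address, backing storage).
        self.regions = []

    def map_region(self, start: int, storage: bytearray) -> None:
        self.regions.append((start, storage))

    def read(self, address: int) -> int:
        for start, storage in self.regions:
            if start <= address < start + len(storage):
                return storage[address - start]
        # No device owns this address: answer with all ones.
        return 0xFF


mmu = Mmu()
boot_rom = bytearray([0xAA, 0xBB, 0xCC])  # placeholder bytes, not the real boot ROM
ram = bytearray(16)                       # hypothetical RAM region
mmu.map_region(0x0000, boot_rom)
mmu.map_region(0x8000, ram)

print(hex(mmu.read(0x0000)))  # 0xaa, served by the boot ROM
print(hex(mmu.read(0x4000)))  # 0xff, nothing is mapped here
```

The CPU side never needs to know which device answered: it just asks for an address and gets a byte back.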
Okay! Long walk for a short drink of water, but this knowledge will serve us well. Let’s track the sequence of events so far:
- The GameBoy receives power
- The CPU sets its instruction pointer to 0
- The CPU fetches the byte at the address indicated by its instruction pointer
- The MMU routes the read of address 0 to the boot ROM
- The first byte of the boot ROM is returned to the CPU
- (We’re here!) The CPU decodes the byte into an instruction
Great! Our understanding has leveled up from Fetch to the next stage, Decode.
Instruction decoding
Like we said, the correspondence of a number to its meaning when decoded as a directive to the CPU is contextual and dependent on the CPU in question. That said, there’s a crucial level of shared understanding: the CPU itself and code that’s intended to run on that kind of CPU need to agree on what each byte should mean. If we tried to run bytes compiled for x86_64 on a CPU manufactured to run AArch64, things would get very weird very quickly because the semantic meaning of our code has been lost. This correspondence of numeric bytes to the directives or operations they’re associated with is termed the set of instructions that the CPU knows how to run, or the CPU’s instruction set.
The GameBoy’s instruction set is derived from that of the Zilog Z80. The lineage isn’t as important as the inheritance: like the Z80 it’s derived from, the GameBoy CPU runs an 8-bit instruction set. That is, instead of the fundamental data type being the 64-bit integer we touched on above, the fundamental type is instead an 8-bit integer.
Each instruction starts off with an opcode, an 8-bit value (or code) denoting what operation the CPU should perform.
For example, say the CPU fetches this opcode:
76
What’s the contextual meaning for the GameBoy CPU of this opcode? The answer to this question is technically defined by the circuitry etched into every GameBoy CPU, but people have also come up with great visualizations of the GameBoy CPU’s instruction set to help emulator writers. Here’s an opcode table from a great resource I used while writing my emulator:
By following the ‘7’ row on the left and the ‘6’ column across the top, we can see that the opcode 76 corresponds to an instruction called HALT. When the CPU fetches this opcode, it will decode the byte and understand that it refers to the HALT instruction, then will run the operation associated with HALT (we’ll get there, don’t worry!).
Some instructions can’t make do with just a single byte, but need some extra data to describe more details of their operation. For example, say the GameBoy CPU contained an ADD Register A, u8 instruction. The operation associated with this instruction would be to take the 8-bit value (the u8), and add it to the 8-bit value stored in Register A, then to store the result of the addition in Register A. Clearly, this is going to take more than a byte to store, because we need to know what 8-bit value to add to Register A. The 8-bit value will be stored directly following the opcode itself. Therefore, some instructions take up more than just one byte: we’ve got one byte for the opcode, plus either no data, 1 byte of data, or 2 bytes of data.
This technically makes the GameBoy CPU’s instruction set a variable-length encoding, where the number of bytes it takes to store each instruction depends on what the instruction does. Some more contemporary instruction sets (notably x86_64, another descendant of the 8080 lineage that the Z80 itself is descended from) get even more wacky with this, but thankfully the GameBoy CPU caps the maximum instruction size at three bytes.
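One way to picture a variable-length encoding is to walk a byte stream and carve it into instructions, using the opcode to decide how many bytes each one occupies. The sketch below is an assumption-laden illustration: the table lists only six opcodes (with their mnemonics as given in public GameBoy opcode tables), and the names INSTRUCTION_LENGTHS and split_instructions are made up here.

```python
# Total length in bytes (opcode + trailing data) for a handful of opcodes.
# Only a few entries are listed; a full table would cover every opcode.
INSTRUCTION_LENGTHS = {
    0x00: 1,  # NOP
    0x76: 1,  # HALT
    0x3E: 2,  # LD A, u8
    0xC6: 2,  # ADD A, u8
    0x21: 3,  # LD HL, u16
    0xC3: 3,  # JP u16
}


def split_instructions(code: bytes) -> list:
    """Split a byte stream into per-instruction chunks."""
    instructions = []
    position = 0
    while position < len(code):
        length = INSTRUCTION_LENGTHS[code[position]]
        instructions.append(code[position:position + length])
        position += length
    return instructions


stream = bytes([0x3E, 0x05, 0xC6, 0x02, 0x76])
chunks = split_instructions(stream)
print([chunk.hex() for chunk in chunks])  # ['3e05', 'c602', '76']
```

Notice that the decoder has to get each length right to find the start of the next instruction; one wrong entry and everything after it is misread.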
Armed with the knowledge that an instruction can span multiple bytes, let’s take a look at the full GameBoy CPU opcode tables:
Whoa, did you catch that? Tables! Plural!
This is interesting, and reveals where our digression about multi-byte instructions comes from: if every instruction took up exactly one byte, then there would be a maximum limit of 256 instructions in the instruction set, since a byte (equal to 8 bits) is capable of storing any number from 0₁₀ (00000000₂) to 255₁₀ (11111111₂). Thanks to the encoding scheme, it’s possible for the instruction set to define more than this. The GameBoy CPU ended up defining exactly 500 instructions.
While some of these instructions are multi-byte because they contain an extra byte of data (like our ADD Register A, u8 example), some of them are multi-byte because they contain an extra byte of code. Specifically, every instruction in the blue table has an initial opcode byte of cb, followed by a second opcode byte whose meaning is given by the table. I’ve seen this set referred to as the cb opcodes.
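In code, handling the cb prefix can be as simple as peeking at the first byte and, if it’s cb, treating the following byte as an index into the second table. This is only a sketch: the function name fetch_opcode and the "main"/"cb" table labels are invented for illustration.

```python
def fetch_opcode(code: bytes, position: int):
    """Return (table name, opcode, bytes consumed) for the opcode at `position`."""
    first = code[position]
    if first == 0xCB:
        # The real opcode is the following byte, looked up in the second table.
        return ("cb", code[position + 1], 2)
    return ("main", first, 1)


print(fetch_opcode(bytes([0x76]), 0))        # ('main', 118, 1)
print(fetch_opcode(bytes([0xCB, 0x7C]), 0))  # ('cb', 124, 2)
```

Any extra data bytes (like the u8 in our ADD example) would then follow whatever opcode bytes were consumed here.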
So! We’ve got our opcode, plus perhaps an extra byte or two that it needs to do its work. It’s time to execute the instruction, updating the system state by performing the operation indicated by the opcode.
There’s an important point here: we have multiple opcodes because each opcode does something different to the system. Each unique opcode maps to a unique operation that the CPU knows how to perform, and our fledgling CPU emulator will need to know how to apply every one of them to our emulated system.
Getting clever with our implementation
Thankfully, the task isn’t really so daunting as “implement 500 unique operations”. Many of the opcodes are variations on a theme. For example, let’s take a closer look at a snippet of the instruction tables from above:
Each of these instructions starts off with SUB, then has a different letter it operates with. Looking at these opcodes as a whole, these are the possible letters that a SUB instruction may contain:
B, C, D, E, H, L, (HL), A
Each of these ‘letters’, of course, means something very specific in the context of the system: the GameBoy CPU, like any CPU, computes with the help of registers. While 8-bit integers are the fundamental data type of the CPU, the register is the fundamental data storage of the CPU. Almost any time a CPU operates on an 8-bit number, it’s doing so by interacting with the contents of a register. When we want the CPU to read an 8-bit number, we use an instruction to load the byte located at a memory address into a given register. When we want to perform arithmetic, we specify the registers containing the numbers we want to operate on.
The GameBoy CPU has exactly eight 8-bit registers. The names are arbitrary, but the GameBoy itself and our opcode tables call them:
B, C, D, E, H, L, A, F
Perfect! These exactly match up with what we saw in the variants of the SUB instruction, so each SUB instruction is clearly a slightly different version that performs a subtraction using the register on the right-hand side.
…
Oh.
Let’s see ’em side-by-side.
SUB operands:
B, C, D, E, H, L, (HL), A
CPU registers:
B, C, D, E, H, L, A, F
Well, that’s a complication. The possible SUB operands are very similar, but not identical, to the available CPU registers. And what is that (HL) doing there, anyway? Clearly, we’ve got some talking to do.
While what I said before is true, that the CPU primarily performs computations via loading data into registers, it’s not the full story. The CPU is also able to perform computations by talking directly to memory, without needing to involve an extra load/store with a register.
The GameBoy CPU provides exactly one way to do such a thing. With some instructions, the GameBoy CPU knows how to take the values in the 8-bit H and L registers, stick them side-by-side to create a 16-bit number, then read the value at the address indicated by the 16-bit number from memory. This operation of “reading the value at an address” is indicated by the () surrounding the HL, and is more generally referred to as dereferencing the address. We’ll see more of it later. But first! An example of what this would look like in practice:
H: aa
L: bb
(HL): The byte currently stored at address aabb
Okay, sure, along with the 8-bit registers we can also perform some operations by dereferencing the value in HL, or (HL) for short. Is that all?
Back to our SUB operands and available CPU registers, there’s another glaring omission: we can see that the CPU contains an F register, but there’s no SUB variant that supports it. What gives?
Well, the F register is special: rather than being a general-purpose 8-bit register like the rest, it instead serves as the special-purpose flags register. That is, over the course of execution of an instruction, the system might want to note some resulting state for further processing. For example, if the result of the SUB instruction is exactly zero, the Z bit in the flags register will be set. The CPU then has other instructions that do different things based on whether a particular flag is set or not. The F register isn’t directly accessible to the programmer; its various bits can be set as side effects of running instructions, and its status can be implicitly read by running instructions that do different things based on its contents.
Right. We’ve got our SUB operands. When we fetch the opcode indicated by the instruction pointer, how will we know which SUB instruction to execute?
Of course, one way to do this is a slightly brain-dead “if the opcode is 90 do this, if the opcode is 91 do that, …”, but I think we can do a bit better. Let’s write out all the SUB opcodes and see if we can spot anything:
90 91 92 93 94 95 96 97
There’s a nice and simple property staring us in the face (it’s also quite visible from looking directly at the opcode table above). The SUB instructions start at 90, and increment by 1 for each right-side operand. Let’s take a look at the binary representation of each opcode:
90: 10010000
91: 10010001
92: 10010010
93: 10010011
94: 10010100
95: 10010101
96: 10010110
97: 10010111
We can see that each SUB opcode starts with the same pattern in the most significant bits: 10010___, followed by a sequence that counts up by 1 for each variant: 000, 001, 010, 011, 100, 101, 110, 111. This is convenient for us! It means we can do some pattern matching on whatever opcode byte we’ve fetched. If the high bits match the 10010 pattern that all SUB instructions follow, we know that we’ve fetched a SUB instruction, and all that’s needed is to take those lower 3 bits as an index into the table that will tell us which register to subtract with.
Er. Not always a register: our friend (HL) is still around, ready to fetch memory at a moment’s notice.
This is really nice! In one fell swoop, we’ve managed to knock out eight instructions with one implementation.
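The pattern match described above fits in a few lines of Python. This is a sketch rather than the emulator’s real decoder: the names OPERANDS and decode_sub are made up, and the operand order is the one listed earlier for the SUB variants.

```python
# Operand selected by the low three bits of the opcode, in counting order.
OPERANDS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]


def decode_sub(opcode: int):
    """Return the operand name if `opcode` is a SUB, otherwise None."""
    # Keep only the top five bits and compare them with the 10010 prefix.
    if (opcode & 0b11111000) == 0b10010000:
        return OPERANDS[opcode & 0b00000111]
    return None


print(decode_sub(0x90))  # B
print(decode_sub(0x96))  # (HL)
print(decode_sub(0x76))  # None
```

One mask-and-compare identifies the whole family, and the remaining three bits select the operand, so all eight opcodes share a single code path.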
One minor point I’ve glossed over up to now: We’ve got a SUB instruction, we’ve decided what register it’s subtracting with, but what’s it going to subtract from? And where does the result of the subtraction go once we’ve done it?
We’ve got another register with privileged status to discuss: the A, or accumulator, register. A common concept among instruction sets, the A register is both an implicit operand to, and implicit destination of, SUB instructions (and many other instructions!). In other words, while the SUB instruction only mentions one operand on the tin, it actually performs an operation something like the following:
A = A - Operand
Again, this isn’t the whole picture: executing this instruction will also update various status bits in the F register, such as whether the result was exactly zero, whether the subtraction underflowed, and a few other conditions.
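To show how the result and the flags fall out of one subtraction, here’s a hedged sketch. The flag bit positions follow the layout documented in resources like the Pan Docs (Z in bit 7, N in bit 6, H in bit 5, C in bit 4); the function name sub and the tuple return value are choices made for this example only.

```python
FLAG_Z = 0b10000000  # result was exactly zero
FLAG_N = 0b01000000  # the operation was a subtraction
FLAG_H = 0b00100000  # borrow out of the low four bits
FLAG_C = 0b00010000  # borrow out of the whole byte (underflow)


def sub(a: int, operand: int):
    """Return (new A, new F) after computing A - operand on 8-bit values."""
    result = (a - operand) & 0xFF  # wrap around to stay within 8 bits
    flags = FLAG_N
    if result == 0:
        flags |= FLAG_Z
    if (a & 0x0F) < (operand & 0x0F):
        flags |= FLAG_H
    if a < operand:
        flags |= FLAG_C
    return result, flags


print(sub(0x10, 0x10))  # (0, 192): zero result, so Z and N are set
print(sub(0x00, 0x01))  # (255, 112): underflow, so N, H and C are set
```

Returning the new values instead of mutating anything keeps the arithmetic easy to test on its own; a real CPU implementation would write them back into A and F.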
Let’s look at another instruction!
The DEC instruction subtracts 1 from the operand and places the result back into the operand’s storage. In other words, it performs this operation:
Operand = Operand - 1
This is straightforward in the case where the operand is a register, and requires just a hair more thought when the operand in question is (HL). Sketching out what that might look like, we get:
H = 11
L = 22
HL = 1122
In this case, we’ll be running DEC with the dereferenced value of HL, or the byte stored at address 1122. Let’s pretend this address already contains some value:
(1122) = 77
When we run the DEC (HL) instruction, we’ll read the current value of (HL): first, we conjoin the 8-bit values of H and L, 11 and 22, to yield a 16-bit address, 1122. Then, we’ll ask the MMU to read the byte at address 1122, and we’ll get 77 back. Finally, we’ll subtract 1 from that value, and store the result back into the byte stored at address 1122. Nothing too snazzy here, but we’ll want to keep it in mind when designing our routines that will execute an instruction given any of its operand variants. Early on, I had to rework my first implementation so that it could reuse the same code to handle both plain registers and (HL).
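One possible shape for that “same code for registers and (HL)” idea is to hide the difference behind a pair of read/write helpers. Everything in this sketch is an assumption made for illustration: the Cpu class, its method names, and the flat bytearray standing in for the MMU. It also skips the flag updates that a real DEC performs.

```python
class Cpu:
    def __init__(self):
        self.registers = {name: 0 for name in "BCDEHLA"}
        self.memory = bytearray(0x10000)  # flat 64 KiB stand-in for the MMU

    def hl(self) -> int:
        """Conjoin H (high byte) and L (low byte) into a 16-bit address."""
        return (self.registers["H"] << 8) | self.registers["L"]

    def read_operand(self, name: str) -> int:
        if name == "(HL)":
            return self.memory[self.hl()]
        return self.registers[name]

    def write_operand(self, name: str, value: int) -> None:
        if name == "(HL)":
            self.memory[self.hl()] = value
        else:
            self.registers[name] = value

    def dec(self, name: str) -> None:
        """Operand = Operand - 1, wrapping around at 8 bits."""
        self.write_operand(name, (self.read_operand(name) - 1) & 0xFF)


cpu = Cpu()
cpu.registers["H"], cpu.registers["L"] = 0x11, 0x22
cpu.memory[0x1122] = 0x77
cpu.dec("(HL)")
print(hex(cpu.memory[0x1122]))  # 0x76
```

With the operand access factored out like this, dec itself never needs to know whether it’s touching a register or a byte in memory.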
The DEC family sounds simple enough. Can we do the same trick we pulled with SUB, in which we implement a bunch of similar instruction variants with a single implementation?
With SUB, things were pretty straightforward: the operand we were supposed to work with was clear, with a one-to-one correspondence starting from opcode 90. With DEC, things are looking a bit shakier: the various opcodes are on entirely different sides of the table, and certainly aren’t incrementing by one from one variant to the next.
Can we spot anything if we write out the DEC variants’ hexadecimal representations?
05
0d
15
1d
25
2d
35
3d
Hmm… There’s certainly a pattern here, but it’s not clear to me how we’d take a look at an opcode and immediately see both that it’s a DEC, and which variant it is. Maybe looking at the binary representations of each opcode will shed some light?
00000101
00001101
00010101
00011101
00100101
00101101
00110101
00111101
Yes! We’ve got the same pattern of three bits counting up one-by-one to describe what variant the opcode refers to, except this time around those three bits are sandwiched in the middle of the opcode! We can identify a DEC by checking whether the opcode matches 00___101, and those three bits in the middle tell us what operand we’ll be working with in the same counting-by-one fashion as before. I find this really interesting, because this pattern isn’t apparent at all when looking at how the opcodes are visually laid out in the table, but it’s really clear when writing out the binary representations of the opcodes!
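The 00___101 check translates into a mask that ignores the middle three bits, plus a shift to pull those bits out. As before, this is a sketch with made-up names (OPERANDS, decode_dec), not the emulator’s actual code.

```python
# Operand selected by bits 3-5 of the opcode, in counting order.
OPERANDS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]


def decode_dec(opcode: int):
    """Return the operand name if `opcode` is a DEC, otherwise None."""
    # Mask away the middle three bits, then compare against 00___101.
    if (opcode & 0b11000111) == 0b00000101:
        return OPERANDS[(opcode >> 3) & 0b111]
    return None


print(decode_dec(0x05))  # B
print(decode_dec(0x35))  # (HL)
print(decode_dec(0x90))  # None
```

Compared with the SUB decoder, only the mask and the shift amount change; the operand table is the same.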
That should be a pretty good foundation to get us going writing our emulated CPU in software. The thing about a CPU, though, especially when writing a virtual one, is that it gets a lot more useful when you’ve got some code you’d like to execute on it!
Running a program
We’ve got hundreds of opcodes to implement, and some kind of plan to divide-and-conquer will be vital to implementing this thing in an approachable way. If we pick an existing program that we’d like to run on our fledgling CPU, it’ll provide us with a great system for knocking out segments of the instruction set:
- Load the program into our virtual environment
- Kick off the CPU on its fetch/decode/execute lifecycle
- Keep going until the CPU fetches an opcode it doesn’t yet support
- Implement the missing opcode (and its relatives, if any)
- Repeat!
But what program to pick? Ideally we’d want something that’s approachable in size (so we don’t have to implement every opcode under the sun before we see results), that’s well-understood (so we can check whether our emulated CPU is doing the right thing), and that’d be useful code to run anyway.
A great candidate here might be the boot ROM that executes every time the GameBoy boots! Let’s see:
- Approachable in size? Yep, the boot ROM fits in just 256 bytes.
- Well-understood? Definitely: people have annotated the boot ROM’s instructions line-by-line.
- Useful in its own right? The boot ROM is responsible for displaying the classic Nintendo logo on startup, so I’d go as far as to say this is essential.
That last point is a real classic, and very familiar to anyone who’s played a GameBoy: the scrolling logo is the start to every session! That means the boot ROM is both a functional program, and an iconic part of the GameBoy experience.
Now, it’s useful to understand what the boot ROM is doing exactly, so we know the kinds of things to expect. We know that typically the boot ROM’s responsibility is things like initializing the rest of the system in such a way that later code can run in a known state, but the boot ROM can have other responsibilities, too. One of these responsibilities, both in the case of the GameBoy and many other systems, is verifying the integrity of the rest of the code the system will execute.
Trusting trust
This leads to a concept called the chain of trust. The boot ROM, as we know, is read-only, so if we trust the boot ROM and the boot ROM trusts whatever it loads next, we’ve established a chain ensuring that any code that the system runs is ‘safe’. The definition of ‘safe’ varies: it might mean that the code has been signed with a cryptographic signature, for example, assuring that the program has been verified and authorized by the platform vendor, or it might validate that the code hasn’t been tampered with after compilation.
The GameBoy, however, does no such cryptographic validation of the next piece of code to run after the boot ROM. The GameBoy was released in 1989, and I’m not sure whether this is down to cryptographic validation being too taxing for the constrained hardware that the GameBoy packs, whether the contemporary techniques for code signing had not been developed to a sufficient degree to allow Nintendo to cryptographically secure the games of third-party developers, whether they simply weren’t able to pack enough code into the boot ROM to do cryptographic validation while balancing other tradeoffs, or some other reason. Still, the fact remains that the boot ROM performs no such thing. That said, Nintendo still wanted some measure of control over the system! With any entertainment platform, there’s a palpable propensity for end-user control and modification. The consequences of this can range from the outright cool (hobbyist games and demoscenes), to the explicitly detrimental for Nintendo and games publishers (pirated games redistributed without paying the developers).
Without any way to technologically enforce that all software running on the GameBoy must be authorized by Nintendo, is there some other way to enforce this?
Yes indeed! Nintendo did something very clever here. Remember the classic Nintendo logo that scrolls down from the top of the screen when the boot ROM executes? Rather than including the Nintendo logo data in the boot ROM and loading it into video RAM directly, Nintendo required that every game cartridge contain its own copy of the Nintendo logo.
Do you see the trick right here?
…
It’s a threat.
Nintendo is staving off pirates and unauthorized redistributions by threatening them with copyright law. To ship an unauthorized or pirated game, the pirate is forced to distribute the Nintendo logo too. Clever!
One interesting consequence of this is that every GameBoy cartridge ever manufactured contains the same redundant copy of the Nintendo logo, over and over in duplicated silicon counted in millions, all to threaten would-be pirates.
Except… how does the boot ROM know that the logo stored in the cartridge is the correct logo at all? Couldn’t we just put anything in there and allow the boot ROM to display it?
Well, yes!
We can stick anything we’d like in the cartridge memory that’s supposed to contain the Nintendo logo. Here’s what the boot screen looks like if we don’t initialize the cartridge logo memory at all:
Invalid Nintendo boot logo
Seems like a pretty simple circumvention for Nintendo’s litigative DRM, no? Unfortunately, when I called the cartridge logo redundant, I really meant it: the boot ROM contains its own copy of the Nintendo logo, too. It doesn’t display it, though: it reads the logo data from the cartridge, displays it, then checks it against the copy of the logo stored in the boot ROM itself. If there’s any kind of mismatch, the boot ROM will lock up the system to prevent any further use.
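The comparison itself boils down to a byte-for-byte equality check. The sketch below expresses it in Python rather than in boot ROM assembly, so treat it as a model of the behavior, not the real routine. The cartridge offset (0104 through 0133, 48 bytes) is the logo location documented in the Pan Docs; the function name and the placeholder logo bytes are invented for this example.

```python
LOGO_START = 0x0104  # cartridge offset of the logo, per the Pan Docs
LOGO_LENGTH = 48     # the logo occupies 0x0104 through 0x0133


def logo_matches(cartridge: bytes, reference_logo: bytes) -> bool:
    """Compare the cartridge's logo bytes with the boot ROM's own copy."""
    candidate = cartridge[LOGO_START:LOGO_START + LOGO_LENGTH]
    return candidate == reference_logo


# Placeholder data standing in for the real logo bytes.
reference = bytes(range(LOGO_LENGTH))
good_cartridge = bytearray(0x150)
good_cartridge[LOGO_START:LOGO_START + LOGO_LENGTH] = reference
blank_cartridge = bytearray(0x150)

print(logo_matches(bytes(good_cartridge), reference))   # True
print(logo_matches(bytes(blank_cartridge), reference))  # False
```

In an emulator, a False result here corresponds to the lock-up described above: the boot ROM never hands control to the cartridge.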
Going back to the task at hand: we want to run the boot ROM code as a tool to help move us along our CPU implementation, but to run the program we’ll first need to have the program in the first place. How do we know what code the boot ROM contains?
Acquiring the boot ROM
This might seem simple at first glance: the boot ROM program has been etched into the silicon of every GameBoy ever produced, so can’t we just… read it?
Well, yes, in theory we can. Practically, though, reading data from a physical medium measured in micrometers is no small feat. In fact, it took 14 years before somebody managed to do it, armed with no less than a powerful microscope, a GameBoy CPU with the plastic housing chemically dissolved away, and a whole lot of gumption.
Thanks to their work, copies of the boot ROM, both as ready-to-run compiled bytes and as annotated assembly, are floating ’round the internet, waiting for any intrepid emulator developer who’s ready to take them on.
Hrm. Hang on a second. 14 years to read the boot ROM? Couldn’t a game just, like, read out the contents of the boot ROM addresses when the game starts running, then display the data on the screen or something?
Well, yes. That’s why Nintendo included one more trick in the jam-packed 256 bytes of code in the boot ROM. Recall our earlier digression on the MMU:
When the CPU accesses an address, the MMU will route the access to the peripheral that ‘owns’ that address. Say the CPU reads address #00. This is within the boot ROM, so the first byte of the boot ROM will be accessed. What about address #ff? Still within the boot ROM, so the MMU will route the access there. And address #100? The boot ROM takes exactly 256 bytes to store, so this is just outside the boot ROM’s address space. Instead, the MMU will route this access to a different peripheral: the game cartridge.
With our idea above, the game cartridge would read addresses #00 through #ff, storing or displaying their contents so that a human could read the boot ROM data. Why doesn’t this work?
As its final flourish, the last 4 bytes in the 256-byte boot ROM encode the following 2 instructions:
Address: #00fc Byte(s): 3e 01 Description: Load Register A with 01
Address: #00fe Byte(s): e0 50 Description: Load (ff50) with Register A
Feeling enlightened? Me neither. Let’s head to the Pan Docs to see what’s so special about the address #ff50.
Memory-mapped IO descriptions
Ah ha! Similarly to how the MMU will redirect any accesses to the boot ROM address range to the boot ROM itself, the MMU also implements some special addresses: accessing them (via reads or writes, depending) triggers special system behavior. Writing a value of 1 to address #ff50, in particular, causes the MMU to effectively disable the boot ROM! After this value has been written, the MMU won’t forward any accesses in the address range #00 - #ff to the boot ROM any longer. Instead, the MMU will direct those accesses to the game cartridge. There’s also no way to undo this once the boot ROM has been disabled, short of rebooting the system!
In other words, the very last thing the boot ROM does, just on the cusp of the instruction pointer iterating past the #00 - #ff range, is write a special value that’ll cause the MMU to disable any and all ability to reach the boot ROM data via memory access. This is spectacularly well-timed, as the moment the CPU is done running this instruction, its instruction pointer reaches #100 and it leaves the #00 - #ff range it was previously running within. The game that’s been loaded has no way to talk to the boot ROM, but there is the bonus that the game is free to use the #00 - #ff range for its own purposes.
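The routing and the one-way disable can be sketched in a few lines of Python. This is a simplified illustration, not the emulator's real MMU: the `Mmu` class and its field names are hypothetical, only the boot ROM and cartridge are modelled, and the disable condition follows the description above (a write of 1 to #ff50).

```python
BOOT_ROM_SIZE = 0x100           # the boot ROM covers addresses 0x00..0xff
BOOT_ROM_DISABLE_ADDR = 0xFF50  # special address that unmaps the boot ROM


class Mmu:
    """Toy MMU that only knows about the boot ROM and the cartridge."""

    def __init__(self, boot_rom: bytes, cartridge: bytes) -> None:
        if len(boot_rom) != BOOT_ROM_SIZE:
            raise ValueError("boot ROM must be exactly 256 bytes")
        self.boot_rom = boot_rom
        self.cartridge = cartridge
        self.boot_rom_enabled = True

    def read(self, addr: int) -> int:
        # While enabled, the boot ROM shadows the first 256 cartridge bytes.
        if self.boot_rom_enabled and addr < BOOT_ROM_SIZE:
            return self.boot_rom[addr]
        if addr < len(self.cartridge):
            return self.cartridge[addr]
        return 0xFF  # every other region is unmodelled in this sketch

    def write(self, addr: int, value: int) -> None:
        if addr == BOOT_ROM_DISABLE_ADDR and value == 1:
            # One-way latch: nothing in this class ever sets it back to True.
            self.boot_rom_enabled = False
        # All other writes are ignored in this sketch.
```

Before the write, a read of #00 returns a boot ROM byte; afterwards the same read returns the cartridge's byte at #00, and no later write brings the boot ROM back.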
OK, acid and microscopes it is. Let’s recap the major functions of the boot ROM program:
- Initialize the system
- Load the Nintendo logo from the game cartridge
- Scroll the logo onto the screen
- Verify that the logo is valid
- Disable the boot ROM and allow the cartridge to begin running its code
The nice thing about writing an emulator is we don’t need to know exactly how the emulated program does each of these things! All we really need to do is provide the environment that the program expects itself to be running within, and to run its code faithfully. The program will do the rest. This is a really powerful idea: it means that we can write an emulator that runs popular games, without us ever needing to know how those games work under the hood. In fact, our emulator is theoretically equivalent to any computer. That is, anything that can be computed can be computed in our emulated environment (modulo resource constraints). I wouldn’t go porting a desktop environment to the GameBoy anytime soon, but the magic is in the idea that you could: our emulated CPU will be beneath, dutifully chugging along, facilitating someone else’s program to carry out its logic without really needing to know anything about what’s going on at the higher level.
Back to our earlier idea, and boot ROM dump in hand, let’s fashion ourselves an emulator workshop. We’ll run the boot ROM within our virtual system, over and over, pressing on just a bit further in execution each time as we implement more of the opcodes the program relies on to execute. To keep track of my progress, I colored in each set of opcodes as I finished their implementation, leaving me with a nice and neat time-lapse of how it went. I only allowed myself to color in an instruction once I’d finished both the implementation and associated unit tests, which kept me disciplined. It’s hard to overstate how satisfying it was to watch this table grow more and more occluded!
8-bit opcode table time-lapse
16-bit opcode table time-lapse
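For a sense of what this run-until-it-breaks loop can look like, here is a minimal Python sketch. It is not the emulator's actual code: the `Cpu` class, the `UnimplementedOpcode` exception, and the flat 64 KiB `bytearray` standing in for the MMU are all assumptions made for illustration. Only three opcodes are handled (NOP, plus the two instructions from the end of the boot ROM shown earlier); anything else halts with the opcode and address that need implementing next.

```python
class UnimplementedOpcode(Exception):
    """Raised when execution reaches an opcode with no handler yet."""

    def __init__(self, opcode: int, pc: int) -> None:
        super().__init__(f"opcode {opcode:#04x} at {pc:#06x} is not implemented")
        self.opcode = opcode
        self.pc = pc


class Cpu:
    def __init__(self, memory: bytearray) -> None:
        self.memory = memory  # flat 64 KiB address space (stand-in for an MMU)
        self.pc = 0
        self.a = 0
        # Opcode -> handler. This table grows as instructions get implemented.
        self.handlers = {
            0x00: self.nop,
            0x3E: self.ld_a_d8,
            0xE0: self.ldh_a8_a,
        }

    def fetch(self) -> int:
        """Read the byte at PC and advance PC, wrapping at 16 bits."""
        byte = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def nop(self) -> None:
        pass

    def ld_a_d8(self) -> None:
        # 3e nn: load the immediate byte into register A.
        self.a = self.fetch()

    def ldh_a8_a(self) -> None:
        # e0 nn: store register A at address 0xff00 + nn.
        self.memory[0xFF00 + self.fetch()] = self.a

    def step(self) -> None:
        pc = self.pc
        opcode = self.fetch()
        handler = self.handlers.get(opcode)
        if handler is None:
            raise UnimplementedOpcode(opcode, pc)
        handler()

    def run(self, max_steps: int) -> None:
        for _ in range(max_steps):
            self.step()
```

Each run ends either after `max_steps` instructions or with an `UnimplementedOpcode` naming the next instruction to write (and to unit test) before it earns its color in the table.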
Next steps
Of course, all the instructions in the world won’t do us much good without a way to observe their side effects. We know the boot ROM is supposed to be scrolling a logo, but where would we see that? It’s time to move on to our next system component, a particularly formidable beast: the PPU.