The Case of the Missing 4th Commodore BASIC Variable (and the 5th Byte)
One more detective story.
We’ve met them already, back in the happier days of ’20, when things still seemed right, before that cloud of gloom settled over the city: the jolly bunch known as the Commodore BASIC variables. To common knowledge, there are 3 of them, Float, Integer and String, and if you browse the gazettes and stories distributed over the main street counters of the Internet, this may be all you know about them. Each of them is known by their signature grin and each of them comes with a purpose.
Let’s have them rounded up for a quick identification:
Mug | Memory Signature | Business | Stature, Personal Traits
---|---|---|---
A1 | A 1 (0x41 0x31) | Floating Point Number | 5 bytes: exponent/sign, 4 bytes mantissa
I2% | I̅ 2̅ (0xC9 0xB2) | Integer Number | 5 bytes: 2 bytes binary value, 3 zero-bytes (unused)
S3$ | S 3̅ (0x53 0xB3) | String | 5 bytes: length, 2-byte memory pointer, 2 zero-bytes
Each of them is 7 bytes in memory: 2 bytes for the name, followed by a 5-byte variable body, which does the actual business. The name also encodes their type, so you already know who you’re dealing with as soon as they come around. They don’t make much of a secret of their business, as they proudly show it off, right in their face, by sign marks sprinkled all over them.
Specifically, Float comes with a clean baby face, with no marks at all, fresh ASCII characters all over. Integer, however, is another character, marked by signs on both cheeks, and ol’ String is known by a single sign mark on the second, right-hand side of his signature grin.
Commodore BASIC variables by sign-bit:

1st byte | 2nd byte | Type
---|---|---
0 | 0 | Float
1 | 1 | Integer
0 | 1 | String
If you’ve been around for some time in the backyards and alleys they call the Binaries, you eventually develop a feel for this. Something was telling me that this may not be all, that there may still be some in hiding. A little something we hadn’t seen yet. Who knows, maybe a damsel in distress?
Just to put you in the picture, I interrogated them with my trusty PET 2001 emulator, which now comes with a quick tool for disassembling variables as in memory. (That’s another story, stay tuned.) There’s no hiding anymore, and here is what they look like without their pretty listing clothes:
### COMMODORE BASIC ###
15359 BYTES FREE
READY.
10 A1 =2.345
20 I2%=258
30 S3$="BLA"
RUN
READY.
█
→ Utils/Export → Disassemble Variables
.[simple BASIC variables]
042B 41 31            A1
042D 82 16 14 7A E2   = 2.345
0432 C9 B2            I2%
0434 01 02 00 00 00   = 258
0439 53 B3            S3$
043B 03 24 04 00 00   len: 3, @ $0424
.[end of BASIC variables]
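For those who like to check the read-out by hand, here is a minimal Python sketch that decodes the three variable bodies shown above. It assumes the layout described in the table, plus two details the table does not spell out and which are my assumptions: the exponent is stored in excess-128 form, and the top bit of the first mantissa byte carries the sign in place of an implied leading 1. The function names are made up for illustration.

```python
def decode_float(body: bytes) -> float:
    """Decode a 5-byte float body.

    Assumed layout: excess-128 exponent, then a 4-byte mantissa whose
    top bit holds the sign in place of the implied leading 1.
    """
    exponent = body[0]
    if exponent == 0:              # an exponent of 0 denotes the value 0
        return 0.0
    sign = -1.0 if body[1] & 0x80 else 1.0
    # restore the implied leading 1 bit of the mantissa
    mantissa = int.from_bytes(bytes([body[1] | 0x80]) + body[2:5], "big")
    return sign * mantissa / 2**32 * 2.0 ** (exponent - 128)


def decode_integer(body: bytes) -> int:
    """Decode an integer body: signed 16-bit value, high byte first."""
    return int.from_bytes(body[0:2], "big", signed=True)


def decode_string(body: bytes) -> tuple:
    """Decode a string body: (length, address), pointer low byte first."""
    return body[0], int.from_bytes(body[1:3], "little")


print(decode_float(bytes.fromhex("8216147AE2")))     # body of A1
print(decode_integer(bytes.fromhex("0102000000")))   # body of I2%
print(decode_string(bytes.fromhex("0324040000")))    # body of S3$
```

Note the mixed byte order: the integer value comes high byte first, while the string pointer comes low byte first, as is usual for 6502 addresses.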
(Mind that they look a bit different when they come in a flock and identify only by subscript.)
But where is the story in that, and what about the damsel?
Another Type?
It wasn’t until a friend of mine came around with an old source of his that I caught a first glimpse of her: (Fancy talk aside, of which we may have had enough by now, this was Jason Cook, who became a valuable beta tester for the new version of the emulator. Check out his new PET game!)
1C0A D2 00 B4 0A 13 1C B5 ;var: "R" + sign-bit, 0
There she was, shyly revealing the sign-bit that adorned her first byte!
So there actually are four:

1st byte | 2nd byte | Type
---|---|---
0 | 0 | Float
1 | 1 | Integer
0 | 1 | String
1 | 0 | damsel in distress?
But who was she, and was she actually in distress?
This is an even deeper mystery, since Commodore never made much of a secret of variable formats, right from the beginning. The PET manuals clearly describe how BASIC interacts with memory and provide some examples of in-memory formats, but they only mention 3 types: floating point, integer, and string. So what could this 4th variable type be, and what mysteries are lurking behind it?
I already knew something, namely that she was known by the single letter “R”. So it wasn’t that hard to trace her down to her origins, hidden in a bunch of densely formatted BASIC statements:
150 DEFFNR(X)=INT(X*RND(U)):GOSUB8010:A1$="NLTSMR"
(STARTREK1978.PRG by Jason Cook)
It’s a DEFFN variable! It actually makes some sense that these user-defined functions should be stored as variables, in order to look them up by name.
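With all four combinations accounted for, the type lookup can be sketched in a few lines of Python. This is only an illustration of the sign-bit scheme described so far, not the interpreter’s actual routine; `decode_name` and the type labels are names I made up for the occasion.

```python
def decode_name(b1: int, b2: int) -> tuple:
    """Return (name, kind) for the 2-byte name field of a simple variable."""
    kinds = {
        (0, 0): "float",
        (1, 1): "integer",
        (0, 1): "string",
        (1, 0): "function (DEF FN)",
    }
    kind = kinds[(b1 >> 7, b2 >> 7)]
    # strip the sign-bits to recover the plain ASCII characters
    name = chr(b1 & 0x7F)
    if b2 & 0x7F:                  # second character is 0 for one-letter names
        name += chr(b2 & 0x7F)
    return name, kind


for pair in ((0x41, 0x31), (0xC9, 0xB2), (0x53, 0xB3), (0xD2, 0x00)):
    print(decode_name(*pair))
```

The last pair, D2 00, is the “R” with the sign-bit on her first byte only.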
So let’s have a closer look at her (*blush*) anatomy…
In order to do so, let’s come up with a much simpler example that lends itself a bit more easily to investigation:
10 DEFFNR(X)=1+X*X
20 PRINT FNR(3)
RUN
10
Now let’s have a look at the variable as in memory:
→ Utils/Export → Disassemble Variables
.[simple BASIC variables]
0420 D2 00            FNR()
0422 0C 04 29 04 31   – ??? –
0427 58 00            X
0429 00 00 00 00 00   = 0
.[end of BASIC variables]
And, while we’re at it, let’s inspect the tokenized program as in memory, as well:
→ Utils/Export → Disassemble Program
.[tokenized BASIC text]
0401 12 04         link: $0412
0403 0A 00         line# 10
0405 96            token DEF
0406 A5            token FN
0407 52 28 58 29   ascii «R(X)»
040B B2            token =
040C 31            ascii «1»
040D AA            token +
040E 58            ascii «X»
040F AC            token *
0410 58            ascii «X»
0411 00            -EOL-
0412 1E 04         link: $041E
0414 14 00         line# 20
0416 99            token PRINT
0417 20            ascii « »
0418 A5            token FN
0419 52 28 33 29   ascii «R(3)»
041D 00            -EOL-
041E 00 00         -EOP- (link: null)
.[end of BASIC text]
A versed investigator of BASIC affairs may have spotted it right away: the first four bytes are two pointers into memory, as given away by the second (high) byte of each, 04, pointing at addresses in the 0x0400–0x04FF range. BASIC memory on the PET starts at 0x0401 and is populated by the tokenized BASIC text, followed by simple variables and then arrays, if there are any.
Let’s rearrange this:
0401 12 04            link: $0412
0403 0A 00            line# 10
0405 96               token DEF
0406 A5               token FN
0407 52 28 58 29      ascii «R(X)»
040B B2               token =
040C 31               ascii «1»
040D AA               token +
040E 58               ascii «X»
040F AC               token *
0410 58               ascii «X»
0411 00               -EOL-
(...)
0420 D2 00            FNR()
0422 0C 04            pointer to $040C (low, high)
0424 29 04            pointer to $0429 (low, high)
0426 31               – ??? –
0427 58 00            X
0429 00 00 00 00 00   = 0
The first pointer taps right into the function body, just after the assignment sign (“=”) of the function definition.
The second pointer taps right into the variable body of the argument “X”, which is actually a global variable. (Which does make some sense, as there are only global variables in BASIC.)
This already promises some speedy and optimized execution at run-time, as the pointers refer directly to memory as needed. Moreover, we can see why only floating point values are allowed as an argument: the pointer to the argument skips past any notion of the name and type of that variable, assuming right away that it’s a float.
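The anatomy established so far fits in a small Python sketch: two pointers, low byte first, and one byte left over. The type and field names (`FnBody`, `definition`, `argument`, `fifth`) are my own labels, not anything found in the ROM or the manuals.

```python
from typing import NamedTuple


class FnBody(NamedTuple):
    definition: int   # address of the first byte after "=" in the program text
    argument: int     # address of the parameter's 5-byte float body
    fifth: int        # purpose still unknown at this point


def parse_fn_body(body: bytes) -> FnBody:
    """Split the 5-byte body of an FN variable into its three parts."""
    return FnBody(
        definition=body[0] | body[1] << 8,   # low byte first
        argument=body[2] | body[3] << 8,
        fifth=body[4],
    )


fn = parse_fn_body(bytes.fromhex("0C04290431"))
print(f"body @ ${fn.definition:04X}, arg @ ${fn.argument:04X}, 5th = ${fn.fifth:02X}")
# prints: body @ $040C, arg @ $0429, 5th = $31
```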
The Mystery of the 5th Byte
So, what could the 5th byte be about? Some of this may remind us of how strings are stored, with a first byte storing the length and then a pointer to the in-memory location at which the string begins. Is it a length of sorts? (This may seem even more plausible, as the code for executing “DEFFN” borrows some from the code for string handling.)
This was actually my first assumption, nourished by some coincidence. However, this it is not, of course. The execution at run-time just stops at the first colon (“:”) or the first end of line, whichever comes first, extending over a single BASIC statement. No lengths required for that. Is it related to the variable name? But this was yet another coincidence in my early investigations, as can clearly be seen from the above example, where 0x31 is the ASCII code for “1”, which bears no relation to “R”. So, what is it?
Let’s expand on our little experiment:
10 DEFFNR(X)=1+X*X
20 DEFFNG(Y)=3*Y+4
Which (after RUN) gives the following variable read-out:
0425 D2 00            FNR()
0427 0C 04 2E 04 31   @ $040C, arg @ $042E, ??
042C 58 00            X
042E 00 00 00 00 00   = 0
0433 C7 00            FNG()
0435 1D 04 3C 04 33   @ $041D, arg @ $043C, ??
043A 59 00            Y
043C 00 00 00 00 00   = 0
So, the first variable has a 5th byte of 0x31 and the second variable one of 0x33. Is it some counter? (This also shows, once again, that this isn’t related to any names, since nothing in either “R”, “G”, “X”, or “Y” translates to a difference of 2.)
So let’s add yet another DEFFN definition to this, just to verify:
10 DEFFNR(X)=1+X*X
20 DEFFNG(Y)=3*Y+4
30 DEFFNI(T)=3*T-2

0436 D2 00            FNR()
0438 0C 04 3F 04 31   @ $040C, arg @ $043F, ??
043D 58 00            X
043F 00 00 00 00 00   = 0
0444 C7 00            FNG()
0446 1D 04 4D 04 33   @ $041D, arg @ $044D, ??
044B 59 00            Y
044D 00 00 00 00 00   = 0
0452 C9 00            FNI()
0454 2E 04 5B 04 33   @ $042E, arg @ $045B, ??
0459 54 00            T
045B 00 00 00 00 00   = 0
Hum, this is somewhat disappointing: both the second and the third FN variable have 0x33 as their last byte. So it isn’t a counter at all. Moreover, adding some other variables to our short program or changing any of the names doesn’t show any effect on this 5th byte of the variable body at all.
However, if we change the very first character of the function body, we finally do make a difference:
30 DEFFNI(T)=4*T-2
0452 C9 00 FNI()
0454 2E 04 5B 04 34 @ $042E, arg @ $045B, ??
Let’s make this
30 DEFFNI(T)=T-2
0450 C9 00 FNI()
0452 2E 04 59 04 54 @ $042E, arg @ $0459, ??
As the eagle-eyed may have spotted already, 0x34 is the ASCII code for “4” and 0x54 is ASCII “T”.
It’s literally the first byte of our DEFFN function body!
Let’s confirm this with a token in the first place:
30 DEFFNI(T)=INT(T)
0451 C9 00 FNI()
0453 2E 04 5A 04 B5 @ $042E, arg @ $045A, ??
Yes, 0xB5 has the sign-bit set, giving away the BASIC token, and it is the BASIC token for INT, indeed:
0425 1E 00 line# 30
0427 96 token DEF
0428 A5 token FN
0429 49 28 54 29 ascii «I(T)»
042D B2 token =
042E B5 token INT
042F 28 54 29 ascii «(T)»
0432 00 -EOL-
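The finding can be put as a one-line check: the 5th byte of the variable body equals the byte found at the address in the first pointer. The following Python sketch runs that check on a sparse memory image holding only the bytes quoted in the read-outs above; the dictionary and the function name are illustrative assumptions, not part of the emulator.

```python
# Sparse memory image: only the bytes quoted in the read-outs above.
memory = {
    0x042E: 0xB5,                  # first byte of FNI's body (token INT)
    0x0453: 0x2E, 0x0454: 0x04,    # FNI: pointer to the function body
    0x0455: 0x5A, 0x0456: 0x04,    # FNI: pointer to the argument
    0x0457: 0xB5,                  # FNI: the 5th byte
}


def fifth_byte_mirrors_body(mem: dict, var_body: int) -> bool:
    """True if the 5th byte of an FN variable body equals the first byte
    of the function definition it points to."""
    definition = mem[var_body] | mem[var_body + 1] << 8
    return mem[var_body + 4] == mem[definition]


print(fifth_byte_mirrors_body(memory, 0x0453))   # prints: True
```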
Well, that’s that mystery solved.
But does this 5th byte matter?
10 DEFFNR(X)=1+X*X
20 DEFFNG(Y)=3*Y+4
30 DEFFNI(T)=INT(T)
40 POKE 1160,32 : REM DEC 1160 = $0488
50 PRINT FNI(4.1)
RUN
4
READY.
It doesn’t seem so. The result is still what we’d expect from the BASIC function INT. It’s also not what we would expect if we had replaced the token INT in the BASIC text by 32, which is a simple space/blank, giving “ (T)”, which would have printed 4.1.
And we actually changed that last byte:
0482 C9 00 FNI()
0484 2E 04 8B 04 20 @ $042E, arg @ $048B, « »
Let’s have another go at this, this time replacing the “1” in FNR by the ASCII code for “2”:
10 DEFFNR(X)=1+X*X
20 DEFFNG(Y)=3*Y+4
30 DEFFNI(T)=INT(T)
40 POKE 1130,50 : REM DEC 1130 = $046A
50 PRINT FNR(2)
RUN
5
0464 D2 00 FNR()
0466 0C 04 6D 04 32 @ $040C, arg @ $046D, «2»
This didn’t make a difference, either.
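In case the POKE addresses in the two experiments look like magic numbers: they follow from the start address of the FN variable as shown in the read-outs, skipping the 2 name bytes and the first 4 bytes of the body. A small Python sketch of that arithmetic, with constant and function names of my own choosing:

```python
NAME_SIZE = 2      # bytes taken by the variable name
FIFTH_OFFSET = 4   # offset of the 5th byte within the 5-byte body


def fifth_byte_address(variable: int) -> int:
    """Address of the 5th body byte, given the address of the variable name."""
    return variable + NAME_SIZE + FIFTH_OFFSET


for label, start in (("FNI", 0x0482), ("FNR", 0x0464)):
    addr = fifth_byte_address(start)
    print(f"{label}: ${addr:04X} = {addr}")
# prints:
# FNI: $0488 = 1160
# FNR: $046A = 1130
```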
A more thorough investigation into the literature on the matter produced a sole source, namely “Programming the PET/CBM” by Raeto West.
Here, we find FN variables actually described as a special type, on p. 9, where the last byte is described as “INITIAL OF VAR.”
The exact meaning of “INITIAL OF VAR.” may not be that clear, as it is provided without further context, but, as we’ve established already, it is fair and correct if we’re meant to understand “the initial byte of the function body referred to by the variable” (as opposed to, e.g., “the first character of the variable identifier” or similar). The descriptive text goes as follows:
A function definition has two pointers; one to the definition in the body of the
BASIC program, and one to the floating-point dependent variable. They point just
after the ‘=’ sign and to the exponent byte respectively. The final byte is garbage,
generated when the definition is set up, and is not used.
Well, I guess, that’s it. Especially as (as already mentioned) the code uses some resources dedicated to string handling. Still, it’s a bit strange that this 5th byte isn’t just set to 0, as with any other surplus bytes in integer and string variables.
What is this FN damsel hiding? What causes her such distress that she should exhibit the most intimate secrets of her build like this in broad daylight?
I guess that’s another story. Which also brings this true detective story to an end.
Anyhow, if you want to have a closer look at the new version of the PET emulator, here it is, running all the latest demos:
Edit/Update
While it may be correct to speak of the function parameter (argument) as a global variable, in the sense that it’s created together with the FN variable and stored alongside it in the global variable memory, it doesn’t behave like one:
10 X=1
20 DEFFNR(X)=1+X*X
30 PRINT X
40 PRINT FNR(2)
50 PRINT X
RUN
1
5
1
READY.
█
Moreover, even if there is no conflict, the function parameter isn’t accessible from outside:
10 DEFFNR(X)=1+X*X
20 PRINT FNR(2)
30 PRINT X
RUN
5
0
READY.
█
As may be inferred from this, the value of the variable (1 byte exponent and 4 bytes mantissa) is saved before the variable is accessed as a parameter/argument and then restored again. As user-defined functions are callable from inside user-defined functions, this can’t be just a buffer in the zero-page (meaning, there may be more than one value to be saved at any given time); rather, the contents of the variable body are pushed to the processor stack and then restored from there. So, while they may be defined as global variables, these are actually behaving like local variables.
We may note that the effort taken (5 pushes and 5 pulls to and from the stack, in addition to the reads and writes that go with this) to make these behave like local variables somewhat counteracts the efficiency suggested by the argument pointer tapping right into the variable body.
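The save-and-restore behavior inferred above can be modeled in a few lines of Python. This is a toy model of the observed behavior only, not a transcription of the ROM routine: a dictionary stands in for variable memory, a list for the processor stack, and `call_fn`, `fnr` and `fng` are hypothetical helpers.

```python
variables = {"X": 1.0}     # global variable store, as in the first listing
stack = []                 # stands in for the processor stack


def call_fn(param: str, body, argument: float) -> float:
    """Model of an FN call: save the parameter variable, bind the argument,
    evaluate the body, then restore the saved value."""
    stack.append(variables.get(param, 0.0))   # save (5 pushes on the machine)
    variables[param] = argument
    try:
        return body()
    finally:
        variables[param] = stack.pop()        # restore (5 pulls)


def fnr(x: float) -> float:
    # DEF FNR(X)=1+X*X
    return call_fn("X", lambda: 1 + variables["X"] * variables["X"], x)


def fng(x: float) -> float:
    # a nested case sharing the same parameter: DEF FNG(X)=FNR(X)+X
    return call_fn("X", lambda: fnr(variables["X"]) + variables["X"], x)


print(variables["X"], fnr(2.0), variables["X"])
print(fng(2.0), variables["X"])
```

The nested call is the reason a single fixed buffer would not do: while FNG is being evaluated, two saved values of X are in flight at once, which a stack handles naturally.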
Wait, there is more: don’t miss the Bonus Episode!
Norbert Landsteiner,
Vienna, 2023-03-15