A Dynamic Forth Compiler for WebAssembly
In yet another 'probably-useless-but-interesting' pet project, I wrote a Forth
compiler and interpreter targeting WebAssembly.
It's written entirely in WebAssembly, and comes with a compiler
that dynamically emits WebAssembly code on the fly. The complete system (including 80% of all
core words) fits into a 10k (5k gzipped) WebAssembly module. You can try it out here, or
grab the code from GitHub.
What follows are some notes on the design, and some preliminary crude speed benchmarks.
ℹ️ Note (2022-08-19): This post is relatively old. Since the time of writing, a lot has been added
to WAForth, including design changes, the implementation of all of the ANS Core words and most
ANS Core Extension words, the addition of a JavaScript interface, a standalone version, … For a more up-to-date view of the project, check out the WAForth GitHub page.
Forth
Forth is a
low-level, minimalistic stack-based programming language. It typically comes in the
form of an interactive interpreter, where you can type in your commands. For example,
taking the sum of two numbers and printing the result:
2 4 + . 6 ok
Forth environments also have a built-in compiler, allowing you to define new 'words'
by typing their definition directly from within the interpreter:
: QUADRUPLE 4 * ;
which you can then immediately invoke:
2 QUADRUPLE . 8 ok
Not unlike Lisps, you can customize Forth's compiler, add new control flow
constructs, and even switch back and forth between the interpreter and the compiler
while compiling.
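For example, a standard Forth idiom (a small sketch using only standard words, not code taken from WAForth) uses [ and ] to temporarily drop back into the interpreter in the middle of a definition, computing a value at compile time and compiling it as a literal:
\ Sketch using standard Forth words, not WAForth-specific code:
\ [ switches to the interpreter, ] switches back to the compiler,
\ and LITERAL compiles the value left on the stack at compile time.
: SECONDS/DAY  [ 60 60 * 24 * ] LITERAL ;
SECONDS/DAY . 86400 ok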
Because of its minimalism, Forth environments can easily be ported to new
instruction sets, which makes them popular in embedded systems. To learn a bit more
about the language (and about WebAssembly), I wanted to try creating an
implementation for WebAssembly: not exactly an embedded instruction set, but
an instruction set nonetheless.
Design
WAForth is (almost) entirely written in WebAssembly. The only parts for which
it relies on external (JavaScript) code are the dynamic loader (which isn't
available
(yet?)
in WebAssembly), and the I/O primitives to read and write a character.
I got a lot of inspiration from jonesforth, a
minimal x86 assembly Forth system written in the form of a tutorial.
The Macro Assembler
Update (11/2019): WAForth no longer uses a custom macro assembler; the core is now written
entirely in raw WebAssembly.
The WAForth core is written as a single module in WebAssembly's text format. The
text format isn't really meant for writing code in, so it has none of the facilities of a real assembler
(e.g. constant definitions, macro expansion, …). However, since the text format uses S-expressions,
a few small tweaks make it loadable in a Lisp-style system, which can then be used to extend
it with macros.
So, I added some Scheme (Racket) macros to the module definition,
and implemented a mini assembler to print out the resulting S-expressions in a compliant WebAssembly format.
The result is something that looks almost exactly like a standard WebAssembly
text format module, but sprinkled with some macros for convenience.
The Interpreter
The interpreter runs a loop that processes commands, and switches to and from
compiler mode.
Contrary to some other Forth systems, WAForth doesn't use direct threading
for executing code, where generated code is interleaved with data and the
program jumps between these pieces of code. WebAssembly doesn't allow
unstructured jumps, let alone dynamic jumps. Instead, WAForth uses
subroutine threading, where each word is
implemented as a single WebAssembly function, and the system uses calls and
indirect calls (see below) to execute words.
The Compiler
While in compile mode for a word, the compiler generates WebAssembly instructions in
binary format (since there is no assembler infrastructure in the browser). Because WebAssembly
doesn't support JIT compilation yet, a finished word is bundled into a separate binary WebAssembly module, and
sent to the loader, which dynamically loads it and registers it in a shared
function table at the
next offset, which in turn is recorded in the word dictionary.
Because words reside in different modules, all calls to and from words need to happen as
indirect call_indirect
calls through the shared function table. This of course introduces
some overhead.
Since WebAssembly doesn't support unstructured jumps, control flow words (IF/ELSE/THEN,
LOOP, REPEAT, …) can't be implemented in terms of more basic words, unlike in jonesforth.
However, since Forth only requires structured jumps, the compiler can easily be implemented
using the loop and branch instructions available in WebAssembly.
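For example (a plain Forth sketch, not actual WAForth compiler output), a definition like the one below only needs structured loops and conditionals, so each construct can map directly onto WebAssembly's block, loop, and branch instructions:
\ Plain Forth sketch: print n down to 1 using only structured control flow
: COUNTDOWN ( n -- )
  BEGIN
    DUP .       \ print the current value
    1-          \ decrement it
    DUP 0=      \ stop once we reach zero
  UNTIL
  DROP ;
5 COUNTDOWN 5 4 3 2 1 ok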
Finally, the compiler adds minimal debug information about the compiled word to
the name section, making it a bit easier to do some debugging in the browser.
The Loader
The loader is a small bit of JavaScript that uses the WebAssembly JavaScript API to dynamically load a compiled word (in the form of a WebAssembly module), ensuring that the shared function table is large enough for the module to
register itself.
The Shell
There's a small shell around the WebAssembly core to interface it with JavaScript.
The shell is a simple class
that loads the WebAssembly code in the browser, and
provides the loader and the I/O primitives to the WebAssembly module for reading and writing characters to a terminal. On the other end, it provides a run()
function to execute a fragment of Forth code.
To tie everything together into an interactive system, there's a small
console-based interface around this shell for typing Forth code, which you can see
in action here.
Benchmarks
⚠️ Note (2022-08-19): I wouldn't trust the results below any more:
- A lot has changed in browser WebAssembly implementations, both good (speedups) and bad (Spectre mitigations). I don't know in which direction this takes the recorded times.
- A lot has changed in WAForth itself in terms of design (including optimizations that yield 40% speedups on this toy benchmark, but also changes that slow things down).
- For the Gforth times, I didn't look in detail at any flags. For all I know, Gforth may generate even faster code by tweaking some flags.
- The benchmark is very much a toy benchmark. I don't know how representative it is.
Although I didn't focus on performance while writing WAForth, I still wanted to have an
estimate of how fast it runs. To get a crude idea, I ran an implementation of the
Sieve of Eratosthenes. I let the algorithm compute
all the primes up to 90 million on my MacBook Pro 2017 (3.5GHz Intel Core i7) running Firefox 60.0.1, and timed different systems:
- WAForth: The sieve algorithm, written in Forth, running in WAForth. The words are compiled as separate WebAssembly modules, and all calls to and from the words are indirect (as described above).
- WAForth Direct Calls: The sieve algorithm as compiled by WAForth, but inserted directly into the WAForth core, with all indirect call instructions from the previous version replaced by direct calls. This measurement gives an indication of the overhead of indirect jumps.
- Gforth: The sieve algorithm running in Gforth Fast 0.7.3, a native, high-performance Forth.
- JS-Forth: The sieve algorithm running in JS-Forth 0.5200804171342, a JavaScript implementation of Forth.
- Vanilla WebAssembly: A straight WebAssembly implementation of the algorithm. This serves as an upper bound on how fast the algorithm can run on WebAssembly.
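For reference, the Forth version of the sieve looks roughly like the classic byte sieve below (a simplified sketch for illustration, not the exact benchmark code or array size):
\ Simplified sketch of a classic Forth sieve (not the exact benchmark code):
\ index I in FLAGS stands for the odd number 2*I + 3
8190 CONSTANT SIZE
CREATE FLAGS SIZE ALLOT
: PRIMES ( -- n )
  FLAGS SIZE 1 FILL           \ mark every candidate as prime
  0                           \ prime counter
  SIZE 0 DO
    FLAGS I + C@ IF           \ still marked?
      I DUP + 3 +             \ the prime itself: 2*I + 3
      DUP I +                 \ index of its first odd multiple
      BEGIN DUP SIZE < WHILE
        0 OVER FLAGS + C!     \ cross out the multiple
        OVER +                \ advance to the next multiple
      REPEAT
      2DROP 1+                \ drop prime and index, count the prime
    THEN
  LOOP ;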
Some observations:
- Not surprisingly, JS-Forth comes out the slowest. It also runs into memory problems when trying to go above the 90 million limit. The other implementations have no problem there (requiring only one byte per candidate prime).
- The indirect calls in WAForth cause ±50% overhead. Some of this overhead can be reduced once
WebAssembly starts supporting mutable globals, as
this will require fewer indirect calls for operations like loops and jumps.
- WAForth is about 2× slower than the high-performance native Gforth.
- The vanilla WebAssembly version of the sieve algorithm is much faster than the rest. Contrary to the WAForth version, this version
doesn't need to go to memory for every piece of control flow, resulting in large speed gains. This
is especially noticeable when increasing the number of primes: for all primes below 500 million,
the vanilla WebAssembly version is up to 8 times faster than the WAForth one.
WAForth is still experimental, and lacks support for some of the ANS
core words (although adding support for these shouldn't be too much work). I also didn't spend any time profiling performance to see whether there are any
low-hanging-fruit optimizations to be done (although I suspect most optimizations would complicate
the compiler too much at this point).