Now Reading
a self-hosting compiler for x86-64

a self-hosting compiler for x86-64

2023-04-12 20:34:55

Module flow chart for Kalyn

Over the course of my Spring 2020 semester at Harvey Mudd
College
, I developed a self-hosting compiler
fully from scratch. This text walks by means of many attention-grabbing
components of the challenge. It’s laid out so you’ll be able to simply learn from
starting to finish, however for those who’re extra curious about a selected matter,
be at liberty to leap there. Or, check out the project on
GitHub
.

Desk of contents

What the challenge is and why it exists

Kalyn is a self-hosting compiler. Because of this the compiler is
itself written within the language that it is aware of compile, and so
the compiler can compile itself. Self-hosting compilers are widespread,
one cause being that programmers engaged on a compiler for language X
most likely take pleasure in writing code in language X and so are inclined to
implement the compiler in language X.

Kalyn compiles a programming language of my very own design, additionally referred to as
Kalyn. One impediment to creating a self-hosting compiler for a brand new
programming language is that with the intention to compile the compiler for the
first time, you must have already got a compiler: it’s a
chicken-and-egg drawback. The only strategy to clear up this drawback is to
first write a easy model of your compiler in a special language,
after which use that compiler to compile your actual compiler. So there are
two implementations of the Kalyn compiler: one in Haskell and one in
Kalyn itself. First I take advantage of the Haskell implementation to compile the
Kalyn implementation, after which after that I can use the Kalyn
implementation to compile itself.

I used to be impressed to create Kalyn by my Compilers class at Harvey Mudd
College
. On this class, college students develop a
working compiler for a easy
Swift-like programming language
over the course of the semester. Nevertheless, I used to be left wanting extra, for
a couple of causes:

  • Many of the compiler was designed and carried out already, with solely
    a couple of components left as homework. This was most likely a fantastic concept for
    maximizing the ratio of studying to work, however I’m the sort of individual
    who will get plenty of satisfaction from doing issues from scratch.
  • The language we compiled in school was probably not fully-featured
    sufficient to do any severe work. Moreover, the programming model of
    Swift and related languages does probably not “spark
    joy
    ” for
    me, even when it’s a good suggestion for efficient software program engineering. I
    desire working in additional expressive languages like
    Haskell and
    Lisp once I’m not on the
    clock. I didn’t really feel terribly motivated in making a compiler for
    a language that I might not truly need to use.
  • The compiler we labored on in school was not actually “full-stack”, because it
    have been, because it reused quite a lot of present software program elements. For
    instance, we used the GNU linker and assembler in order that we might
    generate x86-64 meeting code in textual content format somewhat than binary
    format, and we took benefit of the C commonplace library to keep away from
    having to implement reminiscence administration and enter/output primitives.
    Once more, this was most likely a good suggestion from an academic
    perspective, however I needed to tackle your complete vertical from supply
    code to meeting opcodes.

Kalyn addresses these issues within the following methods:

  • I created every thing from scratch, together with the linker, the
    assembler, and the usual library. Each single byte that finally ends up
    within the executable binary is immediately generated by my code.
  • I designed Kalyn to make it as usable as potential whereas being as
    simple to compile as potential. It has only a few core options (for
    instance, no lists, arrays, maps, or lessons), but is really a
    general-purpose programming language as a result of these options may be
    carried out in person code with no need particular compiler assist.
    By aiming for a self-hosting compiler, I pressured myself to prioritize
    language usability, as a result of I wanted to jot down a complete compiler in
    Kalyn.
  • I truthfully assume Kalyn is an efficient programming language and I take pleasure in
    writing code in it. It’s just like Haskell, however makes use of Lisp syntax,
    which is one thing that I’ve seen only
    rarely
    . However since I actually like
    Haskell besides for the syntax (which I contemplate an absolute
    abomination), Kalyn provides one thing on prime of languages that already
    exist, so it looks like I’m creating worth. (Sure, clearly Kalyn
    received’t be utilized in any actual initiatives, but it surely was necessary to me that
    my language couldn’t be described as “principally the identical as X, however
    it doesn’t work as properly”.)

Kalyn by the numbers

So does it truly work? Sure! Kalyn can compile itself. The
efficiency is gradual sufficient to be annoying, however not gradual sufficient to be a
drawback, compared with Haskell. Listed here are the stats:

  • Time for GHC to compile my Haskell implementation: 13 seconds
  • Time for my Haskell implementation to compile my Kalyn
    implementation: 2 seconds
  • Time for my Kalyn implementation to compile itself: 48 seconds

So we are able to see that Kalyn runs about 25 occasions slower than Haskell,
which I’m fairly glad with provided that Haskell has been optimized
by consultants for many years and for Kalyn I principally threw collectively the
easiest factor that might probably work.

Now right here’s a special numerical perspective, the dimensions of the challenge
as a perform of time. The ultimate whole is 4,300 traces of Haskell
code throughout 23 modules and 5,400 traces of Kalyn code throughout
43 modules. (Why extra Kalyn? The syntax is barely much less concise,
however largely it’s as a result of I needed to implement your complete Haskell commonplace
library – or a minimum of the half I used within the compiler.) Right here’s are
graphs exhibiting traces of code and variety of modules over time, from
which you’ll see I undoubtedly left every thing to the final minute…

Total lines of code as a function of
time

Number of modules as a function of time

For an additional perspective on the event course of, here’s a graph of
the cumulative whole traces of code added and eliminated (so the challenge
dimension at any given time is the vertical distance between the traces).

Code frequency graph

You’ll be able to have a look for your self on
GitHub
.

Now let’s get into the Kalyn programming language!

In regards to the language being compiled

Kalyn is a mixture of Haskell and
Lisp.
Right here is an instance of some Haskell code that prints out the prime
numbers as much as 100:

module Essential the place

-- | Examine if the quantity is prime.
isPrime :: Int -> Bool
isPrime num =
  let components = [2 .. num - 1]
  in  all (issue -> num `mod` issue /= 0) components

fundamental :: IO ()
fundamental =
  let nums   = [2 .. 100]
      primes = filter isPrime nums
  in  print primes

Right here is identical code in Clojure, a just lately
developed Lisp that runs on the JVM.

(ns hello-world.core)

(defn prime?
  "Examine if the quantity is prime."
  [n]
  (let [factors (range 2 n)]
    (each?
     (fn [factor]
       (not (zero? (mod n issue))))
     components)))

(defn -main
  []
  (let [nums (range 2 100)
        primes (filter prime? nums)]
    (println primes)))

And right here is the equal Kalyn code, which you’ll see combines the
concept of Haskell with the syntax of Lisp:

(import "Stdlib.kalyn")

(defn isPrime (Func Int Bool)
  "Examine if the quantity is prime."
  (num)
  (let ((components (iterate (+ 1) 2 (- num 2))))
    (all
      (lambda (issue)
        (/=Int 0 (% num issue)))
      components)))

(public def fundamental (IO Empty)
  (let ((nums (iterate (+ 1) 2 98))
        (primes (filter isPrime nums)))
    (print (append (showList showInt primes) "n"))))

The language is definitely fairly small, so we are able to undergo all of it
fairly shortly. Let’s have a look.

Knowledge sorts

Kalyn is a statically
typed

programming language, like Haskell. It has precisely 4 lessons of
knowledge sorts:

  • Signed 64-bit integer, denoted Int
  • Perform, denoted Func a b
  • Enter/output monad, denoted IO a
  • Consumer-defined algebraic knowledge sorts

Some extra clarification is clearly so as.

Integers

Why just one dimension of integer? This makes the code era simpler
as a result of each integer has the identical dimension. In reality, I designed Kalyn
utilizing what known as a boxed reminiscence illustration, in order that each
knowledge kind has the identical dimension. Extra on this later.

What about characters? These are literally simply saved as integers.
This wastes plenty of area, as a result of 56 bits out of 64 are left unused,
however once more it makes the implementation a lot easier if we don’t must
fear about differently-sized knowledge sorts.

Capabilities

Kalyn has first-class
functions
,
which means that code can dynamically create capabilities at runtime and move
them round similar to some other knowledge kind. That is required to assist
any affordable functional
programming
.
Kalyn’s capabilities have
closures,
which requires particular compiler assist. Extra on that later.

All capabilities in Kalyn are robotically
curried, like in Haskell.
Because of this all capabilities take solely a single argument;
multiple-argument capabilities are carried out as a single-argument
perform that returns one other single-argument perform that returns
one other perform, and so forth. I made this choice for 2 causes:
firstly, as a result of currying is superior, and secondly, as a result of it
simplifies the sort system and code era if capabilities all take
the identical variety of arguments.

As a result of capabilities are curried, the notation Func a b c is de facto
simply shorthand for Func a (Func b c), the place a, b, and c are
kind parameters which may stand for issues like Int and Record String and Func String Int.

One factor you could be questioning is how capabilities of no arguments are
dealt with. The reply is there isn’t a such factor. Since evaluating a
perform has no negative effects (see the subsequent part on monadic IO),
there’s no distinction between a perform of no arguments that returns
some expression and simply that expression itself.

Enter/output monad

Kalyn adopts Haskell’s abstraction of
monads with
youthful exuberance. Explaining monads is past the scope of this
article, however the level is that each enter/output perform within the
commonplace library (print, readFile, writeFile, and many others.) doesn’t
truly do IO. As an alternative, it returns an occasion of the IO monad which
represents the IO motion. These situations can then be chained
collectively utilizing useful programming methods, and the result’s
executed solely whether it is returned from the fundamental perform of the
program.

Every occasion of the IO monad has a return kind, as in Haskell, so the
kind is denoted IO Int or IO (Record String) or IO a typically.

You would possibly assume that utilizing monadic IO is in battle with the design
objective of creating Kalyn as simple as potential to compile. You’d be
right. But it surely’s so cool!

Consumer-defined algebraic knowledge sorts

You’ll have observed that almost all helpful knowledge sorts, akin to booleans and
lists, are absent from Kalyn. It’s because you’ll be able to simply outline
them your self. That is finished simply as it’s in Haskell, with algebraic
knowledge sorts. Right here is how the Kalyn commonplace library defines some useful
knowledge sorts which shall be acquainted to the Haskell programmer:

(public knowledge Bool
  False True)

(public knowledge (Perhaps a)
  Nothing (Only a))

(public knowledge (Both l r)
  (Left l) (Proper r))

(public knowledge (Pair a b)
  (Pair a b))

(public knowledge (Record a)
  Null (Cons a (Record a)))

(public alias Word8 Int)

(public knowledge Char (Char Word8))

(public alias String (Record Char))

So, for instance, a variable of kind Record Int could possibly be any of:

  • Null
  • Cons 5 Null
  • Cons 5 (Cons 2 Null)
  • Cons 5 (Cons 2 (Cons 9 Null))
  • and many others.

By together with assist for arbitrary algebraic knowledge sorts, the compiler
doesn’t want any particular assist for booleans, lists, arrays, maps,
pairs, optionals, or the rest that may complicate the
implementation.

Syntax

Kalyn consists of declarations and expressions, each of that are
just like Haskell besides in look.

Expressions

First we’ve got perform calls, that are lists. Perform currying is
dealt with robotically, in order that (map (+ 1) elts) means we name the
+ perform with the argument 1 after which move that to the map
perform, and take the perform returned from map and move it the
argument elts.

Subsequent, you’ll be able to outline nameless capabilities utilizing lambda, so a extra
specific type of the earlier code can be:

(map
  (lambda (x)
    (+ x 1))
  elts)

The kind checker features a constraint solver, so it could robotically
determine the kinds of nameless capabilities; there’s no have to
specify that manually (and, for simplicitly, you’ll be able to’t).

Lambdas can have a number of arguments, however that simply means they’re
robotically curried, in order that (lambda (x y) ...) is identical as
(lambda (x) (lambda (y) ...)).

You’ll be able to set up native bindings utilizing let:

(let ((nums (iterate (+ 1) 2 98))
      (primes (filter isPrime nums)))
  (print (showList showInt primes)))

Every binding is evaluated in sequence, and it could check with not solely
earlier bindings but additionally itself recursively. This lets you
outline recursive nameless capabilities:

(let ((explode
       (lambda (x)
         (explode (+ x 1)))))
  (explode 0))

Mutual recursion is
notably not supported in let bindings, as a result of internally a let
kind with a number of bindings is translated right into a collection of nested
single-binding let types, which makes the code era simpler.

The final particular kind is case, which (as in Haskell) lets you
return completely different values relying on an algebraic knowledge kind. Arbitrary
patterns of knowledge constructors and variables can be utilized on the
left-hand aspect of every department. For instance, right here is Kalyn’s
implementation of the basic unzip perform from Haskell:

(public defn unzip (Func (Record (Pair a b)) (Pair (Record a) (Record b)))
  (pairs)
  (case pairs
    (Null (Pair Null Null))
    ((Cons (Pair left proper) pairs)
     (let (((Pair lefts rights)
            (unzip pairs)))
       (Pair (Cons left lefts)
             (Cons proper rights))))))

You could discover that the let kind employs destructuring, which is
principally the identical because the pattern-matching utilized in case branches.
This may be finished in perform arguments as properly, and the @ syntax
from Haskell lets you identify a worth whereas concurrently
destructuring it:

(lambda ([email protected](Char i))
  (if (isAlphaNum c)
    [c]
    (append "_u" (showInt i))))

Macros

That’s it for the core expression sorts in Kalyn. There are a couple of extra
items of syntax, which the parser handles as macros. For instance, the
if assertion

(if b
  False
  True)

interprets into:

(case b
  (True False)
  (False True))

The record literal [1 2 3] interprets into:

(Cons 1 (Cons 2 (Cons 3 Null)))

The string "Whats up" interprets into:

(Cons
  (Char 72)
  (Cons
    (Char 101)
    (Cons
      (Char 108)
      (Cons
        (Char 108)
        (Cons
          (Char 111)
          Null)))))

The variadic and and or types translate right down to nested case
types. And at last, we’ve got the basic do notation from Haskell,
which interprets right into a sequence of >>= invocations. Now, as I’ll
talk about later, Kalyn doesn’t have typeclasses, which implies there are
separate >>=IO, >>=State, and many others. capabilities for every monad. As a
end result, you must specify which monad you’re working with on the
begin of the macro. It seems like this:

(do IO
  (with contents (readFile "in.txt"))
  (let reversed (reverse contents))
  (writeFile "out.txt" reversed)
  (setFileMode "out.txt" 0o600))

The with kind is equal to Haskell’s <- operator, whereas the
let kind is identical as in Haskell. Different types are assumed to be
monad situations whose return values are ignored (aside from the final
kind, which determines the return worth of your complete do macro). The
above code interprets like this:

(>>=IO
  (readFile "in.txt")
  (lambda (contents)
    (let ((reversed (reverse contents)))
      (>>=IO
        (writeFile "out.txt" reversed)
        (lambda (_)
          (setFileMode "out.txt" 0o600))))))

By implementing many acquainted language options as macros as a substitute of
true expressions, I used to be in a position to vastly simplify the implementation of
the compiler, since solely the parser must learn about these
options.

You would possibly surprise why let isn’t carried out as a macro as properly, since
in any case (let ((foo bar)) ...) is equal to ((lambda (foo) ...) bar). The reply is that this might introduce an enormous quantity of
overhead, as a result of a let may be simply translated into only a single
transfer instruction within the meeting, whereas a perform name (particularly
with correct dealing with of closures) is far more costly.

Declarations

First we’ve got def, which lets you outline the worth of a logo,
giving its kind and an non-compulsory
docstring, like:

(def pageSize Int
  "The web page dimension of the CPU."
  0x1000)

Subsequent up is defn, which is for outlining capabilities:

(defn fst (Func (Pair a b) a)
  ((Pair a _))
  a)

Really, although, defn is only a macro that expands to def and
lambda, like so:

(def fst (Func (Pair a b) a)
  (lambda ((Pair a _))
    a))

We’ve algebraic knowledge kind declarations, as we’ve seen earlier than:

(knowledge (Perform reg)
  (Perform Int Label (Record (Instruction reg))))

And we’ve got kind aliases. That is the kind key phrase from Haskell.
(The newtype key phrase is principally the identical as knowledge, and Kalyn
doesn’t care concerning the distinction, so it doesn’t have a separate
declaration kind for that.) So, for instance, String can be utilized as a
shorthand for Record Char:

(alias String (Record Char))

Kalyn’s commonplace library defines quite a lot of aliases, like these:

(alias Int8  Int)
(alias Int16 Int)
(alias Int32 Int)
(alias Int64 Int)

(alias Bytes String)
(alias FilePath String)

After all, there is just one dimension of integer, and there’s no
distinction between binary and textual content strings, however utilizing the sort
aliases is useful to make the sort signatures simpler to know.

Module system

The Kalyn compiler and commonplace library is break up into many alternative
information. One file is designated by the compiler as the principle module, and
it could import others, like:

(import "Stdlib.kalyn")

Now every declaration key phrase (def, defn, knowledge, alias) may be
optionally preceded by public to point that the declaration
must be made out there to different code that imports the module. As an
apart, this solves an enormous annoyance I’ve with Haskell, which is that
there’s no strategy to specify which capabilities in a module must be public
with out having to record all of them on the prime of the file.

Ideally, Kalyn would even have a strategy to disguise or choose particular
symbols on an import, however within the curiosity of simplicity we don’t have
that. Certified imports can be one other helpful characteristic, however of their
absence we get alongside high quality by simply prefixing names to keep away from conflicts,
like for instance mapInsert versus setInsert.

One key characteristic is that even the import key phrase may be preceded by
public to point that each one the imported symbols must be
re-exported. This enables for Stdlib.kalyn to public import many
submodules, in order that person code solely must import Stdlib.kalyn to
get your complete commonplace library.

The module system in Kalyn is de facto filth easy. There’s no idea
of a search path or challenge root. Kalyn modules are simply information
containing Kalyn supply code (even the file extension doesn’t matter),
and imports are merely resolved as filenames relative to the listing
containing the module with the imports. This simplified the
implementation; languages like Python impose stronger conventions on
module format however we don’t want that to get a compiler working.

Typeclasses

You’ll have observed the conspicuous absence of 1 key characteristic of
Haskell, specifically
typeclasses. That is
as a result of it seems that you just don’t want them to get a compiler up and
working, despite the fact that they’re actually very nice. In Haskell, you’ll be able to
outline a Present situations like this, for instance (in the event that they weren’t
already outlined in the usual library):

occasion Present Bool the place
  present False = "False"
  present True = "True"

occasion Present a => Present (Record a) the place
  present elts = "[" ++ intercalate "," (map show elts) ++ "]"

present [False, True]  -- "[False,True]"

In Kalyn, we are able to do the identical factor, we simply must outline a special
perform for every kind:

(alias (Present a) (Func a String))

(defn showBool (Present Bool)
  (bool)
  (case bool
    (False "False")
    (True "True")))

(defn showList (Func (Present a) (Present (Record a)))
  (present elts)
  (concat
    ["[" (intercalate ", " (map show elts)) "]"]))

showList showBool [False, True]  ; "[False, True]"

Not ideally suited, but it surely sort of seems just like the Haskell model for those who
squint, and in observe it’s not that large of a ache. What’s extra
annoying is that this method doesn’t work for higher-kinded
typeclasses

like Monad. (Attempt it and see!) So it’s not potential to outline a
perform after the model of showList that may act on an arbitrary
monad for those who handed it the related >>=No matter bind operator.
Fortunately, we solely use two monads (IO and State) within the compiler, so
that wasn’t too large of a deal.

On reflection, I’m fairly proud of the end result. Extending the sort
checker to assist typeclasses can be fairly advanced, so I believe the
restricted model that I carried out was a very good compromise to get a
self-hosted compiler initially off the bottom.

Laziness

The opposite main distinction from Haskell that’s value mentioning is
laziness. Haskell may be very lazy by default, so expressions are solely
evaluated once they should be. This usually wreaks havoc with
analysis order and makes it exhausting to know what’s working when,
though it does allow some neat methods like with the ability to manipulate
infinite lists. Kalyn takes a less complicated method and evaluates
every thing eagerly. There are two fundamental disadvantages to doing issues
this fashion:

  • You’ll be able to’t have infinite lists anymore, so idioms like take 100 (iterate (+ 1) 0) don’t work. I made the iterate perform within the
    commonplace library take an additional argument that controls the variety of
    iterations, so we are able to write (iterate (+ 1) 0 100) as a substitute and it
    works nice. Seems that laziness isn’t truly wanted all that
    usually, a minimum of in this sort of challenge.
  • Usually the way in which lazy analysis works is that every expression is
    become a thunk whose
    worth may be computed when wanted after which cached. By not
    implementing any of this, we lose the caching. Which means the values
    of top-level symbols are literally recomputed each time they’re
    wanted, which is unlucky in some instances the place a top-level image
    is assigned the results of a nontrivial calculation. However ultimately
    it’s not that unhealthy. This drawback could possibly be mounted at some extra
    complexity price, even when laziness weren’t added.

… And that’s it for Kalyn! You now know your complete language.

Preliminary technical design choices

Earlier than we get into the compiler stack, we have to speak about a couple of
design choices which have an enormous affect on how the low-level code
being generated seems.

In-memory knowledge representations

The primary alternative I wanted to make was symbolize every of the
knowledge sorts in reminiscence, because the meeting code I generate operates
immediately on bytes, not monads and algebraic knowledge sorts.

To simplify the implementation as a lot as potential, I chosen a
boxed reminiscence illustration. On this illustration, each knowledge kind
has precisely the identical dimension, specifically eight bytes (which we name a
phrase). So, if an information kind wants eight or fewer bytes, we are able to simply
retailer it immediately like that. If it wants extra, nevertheless, then as a substitute
we allocate reminiscence for it on the
heap

and retailer a pointer to that reminiscence. If an object has sub-objects in
its fields, we are able to retailer these sub-objects in the identical approach: both
immediately, if they’re sufficiently small, or by means of a pointer.

Now let’s speak concerning the particular person lessons of knowledge sorts. Integers
are simple: since they’re 64-bit, we are able to retailer them as-is in a single phrase.
The opposite sorts are extra attention-grabbing.

Capabilities

Perform objects should embody two issues: firstly, the handle of
their machine directions in reminiscence; secondly, the arguments of their
closure. For instance, suppose we run the next code:

(let ((x 5)
      (y 7))
  (lambda (z)
    (+ (* z x) y)))

Then the perform object returned must retailer two values in its
closure, x = 5 and y = 7. In Kalyn, perform objects include
three components:

  • First comes a phrase that accommodates the handle of their code. (For
    every lambda kind that seems within the supply code, we generate one
    perform within the meeting, so that every lambda has a spot the place its
    directions are saved.)
  • Subsequent comes a phrase that specifies what number of values are within the closure
    of the perform. In idea this could possibly be decided robotically by
    wanting on the perform handle, because the dimension of every lambda’s
    closure is thought at compile-time, however that may impose plenty of
    complexity at runtime.
  • Lastly, we’ve got one phrase for every of the closure values. This implies
    that perform objects have completely different sizes, however as a result of we put them
    behind a pointer, we are able to deal with them as if they’re all a single
    phrase.

Word that the order of closure arguments is necessary! As I clarify
later, the translator (code generator) arranges for the caller and the
callee to agree about what order the values ought to go in.

In abstract, the perform object from above would possibly appear like this on the
heap, and we’d move round a pointer to it:

  code addr   num params  worth of x  worth of y
  .           .           .           .
  .           .           .           .
  .           .           .           .
+-----------+-----------+-----------+-----------+
| 0x821ad   | 2         | 5         | 7         |
+-----------+-----------+-----------+-----------+

IO monad

I used to be a bit terrified of determining precisely implement monadic
IO, as a result of it appeared very summary. It seems, nevertheless, to be
shockingly easy. An occasion of the IO monad is solely a perform
object which, when referred to as, does the IO.

Let’s have a look at an instance. Suppose we need to translate this code:

(let ((fname "take a look at.txt")
      (msg "Whats up, world!n"))
  (writeFile fname contents))

We’d find yourself with a perform object that appears like this (the place
fname and contents are pointers into the heap):

  code addr   num args    fname ptr   msg ptr
  .           .           .           .
  .           .           .           .
  .           .           .           .
+-----------+-----------+-----------+-----------+
| 0xcf73a   | 2         | 0x2eb2820 | 0x49f7988 |
+-----------+-----------+-----------+-----------+

This seems similar to the perform objects based mostly on lambda
types, however conceptually it’s truly somewhat completely different. As an alternative of
closure values, we’ve got perform arguments. With the lambda instance
from earlier than, calling the perform object meant giving the code each
values from the closure along with the precise argument of the
lambda. With this instance, there’s no closure and no additional argument to
present: all the required info to do the IO is correct there in
the perform object. Regardless of these variations, although, the mechanics
are related sufficient that each sorts of perform objects may be handled
the identical by Kalyn internally.

Within the instance above, the code handle just isn’t the handle of
writeFile, as a result of writeFile is the perform that returned this
monad occasion (aka perform object). As an alternative, it’s the handle of a
helper perform writeFile__unmonadified which truly writes the
file. Every perform that returns a monad has an related helper
perform to do the work.

Now let’s contemplate how we implement the monadic binding operator
>>=IO. The >>=IO perform itself is only a wrapper that returns a
perform object pointing at >>=IO__unmonadified which does the
precise work. What’s that precise work? The helper will get two arguments
ma and famb. First, it runs ma to do its IO and procure the
return worth. Then it passes that return worth to famb to get
one other perform object which is the returned IO occasion. Lastly, it
should invoke that perform object to do the remainder of the IO (which
would possibly represent additional invocations of >>=IO) earlier than returning.

Lastly, since one thing should kick off the IO execution within the first
place, the boilerplate code generated for Kalyn’s fundamental perform
first evaluates its physique to get a monad occasion after which invokes that
perform object to do all of the IO. Then it exits to terminate the
course of.

Consumer-defined algebraic knowledge sorts

That is maybe greatest illustrated by instance. First contemplate booleans:

(knowledge Bool
  False True)

The worth False is represented as 0 and the worth True is
represented as 1. There’s no additional knowledge, so we don’t want a pointer.

Now let’s have a look at optionals:

(knowledge (Perhaps a)
  Nothing (Only a))

We are able to’t match this right into a single phrase with out getting inventive, and
inventive just isn’t appropriate with easy, so we use a pointer for this
one. The primary phrase on the heap is an integer that tells us which
constructor is getting used, similar to with booleans (0 for Nothing, 1
for Simply). For Nothing, that’s it. For Simply, nevertheless, the 1 is
adopted by one other phrase that accommodates the a within the Only a. This
could possibly be both a bit of literal knowledge or a pointer to extra
heap-allocated knowledge. This would possibly appear to be a waste of area within the case
of Nothing, however (with out being inventive) we have to have precisely one
place to look to search out out whether or not we’ve got a Nothing or a Simply, so
both each constructor has to slot in a phrase or we’ve got to place all of
them behind a pointer.

At this level you’ve seen virtually every thing. Typically, an algebraic
knowledge kind consists of two components:

  • A header phrase to inform you which constructor was used. That is
    omitted if there’s just one constructor, akin to in Char.
  • If the constructor has fields, then the values of the fields.

If the mixture of these two components matches inside one phrase for each
knowledge constructor, then the sort may be saved immediately with out a
pointer. In any other case, we use a pointer for each constructor. In case
you’re curious, we’d like a pointer when both:

  • any of the constructors has multiple subject
  • any of the constructors has a minimum of one subject, and there’s extra
    than one constructor

(What about (knowledge Empty), with no constructors in any respect? Eh… we simply
use a zero. We could possibly be good and elide empty fields from containing
knowledge constructors, however this might complicate the implementation.)

Calling conference

Okay, so now we all know how Kalyn’s knowledge sorts are represented. One
notable omission, nevertheless, is truly use perform objects.

One of many first choices I wanted to make after deciding on knowledge
sorts was to determine the Kalyn calling
convention
. This
describes the way in which wherein capabilities obtain their arguments from
callers, and the way they return outcomes.

In Kalyn, perform arguments are handed on the
stack
. Right here is the format
of a single stack
frame
:

| Earlier stack body |
+----------------------+
| Perform argument 1  |
| Perform argument 2  |
|         ...          |
+----------------------+
|    Return handle    |
+----------------------+
|  Saved base pointer  | <-- base pointer
+----------------------+
|   Native variable 1   |
|   Native variable 2   |
|         ...          |
+----------------------+
|   Saved register 1   |
|   Saved register 2   |
|         ...          | <-- stack pointer
+----------------------+
|   Subsequent stack body   |

As is commonplace in x86 meeting, two registers are used to handle the
stack: the stack pointer (%rsp) and the bottom pointer (%rbp). The
stack pointer at all times factors to the final merchandise that was pushed onto the
stack (which shall be on the backside of the stack, since in x86 the
stack grows downward). The bottom pointer, then again, factors to
a hard and fast level inside the stack body and doesn’t change as objects are
pushed and popped (a minimum of till a brand new stack body is entered). The
base pointer is used to simply find particular values inside the stack
body, since indexing from the stack pointer can be tough (because it
strikes round inside the body).

Right here is the movement of a perform name:

  • The caller pushes all the arguments for the perform onto the
    stack. For normal perform objects, this implies all of the closure
    values adopted by the principle parameter of the perform. For IO
    capabilities, this simply means the precise arguments of the perform.
  • The caller invokes the perform utilizing the callq x86 instruction.
    This robotically pushes a return
    address
    onto the
    stack and jumps into the perform.
  • The callee pushes the present base pointer (which pointed into the
    caller’s stack body) onto the stack, with the intention to save its worth,
    after which updates the bottom pointer to level on the present stack
    pointer. Now the bottom pointer can be utilized to index into the callee’s
    stack body.
  • If the callee can’t match all of its variables into registers (as I
    talk about later within the part on register allocation), it strikes the
    stack pointer additional downward to order stack area for the additional
    variables.
  • The callee pushes the values of any registers it makes use of onto the
    stack, with the intention to save their values.
  • The callee’s perform physique is executed. When it wants entry to the
    perform arguments or native variable area, it could find them utilizing
    the bottom pointer. If the callee must name extra capabilities (fairly
    probably), it pushes their arguments and this course of repeats
    recursively.
  • The callee pops the values of the saved registers off the stack,
    restoring their values for the caller.
  • The callee strikes the stack pointer upwards to deallocate the area
    it reserved for its native variables.
  • The callee pops the saved base pointer off the stack. The bottom
    pointer now factors again into the caller’s stack body.
  • The caller places its return worth into the %rax registers, then
    returns utilizing the retq x86 instruction. This pops the return
    handle off the stack and jumps again to the caller.
  • The caller strikes the stack pointer as much as deallocate the area it
    used to push the perform arguments.

Within the stack body diagram above, the bottom pointer and stack pointer
shall be on the locations labeled whereas the callee’s perform physique is
executing.

Discover that the callee’s base pointer is pointing on the saved base
pointer from the caller. That base pointer factors on the saved base
pointer from the caller’s caller, and so forth. Thus, by traversing the
chain of base pointers, we are able to assemble a name stack. All we have to
do is look proper above every base pointer to search out the return addresses,
and that can inform us which capabilities we’re in (and at which
instruction, which may be translated right into a line quantity). After all,
Kalyn doesn’t truly present backtraces at runtime, however the skill
to observe the bottom pointer chain was invaluable when debugging in
GDB.

Other than stack body format, there’s one different necessary
consideration when selecting a calling conference, which is to
designate machine
registers
as both
caller-saved or callee-saved. Since each perform should do its work
utilizing the identical set of registers, conflicts between completely different capabilities
should be prevented. That is sometimes finished by pushing the worth of a
register onto the stack, after which later popping it off to revive its
worth. Between the push and the pop, the register can safely be used
by one other perform. The query is whether or not the caller or callee is
accountable for saving the values of probably conflicting registers.

In the usual x86-64 calling conference, some registers are marked
as caller-saved and others are marked as callee-saved:

  • Caller-saved: %rax, %rcx, %rdx, %rsi, %rdi, %r8, %r9,
    %r10, %r11
  • Callee-saved: %rbx, %r12, %r13, %r14, %r15

This break up was chosen as a compromise, as a result of caller-saved registers
are higher to make use of in some instances whereas callee-saved registers are
higher for others. In Kalyn, nevertheless, all registers are callee-saved
aside from %rax (which is used to retailer return values). This
simplifies the implementation.

Why all callee-saved as a substitute of all caller-saved? I judged that it was
easier to rearrange for registers to be saved and restored on the
starting and finish of every perform somewhat than earlier than and after every
subroutine name. However the alternative is generally one among style.

Readers acquainted with x86-64 would possibly recall that in the usual calling
conference, arguments are usually not handed on the stack until there are
lots of them. The primary six arguments are handed in registers, specifically
%rdi, %rsi, %rdx, %rcx, %r8, and %r9. That is clearly extra
environment friendly than pushing each argument onto the stack, as a result of reminiscence
accesses are gradual. However, it’s extra sophisticated, so Kalyn does issues
the straightforward approach.

Readers acquainted with meeting programming may also object “doesn’t
selecting a nonstandard calling conference forestall Kalyn from
interoperating with different code?” Effectively… sure! However the objective for this
challenge was to jot down every thing from scratch, so in truth there isn’t a
different code to interoperate with. The one exception is system calls,
which happen solely inside primitive capabilities that I hand-wrote in
meeting. The remainder of Kalyn doesn’t have to learn about system calls,
so there’s no want for it to make use of their calling conference.

Compiler structure walkthrough

On this part I’ll stroll you thru your complete compiler pipeline
from prime to backside. Let’s observe the pattern program that I used to
illustrate Kalyn’s syntax:

(import "Stdlib.kalyn")

(defn isPrime (Func Int Bool)
  "Examine if the quantity is prime."
  (num)
  (let ((components (iterate (+ 1) 2 (- num 2))))
    (all
      (lambda (issue)
        (/=Int 0 (% num issue)))
      components)))

(public def fundamental (IO Empty)
  (let ((nums (iterate (+ 1) 2 98))
        (primes (filter isPrime nums)))
    (print (append (showList showInt primes) "n"))))

Step one of the compiler is the lexer. This takes this system
supply code and turns it right into a sequence of tokens, that are names,
numbers, and items of punctuation. It seems like this:

LPAREN
SYMBOL "import"
STRING "Stdlib.kalyn"
RPAREN
LPAREN
SYMBOL "defn"
SYMBOL "isPrime"
LPAREN
SYMBOL "Func"
SYMBOL "Int"
SYMBOL "Bool"
RPAREN
STRING "Examine if the quantity is prime."
LPAREN
SYMBOL "num"
RPAREN
LPAREN
SYMBOL "let"
LPAREN
LPAREN
SYMBOL "components"
LPAREN
SYMBOL "iterate"
LPAREN
SYMBOL "+"
...

Subsequent up is the reader. This converts the token stream right into a
hierarchical list-of-lists illustration. In different phrases, it parses
the Lisp syntax of Kalyn. Here’s what that appears like:

RoundList
    [ Symbol "import"
    , StrAtom "Stdlib.kalyn"
    ]
RoundList
    [ Symbol "defn"
    , Symbol "isPrime"
    , RoundList
        [ Symbol "Func"
        , Symbol "Int"
        , Symbol "Bool"
        ]
    , StrAtom "Examine if the quantity is prime."
    , RoundList [ Symbol "num" ]
    , RoundList
        [ Symbol "let"
        , RoundList
            [ RoundList
                [ Symbol "factors"
                , RoundList
                    [ Symbol "iterate"
                    , RoundList
                        [ Symbol "+"
                        , IntAtom 1
                        ]
                    ...

After the reader comes the parser, which converts the list-of-lists
illustration into an abstract syntax tree
(AST)
that may
be simply processed by the remainder of the compiler. The AST consists
of the declarations and expressions that I outlined earlier. Notably
it doesn’t embody any macros akin to if or do, because the parser
robotically interprets these into their lower-level counterparts.
Right here is a part of the AST for this system above:

Import False "Stdlib.kalyn"
Def False "isPrime"
    ( Kind [] "Func"
        [ Type [] "Int" []
        , Kind [] "Bool" []
        ]
    )
    ( Lambda "num"
        ( Let "components"
            ( Name
                ( Name
                    ( Name ( Variable "iterate" )
                        ( Name ( Variable "+" ) ( Const 1 ) )
                    ) ( Const 2 )
                )
                ( Name
                    ( Name ( Variable "-" ) ( Variable "num" ) ) ( Const 2 )
                )
            )
            ( Name
                ( Name ( Variable "all" )
                    ( Lambda "issue"
                        ( Name
                            ( Name ( Variable "/=Int" ) ( Const 0 ) )
                            ( Name
                                ...

The False that seems after Import and Def imply that public
was not used on these declarations. The empty lists after every Kind
are as a result of this code doesn’t use typeclass constraints. (I wrote the
parser earlier than deciding I might get away with out typeclass assist for
the primary model of Kalyn, so all the AST manipulation capabilities
take typeclasses into consideration.)

One attention-grabbing factor you would possibly observe is that the parser handles
perform currying, so each Name has precisely two arguments even
although capabilities have been referred to as with greater than two arguments within the enter
program.

Subsequent up is the bundler. The lexer, reader, and parser are literally all
run from the bundler, which is the actual entry level to the compiler.
The bundler is accountable for dealing with the module system of Kalyn.
After lexing, studying, and parsing the principle module, the bundler checks
for Import types. If it finds any, it lexes, reads, and parses the
information referenced, and continues recursively till it has processed all
of the wanted supply code.

At this level, the bundler resolves transitive imports. In different
phrases, it inspects the gathering of import and public import
types in all loaded modules and determines what modules one another
module can “see”. So, if A.kalyn has (import "B.kalyn") and
B.kalyn has (import "C.kalyn") and (public import "D.kalyn"),
then A.kalyn can see itself, B.kalyn, and D.kalyn, however not
C.kalyn.

After the bundler has completed working, it has produced a group
of modules (every with an inventory of declarations and details about
what different modules are seen). This assortment known as a bundle,
surprisingly sufficient. Earlier than the bundle may be remodeled into
meeting by the translator, it should be handed to 2 different aspect
modules: the resolver and the sort checker.

The job of the resolver is twofold. First it should determine on a novel
identify for each object that the meeting code might want to check with
(akin to variables, capabilities, and knowledge constructors). This course of,
referred to as name mangling,
entails substituting Unicode characters with ASCII equivalents and
additionally ensuring variables by the identical identify in numerous modules don’t
battle with one another. For instance, the foldr perform outlined in
Stdlib/Lists.kalyn could be given the identify
__src_u45kalynStdlibLists_u46kalyn__foldr.

After the resolver decides on names, it additionally should generate a mapping
for every module that interprets names from person code into the interior
names. So, in each module that imports Stdlib/Lists.kalyn there
shall be a mapping from foldr to
__src_u45kalynStdlibLists_u46kalyn__foldr. The mapping additionally contains
kind info and, for knowledge constructor, notes on which knowledge
constructor is in use, what number of fields it has, and many others. The mappings
generated by the resolver are used to lookup image definitions in
each the sort checker and translator.

At this level the bundle is run by means of the sort checker. It would
shock you to listen to that the sort checker doesn’t truly produce
info for some other components of the compiler. Its solely function is
to crash the compiler if there’s a kind error. You would possibly count on that
in a strongly typed programming language we would wish kind
info with the intention to compile. In reality, nevertheless, my use of a boxed
reminiscence illustration implies that code that operates on a worth doesn’t
truly have to know what kind that worth has. Because of this the
solely utility within the kind checker is making it in order that kind errors will
offer you a compile-time error as a substitute of a segmentation fault at
runtime. (Nonetheless fairly helpful although.) I took benefit of this
property by not bothering to port the sort checker to Kalyn. Since I
already know from the Haskell implementation that my Kalyn code
type-checks, and since compilation doesn’t require kind info,
the Kalyn implementation doesn’t want a sort checker to be
self-hosting. (Though clearly it will want one finally, in
order to be helpful.)

Now we arrive on the core of the compiler, the translator (additionally referred to as
the code generator). At this level we’ve got a bundle that accommodates AST
declarations and expressions, along with a resolver mapping that
tells us the which means of each identify that seems within the AST. The job of
the translator is to rework every declaration from the AST right into a
set of a number of capabilities in x86 meeting. Right here’s a part of the
translated code for isPrime from our instance:

__Main_u46kalyn__isPrime:
        pushq $16
        callq memoryAlloc
        addq $8, %rsp
        movq %rax, %t0
        leaq __Main_u46kalyn__isPrime__uncurried(%rip), %t1
        movq %t1, (%t0)
        movq $0, 8(%t0)
        movq %t0, %rax
        retq
__Main_u46kalyn__isPrime__uncurried:
        movq 16(%rbp), %t2
        movq $1, %t8
        pushq %t8
        callq plus__curried0
        addq $8, %rsp
        movq %rax, %t9
        movq %t9, %t5
        pushq %t5
        movq $2, %t6
        pushq %t6
        movq %t2, %t10
        pushq %t10
        movq $2, %t11
        pushq %t11
        callq minus__uncurried
        ...

(Why two completely different capabilities? The primary one returns the worth of
isPrime, which is a perform object, and the second implements
the lambda for that perform object.)

What results in the binary is, nevertheless, not solely this code for person
capabilities, but additionally code for the core primitives of the language.
These are issues like arithmetic and IO operations which might’t be
carried out immediately in Kalyn. We’ve to start out someplace! I wrote
these capabilities manually in meeting, and they’re added to the
program by the translator.

There are a couple of modules which are accountable for coping with
primitives:

  • Subroutines contains code that’s used to implement widespread logic,
    like getting arguments from the stack or performing a perform name.
  • Primitives has implementations of all the essential primitive
    capabilities that person code can name, like + and print and >>=IO.
  • MemoryManager has inner capabilities which are used to deal with
    reminiscence allocation. Bear in mind, “from scratch” means no
    malloc!
  • Bridge inspects the person code to see what primitives it calls, and
    hyperlinks in solely these primitives to keep away from bloating the binary. It additionally
    handles wrapping primitives in order that they’re appropriate to be referred to as
    from person code. This contains producing curried and monadic
    wrappers in order that I didn’t have to fret about any of that when
    implementing the precise primitives.

You would possibly discover within the meeting snippet above that we’re utilizing
digital registers %t0, %t1, and many others. as a substitute of simply the standard x86
machine registers %rax, %rdi, %rsi, and many others. It’s because code
era is way simpler once we can faux we’ve got infinitely many
registers. It’s the job of the register allocator to map these
digital registers onto precise machine registers, and to maneuver additional
info into native variables on the stack when there are usually not
sufficient machine registers to suit all the information.

Step one of register allocation is to carry out a liveness
evaluation
. We analyze every meeting instruction to find out which
registers it reads from and writes to. Based mostly on that info, we
can carry out an iterative evaluation to find out which registers are
stay (could be used sooner or later) at every level in this system. If
two digital registers are stay on the similar time, then they’ll’t be
assigned to the identical bodily register or they might battle. Right here is
a part of the liveness evaluation for isPrime:

__Main_u46kalyn__isPrime:

;; stay IN: (none)
;; used: (none)
        pushq $16
;; outlined: (none)
;; stay OUT: (none)

;; stay IN: (none)
;; used: (none)
        callq memoryAlloc
;; outlined: %rax
;; stay OUT: %rax

;; stay IN: %rax
;; used: (none)
        addq $8, %rsp
;; outlined: %rsp
;; stay OUT: %rax

;; stay IN: %rax
;; used: %rax
        movq %rax, %t0
;; outlined: %t0
;; stay OUT: %t0

;; stay IN: %t0
;; used: %rip
        leaq __Main_u46kalyn__isPrime__uncurried(%rip), %t1
;; outlined: %t1
;; stay OUT: %t0, %t1

;; stay IN: %t0, %t1
;; used: %t0, %t1
        movq %t1, (%t0)
;; outlined: (none)
;; stay OUT: %t0

...

Based mostly on this info, the register allocator rewrites the code to
use applicable bodily registers. You’ll be able to see that %t0 was positioned
in %rdx and %t1 was positioned in %rcx:

__Main_u46kalyn__isPrime:
        pushq $16
        callq memoryAlloc
        addq $8, %rsp
        movq %rax, %rdx
        leaq __Main_u46kalyn__isPrime__uncurried(%rip), %rcx
        movq %rcx, (%rdx)
        movq $0, 8(%rdx)
        movq %rdx, %rax
        retq
__Main_u46kalyn__isPrime__uncurried:
        movq 16(%rbp), %rsi
        movq $1, %rax
        pushq %rax
        callq plus__curried0
        addq $8, %rsp
        movq %rax, %rdx
        movq %rdx, %rcx
        pushq %rcx
        movq $2, %rcx
        pushq %rcx
        movq %rsi, %rcx
        pushq %rcx
        movq $2, %rcx
        pushq %rcx
        callq minus__uncurried

After code era, there’s one remaining transformation step on the
meeting, which is dealt with by the boilerplate module. This module
adapts every perform to respect the Kalyn calling conference by
updating the bottom pointer, saving and restoring the information registers it
overwrites, and, if the perform wanted native variables, shifting the
stack pointer to allocate and deallocate area for them. Right here is a component
of the ultimate model of isPrime:

__Main_u46kalyn__isPrime:
        pushq %rbp
        movq %rsp, %rbp
        pushq %rdx
        pushq %rcx
        pushq $16
        callq memoryAlloc
        addq $8, %rsp
        movq %rax, %rdx
        leaq __Main_u46kalyn__isPrime__uncurried(%rip), %rcx
        movq %rcx, (%rdx)
        movq $0, 8(%rdx)
        movq %rdx, %rax
        popq %rcx
        popq %rdx
        popq %rbp
        retq
__Main_u46kalyn__isPrime__uncurried:
        pushq %rbp
        movq %rsp, %rbp
        pushq %rsi
        pushq %rdx
        pushq %rcx
        pushq %rbx
        movq 16(%rbp), %rsi

At this level we’ve got your complete program in x86 meeting format. It’s
now time for the assembler to translate every meeting instruction into
the suitable sequence of bytes. Mechanically it is a
simple course of, though deciphering the reference
materials
is kind of the duty.
For instance, right here is the binary for every instruction in
__Main_u46kalyn__isPrime:

48 ff f5                pushq %rbp
48 8b ec                movq %rsp, %rbp
48 ff f2                pushq %rdx
48 ff f1                pushq %rcx
68 10 00 00 00          pushq $16
e8 f1 4e 01 00          callq memoryAlloc
48 81 c4 08 00 00 00    addq $8, %rsp
48 8b d0                movq %rax, %rdx
48 8d 0d 21 00 00 00    leaq __Main_u46kalyn__isPrime__uncurried(%rip), %rcx
48 89 8c 22 00 00 00    movq %rcx, (%rdx)
00
48 c7 84 22 08 00 00    movq $0, 8(%rdx)
00 00 00 00 00
48 8b c2                movq %rdx, %rax
48 8f c1                popq %rcx
48 8f c2                popq %rdx
48 8f c5                popq %rbp
c3                      retq

It’s at this level that each one the labels generated by the resolver are
put to make use of: every one is translated to a numerical offset in bytes that
may be embedded into the binary.

The ultimate step is the linker. This takes the binary code and knowledge that
was generated by the assembler and wraps it in a header within the
Executable and Linkable Format
(ELF)
.
The ensuing binary has metadata that’s utilized by the working system
to load it into reminiscence and that’s utilized by
GDB to show debugging
info:

ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Knowledge:                              2's complement, little endian
  Model:                           1 (present)
  OS/ABI:                            UNIX - System V
  ABI Model:                       0
  Kind:                              EXEC (Executable file)
  Machine:                           Superior Micro Gadgets X86-64
  Model:                           0x1
  Entry level handle:               0x18000
  Begin of program headers:          64 (bytes into file)
  Begin of part headers:          176 (bytes into file)
  Flags:                             0x0
  Dimension of this header:               64 (bytes)
  Dimension of program headers:           56 (bytes)
  Variety of program headers:         2
  Dimension of part headers:           64 (bytes)
  Variety of part headers:         6
  Part header string desk index: 1

Part Headers:
  [Nr] Title              Kind             Tackle           Offset
       Dimension              EntSize          Flags  Hyperlink  Information  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .shstrtab         STRTAB           0000000000000000  00000230
       0000000000000027  0000000000000000           0     0     0
  [ 2] .symtab           SYMTAB           0000000000000000  00000257
       0000000000003348  0000000000000018           3   547     0
  [ 3] .strtab           STRTAB           0000000000000000  0000359f
       00000000000045b0  0000000000000000           0     0     0
  [ 4] .textual content             PROGBITS         0000000000018000  00008000
       0000000000015245  0000000000000000  AX       0     0     0
  [ 5] .knowledge             PROGBITS         000000000002e000  0001e000
       00000000000010b7  0000000000000000  WA       0     0     0

Program Headers:
  Kind           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000008000 0x0000000000018000 0x0000000000000000
                 0x0000000000015245 0x0000000000015245  R E    0x0
  LOAD           0x000000000001e000 0x000000000002e000 0x0000000000000000
                 0x00000000000010b7 0x00000000000010b7  RW     0x0

Image desk '.symtab' accommodates 547 entries:
   Num:    Worth          Dimension Kind    Bind   Vis      Ndx Title
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000018326     0 FUNC    LOCAL  DEFAULT    4 __Booleans_u46kalyn__not
     2: 000000000001836e     0 FUNC    LOCAL  DEFAULT    4 __Booleans_u46kalyn__not_
     3: 00000000000183d5     0 FUNC    LOCAL  DEFAULT    4 __Booleans_u46kalyn__xor
     4: 000000000001841d     0 FUNC    LOCAL  DEFAULT    4 __Booleans_u46kalyn__xor_
     5: 000000000001847b     0 FUNC    LOCAL  DEFAULT    4 __Booleans_u46kalyn__xor_
     6: 000000000001f6c1     0 FUNC    LOCAL  DEFAULT    4 __DataTypes_u46kalyn__Cha
     ...

And now you know the way program supply code flows by means of your complete
Kalyn compiler stack to turn into an executable native binary.

How I carried out it

This part has a deep dive into every a part of the compiler
implementation, relating all the attention-grabbing technical choices
that I made.

Lexer, reader, and parser

Step one within the compiler is remodeling supply code into an
AST. I made a decision to separate this course of into three items, somewhat than
the same old two (lexing and parsing) or one (doing every thing within the
parser). The reason being that it’s fairly simple to cleanly separate every
of the three steps, and doing this makes the implementation simpler to
handle.

The reader, which handles the Lisp syntax of Kalyn, is carried out as
a recursive descent
parser
. This
is a fairly easy job as a result of there’s not an excessive amount of syntax and the
grammar is LL(1) for
sensible functions. The Lisp syntax is the one a part of Kalyn that
requires an actual recursive descent parser, and by separating it out
right into a separate reader module, I used to be in a position to make the parser itself
trivial: it merely must pattern-match on the lists that it
receives to determine which AST nodes they correspond to. Word that we
solely get the simple LL(1) grammar as a result of the lexer runs first and
converts runs of characters into single tokens. With out the lexer,
reader, and parser all being separate, the implementation can be
considerably extra advanced.

One factor to notice concerning the lexer is that it doesn’t use common
expressions, in contrast to most lexers, and no a part of the stack makes use of a lexer
or parser generator. The rationale for that is easy: if I had, then I
would have wanted to implement the dependency (common expressions,
lexer/parser generator) in Kalyn!

Customary library

By design, Kalyn omits most helpful options from the core language,
deferring them as a substitute to user-defined capabilities and algebraic knowledge
sorts. So I wanted to implement all the knowledge buildings that I
needed to make use of within the compiler. For probably the most half, this was simply lists,
booleans, maps, and units.

Lists and booleans have been pretty simple. The primary problem was merely
implementing the massive quantity of ordinary library capabilities that I
wanted with the intention to manipulate them correctly. There are a complete of 139
public capabilities within the Kalyn commonplace library, with virtually all the
names lifted immediately from Haskell. I wrote most of them myself
as a result of the Haskell commonplace library is fairly simple to implement for
probably the most half; for instance, here’s a typical perform from
Stdlib/Lists.kalyn:

(public defn drop (Func Int (Record a) (Record a))
  (n elts)
  (if (<=Int n 0)
    elts
    (case elts
      (Null Null)
      ((Cons fst rst)
       (drop (- n 1) rst)))))

I did definitely have some tough bugs brought on by misimplemented
commonplace library capabilities, although.

The primary problem – and in reality the very very first thing I carried out
in Kalyn, to verify every thing was working – was maps and units. I
elected to make use of splay
trees
, as a result of they’re one
of the best self-balancing bushes to implement. An information construction
that didn’t have $$ O(n log n) $$ operations wouldn’t be
acceptable, as a result of the Kalyn compiler makes heavy use of very giant
maps, and I anticipated (accurately) that Kalyn would run slowly sufficient
to make compiler efficiency a difficulty.

On reflection, splay bushes are usually not truly the appropriate alternative for any
commonplace library implementation in a useful language, as a result of the
amortized evaluation of splay bushes requires that lookups be capable of
mutate the tree. Sadly, this could’t be carried out in a
language that doesn’t assist mutation with out altering the interface
of map lookups, an unacceptable burden. Haskell uses size-balanced
binary
trees
.
Having observed this drawback solely late into the challenge, I elected to
hope that my bushes wouldn’t carry out too poorly if rebalancing on
lookup have been omitted. It appears to be adequate.

Self-balancing bushes are fairly tough to implement, particularly in a
useful language, so I stole a Haskell implementation from the
TreeStructures
bundle on Hackage. It did prove that this implementation had
a number of bugs, which have been a pleasure to find whereas monitoring down
seemingly unrelated points within the compiler, however I used to be in a position to repair them
and Kalyn’s maps appear fairly strong now.

What about units? They’re simply maps whose values are the Empty
algebraic knowledge kind that has one constructor and no fields. This
wastes area (every key-value mapping shops an additional zero), however that’s
hardly the worst reminiscence offense of Kalyn, so I judged it to be high quality.
The Stdlib/Collections/Units.kalyn module has adapter capabilities that
wrap the map module to take away references to map values.

There’s one different attention-grabbing a part of the usual library, that are
the typeclass situations. As I discussed earlier, Kalyn doesn’t assist
typeclasses for the time being, which was a bit tough to take care of since
the Haskell implementation makes heavy use of the typeclass capabilities
present, evaluate, (==), and >>=. My method was to make each
perform with a typeclass constraint as a substitute take a further
parameter which is a concrete occasion of the typeclass perform. So,
for instance, when developing a map of strings, you move within the
compareString perform. If you wish to convert an inventory to a string,
you name the showList perform and move it additionally the suitable
showInt or showString or no matter is suitable to your factor
kind. Discovering the index of an integer in an inventory requires passing
elem the ==Int perform. And so forth.

Once more as I discussed earlier, this method sadly doesn’t
work for >>=. Fortunately, we solely use two necessary monads: IO and
State (the latter being a easy encapsulation of stateful
computation supplied by the
mtl bundle). I
merely carried out the related monadic combinators for every occasion
that wanted them (mapMState, foldMState, replicateMState,
fmapIO, mapMIO, and many others.). Word that nothing about monads makes them
want particular compiler assist: solely the side-effecting nature of the
IO monad requires additional primitives. So State is carried out
fully in person code.

Bundler and resolver

There’s not a lot to say concerning the bundler. The primary choice I made
there was to make it accountable not just for studying all of the modules
but additionally for resolving their transitive imports. I did this primarily
as a result of resolving transitive imports requires a graph traversal
algorithm and I needed to isolate this from the already-complex logic
of the resolver.

Now, the resolver is without doubt one of the largest modules within the compiler, even
although it ostensibly doesn’t do something very sophisticated. There are
simply plenty of little issues to deal with. The very first thing to speak
about is the identify mangling scheme.

Step 1 is to uniquify module names. By default we simply prepend every
image’s identify with the identify of its module. This ensures that symbols
from completely different modules don’t battle. (If two imported modules A
and B outline a logo Sym by the identical identify, then we’ll get
A__Sym and B__Sym, and the resolver will report a battle as a result of
it’s not clear whether or not Sym within the present module ought to resolve to
A__Sym or B__Sym.)

Now, it’s potential that we’ve got each Stdlib/A.kalyn and
Consumer/A.kalyn, wherein case we strive StdlibA and UserA to see if
this disambiguates all of the modules. In any other case we hold wanting
backwards on the full paths till we’ve got a novel prefix for every
module.

Step 2 is to sanitize module and image names in order that they’re protected to
use in meeting. That is primarily to make it in order that the .S information
generated by Kalyn have legitimate syntax and may be compiled utilizing GCC if
for some cause we need to bypass Kalyn’s assembler and linker. We
simply exchange non-alphanumeric characters with underscore-based escape
sequences. For instance, the perform set supplied by Units.kalyn
would possibly encode to set_u92_u92 with a module prefix of Sets_u46kalyn.

Step 3 is to mix the components. By eliminating underscores in step 2,
we make it potential to make use of them to unambiguously namespace our
symbols. In Kalyn, all user-defined symbols begin with __, and the
module and image names are separated by one other __. This namespaces
the person symbols whereas reserving image names not beginning with __
for our use (e.g. primitives like print). Combining every thing
above, the precise full identify of set from
Stdlib/Collections/Units.kalyn is
__StdlibCollectionsSets_u46kalyn__set_u92_u92. Lovely!

The remainder of the resolver is lengthy, however not terribly attention-grabbing. We
simply traverse the bundle and iterate by means of transitive imports to
discover out which absolutely resolved image every identify ought to map to. Within the
course of, we acquire details about image sorts, knowledge constructor
fields, and sort aliases from the top-level AST nodes. Right here is an
excerpt from the returned mapping, which as you’ll be able to see has a bit too
a lot info to learn comfortably:

module "/residence/raxod502/information/faculty/hmc/senior/spring/compilers/kalyn/src-kalyn/Essential.kalyn"
  ...
  >>=State -> common image __States_u46kalyn___u62_u62_u61State with kind (Func (__DataTypes_u46kalyn__State s a) (Func (Func a (__DataTypes_u46kalyn__State s b)) (__DataTypes_u46kalyn__State s b))) (and a couple of sublambdas)
  >Int -> common image greaterThan with kind (Func Int (Func Int __DataTypes_u46kalyn__Bool)) (and a couple of sublambdas)
  Char -> knowledge constructor __DataTypes_u46kalyn__Char with index 0 out of 1 and 1 subject (unboxed, no header phrase, subject kind __DataTypes_u46kalyn__Word8 for kind spec __DataTypes_u46kalyn__Char)
  ...
    __Sets_u46kalyn__Set okay -> (__Maps_u46kalyn__Map okay __DataTypes_u46kalyn__Empty)
    ...

(The precise mapping is round 3,900 traces of this.)

Kind checker

The kind checker is maybe probably the most attention-grabbing a part of the compiler,
a minimum of to me. It makes use of a constraint fixing algorithm just like that
utilized in Haskell. For example the way it works, let’s contemplate an
instance, the usual library perform curry, however the kind signature
of uncurry:

(defn curry (Func (Func a b c)
                  (Func (Pair a b) c))
  (f a b)
  (f (Pair a b)))

This desugars to the next declaration:

(def curry (Func (Func a b c)
                 (Func (Pair a b) c))
  (lambda (f)
    (lambda (a)
      (lambda (b)
        (f ((Pair a) b))))))

Step 1 is to assign numerical identifiers to each expression and sort
parameter within the declaration. That appears like this, utilizing actual numbers
from the sort checker:

;               0           1 2 3
;               :           : : :
     (def curry (Func (Func a b c)

;                                 1 2  3
;                                 : :  :
                      (Func (Pair a b) c))

;      0        4
;      :        :
       (lambda (f)

;        5        6
;        :        :
         (lambda (a)

;          7        8
;          :        :
           (lambda (b)

;            9  10 11 12 14   15 13
;            :  :  :  :  :    :  :
             (  f  (  (  Pair a) b))))))

;    14            16      17      16 17
;    :             :       :       :  :
     Pair :: (Func a (Func b (Pair a  b)))

On this numbering, we’ve got:

  • Native variables (4, 6, 8)
  • Intermediate expressions (5, 7, 9, 10, 11, 12, 13, 15)
  • World symbols (0, 14)
  • Kind parameters in international symbols (1, 2, 3, 16, 17)

Step 2 is to generate an inventory of constraints based mostly on how these
numerical identifiers seem in expressions relative to at least one one other.
Right here is the precise record of constraints generated by the sort checker:

  • 0 == Func (Func 1 (Func 2 3)) (Func (Pair 1 2) 3) (from kind of
    top-level image curry)
  • 0 == Func 4 5 (from argument and return kind of lambda (f))
  • 5 == Func 6 7 (from argument and return kind lambda (a))
  • 7 == Func 8 9 (from argument and return kind lambda (b))
  • 10 == Func 11 9 (as a result of f is utilized to ((Pair a) b))
  • 10 == 4 (as a result of f is certain by an enclosing lambda)
  • 12 == Func 13 11 (as a result of Pair a is utilized to b)
  • 14 == Func 15 12 (as a result of Pair is utilized to a)
  • 14 == Func 16 (Func 17 (Pair 16 17)) (from kind of top-level knowledge
    constructor Pair)
  • 15 == 6 (as a result of a is certain by an enclosing lambda)
  • 13 == 8 (as a result of b is certain by an enclosing lambda)

Step 3 is to unify these constraints, one after the other, to see if there
are any inconsistencies between them. We begin with an empty mapping,
after which fill it up by processing the constraints.

  • 0 == Func (Func 1 (Func 2 3)) (Func (Pair 1 2) 3): Set 0 to
    Func (Func 1 (Func 2 3)) (Func (Pair 1 2) 3) in our mapping.
  • 0 == Func 4 5: We need to set 0 to Func 4 5, however 0 already
    has a worth Func (Func 1 (Func 2 3)) (Func (Pair 1 2) 3). We should
    unify the 2 buildings. Fortuitously, each begin with Func.
    In any other case, we might report a sort error. To unify, we set 4 to
    Func 1 (Func 2 3) and set 5 to Func (Pair 1 2) 3.
  • 5 == Func 6 7: Set 5 to Func 6 7.
  • 7 == Func 8 9: Set 7 to Func 8 9.
  • 10 == Func 11 9: Set 10 to Func 11 9.
  • 10 == 4: We need to set 10 to 4, however 10 already has a worth
    Func 11 9. Thus we attempt to set 4 to Func 11 9 as a substitute. Since
    4 already has a worth Func 1 (Func 2 3), we should once more unify. We
    set 1 to 11 and set 9 to Func 2 3.
  • 12 == Func 13 11: Set 12 to Func 13 11.
  • 14 == Func 15 12: Set 14 to Func 15 12.
  • 14 == Func 16 (Func 17 (Pair 16 17)): We need to set 14 to Func 16 (Func 17 (Pair 16 17)), however 14 already has a worth Func 15 12. We should unify. First we set 15 to 16. Then we need to set
    12 to Func 17 (Pair 16 17), however 12 already has a worth Func 13 11. We are able to unify these by setting 13 to 17 and 11 to Pair 16 17.
  • 15 == 6: We need to set 15 to 6, however 15 already has a worth
    16, so we as a substitute set 16 to 6.
  • 13 == 8: We need to set 13 to 8, however 13 already has a worth
    17, so we as a substitute set 17 to 8.

Right here is the ensuing mapping:

 0 -> Func (Func 1 (Func 2 3)) (Func (Pair 1 2) 3)`
 1 -> 11
 4 -> 5
 5 -> Func 6 7
 7 -> Func 8 9
 9 -> Func 2 3
10 -> Func 11 9
11 -> Pair 16 17
12 -> Func 13 11
13 -> 17
14 -> Func 15 12
15 -> 16
16 -> 6
17 -> 8

Why didn’t we get a sort error? Let’s take a better have a look at our
mapping. It says that with the intention to make every thing unify, 1 should be
11, and 11 should be Pair 16 17. However wait, 1 was the parameter
a within the kind declaration for curry. The perform as we’ve written
it solely type-checks if a is a Pair, which isn’t included within the
kind signature. So we’ve got to verify to ensure that any free kind
parameters are usually not set in our mapping to particular sorts, and sign a
kind error if they’re.

Sadly, even after accounting for this, there’s an much more
delicate bug that may happen. Contemplate this code:

(def bug Int
  (let ((recur
         (lambda ((Cons elt elts))
           (recur elt))))
    (size (recur Null))))

It clearly shouldn’t type-check as a result of the recur perform takes a
record of parts but passes itself a single factor. Nevertheless, for those who
run the unification algorithm described above, you’ll discover a distinct
lack of any unification or free kind parameter errors. Let’s have a look at
the ensuing mapping, courtesy of Kalyn’s kind checker:

 0 -> 2
 1 -> Func 15 12
 2 -> Int
 3 -> Record 8
 4 -> Record 13
 5 -> Record 8
 6 -> Record 16
 7 -> Record 8
 8 -> Record 16
 9 -> 1
10 -> 6
11 -> Func (Record 13) Int
12 -> Record 13
14 -> 1
15 -> Record 16
16 -> Record 16

Hmmm… what’s occurring with 16? That seems to the kind of the
argument to recur! We’ve 16 == Record 16 == Record (Record 16) == Record (Record (Record 16)) and so forth. If you concentrate on it, this sort of makes
sense. The argument is 16. From the destructuring, we all know 16 is a
record of parts. However a type of parts is handed because the argument
to recur, so it should additionally 16. The algorithm concludes fortunately that
16 is record of itself. To keep away from this drawback, we’ve got to manually
verify after unification that no kind references itself as a subject of a
knowledge constructor, both immediately or not directly.

Haskell programmers will acknowledge unification errors from GHC’s
Anticipated kind / Precise kind messages, free kind parameter errors
from its Could not match anticipated kind ... a1 is a inflexible kind variable messages, and naturally can't assemble the infinite kind: a = [a]. For sure, Haskell’s kind errors are extraordinarily
tough to interpret, and regularly the one treatment is to stare at
the offending expression till it turns into clear what’s flawed. The
similar is true of Kalyn. Producing significant kind errors for a language
with implicit currying is a tough drawback as a result of any given kind
error could possibly be solved by any variety of completely different adjustments to the code.

Translator (code generator)

The translator is by far the biggest part of the compiler. Many
compilers have quite a lot of intermediate languages between the AST and
uncooked meeting, however Kalyn does translation in a single step. That is
largely as a result of Kalyn is such a easy language that there are actually
only some kinds of constructs to translate, and it’s tough to
provide you with an intermediate language that may helpfully symbolize
the necessary components of those constructs.

The primary problem of the translator is coping with the truth that
Kalyn makes use of a radically completely different programming model than meeting,
in contrast to (for instance) C, C++, Java, or Swift, which might all be
translated pretty immediately. Alternatively, one good factor about
Kalyn is that there are solely about three constructs to determine how
to translate (perform calls, lambdas, and sample matching), and
each different language characteristic doesn’t want any particular assist from the
compiler. For instance, in Java one would wish to translate objects,
lessons, arrays, strings, and many others., however in Kalyn all of these items (or
their equivalents) are merely a part of person code.

Perform calls

Recall from earlier the in-memory illustration of perform objects:

(let ((x 5)
      (y 7))
  (lambda (z)
    (+ (* z x) y)))

  code addr   num params  worth of x  worth of y
  .           .           .           .
  .           .           .           .
  .           .           .           .
+-----------+-----------+-----------+-----------+
| 0x821ad   | 2         | 5         | 7         |
+-----------+-----------+-----------+-----------+

Calling a perform is pretty simple. Contemplate the next
perform whose total physique is only a single perform name:

(defn name (Func (Func Int a) a)
  (func)
  (func 42))

Kalyn interprets it like this:

__Main_u46kalyn__call__uncurried:
        movq 16(%rbp), %t2
        movq %t2, %t4
        movq $42, %t5
        movq 8(%t4), %t7
        leaq 16(%t4), %t6
l9:
        cmpq $0, %t7
        jle l10
        pushq (%t6)
        addq $8, %t6
        dec %t7
        jmp l9
l10:
        pushq %t5
        callq *(%t4)
        movq 8(%t4), %t8
        leaq 8(%rsp, %t8, 8), %rsp
        movq %rax, %t3
        movq %t3, %rax
        retq

First we fetch the perform object from the stack into %t2. Then we
extract the variety of closure values from %t7, and enter a loop to
push all of them onto the stack so as, utilizing %t6 as a pointer
into the perform object. Lastly we push the formal argument to the
perform, which is the worth 42 in register %t5, and use callq
to carry out an oblique name. After it finishes, we restore the stack.

Invoking an occasion of the IO monad may be very related! The one
distinction is that after pushing the values that have been bundled within the
perform object, we name instantly, as a substitute of pushing an additional
argument.

Lambdas

Okay, so now that we all know name perform objects, how will we
assemble them? The primary tough factor right here is coping with closures.
When translating an expression, we’ve got entry to a map (initially
derived from the resolver, then augmented with native bindings) which
tells us whether or not any given identify refers to a worldwide image or to a
native variable (i.e., a digital register like %t42).

Let’s suppose we need to translate the lambda expression from above:

(let ((x 5)
      (y 7))
  (lambda (z)
    (+ (* z x) y)))

I believe that is simpler to clarify with out wanting on the precise
meeting generated, which is a little bit of a large number. First we need to
translate the let. We reserve temporaries (say %t0, %t1) for x
and y, and produce the next code:

movq $5, %t0
movq $7, %t1

Now we have to create a perform object. We begin by inspecting the
lambda kind recursively to search out out what free variables it refers
to. Free variables are variables that aren’t certain by an enclosing
let or lambda. The let expression as a complete has no free
variables, but when we solely have a look at the lambda, we see that the free
variables are x and y. Now we all know what to place within the closure of
the perform. We generate one thing like the next pseudocode:

obj    := malloc(32)
obj[0] := handle of lambda physique's code
obj[1] := 2
obj[2] := %t0
obj[3] := %t1

That’s it for the perform object, however now we have to take care of the
physique of the lambda kind. This doesn’t go into the identical perform as
the code above, since it’d get executed later in a very
completely different context (possibly it received returned from one perform after which
handed into map in one other). Let’s say the lambda kind appeared
contained in the perform __Main_u46kalyn__closure. Then we might come up
with a contemporary identify for the physique code, for instance
__Main_u46kalyn__closure__lambda15__x_y_z (the place the closure and
perform argument get caught within the label only for the sake of us
people making an attempt to learn the meeting).

Now, when the lambda perform is invoked, its argument and closure are
all on the stack, however how does it know what order they’re in? That is
taken care of by the translator. After we discover that the lambda has
x and y in its closure, we robotically provide you with two new
temporaries, say %t2 and %t3, to retailer their values inside the
lambda
. (Alternatively, %t0 and %t1 saved the values of x
and y outdoors the lambda.) We additionally provide you with a brief %t4
for the perform argument z. Then we stick this code on the entrance of
the lambda’s physique:

%t2 := first argument from stack
%t3 := second argument from stack
%t4 := third argument from stack

Lastly, once we recursively translate the physique of the lambda, we
replace its map to inform it that x is in %t2, y is in %t3, and
z is in %t4. This cooperation between caller and callee is
essential to verify all of the arguments and closure values get the place
they should go.

Knowledge constructors and sample matching

The primary two challenges of the translator have been the paired operations
of perform creation and performance calls. Subsequent up was one other key pair
of operations: development and matching of algebraic knowledge sorts.

Knowledge constructors are pretty simple. For instance, the information
constructor Pair outlined by the code

(knowledge (Pair a b)
  (Pair a b))

is basically the identical as

(def Pair (Func a b (Pair a b))
  (lambda (a)
    (lambda (b)
      (MakePair a b))))

the place MakePair is an uncurried perform that takes two arguments,
allocates area for a Pair on the heap, and places the arguments in
its fields. Utilizing this transformation, we are able to translate knowledge
constructors utilizing related code to what we used for lambdas above. In
reality, we’ve got to do an identical factor to deal with primitive capabilities (as
I talk about under), so there’s a subroutine within the Kalyn compiler
particularly for taking an uncurried perform like MakePair and
producing the collection of wrappers that permit it to be referred to as in
curried style.

Sample matching requires extra code because of the have to deal with nested
patterns, though it’s pretty simple in Kalyn. There are
many cool optimizations that may be finished on case patterns to determine
what order to carry out checks in and keep away from repeating work. Of
course, we do none of those optimizations, so we translate case
statements merely as a sequence of simple checks. First,
recall the definition of lists in Kalyn and their in-memory
illustration:

(knowledge (Record a)
  Null (Cons a (Record a)))

        ctor idx
        .
        .
        .
      +-----------+
Null  | 0         |
      +-----------+

        ctor idx    head        tail
        .           .           .
        .           .           .
        .           .           .
      +-----------+-----------+-----------+
Cons  | 1         | ****      | ****      |
      +-----------+-----------+-----------+

Contemplate this expression:

(case record
  (Null
   first)
  ((Cons x Null)
   (second x))
  ((Cons 42 [email protected](Cons x _))
   (third x xs)))

We are able to translate it like this:

case0:
    if record[0] != 0 then     (verify if Null)
      goto case1
    end result := first
    goto finished
case1:
    if record[0] != 1 then     (verify if Cons)
      goto case2
    if record[2][0] != 0 then  (verify if tail is Null)
      goto case2
    x := record[1]
    end result := (second x)
    goto finished
case2:
    if record[0] != 1 then     (verify if Cons)
      goto case3
    if record[1] != 42 then    (verify if head is 42)
      goto case3
    if record[2][0] != 1 then  (verify if tail is Cons)
    xs := record[2]
    x := record[2][1]
    end result := (third x xs)
    goto finished
case3:
    error "sample match failed"
finished:
    return end result

The small optimizations Kalyn does do relate to the reminiscence
illustration of algebraic knowledge sorts. Recall that the header phrase
indicating constructor index is barely included if there’s truly
multiple constructor. So, when pattern-matching an ADT like
Pair, we don’t have to verify the constructor index. No additional price!
Likewise, because the Bool ADT has no fields, there’s no have to put
it behind a pointer, so if “statements” in Kalyn simply contain
integer comparisons.

Optimizing perform calls

One factor you might have observed is that perform calls in Kalyn take
time $$ O(n^2) $$ within the variety of arguments. First you must name
the bottom perform with one argument, then you definitely name the returned
perform with a closure worth and the subsequent argument, then you definitely name the
new returned perform with two closure values and the subsequent argument,
and so forth. Moreover, every name requires a loop since you don’t
essentially know what number of closure arguments there have been already. This
is clearly a bit distressing for a language whose applications are
composed virtually fully out of an enormous variety of perform calls. Upon
discovering that Kalyn was not quick sufficient to compile itself, I
carried out what I assumed can be the highest-value easy
optimization, which is $$ O(n) $$ perform calls.

The concept is fairly easy. We are able to’t optimize all perform calls,
as a result of (for instance) when map will get handed a perform, it doesn’t
know the dimensions of its closure, so it has to do the total oblique name.
However once we’re calling a perform that’s globally outlined, why not
simply push all of the arguments instantly and leap into the internal lambda?
To make this occur, I did a couple of issues.

Firstly, I made the resolver examine the AST declarations and see how
many top-level lambdas have been in every image definition (that is
equal to the variety of perform arguments, since perform
declarations broaden to nested lambdas). Subsequent, I modified the
translator in order that it will detect when it was translating a top-level
lambda and provides it a predictable identify. For instance, contemplate the
following commonplace library perform:

(defn foldr (Func (Func a b b) b (Record a) b)
  (func init elts)
  (case elts
    (Null init)
    ((Cons fst rst)
     (func fst (foldr func init rst)))))

Earlier than the change, we might get these capabilities:

  • __src_u45kalynStdlibLists_u46kalyn__foldr
  • __src_u45kalynStdlibLists_u46kalyn__foldr__lambda30479__func
  • __src_u45kalynStdlibLists_u46kalyn__foldr__lambda30483__func_init
  • __src_u45kalynStdlibLists_u46kalyn__foldr__lambda30488__func_init_elts

Other than being an actual mouthful, these capabilities don’t have
predictable names. After the change, we as a substitute get these capabilities:

  • __src_u45kalynStdlibLists_u46kalyn__foldr
  • __src_u45kalynStdlibLists_u46kalyn__foldr__curried0
  • __src_u45kalynStdlibLists_u46kalyn__foldr__curried1
  • __src_u45kalynStdlibLists_u46kalyn__foldr__uncurried

After all, non-top-level lambdas are nonetheless translated like earlier than.

The following step was to replace the translator in order that it will make
direct calls when potential. Basically, when translating a name in
the AST, we examine the left-hand aspect to see if it’s a globally certain
image with top-level lambdas. In that case, we unwind the AST to see how
many arguments the perform is being handed, and leap immediately into
the suitable internal lambda. This modification produced an enormous enchancment
in runtime, though probably not for the explanation you’ll guess. (See
the part on register allocation.)

One significantly tough facet of this optimization is that extra
bookkeeping is required when translating lambdas. Recall from the
part on lambda translation that the caller and callee should
cooperate about which order the closure arguments go in. That is high quality
when a lambda perform is barely used from its immediately containing
expression. Nevertheless, now that top-level lambdas may be referred to as immediately
from different capabilities, these different capabilities should additionally cooperate with
the lambda about argument order: the order of the closure values
immediately has turn into a part of the general public API of the lambda. The answer
is so as to add bookkeeping to the translator to maintain monitor of the argument
order in top-level lambdas and drive it to evolve to what seems in
the code.

Primitive capabilities and bridge

The translator handles person code, however that’s not the one factor wanted
for a functioning binary. Some operations should be carried out
immediately in meeting, akin to arithmetic and IO.

The Primitives module has a group of such hand-rolled meeting
capabilities. The arithmetic operators are fairly easy: they simply learn
arguments from the stack and run them by means of an addq or idivq
instruction earlier than returning. The IO capabilities are extra advanced.
Primarily, every one wraps a sequence of system calls and handles the
related reminiscence allocation and error checking. For instance,
writeFile wraps the unlink, open, write, and shut system
calls, whereas setFileMode makes use of (appropriately sufficient) the chmod
system name.

The Subroutines module, together with numerous utility capabilities for
issues like getting arguments from the stack and performing perform
calls, contains two core knowledge transformation capabilities, packString
and unpackString. Kalyn’s strings are (very bloated) linked lists of
characters, whereas system calls like learn and write function on C
strings or uncooked character buffers. Utilizing hand-written copying loops and
some calls to the reminiscence allocator, packString and unpackString
implement a two-way map between OS strings and Kalyn
linked-list-strings.

Additionally within the Subroutines module is a pair of perform mills,
curryify and monadify. These mills take an uncurried or
side-effecting perform and create extra wrapper capabilities that
assist calling it in a curried or monadic approach. This enables primitives
to be referred to as from person code in the identical approach that some other perform
can be, and it additionally helps the era of curried knowledge
constructors within the translator.

Lastly, the Bridge module defines how all of those person and
primitive capabilities work collectively to kind an entire program. It
presents all the out there primitives, together with a mapping from
their user-code-facing names to their inner meeting names, their
sorts (declared by fiat), and the variety of arguments they take (for
direct perform name optimization). When all of the person capabilities in a
program have been translated, they’re scanned to search out calls to
primitives, and solely these primitives are included within the remaining
binary.

Reminiscence administration

Usually, reminiscence allocation utilizing malloc is taken into account fairly
low-level sufficient. We have to go even lower-level to implement our personal
malloc. On Linux, course of reminiscence allocation is dealt with by the use of
the mmap and brk system calls. Most fashionable applications use mmap and
brk is because of this considerably discouraged, however brk is less complicated so
that’s what Kalyn makes use of.

To know brk, we have to know somewhat concerning the Linux course of
execution mannequin. In precept, when a course of is executed, reminiscence
seems one thing like this (though the image is massively
simplified and considerably flawed):

See Also

+----------------------+
|                      |
|        Stack         |
|                      | <-- stack pointers (%rsp, %rbp)
+----------------------+


          ...


+----------------------+ <-- program break
|                      |
|     Knowledge part     |
|                      |
+----------------------+
|                      |
|     Code part     |
|                      | <-- instruction pointer (%rip)
+----------------------+

At one finish of the handle area are the directions and knowledge from the
binary, and on the different finish is the stack. In between is a big
area of unmapped reminiscence. The working system units all of this up
when executing a program.

On the finish of the information part is a marker often known as the program
break
. This marks the top of the area of the handle area that the
program can use. Utilizing the brk system name, a program can regulate the
place of this system break. By rising this system break, the
program can acquire extra reminiscence to make use of for its heap, after which
parcel out that reminiscence as it’s wanted.

Kalyn’s reminiscence allocator is kind of easy. At startup, it queries the
location of this system break. When person code or a primitive requests
reminiscence from the heap, the allocator increments a pointer for the final
free byte on the heap. As soon as this pointer reaches this system break,
the allocator makes use of brk to request extra heap area from the working
system.

What about reminiscence deallocation? Effectively… we don’t hassle! Which may
sound unhealthy, however I made the guess that our compiler wouldn’t allocate
a lot reminiscence that it will truly run out. Trendy programs have a
lot of RAM, in any case. And in reality my compiler can efficiently
compile itself with out a garbage
collector
.

So how a lot RAM does it use, precisely? Effectively… I didn’t understand this
till after ending the challenge, however in truth round 40 GB. I commend
the sensible engineers of Linux for designing an working system
kernel that may take care of folks like me. Right here, test it out:

Conclusion: Kalyn most likely wants a GC.

Register allocation, liveness evaluation, and performance boilerplate

Register
allocation
is the
a part of Kalyn that gave me probably the most grief by far. Not as a result of it was
exhausting, however as a result of it was gradual. It takes extra time than some other stage
of the compiler, and even after optimizing it to run about 1,200 occasions
sooner, it nonetheless takes 25 seconds to run within the Kalyn implementation
(out of a complete of 45 seconds for compiling the compiler).

Handiest register allocation algorithms begin with an iterative
liveness
analysis
, as I
talked about earlier. For every instruction, we retrieve a set of
registers that it reads from and a set of registers that it writes
from. Then, by analyzing native jumps, we construct a movement graph for every
perform that specifies the potential branches and paths of execution.
Lastly, we use a algorithm to propagate liveness info
by means of the perform till we converge to a fixed
point
.

One helpful utility of the liveness evaluation, in addition to register
allocation, is that we are able to verify for temporaries which are stay on the
starting of a perform. If there are any, meaning we would learn
from a brief that we by no means write to. That is the meeting
equal of an “undefined variable” error, and it proved to be
extremely useful for catching bugs within the translator.

There are a lot of helpful optimizations for iterative liveness evaluation,
however the principle one which I carried out was to replace liveness info
for the directions of every perform in a selected order. The
easiest method is to simply compute liveness info for each
instruction within the perform in parallel, after which recompute all of it
based mostly on the up to date info, repeating till the data no
longer adjustments. As an alternative, I up to date the liveness info for one
instruction at a time, stepping backwards by means of the perform, and
then repeated beginning once more from the top. This decreased the variety of
iterations required for termination by an element of 300 on common.

The opposite a part of register allocation is utilizing the liveness
info to assign registers to temporaries. One of many
conceptually easiest approaches to register allocation is to start out by
constructing an interference
graph

which connects every pair of temporaries that can not be put into the
similar register, after which color the
graph
to search out an
allowable register allocation. The primary drawback with
graph-coloring allocators are that they’re fairly gradual, because the
interference graph has dimension quadratic within the size of the perform
(not acceptable since capabilities in Kalyn usually have many hundreds of
directions).

Because of this, I based mostly my implementation as a substitute on linear-scan
register
allocation
.
In linear-scan allocation, the total interference graph just isn’t
constructed, and as a substitute solely approximated by discovering the primary and
final instruction the place every register is stay, and assuming that it’s
stay for your complete interval in between. It’s quick and straightforward to verify
if two stay intervals intersect.

Now, the paper on linear-scan register allocation supplies a really quick
linear-time algorithm for performing the allocation, which exploits
the construction of liveness intervals. I tried to implement this
algorithm, but it surely proved to be very awkward to translate right into a
useful model, so what Kalyn makes use of (for now) is an easy
“brute-force” allocation algorithm that doesn’t run as quick because the
actual linear-scan algorithm however nonetheless advantages from not having
to compute the interference graph. (See the linear-scan branch on
GitHub

for my try at true linear-scan register allocation.)

I used to be upset to search out that after rushing up liveness evaluation by
an element of 300, the register allocator was nonetheless far too gradual. I
solved this drawback by a mixture of a number of optimizations:

  • Not computing a full movement graph, and as a substitute gathering the
    info wanted to account for jumps on the fly throughout liveness
    evaluation.
  • Avoiding using $$ O(n log n) $$ knowledge buildings like maps as
    a lot as potential, in favor of lists plus extra bookkeeping.
  • Computing liveness intervals for all temporaries in parallel,
    as a substitute of doing it individually for every one.
  • Making small logic adjustments to the code that checked for out there
    machine registers for a brief with the intention to keep away from duplicating
    work.
  • However most significantly, implementing the direct function-call
    optimization within the translator that I mentioned earlier. One strategy to
    make the register allocator sooner is to simply make the code smaller
    that it’s allocating registers for! (Utilizing direct perform calls
    decreased the variety of directions by over 50%.)

At this level, I think that the easiest way to get a efficiency
enchancment from the register allocator could also be to handle the actual fact
that my system is 5 GB into swap when it begins working, by including a
correct rubbish collector 🙂

Assembler

The assembler was by far the slowest a part of the compiler to jot down,
regardless of that it’s probably not very lengthy. This was primarily as a result of
each present supply of documentation on x86 instruction encodings is
reprehensibly unhealthy. For instance, I dare you to take a look at this
page
and provide you with
something understandable. On this part, I’ll take you thru the
fundamentals of how the encoding scheme for x86 works.

x86 directions include quite a lot of completely different components, a few of which
are usually not current in each instruction (so completely different directions can
have completely different lengths):

  • REX byte: that is non-compulsory; if it’s current, it supplies sure
    flags that change the conduct of the instruction, like working in
    64-bit as a substitute of 32-bit mode or altering which set of registers the
    instruction operates on.
  • Opcode: this tells you what instruction it’s, and what sorts of
    arguments are being handed to it.
  • Mod/RM byte: this tells you what registers the instruction operates
    on, and likewise tells you whether or not or not the instruction accesses
    reminiscence.
  • SIB byte: for directions that entry reminiscence, this offers you the
    info for that.
  • Displacement: it is a numerical offset used for reminiscence accesses.
  • Quick: it is a numerical fixed used for directions that
    have one hardcoded into them.

The best strategy to perceive how all of this works is to encode an
instance instruction. Let’s encode the next instruction:

addq $0x42, 0x20(%r11, %rdi, 4)

This instruction says that we should always determine the worth of %r11 + %rdi * 4 + 0x20, and add the worth 0x42 to no matter is saved at
that reminiscence handle.

The very first thing we do is have a look at the big table of
opcodes
. There are fairly a
few rows for add. We wish the one which has r/m16/32/64 in op1
and imm16/32 in op2. The notation implies that one operand is a
16-bit, 32-bit, or 64-bit reminiscence reference, whereas the opposite is a
16-bit or 32-bit fast (or fixed). In line with the desk, the
opcode for this model of add is 0x81, with an “opcode extension”
of 0x0. (Extra on that later.)

Subsequent, we have to determine the Mod/RM byte. The format of this byte
is as follows:

+---+---+---+---+---+---+---+---+
| mod   | reg       | rm        |
+---+---+---+---+---+---+---+---+

After all, this doesn’t inform you very a lot. Let’s undergo the
items. First is the mod subject. This tells us if we’re doing a
reminiscence reference, and in that case what sort. On this case we’re, so we wish
to set it to 0b10. If we weren’t, we might set it to 0b11.

Subsequent is reg. This may usually inform you the supply register for
the instruction. On this case, nevertheless, the supply is a right away, so
as a substitute that is the place we put the opcode extension. (Why? Effectively, the
byte had some bits free, so Intel determined to cram much more knowledge in,
as a result of that meant the identical opcode might imply various things
relying on the worth of the additional bits.) Thus reg is 0b000.

Lastly we’ve got rm. This may usually inform you the vacation spot
register for the instruction. Nevertheless, on this case we’re utilizing a
reminiscence reference, so we set it to the particular worth 0b100.

Now, as a result of we’re utilizing a reminiscence reference, we’ve got to incorporate the
SIB byte. Right here’s what that appears like:

+---+---+---+---+---+---+---+---+
| scale | index     | base      |
+---+---+---+---+---+---+---+---+

Once more, this doesn’t inform you very a lot, so we’ll undergo the items
individually. First is scale. That is fairly simple; it
tells you the multiplier for the handle computation, which is 4 in
this case. We encode 4 as its base-two logarithm 0b10.

Subsequent are index and base. These inform you the 2 registers which are
used within the reminiscence reference, utilizing the identical encoding that we might
have in any other case used within the mod and rm fields. On this encoding,
the index %rdi is 0b0111 and the bottom %r11 is 0b1011. Now, how
can we put this into the three-bit fields index and base? The
reply is the REX byte. The decrease three bits of every register go into
the SIB byte fields, whereas the higher bit, if wanted, goes into the REX
byte. That’s our subsequent matter.

The REX byte seems like this:

+---+---+---+---+---+---+---+---+
| 0   1   0   0 | W | R | X | B |
+---+---+---+---+---+---+---+---+

(Why the 0b0100 at first? As a result of the REX byte doesn’t have
for use on all directions, so it could’t battle with some other
instruction’s opcode.) The meanings of the flag bits are as follows:

  • W: run in 64-bit mode
  • R: the higher little bit of the reg subject within the Mod/RM byte
  • X: the higher little bit of the index subject within the SIB byte
  • B: the higher little bit of the rm subject within the Mod/RM byte, or the
    higher little bit of the base subject within the SIB byte if the rm subject says
    we’re doing a reminiscence reference

We need to do every thing in 64-bit mode, so we set the W bit, and
the bottom register within the SIB byte has the excessive bit set, so we additionally set
the B bit. That offers us a REX byte of 0b01001001.

Now we’ve got to encode the numerical constants. We’ve two, the reminiscence
handle displacement 0x20 and the fast 0x42. These are each
32-bit, so on a
little-endian system we
sign-extend the
immediates to 0x20000000 and 0x42000000.

Placing all of this collectively, we get this 12-byte instruction
encoding:

0x49  [0b01001001]   REX byte
0x81                 Opcode
0x84  [0b10000100]   Mod/RM byte
0xbb  [0b10111011]   SIB byte
0x20 0x00 0x00 0x00  Displacement
0x42 0x00 0x00 0x00  Quick

Sadly, there are much more particulars that should be taken care
of in Kalyn’s assembler. Reminiscence references based mostly on %rip are encoded
otherwise, some directions like imulq don’t assist sure
kinds of calls, some directions truly encode a register quantity as
a part of the opcode, and so forth. None of that may be very attention-grabbing,
although, so I received’t go into it right here.

The one element that’s value mentioning is how labels are dealt with. How
do you encode the next instruction?

callq __src_u45kalynStdlibLists_u46kalyn__foldr__uncurried

Effectively, first we generate meeting for every of the directions, with
placeholders for label references. At this level we all know the size of
every instruction, so we are able to compute an offset in bytes between any two
directions. Now we generate meeting a second time. This time, when
we encounter an instruction like callq or jmp, we glance up the
offset of that label from the present instruction, and substitute that
offset into the instruction.

Linker

The linker was truly the part that I wrote first, and it
impressed your complete remainder of Kalyn. Why? I used to be upset that in my
Compilers class we have been solely generate meeting code in textual content (.S)
format, and utilizing GCC to compile it the remainder of the way in which. How exhausting, I
puzzled, would it not be to supply an actual executable by hand? The
reply, it turned out, was solely 60 traces of code! Right here is the unique
model:

fixedPoint :: Eq a => a -> (a -> a) -> a
fixedPoint x f = let fx = f x in if x == fx then x else fixedPoint fx f

-- see web page 20
elfIdent :: B.ByteString
elfIdent =
  toLazyByteString
    $  word8 0x7f -- magic bytes
    <> stringUtf8 "ELF"
    <> word8 2 -- handle dimension, 64-bit
    <> word8 1 -- endianness, little-endian
    <> word8 1 -- model of ELF specification
    <> mconcat (replicate 9 $ word8 0)

-- see web page 18
elfHeader :: Word16 -> Word16 -> B.ByteString
elfHeader elfHeaderLength programHeaderLength =
  let totalLength = elfHeaderLength + programHeaderLength
  in  toLazyByteString
        $  lazyByteString elfIdent
        <> word16LE 3 -- file kind, relocatable executable (referred to as "shared object file")
                      -- see https://opensource.apple.com/supply/dtrace/dtrace-90/sys/elf.h
        <> word16LE 62 -- structure, x86_64
        <> word32LE 1 -- object file model
        <> word64LE (fromIntegral totalLength) -- entry level in digital reminiscence
        <> word64LE (fromIntegral elfHeaderLength) -- program header offset
        <> word64LE 0 -- part header offset, unused
        <> word32LE 0 -- processor-specific flags, none wanted
        <> word16LE elfHeaderLength -- ELF header dimension
        <> word16LE programHeaderLength -- program header entry size
        <> word16LE 1 -- program header entry depend
        <> word16LE 0 -- part header entry dimension, unused
        <> word16LE 0 -- part header entry depend, unused
        <> word16LE 0 -- index of string desk in part header, unused

-- see web page 40
programHeader :: Word16 -> Word16 -> Word64 -> B.ByteString
programHeader elfHeaderLength programHeaderLength imageSize =
  let totalLength = fromIntegral $ elfHeaderLength + programHeaderLength
  in  toLazyByteString
        $  word32LE 1 -- phase kind, loadable code/knowledge
        <> word32LE 0x7 -- permissions, permit all (see web page 73)
        <> word64LE totalLength -- offset from starting of file
        <> word64LE totalLength -- digital handle at which to map code/knowledge
        <> word64LE 0 -- bodily handle at which to map, unused
        <> word64LE imageSize -- variety of bytes listed in file picture
        <> word64LE imageSize -- variety of bytes to order in reminiscence
        <> word64LE 0 -- alignment, none required

-- see web page 15
elfData :: B.ByteString -> B.ByteString
elfData code =
  let (ehdr', phdr') = fixedPoint (B.empty, B.empty) $ (ehdr, phdr) ->
        let elen      = fromIntegral $ B.size ehdr
            plen      = fromIntegral $ B.size phdr
            imageSize = fromIntegral $ B.size code
        in  (elfHeader elen plen, programHeader elen plen imageSize)
  in  toLazyByteString
        $  lazyByteString ehdr'
        <> lazyByteString phdr'
        <> lazyByteString code

On condition that the challenge ended up at a complete of about 9,800 traces of
code, maybe I used to be barely misled concerning the ease of making a
compiler from scratch. Particularly, Kalyn’s remaining linker has about
320 traces of code, as a result of it helps extra options than simply “hi there,
world”. But it surely was extremely gratifying to have the ability to create a completely
working binary and know precisely what each byte was there for, and I’m
glad this preliminary linker impressed the remainder of Kalyn.

In any case, let’s undergo the linker. It’s principally an
implementation of the Exectuable and Linkable Format
(ELF)
.
Fortunately, the ELF
specification
is
very simple to learn (in comparison with most different specs, a minimum of).
Listed here are the fundamentals:

  • On the very starting of the file there’s a fixed-length header
    that identifies the file as utilizing ELF and declares basic
    configuration choices just like the
    endianness of the file.
  • After that comes the ELF header, which units additional configuration
    choices such because the processor
    architecture

    of the meeting code contained within the file and likewise identifies the
    places of the next headers within the file.
  • Subsequent, there’s a program header that explains how the working
    system ought to load this system into reminiscence and begin it executing.
    This specifies the place this system must be loaded in handle area
    and the digital reminiscence settings that must be utilized.
  • Optionally, there’s a part header that gives additional metadata
    which can be utilized by instruments akin to
    GDB.
  • Relying on what entries there are within the part header, there could
    be extra buildings akin to string and image tables.
  • And naturally there’s the precise code and knowledge of the binary, as
    referenced by this system header.

Debugging info

What truly goes within the part header? Effectively, naturally sufficient, it
is an inventory of various sections, every of which might have a special
sort of knowledge (and, by conference, a reputation that makes it simple for people
and instruments to determine its function). Listed here are those included in
Kalyn binaries:

.shstrtab
.symtab
.strtab
.textual content
.knowledge

Let’s begin with .textual content and .knowledge. These comprise the principle code and
knowledge of this system, respectively. (There are additionally entries within the
program header for the textual content and knowledge sections. This system header is
for the working system whereas the part header is for different instruments,
in order that they each embody related however not the identical info.)

Subsequent is .symtab. That is the image desk, and it accommodates
debugging info for GDB: the addresses of all of the symbols within the
program. Every perform has an entry within the image desk, in order that GDB
is aware of show perform names whereas debugging:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃B+>0xcdbd7 <print__uncurried>      rex.W push %rbp                ┃
┃   0xcdbda <print__uncurried+3>    mov    %rsp,%rbp               ┃
┃   0xcdbdd <print__uncurried+6>    rex.W push %rcx                ┃
┃   0xcdbe0 <print__uncurried+9>    pushq  $0x18                   ┃
┃   0xcdbe5 <print__uncurried+14>   callq  0xce5eb <memoryAlloc>   ┃
┃   0xcdbea <print__uncurried+19>   add    $0x8,%rsp               ┃
┃   0xcdbf1 <print__uncurried+26>   lea    -0x9c(%rip),%rcx        ┃
┃       # 0xcdb5c <print__uncurried__unmonadified>                 ┃
┃   0xcdbf8 <print__uncurried+33>   mov    %rcx,0x0(%rax,%riz,1)   ┃
┃   0xcdc00 <print__uncurried+41>   movq   $0x1,0x8(%rax,%riz,1)   ┃
┃   0xcdc0c <print__uncurried+53>   mov    0x10(%rbp,%riz,1),%rcx  ┃
┃   0xcdc14 <print__uncurried+61>   mov    %rcx,0x10(%rax,%riz,1)  ┃
┃   0xcdc1c <print__uncurried+69>   rex.W pop %rcx                 ┃
┃   0xcdc1f <print__uncurried+72>   rex.W pop %rbp                 ┃
┃   0xcdc22 <print__uncurried+75>   retq                           ┃
┃   0xcdc23 <print>                 rex.W push %rbp                ┃
┃   0xcdc26 <print+3>               mov    %rsp,%rbp               ┃
┃   0xcdc29 <print+6>               rex.W push %rdx                ┃
┃   0xcdc2c <print+9>               rex.W push %rcx                ┃
┃   0xcdc2f <print+12>              pushq  $0x10                   ┃
┃   0xcdc34 <print+17>              callq  0xce5eb <memoryAlloc>   ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
native course of 112189 In: print.uncurried          L??   PC: 0xcdbd7

I did plenty of stepping by means of the generated meeting in GDB, so this
characteristic was invaluable.

Lastly, we’ve got .strtab and .shstrtab. These are string tables
for the image desk and part header, respectively. They’re wanted
as a result of the image desk and part header don’t truly comprise
any names (of symbols or sections, respectively). As an alternative, they
comprise pointers into the suitable string desk, which is only a
large record of null-terminated
strings
.

One tough a part of producing ELF is dealing with self-reference. ELF is
surprisingly self-referential! The ELF header must comprise its personal
size in addition to the lengths and offsets of all the opposite headers,
and naturally the offsets of later headers depend upon the lengths of
earlier ones. The part header must reference its string desk,
however its string desk is outlined by the sections within the part header.
The way in which we deal with these issues is principally the identical approach we deal with
label decision within the assembler: simply begin with placeholders, and
then hold recompiling till we hit a hard and fast level 🙂

Tackle area format and randomization

So what does this system header inform us about handle area format? In
the ultimate model of Kalyn’s linker, this system header has two
entries: one for code and one for knowledge. You would possibly count on them to be
proper after each other, however this doesn’t work. The reason being that
fashionable CPUs use page-based virtual
memory
.

In digital reminiscence, the handle area is split into items referred to as
pages, generally 4kiB (0x1000 bytes). Every web page may be mapped to a web page
of the CPU’s bodily reminiscence {hardware} utilizing an information construction referred to as
the page table. By
sustaining a separate web page desk for every course of, the working
system can current the phantasm to every course of that it has command of
your complete (digital) handle area, whereas in truth bodily reminiscence is
shared between many alternative processes.

Web page tables assist extra metadata to be saved for every web page.
For instance, the web page desk can keep completely different permissions for
every web page. This enables code to be mapped on pages with solely
learn/execute permission, whereas the stack and heap may be mapped on
pages with solely learn/write permission. This can be a plus for safety,
because it mitigates assaults based mostly on overwriting code or executing
knowledge. In Kalyn, the code is marked as learn/execute-only whereas the information
is marked as learn/write-only. However every web page can solely have one
permission, so we have to align each code and knowledge to start out at web page
boundaries. That is dealt with in live performance by the assembler and linker,
as a result of alignment impacts label offsets.

The alignment requirement suggests this file format, the place the third
handle depends upon the dimensions of the code:

0x00000  ELF headers
0x01000  Code
0xcf000  Knowledge

The only program header, then, would map the code at 0x01000 and
the information at 0xcf000. Sadly, this doesn’t work both,
as a result of it seems that the working system reserves digital
addresses close to zero (so {that a} null pointer
dereference

will lead to a segmentation
fault
, amongst different
causes). So we’d like this system header to request that the code and
knowledge be mapped at the next digital handle. I discovered that 0x10000
labored properly, so we’ve got in this system header:

0x00000  ->  (not mapped)  ELF headers
0x01000  ->  0x11000       Code
0xcf000  ->  0xdf000       Knowledge

One other subject with handle area format is using randomization
(ASLR)
,
which is ubiquitous right now as a result of it mitigates many assaults based mostly on
reverse-engineering the reminiscence format of a course of. ELF has a subject
that can be utilized to specify whether or not a binary is “relocatable”, which means
that it may be safely mapped in a special place in reminiscence than it
asks for. Kalyn binaries are relocatable (or
position-independent)
as a result of they use PC-relative
addressing
.
Because of this each time the assembler interprets a label, it doesn’t
insert an absolute handle into the instruction encoding. As an alternative, it
computes the offset of the label from the present instruction and
inserts instructions so as to add this offset to the present instruction
pointer
.

Nonetheless, Kalyn disables relocation in its ELF header. Why? It
seems that though Kalyn itself has no issue working at an
arbitrary handle, GDB doesn’t know show symbols accurately
when randomization is enabled, a minimum of not with out extra
metadata. Fairly than put in extra work, I elected to easily
disable randomization. Kalyn just isn’t a security-hardened language 🙂

(As a aspect observe, enabling randomization does clear up the issue of the
backside of handle area being reserved by the working system,
as a result of the method will robotically be mapped at an applicable
location.)

ASLR causes us yet one more issue. Recall that reminiscence allocation in
Kalyn is dealt with by shifting this system break. Usually (and based mostly on
what I mentioned within the part on reminiscence administration), we might count on the
program break to be on the finish of the information part. Nevertheless, even with
relocation disabled, the placement of this system break remains to be
randomized to elsewhere within the handle area. Notably, this
doesn’t imply all of the area between the top of the information part and the
program break is free to make use of! So Kalyn can’t assume it is aware of the place the
heap is situated, and should invoke brk at startup to get the present
location of this system break, then instantly enhance it so as
to get some area for the heap.

Worst/funniest debugging experiences

This brings us to my favourite part of the write-up. You’ll be able to see
from my commit
messages

that there have been some “enjoyable” bugs. Compilers are nice as a result of when
there’s a bug, it might be a bug within the code you’re compiling, or
maybe in the usual library, or within the code generator, or the
parser, or the register allocator, or maybe the reminiscence allocator,
or, heck, possibly the system name documentation simply lied to you!

Anyway, listed below are among the most… attention-grabbing… bugs, in
chronological order:

  • Mainly the identical: Bear in mind once I mentioned the ELF
    specification
    was
    actually nice? That’s… largely true. It seems that there are a
    few gotchas, although. One in every of them is that it’s just for 32-bit
    programs, though nowhere is that this talked about that I’m conscious of.
    Apparently, there’s a separate doc for 64-bit
    ELF
    , which says “properly it’s
    principally the identical as 32-bit, however a bunch of the fields have extra
    bits now”. Nice. I needed to discover that out by manually evaluating
    Kalyn’s ELF header with one from GCC, byte by byte.

  • Essentially the most useful error message: “Segmentation fault” is unquestionably
    everyone’s favourite informative error message. You probably did one thing
    flawed. The place? Don’t ask me, you’re the one who wrote the code.

    Typically, I’ve assumed that segmentation faults imply my code tried
    to dereference a foul pointer, or entry reminiscence with out the right
    permissions. Effectively, it seems that’s not the one cause you’ll be able to
    get one.

    Proper after I up to date Kalyn’s linker to assist separate code and
    knowledge sections, my binaries began failing on startup with a
    segmentation fault. Naturally, I assumed the “hi there, world” code was
    in some way messing up studying its message from the information part. Nope!
    Seems you get a segmentation fault when this system header tries
    to map two sections onto completely different components of the identical web page (that is
    the issue I discussed earlier when speaking about web page tables). I
    mounted it by page-aligning the code and knowledge sections.

    That is probably the toughest bug I’ve ever needed to debug in my life,
    as a result of I nonetheless can’t consider any approach that I might have figured it
    out other than what I did, which was stare at issues till divine
    inspiration struck. (Really, it was courtesy of getting carried out
    a web page desk supervisor again in my Working Methods class a yr in the past.
    Thanks Prof. Rhodes!)

  • A basic off-by-a-variety-of-values error: For some cause, I
    had a lot of bother determining the appropriate strategy to learn arguments
    from the stack. I believe this sequence of commits speaks for itself:

    Implement arithmetic primitives in meeting

    +-- warning: will get arguments in reverse order!
     getArg n = getField (n + 1) rsp
    

    Repair calling conference bugs

     -- warning: will get arguments in reverse order!
    -getArg n = getField (n + 1) rsp
    +getArg n = getField (n + 2) rbp
    

    Repair implementation of getArg

     -- warning: will get arguments in reverse order!
    -getArg n = getField (n + 2) rbp
    +getArg n = getField (n + 1) rbp
    

    Repair calling conference

     -- warning: will get arguments in reverse order!
    -getArg n = getField (n + 1) rbp
    +getArg n = getField n rbp
    

    Bugs in reminiscence supervisor

     -- warning: will get arguments in reverse order!
    -getArg n = getField n rbp
    +getArg n = getField (n - 1) rbp
    

    Put spilled temporaries in the appropriate place

     -- warning: will get arguments in reverse order!
    -getArg n = getField (n - 1) rbp
    +getArg n = getField (n - 3) rbp
    

    Alright now we’re getting someplace

     -- warning: will get arguments in reverse order!
    -getArg n = getField (n - 3) rbp
    +getArg n = getField (n + 1) rbp
    
  • The place did my reminiscence go?: This was a fantastic one as a result of my program
    would segfault, however provided that I wasn’t utilizing the debugger. Seems
    that GDB disables ASLR, which on multiple event modified the
    conduct of my applications (both to make a bug seen or to cover
    it). On this case, I used to be hit by ASLR placing this system break
    someplace completely completely different from the top of the information part.
    Beforehand, I used to be initializing the reminiscence supervisor by placing a
    image referred to as heap on the finish of the information part, and beginning
    allocation there. As I discussed earlier, I mounted the issue by
    as a substitute calling brk at startup to question the preliminary location of
    this system break. It was tough primarily as a result of there seems to
    be no documentation by any means on Linux handle area format and in
    explicit how this system break works within the context of
    randomization.

  • Simply in case: Beforehand I used the
    regex-tdfa bundle
    for Kalyn’s lexer within the Haskell implementation. One factor that
    mystified me was that studying supply code received suspiciously gradual for
    “giant” information (the place “giant” meant a couple of hundred traces). I used to be
    initially misled into considering the parser was at fault, as a result of
    Haskell helpfully clings onto lazy
    evaluation
    with a
    loss of life grip, thus making it extraordinarily tough to accurately observe
    how lengthy something takes to guage. However no, it seems that when
    you ask regex-tdfa to match a regex at first of a string, if
    it doesn’t match, then it helpfully scans your complete remainder of the
    string
    . You understand, simply in case? I suppose? Anyway, that produced a
    pretty quadratic-time lexer. I dropped regexes and switched to
    guide pattern-matching, which was extra elegant anyway.

  • Inventive subject ordering: I used to be in a position to monitor this one right down to
    the next take a look at case:

    (knowledge Instance
      (Instance Int Int Int Int Int Int Int Int Int Int))
    
    (public def fundamental (IO Empty)
      (let ((t (Instance 1 2 3 4 5 6 7 8 9 10)))
        (case t
          ((Instance a b c d e f g h i j)
          (print (showList showInt [a b c d e f g h i j]))))))
    

    The output:

    [9, 7, 5, 3, 1, 2, 4, 6, 8, 10]
    

    … What?

    It turned out that what was occurring right here was curried capabilities have been
    studying their arguments within the flawed order, so every time one other
    argument received handed, it received caught onto the other aspect of the
    argument record. A straightforward repair, however the first time I noticed this subject
    ordering I used to be fairly dumbfounded.

  • Performed by x86: This one took fairly some time in GDB to trace down.
    For some cause, %r11 was getting overwritten someplace within the
    center of a thousand-instruction-long perform, so it didn’t have
    the appropriate worth anymore by time it received to the top. Fortunately, GDB
    helps breaking on writes to a register, so I used to be in a position to monitor it
    right down to a system name. In x86-64, the %rax, %rdi, %rsi,
    %rdx, %rcx, %r8, and %r9 registers are used for argument and
    return worth passing. I had assumed that these have been additionally the
    caller-saved registers. Oops. Seems %r10 and %r11 are
    caller-saved too.

  • Removes most duplicates: This was a bug within the a part of the
    translator that collects free variables from an expression. Are you able to
    spot the issue?

    freeVariables (Lambda arg physique) = freeVariables physique  [arg]
    

    We acquire the free variables within the physique, however then arg doesn’t
    depend, so we take away it utilizing the list-differencing operator .

    In case you haven’t noticed it but, right here’s some code that compiled
    high quality:

    (lambda (foo)
      ((if b fst snd) foo))
    

    And right here’s some code that didn’t:

    (lambda (foo)
      (if b
        (fst foo)
        (snd foo)))
    

    Sure, it seems that solely removes the first copy of every
    factor from the enter record. So foo was returned as a free
    variable provided that it was used greater than as soon as within the physique. Thanks,
    Haskell.

  • Performed by my very own kind implementation: There are, in fact, many
    fascinating kind algorithms for me to select from for the Kalyn
    commonplace library (stooge
    sort
    , sleep
    sort
    ,
    stack sort, and many others.). However since I already
    went to the difficulty of implementing a splay tree library, tree
    sort
    was probably the most pure:

    (public defn kind (Func (Ord a) (Record a) (Record a))
      (cmp elts)
      (setToList (setFromList cmp elts)))
    

    Can you notice the issue?

    +;; warning: deletes duplicates!!
     (public defn kind (Func (Ord a) (Record a) (Record a))
       (cmp elts)
       (setToList (setFromList cmp elts)))
    
  • The person web page lied to me: Let me quote from the person web page of
    getcwd(2):

    SYNOPSIS
    
           char *getcwd(char *buf, size_t dimension);
           ...
    
    RETURN VALUE
           On success, these capabilities return  a  pointer  to  a
           string containing the pathname of the present working
           listing.
           ...
    
    NOTES
           Below Linux, these capabilities make use of the getcwd()
           system name (out there since Linux 2.1.92).
           ...
    
       C library/kernel variations
           On Linux, the kernel supplies a getcwd() system name,
           which  the  capabilities described on this web page will use
           if potential.  The system name takes  the  similar  argu‐
           ments  as  the library perform of the identical identify, however
           is proscribed to returning at most PATH_MAX bytes.
           ...
    

    Nice! We all know use the system name: simply move it the buf
    and dimension, and we get a pointer to the pathname. Proper?

    Nope. Seems what the system name returns is truly the
    size of the string that was put into the buffer. To this date, I
    have completely no concept how I used to be speculated to know this aside from
    stepping by means of GDB after I received a segfault from dereferencing the
    return worth.

    Though frankly I’d take this documentation over the x86 reference.
    A minimum of this one is making an attempt, even when it’s flawed.

    (Sure, okay, okay, the documentation doesn’t technically say
    something that’s outright false. Solely severely deceptive.)

  • Hanging off the sting: After I was within the remaining levels of getting
    the compiler on-line, I observed that I might generally get a
    segmentation fault, however provided that I printed sufficient knowledge. Right here’s what
    was occurring.

    Within the Subroutines module, I’ve a routine referred to as packString
    which takes a Kalyn-style linked-list string and packs it right into a
    contiguous byte array for passing to a system name. Since Kalyn
    shops characters as regular 64-bit integers (with the higher 56 bits
    unused), and doesn’t in any other case manipulate single bytes, I wanted to
    add restricted assist for the single-byte transfer directions in x86, so
    that I might copy bytes to and from packed strings with out
    overwriting neighboring bytes. (That is after I debugged why two
    seemingly similar characters refused to match equal. Apparently,
    one among them had some rubbish within the higher 56 bits as a result of I had
    by chance copied it from neighboring reminiscence. Helpfully, GDB
    didn’t show any knowledge from the higher bits once I informed it to
    show the worth as a personality.)

    Sadly, it turned out that whereas I used to be utilizing single-byte
    strikes for copying characters into the packString buffer, I used to be
    utilizing a standard movq for setting the null byte on the finish. And if
    packString received very unfortunate, it was potential that the allotted
    string buffer ended lower than eight bytes from this system break (at
    a web page boundary), and writing the null byte as a full phrase would
    trigger an entry to unmapped reminiscence.

  • And for this reason we don’t belief folks: For some cause, I discovered
    that setSize was returning zero for a set that shouldn’t be empty.
    Why? Effectively, I had already had bugs the place I had copied the splay tree
    implementation from
    TreeStructures
    incorrectly, so I suspected one other a type of. Simply to verify,
    although, I loaded TreeStructures into GHCI…

    λ dimension $ empty
    0
    λ dimension $ singleton (1, 2)
    0
    λ dimension $ fromList [(1, ""), (2, ""), (3, "")]
    2
    

    Yeah, that appears about proper. And for this reason we don’t belief folks.

  • Subject ordering 2: Electrical Boogaloo: Argument references ought to
    appear like the next.

    movq 56(%rbp), %t792
    movq 48(%rbp), %t793
    movq 40(%rbp), %t794
    movq 32(%rbp), %t795
    movq 24(%rbp), %t796
    movq 16(%rbp), %t797
    

    Q: Why do they appear like this as a substitute?

    movq 56(%rbp), %t792
    movq -32(%rbp), %t793
    movq 56(%rbp), %t794
    movq -32(%rbp), %t795
    movq 56(%rbp), %t796
    movq -32(%rbp), %t797
    

    A: The perform (- 8) did not do what I assumed it did.
    Corrected to (flip - 8).

What subsequent?

Kalyn is clearly not full. Sooner or later, I hope to proceed
improvement and fill within the gaps to create a really usable
general-purpose language that may kind the premise for additional studying
(working system improvement?). Listed here are some fast enhancements:

  • Add a rubbish collector. This may not solely enhance efficiency however
    would additionally make compilation sensible for bigger applications. (I’m
    going to expire of swap area finally…)
  • Implement a efficiency profiler to determine runtime hotspots for
    optimization.
  • Enhance efficiency usually in order that core improvement can take
    place utilizing solely the self-hosted implementation.
  • Add typeclass assist.
  • Report user-friendly error messages, together with line numbers and
    filenames.
  • Show backtraces by mapping the image desk into reminiscence and
    strolling the bottom pointer chain when this system has crashed.
  • Add assist for inline meeting, in order that primitives may be outlined
    in person code and don’t should be duplicated throughout each
    implementations of the compiler.
  • Cut back the Haskell implementation to a minimal potential base in
    order to cut back the hassle wanted so as to add new language options.

In abstract, I’m extraordinarily proud of Kalyn as a challenge. Growing a
compiler from scratch is a wonderful studying expertise as a result of it
entails creating numerous elements, every of which is kind of
completely different from the others. And within the course of, you’ll be able to develop a really
satisfying and thorough information of how fashionable software program actually
works.

Necessary authorized discover: This weblog submit is maintained by Radian
LLC
.

Source Link

What's Your Reaction?
Excited
0
Happy
0
In Love
0
Not Sure
0
Silly
0
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top