Now Reading
Constructing a Programming Language in Twenty-4 Hours

Constructing a Programming Language in Twenty-4 Hours

2023-06-05 11:19:06

I do know I promised to place out an article each week on a Sunday. I have never been too good at sticking to that schedule. It is now Sunday morning at round 8:50 AM. I’ve a factor I’ve to be at by 4 PM. Naturally, I’ll choose one thing that can take much more than seven hours.

Enter: Constructing a programming language from scratch in six hours (or much less).

Hey, future Ersei right here. This took greater than six hours. Simply FYI.

Let’s have a look at how this goes.

Discuss this post on Mastodon

This could not be used as a tutorial for how one can create a programming language. It’s my precise thought course of and my failures whereas making one. The whole lot that I write is unsuitable, and in case you copy it down then what you make can be unsuitable too (till you get to the half the place I understand it is unsuitable and repair it). You’ve got been warned.

My good friend advised me that I’m not an actual programmer as a result of I’ve not made my very own programming language but. As a result of I can not take a joke, I made a decision to take her up on the problem and make one. There have been a couple of small points with that, although.

I’ve no expertise constructing a programming language.

Actually none.

However I understand how to start out tasks! Set some targets!

This programming language should be capable to:

  1. Be quick
  2. Be capable of name exterior C code
  3. Look good (kinda)
  4. Moveable
  5. (Time allowing) have knowledge buildings like a dynamically allotted listing

Hey, future Ersei right here. About two of those acquired applied. Simply FYI.

Let’s do what each undertaking begins with: looking out the web.

A DuckDuckGo search bar showing

There are plenty of choices! Folks have made their very own passion languages already! I opened up all the hyperlinks on the primary web page up in one other tab.

The primary hyperlink I clicked on gave me this:

Over the previous 6 months, I have been engaged on a programming language referred to as Pinecone. I would not name it mature but, nevertheless it already has sufficient options working to be usable, akin to variables, capabilities and person outlined buildings (freeCodeCamp)

Uh, six months? I have to get all of that carried out in six hours (or much less.)

I’ve additionally simply spent the final ten minutes writing all of that and placing in screenshots!

Anyway, there appear to be a couple of steps in making a programming language: first is choosing the language that your language is in-built. Let’s go for Rust for no different purpose than for the meme.

Subsequent up is lexing (which is to take the programming language textual content and switch it right into a set of “tokens” for the compiler to have a look at). The primary hyperlink additionally advised me that there are present lexers that’ll do all of it for you. Let’s seek for a lexer in Rust. The primary hyperlink is pest. May as effectively use it.

I’m additionally now realizing that I would like a reputation for my Git repo the place I am gonna be placing all of this. Let’s go together with… Ersei-Lang, shortened to Erlang! No, that will not work. EPL? For Ersei-Programming-Language? Apparently it already exists for programming printers. Uh, how about choosing a random phrase from the dictionary? Let’s go together with…

A random word generator returning the word

Cane! Adequate for me. A cursory search returns no different undertaking referred to as “Cane”, so I believe I am good.

Wow, I did actually simply spend ten extra minutes choosing a reputation.

Anyway, let’s get began! Sit down behind the pc, open up Vim, and get typing!

In case you actually need to learn the entire growth story, go to the Development Hell part.

To obtain the language, go to the Sourcehut page.

To mess with the language on-line, go to the web playground.

With out additional ado, this is some code snippets!

; Easy "Hiya, world!" program ~

(`whats up, ' `worldn' add print)
;
It is fairly easy to outline a variable
You want: the information, and a string as a reputation.
~

((1 2 3 4 add) `myvarname' defvar)
(myvarname `n' print)
(myvarname `n' print)
(myvarname `n' print)
(myvarname `n' print)

; All variables are immutable! You'll be able to change the variable, however you may't edit it. ~

((1 2 3 4 add) `myvarname' defvar)
(myvarname `n' print)
((1 2 3 4 5 add) `myvarname' defvar)
(myvarname `n' print)

Fairly cool, huh?

How about one thing extra dynamic? Make a operate (I name them “actions” as a result of why not)

;
actions, or capabilities, are simply strings which might be executed. arguments handed can be found
via the `args' variable
~

([(args add)] `myaction' defact) ; outline an motion referred to as `myaction' ~
(1 2 3 4 myaction print) ; name myaction with arguments 1 2 3 4 ~

Now print the numbers from 1 to 100!

; Prints the numbers from 1 to 100 ~

(1 `depend' defvar)
(
    [
        (count `n' print)
        (count 1 add `count' defvar)
    ]
    100 repeat
)

There are much more demos on the Git repo.

The TL;DR

Cane is a weird language. It’s like if you showed me Lisp and requested me to go loopy. It is acquired parentheses!

That is additionally technically a practical programming language. The entire program will consider to a state. Once more, I don’t know how practical programming languages work. Essentially the most expertise I’ve is with Nix and even then I simply copy-paste issues into my config till it really works.

I’m most likely the worst particular person to be making a Lisp-like functional-ish programming language. Largely as a result of I do know simply sufficient to be harmful. And never sufficient to know any higher. Moreover, I do not know sufficient Rust to make this environment friendly, which is why every little thing occurs on the stack.

However! I am an actual programmer now ????

This was actually a journey, and I’d come again in a couple of days/weeks/months so as to add a characteristic or three. However this was only a proof-of-concept, to show that I can and to do all of it with out dishonest (utilizing exterior libraries/LLVM/taking a look at another person’s code). My journey consists of novice errors, rediscoveries of previous programming paradigms (reverse polish notation ftw!), and largely plenty of enjoyable. Over the course of the previous two months (I had closing exams after which I took a break from computer systems so I may destress), this has been my undertaking. I fell asleep interested by this. I awakened with new concepts. The final three days particularly have been so enjoyable!

I like to recommend doing one thing like this your self. Tackle a undertaking that may be as easy or as difficult or as straightforward or as arduous as you need. Like constructing a programming language. Give your self a time restrict. See what occurs. No stress like with hackathons or homeworks. Simply time your self. Estimate in case you do not actually care. Give it a shot. You may have enjoyable, I promise.

After which email me to point out me what you have made ❤️

None of this truly made it to the top undertaking. Go all the way down to the Designing the New Language part to see one thing extra just like the top consequence.

$ cargo new cane-lang

Let’s additionally add within the lexer dependency into our Cargo.toml.

$ cargo add pest

Now let’s waste treasured time for Cargo to replace the Crates index.

And whereas we’re at it, let’s replace our Neovim configuration and hope it would not break. You’ll be able to inform how I postpone scripting this till the final minute proper?

Anyway, now what? We’ve got a Git repository, we’ve got a lexer put in, however what can we do after that?

Now we’ve got to take our tokens and switch it into an abstract syntax tree. If you do not know what which means, it is advantageous, neither do I. I believe it is like, the map of your code as a tree or one thing. I do not know, I simply learn the primary line within the Wikipedia article.

Oh good, someone has done a lot of the hard work for me

Let’s do all of that.

I’m additionally now realizing that my Rust abilities are… rusty. I overlook how one can do every little thing. Perhaps Rust wasn’t the perfect language to choose, however right here I’m.

$ rm -rf cane-lang
$ cargo new cane-lang --lib
$ cd cane-lang
$ mkdir src/bin

Now that we’ve got some construction in our undertaking, I can get began on truly doing work! Let’s work out what our programming language will appear like (spoiler alert: it is gonna appear like C). Sure, there can be semicolons and you’ll’t cease me.

that theforce programming language, it looks as if I can simply… copy the code and alter up the tokens within the lexer and the take a look at and I can be carried out? See? Programming language in six hours (or much less). It is extra like 5 hours now as a result of I’ve spent a lot time simply researching and taking a look at documentation and I’ve zero thought what is going on on.

Anyway, let’s use that programming language as a assist card, to attract inspiration from within the case of being caught.

Let’s poke round that repo and see what is going on on. It is a quite simple set of code, which is tremendous good.

Now let’s get into programming language design. I am studying as I write this, so it won’t be tremendous correct, however let’s do that.

Introduction to ASTs

An AST is a representation of the code in a “tree” format. The tree is made up of “nodes”. One node is an “if” statement, or declaring a string, or to call a function, or assigning a variable. Some nodes can be operations. There are two kinds of operations: unary and binary. Unary operations act on one node and one node only. Think of the “not” operator. It will invert one node. Binary operations act on two nodes, and thus can be recursive since they can act on themselves. Think of adding two nodes together.

Now, we have to take the program file and make it into an AST. We do that with a grammar.

Introduction to Grammar

Grammar represents how the programming language should look. That’s it. That’s what pest does. It defines a grammar for your language and you can use it to generate a lexer and AST. Let’s take a look at what a part of a .pest file looks like (taken from theforce):

Statements = _{ Assertion* }

Assertion = _ AssignFromFunctionStatement
    

I ought to most likely learn the pest documentation whereas I am at it.

Anyway, let’s put collectively a primary grammar for Cane. I am gonna copy the boilerplate from theforce.

Program = _{ SOI ~ Features ~ EOI }

Features = { Operate* }

Operate = _ NonVoidFunction 

VoidFunction = {
    DeclareFunction ~ FunctionName
    ~ Parameters
    ~ Void
    ~ Statements
    ~ EndFunctionDeclaration
}

NonVoidFunction = {
    DeclareFunction ~ FunctionName
    ~ Parameters
    ~ Statements
    ~ ReturnStatement
    ~ EndFunctionDeclaration
}

Parameters = { (FunctionParameters ~ VariableName)* }

Most important = {
    BeginMain
    ~ Statements
    ~ EndMain }

Statements = _{ Assertion* }

Now I’ve to seek out out what statements I need to declare and use. theforce makes use of the next statements:

  1. DeclareBooleanStatement
  2. DeclareFloatStatement
  3. DeclareStringStatement
  4. AssignStatement
  5. AssignFromFunctionStatement
  6. PrintStatement
  7. ReadFloatStatement
  8. ReadStringStatement
  9. ReadBooleanStatement
  10. ForStatement
  11. WhileStatement
  12. IfStatement
  13. CallFunctionStatement
  14. Noop

Of those, I do not need to have a number of Declare statements. I want to have one strongly-typed declaration assertion. Likewise, I do not need to have a number of Learn statements for all the supported sorts.

So, I add the next to my grammar:

Assertion = _ Noop

Now for the arduous half: implementing all of those statements. Let’s work out how one can implement sorts.

ooOOoooo Time Skip ooOOoOOoOo

Okay, it has been like an hour of researching pest grammar. To be sincere, I do not perceive it. Perhaps a strongly-typed language is not a “six hour” undertaking. On the time of scripting this sentence, I’ve precisely three hours left. Let’s rethink how sorts should be applied on this program.

There must be a set of “atomic” sorts—like integer (cut up into unsigned int, signed int, unsigned lengthy, and signed lengthy), float (and double), and character. These are the smallest set of information that the language can deal with. On high of those “atomic” sorts, there should be sorts that may mix the atomic sorts (and different sorts), like arrays. I do not need to do something past that, as that’ll add an excessive amount of complexity. With the ability to use arrays would make this language truly helpful! Would’ya take a look at that?

Sufficient speak, let’s get into it. As an alternative of only a generic DeclareStatement and AssignStatement, I’ll use a number of: DeclareAtomicStatement and DeclareArrayStatement (and associated).

So, I would like one thing like the next:

DeclareAtomicStatement =  VariableName )


VariableName = { Identifier }

FunctionName = { Identifier }

Identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC)* }

SetInitialValue = _{ "=" }
TypePickerValue = _{ ":" }

ooOooOoOoooo Time Skip Once more OoOOOoooO

So, I had a dialog with somebody smarter than I’m about making my very own programming language. To have arrays, I would like sorts. Sort concept is difficult.

So, I am beginning over. Once more. Let’s redefine our targets. Our previous targets appeared one thing like this:

  1. Be quick
  2. Be capable of name exterior C code
  3. Look good (kinda)
  4. Moveable
  5. (Time allowing) have knowledge buildings like a dynamically allotted listing

Let’s change that up a little bit:

  1. Moveable
  2. Have lists

Hey, future Ersei right here. These are literally what ended up occurring.

Now, that brings me to my first “aha” second: I could make a practical programming language! Perhaps one thing like Scheme or make my very own Lisp dialect…

Large difficulty: I do not know how one can program in practical languages. Thus, I don’t perceive them. Maintaining in step with my “six hour” timeframe (of which I’ve like, two left), this is perhaps tough.

First, let’s give myself a little bit crash course on Lisp-like languages (non-comprehensive, carried out in 5 minutes):

  1. It is acquired a number of parentheses
  2. It makes use of plenty of singly-linked lists
  3. Very recursive

That is so far as I need to get with understanding Lisp. Let’s get proper into designing the language:

Designing the New Language

Most of this got changed up. Again. But we are a lot closer to what the final language will look like.

We can worry about how the language will be implemented later (but we’re keeping it in mind so that it isn’t too difficult later down the road).

Each program is a statement. Each statement can have more than one statement in it. This allows for recursion.

Each statement is of the following syntax: either (action data) or (data). Data is of the following types: a statement, list, or atomic (string, int, float).

All numeric values are infinite precision.

Let’s define some keywords:

Hi, future Ersei here. A lot of these have changed. This is not what the language will have in the end.

  1. add A...: adds together all of the items in the data recursively. If any value is a String, then the output will begin to add every additional item as a string.
  2. mult A...: multiplies all of items in the data. If lists are passed, then matrix multiplication occurs. Two strings being multiplied must result in an error. A string multiplied by an integer n will return that string repeated n times.
  3. div A...: divides each value from the next in the data. If non-numeric values are passed, then the interpreter must halt.
  4. dup N A...: returns a list containing N of the same inputs. If more than one item is passed, then the items are put into a list and the list is duplicated.
  5. defact NAME A...: creates an action with name NAME. Will evaluate A in order when called. If more than one A is passed, then the function returns each output in a list. Names must be unique between actions and variables.
  6. free NAME...: frees all variables and actions with the following names, allowing that name to be reused. A variable that is not found will halt the interpreter.
  7. print A...: prints data.
  8. write FILE: writes a string to a file (clobber).
  9. read FILE: reads a file into a list of lines.
  10. ext FILE: import and execute another file.
  11. if CONDITION THEN ELSE: if the condition is non-zero when cast to a number, return the data in THEN. Otherwise, return the data in ELSE.
  12. get RANGE: where range is a set of integers, return those values from the list (inclusive). Zero-indexed.
  13. eq A...: checks if the values are equal. Casts to the first argument. Will return a Number0 if they are not equal. Will return 1 if they are equal.
  14. eqq A... checks if the values and types are equal. Will return a Number0 if they are not equal. Will return 1 if they are equal.
  15. gt A...: checks if the values are in descending order. Casts to the first argument. Will return a Number0 if they are not in order. Will return 1 if they are in order.
  16. gtt A... checks if the values are in descending order. Types must all match. Will return a Number0 if they are not in order or if types do not match. Will return 1 if they are in order.
  17. lt A...: checks if the values are in ascending order. Casts to the first argument. Will return a Number0 if they are not in order. Will return 1 if they are in order.
  18. ltt A... checks if the values are in ascending order. Types must all match. Will return a Number0 if they are not in order or if types do not match. Will return 1 if they are in order.

There will be a few atomic types as well:

  1. String: one or more chars. Empty string is considered 0.
  2. Number: infinite precision.
  3. Action: refers to a set of instructions. If an Action is in a List, it will be ignored.

There is one composition type:

  1. List: a dynamically allocated array of items (including other lists). All lists are zero-indexed.

There will also be a few reserved values:

  1. args: a list of the arguments passed to the Action.

All types will be implicitly cast to the other if needed:

  1. Number to list: 1 will convert to (1)
  2. String to list: "apple" will convert to ("a", "p", "p", "l", "e")
  3. List to number: always will return the size of the list.
  4. List to string: concatenates all items in list to a string, ie ("a", "p", "p", "l", "e") will turn into "apple". This is recursive.
  5. Number to string: 1 will convert to "1". Any floating point numbers will be represented as rational fractions.
  6. String to number: all numerics will be converted into strings and all non-numerics will halt execution. "1" will convert to 1, and "a" will stop the program. Expects rational fraction.

Actions cannot be cast. If attempted, the program must terminate.

The program will continue to execute until an unrecoverable error has occured.

Unrecoverable errors:

  1. Cast to and from Action
  2. Reached end of file and parentheses are not all matched
  3. Undefined Action or variable
  4. Undefined value
  5. Same Action or variable is defined multiple times
  6. Cast string to number when it is not possible to do so
  7. Unexpected token (two types passed instead of a list containing both items)
  8. The empty list is used ()
  9. Two strings are multiplied together
  10. Non-numbers are divided

Comments all start with =. The interpreter must ignore all input until a = is reached. The exception to this is when the comment delimiters are in a string.

The Hello, World program would look one thing like this:

(print "whats up, world")

Alternatively, it is also:

(print "whats up, " "world")

Or:

(print (add ("whats up") ("world")))

I can work out extra examples later. Now that we’ve got the (possibly damaged) spec let’s get to programming!

One hour stays. I’d find yourself needing extra time.

Designing The Interpreter

This will be an interpreted language. This interpreter will memory-map the file (not required to do so) and read it character by character. As such, the file should not edited while it is being executed.

The interpreter will keep track of the state of the program:

  1. A list of all Actions as a hashmap of the action name and the location in memory where it is kept.
  2. The call stack. This will be implemented as a linked list that grows backward. Every time that the program “jumps”, the previous location is added to the stack. When the “jump” is finished (the last parenthesis is closed), the program pops the call stack and continues from that location. If the stack is empty, the program exits.
  3. The data list. This is a dynamic list that contains the current execution state of the program. The list must be comprised of either lists or atomics. As the program progresses, the list is recursively flattened from the bottom-left up.

Sounds simple, right? Now for the hard part: implementing the interpreter.

Fifty minutes remain. I definitely will need more time.

Writing the Interpreter

First, we have to erase everything we’ve already written, which wasn’t that much anyway.

Now, let’s go from the ground up. First, we have to implement the three basic types: Lists, Numbers, and Strings. They all have to be part of the same struct so we can use them anywhere. We can also implement casts while we are at it.

After spending a few minutes trying to figure out how to do Rust development on Nix (because I switched to NixOS a few days ago, writeup soon), I can get started.

In src/stdtypes.rs:

#[derive(Clone)]
pub struct List(pub LinkedList<Data>);

#[derive(Clone)]
pub struct Number(Rational);

#[derive(Clone, Copy, PartialEq)]
pub enum Types {
    STRING,
    NUMBER,
    LIST
}

#[derive(Clone)]
pub struct Data {
    kind: Types, 
    val_string: String,
    val_number: Number,
    val_list: List,
}

Next, we can implement the basic Actions: add and print. This is the minimum to implement a basic “Hello world” program. We will worry about the rest later.

A snippet of the add action:

impl Data {
    pub fn add(&mut self) {
        if self.get_kind() == Types::LIST {
            // The first value in the list will determine the type
            let first = self.get_list().0.pop_front();
            if first.is_none() {
                panic!("List is not allowed to be null!");
            }
            let mut sum: Data = first.unwrap();
            sum.add();
            while let Some(mut var) = self.get_list().0.pop_front() {
                if var.get_kind() == Types::LIST {
                    var.add();
                }
                sum = sum + &mut var;
            }
            self.kind = sum.kind;
            self.val_number = sum.val_number;
            self.val_string = sum.val_string;
        }
    }
}

impl std::ops::Add<&mut Data> for Data {
    type Output = Data;
    fn add(mut self, other: &mut Data) -> Data {
        if self.kind == Types::LIST && other.kind == Types::LIST {
            self.as_list().val_list.0.append(&mut other.as_list().val_list.0);
            return self;
        } else if self.kind == Types::LIST {
            // The clone is stupid
            // C would let me do this WITHOUT the clone
            // It's stupid because we are adding to the end of a linked list. This is a pointer to
            // an object. I should be allowed to add it to the list.
            self.as_list().val_list.0.push_back(other.clone());
            return self;
        } else if self.kind == Types::STRING || other.kind == Types::STRING {
            self.as_string().val_string += &other.as_string().val_string;
            return self;
        } else {
            self.as_number().val_number.0 += &other.as_number().val_number.0;
            return self;
        }
    }
}

The remainder of the code is left as an exercise for the reader.

I have spent six hours on the basics and the add function alone. I don’t think that this time limit is gonna work out.

I am on hour ten of six. It is also time to go to bed. I’ll write a bunch of comments so I don’t forget everything tomorrow.

ooOOOOoOooO Time Skip oOOOoooooOOoO

Anyway, now that we have a functioning action, we can start making the actual interpreter.

For this interpreter, we will need to keep track of a few things:

  1. The current position of the executor in the file.
  2. The call stack and the positions to jump back to.
  3. The current data in the program so far as a list. Because of how this language is structured, only one (recursive) list can be active at a time.
  4. A map of each Action to the location in the file they can be found.
  5. A map of each variable name to the data it represents.

A struct that can keep track of this information could look something like this:

struct Interpreter<R: Read> {
    reader: BufReader<R>,
    call_stack: Vec<SeekFrom>,
    data_state: Data,
    actions: HashMap<String, SeekFrom>,
    variables: HashMap<String, Data>,
}
impl<R: Read> Interpreter<R> {
    fn new(reader: BufReader<R>) -> Interpreter<R> {
        Interpreter {
            reader,
            call_stack: Vec::new(),
            data_state: Data::new(Types::LIST),
            actions: HashMap::new(),
            variables: HashMap::new(),
        }
    }
}

Now to make the actual interpreter… this might take a while.

I have spent another thirty minutes so far. Currently at ten and a half hours. Maybe aim for 24? That sounds impressive, right?

So, the way that the interpreter will work is that it will start reading the file until it hits a token. The tokens are as follows:

  1. Open string with `
  2. End string with '
  3. Begin comment with ;
  4. End comment with ~
  5. Begin expression with (
  6. End expression with )

All whitespace (including newlines) will be ignored unless part of a string. Therefore, all whitespace are considered delimiters.

These are in order of priority. A string should continue until it hits another end quote, and so on. Let’s make a basic interpreter that will do the following:

  1. Read the input character by character
    1. If a “begin” comment is found, then read character by character until and “end” comment is found
    2. Ignore any whitespace
    3. If anything but an open parenthesis is found, terminate the program with “unexpected token”
    4. When the first opening parenthesis is found, begin execution

In the process of writing the code, I realized that it would be a lot easier to make everything based on postfix notation (reverse polish notation):

(`hello world' print)

instead of prefix notation

(print `hello world')

That does bring into question how functions will be handled. We can worry about that later. I have a couple of ideas.

The big question is how am I going handle recursion? After distracting myself and reading a random article on RSS, I came to the conclusion that the interpreter should hold a reference to a data object in the interpreter state. This should be done through the call stack. In addition to the previous position, it will also contain the location of the previous data object as a reference. In addition, the interpreter will also keep track of a “working data” state. Easy enough, right? Let’s get started.

pub fn execute(&mut self) -> Result<(), io::Error> {
    let mut buf: [u8; 1] = [0];
    loop {
        match self.reader.read_exact(&mut buf) {
            Err(_) => break,
            _ => (),
        };
        match self.call_stack.last_mut() {
            Some(call) => match call.get_state() {
                Some(State::Wait) => {
                    // Waiting for program to begin
                    // Only a comment or starting data can begin
                    if buf[0] == Symbols::CommentStart as u8 {
                        self.call_stack.push(Call::new(State::Comment));
                    } else if buf[0] == Symbols::DataStart as u8 {
                        self.call_stack.push(Call::new(State::Data));
                    } else if WHITESPACE.contains(&buf[0]) {
                        // Ignore whitespace
                    } else {
                        // Unexpected token
                        return Err(Error::new(ErrorKind::InvalidData, format!(
                            "Unexpected token {} at position {}",
                            std::str::from_utf8(&buf).unwrap(),
                            self.reader
                                .seek(SeekFrom::Current(0))
                                .expect("Could not get current position!")
                        )));
                    }
                }
                Some(State::Data) => {
                    // Read data into object until end of object
                    if buf[0] == Symbols::DataEnd as u8
                        && self.call_stack.last_mut().unwrap().get_token().len() == 0
                    {
                        self.call_stack.pop();
                    } else if buf[0] == Symbols::StringStart as u8 {
                        self.call_stack.push(Call::new(State::String));
                    } else if !WHITESPACE.contains(&buf[0]) && buf[0] != Symbols::DataEnd as u8
                    {
                        self.call_stack
                            .last_mut()
                            .unwrap()
                            .get_token()
                            .push_str(std::str::from_utf8(&buf).unwrap());
                    } else {
                        // Now that the item in the data is split, we can choose what to do
                        // with it
                        // There are a few options. First is to execute a function. Second is
                        // to parse the object as a number. Third is to parse the object as a
                        // string.

                        let mut reset = true;
                        match self.call_stack.last_mut().unwrap().get_token().as_str() {
                            "add" => self.call_stack.last_mut().unwrap().get_data().add(), // Add data
                            "print" => write!(
                                self.stdout,
                                "{}",
                                self.call_stack.last_mut().unwrap().get_data()
                            )
                            .unwrap(),
                            _ => reset = false,
                        }

                        if reset {
                            // Reset token
                            let token = self.call_stack.last_mut().unwrap().get_token();
                            *token = "".to_string();
                        }

                        if buf[0] == Symbols::DataEnd as u8 {
                            self.call_stack.pop();
                        }
                    }
                }
                Some(State::String) => {
                    // If the string is over, push it into the data object
                    if buf[0] == Symbols::StringEnd as u8 {
                        let string_stack_object = self.call_stack.pop().unwrap();
                        // We can unwrap here safely because the only way to get to a string is
                        // when Data calls it
                        self.call_stack
                            .last_mut()
                            .unwrap()
                            .get_data()
                            .get_list()
                            .push(string_stack_object.data);
                    } else if let Some(last) = self.call_stack.last_mut() {
                        // Read data into object until end of string
                        last.get_data()
                            .get_string()
                            .push_str(std::str::from_utf8(&buf).unwrap());
                    }
                }
                Some(State::Comment) => {
                    // Ignore everything until we hit the comment end
                    if buf[0] == Symbols::CommentEnd as u8 {
                        self.call_stack.pop();
                    }
                }
                None => todo!(),
            },
            None => todo!(), // Parse stack is empty, catastrophic failure
        };
    }
    // End of file
    // Check if the program still needs to do anything
    match self.call_stack.last_mut().unwrap().get_state() {
        Some(State::Wait) => Ok(()),
        _ => return Err(Error::new(ErrorKind::UnexpectedEof, "Unexpected EOF!".to_string())),
    }
}

This (and the supporting data structures) took me like another eight hours to put together, mostly since Rust got in the way (but good thing it did otherwise hard-to-find errors would have popped up).

But it works.

It does hello world.

(`hello, ' `world
' add print)

There is no string escaping (yet), so the newline has to be a part of the code and not as an escape character.

But it works.

The relief.

We are at 18.5 hours. Time to implement the other basic parts. Next up is numbers. It’s not too hard, I just have to patch the token parsing to check for numbers.

let mut reset = true;
match self.call_stack.last_mut().unwrap().get_token().as_str() {
    "add" => self.call_stack.last_mut().unwrap().get_data().add(), // Add data
    "print" => write!(
        self.stdout,
        "{}",
        self.call_stack.last_mut().unwrap().get_data()
    )
    .unwrap(),
    _ => {
        match Rational::from_str_radix(self.call_stack.last_mut().unwrap().get_token().as_str(), 10) {
            Ok(val) => {
                let mut data = Data::new(Types::NUMBER);
                data.set_number(Number(val));
                self.call_stack.last_mut().unwrap().get_data().get_list().push(data);
            },
            Err(_) => {
                reset = false;
            },
        }
    },
}

That took less than ten minutes. I’m making good time here! What’s next? Unwraps!

What are unwraps, you ask? Well, in a situation like this:

(1 2 (3 4 add 1) add print)

the language will add together the two numbers then add a list to it. Because this language is very poorly designed, the list will be coerced into a number. That will always return the size of the list. Therefore, this will return 5.

Unwrapping would look something like this:

(1 2 (3 4 add 1)? add print)

First 1 and 2 will be added, then 3 4 will be added, then the contents of that inner list will be appended to the end of the outer list:

(1 2 7 1 add print)

which will act as expected.

if buf[0] == Symbols::Unwrap as u8 {
    let last_val = self
        .call_stack
        .last_mut()
        .unwrap()
        .get_data()
        .get_list()
        .pop_back();
    match last_val {
        Some(data) => match data.get_kind() {
            Types::LIST => {
                self.call_stack.last_mut().unwrap().get_data().get_list().append(data);
            }
            _ => {
                self.call_stack.last_mut().unwrap().get_data().get_list().push(data);
            }
        },
        _ => {
            return Err(Error::new(
                ErrorKind::InvalidData,
                format!(
                    "Invalid unwrap at position {}",
                    self.reader
                        .seek(SeekFrom::Current(0))
                        .expect("Could not get current position!")
                ),
            ))
        }
    }
}

As I am writing code demos for the language, I am running into issues with newlines. There are no string escapes! I can’t write in n, I have to put in a newline. Let’s fix that.

Some(State::StringEscape) => {
    self.call_stack.pop();
    match buf[0] {
        b'n' => self
            .call_stack
            .last_mut()
            .unwrap()
            .get_data()
            .get_string()
            .push_str("n"),
        _ => {
            let unescape_buf: [u8; 2] = [Symbols::Escape as u8, buf[0]];
            self.call_stack
                .last_mut()
                .unwrap()
                .get_data()
                .get_string()
                .push_str(std::str::from_utf8(&unescape_buf).unwrap());
        }
    };
}

It should be pretty easy to add in more escape characters later on (like if I want to print a single quote). Another forty-five minutes have passed. I am now at 21 hours. I have three hours left to make this language useful.

Of the keywords I defined earlier, I have done exactly two of them: print and add. Who knew making a programming language would be so difficult? I have comments, strings, numbers, nests, unwraps, demos, tests, and internal error handling, so if I were to stop here, I could consider this a success.

But I’m not satisfied yet. The big things remaining are variables and functions. Those would make this language actually useful.

See Also

Remember how I said earlier that reverse polish notation would make functions and variables hard? That is because the interpreter reads the file from left to right. It won’t know whether to execute the code or save it for later if it’s a function, since it hasn’t gotten to that part yet.

Because I’ve already thrown out all sensibility, my brain went to using more than parentheses. A function could be defined with square brackets. Because why not. The interpreter will read the function into memory as a string, then execute it when the function token is found.

Now I’m thinking, why not just use a string? That can be the function! And then, the language can generate its own code to execute!

I am really going crazy here folks.

Things Get Truly Cursed

Let’s implement functions-that-are-strings-but-also-executable. It shouldn’t be too difficult.

Here’s the plan. When defact is hit, then the last item in the data is popped and used as the name, and the data before that is used as the action. Same deal with defvar:

To load the data into the hashmap (this is alongside print):

"defvar" => {
    let key = self
        .call_stack
        .last_mut()
        .unwrap()
        .get_data()
        .get_list()
        .pop_back();
    if key.is_none() {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "No action name found".to_string(),
        ));
    }
    let var = self
        .call_stack
        .last_mut()
        .unwrap()
        .get_data()
        .get_list()
        .pop_back();
    if var.is_none() {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "No variable found".to_string(),
        ));
    }
    self.variables
        .insert(key.unwrap().to_string(), var.unwrap());
    println!("{:?}", self.variables);
}

and to retrieve it (this is after the token to number check):

match self
    .variables
    .get(self.call_stack.last_mut().unwrap().get_token())
{
    Some(data) => {
        println!("found {:?}", data);
        match data.get_kind() {
            Types::ACTION => {
                unimplemented!()
            }
            _ => {
                self.call_stack
                    .last_mut()
                    .unwrap()
                    .get_data()
                    .get_list()
                    .push(data.clone());
            reset = true;
            }
        }
    }
    None => (),
}

That was just one more hour spent. Two hours remain.

((1 2 3 4 add) `myvarname' defvar)
(myvarname `n' print)

((1 2 3 4 5 add) `myvarname' defvar)
(myvarname `n' print)

This will print 10 then 15.

Now for functions. There are two ways I can do this: functions can return data, or they can only act on the variables. It’ll be easier to implement the second one, but the first one would be more useful.

I think I have enough time. Let’s do it. I’ve already implemented the code to save a string as an action (almost the exact same as the variable saving code, just with a few more lines to change the type to action).

Maybe the best (worst) way to do this is to start another interpreter with a shared variable state and to use a Cursor to read the function data.

Types::ACTION => {
    let mut c = Cursor::new(Vec::new());

    c.write_all(data.to_string().as_bytes()).unwrap();
    let mut reader = BufReader::new(c);
    reader.rewind().unwrap();
    let mut action_interp = Interpreter::new(
        reader,
        self.stdout,
        self.variables,
        self.call_stack
            .last_mut()
            .unwrap()
            .get_data()
            .clone(),
    );
    // Delete old data (since it was copied into
    // args for new instance)
    self.call_stack.last_mut().unwrap().get_data().erase();
    match action_interp.execute() {
        Ok(data) => {
            self.call_stack
                .last_mut()
                .unwrap()
                .get_data()
                .get_list()
                .push(data);
            reset = true;
        }
        Err(e) => return Err(e),
    }
}

It has now been two hours. I have hit the deadline.

This is the last commit in the 24 hour deadline.

However since when did I care about deadlines? To make this language helpful, there’s one final thing that should be added: management stream.

Again to Sanity (Kinda)

Implementing eq, lt, gt, and so on will be pretty easy. Each of these will take n >= 3 arguments. The first is the condition. The second is the output if the condition is true. The third is the output if the condition is false. Of course, to make it easier for myself, the output conditions are also strings that will be executed (who cares about Rust’s call stack? I’m on a deadline here!)

Furthermore, these comparisons will be made strictly. Casting will be implemented pretty easily:

"tolist" => {
    self.call_stack.last_mut().unwrap().get_data().as_list();
    let data = self.call_stack.last_mut().unwrap().get_data();
    let mut changeme = data.get_list().pop_back().unwrap();
    changeme.as_list();
    data.get_list().push(changeme);
}
"tostring" => {
    let data = self.call_stack.last_mut().unwrap().get_data();
    let mut changeme = data.get_list().pop_back().unwrap();
    changeme.as_string();
    data.get_list().push(changeme);
}
"tonum" => {
    let data = self.call_stack.last_mut().unwrap().get_data();
    let mut changeme = data.get_list().pop_back().unwrap();
    changeme.as_number();
    data.get_list().push(changeme);
}

We are popping the last value in the list, converting it, then pushing it back.

Converting a list to a number still returns the size of the list, so getting the size of lists and strings is pretty easy (ex: (string tolist tonum)).

Now to finish off what we started: conditionals.

Here’s how it’s going to work: if the conditional is found, pop the last two items (for the true and false conditions), then check the data. Based on the check, execute the true or false condition.

"eq" | "lt" | "gt" | "lte" | "gte" => {
    let token = self.call_stack.last_mut().unwrap().get_token().clone();

    let data = self.call_stack.last_mut().unwrap().get_data();
    let mut false_exec = data.get_list().pop_back().unwrap();
    let mut true_exec = data.get_list().pop_back().unwrap();
    let mut c = Cursor::new(Vec::new());

    let is_true = match token.as_str() {
        "eq" => data.is_eq(),
        "lt" => data.is_sorted_asc(),
        "lte" => data.is_sorted_asc_eq(),
        "gt" => data.is_sorted_desc(),
        "gte" => data.is_sorted_desc_eq(),
        _ => unreachable!(),
    };

    if is_true {
        c.write_all(true_exec.get_string().as_bytes()).unwrap();
    } else {
        c.write_all(false_exec.get_string().as_bytes()).unwrap();
    }

    let mut reader = BufReader::new(c);
    reader.rewind().unwrap();
    let mut action_interp = Interpreter::new(
        reader,
        self.stdout,
        self.variables,
        self.args.clone(),
    );
    self.call_stack.last_mut().unwrap().get_data().erase();
    match action_interp.execute() {
        Ok(data) => {
            self.call_stack
                .last_mut()
                .unwrap()
                .get_data()
                .get_list()
                .push(data);
            reset = true;
        }
        Err(e) => return Err(e),
    }
}

I have also realized that the syntax for conditionals is ugly. If you want to return a string, then it looks so gross:

(1 2 3 4 `(`this is ascending!')' `(`this is not ascending :(')' lt `n' print)

Instead, let’s make it look better:

(1 2 3 4 `this is ascending!' `this is not ascending :(' lt `n' print)

Simple! Just have the interpreter treat all unknown characters as a string to save! (This code is left as an exercise to the reader (or just look at the source (wow this sentence looks like the programming language)))

We are now on hour 26 of 24. This programming language is more or less done. I could implement more features, but at this point I wanna make cool shit with Cane.

Dogfooding

Let’s make a simple loop. Print the numbers from 1 to 100.

(1 `counter' defvar)
(
    `(counter 100 `(counter `n' print) (1 counter add `counter' defvar) (body)' `' lte)' 
    `body' defact
)
(body)

Now that I’m writing code in this language, I can see how using strings is a bad idea. Maybe a slightly different syntax is needed?

How about using brackets to signify that?

(1 `counter' defvar)
(
    [(counter 100 [(counter `n' print) (1 counter add `counter' defvar) (body)] 0 lte)]
    `body' defact
)
(body)

It will act almost exactly like a string, but it will match the brackets so no escaping is needed. If a bracket is found in a string, nothing will happen.

It will still return a string though.

Furthermore, I think the current model of creating a new interpreter every time recursion happens (which will happen a lot) is stupid and inefficient. I have a little bit of pride in my work. After the bracket bit though.

Maybe it was a mistake to have the call stack and the parse stack merged…

Anyway, two hours later I have this:

Some(State::LazyEval) => {
    if buf[0] == Symbols::StringStart as u8 {
        self.call_stack
            .last_mut()
            .unwrap()
            .get_data()
            .get_list()
            .last_mut()
            .unwrap()
            .get_string()
            .push_str("`");
        self.call_stack.push(Call::new(State::String));
    } else if buf[0] == Symbols::LazyStart as u8 {
        open_count += 1;
        self.call_stack
            .last_mut()
            .unwrap()
            .get_data()
            .get_list()
            .last_mut()
            .unwrap()
            .get_string()
            .push_str("`");
    } else if buf[0] == Symbols::LazyEnd as u8 {
        close_count += 1;
        // We only want to add the end quote if we are at least one layer deep
        if open_count - close_count > 0 {
            self.call_stack
                .last_mut()
                .unwrap()
                .get_data()
                .get_list()
                .last_mut()
                .unwrap()
                .get_string()
                .push_str(&("".repeat(open_count - close_count - 1) + "'"));
        }
    } else if open_count == close_count {
        // If the string is over, push it into the data object
        let mut string_stack_object = self.call_stack.pop().unwrap();
        // Merge the string fragments
        string_stack_object.get_data().as_string();
        // We can unwrap here safely because the only way to get to a string is
        // when Data calls it
        self.call_stack
            .last_mut()
            .unwrap()
            .get_data()
            .get_list()
            .push(string_stack_object.data);
    } else {
        let last = self.call_stack.last_mut().unwrap();
        // Read data into object until end of string
        last.get_data()
            .get_string()
            .push_str(from_utf8(&buf).unwrap());
    }
}

Of course, the rest of the code was tweaked to get this working too.

I am now at hour 28 of 24. I feel exhausted. Optimization be damned, I’m gonna get this ready for release. What a ride.

Butttt, I could release this as WASM and have people be able to mess with it on the web… maybe later.

Also, everything is happening on the stack. Nothing is on the heap. This means that recursion (being so inefficient), can really only go so deep. The loop will max out at around ~760 iterations.

Now that I think about it, I did spend the first six hours or so doing nothing and messing around with AST and LLVM.

I am now at hour 22 of 24 >:3

I think that I need to add more functionality:

  1. DATA N repeat will execute DATAn times
  2. DATA... N duplicate will duplicate DATA...n times
  3. DATA... drop will remove the data from the object
  4. DATA... flatten will recursively unwrap the data until no lists remain

This should get rid of the recursion issue with loops!

For example, flatten is implemented like so:

pub fn make_flat(&mut self) {
    let mut flattened: LinkedList<Data> = LinkedList::new();
    if self.kind == Types::LIST {
        while let Some(mut item) = self.get_list().0.pop_front() {
            flattened.append(&mut item.val_list.0);
        }
        self.set_list(List::new(flattened));
    }
}

mmmm recursion. Tasty. Can you tell I’m losing my sanity over this project?

; Prints the numbers from 1 to 100 ~

(1 `count' defvar)
(
    [
        (count `n' print)
        (count 1 add `count' defvar)
    ]
    1000000 repeat
)

It has been another hour and a half. I should probably add the last two features I have been thinking about: join and split.

  1. DATA... String join will join the data with the string (think python "stringhere".join(array))
  2. String String split will split the first string with the second. Should return a list.

Join was easy:

pub fn join_string(&mut self, joiner: &str) {
    if self.kind != Types::LIST {
        return;
    }
    let mut new_string = String::new();
    let mut should_strip: bool = false;
    for val in self.iter() {
        should_strip = true;
        new_string.push_str(&(val.to_string() + joiner));
    }
    if should_strip {
        new_string = new_string.strip_suffix(joiner).unwrap().to_string();
    }

    self.set_string(new_string);
    self.kind = Types::STRING;
}

along with the interpreter side of things:

"join" => {
    let joiner = self
        .call_stack
        .last_mut()
        .unwrap()
        .get_data()
        .get_list()
        .pop_back();
    let data = self.call_stack.last_mut().unwrap().get_data();
    if joiner.is_none() {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "Joiner is missing".to_string(),
        ));
    }
    if joiner.as_ref().unwrap().get_kind() != Types::STRING {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "Joiner must be a string".to_string(),
        ));
    }
    data.join_string(joiner.unwrap().get_string());
}

Splitting was much the same. This post is already very code-heavy, but here goes anyway:

"split" => {
    let splitter = self
        .call_stack
        .last_mut()
        .unwrap()
        .get_data()
        .get_list()
        .pop_back();
    let splitme = self
        .call_stack
        .last_mut()
        .unwrap()
        .get_data()
        .get_list()
        .pop_back();
    if splitter.is_none() {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "Splitter is missing".to_string(),
        ));
    }
    if splitter.as_ref().unwrap().get_kind() != Types::STRING {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "Splitter must be a string".to_string(),
        ));
    }
    if splitme.is_none() {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "Splittee is missing".to_string(),
        ));
    }
    if splitme.as_ref().unwrap().get_kind() != Types::STRING {
        return Err(Error::new(
            ErrorKind::InvalidData,
            "Splittee must be a string".to_string(),
        ));
    }
    for item in splitme
        .unwrap()
        .get_string()
        .split(splitter.unwrap().get_string().as_str())
    {
        let mut data = Data::new(Types::STRING);
        data.set_string(item.to_string());
        self.call_stack
            .last_mut()
            .unwrap()
            .get_data()
            .get_list()
            .push(data);
    }
}

I have now hit 24 hours for the second time. I should be done now.

The Aftermath

The TL;DR is a fairly good abstract of how I felt in spite of everything of this. However you learn the entire thing (or scrolled to the underside), so I ought to most likely have one thing else right here for you.

Since I’d contemplate this an esolang, I have to do funky esolang issues with it, like show Turing-completeness or write a quine. I am positive it is potential, however I’m not ok at computer systems to determine these simply but. I would like a break from doing pc issues like this. My mind is mush sufficient already.

So what did I be taught?

Rust. Mostly Rust. As I stopped searching for things on the internet (beyond Rust things) once I gave up on using LLVM, I did not learn much else from this project. But it was fun!

What Next?

Probably nothing. Maybe I’ll add in some list manipulation features later, but I’ve already stretched those 24 hours far enough.

Maybe I’ll add support for this on Replit with Nix trickery (as soon as I work out how one can use Nix (which is kinda unhappy contemplating that I’ve been utilizing NixOS for a couple of months now (wow this sentence can also be wanting like Cane (because it has plenty of parentheses)))). The chances are limitless.

Is that this language silly? Sure. May I’ve carried out higher with extra time? No, since I’d have gotten bored or burnt out or a mix of the 2. I am glad I put a time restrict on this.

The Half by which I Hold Working

I could not put down the language and now I ported it to WASM. It required a restructure of the crates, libraries, and then swap out the only dependency for another similar one.

So yeah. Cane is now Web-Ready. Like Java.

And then my friend tells me that Cane doesn’t work on Home windows! Seems that Home windows has different line endings than *nix, so I simply wanted so as to add r to the listing of whitespace.

See, now I need to implement much more options. I wanna add some correct math assist. Multiplication, exponentiation, division, subtraction, and so on)

  1. sub: subtracts the information one after one other. All knowledge should be numbers.
  2. mult: multiplies the information one after one other. All knowledge should be numbers.
  3. div: divides the numbers one after one other. All knowledge should be numbers.
  4. neg: negates the final quantity.
  5. flooring: rounds all the way down to the closest integer.
  6. ceil: rounds as much as the closest integer.
  7. abs: absolute worth of the final quantity.

Okay, that took like one other hour. Now I am carried out, I swear.

However there is no method to learn stdin… no no no no no no. I’m carried out. Completed.

Aaaand one hour later I couldn’t maintain myself again and determined to implement stdin. And endlessly loops. The loops had been straightforward. It was implementing stdin that was too tough. I blame Javascript. I couldn’t, for the lifetime of me, get stdin to work on the net playground correctly. So the net playground has a non-interactive stdin. Heartbreaking. Perhaps when I’ve extra time and vitality I can look into one thing else, like xterm.js.

See, now it is 8:37 within the morning on Monday. I get this weblog submit out at 9:00, no matter whether or not it is completed or not.

I believe it is completed.

Grand whole: 6 hours (messing round with previous stuff), 25 hours (language), 4 hours (internet playground), 1 hour (weblog submit).

Hmm, let’s work out how a lot that’s:

((6 25 4 1) `knowledge' defvar)
((knowledge ? ` + ' be a part of) ` = ' (knowledge ? add) println)
6 + 25 + 4 + 1 = 36

Is not this good? A programming language in thirty-six hours (but in addition truly simply 25 hours for the precise language). I believe this can be a 24-hour-ish undertaking, proper?

Proper?


Corrections? Feedback? Simply wanna say hello? Get in touch!

Observe me on Mastodon

Source Link

What's Your Reaction?
Excited
0
Happy
0
In Love
0
Not Sure
0
Silly
0
View Comments (0)

Leave a Reply

Your email address will not be published.

2022 Blinking Robots.
WordPress by Doejo

Scroll To Top