
Designing a Programming Language to Speedrun Advent of Code

2023-11-13 15:45:55

“shouldn’t this have been published a few months ago?” yeah, probably. I even thought about submitting it to the AoC contest. time is a real beast.

The title is clickbait. I didn’t design and implement a programming language for the sole or even primary purpose of leaderboarding on Advent of Code. It just turned out that the programming language I was working on fit the task remarkably well.

I can’t name just a single reason I started work on my language, Noulith, back in July 2022, but I think the biggest one was much more absurdly niche: I solve and write a lot of puzzlehunts, and I wanted a better programming language to use to search word lists for words satisfying unusual constraints, such as, “Find all ten-letter words that contain each of the letters A, B, and C exactly once and that have K as their ninth letter.” I have a folder of ten-line scripts of this kind, mostly Python, and I thought there was surely a better way to do this. Not necessarily faster — there is clearly no way I could save time on net by optimizing this task. But, for example, I wanted to be able to easily share these programs so that others could run them. I had a positive experience with this with my slightly older golflang Paradoc, which I had compiled into a WASM blob and put online and, just once, experienced the convenience of sharing a short text processing program via a link. (Puzzle: what does this program do?) I also wanted to be able to write and run these programs while booted into a different operating system, using a different computer, or just on my phone.

As I worked on it, I kept accumulating reasons to keep going. There were other contexts where I wanted to quickly code a combinatorial brute force that was annoying to write in other languages; a glib phrasing is that I wanted access to Haskell’s list monad in a sloppier language. I also wanted an excuse to read Crafting Interpreters more thoroughly. But sometimes I think the best characterization for what creating the language “felt like” was that I had been possessed by a supernatural creature — say, the dragon from the Dragon Book. I spent every spare minute thinking about language features and next implementation steps, because I wanted to.

The first “real program” I wrote in Noulith was to brute force constructions for The Cube, for last year’s Galactic Puzzle Hunt in early August, and it worked unexpectedly well. I wrote a for loop with a 53-clause iteratee and the interpreter executed it smoothly. Eventually I realized that the language could expand into other niches in my life where I wanted a scripting language. For example, I did a lot of Cryptopals challenges in it. It would take a month or two before it dawned on me that the same compulsion that drove me to create this language would drive me to do Advent of Code in it. That’s just how it has to be.

This post details my thought process behind the design of this language. Some preliminary notes:

  • I made a lot of unusual choices with this language, but none are particularly “deep” language features like Rust’s ownership checker, Mercury’s determinism checks, or Pony’s reference capabilities (three examples lifted verbatim from “So You’re Using a Weird Language”). The immutability semantics are somewhat interesting, but still don’t have as far-reaching implications. To the extent the language breaks any new ground, it’s probably at the boundaries of taste in adding syntax sugar. Still, syntax is fun.
  • A lot of the decisions I made are deeply entangled with each other. I sort of try to string them together into a linear narrative for presentation’s sake, sometimes also pretending that I researched how a bunch of other languages approached the same decision before making it myself, but the existence of such a narrative is mostly fictitious.
  • Pixel’s syntax across languages page was immensely useful.
  • Noulith was meant as a personal programming language first, deeply informed by and optimized for how I, specifically, think about and write code. I think of it as a “home-cooked” programming language, a la Robin Sloan’s home-cooked app. I didn’t create this language with the expectation or hope that even a single other person in the world would want to learn it; the amount of interest it briefly garnered was a (mostly) pleasant surprise. I also didn’t intend for this programming language to work well for programs that are longer than 100 lines or so, even when written by me. My best-case scenario is that one of the weird syntax experiments I did with this language vaguely influences a better thought-out feature in a major programming language.
  • There are two concepts from interface design, internal consistency and external consistency, that are pretty obvious in hindsight but that I found useful to explicitly refer to below. Internal consistency refers to similar things within a single application working in similar ways, while external consistency refers to things in one application that are similar to things in other applications working in similar ways. Both are desirable since they make it easier to learn to use the application: internal consistency means that users can learn things from one part of your application and apply them to another, while external consistency means that users can apply knowledge they may already have from other applications. But they can come into conflict with each other and with other desiderata.

    So for example, internal consistency favors giving two built-in functions to append and prepend an item to a list names that are clearly related, so programmers who remember one can easily remember the other; while external consistency favors copying those names from an established programming language if possible, so programmers coming from that established language already know those names.

    All this is relevant because of the sometimes underappreciated consideration that a programming language is a user interface! I think this perspective is easy to lose sight of because “programmers” and “users” are usually different groups of people, but for a programming language, the user is the programmer writing code in it, distinct from the programmer implementing the language.

  • This post is too long — to quote the Mark Twain apology, I didn’t have time to write a short one — and as I finished it I realized that half of its raison d’être is just to provide an excuse for me to incidentally mention a bunch of interesting features and corner cases of other programming languages. So if you’d rather just read that, I collected most of the fun facts into an ending section.

Literals, identifiers, and data types

First things first. On a character-to-character, token-to-token level, what does the language look like?

There are a few questions that are too basic to be interesting, such as what numeric and string literals look like. This doesn’t have much impact on the rest of the language, so I just copied a bunch of common syntaxes. For numbers, other than the obvious decimal ones, I threw in binary 0b literals, hexadecimal 0x literals, arbitrary-radix 36r1000 literals, scientific notation 1e100 literals, and complex number 1i or 1j literals. I even added base64 literals for kicks. Strings can use either single or double quotes, essentially what Python has. Were I to add, say, ternary literals, more flavors of triple-quoted or raw strings, or a bunch of special escape sequences, nothing else would have to change and there would be nothing to say about their design.

Identifiers have a bit more depth. Like most languages, most Noulith identifiers consist of a letter (including _) followed by any number of alphanumeric characters. From Haskell I copied the ability for such identifiers to also contain (but not start with) apostrophes, which I think looks neat for denoting a new version or modified variant of a variable, like the prime symbol in math. Much more questionably, I also gave ? the same treatment, with the goal of connoting variants of functions that return null instead of erroring. In hindsight, I should perhaps not have muddled up the lexical syntax so much; a different convention, like a trailing _ on alphanumeric identifiers, might have sufficed. Separately, Noulith supports identifiers consisting of only symbolic characters as well, also like in Haskell. We’ll discuss how the parser treats them later.

I also had to think about the basic data types we want to support, but before that I had to decide whether Noulith would be statically or dynamically typed. I like static types, but only if they’re sufficiently expressive and supported by good inference, and I like not having to implement any of that stuff even more, so I settled for dynamic typing.

I won’t list all the data types I ended up with here, but some of the basic ones are null, numbers, strings, lists, and dictionaries. Though null has a justifiably bad reputation, it’s hard to avoid in a dynamically typed language; it’s too useful as, for example, the return value of functions that don’t explicitly return anything. Notable omissions are booleans, sets, and any kind of dedicated error/exception type. I don’t think they’re bad things to have in a language, I just thought they were somewhat easier to work around than to implement, and I couldn’t be bothered to put in the work:

  • Instead of true and false, you can just use the numbers 0 and 1, which is close to how C and Python do it.
  • Instead of sets, you can just use dictionaries where the values are null, so {a} == {a: null}. This still works well because in can just test for the presence of the key in a dictionary, a behavior also exactly like Python.
  • Instead of dedicated error types, you can just use… non-dedicated data types. You can compose a string with an error message and throw it. I don’t like this state of affairs — I think having dedicated or at least more structured error types really is a good idea, maybe purely for the principle of the thing, but design and implementation both take effort, and it’s hard to argue for prioritizing this when I only use Noulith for short throwaway scripts.

I didn’t think hard about any of these decisions, but they had consequences we’ll discuss later. For the syntax of lists, I chose to use square brackets [], and for dictionaries, curly brackets {}, yet again exactly like Python. This also has the benefit that valid JSON is valid Noulith.

Finally, with regard to variable scoping, Noulith has a simple approximation of lexical scoping, but names are not namespaced or qualified in any way. All built-ins live in the same global namespace. This is another thing that’s bad but low priority.

Operators and functions

Things should get more interesting from here. Next up: how do you perform basic arithmetic operations? I’m used to adding two numbers like x + y. There are alternatives: in Lisps, for example, arithmetic operations are called prefix like (+ x y) for homoiconicity; in stack-based languages like Forth and GolfScript, they’re called postfix like x y +. Both approaches also make parsing much easier. However, I decided either alternative would pretty fundamentally slow me down as I tried to translate ideas into code, so I stuck with the mainstream: basic arithmetic operations are infix.

Similarly, I decided that prefix unary minus was required. Which means that the - operator, if nothing else, has to be callable as either a prefix unary operator or an infix binary operator. We’ll return to this later.

Okay, what about function calls? There is again a popular syntax: foo(bar, baz). The main alternative is simply juxtaposition (with heavy currying so that this does the right thing), as in Haskell and the MLs (OCaml, SML, F♯, etc.): foo bar baz. A smaller deviation is to support the popular syntax but also allow the parentheses to be omitted, as in Perl and Ruby: foo bar, baz.

Using mere juxtaposition as function invocation sort of conflicts with binary operators, which are just two arguments juxtaposed around an operator: is x + y calling addition on x and y, or calling x with arguments + and y? Most languages don’t have this problem because they have a fixed set of binary operators that are totally distinct from identifiers, but I wanted to be able to add a lot of operators without enshrining them in the syntax or needing a lot of boilerplate in the implementation. Haskell and the MLs resolve this conflict by parsing identifiers made of operator symbols, like +, as binary operators, while parsing identifiers made of alphanumerics, like x and y, as functions to be called with juxtaposition. So, something like a b + c d is parsed as (a(b)) + (c(d)). However, the approach I ended up liking the most is Scala’s, whose parser doesn’t draw this distinction between kinds of identifiers (except to determine precedence, which we’ll come back to later; and its lexer does draw this distinction, as does Noulith’s, so that x+y is three tokens while xplusy is one). Scala’s grammar just says that a b c is always a binary operator invocation where b is called with a and c.

Well, Scala actually says that operators are methods: the b method of a is called with c as its sole argument. But I didn’t particularly want methods in my language, as they seemed like an unnecessary layer of abstraction for my goals. So in Noulith, b is looked up in the same scope as the identifiers around it. One can view this as combining Scala’s approach with Uniform Function Call Syntax, seen in languages like D and Nim.

Why is this approach great?

  • It’s simple: after identifiers are lexed, the parser doesn’t need to know their type.
  • It’s good for compositionality: it becomes easy to pass operators to other functions, like zip(+, list1, list2).
  • And, well, it fits my personal taste: I like being able to use alphanumeric identifiers as infix operators, which we’ll talk about more in a bit. (You can have special syntax for doing so, like Haskell’s backticks, but I thought that was ugly for something I wanted to use extensively.)

But there’s a wrinkle we have to come back to. I already mentioned I wanted to support unary minus, so -a should be the negation of a. But then how should an expression like - - a be parsed? Is it calling the middle - as a binary operator on the operands - and a flanking it, or applying unary minus twice to a? I still didn’t want to make - special in the syntax, so I decided I was okay with requiring parentheses to express the second intent, as in -(-a), and saying that (a b) is a sort of special case where juxtaposition expresses a function call, in which a is called with one argument, b.

On the other hand, I enjoy partially applying operators a lot. They’re useful for passing into higher-order functions to produce neat expressions like (Advent of Code Day 7) some_list filter (<= 100000) then sum to sum all numbers in a list that are at most 100000. This syntax I wanted to support is taken from Haskell, but it also conflicts with unary minus. Is (-3) the number “negative 3” or the function that subtracts 3 from its input? Haskell resolves this by specifically carving out a syntactic special case for -; it’s the only operator for which (-x) doesn’t partially apply the first function in the juxtaposition. For every other Haskell operator, say +, (+x) is a partially-applied function that, given an argument a, returns a+x. I chose to emulate this behavior by still having juxtaposition of two expressions mean unary function application, but then just making most built-in functions support partial application when called with one argument — but not -.

On the gripping hand, I also decided to emulate Scala here and also offer the “section” _ + x, which is also a function that, given an argument a, returns a + x. These are strictly more powerful (e.g., for reasons explained later, 0 < _ < 10 is also a valid “section” that checks whether its one argument x is between 0 and 10 — unlike Scala, where this wouldn’t work because it parses as comparing the lambda 0 < _ to 10), at the cost of requiring at most two extra characters, so the argument for having both these and functions rampantly supporting partial application is much weaker. Still, for now, I’m keeping both syntaxes out of inertia.

On the fourth hand, Haskell also allows partially applying functions on the other side of binary operators. For example, (3-) is the function that subtracts its argument from 3. Noulith copies this syntax too by decreeing that, if a is not a function but b is, then (a b) is b partially applied with a as its first argument. This heuristic is wrong when both a and b are functions: for example, <<< is the function composition operator, so that (f <<< g)(h) is f(g(h)), but if you try to postcompose sin onto another function as (sin <<<), it won’t work. This special case is easy to work around because you can write (>>> sin) instead, but it’s definitely a downside.

Before we spend some time looking at the implications of making everything an infix operator, I’ll mention that Noulith doesn’t (currently) support named arguments. It’s one of those things that I think would be nice to have, but isn’t a priority because it matters more in longer, more structured programs, and it also comes into slight tension with a heavily functional style. One way I’d characterize the allure of named arguments is that they’d let you ignore, for example, which of the following two definitions a function was defined with, and use them the same way:
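
(A hypothetical Python sketch of the kind of thing I mean — the two definitions differ only in parameter order, and the names are made up.)

def foo(width, height): ...
def foo(height, width): ...   # same behavior, parameters in the other order

# With named arguments, callers could write foo(width=3, height=2)
# and not care which definition was in effect.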

Unfortunately, the difference does matter if you want to map or zip with foo. To keep ignoring it, either you’d have to wrap foo in a lambda to plumb the right inputs to the right named arguments every time, which loses much of the elegance of functional programming, or you’d have to make all these higher-order functions take the names of arguments to use when invoking the functions you give them, which I think is annoying to implement and to use. Still, you could imagine a language that takes that plunge. Perhaps language support at a more fundamental level would make everything work out.

Coding with and without infix functions

As I previously alluded to, I also like making everything an infix operator so I can call functions like map on a list by typing them after the code that creates that list. This matches how I write code mentally: “I have this data, I’ll transform it in this way, then transform it in that way, then apply some final function and I’ll have my answer.” At each step I keep in my head what shape the data is in and decide what transformation I want to apply next.

To give a more concrete example, I’ll walk through 2022’s first day of Advent of Code. If I were to do it in Python, I’d think to myself: okay, the puzzle input is a sequence of “paragraphs” (the name I mentally give to blocks of text separated by double newlines), so let’s break it up into such:
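
Something like this, assuming the raw input has been read into a string (the file name here is just a placeholder):

puzzle_input = open("input.txt").read()
paragraphs = puzzle_input.split("\n\n")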

“Now for each paragraph we want to get all the ints from it…” Like many leaderboarders, I have a prewritten function ints that extracts all the integers from a string with a simple regex, but to use it I have to move my cursor to the start of the expression, type map(ints,, then move my cursor back to the end to add ).
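
Mine is roughly the following sketch (the exact regex varies):

import re
def ints(s):
    # Extract every (possibly negative) run of digits as an integer.
    return [int(x) for x in re.findall(r"-?\d+", s)]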

“Then we want to sum all the integers in each paragraph…” Back to the start of the line, map(sum,, then back to the end, ).

“Finally take the max…” Rinse and repeat.
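
In Python the finished expression ends up something like this, with the steps nested inside out (reusing ints and puzzle_input from above):

answer = max(map(sum, map(ints, puzzle_input.split("\n\n"))))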

That’s six cursor jumps to write this simple four-step expression. Jumping to the start of the line is a relatively easy text editor operation, but if I were writing this expression to assign it to a variable, locating the start each time would be less fun. A language could avoid the cursor jumps back to the end of the line by making parentheses optional as in Perl or Ruby or something, but it would still force me to write the ints map, the sum map, and the max call right-to-left, in the opposite order from how I thought of applying them. A complete solution to this issue has to make functions like map and sum callable postfix of the sequence being mapped or summed. This could be done by making them methods of lists, puzzle_input.split("\n\n").map(ints), or by providing operators like |> in F♯ and Elm. But our Scala-inspired solution not only achieves this, it dispenses with nearly all of the punctuation! Here’s the actual Noulith from my Day 1 solution this year, where you can see the tokens in the same order as the steps in my thought process above.

puzzle_input split "\n\n" map ints map sum then max

One downside of this syntax is that it only supports calling binary operators, i.e., combining the expression you’re building on with exactly one other argument. However, this is easily extended to support unary operations with a built-in function that just performs reverse function application, as seen above with then max. Noulith provides two such built-ins, then and . (which have different precedences): a.b and a then b are both just b(a). It’s less obvious how to chain functions that take three or more arguments, but some language decisions we’ll see in the next section actually make it pretty reasonable (not to mention that, as I observed in my earlier post about code golf, functions that “naturally” take three or more arguments are surprisingly rare).

Before we move on, I want to point out that “being able to write code from left to right without backtracking” is a completely bonkers thing to optimize a programming language for. This shouldn’t be anywhere in the top hundred priorities for any “serious programming language”! Most code is read far more often than it’s written. An extra keystroke here or there is just maximally insignificant. Fortunately, Noulith is not a serious programming language, so I have no qualms about optimizing it for whatever I want.

Operator precedence and chaining

Here’s something we haven’t discussed: what’s the precedence of binary operators? Is an expression like a + b * c evaluated as a + (b * c) or (a + b) * c, and why?

There are quite a few options. Most languages just settle this with a big table, e.g., here’s C++’s operator precedence, but this won’t work for a language like Noulith that supports using arbitrary identifiers as operators. In OCaml and Scala, precedence is based on a similar table that classifies all identifiers by their first character: so, for example, every operator whose name starts with * binds more tightly than every operator whose name starts with +. You can also make this more customizable: in Haskell, you can declare the precedence of operators as you define them with fixity declarations, while in Swift (via, via), you can declare “precedence groups” and assign infix operators to them, and each group can state whether it binds more or less tightly than other groups. While these approaches are neat, they complicate the parsing story quite a bit. You need to parse earlier code to the extent that you know each operator’s precedence before you can parse later code correctly, whereas I wanted to implement a simple parser that didn’t need to think about global state. Finally, some languages like Smalltalk and APL (and APL descendants) dispense with precedence entirely: all binary operators associate left-to-right in Smalltalk and right-to-left in APL, which means you can’t rely on the precedence for arithmetic operators and equality that you learned in math class. I think getting used to that isn’t too bad, but decided it was still worth trying to avoid.

Alongside this question, though, I was considering an even more difficult goal: I wanted to be able to chain comparisons like in Python, e.g., 0 <= x < n. This kind of testing whether something is in range is common, and having to write expressions like 0 <= x && x < n annoys me, especially when x is a complicated expression I don’t want to write twice or stick in an intermediate variable. It’s also an extra opportunity to make a mistake like 0 <= x && y < n — I’ve written these bugs and struggled to find them before. So, how could I add this syntax feature?
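
For reference, here’s the Python behavior I wanted to replicate:

x, n = 5, 10
print(0 <= x < n)        # True: one chained comparison
print(0 <= x and x < n)  # equivalent, but x has to be written twice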

Syntax support for chained comparisons is rare among programming languages because it’s “pure syntax sugar” that doesn’t let you write more interesting code (despite my complaints, stashing the middle expression in a variable isn’t a big deal) and is just generally unpleasant to parse. After Python, I think the most well-known languages to support chained comparisons are Raku and CoffeeScript. I also learned that there’s a C++ proposal to add them, though it doesn’t seem likely to go anywhere. I briefly worked with a Mavo implementation that bolted comparisons on top of a parse tree from a library. But all of these languages achieve this goal by privileging comparison operators in the syntax, whereas I wanted them to be parsed the same way as every other symbolic identifier.

While researching this further, I found a really neat means of support in Icon (via), where comparison operators are left-associative in the normal way, but “just work” as follows (based on my understanding after reading the Icon documentation for two minutes):

  • Expressions either “succeed and produce a result” or “fail”.
  • If a comparison is true, it succeeds with its right-hand side as its result. Otherwise, it fails.
  • Control flow statements test whether an expression succeeds rather than what its result is.

So in Icon, a chained comparison a < b < c is evaluated by first evaluating the subexpression a < b; if a is less than b, this simplifies to b, and then b < c is checked; if either comparison isn’t true, the expression fails. If both comparisons pass, the expression evaluates to c, but that doesn’t matter, because the only important criterion is whether the expression succeeded. While this is cute, I didn’t want to overhaul what “evaluating an expression” means in Noulith to include an additional success/failure status, just to allow chaining comparisons. Not to mention, I enjoy having the option to treat the truth value of a comparison as an integer, e.g., to index into an array or to sum in a loop. I’m not aware of any other programming languages that support chained comparisons without privileging them in the syntax (except perhaps in some really abstract sense where code can change how subsequent code is parsed, like in Coq or something).

Basically, I wanted a parsing strategy that could handle expressions like @ = <=; @@ = <; a @ b @@ c. If I parse a @ b @@ c as a tree of binary operator invocations, with one nested under the other, I’ve already lost. There’s no way to recover what was really meant. Consider, for example:

switch (random_range(0, 3))
case 0 -> (@, @@ = <=, <)
case 1 -> (@, @@ = +, *)
case 2 -> (@, @@ = *, +);
print(1 @ 2 @@ 3);

There’s simply no way to know which of @ and @@ binds more tightly until the random number has been generated, long after the code has been parsed. So I concluded that Noulith had to parse a @ b @@ c as a flat list of three operands and two operators, and deal with precedence at runtime. In brief, here’s what happens: every operator function is examined at runtime to decide whether it “chains” with the next operator to produce a single operator invocation subexpression, and then to decide which operators bind the most tightly.

From there, it was easy and natural to make operator precedence accessible and mutable by users. Without thinking too hard, I threw it under a string key "precedence" just to get something working, so I could take a cool screenshot and post it on Twitter. Then it stayed there out of inertia. Here’s a remake of that screenshot with the latest syntax and highlighting.

[Screenshot of a terminal: a REPL session in which two arithmetic operators and their precedences are swapped, and this is shown to affect the parsing and return value of a function using those operators.]

While this is probably deeply disturbing to any parser enthusiasts out there, it opens up the field for us to easily add chaining support to basically any operator, and there are actually some more “good” side effects of this!

  • Cartesian product and zip operators can behave more nicely with three or more operands. If zip were a normal left-associative binary operator, then the result of [1, 2, 3] zip [4, 5, 6] zip [7, 8, 9] would start with [[1, 4], 7]. But by allowing zip to recognize when you’re immediately zipping its output with another sequence, you can produce a result that starts with [1, 4, 7]. The only other language I’ve seen that supports something like this is TLA+’s Cartesian product ×, though I have no clue how to search for this kind of syntax in other programming languages.

  • Runs of binary operator invocations can naturally include functions that take more than two arguments. By saying that replace chains with with, I let you tack replace b with c onto the end of a run of binary operators.

  • Finally, functions can have “optional arguments” while still being called in the same binary operator style. By saying that to and til chain with by, I allow the expression 1 to 10 by 2 without affecting the meaning of 1 to 10. (Scala achieves the same effect without parsing shenanigans by having ranges be aware of what kind of range they are and supporting by as a method.)

Another implementation detail of note is that Noulith precedences are floating-point numbers. I thought this was natural because it seems that every programming language with only a few precedence levels, like Haskell’s 10, eventually gets complaints that there’s no room to fit an operator’s precedence between two existing ones. Some languages hedge by leaving gaps, the way BASIC programmers spaced out their line numbers in the 1970s (or so I’m told) and CSS developers space out their z-index values, just in case you need to insert something in between later: Coq uses precedences from 0 to 100, with defaults mostly at multiples of 5 or 10; Prolog, from 0 to 1200 in multiples of 50 or 100; Z, at multiples of… 111? But floating-point precedences let you leave finer gaps with less foresight. I imagine other languages don’t do the same for reasons along the lines of: the semantics of floating-point numbers are too complicated and unportable for a core feature of language syntax to depend on them. (What if an operator’s precedence is NaN?) I can sympathize with this a lot, but as I have no ambitions for Noulith to become a language with a formal specification, I didn’t mind.

Finally, I should mention the standard boolean “operators” and and or. These operators are, and have to be, special in most programming languages because they need to short-circuit — in an expression like a and b, if a evaluates to something falsy, then b is not evaluated, which is important for both efficiency and correctness. For example, you can check whether an index is in bounds for an array on the left side of an and and then perform the actual indexing on the right; without short-circuiting, the indexing would still be attempted when the index is out of bounds, causing an error. and and or can be normal functions/operators in some languages with easily accessible lazy evaluation like Haskell, or normal macro constructs in other languages like Lisps. Unfortunately, Noulith lacks both facilities, so its and and or do have to be language constructs. As in Python, these expressions return the last or first truthy expression they encounter (e.g., 2 and 3 is 3 instead of just “true”), enabling them to emulate conditional expressions in some contexts. I also added the SQL-inspired coalesce, which is similar to or but only rejects null as its left operand, with the vague idea that it could be used in more precise “default value” setups, but I barely ended up using it. (However, not doesn’t need any special behavior, so it’s just a normal function.)
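
In Python terms, the behavior I’m describing looks like this:

xs = [10, 20, 30]
i = 5
print(i < len(xs) and xs[i] > 0)  # False; the out-of-bounds indexing never runs
print(2 and 3)                    # 3: the operands themselves are returned
print(0 or "default")             # default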

Variables, statements, and blocks of code

We’re finally graduating from expressions to statements. First up: how do you declare a variable? I was just going to copy Python at first and use a simple = for both declaration and assignment, but then I read the Crafting Interpreters design note on implicit variable declaration and was thoroughly convinced, so I started looking for a syntax to distinguish them.

In some statically typed languages (mostly C/C++/C♯ and Java), variable declarations start with the variable’s type simply juxtaposed with its name. I’m not sufficiently invested in static types to want this, but even if I were, since I already decided that juxtaposition would be function invocation, trying to copy this exact syntax basically means that Noulith has to immediately be able to tell whether it’s parsing a type or an expression when starting to parse a statement. This is doable through a strategy like saying that types have to be capitalized or something, but… it’s complicated.

Still, there are many other viable choices. let? var? my? Heck, I could spell out variable as in Ceylon. In the end I landed on using :=, sort of like Go or even Pascal, both for succinctness and because I realized I liked the option of being able to declare types sometimes (like Python 3 annotations, as used by type checkers like mypy): conveniently, a declaration like a := 3 can be seen as a special case of a declaration like a: int = 3 where the type is omitted, which Noulith also supports. Of note is that Noulith checks the values assigned to typed variables at runtime, so the following errors:

a: int = "hello"

As does this:

a: int = 6;
a = "hello"

This is weird and silly — normally you don’t want type annotations to have any runtime cost, much less every time you assign to an annotated variable — but it catches some bugs and is way easier to implement than a static analysis pass, plus it’s consistent with a more reasonable behavior for typed patterns in pattern matching, which we’ll talk about much, much later.

Another advantage is that by thinking of x: as a generic lvalue (crudely, a “thing that can be assigned to”), this syntax naturally generalizes to single assignments that simultaneously assign to an existing variable and declare a new one: x := [3, 4]; (a:), b = x. (Go’s short variable declarations are somewhat magic here: you can use a single := to simultaneously declare some new variables and assign to some old ones, as long as at least one variable is new. I think this is slightly inelegant, and I sometimes daydream about Evan Miller’s proposal whereby you must write exactly as many colons as variables you’re newly declaring. But as my gripes with languages go, it ranks pretty low.)

Also unlike Crafting Interpreters, I don’t allow redeclaring a variable with the same name in the same scope. The book makes a really good point that this is annoying for REPL usage, where programmers might just want to use and reuse variable names without mentally tracking which ones have been declared so far. I haven’t made up my mind here yet, so redeclarations are banned for now, mostly because it’s easier to make rules laxer than stricter as the language develops, but I suspect I’ll end up lifting this restriction at some point.

Next: how are statements and blocks of code (for control flow branches, e.g.) delimited? I used to like indentation-based structure a la Python, the idea being that, since you want your code to be indented to reflect its structure for the human reader anyway, having your language also require braces or other delimiters is redundant. However, I’ve learned to appreciate that redundancy is not inherently bad, and control flow that’s delimited with only indentation is actually pretty annoying to refactor. If you have a block of nested code that you want to move around, you have to track its indentation much more carefully than you would need to if there were explicit delimiters. For example, suppose I wanted to inline the call to f in this Python code:

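Something like this (a made-up stand-in, since only the shape matters; note that f’s parameters happen to match the names at the call site):

def f(a, b):
    print("first", a)
    print("second", b)

for a in range(10):
    for b in range(10):
        f(a, b)
        print("pair done")
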
I’d try copying the body of f to where I want it to go, replacing the call to it, which seems like it should work because its arguments and parameters are exactly the same. Uh-oh:

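A careless paste might leave the copied lines at the wrong indentation, something like this (no longer valid Python):

for a in range(10):
    for b in range(10):
print("first", a)
print("second", b)
        print("pair done")
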
This code is currently broken, and to fix it I have to indent the two lines I copied exactly twice, while taking care not to indent the lines next to them. This is an exaggeratedly simple case, but the block of code being transferred might have its own internal indentation or other internal details that need to be kept track of, like parameter names that need to be changed, making the transfer much trickier. Meanwhile, in a similar language with braces, the copy-pasted code would be syntactically and semantically correct with no extra effort, and its indentation could trivially be fixed by any competent text editor.

To defend the indentation decision, I’d say that this situation is rare and that the right way to avoid it is to avoid deeply nested code in the first place, or just to get better editor support (I haven’t spent enough time in large Python projects to look into more sophisticated tooling, but I assume it exists). I’d also point out all the other costs of the braces-based decision, such as the blank lines with only }s in them. I don’t think this is a terrible defense — deeply nested code is often worth avoiding. But I wanted Noulith to support code without a lot of effort put into structuring it and breaking things into functions, so I chose to stick with explicit delimiters.

What delimiters, though? Unusually, I ended up using more parentheses, rather than the far more common curly braces, because I found the simplicity of not distinguishing expressions and statements pretty appealing. Scala (at least, version 2) is one language where some blocks can be written with either parentheses or curly braces, which are similar but have subtly different semantics, and I didn’t want to think about that. This led me to follow C/C++/Rust and always require statements to be separated by semicolons, because if any expression can be a series of statements, and if a language’s syntax is as flexible as this in other ways, it’s really hard to guess when a newline is meant to end a statement. Other languages can say that line breaks don’t count inside parentheses, or have even more complicated rules for automatic semicolon insertion; but the flexibility of Noulith syntax means code like the contents of the parentheses below really could make sense as one big expression or as two expressions (the latter of which calls * with one argument to partially apply it).

x := (2
# hello
* 3)

All this does make Noulith’s parser incredibly bad at recovering from mistakenly omitted semicolons, which is one reason I’d wholeheartedly disrecommend that anybody try to write Noulith programs that are bigger than quick-and-dirty scripts. It’s probably too late to fix this at this point, and in hindsight, perhaps I should have thought a bit more about alternatives before allocating both square and curly brackets to literals. Still, I don’t know if I’d have decided any differently. I like all the other features I got for this tradeoff.

Control flow

Having discussed most of the decisions surrounding simple expressions and statements, we can turn our attention to control flow structures.

A basic syntactic issue most languages have to grapple with: in the syntax for a construct like if condition body or while condition body, you need some way to determine where condition stops and body begins. There are a couple of options:

  • You could use a keyword, like if condition then body (e.g. Haskell, Ruby) or while condition do body (e.g. various POSIX shells, Scala 3).
  • You could use punctuation, like if condition: body (e.g. Python).
  • You could require parentheses (or some other delimiter) around the condition, like if (condition) body (e.g. C/C++, Java, JavaScript).
  • You could require braces (or some other delimiter) around the body, like if condition { body } (e.g. Go, Rust). (Note that this only works if legitimate conditions never contain the delimiter, so doing this with parentheses wouldn’t work in Noulith and most other languages.)

I partly locked myself out of considering the last option by allocating curly brackets to dictionaries, but I think that for my use case, I still preferred the old-school C-like solution of parenthesizing the condition because I often wrote nested control structures whose bodies were long but comprised only a single expression. In such cases, I thought it was less mental load to type the closing delimiters sooner. For example, I thought this:

if (a) for (b <- c) if (d) e;

looked neater and easier to write than this:

if a { for b in c { if d { e }}};

I also copied from Scala/Rust the ability to use if/else constructs as expressions, which just return whatever the last expression of the taken branch evaluates to, so you can write code like:

print(if (a < b) "a is less" else "b is less or equal")

Semantically, this construct (and all others that care about “truth value”, e.g., filter predicates) determines truthiness just like Python, where 0 (which false is a synonym for), null, and empty collections (lists, strings, dictionaries, etc.) are falsy and all other values are truthy. This is another choice I made without much thought, and it isn’t at all the only plausible one — you could, for example, consider 0 truthy like Ruby and most Lisps, or consider empty lists truthy like JavaScript. You could consider the string "0" falsy like PHP and Perl. You could consider everything other than true false, like Dart. If you want to be really adventurous, you could consider integers truthy iff positive, like Nibbles; or iff equal to 1, like 05AB1E; or iff they’re ≥ 0.5, like in Game Maker Language (in some contexts?). The Pythonic rule makes sense to me in that it does something useful for most data types, but I suspect that this is mostly just because I’m used to it.
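
A quick Python refresher on the rule being copied here:

for value in [0, 1, -1, "", "0", [], [0], {}, None]:
    print(repr(value), "->", bool(value))
# 0, "", [], {}, and None are falsy; everything else here is truthy,
# including the string "0" (unlike PHP/Perl) and nonempty lists.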

On ternary expressions

I have to go on another mini-rant here. Ternary expressions are an important feature of programming languages to me, and I’m still annoyed that Go doesn’t have them. Critics say they’re confusing and can always be replaced by if-then-else statements — code like:

var foo = bar ? baz : quux

can always be rewritten as:

var foo
if (bar) {
    foo = baz
} else {
    foo = quux
}

That’s six lines instead of one. Now, I try not to let my code golf tendencies seep into other contexts, except I think six lines instead of one is an unacceptable amount of verbosity and actually makes the code much harder to read, particularly in cases when all the constituent expressions are really that short. The distance between foo’s declaration and initialization also means that readers have to deal with the mental load of worrying “is foo going to be initialized?” while reading this code.

One might propose the shorter four-line alternative in response, which often works:

var foo = quux
if (bar) {
    foo = baz
}

Even ignoring the cases where evaluating quux has side effects that break this rewrite, what I don’t like about this code is that to readers, the first statement var foo = quux is a lie. Semantically, it looks like the code is stating an unconditional fact that foo should be defined as, or at least initialized to, quux; so if quux is a complicated expression, readers might be mulling over how to understand that fact. For an example (taken straight from Noulith itself), say I was implementing an interpreter that took one command-line argument, which could be either a filename or a literal snippet of code, depending on a flag. The one-branch if might look something like:

var code = arg
if (flag) {
    code = open(arg).read()
}
// lex, parse, and execute code...

arg could be a filename, in which case it’s definitely not a snippet of code. If a reader already knows this, or perhaps guessed it and is skimming the code to verify whether it’s true, and they read the line var code = arg, they’ll stumble. Of course, they’ll probably figure out what’s going on if they keep reading two more lines, but why allow this confusion to occur in the first place?

I can, however, sympathize with believing that ? : is too cryptic, so I most prefer Rust and Scala’s approach of just allowing the entire if/else construct to be an expression, permitting code like:

code := if (flag) {
    open(arg).read()
} else {
    arg
}

This is honest and avoids ever suggesting to readers that code is unconditionally something it’s not. It’s also easier to fit on one line (though linters might complain).

Loops

With if/else out of the way, we can move on to loops. Noulith has while loops, which are pretty unremarkable, but no do ... while loops or infinite loops yet. The for loops (which are all “for-each” loops) are more interesting, though, and are one of the few features that I added under one syntax, worked with and wrote code using for a long time, and then went back to change the syntax of. Specifically, I started with the C++/Java-style for (a : b), plus the quirky generalization for (a :: b) for iterating over index-value or key-value pairs. But eventually I concluded this interfered too much with wanting to use : for “type annotations”, so I swapped out the separator after the iteration variable for <-, as in Haskell and Scala. (in as in Python and Rust was not a serious contender because I preferred to allocate it for use nonsyntactically as a function; design choices so far prevent it from doing double duty. I didn’t want something :=-based as in Go just because the symbol := doesn’t suggest that to me.) I also copied from Scala a feature I use a lot in search-type scripts, allowing multiple iterations in a single loop, as well as if guards.

for (a <- as; b <- bs; c <- cs; d <- ds; if cond(a, b, c, d)) e

Also from Scala I copied the ability to turn this into a list comprehension:

for (a <- as; b <- bs; c <- cs; d <- ds; if cond(a, b, c, d))
  yield e

Finally, inspired by a few Discord conversations, I also allow dictionary comprehensions:

for (a <- as; b <- bs; c <- cs; d <- ds; if cond(a, b, c, d))
  yield k: v

I don’t have much more to say about these loops, except perhaps to note that they really are just for iteration, instead of being syntax sugar for monads or anything like that.

Structs

This is a short section because this was a last-minute addition and I haven’t really used it much yet, but Noulith supports structs, which are super bare-bones product types.

struct Foo(bar, baz);

Each instance of Foo has two fields. The variables bar and baz are “reified fields” that can be used as member access functions, and also used to assign or modify the fields with the same indexing syntax as everything else.

foo := Foo(2, 3);
bar(foo); # evaluates to 2
foo.bar; # just function application, evaluates to 2 for the same reason
foo[bar] = 4;
foo[bar] += 5;

The most notable aspect is that bar and baz are actually just newly defined variables holding these field objects, and are not namespaced under the struct Foo in any way. Noulith will not let you define another struct with a field named bar or baz (or any other variable with either name) in the same scope. This was mostly the lowest-effort way I could think of to get usable structs into the language, and the only thing I’ll say in defense of this design is that Haskell record fields hogged their names in much the same way until maybe 2016, when GHC 8 introduced DuplicateRecordFields, and it is still experimenting with language extensions like OverloadedRecordUpdate. So I’m allowing myself at least twenty years to figure out something better.

Pattern matching, lvalues, and packing/unpacking

Noulith has switch/case for basic pattern matching. (Example lifted from Python’s pattern matching tutorial.)

switch (status)
case 400 -> "Bad request"
case 404 -> "Not found"
case 418 -> "I'm a teapot"
case _ -> "Something's wrong with the Internet"

(A syntactic observation: because we have the case keyword and because switches don’t make sense without at least one case, the parentheses around the switch argument aren’t necessary like they are with if or while. Noulith’s parser still requires them for now for consistency, but perhaps I should lift this requirement…)

Unlike some comparable constructs in other dynamic languages, Noulith’s switch expressions error out if no cases match, although there’s a strong case to be made for doing nothing and returning null. This is a change I made during Advent of Code after writing too many bugs caused by mistakenly omitted default cases.

Besides testing for equality with constants, pattern matching can destructure/unpack sequences:

switch (x)
case a, -> "one"
case a, b -> "two"
case a, b, c -> "three"
case a, b, c, ...d -> "extra"

One gotcha, shared with many other languages’ pattern matching, is that variable names in patterns always bind new variables, whereas sometimes you want to test equality against a previously defined variable. This code, for example, will not do what you want. The pattern will always match and define a new variable named not_found equal to x.

not_found := 404;
switch (x)
case not_found -> "Not found"  # pattern will always match

Scala and Rust both let you work around this by supporting constants that are syntactically distinct from variables; Python supports “constant value patterns” that must be dotted, which I think is fortunately common. Noulith’s slightly more general workaround is the keyword literally, which turns an expression into a pattern that evaluates the expression and checks for equality.

not_found := 404;
switch (x)
case literally not_found -> "Not found"

Patterns can also check the type of values at runtime (which is why this check also happens when declaring variables):

switch (x)
case _: int -> "it's an int"
case _: float -> "it's a float"

To implement the analogue of many languages’ even more general patterns, “pattern guards”, which let you check arbitrary predicates, you can manufacture arbitrary types with satisfying (which is a normal function). I’m not sure this is “right”, but it was easy.

switch (x)
case _: satisfying(1 < _ < 9) -> "it's between 1 and 9"

Notably missing is the ability to destructure custom structs, partly because I haven’t gotten around to it and partly because there are concerns about how this interacts with augmented assignment, which we’ll talk about much later.

In hindsight, I don’t know why I used the extremely old-school C/C++/Java switch keyword. match makes much more sense and is common today. Even Python adopted it. But it is what it is for now.

Anyway, my experience was that you don’t need a lot of features for pattern matching to be really useful. The trivial product type provided by sequences is enough to approximate sum types just by manually tagging things with constants. Also, pattern matching is just really useful for parsing Advent of Code strings. Day 7 (my full code) might be the best example:

switch (line.words)
case "$", "cd", "/" -> (pwd = [])
case "$", "cd", ".." -> pop pwd
case "$", "cd", x -> (pwd append= x)
case "$", "ls" -> null
case "dir", _ -> null
case size, _name -> (
    for (p <- prefixes(pwd)) csize[p] += int(size)
)

In languages without pattern matching, the easiest way to handle this might be to write a bunch of deeply nested if/else statements that look like the following, which is a pain to read, write, and debug:

if (a == "$") (
    if (b == "cd") (
        if (c == "/") ( ... )
        else if (c == "..") ( ... )
        else ( ... )
    ) else if (b == "ls") ( ... )
)

It happens that Day 7 is the only day on which I was first to solve either Advent of Code part, and I got first on both that day. Perhaps this was a factor?

However, Noulith’s pattern matching has its own issues. Here is a pattern that’s surprisingly tricky to support, which I only realized in mid-September:

switch (x)
case -1 -> "it's negative one"

Obviously, we want the case to match if x equals -1. The analogous pattern for nonnegative integers works with the simple, obvious rule: a value matches a literal if they’re equal. Unfortunately, -1 is not a literal — it’s a function invocation! Outside a pattern, it calls unary minus on the argument 1.

The only method to resolve that is to say that, when parsing a sample, - will get connected to the next numeric literal if one exists. Python’s sample matching, for instance, particularly permits - within the syntax for literal patterns — as does Rust, as does Haskell. As for Scala, its literal patterns are syntactically the identical as its literals, which embody a unfavorable check in each context. One cause this is smart for it however not the opposite languages I simply listed is that, courtesy of its Java/JVM lineage, the units of authorized constructive and unfavorable integer literal will not be symmetric as a result of they characterize two’s-complement machine phrases. Particularly, -2147483648 is a authorized Java/Scala expression, however 2147483648 by itself is a compile-time error. (Subsequently, so is -(2147483648)! I first realized this from Java Puzzlers.)

However returning to Noulith: having gotten this far with out privileging - within the syntax, I made a decision to attempt slightly tougher. Thus, I had sample matching “ask” the operate - the way to destructure the scrutinee into an interior sample. That’s, to see whether or not x matches the sample -1, Noulith resolves the identifier -, determines that it means negation in a pattern-matching context, negates x, and matches that towards the sample 1.

This means pattern matching like this works as well:

switch (x)
case -y -> print(x, "is negative", y)

This makes it easy to support a bunch of other, somewhat ad hoc patterns, like allowing fractions to be destructured into their numerator and denominator.

switch (f)
case x/y -> print("numerator is", x, "and denominator is", y)

Or checking for divisibility. Because we can.

switch (x)
case 2*k -> print(k, "pairs")
case 2*k + 1 -> print(k, "pairs with one left over")

But the most "evil" pattern-matching mode I've implemented is probably for the comparison operators. A pattern like 1 < y < 9 matches any number that's greater than 1 and less than 9, and binds that number to y. More generally, a chain of comparison operators with one variable matches any value that would satisfy those comparisons. But if the chain has more than one variable, it matches any list of that many values that would satisfy those comparisons if plugged in.

xs := [2, 7];
switch (xs)
case 1 < a < b < 9 ->
  "two strictly increasing numbers between 1 and 9"

This works because, before an expression is matched against a pattern, there is a preparatory pass through the pattern that evaluates literals and literally expressions and presents them to the function, so that any function asked to destructure something during the matching process knows which of its operands are known values and which are other patterns that it might send something downwards into. Also, functions determine their precedence and chaining properties as they would outside a pattern. So, the three <'s in the above example chain into one function that is then asked whether it matches [2, 7], with the knowledge that it has four "slots", the first and fourth of which contain the values 1 and 9 and the second and third of which are its responsibility to fill. However, it doesn't know any more specifics about what patterns produced those values or what patterns are in the slots it has to fill. Its view of the situation is the same as in the following example (which also succeeds… at least after I fixed a bug I found while writing this post):

xs := [2, 7];
switch (xs)
case 1 < 2*a < 2*b + 1 < literally 3*3 ->
  "an even number and then an odd number, both between 1 and 9"

I had to look all this up in the code to remember how it works. I think I wrote this while possessed by the dragon. Still, being able to write notation like this pleases my inner mathematician.

The last feature of patterns is or, which can be used to combine patterns to produce a pattern that matches if either subpattern matches. I think | is far more common in other languages, but again, I wanted | to be a normal identifier in the syntax. Pattern-combining has short-circuiting behavior that can't be implemented by a normal pattern-matching function, just as or in an expression can't be replaced by a function, so a keyword made sense to me.
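
For concreteness, here is a small sketch of an or pattern (the specific cases are my own, not from the original examples):

switch (x)
case 0 or 1 -> "a binary digit"
case _ -> "something else"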

The other control flow structure that uses pattern matching is try/catch.

try 1//0
catch x -> print(x)

The code in the body of the try is evaluated normally, except that if an exception is thrown, the exception is checked against the catch clause's pattern in much the same way a case clause checks whether the switch argument matches a pattern; if it matches, the catch's body is evaluated and the exception is not propagated further. For whatever reason, I only allow each try to accept one catch clause right now, although it would be easy and more sensible for each try to accept multiple clauses, the same way one switch accepts multiple cases. I have no excuse except laziness. Maybe I'll implement it after finishing this post.

As previously mentioned, Noulith doesn't have a special type for exceptions or errors, although it "should". You can just throw and catch any value you can store in a variable. Most (all?) errors thrown by built-in functions are just strings for now, and most of my Advent of Code solutions just throw and catch the string "done". The terribly poor error handling is one more reason nobody should write production code in Noulith.

Pattern matching is also useful in plain assignments, for destructuring a sequence and assigning different parts to different variables…

foo := [1, 2];
a, b := foo

…as well as in functions' parameter lists. So let's turn to those next.

Functions

What do functions and lambdas look like?

I love lambdas and want Noulith to support functional programming extensively, so a keyword like Python's lambda is definitely too verbose for me. This isn't a syntax where there's much uniformity across programming languages to be found, so I went with Haskell's short, snappy \, which I think is meant to look like an actual lambda λ if you squint. (The really "fun" option would have been to directly use U+03BB λ, which is actually easy for me to type with a Vim digraph, Ctrl-K l*; but I'm not that adventurous and didn't think I'd do anything else with \ anyway. Not to mention, λ is a Letter, the wrong Unicode General Category.) The rest of the syntax is a mix of Python and Haskell: parameters are delimited with commas, but the parameter list is separated from the body with ->.

\a, b -> a + b

In hindsight, I realized many programming languages don't start lambdas with a prefix sigil at all; e.g., JavaScript and Scala have arrow functions like x => x + 1 or (x, y) => x + 4; you just parse a comma-separated list of expressions, then when you see an arrow you turn that expression into an argument list. This doesn't make parsing meaningfully harder because I already have to do similar backtracking when parsing the LHS of an assignment. But using a prefix sigil does allow me to continue to reject () as a syntactically invalid expression, instead of accepting it in some contexts to express a lambda with zero parameters () => x. Plus, a prefix-less syntax would make parse errors even more fragile. So I was happy sticking with \.

Finally, I decided I was comfortable enough with lambdas that I didn't feel the need to design and add a separate syntax for declaring named functions. Just make a lambda and assign it to a variable, as sketched below. One downside, though, is that it's sometimes useful for debugging or metaprogramming for functions to know their own names, so I wouldn't rule out adding a syntax for defining and naming a function at some point.
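
For instance, a "named function" is just an ordinary assignment (my own illustrative example):

add := \a, b -> a + b;
add(2, 3)  # 5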

Whereas we’re speaking about lambdas, let’s discuss a standard lambda-related pitfall and certainly one of Noulith’s weirdest key phrases. Fast, what’s incorrect with the next Python code?

The problem, which many a Python programmer has been bitten by, is that all the lambdas close over the same variable i, which is shared between loop iterations. When the loop concludes, i is 9, so all the functions add 9. Even worse, if you were building the adders in an imperative for loop, you could still mutate i outside the loop (for example, by accidentally using it in another loop).

This issue is less likely to appear in Noulith. Firstly, partial application is much more common, often obviating explicit lambdas, and the act of partial application grabs the variable's value rather than closing over it. Secondly, Noulith for loops get a fresh iteration variable in each loop iteration, so even if you did make explicit lambdas like the above, they'd close over different variables. This is one of very few breaking changes (possibly the only one?) being considered for Go 2, which should attest to how treacherous the alternative is. The related discussion has fun tidbits like:

Loop variables being per-loop instead of per-iteration is the only design decision I know of in Go that makes programs incorrect more often than it makes them correct.

We built a toolchain with the change and tested a subset of Google's Go tests […] The rate of new test failures was roughly 1 in 2,000, but nearly all were previously undiagnosed actual bugs. The rate of spurious test failures (correct code actually broken by the change) was 1 in 50,000.

Still, if you wanted to artificially induce this error, you could write something like:

i := 0;
adders := for (_ <- 1 to 10) yield (
    i += 1;
    \x -> x + i
)

Faux which you can’t use a singular loop variable or partial software as a result of different problems within the code. How may you make the code work as meant anyway?

One strategy, widespread in older JavaScript, can be to make use of an instantly invoked operate expression (IIFE). Translated to Noulith, this could be:

i := 0;
adders := for (_ <- 1 to 10) yield (
    i += 1;
    (i -> x -> x + i)(i)
)

Noulith doesn’t have this function (but), however one other strategy you possibly can typically get by with in Python is utilizing a default argument (although this dangers swallowing later errors the place adders’s parts are referred to as with two arguments, and won’t work should you needed to do deeper metaprogramming on the capabilities):

However I don’t discover both of these completely satisfying. Noulith gives a special manner out with the freeze key phrase:

i := 0;
adders := for (_ <- 1 to 10) yield (
    i += 1;
    freeze x -> x + i
)

freeze takes an arbitrary expression, normally a lambda, and eagerly resolves each free variable to the worth that that variable holds. So within the lambda produced by freeze x -> x + i, i is “frozen” to the worth the variable i held on the time of manufacturing (and so is the operator +). Apart from the semantic change, freeze may also be used as a gentle optimization, since in any other case the lambda must lookup i and + by their string names within the setting on every invocation (one thing that could possibly be optimized out by extra clever compilers, however: effort!)

In hindsight, this took a stupid amount of work for what amounts to a party trick, but I was able to reuse a lot of the work for static passes later, so it worked out.

Augmented assignment

In addition to the unpacking/pattern matching we've already discussed, many programming languages also support another variant of assignment statement commonly called augmented assignment, as in x += y. This is often described as merely being shorthand for x = x + y, but many languages actually have surprisingly subtle semantic differences between the two. In C++, I believe they're the same for numeric types, but classes can overload individual augmented assignment operators like += separately from each operator +. In Python, if x is a mutable list, x += y will mutate x but x = x + y will make a new copy, which matters if some variable elsewhere holds a reference to the same list. Even in that bastion of unadventurous languages, Java, x += y and x = x + y have subtle differences involving type coercion and sometimes when one of the arguments is a String (see Java Puzzlers 9 and 10). Noulith has its own subtle semantic difference, but let's talk about the syntax first.

I definitely wanted to support +=, but unlike most languages with such operators, + is just an identifier, and I didn't want to go through every operator and define an augmented variant. So I thought it made sense to allow any function f to be part of an augmented assignment f=, regardless of whether f's name is alphanumeric or symbolic. This feature got Noulith a shoutout in Computer Things.

I do think this syntax feature makes sense. I've often wanted to write assignments like a max= b or a min= b in search problems, where a is a variable tracking the best score you've achieved so far and b is a score you just achieved. These constructs are so useful that I include them in my competitive programming template as minify and maxify, with definitions like the ones sketched below, and I've found at least a few other templates online with similar functions. (I won't link to any concrete examples because most of the search results look like SEO spam, but I'm confident many competitive programmers other than myself do this.)
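
Such helpers typically look something like the following C++ (an illustrative sketch, not necessarily the exact template):

template <class T> bool minify(T& a, const T& b) { if (b < a) { a = b; return true; } return false; }
template <class T> bool maxify(T& a, const T& b) { if (a < b) { a = b; return true; } return false; }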

Not only that (and I totally forgot about this until writing this post), a silly "competitive programming preprocessor" I briefly tried to create in 2015 borrowed the operator spellings <? and >? of min and max, respectively, from LiveScript so that they could be used in augmented assignment. So this has been something I've wanted for a long time. More prosaically, though, the augmented assignment with an alphanumeric identifier that I've used by far the most often is append=. All in all, I wanted to support augmented assignment for any identifier, alphanumeric or symbolic.

There are a few difficulties, though. Most immediately, the overwhelmingly common comparison operators conflict with making this syntax fully general, or even merely applicable to all symbolic identifiers: x <= y is definitely not the augmented assignment x = x < y. This was one place where internal and external consistency came into hard conflict and I couldn't see how to get everything I wanted without some syntax special-casing. So, Noulith's lexer specifically treats the four tokens ==, !=, <=, and >= specially. All operators whose names end with = are lexed as meaning augmented assignment, except for those four. In hindsight, I could have looked harder for precedent: Scala has very similar carveouts, but additionally carves out any symbol starting and ending with =.

Even with that decided, it's not clear exactly at which stage of lexing and parsing this should be handled. Right now, the lexer emits tokens like += as two separate tokens, so the parser just parses a += 3 as assigning 3 to a +. This way, augmented assignments look the same to the parser no matter whether the augmenting operator's identifier is alphanumeric or symbolic. Then, the left-hand side a + is parsed as a call expression, the same kind used in juxtaposition for unary operators; and when a call expression is assigned to, it performs an augmented assignment.

This works, but is actually a huge problem for internal consistency. Did you notice it? We already decided that in pattern matching, a pattern like a b, which is a function call, is a "destructure" with a: we give a the value we're matching the pattern against, and it tells us what value we should match against b. This lets us effectively pattern-match against negative numbers by having a be - and b be a numeric literal. But this conflicts with wanting it to mean an assignment augmented by the function b when it appears on the left of an =. Alas, these two interpretations just coexist in an uneasy tension for now; assignments check for the augmented assignment interpretation before allowing any destructuring, but that check is omitted in other pattern-matching contexts.

This might seem like a reasonable compromise at first: augmentation doesn't make much sense when pattern matching in a switch/case or try/catch, which should always bind new variables; and destructuring usually doesn't make sense with a single argument on the left-hand side of an assignment, which should be irrefutable. -x := y is terrible when x := -y works. But I don't have a satisfying way to reconcile this with a syntax for destructuring structs I'd like some day. Ideally, given a custom product type like struct Foo(bar, baz), both pattern matching and simple assignment destructuring would work:

switch (foo) case Foo(bar, baz) -> print(bar, baz);

Foo(bar, baz) = foo

But then the second assignment looks like it has a call on its left-hand side, which we currently parse as an augmented assignment. One idea would be to only interpret LHS calls as augmented assignment when the call has one argument, but that seems inelegant, and I think custom structs with one field should be well-supported, since they're useful for emulating sum types. Another idea would be to distinguish a b and a(b) in LHSes, interpreting the parentheses-free version as augmented assignment and the parenthesized version as destructuring. However, augmented assignment with a parenthesized operator, such as (zip +), isn't that outlandish (though I might well conclude that forgoing this ability is the least bad option):

a := [2, 5, 3];
a (zip +)= [4, 9, 2];
a # [6, 14, 5]

Perhaps the interpretation should be chosen at runtime based on whether the participating identifiers/expressions are defined or what they evaluate to, like how juxtaposition decides to partially apply the right function to the left argument? This seems… very messy.

Perhaps the lexer should take on more responsibility, lexing code like += and f= as single tokens that "mean" + or f with an = attached, so that a b = is a destructure but a b= is an augmented assignment? But we also wouldn't want the lexer to consider the first token of x==y to be x=… right? Or perhaps we could, and require programmers to include the space between x and == when writing an expression like x == y? Or perhaps the lexer can get just one extra character of lookahead? This is all to say, this is one of the corners of the language design I'm the most uncertain about.

Anyway, on to Noulith's promised subtle semantic difference: augmented assignment like x += y "takes the value" out of x and then sets x to null before calling + with the arguments. To give a concrete example, this code successfully appends 1 to x but prints x is null:

x := [];
myappend := \a, b -> (
  print("x is", x);
  a append b
);
x myappend= 1;

This highly unusual behavior turns out to be really important for efficiency, but to explain why, I have to talk about Noulith's curious semantics around immutability.

Immutability and copy-on-write semantics

Possibly the weirdest semantic feature of Noulith is its approach to immutability. In Noulith, all built-in data types are immutable, in the sense that the assignments to x in the following code don't affect y and vice versa:

x := [1, 2, 3];
y := x;
x[0] = 4;   # x is now [4, 2, 3]; y is unchanged
y[1] += 5;  # y is now [1, 7, 3]; x is unchanged

The same principle applies if you pass x into a function. That function cannot mutate x through its parameter. However, as the same snippet demonstrates, variables holding lists are mutable, and you can set and mutate their elements individually.

To be completely honest, this "feature" is something I mostly sleepwalked into: Rust, the implementation language, is really big on immutability, and Rc<Vec<Obj>> is shorter than Rc<RefCell<Vec<Obj>>>. But in hindsight, there are plenty of reasons to like it:

  • Nearly everyone who completes Advent of Code in Python learns that you can't initialize a grid you plan to mutate later with code like x = [[0] * 10] * 10, because then x will contain ten references to the same mutable list. An assignment like x[0][0] = 1 will set 1 in every row. Oops.

    Classic Drake meme titled 'Day 10 using python be like:' in which Drake dislikes the code crt = [['.'] * 40] * 6 and likes the code crt = [['.'] * 40 for i in range(6)]
    Meme by /u/QultrosSanhattan

    Noulith avoids this pitfall.

  • Because Python lists are mutable, they can't be used as dictionary keys, so you must use Python's separate tuple type if you want to key a dictionary by sequences. This can mean a bunch of explicit conversions when accessing the dictionary. Noulith also dispenses with this.

The big, obvious downside is that, if this is implemented naively, mutation is slow! If every assignment like x[i][j] = k had to make a copy of the entire array in case some other variable refers to x, writing performant imperative code would become immensely difficult. I didn't immediately consider this a dealbreaker; it's possible to just suck it up and say that Noulith programmers have to get good at working with immutable data structures. As a parallel, you can write a lot of code in Haskell while staying firmly in the land of immutable data structures, often by building new data structures in sweeps rather than through individual mutations (though Haskell's ecosystem has far more sophisticated data structures to support translating mutation-flavored algorithms, like the finger trees of Data.Sequence, not to mention neat ways to achieve local mutability like the ST monad). Another plausible escape hatch would have been to expose an explicit "mutable pointer" type.

However, none of that ended up mattering because it was far easier than I expected to implement this non-naively in Rust. The key is that Rust's reference-counted pointer Rc lets you inspect the reference count and mutate through a pointer if and only if you hold the only pointer to that particular value; otherwise, you can choose to make a copy. In practice, you can just call Rc::make_mut. Thus, if you make an n × n grid x := 0 .* n .* n and mutate a bunch of cells with x[i][j] = k, some rows will be copied in the first few mutations since their reference counts are > 1, but eventually every row will point to a unique list that can be safely mutated without copying, and the whole endeavor amortizes out to O(n^2) plus O(1) per assignment, exactly as asymptotically performant as it would be in, say, Python.
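
A standalone Rust sketch of the mechanism (not the interpreter's actual code; it uses a plain Vec<i64> rather than Noulith's Obj):

use std::rc::Rc;

fn main() {
    let mut x: Rc<Vec<i64>> = Rc::new(vec![1, 2, 3]);
    let y = Rc::clone(&x);           // reference count is now 2
    Rc::make_mut(&mut x)[0] = 4;     // sees another reference, so it clones the Vec first
    Rc::make_mut(&mut x)[1] = 5;     // x is now uniquely owned: mutates in place
    assert_eq!(*x, vec![4, 5, 3]);
    assert_eq!(*y, vec![1, 2, 3]);   // the shared original is untouched
}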

This behavior is the reason for the weird temporarily-stashing-null behavior of augmented assignment. Without it, when executing x append= y and calling the append function, there would always be another live reference to the list being appended to, which forces Noulith to perform a Θ(n) copy of the list and append to the copy, making append= unusably inefficient. But with this feature, in the most common cases append can mutate x and get the job done in O(1) time. This wasn't always the strategy: for a while, I kept the old value in x by default and manually marked a bunch of built-ins as pure so that the extra reference would be dropped only when one of those was the function used for augmented assignment. But eventually I decided that manually marking built-ins as pure was too much work, too fragile, and still liable to miss cases where the extra reference could be dropped. In particular, it would prevent users from just writing an efficient function like myappend for a custom data structure without a ton of extra language support. So I just enshrined this behavior into the semantics.

Why don't other languages take this approach? I expect the answer is just that it's "too magic". Subtle changes in your code can easily leave an extra reference to a list somewhere and make manipulations much slower. Not all code is performance-critical, but preventing programmers from reasoning about performance locally to this extent is a big deal.

There are other aspects of Noulith where immutability is even more poorly thought out. The main thing is the presence of a handful of "lazy streams" that can execute arbitrary code when you iterate over them, similar to Python generators or lazy maps in other languages. In theory, it doesn't make sense to copy a stream like that and pretend it's immutable. The stream could be modifying files or sending packets as you iterate over it; you can't just put it in two variables, iterate over one, and expect the other stream to still represent the same sequence of elements. In practice… well, you can just shrug, call it undefined behavior if the code isn't a pure function, and allow the programmer to shoot themselves in the foot.

Other assignments and mutations

One of the less pleasant consequences of immutability is that there's no way to call a function that will mutate an argument. This is unfortunate because there are plenty of common mutations you might want to perform on complex data structures, such as popping the last element from a list, that seem like they should be functions. There is no way to implement a normal function pop such that, if you have a list xs, calling pop(xs) modifies it. You could try to make do by making pop a function that takes a list and separately returns the last element and the list of all earlier elements (this is an existing built-in, unsnoc: the inverse of snoc, the reverse of cons, as Lispers will recognize), and then asking people to write:

xs, (x:) = pop(xs)

But if you did this, then while pop is running, xs will still refer to the list being popped, so pop will always have to inefficiently make a Θ(n) copy of the list, just as append would have without our special handling around augmented assignment. This would make it essentially unusable.

So… I made pop a keyword that mutates its operand and returns the popped value.

There are two other keywords that perform similar mutations: remove removes an element from a list or a value from a dictionary, so given

xs := [1, 2, 3, 4];

remove xs[1] will evaluate to 2 and will leave xs set to [1, 3, 4]. And consume is the lowest-level mutator: it takes the value from an lvalue and leaves behind null, in a mechanism vaguely reminiscent of C++ move semantics. This at least gives you another way to efficiently pop an element if you only had unsnoc:

xs, (x:) = unsnoc(consume xs);

More importantly, this lets you write and use analogous efficient mutating functions for custom data structures, although it's quite verbose. It might be worth introducing a keyword that more elegantly converts unsnoc to pop.

There are a few other weird assignment-related keywords. swap x, y swaps two lvalues, which I think I mostly put in just for the sake of making a good precedence demo. Here's the remade screenshot from earlier:

REPL in which two arithmetic operators are swapped and their precedences are swapped, and this is shown to affect the parsing and return value of a function using those operators. Screenshot of terminal.

The tiny advantage of the swap keyword over the usual Pythonic x, y = y, x is just that it's more concise when the expressions being swapped are long, as they are in the screenshot.

And finally, the every keyword is a way to assign one expression to multiple variables or lvalues at once, like so: every a, b, c = 1. Partly, this is Noulith's response to constructs like Python's chained assignment a = b = c = 1, which I believe was itself a restricted version of assignment in a language like C, where assignments are expressions that evaluate to the assigned value and so can naturally be chained; allowing this in full generality is a common source of bugs (consider the dreaded if (a = b) when if (a == b) was meant). However, every also walks through sliced lists and dictionaries, giving it a different set of powers than chained assignment. Assuming x is a list, code like every x[2:5] = 1 assigns 1 to each of x[2], x[3], and x[4], as in the sketch below. I can't remember if I had a specific use case or pain point in mind when designing every; it comes in handy occasionally, but so would a lot of features. I can, just barely, find one place I used it on 2016 Day 8. So it might be one of those things that sticks around purely through inertia.
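
A tiny illustration of the slice form (the concrete values are my own):

x := 0 .* 6;
every x[2:5] = 1;
x  # [0, 0, 1, 1, 1, 0]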

Naming built-ins

Syntax is important [citation needed], but a language also needs built-in functions to, well, function.

Noulith has a lot of sequence-manipulating and higher-order functions with English names (map, filter, and so on) that I won't discuss too much, except with respect to two recurring issues:

  • What part of speech and tense should these function names be? For example, should the function that sorts a list (or more precisely, receives a list and returns a sorted copy) be called sort or sorted? One line of thought I recall from Java Puzzlers recommends the latter for functions that take an input and produce a new output instead of mutating the input, since the present-tense verb connotes mutation. I think this makes sense in contexts where both kinds of functions appear often, but immutability is so central to Noulith that I decided using shorter present-tense verbs wouldn't cause any confusion.
  • Should identifiers with multiple words be named with CamelCase or snake_case? This is a tough question that I've flip-flopped on. Aesthetically, I think snake case looks better, but that's totally subjective; camel case is more compact and easier to type (word breaks marked by capital letters require hitting Shift one extra time, whereas the underscore itself requires Shift plus another key). I chose snake case for now mostly because it's standard in both Rust, the implementation language, and Python, the scripting language I'd previously use for most of the use cases Noulith was meant to target.

Arithmetic operators and semantics

A far more interesting topic is choosing the names and definitions of functions with symbolic names, the most familiar of which are the ones for performing basic arithmetic. It can be surprising how much inter-language variation there is here. I think the only uncontroversial operators are +, -, and *.

  • What does / mean? Probably division (though not in, e.g., J!), but what kind? In low- to medium-level languages it's typical for / to mean integer division. It also used to do so in Python 2, but became float division in Python 3. I actually used the float division definition at first too, but eventually I realized that, because I really didn't care about performance and wanted to do Real Math™, I might as well add rational numbers as in Common Lisp.
  • What does % mean? Probably remainder/modulo, but there are several subtly different semantics for it. (And again, J has % mean division.) I ended up keeping the C-style behavior for % and offering the paired // and %% for rounding-down division and sign-of-divisor remainder.
  • What does ^ mean? There's a bit of a schism here: mathematicians and a handful of programming languages (e.g. Haskell, Lua, Awk (!)) use it for exponentiation, from the superscript connotation and/or LaTeX influence, but lower-level languages usually use it for bitwise xor. I chose to side with the mathematicians here, because for my use cases I expected exponentiation to be more practically useful than xor, so I didn't want to give it a longer name like ** (plus, I thought there was a natural sequence-related definition for **).
  • How are comparisons made? These are mostly uncontroversial. We do want = to mean assignment, so == is pretty locked-in, and < > <= >= are also close enough to universal that I never seriously considered any alternatives (despite their mild conflict with augmented assignment), but the most common inequality operator != is harder to justify because ! doesn't mean "not" in Noulith. I considered Haskell's /= (which visually looks more like ≠), but that would collide with the natural syntax for augmented division-assignment (an issue Haskell itself has experienced: the Lens operator //= uses a double slash for this reason, and, for consistency, so does every other division-related Lens operator). The alternative I found the most compelling was actually <>, prominent in SQL and available in Python 2 all the way up until its dying gasp, which is actually quite internally consistent with the other comparisons. But in the end I thought the external consistency consideration for != was still overwhelming. Other languages that use != for not-equals without using standalone ! to mean "not" include OCaml and fish.

    I also included the three-valued comparison "spaceship operator", <=>, as well as its inverse, >=<.

  • What symbols should be used for bitwise operators? There are some real benefits to not assigning & and | and instead giving those symbols other purposes. For example, & could be saved for some kind of concatenation (common in spreadsheets), which I'd expect to use overwhelmingly more often in scripts than an operator for bitwise AND. But what would I call them instead? Haskell calls them .&. and .|. and F♯ calls them &&& and |||, but I couldn't find any specific symbolic alternatives with convincing precedent. I think the main alternative would just be to give them a prose name like bitand/bitor instead. In the end I decided to stick with & and | out of the additional consideration that it was more internally consistent if most operators for doing math on two numbers were single characters (though the paired division/modulo // and %%, as well as bit shifting << and >>, are all still two characters).

    But wait, given that I already assigned ^, how do I write bitwise xor? I eventually realized that I could overload ~ to perform either bitwise complement or xor, depending on whether it's called with one or two arguments; this is actually internally consistent with how we already decided - would work. Moreover, this is the same approach as the numerical analysis language Yorick and, curiously, the exact mirror of Go's approach, whereby ^ means both bitwise xor and complement so that ~ can be assigned a different meaning, so this decision isn't indefensible in terms of external consistency either. I didn't consciously recall these examples when deciding on these names, but it felt like there was precedent.
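
A quick illustration of the one- and two-argument forms (my own values, assuming the usual two's-complement convention for the unary form):

~5     # -6, bitwise complement
5 ~ 3  # 6, bitwise xor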

Sequence operators

  • What operator should we use for list and/or string concatenation? One of the most common choices is overloading +, but I never actually liked that. I think overloading the same operator to mean numeric addition and sequence concatenation is really hard to justify from first principles. Nor is it satisfying to the mathematicians: any algebraist will tell you that + usually connotes that you're working in an abelian group, but concatenation is neither commutative (in general, a + b doesn't equal b + a) nor invertible (in general, there is no "negative string" -a such that a + -a is the identity element, i.e., the empty sequence). Moreover, you might imagine generalizing + and other arithmetic operators to some sequences, simply by adding or operating on elements pairwise, and in fact I did want to do that for the special sequence type of "vectors" because it's immensely useful in practice.

    So, what instead? There are several options justifiable with precedent: D uses ~, some MLs and F♯ use @, Ada uses &, Smalltalk uses ,, Maple (and notation common in cryptography) sometimes uses ||… In the end I went with Haskell/Scala's ++ because it generalizes nicely to suggest symbolic names for other sequence operations, obtained by doubling comparable arithmetic operators: ** is the Cartesian product; &&, ||, -- combine sets.

    Following this train of thought also lets us define systematic operators for prepending/appending. Here Scala uses +: and :+, with the mnemonic "the collection is on the colon side", but I wanted to save the colon for other things, so I instead chose . with the opposite orientation: a single dot on the side with a single object. So .+ prepends one item to a list and +. appends one item to a list. This also generalizes nicely to other kinds of collections and operations: adding one element to a set would be |., "replicating" an item into a list of n items would be .*, joining two items into a length-2 list would be .., and so on. One common and fairly reasonable complaint about languages that make extensive use of operators or support operator overloading is that operators are particularly cryptic and hard to look up, so I wanted the "vocabulary" of operator symbols to be conceptually simple. (A few of these operators are sketched in the example after this list.)

    I also chose to allocate a separate operator, $, to string concatenation, partly because I again thought the kinds of concatenation were conceptually distinct, and partly because I could then make $ coerce all of its arguments to strings without feeling bad about shoehorning coercions into overloads. This became less compelling later as I added byte strings and "vectors" of numbers, which are other sequence types that sometimes need to be concatenated but that I didn't want separate concatenation operators for, as well as format strings, which allow coercion to be done even more explicitly. Still, there's something nice about having $ close at hand for mashing a bunch of strings together.

  • Finally, this isn't exactly an operator, but what syntax (if any) should we use for "splatting", that is, declaring a function that takes a variable number of arguments and/or calling a function with a list of arguments? We can't make * serve double duty as in Python/Ruby since it's just a normal identifier, so the ... of languages like JavaScript seemed like the best idea.

More function composition

The last batch of operators I think are worth remarking on are the ones for function composition. I stole >>>, <<<, &&&, and *** from Haskell's Control.Arrow:

(f <<< g <<< h)(a, b, c) = f(g(h(a, b, c)))
(f >>> g >>> h)(a, b, c) = h(g(f(a, b, c)))
(f &&& g &&& h)(a, b, c) = [f(a, b, c), g(a, b, c), h(a, b, c)]
(f *** g *** h)(a, b, c) = [f(a), g(b), h(c)]

They’re fairly verbose, however I couldn’t consider a greater batch of names that will be acceptably internally constant.

A slightly different function composition operator is Haskell's on, which is actually from Data.Function, and primarily meant to be used in exactly the same way.

(f on g)(a, b, c) = f(g(a), g(b), g(c))

Finally, lists of arguments can be "splatted" into functions with the JavaScript-inspired of and apply, which is useful for chaining with things that produce lists:

f of [a, b, c] = f(a, b, c)
[a, b, c] apply f = f(a, b, c)

Lessons learned

I think I predicted that requiring myself to use only Noulith on Advent of Code would make my median leaderboard performance better but my worst-case and average performances somewhat worse. I don't think my median performance improved, but my worst-case performance definitely got worse. Somehow it still didn't matter and I placed top of the leaderboard anyway. (I'll note that 2021's second- through fourth-place finishers all didn't do 2022.)

One completely predictable issue: debugging Noulith code is much harder because most programmers can be pretty confident that the root cause of a bug isn't a bug in the language implementation. That assumption is not safe when you also wrote the language! For every bug I encountered, I had to consider whether the cause might have been in the couple dozen lines I had written for that day or in the 13,000 lines of Rust I had written over the prior few months. In the end, I don't think I ever encountered any correctness bugs in the language while doing Advent of Code (that is, bugs where the language executed a program successfully but gave the wrong result), but that didn't stop me from considering such a hypothesis at multiple moments, so I was still somewhat slower at debugging. I did encounter a few bugs that caused errors where they shouldn't have, as well as surprising interactions between features that I'm not sure count as bugs but suggest a flaw in the design somewhere: for example, on Day 21, I realized that the natural stringification of negative fractions like -1/2 can't be eval'ed, because of the same precedence issues as always. Not to mention quite a few correctness bugs in later days that I was just lucky enough not to hit before Christmas.

I was also not quite pessimistic enough about my Noulith interpreter simply being slow. There weren't any days that became impossible, but there were a few days where I believe I'd have finished several minutes faster if I had just implemented the same algorithm in Python, whereas I expected maybe just one.

Taking a step back, the language design process was a lot of fun. One thing I enjoyed, which would probably be unrealistic in a more serious language, was the freedom to just add keywords willy-nilly (swap, literally, coalesce). Adding keywords to a language with production users tends to be a big deal for the simple reason that it breaks code using that word as an identifier (unless the keyword manages to be a "soft keyword" or something, but that just complicates the parser even more). Also, naming things is hard in general. (Although it's not a keyword per se, consider JavaScript's globalThis and the dozens of rejected names.) This freedom also allowed me to avoid the temptation to add punctuation to the syntax: the set of usable punctuation characters is far more finite, which makes me want to be pretty confident that a language feature is worthy of one before assigning a character to it, not to mention that search engines often have trouble with documentation for them.

Reflecting on the whole process, surprisingly enough, I'm reminded of this bit from Lockhart's Lament, about imagining and investigating mathematical objects:

[O]nce you have made your choices […] then your new creations do what they do, whether you like it or not. That is the amazing thing about making imaginary patterns: they talk back!

The language design and semantics, independent of the implementation, are in some sense an imaginary pattern, and I did often feel like the choices I made "talked back". See how chaining comparison operators led to mutable runtime operator precedences, or how immutability led to a custom world of move-semantics-like keywords. Pretty neat.

As for my broader practical goals for Noulith, in terms of becoming a better go-to language for quick and dirty scripts: it worked, 100%. I solved at least two Mystery Hunt puzzles with it and used it extensively for data munging in another upcoming project, sometimes while I was on a different system without my dev setup, and I expect to continue.

Still, maybe the most generalizable takeaway is just how I encountered Hyrum's Law. I haven't made any promises of stability/compatibility in any shape or form (I haven't updated the Cargo.toml version field from 0.1.0 since the first commit), but it sort of doesn't matter anyway: there's a bunch of random Noulith files out there in the wild, somebody even wrote a puzzle about the language, and I'd feel a little bad about breaking them without a good reason.

Overall, 10/10, would do again.

The fun fact roulette

In no particular order, here are some fun facts about non-Noulith languages that I learned or remembered while writing this post. They're all referenced somewhere in the preceding 16,000 words, but I don't blame anybody for not reading all that.

Did you know that:

Appendix: What’s a noulith?

Nouliths are the weapons wielded by Sages, a healer class from the critically acclaimed MMORPG Closing Fantasy XIV, with an expanded free trial in which you’ll — *ahem*

It’s arduous to explain precisely what nouliths are, however in-game we’re launched to them as a set of 4 “brief staves” that sages management with their thoughts to attract. A Wikipedia blurb calls them “magical aether foci”. In accordance with Reddit sleuths, etymologically, the identify relies on Ancient Greek: νόος “thoughts” + λίθος “stone”. (The Sage’s talent set is Ancient Greek and themed around the medical theory of Humors.)

I assumed the identify was apt as a result of computer systems are additionally simply good rocks we attempt to management with our minds generally, and this programming language was an try and make a tiny nook of that management slightly smoother for a single individual. (Who additionally mains Sage these days.)

© SQUARE ENIX CO., LTD. All rights reserved. Noncommercial use is allowed by the license.
