Announcing Topiary – Tweag
Topiary aims to be a universal formatter engine within the
Tree-sitter ecosystem. Named after the art of clipping or trimming
trees into fantastic shapes, it is designed for formatter authors and
formatter users:
- Authors can create a formatter for a language without having to write
  their own formatting engine, or even their own parser.
- Users benefit from uniform, comparable code style across multiple
  languages, with the convenience of a single formatter tool.
The core of Topiary is written in Rust, with declarative formatting
rules for bundled languages written in the Tree-sitter query
language. In this first release, we have
focused on formatting OCaml code, capitalising on the OCaml
expertise within the Topiary Team and our colleague, Nicolas Jeannerod.
All development and releases happen in the Topiary GitHub
repository.
Motivation
Coding style has historically been a matter of personal choice. This is
inherently subjective, leading to bikeshedding over formatting choices
rather than meaningful discussion during review. Prescribed style
guides, linters and ultimately automatic formatters (popularised by
gofmt, whose developers had the foresight to
impose "good enough" uniform formatting on a codebase) have helped
solve these issues.
This motivated research into developing a formatter for our Nickel
language. However, its internal parser did not provide a syntax
tree that retained enough context to allow the original program to be
reconstructed after parsing. After creating a Tree-sitter grammar for
Nickel, for syntax highlighting,
we concluded that it would be possible to
leverage Tree-sitter for formatting as well.
But why stop at Nickel? Topiary generalises this approach to any
language that doesn't employ semantic whitespace (for which
specialised formatters, such as our Haskell formatter Ormolu, are
required) by expressing formatting style rules in the Tree-sitter
query language. It thus aspires to be a "universal
formatter engine" for such languages, enabling the fast development of
formatters, provided a Tree-sitter grammar is
available.
Design Principles
To that end, Topiary has been created with the following goals in mind:
- Use Tree-sitter for parsing, to avoid writing yet another engine for
  a formatter.
- Expect idempotency. That is, formatting already-formatted code
  shouldn't change anything.
- For bundled formatting styles to meet the following constraints:
  - Compatible with attested formatting styles used for that language in
    the wild.
  - Faithful to the author's intent: if code has been written such that
    it spans multiple lines, that decision is preserved.
  - Minimise changes between commits such that diffs focus mainly on the
    code that's modified, rather than superficial artefacts.
  - Be well-tested and robust, such that they can be trusted on large
    projects.
- For end users, the formatter should run efficiently and integrate with
  other developer tools, such as editors and language servers.
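The idempotency goal above can be checked mechanically. The following is a minimal sketch in Python, not Topiary's own test harness: `toy_format` is a hypothetical stand-in formatter (it only collapses runs of spaces and tabs and strips trailing whitespace), and `is_idempotent` is an illustrative helper name.

```python
import re


def toy_format(source: str) -> str:
    """Hypothetical stand-in formatter (NOT Topiary): collapse runs of
    spaces/tabs to one space and strip trailing whitespace on each line."""
    lines = [re.sub(r"[ \t]+", " ", line).rstrip() for line in source.splitlines()]
    return "\n".join(lines) + "\n"


def is_idempotent(formatter, source: str) -> bool:
    """Return True if formatting the formatter's own output changes nothing."""
    once = formatter(source)
    twice = formatter(once)
    return once == twice


print(is_idempotent(toy_format, "let  x =   1  \n"))
```

Any real formatter could be passed in place of `toy_format`; the property being tested is only that a second pass leaves the first pass's output untouched.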
How it Works
As long as a Tree-sitter grammar is defined for a
language, Tree-sitter can parse it and build a concrete syntax tree.
Tree-sitter also allows us to run queries against this tree. We can make
use of these to target interesting subtrees (e.g., an if block or a
loop), to which we can apply formatting rules. These cohere into a
declarative definition of how that language should be formatted.
For example:
(
[
(infix_operator)
"if"
":"
] @append_space
.
(_)
)
This will match any node that the grammar has identified as an
infix_operator, or the anonymous nodes containing "if" or ":" tokens,
immediately followed by any named node (represented by the (_)
wildcard pattern). The query matches on subtrees of the same shape,
where the annotated node within it will be "captured" with the name
@append_space, one of many formatting rules we
have defined. Our formatter runs through all matches and captures, and
when we process any capture called @append_space, we append a space
after the annotated node.
Before rendering the output, Topiary does some post-processing, such as
squashing consecutive spaces and newlines, trimming extraneous
whitespace, and ordering indentation and newline instructions
consistently. This means that you can, for example, prepend and append
spaces to if and true, and Topiary will still output if true with
just one space between the words.
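To illustrate the space-squashing part of this post-processing, here is a small Python sketch. It is an assumption-laden toy, not Topiary's implementation (which is written in Rust): the `SPACE` marker and the `render` function are hypothetical names, and newline and indentation handling are omitted.

```python
SPACE = object()  # hypothetical marker meaning "a space was requested here"


def render(atoms):
    """Join text atoms, collapsing any run of SPACE markers into a single
    space and dropping leading and trailing ones."""
    out = []
    pending_space = False
    for atom in atoms:
        if atom is SPACE:
            # Remember the request; repeated requests change nothing.
            pending_space = True
        else:
            # Emit at most one space, and never before the first token.
            if pending_space and out:
                out.append(" ")
            out.append(atom)
            pending_space = False
    return "".join(out)


# Spaces prepended and appended to both "if" and "true".
print(render([SPACE, "if", SPACE, SPACE, "true", SPACE]))  # if true
```

Because redundant requests collapse, individual rules can ask for whitespace freely without coordinating with one another.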
To make this more concrete, consider the expression 1+2. This has the
following syntax tree, if it's interpreted as OCaml, where the match
described by the above query is highlighted in purple:
The @append_space capture instructs Topiary to append a space after
the infix_operator, rendering 1+ 2. Repeating this process for every
syntactic structure we care about (making judicious generalisations
wherever possible) leads us to an overall formatting style for a
language.
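The effect of the @append_space rule on 1+2 can be mimicked with a toy model in Python. This sketch does not use Tree-sitter or Topiary: the `Node` class, the `apply_append_space` function, and the "number" node kind are hypothetical, and the syntax tree is flattened to a list of sibling nodes for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str    # grammar node type, e.g. "infix_operator"
    text: str    # source text covered by the node
    named: bool  # anonymous nodes such as "if" or ":" are not named


# Node kinds that the example query captures as @append_space.
APPEND_SPACE_KINDS = {"infix_operator", "if", ":"}


def apply_append_space(siblings):
    """Append a space after each captured node that is immediately
    followed by a named sibling, mirroring the shape of the query."""
    out = []
    for i, node in enumerate(siblings):
        out.append(node.text)
        nxt = siblings[i + 1] if i + 1 < len(siblings) else None
        if node.kind in APPEND_SPACE_KINDS and nxt is not None and nxt.named:
            out.append(" ")
    return "".join(out)


# "number" is a made-up kind for the operands of 1+2.
expr = [
    Node("number", "1", True),
    Node("infix_operator", "+", True),
    Node("number", "2", True),
]
print(apply_append_space(expr))  # 1+ 2
```

Only the rule shown in the text is modelled here, which is why no space appears before the operator; a complete style would add further rules for that.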
As a formatter author, defining a style for a language is just a matter
of building up these queries. End users can then apply them to their
codebase with Topiary, to render their code in this style.
Related Tools
Topiary is not the first tool to use Tree-sitter beyond its original
scope, nor is it the first tool that attempts to be a formatter for
multiple languages (e.g., Prettier). This section lists some tools
that we drew inspiration from, or used during the development of
Topiary.
Tree-sitter Specific
Meta-Formatters
- treefmt: A general formatter orchestrator, which unifies formatters
  under a common interface.
- format-all: A formatter orchestrator for Emacs.
- null-ls.nvim: An LSP framework for Neovim that facilitates formatter
  orchestration.
Getting Started
We're really excited about Topiary and the potential it has in this
space.
This first release concentrates on formatting support for OCaml, as well
as simple languages, such as JSON and TOML. Experimental formatting
support is also available for Nickel, Bash, Rust, and Tree-sitter's
own query language; these are under active development or serve a
pedagogical end for formatter authors.
We'd highly encourage you to try Topiary and invite you to check out
the Topiary GitHub repository to see for yourself.
Information on installing and using Topiary can be found in this
repository, where we'd also welcome contributions,
feature requests, and bug reports.