Visual Programming with Elixir: Learning to Write Binary Parsers | by Kyle Hanson
Distributed programming isn't the only area where Elixir excels. One of the less advertised strengths of Erlang's little brother is that it contains one of the most elegant and compact ways to define binary parsers in any programming language. Code visually represents data structures, making it easy to match and extract bytes, and even individual bits, in the function heads themselves. This capability makes Elixir an unexpected front-runner for learning and prototyping binary protocols.
Binary Pattern Basics
Patterns are a central part of writing idiomatic Elixir code. They are a visual representation of data structures that act as guards in function heads and case statements and bind parts of the data structure to variables. Elixir includes a variety of binary patterns, which are similar to Erlang's.
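As a minimal sketch of what such patterns look like, the module below matches bits and bytes directly in function heads. The module and function names (`BinaryBasics`, `nibbles`, `length_prefixed`) are made up for illustration and do not come from any library.

```elixir
defmodule BinaryBasics do
  # Split one byte into its high and low 4-bit halves.
  def nibbles(<<high::4, low::4>>), do: {high, low}

  # Read a 16-bit unsigned big-endian length, take exactly that many
  # bytes as the body, and bind whatever is left over to `rest`.
  def length_prefixed(
        <<len::unsigned-big-integer-size(16), body::binary-size(len), rest::binary>>
      ) do
    {body, rest}
  end
end

IO.inspect(BinaryBasics.nibbles(<<0xA5>>))
# prints {10, 5}
IO.inspect(BinaryBasics.length_prefixed(<<0, 3, "abcdef">>))
# prints {"abc", "def"}
```

Note that `len` is bound and then used as a size within the same pattern, which is what makes length-prefixed formats so compact to express.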
Our First Parser
MessagePack should be considered the Rosetta Stone of binary parsers. With implementations in many languages, its compact but expressive grammar means that the important parts of a parser usually take only a few hundred lines or so. Datatypes are clearly prefixed, and there is no need for complicated text unescaping as with JSON.
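Strings in MessagePack start with either a byte prefix of 0xD9, 0xDA, or 0xDB, or the bit prefix 101, followed by an unsigned big-endian integer defining the byte size of the string. The length field is 8, 16, or 32 bits wide for the three byte prefixes and 5 bits wide (the remainder of the first byte) for the bit prefix.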
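The string rules above can be sketched as four function clauses, one per prefix. This is an illustrative reconstruction, not necessarily the exact code from the original post; the module name `StringParser` is an assumption.

```elixir
defmodule StringParser do
  # fixstr: bit prefix 101, then a 5-bit length in the same byte.
  def parse(<<0b101::3, len::5, value::binary-size(len)>>), do: value

  # str 8: byte prefix 0xD9, then an 8-bit length.
  def parse(<<0xD9, len::8, value::binary-size(len)>>), do: value

  # str 16: byte prefix 0xDA, then a 16-bit length.
  def parse(<<0xDA, len::16, value::binary-size(len)>>), do: value

  # str 32: byte prefix 0xDB, then a 32-bit length.
  def parse(<<0xDB, len::32, value::binary-size(len)>>), do: value
end

IO.inspect(StringParser.parse(<<0xA5, "hello">>))
# prints "hello"
```

Integer segments in Elixir binaries are unsigned and big-endian by default, so the length fields need no extra modifiers.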
Just like that, we already have a parser for a binary grammar in Elixir! Add booleans and floats and we have a parser capable of handling the basic types.
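A sketch of that extension might look like the following, using the MessagePack prefixes 0xC2 and 0xC3 for booleans and 0xCA and 0xCB for 32-bit and 64-bit floats. The module name `BasicParser` is illustrative.

```elixir
defmodule BasicParser do
  # Booleans are single bytes.
  def parse(<<0xC2>>), do: false
  def parse(<<0xC3>>), do: true

  # Floats: 0xCA is followed by a 32-bit IEEE 754 float, 0xCB by a 64-bit one.
  def parse(<<0xCA, value::float-size(32)>>), do: value
  def parse(<<0xCB, value::float-size(64)>>), do: value

  # Strings: fixstr, str 8, str 16 and str 32.
  def parse(<<0b101::3, len::5, value::binary-size(len)>>), do: value
  def parse(<<0xD9, len::8, value::binary-size(len)>>), do: value
  def parse(<<0xDA, len::16, value::binary-size(len)>>), do: value
  def parse(<<0xDB, len::32, value::binary-size(len)>>), do: value
end

IO.inspect(BasicParser.parse(<<0xCB, 1.5::float-size(64)>>))
# prints 1.5
```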
Adding collections gets interesting. Our parser needs to loop recursively over the binary input while building an accumulator. The parse function needs to change so that it returns both the value and the rest of the binary.
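One way to sketch this, assuming only the fixarray (bit prefix 1001) and array 16 (0xDC) encodings plus a handful of scalar types, is shown below. The names `ListParser` and `parse_list` are illustrative, and the clause set is deliberately incomplete.

```elixir
defmodule ListParser do
  # Every clause now returns {value, rest} so the caller can keep parsing.
  def parse(<<0xC2, rest::binary>>), do: {false, rest}
  def parse(<<0xC3, rest::binary>>), do: {true, rest}
  def parse(<<0xCB, value::float-size(64), rest::binary>>), do: {value, rest}

  def parse(<<0b101::3, len::5, value::binary-size(len), rest::binary>>),
    do: {value, rest}

  # fixarray: bit prefix 1001, then a 4-bit element count.
  def parse(<<0b1001::4, count::4, rest::binary>>), do: parse_list(rest, count, [])

  # array 16: byte prefix 0xDC, then a 16-bit element count.
  def parse(<<0xDC, count::16, rest::binary>>), do: parse_list(rest, count, [])

  # No elements left: the accumulator was built in reverse, so flip it.
  defp parse_list(rest, 0, acc), do: {Enum.reverse(acc), rest}

  # Parse one element, then recurse on whatever input remains.
  defp parse_list(binary, count, acc) do
    {value, rest} = parse(binary)
    parse_list(rest, count - 1, [value | acc])
  end
end

IO.inspect(ListParser.parse(<<0x93, 0xC3, 0xCB, 1.5::float-size(64), 0xA2, "hi">>))
# prints {[true, 1.5, "hi"], ""}
```

Because `parse_list` calls back into `parse`, nested lists work without any extra code.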
Elixir makes it visually clear what you can expect for each byte. You can continue to build the parser yourself by implementing the rest of the MessagePack spec. The rest of this post will instead cover an important optimization: combining the list's loop and the value extraction in the same function.
Optimizing the loop
To see what combining the loop and parser into a single function looks like, simplify the grammar so that we only have to parse a list of floats. Instead of parse_list calling parse and needing to unpack a tuple, we can parse the float directly in the loop itself.
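A sketch of the combined version for this reduced grammar follows. The module name `FloatList` is illustrative, and only fixarray and array 16 headers are handled.

```elixir
defmodule FloatList do
  # Array headers hand the remaining input straight to the loop.
  def parse(<<0b1001::4, count::4, rest::binary>>), do: parse_list(rest, count, [])
  def parse(<<0xDC, count::16, rest::binary>>), do: parse_list(rest, count, [])

  # Base case: all elements consumed.
  defp parse_list(rest, 0, acc), do: {Enum.reverse(acc), rest}

  # The float patterns live in the loop's own function heads, so no
  # intermediate {value, rest} tuple is built for each element.
  defp parse_list(<<0xCA, value::float-size(32), rest::binary>>, count, acc),
    do: parse_list(rest, count - 1, [value | acc])

  defp parse_list(<<0xCB, value::float-size(64), rest::binary>>, count, acc),
    do: parse_list(rest, count - 1, [value | acc])
end

IO.inspect(FloatList.parse(<<0x92, 0xCB, 1.0::float-size(64), 0xCB, 2.5::float-size(64)>>))
# prints {[1.0, 2.5], ""}
```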
This simple optimization can drastically cut memory usage and speed up your parser (sometimes by as much as 2x). But there is a problem: we need to repeat the patterns from parse in parse_list so that lists can contain any element.
Macros are Elixir's solution to this problem. We can define our patterns once and declare functions based on those patterns. However, macros tend to obfuscate the beautiful simplicity of Elixir's pattern matching, so they should only be used as an optimization step.
A more complete version of combining the loop with the parser would look something like this:
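The following is a hedged sketch of the idea rather than the original post's code or MsgPax's implementation. The macro `defparsers` and the modules `Unpacker.Macros` and `Unpacker` are hypothetical names. Each `pattern -> value` rule is written once and expanded into one clause of `parse/1` and one clause of `parse_list/3`.

```elixir
defmodule Unpacker.Macros do
  # Hypothetical macro: expands every `<<...>> -> value` rule into a
  # parse/1 clause and a matching parse_list/3 clause.
  defmacro defparsers(do: rules) do
    parse_clauses =
      for {:->, _, [[{:<<>>, meta, segments}], value]} <- rules do
        # Append `rest::binary` so each pattern also captures leftover input.
        pattern = {:<<>>, meta, segments ++ [quote(do: rest :: binary)]}
        quote do: def(parse(unquote(pattern)), do: {unquote(value), rest})
      end

    list_clauses =
      for {:->, _, [[{:<<>>, meta, segments}], value]} <- rules do
        pattern = {:<<>>, meta, segments ++ [quote(do: rest :: binary)]}

        quote do
          defp parse_list(unquote(pattern), count, acc) do
            parse_list(rest, count - 1, [unquote(value) | acc])
          end
        end
      end

    quote do
      unquote_splicing(parse_clauses)
      # fixarray header at the top level.
      def parse(<<0b1001::4, n::4, rest::binary>>), do: parse_list(rest, n, [])

      # The base case must come before the generated clauses.
      defp parse_list(rest, 0, acc), do: {Enum.reverse(acc), rest}
      unquote_splicing(list_clauses)

      # A nested fixarray: parse the inner list, then continue the outer loop.
      defp parse_list(<<0b1001::4, n::4, rest::binary>>, count, acc) do
        {list, rest} = parse_list(rest, n, [])
        parse_list(rest, count - 1, [list | acc])
      end
    end
  end
end

defmodule Unpacker do
  import Unpacker.Macros

  defparsers do
    <<0xC2>> -> false
    <<0xC3>> -> true
    <<0xCA, value::float-size(32)>> -> value
    <<0xCB, value::float-size(64)>> -> value
    <<0b101::3, len::5, value::binary-size(len)>> -> value
    <<0xD9, len::8, value::binary-size(len)>> -> value
  end
end

IO.inspect(Unpacker.parse(<<0x92, 0xC3, 0x91, 0xA2, "hi">>))
# prints {[true, ["hi"]], ""}
```

The rule table stays as readable as the hand-written clauses, while the duplication between `parse` and `parse_list` is generated at compile time. The cost is that the function heads themselves are no longer visible in the source.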
A pull request based on this technique has been opened against MsgPax, the primary MessagePack implementation for Elixir, improving the performance and memory usage of the library. MsgPax already combined the loop and parser, but this pull request further specializes the loop into multiple functions. As an exercise, you can add the rest of the MessagePack datatypes.
Conclusion
The expressiveness of Elixir's binary patterns was surprising to me when I first discovered them. They make parsing easy and free you from worrying about bit shifting and masking. This makes Elixir one of the easiest languages in which to explore binary protocols.
A special thanks to lexmag for inspiring this post!