Clojure's deadly sin

Jul 27, 2023
This article is about laziness in Clojure. It's meant to be a comprehensive
and objective (as far as that's possible) critique of lazy sequences as a feature. In
no way do I want this to be a judgment of the decision to make Clojure lazy.
Clojure the language is certainly not formulaic; creating it involved making a
plethora of impactful decisions. We can judge by Clojure's longevity that the
whole package has been a success, but it's only natural that some choices
proved to be more accurate than others. Finally, this is not meant to criticize
the people behind Clojure. Hindsight doesn't need glasses; it's incredibly
hard to write a language (let alone a successful one) and easy to pick on its
perceived shortcomings. Yet, I'm still willing to do the latter because I
believe my diatribe can be useful.
My goal is to align the Clojure community's stance on laziness. Time and time
again, developers find themselves grappling with the complexities of the
lazy approach, attributing their struggles to their lack of understanding of the
Clojure Way[1]. I want to show that many things make
up the Clojure Way, and laziness doesn't have to be a defining attribute. I
want programmers to reduce or eliminate their reliance on laziness and not feel
guilty about it just because laziness has been deeply ingrained in Clojure since
its inception.
Only time will tell if writing this is useful, but I'm willing to try. Also,
sorry about the clickbait
title, it was too juicy to
pass up.
What is laziness?
Lazy evaluation, also known as deferred execution, is a programming language
feature that delays producing the result of a computation until the value is
explicitly needed by some other evaluation. There are several different
perspectives on how to look at it; they are all similar, but each can give
you some fresh insight:
- Separation of declaration and execution in time. Writing an expression
does not immediately produce the result. There are now two stages of
computing the value that the programmer has to be aware of (whether this is good
or bad will be discussed later).
- Declarative vs imperative. The separation above encourages the developer
to think about the program more in declarative terms. A program is not a
line-by-line instruction for the computer to execute but more of a flexible
recipe. This makes programming a bit closer to math, where writing a formula
on paper doesn't immediately force you to solve it (and make your brain
hurt). For the compiler, the declarative approach enables additional
optimizations because it is free to reorder or even eliminate some steps of
the execution.
- Program as a tree of evaluations. With all values being lazy and
dependent upon one another, the whole program becomes a declarative tree
just waiting to be executed. Tap the root (the entrypoint) and the tree
will recursively walk itself, compute from the leaves downwards[2], and collapse into a single result. Leaves and branches that
are not connected to the root are left unevaluated.
- Pull vs push. If you like this analogy more, lazy evaluation is a pull
approach. Nothing gets produced until it is explicitly asked for (pulled).
In pervasively lazy languages like Haskell, every expression produces a lazy
value. It won't even compute 2 + 3
for you unless something else needs the result. By
invoking the "program as a tree" reasoning, it becomes apparent that the "something
else" ultimately has to affect the outer world somehow, to have a side
effect: print to the screen, write to a file, and so on. Without side effects, a lazy
program is a festive cookbook recipe you never bake.
The principles of lazy evaluation are easy to simulate in any language that
supports wrapping arbitrary code into a block and giving it a name (anonymous
functions, anonymous classes; they all work). In Clojure, that would be a plain
lambda or a dedicated delay
construct:
(fn [] (+ 2 3)) ;; As lazy as it gets
(delay (+ 2 3)) ;; Similar, but the result is computed once and cached.
What distinguishes laziness as a language feature rather than a technique is
that it happens automatically and transparently for the user. The code that
consumes a value doesn't have to know whether the value is lazy or not, the API
is exactly the same, and there is no way to tell (actually, there often is,
but it's rarely necessary). In contrast, a delay
also represents a deferred
computation, but it has to be explicitly dereferenced with a @
.
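To illustrate the difference, here's a minimal sketch (the var names are hypothetical):
(def lazy-val (map inc [1 2 3])) ;; A lazy sequence is consumed like any other collection.
(first lazy-val)                 ;; => 2, no special syntax needed.

(def delayed-val (delay (+ 2 3)))
@delayed-val                     ;; => 5, has to be dereferenced explicitly.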
Laziness in Clojure
While Clojure was inspired by Haskell in several ways, its approach to laziness
is much more pragmatic. Laziness in Clojure is limited solely to lazy
sequences. Note that we don't say "lazy collections" because a sequence is the
only collection type that is lazy. For example, updating a hashmap is eager in
Clojure, whereas in Haskell, it would be lazy:
(assoc m :foo "bar") ;; Happens immediately
There are several sources from which a developer can obtain a lazy sequence:
- The most common are the sequence-processing functions like
map, filter, concat, take, partition, and so on. Such functions are lazy-friendly
(they accept lazy sequences and don't force their evaluation) and
return a lazy sequence themselves (even if the supplied collection was not
lazy).
- Functions that produce infinite sequences: iterate, repeat, range.
- Functions that provide a pull-based API to a normally limited resource:
line-seq, file-seq.
- Low-level lazy sequence constructors: lazy-seq, lazy-cat. Rarely used
outside of the clojure.core
namespace, where they serve as building blocks
for higher-level sequence functions.
Let's look at some example code that involves lazy sequences:
(let [seq1 (iterate inc 1)     ; Infinite sequence of natural numbers,
                               ; obviously lazy.
      seq2 (map #(* % 2) seq1) ; Arithmetic progression with step=2, still
                               ; infinite, lazy.
      seq3 (take 100 seq2)     ; Sequence of 100 items from the previous
                               ; sequence, lazy, nothing has happened yet.
      seq4 (map inc [1 2 3 4]) ; The result is lazy, even though the input is
                               ; a realized (non-lazy) vector.
      seq5 (concat seq3 seq4)] ; Lazy inputs and lazy result.
  (vec seq5))                  ; The actual work finally begins here because we
                               ; convert the sequence to a vector, and a vector
                               ; is not a lazy collection.
The example above shows how some functions produce lazy sequences, some consume
them and retain the laziness, and some force the evaluation like vec
does.
It takes some time to wrap your head around all that.
The reason why Clojure didn't go fully lazy is that laziness is expensive. For
each value with a postponed computation, the runtime has to keep track of it and
remember the code to be executed and its context (local variables). A value is
internally replaced with a wrapper that holds all that information, a thunk
(a Haskell term). Thunks trigger extra memory allocations, occupy space
in memory, and introduce indirections that slow down program execution. Many
of these inefficiencies can be alleviated by a sophisticated compiler like the one
in Haskell. But the design of Clojure calls for a simple, straightforward
compiler, so a fully lazy approach in Clojure would likely have caused
performance problems.
But that's not all. Even with just lazy sequences, the cost of creating a
thunk for each successor would be prohibitively high. To counter that, Clojure
employs a concept called sequence chunking. In simple terms, it means that
the "unit of laziness", the number of elements that get realized at a time, is
not 1 but 32. What this achieves is that when processing a large collection, the
overhead of the laziness machinery gets better amortized per item. Here's a
classic example of the chunking behavior:
(let [seq1 (range 100)
      seq2 (map #(do (print % " ") (* % 2)) seq1)]
  (first seq2))
;; Print output:
;; 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
;; 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
In this example, we introduced a side effect in the map
step to observe how
many items it operates on. In the final step, we ask only for the first item of
the lazy collection, so it's natural to assume that only one item gets
printed inside the map
, but no, we see 32 items printed. That is the effect of
chunking: when the lazy collection runs out of "realized" items, it
forces the evaluation of the next 32 items at once. All 32 are evaluated
regardless of whether the next step requires 1, 5, or 32 items. If we ask for 33 items,
64 items will get evaluated, and so on.
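To see the second chunk being forced, we can ask for the item at index 32 instead of the first one (a small REPL sketch building on the previous example):
(let [seq1 (range 100)
      seq2 (map #(do (print % " ") (* % 2)) seq1)]
  (nth seq2 32))
;; Prints 0 through 63: reaching index 32 forces the second chunk of 32 items.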
Lazy sequences in Clojure are cached, which means that the deferred values
are computed exactly once. The stored result is returned on subsequent accesses,
and the value is not computed anew. A short demonstration:
(let [s (time (map #(Thread/sleep %) (range 0 200 10)))]
  (time (doall s))
  (time (doall s)))
;; Elapsed: 0.1287 msecs - map produces a lazy sequence, nothing has happened yet
;; Elapsed: 1970.8 msecs - doall forced the evaluation
;; Elapsed: 0.0039 msecs - values are already computed, not reevaluated the second time
This differentiates lazy collections from plain lambdas, which don't retain the
evaluation result, and makes them closer to delay
, which does.
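Here is a minimal sketch contrasting the three (the names are hypothetical; printing is used to observe when evaluation happens):
(let [f (fn [] (println "fn computes") (+ 2 3))
      d (delay (println "delay computes") (+ 2 3))
      s (map #(do (println "seq computes") %) [5])]
  (f) (f)              ;; Prints twice: a lambda recomputes on every call.
  @d @d                ;; Prints once: delay caches the result.
  (first s) (first s)) ;; Prints once: the lazy seq caches realized values.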
By now, you should know enough about the specifics of lazy evaluation in Clojure to
appreciate the following chapters. We'll start by enumerating the indisputable
benefits of lazy collections and follow with the unfortunate drawbacks and
complications.
The good parts of laziness
Avoiding unnecessary work
The main value proposition of lazy sequences in Clojure (and of laziness in
other languages overall) is only computing what is needed. You can write code
without thinking upfront about how much of the result its callers will need
later. This inversion of control allows one to write code like this:
(defn parse-numbers-from-file [rdr]
  (->> (line-seq rdr)
       (map parse-long)))

(take 10 (parse-numbers-from-file (io/reader "some-file.txt")))
The function parse-numbers-from-file
doesn't have to know how many lines will
ever be needed; it doesn't have to wonder whether it is wasteful to parse all the
lines. The code is written as if it parses everything, and the calling code will
later decide how much will actually be parsed.
Infinite sequences
We can't represent an infinite sequence with any eager collection since it
would take infinite time to compute it. Instead, there are other ways an
infinite sequence could be represented: a streaming API of some kind or an
iterator. In the case of Clojure, lazy sequences serve as a fitting abstraction
for an infinite collection.
This makes for a great party trick, one of those language features that make
people go "wow" and get them hooked[3]. You write
(iterate inc 1)
and get all the natural numbers; how cool is that[4]? And since most sequence-processing functions are
lazy-friendly, you get to derive new infinite sequences which stay infinite
up until you demand some bounded result. Wanna make an infinite Fibonacci
sequence? Go for it:
;; The API of `iterate` already creaks a little here. We have to store each item
;; as a tuple of two numbers, and later drop the second number. That's because
;; `iterate` gives us access only to the immediately previous item.
(->> (iterate (fn [[a b]] [b (+ a b)]) [0 1])
     (map first)
     (drop 30) ;; This and the next step give us the 30th-40th elements of the sequence.
     (take 10))
=> (832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986)
Acting like you have infinite memory
Because a lazy sequence can act as a stream and only hold a single element of
the sequence in memory (or, rather, a 32-item chunk), it can be used to process
large files or network payloads using familiar sequence-processing functions
without paying attention to the size of the dataset. We've already looked at
this example:
(->> (line-seq (io/reader "very-large-file.txt"))
     (map parse-long)
     (reduce +))
The file we pass to it might not fit into memory if loaded completely, but that
doesn't concern us and doesn't change the way we write the code. line-seq
returns
a sequence of all the lines in the file, and the laziness ensures that not all of
them are resident in memory at once. This gives the developer one less hurdle to
think about. Even if they didn't think about it, the program might be more
robust because of laziness; say, the developer only tested such code on small
files, and the laziness guaranteed correct execution on large files too.
Unified sequence API
Lazy and non-lazy sequences, infinite sequences, and larger-than-memory sequences
can all be worked with using the same collection of functions. Clojure's designers
didn't have to implement individual functions for each subtype; therefore, you
don't have to learn them. The language core is thus smaller, and there are
fewer potential bugs.
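For instance, the very same functions apply to an eager vector and to infinite lazy sequences (a small sketch):
(take 2 (map inc [1 2 3]))         ;; => (2 3), eager vector input
(take 2 (map inc (range)))         ;; => (1 2), infinite sequence input
(take 2 (map inc (iterate inc 0))) ;; => (1 2), another infinite source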
The bad parts of laziness
In this section, I'll enumerate the problems that are either inherent to
laziness or incidental to its implementation in Clojure. Their order here is
arbitrary, but for each issue I'll give my personal opinion on how important it
is.
Many problems arise from the fact that for laziness to be completely seamless,
it demands full referential transparency. But Clojure is a practical language
that allows side effects anywhere and doesn't make it a priority to replace
side-effecting approaches with purely functional wrappers. Lazy sequences don't
play well with side effects, as we'll see many times in this section.
Error handling
In Haskell, error handling is achieved through special return values that may
contain either a successful result or an error (e.g.,
Either).
Thus, errors in Haskell are first-class citizens of the regular evaluation flow
and don't conflict with the laziness of the language.
Clojure, however, uses Java's stack-based exception infrastructure as its
primary error-handling mechanism. That is the most pragmatic choice since any
alternative solution would require repackaging all exceptions that could be thrown by
the underlying Java code. That would have had massive performance
implications. Besides, in a dynamically typed Clojure, result-based error
handling would just not be as convenient as in a statically typed language.
So, we're stuck with exceptions. And we have lazy sequences. What could go wrong?
Consider the following bug that I'm sure 99% of all Clojure programmers have run into
at least once:
;; We want to generate a list of numbers 1/x.
(defn invert-seq [s]
  (map #(/ 1 %) s))

(invert-seq (range 10))
;; java.lang.ArithmeticException: Divide by zero
We wrote a function that accepts a list of numbers and produces a list of
numbers 1/x
. All's good until somebody gives us a sequence that contains a
zero. We want to protect ourselves from that, so we fix our function:
(defn safe-invert-seq [s]
  (try (map #(/ 1 %) s)
       ;; For the sake of the example, let's return an empty list if we
       ;; encounter a division by zero.
       (catch ArithmeticException _ ())))

(safe-invert-seq (range 10))
;; java.lang.ArithmeticException: Divide by zero
Turns out, the function didn't get any safer from our fix. Even though it looks
like we wrapped the dangerous code in a try-catch
block, the code inside
merely returns a lazy sequence. There is nothing to catch
just yet. But by the
time the values are "pulled" from the lazy collection, the mapping code has
already left the try-catch
block, and the raised exception crashes the
program. To really fix this example, we have to watch for exceptions inside each
map
iteration:
(defn really-safe-invert-seq [s]
  (map #(try (/ 1 %)
             (catch ArithmeticException _ ##Inf))
       s))

(really-safe-invert-seq (range 10))
;; => (##Inf 1 1/2 1/3 1/4 1/5 1/6 1/7 1/8 1/9)
Of all the problems created by laziness, I consider this one especially serious.
The Java approach to exception handling teaches us that wrapping code in
try-catch
should handle the exceptions raised within[5]. In the presence of lazy sequences, you can no longer rely on
that unless you force all lazy evaluation (the woes of which I'll mention
later). This problem is unexpected, it is common, and it causes anxiety. And
you can only deal with it by either making sure there is no laziness in the wrapped
code or catching all exceptions inside each lazy iteration step.
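For completeness, here's a sketch of the other fix, forcing the evaluation while still inside try-catch (assuming an eager result is acceptable; the function name is hypothetical):
(defn eagerly-safe-invert-seq [s]
  (try (doall (map #(/ 1 %) s))
       (catch ArithmeticException _ ())))

(eagerly-safe-invert-seq (range 10))
;; => (), the exception is now thrown, and caught, inside the try block.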
Dynamic bindings
Similar to exception handling, dynamic bindings in Clojure are stack-based. As
soon as the execution exits the binding
form, the dynamic variable returns to
its previous value.
(def ^:dynamic *multiplier* 1)

(let [seq1 (binding [*multiplier* 100]
             (map #(* % *multiplier*) (range 10)))]
  (vec seq1))
;; => [0 1 2 3 4 5 6 7 8 9]
The root value of the dynamic variable *multiplier*
is 1. We bind it to 100
and multiply a bunch of numbers by that variable. We would expect the
numbers in map
to be multiplied by 100. However, due to laziness, the actual
execution of map
only happens at the (vec seq1)
step, and the binding is long
gone by that time.
One way to combat this is to wrap the function that would be executed lazily in
a special bound-fn*
function. bound-fn*
captures all dynamic values it sees
at the moment of its invocation and passes those values to the wrapped function.
It doesn't matter at which point in time the wrapped function is executed;
it will observe the dynamic variables as if it ran immediately.
(let [seq1 (binding [*multiplier* 100]
             (map (bound-fn* #(* % *multiplier*))
                  (range 10)))]
  (vec seq1))
;; => [0 100 200 300 400 500 600 700 800 900]
In my opinion, this interaction with laziness significantly reduces the
usefulness of dynamic variables for anything important. Sure, multi-threading
also "breaks" dynamic variables, so laziness is not the only one to blame. But
this is something to be constantly aware of, and most of the time, it's easier
and more sensible to forgo dynamic variables in your code altogether.
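The simpler escape hatch, assuming you can afford eagerness, is to evaluate while the binding is still in effect:
(binding [*multiplier* 100]
  (mapv #(* % *multiplier*) (range 10)))
;; => [0 100 200 300 400 500 600 700 800 900]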
Freeing resources
Somewhat similar to the previous two problems, freeing a previously acquired
resource (e.g., a file handle, a network socket) happens at a specific point in
time, and all the execution delays that come with laziness don't play well with
that at all. Here's another bug that should be familiar to pretty much
everybody:
(defn read-words []
  (with-open [rdr (io/reader "/usr/share/dict/words")]
    (line-seq rdr)))

(count (read-words))
;; java.io.IOException: Stream closed
The implementation of with-open
opens the specified resource, executes the
body with that resource available, and finally closes the resource. Using
with-open
specifically is not important; you could write out the exact
steps manually, and the result would stay the same. Such resource management
implies that all the operations on the resource must happen inside that open
window, so you must be absolutely sure that no lazy code that still wants to use
the resource remains unexecuted after the resource is freed.
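A sketch of the usual fix, assuming the file fits in memory: realize the entire sequence before with-open closes the reader.
(defn read-words []
  (with-open [rdr (io/reader "/usr/share/dict/words")]
    (doall (line-seq rdr))))

(count (read-words)) ;; Works now: all lines were read inside the open window.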
This kind of bug doesn't creep into my code often, but when it does, I despise
it and go on another sweep to clear the program of all potential laziness. I'd
call it a medium-sized problem.
Brittle for "big data" processing
Lazy sequences are convenient for processing "larger than memory" datasets.
However, speaking from personal experience, this approach is robust only as long
as the data access pattern is linear and straightforward. Back in the days when
I was enamored with laziness, I developed a program that had to chew through
several large files containing nested data, and I heavily relied on lazy
sequences to achieve that. The program eventually grew more complex, and the data
access moved from trivial "fly-by streaming" to requiring aggregation,
transposition, and flattening. I ended up having zero understanding of how much
memory the program needed at any given point in time and pretty much no confidence
that the already processed items would be promptly discarded to make room for
new ones.
There is even an officially recognized mistake of holding onto the
head which is
relatively easy to make. If you retain a reference to the head of a large
sequence, the runtime has to keep the entire sequence in memory as you
walk over it. Eventually, you will run out of memory. The Clojure compiler
aggressively nils out all local variables as soon as they are no longer used,
a feature called locals
clearing. It
exists solely to prevent some of the holding-onto-the-head bugs.
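A sketch of how easy this is to trip over (don't actually evaluate it with a small heap; the exact behavior depends on heap settings):
(let [s (map inc (range 100000000))]
  [(last s) (first s)])
;; The local `s` is still needed for the later (first s) call, so it cannot be
;; cleared while `last` walks the sequence. Every realized element is retained,
;; likely ending in an OutOfMemoryError.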
This is a major issue for me because it directly contradicts lazy sequences'
most exciting value proposition. If a tool becomes unreliable under the conditions it
specifically claims to address, then the suitability of that tool should be
questioned.
A large/infinite sequence is a powder keg
You should exercise extreme caution when returning unbounded lazy sequences as
part of your API. The consumer has no way to tell whether the result is safe to load
into memory entirely other than by relying on documentation. The consumer must
also know which functions are safe to invoke on a lazy collection and which would
blow up. Frequently, they will not know or think about these implications, and
things will go sour.

Even when you are certain that the sequence you return is not infinite or overly
large, there are still ways to screw up the users of your API. A good example
would be Cheshire, whose function
parse-stream
returns a lazy sequence if the top-level JSON object is an array. Combine that
with the problem of releasing resources and, God forbid,
asynchronous processing, and you get a bug that I once spent a literal hour
figuring out[6].
This issue deserves at least medium importance. Absent-minded handling of lazy
collections can lead to lurking problems that may stay hidden for years and
then surprise you in the most baffling ways.
Confusing side effects
As I said before, laziness is not a problem when the code is referentially
transparent. If you can freely substitute the evaluation with its result and
nothing changes for the observer, then laziness is fine. Fortunately, Clojure
doesn't enforce referential transparency, and you are allowed to add side
effects anywhere in your code. Then, unexpectedly, you witness this:
(defn sum-numbers [nums]
  (map println nums) ;; Let's print each number for debugging.
  (reduce + nums))

(sum-numbers (range 10))
;; => 45
;; But where are the printlns?
After fifteen minutes of distrustful debugging, googling, and/or asking on
Clojurians, you find out that the map
call
never executes due to laziness and the fact that nobody consumes its return
value. Around that time, you also learn what you should have done differently
and carry that lesson through the rest of your life:
- Wrap the printing form in dorun
or doall
:
(dorun (map println nums))
.
- Use run!
instead of map
:
(run! println nums)
.
In my book, this is a manageable problem. It is feasible to remember whether you
may be dealing with laziness when you want to trigger side effects. That's
not to say that you will succeed every time; bugs related to side effects
and laziness happen to experienced programmers too.
Convoluted benchmarking and profiling
Regardless of how functionally pure a programming language is, every function
will always have at least one side effect: the time spent executing it (plus
memory allocations, disk I/O, and other resource usage). By deferring the
execution to another point in time, the language makes it harder for the
programmer to know where those CPU cycles are spent. A simple example using
Criterium:
(crit/quick-bench (map inc (range 10000)))
;; Evaluation count : 34285704 in 6 samples of 5714284 calls.
;; Execution time mean : 16.188222 ns
;; ...
This result of 16 nanoseconds doesn't prove that Clojure is amazingly swift,
but rather that you have to be vigilant when benchmarking code that potentially
involves laziness. This is the result that you should have obtained:
(crit/quick-bench (doall (map inc (range 10000))))
;; Evaluation count : 2088 in 6 samples of 348 calls.
;; Execution time mean : 299.631257 µs
;; ...
The same goes for profiling. Laziness and all the execution ambiguity that comes
with it make the hierarchical profiler view quite useless. Consider this example
and the flamegraph obtained with
clj-async-profiler:
(defn burn [n]
  (reduce + (range n)))

(defn actually-slow-function [coll]
  (map burn coll))

(defn seemingly-fast-function [coll]
  (count coll))

(prof/profile
 (let [seq1 (repeat 10000 100000)
       seq2 (actually-slow-function seq1)]
   (seemingly-fast-function seq2)))
On the flamegraph, you can see that most CPU time is attributed to
seemingly-fast-function
, while actually-slow-function
is nowhere to be found.
By now, it should be crystal clear to you what has happened:
actually-slow-function
returned a lazy sequence and didn't do any work, whereas
seemingly-fast-function
, by calling the innocuous count
, triggered the entire
computation. This might be easy to interpret in a toy example,
but in real life, such execution migrations will certainly bamboozle you.
If you don't measure the execution time of your programs often (too bad!), then
this drawback will not affect you much. I personally do that a lot, so for me,
it is a medium-to-large source of headache and another solid reason to avoid
laziness.
Inefficient iteration with the sequence API
This problem is not caused by laziness directly. Rather, Clojure's sequence API
has to accommodate lazy collections, among other things, so it is quite
restrictive in what it can offer. Basically, Clojure's sequence interface
ISeq
defines human-readable replacements for car and
cdr. You can iterate over pretty much
everything with this abstraction, but it is far from efficient for anything but
linked lists. Let's measure it using time+:
;;;; Classic hand-rolled iteration with loop.
(let [v (vec (range 10000))]
  (time+
   (loop [[c & r :as v] (seq v)]
     (if v
       (recur r)
       nil))))
;; Time per call: 238.92 us   Alloc per call: 400,080b

;;;; doseq
(let [v (vec (range 10000))]
  (time+ (doseq [x v] nil)))
;; Time per call: 41.50 us   Alloc per call: 20,032b

;;;; run!
(let [v (vec (range 10000))]
  (time+ (run! identity v)))
;; Time per call: 42.65 us   Alloc per call: 24b
In the first snippet, we perform a basic, most flexible iteration with loop
.
You usually resort to it in any non-trivial iteration scenario (when you have to
accumulate several different results at once or walk through the sequence in a
non-obvious way). We see that it takes 240 microseconds to merely iterate
over that vector, and 400KB worth of objects get allocated along the way. The
second snippet uses doseq
, which contains multiple chunking optimizations.
Iteration with doseq
is 6 times faster than with loop
, producing 20
times less garbage on the heap. Finally, the reduce-based run!
offers the
same speed as doseq
in this example while not allocating anything as it runs.
How big of a problem this is depends on how much you care about performance.
For Clojure's creators, it is important enough that more and more
collection-processing functions are using the
IReduce
abstraction over ISeq.
Performance overhead
Like I said before, laziness is not free, and neither is it particularly cheap.
Consider an example[7]:
;;;; Lazy map
(time+
 (->> (repeat 1000 10)
      (map inc)
      (map inc)
      (map #(* % 2))
      (map inc)
      (map inc)
      doall))
;; Time per call: 410.22 us   Alloc per call: 480,296b

;;;; Eager mapv
(time+
 (->> (repeat 1000 10)
      (mapv inc)
      (mapv inc)
      (mapv #(* % 2))
      (mapv inc)
      (mapv inc)))
;; Time per call: 63.66 us   Alloc per call: 28,456b

;;;; Transducers+into
(time+
 (into []
       (comp (map inc)
             (map inc)
             (map #(* % 2))
             (map inc)
             (map inc))
       (repeat 1000 10)))
;; Time per call: 43.95 us   Alloc per call: 6,264b

;;;; Transducers+sequence
(time+
 (doall
  (sequence (comp (map inc)
                  (map inc)
                  (map #(* % 2))
                  (map inc)
                  (map inc))
            (repeat 1000 10))))
;; Time per call: 86.16 us   Alloc per call: 102,776b
The lazy version in the example takes 410 µs and produces 480KB of garbage to perform
several mappings over a sequence. The eager version utilizing mapv
is 6.5
times faster and allocates 16 times less for the same result. And that's
with all the intermediate vectors being generated on each step. The
transducer version is even faster
at 44 µs, with even less garbage spawned, because it fuses all the mappings into a
single step. In the last snippet, I show that composing the transformation steps
with transducers and producing a lazy sequence with sequence
is still much
faster and more allocation-efficient than building a processing pipeline with lazy
sequences directly.
I wanted to show the profiler results for the above examples, but with such a
performance disparity, there is nothing to gain from them. The profile for the
lazy version is dominated by creating intermediate lazy sequences and walking
over them. The mapv
version is mostly about updating TransientVectors.
Could it be that the lazy version is more efficient on shorter sequences? Let's
find out:
(time+ (doall (map inc (repeat 3 10))))
;; Time per call: 181 ns   Alloc per call: 440b

(time+ (mapv inc (repeat 3 10)))
;; Time per call: 159 ns   Alloc per call: 616b
As you can see, with the size of the input sequence as small as 3, mapv
shows
comparable performance to map
. Don't be afraid to use mapv
where you don't
need laziness.
This downside of laziness is significant. Plenty of Clojure code involves
walking over and modifying sequences, and 95% of it has no business being
lazy, so it is leaking performance onto the floor for no reason.
No way to force everything
While doall
makes sure that a lazy sequence you pass to it gets evaluated, it
only operates on the top level. If the sequence elements are lazy sequences
themselves, they will not be evaluated. An artificial example:
(let [seq1 (map #(Thread/sleep %) (repeat 100 10))]
  (time (doall seq1)))
;; "Elapsed time: 1220.941875 msecs"
;; As expected - doall forced the lazy evaluation.

(let [seq1 (map (fn [outer]
                  (map #(Thread/sleep %) (repeat 100 10)))
                [1 2])]
  (time (doall seq1)))
;; "Elapsed time: 0.139 msecs"
;; Because the lazy sequences were inside another sequence, doall didn't force them.
The same is true when lazy sequences are part of some other data
structure, e.g., a hashmap.
(let [m1 {:foo (map #(Thread/sleep %) (repeat 100 10))
          :bar (map #(Thread/sleep %) (repeat 100 10))}]
  (time (doall m1)))
;; "Elapsed time: 0.01775 msecs"
;; doall doesn't work on hashmaps and is not recursive.
You may encounter a situation like this occasionally, and it is extremely annoying when
it happens. If you want to force immediate evaluation in such cases, your only
options are:
- If you have access to the code that produces those constituent lazy seqs,
force them there.
- Use clojure.walk
to walk the nested structure recursively and call doall
on everything (see the sketch after this list).
- Call
(with-out-str (pr my-nested-structure))
and discard the result.
Printing the structure will walk it for you and realize any lazy sequences
inside. This is the dirtiest and most inefficient approach.
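A sketch of the clojure.walk option (the helper name is hypothetical). It relies on the fact that clojure.walk rebuilds nested sequences with doall internally, so walking the structure with the identity function realizes every lazy sequence inside:
(require '[clojure.walk :as walk])

(defn force-all [form]
  (walk/postwalk identity form))

(let [m1 {:foo (map #(Thread/sleep %) (repeat 100 10))
          :bar (map #(Thread/sleep %) (repeat 100 10))}]
  (time (force-all m1)))
;; "Elapsed time: ~2000 msecs" - both nested sequences were realized.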
This is a medium-sized problem. It is not too common, but if you do run into
it, it will spoil your day.
Chunking is unpredictable
I've already mentioned that Clojure evaluates lazy collections in chunks of 32
items to amortize the cost of laziness. At the same time, this makes lazy
sequences unsuitable for cases when you want to control the production of every
single element of the sequence. Yes, you can hand-craft a sequence with
lazy-seq
and then make sure to never call any function on it that uses
chunking internally. To me, this sounds like yet another way to make your program
fragile.
To be honest, I don't know exactly how and when chunking works. As I was writing this
post, I stumbled upon this:
(let [seq1 (range 10)
      seq2 (map #(print % " ") seq1)]
  (first seq2))
;; 0 1 2 3 4 5 6 7 8 9
;; Uses chunking.

(let [seq1 (take 10 (range))
      seq2 (map #(print % " ") seq1)]
  (first seq2))
;; 0
;; Doesn't use chunking.
In the first example, we used (range 10)
to produce a bounded lazy sequence,
and mapping over it used chunking. In the second example, we made an infinite
sequence of numbers with (range)
, took a bounded slice of it with take
, and
there was no chunking when it was mapped over. I'm sure the veil would be lifted
if I read enough docs and the implementation code. But I have no desire to do
that. Instead, I don't use laziness anywhere where chunking could make a
difference, so this quirk doesn't bother me.
Duplicate functions
While Clojure's sequence abstraction
greatly reduces code duplication and the need for type-specialized functions,
some repetitiveness has still made it into the language. For the large part, I
attribute that to laziness and the frequent need to avoid it.
- To map over a sequence, there is
map
and mapv
(and also run!
, but it is
useful on its own, beyond the laziness discussion). To filter, there are filter
and filterv
, and so on. Later versions of Clojure added a bunch of these
v-suffixed functions because, apparently, programmers often want to ensure eager
evaluation (and receive the result as a PersistentVector).
- There are two list comprehension macros:
for
and
doseq
. Yes, they are
semantically different (doseq
doesn't build a resulting sequence and can
only be used for side effects, like run!
). But I'd argue that if not for the
requirement to consume and produce lazy sequences, these two macros could have
had a common and much simpler implementation.
- Having to know and remember about
doall
and dorun
also adds mental
overhead.
None of this is a deal breaker, just something mildly irritating from a
perfectionist's perspective.
Mismatch between the REPL and the final program
In order to have an effective REPL experience, it is crucial for the programmer
to be confident that the REPL and the normal program execution behave the same.
The typical Clojure workflow presumes that you do most of your exploration in
the REPL, test the code, verify it, and finally incorporate it into the program.
It is the main feature of Clojure, its alpha and omega, its cornerstone. And
laziness compromises that, even if slightly.
The problem with laziness in the REPL is that you always implicitly consume the
result of the evaluation. The REPL prints the result; hence, any lazy sequences,
even the nested ones, will be realized before being presented. But copy that
expression into the final program, and it may no longer be the case. In the
REPL, it is very easy to forget that you may be dealing with lazy sequences,
unless you treat everything in round parens as a potential hazard (perhaps you
should!).
To me, this is a minor issue that you grow out of. There are other things to
keep track of when transferring REPL code into the final program (dirty REPL
state, order of definitions, and so on), and you learn to accept that. Still,
every time you have to tell a beginner, "in the REPL, it is different", a bit of
trust is eroded.
Large bytecode footprint of for and doseq
This is my personal silly gripe that may not be relevant to anyone else. The list
comprehension macros for
and doseq
are often very practical for mapping
over a collection, even without advanced features like filtering and splicing
nested iterations. But because they have to deal with laziness and chunking,
their expansion is absolutely huge. By using
clj-java-decompiler's
disassemble
facility, we can verify that and check how much bigger a for
expansion is compared to a hand-rolled iterator-based loop. Alternatively, we can do it
by manually enabling AOT compilation and comparing file sizes.
(binding [*compile-files* true
          *compile-path* "/tmp/test/"]
  (eval '(defn using-for [coll]
           (for [x coll]
             (inc x)))))
;; /tmp/test/ contains 4 files totalling 6030 bytes.

(binding [*compile-files* true
          *compile-path* "/tmp/test/"]
  (eval '(defn using-iterator [coll]
           (when coll
             (let [it (.iterator ^Iterable coll)]
               (loop [res (transient [])]
                 (if (.hasNext it)
                   (recur (conj! res (.next it)))
                   (persistent! res))))))))
;; /tmp/test/ contains 1 file with the size of 1832 bytes.
That extra bytecode will eventually be JIT-compiled to native code, further
polluting the instruction cache and stressing the iTLB. Again, this is an incredibly
minor issue compared to everything listed above, but it makes me reluctant to
use for
in situations where it would otherwise fit nicely.
How to live with it
This article is already longer than anything I've ever written, and I still have
to give guidance on what to do next. It is evident that I don't like
laziness. If I've managed to prove my point to you, the reader, then take the
following suggestions as my personal mitigation strategy for reducing the negative
impact of laziness.
The most straightforward advice is to avoid laziness when it isn't needed.
To achieve that, follow these steps (a combined sketch follows the list):
- Prefer v-suffixed functions (mapv
, filterv
) over their lazy counterparts.
- Use transducers and
(into [] <xform> <coll>)
for complex multi-step processing.
- If you still like a
->>
-threaded pipeline better, finish it off with an
eager final step or doall
or vec
.
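(let [coll (range 10)]
  ;; Lazy steps forced at the end:
  (->> coll (map inc) (filter even?) doall)
  ;; Eager at every step:
  (->> coll (mapv inc) (filterv even?))
  ;; A single eager pass with transducers:
  (into [] (comp (map inc) (filter even?)) coll))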
If you are a library author, don't return lazy sequences from your public
functions. If you want to let the user control and limit the amount of data
processed by your code, consider providing a transducer arity or returning an
eduction
.
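A sketch of what that could look like for the earlier file-parsing example (the function name is hypothetical):
(defn parsed-numbers [rdr]
  ;; An eduction is recipe-like too, but it loudly signals "not a realized
  ;; collection" and doesn't cache anything.
  (eduction (map parse-long) (line-seq rdr)))

;; The caller decides how much to consume, inside the resource scope:
(with-open [rdr (io/reader "some-file.txt")]
  (into [] (take 10) (parsed-numbers rdr)))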
Refrain from building a processing paradigm around lazy sequences. It may seem
tempting to return a lazy sequence, thinking that the user can save some
execution time by not consuming the full result. It almost never happens. First,
the result is rarely used only partially. Second, good performance is never
accidental. If the user is conscious of program performance and measures it,
they will find ways to cut down the unnecessary work anyway.
In cases when you deal with infinite or exceedingly large sequences, regardless
of whether you are working on an application or a library, choose explicit
representations for them. It may again be an eduction, a Java stream, even
an Iterator or a cursor. Anything that more clearly signals the non-finite and
piecewise nature of the collection will evade most of the laziness problems I
described.
Transducers are overall an adequate replacement for lazy sequences. Perhaps they
are somewhat less convenient to experiment with interactively, but the benefits
they offer are solid. You may even move back from them into lazy-land with
sequence
if needed.
If you agree with this article, share it with others. Show it to your
coworkers, discuss it, change the common perception. Make adjustments to your
code quality standards, weed out laziness during code review. Admit that it is
easier to fight lazy sequences in the codebase than to scold yourself for not
utilizing them properly.
Clojure will never drop lazy sequences because of backward compatibility, and
that is a good thing. It is in our power and control not to suffer from their
existence, and acknowledging the problem is the first step to overcoming it. I
hope I made my point clear; please tell me your opinion on this and whether I missed
anything (since you might be too lazy to do that, I explicitly ask for
feedback). Cheers.
- Yes, I am projecting.↑
- Upwards? Why are the trees in CS always upside
down?↑
- Many people are initially hooked by things that look
impressive (even if those things don't help much in everyday work) but stay
for the mundane benefits. Maximizing that first impression is essential for language
adoption.↑
- Despite
(range)
being shorter and more efficient,
it doesn't look as magical.↑
- Sure, multi-threading and callbacks already break
this premise in both Java and Clojure. Still, you are usually more aware
when you use those. Laziness is more pervasive and
incidental.↑
- I tried to reproduce this for the blog post but
couldn't trigger it. Either something has been fixed in Cheshire, or the bug
only surfaces under certain conditions. But I am 100% positive it happened to
me!↑
- The numbers in the example are picked so that
all computed boxed numbers stay within the Java Integer
Cache. Otherwise, the
execution time and allocations would be dominated by producing new Long
objects.↑