
Clojure Don’ts: Lazy Effects (2015) - lsh
https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects
======
taneq
This isn't just a Clojure "Don't". It's a general programming "don't".

In my old team we had a semi-serious production bug result from this (it
didn't actually kill anyone but stopped the system from being used) because we
had an expression with a side effect inside a debug logging statement. The
code worked perfectly in a debug build, but in release the DEBUG_PRINTF()
equivalent was removed by the C++ preprocessor (so logically equivalent to
"debug && " in front of every such line) and a crucial value didn't get updated.
It took some tense minutes debugging it on site in front of the client before
we figured it out... lesson learned!

~~~
kerkeslager
It's an obvious-enough "don't" that it's one of the stated reasons for
universal laziness in Haskell. Simon Peyton Jones has said that he's committed
to keeping everything in GHC lazy to keep them honest about side effects. I
think the idea is basically that if something has side effects, it will become
painful to use immediately due to Haskell's laziness.

~~~
jes5199
I’ve had a Haskell program fail in exactly the same way, though - removing a
debug statement caused a program to start crashing - even though the lazy
functions were completely pure.

We had a numeric counter that was enumerating the frame number of a
simulation. When we removed the debug statement, we didn’t realize that no
other code ever evaluated the counter - so instead of resolving each X+1 into
an integer, Haskell kept it as an ever-growing nested series of unevaluated
thunks: 1+1+1+1+1+1+1+1+1+1+... , and our efficient little program started
filling up all its available RAM and crashing.

~~~
taeric
Just, wow. I'm curious if there are mechanisms that could guard against
growing strings of unevaluated thunks. Would be an interesting GC problem. One
where you free up space by forcing a maximum depth of unevaluated space.
(You'd obviously have to distinguish any "infinitely extra" computations from
the "do this on top of that" ones. But that shouldn't be impossible, should it?)

~~~
zenhack
Yeah, this is Haskell's biggest footgun by far. The language gives you tools
to control evaluation order (including some neat libraries that let you
parallelize things without much disruption to the logic of your program), but
there are no silver bullets.

A good rule of thumb is to mark fields of basic types like Int, Bool, etc. as
strict by default, and leave larger structures (trees, lists, etc.) lazy. But
you still need to be careful.

The compiler trying to silently fix things is probably a bad idea. The current
behavior is at least easy to understand; I'd hate to have a program that's
working because the compiler could figure out that it could make something
strict, and then I bump something mostly unrelated such that the optimizer
can't be sure anymore, so I get a space leak.

Also, unintended evaluation can potentially cause high memory use as well
(e.g. [1..1000000]), so the compiler also has to be careful about
_introducing_ excessive memory use.

The compiler does do some strictness analysis, but it's a hard problem.

I like the way Idris does things -- strict by default, laziness controlled by
the type system, and some nice support for automatic coercions.

~~~
taeric
To be clear, I was not voting for the compiler to do this, but the runtime. It
should be instrumented and it would likely be a round trip process sometimes,
with people tweaking how it does things.

------
djtango
This is a sound article - Clojure's laziness is often a source of common
gotchas for new users of the language.

It's often not clear what is lazy and what isn't - concat is lazy, for
instance. It isn't externally obvious which of the very common operations are
lazy; filter being lazy has bitten me in CLJS a few times too.

Adding to the confusion, the REPL will often realize your side effects, which
then won't get evaluated in your runtime environment.

One of the errors I used to make earlier on was:

    (map insert-into-db table-rows)
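
A minimal sketch of how this goes wrong. `insert-into-db` here is a hypothetical stub that records rows in an atom rather than touching a real database, and `db`, `pending`, `rows-before` and `rows-after` are names made up for the illustration:

```clojure
;; Hypothetical stand-in for a real insert: appends the row to an atom.
(def db (atom []))

(defn insert-into-db [row]
  (swap! db conj row)
  row)

(def table-rows [{:id 1} {:id 2} {:id 3}])

;; `map` returns a lazy seq, so no insert has run at this point.
(def pending (map insert-into-db table-rows))
(def rows-before (count @db))

;; Consuming the seq (which is what printing it at the REPL does)
;; finally triggers the inserts.
(dorun pending)
(def rows-after (count @db))

(println rows-before rows-after) ; prints "0 3"
```

In a running program nothing may ever consume the seq, in which case the rows are simply never inserted.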

~~~
serpix
Always mapv when doing I/O on the backend, and always an extra vec when
returning sequences in the frontend.

~~~
joncampbelldev
run! and doseq are for IO on each element of a sequence. Functions like mapv
are intended for data transformation, not side-effects. This is why run! and
doseq return nil.
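
For illustration, a small sketch of both forms. `record!` and `log` are made-up names standing in for real I/O:

```clojure
;; Stand-in for an I/O target; a real program would write somewhere else.
(def log (atom []))

(defn record! [x]
  (swap! log conj x))

;; run! takes a function and a collection, and runs eagerly.
(def run-result (run! record! [1 2 3]))

;; doseq takes bindings and a body, and also runs eagerly.
(def doseq-result (doseq [x [4 5]] (record! x)))

(println run-result doseq-result @log) ; prints "nil nil [1 2 3 4 5]"
```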

~~~
djtango
I don't find myself doing sequential side effects so often anymore, but when
I do, doseq has been my go-to, though I will confess I didn't know about
run! until this thread.

------
noelwelsh
This is part of the reason for monads: they specify the order of operation,
which is necessary in a lazy language like Haskell. The `bind` or `flatMap`
operation in a monad specifies "what happens next". Once you have defined
order of operations you can start to reason about effects.

~~~
zenhack
It's worth noting that while monads themselves naively introduce a data
dependency, that doesn't necessarily force evaluation order. If the compiler
is smart enough and sees something like:

    foo >>= \_ -> bar


It is well within its rights to evaluate bar and then foo, or do both in
parallel, or not evaluate foo at all, as long as it can guarantee that the
resulting value is the same. The big thing it needs to be careful of is not to
introduce nontermination.

What makes this do the right thing for effects is what values of type IO
actually _are_, and what bind means specifically for IO. It's helpful to
think of an IO value as code in another (imperative) language. Bind takes two
fragments of code in that other language, and stitches them together into a
script that executes one after the other.

The key thing is that the order in which you compute parts of the script is
entirely orthogonal to the order the commands appear _in_ the script.

In Haskell, evaluation does not cause side effects, period. `main` is a value
of type IO, which is executed when the program is run. The effects of that
execution are independent of how the value `main` is computed.

Obviously, the computation itself takes time and space though.

------
setzer22
Something that is especially dangerous is mixing laziness and dynamic vars. If
you bind a dynamic var and then return a lazy seq, and the dynamic var is used
to generate the seq's elements (e.g. `binding` over `map`, and use the bound
var inside the map's `fn`), you may get different results depending on when
the seq is evaluated (which is not determined).

This made me waste a whole afternoon once. I was even ready to submit a
compiler bug!
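
A minimal sketch of the trap, with hypothetical names (`*factor*`, `scaled`, `escaped`, `forced`) chosen only for the illustration:

```clojure
(def ^:dynamic *factor* 1)

(defn scaled [xs]
  ;; The fn reads *factor* when an element is realized,
  ;; not when `map` is called.
  (map #(* *factor* %) xs))

;; The lazy seq escapes the binding unrealized, so it later sees
;; the root value of *factor*.
(def escaped
  (binding [*factor* 10]
    (scaled [1 2 3])))

;; doall realizes the seq while the binding is still in effect.
(def forced
  (binding [*factor* 10]
    (doall (scaled [1 2 3]))))

(println (vec escaped) (vec forced)) ; prints "[1 2 3] [10 20 30]"
```

Forcing with `doall` inside the `binding` is one way out; capturing the value in a `let` before building the seq is another.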

------
nathell
> In my opinion, the presence of doall, dorun, or even “unchunk” is almost
> always a sign that something never should have been a lazy sequence in the
> first place.

I’d say it’s a good rule of thumb, but it’s sometimes justified. For example,
`line-seq` returns a lazy seq of lines read from a given Reader; the appeal
being you can process them one by one, without keeping them in memory all at
once. But if you just want them all, you wrap the `line-seq` in a `doall`
inside a `with-open`.
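
A minimal sketch of that pattern. The temp file and the `read-all-lines` name are assumptions made so the snippet is self-contained:

```clojure
(require '[clojure.java.io :as io])

;; Throwaway temp file so the sketch does not depend on an existing path.
(def tmp (java.io.File/createTempFile "lines" ".txt"))
(spit tmp "a\nb\nc\n")

(defn read-all-lines [f]
  (with-open [rdr (io/reader f)]
    ;; doall realizes every line while the reader is still open.
    ;; Without it, the lazy seq would be consumed after with-open
    ;; has already closed the reader.
    (doall (line-seq rdr))))

(def lines (read-all-lines tmp))
(println lines) ; prints "(a b c)"
```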

My scraping library, Skyscraper [1], has a similar justification for laziness
around side-effects: scraping a site returns a lazy sequence of elements, each
corresponding to one page. It's terrifically useful to have that sequence be
lazy, and there's unchunking code in Skyscraper to enforce full laziness.
Incidentally, I'm rewriting it to be based on core.async, but it has a less
functional feel to it.
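
To illustrate what unchunking buys you, here is a rough sketch. It is not Skyscraper's actual code: `unchunk` below is a commonly seen helper, and `noisy-inc` and `calls` are made up to count how many elements get realized:

```clojure
(def calls (atom 0))

(defn noisy-inc [x]
  (swap! calls inc) ; side effect: count each realized element
  (inc x))

;; Rebuilds a seq one cons cell at a time so consumers cannot
;; realize a whole chunk at once.
(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; Chunked source: asking for one element realizes a whole chunk.
(first (map noisy-inc (range 100)))
(def chunked-calls @calls)

(reset! calls 0)

;; Unchunked source: only the requested element is realized.
(first (map noisy-inc (unchunk (range 100))))
(def unchunked-calls @calls)

(println chunked-calls unchunked-calls) ; prints "32 1"
```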

[1]: https://github.com/nathell/skyscraper

------
zdkl
Does mapv not do exactly what the author needs? I.e. a non-lazy map?
https://clojuredocs.org/clojure.core/mapv

~~~
Zak
It can, but using it that way is considered non-idiomatic in Clojure. The
intent of using variants of map is to produce a collection with each element
transformed. Someone who reads the code might reasonably expect not to find
side effects inside any variety of map.

Clojure provides doseq and run! for side effects on collections, and both
return nil. One might get the impression that these design choices are
intended to discourage the programmer from complecting transformations of
sequences with performing side effects on their elements.

Most of the time, you can replace one map variant with another, such as pmap
(lazy, parallel), and have the program behave the same way. Using mapv for
side effects breaks this assumption.

------
dustingetz
This problem is magnified in ClojureScript doing React.js rendering. React.js
renders breadth-first in strata (compared to a call stack which is depth
first). Everyone gets bitten by it once.

------
grzm
(2015)

~~~
jxub
That is valid clojure too ;)

~~~
kimi
No way, a number does not implement IFn :-)

~~~
agumonkey
Curious... are numbers open to extension? So I can (1024 2) => 10

