
How to Avoid the Assignment Statement (2010) - ingve
http://loup-vaillant.fr/tutorials/avoid-assignment
======
ch33zer
I disagree with the premise of this article. Sometimes using a mutable
variable really is the simplest way to implement something:

    
    
      def min(foo: List[int]) -> Optional[int]:
        min = None
        for val in foo:
          if min is None or val < min:
            min = val
        return min
    

I argue this is much easier to read than the recursive version without
assignments:

    
    
      def min(foo: List[int]) -> Optional[int]:
        if not foo:
          return None
        elif len(foo) == 1:
          return foo[0]
        else:
          rest_min = min(foo[1:])
          return rest_min if rest_min < foo[0] else foo[0]
    

We're optimizing for readability, and sometimes readability means mutable
variables. And this says nothing about performance, which is often much
better with mutable variables in languages like C++.

Edit: people have posted better versions of my code in the comments. Take a
look for more Pythonic examples.
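One such version (a sketch only; `smallest` is a made-up name to avoid
shadowing the builtin it leans on):

```python
from typing import List, Optional

def smallest(foo: List[int]) -> Optional[int]:
    # The builtin accepts a fallback for empty iterables (Python 3.4+),
    # so neither a loop nor a reassigned local is needed.
    return min(foo, default=None)
```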

~~~
samhh
Firstly, there's a nuance here that readability is heavily dependent upon the
language. A language where recursion is the recommended approach to these
sorts of problems will have far better syntax for doing so, such as in
Haskell.

Secondly, I think there's something to be said for familiarity. I think
looping is second nature to most of us not because it's superior or more
intuitive, but because traditionally this is one of the first things we're all
taught. If functional programming were the norm and people were taught
recursion early on instead I think your notion of simplicity would differ.

I've been teaching a little bit of programming to a beginner lately, and
they've often found functional approaches to problems easier to grasp than
their imperative equivalents because they're more mathematical. It's like an
expression of something they already know versus looping, mutability, et al
which are new concepts entirely.

~~~
seanmcdirmid
Looping has a huge advantage in terms of how we do things in real life. We
don’t think of “pound 100 fence posts into the ground” recursively.

Recursion is much more of a mathematical concept that we learn about long
after preschool, while my 3 year old already gets loops.

~~~
loup-vaillant
> _Looping has a huge advantage in terms of how we do things in real life._

Anthropomorphism is the bane of our craft. Our untrained intuitions don't
really matter.

> _Recursion is much more of a mathematical concept that we learn about long
> after preschool_

Programming is applied mathematics. Different from the kind of maths you learn
at school of course, and often even more rigorous.

I personally have no qualms about requiring some mathematical proficiency in
programming. Makes you much more capable at a number of applications later on.

~~~
greggman3
I have a huge problem with requiring mathematical proficiency in programming.
It sets an unreasonable barrier to entry.

Thousands of every day apps and services work just fine without anything more
than simple arithmetic. 3 or 4 of the top 5 websites required no math to get
started.

Spreading the idea that you need to know math before learning to program is
elitist and counterproductive, IMO.

~~~
andrepd
> It sets an unreasonable barrier to entry.

Maybe that's not such a bad thing? When a big chunk of our world runs on
computers, maybe it's not such a bad thing to have standards and take things
seriously, like it is done in every single other engineering profession.

~~~
seanmcdirmid
So we don't allow kids to learn how to program until they take a lot of
requisite math first? I was planning on teaching my kid some programming when
he turned 5 or so, but should I hold off until he is 15 because obviously he
isn't going to know all of that math before hand?

I believe that programming is a skill, not a profession. You can know how to
program without being a professional programmer.

~~~
RHSeeger
Kids can learn how to build bridges out of legos and erector sets before they
learn math and (intro to?) materials science and a host of other things. But
we don't let them be civil engineers until they learn the basics of these
things. It's entirely possible to learn about and "play with" something
without the things that are considered prerequisites for being a professional
at it.

~~~
seanmcdirmid
Again, programming is not a profession. You might have a point with being a
professional programmer (although a lot of programmers do useful things for
money without much math), but programming is a skill that can be learned
without a deep understanding of math, and often programming acts as a better
lead into math rather than vice versa.

~~~
RHSeeger
Neither is designing or building scale model bridges. But the minute you put a
bridge or software project into a situation where many people depend on it and
it can cause serious damage, it tends to become one. And that is when we would
start applying stricter standards.

~~~
seanmcdirmid
Sure, but this just brings up the fact that not all programming is the same.
Some people are building small programs that don't need stricter standards,
some people are building large ones that do. We just happen to call lots of
activities that involve writing code programming, but they are fundamentally
different.

------
wffurr
I generally prefer expression-oriented programming like this, but it's hard to
pass up the efficiency improvements of arrays and hashmaps. The purely
functional list given in the article is actually an order of magnitude less
efficient to iterate on modern caching pipelined processors.

What's the best way to have your expression-oriented programming and use
cache-friendly data structures? How do Haskell et al handle this?

~~~
kqr
Local impurity. The key insight is that mutation/impurity isn't necessarily
bad in and of itself. The important thing is that it is guaranteed to not
cross API boundaries -- this prevents abstractions from leaking at least one
way, and lets you reason about the code as if it were pure. (And in fact, from
the outside it is.)

This allows mutation to remain the performance optimisation it is, without
allowing it to seep into the design of systems, which is where it causes
badness.

As far as I know, Haskell is one of the few languages (if not the only one)
that supports local mutation that is guaranteed to not be visible in any way
to the caller.
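In Python terms the idea looks something like this (a loose analogue only;
Haskell's `ST` gets the guarantee from the type system, here it's just a
convention, and `sorted_copy` is a made-up name):

```python
from typing import List

def sorted_copy(xs: List[int]) -> List[int]:
    # Impure inside: sort a private copy in place for speed.
    scratch = list(xs)
    scratch.sort()
    # Pure outside: the caller's list is untouched, and the same
    # input always yields the same output.
    return scratch
```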

------
overgard
I like this style in other languages, but unfortunately it's really hard to
write in C: most of his examples with functions would be pretty inefficient,
or at least have somewhat confusing lifetimes. For instance, the functions
returning a string on the stack seem, at a glance, inefficient. I think
modern C++ compilers might be able to use move semantics for the examples
returning a string, but I'm not entirely sure, and I personally don't like
relying on compiler optimisations.

------
leephillips
This is great advice for many types of programs; for example, an application
to be run in a web browser, or an interpreter. But there is a large class of
programs for which “avoiding the assignment statement” does not apply. The
author knows this, and mentions it in the “When you can't help it” section:

“Sometimes, you just can't avoid the assignment statement or other side
effects. Maybe you need so much efficiency that you have to mutate state to
optimise your program”.

This class of programs includes _all high performance numerical code_. These
programs compute by declaring large multidimensional arrays (usually
distributed among multiple compute nodes), reading values from array elements,
doing arithmetic on those values, and writing the results into array elements.
These elements have fixed memory addresses, so the computation consists in
changing the values stored in fixed memory locations. Efficiency comes from
minimizing the movement of data and parallelizing as much as possible. It’s
what Rich Hickey would call computing with places, not with values.

------
jandrese
Does anybody else get nervous when the author goes "don't worry about
efficiency, the compiler will fix that for you"?

~~~
derefr
It’s important to understand where compiler optimization opportunities come
from. They’re not magic; they’re determined by the _information_ the compiler
has available to it—information about constraints you’ve imposed on it, and
information about guarantees it can make.

Using separate variables instead of reusing the same variable will sometimes
actually _increase_ the optimizations that compilers can do. Why? Because the
version where you overwrite the variable is introducing additional constraints
(e.g. keeping X1 and X2 in the very same stack slot called X) that have costs,
but whose semantics you aren’t necessarily (or even usually!) relying upon.

For example: you know all that spare register spill space in the x64 calling
convention that you’re allocating either way, and which deallocates as a block
on function exit? Can’t make efficient use of that if you’re telling the
compiler it has to keep both of your pointers successively in the same memory
address!

On the other hand, the opposite is also true: sometimes overwriting a variable
will _free_ the compiler from a constraint it was under. This mostly comes
about as the result of X1 and X2 both holding a shared or overlapping pointer,
introducing aliasing (as often happens in the pure-functional data structures
OP advocates for.) By overwriting X1 with X2 (or, equivalently, by freeing X1
as soon as you’re done with it!) you can allow the compiler to be more
confident that nothing will access the shared pointer through X1.Y any more,
only through X2.Y—which may allow e.g. smart pointer classes to elide a
synchronization point.

(Note that this assumes a language with a model where a variable directly
relates to memory, such that assigning to the same variable implies assigning
to the same memory. This is true in most low-level languages, but not in
functional languages; in e.g. Elixir, a second assignment to the same variable
is actually creating a second variable with a mangled name, and leaving it to
the compiler’s lifetime analysis to figure out that the first one is dead
after that point.)

~~~
lou1306
> Using separate variables instead of reusing the same variable will sometimes
> actually increase the optimizations that compilers can do

LLVM optimizations, for instance, often need the IR to be in SSA form (static
single assignment).

~~~
strbean
If you're converting it to SSA form, is there really a difference between

    
    
        int a = 5
        int b = 2 * a
        // don't use a again

and

    
    
        int a = 5
        a = 2 * a
    

?

~~~
lou1306
I don't think I really understand the question, but please bear with me.
Indeed both code snippets are the same computation. The first snippet is SSA,
the second is not. Notice that doing, for instance, constant propagation on
the first snippet is straightforward:

    
    
        int a = 5
        int b = 2 * 5
    

You can do this because SSA form guarantees that the value of a is 5. In non-
SSA code, there is no such guarantee: you need to check the "history" of each
variable before replacing it with a value.

I'm not totally convinced that SSA "increases the optimizations that compilers
can do". However, it definitely makes optimization algorithms easier.

~~~
strbean
I guess my point was that the optimizations available are the same (between
those two snippets of code) if you end up in SSA form. Thinking about it now,
I don't know what optimizations would be done before going to SSA and how they
would be affected.

------
recursive
Has anyone ever seen a linear-time algorithm to do a histogram without
mutation? I mean like `Counter` from python.
[https://docs.python.org/3.8/library/collections.html#collect...](https://docs.python.org/3.8/library/collections.html#collections.Counter)
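The closest mutation-free version I can come up with is O(n log n), not
linear: sort, then count runs of equal keys (a sketch; `histogram` is a
made-up name, and the comprehension still mutates a dict under the hood):

```python
from itertools import groupby

def histogram(xs):
    # sorted() returns a fresh list; groupby yields runs of equal
    # elements, so each key's count is just the length of its run.
    return {k: sum(1 for _ in g) for k, g in groupby(sorted(xs))}
```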

~~~
QuinnWilton
It's not exactly the same, but that's pretty comparable to a g-counter CRDT,
which is very straightforward to implement in a purely-functional language:
[https://github.com/QuinnWilton/distsys_training/blob/master/...](https://github.com/QuinnWilton/distsys_training/blob/master/shortener/lib/shortener/g_counter.ex#L1)
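The core of it can be sketched in a few lines of Python (function names are
made up; the linked Elixir file is the real reference): each node only ever
grows its own slot, and merging takes the pointwise max, so no operation
mutates an existing value:

```python
from typing import Dict

def increment(counter: Dict[str, int], node: str) -> Dict[str, int]:
    # Returns a new mapping with this node's slot bumped by one.
    return {**counter, node: counter.get(node, 0) + 1}

def merge(a: Dict[str, int], b: Dict[str, int]) -> Dict[str, int]:
    # Pointwise max over the union of keys; commutative and idempotent.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def value(counter: Dict[str, int]) -> int:
    return sum(counter.values())
```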

------
CamperBob2
_Instead of changing the content of a variable, you can just declare a new
one. By avoiding assignment, you can guarantee that your variables won 't
change. You can guarantee that the current value of x is the one it has been
initialised with: init()._

Hard to see the upside there. By adding another variable (y in the example),
you've added another degree of freedom to your program state, arguably for no
good reason. Now there's something else to go wrong that wasn't there before.

------
kevin_thibedeau
> The reason why the "wrong" way is used at all is because many old
> programming languages forced you to declare variables at the beginning of
> blocks. It was easier for compilers. It's no longer an issue, however. Not
> even in C.

It's still very much an issue for a certain compiler vendor without full
support for C99.

~~~
asveikau
The vendor you are likely referring to fixed mixed declarations and code in
their 2015 release.

~~~
kevin_thibedeau
I still have to write code that builds against earlier versions. Code that
primarily targets a backwards embedded compiler that only does C++98 but has
had usable C99 support since forever.

------
massysett
A much shorter version of this page: "Use Haskell."

~~~
js8
I wanted to say exactly the same thing. If you're going to these lengths, you
might as well bite the bullet and use a language that supports that style of
programming.

------
crystaln
Is there a difference between this and functional programming? It's odd that
this article doesn't mention the entire branch of programming it is talking
about.

~~~
loup-vaillant
It was 10 years ago, so my memory is a bit fuzzy.

I believe I didn't want to scare readers away. But more importantly,
functional programming is more than just removing the assignment statement. If
you take a language and just remove the assignment statement, the language you
get isn't functional, it's just crippled.

Functional programming involves other techniques to compensate. Higher order
functions are critical, and often used to hide recursion under the rug (see
map & reduce). I have a more complete guide to functional programming there:
[http://loup-vaillant.fr/tutorials/from-imperative-to-functio...](http://loup-vaillant.fr/tutorials/from-imperative-to-functional)
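For instance, `reduce` can hide the recursion in the min example from the top
comment (a sketch; `minimum` is a made-up name to avoid shadowing the
builtin):

```python
from functools import reduce
from typing import List, Optional

def minimum(foo: List[int]) -> Optional[int]:
    # reduce threads the accumulator through the list, so the "loop
    # state" never needs a reassigned local variable.
    return reduce(lambda acc, x: x if x < acc else acc, foo[1:], foo[0]) if foo else None
```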

------
brundolf
I wrote a related post a few months back:
[http://www.brandonsmith.ninja/blog/procedures-functions-data](http://www.brandonsmith.ninja/blog/procedures-functions-data)

Something interesting that's happened since then: I've taken a job at a Python
company, and I've noticed that, compared to JavaScript, assigning intermediate
results into local variables seems to be much more idiomatic in Python for
some reason. This is doubly surprising since JavaScript at least has the const
keyword. I'm not sure yet how I feel about it.

One other anecdote: I personally relax my usage of assignment a lot more when
using Rust, because a) it has rich, deep immutability at a type level, and b)
its expressions tend to be longer than those in other languages due to all the
monads and such.

------
m4r35n357
Sometimes this is not possible or desirable. Libraries like MPFR pretty much
demand that you pre-allocate and re-use variables. Of course in this case
despite the mutation there is no assignment statement, as variables are set
using output parameters ;)

------
joe_the_user
So in the example of the linked list, how do you add an element to the middle?
I actually don't think you can make this work in a conventional sense - you'd
copy the whole list and lose all linked-list advantages. Instead, my
impression functional programming uses list objects that appear to copy the
whole list, are efficient but actually use shared memory and are mutable
"behind the scenes". And sure that works but it results in one's code being a
few more levels of abstraction away from what's happening with the processor.
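The "shared memory behind the scenes" part can be sketched with plain tuples
as cons cells (a hypothetical helper, not from the article): inserting at
position i copies only the first i cells and shares the entire suffix:

```python
from typing import Optional, Tuple

# A cons cell is (head, rest); None is the empty list.
Cell = Optional[Tuple[int, "Cell"]]

def insert_at(lst: Cell, i: int, x: int) -> Cell:
    # Rebuilds the prefix of length i; everything after the insertion
    # point is the same objects as in the original list.
    if i == 0:
        return (x, lst)
    head, rest = lst
    return (head, insert_at(rest, i - 1, x))
```

So a middle insert costs O(i) copying rather than a full copy of the list,
but you do give up the O(1) splice that a mutable linked list offers.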

~~~
mangodrunk
It seems that this implementation of a list is more useful when used like a
stack. As a general list (with more operations than "pop" and "push"), not so
great in terms of performance compared to one that is mutable.

I did like the article and how simply the author writes. To the author's
credit, right below they do have the section on "When you can't help it": if
you do need insertion like that and performance matters, it's probably best
to use a mutable data structure.

------
temac
If you want that much to program in a functional style, maybe use a functional
language. It will probably be more efficient, compile faster, and be easier to
maintain.

------
kazinator
Tip o' the hat!

[http://www.kylheku.com/cgit/txr/tree/eval.c?id=txr-234#n2063](http://www.kylheku.com/cgit/txr/tree/eval.c?id=txr-234#n2063)

[http://www.kylheku.com/cgit/txr/tree/hash.c?id=txr-234#n901](http://www.kylheku.com/cgit/txr/tree/hash.c?id=txr-234#n901)

------
icedchocolate
“Declare New Variables... I see two possible reasons why one would want to do
it the "wrong" way: efficiency, and conciseness.”

Correct me if I'm wrong, but isn't this plain wrong? You are increasing the
memory required for your program to run, and if the data saved to those
variables is sufficiently large you can run into memory issues this way.

------
whitehouse3
I think all of this is achievable in Python. I'd be interested in a linter
that checks for unnecessary assignment statements.
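A rough sketch of what such a check could look like with the stdlib `ast`
module (`reassigned_names` is a hypothetical helper; it only looks at plain
`name = ...` statements and ignores scoping subtleties like comprehensions,
`nonlocal`, and augmented assignment):

```python
import ast
from collections import Counter

def reassigned_names(source: str):
    # Report (function, name) pairs where a bare name is the target of
    # more than one assignment statement inside the function.
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            counts = Counter(
                t.id
                for stmt in ast.walk(node)
                if isinstance(stmt, ast.Assign)
                for t in stmt.targets
                if isinstance(t, ast.Name)
            )
            findings += [(node.name, n) for n, c in counts.items() if c > 1]
    return findings
```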

------
_hardwaregeek
Kinda funny note, I realized that I forgot to add an assignment statement in
my language. Like I just straight up forgot to add it in the grammar. Maybe
I'll see how long I can avoid adding it.

~~~
crystaln
So you wrote a functional language... keep it that way.

