
Better Clojure formatting - harperlee
http://tonsky.me/blog/clojurefmt
======
napsterbr
The 80-character limit may have come from punchcards, but it's still important
in 2018. It allows me to have three side-by-side columns of code, and I know
I'm not alone. Even a 100-character limit would break similar setups.

Furthermore, I personally find code a lot easier to read when it is not
spread too far horizontally. I do not have extensive experience with Lisp or
Clojure, but all the languages I've worked with handle a "small" line width
of 80 characters just fine most of the time.

I do recognize this is a matter of personal preference, but please don't
dismiss the 80-character limit because it's old. It may be, but it is still
relevant.

Other than that, I wholeheartedly agree with the post. I've been toying around
with ClojureScript for the past few months, and the one thing I can't wrap my
head around is its formatting. Sometimes one space, sometimes two. Sometimes
block spacing. It's very confusing.

~~~
gizmo385
We expanded our team line limit to 88 characters and that feels quite a bit
better than 80. There are a lot of pretty common lines of code that once
you're nested in a function/if statement/loop stretch out beyond 80 characters
but fall comfortably into 88.

~~~
dkersten
I used to use 100 characters as my limit and it was pretty nice. I switched
back to 80 because most editors already mark the 80 character point and I
didn't want to have to configure editors (basically laziness). I find 80 fine
for most code. There are some cases where I naturally go over a little and it's
a bit clumsy to avoid it, but they're few enough that I don't care. Especially
in Clojure code, I find that if things are that long, then it's too nested and
hard to follow anyway and I rearrange my code (use ->, ->>, let etc to split
nested forms up).
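A minimal sketch of that kind of rearrangement, assuming hypothetical order maps with `:paid?` and `:amount` keys: both functions compute the same total, but the `->>` version reads top to bottom and each step stays on its own short line.

```clojure
;; Hypothetical data: a sequence of order maps with :paid? and :amount keys.

;; Nested form: each step wraps the previous one, so the line keeps growing.
(defn total-paid-nested [orders]
  (reduce + (map :amount (filter :paid? orders))))

;; Same computation with ->>: the collection is threaded in as the last
;; argument of each step, one step per line.
(defn total-paid [orders]
  (->> orders
       (filter :paid?)
       (map :amount)
       (reduce +)))
```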

In Clojure code, the most likely reason to go over 80 characters for me is let
forms like (let [long-descriptive-names-that-get-too-close-to-80 (and we-have-hit
the-limit)]), but overly long names mess the code up anyway and should be
avoided. Descriptive is good, but too long makes it hard to read too.

------
thomastjeffery
> > Avoid trailing whitespace.

> This is one of the stupidest things to automate. Unlike indenting, removing
> trailing whitespace simply produces diffs where it doesn’t matter. And does
> nothing more.

I suppose formatting is a weak point for version control systems, but I think
that is all the more reason _to_ get rid of trailing whitespace. The more your
code is clean of ambiguous formatting, the less it will change between
commits.

~~~
Guthur
If you got rid of it from the beginning there is no diff. The argument makes
no sense.
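If trailing whitespace is stripped automatically from the start, the normalization itself is tiny. A minimal sketch, assuming the source is handled as a plain string (the function name is hypothetical):

```clojure
(require '[clojure.string :as str])

(defn strip-trailing-whitespace
  "Removes spaces and tabs at the end of every line. The (?m) flag makes
  $ match before each line terminator, not only at the end of the string."
  [text]
  (str/replace text #"(?m)[ \t]+$" ""))
```

Leading indentation is untouched because `[ \t]+$` only matches whitespace that runs up to a line end.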

------
lilactown
I agree with everything he said... except the 80 col limit. I know I'll
probably lose this fight, but I frequently write code on all sorts of
different screen sizes broken into split panes (Code | REPL, Code | Browser,
Code - REPL | Docs) and having a line limit makes this manageable.

Maybe 80 isn't the right one, but letting code stretch out ad infinitum
(either with awkward line breaks or having to scroll) is a big ergonomic loss
for me personally.

~~~
ajuc
I agree, and it's not mainly because of splitting the screen. Very long lines
of code just read badly, even if you have the screen-space to fit them.

It's easy to skip and forget where you were when you look to compare the code
to something and then look back.

~~~
dkersten
Yep, there's a reason most article/blog sites use a fixed-width column of
text. Long free-flowing lines of text are simply hard to read.

I also used to not care about side-by-side, but now, I find it too useful.
Besides, nobody wants to scroll to read git diffs.

------
TeMPOraL
I see a fifth column growing here.

The core problem (sort of created here) is about how lispy Clojure should be.
Unlike most programming languages, Lisp code isn't meant to be fully
statically analyzable. This affects indentation as well, because the "full
picture" of syntax is only available at runtime.

Consider a macro invocation:

    
    
      (foo bar
           (some args)
           (some other code))
    

vs.:

    
    
      (foo bar (some args)
        (some other code))
    

Which indentation style is the correct one? _That depends on what "foo" is_.
If it's a function, then the first one. If it's just a regular macro, it's
still the first one. But for some particular macros - like _defun_ in
elisp/CL - the valid style is the second one. You can't, in general[0], know
that until runtime. That's why Lisps usually give you tools for it. In
Common Lisp, you have &body, which works like &rest but also hints that the
expected indentation is like the 2nd case I've shown here. In Clojure, you have
metadata - you can attach e.g. {:style/indent 2} to your macro (a convention
read by editor tooling such as CIDER), which signals that the first two
arguments are special and everything from the third argument on should be
indented like the 2nd form.
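A minimal sketch of the metadata approach, using a hypothetical `with-logging` macro. The `:style/indent` key is a convention for editor tooling such as CIDER/clojure-mode; the compiler itself ignores it. Here a value of 1 marks one special argument (`label`) and asks for the body to be indented like a `defn` body.

```clojure
;; Hypothetical macro: one "special" argument (label), then a body.
;; The attr-map {:style/indent 1} is stored in the var's metadata, where
;; indentation-aware tooling can look it up; evaluation is unaffected.
(defmacro with-logging
  {:style/indent 1}
  [label & body]
  `(do (println "start" ~label)
       (let [result# (do ~@body)]
         (println "end" ~label)
         result#)))

;; A tool that honors the metadata would indent a call like this:
(with-logging "sum"
  (+ 1 2))
```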

My personal opinion - most Lisps are designed for interactive, in-image work.
So is Clojure. If you want to have a completely static formatter, go back to
coding Ruby or Python.

--

[0] - You can guess it in simple cases, but not when you're dealing with
macro-writing macros, i.e. code that writes code that writes code.

~~~
dmitriid
Isn't LISP a language that prides itself on having no special forms? A macro
is a macro is a macro.

    
    
        (macroname
           first-param
           second-param
           third-param
           fourth-param
           ad-infinitum)
    

This is the only case (there are no special, or complex, or any other cases).

Besides, if you took your time and actually read the article, you'd see `defn`
in the examples.

~~~
TeMPOraL
> _A macro is a macro is a macro._

At the syntax level, yes. But indentation is not a syntax-level concern, but a
_semantic one_. Speaking Clojure,

    
    
      (defn function [arg]
        {:some-op arg})
    

has the exact same syntactic structure as

    
    
      (assoc-in some-map
                [path]
                {:some-key path})
    

but it's expected to be indented differently because of the difference in
_meaning_ between defn and assoc-in.
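One way to see the "same syntactic structure" point is to quote both forms and compare them as data. The reader yields the same sequence of element types for each, so nothing in the syntax alone marks one as a definition (the helper name `shape` is hypothetical):

```clojure
;; Quote both forms so they are read as data, not evaluated.
(def defn-form '(defn function [arg] {:some-op arg}))
(def assoc-in-form '(assoc-in some-map [path] {:some-key path}))

;; Type of each element: symbol, symbol, vector, map in both cases.
(defn shape [form] (map type form))

(= (shape defn-form) (shape assoc-in-form))
;; => true
```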

Lisp allows you to use macros to provide new symbols that, put first in a
list, imply something other than a function call. So user-defined forms are put
on equal footing with things like if or defn. And user-defined things can
_generate_ more user-defined things that are meant to be indented like defn.
Hence the need for loading the code for correct indentation.

~~~
dmitriid
Nah. I still don't see the difference :) To me all of these are some
macros/functions that expect certain parameters. Makes it much easier to
reason about code IMO (especially if they are indented similarly).

~~~
kazinator
Even the basic lisp mode in Vim recognizes the difference. There is a :set
lispwords=... list of symbols that are operators. They get indented like this:

    
    
      (lispword a b c
        d e f
        g h)
    

whereas symbols not in the set like this:

    
    
      (not-lispword a b c 
                    d e f)
    

This is important so we don't end up with this:

    
    
      (let ((x y))
           form
           form)
    

or, _vice versa_ , this:

    
    
       (list (item 1) (item 2)
         (item 3) (item 4))

------
alexeiz
Clojure doesn't need this gofmt crap from Golang. The reason Golang needs it
is that it uses a bizarre code formatting convention of indenting with tabs
and aligning with spaces. It's impossible to correctly use both tabs and
spaces in your code manually. You're bound to screw it up, so there is a need for
an automated formatter that enforces this style.

In Clojure we format everything with spaces
(https://ukupat.github.io/tabs-or-spaces/ - 99% of Clojure code uses 2
spaces). This formatting style is pretty much trivial and all code
editors support it.

I'm perfectly fine with Clojure code formatted slightly differently depending
on a person who writes it. If you want a BDSM language, code in Golang.

------
vbuwivbiu
Jeez it's just data! Use a standard format for serialization and then use
programs to format it for reading & everyone gets to read it the way they like
it

~~~
mrmaloke
That way you force a serialization standard upon everyone. I'd rather force a
formatting standard upon everyone and get the benefit of readable code with
any tool e.g. also code reviews in version control. There's hardly a way to
read code "the way I like it" in a webapp.

~~~
vbuwivbiu
that's exactly what I mean by standard serialization format - format of code
exchanged between people and programs. You can always rely on the standard way
(for example the one presented here) when reading code on github or when you
open it in your text editor by default, and then if you have another way you
prefer that it be formatted, you can use a program to format it that way for
you while you work on it - however when you commit it, it's formatted the
standard way not your special way.

We've conflated how the code is stored with how it looks. We need to separate
these. That's one of the problems with equating code with text: they're
different things. Code is data (or objects), text is a way to represent code.
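A minimal sketch of that separation using only `clojure.core` and `clojure.pprint`: read the text into data, then print the data in whatever layout a tool chooses. One caveat: `read-string` drops comments and expands reader shorthand, so this round trip is lossy and only illustrates the idea. It should also be reserved for trusted input. The `source` string and `render` helper are hypothetical.

```clojure
(require '[clojure.pprint :as pp])

;; Badly laid out source text (hypothetical example).
(def source "(defn add   [a b]\n        (+ a\n b))")

;; Reading turns the text into plain data: lists, symbols and a vector.
(def form (read-string source))

;; Printing that data with pprint's code dispatch gives one layout;
;; another tool could print the same data in a different layout.
(defn render [form]
  (with-out-str
    (pp/with-pprint-dispatch pp/code-dispatch
      (pp/pprint form))))

(print (render form))
```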

~~~
rogual
I don't know if they are totally separate. Say you like to see XMLHttpRequest
and I like to see xml-http-request. If I type xml-http-request, how can a
piece of software know how to translate that to XMLHttpRequest?

It's not just variable casing; it's any situation where one coding style
distinguishes two things which look the same in another coding style. And I
think there's a lot of such cases.

~~~
vbuwivbiu
this is in the context of LISP code

------
bunderbunder
> if everybody is reformatting it’s doesn’t really matter how much.

Really aggressive reformatting introduces git blame horizons that can make
code archaeology more annoying.

I wouldn't consider it a reason not to switch to autoformatting, but it might
be a reason not to get _too_ wild about changing the conventions.

~~~
hinkley
It's why I issue a separate commit for reformatting.

At least when the person sees my commit in git blame they know to annotate the
previous version.

~~~
notduncansmith
Or even better, do it once for the main codebase in a single commit, then make
sure developers have their editors apply the formatting on save, and a
formatting check in CI, so that misformatted code never gets
code-reviewed.

------
agumonkey
That's one very nice side of Python: no time wasted on formatting. I cannot
bear formatting arguments even for a second; it kills my soul.

~~~
dkersten
I've heard endless Python formatting debates too.

Luckily _most_ people have settled on PEP 8, but even there, there are points
of contention, such as line length limits, or how spaces won the tabs vs.
spaces debate (despite the fact that, imho, tabs are superior as they encode
indentation cleanly: 1 tab = 1 indent, yet each person can set their editor's
tab width however they prefer). See, nobody will ever fully agree on formatting.

~~~
agumonkey
That's true.. even though I wouldn't consider variable naming and docstrings a
formatting debate.

~~~
dkersten
To me anything that affects the visuals of the code without the semantics is a
formatting issue -- so tabs vs spaces (and how many) for indents, space after
function names (before parameter lists), space between operators, whether
camelCase or something else, Foo.foo or Foo.Foo or foo.foo or..., etc. They're
all formatting.

But that wasn't my point really, which was just that despite its forced
indentation, Python still has, in my personal experience, its own formatting
debates.

------
vseloved
it's a classic "worse is better": if doing good is hard, let's do the easy
stuff. meh

------
lincpa
There are some professional domains where the
"domain-namespace-alias/domain-variable-name" is over 80 characters. Clojure
should not enforce language-independent aspects.

------
harlanji
No mention of cljfmt, which I’ve used for a few years via its lein plugin
(‘lein cljfmt fix’).

~~~
enoch_r
I think you're missing the point. The point isn't "no tool exists to format my
clojure code!" which is obviously wrong. The author is looking to define a
_standard_ that is simple enough to define once and be used universally.
`cljfmt` is awesome and I use it myself, but it fails on pretty much all of
the criteria the author lays out (slow startup time, it is "extensible,"
formats macros/binding-forms/functions differently, etc).

~~~
TeMPOraL
Honestly, I think this is a bad idea. The way Lisp works, you need to load the
code to have a shot at knowing the correct indentation. You can't just do it
statically without abandoning the different syntax for function calls vs.
block-of-code macros, as the distinction between the two may not be generally
apparent until runtime.

~~~
enoch_r
That isn't "how lisp works," that's how most people choose to format their
lisp code.

And this is the entire point of the article. He discusses this at length: to
reach the universality and speed of a `gofmt`-like formatting tool, we would
have to give up on the idea that s-expressions should be indented differently
based on their content.

Personally I find this a very tempting tradeoff. You may not, but indenting
different s-exps in different ways based on their first element is not
"correct," it's just... what most people choose to do.

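To make the trade-off concrete, here is a minimal sketch of a formatter that never looks at a form's first element: continuation lines go 2 columns past an open parenthesis and 1 column past any other open bracket. Both offsets are assumptions chosen for illustration (this is not cljfmt's behavior, and not necessarily the article's exact rule), and the sketch ignores strings, comments and character literals such as `\(`, so brackets inside them would be miscounted.

```clojure
(require '[clojure.string :as str])

;; Assumed offsets: 2 columns past "(", 1 past "[" or "{" ("#{" acts like "{").
(def open-offset {\( 2, \[ 1, \{ 1})
(def close-char? #{\) \] \}})

(defn- update-stack
  "Walks one already-indented line and returns the stack of indent columns,
  one entry per bracket that is still open (innermost first)."
  [stack line]
  (reduce (fn [st [col ch]]
            (cond
              (open-offset ch) (conj st (+ col (open-offset ch)))
              (close-char? ch) (if (seq st) (pop st) st)
              :else st))
          stack
          (map-indexed vector line)))

(defn reindent
  "Re-indents Clojure source using bracket nesting only; the head symbol
  of a form (defn, let, a plain function call...) never affects the result."
  [text]
  (loop [lines (str/split-lines text)
         stack '()   ; a list used as a stack: peek/pop work on the front
         out []]
    (if (empty? lines)
      (str/join "\n" out)
      (let [trimmed (str/triml (first lines))
            indent (if (seq stack) (peek stack) 0)
            line (if (str/blank? trimmed)
                   ""
                   (str (apply str (repeat indent \space)) trimmed))]
        (recur (rest lines) (update-stack stack line) (conj out line))))))

(println (reindent "(foo bar (some args)\n      (some other code))"))
```

Applied to TeMPOraL's `foo` example, this always produces the second layout, whatever `foo` is. It would likewise produce the `(list ...)` layout kazinator objects to above, which is the cost of ignoring the head symbol.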
~~~
lispm
> You may not, but indenting different s-exps in different ways based on their
> first element is not "correct," it's just... what most people choose to do.

They are not just s-expressions. They are programs. Just like C code is not
a random string, but a program.

------
dmead
Isn't it true that the Lisp community is more tolerant of horizontal space
than most other groups of programmers?

That, plus the fact that I haven't coded on a 4:3 monitor since ~2002, makes me
think this is not helpful. The overuse of newlines in coding standards is... a
useless exercise.

~~~
notduncansmith
I don't know about the wider Lisp community, but I think Clojure is much
easier to edit and more aesthetically pleasing when arranged in well-newlined
blocks <=80 chars wide.

Constraining line length is great for readability. The benefits are similar to
those gained by limiting prose line length, and sentence length.

Think of a long line of code as having similar traversal characteristics to a
linked list. This requires more memory for comprehension, and is hard to
search.

Now compare this to a newlined block of code, which resembles a richer data
structure that spatially colocates relevant elements.

It's a difference that matters a lot.

~~~
kazinator
> _The benefits are similar to those gained by limiting prose line length, and
> sentence length._

Not really; prose doesn't indent with increasing nesting, so line length in
prose actually means that many non-whitespace characters to scan.

> _Now compare this to a newlined block of code, which resembles a richer data
> structure that spatially colocates relevant elements._

Nobody writes huge lines in Lisps; but the indentation can get out of hand in
large functions. So you have this neat block that spatially colocates relevant
elements, but has 150 columns of whitespace on the left, which creates a
navigation challenge that is different from the problem of scanning a long
line.

If you combat this by simply constraining the column length, then what happens
is this problem:

    
    
                                                (cramped
                                                  expressions
                                                  (butting up
                                                           to
                                                           80
                                                           col
                                                           limit))
    
    

You have little recourse but to break it up into smaller functions (perhaps
some of them nested in the parent).

~~~
dkersten
> Not really; prose doesn't indent with increasing nesting

Clojure lines increase in length as they increase in nesting, but without
increasing nesting, they don't typically get particularly long[1], unless you
have very long argument lists or very long names. So, the key is to reduce
nesting (which helps comprehension anyway) and Clojure has plenty of tools to
help with that: you can use ->, ->>, as->, some->, or you can bind
sub-expressions to names using let/when-let/etc. There's no need to split code
into smaller functions unless it makes sense to do so (eg for code reuse or
semantically).
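A minimal sketch of the nesting reduction described above, assuming a hypothetical user map that may or may not contain an `:address` with a `:city`: every nil check in the first version adds a level of indentation, while `some->` threads the value through each step and stops at the first nil.

```clojure
(require '[clojure.string :as str])

;; Hypothetical data: {:address {:city "..."}}, where any level may be missing.

;; Each nil check adds a level of nesting and pushes the code right.
(defn city-nested [user]
  (when user
    (when-let [address (:address user)]
      (when-let [city (:city address)]
        (str/upper-case city)))))

;; some-> short-circuits to nil as soon as a step returns nil.
(defn user-city [user]
  (some-> user :address :city str/upper-case))
```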

[1] 2-space indents help keep indents from being crazy wide without crazy
nesting. The only place where I often find lines getting too long is
let-blocks with aligned expressions and long variable names:

    
    
        (let [this-name-is-too-long (some-code-here
                                      (uh-oh this is getting long)
            ...

~~~
OliverM

            ]
        ))
    

Had to close those parens to stop the mental itch!

~~~
dkersten
Hah! But but but I _did_ close them (implicitly, as part of the "...")

