Better Clojure formatting (tonsky.me)
111 points by harperlee 12 days ago | 61 comments

The 80-character limit may have come from punchcards, but it's still important in 2018. It allows me to have three side-by-side columns of code, and I know I'm not alone. Even a 100-character limit would break similar setups.

Furthermore, I personally find it a lot easier to read code that is not spread too far horizontally. I do not have extensive experience with Lisp or Clojure, but all languages I've worked with will handle a "small" line width of 80 characters just fine most of the time.

I do recognize this is a matter of personal preference, but please don't dismiss the 80-character limit because it's old. It may be, but it is still relevant.

Other than that, I wholeheartedly agree with the post. I've been toying around with ClojureScript for the past few months, and the one thing I can't wrap my head around is its formatting. Sometimes one space, sometimes two. Sometimes block spacing. It's very confusing.

We expanded our team line limit to 88 characters and that feels quite a bit better than 80. There are a lot of pretty common lines of code that, once you're nested in a function/if statement/loop, stretch out beyond 80 characters but fall comfortably into 88.

I used to use 100 characters as my limit and it was pretty nice. I switched back to 80 because most editors already mark the 80-character point and I didn't want to have to configure editors (basically laziness). I find 80 fine for most code. There are some cases where I naturally go over a little and it's a bit clumsy to avoid it, but they're few enough that I don't care. Especially in Clojure code, I find that if things are that long, then the code is too nested and hard to follow anyway, and I rearrange it (using ->, ->>, let, etc. to split nested forms up).
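To make the "split nested forms up" point concrete, here is a small sketch. The data and function names are made up for illustration; all three versions compute the same sum, but the threaded and let versions read top to bottom instead of inside-out.

```clojure
;; Hypothetical example data: a vector of order maps.
(def orders
  [{:id 1 :status :paid    :total 40}
   {:id 2 :status :pending :total 15}
   {:id 3 :status :paid    :total 25}])

;; Nested version: reads inside-out and crowds the right margin.
(defn paid-total-nested [orders]
  (reduce + 0 (map :total (filter (fn [o] (= :paid (:status o))) orders))))

;; Threaded version: ->> passes each result as the last argument
;; of the next form, so every step sits on its own short line.
(defn paid-total-threaded [orders]
  (->> orders
       (filter (fn [o] (= :paid (:status o))))
       (map :total)
       (reduce + 0)))

;; Naming intermediate values with let is another way to flatten nesting.
(defn paid-total-let [orders]
  (let [paid   (filter (fn [o] (= :paid (:status o))) orders)
        totals (map :total paid)]
    (reduce + 0 totals)))
```

Which of the two flattened styles reads better is a judgement call; the let version costs a couple of names but documents the intermediate steps.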

In Clojure code, the most likely reason to go over 80 characters for me is let forms like (let [long-descriptive-names-that-get-too-close-to-80 (and we-have-hit the-limit)]), but overly long names mess the code up anyway and should be avoided. Descriptive is good, but too long makes it hard to read too.

I personally like that it discourages nesting

Changing your standard to a value so close to the existing standard seems like an odd choice. Are you using a verbose language?

I think the Python formatter 'black' came to the same conclusion: its default line width is 88 characters instead of 80.

> all languages I've worked with will handle a "small" line width of 80 characters just fine most of the time

Guessing you don't write Java? While you could do it, you'd have so much line wrapping that I imagine it would be much more annoying/difficult to read.

I think the argument the article was making wrt 80 is that editors can wrap the text to 80 characters without modifying the source, not that you shouldn't want your code to fit in 80 characters.

The counterargument is that they suck at it: you get a break (maybe word-aligned) with an arbitrarily indented next line.

As a purist I advocate for a future with smarter editor soft wrapping.

What an editor should do is automatically scroll horizontally to hide the indentation and keep the code line in sight, as you move up and down through different indentation levels.

The thing is, code in the body of a function is rarely 80 columns long from the first non-whitespace character to the end of the line.

What would be useful would be a text editing mode that automatically scrolls horizontally to keep the code in view, regardless of indentation amount. Then you can have your 80 column window.

What sucks is moving down in a body of code, and watching the text disappear out to the right because the cursor is tracking straight down through the indentation instead of riding to the right of it.

That would do nothing to help the "I can't see all the code at once because it's bleeding off the side of the screen" problem. You'd just end up with getting to use a different set of keys, and slightly fewer keystrokes, to choose between not getting to see either the start of some lines or the end of some lines, which is a marginal improvement at best.

And I'm guessing it would only worsen the problem of sprawling code that's difficult to read, by removing one of the few sources of pain that encourages developers who are less mindful of what it's going to be like for the person who'll have to read this stuff 3 years down the road to exercise at least a little bit of care.

I can't see all the code at once because it's 80,000 lines and I have a 50-line peephole; so the "see all the code at once" ship has sailed. Roughly speaking, Lisp code grows diagonally:

                 (   )))

If the 80x50 peephole moves nicely along the diagonal, I'm okay.

I don't want to go from this:

   |(.....     | ;; not to scale, obviously
   |  (....    |
   |    (...)  |
   |     (.....|...
                  (   )))
To this:

   |           |....)
   |           | (...(...
   |           |       (.....)))
   |           | (   )))
When I move the cursor down. If the box would move diagonally, that would be great. Yes, of course the box will have empty space at the bottom left and top right.

That could be addressed in a funky way: why not rotate the text about 45 degrees? Then the rectangular window would capture a fuller slice of the diagonal. I could get used to the rotated text.

We may be at a cultural impasse, here. I see code that's nested that deeply, and it just makes me want to close the editor and go find something else to do.

Whether it's single lines that make a beeline for the pure land in the east, or towering ziggurats of stacked contexts slowly plodding their way into the sky, it doesn't really matter all that much to me. I count both of them as "sprawling code that's difficult to read." Lisp's a functional language; it's totally acceptable to break things up with a `defun` (or I guess, in this case, `defn`) here and there.

> > Avoid trailing whitespace.

> This is one of the stupidest things to automate. Unlike indenting, removing trailing whitespace simply produces diffs where it doesn’t matter. And does nothing more.

I suppose formatting is a weak point for version control systems, but I think that is all the more reason to get rid of trailing whitespace. The more your code is clean of ambiguous formatting, the less it will change between commits.

If you got rid of it from the beginning there is no diff. The argument makes no sense.

Exactly! Just separate the commits which fix formatting from the ones that change code.

In vim it's as simple as :%s/\s*$// (okay, not that simple if you don't spend time in regex or in the shell), and git can hide whitespace-only changes in diffs. Just don't check in trailing whitespace.

If you need to clean up a whole codebase you can throw in something like find . -name '*.java' | xargs sed -i 's/\s*$//' and just commit that once.

Sidebar: how do I print an asterisk next to another character here without adding whitespace?
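For anyone who would rather do this from a Clojure REPL, a rough equivalent of the same idea follows. `strip-trailing-whitespace` is a name made up here, and the sketch only handles spaces and tabs before a newline or the end of the string.

```clojure
(require '[clojure.string :as str])

(defn strip-trailing-whitespace
  "Removes spaces and tabs at the end of every line in text."
  [text]
  ;; (?m) turns on multiline mode, so $ matches at the end of each line.
  (str/replace text #"(?m)[ \t]+$" ""))

(strip-trailing-whitespace "(defn f [x]  \n  (inc x))\t\n")
;; => "(defn f [x]\n  (inc x))\n"
```

Leading indentation is left alone, since the pattern is anchored to the end of the line.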

Trailing whitespace also makes jumping to the end of a line or moving back over a newline not take you where you expect, which is annoying when it breaks your flow.

It's only ever a mild nuisance (in that scenario or for diffs), but the fact that it's trivial to automate seems to make it a no-brainer to me.

I agree with everything he said... except the 80 col limit. I know I'll probably lose this fight, but I frequently write code on all sorts of different screen sizes broken into split panes (Code | REPL, Code | Browser, Code - REPL | Docs) and having a line limit makes this manageable.

Maybe 80 isn't the right one, but letting code stretch out ad infinitum (either with awkward line breaks or having to scroll) is a big ergonomic loss for me personally.

I usually follow 80 if possible, or in Java 100, but put a hard limit at 120. There's just a need for the occasional judgement call.

This seems to work well enough for two columns. Useful for editing, but indispensable for diffs. For reference, my 15" Mac has 234 columns at full screen.

I agree, and it's not mainly because of splitting the screen. Very long lines of code just read badly, even if you have the screen-space to fit them.

It's easy to skip and forget where you were when you look to compare the code to something and then look back.

Yep, there's a reason most article/blog sites use a fixed-width column of text. Long free-flowing lines of text are simply hard to read.

I also used to not care about side-by-side, but now, I find it too useful. Besides, nobody wants to scroll to read git diffs.

I see a fifth column growing here.

The core problem (sort of created here) is about how lispy Clojure should be. Unlike most programming languages, Lisp code isn't meant to be fully statically analyzable. This affects indentation as well, because the "full picture" of syntax is only available at runtime.

Consider a macro invocation:

  (foo bar
       (some args)
       (some other code))

  (foo bar (some args)
    (some other code))
Which indentation style is the correct one? That depends on what "foo" is. If it's a function, then the first one. If it's just a regular macro, it's still the first one. But for some particular macros, like defun in elisp/CL, the valid style is the second one. You can't, in general[0], know that until runtime. That's why Lisps usually give you tools for that. In Common Lisp, you have &body, which works like &rest but also hints that the expected indentation is like the second case I showed here. In Clojure, you have metadata: you can attach e.g. {:style/indent 2} to your macro, which signals that the desired indentation switches to the second form from the third argument on.
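As a sketch of the Clojure side: the macro below is hypothetical (`with-labelled` is not a real library macro). The `:style/indent` key is a convention read by editor tooling such as CIDER rather than by the Clojure compiler, so treat the exact value as something to check against your tooling's documentation.

```clojure
;; Hypothetical macro: takes a label, a binding vector, and a body.
;; The attribute map after the name becomes metadata on the var.
(defmacro with-labelled
  {:style/indent 2}
  [label bindings & body]
  `(let ~bindings
     (println "entering" ~label)
     ~@body))

;; With the hint, a supporting editor is expected to indent the body
;; by two spaces instead of aligning it under the first argument:
(with-labelled "demo" [x 1 y 2]
  (+ x y))

;; The hint is ordinary var metadata, so it can be inspected at the REPL:
(:style/indent (meta #'with-labelled))
```

Note that the metadata only exists once the macro definition has been loaded, which is the point being made about needing a running image to indent correctly.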

My personal opinion - most Lisps are designed for interactive, in-image work. So is Clojure. If you want to have a completely static formatter, go back to coding Ruby or Python.


[0] - You can guess it in simple cases, but not when you're dealing with macro-writing macros, i.e. code that writes code that writes code.

> Which indentation style is the correct one?

There is no correct one. There are aesthetic preferences to make Lisp code readable. Lisp has had code formatters for decades, and Common Lisp comes with one in the standard (the pretty printer). Tools to format (not just indent) textual Lisp code have also been available for decades, but are less used.

> If you want to have a completely static formatter

The proposal here is to ignore the programming language Clojure and format all code only on the expression level.

Isn't LISP a language that prides itself on having no special forms? A macro is a macro is a macro.

This is the only case (there are no special, or complex, or any other cases).

Besides, if you took your time and actually read the article, you'd see `defn` in the examples.

> A macro is a macro is a macro.

At the syntax level, yes. But indentation is not a syntax-level concern, but a semantic one. Speaking Clojure,

  (defn function [arg]
    {:some-op arg})
has the exact same syntactic structure as

  (assoc-in some-map
            {:some-key path})
but it's expected to be indented differently because of the difference in meaning between defn and assoc-in.

Lisp allows you to use macros to provide new symbols that, put first in a list, imply something else than function call. So user-defined forms are put on equal footing with things like if or defn. And user-defined things can generate more user-defined things that are meant to be indented like defn. Hence the need for loading the code for correct indentation.

Nah. I still don't see the difference :) To me all of these are some macros/functions that expect certain parameters. Makes it much easier to reason about code IMO (especially if they are indented similarly).

Even the basic lisp mode in Vim recognizes the difference. There is a :set lispwords=... list of symbols that are operators. They get indented like this:

  (lispword a b c
    d e f
    g h)
whereas symbols not in the set like this:

  (not-lispword a b c 
                d e f)
This is important so we don't end up with this:

  (let ((x y))
or, vice versa, this:

   (list (item 1) (item 2)
     (item 3) (item 4))
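The rule being described can be sketched in a few lines. This is only an illustration of the decision, not how Vim implements it; the `lispwords` set and the function name are assumptions made up for the example.

```clojure
;; Assumed, illustrative set; a real editor's lispwords list is much longer.
(def lispwords #{"let" "defn" "when" "lispword"})

(defn continuation-column
  "Column where continuation lines of a form should start.
  open-col is the column of the opening paren, head is the first symbol."
  [open-col head]
  (if (contains? lispwords head)
    (+ open-col 2)                   ; body style: two columns past the paren
    (+ open-col 1 (count head) 1)))  ; call style: align under first argument

(continuation-column 2 "lispword")     ; => 4
(continuation-column 2 "not-lispword") ; => 16
```

The whole disagreement in this thread is about where the contents of that set come from: a fixed list, editor configuration, or metadata that only exists after the code is loaded.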

> Isn't LISP a language that prides itself in having no special forms

No, that's a misconception. Lisp has had special forms since day one (1960). On day two (1962), macros were added -> they implement even more syntax.

Is there no way out, to run the formatter after the macros?

You've already made the crucial step then - you've loaded the code. The author apparently wants to avoid that.

You could get away with parsing the code statically in the most trivial cases, but to do it even remotely correctly, you'd have to parse all the code, and only then format individual files. A lot of work for no good reason, and it wouldn't save you from macro-writing macros.

In a large Lisp project, this would screw a lot of things up. I don't have a relevant Clojure example here, but in Common Lisp, I worked on a pretty big codebase (CLIM2), where trying to indent things without loading the code into a Lisp image would screw up indentation in 50+% of the files - because a lot of frequently-used macros were written by other macros.

Clojure doesn't need this gofmt crap from Golang. The reason Golang needs it is that it uses a bizarre code formatting convention of indenting with tabs and aligning with spaces. It's impossible to correctly use both tabs and spaces in your code manually. You're bound to screw it up, so there is a need for an automated formatter that enforces this style.

In Clojure we format everything with spaces (https://ukupat.github.io/tabs-or-spaces/ - 99% of Clojure code uses 2 spaces). This formatting style is pretty much trivial and all code editors support it.

I'm perfectly fine with Clojure code formatted slightly differently depending on a person who writes it. If you want a BDSM language, code in Golang.

Jeez it's just data! Use a standard format for serialization and then use programs to format it for reading & everyone gets to read it the way they like it

That way you force a serialization standard upon everyone. I'd rather force a formatting standard upon everyone and get the benefit of readable code with any tool e.g. also code reviews in version control. There's hardly a way to read code "the way I like it" in a webapp.

That's exactly what I mean by standard serialization format - the format of code exchanged between people and programs. You can always rely on the standard way (for example the one presented here) when reading code on github or when you open it in your text editor by default, and then if you have another way you prefer that it be formatted, you can use a program to format it that way for you while you work on it - however, when you commit it, it's formatted the standard way, not your special way.

We've conflated how the code is stored with how it looks. We need to separate these. That's one of the problems with equating code with text: they're different things. Code is data (or objects), text is a way to represent code.

I don't know if they are totally separate. Say you like to see XMLHttpRequest and I like to see xml-http-request. If I type xml-http-request, how can a piece of software know how to translate that to XMLHttpRequest?

It's not just variable casing; it's any situation where one coding style distinguishes two things which look the same in another coding style. And I think there's a lot of such cases.
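A tiny sketch of why the round trip is lossy. `kebab->pascal` is a naive converter written just for this illustration; it has no way of knowing that "xml" and "http" are acronyms in the other style.

```clojure
(require '[clojure.string :as str])

(defn kebab->pascal
  "Naive conversion: capitalise each dash-separated part and join them."
  [s]
  (->> (str/split s #"-")
       (map str/capitalize)
       (apply str)))

(kebab->pascal "xml-http-request")
;; => "XmlHttpRequest", not "XMLHttpRequest"
```

Recovering the original spelling would need a dictionary of known names or acronyms, which is extra information the kebab-case form simply does not carry.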

this is in the context of LISP code

That’s quite insightful actually. It had not occurred to me that way and for a simple example, I can envision a prettier based web extension that makes reading code bases online easier.

> if everybody is reformatting it doesn’t really matter how much.

Really aggressive reformatting introduces git blame horizons that can make code archaeology more annoying.

I wouldn't consider it a reason not to switch to autoformatting, but it might be a reason not to get too wild about changing the conventions.

It's why I issue a separate commit for reformatting.

At least when the person sees my commit in git blame they know to annotate the previous version.

Or even better, do it once for the main codebase in a single commit, then make sure developers have their editors apply the formatting on save, and a formatting check in CI, so that way misformatted code never gets code-reviewed.

That's one very nice side of Python: no time wasted on formatting. I cannot bear formatting arguments even for a second; it kills my soul.

I've heard endless Python formatting debates too.

Luckily most people have settled on PEP 8, but even there, there are points of contention, such as line length limits, or how spaces won the tabs vs spaces debate (even though, imho, tabs are superior as they encode indentation cleanly: 1 tab = 1 indent, yet each person can set their editor's tab width however they prefer). See, nobody will ever fully agree on formatting.

That's true.. even though I wouldn't consider variable naming and docstrings a formatting debate.

To me anything that affects the visuals of the code without the semantics is a formatting issue -- so tabs vs spaces (and how many) for indents, space after function names (before parameter lists), space between operators, whether camelCase or something else, Foo.foo or Foo.Foo or foo.foo or..., etc. They're all formatting.

But that wasn't my point really, which was just that despite its forced indentation, Python still has, in my personal experience, its own formatting debates.

it's a classic "worse is better": if doing good is hard, let's do the easy stuff. meh

There are some professional domains where the "domain-namespace-alias/domain-variable-name" is over 80 characters. Clojure should not enforce language-independent aspects.

No mention of cljfmt, which I’ve used for a few years via lein plugin (‘lein cljfmt fix’).

I think you're missing the point. The point isn't "no tool exists to format my clojure code!" which is obviously wrong. The author is looking to define a standard that is simple enough to define once and be used universally. `cljfmt` is awesome and I use it myself, but it fails on pretty much all of the criteria the author lays out (slow startup time, it is "extensible," formats macros/binding-forms/functions differently, etc).

Honestly, I think this is a bad idea. The way Lisp works, you need to load the code to have a shot at knowing the correct indentation. You can't just do it statically without abandoning the different syntax for function calls vs. block-of-code macros, as the distinction between the two may not be generally apparent until runtime.

That isn't "how lisp works," that's how most people choose to format their lisp code.

And this is the entire point of the article. He discusses this at length: to reach the universality and speed of a `gofmt`-like formatting tool, we would have to give up on the idea that s-expressions should be indented differently based on their content.

Personally I find this a very tempting tradeoff. You may not, but indenting different s-exps in different ways based on their first element is not "correct," it's just... what most people choose to do.

> You may not, but indenting different s-exps in different ways based on their first element is not "correct," it's just... what most people choose to do.

They are not just s-expressions. They are programs. Just as C code is not a random string, but a program.

It's worth pointing out that this post is in response to the discussion over at clojureverse, which is an effort to come up with an option-free indentation standard similar to gofmt or prettier.


Isn't it true that the Lisp community is more tolerant of horizontal space than most subsets of programmers?

That, plus the fact that I haven't coded on a 4:3 monitor since ~2002, makes me think this is not helpful. The overuse of newlines in coding standards is... a useless exercise.

I don't know about the wider Lisp community, but I think Clojure is much easier to edit and more aesthetically pleasing when arranged in well-newlined blocks <=80 chars wide.

Constraining line length is great for readability. The benefits are similar to those gained by limiting prose line length, and sentence length.

Think of a long line of code as having similar traversal characteristics to a linked list. This requires more memory for comprehension, and is hard to search.

Now compare this to a newlined block of code, which resembles a richer data structure that spatially colocates relevant elements.

It's a difference that matters a lot.

> The benefits are similar to those gained by limiting prose line length, and sentence length.

Not really; prose doesn't indent with increasing nesting, so line length in prose actually means that many non-whitespace characters to scan.

> Now compare this to a newlined block of code, which resembles a richer data structure that spatially colocates relevant elements.

Nobody writes huge lines in Lisps; but the indentation can get out of hand in large functions. So you have this neat block that spatially colocates relevant elements, but has 150 columns of whitespace on the left, which creates a navigation challenge that is different from the problem of scanning a long line.

If you combat this by simply constraining the column length, then what happens is this problem:

                                              (butting up

You have little recourse but to break it up into smaller functions (perhaps some of them nested in the parent).

> Not really; prose doesn't indent with increasing nesting

Clojure lines increase in length as they increase in nesting, but without increasing nesting, they don't typically get particularly long[1], unless you have very long argument lists or very long names. So, the key is to reduce nesting (which helps comprehension anyway) and Clojure has plenty of tools to help with that: you can use ->, ->>, as->, some->, or you can bind sub-expressions to names using let/when-let/etc. There's no need to split code into smaller functions unless it makes sense to do so (eg for code reuse or semantically).

[1] 2-space indents help keep indents from being crazy wide without crazy nesting. The only place where I often find lines getting too long is let-blocks with aligned expressions and long variable names:

    (let [this-name-is-too-long (some-code-here
                                  (uh-oh this is getting long)

Had to close those parens to stop the mental itch!

Hah! But but but I did close them (implicitly, as part of the "...")

Ahh, so you're more concerned with indentation than huge lines of code. Makes sense.

> You have little recourse but to break it up into smaller functions (perhaps some of them nested in the parent).

Generally this type of factoring is a good thing for the codebase, and a very natural part of the development process in Clojure.

I still like to constrain line length, because having two source files (or a source file and a REPL) side-by-side on the same monitor is so useful. The trade-off of avoiding long lines is a small price to pay. I recognize that most people like to have one file per monitor though.

> The overuse of newlines in coding standards is... a useless exercise.

My Clojure code is more vertically compact (despite sticking to 80-column lines and avoiding deep nesting) than a lot of C++ code I see out there:

    if (condition)
    { // wasted line
        some = code(here);
    } // another wasted line?
    // sometimes a blank line here too
    if (another condition)
The Java convention at least wastes a tiny bit less vertical space:

    if (condition) {
        some = code(here);
    }
    if (another condition) {
In Clojure I have the same, but without the trailing closing brace. I insert blank lines if I need to visually break the code up or if it's really unrelated to the previous line, but it's up to me when to apply that, based on when it makes sense to do so, not just because the coding standard says so.

    (if condition
      (some code here))
    (if another-condition
In reality, I'd of course use when, and if the body is trivial I might even simply write (when condition trivial-body) on one line, as long as it doesn't go over 80 characters.
