
The Hardest Program I've Ever Written (2015) - tosh
http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/
======
vjeux
I read this before working on prettier and it was a bit scary to be honest :)

I wasn’t convinced that the exhaustive approach with weights was a good idea.
It makes it computationally super intensive and hard to predict what the
printer would do.

Instead prettier runs on a simple idea: if something doesn’t fit in one line,
break the outermost parent.

This simple rule (implemented via the Wadler IR) is efficient to implement and
makes the output generally look good.
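
To make the rule concrete, here is a minimal Python sketch of a Wadler-style printer. It is not Prettier's code: the `Text`, `Line`, `Indent`, `Group`, `call`, and `render` names are made up for illustration, and the fit check only measures the group itself rather than everything up to the next possible break, which real implementations account for. A group is printed flat if it fits in the remaining width; otherwise its own line breaks are taken and each child group gets the same choice.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Line:
    flat: str = " "  # what the break prints as when its group stays on one line


@dataclass
class Indent:
    parts: List["Doc"]


@dataclass
class Group:
    parts: List["Doc"]


Doc = Union[Text, Line, Indent, Group]


def flat_width(doc: Doc) -> int:
    """Width of `doc` if it were printed entirely on one line."""
    if isinstance(doc, Text):
        return len(doc.value)
    if isinstance(doc, Line):
        return len(doc.flat)
    return sum(flat_width(part) for part in doc.parts)


def render(doc: Doc, width: int = 80, indent_step: int = 2) -> str:
    out: List[str] = []
    column = 0

    def emit(node: Doc, indent: int, flat: bool) -> None:
        nonlocal column
        if isinstance(node, Text):
            out.append(node.value)
            column += len(node.value)
        elif isinstance(node, Line):
            if flat:
                out.append(node.flat)
                column += len(node.flat)
            else:
                out.append("\n" + " " * indent)
                column = indent
        elif isinstance(node, Indent):
            for part in node.parts:
                emit(part, indent + indent_step, flat)
        else:  # Group: stay flat if the parent is flat or the whole group fits
            fits = flat or column + flat_width(node) <= width
            for part in node.parts:
                emit(part, indent, fits)

    emit(doc, 0, False)
    return "".join(out)


def call(name: str, *args: Doc) -> Group:
    """Hypothetical helper: build the IR for `name(arg, arg, ...)`."""
    body: List[Doc] = [Line("")]
    for i, arg in enumerate(args):
        body.append(arg)
        if i < len(args) - 1:
            body += [Text(","), Line(" ")]
    return Group([Text(name + "("), Indent(body), Line(""), Text(")")])


if __name__ == "__main__":
    nested = call("outer", call("inner", Text("a"), Text("b")), Text("cccccc"))
    print(render(nested, width=80))
    print(render(nested, width=20))
```

At `width=20` the outer call does not fit, so its arguments go on separate lines, while `inner(a, b)` still fits and stays on one line. Only the outermost group that overflows is broken.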

Then the hard part was to go through the very looong (took me ~6 months full
time) process of adding special cases for every construct that humans write in
a special way.

There is nothing in there that’s hard at a computer science level, it’s just a
lot of special cases. I don’t believe that there’s a general solution for it,
you just need to slog through it.

And the last annoying thing is comments: they can be anywhere and are super
annoying to print correctly.

Overall, it wasn’t really hard, but a lot of work.

~~~
munificent
_> I wasn’t convinced that the exhaustive approach with weights was a good
idea. It makes it computationally super intensive and hard to predict what the
printer would do._

Yes, you are right on both counts. I'm still not entirely sure it's the
right approach for dartfmt.

I started with a corpus of code that was hand-formatted by people who I felt
had great style. And then I worked backwards to figure out what kinds of
formatting mechanics I would need to be able to automate that.

Unfortunately, there are some very common formatting idioms in Dart that
really just don't play nice with any simpler formalism. For example, it's
common to use embedded function literals as if they were blocks:

    
    
        group("test some stuff", () {
          test("test a thing with a very long"
              "multi-line description of the thing being tested", () {
            expect(thing, isSomeThing);
          });
        });
    

Note here how the body of the innermost function literal does not respect the
surrounding expression nesting, even though it is technically all an argument
to test().

But in other cases, a function parameter should work like a normal indented
expression:

    
    
        things
            .map((thing) {
              ...
            })
            .where((thing) => thing.isFoo)
            .toList();
    

I probably could have said, "Well, these idioms are going away and this code
will look different." and then designed a much simpler formatter. A huge part
of software engineering is picking the right problem to solve and knowing when
to push against requirements and when to take them as given.

For dartfmt, I did my best to keep the original requirements even though it
meant a much more complex formatter. That made my implementation job harder.
But it made the political job of convincing everyone inside Google (and most
users outside) to use dartfmt much easier because the output was closer to
what they wanted.

~~~
vjeux
> For dartfmt, I did my best to keep the original requirements even though it
> meant a much more complex formatter. That made my implementation job harder.
> But it made the political job of convincing everyone inside Google (and most
> users outside) to use dartfmt much easier because the output was closer to
> what they wanted.

I couldn't have said it better :)

I added a ton of special cases in the step that takes an AST and outputs the
IR to handle those two things (and many many more).

For 1), I special case functions that look like `test(` and completely ignore
the 80 column rule so that the name stays in one line.
[https://github.com/prettier/prettier/blob/3e0dceda9954658860...](https://github.com/prettier/prettier/blob/3e0dceda9954658860ffdbc7e04465fcf5737d12/src/language-js/printer-estree.js#L1047-L1058)

For 2), member chains are actually the most complex piece of prettier, but
they were important to get right. There are plenty of comments in the
implementation:
[https://github.com/prettier/prettier/blob/3e0dceda9954658860...](https://github.com/prettier/prettier/blob/3e0dceda9954658860ffdbc7e04465fcf5737d12/src/language-js/printer-estree.js#L4222-L4584)

I'm sure you did something similar as well :)

~~~
munificent
Yes! There is a whole separate class just to handle formatting method chains:

[https://github.com/dart-lang/dart_style/blob/master/lib/src/...](https://github.com/dart-lang/dart_style/blob/master/lib/src/call_chain_visitor.dart)

------
srean
Pretty printers are hard.

        "We sat down one morning," recalls Steele. "I was at
        the keyboard, and he was at my elbow," says Steele.
        "He was perfectly willing to let me type, but he was
        also telling me what to type."

        The programming session lasted 10 hours. Throughout
        that entire time, Steele says, neither he nor Stallman
        took a break or made any small talk. By the end of the
        session, they had managed to hack the pretty print
        source code to just under 100 lines. "My fingers were
        on the keyboard the whole time," Steele recalls, "but
        it felt like both of our ideas were flowing onto the
        screen. He told me what to type, and I typed it."

        The length of the session revealed itself when Steele
        finally left the AI Lab. Standing outside the building
        at 545 Tech Square, he was surprised to find himself
        surrounded by nighttime darkness. As a programmer,
        Steele was used to marathon coding sessions. Still,
        something about this session was different. Working with
        Stallman had forced Steele to block out all external
        stimuli and focus his entire mental energies on the task
        at hand. Looking back, Steele says he found the Stallman
        mind-meld both exhilarating and scary at the same
        time. "My first thought afterward was: it was a great
        experience, very intense, and that I never wanted to do
        it again in my life." - Guy Steele

Heck! To get a taste, one can take a stab at writing a formatter for printing
a floating point number.

~~~
Someone
If you require the output to be the shortest possible (e.g. 0.1+0.11 should
print as “0.21”, not as “0.21000000000000002”) and to have your output
round-trip (if you read in what you printed, you get back the number you
printed) I would think printing floating point numbers is harder than writing
a pretty-printer.

Pretty-printers have the advantage of never being finished, as there always are
cases where some people would say “I would format that differently”. That’s an
advantage because it also means you can declare any version that works
decently “finished”. For formatting floats, it’s much more black or white.
Your formatter either works or it is buggy.

The first working formatter for floating point numbers is from 1980, IIRC, and
doing it correctly only became somewhat the expected case after Steele's 1990
paper
([https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfk...](https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf)).

~~~
jancsika
> and to have your output round-trip

Unless you're simply talking about round-trip _within_ the application itself,
that's not possible.

~~~
tlb
It is possible, and most reasonable format systems guarantee it. Javascript
guarantees that parseFloat(x.toString()) === x for any fp number x. The same
extends to JSON formatting. Python guarantees that eval(str(x)) == x.
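
The round-trip property is easy to check in Python. The sketch below is a brute-force illustration only: `shortest_round_trip` is a hypothetical helper, not how any runtime prints floats (real implementations use much faster algorithms). It tries 1 to 17 significant digits and returns the first string that parses back to the same double; 17 digits always suffice for a finite IEEE 754 double.

```python
def shortest_round_trip(x: float) -> str:
    """Return the decimal string with the fewest significant digits
    that parses back to exactly the same double as `x`."""
    for digits in range(1, 18):
        candidate = f"{x:.{digits}g}"
        if float(candidate) == x:
            return candidate
    # Only reached for NaN, which never compares equal to itself.
    raise ValueError(f"no round-tripping form for {x!r}")


if __name__ == "__main__":
    total = 0.1 + 0.11
    # Compare the sum with the literal: are they the same double?
    print(repr(total), repr(0.21), total == 0.21)
    print(shortest_round_trip(total))
    # The built-in repr is also guaranteed to round-trip.
    print(float(repr(total)) == total)
```

Comparing `0.1 + 0.11` with the literal `0.21` shows whether the two are the same double. If they are not, no round-tripping printer can print the sum as "0.21", because that string parses to a different number.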

~~~
jancsika
That's because Javascript doesn't print out the shortest possible output.

OP's requirement is that the output for 0.1 + 0.11 is displayed as 0.21, _and_
that the output survives a round-trip. Your example satisfies the latter but
not the former.

------
AceJohnny2
Reminds me of Steve Yegge's introduction of js2-mode, a new JavaScript mode
for Emacs, and how hard it was to get its indentation right:

_Amazingly, surprisingly, counterintuitively, the indentation problem is
almost totally orthogonal to parsing and syntax validation. I'd never have
guessed it. But for indentation you care about totally different things that
don't matter at all to parsers. Say you have a JavaScript argument list: it's
just (blah, blah, blah): a paren-delimited, comma-separated, possibly empty
list of identifiers. Parsing that is pretty easy. But for indentation
purposes, that list is rife with possibility!_

[http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascr...](http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascript-mode-for-emacs.html)

------
saagarjha
> It reads in a string and writes out a string.

That’s usually where a lot of the hard problems lie, in my experience.
Parsers, compilers, interpreters, sanitizers, and of course formatters are
always a deep dive into formal language theory, where the limits are often
fundamental rather than technical and you have to make tradeoffs to get a
solution that works well.

------
bjoli
The hardest thing I ever wrote was an inliner for a language I made myself.

You'd think it is a simple problem, but soon you will have a rabid animal on a
leash and you will be struggling to restrain it.

I gave the source to a friend who modified it a bit and committed it to a
language that was mentioned a couple of times on HN in the early days. It has
since been replaced, but was used for a good 5 years.

~~~
sillysaurus3
Can you expand on some of the problems?

I was going to write an inliner for a Lisp I was working on for a couple
years. Emacs seems to have a decent one. But I know there are tricky corner
cases.

~~~
nappy-doo
Writing a simple inliner isn't too bad; writing a good one is hard. Let's
pretend you just "copy/paste" the code into place. There's the question of
deciding when/if to copy/paste the code in: what heuristic do you use? Now
there's constant propagation, possibly followed by dead code elimination
(enabled by the constant propagation), and dependency reanalysis (because
you've changed the basic block, possibly adding more basic blocks). There are
reasons the resultant code could even be less efficient (register pressure for
locals in the inlined code comes to mind), and there are i-cache complexities
that come from blowing up your code size.

Inlining is really hard.
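
As a rough illustration of the "easy" part, here is a toy Python inliner over a made-up, side-effect-free expression language. Everything in it (`Num`, `Var`, `BinOp`, `Call`, the `max_size` threshold, the `depth` limit) is an assumption for the sketch. It shows only a size heuristic followed by constant folding; basic blocks, dead code, register pressure, and i-cache effects are not modeled at all.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Num, Var, BinOp, Call]
Funcs = Dict[str, Tuple[Tuple[str, ...], Expr]]  # name -> (params, body)
OPS = {"+": operator.add, "*": operator.mul}


def size(e: Expr) -> int:
    """Node count, used as a crude cost estimate for the heuristic."""
    if isinstance(e, BinOp):
        return 1 + size(e.left) + size(e.right)
    if isinstance(e, Call):
        return 1 + sum(size(a) for a in e.args)
    return 1


def substitute(e: Expr, env: Dict[str, Expr]) -> Expr:
    """Replace parameters with arguments. Safe only because expressions
    are pure: an argument used twice is simply duplicated."""
    if isinstance(e, Var):
        return env.get(e.name, e)
    if isinstance(e, BinOp):
        return BinOp(e.op, substitute(e.left, env), substitute(e.right, env))
    if isinstance(e, Call):
        return Call(e.func, tuple(substitute(a, env) for a in e.args))
    return e


def inline(e: Expr, funcs: Funcs, max_size: int = 5, depth: int = 3) -> Expr:
    """Inline calls whose body is small enough; `depth` bounds recursion."""
    if isinstance(e, BinOp):
        return BinOp(e.op, inline(e.left, funcs, max_size, depth),
                     inline(e.right, funcs, max_size, depth))
    if isinstance(e, Call):
        args = tuple(inline(a, funcs, max_size, depth) for a in e.args)
        if e.func in funcs and depth > 0:
            params, body = funcs[e.func]
            if size(body) <= max_size and len(params) == len(args):
                expanded = substitute(body, dict(zip(params, args)))
                return inline(expanded, funcs, max_size, depth - 1)
        return Call(e.func, args)
    return e


def fold(e: Expr) -> Expr:
    """Constant folding: evaluate operators whose operands are both numbers."""
    if isinstance(e, BinOp):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Num) and isinstance(right, Num) and e.op in OPS:
            return Num(OPS[e.op](left.value, right.value))
        return BinOp(e.op, left, right)
    if isinstance(e, Call):
        return Call(e.func, tuple(fold(a) for a in e.args))
    return e


if __name__ == "__main__":
    funcs = {"double": (("x",), BinOp("*", Var("x"), Num(2)))}
    expr = BinOp("+", Call("double", (Num(3),)), Var("y"))
    print(fold(inline(expr, funcs)))
```

Even in this toy, the awkward questions show up immediately: the threshold is arbitrary, recursive functions need a depth limit, and duplicating arguments is only correct because nothing has side effects.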

------
zaroth

      return doughnutFryer
        .start()
        .then((_) => _frostingGlazer.start())
        .then((_) => Future.wait([
              _conveyorBelts.start(),
              sprinkleSprinkler.start(),
              sauceDripper.start()
            ]))
      ...
      ...
    

> (The funny names are because this was sanitized from internal code.)

And for about 3 seconds I was really excited to learn more about who wrote
this code!

------
choeger
I really hope the author did some research. There is a very old work from
Derek Oppen that has been adapted to the functional world later. The
principles sound very similar to the OP's approach. It would be a shame if so
much work went into reinventing the wheel...

~~~
baby
I read once that this is the difference between research and hacking. Research
is about researching related work and the field and ensuring that we’re not
re-inventing the wheel. Hacking is about writing code as fast as possible.

~~~
gnulinux
A good software engineer should be able to do both of them successfully.

------
otakucode
I know this is a stupid question, and please forgive me for asking it... but
just how wide do monitors need to get or how high do resolutions need to go
before longer lines become reasonable?

~~~
EvilTerran
It's not about monitor size or resolution - text just gets hard to read once
lines exceed a certain width, because the eye starts losing track of which
line it's on as it sweeps from the end of one line to the start of the next.
That's true for prose as well as code - and code has the additional complexity
of having to keep track of possibly-deeply-nested paired pieces of syntax
(parentheses, braces etc); that part can be made easier with indentation, but
not if everything's on one line.

Wider screens still have their advantages even if you keep your text narrow,
mind - when you want to display multiple things side-by-side; the halves of a
diff, or your code & a reference, for example.

------
hyperpallium
> Yes, I really did brute force all of the combinations at first. It let me
> focus on getting the output correct before I worried about performance.

The programmer spoke correctly.

------
_asummers
Not nearly as difficult, but I recently did a large project on converting
Erlang Dialyzer output into Elixir friendly output in the Dialyxir[0] project,
which involved a lot of lexing and parsing and pretty printing, though I was
fortunately able to take advantage of the Elixir code formatter for a lot of
the heavy lifting for output. My grammar rules, while better than they've
been, are still pretty bad, but they seem to work well enough in practice for
most programs that I've seen.

The tl;dr for that project is that it runs dialyzer, finds the relevant
Warning module for each warning, which will pretty print parts of the error
output into a larger explanation, which involves taking the output, lexing it,
parsing it, then pretty printing the IR back into Elixir, then running through
the formatter.

[0]
[https://github.com/jeremyjh/dialyxir](https://github.com/jeremyjh/dialyxir)

------
dkrikun
I hope we can have a library for this kind of formatting so each new
$hotlanguage can have $hotlanguagefmt easily..

~~~
dguo
There's an effort to make a plugin system for Prettier so that it can format
languages other than JS and company. See
[https://prettier.io/docs/en/plugins.html](https://prettier.io/docs/en/plugins.html)

------
Myrmornis
On reading this, does it seem like the language parser itself should demand
that the code is formatted correctly, rather than relying on a separate tool
to format the code? I.e. incorrectly formatted code would be a syntax error.

That way all this hard work is off-loaded on human brains. They'll complain a
bit but in practice will pick up very quickly where the line-breaks go.

(This would probably have to be a decision made by the language designers from
the outset.)

------
_xgw
Couldn't you start from Dart's BNF grammar, create a BNF grammar of
"formatted" Dart and then format the code accordingly?

~~~
munificent
Maybe. If you think of the formatted output as a grammar that includes
whitespace, you'll probably discover it is _very_ context-sensitive. Then
determining what context to synthesize when making the translation is probably
more or less equivalent to how you'd otherwise write the formatter.

------
doug1001
i'm a regular reader of this blog; there's always something non-obvious (at
least to me) plus just really fine, fine writing--eg,

"Some Dart users really dig a functional style and appear to be playing a game
where whoever crams the most work before a single semicolon wins."

sort of the Dave Eggers of software engineering

------
fizixer
> you’d expect it to do something pretty deep right? ... Nope. It reads in a
> string and writes out a string.

If an optimizing compiler takes in a program and decides to spit out the
executable as a string (in base-64 format) which can later be converted to
binary, using xxd for example, does that make the optimizing compiler "not
pretty deep"?

Automated code formatting is fairly deep as far as I'm concerned.

------
gcb0
sounds like a case of bruteforcing a simple task with a pile of hacks.

can't imagine why all that mess is saner than putting a pretty printer on the
other end of the language parser.

also this mess will be extremely non deterministic, depending on machine speed
etc

------
scarface74
Side tangent:

He mentioned the algorithm questions he had to study to pass the interview and
that knowing the algorithms came in handy.

If you are interviewing for Google, Facebook, Amazon, or Netflix, etc. solving
problems that have never been solved at their scale, algorithm questions are
important.

If you just want someone to write the next software as a service CRUD app,
don't waste my time.

~~~
alexbecker
Speaking as a former Googler, many (most?) Googlers never work on complex
algorithms at huge scale. The interview questions are often completely
irrelevant to their work.

~~~
scarface74
Do you think that even at Google the algorithm questions are useless and
shouldn’t be asked? I was trying to give Google the benefit of the doubt.

~~~
alexbecker
"At Google" is such a broad category that no single answer suffices. For
anyone working on the hot-path of search/ads/knowledge graph back-end, sure,
they need to know their algorithms. Same for language teams like the one
Nystrom is on. For a front-end engineer, which is what I was at Google
(exclusively in
Dart fwiw)? Probably not. I implemented a graph traversal exactly once.

However, Google uses the same interview process for all software engineers,
and mostly decides what job they will do _after_ deciding whether to hire
them. So they really can't tailor their questions without changing their
entire process. I think they should change it, but I understand why they
haven't.

