
The Hardest Program I've Ever Written (2015) - tanu057
http://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/?sort
======
eesmith
"If it took that much thrashing to get it right, you’d expect it to do
something pretty deep right? Maybe a low-level hardware interface or .. I’m
talking, of course, about an automated code formatter."

This reminds me of the Tao of Programming 3.3, at
[http://www.mit.edu/~xela/tao.html](http://www.mit.edu/~xela/tao.html) , the
relevant part of which I will copy here:

There was once a programmer who was attached to the court of the warlord of
Wu. The warlord asked the programmer: "Which is easier to design: an
accounting package or an operating system?"

"An operating system," replied the programmer.

The warlord uttered an exclamation of disbelief. "Surely an accounting package
is trivial next to the complexity of an operating system," he said.

"Not so," said the programmer, "When designing an accounting package, the
programmer operates as a mediator between people having different ideas: how
it must operate, how its reports must appear, and how it must conform to the
tax laws. By contrast, an operating system is not limited by outside
appearances. When designing an operating system, the programmer seeks the
simplest harmony between machine and ideas. This is why an operating system is
easier to design."

~~~
moretai
where do people find quotes like this? Just reading a ton of books?

~~~
npsimons
> where do people find quotes like this? Just reading a ton of books?

No lie, I've gotten more than one good book recommendation (including "Tao of
Programming") from '/usr/games/fortune'. Also, if you find a book fascinating
or highly informative, skim the bibliography for more like it.

~~~
ColanR
what's "/usr/games/fortune"?

~~~
sbierwagen
[https://en.wikipedia.org/wiki/Fortune_(Unix)](https://en.wikipedia.org/wiki/Fortune_\(Unix\))

------
harpocrates
Pretty-printing is tough. That said, please don't reinvent the wheel. There is
research that has gone into this that should make most of this stuff pretty
straightforward. I personally recommend Wadler's "A Prettier Printer" [0]
(although credit goes to Hughes for laying a lot of the groundwork [1]). It
too uses an IR and has several possible heuristics for rendering.

I've been using an implementation of it [2] with a lot of success for pretty
printing Rust code [3].

    
    
      [0]: https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf
      [1]: http://belle.sourceforge.net/doc/hughes95design.pdf
      [2]: https://hackage.haskell.org/package/prettyprinter
      [3]: https://github.com/harpocrates/language-rust
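
For readers who have not seen the Wadler-style approach, here is a minimal Python sketch of the idea: build a small document IR, then let the renderer decide per group whether it fits on the line. This is not the API of the Haskell `prettyprinter` package; the names `Text`, `Line`, `Nest`, `Concat`, `Group`, `fits`, and `render` are made up for illustration, and the fit check is simplified to look only at the group itself.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Text:
    s: str


@dataclass(frozen=True)
class Line:
    """A space when its group is laid out flat, a newline otherwise."""


@dataclass(frozen=True)
class Nest:
    indent: int
    doc: "Doc"


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Doc", ...]


@dataclass(frozen=True)
class Group:
    doc: "Doc"


Doc = Union[Text, Line, Nest, Concat, Group]
FLAT, BREAK = "flat", "break"


def fits(width, items):
    """True if the items fit in `width` columns before the next newline."""
    stack = list(items)
    while width >= 0:
        if not stack:
            return True
        indent, mode, doc = stack.pop()
        if isinstance(doc, Text):
            width -= len(doc.s)
        elif isinstance(doc, Line):
            if mode == BREAK:
                return True
            width -= 1
        elif isinstance(doc, Nest):
            stack.append((indent + doc.indent, mode, doc.doc))
        elif isinstance(doc, Concat):
            for part in reversed(doc.parts):
                stack.append((indent, mode, part))
        elif isinstance(doc, Group):
            stack.append((indent, FLAT, doc.doc))
    return False


def render(doc, width=80):
    """Lay out `doc`, keeping each group flat when it fits."""
    out, col = [], 0
    stack = [(0, BREAK, doc)]
    while stack:
        indent, mode, d = stack.pop()
        if isinstance(d, Text):
            out.append(d.s)
            col += len(d.s)
        elif isinstance(d, Line):
            if mode == FLAT:
                out.append(" ")
                col += 1
            else:
                out.append("\n" + " " * indent)
                col = indent
        elif isinstance(d, Nest):
            stack.append((indent + d.indent, mode, d.doc))
        elif isinstance(d, Concat):
            for part in reversed(d.parts):
                stack.append((indent, mode, part))
        elif isinstance(d, Group):
            # Simplification: only the group itself is checked,
            # not the text that follows it on the same line.
            flat = mode == FLAT or fits(width - col, [(indent, FLAT, d.doc)])
            stack.append((indent, FLAT if flat else BREAK, d.doc))
    return "".join(out)


example = Group(Concat((
    Text("call"),
    Nest(2, Concat((Line(), Text("arg1"), Line(), Text("arg2")))),
)))
```

Calling `render(example, width=80)` keeps everything on one line, while `render(example, width=10)` breaks at every `Line` in the group and indents by the enclosing `Nest`.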

------
jemfinch
It sounds like perhaps a case of being so focused on whether the program
_could_ be built that they didn't stop to ask whether it _should_ be built.

Automatic code formatters don't need to be perfect or complete; they simply
must format _good_ code well. If bad code (as many of the difficult examples
were) formats poorly, that's just another reason for people to write better
code.

~~~
marcosdumay
If you are using it to enforce formatting across a code base, OK, a good-enough
formatter will do.

But one of the biggest use cases of a code formatter is understanding bad
code. If it can't format bad code, that use case is gone.

------
peterburkimsher
I wrote a pretty-printer for bash scripts using PHP. Colouring the keywords
was fun, but I moved on to other personal projects instead of tackling line
breaks.

More recently I've been trying to learn Chinese, and one of the features of
Pingtype is to put spaces between words.

[http://pingtype.github.io](http://pingtype.github.io)

To my surprise, this article linked to a Wikipedia page about line wrapping,
which says that line wrapping in CJK is unsolved.

[https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Word_w...](https://en.wikipedia.org/wiki/Line_wrap_and_word_wrap#Word_wrapping_in_text_containing_Chinese.2C_Japanese.2C_and_Korean)

"Most existing word processors and typesetting software cannot handle either
[personal names or compound words]."

My method works, but I don't know who to give it to. This is Hacker News and
there are people from all sorts of backgrounds here, so I'll just throw it out
there: if anyone is interested, please contact me.

~~~
bmn__
> if anyone is interested, please contact me

This is not how things are done. Just put your code and documentation on the
public Web, then submit the link to Hacker News or other relevant forums.

~~~
peterburkimsher
Did that. Nobody noticed.

[https://news.ycombinator.com/item?id=14907618](https://news.ycombinator.com/item?id=14907618)

~~~
bmn__
You are correct that no one is interested in yet another translation Web
application, especially since it does not say how it's different or better
than the rest. That's all anyone got to see so far.

If you want to be successful in communicating your improved method for word
segmentation to the world at large, extract the code, publish it, and document
the relevant details: what makes it better and how it compares against existing
solutions.

I am a hobby computational linguist. "Contact me" is an outmoded concept; most
of us do our collaboration in the open.

------
wheresvic1
Why do you necessarily need a line limit?

I would simply indent all chained function calls and be done with it. E.g.:

    
    
       return foo(param)
          .then(bar)
          .catch(err => {
             logger.error(err);
             return -1;
          });

~~~
skybrian
Some style guides have a line limit (including Dart). If you want to submit
code without hand-editing, the formatter needs to do it.

On the other hand, Go's style guide doesn't have a line limit, so its
formatter doesn't have to solve tricky problems like this. Programmers can add
line breaks by hand if they get too long. (A nice side effect is that auto-
formatting your code doesn't change line numbers.)

------
dracodoc
One of my side projects was an auto formatter for R. There are some limits in
existing formatters:

\- I think most of them don't recognize multi-line string literals. This is
difficult because you can have "" inside comments, comment symbols inside ""
string literals, and line breaks inside string literals. The only way to deal
with it is to scan linearly with context.

\- It's tricky to wrap a long line:

\+ Some points in a long line are better break points at the logical level.

\+ But sometimes you want fewer lines and don't want to break too often, even
if breaking is clearer logically. The lines could be just a parameter list
that is equally well represented in one column or in multiple columns.

\+ With nested code the natural indent position can be far to the right, which
makes each line very short if you stick to the 80-column rule.
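
To illustrate what "scan linearly with context" can look like, here is a minimal Python sketch (not the R formatter described in this comment). It assumes an R-like syntax where `#` starts a comment that runs to end of line, strings are delimited by `"` or `'`, may span lines, and use backslash escapes. The function name `scan` and the span labels are made up for illustration, and R 4.0 raw strings are not handled.

```python
from typing import List, Tuple


def scan(source: str) -> List[Tuple[str, str]]:
    """Split source into ('code' | 'string' | 'comment', text) spans."""
    spans: List[Tuple[str, str]] = []
    state = "code"  # the context the scanner is currently in
    quote = ""      # quote character that opened the current string
    start = 0       # index where the current span began
    i = 0

    def close(end: int, kind: str) -> None:
        # Record the span that just ended, skipping empty spans.
        if end > start:
            spans.append((kind, source[start:end]))

    while i < len(source):
        ch = source[i]
        if state == "code":
            if ch in "\"'":
                close(i, "code")
                state, quote, start = "string", ch, i
            elif ch == "#":
                close(i, "code")
                state, start = "comment", i
        elif state == "string":
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                close(i + 1, "string")
                state, start = "code", i + 1
        elif state == "comment":
            if ch == "\n":
                close(i, "comment")
                state, start = "code", i
        i += 1

    close(len(source), state)
    return spans
```

Because the scanner always knows which context it is in, a `#` inside a string stays part of the string and a quote inside a comment stays part of the comment.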

After quite some effort my code can deal with all the comments, multi-line
strings, and all the operators I know (I need to separate unary and binary
operators to determine whether to insert a space), but the script takes several
seconds to run, and I haven't started to deal with indentation. I could probably
save some time with more optimization, but I don't have time to finish it now.

This Python formatter describes its algorithm; it's worth a read.

[https://github.com/google/yapf](https://github.com/google/yapf)

------
StefanKarpinski
Cool post. Some insight into why this problem is so hard: what this post is
describing seems to be an integer linear programming problem [1]. I.e.
optimizing a linear cost function constrained by (convex) linear bounds with
integer-valued variables. The reason it's so difficult is that ILP is an NP-
hard problem. Finding the right way to represent program source is also
tricky, but, as the post says, doing so in a way that caters to the extremely
performance-sensitive solver code is the really difficult bit. A better
approach might be to produce an explicit ILP program and use an ILP solver to
decide where the line breaks should go. As with many NP-hard problems (e.g.
SAT, TSP), there are very good solvers these days that are much faster than
anything you could ever hope to write yourself – and they produce fully
optimal solutions.

[1]
[https://en.wikipedia.org/wiki/Integer_programming](https://en.wikipedia.org/wiki/Integer_programming)

~~~
tnecniv
> As with many NP-hard problems (e.g. SAT, TSP), there are very good solvers
> these days that are much faster than anything you could ever hope to write
> yourself – and they produce fully optimal solutions.

And if it's not fast enough, you can use one of a number of relaxations to a
standard LP and call it a day.

------
matt_wulfeck
> _There are thirteen places where a line break is possible here according to
> our style rules. That’s 8,192 different combinations if we brute force them
> all_

This is why the language should be designed with a formatter in mind from the
beginning, as Go was designed. Just enough mustaches to make formatters
accurate and fast. How many possibilities should there be? Exactly one.
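
The 8,192 figure in the quote is 2^13: each of the thirteen possible split points is either taken or not. A rough Python sketch of the brute-force approach follows. It is not the Dart formatter's algorithm; `apply_splits`, `brute_force`, and the "fewest lines wins" cost are assumptions made purely for illustration.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def apply_splits(chunks: Sequence[str], splits: Sequence[bool]) -> List[str]:
    """Join chunks with spaces, starting a new line where splits[i] is True."""
    lines, current = [], chunks[0]
    for chunk, split in zip(chunks[1:], splits):
        if split:
            lines.append(current)
            current = chunk
        else:
            current += " " + chunk
    lines.append(current)
    return lines


def brute_force(chunks: Sequence[str], width: int) -> Tuple[Optional[List[str]], int]:
    """Try every split combination; return (best layout, combinations tried)."""
    best: Optional[List[str]] = None
    tried = 0
    # n chunks have n - 1 gaps, so this loop runs 2 ** (n - 1) times.
    for splits in product((False, True), repeat=len(chunks) - 1):
        tried += 1
        lines = apply_splits(chunks, splits)
        if any(len(line) > width for line in lines):
            continue  # layout overflows the line limit
        if best is None or len(lines) < len(best):
            best = lines  # toy cost: fewer lines is better
    return best, tried
```

With 14 chunks there are 13 gaps, so `brute_force` tries 8,192 layouts; every extra split point doubles that count.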

------
pmoriarty
_"There are thirteen places where a line break is possible here according to
our style rules. That’s 8,192 different combinations if we brute force them
all. The search space we have to cover is exponentially large..."_

Sounds like this might be a good candidate for some AI methods which are not
intimidated by such large search spaces.

~~~
buttcoinslol
Would there be AI methods that would produce deterministic results for code
formatting? Deterministic output seems to be the main draw of a code
formatter.

~~~
pmoriarty
The code formatter described in the article is not deterministic either. The
author admits the search space is far too large for him to brute-force it, so
his formatter does the best it can, which might not get the optimal results:

_"If the line splitter tries, like, 5,000 solutions and still hasn't found a
winner yet, it just picks the best it found so far and bails."_

This sounds like it would give no more guarantees of a deterministic result
than AI methods would. They also try their best, only search a subset of the
search space, and bail when user-defined criteria such as time, iteration, or
space limits are reached.

Another approach could be to use both the author's hand-crafted algorithm and
an AI method. There are at least two ways to go about that. The first would be
to run the hand-crafted algorithm and pass its evaluation of the best
solutions to seed an AI method (for example, as the seed populations in an
evolutionary algorithm). The second would be to do the opposite, and run the
AI method first, and seed the hand-crafted algorithm with the best solutions
the AI method discovered.

~~~
bradleyjg
It doesn't look like that stopping point is user-defined; it's a compiled-in
constant.

Given that, I don't see any reason why the algorithm should be
non-deterministic just because it gives up after a set number of iterations. As
long as every time it sees a given string it tries the possibilities in the
same order, scores them the same way, and stops at the same place, it should be
deterministic.
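
A small Python sketch of that argument, under the assumption that candidates are generated in a fixed order and scored by a pure function. The 5,000 cap is the figure from the article quote above; `best_within_budget` and the toy cost function are hypothetical, not the formatter's actual code.

```python
from itertools import islice, product
from typing import Callable, Iterable, Optional, Tuple

MAX_ATTEMPTS = 5000  # fixed cap, standing in for a compiled-in constant

Splits = Tuple[bool, ...]


def best_within_budget(
    candidates: Iterable[Splits],
    cost: Callable[[Splits], int],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[Splits]:
    """Score at most max_attempts candidates, in order; keep the cheapest."""
    best, best_cost = None, None
    for candidate in islice(candidates, max_attempts):
        c = cost(candidate)
        # Strict "<" means ties go to the earliest candidate, so the
        # result depends only on the input order and the cost function.
        if best_cost is None or c < best_cost:
            best, best_cost = candidate, c
    return best


def toy_cost(splits: Splits) -> int:
    """Hypothetical cost: prefer exactly three line breaks."""
    return abs(sum(splits) - 3)


# 20 split points give 2 ** 20 combinations, far more than the cap.
first = best_within_budget(product((False, True), repeat=20), toy_cost)
second = best_within_budget(product((False, True), repeat=20), toy_cost)
```

Both runs look at the same 5,000 candidates in the same order, so `first` and `second` are identical even though most of the search space is never visited. The result may be suboptimal, but it is repeatable.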

~~~
pmoriarty
By "user-defined", I meant to refer to the user of the algorithm (AI or not),
who is the developer.

You're right that if the algorithm described in the article will always get
the same result given the same input it will be deterministic in a way that
the kinds of AI methods I'm thinking of (the stochastic kind) will not. That
sort of non-determinism will be a problem if you expect your code to be
formatted the same way every time it goes through the formatter.

------
igravious
I remember this coming up before :)
[https://news.ycombinator.com/item?id=10195091](https://news.ycombinator.com/item?id=10195091)

ninja edit: I mostly just remembered the picture of Robert with his/a dog and
the text, “Hi! I'm Bob Nystrom, the one on the left.”

------
Bromskloss
> That means adding line breaks (or “splits” as the formatter calls them), and
> determining the best place to add those is famously hard.

Naive question here: What is so hard? It can be solved with dynamic
programming, right? Doesn't he even link to solutions of the problem?

~~~
perfmode
> For most of the time, the formatter did use dynamic programming and
> memoization. I felt like a wizard when I first figured out how to do it. It
> worked fairly well, but was a nightmare to debug.
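
For context, the dynamic-programming solution for plain prose looks roughly like the Python sketch below: memoize the best cost of laying out each suffix of the word list. This is the textbook minimum-raggedness word wrap, not the formatter's algorithm; the function name `wrap` and the squared-slack cost are assumptions for illustration. Code formatting is harder because indentation and split choices interact, which this sketch does not model.

```python
from functools import lru_cache
from typing import List, Tuple


def wrap(words: List[str], width: int) -> List[str]:
    """Wrap words to `width`, minimizing the sum of squared trailing slack."""
    n = len(words)

    @lru_cache(maxsize=None)
    def best(i: int) -> Tuple[int, int]:
        """Return (cost, end index of first line) for laying out words[i:]."""
        if i == n:
            return (0, n)
        best_cost, best_j = -1, i + 1
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width and j > i + 1:
                break  # line too long (a lone overlong word is allowed)
            slack = max(width - length, 0)
            line_cost = 0 if j == n else slack ** 2  # last line is free
            total = line_cost + best(j)[0]
            if best_cost < 0 or total < best_cost:
                best_cost, best_j = total, j
        return (best_cost, best_j)

    lines, i = [], 0
    while i < n:
        j = best(i)[1]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines
```

For the words `aaa bb cc ddddd` at width 6, a greedy wrapper would put `aaa bb` on the first line and leave `cc` alone on the second; the memoized search instead picks `aaa` / `bb cc` / `ddddd`, which has less total squared slack.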

------
ycmbntrthrwaway
[http://suckless.org/philosophy](http://suckless.org/philosophy)

~~~
carapace
Yes, but in this case it would have to be applied to the Dart language
_syntax_ rather than the formatter. At this point it's too late for the
formatter to be simple.

------
gnode
Please mark with (2015).

~~~
dang
Sure. Thanks.

