
Line breaking (2014) - kawera
http://xxyxyz.org/line-breaking/
======
seanwilson
Is there a reason browsers don't yet have decent line breaking algorithms that
make paragraph text look better than the current "left" and "justify" CSS
alignments? I miss the way LaTeX lays out text.

Also, maybe I missed it, but are these algorithms trying to find the best line
breaking solution? How much more efficient can you make this if you look for
"good enough"?

~~~
bitcoinboi9
It's not just line breaking that makes LaTeX beautiful. It's also the placement
of spaces.

Sometimes you can adjust the spacing between letters in a word, or between
words, to avoid breaking too many lines.

The algorithm is similar.
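
A minimal sketch of the word-spacing half of this, assuming monospaced text and a line whose words have already been chosen: the leftover width is spread over the gaps, with the leftmost gaps taking any remainder. TeX itself uses stretchable and shrinkable glue with badness scores, which this does not model, and it ignores letter spacing entirely; `justify_line` is a hypothetical helper.

```python
def justify_line(words: list[str], width: int) -> str:
    """Pad the gaps between words so the line is exactly `width` characters.

    Assumes monospaced text and words that fit on the line with single spaces.
    """
    if len(words) == 1:
        return words[0].ljust(width)  # nothing to stretch; pad on the right
    natural = sum(len(w) for w in words) + len(words) - 1
    if natural > width:
        raise ValueError("words do not fit on one line")
    gaps = len(words) - 1
    # Spread the leftover width evenly; the leftmost gaps absorb the remainder.
    base, remainder = divmod(width - natural, gaps)
    pieces = []
    for i, word in enumerate(words[:-1]):
        pieces.append(word)
        pieces.append(" " * (1 + base + (1 if i < remainder else 0)))
    pieces.append(words[-1])
    return "".join(pieces)


print(repr(justify_line(["the", "quick", "fox"], 16)))  # 'the   quick  fox'
```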

The algorithms do find the optimal breaks, but as you can see, there's no need
to look for good enough, there's already a nice n log n solution.

These algorithms work so well that you then run into a different problem: rivers.
[https://en.wikipedia.org/wiki/River_(typography)](https://en.wikipedia.org/wiki/River_\(typography\))

Efficient DP algorithms exist because the evaluation function you're trying to
minimize decomposes over the decisions (of where to put a line break). Rivers,
on the other hand, can only be evaluated after you've placed all your spaces
and breaks.
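
To make the decomposability point concrete, here is a sketch under simplifying assumptions (monospaced lines given as strings, squared leftover space as the per-line cost, last line free). `line_cost` looks at one line, so the total is a plain sum over break decisions. `river_score` is a crude, hypothetical river heuristic, not a standard measure: it counts spaces that line up, directly or diagonally, with a space on the next line, so it needs the final spacing of two neighbouring lines at once.

```python
def line_cost(line: str, width: int) -> int:
    # Squared leftover space: depends on this one line only.
    return (width - len(line)) ** 2


def total_cost(lines: list[str], width: int) -> int:
    # A plain sum of per-line terms (last line free), so it decomposes
    # over the break decisions and a DP can minimize it.
    return sum(line_cost(line, width) for line in lines[:-1])


def gap_columns(line: str) -> set[int]:
    # Column positions of the spaces in an already-spaced line.
    return {i for i, ch in enumerate(line) if ch == " "}


def river_score(lines: list[str]) -> int:
    # Hypothetical, crude river measure: count spaces that sit directly or
    # diagonally above a space on the next line. It needs the final spacing
    # of two neighbouring lines at once, so it is not a sum of per-line terms.
    score = 0
    for upper, lower in zip(lines, lines[1:]):
        below = gap_columns(lower)
        for col in gap_columns(upper):
            if below & {col - 1, col, col + 1}:
                score += 1
    return score


print(total_cost(["aa bb cc", "dd ee ff"], 8))  # 0: the first line is full
print(river_score(["aa bb cc", "dd ee ff"]))   # 2: both gaps line up
```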

Similar issues are encountered in machine learning for sequence tagging or
natural language parsing, where the loss function does not always decompose
over the sequence of decisions. The statistical models used there (which
replace Viterbi-like DP algorithms) try to compensate with reinforcement
learning or decomposition tricks. It works quite well.

~~~
amelius
> there's no need to look for good enough, there's already a nice n log n
> solution.

How efficient is that under local changes? E.g., when the user (or javascript)
changes one word in a long paragraph, does the algorithm require a re-analysis
of the whole paragraph?

~~~
yorwba
In the worst case, yes: the edit might change the number of words that fit on
that line, which would mean exchanging words with the line before or after it,
leading to a cascading effect.

However, you might find that many of the old linebreaks can stay as they are,
so I'd expect reflowing to be faster in practice.
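
One way to see both points is a forward DP where `best[j]` is the cheapest cost of setting the first `j` words. That entry only reads words before index `j`, so after replacing word `k` the entries `best[0..k]` can be kept and only the tail recomputed. This sketch assumes monospaced words that each fit on a line, squared leftover space per line, a free last line, and an in-place word replacement (not an insertion or deletion); all function names are hypothetical.

```python
import math


def line_cost(words: list[str], i: int, j: int, width: int) -> float:
    """Cost of setting words[i:j] on one line (monospaced, single spaces)."""
    length = sum(len(w) for w in words[i:j]) + (j - i - 1)
    if length > width:
        return math.inf  # overflow is not allowed
    if j == len(words):
        return 0.0  # the last line may be as short as it likes
    return float((width - length) ** 2)


def extend_table(words: list[str], width: int, best: list[float], prev: list[int]) -> None:
    """Append best[j] (cheapest cost for words[:j]) for every j not yet computed.

    best[j] only reads words before index j, so after an edit at word k the
    entries best[0..k] are still valid and only the tail needs recomputing.
    """
    for j in range(len(best), len(words) + 1):
        i = min(range(j), key=lambda s: best[s] + line_cost(words, s, j, width))
        best.append(best[i] + line_cost(words, i, j, width))
        prev.append(i)


def breaks_from(prev: list[int], n: int) -> list[int]:
    """Walk back from the end to recover the index of each line's first word."""
    starts = []
    j = n
    while j > 0:
        j = prev[j]
        starts.append(j)
    return starts[::-1]


def reflow_after_edit(words: list[str], width: int, best: list[float],
                      prev: list[int], k: int) -> list[int]:
    """After replacing word k in place, keep best[:k+1] and recompute the rest."""
    del best[k + 1:]
    del prev[k + 1:]
    extend_table(words, width, best, prev)
    return breaks_from(prev, len(words))


words = "aaa bb cc ddd e".split()
best, prev = [0.0], [0]  # best[0]: an empty prefix costs nothing
extend_table(words, 6, best, prev)
print(breaks_from(prev, len(words)))  # [0, 2, 4]: aaa bb / cc ddd / e
words[3] = "dddd"
print(reflow_after_edit(words, 6, best, prev, 3))  # [0, 1, 3]: aaa / bb cc / dddd e
```

In the example, lengthening word 3 also moves the break after the first line, which is the cascading effect described above, even though the first four table entries were reused unchanged.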

~~~
amelius
I'm thinking the algorithm will then not be able to provide a responsive UI in
all cases, so perhaps browsers should allow for a temporary suboptimal
rendering.
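
For such a stopgap rendering, one cheap candidate is greedy first-fit: a single O(n) pass that fills each line with as many words as fit and never revisits earlier lines. A sketch, assuming monospaced words separated by single spaces; `greedy_lines` is a hypothetical name.

```python
def greedy_lines(words: list[str], width: int) -> list[list[str]]:
    """First-fit: fill each line with as many words as fit. One O(n) pass."""
    lines: list[list[str]] = []
    current: list[str] = []
    length = 0
    for word in words:
        # A word costs its length, plus one space unless it starts the line.
        extra = len(word) + (1 if current else 0)
        if current and length + extra > width:
            lines.append(current)  # line is full: emit it and start fresh
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        lines.append(current)
    return lines


print(greedy_lines("aaa bb cc dddd e".split(), 6))
# [['aaa', 'bb'], ['cc'], ['dddd', 'e']]
```

With width 6 this leaves a 4-character hole on the middle line, whereas a slack-minimizing layout such as `aaa / bb cc / dddd e` (holes of 3 and 1) is what an optimal pass could later swap in.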

------
anameaname
Isn't line breaking an NP-complete problem, akin to bin packing? I seem to
recall hearing a presentation about it in the context of autoformatters for
code. For example, auto-formatting Java code, which is typically verbose,
relies on heuristics to avoid computational snares.

~~~
thechao
It depends on what bells & whistles you want, but line breaking is
“theoretically” quadratic and, for most use cases, basically linear. A real
misfortune is that the Knuth-Plass algorithm was published before we had a
good grasp of how to present algorithms well. The actual algorithm is
basically just a linear (reverse) pass along with a memo table. (This is
usually described as “dynamic programming”, but I find that term less than
useless.)

[https://github.com/jaroslov/knuth-plass-thoughts](https://github.com/jaroslov/knuth-plass-thoughts)
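
A sketch of a reverse pass with a memo table in that spirit: `best[i]` holds the cheapest way to set the words from `i` to the end, filled right to left. The inner loop stops as soon as a line overflows, so for a fixed width each word is only compared with the few words that can share its line, which is one reading of “basically linear”. This is a simplified minimum-raggedness variant (monospaced text, squared slack, free last line, no hyphenation, glue, or penalties), not Knuth-Plass itself; `min_raggedness_lines` is a hypothetical name.

```python
import math


def min_raggedness_lines(words: list[str], width: int) -> list[list[str]]:
    """Minimize total squared slack with a right-to-left pass and a memo table.

    best[i] is the cheapest way to set words[i:]; nxt[i] is where the line
    starting at word i ends. Assumes monospaced text, single spaces, a free
    last line, and that every word fits on a line by itself.
    """
    n = len(words)
    best = [math.inf] * n + [0.0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # offsets the space counted before the first word
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width:
                break  # adding more words only overflows further
            slack = 0 if j == n else (width - length) ** 2
            if slack + best[j] < best[i]:
                best[i] = slack + best[j]
                nxt[i] = j
    # Running the pass in reverse lets us read the layout off front to back.
    lines = []
    i = 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


print(min_raggedness_lines("aaa bb cc dddd e".split(), 6))
# [['aaa'], ['bb', 'cc'], ['dddd', 'e']]
```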

------
bogomipz
Could someone explain what this style of graph is and how I should interpret
it in relation to the running time? Why does the orientation change?

They are very appealing visually.

~~~
khedoros1
The independent variables are the width of the line in characters (shown on
the horizontal axis), and the length of the text in words (the "depth" axis).
The dependent variable is the time that the algorithm takes to run, and it's
shown on the vertical axis.

The orientation of the graphs doesn't change, but the scale does pretty
massively, and seems to indicate the rough efficiency of the algorithm (the
time is always in the range of 0.0-1.0 seconds, so it's clear that brute force
is _many_ times slower than the more-advanced algorithms).

~~~
bogomipz
Thank you, this all makes perfect sense now. Cheers.

------
rusbus
(2014) should be added to the title

~~~
dang
Added. Thanks!

