
Description and implementation of the core Knuth-Plass line-wrapping algorithm - Tomte
https://github.com/jaroslov/knuth-plass-thoughts/blob/master/plass.md
======
jws
There is an interesting blind spot in the article. The author shows two
paragraphs, one greedy and one Knuth-Plass, and declares the second to be
"more rectangular", but it only looks so because of the colons added to the
second to show the ideal length. The differences in the line lengths are
actually the same values in a different order. Measuring from the longest line,
the greedy is [0,2,3,1,many] where the Knuth-Plass is [2,0,3,1,many].

The author's assertion is true; the example data just doesn't show it, although
it looks like it does because of the extra marking.
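For anyone who wants to check the numbers, here is a minimal sketch. The line strings are the first four lines of each paragraph with the colon markers stripped, and the helper names `slackFromLongest` and `sumSquares` are made up for this illustration.

```javascript
// Slack of each line, measured from the longest line in the paragraph.
// `slackFromLongest` is a name invented for this sketch.
const slackFromLongest = (lines) => {
  const longest = Math.max(...lines.map((l) => l.length));
  return lines.map((l) => longest - l.length);
};

const greedy = [
  "xxxx x xxxx xxxxx xx xx xxxxx x",
  "xx xxxxxxx xxxxx xxx xxxx xxx",
  "xxxxx x x xxx xxxx xxx xx xx",
  "xxxxxxx x xxxx xx xxxx xxx xxx",
];
const knuthPlass = [
  "xxxx x xxxx xxxxx xx xx xxxxx",
  "x xx xxxxxxx xxxxx xxx xxxx xxx",
  "xxxxx x x xxx xxxx xxx xx xx",
  "xxxxxxx x xxxx xx xxxx xxx xxx",
];

// Sum of squared slack, one simple way to score "raggedness".
const sumSquares = (xs) => xs.reduce((acc, x) => acc + x * x, 0);

console.log(slackFromLongest(greedy));     // slack values 0, 2, 3, 1
console.log(slackFromLongest(knuthPlass)); // slack values 2, 0, 3, 1
console.log(sumSquares(slackFromLongest(greedy)) ===
            sumSquares(slackFromLongest(knuthPlass))); // true
```

Any score that just sums a per-line function of the slack gives these two paragraphs the same value, since the slack values are a permutation of each other.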

~~~
svat
Good catch. In fact, it's possible that TeX (and humans) judge the first
paragraph as better. Here they are, with colons after each:

    
    
        xxxx x xxxx xxxxx xx xx xxxxx x:
        xx xxxxxxx xxxxx xxx xxxx xxx  :
        xxxxx x x xxx xxxx xxx xx xx   :
        xxxxxxx x xxxx xx xxxx xxx xxx :
        xx
        
        xxxx x xxxx xxxxx xx xx xxxxx  :
        x xx xxxxxxx xxxxx xxx xxxx xxx:
        xxxxx x x xxx xxxx xxx xx xx   :
        xxxxxxx x xxxx xx xxxx xxx xxx :
        xx
    

Note that in reality the paragraphs will not be typeset as above, but the
spaces will be stretched or shrunk so that the lines have identical width.

The reason the first could be judged as better is that TeX has penalties
(\adjdemerits) when consecutive lines belong to "incompatible fitness classes"
(e.g. very loose lines following very tight lines), so the first paragraph may
count as (tight, loose, loose, loose), and the latter as (loose, tight, loose,
loose), which is worse.
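A rough sketch of that rule, under stated assumptions: the class thresholds on the adjustment ratio r (negative when spaces shrink, positive when they stretch) are as I recall them from the Knuth-Plass paper, whereas TeX itself classifies lines by badness. The value 10000 is, as far as I remember, the plain TeX default for \adjdemerits. The ratios for the two paragraphs are invented purely to match the (tight, loose, ...) labels above; they are not measured from the example.

```javascript
// Fitness class from the adjustment ratio r of a line.
// Thresholds follow the Knuth-Plass paper as recalled; treat as an assumption.
const fitnessClass = (r) => {
  if (r < -0.5) return 0; // tight
  if (r <= 0.5) return 1; // decent
  if (r <= 1) return 2;   // loose
  return 3;               // very loose
};

// Extra demerits for each pair of consecutive lines whose classes
// differ by more than one. The constant is assumed, not authoritative.
const ADJ_DEMERITS = 10000;
const adjacencyDemerits = (ratios) => {
  const classes = ratios.map(fitnessClass);
  let total = 0;
  for (let i = 1; i < classes.length; i++) {
    if (Math.abs(classes[i] - classes[i - 1]) > 1) total += ADJ_DEMERITS;
  }
  return total;
};

// Hypothetical ratios: (tight, loose, loose, loose) vs (loose, tight, loose, loose).
const firstParagraph = [-0.6, 0.8, 0.9, 0.7];
const secondParagraph = [0.8, -0.6, 0.9, 0.7];

console.log(adjacencyDemerits(firstParagraph));  // 10000: one incompatible pair
console.log(adjacencyDemerits(secondParagraph)); // 20000: two incompatible pairs
```

With these made-up ratios the tight line in the second paragraph sits between two loose lines, so it is charged twice, while in the first paragraph it has only one neighbor.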

------
raphlinus
Another implementation people might find interesting is the one in Minikin,
the text layout library inside Android. It's all free software. This one has
been optimized and tuned fairly carefully.

[https://android.googlesource.com/platform/frameworks/minikin...](https://android.googlesource.com/platform/frameworks/minikin/+/master/libs/minikin/OptimalLineBreaker.cpp)

------
thechao
I’m willing to take pull requests if people find issues with the text—I’m not
a strong writer.

~~~
watersb
Thanks for the write-up. I think too much about text layout. I know that web
browser layout is a crazy mass of constraints, still trying for better
implementations.

------
svat
The classic paper in which Knuth and Plass described the algorithm is actually
very readable and enjoyable [1] — though it's long (66 pages!), you can just
stop at any point and still get a lot of value out of it.

PS: I started collecting a few of the more obscure implementations of TeX here
([https://github.com/tex-other](https://github.com/tex-other)) — please let me
know if there are more that I missed (other than the popular ones:
pdfTeX/XeTeX/LuaTeX which are in TeX Live / MiKTeX and don't require
archival). Among these, there are JavaScript and C implementations of this
algorithm.

[1]: [http://eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf](http://eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf)

------
pgtan
I wish there were also other attempts to create paragraph-breaking
algorithms, for example one that limits the total amount of white space on a
line. If a line has many short words on it and all the glue is stretched to its
maximum, the line still looks somewhat holey, even though the allowed value is
not exceeded.

~~~
svat
Avoiding “holey” lines is in fact what the Knuth-Plass algorithm tries to
do, and in general it results in “tighter” (less whitespace) paragraphs than
alternative algorithms (those that look at only one line at a time). If you'd
like even less space, you can just set a smaller “ideal” width of a space — if
you have an example of a paragraph (and font, and line width) for which you
think the spacing is not optimal, I can show you what I mean.

~~~
tropo
An example of a paragraph (and font, and line width) for which the spacing is
not optimal is trivial: anything done with the Knuth-Plass line-wrapping
algorithm.

The trouble here isn't just the design of the algorithm. The specification
itself is bad. The moment you start to mess with the kerning (spacing, both
between letters and otherwise) specified by the font designer, you've gone
wrong. The font ships with correct kerning. That kerning is what the font
designer determined would look best and be most readable. Regular letter
spacing also helps with readability. Hyphens are also trouble for readability.

~~~
svat
Perhaps you, and the person I was replying to, are talking about different
things. The Knuth–Plass algorithm does _not_ mess with the kerning specified
by the font designer. In fact, despite some demand, Knuth never added a
feature of letter-spacing into TeX, because all the typographers he spoke to
agreed it was a bad idea. (It is now possible to do that, with hacks and with
other engines like pdfTeX/XeTeX/LuaTeX, but in any case that's not relevant to
the Knuth-Plass algorithm being discussed.)

What the Knuth-Plass algorithm is concerned with is merely where to choose
line breaks. For justified paragraphs (as in any competently typeset book you
can pick up), the space between words necessarily varies from line to line
(barring some staggering coincidence). What the algorithm tries to do is break
lines such that this inter-word space is as close as possible to the ideal
specified in the font (and to minimize the amount of hyphenation required),
and at this it does a better job than other algorithms (those that look at
only one line at a time).
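To make the "one line at a time" contrast concrete, here is a simplified stand-in, not the Knuth-Plass algorithm itself: it assumes monospaced text, no hyphenation, every word fitting within the width, and a cost of squared leftover space per line (last line free). The real algorithm works with boxes, glue and penalties. All names here are illustrative.

```javascript
// Greedy: fill each line with as many words as fit, never looking ahead.
const greedyBreak = (words, width) => {
  const lines = [];
  let line = [];
  let len = 0;
  for (const w of words) {
    const extra = line.length === 0 ? w.length : w.length + 1;
    if (line.length > 0 && len + extra > width) {
      lines.push(line.join(" "));
      line = [w];
      len = w.length;
    } else {
      line.push(w);
      len += extra;
    }
  }
  if (line.length > 0) lines.push(line.join(" "));
  return lines;
};

// Cost: sum of squared leftover space over all lines except the last.
const cost = (lines, width) =>
  lines.slice(0, -1).reduce((acc, l) => acc + (width - l.length) ** 2, 0);

// Whole-paragraph optimum by dynamic programming.
// best[i] is the minimum cost of setting words[i..]; next[i] is where
// the line starting at word i ends.
const optimalBreak = (words, width) => {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity);
  const next = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let len = -1;
    for (let j = i; j < n; j++) {
      len += words[j].length + 1; // length of words i..j on one line
      if (len > width) break;
      const lineCost = j === n - 1 ? 0 : (width - len) ** 2;
      if (lineCost + best[j + 1] < best[i]) {
        best[i] = lineCost + best[j + 1];
        next[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return lines;
};

const words = "aaa bb cc ddddd".split(" ");
console.log(greedyBreak(words, 6), cost(greedyBreak(words, 6), 6));   // cost 16
console.log(optimalBreak(words, 6), cost(optimalBreak(words, 6), 6)); // cost 10
```

On this toy input the greedy breaker fills the first line completely and is then forced to leave "cc" alone on a line, while the whole-paragraph version accepts a slightly short first line to avoid that.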

See Figure 4 in the paper: [http://www.eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf#pag...](http://www.eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf#page=12)
(page 1130, the 12th page in the PDF).

~~~
burfog
Justified paragraphs are never justified. Proper typesetting has zero
hyphenation and always obeys the inter-word space specified by the font
designer.

~~~
svat
That's an interesting opinion, stated like a fact. :-) I have three responses
to that:

1\. The opinion is certainly a radical one, i.e. at odds with almost every
typographer. Walk into a library and pick up a random book or magazine (by a
good publisher), and you'll almost surely find justified paragraphs. It's only
some self-published books or typewriter / computer printouts that tend not to
justify. It is an interesting exercise to try to find a physical book (that
you have access to) that contains non-justified paragraphs — I looked at 65
books at home (that have paragraphs) and _all_ of them (across multiple
languages) use justification. (In the poetry books I had to look at the
preface...)

I checked previews online of some books by typographers and font designers,
and (of course) they also contain justified paragraphs: Robert Bringhurst's
_The Elements of Typographic Style,_ etc. The only exception I am aware of is
Eric Gill's _An Essay on Typography,_ which is not justified and contains a
section called “The Procrustean Bed”.

2\. The problem being solved, for which the algorithms are being compared, is
how to generate justified paragraphs. The top-level comment (by pgtan@) that
started this thread, was also about justified paragraphs (that's what I was
pointing out in the comment you replied to). As an analogy, when discussing
algorithms for the traveling salesman problem, one can reject the problem by
declaring that sales must be done online, which... seems like somewhat of a
change of topic.

3\. Even if what you want is non-justified (ragged-right) paragraphs, you
still need to choose line breaks. The choice is not always unique. The
Knuth–Plass algorithm covers this case too (with the right choice of penalties
and glue), and has something to offer here too. It's in the paper :-)
[http://eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf](http://eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf) (Search
for "ragged")

------
chillacy
This is given as an interview question sometimes, which amazes me because
someone spent a lot of effort to invent this within the past 50 years, and now
you have to come up with it on the spot in 45 minutes (or more likely, leet
code it ahead of time).

~~~
kevmo314
Well, it's possible that it really only took them 45 minutes to come up with
the solution, and identifying the problem that needed to be solved is what took
longer.

------
tomxor
Wow, I never knew so much thought had been put into line breaking algorithms
before!

I wanted to get a simple and fast solution into a one-line regex a while back,
with the simple premise of breaking on spaces before n chars. I was quite
happy with it [1]:

    
    
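        // Break at the last whitespace within each run of up to 32 characters;
        // the lookahead leaves a final stretch of 32 or fewer characters alone.
        const wrap = (s) => s.replace(
            /(?![^\n]{1,32}$)([^\n]{1,32})\s/g, '$1\n'
        );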
    

...this looks like a child's drawing compared to the article, but then I'm
heavily biased towards simplicity over sophistication.
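A small usage sketch, with the one-liner repeated so the snippet runs on its own. The sample sentence is made up for illustration. Since the regex only ever breaks at whitespace, a single word longer than 32 characters stays unbroken and its line can run over the limit.

```javascript
// Same one-liner as above, repeated so this snippet is self-contained.
const wrap = (s) => s.replace(
    /(?![^\n]{1,32}$)([^\n]{1,32})\s/g, '$1\n'
);

// Sample input invented for this example.
const text =
  "The quick brown fox jumps over the lazy dog and keeps running through the forest";
const wrapped = wrap(text);

console.log(wrapped);
// The quick brown fox jumps over
// the lazy dog and keeps running
// through the forest

// For this input every resulting line fits in 32 columns.
console.log(wrapped.split("\n").every((line) => line.length <= 32)); // true
```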

[1] [https://stackoverflow.com/questions/14484787/wrap-text-in-ja...](https://stackoverflow.com/questions/14484787/wrap-text-in-javascript/51506718#51506718)

