Knuth and Plass line breaking algorithm in JavaScript (bramstein.com)
138 points by apl on Dec 6, 2010 | 42 comments

The web is now twenty and browsers are still incapable of something as basic and commonplace as hyphenation and justification. It’s a real shame that this problem has to be solved with JavaScript in 2010.

How old is TeX again?

I've been asking for years why browsers do not have this. The only reply I've gotten cites performance considerations, which is a bad answer for several reasons.

Internet Explorer actually has this through the (almost standardized) text-justify CSS property. It still doesn't do hyphenation, but Hyphenator.js (http://code.google.com/p/hyphenator/) fills that gap pretty nicely.

Performance isn't a good argument in my opinion. The algorithm isn't that expensive. The most expensive part right now is retrieving all the text metrics, but you would get that a lot cheaper in the browser's rendering engine.

I briefly looked at hacking it into WebKit, but then gave up due to a lack of time.

I also looked into it recently, and Damon from the Gnome project attempted it in 2002 or so. Also, Adobe and Google are trying to get it into WebKit. The problem (as far as I can tell) is that people have tried to do it all at once, and pushing such a large chunk of code upstream is very hard.

I suspect that if you take it in pieces: first get a decent hyphenation algo into Pango, then get that into FF and WebKit, then work on the line-breaker, and then get a new CSS rule approved by the W3C... well, maybe you could get it done in 3 or 4 years.

W3C [...] 3 or 4 years; don't hold your breath.

Last I looked the CSS3 working draft included a hyphenate property.

Is Internet Explorer’s justification done for whole paragraphs using a reasonable algorithm, or just line by line?

It is the whole paragraph as far as I can tell (I haven't seen the internals, just the output.)

One other thing. This algorithm, unless I'm missing something, doesn't handle situations in which the available widths are different for different lines in the paragraph, and in particular those in which the available width for a line depends on the precise break positions and vertical alignment results of all the earlier lines in the paragraph. Handling this is required to correctly handle CSS floats. Greedy line-breaking does this by the simple expedient of fully laying out all previous lines in the paragraph before considering the next line.
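To make the greedy behaviour concrete, here is a minimal sketch of first-fit line breaking where the available width is looked up per line. The names `greedyBreak`, `measure`, and `widthForLine` are made up for illustration, and a real engine would derive the width from float geometry and line heights rather than a simple callback.

```javascript
// Greedy (first-fit) line breaking: each line is finalized before the next
// one starts, so the width of line k may depend on everything laid out so far.
// widthForLine(k) is a hypothetical callback standing in for float handling.
function greedyBreak(words, measure, spaceWidth, widthForLine) {
  const lines = [];
  let current = [];
  let used = 0;
  for (const word of words) {
    const w = measure(word);
    const available = widthForLine(lines.length);
    const needed = current.length === 0 ? w : used + spaceWidth + w;
    if (current.length > 0 && needed > available) {
      // The word does not fit: close the current line and start a new one.
      lines.push(current.join(" "));
      current = [word];
      used = w;
    } else {
      current.push(word);
      used = needed;
    }
  }
  if (current.length > 0) lines.push(current.join(" "));
  return lines;
}

// Toy metrics: one unit per character. The first line is narrowed,
// as if a float were sitting next to it.
const floated = greedyBreak(
  "aaa bb cc ddddd".split(" "),
  (w) => w.length,
  1,
  (line) => (line === 0 ? 3 : 6)
);
```

A paragraph-wide optimizer cannot do this as easily, because the width of a later line is not known until the breaks (and therefore the heights) of the earlier lines are fixed.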

> The algorithm isn't that expensive.

It's quadratic in length of the paragraph, no? Not a problem for most reasonable text chunks, but browsers have to deal with unreasonable text too. In particular, O(N^2) algorithms in browser layout are generally unacceptable...

But it would only apply when the web page author "opted in" with the appropriate CSS, no? Doesn't seem like it should affect performance on pages that don't use the feature.

If you made it an opt-in, that might be doable... though there would still be the danger of pages cargo-culting into the opt-in.

But at that point you're also asking browsers to maintain two separate line-wrapping codepaths, of which one is not used anywhere to a first approximation. Browser vendors seem to be somewhat resistant to doing that sort of thing.

It could maybe be switched off (even if requested) once a paragraph hits a certain threshold size. Of course I suppose that could get complicated as the paragraph gets mutated by JavaScript... you don't want to be turning it on and off all the time.

It sounds like a difficult thing to do correctly for any given language. I'd probably procrastinate it too.

Another reason I've heard before was internationalization. Not sure how many languages have draconian hyphenation rules like English though.

English hyphenation rules are very simple and relaxed by continental standards. It is about the only language where such a simple and straightforward algorithm as the one in TeX (it doesn't even contain a full list of word stems and a rule engine for the (de)construction of composite words) can work.
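For anyone who hasn't seen it, the TeX approach is pattern-based (Liang's algorithm): digits embedded in short letter patterns vote on each gap between letters, the highest digit wins, and odd values allow a break. Below is a rough sketch, not TeX's actual implementation. The function names are made up, the nine patterns are just the handful usually quoted for the word "hyphenation", and real pattern files contain thousands of entries stored in a trie.

```javascript
// Split a pattern such as "hy3ph" into its letters and the digit that
// sits before each letter (plus one trailing slot after the last letter).
function parsePattern(pattern) {
  let letters = "";
  const digits = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      digits[digits.length - 1] = Number(ch);
    } else {
      letters += ch;
      digits.push(0);
    }
  }
  return { letters, digits };
}

// leftMin/rightMin mirror the idea of minimum fragment lengths;
// the default values here are an assumption for the sketch.
function hyphenate(word, patterns, leftMin = 2, rightMin = 3) {
  const padded = "." + word.toLowerCase() + ".";
  // points[p] is the score of the gap just before padded[p].
  const points = new Array(padded.length + 1).fill(0);
  for (const { letters, digits } of patterns.map(parsePattern)) {
    for (let start = 0; start + letters.length <= padded.length; start++) {
      if (!padded.startsWith(letters, start)) continue;
      digits.forEach((d, j) => {
        points[start + j] = Math.max(points[start + j], d);
      });
    }
  }
  const pieces = [];
  let last = 0;
  for (let i = leftMin; i <= word.length - rightMin; i++) {
    // An odd score means a break is allowed before word[i].
    if (points[i + 1] % 2 === 1) {
      pieces.push(word.slice(last, i));
      last = i;
    }
  }
  pieces.push(word.slice(last));
  return pieces;
}

const SAMPLE_PATTERNS = [
  "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n",
];
const pieces = hyphenate("hyphenation", SAMPLE_PATTERNS);
console.log(pieces.join("-")); // hy-phen-ation
```

Supporting another language in this scheme means swapping the pattern list, which is also why compound-heavy languages are harder to cover this way.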

>> English hyphenation rules are very simple and relaxed by continental standards.

Really? From my experience, ESL students often don't understand the logic behind English syllables.

My point though, is that if different languages hyphenate differently and we're talking about sites with international user generated content (Facebook or Orkut come to mind), then it's not exactly trivial to hyphenate correctly.

I was told by my English teacher to just not hyphenate because the rules are crazy and inconsistent. Might be a British English thing, though.

It's not only browsers. Look at Word, for god's sake.

Even hyphenation like in Word would be an improvement over the status quo.

I must say this is a very pretty render for a web-based text. Too bad the justification breaks when zooming in/out.

Also, the computation takes a while. Is this a dynamic programming algorithm? Maybe browsers should support this natively.

Zooming is problematic; I haven't quite figured out a reliable way of fixing it yet. The problem is most apparent in WebKit-based browsers. Firefox seems to handle it much better (there are still some small problems, but most of them can be fixed).

Yes, this is an application of dynamic programming. The computation is actually quite fast, most of the time is spent in retrieving the text metrics (put each word in a span, retrieve width and move on to the next word.) If that can somehow be alleviated it would become a feasible solution.

I agree with you that browsers should support this natively (Internet Explorer actually does.) If you are interested, I've written a bit on this subject in this Typophile thread: http://typophile.com/node/71247
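One way the per-word measurement cost might be reduced is to measure each distinct string only once. This is a sketch under assumptions, not what the demo does: `makeCachedMeasure` and `fakeMeasure` are made-up names, and the stand-in measurer simply charges 7 units per character.

```javascript
// Wrap any width-measuring function so each distinct string is measured once.
// The cache is only valid for a single font and size (an assumption here).
function makeCachedMeasure(rawMeasure) {
  const cache = new Map();
  return function measure(text) {
    let width = cache.get(text);
    if (width === undefined) {
      width = rawMeasure(text);
      cache.set(text, width);
    }
    return width;
  };
}

// Stand-in measurer for illustration; it counts how often it is called.
let calls = 0;
const fakeMeasure = (text) => {
  calls += 1;
  return text.length * 7;
};

const measure = makeCachedMeasure(fakeMeasure);
const widths = "the cat saw the other cat".split(" ").map(measure);
console.log(calls); // 4 (six words, four distinct)
```

In a browser, the raw measurer could be the span-per-word approach described above, or possibly the canvas 2D context's `measureText(text).width` with the font set to match, though canvas metrics may not agree exactly with DOM layout.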

Oh, I was wondering if it was just me. A few lines seem to have hanging right margins. This is probably because my morning lean-back browsing is done at +2 zoom-factor in Safari.

Looks great, but it would be nice if the hyphens were removed when copying text.

I recall that the first interesting application of dynamic programming my professor showed us was this one (well, described, not taught). He basically just said that all you do is take the L^2 or L^3 sum of the extra space (the squared or cubed slack on each line, summed over the paragraph) and minimize it using a DP. The fact that that yields great-looking text I thought was pretty cool.
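A minimal sketch of that classroom version, minimizing the sum of squared leftover space with the last line free. Note this is a simplification and not the full Knuth-Plass model, which works with boxes, glue, and penalties and scores stretch and shrink. The name `breakParagraph` and the one-unit-per-character metrics are assumptions for the example.

```javascript
// best[i] = minimal total cost of laying out words i..n-1;
// next[i] = index of the first word on the line after the one starting at i.
function breakParagraph(words, measure, spaceWidth, lineWidth) {
  const n = words.length;
  const best = new Array(n + 1).fill(Infinity);
  const next = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let used = 0;
    for (let j = i; j < n; j++) {
      used += measure(words[j]) + (j > i ? spaceWidth : 0);
      // Stop once the line overflows, but let a single overlong word stand.
      if (used > lineWidth && j > i) break;
      const slack = Math.max(lineWidth - used, 0);
      // The last line is not penalized for being short.
      const cost = j === n - 1 ? 0 : slack * slack;
      if (cost + best[j + 1] < best[i]) {
        best[i] = cost + best[j + 1];
        next[i] = j + 1;
      }
    }
  }
  const lines = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return { lines, cost: best[0] };
}

const result = breakParagraph(
  "aaa bb cc ddddd".split(" "),
  (w) => w.length, // toy metrics: one unit per character
  1,
  6
);
console.log(result.lines); // [ 'aaa', 'bb cc', 'ddddd' ]
console.log(result.cost);  // 10
```

Greedy breaking would give "aaa bb" / "cc" / "ddddd" here, with a cost of 16 from the nearly empty middle line; the DP accepts a slightly loose first line to avoid that.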

I see no hyphens... breaks (e.g. lines 3-4), but no markers for them.

edit: wait, they're there when I zoom out a couple levels. Weird.

That's probably a bug. What browser and operating system are you using?

Same here, Chrome 8.0.552.215, Mac OS X 10.6.5.

I have the same Chrome version, but I'm running Ubuntu. Perhaps it is a font issue. I will check it out later when I have access to a Mac. Thanks for reporting.

I set font family to Times, and, indeed, I can see hyphens.

Similar to dchest, though Chrome 9.0.something. Up-to-date dev release.

Ditto, same problem.

This is beautiful. Fantastic work. Now, I'd love to see someone work this into WebKit or Gecko's core.

Great work! The output is beautiful. Looking forward to this being a standard library in the future. Do you know how well it might handle cases of text that already has some formatting?

Also, it blows up in IE9 for me: many of the lines go on for quite a ways.

For those who don't know the algorithm, here's a bunch of links about it in Wikipedia:


I would also highly recommend "Digital Typography" by Knuth. It contains a more detailed description of the algorithm as well as many interesting historical and technical chapters on TeX, MetaFont and computer typesetting in general.

Looks very nice. The only annoying thing I can see is that copying the text ends up with word-breaks in the hyphenated words.

It looks really good in Firefox! However, I just tried printing the page and though it still looks decent, the justification is a lot worse than when rendered on-screen.

I honestly hadn't thought about printing it out yet. I had a quick look, and I think the issues you are seeing can probably be fixed. My initial plan was to render PDFs server-side, instead of relying on the browser's print mode.

Does anyone know if there is a good jQuery plugin for this, or do I need to take the code from this demo and make my own plugin?

As far as I know, this is the only implementation in JavaScript. It might be possible to turn this into a jQuery plugin at some point, but it would probably take quite a few bug fixes and changes to turn this from a tech demo into a drop-in plugin.

At the default zoom on my G1, the results are poor. Pull back a bit and everything looks swell.
