
Hyphenation in CSS - kawera
http://clagnut.com/blog/2395
======
jacobolus
So when are browsers going to abandon their greedy line breaking algorithm and
implement Knuth/Plass or similar?

In my opinion that is the single biggest problem with web typography, and has
been for years. Significantly more important than auto hyphenation.

This has been a solved problem for 40 years (or at least 10–15, if we need to
worry about fast performance for interactive use on a desktop computer).
Should be table stakes for any software rendering long blocks of text.

~~~
pcwalton
This comes up on every thread related to web typography, and the answer as
always is that it's not possible in the general case, at least not with the
specs as they are today. The biggest problem is that the CSS specification
demands that a float be placed as high as possible (CSS 2.1 section 9.5.1,
rule 8 [1]). But float ceilings can be anywhere in the middle of a paragraph.
By its nature, Knuth line breaking means that any particular unit (word), and
therefore any float ceiling, might not be _as high as possible_. In fact, the
only algorithm that can be used in this case to satisfy the spec is the greedy
one. Therefore, Knuth line breaking cannot be used on the Web.

It might be possible to use Knuth line breaking in specific circumstances,
such as when paragraphs have no floats. But not in general.

[1]: [https://www.w3.org/TR/CSS2/visuren.html#float-position](https://www.w3.org/TR/CSS2/visuren.html#float-position)
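
For anyone who wants to see the difference concretely, here is a rough sketch of greedy (first-fit) breaking next to a simple minimum-raggedness dynamic program. It is not Knuth/Plass itself: there is no stretchable glue, no hyphenation penalty, and widths are just character counts. It also only covers the no-floats case mentioned above. The function names are made up for illustration.

```python
from typing import List


def greedy_breaks(words: List[str], width: int) -> List[List[str]]:
    """First-fit: keep adding words to the current line while they fit."""
    lines: List[List[str]] = []
    current: List[str] = []
    used = 0
    for w in words:
        need = len(w) if not current else used + 1 + len(w)
        if current and need > width:
            lines.append(current)
            current, used = [w], len(w)
        else:
            current.append(w)
            used = need
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words: List[str], width: int) -> List[List[str]]:
    """Minimise the sum of squared trailing space on every line but the last."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest way to set words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: where the line starting at i ends
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width and j > i + 1:
                break  # line overflows; a lone overlong word is still allowed
            slack = max(width - length, 0)
            cost = 0 if j == n else slack * slack
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


words = "aaa bb cc ddddd".split()
print(greedy_breaks(words, 6))   # [['aaa', 'bb'], ['cc'], ['ddddd']]
print(optimal_breaks(words, 6))  # [['aaa'], ['bb', 'cc'], ['ddddd']]
```

Note that the second version moves "bb" off the first line even though it fits there, which is exactly the kind of choice that conflicts with "as high as possible" once a float is anchored in the paragraph.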

~~~
jacobolus
There has to be _some_ kind of method that does a better job balancing line
widths than the naïve greedy version, and is compatible with whatever
requirements browsers are subject to.

Can someone hire some grad students to tackle this or something? We are
talking about a significant proportion of all reading people do every day, all
over the world.

A pocket computer that can do hundreds of billions of arithmetic operations
per second should be able to make text as pretty as an apprentice typesetter
from 1600.

~~~
pcwalton
As stated before, the _only_ algorithm that satisfies the constraint that
floats must be placed as high as possible is the greedy one (or, to be more
precise, any such algorithm must match the output of the greedy one).

~~~
sjwright
Why not just make this algorithm a non-default option (like hyphenation) and
accept that when it's enabled floats may not be optimal?

~~~
pcwalton
Sure, an opt-in may be possible. But it requires some spec work and isn't
something browsers can just do right now. As gsnedders points out, we'd
probably have to spec the entire algorithm to avoid causing compatibility
problems in the future.

~~~
galaxyLogic
As you suggested, if there are no floats there would be no reason NOT to use
the better algorithm. I personally don't use floats often, and definitely not
inside paragraphs.

As others have suggested, bad line breaking hurts the large population of
people who read on the web.

I also hope they can adjust the spaces on each line so that all lines look
about the same length.

------
packet_nerd
Somewhat related, but not the same thing; I'm curious when Firefox will
finally get proper line breaking support for Burmese and related languages?

Burmese doesn't have spaces between words, just between phrases and sentences.
Breaking lines between sentences results in really ugly jagged paragraphs.
(I've ended up inserting Unicode zero width spaces between each syllable to
get line breaking to work consistently in projects I've worked on.)
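
A minimal sketch of that workaround, assuming the text has already been split into syllables by some other means (the segmentation itself is the hard part and is not shown). The helper names are hypothetical.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but a permitted line-break point


def add_break_hints(syllables):
    """Join pre-segmented syllables with zero-width spaces."""
    return ZWSP.join(syllables)


def strip_break_hints(text):
    """Remove the hints again, e.g. before searching or comparing strings."""
    return text.replace(ZWSP, "")


# The Burmese word for "Myanmar", written with escapes; the two-syllable
# split used here is an assumption for illustration.
hinted = add_break_hints(["\u1019\u103c\u1014\u103a", "\u1019\u102c"])
```

One downside is that the stored text no longer matches what a user types into a search box, which is why the strip helper is included.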

Checking just now on Windows 10, Firefox and Edge aren't doing it correctly
but Chrome and IE are. Even Windows Notepad is getting it right.

~~~
roca
File a Mozilla Bugzilla bug? Firefox is meant to use Uniscribe line breaking
for complex scripts on Windows, which should handle this.

------
munificent
Is there any data on whether hyphenation actually improves reading speed? My
hunch is that the problem it mostly solves is packing more words onto fewer
printed pages, thus saving the printer money.

But my impression as a reader is that I stumble over hyphenated words much
more often than I am distracted by a particularly ragged right edge or a big
river in a justified paragraph.

~~~
ivanhoe
I have no data on it, but I presume it depends heavily on the language. In
languages like English, where words are not that long, it would be very
counter-intuitive for hyphenation to help with reading speed, as we don't read
individual letters, we read word by word. Splitting words into two parts can
only slow down the reader. In languages like German, with a lot of very long
words built by combining a few simpler ones, I presume hyphenation can be an
important tool to avoid big holes in the text that could make it hard for
readers' eyes to follow the lines.

~~~
chronogram
> very long words, hyphenation important

Oh don’t worry I prefer words like
“Kindercarnavalsoptochtvoorbereidingswerkzaamhedencomitéleden” to be one thing
so I can just gloss over the blob. Only have to read the blob once to
recognise the shape and know what it refers to. With hyphenation it’s a
different blob every time, so then I might spend time reading 60 letters.

Hyphenation is for newspapers, where space is money.

~~~
Tomte
If you want to skip over the blob, sure.

But if you want to read it, hyphenation makes it much easier. Because your
claim that there would be a "different blob every time" is not false, but
misleading. There are different blobs, but they are all ones that can
be recognized by shape.

Because such long words would certainly be split up between their constituent
words, not between random places.

This actually makes a hyphenated composite word easier to read than the very
long non-hyphenated form.

~~~
bhaak
> Because such long words would certainly be split up between their
> constituent words, not between random places.

At least for German, this is not a requirement. Hyphenation doesn't happen at
random places but at syllable boundaries.

The only care you need to take with hyphenation is that it shouldn't lead
to possible misreadings. "Ur-in-stinkt" is the classic example. Only split
that word at the first possible hyphen, never at the second.

~~~
Tomte
It is not a requirement, but you would generally do that. Hafen-meister
instead of Ha-fenmeister.

------
aleem
&shy; support has been around for at least a decade and it can be done server
side with any logic.

It's probably not needed today, but when I first used it the logic was along
the lines of adding it at the 5th, 7th, and 9th characters, provided there
were at least 4 more characters after each &shy; (or some such), while being
wary of hanging the last word, etc.

It's a soft hyphen meaning the browser will break if needed otherwise ignore
it. It's an HTML feature rather than a CSS one which also has implications.
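
A rough sketch of that kind of server-side rule, under my own reading of it: insert a soft hyphen after the 5th, 7th, and 9th characters of a word whenever at least 4 characters remain afterwards. The function names and the exact interpretation are assumptions, and the "hanging last word" check is left out. It emits the U+00AD character directly, which is what &shy; decodes to.

```python
import re

SHY = "\u00ad"  # SOFT HYPHEN, the character behind the &shy; entity


def soften_word(word, positions=(5, 7, 9), min_tail=4):
    """Insert soft hyphens at fixed offsets, keeping at least min_tail chars after."""
    pieces = []
    prev = 0
    for pos in positions:
        if len(word) - pos >= min_tail:
            pieces.append(word[prev:pos])
            prev = pos
    pieces.append(word[prev:])
    return SHY.join(pieces)


def soften_text(text):
    """Apply soften_word to every run of letters; everything else is untouched."""
    return re.sub(r"[^\W\d_]+", lambda m: soften_word(m.group()), text)


print(soften_word("interoperability").replace(SHY, "-"))  # inter-op-er-ability
print(soften_word("paragraph").replace(SHY, "-"))         # parag-raph
```

The second example shows the obvious weakness: fixed offsets know nothing about syllables, so a dictionary or pattern-based hyphenator will do better whenever one is available.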

~~~
cryo
Awesome, didn't know about that one.

MDN has a good page to see it in action, as well as comparison with <wbr>

[https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wb...](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/wbr)

------
atomwaffel
On a slightly related note, I find it incredible that the Kindle (and
presumably other e-readers) still fail to support typographic standards that
printed books have had for hundreds of years. It took them years to support
hyphenation at all, and now that they do, the number of times I’ve turned a
page only to find it completely blank but for the very last syllable of the
very last word of the chapter is infuriating. I know it’s a small thing, but
it instantly rips me out of the story and reminds me of the imperfections of
the thing. For a single-purpose device by one of the world’s largest tech
companies, that’s just not good enough.

~~~
a13n
What do you expect it to do differently in that case?

~~~
bhaak
This has been a problem for as long as printed books have existed.

There are lots of different options:
[https://en.wikipedia.org/wiki/Widows_and_orphans#Guidelines](https://en.wikipedia.org/wiki/Widows_and_orphans#Guidelines)

The Kindle also doesn't support proper kerning. Sorry if I told you that and
you can't unsee it now.

But you can install third-party reading software such as koreader on
jailbroken Kindles, which fixes some of the deficiencies.

~~~
UsernameProxy
I was just trying to find _koreader_ on the Play Store, but am not having much
luck. What's the actual name, please?

~~~
bhaak
I don't know if it's in the Play Store but you can download the APK directly
from GitHub:
[https://github.com/koreader/koreader/releases](https://github.com/koreader/koreader/releases)

Koreader was originally developed for e-ink devices, though. I'm not
sure how well it has been optimized for normal tablet screens.

~~~
UsernameProxy
Great - thanks for the link.

------
reaperducer
Now if only there was a way to specify that paragraphs of text shouldn't leave
an orphan word on the last line, I could use one CSS property to get the
consultants off my back instead of having to preg_replace the last space in
every <p> paragraph with an &nbsp;.
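
For reference, roughly what that workaround looks like, sketched in Python rather than PHP. This is a crude regex approach that assumes simple, non-nested <p> elements; the function name is made up, and a real HTML parser would be safer.

```python
import re

# Matches an opening <p ...> tag, its body, and the closing </p>.
_PARAGRAPH = re.compile(r"(<p\b[^>]*>)(.*?)(</p>)", re.DOTALL | re.IGNORECASE)


def bind_last_word(html):
    """Replace the last space in each <p> with &nbsp; so the final word cannot hang."""
    def fix(match):
        open_tag, body, close_tag = match.groups()
        head, sep, tail = body.rstrip().rpartition(" ")
        # No space at all, or the last space sits inside a tag such as
        # <a href="...">: leave the paragraph exactly as it was.
        if not sep or tail.count("<") != tail.count(">"):
            return match.group(0)
        return f"{open_tag}{head}&nbsp;{tail}{close_tag}"

    return _PARAGRAPH.sub(fix, html)


print(bind_last_word("<p>One two three</p>"))  # <p>One two&nbsp;three</p>
```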

~~~
TheRealPomax
You might at least be interested to know that you can also wrap those last (two
or more) words in <span class="nobreak">, with the nobreak class using
"white-space: nowrap;", so that your solution works across all languages, rather
than just those where the non-breaking space doesn't look horrendously out of
place.
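
A small sketch of the span variant, assuming the paragraph body is plain text with no inline markup and that a `.nobreak { white-space: nowrap; }` rule exists in the stylesheet. The function name and defaults are hypothetical, and runs of whitespace get collapsed to single spaces along the way.

```python
def wrap_last_words(text, count=2, css_class="nobreak"):
    """Wrap the final `count` words in a span so they always wrap together."""
    words = text.split()
    if len(words) <= count:
        return text  # too short to have anything worth protecting
    head = " ".join(words[:-count])
    tail = " ".join(words[-count:])
    return f'{head} <span class="{css_class}">{tail}</span>'


print(wrap_last_words("one two three four"))
# one two <span class="nobreak">three four</span>
```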

------
crazygringo
Curious how automatic hyphenation works with ambiguous words?

E.g. to pro-ject an image but to work on a proj-ect. [1]

Documentation does seem to suggest that inserting a soft hyphen &shy; at the
correct point will serve as a hint that automatic hyphenation should obey. [2]
Not that many people are going to remember to bother.

But wondering if any browser's automatic hyphenation dictionaries attempt to
perform any contextual analysis such as part-of-speech tagging to try to get
it right in ambiguous cases?

[1] [https://www.merriam-webster.com/dictionary/project](https://www.merriam-webster.com/dictionary/project)

[2] [https://css-tricks.com/almanac/properties/h/hyphenate/](https://css-tricks.com/almanac/properties/h/hyphenate/)

~~~
function_seven
I never knew about the &shy; entity, but now I won't forget it. It's a hyphen,
but a shy one!

------
crazygringo
In English at least, I don't think hyphenation is ever really going to take
off on the web, because most body text on the web is in a wide column and
left-justified, since screens are wider rather than taller (unlike books and
newspapers).

Hyphenation is by far most valuable in narrow columns and particularly columns
that are justified, because it allows for far more even spacing between
words/letters.

(I wonder, for example, if the NYT would ever adopt hyphenation in the narrow
article blurbs on its home page, which are in narrow columns in a grid,
although not justified.)

Still, it's a pretty cool tool to have for the occasions when you do want it.
(And the author mentions how valuable it is in German, so other languages may
need it more.)

~~~
tantalor
> screens are wider rather than taller

What about mobile web

~~~
crazygringo
Excellent point, totally escaped me. Yes, I would _love_ to see hyphenation
take off more when reading articles on phones!

------
robbrown451
I am fine with just considering hyphenation a thing of the past. Do we really
think it adds value? I find it far easier to read text that doesn't break
words in half.

So what if the right margin is a bit raggedy.

~~~
TheRealPomax
Respectively: that's your prerogative; yes; cool; typesetting matters, sometimes
drastically, even if most of the time you don't even notice it.

I have several "books" on the web, for which hyphenation is a godsend,
especially as CSS is not just for that computer monitor you're staring at, but
also for `media="print"`, which yes: matters even in 2019.

~~~
robbrown451
Ok, well I personally find them a bit jarring, after years of mostly reading
on the web where they are extremely rare.

I remember typing papers on manual typewriters and having to think about where
to hyphenate (as well as just when to do a carriage return), and it was awful.
Admittedly now it is more of an aspect of reading (since an algorithm can do
the actual hyphenation), but still. I would prefer we just abandon it.

Could you explain why it matters more on a printed page than on a monitor? I
don't see any relevant differences.

~~~
TheRealPomax
No, but therein lies the crux. You don't see any relevant differences, but
there are billions of people who aren't you, and a good portion of them are
what people who don't care about ragged vs perceived aligned edges call "a bit
OCD" (even though of course it has nothing to do with OCD). They prefer nice,
clean typesetting and layout, and enough of them are disturbed by ragged edges
in "not novels" to not even bother reading more than a page. That's eyeballs,
and opinions, lost over a typographical feature.

So for me, if the choice is between "maintain ragged edge, lost half my
readership" (on what are already niche topics; how many people can possibly
care about Bezier curves, for instance) and "justify the content, with
hyphenation because holy shit justified text looks bad without it" and not
lose that readership, it's a no-brainer.

Because that's primarily what you use it for: you use hyphenation in
combination with justified text to make sure that the number of words per line
of text ends up in the 14~16 range, without crazy long gaps in sentences that
are forced to move words like "reconfiguration" or "interoperability" to the
next line because they're not allowed to hyphenate them. Back in the
typewriter days, with fixed letter spacing and separate pointsize discs/balls,
that was an absolutely dire chore: 100% agreed that if at all possible, back
then ragged edges were the way to go.

But the moment we got decent automated layout management through LaTeX (while
plain TeX worked, it was also horrible) and these days XeLaTeX, and later on
HTML+CSS, the "chore" part disappeared. You simply write your text, you turn
on auto-hyphenation, and you don't give it a second thought until someone goes
"hey this sentence looks really off", and then you fix that one sentence.

And then for novels, keep things ragged. The readership's used to it. But for
web content, especially the kind of textbooks that can be printed, too, meet
the people where they are, not just where you're comfortable.

~~~
robbrown451
"maintain ragged edge, lost half my readership"

Uhhh, really? Love to see the A-B test. Or anyone who is so "disturbed" by
ragged edges that they'll stop reading and is willing to speak up about it.
Sorry, but I think you are making that up. That is an absurd claim.

"But for web content, especially the kind of textbooks that can be printed,
too, meet the people where they are"

Web content has been mostly without hyphenation for 20+ years. People ARE used
to it.

(If you think I'm a kid with no concept of design: I'm 55 years old, have a
degree in design, and was doing bezier and b-spline curves starting in the mid
80s)

------
Theodores
I imagine this can be combined with a media query so that you can use
hyphenation differently on small screens when real estate might be lacking and
the ragged edge takes up too much space.

Time to experiment!

------
undershirt
Good to have this overview! I abandoned hyphenation in a project after seeing
it behave so differently from LaTeX’s. Looks like I missed some options.

Related, I was looking at multi-column CSS for magazine or newspaper
typesetting. The controls on column-breaking were not well supported, making
it very difficult to not end up with ugly typography. Would love to see
improvements there as well.

~~~
erichurkman
If you want real typesetting via CSS, you can use PrinceXML. It does great
hyphenation, column breaking, and has some extensions to CSS to allow similar
controls as you'd get from LaTeX's column options.

[0] [https://www.princexml.com/doc/11/hyphenation/](https://www.princexml.com/doc/11/hyphenation/)

[1] [https://www.princexml.com/doc/11/floats/#float-extensions](https://www.princexml.com/doc/11/floats/#float-extensions)

~~~
undershirt
Oops, I meant newspaper-like layouts for browser pages. But cool, I like the
specific extensions PrinceXML has for column control. Thanks for the
reference; I'll try it if I need this for printing.

------
coldtea
> _Automatic hyphenation on the web has been possible since 2011 and is now
> broadly supported. Safari, Firefox and Internet Explorer 9 upwards support
> automatic hyphenation, as does Chrome on Android and MacOS (but not yet on
> Windows or Linux)._

If Chrome doesn't do it on Windows, then it might as well not exist.

Chrome has ~80% of the users, and Windows has ~90% as well. So Chrome/Windows
is the most important combo.

I wouldn't call this state "broadly supported" in any way.

------
quotemstr
First impression: I like the nuanced control over _where_ hyphenation happens,
but can't help but wonder whether giving the browser control over the
intraword hyphenation candidate points is a good idea. Don't we have soft
hyphen characters in Unicode for exactly the purpose of letting authors
precisely specify hyphenation candidate break locations?

~~~
matt4077
How would this work? You cannot know the exact flow of text on each and every
platform, so you would end up just manually specifying every possible
hyphenation. That would seem to be a lot of work to create something that's
almost definitely going to contain far more errors than a somewhat complete
dictionary that ships with the browser.

~~~
quotemstr
What about words that the browser doesn't know about? While automatic
hyphenation point insertion is convenient, I want to retain the ability to
provide manual overrides.

------
frosted-flakes
One problem for which there is no CSS solution yet is orphaned lines caused by
floated images, when only the last line of a paragraph flows beneath the
image. Particularly annoying with left-floated images in RTL text. I don't use
floated images that often, but sometimes the design calls for it.

------
amelius
A bit late. Why only now do we get support for this basic text formatting
primitive?

~~~
_fzslm
This has been supported since 2011.

~~~
Carpetsmoker
Only hyphens: auto has, and only in Firefox. Chrome didn't add support until
2 years ago.

In my experimentation, hyphens: auto on its own doesn't give good results; you
need all the other hyphenate-* rules to get that, and those aren't widely
supported yet.

~~~
chinathrow
Chrome on Linux and Windows has no support:

[https://caniuse.com/#search=hyphens](https://caniuse.com/#search=hyphens)

Also from the article:

"Safari, Firefox and Internet Explorer 9 upwards support automatic
hyphenation, as does Chrome on Android and MacOS (but not yet on Windows or
Linux)."

------
Udo_Schmitz
>The language of a webpage should be set using the HTML lang attribute: <html
lang="en">

And if I want to have more than one language on the same webpage?

~~~
err4nt
> And if I want to have more than one language on the same webpage?

Simple, add more than one lang="" attribute. You can add them to _any_ HTML
tag.

Learn more about lang in HTML here: [https://html.spec.whatwg.org/#the-lang-and-xml:lang-attribut...](https://html.spec.whatwg.org/#the-lang-and-xml:lang-attributes)

And it can be really useful for styling purposes too - you can use CSS to
style writing direction, font, text rendering settings, even which types of
quotation marks to use for each language you want to specify

Learn more about :lang() in CSS here:
[https://drafts.csswg.org/selectors-4/#the-lang-pseudo](https://drafts.csswg.org/selectors-4/#the-lang-pseudo)

Or see an example of how it can be put to use here:
[https://github.com/mozdevs/cssremedy/blob/master/quotes.css](https://github.com/mozdevs/cssremedy/blob/master/quotes.css)

~~~
Udo_Schmitz
> Simple, add more than one lang="" attribute. You can add them to _any_ HTML
> tag.

Thanks! Very helpful and thoroughly referenced :)

------
syphilis2
There's an ff ligature on the page that got hyphenated for me. ("differences"
under section 1)

------
massivecali
Does this impact screenreaders?

~~~
frosted-flakes
No. Screen readers ignore CSS text transformations like this.

~~~
massivecali
Thanks, good to know.

