In my opinion that is the single biggest problem with web typography, and has been for years. Significantly more important than auto hyphenation.
This has been a solved problem for 40 years (or at least 10–15, if we need to worry about fast performance for interactive use on a desktop computer). Should be table stakes for any software rendering long blocks of text.
It might be possible to use Knuth line breaking in specific circumstances, such as when paragraphs have no floats. But not in general.
Can someone hire some grad students to tackle this or something? We are talking about a significant proportion of all reading people do every day, all over the world.
A pocket computer that can do hundreds of billions of arithmetic operations per second should be able to make text as pretty as an apprentice typesetter from 1600.
As others have suggested, bad line breaking hurts a very large population of readers.
Also, I hope they can adjust the spacing within each line so that all lines come out about the same length (i.e., justification).
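For what it's worth, the spacing adjustment itself is the easy part. Here is a toy sketch for monospaced text; `justify_line` is a made-up name, widths are counted in characters, and a real engine would work with glyph widths and stretchable spaces instead.

```python
def justify_line(words, width):
    """Pad the gaps between words so the line is exactly `width` characters.

    Toy model: one character = one unit of width (an assumption; real
    layout engines measure glyphs and stretch inter-word glue).
    """
    if not words:
        return ""
    if len(words) == 1:
        # A single word has no gaps to stretch, so leave it alone.
        return words[0]
    total_chars = sum(len(w) for w in words)
    gaps = len(words) - 1
    if total_chars + gaps > width:
        raise ValueError("words do not fit on a line of this width")
    # Every gap gets `spaces`; the first `extra` gaps get one more.
    spaces, extra = divmod(width - total_chars, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (spaces + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


print(repr(justify_line(["the", "quick", "fox"], 16)))
# 'the   quick  fox'
```

The hard part is choosing the breaks so that no line needs absurdly wide gaps, which is where the line-breaking algorithm comes in.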
Filed https://github.com/w3c/csswg-drafts/issues/3756 for this.
Regardless, if the web really does rely on this, we should just make greedy line breaking an explicit requirement. Relying on an implication from here isn't great.
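In case "greedy" is unclear: it means filling each line with as many words as fit and never revisiting earlier breaks. A minimal sketch, assuming monospaced text and a made-up function name `greedy_break` (this is an illustration, not what any spec or browser actually says):

```python
def greedy_break(words, width):
    """First-fit line breaking: put each word on the current line if it fits.

    Widths are counted in characters (an assumption for illustration).
    A word longer than `width` simply gets a line of its own.
    """
    lines, current, used = [], [], 0
    for word in words:
        # Width of the line if this word were appended (plus one space).
        needed = len(word) if not current else used + 1 + len(word)
        if current and needed > width:
            lines.append(current)
            current, used = [word], len(word)
        else:
            current.append(word)
            used = needed
    if current:
        lines.append(current)
    return lines


print(greedy_break("aaa bb cc ddddd".split(), 6))
# [['aaa', 'bb'], ['cc'], ['ddddd']]
```

Note the short second line in the output: the algorithm never looks ahead, so it cannot move "bb" down to even things out.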
That's what we get when for 20 years W3C and co ignore layout, and instead have designers (who don't know better) use the styling mechanism of CSS and floats as a layout engine.
The only sane layout mechanism for most of those years was tables, which, even though they weren't built for general-purpose page layout, at least had innate layout support for their contents that could be abused.
That was abuse, but far less abuse than using floats for layout, the most idiotic "best practice" the web has seen (and promoted with smugness from ignorant designers to boot).
At least now we have Flex and Grids. It only took 20 years...
Making anything better opt-in sounds all well and good until we end up in this position again, relying on one (or more) browser's unspecified line-breaking algorithm for the "better line-breaking" option. If we specify anything, we probably need to specify the actual algorithm (and given that, AFAIK, there haven't been notable improvements in decades, that probably isn't the end of the world).
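To give a feel for what a specified "better" algorithm might look like, here is a heavily simplified sketch in the spirit of Knuth-Plass: choose breaks for the whole paragraph at once so that the sum of squared leftover space is minimal, with the last line free. The name `optimal_break`, the character-count widths, and the squared-slack cost are all assumptions for illustration; the real algorithm also handles stretchable glue, penalties, and hyphenation.

```python
def optimal_break(words, width):
    """Whole-paragraph line breaking by dynamic programming.

    Cost of a line = (unused characters) ** 2, except the last line,
    which is free. This is a simplification, not full Knuth-Plass.
    """
    if any(len(w) > width for w in words):
        raise ValueError("a word is longer than the line width")
    n = len(words)
    best = [0.0] * (n + 1)   # best[i]: minimal cost of setting words[i:]
    split = [n] * (n + 1)    # split[i]: index where the line starting at i ends
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        length = -1          # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width:
                break
            cost = 0 if j == n - 1 else (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                split[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(words[i:split[i]])
        i = split[i]
    return lines


print(optimal_break("aaa bb cc ddddd".split(), 6))
# [['aaa'], ['bb', 'cc'], ['ddddd']]
```

This version is quadratic in the number of words in the worst case, which is fine for a paragraph. Floats are the awkward part, because the available width of a line then depends on where earlier lines ended up.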
> It might be possible to use Knuth line breaking in specific circumstances, such as when paragraphs have no floats. But not in general.
If the CSS specification were updated to add vastly improved line-breaking to all text, would it really be that big a deal to add a few new exceptions to go with it? E.g. make floats behave differently when different line-breaking is used.
There are already plenty of cases in CSS of "oh, that doesn't work there because you're using inline/absolute/float/overflow". It's not like the rules are particularly minimal or intuitive right now.
2. One could just determine the height of the float ceiling with the greedy algorithm and run the optimization algorithm with that constraint
Sometimes I think this should be the default: definitely the worst typesetting I've seen is in documents produced by people who ignored all the overfull box warnings; even the greedy algorithm of Word etc. would be vastly preferable to that.
For typesetting more broadly, certainly. For making a clean right edge, no.
So yes they can be used together. In fact the Unicode TeX engines like XeTeX and LuaTeX do so (I don't know how well they follow the Unicode specification intrinsically versus with the support of language-specific packages, but they seem to do the job).
What’s tricky is actually OpenType, because the length of a “word” can vary in complex ways depending on whether or not it is broken - much more complex than simple hyphens.
Burmese doesn't have spaces between words, just between phrases and sentences. Breaking lines between sentences results in really ugly jagged paragraphs. (I've ended up inserting Unicode zero width spaces between each syllable to get line breaking to work consistently in projects I've worked on.)
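The workaround is roughly the following sketch. It assumes you already have the text split into syllables (the hard, language-specific part, not shown here); `join_syllables` is a made-up helper name.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: invisible, but a legal break point


def join_syllables(syllables):
    """Join pre-segmented syllables with zero width spaces.

    The segmentation itself is assumed to come from elsewhere
    (a dictionary, a rule-based segmenter, or manual markup).
    """
    return ZWSP.join(syllables)


def strip_break_hints(text):
    """Remove the hints again, e.g. before searching or comparing strings."""
    return text.replace(ZWSP, "")


hinted = join_syllables(["ka", "la", "ma"])  # placeholder syllables
print(len(hinted))
# 8
```

One downside is that the extra characters end up in copied text and can break naive string search, hence the second helper.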
Checking just now on Windows 10, Firefox and Edge aren't doing it correctly but Chrome and IE are. Even Windows Notepad is getting it right.
But my impression as a reader is that I stumble over hyphenated words much more often than I am distracted by a particularly ragged right edge or a big river in a justified paragraph.
Oh don’t worry I prefer words like “Kindercarnavalsoptochtvoorbereidingswerkzaamhedencomitéleden” to be one thing so I can just gloss over the blob. Only have to read the blob once to recognise the shape and know what it refers to. With hyphenation it’s a different blob every time, so then I might spend time reading 60 letters.
Hyphenation is for newspapers, where space is money.
But if you want to read it, hyphenation makes it much easier. Your claim that there would be a "different blob every time" is not false, but it is misleading. There are different blobs, but they are all ones that can be recognized by shape.
Because such long words would certainly be split up between their constituent words, not between random places.
This actually makes a hyphenated composite word easier to read than the very long non-hyphenated form.
At least for German, this is not a requirement. Hyphenation doesn't happen at random places but at syllable boundaries.
The only care you need to take with hyphenation is that it shouldn't lead to possible misreadings. "Ur-in-stinkt" is the classic example: only split that word at the first possible hyphen, never at the second.
Maybe you do get into those situations?
And what is the "more than once" thing supposed to mean? Hyphenation helps the first time, too.
I think it is long past time to agree that hyphens were always a very bad idea, along with mucking up the font designer's intended kerning. There is nothing wrong with ragged right text.
For best readability, we should prefer to break lines such that we avoid breaking up significant logical parts of text. If reasonably avoidable, don't break: clause, phrase, sentence, quotation, etc.
The problem is that justified text is hard: it needs human intervention, proper settings, and a proper composer engine. Basically you can only do it properly in LaTeX (with some packages) and InDesign.
The way it is handled in browsers has to be computationally efficient, so browsers do it in a half-assed way.
Don't use hyphenation in digital except where stuff would overflow.
It's probably not needed today, but when I first used it the logic was along the lines of adding it at the 5th, 7th, and 9th characters, provided there were at least 4 more characters after each `&shy;`, or some such, while being wary of hanging the last word etc.
It's a soft hyphen, meaning the browser will break there if needed and otherwise ignore it. It's an HTML feature rather than a CSS one, which also has implications.
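If you generate the markup, inserting the hints is mechanical. A small sketch, assuming you supply the break positions yourself (from a hyphenation dictionary or by hand); `add_soft_hyphens` is a made-up name, and the positions used below are just an example.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN, the character that &shy; stands for


def add_soft_hyphens(word, break_points):
    """Insert a soft hyphen before each character index in `break_points`.

    The indices are assumed to be valid hyphenation points chosen by the
    caller; this function does no linguistic analysis of its own.
    """
    pieces, prev = [], 0
    for idx in sorted(set(break_points)):
        if not 0 < idx < len(word):
            raise ValueError(f"break point {idx} is outside the word")
        pieces.append(word[prev:idx])
        prev = idx
    pieces.append(word[prev:])
    return SHY.join(pieces)


hinted = add_soft_hyphens("hyphenation", [2, 6])
print(hinted.replace(SHY, "-"))
# hy-phen-ation
```

In HTML you can emit either the raw U+00AD character or the `&shy;` entity; both mark the same optional break.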
MDN has a good page to see it in action, as well as a comparison with `<wbr>`.
Those who work for publishers lament this endlessly, because lack of software support for certain features limits the quality of what they can publish. (And note that for several major publishers, print editions are nowadays typeset from HTML/CSS using Prince.)
Interesting, having played around with wkhtmltopdf and Puppeteer, I had no idea any HTML-to-PDF renderers would be a viable option for publishers, but their examples are pretty impressive – definitely more than any e-readers manage!
> as fundamentally they don't believe there's a business case for it
And they’re probably right. I imagine very few people avoid e-books because of their typographical shortcomings. The arguments I hear against them tend to be on a more fundamental level (‘I prefer to be able to feel the paper’), and even people like me who do notice bad typography put up with e-readers. Publishers meanwhile may well care about their readers’ experience, but they can’t afford not to sell digital copies.
So I understand that typography probably doesn’t make much of a difference to the bottom line of e-reader manufacturers, I just wish the bottom line weren’t the only incentive for large companies.
It's not cheap, but if you're a publisher I imagine the cost savings of not having to deal with multiple formats (one for print, one for digital) outweigh that many times over, provided you can get good enough results.
I do wonder if any publishers will get into the e-reader space, especially on the software-only side. But it's not an easy market to get into and the costs are pretty high, and I think both of us are dubious as to how much any user actually cares.
There are lots of different options:
The Kindle also doesn't support proper kerning. Sorry if I told you that and you can't unsee it now.
But on jailbroken Kindles you can install third-party reading software like KOReader, which fixes some of the deficiencies.
KOReader was originally developed for e-ink devices, though. I'm not sure how well it has been optimized for normal tablet screens.
E.g. to pro-ject an image but to work on a proj-ect.
Documentation does seem to suggest that inserting a soft hyphen (`&shy;`) at the correct point will serve as a hint that automatic hyphenation should obey. Not that many people are going to remember to bother.
But I wonder whether any browser's automatic hyphenation dictionaries attempt any contextual analysis, such as part-of-speech tagging, to try to get it right in ambiguous cases?
Hyphenation is by far most valuable in narrow columns and particularly columns that are justified, because it allows for far more even spacing between words/letters.
(I wonder, for example, if the NYT would ever adopt hyphenation in the narrow article blurbs on its home page, which are in narrow columns in a grid, although not justified.)
Still, it's a pretty cool tool to have for the occasions when you do want it. (And the author mentions how valuable it is in German, so other languages may need it more.)
This is an effect of artificial technical limitations in the rendering software which was initially designed by computer programmers without much consideration for the capabilities or preferences of human readers – not an effect of screen dimensions. A newspaper page is much wider than the great majority of screens.
Text laid out in one-screen-tall columns with horizontal scrolling is actually dramatically more pleasant to read. If you have a Mac, you can try it for yourself: http://amarsagoo.info/tofu/
This is especially true on a multitouch display like an iPad or the like, where swiping to the side is easy and natural.
For a phone display which only fits one column, a continuous vertical scroll might be better, but then you definitely also want good paragraph composition.
Considering the magazine standard is and has long been columns, it's not implausible to consider this a result of the limitations of HTML and CSS. Until fairly recently (and I'm not sure about support across all browsers even today), multi-column layouts were almost impossible, at least without all sorts of crutches and JS.
Or maybe multi-column layout was/is a crutch for printed media, where a 5:32 ratio would be rather impractical to handle on a moving subway?
Either way, the standard column width of about 60 characters is almost definitely the most ergonomic. And at that width hyphenation is quite useful.
Except on mobile, I hope ;)
What about the mobile web?
So what if the right margin is a bit raggedy.
I have several "books" on the web, for which hyphenation is a godsend, especially as CSS is not just for that computer monitor you're staring at, but also for `media="print"`, which yes: matters even in 2019.
I remember typing papers on manual typewriters and having to think about where to hyphenate (as well as just when to do a carriage return), and it was awful. Admittedly now it is more of an aspect of reading (since an algorithm can do the actual hyphenation), but still. I would prefer we just abandon it.
Could you explain why it matters more on a printed page than on a monitor? I don't see any relevant differences.
So for me, if the choice is between "maintain ragged edge, lose half my readership" (on what are already niche topics; how many people can possibly care about Bezier curves, for instance) and "justify the content, with hyphenation because holy shit justified text looks bad without it" and not lose that readership, it's a no-brainer.
Because that's primarily what you use it for: you use hyphenation in combination with justified text to make sure that the number of words per line of text ends up in the 14~16 range, without crazy long gaps in sentences that are forced to move words like "reconfiguration" or "interoperability" to the next line because they're not allowed to hyphenate them. Back in the typewriter days, with fixed letter spacing and separate pointsize discs/balls, that was an absolutely dire chore: 100% agreed that if at all possible, back then ragged edges were the way to go.
But the moment we got decent automated layout management through LaTeX (while plain TeX worked, it was also horrible) and these days XeLaTeX, and later on HTML+CSS, the "chore" part disappeared. You simply write your text, you turn on auto-hyphenation, and you don't give it a second thought until someone goes "hey this sentence looks really off", and then you fix that one sentence.
And then for novels, keep things ragged. The readership's used to it. But for web content, especially the kind of textbooks that can be printed, too, meet the people where they are, not just where you're comfortable.
Uhhh, really? Love to see the A-B test. Or anyone who is so "disturbed" by ragged edges that they'll stop reading and is willing to speak up about it. Sorry, but I think you are making that up. That is an absurd claim.
"But for web content, especially the kind of textbooks that can be printed, too, meet the people where they are"
Web content has been mostly without hyphenation for 20+ years. People ARE used to it.
(If you think I'm a kid with no concept of design: I'm 55 years old, have a degree in design, and was doing bezier and b-spline curves starting in the mid 80s)
One example I see on this page is "be-fore". In my head it is "be. fore." and suddenly this word seems to have a lot more emphasis than it ought to have. I also get distracted when it's unclear whether a word was intentionally hyphenated or just auto-hyphenated. Maybe that's just the way some of us process text.
Sure, you can practice and learn to get used to it (the first half of my life there was no web, so I certainly did), but why should you have to? YOU CaN gEt usED To tHIS as wELl, wiTh somE PRActICE.
I'd like to see a study that actually measures reading speed and comprehension with and without.
Here's a good example of a place where having right justification was important for aesthetics, but they still chose not to hyphenate (presumably because it is just ugly):
Personally, I find a very small proportion of text to fall into the latter category. But for those that do, clean presentation is huge.
Personally I don't think they make it look cleaner, but the opposite.
Time to experiment!
Related, I was looking at multi-column CSS for magazine or newspaper typesetting. The controls on column-breaking were not well supported, making it very difficult to not end up with ugly typography. Would love to see improvements there as well.
If Chrome doesn't do it on Windows, then it might as well not exist.
Chrome has ~80% of the users, and Windows has ~90% as well. So Chrome/Windows is the most important combo.
I wouldn't call this state "broadly supported" in any way.
Also I recently noticed Firefox seems to copy it, which is pretty annoying (didn't use to do that, I think).
In my experimentation `hyphens: auto` alone doesn't give good results; you need the other `hyphenate-*` properties to get that, and those aren't widely supported yet.
Also from the article:
"Safari, Firefox and Internet Explorer 9 upwards support automatic hyphenation, as does Chrome on Android and MacOS (but not yet on Windows or Linux)."
And if I want to have more than one language on the same webpage?
Simple: put a `lang=""` attribute on each element whose language differs from its parent's. You can add it to _any_ HTML element.
Learn more about lang in HTML here: https://html.spec.whatwg.org/#the-lang-and-xml:lang-attribut...
And it can be really useful for styling purposes too: you can use CSS to set writing direction, font, text rendering settings, and even which types of quotation marks to use for each language you want to specify.
Learn more about :lang() in CSS here: https://drafts.csswg.org/selectors-4/#the-lang-pseudo
Or see an example of how it can be put to use here: https://github.com/mozdevs/cssremedy/blob/master/quotes.css
Thanks! Very helpful and thoroughly referenced :)