Hacker News
Hyphenation in CSS (clagnut.com)
184 points by kawera on Mar 23, 2019 | 108 comments

So when are browsers going to abandon their greedy line breaking algorithm and implement Knuth/Plass or similar?

In my opinion that is the single biggest problem with web typography, and has been for years. Significantly more important than auto hyphenation.

This has been a solved problem for 40 years (or at least 10–15, if we need to worry about fast performance for interactive use on a desktop computer). Should be table stakes for any software rendering long blocks of text.

This comes up on every thread related to web typography, and the answer as always is that it's not possible in the general case, at least not with the specs as they are today. The biggest problem is that the CSS specification demands that a float be placed as high as possible (CSS 2.1 section 9.5.1, rule 8 [1]). But float ceilings can be anywhere in the middle of a paragraph. By its nature, Knuth line breaking means that any particular unit (word), and therefore any float ceiling, might not be as high as possible. In fact, the only algorithm that can be used in this case to satisfy the spec is the greedy one. Therefore, Knuth line breaking cannot be used on the Web.

It might be possible to use Knuth line breaking in specific circumstances, such as when paragraphs have no floats. But not in general.

[1]: https://www.w3.org/TR/CSS2/visuren.html#float-position
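To make the contrast in this subthread concrete, here is a minimal sketch (not any browser's or TeX's actual implementation) of the greedy approach next to a "minimum raggedness" dynamic program. It is only a simplified stand-in for Knuth/Plass: widths are counted in characters, every space is a legal break, and there is no glue stretch/shrink, hyphenation, or penalty system. The function names and the squared-slack cost are assumptions made for illustration.

```python
from typing import List


def greedy_breaks(words: List[str], width: int) -> List[List[str]]:
    """Fill each line as far as possible, then move on (first-fit)."""
    lines: List[List[str]] = []
    current: List[str] = []
    used = 0
    for w in words:
        need = len(w) if not current else used + 1 + len(w)
        if current and need > width:
            lines.append(current)
            current, used = [w], len(w)
        else:
            current.append(w)
            used = need
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words: List[str], width: int) -> List[List[str]]:
    """Choose breaks that minimise the sum of squared trailing space.

    The last line is free, as is customary. A single word longer than the
    line is allowed on a line of its own with zero cost.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index where the line after i starts
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break
            slack = max(width - length, 0)
            cost = 0 if j == n - 1 else slack ** 2
            total = cost + best[j + 1]
            if total < best[i]:
                best[i], nxt[i] = total, j + 1
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines
```

With the words aaa, bb, cc, ddddd and a width of 6, the greedy version gives "aaa bb / cc / ddddd" while the optimiser gives "aaa / bb cc / ddddd". The optimiser moves the first break earlier to even out the lines, which is the kind of change that can shift where something anchored mid-paragraph ends up.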

There has to be some kind of method that does a better job balancing line widths than the naïve greedy version, and is compatible with whatever requirements browsers are subject to.

Can someone hire some grad students to tackle this or something? We are talking about a significant proportion of all reading people do every day, all over the world.

A pocket computer that can do hundreds of billions of arithmetic operations per second should be able to make text as pretty as an apprentice typesetter from 1600.

As stated before, the only algorithm that satisfies the constraint that floats must be placed as high as possible is the greedy one (or, to be more precise, any such algorithm must match the output of the greedy one).

Why not just make this algorithm a non-default option (like hyphenation) and accept that when it's enabled floats may not be optimal?

Sure, an opt-in may be possible. But it requires some spec work and isn't something browsers can just do right now. As gsnedders points out, we'd probably have to spec the entire algorithm to avoid causing compatibility problems in the future.

As you suggested, if there are no floats, there would be no reason NOT to use the better algorithm. I personally don't use floats often, and definitely not inside paragraphs.

As suggested by others bad line-breaking has a bad effect on the large reading population.

Also I hope they can do the thing to adjust spaces on the line so all lines look about the same length.

While I agree a literal reading of CSS 2.1 implies this, I'm not sure it's a deliberate implicature. I definitely don't think it's meant to imply that greedy line breaking must be used (and, IIRC, Prince for example does not).

Filed https://github.com/w3c/csswg-drafts/issues/3756 for this.

You don't think violating that rule would break the Web? I know quite a few Web pages are broken in Servo because it doesn't implement float ceilings inside paragraphs properly.

Ergh. And I presume they rely on the position relative to the containing block (and therefore also on line-height computation and that whole underdefined mess), rather than the position relative to the line box where the float is placed?

Regardless, if it is the case the web relies on this, we should just make it explicitly required to do greedy line breaking. Relying on an implication from here isn't great.

I haven't done a comprehensive survey... I just notice it a lot. Mostly it comes up when sites use inline layout as a poor man's flexbox. It might be possible to make better line breaking an opt-in, though.

>Mostly it comes up when sites use inline layout as a poor man's flexbox

That's what we get when for 20 years W3C and co ignore layout, and instead have designers (who don't know better) use the styling mechanism of CSS and floats as a layout engine.

The only sane layout mechanism for most of those years was tables, which, even though they weren't built for general-purpose page layout, at least had innate layout support for their contents that could be abused.

That was abuse, but far less abuse than using floats for layout, the most idiotic "best practice" the web has seen (and promoted with smugness from ignorant designers to boot).

At least now we have Flex and Grids. It only took 20 years...

Yeah, I could see that would be the case that would likely break if it's violated.

Making anything better opt-in sounds all great and well till we end up in this position again whereby we rely on one (or more) browser's unspecified line-breaking algorithm for the "better line-breaking" option. If we specify anything, we probably actually need to specify the algorithm (and given AFAIK there haven't been notable improvements in decades, that probably isn't the end of the world).

> Therefore, Knuth line breaking cannot be used on the Web.

> It might be possible to use Knuth line breaking in specific circumstances, such as when paragraphs have no floats. But not in general.

If the CSS specification was updated to add vastly improved line-breaking to all text, would it really be that big a deal to add a few new exceptions to go with it? E.g. make floats behave differently when different line-breaking is used.

There are already plenty of cases in CSS of "oh, that doesn't work there because you're using inline/absolute/float/overflow". It's not like the rules are particularly minimal or intuitive right now.

1. That situation is equivalent to the one where the text is different and longer, and generally web pages are designed to lay out properly regardless of the content of multiline text sections, so probably not an issue in practice

2. One could just determine the height of the float ceiling with the greedy algorithm and run the optimization algorithm with that constraint

There are quite a few pages that are broken in Servo precisely because it doesn't implement that rule.

What about giving us a bloody <proper-text> tag, in which no float BS applies, and which has all the proper typographic treatments?

Then the solution could be a compatible float spec, and a way for users to switch it on (and the old spec off).

It's a solved problem for simple text, but a while ago someone on HN pointed out that (1) with other CSS layout features it has become an NP-complete problem; (2) even just for simple text, sophisticated line breaking algorithms usually have time complexities quadratic in the length of text and cannot be as fast as simple greedy which is linear, and the web is full of people eager to take advantage of this by sending your browser a huge paragraph and locking up your browser.

Knuth/Plass seems to regularly need user input to resolve issues, at least in the TeX implementation - if that’s inherent can it be used in a browser?

User intervention is (occasionally) needed in TeX only at its default settings, which are optimized for “rather than typeset any paragraph with a too-loose line, just give up and warn me loudly so I can take a look”. This makes sense for Knuth and his original goal of producing books. But it's not inherent; a combination of \tolerance=9999 (the default is 200) and \emergencystretch=\maxdimen makes it so that TeX will try to produce the best paragraph possible without requiring the user to intervene and rewrite the words. (Another alternative is to increase the stretchability and shrinkability of the interword glue.) And there will be no overfull boxes unless of course there's a single unhyphenatable word that by itself is longer than the line width. (LaTeX has a shortcut called \sloppy for achieving basically this, but the name scares people who don't understand the TeX algorithm.)

(Sometimes I think this should be the default: definitely the worst typesetting I've seen is in documents produced by people who ignored all the overfull box warnings; even the greedy algorithm of Word etc would be vastly preferable to that.)

For basic hyphenation of text paragraphs, it's been years since I felt the need to intervene in (La)TeX hyphenation.

For typesetting more broadly, certainly. For making a clean right edge, no.

I myself was struck by how ugly the hyphenation looked because the hyphens (and periods, and commas, and semicolons, and quotation marks) didn't poke outside the margin slightly to make the perceived margin straight. Even with a greedy line-by-line model as used in web browsers, LaTeX's \usepackage{microtype} ought to be fairly easy to implement.

Does anyone know what the relationship is between Knuth/Plass (or similar pretty-line-wrapping algos, if others exist), and Unicode? I know Unicode has their own line-wrap algorithm, but is it possible to use both?

What Unicode calls the “Unicode line breaking algorithm” (http://unicode.org/reports/tr14/ see “LB1” to “LB31”) is a method of determining “break opportunities”, the places where it is allowed to break a line of Unicode text. What the Knuth-Plass paper (http://www.eprg.org/G53DOC/pdfs/knuth-plass-breaking.pdf) is about is a method of determining, given a paragraph along with its “break opportunities”, a partition of the paragraph into lines such that (a certain function of) the stretching/shrinking of spaces on the lines is minimized.

So yes they can be used together. In fact the Unicode TeX engines like XeTeX and LuaTeX do so (I don't know how well they follow the Unicode specification intrinsically versus with the support of language-specific packages, but they seem to do the job).

Awesome, thanks for the explanation!

Yes, the Unicode line breaking algorithm tells you how to compute where breaks are permitted. But it’s up to you to choose which breaks to actually use.

What’s tricky is actually OpenType, because the length of a “word” can vary in complex ways depending on whether or not it is broken - much more complex than simple hyphens.

That happens in TeX as well: the width of “efficient” is not necessarily equal to the width of “ef” + “ficient” due to the presence of ligatures. TeX deals with it by storing the width of the full ligature as well as the pre- and post-line-break widths (for all possible line breaks in the middle of paragraphs) and takes them into consideration when calculating line breaks. This can be generalized to any situation where line breaking changes the width of a chunk of the text.
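A rough sketch of the bookkeeping described above, loosely modelled on the idea of a break point that carries three widths. The class name, field names, and all the numbers are made up for illustration; a real engine would measure them from the font.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Discretionary:
    """A possible break point whose surroundings change shape when broken."""
    no_break: int    # width if the line is not broken here (e.g. an "ffi" ligature)
    pre_break: int   # width left at the end of the line if broken (e.g. "f-")
    post_break: int  # width starting the next line if broken (e.g. an "fi" ligature)


def unbroken_width(before: int, d: Discretionary, after: int) -> int:
    """Width of the whole chunk when the break is not taken."""
    return before + d.no_break + after


def broken_widths(before: int, d: Discretionary, after: int) -> Tuple[int, int]:
    """Widths of the two pieces (end of line, start of next) when it is taken."""
    return before + d.pre_break, d.post_break + after


# Hypothetical widths, in arbitrary units, for "e" + [ffi] + "cient".
E, CIENT = 44, 210
FFI = Discretionary(no_break=83, pre_break=61, post_break=56)
```

With these invented numbers, unbroken_width(E, FFI, CIENT) is 337 while broken_widths(E, FFI, CIENT) is (105, 266), so the two halves do not add up to the unbroken word. A line breaker has to consult the right set of widths for each candidate break rather than assume one fixed width per word.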

Somewhat related, but not the same thing; I'm curious when Firefox will finally get proper line breaking support for Burmese and related languages?

Burmese doesn't have spaces between words, just between phrases and sentences. Breaking lines between sentences results in really ugly jagged paragraphs. (I've ended up inserting Unicode zero width spaces between each syllable to get line breaking to work consistently in projects I've worked on.)
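The zero-width-space workaround can be sketched roughly as follows, assuming the text has already been split into syllables by some other means (the segmentation itself is the hard part and is not shown). The function name is hypothetical.

```python
from typing import Iterable

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but a permitted line-break point


def join_with_zwsp(syllables: Iterable[str]) -> str:
    """Join pre-segmented syllables so a line may break between any two of them."""
    return ZWSP.join(syllables)
```

The result renders the same as the original text but gives the layout engine a break opportunity at every syllable boundary. The extra characters do end up in the markup and in anything copied out of it.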

Checking just now on Windows 10, Firefox and Edge aren't doing it correctly but Chrome and IE are. Even Windows Notepad is getting it right.

File a Mozilla Bugzilla bug? Firefox is meant to use Uniscribe line breaking for complex scripts on Windows, which should handle this.

Is there any data on whether hyphenation actually improves reading speed? My hunch is that the problem it mostly solves is packing a greater number of words on a fewer number of printed pages, thus saving the printer money.

But my impression as a reader is that I stumble over hyphenated words much more often than I am distracted by a particularly ragged right edge or a big river in a justified paragraph.

I have no data on it, but I presume it depends heavily on the language. In languages like English, where words are not that long, it'd be very counter-intuitive that hyphenation could help with reading speed, as we don't read individual letters, we read word by word. Splitting words in two parts can only slow down the reader. In languages like German, with a lot of very long words built by combining a few simpler ones, I presume hyphenation can be an important tool to avoid having big holes in the text that could make it hard for readers' eyes to follow the lines of text.

> very long words, hyphenation important

Oh don’t worry I prefer words like “Kindercarnavalsoptochtvoorbereidingswerkzaamhedencomitéleden” to be one thing so I can just gloss over the blob. Only have to read the blob once to recognise the shape and know what it refers to. With hyphenation it’s a different blob every time, so then I might spend time reading 60 letters.

Hyphenation is for newspapers, where space is money.

If you want to skip over the blob, sure.

But if you want to read it, hyphenation makes it much easier. Because your claim that there would be a "different blob every time" is not false, but misleading. There are different blobs, but they are all ones that can be recognized by shape.

Because such long words would certainly be split up between their constituent words, not between random places.

This actually makes a hyphenated composite word easier to read than the very long non-hyphenated form.

> Because such long words would certainly be split up between their constituent words, not between random places.

At least for German, this is not a requirement. Hyphenation doesn't happen at random places but at syllable boundaries.

The only care you need to take with hyphenation is that it shouldn't lead to possible misreadings, "Ur-in-stinkt" being the classic example. Only split that word at the first, never at the second possible hyphen.

It is not a requirement, but you would generally do that. Hafen-meister instead of Ha-fenmeister.

Yes I see where you’re coming from if you want to read the blobs, but I cannot recall a situation where I wanted to read them more than once. Not in Dutch or German, so I do really appreciate the lack of hyphenation when reading those languages (on my phone screen especially).

Maybe you do get into those situations?

Are you really saying that words above a certain length don't interest you when reading an article with that word?

And what is the "more than once" thing supposed to mean? Hyphenation helps the first time, too.

I think they are referring to the way many (most?) people read most words - recognising the shape rather than reading the individual letters. Presumably if your language contains several common and unique long words you quickly start to recognise the shape and so no longer need to read the letters.

Yes, I understood that. But the hyphenated parts are also words (if you're hyphenating intelligently), so you recognize their shapes just as well, probably even better.

On a small screen - say, smartphone - it might not be an option.

I certainly have my own data point. When my kids are reading, they stumble very badly over every hyphen.

I think it is long past time to agree that hyphens were always a very bad idea, along with mucking up the font designer's intended kerning. There is nothing wrong with ragged right text.

For best readability, we should prefer to break lines such that we avoid breaking up significant logical parts of text. If reasonably avoidable, don't break: clause, phrase, sentence, quotation, etc.

In that article I came across "of-ten". It took me a noticeable amount of time to work out what it meant.

Weird, that means the "no hyphens in words shorter than 6 letters" rule the author suggests isn't respected in the very article that discusses the rule

"Currently only IE/Edge supports this property (with a prefix), however Safari does support hyphenation character limits using some legacy properties specified in an earlier draft of the CSS3 Text Module."

It's true that it doesn't make so much sense in digital. Two reasons exist: mobile devices and aesthetics. Many people regard text set to justified as something luxurious or "high".

The problem is that justified text is hard: it needs human intervention, proper settings, and a composer engine. Basically you can properly do it only in LaTeX with some plugins and in InDesign.

The way it is handled in browsers must be computationally efficient, so browsers do it in a half-assed way.

Don't use hyphenation in digital except where stuff would overflow.

&shy; support has been around for at least a decade and it can be done server side with any logic.

It's probably not needed today but when I first used it the logic was along the lines of adding it at the 5th 7th 9th characters provided there were at least 4 more characters after each &shy; or some such while being wary of hanging the last word etc.

It's a soft hyphen meaning the browser will break if needed otherwise ignore it. It's an HTML feature rather than a CSS one which also has implications.
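One possible reading of the positional rule described above, as a server-side sketch. It is deliberately naive: it knows nothing about syllables or language, the parameter names are invented here, and continuing past the 9th character in steps of two is an assumption. In Python the soft hyphen is written as the character U+00AD, which is what &shy; stands for in HTML.

```python
SHY = "\u00ad"  # SOFT HYPHEN, the character behind the &shy; entity


def add_soft_hyphens(word: str, first: int = 5, step: int = 2, min_tail: int = 4) -> str:
    """Insert soft hyphens after the 5th, 7th, 9th, ... character,
    as long as at least `min_tail` characters follow the break."""
    out = []
    for i, ch in enumerate(word, start=1):
        out.append(ch)
        remaining = len(word) - i
        if i >= first and (i - first) % step == 0 and remaining >= min_tail:
            out.append(SHY)
    return "".join(out)
```

Under these assumptions "interoperability" becomes inter, op, er, ab, ility joined by soft hyphens, and a short word such as "often" is left untouched. The break points are purely positional, so a dictionary or pattern-based hyphenator would usually pick better ones.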

Awesome, didn't know about that one.

MDN has a good page to see it in action, as well as comparison with <wbr>


On a slightly related note, I find it incredible that the Kindle (and presumably other e-readers) still fail to support typographic standards that printed books have had for hundreds of years. It took them years to support hyphenation at all, and now that they do, the number of times I’ve turned a page only to find it completely blank but for the very last syllable of the very last word of the chapter is infuriating. I know it’s a small thing, but it instantly rips me out of the story and reminds me of the imperfections of the thing. For a single-purpose device by one of the world’s largest tech companies, that’s just not good enough.

I know several people on the publishing side around the digital publishing groups around the W3C, but a large part of the problem is most of the companies making the popular readers (both those with strict hardware/software integration, like most eInk readers, and those without, like most phone/tablet/computer-based readers) simply have no interest in participating or improving the quality of their readers (as fundamentally they don't believe there's a business case for it).

Those who work for publishers lament this endlessly, because lack of software support for certain features limits the quality of what they can publish. (And note for several major publishers their print editions are nowadays typeset from HTML/CSS using Prince.)

> And note for several major publishers their print editions are nowadays typeset from HTML/CSS using Prince.

Interesting, having played around with wkhtmltopdf and Puppeteer, I had no idea any HTML-to-PDF renderers would be a viable option for publishers, but their examples are pretty impressive – definitely more than any e-readers manage!

> as fundamentally they don't believe there's a business case for it

And they’re probably right. I imagine very few people avoid e-books because of their typographical shortcomings. The arguments I hear against them tend to be on a more fundamental level (‘I prefer to be able to feel the paper’), and even people like me who do notice bad typography put up with e-readers. Publishers meanwhile may well care about their readers’ experience, but they can’t afford not to sell digital copies.

So I understand that typography probably doesn’t make much of a difference to the bottom line of e-reader manufacturers, I just wish the bottom line weren’t the only incentive for large companies.

Prince is purpose-built for paged media and non-real-time rendering, so it can make plenty of performance tradeoffs that neither WebKit (and wkhtmltopdf) nor Blink (and Puppeteer) can, along with having far more engineering resources caring about the paged media case and edge-cases there.

It's not cheap, but if you're a publisher I imagine the cost savings of not having to deal with multiple formats (one for print, one for digital) outweighs that many times over, provided you can get good enough results.

I do wonder if any publishers will get into the e-reader space, especially on the software-only side. But it's not an easy market to get into and the costs are pretty high, and I think both of us are dubious as to how much any user actually cares.

What do you expect it to do differently in that case?

This has been a problem for as long as printed books have existed.

There are lots of different options: https://en.wikipedia.org/wiki/Widows_and_orphans#Guidelines

The kindle also doesn't support proper kerning. Sorry if I told you that and you can't unsee it now.

But you can install 3rd party reading software on jailbroken kindles like koreader that fixes some of the deficiencies.

And we have the orphans and widows properties in CSS that provide some control across fragments, though they're not supported in Firefox (or EdgeHTML, for however much longer that matters).

I was just trying to find _koreader_ on the Play Store, but am not having much luck. What's the actual name, please?

I don't know if it's in the Play Store but you can download the APK directly from GitHub: https://github.com/koreader/koreader/releases

Koreader was originally developed for e-ink devices, though. I'm not sure how well it has been optimized for normal tablet screens.

Great - thanks for the link.

How about proper widow/orphan control?

Now if only there was a way to specify that paragraphs of text shouldn't leave an orphan word on the last line, I could use one CSS property to get the consultants off my back instead of having to preg_replace the last space in every <p> paragraph with an &nbsp;.

You might at least be interested to know that you can also wrap those last (two or more) words in <span class="nobreak"> with the nobreak class using "white-space: nowrap;" so that your solution works across all languages, rather than just those where the non-breaking space doesn't look horrendously out of place.
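For reference, the non-breaking-space trick from the parent comment (done there with PHP's preg_replace) might look roughly like this in Python. This sketch works on the plain text of a single paragraph and does not parse HTML; the function name is hypothetical.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE, the character behind the &nbsp; entity

# Whitespace that is followed only by one last word (and optional trailing space).
_LAST_GAP = re.compile(r"\s+(?=\S+\s*$)")


def bind_last_word(text: str) -> str:
    """Replace the final inter-word gap with a no-break space so the last
    word cannot be left alone on its own line."""
    return _LAST_GAP.sub(NBSP, text, count=1)
```

A one-word paragraph has no such gap and is returned unchanged. The nowrap-span variant suggested above would wrap the same final words in an element instead of swapping the space character.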

Curious how automatic hyphenation works with ambiguous words?

E.g. to pro-ject an image but to work on a proj-ect. [1]

Documentation does seem to suggest that inserting a soft hyphen &shy; at the correct point will serve as a hint that automatic hyphenation should obey. [2] Not that many people are going to remember to bother.

But wondering if any browser's automatic hyphenation dictionaries attempt to perform any contextual analysis such as part-of-speech tagging to try to get it right in ambiguous cases?

[1] https://www.merriam-webster.com/dictionary/project

[2] https://css-tricks.com/almanac/properties/h/hyphenate/

I never knew about the &shy; entity, but now I won't forget it. It's a hyphen, but a shy one!

In English at least, I don't think hyphenation is ever really going to take off on the web, because most body text on the web is in a wide column and left-justified, since screens are wider rather than taller (unlike books and newspapers).

Hyphenation is by far most valuable in narrow columns and particularly columns that are justified, because it allows for far more even spacing between words/letters.

(I wonder, for example, if the NYT would ever adopt hyphenation in the narrow article blurbs on its home page, which are in narrow columns in a grid, although not justified.)

Still, it's a pretty cool tool to have for the occasions when you do want it. (And the author mentions how valuable it is in German, so other languages may need it more.)

> body text on the web is in a wide column and left-justified, since screens are wider rather than taller (unlike books and newspapers).

This is an effect of artificial technical limitations in the rendering software which was initially designed by computer programmers without much consideration for the capabilities or preferences of human readers – not an effect of screen dimensions. A newspaper page is much wider than the great majority of screens.

Text laid out in one-screen-tall columns with horizontal scrolling is actually dramatically more pleasant to read. If you have a Mac, you can try for yourself, http://amarsagoo.info/tofu/

This is especially true on a multitouch display like an iPad or the like, where swiping to the side is easy and natural.

For a phone display which only fits one column, a continuous vertical scroll might be better, but then you definitely also want good paragraph composition.

Almost all screens are far too wide for a single column of text. Most publishers with long-form text currently opt for a fixed maximum width.

Considering the magazine standard is and has long been columns, it's not implausible to consider this a result of the limitations of HTML and CSS. Until fairly recently (and I'm not sure about support across all browsers even today), multi-column layouts were almost impossible, at least without all sorts of crutches and JS.

Or maybe multi-column layout was/is a crutch for printed media, where a 5:32 ratio would be rather impractical to handle on a moving subway?

Either way, the standard of about 60-character column widths is almost definitely the most ergonomic. And at that width hyphenation is quite useful.

> Almost all screens are far too wide for a single column of text.

Except on mobile, I hope ;)

> screens are wider rather than taller

What about mobile web

Excellent point, totally escaped me. Yes, I would love to see hyphenation take off more when reading articles on phones!

True, although even mobile screens can hold pretty dense blocks of text.

I am fine with just considering hyphenation a thing of the past. Do we really think it adds value? I find it far easier to read text that doesn't break words in half.

So what if the right margin is a bit raggedy.

Respectively: that's your prerogative; yes; cool; typesetting matters, sometimes drastically, even if most of the time you don't even notice it.

I have several "books" on the web, for which hyphenation is a godsend, especially as CSS is not just for that computer monitor you're staring at, but also for `media="print"`, which yes: matters even in 2019.

Ok, well I personally find them a bit jarring, after years of mostly reading on the web where they are extremely rare.

I remember typing papers on manual typewriters and having to think about where to hyphenate (as well as just when to do a carriage return), and it was awful. Admittedly now it is more of an aspect of reading (since an algorithm can do the actual hyphenation), but still. I would prefer we just abandon it.

Could you explain why it matters more on a printed page than on a monitor? I don't see any relevant differences.

No, but therein lies the crux. You don't see any relevant differences, but there are billions of people who aren't you, and a good portion of them are what people who don't care about ragged vs perceived aligned edges call "a bit OCD" (even though of course it has nothing to do with OCD). They prefer nice, clean typesetting and layout, and enough of them are disturbed by ragged edges in "not novels" to not even bother reading more than a page. That's eyeballs, and opinions, lost over a typographical feature.

So for me, if the choice is between "maintain ragged edge, lost half my readership" (on what are already niche topics; how many people can possibly care about Bezier curves, for instance) and "justify the content, with hyphenation because holy shit justified text looks bad without it" and not lose that readership, it's a no-brainer.

Because that's primarily what you use it for: you use hyphenation in combination with justified text to make sure that the number of words per line of text ends up in the 14~16 range, without crazy long gaps in sentences that are forced to move words like "reconfiguration" or "interoperability" to the next line because they're not allowed to hyphenate them. Back in the typewriter days, with fixed letter spacing and separate pointsize discs/balls, that was an absolutely dire chore: 100% agreed that if at all possible, back then ragged edges were the way to go.

But the moment we got decent automated layout management through LaTeX (while plain TeX worked, it was also horrible) and these days XeLaTeX, and later on HTML+CSS, the "chore" part disappeared. You simply write your text, you turn on auto-hyphenation, and you don't give it a second thought until someone goes "hey this sentence looks really off", and then you fix that one sentence.

And then for novels, keep things ragged. The readership's used to it. But for web content, especially the kind of textbooks that can be printed, too, meet the people where they are, not just where you're comfortable.

"maintain ragged edge, lost half my readership"

Uhhh, really? Love to see the A-B test. Or anyone who is so "disturbed" by ragged edges that they'll stop reading and is willing to speak up about it. Sorry, but I think you are making that up. That is an absurd claim.

"But for web content, especially the kind of textbooks that can be printed, too, meet the people where they are"

Web content has been mostly without hyphenation for 20+ years. People ARE used to it.

(If you think I'm a kid with no concept of design: I'm 55 years old, have a degree in design, and was doing bezier and b-spline curves starting in the mid 80s)

I have to agree. Sounds like a lot of people here like it, but I find it confusing.

One example I see on this page is "be-fore". In my head it is "be. fore." and suddenly this word seems to have a lot more emphasis than it ought to have. And I also get distracted when it's unclear if a word was intentionally hyphenated or just auto-hyphenated. Maybe that's just the way that some of us process text.

This is down to practice. Most people who spent a few years reading books and newspapers stop noticing the hyphenation at all.

Maybe true, but for whatever reason they haven't been used much on the web, and I think the lack of hyphenation has proved itself to be preferable.

Sure, you can practice and learn to get used to it (the first half of my life there was no web, so I certainly did), but why should you have to? YOU CaN gEt usED To tHIS as wELl, wiTh somE PRActICE.

I'd like to see a study that actually measures reading speed and comprehension with and without.

Here's a good example of a place where having right justification was important for aesthetics, but they still chose not to hyphenate (presumably because it is just ugly): https://static.independent.co.uk/s3fs-public/thumbnails/imag...

It adds considerable value if the goal is not merely "getting some words on the screen" but rather "what is the best way I can possibly present this textual document?"

Personally, I find a very small proportion of text to fall into the latter category. But for those that do, clean presentation is huge.

Could you give me an example of a type of document that falls into that latter category?

Personally I don't think they make it look cleaner, but the opposite.

I imagine this can be combined with a media query so that you can use hyphenation differently on small screens when real estate might be lacking and the ragged edge takes up too much space.

Time to experiment!

Good to have this overview! I abandoned hyphenation in a project after seeing it behave so differently from LaTeX’s. Looks like I missed some options.

Related, I was looking at multi-column CSS for magazine or newspaper typesetting. The controls on column-breaking were not well supported, making it very difficult to not end up with ugly typography. Would love to see improvements there as well.

If you want real typesetting via CSS, you can use PrinceXML. It does great hyphenation, column breaking, and has some extensions to CSS to allow similar controls as you'd get from LaTeX's column options.

[0] https://www.princexml.com/doc/11/hyphenation/

[1] https://www.princexml.com/doc/11/floats/#float-extensions

Oops, I meant newspaper-like layouts for browser pages. But cool, I like the specific extensions PrinceXML has for column control. Thanks for the reference, will try it if I need this for printing.

>Automatic hyphenation on the web has been possible since 2011 and is now broadly supported. Safari, Firefox and Internet Explorer 9 upwards support automatic hyphenation, as does Chrome on Android and MacOS (but not yet on Windows or Linux).

If Chrome doesn't do it on Windows, then it might as well not exist.

Chrome has ~80% of the users, and Windows has 90% as well. So Chrome/Windows is the most important combo.

I wouldn't call this state "broadly supported" in any way.

First impression: I like the nuanced control over where hyphenation happens, but can't help but wonder whether giving the browser control over the intraword hyphenation candidate points is a good idea. Don't we have soft hyphen characters in Unicode for exactly the purpose of letting authors precisely specify hyphenation candidate break locations?

It's U+00AD, or &shy;. While occasionally helpful, it's a complete pain to manually add though.

Also I recently noticed Firefox seems to copy it, which is pretty annoying (didn't use to do that, I think).

Yeah. Removing this character when copying to the clipboard is definitely the right thing to do. Have you filed a bug?

It doesn't sound like a bug to me. If I copy and paste something I expect to get all the Unicode characters. Only the CSS should be stripped. If the browser silently deletes Unicode characters then it's certainly going to cause unexpected data loss in some cases. Firefox is doing the right thing.

I consider soft breaks to be metadata, not data. Dropping soft hyphens doesn't affect the meaning of text and so doesn't drop information that humans care about. If it's not data loss that copy and paste doesn't propagate bold and italic formatting, it's not data loss that it doesn't propagate soft hyphens. Not every Unicode code point conveys information that we need to preserve.

It's not data loss if the bold or italic are in CSS or HTML tags, but if it's 𝐛𝐨𝐥𝐝 or 𝑖𝑡𝑎𝑙𝑖𝑐 like this (which I realize is not how Unicode is supposed to be used) then it's part of the text, just like soft hyphens.

Soft hyphens are only "part of the text" because the hyphens are represented using "in band signaling" instead of "out of band" signaling, as in proper bold and italic. It's still just information about the text, not the text itself.

It's exactly because they're in-band signaling that they should be preserved. I don't expect copy and paste to change the sequence of Unicode code points. If the browser "helpfully" removes certain code points then it's more complexity I have to think about. The only change I'd consider acceptable would be changing to a specified Unicode normalization form. In this case the complexity already exists within Unicode, so the browser isn't making it worse.

How would this work? You cannot know the exact flow of text on each and every platform, so you would end up just manually specifying every possible hyphenation. That would seem to be a lot of work to create something that's almost definitely going to contain far more errors than a somewhat complete dictionary that ships with the browser.

What about words that the browser doesn't know about? While automatic hyphenation point insertion is convenient, I want to retain the ability to provide manual overrides.

Simple. Using a hyphenation dictionary, automatically insert soft hyphens at the optimal position in every word before delivery to the client. I haven't seen this done in practice, but I see no reason why it wouldn't work.

One problem for which there is no CSS solution yet is orphaned lines caused by floated images, when only the last line of a paragraph flows beneath the image. Particularly annoying with left-floated images in RTL text. I don't use floated images that often, but sometimes the design calls for it.

A bit late. Why only now do we get support for this basic text formatting primitive?

This has been supported since 2011.

Only hyphens: auto has, and only in Firefox. Chrome didn't add support until 2 years ago.

In my experimentation hyphens: auto doesn't give good results on its own; you need the other hyphenate-* properties to get that, and those aren't widely supported yet.
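
For reference, a rough sketch of what the additional controls look like, using the `hyphenate-limit-*` properties from the CSS Text Level 4 draft. The values are illustrative guesses, not recommendations, and many browsers will simply ignore the properties they don't implement:

```css
p {
  hyphens: auto;

  /* Only hyphenate words of 6+ characters, leaving at least
     3 characters before and 3 after the break. */
  hyphenate-limit-chars: 6 3 3;

  /* Allow at most 2 consecutive hyphenated lines. */
  hyphenate-limit-lines: 2;

  /* Don't hyphenate the last full line of the paragraph, or the
     last line before a column or page break inside it. */
  hyphenate-limit-last: always;

  /* Only hyphenate when the line would otherwise leave more
     than 8% of its width unfilled. */
  hyphenate-limit-zone: 8%;
}
```

Since unsupported declarations are dropped, this degrades to plain `hyphens: auto` wherever the extra properties are missing.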

Chrome on Linux and Windows has no support.

Also from the article:

"Safari, Firefox and Internet Explorer 9 upwards support automatic hyphenation, as does Chrome on Android and MacOS (but not yet on Windows or Linux)."

>The language of a webpage should be set using the HTML lang attribute: <html lang="en">

And if I want to have more than one language on the same webpage?

> And if I want to have more than one language on the same webpage?

Simple, add more than one lang="" attribute. You can add them to _any_ HTML tag.

Learn more about lang in HTML here: https://html.spec.whatwg.org/#the-lang-and-xml:lang-attribut...

And it can be really useful for styling purposes too - you can use CSS to style writing direction, font, text rendering settings, even which types of quotation marks to use for each language you want to specify.

Learn more about :lang() in CSS here: https://drafts.csswg.org/selectors-4/#the-lang-pseudo

Or see an example of how it can be put to use here: https://github.com/mozdevs/cssremedy/blob/master/quotes.css
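
A small sketch of how that can look in practice. The markup in the first comment is hypothetical, and the quote characters are just common conventions for each language:

```css
/* Assumed markup (hypothetical):
   <html lang="en">
     <p>English text with a <q lang="de">deutsches Zitat</q>.</p>
     <p lang="fr">Un paragraphe en français.</p>
   </html>
*/

/* Hyphenate each language with its own dictionary,
   where the browser ships one. */
:lang(en),
:lang(de),
:lang(fr) {
  hyphens: auto;
}

/* Quotation marks per language; these only show up
   on <q> elements (or generated open-quote/close-quote content). */
:lang(en) { quotes: "“" "”" "‘" "’"; }
:lang(de) { quotes: "„" "“" "‚" "‘"; }
:lang(fr) { quotes: "«" "»" "‹" "›"; }
```

Because `:lang()` matches on the inherited language, the `<q lang="de">` inside an English paragraph picks up the German rules without needing a class.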

> Simple, add more than one lang="" attribute. You can add them to _any_ HTML tag.

Thanks! Very helpful and thoroughly referenced :)

There's an ff ligature on the page that got hyphenated for me. ("differences" under section 1)

Does this impact screenreaders?

No. Screen readers ignore CSS text transformations like this.

Thanks, good to know.
