An end to typographic widows on the web (clagnut.com)
207 points by saeedesmaili 11 months ago | 88 comments



Having some background in typesetting and typography, I don’t think text-wrap: balance will be as good as the human eye for headlines in the near term.

In a sentence like “One year on and what next for remote working?”, “what next” is a stable phrase and breaking it up is jarring. Either of the below reads better:

> One year on

> and what next

> for remote working?

or:

> One year on and what next

> for remote working?

(On the Web you can achieve those with non-breaking spaces, word joiners, and other similar HTML entities.)

— As a hard rule with rare exceptions, don’t break after conjunctions or short modifying words such as “and”, “the”, “on”, etc.—carry them to the next line.

— As a more vague rule of thumb, do not break after a word that is tightly coupled to the next one (this includes stable phrases and short idioms, adjective-noun combinations, and so on), unless intentionally for word play.

Just one of those small things that together make for clean and readable headlines and GUI copy.
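
The hard rule above can be approximated mechanically. A minimal sketch, assuming a hand-picked word list (`SHORT_WORDS` and `bindShortWords` are hypothetical names, not from any library): the space after each listed word is replaced with a non-breaking space, so the word is carried to the next line. The list deliberately leaves out "on", since in "One year on" it ends a phrase; a fixed list cannot tell the two uses apart.

```typescript
// Hypothetical list of short words that should never end a line.
const SHORT_WORDS = new Set(["a", "an", "and", "the", "or", "for", "of", "to"]);
const NBSP = "\u00A0"; // NO-BREAK SPACE

// Replace the space after each short word with a non-breaking space.
function bindShortWords(headline: string): string {
  const words = headline.split(" ");
  let out = "";
  words.forEach((word, i) => {
    out += word;
    if (i < words.length - 1) {
      out += SHORT_WORDS.has(word.toLowerCase()) ? NBSP : " ";
    }
  });
  return out;
}

// "and" and "for" are now glued to the word that follows them.
console.log(bindShortWords("One year on and what next for remote working?"));
```

This only covers the hard rule; the vaguer rule about stable phrases needs either manual markup or some kind of language analysis.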


I don't have the same background, but it drives me nuts when I see phrases word-wrapped on awkward boundaries that don't make sense (especially in h1 elements, titles, headers, billboards, posters, etc etc etc).

I'm just going to re-emphasize what you wrote because it's exactly the problem and I want to change the formatting a bit to make it clearer. In the article, the author shows that the headline with text-wrap:balance would look like this:

    One year on and what
  next for remote working?

which breaks right in the middle of "what next". Much better (to my eyes) is wrapping the line along some larger syntactic boundary:

  One year on and what next
     for remote working?

We have the technology to analyze sentences and figure out the syntactic structure. Preferring to break those sentences across larger parts of the syntax tree would make this text-wrapping property so much better.


Thanks for the illustration!

I suppose the technology for more accurate automated line breaks should indeed already be available, though I don’t think it’ll be applied in this particular case any time soon, so we’ll have to endure some awkward (even if a bit less ugly now with text-wrap: balance) word wrapping in the meantime.


ChatGPT can do it. Perhaps this won't be viable in-browser for a while, but I could see somebody using it in a static site build pipeline to convert space characters to `&nbsp;` at compile-time.
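
The build-time step could be as small as the sketch below. It assumes the headline has already been split into unbreakable phrases by whatever means (a model, a human editor); `joinPhrases` is a hypothetical helper name. Words inside a phrase are joined with non-breaking spaces, and phrases are joined with ordinary spaces, so the browser may only break between phrases.

```typescript
const NBSP = "\u00A0"; // NO-BREAK SPACE

// Join the words of each phrase with NBSP, and the phrases with normal spaces.
function joinPhrases(phrases: string[]): string {
  return phrases
    .map((phrase) => phrase.trim().split(/\s+/).join(NBSP))
    .join(" ");
}

const headline = joinPhrases([
  "One year on",
  "and",
  "what's next",
  "for",
  "remote workers?",
]);
console.log(headline);
```

When writing the result into HTML, the character can be emitted as-is or spelled `&nbsp;`; both produce the same text.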

---

Break the following headlines up into individual strings where each string is an idiom that shouldn't be broken up. If words aren't part of an idiom, they can be their own string.

"One year on, and what's next for remote workers?"

"Hurricane Ida Gives New York What For!"

"Ice Town Costs Ice Clown His Town Crown"

---

"One year on" "and" "what's next" "for" "remote workers?"

"Hurricane Ida" "Gives" "New York" "What For!"

"Ice Town" "Costs" "Ice Clown" "His" "Town Crown"


> ChatGPT can do it.

This is truly the kind of suggestion that makes me wonder about the energy efficiency of websites.

It seems to me that many web developers/designers are so happy that what they are doing actually works that they often forget other (more boring?) considerations like stability, maintainability, resilience, and efficiency.

We have a LEGO set with infinite pieces and a good website in my eyes doesn't try to use as many as possible of those, but to use just as many as needed while still getting what you aim for.


Additional thought: The problem space of how to break a specific text depending on the size of a window is sort of limited as well. So if someone had the idea to solve this using machine learning, the best approach would be to run these calculations once (per page, server-side), and then just use a lookup table or something of the sort afterwards. Just store the line break positions for each text on the page for all possible widths that make sense and check that table when rendering.

This has the advantage that a human typesetter could output to this format as well or they could correct the machine generated output.

As there is a limited number of ways to break text the table wouldn't even have to store all pixel widths, just those at which the words break differently.
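
A rough sketch of such a table, under stated assumptions: the pixel thresholds are made up, and `BreakEntry`, `HEADLINE_BREAKS` and `linesForWidth` are hypothetical names. Each entry records the smallest width at which a given set of line breaks applies, so only the widths where the breaks change need to be stored.

```typescript
// One row per distinct way of breaking the text.
interface BreakEntry {
  minWidth: number; // smallest container width (px) at which these lines apply
  lines: string[];
}

// Hypothetical precomputed table, sorted by ascending minWidth.
const HEADLINE_BREAKS: BreakEntry[] = [
  { minWidth: 0, lines: ["One year on", "and what next", "for remote working?"] },
  { minWidth: 320, lines: ["One year on and what next", "for remote working?"] },
  { minWidth: 640, lines: ["One year on and what next for remote working?"] },
];

// Pick the last entry whose threshold does not exceed the available width.
function linesForWidth(entries: BreakEntry[], width: number): string[] {
  let chosen: BreakEntry | undefined;
  for (const entry of entries) {
    if (entry.minWidth <= width) {
      chosen = entry;
    }
  }
  return chosen ? chosen.lines : [];
}

console.log(linesForWidth(HEADLINE_BREAKS, 500));
```

A human typesetter could edit the same table by hand, which is the advantage mentioned above.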


I think it needs to be done while the browser is flowing text, because it should look good with nearly any width, like:

  Ice Town Costs Renowned Ice Clown His Town Crown
                     and Gown
vs

  Ice Town Costs Renowned Ice Clown
      His Town Crown and Gown


Thanks for expanding on my point. Yes, LLMs or plain algorithms, the capability is here. It just won’t be used for this.


I've long been bugged by the lazy approach to plurals/multiples in e.g. game tooltips: "You loot 861 gold(s)". I wonder if we'll ever see improvements on either front.


It's even more complicated when you try and localize this. Some languages make pluralization very complicated.

A few years ago, one of the localization team members gave me a tip I have been trying to stick to this time around.

Instead of "You found 1 coin." or "You found 2 coins." you turn it into an enumeration "Coins Found: 1"

The localization of the string part is the same for all languages.
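
A small sketch of that tip, assuming a per-locale label table (`COINS_FOUND_LABEL` and `coinsFound` are hypothetical, and the German label is only illustrative). Because the number sits after the label rather than inside a sentence, no plural form ever has to be chosen.

```typescript
// Hypothetical label table: one string per locale, no plural variants needed.
const COINS_FOUND_LABEL: Record<string, string> = {
  en: "Coins Found",
  de: "Gefundene Münzen",
};

// Format "Label: count"; fall back to English for unknown locales.
function coinsFound(locale: string, count: number): string {
  const label = COINS_FOUND_LABEL[locale] ?? COINS_FOUND_LABEL["en"];
  return `${label}: ${count}`;
}

console.log(coinsFound("en", 1));
console.log(coinsFound("en", 861));
console.log(coinsFound("de", 2));
```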


Combining text-wrap: balance with non-breaking spaces and word-joiners seems to be a good way to go. Also `&shy;` is good to know.


> non-breaking spaces, word joiners, and other similar HTML entities

Non-breaking spaces in some fonts are actually a different size than normal spaces, and most copy doesn't contain entities. Better for all if we can adjust the display of the text without programmers changing content.


> adjust the display of the text without programmers changing content

It’s this awkward and awesome place where content and appearance meet. Whoever wrote the headline probably knows the exact meaning they wanted to convey, and ideally should have control over where the line would wrap just as they would expect to have a say in word emphasis or punctuation.


> most copy doesn't contain entities

It doesn't need to contain entities. GP is not correct to refer to these characters as "HTML entities", which are just ways to conveniently express the characters in HTML. In fact these characters are Unicode characters just like all the other characters in the copy.
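
To make the distinction concrete, here is a minimal sketch (the `decodeEntities` helper is hypothetical and handles only three spellings): each entity is just an HTML spelling of a single Unicode character, and once decoded the copy holds the character itself.

```typescript
// HTML spellings mapped to the single Unicode character each one denotes.
const CHARACTERS: Record<string, string> = {
  "&nbsp;": "\u00A0", // NO-BREAK SPACE
  "&shy;": "\u00AD", // SOFT HYPHEN
  "&#8288;": "\u2060", // WORD JOINER (numeric character reference)
};

// Replace the three spellings above with the characters they stand for.
function decodeEntities(html: string): string {
  return html.replace(/&nbsp;|&shy;|&#8288;/g, (entity) => CHARACTERS[entity] ?? entity);
}

const decoded = decodeEntities("what&nbsp;next");
console.log(decoded.length); // 9: four letters, one character, four letters
console.log(decoded.codePointAt(4) === 0xa0); // true
```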


Off topic(?) but, "what next" in that sentence at all seems jarring. English is my first language so I could be wrong, but should it be "what's next" or "what is next"?


Those two choices sound like statements: 'here is what is next for remote working.' It's instead asking where remote working will go, and 'what next' is the expected form there from my experience, though it's usually seen on its own as an exclamation. Contrived example: "I've seen my boss, the Mayor, and the Governor walk through here! What next? The president?"


I don't know, that still feels wrong. The "What next?" in the contrived example is "What [will I see] next?" The "what next" in the title is short for "what [is] next," which feels specifically made for "what's next."

Though, in general, just a too long title that doesn't flow well without hints for where the pauses and inflections should be.


I tried to refrain from more radical rephrasing, but yes, it’s not unwarranted.


> I don’t think in near term text-wrap: balance would be as good as human eye for headlines

It's apparently very good for reading comprehension.

I've had a lot of training in this area, and while I don't understand why it is, all of the materials I've read — especially in the last ten years — say that orphaned words make sentences harder to understand.


I agree that it’s better than nothing—merely saying that, based on the examples, it’s not as good as the [more cumbersome] manual specification of preferred text wrapping points.


If that's true, then it sounds like an excellent application for AI. Too bad I can't think of an existing model that you can just throw that problem at. There also seems to be next to no money in it for anyone looking to solve it that way.


Might make layout quite energy intensive, especially if this is applied to long body text—and it would certainly be done on the web if it gets as easy as setting a CSS property!


> As a hard rule with rare exceptions, don’t break after conjunctions or short modifying words such as “and”, “the”, “on”, etc.—carry them to the next line.

That's interesting because my inclination is that "One Year On and What Next for Remote Working?" works best because "and" and "for" make it clear to me that the line is continued and that the current line isn't a complete clause.


Interestingly it appears ChatGPT is somewhat adept at this. Asking for the best ways to line-break this title, it suggests:

    "One year on and what next for remote working?" (original)

    "One year on and what next for remote
    working?"

    "One year on and what next
    for remote working?"

    "One year on and
    what next for remote working?"


Its first suggestion is the worst possible way to line-break it, though. A hypothetical automated system that used ChatGPT to do this would need to somehow know to ignore the first suggestion…


>Widow

>A paragraph-ending line that falls at the beginning of the following page or column, thus separated from the rest of the text. Mnemonically, a widow is "alone at the top" (of the family tree but, in this case, of the page).

>Orphan

>A paragraph-opening line that appears by itself at the bottom of a page or column, thus separated from the rest of the text. Mnemonically, an orphan is "alone at the bottom" (of the family tree but, in this case, of the page).

>Alternately, a word, part of a word, or very short line that appears by itself at the end of a paragraph. Mnemonically still "alone at the bottom", just this time at the bottom of a paragraph. Orphans of this type give the impression of too much white space between paragraphs.

https://en.wikipedia.org/wiki/Widows_and_orphans

In this case the author is referring to the last definition, short lines at the end of a paragraph.


The traditional name for widow in German typography is Hurenkind, literally "child of a whore".


The traditional name for orphan in German typography is Schusterjunge, literally "shoemaker boy".


Same in Sweden (or at least used to be) - horunge. If you have a widow on a new page that’s a dubbel horunge.


> Mnemonically (…)

The one that made me remember which is which was something like “a widow continues alone while an orphan is left behind”.

In practice it makes little difference if you mix up the terms, since in context the problematic line will be visible.


Thanks, I kept thinking it was a typo in the title!


Yes this article is about orphans, not widows. However some people use the terms interchangeably.

Widows don't exist without page or column divisions.


> Algorithms such as Knuth-Plass won’t necessarily eliminate widows and orphans, but might go some way to doing so. The reluctance to using such approaches is understandable, however, as they can be extremely demanding: the processing requirements increase quadratically with the paragraph length.

Knuth-Plass isn’t quadratic. This post has some good explanations:

https://github.com/jaroslov/knuth-plass-thoughts/blob/master...



Jeremy Gibbons et al. invented this; I just wrote this up because Jeremy's a really amazing scientist & I think his work should be appreciated more.


This solves a real problem for me. In the past I've used <wbr> with white-space:nowrap set on the parent to help choose logical breaks in text, but it's tedious and requires DOM access.

A simple CSS rule to automatically calculate this is very welcome.

I'm not sure I like the name "pretty" for the second rule though. If they have to expose the algorithm (first-fit vs Knuth-Plass), I'd rather they choose more descriptive names.
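
For anyone unsure what the two families look like, here is a rough sketch on character counts. `firstFit` is the usual greedy approach. `totalFit` is a simplified minimum-raggedness dynamic program (it minimises the sum of squared leftover space over all lines, including the last), not Knuth-Plass itself: there is no hyphenation, no stretchable glue and no penalties. Both function names and the width of 40 characters are assumptions for illustration.

```typescript
// Greedy: put each word on the current line if it fits, else start a new line.
function firstFit(words: string[], width: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : current + " " + word;
    if (current === "" || candidate.length <= width) {
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}

// Simplified total-fit: choose all breaks at once to minimise squared slack.
function totalFit(words: string[], width: number): string[] {
  const n = words.length;
  // best[i]: minimal cost of laying out words[i..]; next[i]: start of the following line.
  const best: number[] = new Array(n + 1).fill(Infinity);
  const next: number[] = new Array(n + 1).fill(n);
  best[n] = 0;
  for (let i = n - 1; i >= 0; i--) {
    let lineLength = -1; // cancels the space counted before the first word
    for (let j = i; j < n; j++) {
      lineLength += words[j].length + 1;
      if (lineLength > width && j > i) break; // overflow: stop extending this line
      const slack = Math.max(width - lineLength, 0);
      const cost = slack * slack + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }
  const lines: string[] = [];
  for (let i = 0; i < n; i = next[i]) {
    lines.push(words.slice(i, next[i]).join(" "));
  }
  return lines;
}

const words = "One year on and what next for remote working?".split(" ");
console.log(firstFit(words, 40)); // ["One year on and what next for remote", "working?"]
console.log(totalFit(words, 40)); // ["One year on and what", "next for remote working?"]
```

The inner loop stops as soon as a line overflows, so the work grows with the number of words times the words that fit on one line rather than with the square of the paragraph length. Note that the balanced result still splits "what next", which is the complaint raised elsewhere in this thread.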


You can also use &nbsp; between the penultimate and the ultimate word, that doesn’t require additional styling.


But that alters the content, and copy/paste would carry that forced pairing into a different line-length context.


In macOS, you can achieve this in plain text with [option]-[space]. Helpful in Markdown — although, of course, you can use `&nbsp;` in Markdown, as well.


This is my preferred method (now!). It's also very simple to write a couple of lines of JavaScript to target P and H elements and replace the last space with a non-breaking space.

But a CSS method is very welcome!
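
The core of that script is a one-line string replacement. A minimal sketch (the name `preventOrphan` is hypothetical): it swaps the last run of whitespace in a string for a non-breaking space, so the final two words always wrap together. It works on plain strings, so it can run at build time or in the browser.

```typescript
const NBSP = "\u00A0"; // NO-BREAK SPACE

// Replace the whitespace before the final word with a non-breaking space.
// Trailing whitespace is dropped; single-word strings are returned unchanged.
function preventOrphan(text: string): string {
  return text.replace(/\s+(\S+)\s*$/, NBSP + "$1");
}

console.log(preventOrphan("An end to typographic widows on the web"));
```

Applying it to elements that contain inline markup needs more care than this, since the last space may sit inside a child element.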


Please do this at build time instead of shipping to clients. You'll slow down page rendering, introduce a repaint, and now require a script. Considering all browsers will be doing the exact same execution, it's needlessly wasteful.

Same thing applies to client-side syntax highlighting and LaTeX.


Designers: "Why does this text look different in Firefox? Make it look like Chrome."


lol yes, that's definitely going to happen. However, in this case, I think it's warranted. I've had so many incidents with designers, where they don't like how a line wraps, and want it to break at a more reasonable point. If the text is static, just toss in a <br> and you're done. But with dynamic text, it's just not practical at all to try to fix, so I tell my designers that they have to live with it.

I'm excited about this feature!


Even for static text, line breaks (<br>) can cause unexpected results unless you check at varying viewports. Non-breaking spaces may provide a better option, depending on the specifics of the situation.


Yep, exactly. Though the sheer complexity of the problem makes me really curious how this is implemented under the hood. I'm surprised there aren't parameters devs can use to modify this behavior.


Leaving stuff like this to be interpreted by the User-Agent is simply a tragedy.


This is why I have "Best viewed in Netscape 800x600" button on my page.


I dunno… I kind of miss the days when every site wasn’t statically pixel perfect and text just sorta flowed around and did something sensible based on your browser window size, zoom level, etc. It allowed people to focus on the content over the presentation. But I can also understand the apprehension: it feels like a recipe for arbitrary fragmentation.


I wonder if it is acceptable now because this is bound to be pretty dynamic across viewport sizes and resolutions. So perhaps they are allowing browsers to pick an algorithm they find best suited. Changing the algorithm afterwards will probably not break anything that wouldn’t be broken by a user shrinking their window.


Just sharing that there is a react component I have used called "React Wrap Balancer" from Vercel that does something similar and works today:

https://vercel.com/blog/react-wrap-balancer

It does add another dependency which I am not fond of.


It's 2023, the AI winter is over, and we still don't have Knuth-Plass in our web browsers or Kindles!

I'm happy to read that it might finally happen with text-wrap: pretty.


Winter is coming.


> What this is NOT is control over widows and orphans

> this isn’t an approach you would take to prevent widows at the end of paragraphs

Title is misleading, great to see progress here though.

It can be difficult to get designers to accept fluidity in text however, especially headings. This is "determined by the rendering engine rather than any [...] CSS specification" so I'm concerned this will bring back bug tickets saying headings appear differently across browsers.


Until this happened, I've been using https://github.com/adamjgrant/Buddy-System


This is brilliant news and frankly should be the global default for headings if we were to start over - hopefully browsers will adopt this ASAP.


I remember how this niche-but-important topic was handled programmatically in the early days of blogging.

Abstraction was sometimes provided in blogging software. For example, in Textpattern's TXP Tags there's a no_widow attribute that's been there since pre-2008 I think?

https://docs.textpattern.com/tags/title

It's funny to remember all the typographic fixes that effectively took place in PHP due to CSS solutions being planned but not ready yet. I'll bet a lot of them are still functioning in various sites out there.


I sometimes really hate how text wraps in browsers. It really limits beautiful designs. What is the current state... are there JavaScript libraries that could lay out text? Or are the primitives not exposed enough to control the spacing and line breaks?

Every time I use TeX I just marvel at how well it typesets text, and fight with floating figures :)


There are JS plugins for it, example: https://opensource.adobe.com/balance-text/

There are also the CSS properties `widows` and `orphans`, they behave a bit differently than "the text is rendered so that the amount of text on each line is about the same."


It's funny, I use Firefox on my phone and both of the examples ("normal" and "balanced") look exactly the same to me[1], with the line break between "savvy" and "businesses", but I gather from the text that they're not supposed to look the same. I get that the article is talking about blink browsers but it's also assuming that everyone has one. I suppose they should have put an image there instead of having the browser render it.

1: https://imgur.com/a/u2N3i9q


You're looking at the wrong part, the article refers to the title, where the widowed word is "working" on the bottom example


The image you linked to is displaying correctly. It's the large headline that is wrapped differently, not the small body text.


The difference is in the headline, not the body text.


We need a rule to stop breaking up the words and their determiners too. (Wife is a typographer)

Something like: « Wikipedia is a

multilingual free online

encyclopedia written and

maintained by a

community of volunteers »

Would be better as: « Wikipedia is a multilingual

free online encyclopedia

written and maintained

by a community

of volunteers »


Chrome discussion: https://groups.google.com/a/chromium.org/g/blink-dev/c/f5eLz...

Apparently no evidence that Safari/Firefox will be implementing this any time soon. So don't get too excited unless you're fine with it only working for Chrome users.


I built something similar for myself a few years back even adding colors to ensure faster reading.

Demo: https://reading.ashishb.net/v1/readable/aHR0cHM6Ly93d3cudGhl...


Great timing. I was actually just complaining about this a couple weeks ago.

Wonder how long this will take to land in the OBS browser where I actually needed it.


1) I have so much trouble caring about issues like "typographic widows".

2) Wait, if this is actually important, why isn't it enabled by default instead of being yet another obscure CSS thing we need to know about?

... I realize these thoughts are somewhat contradictory.


If it were enabled by default then it would potentially change how older webpages were intended to render.


I remember seeing someone linking a React component whose sole purpose was to balance text. Can't remember the name of it though...


CSS and browser engines are increasingly complicated. Why not serve PDF and be done with it? Why reimplement complex typesetting, multiple times in different browsers, and why does it all have to happen on the client side if you want pixel-perfect display anyway? This seems like an example of stretching a decision made decades ago far beyond the point it makes sense.


In order to be responsive to different widths: to work on a small phone screen and on a 30" monitor.

Also, PDFs have page breaks. And they're not interactive (usually).


I still remember when CSS was such a promising thing... And now it's a wreck.


Is there really no fast "good enough" algorithm for breaking body text?


How I would’ve loved to have this available when I worked at The New Yorker!


Tangentially, I really want to see hanging-indent not be limited to WebKit


reminds me of this wonderfully named library from yesteryear - https://github.com/mogelbrod/widont


Not exactly an "end" but appreciated nonetheless


this seems like something that should be left to end users to come up with a solution for. I would guess that a large percentage of web designers don't even know what this is, so adding it to the spec itself is just further bloating an already bloated spec.

one thing I like about the Go language team is that they are not afraid to say "no" to proposals. it seems the W3C forgot this tactic long, long ago, and just rubber-stamps anything coming from the Chrome team. Sad.


Any web designer who has studied the minimum amount of typography would know this. And nearly every graphic designer knows this. Almost no web designers or graphic designers do this on the web, because they can’t. One person’s bloat is another’s fundamentals.


Any web designer with any degree of training in typography and text layout will recognize this as a problem. It's just that there's almost nothing we could do about it without CSS support, so everyone was forced to simply accept it.


How could end users reasonably solve this?

You can’t know exactly how text will be laid out, so I can only imagine using… canvas and doing your own font rendering? Which sounds horrible for all number of reasons.


More importantly, why should they?

I mean, yeah, I guess giving end users that level of control is a good thing, just like how they can decide the default font style of their browser.

But it's not the user's job to fix your web page.

How text is rendered on a page is a result of intention. The web developer should be able to implement the design intent of a body of text and either allow it to widow where it makes sense or force the text to balance. The problem is that the tool for that simply doesn't exist without some JavaScript foolery, not that the user can't toggle something to make it happen.

> so I can only imagine using… canvas and doing your own font rendering? Which sounds horrible for all number of reasons.

It's both horrible and not.

Canvas can be used merely to make appropriate calculations for text given that it is aware of the size of any font it is using. Rather than rendering with a canvas directly, it could be used to efficiently determine what the width of an element containing text should be to make it "balanced", and maybe where to stick `<br>`.
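
A rough sketch of that calculation, under stated assumptions. The text measurer is passed in as a function: in a browser it could wrap the canvas 2D context's `measureText(text).width`, while here a fixed 10px-per-character stand-in (`monospace`, hypothetical) keeps the example self-contained. `balancedWidth` binary-searches for the narrowest width that still wraps to the same number of lines as the full width.

```typescript
type Measure = (text: string) => number;

// Count the lines a greedy wrap produces at the given width.
function countLines(words: string[], maxWidth: number, measure: Measure): number {
  let lines = 1;
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (current !== "" && measure(candidate) > maxWidth) {
      lines += 1;
      current = word;
    } else {
      current = candidate;
    }
  }
  return lines;
}

// Narrowest integer width (px) that keeps the line count seen at maxWidth.
function balancedWidth(text: string, maxWidth: number, measure: Measure): number {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const target = countLines(words, maxWidth, measure);
  // No width narrower than the widest single word is useful.
  let lo = Math.min(Math.ceil(Math.max(...words.map(measure))), maxWidth);
  let hi = maxWidth; // invariant: hi always wraps to `target` lines
  if (countLines(words, lo, measure) === target) return lo;
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (countLines(words, mid, measure) === target) {
      hi = mid;
    } else {
      lo = mid;
    }
  }
  return hi;
}

// Stand-in measurer: every character is 10px wide (an assumption for the demo).
const monospace: Measure = (text) => text.length * 10;

console.log(balancedWidth("One year on and what next for remote working?", 400, monospace)); // 240
```

Setting the element's width to the returned value lets the browser's normal wrapping produce the balanced lines, so the text is still rendered as real text rather than drawn on a canvas.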

This has basically nothing specifically to do with widowing, but I once implemented the approach of using canvas to do calculations for actual page text. The goal was to make it so that a given text would always fit the size of its containing element no matter how long the text. In other words, if the container has a fixed height and width, like if you wanted for whatever reason to render the Declaration Of Independence in a 300x300 div, it will find the correct font size to squeeze that whole thing into that div.

https://codepen.io/Ravenstine/pen/QdRYeq

Of course it will get slower the more text it has to calculate. Kinda tempted to see if WASM speeds it up any.

I think it's a poor idea to try to polyfill text balancing in that way, but I think it can be done and in a way that doesn't sacrifice rendering actual text.

EDIT: I realized I contradicted myself.


It's not my problem to solve, that's the whole point. It's a niche enough need that it shouldn't even be part of CSS. People who have a need for it should figure out some solution on their own, not force their complication onto the entire spec itself.


This isn’t niche at all, this has been a problem on many of the sites I’ve worked on. And the comment above is right, this is a rendering engine issue, there’s not really much you can do as an end-user.


Just stay in the back-end. Then you don't have to deal with these problems. Seems like you are having an opinion on an issue that doesn't really concern you for a reason that I can't seem to identify. Would you care to elaborate on why you care about the CSS typography spec?


thanks for your comment. I will make sure to never comment on a thread again, unless I am a world expert on the subject.


This can't be solved well by the "end user", since it'd have to be done after the page loads.


Yes! Finally!



