- A code point is sometimes not enough to determine the glyph. For example, U+5199 must look different in Simplified Chinese and Japanese. Typically this is handled by using different fonts, but more formally it should be marked with different lang attributes (in the case of HTML).
- Top-to-bottom writing mode is still pretty much in use in Japanese. HTML support is poor, but it's common in PDF. Caveat: Latin alphabets are rotated by 90 degrees (as explained in OP), but for punctuation, simply rotating the glyph isn't enough because the center line is going to be slightly off. You need special glyphs for rotated punctuation.
- Unlike Latin scripts, most Chinese and Japanese text is free to break a line at any point, but there are exceptions: contracted letters (e.g. ちょっ) cannot be split. So you'll end up treating each character as a separate word, with contracted letters treated as one word (see the sketch below).
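A rough sketch of the rule in that last bullet, assuming a naive break-anywhere model where small kana simply stay attached to the preceding character (real engines follow UAX #14 / kinsoku rules with many more cases; the kana list below is partial and only illustrative):

```python
# Illustrative only - not a real line breaker like ICU / UAX #14.
# Every character is a potential break point, except that small kana
# must stay attached to the preceding character.
SMALL_KANA = set("ぁぃぅぇぉゃゅょっァィゥェォャュョッ")

def break_opportunities(text: str) -> list:
    """Indices i where a line break before text[i] is allowed."""
    return [i for i in range(1, len(text)) if text[i] not in SMALL_KANA]

print(break_opportunities("ちょっと"))  # [3] - you may break before と, but not inside ちょっ
```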
Japanese hiragana chi, small yo, and small tsu.
Hiragana is a syllabary (not an alphabet!) composed of 46 sounds. Small hiragana can be used to modify the sound of the preceding character to expand the set of available phonemes. Here, "chi" + (small) "yo" becomes "cho".
Small tsu is special. It represents a doubled or geminated consonant.
For example, ちょっと (chotto) sounds like chot-to.
Japanese is a really fun language. If you've ever had an interest in learning it, I recommend it. Reading kanji is a joy in and of itself, because it has a pleasant inner logic that gets deeper the more you learn.
Tibetan is also in pretty desperate need of spelling reform. I’d love to hear replies about other terrible writing systems.
As far as Indo-European languages go, languages like English and French are quite conservative in spelling and underlying meaning, which means they do diverge from pronunciation but retain semantic clues which help the reader unlock the meaning of unknown words, while languages that continuously reform to try to match pronunciation shifts (e.g. German) discard this information. In addition, they have to pick one canonical pronunciation, and they typically still retain various homophone issues (e.g. how many ways can you write the sound of ä? I don't mean "ae").
Even as a kid (should be "easy to learn", right?) I struggled far more with the variants of the Devanagari alphabet even though it was so phonetic. But hundreds of millions of people use it every day without thinking about it.
For example, people routinely mispronounce the name of the President of China. (It’s closer to Shi than Zi.) Why? Because English has a strong preference for retaining native spellings even when the other language has a completely incompatible set of pronunciation rules. With Chinese this is extra silly because pinyin is only one of a million ways of romanizing Chinese and basically the least English compatible method.
English spelling is like a Hofstadter puzzle: to be able to master it, you must master the spellings of all languages plus English itself, recursively, going back to the Great Vowel Shift. It's not a good system.
It can’t really be called part of English in any meaningful way anyway.
If people who speak Chinese want to use pinyin, more power to them. But English speakers need to stop just copying other people's romanizations even when the romanizations do not connect to our spelling system at all.
This is objecting to the .01%--I find it hard to take seriously.
English has the really good feature that there is a phonic correspondence between spelling and pronunciation. It's not always one-to-one, and it's sometimes really weird, but it's generally there.
So, if someone sees Xi Jinping in writing, the utterance out of their mouth will be close enough for me to know what they are talking about.
However, even when someone completely botches the spelling of a word, there is almost always enough logic behind the error for the person looking at it to figure out the actual word meant. That's a really nice feature in a language. (For example--I have seen phlegm or subtle spelled in all manner of ways, but I generally could tell what was meant.)
By contrast, lots of the articles talking about Chinese highlight all the really common words that native speakers can't even cough up the kanji for because there is so little phonic correspondence.
It's absolutely not the case, and it's actually one of the most mocked features of English on the internet. The pronunciation of the name "Sean Bean" is a good example. As a non-native speaker, I have had a lot of pain learning the pronunciation of words I've only seen written.
In American English, you have at least one direction easy (from pronunciation to spelling) but not in British English. And going from spelling to pronunciation is really hard in both dialects.
But you can try. And it will kinda work.
If you pronounce "Sean Bean" as "seen bean" (most likely), "shawn bawn", or "say-an bay-an" (that's gonna require some thought for a native ...) people will scrunch their brow, think a bit, maybe chuckle, and have an idea who you are talking about--especially if they've been around you for a couple days.
English is remarkably error tolerant. You may not get it right, but you will get your point across.
With kanji you can't even try. This isn't about whether English is easy. It's about the fact that the simple act of going from pronunciation to kanji or kanji to pronunciation simply isn't possible at all.
(And choosing a formal name as an example is just asking for exceptions. Japanese, for example, has a whole class of kanji used basically for nothing except names--good luck pronouncing those ...)
I don't want to defend kanji/hanzi as a writing system. It's also in debt to history in a crazy way. But as a non-native reader of Japanese, you can often figure out the pronunciation and meaning of unfamiliar characters based on how they look. A large majority, maybe 90%, of characters have a meaning-part and sound-part, and once you know the common roots, you can get pretty far by just looking at things. You can also move to Korean and Cantonese pretty easily because those preserved the pronunciations of old Chinese pretty well. Ironically, Mandarin did a pretty bad job of preserving old Chinese, so it's harder to match up.
E.g. 楽 (music) is old Chinese nguk, Japanese gaku, Cantonese ngok, Korean ak, and Mandarin yue(??).
I don't know what your primary language is, but as a French person I can tell you that no, English isn't that error tolerant, and people don't understand what you mean if your pronunciation isn't good enough.
I still have traumatic flashbacks of my younger self desperately trying to buy water (pronouncing “wa” as in waffles and “ter” as in territory) in Canada when I was 15 (and really thirsty)… I think I spent a whole 5 minutes before a French Canadian arrived and saved the day.
Don't agree US English makes it easier, it's just differently idiosyncratic.
Extreme example - ghoti: Pronounced fish. The gh from tough, the o from women, the ti from nation. Works in American English. :p
If I hear in the US "cull err" and go to spell, I don't end up at color, hearing "sell fone" doesn't lead me to cell phone, etc.
Half the problems of English spelling are the three (four in US English, with Webster) significant attempts to simplify spelling over the centuries, and outsourcing some temporarily illegal printing to the Dutch.
Initial 'gh' can never be pronounced 'f', final 'ti' can never be pronounced 'sh', and 'o' is only pronounced 'i' in one word.
Indeed. Fixed, thanks
> Don't agree US English makes it easier, it's just differently idiosyncratic.
I found it easier when learning, but YMMV.
It's just that a lot of things I might have thought American English would take the opportunity to clean up remain. Like all those silent h's from having printed the first English bibles in Holland. Ghost, aghast etc. There used to be a lot more - ghospel, for one!
Well, even native speakers need to learn it the hard way as a child! As far as I'm concerned, I really struggled with French spelling when I was younger ;).
> Like all those silent h's from having printed the first English bibles in Holland. Ghost, aghast etc.
That's a really cool story. Thanks!
But the other way around isn't really bad: a spelling almost always has an unambiguous pronunciation. (There are exceptions, but they are few and far between.)
At least if you don't count ancient French, which is still prominent in place names and has a totally arbitrary pronunciation. As a little game for French speakers, try pronouncing the name of the city of «Meung sur Loire» ;).
Isn't that the point of Unicode?
To unify all text into a single character set so that it can exist side-by-side without messing with code pages?
Isn't this just code pages all over again?
If you add support for archaic systems you have to handle vertical bottom to top and boustrophedon too.
Edit: Some background can be read on .
Also, regarding HiDPI: I know that you just want to get rid of subpixel antialiasing, but you really don't want to get rid of antialiasing in general. It's not really an issue with typefaces, but non-antialiased high-frequency patterns can easily generate unwanted Moiré patterns on your display regardless of how high your display resolution is.
For example, most people expect antialiasing (i.e., subsampling across space) but we seem to have written off the idea of motion blur (i.e., subsampling across time), except in some games.
It’d be technically more accurate to draw a quickly-dragged mouse cursor with a (correctly computed) blur, but that’s not generally easy to render today. And motion blur of moving subpixel-antialiased text sounds like a nightmare right now, but with the right abstractions it might not be.
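To make the "subsampling across time" idea concrete, here is a toy sketch (1D, invented numbers, nothing like a real compositor): accumulate the cursor's coverage at several sub-frame instants and average them, the temporal analogue of spatial antialiasing.

```python
# Toy motion-blur sketch: average the cursor's 1D coverage over several
# sub-frame time samples instead of drawing it at a single instant.
def blurred_coverage(x_start, x_end, width=12, time_samples=8, cursor_w=2):
    frame = [0.0] * width
    for t in range(time_samples):
        # cursor position at this sub-frame instant
        x = x_start + (x_end - x_start) * (t + 0.5) / time_samples
        for px in range(width):
            if x <= px < x + cursor_w:       # pixel under the cursor right now
                frame[px] += 1 / time_samples
    return [round(v, 2) for v in frame]

print(blurred_coverage(1, 9))  # a smear instead of a single hard rectangle
```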
I tried reproducing the results using actual ClearType with black lines on white backgrounds, and "orange-blue" vertical lines appear blurrier than "black" vertical lines.
I use only grayscale antialiasing, even on LoDPI screens. It looks great. Subpixel adds bad color fringes.
Here's a library I've made for an embedded Linux device: https://github.com/Const-me/nanovg At 5" 800x480, subpixel AA improved text a lot.
Speaking of mercy, I wish Apple had disabled it only on hidpi displays and chosen a longer deprecation window for normal displays.
I prefer keeping displays at a distance slightly longer than my arm. This completely hides the slight blur of subpixel antialiasing.
Now a 1440p display has significantly degraded text rendering (even though other graphics look great), and if you want to use an external display, it must be 4K or 5K to not get a blurry, thin mess.
What I'd be curious about here, though, is what this says about large-scale software engineering. Text rendering has to be one of the most common activities across many sorts of computer programs, and I know pieces of the text rendering process are common examples in texts on object-oriented programming; indeed, the different rendering processes seem to suggest objects and interfaces readily. Yet the standard pipeline the author describes seems leakier than anything I saw twenty years ago when I dealt with such issues - nothing is solved on the software engineering side, the mess just grows.
Obviously, this is a product of adding new languages and new display models to the text rendering process, as well as standardizing the process so it accommodates different font approaches, etc.
But object orientation, as well as related models, promised, some time in the past, something akin to "encapsulate the process and adding complexity will be easier". Object-oriented programming has lost almost all its luster, but what are the alternatives? Could the pipeline be less leaky in functional programming, or something similar, or something different?
It just makes me curious.
At that rate, it seems like just about any messy, multilevel problem can be taken as inherently leaky and not something design paradigms can make easier. Maybe the solution is, "there is no solution" but as an optimist, it's hard for me to accept that.
(And yeah, I should have said "design paradigm", not programming paradigm).
When you're looking at text all day every day, having every single glyph take a massive step up in fidelity (4x the pixel budget!) is not to be sneezed at.
No longer is there much need to have a 200+ DPI difference between your display and what you’re printing.
Gone are the days of zooming way way in to a page-sized PS document - it’s just the size it will be for print.
For my girlfriend, this was super important.
I fired mine up a few weeks ago and simply couldn't believe that we used to use screens that low resolution and blurry. My eyes have become spoiled.
Try looking at any recent smartphone with an at least full-HD display ;)
To appreciate just how many pixels there are and how many pixels per glyph, take a screenshot and look at it on a computer monitor. It's kind of surprising.
Like Emacs, Framework was programmable in a Lisp-like scripting language called FRED. In fact you could attach FRED macros to any frame, and they would be saved along with your work.
All this on an 8088-based PC (or 80186-based Tandy 2000).
I mean, just because ~100 DPI looks okay isn't grounds to get stuck in a rut dictated by what was economical to mass produce in the 2000s.
If Microsoft adopted the Mac font rendering aesthetic and fixed their CFF rasterizer, we wouldn't need to worry about TrueType hinting anymore. But now, since your PDFs and Web fonts get viewed on Windows, you need to use TrueType outlines with Windows-friendly hinting even if you aren't using Windows yourself.
This is a serious revelation to me. Thank you. I've been having to switch between my Mac laptop, Linux workstation, Windows 10 workstation, and Windows 10 Amazon Workspace a lot. It's been frustrating seeing the small differences but being unable to really figure out what was going on.
The outline format of .otf comes from Adobe Type 1 fonts (.otf really is just .ttf with outline format taken from Type 1), so the .otf outline format is the older one. Apple created TrueType after licensing talks with Adobe failed long ago.
.otf is generally more compact in terms of font file size than .ttf. Really the only reason to stick with .ttf is the rasterizer situation on Windows and outlines traveling from Linux and Mac to Windows in PDFs.
I personally find Adobe's .otf renderer in FreeType the best compromise between aesthetically pleasing and legible and can only recommend using exclusively .otf fonts on FreeType platforms.
Microsoft claims there are over 1 billion Windows 10 installations (https://news.microsoft.com/bythenumbers/en/windowsdevices/) and that over half of all Windows installations are running Windows 10; so that means fewer than 2 billion Windows installations are in existence.
The numbers are pretty clear at this point.
> so that means that less than 2 billion Windows installations are in existence.
In what world is 3 billion out of 5 billion >= 90%? I repeat "Half sure, 90% no."
It was the only thing I could find that could do high quality text (ligatures, OpenType fonts, etc.), had native CMYK support, and could produce a print ready PDF or JPEG preview in milliseconds.
I can’t open source the code, but if anyone ever needs to embark on something similar I’d be happy to share what I learnt.
They are orthogonal concepts.
It doesn't look awful in Firefox, it looks perfect (it's version 26 of Firefox though.) This is what I see https://i.imgur.com/sjvqycv.png
Designers forgetting to include a bold version of their web fonts is one of my biggest bugbears. It always stands out, especially in Safari on iOS as you zoom in and out.
Or rather deliberately not including one to decrease download size.
Though OpenType variable fonts are a thing now; e.g. Inter (https://rsms.me/inter/) has a variable version.
Excluding bold copy on the page or using system fonts in the design seem like better alternatives for saving bandwidth. Or appropriate use of the font-display property. Otherwise I see this as choosing custom fonts and only half implementing them for a design.
And variable fonts still need two sets of outlines to interpolate between glyph weights, so they may not completely answer the problem.
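For illustration, a minimal sketch of what a variable font's weight axis does conceptually: a straight interpolation between two compatible "master" outlines. The coordinates below are invented; real fonts apply per-point deltas via the gvar/CFF2 machinery, but the idea is a lerp between matching points.

```python
# Hypothetical light and bold masters for one contour (made-up coordinates).
Light = [(10, 0), (90, 0), (90, 700), (10, 700)]
Bold  = [(0, 0), (100, 0), (100, 700), (0, 700)]

def interpolate(light, bold, t):
    """t = 0.0 -> light master, t = 1.0 -> bold master."""
    return [(lx + (bx - lx) * t, ly + (by - ly) * t)
            for (lx, ly), (bx, by) in zip(light, bold)]

medium = interpolate(Light, Bold, 0.5)  # a weight halfway between the two masters
print(medium)
```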
Obviously blurry and color-fringed; hard to believe anyone approved that artwork or finished product.
Canvas is just a bitmap, so when printing, pdf.js renders to a canvas at a certain DPI, which I believe is less than 300 DPI (150?), which still uses huge memory and ends up with fuzzy text.
You don't want bitmaps going to the printer for text and line art; you want vectors so they can come out at 600+ DPI while using minimal memory.
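Rough numbers for why full-page bitmaps hurt, assuming a US Letter page and 4 bytes per pixel (the pdf.js-specific DPI above is the commenter's recollection, not verified here):

```python
# Back-of-the-envelope memory for rasterizing one US Letter page (8.5 x 11 in)
# to an RGBA bitmap at various DPIs - vector output stays tiny by comparison.
for dpi in (150, 300, 600):
    w, h = int(8.5 * dpi), 11 * dpi
    mb = w * h * 4 / (1024 * 1024)
    print(f"{dpi} dpi: {w} x {h} px, ~{mb:.0f} MB per page")
# 150 dpi: ~8 MB, 300 dpi: ~32 MB, 600 dpi: ~128 MB
```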
This is what seems weird. Why is that? Also it's not like most printer drivers don't need to convert it back to ps or pdf before printing.
SVG is turned into the appropriate PS/PDF vector drawing commands by the browser print engine; canvas just gets sent as a bitmap in the printer language, since that's all that's left.
I believe pdf.js incorporated and modified https://github.com/gliffy/canvas2svg which implements the canvas api but instead creates an SVG dom.
Printing is a forgotten corner of the web; it would be nice if a mainstream browser implemented the full CSS print spec too, so we could create page-perfect output without relying on PrinceXml...
Related undesired complexities/heterogeneities I encountered while implementing a simple text drawing API on top of various libraries (cf. "Pain points", near the bottom):
God yes. I'm building a UI design tool, and running head-first into this reality. I had no idea it was this hard.
Looks correct in Firefox 69.0.1 on Windows here. In Chrome it looks awful as described. In both the "bow" at the top overlaps the text on the previous line.
People also might be (1% prob.) placed into a "holdback" experiment group which has the old behaviour in M77.
https://colorfonts.langustefonts.com/disco.html
What's more, Unicode has a lot of control characters that don't get rendered into glyphs but affect text rendering. These can be pretty tame, like space and non-breaking space - but some are rather nasty, like pushing and popping a stack of RTL-vs-LTR state (yes, it nests).
http://www.unicode.org/reports/tr9/
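A tiny illustration of those stateful controls (Python here just to show the code points; the actual reordering is the job of the Unicode Bidirectional Algorithm linked above):

```python
# The explicit directional embedding controls: RLE pushes right-to-left state,
# LRE pushes left-to-right state, PDF pops the last one - and they nest.
import unicodedata

RLE, LRE, PDF = "\u202B", "\u202A", "\u202C"
for ch in (RLE, LRE, PDF):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  ({unicodedata.bidirectional(ch)})")
# U+202B  RIGHT-TO-LEFT EMBEDDING      (RLE)
# U+202A  LEFT-TO-RIGHT EMBEDDING      (LRE)
# U+202C  POP DIRECTIONAL FORMATTING   (PDF)

# A nested case: an LTR run (LRE...PDF) inside an RTL run (RLE...PDF),
# inside otherwise left-to-right text. The controls are real characters
# in the string; they just never get a glyph.
nested = "abc " + RLE + "اب " + LRE + "def" + PDF + " جد" + PDF + " xyz"
print(len(nested))
```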
One of my favorite examples was Devanagari shaping. In the word ट्विटर (tvitar = Twitter), does the "i" matra shape before the "tv" or in the middle of it? In my sample, people accepted it either way (and you'll see both depending on the font), but there are lots of examples where one is just wrong.
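For reference, the logical (memory) order of ट्विटर: the vowel sign I is stored after the consonant cluster ट्व but drawn to its left, and it's the shaping engine that does that reordering.

```python
# Dump the code points of ट्विटर in logical order.
import unicodedata

for ch in "ट्विटर":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+091F  DEVANAGARI LETTER TTA
# U+094D  DEVANAGARI SIGN VIRAMA
# U+0935  DEVANAGARI LETTER VA
# U+093F  DEVANAGARI VOWEL SIGN I   <- stored here, rendered before the cluster
# U+091F  DEVANAGARI LETTER TTA
# U+0930  DEVANAGARI LETTER RA
```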
If I were to write it, I would write "tv" first, and draw an extended variant of "i" to cover "tv".
In text rendering, this translates to a custom ligature for an unexpected combo. I am not sure how feasible it is to add ligatures for every such combo.
In real life, as long as the reader "gets" it, they would assume that it is the right way to write it :)
Same reason I prefer to use imperfect terms that capture the important aspects of the problem space from an English-speaking perspective. Is "ligature" the right word for how Arabic and Marathi get shaped into glyphs? Maybe not, but as long as you get that the æ ligature can be synthesized from ae by a font, and that this is super important for some languages, you're on the right path.
I don't even know what the fragments I use mean, lol. I like to assume I'm just copy-pasting Arabic swears around. Apparently at least one is just Manish's name?
Yes, I noticed that. :) Both मनीष and منش are "Manish", though he didn't bother with the vowels in the Arabic-script version, so alternative readings are possible.
You may need to be able to read some scripts, but learning scripts is much easier than learning languages. Most text rendering experts I know seem to be bilingual or monolingual, but understand the mechanics of a lot more scripts (and can read a couple). Many of them are people who taught themselves about other scripts as they went along.
It's quite easy to talk about text from other scripts in a more clinical way without actually being able to read the script: I've often had text rendering discussions about the Perso-Arabic or Devanagari scripts with folks who can't 100% read the script, but know the mechanics of the script: you can totally describe things in terms of general categories like consonants and vowels (in both scripts they behave differently, an equivalent in the Latin script would be talking about letters and accent marks).
I once wrote https://manishearth.github.io/blog/2017/01/15/breaking-our-l... which goes through the various ways scripts deviate from Latin that most programmers should know. There's a lot that isn't listed there (which only folks working specifically on text would need to care about), but it's not hard to acquire that background to a level well enough to be effective.
As demonstrated in that post you can also "collapse" a lot of scripts together into one set of scripts with similar behavior. A lot of the weirdness in text shaping, for example, is covered by the Perso-Arabic script and any one Indic script. I like to say that there's a reason so many people involved in text shaping are Persians.
Personally, while this stuff isn't my dayjob, I can read around ten scripts to varying degrees of success but I know like ... a couple words from each language whose script I can read. It's not hard to learn to read a script, and as I mentioned you don't even need to be able to read them: If we're counting understanding the mechanics of scripts, my 10 balloons to a number I can't even count, because I can for example now include most Indic scripts. I've had productive conversations in Unicode spaces about e.g the Punjabi script without being able to properly read it.
I’d suggest “LCD antialiasing” to replace “subpixel antialiasing”. And regular antialiasing doesn’t really need a term, it was already established long before LCDs existed. AFAICT “Greyscale antialiasing” was made up only to differentiate regular antialiasing from LCD antialiasing.
Now, using “greyscale” to mean color, just not LCD color, that one doesn’t make as much sense to me. Some articles call it “whole-pixel antialiasing” or “traditional antialiasing”; those seem better, but maybe we can assume antialiasing is the regular kind, and only need a term to talk about LCD-style antialiasing.
It applies alpha to pixels of whatever size right? What's the sub- part?
The term subpixel is often referring to virtual pixels used to compute some final pixel value, as opposed to the LCD specific idea of a physical subpixel that’s red, green, or blue. Look around, for example, for discussions on subpixel resolution, subpixel positioning, subpixel animation, etc. Those are usually talking about the virtual kind used in traditional antialiasing, not the physical LCD subpixels.
Perhaps: subpixel-addressed anti-aliasing vs whole-pixel-addressed anti-aliasing.
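A sketch of "subpixel" in that virtual sense: estimate each pixel's coverage of a shape from a grid of sample points inside the pixel and use the fraction as alpha. The circle here is just an arbitrary stand-in shape, not anything from a real rasterizer.

```python
# Whole-pixel antialiasing via virtual subpixel samples: 4x4 points per pixel.
def coverage(px, py, inside, samples=4):
    hits = 0
    for j in range(samples):
        for i in range(samples):
            # sample point at the center of each virtual subpixel
            x = px + (i + 0.5) / samples
            y = py + (j + 0.5) / samples
            hits += inside(x, y)
    return hits / (samples * samples)

inside_circle = lambda x, y: (x - 4) ** 2 + (y - 4) ** 2 <= 3 ** 2
row = [round(coverage(px, 4, inside_circle), 2) for px in range(8)]
print(row)  # fractional edge pixels instead of a hard 0/1 step
```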
When playing around with FreeType's ftview demo program, text rendered with an OTF/CFF font using the Adobe CFF renderer with stem darkening enabled actually looks pretty good with grayscale rendering. Subpixel rendering is not strictly necessary.
I think the author mixes up two concepts here, hinting and subpixel positioning.
A TrueType font can change stuff around on the x- and y-axes. Applying hinting on the x-axis messes with your layout and prevents subpixel positioning on the x-axis. What you do is apply hinting on the y-axis only (FreeType calls this slight hinting; DirectWrite more or less does this -- not quite, but close enough) so you are free to shift glyphs around on the x-axis. This helps display text with a more even texture on LoDPI screens. On *nixes, Chrome does this, Firefox doesn't.
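A toy illustration of that split (the advance widths are invented; in practice the rasterizer's "slight" hinting does the vertical snapping, not application code): the baseline snaps to a whole pixel while horizontal pen positions keep their fractions.

```python
# Snap glyph vertical placement to the pixel grid (y-hinting) while keeping
# fractional horizontal pen positions so advances aren't distorted.
def place_glyphs(advances, origin_y):
    pen_x = 0.0
    baseline = round(origin_y)               # y snapped to a whole pixel
    positions = []
    for adv in advances:
        positions.append((pen_x, baseline))  # x keeps its fraction
        pen_x += adv                         # accumulate unrounded advances
    return positions

print(place_glyphs([6.4, 5.9, 6.1, 3.2], origin_y=20.3))
# x drifts through fractional positions; y stays locked to the pixel grid.
```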
These days, with bigger, better, bolder fonts, better renderers, and the departure from pixel-perfect stuff in favor of higher resolutions, subpixel AA is absolutely unnecessary.
Even on LoDPI screens I turn it off (assign none to lcdfilter in fontconfig) because the color fringes are so ugly. I always notice them.
Unless we completely abandon small font sizes or switch exclusively to high-dpi screens (will likely happen eventually, but we're not there yet), subpixel AA can look much sharper than grayscale AA if properly configured and you're not sensitive to the color fringes (personally, sitting at about an arm's length from my monitor, I don't notice them at all). And I'd rather not have text rendering quality suddenly downgraded on my existing peripherals before that happens.
The text I see looks fine, although the font is different.
>Here's what they look like in Chrome and Safari:
Running 76.0.3809.132 here on Mac OS and it looks very different from the picture. Bug partially fixed?
And it is ironic that an article about text rendering uses such low-contrast and small-sized text.
Can’t really fault the author here
Does the Reader View in Firefox help here?
The font settings are just from a copy of bootstrap from like 6 years ago, because I have absolutely no eye for this sort of thing.
Maybe it's the way Tom presents it, but I do get the distinct impression that (fully) dealing with time zones is an even more maddening incomprehensible quagmire of a task than rendering text as laid out in this article.
Just reading Microsoft's Michael S. Kaplan's blog (RIP) about Unicode, and how that plays into rendering, was daunting enough.
But this article really points out how ridiculous it can get. No thanks, I'll stick to the easy stuff like my current project, Angular 8 / .Net Core, and let Firefox handle the tricky bits.
I once tried to see the subpixel rendering using the Microsoft's magnifying glass tool. Got disappointed - the tool just disabled subpixel rendering. Globally, not just the area being magnified.
P.S.: because subpixel effectively gives you 3x horizontal resolution, it looks terrible to still fit glyphs to a whole-pixel grid horizontally. Check the Anti-Grain Geometry article on font rendering for more cool font rendering bits; I recall it being very interesting.
Gotta love the sharpness of it somehow.
I was pretty disappointed the original article didn't talk much about hinting.
Even old operating systems did a great job at type hinting to align glyphs to pixels (even when antialiasing is on). "Modern" operating systems (especially MacOS after Mavericks) assume retina and throw hinting to the wind; the consequence is pervasive blurry text even on retina displays.
Then the errors in color would cancel out, much like in serpentine dithering. You could do the same with a Pentile pixel layout like this:
Either way, instead of seeing color fringing, I imagine it would give a grainy appearance similar to what dithering looks like.
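A rough sketch of that cancellation idea, purely hypothetical (no real panel or driver works this way): sample coverage at 3x horizontal resolution, assign the samples to subpixels in RGB order on even scanlines and BGR order on odd ones, so an edge that fringes one color on one row fringes the complementary color on the next.

```python
# Hypothetical "serpentine" subpixel AA: flip the subpixel order on odd rows
# so the color error alternates sign row to row and tends to average out.
def lcd_row(edge_x, width, y):
    pixels = []
    for px in range(width):
        # coverage of the three horizontal thirds of this pixel (1 = inked)
        thirds = [1.0 if px + (i + 0.5) / 3 >= edge_x else 0.0 for i in range(3)]
        if y % 2 == 1:
            thirds.reverse()            # BGR on odd rows instead of RGB
        pixels.append(dict(zip("RGB", thirds)))
    return pixels

print(lcd_row(edge_x=1.5, width=3, y=0))  # edge lands mid-pixel: partial fringe
print(lcd_row(edge_x=1.5, width=3, y=1))  # same edge, mirrored fringe color
```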