Harir – Reducing Noise in Arabic Script (typotheque.com)
129 points by flannery on Nov 24, 2015 | 46 comments

For comparison, I just pulled up http://arabic.cnn.com/

The headlines are a bit stylized and difficult to read as I'm not familiar with the font. Getting accustomed to a font is either more difficult in Arabic, or there are greater extremes in fonts -- I've never figured out which. Certainly I have to learn a new style of writing every time I read a handwritten letter.

Most of the text on cnn's site is easier to read but feels mechanical compared to the harir font which feels like a significant improvement.

Harir is very easy to read, with few peculiarities. I don't claim to know fonts or which is better/worse, but I would say Harir is very good, at least for my sample size of one. You would have to ask a native speaker who might be more comfortable with script than type their opinion.

I don't like the Arabic font used on CNN either. I don't know why they used such an inelegant font for everything on their website, it looks very ridiculous and unfriendly to the eyes.

I have better examples that you might find interesting:

- http://www.madamasr.com (Give it some time for the webfont to load)

- http://www.dw.com/ar/

- http://www.bbc.com/persian

I am not a fan of the cnn font but I don't see Harir as a major improvement (it's too thick). The BBC and dw fonts look the most natural to me. Al Jazeera (http://www.aljazeera.net/portal) is also very readable.

All the examples of the Harir typeface from this post are the bold and bold caption fonts. Do you also think the regular and display regular fonts are too thick? https://www.typotheque.com/fonts/harir/regular/sample/arabic... https://www.typotheque.com/fonts/harir/display_regular/sampl...

I wonder, how does the logo work? Does the original, latin "CNN" logo mean anything in the arabic logo? Or is it arabic and latin smashed together?

Reading it right to left you might pronounce it "bilarabia" which means "in arabic". Even though I can't recall any at the moment, I know I've seen numerous logos with dual meanings like that.

I find it exceedingly interesting how Latin is so easily and often broken down into simple geometric shapes, whereas things like Arabic and Chinese when digitized still try to emulate the subtleties of brush strokes.

There's folks who maintain that's why Europe entered the Industrial Revolution way before people out East did - non-Latin languages didn't take too kindly to being broken up into the pieces that could be used on a printing press!

"According to Suraiya Faroqhi, lack of interest and religious reasons were among the reasons for the slow adoption of the printing press outside Europe: Thus, the printing of Arabic, after encountering strong opposition by Muslim legal scholars and the manuscript scribes, remained prohibited in the Ottoman empire between 1483 and 1729, initially even on penalty of death,[4][5] while some movable Arabic type printing was done by Pope Julius II (1503−1512) for distribution among Middle Eastern Christians,[6] and the oldest Qur’an printed with movable type was produced in Venice in 1537/1538 for the Ottoman market."


That's quite an interesting theory. Do you have any good books on the topic?

Marshall and Eric McLuhan's Laws of media address the technological impact of the latin letter, and how it was extended to other technology. http://www.amazon.com/Laws-Media-Science-Marshall-McLuhan/dp...

Marshall McLuhan's book The Gutenberg Galaxy is a longer approach to specific conversation around how latin characters led to the printing press which transformed society into the industrial age:


One of the better discussions of this is probably in http://blog.gatunka.com/2008/05/05/why-japan-didnt-create-th... , which makes a much weaker claim about the 8-bit era of computing in particular, which I'd think is easier to discuss for a lot of reasons.

One other point I'd make is that I think people may be accidentally overestimating the friendliness of English for printing, to some degree, because it sometimes sounds as if people think that Japanese, Arabic, or whathaveyou people are just completely unwilling to make any practical concessions to the fact that technology might have some problems reproducing their language. But it's not as if English was not itself compromised by technology.

How much of this discussion sounds like "No English-speaking country would use a typewriter because English is written with a different amount of space allocated to each letter, which typewriters of the era could not reproduce."? English speakers adapted to the tech. Telegrams historically looked like crap... they couldn't even have punctuation!... but they were popular even so.

Granted, English may have a particularly easy time of it (it is, admittedly, quite nice to fit a quite usable subset of your language into five bits), but I'd like to see some more concrete evidence that all these other cultures were unwilling to bend for a while, because they were so much more concerned about... whatever it is they were concerned about.

(Also, to be clear, I'm speaking about day-to-day usage. Being particular about the Koran is one thing, being unwilling to bend in a text message on your 1998 phone is quite another.)
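To make the five-bit remark concrete, here is a minimal sketch. The code assignment is hypothetical (A-Z as 0-25, space as 26); it is not Baudot/ITA2 or any historical telegraph code. It only shows that 26 letters plus a space fit into the 32 values five bits allow.

```python
# Hypothetical 5-bit code: A-Z map to 0-25, space maps to 26.
# This is NOT Baudot/ITA2; the assignment is for illustration only.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def pack5(text: str) -> int:
    """Pack text into a single integer, five bits per character."""
    value = 0
    for ch in text.upper():
        # str.index raises ValueError for anything outside the 27 symbols,
        # which is exactly the "no punctuation" limitation of such a code.
        value = (value << 5) | ALPHABET.index(ch)
    return value


def unpack5(value: int, length: int) -> str:
    """Recover `length` characters from an integer produced by pack5."""
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[value & 0b11111])  # lowest five bits
        value >>= 5
    return "".join(reversed(chars))


message = "SEND HELP"
packed = pack5(message)
bits_used = 5 * len(message)  # 45 bits, versus 72 at eight bits per character
```

Note that `pack5("HELP!")` raises `ValueError`: anything outside the 27 symbols simply has no code, which is the kind of compromise English speakers accepted.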

> One other point I'd make is that I think people may be accidentally overestimating the friendliness of English for printing

I don't think so to be frank. When compared to Japanese or Arabic, one could say English was created solely for printing. I'm a native Arabic speaker, so I can appreciate why people would claim it's a difficult language to print. Word processors and some browsers, to this very day, sometimes have trouble rendering Arabic properly, especially when it comes to linking characters and Arabic punctuation.

> Telegrams historically looked like crap... they couldn't even have punctuation!.

English punctuation does not significantly alter meaning in my opinion. You have sentence-ending punctuation, which is used to differentiate mainly between a statement and a question. Then you have separator punctuation, like commas, semi-colons, and colons. I don't see them making much of a change either. And finally, the apostrophe and quotation mark. In addition, telegrams are short messages, so there usually is no need to convey complex meanings. Therefore, it would have been a slight inconvenience for telegram users at most.

In Arabic, changing a punctuation mark above a single letter in a word can change the word's meaning entirely. Furthermore, letters are linked together to form words - this is not essential, but would make things much more difficult to read. Also, the position of a letter in a word dictates its "shape". This introduces a ton of variation to the letters of the language, making it difficult for even modern keyboards to get right. I'd imagine it would have been even more complex at the beginning.
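The position-dependent shapes can be seen in Unicode's legacy Arabic Presentation Forms-B block, which encodes isolated, final, initial and medial forms as separate code points. The sketch below uses Python's standard `unicodedata` module to list them; `positional_forms` is a name made up for this example. Modern text normally stores only the base letter and leaves shape selection to the font and shaping engine, so this illustrates the variation rather than showing how rendering is done.

```python
import unicodedata


def positional_forms(base: str) -> dict:
    """Map position tag -> presentation-form character for one base letter.

    Scans the Arabic Presentation Forms-B block (U+FE70..U+FEFF) and keeps
    the code points whose compatibility decomposition is exactly
    "<tag> BASE", e.g. "<initial> 0628".
    """
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0].startswith("<"):
            if int(parts[1], 16) == ord(base):
                forms[parts[0].strip("<>")] = chr(cp)
    return forms


beh = positional_forms("\u0628")   # ARABIC LETTER BEH: four shapes
alef = positional_forms("\u0627")  # ARABIC LETTER ALEF: no initial/medial shape

for tag, ch in beh.items():
    print(tag, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

BEH gets four distinct shapes, while ALEF only has isolated and final ones because it never joins to the following letter. A typesetter needs sorts for every one of these.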

Of course, I think that the main barrier to widespread adoption of the printing press in the Arab World was more related to demand than anything else. At the time of the Industrial Revolution and slightly before it, the Arab countries' least concern was printing books, as they were too busy dealing with the imperialist occupation.

> letters are linked ... get right.

That whole section applies equally well to many written forms of English. I'm not an expert but I think modern Latin "print" forms (i.e., the only forms an increasing number of Anglophones can read or write, aside from very similar modern italic) are as much a product of the limitations of the printing press as anything. Look at the archaic forms of the letter /s/ for a start.

> English punctuation does not significantly alter meaning in my opinion

Let's eat, Grandma! vs Let's eat Grandma!

Commas save lives.

My parents, Ayn Rand and God.

> That's quite an interesting theory. Do you have any good books on the topic?

It's an idea that comes up in online discussions from time to time but is generally not taken seriously by academics because it completely ignores the fact that most non-Latin languages do in fact lend themselves quite well to being written as the composition of smaller elements used in a printing press.

It's the same sort of argument made about keyboards to explain "why China, Japan, and India haven't produced as many prolific computer scientists", completely avoiding the fact that keyboards for Chinese, Japanese, Devanagari, etc. have existed for over a hundred years.

Urdu, Arabic and Farsi in particular (all using the Arabic script) do not. Urdu, for example has 38 letters, and most of those letters have 2-3 forms depending on whether they are in the beginning, middle and end of the word. Additionally, the letters have to be joined at precise places, otherwise the text becomes illegible real fast.

Hindi, Chinese and Japanese, by contrast, may have many characters, but each character is a distinct element.

> Hindi, Chinese and Japanese, by contrast, may have many characters, but each character is a distinct element.

That's not... really true about Devanagari and related scripts like Bengali, Marathi, etc. At least, no different than the Arabic scripts as you describe it. They have the same considerations with multiple forms per letter (as many as five per letter, not counting variations in size and joining or conjunct forms), along with the issues regarding character joining affecting meaning in critical ways. Nevertheless, these problems have pretty straightforward solutions when it comes to typesetting.

It's also worth noting that English had many of these same problems as well at the time of the printing press, which is why handwritten English before the 15th century looks so different from what we read today. We literally dropped letters from the English alphabet due (in part) to this problem.[0]

It's frustrating to see this argument persist because it has next to no evidentiary support in the accepted literature, and it comes dangerously close to pseudoscientific post-hoc rationalizations of European political dominance. (e.g., phrenology being used to "explain" why Europeans were able to make scientific advances during the Enlightenment whereas African and Asian societies allegedly "weren't"[1]).

[0] For example, the printing press is a major reason that we no longer use the thorn in written English: https://en.wikipedia.org/wiki/Thorn_(letter)#Middle_and_Earl...

[1] which is wonderfully ironic given where numerals in European languages come from (spoiler: originally India, by way of Arabia), but I digress.

> We literally dropped letters from the English alphabet due (in part) to this problem.[0]

Going by your link, there was no technological difficulty at all with printing thorns. Rather, what happened was that England imported type rather than manufacturing it, and foreign type didn't include English-specific letters.

> the substitution of Y for thorn soon became ubiquitous, leading to the common 'ye', as in 'Ye Olde Curiositie Shoppe'. One major reason for this was that Y existed in the printer's type fonts that were imported from Germany or Italy, while thorn did not.

In addition, Arabic has symbols that can appear either above or below each letter that usually change the meaning of a word completely.

For example:

عالِم (pronounced a'alim): scientist

عالَم (pronounced a'alam): world

Notice the difference is essentially the "i" and "a" yet the entire meaning changes.

I'd imagine this would add complexity to the mechanisms employed by printing equipment back then.
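The two words above can be compared at the code point level. A minimal sketch, assuming the strings are spelled as base letters plus one short-vowel mark (kasra, U+0650, versus fatha, U+064E); `strip_marks` is a helper invented for this example and relies only on the standard `unicodedata` module.

```python
import unicodedata


def strip_marks(word: str) -> str:
    """Drop every combining mark, keeping only the base letters."""
    return "".join(ch for ch in word if not unicodedata.combining(ch))


# Assumed spellings: ain, alef, lam, <vowel mark>, meem.
scholar = "\u0639\u0627\u0644\u0650\u0645"  # with ARABIC KASRA on the lam
world = "\u0639\u0627\u0644\u064E\u0645"    # with ARABIC FATHA on the lam

# The single mark that differs between the two spellings.
differing = [
    (unicodedata.name(a), unicodedata.name(b))
    for a, b in zip(scholar, world)
    if a != b
]

same_skeleton = strip_marks(scholar) == strip_marks(world)
```

Here `differing` holds just the kasra/fatha pair, and `same_skeleton` is true: without the marks the two words are the same four letters, which is how they usually appear in everyday unvocalized text.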

So, like world and word? Where a single letter changes the entire meaning?

No, the letters are exactly the same. The change is caused by adding an accent above or below the letter, in this case the letter pronounced like "l" in English.

> the letters are exactly the same. The change is caused by adding an accent above or below the letter

This is a pretty myopic view of what's significant in writing. Why are you calling the consonant a "letter" and the vowel an "accent"? Would you apply the same rigorous distinction to é, which is a full letter (specifically Latin Small Letter E with Acute, U+00E9), vs é, which is a letter e with a combining accent above?
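The two encodings of é mentioned here can be checked with Python's standard `unicodedata` module. This is a small sketch of Unicode normalization, nothing more; `same_text` is a helper name chosen for the example.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT, two code points


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


raw_equal = precomposed == decomposed             # False: different code points
canonical_equal = same_text(precomposed, decomposed)  # True: canonically equivalent

# NFD splits the precomposed letter; NFC recombines the pair.
split = unicodedata.normalize("NFD", precomposed)
joined = unicodedata.normalize("NFC", decomposed)
```

Unicode treats the two spellings as canonically equivalent, so whether the accent is "part of the letter" or a separate mark is an encoding detail, not something a reader sees.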

The correct terminology is a diacritic.


There is no meaningful difference, for this purpose, between letters and diacritics; it is nonsense to say that "fit" vs "fat" is an easy, simple contrast because 'i' and 'a' are "letters", while the analogous distinction in Arabic is tricky because it is represented in different glyphs. They're all glyphs.

Well, English has situations where the word is written the exact same way, but pronounced differently.

Accents that change meaning also exist in many European languages.

So, it’s not like Latin-1 doesn’t deal with those issues either.

I’d assume it’s more other issues that are problematic with Arabic script.

Unfortunately, my knowledge is anecdotal and based on conversations with my grandparents. :(

Those folks are evidently crackpots. China had working printing presses, and mass production of books, centuries before Europe did.

The claim isn't about printing, but about movable type and printing presses.

China had printing, and movable type, before Europe, but not printing presses. And movable type was difficult with Chinese characters.


> China had printing, and movable type, before Europe, but not printing presses.

Your link contains no support for this rather odd idea. Wikipedia explicitly contradicts it:

> A printing press is a device for applying pressure to an inked surface resting upon a print medium (such as paper or cloth), thereby transferring the ink.

> Movable-type presses using cast ceramics were employed in China from the early years of the last millennium.

(emphasis mine)

What exactly does "printing press" mean to you? What distinction are you trying to draw?

> However, even the greatest technology is constrained by cultural circumstances. Chinese character ideograms are too specific. Consequently, the thousands of distinct characters would be difficult to categorize in molds. And although the more complex characters can often be decomposed into simpler elements, the process of doing so is so unsystematic that mechanising it efficiently proved impossible. True mass printing could only thrive in a culture with a less sophisticated writing system - an alphabet of few characters. Western civilization in the 15th century would be revolutionized with the Dutch goldsmith Johannes Gutenberg's invention of the movable printing press. Movable type is a reform in the history of printing and contributed much to human civilization.

Depends entirely on the font. Would you make that same claim about Dancing Script or Tangerine? Or without getting that extreme - what do you consider serif fonts to be? They emulate the taper that writing by hand might have and are generally considered easier to read because of it.

I use Meiryo for Chinese/Japanese scripts and everything is simple, geometric shapes with more or less constant thickness for lines [0]. There are 100+ radicals (the name given to often used "parts" of Chinese characters) and so there is a lot more variation than the 26 (52 including capitals) that Latin alphabets use.

Korean hangul doesn't have this issue, because hangul is a wonderful, wonderful writing system.

[0] https://upload.wikimedia.org/wikipedia/commons/thumb/7/7f/Me...

Hangul is the closest thing to a phonetic alphabet I'm aware of. Certainly more consistent than English in the Latin alphabet. Not that I know Korean, but I learned to sound out English loan words during the flight to Korea. It's that easy.

By any chance, did you use this comic/guide? :)

[0] http://www.ryanestrada.com/learntoreadkoreanin15minutes/

No, but that's awesome!

I actually find Hangul very interesting and beautiful.

Counterpoint: Greek uses a simple alphabet like Latin, but Greek typefaces are typically much closer to the "curly handwriting" end of the spectrum with Arabic.

If I had to guess, I'd say the reason is something to do with the Renaissance and a couple of centuries of fashion in typeface design pushing the limits of the mechanised process, instead of just trying to reproduce the shapes that people were used to (there's a reason sans-serif typefaces were originally called 'grotesk'!).

Is that true?

For example, first place I checked: http://yahoo.co.jp doesn't have any subtle brush-strokes. Every character is constant thickness. The characters are intricate, but I don't know what the information density is (pixels per word); I think they have fewer characters per idea in Japanese, and more essential pixels per character.

> The characters are intricate, but I don't know what the information density is (pixels per word); I think they have fewer characters per idea in Japanese, and more essential pixels per character.

Interestingly, that appears not to be the case for Japanese, quite the opposite in fact. There was a study a while ago about the speed of spoken language that I think can loosely be applied here (number of characters per syllable depends on which Japanese script is being used). The study concluded that Japanese is very low density (and thus spoken very fast).


As to spoken Japanese, this isn't hard to conclude from first principles, since the phonological structure of Japanese is quite spare. As few syllables are possible, the informational content per syllable is low. (related: I read a book containing an example intended to address the idea that Japanese is spoken faster than English -- an English sentence of six notional syllables such as "Smith's strength crushed six sleek ships" is going to take much longer to say than one such as "I made an apple pie", which is also six syllables. The second sentence has one consonant per syllable (on average), the first has considerably more. In Japanese, only the second kind is possible.)

The Japanese syllabary characters tend to be quite modestly formed; they have few essential pixels per character. The kanji, which carry large quantities of semantic information, require much more space to be legible.

Beautiful font! I can't read the language, but have done some software work in the middle east, and Harir looks much cleaner to my eye than the helvetica knock-offs in arabic.

If you want a real challenge in i18n, try ensuring that all the ligatures are correct in generated PDFs.

I think a great test for the font would be to have Quran readers read the Quran with it, and perhaps test their speed. It is a pretty good benchmark. I have a HUGE problem with reading the Quran electronically because I'm used to the Indo-Pak script and become almost illiterate when reading the traditional Arabic script. Unfortunately most electronic materials are in the traditional Arabic script.

Does "elementary school" mean something other than school for 5-11 year olds in other countries? I am having trouble imagining 10 year old me being concerned with a font's "inelegant spacing and minimal kerning."

As I understand it, this is not "somewhat unpleasant" or something a pedant would notice, but absolutely horrible.

Most elementary students would notice and dislike kerning like this https://static.flickr.com/55/134612871_dd482da6a2_o.gif .

