
I wonder how the decisions for inclusion of languages were made, as there are some very odd choices. For example, Osmanya is a script created for the Somali language that was hardly ever used (Somali literacy only became widespread after the Latin alphabet was adopted; previously Arabic script was commonly used). The population of actual users of this script is pretty indisputably 0. 100,000 would be a wildly ambitious estimate of the number of people who have ever even seen the script.

On the other hand, Oriya, which has over 33 million native speakers, including 80% of India's Odisha state, does not appear to be supported.




In their defense, when you click India and scroll down, it does say, "not supported yet". Which leads me to believe they picked both the scripts with few characters (or that are straightforward to render?) and the most common ones, and they'll get to the rest shortly. :)

Oriya appears to be quite complicated to render: http://www.microsoft.com/typography/OpenTypeDev/oriya/intro....

Meanwhile, I wonder if this means we'll see OCR and ePubs for all kinds of scripts now; or if this will help enable Google Translate in more languages? ;-)


"Oriya is similarly structured to Devanagari and is used to write the Oriya language in Indian state of Orissa."

Devanagari is what Hindi, Marathi, and Sanskrit use, so I am certain that Oriya isn't any more complex to render than those languages.


Also maybe this was a %20 time thing and the programmer who started it just wanted to do those languages (probably because they couldn't be found elsewhere).


%20, the final frontier


I thought they didn't do 20% time anymore.


It's still a thing, though it does depend to some degree on one's manager.


It's probably just a matter of whether or not there's somebody in the relevant team(s) who is familiar with, or at least has heard of, any given script.

I wouldn't be surprised if there happens to be an Osmanya geek at Google, but none of his teammates has ever heard of Oriya. For the same reason, I wouldn't be surprised if they added a bunch of geeky fictional languages before actual ones.


have an upvote if you file an enhancement request for Tengwar [1] at https://code.google.com/p/noto/issues/list !!

[1] http://www.omniglot.com/writing/tengwar.htm


Unfortunately, Tengwar is not an official part of Unicode. Ditto for Klingon.


And the fun thing is, there are two OLD! Persian scripts available (Pahlavi and Old Persian, both dead for almost 1500 years), and current Persian is not supported :)))


I'm guessing they're going for the ones nobody would ever put the effort into supporting first, and will get to the current ones later.


Perhaps even more curious is the inclusion of Deseret, a toy language developed by the Mormons in the early 1800s. It never caught on and few books other than the Book of Mormon were ever translated into it.


Deseret wasn't a language, it was simply a script to go along with a reformed version of English with simplified spelling.


Good pick, but I also see the value in preservation — maybe 500 years from now, the Noto fonts will be the best or only representation of many dead or forgotten scripts and languages.


Although they are non-prescriptive, the Unicode Osmanya table:

http://www.unicode.org/charts/PDF/U10480.pdf

already contains reference glyphs. If you want to preserve scripts, preserve Unicode tables instead of making fonts.
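For reference, here is a minimal sketch of what the character data behind that chart looks like from code, assuming Python's standard `unicodedata` module. The helper name `assigned_names` is made up for illustration, and the output depends on the Unicode version bundled with your Python. It gives names per code point, not outlines.

```python
import unicodedata

# Osmanya block: U+10480..U+104AF (the block covered by the chart linked above).
OSMANYA_START, OSMANYA_END = 0x10480, 0x104AF

def assigned_names(start, end):
    """Return {code point: character name} for assigned code points in [start, end].

    Hypothetical helper for illustration; unassigned code points are skipped.
    """
    names = {}
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), None)  # None when the code point has no name
        if name is not None:
            names[cp] = name
    return names

osmanya = assigned_names(OSMANYA_START, OSMANYA_END)

# Print the first few entries as "U+XXXXX NAME".
for cp, name in sorted(osmanya.items())[:3]:
    print(f"U+{cp:04X} {name}")
```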


There are many more glyphs than there are codepoints -- a font contains a ton of information that would be needed to reproduce a script that is not present in unicode tables.


This is particularly true for non-European languages – ligatures are a minor feature for most of the languages using latin-1 but there are quite a few which depend on complex, multi-letter combinations which are required for text to be comprehensible:

http://en.wikipedia.org/wiki/Complex_text_layout
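To make that concrete, here is a rough sketch using only Python's standard `unicodedata` module. The helper name `describe` is made up for illustration. It takes an Oriya consonant cluster and shows that the text itself is just three code points; the conjunct shape a reader expects to see has no code point of its own and has to come from the font and the shaping engine.

```python
import unicodedata

def describe(text):
    """Return a list of (code point label, character name) pairs for each code point in text.

    Hypothetical helper for illustration.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Oriya KA + VIRAMA + SSA: stored as three code points. A font with
# shaping support would normally draw this sequence as one conjunct form.
conjunct = "\u0B15\u0B4D\u0B37"

for label, name in describe(conjunct):
    print(label, name)

print("code points:", len(conjunct))
```

The same pattern (consonant, virama, consonant) exists in Devanagari at U+0915, U+094D, U+0937, which fits the point upthread that the two scripts are similarly structured.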


I noticed the inclusion of Cornish. As of 2011 there were 557 people that claimed Cornish as their primary language.


Sure, but there are no Cornish glyphs. Cornish is entirely writable with the same characters you write English with, so you get it for free.


Hi fellow Oriyan, Google has very bad support for Oriya, since the IT sector in Odisha is not that strong in R&D. I work at IIIT Hyderabad, which is the leading NLP lab in India, and I don't see any work on Oriya.


Good observation.



