Show HN: A modern way to type in African languages (github.com/pythonbrad)
189 points by pythonbrad 3 months ago | 85 comments
Hello HN, I'm pythonbrad and a core maintainer of Afrim - an input method engine for African languages.

Afrim aims to simplify typing in African languages and to digitize African typing systems. Basically, it wants to solve the problems encountered with current solutions:
- slow typing
- not easily configurable
- keyboard layout dependent
- constant bugs

Additionally, Afrim offers the following features [1]:
- Easily customizable dataset
- Keyboard layout independent
- Auto-completion, autocorrection and autosuggestion
- Support for all sequential codes

Technical details [2]: Afrim is written in Rust and its architecture is inspired by RIME.

What's next?
- Offer an Android frontend for Afrim (in development) [3]
- Support as many African input methods as possible

I would like to hear your opinions about this project. I have been working on it for some time, and I would like to know how I can improve it.

--------------
[1] https://github.com/pythonbrad/afrim?tab=readme-ov-file#featu...
[2] https://pythonbrad.github.io/afrim-man/for_developers
[3] https://github.com/pythonbrad/afrim-keyboard/




I've visited all the links and honestly still don't have the slightest idea what this input method does, or exactly what problem it's trying to solve, or why it's a good solution to the problem. I found the video [1] but it has no audio or explanations at all.

I also don't understand why you'd want phonetic input methods, rather than wanting to input your desired character directly. For languages like Chinese I understand because there are thousands of characters, but aren't most or all African writing systems based on small alphabets? I shudder to think of having to learn to input English phonetically.

So if you're looking for opinions, my first one is that your pages need to do a better job at explaining what current problems are (with multiple clear examples for each), where current solutions fail (with clear examples of how), and how your solution is different and better (again, with clear examples).

Good luck!

[1] https://github.com/pythonbrad/afrim-keyboard/?tab=readme-ov-...


> aren't most or all African writing systems based on small alphabets?

well, perhaps the most famous african writing system has a fairly large inventory of over 1000 characters, but it hasn't been widely used for about 2000 years due to religious persecution

the writing systems that are most widely used in africa are the latin alphabet and the arabic abjad, but as i tiresomely repeat every time the subject comes up, africa is immensely diverse, to the point that generalizations about africa are only slightly more useful than generalizations about non-elephant mammals


> well, perhaps the most famous african writing system has a fairly large inventory of over 1000 characters, but it hasn't been widely used for about 2000 years due to religious persecution

Which is ...?


I believe they're referring to hieroglyphics, although I didn't know the story about religious persecution and would be interested in some context there.


https://en.wikipedia.org/wiki/Decline_of_ancient_Egyptian_re...

> Where the pagan religion of the Graeco-Roman world accepted the influence and integration of native Egyptian deities and practices into its tradition, Christianity was not nearly as accepting. The strict monotheism of the latter was in stark opposition to the freeform syncretism of paganism. Local Christians engaged in campaigns of proselytism and iconoclasm that contributed even more to the erosion of traditional religion. In AD 333, the number of Egyptian bishops is estimated to be just under 100; the Christianisation of the Roman Empire itself, and edicts by Christian emperors in the third and fourth centuries AD compounded the decline, and the last known inscription[16] in hieroglyphics (regarded by some as a symbol of the decline of the religion itself due to their close ties) dates from AD 394, known as the Graffito of Esmet-Akhom. It is located at the temple of Isis on the island of Philae, in Upper Egypt believed to be one of the final remaining places of worship of native Egyptian religion.[17] By this time, Egyptian religion was largely confined to the south of the country and to the distant, isolated Siwa Oasis in the west.[18] This century also saw significant expansion of institutionalised Christianity into Egypt, but adherence to the old religion on a smaller, more local scale was still prevalent.[19] Philae is also the site of the final demotic inscription, dating to AD 452. The temple was closed in AD 553 by Byzantine emperor Justinian I,[20] who ruled from 527 to 565. As official temples fell into disrepair, and religious structures across Egypt declined, the religion gradually faded away.[21]


> I also don't understand why you'd want phonetic input methods

I'm from Morocco, and most people here (myself included) are accustomed to typing on an AZERTY (or QWERTY) keyboard. Typing in Arabic using a standard keyboard layout can be quite cumbersome and slow for us (most people never took the time to learn it). Using the Latin alphabet is just more practical for us, and doesn't require learning a new way to type.

In our daily communication, when we text, whether with friends or family, we often switch between English, French, or Moroccan dialects. When writing in Moroccan dialect, we frequently use a phonetic system (read the French way) that combines the Latin alphabet with numbers to represent specific sounds or letters that don’t have a direct equivalent in the Latin script.

For example:

    ق is replaced by "q"
    ع is replaced by "3"
    خ is replaced by "5" (though "kh" is more common)
    غ is replaced by "4" (or "gh")
    The rest is just a 1 to 1 conversion to how the letter sounds in french
    
Example phrase :

Bro, I woke up and had a sick breakfast.

Sat, 3ad fe9t o drabt wa7d ftor khatir.
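
A minimal sketch of the substitution list above, in Python. It only knows the four letters listed in this comment ("q", "3", "5"/"kh", "4"/"gh") and passes everything else through unchanged, so it is an illustration of the idea and not a full Darija transliterator. The names `CHAT_TO_ARABIC` and `chat_to_arabic` are made up for the example.

```python
# Substitutions listed in the comment above. Anything not listed is left as-is.
CHAT_TO_ARABIC = {
    "kh": "\u062e",  # خ (the more common spelling)
    "gh": "\u063a",  # غ
    "q": "\u0642",   # ق
    "3": "\u0639",   # ع
    "5": "\u062e",   # خ
    "4": "\u063a",   # غ
}


def chat_to_arabic(text: str) -> str:
    """Replace the listed chat-alphabet tokens, trying two-character tokens first."""
    out = []
    i = 0
    while i < len(text):
        for size in (2, 1):
            token = text[i:i + size].lower()
            if token in CHAT_TO_ARABIC:
                out.append(CHAT_TO_ARABIC[token])
                i += len(token)
                break
        else:
            # No rule matched at this position: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Trying the two-character tokens first is what keeps "kh" from being read as a plain "k" followed by "h".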


this was the inspiration for http://canonical.org/~kragen/alphanumerenglish, in which i would write 'bro, ai dj7st wok 7p en h4d e s1k br3kfest', because it turns out that english also has a lot of specific sounds that don't have a direct equivalent in the latin script, and i think the solution people have been using for 1000 years in english is a lot worse than the solution arabic-speakers commonly use

(unfortunately i don't even speak enough arabic to know if you're actually writing in arabic or in a berber language like tamazight here)

(ai 4m ev k0rs w3l ew3r qet ref0rmi6 i6gl1c sp3l16 1z e fulz 3rend, qe d4wnfal ev m3ni en eks3ntr1k over qe sentceriz)


Shouldn't that be "q3t"? Or do some English speakers really pronounce "that" with the vowel in "the"?


most do in that context, though not in cases like this sentence, where 'that' takes a heavy stress and therefore doesn't undergo vowel reduction: https://en.wikipedia.org/wiki/Stress_and_vowel_reduction_in_...

(a shortcoming of that article, and often of linguistic studies of english in general, is that indian english has more speakers than either american english or british english, but gets rather short shrift in it; in particular, i don't know how consistently vowel reduction is applied in indian english, and its prosody is very notably different)


very interesting approach! in south asia, due to similar exposure to tech (through qwerty), we too use a phonetic approach, but without touching the visual aspect of the script.

it gets a little ambiguous, however, because we have ~2x the symbols that latin allows (before you get into accents). the workaround is to combine multiple symbols to replicate the phonetic sound the target character is meant to capture.

curious to see if the tooling can be combined to use across languages that can work with similar approach as the OP.


Urdu uses a QWERTY-based layout (so ق is where Q normally is, etc.), and adapts the shift key for additional characters. I find this much easier than the Arabic keyboard layout. I think non-Latin script alphabets should try to match QWERTY as much as possible.


Really? That’s surprising to me. The Arabic keyboard layout is very simple to learn, and people in the Eastern Arabic-speaking world (Egypt, Jordan, etc.) seem to use it pretty universally. Obviously people write online with the “chat alphabet,” but I’ve never heard of anyone in the Levant actually using it to input Arabic characters.


The existence of direct keyboard layouts doesn't always imply that they would be popular though. I think this happens fairly regularly around the world and the only major exception I'm aware of is Korean where no phonetic input methods were widespread at any moment.


It’s also related to the layout of the first phone keyboards, where you had to press a key up to three times to select a letter (and it was Latin characters only).


Korean is phonetic.


Not exactly, though. And Japanese kana should be phonetic enough that a direct input method should be used for kana, but the dominant input method there is Romaji, i.e. typing Latin characters to get both kana and kanji. So that is not a defining factor.


Yeah, it's the same in Algeria and Tunisia. I was working with a derja tutor for a while and on day one she gave me the phonetic map like the parent comment pointed out, as it's ubiquitous.


I like the 3 for the 'ayn (since it's the same shape, only backwards). My Darija is poor, what's the 9 standing in for in "fe9t"? Is the 7 in "wa7d" a ح ?


I forgot the 7! yes you're correct it's for ح

9 is for ق


Ah, qaaf, that totally makes sense (same shape again, I guess you could say the same for faaf too, but since the sounds are the same 'f' works fine already).


Great to see more support for African communities!

I have to concur with the previous comment though that I'm unclear as to what this adds.


It's interesting, having grown up in Africa, and lived there almost 35 years, reading, writing and speaking an African language, being surrounded by at least 10 other African languages and seeing them written, that I have never ever in my entire life encountered this writing system.

I think the product is a great technological achievement, but...

I would warn against the generalisation of "African" in this context. It does not tell the full picture, and simplifies the rather complex and very ancient reality that is language on the continent of Africa.


FWIW, the only languages supported appear to be Amharic (modern Ethiopian), Ge'ez (classic Ethiopian, basically only used in church liturgy today), and one rather obscure language (Nufi/Fefe, ~140k speakers) spoken in a small part of Cameroon. This is indeed a pretty long way from being a pan-African solution.


It’s true that we currently have few languages in our dataset, but by supporting Nufi, we are supporting all Cameroonian languages since their alphabets have been standardized under the General Alphabet of Cameroonian Languages (GACL). Additionally, we support the CLAFRICA code, which allows for typing in GACL. This GACL indirectly supports other African languages as well.

Regarding Geez and Amharic, the objective was to support a language distinct from Latin. Its implementation allows us to avoid being limited to Latin-based languages.


Can't wait for you to get started with the oldest languages on the planet, as spoken by the oldest people on Earth, the Khoisan languages. Hope that is in the pipeline, as the Khoi and San are almost eradicated in all of Africa and many of their languages are now extinct.

Khoekhoe is a nice start.

I think people will lose their minds if they see how it is written.

Fun fact - The Khoisan gave all the Bantu languages that have it, the click sound.

There are a lot of different clicks in the Khoisan languages. So many that they need exclamation marks and every other symbol on the keyboard to accommodate them.


> the oldest languages on the planet, as spoken by the oldest people

Begging your pardon, this is a nonsensical thing to say. One people are not older than another, and nor are their languages.

As it happens, the Khoisan show greater genetic divergence from the rest of the human race, indicating that as a population they have been relatively more isolated, for longer. This does not in any sense imply that they are in some way basal, such that it would be intelligible to call them older than other branches of the family tree.

This same objection applies to characterizing their languages in that way, we have no way of knowing if their languages are less or more prone to mutation than any other. We have rather less to work with, in fact, as the absence of literacy in their societies until modern times leaves us with no evidence at all. But it would be astonishing if their languages had no drift to them, and even "less drift" is a major and unsupportable claim, and these are the only ways in which it would be intelligible to say that those languages are older than another.


Khoekhoe is one of the languages I referred to in my GP.

I've seen it written in ascii, is there an alternative alphabet?


You haven't seen Amharic on Ethiopian Airline planes?

The project supports many other languages which justifies the use of "African".


The issue might be in the "About" section:

"This application allows you to type most of the characters in the african language in any text field."

Maybe it should say "in AN african language", rather than "in THE african language".


Africa is a very very big place. :)


Thanks for your feedback, I will take it into account.


Comment on the documentation/README more than anything - I couldn't find anywhere a list of specific languages supported. That's a pretty important data point for any speaker of an African language hoping to use this IME. If the library supports only Amharic and Ge'ez (the only two languages mentioned specifically) it's extremely important to a Wolof or Swahili speaker to know that when evaluating if the IME is in a usable state for them.


It looks like it is in a distinct repository, see https://github.com/pythonbrad/afrim-data


The repo for the Android keyboard implementation[1] says the following, but it feels strange that this doesn't seem to be stated anywhere else:

  ### What african languages (and layouts) are supported ?

    - Amharic Keyboard - Transliteration
    - Clafrica Keyboard - Transliteration
    - Geez Keyboard - Transliteration
    - Nufi Keyboard (Fe'efe'e) - Transliteration
1: https://github.com/pythonbrad/afrim-keyboard


I understand your point, we have a lot of work to do on the documentation.


Isn’t Swahili compatible with ASCII (just like English)? Why would you need a special IME to write it?


Depends on location and time period. There's lots of historical stuff written in a customized Arabic script with unique characters.


Yes, Swahili is now written using the Latin script.


Similar to other commenters, I am curious to know what is the problem with african languages? Can't you just make a button for each character in the alphabet? (The readme mentions it is a phonetic-based input method, so I assume African languages use alphabets, not some logograms, right?)

What is common to African languages that allows solving the problem for all of them together in one software package? (How meaningful, for example, would a software package for Eurasian languages be?)

I watched the video - https://github.com/pythonbrad/afrim-keyboard/ - but don't understand. A latin keyboard is used, but it produces some other characters.


Ge'ez and its adaptations are abugidas, which means one symbol is a full syllable. For Amharic there are over 200 individual symbols. [0] That would be a big keyboard! It does seem that most in-use languages in Africa are alphabets or abjads, which could be adapted to keyboards. [1]

[0] https://en.wikipedia.org/wiki/Ge%CA%BDez_script#Ge%CA%BDez_w...

[1] https://en.wikipedia.org/wiki/Writing_systems_of_Africa
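
To make the "consonant plus vowel gives one symbol" point concrete, here is a small Python sketch. It relies on the way the Unicode Ethiopic block lays out the basic syllables: each regular consonant gets a row of code points, with the seven basic vowel orders at offsets 0 to 6. The Latin keys chosen for consonants and vowels below are hypothetical and are not claimed to match Afrim's (or anyone's) actual transliteration table. Only six consonant rows are included.

```python
# Hypothetical Latin keys -> first code point of that consonant's row in the
# Unicode Ethiopic block. Only a handful of regular rows are listed here.
CONSONANT_ROW = {
    "h": 0x1200,
    "l": 0x1208,
    "m": 0x1218,
    "r": 0x1228,
    "s": 0x1230,
    "b": 0x1260,
}

# Hypothetical vowel keys -> offset inside a row (the seven basic orders).
# The empty string stands for the sixth order (bare consonant or short vowel).
VOWEL_ORDER = {"e": 0, "u": 1, "i": 2, "a": 3, "ie": 4, "": 5, "o": 6}


def syllable(consonant: str, vowel: str) -> str:
    """Return the single Ethiopic character for a consonant/vowel pair."""
    return chr(CONSONANT_ROW[consonant] + VOWEL_ORDER[vowel])


def spell(pairs) -> str:
    """Join a sequence of (consonant, vowel) pairs into one string."""
    return "".join(syllable(c, v) for c, v in pairs)


# Three syllables typed as s+e, l+a, m+"" give a three-character word.
word = spell([("s", "e"), ("l", "a"), ("m", "")])
```

With only 6 consonant keys and 7 vowel keys this already reaches 42 distinct symbols, which is roughly why typing two or three Latin keys per symbol scales better than one button per symbol.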


Would these African abugidas be served well with a keyboard like the mobile 10-key Japanese swipe keyboards? [1] Japanese has much fewer sounds so it all fits in a pretty small package but it works so much better and faster than romaji in that context. Maybe it could be adapted with slightly more keys and complex swipe patterns, like up then right/left etc (it looks like for ge’ez at least, such patterns might actually be intuitive eyeballing the patterns in the characters, but someone native would know best).

[1] https://youtu.be/Q204SYyfEJY?si=KWe1sny93MeBScuT


It'd be nice, but I don't think that model would be workable here.

Somewhat oversimplified, but the two Japanese syllabaries hiragana and katakana are around 50 distinct characters each, so that a core 3x4 board of "keys" responding to tap(-and-maybe-swipe (up|down|left|right)) will give you roughly the full set of each. Generally, tapping the key designates the consonant; swiping (or not) gives you the vowel. There are 5 vowels, and roughly 10 consonants. There's a couple of other symbols added on as modifiers for voicing, etc. Again, oversimplified, but that's roughly it.

(Side note, and I'm guessing here, but I suspect this model probably evolved from T9 texting)

From a brief inspection of https://en.wikipedia.org/wiki/Ge%CA%BDez_script, the Ge'ez syllabary/abugida (used e.g. for Amharic) needs 6-8 vowels across at least 26 consonants, and then some more combinations for labialization/velarization, and then some more for application in specific other languages.

Following the Japanese model, that'd be a pretty big grid :) Phonetic input seems a more workable model to me at least.
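
The back-of-the-envelope arithmetic behind this, as a Python sketch. The figures are the rough ones from this comment (a 3x4 grid with tap plus four swipe directions, about 50 kana, at least 26 consonants and 6 to 8 vowels for Ge'ez), not exact counts, and the function names are made up for the example.

```python
import math


def flick_slots(keys: int, gestures_per_key: int) -> int:
    """How many distinct characters a flick grid can address."""
    return keys * gestures_per_key


def keys_needed(symbols: int, gestures_per_key: int = 5) -> int:
    """How many keys a flick grid needs to address `symbols` characters."""
    return math.ceil(symbols / gestures_per_key)


# Japanese: 3x4 grid, each key responds to a tap or one of four swipes.
japanese_slots = flick_slots(keys=12, gestures_per_key=5)

# Ge'ez, using the rough figures above: 26 consonants times 6 to 8 vowels.
geez_low = 26 * 6
geez_high = 26 * 8

geez_keys_low = keys_needed(geez_low)
geez_keys_high = keys_needed(geez_high)
```

So the 12-key grid covers about 60 slots, enough for roughly 50 kana, while the same gesture set would need somewhere between 32 and 42 keys for the base Ge'ez syllables, before counting the labialized forms and language-specific additions.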


Thanks.

So probably the user on the video types in a phonetic approximation of the words using the latin alphabet, and the software translates it to the abugida symbols?

Seems plausible, especially because he types several latin characters to get one symbol.

Interesting to note that sometimes he also uses digits.


The demo video appears to show typing in Fe'fe', which uses a Latin alphabet.

The library apparently also supports Amharic/Ge'ez, which does use an abugida, but I can't find any videos of this.


As an "african" myself, living and working in africa for the last 46 years, and speaking 4 different african languages, I am struggling to understand what "problem" this is trying to solve. It's never been an issue to type in an "african" language. And what is meant by "african"? There is a big difference between, for example, zulu from south africa and arabic in morocco. And then we are not even touching on the thousands of languages in between in the rest of africa.


Apart from Ethiopia, are there any places where they don't use either Latin or Arabic scripts in everyday life? There are technically a whole bunch of alphabets of course but they aren't used much afaik. Maybe the Tuareg script? Is that used by people in North Africa beyond bilingual signs?


There is Adlam for Fulani [1]. Widely spoken in west Africa. However, I'm still trying to familiarize myself with the writing system. The word for "yes" in fulani does not translate in writing. The best way I can write it is HIiiII. Imagine the lowercase i goes down in tone. Then the last two uppercase i shoots up. I don't know if any writing system supports this.

[1]: https://en.m.wikipedia.org/wiki/Adlam_script


> While they were teenagers in the late 1980s, brothers Ibrahima and Abdoulaye Barry devised the alphabetic script to transcribe the Fulani language.[3][6] One method they used involved them closing their eyes and drawing lines. After looking at their drawn shapes, they would pick which ones would look the most to them like a good glyph for a letter, and associate it with whatever sound they felt it would represent.

I'm not sure if I should be impressed that this has turned into an actual script, or disappointed that so many people thought this was a good idea.

Hot take: artificially creating new scripts that need to be taught from scratch to everybody and require new fonts, layout engines, etc hinders language adoption/preservation instead of helping it.


> disappointed that so many people thought this was a good idea

Perhaps it's more that Fulani speakers truly appreciated having an alphabetic script that is able to adequately represent the distinct sounds of their language without ambiguities, which had not been the case with the Latin or Arabic scripts. Cultural pride also would have played a factor, there's a reason South Korea has a special holiday to commemorate the creation of Hangul script: https://en.wikipedia.org/wiki/Hangul_Day


Hangul and the corresponding Japanese syllabaries made sense because Chinese characters are very poorly suited to writing Korean and Japanese.

Latin and Arabic, on the other hand, have a long history of being used for other languages and can be adapted to represent basically anything.


> Hot take

Uh yeah it is. A writing system needs a flat surface, a writing implement, and a mind prepared to learn. If enough people think that the new alphabet is a good idea, fonts and layout engines will follow. For Adlam, they did. (Another invented system which took off indigenously: Cherokee syllabary[1].)

[1] https://en.wikipedia.org/wiki/Cherokee_syllabary


Using the phonetic representation (IPA), you can obtain a writing system similar to the pinyin.

Hanyu Pinyin Hànyǔ Pīnyīn Fāng'àn

Bopomofo ㄏㄢˋ ㄩˇ ㄆㄧㄣ ㄧㄣ ㄈㄤ ㄢˋ

IPA [xân.ỳ pʰín.ín fáŋ.ân]

Cameroonian dialects are written using the GACL, which is similar to the IPA.

Clafrica: Pookai2t peu2nze2e2 n*kut !

Shʉ̄pāpə̀m (Bamoun): Pookɛ́t pә́nzéé ŋkut !
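
A rough Python sketch of how a sequential code like this can be applied. The four rules below are only what can be read off the single example above ("ai" gives ɛ, "eu" gives ə, "n*" gives ŋ, and a trailing "2" adds an acute accent); they are an inference for illustration, not the real Clafrica table, and the names `RULES` and `transliterate` are invented here.

```python
import unicodedata

# Rules inferred from the one example in this comment. NOT the real Clafrica table.
RULES = {
    "ai": "\u025b",  # ɛ
    "eu": "\u0259",  # ə
    "n*": "\u014b",  # ŋ
    "2": "\u0301",   # combining acute accent, attaches to the previous letter
}


def transliterate(text: str) -> str:
    """Apply the rules left to right, preferring two-character sequences."""
    out = []
    i = 0
    while i < len(text):
        for size in (2, 1):
            token = text[i:i + size]
            if token in RULES:
                out.append(RULES[token])
                i += len(token)
                break
        else:
            # No rule starts here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    # NFC merges letter + accent into one code point where one exists (e.g. é).
    return unicodedata.normalize("NFC", "".join(out))


result = transliterate("Pookai2t peu2nze2e2 n*kut !")
```

The point is that every output character is reachable from plain ASCII keys, so the scheme does not depend on which physical keyboard layout the user has.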


> Cameroonian dialects are written using the GACL, which is similar to the IPA.

What exactly is the GACL? Trying to find information on it returns this HN comment as the top result, and a bunch of language-unrelated results after that.


From another comment, it's the General Alphabet of Cameroon Languages.

https://en.wikipedia.org/wiki/General_Alphabet_of_Cameroon_L...


If there are no two words that are only distinguished by tone, it might be unnecessary to mark it in writing, except as a learning aid for people who're not fluent enough in the language to predict this information from context.

E.g. in English, questions are marked by rising pitch, but that intonation is not indicated in writing.


FWIW, Japanese has tonal minimal pairs (ha'shi chopsticks, hashi' bridge, hashi edge), but doesn't bother marking them in (kana) writing. Although the Chinese characters are obviously different.


> but that intonation is not indicated in writing

Oh?


?


https://en.wikipedia.org/wiki/Writing_systems_of_Africa

I don't think any have very widespread use outside of Ge'ez (Ethiopian, etc). Maybe Tifinagh.

I do notice that iOS has built-in keyboards for N'Ko and Tamazight (Tifinagh) - pretty cool to try out if you have an iphone.


North african here! For Moroccan arabic dialect, we mostly use latin alphabets and some numbers to replace some sounds/letters


It looks a little bit similar to the working of the clafrica code to type in General Alphabet of Cameroonian languages.

Clafrica: Pookai2t peu2nze2e2 n*kut !

Shʉ̄pāpə̀m (Bamoun): Pookɛ́t pә́nzéé ŋkut !


Very interesting. Is that to avoid confusion with standard Arabic?



I added a FAQ[1] to answer some common questions.

For more reading, there is an article[2] that has similarities with what we want to achieve.

Disclaimer: We discovered the article one year after starting development.

[1]https://github.com/pythonbrad/afrim/blob/main/FAQ.md

[2]https://hughandbecky.us/Hugh-CV/talk/2015-africa-assessing-t...


I'm wondering what problem you experienced that this is a solution for.

"African languages" is not, in my experience, a single class of languages. There are large differences between the languages, with most of the northern languages borrowing heavily from Arabic, most of the central African/West African languages borrowing from French and the rest are different enough that they can't be considered dialects.


What does that have to do with an IME library/project?


IMEs are usually very narrowly targeted at a single language or script. Claiming that this one is somehow superior to existing solutions for an entire continent's worth of languages with wildly different phonologies is unusual.


`afrim` is not just an IME. It can be used as a library to develop another IME. It's an important point that I neglected in the README.


> `afrim` is not just an IME. It can be used as a library to develop another IME. It's an important point that I neglected in the README.

That's all good, but I am very curious about the use-case that motivated the creation of this project.



Is this being done in collaboration with the Linguistics faculties of African universities?


Amharic, Ge'ez and "Ethiopic" are all the same alphabets, am I missing something? Ge'ez encompasses all of them including Tigrinya. Why not just have an expanded Ge'ez library?


It's interesting to see African input methods taking inspiration from a Chinese IME, RIME. Would like to know more about this!


How does this compare to IBus or Fcitx?


Cool. Do you have plans for Windows and macOS frontends?


Windows is supported through https://github.com/pythonbrad/afrim-wish


Oh, didn't realize that. How does it handle key input in Windows? Text Services Framework or something else?


The initial work was a prototype. We are currently working on using the Text Services Framework on Windows and IBus on Linux (mainly Wayland environments).

But the next release will take some time, since we are not familiar with these technologies. https://github.com/pythonbrad/afrim/issues/242


You could update the dependency "enigo" to the current version. I've implemented experimental support for Wayland. It can use Wayland protocols or libei. There will probably be bugs, which is why it's hidden behind feature flags.

I'm the maintainer, so if you need anything, please let me know


Thanks, currently I am using Debian (Wayland) and it will be great to test this enigo feature.


I didn't look at the code but the Cargo.toml file says it's using enigo as its dependency. You can use that to simulate input on Windows, macOS and Linux.


I was intrigued by this and looked through the README for examples of Afrim being used and learn more about the problem it's solving and how but didn't find these.


This video linked from the HN post is an example, although I didn't understand yet how it works. They type on a latin keyboard but get some other symbols as a result.

https://github.com/pythonbrad/afrim-keyboard/


Well spotted, I somehow missed that, thanks.

Noob me has no idea how editing is done. Delete and re-enter?

Have you ever used WordPerfect's reveal codes mode? It'd show both the actual input stream (characters you typed) and the formatted (rendered) output at the same time. (Here's the top hit showing the feature on Windows. https://youtu.be/LQOYYi2IHIY I used the DOS version, back when dinosaurs roamed the earth; I loved it.)

Should/Could there be a "reveal codes" mode for text input?

I've never used an IME, so please disregard this notion if it's not even wrong. :)


I couldn't understand what this is for and why it is useful. Gotta work on that documentation to make it more clear.



