I have tried models built for Arabic, but they fall short, both because of the borrowings from other languages and because the grammar differs in many cases.
There are two further difficulties. First, transliterating Latin-script sentences in North African Arabic (NAA) into the regular Arabic alphabet is challenging, since many transliterations are possible for a given word. Second, NAA is not standardized, so words are commonly written phonetically using a loose set of transcription rules, which are also borrowed from other languages.
An example of this last aspect is 'the pharmacist', which in NAA could be written in any of these forms (combinations of the changes are also possible):
- l frmsian
Thanks for the reference, I will check it out!
First, you should figure out what type of parsing you want. I would recommend looking at Stanford's CoreNLP library to see which task matches your need, since there are multiple ways to parse grammar. Once you can name the actual problem you want to solve, it should be googleable.
The downside of classical NLP is that you need to learn some amount of linguistics to create labeled parse trees for your data or even interpret them.
So, if your goal is to build an application, rather than a library, you may want to learn about neural nets/LSTMs. They can let you go from language to the actual information you want without you needing to encode and interpret parse trees.
The downside of neural nets is that they tend to need more data, but the data is much simpler, so you could farm this out to Mechanical Turk if you wanted.
Cotterell, R., & Callison-Burch, C. (2014). A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic. In LREC (pp. 241-245).
It uses Arabic characters, but at least it's labelled data for Maghrebi Arabic. So you'd "only" have to perform a transliteration, or multiple transliterations, between that corpus and your data.
As far as I know, no one has actually succeeded in doing NLP in NAA or Amazigh...
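To make the "multiple transliterations" idea concrete, here is a rough sketch of how you could expand an Arabic-script word from the corpus into several candidate Latin spellings and then look those up in your own data. The mapping table and the `latin_candidates` function are my own illustrative assumptions: the table is incomplete, it is not a standard, and you would need to adapt it to the conventions your writers actually use.

```python
from itertools import product

# Illustrative and incomplete mapping (an assumption, not a standard):
# each Arabic letter maps to the Latin renderings it commonly gets in
# informal writing, including digits such as 7, 3 and 9.
CANDIDATES = {
    "ح": ["7", "h"],
    "ع": ["3", "a"],
    "ق": ["9", "q", "k"],
    "خ": ["kh", "5"],
    "ش": ["ch", "sh"],
    "ب": ["b"],
    "ت": ["t"],
    "د": ["d"],
    "ر": ["r"],
    "س": ["s"],
    "ف": ["f"],
    "ك": ["k"],
    "ل": ["l"],
    "م": ["m"],
    "ن": ["n"],
}


def latin_candidates(word, limit=50):
    """Return up to `limit` candidate Latin spellings of an Arabic-script word.

    Letters missing from CANDIDATES are passed through unchanged.
    """
    options = [CANDIDATES.get(ch, [ch]) for ch in word]
    out = []
    # Cartesian product over the per-letter options, capped at `limit`
    # because the number of combinations grows quickly with word length.
    for combo in product(*options):
        out.append("".join(combo))
        if len(out) >= limit:
            break
    return out


print(latin_candidates("شمس"))  # ['chms', 'shms']
```

Note that Arabic script usually leaves short vowels unwritten, so these candidates are mostly consonant skeletons. You would probably have to compare them against your Latin text with the vowels stripped, or with some fuzzy matching on top.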
It's a topic of great interest to me but unfortunately I don't have time to invest in that subject. Please keep me informed of your progress!
I'd love to see a system doing NLP in the Latin alphabet for Amazigh and NLG to Tifinagh...