
Show HN: Mob Translate – Translation for Australian Aboriginal Languages - thomasfromcdnjs
https://mobtranslate.com
======
thomasfromcdnjs
So the site is mostly a prototype at this point ->
[https://mobtranslate.com/](https://mobtranslate.com/)

I'm building it out to be a fully open source and community driven project ->
[https://github.com/australia/mobtranslate-
server](https://github.com/australia/mobtranslate-server)

I've also only started work on my own tribe's language (Kuku Yalanji).

We have a dictionary in PDF form ->
[http://www.ausil.org.au/sites/ausil/files/WP-B-7%20Kuku-
Yal%...](http://www.ausil.org.au/sites/ausil/files/WP-B-7%20Kuku-
Yal%20-%20English%20Dict._0.pdf)

And I'm currently manually transcribing it all into YAML ->
[https://github.com/australia/mobtranslate-
server/blob/master...](https://github.com/australia/mobtranslate-
server/blob/master/dictionaries/kuku_yalanji/dictionary.yaml)

I'm trying to keep the project simple but with enough consideration to keep it
robust.

I'd love it if anyone is interested enough to perhaps lend a hand. I'm always
looking out for any other indigenous developers to network with. (I believe
I've only ever met one other person)

Also, for anyone who works in this space, any resource recommendations or tips
for writing a translation engine would be welcome. (I currently just do likely
word replacements, but will eventually make the engine try to understand
grammar.)
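
For anyone curious what "likely word replacements" might look like in practice, here is a minimal sketch. It is not the code in the repo: the `DICTIONARY` contents are placeholder tokens (not real Kuku Yalanji words) and `translate` is a hypothetical helper name.

```python
import re

# Hypothetical entries: placeholder tokens, NOT real Kuku Yalanji words.
DICTIONARY = {
    "water": "<yalanji:water>",
    "house": "<yalanji:house>",
}


def translate(sentence, dictionary):
    """Swap each known English word for its entry; leave the rest untouched."""

    def swap(match):
        word = match.group(0)
        # Look up case-insensitively, fall back to the original word.
        return dictionary.get(word.lower(), word)

    # Only runs of letters/apostrophes are treated as words, so
    # punctuation and spacing pass through unchanged.
    return re.sub(r"[A-Za-z']+", swap, sentence)


print(translate("The house is near water.", DICTIONARY))
```

Unknown words pass straight through, which makes gaps in the dictionary easy to spot, but nothing here handles word order or inflection.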

"Mob" is a common way Aboriginal people refer to each other's tribes: "Which
mob are you from?"

Edit: I've only added 20% of the dictionary so far, so I will finish it over
the next couple of days.

~~~
mabcat
> any resource recommendations or tips for writing a translation engine

All the successful translation websites/apps you're familiar with use machine
learning. ML stomped all over NLP approaches because it gives a rough
translation between so many more languages for so much less work.

On the question of where the data comes from, you might be a bit closer than
you think. That dictionary you're transcribing has some sentence pairs, and
like yorwba said, sentence pairs are food for ML language models. Extracting
all the sentence pairs into a dataset might raise some interest from ML
people.
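
A rough sketch of what that extraction could look like, assuming each transcribed entry carries a list of example sentence pairs. The `ENTRIES` shape, the field names, and the helper names are all hypothetical, and the text is placeholder only; the real dictionary file may be laid out differently.

```python
import csv
import io

# Hypothetical entry shape with placeholder text, not real dictionary data.
ENTRIES = [
    {"word": "<word-1>", "examples": [("<yalanji sentence 1>", "<english sentence 1>")]},
    {"word": "<word-2>", "examples": []},
    {"word": "<word-3>", "examples": [("<yalanji sentence 2>", "<english sentence 2>")]},
]


def extract_pairs(entries):
    """Collect every (source, target) example pair across all entries."""
    pairs = []
    for entry in entries:
        for source, target in entry.get("examples", []):
            pairs.append((source, target))
    return pairs


def write_tsv(pairs, handle):
    """Write the pairs as tab-separated text with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target"])
    writer.writerows(pairs)


buffer = io.StringIO()
write_tsv(extract_pairs(ENTRIES), buffer)
print(buffer.getvalue())
```

One sentence pair per line in a tab-separated file is a common layout for parallel text, so a file like this should be easy for others to pick up.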

~~~
akdor1154
I think ML approaches are not well suited here - or at least there are still
huge problems between non-Germanic/Latin languages that have yet to be
approached. (My background: white Australian, not a linguist, know a little
Vietnamese, know a very tiny bit about indigenous Australian languages based
on chatting with friends who studied that area)

I continually see consumer-facing ML approaches (FB, Google) give terrible
Vietnamese translations because they assume all of the context needed for a
translation is available in the text. In general this is not the case. In
Vietnamese this is hugely obvious because their pronoun system is largely
based on 3rd person relationships ("sister walks down the street", "boyfriend
loves girlfriend"), which is impossible to map to/from English 2nd person
("you walk down the street", "I love you") without basically a full conscious
intelligence. Even FB, which is in a unique circumstance of actually having a
lot of the requisite relationship data between people available to it, does a
terrible job at this.

My (tiny) understanding of the incredibly rich kinship systems in indigenous
Australian cultures suggests that this would be a huge issue there as well,
assuming these complexities are also present in their languages. (...OP? :) )

------
cam_l
If you are from FNQ, I had heard somewhere there are grants available from the
Qld gov for language initiatives. Did a quick search and found one here [0].

I know grant applications can be a real PITA, but on the face of it, it sounds
like you could be eligible (just in case you were not already aware).

[0] [https://www.datsip.qld.gov.au/programs-
initiatives/grants/in...](https://www.datsip.qld.gov.au/programs-
initiatives/grants/indigenous-languages-grants-2020)

~~~
thomasfromcdnjs
Good thinking, I imagine that it would be a grant-worthy project.

I personally prefer to stay away from funding and will do this project in my
spare time.

Though, once there is a big enough team, and they vote that the project
deserves funding, I'd be happy to investigate it.

------
wombatmobile
Great initiative that will have significant impact by making translations
available to all, including journalists, governments, lawyers, courts and
merchants. And of course it will give kids one more reason to contextualise
aboriginality as first class citizenship.

Have you looked into affiliations with unis and education departments?

~~~
thomasfromcdnjs
Great ideas.

Current plan will be to finish one whole dictionary.

Then find the second most popular dictionary.

Once two are fully integrated, I will do a bit of promotion to the places you
mentioned.

And hopefully just network everyone who is interested in thoroughly digitising
the tribal dialects.

------
mebeam
Having dabbled in NNs, speech and language recognition, I feel rightly ashamed
that Aboriginal language has never crossed my mind (ever!). That's not the
worst part: I'm a native of Australia :(

~~~
thomasfromcdnjs
Hey mate, part of me posting this here was to find other Aboriginal
programmers, as I've only ever met one other in my life.

I was thinking of putting together a group chat somewhere if you are
interested in joining.

~~~
mebeam
Absolutely, would really appreciate access to a group chat. I can organize
free hosting/servers if it helps.

Please msg me.

~~~
thomasfromcdnjs
Can't send you a message.

Email me at thomasalwyndavis@gmail.com

------
howlgarnish
Commendable effort, but do I understand correctly that you're just doing word-
for-word replacements? If so, the end result will not be at all grammatical,
since you're missing the case markings and inflections required in most
Aboriginal languages.

~~~
thomasfromcdnjs
Yeah, the translation hasn't quite been massaged yet. The next steps are:

1) Complete copying the dictionary in

2) Add more variations of words e.g. hi, hello, gday

3) Add a probability matcher that looks for % matches of likelihood e.g. house
!= hosue (0.75)

4) Use the saved grammar types to try to infer if the word is a verb before a
noun, etc.

5) Use a real ML/NLP system to really try to understand the language

Rinse and repeat those steps for optimizations. A few developers have already
reached out to contribute, so hopefully we can get this up to scratch ASAP.
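
Steps 2 and 3 above could be sketched roughly like this, using `difflib` from the Python standard library as a stand-in for the probability matcher. The data is placeholder only (not real dictionary entries), `lookup` and `VARIATIONS` are hypothetical names, and the 0.75 cutoff is just borrowed from the example in step 3.

```python
import difflib

# Hypothetical data: placeholder tokens, NOT real Kuku Yalanji words.
DICTIONARY = {
    "hello": "<yalanji:hello>",
    "house": "<yalanji:house>",
    "water": "<yalanji:water>",
}

# Step 2: map informal variations onto a single headword.
VARIATIONS = {"hi": "hello", "gday": "hello"}


def lookup(word, cutoff=0.75):
    """Return (translation, score); (None, 0.0) if nothing is close enough."""
    word = word.lower()
    word = VARIATIONS.get(word, word)
    if word in DICTIONARY:
        return DICTIONARY[word], 1.0

    # Step 3: fall back to the closest headword above the cutoff.
    candidates = difflib.get_close_matches(word, list(DICTIONARY), n=1, cutoff=cutoff)
    if not candidates:
        return None, 0.0
    best = candidates[0]
    score = difflib.SequenceMatcher(None, best, word).ratio()
    return DICTIONARY[best], score


print(lookup("gday"))
print(lookup("hosue"))
```

Returning the score alongside the translation would let the site flag low-confidence guesses instead of silently substituting them.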

------
rlv-dan
Good effort, but this is not Google Translate, just Translate.

Edit: I see the title of this post has been changed now.

~~~
thomasfromcdnjs
Sorry, I meant to put it in double quotes.

------
donbrae
Great project!

