
Ask HN: Could you recommend language agnostic NLP tools? - assane101
I just built a spell-checker for Wolof, my native language, using some basic rules and a dictionary I managed to put together. I need your help finding open source NLP tools that are language agnostic or do not require a lot of heavy lifting to adapt to a new locale.<p>Thanks for your help.<p>If you would like to test my spell-checker: https://digibox.info/apps/experiments/wolofix/
======
web64
Polyglot [0] is a multilingual NLP toolkit for Python. The quality is not great,
but it supports a lot of languages.

[0]
[https://github.com/aboSamoor/polyglot](https://github.com/aboSamoor/polyglot)

------
Pamar
I'm far from an expert, but I was just discussing this with a former colleague
about a specific problem he is considering, and I found this:
[https://www.r-bloggers.com/natural-language-processing-for-n...](https://www.r-bloggers.com/natural-language-processing-for-non-english-languages-with-udpipe/)

~~~
assane101
I see Wolof is under "Upcoming UD Languages". I know nothing about R, but I'll
see what I can contribute and/or get from there. Thanks!

------
itronitron
The Lucene API has a lot of language specific tokenizers and analyzers that
will help normalize what a term is in the index regardless of language. You
can then apply various statistical NLP methods which tend to be more language
agnostic.
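
To make the "normalize terms, then apply statistics" idea concrete, here is a minimal Python sketch. It does not use Lucene: the regex tokenizer is only a stand-in for a proper analyzer, and the function names (`tokenize`, `char_ngrams`, `ngram_profile`) are made up for illustration. It builds a character n-gram frequency profile, which is one statistical representation that needs no language-specific resources.

```python
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Normalize to NFC, lowercase, and split on non-word characters."""
    # NFC keeps accented letters as single code points where a composed form exists.
    normalized = unicodedata.normalize("NFC", text).lower()
    # In Python 3, \w is Unicode-aware for str patterns.
    return re.findall(r"\w+", normalized)


def char_ngrams(token, n=3):
    """Return the character n-grams of a token padded with boundary markers."""
    padded = f"<{token}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_profile(text, n=3):
    """Count character n-grams over all tokens in the text."""
    counts = Counter()
    for token in tokenize(text):
        counts.update(char_ngrams(token, n))
    return counts
```

Profiles like this can be compared between a misspelled word and dictionary entries, or between documents, without knowing anything about the grammar of the language.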

------
thecodingmonk
I work in NLP at a company that actually develops language agnostic solutions,
but I'm not aware of any open-source tool that can do this.

Nonetheless, if you can be more specific about what kind of tools you are
looking for maybe I can give you some pointers.

~~~
assane101
Thanks for your reply. If you don't mind sharing a link to your company's
website or products, I would appreciate it.

These are some areas of interest to me: 1- Translation, i.e. French->Wolof;
2- Speech understanding & question answering systems; 3- Text to speech, among
others. (I will work day and night to build training samples if I have the
tools.)

~~~
thecodingmonk
Sure, the company is Babelscape
([http://babelscape.com](http://babelscape.com)). For the translation tasks
you can find massive parallel datasets with several language pairs at
[http://opus.nlpl.eu/](http://opus.nlpl.eu/). The other two things that you
mentioned are not really in my area of expertise, so nothing comes to mind
at the moment.
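
For what it's worth, parallel corpora are commonly distributed as two plain-text files aligned line by line (the Moses-style format). Below is a minimal Python sketch for reading such a pair into sentence tuples. The function name `read_parallel` and the file names in the usage note are hypothetical, and whether a given language pair is available is something to check on the site itself.

```python
def read_parallel(src_path, tgt_path, encoding="utf-8"):
    """Yield (source, target) sentence pairs from two line-aligned files."""
    with open(src_path, encoding=encoding) as src, \
            open(tgt_path, encoding=encoding) as tgt:
        # zip stops at the shorter file, so unmatched trailing lines are ignored.
        for s, t in zip(src, tgt):
            s, t = s.strip(), t.strip()
            # Skip pairs where either side is empty.
            if s and t:
                yield s, t
```

Usage would look like `pairs = list(read_parallel("corpus.fr", "corpus.wo"))`, where both file names are placeholders for whatever the downloaded archive contains.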

