Show HN: Argos Translate – Open-source offline translation app written in Python

ad404b8a372f2b9 · on Dec 19, 2020

It's very refreshing to see a local-only program with a nice native GUI these days, thank you.

It's also cool to see how the open source translation ecosystem has evolved. I worked on Japanese<->English translation software just a few years back, and you had to do everything from scratch or use per-language tooling (like MeCab or Juman, ugh).

It'd be cool if there were metrics for your translation models so they could be compared to other offerings such as Google Translate.

pjfin123 · on Dec 19, 2020

Thanks! The goal was to make local-only translation usable. I think cloud translation is still pretty valuable in a lot of cases, since the model for a single translation direction is ~100MB. In addition to offering more language options without a large download, cloud translation lets you use more specialized models, for example French to Spanish. I just have a model to and from English for each language, and any other translation has to "pivot" through English. For cloud translation you can also use one model with multiple input and output languages, which gives you better quality translation between languages that don't have as much data available and lets you support direct translation between a large number of languages. Here's a talk where Google explains how they do this for Google Translate: https://youtu.be/nR74lBO5M3s?t=1682. You could do this locally, but it would have its own set of challenges in getting the right model for the languages you want to translate.
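
To make the "pivot" idea concrete, here is a minimal sketch in Python. It is not Argos Translate's actual code: the `make_pivot_translator` helper, the `MODELS` table, and the word-lookup `toy_model` are all hypothetical stand-ins for real per-direction neural models. It only shows the routing logic: use a direct model if one exists, otherwise chain source -> English -> target.

```python
from typing import Callable, Dict, Tuple

# A "model" here is just any function from text to text (hypothetical stand-in).
Translator = Callable[[str], str]


def make_pivot_translator(models: Dict[Tuple[str, str], Translator], pivot: str = "en"):
    """Return a translate(text, src, dst) function that pivots when needed."""

    def translate(text: str, src: str, dst: str) -> str:
        if src == dst:
            return text
        # Prefer a direct model when one is installed.
        direct = models.get((src, dst))
        if direct is not None:
            return direct(text)
        # Otherwise chain two models through the pivot language.
        to_pivot = models.get((src, pivot))
        from_pivot = models.get((pivot, dst))
        if to_pivot is None or from_pivot is None:
            raise KeyError(f"no translation path from {src} to {dst}")
        return from_pivot(to_pivot(text))

    return translate


def toy_model(table: Dict[str, str]) -> Translator:
    # Word-by-word dictionary lookup, only for demonstration.
    return lambda text: " ".join(table.get(word, word) for word in text.split())


# Only to-English and from-English models exist, as described above.
MODELS = {
    ("fr", "en"): toy_model({"bonjour": "hello", "monde": "world"}),
    ("en", "es"): toy_model({"hello": "hola", "world": "mundo"}),
}

translate = make_pivot_translator(MODELS)
print(translate("bonjour monde", "fr", "es"))  # prints "hola mundo"
```

With N languages this needs roughly 2N models instead of N*(N-1), at the cost of errors compounding across the two hops.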

It would be possible, and probably not too hard, to benchmark these translations (the standard method for automated testing is a BLEU score), but I haven't bothered so far. I use Stanza for sentence boundary detection, SentencePiece for tokenization, and OpenNMT for translation, which all seem to be about the best available open source options. I think the most interesting thing about the underlying translation is that using Stanza for sentence boundary detection lets it deal with languages that don't use periods with one set of tools: https://forum.opennmt.net/t/sentence-boundary-detection-for-.... An OpenNMT demo script I based my training script on claims a 28.0 BLEU score, which is pretty good, so that would probably be a reasonable estimate, at least for language pairs with lots of high quality data available: https://github.com/OpenNMT/OpenNMT-tf/tree/230118d72b8787605....
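
For readers unfamiliar with BLEU, the following is a rough sketch of how the score is computed, assuming one reference per sentence, whitespace tokenization, and no smoothing. The `corpus_bleu` function is written here purely for illustration; published numbers such as the 28.0 above are normally produced with an established tool (for example sacreBLEU), whose tokenization will differ from this toy version.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    hypotheses, references: parallel lists of token lists,
    with exactly one reference per hypothesis (a simplification).
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # "Clipped" matches: an n-gram is credited at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


ref = "the cat sat on a mat".split()
hyp = "the cat sat on the mat".split()
score = corpus_bleu([hyp], [ref])
print(round(score, 1))
```

A benchmark would run the translation model over a held-out test set and pass the model output as `hypotheses` and the human translations as `references`.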

pjfin123 · on Dec 19, 2020

From the Readme: "Open source offline translation app written in Python. Uses OpenNMT for translations, SentencePiece for tokenization, Stanza for sentence boundary detection, and PyQt for GUI. Designed to be used either as a GUI application or as a Python library."

This is my submission/project, happy to answer any questions.

rakwoelq · on Dec 20, 2020

Are there any future plans to bring this wonderful application to Flatpak too?

pjfin123 · on Dec 20, 2020

Hopefully. In the future I'd like to add better support for native installation on other platforms (Flatpak, .deb, .rpm, macOS, Windows, maybe mobile). If anyone who knows more about Flatpak is interested in helping, I would very much appreciate it.

ldng · on Dec 20, 2020

Am I the only one who cannot download the argosmodel files from Google Drive? (Tested with both Firefox and Chrome.)

pjfin123 · on Dec 20, 2020

I didn't get any notification from Google, but this probably means they're throttling the downloads. There's been a lot of traffic from Hacker News (GitHub stars have more than tripled in a day), so it shouldn't be an issue in a day or so. I'm going to try to add a torrent download now too as another option.

pjfin123 · on Dec 20, 2020

Just created a torrent with all of the models, and added a link to it in the models section of the README.

ldng · on Dec 20, 2020

Still not working for me (could it be that a Google account is now needed to use Drive?). Maybe you could add that torrent to the source code on GitHub.

pjfin123 · on Dec 20, 2020

Interesting. I just copied the .torrent file into the GitHub repo, so Google Drive isn't required at all. This is probably a better way to do it anyway.

maille · on Dec 19, 2020

Looks great, can you batch process Qt translation files?

pjfin123 · on Dec 19, 2020

I haven't written any specific integrations or scripts for translation formats because I don't have a great sense of what people would find useful. Since this is written in Python, it would be pretty easy to write a script to batch process most formats. Feel free to make a GitHub issue for Qt translation files (or other formats) and I may try to add support at some point. Qt translation files would be a good one too, because currently the GUI has no localization, which isn't ideal for a translation application. Also, if you write your own script, please share it; there is a scripts/ directory in the code where we could put scripts others might find useful.
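
As a starting point, a batch script for Qt `.ts` files could look roughly like the sketch below. It is an illustration under assumptions, not an existing Argos Translate script: `translate_ts` and `fake_translate` are hypothetical names, and the `translate` argument is any function from source text to translated text, which is where a real translation call would be plugged in. The sketch fills only empty entries marked `type="unfinished"` and leaves that marker in place so a human can still review the machine output in Qt Linguist.

```python
import xml.etree.ElementTree as ET


def translate_ts(ts_xml: str, translate) -> str:
    """Fill empty, unfinished <translation> entries in Qt .ts XML.

    translate: any callable mapping source text to translated text
    (hypothetical hook; plug a real translator in here).
    Note: ElementTree does not preserve the XML declaration or
    <!DOCTYPE TS> line, so a real script would need to write those
    back out itself.
    """
    root = ET.fromstring(ts_xml)
    for message in root.iter("message"):
        source = message.find("source")
        translation = message.find("translation")
        if source is None or translation is None or not source.text:
            continue
        is_unfinished = translation.get("type") == "unfinished"
        is_empty = not (translation.text or "").strip()
        # Never overwrite translations a human has already provided.
        if is_unfinished and is_empty:
            translation.text = translate(source.text)
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<TS version="2.1" language="es">
<context>
    <name>MainWindow</name>
    <message>
        <source>Open</source>
        <translation type="unfinished"></translation>
    </message>
    <message>
        <source>Close</source>
        <translation>Cerrar</translation>
    </message>
</context>
</TS>"""


def fake_translate(text: str) -> str:
    # Stand-in for a real English-to-Spanish model, for demonstration only.
    return {"Open": "Abrir"}.get(text, text)


result = translate_ts(SAMPLE, fake_translate)
print(result)
```

Batch processing would then be a loop over the `.ts` files in a directory, reading each one, calling `translate_ts`, and writing the result back.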