It's very refreshing to see a local-only program with a nice native GUI these days, thank you.
It's also cool to see how the open source translation ecosystem has evolved. I worked on Japanese<->English translation software just a few years back, and you had to do everything from scratch or use per-language tooling (like MeCab or Juman, ugh).
It'd be cool if there were metrics for your translation models so they could be compared to other offerings like Google Translate, for example.
Thanks! The goal was to make local-only translation usable. I think cloud translation is still pretty valuable in a lot of cases, since the model for a single translation direction is ~100MB. Besides offering more language options without a large download, cloud translation lets you use more specialized models, for example French to Spanish. I just have a model to and from English for each language, and any other translations have to "pivot" through English.

With cloud translation you can also use one model with multiple input and output languages, which gives you better quality between language pairs that don't have as much data available and lets you support direct translation between a large number of languages. Here's a talk where Google explains how they do this for Google Translate: https://youtu.be/nR74lBO5M3s?t=1682. You could do this locally too, but it would have its own set of challenges around getting the right model for the languages you want to translate.
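To make the pivoting idea concrete, here's a minimal sketch of what it looks like: two single-direction models chained back to back, with English in the middle. The `translate()` helper here is a stand-in stub, not the project's actual API.

```python
def translate(text: str, source: str, target: str) -> str:
    """Placeholder for a single-direction model call (e.g. one OpenNMT model
    per direction); stubbed out so this sketch runs standalone."""
    return f"[{source}->{target}] {text}"


def translate_via_pivot(text: str, source: str, target: str, pivot: str = "en") -> str:
    """Translate between two non-English languages by chaining two models
    through a pivot language (English here)."""
    intermediate = translate(text, source, pivot)   # e.g. fr -> en
    return translate(intermediate, pivot, target)   # e.g. en -> es


if __name__ == "__main__":
    print(translate_via_pivot("Bonjour le monde", "fr", "es"))
```

The obvious trade-off is that errors from the first model get compounded by the second, which is part of why a single multilingual cloud model can do better on low-resource pairs.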
It would be possible, and probably not too hard, to benchmark these translations (the standard metric for automated evaluation is the BLEU score), but I haven't gotten to it so far. I use Stanza for sentence boundary detection, SentencePiece for tokenization, and OpenNMT for translation, which all seem to be about the best open source options available. I think the most interesting thing about the underlying translation is that using Stanza for sentence boundary detection lets it handle languages that don't use periods with one set of tools: https://forum.opennmt.net/t/sentence-boundary-detection-for-.... An OpenNMT demo script I based my training script on claims a 28.0 BLEU score, which is pretty good, so that would probably be a reasonable estimate, at least for language pairs with lots of high quality data available: https://github.com/OpenNMT/OpenNMT-tf/tree/230118d72b8787605....
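For anyone curious what such a benchmark would look like, here's a minimal sketch using the sacrebleu package to compute a corpus-level BLEU score. The file names and the `translate()` placeholder are assumptions for illustration, not part of the project.

```python
import sacrebleu  # pip install sacrebleu


def translate(text: str) -> str:
    """Stand-in for running the local model on one source sentence."""
    return text  # replace with an actual model call


# One source sentence per line, and one reference translation per line.
with open("test.src") as f:
    sources = [line.strip() for line in f]
with open("test.ref") as f:
    references = [line.strip() for line in f]

hypotheses = [translate(s) for s in sources]

# sacrebleu takes a list of hypothesis strings and a list of reference
# streams (here a single reference per sentence).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")
```

Scores are only comparable when computed on the same test set with the same tokenization, which is exactly the problem sacrebleu was written to standardize.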
From the Readme:
"Open source offline translation app written in Python. Uses OpenNMT for translations, SentencePiece for tokenization, Stanza for sentence boundary detection, and PyQt for GUI. Designed to be used either as a GUI application or as a Python library."
This is my submission/project, happy to answer any questions.
In the future I'd like to add better support for native installation on other platforms (Flatpak, .deb, .rpm, macOS, Windows, maybe mobile). If anyone who knows more about Flatpak is interested in helping, I would very much appreciate it.
I didn't get any notification from Google, but this probably means they're throttling the downloads. There's been a lot of traffic from Hacker News (GitHub stars have more than tripled in a day), so it shouldn't be an issue in a day or so. I'm also going to try to add a torrent download as another option.
Still not working for me (could it be that a Google account is now needed to use Drive?). Maybe you could add that torrent to the source code on GitHub.
Interesting. I just copied the .torrent file into the GitHub repo, so Google Drive isn't required at all. This is probably a better way to do it anyway.
I haven't written any specific integrations or scripts for translation formats because I don't have a great sense of what people would find useful. Since this is written in Python, it would be pretty easy to write a script to batch process most formats. Feel free to make a GitHub issue for Qt translation files (or other formats) and I may try to add support at some point. Qt translation files would be a good one, because currently the GUI has no localization, which isn't ideal for a translation application. Also, if you write your own script, please share it; there's a scripts/ directory in the code where we could put scripts others might find useful.
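As a rough idea of what such a script could look like, here's a sketch that fills in untranslated entries in a Qt Linguist .ts file (which is plain XML with <context>/<message> elements containing <source> and <translation> children). The `translate()` helper is a placeholder for the actual library call, and the file names are made up.

```python
import xml.etree.ElementTree as ET


def translate(text: str) -> str:
    """Stand-in for a call into the translation library."""
    return text  # replace with a real translation call


def fill_ts_file(path_in: str, path_out: str) -> None:
    """Fill empty <translation> entries in a Qt .ts file."""
    tree = ET.parse(path_in)
    for message in tree.getroot().iter("message"):
        source = message.findtext("source") or ""
        translation = message.find("translation")
        if translation is None:
            translation = ET.SubElement(message, "translation")
        # Only fill entries that are still empty.
        if not (translation.text or "").strip():
            translation.text = translate(source)
            translation.attrib.pop("type", None)  # drop the "unfinished" marker
    tree.write(path_out, encoding="utf-8", xml_declaration=True)


# e.g. fill_ts_file("app_es.ts", "app_es_filled.ts")
```

Something along these lines could live in the scripts/ directory as a starting point for other formats (.po files would be a similar exercise with the polib package).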