Fast and secure translation on your local machine with a GUI (github.com/xapajiamnu)
158 points by Intralexical on April 14, 2024 | 38 comments



Good idea, I hope it works well.

I tried translating on the official website, https://private.mt/. The phrase "PRIVATE MACHINE TRANSLATION, RUNNING LOCALLY ON YOUR DEVICE" translates to Russian as "ПРАКТРОНАТРАТИВНАЯ РАННЕЕ ПЕРЕДАЖА, ПОВЕСТКИ ДЛЯ ВАШЕГО УСЕДАНИЯ", which is an incomprehensible jumble of characters (although "вашего" is a correct word).

Translating to Ukrainian also produced rubbish: "ПРИВАТНА МАЧИННА ПЕРЕКЛАД, РУНУВАННЯ ЛОКАЛІЇ НА СВОЇХ ДЕВІСНИКІВ".

To German it translates as "PRIVATE MACHINE ÜBERSETZUNG, RUNNING LOCALLY AUF IHREM DEVICE", compared to Google Translate's "PRIVATE MASCHINENÜBERSETZUNG, LÄUFT LOKAL AUF IHREM GERÄT".


It depends entirely on the specific model it's using, I guess. I believe the list it currently queries is here:

https://translatelocally.com/models.json

…Oddly, I don't actually even have Russian available at all in my desktop install of this. And there is Ukrainian, but only Ukrainian-to-English, so there's also no way that it could even be using another language as a pivot as there aren't any models that output Ukrainian. I guess the website might be using old, known bad models or something?

With "English-German tiny", I get "Private maschinelle Übersetzung, läuft lokal auf Ihrem Gerät", and with "English-German base", I get "Private maschinelle Übersetzung, die lokal auf Ihrem Gerät läuft", though I had to type it in lowercase.

I'd trust the translation quality enough to read foreign articles in it. Not enough to translate anything meant for anyone else to read…
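
The pivot reasoning above can be made concrete with a small sketch. Assuming the installed models are known as (source, target) language-code pairs, a translation is possible either directly or through a pivot language such as English. The pair list in the usage note is hypothetical illustration data, not the actual contents of models.json, and `find_route` is a made-up helper name.

```python
def find_route(pairs, src, trg, pivot="en"):
    """Return the list of model hops needed to translate src -> trg.

    pairs: iterable of (source, target) language codes, one per model.
    Returns [(src, trg)] for a direct model, two hops when a pivot
    route exists, or None when no route is available.
    """
    available = set(pairs)
    if (src, trg) in available:
        return [(src, trg)]
    # A pivot route needs a model INTO the pivot and one OUT of it.
    if (src, pivot) in available and (pivot, trg) in available:
        return [(src, pivot), (pivot, trg)]
    return None
```

With hypothetical pairs `[("uk", "en"), ("en", "de"), ("de", "en")]`, `find_route(pairs, "uk", "de")` goes through English, while `find_route(pairs, "en", "uk")` returns `None` because no model outputs Ukrainian.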


Yeah, it'll all be about the model. I do a fair bit of machine translation and the Helsinki OPUS models are generally good enough for grokking what a text is about. Definitely better than the examples above.


> To German it translates as "PRIVATE MACHINE ÜBERSETZUNG, RUNNING LOCALLY AUF IHREM DEVICE"

  ATTENTION
   This room is fullfilled mit special electronische equippment.
   Fingergrabbing and pressing the cnoeppkes from the computers is
   allowed for die experts only!  So all the "lefthanders" stay away
   and do not disturben the brainstorming von here working
   intelligencies.  Otherwise you will be out thrown and kicked
   anderswhere!  Also: please keep still and only watchen astaunished
   the blinkenlights.


As we clarified under the sibling comment by KTibow (why is it flagged now?), lowercasing the input text results in OK translations.


Well, it did say it was fast and secure, nothing about accuracy...


[flagged]


> which GPT says is correct

AI really has broken people's brains, huh :/


Either I’m missing a joke or you’re being very unnecessarily rude. Hoping it’s the former.


He's basically trying to say don't fully put your trust in any LLM, even one of the top ones. As you can see from an adjacent comment from someone who speaks the language, it's a closer translation but still not quite right.

> Seems so. I typed in a lowered version now, gives good translation "частный машинный перевод, работая локально на вашем устройстве". (The one you got is a little clumsy ~ "translation of machines")


If you don't know something, then just say you don't know. Deferring to an LLM just comes across as low-effort and irrelevant.


He is right, you don't confirm that an LLM works well by comparing it to the result of another LLM.


As with many things spread by the Internet, I think it's just lowered the bar (and the effort) for participation. But the brains are the same.


Seems so. I typed in a lowered version now, gives good translation "частный машинный перевод, работая локально на вашем устройстве". (The one you got is a little clumsy ~ "translation of machines")


It's not a good translation. The participle is in the wrong form, the first part of the sentence does not make sense in this context and reads something like "this is the machine's private business's translation".


No, it does not read like that. The translation is (relatively) good.


It does read like that. While "частный" is the direct mechanical translation of "private", it does not work like that in the context.


The second part of that sentence is correct. The first one makes no sense in the context, and is awkwardly constructed. The first part says something like "machines' private translation (as in this is the private translation business run by machines)", though there are other possible interpretations.


How does this compare to DeepL in translation quality?


Interestingly, I think this is actually related to the offline translation features built into Firefox. Both are products of "Project Bergamot", but the Mozilla-maintained version was later merged into the Firefox application:

https://browser.mt/

https://blog.mozilla.org/en/mozilla/local-translation-add-on...

https://hacks.mozilla.org/2022/06/training-efficient-neural-...

https://github.com/mozilla/firefox-translations

https://firefox-source-docs.mozilla.org/toolkit/components/t...

Extra webpage with screenshot and links, impossible to search for normally:

https://translatelocally.com/downloads/

Does one thing and does it well.

Oh— For downloading models, it's much easier to pipe/`xargs` `translateLocally --available-models` into `translateLocally -d` than go through the GUI.
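
A rough sketch of that download-everything approach, written in Python rather than `xargs`. It assumes each line of the `--available-models` listing ends with the model identifier that `-d` expects (for example, a line ending in `-d cs-en-tiny`); that output format is an assumption, so check it against your install before relying on it. The helper names are made up for illustration.

```python
import subprocess


def parse_model_ids(listing):
    """Take the last whitespace-separated token of each non-empty line.

    ASSUMPTION: every listing line ends with the model identifier.
    """
    ids = []
    for line in listing.splitlines():
        tokens = line.split()
        if tokens:
            ids.append(tokens[-1])
    return ids


def download_all(binary="translateLocally"):
    """List the models not yet installed, then download them one by one."""
    listing = subprocess.run(
        [binary, "--available-models"],
        check=True, capture_output=True, text=True,
    ).stdout
    for model_id in parse_model_ids(listing):
        subprocess.run([binary, "-d", model_id], check=True)
```

Calling `download_all()` fetches every model the listing reports, so it may take a while and use a fair amount of disk space.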

---

Other self-hostable translation tools:

https://www.apertium.org/index.eng.html

- Traditional rule-based translation. Seems to work pretty well, but no good desktop frontend.

https://www.argosopentech.com/

- Works, but crashy desktop app.

https://libretranslate.com/

- API wrapping Argos Translate.

https://lingva.thedaviddelta.com/

- Google Translate scraper/privacy frontend.

https://euroglot.com/

- Proprietary, subscription trialware.


In Firefox, you can get this by navigating to about:translations.


The models used here, though I haven't really tried them yet, seem to be much older and much worse than seamless-m4t-v2 [1], which is multimodal and supports the following tasks:

- Speech-to-speech translation (S2ST)

- Speech-to-text translation (S2TT)

- Text-to-speech translation (T2ST)

- Text-to-text translation (T2TT)

- Automatic speech recognition (ASR)

across:

- 101 languages for speech input

- 96 languages for text input/output

- 35 languages for speech output

I tried it for low resource languages like Thai to German for text and audio, and it works quite well.

[1] https://huggingface.co/facebook/seamless-m4t-v2-large


> https://huggingface.co/facebook/seamless-m4t-v2-large

Unfortunately, interpreting "CC-BY-NC" as a software license, I think you'd be pirating if you used the linked models for anything you might sell.

(Bergamot is BY-SA, but I think the virality would only apply to derivative models and not model outputs, whereas Facebook's NonCommercial clause might apply to usage of the original model itself, as it usually does in software licenses.)


I was looking for something like this since I found the awesome Firefox plugin, thank you!


This is interesting. LibreTranslate has been my go-to offline translation system for a few years, but I need an upgrade. This looks like a decent candidate, assuming I can make it work with what we got.


Anyone happen to know how this program got funded by Horizon?


https://cordis.europa.eu/project/id/825303 seems to have a lot of info.


Whoa, 3 million EUR! Nice!

I hope they'll fund more things that aim to break cloud/vendor lock-in.


What languages does it support? How about Japanese and Chinese?


could this be used for locally translating between programming languages in addition to natural languages?

any models for that available i wonder?


It depends on the scale you're talking about but the local LLMs are fairly capable of translating between different programming languages, particularly if you're not so concerned about external library support.

Pasting a function in and asking for it in a different programming language will get an implementation in the target language. Using ollama run llama2:13b on my mac will allow converting in this fashion.

It might not be the best code, but this is true of machine translations as well.
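
The paste-and-ask workflow above could also be scripted against a locally running Ollama server instead of the interactive prompt. This is a minimal sketch, assuming Ollama is listening on its default port and that `llama2:13b` has already been pulled; the prompt wording and helper names are illustrative, not anything prescribed by Ollama.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot completions.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(code, source_lang, target_lang):
    """Wrap the pasted code in a plain-language translation request."""
    return (
        f"Translate the following {source_lang} code to {target_lang}. "
        "Reply with only the translated code.\n\n"
        f"{code}"
    )


def build_request(code, source_lang, target_lang, model="llama2:13b"):
    """Build the HTTP request without sending it."""
    payload = {
        "model": model,
        "prompt": build_prompt(code, source_lang, target_lang),
        "stream": False,  # ask for a single JSON reply, not a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_code(code, source_lang, target_lang, model="llama2:13b"):
    """Send the request to the local server and return the model's text."""
    request = build_request(code, source_lang, target_lang, model)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

For example, `translate_code("def add(a, b): return a + b", "Python", "Go")` returns whatever the model produces, which still needs reviewing like any other machine translation.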


thanks for the reply!

  > Using ollama run llama2:13b on my mac will allow converting in this fashion
okay, i'll have to try this out

  > It might not be the best code, but this is true of machine translations as well.
true, one thing i'm hoping for is some local version of Copilot/ChatGPT for code, where it's trained on local libraries and the local project and such (and can translate them in some scenarios)


Computer programming languages are already machine-parsable though. ML does not seem like the appropriate solution for converting between them.

Technically what you're describing is done by a compiler/decompiler/transpiler, operating on the AST.
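
As a toy illustration of the AST approach (not a claim about how any real transpiler is built), here is a sketch that uses Python's standard `ast` module to parse an arithmetic expression and re-emit it as a Lisp-style prefix expression. The supported subset and the function names are choices made for this example only.

```python
import ast

# Only the four basic arithmetic operators are handled in this sketch.
_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_sexpr(node):
    """Recursively convert an expression AST node to prefix notation."""
    if isinstance(node, ast.Expression):
        return to_sexpr(node.body)
    if isinstance(node, ast.BinOp):
        symbol = _OPERATORS.get(type(node.op))
        if symbol is None:
            raise ValueError(f"unsupported operator: {type(node.op).__name__}")
        return f"({symbol} {to_sexpr(node.left)} {to_sexpr(node.right)})"
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def transpile(source):
    """Parse a Python arithmetic expression and emit a prefix expression."""
    return to_sexpr(ast.parse(source, mode="eval"))
```

Operator precedence comes for free from the parser: `transpile("a + 2 * b")` gives `(+ a (* 2 b))`. Anything outside the supported subset raises `ValueError`, which is the usual trade-off with this approach: every construct has to be handled explicitly.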


Programming languages are machine-parseable if you already have a parser for them. The benefit of LLMs is that you do not need a parser for each programming language.


How does this compare to something like Whisper?

EDIT: this is a genuine question as I don't have a clue. Rather than downvoting without comment, maybe downvote and let me know why my question is dumb?


It's translation (text -> text), not speech -> text.


Thanks, the clarification is much appreciated. I clearly overlooked that, which now that it's pointed out seems entirely obvious, my bad. Only took negative karma for it to click, haha.


Ironically, the other link I posted at the same time is actually speech to text. You want something like VOSK if you're looking for local machine transcription:

https://news.ycombinator.com/item?id=40027675

As for quality, I think its models are, IDK, maybe around the level that YouTube automatic captions were two or three years ago? So well over 90% accurate, and serviceable for getting something to search for or clean up, but expect it to get a word wrong every now and then.


This post got downvoted, but there's a legit point here. I've found Whisper's translated speech-to-text to be pretty decent, certainly compared to the reported quality of the bergamot-tiny model used in the OP.

FWIW, I like Helsinki OPUS on Hugging Face, worth checking out if you need machine translation and can deal with sub-Google-Translate quality.
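
For anyone wanting to try that, here is a minimal sketch of running one of the Helsinki-NLP OPUS-MT checkpoints through the `transformers` library. The thread does not say which checkpoint is meant, so the English-to-German default is only an example. It assumes `transformers`, `torch`, and `sentencepiece` are installed, and a checkpoint does not exist for every language pair.

```python
def opus_mt_model_name(src, trg):
    """Build the Hugging Face model id used by the OPUS-MT checkpoints."""
    return f"Helsinki-NLP/opus-mt-{src}-{trg}"


def translate(texts, src="en", trg="de"):
    """Translate a list of strings with a locally cached OPUS-MT model."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    name = opus_mt_model_name(src, trg)
    tokenizer = MarianTokenizer.from_pretrained(name)  # downloads on first use
    model = MarianMTModel.from_pretrained(name)
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

After the first download the model runs offline, so `translate(["private machine translation, running locally on your device"])` can be compared directly against the outputs quoted earlier in the thread.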



