
This is really cool!

I've done a lot of research into LLM translation for my product[0], and I'm currently working on a deep translation service that provides reliably human-level translations.

I don't know what model you're using, but GPT-4.1 is probably the best for your use case - it's in the top few % for nearly every language, and has a low standard deviation, while also being relatively low latency and low cost.

[0] https://nuenki.app


"Devotion" feels more appropriate.

I'm working on a translation API that's ~2x better than SOTA LLMs.

The API itself is done. Right now I'm remaking the landing page so that it doesn't, quote, "look like it's from 2015". We'll see how it goes.


Hi, Vishesh here. I'd love to collaborate; I have 8 months of experience building ML solutions.

I'm a flexible full-stack AI developer.

My resume, for reference:

https://drive.google.com/file/d/1tJBfVqmZycwueCMq1ym484lMmU3...


My old technical school had Immersive Labs, which used remote VMs and had some tutorials around Linux usage.

This is a neat topic. I've only ever done this once.

A friend of mine had made a spreadsheet of Fusion 360 shortcuts, so I made a little webapp for use at school: https://fusion.alexcj.co.uk/

I made it use a custom stylesheet for printing, as they describe in the article, so that it produced a nice worksheet - though in retrospect, I probably ought to have made it denser when printed!
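If anyone wants to do the same, the core of it is tiny. This is a simplified sketch rather than my actual code, and the selectors are made up:

    // Inject a print-only stylesheet so the page prints as a clean
    // worksheet. Selectors are illustrative, not the app's real ones.
    const printCss = `
      @media print {
        nav, footer, .search-bar { display: none; }   /* hide page chrome */
        .shortcut-table { font-size: 9pt; }           /* denser on paper */
        .shortcut-table tr { break-inside: avoid; }   /* keep rows intact */
      }
    `;

    const style = document.createElement("style");
    style.textContent = printCss;
    document.head.appendChild(style);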


Ironically, maybe writing. I've been able to get LLMs to translate well, but I can't get them to write well.

More realistically, though - the areas with the best lobbying and strongest unions.


By far, DeepSeek writes the best of any model I've used. I've heard it's because they totally ignored copyright law and ended up with a larger training set of higher-quality literature.

I've noticed the same! I assumed it was something to do with its reinforcement learning making it more organic and less LLM-speak, but that sounds plausible.

I'm working on LLM translation research for my tool that teaches you a language while you browse by translating sentences at your level into the language you're learning (https://nuenki.app)

I've had some breakthroughs with LLM translation, and I can now translate (slowly, unfortunately) at a far, far higher quality than Opus, and well above DeepL. So I'm considering offering that as an API, though I don't know how much people actually care about translation quality.

DeepL's customers clearly don't care - their website is all about enterprise features, and they appear to get plenty of business despite their core product being mediocre.

Would people here be interested in that?


This looks fun. I don't have any specific suggestions - I've always used WASM with Rust - but I wish you luck with your project!

P.S. You might have more success with replies on Reddit. HN is very all-or-nothing.


Tried the Reddit thing; funny story. I tried to post to the wasm sub and it said it was restricted. I posted to webdev and they said to maybe try the programminglanguages sub. So I tried there and they deleted it! I don't understand anyone anymore.

Classic Reddit. IRC, perhaps? :)

Does anyone still use IRC? I thought freenode died?

There's a successor, though I can't remember the name. You could also try Matrix or Discord?

The internet is too big; it didn't use to be this way :(

Thanks, I'll give it a try.


You might like my tool. It has the same general principle as LingQ - learning from content you actually want to read - except it applies it to all web browsing. It's a browser extension that finds sentences in webpages, scores them by difficulty, then translates the ones that are right at the edge of your knowledge.

https://nuenki.app
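To give a feel for what "edge of your knowledge" means mechanically, here's a simplified sketch of the selection step. Every name in it is made up, and the real scoring is considerably more involved:

    // Score each sentence by how much of its vocabulary the user
    // doesn't know yet, then keep the ones that are challenging but
    // not overwhelming. Thresholds are illustrative.
    interface Sentence { text: string; }

    function estimateDifficulty(s: Sentence, knownWords: Set<string>): number {
      const words = s.text.toLowerCase().split(/\s+/);
      const unknown = words.filter(w => !knownWords.has(w)).length;
      return unknown / words.length; // fraction of unfamiliar vocabulary
    }

    function pickSentences(sentences: Sentence[], knownWords: Set<string>): Sentence[] {
      return sentences.filter(s => {
        const d = estimateDifficulty(s, knownWords);
        return d > 0.1 && d < 0.3; // near the edge of the user's knowledge
      });
    }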

I haven't added Anki integration, though. A few people have asked for it, but it's a big time investment for something relatively niche.


My tool supports Thai, if you'd like to try it - https://nuenki.app . I added it at the request of a user, who seems to be happy with it.

It's a browser extension that finds English sentences in webpages, and translates the ones at your difficulty level into the language you're learning.


Thank you, I will try it, although I'd prefer it to translate entire sentences into Thai at random. Perhaps you could add that as an advanced mode. Actually, I saw your app before while looking for an alternative to Toucan that supported Thai, but at that point you hadn't added support yet. Thanks for doing so.


Okay, I installed it, and this is pretty great. Although I think your extension doesn't work for Thai the way you think it does: because Thai puts spaces between sentences instead of between words, it's translating entire sentences even with the "words only" setting enabled. This is what I want anyway, but it will be too difficult for most learners.

I have written various pieces of Thai learning software, and just so you know, you should use an LLM to do word-splitting, not a software library. If you do use a library, you need to split words while looking for the largest possible word, but even that won't work well. Basically, you can't tell without a brain whether a run of characters is a lot of small words next to one another or a smaller number of compound words. IME only an LLM or a human will do a good job of this.


Translating entire sentences is the idea - I'm not sure what setting you mean by "words only"? I really ought to make the settings clearer, but it's hard to do when you already know what they "ought" to express.

"Translate Isolated Words" allows it to translate "sentences" of only one word, but it doesn't disable full sentences.

And yeah, at the moment it splits words by spaces for the dictionary. I hadn't thought to do it with LLMs, though that's a good idea. There's a somewhat related problem when generating furigana, where the extension has a hashmap of strings to pronunciations, and it starts with a 4-character sliding window looking for matches, then a 3-character one, and so on.
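For the curious, that greedy longest-match approach is roughly this (a sketch, with the caveats above - it's exactly the technique that mis-splits compounds):

    // Greedy longest-match against a dictionary: try the widest window
    // first, shrink until something matches, fall back to one character.
    // This is the naive library-style approach discussed above.
    function segment(text: string, dict: Set<string>, maxLen = 4): string[] {
      const out: string[] = [];
      let i = 0;
      while (i < text.length) {
        let len = Math.min(maxLen, text.length - i);
        while (len > 1 && !dict.has(text.slice(i, i + len))) len--;
        out.push(text.slice(i, i + len));
        i += len;
      }
      return out;
    }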


That's a pretty sick idea. Unfortunately I presume it involves sending your browsing data (e.g. page contents) to the server?


Yeah, though I've added lots of privacy protections to at least partially mitigate that:

- There's a global blacklist of sites, as well as phrases in the title/URL (e.g. "bank")

- You can blacklist sites yourself

- Each sentence is run against filters checking for medical/legal/etc. info, as well as checks for addresses, card numbers, social security numbers, etc. All the checks are done client-side

- There are also some special implementations, e.g. it looks at the source code of websites to work out if they're an instance of an American health portal that I've forgotten the name of - each doctor's surgery self-hosts it.

- Websites can add `nuenki-ignore=true` on their end, if they'd like to disable it.

And of course it doesn't log anything, though there is an anonymous cache in order to make it economical.
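Roughly, the client-side gate looks like this. It's a simplified sketch: the patterns and lists are illustrative stand-ins, and I'm assuming the opt-out is an attribute on the root element:

    // Nothing is sent for translation unless every check passes.
    // BLOCKED_PHRASES and CARD_NUMBER are illustrative stand-ins.
    const BLOCKED_PHRASES = ["bank", "checkout", "medical"];
    const CARD_NUMBER = /\b(?:\d[ -]?){13,16}\b/; // crude card-number check

    function pageLooksSensitive(url: string, title: string): boolean {
      const haystack = (url + " " + title).toLowerCase();
      return BLOCKED_PHRASES.some(p => haystack.includes(p));
    }

    function shouldTranslate(url: string, title: string, sentence: string): boolean {
      // Sites can opt out with nuenki-ignore=true (assumed here to be
      // an attribute on the document's root element).
      if (document.documentElement.getAttribute("nuenki-ignore") === "true") return false;
      return !pageLooksSensitive(url, title) && !CARD_NUMBER.test(sentence);
    }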


What about a whitelist? I might be interested in having only certain sites, like this one or Reddit, translated into my target language. That way I can be certain it's only turned on for sites I'm OK with sharing browsing history for, rather than worrying that I might have missed adding something to the blacklist.


This is a great idea. Specifically, I want this enabled when I'm wasting time but not when I'm working, so I'd like it to be enabled only on X.com. This whitelist+blocklist functionality could be a user-side setting, like with ad blockers.


That's a good point. At some point I ought to make a uBlock Origin-style list of customisable rules.

At the moment I'm focused on translation quality, but I intend to add that.
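For reference, a uBlock Origin-style rule list could be as simple as this (entirely hypothetical - none of this exists in the extension yet):

    // First matching rule wins; the default decides behaviour for
    // unmatched sites. A whitelist is just defaultAllow = false.
    type Rule = { pattern: string; action: "allow" | "block" };

    function isEnabled(hostname: string, rules: Rule[], defaultAllow: boolean): boolean {
      for (const rule of rules) {
        if (hostname === rule.pattern || hostname.endsWith("." + rule.pattern)) {
          return rule.action === "allow";
        }
      }
      return defaultAllow;
    }

    // Whitelist mode: translate only on x.com.
    isEnabled("x.com", [{ pattern: "x.com", action: "allow" }], false);       // true
    isEnabled("example.org", [{ pattern: "x.com", action: "allow" }], false); // false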


Thanks!

