Hacker News
“sync,corrected by elderman” issue in ML translation datasets spread on internet (duckduckgo.com)
3 points by mvolfik on March 17, 2023 | 2 comments



I can't find the true origin of this, but (unless I'm missing some old internet joke) it seems that some language models were trained on corrupt data that frequently includes a string like "== sync, corrected by elderman ==". Searching for this phrase now yields a ton of seemingly random results, in places where you would expect automatically translated spam. Some interesting mentions I found:

- it historically appeared in autotranslated game chats in the Arena of Valor game: https://www.reddit.com/r/arenaofvalor/comments/btykru/commen...
- a mention in the GitHub repo of a translation model: https://github.com/Helsinki-NLP/Opus-MT/issues/62

I'm curious to see if anyone else has had interesting encounters with this.


I think that might've come from the rtfm.mit.edu FAQ archives. There were several documents there that had versions in multiple languages and were great bait for anything needing translated text inputs.



