
In-Emulator Machine Translation of Retro Games - polm23
https://www.libretro.com/index.php/retroarch-1-7-8-ai-service-how-to-set-it-up/
======
Pfhreak
It's interesting to compare translation to localization here. They are two
different processes. Translation might give you a set of words with roughly
the same literal meaning, but localization tends to go much farther. For
example, localization might change sprites to something more recognizable to
the local audience, translate idioms into something meaningful, ensure that
the game respects cultural norms/laws (e.g. no skeletons in China, no Nazis in
Germany), or even alter color palettes to make something more recognizable to
a local audience. They'll also change the images with text in them, e.g. the
title screen.

Localization is a ton of work, full of many judgement calls. I love seeing
these types of translation efforts, but it's important to remember that it's
an incomplete process.

~~~
AdmiralAsshat
A full localization is always preferable, but since it does entail quite a bit
of work, I can think of at least two scenarios where this would be useful:

1) I played a good amount of untranslated Japanese ROMs for the NES/SNES/GBA,
back in the day. Some of them had pretty simplistic gameplay ("Press A to
swing your sword. Swing your sword until everything on-screen is dead.") or
gameplay that was self-explanatory once it started. Those games didn't really
need full translations to get enjoyment out of the game. Except, I recall a
handful where I could never even get to the gameplay part because I was stuck
in some pre-game menu hell where a non-native speaker would not be able to
figure out how to get the game going, short of clicking every menu option and
every sub-menu option. This happened a lot with sports games, IIRC.

2) A more recent example that comes to mind would be Rhythm Tengoku (a.k.a.
Rhythm Heaven) for the GBA. 95% of the game's gameplay is "Press a button in
time with the rhythm", making it trivial for non-Japanese speakers to play.
But one particular stage had an on-screen text prompt which told the player
_which_ button they should press at the right time, making it a random
guessing game for the non-Japanese speakers. A quick translation for what is
an incredibly tiny component of the full game would make a full localization
unnecessary and still allow international players to complete the game.

~~~
temporaryvector
>A full localization is always preferable

You'll quickly find this not to be the case for everyone if you hang around
fans of Japanese games or cartoons, or really any non-casual fans of foreign
media. A lot of those people prefer the translation to be as direct as
possible and for the translators to do little to no localization for various
reasons. The reasons are varied but include things like preserving the
artist's original intent, preserving cultural nuance, fear of censorship,
changes to the story lines to match local cultural sensitivities, inclusion of
local cultural references that get outdated fast and many more. Some of these
reasons are more valid than others, but in general localization is seen as
damaging to the original work in favor of getting more sales.

Taken to the extreme this results in people doing things like learning
Japanese to enjoy the originals or learning Russian to read Dostoevsky.

There is a trick to having just the right amount of localization to make the
work attractive to a new audience without alienating the existing fans, and
it's by no means trivial.

For the kind of people who emulate games or play fan translations, a complete
localization would not be a selling point, and a more direct (even if flawed)
translation could be preferable, particularly because over time these people
get used to interpreting the original intent as they become familiar with the
culture surrounding the games, even if they don't speak the language.

~~~
AdmiralAsshat
Such people exist, yes, but you're never going to convince me that those
people's opinions are valid. :)

I can completely understand not wanting to completely change someone's name or
setting (e.g. pretending your clearly-based-in-Japan game is actually New York
and protagonist Toshi is now named "Stan"), but the degree to which some of
them attempt to preserve the Japanese is mind-boggling. To use a contrived
example, let's say an RPG character has a special attack called 北斗の拳 (anime
name, not an attack name, I know, just bear with me). _Most_ people would
prefer that the name be translated to "Fist of the North Star" so that its
meaning is obvious. Nonetheless, there is a contingent of fans and purist
translators that insist it is _better_ simply left rendered as "Hokuto no
Ken", where the pronunciation is left intact, and yet the meaning is
completely obscured to the non-native speaker.

In which case, my question is, "If you believe your target audience knows
enough Japanese to understand what the hell that attack is called, why do they
need your translation?"

~~~
grawprog
The thing with old video games, the things RetroArch is translating, is that a
lot of localizations were terrible to the point of being incomprehensible
(Castlevania 2 comes to mind), came with some heavy censorship (especially
with Nintendo), and in some cases actually changed a bunch of the gameplay.
For example, Ninja Gaiden 3, where the North American version cut out
passwords and continues when it was localized.

------
AdmiralAsshat
Clearly this isn't going to eliminate fan-translations or suddenly make that
unreleased Super Famicom JRPG completely understood, but I can think of so
many games where just having a machine-translated _menu_ might make the
difference between a game being playable or not.

------
Qwertystop
Clyde Mandelin has done something similar,
[https://legendsoflocalization.com/wanderbar/](https://legendsoflocalization.com/wanderbar/)
, but without claiming "AI" and by producing HTML in a sidebar instead of
voice or screen-alteration.

(also useful for non-translation things. needs per-game plugins for e.g. what
memory area to look at. on the other hand, reading memory directly instead of
using OCR means much less processing to get it running live instead of
needing a button-press per-frame.)
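
A minimal sketch of the memory-reading idea, assuming a hypothetical per-game plugin that names a RAM address, a buffer length, a terminator byte, and a byte-to-character table. `TextPlugin`, `read_text`, and every value below are made up for illustration; none of it is taken from Wanderbar or from a real game.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TextPlugin:
    """Hypothetical per-game description of where dialogue text lives in RAM."""
    address: int             # start of the text buffer (made-up value)
    length: int              # maximum number of bytes to read
    terminator: int          # byte that marks the end of the string
    charmap: Dict[int, str]  # game-specific byte -> character table


def read_text(ram: bytes, plugin: TextPlugin) -> str:
    """Decode the text buffer straight from emulated RAM; no OCR involved."""
    chars = []
    for byte in ram[plugin.address:plugin.address + plugin.length]:
        if byte == plugin.terminator:
            break
        # Unknown bytes are shown as "?" rather than guessed.
        chars.append(plugin.charmap.get(byte, "?"))
    return "".join(chars)


# Illustrative only: a fake 16-byte RAM image and a fake character table.
plugin = TextPlugin(address=4, length=8, terminator=0xFF,
                    charmap={0x01: "は", 0x02: "い", 0x03: "え"})
ram = bytes([0, 0, 0, 0, 0x01, 0x02, 0xFF, 0x03, 0, 0, 0, 0, 0, 0, 0, 0])
print(read_text(ram, plugin))
```

Because the lookup is a table walk over a few bytes, it could plausibly run every frame, which is the trade-off against needing a plugin per game.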

------
derefr
It always surprises and annoys me that these two steps—OCR and translation—are
put together inseparably in these cloud APIs.

OCR is the much simpler of the two, and is possible with an entirely-offline
model; but AFAIK there are no “modern” OCR algorithms (e.g. the tech behind
WordLens) available as offline libraries. It’s cloud or nothing if you want to
get the best algorithms, and _especially_ if you want an OCR algorithm that
works on text containing an arbitrary mixture of Unicode graphemes rather than
text in a specific known language.

And, even if I _didn’t_ just want OCRed text—if you separate the two steps of
the process (putting things through OCR first and getting a result back, then
feeding those results to machine translation separately), then you also get
the opportunity to apply domain-specific “cleaning” to the resultant text
before feeding it into machine translation. This usually has much better
translation results.
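
A rough sketch of what the separated pipeline could look like, assuming the OCR and translation backends are passed in as plain functions. The cleaning rules (`clean_game_text`) are illustrative assumptions about game text boxes, not rules from any real service.

```python
import re
import unicodedata
from typing import Callable


def clean_game_text(raw: str) -> str:
    """Example domain-specific cleanup for OCRed game dialogue (assumed rules)."""
    text = unicodedata.normalize("NFKC", raw)  # fold full-width/half-width forms
    text = re.sub(r"[▼▶►]", "", text)           # drop "press a button" cursor glyphs
    text = re.sub(r"\s*\n\s*", "", text)        # rejoin lines broken by the text box
    return text.strip()


def translate_frame(image: bytes,
                    ocr: Callable[[bytes], str],
                    translate: Callable[[str], str]) -> str:
    """Run OCR and translation as separate steps with cleaning in between."""
    return translate(clean_game_text(ocr(image)))


# Stand-in backends so the sketch runs; swap in real OCR/translation calls.
def fake_ocr(image: bytes) -> str:
    return "ゲームを\nはじめる ▼"


def fake_translate(text: str) -> str:
    return f"<translated:{text}>"


print(translate_frame(b"", fake_ocr, fake_translate))
```

Rejoining lines with no separator suits Japanese; a language that uses spaces between words would want a space there instead.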

One thing I’ve always wanted from web browsers is the ability to just “see
into” images, such that I could copy-and-paste text out of any image embedded
in a page. (If you’ve ever seen a “translation annotation” done as a floating
div on an image, I’d expect to see those, but with text in the original
language, not the translated language.) I’ve been consistently surprised when
new browsers like Opera or Edge come out and don’t decide to differentiate
themselves by including this feature.

But, back on topic, I’d love if emulators could “lift” the bitmap text that
gets rendered to the framebuffer into semantic “text” visible to the OS
accessibility API, in the same way. This would enable:

• screen-reader software to read the text

• the user to highlight, copy and paste the text

• text transcript dumps to be made of the game, using 100% demo-files +
scripting

Certainly, translation could be added on top of this, but just having the
original text, in the original language, _available_ to the emulator’s engine,
would open all sorts of possibilities.

~~~
polm23
> One thing I’ve always wanted from web browsers is the ability to just “see
> into” images, such that I could copy-and-paste text out of any image
> embedded in a page.

Here you go:

[https://projectnaptha.com/](https://projectnaptha.com/)

~~~
derefr
Naptha has two flaws that exclude it from being what I mean:

1. It relies on the crappy old kind of OCR algorithms, that were only trained
on English text. (Yes, even though it uses cloud services. The cloud services
themselves use the crappy old OCR algorithms. Literally only the translation
services have modern ones.)

2. It isn’t _built into_ a web browser, in the sense of the model being
on-disk as part of the web browser and operating offline, but rather has to
upload your images to cloud services. If you want Naptha to be available
automatically for any page (such as you would if you were using a screen-
reader), then every image on every page you visit is _automatically_ fed to a
cloud service. This is not only a horrible privacy leak but also means that
with Naptha enabled, pages take _forever_ to fully load. (Which wouldn’t be
true with an offline model, given a good computer, or a device that has
dedicated ML-model-running circuits like the iPhone.)

Also, a third, that’s more of a UX flaw than anything and _could_ be fixed:

3. The model tries to parse the whole image as if it were left-to-right
prose, when often the image’s text is in columns, or figure captions, or
dialogue balloons. The user can very easily see where the text is, but there’s
no way for the user to indicate to the engine where on the image the text is.
This means that the model will “connect” text that shouldn’t be connected,
just because it’s on the same baseline, even though it’s part of two separate
text flows. It would make a lot more sense if the images with this plugin
enabled became interactive in the sense of allowing you to draw crop-mark-like
rectangles on them, where the text in the rectangle would then be separately
OCRed. (As well—even for English!—some text has the characters flowing
vertically instead of horizontally. Rather than recognizing these as
individual “lines”, it’d be helpful to be able to draw an arrow indicating
direction-of-flow.)
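
The rectangle idea can be sketched roughly as follows, assuming the image is just rows of pixel values and the OCR engine is a function that takes a cropped bitmap plus a vertical-flow flag. `FlowBox`, `crop`, and `ocr_flow_boxes` are hypothetical names, not part of Naptha.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[Sequence[int]]  # rows of pixel values; stand-in for a real bitmap


@dataclass
class FlowBox:
    """A user-drawn rectangle plus the direction the text flows inside it."""
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive
    vertical: bool = False


def crop(image: Image, box: FlowBox) -> List[List[int]]:
    """Cut the rectangle out of the image."""
    return [list(row[box.left:box.right]) for row in image[box.top:box.bottom]]


def ocr_flow_boxes(image: Image,
                   boxes: Sequence[FlowBox],
                   ocr: Callable[[List[List[int]], bool], str]) -> List[str]:
    """OCR each box on its own, so text that merely shares a baseline is never joined."""
    return [ocr(crop(image, box), box.vertical) for box in boxes]


# Stand-in OCR that only reports what it was given, so the sketch runs.
def fake_ocr(pixels: List[List[int]], vertical: bool) -> str:
    flow = "vertical" if vertical else "horizontal"
    return f"{len(pixels)}x{len(pixels[0])} {flow}"


image = [[r * 10 + c for c in range(6)] for r in range(4)]
boxes = [FlowBox(0, 0, 6, 1), FlowBox(4, 1, 6, 4, vertical=True)]
print(ocr_flow_boxes(image, boxes, fake_ocr))
```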

Yes, you could train a separate model to discover flow-boxes—but nobody’s done
it, and AFAIK we don’t have a good training corpus for it (save for rendering
documents with TeX and using the output of an intermediate layout step as the
training data, but this only goes so far.) It would make a lot of sense to use
these users’ indications of flow-box placement (with their opt-in consent)
_as_ this training corpus. (This model alone would be valuable enough to make
a sustainable business out of—a lack of a flow-box recognizer is the very
reason that Evernote, Notes.app, etc. all have handwriting-based _indexing_ ,
but no handwriting _transcription_.)

------
mysterydip
Coincidentally, this project with a similar goal but different approach came
up in my twitter feed this morning:
[https://www.codedojo.com/?p=2426](https://www.codedojo.com/?p=2426) done by
the guy who made the LORD and TEOS BBS games, for those that remember them.

------
lostgame
I'd add a suggestion to this.

Many games already have excellent fan translations - as a Sega Saturn junkie,
whose excellent selection of JAP games dwarfs its pathetic NTSC-U offerings,
I am very used to using, usually quite excellent, fan translations.

Could this service have access to a sort of database, where, in the case of an
existing translation for that game, given permission, it could simply retrieve
that, instead of a likely incorrect, or poorly-translated version? Using the
'AI' as a fallback when such translations don't already exist?
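
One way this lookup-then-fallback could be sketched, assuming a hypothetical database keyed by the ROM's SHA-1 and the original line of text. The schema and the function names are assumptions for illustration; nothing here reflects how the RetroArch service actually works.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical store: ROM SHA-1 -> {original line -> fan-translated line}.
FanDB = Dict[str, Dict[str, str]]


def rom_id(rom_bytes: bytes) -> str:
    """Identify a game by the hash of its ROM image."""
    return hashlib.sha1(rom_bytes).hexdigest()


def translate_line(rom_bytes: bytes,
                   source_text: str,
                   fan_db: FanDB,
                   machine_translate: Callable[[str], str]) -> Tuple[str, str]:
    """Prefer an existing fan translation; fall back to machine translation."""
    fan_lines = fan_db.get(rom_id(rom_bytes), {})
    hit = fan_lines.get(source_text)
    if hit is not None:
        return hit, "fan"
    return machine_translate(source_text), "machine"


# Illustrative data only.
rom = b"not a real rom"
fan_db: FanDB = {rom_id(rom): {"はい": "Yes"}}


def fake_mt(text: str) -> str:
    return "MT:" + text


print(translate_line(rom, "はい", fan_db, fake_mt))
print(translate_line(rom, "いいえ", fan_db, fake_mt))
```

Returning the source of each line alongside the text would let the emulator mark machine-translated lines as less trustworthy.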

------
peterburkimsher
If there are any Age of Empires II fans around here, please could someone let
me know how to install a Chinese language pack? I'm using AoE II HD on
Crossover on a Mac. I've installed the Traditional Chinese localization
("Zoey'sWork") but only see "???D??" instead of normal text. I'm not sure
whether it's an issue with Crossover or AOE.

