Localizing Papers, Please (dukope.tumblr.com)
198 points by baxter on Aug 3, 2014 | 58 comments

In case you don't see the link, Pope created a web interface, hosted on GitHub Pages, that allows users to easily do the localization, collaborate via GitHub forks, and import/export their work via CSV:


Pretty brilliant... reading Pope's devlog for the game makes it clear that he's an excellent game developer and designer, but it's unexpected that he'd also know how to whip up a useful, well-designed web app to support his game like this.

It is always interesting to see people use CSV in place of the established file format for translations, POT files. There is logic for certain grammatical elements (IIRC, it has been a while) and other edge cases, which is exactly why one avoids homegrown CSV in the first place.

Then again, even Papers, Please is making common mistakes. I can count on a single finger the number of i18n jobs I have seen: one, in many years of looking at IT jobs. This stuff is little understood and in very little demand. But he makes a point of pointing out how this will cause problems.

I would love to see this game in Arabic, for example. Grammatical number in Arabic is crazy complicated, and as a guy who translated software for FOSS (arabeyes.org), there is a reason I bring up the number logic (1, dual, 3-99, 1000+ dictate different noun classes): POT handles that. This and many other issues indicate why no one can be bothered to handle this until much later, and then it is such a pain in the ass with non-English charsets.
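To make the number logic concrete, here is a minimal sketch of the six-form plural rule that Arabic gettext catalogs commonly declare in their Plural-Forms header (`nplurals=6`). The function name `arabic_plural_index` is hypothetical, and nothing here is taken from the Papers, Please code; it only illustrates why a flat key/text CSV needs extra columns or logic to cover the same ground.

```python
def arabic_plural_index(n: int) -> int:
    """Return which of the six plural forms applies to the count n.

    Mirrors the expression commonly found in Arabic PO headers:
    n==0 ? 0 : n==1 ? 1 : n==2 ? 2
         : n%100>=3 && n%100<=10 ? 3 : n%100>=11 ? 4 : 5
    """
    if n == 0:
        return 0  # zero form
    if n == 1:
        return 1  # singular
    if n == 2:
        return 2  # dual
    if 3 <= n % 100 <= 10:
        return 3  # "few": last two digits are 03..10
    if n % 100 >= 11:
        return 4  # "many": last two digits are 11..99
    return 5      # everything else, e.g. 100, 101, 102, 1000
```

A translator then supplies up to six strings per message, and the runtime picks one by this index.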

Note that this issue has come up a lot on HN recently. I am glad people are showing interest in this stuff. Regardless of my opinion, this is very cool work and I am glad to see developers caring again.

That was my reaction as well: why not gettext? ngettext and other facilities are wildly useful, and there are a huge number of tools out there designed to make it easy for translators to maintain translations.
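For anyone who hasn't used it, a rough sketch of what ngettext buys you, using Python's standard `gettext` module. The message text and the helper name `describe_entrants` are made up for illustration. `NullTranslations` is used so the example runs without any compiled catalog; a real project would load a `.mo` file (for example via `gettext.translation`) and the catalog's plural rule would choose the form.

```python
import gettext

# No catalog loaded: NullTranslations falls back to English-style
# selection (singular when n == 1, plural otherwise).
catalog = gettext.NullTranslations()


def describe_entrants(count: int) -> str:
    """Pick the singular or plural template for count, then fill it in."""
    template = catalog.ngettext(
        "%d entrant waiting",   # singular msgid
        "%d entrants waiting",  # plural msgid
        count,
    )
    return template % count
```

The calling code never changes when a language with more plural forms is added; only the catalog does.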

gettext is more of a pain than you'd think unless you're in C or a C-with-a-hat-on-it like PHP. As I mentioned elsewhere in this thread I settled on Excel XML files because gettext (which I of course looked at first) wasn't a worthwhile format for either Java or .NET. And given that Papers Please is (I think) written in HaXe, I don't know what the story for gettext support is there.

That's without getting into licensing headaches, which is why the idea of P/Invoking GNU gettext is a straight-up nopenopenope.

> It is always interesting to see people use CSV in place of the established file format for translations, POT files.

CSV is common because there are many more applications that can read/write it, it's relatively standard, and translation data tends to be vaguely relational.
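As a sketch of how little code a CSV round trip takes, here is an export/import pair using Python's standard `csv` module. The two-column `key,text` layout and the function names are assumptions for illustration, not the format the Papers, Please tool actually uses.

```python
import csv
import io


def export_strings(strings: dict[str, str]) -> str:
    """Serialize key -> text pairs as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "text"])
    for key, text in strings.items():
        # The csv module quotes commas and doubles embedded quotes.
        writer.writerow([key, text])
    return buffer.getvalue()


def import_strings(data: str) -> dict[str, str]:
    """Parse CSV produced by export_strings back into a dict."""
    reader = csv.DictReader(io.StringIO(data))
    return {row["key"]: row["text"] for row in reader}
```

Any spreadsheet program can open the result, which is most of the appeal; plural forms and context notes would need extra columns.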

> makes it clear that he's an excellent game developer and designer..

FWIW, playing Papers Please, you can easily come to the same conclusion. The game is a great game.

It's a fascinating and thought-provoking experience, but I'm not sure I'd call it a great game in the traditional sense (not that it aims to be)

"In the traditional sense"? What does that even mean?

Maybe entertainment value? Traditionally games have been designed for fun and entertainment, whereas the shift in the last 5 years or so has been towards more emotional 'experiences' such as Flower, Journey, Gone Home, The Last of Us, and Papers, Please.

Contrast this with getting high scores, competing with your friends, collecting pickups, etc. that are all what I would call traditional game mechanics.

Certainly to each his own, but in this case, Papers, Please hits everything I want in the type of game it purports to be, and does so in splendid fashion.

I find it a bit unfair to request free help from users to localize the game, given that the game is proprietary...

I'm inclined to think the same, but you wouldn't discard fan translations either... If users show interest and enthusiasm for localizing the game, the best thing would be to find a way to integrate that effort into the process. Maybe you could earn some in-game coins or something?

I don't find it much fairer to compensate users who work for you with a currency that only makes sense in the universe that you designed and own.

Ideally, users should be cautious in choosing the projects to which they dedicate their interest and enthusiasm (choose those which belong to the community)... Under the assumption that the game cannot be made open source (or that it is impractical to give a share of the game's profit to the contributing users), I imagine that a good solution would be to republish all translations separately under an open license (in case they can be reused, very theoretically, for something else... maybe as an aligned corpus to train a machine translation system), and make sure that the users who contributed are prominently credited.

Is he the sole developer of the game?

Localization professional and developer here. I'd recommend taking a look at Transifex. This is a well thought out tool for managing localization projects and assets (prompt catalogs, resx files, etc), and for managing the translation process (it supports machine, crowd and professional translation, so you can optimize for cost and quality).

They also just released a really neat JavaScript tool which makes translating web content super easy. You just embed some JS in your template, and it rewrites the pages in translation when needed. Way, way easier than, for example, setting up a multilingual Drupal site, and in most cases it will get the job done nicely.

My $0.02

I'd recommend against Transifex. They were originally an open project, but ended up going proprietary and abandoning the open version, causing problems for various projects that depended on it. Take a look at some of the alternatives, instead, such as Zanata (http://zanata.org/).

I use Excel XML spreadsheets (so I don't have to write my own tools) as tabular stores for stuff like I18N, item lists, and so on, but a way to actually manage translation has generally eluded me. Never heard of Transifex before but this'll help me out with my own gamedev.

Transifex has largely become a standard for large open source projects that need string translation (think OpenStreetMap and LibreOffice large). I bring up OpenOffice because they have used Pootle, which was the alternative for years and I always found it so-so. But people with more experience can chime in.

How does the JS version affect SEO?

"My $0.02"

Why do people feel the need to write this after their posts? Were you afraid it would not be clear that it was an opinion? What purpose does it serve? Why did you go to such length to write it as "My $0.02" rather than "My two cents", or just "My 2c"?

It seems to be some kind of "thing" that people do here. Is it an in-joke?

It's a way of being humble. http://en.wikipedia.org/wiki/My_two_cents

  the user of the phrase hopes to lessen the impact of a
  possibly contentious statement, showing politeness and

I think you take it way too seriously. Relax.

I'm not angry about it or anything.

I just don't get it - is it supposed to be a joke?

'My $0.02' - For when 'IMHO' isn't enough.

Very interesting article. I can't imagine how difficult it must have been to support localization after the game was released.

Very minor complaint, but the first of January 1984 is always depicted as '1984.01.01' in the game, regardless of the I18N selected. While this is good in Germany, in many countries like the US you'd instead prefer something like '1/1/84'.

I wish that everyone would switch to the YYYY-MM-DD ISO 8601 standard... It sorts nicely and there would never be any question about whether it's "January 5" or "May 1"!
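A quick sketch of the "sorts nicely" claim: with YYYY-MM-DD, a plain string sort is already chronological, while a day-first format is not. The sample dates and the helper name `chronological` are made up for illustration.

```python
from datetime import datetime

iso_stamps = ["1984-01-01", "1982-12-02", "1983-06-15"]
dmy_stamps = ["01.01.1984", "02.12.1982", "15.06.1983"]


def chronological(stamps: list[str], fmt: str) -> list[str]:
    """Sort date strings by the actual date they encode."""
    return sorted(stamps, key=lambda s: datetime.strptime(s, fmt))


# Plain lexicographic sort versus a real date sort.
iso_plain_sort_ok = sorted(iso_stamps) == chronological(iso_stamps, "%Y-%m-%d")
dmy_plain_sort_ok = sorted(dmy_stamps) == chronological(dmy_stamps, "%d.%m.%Y")
```

Here `iso_plain_sort_ok` is True and `dmy_plain_sort_ok` is False, because the day-first strings sort by day of month before year.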

I think I recall a discussion about this early on, on the thread for the game on TIGSource. I think he asked what would be the best way to write down the date in numerical format without being ambiguous, given that different places swap day and month. I think he ended up using YYYY/MM/DD because when you start with a year, the other two are usually interpreted as month and day. I think he only wanted to use one format, that was way before all of these localization headaches. He probably left it like that after localization because he didn't want to create more problems.

Obligatory XKCD: https://xkcd.com/1179/

Being Swedish I totally agree with XKCD since that's how dates are written here since the 70's. But this standard seems to be receding, for example recently the Swedish DMV had to change so that on new driver's licenses the date is written '01.01.1984' instead of previously '1984-01-01' :-( because of EU parliament rules.

Not to mention this gives you free chronological sorting easily, in addition to being a standard.

Nitpick: the example was '1984.01.01' whereas the ISO format is from 1988. The ante-ISO form of representing dates may therefore be intentional, as a game design detail.

1984.01.01 wouldn't be good in Germany either. It'd be 01.01.1984 or (more sane, albeit less used) 1984-01-01 there.

And the US is about the only country in the world where MDY order for date parts is used exclusively. I wouldn't call the US and Belize »many countries« ;-)
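For what it's worth, per-locale date formatting can be as small as a lookup table. This is only a sketch: the locale codes, the `DATE_PATTERNS` table, and `format_date` are hypothetical, and the patterns just encode the conventions mentioned in this subthread (MDY for the US, DD.MM.YYYY for Germany, YYYY-MM-DD for Sweden and as the fallback).

```python
from datetime import date

# Hypothetical table: locale code -> strftime pattern.
DATE_PATTERNS = {
    "en_US": "%m/%d/%y",  # month first, two-digit year
    "de_DE": "%d.%m.%Y",  # day first, dotted
    "sv_SE": "%Y-%m-%d",  # ISO 8601 style
}

FALLBACK_PATTERN = "%Y-%m-%d"


def format_date(day: date, locale_code: str) -> str:
    """Format day with the pattern for locale_code, else the ISO fallback."""
    pattern = DATE_PATTERNS.get(locale_code, FALLBACK_PATTERN)
    return day.strftime(pattern)
```

Note that `strftime` zero-pads, so the US pattern yields '01/01/84' rather than '1/1/84'.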

I don't have my copy of the game here to check but I remember there being an option in the game settings that let you choose the preferred date format.

Very interesting writeup, thank you!

In the second part of his article, he says...

> There’s a system for making people sound generally non-Japanese (using lots of katakana and dropping prepositions), but it’s tiring to read and has an air of childishness, since this is one of the first scripts kids learn to read/write in Japan

This is utterly wrong. Katakana usage in Japanese has nothing childish attached to it. If anything, Hiragana would be the one considered the more "childish" way of writing, but there are numerous imported words (and more and more, I'd say) written in Katakana even in business contexts, and they are certainly taken very seriously.

If you don't know a language, don't make assumptions about it. By the way, the French translation of "Your son is dead" as "Votre fils est mort" is very dry and tasteless; the proper way of saying it in French is "votre fils est décédé". I hate it when people do a literal translation from English to French: many words are similar but they are not used at all in the same situations.

> By the way, the French translation of "Your son is dead" as "Votre fils est mort" is very dry and tasteless

That's exactly the point. You wouldn't really say it like this in English either, but in a Soviet style totalitarian state, it's easy to see.

I still disagree. Look at how the words are used in a formal context in English and in French: in English you say "Death Certificate" for the document recording the death and the events leading to it, but in French it translates to "Certificat de Décès", and certainly not "Certificat de Mort". People would laugh at such a translation in that context, which is why I think the French translation here is inappropriate.

Nah, you're still wrong - it's the correct translation - the tone is every bit as detached in both the English and French versions - a softer version of "Your son is dead" is indeed "Your son is deceased", or even gentler "your son has passed away". The tone of "Your son is dead" in English is every bit as cold and impersonal as "Votre fils est mort" in French...

(speaking as a native English speaker who has lived / worked in Paris for the last ten years)

You didn't answer my point about the certificate part. Please explain why the words are different in French and in English then.

"(is) dead" and "death" are different words in English, which both can be literally translated with "mort" in French.

that's different words in English.

similarly there's other languages that would translate the English words "(of) death" and "death (of)" to different words (because of case).

and you don't say "Certificate of the Dead", which is yet another phrase with a different meaning and nearly the same words.

it's just different!

Explain why they're different? Really? Why are mort and the other French word different? The why would take explaining the evolution of the language. The fact that they are different is itself the point, however. Dead is a very clinical word for describing the state of death, while deceased or "passed away" is a much softer way of communicating it. That's simply how English is. You can't look at it from the perspective of the French language any more than you can look at French through the lens of English. English is a mutt of a language, Germanic in origin, heavily influenced by Latin and Romance languages, with a significant independent evolution of its own.

The tone "dead" confers is much colder than deceased, which is exactly why the author chose to use it. To show the state couldn't care less.

Uh, because sometimes direct translation isn't appropriate? That was your own point, was it not? It's just that the case you picked happened to be one where the direct translation was in fact the correct translation...

Yeah, and that's the point I am still making about the original example. I guess we'll have to agree to disagree.

Well, we could just agree that you're wrong ;)

You'll agree on that alone then :P

Nah, I'll back them up--you're wrong.

> I hate it when people do a literal translation from English to French, many words are similar but they are not used at all in the same situations.

Yes, this is why most people generally hate doing translations in general - people like you who nitpick.

> Yes, this is why most people generally hate doing translations in general - people like you who nitpick.

I don't know a bit of French, or Japanese. But I do know that there are an awful lot of English translations, done by well-meaning people, which vary from mediocre to godawful. Almost every anime subtitling job I've ever seen is at the very least somewhat stilted and awkward to my ear. (Though they're still better than the dubs.) And a lot of it comes down to native English speakers who fear losing the intended nuances of the original and so do a rigidly literal translation, because they think that's the "most accurate."

So the problem isn't nitpicking; it's the wrong kind of nitpicking. If you think you can do a translation by following a rulebook and wave off dissenters as "nitpickers," you are probably going to do a shitty job, and have no idea why.

> And a lot of it comes down to native English speakers who fear losing the intended nuances of the original and so do a rigidly literal translation, because they think that's the "most accurate."

Agree with you.

One of the key issues is that, to be a good translator you need to have a good command of BOTH languages. I can tell you I see piss poor French translations (from English or other languages) every day as well, and it's not nitpicking, it's just people doing an awful job at what they are being paid for. Most people who do translations are barely even literate in their own language in the first place (you can see that in their obvious lack of vocabulary).

On the other hand, I'd say the best translations I have seen go way beyond the original work, making the translated work even better, more rich, more nuanced than what it was before. It's not just "translation", it's rather close to versioning.

Oh, yes, absolutely.

Maddeningly, whenever that happens with e.g. anime you immediately get swarms of furious fans decrying the translation for "inaccuracy."

I can't say I have seen that in anime myself, but there's a couple of movies where the translation/version was actually better than the original movie in terms of language, figures of speech and so on. It was not just translation, it was beautiful writing.

> Yes, this is why most people generally hate doing translations in general - people like you who nitpick.

It's not nitpicking, if there are 50 000+ words in the common language there's a reason for it.

To be fair, "your son is dead" is a very tasteless way of saying it in English, too ;)

He also says his wife is Japanese which makes me doubt it's "utterly wrong" to be honest. Care to elaborate/provide sources?

décédé = deceased

mort = dead

You don't have to hate

This is the whole point. While in isolation, A generally translates to A' and B to B', it doesn't mean that it can always be used that way.

In translations you'll often get situations where A should be translated to B' in that context (even if in the original language B wouldn't be used there), in order to transfer the concept properly. Literal translations often end up weird, and sometimes completely wrong.

Sorry, but "deceased" is way more formal in English than in French.
