
>no machine translation system currently in existence could translate the 逆転裁判 games to Ace Attorney games

Maybe it's already in the training set, but GPT-4 does give that exact translation.

I've found that GPT-4 is exceptionally good at translating idioms and other big picture translation issues. Where it occasionally makes mistakes is with small grammatical and word order issues that previous tools do tend to get right.




> Maybe it's already in the training set, but GPT-4 does give that exact translation.

The corpus includes Wikipedia, so yes, it's in there. That's the kind of thing I'd expect it to be good at, along with idioms, when the model gets large enough.

I meant that no machine translation system could translate the games. Thanks to an early localisation decision, you have to do more than just translate words into words for this series, making it a hard problem: https://en.wikipedia.org/wiki/Phoenix_Wright:_Ace_Attorney

> While the original version of the game takes place in Japan, the localization is set in the United States; this became an issue when localizing later games, where the Japanese setting was more obvious.

Among other things, translators have to choose which Japanese elements to keep and which to replace with US equivalents, while maintaining internal consistency with the localisation decisions of previous games. Doing a good job requires more than just linguistic competence: there's nothing you could put in the corpus to give a GPT-style system the ability to perform this task.


Can you try this[0]? I don't have access to GPT-4...

  Have you actually used GPT-4 for translation? Seriously all this talk about only getting explicit meaning across would be easily dispelled in an afternoon if you only bothered to try. 
Bing Chat:

  GPT-4を翻訳に使用したことがありますか?本当に明示的な意味しか伝えられないという話は、試してみれば午後には簡単に反証できます。
  (Have you utilized GPT-4 for translations? The story that only really explicit meaning can be conveyed, can be easily disproved by afternoon if tried.) 
Google:

  実際にGPT-4を翻訳に使ったことはありますか? 真剣に、明示的な意味だけを理解することについてのこのすべての話は、あなたが試してみるだけなら、午後には簡単に払拭されるでしょう.
  (Have you actually used GPT-4 for translation? Seriously, This stories of all about understanding solely explicit meanings are, if it is only for you to try, will be easily swept away by afternoon.)
DeepL:

  実際にGPT-4を使って翻訳したことがあるのですか?明示的な意味しか伝わらないという話は、やってみようと思えば、午後には簡単に払拭されるはずです。
  (Do you have experience of actually translating using GPT-4? The story that only explicit meaning is conveyed, if so desired, can be easily swept away by afternoon)
If I were to do it:

  GPT-4を翻訳に使ったことがあって言ってる? 真面目に言って、表層的な意味しか取れないとかないって暇な時にやってみれば分かると思うんだけど。
  (Are you saying having used GPT-4 for translation? Seriously speaking, I think that it only gets superficial meaning isn't [true] if [you] would try [it] when [you'd] have time.)
0: https://news.ycombinator.com/item?id=35530380


GPT4

実際にGPT-4を翻訳に使ったことがありますか?本当に、明示的な意味だけを伝えるという話が、試してみるだけで簡単に解決できるなんて、冗談じゃないですか。


WHAT. It's got the second half wrong.

Google: Have you actually used GPT-4 for translation? Really, it's a joke that the story of only conveying explicit meaning can be easily solved by just trying.

DeepL: Have you actually used GPT-4 for translation? Really, it's a joke that all this talk about conveying only explicit meaning can be easily solved by just trying it out.

Mine: Have you actually used GPT-4 for translations? That you can really just, try and easily solve that story that to convey explicit meaning, is such a joke.


Here are a couple more from GPT-4 (the output differs on every run because of the sampling temperature)

GPT-4を翻訳に実際に使ったことがありますか?本気で、伝えたい意味だけを伝えるという話は、ちょっと試してみれば簡単に解決できると思うのですが。

実際にGPT-4を翻訳に使ったことがありますか?本当に、試してみるだけで簡単に払拭できると思うのに、この「明確な意味だけが伝わる」話ばかりで。
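For anyone wondering why temperature makes the output differ between runs, here is a minimal toy sketch of temperature sampling in general. It is not GPT-4's actual implementation: the candidate words and their scores below are invented for illustration, and `softmax_with_temperature` and `sample_token` are hypothetical helper names.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Draw one token at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Made-up candidates for one word of the translation, with made-up scores.
tokens = ["払拭", "解決", "反証"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)  # seeded only so the demo is repeatable
for temperature in (0.2, 1.0):
    probs = softmax_with_temperature(logits, temperature)
    picks = [sample_token(tokens, logits, temperature, rng) for _ in range(5)]
    print(temperature, [round(p, 3) for p in probs], picks)
```

At a low temperature nearly all of the probability sits on the top-scoring word, so repeated runs tend to agree; at a higher temperature the lower-scoring words get picked more often, which is one way wording like 解決 versus 払拭 can vary between runs.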


  本気で、伝えたい意味だけを伝えるという話は、ちょっと試してみれば簡単に解決できると思うのですが。
"In seriousness, I think the story that [subject] tells the meaning [it/he/they] wants to tell, should be easily solvable by trying a bit."

or "Seriously, the story of telling the meaning [subject] wants to tell, should be easily solvable by trying a bit."

  本当に、試してみるだけで簡単に払拭できると思うのに、この「明確な意味だけが伝わる」話ばかりで。
 
"Really, I think it'll be easily swept away by just trying, but there are so much of this 'only clear meaning is conveyed' stories."

I'm almost feeling that GPT-4 should be eligible for human rights, especially since, astonishingly, it dropped the explicit mention of "afternoon", which doesn't work well. But also interesting it's failing to keep the intent of the whole sentence unlike 3.5 and even more primitive NN translation engines.


> But also interesting it's failing to keep the intent of the whole sentence unlike 3.5

It's because it "knows too much". To anthropomorphise a little: it has "expectations" of what should be there. To anthropomorphise less: GPT-4 is overfitted. GPT-style language models are pretty amazing, but they're not a complete explanation of human language, and can't quite represent it properly.

> I'm almost feeling that GPT-4 should be eligible for human rights,

Like, UDHR rights? How would that work, exactly?

---

(I've run into the Hacker News rate limit, so posting here.) For anyone who wants an example of "non-obvious meaning" to play with. From The Bells of Saint John (Doctor Who episode, https://chakoteya.net/DoctorWho/33-7.htm):

> CLARA [OC]: It's gone, the internet.

> CLARA: Can't find it anywhere. Where is it?

> DOCTOR: The internet?

> CLARA [OC]: Yes, the internet.

> CLARA: Why don't I have the internet?

> DOCTOR: It's twelve oh seven.

> CLARA: I've got half past three. Am I phoning a different time zone?

> DOCTOR: Yeah, you really sort of are.

> CLARA [OC]: Will it show up on the bill?

> DOCTOR: Oh, I dread to think.


(GPT4)

In this script from a Doctor Who episode, Clara and the Doctor are having a conversation about the internet. Doctor Who is a British science fiction television series that follows the adventures of the Doctor, a Time Lord from the planet Gallifrey, who travels through time and space in the TARDIS, a time-traveling spaceship.

Clara, the Doctor's companion, is trying to access the internet but is unable to find it. She asks the Doctor about its whereabouts, and the Doctor seems to be confused by the question, as the internet is not something that can be physically found.

The Doctor then mentions the time as "twelve oh seven," while Clara's clock shows "half past three." This discrepancy in time indicates that they are likely in different time zones, as the Doctor implies. In the context of Doctor Who, this could also mean they are in different points in time, since the Doctor can travel through time.

Clara is concerned about whether the time difference will affect her phone bill, to which the Doctor replies that he dreads to think about the potential cost. This adds a bit of humor to the scene, as the Doctor often has a nonchalant attitude towards everyday human concerns.

Overall, this script showcases the humorous and whimsical nature of Doctor Who, with the characters engaging in a lighthearted conversation that intertwines elements of science fiction and everyday life.


The middle three paragraphs are completely wrong. Clara isn't the Doctor's companion in this episode; that's not why the Doctor is confused; the Doctor is giving a year (not a time); Clara doesn't know about “the time difference” (indeed, that is the joke)…

That aside: I was suggesting this as an example of something existing machine translation systems can't translate. The 1207 / 12:07 wordplay could be “understood” by the model (I'm disappointed, albeit not very surprised, that GPT-4 didn't), but producing an adequate translation in a case like this requires actual thought and consideration.


Yes, I also expected GPT-4 to get the joke, as I've seen it understand similar jokes.


(GPT-4 plus a regular expression)

In this script from a Garfield comic, Jon and Garfield are having a conversation about the internet. Garfield is an American comic strip and multimedia franchise that follows the adventures of Garfield, a cat from the planet Earth, who enjoys lasagna in Jon Arbuckle's house, a suburban domicile.

Jon, Garfield's owner, is trying to access the internet but is unable to find it. He asks Garfield about its whereabouts, and Garfield seems to be confused by the question, as the internet is not something that can be physically found.

Garfield then mentions the time as "twelve oh seven," while Jon's clock shows "half past three." This discrepancy in time indicates that they are likely in different time zones, as Garfield implies. In the context of Garfield, this could also mean Jon's clock is wrong, since Garfield is usually right.

Jon is concerned about whether the time difference will affect his phone bill, to which Garfield replies that he dreads to think about the potential cost. This adds a bit of humor to the scene, as Garfield often has a nonchalant attitude towards everyday human concerns.

Overall, this script showcases the humorous and whimsical nature of Garfield, with the characters engaging in a lighthearted conversation that intertwines elements of fantasy and everyday life.
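The comment above doesn't say which regular expression was used, so the following is only a sketch of the general idea: a single alternation pattern that swaps names in the GPT-4 summary. The `SUBSTITUTIONS` table and the `swap_names` helper are assumptions made for illustration. Pronouns ("She asks" to "He asks") and longer phrases such as the Time Lord description would need extra rules that are not shown here.

```python
import re

# Hypothetical mapping; the original substitutions are not given in the thread.
SUBSTITUTIONS = {
    "Doctor Who": "Garfield",
    "The Doctor": "Garfield",
    "the Doctor": "Garfield",
    "Clara": "Jon",
}


def swap_names(text, mapping):
    """Replace every key of `mapping` found in `text` with its value."""
    # Try longer keys first so a short key never pre-empts a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


sample = (
    'The Doctor then mentions the time as "twelve oh seven," '
    'while Clara\'s clock shows "half past three."'
)
print(swap_names(sample, SUBSTITUTIONS))
```

The point of the parody survives the mechanical swap: the summary reads just as plausibly about Garfield, which suggests it was not tied to the specifics of the scene.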



