"this gem is awesome".to_spanish # = "esta joya es impresionante" (github.com/jimmycuadra)
94 points by vrish88 on Jan 12, 2011 | 43 comments



Just in case you didn't know, and you have plans to actually use Google Translate for translating web pages into Spanish... please don't. I haven't seen a single sentence with more than 3-4 words translated properly from English to Spanish. Even the examples shown in the README make no sense in Spanish.


Really? From my experience English to Spanish translation works pretty well in Google Translate. I used it a lot when I started living in Spain and my Spanish skills were not too great yet: people actually complimented me on how good my (written) Spanish was, and were in huge shock when they met me in person and it turned out that I could barely form a sentence. True story.


It depends... most of what's translated is understandable for a native speaker, but it will be clear that it wasn't written by one. If you need to communicate with a Spanish speaker, it will work quite well. If you need to write a brochure or something formal... better get someone to translate for you.


Well, that's quite obvious. Google Translate is not perfect: for example, it overuses "Usted/Ustedes" and often gets lost when there are a lot of pronouns or when "yo", "tú", "él" are omitted. But:

a) with a little bit of knowledge of the language and a bit of editing you can overcome most of those problems

b) even without editing it makes most of the text readable/understandable

c) it works much better than competing products (Babel Fish).


"my Spanish skills were not too great yet"

"From my experience English to Spanish translation works pretty well"

I think there's a connection.


You're wrong. "Not too great" doesn't mean I had no idea about the language at all; I was able to spot the most obvious mistakes or strange constructions in the translation. Plus, a year and a half later my Spanish is much better and I still think GT does an amazingly good job for machine translation. I can also compare it with translating from/to Spanish or English to/from Polish, and really, English <-> Spanish is much, much better.


I'm fluent in English and Spanish. Yes, GT's English -> Spanish output is better than gibberish and you can often get the gist despite the glaring errors, especially if you know both languages.

But my point was, don't you think there's a bit of cognitive dissonance in somebody using machine translation because they don't speak a language very well, yet confidently asserting that the translations are good?


It might look like that, but the point of my original post was that the correctness of GT's output was verified by native Spanish speakers (a sort of Turing test, one might say), not that my Spanish was bad and so I assumed the translations were good.


"Very cool gem!".to_spanish -> "Muy fresco joya!" In English this would read like "Fresh very jewel!". It makes no sense at all, in either South American or European Spanish. I'm sure there are many cases where it helps, and it actually did help me in many cases, even for learning German. But in the end, whenever I write something I always have to check the grammar with a native German speaker, because grammar changes the whole meaning or sense. For a web page, I don't think it helps reach the Spanish-speaking community.


I assume Spanish people are quite polite.


They are, but there is a world of difference between "don't worry, your Spanish is OK, I understand what you mean" and "I thought you knew the language because your e-mails were written in perfect Spanish".


A few years back an ex-girlfriend of mine did an Erasmus year in Spain, and I ended up making friends with a whole bunch of people I could barely converse with. They've always been very impressed at how far my Spanish has come over IM. Thank you, Babel Fish.


My dissertation research involved using (and, most importantly, evaluating) GT for certain medical text problems (going from EN->ES), and I found that its performance is highly domain-dependent (i.e., the quality of its translations depends a lot on what kind of input text you're feeding it). Furthermore, I found that whether or not it's "good enough" depends heavily on the user, specifically that user's level of proficiency in English.

GT is better than a lot of machine translation systems, but it still falls victim to a number of common problems that face such systems. Specifically, I've run into a lot of major word-sense-disambiguation issues, especially when working with the sort of short snippets of text that the gem in question uses as examples. Basically, that sort of short bit of text is a worst-case scenario for statistical machine translation, since there's so little context. Google generally does much better with longer runs of text than it does with short phrases.


What's causing this?

    ruby-1.8.7-p302 > "Where is the bathroom?".to_spanish
    => "\302\277D\303\263nde est\303\241 el ba\303\261o?"


Ruby 1.8 displays non-ASCII characters in the shell using octal escape sequences: \302\277 is ¿, \303\263 is ó and so forth. It should be fine to work with (e.g., saving to a database or displaying in a template) as long as you know the encoding. I only glanced at the code, but it looks like it just takes whatever the Google Translate API gives it, which I'm guessing is UTF-8.
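For anyone curious, here's a small sketch of the same thing in a modern Ruby (it assumes Ruby 2.0+ for String#b, not the 1.8.7 from the question): the escaped bytes are just UTF-8, and re-tagging them as UTF-8 gives the readable string back.

```ruby
# The exact bytes 1.8's irb printed as octal escapes, tagged as raw binary
# (ASCII-8BIT) to mimic a 1.8 string that carries no encoding information.
raw = "\302\277D\303\263nde est\303\241 el ba\303\261o?".b

# Reinterpreting those same bytes as UTF-8 recovers the readable text.
text = raw.dup.force_encoding(Encoding::UTF_8)
puts text # prints ¿Dónde está el baño?

# Going the other way: "¿" (U+00BF) is the two UTF-8 bytes 0xC2 0xBF,
# which are 302 and 277 in octal.
p "¿".bytes.map { |byte| byte.to_s(8) } # prints ["302", "277"]
```

Nothing about the data changes; only the label Ruby attaches to the bytes does, which is why 1.9's irb can print it properly.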


Listed under "Roadmap" in the readme is the following:

Investigate Unicode support for Ruby 1.8. to_lang has only been tested with 1.9.

Do you have a copy of 1.9 to test with?


Ah, yes. Much better:

    ruby-1.9.2-p0 > "Where is the bathroom?".to_spanish
    => "¿Dónde está el baño?"

Thanks


Those are accents and special characters; I think it's trying to output "¿Dónde está el baño?"


Translate fail already... user won't understand that 'joya' means a Ruby gem :)


In context, and with the slightest amount of deductive reasoning, I don't see why they wouldn't.


Sure, but those users can read basic English too.


Since this gem appears to use Google's public API, one important thing to keep in mind is that there's probably a hard limit on the amount of text that can be translated at one go. I've found that throwing more than a thousand or so characters at GT's API results in their server throwing "URI too long" errors, since it only works via HTTP GET.

Disclaimer: This might have changed in the five or six months since I last messed around with Google's API, or, alternatively, this gem could be doing something clever to try and get around the problem. Either way, I'd suggest checking it out.
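If you do hit that limit, one workaround is to split the text client-side and translate it piece by piece. A rough sketch; `chunk_text` is a hypothetical helper, not part of the gem, and the 1000-character default just mirrors the rough figure above. Since the limit applies to the URL-encoded query, non-ASCII text grows when percent-encoded, so you'd probably want to leave some margin.

```ruby
# Hypothetical helper: split text into chunks of at most max_chars characters,
# breaking only on whitespace so words stay intact. Words longer than the
# limit are hard-split into slices.
def chunk_text(text, max_chars = 1000)
  chunks = []
  current = ""

  text.split.each do |word|
    if word.length > max_chars
      # Flush what we have so far, then slice the oversized word.
      chunks << current unless current.empty?
      current = ""
      word.scan(/.{1,#{max_chars}}/m) { |slice| chunks << slice }
      next
    end

    candidate = current.empty? ? word : "#{current} #{word}"
    if candidate.length > max_chars
      chunks << current # current is non-empty here
      current = word
    else
      current = candidate
    end
  end

  chunks << current unless current.empty?
  chunks
end

p chunk_text("one two three four five", 9)
# prints ["one two", "three", "four five"]
```

Note that this collapses runs of whitespace (including newlines) into single spaces, and each chunk is translated without the context of its neighbours, which, as mentioned elsewhere in the thread, tends to hurt quality.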


From memory, other limitations include 100,000 characters per day, and limits on where the API can be used (e.g. can't use it for paid services).
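If the daily cap is real, a client could track its own usage before calling the API. A minimal sketch; `DailyCharBudget` is a made-up class, the 100,000 default is just the from-memory number above, and it only counts characters locally (it knows nothing about how Google actually meters usage).

```ruby
require "date"

# Hypothetical client-side tracker for a per-day character quota.
# The 100_000 default is the from-memory figure, not a confirmed API limit.
class DailyCharBudget
  # `today` is injectable so the day rollover can be tested.
  def initialize(limit = 100_000, today: -> { Date.today })
    @limit = limit
    @today = today
    @day = today.call
    @used = 0
  end

  # Records usage and returns true if `text` fits in today's remaining
  # budget; returns false (recording nothing) if it would exceed it.
  def consume(text)
    reset_if_new_day
    return false if @used + text.length > @limit

    @used += text.length
    true
  end

  def remaining
    reset_if_new_day
    @limit - @used
  end

  private

  # Starts a fresh count once the date changes.
  def reset_if_new_day
    current = @today.call
    return if current == @day

    @day = current
    @used = 0
  end
end

budget = DailyCharBudget.new(10)
budget.consume("hello")  # true, 5 used
budget.consume("world!") # false, would make 11
budget.remaining         # 5
```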


This is cool, but no one should ever use this. I've used the google translator frequently, and while it's good, it always requires me to use my knowledge of the language I'm translating to in order to create a more correct and meaningful translation. If you really want a site or app that is multilingual, you need a human with good knowledge of the language to do your translations.


Another similar project: https://github.com/caius/gtranslate


This is why even we die-hard PHP coders sometimes yearn for RoR. .to_spanish!


Not trying to be pedantic, but the gem has nothing to do with RoR; it just extends Ruby's String class and adds to_x methods for each language.
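To illustrate what "extends the String class" means, here's a toy version. It's not to_lang's actual code: the language table is made up and `TranslatorSketch.translate` is a hypothetical offline stand-in for the HTTP call to Google. It also shows one way a method like to_spanish can exist without a literal `def to_spanish` anywhere in the source.

```ruby
# Hypothetical stand-in for the gem's network layer: tags the text with the
# target language code instead of calling a translation API.
module TranslatorSketch
  # Illustrative subset of languages; not the gem's real table.
  LANGUAGES = { "spanish" => "es", "german" => "de", "polish" => "pl" }.freeze

  def self.translate(text, target_code)
    "[#{target_code}] #{text}"
  end
end

# Reopen the core String class and generate one to_<language> method per
# entry, so no literal `def to_spanish` is needed.
class String
  TranslatorSketch::LANGUAGES.each do |name, code|
    define_method("to_#{name}") do
      TranslatorSketch.translate(self, code)
    end
  end
end

puts "this gem is awesome".to_spanish # prints [es] this gem is awesome
```

The catch, raised further down the thread, is that every library doing this shares the one global String class.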


http://code.google.com/p/gtranslate-api-php/

    $gt = new Gtranslate;
    echo "Translating [Ciao mondo] Italian to English => ".$gt->it_to_en("Ciao mondo");


I think your example demonstrates very well why one would yearn for Ruby. The ability to give an extra method to a base class allows you to build very expressive code.

edit: and, BTW, the to_spanish method is not defined in the https://github.com/jimmycuadra/to_lang/blob/master/lib/to_la... module.


You're quite right, but in all fairness, the following would be a saner interface in PHP:

    italian_to_spanish("Ciao mondo");


Even boring old C# can do that.


Not true. That's just a compiler trick.


But still not as elegant as: echo "Ciao mondo"->it_to_en(); would be.


It's only elegant until you have two libraries that both want to add `foo` to a `Bar`. Then you're in a world of pain compared to writing `lib1->foo($myBar);`
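A tiny Ruby illustration of that failure mode, using a made-up `shout` method in place of `foo`: two "libraries" reopen String, and the one loaded last silently replaces the other. The namespaced, explicit-receiver style keeps both.

```ruby
# "Library one" adds String#shout.
class String
  def shout
    upcase + "!"
  end
end

# "Library two", loaded later, defines String#shout differently.
# Ruby replaces the first definition without raising an error.
class String
  def shout
    "HEY: " + self
  end
end

puts "hello".shout # prints HEY: hello (library one's version is gone)

# The explicit-receiver style: each library keeps its own namespace,
# so both behaviours stay available.
module LibOne
  def self.shout(str)
    str.upcase + "!"
  end
end

module LibTwo
  def self.shout(str)
    "HEY: " + str
  end
end

puts LibOne.shout("hello") # prints HELLO!
puts LibTwo.shout("hello") # prints HEY: hello
```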


Actually I agree. I'm rather against giving programmers the ability to extend native objects at will, but I'm still in favor of strings being objects in PHP, or even of an "everything is an object" approach there.


I'd rather have the common case be gorgeous and the uncommon case require some workaround than have every case look like crap.


Thank you for proving my point.


Drupal's localization infrastructure is quite nice. Instead of 'echo' you use the function t() which lets users use a GUI to translate all tokens in a Drupal site.


How does the gem fail over if the machine it's running on isn't connected to the internet?


You would get a SocketError exception from getaddrinfo, because HTTParty (which uses Net::HTTP) can't look up the hostname. If the hostname is already resolved, you'd get an Errno::EHOSTUNREACH exception from connect, because there is no route to Google's servers.
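Since those exceptions propagate to the caller, handling them is up to you. A minimal sketch of a caller-side fallback that returns the untranslated text on exactly those two errors; `translate_or_fallback` is a hypothetical helper, and the block stands in for whatever call actually hits the network.

```ruby
require "socket" # defines SocketError

# Hypothetical caller-side wrapper: runs the network call in the block and
# falls back to the original text if the host can't be resolved
# (SocketError) or reached (Errno::EHOSTUNREACH).
def translate_or_fallback(text)
  yield text
rescue SocketError, Errno::EHOSTUNREACH => e
  warn "translation unavailable (#{e.class}); using original text"
  text
end

# Simulate being offline by raising the error a failed lookup would raise.
result = translate_or_fallback("Where is the bathroom?") do |_text|
  raise SocketError, "simulated getaddrinfo failure"
end
puts result # prints Where is the bathroom?
```

Other failures (Errno::ECONNREFUSED, timeouts) raise different exception classes, so in practice you'd likely rescue a broader set; the two here are just the ones described above.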



Hey, I didn't say it was awesome, I was just reporting what actually happens. :)


In other words, "poorly".



