[EDIT: Note: translating 'Muerto Fin' to English in Google Translate does result in 'Dead End'. Can any Spanish reader clarify why Google Translate originally chose to translate 'DEAD END' as 'callejón sin salida'?]
[EDIT: So let me get this straight. Their program, running on an iPhone [256MB RAM, 600 MHz ARM CPU], can take a live image, perform OCR, translate the text, and create another image with the translated words. And all of this happens in real time? Wow.]
I think it's really hard to get a 'Wow' reaction from the HN crowd, and you have hit a home run! As someone else said in this thread, you are going to make a boatload of money. Congrats!
But I think this app is doing almost word-by-word translation, without a list of common expressions or any grammatical analysis.
For example, the correct translation of 'Dead End' is 'callejón sin salida', which literally means something like 'street without exit'. Translating 'Dead End' word by word, you get 'Muerto Fin', which is unintelligible.
Another example from the video is the translation of
'Lengua boliviana con una salsa picante de anchoas'.
The app translates this as
'Tongue Bolivian with a sauce spicy of anchovies',
but the correct translation is something like
'Bolivian Tongue with a spicy sauce of anchovies',
because when translating between English and Spanish you must reverse the order of adjectives and nouns.
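To make the difference concrete, here is a toy sketch of plain word-for-word substitution followed by the single noun-adjective swap described above. The lexicon and its part-of-speech tags are hypothetical, hand-written for this one sentence; a real system would need a tagger, and the swap rule has exceptions (some Spanish adjectives normally precede the noun).

```python
# Hypothetical toy lexicon: Spanish word -> (English gloss, part of speech).
LEXICON = {
    "lengua": ("tongue", "NOUN"),
    "boliviana": ("Bolivian", "ADJ"),
    "con": ("with", "PREP"),
    "una": ("a", "DET"),
    "salsa": ("sauce", "NOUN"),
    "picante": ("spicy", "ADJ"),
    "de": ("of", "PREP"),
    "anchoas": ("anchovies", "NOUN"),
}


def word_for_word(sentence):
    """Gloss each word independently; unknown words pass through as 'UNK'."""
    return [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]


def reorder_adjectives(tagged):
    """Swap each Spanish NOUN ADJ pair into English ADJ NOUN order."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "NOUN" and result[i + 1][1] == "ADJ":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return result


def translate(sentence, reorder=True):
    tagged = word_for_word(sentence)
    if reorder:
        tagged = reorder_adjectives(tagged)
    return " ".join(gloss for gloss, _ in tagged)
```

Without the swap, `translate("Lengua boliviana con una salsa picante de anchoas", reorder=False)` reproduces the app's "tongue Bolivian with a sauce spicy of anchovies"; with it, the result is "Bolivian tongue with a spicy sauce of anchovies".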
yep, this is a word-for-word translation, because it's fast and it gets the point across. we're working to improve translation quality and finesse, but it's a much harder problem to understand grammar. so, we hope it gives the general meaning, and you can learn to piece it apart.
Word for word is far, far better than nothing. And no network connection required, so awesome. Roaming charges when travelling are brutal, so that's a very important feature.
Real-time OCR + word-for-word translation... maybe it's not particularly technically interesting on paper, but watching it run on a handheld device in front of your eyes is breathtaking.
Do you have a timeline for other languages? (Japanese)
Yeah, no network connection is the killer app here, since you're probably going to need this in another country where you probably don't have network access.
That's great too. Text detection + character recognition + 3D position estimation + translation: each of these problems is difficult, but state-of-the-art methods usually work quite well. The combination, however, is super hard for a small team, because each one requires specialized knowledge of that particular area. Picking which one to focus on and still getting an awesome result is a rare skill.
I think that most of the time a word-for-word translation is enough. I use Google Translate a lot, and most of the time the translation is not perfect, but it is good enough. So I think this app is very useful.
But I think the real problems are idioms and phrases like "dead end" that have a completely different translation.
On the other hand, I think you have more processing time than 1/10 sec. Most of the time the user will point the app at the same text for 10 or 15 seconds. I think it is possible to show a very fast translation almost instantly, then show an improved version a few seconds later.
Sometimes, but not always. I'm not even sure it's "most" of the time. "Right turn" to "correcto turno" is so wrong that anyone following directions with that would get lost pretty easily.
Still, amazing technology, especially since it's all client-side. Hopefully they'll add some contextual analysis in the future or enable use of the Google Translate API.
I know that Google Translate is not word-for-word, but it is not perfect either.
For example, a few years ago some of my wife's students handed in homework about clocks and gears. When she read it, she was annoyed because the writing was incredibly horrible. But later we realized that the students didn't write it; the "homework" was a web page translated with Google Translate.
Another time, I needed an example of the differences between JPEG and JPEG2K for an internal talk. I found a photograph with a zoom of Lena's eye and a caption like "JPEG2K 1% vs JPG 1%", but the web page was in Japanese and I couldn't read it. So I used Google Translate to make sure it was a comparison of the two methods at the same compression level.
So Google Translate is very good to get an idea of what some web page means, but it is not good enough to make a final version of the translation.
This app has some additional difficulties that GT doesn't need to solve: they need to OCR text in the wild, they have less computational power, and they have to do it in real time. It is almost incredible that they can solve these things. With word-for-word translation, I think you can get a good enough translation 90%, 95%, or even 99% of the time, but the corner cases can be really unintelligible. The "Tongue Bolivian ..." translation is fine; the user can understand what it means, so it is useful. The "Dead End" translation is something they should improve in the next version.
I once saw a forum post in German where Google translated "A: Nyet." as "A: Yes."
Since it was a question-and-answer post about information for an upcoming game, there was quite a bit of fuss over it before the translation error was discovered.
John, outstanding work. Word-for-word is IMO more interesting too, because you understand enough to get the point across - but you still get the flavour of the language and enjoy the idioms. Huge, huge potential. You win the Internet today.
Thanks for the positive feedback (I'm one of the developers). We never imagined we'd win the internets!!! :P
About the word-for-word thing: it's mostly correct to say it's word-for-word. We have just a few short phrases in there, like "por favor". If that didn't translate correctly, it would have been teh lame.
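One common way to mix a handful of fixed phrases into an otherwise word-for-word translator is greedy longest-match: at each position, try the longest known phrase first and fall back to single words. This is an assumption about the general technique, not a description of how Word Lens does it, and both tables below are hypothetical. A phrase entry is what lets "callejón sin salida" come out as "dead end" rather than "alley without exit".

```python
# Hypothetical phrase table (multi-word entries) and single-word dictionary.
PHRASES = {
    ("por", "favor"): "please",
    ("callejón", "sin", "salida"): "dead end",
}
WORDS = {
    "por": "by", "favor": "favor", "callejón": "alley",
    "sin": "without", "salida": "exit", "gracias": "thanks",
}
MAX_PHRASE = max(len(key) for key in PHRASES)


def translate_phrases(text):
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first, down to 2 words.
        for n in range(min(MAX_PHRASE, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase matched: fall back to a single-word lookup
            # (unknown words pass through unchanged).
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)
```

For instance, "por favor" becomes "please", while "sin salida" on its own, having no phrase entry, still falls back to "without exit".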
I have some experience working with machine translation from my thesis (using Latent Semantic Indexing, LSI), and after watching the video I was very impressed with the speed of the translation, especially knowing how resource-demanding other strategies like probabilistic translation or LSI are.
Are you hiring? ;) I am a native Spanish speaker btw.
Thank you. I think context is extremely important and very difficult to get right. I remember reading an article by a Google engineer discussing how they keep improving their analysis of search queries.
E.g.
[Search Term] -> [Interpretation]
New York -> New York city
New York Times -> The newspaper
New York Times Square -> Famous tourist spot in NYC.
There were other examples in the article. I will try to find that article and link it up.
I think Google should buy these guys and provide them with their knowledge of 'context'. Google Translate team and these guys should talk right away.
I used to work in machine translation, about 15 years ago. At the time, one of the big things they were working on was an improvement to the contextual inferences. The prior versions were able to consider context within a single sentence, and input documents could be tagged with a subject area, so the system would know to resolve ambiguities in favor of, say, medical terms.
The best examples of this, I thought, were the following passages:
Yesterday I went to the symphony. The sound was beautiful.
versus
Yesterday I went to Long Island. The sound was beautiful.
In the first passage, sound refers to an auditory experience; in the second, it refers to a body of water. Within the sentence "The sound was beautiful", there's no way to know which to use, so one must look to surrounding information to make a guess.
Of course, I've given a short and clear example where anyone can see the right answer; in real-life usage it's frequently unclear.
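The "sound" example can be illustrated with a crude overlap heuristic in the spirit of the Lesk algorithm: score each candidate sense by how many of its cue words appear in the surrounding text. The sense inventory and cue words are hypothetical, and this is not how the system described above worked; it only shows why looking beyond the single sentence resolves the ambiguity.

```python
# Hypothetical sense inventory: each sense of "sound" has a gloss and cue words.
SENSES = {
    "sound": [
        ("auditory experience", {"symphony", "music", "concert", "orchestra", "heard"}),
        ("body of water", {"island", "bay", "coast", "water", "boat"}),
    ],
}


def tokenize(text):
    """Lowercase words with trailing punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()}


def disambiguate(word, context):
    """Pick the sense whose cue words overlap most with the context.
    Ties go to the sense listed first."""
    context_words = tokenize(context)
    best_gloss, best_score = None, -1
    for gloss, cues in SENSES[word]:
        score = len(cues & context_words)
        if score > best_score:
            best_gloss, best_score = gloss, score
    return best_gloss
```

With the two passages above, "symphony" selects the auditory sense and "Island" selects the body-of-water sense; the sentence "The sound was beautiful." alone matches neither cue set, so the function would just fall back to the first-listed sense.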
It becomes "make correct shift" (as in "I work in the night shift"). The word in Spanish would be "giro". The natural translation would be "gira a la derecha".
"make right turn" is apparently hard! Word Lens translates it to "make correct turn" but where "turn" is in the sense of taking turns not in the sense of turning. Google Translate gives "to the right which in turn".
(hola, i'm john deweese, one of the creators)
good to hear the early positive response, and we look forward to your insightful feedback (support@questvisual.com) -- mention this site so we pay some extra attention.
i hope this app will be nice and disruptive, and we'll be looking carefully at what people expect and how they actually use it. it's a platform, and we're really excited about the directions it opens up.
You should register wordlensapp.com or something of the like. I'm not sure hosting nothing but the app promotion on a site with a completely different URL is beneficial.
With such a product, you should perhaps brand yourself more on your product than your company name. :)
There are several common conventions for domains like the one you'd want: get<product> being the best, and <product>app being another option.
I've had very bad experiences finding an app's site when it follows the <product>app pattern. Maybe it's just bad SEO on their part, but I imagine it'd be an awful hassle for any normal user who doesn't know the conventions.
Then it struck me that <use>product would be a very interesting domain that makes much more sense than <product>app. It also mirrors the imperative <get>, although it may not be as popular and known to users - yet.
I think it's a shame they didn't secure a regular <product>.com domain for such a great product - but, on the other hand, I'm sure all the press will forgive them (hell, they're trending on Twitter, and 80% of my social media digest today was about the bloody thing).
I've registered usewordlens.com, usewordlens.net, and usewordlens.org and will be happy to hand them over or point them to a domain of their choice (if someone tells me how the hell to do it using name.com, because I'm a complete idiot in that regard. I guess I have to mess with some DNS).
Since you mentioned this, I registered wordlens.co and set it to redirect to their website. I emailed the info and asked them if they want it. That's how awesome I think this app is - I'd rather spend my own money to make sure they get a good domain.
otaviogood or johndeweese, if you're reading this, my email is in my profile.
An Android or SDK version that could be used with a heads-up display system would be impressive. If you can do more than Spanish, you should consider talking to the US Military.
For the love of FSM, do it! I speak for a vast majority of Android users here on HN when I say I'd buy this even if it cost five times what it does now.
From the technological point of view I can just say wow! Congratulations, this is a great app.
Aside from that, as a native Spanish speaker, I find the translated text sounds really weird. Even without knowing English I would be able to understand the translated text (at least in the example you gave), but it can be improved. This is the only con I see in your application. You're probably already working on it, but I thought you might like this input.
1. The logo seems similar to the United Nations logo. What was your inspiration for the logo?
2. In the video, who is holding the camera? And who is holding the signs?
3. Did you use an ad agency to make the video, or did you put it together yourselves?
The first card says "welcome to the future" which I thought was perfect :)
The designer who made the logo said he was going for a passport look. My brother made the video. He also did the music in the video. He's pretty good with that stuff.
Depending on how their OCR works, Mandarin/Japanese->English(/spanish/&c) could be a _lot_ more difficult (thousands of characters to distinguish) and in the case of Japanese the translation a lot trickier (completely different grammar, Mandarin's not so bad though).
This could just be a case of needing to optimize and train their recognizer to deal with a much larger set of possible characters, or it could require implementing kanji-specific OCR techniques like attempting to decompose the characters into their constituent strokes, and recognize based on classification of those strokes (orientation, position, direction).
I would have absolutely no problem reading Japanese or Chinese if it just did a word-for-word translation, romanized all the particles, and expected me to know what they meant (or had a handy feature that explained them when the image was paused). Even so far from actual translation, this would still pass for one of the best things ever.
Of course, guessing the correct word when performing word-for-word translation of hanzi is almost impossible, so even the extremely primitive product I'm thinking of is very difficult.
This is what I was thinking. I don't know about Japanese, but sentence structure in mandarin is pretty much reversed from what it is in romance languages. And so much of character meaning is based on context that outside of common phrases it might be nearly impossible.
On the other hand, an English to Chinese translator might be more doable, and might also be more commercially profitable.
Even an app that converted characters into the most direct meaning could provide insight in a large number of cases into the intended meaning and would be valuable.
An app that converted from characters to pinyin (a much easier problem) would be gold to someone trying to learn mandarin. I'd easily pay $50 for something with that functionality without it even making an attempt at the english. It's a much smaller market than those trying to understand chinese signs on a vacation, but it's one with a much larger stake and interest in the result.
spot on! I can usually muddle through Spanish and French signs when I travel, but I'm terrified of traveling to China or Japan where the signs are quite a bit more impenetrable.
You guys have brought the TARDIS's translation-in-the-brain from Doctor Who one step closer to being real. I just tried it and it works perfectly. In my minimal experience it works great on medium-sized text, so you may want to advise users to move closer to or further from the text if your app can't find anything to translate (i.e., OCR fails).
As a language learner I'd love the ability to capture and export (in some fashion) the pre-translated text, as text not an image, to import into SRS learning software. This especially applies to the Japanese/Mandarin support that I hope you guys are working on.
I downloaded the app, bought the Spanish -> English pack, and tried it with some simple phrases in a big TextEdit window on my monitor. It flickered a bit, but I expected that from a monitor. It got them right.
Big deal, common phrases are easy -- I could just buy a phrasebook. Then I tried a random phrase from the Spanish version of "Dive into Python":
Una función, como cualquier otra cosa en Python, es un objeto.
It flickered like before from the monitor, but it was easy to read the translation:
A function, as any other thing in Python, is a object.
Perfect? No. Usable? Absolutely.
As soon as a French -> English pack is released I'll buy it, even if it's $100.
This is the kind of thing that would make it possible for me to move to Montreal. I love that city, but don't know French. I could learn, but it would take time and I'd be lost like a baby gazelle on the Serengeti while I learned.
This app could ease the process of moving to an entirely different country. That's amazing.
I live in Montreal, and you are right that you would need to understand some French, but only to read signage, some store names, and public advertising. Signage is of course important if you are driving, but an app like this won't help you there (you are holding the wheel, I hope). Besides that, almost any store or institution has bilingual staff (and when they don't, most French speakers will understand basic English). Don't let that hold you back... unless you hate snow and cold winters ;)
I live in Montréal too and I'm chiming in that it's easy to live here with only English, 25% of the city is natively anglophone. Some people in the company I work at (also immigrants) see no reason to learn French, which the quebecois probably don't want to hear.
It is easy to get the hang of signage too; lots of French words are spelt almost identically in English, and the language is much easier to read than it is to listen to. You should get the hang of it quickly.
On the other hand, if you have to get any kind of job that involves speaking with customers ... you are shit out of luck.
Seconding this. I know many people who speak only English and are doing just fine here. Almost everybody who works at a store is bilingual. It's still worth learning French, though, if only to earn the francophones' respect. I often hear Quebecois friends complain about people in Montreal not making an effort to learn French, but they'll respect people who put some effort into it.
Whoever created this deserves the boatload of money they are going to make. It has the potential to radically change the experience of traveling in foreign countries.
If this thing can be made to do Chinese or Japanese... even if it does so haltingly, with mistakes...
Raise the price. For the love of god. Raise the price. This is much, much more valuable than five bucks. There have been times in life where I'd gladly have paid a dollar per minute for this. And I'm a cheapskate.
Raise the price. For the love of god. Raise the price.
Absolutely, read http://bit.ly/bdHKmm to understand why. (Sorry for the shortened URL; the URL recognition software here doesn't handle URLs with apostrophes in them, so it actually is necessary.)
Yes. No Android yet but this seems worth $10 to me. I'd do Chinese first. There's a big market for that in California; many Asian families have a language gap between grandparents or parents born abroad and a younger generation born here. Basic communication is easy but things slow down fast if conversation turns to anything complex. Being able to print some words neatly by hand in English and instantly get Cantonese would sell like crazy - I have 5 in-laws for whom it would be a no-brainer.
We have a lot more Vietnamese refugees here than Chinese folks. If you'll notice, the three main languages you can get government papers in (at least in San Jose; no experience elsewhere) are English, Spanish, and Vietnamese.
San Jose is a pretty skewed example, as it has a very high Vietnamese population, whereas across California there are probably an order of magnitude more Chinese immigrants.
Vietnamese would be easier to recognize/do a translation pack for too, now you mention it - basically the Latin alphabet with extra marks to make 30 letters. Apparently there are more north Vietnamese around here and Oakland, more South Vietnamese in SJ and down south, maybe 450k in all of CA. Chinese-American maybe 5-600k? More than 150k in SF.
Incidentally, there are other, easier untapped startup opportunities in the Asian-language demographic, if anyone wants to send me a note via gmail.
My in-laws are a mix so either language is good, both is better. I can promise 3 Android and 1 iPhone sale just in my home :)
The dictionary is free and the OCR is available as an in app purchase for $15.
I'm just saying this because it is not trivial to adapt an OCR engine to non-latin scripts, because the image analysis techniques are rather different.
As I was suffering from acute future shock last night I didn't make this clear... but I'd happily pay a premium for French, a much easier OCR problem (and, indeed, one of the easiest foreign languages for an English speaker to read, since there are so many loan words in both directions.)
I understand that Chinese is harder, for many reasons. It is also much harder for humans. But that is why I'd happily pay, say, $10 per day of my trip for an engine that gives reasonable hints. Maybe more. Try me and see.
Of course, once this technology is ubiquitous and the price of such a thing has fallen to nigh-zero it is going to change the world. But we have to walk before we run, and you'll need the money for R&D and legal costs.
If I was traveling to japan or china, I would gladly pay $50 per language to translate. That's probably too high for an in-app purchase but I think $29 would be a no-brainer for languages that don't use English characters.
Absolutely. I travel pretty much nonstop and I would pay dearly to be able to translate signs on arrival into a new country. Things like 'ATM' and 'Toilet' are crucial.
Another observation about the site: it's very Facebook-unfriendly at the moment. Try sharing the URL on Facebook and you'll see that the site lacks any text or images that FB can readily pick up for display and thus only your URL is displayed. This app looks amazing, but that does make it a bit hard to tell people about on Fb.
He's referring to the fact that "click here" tells nothing about the link itself (for SEO purposes).
But in this particular case, it's just added to the text "Try World Lens for FREE", so I don't know what's so wrong about it (or why you get down-voted for just asking that question..)
The product seems amazing, but they are somehow cheating in the video.
The signs (in Spanish) have grammar mistakes, but they are automatically translated to correct English sentences. I have a few examples:
- The third sign says "Lo traduce el texto", but that sentence doesn't make sense. It should be "Traduce el texto" or even "Se traduce el texto".
- 'Ropas opcional' sounds strange to me, although it may be accepted in some countries. It should be something like 'Ropas opcionales'.
I tried both examples in Google, Bing and Babel Fish and the results were OK, so I don't think that Word Lens translator is very accurate.
"Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Sure it does. I'd be impressed with this app doing word-for-word translation if that's what I expected. On the other hand, if the examples use grammatically incorrect source material to give the impression that it can do more than it can, I feel irritated at being misled, and I'm much less likely to support the product.
These guys are going to have some hard decisions to make in the next few months. This has such incredible potential I'd be really sad to see it get bought and languish as a side-feature at Google. Run as a business, rather than a free technology project liable to be side-lined on whatever whim, and really developed for the next 5 years, think about what it could be! I mean, just look at what it is today on day one!
The offers will come, that much is certain; I can only hope they decide to go for it and turn this into something great.
I doubt they will be bought by Google; Apple is probably already writing cheques. Apple has been getting the pants beaten off them by Google in mobile translation for a while now. If there is a patent behind this, Apple will grab it for their stable just to keep it off the Android platform. It would be a huge 'one-up' for them (which makes me sad, as I strongly prefer Android devices). In time you'll see it licensed to other platforms as a new cash cow.
This should also be viewed as a prime example that it is not always better to push complex processing into the cloud. This would not be possible in real time if the processing had to round-trip to a server somewhere.
I don’t think the translation is the amazing part – as one of the creators said here, at the moment it mostly does word-for-word translation. Google’s statistical machine translation is mightily impressive technology, I’m not sure whether you could even theoretically put it on a phone at the moment.
You would ideally want the OCR and text replacement together with Google’s translation algorithms, which, besides requiring a network connection at the moment, would also introduce way too much lag that would kill the experience.
(Word-for-word translation is a perfect compromise in the meantime.)
Requiring a network connection would also make it useless while travelling abroad, which would probably make up about 90% of the times when you'd actually need this app. Not using a network connection is vital for this app to be of any real use.
Here's an interesting thought. This app is so good it could influence Apple hardware.
While I agree this is a killer app for the iOS platform, it doesn't yet have the polish that Jobs would demand for a live demo. A good deal of that is due to the hardware constraints of the system.
I think if you had invented a dedicated device that did nothing but run this app, it would sell like hotcakes, and we've heard in this forum that people are considering buying iPod touches just to give them to family members for this app.
Thus, if you wanted to move the needle from 'wow' to 'insanely great', it would take hardware-level engineering to optimize the device, which I don't think is outside the realm of possibility.
Remember, it was killer apps like Quake, Unreal, and Crysis that drove the graphics card industry for years. This app has the potential for even broader appeal, and its successors will require even more horsepower.
I showed this app to my 60-year-old, semi-technophobic parents and they immediately understood it. They are ready to buy an iPod touch to use with this app for their next cruise. My mom guessed the cost of a language at $50 and was shocked to hear the actual price.
Screw Zuckerberg; the developers of this app should be on the cover of Time magazine.
A friend of my girlfriend's is part of the team that made this, and he gave me a demo 5 or 6 months ago; even back then, in early alpha form, I was absolutely blown away by this thing.
Seeing this thing in action in real life is life changing.
I agree. I randomly saw this guy demo it at Epicenter cafe about 6 months ago. It's one of the only demos I've ever seen where I felt like 'holy shit, I've just seen the future'.
I downloaded it immediately. Making it free to try and selling the dictionaries is a great sales model. The demo mode does a mirror-image reversal of the word to show you that it can detect words.
I was using a 4th gen iPod Touch, so lower-res camera and no flash.
Unfortunately, I was not able to get it to work as well as I'd hoped. I tested physical copies of a few things:
1. Mac OS X Snow Leopard cd case. Black sans-serif text on white background. This worked the best, with the word-reversal consistently getting "Snow" and "Leopard" reversed with little flicker. "Mac" seemed to go in and out. OS was rarely replaced, and usually along with "Mac" as though it were part of the same word.
2. C++ Programming language book cover (http://pixhost.info/pictures/631454). Dark blue serif text on white. This almost never worked. When it did recognize letters, the recognition shifted so much that the word was a constantly moving jumble.
3. Throat Coat tea bag. White serif text on red. At any point in time, about 50% of the words were recognized and reversed.
You can take a snapshot, and each word it recognizes is highlighted, which is pretty cool.
I would definitely buy something like this that handled non-romance languages to English.
The current iPod touch has only a very low res, low quality camera (I think VGA or maybe a bit better, in any case less than 1 MP) without even autofocus. It would be nice if someone could try it with an iPhone to see if it fares better.
For me it seemed to work better in translation mode. The reversing mode flickered back and forth, but English-to-Spanish seemed to be more stable.
I work in the translation industry, and the interesting part of this to me is the implementation. Generating the resulting translation image in real time is a unique idea that is probably hard to make work well. OCR on a mobile device without a network connection is not as difficult as you might think (read: it can fairly easily be done on Android with tesseract-ocr, at least). Also, instead of the typical rule-based or statistical machine translation you get on the web, think of translating word-by-word as simply substituting words from a key/value dictionary list.
Having said that, I'm always glad to see interesting translation being developed and getting such a positive reaction on HN. Congratulations to Word Lens on launching!
I agree that it's impressive, but the flash video felt like cheating.
For instance, no Spanish speaker would EVER say "LO TRADUCE EL TEXTO" (@ 0:20) or "ROPAS OPCIONAL" (@ 0:50). They just picked some Spanish words that made sense when translated; my guess is that the average translation would be much more awful.
As others have said it doesn't really matter if the translation is awful. Given a context, and a poor translation into my native language - I will understand what it says.
Unless it translates `apple` as `chair` - I'll understand what the meaning is.
I'm sure I'm too late to the game to have anybody care about this, but here is how you can tell this is something cool:
I just went to the local bar with my roommate and, despite it being pretty busy, told one of the bartenders "Hey, want to see something cool?" and pointed my phone at their menu. Almost immediately after I said this, I realized it was kind of rude (they were busy), but she looked anyway... then she proceeded to yell at all the other bartenders to come over and check it out... and the guy sitting next to me, equally wowed, spent about a minute going over the menu (translating from English to Spanish).
If Smartphone + Wikipedia + Google = Hitchhiker's Guide to the Galaxy, then this is the Babel fish.
And I’m sure that it will also, by effectively removing all barriers to communication between different cultures and races, cause more and bloodier wars than anything else in the history of creation.
One day tourists are going to buy "holiday glasses". They'll walk around Paris, stop at a nice restaurant, put on their glasses, and read the whole menu in English. Then they'll order by pointing. Hell, they can do this with their phone right now, the glasses just make it easier.
Head-mounted displays have been around for so long in a crappy form. I can't wait until Oakley or Apple finally catch the HMD bug and make something nice. Maybe this kind of app will be the push.
Saw this comment on TechCrunch, sums it up perfectly:
When I was a kid, I would have bet all the money I would ever earn that we'd have flying cars before the ability to do something like this with a phone. Amazing. Absolutely amazing.
They could have stopped at just translating and printing the result on the screen, but they show it in the original form; that is the difference between doing things and doing things beautifully.
While I am sure that there is a use as an aide, and I have known blind people who are both patient and very independent, I doubt they would stand on street corner waving a phone around hoping to catch some text.
That doesn't mean text-to-speech with translation wouldn't be beneficial - just not in this context imho.
//Actually -- I'm not sure how limited the text-to-speech stuff is in the kindle. (as-in -- how many books in the store allow it. But on the couple that I've tested - it works great)
Also -- audible/various other audiobook retailers are great.
I. Am. Totally. Floored. Just wow. The last time I checked in on assisted reality apps for the iPhone there were cheesy laser tag apps and stuff to hang tag clouds in random locations. Interesting, I guess, but nothing to produce a visceral response. If I were in your shoes, I'd see about buying a wheelbarrow for the money that's coming your way.
I just caught myself thinking: "I hope they got a software patent on this, they deserve it." But I usually hate software patents unequivocally. So, yet again, a real-world use case reminds me that nothing is so black and white.
The problem is that it is a clever combination of techniques that already exist.
* OCR, to identify the text
* Removing the background of the text, filling it with surrounding colors
* Translating words
* Placing new words on the same area, using the same rectangle
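The second bullet (removing the text and filling the area with surrounding colors) is the least obvious of the four. A minimal sketch, assuming a grayscale NumPy image and an already-detected text rectangle, is to overwrite each row of the box with a straight-line blend between the background pixel just left of it and the one just right of it. This is plain linear interpolation chosen for illustration; the function name is hypothetical, it is not necessarily what Word Lens does, and real inpainting methods are considerably more sophisticated.

```python
import numpy as np

def fill_text_box(image: np.ndarray, top: int, left: int, bottom: int, right: int) -> np.ndarray:
    """Erase whatever is inside image[top:bottom, left:right] by interpolating,
    row by row, between the pixel left of the box and the pixel right of it.

    Assumptions: grayscale image, and the box does not touch the left/right edges
    (so columns left - 1 and right exist and hold background).
    """
    out = image.astype(float).copy()
    width = right - left
    # Blend weights strictly between the two border samples: 1/(w+1) ... w/(w+1).
    t = np.arange(1, width + 1) / (width + 1)
    for row in range(top, bottom):
        a = out[row, left - 1]  # background sample just left of the box
        b = out[row, right]     # background sample just right of the box
        out[row, left:right] = (1 - t) * a + t * b
    return out
```

For example, a four-column box whose left neighbor is 0 and right neighbor is 50 gets filled with 10, 20, 30, 40, and the original image is left untouched.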
It's very, very clever, but it's not something that should be controlled by one person or company. Should every combination of existing techniques also be patentable? Where do we draw the line?
They were first, they'll have a head start, and they'll make a lot of money. But exclusivity? I don't think so.
As a friend who knows Otavio: they tried various OCR techniques from typical journal papers, and in general those algorithms were designed mostly for static images. They struggled for 2 years to find the best techniques, and I think the final product is great.
As a programmer, sometimes I think "this could be patentable, because the implementation is not immediately obvious to me." Maybe the non-programmers at the USPTO apply the same standard.
The problem with this reasoning is obvious: given that "idea patents" are now commonly awarded, these guys are much more likely to be sued by a troll who has already acquired a patent on the idea, than they are to be able to defend their IP by threatening a patent suit against someone else. An eye for an eye leaves the whole world blind.
Patents are no defense against patent trolls, because (by definition) the patent trolls don't actually produce anything. There is no defense against patent trolls, except lobbying to narrow the scope and duration of technology patents.
The military doesn't have to make a bid to buy them. All they have to do is throw down a billion-dollar contract for prototyping an Arabic version, and Quest Visual will have no reason to sell.
The awesome thing is that once the platform is there, they could easily open it up to other languages and get them rolling too--though they might have to really think before diving into languages with a different character set, such as Japanese.
And I see the opposite. Anyone could have had this idea, but very few could execute on it. In fact all that will determine whether this is a big success or not is whether the thing really works or not (execution).
No kidding... I was just sitting down to research open-source Java OCR software to do this exact thing on Android. My dreams are dashed but I'm glad someone did it...
A lot of people must have thought of making OCR iPhone apps. But the idea of using OCR to do real-time language translation superposed on the original background image? That's genius. I hadn't thought of it. Did you?
At least in my limited experience, deciding what to work on is really important.
Acknowledging great ideas doesn't take away anything from their superb execution.
I did: "One that I've mentioned in almost every post here for ages, a service that I can take a photo, it does OCR and feeds the text into Google translate. Bonus points if it does source-language-detection. More bonus points if it is actually usable in a foreign country on real live things like signposts, menus, timetables, advertisements, book covers, leaflets, etc." - me, http://news.ycombinator.com/item?id=569838
Nearly two years ago, travelling in a foreign country with an iPhone, knowing Google Translate existed, knowing OCR existed, and not being able to do anything with that knowledge.
(It's so rare that I get this kind of advance view on things that I'm going to be shamelessly happy about it. NB: I'm not claiming anything here; I couldn't write it and didn't try.)
My work colleague also had this exact idea, and told me about it in the hopes I could code something to do the job. There was just no way I had the time/resources/skill to make something like this.
I disagree. There is a comment higher up suggesting that this app could change the way people live their lives. I read this and it occurred to me that this was likely true, with the caveats: if they market it correctly, if they handle competition correctly, if they expand at the proper pace, etc.
The creators will likely see a payoff from this whether or not they do any of these things correctly. But the magnitude of that payoff is entirely dependent on those ifs and the difference between executing correctly and not is humongous.
It is amazing that you can get a piece of software like this for free (with language packs for a low extra price).
This is something the US military would have paid millions and millions for in the past. Now, you see people say, "I'd even pay fifteen whole dollars for that app"
Do you see yourselves extending this to languages based on non-Latin characters? This could be an amazing, amazing tool for Chinese characters- both for foreigners and natives.
it's funny to think about how the advantages of compact encoding influenced the progress of technology. would like to think that we're getting past this by way of Unicode, but this is an analysis problem, so character 'visual encoding' rears its head again.
of course, people are fantastically good at it, if trained in school. looking at /other/ people's scripts though, well, here we are. :)
The Pleco Chinese-English dictionary iPhone app does realtime lookups using the camera. It's more of a study tool than a realtime translation tool, but still really cool:
Well, Otavio did work a full year on it before finding John, and it still took another year of hard work before they were able to release a quality product.
That was the fastest $10 I've ever spent in the App Store. I'll echo most people's sentiments, this is seriously cool (although I must look like an idiot walking around my apartment pointing my phone at everything).
John DeWeese, "one of the creators," says here that "it's a platform".
Here's one thing they could do: open the platform to distribute dictionaries from third parties.
Questvisual could still sell "basic" or "standard" dictionaries for each language pair, but they would also sell competing dictionaries, that could either try to address the problem from a different angle (phrase translation vs. word translation), or be specialized dictionaries: legal, medical, etc.
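To make the word-versus-phrase distinction concrete: a word-for-word dictionary turns "DEAD END" into "MUERTO FIN", while a phrase dictionary consulted first can map the whole expression to "callejón sin salida". Below is a hypothetical sketch (toy dictionaries, greedy longest-match lookup) of how a phrase dictionary could sit in front of a word dictionary; it is not how Word Lens or any third-party dictionary actually works, and it does nothing about word order, which needs real grammar.

```python
# Toy Spanish dictionaries; the entries are illustrative only.
WORDS = {"DEAD": "MUERTO", "END": "FIN", "STREET": "CALLE"}
PHRASES = {("DEAD", "END"): "CALLEJÓN SIN SALIDA"}

def translate(text: str, phrases: dict, words: dict) -> str:
    """Greedy longest-match translation: at each position, try the longest known
    phrase first, fall back to a single-word lookup, and keep unknown words as-is."""
    tokens = text.upper().split()
    max_len = max((len(p) for p in phrases), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try multi-word phrases, longest first (n >= 2).
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in phrases:
                out.append(phrases[key])
                i += n
                break
        else:
            # No phrase matched: word-for-word lookup.
            out.append(words.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)
```

With an empty phrase table, `translate("dead end", {}, WORDS)` gives the word-for-word "MUERTO FIN"; with `PHRASES` it gives "CALLEJÓN SIN SALIDA". A specialized (legal, medical, etc.) dictionary would just be a different pair of tables.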
They would take a cut, of course, and they would create a market that they would curate. Great wins for everyone!
A cute, undocumented (as far as I could find) Easter egg in the Reverse Words feature: it also "corrects" the spelling. Heh. (Discovered when it insisted the reverse of "Shephard" was "drehpehS".)
Even though I have no use for it, I bought English to Spanish to try it out. I'm blown away, and these guys deserve to take away the Facebook guy's Person of the Year award... amazing...
That's very, very impressive. Traveling would be so much easier with just this app, amazing! Let's hope they'll support Asian languages in the future, for round-the-world coverage.
I'd like an English to English add on so I can get dictionary definitions of words using the 'pause' feature. This would also make visiting URLs in print extremely easy!
Everything is secondary to a compelling use case. Word Lens has a strange company name (QuestVisual), no logo, a crappy landing page, and no PR preparation. But it's darn cool! And that's what matters.
Assembly innovation is really cool. Every piece of Word Lens was here, but nobody made a perfect combination before today. Academic researchers will never do another Word Lens, as they are overfocused on novelty and hate just-assemble-the-pieces work.
Clever freemium business model. Word Lens is free, but you have to buy language packs. "Erase words" and "reverse words" are free demo modes to prove that the app really works. Note that you could even turn it into a subscription model with dictionary updates.
BlendBack is the heart of this invention. Word Lens goes like this: (1) detect and recognize characters, (2) translate, (3) produce text in similar colors and shape and blend it back into the picture. The last step is the most innovative and can be used beyond Word Lens. E.g., one could build a "Bar Code replacer": point your phone at any barcode and see some picture there instead. It could be used as a cheap replacement for road signs and ads.
No connection required. This is extremely important. 3G is unstable. WiFi is not everywhere. 4G has not really arrived yet. When you travel, your carrier cannot cover you perfectly. I can see more and more essential apps that will not require a connection. "Yelp in a box," anyone?
Global appeal. This is not another geek's app; it is mom and pop's app. It is an app for every country and every village. We need to spend more time outside Silicon Valley to find needs like this one.
Science fiction inspiration. Part of the reason for the press craziness is that Word Lens matches a science fiction concept (the Babel fish from "The Hitchhiker's Guide to the Galaxy"). We love seeing SF concepts turn into reality. Let's reread the old classics and implement all the other concepts from them :)
What I would do next
Obvious: add more language packs. Asian languages can be a bit of a challenge (recognition is harder).
Brainstorm pricing. A lot of options are available: different price for the first month, bundle prices for several languages, one-time price discount (a la 23andme), subscription model, enterprise package.
Put on hold all talks to investors and potential acquirers.
Immediately start working on versions for other platforms (Android, Blackberry, Nokia, WinMo). Hire another person to do just that.
Run a contest: iPhone for the best "Word Lens in the wild" video.
> Immediately start working on versions for other platforms (Android, Blackberry, Nokia, WinMo)
I disagree. I strongly believe this is close to the 1st successful iteration of a killer app for mobile devices (the Babel Fish). One that other platforms might not have for a while. I would be surprised if Apple has not already reached out to invite them to One Infinite Loop to talk about their future plans (not acquisition, but Android).
I just read a bunch of reviews in the App Store, and there are plenty of one-star reviews that say something like "what a rip off! This isn't free at all! You need to buy the language packs to do anything". A lot of those reviews even add that if it just cost $10 instead of falsely claiming to be free, it would sell like crazy. So maybe the freemium model is a little too clever.
So am I honestly the only person whose first thought on seeing this is... ooh, give me a Japanese pack and I don't have to wait for scanlations anymore?
Very impressive. I'm curious to see how it performs in the wild though -- can it translate signs outdoors just as well as in a studio? I really hope so.
It performs very nicely in the wild. One of the nice things about the "real time" aspect of this app is that you can nudge the phone around until you get a good translation.
What about smaller type (and sometimes weird fonts) in somewhat dim environments? I'm thinking of menus; that seems like a typical situation where tourists need translation.
I've known about this application for a long time, Otavio showed me an early prototype of this running on his laptop over a year ago. I'm ecstatic that they finally released it.
(I'm also happy that I was able to get their story on HN before TechCrunch :D)
At first the demo didn't work at all for me. Nothing would happen at all. Then I tried to click on a few buttons, and text started dancing left and right, with no apparent reason.
Then I got better at it. It seemed to reverse the words and letters more or less fine.
Then I spent the $5 on the product for Spanish (could be handy). I'll happily spend another $5 on a French version.
I'm reading comments on this app on other blog sites and people say things like "translations suck", "not worth your time".
Times like this remind me how ridiculous consumer expectations can be. People still don't get that product development takes time and energy -- they just like to criticize with their elitist expectations only because they own an iPhone.
Just bought it, this is indeed impressive stuff. I bought both English -> Spanish and the other way around. Not being a big speaker of Spanish I wasn't sure how good the translation was but the Spanish to English is good enough for my travels next year.
I'm very surprised people have not mentioned that this app does not work very well (if at all) when the phone is in any orientation other than straight up and down.
Why can't the developers use the orientation sensor/gyroscope to accommodate this? Many times it's easier to fit a bunch of text in the camera view in landscape mode than in portrait. It's also a lot more natural for me to hold my phone in landscape mode when using the camera.
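What the commenter is asking for amounts to rotating each camera frame upright before it reaches the OCR step. A minimal sketch, assuming the frame is a NumPy array and that the sensor reading has already been reduced to one of four hypothetical labels (these are not real iOS or Android constants, and whether a given physical rotation needs 1 or 3 quarter turns depends on the platform's camera and sensor conventions):

```python
import numpy as np

# Hypothetical orientation labels mapped to the number of counter-clockwise
# quarter turns np.rot90 needs to make the text upright. Turning the phone
# counter-clockwise makes the scene appear turned clockwise in the frame,
# so one counter-clockwise quarter turn undoes it (and vice versa).
QUARTER_TURNS = {
    "upright": 0,
    "rotated_ccw_90": 1,
    "upside_down": 2,
    "rotated_cw_90": 3,
}

def upright_frame(frame: np.ndarray, orientation: str) -> np.ndarray:
    """Rotate a camera frame so that text runs horizontally before OCR sees it."""
    return np.rot90(frame, k=QUARTER_TURNS[orientation])
```

A landscape frame of shape (480, 640) comes back as (640, 480); only the orientation lookup, not the OCR itself, needs to know how the phone is being held.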
Just my $0.02 and congrats on all the hubbub. Can't wait for more languages :)
I would pay $20 for Japanese, Chinese, Korean, Russian, Arabic, and Hindi language packs. Seriously. If I were traveling around the world, I would HAPPILY drop $100 into this app.
- an OCR function that reads a few words from an (easy) image
- a (simple) translator that translates the words into a different language word for word
- and then paints the translated words back using the same font/size
...continuously and quickly!
Don't get me wrong, this is cool and all, but it would be much more useful with a single snapshot that translates better, instead of focusing on doing it "realtime" (but the video wouldn't be as cool :-) )
The idea is nice, so I expect Evernote and Google to implement this ASAP ;-)
Just one small piece of feedback on the "commercial." It's a bit confusing for the first 10-15 seconds or so. I was too busy looking at the iPhone screen and not the billboards. Can I recommend an animated version with v/o?
Great work!
Edit: One more piece of feedback. The icon isn't nearly cool enough for how cool this app is. My prediction is that you guys are going to make a boatload of money, hire a designer sooner rather than later to spiff anything up.
Am I the only one surprised by the overwhelming reaction of the HN crowd? Doing word-by-word translation is not so exciting from a technology point of view. Is it because all the other apps doing translation on a phone are crap? I believe there are libraries that already store language translations, right? Seriously wondering why everybody is jumping and shouting about how great and disruptive this is.
I think it's more the speed, and the fact that it's not just doing the word-by-word translation, but it's doing OCR first, and, as a nice touch, it at least makes a vague effort to visually look similar to the original text (well, color at least)... and it does all this without using a network connection at all.
The 'traditional' method of firing up something like the Google Translate app, manually (and potentially painfully, if it's a language that uses symbols you aren't used to typing in on a regular basis) typing in the text, and maybe waiting for a server in the cloud to spit back response... is just crap compared to something like this. Even if Word Lens doesn't do proper grammatical and idiomatic translation (yet?), it still seems like it'd be super useful.
You should be the only one being surprised. This isn't just a nice app, it's a killer app.
People will buy smartphones/iTouches/tablets for this purpose. Alone.
This is a simple-to-use, cheap app on an existing popular platform that makes a significant improvement on a task a lot of people want to do. You're standing there saying "What's all the fuss about Chrome and Firefox? I can browse the internet with IE6 and have done for years!".
This reaction is future shock, this is a surprisingly strong wave in the flow of future technologies and it's shaking people up.
for the sake of being contrarian, why all the freaking out? it's cool and all, but it's OCR + translation, two things computers are already pretty good at (not to mention word lens seems to be doing word-for-word translation, not context-sensitive).
yeah doing it so fast on a mobile device is impressive, but am I alone in thinking this isn't going to change the world?
You are looking at the product from its technical merits only.
From a product perspective, I can travel to any country in the world and read the signs on the shop, read menus, or even learn the language since I have an always available translator to help.
It makes the world smaller and more accessible. Those things always change the world.
It's doing a lot more than OCR + translation. It's doing OCR + translation very, very quickly on a very limited platform, then re-injecting the translated words back into the stream in real time.
If I could point my macbook at something and get the same thing, then I might be a bit fascinated...the fact that I can do this with something that fits in my pocket is what is so amazing.
This is amazing and has great potential. Imagine Word Lens for conferences: record the presentation -> translate the slides with Word Lens -> publish the translated slides live to the conference audience. There are hundreds of ways of making money from this technology; the iOS app should only be the beginning.
This is mindblowing! I hope you make it for Android as well.
A minor thing about the webpage: If I want to share it on facebook (and I do!), it does not come up with any sort of summary or images, like it normally does for links. Now, I don't know exactly how it gets this information, but I would definitely find out if I were you.
Not sure if it's the lighting or the text I'm trying, but it does have a tendency to make words dance around insanely. The execution could be improved upon a bit.
I can certainly see a lot of potential for this. Let me know when it's embedded in my augmented reality eyeglasses!
One note, it'll be hard to get really good translation quality without a network connection, because good models require tons of CPU and memory -- Google Translate is far better than word-for-word because it uses such things.
The color matching is easy. It's not using the same fonts, and pretty obviously stretching the text to take up the approximate amount of line space - also easy. Don't be too quick to judge! :)
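A sketch of why the color matching and stretching are "easy", assuming a grayscale crop around a single detected word: take the median of the crop's border as the background color, the median of the pixels that differ strongly from it as the text color, and scale the rendered translation horizontally to the original word's width. The function names, the threshold of 40, and the fallback guess are hypothetical choices for illustration, not taken from Word Lens; a color frame would need the same logic per channel.

```python
import numpy as np

def estimate_colors(box: np.ndarray, threshold: float = 40.0):
    """Guess (text_color, background_color) for a grayscale crop around one word.

    Assumptions: the crop's outermost pixels are mostly background, and text
    pixels differ from the background by more than `threshold` intensity levels.
    """
    border = np.concatenate([box[0], box[-1], box[1:-1, 0], box[1:-1, -1]])
    background = float(np.median(border))
    ink = box[np.abs(box - background) > threshold]
    # If nothing stands out, fall back to a contrasting guess.
    text = float(np.median(ink)) if ink.size else 255.0 - background
    return text, background

def stretch_factor(box_width: float, rendered_width: float) -> float:
    """Horizontal scale that makes the translated word span the original word's width."""
    return box_width / rendered_width
```

Dark text (20) on a light background (200) comes back as `(20.0, 200.0)`, and a translation rendered 80 px wide into a 120 px box is stretched by 1.5.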

The only thing I noticed is that it translates the text to all caps, and the slides were all caps, so you didn't notice anything in the video. But that's really not material; to obsess over that is missing the point that it translates text in real time between human languages using a camera.
It’s also free (not the translation part, but the word recognition and replacement part), so there is no excuse not to try it out yourself or to get a friend to try it out for you (should you lack an iPhone).
Totally awesome, but there's a big problem. Most of the time when you're in need of translation of text you find like signs, it's because you're in another country. That means data is very expensive, because your fancy iphone is locked and roaming.
It's awesome, many people may download it, but it will get very little use. :-\ They should probably charge like $5 or $10 for it, make money on people WANTING to use it, not on them actually using it.
Unless I'm mistaken and it's doing all the image processing locally, in which case they've got a smash hit.
It says in the App description that it does not need network connectivity. To verify this claim, I put my iPhone in Airplane mode and tried the sample paper again, and I can confirm that it works without any network connectivity.
Also, the App is free, but you have to pay for each language pair separately. I think it's a smart move. I may buy 'Spanish to English' for $4.99 if I am going to Mexico. If I end up in France 6 months down the line, I will buy 'French to English' for $4.99. I won't worry about spending $4.99 each time, but they will have successfully extracted $9.98 from me. This is awesome.
I had to put the App to test.
Printed some simple English words and tried it out.
Original paper:
http://imgur.com/EYsko
What the App shows in REALTIME:
http://imgur.com/FhaSW
Comparison with Google Translate:
http://imgur.com/GOd94.jpg