Introducing Word Lens, for Android (questvisual.com)
76 points by jf on July 6, 2012 | 61 comments



The conversation here on whether this would be a good acquisition for Google got me thinking about Google's motivation behind Glass (a bit off-topic):

1) Google can (easily) make the argument that it needs to sample the image stream from every headset along with GPS/gyro to recognize "what you are looking at." These samples would be stored (of course).

2) Each sample can be OCR'd to recognize signage and contextual strings for searching that particular frame (e.g. thinking of every frame as a "web page").

3) Google can then "index the real world" with the image and contextual data.

Forget sending cars around taking photos. With enough users wearing headsets, Google can build a searchable, virtual representation of the physical world.
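To make that concrete, here's a toy sketch (all names are mine) of the indexing idea: treat each OCR'd frame like a tiny document in an inverted index, keyed back to where it was captured:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch: an inverted index from OCR'd terms to geotagged frames.
    public class WorldIndex {

        // A captured frame: where it was taken and what text was OCR'd out of it.
        static class Frame {
            final double lat, lon;
            final String ocrText;
            Frame(double lat, double lon, String ocrText) {
                this.lat = lat; this.lon = lon; this.ocrText = ocrText;
            }
        }

        // term -> frames whose OCR'd text contained that term
        private final Map<String, List<Frame>> index = new HashMap<String, List<Frame>>();

        // Treat every frame like a tiny "web page" and index its words.
        public void addFrame(Frame frame) {
            for (String term : frame.ocrText.toLowerCase().split("\\W+")) {
                if (term.isEmpty()) continue;
                List<Frame> hits = index.get(term);
                if (hits == null) {
                    hits = new ArrayList<Frame>();
                    index.put(term, hits);
                }
                hits.add(frame);
            }
        }

        // "Search the real world": every place this term has been seen.
        public List<Frame> search(String term) {
            List<Frame> hits = index.get(term.toLowerCase());
            return hits != null ? hits : new ArrayList<Frame>();
        }
    }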

And to get back on-topic: I'm surprised that the Word Lens guys didn't think of this (not the image mapping, but the signage/contextual mapping). Or maybe they have...


I have no doubt that at some point Google will start building some sort of integration between Glass and Street View. They're already starting to do what you're talking about with Street View data: they're using reCAPTCHA to OCR street signs and house numbers to try to get more accurate address information for Maps.


I'd love for Google to buy them and combine them with the Google Goggles team so they could do something similar to the "offline voice" they made for Jelly Bean. In the same way, you could use the "full engine" when connected to the Internet (Google Goggles) but still be able to do 80% of it offline through something like Word Lens.


WHOA! How freakin' cool would this be on Google Glasses when traveling abroad! Everywhere you look... the text is just translated for you.


If it can do that, it will be pretty easy to make it appear to "take the clothes off" people. (Note, I'm very much not saying this should be done, just that it will be done.) They say that to calm your nerves when public speaking, you should visualize your audience in their underwear. Will be psychologically interesting, at least, to see how that works out for the first Glasses-wearing presenter to try it. Strange world we live in.


I'm failing to see how you've made that jump. How does recognizing words, translating them, and displaying the result compare to mapping a naked body from only uncovered skin? At best, you could get the correct skin tone on a model.


Oh certainly it wouldn't be an accurate representation, it'd have to just "guess" with a generic model transformed to closely match the position/shape/skin tone of the person. Anyway, didn't mean to get off topic (or sound horribly creepy!), it's just that there are a million "interesting" (in good or bad ways) apps that people will be playing with once they have these powerful, programmable devices that can filter/overlay their view of the world in real time. It will be a fundamental change in the way people interact with the world — whether you're wearing them or not, if this catches on, many of the people around you will be. Right now we all have a pretty good idea of what other people are seeing — in most cases it's roughly what we're seeing, just from a different location; in the not too distant future, that may not be so.


Staying off topic for just a bit more, one obvious example would be to automatically give everyone you see a moustache. An app that does this was on Hacker News within the past month. People interested in developing these filters can start building them now as smartphone apps and then port them over once Glass becomes available.


Forget about LCD/LED displays. This is more like an LSD display. You could have all sorts of weird/random stuff overlaid so you're always walking around like you're trippin' balls.


What people seem to be forgetting is that Glass isn't augmented reality -- it's a display that sits at the top of your vision, not over all of it.


... not yet anyway.


Indeed, and it already seemed like this was definitely on their to-do list from the Glass keynote. I don't know how you could build a product like Glass and /not/ do that (other than it perhaps being an extremely difficult problem ;P).

More than that, microphone + reasonable speech recognition = everybody you talk to is instantly subtitled. Also an extremely difficult problem, but one I'll bet will be commonplace in 15 years.
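For a taste of how the plumbing might fit together today (this is not Glass's API, which isn't public), Android's stock SpeechRecognizer already exposes partial results, which is the raw material for live subtitles. A naive sketch; translation would be a separate step on top, and the restart-on-results loop here is crude:

    import android.app.Activity;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import android.widget.TextView;
    import java.util.ArrayList;

    // Sketch: feed Android's stock speech recognizer into an on-screen "subtitle" view.
    public class SubtitleActivity extends Activity {
        private SpeechRecognizer recognizer;
        private TextView subtitleView; // assumed to be wired up in a layout elsewhere

        void startSubtitles() {
            recognizer = SpeechRecognizer.createSpeechRecognizer(this);
            recognizer.setRecognitionListener(new RecognitionListener() {
                @Override public void onPartialResults(Bundle partial) {
                    ArrayList<String> lines = partial
                            .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                    if (lines != null && !lines.isEmpty()) {
                        subtitleView.setText(lines.get(0)); // live, word by word
                    }
                }
                @Override public void onResults(Bundle results) {
                    // final hypothesis for the utterance; restart to keep listening
                    recognizer.startListening(buildIntent());
                }
                // remaining callbacks elided for brevity
                @Override public void onReadyForSpeech(Bundle params) {}
                @Override public void onBeginningOfSpeech() {}
                @Override public void onRmsChanged(float rmsdB) {}
                @Override public void onBufferReceived(byte[] buffer) {}
                @Override public void onEndOfSpeech() {}
                @Override public void onError(int error) {}
                @Override public void onEvent(int eventType, Bundle params) {}
            });
            recognizer.startListening(buildIntent());
        }

        private Intent buildIntent() {
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
            return intent;
        }
    }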


I take it that you are not familiar with the way the German language orders words?

Because there is no way you could accurately subtitle German in English, other than knowing what someone is going to say before they say it.


Or just show the subtitles after a short delay. Or show a word-by-word translation in realtime. Or both.
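In code, the delay approach amounts to buffering words until a sentence boundary, something like this hypothetical sketch, where translate() stands in for a real MT backend:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: hold recognized words until a sentence boundary, then translate
    // the whole sentence at once, so a verb-final German clause comes out right.
    public class DelayedSubtitler {
        private final List<String> pending = new ArrayList<String>();

        // Returns a subtitle to display, or null if still mid-sentence.
        public String onWord(String word) {
            pending.add(word);
            if (word.endsWith(".") || word.endsWith("?") || word.endsWith("!")) {
                String sentence = join(pending);
                pending.clear();
                return translate(sentence); // one subtitle per full sentence
            }
            return null; // still waiting for the verb at the end of the clause
        }

        private static String join(List<String> words) {
            StringBuilder sb = new StringBuilder();
            for (String w : words) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(w);
            }
            return sb.toString();
        }

        private static String translate(String sentence) {
            return "[EN] " + sentence; // placeholder for an actual translation call
        }
    }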


> there is no way you could accurately subtitle German in English

I am sure that at one time there were people who said, "Mr. Wright, there is no way you can get that thing to fly."


A piece of unsolicited advice:

Learn the difference between what cannot be done and what is impossible.


Pfft. Just use the mind-reader API. It works based on words already said, body language, and tone.


Mind blown! Having everyone subtitled would be way cool for traveling abroad. But even more so for the hearing impaired.


I conveyed this to the developers[1] a few months ago; this would be a great acquisition for the Google Glass project.

[1] https://twitter.com/inkaudio/status/195677653049683969


An acquisition would only make sense if they had relevant patents to purchase, which they do not.


Don't underestimate the amount of work, expertise, and experience required to arrive at Word Lens; to date I've yet to see this technology duplicated elsewhere. Google has acqui-hired before, recently with Milk Inc.; I don't see why they can't with this team.


This. Word Lens is computationally intensive and required a significant amount of GPU knowledge to get working in the first place, and then a couple of revs of the iPhone before the hardware caught up. Otavio has some serious graphics chops that made the product possible.


Not always. While it is highly likely that Google could hire (or already has) the talent to build the same thing, acquiring a finished product and the talent that built it can be cheaper and faster than DIY. But yes, having a patent would make a deal even sweeter.


As someone who travels a lot, I've been hoping to get more out of this app, but it's been almost 2 years now and I can only choose Spanish or French. (According to this page, Italian is also available, but I don't have that option on my phone yet.)

Since they're only translating each individual word (and not phrases or sentences), I wonder why it takes them so long to put out new languages.
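For reference, word-for-word translation really is about as simple as a dictionary lookup per token, as in this toy sketch (the entries are mine); presumably the per-language cost lies elsewhere, e.g. in OCR training and dictionary curation, rather than in the translation step itself:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of word-for-word translation: a straight dictionary lookup per
    // token, roughly the level of translation the parent comment describes.
    public class WordForWord {
        private static final Map<String, String> ES_TO_EN = new HashMap<String, String>();
        static {
            ES_TO_EN.put("salida", "exit");
            ES_TO_EN.put("peligro", "danger");
            ES_TO_EN.put("empuje", "push");
        }

        public static String translate(String text) {
            StringBuilder out = new StringBuilder();
            for (String word : text.toLowerCase().split("\\s+")) {
                String en = ES_TO_EN.get(word);
                out.append(en != null ? en : word).append(' '); // pass unknowns through
            }
            return out.toString().trim();
        }

        public static void main(String[] args) {
            System.out.println(translate("salida peligro")); // prints "exit danger"
        }
    }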


Hi s_henry_paulson, check the app store for an update; we released 1.2 last night, which should include Italian.

We've been short-staffed until very recently; stay tuned for more languages!


Does this work right for anyone?

For me the words don't stay put. It flickers constantly, making it near impossible to read.


Does the same for me, but you can press the pause button to get a snapshot.


Hasn't Google Goggles been doing this for a couple of years now? http://googlemobile.blogspot.com/2010/05/translate-real-worl...


This isn't quite as slick as how Word Lens does it. Google's translation is likely better, but Word Lens superimposes the translated text back into place on the original image.
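At its simplest, that superimposition step looks something like this sketch (the class and names are mine, and a real app would sample the background and text colors from the frame rather than hard-coding them):

    import android.graphics.Bitmap;
    import android.graphics.Canvas;
    import android.graphics.Color;
    import android.graphics.Paint;
    import android.graphics.Rect;

    // Sketch: paint over the detected word's bounding box in the background
    // color, then draw the translation in its place on the same frame.
    public class TextOverlay {
        public static void overlay(Bitmap frame, Rect wordBox, String translation) {
            Canvas canvas = new Canvas(frame); // frame must be a mutable bitmap
            Paint eraser = new Paint();
            eraser.setColor(Color.WHITE);      // a real app samples the local background
            canvas.drawRect(wordBox, eraser);

            Paint text = new Paint(Paint.ANTI_ALIAS_FLAG);
            text.setColor(Color.BLACK);        // ...and matches the original text color
            text.setTextSize(wordBox.height());
            canvas.drawText(translation, wordBox.left, wordBox.bottom, text);
        }
    }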

Anyone tried to install it on a stock 2.3.3 Nexus One and been told it's incompatible?


Yep, we were sad about that one. The hardware is up to spec but there's something about the GPU driver that's causing the N1 to perform poorly.


Most of the time, in my experience, Goggles fails to even recognize it as text to translate, because it tries to identify objects too.


I think there's definitely a gap in the market for something a bit simpler: there have been a good number of times when I've been stumped by a word or two in a sentence (in Polish) written on something in the street, and have resorted to taking a cameraphone photo and translating later when at home.

Sadly, as I'd love to have a play, the app store says it's incompatible with my cheapo ZTE Blade.


I'm not surprised. It has a very slow processor, and for this kind of processing you need a lot of power. In fact, I think the (original?) developer of Word Lens said he first wrote it in assembly to make it fast enough even for an iPhone 4.

Plus, your phone is on the ARMv6 architecture, and for the kind of optimization they need to do, they didn't want to bother with that older architecture.

For the record, I have a phone with a similar processor right now as well, but I'm planning on switching to a Nexus this fall.


Yep, that's correct: critical sections of the pipeline are coded in assembly. Also, Word Lens only works with Android devices that have NEON support.
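For anyone curious, NEON support can be detected at runtime roughly like this sketch, which scans /proc/cpuinfo (essentially what the NDK's cpufeatures helper does under the hood; the exact format of the Features line varies by kernel):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    // Sketch: detect NEON support by scanning /proc/cpuinfo.
    public class NeonCheck {
        public static boolean hasNeon() {
            BufferedReader reader = null;
            try {
                reader = new BufferedReader(new FileReader("/proc/cpuinfo"));
                String line;
                while ((line = reader.readLine()) != null) {
                    // the "Features" line lists CPU extensions, e.g. "... vfp neon ..."
                    if (line.startsWith("Features") && line.contains("neon")) {
                        return true;
                    }
                }
            } catch (IOException e) {
                // fall through: assume no NEON if cpuinfo can't be read
            } finally {
                try { if (reader != null) reader.close(); } catch (IOException ignored) {}
            }
            return false;
        }
    }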


That needs to be explained somewhere obvious. The Market page only tells me the app is incompatible with my device, giving me no clue what to get that would run it (except for the OS version, which my device satisfies).


We considered publishing a list of minimum requirements, but we encountered phones that for whatever reason wouldn't work even though their hardware met them. For example, the Nexus One and HTC Evo both had low frame rates despite adequate hardware.


Not compatible with the N1 either, which is pretty rare. I wonder why...


We ran into performance issues on the N1 that made for a very poor experience. In testing, the N1 oddly worked fine when we replaced the stock firmware with Cyanogen. That would have really confused users, though, so we had to blacklist the N1.
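For the curious, a runtime guard of this sort might look like the sketch below. The class and model strings are illustrative ("PC36100" is commonly the HTC Evo 4G's Build.MODEL, but exact values vary by carrier and firmware), and Play's actual per-device exclusions are configured in the developer console rather than in code:

    import android.os.Build;
    import java.util.Arrays;
    import java.util.List;

    // Sketch: a runtime blacklist keyed on Build fields, as a fallback for
    // devices that meet the spec on paper but misbehave in practice.
    public class DeviceBlacklist {
        private static final List<String> BLACKLIST =
                Arrays.asList("Nexus One", "PC36100"); // verify on real hardware

        public static boolean isBlacklisted() {
            return BLACKLIST.contains(Build.MODEL);
        }
    }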


It isn't compatible with my Droid X either, which surprised me. Play doesn't tell me why, which is inconvenient.


We haven't blacklisted the Droid X. Can you let us know what kind of problems you're seeing?


Really? It's compatible with my Droid X.


Have been waiting for this to come to Android for a while now, but sadly it doesn't seem to want to install on my Nexus One. It says it only requires Android 2.3.3+ and I have 2.3.6. Oh well, add another reason to the pile to replace this phone, despite still loving it over the new Nexus models.


Yeah, we wanted to support the Nexus One, and the hardware meets the minimum requirements, but we encountered weird performance issues, likely related to the graphics driver.


I don't understand how this isn't compatible with my device (Google Play won't even let me install it). I have an HTC Evo. It can't be a hardware power thing if the iPod touch runs it fine.


Sometimes the hardware meets the minimum specs but other issues like buggy drivers lead us to have to blacklist a device.


I like the idea but the translations to Spanish are terrible.


Any chance of getting a version that recognizes a URL, email, Twitter handle, etc. and makes them clickable? Goodbye, QR codes!
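For the URL and email cases, Android's stock Linkify class already does most of the work once the OCR step has produced plain text; roughly:

    import android.text.util.Linkify;
    import android.widget.TextView;

    // Sketch: turn URLs and email addresses in OCR'd text into tappable links.
    public class OcrLinkifier {
        public static void showRecognizedText(TextView view, String ocrText) {
            view.setText(ocrText);
            // WEB_URLS and EMAIL_ADDRESSES are built in; a Twitter-handle
            // pattern would need Linkify.addLinks with a custom regex.
            Linkify.addLinks(view, Linkify.WEB_URLS | Linkify.EMAIL_ADDRESSES);
        }
    }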


Word Lens + Auto Captions = Instant Subtitles in EVERY language. OMGBBQ.


Unrelated rant: Dear Google, can you please add QR codes to the web Play Store?


Why? You can just press "Install" and it instantly starts downloading on your phone.


Only if I log in and thereby allow Google to connect my mobile profile with my browsing profile, which I decidedly don't want.


... what?

You're already logged into a Google account on your phone. Or do you really have a different account for your phone, thereby forgoing the biggest advantage of having an Android phone (sync with all of the Google services)?


I don't use any Google services other than Search. I have no interest in giving them more information than absolutely needed.


Two years is an awful long time to go from iOS to Android.


Why did it take so long? Small team, SDK limitations, funding? I can't imagine that porting an Obj-C code base to Java would take as long as it did. (It doesn't matter how long the original code base took; that was development, whereas porting code is more "manufacturing".) Not trolling, just curious.


I'm sure what took so long is a combination of a) ridiculously poor/inconsistent camera APIs and b) less-than-satisfactory SIMD support. When I ported some camera stuff from iOS to Android, it took me 3x as long due to the need to work around bugs in the SDK. I'm sure they had to do all sorts of nasty things with JNI to get it performant as well, and I'm not even sure the NDK fully supports things like the C++ STL yet.
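The usual shape of that JNI split, for anyone who hasn't done it (names here are illustrative, not Word Lens's actual code): Java owns the camera callback, and the SIMD-heavy frame processing lives on the native side.

    // Sketch of the typical JNI split for camera-frame processing.
    public class FrameProcessor {
        static {
            System.loadLibrary("framenative"); // loads libframenative.so
        }

        // Implemented in C/C++; receives the raw NV21 preview buffer.
        private static native void processFrame(byte[] nv21, int width, int height);

        // Called from Camera.PreviewCallback.onPreviewFrame on the Java side.
        public static void onPreviewFrame(byte[] data, int width, int height) {
            processFrame(data, width, height);
        }
    }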


Not a long time when you consider how long it took them to write their first release.


How long did the initial iOS release take?


nice!


sweet!


This is incredible. We're living in the future!




