Conversation here on whether this would be a good acquisition for Google just got me thinking about Google's motivation behind Glass (a bit off-topic):
1) Google can (easily) make the argument that it needs to sample the image stream from every headset along with GPS/gyro to recognize "what you are looking at." These samples would be stored (of course).
2) Each sample can be OCR'd to recognize signage and contextual strings for searching that particular frame (e.g. thinking of every frame as a "web page").
3) Google can then "index the real world" with the image and contextual data.
Forget sending cars around taking photos. With enough users wearing headsets, Google can build a searchable, virtual representation of the physical world.
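Concretely, the pipeline sketched above might look something like the following. This is only a minimal Java sketch of the idea; OcrEngine and GeoIndex are hypothetical interfaces I made up for illustration, not real Google or Glass APIs.

    import java.util.List;

    // Hypothetical sketch of the "index the real world" idea above.
    // OcrEngine and GeoIndex are made-up interfaces, not real APIs.
    interface OcrEngine {
        List<String> extractText(byte[] jpegFrame);   // signage, storefront names, etc.
    }

    interface GeoIndex {
        void add(double lat, double lon, float heading, List<String> terms, byte[] frame);
    }

    class HeadsetIndexer {
        private final OcrEngine ocr;
        private final GeoIndex index;

        HeadsetIndexer(OcrEngine ocr, GeoIndex index) {
            this.ocr = ocr;
            this.index = index;
        }

        // Called for each sampled frame: OCR it, then index the recognized
        // strings against the GPS/compass fix -- treating every frame like
        // a "web page" keyed by location and orientation.
        void onFrameSampled(byte[] jpegFrame, double lat, double lon, float heading) {
            List<String> terms = ocr.extractText(jpegFrame);
            index.add(lat, lon, heading, terms, jpegFrame);
        }
    }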
And to get back on-topic: I'm surprised that the Word Lens guys didn't think of this (not the image mapping, but the signage/contextual mapping). Or maybe they have...
I have no doubt that at some point Google will start building some sort of integration between Glass and Street View. They're already starting to do what you're talking about with Street View data - they're using reCAPTCHA to OCR street signs and house numbers to try to get more accurate address information for Maps.
I'd love for Google to buy them and combine them with the Google Goggles team so they do something similar to the "offline voice" that they made for JellyBean. In the same way you could use the "full engine" when connected through the Internet (Google Goggles), but also still be able to do 80% of it offline through something like Word Lens.
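Structurally, that "full engine online, 80% offline" split is just a fallback around two recognizers. A minimal sketch, assuming hypothetical Recognizer implementations for the cloud and on-device engines (none of these types are real APIs):

    // Hypothetical sketch of the online/offline split described above.
    interface Recognizer {
        String recognizeAndTranslate(byte[] frame) throws Exception;
    }

    class HybridRecognizer implements Recognizer {
        private final Recognizer cloud;    // "full engine", needs connectivity
        private final Recognizer onDevice; // smaller model, covers ~80% of cases

        HybridRecognizer(Recognizer cloud, Recognizer onDevice) {
            this.cloud = cloud;
            this.onDevice = onDevice;
        }

        @Override
        public String recognizeAndTranslate(byte[] frame) throws Exception {
            try {
                return cloud.recognizeAndTranslate(frame);
            } catch (Exception offlineOrTimeout) {
                // No network (or the request timed out): degrade gracefully
                // to the on-device engine instead of failing outright.
                return onDevice.recognizeAndTranslate(frame);
            }
        }
    }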
If it can do that, it will be pretty easy to make it appear to "take the clothes off" people. (Note, I'm very much not saying this should be done, just that it will be done.) They say that to calm your nerves when public speaking, you should visualize your audience in their underwear. Will be psychologically interesting, at least, to see how that works out for the first Glasses-wearing presenter to try it. Strange world we live in.
I'm failing to see how you've made that jump. How does recognizing words, translating them, and displaying the result compare to mapping a naked body from only uncovered skin? At best, you could get the correct skin tone on a model.
Oh certainly it wouldn't be an accurate representation, it'd have to just "guess" with a generic model transformed to closely match the position/shape/skin tone of the person. Anyway, didn't mean to get off topic (or sound horribly creepy!), it's just that there are a million "interesting" (in good or bad ways) apps that people will be playing with once they have these powerful, programmable devices that can filter/overlay their view of the world in real time. It will be a fundamental change in the way people interact with the world — whether you're wearing them or not, if this catches on, many of the people around you will be. Right now we all have a pretty good idea of what other people are seeing — in most cases it's roughly what we're seeing, just from a different location; in the not too distant future, that may not be so.
Staying off topic for just a bit more, one obvious example would be to automatically give everyone you see a moustache. An app that does this was on Hacker News within the past month. People interested in developing these filters can start building them now as smartphone apps and then port them over once Glass becomes available.
Forget about LCD/LED displays. This is more like an LSD display. You could have all sorts of weird/random stuff overlaid so you're always walking around like you're trippin' balls.
Indeed, and it already seemed like this was definitely on their to-do list from the Glass keynote. I don't know how you could build a product like Glass and /not/ do that (other than it perhaps being an extremely difficult problem ;P).
More than that, microphone + reasonable speech recognition = everybody you talk to is instantly subtitled. Also an extremely difficult problem, but one I'll bet will be commonplace in 15 years.
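You can already prototype the subtitle idea on a phone today with the stock Android speech API; the hard part is doing it continuously, hands-free, for the person in front of you. A rough sketch (assumes the RECORD_AUDIO permission and a TextView standing in for the heads-up display):

    import android.content.Context;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import android.widget.TextView;
    import java.util.ArrayList;

    // Rough prototype of "everyone you talk to is subtitled" using the stock
    // Android speech API. A Glass version would presumably run continuously
    // and draw onto the heads-up display instead of a TextView.
    class SubtitleListener implements RecognitionListener {
        private final TextView subtitleView;

        SubtitleListener(TextView subtitleView) {
            this.subtitleView = subtitleView;
        }

        static void start(Context context, TextView subtitleView) {
            SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
            recognizer.setRecognitionListener(new SubtitleListener(subtitleView));

            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
            recognizer.startListening(intent);
        }

        // Show partial hypotheses as they arrive so the "subtitle" keeps up
        // with the speaker instead of appearing only after they stop talking.
        @Override public void onPartialResults(Bundle partialResults) { show(partialResults); }
        @Override public void onResults(Bundle results) { show(results); }

        private void show(Bundle bundle) {
            ArrayList<String> hypotheses =
                    bundle.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            if (hypotheses != null && !hypotheses.isEmpty()) {
                subtitleView.setText(hypotheses.get(0));
            }
        }

        // Remaining callbacks are unused in this sketch.
        @Override public void onReadyForSpeech(Bundle params) {}
        @Override public void onBeginningOfSpeech() {}
        @Override public void onRmsChanged(float rmsdB) {}
        @Override public void onBufferReceived(byte[] buffer) {}
        @Override public void onEndOfSpeech() {}
        @Override public void onError(int error) {}
        @Override public void onEvent(int eventType, Bundle params) {}
    }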
Don't underestimate the amount of work, expertise, and experience required to arrive at Word Lens; to date I have yet to see this technology duplicated elsewhere. Google has acqui-hired before, recently with Milk, Inc. I don't see why they couldn't with this team.
This. Word Lens is computationally intensive and required a significant amount of GPU knowledge to get working in the first place, and then a couple of revs of the iPhone before the hardware caught up. Otavio has some serious graphics chops that made the product possible.
Not always. While it is highly likely that Google could hire (or already has) the talent to build the same thing, acquiring a finished product and the talent that built it can be cheaper and faster than DIY. But yes, having a patent would make a deal even sweeter.
As someone who travels a lot, I've been hoping to get more out of this app, but it's been almost 2 years now and I can still only choose Spanish or French. (According to this page, Italian is also available, but I don't have that option on my phone yet.)
Since they're only translating each individual word (and not phrases or sentences), I wonder why it takes them so long to put out new languages.
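To illustrate what word-for-word translation amounts to, here's a toy Java sketch: each recognized token is a dictionary lookup, passed through unchanged if unknown. The dictionary entries are made up; presumably the real time sink is building and curating each per-language dictionary rather than the lookup itself.

    import java.util.HashMap;
    import java.util.Map;

    // Toy word-for-word translator, roughly the level of translation
    // the parent comment is describing.
    class WordForWordTranslator {
        private final Map<String, String> dictionary = new HashMap<>();

        WordForWordTranslator() {
            dictionary.put("salida", "exit");
            dictionary.put("peligro", "danger");
            dictionary.put("cerrado", "closed");
        }

        String translate(String recognizedText) {
            StringBuilder out = new StringBuilder();
            for (String word : recognizedText.toLowerCase().split("\\s+")) {
                out.append(dictionary.getOrDefault(word, word)).append(' ');
            }
            return out.toString().trim();
        }
    }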
This isn't quite as slick as how Word Lens does it. Google's translation is likely better, but Word Lens superimposes the translated text back into place on the original image.
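The superimposing step is conceptually simple once you have the word's bounding box: paint over the original text and draw the translation in roughly the same spot. A minimal sketch with android.graphics (the colors and sizing here are placeholders; the hard, GPU-heavy part is doing this per frame with matching perspective, font, and background):

    import android.graphics.Bitmap;
    import android.graphics.Canvas;
    import android.graphics.Color;
    import android.graphics.Paint;
    import android.graphics.Rect;

    // Minimal sketch of "put the translated text back where the original was":
    // cover the detected word's bounding box, then draw the translation inside it.
    class TextOverlay {
        static void overlay(Bitmap frame, Rect wordBox, String translation) {
            Canvas canvas = new Canvas(frame);

            Paint background = new Paint();
            background.setColor(Color.WHITE);          // placeholder background
            canvas.drawRect(wordBox, background);

            Paint text = new Paint(Paint.ANTI_ALIAS_FLAG);
            text.setColor(Color.BLACK);
            text.setTextSize(wordBox.height() * 0.8f); // crude fit to the box
            canvas.drawText(translation, wordBox.left, wordBox.bottom, text);
        }
    }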
Anyone tried to install it on a stock 2.3.3 Nexus One and been told it's incompatible?
I think there's definitely a gap in the market for something a bit simpler: there have been a good number of times when I've been stumped by a word or two in a sentence (in Polish) written on something in the street, and have resorted to taking a cameraphone photo and translating later when at home.
Sadly, as I'd love to have a play, the app store says that it's incompatible with my cheapo ZTE Blade.
I'm not surprised. It has a very slow processor, and for this kind of processing you need a lot of power. In fact, I think the (original?) developer of Word Lens said he first made it in Assembly to make it fast enough even for an iPhone 4.
Plus, your phone is on the ARMv6 architecture, and for the kind of optimization they need to do, they didn't want to bother with that older instruction set.
For the record right now I have a phone with a similar processor as well, but planning on switching to a Nexus this fall.
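For reference, an app can detect that split at runtime: ARMv6 devices report "armeabi" in Build.CPU_ABI, while the ARMv7 devices these optimizations target report "armeabi-v7a". A tiny sketch:

    import android.os.Build;

    // Quick runtime check for the ABI split mentioned above: ARMv6 devices
    // report "armeabi", ARMv7 devices report "armeabi-v7a".
    class AbiCheck {
        static boolean isArmV7() {
            return "armeabi-v7a".equals(Build.CPU_ABI);
        }
    }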
That needs to be explained somewhere obvious. On the market page it only tells me the app is incompatible with my device, giving me no clue what to get that would run it (except for the OS versions, which my device satisfied).
We considered publishing a list of minimum requirements but we encountered phones that for whatever reason wouldn't work even though the hardware met the minimum requirements. For example, the Nexus One and HTC Evo both had low frame rates despite adequate hardware.
We ran into performance issues on the N1 that made for a very poor experience. In testing, the N1 oddly worked fine when we replaced the stock firmware with Cyanogen. That would have really confused users though so we had to blacklist the N1.
Have been waiting for this to come to Android for a while now, but sadly it doesn't seem to want to install on my Nexus One. It says it only requires Android 2.3.3+ and I have 2.3.6. Oh well, add another reason to the pile to replace this phone despite still loving it over the new Nexus models.
Yeah, we wanted to support the Nexus One, and the hardware meets the minimum requirements but we encountered weird performance issues likely related to the graphics driver.
I don't understand how this isn't compatible with my device (Google Play won't even let me install it). I have an HTC EVO. It can't be a hardware-power thing if the iPod touch runs it fine.
You're already logged into a Google account on your phone. Or do you really have a different account for your phone, thereby forgoing the biggest advantage of having an Android phone (sync with all of the Google services)?
Why did it take so long? Small team, SDK limitations, funding? I can't imagine that porting an Obj-C code base to Java would take as long as it did. (It doesn't matter how long the original code base took; that was development, whereas porting code is more "manufacturing".) Not trolling, just curious.
I'm sure what took so long is a combination of a) ridiculously poor/inconsistent camera APIs and b) less-than-satisfactory SIMD support. When I ported some camera code from iOS to Android it took me 3x as long due to the need to work around bugs in the SDK. I'm sure they had to do all sorts of nasty things with JNI to get it performant as well, and I'm not even sure the NDK supports things like the C++ STL yet.
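For anyone curious what that JNI plumbing typically looks like, the Java side is usually just a thin wrapper that hands the camera buffer to native code, where the real per-frame work (and any NEON/assembly optimization) lives. A hedged sketch of the Java half only; the library and method names are made up for illustration:

    // Typical shape of the Java/JNI boundary for per-frame image processing.
    // "wordlens" is an illustrative library name, not the real one.
    class NativeFrameProcessor {
        static {
            // Loads libwordlens.so built with the NDK; the C/C++ side exports
            // Java_NativeFrameProcessor_processFrame per JNI naming rules.
            System.loadLibrary("wordlens");
        }

        // NV21 camera preview bytes in, recognized/translated strings out.
        // Passing a byte[] means the native side uses GetByteArrayElements
        // (or a direct ByteBuffer to avoid the copy entirely).
        static native String[] processFrame(byte[] nv21Frame, int width, int height);
    }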