

Let's Build a Superpower: see within and behind objects - jonmrodriguez
http://kck.st/voxel-vision

======
vladoh
I'm sorry but you obviously have no experience with augmented reality (what
you are actually trying to do). There are so many problems with your
approach...

1\. In order to render virtual objects on top of the real image (camera
image or directly on the retina), you need to know the pose of the eye with
respect to the outer world. For example, if you are a surgeon and want to
overlay an MRI scan on the patient, you need to know the pose of the eye
(camera) with respect to the patient, and this is a VERY hard task, as anyone
doing computer vision can tell you. You only mention how to find the pose of
the eye with respect to the head, but this is not enough (see the sketch
below of even the marker-based shortcut).

2\. Processing large 3D models like 3D scans requires a lot of computational
power. It won't be very portable.

3\. You will also need a depth image alongside the camera image in order to
achieve some of your goals (like blurring the image according to the focus
distance). Do you have any idea how difficult this task is?

You have cool ideas, but you should be more realistic about your goals.
Reading up on augmented reality and computer vision will also help you
understand how difficult this is.
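
To make point 1 concrete, here is a minimal sketch of even the marker-based
shortcut (the marker size, pixel coordinates, and camera intrinsics below
are made-up illustrative values): recovering just the camera-to-marker pose
takes a Perspective-n-Point solve every frame, and that is still only one
link in the eye-to-world chain.

    # Marker-based pose estimation sketch; all numbers are assumptions.
    import numpy as np
    import cv2

    # Known 3D corner positions of a 10 cm square marker, in meters.
    marker_3d = np.array([[-0.05,  0.05, 0.0],
                          [ 0.05,  0.05, 0.0],
                          [ 0.05, -0.05, 0.0],
                          [-0.05, -0.05, 0.0]], dtype=np.float32)

    # Matching 2D pixel coordinates found by a marker detector
    # (e.g. ArUco); these values are placeholders.
    marker_2d = np.array([[310.0, 220.0],
                          [390.0, 224.0],
                          [386.0, 300.0],
                          [306.0, 296.0]], dtype=np.float32)

    # Assumed pinhole intrinsics for a 640x480 camera, no distortion.
    K = np.array([[600.0,   0.0, 320.0],
                  [  0.0, 600.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    dist = np.zeros(5)

    ok, rvec, tvec = cv2.solvePnP(marker_3d, marker_2d, K, dist)
    print("camera pose w.r.t. marker:", rvec.ravel(), tvec.ravel())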

~~~
jonmrodriguez
You are right; keeping the project scope limited is a virtue:

We merely want to make a _proof of concept_ that inspires users with a mind-
opening sense that the world's data can have higher dimensionality than just 2
dimensions.

I (Jon) read the novel _Flatland_ when I was a kid, and have ever since been
yearning to see in more than just 2 dimensions. This project is an attempt to
open users' eyes in the same way.

\--

With the goal defined as proving a concept, it becomes apparent that _we do
not need to permanently solve every challenge with AR in a scalable way._

We merely need to kludge around them (for example using tons of computing
power) enough to make a convincing temporary experience:

1\. Patterns of marks or lights can be placed on the outside of the headset,
detectable by cameras in the environment.

2\. True. Luckily we can achieve a kludgy demo by having one entire rendering
computer (with graphics card) per depth slice, and selecting between their
outputs with a dumb switch, post-rendering (see the sketch after this list).

3\. The plenoptic camera from lytro.com should provide the necessary post-
capture-focusable images. We are meeting with Prof. Marc Levoy today to ask
about partnering.
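
To illustrate the dumb switch in point 2 (the five slice distances and the
focus-distance input below are illustrative assumptions, not a finalized
design):

    # Toy selector: one pre-rendered frame per depth slice, pick the
    # slice nearest the user's current focus distance.
    import numpy as np

    # Assume five fixed slices, spaced roughly evenly in diopters
    # (1/m), since accommodation is roughly linear in diopters.
    slice_dists_m = np.array([0.5, 1.0, 2.0, 5.0, 20.0])
    slice_diopters = 1.0 / slice_dists_m

    def select_slice(focus_dist_m, frames):
        """Return the frame whose slice best matches the focus."""
        focus_diopters = 1.0 / focus_dist_m
        idx = np.argmin(np.abs(slice_diopters - focus_diopters))
        return frames[idx]

    # frames[i] stands in for the video output of rendering box i.
    frames = ["<output of renderer %d>" % i for i in range(5)]
    print(select_slice(1.3, frames))  # -> the 1.0 m slice's output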

\--

We merely hope that once the concept of voxel vision has been proven, all
future AR systems will feature it as a standard mode of interaction.

~~~
vladoh
Augmented reality is going to grow a lot in the future, and therefore I think
it is worth investing time in it, and I wish you good luck :) I would just
suggest focusing on a smaller and better-defined task and borrowing as much
as you can from the field of AR in order to increase your chances of success.
For example, I wouldn't waste time developing your own tracking system (point
1) but would use an existing marker-based one. Also, for the depth images, I
think you could do much better with a Kinect instead of this camera, because
it will give you a much more detailed depth image (however, only indoors). I
think the idea of tracking the eye position is really worth exploring. The
projection of the image onto the retina is also really cool, but I think it
is much more difficult, and one year will not be enough to achieve reasonable
results. Good luck :)

------
sskates
Cool project, but the intro on Kickstarter is grossly misleading. It implies
that you have a way to obtain the data about what should be rendered in
addition to projecting it, when in reality this project is only about the
projection part. It's not bad to talk about potential applications, but you're
implying that they're part of the project when they're not.

~~~
jonmrodriguez
We will fix it tonight. Thank you for the feedback, and sorry for the overly
lofty rhetoric.

~~~
JabavuAdams
Are you familiar with Steve Mann's wearable computing work?

I bought a Kopin cyber-display around 1998-2000, and one problem with all of
these displays is that they have a tiny field of view. It feels a lot more
like looking down a sight than having some kind of full-scene overlay.

I've also tried using the 3Com MPro mini projectors to back-project onto a
phone-sized screen. With the default optics, the best image I could get had
the projector 20 cm from the screen, and the screen 10 cm from my eye.
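
For scale (the 6 cm screen width is an assumed "phone-sized" figure, not a
measured one), the angular width of a screen of width w viewed at distance d
is 2*atan(w / 2d):

    import math

    def fov_degrees(width_m, dist_m):
        return math.degrees(2 * math.atan(width_m / (2 * dist_m)))

    print(fov_degrees(0.06, 0.10))  # ~33 degrees for 6 cm at 10 cm
    # Human horizontal vision spans roughly 180-200 degrees, which is
    # why a display like this feels like looking down a sight.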

~~~
jonmrodriguez
Of course, Steve Mann's telepointer helped define the field of AR!

If you (a user) find that screen size too small, it is possible to take the
approach of having large, opaque screens that show the entire scene, so that
none of the real light makes it through.

For example, imagine a helmet that has two screens inside, which are showing a
heavily augmented version of the video streams from a pair of plenoptic
cameras (lytro.com) that are mounted on the front of the helmet.

------
etherael
All that is mentioned here, as far as I can see, is a method for getting
focus information, coupled with a virtual retinal display. How are they
proposing to get the image of what is on the other side of the arbitrary
thing you are supposed to be able to look through?

~~~
jonmrodriguez
EDIT:

I think that in the future, the following phenomena are likely to proliferate:

\- high-res satellite and airplane imaging of the globe

\- public geotagged photos, including compass bearing

\- Google Street View road images

\- public security cameras on public streets

All of this data can be integrated using software like Microsoft's PhotoSynth,
to stitch together a gigantic 3D model of the real globe.
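
As a hedged illustration of the core stitching step (a minimal two-view
sketch with assumed pinhole intrinsics; PhotoSynth-scale systems run this
kind of matching and triangulation across millions of images):

    import numpy as np
    import cv2

    img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect and match local features between the two photos.
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Assumed pinhole intrinsics for a 640x480 camera.
    K = np.array([[700.0,   0.0, 320.0],
                  [  0.0, 700.0, 240.0],
                  [  0.0,   0.0,   1.0]])

    # Recover the relative camera pose, then triangulate the matched
    # points into a sparse 3D point cloud.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T
    print(pts3d.shape[0], "3D points recovered")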

Then, people with Voxel Vision will be able to look around at streets and
landmarks that are interesting but physically occluded.

======

Edited to raise a relevant comment from further down the thread. Original
answer follows:

Jon Rodriguez here, project lead.

Your question seems ambiguously worded.

I will answer both possible interpretations of the question:

===

Q:

"How do you propose to do the sensing task of seeing inside real world objects
so that that data can be rendered?"

A:

The source of content depends on the application.

1: put a virtual screen at distance infinity, so in a boring meeting you just
look far away to check your email

2: for telepresence surgery, use an MRI to do the 3D imaging of the
patient's interior

3: Google Street View images allow you to see any point in a city, even if
buildings are between you and it

4: educators create custom 3D models of physical phenomena they want the kids
to vividly explore, such as a 3D model of a human neuron, or a 3D geological
model of the Earth, or the interior of a grape.

Etc.

In particular, virtual reality games and worlds can be easily shown this way,
as long as the game artists take the extra effort to give their models
interesting interiors.

===

Q:

"How does a Heads Up Display that is in front of a real-world object (like a
brick wall) then project a virtual image that appears to be behind that
object?"

A:

This is where the adjustable lens on the virtual retinal display comes in.

By properly decollimating the light from the VRD's screen or laser, you can
give that light an arbitrary "equivalent optical distance".

Despite the fact that the light originates 2 cm from the eye, it can appear
to originate 2 km from the eye, in that 2 km is the distance to which the
eye will have to accommodate to bring that light into focus.
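
Worked through with the thin-lens relation in vergence (diopter) form, where
the specific lens powers are illustrative:

    # Light from a screen at distance u reaches the lens with
    # vergence -1/u diopters; a thin lens of power P adds P; the eye
    # then sees a virtual image at distance -1/(P - 1/u).
    def apparent_distance_m(screen_dist_m, lens_power_diopters):
        v_out = lens_power_diopters - 1.0 / screen_dist_m
        return float("inf") if v_out == 0 else -1.0 / v_out

    # Screen at 2 cm with a 49.9995-diopter lens: image at ~2 km.
    print(apparent_distance_m(0.02, 49.9995))  # ~2000 m
    # Nudging the lens power moves the image to ~1 m, so a tunable
    # lens can sweep the virtual image through arbitrary depths.
    print(apparent_distance_m(0.02, 49.0))     # ~1.0 m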

~~~
wlievens
> Despite the fact that the light originates 2 cm from the eye, it can
> appear to originate 2 km from the eye, in that 2 km is the distance to
> which the eye will have to accommodate to bring that light into focus.

But there's a wall in front of it. How can I perceive an object as 10m away
when I'm staring at a wall at 2m away?

~~~
jonmrodriguez
The wall will appear as if it:

\- were translucent, and

\- had a screen behind it

The light arrives at your eye containing both an image of a wall when focusing
at 3m and an image of a TV show when focusing at 20m.

Such a light field is indistinguishable from the light field that would
occur if the wall actually were translucent and actually did have a TV show
playing on a screen behind it.
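
Rough numbers behind that claim (the 4 mm pupil is an assumed typical
value): the out-of-focus component is blurred by approximately the pupil
diameter times the defocus difference in diopters, in radians.

    import math

    pupil_m = 0.004          # assumed pupil diameter
    wall_D = 1.0 / 3.0       # wall at 3 m  -> ~0.33 diopters
    tv_D = 1.0 / 20.0        # screen at 20 m -> 0.05 diopters

    blur_rad = pupil_m * abs(wall_D - tv_D)
    # ~3.9 arcmin of blur on the TV image while focused on the wall.
    print(math.degrees(blur_rad) * 60, "arcmin")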

~~~
milkshakes
i think his question has less to do with how your system will render the
information about what's behind the wall, and more to do with how your system
will acquire that information in the first place

~~~
jonmrodriguez
I think that in the future, the following phenomena are likely to proliferate:

\- high-res satellite and airplane imaging of the globe

\- public geotagged photos, including compass bearing

\- Google Street View road images

\- public security cameras on public streets

All of this data can be integrated using software like Microsoft's PhotoSynth,
to stitch together a gigantic 3D model of the real globe.

Then, people with Voxel Vision will be able to look around at streets and
landmarks that are interesting but physically occluded.

~~~
blackiron
3D data of the surroundings could also be obtained with a portable
sound-to-3D sonar system... not sure if such a thing exists yet, but it
would be cool to connect it to your device for enhanced bat-sight.

~~~
tomelders
I saw this documentary once about a man who was half bat, half man. He
implemented something like this using mobile phones in the area he wanted to
look at.

I think he must not have gotten the sonar parts in his half bat bits.

~~~
wlievens
He did have the help of one particular Magical Negro
[<http://tvtropes.org/pmwiki/pmwiki.php/Main/MagicalNegro>]

