
Ask HN: How do World Lenses work without a depth sensor? - hackerews
Snap just released World Lenses and I'm wondering if CV experts can explain
how they work: https://www.youtube.com/watch?v=K6x44v8prFA

I was under the impression that phones needed depth and advanced motion
tracking (e.g. see Tango
https://developers.google.com/tango/overview/motion-tracking) to enable stuff
like placing digital objects down and having them stick without moving
around. Surprised Snapchat can now do that without needing additional sensors
on the device.

How might that work?
======
oppositelock
You can use the accelerometer to figure out which way is "down", then use CV
techniques to identify the ground plane. Even if you get this a bit wrong, as
long as you visually track some registration points, you get consistent
positioning of your virtual object, which is probably the most important bit.
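
Roughly what that could look like (a minimal sketch with made-up intrinsics
and a guessed camera height, not Snap's actual pipeline): take the gravity
direction from the accelerometer, treat the ground as a plane perpendicular
to it, and intersect a ray through the tapped pixel with that plane.

    import numpy as np

    # Made-up phone intrinsics (focal length and principal point, in pixels).
    K = np.array([[1000.0,    0.0, 640.0],
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])

    def ground_plane_from_gravity(gravity_cam, camera_height=1.4):
        """Guess the ground: a plane perpendicular to gravity, sitting an
        assumed camera_height metres below the camera."""
        up = -gravity_cam / np.linalg.norm(gravity_cam)  # unit "up" in camera coords
        return up, camera_height                         # plane: up . X + height = 0

    def place_on_ground(pixel_uv, up, height):
        """Back-project the tapped pixel and intersect its ray with the plane."""
        ray = np.linalg.inv(K) @ np.array([pixel_uv[0], pixel_uv[1], 1.0])
        denom = up @ ray
        if denom >= -1e-6:
            return None              # tapped at or above the horizon
        depth = -height / denom
        return depth * ray           # 3D anchor point in the camera frame

    # Phone held upright: gravity is roughly +y in camera coordinates.
    up, h = ground_plane_from_gravity(np.array([0.0, 9.81, 0.0]))
    anchor = place_on_ground((640, 600), up, h)   # ~[0, 1.4, 5.8] metres

Once the anchor exists, the tracked registration points give you a camera
pose each frame and the object is just re-projected through it.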

I noticed in the videos that nothing ever goes in front of the animated
objects, so they're probably not identifying objects or trying to figure out
object boundaries or anything like that. It's a simple, well-polished trick.

~~~
chillacy
That would be my guess as well. If you actually try the feature, it doesn't
map the planes in the image; it just assumes you're pointing it at flat
ground.

------
zamalek
Clickable:
[https://www.youtube.com/watch?v=K6x44v8prFA](https://www.youtube.com/watch?v=K6x44v8prFA)

------
hartman_willie
Tango is the "correct" way to do this and will give essentially the most
robust set of data back to whoever is looking. A SLAM image + depth based
hybrid approach will always give you more robust data.

But just plain old analysis of "parallax" (things closer move faster, etc.)
can quickly give you a workable understanding of the general geometry of a
space.
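
As a toy version of that (assuming the camera translated purely sideways
between the two frames, which is a big simplification): nearer points shift
more in the image, so depth falls straight out of the shift.

    def depth_from_parallax(x1_px, x2_px, baseline_m, focal_px):
        """Toy stereo-from-motion: the camera slid sideways by baseline_m
        between two frames and a tracked feature moved from column x1_px to
        column x2_px.  Nearer points shift more, so depth = f * B / disparity."""
        disparity = abs(x1_px - x2_px)
        if disparity == 0:
            return float("inf")      # no apparent motion: effectively at infinity
        return focal_px * baseline_m / disparity

    # Camera moved ~5 cm, feature shifted 40 px, focal length ~1000 px:
    # depth_from_parallax(500.0, 460.0, 0.05, 1000.0) -> 1.25 m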

The new MR headsets like Magic Leap and HoloLens likely use both types of
sensors (like the Kinect camera) to determine this stuff. But Snap just wants
the widest reach possible, and since the camera isn't mounted on your eyes,
the 'pixel stick' being off is not as much of an issue.

------
CaffeineSqurr
There's a lot you can do with just a single camera and IMU. PTAM is just one
example; there are other formulations, but it tends to work well without
using a ton of CPU/GPU resources.

[http://www.robots.ox.ac.uk/~gk/PTAM/](http://www.robots.ox.ac.uk/~gk/PTAM/)
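
For a flavour of why the IMU helps (this is just a one-axis complementary
filter, not what PTAM itself does): the gyro gives a smooth, high-rate but
drifting estimate, and the slower visual estimate pulls it back.

    def fuse_orientation(theta_prev, gyro_rate, dt, theta_vision=None, alpha=0.98):
        """One rotation axis only, for illustration.  Integrate the gyro for a
        smooth high-rate estimate; when a (noisier, lower-rate) visual estimate
        arrives, blend it in to cancel the gyro's slow drift."""
        theta = theta_prev + gyro_rate * dt          # dead-reckon from the gyro
        if theta_vision is not None:
            theta = alpha * theta + (1.0 - alpha) * theta_vision
        return theta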

~~~
Qworg
There are a few things stacked on top of each other to make it happen, but
this is a good start.

Combine this with GPS information and you can roughly "place it in the world"
and make sure it doesn't clip the ground plane. After that, it's just a matter
of pulling it up from your sparse 2D DB and dumping it into the scene for
other cameras. If they really care, there'll be some data included from the
placement camera to help localize it.
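
Pure speculation on that retrieval step, but it could be as simple as keying
placed objects by a coarse location cell (every name below is hypothetical):

    # Hypothetical sketch: anchors keyed by a coarse GPS cell, so another phone
    # nearby can pull down candidates and then re-localize them visually.
    anchors = {}

    def cell(lat, lon, precision=4):
        return (round(lat, precision), round(lon, precision))   # roughly 10 m cells

    def place(lat, lon, object_id, local_pose):
        anchors.setdefault(cell(lat, lon), []).append((object_id, local_pose))

    def nearby(lat, lon):
        return anchors.get(cell(lat, lon), [])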

------
popey456963
Imperial (a UK university) is currently making large leaps forward in
modelling a room or environment with just a single camera and little
computing power. Their basic methodology was:

\- Pick "unique" points currently in view

\- Track how they move as the camera moves

\- Combine this data with the accelerometer to get an accurate movement
reading of the phone.

\- You can get the depth of any point by comparing two images and knowing the
change in user position.

Simple algorithm, but their results were astonishingly good. Snap don't need
to model the entire room; they just need to work out where these points are
to keep the placed object appearing still.
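
That last step (depth from two images plus a known change in position) is
classic two-view triangulation; a minimal linear version, with made-up
intrinsics, looks like this:

    import numpy as np

    K = np.array([[1000.0,    0.0, 640.0],      # made-up phone intrinsics
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])

    def triangulate(pix1, pix2, R, t):
        """Linear (DLT) triangulation of one point tracked across two frames.
        R and t are the camera motion between the frames, which is where the
        accelerometer-plus-tracking estimate described above would come in."""
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t.reshape(3, 1)])
        A = np.vstack([pix1[0] * P1[2] - P1[0],
                       pix1[1] * P1[2] - P1[1],
                       pix2[0] * P2[2] - P2[0],
                       pix2[1] * P2[2] - P2[1]])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        return X[:3] / X[3]          # 3D point in the first camera's frame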

------
gmiller123456
There are several ways to do depth mapping with just a single camera.
Structure from Motion and Multiple View Geometry are a couple. Of course,
more sensors will usually provide better data. I imagine there are plenty of
situations where this app will behave poorly (e.g. low light, scenes with
little texture); they just don't show those in the demo.
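
For the curious, a bare-bones two-view structure-from-motion pass with OpenCV
looks roughly like this (a sketch only; a real system adds tracking,
keyframing, and scale handling on top):

    import cv2
    import numpy as np

    def two_view_sfm(img1, img2, K):
        """Match ORB features between two grayscale frames, estimate the
        essential matrix, recover the relative camera pose, and triangulate
        the matches into a sparse (scale-ambiguous) point cloud."""
        orb = cv2.ORB_create(2000)
        kp1, des1 = orb.detectAndCompute(img1, None)
        kp2, des2 = orb.detectAndCompute(img2, None)

        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
        pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

        E, _ = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                    prob=0.999, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)

        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
        return (pts4d[:3] / pts4d[3]).T     # N x 3 points, up to an unknown scale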

------
thomaspun
Besides analysing the pixels directly, maybe their algorithm takes into
account the vibration/movement of the live video to detect depth.

------
quotemstr
Close one eye. You can still estimate depth, can't you?

~~~
hackerews
Definitely.

But I guess I find it extremely surprising that Google would go through years
of working on Tango - software, hardware, developer community, etc. - if the
tech was available on existing phones.

[https://developers.google.com/tango/](https://developers.google.com/tango/)

~~~
khedoros1
If you've got typical levels of binocular depth perception:

Close one eye. Slowly poke something 2-3 feet away. You can do it, but you
might be off an inch or two with your distance estimation and be surprised
when you actually make contact. Open both eyes and do something similar with
another object. The added depth information should mean that you can tell
where something is to a greater precision.

Same thing with Tango. In both cases, you can extract some information about
the environment from a 2D image stream. Add in accelerometer, gyroscope, and
depth-sensing hardware, and you can correct some of the edge cases while
increasing precision even for the things that you could do with standard
hardware.

------
ap46
Semi-dense visual odometry, that's what you need to search for.

