
Software converts 360 video into 3D model for VR - sbarre
http://360rumors.com/2017/11/software-institut-pascal-converts-360-video-3d-model-vr.html
======
nkurz
_Nowadays, it is possible to create a 3D model of a space by taking hundreds
of photographs of a space and using computers to analyze those photos through
photogrammetry and computer vision methods. This is a laborious process, and
is especially time-consuming and resource-intensive for large scenes._

I was recently trying to understand the state of the art here, and was
surprised to learn that this is true: most 3D reconstruction fares better
with a small number of high-resolution still photos than with a large
number of lower-resolution "stills" extracted from video. I'm still somewhat
confused about why this is the case.

My limited understanding is that the main advantage of working with stills is
that they more commonly contain location tagging, while frames from video do
not. Compared to the base difficulty of building an accurate 3D model from
lossy 2D sources, figuring out the trajectory of the camera doesn't seem too
hard. Once one has that, wouldn't the video be just as easy to work with?

Trying to figure out this discrepancy, I got the impression that many
researchers may actually be tackling the harder problem of computing the
camera trajectory in real time, as might be needed for a self-driving
vehicle:
[http://webdiis.unizar.es/~raulmur/orbslam/](http://webdiis.unizar.es/~raulmur/orbslam/).
This indeed does seem harder, but still leaves me wondering why a multipass
approach wouldn't be feasible.

What am I missing? Why can't one do a first pass to calculate camera
trajectory, possibly a second pass to combine temporally adjacent frames for
greater resolution, then create a better model from the resulting wealth of
data? Alternatively stated, why does the quality of the 3D model seem to
depend more on the resolution and quality of the input 2D images rather than
on the number of these images?
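
To make the resolution intuition concrete, here's a back-of-the-envelope
two-view triangulation calculation (my own toy sketch with invented numbers,
not anything from the article):

```python
# Toy two-view depth-error model (hedged sketch, invented numbers).
# In stereo triangulation, depth z = f * B / d, where f is the focal
# length in pixels, B the baseline, and d the disparity. A fixed
# feature-matching error of match_err_px pixels then causes a depth
# error of roughly z^2 * match_err_px / (f * B), so halving the image
# resolution (halving f in pixels) doubles the depth error, regardless
# of how many extra frames you capture.

def depth_error_m(f_px, baseline_m, depth_m, match_err_px=0.5):
    """Approximate depth error caused by a small disparity error."""
    return depth_m ** 2 * match_err_px / (f_px * baseline_m)

# A 4K-ish still (f ~ 2000 px) vs. the same frame downscaled 4x (f ~ 500 px)
for f_px in (2000, 500):
    err = depth_error_m(f_px, baseline_m=0.5, depth_m=10.0)
    print(f"f = {f_px:4d} px  ->  depth error ~ {err:.2f} m at 10 m")
    # prints ~0.05 m for the still, ~0.20 m for the downscaled frame
```

If this simple model is roughly right, more low-resolution frames average
down noise only slowly, while each step of resolution lost costs depth
accuracy directly.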

~~~
ndh2
I've been wondering about this as well. Why are they using so few images? To
me that is like reverse blinking: Instead of keeping your eyes open, and
blinking for a short amount of time every now and then, you're walking around
with your eyes closed, and open them only for a short time.

I believe the main problem is knowing which data to trust and which data to
discard. Reflections (water, shiny surfaces), moving objects (leaves in the
wind), over-exposure, and lens flares are already pretty hard to deal with.
But with low resolution data, you make it even more difficult because there's
more data to discard for being too inaccurate.
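
The "which data to trust" step is usually handled with RANSAC-style voting.
A deliberately tiny illustration of the idea (my own sketch, fitting a 2D
line instead of the fundamental matrices real pipelines use):

```python
# Hedged toy sketch: reject untrustworthy observations (reflections,
# moving leaves) by repeatedly fitting a model to a minimal random
# sample and keeping the fit that the most observations agree with.
import random

def ransac_line(points, iters=200, tol=0.1, seed=0):
    """Fit a 2D line robustly; return the observations consistent with it."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip vertical pairs in this toy version
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

pts = [(x / 10, 2 * x / 10 + 1) for x in range(20)]  # true line y = 2x + 1
pts += [(3.0, 9.0), (5.0, -4.0)]                     # "reflection" outliers
kept = ransac_line(pts)
print(f"kept {len(kept)} of {len(pts)} observations")
```

Noisier, lower-resolution matches widen the tolerance band needed, which is
one way to see why low-quality frames make the trust decision harder.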

~~~
namlem
Actually, that's not how our eyes work at all. In fact, the "reverse blinking"
you describe is much closer to the truth. The retina has a very small region
of high acuity that darts around your visual field, essentially taking
snapshots that your brain stitches together. A lot of what we perceive as
movement is actually our brains predicting trajectories.

~~~
ghusbands
No, a lot of what we perceive as movement is our eyes seeing movement. Motion
is perceived across the whole of our field of vision, not just by the fovea
(the "very small region of high acuity" you mentioned). Try not to spread
misinformation.

------
cwe
Reminds me of parts of Photosynth[1]; glad to see people are still working on
this kind of thing. I was very disappointed that Photosynth didn't really pan
out with building 3D models from the images; they went with just panoramas.

[1]
[https://en.wikipedia.org/wiki/Photosynth](https://en.wikipedia.org/wiki/Photosynth)

~~~
greggman
I guess it's not quite the same, but the majority of 3D models in Google
Maps' 3D mode are autogenerated.

------
duiker101
This is really nice, but PLEASE don't put an image of a YouTube video with the
play button at the beginning of the article just to post the video later... it
took me way too long to realize I was clicking on an image.

~~~
residude
I clicked 15 times before using inspect element and realizing there was no
link in it.

------
PeachPlum
Try Autodesk's Recap

[https://www.autodesk.com/products/recap/overview](https://www.autodesk.com/products/recap/overview)

~~~
sbarre
This seems to require some serious non-hobbyist hardware to generate the data
needed to build the models, but the results are more impressive than this demo
for sure!

~~~
PeachPlum
We've made models using a cell phone camera

~~~
sbarre
Interesting! I'd love to know more about that..

~~~
PeachPlum
I'll try and post some images tomorrow; my colleague has them.

------
mcoliver
Check out Reality Capture by Capturing Reality (yeah..really..they reversed
the name of the company and the software).
[https://www.capturingreality.com](https://www.capturingreality.com)

We have used this in conjunction with screen captures from Google Earth to
regenerate environments.

------
GoToRO
Very nice. It needs two 360 videos.

------
drcross
Things like this give hefty credence to simulation theory.

------
qume
This would have been news in 2005

~~~
sbarre
Do you have a source for something similar from back then?

Honest question because I keep up with this field and this seemed pretty novel
to me (at least the quasi-DIY aspect of it)

~~~
qume
Snavely released bundler 9 years ago:
[http://www.cs.cornell.edu/~snavely/bundler/](http://www.cs.cornell.edu/~snavely/bundler/)

There were a bunch of other papers around the time.

I was downvoted a bunch here, which is odd, as this really would have been
news in 2005, when it would have been considered state of the art.

Right now there is nothing here which wasn't published already a decade ago.

Disclosure: I've been working on structure from motion software for the last
decade.

~~~
nkurz
_Right now there is nothing here which wasn't published already a decade
ago._

I downvoted your initial comment, but because I thought it was unhelpful
rather than because I thought it was untrue. By contrast, I upvoted your more
recent comment that mentions your expertise and defends your view with a
useful link to decade old software.

Still, I'd guess that for many people outside your field, "would have been
news" is not the same as "published already a decade ago". I'd guess the
majority are interested in what's currently achievable using off-the-shelf
hardware and ready-to-run software, and aren't bothered that it may be weak in
theoretical advances. Alternatively phrased, people may consider the
performance and availability newsworthy even if the theory isn't cutting edge.

That said, I'm familiar with neither the state of the art nor the state of the
theory. Are you saying that you could strap the same consumer camera rig to
your head, take an unplanned stroll through a forest or city, and achieve the
same model quality by running the resulting video through Bundler? If so, you
would have a strong case that the parent article is accepting the hype of the
press release a little too easily.

~~~
qume
You nailed it. You could use pix4d, photoscan, bundler + pmvs, inpho,
areohawk, etc. many years ago to achieve exactly what this is showing, with
whatever cameras you happened to have at the time, strapped to whatever you
felt like strapping them to.

The author of the article did not do due diligence on the subject.

------
mh2292
This is nothing new: Google's "Cardboard Camera" app has been around for a few
years and does almost exactly the same thing.
[https://play.google.com/store/apps/details?id=com.google.vr....](https://play.google.com/store/apps/details?id=com.google.vr.cyclops&hl=en)

~~~
throwaway2016a
This looks like it is actually creating a 3D model. I think the app you
linked to just stitches together photos into something closer to Street View.
It's not actually 3D; it just looks 3D because you have enough photos to
cover the whole field of vision.

~~~
nocut12
They don't build a 3D model, but it is a real stereo image. There's a bit more
info in their docs: [https://developers.google.com/vr/concepts/cardboard-
camera-v...](https://developers.google.com/vr/concepts/cardboard-camera-vr-
photo-format#gimage)

~~~
DiThi
> it is a real stereo image

Until you look down. Or until you have a different eye distance than average.
Or you try to move a bit and there are things close enough to make you dizzy.

Cardboard didn't need any of those fancy things, of course. It was bad enough
with abysmal latency.

