
Software Can Recreate 3D Spaces from Random Internet Photos - elsewhen
https://www.vice.com/en_us/article/xg87ea/software-can-recreate-3d-spaces-from-random-internet-photos
======
philipkglass
Previously discussed a few days ago:

"NeRF in the Wild: reconstructing 3D scenes from internet photography"

[https://news.ycombinator.com/item?id=24071787](https://news.ycombinator.com/item?id=24071787)

------
CyberDildonics
This was actually done over 12 years ago with a feature-extraction method
called SIFT and approximate-nearest-neighbor kd-trees at the University of
British Columbia. It was turned into a product by Microsoft. There are YouTube
videos that show models of Notre Dame Cathedral recreated with photos from
Flickr.
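
The matching step that made those systems work pairs local descriptors across images using a kd-tree and Lowe's ratio test. A minimal sketch, using random stand-in vectors in place of real SIFT descriptors (a real pipeline would extract them with a library such as OpenCV):

```python
# Sketch of descriptor matching with an approximate-NN kd-tree and
# Lowe's ratio test. The 128-D vectors are random stand-ins for SIFT output.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
desc_a = rng.standard_normal((200, 128))                # descriptors from image A
desc_b = np.vstack([
    desc_a[:50] + 0.01 * rng.standard_normal((50, 128)),  # 50 true matches
    rng.standard_normal((150, 128)),                      # plus unrelated clutter
])

tree = cKDTree(desc_b)                 # kd-tree over image B's descriptors
dist, idx = tree.query(desc_a, k=2)    # two nearest neighbours per query

# Lowe's ratio test: keep a match only if the best neighbour is much
# closer than the second best; this filters out ambiguous matches.
good = dist[:, 0] < 0.7 * dist[:, 1]
matches = [(i, idx[i, 0]) for i in np.flatnonzero(good)]
print(f"{len(matches)} confident matches")
```

In high-dimensional descriptor space the distances between unrelated vectors concentrate, so the ratio test rejects almost all of the clutter while keeping the genuine correspondences.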

~~~
dcanelhas
"Building rome in a day" and the "Photo tourism" papers were seminal works but
you're not really making a serious argument if you are claiming them to be
equivalent prior art to this.

I get it, artificial neural nets make everything easy and boring...back in the
day when you had to solve difficult linear algebra problems by hand and tweak
descriptor matching thresholds for ages...those were the good old glory days
of computer vision. ;)

~~~
CyberDildonics
I think you have an exceptionally warped idea of how manual Photosynth was, or
maybe you aren't familiar with it at all.

[https://youtu.be/p16frKJLVi0?t=32](https://youtu.be/p16frKJLVi0?t=32)

The core idea is placing pixels in 3D space, in this case by figuring out the
transform of the projections of each image.
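
Concretely, the projection transform being recovered is the standard pinhole model: a rotation, a translation, and an intrinsics matrix map each world point to pixel coordinates, and structure-from-motion inverts that from many images. A minimal numpy sketch with illustrative values:

```python
# Pinhole projection sketch: world point -> pixel via intrinsics K,
# rotation R, and translation t. The numbers here are illustrative.
import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # fx, skew, cx
              [  0.0, 800.0, 240.0],   # fy, cy
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera axes aligned with the world
t = np.array([0.0, 0.0, 5.0])          # world origin 5 units in front

def project(X):
    """Project a 3-D world point to pixel coordinates."""
    x_cam = R @ X + t                  # world frame -> camera frame
    x_img = K @ x_cam                  # camera frame -> homogeneous pixels
    return x_img[:2] / x_img[2]        # perspective divide

print(project(np.array([0.0, 0.0, 0.0])))  # world origin -> image centre
```

Structure-from-motion estimates R, t, and the 3-D points for every photo simultaneously, which is the "getting that information in the first place" step.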

This reconstructs a field by taking into account the direction each color is
coming from, but the difficult part is getting that information in the first
place, which Photosynth already did. This paper seems to be organizing that
data into a network while claiming that the entire capability and result is
new.

~~~
teraflop
The NeRF-W paper is definitely not claiming that the entire capability is
novel. The authors even cite the original Photo Tourism paper in the "Related
Work" section. The introduction explains what makes their approach different:

> The Neural Radiance Fields (NeRF) approach [21] implicitly models the
> radiance field and density of a scene within the weights of a neural
> network. Direct volume rendering is then used to synthesize new views,
> demonstrating a heretofore unprecedented level of fidelity on a range of
> challenging scenes.

Reconstructing camera positions and sparse point clouds is not really the
"difficult part"; it's been a fairly well-understood problem for more than a
decade, as you point out. Generating high-quality images without obtrusive
artifacts is a much harder problem, and Photosynth didn't even try to solve
it.
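
The direct volume rendering mentioned in the quote above is what produces those artifact-free images: densities and colors sampled along each camera ray (from querying the network, in a real NeRF) are alpha-composited front to back. A minimal sketch of that quadrature rule with hand-picked sample values:

```python
# Sketch of NeRF-style volume rendering along one ray:
#   C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
#   T_i = prod_{j<i} exp(-sigma_j * delta_j)
# In a real NeRF, sigmas and colors come from the trained network.
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Alpha-composite density/color samples into a single RGB value."""
    alphas = 1.0 - np.exp(-sigmas * deltas)    # opacity of each segment
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # T_i
    weights = trans * alphas                   # contribution of each sample
    return weights @ colors                    # (N,) @ (N, 3) -> RGB

# Example: a nearly transparent blue sample in front of a dense red one.
sigmas = np.array([0.1, 50.0])
colors = np.array([[0.0, 0.0, 1.0],    # blue, low density
                   [1.0, 0.0, 0.0]])   # red, high density
deltas = np.array([0.5, 0.5])
print(render_ray(sigmas, colors, deltas))
```

Because occlusion, transparency, and view-dependent color all fall out of this one compositing rule, the renders degrade softly instead of showing the seams and popping that projecting photos onto a point cloud produces.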

~~~
CyberDildonics
> Generating high-quality images without obtrusive artifacts is a much harder
> problem, and Photosynth didn't even try to solve it.

That's not really true; there are lots of ways to blend the
points/pixels/images together from different viewpoints, including handling
changing illumination. This looks more like a refinement in the
compressed/volumetric representation than an alternative to generating
geometry and having view-dependent textures or some other technique. You can
see in the video that some things, like the water, shift very softly as the
view changes, and some things end up rendered with more transparency. The
results look good, but it might be more accurate to say that the article is
way off in its claims rather than the paper, although the paper doesn't cite
Photosynth as far as I could see.

~~~
teraflop
> That's not really true, there are lots of ways to blend the
> points/pixels/images together from different viewpoints

Of course there are, but (as far as I recall) Photosynth didn't use any of
them; it only projected the original images on top of a point cloud. In any
case, that's kind of my point; the rendering step is where the interesting
research has been happening, not the camera pose reconstruction process which
is trivially easy in comparison.

> although the paper fails to mention photosynth as a citation as far as I
> could see.

Photosynth was Microsoft's implementation of the same technology that was
described in the "Photo Tourism" paper from 2006, which the NeRF-W paper does
cite. Naturally, it makes more sense for other researchers' work to refer to
the original academic literature rather than the commercialized product.

------
6510
When I first experienced a picture of a street turning into a 3D environment,
I thought it was wonderful, and that we would gradually learn to travel
further into the painting, with each cm taking more effort than the cm before.
Eventually we would be able to walk (be it virtually) from something closely
matching reality (or the imagination of the artist) to a collectively
generated, fully imaginary environment, without knowing where one ends and the
other begins.

------
ricardo81
This is the cool kind of thing a behemoth like Google can, and should, please
the world with. Unfortunately it's come at the expense of user privacy and
competition.

Nonetheless I admire (some of) the new places they take us.

------
netsharc
I've been making cross-eyed 3D pictures (what they are:
[https://www.youtube.com/watch?v=TfBHJEIvAc4](https://www.youtube.com/watch?v=TfBHJEIvAc4)
(not my video)). It's surprisingly effective; I can see how far landscapes
actually extend and how distant objects are...

