Software Can Recreate 3D Spaces from Random Internet Photos (vice.com)
37 points by elsewhen on Aug 10, 2020 | 11 comments



Previously discussed a few days ago:

"NeRF in the Wild: reconstructing 3D scenes from internet photography"

https://news.ycombinator.com/item?id=24071787


This was actually done over 12 years ago with a feature extraction method called SIFT and approximate nearest-neighbor kd-trees at the University of British Columbia. It was turned into a product by Microsoft. There are YouTube videos that show models of Notre Dame Cathedral recreated with photos from Flickr.
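For reference, the matching step that comment describes (SIFT keypoints plus approximate nearest-neighbor kd-trees) is now a few library calls. A minimal sketch using OpenCV's implementation of the same general technique, not the original UBC/Microsoft code; img1.jpg and img2.jpg are placeholder filenames:

    import cv2

    # Two overlapping photos of the same scene (placeholder filenames).
    img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect SIFT keypoints and descriptors in each photo.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Approximate nearest-neighbor matching with kd-trees (FLANN).
    flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))
    matches = flann.knnMatch(des1, des2, k=2)

    # Lowe's ratio test keeps only distinctive correspondences.
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    print(f"{len(good)} putative correspondences")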


Same rough use case, but not the same output. The SIFT-based stereo approach builds an explicit 3D model, while this learns a radiance field.

Meaning that this deals properly with things like reflection and transparency. It is also robust to the transient occlusions that appear randomly in the training data (people in the photographs).

The huge downside is that you have to train a neural network every time you want a model: that is very slow, resource-intensive and fragile.
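To make the "learns a field" point concrete: a NeRF-style model maps a 3D point and viewing direction to a color and a density, and a pixel is rendered by alpha-compositing samples along the camera ray (the direct volume rendering the NeRF paper describes). A minimal sketch with illustrative names, not the paper's actual code:

    import numpy as np

    def render_ray(field, origin, direction, near=2.0, far=6.0, n_samples=64):
        # `field(points, direction)` stands in for the trained network: it
        # returns an (n, 3) array of RGB colors and an (n,) array of densities.
        t = np.linspace(near, far, n_samples)       # sample depths along the ray
        points = origin + t[:, None] * direction    # (n_samples, 3) positions
        rgb, sigma = field(points, direction)

        delta = np.diff(t, append=t[-1] + 1e10)     # spacing between samples
        alpha = 1.0 - np.exp(-sigma * delta)        # opacity of each segment
        trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # transmittance
        weights = alpha * trans                     # contribution of each sample
        return (weights[:, None] * rgb).sum(axis=0) # composited pixel color

Because semi-transparent and view-dependent effects fall out of the compositing itself, there is no hard surface that has to explain reflections or glass.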


"Building rome in a day" and the "Photo tourism" papers were seminal works but you're not really making a serious argument if you are claiming them to be equivalent prior art to this.

I get it, artificial neural nets make everything easy and boring...back in the day when you had to solve difficult linear algebra problems by hand and tweak descriptor matching thresholds for ages...those were the good old glory days of computer vision. ;)


I think you have an exceptionally warped idea of how manual Photosynth was, or maybe you aren't familiar with it at all.

https://youtu.be/p16frKJLVi0?t=32

The core idea is placing pixels in 3D space, in this case by figuring out the projection transform of each image.

This reconstructs a field by taking into account the direction each color is coming from, but the difficult part is getting that information in the first place, which Photosynth already did. This paper seems to be organizing that data into a network while claiming that the entire capability and result are new.
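For what it's worth, "the direction each color is coming from" falls out directly once a photo's intrinsics and pose are known, which is what that reconstruction stage recovers. A rough sketch of turning a pixel into a world-space viewing ray, with invented variable names:

    import numpy as np

    def pixel_to_ray(u, v, K, R, t):
        # K: 3x3 intrinsics; R, t: world-to-camera rotation and translation,
        # all assumed already estimated (e.g. by structure from motion).
        d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # direction in camera frame
        d_world = R.T @ d_cam                             # rotate into world frame
        origin = -R.T @ t                                 # camera center in world frame
        return origin, d_world / np.linalg.norm(d_world)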


The NeRF-W paper is definitely not claiming that the entire capability is novel. The authors even cite the original Photo Tourism paper in the "Related Work" section. The introduction explains what makes their approach different:

> The Neural Radiance Fields (NeRF) approach [21] implicitly models the radiance field and density of a scene within the weights of a neural network. Direct volume rendering is then used to synthesize new views, demonstrating a heretofore unprecedented level of fidelity on a range of challenging scenes.

Reconstructing camera positions and sparse point clouds is not really the "difficult part"; it's been a fairly well-understood problem for more than a decade, as you point out. Generating high-quality images without obtrusive artifacts is a much harder problem, and Photosynth didn't even try to solve it.
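As a point of reference, the camera-pose half of the problem is standard enough that a two-view version fits in a few library calls. A hedged sketch using OpenCV; pts1/pts2 (matched pixel coordinates as N x 2 float arrays) and the intrinsics K are assumed to come from an earlier matching step:

    import cv2
    import numpy as np

    def two_view_reconstruction(pts1, pts2, K):
        # Estimate relative camera pose from point correspondences.
        E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
        _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

        # Triangulate a sparse point cloud from the two camera poses.
        P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
        P2 = K @ np.hstack([R, t])
        X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4 x N homogeneous
        return R, t, (X_h[:3] / X_h[3]).T                    # N x 3 points

Getting from that sparse reconstruction to artifact-free novel views is the part the rendering research is actually about.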


> Generating high-quality images without obtrusive artifacts is a much harder problem, and Photosynth didn't even try to solve it.

That's not really true; there are lots of ways to blend the points/pixels/images together from different viewpoints, including handling illumination changes. This looks more like a refinement in the compression/volumetric representation rather than generating geometry and using view-dependent textures or some other technique. You can see in the video that some things, like the water, shift as the view changes in a very soft way, and some things end up represented with more transparency. The results look good, but it might be more accurate to say that the article is way off in its claims rather than the paper, although the paper fails to mention Photosynth as a citation as far as I could see.
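For context on "lots of ways to blend": a common image-based-rendering baseline weights each source photo's reprojected color by how close its viewing direction is to the novel view. A toy sketch of that idea with invented names, not anything from Photosynth or the paper:

    import numpy as np

    def blend_views(colors, source_dirs, novel_dir, sharpness=8.0):
        # colors: (n_views, 3) colors each source photo reprojects to this pixel.
        # source_dirs / novel_dir: unit viewing directions of sources / novel view.
        cos = source_dirs @ novel_dir            # angular similarity per view
        w = np.exp(sharpness * (cos - 1.0))      # peaked at the matching direction
        w /= w.sum()
        return (w[:, None] * colors).sum(axis=0)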


> That's not really true, there are lots of ways to blend the points/pixels/images together from different viewpoints

Of course there are, but (as far as I recall) Photosynth didn't use any of them; it only projected the original images on top of a point cloud. In any case, that's kind of my point: the rendering step is where the interesting research has been happening, not the camera pose reconstruction process, which is trivially easy in comparison.

> although the paper fails to mention Photosynth as a citation as far as I could see.

Photosynth was Microsoft's implementation of the same technology that was described in the "Photo Tourism" paper from 2006, which the NeRF-W paper does cite. Naturally, it makes more sense for other researchers' work to refer to the original academic literature rather than the commercialized product.


When I first experienced a picture of a street turning into a 3D environment, I thought it was wonderful, and that we would gradually learn to travel further into the painting, with each cm taking more effort than the cm before. Eventually we would be able to (if only virtually) walk our way from something closely matching reality (or the imagination of the artist) to a collectively generated, fully imaginary environment, without knowing where one ends and the other begins.


This is the kind of cool thing a behemoth like Google can and should please the world with. Unfortunately, it has come at the expense of user privacy and competition.

Nonetheless I admire (some of) the new places they take us.


I've been making cross-eyed 3D pictures (what that is: https://www.youtube.com/watch?v=TfBHJEIvAc4, not my video). It's surprisingly effective; I can see how far landscapes actually go and how far away distant objects are...
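If anyone wants to try it: a cross-eyed pair is just two photos taken a few centimeters apart, placed side by side with the right-eye image on the left. A quick sketch with Pillow; left.jpg and right.jpg are placeholder filenames:

    from PIL import Image

    # Two photos of the same scene taken a few cm apart horizontally.
    left = Image.open("left.jpg")
    right = Image.open("right.jpg")

    # For cross-eyed viewing, the RIGHT-eye image goes on the LEFT side.
    pair = Image.new("RGB", (left.width + right.width, max(left.height, right.height)))
    pair.paste(right, (0, 0))
    pair.paste(left, (right.width, 0))
    pair.save("crosseye_pair.jpg")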



