Real world location virtually recreated to scale in minutes [video] (nwn.blogs.com)
106 points by Kroeler 63 days ago | 58 comments

Related question that I've had for a while. I have 25 years worth of family pictures, many of them taken at my childhood home. The house is no longer in the family, and I desperately want a photorealistic 3D model.

I don't have a walkthrough video, let alone different videos; and I don't believe reconstruction techniques like these work on such sparse inputs.

What's my best approach here? I have partial floorplans and I made a reasonable SketchUp model (I counted bricks in the pictures to get accurate measurements), but I'm nowhere near my goal: a complete, photorealistic 3D model I can load in Unreal or Unity to make a VR walkthrough (I can only imagine how that would blow my mum's mind!)

I've thought of outsourcing this, e.g. find a talented game environment artist or architectural modeller and pay them to do this, but I fear the result will be "off". I dream of the perfect algorithm that will do this.

Any ideas?

I don't think there is any algorithm that can do that, and there won't be one any time soon.

I spent several weeks last year recreating my backyard in VR, aiming at 1:1 mapping, using photogrammetry for capture and a standalone mobile headset that allows walking around untethered (Lenovo Mirage Solo). Even with photogrammetry, the result wasn't accurate enough to place objects at exactly the correct position, so that I could confidently walk around without fear of hitting a tree that is 20 cm further to the side in VR than in reality. It took a lot of back and forth to get the alignment right. By the time I was done, the tomatoes had grown so much that the VR environment was already wrong.
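For anyone fighting the same alignment problem: one way to cut down the back and forth might be to measure a few landmarks both in the scan and in the real yard, then fit a rigid transform between them. This is only a minimal sketch, assuming a flat 2D ground plane and hand-measured landmark pairs; the function names and the example coordinates are made up for illustration.

```python
import math

def fit_rigid_2d(model_pts, world_pts):
    """Least-squares rotation + translation that maps model_pts onto world_pts.

    Both arguments are equal-length lists of (x, y) ground-plane points.
    Returns (theta_radians, tx, ty).
    """
    n = len(model_pts)
    mcx = sum(p[0] for p in model_pts) / n
    mcy = sum(p[1] for p in model_pts) / n
    wcx = sum(p[0] for p in world_pts) / n
    wcy = sum(p[1] for p in world_pts) / n

    # Accumulate dot and cross products of the centred point pairs.
    dot = 0.0
    cross = 0.0
    for (mx, my), (wx, wy) in zip(model_pts, world_pts):
        ax, ay = mx - mcx, my - mcy
        bx, by = wx - wcx, wy - wcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # The angle that best rotates the model points onto the world points.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Translation that moves the rotated model centroid onto the world centroid.
    tx = wcx - (c * mcx - s * mcy)
    ty = wcy - (s * mcx + c * mcy)
    return theta, tx, ty

def apply_rigid_2d(theta, tx, ty, pt):
    """Apply the fitted transform to a single (x, y) point."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = pt
    return (c * x - s * y + tx, s * x + c * y + ty)

# Hypothetical landmarks: three tree trunks in the scan vs. tape-measured positions.
scan = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
yard = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, tx, ty = fit_rigid_2d(scan, yard)
```

The leftover distance between `apply_rigid_2d(...)` and each measured landmark gives a rough idea of whether the scan is merely misplaced (small residuals) or actually distorted (large residuals that no rigid fit can fix).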

Outsourcing reconstruction of lost/old places is a great idea. I found that even just stereo panoramas of old apartments seen in VR (so without positional tracking) trigger memories in a way that simple photos can't.

It's definitely possible, if the photos are suitable. There was a burst of interest over a decade ago specifically around reconstructing a 3D model of Notre Dame from public tourist photos, so that might be something to look into. Here's an example:


I am wondering what has happened (or is currently happening) to this project: https://www.youtube.com/watch?v=Ur1Z72_LyTM

OMG, that's incredible. I wanted to do something similar for my project: whenever you stepped into a location where a picture was taken, the picture would fade in, superimposed on the VR scene.

That's a fascinating project. If you think the SketchUp model is pretty good for the dimensions, and you have a lot of photos, isn't the next task the mapping?

Take the images that show the most structure, and mark points on the picture and map to points on the model. Time consuming but if you chip away at it....

Or move about in the model, and position a 2D image and the camera so they align, then project/paint the image.

I'm just making stuff up.
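The "mark points on the picture and map to points on the model" idea can be made concrete with a pinhole camera: project the model points through a guessed camera pose and measure how far they land from the pixels you clicked. This is a rough sketch only, assuming a yaw-only camera (no tilt or roll), y as the up axis, and a known focal length in pixels; all names and numbers are illustrative.

```python
import math

def project(point, cam_pos, yaw, focal_px, cx, cy):
    """Project a 3D world point to pixel coordinates (u, v).

    Assumptions: y is up, the camera looks along +z when yaw is 0,
    and the camera has no tilt or roll. Returns None if the point
    is behind the camera.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    xc = c * dx - s * dz   # component along the camera's right axis
    zc = s * dx + c * dz   # component along the camera's forward axis
    if zc <= 0:
        return None
    u = cx + focal_px * xc / zc
    v = cy - focal_px * dy / zc  # image v grows downward, world y grows upward
    return (u, v)

def mean_reprojection_error(pairs, cam_pos, yaw, focal_px, cx, cy):
    """Average pixel distance between projected model points and clicked pixels.

    pairs is a list of (point3d, (u, v)) correspondences.
    """
    total = 0.0
    for point, (u_obs, v_obs) in pairs:
        uv = project(point, cam_pos, yaw, focal_px, cx, cy)
        if uv is None:
            return math.inf
        total += math.hypot(uv[0] - u_obs, uv[1] - v_obs)
    return total / len(pairs)

# Hypothetical correspondences: model corners and where they appear in a 1920x1080 photo.
pairs = [((0.0, 1.5, 5.0), (960.0, 540.0)),
         ((1.0, 2.5, 5.0), (1160.0, 340.0))]
good = mean_reprojection_error(pairs, (0.0, 1.5, 0.0), 0.0, 1000.0, 960.0, 540.0)
bad = mean_reprojection_error(pairs, (0.0, 1.5, 0.0), 0.2, 1000.0, 960.0, 540.0)
```

Nudging the camera position and yaw until the error stops shrinking is the manual version of what pose-estimation solvers do automatically; once the error is small, the photo can be painted onto the model from that pose.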

I’m interested in this too! If you do outsource maybe we could join forces and offer it as a service. It seems like there would be some good demand and folks would pay a lot for nostalgia (eg rosebud)

Also if this was a reasonably popular service you’d end up with (1000s?) of input photos with their mapped recreated environments. Perhaps that could serve as the training data for an algorithm.

Have you looked at Meshroom? If you have enough photos, maybe you can select a subset that is similar enough to get the algorithm working.

It's a hot research subject combining deep learning and SLAM: estimating the camera locations and having a neural network hallucinate the view from a specific position. There will probably be some turn-key solution within a few years.

But for the virtual visit you may not need to solve the full hard problem of reconstructing the 3D point cloud. If you have the position and orientation of the pictures (which you can set manually), you can have a rendering like in Street View or those point-and-click games, where you jump to the closest picture in the right orientation. It will also have the merit of not suppressing the foreground of the pictures, which in a certain sense is the heart of the memory lane.
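The "jump to the closest picture in the right orientation" part is simple enough to sketch. This assumes each photo has been hand-tagged with a floorplan position and a viewing direction; the `Photo` fields, the 45-degree cutoff, and the sample data are all made up for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Photo:
    name: str
    x: float        # floorplan position in metres
    y: float
    yaw_deg: float  # direction the camera was facing

def angle_diff_deg(a, b):
    """Smallest absolute difference between two headings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def pick_photo(photos, x, y, yaw_deg, max_angle_deg=45.0):
    """Return the nearest photo that faces roughly the same way as the viewer.

    Photos whose heading differs by more than max_angle_deg are ignored,
    so the viewer never jumps to a picture looking the wrong way.
    Returns None when no photo qualifies.
    """
    candidates = [p for p in photos
                  if angle_diff_deg(p.yaw_deg, yaw_deg) <= max_angle_deg]
    if not candidates:
        return None
    return min(candidates, key=lambda p: math.hypot(p.x - x, p.y - y))

# Hypothetical hand-placed photos.
photos = [
    Photo("kitchen_door", 0.0, 0.0, 0.0),
    Photo("kitchen_window", 0.5, 0.0, 180.0),
    Photo("hallway", 4.0, 0.0, 10.0),
]
chosen = pick_photo(photos, 0.4, 0.0, 350.0)
```

Here `kitchen_window` is physically closest to the viewer but faces the opposite way, so `kitchen_door` is chosen instead. Filtering on orientation first and distance second is one possible design; a weighted score of both would also work.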

I remember a Y Combinator "batched" startup doing about the same. https://sendreality.com/

It took a bit longer to scan the place, but the quality looked better. Let's not get fooled by speed: without a good 3D copy of the place you are... screwed. Because our eyes can't stand gross mistakes.

That's us. https://sendreality.com/318-main-st-a3d/

The biggest challenge about 3D mapping like this isn't about the tech itself, but about actually making it useful for end users. These demos are cool, but the important thing is that you need to build out the part of the software that makes it useful/valuable for consumer/business applications. Otherwise, you should be working on the tech in a research lab vs. a venture-funded business.

This is a trap that even we've admittedly fallen into at times.

That's a great point, and an interesting example link for two reasons:

1) Real estate seems like a large market for this technology.

2) The specific space in the link looks like it would photograph well, but the 3D version reminds me that it has low ceilings and a lot of areas with limited natural lighting. i.e. it does a worse job than photographs for the purpose — selling the unit — because you can find the unflattering perspectives.

(Side: the red dot is very confusing as regards browser pointer-lock, and the whole thing seemed very awkward to navigate with a trackpad until I discovered that WASD worked.)

The incentives of realtors are not aligned with those of the buyer, which manifests in the cognitive dissonance you've just experienced.

On a different note — if possible, I'd love to chat about your experience with the UI. Do you have an email I can reach out to?

That's awesome! Could you elaborate more on the difficulties of making the model actually valuable? Is it merely the aforementioned problem that even a good scan won't cut it for us picky humans? Or did you run into other issues as well?

Wow. Great. Up until now I knew only such things like [1]. This is what a company I worked for some years ago used (and is still using).

What you are showing is definitely in another league and could be used for an industry like [2] (not that I like the ecological aspect of such travel).

[1] https://diginetmedia.de/virtualreality/360-vr-tour

[2] https://www.tuicruises.com/mein-schiff-1

> 3D mapping like this isn't about the tech itself, but about actually making it useful for end users.

Spot on. Body tracking has a similar usability challenge with consumers. I'm finding myself focused more on the E2E experience and workflow integration than the 3D pose estimation itself.

VR ought to be a big application for that, no? A lot of games just leave your torso out of it completely, but some have to include it and it's disorienting to have it drag along like a ragdoll hanging off of your floating head.

Works fine when you're standing up, but when you lean sideways and see your body acting like you took a step to the side and are still standing up straight, that's a little unnerving.

There's another company called Fantasmo (https://www.fantasmo.io) in the same space too.

They're based in LA & have called their approach CPS - Camera Positioning System.

this doesn't look like it's 3d, looks like it's a list of positionally linked 360° photos.

We're experimenting with the value proposition of a DIY, smartphone-based approach, independently of the 3D representation. What you're seeing is us taking the (temporary) approach that's allowed us to iterate a lot faster on the non-technical side of things.

There have been lots of programs to do that. This one isn't that good. Here's an open source one.[1] And another one.[2]

Doing a room is the same photogrammetry problem, but inside-out. Autodesk had that 10 years ago. The current product for that is ReCap.[3] You can work from drone imagery if necessary.

[1] https://www.youtube.com/watch?v=R0PDCp0QF1o

[2] https://www.youtube.com/watch?v=1D0EhSi-vvc

[3] https://www.autodesk.com/products/recap/overview

> Here's an open source one.[1] And another one.[2]

I think both of these links refer to Meshroom/AliceVision. Very nice input, though.

It makes for a cool demo, but all the applications that could benefit from this need much higher quality textures and better dimensional accuracy. Also there are certain cases where this fails: reflective or partially transparent surfaces, lack of texture, etc.

Object detection can 'normalise' the shape of your couch from a noisy input, and the upscaling algorithms we've seen recently can clean up the textures. It's only an aggressive short-term R&D effort away from being terrifying.

I can already see deep learning applied to normal maps of the various (+coloured) surfaces at home.

You could then deep-filter the inputs and get a higher-resolution version of the same.

If you wanna do VR games in your house, this is a very useful first step, to map your house. The game engine can then draw things exactly where they need to be based on your environment.

This is nascent technology. Just wait for this to become ubiquitous.

There are so many fantastic uses for this technology. I'm excited about 3D capturing environments and never needing to film on set again. Imagine having a database of locations at your disposal.

I'm working on a PoC (8000 m², outdoor, historic train station building complex). In case you're interested, I'm still searching for a small dataset (5~10m long segment of a building's outdoor wall, including ground and ornaments/nearby objects) to tune the photogrammetry pipeline on.

If you have access to a global shutter camera with 4k or more and a mode with no chroma subsampling and intra-only encoding (frame rate doesn't matter, it just takes longer), get in touch for a preview/demo.

I'm interested and have a 12 MP global shutter camera, complete control over chroma, not sure what you mean by intra-only encoding though?

Basically only I-frames, no P or B frames. Chroma isn't even the issue. How can we communicate further? Do you have an email I could send you details to?
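If it helps, one way to check whether a clip really is intra-only is to list its frame types with ffprobe and confirm that every frame is an I-frame. A minimal sketch, assuming FFmpeg's `ffprobe` is installed and on the PATH; the helper names are my own, and `check_file` is only a convenience wrapper.

```python
import subprocess

def ffprobe_frame_types_cmd(path):
    """Build an ffprobe command that prints one picture type (I/P/B) per frame."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "frame=pict_type", "-of", "csv=p=0", path]

def parse_frame_types(ffprobe_output):
    """Turn ffprobe's line-per-frame output into a list like ['I', 'P', 'B']."""
    types = []
    for line in ffprobe_output.splitlines():
        token = line.strip().strip(",")  # tolerate a trailing CSV comma
        if token:
            types.append(token)
    return types

def is_intra_only(frame_types):
    """True only if there is at least one frame and every frame is an I-frame."""
    return bool(frame_types) and all(t == "I" for t in frame_types)

def check_file(path):
    """Run ffprobe on a video file and report whether it is intra-only."""
    result = subprocess.run(ffprobe_frame_types_cmd(path),
                            capture_output=True, text=True, check=True)
    return is_intra_only(parse_frame_types(result.stdout))
```

Calling `check_file("clip.mov")` on a sample from the camera (file name hypothetical) would return `False` as soon as a single P or B frame shows up.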


This would be pretty handy for indoor mapping. Imagine a crowd-sourced 3D OSM for malls, public buildings, office plans, etc.

That would be massively useful.

Serious question. What are some examples you're thinking about of "massively useful" applications of this?

I think it's a really cute tech demo, but I'm struggling to find obvious uses that'd make the world a better place or generate huge piles of cash.

Fair point. For me, 'massively' would be a significant amount of time/confusion saved.

Whenever I want to go catch a bus at an overseas metro station, try to find a shop in an unfamiliar mall, or even try to find where a client's desk is in an unfamiliar building, that would be fantastic.

The potential to generate a snapshot of a space indoors would be super handy. Maybe even for places without an address you could look at a timestamped point cloud like that and see which building to go to.

Democratizing capturing 3d space in a simple/user-friendly and cheap way would open up really interesting doors.

Streetview type imagery/navigation solves that problem though, right? There's no need for the depth map/3D reconstruction going on here? Just a "walk through" with a 360-degree camera (or multiple overlapping cameras) capturing 2D lets you find the shop or the desk.

Potentially, but I am sure you'd find it useful to zoom around in 3D before going into streetview and taking a closer look, since being able to look down from above is a useful perspective. At least I can find streetview disorienting at times.

Maybe it just comes down to personal preference. But having a finer scale of 3d, another level in google maps for example, seems like a logical next step.

I can think of something massively useful for technology like this when it gets a little (lot) better, and considering using many cameras such as from robots - a real time map of the world. You would know where everybody is and where they have been. I admit I don't know what best to do with this. A government certainly would want it. At a minimum it would be much more valuable for advertising than your internet history.

So you've hit on a few "definitely makes the world a worse place" ideas...

Also, I don't think any of your examples require the 3D reconstruction/calculations, and they'd possibly be made _worse_ by them. There's no benefit for surveillance (that I can see) from reconstructing the 3D geometry. Your (arguably evil) applications just require high enough resolution (in 2D) to run accurate-as-available face recognition.

I agree this is scary. I should also say this is about more than just spying on people, and some of the uses, probably the less impactful ones, are not evil. There is a lot you could do with a real time model of the world. Like I said, I haven't thought of the best uses for it.

That's where I ended up - this is a fascinating solution looking for a problem.

I love the idea. I'm not convinced there's "massive useful use cases" for it.

Oh I forgot to add that this would be amazing for disaster relief and Search and Rescue.

Imagine sending in a little robot for rescue that also maps out in 3d the interior of a collapsed building!

Fortnite. In your bathroom.

I expect Apple is working on this type of technology for their next iPhone. What better use case for a back facing depth camera (yes, I understand this demo was performed with a standard camera, but it would likely be better with a depth camera). The implications for their maps, home furnishing, AR, Android differentiator... all very interesting, and Apple will likely do it right. Want a high resolution capture of a vase in your living room? Just walk closer. Claim a space and mark it private? Sure. An option to turn an elaborate house into a basic ceiling/walls/floors for others, maybe.

the baseline photogrammetry software is already 'mostly good' if you're scanning something featureful & matte like a wall of graffiti

where it chokes is specular reflections, curves, 'wobbly bits' (leaves on trees), lots of duplicate features (leaves on trees), and things in parallel planes (leaves on trees in front of a brick wall)

these are all solvable with better scene models, object models and feature stats

looking forward to the next gen of structure from motion tech
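to illustrate the duplicate-features failure: feature matchers commonly apply a nearest-neighbour ratio test, which throws away a match when the second-best candidate is almost as good as the best. a toy sketch, assuming descriptors are plain tuples of floats compared by Euclidean distance; the 0.75 threshold and the sample descriptors are illustrative only.

```python
import math

def ratio_test_matches(desc_a, desc_b, max_ratio=0.75):
    """Match descriptors from image A to image B, dropping ambiguous matches.

    A match (i, j) is kept only when the best candidate in B is clearly
    closer than the second-best one (best < max_ratio * second_best).
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # need two candidates to judge ambiguity
        (best, j), (second, _) = dists[0], dists[1]
        if best < max_ratio * second:
            matches.append((i, j))
    return matches

# Descriptor 0 is distinctive (think: a unique bit of graffiti).
# Descriptor 1 has two near-identical candidates in B (think: leaves on a tree).
desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (9.0, 9.0), (5.0, 5.1), (5.1, 5.0)]
kept = ratio_test_matches(desc_a, desc_b)
```

only the distinctive feature survives; the leaf-like one is rejected because the matcher cannot tell its two candidates apart. with a whole tree of those, very few matches remain, which is roughly why foliage reconstructs so badly.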

This has been possible for a good while; the most common algorithm is Parallel Tracking and Mapping (PTAM). No stereo cameras are needed, but there are also implementations that make use of depth sensors.

What interests me is an open implementation of Street View: with drones plus crowdsourcing, we'd be free of requiring Google services for street visualization. We already have open street maps; we could have open street view too.

For a course project, I implemented DTAM - which does dense mapping and tracking, without needing any stereo vision, in real time (on a mid-range GPU). It's really amazing.

Hadn't heard of DTAM yet, but found a video and am very impressed! What GPU did you use? Do you think it'd be feasible on mobile GPUs?

Unfortunately, I don't know CUDA yet and didn't implement a GPU accelerated version. I can't comment on its feasibility on mobile GPUs but the paper authors reported real-time performance on a GTX-480 + i7 quad-core CPU system.

That sounds interesting.

Any links to further explain what DTAM is/how it works?

DTAM: Dense tracking and mapping in real-time

> DTAM is a system for real-time camera tracking and reconstruction which relies not on feature extraction but dense, every pixel methods. As a single hand-held RGB camera flies over a static scene, we estimate detailed textured depth maps at selected keyframes to produce a surface patchwork with millions of vertices. We use the hundreds of images available in a video stream to improve the quality of a simple photometric data term, and minimise a global spatially regularised energy functional in a novel non-convex optimisation framework. Interleaved, we track the camera's 6DOF motion precisely by frame-rate whole image alignment against the entire dense model. Our algorithms are highly parallelisable throughout and DTAM achieves real-time performance using current commodity GPU hardware. We demonstrate that a dense model permits superior tracking performance under rapid motion compared to a state of the art method using features; and also show the additional usefulness of the dense model for real-time scene interaction in a physics-enhanced augmented reality application.


Awesome, thanks!

I found this post summarizes the concepts step by step and is really helpful. http://ahumaninmachinesworld.blogspot.com/2015/07/dtam-dense...

I remember seeing some of this type of stuff with the Xbox Kinect. It's cool that the form factor has moved to mobile (slowly swinging the Kinect around the room was not ideal). I'm not following this space but from the video it still looks like capturing good 3D models is a long way away. Being in a VR world that looks like that would be terrifying.

Yes I do agree.

Here is Jack Black demoing similar NSA technology in 1998 https://www.youtube.com/watch?v=3EwZQddc3kY

The possibilities for those with disabilities are remarkable. You wouldn't have to map everything to be useful, just common routes.

This is cool. I have a related question. Can I make 3D models of my 5-year-old kid and 3D-print robots that walk/talk like them? Does current technology even suffice for this? Or do I need to create 3D models now and then wait a few years for technology to catch up?

You could do it, but it would take forever. Here are some starting points for the different technologies involved:

Here's how to make the walking robot: https://asimo.honda.com/ http://users.umiacs.umd.edu/~fer/cmsc828/classes/cse390-05-0...

Here's how to make it look like them: https://www.creativebloq.com/features/deepfake-examples https://www.youtube.com/watch?v=_9qs6JudXJg

Here's how to make it talk like them: https://github.com/CorentinJ/Real-Time-Voice-Cloning

Thanks. Yes, based on the list you posted, it will take forever. Guess I have to wait for someone to put a package together.

The deepfakes can only do a 2-d model, not a 3-d model yet.

Btw, why are people down voting this?
