I don't have a walkthrough video, let alone several different videos, and I don't believe reconstruction techniques like these work on such sparse inputs.
What's my best approach here? I have partial floorplans, and I made a reasonable SketchUp model (I counted bricks in the pictures to get accurate measurements), but I'm nowhere near my goal, which would be a complete, photorealistic 3D model I can load into Unreal or Unity and turn into a VR walkthrough (I can only imagine how that would blow my mum's mind!)
I've thought of outsourcing this, e.g. find a talented game environment artist or architectural modeller and pay them to do this, but I fear the result will be "off". I dream of the perfect algorithm that will do this.
I spent several weeks last year recreating my backyard in VR, aiming for a 1:1 mapping, using photogrammetry for capture and a standalone mobile headset that allows walking around untethered (Lenovo Mirage Solo). Even with photogrammetry it wasn't accurate enough for objects to be placed exactly at the correct position, so I couldn't confidently walk around without fear of hitting a tree that sits 20 cm more to the side in VR than in reality. It took many rounds of back and forth to get the alignment right. By the time I was done, the tomatoes had grown so much the VR environment was already wrong.
Outsourcing reconstruction of lost/old places is a great idea. I found that even just stereo panoramas of old apartments seen in VR (so without positional tracking) trigger memories in a way that simple photos can't.
Take the images that show the most structure, mark points on each picture, and map them to points on the model. Time consuming, but if you chip away at it...
Or move about in the model, position the 2D image and camera so they align, and project/paint the image onto the geometry.
I'm just making stuff up.
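For what it's worth, the point-marking idea is essentially the classic PnP problem, and OpenCV will solve it for you: given a handful of hand-picked 2D-3D correspondences, you recover the photo's camera pose in model coordinates. A minimal sketch, where the correspondences, image size, and intrinsics are all made-up placeholders:

```python
import numpy as np
import cv2

# Hypothetical hand-picked correspondences: pixel positions in one photo,
# and the matching points on the SketchUp model (metres, model coordinates).
image_points = np.array([[412, 310], [980, 295], [455, 820], [1010, 790]], dtype=np.float64)
model_points = np.array([[0.0, 0.0, 2.6], [3.2, 0.0, 2.6],
                         [0.0, 0.0, 0.0], [3.2, 0.0, 0.0]], dtype=np.float64)

# Rough pinhole intrinsics guessed from the photo size; a real run would
# calibrate the camera or read the focal length from EXIF.
w, h = 1600, 1200
f = 0.9 * w
K = np.array([[f, 0, w / 2], [0, f, h / 2], [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(model_points, image_points, K, None)
if ok:
    R, _ = cv2.Rodrigues(rvec)
    camera_pos = -R.T @ tvec  # camera position in model coordinates
    print("estimated camera position:", camera_pos.ravel())
```

With the pose recovered, the second idea (projecting/painting the photo onto the geometry) is a standard projective texture mapping step.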
Also, if this were a reasonably popular service, you'd end up with (thousands of?) input photos with their mapped, recreated environments. Perhaps that could serve as the training data for an algorithm.
It's a hot research subject combining deep learning and SLAM: estimate the camera locations and have a neural network hallucinate the view from a specific position. There will probably be some turn-key solution within a few years.
But for the virtual visit you may not want to solve the full hard problem of reconstructing the 3D point cloud. If you have the position and orientation of the pictures (which you can set manually), you can have a rendering like Street View or those point-and-click games, where you jump to the closest picture in the right orientation. It will also have the merit of not suppressing the foreground of the pictures, which in a certain sense is the heart of the memory lane.
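A minimal sketch of that jump-to-the-closest-picture logic, assuming each photo has been manually given a position and a view direction (all filenames and numbers below are made up):

```python
import numpy as np

# Each photo gets a hand-placed position (x, y, z, metres) and a unit
# view-direction vector in the same coordinate frame as the viewer.
photos = [
    {"file": "hallway_01.jpg", "pos": np.array([0.0, 0.0, 1.6]), "dir": np.array([1.0, 0.0, 0.0])},
    {"file": "kitchen_02.jpg", "pos": np.array([4.5, 1.0, 1.6]), "dir": np.array([0.0, 1.0, 0.0])},
]

def best_photo(viewer_pos, viewer_dir, photos, max_dist=3.0):
    """Pick the nearby photo whose viewing direction best matches the viewer's."""
    best, best_score = None, -np.inf
    for p in photos:
        dist = np.linalg.norm(p["pos"] - viewer_pos)
        if dist > max_dist:
            continue
        facing = np.dot(p["dir"], viewer_dir)  # 1.0 = looking the same way
        score = facing - 0.5 * dist            # trade off angle vs. distance
        if score > best_score:
            best, best_score = p, score
    return best
```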
The time to scan the place was a bit longer, but the quality looked better.
Let's not get fooled by speed: without a good 3D copy of the place you are... screwed.
Because our eyes can't stand gross mistakes.
The biggest challenge about 3D mapping like this isn't about the tech itself, but about actually making it useful for end users. These demos are cool, but the important thing is that you need to build out the part of the software that makes it useful/valuable for consumer/business applications. Otherwise, you should be working on the tech in a research lab vs. a venture-funded business.
This is a trap that even we've admittedly fallen into at times.
1) Real estate seems like a large market for this technology.
2) The specific space in the link looks like it would photograph well, but the 3D version reminds me that it has low ceilings and a lot of areas with limited natural lighting. i.e. it does a worse job than photographs for the purpose — selling the unit — because you can find the unflattering perspectives.
(Side note: the red dot is very confusing with respect to browser pointer-lock, and the whole thing seemed very awkward to navigate with a trackpad until I discovered that WASD worked.)
On a different note — if possible, I'd love to chat about your experience with the UI. Do you have an email I can reach out to?
What you are showing is definitely in another league and could be used for an industry like travel (not that I like the ecological aspect of such travel).
Spot on. Body tracking has a similar usability challenge with consumers. I'm finding myself focused more on the E2E experience and workflow integration than the 3D pose estimation itself.
Works fine when you're standing up, but when you lean sideways and see your body acting like you took a step to the side and are still standing up straight, that's a little unnerving.
They're based in LA & have called their approach CPS - Camera Positioning System.
Doing a room is the same photogrammetry problem, but inside-out. Autodesk had that 10 years ago. The current product for that is ReCap. You can work from drone imagery if necessary.
I think both of these links refer to Meshroom/AliceVision. Very nice input, though.
You could then run the inputs through a deep-learning upscaling filter and get a higher-resolution version of the same.
There are so many fantastic uses for this technology. I'm excited about 3D capturing environments and never needing to film on set again. Imagine having a database of locations at your disposal.
If you have access to a global shutter camera with 4K or more and a mode with no chroma subsampling and intra-only encoding (frame rate doesn't matter, it just takes longer), get in touch for a preview/demo.
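For prospective testers, here's a hedged sketch of sanity-checking a clip against those requirements with ffprobe (assumed to be on PATH); the intra-only part is easiest to guarantee by shooting an all-intra codec such as ProRes or DNxHD in the first place, so the check below only looks at the codec name:

```python
import json
import subprocess

INTRA_ONLY_CODECS = {"prores", "dnxhd", "mjpeg"}  # always intra-coded

def check_clip(path):
    """Inspect the first video stream: >= 4K wide, 4:4:4 or RGB, intra-only codec."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", "-select_streams", "v:0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    s = json.loads(out)["streams"][0]
    print(f"{path}: {s['width']}x{s['height']}, {s.get('pix_fmt')}, {s['codec_name']}")
    print("  resolution ok:", int(s["width"]) >= 3840)
    print("  chroma ok:    ", s.get("pix_fmt") in ("yuv444p", "yuv444p10le", "rgb24", "gbrp"))
    print("  intra-only:   ", s["codec_name"] in INTRA_ONLY_CODECS)

check_clip("capture.mov")  # hypothetical filename
```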
That would be massively useful.
I think it's a really cute tech demo, but I'm struggling to find obvious uses that'd make the world a better place or generate huge piles of cash.
Whenever I want to catch a bus at an overseas metro station, find a shop in an unfamiliar mall, or even find where a client's desk is in an unfamiliar building, that would be fantastic.
The potential to generate a snapshot of an indoor space would be super handy. Maybe even for places without an address you could look at a timestamped point cloud like that and see which building to go to.
Democratizing 3D capture of spaces in a simple, user-friendly, and cheap way would open up really interesting doors.
Maybe it just comes down to personal preference. But having a finer scale of 3D, another zoom level in Google Maps for example, seems like a logical next step.
Also, I don't think any of your examples require the 3D reconstruction/calculations, and they'd possibly be made _worse_ by them. There's no benefit for surveillance (that I can see) from reconstructing the 3D geometry. Your (arguably evil) applications just require high enough resolution (in 2D) to run the most accurate face recognition available.
I love the idea. I'm just not convinced there are "massively useful" use cases for it.
Imagine sending in a little robot for rescue that also maps out in 3d the interior of a collapsed building!
Where it chokes is on specular reflections, curves, 'wobbly bits' (leaves on trees), lots of duplicate features (leaves on trees), and things in parallel planes (leaves on trees in front of a brick wall).
These are all solvable with better scene models, object models, and feature statistics.
Looking forward to the next gen of structure-from-motion tech.
What interests me is an open implementation of Street View: drones plus crowdsourcing and we'd be free of requiring Google services for street visualization. We already have OpenStreetMap; we could have an open Street View too.
Any links to further explain what DTAM is/how it works?
> DTAM is a system for real-time camera tracking and reconstruction which relies not on feature extraction but dense, every pixel methods. As a single hand-held RGB camera flies over a static scene, we estimate detailed textured depth maps at selected keyframes to produce a surface patchwork with millions of vertices. We use the hundreds of images available in a video stream to improve the quality of a simple photometric data term, and minimise a global spatially regularised energy functional in a novel non-convex optimisation framework. Interleaved, we track the camera's 6DOF motion precisely by frame-rate whole image alignment against the entire dense model. Our algorithms are highly parallelisable throughout and DTAM achieves real-time performance using current commodity GPU hardware. We demonstrate that a dense model permits superior tracking performance under rapid motion compared to a state of the art method using features; and also show the additional usefulness of the dense model for real-time scene interaction in a physics-enhanced augmented reality application.
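That's the abstract of Newcombe, Lovegrove and Davison's ICCV 2011 paper. The data term boils down to: for each pixel of a keyframe, sweep a set of candidate inverse depths, reproject into a nearby frame, and score photometric agreement; the resulting cost volume is then spatially regularised. A toy numpy sketch of just the cost-volume part (every name below is mine, not the paper's):

```python
import numpy as np

# Toy sketch of DTAM's photometric data term: nearest-neighbour sampling,
# a single auxiliary frame, and no regulariser.
def photometric_cost_volume(ref_img, other_img, K, R, t, inv_depths):
    """ref_img, other_img: float grayscale HxW arrays.
    K: 3x3 intrinsics; R, t: reference-to-other camera transform.
    inv_depths: candidate inverse depths (all > 0)."""
    inv_depths = np.asarray(inv_depths, dtype=np.float64)
    h, w = ref_img.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pix                             # back-projected rays
    cost = np.full((len(inv_depths), h, w), np.inf)
    for i, rho in enumerate(inv_depths):
        proj = K @ (R @ (rays / rho) + t.reshape(3, 1))       # reproject at depth 1/rho
        z = proj[2].reshape(h, w)
        with np.errstate(divide="ignore", invalid="ignore"):
            u = (proj[0] / proj[2]).reshape(h, w)
            v = (proj[1] / proj[2]).reshape(h, w)
            valid = (z > 0) & (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
        ui = np.clip(np.nan_to_num(u), 0, w - 1).astype(int)
        vi = np.clip(np.nan_to_num(v), 0, h - 1).astype(int)
        diff = np.abs(ref_img - other_img[vi, ui])            # photometric error
        cost[i][valid] = diff[valid]
    # DTAM minimises this volume plus a spatial smoothness term; the raw
    # per-pixel argmin is the unregularised depth estimate.
    depth = 1.0 / inv_depths[np.argmin(cost, axis=0)]
    return cost, depth
```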
Here's how to make the walking robot:
Here's how to make it look like them:
Here's how to make it talk like them:
The deepfakes can only do a 2D model, not a 3D model yet.
Btw, why are people downvoting this?