
DeepTAM: Deep Tracking and Mapping - lainon
https://lmb.informatik.uni-freiburg.de/people/zhouh/deeptam/
======
angry_octet
I started reading this thinking "oh not another deep learning / SLAM paper
with no code or data". Ironically they reference and build on the concepts
from DTAM:

[https://github.com/magican/OpenDTAM.git](https://github.com/magican/OpenDTAM.git)

If you don't publish code, your paper will have less impact. We want to test
your algorithm on different data, not painstakingly reimplement it from
incomplete descriptions.

There are literally thousands of papers in the field every year; you want to
be accessible, not baroque.

~~~
phiresky
The same authors did publish code for another recent paper [1], so it doesn't
look like they are averse to publishing their code in general.

[1]: [https://lmb.informatik.uni-freiburg.de/people/ummenhof/depth...](https://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet/)

~~~
angry_octet
That is positive. It just needs to become the norm.

------
blt
SLAM is one of the few computer vision problems not conquered by deep learning
yet... guess that's changing.

~~~
namibj
How about reconstructing a high-resolution mesh from pictures/video? The best
approach I know of so far is to use patch-based reconstruction of depth maps
and feed them to floating-scale surface reconstruction, or a similar
patch-size-aware Poisson-style mesh generator. Is there any code you know of
that handles the depth-map reconstruction using deep learning? Feeding it
precise camera parameters/undistorted views, as well as precise locations of
these views with some known-matching points (artifacts of previous processing
steps), is not a problem. Even pre-selecting only somewhat well-matching views
is not a problem, as that is likely better done out-of-core anyway, due to the
size of datasets where nice things become possible (e.g., capturing a small
part of a neighborhood, say 100 * 100 m and 5 stories, or equivalent surface
area, in sufficient precision to max out the resolution of likely all VR
headsets you can get for this year's Christmas).
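
To make concrete what I mean by the classical pipeline, here is a rough sketch
that back-projects per-view depth maps into a point cloud and meshes it with
Open3D's screened Poisson reconstruction, standing in for FSSR; the file
names, intrinsics, and parameters are placeholders I made up, not anything
from the paper:

```python
# Sketch: back-project per-view depth maps into a point cloud and mesh it
# with (screened) Poisson reconstruction. Open3D is used as a stand-in for
# a patch-size-aware method like FSSR; paths/intrinsics are placeholders.
import numpy as np
import open3d as o3d

def depth_to_points(depth, K, cam_to_world):
    """Back-project a depth map (H x W, metres) through intrinsics K and a
    4x4 camera-to-world pose into world-space 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    valid = z > 0
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=0)[:, valid]
    return (cam_to_world @ pts_cam)[:3].T

# Placeholder inputs: depth maps + poses from the earlier processing steps.
K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
views = [("depth_000.npy", np.eye(4)), ("depth_001.npy", np.eye(4))]

all_pts = np.concatenate(
    [depth_to_points(np.load(f), K, pose) for f, pose in views])

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(all_pts)
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Screened Poisson reconstruction; 'depth' controls the octree resolution.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=10)
o3d.io.write_triangle_mesh("mesh.ply", mesh)
```

The meshing step is the easy part; the depth maps themselves are what I'm
asking about.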

From what I've found, the outlook for reconstructing such depth maps is bleak
as far as speed goes, with the alternative being a drop in density/resolution
too low to be useful (capturing is cheap, but not free). If there is some
deep-learning magic, I'd like to forgo having to split parts of these
algorithms off to e.g. an FPGA (the parts that sort and arrange the patches
that should be hit with brute-force number crunching). From what I understand
about them, it's near-impossible to decide these arrangements efficiently on a
CPU or even a GPU, given how sparse the math is and how dense/wide the
branching is. I'm considering feeding lists of to-be-compared patches to a GPU
and getting the results back; the technique is somewhat similar to Dijkstra's
algorithm for deciding in what order to compare and what starting values to
use in the iterative optimization of depth and surface normals. That branching
currently takes 80% of CPU time without the actual number crunching even using
vectorization. Combined with GPU speed, I expect at least two orders of
magnitude of improvement, and hope for closer to three.
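
To illustrate the "ship the brute-force part to the GPU" idea, here is a
minimal sketch of scoring a batch of candidate patch pairs with zero-mean NCC
in PyTorch; the pair list is assumed to come from the CPU-side ordering logic,
and the sizes and data are made up:

```python
# Sketch: score a batch of candidate patch pairs with zero-mean normalized
# cross-correlation (ZNCC) on the GPU. The pair list is assumed to come from
# the CPU-side ordering logic; sizes and data here are placeholders.
import torch

def zncc(patches_a: torch.Tensor, patches_b: torch.Tensor) -> torch.Tensor:
    """patches_a, patches_b: (N, P, P) tensors of grayscale patches.
    Returns a (N,) tensor of ZNCC scores in [-1, 1]."""
    a = patches_a.flatten(1)
    b = patches_b.flatten(1)
    a = a - a.mean(dim=1, keepdim=True)
    b = b - b.mean(dim=1, keepdim=True)
    denom = a.norm(dim=1) * b.norm(dim=1) + 1e-8
    return (a * b).sum(dim=1) / denom

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder batch: 100k pairs of 11x11 patches.
a = torch.rand(100_000, 11, 11, device=device)
b = torch.rand(100_000, 11, 11, device=device)

scores = zncc(a, b)                            # one pass over the whole pair list
best = scores.argsort(descending=True)[:1000]  # e.g. keep the best matches
```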

~~~
blt
There are lots of papers that attempt to reconstruct depth maps from monocular
RGB images. It's not really my field, but here is a Google Scholar search for
recent papers that cite a seminal older paper on the topic:
[https://scholar.google.com/scholar?as_ylo=2017&hl=en&as_sdt=...](https://scholar.google.com/scholar?as_ylo=2017&hl=en&as_sdt=2005&sciodt=0,5&cites=4052327659070755452&scipsc=)
That should be a decent start.

If you know the depth of a few pixels in the image, e.g. from a sparse
keypoint-based SLAM / visual-inertial odometry system, then you can do better:
[https://arxiv.org/pdf/1709.07492.pdf](https://arxiv.org/pdf/1709.07492.pdf)

If you already have accurate camera positions, you can use something like
occupancy grid mapping or Poisson reconstruction to build the mesh.
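
For the known-poses case, here is a minimal sketch of the occupancy-grid idea
(occupied cells only; a real mapper would also ray-cast free space, and the
grid size, resolution, and intrinsics here are made up):

```python
# Sketch: mark occupied voxels in a world-aligned grid from a depth map and a
# known camera pose (occupied cells only; a full occupancy grid mapper would
# also update free space along each ray). All parameters are placeholders.
import numpy as np

RES = 0.05                                      # voxel edge length in metres
GRID = np.zeros((200, 200, 100), dtype=bool)    # 10 x 10 x 5 m volume
ORIGIN = np.array([-5.0, -5.0, 0.0])            # world coords of voxel (0,0,0)

def integrate(depth, K, cam_to_world):
    """Back-project one depth map through a known pose and mark hit voxels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    ok = z > 0
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts = (cam_to_world @ np.stack([x, y, z, np.ones_like(z)])[:, ok])[:3].T
    idx = np.floor((pts - ORIGIN) / RES).astype(int)
    inside = np.all((idx >= 0) & (idx < GRID.shape), axis=1)
    GRID[tuple(idx[inside].T)] = True
```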

