
Light-field videos: Part I - Findeton
https://roblesnotes.com/blog/light-fields-project-i/
======
joefourier
I built a functional light field video and photo camera* a few years back, and
the results were absolutely stunning in VR. That was before the LLFF paper was
released, so it relied on simple depth maps to generate a geometric proxy,
plus enough viewpoints to brute-force non-Lambertian surfaces. Still, it
allowed a completely photorealistic reproduction of most scenes with full six
degrees of freedom within the capture volume.

However, I've been surprised at how few commercial applications of light-field
technology there have been. Lytro unfortunately folded despite having raised
large amounts of capital, and the only company left is Raytrix, which uses
light fields mostly for single-camera depth-map generation. Google has
released some very nice tech demos, but that's about it.

*See here for a few test videos: [https://www.youtube.com/watch?v=6Buj8WWhGrA&list=PLzhX-LcIzx...](https://www.youtube.com/watch?v=6Buj8WWhGrA&list=PLzhX-LcIzxKFyD_E6wPt2sl32q1ZUO9lg&index=14)

------
Scaevolus
The author counts years from the founding of Rome (ab urbe condita, or AUC),
so subtract 753 to get the AD year: here, 2772 AUC - 753 = AD 2019.

Or: yes, this is a recent blog post.

~~~
mysterEFrank
obnoxious

~~~
emsy
I can see how the author thinks this is quirky, but in practice it's just a
mental obstacle that tripped me up while reading.

~~~
dTal
Yeah, and he really didn't need to mention the year five times in two
paragraphs either. You can tell he's looking for excuses to flex his new
quirk.

~~~
baking
If I wanted to write a blog post that I could be sure no one would ever finish
reading, I would do just this. (Or maybe just to screen out randoms from
Hacker News. Genius!)

------
idoby
I believe the author is referring to the year MMDCCLXXII. I have no idea where
this 2772 comes from.

------
isoprophlex
The author mentions the annoyance of having to pull 16 SD cards from his
camera array and extract the files one by one.

Maybe a Wi-Fi-enabled SD card is what's needed?

Edit: cool, there's even a hack that adds automatic FTP upload of new files
for a specific brand of Wi-Fi-enabled SD card:

[https://bitbucket.org/harbortronics/flashair-ftp-upload/src/master/](https://bitbucket.org/harbortronics/flashair-ftp-upload/src/master/)

~~~
dylan604
I've had this pain myself, back in 2015 or so when live-action VR exploded.
The GoPro was the camera of choice, being readily available and small. First
versions used 6 cameras (6 microSD cards), then 12 for 3D (6 L/R pairs), and
then Google released their Odyssey camera (a 16-camera rig). Even that wasn't
enough, as it left holes at the zenith and nadir, so we customized more
cameras to fill in the gaps, bringing the total to 22.

As painful as all of that was, there's no way I would have risked losing data
by using Wi-Fi-enabled cards. If you lose the image from one card, the entire
take is lost. And pushing that much data at the required throughput for 16-22
cameras is just not going to work out in the field.

~~~
joefourier
The most reliable and cost-effective method I found was an array of USB
cameras recording to a host PC with PCI-E expansion cards and an SSD array.
You avoid the hassle of file transfers, H.264 compression artefacts, and
frame-synchronisation issues.

~~~
blincoln
I'm really curious, because I tried something similar maybe 5-10 years ago
with a moderately powerful Linux system, and it couldn't support more than 2
USB cameras recording simultaneously.

What were the hardware specs on the PC like? How many cameras could you record
from simultaneously? Which OS were you using, and did you have to do anything
special to make it work?

~~~
joefourier
The PC was nothing special, just a middle-of-the-road 4000-series i5 running
plain Ubuntu. The compute requirements were practically nil, so the CPU was
actually overkill for the application.

The main considerations are having enough USB bandwidth (hence the PCI-E
expansion cards), dumping the USB camera data straight to the SSD (no
re-encoding; use a proper GStreamer pipeline or ffmpeg command), and ensuring
you can sustain the write speeds (via a RAID array). There was a bug in the
USB drivers that caused the cameras to reserve more bandwidth than they
actually used, limiting how many could be attached, but that was easy to fix.
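
A sketch of that kind of pipeline (device paths and filenames are
placeholders; the quirks=128 line is the usual workaround for the UVC
bandwidth over-reservation bug, which may or may not be the exact driver bug
we hit):

    # Reload the uvcvideo driver with the bandwidth quirk, so cameras
    # stop reserving more USB bandwidth than they actually use.
    sudo rmmod uvcvideo
    sudo modprobe uvcvideo quirks=128

    # Stream-copy one camera's MJPEG output straight to the SSD; no
    # re-encoding, so CPU load stays near zero. Repeat per /dev/videoN.
    ffmpeg -f v4l2 -input_format mjpeg -video_size 1920x1080 -framerate 30 \
           -i /dev/video0 -c:v copy /mnt/ssd/cam0.mkv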

My guess is that you probably re-encoded the stream, which would indeed
drastically limit the number of cameras if their resolution was high enough.

------
gibba999
This is really neat!

I am looking at the footage from the A77 on YouTube. The resolution doesn't
seem to be 4K; Apeman seems to do a lot of software upscaling. Their highest-
end model, the A100, claims 20 MP resolution, 4K video, and a Panasonic
MN34120 sensor.

Panasonic claims 16 MP resolution for that sensor -- 4 MP less than Apeman
claims -- and a maximum of 22 fps at 4K. The resolution/framerate tradeoff
isn't a continuous slider, so I think the best Apeman could do is grab 1080p
at 30 fps and upscale.

[https://industrial.panasonic.com/content/data/SC/ds/ds4/MN34...](https://industrial.panasonic.com/content/data/SC/ds/ds4/MN34120PA_E.pdf)

The A77 doesn't say what chip it uses, but it's a model down. And looking at
two YouTube videos, I'd say it's grabbing at best 720p and upscaling, likely
less, though most of the video has enough action that I could be wrong
(compression artifacts do odd things as well when scenes change quickly).

OP: can you post some video frames so we can see whether this setup works as
claimed? If it does, it'd be really neat to play with. I'd even be happy with
true 1080p.
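
Not a rigorous test, but a quick way to eyeball whether a frame was upscaled
(hypothetical filename frame.png): look at the radially averaged magnitude
spectrum. Native footage keeps energy out towards the Nyquist frequency;
upscaled footage falls off well before it.

    import cv2
    import numpy as np

    frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
    spec = np.abs(np.fft.fftshift(np.fft.fft2(frame)))

    # Radially average the spectrum: mean magnitude per spatial frequency.
    h, w = spec.shape
    y, x = np.indices(spec.shape)
    r = np.hypot(y - h / 2, x - w / 2).astype(int)
    profile = np.bincount(r.ravel(), weights=spec.ravel()) / np.bincount(r.ravel())

    # Print a coarse falloff curve from DC out to the image's Nyquist limit.
    for radius in range(0, min(h, w) // 2, 100):
        print(radius, profile[radius])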

I'd consider this setup more for photos; light-field video will be hard
without frame-synchronized cameras. But once photos work, one can think about
how to invest in video next.

~~~
Findeton
Author here. Synchronization is an issue I forgot to mention; I guess that's
because it was the first problem I addressed. Right now I'm shooting indoors,
so I just switch the lights on/off at some point in the video and then
synchronize the videos by averaging the frames of maximum luminance change,
with a simple Python script. It's not a generic solution and the accuracy
isn't very high, but it's simple and works well enough for a first test.
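
It boils down to something like this (a simplified sketch, not my exact
script; filenames are placeholders):

    import cv2
    import numpy as np

    def flash_frame(path):
        # Index of the frame with the largest jump in mean luminance,
        # i.e. the moment the lights were switched on or off.
        cap = cv2.VideoCapture(path)
        means = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            means.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).mean())
        cap.release()
        return int(np.argmax(np.abs(np.diff(means))))

    # Frame offset of each camera relative to the first one.
    flashes = [flash_frame("cam%02d.mp4" % i) for i in range(16)]
    offsets = [f - flashes[0] for f in flashes]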

The A77's resolution at 30 fps is 3840x2160 pixels, but yes, it's probably
upscaled. Either way, I'm not too concerned about resolution at this point.

------
TaylorAlexander
This is great! Computational photography is a really neat field. I've been
slowly learning computer vision for a different project, and I must have a lot
in common with the author of this post: many late nights doing video
conversion with the NVIDIA hardware decoders, photo registration with COLMAP,
and fiddling with cameras.

I recently got a spherical camera and I'm trying to use it for photogrammetry.
I also have an array of four 4K cameras with hardware-synchronized shutters
hooked up to an NVIDIA Jetson Xavier [1]. That system can record four 4K
streams at once to an SSD. I wonder how many 4K streams a Jetson Nano could
record, because then you could use 16 of these cameras [2] and four to eight
Jetson Nanos to build a camera system with all-hardware-synchronized shutters
that could easily record the data and export it over the network. It would
cost around $2500, though. These projects get expensive, and I keep thinking I
want a sponsor, but the slow pace of using whatever hardware I can buy is
probably fine for now.

I'm trying to do a complete photorealistic photogrammetry capture of hiking
trails, so I can run my robot in simulation on virtual hiking trails and train
real computer vision networks. Lately I've been wondering if there is a GAN in
my future...

The frustrating thing about my project is the sheer amount of computation
required. I don't really need a direct photogrammetry capture of a trail; an
approximation would be fine to some degree. But I capture something like ten
gigabytes of video, then process each frame to find keypoints, run matching
across all of those points, and so on (using COLMAP). This can take days to
process on my desktop.
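
For reference, the COLMAP CLI pipeline is roughly the following (paths are
placeholders). For ordered video frames, the sequential matcher avoids the
O(n^2) cost of exhaustive matching:

    # Detect keypoints and compute descriptors for every frame.
    colmap feature_extractor --database_path db.db --image_path frames/

    # Match only temporally nearby frames instead of all pairs.
    colmap sequential_matcher --database_path db.db

    # Incremental sparse reconstruction from the matches.
    mkdir -p sparse
    colmap mapper --database_path db.db --image_path frames/ \
        --output_path sparse/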

Meanwhile, there are neural networks that can compute depth from video in real
time, and I wonder what it would take to stitch sequential depth estimates
into one 3D model with RGB textures in a single continuous computation.
There's so much research to do!

By the way, I found the work in this paper [3] pretty fascinating: Facebook is
working on 6DoF video recording and playback, which is quite the challenge on
many levels!

[1] [https://reboot.love/t/new-cameras-on-rover/](https://reboot.love/t/new-cameras-on-rover/)

[2] [https://www.e-consystems.com/4k-usb-camera.asp](https://www.e-consystems.com/4k-usb-camera.asp)

[3] [https://research.fb.com/wp-content/uploads/2019/09/An-Integrated-6DoF-Video-Camera-and-System-Design.pdf](https://research.fb.com/wp-content/uploads/2019/09/An-Integrated-6DoF-Video-Camera-and-System-Design.pdf)

~~~
pretty_dumm_guy
> The frustrating thing about my project is the sheer amount of computation
> required. I don't really need a direct photogrammetry capture of a trail; an
> approximation would be fine to some degree. But I capture something like ten
> gigabytes of video, then process each frame to find keypoints, run matching
> across all of those points, and so on (using COLMAP). This can take days to
> process on my desktop.

If the hiking trails are accessible enough, you should have a look at SLAM
techniques. SLAM lets you build an approximate map (anywhere from rough to
smooth) of the environment you navigate your camera through. Colorizing this
map could be done by a GAN (might be an interesting side project).

I am adding some pointers below:

1. [https://www.doc.ic.ac.uk/~ajd/](https://www.doc.ic.ac.uk/~ajd/) - Prof.
Davison and his group's work is impressive in this area.

2. [https://vision.in.tum.de/research/vslam](https://vision.in.tum.de/research/vslam)
- Prof. Cremers' group has some SOTA algorithms in this area.

P.S.: You don't need a heavy setup for this; a single camera or a stereo pair
should do the job.

~~~
TaylorAlexander
Thank you very much! It has been a long time since I've looked at visual SLAM,
actually. That Omnidirectional LSD-SLAM looks really nice. Their code repo has
been untouched for six years, but this still makes me realize I need to use
vSLAM! I just found this recent work, which seems really useful.

[https://github.com/ivalab/gf_orb_slam2](https://github.com/ivalab/gf_orb_slam2)

I feel like vSLAM could be the first step in a post-processing pipeline that
would reduce a lot of the computational complexity of solving large maps.
Once I can easily make large maps, I can build simulated environments and use
them for training an agent.

Thanks again for the tips!

------
joshu
the classic:
[http://graphics.stanford.edu/papers/lfcamera/](http://graphics.stanford.edu/papers/lfcamera/)
(good lord can someone upload that video to youtube)

~~~
tobr
This one?
[https://www.youtube.com/watch?v=ciqhoFr9940](https://www.youtube.com/watch?v=ciqhoFr9940)

------
netsec_burn
There are few things the entire planet has standardized on, one of them being
time (with minor exceptions like DST). Regardless of your opinion of religion,
reverting to debating the epoch again would be a net negative. Could you
imagine if we had to debate time the way we debate metric vs. imperial units?
That disagreement has already caused several very expensive technical
failures.

