
Learning to See in the Dark - isp
http://web.engr.illinois.edu/~cchen156/SID.html
======
taneq
I'm always concerned when this type of deep learning image processing is
presented. The resulting images look nice but there's no guarantee that all
the extra detail visible in those images is genuine detail and not just
"believable" data filled in by the net. Maybe fine for happy snaps but it's
very important that users of the camera know that its output is just an
"artist's impression."

It brings to mind the Xerox copiers that helpfully "compressed" images by
deciding that 6s, 8s, and 9s looked similar enough to be used
interchangeably.
([http://www.bbc.co.uk/news/technology-23588202](http://www.bbc.co.uk/news/technology-23588202))

~~~
sdrothrock
> The resulting images look nice but there's no guarantee that all the extra
> detail visible in those images is genuine detail and not just "believable"
> data filled in by the net. Maybe fine for happy snaps but it's very
> important that users of the camera know that its output is just an "artist's
> impression."

I completely agree with this, and it's more and more dangerous as the
resulting images appear more and more realistic. On a related tangent, this
also showed up recently:
[http://fortune.com/2018/04/24/nvidia-artificial-intelligence...](http://fortune.com/2018/04/24/nvidia-artificial-intelligence-images/)

A lot of people I know -- intelligent people who are familiar with machine
learning and image manipulation -- were confused as to how this approach was
"recovering" data.

It's not recovering data at all; it's guessing and filling in blanks, but
doing so in such a realistic fashion that it exploits a cognitive blind
spot: the result is so convincing that you take it for the "real" image. I
feel like the same blind spot would be attacked with "seeing in the dark"
as well.

~~~
dahart
> On a related tangent, this also showed up recently: (NVIDIA’s inpainting). A
> lot of people I know -- intelligent people who are familiar with machine
> learning and image manipulation -- were confused as to how this approach was
> "recovering" data.

Realistic image inpainting & synthesis has been going on for decades, so
I'd guess the main confusion is due to reading the title of Fortune's
article, rather than the paper's title "Image Inpainting for Irregular
Holes Using Partial Convolutions". BTW, kudos to Fortune for actually
linking to the paper. Just from watching the video, it seemed pretty
obvious that this was inpainting drawing on arbitrary training data, so
maybe the confusion came from reading only the PR title and not diving any
deeper?

Here’s my favorite inpainting paper, partly because the author is a friend,
but also because it’s able to hallucinate written text, which most inpainting
algorithms since then haven’t been able to do. It’s not a neural network
though, and the training data comes from the single input image itself.
[http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung....](http://graphics.cs.cmu.edu/people/efros/research/EfrosLeung.html)
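
To make the contrast with NN-based methods concrete, here is a minimal,
hypothetical NumPy sketch of the Efros-Leung idea: each missing pixel is
filled by finding fully-known patches whose overlap with the pixel's
neighborhood matches well, then sampling the center of one of the
near-best matches at random. The Gaussian weighting, onion-skin fill
order, and boundary handling from the paper are omitted, and all names
here are illustrative.

    import numpy as np

    def fill_pixel(img, mask, y, x, win=11, eps=0.1, rng=np.random):
        """Fill the unknown pixel img[y, x] from a (win x win) patch whose
        known pixels match the neighborhood of (y, x).
        img: (H, W) grayscale floats; mask: True where pixels are known."""
        h = win // 2
        nbhd = img[y-h:y+h+1, x-h:x+h+1]
        known = mask[y-h:y+h+1, x-h:x+h+1].astype(float)
        H, W = img.shape
        dists, centers = [], []
        for py in range(h, H - h):
            for px in range(h, W - h):
                if not mask[py-h:py+h+1, px-h:px+h+1].all():
                    continue                  # sample only from known patches
                cand = img[py-h:py+h+1, px-h:px+h+1]
                # SSD over only the known pixels of the neighborhood
                d = (((cand - nbhd) ** 2) * known).sum() / known.sum()
                dists.append(d)
                centers.append(cand[h, h])
        dists = np.asarray(dists)
        # pick randomly among all matches within (1 + eps) of the best one
        pool = np.flatnonzero(dists <= dists.min() * (1 + eps))
        img[y, x] = centers[rng.choice(pool)]
        mask[y, x] = True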

> it’s more and more dangerous... I feel like the same blind spot would be
> attacked with “seeing in the dark” as well.

It’s possible, yes, but it does depend on what the authors did, how the
network was trained, whether they allow reconstruction from pure noise, etc..
I would agree that this paper title is a bit provocative, and suggests
assuming the output is realistic. The problem might be the title, and not the
technique.

While it is important to understand that NNs are hallucinating output from
training data, it's also a good idea to reflect on the history of analog &
digital photography & Photoshop, and recall that this slippery slope toward
fake realism has been warned about multiple times before. There are lots of
legitimate uses for inpainting (movies, ads). As someone who's worked in
film, I'm excited about the possibilities that NNs bring in terms of new
techniques and reduction of labor.

~~~
sandworm101
>> I’m excited about the possibilities that NNs bring in terms of new
techniques and reduction of labor.

Reduction of labour is nice. Elimination of labour is problematic. To
photoshop something, someone has to actually photoshop it. We don't set our
cameras to automatically photoshop things as we take them. The techniques
in the OP are dangerous because they could be employed in situations where
the photographer doesn't realize it: a camera/robot editing automatically,
fabricating a false reality in the eyes of humans. Imagine a camera used to
capture evidence of a crime. You don't want such a thing filling in details
on its own.

~~~
TeMPOraL
Isn't Google Photos / Google Camera App doing that, automatically or semi-
automatically? I recall they had a feature that could magically merge multiple
photos into one that has a combination of details from the source images.
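
(The simplest version of that kind of merge is plain averaging of aligned
burst frames, which reduces zero-mean noise by roughly sqrt(N); Google's
actual pipeline is far more elaborate. A toy sketch, with alignment
assumed to have been done already and the function name made up:)

    import numpy as np

    def merge_burst(frames):
        """frames: list of pre-aligned (H, W, 3) float arrays from a burst.
        Averaging N frames cuts zero-mean noise by about a factor of
        sqrt(N) while keeping detail that is consistent across frames."""
        return np.mean(np.stack(frames), axis=0)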

------
isp
Remarkable machine learning result for "producing astoundingly sharp
photos in very low light" (Cory Doctorow -
[https://boingboing.net/2018/05/09/enhance-enhance.html](https://boingboing.net/2018/05/09/enhance-enhance.html))

Demo example (one of many - drag the middle slider from left to right):
[http://web.engr.illinois.edu/~cchen156/SID/examples/16.html](http://web.engr.illinois.edu/~cchen156/SID/examples/16.html)

GitHub:
[https://github.com/cchen156/Learning-to-See-in-the-Dark](https://github.com/cchen156/Learning-to-See-in-the-Dark)

Paper: [https://arxiv.org/abs/1805.01934](https://arxiv.org/abs/1805.01934)

Video:
[https://www.youtube.com/watch?v=qWKUFK7MWvg](https://www.youtube.com/watch?v=qWKUFK7MWvg)

------
jjcm
I'd be very curious to see this applied not just to a single frame of
video, but to the video as a whole. My assumption is that it would create a
weird jitter in the parts of the image that have been recreated by the
neural net.

I think what would be almost even more interesting is an algorithm like
this specifically trained on video, which takes the previous and next
frames into account when recreating lost data.

~~~
ghgr
There's been work (in a related field) to "stabilize" the jitter in
consecutive frames. As you say, by taking into account the neighboring frames.

Relevant excerpt from [1]:

“If you just apply the algorithm frame by frame, you don’t get a coherent
video — you get flickering in the sequence,” says University of Freiburg
postdoc Alexey Dosovitskiy. “What we do is introduce additional constraints,
which make the video consistent.”

[1]
[https://blogs.nvidia.com/blog/2016/05/25/deep-learning-paint...](https://blogs.nvidia.com/blog/2016/05/25/deep-learning-paints-videos/)
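
Concretely, the "additional constraints" in that line of work boil down
to a temporal loss: penalize the current frame's output for deviating
from the previous frame's output warped along the optical flow. A hedged
NumPy sketch (the flow and occlusion mask are assumed to come from a
separate flow estimator; all names are illustrative):

    import numpy as np

    def warp(img, flow):
        """Backward-warp img by per-pixel flow (nearest-neighbor for
        brevity; bilinear sampling is typical in practice)."""
        H, W = img.shape[:2]
        ys, xs = np.mgrid[0:H, 0:W]
        sy = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
        sx = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
        return img[sy, sx]

    def temporal_loss(out_t, out_prev, flow, valid):
        """out_t, out_prev: (H, W, 3) outputs for frames t and t-1.
        flow: (H, W, 2) flow mapping frame t back to frame t-1.
        valid: (H, W) 1 where the flow is reliable, 0 at occlusions."""
        warped = warp(out_prev, flow)  # previous output aligned to frame t
        return float((valid[..., None] * (out_t - warped) ** 2).mean())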

------
pietz
I have a couple of questions if the authors are following along.

While I understand the choice of using a downsampled input with 4
channels, I'm wondering why you went with a downsampled output instead of
going directly to the original resolution, where the 3 color channels are
separate.
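
(For context on the 4-channel input: the Bayer mosaic records one color
per pixel, so it can be losslessly rearranged into a half-resolution,
4-channel array with one channel per color site. A sketch assuming an
RGGB layout:)

    import numpy as np

    def pack_bayer(raw):
        """raw: (H, W) Bayer mosaic, RGGB layout assumed.
        Returns (H/2, W/2, 4); the rearrangement is lossless,
        unlike demosaicing."""
        return np.stack([raw[0::2, 0::2],    # R
                         raw[0::2, 1::2],    # G1
                         raw[1::2, 0::2],    # G2
                         raw[1::2, 1::2]],   # B
                        axis=-1)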

Also, did you investigate "faking" the training data by taking a single
well-exposed image, darkening it using conventional methods, and using the
result as input to the workflow?

------
foobarrio
I wish they provided the RAW files. Looking at the "traditional pipeline"
photos, I am positive I could get a much better result just by spending
some time with Lightroom and coming up with a "super high ISO" preset.
Perhaps it would not match their new pipeline, but it would be better than
what they show for the "traditional pipeline".

------
jack_pp
This might work wonders for webcam video if it runs fast enough.

------
chwahoo
I love playing with my mirrorless camera and lenses, but I'm becoming more
and more convinced that it's a risky proposition "investing" in a bunch of
expensive camera gear (which traditionally holds its value better than most
gadgets) when computational methods will soon erode the advantages of
bigger sensors / faster glass.

------
eveningcoffee
This does not sound right. The source image must have had more information
than the example shows (perhaps in uncompressed raw data).

The book cover details simply are not there in the dark image (if you scale
up the brightness, there is only blocky noise).

So either this is not the right dark image, or their network dreamed it up.
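
(For anyone who wants to reproduce that check, a sketch using the rawpy
library; the filename, the x300 gain, and the 14-bit range are chosen
purely for illustration:)

    import numpy as np
    import rawpy  # reads camera RAW files

    raw = rawpy.imread('dark_frame.ARW')         # hypothetical input file
    img = raw.raw_image_visible.astype(np.float32)
    img -= np.mean(raw.black_level_per_channel)  # subtract sensor black level
    bright = np.clip(img * 300.0, 0, 2**14 - 1)  # naive linear amplification
    # Viewing `bright` shows mostly amplified read/shot noise -- the
    # "blocky noise" described above -- not recoverable scene detail.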

~~~
dr_zoidberg
The dataset is composed of D -> F sets (not pairs, because there are many
underexposed images per scene) mapping Dark to Fully-illuminated images.

Yes, the net is "dreaming" the details, based on what it learns from those
mappings. I'd say these nets are very specialised to the sensor, and maybe
even to the lens choice. Simply put, what they did is compress a full
processing pipeline into a deep net that consumes RAW files and spits out
natural-looking images.

------
OldSchoolJohnny
This isn't my field but I'm curious: are the results in the slider samples
novel images, or ones the network was trained on?

Could it just be really good at recreating the images it was trained on, or
is it generally doing this with novel images?

------
aylmao
I can see Apple and Google rushing to secure a deal to include this tech on
their cameras.

------
didibus
Doesn't look like their slider bar works in mobile Chrome.

------
anovikov
Astronomers spend entire careers trying to squeeze as much data as possible
out of low-light shots of the stars. Their math skills are superb, and
their budgets are almost unlimited. There is really very little to add to
what they already do.

------
jamesholden
The technology and those pics are interesting. Though the content of the
pictures is odd... mannequin heads and Metamucil... xD

