
Learning to See in the Dark (2018) - ksaxena
https://github.com/cchen156/Learning-to-See-in-the-Dark
======
f-
As a photographer, the comparison to "raw" results without color balance or
noise removal seems somewhat deceptive. The effects visible in the video seem
easy to quickly replicate with existing techniques, such as the "surface blur"
filter that averages out pixel values in areas with similar color.

This happens at the expense of detail in low-contrast areas, producing a
plastic-like appearance of human skin and hair, and making low-contrast text
unintelligible, which is why it's generally not done by default.

~~~
sdenton4
Your example strikes me as the kind of thing neural networks are much better
at than a fixed filter. You or I could easily identify regions of an image
where it's safe vs unsafe to do the surface averaging, and boundaries where we
wouldn't want to mix up the averages. (For example, averaging text should be
fine, so long as you don't cross the text boundaries.) A CNN should also be
able to learn to do this pretty easily.

~~~
BubRoss
What you are describing is a class of filters known as edge-preserving
filters. You can look at bilateral filters and guided filters for examples
that have been around for decades at this point.
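
For reference, a bilateral filter is a one-liner in OpenCV. A minimal sketch
(the file name and parameter values here are illustrative, not from this
thread):

    import cv2

    img = cv2.imread("noisy.jpg")  # hypothetical 8-bit BGR input
    # Each neighbor is weighted by spatial distance AND color similarity,
    # so averaging stops at strong edges instead of smearing across them.
    smooth = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    cv2.imwrite("smooth.jpg", smooth)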

~~~
sdenton4
So we can do a decent job with hand-designed filters... Why aren't they in use
for the problem the parent describes? Are they not good enough to deal with
small text boundaries?

A lot of hand-built filters (I see a lot of these in the audio space) have
many hand-tuned parameters, which work well in certain circumstances and less
well in others. One of the big advantages of NN systems is the ability to
adapt to context more dynamically. The NN filters can generally emulate the
hand-designed system and pick out weightings appropriate to the example.

~~~
BubRoss
This is effectively noise reduction, which bilateral and guided filters are
actually used for. They take the weights of their kernels from local pixels
and statistics. You can also look up other edge-preserving filters like BM3D
and non-local means.
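
Non-local means is also available in OpenCV; a minimal sketch (the parameters
shown are illustrative):

    import cv2

    img = cv2.imread("noisy.jpg")  # hypothetical input
    # Denoise each patch by averaging over similar patches found elsewhere
    # in the image, not just over immediate neighbors.
    out = cv2.fastNlMeansDenoisingColored(img, None, h=10, hColor=10,
                                          templateWindowSize=7,
                                          searchWindowSize=21)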

I don't know what you mean by hand-made filters, and I don't know why that's
the conclusion you jumped to.

------
y7r4m
Hi, I'm a developer at NexOptic[0], a company that was deeply inspired by this
paper when it was first published. We had a lot of early success when
attempting to replicate the results on our own, ended up running with it, and
extended it into our own product line under our ALIIS brand of AI-powered
solutions.

For those curious, our current approach differs in some very significant ways
from the authors' implementation. We perform our denoising and enhancement on
a raw Bayer -> raw Bayer basis, with a separate pipeline for tone mapping,
white balance, and HDR enhancement. We also explored a fair number of
different CNN architectures and concluded that a heavily mixed
multi-resolution layering solution produces superior results.
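
For readers unfamiliar with raw-domain processing: the usual trick (the
original paper does a variant of this) is to pack the Bayer mosaic into four
half-resolution color planes, so the network sees spatially aligned channels.
A minimal numpy sketch, assuming an RGGB pattern:

    import numpy as np

    def pack_rggb(raw):
        """Pack an HxW RGGB Bayer mosaic into an (H/2, W/2, 4) array."""
        return np.stack([raw[0::2, 0::2],   # R
                         raw[0::2, 1::2],   # G (on the red rows)
                         raw[1::2, 0::2],   # G (on the blue rows)
                         raw[1::2, 1::2]],  # B
                        axis=-1)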

As other commenters have pointed out, the most interesting part is really
coming to terms with the fact that, as war1025 put it, "The message has an
entropy limit, but the message isn't the whole dataset." It is incredible what
can be accomplished with even extraordinarily noisy information, as long as
one has an extremely "knowledge packed" prior.

If anyone has any questions about our research in this space, please feel free
to ask.

[0] [https://nexoptic.com/artificialintelligence/](https://nexoptic.com/artificialintelligence/)

~~~
randyrand
It would be really cool if you could feed the network a photo taken with
flash, which it could use to gather more information, but have it recreate a
flash-free photo from the non-flash raw.

Often flash is not the look people are going for, but they'd be okay with the
flash firing in order to improve the non-flash photo.

~~~
y7r4m
Absolutely! We recently rebranded our AI solutions from ALLIS (Advanced Low
Light Imaging Solution) to ALIIS (All Light Intelligent Imaging Solution)
specifically because we are beginning to branch out to handle use cases such
as this!

As a proof of concept that this task can be tackled directly, a quick search
brought up "DeepFlash: Turning a Flash Selfie into a Studio Portrait"[0].

Beyond denoising, we are already running experiments with very promising
results on haze, lens flare, and reflection removal; super-resolution;
region-adaptive white balancing; single-exposure HDR; and a fair bit more.

One of the other cooler things we are doing is putting together a unified SDK
where our algorithms and neural nets will be able to run pretty much anywhere,
on any hardware, using transparent backend switching (e.g. CPU, GPU, TPU, NPU,
DSP, other accelerator ASICs, etc.).

[0] [https://arxiv.org/abs/1901.04252](https://arxiv.org/abs/1901.04252)

~~~
exikyut
Before reading your reply to the OP's comment, I had gotten to thinking about
how the super-resolution process and flash photography might interact
([https://news.ycombinator.com/item?id=22905317](https://news.ycombinator.com/item?id=22905317)).
I get the impression you passed the point I reached a long time ago :)

------
NikhilVerma
It's surprising how little code [1] is needed to do this. On the other hand, I
feel this is quite dependent on the specific camera models and might not work
on the RAW data downloaded from my phone. Happy to be corrected.

[1] - [https://github.com/cchen156/Learning-to-See-in-the-Dark/blob...](https://github.com/cchen156/Learning-to-See-in-the-Dark/blob/master/train_Fuji.py)

~~~
covidacct
It's a huge amount of code, hidden behind the tensorflow import statements.
It's common to credit GPUs for the rapid spread of deep learning, but good
GPUs were available for quite a few years before deep learning really took
off. As someone who wrote *a lot* of OpenCL code, including my own Python
wrappers, I'm fairly certain this code would be thousands of lines without a
computation-graph framework. These frameworks are really amazing pieces of
software engineering and deserve a non-trivial fraction of the credit for the
rise of deep learning.
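
To make that concrete: with a framework, a trainable convolution plus its
entire backward pass is a few lines, because the gradients come for free. A
minimal sketch using TensorFlow 2's API (the paper's code is TF 1.x):

    import tensorflow as tf

    x = tf.random.normal([1, 64, 64, 4])           # toy packed-raw input
    conv = tf.keras.layers.Conv2D(32, 3, padding="same")
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(conv(x)))  # forward pass
    # Backprop through the whole graph, no hand-derived math:
    grads = tape.gradient(loss, conv.trainable_variables)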

If you want to know what the next hot thing in software engineering will be,
just pay attention to whatever Jeff Dean is doing.

~~~
ganstyles
I don't know that I agree with the first statement, but even if I do:
everything is abstracted behind import statements, even outside ML. You say
this is a huge amount of code abstracted away, but it wouldn't be difficult to
reimplement this in numpy and pandas directly, without using tensorflow at
all. The code would expand a bit, and you'd have to deal directly with
backprop and calculating derivatives, but not by too much. But then you could
make the same claim about numpy abstracting away the linear algebra, and I
could show you I could do it without numpy, and then it would be the Python
math library. It's turtles all the way down. My point is, your comment applies
to everything.
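
For a sense of what "dealing directly with backprop" means: every layer needs
a hand-derived backward pass like the one below, which is what the framework
generates for you. A sketch for a single linear layer with squared-error loss
(not the project's code):

    import numpy as np

    x = np.random.randn(8, 16)    # batch of inputs
    W = np.random.randn(16, 4)    # weights
    t = np.random.randn(8, 4)     # targets

    y = x @ W                     # forward pass
    loss = np.mean((y - t) ** 2)

    dy = 2 * (y - t) / y.size     # dloss/dy, derived by hand
    dW = x.T @ dy                 # dloss/dW
    dx = dy @ W.T                 # dloss/dx, passed to the previous layer
    W -= 0.1 * dW                 # one SGD step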

~~~
covidacct
Yup, I absolutely agree. Almost all big leaps in software engineering and
applied computer science come from building a powerful and simple abstraction.
Powerful and simple abstractions are surprisingly difficult to get right.

------
dgellow
A "Two Minute Papers" on this project from 2018:
[https://www.youtube.com/watch?v=bcZFQ3f26pA](https://www.youtube.com/watch?v=bcZFQ3f26pA)

------
jameshart
The problem with techniques like this is that they fundamentally amount to
‘making a plausible guess as to what the image would look like’, since
essentially they can’t extract information that is simply not there. There is
a Shannon entropy limit here.

Machine learning is really machine-enhanced educated-guesswork, which has its
place but also has its limits.

~~~
oconnor663
Counterpoint: The human brain converting a 2D image to a 3D model is educated-
guesswork too :)

~~~
war1025
This hits on an interesting point.

There is an entropy limit to the message, but the message isn't actually the
only data.

One thing humans are great at is integrating existing knowledge into a messy
situation and intuiting more than is available just from the raw message.

I.e. The message has an entropy limit, but the message isn't the whole
dataset.

~~~
MiroF
Yes, and that's what this "lossy" conversion to daytime does as well:
incorporate prior knowledge. But that prior knowledge is about how images of
real-world things look at night versus during the day.

------
coenhyde
It's a great result, but it's not perfect. No need for the "perfect" hyperbole
in the title.

------
ksaxena
Video from the paper here:
[https://www.youtube.com/watch?v=qWKUFK7MWvg](https://www.youtube.com/watch?v=qWKUFK7MWvg)

------
jrimbault
Why is the "page suspended"?
[http://cchen156.web.engr.illinois.edu/paper/18CVPR_SID.pdf](http://cchen156.web.engr.illinois.edu/paper/18CVPR_SID.pdf)

~~~
selectodude
The State of Illinois is out of money again.

~~~
anigbrowl
You shouldn't be downvoted - with a big recession/depression looming, link rot
and repositories shutting down are a big issue that will slow down the pace of
research.

------
q3k
Finally, a way to restore
[https://en.wikipedia.org/wiki/The_Night_Watch](https://en.wikipedia.org/wiki/The_Night_Watch)!

------
babuskov
I was just wondering a couple of days ago why the image from my phone is so
grainy, while my eyes+brain can see everything clearly in the dark (it wasn't
completely dark, of course).

This seems to replicate the post-processing we do in our brain (which is also
a giant neural network). I wonder if the process is similar?

~~~
arpa
A very small number of photons (1) is required to trigger the rhodopsin cycle.
So the primary receptor itself is very, very, VERY sensitive.

~~~
tofof
To be clear, the parent did not fail to include a citation. The parenthetical
note is that rod cells are so sensitive that they react to being struck by a
single photon.

------
mkchoi212
Pretty cool, but it seems like there's a big limitation on this for now:

“The pretrained model probably not work for data from another camera sensor.
We do not have support for other camera data. It also does not work for images
after camera ISP, i.e., the JPG or PNG data.”

It would be cool to see them come up with better models that overcome the
above limitations.

------
Aardwolf
I doubt this image is showing the true raw data (a):

[https://github.com/cchen156/Learning-to-See-in-the-Dark/blob...](https://github.com/cchen156/Learning-to-See-in-the-Dark/blob/master/images/fig1.png)

If you take the dark image (a) from that and balance its color, the
information that is present in it simply cannot contain the text from the book
covers and so on. In fact, it's full of JPEG artifacts despite the image being
a PNG. It would be useful if they presented a histogram equalized image of
(a).
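
Producing that check is straightforward; a minimal sketch with OpenCV,
assuming panel (a) has been cropped out and saved locally as fig1a.png (a
hypothetical file name):

    import cv2

    a = cv2.imread("fig1a.png", cv2.IMREAD_GRAYSCALE)
    # Spread the handful of dark levels across the full 0-255 range so any
    # recoverable detail (or the lack of it) becomes visible.
    cv2.imwrite("fig1a_equalized.png", cv2.equalizeHist(a))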

------
dang
Discussed at the time:
[https://news.ycombinator.com/item?id=17064079](https://news.ycombinator.com/item?id=17064079)

------
penetrarthur
I always wondered if you can "trust" an image that has been basically
recreated. Could that kind of image be used as an evidence in court?

~~~
mattkrause
Xerox copiers had a bug, caused by a faulty image (re)construction, that made
them replace similar (but not identical) parts of an image with other pieces
of the image.

[http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_...](http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning)

------
GhostVII
It would be good to get a comparison of a brightened version of the sample
image against the CNN version. Right now the sample image just looks black,
but if you scale up the brightness you get an image that looks more like the
higher-ISO image. That would be a better comparison, since it would show what
improvements the CNN gives over naive techniques like just bumping up the
pixel values.
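
The naive baseline is just a gain: subtract the black level, multiply by the
exposure ratio, and clip, which amplifies the noise along with the signal. A
numpy sketch (the ratio and black/white levels shown are illustrative):

    import numpy as np

    def naive_brighten(raw, ratio=300.0, black=512, white=16383):
        """Scale a short-exposure raw frame to a long-exposure brightness."""
        signal = raw.astype(np.float32) - black           # remove black level
        return np.clip(signal * ratio, 0, white - black)  # noise scales too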

------
robmiller
I wonder if photographic evidence "enhanced" by such a method would be
admissible in court?

~~~
draugadrotten
Perhaps it depends on "which court". The legal systems in some parts of the
world would allow it.

------
amelius
Some questions:

- Did they create a special network topology for this problem?

- Does the network need to see the entire image, or only an NxN subblock at a
time?

- How did they obtain the training data? Is it possible to take daylight
images and automatically turn them into nighttime images somehow?

~~~
isatty
Take pictures at night with a tripod mounted camera with different exposure
brackets?

~~~
amelius
The problem is the number of pictures you'd need. It's much easier to use
available datasets if you know how to preprocess the data.

------
todd3834
Could something like this be done for night vision goggles or is there
significant latency?

~~~
riazrizvi
An interpolation that looks for movement of a few anchor points? I imagine
that would entail much less computation and so deliver apparent real-time
night vision. Though sudden big movements in the scene would cause blackout
regions for about a second?

------
ZeroCool2u
Interestingly, this is effectively an extreme version of the problem of
colorizing black & white photos. I wonder what the results would be if you
just threw some black and white photos into the model.

------
ChrisArchitect
(2018) and a better title that doesn't say CNN, c'mon

~~~
dang
We've added the year above. Submitted title was "CNN converts night images to
perfect daylight in – 1s".

------
jordache
isn't this just some tweaked raw processing algorithm?

------
ptrenko
I think I'm going to faint at the pace of AI progress these days.

I didn't even think this was possible. Have people ever done this manually
before? Like, without AI?

------
paul7986
Cool! Integrating this into AR glasses would make them almost a must-buy.
Turn night into day, see in the dark, etc.!

------
vehemenz
What would happen if this were paired with license plate recognition? And
would it be admissible as evidence?

------
Invictus0
I believe this should have a (2018) tag.

~~~
yokto
Yes, the result image and videos are from 2018:
[https://github.com/cchen156/Learning-to-See-in-the-Dark/blob...](https://github.com/cchen156/Learning-to-See-in-the-Dark/blob/master/images/fig1.png)

------
soperj
Just wondering why, for something brand new, they'd use Python 2.7?

~~~
as1mov
Probably because it was already installed on their machines. Also, what
benefit would this project get from using a newer version?

~~~
soperj
Longevity.

------
pachico
Funny. I'm walking down the corridor almost in total darkness, trying to get
my son to sleep. I get bored, reach for my phone with my free hand, open HN,
and stumble upon this title. Totally unrelated to its content, but I had a
(quiet) laugh :)

------
baybal2
I wonder how much it can improve over this:
[https://youtu.be/c_0s06ORTkY](https://youtu.be/c_0s06ORTkY)

The X27 also uses some kind of neural algorithm to denoise and get the maximum
out of the CIS (CMOS image sensor).

------
css
What camera are they shooting at ISO 409,600?

~~~
Traster
In the video they reference the Sony A7S II. On Sony's website[1] they claim:

>Still images: ISO 100-102400 (expandable to ISO 50-409600),

[1]: [https://www.sony.co.uk/electronics/interchangeable-lens-came...](https://www.sony.co.uk/electronics/interchangeable-lens-cameras/ilce-7sm2#product_details_defaul)

~~~
dingaling
Which is extremely lossy, because any ISO other than the sensor's native level
is the result of in-camera processing. Unlike film, adjusting the "ISO" on a
digital camera doesn't increase the sensor's sensitivity; that's physically
impossible. Instead, very strong gain is applied in processing.

So in this instance they're processing lossily on top of an image already
processed lossily in-camera.

~~~
anigbrowl
The A7S II image is used for comparison rather than as input data, as that
camera is widely regarded as the state of the art for low-light photography on
a non-scientific/military budget.

------
skyde
Why use ISO 8,000 as input and not the camera's native ISO?

------
throwaway122378
Now all they have to do is make sure the correct image is displayed for the
story.
------
GEBBL
Impressive of the American news channel, CNN, to convert images in minus one
second.

~~~
wyldfire
CNN here is a "Convolutional Neural Network"

~~~
echelon
The title needs to be changed so that readers recognize it as such. It either
needs a preceding adjective or letters indicating what kind of CNN it is.

The other option is spelling it out.

Most people will read CNN as the news channel. Even those familiar with neural
networks.

~~~
numbernine
Not everyone is from the US...

~~~
MFogleman
I'm not from Europe, but if I saw "BBC converts night images to perfect
daylight in ~1 second", I would assume it meant the British Broadcasting
Corporation. CNN is just as big of a name. His point is absolutely relevant.

Said differently: the percentage of people who are not from the US but are
aware of CNN as the Cable News Network is higher than the percentage of people
who are not machine learning experts but are aware of CNN as a Convolutional
Neural Network.

------
grumple
Ah, after decades of effort, we have finally replicated the effect of a
candle.

