
Show HN: Open source colorizing grayscale images with NN - ducktracker
https://github.com/pavelgonchar/colornet
======
argonaut
These results:
[http://richzhang.github.io/colorization/](http://richzhang.github.io/colorization/)
from Berkeley are _much_ better than this model (and the code is open source
as well).

Furthermore, those better results have the advantage of a much simpler model.
This model has a fairly complicated architecture (a complex residual
concatenation setup) and many more parameters (I would guess anywhere between
2x-10x as many, but I'd have to take a closer look), which means it's much
slower to run and takes up more memory (disk and RAM).

I'd also say that in general the better model does things that are a lot more
common sense: using the CIE Lab color space (perceptually uniform), omitting
pooling, using a classification loss instead of regression (regression
generally performs poorly in deep learning), etc.

~~~
specialist
Idle thought: Could colorization be used to (lossy) compress full color images
(usefully)?

What I imagine is full color input -> create B/W & color histogram (list of
colors used) -> image viewer uses colorization algorithm to reapply colors.
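
A minimal sketch of the container that idea implies (all names here are hypothetical; `colorize_model` stands in for an actual colorization network, which is the expensive and lossy step):

```python
# Sketch of the proposed container: grayscale plane + a small palette.
# All names are hypothetical; `colorize_model` stands in for a real
# colorization network, which is the expensive and lossy step.
from collections import Counter

def compress(pixels):
    """pixels: list of (r, g, b) tuples -> (grayscale, palette)."""
    gray = [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]
    palette = [c for c, _ in Counter(pixels).most_common(16)]
    return gray, palette

def decompress(gray, palette, colorize_model):
    """The viewer reapplies color, constrained to the stored palette."""
    return colorize_model(gray, palette)
```

Whether this wins overall depends on how much the grayscale+palette representation actually saves versus the cost of running the model on every decode.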

~~~
adrusi
I don't really know a ton about image compression, but I exported the same
photo to jpg using the same settings twice: once with 3 channels and once
desaturated and then with two channels hidden (I couldn't figure out how to
make a 1-channel jpg in GIMP, not sure if it's possible). The 1-channel export
gave less than a 30% file size reduction.

I don't think a compression technique that would require that much processing
power and have that little size reduction would be too useful.

~~~
maffydub
A 30% saving is roughly what I would expect.

Commonly, JPEG separates the image into 1 channel of luma and 2 channels of
chroma. It then downsamples the chroma to half the luma resolution, meaning
that you have twice as much raw luma data as you have chroma.

It then goes on to do a whole load of fun discrete cosine transforms,
quantization and Huffman encoding, but to a first approximation I'd expect
luma and chroma to compress roughly similarly on average.

Since you've got twice as much luma data as chroma data, dropping the chroma
will only save you ~30%.
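
That arithmetic can be sanity-checked directly (assuming 4:2:0 subsampling and that luma and chroma bytes compress about equally well downstream):

```python
# Back-of-the-envelope check of the ~30% figure for 4:2:0 JPEG:
# luma is stored at full resolution, each chroma plane at half
# resolution in both dimensions.
W, H = 1920, 1080
luma = W * H                       # Y samples
chroma = 2 * (W // 2) * (H // 2)   # Cb + Cr samples, subsampled 2x2
saving = chroma / (luma + chroma)  # fraction saved by dropping chroma
# saving works out to exactly 1/3 here, i.e. ~33%, assuming luma and
# chroma bytes compress roughly equally well downstream.
```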

Since the predictions from the neural network are not always accurate, you
could then encode the error in the compressed file, and restore it on
decompression. This will generally be close to 0, so compression should be
pretty good.

At the end of the day, 30% better compression would be great... if the
processing overheads aren't too great - I think that's the key factor.

(Most of this is from memory based on some hacking I did to be able to
transcode videos to play on a cheap Chinese iPod Nano clone, which used an
undocumented variant of MJPEG. The default quantization tables for luma and
chroma are different from each other. The iPod Nano clone was using the
standard quantization tables but the other way round (so using the luma table
for chroma and the chroma table for luma). I can only imagine this was a bug
in their code, as it was bound to reduce their compression ratio/image
fidelity.)

------
fmeyer
I'm colorblind and I can't tell the difference between the prediction and the
GT. My coworker says that they have slight differences in some colors.

Congrats, you wrote the first colorblind NN ever!

~~~
rimantas
I'd say the differences are very well expressed.

------
m_mueller
I'd apply it to an old B&W film to show off. Imagine Ben Hur or Citizen Kane
in color!

~~~
nightcracker
I see a potential problem with moving pictures - the neural network might
decide that while red was an appropriate color for the truck in the last
frame, blue is more likely in this frame.

~~~
xiphias
You can use the previous few frames to fix this problem for videos.
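
One naive way to do that (a sketch only; the blend weight and the cut threshold are arbitrary choices, not anything from this project):

```python
# Sketch of carrying color between video frames: blend this frame's
# predicted color with the previous frame's, but reset at scene cuts.
# The 0.8 blend weight and the cut threshold are arbitrary choices.
CUT_THRESHOLD = 40.0

def mean_abs_diff(gray_a, gray_b):
    """Crude scene-cut signal: average absolute luma difference."""
    return sum(abs(a - b) for a, b in zip(gray_a, gray_b)) / len(gray_a)

def smooth(prev_color, cur_color, prev_gray, cur_gray, weight=0.8):
    """Temporally smoothed color channels for the current frame."""
    if prev_color is None or mean_abs_diff(prev_gray, cur_gray) > CUT_THRESHOLD:
        return cur_color                      # first frame or scene cut
    return [weight * p + (1 - weight) * c     # otherwise lean on history
            for p, c in zip(prev_color, cur_color)]
```

A real system would operate per pixel and probably align frames with optical flow before blending, but even this crude version would stop the truck flickering between red and blue.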

~~~
thomasahle
There might be a few scenes in between the viewings of the truck.

~~~
xiphias
You're right, it would be a fun side effect to see the color change between
different scenes :) I guess there's not enough commercial interest in fixing
these problems, but it can probably be done with current algorithms. Precise
3D reconstruction is much more important than colour
reconstruction.

~~~
m_mueller
Would it be possible to have some sort of semantic naming coupled to it ('the
car in this scene is red') that could be fixed by humans, maybe even by
talking to the NN? That's when this stuff starts getting fun IMO - human
language together with superhuman domain knowledge.

~~~
ThePaco
Or even using the previously processed frames as a reference the next time
the shot appears.

------
SyneRyder
The title says it's open source, but I couldn't actually see what the license
is? (Maybe I missed it?)

~~~
syockit
It is open source. You can see the source. The license is no-license, so you
can (only) fork it. That way, people can clone from your username instead of
the original author, although that doesn't make any difference as they're all
hosted on GitHub.

~~~
lucideer
You seem to be getting downvotes because you don't understand what the term
"open source" means, but nobody has offered to explain, so:

Open source does not just mean you can see the source. From Wikipedia[0]:

"Open-source software is computer software with its source code made available
with a license in which the copyright holder provides the rights to study,
change, and distribute the software to anyone and for any purpose."

[https://en.wikipedia.org/wiki/Open-source_software](https://en.wikipedia.org/wiki/Open-source_software)

~~~
nothrabannosir
Not OP, but I've been in software and OSS for a long time now, and I never
knew this. I thought the term for that was Free Software.

In fact, looking closely at that definition, isn't practically nothing open
source?

"Open-source software is computer software with its source code made available
with a license in which the copyright holder provides the rights to study,
change, and distribute the software to anyone and _for any purpose._ "

I.e. public domain? Any other license lives precisely to limit those rights of
distribution, no?

Anyway, just pedantry. I see your point. Learn something new every day.

~~~
lucideer
I think you've highlighted an inaccuracy, or at least a lack of clarity, in
the Wikipedia article. I guess a Wikipedian could argue it should be
interpreted as "for any purpose depending on the chosen licence", but that's
definitely not clear.

Re: Open source vs Free software, this is a matter for debate (the tendency
being to associate Free software with the more copyleft/viral/GNU-ish side of
things), but I would definitely say that both involve explicit licencing, as
opposed to potentially implicit "all rights reserved" style regional
copyright.

~~~
Chris2048
I would usually use the term "FLOSS", for "Free/Libre OSS".

The L part clarifies the type of freedom, and the FL clarifies that it's not
just OSS.

------
x5n1
Can we get some instructions on how to actually use this? I want to test it
out on pictures from:

[https://www.reddit.com/r/OldSchoolCool/](https://www.reddit.com/r/OldSchoolCool/)

~~~
mlsource
Unfortunately I'm not a contributor to this project, but I hope the author
can answer soon :)

------
g_sch
I'd be interested to see what kind of hidden biases this might reveal,
especially about people, e.g. skin tone, eye color, etc.

~~~
gcr
Agreed. I wonder how well this model fills in someone's skin tone. It would
be sad if (for example) it turned everybody into white people or something
like that.

In biometrics, there have been similar cases of software like face detectors
and face recognition working very well on people from China and not very well for
other people, because all the researchers who trained those models only had
available large public databases from Chinese universities. The model hadn't
seen any other ethnicity so its performance on "non-Chinese" folks wasn't
surprising.

------
gravypod
I don't know if OP will see this, but I'd love it if you could use your
experience making this to write a blog post about how it works, or to port it
to adjusting color scales for colorblind people.

I'd love to combine this technology with this:
[http://matplotlib.org/style_changes.html](http://matplotlib.org/style_changes.html)

You would probably have some cool results as you could generate examples of
what they would look like to color blind people, and a corrected set so color
blind people could see them.
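
As a rough illustration of the simulation half of that idea: deuteranopia is often approximated with a single linear transform on RGB. The matrix below is a widely circulated rough approximation pulled from common web sources, not a calibrated model, and not anything from this project (proper simulations work in LMS space, per Viénot/Brettel or Machado et al.):

```python
# Sketch of simulating deuteranopia with one linear transform on RGB.
# The matrix is a widely circulated rough approximation, not a
# calibrated model; proper simulations convert to LMS space first.
DEUTERANOPIA = [
    [0.625, 0.375, 0.000],
    [0.700, 0.300, 0.000],
    [0.000, 0.300, 0.700],
]

def simulate(rgb, matrix=DEUTERANOPIA):
    """Apply the 3x3 matrix to an 8-bit (r, g, b) tuple."""
    r, g, b = rgb
    return tuple(
        min(255, max(0, round(row[0] * r + row[1] * g + row[2] * b)))
        for row in matrix
    )
```

Running every color in a plot's palette through `simulate` and checking pairwise distances would flag indistinguishable pairs; a "corrected" palette would then remap those colors.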

It would be a cool, and I assume simpler, problem than the one you have
already managed to solve.

Good show, great work.

------
taneq
I love the last one.

"But you didn't say what colour it was, so I made it a red truck."

~~~
LoSboccacc
Also the lighthouse's sky: "That's sky, it's surely blue!"

Generally it's interesting to see a NN thinking up missing details. I'd like
to see images with an element deleted and a NN filling in the black spots, to
see what level of shape recognition it could achieve.

~~~
Coincoin
I'm pretty sure that if you asked a group of humans to colorize the
lighthouse, you would get the same result.

There is no way, from the greyscale, to know that the sky should be orange.

------
mosselman
Can I try this out on images myself somehow?

------
amelius
I'm looking for a technique that can upscale images nicely (using NNs).

I've found this: [1], but the results seem somewhat disappointing. One of the
problems is that the quality measures are (in my case) subjective (the results
should look convincing but need not be "perfect", whatever that may mean).

[1] [http://engineering.flipboard.com/2015/05/scaling-convnets/](http://engineering.flipboard.com/2015/05/scaling-convnets/)

------
leichtgewicht
I think this was on Hacker News before?

~~~
astrosi
Are you thinking of this project instead?
[http://richzhang.github.io/colorization/](http://richzhang.github.io/colorization/)

~~~
liotier
Yes - how do the two projects differ in their approach?

------
Gigablah
Now give it a picture of a bike shed.

~~~
illumen
I think a tool shed is more appropriate.

~~~
georgemcbay
I think it was a reference to this concept:

[http://bikeshed.com/](http://bikeshed.com/)

~~~
baq
no, it was about this one:
[https://en.wikipedia.org/wiki/Bikeshed_color](https://en.wikipedia.org/wiki/Bikeshed_color)

~~~
walrus
How about we stop arguing about which link to use and just pick one? I propose
[http://amzn.com/0345282779](http://amzn.com/0345282779). Of the three links
proposed so far, this one is clearly the most authoritative.

------
gwern
Reddit:
[https://www.reddit.com/r/programming/comments/4ft4hj/colorne...](https://www.reddit.com/r/programming/comments/4ft4hj/colornet_neural_network_to_colorize_grayscale/)

------
SamDLC
It may be interesting to investigate using this for image
compression/decompression. ie starting with an image having a greatly reduced
color space, could the NN reproduce the original image?

------
nitrogen
I wonder if this could be used to improve the proposed color prediction in
Daala, or if the memory and CPU requirements are too high to put into a video
codec.

------
amelius
How would this system colorize this image?:
[http://imgur.com/RBEht9H](http://imgur.com/RBEht9H)

------
WhitneyLand
What's the application and/or interest in colorizing images? Does it
generalize to other types of problems?

------
new_hackers
The sky is blue, grass is green, trucks are red, and everything else is a
shade of brown :-) pretty cool work!

------
mushmouth
This looks pretty cool. Could any photography or film people use this?

------
mkagenius
Can this be used to train a model which can un-blur an image?

------
amelius
Does it color the sea blue and the grass green?

