
Colorizing black and white photos with deep learning - max_
http://www.tinyclouds.org/colorize/
======
gwern
The 'averaging' problem is interesting and I think related to the Euclidean-
style loss function: is it correctly minimizing the color distance between
its gray-brown guess and all the possible shades? If that's what's going on, then
messing with the architecture won't fix it because it's already reaching the
'right' answer and what is necessary is a better loss... Perhaps something
like DCGAN where the loss function is another CNN which is being trained to
guess whether an image is original or generated? (So you'd feed the BW image
into the generator, which converts it into a colorized image; the colorized
and the true color images are both fed into the discriminator, which tries to
guess which is the real one. Then error is backpropagated into both.)
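
A minimal sketch of what one training step of that could look like (written
here in PyTorch; `G` and `D` are stand-in networks, so this illustrates the
idea rather than any particular implementation):

    import torch
    import torch.nn.functional as F

    # One adversarial step for colorization: G maps a 1-channel grayscale
    # batch to a 3-channel color batch; D maps a color batch to a realness
    # logit per image.
    def gan_step(G, D, gray, real_color, opt_g, opt_d):
        fake_color = G(gray)

        # Discriminator: push real images toward label 1, fakes toward 0.
        opt_d.zero_grad()
        real_logits = D(real_color)
        fake_logits = D(fake_color.detach())
        d_loss = (
            F.binary_cross_entropy_with_logits(
                real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(
                fake_logits, torch.zeros_like(fake_logits))
        )
        d_loss.backward()
        opt_d.step()

        # Generator: fool the discriminator. Any plausible colorization is
        # rewarded, not just the one "true" color, so averaging to sepia
        # stops being the optimal answer.
        opt_g.zero_grad()
        adv_logits = D(fake_color)
        g_loss = F.binary_cross_entropy_with_logits(
            adv_logits, torch.ones_like(adv_logits))
        g_loss.backward()
        opt_g.step()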

~~~
Houshalter
One other problem is that "real" black and white images don't come from the
same distribution as greyscaled images. The training data is just regular
photos that have had all the color channels combined and averaged together.
Sometimes it's a weighted average, like green is given more weight than blue,
because human eyes see green better.

However, real black and white photos represent the actual intensity of the
light waves. Black and white photos might be sensitive to light humans can't
even see, or weight different "colors" very differently. Even two colors that
appear the same to the human eye might be different intensities and
frequencies, and so produce a different black and white photo.

I don't know how much of an effect this would have. But real black and white
photos definitely look different from greyscaled color photos, and it might
lead the NN to guess the wrong color information.

~~~
derefr
Interesting point; the "perceptual brightness" formula used in grayscaling
calculations is usually this:

    sqrt(0.299r² + 0.587g² + 0.114b²)

Real "photosensitizing" chemicals, on the other hand, have spectral response
curves, like this one[1] for the silver iodide used in daguerreotypes. (The
formula above just being an approximation of the summed spectral response
curves of the three cones in the eye.)

There's a big difference, like you said, but it's not an impossible problem;
we should be able to create a greyscaling formula for any photographic process
with a known response curve. (And honestly, I have no reason to think this
hasn't already been done in the creation of an Instagram filter at some
point.)

[1]
[https://books.google.ca/books?id=9IpaIAcgthQC&lpg=PA119&ots=...](https://books.google.ca/books?id=9IpaIAcgthQC&lpg=PA119&ots=zEWTsY30T3&dq=silver%20iodide%20photosensitivity%20spectrum&pg=PA120#v=onepage&q=silver%20iodide%20photosensitivity%20spectrum&f=false)
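
As a rough sketch of what that greyscaling formula could look like (the
weights below are made up; for a real process you would derive them from its
published response curve, like the silver iodide one in [1]):

    import numpy as np

    # Grayscale an RGB image through an arbitrary spectral response.
    # Silver iodide is mostly blue/UV sensitive, so a daguerreotype-style
    # conversion would weight blue heavily, unlike the perceptual formula.
    def film_grayscale(rgb, weights=(0.1, 0.3, 0.6)):
        w = np.asarray(weights, dtype=float)
        w /= w.sum()                      # normalize to preserve brightness
        return rgb.astype(float) @ w      # H x W x 3 -> H x W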

~~~
Houshalter
Well, not necessarily. If the response curves are different from those of a
digital camera, then some information will be lost. In fact the response
curves are different even for different cameras. But you could probably get a
closer approximation using a better formula.

------
beautifulfreak
I do colorizations on Reddit. How about adapting this to assist manual
colorizers? We'd mask out each color region by hand, so the neural net
wouldn't have to decide what's what. We'd tell it, "this region is skin, this
is a brown overcoat, this is a gold candlestick - please colorize them
appropriately." As it is, colorizers fill in those regions with a very small
number of colors, sometimes with gradient maps that blend two or three colors
together, but there's just not much color variety in even the best works.
Grass should be a hundred shades of green, but we'd use 2 or 3. Skin is the
hardest to get right, and requires more individual hues, but that means maybe
5 to 10. It's enough to trick the eye, but on close examination looks more
painterly than photographic. A neural net palette picker could be Photoshop's
next big feature. An intelligent skin-colored crayon might actually deliver
all the shades of skin.

~~~
bootload
_" I do colorizations on Reddit. How about adapting this to assist manual
colorizers? We'd mask out each color region by hand, so the neural net
wouldn't have to decide what's what. We'd tell it, "this region is skin, this
is a brown overcoat, this is a gold candlestick - please colorize them
appropriately."_

That's a smart idea. Master painters used to do this. Do the broad outlines,
choose the colour and style then unleash the underlings (painters in training)
to do the rest.

------
sandworm101
Note the image with the wolves. I think the entire exercise is in that photo.
This system seems very good at patterns. The animals do well when against
natural backgrounds. That's because they match those backgrounds. Even though
they are different colours, animals with fur all adopt some form of
camouflage. So their patterns at some level match the patterns of their
natural environment.

The wolves are rendered well, the flowers not. Wolves are camo. Flowers are
the opposite. They want to stand out from the background. So the machine
doesn't handle them well. The green stripe on the truck also fits this.

To take this idea forward, look at the image of the puppies against the grass.
They are not camo. Their colour is the product of breeding, therefore they do
not render so well as the wolves. There might be something useful here to
measure whether or not an animal is being viewed in its natural environment.

~~~
munificent
> Wolves are camo. Flowers are the opposite. They want to stand out from the
> background. So the machine doesn't handle them well.

There is a simpler explanation. Wolves only come in a few different colors.
Flowers come in a variety of colors. Therefore, there are only a couple of
correct answers for coloring a wolf, but a wide variety of completely
incompatible answers for coloring a flower.

You are right that flowers come in a variety of colors _because_ they want to
stand out. But I don't think the neural net understands that. It just knows
that a gray flower could be any color while a wolf is confidently going to be
some kind of brown.

~~~
sandworm101
I wouldn't want to be the one to tell this guy that wolves are all 'some kind
of brown'.

[https://media2.wnyc.org/i/620/372/c/80/1/485198177.jpg](https://media2.wnyc.org/i/620/372/c/80/1/485198177.jpg)

~~~
munificent
Ugh. _Seriously?_ Didn't we all _literally yesterday_ discuss pedantic
misinterpretation of text on the Internet?

This article is about colorization, which means taking shades of _gray_ and
selecting a _hue and saturation_ for them. The brightness is effectively fixed
because, guess what, a black and white image can convey brightness already.

Obviously, black and white coloration on wolves falls outside of this because
those are more or less already correct in the _black and white_ image.

Now look at that picture you linked. What do you see? White: doesn't need much
coloration. Black: uh, also doesn't need much coloration. Slightly brownish
gray: like I said, wolves are all some kind of brown.

Show me a blue wolf, or a green wolf, then we'll have something interesting to
talk about. But most wolves, like almost all mammals, have coloration pretty
much limited to dull warm colors and tints and shades of those. Here's a
picture for you:

[https://en.wikipedia.org/wiki/Canis#/media/File:Canis.jpg](https://en.wikipedia.org/wiki/Canis#/media/File:Canis.jpg)

What do you see?

~~~
krapp
>Didn't we all literally yesterday discuss pedantic misinterpretation of
text on the Internet?

No, we all literally did not.

------
nkron
Very cool. It seems like with some extra human intervention this would
produce very good results.

Shameless plug: I built an online tool to colorize photos using WebGL. It's
all manual but it's easy to get started and doesn't require any additional
software. [http://www.colorizephoto.com](http://www.colorizephoto.com)

~~~
ghrifter
Awesome site! I like the Clint Eastwood "Color Picker" :)

------
tacos
I'd like to commend the author for the tone of this article.

It's the right mix of "paper" and "blog post." It's an experiment that sort of
flops and there will be a variety of "yay for tech!" and "I can't wait to see
a movie where various body parts of people remain black and white!" in the
comments here regardless.

Presenting it as an experiment, clearly explaining what you tried, then
detailing some future thoughts and saying "it kind of works" was refreshing
and honest. Thank you.

------
Robin_Message
Great idea. I feel like an alternate color model could help, because all the
places where the color is arbitrary but strong get averaged to a muddy sepia.

For example, the stripe on the truck should be bright and saturated, but the
actual color doesn't matter.

The HSV colour space could work if the difference between colours is
calculated with some kind of circular arithmetic.
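
For instance, hue distance would have to wrap around (a minimal sketch, with
hues in degrees):

    import numpy as np

    # Hue lives on a circle: the distance between 350° and 10° should be
    # 20°, not 340°. A loss built on this wouldn't punish a vivid-but-wrong
    # hue as harshly as per-channel RGB distance does.
    def hue_distance(h1, h2):
        d = np.abs(h1 - h2) % 360.0
        return np.minimum(d, 360.0 - d)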

~~~
luminiferous
I think the more fundamental problem is that the program is trying to minimize
error where error is defined as deviation in color from the original image.
This means objects that can be many different colors but are always strong and
saturated average to a brown, as you said. And as you said, for those types of
objects, it's best to pick a random color and make it bright and saturated.
The best way to have it do that is to redefine error as some metric of how
"realistic" the picture looks vs the original. For example, a picture of a car
recolored to look bright blue looks similarly "realistic" to the human eye as
the original picture where the car is bright red. The deviation in color is
high, but it still looks "good", so the error should be low. I have no idea
how this metric would be calculated without humans evaluating the output
manually, though.

------
transcranial
Nice work! Another demonstration of the amazing power of deep neural nets for
transfer learning. For those interested,
[http://arxiv.org/abs/1511.06681](http://arxiv.org/abs/1511.06681)
demonstrates experiments with video coloring using a 3D convolution-
deconvolution network with some architectural elements similar to those used
by the author.

~~~
eadz
It's great to see really good use cases for this technology. I would be
interested to see an old B&W movie colorised this way.

------
marknadal
You guys! This is Ryan Dahl of NodeJS! He's back! When did he come back? So
excited to see him around again, he's such a great guy (I had the honor of
meeting him once).

Brilliant post too, with an excellent write-up. Can't believe people hadn't
already been thinking about this.

------
logicallee
VERY interesting - as soon as I read the title, I thought "finally, this will
really tell me whether deep learning is bullshit or not!". Why? Because my
understanding is that colorization is basically magic, like, how can you
possibly get the color of a telephone handset out of a black and white image?
In fact, you can't! Here's the set of the Addams Family:

[http://www.fastcodesign.com/3021327/asides/the-addams-
family...](http://www.fastcodesign.com/3021327/asides/the-addams-familys-
living-room-was-pink)

Totally all over the map. So I thought this would be such a fantastic,
fantastic way to compare my knowledge of the world with a punk algorithm's.

I didn't read the key/legend/explanation - as soon as I saw the first set of 3
images, I knew the middle lamp was the true color, and the right lamp was
generated; because lampshades overwhelmingly look like the middle picture (in
my mind) -
[https://www.google.com/search?q=lampshade](https://www.google.com/search?q=lampshade)
not that weird blue color.

"Ha, stupid algorithm", I thought. "Who has a blue lampshade". this algrithm
doesn't even come close.

Then I kept scrolling with that assumption, and it got worse and worse - wow
that photographer's color is like blood orange, the algorithm doesn't even
know it's a person! This is terrible. Where does that truck get that green
trim, nobody would choose that, straight out of left field.

Until I got to the field with wolves. Why are the flowers' colors missing from
the middle picture? This doesn't look right at all.

Then I read the caption. The middle images are the generated ones; the right-
hand images are reality.

For 5 of 6 images, I thought that the generated image was "obviously" an
actual photo, and much more plausible in colors than the right-hand real
photos. Continuing to scroll, for the park bench also I think the middle image
is much closer to how I imagine it.

So we are at the stage where an algorithm generates a much more plausible view
of reality, with rare exceptions, than actual reality. This is pretty
impressive.

~~~
TheOtherHobbes
I had completely the opposite take-away - but then I have some experience of
photography, and sadly, brightly coloured lampshades no longer surprise me.

I think it's a good illustration of the limits of statistical approaches, and
why "It's harder than it looks" applies.

This is about as good as it's going to get without genuine object recognition,
knowledge of real-world lighting and colour, and awareness of photographic
styles.

It might be possible for a system to learn all of the above, but it's going to
need a bigger and probably pre-partitioned training set, and a _much_ more
complex model.

~~~
Houshalter
To be useful, it doesn't have to be perfect. It can have humans provide it
with reference photos, select colors for different parts, or look at several
different versions of the same image and decide which one looks best.

------
samim
Made a Gist, allowing you to apply this to Video:
[https://gist.github.com/samim23/5baaf1d206cf5e81436d](https://gist.github.com/samim23/5baaf1d206cf5e81436d)
And ran it on Chaplin as demo:
[https://www.youtube.com/watch?v=_MJU8VK2PI4](https://www.youtube.com/watch?v=_MJU8VK2PI4)

------
WalterBright
I have a lot of old B+W photos, and various color photos of the same general
subject. It would be way cool to have an app that I could teach using the
color ones, and then use to colorize the B+W ones. I don't mind that it
wouldn't be perfect.

~~~
gwern
> It would be way cool to have an app that I could teach using the color
> ones, and then use to colorize the B+W ones.

You'd want the training to be done on the server, not locally. Setting up a
full Torch/Caffe/Theano stack is not easy: there are so many libraries and
moving pieces, which must interact with Nvidia's proprietary blobs and
libraries and ever-changing GPUs, that you can follow all the directions and
it will either work or fail with an utterly inscrutable error. (For example,
I'm running on an old Ubuntu because the newer Ubuntu is not officially
supported, and my usual OS, Debian, just does not work no matter what I try.)

~~~
ymt123
There are actually dockerized versions of many (maybe all) of the deep
learning libraries. The docker containers can take advantage of the GPU for
training. You still have to install CUDA on the box (outside the docker
container) but then you can try out different deep learning libraries.

Libraries we've started from in my lab:

Caffe: [https://hub.docker.com/r/kaixhin/caffe/](https://hub.docker.com/r/kaixhin/caffe/)

Torch: [https://hub.docker.com/r/kaixhin/torch/](https://hub.docker.com/r/kaixhin/torch/)

Theano: [https://hub.docker.com/r/kaixhin/theano/](https://hub.docker.com/r/kaixhin/theano/)

------
vegabook
Wow, the real (rhs) images have so much more of an "emotional" appeal. Somehow
the middle images, while I admire the tech, really don't add much to the black
and white, and they might even take something away. They seem much too
tentative, washed out, averaged. I would argue that in general, they're the
worst out of the three.

~~~
Houshalter
That's a problem with this specific architecture. If it's not sure what
color something is, it goes with an average of all the colors, even though a
brighter color would be more realistic (even if the exact shade were wrong).
Some people have suggested ways of fixing this above.

But just the fact it guesses the right colors at all is really cool. Previous
automatic colorizations I've seen were very very crappy or required lots of
human input. Or both.

And while these colorized photos do look a bit dull, I like them better than
black and white. Something about black and white photography makes it look
fake to my brain. It doesn't register the same way. Even really bad
colorizations make images feel more real. I once saw very badly colorized
video of WWI, and it was really fascinating. I actually felt like I was
watching a real event that had actually happened. The same is true for these
images.

------
CyberDildonics
If anyone is interested in this and wants to learn more, start with natural
image matting - that is the more fundamental research topic of which
colorization is one use.

While I'm sure this is interesting to many people, the results here are
extremely poor compared to modern techniques.

~~~
cfcef
> the results here are extremely poor compared to modern techniques.

Could you link a demonstration of the much superior results?

~~~
CyberDildonics
Search natural image matting or colorization and you will find a lot of
results.

Take a look at Levin 2004 to start.

------
djfm
This is what must be going on in our brains when we watch black & white
movies, fascinating.

------
noobie
On a tangent but any idea how the author's portrait was created?
[http://tinyclouds.org/](http://tinyclouds.org/)

Edit: Thank you.

~~~
heed
It looks like they ran the photo through Google's Deep Dream. There are a
bunch of generators online, eg
[http://deepdreamgenerator.com](http://deepdreamgenerator.com).

------
jayess
In their list of validation images, I thought this one was pretty cool:

[http://tinyclouds.org/colorize/val-
imgs/val-006200-2.jpg](http://tinyclouds.org/colorize/val-
imgs/val-006200-2.jpg)

------
trizzashamafoo
These results are clearly hand-picked to leave out complete failures like
these:

[http://tinyclouds.org/colorize/val-
imgs/val-000100-1.jpg](http://tinyclouds.org/colorize/val-
imgs/val-000100-1.jpg) [http://tinyclouds.org/colorize/val-
imgs/val-000800-2.jpg](http://tinyclouds.org/colorize/val-
imgs/val-000800-2.jpg)

When I selected from the validation images, most had these blue splotches that
were not shown on the web page. Obviously a model is not expected to work 100%
of the time, but I think the link misrepresents the results by not even
showing a single instance of this common failure.

edit: further scrolling shows it's less common than I thought.

~~~
gefh
That's a 100-generation result and not expected to be that great; they do
get better with more training.

------
Nav_Panel
How might this model handle B&W images created with colored filters? What
would the difference in output be? Many many film-era B&W photos used yellow
(or perhaps even more dramatic, like red and blue) filters. Here's a small
example of the difference: [http://www.exposureguide.com/images/lens-
filters/red-filter....](http://www.exposureguide.com/images/lens-filters/red-
filter.jpg)

------
VikingCoder
"Human wants me to do something... Uh... Brown? Yes? Brown? You like brown?
BROWN!"

~~~
jcl
Yeah, I got the same impression. The mapping it seems to have discovered is:

    vegetation texture -> greenish
    sky texture -> blueish
    everything else -> brownish

...which is not a bad set of defaults for most photos.

------
al626
Any explanation as to why there seem to be a lot of blue highlights on the
extended validation set?

------
amelius
> In the past few years Convolutional Neural Networks (CNNs) have
> revolutionized the field of computer vision. Each year the ImageNet
> Challenge (ILSVRC) has seen plummeting error rates due to the ubiquitous
> adoption of CNN models amongst the contestants.

Am I right that the "Convolution" part only refers to the speed at which the
models can be trained, and not to any other quality of these models?

~~~
yablak
No. Most of the layers in a CNN perform convolutions with kernels. This is
not the same as standard DNNs that do a full matrix multiply.

Convolutional kernels allow you to use many fewer variables to perform the
forward layer operation, and a CNN ties these trainable variables across
spatial positions. Training is not only faster, but also more robust because
you have fewer parameters to learn.
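
A back-of-the-envelope comparison (layer sizes made up for illustration):

    # A 3x3 convolution mapping 64 -> 64 channels reuses one small kernel
    # at every spatial position of the feature map:
    conv_params = 3 * 3 * 64 * 64        # 36,864 weights (ignoring biases)

    # A fully connected layer on a 32x32x64 feature map instead needs one
    # weight per (input unit, output unit) pair:
    dense_params = (32 * 32 * 64) ** 2   # 65,536^2, about 4.3 billion

    print(conv_params, dense_params)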

------
drcode
What would be really awesome is a Reddit bot that takes the highest-scoring
image from the colorization subreddit and mixes it 50% with the deep learning
results and posts it as a comment.

From the published results, I think a 50:50 mix of human and ML in this case
would likely yield the most naturalistic result in 90% of cases.
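
The mixing half is trivial; a sketch with PIL (the filenames are
hypothetical):

    from PIL import Image

    human = Image.open("human_colorized.jpg").convert("RGB")
    ml = Image.open("ml_colorized.jpg").convert("RGB").resize(human.size)
    Image.blend(human, ml, alpha=0.5).save("blended.jpg")  # 50:50 mix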

------
OldSchoolJohnny
I think it might be possible to infer the colours of some objects based on how
old black and white film worked because each type had different spectral
sensitivity and probably has a signature that can be determined automatically.
Once you know the film, you might have a shot at inferring what colour that
car is, for example.

------
oilywater
The thing about NNs is that they cannot encode structure, or, to be more
precise, they cannot optimize over a joint loss (except RNNs, but training
those is a pain). For example, the color of the current pixel can depend on
the inferred colors of the neighbors above and to the left (if we color
pixels from left to right, going row by row). The NN in the article colors
locally, without any kind of reference to what the whole picture looks like.
A NN can encode information about what whole pictures look like, since it
has a lot of representational power, but that requires large amounts of data
and parameter tweaking.

Optimizing over a joint loss (maximizing the probability of the full image
colorization) would work extremely well.

Tools like vowpal wabbit can easily be adapted to learn a chain classifier
over colors, and it should work insanely fast.
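
A toy sketch of that raster-order chain (`predict` stands in for any cheap
per-pixel classifier, e.g. one trained with vowpal wabbit):

    import numpy as np

    # Color pixels left to right, row by row; each prediction sees the
    # gray value plus the colors already assigned to the left and upper
    # neighbors, so decisions can propagate across the image.
    def colorize_chain(gray, predict):
        h, w = gray.shape
        colors = np.full((h, w), -1, dtype=int)  # -1 = "no neighbor"
        for y in range(h):
            for x in range(w):
                left = colors[y, x - 1] if x > 0 else -1
                up = colors[y - 1, x] if y > 0 else -1
                colors[y, x] = predict(gray[y, x], left, up)
        return colors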

------
DanBC
I'd be interested to see the output images compared to the output of "colour
blindness simulators".

------
EijiK
Test movies "Roman Holiday Trailer (1953)"
[https://youtu.be/pKtYv6cU8VE](https://youtu.be/pKtYv6cU8VE) "Tokyo Story
Trailer (1953)" [https://youtu.be/-PsCZ1D_Brg](https://youtu.be/-PsCZ1D_Brg)
"Metropolis Trailer (1927)"
[https://youtu.be/RAvtbcmusY8](https://youtu.be/RAvtbcmusY8)

------
Hydraulix989
Have you tried making the output a palette with potentially multiple color
choices? I'm trying to think of a way of representing "it could be any of
these colors, choose one." The problem is training it will be tough without
being able to semantically segment objects that are the same but colored
differently a priori.

I'd imagine the output would also have to be ordered, but this is easy --
just use the EM spectrum as an imposed ordering for the palette, and, say,
cap it at size 3 with potential null hues for things that really have fewer
colors than that.

Any ideas?

------
mjibson
I made a docker image that can be run from the command line or as a website:

[https://hub.docker.com/r/mjibson/colorizer/](https://hub.docker.com/r/mjibson/colorizer/)

------
frevd
Great idea. Google should pick this up to colorize black-and-white movies
and historical footage. It could easily be extended by feeding in
desaturated movies as training data, and could maybe even remaster the
quality.

------
kaivi
I believe this works in the same way, just that there is no technical
explanation:
[http://www.solargreencolor.com/](http://www.solargreencolor.com/)

------
sparky_
I suppose I don't have anything of intelligence to add, but I did want to
congratulate all involved, as the results are actually very impressive. Carry
on!

------
awinter-py
can this be used as a form of compression? is it cheaper to store color
information as a delta on the guessed information?

~~~
cfcef
Asymptotically, yes; prediction = compression (if you have a model for a
bitstream which produces probabilities over the next bit, it can be fed into
an arithmetic encoder and you now have a compressor). In this case, it's not
practically helpful. A VGG is 528MB all on its own, so you need to compress a
lot of images to make back that 0.5GB use plus runtime dependencies.
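
As a toy version of the delta scheme the parent describes (a sketch;
`true_ab` and `predicted_ab` are assumed to be uint8 chroma channels of the
same shape):

    import numpy as np
    import zlib

    # Store only the residual between the true chroma and the model's
    # guess; good guesses give near-zero residuals that entropy-code well.
    def encode(true_ab, predicted_ab):
        residual = true_ab.astype(np.int16) - predicted_ab.astype(np.int16)
        return zlib.compress(residual.tobytes())

    def decode(blob, predicted_ab):
        residual = np.frombuffer(zlib.decompress(blob), dtype=np.int16)
        residual = residual.reshape(predicted_ab.shape)
        return (predicted_ab.astype(np.int16) + residual).astype(np.uint8)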

~~~
im2w1l
They don't necessarily use the same budget. The compressor can come
prepackaged or be downloaded from a fast connection, and then be used when you
have poor mobile internet.

~~~
cfcef
They don't necessarily have to, that's true. But if there was any appetite to
have gargantuan 500MB+ decompression libraries to save 20 or 30% on downloaded
bytes while browsing, we would have seen much more uptake of existing simpler
compression schemes like SDCH, which takes a tiny step in that direction
with a relatively large (but still tiny) pre-built dictionary for the WWW.

------
dzhiurgis
I had the exact same idea last week and even searched for some approaches
today. Thank god someone did it for me, though!

------
slantaclaus
The world is your oyster, _max. Great post. I'm jealous I never learned
linear algebra, etc. in college.

------
sciencesama
Interesting. I would like to install and run the scripts; what is the
process -- Docker, AWS, or another cloud? It would be awesome if you could
provide a guide with the installation and running steps: how to load images
and colorize them. Maybe I can help as well.

------
moultano
This doesn't seem like it is working yet. Most of the validation set images
are flat sepia with a random blotch of blue.

------
haosdent
could it become a cloud service?

~~~
jeffjose
It could, but I believe the author intended to show a use case with this
blog post. If you look closely, the results are far from satisfactory. As
methods and techniques evolve, this could become a cloud service.

------
majidvision
Hi,

The model is not available!

------
CrowFly
I wonder if the "deep learning" is getting any hints from the Bayer patterns
that may be detectable even in the desaturated image. It would be interesting
to see what it did with a true B/W sensor, a Foveon sensor, or a scan of
black and white film.

On a related "deep learning" topic:

We do a lot of work with 3D modeling tools and scanning large objects,
including people, at our facility.

One thing we realized that should be possible with "deep learning" is taking a
standard human computer model, and configuring it to match the position and
shape of a scanned human model, or a photo of a human being -- in order to add
back missing bits. Imagine a website where you can upload a swimsuit image and
get back a computer generated nude image with the obscured body parts
replaced. This should be very doable today, and would make a very popular
website!

~~~
6502nerdface
This reminds me of a guy I interviewed recently who's been working at a
startup that creates detailed, 3D, biomechanically accurate models of women's
breasts, and associated UI, for use by plastic surgeons to show clients what
they could look like after different procedures, based on scans of their
bodies.

~~~
Roodgorf
Not to critique the legitimacy of your interviewee's work, but I can't
imagine another time this would be relevant on HN. [http://www.smbc-
comics.com/index.php?id=3971](http://www.smbc-comics.com/index.php?id=3971)

