
Colorizing and restoring old images with deep learning - varal7
https://github.com/jantic/DeOldify
======
nine_k
The most interesting exhibit for me is "People watching a television set for
the first time", where everything is colorized _except the TV image_, which
correctly remains B&W. I wonder what kind of a training set provided the
neural network with this notion.

~~~
citnaj
Author here- So I'll just be brutally honest on that one- not all renders are
doing that. I cherry-picked the one that did because yeah, it's amazing.
There's a simple explanation for why it sometimes doesn't pick up on the guy
on the TV to color him: the source material is fuzzy and small.

I wish I could claim it was something more awesome than that but that's the
truth! I'm treating these outputs as an art of selection to a certain extent
because it's simply not 100% consistent yet. That's one of the things I'm
going to continue to try to improve upon.

~~~
sytelus
If you'd like to be brutally honest, you should post a randomly selected set
along with the hand-picked set, labeling each set by how it was selected.
This is a cancer in current deep learning research. You see papers with such
glowing, cool examples, but in reality they are just hiding all the
problematic cases while being fully aware of it. If this happened in any
other domain, people would say they got ripped off and outright lied to.

~~~
citnaj
I understand the frustration and in fact share it to a certain extent with
science in general. Keep in mind that this wasn't intended to be published as
a paper or anything like that. I'm just a software engineer who picked a
problem and found a pretty cool solution.

Primarily I thought it was cool because it should be useful in many other
image modification domains. And then it blew up in popularity today (didn't
expect that). But yeah in the notes in the readme at github I do say this:

>To expand on the above- Getting the best images really boils down to the art
of selection.

I added that after getting some feedback similar to yours, because before
that, this disclaimer wasn't quite cutting it apparently:

>You'll have to play around with the size of the image a bit to get the best
result output.

So yeah, I'm trying to stay honest here. I'm not going as far as picking
completely random samples, admittedly, but really what I'm trying to drive at
here is that you can produce cool results with this tool. It's not perfect,
but it's a tool. And even if you pick at random, they still look pretty damn
good. It's just that sometimes it renders the TV in color and sometimes it
doesn't, and I picked the cool option.

------
iforgotpassword
I have to admit I have no clue about machine learning, but what I notice is
that this seems to have preferred colors for things that can actually have
many different colors, most notably clothes. They're almost always this
bluish, slightly purple color here, even the samurai's. Don't get me wrong,
this is still awesome and I might try this on some old photos from my
grandparents. I'm just wondering if and how one can prevent these things from
picking this one ideal color for something and instead have it randomize a
bit, since obviously you can't really know what color some jacket really was.
(Except maybe if the picture is a black and white photo of a PAL TV program.)

~~~
dsfyu404ed
The Seneca Native in 1908 example seems the most absurd to me. I know the
software has no notion of "fabric" or "clothing", but it's very rare for
brown or beige things to fade to blue (or vice versa). In real life, when
things transition from brown/beige to another color, that other color tends
to be a red, orange, or yellow. I know from the known issues that it likes
blue, but it still seems very odd that it chose to fade from brown to blue
like that.

~~~
onemoresoop
To me the Seneca native's skin on the hand seems a bit too reddish. I find
these photos to have very high saturation. I think this could be adjusted to
get subtler effects. It's still amazing that this is possible with no human
intervention, but at the same time, from a different perspective, I find that
the originals have their own charm, and I would leave them as they are.

~~~
draugadrotten
Is it unthinkable that the Seneca girl actually had her hand painted red, for
decoration or as a symbol of something? Perhaps her father/brother etc. was a
fighter and this was a way to keep spirits up while he was in the war?

~~~
blackflame7000
If you compare the tone of her hand to her face in the black and white image
you can clearly see that the colors are different.

------
crazygringo
Very interesting... seems to basically learn that:

Faces -> some variety of flesh-colored from light to dark

Fabric/clothing -> blue

Sky -> blue

Vegetation -> green

Wood -> brown

Blank -> turquoise or tan

Small details -> fascinating variety of colors, but often a brilliant red

Which all seems fairly reasonable. For many things (like wood or skin) it
seems accurate.

Obviously things like clothes come in such a variety of colors that there's
simply no way at all to predict accurately, zero meaningful signal -- so if it
settles on whatever the most common color is, it doesn't surprise me that
would be blue.

~~~
citnaj
Author here. Yeah, you're basically right. GANs vastly improve the situation
though, because being safe with "green for grass, blue for sky, brown as
default" doesn't work in the generative-adversarial setting. The critic will
assign lower scores if the generator keeps outputting brown. Now, I'd think
the generator would get more creative than going for blue constantly, but
that might just be a matter of more/better training (...?)

------
robarr
I know this is HN and we always hope machines will help us everywhere, but I
suspect (and hope, too) that the human perspective will always be needed.
Photography is as subjective as anything can be.

By hand colorization:
[http://www.marinamaral.com/portfolio-2/](http://www.marinamaral.com/portfolio-2/)

------
eagsalazar2
When I saw "restoring" in the title I was expecting higher resolution. For
example, seeing eyelashes, wrinkles, etc. in modern-photo-level detail. I get
that, like the colors, this would require adding lots of made-up information
about scene and feature details, but IMO it would blur the lines between
restoration and reconstruction/storytelling in a really awesome way. Old
photos are cool in their own way, but their lack of detail makes them seem so
alien. It would be exciting to get a hyper-real reconstruction.

Are there examples of ML doing something like that? (also know little about
ML)

~~~
slg
This reminds me of a recent 99% Invisible episode [1] in which they discuss
the same topic in the world of dinosaurs. It details how dinosaurs used to be
depicted with the goal of only showing the things that we are confident in
being true (although what we are confident in obviously changes over time).
This results in mostly just greenish-brown skin draped over a muscle structure
attached to the fossilized skeletons.

In recent decades there has been a push to show the animals more
realistically. The fossilized evidence is studied and compared to the skeletal
structure of animals that exist today. Inferences and educated guesses are
made from there to project a more realistic but more subjective image of the
dinosaurs. We now get much more varied and interesting depictions with
feathers, bright coloring, fat deposits, and other features that can neither
be completely confirmed nor ruled out based on the evidence.

[1] - [https://99percentinvisible.org/episode/welcome-to-jurassic-art/](https://99percentinvisible.org/episode/welcome-to-jurassic-art/)

~~~
eagsalazar2
Hah, yeah, I just listened to that a couple of days ago but didn't make the
connection. It was probably rattling around my subconscious when I wrote this
question, because yeah, it is very similar. That's a particularly interesting
comparison too, because the whole point was that just filling in
conservatively based on experience misses a ton of real-world crazy and
interesting diversity. The best example was how, if we were imagining what
elephants looked like just based on their fossilized skeletons, they wouldn't
have trunks!

------
userbinator
One of the reasons why these photos look so convincingly realistic is the same
reason
[https://en.wikipedia.org/wiki/Chroma_subsampling](https://en.wikipedia.org/wiki/Chroma_subsampling)
is done --- the human eye has less sensitivity to colour resolution, and so
even relatively vague blobs of colour can evoke the right perception as long
as there is sufficient _luma_ detail (provided by the original monochrome
image); but if you inspect the photos closely, you'll see there are plenty of
unnatural gradients in clothes and such, and the colours of objects blend into
each other.

~~~
sixdimensional
This was my thought too - it may not matter if the colors are 100% accurate
as long as they are enough to trick the human eye and brain into filling in
what's missing. Besides, the reality is, these are not color source photos
and never will be. A black and white photo does not contain the color
information; it was never captured. All we really can do is use historically
accurate colors, and AFAIK that is the same thing professional recolorists do
as well.

Really neat work!

------
jbattle
This seems almost too good to be true. One thing I find very striking is how
plausible it gets skin tones across people of different ethnicities (though
the majority of subjects in the pictures appear to be of European descent).

Unless a) my brain is applying more interpretation to these pictures than I
realize or b) the author (intentionally or not) picked out pictures that show
the best results

~~~
drcode
Yeah, some of the details are absurdly good, especially the picture of the
"Texas Woman": how it gets the dog's ears perfect, perfect colors on the
apples, and renders the copper pot a perfect copper hue.

~~~
aembleton
Unfortunately her hands are grey. Otherwise, it is very good.

------
user-x
It's interesting to see how the algorithm seems to turn aerial photos into
romantic paintings. My guess is that the model was trained on mostly up-close
photos and that the colors don't map exactly to aerial photos because color
intensity fades over large distances.

On that note it is cool to see how the algorithm does work for both indoor and
outdoor photos. Indoor settings tend to have dark backgrounds and outdoor
settings have light backgrounds.

Very cool project.

------
searine
Colorizing single images will always be a bespoke task. There is just too
much missing data in the image to be able to create high-quality
colorizations from the photo alone.

However, I think the real application here is colorizing frames of movies.
Imagine being able to turn black and white historical footage into color. It
won't be as good-looking as a single image, but it would be good enough, I
bet.

~~~
bpye
I imagine applying this to video will perhaps work better than on stills;
motion helps to hide errors.

------
rhplus
There's an active subreddit dedicated to manual colorization:

[https://www.reddit.com/r/Colorization/](https://www.reddit.com/r/Colorization/)

------
Sol-
As amazingly plausible as the pictures look, I personally have some dislike
towards such applications (nothing against the author, of course; just about
ML in general) because I always feel a bit as if I'm being duped by the
neural net. When I see image restoration, I'm subconsciously expecting
historical fidelity, even if I'm just marveling at the nice colorization. But
of course such historical accuracy is not the primary goal of the GAN.

Maybe another cool avenue to explore would be combining models like this with
some NLP approach that parses a historian's rough description of how the scene
should be colored and biases the generator with prior information that way.
(Maybe related to visual question answering or something.)

------
neom
15 years ago I was in the first cohort of a brand new college program in
Digital Imaging Technology. I spent the cost of a college diploma and over
10,000 hours learning to do this by hand. Now it's AI on Github for all. The
Times They Are A Changin'

------
michrassena
This is one of the few colorizing algorithms that I've seen which creates
desirable output. The images really do look like old colorized images. I
wonder how the authors dealt with the differences in spectral sensitivity of
their source material. There's clearly some orthochromatic plates or film
being used. The image of the Seneca native 1908 is a good example. Notice how
dark the field is on the patch on her skirt. With orthochromatic emulsions,
the patch could have been either black or red since the emulsion isn't
sensitive to red. It's most sensitive to blue, which is part of the reason
skies look so white in old photos.

~~~
citnaj
Author here. Easy to answer that one- altering the training photos with
random lighting/contrast changes (yet keeping the color targets the same)
really helped to deal with varying qualities of photos. But also, neural
networks are just particularly good at picking up on context, so that has a
lot to do with why the results are so robust.

------
WalterBright
The colorized photos on
[https://www.reddit.com/r/Colorization/](https://www.reddit.com/r/Colorization/)
are just marvelous. If that could be combined with the AI colorization to
colorize old BW movies, that would make them so much more watchable. Other
attempts at colorizing them, like what Turner did in the 80's, were a
commendable attempt but didn't turn out well.

------
dsfyu404ed
Is it just me or do the example images seem overly biased toward coloring
clothing as blue?

~~~
eagsalazar2
Well, it has to pick something, right? Am I wrong in thinking the color,
except for very specific known items, is simply lost and can't be inferred by
any level of intelligence? Maybe the solution is "if I_HAVE_NO_IDEA ->
randomColor()", which I realize doesn't jibe with how ML works (does it?)

------
neuromantik8086
> And yes, I'm definitely interested in doing video

As someone familiar with the libraries space, I'd actually be very interested
in seeing a machine learning model that could "clean up" old film (I've
actually brought this up with several of my ML friends occasionally). One of
the biggest challenges in the world of media preservation is migrating
analogue content to digital media before physical deterioration kicks in.
Oftentimes, libraries aren't able to migrate content quickly enough, and you
end up with frames that have been partially eaten away by mold.

As a heads-up, these are some of the problems you might encounter on the film
front (which you might not otherwise find with photos due to differences in
materials used, etc):

[https://www.nyu.edu/tisch/preservation/program/05fall/physical-properties.pdf](https://www.nyu.edu/tisch/preservation/program/05fall/physical-properties.pdf)

[https://www.filmpreservation.org/preservation-basics/vinegar-syndrome](https://www.filmpreservation.org/preservation-basics/vinegar-syndrome)

~~~
nkoren
I believe that Peter Jackson's recent endeavour in cleaning up WW1 footage
employs significant ML for de-noising, frame interpolation, and colorising. I
haven't seen the final film, but some of the clips are staggeringly good:
[https://www.bbc.com/news/av/entertainment-arts-45884501/peter-jackson-lord-of-rings-director-s-ww1-movie-they-shall-not-grow-old-opens](https://www.bbc.com/news/av/entertainment-arts-45884501/peter-jackson-lord-of-rings-director-s-ww1-movie-they-shall-not-grow-old-opens)

Edit: Here's maybe a better link --
[https://www.bbc.com/news/av/entertainment-arts-45803977/peter-jackson-world-war-one-footage-brought-to-life-by-lord-of-the-rings-director](https://www.bbc.com/news/av/entertainment-arts-45803977/peter-jackson-world-war-one-footage-brought-to-life-by-lord-of-the-rings-director)

~~~
lcrs
I'm actually not sure much ML was involved here - depends where you draw the
line I guess, but denoising and interpolation for restoration typically use
more traditional wavelet and optical flow algorithms. The work for this was
done by Park Road Post and StereoD, which are established post-production
facilities using fairly off-the-shelf image processing software. The
colorisation likely leant heavily on manual rotoscoping, in the same way that
post-conversion to stereo 3D does.

I'd love to hear otherwise but I'm not aware of any commercial "machine
learning" for post-production aside from the Nvidia Optix denoiser and one
early beta of an image segmentation plugin.

~~~
nkoren
Huh, I recall seeing an article at one point (can't find the link) where it
said or suggested that ML was involved. Of course this could have just been a
journalist failing to make the distinction; I've seen everything from linear
regression on up naively lumped into the ML bucket.

In any case the results are damned impressive -- can't say I've seen anything
like it before.

------
MayeulC
This is quite interesting!

The pictures were basically perfect to my eyes, until I scrolled down to the
"gotchas" section, at which point I started to notice a lot of details that
are wrong, mostly fading colors, on clothes or otherwise.

Now, there seems to be a distinct loss of detail in the restored images. The
network being resolution-limited, is the black-and-white image displayed at
full resolution beside the restored one?

What I would like to see is the output of the network to be treated as
chrominance only.

Take the YUV transform of both the input and output images, scale the UV
channels of the restored one back up to match the input, and replace the
original chroma channels. I'd be really curious to look at the output (and
would do it myself if I were not on a smartphone)!
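In Python terms, the recombination step can be sketched like this (a rough
NumPy sketch rather than Octave; the function names are mine, the BT.601
coefficients are the standard ones, and it assumes the two images are
already aligned and the same size):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert a float RGB array (H, W, 3) in [0, 1] to YCbCr (BT.601)."""
    m = np.array([[ 0.299,     0.587,     0.114],
                  [-0.168736, -0.331264,  0.5],
                  [ 0.5,      -0.418688, -0.081312]])
    ycc = rgb @ m.T
    ycc[..., 1:] += 0.5  # center the chroma channels around 0.5
    return ycc

def ycbcr_to_rgb(ycc):
    """Inverse of rgb_to_ycbcr, clipped back into [0, 1]."""
    ycc = ycc.copy()
    ycc[..., 1:] -= 0.5
    m = np.array([[1.0,  0.0,       1.402],
                  [1.0, -0.344136, -0.714136],
                  [1.0,  1.772,     0.0]])
    return np.clip(ycc @ m.T, 0.0, 1.0)

def merge_luma_chroma(original_gray, colorized_rgb):
    """Keep the original luma; take only the chroma from the model output.
    original_gray: (H, W) float in [0, 1], the sharp B&W source.
    colorized_rgb: (H, W, 3) float in [0, 1], the network's render,
                   already resized to match the original.
    """
    ycc = rgb_to_ycbcr(colorized_rgb)
    ycc[..., 0] = original_gray  # swap in the full-resolution luma
    return ycbcr_to_rgb(ycc)
```

Since our eyes barely notice the blurry chroma, the merged image keeps the
detail of the source photo while still looking colorized.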

Nevertheless, that's some awesome work, and I can't wait to see where it goes!

~~~
citnaj
Author here. That's actually what I find quite fascinating myself about the
results- that they look almost perfect at first glance, yet you drill down a
bit closer and you see another "zombie hand". The resolution issue you mention
is definitely something I'm painfully aware of- it just comes down to lack of
memory resources to support bigger renderings. That's going to be something
I'm going to try to attack next.

~~~
MayeulC
Hey, thanks for replying!

However, I feel like you glossed over the proposed workaround, which I feel
is appropriate (though more complicated if you want to implement "defade"),
and extremely easy to implement.

I took a couple of minutes to write an Octave script that implements the
workaround [1]; it would have been even easier if both images had already
been distinct files, and perfectly aligned.

The basic idea here is the same as the one behind the YUV transform: our
brains are much less sensitive to the chroma channels than the luma channel.
So I separate those, and keep the original luma channel, while I use the
reconstructed chroma, which is lower-resolution.

Judge the results by yourself, but it seems to me that the end results are a
whole lot better: [https://imgur.com/a/n2sBYCi](https://imgur.com/a/n2sBYCi)

And it could still be improved a lot more (by using the original high-
resolution image, and by not having to hand-align the images).

Edit: also, ironically, indigo dye (and thus blue clothes) didn't become
common before the 1900s [2], so the bias might produce historically
inaccurate images!

[1]
[https://gist.github.com/MayeulC/626bafbaf925fb3a3c80fdba76b7...](https://gist.github.com/MayeulC/626bafbaf925fb3a3c80fdba76b7e8be)

[2]
[https://en.wikipedia.org/wiki/Indigo_dye#Synthetic_indigo](https://en.wikipedia.org/wiki/Indigo_dye#Synthetic_indigo)

~~~
citnaj
Oh shit yeah that really does look good! Amazing really. Ok...I'm going to put
these notes on the project board.

Yes..I definitely glossed over the proposed workaround and I apologize. Thanks
for this.

~~~
MayeulC
No problem :)

Although I would have made it a fully-fledged GitHub issue, with a link in
your board instead of a text entry, so that supplementary material could go
in the issue thread.

Bonus: if you are only interested in chrominance, you can train your network
to use YUV as an input instead, and output only UV. I suspect this might lead
to substantial gains in the training time and network complexity.

~~~
citnaj
Update: I got this working, and dude- it's so awesome in every way. This is
the most substantial improvement I've seen yet. Most importantly- it massively
reduces memory requirements. Thank you so much. I'll commit within a day or so
and make sure to mention you, on Twitter.

~~~
MayeulC
Hey, thank you a lot, that's awesome! One more thing I recently thought
about, but didn't get around to mentioning, is that you can probably reduce
the input of your net to the Y (luminance) channel (with UV-only output), to
trim it further ;)

But that might already be what you are doing, for all I know. I am just
really glad I could be of any help! And this feels like a "free-lunch"
improvement.

------
keehun
>> BEEFY Graphics card. I'd really like to have more memory than the 11 GB in
my GeForce 1080TI (11GB). You'll have a tough time with less. The Unet and
Critic are ridiculously large but honestly I just kept getting better results
the bigger I made them.

Wow!

~~~
taeric
This basically sounds like "the more I let the machine memorize, the better it
did." Not necessarily bad, mind you, just amusing.

------
m3at
This is a cool application of ML. Not to diminish the work, just to point out
that humans are more sensitive to luminance than color (hence YUV encoding [1]
and others), so it might make inaccuracies less visible.

For example, in "Interior of Miller and Shoemaker Soda Fountain, 1899" the
colors from the counter and chairs blend, but the luma helps our eyes
separate them.

[1] [https://en.m.wikipedia.org/wiki/YUV](https://en.m.wikipedia.org/wiki/YUV)

------
lewiscollard
This is great work! And,

> The model loves blue clothing. Not quite sure what the answer is yet, but
> I'll be on the lookout for a solution!

Just throwing a thought out here that you might have considered, but, maybe
it's because traditional black-and-white film is over-sensitive to blue? It's
why when one uses traditional black and white films one usually uses at least
a yellow filter and if you have blue sky in a shot you use a red filter. This
may or may not be useful; either way, keep up the awesome work!

------
mrtron
Anyone else left wondering if the hand really was red in the photo?

~~~
blauditore
I think the issue is that the hand is round-ish and surrounded by wood
texture; the model might apply learnings from photos of apples or other
fruit on trees.

------
joshvm
How stable is the result with respect to augmentation?

If you get an image with a funny artifact, like a super-red hand, can you fix
it by running the network on a slightly augmented image? For this kind of
work, it seems reasonable that you could keep re-colorising an image until you
got one that was acceptable (as in the case with the B+W TV).
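One cheap way to experiment with this idea (a sketch, not something from the
project; `colorize` here is a hypothetical stand-in for the model) is to
render several slightly jittered versions of the input and average them,
which tends to wash out one-off artifacts that only appear for a particular
augmentation:

```python
import numpy as np

def stable_colorize(gray, colorize, n_tries=8, seed=0):
    """Average the colorizer's output over slightly jittered inputs.
    gray:     (H, W) float array in [0, 1], the B&W source.
    colorize: hypothetical model call, (H, W) -> (H, W, 3) RGB in [0, 1].
    """
    rng = np.random.default_rng(seed)
    renders = []
    for _ in range(n_tries):
        # small brightness jitter as a cheap test-time augmentation
        scale = 1.0 + rng.uniform(-0.05, 0.05)
        renders.append(colorize(np.clip(gray * scale, 0.0, 1.0)))
    # averaging keeps the colors the renders agree on and suppresses outliers
    return np.mean(renders, axis=0)
```

Picking the single best render by eye (as the author does) is the manual
version of the same loop; averaging just automates a conservative choice.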

------
amelius
Seems like an easy problem for DL, as you have an enormous amount of data
available (just take any color image, convert it to grayscale and you have a
pair of training images).

(This is also the case for e.g. the superresolution problem.)
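As a concrete illustration (a minimal sketch; the BT.601 luma weights are the
standard ones, not necessarily what DeOldify uses), generating a training
pair from any color photo takes only a few lines:

```python
import numpy as np

def make_training_pair(color_img):
    """Turn any color photo into a (grayscale input, color target) pair.
    color_img: (H, W, 3) uint8 RGB array."""
    rgb = color_img.astype(np.float32) / 255.0
    # standard BT.601 luma weights for the grayscale conversion
    gray = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return gray, rgb
```

Run over any photo collection, this yields unlimited supervised examples with
no human labeling at all.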

~~~
bitL
You probably need an enormous GPU (24GB RAM) as well, to make as large a
model as possible for as good generalization as you can get (there are so
many different types of objects/surfaces/fabrics and their compositions).

~~~
taeric
Something amusing about needing a ridiculously large model to claim good
generalization. Analytical models typically go the other way, right?

~~~
bitL
It's deep learning; it doesn't have much to do with any analytical model, and
it's not thinking like a human :-(. Recently even good NLP processing needs
24GB+ for training (it won't fit into 16GB), and good-quality colorizing (no
spills, natural colors) could be expected to be as demanding.

From the article:

"BEEFY Graphics card. I'd really like to have more memory than the 11 GB in my
GeForce 1080TI (11GB). You'll have a tough time with less. The Unet and Critic
are ridiculously large but honestly I just kept getting better results the
bigger I made them."

~~~
taeric
I get that. I just have a hard time thinking that is "generalizing" the model,
so much as making the model all encompassing.

~~~
tripzilch
It's the difference between training the model to be a _"generalist"_ and it
actually doing "generalizing".

I strongly doubt that you can "generalize" colourization in the sense that you
talk about (over a wide variety of subject matter).

------
varal7
There is a HN discussion about a similar model from 2016:
[https://news.ycombinator.com/item?id=11403653](https://news.ycombinator.com/item?id=11403653)

------
rangibaby
There is something wrong with that woman’s hand. It is either extremely
swollen, a glove, or not a real hand (wooden?). Perhaps your model didn’t make
a mistake after all

------
gardaani
How does it work for modern color photos which have been converted to B&W?
It would be interesting to see how the colors change compared to the original
color photo.

------
sigstoat
some nice research on computer-assisted colorization from some years back:
[http://www.cs.huji.ac.il/~yweiss/Colorization/](http://www.cs.huji.ac.il/~yweiss/Colorization/)

works on video. don't suppose anyone knows if a photoshop (or gimp) plugin of
it was ever made?

------
eveninglucifer7
It blows my mind what deep learning can do, however I wonder if this optimism
can be extrapolated into super ai...

------
rhizome
Facial recognition and recommendation engines don't work, let's see what it
can do for movies.

------
fireattack
The sample images seem too small to tell if it can color within edges
accurately.

------
ekianjo
Very nice, but it looks like the blues are kind of off in several pictures.

------
asow92
What happens with modern black and white photos?

------
mkagenius
> Granted, the model isn't always perfect. This one's red hand drives me nuts
> because it's otherwise fantastic

Plot twist: It was actually red in reality.

------
kinos
Can someone run this against some manga?

~~~
corysama
Google "colorize manga neural" and you'll find many projects for AI-assisted
manga colorization.

