Show HN: Open source colorizing grayscale images with NN

argonaut · on April 21, 2016

These results: http://richzhang.github.io/colorization/ from Berkeley are much better than this model (and the code is open source as well).

Furthermore, those better results have the advantage of a much simpler model. This model has a fairly complicated architecture (a complex residual concatenation setup) and many more parameters (I would guess anywhere between 2x-10x as many, but I'd have to take a closer look), which means it's much slower to run and takes up more memory (disk and RAM).

I'd also say that in general the better model does things that are a lot more common sense: using the CIE Lab color space (perceptually uniform), omitting pooling, using a classification loss instead of regression (regression in generally performs poorly in deep learning), etc.

logicrook · on April 21, 2016

And the thread concerning that much better model: https://news.ycombinator.com/item?id=11403653

im3w1l · on April 21, 2016

The authors of that point out that those are the especially successful cases. Presumably the the new article also showcases the success.

For comparability, I think it would be best if we could see outputs for the two models for the same, chosen in advance and not cherry picked, images.

specialist · on April 21, 2016

Idle thought: Could colorization be used to (lossy) compress full color images (usefully)?

What I imagine is full color input -> create B/W & color histogram (list of colors used) -> image viewer uses colorization algorithm to reapply colors.

adrusi · on April 21, 2016

I don't really know a ton about image compression, but I exported the same photo to jpg using the same settings twice: once with 3 channels and once desaturated and then with two channels hidden (I couldn't figure out how to make a 1-channel jpg in gimp, no sure if it's possible). The 1-channel export had less than 30% file size reduction.

I don't think a compression technique that would require that much processing power and have that little size reduction would be too useful.

maffydub · on April 21, 2016

A 30% saving is roughly what I would expect.

Commonly JPEG separates the image into 1 channel of luma and 2 channels of chroma. It then downsamples the chroma to half the luma resolution, meaning that you have twice of much raw luma data as you have chroma.

It then goes on to do a whole load of fun discrete cosine transform, quantization and huffman encoding, but to a first approximation, I'd expect them to compress roughly similarly on average.

Since you've got twice as much luma data as chroma data, dropping the chroma will only save you ~30%.

Since the predictions from the neural network are not always accurate, you could then encode the error in the compressed file, and restore it on decompression. This will generally be close to 0, so compression should be pretty good.

At the end of the day, 30% better compression would be great... if the processing overheads aren't too great - I think that's the key factor.

(Most of this is from memory based on some hacking I did to be able to transcode videos to play on a cheap Chinese iPod Nano clone, which used an undocumented variant of MJPEG. The default quantization tables for luma and chroma are different from each other. The iPod Nano clone was using the standard quantization tables but the other way round (so using the luma table for chroma and the chroma table for luma). I can only imagine this was a bug in their code, as it was bound to reduce their compression ratio/image fidelity.)

Lerc · on April 21, 2016

I think you could do quite well with seeding areas with known colours. Provide data of x,y,colour at a few sample points where the initial guess was too far out. A very small amount of data might allow for quite accurate results.

fmeyer · on April 21, 2016

I'm colorblind and I can't tell the difference between the prediction and the GT. My coworker says that they have slight differences in some colors.

Congrats, you wrote the first colorblind NN ever!

rimantas · on April 21, 2016

I'd say the differences are very well expressed.

m_mueller · on April 21, 2016

I'd apply it to an old B&W film to show off. Imagine Ben Hur or Citizen Kane in color!

orlp · on April 21, 2016

I see a potential problem with moving pictures - the neural network might decide that while red was an appropriate color for the truck in the last frame, blue is more likely in this frame.

xiphias · on April 21, 2016

You can use the previous few frames for fixing this problem for videos

thomasahle · on April 21, 2016

There might be a few scenes in between the viewings of the truck.

xiphias · on April 21, 2016

You're right, it would be a fun side effect to see the color change between different scenes :) I guess there's not enough in commercial interest in fixing these problems, but it can be probably done with the current algorithms. Precise 3D reconstruction is much more important than colour reconstruction.

pionar · on April 21, 2016

>it would be a fun side effect to see the color change between different scenes :)

Now that's a horse of a different color[0]!

[0] http://oz.wikia.com/wiki/Horse_of_a_Different_Color

m_mueller · on April 21, 2016

Would it be possible to have some sort of semantic naming coupled to it 'the car in this scene is red', that could be fixed by humans, maybe even by talking to the NN? That's when this stuff starts getting fun IMO - human language together with superhuman domain knowledge.

ThePaco · on April 21, 2016

Or even using the previously processed frames as reference the next time the shot appears

pgeorgi · on April 21, 2016

I'd imagine that many people wouldn't even notice (unless the truck's color is part of the plot)

caltrain · on April 21, 2016

Truck's color can not be part of the plot, its a B&W movie!

pgeorgi · on April 21, 2016

Protagonist: "Do you see that red truck?" light blue truck rolls through the scene

thomasahle · on April 21, 2016

Maybe it's about a truck painting shop, and every time they paint something, it either doesn't change color, or get a color different from what the customer ordered. I would watch that.

jcoffland · on April 21, 2016

You mean you would watch paint dry?

paulmd · on April 21, 2016

Actually, someone recently filmed exactly that - a 10-hour film of paint drying, as a protest against film censorship. The British Board of Film Classification awarded it a "U" rating: universal/suitable for all.

http://www.telegraph.co.uk/film/movie-news/paint-drying-bbfc...

It would have been funny to splice in a couple frames of porn throughout, Fight Club-style.

nxzero · on April 21, 2016

Are there any other issues or advantages that you're able to see that would be unique to movies? While I'm unable to think of a use, seems like there's the potiental for side channel analyis via music, dialog, background-noises, scripts, etc.

xiphias · on April 21, 2016

Script analysis would take much more effort probably, but it could work if the colour is in the movie script...in that case object detection gets important as well. Just using more training data to cover all the dog breeds and car colours would help a bit though.

WalterBright · on April 21, 2016

I suspect that if an artist selected colors for a few key frames, the computer could figure out the rest of the frames.

pbhjpbhj · on April 21, 2016

How would they know that the vehicles, say, were supposed to be the same colour. I'm thinking The Italian Job - the same model of cars are used and are distinguished by their colours. Also what about when different vehicles are painted by the production team to look the same (for stunts say), the computer could rightly recognise them as different vehicles - how would it then know that they're supposed to be the same.

For such things it seems you'd need to check every shot change.

WalterBright · on April 21, 2016

BW filmmakers were, of course, well aware that they were filming in BW and would select colors for sets and costumes that would look good in BW. For example, chocolate milk was used for blood. You wouldn't want a colorizer to determine the actual colors, but the intended colors!

Even today, directors rarely seem to want to film in actual color. They'll tint everything sepia, or that hideous blue-orange scheme that is so popular these days.

http://priceonomics.com/why-every-movie-looks-sort-of-orange...

WalterBright · on April 21, 2016

That TIJ cars were distinguishable only by color was made possible by filming in color. A BW movie would not make a film requiring distinguishing colors that filmed as equal shades of grey.

In any case, any such colorizing system would be designed to accept a bit of guidance here and there from the artist. This is much like when an OCR'd document needs a bit of touch-up.

And even if it wasn't perfect, many BW movies would be made much more watchable, like the 1927 Wings, which is crying out to be colorized (and have a soundtrack added).

m_mueller · on April 21, 2016

Even though already perfectly enjoyable, I'm thinking Kurosawa films like Seven Samurai and Yojimbo. Man that would be something.

WalterBright · on April 21, 2016

Yup. For what is possible, see reddit's colorizedhistory subreddit. The old pictures become much more interesting.

omtose · on April 21, 2016

>the same model of cars are used and are distinguished by their colours

Aren't we talking about a B&W film? In that case the colour (or its lack of) wouldn't communicate any information.

aab0 · on April 21, 2016

One idea here is to adopt a recent generative approach: a CNN which starts with two noise image inputs, and then repeatedly tweaks it plus a new noise image over multiple inputs until it does one last tweak to a final version. The noise serves as a RNG for making choices to the built-up image, I think. You could apply this recurrent idea to movies too: for the first BW frame, pass in a noise image and the BW frame, get out a C frame; now for the second BW frame, pass in the BW frame but also the C frame from before. The CNN may gradually learn to transfer colors from the C frame to the BW frame, thereby maintaining temporal coherency.

(Or you could just try to use a RNN directly and keep hidden state from frame to frame.)

datenwolf · on April 21, 2016

> Ben Hur

The widely known Ben Hur (1959), the one with the chariot race, already is in colour. Did you mean "Ben-Hur: A Tale of the Christ" from 1925?

WalterBright · on April 21, 2016

Part of the 1925 version was already in color (2 strip Technicolor).

m_mueller · on April 21, 2016

Alright, I forgot about this. For some reason I've seen B&W still shots lately and forgot how I saw it.

SyneRyder · on April 21, 2016

The title says it's open source, but I couldn't actually see what the license is? (Maybe I missed it?)

cyphar · on April 21, 2016

And it looks like you have to download a torrent in order to get something. But yeah, it's missing a license and is still proprietary.

walrus · on April 21, 2016

The torrent contains the parameters for another neural network, VGG16, that this network makes use of. VGG16 is a network developed by the Visual Geometry Group at Oxford[1]. They released their parameters under CC BY-NC 4.0 to save others from spending 2-3 weeks[2] training the network. Someone else converted those parameters to work with TensorFlow[3], which is what the torrent in this repository is.

I'm interested to see if neural network parameters become the new "binary blob". While in theory you could always retrain the network yourself, actually doing so takes a lot of work fiddling with the network's hyperparameters and requires significant computing resources.

[1] http://www.robots.ox.ac.uk/~vgg/research/very_deep/

[2] "On a system equipped with four NVIDIA Titan Black GPUs, training a single net took 2–3 weeks depending on the architecture." - arXiv:1409.1556

[3] https://github.com/ry/tensorflow-vgg16

syockit · on April 21, 2016

It is open source. You can see the source. The license is no-license, so you can (only) fork it. That way, people can clone from your username instead of the original author, although that doesn't make any difference as they're all hosted on GitHub.

lucideer · on April 21, 2016

You seem to be getting downvotes because you don't understand what the term "open source" means, but nobody has offered to explain so:

Open source does not just mean you can see the source. From Wikipedia[0]:

"Open-source software is computer software with its source code made available with a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose."

https://en.wikipedia.org/wiki/Open-source_software

nothrabannosir · on April 21, 2016

Not OP, but---I've been in software and OSS for a long time now, and I never knew this. I thought the term for that was Free Software.

In fact, looking closely at that definition, isn't practically nothing open source?

"Open-source software is computer software with its source code made available with a license in which the copyright holder provides the rights to study, change, and distribute the software to anyone and for any purpose."

I.e. public domain? Any other license lives precisely to limit those rights of distribution, no?

Anyway, just pedantry. I see your point. Learn something new every day.

lucideer · on April 21, 2016

I think you've highlighted an inaccuracy, or at least a lack of clarity, in the Wikipedia article. I guess a Wikipedian could argue it should be interpreted as "for any purpose depending on the chosen licence", but that's definitely not clear.

Re: Open source vs Free software, this is a matter for debate (the tendency being too associate Free software with the more copyleft/viral/gnu-ish side of things), but I would definitely say that both involve explicit licencing, as opposed to potentially implicit " all rights reserved " style regional copyright.

Chris2048 · on April 22, 2016

I would usually use the term "FLOSS", for Free-Liberte, OSS"

The L part clarified the type of freedom, as the FL clarifies that it's not just OSS.

SyneRyder · on April 21, 2016

In some countries, code is copyrighted by default unless the author specifies a different license or explicitly puts it into the public domain. Without a license (or asking the author for permission), I have to assume this code is copyright & All Rights Reserved, and that I can't do anything with it except read it.

It's still interesting and cool to see! Just not what I thought it was when I clicked on the link.

BugsBunnySan · on April 21, 2016

Side note: "some countries" actually encompasses almost all the countries in the world, since source code falls under the Berne Convention: https://en.wikipedia.org/wiki/Berne_Convention, which automatically grants the author copyright on anything they make, unless they explicitly give it up...

x5n1 · on April 21, 2016

Can we get some instructions on how to actually use this. I want to test it out on pictures from:

https://www.reddit.com/r/OldSchoolCool/

mlsource · on April 21, 2016

unfortunately I'm not contributor of this project, but I hope author can answer soon )

g_sch · on April 21, 2016

I'd be interested to see what kind of hidden biases this might reveal, especially about people, e.g. skin tone, eye color, etc.

gcr · on April 21, 2016

Agreed. I wonder how good this model is at filling the skin tone of someone. It would be sad if (for example) it turned everybody into white people or something like that.

In biometrics, there's been similar cases of software like face detectors and face recognition working very well on people from China and not very well for other people, because all the researchers who trained those models only had available large public databases from Chinese universities. The model hadn't seen any other ethnicity so its performance on "non-Chinese" folks wasn't surprising.

jkrippy · on April 21, 2016

The boy at the beach looks very orange/blue to me and reminds me of this:

http://www.slashfilm.com/orangeblue-contrast-in-movie-poster...

gravypod · on April 21, 2016

I don't know if OP will see this but I'd love it if you could use your experience making this to write up a blog post about how it works or port it to being able to adjust color scales for color blind people.

I'd love to combine this technology with this: http://matplotlib.org/style_changes.html

You would probably have some cool results as you could generate examples of what they would look like to color blind people, and a corrected set so color blind people could see them.

Would be a cool, and I am assuming simpler problem then the one you have already managed to solve.

Good show, great work.

taneq · on April 21, 2016

I love the last one.

"But you didn't say what colour it was, so I made it a red truck."

avereveard · on April 21, 2016

Also the lighthouse's sky: "That's sky, it's surely blue!"

Generally it's interesting to see nn thinking out missing details. I'd like to see images with an element deleted and a nn filling in black spots to see what level of shape recognition could do.

Coincoin · on April 21, 2016

I'm pretty sure if you asked a group of humans to colorize the lighthouse you would probably get the same result.

There is no way, from the greyscale, to know that the sky should be orange.

mosselman · on April 21, 2016

Can I try this out in images myself somehow?

amelius · on April 21, 2016

I'm looking for a technique that can upscale images nicely (using NNs).

I've found this: [1], but the results seem somewhat disappointing. One of the problems is that the quality measures are (in my case) subjective (the results should look convincing but need not be "perfect", whatever that may mean).

[1] http://engineering.flipboard.com/2015/05/scaling-convnets/

leichtgewicht · on April 21, 2016

I think this was on HackerNews before?

astrosi · on April 21, 2016

Are you thinking of this project instead? http://richzhang.github.io/colorization/

liotier · on April 21, 2016

Yes - how do those two projects differ in their approach ?

landmark2 · on April 21, 2016

https://news.ycombinator.com/item?id=10864801

leichtgewicht · on April 21, 2016

Thanks, I think that is what I meant.

Gigablah · on April 21, 2016

Now give it a picture of a bike shed.

illumen · on April 21, 2016

I think a tool shed is more appropriate.

georgemcbay · on April 21, 2016

I think it was a reference to this concept:

http://bikeshed.com/

baq · on April 21, 2016

no, it was about this one: https://en.wikipedia.org/wiki/Bikeshed_color

walrus · on April 21, 2016

How about we stop arguing about which link to use and just pick one? I propose http://amzn.com/0345282779. Of the three links proposed so far, this one is clearly the most authoritative.

abdulhaq · on April 21, 2016

It came out red :-(

gwern · on April 21, 2016

Reddit: https://www.reddit.com/r/programming/comments/4ft4hj/colorne...

SamDLC · on April 21, 2016

It may be interesting to investigate using this for image compression/decompression. ie starting with an image having a greatly reduced color space, could the NN reproduce the original image?

nitrogen · on April 21, 2016

I wonder if this could be used to improve the proposed color prediction in Daala, or if the memory and CPU requirements are too high to put into a video codec.

amelius · on April 21, 2016

How would this system colorize this image?: http://imgur.com/RBEht9H

WhitneyLand · on April 21, 2016

What's the application and/or interest in colorizing images? Does it generalize to other types of problems?

new_hackers · on April 22, 2016

The sky is blue, grass is green, trucks are red, and everything else is a shade of brown :-) pretty cool work!

mushmouth · on April 21, 2016

This looks pretty cool. Do any photography people or film people can use this?

mkagenius · on April 21, 2016

Can this be used to train a model which can un-blur an image?

amelius · on April 21, 2016

Does it color the sea blue and the grass green?