Colorizing black and white photos with deep learning

gwern · on Jan 8, 2016

The 'averaging' problem is interesting and I think related to the Euclidean-style loss function: it is correctly minimizing the color distance between its gray-brown guess and all the possible shades? If that's what's going on, then messing with the architecture won't fix it because it's already reaching the 'right' answer and what is necessary is a better loss... Perhaps something like DCGAN where the loss function is another CNN which is being trained to guess whether an image is original or generated? (So you'd feed the BW image into the generator, which converts it into a colorized image; the colorized and the true color images are both fed into the discriminator, which tries to guess which is the real one. Then error is backpropagated into both.)
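A minimal sketch of the averaging effect, assuming a single gray object whose true color could be any of four made-up saturated colors (the RGB values are purely illustrative, not taken from the article). Under a summed squared-error loss, the best single guess is the per-channel mean, which comes out muddier than every real candidate:

```python
import colorsys

# Hypothetical "true" colors for the same gray object (RGB in 0..1).
# These values are invented for illustration only.
possible_colors = [
    (0.9, 0.1, 0.1),  # red
    (0.1, 0.1, 0.9),  # blue
    (0.1, 0.7, 0.1),  # green
    (0.9, 0.8, 0.1),  # yellow
]

def mean_color(colors):
    """Per-channel mean, which minimizes the summed squared error."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

def total_squared_error(guess, colors):
    """Sum of squared Euclidean distances from one guess to every color."""
    return sum(
        sum((g - c[i]) ** 2 for i, g in enumerate(guess))
        for c in colors
    )

def saturation(rgb):
    """HSV saturation of an RGB triple."""
    return colorsys.rgb_to_hsv(*rgb)[1]

best_guess = mean_color(possible_colors)
print("loss-minimizing guess:", best_guess)
print("its saturation:", saturation(best_guess))
print("candidate saturations:", [saturation(c) for c in possible_colors])
```

The guess has a lower total error than any of the real colors, yet it is less saturated than all of them, which is consistent with the idea that the loss, not the architecture, drives the dull output.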

Houshalter · on Jan 8, 2016

One other problem is that "real" black and white images don't come from the same distribution as greyscaled images. The training data is just regular photos that have had all the color channels combined and averaged together. Sometimes it's a weighted average, like green is given more weight than blue, because human eyes see green better.

However real black and white photos represent the actual intensity of the light waves. Black and white photos might be sensitive to light humans can't even see, or weight different "colors" very differently. Even two colors that appear the same to the human eye might be different intensities and frequencies, and so produce a different black and white photo.

I don't know how much of an effect this would have. But real black and white photos definitely look different than greyscaled color photos, and it might lead the NN to guessing the wrong color information.

derefr · on Jan 8, 2016

Interesting point; the "perceptual brightness" formula used in grayscaling calculations is usually this:

    sqrt(0.299r² + 0.587g² + 0.114b²)

Real "photosensitizing" chemicals, on the other hand, have spectral response curves, like this one[1] for the silver iodide used in daguerreotypes. (The formula above is just an approximation of the summed spectral response curves of the three cones in the eye.)

There's a big difference, like you said, but it's not an impossible problem; we should be able to create a greyscaling formula for any photographic process with a known response curve. (And honestly, I have no reason to think this hasn't already been done in the creation of an Instagram filter at some point.)

[1] https://books.google.ca/books?id=9IpaIAcgthQC&lpg=PA119&ots=...
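For comparison, here is a small sketch of the formula as quoted next to the plain weighted sum that uses the same three weights (the Rec. 601 luma weights). Which one a given tool applies, and whether it does so on gamma-encoded or linear values, varies; the function names here are only illustrative:

```python
import math

def gray_as_quoted(r, g, b):
    """The formula exactly as quoted above: a weighted root-mean-square."""
    return math.sqrt(0.299 * r * r + 0.587 * g * g + 0.114 * b * b)

def gray_linear(r, g, b):
    """Plain weighted sum with the same weights (Rec. 601 luma weights)."""
    return 0.299 * r + 0.587 * g + 0.114 * b

# Channel values are assumed to be in the range 0..1.
for name, rgb in [("white", (1, 1, 1)), ("green", (0, 1, 0)), ("blue", (0, 0, 1))]:
    print(name, round(gray_as_quoted(*rgb), 3), round(gray_linear(*rgb), 3))
```

Both agree on neutral grays, but the root-mean-square form renders saturated colors brighter than the plain sum does, so the choice of "greyscaling formula" already changes the training input.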

Houshalter · on Jan 9, 2016

Well not necessarily. If the response curves are different than that of a digital camera, then some information will be lost. In fact the response curves are different even for different cameras. But you could probably get a closer approximation using a better formula.

jacobolus · on Jan 9, 2016

> However real black and white photos represent the actual intensity of the light waves

No. Black and white negatives represent the intensity of the light (possibly with some lens filters in between) multiplied by some light frequency sensitivity distribution of the film, under some nonlinear time/intensity response function composed with another nonlinear time/chemical strength/exposure function from the chemical development of the negative.

Then to get from negative to print, black and white photos undergo another pair of multiparameter nonlinear functions (representing the exposure and chemical development process), possibly including intentional manipulation by a human operator.

Depending on the film, filters, development process, and printing techniques used, this can be relatively close or quite far from a naively converted 3-channel color picture.

For details cf. for example Ansel Adams’s books The Negative and The Print.

jrockway · on Jan 8, 2016

I've done a little bit of experimentation with this, comparing real B&W film to various software's conversions to B&W. Most interesting was a case where I took a picture of the NYC subway diagram with a red filter. I could never make the RGB data from the digital camera look like my original negative, even when shooting through the same red filter. (Which should not be necessary, if you want to block blue and green, you should be able to do that in software. Lightroom, at least, did not let me make a matching image.)

More subjectively, I think most digital pictures converted to B&W look kind of dull, whereas actual film looks very exciting to me. I haven't done any detailed research into this, but I'm not 100% convinced that collecting luminance through red, green, and blue filters can capture all the data that panchromatic B&W film captures.

(Even more of a tangent, one of the joys of B&W photography is that you can outright lie about colors and the photo still works. Try a red filter and watch the blue sky become black!)
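The red-filter effect can be roughly imitated with naive channel mixing. This is only a sketch of the software approach that, per the comment above, did not match real film; the filter transmission values and the sample colors are made-up assumptions:

```python
def luma(rgb):
    """Ordinary weighted-sum grayscale (Rec. 601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def filtered_gray(rgb, transmission):
    """Crude filter model: scale each channel by the filter's transmission,
    then average, assuming a film equally sensitive to all three channels."""
    weighted = [c * t for c, t in zip(rgb, transmission)]
    return sum(weighted) / sum(transmission)

RED_FILTER = (1.0, 0.05, 0.0)   # hypothetical: passes red, blocks most else
sky = (0.25, 0.45, 0.90)        # hypothetical blue-sky pixel
apple = (0.80, 0.10, 0.10)      # hypothetical red object

print("sky:  ", round(luma(sky), 3), "->", round(filtered_gray(sky, RED_FILTER), 3))
print("apple:", round(luma(apple), 3), "->", round(filtered_gray(apple, RED_FILTER), 3))
```

In this toy model the sky darkens and the red object brightens, which is the direction of the effect described, though real film and filter response curves are continuous spectra rather than three numbers.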

darkmighty · on Jan 8, 2016

I too am fascinated by this problem... Your adversarial proposal is very interesting. Let me try to confront the fundamental problem directly.

Even if you're given a perfect probability distribution over the space of images the solution wasn't obvious for me, mostly because we're used to thinking of a "best estimate".

The first thing you think of is giving the least-squares estimate (the average), but MMSE exhibits the problem shown.

So you might instead try a maximum likelihood estimate; but this too has problems: imagine every car is a slightly different shade of blue (none are quite the same, maybe the manufacturing is unreliable), except 1 in a million cars are red, but the red is very consistent. The ML estimate will pick the red car, which of course is unrealistic.

The optimal solution is simply drawing from the underlying distribution, instead of relying on a deterministic "best estimate": an outside observer won't be able to distinguish your generated samples from the true distribution. That's why the "Adversarial discriminator" should work.

I wonder if there exists a cost function that directly promotes sampling from the underlying distribution without needing the adversarial approach...
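A toy version of the car example, as a sketch under loud assumptions: hue is a single number, the population is scaled down to 10,000 cars, and "most frequent exact value" stands in for the maximum-likelihood pick. All constants are invented for illustration:

```python
import random
from collections import Counter

rng = random.Random(0)

BLUE_HUE, RED_HUE = 0.60, 0.00  # hypothetical hue values in [0, 1)

# 9,990 blue cars, each a slightly different shade, plus 10 identical red cars.
cars = [BLUE_HUE + rng.uniform(-0.05, 0.05) for _ in range(9990)]
cars += [RED_HUE] * 10

# "Best estimate" by most frequent exact value: the rare but consistent red.
most_frequent = Counter(cars).most_common(1)[0][0]

# Sampling from the population instead: almost every draw is some blue.
samples = [rng.choice(cars) for _ in range(1000)]
blue_fraction = sum(1 for h in samples if h != RED_HUE) / len(samples)

print("most frequent hue:", most_frequent)
print("fraction of sampled cars that are blue:", blue_fraction)
```

The deterministic pick lands on the one color almost no car has, while the samples look like the population.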

Houshalter · on Jan 8, 2016

You could have the net predict an entire distribution for each pixel. Like a mixture of gaussians or something. But then sampling from it would be incorrect. E.g. it might not know if the car is blue or red, so half the pixels would randomly be red, and the other half blue. It would look terrible.

Somewhere, the neural net needs to decide "this car is going to be blue" and then be consistent with that. Adversarial nets allow that, by having random inputs. One of the inputs to the NN is a random number, and that random number might determine if the car is going to be blue or red this time.
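The difference between the two sampling schemes can be sketched with a toy "image" that is just a list of pixel labels. This is not a neural net, only an illustration of independent per-pixel draws versus one shared random input; the function names are hypothetical:

```python
import random

COLORS = ("blue", "red")

def colorize_independent(n_pixels, rng):
    """Each pixel samples its own color: the car comes out speckled."""
    return [rng.choice(COLORS) for _ in range(n_pixels)]

def colorize_with_latent(n_pixels, rng):
    """One random input decides the color once; every pixel follows it."""
    z = rng.random()  # the single random input for the whole image
    color = COLORS[0] if z < 0.5 else COLORS[1]
    return [color] * n_pixels

speckled = colorize_independent(100, random.Random(1))
consistent = colorize_with_latent(100, random.Random(1))

print("distinct colors, independent pixels:", len(set(speckled)))
print("distinct colors, shared latent:     ", len(set(consistent)))
```

Calling `colorize_with_latent` with different seeds gives different but internally consistent results, which matches the "generate 10 different images and select the best one" idea.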

The cool thing about this is that it allows you to generate multiple samples. You can generate 10 different images and select the best one. And the adversarial nets should learn to approximate the true distribution as closely as possible. And I don't think there is any other method that can do that.

Another idea would be to just have a loss function that doesn't punish it for getting a wrong color, but rewards it only when it gets very close to the right color. This way the algorithm doesn't worry about producing muddy brown colors when it isn't sure; it just goes with a best guess.
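One way to read that idea is a "near-hit" reward: count only the guesses that land within a tolerance of the true value. The sketch below assumes a 1-D color coordinate, a made-up set of target values, and an arbitrary tolerance of 0.05, and compares the best constant guess under each objective:

```python
def squared_loss(guess, targets):
    """Standard summed squared error (lower is better)."""
    return sum((guess - t) ** 2 for t in targets)

def near_hit_reward(guess, targets, tol=0.05):
    """Count of targets the guess is within `tol` of (higher is better)."""
    return sum(1 for t in targets if abs(guess - t) <= tol)

# Hypothetical ground truth: 6 objects near 0.1, 4 objects near 0.9.
targets = [0.1] * 6 + [0.9] * 4
candidates = [i / 100 for i in range(101)]

best_by_squared = min(candidates, key=lambda g: squared_loss(g, targets))
best_by_reward = max(candidates, key=lambda g: near_hit_reward(g, targets))

print("squared loss picks:", best_by_squared)
print("near-hit reward picks:", best_by_reward)
```

Squared loss settles on the in-between value that matches nothing, while the near-hit reward commits to the more common color. A reward this flat has no useful gradient, so a trainable version would presumably need a smoothed variant.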

darkmighty · on Jan 8, 2016

When I referred to sampling I was talking about the joint distribution of all pixels (hence "space of images"), which would work fine. But I suspect predicting distributions is impractical, it may be better to use methods that sample directly without ever explicitly finding the distribution.

You do need a source of entropy to perform the sampling. This amount should be more than a minimum given by how precisely you want to sample from the continuous source, related to the Kullback-Leibler divergence of the distribution.

Houshalter · on Jan 9, 2016

Well in theory, the adversarial nets should learn to model the distribution perfectly. But there might be a way to do it directly. You could train an NN to produce samples from a random input source, just like the adversarial nets. But unlike the adversarial nets, the inputs don't need to be random. You could train another NN to predict a distribution of what they should be. And then instead of adversarial training, just regular training to predict the exact pixels, and backproping all the way through.

gwern · on Jan 14, 2016

Here's someone using DCGAN to fix blur: https://swarbrickjones.wordpress.com/2016/01/13/enhancing-im... I wonder if it would work for colorizing...

eutectic · on Jan 9, 2016

This would also hopefully solve the 'splotchiness' seen in some of the images, which presumably comes from the network optimising each pixel independently without caring about continuity.

beautifulfreak · on Jan 9, 2016

I do colorizations on Reddit. How about adapting this to assist manual colorizers? We'd mask out each color region by hand, so the neural net wouldn't have to decide what's what. We'd tell it, "this region is skin, this is a brown overcoat, this is a gold candlestick - please colorize them appropriately." As it is, colorizers fill in those regions with a very small number of colors, sometimes with gradient maps that blend two or three colors together, but there's just not much color variety in even the best works. Grass should be a hundred shades of green, but we'd use 2 or 3. Skin is the hardest to get right, and requires more individual hues, but that means maybe 5 to 10. It's enough to trick the eye, but on close examination looks more painterly than photographic. A neural net palette picker could be Photoshop's next big feature. An intelligent skin colored crayon might actually deliver all the shades of skin.

bootload · on Jan 9, 2016

> I do colorizations on Reddit. How about adapting this to assist manual colorizers? We'd mask out each color region by hand, so the neural net wouldn't have to decide what's what. We'd tell it, "this region is skin, this is a brown overcoat, this is a gold candlestick - please colorize them appropriately."

That's a smart idea. Master painters used to do this. Do the broad outlines, choose the colour and style then unleash the underlings (painters in training) to do the rest.

patmcguire · on Jan 9, 2016

If you could figure out a fleshed out version of the interaction loop - "this is skin, no, not like that, a little more like this" - that would probably be useful. The interface between AI and people is always a little weird - how could you tell it to do a small correction and have it be meaningful? What do two to three AI driven actions a second look like?

oilywater · on Jan 9, 2016

The thing you are talking about is similar to scene labeling (google the term and look at the pictures). Only you'd be doing it by hand, instead of letting the NN find its own representation. This is good because you'd be creating a dataset full of regions that we could use to automatically segment the pictures into useful regions later.

There's even a language mapping that you want, you want it to recognize image parts and associate it with a word, which also isn't simple, because you'd have to have a lot of labels (candles, hands, hairs, cars, trees, licence plates etc.)

It is a harder problem than the one in the article.

State-of-the-art scene labeling is still not good enough (close to 80% accuracy) but I believe it's due to lack of data because algorithms used combine neural networks with joint learning approaches such as conditional random fields to extract the regions.

sandworm101 · on Jan 8, 2016

Note the image with the wolves. I think the entire exercise is in that photo. This system seems very good at patterns. The animals do well when against natural backgrounds. That's because they match those backgrounds. Even though they are different colours, animals with fur all adopt some form of camouflage. So their patterns at some level match the patterns of their natural environment.

The wolves are rendered well, the flowers not. Wolves are camo. Flowers are the opposite. They want to stand out from the background. So the machine doesn't handle them well. The green stripe on the truck also fits this.

To take this idea forward, look at the image of the puppies against the grass. They are not camo. Their colour is the product of breeding, therefore they do not render so well as the wolves. There might be something useful here to measure whether or not an animal is being viewed in its natural environment.

munificent · on Jan 8, 2016

> Wolves are camo. Flowers are the opposite. They want to stand out from the background. So the machine doesn't handle them well.

There is a simpler explanation. Wolves only come in a few different colors. Flowers come in a variety of colors. Therefore, there are only a couple of correct answers for coloring a wolf, but a wide variety of completely incompatible answers for coloring a flower.

You are right that flowers come in a variety of colors because they want to stand out. But I don't think the neural net understands that. It just knows that a gray flower could be any color while a wolf is confidently going to be some kind of brown.

sandworm101 · on Jan 8, 2016

I wouldn't want to be the one to tell this guy that wolves are all 'some kind of brown'.

https://media2.wnyc.org/i/620/372/c/80/1/485198177.jpg

munificent · on Jan 8, 2016

Ugh. Seriously? Didn't we all discuss literally yesterday about pedantic misinterpretation of text on the Internet?

This article is about colorization, which means taking shades of gray and selecting a hue and saturation for them. The brightness is effectively fixed because, guess what, a black and white image can convey brightness already.

Obviously, black and white coloration on wolves falls outside of this because those are more or less already correct in the black and white image.

Now look at that picture you linked. What do you see? White: doesn't need much coloration. Black: uh, also doesn't need much coloration. Slightly brownish gray: like I said, wolves are all some kind of brown.

Show me a blue wolf, or a green wolf, then we'll have something interesting to talk about. But most wolves, like almost all mammals, have coloration pretty much limited to dull warm colors and tints and shades of those. Here's a picture for you:

https://en.wikipedia.org/wiki/Canis#/media/File:Canis.jpg

What do you see?

krapp · on Jan 8, 2016

>Didn't we all discuss literally yesterday about pedantic misinterpretation of text on the Internet?

No, we all literally did not.

nkron · on Jan 8, 2016

Very cool results. Seems like with some extra human intervention this would produce very good results.

Shameless plug: I built an online tool to colorize photos using WebGL. It's all manual but it's easy to get started and doesn't require any additional software. http://www.colorizephoto.com

ghrifter · on Jan 8, 2016

Awesome site! I like the Clint Eastwood "Color Picker" :)

tacos · on Jan 8, 2016

I'd like to commend the author for the tone of this article.

It's the right mix of "paper" and "blog post." It's an experiment that sort of flops and there will be a variety of "yay for tech!" and "I can't wait to see a movie where various body parts of people remain black and white!" in the comments here regardless.

Presenting it as an experiment, clearly explaining what you tried, then detailing some future thoughts and saying "it kind of works" was refreshing and honest. Thank you.

Robin_Message · on Jan 8, 2016

Great idea. I feel like an alternate color model could help, because of all the places where the color is arbitrary but strong, and so averages to a muddy sepia.

For example, the stripe on the truck should be bright and saturated, but the actual color doesn't matter.

The HSV colour space could work if the difference between colours is calculated with some kind of circular arithmetic.
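A sketch of what "circular arithmetic" on hue could look like, assuming hue is stored as a number in [0, 1) as in Python's `colorsys`. The helper names are illustrative:

```python
import math

def hue_distance(h1, h2):
    """Shortest distance around the hue circle, for hues in [0, 1)."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)

def circular_mean(hues):
    """Average hues as angles on a circle rather than as plain numbers."""
    x = sum(math.cos(2 * math.pi * h) for h in hues)
    y = sum(math.sin(2 * math.pi * h) for h in hues)
    return (math.atan2(y, x) / (2 * math.pi)) % 1.0

# Two reds on either side of the wrap-around point.
a, b = 0.95, 0.05
print("naive difference:   ", abs(a - b))
print("circular difference:", hue_distance(a, b))
print("naive mean:         ", (a + b) / 2)        # 0.5 is cyan, not red
print("circular mean:      ", circular_mean([a, b]))
```

Without the wrap-around handling, two nearly identical reds look maximally different and their plain average is the opposite hue.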

luminiferous · on Jan 8, 2016

I think the more fundamental problem is that the program is trying to minimize error where error is defined as deviation in color from the original image. This means objects that can be many different colors but are always strong and saturated average to a brown, as you said. And as you said, for those types of objects, it's best to pick a random color and make it bright and saturated. The best way to have it do that is to redefine error as some metric of how "realistic" the picture looks vs the original. For example, a picture of a car recolored to look bright blue looks similarly "realistic" to the human eye as the original picture where the car is bright red. The deviation in color is high, but it still looks "good", so the error should be low. I have no idea how this metric would be calculated without humans evaluating the output manually, though.

leereeves · on Jan 9, 2016

HSV might result in rather artistic results with appropriate saturation and unnatural hues. I'd love to see the results.

transcranial · on Jan 8, 2016

Nice work! Another demonstration of the amazing power of deep neural nets for transfer learning. For those interested, http://arxiv.org/abs/1511.06681 demonstrates experiments with video coloring using a 3D convolution-deconvolution network with some architectural elements similar to that used by the author.

eadz · on Jan 8, 2016

It's great to see really good use cases for this technology. I would be interested to see an old B&W movie colorised this way.

marknadal · on Jan 8, 2016

You guys! This is Ryan Dahl of NodeJS! He's back! When did he come back? So excited to see him around again, he's such a great guy (I had the honor of meeting him once).

Brilliant post too with an excellent write up. Can't believe people hadn't already been thinking about this.

logicallee · on Jan 8, 2016

VERY interesting - as soon as I read the title, I thought "finally, this will really tell me whether deep learning is bullshit or not!". Why? Because my understanding is that colorization is basically magic, like, how can you possibly get the color of a telephone handset out of a black and white image? In fact, you can't! Here's the set of the Addams Family:

http://www.fastcodesign.com/3021327/asides/the-addams-family...

Totally all over the map. So I thought this would be such a fantastic, fantastic way to compare my knowledge of the world with a punk algorithm's.

I didn't read the key/legend/explanation - as soon as I saw the first set of 3 images, I knew the middle lamp was the true color, and the right lamp was generated; because lampshades overwhelmingly look like the middle picture (in my mind) - https://www.google.com/search?q=lampshade not that weird blue color.

"Ha, stupid algorithm", I thought. "Who has a blue lampshade". This algorithm doesn't even come close.

Then I kept scrolling with that assumption, and it got worse and worse - wow that photographer's color is like blood orange, the algorithm doesn't even know it's a person! This is terrible. Where does that truck get that green trim, nobody would choose that, straight out of left field.

Until I got to the field with wolves. Why are the flowers' colors missing from the middle picture? This doesn't look right at all.

Then I read the caption. The middle images are the generated ones; the right-hand images are reality.

For 5 of 6 images, I thought that the generated image was "obviously" an actual photo, and much more plausible in colors than the right-hand real photos. Continuing to scroll, for the park bench also I think the middle image is much closer to how I imagine it.

So we are at the stage where an algorithm generates a much more plausible view of reality, with rare exceptions, than actual reality. This is pretty impressive.

TheOtherHobbes · on Jan 8, 2016

I had completely the opposite take-away - but then I have some experience of photography, and sadly, brightly coloured lampshades no longer surprise me.

I think it's a good illustration of the limits of statistical approaches, and why "It's harder than it looks" applies.

This is about as good as it's going to get without genuine object recognition, knowledge of real-world lighting and colour, and awareness of photographic styles.

It might be possible for a system to learn all of the above, but it's going to need a bigger and probably pre-partitioned training set, and a much more complex model.

Houshalter · on Jan 8, 2016

To be useful, it doesn't have to be perfect. It can have humans provide it with reference photos, select colors for different parts, or look at several different versions of the same image and decide which one looks best.

logicallee · on Jan 8, 2016

I'm curious, for you is the difference obvious? (is the right-hand one obviously reality in each case?)

eutectic · on Jan 8, 2016

[citation needed]

platz · on Jan 8, 2016

> an algorithm generates a much more plausible view of reality, with rare exceptions, than actual reality

Cue Baudrillard: https://en.wikipedia.org/wiki/Simulacra_and_Simulation

thenayr · on Jan 8, 2016

I believe your first assumption was correct: "The output is the middle image. The right image is the true color"

logicallee · on Jan 8, 2016

(edited that one part in my comment, but I think what I wrote is clear - I assumed the middle image was reality and the right-hand generated.)

gmt2027 · on Jan 9, 2016

This suggests a colorisation "Turing test" where website visitors have to tell which image they think is real.

samim · on Jan 9, 2016

Made a Gist, allowing you to apply this to Video: https://gist.github.com/samim23/5baaf1d206cf5e81436d And ran it on Chaplin as demo: https://www.youtube.com/watch?v=_MJU8VK2PI4

WalterBright · on Jan 8, 2016

I have a lot of old B+W photos, and various color photos of the same general subject. It would be way cool to have an app that I could use the color ones to teach it, and colorize the B+W ones. I don't mind that it wouldn't be perfect.

gwern · on Jan 8, 2016

> It would be way cool to have an app that I could use the color ones to teach it, and colorize the B+W ones.

You'd want the training to be done on the server, not locally. Setting up a full Torch/Caffe/Theano stack is not easy because there are so many libraries and moving pieces which must interact with Nvidia's proprietary blobs and libraries and ever-changing GPUs, that you can follow all the directions and it will either work or fail with an utterly inscrutable error. (For example, I'm running on an old Ubuntu because the newer Ubuntu is not officially supported, and my usual OS, Debian, just does not work no matter what I try.)

ymt123 · on Jan 8, 2016

There are actually dockerized versions of many (maybe all) of the deep learning libraries. The docker containers can take advantage of the GPU for training. You still have to install CUDA on the box (outside the docker container) but then you can try out different deep learning libraries.

Libraries we've started from in my lab: Caffe: https://hub.docker.com/r/kaixhin/caffe/ Torch: https://hub.docker.com/r/kaixhin/torch/ Theano: https://hub.docker.com/r/kaixhin/theano/

vegabook · on Jan 8, 2016

Wow, the real (rhs) images have so much more of an "emotional" appeal. Somehow the middle images, while I admire the tech, really don't add much to the black and white, and they might even take something away. They seem much too tentative, washed out, averaged. I would argue that in general, they're the worst out of the three.

Houshalter · on Jan 8, 2016

That's a problem with this specific architecture. If it's not sure what color something is, it goes with an average of all the colors. When a brighter color would be more realistic, even if the exact shade is wrong. Some people have suggested ways of fixing this above.

But just the fact it guesses the right colors at all is really cool. Previous automatic colorizations I've seen were very very crappy or required lots of human input. Or both.

And while these colorized photos do look a bit dull, I like them better than black and white. Something about black and white photography makes it look fake to my brain. It doesn't register the same way. Even really bad colorizations make images feel more real. I once saw very badly colorized video of WWI, and it was really fascinating. I actually felt like I was watching a real event that had actually happened. The same is true for these images.

CyberDildonics · on Jan 8, 2016

If anyone is interested in this and wants to learn more, start with natural image matting - that is the more fundamental research topic of which colorization is one use.

While I'm sure this is interesting to many people, the results here are extremely poor compared to modern techniques.

cfcef · on Jan 8, 2016

> the results here are extremely poor compared to modern techniques.

Could you link a demonstration of the much superior results?

CyberDildonics · on Jan 8, 2016

Search natural image matting or colorization and you will find a lot of results.

Take a look at Levin 2004 to start.

djfm · on Jan 8, 2016

This is what must be going on in our brains when we watch black & white movies, fascinating.

noobie · on Jan 8, 2016

On a tangent but any idea how the author's portrait was created? http://tinyclouds.org/

Edit: Thank you.

heed · on Jan 8, 2016

It looks like they ran the photo through Google's Deep Dream. There are a bunch of generators online, eg http://deepdreamgenerator.com.

fudgie · on Jan 8, 2016

Something like https://github.com/jcjohnson/neural-style which applies the style from one image to another.

eutectic · on Jan 8, 2016

By a version of Google's DeepDream sampling from a relatively low layer.

jayess · on Jan 8, 2016

In their list of validation images, I thought this one was pretty cool:

http://tinyclouds.org/colorize/val-imgs/val-006200-2.jpg

trizzashamafoo · on Jan 8, 2016

These results are clearly hand-picked to leave out complete failures like these:

http://tinyclouds.org/colorize/val-imgs/val-000100-1.jpg http://tinyclouds.org/colorize/val-imgs/val-000800-2.jpg

When I selected from the validation images, most had these blue splotches that were not shown on the web page. Obviously a model is not expected to work 100% of the time, but I think the link misrepresents the results by not even showing a single instance of this common failure.

edit: further scrolling shows it's less common than I thought.

gefh · on Jan 8, 2016

That's a 100-generation result and not expected to be that great, they do get better with more training.

degenerate · on Jan 8, 2016

For a second I thought the pics on the far right were the machine-created colors, and I thought, those are amazing! Then I realized it was the middle pic, haha, terrible.

Nav_Panel · on Jan 8, 2016

How might this model handle B&W images created with colored filters? What would the difference in output be? Many many film-era B&W photos used yellow (or perhaps even more dramatic, like red and blue) filters. Here's a small example of the difference: http://www.exposureguide.com/images/lens-filters/red-filter....

VikingCoder · on Jan 8, 2016

"Human wants me to do something... Uh... Brown? Yes? Brown? You like brown? BROWN!"

jcl · on Jan 9, 2016

Yeah, I got the same impression. The mapping it seems to have discovered is:

  vegetation texture -> greenish
  sky texture -> blueish
  everything else -> brownish

...which is not a bad set of defaults for most photos.

pythonlion · on Jan 8, 2016

lol, great work!

al626 · on Jan 8, 2016

Any explanation as to why there seem to be a lot of blue highlights on the extended validation set?

amelius · on Jan 8, 2016

> In the past few years Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision. Each year the ImageNet Challenge (ILSVRC) has seen plummeting error rates due to the ubiquitous adoption of CNN models amongst the contestants.

Am I right that the "Convolution" part only refers to the speed by which the models can be trained, and not to any other quality of these models?

yablak · on Jan 8, 2016

No. Most of the layers in a CNN perform convolutions with kernels. This is not the same as standard DNNs that do a full matrix multiply.

Convolutional kernels allow you to use many fewer variables to perform the forward layer operation, because CNNs tie these trainable variables across spatial positions within a layer. Training is not only faster, but also more robust because you have fewer parameters to learn.
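To put rough numbers on "many fewer variables", here is a back-of-the-envelope count for one layer. The layer shape (a 224x224 RGB input mapped to 64 feature maps with 3x3 kernels) is an assumption chosen for illustration, not a description of the article's exact network:

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input/output pair, plus biases."""
    return n_in * n_out + n_out

def conv_params(kernel_size, channels_in, channels_out):
    """Convolutional layer: one small kernel per in/out channel pair,
    reused at every spatial position, plus one bias per output channel."""
    return kernel_size * kernel_size * channels_in * channels_out + channels_out

H = W = 224  # assumed input resolution
conv = conv_params(3, 3, 64)
dense = dense_params(H * W * 3, H * W * 64)

print("3x3 conv, 3 -> 64 channels:", conv, "parameters")
print("dense layer, same in/out sizes:", dense, "parameters")
print("ratio:", dense // conv)
```

The convolutional layer's parameter count does not depend on the image size at all, which is where both the speed and the robustness come from.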

drcode · on Jan 8, 2016

What would be really awesome is a Reddit bot that takes the Highest-scoring image from the colorization subreddit and mixes it 50% with the deep learning results and posts it as a comment.

From the published results, I think a 50:50 mix of human and ML would likely yield the most naturalistic result in 90% of cases.

OldSchoolJohnny · on Jan 8, 2016

I think it might be possible to infer the colours of some objects based on how old black and white film worked because each type had different spectral sensitivity and probably has a signature that can be determined automatically. Once you know the film you might have a shot at inferring what colour that car is for example.

oilywater · on Jan 9, 2016

The thing about NNs is that they cannot encode structure, or, to be more precise, they cannot optimize over joint loss (except RNNs, but training is a pain). For example, the color of the current pixel can depend on the inferred color of the upper neighbors and the neighbors on the left (we would be coloring pixels from left to right, going row by row). The NN in the article is coloring locally without any kind of reference to what the whole picture looks like. An NN can encode the information about what pictures look like, since it has a lot of representation power, but that requires large amounts of data and parameter tweaking.

Optimizing over joint loss (maximizing the probability of the full image colorization) would work extremely well.

Tools like vowpal wabbit can easily be adapted to learn a chain classifier over colors and it should work insanely fast.

DanBC · on Jan 8, 2016

I'd be interested to see the output images compared to the output of "colour blindness simulators".

EijiK · on Jan 11, 2016

Test movies "Roman Holiday Trailer (1953)" https://youtu.be/pKtYv6cU8VE "Tokyo Story Trailer (1953)" https://youtu.be/-PsCZ1D_Brg "Metropolis Trailer (1927)" https://youtu.be/RAvtbcmusY8

Hydraulix989 · on Jan 9, 2016

Have you tried making the output a palette with potentially multiple color choices? I'm trying to think of a way of representing "it could be any of these colors, choose one." The problem is training it will be tough without being able to semantically segment objects that are the same but colored differently a priori.

I'd imagine the output would also have to be ordered, but this is easy -- just use the EM spectrum as an imposed ordering for the palette, and say, cap it at size 3 max with potential null hues for things that really only have one (or less than max) color(s).

Any ideas?

mjibson · on Jan 9, 2016

I made a docker image that can be run from the command line or as a website:

https://hub.docker.com/r/mjibson/colorizer/

frevd · on Jan 9, 2016

Great idea. Google should pick it up to colorize black/white movies and historical footage. Can easily be extended to feeding desaturated movies for proper training data, can maybe even remaster the quality.

kaivi · on Jan 8, 2016

I believe this works in the same way, just that there is no technical explanation: http://www.solargreencolor.com/

sparky_ · on Jan 9, 2016

I suppose I don't have anything of intelligence to add, but I did want to congratulate all involved, as the results are actually very impressive. Carry on!

awinter-py · on Jan 8, 2016

can this be used as a form of compression? is it cheaper to store color information as a delta on the guessed information?

cfcef · on Jan 8, 2016

Asymptotically, yes; prediction = compression (if you have a model for a bitstream which produces probabilities over the next bit, it can be fed into an arithmetic encoder and you now have a compressor). In this case, it's not practically helpful. A VGG is 528MB all on its own, so you need to compress a lot of images to make back that 0.5GB use plus runtime dependencies.
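A toy illustration of storing color as a delta on the guess. The pixel values and the "model" guesses below are invented, and empirical symbol entropy is only a rough proxy for what an arithmetic coder would achieve:

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical 8-bit channel values and a hypothetical model's guesses.
actual = [120, 130, 125, 200, 60, 90, 180, 150]
guessed = [118, 131, 125, 198, 61, 90, 181, 149]

# Store only the difference between reality and the guess.
residual = [a - g for a, g in zip(actual, guessed)]

# The decoder reruns the same model and adds the residual back.
reconstructed = [g + r for g, r in zip(guessed, residual)]

print("bits/symbol, raw values:", round(entropy_bits(actual), 3))
print("bits/symbol, residuals: ", round(entropy_bits(residual), 3))
print("lossless:", reconstructed == actual)
```

The better the guess, the more the residuals cluster around zero and the cheaper they are to encode, but the decoder must carry the whole model, which is the size problem noted above.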

im2w1l · on Jan 8, 2016

They don't necessarily use the same budget. The compressor can come prepackaged or be downloaded from a fast connection, and then be used when you have poor mobile internet.

cfcef · on Jan 10, 2016

They don't necessarily have to, that's true. But if there was any appetite to have gargantuan 500MB+ decompression libraries to save 20 or 30% on downloaded bytes while browsing, we would have seen much more uptake of existing simpler compression schemes like SDCH which takes a tiny step in that direction with a relatively large (but still tiny) pre-built dictionary for the WWW.

dzhiurgis · on Jan 8, 2016

I had exact same idea last week and even searched for some approaches today. Thank god someone did it for me though!

slantaclaus · on Jan 8, 2016

The world is your oyster, _max. Great post. I'm jealous I never learned linear algebra, etc in college

sciencesama · on Jan 8, 2016

Interesting. I would like to install and run the scripts. If that's possible, what is the process: Docker, AWS, or some other cloud? It would be awesome if you could provide the installation and running steps, plus how to load images and colorize them, like a guide. Maybe I can help as well.

moultano · on Jan 8, 2016

This doesn't seem like it is working yet. Most of the validation set images are flat sepia with a random blotch of blue.

haosdent · on Jan 8, 2016

could it become a cloud service?

jeffjose · on Jan 8, 2016

It could become one, but I believe the author intended to show a use case with this blog post. If you look closely, the results are far from satisfactory. As methods and techniques evolve, this can become a cloud service.

dharma1 · on Jan 8, 2016

colourmebrown.com

majidvision · on Jan 10, 2016

Hi,

the model is not available!

CrowFly · on Jan 8, 2016

I wonder if the "deep learning" is getting any hints from the Bayer patterns that may be detectable even in the desaturated image. It would be interesting to see what it did from a true b/w sensor, a Foveon sensor, or a scan of black and white films.

On a related "deep learning" topic:

We do a lot of work with 3D modeling tools and scanning large objects, including people, at our facility.

One thing we realized that should be possible with "deep learning" is taking a standard human computer model, and configuring it to match the position and shape of a scanned human model, or a photo of a human being -- in order to add back missing bits. Imagine a website where you can upload a swimsuit image and get back a computer generated nude image with the obscured body parts replaced. This should be very doable today, and would make a very popular website!

semi-extrinsic · on Jan 8, 2016

> I wonder if the "deep learning" is getting any hints from the Bayer patterns that may be detectable even in the desaturated image.

I'd say it's highly unlikely, since these images are downsampled to 224x224 pixels. That would average out any residual Bayer pattern (which is pretty hard to detect in the first place).
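A sketch of why downsampling would wash out such a pattern, assuming a residual that repeats every 2x2 pixels and a simple box-filter downsample by an even factor. Real resizers use other filters and non-integer factors, so this is only the idealized case, and the offsets are made up:

```python
def box_downsample(img, factor):
    """Average each factor x factor block of a 2-D list of floats."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# Hypothetical 2x2-periodic residual on top of a flat gray image.
pattern = [[0.02, -0.01],
           [-0.01, 0.00]]
img = [[0.5 + pattern[y % 2][x % 2] for x in range(16)] for y in range(16)]
small = box_downsample(img, 4)

spread_before = max(map(max, img)) - min(map(min, img))
spread_after = max(map(max, small)) - min(map(min, small))
print("pixel spread before:", spread_before)
print("pixel spread after: ", spread_after)
```

Every output pixel averages the same number of each pattern phase, so the periodic variation disappears.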

6502nerdface · on Jan 8, 2016

This reminds me of a guy I interviewed recently who's been working at a startup that creates detailed, 3D, biomechanically accurate models of women's breasts, and associated UI, for use by plastic surgeons to show clients what they could look like after different procedures, based on scans of their bodies.

Roodgorf · on Jan 8, 2016

Not to critique the legitimacy of the work of your interviewee, but I can't imagine another time this would be relevant on HN. http://www.smbc-comics.com/index.php?id=3971