I wish I could claim it was something more awesome than that but that's the truth! I'm treating these outputs as an art of selection to a certain extent because it's simply not 100% consistent yet. That's one of the things I'm going to continue to try to improve upon.
What I'd love to see in the future are compound networks where a few nodes like this can be mixed with a few nodes that extract vector data, a few others that infer depth maps from images, modulated by similarity detectors that match objects and individuals.
I'm very impressed by the work you've already done - I have a huge library of images I'd like to run it against for both forensic and aesthetic purposes.
I think the biggest problem with that picture is not the hand (it's very visible, but it could easily be fixed in post-processing); it's the blue shade in the clothes that just should not be there. Otherwise, the colors are great (the skin and everything else look very real).
Primarily I thought it was cool because it should be useful in many other image modification domains. And then it blew up in popularity today (didn't expect that). But yeah, in the notes in the README on GitHub I do say this:
>To expand on the above- Getting the best images really boils down to the art of selection.
I added that after getting some feedback similar to yours, because before that, this disclaimer wasn't quite cutting it apparently:
>You'll have to play around with the size of the image a bit to get the best result output.
So yeah, I'm trying to stay honest here. I'm not going as far as picking completely random samples, admittedly, but what I'm really trying to drive at is that you can produce cool results with this tool. It's not perfect, but it's a tool. And even if you pick at random, they still look pretty damn good. It's just that sometimes it renders the TV in color and sometimes it doesn't, and I picked the cool option.
That’s why colorizing companies employ historians and researchers. You can have a pretty accurate idea of this color with enough research, but it takes time (and thus money).
They were chief's blankets.
They were blue/had no red
Faces -> some variety of flesh-colored from light to dark
Fabric/clothing -> blue
Sky -> blue
Vegetation -> green
Wood -> brown
Blank -> turquoise or tan
Small details -> fascinating variety of colors, but often a brilliant red
Which all seems fairly reasonable. For many things (like wood or skin) it seems accurate.
Obviously things like clothes come in such a variety of colors that there's simply no way to predict them accurately -- zero meaningful signal -- so if the model settles on whatever the most common color is, it doesn't surprise me that that would be blue.
By hand colorization: http://www.marinamaral.com/portfolio-2/
Are there examples of ML doing something like that? (also know little about ML)
Oh yeah, to answer your question: super resolution does indeed make up details as you describe, and arguably it does blur the line with restoration/storytelling. But so does colorization: not all the colors added by the model are going to match what was actually there, of course.
In recent decades there has been a push to show the animals more realistically. The fossilized evidence is studied and compared to the skeletal structure of animals that exist today. Inferences and educated guesses are made from there to project a more realistic but more subjective image of the dinosaurs. We now get much more varied and interesting depictions with feathers, bright coloring, fat deposits, and other features that can neither be completely confirmed nor ruled out based on the evidence.
 - https://99percentinvisible.org/episode/welcome-to-jurassic-a...
Really neat work!
Unless a) my brain is applying more interpretation to these pictures than I realize, or b) the author (intentionally or not) picked out pictures that show the best results.
Look at the Chinese Opium Smokers in 1880. They appear slightly too caucasian-coloured to me.
At the extreme, it'd be interesting to see what it would do to these, for example: https://mashable.com/2015/01/31/former-slaves-photos-united-...
His face is arguably too red. But on average it’s fine. (Amusing: is this comment correct, or unconsciously biased by the lack of knowledge of what native Americans actually look like? I admit the latter is possible.)
Humans interpret colors thanks to context. When you strip away context, it’s easy to come up with things that fool you. (Optical illusions are the limit case of this.)
See da Vinci's journals on color. They are worth studying, and most entries are so short they may as well be tweets. http://www.sacred-texts.com/aor/dv/dvs005.htm
On that note, it's cool to see how the algorithm works for both indoor and outdoor photos. Indoor settings tend to have dark backgrounds, while outdoor settings have light backgrounds.
Very cool project.
However, I think the real application here is colorizing the frames of movies. Imagine being able to turn black-and-white historical footage into color. It won't look as good as a single image, but I bet it would be good enough.
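Just to sketch what that per-frame approach might look like (this is speculation, not anything from the project): ffmpeg handles the video plumbing, and `colorize` stands in for a hypothetical wrapper around the model.

```python
# Rough sketch of per-frame video colorization. `colorize(path)` is assumed to
# return a PIL image; frame-by-frame inference has no temporal consistency,
# so colors may flicker between frames.
import glob
import os
import subprocess

os.makedirs("frames", exist_ok=True)

# Split the video into individual frames.
subprocess.run(["ffmpeg", "-i", "input.mp4", "frames/%06d.png"], check=True)

# Colorize each frame in place.
for path in sorted(glob.glob("frames/*.png")):
    colorize(path).save(path)

# Reassemble the colorized frames into a video (audio handling omitted).
subprocess.run(
    ["ffmpeg", "-framerate", "24", "-i", "frames/%06d.png", "colorized.mp4"],
    check=True,
)
```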
Maybe another cool avenue to explore would be combining models like this with some NLP approach that parses a historian's rough description of how the scene should be colored and biases the generator with prior information that way. (Maybe related to visual question answering or something.)
> The model loves blue clothing. Not quite sure what the answer is yet, but I'll be on the lookout for a solution!
As someone familiar with the libraries space, I'd actually be very interested in seeing a machine learning model that could deal with "cleaning up" old film (I've actually brought this up w/ several of my ML friends occasionally). One of the biggest challenges in the world of media preservation is migrating analogue content to digital media before physical deterioration kicks in. Oftentimes, libraries aren't able to migrate content quickly enough, and you end up with frames that have been partially eaten away by mold.
As a heads-up, these are some of the problems you might encounter on the film front (which you might not otherwise find with photos due to differences in materials used, etc):
Edit: Here's maybe a better link -- https://www.bbc.com/news/av/entertainment-arts-45803977/pete...
I'd love to hear otherwise, but I'm not aware of any commercial "machine learning" for post-production aside from the Nvidia OptiX denoiser and one early beta of an image segmentation plugin.
In any case the results are damned impressive -- can't say I've seen anything like it before.
The pictures looked basically perfect to my eyes, until I scrolled down to the "gotchas" section, at which point I started to notice a lot of details that are wrong, mostly fading colors, on clothes or otherwise.
Now, there seems to be a distinct loss of detail in the restored images. Since the network is resolution-limited, is the black-and-white image displayed at full resolution beside the restored one?
What I would like to see is the output of the network treated as chrominance only.
Take the YUV transform of both the input and output images, rescale the UV channels of the restored one to match the input's resolution, and substitute them into the original. I'd be really curious to look at the output (and would do it myself if I were not on a smartphone)!
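For anyone who wants to try it, here's a minimal sketch of that chroma-transfer idea using OpenCV (the filenames are placeholders, and the two images are assumed to be already aligned):

```python
# Keep the original luma, take only the chroma from the network's output.
import cv2

orig = cv2.imread("original_bw.png")       # high-res black-and-white input
colorized = cv2.imread("colorized.png")    # network output, possibly lower-res

# Upscale the colorized image to the original's resolution.
h, w = orig.shape[:2]
colorized = cv2.resize(colorized, (w, h), interpolation=cv2.INTER_CUBIC)

# Convert both to YCrCb (a luma/chroma space, like YUV).
orig_ycc = cv2.cvtColor(orig, cv2.COLOR_BGR2YCrCb)
color_ycc = cv2.cvtColor(colorized, cv2.COLOR_BGR2YCrCb)

# Replace the output's luma (channel 0) with the original's full-resolution luma.
color_ycc[:, :, 0] = orig_ycc[:, :, 0]

cv2.imwrite("result.png", cv2.cvtColor(color_ycc, cv2.COLOR_YCrCb2BGR))
```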
Nevertheless, that's some awesome work, and I can't wait to see where it goes!
However, I feel like you glossed over the proposed workaround, which I think is appropriate (though more complicated if you also want to implement "defade") and extremely easy to implement.
I took a couple of minutes to write an Octave script that implements the workaround; it would have been even easier if both images had already been distinct files and perfectly aligned.
The basic idea here is the same as the one behind the YUV transform: our brains are much less sensitive to the chroma channels than the luma channel. So I separate those, and keep the original luma channel, while I use the reconstructed chroma, which is lower-resolution.
Judge the results by yourself, but it seems to me that the end results are a whole lot better: https://imgur.com/a/n2sBYCi
And it could still be improved a lot more (by using the original high-resolution image, and by not having to hand-align the images).
Edit: also, ironically, indigo dye (and thus blue clothing) didn't become common before the 1900s, so the bias might produce historically inaccurate images!
Yes, I definitely glossed over the proposed workaround, and I apologize. Thanks for this.
Although I would have made it a fully fledged GitHub issue, with a link on your board instead of a plain text entry, so that supplementary material could be added in the issue thread.
Bonus: if you are only interested in chrominance, you could train your network to take YUV as input instead and output only UV. I suspect this might lead to substantial gains in training time and network complexity.
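A toy PyTorch sketch of that setup (the model below is a stand-in, not the project's actual architecture): the network sees only the Y channel, predicts the two chroma channels, and the loss never touches luma.

```python
# Hypothetical chroma-only training step: 1-channel luma in, 2-channel chroma out.
import torch
import torch.nn as nn

class ChromaNet(nn.Module):
    """Toy stand-in for the real colorizer."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),  # U and V
        )

    def forward(self, y):
        return self.body(y)

model = ChromaNet()
loss_fn = nn.L1Loss()

# y: (N, 1, H, W) luma; uv_target: (N, 2, H, W) chroma from the color ground truth.
y = torch.rand(4, 1, 64, 64)
uv_target = torch.rand(4, 2, 64, 64)

loss = loss_fn(model(y), uv_target)  # luma comes from the input, not the network
loss.backward()
```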
But that might already be what you are doing, for all I know. I am just really glad I could be of any help! And this feels like a "free-lunch" improvement.
I'm not sure what I want to do about the Kanban board versus issues tracker yet... I'm used to JIRA mostly. I'll figure it out but do know your contribution is very very much appreciated. I don't think I would have come up with that.
I don't know much about ML, but would it be possible to use some kind of attention model to iteratively construct the final colouring? The memory limit of the GPU would then limit the attention region size, but not the maximum image size. Talkin' outta my rear here, though.
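For what it's worth, a simpler cousin of that idea is plain tiled inference: process fixed-size crops so GPU memory bounds the tile size rather than the whole image. A hedged sketch (with a hypothetical `colorize_tile` that preserves crop shape; naive tiling like this will show seams without overlap and blending):

```python
# Naive tiled inference: colorize an arbitrarily large image in fixed-size crops.
import numpy as np

def colorize_tiled(img, colorize_tile, tile=512):
    """img: (H, W, C) array; colorize_tile: hypothetical per-crop colorizer."""
    h, w = img.shape[:2]
    out = np.zeros((h, w, 3), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            crop = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] = colorize_tile(crop)
    return out
```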
For example, in "Interior of Miller and Shoemaker Soda Fountain, 1899" the colors of the counter and chairs blend together, but the luma helps our eyes separate them.
Just throwing out a thought you might have already considered, but maybe it's because traditional black-and-white film is over-sensitive to blue? That's why, when shooting traditional black-and-white film, one usually uses at least a yellow filter, and if there's blue sky in a shot, a red filter. This may or may not be useful; either way, keep up the awesome work!
If you get an image with a funny artifact, like a super-red hand, can you fix it by running the network on a slightly augmented image? For this kind of work, it seems reasonable that you could keep re-colorising an image until you got one that was acceptable (as in the case with the B+W TV).
(This is also the case for e.g. the superresolution problem.)
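A quick sketch of that retry loop, leaning on the README's note about playing with the image size (`colorize` here is a hypothetical wrapper around the model):

```python
# Generate several colorization candidates by varying the input size,
# then pick the acceptable one by eye. `colorize` is hypothetical.
from PIL import Image

def candidates(path, colorize, widths=(512, 576, 640, 704)):
    img = Image.open(path).convert("RGB")
    for w in widths:
        h = int(w * img.height / img.width)  # preserve aspect ratio
        yield w, colorize(img.resize((w, h), Image.LANCZOS))

# for w, out in candidates("photo.png", colorize):
#     out.save(f"candidate_{w}.png")
```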
From the article:
"BEEFY Graphics card. I'd really like to have more memory than the 11 GB in my GeForce 1080TI (11GB). You'll have a tough time with less. The Unet and Critic are ridiculously large but honestly I just kept getting better results the bigger I made them."
I strongly doubt that you can "generalize" colourization in the sense that you talk about (over a wide variety of subject matter).
Works on video. I don't suppose anyone knows if a Photoshop (or GIMP) plugin of it was ever made?
Plot twist: It was actually red in reality.