
Would it be as simple as feeding it a bunch of decolorized images along with the originals?


Basically, the training works as follows: take a color image in RGB and convert it to LAB. This is an alternative color space where the first channel (L) is a greyscale image and the other two channels (A and B) carry the color information.

In a traditional pixel-space (non-latent) diffusion model, you noise all the RGB channels and train a UNet to predict the noise at a given timestep.

When colorizing an image, the UNet always "knows" the black and white image (i.e. the L channel).

This implementation only adds noise to the color channels, while keeping the L channel constant.

So to train the model, you need a dataset of colored images. They would be converted to LAB, and the color channels would be noised.
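
For concreteness, here's a rough sketch of that forward-noising step (not the posted implementation; it assumes skimage for the RGB→LAB conversion and a DDPM-style cumulative noise schedule):

    import numpy as np
    from skimage import color

    def noise_color_channels(rgb, alpha_bar_t, rng=np.random.default_rng()):
        # rgb: float array in [0, 1], shape (H, W, 3)
        # alpha_bar_t: cumulative noise-schedule product at timestep t
        lab = color.rgb2lab(rgb)                     # L in [0, 100], A/B roughly [-128, 127]
        L, ab = lab[..., :1], lab[..., 1:] / 128.0   # scale color channels to ~[-1, 1]

        noise = rng.standard_normal(ab.shape)
        noisy_ab = np.sqrt(alpha_bar_t) * ab + np.sqrt(1.0 - alpha_bar_t) * noise

        # The UNet is conditioned on L (always kept clean) and trained to predict `noise`.
        return L, noisy_ab, noise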

You can't train on decolorized images, because the neural network needs to learn how to predict color with a black and white image as context. Without color info, the model can't learn.


But since, in almost every instance, you do not have access to colour originals of historical photos, you cannot possibly train the network to have any instinct for the colour sensitivity of the medium, can you?

An extreme example:

https://www.cabinetmagazine.org/issues/51/archibald.php

https://www.messynessychic.com/2016/05/05/max-factors-clown-...

Colourising old TV footage can only result in a misrepresentation, because the underlying colour was deliberately falsified so that it would have any kind of usable representation on the medium itself.

And this caricatured example underpins the problem with colourisation: contemporary bias is unavoidable, and can be misleading. Can you take a black and white photo of an African-American woman in the 1930s and accurately colour her skin?

You cannot.


> Can you take a black and white photo of an African-American woman in the 1930s and accurately colour her skin?

AI colorization will, in general, be plausible, not accurate.


Yeah, the model is racist for sure. That's a limitation of the dataset though (CelebA is not known for its diversity, but it was easy for me to work with; I trained this model on Colab).

And plausibility is a feature, not a bug.

There are always many plausibly correct colorizations of an image, which you want the model to be able to capture in order to be versatile.

Many colorization models introduce additional losses (such as discriminator losses) that avoid constraining the model to a single "correct answer" when the solution space is actually considerably larger.
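
To make that concrete, here is a hypothetical sketch of such a combined objective in PyTorch (the names, weighting, and discriminator are all made up for illustration, not taken from the posted model):

    import torch.nn.functional as F

    def colorization_loss(pred_ab, true_ab, discriminator, lambda_adv=0.01):
        # Pixel-wise term pulls the prediction toward the ground-truth colors...
        recon = F.l1_loss(pred_ab, true_ab)
        # ...while the adversarial term only asks that the predicted colors look
        # plausible, so the model isn't punished for picking a different but
        # realistic palette.
        adv = -discriminator(pred_ab).mean()
        return recon + lambda_adv * adv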


In other words, bullshit.


No more so than any other colorization method that isn’t dependent on out-of-band info about the particular image (and even that is just more constrained informed guesswork.)

That's what happens when you are filling in missing info that isn't in your source.

EDIT: Of course, color photography can be “bullshit” rather than accurate in relation to the actual colors of things in the image, as is the case with the red, blue, and green (the actual colors of the physical items) uniforms in Star Trek: The Original Series. But, also fairly frequently, there are not-intentionally-distortive misreproductions of skin tones (often most politically sensitive in the US with racially non-White subjects, where there are also plenty of examples of deliberate manipulation).


Showing color X on TVs by actually making the thing color Y in the studio (well, during filming) is not bullshit. It's an intentional choice playing out as intended. It is meant to communicate a particular thing and does so.


That particular thing was not intentional, and is the reason why the command wrap uniform (same color in person, different material), which is supposed to be color-matched to the made-as-green uniforms, doesn't match on screen.

But, yes, in general inaccurate color reproduction can be intentionally manipulated with planning to intentionally create appearances in photos that do not exist in reality.


shrug people like looking at colorised photos because it helps root the image within the setting of the real world they occupy.

For some it’s more evocative, regardless of the absolute accuracy.

Having a professional do it for that picture of your great grandad is expensive.

Having a colourisation subreddit do it is probably worse for accuracy.

I think there is a place for this bullshit.


The original color information just isn't there.

So bullshit is the best you're going to get.


Well, you could also not put more bullshit in the world by not doing the thing.


Why are you so negative about it? Pretty sure many people would find it impressive to colorize old photos to look at them as if these were taken in color.

Should artists not put their bs in the world? Writers? Musicians? Most of it is made up but plausible to make you feel something subjective.


People have been colorizing photos as long as there have been photos.


This is true, but if you have some reference images, you can probably adapt some of the recent diffusion adaptation work, such as DreamBooth, to tell the model "hey, this period looked like this", and finetune it.

https://dreambooth.github.io/


>You can't train on decolorized images, because the neural network needs to learn how to predict color with a black and white image as context. Without color info, the model can't learn.

I think the parent means using decolorized images to test the success and guide the training (since they can be readily compared with the colored images they resulted from, which would be the perfect result).

Not to use decolorized images alone to train for coloring (which doesn't even make sense).


Is there a reason for using LAB as opposed to YCbCr? My understanding is that YCbCr is another model that separates luma (Y) from chroma (Cb and Cr), but JPEG uses YCbCr natively, so I wonder if there would be any advantage in using that instead of LAB?


The Y in YCbCr is linear, and is just a grayscale image. The L channel in LAB is non-linear (as are A and B), and uses a complex transfer function designed to mimic the response of the human eye.

A YCbCr colorspace is directly mapped from RGB, and thus is limited to that gamut.

LAB can encode colors brighter than diffuse white (i.e. brighter than #ffffff), like an outdoor scene in direct sunlight.

Sorta HDR (LAB) vs non-HDR (YCbCr).

This image (https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Ex...) is a good demo (left side was processed in LAB, right in YCbCr). Even reduced back down to a JPEG, the left side is obviously more lifelike, since the highlights and tones were preserved until much later in the processing pipeline.


The description included with that image conflicts with your account:

> An example of color enhancement using LAB colorspace in Photoshop (CIELAB D50). Left side is enhanced, right side is not. Enhancement is "overdone" to show the effect better.

And per the original upload the “enhancement” demonstrated is linear compression of the a* and b* channels—

https://upload.wikimedia.org/wikipedia/commons/archive/f/f3/...

—the effect being a divergence from the likeness of life, at least as I’ve experienced it.


You can take arbitrary images and convert them to grayscale for training, and do conditional diffusion.


But convert them to grayscale how?

Black and white film doesn't have one single colour sensitivity. Play around with something like DxO FilmPack sometime (it has excellent measurement-based representations of black and white film stocks).

It's a much more complex problem than it might seem on the surface.


fair, but can’t you just randomize the grayscale generation for training?


I wanted to say no, that can't work.

And I think it can't work. But now I am not sure!

The other day I was working on a mono photo to prove a point: that a model (a photographic artist's model!) with very striking pink hair was of little concern to a photographer who worked in black and white only, and might actually present some opportunities for choosing tonal separation that are not present in those with non-tinted hair.

In different circumstances (film and filter) her hair could appear (in black and white) to the viewer as if it was likely brunette or likely blonde, before any local (as opposed to image wide) adjustments were made.

The question you are asking, I think, is could you get the hair colour right based on the impact of those same circumstances on other known objects in the scene.

I think the answer is, in the main, no, because those objects likely don't survive to make colour comparisons from (and there are known cases where the colourisation of a building has been completely wrong because it had simply been repainted). And also because it's sometimes not even obvious what a structure actually is without its colour. People who colourise by hand make this mistake too.

But I concede that given that we have to work with contemporary images to have a colour source, randomising the tone curve is the only thing that could work.
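
If someone did try it, a minimal sketch of that kind of augmentation might look like this (purely hypothetical: random channel weights standing in for unknown spectral sensitivity, plus a random gamma as a crude tone curve):

    import numpy as np

    def random_grayscale(rgb, rng=np.random.default_rng()):
        # rgb: float array in [0, 1], shape (H, W, 3)
        weights = rng.dirichlet([2.0, 2.0, 2.0])   # random per-channel sensitivity, sums to 1
        gray = rgb @ weights                       # (H, W, 3) @ (3,) -> (H, W)
        gamma = rng.uniform(0.6, 1.6)              # crude stand-in for a film tone curve
        return np.clip(gray, 0.0, 1.0) ** gamma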


yes, so infinite training data. but the challenge will be scaling to large resolutions and getting global consistency


Is that challenging? Humans have awful color resolution perception, so even if you have a huge black-and-white image, people would think it looks right even with very low-resolution color information. Or, if the AI hallucinates a lot of high-frequency color noise, it wouldn't be noticeable.

Wikipedia has a great example image here: https://en.wikipedia.org/wiki/Chroma_subsampling. Most people would say all of them looked fine at 1:1 resolution.


I meant more from a compute standpoint; the models are expensive to run at full res.


I see what you mean. I think that you can happily scale the B&W image down, run the model, and then scale the chroma information back up.
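
Roughly like this (a hypothetical sketch using skimage; `colorize_fn` stands in for whatever model is used, and is not anything from the posted code):

    import numpy as np
    from skimage import color, transform

    def colorize_big_image(gray_full, colorize_fn, work_size=(256, 256)):
        # gray_full: full-resolution greyscale image, float in [0, 1], shape (H, W)
        # colorize_fn: any model mapping a small L channel -> predicted A/B channels
        H, W = gray_full.shape
        small_L = transform.resize(gray_full, work_size, anti_aliasing=True) * 100.0

        ab_small = colorize_fn(small_L)               # (h, w, 2) in LAB units
        ab_full = transform.resize(ab_small, (H, W))  # upsample only the chroma

        lab = np.dstack([gray_full * 100.0, ab_full]) # full-res detail stays in L
        return color.lab2rgb(lab)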

Something I was thinking about after writing the comment is that the model is probably trained on chroma-subsampled images. Digital cameras do it with the Bayer filter, and video cameras add 4:2:0 or similar subsampling as they compress the image. So the AI is probably biased towards "look like this photo was taken with a digital camera" versus "actually reconstruct the colors of the image". What effect this actually has, I don't know!


good point, I hadn’t realized that you only need to predict chroma! That actually greatly simplifies things

re: chroma subsampling in training data: this is actually a big problem, and a good generative model will absolutely learn to predict chroma-subsampled values (or even JPEG artifacts!). you can get around it by applying random downscaling with antialiasing during training.
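
e.g., a hypothetical sketch of that augmentation with PIL (not taken from any actual training script):

    from PIL import Image
    import numpy as np

    def random_downscale(img: Image.Image, rng=np.random.default_rng()):
        # Random scale with antialiasing, so the model can't latch onto
        # chroma-subsampling / JPEG artifacts at one fixed resolution.
        scale = rng.uniform(0.3, 1.0)
        w, h = img.size
        new_size = (max(1, int(w * scale)), max(1, int(h * scale)))
        return img.resize(new_size, resample=Image.LANCZOS)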


I guess you can always use a two-stage process. First colorize, then upscale


yeah, you can use SOTA super-res, but that tends to be generative too (either diffusion-based on its own, or more commonly GAN-based). it can be a challenge to synthesize the right high-res details.

but that’s basically the Stable Diffusion paper (diffusion in latent space plus GAN super-res)


Yeah, if you have a high res image, you can get color info at super low-res and then regenerate the colors at high res with another model. (though this isn't an efficient approach at all)

https://github.com/TencentARC/T2I-Adapter

I've also seen a ControlNet do this.



