Thanks for calling this out. I think the dev cycle for production AI systems will involve a lot of bug squashing on this kind of thing. These models handle a huge variety of inputs, by definition far more than you could hand-code for. It's hard even to identify systematic failures.
Once a failure mode is known - like here - how do you fix it? The foundational problem is that minorities are, by definition, a minority of the training set. Good training data is expensive, so how can we practically boost representation without repeating the Gemini fiasco?
One idea: start with the best coloring books. Presumably some human has mastered the art of “accurately representing black or mixed hair without it turning into a caricature”. Find them, and start buying their art. When they have drawings based on photos, use that as a training pair. When they have line art only, use a style transfer tool (like this one!) to convert it to a photo. That gives you another pair for use in the other direction.
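A rough sketch of what that pairing step could look like, assuming `photo_for` is whatever lookup you have for an artist's reference photos and `line_to_photo` is your style-transfer model run in the line-art-to-photo direction (both are placeholders, not real APIs):

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

@dataclass
class TrainingPair:
    photo: Path      # photographic reference (real or synthesised)
    line_art: Path   # artist's coloring-book style drawing
    source: str      # artist / licence, so provenance stays auditable

def build_pairs(
    artist_dir: Path,
    photo_for: Callable[[Path], Optional[Path]],  # hypothetical: returns the matching photo if the artist drew from one
    line_to_photo: Callable[[Path], Path],        # hypothetical: style transfer, line art -> synthetic photo
) -> list[TrainingPair]:
    pairs = []
    for drawing in sorted(artist_dir.glob("*.png")):
        photo = photo_for(drawing)
        if photo is None:
            # No reference photo exists, so synthesise one; the pair is then
            # usable for training in the photo -> line-art direction too.
            photo = line_to_photo(drawing)
        pairs.append(TrainingPair(photo=photo, line_art=drawing, source=artist_dir.name))
    return pairs
```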
Another idea: make it easy for users to flag poor transfers. Add a “whitewashed” option to the standard mod report flow. Feed that into RLHF. Get better.
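To be concrete, something like the sketch below - the report categories and the chosen/rejected format are just assumptions about how the mod flow and a preference-tuning pipeline might be wired together:

```python
from dataclasses import dataclass
from enum import Enum

class ReportReason(Enum):
    COPYRIGHT = "copyright"
    NSFW = "nsfw"
    WHITEWASHED = "whitewashed"  # the new option in the standard report flow

@dataclass
class TransferReport:
    input_photo_id: str
    output_image_id: str
    reason: ReportReason

def to_preference_example(report: TransferReport, corrected_id: str) -> dict:
    """Turn a 'whitewashed' report into a chosen/rejected pair for preference tuning.

    `corrected_id` is a better output for the same input (e.g. a later model's
    result, or a human touch-up) that reviewers judged faithful to the subject.
    """
    assert report.reason is ReportReason.WHITEWASHED
    return {
        "prompt_id": report.input_photo_id,
        "rejected": report.output_image_id,  # the flagged, whitewashed result
        "chosen": corrected_id,              # the corrected result
    }
```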
Another idea: focus dev cycles on this. Once you have a system to flag problems and address them, use all this fancy AI to identify input photos with black/mixed hair. Manually inspect how the model performs and push feedback into the next training cycle.
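A sketch of that triage loop, with `has_black_or_mixed_hair` standing in for whatever classifier you'd use to surface likely-affected inputs (an assumption, not an existing model):

```python
import random
from typing import Callable, Iterable

def sample_for_review(
    photos: Iterable[str],
    has_black_or_mixed_hair: Callable[[str], float],  # hypothetical classifier returning a probability
    threshold: float = 0.8,
    sample_size: int = 200,
) -> list[str]:
    """Pull a manageable batch of likely-affected inputs for manual inspection.

    Anything a reviewer marks as a bad transfer goes back into the next
    training / fine-tuning cycle as a hard example.
    """
    candidates = [p for p in photos if has_black_or_mixed_hair(p) >= threshold]
    random.shuffle(candidates)
    return candidates[:sample_size]
```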
> Presumably some human has mastered the art of “accurately representing black or mixed hair without it turning into a caricature”
There absolutely are good examples, yes, and I think you're right that the failure to capture it comes down largely to volume in the existing datasets, particularly datasets that directly match this style of line-art colouring, which is itself a fairly dated style.
I agree with all of your recommendations. I think it's definitely a problem that will be solved. The main thing is for people to get used to looking for it.
One tool for this is https://www.aquariumlearning.com/
The point is - once you identify a corner of feature space where things break, you can fix them.