> I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch
It’s unlikely that these models have been trained on “similarity”. Ask one to swap red boots for brown boots and it will happily generate an entirely different image, because it was never trained on the concept of two images being similar.
That doesn’t mean it’s impossible to train a model on the concept of similarity.
I just asked Midjourney to do precisely that, and it swapped the boots with no issue, although it didn't seem to quite understand what it meant for a cat to _wear_ boots.