TokenVerse: Multi-Concept Personalization in Token Modulation Space by Google (token-verse.github.io)
71 points by 037 10 hours ago | 12 comments





Feels like a moodboarding multiplier for some design disciplines, assuming these results aren't cherry-picked and transfer to other domains.

Pretty interesting.

Seems like you could apply similar ideas to text too.


This looks like an excellent step towards being able to apply consistency to generated images across a series.

It looks as if it would trivially integrate into Whisk, which already has a similar feature for defining an output's "subjects", "scene", and "style".

If it worked with people, they’d show people.

The second example in the "Results" section includes a human.

They do not show a realistic photographic face transfer. They even blur out the faces.

It would be a huge invention, but they did not achieve that.


Below the first Results header is a carousel of images. If you tap the arrows you can explore; I believe there are three examples where the final image is a person whose face was applied from a reference photo.

Yes. But. The reference photo is blurred. The smallest details matter for faces! That's the whole point. I have no doubt you can get kind-of-looks-like faces. But this has been the same issue since DreamBooth. All the IP-transfer approaches, even the best like Ideogram's, fail on faces.

There are two images where the face is transferred to the final image. The reference images with blurred faces are all being used for something else: the pose, the "necklace", etc. The faces are blurred in every image unless they explicitly want the face transferred to the final image, at least that's how it seems.

I know. But there are no unblurred source images of faces. This isn't complicated.

https://token-verse.github.io/results/multi_concepts/25.png

https://token-verse.github.io/results/multi_concepts/06.png

Both of these show a man's face from a source image being used in a newly generated image. I agree that it isn't complicated, but you seem to be drawing different conclusions from everyone else here.

If your point is that it can't perform face transfer, you seem to be wrong - that's what's happening here. If your point is that blurring the photos used for other parts of the input suggests the model may get confused by other faces, that's fair, but it seems clear they have demonstrated face transfer, and needing to blur irrelevant faces seems a minor point compared to transferring the face that's intended. I'm not sure how that would really impact use cases.


Well. It doesn't look like him. If they had working face / human character transfer, listen, my dude, every single example would show a face transfer. It's one of the biggest challenges.



