I wonder how many permutations could legibly be generated in a single image with an extended version of the same technique. I don't understand the math, but would two orthogonal transformations in sequence still be an orthogonal transformation and thus work?
I’m not sure whether ‘orthogonal transformations’ in this context refers to the usual orthogonal linear transformations (/matrices), but if so then yes.
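For what it's worth, the check is a one-liner (assuming the views really are orthogonal matrices acting on the flattened pixel vector): a product of orthogonal matrices is itself orthogonal, so chaining two views is still a single valid view.

```latex
(Q_1 Q_2)^\top (Q_1 Q_2) = Q_2^\top (Q_1^\top Q_1)\, Q_2 = Q_2^\top Q_2 = I
```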
That looks more like a cat-aclysm to me TBH. Probably the model was overwhelmed by the conflicting requirements, so that neither the individual images nor the composite image are particularly good. But, as you wrote, maybe they will get better at this eventually...
That could be interesting. A recursive cat, so to say.
The problem would be this: in the picture at hand, the big cat is rather simple, just a portrait of a smiling cat, while the 9 smaller cats are doing all kinds of poses to fit the form of the big cat portrait. So the subcats are more complex than the main cat.
When doing the recursive cat, it would be hard to make a subcat from 9 subsubcats because the subcat is already a complex image that is not as easy to recognise as the main cat.
This thread reminded me of this old gem: https://thesecatsdonotexist.com/ (warning: you may see some catspiders / r/Imsorryjon material!)
Now what would be interesting is a "demixer" which allows you to locate the source image(s) from multiple iterations of a given image. Like a reverse image search but for generative images. I suppose it would rely on artefact matching or some other kind of granular pattern matching, along with other more general methods (assuming the source material is actually available online in the first place).
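As a toy version of that "granular pattern matching" (my own sketch; it assumes the candidate source images are already sitting in a local folder), a perceptual hash already gets you a crude demixer:

```python
from pathlib import Path

from PIL import Image
import imagehash  # pip install imagehash

def nearest_sources(query_path, corpus_dir, top_k=5):
    """Rank candidate source images by perceptual-hash distance to the query."""
    query_hash = imagehash.phash(Image.open(query_path))
    scored = []
    for candidate in sorted(Path(corpus_dir).glob("*.png")):
        distance = query_hash - imagehash.phash(Image.open(candidate))  # Hamming distance
        scored.append((distance, candidate.name))
    return sorted(scored)[:top_k]
```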
The man/woman color inversion one was the most impressive to me. On the rotations, I can rotate in my mind and see the other view… but I find it very hard to color invert mentally
For me it's the reverse: the color inversions feel hardly more impressive than the morph animations that were all the rage in the 1990s, because while I certainly understand how straightforward color inversion is at the level of pixel data, I still can't "see" that simplicity. It hardly looks any different from an alpha blend with no relation at all.
The rotations, on the other hand, wow! It is perfectly visible how the pixels don't change. You can physically rotate the screen and the image "changes". I could not think of a better illustration of how diffusion model images are not just echoes of preexisting images (which they certainly also are), but solutions to the problem of "find a set of pixels that will match the description of {prompt}". Or in this case, "that will match {A} when oriented this way and {B} when oriented that way".
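To make that concrete, here's a rough sketch of how I understand the multi-view denoising works (placeholder names are mine, not the authors' actual code; each "view" just rearranges pixels, e.g. a 180° rotation):

```python
import torch

def multi_view_noise_estimate(model, noisy_image, prompts, views, t):
    """One denoising step in which the same pixels must match a different prompt
    under each view.  Sketch only: `model` stands in for a pixel-space diffusion
    model's noise predictor, and each view is an (apply, undo) pair of transforms
    that merely rearrange pixels."""
    estimates = []
    for prompt, (apply_view, undo_view) in zip(prompts, views):
        eps = model(apply_view(noisy_image), t, prompt)   # noise prediction in that view
        estimates.append(undo_view(eps))                  # map it back to the base orientation
    return torch.stack(estimates).mean(dim=0)             # reconcile the views by averaging

# Example views for a (C, H, W) image: identity, and a 180° rotation.
identity = (lambda x: x, lambda x: x)
rot_180 = (lambda x: torch.rot90(x, 2, dims=(1, 2)),
           lambda x: torch.rot90(x, -2, dims=(1, 2)))
```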
> Per the code, the technique is based off of DeepFloyd-IF, which is not as easy to run as a Stable Diffusion variant.
I haven't dug in yet, but it _should_ be possible to use their ideas in other diffusion networks? It may be a non-trivial change to the code provided though. Happy to be corrected of course.
I suspect the trick only works because DeepFloyd-IF operates in pixel space while other diffusion models operate in the latent space.
> Our method uses DeepFloyd IF, a pixel-based diffusion model. We do not use Stable Diffusion because latent diffusion models cause artifacts in illusions (see our paper for more details).
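A toy way to see why pixel space matters (my own illustration, not from the paper): the view transform is exact on raw pixels, while a latent model would need its autoencoder to commute with the transform, which it generally doesn't:

```python
import numpy as np

img = np.random.rand(64, 64, 3)   # stand-in for a pixel-space sample
# On raw pixels the view transform (a 180° rotation) is exact and exactly invertible:
assert np.allclose(np.rot90(np.rot90(img, 2), -2), img)

# A latent diffusion model would instead need
#     decode(rot90(encode(img))) == rot90(img),
# i.e. the autoencoder would have to be equivariant to the view transform,
# which it generally is not -- hence the artifacts the authors mention.
```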
I always thought it was weird that this idea took off with that particular controlnet model. Many other controlnet models when combined with those same images produce excellent and striking results.
The ecosystem around Stable Diffusion in general is so massive.
Other ControlNet adapters either don't preserve the high-level shape enough or preserve it too well, IMO. Canny/Depth ControlNet generations are less of an illusion.
It created a backlash because a) it was too popular with AI people hyping "THIS CHANGES EVERYTHING!" and people were posting low-effort transformations to the point that it got saturated and b) non-AI people were "tricked" into thinking it was a clever trick with real art since ControlNet is not ubiquitous outside the AI-sphere, and they got mad.
Maybe a naive question, but if AI generated images are not copyrightable, wouldn't it be possible for some people to use it for research and for other people to re-use the results commercially?
I’m curious how they even thought of the idea to train a jigsaw puzzle like that in the first place. My naive guess was that those types of puzzles were preexisting. If in fact it’s a novel type of puzzle, that idea in itself is as cool as the generator they created!
I’m curious if they could ask for permission from the original authors (who doesn’t love a fun puzzle? And it’s not like the profit motive here is alarming): most licenses require asking permission by default.
You can always reach out and ask for a one-off in good faith.
* or directly inspired by, but the “young lady” prompt triggered the model to pick a dress, and there’s no way to make an eye and an ear or a mouth and a choker photo-realistically identical:
https://www.reddit.com/r/RedditDayOf/comments/35cjn5/the_cla...
oh hmm, when I first saw the penguin/giraffe one I was like "that looks like an upside down penguin, where's the giraffe?", whereas with others I immediately saw what it was trying to be.
I’d need to check, but if one set of “ear and hole” can be swapped with another set, both sets have to be identical in shape and color. But if they split and attach to other edges rather than swap, that creates further connection.
If you think of the edges as nodes in a connected di-graph of ears and holes, possible pairs are connected: a swap is a two-pair cluster; further connection is a four-element chain with both ends open-ended. If that connection ties to more pairs, you might have a larger cluster of identical ears and holes. Given graph properties, that’s presumably most of them — see the prisoners paradox for why [0].
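As a toy sketch of that graph argument (the `interchangeable_pairs` input below is made up; in reality you'd extract it from the cut shapes), the connected components of the graph are exactly the clusters of mutually identical ears and holes:

```python
import networkx as nx

# Each node is an ear/hole pair on some puzzle edge; an edge between two nodes means
# those pairs can be swapped, i.e. their ears and holes are identical in shape and color.
interchangeable_pairs = [("A", "B"), ("B", "C"), ("D", "E")]   # made-up example data

graph = nx.Graph()
graph.add_edges_from(interchangeable_pairs)

# Connected components are the clusters of mutually identical ears and holes:
# here {A, B, C} share one shape/color and {D, E} share another.
clusters = list(nx.connected_components(graph))
```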
That would make the puzzle much more challenging to solve if most ears fit in most holes.
With that many rearrangeable elements, you could make so many different "valid" solutions, indistinguishable without a photograph, that it would become art rather than a puzzle.
I'd love one of these on my wall. Imagining a framed version of the Einstein pop-art one where the circle in the middle rotates (either periodically or via a manual lever).
I feel like a neural network is probably overkill for this task and a suboptimal substitute for a theoretical understanding of optical illusions, but can't argue with results.
Most of them are not “illusions” where you perceive two identical segments as different lengths because of tricks of human perception; they are ambigrams. They rely on humans’ ability to think of any three dots as two eyes and a mouth.
They also “copy”, in the way those networks so often seem to (to the point of somehow earning copyright strikes); they were either prompted with existing solutions or learned them wholesale through training:
* The penguin and giraffe one is a previously known ambigram, for example.
* The old lady turning into a dress is obviously based on a classic pencil drawing where a similar old lady hiding in her collar turns into a young lady looking over her shoulder [0]; however, the network interpreted “young lady” and turned it into a white dress, because color-matching the two different body parts from the pencil outline and making them photorealistic would have been much harder otherwise. There are photorealistic interpretations, though [1].
I’m more impressed by the radically new ones, like the fire flipping into a face—but most of those rely on having two distinct parts of the image be meaningful in their own context, and not relevant otherwise.
The black-and-white inversion man/woman is impressive because the two interpretations are not on separate parts of the image. That’s where you can interpret the quality of the effect as the model having learned how humans perceive and pay attention to dark and light contrasts differently. That one captures an understanding of perception.
I can do you one better. I've made an entire game [1] based on multistable perception [2]. Sugihara outdid me by finding optical illusions with triple interpretations [3]. A solid half of MC Escher's work was about the study of tiling wherein both the negative space and the object space could be interchanged [4].
These things aren't a mystery. There are principles you can work from to produce such multi-stable illusions in formulaic, computer generated ways without resorting to the technical debt of a neural net. But, as with so much in modern times, training a neural net gets results faster than distilling a true understanding and then translating your understanding into code.
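To make "formulaic" concrete with the simplest case I know (my own toy sketch, nothing to do with the game or Sugihara's work): Escher-style interlocking tiles fall out of a "cut and slide" rule, where the wiggle you carve into one edge of a square reappears on the opposite edge, so translated copies still cover the plane:

```python
import numpy as np

# "Cut and slide": give a square tile a wavy left edge and the *same* wavy right edge
# shifted by the tile width.  Translated copies then cover every pixel of a row exactly
# once, so figure and ground are congruent shapes that can swap roles.
n = 64
rows = np.arange(n)
cut = (10 * np.sin(2 * np.pi * rows / n)).astype(int)          # the shared wavy boundary

canvas = np.zeros((n, 4 * n), dtype=int)
for k in (1, 2):                                               # lay down two neighbouring tiles
    for r in rows:
        canvas[r, cut[r] + k * n : cut[r] + (k + 1) * n] = k   # tile k is a width-n band per row

# Between the outermost boundaries every pixel is covered exactly once (no gaps, no overlaps).
interior = canvas[:, n + 10 : 3 * n - 10]
assert np.all(interior > 0)
```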
Some of these styles of illusion I've seen drawn by hand before, but the lithophane ones are new to me. I'm sure the 3D printing lithophane community will love them.
As usual with AI-generated artwork: looks nice at first sight, but if you look closer, you can't help but notice the flaws. E.g. the ambigrams: in the "happy"/"holiday" one, the second word is actually missing the "i", and the two "blessing"s are really hard to read. Also, the "campfire man"'s face seems to be melting in a very disconcerting way...
I'm a photographer, and for years I've been pixel peeping at photos taken on phones with "portrait mode"; many years after the first introduction of the feature, regardless of the implementation, results still look crummy to my eye.
Looking at fine elements like hairs (never mind curly hair) is a disaster, especially when you're used to fine classic German/Japanese optics that accurately reproduce every subtle detail of a subject while having extremely aesthetically pleasing sharpness falloff/bokeh.
I've had to swallow the pill though: No one (end users; pros are another story) cares about those details. People just want something that vaguely looks good in the immediate moment, and then it's on to the next thing.
I suspect it'll remain the same for AI generated visuals; a sharp eye will always be able to tell, but it won't really matter for consumption by the masses (where the money is).
An inversion is a word or name written so it reads in more than one way. For instance, the word Inversions above is my name upside down. Douglas Hofstadter coined ambigram as the generic word for inversions. I drew my first inversion in 1975 in an art class, wrote a book called Inversions in 1981, and am now doing animated inversions.
I completely disagree. It's fantastic that we can get access to this hardware so cheaply. A used V100 is $1300. You could pay for Colab Pro for 10 years with that, which will get you faster and faster hardware through the years. Where I am, a month of Colab Pro costs about as much as two bags of chips.
So don't buy a V100 then, and test it for a few bucks online somewhere. If you want others to provide you that hardware for free, with no chance of you actually buying it, you just come across as entitled.
Do you also think it's sad you can't sneak into Disneyland for 5 minutes for free just to see if there are any streakers in It's a Small World?
I'm getting the impression you're just an entitled gamer who wants a free ride from the University of Michigan, not a professional programmer or AI developer who would actually get some tangible value out of subscribing to ChatGPT for $20 a month. I'm thankful to be alive in a time I can so conveniently get so much value for so little cash.
Is that the case? Is $10 really too much to ask to use a high-end GPU for a month? Then it's not really as sad and hopeless as you complain it is. Just be a good boy all year, ask Santa for a GeForce RTX 4090 for Christmas, leave some cookies and milk out for him, and hope your parents get the hint!
hah, wrong on all counts. not a gamer, yes a professional "programmer", yes paying $20 for chatgpt.
people don't pay for things without getting a feel for what they're getting. hence the huge focus in saas on various monetisation strategies. if someone puts these anagrams in a product, it will be freemium or have a free tier, and then i will play with it.
there are 20 new projects like this every day, i'm not going to pay for all of them just to try them. i'll try the product if/when there is one
If there are 20 projects like this that make you sad every day, you must be terribly depressed. But you just seem entitled, and feeling sorry for yourself. You don't seem very serious about pursuing and paying for your own self education or even amusement.
You are aware that the University of Michigan is not a startup whose mission is to make an SaaS with a free tier for you to play with funded by their investors, right? Maybe if you enrolled as a grad student they'd let you use their resources for free (once your tuition check cleared). But your chances of being accepted into their PhD program would be higher if you showed more than $10/month in enthusiasm and initiative.
So, in grad school I had access to an SGI Onyx and basically did this, but didn't toot my horn about it because:
1. I didn't think it was particularly amazing
2. We didn't have social platforms yet.