To be fair, to me the title should be more like "are you able to tell cherry-picked AI images from cherry-picked human-created images", since in my opinion both sets were clearly selected to look similar enough to confuse a human.
No idea if that's accurate, or even if it is, how to fix it. One idea would be to get a list of the top ten search queries on a popular stock image site, then randomly pick an AI image and a human image for each. The problem is that, in my experience, it would become painfully obvious on average which was real and which was AI, which to me points out that AI images, as they're currently "generated", heavily depend on humans to create prompts and to curate which results are shared.
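For what it's worth, a rough sketch of that idea in Python, assuming hypothetical helpers for pulling a stock photo and generating an AI image (none of these function names are real APIs, and the queries are placeholders):

    import random

    # Placeholder queries; in practice these would come from a stock
    # site's published "top searches" list.
    TOP_QUERIES = ["sunset over mountains", "business meeting", "cat portrait"]

    def fetch_stock_image(query: str) -> str:
        """Hypothetical: return the path/URL of a random human-made
        stock photo matching `query`."""
        raise NotImplementedError

    def generate_ai_image(prompt: str) -> str:
        """Hypothetical: return the path of an image generated from
        `prompt` by whatever model is being evaluated."""
        raise NotImplementedError

    def build_quiz(queries):
        items = []
        for q in queries:
            items.append({"query": q, "source": "human", "path": fetch_stock_image(q)})
            items.append({"query": q, "source": "ai", "path": generate_ai_image(q)})
        random.shuffle(items)  # hide which item came from which source
        return items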
>> Except for generations which did not match the prompt at all, I took every image that was created.
While I believe I understand what you mean, to me this is not what AI-generated art means, hence why I suggested limiting prompts to existing top queries on a top stock image site and then using those as the prompts. Also, I have to say I'm very surprised that none of the AI images were cherry picked, since I personally have run 100s if not 1000s of "high quality" prompts, and in my experience the volume of misfits is normally pretty high; I've certainly read and reviewed 1000s of prompts and their shared outputs, and in my experience even those have telltale artifacts and are likely cherry picked from outputs that were much worse.
Another alternative might be to get the top best-selling stock images, run them through CLIP, then use those as the prompts, but then you're evaluating CLIP plus whatever AI image generation system you're using.
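To be clear, CLIP itself doesn't write prompts; you'd rank candidate phrases by similarity to the image. A minimal sketch, assuming the open_clip package, with placeholder candidates:

    import torch
    import open_clip
    from PIL import Image

    # Assumption: open_clip and this pretrained checkpoint are available.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    def best_prompt(image_path, candidates):
        """Return the candidate phrase whose CLIP text embedding is
        closest to the image embedding."""
        image = preprocess(Image.open(image_path)).unsqueeze(0)
        text = tokenizer(candidates)
        with torch.no_grad():
            img = model.encode_image(image)
            txt = model.encode_text(text)
            img = img / img.norm(dim=-1, keepdim=True)
            txt = txt / txt.norm(dim=-1, keepdim=True)
            scores = (img @ txt.T).squeeze(0)
        return candidates[scores.argmax().item()]

    # Hypothetical usage; the candidate phrases are placeholders.
    # best_prompt("bestseller.jpg", ["a sunset over mountains", "a cat portrait"])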
Another huge giveaway is the resolution. Obviously you're able to get ultra-high-resolution human-generated images, but to my knowledge that's not possible yet for AI images.
Which AI generation tool were you using and what were the prompts/settings?
Two models were used for image generation: the not-yet-released AniPlus v2 and vanilla Stable Diffusion 1.5. AniPlus v2 generally creates good, if not great, images consistently.
All images were generated in AUTOMATIC1111's web UI, with the DPM++ 2S A sampler @ 15 steps. I'll soon release the metadata for the AI images.
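For reference, a rough equivalent of those settings with the diffusers library (just a sketch: AniPlus v2 isn't public, diffusers has no exact DPM++ 2S a sampler, and the prompt is a placeholder):

    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverSinglestepScheduler

    # Vanilla SD 1.5 only; the AniPlus v2 checkpoint mentioned above is not public.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

    # Approximation: the single-step DPM++ scheduler stands in for the
    # ancestral DPM++ 2S a sampler used in AUTOMATIC1111's web UI.
    pipe.scheduler = DPMSolverSinglestepScheduler.from_config(pipe.scheduler.config)

    image = pipe("placeholder prompt", num_inference_steps=15).images[0]
    image.save("sample.png")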
> Actually no, they were not cherry picked. Except for generations which did not match the prompt at all
The second sentence describes a process of selection. Is there a difference in English between "filtered"/"selected" and "cherry picked"? Does "cherry picked" imply extreme filtering?
Filtered is the process of removing parts from the whole. Like how a water filter removes unwanted chemicals.
Selected means it was chosen. So after filtering you could make a selection from all or part of the remaining choices.
Cherry picked means selected individually to meet a goal.
Selecting very bad results to be removed is almost the opposite of cherry picking. Cherry picking would be picking only the best images and presenting them, here they removed the worst images and showed the rest.
To me, there's no right answer, hence why I listed a set of examples and ended the list with "etc."
Based on my own experience, selecting images that fit the prompt and selecting images that, generally speaking, just happened to appeal to the creator are different modes of producing outputs to share.
That said, as it relates to the topic of this HN post, understanding the systems, assumptions, workflow, etc. used to produce the survey, its content, and its results is, to me, vital to understanding what's actually going on.
Is removing rejects not a way to pick only the best?
It is not, unless your rejection criterion is "everything that is not the best." However, if your rejection criterion is "not meeting basic requirements," then you are left with the best, the second best, the middle, and everything down to the bottom of adequate.
So if the "basic requirements" in this case are "looks, at first glance, enough like a face to fool a human", I don't see how you could have a "best" criterion that's significantly different without accounting for the odd glitches in the current set.
You're making a distinction that I understand can exist. I'm just failing to see where it exists in this case.
A good test would document its hypothesis, procedures, results, etc., and let the reader make their own decisions based on its merits and their experience. I 100% agree cherry picking is part of the process of generating AI content, though there's no standard for evaluating how effective a system, creator, or audience is at cherry picking good generations from bad ones. Pushing as much of the cherry picking as possible onto the audience is, in my opinion, currently the best way to reduce assumptions and get a better sense of what the current state of the art is. As I mentioned before though, I'm not sure what the answer is; I just know it's an issue, especially in this context, in my opinion.
That's a great summary. Another telltale sign is the discontinuity in the partially obstructed background.
If you have a human subject in the foreground, the parts of the background to the left and to the right of the subject do not connect properly. What should be single continuous parts of a building, a window, a railing or a landscape are drawn on different heights or otherwise do not form a whole structure.
By "consistency" I was meaning to say "wrong continuity", as you put it.
BTW, for some of the anime images I felt like I had seen them on that subreddit. Even if not those exact images, I have seen similar ones. That particular style (coloring, smoothness, etc.) also made me look carefully for flaws.
Those particular images were so generic with the characters lacking any distinct traits, they could have easily been from either Stable Diffusion, or some ultra-low budget visual novel.
Probably shouldn't show duplicate images in the same test, as that just reduces the effective sample size.
You may want to spell out exactly what you mean by "AI". If someone goes into photoshop and clicks "Edit › Sky Replacement.", it's not quite what I'd call human made.
It keeps asking me to do arithmetic problems to prove I'm not an AI (several in a row) and then loses its place and starts over with #2 (not necessarily the same image).
I tried this morning (19 of 30 (63%)). I just tried again, clicked "Continue Quiz!" and then "Save for later... [arrow]", and it offered the possibility to add my email. So the GP can still fill in an email and get the result next week. (With their real email or mailinator.)
Perhaps you should change the text from "Save for later... [arrow]" to something like "Get the results by email... [arrow]".
>> CAPTCHA: In order to prevent bogus quiz submissions and reduce required network resources, please solve the following problem: (add/subtract two numbers)
>> Repeated CAPTCHA prompts, image, repeated CAPTCHA prompts, image, etc.
Wow. No idea what is going on, but you managed to create the most unusual and annoying experience I have ever had on the web. Curious, what was your reasoning for doing this?
The CAPTCHA is supposed to filter bots. For some reason, the site has a bug in its session tracking (it just stops working), so as a result it can't keep track of any quiz data.
I restarted the server so it should be working again.
> You identified 12 of 14 (85%) AI-generated images.
> You identified 14 of 16 (87%) human-made images.
In general, I followed a pretty simple rule:
- If it's a photograph, it's likely human if there're no immediate imperfections
- If it's an illustration, it's AI if the art has imperfections OR if it's anime-adjacent (example: images with glossy lighting on human skin)
Right now, SD-based art is trending toward a particular art style (anime-like and 3D-ish), with little inherent ability to create subtle details in the background. It has a hard time drawing hands correctly, and often fails at drawing reflections. It also seems to have a hard time creating flat illustrations, or illustrations with limited color palettes.
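That rule boils down to something like this toy decision function (purely illustrative; all three inputs are human judgments, not anything computed automatically):

    def guess_source(is_photo: bool, has_flaws: bool, anime_adjacent: bool) -> str:
        """Toy encoding of the rule above; inputs are eyeballed, not computed."""
        if is_photo:
            # Photographs: assume human unless something immediately looks off.
            return "human" if not has_flaws else "ai"
        # Illustrations: visible flaws or a glossy anime-adjacent style suggest AI.
        return "ai" if (has_flaws or anime_adjacent) else "human"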
It would be nice to see the mistakes at the end. Right now, I have a feeling that I misidentified the anime pictures but was good on the nature ones, but I would like to know whether that's actually the case.
If you submitted your email address, I'll email you the details next week. If not, then visit the session dump page [1][2] and send me your session ID in an email.
Interestingly, I'm not experienced with AI art, but I am a photographer and used to do 3D graphics. I focused on composition, interest and "overhype" - yes you could add a glow / halo effect here, but does it look right? Would this photo be interesting to a photographer, or is it a bit meh?
So while my hit rate is lower than others mentioned in this thread (and not really inconsistent with 50:50 guessing, with a p-value of about 12%), it feels like there are some non-detail aspects that aren't human.
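For anyone who wants to run the same sanity check, a short snippet with scipy (the score here is hypothetical, since the exact number isn't stated above):

    from scipy.stats import binomtest

    # Hypothetical score: is k correct out of n better than coin-flipping?
    k, n = 18, 30
    result = binomtest(k, n, p=0.5, alternative="greater")
    print(f"{k}/{n} correct, one-sided p-value = {result.pvalue:.2f}")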
But even then, distinguishability doesn't mean it's bad. For the most part, these were all very "accomplished" pictures: if you could make them as a human, you would be pretty pleased with yourself.
29/30. I've spent far too much time playing with StableDiffusion though, and those anime-style illustrations seemed too easy to pick out. I think the quiz could have been better with some more variety with the AI imagery.
>You identified 16 of 17 (94%) AI-generated images.
>You identified 12 of 13 (92%) human-made images.
I'm pretty sure the errors are the only two times I hesitated for a fraction of a second before clicking; it would be nice to have them in the summary.
Most AI images have issues that I find pretty jarring after having seen a lot of them. It's a bit like seeing somebody with perfect rendering skills but no drafting fundamentals. Currently the only way to get truly human-level output is to regurgitate something very close to the training data, so I don't think there's a way to make the quiz really hard and "fair" at the same time.
They almost always do have them if you look closely. I got 28 of 30 (93%) and correctly identified all AI images. Anime-style ones are obvious even beyond the hands/eyes, because SD can only produce one really specific type of shading that is a mishmash of many anime styles but doesn't match any of them; pictures of houses always have some part of them deformed; pictures of landscapes tend to be the hardest to identify, but usually have some detail that doesn't make sense when you actually think about it (one drawing of a forest had a subtle but consistent horizontal line behind the trees); etc.
Since it didn't tell you which ones you got right, I can't use my 76% as evidence (I got a lot of blurry pictures of forests, which are hard because each is either an AI image or a low-effort background art asset). But while landscapes are the hardest, a bunch of obvious tells besides hands or eyes (and the other well-known one: text) are:
Shoelaces, or anything where things pass over and under other things. The AIs did really poorly with belt buckles too. Their sense of object permanence is entirely stochastic, and small details don't get a lot of virtual brain power devoted to them.
Straight lines and repeating patterns. "Human hand holding a 30cm ruler in front of a chain link fence" is probably AI kryptonite.
Symmetry. If there's a nontrivial object that's supposed to be symmetric, chances are, it won't be.
Swept curves and perspective. Another thing that's stochastic is the AI's understanding of how 3D spaces map onto 2D images.
Circular objects being transformed in 3D, like a roundabout viewed in perspective or a curved road moving off into the distance, will typically be inconsistent with human expectations.
Of course, some of these mistakes are also made by human artists, but generally the human artists who make these mistakes also have amateurish colouring, lines or brushwork, and AIs are generally pretty good at emulating good brushwork.
Hands, definitely. AI doesn't seem sure whether a hand ought to have 0, 1, or 2 thumbs, and it's also unclear how fingers work.
I haven't noticed big problems with eyes.
AI doesn't really seem to understand how boobs work, but human artists (presumably especially male artists?) don't seem to understand how boobs work either so it isn't a tell.
Today's AI doesn't understand how text works. Several images I was shown had human text which made sense in context and therefore must have been human-made (e.g. a scene of an office front with a car whose plate had a number in the form AA99 AAA; that's how plates work in the UK, and an AI has no reason to write AA99 AAA rather than 9A9 99 or whatever). One was trickier because the text was Japanese, and I don't read Japanese; I understand a couple of easy characters but not enough to be sure, so I marked it human because the text looked plausible. If I could read Japanese, I'd have been confident.
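That UK format (two letters, two digits, a space, three letters) is easy to check mechanically, which is exactly the kind of structure current models tend to break. A rough sketch, ignoring the finer DVLA rules:

    import re

    # Current-style UK plates: two letters, two digits, space, three letters.
    UK_PLATE = re.compile(r"^[A-Z]{2}\d{2} [A-Z]{3}$")

    print(bool(UK_PLATE.match("AB12 CDE")))  # True: plausible human-written plate
    print(bool(UK_PLATE.match("9A9 99")))    # False: the kind of string an AI invents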
Cliché is in general a hard problem. Is that really obvious trope in the image there because the AI just copies what it has seen (in some broad sense), or is it there because human artists also copy what they have seen in some similarly broad sense?
Also look for geometric details that a human artist wouldn't bother with because they don't really make sense, like some golden swirly jewelry thing on a dress that barely looks like anything.
>You correctly classified 22 of 30 (73%) images!
>You identified 12 of 12 (100%) AI-generated images.
>You identified 10 of 18 (55%) human-made images.
I draw regularly so that may have helped.
I managed to identify the AI images because they contained very AI-like mistakes (i.e. poorly defined hands, inexact facial symmetry, and shadows mismatching the object). When an image contained no obvious mistakes, it was more or less a coin flip for me; it turns out the images without mistakes were probably all from humans.
I got a few repetitions, and a few images that were very similar (like in a series of drawings of the same character). Is the set of images the same for everyone, or are you randomizing it?
I found it pretty easy once I started looking at context. For example, “would this person be wearing these kind of earrings?”, etc. Seems like AI does faces incredibly well, but misses the broader picture.
Aww, you don't get to see your mistakes? I got 80% right. It's always interesting where the clue can be found (fancy mirroring, subtle but correct things, small text, textures, eyes, hands...).
How did you filter, select, etc the images?