
CNN-generated images are surprisingly easy to spot for now - hardmaru
https://peterwang512.github.io/CNNDetection/
======
calibwam
Off topic: Always define your abbreviations. To find out what CNN stands for
here, you either have to read a comment thread on HN, or go to the paper and
read the introduction. The linked page doesn't even mention neural networks.
And as some other commenter here has mentioned, CNN has other more well known
meanings than Convolutional Neural Networks.

~~~
alias_neo
It was drilled into us in university (engineering) that you spell out
abbreviations and acronyms on first use, no matter how well known you think it
is.

Some cases I've seen lately seem to forgo this not out of ignorance but as a
form of elitism/knowledge gatekeeping.

~~~
everdrive
> Some cases I've seen lately seem to forgo this not out of ignorance but as a
> form of elitism/knowledge gate keeping.

It's a natural tendency for ingroups. Nearly any video game forum, or anything
else that's full of hobbyists will ultimately contain posts that are
absolutely full of acronyms. And they're impenetrable. Bear in mind, I'm not
defending this behavior, and certainly not disagreeing with you.

~~~
yjftsjthsd-h
I'll defend it. Not the elitism, but using jargon/abbreviations/etc. When
you're writing something for a larger audience, you should of course target
that audience. But when you're on the "inside" writing for people who already
have the background knowledge, it's unnecessary friction to stop and think
"what terms would a newbie need defined in this?" It breaks the flow of
writing/discourse and is _probably_ mostly not needed, because someone coming
in not knowing the terms in play can either go look them up, or just ask in
their own post. (Granted, this also depends on that being easy; either a
jargon dictionary being available, or the forum members being friendly to
newbie questions.) I think it's also _understandable_ to apply a _small_
amount of gatekeeping, insofar as continual beginner questions in the middle
of an advanced discussion are just a distraction. The answer to that _should_
be directing them to a more beginner-friendly subforum, but FWIW I do
understand why people sometimes act poorly out of frustration.

~~~
ameister14
The solution to the "unnecessary friction to stop and think 'what terms would
a newbie need defined in this?'" problem is straightforward:

If you're writing a paper, define every acronym the first time you use it.

If you're in a forum with a set of acronyms known to all, define them in a
sticky or the forum readme.

------
blueblisters
I wonder if these results hold when the CNN-generated images are converted to
an analog medium and back to digital (say scanning a printout or taking a
screencap).

If not, this might indicate that the fingerprints or artifacts left by the
generators are not of the "perceptible" variety.

Also a discriminator trained from this experiment might be useful to train a
more powerful generator.

~~~
manthideaal
From the paper:

(1) We show that when the correct steps are taken, classifiers are indeed
robust to common operations such as JPEG compression, blurring, and resizing.

(2) When the images are manipulated with Photoshop-like methods, the detector
performs at chance (i.e. it is useless).
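The "correct steps" in (1) amount to training the classifier on images that have been randomly post-processed (the paper uses JPEG compression and Gaussian blur). A minimal numpy sketch of the blur half of that augmentation; the function names, probabilities, and sigma range here are my assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel1d(sigma, radius=3):
    """Normalized 1D Gaussian kernel for a separable blur."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def random_augment(img):
    """With 50% probability, blur a 2D image with a random sigma.
    A full recipe would also randomly JPEG-compress, omitted here."""
    if rng.random() < 0.5:
        k = gaussian_kernel1d(sigma=rng.uniform(0.5, 3.0))
        # separable blur: convolve each row, then each column
        img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
        img = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)
    return img
```

Training on `random_augment(img)` instead of `img` is what makes the learned detector robust to those same operations at test time.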

------
leod
Interesting. They train an image classifier to detect images that were
generated by a GAN-trained CNN. I wonder if it could be possible to include
this classifier in the training loss, such that the generated images fly under
its radar as much as possible. If this makes sense, then I guess the cat-and-
mouse game just gained another level. On the other hand, what the classifier
is detecting could be a fingerprint of the CNN architecture itself.

(Full disclosure: I have only read the abstract so far.)

~~~
NoodleIncident
> Due to the difficulties in achieving Nash equilibria, none of the current
> GAN-based architectures are optimized to convergence, i.e. the generator
> never wins against the discriminator.

If I understand the terms correctly, you're suggesting adding this classifier
to the discriminator so the generator learns to avoid detection. But since
generators already fail to fully beat their existing discriminators, it seems
they could try to avoid detection without actually succeeding.

------
slipheen
I'm not particularly familiar with neural nets, so forgive a rather ignorant
question.

Could the classifier that they're using here be used as a discriminator in a
GAN, to help train it to avoid this detection method?

~~~
skinner_
Absolutely possible, might even be a good idea, but my expectation is that the
results won't be robust: the fakes will be uncovered by a slightly differently
trained classifier. Maybe even the same classifier with a different random
initialization.

~~~
SkyBelow
Sounds like overfitting the defense against classification. Would existing
solutions to overfitting possibly fix this (though make such a network even
more expensive to train)?

------
manthideaal
I have read the paper and there are plenty of useful references and points:
related work, the 11 CNN based image generators models, and the discussion
part.

But sadly I could not get a clear picture of what the difference is between
their detector and a baseline one. There are some minor points and references
about upsampling, downsampling, resizing, cropping, and Fourier-spectrum
comparisons across generators, but those seem to be side comments rather than
crucial points in the construction of the detector. Furthermore, data
augmentation doesn't play a big role; they say it usually improves the
detector a little.

As a math person I like to get more meat from papers, but here it seems that
little tricks allow them to win the game. Perhaps that is the way (little or
no math involved) to make advances. Well, at least they say that shallow
methods modify the fingerprint in the Fourier spectrum, so that you can no
longer detect which generator produced the image.

Perhaps the word "universal" was what captured my attention.

------
pgodzin
If this "universal detector" is now used as a discriminator and the original
models are fine-tuned/re-trained, then it will stop being a universal
detector, no?

~~~
villgax
As long as the underlying CNN math stays the same, it would not matter. Uber
worked on CoordConv to produce better image outputs from CNNs without the
artefacts this method capitalises on.

------
kfuwbi2640
I’ve seen a number of attempts to identify deepfakes and other forms of
manipulated images using AI. This seems like a fool’s errand since it becomes
a never ending adversarial AI arms race.

Instead, here is a proposal I haven't seen that I think could work well:
camera and phone manufacturers could have their devices cryptographically
sign each photo or video taken. And that's it. From that starting place, you
can build a system on top of it to verify that the image on the site you're
reading is authentic. What am I missing that makes this an invalid approach?

I do understand that this would require manufacturers to implement, but it
seems achievable to get them on board. I even think you could get one company
like Apple to do this, and that would be enough traction for the rest of the
industry to have to follow suit.
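The sign-and-verify flow being proposed can be sketched in a few lines. This is a hypothetical illustration only: a real camera would hold a private key in secure hardware and use an asymmetric scheme (e.g. Ed25519) so anyone can verify without the key; HMAC with a shared secret is used here just to keep the example to the standard library.

```python
import hashlib
import hmac

# Stand-in for a key burned into the camera's secure element (assumption).
DEVICE_KEY = b"secret-key-burned-into-camera"

def sign_photo(image_bytes: bytes) -> bytes:
    """Produce a signature over the raw image bytes at capture time."""
    return hmac.new(DEVICE_KEY, image_bytes, hashlib.sha256).digest()

def verify_photo(image_bytes: bytes, signature: bytes) -> bool:
    """Check that the image has not been altered since it was signed."""
    expected = hmac.new(DEVICE_KEY, image_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

Any pixel-level edit after capture invalidates the signature, which is the property the verification layer on top would rely on.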

------
Shivetya
Does it matter that they are easy to spot when the damage they can do would be
well underway before a trusted service invalidates the image?

I am coming at this from the angle of: who would use this type of service
other than the courts? Certainly major news organizations could benefit, but
we have numerous recent examples where they have run with CNN imagery, and
have also purposefully run video and images of similar events to portray the
view they wanted for a current event.

Of course, in the end, if the end game is to have news, image, and video
validation, there will need to be more than one validation service, in
separate enough parts of the world that there is some chance not all of them
would be intimidated or infiltrated to the point of no longer being
trustworthy.

------
ThePowerOfFuet
Spoiler alert: this has nothing to do with the Cable News Network.

~~~
ash
What does CNN stand for in this case?

~~~
slipheen
Convolutional Neural Networks

------
andy_ppp
Surely, if you wanted to, you could train the network to produce images that
are not easily detectable?

So:

1) train a network that can detect CNN-generated images

2) train the generator network to produce whatever you want (politicians in
compromising positions, etc.), but also add a loss term that penalises
detection by the other network

3) the images won't be easy to spot...

People will obviously start writing CNNs that detect images generated and
obfuscated this way, but still, it's all possible.
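Step 2 amounts to folding the detector's output into the generator's objective. A toy sketch of such a combined loss, under my own assumptions: `detector_score` is P(image is CNN-generated) from the detector, `disc_score_real` is the GAN discriminator's P(real) for the generated image, and the weight `LAMBDA` is arbitrary:

```python
import math

LAMBDA = 0.5  # weight on the anti-detection term (assumed, needs tuning)

def generator_loss(disc_score_real: float, detector_score: float) -> float:
    # standard non-saturating GAN term: reward fooling the discriminator
    gan_term = -math.log(max(disc_score_real, 1e-8))
    # extra penalty for being flagged by the fake-image detector
    evade_term = -math.log(max(1.0 - detector_score, 1e-8))
    return gan_term + LAMBDA * evade_term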

~~~
guidopallemans
What you describe is exactly the way these models work!

Typically, a GAN (Generative Adversarial Network) consists of (1) the
generator, a model that generates images, and (2) the discriminator, a model
that learns whether the images it is fed come from the generator or from the
real image dataset. The (gradient) information about how the discriminator
made its decision is fed back into the generator, helping it learn to
generate more _real_-looking images.

The discriminator is what you describe in step 1, and the generator is your
step 2.
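The alternating loop can be shown schematically. The `Toy*` classes below are trivial stand-ins for real networks (a generator that just learns a scalar offset, pulled toward the real data's mean by a gradient-like feedback signal); the point is only where the signals flow, not the models themselves:

```python
import random

random.seed(0)

class ToyGenerator:
    def __init__(self):
        self.w = 0.0  # single learnable parameter

    def sample(self, noise):
        return [self.w + n for n in noise]

    def update(self, feedback, lr=0.1):
        # move in the direction the discriminator says looks "more real"
        self.w += lr * sum(feedback) / len(feedback)

class ToyDiscriminator:
    def __init__(self, real_mean):
        self.real_mean = real_mean

    def update(self, real, fake):
        pass  # a real discriminator would fit its parameters here

    def feedback(self, fake):
        # gradient-like signal pushing each sample toward the real data
        return [self.real_mean - x for x in fake]

gen, disc = ToyGenerator(), ToyDiscriminator(real_mean=3.0)
for _ in range(200):
    noise = [random.gauss(0, 1) for _ in range(16)]
    fake = gen.sample(noise)
    disc.update(real=None, fake=fake)   # discriminator step
    gen.update(disc.feedback(fake))     # generator uses its signal
```

After training, `gen.w` has been pulled close to the real mean of 3.0, which is the toy analogue of the generator learning the real image distribution.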

------
DannyB2
Arms race. First an AI can generate synthesized images.

Next, it is possible to have a test which detects those. And that test can be
improved by better training.

Then the image synthesizer is trained against the detector: at first the
fake image detector can still spot its output, but eventually it learns how
to fool the detector.

Then the fake image detector is improved by training it against the improved
fake image synthesizer.

Repeat.

~~~
darawk
This is the premise of GANs. It seems to me that the tech to make extremely
hard-to-spot fakes using GANs already exists; it's just that nobody has
bothered to write the code to do it.

------
manthideaal
My understanding from the discussion (section 5 of the linked paper) is that
the GAN could be modified so that the relative power of the discriminator and
generator is fine-tuned to produce hard-to-detect images, by giving the
discriminator more power in the final training steps.
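One simple way to "give the discriminator more power late in training" is to take several discriminator updates per generator update and ramp the ratio up over time. This sketch is my own illustration of that idea, not the paper's recipe; the schedule numbers are arbitrary:

```python
def disc_steps_for(step: int, total_steps: int) -> int:
    """Discriminator updates per generator update: ramps from 1 toward 5
    as training progresses (schedule is an assumption)."""
    frac = step / total_steps
    return 1 + int(4 * frac)

def train(total_steps: int):
    schedule = []
    for step in range(total_steps):
        n_d = disc_steps_for(step, total_steps)
        schedule.append(n_d)
        # for _ in range(n_d): update_discriminator()
        # update_generator()
    return schedule
```

A stronger discriminator late in training forces the generator to scrub exactly the kinds of artifacts a post-hoc detector would key on.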

------
adversary10450
How is this different from an adversarial network?

~~~
pmelendez
To my knowledge, adversarial networks are actually two different networks,
one "correcting" the output of the other. A CNN, on the other hand, is a
single architectural model that internally uses convolutions.

------
a3n
Article on CNN-generated images being surprisingly easy to spot is
surprisingly difficult to read on mobile ... for now.

------
villgax
Uber had this CoordConv technique to create better CNN-generated images;
maybe that could fool this detector?

~~~
nl
No, that's 2 years old at this point. The tech has moved on a lot since then.

------
takeda
Is there a high res version of these images to look at?

~~~
relevant1
From their github page:
[https://drive.google.com/file/d/1z_fD3UKgWQyOTZIBbYSaQ-hz4AzUrLC1/view?usp=sharing](https://drive.google.com/file/d/1z_fD3UKgWQyOTZIBbYSaQ-hz4AzUrLC1/view?usp=sharing)

------
Chinjut
Yeah, from their logo in the corner!!

------
thinkloop
I might argue that the only reason this is on the front page is the confusion
surrounding "CNN".

