Most "AI detection tools" are just the equivalent of a Magic 8 ball.
In fact, most of them are just implemented by feeding an LLM the text, and asking "is it AI generated?". You cannot trust that answer any more than any other LLM hallucination. LLMs don't have a magic ability to recognise their own output.
Even if your "detection tool" were using exactly the same model, at exactly the same version... unless the generation was done at 0 temperature, you just wouldn't be able to confirm that the tool would actually regenerate the same text that you suspect of being LLM-generated. And even then, you'd need to know exactly which input tokens (including the prompt) were used.
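To make the temperature point concrete, here's a toy sketch (pure Python, not any vendor's decoding code, with made-up logits) of why a generation done at temperature > 0 can't simply be replayed to check whether the model "would have" produced a given text:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float) -> str:
    if temperature == 0.0:
        # Greedy decoding: the same logits always give the same token.
        return max(logits, key=logits.get)
    # Softmax with temperature, then sample: the same logits can give
    # different tokens on different runs, so the output is not reproducible.
    scaled = [l / temperature for l in logits.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return random.choices(list(logits.keys()), weights=weights, k=1)[0]

logits = {"cat": 2.0, "dog": 1.8, "fish": 0.5}
print([sample_next_token(logits, 0.0) for _ in range(5)])  # always 'cat'
print([sample_next_token(logits, 1.0) for _ in range(5)])  # varies run to run
```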
Currently, the only solution is watermarking, like what Deepmind created:
https://deepmind.google/discover/blog/watermarking-ai-genera...
but even that requires cooperation from all the LLM vendors. There's always going to be one (maybe self-hosted) LLM out there which won't play ball.
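For the curious, the general idea behind statistical watermarking looks roughly like the sketch below. It's the simple "green list" scheme from the literature (Kirchenbauer et al.), not Deepmind's actual SynthID algorithm; the key, threshold, and example sentence are all made up for illustration:

```python
# Rough sketch of statistical LLM watermarking (a "green list" scheme in the
# style of Kirchenbauer et al.), NOT DeepMind's actual SynthID algorithm.
# During generation, the model secretly boosts a pseudorandom subset of
# "green" tokens at each step; the detector recomputes that subset and checks
# whether green tokens show up more often than chance.
import hashlib

GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" at each step

def is_green(prev_token: str, token: str, key: str = "shared-secret") -> bool:
    # A keyed hash of (previous token, candidate token) decides membership,
    # so the detector can recompute the green list without rerunning the model.
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def green_fraction(tokens: list[str], key: str = "shared-secret") -> float:
    # Fraction of tokens that fall in the green list seeded by their predecessor.
    hits = sum(is_green(prev, tok, key) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

suspect = "the cat sat on the mat and the dog ran under the rug".split()
# Unwatermarked text hovers near GREEN_FRACTION; text from a cooperating,
# watermarking generator would score well above it.
print(f"green fraction: {green_fraction(suspect):.2f}")
```

The catch is exactly the one above: detection only works on text whose generator actually embedded the bias, which is why it needs every vendor (and every self-hoster) to play ball.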
If you're going to accuse someone of pushing LLM-generated content, don't hide behind "computer said so", at least not without clearly qualifying what kind of detection technique and which "detection tool" you used.
I am starting to believe this is a lie spread by AI companies because if AI-slop starts to be detected at scale, it kills their primary use case.
True, AI detection tools are not perfect; like any classification algorithm, they don't have 100% accuracy. But that does not mean they are useless: they give useful probabilities. If AI detectors are so wrong, how do you explain that when I pass AI-generated text to GPTZero it catches it every time, and when I pass human-written content it recognises it as such almost 99% of the time?
It's the false positives that make it useless. Even if it's generally very good at detecting AI, the fact that it can and does throw false positives (and pretty frequently) means that nothing it says means anything.
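To put rough numbers on that (all of them made up for illustration), even a detector that's right the vast majority of the time produces a lot of false accusations once you account for the base rate of human-written text:

```python
# Back-of-the-envelope illustration (assumed, made-up rates) of why a
# "rarely wrong" detector still yields many false accusations when most of
# the text it screens is actually human-written.
true_positive_rate = 0.95   # detector catches 95% of AI text (assumption)
false_positive_rate = 0.05  # flags 5% of human text as AI (assumption)
ai_share = 0.10             # 10% of screened texts are AI-generated (assumption)

flagged_ai = ai_share * true_positive_rate
flagged_human = (1 - ai_share) * false_positive_rate
precision = flagged_ai / (flagged_ai + flagged_human)

print(f"P(actually AI | flagged) = {precision:.0%}")  # ~68% under these numbers
```

Under those assumptions, roughly a third of all "AI" flags land on human-written text, which is why "the detector said so" on its own isn't enough to accuse anyone.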
Yes, compared to an all-or-nothing approach, it's better to be upfront about the uncertainty, especially if the tool surfaces the probability sentence by sentence.
And these very same papers highlight how much is still unknown:
> Despite their wide use and support of non-English languages, the extent of their zero-shot multilingual and cross-lingual proficiency in detecting MGT remains unknown. The training methodologies, weight parameters, and the specific data used for these detectors remain undisclosed.
GPTZero seems to be better than some of the alternatives, but other discussions here on HN from when it was launched highlighted all of the false positives and false negatives it yielded:
But all of that is pretty old. There have been a couple of posts about it in the last year, but both are about the business rather than the quality of the tool itself.
So, to check whether it's any better now, I tried it myself: I got it to yield a false negative (a 50% human / 50% AI rating for a text which was wholly AI-generated), and I haven't got it to yield a false positive.
But all of this is just anecdotal evidence; I haven't run a rigorous study.
For sure, if some competent people believe that the tool won't generate false positives, I'll be mindful of it and (in the rare cases in which I write long posts, blog articles, etc.) I'll check that it doesn't erroneously flag what I write.
It's bittersweet: if a tool that can be relied upon really exists, that would be good news. But if that tool is closed source (just like ChatGPT, Gemini, etc.), that doesn't inspire confidence. What if the closed-source detection tool suddenly starts erroneously flagging a subset of human texts which it didn't flag before?
At least, even with the closed source LLMs, we have a bunch of papers that explain their mechanism. I hope that GPTZero will be more forthcoming about the way it works.