Most "AI detection tools" are just the equivalent of a Magic 8 ball.
In fact, most of them are just implemented by feeding an LLM the text, and asking "is it AI generated?". You cannot trust that answer any more than any other LLM hallucination. LLMs don't have a magic ability to recognise their own output.
Even if your "detection tool" were using exactly the same model, at exactly the same version... unless the generation was done at 0 temperature, you just wouldn't be able to confirm that the tool would actually regenerate the same text that you suspect of being LLM-generated. And even then, you'd need to know exactly which input tokens (including the prompt) were used.
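To make the temperature point concrete, here's a toy sketch (pure Python, not any vendor's decoding code, with made-up logits) of why a generation done at temperature > 0 can't simply be replayed to check whether the model "would have" produced a given text:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float) -> str:
    if temperature == 0.0:
        # Greedy decoding: the same logits always give the same token.
        return max(logits, key=logits.get)
    # Softmax with temperature, then sample: the same logits can give
    # different tokens on different runs, so the output is not reproducible.
    scaled = [l / temperature for l in logits.values()]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return random.choices(list(logits.keys()), weights=weights, k=1)[0]

logits = {"cat": 2.0, "dog": 1.8, "fish": 0.5}
print([sample_next_token(logits, 0.0) for _ in range(5)])  # always 'cat'
print([sample_next_token(logits, 1.0) for _ in range(5)])  # varies run to run
```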
Currently, the only solution is watermarking, like what Deepmind created:
https://deepmind.google/discover/blog/watermarking-ai-genera...
but even that requires cooperation from all the LLM vendors. There's always going to be one (maybe self-hosted) LLM out there which won't play ball.
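For the curious, the general idea behind statistical watermarking looks roughly like the sketch below. It's the simple "green list" scheme from the literature (Kirchenbauer et al.), not Deepmind's actual SynthID algorithm; the key, threshold, and example sentence are all made up for illustration:

```python
# Rough sketch of statistical LLM watermarking (a "green list" scheme in the
# style of Kirchenbauer et al.), NOT DeepMind's actual SynthID algorithm.
# During generation, the model secretly boosts a pseudorandom subset of
# "green" tokens at each step; the detector recomputes that subset and checks
# whether green tokens show up more often than chance.
import hashlib

GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" at each step

def is_green(prev_token: str, token: str, key: str = "shared-secret") -> bool:
    # A keyed hash of (previous token, candidate token) decides membership,
    # so the detector can recompute the green list without rerunning the model.
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def green_fraction(tokens: list[str], key: str = "shared-secret") -> float:
    # Fraction of tokens that fall in the green list seeded by their predecessor.
    hits = sum(is_green(prev, tok, key) for prev, tok in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

suspect = "the cat sat on the mat and the dog ran under the rug".split()
# Unwatermarked text hovers near GREEN_FRACTION; text from a cooperating,
# watermarking generator would score well above it.
print(f"green fraction: {green_fraction(suspect):.2f}")
```

The catch is exactly the one above: detection only works on text whose generator actually embedded the bias, which is why it needs every vendor (and every self-hoster) to play ball.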
If you're going to accuse someone of pushing LLM-generated content, don't hide behind "computer said so", at least not without clearly qualifying what kind of detection technique and which "detection tool" you used.
I am starting to believe this is a lie spread by AI companies because if AI-slop starts to be detected at scale, it kills their primary use case.
True, AI detection tools are not perfect; like any classification algorithm, they don't have 100% accuracy. But that does not mean they are useless: they give useful probabilities. If AI detectors are so wrong, how do you explain that when I pass AI-generated text to GPTZero it catches it every time, and when I pass human-written content it recognises it as such almost 99% of the time?
It's the false positives that make it useless. Even if it's generally very good at detecting AI, the fact that it can and does throw false positives (and pretty frequently) means that nothing it says means anything.
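To put rough numbers on that (all of them made up for illustration), even a detector that's right the vast majority of the time produces a lot of false accusations once you account for the base rate of human-written text:

```python
# Back-of-the-envelope illustration (assumed, made-up rates) of why a
# "rarely wrong" detector still yields many false accusations when most of
# the text it screens is actually human-written.
true_positive_rate = 0.95   # detector catches 95% of AI text (assumption)
false_positive_rate = 0.05  # flags 5% of human text as AI (assumption)
ai_share = 0.10             # 10% of screened texts are AI-generated (assumption)

flagged_ai = ai_share * true_positive_rate
flagged_human = (1 - ai_share) * false_positive_rate
precision = flagged_ai / (flagged_ai + flagged_human)

print(f"P(actually AI | flagged) = {precision:.0%}")  # ~68% under these numbers
```

Under those assumptions, roughly a third of all "AI" flags land on human-written text, which is why "the detector said so" on its own isn't enough to accuse anyone.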
Yes, compared to an all-or-nothing approach, it's better to be upfront about the uncertainty, especially if the tool surfaces the probability sentence by sentence.
And these very same papers highlight how much is still unknown:
> Despite their wide use and support of non-English languages, the extent of their zero-shot multilingual and cross-lingual proficiency in detecting MGT remains unknown. The training methodologies, weight parameters, and the specific data used for these detectors remain undisclosed.
GPTZero seems to be better than some of the alternatives, but other discussions here on HN from when it was launched highlighted all of the false positives and false negatives it yielded:
But all of that is pretty old. There have been a couple of posts about it in the last year, but both are about the business rather than the quality of the tool itself.
So, to check whether it's any better now, I tried it myself: I got it to yield a false negative (a 50% human / 50% AI rating for a text which was wholly AI-generated), and I haven't got it to yield a false positive.
But all of this is just anecdotal evidence; I haven't run a rigorous study.
For sure, if some competent people believe that the tool won't generate false positives, I'll be mindful of it and (in the rare cases in which I write long posts, blog articles, etc.) I'll check that it doesn't erroneously flag what I write.
It's bittersweet: if a tool that can be relied upon really exists, that would be good news. But if that tool is closed source (just like ChatGPT, Gemini, etc.), that doesn't inspire confidence. What if the closed-source detection tool suddenly starts erroneously flagging a subset of human texts which it didn't flag before?
At least, even with the closed source LLMs, we have a bunch of papers that explain their mechanism. I hope that GPTZero will be more forthcoming about the way it works.