As someone working in this field, I can tell you the rate is simply not close to 0%.
People keep trotting out these "gotcha" examples without ever actually looking at the stats. I get it, there are some terrible detectors out there, and of course they're the free ones :)
GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.
We hand-reviewed 3,000 9th-12th grade assignments in a comparison and found that GPTZero holds up really well.
In the same way that plagiarism detectors need a review process, your educational institution needs one for AI detection. Students shouldn't be immediately punished; a flag should be reviewed, and then an appropriate decision made by a person.
> GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.
One false positive out of only "five human-written samples", unless I'm misreading.
Say 50 papers are checked, 5 of them AI-generated. At the rates GPTZero showed in the paper, 3 of the 5 AI-generated papers would be correctly flagged, and 9 of the 45 human-written papers would be incorrectly flagged. So a flagged paper is only 3/(3+9) = 25% likely to actually be AI-generated.
Realistically the sample size in the paper is just far too small to make any real conclusion one way or another, but I think people fail to appreciate the difference between false positive rate and false discovery rate.
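To make the distinction concrete, here's a quick sketch of the arithmetic in my example above. The rates are illustrative back-of-envelope numbers taken from that example (3/5 detected, 9/45 false alarms), not the paper's exact figures:

```python
# Back-of-envelope false-discovery-rate calculation.
# Assumed illustrative rates from the worked example above, not the paper's exact figures.
total_papers = 50
ai_papers = 5
human_papers = total_papers - ai_papers  # 45

tpr = 0.6  # true positive rate: 3 of 5 AI-generated papers get flagged
fpr = 0.2  # false positive rate: 9 of 45 human-written papers get flagged

true_positives = tpr * ai_papers        # 3.0 correctly flagged
false_positives = fpr * human_papers    # 9.0 incorrectly flagged

# False discovery rate: fraction of *flagged* papers that are NOT AI-generated.
fdr = false_positives / (true_positives + false_positives)

print(f"Flagged papers: {true_positives + false_positives:.0f}")
print(f"P(actually AI | flagged) = {1 - fdr:.0%}")  # only 25%
```

The point is that even a modest false positive rate produces a terrible false discovery rate when most of the papers being checked are human-written, which is exactly why flags need human review rather than automatic punishment.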