OpenAI shuts down its AI Classifier due to poor accuracy (decrypt.co)
503 points by cbowal on July 25, 2023 | 283 comments



I'm glad that they did, although they obviously should have made an announcement about it.

The number of people in the ecosystem who think it's even possible to detect whether something is AI-written when it's just a couple of sentences is staggeringly high. And somehow, people in power seem to put their faith in some of these tools that guarantee a certain amount of truthfulness when in reality it's impossible for them to guarantee that, and act on whatever these "AI vs Human-written" tools tell them.

So hopefully this can serve as another example that it's simply not possible to detect if a bunch of characters were outputted by an LLM or not.


Indeed it's not possible. Say you had a classifier that detected whether a given text was AI generated or not. You can easily plug this classifier into the end of a generative network trying to fool it, and even backpropagate all the way from the yes/no output to the input layer of the generative network. Now you can easily generate text that fools that classifier.
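
To make that attack concrete, here's a minimal sketch of the idea, assuming a toy generator and a frozen toy detector. All names, sizes and the soft-embedding trick are my own illustrative assumptions; real text is discrete, so a real attack would need something like Gumbel-softmax or RL rather than this fully differentiable toy.

    # Toy sketch: use a frozen "AI detector" as an adversarial training signal.
    # Vocab size, model shapes and the soft-embedding relaxation are illustrative
    # assumptions, not anything any vendor actually ships.
    import torch
    import torch.nn as nn

    VOCAB, EMB, HID = 1000, 32, 64

    class ToyGenerator(nn.Module):
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.rnn = nn.GRU(EMB, HID, batch_first=True)
            self.head = nn.Linear(HID, VOCAB)
        def forward(self, prompt_ids):
            h, _ = self.rnn(self.emb(prompt_ids))
            return torch.softmax(self.head(h), dim=-1)   # soft token distributions

    class ToyDetector(nn.Module):                        # frozen classifier: P(AI-written)
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(VOCAB, EMB)
            self.clf = nn.Linear(EMB, 1)
        def forward(self, token_probs):
            # consume the *expected* embedding so gradients flow back to the generator
            soft_emb = token_probs @ self.emb.weight
            return torch.sigmoid(self.clf(soft_emb).mean(dim=1))

    gen, det = ToyGenerator(), ToyDetector()
    for p in det.parameters():
        p.requires_grad_(False)                          # detector weights stay fixed

    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    prompt = torch.randint(0, VOCAB, (8, 20))
    for _ in range(100):
        p_ai = det(gen(prompt))                          # detector's "AI" probability
        loss = p_ai.mean()                               # generator minimizes being detected
        opt.zero_grad()
        loss.backward()                                  # gradients pass through the frozen detector
        opt.step()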

So such a model is doomed from the start, unless its parameters are a closely-guarded secret (and never leaked). Then it means it's foolable by those with access and nobody else. Which means there's a huge incentive for adversaries to make their own, etc. etc. until it's just a big arms race.

It's clear the actual answer needs to be: we need better automated tools to detect quality content, whatever that might mean, whether written by a human or an AI. That would be a godsend. And if it turned into an arms race, the arms we're racing each other to build are just higher-quality content.


The whole problem with AI is that it's able to copy some of the superficial indicators of quality content while feeding you lies. You cannot detect quality content without detecting truthfulness. Any heuristic you use in place of that can be copied without actually providing value (which is exactly what ChatGPT does now, when it gets things wrong)


That's the whole problem with LLMs in general. They are designed to be convincing, not necessarily accurate.


that's my take on all this LLM hype: they're great at creative work where accuracy is not important, and at assisting in technical work where the user is already able to discern an answer that is accurate from one that is slightly to fully bullshit.

even if an LLM can give an amazing and correct answer 7/10 times, it still takes a human expert to cherrypick which 7 answers are amazing and which are just convincingly-assembled bs.


Yeah - but there’s a lot of domains where that is fine. I’m in France at the moment and I’ve been using chatgpt as a tour guide. I’m sure some of what it says is wrong, but I don’t honestly care. It’s also fantastic for teaching. I’ve been doing some self study lately and it’s been helping me to figure out what I should spend time learning and help direct my self study sessions toward what will help.

I listened to an interview with the StabilityAI founder / ceo the other day. He said we should think about LLMs like having a bunch of clever grad students / interns floating around that we can freely offload tasks to. They aren’t experts, but they’re very diligent. The question is, how can we effectively make use of them? People who succeed at this will be much more productive.


Can you explain how you use it for teaching/study? I also have used it to learn but with mixed results. Recently, I've been asking it to write me outlines so I can have somewhat of a learning plan.


I'm learning AI at the moment. I gave ChatGPT the following prompt:

> Write a training plan for a series of lessons to teach someone modern deep learning. The training plan should last for approximately 3 months of lessons.

> The lesson plan is for a single student with a strong background in programming (systems programming, algorithms and web). But the student has little knowledge of python. And university level mathematics knowledge but relatively weak skills in linear algebra, probability and statistics.

> By the end of the training process, the student should know modern deep learning methods and techniques and be able to modify, implement and deploy AI based systems.

> Think through your answer. Start by listing out learning objectives, then write a teaching plan to meet those learning objectives.

The response from chatgpt was super long! It gave me recommendations for what to study each week for the next 3 months. I've started going through the material it recommended. For the first 2 weeks, my goal is to learn the basics of python, and learn some linear algebra, and probability and statistics. Then it's just a case of finding appropriate material online. I'm watching a lecture series on youtube teaching matrix mathematics now.

[1] https://chat.openai.com/share/d6966012-0d96-4511-b96e-086b80...


Most of those topics require months to learn. Is it feeding you a small number of lessons each week?


I could be wrong, but it looks pretty reasonable to me assuming fulltime study. Obviously, I'll scale how much time I spend on each topic based on what interests me, what I find difficult and how much theory vs practice I want at the time. Too much time coding and I might never get through all the content. Too much theory (lectures) and I'll lose motivation, and stop remembering the lessons as well.

I haven't asked for specific lessons from chatgpt. But there's fantastic material about most of this stuff online. Pick just about any of those bullet points in the lesson plan and there's a ton of great videos on youtube, and courses on coursera and friends if I want to go deeper. And I'm sure I'll be asking questions to chatgpt as I go as well. And, maybe, ask for more detailed lesson plans and suggestions on example problems to code up.

With chatgpt as a personalized tutor and youtube & coursera for lectures, it's astounding how easy it is to learn stuff like this now.


Lots of human discourse is that way too. The LLMs just learned it from us.


Whether that's true or not is irrelevant to this discussion; we're discussing problems with AI tooling, which can generate highly convincing text something like 100000x faster than you.


> You can easily plug this classifier into the end of a generative network trying to fool it, and even backpropagate all the way from the yes/no output to the input layer of the generative network. Now you can easily generate text that fools that classifier

could you contextualize your use of the word "easily" here?

I feel like "easily" might mean "with infinite funds and frictionless spherical developers."


GANs are established engineering. Infinite funds and frictionless spheres aside, you don't need to break new ground; you can copy/paste/glue existing code with some comprehension.

LLMs are newer than GANs afaik, it just so happens GANs are a good fit here, not that one is "smarter" or "dumber".


Even without a GAN, it's quite possible that one could simply write `while(rejects(output)) { output = gpt(prompt) }` and obtain a sufficient output after a reasonable number of iterations.
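
Fleshing that loop out a little: a minimal sketch where generate() is a hypothetical stand-in for whatever model you call, and the "detector" is a deliberately dumb phrase blacklist, just to make the control flow concrete.

    # Rejection-sampling sketch of the loop above. Both functions below are
    # placeholders/assumptions, not real APIs.
    import random

    STOCK_PHRASES = ("as an ai language model", "it is important to note")

    def looks_ai_generated(text: str) -> bool:
        # toy detector: swap in whatever classifier you want to evade
        t = text.lower()
        return any(p in t for p in STOCK_PHRASES)

    def generate(prompt: str, temperature: float) -> str:
        raise NotImplementedError("call your LLM of choice here")

    def generate_undetected(prompt: str, max_tries: int = 20) -> str | None:
        for _ in range(max_tries):
            # nudge the temperature so retries aren't near-duplicates
            text = generate(prompt, temperature=random.uniform(0.7, 1.2))
            if not looks_ai_generated(text):
                return text
        return None  # give up; tweak the prompt or rewrite by hand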


You just outlined an excellent proof of what might be called the AI Halting Problem.


Even the idea of it is bad, ChatGPT is supposed to write indistinguishably from a human.

The "detector" has extremely little information and the only somewhat reasonable criteria are things like style, where ChatGPT certainly has a particular, but by no means unique writing style. And as it gets better it will (by definition) be better at writing in more varied styles.


Copying a comment I posted a while ago:

I listened to a podcast with Scott Aaronson that I'd highly recommend [0]. He's a theoretical computer scientist but he was recruited by OpenAI to work on AI safety. He has a very practical view on the matter and is focusing his efforts on leveraging the probabilistic nature of LLMs to provide an undetectable digital watermark. It nudges certain words to be paired together slightly more often than chance, and you can mathematically derive, with some level of certainty, whether an output or even a section of an output was generated by the LLM. It's really clever and apparently he has a working prototype in development.

One workaround he hasn't figured out yet is asking for an output in language X and then translating it into language Y. But that may still eventually be figured out.

I think watermarking would be a big step forward to practical AI safety and ideally this method would be adopted by all major LLMs.

That part starts around 1 hour 25 min in.

> Scott Aaronson: Exactly. In fact, we have a pseudorandom function that maps the N-gram to, let’s say, a real number from zero to one. Let’s say we call that real number ri for each possible choice i of the next token. And then let’s say that GPT has told us that the ith token should be chosen with probability pi.

https://axrp.net/episode/2023/04/11/episode-20-reform-ai-ali...
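
As I understand his description elsewhere, the selection rule is to pick the token i that maximizes ri^(1/pi), which preserves the model's distribution on average while letting a key-holder detect the skew later. A toy sketch; the secret key, hash choice and numbers are my own illustrative assumptions:

    # Toy sketch of the watermark selection rule as I understand it: hash the
    # previous N-gram plus a secret key into a pseudorandom r_i in [0,1) per
    # candidate token, then emit the token maximizing r_i ** (1 / p_i).
    # Detection (not shown) would recompute r for the emitted tokens and test
    # whether they are skewed high, e.g. via the sum of -log(1 - r).
    import hashlib
    import math

    SECRET_KEY = b"example-key"   # assumption: held privately by the model provider

    def prf(ngram: tuple[int, ...], token: int) -> float:
        data = SECRET_KEY + str((ngram, token)).encode("utf8")
        h = hashlib.sha256(data).digest()
        return int.from_bytes(h[:8], "big") / 2**64   # pseudorandom r in [0, 1)

    def pick_token(ngram: tuple[int, ...], probs: dict[int, float]) -> int:
        # maximizing r_i ** (1/p_i) is equivalent to maximizing log(r_i) / p_i
        return max(probs, key=lambda tok: math.log(prf(ngram, tok) + 1e-12) / probs[tok])

    # example: last-4-token context and a toy next-token distribution
    context = (11, 7, 42, 99)
    next_probs = {5: 0.6, 17: 0.3, 123: 0.1}
    print(pick_token(context, next_probs))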


I think the chance of this working reliably is precisely zero. There are multiple trivial attacks against it, and it cannot work if the user has any kind of access to token-level data (where they could trivially make their own truly random choice). And if there is a non-watermarking neural network with enough capacity to do simple rewriting, you can easily remove any watermark, or the user can do the minor rewrite himself.


It’ll be the equivalent to a shutterstock watermark.


I heard of this (very neat) idea and gave it some thought. I think it can work very well in the short term. Perhaps OpenAI has already implemented this and can secretly detect long enough text created by GPT with high levels of accuracy.

However, as soon as a detection tool becomes publicly available (or even just the knowledge that watermarking has been implemented internally), a simple enough garbling LLM would pop up that would only need to be smart enough to change words and phrasing here and there.

Of course these garbling LLMs could have a watermark of their own... So it might turn out to be a kind of cat-and-mouse game but with strong bias towards the mouse, as FOSS versions of garblers would be created or people would actually do some work manually, and make the changes by hand.


There are already quite complex language models which can run on a CPU. Outside of the government banning personal LLMs, the chance that a working, fully FOSS, open-data rewrite model wouldn't exist, once it became known that ChatGPT output is marked, seems very low.

The watermarking techniques also cannot work after some level of sophisticated rewriting. There simply will be no data encoded in the probabilities of the words.


If it's sophisticatedly rewritten then it's no longer AI generated


That is not a reliable indicator even today. GPT-4 (not the ChatGPT RLHF one) is not distinguishable from human writing. You could ask it about modern events, but that's not a long term plan, and it could just make the excuse they don't follow the news.


This, or cryptographic signing (like what the C2PA suggests) of all real digital media on the Earth are the only ways to maintain consensus reality (https://en.wikipedia.org/wiki/Consensus_reality) in a post-AI world.

I personally would want to live in Aaronson's world, and not the world where a centralized authority controls the definition of reality.


How can we maintain consensus reality, when it has never existed? There are a couple of bubbles of humanity where honesty and skepticism are valued. Everywhere else, at all moments of history, truth has been manipulated to subjugate people. Be it newspapers owned by political families, priests, etc.


This would be trivially broken once sufficiently good open source pretrained LLMs become available, as bad actors would simply use unwatermarked models.


Even if you could force the bad actors to use this watermarked large language model, there's no guarantee that they couldn't immediately feed that through Langchain into a different large language model that would render all the original watermarks useless.


I'd challenge this assumption. ChatGPT is supposed to convey information and answer questions in a manner that is intelligible to humans. It doesn't mean it should write indistinguishably from humans. It has a certain manner of prose that (to me) is distinctive and, for lack of a better descriptor, silkier, more anodyne, than most human writing. It should only attempt a distinct style if prompted to.


ChatGPT is explicitly trained on human writing; its training goal is explicitly to emulate human writing.

>It should only attempt a distinct style if prompted to.

There is no such thing as an indistinct style. Any particular style it could have would be made distinct by it being the style ChatGPT chooses to answer in.

The answers that ChatGPT gives are usually written in a style combining somewhat dry academic prose and the type of writing you might find in a Public Relations statement. ChatGPT sounds very confident in the responses it generates to the queries of users, even if the actual content of the information is quite doubtful. With some attention to detail I believe that it is quite possible for humans to emulate that style; further, I believe that the style was designed by the creators of ChatGPT to make the output of the machine learning algorithm seem more trustworthy.


That's true, you could even purposely inject fingerprinting into its writing style and it could still accomplish the goal of conveying information to people.


All I would have to do is run the same tool over the text, see it gets flagged, and then modify the text until it no longer gets flagged. That's assuming I can't just prompt inject my way out of the scenario.


That's true of virtually any detection tool, no?

"All I have to do is modify my virus until the anti-virus doesn't detect it."


But then that wouldn't be “detecting AI”, but merely recognizing an intentionally added fingerprint, which sounds far less attractive…


I tried an experiment when GPT4 allowed for browsing. I sent it my website and asked it to read my blog posts, then to write a new blog post in my writing style. It did an ok job. Not spectacular but it did pick up on a few things (I use a lot of -'s when I write).

The point being that it's already possible to change ChatGPT's tone significantly. Think of how many people have done "Write a poem but as if <blah famous person> wrote it". The idea that ChatGPT could be reliably detected is kind of silly. It's an interesting problem but not one I'd feel comfortable publishing a tool to solve.


Yup.

Moreover, the way to deal with AI in this context is not like the way to deal with plagiarism; do not try to detect AI and punish its use.

Instead, assign its use, and have the students critique the output and find the errors. This both builds skills in using a new technology and, more critically, builds the essential skills of vigilance for errors and deeper understanding of the material — really helping students strengthen their BS detectors, a critical life skill.


Yes. Whether we like it or not, AI is with us to stay. A skill that AI can easily supplant is a skill that will become outdated very quickly. We're better off teaching students how to use AI effectively. Hopefully this will "future proof" them somewhat.


Nitpick: ChatGPT is supposed to write in a way that is indistinguishable from a human, to another human.

That doesn't mean that it can't be distinguishable by some other means.


I think for small amounts of text there's no way around it being indistinguishable, whether to a machine or to a human. There just aren't that many combinations of words that still flow well. Furthermore, as more and more people use it, I think we'll find some humans subconsciously changing their speech patterns to mimic whatever it does. I imagine with longer text there will be things they'll be able to find, but I think it will end up being trivial for others to detect what those changes are and then modify the result enough to be undetectable.


I think for this sort of problem it is more productive to think in terms of the amount of text necessary for detection, and how reliable such a detection would be, than a binary can/can't. I think similarly for how "photorealistic" a particular graphics tech is; many techs have already long passed the point where I can tell at 320x200 but they're not necessarily all there yet at 4K.

LLMs clearly pass the single sentence test. If you generate far more text than their window, I'm pretty sure they'd clearly fail as they start getting repetitive or losing track of what they've written. In between, it varies depending on how much text you get to look at. A single paragraph is pretty darned hard. A full essay starts becoming something I'm more confident in my assessment.

It's also worth reminding people that LLMs are more than just "ChatGPT in its standard form". As a human trying to do bot detection sometimes, I've noticed some tells in ChatGPT's "standard voice" which almost everyone is still using, but once people graduate from "Write a blog post about $TOPIC related to $LANGUAGE" to "Write a blog post about $TOPIC related to $LANGUAGE in the style of Ernest Hemingway" in their prompts it's going to become very difficult to tell by style alone.


If a human can't verify whether distinguished text is actually AI or not, detection will be full of false positives and ultimately unreliable.


Precisely -- watermarks are an obvious example of this. To me, this is THE path forward for AI content detection.


Watermarking text can't work 100% and will have false negatives and false positives. It is worse than nothing in many situations. It is nice when the stakes are low, but when you really need it you can't rely on it.


The default style people cite for ChatGPT is also nothing intrinsic to AI; it is just that this particular AI is trained and prompted to output information in this way. The output style can change drastically with just a little prompt change, even on the user side.


Why even care if it is written by a machine or not? I am not sure it matters as much as people think.


There are a number of reasons people may care. For instance, the thing about art that appeals to me is that it's human communication. If it's machine generated, then I want to know so that I can properly contextualize it (and be able to know whether or not I'm supporting a real person by paying for it).

A world where I can't tell if something is made by human or by machine is a world that has been drained of something important to me. It would reduce the appeal of all art for me and render the world a bit less meaningful.


Fair, but I think that will shake out more easily than expected: if there is a market (i.e. it is being valued) for certain things being human-generated, people will work on being able to authenticate their output. Yes, there will likely be fraud etc., but if there is a reasonable market it has a good chance of working because it serves all participants.


> Why even care if it is written by a machine or not? I am not sure it matters as much as people think.

You don't see the writing on the wall? OK, here is a big hint: it might make a huge difference from a legal perspective whether some "photo" showing child sexual abuse (CSA) was generated using a camera and a real, physical child, or by some AI image generator.


I don't think all jurisdictions make that distinction to start with and even if they did and societies really wanted to go there: not sure why a licensing regime on generators with associated cryptographic information in the images could not work. We don't have to be broadly permissive, if at all.


I agree with you, but in some jurisdictions material generated with AI and actual photographs of child abuse are treated rather similarly; either way, possessing either could result in what England & Wales calls a "sexual harm prevention order" (SHPO). To me the idea that someone could be served such an order without ever possessing real CSEM (or "child porn"), never mind ever actually being near a child, is rather worrying.


Well, I teach English as a second language in a non-English speaking country. I often used short essays and diary-writing for homework. The students have had lots of English input over the years, but not much experience with output. So, writing assignments work out very well for them. Alas, with ChatGPT on the rise here, they no longer have to write it themselves.

The upshot of which is, the useful writing assignments I used to give as homework will either have to be done in class (wasting valuable class time) or given up altogether (wasting valuable learning experiences).


> Well, I teach English as a second language in a non-English speaking country. I often used short essays and diary-writing for homework. The students have had lots of English input over the years, but not much experience with output. So, writing assignments work out very well for them. Alas, with ChatGPT on the rise here, they no longer have to write it themselves.

If your students want to cheat themselves out of the possible learning opportunities of attempting to formulate the sentences themselves in English, it is their problem.

The same holds in mathematics (degree course): of course, in the first semesters, you can use a computer algebra system like Maple or Mathematica for computing the integrals on your exercise sheets, but you will cheat yourself out of the practice of computing integrals that these exercise sheets are supposed to teach you.


Is your objective that your students learn, or to police them? In the latter case, yes, you have those two options you mentioned. In the former case, you can just continue as you were. Some will cheat and some will not. The ones who do are only cheating themselves.


My goal is for them to learn. And yes, I can just carry on, and some will cheat. I've caught cheaters in the past, of course, but they were few and far between. With ChatGPT and even improved translators like DeepL, it's hard to get them to do the practice they need to learn.

And as a teacher who really WANTS them to learn and to get that feeling of "Hey, I can actually do this!", it's depressing to think of the ones who do cheat themselves. Oh well...


Just tell them, "the point of this exercise is for you to practice writing in English, not for me to grade you. If you use ChatGPT to do it for you I won't be able to notice, but it will be pointless. It's better if you don't do it at all than if you do that."


Teens are very prone to succumbing to peer pressure. Some would be honest and do the work themselves, but if a significant subgroup is using GPT there's a bigger chance that this group influences some of the others to do so as well.


What if they accompany the writings with a recording of them reading it aloud?

I think you could pick up right quick on who understood what they wrote, and who didn't.


That's actually a pretty good idea. I might have to try it selectively, though it'd be too much for 200 students submitting diaries and summaries every week. laugh

They once said I should join Line then we can all talk, then I asked if it's possible to talk in groups of 200+ and their eyes got really big.


There's also the post going around about how it can (and does) falsely flag human posts as AI output, particularly among some autistic people. About as useful as a polygraph, no?


Both false-positives are as useful as the other one, flagged "human" but actually "LLM" vs flagged "LLM" but actually "human". As long as no one put too much weight on the result, no harm would have been done, in either case. But clearly, people can't stay away from jumping to conclusions based on what a simple-but-incorrect tool says.


A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process. There is no way to know when it’s wrong so you’ll either use it to help justify what you want, or ignore it.

Edit: this tool is as reliable as a magic 8-ball


> A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process.

It can be used for some decisions (i.e. not critical ones), but it should NOT be used to accuse someone of academic misconduct unless the tool meets a very robust quality standard.

> this tool is as reliable as a magic 8-ball

Citation needed


The AI tool doesn't give accurate results. You don't know when it's not accurate. There is no accurate way to check its results. Who should use a tool to help them make a decision when you don't know when the tool will be wrong and it has a low rate of accuracy? It's in the article.


> The AI tool doesn't give accurate results.

Nearly nothing gives 100% accurate results. Even CPUs have had bugs in their calculations. You have to use a suitable tool for a suitable job, with the correct context, while understanding its limitations, in order to apply it correctly. Now that is proper engineering. You're partially correct, but you're overstating:

> A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process.

That's totally wrong and an overstated position.

A better position is that some tools have such a low accuracy rate that they shouldn't be used for their intended purpose. Now that position I agree with it. I accept that CPUs may give incorrect results due to a cosmic ray event, but I wouldn't accept a CPU that gives the wrong result for 1/100 instructions.


The thread is about tools to evaluate LLMs. Please re-read my comment in that light and generously assume I'm talking about that.


Your comment applies to all these tools though lol. No need to clarify, it's all a probabilistic machine that's very unreliable.


>"should NOT be used to accused someone of academic misconduct unless the tool meets a very robust quality standard."

Meanwhile, the leading commercial tools for plagiarism detection often flag properly cited/annotated quotes from sources in your text as plagiarism.


That sounds like a less serious problem—if the tool highlights the allegedly plagiarized sections, at worst the author can conclusively prove it false with no additional research (though that burden should instead be on the tool’s user, of course). So it’s at least possible to use the tool to get meaningful results.

On the other hand, an opaque LLM detector that just prints “that was from an LLM, methinks” (and not e.g. a prompt and a seed that makes ChatGPT print its input) essentially cannot be proven false by an author who hasn’t taken special precautions against being falsely accused, so the bar for sanctioning people based on its output must be much higher (infinitely so as far as I am concerned).


I agree. Just noting the bar is very low for these tools, which may have set low expectations.


ChatGPT isn't the only AI. It is possible, and inevitable, to train other models specifically to avoid detection by tools designed to detect ChatGPT output.

The whole silly concept of an "AI detector" is a subset of an even sillier one: the notion that human creative output is somehow unique and inimitable.


This is an unreasonable standard. Outside of trivial situations, there are no infallible tools.


You're right. After reading what I wrote, I agree there should be some reasonable expectations about a tool, such as how accurate it is, or what the consequences are of it being wrong.

The AI detection tool fails both, as it has low accuracy and could ruin someone's reputation and livelihood. If a tool like this helped you pick out what color socks you're wearing, then it's just as good as asking a magic 8-ball if you should wear the green socks.


If you were trying to predict the direction a stock will move (up or down) and it was right 99.9% of the time, would you use it or not?


This is a strawman. First, the AI detection algorithms can't offer anything close to 99.9%. Second, your scenario doesn't analyze another human and issue judgement, as the AI detection algorithms do.

When a human is miscategorized as a bot, they could find themselves in front of academic fraud boards, skipped over by recruiters, placed in the spam folder, etc.


> Second, your scenario doesn't analyze another human and issue judgement, as the AI detection algorithms do.

> When a human is miscategorized as a bot, they could find themselves in front of academic fraud boards, skipped over by recruiters, placed in the spam folder, etc.

Is the problem here the algorithms or how people choose to use them?

There’s a big difference between treating the results of an AI algorithm as infallible, and treating it as just one piece of probabilistic evidence, to be combined with others, to produce a probabilistic conclusion.

“AI detector says AI wrote student’s essay, therefore it must be true, so let’s fail/expel/etc them” vs “AI detector says AI wrote student’s essay, plus I have other independent reasons to suspect that, so I’m going to investigate the matter further”


That's exactly why the stock analogy doesn't work. People don't buy algorithms, they buy products - such as detectors or predictors. You necessarily have to sell judgement alongside the algorithm. So debating the merits of an algorithm in a vacuum, when the issue being raised is the human harm caused by detector products, is the strawman.


> People don't buy algorithms, they buy products - such as detectors or predictors. You necessarily have to sell judgement alongside the algorithm.

Two people can buy the same product yet use it in very different ways: some educators take the output of anti-cheating software with a grain of salt, others treat it as infallible gospel.

Neither approach is determined by the product design in itself, rather by the broader business context (sales, marketing, education, training, implementation), and even factors entirely external to the vendor (differences in professional culture among educational institutions/systems).


It's not a strawman. There are many fundamentally unpredictable things where we can't make the benchmark be 100% accuracy.

To make it more concrete on work I am very familiar with: breast cancer screening. If you had a model that outperformed human radiologists at predicting whether there is pathology confirmed cancer within 1 year, but the accuracy was not 100%, would you want to use that model or not?


It's a strawman because they aren't comparable to AI detection tests. A screening coming back as possible cancer will lead to follow up tests to confirm, or rule out. An AI detection test coming back as positive can't be refuted or further tested with any level of accuracy. It's a completely unverifiable test with a low accuracy.


You are moving the goalposts here. The original claim I am responding to is "A tool that gives incorrect and inconsistent results shouldn’t have any part of a decision making process."

I agree that there are places where we shouldn't put AI and that checking whether something is an LLM or not is one of them. However I think the sentence above takes it way too far and breast cancer screening is a pretty clear example of somewhere we should accept AI even if it can sometimes make mistakes.


The thread is about tools to evaluate LLMs. Please re-read my comment in that light and generously assume I'm talking about that.


Seems a tautology no? “As long as we ignore the results the results don’t matter.”


Flagged "human" but actually "LLM" is not a false positive, but a false negative.


It depends how the question is framed: are you asking to confirm humanity, or confirm LLM.

If you are asking whether this LLM-generated text is human generated, and it says Human (yes), then it is a false positive.

If you are asking whether this LLM-generated text is LLM generated, and it says Human (no), then it is a false negative.


That seems like a restrictive binary. Are there not other entities which generate text? What if a gorilla uses ASL that is transcribed? ELIZA could generate text, after a fashion, as a precursor to LLM. It seems like there's a number of automated processes that could take data and generate text, sort of, like weather reports, no?

So I think the only thing a mythical detector could determine would be LLM, or non-LLM, and let us take it from there. But detectors are bunk; I've had first-hand experience with that.


We could combine those, couldn't we?


Some kind of Voigt-Kampff Test, perhaps.


Something something cells, interlinked.


You could but is there any reason to believe these two noisy signals wouldn't result in more combined noise than signal?

Sure, it's theoretically possible to add two noisy signals that are uncorrelated and get noise reduction, but is it probable this would be such a case?


Yes, you can :)

It all depends on the properties of the signal and the noise. In photography you can combine multiple noisy images to increase the signal to noise ratio. This works because the signal increases O(N) with the number of images but the noise only increases O(sqrt(N)). The result is that while both signal and noise are increasing, the signal is increasing faster.

I have no idea if this idea could be used for AI detection, but it is possible to combine 2 noisy signals and get better SNR.
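
A quick toy demonstration of that effect (nothing to do with AI detectors specifically, just the averaging math):

    # Averaging N noisy measurements of the same signal: the signal term stays put
    # while the noise standard deviation shrinks like 1/sqrt(N), so SNR grows ~sqrt(N).
    import random
    import statistics

    TRUE_SIGNAL, NOISE_SD, TRIALS = 1.0, 2.0, 5000

    def noisy_measurement() -> float:
        return TRUE_SIGNAL + random.gauss(0.0, NOISE_SD)

    for n in (1, 4, 16, 64):
        averages = [statistics.fmean(noisy_measurement() for _ in range(n))
                    for _ in range(TRIALS)]
        print(f"N={n:3d}  mean={statistics.fmean(averages):+.3f}  "
              f"sd={statistics.stdev(averages):.3f}")   # sd falls like 1/sqrt(N)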


If the noisy signals are not completely correlated then the signal would be enhanced; however in this case I imagine that there is likely to be a strong correlation between different tools which would mean adding additional sources may not be so useful.


TBH, a properly-administered polygraph is probably more accurate than OpenAI's detector (of course, "properly administered" requires the subject to be cooperative and answer very simple yes or no questions, because a poly measures subconscious anxiety, not "truth")


Polygraph is pseudo-science, it measures nothing.


I mean, it literally and factually measures several of your body's autonomic responses - all of which are provably correlated with stress. That's what a polygraph machine is. Saying it measures nothing is factually incorrect.

You can't detect "truth" from that, but you can often tell (i.e. with better accuracy than chance) whether or not a subject is able to give a confident, uncomplicated yes-or-no to a straightforward question in a situation where they don't have to be particularly nervous (which is why it's not very useful for interrogating a stressed criminal suspect, and should absolutely be inadmissible in court).

But everyone knows that it's not very reliable in almost every circumstance it's used. My point is that while only marginally better than chance, it's still better than chance, unlike OpenAI's detector, which is significantly worse than chance.


Right. The point is: it absolutely does NOT measure what it claims to measure, i.e. truthfulness.

You can detect indicators of stress... or hot weather... or stage-fright (admittedly a form of stress)... or too much caffeine... or an underlying (maybe undiagnosed) medical condition, etc. So it does not even necessarily measure "stress".

It's about as useful as the so called "fruit machine" which they used to test for homosexuality[0], in that it is utterly useless while at the same time can be quite ruinous for people. People have been fired over polygraph "fails", and while not admissible in courts, people probably have been fingered for crimes after they failed polygraphs. Also, criminals have gone free after passing polygraphs[1].

>But everyone knows that it's not very reliable in almost every circumstance it's used.

You and I may know that. But a lot of people actually do not. That's why it's still used. Either because the people administering those tests think it's "good science", or because the people administering them know that while it's all bullshit, the person they are testing might not know that and might break down and admit to things. Remember that fake polygraph on the show The Wire, which was just a copier they strapped to the suspect. If I remember correctly that was based upon true events.

A quick google shows e.g. you can hire "polygraphers" to e.g. "test" if your partner was unfaithful, making claims such as: "However, assuming that you have a good polygrapher with a fair amount of experience in working with betrayal trauma, you're going to get results that are at least 90% accurate or better."[2]

The US (and probably a lot of other) government(s) like their polygraphs very much, too[3].

> you can often tell (i.e. with better accuracy than chance) whether or not a subject is able to give a confident, uncomplicated yes-or-no to a straightforward question in a situation where they don't have to be particularly nervous

Uhmm, if somebody sat me down in a room, strapped all kinds of "science" to my body and then asked me questions, I'd be quite nervous regardless of whether I am truthful or not. In fact, I'd be even more nervous knowing it's a polygraph and bullshit, because I cannot know if the person administering it knows that too.

If that somebody then asked me "Have you ever killed a prostitute?", or "Have you ever colluded with the enemy?", or "Have you ever cheated on your partner?", or "Have you ever stolen from your employer?", for example, my stress would certainly peak despite being able to confidently and truthfully answer "No!" to all of those questions. And I am sure the polygraph would "measure" my "stress".

[0] Yes, that was a real thing too. https://en.wikipedia.org/wiki/Fruit_machine_(homosexuality_t...

[1] E.g. the Green River Killer Gary Ridgway passed a polygraph, so the police turned their resources to another suspect who failed the polygraph. That was in 1984. Ridgway remained free until his arrest in 2001. He killed at least 4 more times after the investigation stopped focusing on him after that "passed" polygraph.

[2] https://www.affairrecovery.com/newsletter/founder/use-abuse-...

[3] https://support.clearancejobs.com/t/the-differences-between-...


Taking away tools don't seem to me like the best response same way taking away things tends never to be. If the problem is people not using it right, that seems to me like it would be designed wrong for what people need it for. Like if the issue is using it wrong with too little sentences, then put a minimum sentence or something to have that minimum likelihood.

Same goes for representing what it means. If people don't understand statistics or math and such, then show what it means with circles or coins or stuff like that. Point is don't seem ever a good thing for options to get removed, especially if it's for bein cynical and judgin people like they're beneath deservin it. Don't make no sense.


The problem isn't people not using it right, the problem is that the tool can never work and just by being out in the world it would cause harm.

If I have a tool that returns a random number between 0 and 1, indicating confidence that text is AI generated, is that tool good? Is it ethical to release it? I'd say no, it isn't. Removing the option is far better because the tool itself is harmful.


>just by being out in the world it would cause harm

sounds like AI rather than AI detection to me. :)


I don't agree with that premise. I don't know that it can't work, that'd suggest something like no matter what it's worse than a coin flip. I don't think it's that bad or at least nobody showed me anything of it being that bad. You'd have to show me that it can't work and that seems to me a pretty big ask I know


All that has to be shown is that the tool is as bad as or worse than random today, in order to remove it today.


From the article, "while incorrectly labeling the human-written text as AI-written 9% of the time."

Seems like from what the article we're talkin about says it definitely ain't worse than random by far. Thing you most want to avoid is wrongly labeling humans as AI-written so that seems pretty good. Though it only identified 26% of AI text as "likely AI-written" that's still better than nothing, and better than random. But we don't know or I don't know from the article if that's on the problem cases of less than 1,000 characters or not. It don't say what the *best case* is just what the general cases are.

Anyhow don't seem to me worse than random is the issue here
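
To put rough numbers on how those two figures interact, here's the Bayes arithmetic; the 10% share of submissions that are actually AI-written is purely an assumed base rate for illustration:

    # Bayes arithmetic on the article's figures: 26% true-positive rate, 9% false-
    # positive rate. The 10% prior (share of texts actually AI-written) is an
    # assumed number, just to show how much the base rate matters.
    TPR, FPR, PRIOR_AI = 0.26, 0.09, 0.10

    p_flag = TPR * PRIOR_AI + FPR * (1 - PRIOR_AI)              # P(flagged)
    p_ai_given_flag = TPR * PRIOR_AI / p_flag                   # precision of a flag
    print(f"P(flagged) = {p_flag:.3f}")                         # ~0.107
    print(f"P(actually AI | flagged) = {p_ai_given_flag:.3f}")  # ~0.243

Under that assumed base rate, roughly one in four flagged texts would actually be AI-written.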


You're right, I should have been less specific. If the harm of false positives is significant you may not need to have random or worse than random results to feel obligated to stop the project.


alright. thanks for your thoughts


I'd want to see a lot better than "better than random" for the type of tool which is already being used to discipline students for academic misconduct, to make hiring and firing decisions over who used AI in what CV/job tasks, and generally to check whether someone deceived others by passing off AI writing as their own; a wrong result can impugn people's reputations.


Wherever you draw the line someone's going to be upset at where the line is. You're echoing the other guy's concern, really everyone's concern. Same issue with everything from criminal justice to government all around so there's not really any value in yelling personal preferences at one another, even assumin I disagree which I don't. That ain't what I'm about in either case and it don't change what I said about removing options by assuming people suck being a bad way to go about doing anything.

Might as well remove all comment sections because people suck so assume there's no value having one. Pick any number of things like that. Just ain't a good way to go thinking about anything let alone defending a company for removing it, since the same logic justifies removing your ability to criticize or defend it in the first place. You an AI expert? Assume no, so why we let you talk about it? Or me? People suck so why let you comment? On and on like that.


There are numerous people whom I've tried to get to comprehend statistics, important medical statistics for doctors, so you would assume they're smart enough to understand. There just seems to be a sufficient subset of the population that is blind to statistics, and nothing can be done about it. Even sitting down and carefully going through the math with them doesn't work. No matter how deep into the visualization rabbit hole you go, there will still be a subset that will not get it.


Alright let's say that's how it is. How happy would everyone else be if they were treated like that even if they weren't like that? I'd be right miffed and I ain't no einstein. My problem is saying it's a good thing to *remove* options just because some people don't know how to use it. Use that kinda logic for other stuff and you'd paint yourself in a corner with a very angry hornet trapped in it, so not the kind of thing you want to encourage if you assume you'd end up the one trapped. I don't know if my message is comin across right do you get me?


What about the patients getting unnecessary treatments? How upset should they be? What about the student expelled for AI plagiarism due to a false reading? These things are unreliable, and despite an infinite amount of caveats there is no way to prevent people from over relying on it. We might as well dunk people in the water to see if they float.

That’s a weird kind of extortion, a demand that we placate a subset of the population to the detriment of others. If a conflict came down to people who understand stats versus those blind to it I would put my money on those who understand stats.


I don't see how that's any different from anything, any tool, any power, any method. Same problem with everything. That's why this don't convince me and just seems like removing things cynically instead of improving it. Seems to me like the company also really don't want its service identified negatively like that and get itself associated with cheaters even if they're the ones selling the cheat identifying, or something like that.


Firstly, this tool cannot be made better than it is; the limitation is intrinsic to the nature of its construction. Secondly, as LLM models improve, as they are guaranteed to do, this tool can only become worse, as it becomes increasingly difficult to distinguish between human and AI-written text.


I don't know about neither of those. How is it intrinsic? What stops detection improving just because AI gets better? Assuming it just doesn't become sentient human replica or something I mean AI like this where it's just a language model thing. Plus that's assuming future stuff you can track in the meanwhile and still don't justify "remove it because people dumb and do bad stuff with tool", that'd only justify removing it later as they do get better.


The algorithms are trained on minimizing the difference between what the algorithm produces and what a human produces. The better the algorithms the less the difference. The algorithms are at the point where there is very little difference and it won’t be long until there is no difference.


I think it will be increasingly irrelevant what specific process generated a text, for example. Already before genAI people did not in general query into how politicians' speeches were crafted etc.


Indeed or whether math was done in your head, on a calculator or by a computer. Math is math and the agent that represents the result gets the credit and blame.


cool beans. I didn't think about it like that. Could be.


> The number of people in the ecosystem who think it's even possible to detect whether something is AI-written when it's just a couple of sentences is staggeringly high.

I saw that this report came out today which frankly is baffling: https://gpai.ai/projects/responsible-ai/social-media-governa... (Foundation AI Models Need Detection Mechanisms as a Condition of Release [pdf])


I'm still interested in this line of enquiry.

These models are clearly not good enough for decision-making, but still might tell an interesting story.

Here's an easily testable exercise: get a load of news from somewhere like newsapi.ai, run it through an open model and there should be a clear discontinuity around ChatGPT launch.

We can assume false positives and false negatives, but with a fat wadge of data we should still be able to discern trends.

Certainly couldn't accuse a student of cheating with it, but maybe spot content farms.


I see no reason why watermarking can’t be broken by having someone simply rephrase/redraw the output.

Yes, it’s still work, but it’s one step removed from having to think up the original content.


Watermarking was never going to be successful except for the most naive uses.


It can likely work in images where you can make subtle, human-undetectable tweaks across thousands/millions of pixels, each with many possible values.

Nearly impossible across data with a couple hundred characters and dozens to thousands of tokens.
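
For intuition about why pixels give you so much more room than tokens, here's the textbook least-significant-bit trick. Purely illustrative: real generator watermarks are far more robust than this (they survive compression, crops, etc.), and nothing here reflects any actual product.

    # Textbook LSB embedding: hide one payload bit per pixel value by nudging it
    # by at most 1/255, which is imperceptible. Only meant to show how much hiding
    # capacity millions of pixels offer compared to a few hundred tokens.
    def embed_bits(pixels: list[int], bits: list[int]) -> list[int]:
        out = list(pixels)
        for i, bit in enumerate(bits):
            out[i] = (out[i] & ~1) | bit          # overwrite the lowest bit
        return out

    def extract_bits(pixels: list[int], n: int) -> list[int]:
        return [p & 1 for p in pixels[:n]]

    img = [200, 13, 77, 255, 0, 128, 64, 31]      # toy 8-"pixel" grayscale image
    payload = [1, 0, 1, 1, 0, 0, 1, 0]
    stego = embed_bits(img, payload)
    assert extract_bits(stego, len(payload)) == payload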


right but the non-naive approach would be to add noise or have a dumber model rewrite the image. agreed it is easier with images though


I agree, transparency is essential, especially when it comes to AI applications. Many underestimate the complexity of distinguishing AI-written content from human-written, especially for short texts. There's a danger in trusting tools claiming to provide absolute certainty in this regard; no current technology can guarantee 100% accuracy. This incident underscores the need for a more realistic understanding of AI capabilities and limitations in text generation detection.


Dr. Michio Kaku claimed in an interview that it may eventually be possible for quantum computers to guarantee a certain level of truthfulness. I didn't really follow his argument and it seemed a little hand-wavey, but I can't prove that he's wrong.

https://mkaku.org/home/tag/quantum-computing/


No need to prove! Any well calibrated bullshit detector should ring at 110dB when applied to claims mixing AI and quantum computing.

(that said, "may eventually be possible" is so weak a claim it's already meaningless. Quantum fluctuations may eventually turn me into a potato but it's not keeping me up at night)


Based on my experience from grad school, I would bet plenty of the professors who fail students because ChatGPT said ChatGPT might have written something honestly don't care whether it's true or not, as long as it shifts liability away from themselves onto someone else


They could certainly keep a database of things generated by /their/ AI ...


Which would be trivially broken with emoji injection or viewpoint shifting.


People are too lazy to bother about that.


It's really disturbing to me how many people don't realize it's not possible.

It's like asking a 747 to be made into a dog.

It's completely nonsensical to me.


Good. If it is not reliable, it does more harm than good by existing as a false sense of security.

An analogous example: my local pizza delivery (where I worked) would seal the box with a safety sticker, to avoid tampering / dipping by the delivery boys. Now, sometimes they would forget to do this for various logistical reasons. Every one of the non-stickered ones started getting returned as customers worried a pepperoni had been stolen. They stopped doing it shortly after.


It’s a law of nature that pepperoni thieves cannot take a job at a pizza place. They are forever doomed to be delivery guys.


This is actually kinda true with doordash etc. Those drivers are completely unvetted, they don't even have an interview.

The kind of people that can't get a job at a pizza place.

Personally, I never order delivery through these services. The incentives are all wrong. Not to mention the costs are super high: restaurants don't make any money, I pay out the @$$, and the drivers are given sub-minimum-wage pay after taking on the risks of delivery driving.


Eh, I'd consider that a failure of employee training and reverse the situation by giving out a weekly bonus to shifts that did not fail to put the security stickers on.

Kinda like if they forgot to put the security seal on your aspirin, I'm not going to take them all off because someone forgot to run production with all the bottles sealed.


The bottle of Aspirin goes through many hands between the manufacturer and you including sitting on an unattended shelf open to the public. The person making the pizza is working for the same company as the person delivering it, or may even be the same person. If you can't trust the pizza co delivery person then you probably shouldn't trust the person making it either.


You're right, don't eat at that pizza place either.

This is the brown m&m principle in effect.


Nowadays food delivery often goes through third party. Eg Uber eats, Foodora, Bolt, Grab, etc.


All the pizza places here have their own drivers still.


Frankly, the kind of person who forgets to put the sticker on at the pizza place will forget about the bonus too.


This tool's been fueling tons of false accusations in academia. Wife is doing her PhD and she often tells me stories about professors falsely accusing students of using ChatGPT.


Lots of stories on Reddit about school teachers unfairly accusing students of using ChatGPT for assignments too


A group of students at my university were claiming their papers were being marked by a LLM. They cited a classifier like the OP which they used on their feedback comments.


It isn't just this one. There are a hundred different "AI detectors" sold online that are all basically snake oil, but overzealous professors and school administrators will keep paying for them regardless.


Eh I am doing my PhD and I use ChatGPT all the time!


And?

The tool in question was used for AI text detection not generation.


Not all people might be accused wrongly. Then again does it matter if you use ChatGPT for inspiration?


> does it matter if you use ChatGPT for inspiration?

Absolutely not.


what's funny is GPT output with even a relatively low temp will come back as human more reliably than human content.


Latest I heard is that teachers are requiring homework to be turned in in Google Docs so that they can look at the revision history and see if you wrote the whole thing or just dumped a fully formed essay into GDocs and then edited it.

Of course the smart student will easily figure out a way to stream the GPT output into Google Docs, perhaps jumping around to make "edits".

A clever and unethical student is pretty much undetectable no matter what roadblocks you put in their way. This just stops the not-so-clever ones. :)


Proof of work, human version.

Yes, anybody can write an agent to meander about typing the ChatGPT-generated text into Google Docs. Yes, Google could judge how likely it is that a document was typed by a human, but they won't, for the same reasons OpenAI just cancelled this.

Somebody (maybe reacting to this news, maybe reading this thread) will write such an editor or evaluator. Another solution is screen recording as you write. Another (the best one, and the hardest one for educators) is to not request or grade things a robot can write better than most humans.


This will be hard to break. It’s basically an hour long CAPTCHA. You can look at things like key stroke timing, mouse movement, revision pattern, etc. I don’t see LLM’s breaking this approach to classify human writing.


> I don’t see LLM’s breaking this approach to classify human writing.

Why not? Record a bunch of humans writing, train model, release. That's orders of magnitude simpler than to come up with the right text to begin with.


Lol. I love HN -- the reaction is because this is either straight-faced or tongue-in-cheek; if it's straight-faced, this is stylistically a parody of the infamous "well Dropbox is rsync, its moat is basically a SWE weekend" comment.


Someone can release this as a product though, exactly like Dropbox. Dropbox basically was rsync, it just had a better UX. There are a lot of people these days that are pretty good at taking ML models and slapping a nice UX on top of them.


Seems easy to me? Just manually copy the text by typing it in yourself. There may be certain patterns that could potentially give it away vs truly human-generated text, but will Google Docs revision history show that level of detail?


Retyping the essay from ChatGPT while actively rewording the occasional sentence seems like it would do it.


It seems like that's nearing the sweet spot of fraud prevention, where committing the act of fraud is as much work as doing the real thing.


The intellectual labor of thinking about an essay, drafting it, editing, and revising it is much higher than strategically re-typing a ChatGPT output. One requires experience, knowledge, understanding and creativity, the other one requires basic functioning of motor skills and senses.

You could program a robot to re-type the ChatGPT output into a different word processor and feed it parameters to make the duration between keystrokes and backspaces fluctuate over time. You could even have it stop, come back later, copy and paste sections and re-organize as it moves through and end up with the final essay from ChatGPT.


Doesn't sound like it to me. Researching a topic can be a lot of effort, while typing is easy, even with occasional rephrasing. Hell, even with constant rephrasing it's not very hard.


Having a student sit there and type out, word for word, a decent essay on the subject is probably fairly effective education for a LOT of subjects.

In the same vein that letting students write calculator programs to do the quadratic formula for them during a test is actually a pretty good way to get them to learn the quadratic formula.

It's not an exact equivalent to the intent of an essay today - from an education perspective - but it's not a complete miss either.


It's better than nothing, sure, but nowhere close to as good as doing the research and formulating the ideas yourself.


Not when someone creates a browser extension that you run while using Google Docs, feed the entire ChatGPT doc into it and it recreates the doc over a 2 hour period with small bits and pieces.


It sounds a lot easier to retype what you see rather than to create it.


It's a bit suspicious to type an essay linearly from start to finish, though.


A bit like how us old timers had to write our exams in the pen and paper days.


Is it? That's how I've always written them (and still do to this day). I write the first draft linearly from start to end, then go back and do my revising and editing.


Well, that's what I mean. The revisions and edits are what show that you wrote it, rather than copying from an AI line by line.


Will Google Docs provide that level of detail for revision history in a way that teachers can easily see that it's likely AI-generated, and have high confidence in that?


Google Docs already does have this level of revision history.


Well, there is one way, which is timed, proctored exams.

Which sucks, because take-home projects are evaluating a different skill set, and some people thrive on one vs the other. But it is what it is.


>Of course the smart student will easily figure out a way to stream the GPT output into Google Docs, perhaps jumping around to make "edits".

No need to complicate it that much. Just start off writing an essay normally, then paste in the GPT output. A teacher probably isn't going to check any of the revision history, especially if there are more than 30 students to go through.


An easy thing to do would be very low maximum word counts. ChatGPT is incapable of brevity.


It's like hearing stories about people hating the first automobiles because they weren't horses.

The education bubble is about to implode - it will probably be one of the first industries killed by AI.


"Half a year later, that tool is dead, killed because it couldn’t do what it was designed to do."

This was my conclusion as well testing the image detectors.

Current automated detection isn't very reliable. I tried out Optic's AI or Not, which boasts 95% accuracy, on a small sample of my own images. It correctly labeled those with AI content as AI-generated, but it also labeled about 50% of my own stock-photo composites as AI-generated. If generative AI were not a moving target, I would be optimistic such tools could advance and become highly reliable. However, that is not the case, and I doubt this will ever be a reliable solution.

from my article on AI art - https://www.mindprison.cc/p/ai-art-challenges-meaning-in-a-w...


> but it also labeled about 50% of my own stock photo composites I tried as AI generated

Could it be that a large proportion of the source stock photos were actually AI generated?


No, they were older images. However, that is now becoming a problem. Some stock photo sites now have AI images and they are not labeled. I'm able to distinguish most for now because at high resolution the details contain obvious errors.

This is really painful, because for some of my work I need high-quality images suitable for print. Now I can't just look at the thumbnail and say "this will work"; I have to examine each image closely, which takes more of my time.


Humorously, in my experience, if a response from ChatGPT ever got classified as AI-generated by tools like ZeroGPT, all I had to do was adjust the prompt to tell the model not to sound AI-generated, and that bypassed detection with a very high success rate. Additionally, I found that if you prompt it to write the response in the style of some known writer, it often produced responses rated 100% human-written by most AI detection models.


“Can’t you just try to blend in and be a little more cool? The bouncer is gonna notice.”

Starts talking like Shakespeare


This didn't work half a year ago. They'd still accuse the rewritten text to be AI generated. I think some of the recent updates have changed the tone of ChatGPT so significantly that they no longer register on the radar.


Good. And I think watermarking AI output is also a dead end. Better that we simply assume that all content is fake unless proven otherwise. To the extent that we need trustworthy photos, it seems like a better idea to cryptographically sign images at the hardware level when the photo is taken. Voluntarily watermarking AI content is completely pointless.


I can see that working for specialized equipment like police body cameras, but if every camera manufacturer in the world needs to manage keys and install them securely into their sensors then there will be leaked keys within weeks.


Makes fake image, hold it in front of camera, click, verified image...


The signed timestamp and location would give that away, but those would have to become not configurable by the user.


Clocks and GPS sensors can be hacked; there is no fundamental source of truth to fall back on here.

It's as Sisyphean a task as AI detection.


Just use a certificate chain. The manufacturer can provide each camera its own private key, signed by the manufacturer.
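Roughly like this, using the Python cryptography package (the key handling and the "manufacturer signs the device key" step are an illustrative sketch, not any real product's design):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    # At the factory: each camera gets its own key pair, and the manufacturer
    # signs the camera's public key so verifiers can trust it later.
    manufacturer_key = ec.generate_private_key(ec.SECP256R1())
    camera_key = ec.generate_private_key(ec.SECP256R1())
    camera_pub = camera_key.public_key().public_bytes(
        Encoding.DER, PublicFormat.SubjectPublicKeyInfo)
    device_cert_sig = manufacturer_key.sign(camera_pub, ec.ECDSA(hashes.SHA256()))

    # In the camera: sign the raw image bytes (plus metadata) at capture time.
    image_bytes = b"...sensor data and EXIF..."
    photo_sig = camera_key.sign(image_bytes, ec.ECDSA(hashes.SHA256()))

    # A verifier checks both links of the chain.
    try:
        manufacturer_key.public_key().verify(
            device_cert_sig, camera_pub, ec.ECDSA(hashes.SHA256()))
        camera_key.public_key().verify(
            photo_sig, image_bytes, ec.ECDSA(hashes.SHA256()))
        print("chain verifies")
    except InvalidSignature:
        print("tampered or unsigned")

None of which helps, of course, if the sensor bus itself is feeding the signer fake data.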


And when the sensor bus is hijacked to directly feed “trusted” data into the processor?


Then you can examine the specific camera and try to prove it has been tampered with. At least you know exactly which camera to look at.


Cryptography can't save us here; people will figure out how to send AI images to the crypto hardware to get them signed within months. It would just be another layer of false security.


That's not how cryptographic signing works.

Cryptographic signing means "I wrote this" or "I created this". Sure, you could sign an AI-generated image as yourself. But you could not sign an image as being created by Getty or the NYT.


I believe that's not what they're saying. It's signing hardware, like a camera that signs every picture you take, so not even you can tamper with it without invalidating that signature. Naively, then, a signed picture would be proof that it was a real picture taken of a real thing. What GP is saying is that people would inevitably get the keys from the cameras, and then the whole thing would be pointless.


Yep.

A chain of trust is one way to solve this problem. Chains of trust aren't perfect, but they can work.

But if you're going to build a chain of trust that relies on humans to certify they used a non-tampered-with crypto camera, why not just let them use plain ol' cameras? Adding crypto-signing hardware just adds a false sense of security that grifter salespeople will claim is 'impossible to break', and non-technical decision makers won't understand the threat model.


> Cryptography cant save us here, people will figure out how to send AI images to the crypto hardware to get it signed in months.

Possibly (who am I kidding, *PROBABLY*!) they'll use ChatGPT to help them design the method :)


I'm in the SEO game and I've spoken with some 'heavy players' who believe a "Google AI Update" is in the works. As it currently stands, the search engine results will be completely overtaken by AI content in the near future without it.

From my understanding, this is a fool's game in the long run, but there are current AI classifier detectors (Originality.ai being a big one) that can successfully detect ChatGPT and other models on longish content.

Their process is fairly simple: they create a classification model after generating tons of examples from all the major models (ChatGPT, GPT-4, LLaMA, etc.).

One obvious downside to their strategy is fine-tuning and how it changes the stylistic output. This same 'heavy hitter' has successfully bypassed Originality's detector using his own fine-tuning method (which he said took months of testing and thousands of dollars).


Google needs to do a full 180, and only the most succinct website that answers search queries should be elevated.

The current state of Google is a disaster: everything is 100 paragraphs per article, with the answer you are looking for buried halfway in to make sure you spend more time and scroll further to appease the algorithm.

I cannot wait for them to sink all these spam websites.


Waiting for Google to do that won't happen, they'd lose too many ad links.


Many comments here seem to suggest that practically it is going to be impossible to classify text as Human generated versus AI generated -- given the many different ways in which such attempts can be foiled in a never ending game of cat-and-mouse.

If we accept this ...

The challenge I am foreseeing is this:

We are only at the very beginning of the AI revolution -- and if LLMs are to get more sophisticated and powerful in the future, they will need good-quality, human-generated/curated training data at a scale where manual curation, cleansing and quality checks are likely impossible.

And there is no doubt that every medium is going to get bombarded and spammed with AI-generated content in the coming years.

How, then, are we going to filter the data -- separating the real data from AI-generated noise -- to train future LLMs on and really push them to their potential?

This problem has been bugging me for a while and I have commented on it here previously as well, tentatively calling it 'Data Pollution' for lack of a better word.

Curious to hear other perspectives on this.


Curation? I mean, if the content is considered good, does it matter if it was produced by an LLM?


The only way to prevent AI from answering questions in digital platforms is to develop an ML database of the typing style of every student across their tenure at an institution. Good luck getting that approved — departments can't even access grade or demo data without a steering group going through a 3-deep committee process.

¯\_(ツ)_/¯ try paper I guess. Time to brush up on our OCR.


If AI can replicate linguistic patterns in a way that is undetectable for both humans and models, then it seems even easier for a ML model to emulate a natural typing style, rhythm, and cadence in a way that is undetectable for both humans and models.

But you know who has more real-world data on typing style? Google, Microsoft, Meta, and everyone else who runs SaaS docs, emails, or messaging. I imagine a lot of students write their essays on Google Docs, Word, or the like, and submit them as attachments or copy-paste into a textbox.


We shattered the Turing test and now we want to put it back into Pandora's box because we don't like the repercussions.


The techbros were so focused on whether or not they could that they accused anyone who asked whether they should of being a luddite. And dinner that night was crow, and dessert was humble pie.


“We shattered the Turing test”? The Turing test is typically an interactive back-and-forth, no? Detecting whether paragraphs of pre-written LLM output are machine-generated and giving a yes-or-no answer is not the same thing.


We asked the question 'can we beat the Turing test', not what would happen when we did.


Funny incentive problem: OpenAI obviously has an incentive to use its best AI detection tool for adversarial training, with the result that its detection tool will not be very good against ChatGPT-generated text, because ChatGPT is trained to defeat it.


If they wanted to disguise their content as coming from a human, they probably wouldn't keep saying "As an AI chatbot, I can't..."


I'm not saying that they are trying to disguise, I'm saying that their goal is natural language and a way you can distinguish chatGPT (aside from trivial things like "as an ai chatbot...") from human speech is most likely a way to improve chatGPT, because the goal is speaking like a human.


This kind of thing strikes me in any case as something that's only good for the generation of AI it's been trained against. And with the exponential improvements happening almost on a monthly basis, that becomes obsolete pretty quickly and a bit of a moving target.

Maybe a better term would be Superior Intelligence (SI). I sure as hell would not be able to pass any legal or medical exams without dedicating the next decade or so to getting there. Nor do I have any interest in doing so. But GPT-4 is apparently able to wow its peers. Does that pass the Turing test because it's too smart or too stupid? Most of humanity would fail that test.


I read a post on LinkedIn by Timnit Gebru, where she shared an anonymous post by a student who said that their paper was being rejected due to seemingly high AI-written content. Their paper was evaluated by Turnitin, who claim to have built an AI detector. The student mentioned it was putting their career at risk, that they hadn't actually taken the help of any AI, and that they had been a 4.0 student throughout their college career.

So, assuming all that to be true, how can the likes of Turnitin claim to be an authority on AI writing detection? When I graduated a few years back, they used to offer plagiarism checks only.


> how can the likes of Turnitin claim to be an authority for AI writing detection

Pretty easy - they lie to people.


There is inherent conflict in having both an AI tool business and an AI tool detection business.

If the first does a good job, the second fails. And vice versa.

(On the other hand, maybe there is a lot of money to be made selling both, to different groups?)


I don't think this follows. If they wanted, they could cryptographically bias the sampling to make the output detectable without decreasing capabilities at all.

Only people using it deceptively would be affected. No idea what portion of ChatGPT's users that is, would be very interested to know.


There’s a much more effective way: store hashes of each output paragraph (with a minimum entropy) that has ever been generated, and allow people to enter a block of text to search the database.

It wouldn’t beat determined users but it would at least catch the unaware.
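A toy version of that lookup, where the normalization and the 30-word cutoff stand in for whatever real "minimum entropy" filtering a provider would actually use:

    import hashlib
    import re

    def normalize(paragraph: str) -> str:
        # Lowercase and collapse whitespace so trivial edits don't dodge the hash.
        return re.sub(r"\s+", " ", paragraph.strip().lower())

    def paragraph_hash(paragraph: str) -> str:
        return hashlib.sha256(normalize(paragraph).encode()).hexdigest()

    generated_hashes = set()  # filled as the model produces output

    def record_output(text: str) -> None:
        for para in text.split("\n\n"):
            if len(para.split()) >= 30:        # crude "minimum entropy" proxy
                generated_hashes.add(paragraph_hash(para))

    def check_submission(text: str) -> float:
        paras = [p for p in text.split("\n\n") if len(p.split()) >= 30]
        if not paras:
            return 0.0
        hits = sum(paragraph_hash(p) in generated_hashes for p in paras)
        return hits / len(paras)               # fraction of verbatim paragraphs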


Changing every 10th word defeats that strategy but doesn't defeat a cryptographic bias.

Also, the cost of storing every paragraph hash might eventually add up, even if at the moment it would be negligible compared to the generation cost.


One solution is to store a hash of every n-gram for n from 2 to whatever, then report what percentage of n-grams of various lengths were hits.

Did someone say Bloom Filter??
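Something like this, with arbitrary sizing (a real index would tune the bit count and hash count to the expected corpus):

    import hashlib

    class BloomFilter:
        def __init__(self, m_bits=1 << 24, k=4):
            self.m, self.k = m_bits, k
            self.bits = bytearray(m_bits // 8)

        def _positions(self, item: str):
            for i in range(self.k):
                h = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(h[:8], "big") % self.m

        def add(self, item: str):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item: str):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    def ngrams(text: str, n: int):
        words = text.lower().split()
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    # Index generated output once...
    index = BloomFilter()
    for n in range(2, 6):
        for g in ngrams("some previously generated model output ...", n):
            index.add(g)

    # ...then report hit rates per n-gram length for a submitted text.
    submission = "text a student handed in ..."
    for n in range(2, 6):
        grams = ngrams(submission, n)
        hits = sum(g in index for g in grams) if grams else 0
        print(n, hits, "/", len(grams))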


They literally already store the whole conversation...


If the goal is students, then the best would be a tool that doesn't just detect AI, but one where you can submit a student's previous writing and see how likely it is they wrote a similar text, rather than whether it was LLM-generated.


I wonder why we need to know whether something is AI-generated at all. It's a luddite view of AI. Much like the need to distinguish between handcrafted and machined products - is there real utility in knowing this?

For educators evaluating students, essays and the like - we possibly need different ways of evaluating, rather than relying on written asynchronous content for communicating concepts and ideas.


>is there a real utility to knowing this?

For civics, I would say yes.

Imagine you were talking to an online group about a design project for a local neighborhood. Based on the plurality of voices it seemed like most people wanted a brown and orange design. But later, when you talked to actual people in real life, you could only find a few who actually wanted that.

Virtual beings are a great addition to the bot nets that generate false consensus.


I believe you’re exactly right. It would be similar to detecting that math homework used wolfram alpha or even a calculator.


Timing seems related to Altman launching Worldcoin (whose goal is to reliably differentiate between human and AI generated content).

https://www.reuters.com/technology/openais-sam-altman-launch...


Not to say "I can detect chatgpt" but it sure seems to have a similar way of talking even when I say things like: Talk like a "Millennial male who is obsessed with Zelda, their name is bob zelenski"

Now the topic isn't about anything millennial or Zelda related, but I'd think that the language model would select sentence and paragraph phrasing differently.

Maybe I need to switch to the API.


I've also noticed that ChatGPT tends to respond to short prompts, especially questions, in a predictable format. There are a few characteristics.

First, it tends to print a five-paragraph essay, with an introduction, three main points, and a conclusion.

Second, it signposts really well. Each of the body paragraphs is marked with either a bullet point or a number or something else that says "I'm starting a new point."

Third, it always reads like a WikiHow article. There's never any subtle humour or self-deprecation or ironic understatement. It's very straightforward, like an infographic.

It's definitely easy to recognize a ChatGPT response to a simple prompt if the author hasn't taken any measures to disguise it. The conclusion usually has a generic reminder that your mileage may vary and that you should always be careful.


I have to admit I'm struggling to tell if this was done ironically, but your comment is exactly a five paragraph essay with an introduction, three main points, and a conclusion.

If so, nice meta-commentary.


Thank you, it was intentional!


Smart of OpenAI to shut down a tool that basically doesn't work before the school year starts and students start to get in trouble based on it.

I think this upcoming school year is going to be a wakeup call for many educators. ChatGPT with GPT-4 is already capable of getting mostly A's on Harvard essay assignments - the best analysis I have seen is this one:

https://www.slowboring.com/p/chatgpt-goes-to-harvard

I'm not sure what instructors will do. Detecting AI-written essays seems technologically intractable, without cooperation from the AI providers, who don't seem too eager to prioritize watermarking functionality when there is so much competition. In the short term, it will probably just be fairly easy to cheat and get a good grade in this sort of class.


Nah, everything is just going to be proctored exams on paper in the future. Sucks for the pro take-home project crowd, but they ruined it for themselves.


It's an open-ended Red Queen problem. You can't win.

Besides, even if they did win, they would still lose by shooting themselves in the foot.


As a joke I built a simple tool that swaps random words for their synonyms, and it did the trick in throwing off any distribution matching (the output came out as gibberish, but still, lol) https://www.gptminus1.com/
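For the curious, the core of that trick is only a few lines. This is a from-memory sketch using NLTK's WordNet, not the actual gptminus1 code, and the 20% swap rate is arbitrary:

    import random
    import nltk
    from nltk.corpus import wordnet

    nltk.download("wordnet", quiet=True)

    def swap_synonyms(text: str, rate: float = 0.2) -> str:
        out = []
        for word in text.split():
            synsets = wordnet.synsets(word)
            if synsets and random.random() < rate:
                # Pool all lemma names across senses and pick a replacement.
                lemmas = {l.name().replace("_", " ")
                          for s in synsets for l in s.lemmas()}
                lemmas.discard(word)
                if lemmas:
                    word = random.choice(sorted(lemmas))
            out.append(word)
        return " ".join(out)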


Fascinating. I tried zerogpt.com for testing a text I generated. Got 91% AI generated response. Used gptminus1.com and now zerogpt.com shows 0% AI. These detectors are garbage.


I wonder if this entire "AI cheating" stuff will end up as calculators did when I was growing up. When I was a young child in school calculators were forbidden. You had to learn multiplications in your head. By the time I got to the final exam at the end of our "high school" equivalent we could use "dumb" calculators in math exams. Few years later graphing calculators became accepted in schools (sometimes even required).

It is important humans learn to express themselves in writing. The only way I think this will happen is if kids do their writing at school supervised.


How could we have both an AI that is indistinguishable from humans and an AI that can detect it with good accuracy? That would imply a race on both sides toward an AI that is more intelligent than a high-IQ human.


Text is a very high dimensional space, n-dimensional in fact. There is plenty of room for an AI to leave a fingerprint that can be detected in some ways but not others.

In fact it doesn't take much text to distinguish between two human beings. The humanly-obvious version is that someone that habitually speaks in one dialect and someone else in another must be separate, but even without such obvious tells humans separate themselves into characterizeable subsets of this space fairly quickly.

I'm skeptical about generalized AI-versus-human detection in the face of the fact that it is adversarial. But a constant, unmoving target of some specific AI in some particular mode would definitely be detectable; e.g., "ChatGPT's current default voice" would certainly be detectable, "ChatGPT when instructed to sound like Ernest Hemingway" would be detectable, etc. I just question whether ChatGPT in general can be characterized.


This should have been rejected as an idea just on its face. False positives are really problematic. And if it performs unexpectedly well (accuracy is high) then it just becomes a training tool for reinforcement learning.


Rather than try to detect if pieces were generated by AI, why not just check if they're plagiarized off the outputs of a bunch of popular models? We already have traditional plagiarism analysis methods, we just need a corpus of most of the recent outputs from the most popular LLM services to check against. OpenAI, Google, Anthropic, and any other LLM-as-a-service companies could profitably sell access to these corpora to third-party analysis services that compile them and offer search.


One reason would be that LLMs don't tend to output verbatim pieces from their dataset (unless explicitly prompted to do so). This is further complicated by the "temperature" setting, which allows users to make output even more "creative" than its dataset.

In OpenAI's case, its writing style usually comes from OpenAI's in-house dataset they used for RLHF. This is what gives it the ability to chat and respond with its signature (perhaps overly formal and apologetic) tone.

Although it can be used to write in other styles, sometimes it will refuse to because of this.


You misunderstand me. I'm not talking about detecting when LLM's plagiarize humans, but when humans "plagiarize" (copy from without attributing) LLM's.


There's an exceedingly simple way of doing this that's pretty much bulletproof: just check the given text against the database of stored generations (which they no doubt keep) and you'll have a pretty much perfect result. Logistically, searching untold terabytes of text might be challenging, but to say there are fundamental difficulties is just not accurate, imo.


That's the problem - no, you won't. You've bought too much into the stochastic parrot view of LLMs, to the point where you are viewing them as fancy content storage/retrieval systems.


The idea that OpenAI was intentionally watermarking its output to avoid training data back-contamination should be thoroughly discredited now.


I think educators got away for too long with not handling the problem of unmotivated students and uninteresting class subjects...

Not the educators' fault though, more like the system is bad.

My point is that, given knowledge is mostly free and available, the system should teach students to think rather than to use tools or memorize facts.


Low accuracy is certainly a good reason to drop a project, especially when dealing with short text (<1000 chars), which is where most social media posts and mini-blogs fall.

Bigger texts, e.g. reports, theses, etc., are probably easier and cheaper for humans to verify, with the help of AI tools (reference checking, searching...).


Wow, OpenAI pulling the plug on its AI Classifier tool really shows how tough it is to spot the difference between human and AI-generated text. It's a bummer that it didn't work out as planned, but hey, that's how we learn, right?


No disrespect, but...is this comment AI generated?


Could we not add invisible characters into the text, a bit like a watermark?


Yes, and we could just as easily remove them.


True but I feel it would catch a few people out?


We could but doing so would lobotomize it.


Watermarking is a more tractable approach, but the cat is out of the bag.


How would you go about watermarking AI written text?


https://arxiv.org/pdf/2301.10226.pdf

Here's a decent paper on it.

It covers private watermarking (you can't detect it exists without a key), resistance to modifications, etc. Essentially you wouldn't know it was there and you can't make simple modifications to fool it.

OpenAI could already be doing this, and they could be watermarking with your account ID if they wanted to.

The current best countermeasure is likely paraphrasing attacks https://arxiv.org/pdf/2303.11156.pdf


I don't know.

I suppose hosted solutions like ChatGPT could offer an API where you copy some text in, and it searches its history of generated content to see if anything matches.

> bUt aCtuAlLy...

It's not like I don't know the bajillion limitations here. There are many audiences for detection. All of them are XY Problems. And the people asking for this stuff don't participate on Hacker News aka Unpopular Opinions Technology Edition.

There will probably be a lot of "services" that "just" "tell you" if "it" is "written by an AI."


One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading, like using a particular sentence length, paragraph length, or punctuation pattern. Or use certain words in the text that may not be frequently used by humans etc.

Watermarking needs to be subtle enough to be unnoticeable to opposing parties, yet distinctive enough to be detectable.

So, this is an arms race especially because detecting it and altering it based on the watermark is also fun :)


> One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading

This seems like a total non-starter. That can only negatively impact the answers. A solution needs to be totally decoupled from answer quality.


The paper I linked in the parent comment describes that as the "Simple proof of concept" on page 2 and, like you said, outlines its limitations: it's both harmful to performance and easily detectable.

Their improved method instead only replaces tokens when there are many good choices available, and skips replacing tokens when there are few good choices. In "The quick brown fox jumps over the lazy dog", the token after "The quick brown" is not replaceable because any substitution would severely harm the quality.

Essentially it's only replacing tokens where it won't harm the performance.

It's worth noting that any watermarking will likely harm the quality to some degree - but it can be minimized to the point of being viable.
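For concreteness, here's a toy version of the soft watermark the paper describes: partition the vocabulary into a "green list" seeded by the previous token and nudge the green logits. The vocab, gamma and delta below are illustrative; the real scheme operates on the model's full vocabulary and logits.

    import hashlib
    import random

    VOCAB = ["the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"]
    GAMMA, DELTA = 0.5, 2.0    # green-list fraction and logit boost

    def green_list(prev_token: str):
        # Seed a deterministic vocab partition from the previous token.
        seed = int.from_bytes(hashlib.sha256(prev_token.encode()).digest()[:8], "big")
        rng = random.Random(seed)
        shuffled = VOCAB[:]
        rng.shuffle(shuffled)
        return set(shuffled[: int(GAMMA * len(VOCAB))])

    def watermark_logits(logits: dict, prev_token: str) -> dict:
        # Boost green tokens; sampling then favors them where choices are loose.
        green = green_list(prev_token)
        return {tok: (v + DELTA if tok in green else v) for tok, v in logits.items()}

    def detect(tokens: list) -> float:
        # Fraction of tokens landing in their green list; ~GAMMA for human
        # text, noticeably higher for watermarked text.
        hits = sum(tok in green_list(prev)
                   for prev, tok in zip(tokens, tokens[1:]))
        return hits / max(len(tokens) - 1, 1)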


You can do this by injecting invisible Unicode (LTR/RTL markers, zero-width separators, the various "space" analogs, homographs of "normal" characters), but it can obviously be stripped out.


Make half of the tokens (the AI's "dictionary") slightly more likely.

This would not impact output quality much, but it would only work for longish outputs. And the token-probability "key" could probably be reverse-engineered with enough output.


It would be pretty easy to figure out against standard word probabilities in average datasets. Even then, the longer this system runs, the more likely it is to pollute its own dataset as people learn to write from GPT itself.


type=text/chatgpt :P


Invisible characters in a specific bit-pattern.

Pretty common steganographic technique, really.
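For example, zero-width space for a 0 bit and zero-width non-joiner for a 1 bit (toy sketch; where the payload goes is arbitrary):

    ZERO, ONE = "\u200b", "\u200c"   # zero-width space / zero-width non-joiner

    def embed(text: str, payload: bytes) -> str:
        bits = "".join(f"{b:08b}" for b in payload)
        mark = "".join(ONE if bit == "1" else ZERO for bit in bits)
        # Tuck the whole mark after the first word; real schemes spread it out.
        head, _, tail = text.partition(" ")
        return head + mark + " " + tail

    def extract(text: str) -> bytes:
        bits = "".join("1" if ch == ONE else "0"
                       for ch in text if ch in (ZERO, ONE))
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))

    def strip_mark(text: str) -> str:
        # ...and this one line is why it's so easy to defeat.
        return text.replace(ZERO, "").replace(ONE, "")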


Can you elaborate on "invisible"? The only invisible character I can imagine is a space. It seems like any other character either isn't invisible or doesn't exist (i.e., isn't a character).

Additionally, if I copy-paste text like this, are the invisible characters preserved? Are there a bunch of extra spaces somewhere?


When students try to evade plagiarism detectors, they will swap characters like replacing spaces with nonbreaking spaces, replacing letters with lookalikes (I vs extended Cyrillic Ӏ etc), and inserting things like the invisible 'Combining Grapheme Joiner'

IMHO it isn't a feasible way of watermarking text though - as someone would promptly come up with a website that undid such substitutions.


> IMHO it isn't a feasible way of watermarking text though - as someone would promptly come up with a website that undid such substitutions.

It doesn't matter since there's no one-pass solution to counterfeiting.

You have the right of it-- the best you can hope for is adding more complexity to the product, which adds steps to their workflow and increases the chances of the counterfeiter overlooking any particular detail that you know to look for.


There's a bunch of different "spaces", one is a "zero-width space" which isn't visible but still gets copied with the text.

https://en.wikipedia.org/wiki/Zero-width_space


And the second site students will go to is zerospaceremover.com or whatever will show up to strip the junk.


So, all I have to do is copy-paste it into a text editor with remove-all-formatting to circumvent that?


If it's generated by a SaaS, the service could sign all output with its private key and publish the public key for verification.


This isn't a watermark though, the idea of a watermark is that it's inherently embedded in the data itself while not drastically changing the data


Why is this comment being downvoted?

OpenAI can internally keep a "hash" or a "signature" of every output it ever generated.

Given a piece of text, they should then be able to trace it back to a specific session (or a set of sessions) in which that text was generated.

Depending on the hit rate and the hashing methods used, they may be able to indicate the likelihood of a piece of text being generated by AI.


Why would they want to is my question. A single character change would break it.

Then you have database costs of storing all that data forever.

What's more, it only works for OpenAI; I don't think it will be too long before other GPT-4-level models are around that won't give two shits about catering to the AI identification police.


> A single character change would break it.

That depends on how they hash the data, right? They can use various types of Perceptual Hashing [1] techniques which wouldn't be susceptible to a single-character change.

[1] https://en.wikipedia.org/wiki/Perceptual_hashing

> Then you have database costs of storing all that data forever.

A database of all textual content generated by people? That sounds like a gold mine, not a liability. But as I've mentioned earlier, they don't need to keep the raw data (a perceptual hash is enough).
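As a concrete toy example, a simhash-style fingerprint over word shingles survives small edits (the 64-bit size and 3-word shingles are arbitrary choices for the sketch):

    import hashlib

    def simhash(text: str, bits: int = 64) -> int:
        # Sum +/-1 per bit over hashed shingles; keep the sign as the fingerprint.
        weights = [0] * bits
        words = text.lower().split()
        shingles = [" ".join(words[i:i + 3]) for i in range(max(len(words) - 2, 1))]
        for sh in shingles:
            h = int.from_bytes(hashlib.sha256(sh.encode()).digest()[:8], "big")
            for b in range(bits):
                weights[b] += 1 if (h >> b) & 1 else -1
        return sum(1 << b for b in range(bits) if weights[b] > 0)

    def hamming(a: int, b: int) -> int:
        return bin(a ^ b).count("1")

    # Near-duplicates land within a small Hamming distance of each other.
    original = "the model produced this exact paragraph of text for a user"
    edited   = "the model produced this exact paragraph of text for a user!"
    print(hamming(simhash(original), simhash(edited)))  # typically a small number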

> won't give two shits about catering to the AI identification police

I'm sure there will be customers willing to pay for access to these checks, even if they're only limited to OpenAI's product (universities and schools - for plagiarism detection, government agencies, intelligence agencies, police, etc).


So other text should not be tagged as AI generated?


Only tractable for closed-source, hosted LLMs.


Interesting. I was under the impression this tool was effective because of some sort of hidden patterns generated in sentences. I guess my assumption was way more complex than what it actually is.


It's very interesting actually. I think from a business perspective, SEO is the most exposed. Very curious how Google will solve (or maybe is already solving) the issue.


Even if AI detectors were 99% effective, anyone could just iterate over an AI produced piece of writing until it's in the 1% that isn't detected and submit it.


On a related note, I built this site as a PSA:

https://isthiswrittenbyai.surge.sh


If we can't tell what is written by AI, how will we know if AI is just stuck in a feedback loop in which it references itself (and is wrong)?


I urge anyone who values data privacy to refuse to use sites such as this, which employ third-party data services (i.e. consent popups) that purport to allow you to "Manage" your consent but hide a long list of opt-out "Legitimate Interest" flags in a "Vendors" list concealed at the bottom of another scrolling list.


We're entering a wild world. Our information space is up for the slaughter.


It was doomed to fail from the start. Being detectable is precisely how the bot is trained, so the moment a tool figures out a classifier, a new generation that's undetectable springs up.


Was this what Stack Overflow were using to detect automated answers?


None were officially built into the site, so it'll vary from moderator to moderator, but the one that mods had a browser script made for (to help streamline moderation) was the RoBERTa Base OpenAI Detector from 2019, created prior to the existence of GPT-3, GPT-3.5 (ChatGPT free), or GPT-4 (ChatGPT pro). It'll be far worse than the 2023 one this article is about.


No official announcement is lame


No lamer than dropping davinci.



