> Eventually, as models and their users both improve, we'll collectively realize that trying to reliably discriminate between AI and human writing is no different than reading tea leaves. We should judge content based on its intrinsic value, not its provenance.
There are zillions of words produced every second, your time is the most valuable resource you have, and actually existing LLM output (as opposed to some theoretical perfect future) is almost always not worth reading. Like it or not (and personally I hate it), the ability to dismiss things that are not worth reading like a chicken sexer who's picked up a male is now one of the most valuable life skills.
Putting aside the claim that "LLM output [...] is almost always not worth reading"[1], the whole issue here is that this supposed ability of determining whether or not content is AI-generated doesn't exist. Is it really a valuable life skill to decide whether or not you want to read something based solely on its density of em dashes?
Of course there are cases where you can tell that some text is almost certainly LLM output, because it matches what ChatGPT might reply with to a basic prompt. You can also tell when a piece of writing is copied and pasted from Wikipedia, or a copy of a page of Google results. Would any of that somehow be more worth reading if the author posted a video of themselves carefully typing it up by hand?
1: You're assuming a specific type of output in a specific type of context. If LLM output were never worth reading, ChatGPT would have no users.
> Is it really a valuable life skill to decide whether or not you want to read something based solely on its density of em dashes?
Having good heuristics to make quick judgements is a valuable life skill. If you don't, you're going to get swamped.
> Would any of that somehow be more worth reading if the author posted a video of themselves carefully typing it up by hand?
No, but the volume of carefully hand-typed junk is more manageable. Compare with spam: Individually written marketing emails might be just as worthless as machine-generated mass mailings, but the latter is what's going to fill up your inbox if you can't filter it out.
> If LLM output were never worth reading, ChatGPT would have no users.
Only if all potential users were wise. Plenty of people waste their time and money in all sorts of ways.
Then it's stupid and you're going to have an inevitable failure. Hopefully without any expensive sounds, so don't put LLMs in charge of heavy machinery or medical devices.
What we're discussing is whether a set of heuristics to determine whether something is the output of a human or an LLM is "stupid", not whether we should put LLMs in charge of critical work. It's not the LLMs we're talking about, but the heuristics to detect them.
The person you initially replied to claimed the heuristics (to detect LLMs) are not stupid if they are shown to work (by detecting LLMs). The person they were replying to claimed such heuristics were useless.
I think the heuristics are stupid, but happen to work with the current round of LLMs. The next round of training will work to stop the heuristics working, since if they work it means the LLM isn't correctly guessing the next token a human would produce. I don't think LLMs will become indistinguishable from humans any time soon, but I think it'll be a very long back & forth of training & finding new heuristics.
Because the false positive rate is unacceptably high — we're talking about a standard, widely used character — and because if the heuristic becomes widespread enough to matter, then it will be trivially circumvented by bad actors anyway. Who is it helping if we collectively bully ourselves into excising a perfectly good punctuation mark from human language?
If anything, I'd rather that renderers like Markdown just all agree to change " - " to an en dash and " -- " to an em dash. Then we could put the matter to bed once and for all.
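For what it's worth, here's a rough sketch of the kind of substitution I mean. This is not how any real Markdown renderer does it; the function name `smarten_dashes` and the choice to keep the surrounding spaces are my own assumptions for illustration.

```python
import re

EN_DASH = "\u2013"  # en dash
EM_DASH = "\u2014"  # em dash


def smarten_dashes(text: str) -> str:
    """Hypothetical helper: turn spaced ASCII dashes into typographic ones.

    Only matches dashes that sit between two non-space characters, so
    hyphenated words and indented list bullets are left alone.
    """
    # " -- " between words becomes a spaced em dash.
    text = re.sub(r"(?<=\S) -- (?=\S)", f" {EM_DASH} ", text)
    # " - " between words becomes a spaced en dash.
    text = re.sub(r"(?<=\S) - (?=\S)", f" {EN_DASH} ", text)
    return text


if __name__ == "__main__":
    print(smarten_dashes("I love them -- the form I use, anyway."))
    print(smarten_dashes("pages 3 - 5"))
```

A real renderer would also have to skip code spans and fenced blocks, which this sketch doesn't attempt.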
Oh no, I'm not advocating ditching em dashes. I love them -- the form I use, anyway.
I was just curious why you've decided paying attention to them is a bad heuristic. Sure, it can change once people instruct their LLMs not to use them, but still, for now, they sure seem to overuse them!
That and "let's unpack this". I swear, I'll forbid ChatGPT from using "unpack" ever again, in any context!
That's fair. It's not like I don't pay attention to it myself. It's more that I would never use the presence of em dashes, in the absence of any other heuristics, to predict whether or not something is LLM-generated. It's a practically useless signal either way, because I also wouldn't assume that content that used hyphens in place of dashes wasn't LLM-generated.
So the only real purpose of the heuristic is to add a tiny extra vote of confidence when I see a comment that otherwise appears to be lazy ChatGPT copypasta, but in such cases I'll predict that it was probably LLM output either way, and I'll judge that it appears to be poor writing that isn't worth my time regardless of whether or not an LLM was involved.
Fundamentally, the issue I'm seeing here is that we're all talking over each other because we need a better standardized term than "LLM output". I suppose "slop" could work if we universally agreed that it referred only to a subset of LLM output, rather than being synonymous with LLM output in general, but I'm not sure that we do universally agree on that.
If someone types the equivalent of a Google search into ChatGPT, or a spammer has an automated process generically reply to social media posts/comments, that's what qualifies to me as "slop". Most of us here have seen it in the wild by now, and there's obviously a distinctive common style (at least for now), and I think we can all agree that it sucks. That's very different from someone investing time and/or expertise to produce content that just happens to involve an LLM as one of the tools in their arsenal. Refusing to see that difference is just the modern equivalent of considering cellular phone calls or typed letters to be "impersonal".
I'm not suggesting that LLM output doesn't tend to have a higher density of em dashes than human output. I'm just pushing back on the idea that presence of em dashes is sufficient evidence to dismiss something as probably-LLM-generated, which is no better than superstition. I mean, I've used em dashes in a number of comments in this thread, and no one has accused me of using an LLM, so it can't be a pattern that anyone puts too much stock in.
No one said anything about LLM companies. If I were a spammer today, I'd just have my code replace dashes in LLM output with hyphens before posting it. As a human, I'm not going to suddenly stop using dashes because a handful of people are treating a silly meme as if it were a genuinely useful heuristic.
That maybe backs up the claim that it's standard, but not the claim that it's widely used, or that the false positive rate would be unacceptably high.
> If I were a spammer today, I'd just have my code replace dashes in LLM output with hyphens before posting it.
No you wouldn't, for the same reason spammers don't put more plausible stories in their emails: they want to filter for the most gullible segment before investing any human effort.
It's a standard punctuation mark available on Android/iOS/macOS keyboards, and automatically inserted into text by widely used software such as Microsoft Word. You guys are acting like it's an obscure Unicode character that GPT just spontaneously started using out of the blue, and ignoring the obvious answer that it's common in LLM output because it's common in training data. The burden of proof is on anyone claiming that it isn't common.
I was referring to social media spam. It would be a simple way to defuse people citing the use of dashes as "proof" that your spam was spam and having the hivemind bury it. You can't ensnare gullible readers if they never see your comment to begin with — not that following an absurd blanket rule of categorizing em dash usage as AI output has anything to do with whether or not the reader is gullible.
"Should be fairly low" isn't a safe assumption without robust data to back that up. I think it's more likely to be unacceptably high. Dashes are standard punctuation marks available through Android/iOS/macOS keyboards, and automatically inserted into text by common tools like Microsoft Word — not some obscure Unicode character. What's next, are we going to start flagging any text that ends in a question mark as "AI-generated"?
> 1: You're assuming a specific type of output in a specific type of context. If LLM output were never worth reading, ChatGPT would have no users.
I think nobody is upset about reading an LLM's output when they are directly interacting with a tool that produces such output, such as ChatGPT or Copilot.
The problem is when they are reading/watching stuff in the wild and it suddenly becomes clear it was generated by AI rather than by another human being. Again, not in a context of "this pull request contains code generated by an LLM" (expected) but "this article or book was partly or completely generated by an LLM" (unexpected and likely unwanted).
Right, that's part of what I'm getting at. There are two primary cases when LLM output tends to be bad:
1. In the context of research/querying, when unverified information from its output is falsely passed off as verified information curated by a human author. There's a big difference between "ChatGPT or some blog claims X" and "the answer is X".
2. In the context of writing/communication, when it's used to stretch a small amount of information into a relatively large amount of text. There's a big difference between using an LLM to help revise or trim down your writing, or to have it put together a first draft based on a list of detailed bullet points, and expecting it to stretch one sentence into a whole essay of greater value than the original sentence.
Those are basic misuses of the tool. It's like watching an old person try to use Google 20 years ago and concluding that search engines are slop and the only reliable way to find information is through the index of Encyclopedia Britannica.
> this supposed ability of determining whether or not content is AI-generated doesn't exist.
It seems like you’re just wrong here? Em dashes aside, the ‘style’ of LLM-generated text is pretty distinct, and is something many people are able to distinguish.
No, I'm not wrong. Someone could easily write in the default output style of ChatGPT by hand (which will probably become increasingly common the longer that style remains in place), and someone could easily collaborate with ChatGPT on writing that looks nothing like what you're thinking.
If organizations like schools are going to rely on tools that claim to detect AI-generated text with a useful level of reliability, they better have zero false positives. But of course they can't, because unless the tool involves time travel that isn't possible. At best, such tools can detect non-ASCII punctuation marks and overly cliched/formulaic writing, neither of which is academic dishonesty.
Okay, you’re right that the LLM writing style isn’t exclusive to LLMs. However, I’m not sure why this writing style would become increasingly common? I don’t see why people would mimic text that is seen as low quality or associated with academic dishonesty.
Additionally, I do think it is valuable to determine if a piece of text is valuable, or more precisely, what I’m looking for. As others have said, if I want info from a LLM about a subject, it is trivial for me to get that. Oftentimes I am looking for text written by people though.
> However, I’m not sure why this writing style would become increasingly common?
I was basing that on a few factors, off the top of my head:
1. Someone might pick up mannerisms while using LLMs to help learn a new language, similarly to how an old friend of mine from Germany spoke English with an Australian accent because of where she learned English.
2. Lonely or asocial people who spend too much time with LLMs might subconsciously pick up habits from them.
3. Generation Beta will never have known a world without LLMs. It's not that difficult to imagine that ChatGPT will be a major formative influence on many of them.
> As others have said, if I want info from a LLM about a subject, it is trivial for me to get that.
Sure, it's trivial for anyone to look up a simple fact. It's not so trivial for you to spend an hour deep-diving into a subject with an LLM and manually fact-checking information it provides before eventually landing on an LLM-generated blurb that provides exactly the information you were looking for. It's also not trivial for you to reproduce the list of detailed hand-written bullet points that someone might have provided as source material for an LLM to generate a first draft.
These are all future concerns; if that happens, then people can change their heuristics. There's no point trying to predict all possible futures in everything that you do.
The comment you're replying to isn't related to the topic of heuristics. The first part is explicitly an answer to a question concerning a future prediction.