
In some cases an AI will make a weird word choice. So do a lot of humans. Sometimes AIs are needlessly wordy. Um...so are a lot of humans. Rinse and repeat.

AI detectors are useless. The AIs are training on human writing, so they write fundamentally like humans. How is this not obvious?




A fairly simple and useful AI detector that works uncannily well on student papers: (a) does the text contain "I am an AI" or words to that effect, (b) are there lots of completely made up references?


Precisely, it's a tech that aims to write the most average, most likely human text.


LLMs don't average, they learn the distribution, from which you then sample (or the UI does it for you). Because of that, they don't write in a single style that's a blend of many human styles - they can write in any and all of the human styles they saw in training, as well as blend them to create styles entirely "out of distribution". And it's up to your prompt (and sampling parameters) which style will be used.


True, a better choice of words than average would have been: a very likely completion to a starting state/input.


Perhaps, but in the context of this thread, what's important is that the space of possible completions is encompassing every writing style imaginable and then some, and the starting state/input can be used to direct the model to arbitrary points in that space. Simple example template:

  Please write <specifics of the text you want LLM to write>, <style instruction>.
Where <style instruction> = "as if you were a pirate", or "be extremely succinct", or "in the style of drunk Shakespeare", or "in Iambic pentameter", or "in style mimicking the text I'm pasting below", etc.
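For instance, filling in that template against a chat API (a minimal sketch using the OpenAI Python SDK; the model name, prompt content, and temperature are arbitrary placeholders, not a recommendation):

  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  resp = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder; any chat-capable model works
      messages=[{
          "role": "user",
          "content": "Please write a short proof that sqrt(2) is irrational, "
                     "in the style of a pirate.",
      }],
      temperature=0.8,
  )
  print(resp.choices[0].message.content)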

There's no way those "AI detectors" could determine from the text itself whether it was written by AI, as it's trivial to make LLM output take on any style imaginable.


Yeah, I am shocked people keep repeating this. We've seen LLMs easily write mathematical proofs in the style of Shakespeare, come on.


There's still some typicality defined by the prompt though. If you ask for a proof in the style of Shakespeare, you're going to get some "average" Shakespeare. It's kind of embedded in the task definition; you're shifting the reference distribution.

If an LLM returned something really unusual for Shakespeare when you didn't ask for it, you'd say it's not performing well.

Maybe that's tautological but I think it's what's usually meant by "average".

I'm sure LLMs that do something different are on the near horizon, but I don't think we're quite there yet.


> you're going to get some "average" Shakespeare

The point was that no, you won't (necessarily) get some "average" Shakespeare. A sampler may introduce bias and look for the "above average" Shakespeare in the distribution.
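As a toy illustration of that bias (made-up logits over a five-word vocabulary, not a real model), here's how temperature and top-k sampling shift which continuation you actually get:

  import numpy as np

  rng = np.random.default_rng(0)

  # Toy next-token distribution over a tiny vocabulary (made-up logits).
  vocab = ["the", "thou", "verily", "forsooth", "quantum"]
  logits = np.array([2.0, 1.2, 0.4, 0.1, -1.0])

  def sample(temperature=1.0, top_k=None):
      # Temperature rescales the logits; top-k discards everything outside
      # the k most likely tokens. Both bias which part of the distribution
      # the output is drawn from.
      scaled = logits / temperature
      if top_k is not None:
          cutoff = np.sort(scaled)[-top_k]
          scaled = np.where(scaled >= cutoff, scaled, -np.inf)
      probs = np.exp(scaled - scaled.max())
      probs /= probs.sum()
      return rng.choice(vocab, p=probs)

  # Low temperature collapses toward the single most likely token;
  # high temperature pushes samples toward rarer, "less average" tokens.
  print([sample(temperature=0.3) for _ in range(5)])
  print([sample(temperature=1.5) for _ in range(5)])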


Saying they find some “average” is an easy way to explain to a layman that LLMs are statistically based and are guessing and not actually spitting out correct text as you would expect from most other computer programs.

That’s why it’s repeated. It’s kind of correct if you squint and it’s easy to understand


What is the correct text anyway? Everything around you is somewhat wrong. Textbooks (statistically all of them) contain errors, scientific papers sometimes contain handwavy bullshit and in rare cases even outright falsified data, human experts can be guessing as well and they are wrong every now and then, programs (again pretty much all of them) contain bugs. It is just the reality.

Even very simple computations may require you to twist the definition of "correctness". I open a REPL, type "1/3.0*3.0", and get "0.9999999999". Then you have to do mental gymnastics like "actually it is a correct answer, because arithmetic in computers isn't implemented the way you'd expect".
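The classic version of this in a Python REPL (IEEE 754 doubles, so most other languages behave the same way):

  >>> 0.1 + 0.2
  0.30000000000000004
  >>> 0.1 + 0.2 == 0.3
  False
  >>> import math
  >>> math.isclose(0.1 + 0.2, 0.3)  # "correct enough" needs a tolerance
  True

Whether you call that wrong or "correct by the rules of binary floating point" is exactly the kind of definitional twist being described.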


> What is the correct text anyway?

Exactly. The fact that language is fuzzy is why LLMs work so well.

The issue is that most people expect computers to not make mistakes. When you write a formula in an excel sheet, the computer doesn’t mess up the math.

The average non-tech person knows that humans make mistakes, but is not used to computers making mistakes.

Many people, maybe most, would see an answer generated by a computer program and assume that it’s the correct answer to their question.

In pointing out that LLMs are guessing at what text to write (by saying “average”) you convey that idea in a simplified way.

Trying to argue that “correct” doesn’t mean anything isn’t really useful. You can replace the word “correct” with “practically correct” and nothing about what I said changes.


What do you mean they aren't used to computers making mistakes? Have they ever asked Siri/Alexa something and gotten useless answers? Have they ever seen ASR or OCR software making mistakes? Have they called semi-automated call centers with the prompt "say what you need instead of clicking numbers", only to hear "sorry, I don't understand you" repeated until they scream "connect me to a bloody human"? Have they ever seen a situation where automated border control gates just don't work for whatever reason and there are humans around to sort it out? Have they ever used Google Translate in the last 20 years for anything remotely complicated, like a newspaper article? Have they ever used computers for actual math? Are computers particularly good at solving partial differential equations, for example? Have they ever had GPS lead them to a closed road or a huge traffic jam? Have they ever played video games where the computer sometimes does stupid things?

Sure, computers are better at arithmetic than humans, but let's be honest, nobody uses ChatGPT as a calculator. For the last 20 years AI has been getting into everything, and we keep laughing when AI systems make very obvious, stupid mistakes. Now we finally have a system that makes subtle mistakes very confidently, and suddenly people are like "I thought computers were never wrong". I can't fathom how anyone would expect that.


You seem to be picking out sentences in my responses and arguing them instead of the point I’m trying to illustrate.

I don’t really think there’s much left for us to talk about.


The word you're looking for is "any", not "average". As in, it can write like any human, any way you want it to. Not just as some "average human".


The word “average” implies a use of statistics, which is why I think it’s good to use, even though it’s not precise.


People aren't as dumb as you seem to imply here; "average" isn't accurate enough. "Random", or even "statistical", would be less confusing.


We can keep bikeshedding, sure.

I don’t think people are dumb, just that the vast majority of people don’t have knowledge of statistics and AI.

“Random” isn’t good either. Obviously the text gpt and other LLMs generate isn’t random.

Statistical works too, sure.


Not quite true for an LLM chatbot with RLHF - it aims to provide the most satisfactory response to a prompt. AI detectors are snake oil to begin with, but they're super snake oil if people are smart enough to include something in their prompt like "don't respond in the style of a large language model" or "respond in the style of x".


Conceptually it seems like the average of all human texts would be distinct from any individual user's, because it would blend word choices and idioms across regions, whereas most of us are trained and reinforced in a particular region.

Other statistical anomalies probably exist; it is certainly possible to tell that an average is from a larger or smaller sample size (if I tell you X fair coin flips came up heads 75% of the time, you can likely guess X, and can tell that X is almost certainly less than 1000).
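To make the coin-flip intuition concrete, a quick sketch with exact binomial tails (fair coin assumed, numbers rounded):

  from math import ceil, comb

  def p_at_least(n, frac=0.75):
      # P(#heads >= frac*n) when flipping a fair coin n times.
      k_min = ceil(frac * n)
      return sum(comb(n, k) for k in range(k_min, n + 1)) / 2**n

  for n in (4, 20, 100, 1000):
      print(n, p_at_least(n))
  # ~0.31 at n=4, ~0.02 at n=20, then it falls off a cliff:
  # by n=1000 the probability is effectively zero, which is why
  # "75% heads" all but rules out a large sample.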

But in practice it doesn’t look possible, or at least the current offerings seem no better than snake oil.


> Conceptually it seems like the average of all human texts would be distinct from any users because it would blend word choices and idioms across regions

That's only true in the aggregate. Within a single answer, LLMs will try to generate a word choice which is more likely _given the preceding word choices in that answer_, which should reduce the blending of idioms.

> where most of us are trained and reinforced in a particular region.

The life experience of most of us (at least here on HN) is wider than that. Someone who, as a child, visited their grandparents every year in two different regions of the country could have a blend of three sets of regional idioms, and that's before learning English (which adds another set of idioms from the teachers/textbooks) and getting on the Internet (which can add a lot of new idioms from each community frequented online). And this is a simple example; many people know more than just two languages (each bringing its own peculiar idioms).


While I agree that many people visit two regions of the same country, I think few of us would display word choice patterns reflecting the US, England, and Australia within a single piece. Could it happen? Sure. But LLMs won't have the bias towards likely combinations, except inasmuch as that's represented in training data.


> I think few of us would display word choice patterns reflecting the US, England, and Australia within a single piece.

Someone who learned English mostly through books and the Internet could very well have such a mixture, since unlike native speakers of English, they don't have a strong bias towards one region or the other. You could even say that our "training data" (books and the Internet) for the English language was the same as these LLMs.


The problem is that word use is power-law distributed, so that the most common ~200 words in use are extremely over-represented, and that goes for phrases and so on.
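A rough way to see that for yourself on any large plain-text corpus (the file path below is just a placeholder):

  import re
  from collections import Counter

  # Placeholder path: point this at any large plain-text file you have.
  text = open("corpus.txt", encoding="utf-8").read().lower()
  counts = Counter(re.findall(r"[a-z']+", text))

  total = sum(counts.values())
  top200 = sum(c for _, c in counts.most_common(200))
  print(f"top 200 word types cover {top200 / total:.0%} of all tokens")

  # Zipf-style rank/frequency: counts fall off roughly as 1/rank.
  for rank, (word, c) in enumerate(counts.most_common(10), start=1):
      print(rank, word, c)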

It takes a lot of skill and a long time to develop a unique style of writing. The purpose of language is to be an extremely lossy, on-average way of communicating information between people. In the vast majority of cases, idiomatic style or jargon impairs communication.


That's not true at all. I could tell it to write like an unhinged maniac, someone who never uses contractions, or George Washington with a lisp.


And it would give you a response that others are likely to give, in the context of this prompt.


On the general internet, the reputation of AI writing is that it's bad/awkward in a way that is often identifiable (by humans) as not having been written by a human.

AI detectors are useless, you're right, but for the same reason AI is unreliable in other contexts, not because AI writing is reliably passable.


> in a way that is often identifiable (by humans) as not having been written by humans.

You should check out reddit sometime. It's been nearly twenty years (not hyperbole) of everyone accusing everyone else of being a bot/shill. Humans are utterly incapable of detecting such things. They're not even capable of detecting Nigerian prince emails as scams.

> not because AI writing is reliably passable.

"Newspaper editor" used to be a job because human writing isn't reliably passable. I say this not to be glib, but rather because sometimes it's easy for me to forget that. I have to keep reminding myself.

Also, has it not occurred to anyone that deep down in the brainmeat, humans might actually be employing some sort of organic LLM when they engage in writing? That technology actually managed to imitate that faculty at some low level? So even when a human really writes something, it's still an LLM doing so? When you type in the replies to me, are you not trying to figure out what the next word or sentence should be? If you screw it up and rearrange phrases and sentences, are you not doing what the LLM does in some way?


> Also, has it not occurred to anyone that deep down in the brainmeat, humans might actually be employing some sort of organic LLM when they engage in writing?

This is a fairly common take, along with the idea that AI image generators are just doing what humans do when they "learn from examples". But I strongly believe it's a fallacy. What generative AI does is analogous to what humans do, but it's still just an analogy. If you want to see this in action, it's better to look at the way generative AI fails than the way it succeeds: when it makes mistakes in text or images, the mistakes are very much not the kind of mistakes that humans make, because the process behind the scenes is very different.

Yes, obviously when humans write, they take into account context and awareness of what words naturally follow other words, but it seems unlikely we've learned to write by subconsciously arranging all the words we've encountered into multidimensional vector space and performing vector math operations to arrive at the next word based on the context window we're subconsciously constructing. We learn to write in a very different way.
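For anyone unfamiliar with what "vector math over words" refers to, here's a cartoon of the idea (toy, hand-picked 3-d vectors, nothing like a real embedding space):

  import numpy as np

  # Toy "embedding space": a handful of made-up 3-d word vectors.
  emb = {
      "king":  np.array([0.9, 0.8, 0.1]),
      "queen": np.array([0.9, 0.1, 0.8]),
      "man":   np.array([0.5, 0.9, 0.0]),
      "woman": np.array([0.5, 0.0, 0.9]),
      "child": np.array([0.2, 0.5, 0.5]),
  }

  def cos(a, b):
      return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

  def nearest(v, exclude=()):
      # Word whose vector is closest (by cosine similarity) to v.
      return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], v))

  # The classic "king - man + woman ~= queen" style of operation.
  print(nearest(emb["king"] - emb["man"] + emb["woman"], exclude={"king", "man", "woman"}))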

It's truly amazing that generative AI writes as well as it does, but we reason about concepts and generative AI reasons about words. Personally, I'm skeptical that the problems LLMs have with "hallucinations" and with creating definitionally median text* can be solved by making LLMs bigger and faster.

*I did see the comment complaining that it's not mathematically accurate to say that LLMs produce average text, but from my understanding of how generative AI works as well as my recent misadventures testing an AI "novel writer," it's a decent approximation of what's going on. Yes, you can say "write X in the style of Y," but "write X but make it way above average" is not actually going to work.


> But I strongly believe it's a fallacy.

Either the LLM is the most efficient way to generate text, or there's some magic algorithm out there that evolution stumbled upon a million years ago that we haven't managed to see even a hint of. In which case, you'd be right, this is a fallacy.

Or, brainmeat can't do it better or more efficiently, and either uses the same techniques or something even worse. The latter seems unlikely; humans still do pretty well at generating text (gold standard, even).

> it's better to look at the way generative AI fails than the way it succeeds: when it makes mistakes in text or images, the mistakes are very much not the kind of mistakes that humans make, because the process behind the scenes is very different.

But are you looking at "mistakes" that are just little faux pas, or the ones where people with dementia or bizarre brain damage, or people blipped out on hallucinogens, incorrectly compute the next word? The former offer little insight. Poor taste in word choice, lack of eloquence, and vulgar inclinations are what they amount to.

> but it seems unlikely we've learned to write by subconsciously arranging all the words we've encountered into multidimensional vector space and performing vector math operations to arrive at the next word

You think I meant that someone learns to do that at 2 years old, rather than that the brain has already evolved with the ability to do vector math operations or some true equivalent? I'm not talking about some pop psych level "subconscious" thing, but an actual honest to god neurological level faculty.

> but we reason about concepts and generative AI reasons about words

Wander into Walmart next time, close your eyes briefly and extend your psychic powers out to the whole building, and tell me if you truly believe, deep down in your heart, that the humans in that store are reasoning about concepts even once a week. That many, if not most, reason about concepts even once a month. I dare you, just go some place like that, soak it all in.

Human reason exists, from time to time, here and there. But most human behavior can be adequately simulated without any reason at all.


> Or, brainmeat can't do it better or more efficiently, and either uses the same techniques or something even worse. The latter seems unlikely, humans still do pretty well at generating text (gold standard, even).

Considering we use something like a thousand times the compute, "something even worse" seems plausible enough.


I think we have plenty of evidence that humans have the ability to understand, while chatbots lack such an ability. Therefore, I'm inclined to think that we don't employ some sort of organic LLM but something completely different.


I've occasionally seen evidence that some humans seem to sometimes understand. I've learned not to generalize that though.


Precisely. And the way to get better writing is by having good editors.

The major newspapers and magazines used to have good editors and proofreaders, and it used to be rare to see misspellings or awkward sentences, but those editing staffs have been seriously cut back and you see such errors much more commonly now.

But hey, let’s blame something else.


But also maybe firing writers who make weird word choices and are needlessly wordy is fine.


Yes please. The art of writing is conveying the most meaning in the fewest words.


Eh, eventually AI will write like humans, but currently most of the time it's very much apparent what was written by AI. English is my second language, so it's hard for me to pinpoint the exact reason why, but I guess it's more about the tone and the actual content (a.k.a. bullshit) than the grammar / choice of words.

Most of the time AI slop reads like a soulless corporate ad. Probably because most of the content the AI was trained on was already SEO-optimized bullshit mass-produced on company blogs. I'd very much like a tool that would detect and filter those out of "my internet" as well.


If the AI writing is good, you’re not going to know it’s written by AI and you’ll continue to think you "can always tell" while more and more of what you read isn’t written by humans.


the only way to know whether it's original or AI-written is to know the author's writing skills beforehand.


Yeah, but to obtain something good, you have to give such a specific and detailed prompt that you might as well just do it yourself.


Yeah, but to reach that point you will probably need those "useless AI detectors" (as stated by the comment I was replying to). That was my point - we're not there yet, therefore those tools can be useful.


But how do you know we’re not there yet? Not across the board, but isn’t it possible there’s a small yet growing portion of written content online that’s AI generated with no obvious tells?


I think we have a misunderstanding - I don't mind if I'm reading AI generated content as long as it doesn't look like "the typical AI content" (or SEO slop). In my point of view companies/writers might use AI detectors to continue improving the quality of their content (even if it's written by hand, those false positives might be a good thing). We're not there yet because I still see and read a lot of AI/SEO slop.

I agree with you that the "portion of written content online that’s AI generated with no obvious tells" is "small yet growing". That's exactly the thing - it's still too small to "be there yet" :)


I don't follow how you're reaching your conclusion. You only mind reading AI content when it's obviously AI/slop, yet you conclude the vast majority of decent content is not AI-generated. How were you able to identify whether the good content was written by AI or not?

E.g. it's perfectly possible that in terms of prevalence it's "AI slop > AI acceptable > human acceptable" instead of "AI slop > human acceptable > AI acceptable", and nothing you've noted explains why it would be one rather than the other.


Semi-automated content like that is probably widespread by now.

Imagine something like the Rust Evangelic Task Force, but for the next big thing - it will probably be shilled by bots.

Low-quality content like YouTube and Reddit comments is probably already mostly LLM bots that comment on anything to hide the actual spam comments.


"most of the content the AI was trained on was already SEO optimized bullshit mass produced on company blogs."

Totally agree with the last part. I work with copywriting, and I spend most of my prompts trying to double down on the pretentious discourse.


Honestly, I couldn't care less if an author uses AI, as long as I can understand what I'm reading and it's interesting. They still have to instruct the AI.


If you're reading nonfiction, it means you're wasting time reading a lot more words when you could have just read the prompt.


Remind you of some entire genres of book?

That’s right, business and self-help books!

Any of these with an author who had actual accomplishments and money before writing the book was almost certainly already ghostwritten from an outline (and so are lots of other books, you'd be surprised; it's not just these genres). Successful CEOs and people you've heard of generally don't write their own books. Often they're terrible writers, and even if they're not, writing is time-consuming, and as with everything else that actually creates something, they prefer to pay someone else to do it.

As of last year, new books in that category are written by AI and edited by one or more humans; with each editor doing just two or three chapters, you can finish one of these books in a month or less.


Well, to err is human, to truly screw up you need a computer.

We're going to be blasted to smithereens with LLM-generated "80% should be good enough" garbage.


It’s fortunate we have mountains of human-written books, film, television, radio programs, music, and video games from Before AI. Just the good stuff could occupy several lifetimes.

Pity we killed most of the good used book stores already, though.

Also, shame about journalism and maybe also democracy. That’s too bad.


In my case, I talk a lot and write a TON; my use for AI is really "can you say the same information with fewer words?", then I tweak what it gives me. To be fair, I'm not a paid writer, just a dev writing emails to business people. I rewrite emails like 20 times before sending them. ChatGPT has helped me to just write it once and have it summarized. I usually keep confidential details out and add them in after if needed.


Indeed you can losslessly "compress" an LLM's spew into just the prompt (plus any other inputs like values of random variables).

But you can also compress a book's entire content into just its ISBN.

It's just that books are hopefully more than just statistical mashups of existing content (some books like textbooks and encyclopaedias are kinds of mashup, though one hopes the editors have more than a statistically-based critical input!)


You can't regenerate the book from the ISBN. But you can generate the text from the prompt.


You can go and fetch the book from a book store using that information. Fundamentally there's not much difference between that and "fetching" the output from some model using the matching prompt. In both cases there is some kind of static store of latent information that can be accessed unambiguously using a (usually) shorter input.

I'm not saying the value of the returned information is equivalent, of course. But being "just a pointer" into a larger store isn't, in itself, the problem to me.


You realise that you can't fetch a book for a new ISBN without altering the archive, while this is not the case for every new prompt that you come up with?


I don't understand the distinction. If the book archive is electronic, like many in fact are, why can you not get a copy of the book with a given ISBN without altering anything? Even if it's not electronic, does the acquisition of a book by an individual meaningfully change the overall disposition of available information? If you took the last one in your local Waterstones, I can still get one elsewhere.


> If the book archive is electronic, like many in fact are, why can you not get a copy of the book with a given ISBN without altering anything?

Because new books are written?

It feels to me that you are set on insisting that a prompt and an ISBN are the same, and no amount of logic will move you from there.


Models can be trained more and fine-tuned, though, if we're going to stick to the analogy. But in the context of the analogy, the LLM won't be materially updated between two prompts, in roughly the same way that telling you the answer you seek is in a book with a specific ISBN isn't materially affected by someone publishing a new book at that moment.

You are quite right that you're not convincing me of your original thesis: that a prompt contains the entire content of the reply in a way that some other reference to an entity in some other pool of information doesn't. That's not the same as saying "ISBNs and LLM prompts are the same thing", which is a strawman. It's saying that they're both unambiguous (assuming determinism) pointers to information.

Of course no one is disagreeing that a reply from a deterministic LLM would add no more information to the global system (you, an LLM's model, a prompt) than just the prompt would. But I still think the same is true for the content of a book not adding to the system of (you, a book store, an ISBN).

In fact, since random numbers don't contain new information if you know the distribution, one can even extend it to non-deterministic LLMs: the reply still adds no information to the system. The analogy would then be that the book store gives you at random a book from the same Dewey code as the ISBN you asked for. Which still doesn't increase the information in the system.


Can you, though? I thought LLMs, just by virtue of how they work, are non-deterministic. Let alone if new data is added to the LLM, further retraining happens, etc.

Is it possible to get the same output, 1:1, from the same prompt, reliably?


They are assuming a lot of things, like that the LLM doesn't change and that you have full control over the randomness. This might be possible if you are running the LLM locally.
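For what it's worth, that's straightforward with a local model: greedy decoding uses no randomness at all, so the prompt alone pins down the output. A minimal sketch with Hugging Face transformers (gpt2 here is just a stand-in for any locally run model):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  name = "gpt2"  # placeholder: any local causal LM works the same way
  tok = AutoTokenizer.from_pretrained(name)
  model = AutoModelForCausalLM.from_pretrained(name)

  inputs = tok("The prompt fully determines", return_tensors="pt")
  # do_sample=False means pure greedy decoding: no random numbers involved,
  # so the same prompt + the same weights gives identical output every run.
  out = model.generate(**inputs, do_sample=False, max_new_tokens=20)
  print(tok.decode(out[0], skip_special_tokens=True))

With sampling turned on you'd additionally have to fix the seed (and the sampling parameters) to get the same effect.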


Well, if we can assume the LLM doesn't change, why not assume the index system doesn't change too?

And yeah, I guess if you control the seed, an LLM would be deterministic.


Not true if the author of the prompt used an iterative approach. Write the initial prompt, get the result, "simplify this, put more accent on that, make it less formal", get the result, and so on, and edit the final output manually anyway.


Depends on your own level of background knowledge vs. the author's.


OpenAI announced that they had started on an AI text detector and then gave up, as the problem appears to be unsolvable. The machine creates statistically probable text from the input; applying statistics to the generated result will show nothing more than exactly that. You're then left triggering false positives on text that is the most likely, which makes the whole thing useless.


> OpenAI announced that they had started on an AI text detector and then gave up as the problem appears to be unsolvable.

Making a reliable LLM also appears to be unsolvable, but we still work at it and still use the current wonky iterations. My point is that even if there are no perfect AI detectors, a lot of these tools are good enough for a "first pass"--coincidentally the same use case many effective LLM practitioners use LLMs for.


Sure, it could maybe be kinda right, but what is the cost of a false positive? If you have, say, a 10% false positive rate, and there are theoretical reasons to think you'll never get that anywhere close to zero, then what use case does this serve? "Hey student, there's a 90% chance you cheated." "Well, no, I'm in that 10%." What now?
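It's worth spelling out with Bayes' rule just how bad a 10% false-positive rate is in practice. The numbers below are made up purely for illustration (assumed base rate of AI-written papers and detector hit rate):

  # Hypothetical numbers, chosen only to illustrate the base-rate problem.
  base_rate = 0.20   # assumed fraction of papers actually AI-written
  tpr = 0.90         # detector flags 90% of AI-written papers (assumed)
  fpr = 0.10         # detector flags 10% of honest papers

  p_flagged = tpr * base_rate + fpr * (1 - base_rate)
  p_ai_given_flag = tpr * base_rate / p_flagged
  print(f"P(actually AI-written | flagged) = {p_ai_given_flag:.2f}")  # ~0.69

So even with a fairly generous detector, roughly one in three flagged students would be falsely accused under these assumptions.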

Again, OAI cancelled work on this believing it not to be solvable with a high degree of confidence. What is the use case for a low confidence AI detector?


> What is the use case for a low confidence AI detector?

What's the use case for LLMs in general if you always have to double-check their work?


“The entire current AI market”.



