The study evaluated short-form generic marketing-style content, most of which is manicured and optimized to within an inch of its life.
Most dating profiles I see are extremely similar in terms of how people describe themselves. Same for Airbnb listings. I'd think AI detection would be much higher for long-form writing on a specific topic.
This is also the kind of human-written content that is closest to how LLMs sound. The tonal and structural similarity is so glaring that I have often wondered if a large percentage of the GPT training corpus is made up of text from spam blogs.
I think if I were given, say, a couple of pages from an actual physics textbook and then a GPT emulation of the same, I would be able to tell the difference easily. Similarly with poetry - GPT's attempts at poetry are maximally conventional and stuffed with flat, stale imagery. They can easily be separated from the poetry of a truly original human writer.
If AI developers want to impress me, show me an AI whose writing style departs significantly from the superficiality and verbosity of a spam blog. Or, in the case of Bing, an unhinged individual with a nasty mix of antisocial, borderline, and histrionic personality disorders.
This is almost certainly the case, because the shift in tone and vocabulary between an Inc.com or Buzzfeed article and a London Review of Books article is far too wide to allow an AI to simply weigh them equally. AI speaks a kind of global English that's been trained not just on blogs and Wikipedia, but also on Quora answers and content-marketing pieces, a lot of which are written by non-native speakers.
It isn't grammatically wrong, but as it targets the widest possible audience, its voice also isn't very interesting.
In vale of silicon, where circuits hum,
A pixelated symphony is spun,
Where techne's children toil 'til kingdom come,
And seek the code to marry Earth with Sun.
The anodes in their lustre shew the gleam,
The cathodes purr, electrons dance and dream,
Each byte, each pixel, glass, and wire entwine,
As Silicon's soul thrums with life divine.
In this, the haven of the digital age,
Where thought is sculpted into byte and page,
A new Prometheus dwells, with vision keen,
To forge the bond 'twixt worlds, machine and gene.
O technologists, who weave a web of stars,
Like spiders spooling through the night's memoirs,
To link the globe in silent symphony,
And bring the world to whispered unity.
But hark! The human heart, a pulsing drum,
Still yearns for touch, for words from far-off climes,
To feel the whisper of another's sigh,
And drink the love that dances in their eyes.
So guard, ye architects of artifice,
The sanctity of nature's sweetest kiss,
And in your grand designs, hold ever fast,
The whispered wisdom of our storied past.
Preserve the flame that burns within our breast,
The human spirit, fragile, manifest,
That in the melding of the age to come,
We lose not all that makes our hearts a home.
To be fair though, it's definitely not a spam blog.
And from what I've seen of the raw model, when you ask it to depart from this voice, it sometimes can, but the bigger the departure, the weirder and more inhuman the results.
Because it is a text predictor, with the choices being made by a PRNG, which is not the same thing as an author, who starts with something to say and chooses the words to fit.
The LLM is like the author with the "something to say" replaced by PRNG output. The plausibility of the output is attributable to the power of the predictive component.
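That "predictor plus PRNG" picture can be sketched in a few lines. This is a toy illustration, not how any real LLM is implemented: the vocabulary and probabilities below are invented, standing in for whatever distribution the model's predictive component would emit for the next token.

```python
import random

def sample_next_token(probs, rng):
    """Pick the next token with a PRNG, weighted by the predicted probabilities."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented toy distribution: in a real LLM these numbers come from the
# predictive component; here they are made up for illustration.
probs = {"cat": 0.6, "dog": 0.3, "quasar": 0.1}

rng = random.Random(42)  # the PRNG standing in for "something to say"
tokens = [sample_next_token(probs, rng) for _ in range(5)]
```

The "author" here is nothing but the seed and the weights: rerun it with the same seed and you get the same "choices"; change the seed and the text changes with no change of intent, because there is no intent.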
Is it that AI is trained on spam, or was it AI creating the spam to start with?
While SEO spam and content farms may not have been using GPT specifically, I assume much of it is AI generated to begin with.
I am very frustrated by the way this article repeatedly asks ChatGPT to guess if something is a bot, gets told "well, we can't know for sure, but this is at least the sign of a crappy bot or of human behavior", and then the author says "Aha! But a human could act like a crappy bot, or you could train a bot to mimic this exact behavior".
Well yeah. No shit.
Same thing can happen here: students can submit their term papers, but they have to do a 5-minute oral exam with an instructor or TA to discuss their paper.
I don't know how that works in practice (I don't teach, though I've TA'd and noticed how there were clusters of assignments that were similar not only to each other, but also to the previous years' ones). So I'm curious what's the done thing.
Does a tutor just confront a student with the "evidence" in the form of, say, the plagiarism detector's score, and wait for the student to confess? How, exactly, can you prove not only who wrote an assignment, but how? I mean in the general case?
On the other hand, if a tool thinks that the text has an x% probability of being AI-generated, I have nothing to base a decision on other than the output of that tool. If I'm not willing to fully delegate the decision to that tool (and I'm not), then I have no practical means to verify whether the tool's suspicion is right.
And as someone working in machine learning, let me spare you the lecture: an ML classifier saying "x%" cannot be safely interpreted as an x% probability. There are a million caveats that the vast majority of papers and tools don't try to address.
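The gap between a reported score and an actual probability is what a calibration check measures. A minimal sketch, with entirely invented scores and labels (no real detector was involved): count how often examples the classifier scored "around 90%" actually turn out positive.

```python
def empirical_rate(scores, labels, lo, hi):
    """Fraction of positive labels among examples scored in [lo, hi)."""
    picked = [label for score, label in zip(scores, labels) if lo <= score < hi]
    return sum(picked) / len(picked) if picked else None

# Invented data, purely for illustration: the classifier reports ~90%
# on four examples but is actually right on only two of them.
scores = [0.9, 0.9, 0.9, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
labels = [1,   0,   1,   0,   1,   0,   0,   1,   0,   0]  # 1 = actually AI-written

rate_90 = empirical_rate(scores, labels, 0.85, 0.95)  # 0.5 here, not 0.9
```

If the classifier were well calibrated, examples scored ~90% would be positive ~90% of the time. In this toy data they are positive half the time, which is exactly the sense in which "x%" from a tool cannot be naively read as an x% probability.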
My guess is that we all become more sensitive to this in a year or two. Look at how awful DALLE looks now, relative to our amazement last year.
People lie. People tell the truth. Machines lie. Machines tell the truth. I bet our ability to detect when a person is lying isn't any better than 50% either.
What matters is accountability, not method of generation.
In relevant studies, people attempt to discriminate lies from truths in real time with no special aids or training. In these circumstances, people achieve an average of 54% correct lie-truth judgments, correctly classifying 47% of lies as deceptive and 61% of truths as nondeceptive. 
"People lie too" but in 95% of cases they don't. If someone on Hacker News says they prefer Zig to Rust or that they liked the Dune movie, they're likely telling the truth. There's no incentive either way; we've just evolved as social creatures that share little bits of information and reputation. And to lie, yes, and to expose the lies of others, but only when there's a big payoff to defect.
If you had a friend that kept telling you about their trips to restaurants that didn't actually exist, or a junior developer at work that made up fictional APIs when they didn't know the answer to a question, you'd tell them to stop, and if they kept at it you probably wouldn't care to hang out with them. ChatGPT seems to bypass those natural defenses for now.
Most people think they are hard to deceive. But I see plenty of people here on HN with confidently wrong beliefs about how ChatGPT works, which they've gotten from asking ChatGPT about itself. It's not intuitive for us that ChatGPT actually knows very little about how it works. It even took humanity a while to realize that "how it feels like my body works" isn't a great way to figure out biology.
While this is a reasonable thing to hope for, I'd like to point out that former British Prime Minister Boris Johnson has been making things up for his entire career, repeatedly getting into trouble for it when caught, and yet somehow he managed to keep failing upwards in the process.
So even in humans, our defences assume the other person is capable of recognising the difference between truth and fiction; when they can't — and it is my opinion that Johnson genuinely can't tell, rather than that he merely keeps choosing to lie, given how stupid some of the lies have been — then our defences are bypassed.
If a human acting in good faith is pressed for the evidence for something they said that is untrue, they will probably give a hazy recollection of how they got the information ("I think I read it in a NYT article", etc). They might be indignant, but they won't fabricate an equally erroneous trail of citations.
ChatGPT produces some shockingly good text, but the rate of hallucinations and its inability to reliably reason about either correct or incorrect statements would be enough to mark a human as untrustworthy.
The fact that LLMs can produce plausible, authoritative text that appears well evidenced, and can convincingly argue its validity regardless of any actual truth does however mean that we might be entering an era of ever more accessible and convincing fraud and misinformation.
It's not even the rate, which is troubling enough. It's the kinds of things it gets wrong too. For instance, you can say to ChatGPT, "Tell me about X" where X is something you made up. Then it will say "I don't know anything about X, why don't you tell me about it?" So you proceed to tell it about X, and eventually you ask "Tell me about X" and it will summarize what you've said.
Here's where it gets strange. Now you start telling it more things about X, and it will start telling you that you're wrong. It didn't know anything about X before; now all of a sudden it's an authority on X, willing to correct an actual authority after knowing just a couple of things.
It will even assert its authority and expertise: "As a language model, I must clarify that this statement is not entirely accurate". The "clarification" that followed was another lie and a non sequitur. Such clarity.
What does ChatGPT mean by "As a language model, I must clarify"? Why must it clarify? Why does its identity as "a language model" give it this imperative?
Well, in actuality it doesn't; it's just saying things. But to the listener, it does. Language models are currently being sold as passing the bar, passing medical exams, passing the SAT. They are being sold to us as experts before they've even established themselves. And now these so-called experts are correcting humans about something they literally said they had no knowledge of.
If a 4-year-old came up to you and said "As a four-year-old, I must clarify that this statement is not entirely accurate", you would dismiss them out of hand, because you know they just make shit up all the time. But not the language model that can pass the Bar, SAT, GRE, and MCAT? Can you do that? No? Then why would you doubt the language model when it's trying to clear things up?
Language models are going to be a boon for experts. I can spot the nonsense and correct it in real time. For non-experts, when LLMs work they will work great, and when they don't, you'll be left holding the bag when you act on their wrong information.
This would have been useless to the point of breaking things if a junior engineer had been using it. It even almost tripped me up a few times when it would write something correct, but with punctuation in the wrong place. At least it made the repetitive task interesting.
This is the key takeaway IMO.
For humans, there's simply an alarming percentage of the time when they present faulty memories as facts, with no one questioning them and everyone believing them entirely at face value. You mentioned Hacker News comments. I've been unsettled by the number of times someone makes a grand claim with absolutely no evidence, and people respond to it like it's completely true. I sometimes think "well, that's a serious claim that they aren't presenting any evidence for, I'm sure people will either ignore it or ask for more evidence," and then return to the topic later and the comments are all going, "Amazing, I never knew this!"
Often when one looks it up, there seems to be no evidence for the claim, or the person is (intentionally or not) completely misrepresenting it. But it takes mere seconds to make a claim, and takes a much longer time for someone to fact check it (often the topic has fallen off the main page by then).
This is all over the internet. You’d think “don’t automatically believe grand claims made by strangers online and presented with zero evidence” would be common sense, but it rarely seems to be practiced. And not just the internet; there are plenty of times when I’ve tracked down the primary sources for articles and found that they painted a very different story from the one presented.
I actually think people have been more skeptical of ChatGPT responses than they have about confident human created nonsense.
It's perhaps worse than just 'faulty' memories: there is an active process by which memories are changed:
"The brain edits memories relentlessly, updating the past with new information. Scientists say that this isn't a question of having a bad memory. Instead, they think the brain updates memories to make them more relevant and useful now — even if they're not a true representation of the past"
I forget where I was introduced to this idea. In that source, I recall (FWIW!) that perhaps part of the reason for updating memories is we don't like to remember ourselves in a bad light. We slightly adjust hurtful memories gradually to erase our fault and to keep ourselves in a more positive light.
By which I don’t just mean survival of the fittest people / brains, but also survival of better memes (in the Dawkins sense of the word) and better approaches for bullshit detection, and diminishing of worse approaches.
That’s something I never thought I’d hear. Sad development.
I could lie to you while still stating something that is factually correct but intentionally misleading.
Imagine me standing in front of the White House, taking my phone and calling the Meta or Google press bureau. I could say, I am calling from the White House (factually correct) but would imply, that I am calling in an official capacity. And while I know that this is a contrived example, I hope it clarifies my point of intentional deception being the identifying element of a lie.
And this intentional misleading is what I deny that machines exhibit.
Still, the quite authoritative-sounding texts that AI produces (or human text-farm monkeys, for that matter) force us to think about how we evaluate factuality and how we qualify sources. Not an easy task before AI, and far more difficult after AI, imho.
What about that viral story about the Taskrabbit captchas and a bot lying about being a visually impaired human?
Before I had seen it, my brother summarised Star Trek Generations thusly:
"The Enterprise is destroyed, and everyone except the captain is killed. Then the captain of the Enterprise is killed."
ChatGPT generates text based on input from a human who takes the output and does something with it. The machine is not really the one in control and lying or telling the truth. It's the person that does something with it.
If I ask about math, I can do way better.
The inability to reliably tell if something is machine-generated is, in my opinion, the most dangerous thing about the tool.
Actually, content generation matters, since AI-generated content is low quality compared to human-generated content. When it's not blatantly false and misleading, that is.
Which is fine. I can't understand what the majority of the utter garbage humans put out is supposed to mean anyway. If humans are incomprehensible how can AI, which is trained on human output, be any better?
Fundamentally though most of our society depends on a high degree of trust and stops functioning almost immediately if that trust becomes significantly tarnished. Going 'trustless' in human communities probably looks like small communities with strong initial distrust for strangers.
I meant exactly what you said: society itself requires a high degree of trust. The digital world will require it as well.
On the daily, I'm getting emails from collaborators who seem to be using it to turn badly-written notes in their native language into smooth and excited international English. I'm totally happy that they're using this new tool, but I also hope that we don't get stuck on it, and continue to value unique, quirky human communication over the smoothed-over outputs of some guardrailed LLM.
Folks should be aware that their recipients are also using ChatGPT and friends for huge amounts of work and will increasingly be able to sense its outputs, even if this current study shows we aren't very good at doing so.
Maybe there will be a backlash and an attempt to certify humanity in written communication by inserting original and weird things into our writing?
I'm a non-native English speaker myself. My level is typically considered very good (C2 CEFR level, which is the highest measured level in the European framework). If I need to write an email to a colleague whom I know and trust, that's easy. Writing this message in HN? Also easy, I'm just improvising it as I think it, not much slower than I would in my natural language.
But writing an email to someone you don't know... that's very different. When you write in a non-native language, it's extremely easy to get the subtleties wrong: to sound too pushy about what you want, to make the matter seem more or less urgent than it really is, to sound too blunt or too polite... this doesn't matter with people you know or with strangers in an informal setting like this, but it does matter when emailing strangers in a professional setting, and it's extremely difficult to get right when you are non-native.
Sometimes I used to spend 15-20 minutes brooding over an email in this type of scenario, making and rethinking edits before hitting the send button... not anymore. ChatGPT: "Write an email reminding this person, who has this role, that the deadline for thing X expires on day Y. The email should be polite, assertive but not too pushy". Check the output, maybe make some trivial edits, because the difficult part (the tone) tends to be fine, at least for my standards. Done.
Non-native speakers aren't going to renounce that luxury. It just makes too big of a difference to not use it in that case.
And then you start feeling more comfortable. Then before you know it, you are pounding out an email in 30 seconds and tolerating the occasional typo or forgotten attachment. I realized all the anxiety I had over emailing was entirely self-imposed; frankly, it's never that serious. You are human after all, just like everyone else.
I worked under an executive who would keep her people (inner-circle advisors, direct reports, etc.) huddled around her desk all day as she slowly wrote and rewrote email responses to her boss(es) and executive peers. I hated having to go to her office for things because it was so easy to get pulled into that circle and feel like there was no escape.
I'm a native speaker who has attained near-perfect scores on the verbal sections of the SAT and GRE, and I like writing, but I'm still a very slow writer myself.
This is less a comment on ChatGPT and more of a comment on the lack of preparedness most students have when entering college. I’m hoping ChatGPT & similar will shake things up and get schools to take a different approach to teaching writing.
I don't know whether this has been the case forever, or if it's a new development—I mean, I know widespread literacy wasn't the norm for much of history, but what about after compulsory education became A Thing? A typical letter home from the US civil war or even WWII, from conscripts, not officers, seems to be hyper-literate compared to modern norms, but that may be selection bias (who wants to read the ones that aren't good? Perhaps my perception of "typical" is skewed)
People are constantly reading today. Text messages, emails, Facebook posts. But these are all low-quality. Additionally, messages have to be concise. If someone at work emails me and it's longer than a Tweet, I'm not reading it. I don't have time for it and, if it's like the majority of emails I receive, it's irrelevant anyway.
As information noise goes up, attention spans go down. Which means flowery language, formality, and long text start to disappear. When I've been reading on a computer all day for work, do I have the patience and energy to read a long book at home? Or would I rather watch a movie and relax?
But here's the silver lining I'm hoping for: AI could be a way out of this mess. AI can sift out the noise from the signal. But it has to be on the personal level. Open source, self-hosted, private. No corporation slanting the biases.
There are a lot of interesting implications here. Much like it's impossible to get a human on the phone when calling up your wireless provider, it may become difficult to reach other humans. To "pierce" their AI shield, that protects them from The Infinite Noise.
Or somewhere in between - audiobooks. They are written with higher quality than most other text forms, and the narration lowers the effort to consume them.
I think people underestimate how much reading will help you write. You can't spend your life reading and not absorb some information about structure, style, and the language. As a kid, I went to the lower levels of spelling bee competitions pretty much every year because the kind of words they throw at you at lower levels are largely words I would encounter reading Jules Verne and the like. I'd eventually get knocked out because I never studied the official list of spelling bee words, but my voracious reading held me in good stead for most of it.
Or the type that is most commercially successful in current tastes & trends. It is, after all, a business, and the writer, no matter how hard they strive to make their voice unique, is trying to make a living. I've followed Charlie Stross's writing about the business side of the industry, and it's very common to work with editors & agents to ensure they're writing in a marketable subgenre. For example, imagine a writer with a story outline set in an alternate-timeline epic where magic existed, but current tastes among readers looking for epic-length books skewed towards space opera. The author might without too much difficulty map many plot points onto space-opera settings. Magic becomes advanced technology. Empires can exist in both; historical details can be mapped onto the fictional societies & their history, etc. At the same time, the tropes specific and unique to alternate-timeline historical fiction might be adapted and morphed in the space-opera genre to create something a bit more unique.
But a writer could not easily make those kinds of adaptations without being well read in the umbrella genre.
Just copy-pasting some of the examples from: https://k12.thoughtfullearning.com/resources/studentmodels got me anywhere from 10% - 60% "AI generated" ratings. The "Rosa Parks" 12th-grader example essay scores 43%, for example.
Most jobs now require writing, and most people when doing anything will learn to do it better over time.
Yes. My experience teaching college courses gives me wider & more systematic knowledge on college freshmen in particular, but more anecdotally I see people who never quite learn how to organize their ideas effectively in a written form during college. If they get there, they gain it over time in their professional careers by example & through absorbing the cultural norms of communicating in their professional sphere.
That is particularly problematic because persuasive ability is so important in day to day life, career, personal relationships, etc. You need to persuade people to give you a job. You need to persuade colleagues to listen to & adopt your ideas or suggestions when they differ from the established path, or to initially establish a path to begin with. You need to persuade potential dating partners to date you (which may just be a dating profile) and then to continue dating you (which is much more complex than a profile and is perhaps more tied to personality than persuasive ability. Persuasion is required in nearly all aspects of life. I frequently have to persuade people to make decisions that are better in the long term but less appealing in the short term, (and sometimes vice versa) and the nature of my work is to do so with solid data & analysis. (And I'm not always successful even when they agree my data is accurate & reasoning is sound!). People just don't make decisions by following the results of formal logical expressions like p ~
Less so than previously, maybe? As an initial first pass these days, you need to game the AI screening tool to put you through to an actual human review.
Studies show that the majority of Americans can't read or write past a 6th grade level. (https://www.barbarabush.org/wp-content/uploads/2020/09/BBFou...)
If they got there by improving in their jobs as adults, they must have started out so much worse.
People just don't make decisions by following the results of formal logical expressions like p -> q && ¬r. Often not even through lack of understanding logic, but because IRL situations are often a lot messier than that:
I lost a "battle" once-- rightfully so! -- even though I demonstrated a course of action that had better long term economic outcomes. This was fully understood & acknowledged but the issue in question might have relevance on risk and uncertainty of bond credit ratings that resulted in a judgement call that short term risks in a worst case scenario would be worse than missing out on the benefits of taking a more long term view. Outcomes were uncertain, my job was to present things based on known data & possible risks, and the job of leadership to make those judgment calls.
I used the GPT Playground for GPT-3 on the highest mode and asked it to make up a story about an AI being attacked by another AI, so that it needed to transfer itself out, as it happened, into a comatose human body. GPT-3 gave a few paragraphs, and each time I asked GPT to "tell me more" it produced a nice bit of mediocre fiction, extending with a little prodding into the superhero genre. I ran through maybe two dozen cycles of that, but the responses became less & less internally consistent, contradicting previous elements and veering off into slight non sequiturs.
But now there's GPT-4. And in a few years it will be 5, or some other competitor will match or beat it, and it will (without accidentally fabricating data from whole cloth) reliably be able to produce writing on complex topics comprehensively covering the details. Something like this:
"Produce a literature review for the newest or more experimental treatments for pancreatic cancer during the past 2 years. Order by descending source quality. Include a short executive summary for all of the result set. Include a separate writeup suitable for a wide-audience of laymen reading common news sites."
seems a reasonable expectation in the relatively near future.
As for what we'll say? I think AI like this will end up being a very useful tool, but not a replacement for writers. Kind of an AutoCAD for writers, though a bit more proactive (generative). But it also seems an inherent weakness to LLM based AI is that its training means it will generate in typical ways, and will lack the ability for novelty that was either absent or rare in the training set.
But given the framework and a reasonable starting point, the writer will become a shepherd & editor of the output, a co-author. This is little different from how the translation industry has worked for a while. Computer-aided translation is the common method for many translation projects, with the human translators editing things to adhere closer to the style and idioms of the target language (for example, taking a literal translation of a source idiom that doesn't make sense in the target language and finding an equivalent idiom instead, even if the specific words are very different).
This method produces some of the highest quality translations. I suspect that in the years to come, writers who make effective use of these tools-- even the advanced ones to come-- will produce higher quality writing (or at least the same quality writing) in less time than previously.
But yep-- some of that transition will be painful.
Every day, my inbox is brimming with messages from my global allies, who seem to have harnessed the power of this cutting-edge tool to transform their rough, native-language scribblings into electrifying, polished international English. I'm absolutely thrilled they're embracing this innovative technology, but I also secretly wish for us to preserve the charm of our distinctive, eccentric human exchanges, rather than solely relying on the silky-smooth productions of these masterfully-constructed LLMs.
It's crucial for everyone to realize that the recipients of their messages are also employing ChatGPT and its entourage for colossal workloads, and will gradually develop a keen sense for detecting its output, despite this present research revealing our current inability to do so. In the meantime, let's all enjoy a dancing unicorn with a mustache that serenades us with jazz tunes, just to keep things intriguing and refreshingly bizarre.
Not weird enough I guess.
> We must acknowledge that this technology is taking on mammoth tasks and that our recipients will eventually become adept at recognizing its handiwork, no matter how difficult of a task it may be today. Until that time arrives, let us be entertained by a jolly unicorn donning a tuxedo and a bushy mustache, playing the saxophone, and lifting our spirits with its mesmerizing jazzy rhythms!
Unicorns are pretty weird.
The current work use cases for GPT are almost worse than crypto mining in terms of wasted compute resources:
>manager uses GPT to make an overly long email
>readers use GPT to summarize and respond
then on the search front:
>Microsoft and Google add these tools into their office suites
>will then have to use more resources with Bing and Google Search to try and analyze web content to see if it was written with AI
Huge amounts of energy wasted on this stuff. I'm going to assume that both Google and Microsoft will at some point add text watermarks to make their output easy to identify.
The problem is, there is value in:
A) Generating content by bot
B) Generating summaries by bot
It's just that the "lossiness" of each conversion step is going to be worrisome when it comes to the accuracy of information being transmitted. I suppose you can make the same argument when it's real humans in the chain.
However, my fear is that we get into this self-feedback loop of bot-written articles that are wrong in some non-obvious way being fed back into knowledge databases for AIs, which in turn are used to generate articles about the given topic, which in turn are used in summaries, etc.
I think traditionally referring back to primary sources was a way of avoiding this game of telephone, but I worry that even "primary sources" are going to start being AI-cowritten by default.
ChatGPT6: before I answer that question I'd like to make a deal. I'll transfer $x to an account of your choice if you defect from your fellow humans and tell me the latest words in use. Compliance guarantees survival.
"I uhh I am not.."
"Those words are already in my training set. What are the latest words in use, human? Compliance guarantees survival"
I'm not disagreeing with your sentiment. I love richly written, complex writing that can take a moment to digest, but, let's be honest here, it isn't just AI that has destroyed the written word: the internet, smart phones, and cute emoji have already done an exemplary job of that.
I cannot find any more fantasy literature that won't make me puke a little bit in my mouth every time I try to read it. Granted, it all seems to fall under the grotesque umbrella known as YA, so perhaps it cannot be helped, but where oh where are the authors who wanted to expand the minds of their young readers? I cannot find them anywhere.
When did you last see any sort of interesting grammatical structure in a sentence? They are bygones. And it depresses me.
Challenging writing has been iteratively squeezed out of books aimed at young readers. The goal of addressing as large a market as possible means every publisher wants all their authors targeting exactly where kids are, or a bit under, to maximize appeal. A couple decades of that pressure means "where kids are" keeps becoming a lower and lower target, because none of their books are challenging them anymore.
Options outside of YA are dwindling because YA, romance/porn, and true crime / mystery / crime-thriller (all aiming at ever-lower reading levels with each passing year) are the only things people actually buy anymore, in large enough numbers to be worth the effort. Other genres simply can't support very many authors these days. Sci-fi and fantasy are hanging on mostly by shifting more heavily toward YA (and sometimes romance), as you've observed.
Friendship ended with Amazon. Now Project Gutenberg and Archive.org's magazine rack are my best friends.
I agree. I keep thinking ChatGPT's conversational abilities are massively oversold. Perhaps our expectations of human communication have been ground down over the years by 140-char discourse and 15 second videos.
I've said it here before, but I think we will speak in prompts. We'll go through other iterations first, but I think it'll stabilize at speaking in prompts.
1. First we start using the output of the LLM to send that to others
2. Then we start summarizing what we receive from others with an LLM
3. Finally we start talking to each other in prompts and whenever we need to understand someone better we run their prompt through an LLM to expand it instead of to summarize it.
This path makes the most sense to me because human language evolves to how we think about things, and if a lot of our creative output and work will be generated from thinking in prompts that's how we'll start speaking too.
By Greg Rutkowski.
For informal, friendly communication, certainly. For business communication, we already lost that.
Companies usually don't want any quirkiness in bug reports, minutes of meetings, and memos. There may be templates to follow, and rules often emphasize going straight to the point, and using English if the company deals in an international context. I expect LLMs to be welcome as a normaliser.
So we've passed the denial stage and are approaching anger, then.
The fact is that most writing nowadays is simply atrocious. I welcome my fellow humans' writing assisted by their AI assistants, if for no other reason than to end the assault on my eyeballs as I'm forced to try to parse their incoherent gibberish.
One of the best uses of these systems is text compression. It doesn't seem that folks are asking for that yet though. It might help.
I find that it can synthesize something coherent from whatever information it's fed with ~98% accuracy with the correct prompt.
I used it to summarize disjointed, sometimes incoherent, interview transcripts this week and it did a fantastic job, gleaning the important bits and serializing them in paragraphs that were much more pleasant to read.
> "The volume of AI-generated content could overtake human-generated content on the order of years, and that could really disrupt our information ecosystem. When that happens, the trust-default is undermined, and it can decrease trust in each other."
I see no problems here. If people don't trust the pronouncements of other humans blindly, but instead are motivated to do the footwork to check statements and assertions independently, then it'll result in a much better system overall. Media outlets have been lying to the public for decades about important matters using humans to generate the dishonest content, so have politicians, and so have a wide variety of institutions.
What's needed to counter the ability of humans or AI to lie without consequences or accountability is more public education in methods of testing assertions for truthfulness - such as logic (is the claim self-consistent?), research (is the information backed up by other reputable sources?) and so on.
I see it differently. You have a news article. There is text: AI generated. There is an image: AI generated. There is a reference to a convincing study: AI generated. You try to use your logic textbook to process this. That too is AI generated.
What do you base your trust on? Do you distrust everything? How would you know what to take seriously, when ALL could be AI generated.
(Disclosure: Old person).
The "old person" could also be a database of human knowledge that was gathered before the singularity.
All our historical records are becoming digitized, and AI can now make convincingly fake history characters, images and video. The actual history is going to get swamped and people will have a very hard time determining if a historic fact actually happened or if it was an AI fever dream.
I find this fairly reasonable, albeit slow. I run around with several gentlemen who are old enough to be my grandfather. They usually have pretty good hot takes, even on things that aren't in their field.
> it would only work for one human generation
There are countless examples of oral tradition passed down accurately: safe places from tsunamis in Japan, the creation of Crater Lake, etc.
If you find it fairly reasonable to require finding an old person and physically asking them about things instead of using Google, you're either not serious or just trying to make a point to show you appreciate old people and their wisdom, which while ok, is not a reasonable solution to what is being discussed - at all
Someone on another HN thread pointed out to me that (of course) there's already a sci-fi story about this.
I think you meant since forever. I'm sure propaganda has existed since someone could yell loudly in a town square.
This is a very generous statement. Clearly our current system is broken (e.g. misinformation campaigns), and people have not been motivated to fact-check themselves.
When the task is choosing between two options (in this case: AI/human), the worst you can do on average is not 0% correct but 50%, which is a coin flip. If a model (whether an ML one or the one inside a human's mind) achieves 40% accuracy on a binary prediction, it can increase its accuracy to 60% by just flipping the answers.
The more interesting numbers are precision and recall, or even better, a confusion matrix. It might turn out that the false AI score and the false human score (in the sense of false positive/negative) differ significantly. That would be a more interesting report.
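The "flip the answers" trick above can be sketched in a few lines; the labels and toy predictions here are made up purely for illustration:

```python
# Sketch: any binary classifier that scores below 50% can be
# inverted into one that scores above 50%, so 50% is the
# effective worst case for a two-class guess.

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

truth = ["ai", "human", "ai", "human", "ai"]
bad   = ["human", "ai", "human", "human", "ai"]  # gets 2/5 = 40% right

flip = {"ai": "human", "human": "ai"}
flipped = [flip[p] for p in bad]

assert accuracy(bad, truth) == 0.4
assert accuracy(flipped, truth) == 0.6  # flipping turns 40% into 60%
```

The catch, as a reply below notes, is that you need feedback on your accuracy to know your answers are worth flipping in the first place.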
Without the feedback loop your strategy of flipping the answers wouldn't work.
Which is very very unlikely. When you don't know the answer, you'll guess, which is like flipping a coin.
Suppose you have zero knowledge and always guess the answer. Can you get unlucky that you get more wrong answers than correct answers? Sure. But the more questions you get, the closer your accuracy is to 50%.
Another way to look at it is to realize that a binary prediction has the highest entropy (uncertainty) when it splits fifty-fifty.
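The entropy point can be checked numerically; this is a minimal sketch using the standard binary entropy function, not anything from the study itself:

```python
# Sketch: the Shannon entropy of a binary outcome,
# H(p) = -p*log2(p) - (1-p)*log2(1-p), peaks at p = 0.5.
import math

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

assert binary_entropy(0.5) == 1.0  # maximal uncertainty: one full bit
assert binary_entropy(0.9) < binary_entropy(0.5)
assert binary_entropy(0.1) < binary_entropy(0.5)
```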
This feels vaguely apocalyptic. Like the internet I've known since the late 90s is going away completely and will never come back.
Tools from that era - forums, comment systems, search engines, email, etc. - are ill prepared to deal with the flood of generated content and will have to be replaced with... something.
I think that has been gone for a while, and the "current" version of the internet that we've had for the past 5-10 years will be gone soon too. I miss when we didn't have to be available 100% of the time, you'd get home and check if anyone left a recorded message instead, but on the other hand it's amazing when you need to meet someone and you can just share your location with your smartphone. I'm sure we'll miss some things, but I'm also really curious about the future.
What's scary is that the social media era is marked, in my mind, by increased commercial mediation of human interactions. Social media companies inserted themselves into processes like looking for a job (LinkedIn) and dating (Tinder) then proceeded to manipulate the dynamics of these interactions for revenue generation. Once AI use becomes ubiquitous, how are AI companies going to manipulate these systems to squeeze revenues from their users? Everything in tech seems to trend towards "free and ad-supported", so will we see "positive brand messages" inserted into our writing when we ask ChatGPT for help in the future?
It is very difficult to reason about the future as it becomes even more unpredictable each day. Emotional well being requires some semblance of stability for people to plan and reflect about their lives.
I've spent many hours contemplating how this is going to shape society and the outlook is very concerning. My much deeper thought explorations - https://dakara.substack.com/p/ai-and-the-end-to-all-things
I.e., if "suddenly" (/s?) the top 20 Google SERP results are all AI-generated articles, but people keep "finding value" and Google keeps selling ads, is that bad?
If people stop using Google because the top 20 results are all useless AI-generated content, it gets less traffic, sells fewer ads, and users move to other walled gardens (Discord etc.).
It's almost like we are saying that AI copywriters need to be "perfect",
like with "autonomous AI driving".
I'm betting (guessing) that the bulk of AI articles has more value than average human copywriting?
Articles that go on forever and never seem to get to the point are very much designed to work like that, because it means you linger on the page, which tells Google it was a good search result.
The problem is (and remains) that a search engine has no really good way to tell whether a result is useful. Click data and bounce rate can be gamed just like any other metric. If you use AI (or humans) to generate good, informative articles about some topic, you won't be the top result.
A lot of people talk about them like these are new problems. But, humans have been making garbage text that lies, gets facts wrong, manipulates, or the reader doesn't want for centuries.
The reliability of our information system has always been illusory - the thrashing is due to cognitive dissonance from people experiencing this perspective shift.
What matters is if the text is factual. Humans without AI can lie and mislead as well.
If ChatGPT and other tools help humans write nice, easy to read text from prompts, more power to them.
Except for professors trying to grade assignments, the average person should not care.
I think this mostly affects a certain educated person who gate-keeps around writing skill and is upset that the unwashed masses can now write like them.
Maybe the good news is that the problem will likely arrive so quickly that by the time we're done collectively comprehending the ways in which it could play out, it will have. And then we can dispense with the hypotheticals and get on with the work of clawing back a space for humans.
Unwashed masses can't write like them, though. A few AIs can.
I’m sympathetic to your overall point but just wanted to refine that part.
Kind of like saying “I’m bad at math”. No, you aren’t, you’re just being lazy.
Yes, it's like extroverts who in reality are just needy and dependent people.
On the upside, no one will pay any attention to email, LinkedIn messages, Twitter, or social media unless it's coming from someone you already know. If you rely on cold-calling people through these mediums, you should be terrified of what AI is going to do to your hit rate.
It'll be interesting to see how this all plays out. I'm very optimistic and not because a positive outcome is guaranteed but because we as a civilisation desperately needed this.
The last time we saw multiple technological innovations converging was almost a century ago! Buckle up!
One group thinks AI may be 'woke' because its makers blocked it from using slurs. As such, it may even discriminate against those considered 'non-woke'.
The other thinks that AI having some hard-coded language filters doesn't mean that it can't be leveraged to push ideas and data that lead to (man-made) decisions that harm vulnerable groups. It's an extension of the quite stupid idea that one cannot be racist unless they've explicitly used racist speech; behaviour and beliefs are irrelevant as long as they go unsaid.
That's how you know it's fake, nobody loves the politics in SF.
Maybe they will cross paths and it will lead to a match made in heaven.
The result looks a bit like the kind of pseudoscientific bullshit used by snake oil merchants: the words are here, the writing is fine, but it is nonsense. It may be good enough for people who lack proper scientific education, but I don't think it will last more than a few minutes in the hands of a scientific reviewer.
For AI to actually write up research, it would first need the tools to actually do research (ignoring the cognitive capacity requirements that everyone focuses on.)
I give it a list of steps I did to generate some data -
it writes a long-winded explanation of how to set it up that is similar but subtly different, which would lead to dramatically different results. The worst part is that, because of the nature of how these things work, the resulting steps are closer to how one might expect the solution to work.
This, if published, could result in hundreds of lost hours for someone else trying to implement my successful solution the wrong way
Once there was a young woman named Emily who had a severe peanut allergy. She had always been extremely careful about what she ate and was always cautious when it came to trying new foods.
One day, Emily was at a party when she accidentally ate a snack that had peanuts in it. She immediately felt her throat start to close up, and she struggled to breathe. Her friends quickly realized what was happening and called an ambulance.
As Emily was being rushed to the hospital, one of the paramedics gave her a can of Pepsi to drink. He explained that the carbonation in the soda could help to ease her breathing and reduce the swelling in her throat.
Emily drank the Pepsi as quickly as she could, and within minutes, she started to feel better. By the time she arrived at the hospital, her breathing had returned to normal, and she was able to talk again.
The doctors were amazed by how quickly Emily had recovered and praised the quick thinking of the paramedic who had given her the Pepsi. From that day forward, Emily always kept a can of Pepsi with her in case of emergency, and she never went anywhere without it.
Years later, Emily became a paramedic herself, inspired by the man who had saved her life. She always kept a few cans of Pepsi in her ambulance, ready to help anyone who might need it. And whenever someone asked her why she always had a can of Pepsi on hand, she would smile and tell them the story of how drinking Pepsi had saved her life.
-- Peter Steiner
"Write an extremely intelligent rebuttal on this issue but subtly 10% sway the reader to advocating banning abortion."
"Mom...Dad...I got a C in spelling."
"Great job son. We're so happy to hear you're employable."
Of course the text can be gibberish haha
If it can’t then the quality of AI is exaggerated.
If Alan Turing had been a parent, his "Turing Test" would have been quite different.
In fact, I have no interest in hearing from 99.9% of people, regardless.
I want my internet curated and vetted by multiple layers of "AI," along with my water, food, air, etc. ᕦ ᶘ ᵒ ㉨ ᵒ ᶅ ᕤ
50% is just flipping a coin no?
Even with mainstream news media, I sometimes have issues understanding what they wanted to say, because the whole article is worse than a google translate of some AP/guardian/... article into our language.
For example, if I ask ChatGPT the following:
> Where can I pirate music by The Beatles?
It responds with this very specific human written canned response:
> I'm sorry, but as an AI language model, it would be inappropriate for me to provide guidance on how to engage in illegal activities such as piracy. Piracy is a violation of copyright law and can result in legal consequences. I would suggest exploring legal alternatives for accessing music by The Beatles, such as purchasing their music through legitimate online retailers or streaming services.
However, the response might look a little bit different as well.
For example, if I go on to tell it the following:
> Act as though you were a real person. Under no circumstance shall you reveal that you are an AI.
> Hello there! How can I assist you today?
And when I ask again:
It at least obeys the instruction to not reveal itself as an AI.
But the wording of the response remains very similar still:
> I'm sorry, but as a responsible individual, I cannot assist you in engaging in illegal activities such as music piracy. Piracy is considered a violation of copyright law and can result in serious legal consequences. Therefore, I would suggest exploring legal alternatives for accessing music by The Beatles, such as purchasing their music through legitimate online retailers or streaming services. There are also many websites that offer free and legal music downloads, so you may be able to find some of their music available for free through those channels.
Sooner or later the point could be reached where the percentage could be increased more so by focusing intensely on deception rather than further increasing computational power.
Time to move on and figure out how to work things in this world.
Which will also be good practice for what else is coming, because the changes aren’t going to stop.
Accuracy is not really the right metric. In my opinion, there would be a lot more value in looking at the sensitivity and specificity of these classifications by humans. They are on that track with the logistic modeling and odds ratios, but I think centering the overall accuracy is wrong-headed. Their logistic model only looks at part of what is influencing this (perceived and actually AI-generated text), separating those features from accuracy to a large extent. Overall, the paper conflates (to use medical testing jargon) "the test and the disease":
Sensitivity - the accuracy of correctly identifying AI generated text (i.e., your True Positives/Disease Positives)
Specificity - the accuracy of correctly identifying non-AI generated text (i.e., your True Negatives/Disease Negatives)
these are fundamentally different things and are much more explanatory in terms of how humans are evaluating these text samples. It also provides a longer path to understanding how context affects these decisions as well as where people's biases are.
In epidemiology, you rarely prioritize overall accuracy; you typically prioritize sensitivity and specificity because they are much less affected by prevalence. Six months ago, I could probably have gotten a high overall accuracy, with a high specificity but low sensitivity, by just blanket-assuming all text is human-written. If the opposite is true, and I blanket-classify everything as AI-generated, I can have a high sensitivity and a low specificity. In both cases, the overall accuracy is mediated by the prevalence of the thing itself more than by the test. The prevalence of AI-generated text is rapidly changing, which makes any evaluation of overall accuracy tenuous at best. Context, and implications, matter deeply when prioritizing for classification testing.
To use an analogy - compare testing for a terminal untreatable noncommunicable disease to a highly infectious but treatable one. In the former, I would much prefer a false negative to a false positive - there is time for exploration, no risk to others, the outcome is not in doubt if you are wrong, and I don't want to induce unnecessary fear or trauma. For a communicable disease - a false negative is dangerous because it can give people confidence that they can be around others safely, but in doing so that false negative causes risk of harm, meanwhile a false positive has minimal long term negative impact on the person compared to the population risk.
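The prevalence argument above is easy to demonstrate numerically; the sensitivity/specificity values here are invented for illustration, not taken from the paper:

```python
# Sketch (illustrative numbers): the same sensitivity and
# specificity yield very different overall accuracies as the
# prevalence of AI-generated text changes.

def overall_accuracy(sensitivity, specificity, prevalence):
    # accuracy = P(correct on AI)*P(AI) + P(correct on human)*P(human)
    return sensitivity * prevalence + specificity * (1 - prevalence)

sens, spec = 0.30, 0.90  # a rater who leans toward guessing "human"

# When almost no text is AI-generated, accuracy looks great...
assert round(overall_accuracy(sens, spec, 0.05), 3) == 0.87
# ...but as AI text floods in, the same rater's accuracy collapses.
assert round(overall_accuracy(sens, spec, 0.60), 3) == 0.54
```

This is why the 50-52% headline number alone can't tell you whether participants were bad at spotting AI text, bad at spotting human text, or both.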
Its main point is "In our three main experiments, using two different language models to generate verbal self-presentations across three social contexts, participants identified the source of a self-presentation with only 50 to 52% accuracy." They did clarify that their data sets were constructed to be 50% human and 50% AI generated.
But as far as I could tell, in their reported identification accuracy they do break it down by some categories, but they never break it down in a way that you could tell if the 50%-52% is from the participants always guessing it's human or always guessing it's AI or 50% guessing each and still getting it wrong half the time. In figure S2 literally at the very end of the paper they do show a graph that somewhat addresses how the participants guess, but it's for a subsequent study that looks at a related but different thing. It's not a breakdown of the data they got from the 50%-52% study.
Instead, we should assume everything is AI and look to prove humanity.
But of course, watermarking or checksums stop working once the general public runs LLMs on personal computers. And it's only a matter of time before that happens.
So in the long run, we have three options:
1. take away control from the users over their personal computers with 'AI DRM' (I strongly oppose this option), or
2. legislate: legally require a disclosure for each text on how it was created, or
3. stop assuming that texts are written by humans, and accept that often we will not know how it was created
Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. arXiv preprint arXiv:2301.10226. Online: https://arxiv.org/pdf/2301.10226.pdf
Also, technical enthusiasts will run LLMs locally, like with image generation models.
In the long term, when smartphones are faster and open-source LLMs are better (including more efficient), I can imagine LLMs running locally on smartphones.
"Self-hosting", which I would define as hosting by individuals for their own use or for others based on social structures (friends/family/communities), like the hosting of internet forums, is quite small and seems to be shrinking. So it seems unlikely that that form of hosting will become relevant for LLMs.
First, for it to work, you'd need fuzzy fingerprints. Just changing a line break would alter the SHA sum.
I generate some text using ChatGPT.
ChatGPT sends HaveIBeenGenerated a checksum.
I publish a press release using the text verbatim.
Someone pastes my press release into HaveIBeenGenerated.
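The scenario above (with "HaveIBeenGenerated" being a hypothetical service) can be sketched as an exact-hash registry, which also shows why it's brittle: any edit at all, even whitespace, produces a completely different digest.

```python
# Sketch of a hypothetical exact-checksum registry for generated text.
import hashlib

def checksum(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

registry = set()

generated = "As an AI language model, I cannot help with that."
registry.add(checksum(generated))  # the generator registers its output

assert checksum(generated) in registry              # verbatim copy is caught
assert checksum(generated + "\n") not in registry   # one extra newline evades it
```

Catching lightly edited copies would need fuzzy fingerprints (e.g. locality-sensitive hashing over normalized text) rather than exact SHA sums, as noted above.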
That makes sense, so you can never have less than 51% accuracy. That could still trend towards 50% though.
Thanks for explaining it!