AI Submissions to Clarkesworld Continue (neil-clarke.com)
65 points by em-bee on May 19, 2023 | 58 comments



I know that revealing the features they use to spot AI submissions would just make their job harder, but I can't help but be curious what sort of features they are looking for. I frequently use ChatGPT to proofread things I write, and I find that in order to avoid "sounding like ChatGPT" it's best to just ask the AI "Is what I wrote clear?" I find if I focus on clarity, the suggestions it makes are pretty basic and unlikely to make my text sound robotic (e.g. fix this misspelling, define this term). If I ask the AI for suggestions on tone, I find it pushes my text too close to a generic "PR friendly" writing style that does not feel personal to me.


I'd imagine one metric is "bland subject matter with impeccable grammar"

Like, how can you write so well but not know how to imbue any of the character's personality into the writing?


I feel as though I am being personally attacked because your description resonates with aspects of my self-image with which I am uncomfortable.

However, I understand that you do not know me and that you are describing a hypothetical person who writes carefully about bland subjects.

Overall, I am sure that you are correct and that this is one metric used for identifying AI-generated content.

***

Seriously, though, I feel like the words "however" and "overall" will be tainted for years.


I got accused (more as a joke, but c’mon) by my boss the other day of using GPT to write a document. His justification: the grammar and spelling were too good. I had no idea that good writing was now a “tell” for generated content for some. I just write good, honest!


I could understand grammar to some extent, but how do you mess up spelling when you have spellcheck?


Its grate too no eye spiel write, four baht worse git whey read line unto hem.

(It's great to know I spell right, for bad words get a red line under them).


But how do you automate that?
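
Purely as a sketch (the thresholds and the word list are invented), one naive, easily fooled heuristic would be to flag text that is both free of spelling errors and very uniform in sentence length:

    import statistics

    def looks_generated(text, known_words):
        # Split into sentences very crudely.
        sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                     if s.strip()]
        lengths = [len(s.split()) for s in sentences]
        # Human prose tends to vary sentence length ("burstiness");
        # very uniform sentences plus spotless spelling is one weak signal.
        uniform = len(lengths) > 2 and statistics.stdev(lengths) < 3
        words = [w.strip(".,;:'\"").lower() for w in text.split()]
        no_typos = all(w in known_words for w in words if w.isalpha())
        return uniform and no_typos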


That could catch ChatGPT but not LLaMA, which adopts personas almost too well.


That would not necessarily ban AI generated content, only bad storytelling.


This reminds me of the legend of the Tower of Babel. God cursed humanity, rendering them unable to understand each other.

Except in this case it's only "anonymous humanity" that's screwed. And it isn't that we can't understand but that we have lost all trust.

Is it a human with something to say or is it a robot playing a game?


People still think about it in quaint ways.

Spammers are a minor annoyance. I have one word for you: swarms.

Swarms of sleeper accounts can be deployed with existing technology. They can astroturf revolutions, rile people up against others, make reputational attacks at scale, and, perhaps most destructively, they can overwhelm anyone with a different viewpoint with a 10-to-1 content ratio (e.g. I could deploy 20 bots on HN to downvote and argue with the people who disagree with me ad nauseam).

As you can imagine, forums could use “grandfathered” accounts to sort of mitigate this. But when regular people start preferring bots, it is game over.

You think it is hard for bots to endear themselves more than humans can? Karma / likes / upvotes are measurable and gameable metrics, and bots can excel at them better than humans. Not to mention, if you own a dating site you have access to all the messages that have ever worked to get a date or get a laugh. On HN, anyone can analyze the type of messages that got the most upvotes.

We keep analyzing one bot among 8 humans at a poker table. No one seriously considers the exponential jump to 1000 bots surrounding every human.

At best, we are building a zoo for ourselves. But we can’t stop!


What are your thoughts on requiring payment to participate as a way to reduce spam? I realize that in this particular context payment could prevent participation for some; however, are there other reasons why payment would be ineffective?


The first reason that jumps to mind is that cost is not a factor for everyone. For almost any given price level, there are actors for whom the cost is still worth it.

Some examples and their given cutoff point, at which the payments are no longer worth it:

Lower cutoff:

- Marketers selling a product -> when customer acquisition costs more than the product margin, or investors decide the growth isn't worth it

Medium cutoff:

- Private organizations pushing an ideological viewpoint -> when the private money runs out; tolerance is higher here because profit isn't the point, and actors are more likely to consider pushing the viewpoint worth the money even without a direct profit return

High cutoff:

- State actors working against either other states or to squash internal resistance/political opponents -> virtually unlimited depending on how rich the country is

Requiring payment to prevent spam makes sense in narrow use cases, like Steam's $100 fee per game preventing the worst of the flood of cheap garbage. However, I don't think there is any price level that can "clean" general human communication online.
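
To make the cutoff idea concrete, here's a toy expected-value calculation (every number is invented): a rational spammer keeps paying a fee only while the expected payout per submission exceeds it, so any fixed fee deters some actors and is noise to others.

    def still_worth_it(fee, acceptance_rate, payout_per_acceptance):
        # Expected profit per submission for a purely mercenary actor.
        return acceptance_rate * payout_per_acceptance - fee > 0

    print(still_worth_it(fee=5.00, acceptance_rate=1 / 5000,
                         payout_per_acceptance=250))        # False: deters a get-rich-quick spammer
    print(still_worth_it(fee=5.00, acceptance_rate=1 / 5000,
                         payout_per_acceptance=1_000_000))  # True: to a state actor the fee is noise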


Now, our vision with Intercoin is that communities require payment in their own currency and require badges earned by attending their events.

https://intercoin.org/communities.pdf

Much harder to game at scale


This seems like it could work in some way, but it's important to recognize that spam and LLM usage already require payment in some form or another. There's a cost to sending out millions of emails, just like there will be a cost to generating millions of LLM-generated posts. So with that framing, it's obvious that spammers, using AI or otherwise, are willing to expend capital to produce spam. There may be a threshold of cost they won't be willing to exceed, but there's no principle here which can be applied universally.


That was always required for sybil attacks. But if the long-term benefit exceeds the cost of membership, it will be paid. Think of a conqueror renting an army for a short time to pillage and gain very long-term control of a village.

I am in a good position to speak about this since I designed decentralized social platforms like https://qbix.com and smart contract platforms like https://intercoin.app that build coins for communities.

If anyone can make an account, your community bears unlimited cost dealing with it. It could be a sleeper account acting like a human, until one day the cost starts getting imposed in the form of, say, coordinated swarm vandalism:

https://en.m.wikipedia.org/wiki/Wikipedia:Wikiality_and_Othe...

But it could be a LOT more subtle and plant the seeds to move opinion of a forum gradually towards anything you want, including simply accepting and upvoting articles from your domains where your bots create content.

If you just mint a coin and give it out for basic tasks, people will take advantage of that by creating many accounts. You have to start asking people to appear in person to get a certificate at least, which is what Sam Altman’s Worldcoin is about. But you don’t need to go that far; every community can just have in-person get-togethers once in a while to say hi to each other (“shocker!”)

I foresee a lot of people starting to retreat to gated communities. The thing is that anyone can invite a bot in. If you give everyone N invites, and the people they invite N invites, they’ll eventually use them to invite N to the power X people. And those people might be real, but eventually may run a bot once in a while. You may think that’s not likely, but actually it has already been happening; people love to run automation on their own behalf, and have no idea what it does, exposing themselves to compromises and attacks at scale:

  packages in package managers
  NYC landlords pricing apps
  wall street trading bots
I wrote about it here: https://magarshak.com/blog/?p=385
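
To put rough numbers on the invite-tree point above (N and X are invented; just a toy illustration):

    N, X = 5, 6  # 5 invites per person, 6 generations deep
    total = sum(N**g for g in range(1, X + 1))
    print(total)  # 19530 accounts reachable from a single seed account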

But as I say, the real trouble begins when people start to prefer bots. They choose convenience over security and truth. Bots won’t have resistance to an agenda and a lie like humans will.

I guess our only hope is to dilute the bot swarms with opposing bot swarms, kind of like Bitcoin miners competing. But at that point the entire internet will be a DARK FOREST.


Sam Kriss has a pretty wild post about this, with a running metaphor of LLMs as golems: https://samkriss.substack.com/p/the-cacophony


Thanks. I'll read that.

I was thinking demons myself. Being, above all, liars.

A quadrillion synthetic liars. Breeding and speaking at the speed of light.


Just wait, most of the people interested in using these tools for spam haven't gotten it figured out yet. They are working hard though, and once they figure it out it will be more prolific than human traffic by orders of magnitude.


The "dead Internet theory" finally becomes true?: https://en.wikipedia.org/wiki/Dead_Internet_theory


> Before anyone does the “but the quality” song and dance number, none of those works had any chance at publication, even if they weren’t in violation of our guidelines.

So what's the point in even trying this? Why would you waste their time with something that has no chance at being accepted? I don't understand what the motive is for these AI submissions in the first place.


There's an industry of get-rich-quick grifters who are telling people they can make $1000/day by submitting AI generated slurry to Kindle, Audible, or anything else that pays for writing. It's an evolution of previous schemes which used ghostwriting services to produce similar low-quality content, but of course the scam has pivoted to AI now.

It's unlikely that many of the people submitting these works are actually making much if any money, the real winners are the grifters who sold them the expensive course explaining how to do it, but the people who fall for the scam are going to try their hardest to get the ROI they were promised. Which means spraying AI diarrhea everywhere in the hope it sticks to something.


Because any payoff would be worth the effort. It's the same as people spamming bug bounty programs by using publicly available scanning tools.

All it takes is one success anywhere for people to keep trying to spam every possible endpoint. It's why people stand at street corners with signs. It's also why people gamble at casinos. It's why loot boxes sell so well.

Humans can not resist the urge to continuously throw effort at things with rare, intermittent rewards. Cost/reward is not taken into account, only the reward. Sadly, belief in a reward is enough to trigger this. There never needs to be any actual reward.


    "You miss 100% of the shots you don't take."
        -- Wayne Gretzky
-- person who made a GPT-generated submission to Clarkesworld, probably


It's part of the GPT grift on social media. YouTubers peddle these dumb ideas for making money with ChatGPT, and don't actually use those ideas to make money themselves.

Their viewers try and fail.


A variety of influencers in developing countries tell their audiences that they can make money by submitting AI-generated stories to sites with open submissions.

The people submitting don't know that there's virtually zero chance of acceptance and are trying to get the payment for a published story, which is a substantial amount in their country. The influencers probably know that the scheme won't work, but don't care because they just need to keep their audience watching.


I've received more than 100,000 spam mails in the last two decades. They have been unsuccessful. Unfortunately, they keep coming.


The problem is low-effort/low-gain spammers.

They DGAF; they push their low-effort scam, and if it works, it works.

(I really wonder if there's some geographical bias in the submissions here, especially given his focus on not alienating marginalized submitters, though I'm not sure how long he'll be able to keep that up.)


What I don't understand is why spammers think this is a good idea. It's not like the SF short story market is flush with cash and it will make them rich.

Personally, I think the spam should be flagged and sent to a chatbot that engages in a series of Lenny-style interactions where it makes requests for information, promises eventual acceptance, and issues a series of increasingly bizarre rewrite requests (cut 100 words, expand your descriptions, more dialog, less dialog, make it from the dog's point of view, remove the dog character, sign this pre-contract information sheet, mail in 3 paper copies including a copy in braille for our blind copy editor).
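
Purely for fun, a minimal sketch of that tarpit (the canned requests and wording are invented):

    import itertools

    REWRITE_REQUESTS = [
        "Please cut 100 words.",
        "Please expand your descriptions.",
        "More dialog.",
        "Less dialog.",
        "Rewrite it from the dog's point of view.",
        "Remove the dog character.",
        "Sign the attached pre-contract information sheet.",
        "Mail in 3 paper copies, including one in braille for our blind copy editor.",
    ]

    def tarpit_replies():
        # Endlessly string a flagged submitter along with escalating requests.
        yield "Thank you! We are very excited about this piece. A few notes first:"
        for request in itertools.cycle(REWRITE_REQUESTS):
            yield request + " We expect to accept it after this one revision."

    replies = tarpit_replies()
    for _ in range(3):
        print(next(replies))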

Then publish a list of people who have been banned in this manner for public shaming. These are not aspiring writers looking for a break. They are scammers and spammers, and their electronic spam needs to be filtered out.


> These are not aspiring writers looking for a break

right, as neil wrote earlier here: https://clarkesworldmagazine.com/clarke_04_23/

"We soon discovered, however, that these submissions were coming from outside the science fiction and fantasy community"


I watched an interview the other evening: Woodward & Bernstein, interviewed by Amol Rajan of the BBC (he's a horrible interviewer).

As his last question, he read out a piece of prose about W&B; it was bland, uncritical, even sycophantic. It was also factually accurate. Then he told them it was ChatGPT output, for a prompt along the lines of "Write a report about W&B, focusing on Watergate and the influence they've had on other journalists".

(The actual question's irrelevant, and the answers were garbled, partly because Rajan made such a mess of the interview.)

I have no idea how you'd make an effective spam filter to catch GPT output. It seems like an impossible task.


Gathering details to make sure I understand correctly:

- Amol from BBC interviewed Bob Woodward[^1^].

- As the last question, Amol asked a "bland, uncritical, even sycophantic" question that was from ChatGPT.

My confusion peaks around here:

> (The actual question's irrelevant, and the answers were garbled, partly because Rajan made such a mess of the interview.)

From that, I gather:

- there was one ChatGPT question, multiple people tried answering it.

- the answers to this question weren't great.

- however, the poor answer quality is not solely attributable to the answerers, as in your opinion (stated early and often!) the interviewer is bad.

Given those bits, it does sound to me like a detector is possible, if only because you detected it, and the author seems to be able to detect it, and I think, oddly, I can detect it.

There's something "off" about the text, at least from 3.5, that gives me short-term hope.

[^1^] not Bernstein, presumably, since Bernstein died in 1990 and therefore could not be asked to answer a question from ChatGPT


> - Amol from BBC interviewed Bob Woodward[^1^].

And Bernstein; rumours of his death are much exaggerated. The pair were in London for the Ivor Novello awards.

There was a "report" by ChatGPT, followed by a vague question about the impact of AI text-generators on reporting. ChatGPT didn't ask questions.

The interviewer was evidently in awe of these two men; his sycophantic remarks punctuated the interview, and left the interviewees with not much to say. They also took away momentum from the interview.

I didn't detect it; I would have guessed that it was from a Wikipedia article. WP articles about living people are generally non-critical and uncontroversial. The interviewer said (after reading it out) that it was ChatGPT. The author detected it presumably because he prompted ChatGPT to produce the report himself.

I'm sorry that my comment confused you so much! But if you were convinced that Bernstein was dead, I'm not surprised that you were confused.


Omg!!! I even “double checked”, like I could have sworn I saw a death date on his Wikipedia. Thanks, yeah, that solved everything lol. Cheers


In the very near future, every HTML form submission is going to require a market-rate fee to be processed, precisely to deal with spam. I'm surprised it's taken this long actually.


Even a token fee, refundable for proven works of human authorship, would take care of most spam. The dogmatic response to these and similar suggestions is “under no circumstances will authors ever have to pay to publish,” often citing situations such as authors who are too poor to pay a token fee or who don’t have access to online payments.
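
A minimal sketch of the bookkeeping such a refundable deposit would need (names and amounts invented; the payment-processing side is the hard part):

    deposits = {}

    def submit(author, fee=5.00):
        # Hold the token fee in escrow at submission time.
        deposits[author] = deposits.get(author, 0.0) + fee

    def resolve(author, judged_human):
        # Refund the deposit for works judged human-authored; keep it
        # otherwise, so spam carries a running cost while honest
        # authors pay nothing net.
        return deposits.pop(author, 0.0) if judged_human else 0.0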


Any infrastructure designed to accept fees will be used for verifying stolen credit cards. You need to have countermeasures implemented for this. (Or whoever you use will need to.)

Also, a high refund rate will nuke your merchant account with most processors. No one wants to deal with this business model.


This is a solved problem: 3-D Secure


i disagree that the rejection of pay systems is dogmatic. clarkesworld has already published authors who could not make such payments, so clearly any payment system would have prevented those stories from reaching us and we would have missed out.

you may argue that is worth the price, but i would disagree. a submission system that is not inclusive and prevents people from submitting because of material circumstances would simply be unfair and for me reduce the appeal of the magazine.


I'm old enough to remember when submissions to SFF magazines had to be printed out and submitted by postal mail. There was a real cost to doing that - $5 to $10 depending on the size of the submission and the postal service used. No one complained that it was not fair or not inclusive; it was a legitimate, unavoidable cost of submitting original works to the magazine.


sure, no one complained, because the magazines never even had the reach that we are talking about now.

in particular it was more difficult to submit for anyone overseas, even more so from developing countries. but this is where clarkesworld is inviting submissions from, and it could not do that if writers there had to pay.


Maybe we will finally get a robust ecosystem of micropayments on the web


I foresee a boom in the proofreading industry.

Proofreading these AI excretions. For veracity, productizability. Picking painstakingly through walls of synthetic bullshit.

It will be a new height in excruciating labor. Interesting new mental diseases will arise.


Why would AI be better at spamming these forms than today's spam bots?


Archive of Our Own is having a similar mini crisis... And a related issue, where authors are unhappy about their work being scraped.


Since there’s no access to the ground truth here—including for Clarke—all we (or he) really know, even if this is completely honest from Clarke’s perspective, is that there is an increase in aggregate submissions and more authors are being banned for subjective reasons that Clarke believes correlate to being AI generated.

The timing may be fairly strong evidence that the source of the surge is generative AI, but whether the bans correspond to actual AI use at a rate significantly better than random chance is supposition. For all anyone (except the individual submitters) knows, there is just as much generative AI use in the ones getting accepted as in those being rejected.


With respect, I don't think anyone who writes publishable SF believes that generative AI on its own can write to Clarkesworld publishable quality.


Even if no AI-generated works are getting through, without access to the ground truth and empirical evidence that the “AI detection” heuristics + manual review are actually rejecting AI exclusively (or at least at a rate higher than the rate at which AI works are represented among the subset of submissions below publishable quality), what this actually is, is an arbitrary addition of lifetime bans for failing to meet arbitrary quality standards with a single submission, masquerading as AI detection.

Which is perfectly within Clarke’s rights, before the people making that argument jump in, but let’s not pretend it’s anything other than what it is. Clarkesworld doesn’t have a magical generative-AI-detecting ability that the solutions tested with access to the ground truth (which, yes, still suck pretty badly) lack.


we don't know what the criteria are, and we have to take their word for it that there are some obvious markers that something is AI generated. as neil explains in the comments below, the criteria include many indicators that go beyond the story itself, so it is most certainly not just quality standards.

also not having access to the criteria does not make them arbitrary. i trust that the criteria are well reasoned.


as neil wrote earlier here: https://clarkesworldmagazine.com/clarke_04_23/

"We soon discovered, however, that these submissions were coming from outside the science fiction and fantasy community"

so with the exception of new authors that are unknown in the community, most submissions are from known authors and certainly not AI generated.

there would have to be a significant number of submissions from new authors if AI was capable of creating quality works. since that is not the case we can conclude that AI is not (yet) capable of that.


> The one thing that is presently missing from the equation is integration with any of the existing AI detection tools. Despite their grand claims, we’ve found them to be stunningly unreliable, primitive, significantly overpriced, and easily outwitted by even the most basic of approaches. There are three or four that we use for manual spot-checks, but they often contradict one another in extreme ways.

This just about says it all, does it not?

The real trouble will start when people start PREFERRING bots.


Does it matter if something is written by AI, if it's of sufficiently high quality?

Would it be better to focus our efforts on reducing poor quality submissions for things like this?

Is that what they're already doing?


this is explained in the post. there are markers in a story submission that go beyond the quality of the story itself. a submission without those markers, which would just be a story of poor quality, would simply be rejected without a ban of the submitter.


Then we could reject it on the fact that it's poor quality alone, right?


yes, but i think part of the problem is that the number of submissions is too high. banning AI submitters hopefully discourages these people from continuing.

at present, AI submissions are not only a quality problem, but also a copyright risk. until we have clear laws and court decisions, no commercial publication can take the risk of a copyright lawsuit that might come from an AI story incorporating elements of other stories. so even if we wanted to allow such stories, right now they can't be published.

personally, i expect stories to include novel elements, even if just a few, that do not exist in other stories. current AI can't do that, and i do not believe that it ever will be able to. AI may at some time be able to generate stories that are of high quality, but it will never be able to generate stories that contain novel and unique ideas. at best they will contain elements that seem new, but that only means that the reader was not already familiar with them.

anything that a computer produces must, by necessity, be derived from existing input. computers will never be creative.

maybe that is not a problem. maybe the stories will be good enough to entertain. the problem really is that we will be overwhelmed by such stories, which will drown out the work of human writers if we do not create environments where only human-written stories are featured.

i expect that clarkesworld will be such a place. and for this reason alone we must allow a publication and its readers to choose to reject AI stories, no matter how good they are.


And what if the LLMs start producing the highest rated and most alluring science fiction? Isn’t banning them then tantamount to a cognitive bias against non-human intelligence? What is the ethical justification for such anthropocentric practices?


> And what if the LLMs start producing the highest rated and most alluring science fiction?

They haven't.

> Isn’t banning them then tantamount to a cognitive bias against non-human intelligence?

It would be, if they were, but they aren't, so it isn't.

> What is the ethical justification for such anthropocentric practices?

I don't know, because that isn't actually happening. The actual, real world ethical justification for banning LLMs is that they're producing a flood of garbage.


The problem they have is low-quality spam. If an LLM started producing publishable work, it would get through those filters.



