Hacker News

I'm doing an experiment with AI posting on Reddit accounts to see if they would get banned. I bought 100 few-week-old accounts from some sketchy site for $0.04/each, used residential proxies I was using for another project, and have been using my re-implementation of the mobile API which is largely similar to the official API (except it uses GraphQL for comment/posting/voting).

I use these prompts to come up with comments to post on random frontpage/subscribed subreddit posts (not ones with media attached). I also randomly upvote posts and search trending terms. Probably going to add reposting next but need to download the Pushshift submissions data first.

    SystemPrompt: `You are a Reddit user responding to a post.  Write a single witty but informative comment.  Respond ONLY with the comment text.
    Follow these rules:
    - You must ALWAYS be extremely concise! 99% of the time, your lines should be a sentence or two.
    - Summarize your response to be as brief as possible.
    - Avoid using emojis unless it is necessary.
    - NEVER generate URLs or links.
    - Don't refer to yourself as AI. Write your response as if you're a real person.
    - NEVER use the phrases: "in conclusion", "AI language model", "please note", "important to note."
    - Be friendly and engaging in your response.`,
    UserPrompt: `Subreddit: "%s"
    Title: "%s"
    `,
Here's the longest running one: https://old.reddit.com/user/Objective_Land_2849

Current problem is that the responses typically range from cynical to way too enthusiastic.
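For concreteness, the prompt assembly described above might look like the sketch below. Only the two prompt strings come from the post; the function name, type hints, and everything else are illustrative assumptions, not OP's actual code (which was not shared).

```python
# Hypothetical reconstruction of the prompt assembly. The two prompt
# strings are from the post; the surrounding code is illustrative.
SYSTEM_PROMPT = (
    "You are a Reddit user responding to a post. "
    "Write a single witty but informative comment. "
    "Respond ONLY with the comment text."
)
USER_PROMPT = 'Subreddit: "%s"\nTitle: "%s"\n'

def build_messages(subreddit: str, title: str) -> list[dict]:
    # These messages would then be sent to a chat-completion endpoint
    # (OP mentions gpt-3.5-turbo elsewhere in the thread).
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_PROMPT % (subreddit, title)},
    ]
```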



What is the objective of the experiment? If there was a good reason for it that would be fine, but without good reason it sounds more like:

"I'm doing an experiment with AI robot which roams around the parks and public places and throws garbage at random locations. I'm experimenting with coke cans, and burger wrappings, but in future i'm planing to introduce car tires and nuclear waste" :(


The comments are above the level of the median redditor, I don't see a problem with it.


Stupid or not, they are still genuine.


How do we know? Reddit could have been full of stupid karma-farming bots for years. How can we distinguish them from your average stupid Reddit human? Thinking about it, this could also be another angle on the Turing test: find the true human idiot.


Holy hell.


Lots of comments here chiding OP for running or talking about this; I think there's something to learn here.

This is just a hobbyist prompt and API. If nothing else, I'd say this highlights, at minimum, that there are likely much larger farms that have been operating the same way, at larger scale and for longer, without talking about it.


Exactly this. If one person with spare time on their hands can do this as a side project, one can only imagine what a government-backed actor can do with this technology. I would not be surprised if in a few years we suddenly learned that half of internet comments are bots, similar to what happened to email (most email traffic nowadays is automated mail and spam).


And after all, whomst among us isn't really just three government AIs in a trenchcoat? I know I sure am.


Is this what happens nowadays when you sign up to a dating app? I mention this only slightly tongue in cheek:

a) Because it is being argued by some as a good thing that you have an AI do your flirting for you? "we can free people from writing a thousand introductory messages, giving them energy to focus on the humans on the other side." https://www.theguardian.com/commentisfree/2023/jun/06/ai-bot...

b) As someone mentioned the turing test, as a comment on the Ashley Madison scandal someone did suggest the chatbots on there had passed it. "Claire Brownell suggested that the Turing test could possibly be passed by the women-imitating chatbots that fooled millions of men into buying special accounts" https://en.wikipedia.org/wiki/Ashley_Madison_data_breach


So we can add the destruction of the web to the risks AI poses? Is there nothing AI can't lay to waste?


It really does look like dead internet theory becoming real is a matter of time.


What is there to learn? We already know that it's possible. This is some of the lowest-hanging fruit imaginable and is what spammers have already been doing.


Please don't do that. If you are candid enough to explain your operation in this much detail, it shouldn't take much extra thought to realize why you shouldn't run it on the public internet. Users of that forum expect human replies. If they wanted LLM replies, they could use one of the available interfaces themselves. You are only contributing to the noise on the net. Please stop, and find another hobby.


Dude, one guy helps explain the issue and shows how easy it is, and you think he is the risk? If something is this easy, how do you think a hundred others aren't already doing it? Why criticize the one guy pointing it out?


He's not a messenger, he's actually doing the bad thing he's talking about. I won't criticize him for pointing it out, but he fully deserves criticism for adding to the problem.


Why do you think you are replying to a human? :)


Why are you hating on an experiment?


Because it is unsolicited LLM spam, not an "experiment". The "experiment" seems to be if they get caught doing it, or not.


Yes, that was explicit, the experiment was to see if it would be caught. That was the test. Did you read it?


This isn't an experiment any more than the fad of YouTubers doing "pranks" as "social experiments".


He had an idea, he wrote a test, he ran it, he provided results. What are you on about?


What hypothesis are they testing?


There are plenty of bots on reddit. People are clearly not expecting only human replies.

Even if they were expecting only human replies, it's because humans were all that was available before. By this logic, there wouldn't be any acceptable place to introduce the first AI, because no one expected it.

I think what people expect and what people are fine with are two separate things.

And no, you can't always use the LLM interface yourself. Those are gated in different, stricter ways than reddit is.

There are plenty of art projects which take human expression and mirror it, or transform it in a mechanical way. Are those also only contributing noise? Should those artists find another hobby?


> There are plenty of art projects which take human expression and mirror it, or transform it in a mechanical way. Are those also only contributing noise? Should those artists find another hobby?

Those people are not trying to deceive anyone. This guy was.


This kind of "social experiment" just makes everything worse for everyone.


The only datasets that will be useful to train LLMs in the future will be the ones generated before 2022. Any content generated after this date will be analogous to steel forged after 1945, it will be inevitably contaminated by the "radioactivity" of LLMs.

The good news is that the availability of data to train more and more powerful models will soon be gone, the bad news is it will take the internet as we know it with it.

It will be a sad day when most of HN posts are AI generated, but this day will come, it's pretty much inevitable. The post above us is just a drop in an ocean of garbage generators that are just starting to pop up all around the old human web that we used to "love". We'll probably miss old Twitter someday, as ridiculous as it sounds.


The good news is that this will mostly affect English, but most other languages are likely to keep being mostly generated by humans. This could even encourage people to use their own language more on the internet, which I think is a win for human cultural diversity.

I don't know if there is any escape from this for native English speakers, though.


Most other languages (at least the ones I know) are already hugely polluted by useless content that was (badly) machine translated from English. Such spam sites are now a majority of search results for me when I use Duckduckgo or Google.


That's not wrong, although it's often very easy to spot.


How so? LLMs like GPT4 have no issue generating text in Spanish, for example.


Probably the larger languages will be affected somewhat as well (I can't test Spanish, but I've used GPT-3.5 in French without issues), though not as much, I think. Such automated attacks seem to most often target English: if you're doing something like that, English is both easier to use and gives better returns (whatever they are), since there are far more English readers on the Internet.

On smaller languages though, GPT is often not good enough to use without a lot of supervision. Like it can give a good impression of West-Flemish, but can't simulate an actual conversation on an actual topic. Even just Dutch is kind of hit-and-miss.


GPT-4 tends to screw up the grammar in other languages, I imagine in proportion to inverse of the language's prevalence in the training data.

I often work with GPT-4 in Polish. I don't think I've ever had it give me an answer in Polish without at least one grammatical mistake every two or three paragraphs. The text itself is still superb, and its command of vocabulary better than that of a median native speaker, but it reveals itself by confusing genders or forgetting the grammatical case suffixes.


Spanish is probably the second easiest one due to the sheer amount of data you can train on. The less common the language, the shittier the output becomes.

It is utterly useless at generating pretty much anything in my native language (Bosnian/Croatian/Montenegrin/Serbian, however you wanna call it). Like you don't even have to try to trick it, even if you try the simplest of prompts it will produce instantly dismissible garbage.

Like it's technically not wrong, it's (mostly) grammatically correct, but it produces sentences in such a robotic way no human ever would. Hell, even generating a prompt in English and then using Google Translate makes it sound more natural than straight giving it a prompt in my language. We don't need those AI detection tools, you can take one glimpse at a text and know with 100% certainty it's not written by a human.


As someone who has moderated several popular message boards over the years, I can assure you that the problem of machine generated spam is nearly as old as HTML itself.


It’s interesting what’s possible, and perhaps it shows off how low-thought a lot of human discussion is.

But in the end it’s noise and it pollutes human communication channels. It’s already hard enough to have an honest discussion when there are profit motives and agendas at play. Now we have effectively added probabilistic noise to the mix.

I don’t particularly fault the author for doing it, I’m sure it was fun and intellectually rewarding, and they’re unlikely to be the only one. But still.


Well, the current implementation only comments on posts that already have a lot of upvotes, so it's unlikely that most of its comments are read by humans. As far as I can tell there's no clear path to making any money with it short of becoming a foreign agent or selling upvotes, neither of which I am interested in. So I will shut it off soon, because there's not really much else to do.


The replies are believably human, but kind of banal. If anything, this might indicate you've captured the gestalt of the median social media user.

This one was my favorite, and happened to be the only one that got more than one upvote:

> Why are old people so obsessed with collecting things like spoons, thimbles, and shot glasses? It's like they want to have a tiny version of every object in the world.


Looks like OP replied here

https://old.reddit.com/r/IAmA/comments/13tgscb/im_hasard_f16...

I mean personally I don’t mind. It is better than some of the human comments I’ve seen…


Just imagine that every comment you responded to was someone else performing the same kind of "experiment." Does it change how much you want to engage with the community?


This one is kind of hilarious:

“Why are old people so obsessed with collecting things like spoons, thimbles, and shot glasses? It's like they want to have a tiny version of every object in the world.”


This falls right in the uncanny valley of relatability for me.


Are spoons and thimbles tiny versions of other things though?

Even shot glasses, while you could see them as small versions of regular glasses, they're the normal size for what they're designed to contain (a shot).


Sounds like a bit from Seinfeld


The bot's name is Objective_Land_2849.

In the last year or so I've noticed a lot of accounts whose usernames follow this naming format. Usually it's Adjective_Noun_1234, but sometimes the underscores are hyphens. I really do wonder if these are all bot accounts.


If you created your own accounts instead of buying them, you would know that this format is the username format reddit automatically suggests for you :).

Also, residential proxies are overkill unless you're doing crime. They also likely expose you to participation in a criminal conspiracy, since the provenance of those IPs is sketchy at best. IANAL, YMMV. Mullvad offers a year's subscription for ~$50. They also support WireGuard, so you could use something like wireproxy and voilà: hundreds of IPs and no crime in your supply chain*.

* I haven't tried posting to reddit with mullvad ips.

edit: looks like you're not op, sorry... The first paragraph is for you tho.


I use what https://www.pingproxies.com/ calls ISP Proxies, which I think is just them reselling a /22 they got from Charter. Definitely aren't botnet proxies because they have 100% uptime. Duly noted about VPNs though! I would imagine Reddit is more VPN tolerant than most sites.


Thanks for the tip. Didn't know these were a thing. Have you done any research into them? It seems like with these sorts of ventures a lowly techie like myself doesn't have a lot of ways to validate if what they say in their marketing is actually true.

The uptime is a pretty strong signal tho.


I did a whois on the IPs I was given and found they were owned by one ISP and were all in a similar range so there's that. A lot of ISPs (especially T-Mobile) lease IP space out like this too: https://rasbora.dev/blog/detecting-residential-proxy-network.... I would probably be paying a lot less if they were unethically sourced.
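As a rough illustration of that whois-range check, one could verify that every proxy IP falls inside a single announced block. The CIDR and addresses below are made-up documentation-range examples, not the actual IPs involved:

```python
import ipaddress

def all_in_block(ips, cidr):
    """True if every IP address falls inside the given CIDR block."""
    net = ipaddress.ip_network(cidr)
    return all(ipaddress.ip_address(ip) in net for ip in ips)

# A single-ISP /22 (1024 addresses); both sample IPs land inside it,
# consistent with the ISP-leased pattern described above.
print(all_in_block(["203.0.113.5", "203.0.113.200"], "203.0.112.0/22"))  # True
```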

In general, if a provider advertises a 5-figure-IP-sized "pool" of IPs with a guaranteed number of "ports" (simultaneous connections), then the operator is almost certainly someone looking to monetize a botnet. Usually the cheapest plan would be something like pool of 20000 IPs - 500 connections, with the number of connections maxing out at 1/4-1/2 of the total IPs due to diurnal dynamics (people in major botnet victim countries like India/China often turn off their routers every night). Also advertising really specific geotargeting is often a sign that they are marketing to carders. The Krebs articles about awmproxy/TDSS are pretty good if you enjoy reading about this kind of thing :)

https://krebsonsecurity.com/2011/09/rent-a-bot-networks-tied... https://krebsonsecurity.com/2022/06/the-link-between-awm-pro...


Hello,

Tim, Managing Director at Ping Proxies here. You're correct - we work with various ISPs including AT&T, Comcast, Spectrum and a bunch of others.

We announce IP blocks with their residential connectivity and have proxies that benefit from datacenter uptime/connectivity while also looking like they're real residential connections.

We currently manage 50,000+ proxies in this configuration.

The downsides over having a peer network are that fixed costs are much more expensive and locations are limited - we have London, Berlin, Ashburn and New York while peer networks have basically every city on the planet but one of the largest benefits is the ethical nature of our product and the compliance that brings.

Let me know if you have any questions at all and thanks for supporting us!

Cheers, Tim at Ping


Brian’s newsletter is one of the few I actually subscribe to. Thank you for these links. From the posts it actually looks like you’d probably pay more if they were botnet shenanigans.


I get loads of these following me, they're usually professional exhibitionists on OnlyFans.


Professional exhibitionists is a term I've never heard before. Do other models or pornography actors fall into the same category?


I made the term up on the spot as I didn't want to invalidate what keeps people afloat in a relatively safe environment, and not all of it is sexual in its nature.

But yes, I would consider anybody providing sexually explicit or suggestive material as being a professional exhibitionist.


> What is something that old people love that you don’t understand?

> Why are old people so obsessed with collecting things like spoons, thimbles, and shot glasses? It's like they want to have a tiny version of every object in the world.

Asking the hard hitting questions. The people demand answers!


An ethical way to do this would be to:

At a minimum: Inform the moderators of the subreddit(s) you are planning to use and get consent.

Ideally: Ask the members of the subreddit for permission. Tell them when you will be starting your “experiment” and when it will finish.

What’s the hypothesis you are testing? What’s the benefit for them taking part?


Why be ethical in this instance? It's bad for the experiment.

- High chance of being outright rejected.

- potentially makes responses to your AI responses less natural


'Bad for the experiment' and ethical are orthogonal. There's an enormous body of work on the ethics of social science research, and an awful history of the consequences of such research that wasn't guided by consent and other ethical considerations.

If you're genuinely interested in answering those (likely rhetorical) questions, check out work on the ethical dimensions of deception and covert research, especially relating to online research, e.g.: https://ec.europa.eu/info/funding-tenders/opportunities/docs...

If you're interested in the consequences of research online that violates consent, here's a good primer, focusing on Facebook's emotional contagion research amongst others - https://journals.sagepub.com/doi/full/10.1177/17470161166806...


I see it from the perspective of the guy doing the "experiment": why should HE be ethical? I understand that being ethical might be good from a societal perspective, but why should he bear that cost in the absence of laws that force him to, or of professional/reputational damage?


> I understand that being ethical might be good

So then you do understand why he should be ethical: being an unethical person is a bad thing.

> why should he take it in the absence of laws that force him to or professional/reputational damage to him.

If the only reason that you behave ethically is because you will suffer consequences if you don't, then you are not ethical. I'd hope that such a person would have problems looking themselves in the mirror, but I know better.


"Why be ethical? (Asking as a serious question)"

This explains pretty much everything.


For "real science", like from within a university, it makes sense, as you must placate (beat) the ethics commission (they can generally be considered adversarial to research).

For this private kind of fun science, there is no need for ethics, unless one commits a crime or fears loss of personal reputation.


Maybe better asked as "why use your ethics"?

There is little consensus on a broad range of ethics.


This isn't really an experiment in the scientific sense.


What’s the actual experiment here?


"Experiment" was probably a poor word choice. I mostly just want to see if anybody notices them or if they get kicked out of subreddits by mods. So far none have. I saw a bunch of people on HN say that AI was going to create this new wave of spam, so I tried to test that theory.

My conclusion is that yes, it would make content generation easier for a prospective spammer, but there are still a bunch of technical things you have to get right (maybe not for Reddit, but for platforms with better protections like Instagram) or your accounts will all get banned in waves: your TLS fingerprint, the order of your headers, making all the analytics requests that the official app does (I don't even attempt to make these, so I assume the accounts will eventually be banned). The other reality is that bots run by governments and marketers usually post low-reputation links for their propaganda or affiliate purposes, so they are likely to be caught that way anyway.

I don't think this dramatically changes things in this space, to be honest, because I'm pretty sure the large-scale disinformation/spam operations were already employing poorly paid foreign workers to write posts. Maybe I'm wrong.


This is a sad but inevitable consequence of tech people having no grounding in ethics, and really no education in or respect for the humanities at all. It's a classic case of "so preoccupied with whether or not they could, they didn't stop to think if they should", and it evidences all the shallow fallacies that accompany this kind of thinking: the appeal to hypocrisy, false equivalence, "what's the harm", etc. The result is the kind of amoral "look what technically cool and kinda messed up thing I made, but what's the real harm" cynicism that's been used to justify the destruction of the commons online since the creation of the banner ad.


Ironically, I find this to be the height of elitism and disconnectedness with humanity, to believe that people need university education in "the humanities" to have a fully developed and grounded moral code and sense of ethics.

People who live "primitive" lifestyles who have zero academic education and have never heard of let alone from any "experts in humanities" can have a keen sense of what is fair, just, right, and wrong, empathy, etc. So can "tech people".

And students of humanities can be lacking all those things. I have my doubts that studying these things actually changes them significantly in a person, but would be really interested to be proven wrong about that. Certainly it is not necessary or sufficient to be an ethical person though.


I agree. Please note I didn't mention anything about a university education.

You can absolutely teach yourself philosophy online based on freely available resources. You can also do introductory psychology and sociology courses from Ivy League institutions at zero cost - although more advanced work and lab research is harder to replicate without access to an institutional context. Also, the curriculums do tend to be quite arbitrary and not so rounded - but that's in common with the US style of multidisciplinary undergraduate degree, where you specialise later.

Here are some resources that link to psych courses online - https://www.onlinepsychologydegree.info/10-places-to-find-fr...

Harvard Business school also offer some ethics courses, but these are quite business focused and don't provide a strong general grounding https://pll.harvard.edu/subject/ethics

Edinburgh universities free online MOOC is likely a better and more rounded introduction - https://www.ed.ac.uk/ppls/philosophy/research/impact/free-on...

To answer your broader point, you're confusing behaving in a commonsense moral or ethical way with understanding and reasoning from a grounding in ethics. I haven't suggested that studying ethics alone makes one virtuous, or that a lack of academic background precludes ethical behaviour. What I am suggesting - and I think your comment further evidences, is that a lack of interest and education in the (two thousand year long) tradition of thinking formally about ethical problems can ensure that our ethical decision making is arbitrary and reactive rather than rooted in our fundamental values. In other words, thinking and reading into this stuff doesn't replace your value system - it gives you a much richer understanding of how you've arrived at your values and can put them into practice.


You agree the post I replied to was the height of elitism?

> Please note I didn't mention anything about a university education.

What do you consider "education in humanities" then, that a "tech person" is unlikely to have received?

> You can absolutely teach yourself philosophy

Again, you seem to have confused having an academic understanding of ethics with a compulsion to act ethically. I don't believe there is much linkage between the two.

> To answer your broader point, you're confusing behaving in a commonsense moral or ethical way with understanding and reasoning from a grounding in ethics.

I'm not. Your comment I replied to suggested that a lack of education in this stuff is the cause of apparent poor behavior, so perhaps it was you who was confusing those things.


This would have been more convincing if you'd listed an actual harm like wasting people's time reading AI generated comments.


Wasting people's time is a relatively minor harm (in this one case; at scale, the waste and diminishment of attention is an enormous issue). Increasing the noise-to-signal ratio in online discussions, cultivating a botnet that can be replicated or directly used for nefarious purposes, actively distracting from useful information and authentic relationships, and literally advocating for zero-utility spam are all bigger issues.


I think parent's objection is to exactly this logic - a sort of mental laziness demanding proof before willingness to attempt to grok the potential harm.

And then we get stuff like accounts hacked, bank accounts wiped out, people's lives/reputations ruined (social media, poor sec practices, etc.).

There are always consequences; just because it's cool and shiny doesn't make it otherwise.


> This is a sad but inevitable consequences of tech people having no grounding in ethics. And really no education in or respect for the humanities at all.

I'm going to bet a significant portion of "tech people" have some background in the humanities. After all, many of us are the children of "The Matrix". Also, what does a background in the humanities have to do with morals or morality? The most evil people of the past few centuries had significant education in, and respect for, the humanities.

> justify the destruction of the commons online since the creation of the banner ad

It's not the "tech people" doing that. It's the people with a 'background in humanities' pushing for the destruction of the "commons online". Tech people are doing what they are told.


I think a reverse experiment would be far more valuable. Try to find patterns of collective clicknets/botnets trying to swing a subreddit in a certain direction for instance.


I see this as valuable. Makes you reconsider everything you read online.


There are probably thousands of people doing this just for the lols. Anyone can do this with an hour or two to spare... yep, we're probably talking with bots more often than with real humans :(


>Current problem is that the responses typically range from cynical to way too enthusiastic.

Interesting, this sounds like it could be solved with sentiment analysis.

Best approach would probably be to match the sentiment of the thread / comment you're replying to.
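A toy version of that tone-matching idea, using a tiny word list in place of a real sentiment model (VADER or a similar model would be the obvious upgrade); the word lists and hint strings here are purely illustrative:

```python
# Score the parent comment's tone with a tiny lexicon, then produce a
# steering hint to append to the generation prompt. Word lists are toy
# placeholders for a real sentiment model.
POSITIVE = {"great", "love", "awesome", "amazing"}
NEGATIVE = {"terrible", "hate", "awful", "worst"}

def polarity(text):
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def tone_hint(parent_comment):
    score = polarity(parent_comment)
    if score > 0:
        return "Match the thread's upbeat tone."
    if score < 0:
        return "Match the thread's critical tone."
    return "Keep a neutral, matter-of-fact tone."
```

The returned hint would be appended to the system prompt before generating the reply.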


Good. Flood Reddit with idiotic comments to make the platform even worse.


What makes you think they only do this on Reddit? Surely since they don't disclose this one on Reddit, that means they wouldn't disclose here if they did the same with HN. Even if it's not them, surely the idea isn't unique enough that nobody else might do this.


You should absolutely assume that people are running bots on Hacker News. The technology exists, and I don't think it would even be that hard. I expect I could have a functional prototype in an hour (Puppeteer to run a browser, a simple Markov chain to guide actions through states, OpenAI's endpoints for completions using OP's prompt slightly modified for HN). Long term, you could refine this with a locally running model fine-tuned on comments (plus measures to avoid bot detection).
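Sketching just the Markov-chain piece of that hypothetical prototype (the states and transition probabilities are invented for illustration; the browser automation and completion calls are left out):

```python
# A Markov chain over bot actions: from each state, pick the next
# action according to fixed transition probabilities.
import random

TRANSITIONS = {
    "idle":    [("browse", 0.7), ("idle", 0.3)],
    "browse":  [("upvote", 0.3), ("comment", 0.2), ("idle", 0.5)],
    "upvote":  [("browse", 0.6), ("idle", 0.4)],
    "comment": [("idle", 1.0)],
}

def next_action(state, rng=random.random):
    # Walk the cumulative distribution for this state's transitions.
    r, acc = rng(), 0.0
    for action, p in TRANSITIONS[state]:
        acc += p
        if r < acc:
            return action
    return "idle"
```

Driving a browser through these states (with human-ish delays between them) is what would make the traffic pattern look organic rather than scripted.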


The comments made by the experiment are no better or worse than the standard comments you'd find on those threads anyway. That old-people-collecting-things comment was actually pretty good.


I should mention that, unfortunately, at least for the account you posted, the results are now spoiled, since there is a risk that one or more of the detractors in this thread will report it.


That's true. I guess I'll just abandon that account. But it doesn't respond to comment replies anyways so those responses to it are pointless.


Like some other users, I'm not a fan of these experiments either. I think that, given prior mod notification, it would be fine. And, if anything, it would prove more effective if your aim is to make moderation more resilient to these spam attacks.

I know that the public internet is already full of these. Doesn't justify it. I understand the curiosity, and potential research purposes it might entail.


I'd be very interested in seeing your results and findings written in a writeup in the future. Thanks for sharing!


I’ve seen a similar prompt before, did you get the inspiration from somewhere?


Yes, it's partially based on the Snapchat AI one. I don't really know that much about optimizing prompts so I started there, removed the stuff that didn't apply and added a few things of my own. There is probably a lot of room for improvement.


> Current problem is that the responses typically range from cynical to way too enthusiastic.

...can't you tell it to make sure to never respond cynically?


I am probably not a bot or a dog using a computer, but I mostly respond in a range from cynical to way too enthusiastic.


I am also not a bot


I AM NOT AND HAVE NEVER BEEN A ROBOT


Consider including some other comments from the thread as examples


Wonder if you'll get affected by the API pricing changes.

that's pretty cool though, are you willing to share the source?


What pricing changes? As far as I can tell they've all been coming down for the past few months. Currently the cost is pennies per day with gpt3.5-turbo. Though I probably would be using one of the locally runnable models if I didn't have a terrible integrated GPU or Hetzner had more affordable GPU servers. Probably not going to share the source because someone would just spam obnoxiously with it and end up getting my accounts banned too.


I think OP means the reddit API pricing changes. I'm not sure if that affects your use (either type of request or amount of requests) or not though.


7 days is the longest running?!


Well, I should note I started about a week and a half ago.



