Hacker News
Ask HN: What's up with the ChatGPT spam here lately?
19 points by pona-a 4 months ago | hide | past | favorite | 18 comments
I noticed in the past few days a large uptick in probably ChatGPT-generated comments. These accounts have low or negative karma, were registered in the past few months, started posting less than a week ago, and seem to just rephrase the title or the contents of a post with some faux "questions" at the end.

Has anyone found a reasonable heuristic to block them? Could someone maybe collect a small dataset to train a classifier? If HN becomes a target for this, manual moderation may quickly prove insufficient.
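For concreteness, here is a minimal sketch of the kind of metadata heuristic I mean, using the signals described above (low karma, young account, recent first post). The field names are made up for illustration; they are not actual HN API fields, and the thresholds are guesses:

```python
# Hypothetical scoring heuristic for flagging likely LLM spam accounts.
# Field names and thresholds are illustrative, not actual HN API fields.

def spam_score(account: dict) -> int:
    score = 0
    if account["karma"] <= 0:                    # low or negative karma
        score += 2
    if account["account_age_days"] < 90:         # registered in the past few months
        score += 1
    if account["days_since_first_comment"] < 7:  # started posting less than a week ago
        score += 1
    return score

# An account matching every signal scores 4; a threshold of, say, 3
# could queue it for manual review rather than an automatic ban.
flagged = spam_score({"karma": -2,
                      "account_age_days": 30,
                      "days_since_first_comment": 2}) >= 3
```

Something this crude would obviously catch genuine new users too, which is why it would only make sense as a triage filter in front of human review.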




Can you list examples? Or better, report them to the moderators ('Contact' link in the page footer)? I've reported some in the past, months ago, but haven't seen any recently.


Just browsing around right now, I already found these [0] [1] [2]; the format is identical to those I've been seeing. This one [0] even seems somewhat successful at blending into a discussion, even getting a human to reply once.

I found about five such accounts just today, often several per new thread. A lot of them operate on Eastern Time now. At some hours, every third front-page post has a few of these. It's surreal to see the dead internet hypothesis come true in real time.

[0] https://news.ycombinator.com/threads?id=oslis

[1] https://news.ycombinator.com/threads?id=pljldos

[2] https://news.ycombinator.com/user?id=whjkh


Excellent examples. I agree those are very suspicious and a bit more evolved than the ones I found in the past. I'll have a look for those types/patterns of comments in the future, too.


And... they're gone. I'm glad it must have gotten someone's attention, hopefully the moderators' and not the spammers' own (does HN allow one to delete two-day-old comments?), but now there's no reference point.

I pray LLM output detection can get back on track so that this doesn't keep getting worse. It's certainly doable; we humans already spotted a few of these and distilled them into a fuzzy pattern, so a good classifier might have a chance. False positives are a risk, but depending on the spam volume, it may be well worth it.
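To put rough numbers on that false-positive tradeoff, here's a quick back-of-the-envelope in Python. All the rates are invented for illustration; the point is only that the same detector becomes much safer to act on as spam volume grows:

```python
# Back-of-the-envelope: what fraction of flagged comments are actually
# spam, given a detector's hit rate and the spam base rate?
# All rates below are invented for illustration.

def precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Fraction of flagged comments that are actually spam."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A detector with a 95% detection rate and a 2% false-positive rate:
print(precision(0.95, 0.02, 0.05))  # ~0.71 when 5% of comments are spam
print(precision(0.95, 0.02, 0.20))  # ~0.92 when spam volume hits 20%
```

At a low spam base rate, nearly a third of flags would hit real users, so any such classifier would have to feed a review queue rather than ban outright.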


They aren't gone, just banned; turn on 'showdead' in your profile to see them.

You can just mail hn@ycombinator.com to get someone's attention; chances are someone did as a result of this post.


There's a bunch here (as I look right now) in the noobcomments section of HN.

https://news.ycombinator.com/noobcomments

One example from this page: https://news.ycombinator.com/item?id=41060732


https://news.ycombinator.com/item?id=41060732 doesn't look like autogenerated spam. It looks like someone who is not used to HN submitted their project and used a style that is unusual here. Perhaps they are posting the same comment on all aggregators; that is usually a bad idea because each one has a different audience and style.


It appears they edited their comment after I commented. Before, it was boilerplate ChatGPT style.


Very strange. Have you seen https://news.ycombinator.com/item?id=41065900 ?


> report them to the moderators

I find it hard to believe that any active moderators don't already know about it. Every thread here recently has a ton of grayed out, often spammy, comments at the bottom that should really have the posters banned.


Much of the moderation on HN is done by the community through flagging and voting; the mods cannot possibly read every single comment. Overall it works pretty well, and having a bunch of grayed-out comments at the bottom of a thread is hardly a problem. The examples given above show the users responsible are limiting their posting to about once a day, probably to avoid triggering any automated warnings that would get the mods' attention, so it is up to the community to report them. Not that a ban is going to accomplish much; it's pretty easy to set up a new account.


Never noticed it, but I'm interested; can you link some examples?


I think you can just feed the ai real HN comments (as the style to use for generating) to avoid detection.

Besides, how would the classifier scheme work? Validate the input or prune the threads? Good luck with either approach.


@dang


@dang doesn't work. hn@ycombinator.com does though.

(I just saw this one randomly)


Ah, thanks.


It's a valid concern that you've raised about the potential increase in ChatGPT-generated comments on HN. Here are some thoughts and potential solutions:

1. *Heuristic Identification*:
   - *Account Age and Karma*: As you mentioned, new accounts with low or negative karma could be a red flag. Filtering out comments from these accounts might help, although it might also block new, genuine users.
   - *Comment Content*: Look for patterns in the comments, such as generic or overly formal language, repetition, and lack of personal experience or detailed technical knowledge.
   - *Engagement Metrics*: Check the engagement these comments receive. Comments that are ignored or downvoted could be another indicator.

2. *Training a Classifier*:
   - *Data Collection*: You'd need a dataset of known AI-generated comments and genuine comments. This could be challenging but necessary for creating an effective classifier.
   - *Features*: Potential features for the classifier could include linguistic cues, metadata (account age, karma), and engagement metrics (upvotes, downvotes, replies).
   - *Community Involvement*: Encourage the community to flag suspected AI-generated comments. This could provide more data for training and improve the classifier's accuracy.

3. *Manual Moderation*:
   - While manual moderation might not be scalable, especially if the volume increases, it is still crucial for edge cases where automated methods might fail.
   - Moderators could focus on verifying flagged comments rather than monitoring all comments, making the process more efficient.

4. *Community Guidelines*:
   - Clear guidelines about AI-generated content could help. Encourage transparency if users are experimenting with AI-generated comments and provide a proper context.

5. *Technical Solutions*:
   - *CAPTCHA*: Implementing CAPTCHAs during account creation or before posting could deter automated systems from flooding the site.
   - *Rate Limiting*: Limiting the number of posts or comments a new account can make in a short period could reduce the impact of spam accounts.

By combining these approaches, HN can better manage the influx of AI-generated content and maintain the quality of discussions.


Who said LLMs couldn't do irony?



