Hacker News
Stack Overflow Moderators Striking to Stop Garbage AI Content from Flooding Site (vice.com)
42 points by pseudolus on June 5, 2023 | 36 comments



It is far more than the moderators of Stack Overflow, which is just one of 181 Stack Exchange sites. This is a general strike, in response to policies, blanket statements about moderators, and secret moderator directives that applied to all sites. Moderators have publicly announced strike actions across the range of sites, including subjects as diverse as Cooking and Astronomy.

See https://meta.stackexchange.com/a/389812/167145 for starters.


We can only hope that they will be able to return to human-generated garbage content and moderation.


I've always felt that we lost too much friction in software over the last couple of decades. Instantly searchable content inside user devices and on the wider internet led to people losing basic knowledge and abilities related to structuring data and mentally modeling what computers do internally.

That lowered the entry barriers and costs to produce and use ever more complex software. Coupled with ever-increasing compute capacity, storage, and bandwidth, we now have this clouded blob of compute. Large tracts of the working population can't and won't discover what's beyond the interfaces they need to learn to get the job done.

I think the end result is that the overall quality of information and experiences possible in this environment has peaked and will now trend down. And I hope this will lead to people craving and striving for better abstractions, mapped to the underlying layers with the least amount of astonishment. That's what we could hope for back in the day, when a local environment and a small body of relatively stable documentation (I want to say a physical book, but I understand not everyone prefers that format) were quite sufficient to produce high-value software.



Can AI not moderate posts? Asking for a friend.


Ha, just asked that! I think AI is now capable of doing 90% of moderation tasks; you could even keep a very few humans for the final verdict.


AI questions, AI answers, AI moderators... why even run the site at all?


So maybe after a few years of recycling and reusing the same generated data as input for the new shiny AI, we can come to the conclusion that AI isn't really smart and you still need humans, and the generation after that will realize this and have the intrinsic motivation not to use AI to answer on the new shiny StackOverflow.


Or maybe AI will be used for the proper niches and will be everywhere in our lives, but as a tool for our use, just like electricity.


Probably not (yet). Moderating speech (abuse, trolling, etc.) should be quite easy, but moderating information for correctness is very hard.


People here are conflating two separate issues: (1) what the policy should be, and (2) how to deal with AI-generated content flooding the system. In the absence of a foolproof "AI detector", the only sensible approach is to validate truthfulness (1) and deal with the flooding problem (2) separately, if it becomes a real problem.


The answer to this is solidifying reputation and filtering out the garbage from sources without a good reputation. The irony is that this is mostly idiots without a good reputation leveraging an AI that isn't much better, in order to build a reputation so people take them more seriously.

If you set it up properly, reputation is very hard to fake. You have to build it over years, and you depend on other people with good reputations consistently upvoting and vouching for your content. Every time you post something, you stake your reputation. AI can generate a lot of content fast, and some of it might even be alright, but unless it builds a reputation over many years, you can trivially detect it as coming from a source without any meaningful reputation. It doesn't matter how good or bad the content is if you slap a giant noob sticker on the account.

All Stack Overflow needs to do is reset people's reputation to -1 if they get caught posting AI content. You don't need any fancy detection algorithms for this. And since the whole site is basically engineered around the concept of reputation, that shouldn't be too hard.
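The staking-and-reset mechanic described above can be sketched roughly like this. This is a hypothetical illustration, not Stack Overflow's actual reputation code; the names (`User`, `REP_FLOOR`, the vote weights) are all invented:

```python
# Hypothetical sketch of reputation staking with a reset-to-the-floor penalty.
# Not Stack Overflow's real system; all names and numbers are illustrative.
from dataclasses import dataclass

REP_FLOOR = -1  # reputation assigned when a user is caught posting AI content


@dataclass
class User:
    name: str
    reputation: int = 1

    def upvoted(self, voter: "User") -> None:
        # Votes from established users count for more: reputation is
        # vouched for by people who already hold it.
        self.reputation += 10 if voter.reputation >= 1000 else 2

    def caught_posting_ai(self) -> None:
        # The proposed penalty: reset to the floor, erasing years of standing.
        self.reputation = REP_FLOOR


veteran = User("veteran", reputation=5000)
newcomer = User("newcomer")
newcomer.upvoted(veteran)        # newcomer.reputation is now 11
newcomer.caught_posting_ai()     # newcomer.reputation is now -1
```

The point of the sketch is the asymmetry: gaining reputation is slow and depends on other reputable users, while losing it is instant, which is what makes it expensive to fake.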


Why would SO allow AI content? Seems completely insane. They are in real competition with LLMs, and their one advantage is that an expert on some niche topic easily outperforms AI and isn't blatantly claiming false things as true.

If the site is flooded with AI the content on it is worthless. Just ask ChatGPT...


Doesn't seem like a very important issue. They simply said that moderators shouldn't take down posts or whatever ONLY because they're AI-generated. They can still do so if they are incorrect, just like with a human poster.


No, they didn't. They told the general public one thing, and told the moderators a significantly different thing in private, via a confidential moderator-only communications channel. This put the moderators in the position of having to act in accordance with a secret policy and contrary to what the public policy said, which would make it seem that the moderators were the ones at fault when people pointed that out.


Since this is not mentioned in the open letter about the strike, I have to conclude that what you are talking about is not the reason for the strike.


Your reading skills need significant improvement. It's the 6th paragraph of that letter; and also mentioned in many individual explanations such as https://meta.askubuntu.com/q/20324/43344 . Then there are https://meta.stackexchange.com/q/389814/167145 and all of the comments at https://meta.stackexchange.com/a/389583/167145 .


Correction: Since this is not mentioned until the sixth paragraph in the open letter about the strike, I have to conclude that what you are talking about is not the main reason for the strike.


If moderators could simply know whether something is correct or not on a reading, stack overflow wouldn't need to exist.


They also can't know if it was AI generated on a reading.


That's much easier to tell.


It's much easier to generate this content than to check whether it's correct. They are apparently worried it floods the system.


But they can't really do anything about people flooding the system, can they? Also, how do they even know it's AI generated?


New accounts could require some kind of Turing test. Old accounts could be banned if found out, and owners could be made aware of such a policy. Or maybe it really is a hard problem that would make such services more expensive to operate; in the end, we users would get fewer such free services. Paid services could somewhat tackle this problem, as bots are much more easily kept in check financially that way.


Why not replace them with AI mods!? AI content needs AI mods.


Seems like a sensible policy to me. Instead of trying to work out whether content is AI-generated (which isn't possible in general), just check whether it's true. Not sure why this is a problem; truth is the only thing that matters in this case.


Right now I take a “trust but verify” approach with using StackOverflow. I assume that users are not malicious/incompetent, but verify when it matters.

With AI generated answers one must take a “don’t trust but verify” approach. And at that point, why use StackOverflow at all?


Because AI lets people generate answers way faster than anyone can verify if they're true.


But how do they verify it's an AI generated answer in order to moderate it? Seems significantly harder than checking the substance.


When someone posts 30 long, substantive answers in 20 minutes, you can be sure they're AI-generated.


As explained at length by the moderators themselves, the detection relied on far more factors than that. They also took into account (for example) massive stylistic differences between a user's questions, comments, and answers, because people were only using machines to generate the answers.

See https://news.ycombinator.com/item?id=36200852 .


They could be posting it from 30 different accounts, VPNed into different geographical locations. Or 300, or 1000. Eventually the pipeline gets clogged somewhere.

The question is: by inspecting text, can one determine whether it was written by a human or an LLM? Not sure how accurate that would be, and then we're back to square one with spam, now incredibly hard to pick out as spam, while we waste good time on broken, incorrect information.


You are describing some weird hypothetical about a very determined person. These are the people who volunteer their time today to moderate, trying to stop dumb copy-paste as a first step, be it by looking at the contents or the frequency, whatever gets the job done. Always impressed by people opining here on what the world really needs.


Yes, but that's a corner case. What if someone just posts one? And wouldn't you want those substantive answers even if they were AI-generated, as long as they were correct?


Does SO have APIs you can submit answers to at an unthrottled rate? That doesn't seem like it would be in their interest to have built. If you limited each account to posting one answer every 30 minutes, that would prevent quite a bit of spamming, it seems?
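The per-account cooldown suggested above is simple to sketch. This is a hypothetical illustration (the class name and constants are invented, and Stack Overflow's real API has its own quota and backoff scheme):

```python
# Minimal sketch of a per-account posting throttle: one answer per
# account per 30 minutes. Hypothetical; not Stack Overflow's actual API.
import time
from typing import Dict, Optional

COOLDOWN_SECONDS = 30 * 60  # one answer every 30 minutes


class AnswerThrottle:
    def __init__(self, cooldown: float = COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self._last_post: Dict[str, float] = {}  # account id -> last post time

    def try_post(self, account_id: str, now: Optional[float] = None) -> bool:
        """Return True if the account may post now, recording the attempt."""
        now = time.monotonic() if now is None else now
        last = self._last_post.get(account_id)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down
        self._last_post[account_id] = now
        return True


throttle = AnswerThrottle()
throttle.try_post("alice", now=0.0)     # first post: allowed
throttle.try_post("alice", now=60.0)    # 1 minute later: blocked
throttle.try_post("alice", now=1900.0)  # past the cooldown: allowed again
throttle.try_post("bob", now=60.0)      # separate account: unaffected
```

Note that this only slows a single account; as the comment above points out, a determined spammer can still fan out across many accounts, which is why account creation is the harder chokepoint.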


That's an operational issue, not a policy issue.



