Around a year ago our [blogging platform][0] got hit badly by people/groups submitting fake customer support descriptions for big companies such as Microsoft, Facebook, Comcast, etc.
We rolled out a machine learning model and trained it on the database. 99% of them vanished.
The next day, the model stopped working and the success rate was around 5%.
We found out they had learned the trick and were now using symbols from different languages to make their text look like English.
Trained again, and the success rate went back up.
An hour later, the success rate had fallen again.
This time, they mixed their content with valid content from our own blogging platform. They would take content from our own blog or other people's posts and mix it in to fool the model.
Trained it again, and it worked.
Once in a while such content appears and the model fails to catch it.
It only takes a couple of minutes to mark the bad posts, retrain and redeploy the model, and then boom, the bad content is gone.
The text extraction, separating good content from bad, telling lookalike symbols from the sane alphabet, and many other things were challenging at first, but overall we were pretty excited to make it happen.
Throughout this we didn't use any third-party platform to do the job; the whole thing was built by ourselves with a little bit of TensorFlow, Keras, scikit-learn and some other spices.
Worth noting: it was all text, no images or videos. Once we get hit with that, we'll deal with it.
edit: Here's the training code that did the initial work https://gist.github.com/Alir3z4/6b26353928633f7db59f40f71c8f... it's pretty basic stuff. Later it was changed to cover more edge cases and it got even simpler and easier. Contrary to belief, the better it got, the simpler it became :shrug
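For illustration, here is a minimal sketch of the kind of "pretty basic" text classifier described above, using scikit-learn (one of the libraries mentioned). It is not the actual code from the gist; the data, labels, and model choice are assumptions. Character n-grams help a little with look-alike obfuscation, although the Unicode confusables discussion further down covers that more directly.

```python
# Hedged sketch of a basic spam/ham text classifier, not the author's actual code.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical training data: (post_text, label); the real dataset would come
# from the platform's own database dump.
posts = [
    ("Call Microsoft support now at +1-800-000-0000", "spam"),
    ("Notes from my weekend hiking trip", "ham"),
]
texts, labels = zip(*posts)

model = Pipeline([
    # Character n-grams are a bit more robust to "l00k-al1ke" tricks than word tokens.
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)

print(model.predict(["URGENT: contact Facebook customer support 1-800-..."]))
```

Retraining on freshly labeled posts and redeploying, as described above, would just mean re-running `fit` on the updated dataset.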
For adversarial problems like this, a shadowban approach can sometimes be necessary. Perhaps people can still see their blogs but GoogleBot gets blocked from indexing them, or they only appear to someone with the spammer's cookies. That way it takes them longer to catch on and evade the model.
Of course, that means you'll need to at least spot check your bans because you can't rely on legit users escalating to you.
The thing is, the owners of those machines weren't the ones posting the content. It appeared their computers had been infected by some kind of malicious file and made part of a bigger network (a botnet?).
From the thousands of different IPs in different countries around the globe, I could see these were likely compromised personal computers.
Very few of them were machines at hosting companies; the rest were ordinary people's computers.
I'm sure these machines were doing the posting while someone else tested the results.
When we did the shadow banning, it didn't make a dent in their efforts.
The way they changed emails, changed usernames and tried to look unique seemed prepared specifically for our platform (I would guess so).
Whenever we countered their attack, they would go silent for a while and then attack again. They would adjust.
Shadow banning is effective when the attackers themselves aren't aware of it; in our case it was tricky to know who the observer was.
I wasn't familiar with the term, so I just searched the phrase "residential proxies", and the space seems even more sketchy than the usual VPN peddlers who promise the world and more. Are they using malware-infected PCs, or what?
Some of them source endpoints through their own apps, which means a lot of people unwittingly become part of the spamming just by joining the network. Of course it's mentioned in the middle of their T&Cs, but who reads those...
I’ve always been curious how the residential proxies work. Is someone going around paying people to run proxies in their home? Are these compromised devices being exploited? Are there a bunch of storage units someplace with cable service? The mind runs wild.
Did they map to ISP ASNs? Country geolocation doesn't say much anymore since there's so many VPN providers whose business is to buy a CIDR in every country and resell access.
Very few of them were from AWS, OVH and other hosting providers, very very few.
We ran each IP through IP blacklists and paid IP reputation checkers. The majority of the IPs were clean.
Back then we had an IP reputation check, but it was a headache to maintain, so we disabled it later; even at the time, very few of them got stopped by the IP reputation checks.
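For context, a typical blacklist lookup of the kind mentioned is a DNS-based blocklist (DNSBL) query: reverse the IP's octets and resolve them under the list's zone. A minimal sketch (using Spamhaus ZEN as the example list; note that Spamhaus may refuse queries coming through large public resolvers, and, as noted above, residential botnet IPs mostly come back clean anyway):

```python
# Hedged sketch of a DNSBL lookup; "zen.spamhaus.org" is one well-known list.
import socket

def is_listed(ip: str, dnsbl: str = "zen.spamhaus.org") -> bool:
    reversed_ip = ".".join(reversed(ip.split(".")))   # 203.0.113.7 -> 7.113.0.203
    query = f"{reversed_ip}.{dnsbl}"
    try:
        socket.gethostbyname(query)   # any 127.0.0.x answer means the IP is listed
        return True
    except socket.gaierror:
        return False                  # NXDOMAIN: not listed

print(is_listed("127.0.0.2"))  # standard test entry that most DNSBLs report as listed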
As someone that's used IP proxying services that provide millions of IPs for scraping purposes: that is a very mature industry. They advertise (and I believe them) "millions" of IPs, even for what you might consider hard-to-supply ones, like mobile IPs, and they let you slice and dice them however you want. Datacenter IPs? Residential IPs? Mobile IPs?[1] What state or city would you like them in? Would you like the site you're hitting to not have been accessed by this IP (through proxying at least), and if so for how many days? Do you want some mix of that? Make your own configurations and set them up as proxy endpoints, etc.
Fighting against abuse at the level of IP address attributes seems like a losing game to me. Honestly, the best I saw at this (3-5 years ago at least) for traffic was Distil Networks, which puts a proxy device in front, examines your traffic, and captchas or blocks based on that.
Since you have content being submitted, there's a lot more you can use to classify, such as how you used ML, so that's good. Part of me worries that this is all sort of reminiscent of infections and antibiotics, though. The continual back-and-forth of you finding a block, them finding a workaround, feels kind of like you were training the spammers (even if you were training yourself at the same time). At some point maybe we'll find that most of the forum spam is ML-generated low-information content that also happens to be astroturfing that is hard to distinguish from real people's opinions.
1: Fun fact, to my knowledge anonymous mobile IPs are provided by a bunch of apps opting into an SDK (like an advertising/metrics SDK) which while their app is open (at least I hope that's a requirement) registers itself to the proxying service so it can be handed out for use by paying proxy customers. Think about that next time you play your free "ad-supported" mobile game.
I remember an old mailing list discussion on sourcehut, because sourcehut provides a build service that you can use for automation.
The decision from sircmpwn, in the end, was to charge money for the service. Charging money and know-your-customer requirements will kill most exploits dead.
In this sense, this is turning the frustration level to 11. You can use the service to a certain extent, without frustration, but if you want to get serious then you're going to have to jump through some hoops.
Dedicated people will still find a way through, but you've cut off 95% of the flow and killed the low-effort attempts. Now, you can focus on the serious shit.
It sounds like a micropayment system where the buy-in amount to join is much more significant than the actual (tiny) cost of a micropayment to use a service would make it far less worthwhile to spam online services.
The risk of having their payment identifier/address banned from services before they get significant use out of it makes it very risky for spammers to use such a thing, even if the tiny micropayments themselves would be worth it to them.
It could certainly have other problems, such as people getting banned from the system for things other than spam and other use detrimental to the service provider. There is also the issue of how the initial buy-in fee is distributed.
But a high buy-in for a system that many online service providers use would very strongly discourage use detrimental to those providers (I think; this is only for discussion, as it's posted by someone with little knowledge of the area. Micropayment systems have been talked about a lot, but I don't remember a high buy-in being mentioned).
Edit: Forgot to mention that the idea is that service providers can offer their services at lower cost, because the risk to them from an account/address with a high buy-in is lower than from an account/address with no buy-in.
> The decision from sircmpwn was, at the end, to charge money for the service.
Frankly, I'm shocked the other major free providers (GitHub/Lab) haven't done this by default. GitHub's current default is free for public branches, and a small fee for private.
I could see a flipped setup working: by default a very small fee (1-5 cents per x number of batches) that most companies wouldn't notice, and a path for FOSS projects to apply for credits.
I can't find the direct link, but I remember someone on HN pointing out that because CI tools are Turing complete, GitHub Actions is the cheapest serverless cloud product in the world right now -- you just need to figure out how to game the system.
I'm sure they've built very sophisticated filtering tools, but imagine someone slips through the cracks and gets a cryptominer working. Get that action registered in enough projects (by, say embedding it in an Actions library or generator tool) and that could be significant.
Yeah, in the end you wind up with a small set of persistent adversaries who have been tweaking their abuse alongside your fixes, and a hopefully much higher wall for new abusers to scale.
If possible, it can help to hold back new systems and release a bunch of orthogonal anti abuse systems at once. Then the attackers need to find multiple tweaks instead of just evading one new system.
If resources are free then you could even actually deploy their app and either whitelist it for their own IP or only allow very few requests before taking it down.
This would be even more frustrating and could ruin whatever they plan to do with their abusive app in the first place. Let's say they deploy their malware/phishing page, test it a couple of times (possibly from a different IP) and it works. They then start spamming the malicious link and waste decent amounts of time/money/processing power, not realizing that the link was dead after the first 10 hits.
We're primarily trying to prevent fraudulent payments combined with expensive VMs. Throttling CPU to almost nothing on high risk accounts sounds delightfully irritating.
We also get the less resource intensive, but still harmful abusive apps that port scan the internet. Those are relatively easy to detect. We generally don't want to be a source of port scans so we shut them off pretty quickly.
I wonder if you could return bogus but plausible results to port scans? You could whitelist a set of safe ports such as HTTP(S) so that if they try to "curl google.com" to confirm everything is OK they get a good response, but silently drop everything else, causing their scan to return negative on all other ports.
You would think, but I've run into plenty of click bots that continue to run for years despite receiving nothing but 4xx or 5xx results. You would think someone someplace would monitor that and rotate the IP, but no reaction at all.
Then there are others where it's only an hour or so before rates get adjusted to our threshold and/or new IPs start emitting the same requests.
If your eyes can "normalize" unusual symbols to common ones to make an English word, then so can a lookup table. I feel like this isn't a case where you'd reach first for a neural net.
In fact, the Unicode Consortium provides a report and an extensive list of "confusable" symbols, which you could use alongside Unicode normalization tables to map adversarial text back into more ASCII-equivalent text before running it through anti-spam mechanisms that are interested in the content of the message.
I only learned about it myself after spending too long building my own half-baked version. It's written in pretty opaque language that makes it hard to find even if you know what you want.
Maybe somebody else on here will see it and learn about it before they need it, and at least you still have a new tool to reach for in the future.
Obviously you're saying it doesn't cover everything, but a big thing it's not going to catch beyond leetspeak-type situations is the kinds of thing you (used to) see in internationalized domain spoofing: legitimate non-Latin-script letters that just look the same or nearly the same.
NFKC/NFKD will handle "this is another form of the Latin letter A" type stuff but not "Cyrillic A looks like Latin A."
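To make that distinction concrete, here is a small sketch: NFKC folds "compatibility" forms of Latin letters (mathematical bold, fullwidth, etc.) back to plain ASCII, but leaves cross-script lookalikes like Cyrillic А untouched, so a confusables map such as the one generated from the TR#39 data mentioned in this thread is still needed. The tiny hand-made map below is only an illustration.

```python
# NFKC vs. cross-script homoglyphs: a hedged illustration of the point above.
import unicodedata

samples = ["𝐀pple", "Ａpple", "Аpple"]   # math bold A, fullwidth A, Cyrillic A

for s in samples:
    folded = unicodedata.normalize("NFKC", s)
    print(repr(s), "->", repr(folded), "| pure ASCII:", folded.isascii())
# The first two fold to "Apple"; the Cyrillic one does not.

# Tiny hand-made map for leftover cases; a real mapping would be generated
# from confusables.txt in the Unicode TR#39 data files.
CONFUSABLES = {"А": "A", "е": "e", "о": "o"}   # Cyrillic -> Latin

def fold_confusables(text: str) -> str:
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)
```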
One term used to describe this is homoglyph, as in homoglyph attacks for phishing.
Back in 2015 I did some work just using simple bitmap rendering plus OCR to find text that looked like a small selection of known words. It was actually reasonably effective.
Yeah, but then someone has to create or find that whole table and make it work.
The initial problem wasn't those symbols but the content itself; the symbols and special characters came into the picture later.
Later on, as mentioned in my original comment, they would use positive content from other blog posts that had been published/passed moderation to mix with their bad content.
We probably could have used a different method, but at the time we needed something quick, and it worked and still works with very little tweaking.
We don't have a massive amount of threats or abusers anymore to know the exact effect, but again, so far it works.
At the time they were coming in at several thousand per minute; IP blocking, range blocking, user-agent filtering, captchas and anything of the sort didn't work on them.
The good news is that the Unicode consortium has a report on this issue, and the tables already exist for normalization and mapping of confusables to their ASCII lookalikes: https://www.unicode.org/reports/tr39/
I built a Python library for finding strings obfuscated this way. It was critical when moderating our Telegram channel before an ICO.
https://github.com/wanderingstan/Confusables
E.g. "𝓗℮𝐥1೦" would match "Hello"
If you can identify text written with mixed glyphs, just ban it outright. Normal users don't write text like this; the mere binary presence of such "homoglyph" text is probably a better signal for spam than whatever your neural net outputs when running it after normalization.
I think that depends on the users. People copying and pasting bits of text that was in English or another common language— think documentation, code, news articles, tweets, etc.— with a different character set could be problematic.
Also, 𝒮ℴ𝓂ℯ 𝒜𝓅𝓅𝓈 marketed as "𝔽𝕠𝕟𝕥𝕤 𝕗𝕠𝕣 𝕤𝕠𝕔𝕒𝕝 𝕞𝕖𝕕𝕚𝕒" would be ℭ𝔞𝔲𝔤𝔥𝔱 𝔲𝔭 𝔦𝔫 𝔱𝔥𝔦𝔰. (math symbols) A user base with young people getting bounced or shadow banned for trying to express themselves or distinguish themselves from their peers would be like ಠ_ಠ (Kannada letter ttha)
I think targeting the language they're using is a better bet.
> pasting bits of text that was in English or another common language
If they use many (maybe three? four? or more) character sets in the same post, or different character sets in any single word, then that'd be highly suspicious?
Whilst still letting people copy paste from another language
A special case is needed for the shoulder shrug with a Hiragana letter tsu, I mean katakana tsu.
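A rough sketch of that per-word script-mixing heuristic, assuming a crude proxy for script (the prefix of each letter's Unicode name); a real version would use proper Script properties or the TR#39 data. Non-letter characters like the shrug's punctuation are ignored, so ¯\_(ツ)_/¯ and ಠ_ಠ don't trip it, and a pasted all-Japanese or all-Cyrillic quote stays fine because each word uses a single script.

```python
# Hypothetical heuristic: flag a post when any single word mixes letters from
# multiple scripts, while leaving single-script foreign words and emoticons alone.
import unicodedata

def scripts_in_word(word: str) -> set:
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # "LATIN SMALL LETTER A" -> "LATIN", "CYRILLIC CAPITAL LETTER A" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts

def looks_mixed(text: str, max_scripts_per_word: int = 1) -> bool:
    return any(len(scripts_in_word(w)) > max_scripts_per_word for w in text.split())

print(looks_mixed("Саll Місrоsоft suppоrt nоw"))        # Cyrillic mixed into Latin words -> True
print(looks_mixed("Copied 日本語 quote next to English"))  # separate single-script words -> False
```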
I've noticed much more usage of alternative Unicode ranges for numbers/letters in email subjects lately to make marketing messages stand out, too (in addition to emoji of course), though I wouldn't necessarily mind banning that...
Huh. For any specific purpose? Does it seem like they're avoiding paying for recruiter accounts or something by evading algorithms designed to detect their activity, or is it just for the heck of it?
I know, right? There are so many times when I've wanted to use something like box-drawing Unicode characters (cp437) to explain a complicated concept on Hacker News, but alas I couldn't, due to widespread computer fraud and abuse. How are we going to build a more inclusive internet that serves the interests of ALL people around the world, regardless of native language, if the bad guys are forcing administrators to ban Unicode? (╯°□°)╯︵ ̲┻̲━̲┻
They kinda do. Check out the shrug "emoji", table flip, and so forth. Then there's the meme of adding text above and below by abusing Unicode's "super" and "sub" modifications.
You could block it to only ever represent ASCII, but then you've knocked out the ability to expand internationally.
You can hardcode a rule for this specific bypass. Or you just retrain the neural net, and it learns that the presence of these symbols = bad very quickly, and you spend less time writing and testing a custom solution.
> Found out, they have learned the trick and now using symbols from different languages to make it look like English.
I wonder if you can train an ML model using text as images. For example, taken as strings, "porn" and "p0rn" are not very similar, but visually they are.
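A hedged sketch of that idea, assuming Pillow is available: render both strings to small bitmaps and compare them pixel-wise. Visually confusable strings score much closer than their raw character difference suggests; a real system might feed such bitmaps to a small CNN rather than comparing pixels directly.

```python
# Render strings to 1-bit bitmaps and compare pixel overlap (illustration only).
from PIL import Image, ImageDraw, ImageFont

def render(text: str, size=(80, 20)) -> Image.Image:
    img = Image.new("1", size, color=1)   # 1-bit image, white background
    ImageDraw.Draw(img).text((2, 2), text, font=ImageFont.load_default(), fill=0)
    return img

def pixel_similarity(a: str, b: str) -> float:
    pa, pb = render(a).getdata(), render(b).getdata()
    same = sum(1 for x, y in zip(pa, pb) if x == y)
    return same / len(pa)

print(pixel_similarity("porn", "p0rn"))   # close to 1.0
print(pixel_similarity("porn", "blog"))   # noticeably lower
```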
I think they meant (and I am interested in hearing about) appealing a "block" decision that was made by your automation.
If I'm a real human and trying to post a "good" post, but the model classifies it as bad and automatically blocks it, how do I appeal that decision? Can I? Or is my post totally blocked with no recourse?
When a post gets published, it is sent to the machine learning image (a separate service) via REST.
If it's classified as bad, the post is kept as a draft.
A new record gets created in another database table to keep track of them; the accuracy rate is recorded as well.
This was done to make sure no irreversible action was taken on good content.
Blogs with more than 1 year of history would not go through moderation; no action was taken on them, we just recorded the accuracy for future reference.
Later, someone from our team (usually me) would check them by eye and pull the trigger on them; they would then go into making the training better.
If something passed moderation but was indeed spam, it would go into another iteration.
We had to do this for over a month; over that time the success rate was around 99%, and no blogs were wiped from our database by machine classification unless confirmed by someone.
At that time the whole model was trained for that specific content. Later it got into other types of spam, for which we trained different models.
Overall, the machine's actions were logged, and content/users/blogs would get labeled and given bad marks.
They would be displayed on a report page until someone made the final decision; the whole time, the user would be shadow banned (shadow banning didn't help, though) and their content would not be published.
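Pulling the flow described above together, a minimal sketch might look like the following; the endpoint, field names, thresholds and the `db` helper object are all hypothetical, not the platform's actual API.

```python
# Hedged sketch of the described moderation gate: classify over REST, keep
# flagged posts as drafts, log the verdict, and queue for human review.
import requests

CLASSIFIER_URL = "http://ml-service.internal/classify"   # hypothetical internal endpoint

def moderate_post(post, db):
    resp = requests.post(CLASSIFIER_URL, json={"text": post.body}, timeout=5)
    result = resp.json()   # e.g. {"label": "spam", "confidence": 0.97}

    # Always record the verdict for later auditing and retraining.
    db.insert("moderation_log", post_id=post.id,
              label=result["label"], confidence=result["confidence"])

    if result["label"] == "spam":
        post.status = "draft"                # nothing irreversible: keep it as a draft
        db.queue_for_human_review(post.id)   # a person makes the final call
    else:
        post.status = "published"
    db.save(post)
```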
Thanks for the detailed response! And nice to hear how much you've managed to keep humans involved in the process. I used to work on a content review automation system for a big company, so it's always fun to hear about how others handle similar cases.
And there's a lot of overlap between how that system acted and what you're describing. It makes me wonder if there's space for a company that offers this sort of model training + content tagging + review tooling capability as a service, or if there's too much variation in what "good" and "bad" input is to make it generalizable.
I'm interested in why these people were doing this. Were they hoping to get non-tech-savvy people that were searching for computer help? I guess that's a good audience of unwitting users to attempt to hack, but was the goal to get them to submit to one of the remote tech support scams? Were they embedding malware into your blogging platform, or getting ad revenue out of this somehow?
> Were they hoping to get non-tech-savvy people that were searching for computer help?
Yes.
They would create these posts and get onto search results quickly (the platform does pretty good SEO optimization out of the box), and they would write good-quality posts as well.
They would also share these posts on some other websites, especially social media accounts.
We don't have Google Analytics or the like to see exactly where visitors came from. I noticed huge traffic to such pages by looking at the logs.
Our nginx log parser was alerting us about sudden spikes on certain blogs and on a pre-defined list of words we have.
That's when we noticed something was going on.
It didn't take more than a couple of hours (while working on the model) before we received email from the data center people about hosting phishing content, and not much longer before we received emails from some of those companies as well.
> Were they embedding malware into your blogging platform, or getting ad revenue out of this somehow?
No. On the blogging platform we have everything bleached out; nothing goes in without passing through sanitizers.
They would simply convince people to call those US numbers.
I actually called one of those numbers and yeah, it was one of those "customer support" operations in some other part of planet Earth, definitely not from the company he was pretending to be, and he very quickly asked me to install TeamViewer on my machine. I really wanted to let them access the Windows install in my VirtualBox and have some fun with them, but well, someone had to fix the moderation issue :D
We still keep the free plans even though there are abusers; that won't be a reason to retire them. So many people use them for legitimate reasons and keep their personal writing there.
1. Highly technical, because the flow was scripted to work with our website: bypassing captcha and email verification by using many different domains and email accounts, and also highly distributed via many IPs.
2. Non-technical, where they paid some people to do it manually, which doesn't seem likely given the way they walked through the many steps like a piece of cake.
However, our platform was/is a target due to several reasons:
1. Easy to register and start blogging.
2. Free plan with no hard limit.
3. Quick rankings due to SEO implementation out of the box.
4. Absence of any moderation before such attack.
And probably some other possible reasons that made their job easier and us a better target.
If I were this sort of scammer, I'd have the text being posted, and the escalation path for when it gets blocked, pretty well ironed out. Then you just try it on lots of platforms to find the ones that don't figure out how to solve the issue. So my guess would be they tried because they might as well, and then they hit the limit of how much effort they had put into their own generation systems and moved on to the next mark.
If you don't mind me asking, what sentence embeddings model (bert/roberta/etc) did you have the best luck with for your classifier? I like the quick retrain that can be done with an approach like this, though I have found that if you throw too many different SPAM profiles at a classifier it starts to degrade, and you might have to build multiple and ensemble them. The embedding backend can help a lot with that.
Basically, we pulled the database into a CSV file, and anything that was published before the bad content appeared was classified as HAM.
We had content that was OK, so it was marked as HAM, and then our new bad content was all marked as SPAM.
When it was deployed to production, for some hours HAM content got wrongly marked and the model got trained on it as well, which caused a lot of confusion, but the problem was taken care of once the model was properly tuned and it became safer to let it run automatically.
I'm curious why you'd roll your own at all when there are moderation services available. Did you just have a use case that didn't match anything on the market?
It's not that content pops up for everyone when someone posts something.
There's a Feed page where you can read what others you follow have published.
There's an Explore page where the latest content is visible without any filter or categorization. This is where such content would appear, but only blogs older than 7 days would show up there (we have removed that delay in recent versions).
Basically no one noticed them.
We did disclose the issue we were dealing with to some of the platform's old users when they complained about their posts not getting published. That was the first issue, in the first 15 minutes: the machine learning model classifying wrongly because it had been fed mixed content (where spammers had mixed bad content with good content from those exact blogs).
Other than several bloggers reporting that their posts wouldn't go through as expected, no one else got affected, and I hope no people were lured by those scammers while their content was published on our platform.
If you host blobs for free, somebody is going to use you as their host. Even if you just hosted audio, I'm sure somebody will quickly come along with a steganography tool to hide their content on your site (and use your bandwidth).
Similarly, if you make compute power available, people will use you to mine cryptocurrency. Even if all you host is text, somebody will come along to be abusive. When you put a computer on the Internet, it's open to the entire world, including the very worst people.
If you're hosting a community, start from the beginning by knowing who your community is and how they will tell you who they are. If the answer is "everybody", then know what everybody means -- it means some people won't want to be there, because some people will make life hard for them.
It's no longer 1991, when you could assume that such people wouldn't find you. They will find you -- for money, or the lulz. You have to plan for that on day 1. You can't fix it after the fact.
Yeah, there's an entire category of "idea guys" who don't get this. They repeatedly try to crack the code on a truly moderation-free or purely crowd-moderated platform, and it never, ever, ever works.
It almost always boils down to a poor understanding of how humans work (usually some sort of "homo economicus") or how computers work (usually some sort of "AI magic wand").
Generally they'll want to make a half-baked social media network without understanding that you need to pay for things like hosting, or a programmer's time. I've made the mistake of writing code for these folks.
Guaranteed they'll never appreciate it, and this includes non-profit coding groups. Never-ending scope creep, vague requirements, etc.
My rule is: unless you're one of my best friends, I simply will not build your project for you. However, the few times I have built something for a friend, I found the experience very rewarding. It can be good to develop with someone else who can give you feedback, so you actually know you're building something someone would like.
I still theorize crowd-moderated platforms are possible, as long as there's really good gate-keeping.
My bet is some real-world tie, one which is time consuming and expensive to create. From there it should be possible to create moderation tools that keep the rest going.
An example of a real world tie would be a trust network that requires status with in-person communities and local businesses. And not just "accept the hot chick friend request," but an explicit "I'm staking my reputation by saying this person is real."
Slashdot’s meta-moderation system worked well for a long time. One set of people could make moderation decisions directly on content, and then another unrelated set of people would review the moderation decisions and support or revert them.
It was all tied to karma and permissions in ways I can’t quite remember. But essentially there was no way for a motivated bad-faith group to both moderate and meta-moderate themselves, and the incentives marginalized bad faith actors over time.
You had limited resources to use (5 moderation points at a time, for example), at least in theory. Having moderation itself be moderated was a great idea, though, and I think it should come back in some form.
Moderation is labor and you get what you pay for. Which is not that crowd-moderation cannot work, but that for good crowd-moderation you still have to treat it as a labor pool, have a very good idea of how you are incentivizing/paying for it, and what "metrics/qualities" those incentives are designed to optimize for.
(In some cases it actually is far cheaper to pay a small moderator pool a good wage than to pay an entire community a bad wage to "crowd-moderate" if you actually test the business plan versus alternatives.)
Whatever happened to Something Awful? Are they still around?
They charged a one-time $10 fee to access their forums. If you got banned, you could pay $10 to get a new account. It made being a total dick expensive. I've heard it get called the Idiot Tax.
It's like StackOverflow + Reddit + Wikipedia. Section 3.1 is what makes the concept fairly unique. Most moderation systems require known moderators; the proposed system uses random selection. Eligible moderators could require a minimum reputation to further reduce the possibility of bad actors. Using something like Slashdot's interaction limit may be helpful.
Webs of trust are awesome, but call for a lot of investment from users. I think they are more viable if they are independent of any one website, so all that effort isn't flushed down the drain if the guy who owns the domain goes incommunicado.
It depends on the scale. I can vouch that crowd-moderation works fine for a small forum (~ 1000 members) which I am part of. And there is no karma system. You get to report posts (3 reports mean that the post is deleted), and warn users (24-hour ban after 3 "active" warnings, and then it scales up to a permanent ban after 15 "active" warnings). Warnings become "inactive" after a month.
It also depends on the threat model. If the community is the target of a harassment campaign coordinated by external actors, then you might need additional tools, or people dedicated to the job. However, this won't necessarily solve the problem, as external actors could double down, and moderators can lose their minds (suspicion of a troll behind every post, abuse of power, absence of control over the moderators, possible presence of a spy/agitator among the moderation team, etc.). I won't name the forum and the community, but I have a specific one in mind. It does not help that it is a source of information for gaming media, which means that it is often linked to in press articles, which attracts much attention from all kinds of people.
That being said, I get back to the subject: user-generated content on platforms (and not just forums). If the goal is to reach a large scale, then I fully agree with you.
> You get to report posts (3 reports mean that the post is deleted), and warn users (24-hour ban after 3 "active" warnings, and then it scales up to a permanent ban after 15 "active" warnings). Warnings become "inactive" after a month.
Sounds like I could do some damage there by signing up with three accounts?
In practice, this has not happened yet, and it has been 3 years since the forum inception.
One obstacle which I forgot to mention is that an account cannot report posts or warn other members unless the account is 3 months old *and* the account has created at least 300 posts. Both conditions have to be met. I guess it is a sufficient hindrance for most Internet trolls to forget about the forum if they had no intention to take part in the community in the first place.
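Just to make the described policy concrete, here is a small sketch encoding those rules (eligibility of 3 months and 300+ posts, warnings expiring after a month, 3 active warnings for a 24-hour ban, 15 for a permanent one); the function names are illustrative only.

```python
# Hedged sketch of the forum's described warning/ban policy.
from datetime import datetime, timedelta

def can_moderate(account_created: datetime, post_count: int, now: datetime) -> bool:
    # Both conditions must hold: account at least ~3 months old AND 300+ posts.
    return now - account_created >= timedelta(days=90) and post_count >= 300

def active_warnings(warning_dates, now: datetime) -> int:
    # Warnings become "inactive" after a month.
    return sum(1 for w in warning_dates if now - w < timedelta(days=30))

def sanction(warning_dates, now: datetime) -> str:
    active = active_warnings(warning_dates, now)
    if active >= 15:
        return "permanent ban"
    if active >= 3:
        return "24-hour ban"
    return "no action"
```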
Yeah, I will agree with you that there can be some smaller-scale systems that work fine. That said, in these cases, in my experience, it's always a few key "hero" mods who are just very committed to volunteering to keep things cleaned up.
Without actually hiring people, it's hard to get that level of commitment, and just as you are saying, as soon as the work gets hard enough (eg clever trolls that turn users against each other, or paranoid political crusaders who think the mods are in league with unseen forces), even the best volunteers end up quitting at the worst times.
In the long run I think the solution is just hiring moderators. It costs money, but if you want a job done well and consistently ya gotta pay.
Why can't you verify real identities? Can't you use phone numbers, photos of IDs, charging a credit card, verification of physical addresses, or invites to increase the difficulty of creating fake accounts?
Yes, there is an issue of increasing the difficulty of signing up for real users, but once accounts are tied to real identities, doesn't that allow crowd moderation?
Somehow authenticating every user with a "real ID" doesn't help much unless you engage in content moderation.
A system like that would be complex, costly, a major barrier for growth, and would likely still be vulnerable to fraud. You probably wouldn't have much opportunity to take legal action against abusers, even if you could identify them. Plus, safely storing the user info needed to make a system like that work would be a huge liability.
And at the end of the day you still have to moderate the platform to identify abuse and take action against abusers. But if you use "real IDs" that probably won't be a problem, because you'll have no users anyway.
lobste.rs is tiny enough to be irrelevant, and already virtually unusable because it's unwilling to ban people who are unpleasant without being unambiguous rulebreakers.
That swings to the other side of "you either die an MVP or build content moderation": people are not going to submit real ID for a random project. They've only just started implementing this on Youtube ""age verification"" because they were made to, and Facebook only did it as an arbitrary after-the-fact hammer. It causes all sorts of problems (what do you regard as valid? What about deadnames?).
Twitter and lots of other sites do phone number verification which is less onerous but far easier to spoof.
And of course the biggest, highest profile moderation challenge involves people whose identities are known but nonetheless are toxic to the community. Including the "final boss" of content moderation challenges, Donald Trump.
I agree. You build the MVP then as you need content moderation you start requiring more onerous proof of identity. The only goal is to make ban evasion more difficult.
For everything else, just ask the community what they want. If they don't like Donald Trump in the conversation, then he's gone. Donald can then attempt to find a community (subreddit) that accepts him. That community can be quarantined or banned if people really don't like it.
Thank you for the feedback. I strongly suspect I'm wasting my time :(
I agree that you'd need to securely store the personal information but as it's only used during account sign up it could be an entirely separate system.
It's definitely a huge drawback that you'd risk such important information.
Absolute worst case! You're going to end up DMCA'd by the entire music industry.
> It's no longer 1991, when you could assume that such people wouldn't find you.
Even back in the nineties there was abuse... but the internet was so much smaller, and it was possible to manually ban them. Except on USENET. The labour of dealing with spam fell to a small number of people, one of whom wrote this astonishing rant: https://www.eyrie.org/~eagle/writing/rant.html
(and partly disowned it, but I think he was right first time)
This is the key distinction. If you charge money, from the beginning, most of your content moderation woes go away.
At Transistor.fm we host podcasts and charge money for it (starting at $19/month). We've had very few problems with questionable content.
We're a counterpoint to the narrative here: small (4 full-time people), profitable, and calm.
> Even if you just hosted audio
Most DMCA takedown requests these days are handled through the big podcast directories (Spotify, Apple Podcasts). We haven't had to write/implement any fingerprinting tech.
Charging also gives a (potential) communications path with the customer, through the payments processor. This can be useful for resolving account issues, which are otherwise a nightmare when there's no identification whatsoever.
But any payment also means that you're now fighting against "free" for growth. Free may come either from some massive monopoly who can offer their alternative as a loss-leader, or as a subsidised offering on the back of another monetisation scheme (usually advertising), where revenue potential increases with platform scale. Yes, you'll have fewer issues, but you'll also always be on the short end of the growth stick.
This is a rephrasing of another reply to your comment, hopefully with a different and clarifying emphasis.
> We're a counterpoint to the narrative here: small (4 full-time people), profitable, and calm.
I hope your model wins in the end, but let's face it - the internet makes available a global market at nearly no marginal cost per user. Network effects and all that on top of that premise.
You can carve out a niche but the serious money will be spent on scale.
> If you host blobs for free, somebody is going to use you as their host. Even if you just hosted audio, I'm sure somebody will quickly come along with a steganography tool to hide their content on your site (and use your bandwidth).
This feels more like a theoretical example than something that has actually happened. Do you have any examples of steganography being used as bandwidth redirection/hosting?
I couldn't find the article I had read a few years back, but I remember this sort of thing being used to host content on Facebook, Wikipedia, Reddit, etc, before they cracked down on it.
I did this when I was in high school for fun. I found a poorly designed comment system somebody had built and used it to transfer files around, hidden in gibberish comments. Maybe not the most common thing, but more common than you would expect.
Eventually someone will come around and build a "Google docs FUSE" tool that stores arbitrary data in your system. I suspect this is the main reason Google switched docs to actually count against your usage. For normal users you can still store hundreds of millions of documents but it's pointless to encode data in it over just uploading it to drive.
I'll never forget having to be a moderator for a somewhat popular forum back in the day and oh man did I learn how a few people can make your life hell.
One thing not mentioned much in these discussions is the poor moderators. Having to look at all that stuff, some of which can be very disturbing or shocking (think death, gore, etc., as well as the racy things), really takes a toll on the mind. The more automation, the less moderators have to deal with, and then it's usually the tamer middle-ground content.
> I'll never forget having to be a moderator for a somewhat popular forum back in the day and oh man did I learn how a few people can make your life hell.
I was also a mod for a popular gaming forum way back in the day. It was pretty miserable looking back.
Personally, for me, the extreme/shocking content wasn't the biggest issue. That stuff was quick and easy to deal with. If you saw that type of content you just immediately deleted it and permanently banned the account. Quick and easy.
What was a lot harder were the toxic users that just stuck around. Not doing anything bad enough to necessarily warrant a permanent ban, but just a constant stream of shitty behavior. Especially sometimes when the most toxic users were also some of the most popular users.
> What was a lot harder were the toxic users that just stuck around. Not doing anything bad enough to necessarily warrant a permanent ban, but just a constant stream of shitty behavior. Especially sometimes when the most toxic users were also some of the most popular users.
What people find out, again and again, is that you just ban those users. Don't need an excuse. Just ban them. Even if they are popular. Your community will be much better once you do.
I have done this for years and it just works. You know these users give off a bad vibe and that others are put off by it. Just remove them and ignore the complaints from the user. They can find another space that accepts them. I don't even bother writing rules lists because they are pointless.
I do give warnings out first but usually that does nothing to change behavior anyway.
That was also my experience moderating a medium-sized city subreddit. Bigger problems were easily dealt with. Toxicity was a lot harder to deal with, especially when it's so easy to create a throwaway account. I quit when one user decided to target me personally, and kept evading bans to cause more grief.
All of this crap, and your reward is more complaints, more demands.
If u/gonewild can manage user verification, then anyone can.
Doubly so when surveillance capitalists like facebook and NSA already have (shadow) profiles for every person, living and dead.
Facebook absolutely already knows the true identity of each and every troll. Not verifying account creation is a convenient fiction, willful ignorance, allowing their outrage machine to profit. "lalala", hands over ears, "i can't hear you!"
> What was a lot harder were the toxic users that just stuck around.
In about a year as a mod on a semi-busy political forum, the trickiest situations always seemed to involve two users, neither generally horrible but both continually stepping over the line in their interactions with each other. And each had their own highly motivated allies, so any action would ignite a new firestorm of complaints about biased moderators. What a nightmare. Probably part of why that site doesn't exist any more.
BTW, that's also where I learned some rules of effective moderation. Unfortunately, finding a forum where moderators know how to moderate is hard. Far more often, they fall into a pattern of ruling on technicalities instead of considering what will actually improve discourse, and they always end up getting manipulated by the community's worst members to drive out better ones.
> What was a lot harder were the toxic users that just stuck around. Not doing anything bad enough to necessarily warrant a permanent ban, but just a constant stream of shitty behavior. Especially sometimes when the most toxic users were also some of the most popular users.
This is the problem with community guidelines being the be-all and end-all. Hard rules are great for catching insults or slurs. They're not so great for dealing with actual abuse or inciting very bad ideas.
There's a reason white supremacists and literal nazis (yes, really with the salutes, genocide fantasies, Jewish conspiracy theory and all) have shifted from using obvious language to dogwhistles and "just asking questions". Erosion is a much more powerful force than a few direct impacts.
If you want to moderate a community, you need to have a plan for dealing with toxic individuals, not just language. We tend to imagine "hackers" and (foul-mouthed) "trolls", but I find Molly's archetypes a lot more thought-provoking: https://twitter.com/mollyclare/status/1254886822779502593?la...
I think community moderation is a problem we tend to run into the Dunning-Kruger effect with, because it seems like something we have an intuitive understanding of even if we have zero experience actually doing it and have never learned what works and what doesn't.
> Hard rules are great for catching insults or slurs.
Bingo. Hard rules encourage brinksmanship. There's always a class of "picador" users who will poke and prod and provoke just up to the line where the rules are, then flag the response. A moderator too wrapped up in rule by technicality (or too lazy to look at context) will then come down as harshly as they can on the author of the flagged comment, and give the picador a total pass. Problem is, the picador does this again and again and again, never making a positive contribution, while their targets are often chosen precisely for their prominence. Guess which one is encouraged to continue their behavior, and which one is encouraged to go away. Has the "moderator" helped to improve discourse on the site, or helped to ruin it?
We had some of the crazy people track us down and call in bomb/death threats to our office building.
So many thought we were in collusion with a specific forum moderator (out of a million forums) and got so incensed. And this was in the early 2000s, which we think was a saner time.
A close friend of mine is a primary contributor to an extremely popular console emulator. He learned quickly to author under an alias which he keeps secret – even from most of our friend group.
It's bizarre that he has to keep this real love of his, which he's devoted hundreds and hundreds of hours to, so close to his chest.
But sadly The Greater Internet Fuckwad Theory holds true today.
> Especially sometimes when the most toxic users were also some of the most popular users.
If the "toxic" users were the most popular, how do you know you were not the "toxic" one instead? If the community is supporting the "toxic" material, how could it be "toxic"?
In my experience, it's not the toxic users that are the problem. It tends to be the toxic mods. You can ignore toxic users. You really can't ignore toxic mods.
I also have a problem with the term "toxic". It ultimately means "something I don't like". Mods should never ban "toxic" content. They should ban illegal and perhaps non-pertinent content. But that's just my opinion.
There have been a bunch of articles lately about the horrors that Facebook moderators have to pore through. FB has been forced to pay $MMs to some of them for mental health: https://www.bbc.com/news/technology-52642633
> Having to look at all that stuff, some of which can be very disturbing or shocking
Yup, was the designated person to report all child porn for our photo-sharing website. It was horrific. Some of those images still haunt me today, they were so awful. And the way the reporting to the NCMEC[0] server worked, you had to upload every single image individually. They did not accept zip files or anything at the time. It was a giant web form that would take about forty image files at once.
Even without seeing that stuff, seeing a constant stream of bad behaviors with the probably-good behavior filtered out can subtly change your priors about people - it makes you start thinking people suck more in general, kind of like how watching news where they show the worst of the worst makes one trust people less.
I definitely used to notice this after some time working on our moderation queues.
> I'll never forget having to be a moderator for a somewhat popular forum back in the day
Similar experience, though I'll say that the worst was dealing with other teenagers that threatened suicide when you banned them. That always took a lot of effort to de-escalate and was a complete drain on personal mental health.
I could deal with porn, shock images, and script kiddie defacements, but having people threaten to kill themselves was human and personal. It hurt, especially when the other person was legitimately having a personal crisis.
I still think about some of these people and wonder if they're okay.
Several years ago a popular gaming forum with a significant teenage audience I used to read had declared a simple policy toward threats of suicide. If you were threatening to kill yourself, do it, and stop messaging the mods, they are not here to talk you down from a ledge. It seemed pretty effective.
Most of these threats weren't serious, just some problematic teen looking for attention. Showing them that some strangers don't give a shit about them or their antics could be a real eye-opener.
It always blew people's minds when I told them that 50% of the engineering time at reddit was spent on moderating. What's interesting though is that we didn't even have any moderation for the first year or so, because the community would just downvote spam.
It wasn't until we got vaguely popular that suddenly we were completely overwhelmed with spam and had to do something about it.
What blows my mind is that only 50% is spent moderating.
At some point the engineering work can be something approximating “done” (or should get asymptotically close to it), and the health of your platform ultimately rests a lot more on the quality of the community than on any particular technical project.
What is engineering doing that it takes up as much person-hours as moderation? Running A/B tests to tweak the “try the app” dialog?
(Yes, I’m bitter… I deleted my Reddit account years ago and I’m still lamenting the loss of what the site used to be.)
I saw this happen to a couple subreddits. I don't blame reddit for that (other than selection of default subs). It's just the natural progression as a subreddit grows that it turns to memes and constant recycled content. Heavy moderation is the only way to stop that.
Roblox is working on a "moderation" system that can ban a user within 100ms after saying a bad word in voice. But their average user is 13 years old.
Interestingly, Second Life, the virtual world built of user-created content, does not have this problem. Second Life has real estate with strong property rights. Property owners can eject or ban people from their own property. So moderation, such as it is, is the responsibility of landowners. Operators of clubs ban people regularly, and some share ban lists. Linden Lab generally takes the position that what you and your guests do on your own land is your own business, provided that it isn't visible or audible beyond the parcel boundary. This works well in practice.
There are more and less restrictive areas. There's the "adult continent", which allows adult content in public view. But there's not that much in public view. Activity is mostly in private homes or clubs. At the other extreme, there's a giant planned unit development (60,000 houses and growing) which mostly looks like upper-middle class American suburbia. It has more rules and a HOA covenant. Users can choose to live or visit either, or both.
Because it's a big 3D world, about the size of Greater London, most problems are local. There's a certain amount of griefing, but the world is so big that the impact is limited. Spam in Second Life consists of putting up large billboards along roads.
Second Life has a governance group. It's about six people, for a system that averages 30,000 to 50,000 concurrent connected users. They deal mostly with reported incidents that fall into narrow categories. Things like someone putting a tree on their property that has a branch sticking out into a road and interferes with traffic.
There's getting to be an assumption that the Internet must be heavily censored. That is not correct. There are other approaches. It helps that Second Life is not indexed by Google and doesn't have "sharing".
The biggest reason IMO for moderation in the first place is because if you don't block/censor some people, they will block/censor others. Either by spamming, making others feel intimidated or unwelcome, making others upset, creating "bad vibes" or a boring atmosphere, etc.
So in theory, passing on moderation to the users seems natural. The users form groups where they decide what's ok and what's banned, and people join the groups where they're welcome and get along. Plus, what's tolerable for some people is offensive or intimidating for others and vice versa: e.g. "black culture", dark humor.
If you choose the self-moderation route you still have to deal with legal implications. Fortunately, I believe what's blatantly illegal on the internet is more narrow, and you can employ a smaller team of moderators to filter it out. Though I can't speak much to that.
In practice, self-moderation can be useful, and I think it's the best and only real way to allow maximum discourse. But self-moderation alone is not enough. Bad communities can still taint your entire ecosystem and scare people away from the good ones. Trolls and spammers make up a minority of people, but they have outsized influence and even more outsized coverage from the news etc. Not to mention they can brigade and spam small good communities and easily overwhelm moderators who are doing this as volunteers.
The only times I've really seen moderation succeed are when the community is largely good, reasonable, dedicated people, so the few bad people get overwhelmed and pushed out. I suspect Second Life is of this category. If your community is mostly toxic people, there's no form of moderation which will make your product viable: you need to basically force much of your userbase out and replace them, and probably overhaul your site in the process.
It sounds like Second Life lived long enough to build content moderation, pushed the work of content moderation onto its users, and, in a hilarious psychological trick worthy of Machiavelli, made the users think they own a piece of something (they don't) so that what other users do on "your" land is up to you. My job would also love it if I paid them to work there instead of the other way around.
The Internet must be heavily censored to be suitable for mainstream consumption and the tools described make Second Life sound like no exception.
> You either die an MVP or live long enough to build content moderation
> made the users think they own a piece of something (they don't)
True, land in Second Life is really a transferable lease. It is an asset, though. You can resell it to someone else. Second Life makes most of its money from "tier charges", which work like property taxes and are a fixed amount per square meter. Land resales and rentals, though, are a free market. "Content moderation" is a minor part of using land in Second Life. You usually build something on the land and do something with it.
You have to be present (as an avatar) to make trouble. You can be a jerk in a virtual world, but you have to do it "in person". Most of the social pressures of real life work. Troublemakers can be talked to before things reach the ejection stage.
Many of the problems on forums come from being able to post blind, with no interaction during posting. What you posted persists, and is amplified by "sharing" and search engines. Virtual worlds lack that kind of amplification. You can gather a crowd, if you wish, but they don't have to stay around.
It's not perfect. There are still jerks. But being a jerk in a virtual world does not scale. Space is what keeps everything from being in the same place.
In both your original comment and in this followup, which makes the point explicit, I find it ironic that SL have addressed one of the key challenges of "spaceless" cyberspace ... by reinventing the concept of space. Which, as you say, "keeps everything from being in the same place".
That's not a criticism. There's a considerable degree of respect. There's some irony in that spacelessness is one of the purported advantages of the online world. It also makes me wonder what, if any, other options might exist, because effective tools for combatting abuse seem scarce, and those that do exist either brittle or capricious.
For some reason I remember everyone's behavior on old-school message boards as much better than on modern social media. Sure, you had your degenerate boards, but you could just not go there. Moderation and censorship will always exist, but they seem to work better when they are more locally applied.
I remember it being a huge mix! It really depended on what boards you were on. There were boards I was a member of in 2003 that had very strong moderation and they were great!
I was also on some basically unmoderated boards and saw some stuff I wish I didn't see.
I think this is more indicative of the communities you were a part of than the actual behavioral norms of people at the time.
Echoing the same sentiment Counter Strike was exactly the same. There was an insane diversity of servers. Some were literally labeled Adult Content and way on the other extreme some were 'christian' where saying the word "shit" would get you banned.
A true sense of authority keeps everyone at bay; if a mod goes rogue, it all collapses. But when a mod can be held accountable for their actions, everyone acts as a community and holds the peace. A transparent modlog could really make a community.
Clans were more than just a bunch of mates playing a game. They were free, open communities where everyone was treated with respect regardless of who you were. Q3Arena was my first FPS at 13 and I fell in love with just the community spirit.
Organized clan wars between X and Y, joining rival clan servers just to poke around and have fun, are days which are now lost. It's the same experience as inserting a VHS cassette and hitting play, knowing you were going to get a real feel of an experience.
I may have hit the tequila a bit too hard tonight and this really hits hard, but I do wonder if the same experience will ever make a comeback.
I think this is because old school message boards had a sense of community between existing users and there wasn't a large influx of users at any single point. If one person comes in and starts running amok it's easy to just ignore them, tell them off or ban them. But now there's less persistent forums that people are a part of, so there's a lack of community standards that people just naturally gravitate towards. There's no overall community between, for example, people who comment on youtube, so any youtube comment section is just whoever happens to stumble across it.
Having run a platform of a million or so of those, this is somewhat true. But there were spammers posting across many communities, and the forums whose mods left got littered. We had to set those forums to automatically switch to requiring moderator approval to post, which at least preserved the board's history, but was a pain for a moderator if they returned.
> Second Life has real estate with strong property rights.
Property rights is the wrong framing. We know this from the social sciences. The actual solution is just ownership.
This is the problem with public spaces in big cities compared to close-knit smaller communities: if it "belongs to everyone" it doesn't belong to anyone, if it is owned by the city/state/government, it's not actually owned by the people.
I'm not saying framing it as individual property rights doesn't work, I'm just saying it's too narrow an interpretation and a bad analogy, e.g. because property rights can allow for layers of indirection which erode this effect whereas ownership can be shared and still maintain the effect (although a cynic would then frame it as "peer pressure").
> Roblox is working on a "moderation" system that can ban a user within 100ms after saying a bad word in voice. But their average user is 13 years old.
Reminds me of Xbox Live more than 10 years ago. They banned the word "Gay" since (by their estimate) it was used as a slur by 98% of users.
But there was a two percent population that simply used it legitimately. [0]
It's really funny: every time a "we don't censor" platform pops up catering to the American right, they speed-run going from "moderation == censorship" to "we're moderating our platform" in record time. Turns out moderation is really important to making a platform for a community.
If you create a platform with absolutely zero censorship, you become a repository for child porn. I participated in Freenet many years ago because I liked its ideas (And thought it would have been a nice way to pirate games without my ISP being able to know), but it got a reputation for being used for CP, and I promptly deleted it, because I want no part in that.
If you merely censor illegal content, you will become a home for disinformation and ultra right-wing conspiracies. See Parler.
In either case, and especially the first, you're likely to get kicked off your hosting platform and get a lot of attention from the government.
I don't think it's possible to create a "we don't censor" platform without hosting it in some foreign country that doesn't care about US laws and also hiding that you're the one who runs it.
I think the more general rule is that if your platform is advertising itself as an alternative to an existing dominant platform but that doesn't censor "X", the kind of people who'll flock to you will mostly be those who value X over any other interaction on the dominant platform.
Most moderates don't feel restricted enough in their speech to use Twitter even if they may have some racist ideas they don't feel safe stating there, but literal white supremacists who want to talk about the superiority of the white race all day long will love your alternative that doesn't censor them.
But if those white supremacists now all flock to your platform you likely now have a platform mostly consisting of white supremacists. This will naturally limit who else wants to join your platform because if they don't like being around tons of white supremacists they don't stick around long enough to build a counterbalance.
This can also happen after you think you've already established a more nuanced community as with your example of Freenet.
It turns out that while some people may seek opportunities to talk about X, others will not want to share space with discussions about X, often because of what it means to their own safety but also sometimes just from fear of association.
If you have a fun social network and it has a prospering community of nazis in it, you're now the nazi network, no matter how much of your community is about other things than being a nazi.
Gab may be a more useful point of reference, but what's hilarious about right-wing platforms is that their censorship is fine; it's other people's censorship that's the problem. (Similar to how immigrants having fake papers is wrong, but having a fake vaccination card is sticking it to the man.) Go post some pro-vaccine or pro-mask-mandate things and see how long you last before being deplatformed.
I saw the author had moved to either a self-hosted domain or Ghost, but didn't want to risk hugging them to death... however unlikely it is that anyone would click on my random link.
Reading your comment history I would hypothesize you generally follow a coherentist/constructivist school of thought. I'm drawing this from repeated efforts to ensure alignment and clarity of actual meaning, especially in cases where vagueness in the language used by others might induce misunderstanding by someone who only gave a cursory read to a concept. You value whole concepts rather than individual facts and seem to approach things with the idea that your understanding grows over time through ongoing experience - i.e., understanding is constructed.
I would highlight this quote: "I instantly started learning very useful features reading the GNU docs. (I still need to fully internalise those). Yes, the full manual is very much better than the manpage."
No one has ever actually taken me up on this before...
> If you merely censor illegal content, you will become a home for disinformation and ultra right-wing conspiracies. See Parler.
What do you count as disinformation and why is it a problem? If you disagree with something you can ignore it and move on, or engage with it and respond with your own counter-argument. It doesn't seem like a problem that reduces the viability of the entire platform. It is also strange to me that you seem to think a lack of censorship only favors "ultra right-wing" conspiracies. I saw a lot of disinformation about policing being spread throughout 2020 without much evidence. Those who pushed those narratives did not face any moderation for their misinformation. I recall as well when Twitter, Medium, and others banned discussions of the lab leak theory. The pro-moderation crowd unwittingly aided in the CCP's avoidance of accountability and smeared some very rational speculation as disinformation. I don't think I want anyone - whether the government, powerful private companies, or biased moderators - to become the arbiters of permitted opinions.
> In either case, and especially the first, you're likely to get kicked off your hosting platform and get a lot of attention from the government.
It's also curious to me that you mention Parler, because January 6th was organized more on other platforms than on Parler. Silicon Valley acted in unison against Parler because they share the same political biases among their leaders and employees, and because they share that degree of monopolistic power (https://greenwald.substack.com/p/how-silicon-valley-in-a-sho...). The darker part of this saga is that sitting members of the US government pressured private companies (Apple, Google, Amazon) to censor the speech of their political adversaries by banning Parler (https://greenwald.substack.com/p/congress-escalates-pressure...), in what can only be called an abuse of power and authority. When tech companies are facing anti-trust scrutiny and regulatory pressure on other issues, why wouldn't they seek favor by doing the incoming government's bidding and deplatforming Parler? I feel like the actions observed in the Parler saga are less about moderation and more about bias and power.
It is a problem because you'll get thrown out by your hosting and other service providers if you don't moderate your content; so if you want to keep running your service, not moderating is simply not a practical option. That is why Parler is mentioned, they are a demonstration that it's not practical to keep operating without accepting a duty to moderate (as Parler did eventually) even if you try really, really hard.
And while there are a lot of conspiracies, all of which will be on your site if you don't moderate, most of them will be tolerated by others but it's the ultra right-wing conspiracies / nazis / holocaust deniers that will cause your service threats of disconnection; so you'll either start moderating or get your service killed in order to protect them.
I understand you don't want anyone - whether the government, powerful private companies, or biased moderators - to become the arbiters of permitted opinions; however, you don't really get to choose (and neither do I); currently there are de facto arbiters in this world.
I won't get into the argument of "Well then who is the arbiter of truth?" because honestly, I don't have an answer. It can't be the government for obvious reasons, but it also can't be private corporations, and certainly can't be the general public. That leaves...nobody. Maybe a non-profit organization, but even those could easily be corrupted.
> and why is it a problem?
Nearly 700,000 US deaths from COVID so far, a number that continues to rise due to anti-vax disinformation convincing people to not vaccinate.
Disinformation is literally killing people by contributing to the continued spread of a pandemic. It's absolutely insane to me that you would genuinely ask why disinformation is a problem.
Just because I don't have a solution to a problem doesn't mean the problem doesn't exist.
> If you disagree with something you can ignore it and move on, or engage with it and respond with your own counter-argument.
If this was an effective approach, Tucker Carlson would have been off the air ages ago, QAnon would have been dismissed as a crackpot by everybody, and disinformation wouldn't be a problem.
> Disinformation is literally killing people by contributing to the continued spread of a pandemic. It's absolutely insane to me that you would genuinely ask why disinformation is a problem.
I wonder if you ever considered the possibility that not everything that goes against your current beliefs is disinformation.
> I wonder if you ever considered the possibility that not everything that goes against your current beliefs is disinformation.
Speaking only for myself, I try to consider such possibilities constantly. Unlike what passes for Trump-era conservatives, I don't like to be wrong. If you correct a mistaken belief of mine, I'll thank you for doing me the favor.
Difficulty: Your faith's no good here, nor is your money. You'll need to bring evidence.
I'm open to having my mind changed, but it takes rigorous testing and peer review. I'm not going to have my mind changed by some talking head on the TV or some dude in sunglasses recording a video from his truck.
If I get COVID, I'm not going to demand Ivermectin just because Joe Rogan told me to.
But you're dodging the issue at hand. COVID has killed nearly 700,000 Americans and over 4.5M people worldwide. Vaccines slow the spread of COVID and are safe. This is all verifiable information. Claims that the vaccine is ineffective or has deadly side-effects are disinformation. Not because I disagree, but because extensive study and peer review points to those claims being false.
When you disagree with scientific studies that have undergone extensive peer review, it's not disagreement. You're just wrong.
If you don't mind answering another question, why the hate for Tucker Carlson? I don't follow him but people have complained about him so much that I've now seen a few videos to see what the fuss is about. I didn't see anything wrong within those few clips I saw (maybe an hour's worth) - it didn't seem any different from any other mainstream news in that one side of the argument was being presented, with a lot of conviction. But I did not see misinformation. I am sure there's some non-zero amount of misinformation that can be found from scouring his clips, but that's true for anyone and any source, and I certainly don't think he should be "off air" for it. I can't help but think that a lot of the character attacks against him are simply made because he's a prominent and successful voice on the "other side", and his effectiveness is a risk to political adversaries.
As someone who wants to see Tucker Carlson off the air, do you see your position on the matter differently? Are there conservative voices you support being platformed, and what makes them different for you?
It's not a good sign when the "News" network you work for has to go to court to argue that no reasonable person would take you seriously.[1]
Just over the past year, Carlson has peddled such obvious falsehoods as claiming the COVID-19 vaccines don’t work, the Green New Deal was responsible for Texas’ winter storm power-grid failure, immigrants are making the Potomac River "dirtier and dirtier,” and that there’s no evidence that white supremacists played a role in the violent Jan. 6 Capitol riots. [2]
We don't need this guy on the public airwaves. He should get a blog... that is, if he can find a hosting provider who will tolerate his views.
> If you disagree with something you can ignore it and move on, or engage with it and respond with your own counter-argument.
I sympathize with a lot of what you wrote (and didn't downvote it), but the Trump era has highlighted a serious problem with your specific point above. The marginal cost of bullshit is zero. It takes basically no effort to post more of it, while it always takes at least a small amount of effort to debunk it.
Worse, the bullshitter usually has the first-mover advantage. To claim the initiative, they only have to post a new thread predicated on far-right propaganda or conspiracy theories or hijack an existing one. Once lost, the rhetorical high ground is difficult and time-consuming to reclaim. As soon as you argue with the shitposter, they effortlessly shift their role from aggressor to victim, as some would suggest is happening in this very conversation.
I've always maintained that the antidote to bad speech is more speech. A few years ago I would have died on this hill at your side. But principles that don't work in practice are useless... and this one, having been tested, simply doesn't work in practice. The sheer quantity of bullshit has an ironclad quality all its own.
The consequences of a toxic media landscape at scale are felt by those considerably removed from the platform(s) themselves. Twitter cheered the Arab Spring protests. Facebook are attempting to distance themselves from their very active role in the Myanmar genocide. History shows that changes to media landscapes are often very disruptive socially and politically, often at a horrific cost of innocent lives.
How is that funny? Every platform has to block illegal content. Every platform wants to block low value content like spam. Many platforms want to block obscenity and pornography. None of this is in any way news to any of the platforms you’re alluding to.
The interesting distinction between platforms is not whether they moderate, but what lawful and non-abusive (of the platform itself) content they permit.
Edit: Child is incorrect. The vast majority of moderation on free speech platforms is criminal threats and other illegal speech.
The irony comes from the fact that their moderation almost always falls into two categories:
1. They have to moderate the content that got them kicked off the original platform, because it turns out nobody wants to buy ad space on a forum dedicated to why the Jews are responsible for all of the world's evils; and
2. They choose to moderate dissenting political opinions, which is just bald hypocrisy.
It's not illegal content I'm talking about. I'm specifically thinking of sites like Parler and Gab, which very loudly and specifically start out as anti-censorship of anything legal, aimed at members of the American right who feel they're being censored off the regular platforms. Then they quickly learn that no moderation means you'll get absolutely flooded with trolls who aren't fans of your chosen ideology and are willing to spam and troll you. That quickly pushes them to start actually moderating, the exact thing they were created in opposition to, because what they're actually mad about is specific moderation decisions, not the idea of moderation in general.
Exactly. They were never actually mad about moderation in general, they just didn't like getting their stuff modded off the platform. They're fine doing the same thing to others.
Indeed, all of these platforms have been "free speech" but banned pictures of naked women from their inception. They're not free speech in any way, they're just okay with racism.
Can't help but think back to W. Edwards Deming's distinction between after-the-fact efforts to "inspect" quality into the process -- as opposed to before-the-fact efforts to build quality into the process.
OP offers a first-rate review (strategy + tactics!) for the inspection approach.
But, the unspoken alternative is to rethink the on-ramp to content-creation privileges, so that only people with net-positive value to the community get in. That surely means a more detailed registration and vetting process. Plus perhaps some way of insisting on real names and validating them.
I can see why MVPs skip this step. And why venture firms still embrace some version of "move fast and break things," even if we keep learning the consequences after the IPO.
But sites (mostly government or non-profit) that want to serve a single community quite vigilantly, without maximizing for early growth, do offer another path.
Absolutely this. Don't build the ship and then run around plugging leaks --- plan out the ship well enough to prevent leaks in the first place.
This is hard, and rare, because it requires predicting how all sorts of different people are going to interact with the community. Traditionally, this hasn't been something that the people who start software companies are particularly interested in, or good at. And a laser focus on user growth only compounds the problem.
Maybe when the Internet was new. But whether you count the Internet's birth in the 1980s with the original cross-continent and cross-country links, or around the first dot-com boom and bust in 2001, or with the iPhone in 2007, we know how "the Internet" is going to interact with "the community". We knew this back in 2016 when Microsoft released their "AI" chatbot on Twitter, Twitter taught it to be a racist asshole in less than 24 hours†, and the Internet collectively said duh. Of course that was going to happen.
Anyone who's started a new community these days knows they have to start with a sort of code of conduct. That's non-negotiable these days. Would it be better if platforms like Discord did more to address the issue? Absolutely.
You're totally right it isn't easy - but the Internet's a few decades old by now and we know what's going to happen to your warm cosy website that allows commenting. The instant the trolls find it, you either die an MVP or live long enough to build content moderation.
If your blog required a “real ID” to post content rather than allowing anonymous comments, would we have the same problem? The premise of the GP (and one I share) is that the internet’s content moderation problems are symptoms of default anonymity. Twitter is default anonymous, so nobody’s reputation is at stake when they teach a neural net hooked up to the Twitter firehose to be a racist asshole.
In my experience, Facebook comment threads (while still awful at times) are very different from e.g. YouTube or Tumblr or Twitter comments. Sure they still devolve and become a mess sometimes, but from what I remember there was noticeably less "hard" trolling (i.e. 4chan-style derogatory) on Facebook. People still troll lightly on Facebook, but not so much in the explicitly derogatory and assholish ways found in communities where expendable identities exist. In any event, because people use real IDs on Facebook, we can and do impose consequences. Remember when everybody used their first and middle names so jobs wouldn't see their underage party pics with substance use?... And then when everyone's parents and grandparents joined, people just stopped posting that stuff altogether and Facebook "grew up".
In this case, a mix of both is required. While you absolutely must plan ahead and implement as many safeguards as you can prior to launch, that's simply the beginning, and it is incredibly naive to think that all the leaks can be prevented. (Or, honestly, that really any aspect of a community can be perfectly master-planned in advance.) To operate anything like a UGC platform is to be eternally engaged in a battle against ever-evolving and increasingly clever methods someone will come up with to exploit, sabotage, or otherwise harm your platform.
This is totally fine -- you just need to acknowledge this and try not to drop the ball when things seem like they're running smoothly. Employing every tactic at your disposal from the very beginning should be viewed as a prerequisite, one that will start you off in a strong position and able to evolve without first having to play catch-up.
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
In HN articles where we discuss social media moderation there's often this idea that "they shouldn't be doing this at all". But I think for most companies and users ... they won't like what a completely moderation free site looks like.
So here we are with this painful problem.
I kinda wish there was an imaginary "real person with an honest identity" type system that did exist where we could interact without the land of bots and dishonest users and so forth. But that obviously brings its own issues.
> I kinda wish there was an imaginary "real person with an honest identity" type system that did exist where we could interact without the land of bots and dishonest users and so forth. But that obviously brings its own issues.
That sounds like it could be done in a way that isn't terrible. As a user, you sign up with an identity provider by submitting personal documents and/or doing an interview to prove that you're a real person.
Then, when you sign up to an app/service, you login with the ID provider and come up with a username for that service.
The ID provider does not give the website operator any of your personal information; they just verify that you exist (and that you log in through their secure portal).
The identity providers could further protect privacy by automatically deleting all of your personal documents from their databases as soon as the verification process is complete. They could also have a policy to not store any logs, such as the list of services you’ve signed up for.
This could still be gamed (e.g., a phone scammer tricking someone into getting verified with a provider to obtain a valid ID), but it'd make things much harder and costlier.
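To make the flow concrete, here's a rough sketch of the moving parts I'm imagining; all of the names (IdentityProvider, VerificationToken, Service) are made up for illustration, not any real API:

```python
import secrets
from dataclasses import dataclass

@dataclass(frozen=True)
class VerificationToken:
    """Opaque proof that the ID provider vetted a real person.
    Carries no personal information, just a random value the
    provider can later confirm it issued."""
    value: str

class IdentityProvider:
    def __init__(self):
        self._issued = set()  # tokens this provider has vouched for

    def verify_person(self, documents) -> VerificationToken:
        # Check the documents / run the interview, then discard them.
        # (Per the idea above, nothing identifying is retained.)
        token = VerificationToken(secrets.token_urlsafe(32))
        self._issued.add(token.value)
        return token

    def is_valid(self, token: VerificationToken) -> bool:
        return token.value in self._issued

class Service:
    """An app that only needs to know 'this is a vetted person'."""
    def __init__(self, id_provider: IdentityProvider):
        self.id_provider = id_provider
        self.users = {}

    def register(self, username: str, token: VerificationToken) -> bool:
        if not self.id_provider.is_valid(token):
            return False
        self.users[username] = token.value  # no personal data stored
        return True

# Usage: the service never sees documents, only the opaque token.
idp = IdentityProvider()
token = idp.verify_person(documents=["passport.pdf"])
forum = Service(idp)
print(forum.register("throwaway123", token))  # True
```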
Am I missing anything obvious that would make this a terrible idea?
Your idea becomes useless for deterrence from the "automatically deleting" point. The only benefits of having users who are a "real person with an honest identity" accrue when people either can't make many accounts (so banning a user actually bans them instead of simply making them get a new account) or when you can identify them in cases of fraud or child sexual abuse material or some such.
So at the very least the "identity provider" absolutely needs to keep a list of all real identities offered to a particular service; otherwise the bad actors will just re-verify some identity as many times as needed.
But if you give up the "hard privacy" requirement, then technically it's possible. It would also mean that the identity provider would sometimes get subpoenas to reveal these identities.
Yeah, but making it difficult to create multiple accounts increases the power of a ban. It may not catch a CSAM distributor if the ID provider doesn't keep logs, but it will make it easier to prevent CSAM (and other illegal stuff and spam) on a website/app since it's much harder for those people to create multiple accounts.
> So at the very least the "identity provider" absolutely needs to keep a list of all real identities offered to a particular service; otherwise the bad actors will just re-verify some identity as many times as needed.
The website operator is the one independently banning the ID, not the ID provider. Sure, someone could create multiple IDs on multiple ID providers to dodge those bans, but that’s harder to do, and there could also be a fee charged for each verification request.
I could see this also providing ID services for certain niches, like if you want to create a forum for registered nurses only, you could allow registrations only from ID providers that check those credentials. (Or give verified users a special badge, etc). The damn thing could be gamified.
I just worry that, in the real world, this idea will devolve into the dystopian nightmare that is China's social credit system, no matter how hard anyone tries to prevent it. All it would take is one well-placed “think of the children!” argument to justify invasive tracking and data collection/retention.
…but man, if this world didn't suck so bad, a system like that could really help clean up the internet.
Well, this sounds very similar to PKI and domain registrars, so the issues are pretty much the same.
But like PKI, we could have ID providers delegated by higher authorities; the roots could be governments, which could delegate to smaller entities, and so on. Not too different from passports or licences.
Governor Dao has a better working system for this already.
Authenticated anonymous identities. They are currently targeting the NFT space, but this tech could also be applied to online communities.
https://authentication.governordao.org/
Without any background info, that looks like a joke (or scam) site. It asks you to record yourself repeating “The best things in life will always be free”, like some demon is going to kill you in 7 days if you don't share this email, lmao.
Idk what Governor Dao is, but using blockchain tech for identification sounds interesting. However, the tech is seemingly always associated with scammers and criminals, so idk how successful it would be as the backend for a system based on trust and sharing sensitive data.
The thing I was thinking of when it comes to the ID provider is the potential power an ID provider has... if say you wanted to be validated, but still have some anonymity in some cases ... or just if they decide to invalidate you for any given reason.
Granted all very hypothetical stuff, I'd give it a spin for sure.
I suspect that even if we could we would wind up with a disturbing "reverse Turing test". The dark suggestion that there isn't any difference between us and other malfunctioning machine learning arrangements trained by huge data sets. They may be objectively homo sapiens with honest identities devoted to something which makes them act indistinguishable from a bot.
> they won't like what a completely moderation free site looks like.
So you say, but we've never actually had a chance to see one. We have seen content moderation slippery-slope its way to highly opinionated censorship... every time it's been tried.
There used to be a fair chance that you'd stumble on CSAM on 4chan. Without filtering and aggressive moderation, that's what ends up happening (yes 4chan did have moderation to delete the stuff back then and dish out IP bans, but it wasn't fast enough to save people from seeing those things)
One alternative to moderation is to make people pay to post. If you set the price high enough, you'll dissuade most spammers and some other species of asshole. Unfortunately, people who hate moderation also tend to hate paying for online services.
True, this keeps the community small and those work really well without moderation.
The moment it becomes a bit more popular, shit hits the fan. I am still for a light approach. Banning spammers, trolls, and illegal porn posters is a separate issue for me, but many platforms began to delete unpopular opinions.
The most heavy-handedly moderated places are very often the most toxic ones; I think they are far worse than the free-for-all examples. Granted, that might not work for the masses.
This is false. 4chan, some early reddit, voat, and a number of other sites.
Moreover, that's the point: it's as impossible to have a healthy anonymous forum available to the world as it is to have a large society with no laws or government that isn't dystopian.
No moderation = porn, gore and nazi discussion. Period.
"Light" moderation often = bad-faith trolls taking advantage of your moderation.
HN does a good job of moderating for civility and effort of the post, rather than ideology.
Here's the thing: Talking about controversial topics, or debating with people who think differently than you, takes a lot of effort online, because there are so many trolls and others baiting you into defending a position without putting any effort to explain their own.
So yeah, heavy moderation is an unfortunate necessity in some forums.
I don't see it as an unfortunate thing for private spaces. It's very natural, and realistically the span of opinions where you can get productive dialogue going between people is not infinite. You're never going to get any useful discussion from, say, anarchists and neo-nazis talking to each other.
It's not just reddit, though - if you're going to start moderating, you have a meta-problem to address which is what to do about opinionated moderators.
It also puts the moderator under pressure. Users will start issuing demands about what you should ban. In the worst case, law enforcement will too.
I don't really understand why people waste so much time getting something removed instead of just reading something else, but there are whole communities that live for banning others by now.
In a larger legal scope platforms are under heavy scrutiny by moral busybodies right now.
All judges are opinionated. They're humans. I don't see why you're trying to deny humanity to judges.
Read the above and reconsider if your argument makes sense. It stands to reason that if you have any position of "power," even if it's just "random internet moderator," you should at least try to be fair, consistent, and reasonable.
The problem is where the line is drawn. Arguments against explicitly illegal content (like CSAM) or unhelpful content (like spam) are used to justify content moderation that goes far beyond. Moderation on the biggest social media platforms (like Facebook, Twitter, TikTok, etc.) includes more than just basic moderation to make the platform viable. They include a number of elements that are more like censorship or propaganda. These platforms ultimately bias their audience towards one set of values/ideologies based on the moderation policies they implement. And given that most of these companies are based in highly-progressive areas and/or have employee bases that are highly-progressive, it is pretty clear what their biases are. This present reality is unacceptable for any society that values free and open discourse.
Propaganda and bias are ultimately subjective notions. Suggest that kings are no different from the rest of us today aside from what their parents did to seize power and it would be slammed as wicked propaganda denying their divine right. Heck just suggest that disfavored groups are people!
I think the issue is more that these are for-profit corporations who care almost exclusively about advertising revenue. Now, if 10% of your 'product' (i.e., your users) is driving away the other 90% of your product, and you can no longer sell them to your advertisers, well? I'm guessing it's more a business decision; they've decided that eliminating 'fringe views' is best for business.
It's kind of like an advertising dystopian situation, where you want this homogenized audience (preferably rather dumbed down) who will eagerly buy whatever is advertised. And, you don't want to scare any of them away.
Notice this view is not partisan or ideological, it's just pure capitalism, how can we wring the most value out of this setup for the shareholders, that's all that matters. The end result is the current mess we now have, in which nobody but the advertiser is all that happy.
It’s not covered in this post, but IP infringement is another moment in life when content moderation becomes necessary. You have to be above a certain scale for large IP owners to notice or care, but if you’re growing and allow users to upload media, you’ll eventually need to start handling DMCA requests at minimum.
Also worth noting that the infamous Section 230 is what allows companies to take these sort of best-effort, do-the-best-you-can approaches to content moderation without fear of lawsuit if they don’t get it perfect.
We built a service [0] to cover copyright infringement and other forms of abuse, including CSAM and IBSA. While in the US the DMCA is enough, the EU has a new law [1] that goes way beyond the requirements of the previous law (the E-Commerce Directive, which is analogous to the DMCA).
What is infamous about it? People have to lie constantly to attack it and create outright alternative universes with distinctions that don't really exist.
stream.new seems really cool. However there is no account button to see all of your video URLs, or a download option for the video. If there was, I would probably make it my default (not sure if that is what you want)
Right now the lightweight utility aspect of stream.new feels right, but if we continue to build it out as a standalone free product, then adding the concept of an "account" with saved videos makes a ton of sense.
One thing that wasn't obvious to me is, why did you care about uploads of NSFW? As I understood it, you want to become Imgur of video. Imgur only became so big because they allowed NSFW stuff.
Not involved with this project, but there's a couple big reasons most would care about this.
* Child porn and similar content that is a level beyond simply "NSFW"
* Uploaders of NSFW stuff are always in need of a new platform they haven't been kicked off yet, and newer platforms are likely to be dominated by this type of content. Unless you want your platform to gain a reputation as the place for mostly NSFW content, you probably don't want this.
* Porn is almost always posted in violation of copyright.
* Hosting porn opens you up to legal issues if you can't verify that everyone involved is adult and consenting.
* Payment processors, hosting companies and other service providers you rely on usually have strict policies excluding porn.
And that's just addressing legal pornography, not other "NSFW" content like child abuse, animal abuse, general violence or gore. If you run a large enough public user generated content service people will use it to distribute illegal content or flood it with jihadist execution videos to ruin someone's day.
If a site is specifically intended for, or is incidentally used in the course of, professional work, then NSFW content can literally get your users fired if it is seen, logged, or otherwise detected.
That turns out to shift your user demographics in the medium-to-longer term, and not in a direction that's generally compatible with high-quality and engaging content and interactions.
Where NSFW content is permitted, it should be specifically tagged as such, and not presented unless users specifically opt in to it.
Great write up! Curious if you/Mux have ever considered offering content moderation alongside content hosting? Seems like most platforms do one or the other, but I imagine you could charge quite a premium if you offered both in tandem.
Another particular headache as you get bigger is that humans are better for accuracy, but less consistent. No matter how specific you think your rules are, there will be edge cases where it isn't clear what side they're on. Different human moderators may rule differently on the same content, or even the same moderator at a different time of day. When users find these edge cases, inevitably somebody will get upset that you blocked X but not Y.
And then you have to keep the actual abusive users from figuring out a way to leverage moderator X usually approving their just barely over the line images if they ever figure out how your approval requests are routed.
Interesting article. I wonder how many UGC platforms got away with a third-party solution to help their moderation. Feels like a core part of the business that directly impacts whether it sinks or floats.
> It’s the dirty little product secret that no one talks about.
Hmmm. I’d say it’s the first thing people have in mind when UGC comes on the table. A bit like how nobody thinks lightly of storing credit card info; that’s part of the culture at this point, I think.
The opt-in-to-see-comments and user-based-approval method has been tried before. It's just pre-banning, and it always fails to gain steam on websites because it achieves what banning aims to achieve: it places a long artificial delay on time-sensitive posts. Many better places with more rewarding interactions exist to achieve the same end.
Something tells me that these roast-posts (or "this shouldn't be in anyone's feed"-posts) are not going to attract any interesting talk. There is no fun in shouting moral disapproval in an empty chamber.
Agreed. I don't want to gain steam though. I want to facilitate meaningful, one-on-one, anonymous discussion, through minimal interactions with my website. Showing the resulting discussions to the general public only comes second.
Flame wars, abuse, etc. only happen if there are other people watching. Why would anyone try to abuse someone about whom they have next to zero info, for no personal gain, when no one can witness their triumph anyway?
I was replying to your assertion that ‘Flame wars, abuse etc. only happen if there are other people watching’. DMs are an example of a major abuse vector where other people are not watching.
I understand. However, on my platform there isn't even a username, so it is very hard to follow people around. Also, there is a one-reply-per-node limit and a per-user comment limit per day to cap the volume.
Anyone must be able to take a small amount of abuse to survive this world. I can't eliminate abuse but I can increase the friction of abuse and make abuse less satisfying or rewarding. That's all I want to do.
There are genuinely mean people, and people financially or psychologically incentivized to be mean. Most, if not all, mean people I encountered in my life belong to the second group. Maybe you live a much more interesting life than mine. I dunno.
You will get dominated by bot-generated spam at the very least. The bots don’t care that their content will be seen by one person, they just hammer every form they find, and the recipients will be overwhelmed having to delete this, and they will criticise your platform.
As well as bots, you will get people manually hitting it. They’ll often use software customised to spam you.
There is zero chance this model survives without any sort of moderation.
I am flattered. Let's say my platform somehow gets popular, and bots are hitting it for no financial reason at all; they still get the one-reply-per-node limit. Also, I have a global per-user replies limit per day.
If you have an orchestrated army of bots hitting me, then I would get DDoS'ed, just like everyone else. However, that has nothing to do with content moderation at all, right?
You don't need to be popular or flattered. If you are linked from anywhere and in search engines, they will find you. And they will reply to every node they can. And they'll create new user accounts to submit more.
I've gradually built my setup to filter based on words and URL fragments, block IPs, use honeypots, shadowban, etc., and they still get through. Take away all content moderation and I would be overrun to the point that my site was useless to my users. Those sorts of methods are the easy bits too; dealing with grey-area trolls is the worst.
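For the curious, the layers stack up roughly like the sketch below (heavily simplified, with made-up word lists, URL fragments, and IPs; the real setup has far more lists and special cases):

```python
import re

BAD_WORDS = {"viagra", "casino"}            # illustrative only
BAD_URL_FRAGMENTS = ("bit.ly/", ".xyz/")     # illustrative only
BLOCKED_IPS = {"203.0.113.7"}
SHADOWBANNED_USERS = {"spammer42"}

def check_submission(user, ip, text, honeypot_field):
    """Return 'reject', 'shadowban', or 'accept' for a new post."""
    # 1. Honeypot: a hidden form field that humans never fill in.
    if honeypot_field:
        return "reject"
    # 2. Blocked IPs.
    if ip in BLOCKED_IPS:
        return "reject"
    # 3. Word and URL-fragment filters.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    if words & BAD_WORDS:
        return "reject"
    if any(frag in text for frag in BAD_URL_FRAGMENTS):
        return "reject"
    # 4. Shadowban: accept the post but show it only to its author.
    if user in SHADOWBANNED_USERS:
        return "shadowban"
    return "accept"
```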
Anyone in this world has to be able to take a small amount of abuse to survive. I am not trying to eliminate abuse, I am trying to take away the positive feedback loop that amplifies injustice.
It is like email. People can spam you. But if they know nothing about you, the spam never gets acknowledged, each spam attempt incurs some small delay and inconvenience for them, and they can never reap any gain, then eventually they will stop.
I considered doing a moderation as a service startup a couple years ago. I didn't end up doing it because came to the conclusion that global communities aren't the future. I think platforms supporting more silo'd communities that make and enforce their own rules are how it will look. Discord and Twitch use this model, and while they have their problems, the problems look quite different from the ones outlined here.
Silos work really well. Discord may have literal terrorist / hate groups on it, but as a user I do not see any of it and the platform only contains a group of my friends. There is no marketing, no spam, only content your friends have posted. And then discord can sit in the background deleting ToS violating groups while I don't see or care about any of it.
> platforms supporting more silo'd communities that make and enforce their own rules are how it will look
Those individual communities need help enforcing their own rules.
I would happily pay a couple bucks per months for a bot that could warn and ban people on my discord server.
I agree that this is the future. I even feel US politics has suffered from being hoisted onto a global stage. When everyone on the internet (the globe) can weigh in on or muck with the politics of your smaller community (country or state), you’re going to get into situations that make it hard to practically make decisions and run a country. One of the foundational principles of the US is the ability to justifiably oppress minority factions for the good of the majority, but checked by systems of power distribution so that it’s not simply mob rule, and limited so as not to impinge on a set of inalienable rights afforded to all citizens. Yet on the global theatre the assumption is that minority opinions now take precedence over the majority. And what’s worse, 100 people screaming on Twitter now has the same impact as 200,000 marching on Washington (to be clear, 200k marching on Washington is significant and should matter; 200 people on Twitter should not).
So what? Well, now, when we need to oppress minority factions more than ever in the face of a public health crisis and tell people sorry, suck it up, you live in America, where the majority says to mask up and get vaccinated if you want to be in public, we “for some reason”, at critical moments in curbing the spread of the pandemic, fumble around for months on end because a few anti-vaxxers all of a sudden have infinite civil liberties and a global platform (note: one that they didn't have when we solved previous public health crises). My fear is that we’ve become a society of “piss off, I can do what I want” rather than one of calculated and ideally minimized oppression.
I also don’t understand as a society why we have to hold platforms accountable for content. If the problem is a bunch of illicit material showing up, implement KYC requirements so that individuals are exposed legally to the consequences of posting illegal material. Anonymity is a tool/privilege to be used, not abused, and distinctly not a fundamental human right in the US. Make the default less anonymous (but still private, that is something we’re supposed to care about constitutionally) and I suspect a lot of content moderation problems go away.
> 100 people screaming on twitter now has the same impact as 200,000 marching on Washington
It doesn't, though, unless at least one of those 100 has a giant following; but “one person with a media megaphone is louder than 100,000 without” isn't new, it's older than radio competing with newspapers.
I believe a group took a look at how viral topics and memes emerged on twitter and simply having something like 50-100 people participate in your hashtag or retweet some content was enough to land you on trending. Their conclusion was that twitter disproportionately represents reality. Wish I could recall where I read this but it definitely made its way through HN in the last year or two.
And anecdotally, we see people de-platformed and/or removed from their jobs because some company’s HR department got wind of a Twitter stink and the company made the calculation that the feelings of a few people on Twitter are so meaningful that it warrants terminating an employee. Unless that employee was seriously not pulling their weight, there’s no way appeasing those people on Twitter is more valuable than the work your employee is doing plus the cost of hiring a new one. Take that Boeing guy, retired military, who was removed from his position because of a 15-year-old post about how he disagreed with putting women in combat roles for biological reasons…
Point is people’s reality is shaped by what trends on the internet and what trends on the internet is at global scale when you run a global platform/community. This global context is good in some regards e.g. the proliferation of cultural exchange and appropriation but can also be somewhat of a nightmare to manage when trying to solve logistical problems like moderating content or writing laws. I am dubious we should be entertaining “local” politics on global platforms e.g. twitter. Seems like an impedance mismatch to me.
While I think you make good points, my reasoning is far more simple: people just enjoy communities that feel smaller, more intimate, and police themselves. This is a human nature thing regardless of whether we're talking digital or physical worlds.
Our MVP needed content moderation. We put a database tool up and immediately, our first user started using it to create a public-facing database of his porn collection. It was... quite the collection.
As someone working on a platform with content moderation as a core feature, it is so much work. I thoroughly understand why so many platforms ignore it for so long.
Thankfully, we have some nice tools these days. I use Google's Perspective API to automatically hold back text input for manual moderation, which takes a lot of the man hours out of it for my moderation team.
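Roughly, the integration looks like the sketch below. This is from memory, so double-check the current endpoint and field names against Google's docs; the 0.8 threshold is just an arbitrary cutoff for illustration, and the API key is a placeholder.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def needs_manual_review(text: str, threshold: float = 0.8) -> bool:
    """Hold a comment back for human moderation if Perspective
    scores it as likely toxic."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    resp = requests.post(URL, json=body, timeout=5)
    resp.raise_for_status()
    score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
    return score >= threshold
```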
The rest is handled by the users of the platform themselves, and metrics about content reports to curtail abuse.
LOL at the title, I think the silver lining is your moderation becomes a barrier to entry/competitive advantage if done well and kept hidden. It's one of the last things in software that's hard to copy.
There is an untold story here which is how incredibly well HN is moderated. Don’t know how it can remain so good. I feel like the center cannot hold. The site seems to me hideously understaffed, yet they do a pretty much perfect job of moderating. Would love to know if it is all human, supplemented by ML, or what.
Ages ago on This Week in Tech, Leo Laporte and perennial guest John C. Dvorak discussed the notion (in the context of Yahoo Groups at the time) that nearly every hosted content platform starts default-open to maximize the size of their user base but eventually passes over the "horse porn event horizon." Sooner or later, a critical mass of users with specific predilections would find the forum and use it for communicating on their, uh, topic of choice. And the owner at that point had two options:
- Do nothing and let the forum stay as open as it had been
- Crack down on that content, so the platform could ever be ad-supported or supported via mainstream sponsorship or partnership, or be purchased by a bigger company
One interesting approach I heard about in this domain is the Trolldrossel (German for "troll throttle") by Linus Neumann from the CCC. He implemented a captcha for a comment server that would fail with a certain probability when it encountered certain keywords in the comment, even when the captcha was solved correctly.
While I have no notes about the effects, and the corresponding talk seems to have vanished from the internet, it supposedly worked quite well by forcing 'obscene' comments through additional rounds of captcha without revealing that this was the reason they failed, thus demotivating the submitter.
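As I understood the idea, the mechanism amounts to something like this (my own reconstruction, not Neumann's actual code; the trigger words and failure probability are placeholders):

```python
import random

TRIGGER_WORDS = {"idiot", "traitor"}   # whatever counts as 'obscene' for you
FAIL_PROBABILITY = 0.7                 # chance of a fake failure per attempt

def captcha_result(comment: str, solved_correctly: bool) -> bool:
    """Return True if the captcha 'passes'.

    Even a correctly solved captcha fails with some probability when
    the comment contains trigger words, so the author just sees yet
    another failed captcha and (hopefully) cools off or rewrites."""
    if not solved_correctly:
        return False
    words = set(comment.lower().split())
    if words & TRIGGER_WORDS and random.random() < FAIL_PROBABILITY:
        return False
    return True
```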
I think this is a really bad solution that only reinforces bad practices of hiding content. Even worse than direct censorship, it generates more problems than it solves.
When everyone knows about it, I agree. As long as it is not mentioned anywhere, and people are not writing hundreds of comments, it's just a failed captcha that appears more often when someone writes something in range, possibly forcing them to rethink their argument. It's no solution for everything, more an interesting thought.
It drives me absolutely nuts when I encounter a video platform upstart that has not adequately prepared (or prepared at all) for the inevitable onslaught of undesirable and illegal content that users will soon start uploading if the platform has really any traction at all. No UGC site/app is immune. Even when prepared, it is an eternal, constantly-evolving battle as users find more clever ways to try to hide their uploads or themselves. If you aren't ready for it at all, you may never be able to catch up. And while a lot of the undesired content could just be really annoying to get rid of, some is catastrophic -- a user uploading a single video of something like child porn that is publicly visible can be the death knell for the platform.
I’m going to go ahead and refute some of the counterarguments I’ve heard a million times over the years just to get it out of the way.
“It could be a while before it’s necessary.”
People seeking to upload and share unsavory content are constantly getting kicked off every other platform for doing so, and thus are always on the lookout for something new to try where they might be able to get away with it, at least for now. They are the earliest adopters imaginable.
“Just let users flag content”
Lots of issues here, but here’s a couple big ones.
1. You cannot afford something like child porn to be visible long enough to be flagged, or for it to be seen by anyone at all. If something like this gets uploaded and is visible publicly, you could be screwed. I worked on a video platform once that had been around a couple years and was fairly mature. One video containing child porn managed to get uploaded and be publicly visible for about one minute before being removed. It was a year before the resulting back-and-forth with federal agencies subsided and the reputation of the platform had recovered.
2. People uploading things like pirated content tend to do so in bulk. You might see people uploading hundreds of videos of TV shows or whatever. It may exceed legitimate uploads in the early days of a platform. You do not want to burden users with this level of moderation, and actually they aren’t likely to stick around anyway if good videos are lost in a sea of crap that needed to be moderated.
“We’ll just use (some moderation API, tool, etc.)”
Yes, please do, but I’m not aware of anything that works 100%. Even if you filter out 99% of the bad stuff, if the 1% that gets through is kiddie porn, say goodnight. These tools get better all the time, but users who are serious about uploading this kind of stuff also continue to find new and interesting ways to trick them. As recently as 2017 a pretty big video platform I worked on was only able to stop everything with a combination of automated systems as well as a team overseas that literally checked every video manually. (We built a number of tools that enabled them to do this pretty quickly.)
“Content shouldn’t be moderated”
Child porn? Hundreds of pirated episodes of Friends instead of legitimate user videos? (Even if you are pro-piracy, you don't want to pay to host and serve this stuff, and you don't want it to distract from legit original content from your users.) What about when some community of white supremacists gets wind of your new platform and their users bomb it with all their videos?
Do not take this stuff lightly.
EDIT: I've spent most of the last decade as an engineer working on UGC and streaming video platforms
> 2. People uploading things like pirated content tend to do so in bulk. You might see people uploading hundreds of videos of TV shows or whatever. It may exceed legitimate uploads in the early days of a platform. You do not want to burden users with this level of moderation
Not to mention that viewers aren't likely to flag the complete discography camrip of My Little Pony unless they're stupid or have an axe to grind (either against the IP that was uploaded, piracy in general, or the specific uploader). The viewers are often drawn to platforms specifically because they are flooded with piracy in their early days.
Exactly. Setting aside all legal concerns and whatever anyone's philosophy is about piracy or moderated content, you still have the enormous concern about what kind of community you are fostering and what kind of people you are attracting based on what content you allow to be surfaced.
All the Reddit alternatives are each an example of why the early community matters so much. Being a piracy haven is probably the "best" outcome in terms of community-building compared to all the other common fates of low-moderation websites in growth mode.
> It was a year before the resulting back-and-forth with federal agencies
You're blaming lack of content moderation and not a law enforcement system that holds you responsible for something you had no control over when it actually failed to do its own job in this case?
> a law enforcement system that holds you responsible for something you had no control over when it actually failed to do its own job in this case?
Investigating these issues is their job. They don’t show up assuming the site operator is the guilty party, but they do need their cooperation in collecting evidence so they can pursue the case.
It’s analogous to a crime being committed on your property. They don’t show up to charge the property owner for a crime someone else committed, but they do need access to the property and cooperation for their investigation.
We weren't held responsible, but it was still investigated and required our cooperation and was not the best use of our resources. Honestly, the public reputation part was far and away the more unfortunate consequence.
Trust me, I have numerous concerns around the legal issues and the chain of responsibility, but what choice do you have? Are you going to start a fight with them out of principle and hope this works out in your favor? While still devoting the time and energy to the video platform you set out to build in the first place?
I agree. My employer has a moderation product (for comments, usernames, etc): https://cleanspeak.com/
I don't work with it much, but from what I can see it's surprisingly complicated to filter out comments quickly without impacting user experience. I guess you know you've succeeded when the pottymouths join your platform :).
These kinds of things never work because people will come up with infinite dog whistle terms for blocked terms. Things like "egg plant", "13%", etc. You couldn't possibly block all of these things without blocking a lot of legitimate discussion.
Hmm. Tell that to our customers, who seem to think it effective enough to purchase. :)
I think that you are not aiming for 100% automated moderation with a tool like this. Rather, you are looking to screen out the obvious stuff as quickly as you can, and then escalate the iffy content up to humans. But I speculate a bit.
Since I'm not familiar with the Cleanspeak platform, happy to set up a call with you and our sales team. My contact info is in my profile. :)
egg plant is a dog whistle? It is the "official" name for my google account. I am Mr. plant. I don't associate anything with it aside from a vegetable I don't particularly like.
I think this article glosses over some big issues with user generated content: auto-moderation is extremely fickle, humans are expensive and if your platform is large enough people will upload horrifying stuff.
There have been several stories about the poor treatment of content moderators by companies like Google (YouTube) and Facebook but to reiterate: the work can be literally traumatizing. Content moderators will be exposed to gore, child abuse and worse, frequently.
If your startup will need content moderation, think about how you will tackle it and consider the human cost of your solution. There's a trend in tech to either use AI or "fake" AI by using underpaid contractors with minimal labor rights (often in low wage countries). If your content moderation will run on humans, consider the toll and make sure they are compensated adequately and have access to mental health aid. Or better yet: consider whether you need to support massive scale user generated content to begin with. Growth isn't everything.
Really, the future of content moderation is feeds published by site operators and volunteer moderators that individual readers can opt in to or out of for filtering.
Relying simply on a central authority to decide what you should be allowed to read is a system with utterly predictable failure modes (not the least of which is too much volume for the centralized mods).
I run an online marketplace. It's a constant battle against scammers putting up fake items to sell. While I do run "content moderation" to identify the scams, the fakes are identical or nearly identical to the real items, so content moderation isn't the solution for me. As other commenters point out, it's just a war of attrition, a cycle of escalation against a few bad actors.
The only effective method I have now is fingerprinting (i.e., invading users' privacy). Browsers are becoming more privacy oriented, so as time goes on fingerprinting will become less effective and more people will be scammed online. I don't think those who want privacy at all costs understand the trade-off.
In a few months I will move to a voluntary fingerprinting/identification scheme (like the GDPR cookie opt-in), where you identify yourself or you don't use my website... which may leave me as a "die an MVP" example.
That would replace the now [sufficiently abused] abuse reporting system.
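To make that concrete, a crude server-side version of the kind of fingerprint I mean might look like the sketch below (Python; the chosen signals and the /24 coarsening are illustrative, not my actual production setup):

    import hashlib
    import ipaddress

    def fingerprint(ip: str, user_agent: str, accept_language: str) -> str:
        """Hash a few request attributes into an ID that survives new accounts/emails."""
        network = ipaddress.ip_network(f"{ip}/24", strict=False)  # coarsen the IP a bit
        raw = "|".join([str(network), user_agent, accept_language])
        return hashlib.sha256(raw.encode()).hexdigest()[:16]

    seen_scammers = set()  # fingerprints of devices/networks already caught posting fakes

    def looks_like_repeat_offender(ip: str, ua: str, lang: str) -> bool:
        # Flag new listings coming from a fingerprint we've already banned.
        return fingerprint(ip, ua, lang) in seen_scammers

Real fingerprinting adds browser-side entropy (canvas, fonts, timezone), which is exactly the part that privacy features are eroding.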
What's going on? TPTB should fund these moderation companies ASAP; they could become a more centralized and hidden censorship tool: a SaaS that all companies at sufficient scale require for [lawful] moderation.
One old story I remember is about how Lego built their own virtual-world platform and then spent an enormous amount of developer time figuring out how to programmatically detect when user-created content turned out to be shaped like a penis.
It really frustrates me that this level of abuse is just accepted as a fact of nature.
It feels like watching masked bandits stick up a bank then walk away casually to the next bank down the street while the bank manager says "Drat! Too bad we didn't stop that one at the door. Oh well, at least they only got one register"
I know all the responses to this are going to be "mUh PoLicE sTaTe" but I really wish there was some system of accountability for breach of trust online.
I don't. The advantages are overwhelming; people have just forgotten to appreciate that there isn't some form of central information control.
We see a lot of moral panics and authoritarian ambitions right now, and the fact that information control is so hard to come by is a huge blessing. If we were more responsible here it might look different, but that isn't really the case and is probably unrealistic anyway.
For politics, clout or attention, some people would love to get rid of some other people, often just because of disagreement. It is good that they cannot. We see companies firing people because they aren't a good look. Thankfully that doesn't work with internet users.
If I get abuse online, defense is in most cases very trivial. Granted, that is a bit different for public personas; this behavior isn't restricted to online content, and it's why prominent people often have to employ PR. But that isn't a reason to worsen the net as a whole for everyone.
If I only want to consume curated content, there are countless venues to do that. Sadly and curiously, people show up in places where content isn't moderated and start to complain.
I believe you have misinterpreted (understandably, perhaps, given my comment's wording) what I meant by "abuse".
I meant "abuse" as in deliberate hostile misuse. Things like XSS, penetration scans, bruteforcing logins, phishing messages, etc.
Constant penetration/vulnerability scans are baseline background noise of the internet. The best-case-scenario is that the scans find no vulnerability.
The point I was trying to make is that I would prefer a world where existing online as something other than data-livestock felt a little safer.
I'm building a human content moderation service as an API for developers. It's not fully ready yet (expected next week) but people can sign up and start exploring the docs. I'd love to hear your feedback and any features you might want to see:
I should clarify that on the home page. Thanks for the feedback! We are planning to review images and flag them for objectionable content e.g. explicit, violent, etc.
Noodling on some thoughts about how technology works --- its fundamental mechanisms, not what it is --- I'd come up with a list of eight factors that seemed central. And then I thought for a few days and came up with a ninth: hygiene.
In building any technology, what comes to dominate after a time are the unintended or unwanted consequences. This is particularly true of any networked system, in which as the network grows, various cost functions emerge and begin to overwhelm the value prospects.
I suspect that those cost functions are relatively constant per node but also tend to increase over time. The result is that small networks can grow quickly (the large positive per-node values dominate the costs), but as the node count rises, marginal value decreases (see the Tilly-Odlyzko refutation of Metcalfe's law: v = n log(n)), and the cost functions increase, both as bad actors are attracted to the system and as interactions between nominally good-faith actors become increasingly high-friction.
The consequence is that a "no-moderation" or "light-moderation" policy works really well, until it suddenly doesn't.
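A back-of-the-envelope sketch of that crossover, using the n log(n) value form with completely made-up cost constants, shows the same "fine, fine, fine, suddenly underwater" shape:

    import math

    K_VALUE = 1.0        # value coefficient (invented)
    BASE_COST = 2.0      # fixed per-node cost (invented)
    COST_GROWTH = 0.002  # extra per-node cost that scales with network size (invented)

    def net_value_per_node(n: int) -> float:
        value = K_VALUE * n * math.log(n)          # Tilly-Odlyzko-style value
        cost = n * (BASE_COST + COST_GROWTH * n)   # costs creep up with scale
        return (value - cost) / n

    for n in (10, 100, 1_000, 10_000, 100_000):
        print(f"n = {n:>7,}: net value per node = {net_value_per_node(n):+8.2f}")

With these toy numbers, per-node value peaks in the hundreds of nodes, goes negative in the low thousands, and keeps falling from there --- roughly the point at which "light moderation" stops being viable.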
This has parallels to other network-based structures, most notably cities. Without sewage, crime-control, public-health, waste-removal, or pollution-control systems, cities can grow fairly rapidly, up to some maximum viable size. After that point, not only do already-extant issues become intolerable (or more specifically: unsurvivable), but new harms co-evolve with the cities themselves and emerge within them. (See Kyle Harper's The Fate of Rome for a wonderful exploration of this concept.)
For online UGC platforms, this presents as various forms of activity that directly attack the trust and safety of the platform itself. Ultimately, growth of the platform is often limited by these effects.
Incidentally, a new professional organisation formed last year to address these issues specifically, the TSPA (Trust and Safety Professional Association). It's still finding its feet, though it's begun organising both a training curriculum and a library of practices and references on the issues, processes, considerations, and legal factors involved.
At this point I'm surprised companies aren't blanket-requiring phone numbers so they have something a little more concrete to ban, much less going whole hog and demanding government-issued ID like drivers' licenses or something.
Indie Hackers recently went invite-only in their community because of this spam cat-and-mouse game. It's a really lame thing to have to spend effort fighting.
Because if you don't do any content-moderation your site will turn into a horrid wasteland that most of your users don't want to be on, thus defeating your purpose if your purpose is making money or any other reason to attract users.
If your purpose is being 8chan, you'll be good.
> Slashdot is a very peculiar case of near no spam filtering, yet very good user content moderation
So, you're saying they have very good user content moderation. Which means they have content moderation. Sounds like it's human (rather than automated), and they have unpaid volunteers doing it. That's a model that works for some. A model of... content moderation.
Reddit rule of thumb: the downvoted comments are far more likely to introduce a new idea than the upvoted ones. However, it's like comparing the likelihood of dying in a car crash instead of a plane wreck: both events are rare enough as not to be worth considering in planning your daily life.
> What about not treating users like kiddies needing supervision?
There are two kinds of "users" here - writers and readers. As a reader, in this world with the humans we've got, I do not want to be subjected to everything someone wants to write. It makes a platform unusable for readers who want to read something that isn't a troll, spam, or propaganda.
The trick is to do that without crimping the users who are writers...
Here's a thought. Have a platform where identity is verified. Users can post publicly or within their circle. Any illegal or fraudulent content can be handled by the legal system due to the lack of anonymity. Beyond that, let users form groups for topics like reddit.
It incentivizes homogeneity and greatly decreases the amount of discussion on controversial topics. If posts are permanent, tied to your identity, and potentially subject to legal punishments, people with minority opinions become much more skittish.
That could be considered only a risk if the moderation is bad, but bad moderation becomes more likely over time due to feedback loops. An optimally permissive moderator will risk inviting an overly strict moderator due to that permissiveness. An overly restrictive moderator will not. There is a greater likelihood of moderation becoming increasingly restrictive over time because the moderation narrows the pool of moderators.
What I'd like is something identical to Facebook (preferably hosted in a box on my desk) that I invite all my friends to, and they have the ability to invite people to. 2nd gen is probably far enough.
Anyone cuts up rough, I go by their house and make them miserable.
You lose out on the network effect with this solution.
If someone gets to pick who's allowed onto a network, then they won't bother using it. Maybe your friends will join since they are allowed to add _their_ friends, but those friends of friends wouldn't bother because most of their friends can't join, meaning your friends won't bother either unless they really want to talk to you specifically.
It would work for groups where most people know each other, but there are already options for that that allow users to be in groups that aren't all owned by the same person e.g. Discord, Signal, Facebook.
A total user base of 50 or so would be fine. The trick is to find a group where everyone (mostly) wants to interact. An extended family could probably work.
Probably acceptance. You can create that platform, but I wouldn't want to use it personally.
But I see room for that. I would prefer it for kids until they reach a certain age and can decide for themselves. Granted, I would have a lot of requirements for the identity provider. This requires a lot of trust, and I don't really see many candidates on the horizon here.
But universally for the net it would be a huge setback. It would end up like TV and interactivity isn't the only reason it got less popular.
i.e. illegal or fraudulent content is clearly owned by the user irongoat, and you know who irongoat is (at least you know their email, IP address) but no-one on your site knows who irongoat is.
If the content is bad enough, authorities will get in touch for you to tell them what email address irongoat is associated with.
But as long as the content is OK, irongoat can say what he/she wants, with no PII visible.
I like the general idea, but one place where it falls down is free-press situations: for example, if you are a whistleblower or a dissident. Another example is if you are a battered spouse and want to discuss it without the batterer being able to identify you.
We tried that with "real ID" policies. It just made people commit to their shittiness openly. Not to mention they can always repudiate it. Even if we go with full-fledged cryptography, the opsec will fail at scale.
1. Assholes will be assholes under their real names.
2. The politically, economically, and/or socially disadvantaged are further disadvantaged under such regimes. Yonatan Zunger, chief architect of Google+, put that quite clearly: "In practice, the forced revelation of information makes individual privilege and power more important. When everyone has to play with their cards on the table, so to speak, then people who feel like they can be themselves without consequence do so freely -- these generally being people with support groups of like-minded people, and who are neither economically nor physically vulnerable. People who are more vulnerable to consequences use concealment as a method of protection: it makes it possible to speak freely about controversial subjects, or even about any subjects, without fear of harassment."
3. Laws and regimes may change. Ask the inhabitants of Germany circa 1933, or those of Afghanistan circa 2021, among numerous other examples.
4. Groups themselves can lead to and engage in destructive activities. See the case of Reddit's increasing problems with brigading and other hostile activity amongst subreddits, resulting in numerous such subreddits being banned.
5. The consequences of a toxic media landscape at scale are felt by those considerably removed from the platform(s) themselves. Twitter cheered the Arab Spring protests. Facebook are attempting to distance themselves from their very active role in the Myanmar genocide. History shows that changes to media landscapes are often very disruptive socially and politically, often at a horrific cost of innocent lives.
This is a solved problem now that you can ask for a small Bitcoin payment over the Lightning Network. There's no way to game that. We deployed it on our group chat and it stopped 100% of the spam.
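For the curious, the gate can be as dumb as the sketch below. The `LightningClient` wrapper here is hypothetical rather than any particular node's real API (lnd, Core Lightning, etc.), so treat it as pseudocode with imports:

    import time

    POST_PRICE_SATS = 100  # illustrative anti-spam fee

    class LightningClient:
        """Stub for whatever LN backend you actually run; methods are placeholders."""
        def create_invoice(self, amount_sats: int, memo: str) -> str: ...
        def is_paid(self, invoice: str) -> bool: ...

    def accept_post(ln: LightningClient, author: str, body: str) -> bool:
        """Only publish once the invoice settles; spam now costs real sats per message."""
        invoice = ln.create_invoice(POST_PRICE_SATS, memo=f"post by {author}")
        deadline = time.time() + 600          # give the poster ten minutes to pay
        while time.time() < deadline:
            if ln.is_paid(invoice):
                print(f"publishing post from {author}")   # real code would store it
                return True
            time.sleep(5)
        return False                          # unpaid posts are silently dropped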
Okay, so disregarding that label, most platforms will have a target or focus or niche (well, the ones that want a chance of surviving, anyway) and they will thus be very wise to tailor the rules around that, and create the conditions ideal for fostering that type of content.
For instance, if you were starting, say, a TikTok-esque video app but for super-quick tutorial videos, wouldn't it make sense for upload criteria to require it be some sort of tutorial, stay within some time limit, and probably not be just a gratuitous video of a bunch of people having sex? Call it whatever you want -- "NSFW" is just a shortcut, a heuristic that most people understand the meaning of regardless of whether it is actually safe or unsafe at their place of work. But there can be no denying that platforms/communities serving some interest or demographic will have their own unique requirements, their policies and standards will reflect this, and very often this will preclude "NSFW" content.
A lot of people get bent out of shape about this and view these sorts of policies solely as some sort of censorship issue, but many fail to realize that most of the time it's just about creating the ideal conditions for the community/platform to take hold.
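To be concrete about the hypothetical tutorial-video app above, the "policy" half of that can be as small as a validation function like this (thresholds invented for the example):

    from dataclasses import dataclass

    @dataclass
    class Upload:
        duration_seconds: float
        category: str        # e.g. "tutorial", "vlog", "other"
        nsfw_score: float    # 0..1 from whatever classifier you run

    def meets_upload_criteria(u: Upload) -> bool:
        # The niche defines the rules: short, on-topic, not gratuitous.
        return (
            u.category == "tutorial"
            and u.duration_seconds <= 90
            and u.nsfw_score < 0.2
        )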
Isn’t this the opposite of “listen to your users”? If you built a platform and you think it’s for x, and your users use it to do y, isn’t y what you should be focusing on, in the sense of Lean Startup/Build Something People Want?
Just because you are legally entitled to do something doesn't mean it's good for your users or society: e.g. the widespread practice of IAPs, or Apple's censorship of the App Store.
Freenet/LBRY/Tor hidden sites all exist (and get used all the time) and it's 100% not required there at all. I hope at some point this weird moralization of nudity will stop.
Have you gone on darknet sites? They have moderation too, or else they get filled with CP and terrorist propaganda just like every other service. I guess that's "fine" if you're anonymous and don't think the FBI will find you. But if you're running a business on the clearnet there's a real name and address and there will be real life consequences. The FBI gets interested real fast if you don't moderate posts that encourage terrorist acts.
Even porn sites need moderation. Trying to stop sexual abuse and child pornography isn't weird moralization.
> and it's 100% not required there at all
I'm not super familiar with the darkweb, but I assume that darkweb platforms also have active moderation, even if it's only to keep griefers out. Pornography is not the only use case for moderation.
>Trying to stop sexual abuse and child pornography isn't weird moralization.
How does moderation of content prevent sexual abuse or CP? If anything I'd argue it creates more, because those that seek the images instead have to produce their own if they cannot find them.
The article is about business running platforms with UGC.
While free forums on the darknet might get away with a tad more lax policies, if you’re a registered business hoping to make any money you won’t have a choice but to moderate in some way. At the very least it will be to follow your country’s laws, and more often than not your clients will require you to do so.
I mean outside of what the law prescribes - that's a whole different topic, with more grey area than not. If it's what the law prescribes, then I'm not contesting it.
> you won’t have a choice but to moderate in some way
This is the way moderation has been done for the past 10-15 years, but does it have to be? Why couldn't a platform provide user-level controls over what people see instead of making those decisions for them? Early forum software actually did a decent job of this, and I remember building phpBB extensions that enabled more user-level control. Even with this you can go from super granular down to just a couple of primary options. It becomes a tagging/filtering mechanism on behalf of the client.
Edit: UGC platforms may also discover that there's some value in seeing which filtering options their users actually use.
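A rough sketch of what that client-side model looks like - the platform only tags, each reader decides what to hide (names and fields invented for the example):

    from dataclasses import dataclass, field

    @dataclass
    class Post:
        author: str
        body: str
        tags: set = field(default_factory=set)   # e.g. {"nsfw", "politics", "meme"}

    @dataclass
    class UserPrefs:
        hidden_tags: set = field(default_factory=set)

    def visible_posts(posts, prefs: UserPrefs):
        """Filter on the reader's behalf instead of deleting for everyone."""
        return [p for p in posts if not (p.tags & prefs.hidden_tags)]

    feed = [
        Post("alice", "conference recap", {"tech"}),
        Post("bob", "tasteless joke", {"nsfw"}),
    ]
    print(visible_posts(feed, UserPrefs(hidden_tags={"nsfw"})))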
To be clear, just following legal requirements is no simple task in most countries, and it might already require a significant moderation effort depending on how motivated your users are.
> does it have to be?
In most cases moderation is less about what users want or don’t want to see, and more about what you want your platform to be.
For instance people are OK with product suggestions when they go on Amazon, but if your job posting site becomes an endless stream of Amazon links you’ll want to curb that. And perhaps your users find interesting products in all these links, but from your perspective it will kill your business (except if you pivot into becoming a product listing site of course)
The internet itself is unmoderated in any useful sense for content, yet it has lived longer than most of these cheesy "moderated" products that seek to impose their morality on you.
It looks like you're getting downvoted, but I think this is a good point and worth thinking about.
I believe one key difference here is group identity perception. If you like thinking in business terms, you could say "branding".
Facebook, Reddit, HN, Twitter, etc. all must care about content moderation because there is a feedback loop they have to worry about:
1. Toxic content gets posted.
2. Users who dislike that content see it and associate it with the site. They stop using it.
3. The relative fraction of users not posting toxic content goes down.
4. Go to 1.
Run several iterations of that and if you aren't careful, your "free" site is now completely overrun and forever associated with one specific subculture. Tumblr -> porn, Voat -> right-wing extremism, etc.
Step 2 is the key step here. If a user sees some content they don't like and associates it with the entire site it can tilt the userbase.
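You can watch that loop run away with a toy simulation - every parameter below is invented and only meant to show the shape of the dynamic:

    users_ok, users_toxic = 9_800.0, 200.0
    LEAVE_RATE = 0.5      # chance an ordinary user who hits toxic content quits
    TOXIC_GROWTH = 1.10   # the subculture grows 10% a month once it has a home

    for month in range(1, 25):
        toxic_share = users_toxic / (users_ok + users_toxic)
        users_ok -= users_ok * toxic_share * LEAVE_RATE   # step 2: they stop using it
        users_toxic *= TOXIC_GROWTH                       # steps 3-4: the loop feeds itself
        if month % 6 == 0:
            print(f"month {month:2d}: toxic share = "
                  f"{users_toxic / (users_ok + users_toxic):.0%}")

With these toy numbers the toxic share stays under roughly 10% for the first year and then balloons past 40% of the remaining user base by month 24 - slow, then all at once, which matches how these sites tend to be described in hindsight.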
The web as a whole avoids that because "the web" is not a single group or brand in the minds of most users. When someone sees something horrible on the web, they think "this site sucks" not "the web sucks".
Reddit is an interesting example of trying to thread that needle with subreddits. As far as I can tell, Reddit as a whole isn't strongly associated with porn, but there are a lot of pornographic subreddits. During the Trump years, it did get a lot of press and negative attention around right-wing extremism because of The_Donald and other similar subreddits, but it has been able to survive that better than other apps like Gab or Voat.
There are still many many thriving, wholesome, positive communities on Reddit. So, if there is a takeaway, it might be to preemptively silo and partition your communities so that a toxic one doesn't take down others with it.
I personally see "plausible deniability" as the cynical but actual distinction for what gets people to share blame - not actual affiliations or whose servers it runs on. Any number of objectionable sites run on AWS, and you basically need to be an international scandal or violate preexisting terms to get booted - take the merchants selling malware to governments: Amazon's policies didn't care whether it was legal, just whether you were doing it unauthorized. A wise move when international law is really more like the Pirate Code.
The interlinking between the pages themselves and common branding are what create the associations. Distributed Twitter alternatives like Mastodon can even share the same branding, but it's on a per-network basis and complex enough to allow some "innocent" questionable connections.
The internet is very moderated, on the contrary, in terms of UGC.
Traditional, non-social, websites have single or known-group authors. When one of them is defaced or modified we call it "hacking" not "unmoderated content." We assume NASA's site has NASA-posted content. We assume Apple's site has Apple-posted content.
Sites with different standards for what they'd publish have been around for decades (for gore, for porn, etc) but many of these still exist in a traditional curated-by-someone fashion, or are more open to UGC but still have some level of moderation.
The internet is not moderated in any useful sense for content. Drug markets like white house market, and before that silk road, have persisted for years. Tor and other darknet websites host content that is nearly universally disdained by governments and even most individuals - content so heinous that I hesitate to even name it here (you and I both know some examples).
> We assume NASA's site has NASA-posted content. We assume Apple's site has Apple-posted content.
Trust in identity is not the same thing as useful moderation of content. That's useful moderation of identity.
>Sites with different standards for what they'd publish have been around for decades (for gore, for porn, etc) but many of these still exist in a traditional curated-by-someone fashion, or are more open to UGC but still have some level of moderation.
Those sites _choose_ to moderate their content, that doesn't exclude others that don't.
>The internet is not moderated in any useful sense for content. Drug markets like white house market, and before that silk road...
You mean the Silk Road that the US government "moderated" out of existence, along with other Tor marketplaces over the years? The same ones that suggest White House Market's existence is also likely to be limited?
I suppose in the sense that Gabby Pettito was moderated off the internet, Ross Ulbricht was moderated off of the internet and into a cage permanently for the heinous crime of facilitating voluntarily peaceful trade. Tor marketplaces were definitely not gone for years, the same content just moved under new banners. You can literally find the same content and more on WHM today as you did under Ulbricht's banner before he was kidnapped by government thugs.
>I suppose in the sense that Gabby Pettito was moderated off the internet, Ross Ulbricht was moderated off of the internet and into a cage permanently for the heinous crime of facilitating voluntarily peaceful trade.
Oh hello, strawman.
>Tor marketplaces were definitely not gone for years, the same content just moved under new banners. You can literally find the same content and more on WHM today as you did under Ulbricht's banner before he was kidnapped by government thugs.
And the only reason that happens is by virtue of Tor making it difficult to track the source of those sites and their operators. That doesn't mean that "moderators" (governments, etc.) aren't putting forth their best efforts to track them down and shut them down. It is nearly inevitable that WHM will see a similar fate to Silk Road, AlphaBay, DarkMarket, etc.. They're being shut down as quickly as they can be.
Glad to know you finally admit that being kidnapped by a 3rd party is not really what most of us think of as "moderation", and thus that you have made a straw man. Although in the strict sense I guess it is true that moderation could merely mean some 3rd-party entity came along and violently kept me from communicating. If you don't like me posting cat pictures on reddit, you could crack my skull or lock me in a cage and steal my PC, and you would have "moderated" me, but I wouldn't call that reddit moderation.
... wow. Talk about going from 0-100 entirely too fast.
I was talking specifically about sites such as Silk Road and others being taken offline (which is exactly what you were talking about, too), not once did I mention his arrest nor did I allude to it. Glancing at your username, I seem to recall previous comments from you in threads about drug use being legalized. On the broad topic of drug legalization - again - you and I agree, but you would do well to prevent your biases from creeping in and causing you to misunderstand posts and/or lash out at others.
I apologize, maybe you are not familiar with the details of the silk road. Ross Ulbricht was the administrator and creator of the silk road, allegedly. It's quite probable that without his arrest, it would have persisted even if on newly acquired hardware. I would argue his arrest was integral in these violent thugs "moderating" silk road away like the mob "moderates" away their competition.
Instead, after his arrest the content ended up on new platforms rather than the Silk Road platform.
> biases from creeping in and causing you to misunderstand posts and/or lash out at others.
Yes my bias is in complete, unrestricted free speech. Every single piece of content, regardless of how damaging or vulgar anyone thinks it is and regardless of if it portrays even the worst of crimes. I admit I am colored by that bias.
> lash out at others.
What are you talking about? You feel attacked because your poorly constructed argument was laid open. Your case is pretty clear. Even if the system of the internet has no useful filter of content (whether that is true or not), if a third party such as DEA comes along and decides to seize equipment and throw the operator in jail, you consider that content moderation. And I'm willing to admit from a practical perspective, that could be considered a form of moderation by a violent third party.
---------------
Edit due to waiting on timeout to reply below:
His arrest goes hand in hand with the shutdown. It was integral. You can't say you weren't mentioning Ulbricht's arrest when that arrest WAS, in part, the takedown of Silk Road. The very fact that you said you weren't speaking of the arrest led me to say you "may not be familiar" (note the uncertain wording - I did not speak in certainties, however much your bias keeps you from seeing that).
> ...and then angrily respond to them as such.
I think you're projecting. If there's any anger, it must be yours.
>Yeah, again, you're injecting your own biases as you create assumptions about my comments
Your comment appeared to be a rebuttal of my statement that "The internet itself is unmoderated in any useful sense for content." If it wasn't actually a rebuttal but an agreement, then I apologize for misunderstanding - I didn't realize you were actually supporting that argument.
>See how I used "moderated" in quotes in my very first response? That suggests that I'm using the term rather loosely.
>If something's illegal - even if you and I think it shouldn't be - then it's typically going to be removed at some point, even if it takes a while because something like Tor makes it difficult. And in that sense, yes, the internet is "moderated" for that content. That's all I've said/argued, and I truly don't understand how that is so difficult for you to grasp.
The illegal content has only progressively proliferated since the advent of the internet, and we've yet to see an effective mechanism to moderate the content of the internet as a whole. Virtually every category of content has not only not been removed but increased.
>That's all I've said/argued, and I truly don't understand how that is so difficult for you to grasp.
Yes, and I'm arguing that this is incorrect; it hasn't been moderated. At best the content has passed from platform to platform, but no effective mechanism has managed to censor the internet as a whole.
Sometimes I wonder with all this speak of anger, misinterpretations, and clouded judgement is just you repeating to me what your own psychologist told you.
Yeah, again, you're injecting your own biases as you create assumptions about my comments, rather than stopping to ask what I mean before you fly off the handle. See how I used "moderated" in quotes in my very first response? That suggests that I'm using the term rather loosely.
All I've said was that that's how illegal content is moderated on the internet - it is removed. Silk Road was removed, AlphaBay was removed, DarkMarket was removed, many others have been removed, and many more will continue to be removed even if Tor makes that a slow process. At no point did I bring up whether or not I thought it was "right" to remove them, or to treat Ulbricht in that manner (again, you're assuming I don't know what happened). I said "moderating" with quotes, for lack of a better word.
If something's illegal - even if you and I think it shouldn't be - then it's typically going to be removed at some point, even if it takes a while because something like Tor makes it difficult. And in that sense, yes, the internet is "moderated" for that content. That's all I've said/argued, and I truly don't understand how that is so difficult for you to grasp.
>It's quite probable that without his arrest, it would have persisted even if on newly acquired hardware. ... Instead, after his arrest the content ended up on new platforms rather than the Silk Road platform.
For implying that I don't know what happened, you seem to be forgetting that other Silk Road staff started Silk Road 2.0 after his arrest, but that was also shut down.
>What are you talking about? You feel attacked because your poorly constructed argument was laid open.
Nope. You allow your biases to creep in to your poor interpretations of other people's comments, and then angrily respond to them as such. My initial response was simple, but your strongly held beliefs have clouded your responses.
The "internet" isn't liable, so moderate is in the form of transparent traffic shaping. When disruptions are small, costs are either absorbed in aggregate by infrastructure owners (and user attention) until traffic is literally moderated away with routing.
Maybe so (that sounds believable, anyway). What he's saying is that the other 1% is unmoderated because there's no central authority [1]. The problem here isn't that people will share bad things if you don't stop them, the problem is that you're in a position of being held responsible for something outside your control. If it's illegal, it should be reported (or found by law enforcement whose job it is to enforce the law) and if it's offensive, offer some user-side filtering.
[1] this is starting to change, though - Amazon took Parler offline completely at the hosting level. Although they eventually found another hosting provider, it's not unimaginable that in the near future, service providers will collaborate to moderate the underlying traffic itself.
I guess I would rather see a system that focuses on scanning all images for illegal content (presumably there are services where you can hash the image and check for known child porn images, for example), and focus on tagging all other images for certain things (like David Hasselhof's bare chest or whatever concerns your users). Give the users tools to flag images as illegal content, or for misapplied or missing tags, and the tools to determine which type of content they wish to see. Prioritize handling known illegal content found earlier, then user-flagged possibly illegal content, then missing or misapplied tags. Handle DMCA take down requests according to the letter of the law.
Let the users help you, and let them choose what they want to see. Use conservative defaults if you wish, but trying to guess what users might find objectionable and filtering that out ahead of time sounds like a losing proposition to me. They'll tell you what they don't like. When they do, make a new tag and start scanning for that with the AI goodness.
Of course, this is what I would like to see as a user. I'm probably an atypical user. And I'm not the person about to bet their life savings on a start-up, either, so take this with a grain of salt. I just wish that content providers would stop trying to save me from the evils of life or whatever their motivation is.
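For the "hash and check against known-bad images" piece, the plumbing on your side can be quite small. A sketch assuming the Pillow and imagehash packages - in reality the hash list comes from a vetted provider (PhotoDNA-style service), which this stub just fakes with an empty local set:

    from PIL import Image
    import imagehash

    KNOWN_BAD_HASHES = set()   # in reality: perceptual hashes supplied by a trusted provider
    MATCH_DISTANCE = 4         # max Hamming distance to count as a match

    def screen_image(path: str) -> str:
        h = imagehash.phash(Image.open(path))   # perceptual hash of the upload
        if any(h - bad <= MATCH_DISTANCE for bad in KNOWN_BAD_HASHES):
            return "block-and-report"     # known illegal content: remove and escalate
        return "tag-and-publish"          # everything else: publish, rely on tags/flags

Everything that passes would then flow into the tag-and-flag pipeline described above, with user reports feeding the review queue in that same priority order.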
[0]: https://www.gonevis.com