Hacker News
“Twitter has an algorithm that creates harassment all by itself” (twitter.com)
164 points by cwyers 3 months ago | 76 comments

My business does a lot of posting on Facebook of sometimes controversial things (we're a news site and we sometimes cover local politics). By far the most engaged stuff is the most controversial where people begin to fight and attack each other.

I have ~3k followers on one of my pages. Usually ~200-300 people see any given post if it has no engagements. If it has normal engagement, it might get to 1000-2000 views, still short of my follower count. If it has sparked controversy and there is a fight, I've had the views spike up over 8000-9000 without any shares. Facebook posts to your timeline "your friend commented on this" and others start piling in too. Facebook emails me saying "this post is getting more attention than 95% of the rest of your posts, please pay us money to show it to more people". The more toxic the comments, the more views it gets and the more Facebook begs me to pay them for it.

That's the problem with these algorithms that humans don't watch over. Usually it works great and good content is seen by the people who want to see it. But every now and again it goes out of control and people end up getting hurt and Facebook/Twitter profit from it and even promote it. And as the person who posted it, I have zero ways to stop it from spreading other than deleting the post.

-edit- oh another story... I run a news site for a town, let's call it Townsville. There is another Townsville in another state, but it is not my Townsville. I had a post go super viral, 90,000 views from my 3k followers, because somehow the post made it to the wrong Townsville and 87,000 people were being shown the wrong news article. Again, I had no way to stop this, no tools to correct it. Absolute insanity.

> By far the most engaged stuff is the most controversial where people begin to fight and attack each other.

Well, duh. If people are fighting they are definitely "engaged", but not in a good way. Defining success as more engagement is what made social networks so toxic. The current state, where more eyeballs equals more ad money, needs to change.


>you can choose to stop participating in the toxicity for starters

In my case, it's not that easy. Simply posting factual news updates is often enough to trigger wild responses. For example, a few months ago I attended a city council meeting and wrote a Facebook post during the meeting saying "City has approved a new 70 unit condo project" with a picture of the plans. It was one of the most toxic comment sections I've ever seen simply because some people disagreed with the action being taken. Not my actions, but the actions I was reporting on. The answer certainly is not to stop reporting the news.

On a side note, your comment comes dangerously close to sounding like a personal attack. Perhaps I'm reading it wrong but it seems like you're saying if you have problems on Facebook (which I and the person you're responding to have said we do), we must be pathetic and posting like edgy children seeking the toxicity we find. If that's not your intention, maybe you could clarify?

I see that you haven't really abandoned your 'baiting style'.

Why do you state that? Some of the largest advances in our society have come from what would be seen as toxic or hostile engagements.


This is kind of like how the value of information is how surprising it is. Classic information theory, but with a twist, because in these cases the "surprise" element is often not the value of the information itself, but the way it is presented. So these algorithms are quite good at recognizing when things are surprising and thus perhaps deserving of wider sharing, but bad at recognizing what kind of surprise it is - is it actually new information, or is it just froth?

> That's the problem with these algorithms that humans don't watch over. ... and Facebook/Twitter profit from it and even promote it.

Don't be fooled... Someone is spot-checking the training data against the output at a statistically valid sampling interval. ML in this case is the data equivalent of a limited liability corp.

Why does someone need to spot check the training data against the output? I'd assumed the algorithm is trying to maximize a metric, which is best completely automated without human intervention to slow it down or provide biased opinions. Even fuzzier goals like 'not be racist' would get turned into metrics (e.g. the proportion of words used that are on a watchlist, or a similar scoring algorithm).
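As a toy illustration of what such a watchlist metric might look like (the list, the word-cleaning, and the threshold are all invented for the sake of the example):

```python
# Hypothetical sketch of turning a fuzzy goal ("don't amplify toxic posts")
# into a fully automated metric, as described above. The watchlist terms
# and the 0.2 threshold are made up for illustration.

WATCHLIST = {"idiot", "moron", "liar"}  # placeholder terms

def watchlist_score(text: str) -> float:
    """Proportion of words in the text that appear on the watchlist."""
    words = text.lower().split()
    if not words:
        return 0.0
    flagged = sum(1 for w in words if w.strip(".,!?") in WATCHLIST)
    return flagged / len(words)

def passes_filter(text: str, threshold: float = 0.2) -> bool:
    """Allow amplification only if the flagged-word proportion is low."""
    return watchlist_score(text) < threshold
```

The obvious weakness, as later comments point out, is that a proportion-of-bad-words metric says nothing about context, sarcasm, or who is speaking.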

> Why does someone need to spot check the training data against the output?

Because one wants to make sure one's program is operating as intended?

The surprising thing about all of this is that sentiment analysis--machine learning algorithms that can interpret the feeling you're trying to convey with your words--is a fairly solved problem.

These companies can easily put a filter over this engagement maximization algorithm and they are choosing not to.

But why would they do it? They have some pretty solid data to support the argument that more engagement, whatever the quality, equals more money to the poster, and more money to the platform. There is no bad publicity. The question is what can we do about it? It exploits basic human psychology.

Of course, machines can read the commenters' minds; this is a solved problem.

I, for one, welcome our mechanical overlords.



This document is: positive (+0.52) Magnitude: 1.39
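The score/magnitude pair quoted above is the kind of output a lexicon-based sentiment scorer produces. A toy sketch of how such numbers might be computed (the lexicon and its weights are invented; real systems are far more sophisticated):

```python
# Toy lexicon-based sentiment scorer: score in [-1, 1] is the average
# polarity, magnitude is the total emotional weight, loosely mirroring
# the score/magnitude output format quoted above. Lexicon is invented.

LEXICON = {"good": 0.7, "great": 0.9, "love": 0.8,
           "bad": -0.7, "awful": -0.9, "hate": -0.8}

def sentiment(text: str) -> tuple[float, float]:
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    if not hits:
        return 0.0, 0.0
    score = sum(hits) / len(hits)          # average polarity
    magnitude = sum(abs(h) for h in hits)  # total emotional weight
    return round(score, 2), round(magnitude, 2)
```

Note how brittle this is: "not bad" scores as negative, which is exactly the kind of failure that makes "solved problem" claims dubious.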

Sorry, can you provide some citations for "sentiment analysis being a solved problem"? Last time I looked, it was decidedly non-trivial, though I admit I haven't followed the topic in a few years.

And it is worth noting that if these companies could stop this with limited engagement impact, they totally would as it would get them out of the horrible political hole they are in right now.

Mark Zuckerberg is the 21st century's Pablo Escobar.

Because he runs a vast, violent empire that smuggles illegal drugs?

Probably because the parent dislikes Zuck, and the Hitler comparison has gotten too overused in recent times (and Zuck isn't a politician), so Escobar was chosen as an appropriate one. Fully expecting to see that comparison being used against Jeff Bezos soon.

tl;dr: “everyone i dislike is a nazi or something close” syndrome

"Twitter has an algorithm that creates harassment all by itself"

What am I missing here? There was no harassment of any sort. Alternative headlines could have been:

"Twitter has an algorithm that helps you gain more followers"

"Twitter has an algorithm that helps you drive awareness"

"Twitter has an algorithm that helps you get more twitter followers for your cause or business"

"Twitter has an algorithm that expands your social impact from beyond your sphere."


In other news: public posts on public site go.... public.

The missing piece which the Twitter thread author only touched on is that how a tweet is received by a reader depends a lot on whether or not they come from similar communities and have similar context to the author. By surfacing tweets to people that the author doesn't know at all, it's likely the responses will be more negative in general.

Anyone with a large twitter following knows roughly what the makeup of their follower base is, and they compose tweets accordingly. While always necessary to some extent, it's usually hard to contextualize every single tweet as if it could be read by anyone, so it often isn't done.

As a silly contrived example, let's say I am a software developer that focuses on operating system performance and I tweet something like "I'm working on an algorithm to make killing children an order of magnitude more efficient". (note to real twitter users: never tweet that)

My followers know I'm talking about killing child _processes_ on a computer. So they reply things like "oh, that would be great, it would make this one shell script I have a lot faster to execute" or maybe even "personally I'd rather you encouraged users to use threads rather than forking lots of processes". There might be a heated discussion, but it will be with a HUGE shared context of information.

Now the Twitter algorithm picks it up, and the tweet gets seen by lots of people who don't know anything at all about operating systems. They are, understandably, completely appalled. They start responding with anger. Threats, abuse, etc.

So, Twitter changing the dynamic from "your tweets will primarily be seen by your followers" to "your tweets will frequently be seen by your followers followers" can actually have a big impact on the platform. It will at minimum take some adjustment. Operating with the assumption of one dynamic when there is in fact the other will be...painful.

I get what you are saying, but isn't this what everyone was screaming for years ago when the filter bubble terminology came up? Now we are criticizing networks for showing things outside of our filter bubbles? You can't have it both ways.

Yeah, this definitely is a way to break the filter bubble.

But thinking about it a bit more, it might be one of the worst ways to do so.

For example, assuming roughly that both favorites and retweets represent general agreement, using those mechanisms to surface new tweets to people makes sense. If someone you follow (and presumably respect) quote retweets someone you don't follow with "Yes this!" or something similar, then you're already primed to agree with the person you follow.

But, often at least, replying without faving/retweeting could very well bias toward DISagreement. Now instead you're going to see someone you follow and respect arguing about something, and you're primed to agree with them, and potentially pile on to the original tweet author even though you might not have cared about the topic a minute ago.

Twitter ALREADY has a way to signal that you want all your followers to see a tweet you saw: retweet. And even showing your followers things you favorited at least means they'll see things you probably like. But it seems there's at least a reasonable argument that showing your replies to your followers is setting up a situation where pile-ons to the original tweet are likely.

I guess the point is that Twitter could easily tone down pile ons by noticing that a tweet is generating many more replies than likes. Then reduce display of that tweet instead of boosting it to non-followers.
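The heuristic suggested here, down-weighting tweets whose reply count far outstrips their like count, could be sketched roughly like this (the cutoff and the linear falloff are invented parameters, not anything Twitter is known to use):

```python
def amplification_weight(replies: int, likes: int,
                         ratio_cutoff: float = 3.0) -> float:
    """Down-weight tweets with many more replies than likes (the classic
    'ratio' signal of a pile-on). ratio_cutoff is an invented parameter."""
    ratio = replies / max(likes, 1)
    if ratio <= 1.0:
        return 1.0   # healthy engagement: amplify normally
    if ratio >= ratio_cutoff:
        return 0.0   # likely pile-on: don't boost to non-followers
    # linear falloff between the two regimes
    return 1.0 - (ratio - 1.0) / (ratio_cutoff - 1.0)
```

A tweet with 10 replies and 100 likes keeps full weight; one with 300 replies and 100 likes gets no boost at all.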

Perhaps not for blue checkmarks (they've declared themselves central to the public debate), but for average users Twitter should try to calm down pile ons.

Most of those problems would go away if they a) eliminated the gamification (displaying numbers of replies, retweets, and likes) and b) required textual comments of a particular length.

But then so would the engagement and ad revenue.

That doesn't sound like a solid indicator of an issue. Two friends could be having a back and forth discussion with no harassment or conflict. You'd end up with 25+ replies and 1 like.

What's the point of locating in Silicon Valley and hiring the smartest programmers in the world if you can't figure out an algorithm to make hateful posts not show up as often in someone's feed?

I doubt it's because they can't. The more likely answer is they don't want to.

It's actually a hard problem, similar to porn detection without using humans (see: https://en.wikipedia.org/wiki/I_know_it_when_I_see_it). Blocking purely based on keywords or Bayesian filtering usually paints too broad a stroke and ends up limiting well-intended free speech (I once had a comment blocked for arguing AGAINST racism!). It's similar to the "blocking all mention of sex also blocks sex education" problem. It seems to take a fully-fleshed-out intelligence to grasp the true meaning behind even something as innocuous-looking as a written sentence.

Your assumption that people more intelligent than you "should have figured this out by now" belies the very problem- no one has yet come up with a good automated solution for this. If YOU do, you'll be a millionaire.

Again, I disagree. Twitter came up with a way to make some posts more widely shown, and you're trying to tell me they don't have a way to make some posts less widely shown? As someone else said, if there are a lot of comments and few likes, don't put it in the trending feed. That's one solution for free, and I don't even work for Twitter. If it's two people having a conversation back and forth, the broader Twitter audience doesn't need to see it. It's not censored, it's not hidden, it's just not broadcast either.

People have become millionaires, billionaires even, for the exact opposite of what you say. You become rich by making sure controversial content is spread as far and wide as possible, because hatred and fear sell as entertainment. People get addicted to it. You don't become rich by filtering out hateful content, you become rich by enabling it and spreading it because that's what people want (as long as they're not the target).

If you limit yourself merely to detecting abusive tweets, perhaps it is hard. But there are plenty of ways to adjust the way the social dynamics work that would decrease this kind of behavior but, I believe the argument goes, most of those would also decrease _engagement_.

The real problem is the incentives, both for Twitter and for people interacting on twitter. The solution is probably _social_ rather than technical, but as long as Twitter wants to keep your eyeballs on their site for as long as possible (so they can sell ads or whatever to advertisers) a whole host of solutions are going to be verboten.

By way of example, Hackernews literally has a feature to just lock you out of the site if you are using it more than you want to. That is great for us, the users. But twitter would never do such a thing.

I would imagine the issue is certainly because they can't. What is hateful to you is charming and encouraging to someone else. Social norms and cultural differences are gigantic. Look at the recent controversy with the conservative guy on YouTube who referred to a reporter from Vox as their 'queer Latino reporter' and it was seen as hate speech... despite the Vox reporter openly and frequently labelling themselves as Vox's queer Latino reporter. How is a computer supposed to interpret that? How is it supposed to know that when person A says something and when person B says the exact same words, referring to the exact same subject, that the greater context of the speakers' background, political affiliations, and those of their audience actually determine the 'meaning' behind the statement, not the statement itself?

This is not an easy problem, and it does no one any good to pretend that it is. Tackling the issue also requires those considering it to consider other social situations. Is someone supporting equal treatment of women in Saudi Arabia practicing hate speech against the conservative ruling party? If we'd had systems that let us actively regulate speech in the way we can now, would it have been appropriate to block Martin Luther King Jr. because his message was growing civil disobedience and causing families to bicker over race politics? Why are we so damn certain that any argument today will necessarily be decided by a regression rather than a wider acceptance of more progress? Change in human societies is always ugly, always comes at the cost of pain and strife, and on the balance has usually moved us in a forward direction. I can't say the same for censorship. Censorship makes impossible any forward movement, and only serves to leave regressive mindsets to fester and make-believe that they have more support than they actually do.

We're not talking about banning these posts, or hiding them, or censoring them. Just not showing them as widely as they do other posts. It doesn't even need to go as deep as "this is hateful", but rather "this has the potential to be hateful" or giving the author the ability to control how widely the message is being shared.

I see these people here trying to debate solutions like good engineers, but unless they work at Twitter, it's a waste. We can guess all day and come up with a million solutions but when it comes down to it, Twitter absolutely has the ability to control posts that spiral out of control. What they don't have is the desire to do so.

What's the line between censoring and "not showing them as widely as other posts"?

It can be smoothly related to the probability of a post being undesirable. So if the algo thinks it's 50% undesirable, simply count it as "half a weight." Or tune this function to be whatever you want. Twitter/etc. already makes arbitrary choices about what gets shown.
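The "half a weight" idea is just scaling engagement by the classifier's confidence; a minimal sketch, assuming some upstream classifier supplies the probability:

```python
def engagement_weight(raw_engagement: float, p_undesirable: float) -> float:
    """Scale a post's raw engagement by how likely a (hypothetical)
    classifier thinks it is undesirable: 50% undesirable counts as
    half a weight, 100% counts as zero."""
    return raw_engagement * (1.0 - p_undesirable)
```

Because the weight varies smoothly, there is no hard censorship line to argue over, only a continuous dial, which is the point being made here.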

Not every post gets shown as widely as some do. What's that line? That's where I'd start.

For every mean-spirited hate post that gets promoted, another tweet about knitting is not promoted. Why is censorship only bad if the content is hateful?

"How is it supposed to know that when person A says something and when person B says the exact same words....."

I was about to argue against this but then realised it's worse than you suggest.

If I as a white person used the N word to describe a black person I would be labelled a racist, whereas a black person can say it all day long. But even if I black up and say it, it's even worse. But then with gender the rules are almost reversed: I can declare myself a woman and expect that to be somewhat respected.

And on the internet no one knows you're a dog, or a transvestite in black face.

We're at a stage in "AI" where we can fool image detection by modifying a single pixel, where Google AI mislabels black teenagers as gorillas and Bing overlooks child porn, and where self-driving cars still self-drive into things.

All while "learn to code" is used to harass in some contexts...

But we expect twitter folks to just figure out an algorithm to filter out "hateful" posts, when there isn't even an accepted definition of hateful? The first replies it would filter is all the people telling Trump how bad and evil he and his policies are, while the people who try to actually harass people will find quick and easy ways to game the system, as they always have; that's my prediction of a 'best' case outcome.

That would only be two people. You could factor in # of users.

Additionally, there's no real need for technically public discussions to be promoted or made more public, so it's not really a failure state if the algorithm doesn't promote a high-reply-rate exchange between two users exclusively.

You could simply add the number of distinct users replying. Seems like a pretty simple fix.
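Counting distinct repliers rather than raw replies is indeed a small change; a hypothetical sketch (the reply structure and the thresholds are invented for illustration):

```python
# Hypothetical sketch: counting distinct repliers so that a long
# back-and-forth between two friends doesn't register as a pile-on.

def distinct_repliers(replies: list[dict]) -> int:
    """Number of unique users who replied."""
    return len({r["user_id"] for r in replies})

def looks_like_pile_on(replies: list[dict], likes: int,
                       min_users: int = 20) -> bool:
    """Many replies from many *different* users, with few likes."""
    return (distinct_repliers(replies) >= min_users
            and len(replies) > 3 * max(likes, 1))
```

Thirty replies bouncing between two friends never trips the check, while twenty-five one-off replies from strangers against a single like does.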

I'm still not very convinced that replying without liking is an indicator of negativity. Maybe in most cases, but definitely not all cases.

I don't use the like feature on the website at all and often comment on artwork saying how nice it is or whatever.

They have all the data to be able to make a relatively simple change like this. They don't want to, likely because it "drives engagement".

In general, whenever people say something is relatively simple, and yet this thing has not happened, it can often be a sign that we are missing some hidden complexities.

Not always, but often.

I almost exclusively use the like feature.

The alternative headlines would make sense if that was the consequence of the algorithm, but instead it seems to predominately result in folks who have runaway tweets getting harassed by folks who they don't know. Why would you sugar coat that with some noise about driving awareness?

I can see the poster's point of how it could lead to negativity in some cases, but like you I don't understand what the big revelation is here. Social networks thrive off more people interacting with more posts, so they show posts that have been interacted with a little bit to lots of people hoping they continue to get interacted with. That doesn't really surprise me at all.

The idea of twitter just randomly deciding to boost a low-like-count tweet because it got replies is EXTREMELY WEIRD. Nobody knew the service worked this way. Showing friends' likes in your timeline is not a new deal but in this case there weren't likes, it was just "high engagement". High engagement tweets are often controversial posts from women or conservatives or leftists, and all of those groups are likely to get inflammatory replies that the original poster may not have wanted - you don't have any control over whether your tweet goes viral or gets ratioed.

To an algorithm, harassment looks like engagement. To the market, engagement looks like success. We're participating in a system that has no choice but to create this result.

Yeah, this has been a pretty obvious change in the recent months/years. I started noticing tweets in my timeline that were there because it was something someone I follow had replied to -- and these tweets were often "outrage-worthy" or otherwise click/flame-bait. I have since written my own Chrome extension to hide all "suggested" tweets of that sort (as well as hiding all sponsored tweets, and the "Favorite" button).

There's a pretty long list of CSS classes you can just toss a "display: none" on, but unfortunately other stuff can only be discerned by checking that certain elements are inside a given container. I had to start writing actual JS to evaluate the contents of the page and omit/delete stuff that way.

Out of curiosity, why did you hide the favorite button? Couldn't you just not click it?

Yeah, but I just don't even want to see it. It's a feature I would prefer had never been added to Twitter (just like the other stuff I omit using my Chrome extension). Visual noise distracting from the stuff I care about seeing. :)

If you're never going to click it, why have it there when it can be hidden?

They probably have some kind of text classifier that's trained on `tweet_contents -> engagement`, with some suitable engagement metric (dwell time, likes, replies, etc).
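To make the speculation concrete, a text classifier of that shape could be as simple as a hashed bag-of-words with a perceptron. Everything below (features, labels, training data) is invented purely to illustrate the idea; it reflects nothing about Twitter's actual system:

```python
# Toy sketch of a classifier trained on tweet text -> engagement label
# (1 = high engagement, 0 = low), using hashed bag-of-words features
# and a simple perceptron. All data and parameters are invented.

def features(text: str, dim: int = 256) -> list[int]:
    """Hash each word into a fixed-size count vector."""
    vec = [0] * dim
    for word in text.lower().split():
        vec[hash(word) % dim] += 1
    return vec

def train(samples: list[tuple[str, int]], dim: int = 256,
          epochs: int = 10) -> list[float]:
    """Perceptron training loop over (text, label) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for text, label in samples:
            x = features(text, dim)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            for i, xi in enumerate(x):
                w[i] += (label - pred) * xi   # perceptron update
    return w

def predict(w: list[float], text: str) -> int:
    x = features(text, len(w))
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
```

Even a toy like this makes the incentive problem visible: if outrage words correlate with engagement in the training data, the model learns to boost them, with no notion of whether the engagement is hostile.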

Can you share a link to this extension?

Unfortunately it's just a folder on my computer -- I didn't want to pay the $5 to publish the extension on the Chrome Web Store (plus I speculate that Twitter would ban me if I publish it).

Should throw it on gitlab or github for free to share! :)

You can trivially hide an element permanently using any ad blocker of your choice.

I didn't suggest that I couldn't; however I figured that the work had already been done in the extension.

Right-clicking the heart button and clicking "block element" is much easier and faster than downloading an extension, reviewing its code, then overriding whatever safeguards Chrome has to make it hard to install 3rd party extensions. Just my two cents.

I think you misunderstood me. I want to block everything other than tweets and retweets from my timeline. No "so-and-so liked" or "so-and-so follows."

On the new Twitter PWA, you can just switch from "Home" to "Latest Tweets", though it has a habit of automatically switching back once in a while. Also, they use React with obfuscated class names that change whenever they update the site so customizing the appearance (e.g. hiding the like button) is pretty hard to do.

There are two tiers of Twitter: the early-adopter tech users, who use third-party API clients like Twitterrific and Tweetbot, and everyone else.

The first bucket has a vastly different Twitter experience. As an API client user I have no ads, no polls, no recommendations or friends likes, no "someone you follow replied" experience. Just a timeline of who I choose to follow, the blissful way it always was. No wonder they wanted to shut the API down.

(Apropos of nothing, the first bucket contains all the tech journalists.)

I don't know if this is such a clean distinction. I have been on twitter since the first 3-6 months that it existed as twitter, and I use the website and first-party mobile apps. Are the ads and recommendations annoying? Yes. But back when I used third-party apps (tons of them over the years, I think starting with Twinkle, and then in some order Tweetie [which became the official app], Tweetbot, Twitterrific, and probably some I'm forgetting), the experience was bad in lots of other ways. As native photos were added, they didn't display right. As native retweets and quote tweets became a thing, they didn't work. I imagine now with tweet threading, there was at least some gap between the feature existing and the third-party apps supporting it.

Obviously it's a tradeoff, but I found the downsides of the official experience to be less frustrating than the downsides of the third-party experience.

I am slightly salty about this because I had complained about the same effect on reddit with the red inbox — I disabled my own inbox years ago — but nobody really cared.

It’s a more general case of advertising pollution. Just as it benefits advertisers to make viewers uncomfortable and manipulate their attention, Reddit and Twitter (and Facebook!) systematically display messages that make users uncomfortable to get their attention, stimulate emotional vulnerability, and create opportunities for marketers to step in with a palliative, “shopping therapy”.

It might be just me but wow, it's really hard to read multiple tweets as a timeline since I don't know where to start and where to end.

This is because they broke threading and the replies section for a tweet now shows parent tweets. It's bizarre, and twitter breaks the replies view on a regular basis and only notices weeks later. It's not just you.

Yup, I skip, and often flag, any Twitter link and come to the comments. Unreadable and not a valid source for tech news IMHO. It's literally people screaming on digital street corners.

The magical step is the final step in the argument. The algorithm does seem to increase engagement overall, but people's negativity bias will give more weight in one's mind to the harassing comments. The algo does seem to incentivize replies, which could potentially be negative, given that people don't reply to positive things beyond liking or RTing them, so that's an argument in their favor.

Of course, most people don't notice this. I've never had a post with more than 100 replies for example, so I would have never been aware of this.

Why would an amoral view of user engagement be a surprise for a company whose goal is to show as many ads as possible?

And on top of all that, Twitter's own editorial team regularly stokes political/cultural controversy by boosting non-issues in Twitter Moments and Trending topics.

Gradient descent into hell

I'm not engaged in the twitterverse, so a lot of the jargon goes over my head. Can someone translate what happened?

>The experience of having made a viral tweet is The Worst Fucking Thing.

If you make a "viral tweet", don't read the replies. You have the tools to do so, since Twitter allows you to mute a thread.

Most people use social media not with the intention of shouting into the void, but with the intention of getting replies that they want to read, that they then do read, and perhaps reply in turn.

If that's what usually happens, but sometimes randomly they get tons and tons of replies that they don't want (as claimed by this post), that's an interesting and noteworthy flaw that I've never seen specifically discussed.

You can also mute notifications from people who don't follow you / who you don't follow, though it may not be desirable to keep it like that all the time.

I'm not sure I agree with your rather charitable interpretation of most Twitter users. Most seem firmly in the void-shouting camp to me.

Even if you like the replies, it's overwhelming.

"Creates harassment" is very misleading. Twitter has an algorithm that shows high-interaction tweets to more people, that's it.

I'm usually the last to defend Twitter but this title is pure clickbait.
