Hacker News
Machine-Learning media bias (unite.ai)
77 points by Hard_Space 10 days ago | 61 comments

I'd be more interested in calculating bias based on what they DO or DON'T report at all. It's one thing to report on something with bias; it is more telling to note what they selectively ignore.

It would be challenging to define "articles that a news organization should cover but doesn't," though. There are trillions of things that happen every day; where's the lower bound on "newsworthy," and how do you keep that lower bound from ignoring things that some other party would consider significant?

You could do it in relation to other news outlets. That won't give you absolute bias, but you could say something like "CNN is much less likely to report about X topics than Fox." It would also be interesting to add in other factors, e.g., how often do certain outlets report about the wars in the Middle East when the president is a Republican vs. a Democrat?

I would go farther and make it relative to clusters of people on, say, Twitter. If an outlet is concerned with a lot of things that the progressive cluster of Twitter users is talking about (more so than many other outlets), that’s a strong indicator that the media outlet (and other outlets in its media cluster) is also progressive.

I'll just throw out a comment (even though this is mostly a thought experiment) that doing any type of Twitter analysis is either nearly impossible (in the scraping case) or prohibitively expensive (e.g. using their API). They've really shut off access to third parties in the last few years. (Just thought I'd mention it since the parent post continues this pre-2015-ish idea, which still floats around, that using Twitter for projects could be a thing.)

Fair enough, yes, Twitter was intended to be an example of a mine-able medium where people express political opinions, but to your point mining Twitter data may be more difficult than I imagined.

How often certain outlets report about the wars in the Middle East fluctuates massively depending on the actual wars, though.

But you would still expect them to fluctuate together. How the different news agencies ramp up and down coverage in response to the same events is what's interesting.

Ah, fair enough - thought you meant over time.

Really? You just take the set of stories across N news outlets and then compare it to the set of stories in each individual outlet.
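A minimal sketch of that set comparison in Python. The outlet names, story IDs, and the `min_others` threshold are hypothetical, and it assumes stories have already been matched so the same event carries the same ID across outlets (in practice the hard part).

```python
from collections import Counter


def conspicuous_omissions(coverage: dict[str, set[str]], min_others: int = 2) -> dict[str, set[str]]:
    """For each outlet, return stories it skipped that at least `min_others` other outlets covered."""
    # Number of outlets that covered each story.
    counts = Counter(story for stories in coverage.values() for story in stories)
    omissions = {}
    for outlet, stories in coverage.items():
        # If this outlet skipped a story, its whole count comes from other outlets.
        omissions[outlet] = {
            story for story, n in counts.items()
            if story not in stories and n >= min_others
        }
    return omissions


# Hypothetical toy data: outlet -> set of story IDs it covered.
coverage = {
    "outlet_a": {"s1", "s2", "s3"},
    "outlet_b": {"s1", "s2"},
    "outlet_c": {"s1", "s3", "s4"},
}
print(conspicuous_omissions(coverage))
# {'outlet_a': set(), 'outlet_b': {'s3'}, 'outlet_c': {'s2'}}
```

By construction this only flags omissions relative to what other outlets chose to cover; a story that every outlet skips never shows up.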

Sure. It's just that if the goal really is to identify which stories aren't covered by the news, and you only look at what the news covers to determine that, you'll miss some of the bigger 'coordinated silences'. Weinstein's comment about how weird it is that no journalist ever brings up some really obvious questions about Epstein comes to mind. Or how little press there was about his brother's telomere story. From one perspective, the reason for the silence is that it's not worth covering; from another perspective, it's biased silence.

Or report by telling only half the truth, without showing the other side's story at all. It makes it so much easier to adapt the language to sound neutral and educated. I think this is a big part of why the US is so divided right now: you get mostly half-truths and no coherent view from the other side, so much so that it makes you think "the others" are completely insane.

Yes. There are research dollars for studies to "use AI to do a meaningless study on topics that are done better in other ways." Hence professors obligingly churn them out and journalists report them.

What's even funnier is, once you speak two languages, to compare coverage of the same news on the same network in two different languages (so aimed at different demographics).

>"The work centers on the way topics are addressed with particular phrasing, such as undocumented immigrant | illegal Immigrant, fetus | unborn baby, demonstrators | anarchists."

This to me is phrenology 2.0 or something. Not only does this have absurd explicit shortfalls, like the fact that words like 'anarchist' can be derogatory or sympathetic purely depending on context on opposite sides of the spectrum, this completely ignores any nuance, sarcasm, or anything else that really reveals bias beyond the obvious.

I'd suggest sampling a thousand random people, having them rate everything on a spectrum and averaging it out, or consulting the title of the publication. Saves you the compute. We don't need ML to determine that The American Conservative is, surprisingly enough, a right-wing publication.

The word anarchist is a bad example but for undocumented immigrant vs illegal immigrant or fetus vs unborn baby, it's utterly clear what the connotations are.

> this completely ignores any nuance, sarcasm, or anything else that really reveals bias beyond the obvious.

This problem is mentioned several times in the article, and several ways in which it manifests itself are discussed.

But no real solution is offered to rectify it, and having such glaring problems basically renders the thing useless in any scientific sense.

I'm focusing on it because of the implication spelled out at the end:

"Nonetheless, the MIT study seems to be the largest of its type to date, and could form the framework for future classification systems, and even secondary technologies such as browser plug-ins that might alert casual readers to the political color of the publication they are currently reading."

What's happening in that study is not science but alchemy, a gigantic problem in the field. Entire sectors are now infested with this stuff, see 'affective computing' for another offender.

> that might alert casual readers to the political color of the publication they are currently reading

So many false positives with ESL writers.

It's experimental, you just use it or not depending on its usefulness.

Yes, and it isn't alchemy, since no gold is created.

It is tautology.

Phrenology is a fake science finding patterns where none exist. But human language and political thinking are deeply intertwined. You’re right that you don’t need ML to know a certain conservative outlet is right wing, but automated analysis can help show us the full spectrum of coverage for a given issue, and help show the more open minded consumers how their particular media compares to others. There’s a lot we can learn by studying how people use language.

The point is that language is not enough. I'm a fierce critic of how contemporary media seems to be moving away from NPOV as a default, but objectivity requires, at a minimum, an attempt to properly weigh and prioritize information. Which is deeply contextual and even time-sensitive.

The usual example is a hypothetical Wikipedia page starting with the sentence "Adolf Hitler was a soldier, a painter and a statesman". It's not wrong, but I wouldn't say it's a good summary either.

But… they picked up some signal, and the examples they give are correlated both with each other and with our intuition? So what “is” that signal, and if it correlates well with the left-right dimension, what does it matter if it’s not exactly the same, or not causal in any direction?

All I can think of is that using this data to judge biases could be defeated by a simple search-and-replace if anyone wanted to. But I doubt publishers care enough about obscure papers calling them conservative to change their language. (If they did stop calling all foreigners “illegal aliens”, that would actually be quite the feat for a lowly department of semantics.)

It can be defeated by engaging in sarcasm. It can be defeated by using right-wing language to construct a left-wing argument (one of the most straightforward ways to persuade someone is to speak their language). One can twist an argument entirely by minor adjustments to speech.

The issue here is a simple one, that in communication, language cannot be divorced from context. Laughing at a birthday party means you're having fun. Laughing at a funeral means you're distressed. Laughing itself has no meaning, and words themselves have no meaning. You don't 'pick up' signals, you interpret something as a signal within a frame of reference.

This methodology is right in every case where it is trivial, and thus unnecessary, it is likely wrong in every case where it matters. And that is an actively bad combination.

This is really an analysis of the use of biased language in news articles, which is interesting but only one dimension of potential bias.

It is very possible to use non "charged" language, but still report a topic with a strong bias. For example, Slate is left leaning by most measures, but the below landscape chart from the study has them dead center. Maybe they are better at using neutral terms?


> dead center

That's a bias that gets tugged around by the bias of extremes rather than the sensible. Sorry, I hate that term because it is often used to imply that you can average a wrong and a right and get something more correct. Often enough, one side[1] is decently close to right on an issue and the other is pretty much wrong. Picking the center of that isn't more right.

Often the quest for "neutrality" is bunk. In flat-world vs. round-world, the flat-world side does not have equal standing, but "neutrality"-seeking coverage would often present it as if the flat-world idiots have a case. While issues may have some subjectivity, we should not constantly pretend that there's an even distribution.

A neutral language meter can't ever hope to be right. It's not an analysis of what's well supported; rather, it's an analysis of assertiveness. I'm very assertive about the world being fucking round. I'd be a red flag for such bias meters.

Sorry if I'm going off but--I'll just go ahead and say it--that term triggers the fuck out of me for getting undeserved validity.

[1] On a per issue basis. This is not to imply that one side is more consistently correct across issues.

Can someone explain this chart to me? What does the position on the chart indicate? Slate is left-leaning, but it is correctly marked in blue.

EDIT: from the paper "Our method locates newspapers into this two-dimensional media bias landscape based only on how frequently they use certain discriminative phrases, with no human input regarding what constitutes bias. The colors and sizes of the dots were predetermined by external assessments and thus in no way influenced by our data. The positions of the dots thus suggest that the two dimensions can be interpreted as the traditional left-right bias axis and establishment bias, respectively"

It’s a projection of the NLP model’s vectors into 2D space. Remember the illustrations for the king - man + woman = queen example for word embeddings? They also often used a 2D space. You can sometimes, though rarely, intuit a sense for these dimensions, but they don’t come with any natural definition or unit.
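To make the "projection into 2D" concrete, here is a minimal sketch assuming plain PCA (via SVD) over per-outlet phrase frequencies. It is not necessarily the paper's exact pipeline; the phrase list, the function names, and the per-1,000-words normalization are illustrative assumptions.

```python
import numpy as np


def phrase_frequencies(docs_by_outlet, phrases):
    """Hypothetical helper: rows = outlets (sorted), columns = phrases,
    entries = occurrences per 1,000 words."""
    outlets = sorted(docs_by_outlet)
    X = np.zeros((len(outlets), len(phrases)))
    for i, outlet in enumerate(outlets):
        text = " ".join(docs_by_outlet[outlet]).lower()
        n_words = max(len(text.split()), 1)
        for j, phrase in enumerate(phrases):
            # Crude non-overlapping substring count, blind to context and sarcasm.
            X[i, j] = 1000 * text.count(phrase.lower()) / n_words
    return outlets, X


def project_2d(X):
    """PCA via SVD: center each phrase column, keep the top two components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Component signs are arbitrary and the axes carry no natural unit.
    return Xc @ Vt[:2].T  # shape: (n_outlets, 2)


docs = {
    "outlet_a": ["an illegal immigrant was detained"],
    "outlet_b": ["an undocumented immigrant was detained"],
    "outlet_c": ["the unborn baby and the fetus"],
}
phrases = ["illegal immigrant", "undocumented immigrant", "unborn baby", "fetus"]
outlets, X = phrase_frequencies(docs, phrases)
print(outlets, project_2d(X).round(1))
```

The two output columns are only directions of maximal variance; reading them as "left-right" or "establishment" comes from comparing the positions against external labels afterward, which matches the paper's own description of how the dot colors and sizes were assigned.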

I still don’t get it. Is the chart supposed to show axes in addition to the left/right and pro-establishment/critical, currently represented by colors and sizes? How do the “lack of human input” and “external assessments” fit into the explanation?

The paper itself without the summarizer’s bias: https://arxiv.org/abs/2109.00024

I hate how unlabeled these graphs are.

EDIT: I am glad to see the summary here acknowledges and points out that it may have been intentional to obscure the labeling of these graphs as a method to avoid inflaming the media. If that is the case I think that's a bit short sighted on the part of the authors of the paper but it makes sense.

I just feel like all this is going to end up as a deeply layered proxy for pushing your political narrative (hidden behind an "AI is math, therefore it's objective" veil), and I just don't feel like being part of the deception.

It is telling that the euphemism “undocumented immigrant” is considered less polar than the factually descriptive “illegal immigrant”. “Undocumented” suggests ambiguous legality, which is plainly false in the vast majority of cases to which the phrase refers.

It is also interesting that there is no neutral term for this topic. Actually, now I’m using euphemisms. It’s not interesting, it’s disheartening.

You seem to imply that "undocumented immigrant" isn't actually less factually descriptive or a euphemism at all, but a term that refers to a slightly broader category than "illegal immigrant."

I don't know if that's a correct categorization, but given that framing the relevant differences are not at all about one "descriptive" term and one "euphemistic" one.

“Illegal” is plainly wrong in the vast majority of cases you’re probably thinking of. At least most people who crossed the US-Mexican border in the last two years have pending asylum applications, since those courts are completely flooded.

So “asylum seekers” or “refugee” would be better, since I agree that they are not undocumented, either.

My understanding is that it is more grammatically correct to say that:

A person can be undocumented.

However, a person cannot be illegal; they can only commit illegal acts, i.e. illegal immigration.

Language is flexible though and I understand what people mean with the term. I believe those on the left take issue with calling people illegal, out of fear of promoting xenophobia.

no one is calling anyone "illegal". they are calling them an illegal alien; it's a legal term in our immigration laws referring to a non-resident who is staying here illegally, not by legal means.

getting into the semantics is ridiculous in this situation. 1 in 3 girls led over that border are raped, according to Doctors Without Borders. every person walking over is part of the cartels' human trafficking ring, but yeah, let's worry about what we called it in our legal books.

xenophobia is not the moral issue here, it's fucking human trafficking.

> every person walking over is part of the cartels human trafficking ring

> xenophobia is not the moral issue here

In other words, the perilous journey and initial exploitation by the cartels are the issue, not some semantics debate about the legal term "illegal aliens" and whether people invoke it because of some alleged "xenophobia".

Unless you were selectively quoting parts of my sentences for another reason. You didn't exactly make a rebuttal.

I’m sorry I triggered you. People actually do care about the meaning behind words. That’s why we are talking about it. Words shape understanding, understanding shapes actions, and actions have consequences. Perhaps if we used more compassionate language to refer to these people it would support our efforts to prevent this violence you talk about.

Also, as of this year the Biden administration has asked ICE and CBP to stop using the term “illegal alien”.

there's a time and place to care about words, mind you these are legal words baked into our immigration laws.

no one will believe you care if you squabble over what to call someone while letting them be trafficked.

illegal alien is not offensive, stop trying to make a false debate so you don't have to discuss the real issue, the cartels.

Biden can ask ICE and CBP to use whatever words he wants, like that does something, they're just following the laws he helped write.

maybe he should have updated the immigration laws in his half a century as a senator.

Dude… do you know where we are? We are in a comment thread discussing the usage of words in media bias… is this not the time and place? This isn’t a thread about the cartels. Talk about false debate.

Dude... follow the chain. This was about whether or not illegal immigration has a neutral term, it does, illegal alien, the legal term. I did dive into the issue itself, but it doesn't make it a false debate, unlike the semantics debate.

I was pointing out how bias in the media in this topic always turns to a word debate not a debate on the issue.

Keep in mind YOU brought up xenophobia and not the cartels. You continued the semantics debate, but not the root issue.

Well, I disagree. I don’t think there is any such thing as a neutral term in politics. No one has an objective understanding of some combination of words. Just because the term does not elicit emotions in you does not mean it doesn’t in other people. Like I said, Biden has already recommended the term “illegal alien” be changed, thus highlighting its politicization.

And on your second point, I take issue with the premise. Like I said earlier, I think our understanding of words has consequences; debating their meaning is debating the issue. However, is it the most fruitful and meaningful debate? Perhaps not... but again, we are in a comment section about word usage in media bias, so that’s why I’m focusing on it.

My point is the same. You, like Biden, want to argue semantics, which word to swap out that is "less offensive" but says the same thing.

Even if the word isn't offensive (and you have no idea how many find it so, if any), that's the priority.

All pointless to solve the actual issue, but makes a great show and distracts people.

Unlike Biden, you had no power to change the immigration laws, so I don't hold anything against you.

But yes, this is about media bias and word usage. Let's just chalk semantic debates up as a tool used in that bias.

It’s honestly really strange how triggered you are.

You've thrown that "insult" again at me while advocating changing a legal term because you think it triggers people. Strange.

Your edits have clarified your intent. I thought you were reaching for ad hominem or something by making unsubstantiated claims about my beliefs: “You like Biden” vs. “You, like Biden”. Regardless, you’re still making unsubstantiated claims about my beliefs… I didn’t advocate for changing legal terms. I was only explaining why “those on the left” feel uncomfortable with the term. My first comment was geared more towards an educational tone than a prescriptive one. I recommend reading it again. I didn’t edit any of my posts. See, this is why I think language is important. Gonna have to call it here though. Good chat.

I believe you're looking too deep into my grammatical edits. I simply missed a comma.

You're the one arguing about replacing legal terms and xenophobia while gloating that I may be "triggered" by what you're saying.

Complete nonsense wrapped up in words, while ignoring the real world issues. Read up on the border, ease up on the thesaurus. Cheers.

In a descendant comment to this, you wrote:

> if we used more compassionate language to refer to these people

Which implies a greater interest in the politics of the language vs. semantic precision/grammatical precision.

The "illegal" in the phrase "illegal immigrant" refers to their illegal immigration, not to the legality of their person-hood, as your comment would imply. "Undocumented" is an intentional dodge of the legality of the immigration action undertaken by the person (aka their immigration status). The more factual phrase has been made controversial by partisans.

And btw, cards on the table, I support an expanded, rational immigration policy that recognizes the critically important role immigrants play in the U.S. economy and, more generally, society. And one that encourages legality and punishes illegality.

illegal alien is the neutral and legal term. it's been politicized like most words that side disagrees with. it all turns into a semantics and racial game to distract from the actual issues.

One of these terms can get you votes from naturalized citizens of certain communities. The other one won't.

https://github.com/rpryzant/neutralizing-bias - This is a similar or even better solution than phrase matching.

It is not the machine learning that has media bias; it is media bias measured by machine learning. The article’s title:

MIT: Measuring Media Bias in Major News Outlets With Machine Learning

Interesting that this summary states "[t]he paper comes from Prof. Max Tegmark at MIT’s Department of Physics" and later refers to "the author." There are in fact two authors and the lead author (i.e. the one who most likely did the bulk of research and writing) is Samantha D’Alonzo. Is this bias from unite.ai?

I was surprised to read his name. Didn't know he did anything besides push his weird philosophies.

> Is this bias from unite.ai?

No, it's a goof due to time pressure. It's been corrected.

I don't know if the article title is too long or has changed, but it's:

MIT: Measuring Media Bias in Major News Outlets With Machine Learning

which is rather more useful than the existing one - I thought it was about bias in some way displayed via machine learning as in previous problematic cases.

What was the original title of this thread? It just changed when I refreshed.

> What was the original title of this thread? It just changed when I refreshed.

Same as the article link.
