
“Fake News” dataset on Kaggle - jo_kruger
https://www.kaggle.com/mrisdal/fake-news
======
ramblenode
This data was collected using a recently developed tool called the BS Detector
[0] which aims to automatically classify fake news. As the labels have not
been validated by humans, I question how much utility this will have for
developing additional fake news detection systems. Any model trained on this
data is really just learning the parameters of the BS Detector, not ground
truth fake news.

[0] [https://github.com/selfagency/bs-detector](https://github.com/selfagency/bs-detector)

~~~
ManlyBread
Classify? I'd expect classification to detect whether something is fake or
not regardless of the source. This tool, meanwhile, is nothing more than a
lookup of sites that the author deemed to be biased or fake.

None of these entries are backed by any evidence that the sites provide "fake
news". Most of them don't even give a reason for being on the list. If the
author cannot explain why these sites are there, why should I trust the
author?
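To make the point concrete: a labeler that checks sources against a manually compiled list of domains assigns labels from the hostname alone, so any model trained on its output can at best recover that mapping. A minimal sketch, assuming a hypothetical `FLAGGED_DOMAINS` table; the domains and category names are placeholders, not the actual BS Detector list.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for a manually compiled domain list; the real
# BS Detector list and its category names are not reproduced here.
FLAGGED_DOMAINS = {
    "fake-site.example": "fake",
    "biased-site.example": "bias",
}


def label_for_url(url: str) -> str:
    """Label an article purely by its hostname; the article text is never read."""
    # .hostname is already lowercased by urllib.parse.
    host = urlparse(url).hostname or ""
    # Strip a leading "www." so www.fake-site.example matches fake-site.example.
    if host.startswith("www."):
        host = host[4:]
    # Exact match only: anything not on the list is simply "unflagged".
    return FLAGGED_DOMAINS.get(host, "unflagged")


if __name__ == "__main__":
    # Identical articles would get different labels based only on their source.
    print(label_for_url("https://www.fake-site.example/story/1"))  # fake
    print(label_for_url("https://news.example/story/1"))           # unflagged
```

Nothing in the function looks at content, which is the gap between this kind of lookup and what one would usually call classification.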

------
gadders
It includes Breitbart and Drudge Report as fake. Those sites may have a bias
you disagree with but no more so than, say, The Daily Kos or Mother Jones.

Can anyone identify any left-of-centre site URLs in the list?

~~~
ZeroGravitas
Breitbart's ex-editor-at-large quit because they "shaped the company into
Trump’s personal Pravda, to the extent that he abandoned and undercut his own
reporter, Breitbart News’ Michelle Fields, in order to protect Trump’s bully
campaign manager, Corey Lewandowski, who allegedly assaulted Michelle."

Did anything like that happen at the sites you name?

~~~
trendia
Yes.

At the NYT: "By and large, talented reporters scrambled to match stories with
what internally was often called “the narrative.” We were occasionally asked
to map a narrative for our various beats a year in advance, square the plan
with editors, then generate stories that fit the pre-designated line."

[http://deadline.com/2016/11/shocked-by-trump-new-york-times-...](http://deadline.com/2016/11/shocked-by-trump-new-york-times-finds-time-for-soul-searching-1201852490/)

------
trendia
ronpaulinstitute.org is listed as fake news? How did they determine what was
fake or not?

~~~
inimino
Just a list of domains compiled by a guy. If you want to know more about the
level of thinking behind the list of domains, this [1] page linked from the BS
Detector readme gives the general idea.

I would say it's the product of a very earnest idealist who thinks he has
solved the problem because he hasn't spent enough time thinking about the
problem to understand the ways in which it is non-trivial.

[1] [https://www.inverse.com/article/23781-bs-detector-facebook-f...](https://www.inverse.com/article/23781-bs-detector-facebook-fake-news-daniel-sieradski)

------
web64
The dataset doesn't contain the article URLs which makes further analysis
harder. With the URLs you could look at who is linking to the article or
website and use that as an indication of authority.
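One common way to turn "who is linking to the article" into a single number is a PageRank-style score over the link graph. The comment doesn't name a method, so this is a swapped-in technique. A minimal sketch, assuming a hypothetical in-memory graph of page to outbound links and the conventional damping factor of 0.85.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a dict mapping page -> outbound links."""
    # Include pages that only appear as link targets.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly across the pages it links to.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical graph: two sites both link to the same article.
    graph = {
        "a.example": ["article.example"],
        "b.example": ["article.example", "a.example"],
        "article.example": [],
    }
    scores = pagerank(graph)
    print(max(scores, key=scores.get))
```

Here `article.example` comes out on top because both other sites link to it. A real system would also need to discount link farms and syndicated copies, which this sketch ignores.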

------
AdamSC1
I'm livid about the abundant existence and spread of 'fake news' and the
inability people have to fact-check, but, I also want to play devil's advocate
here.

1) This dataset relies mostly on a tool called the "BS Detector", which
describes itself as "[a] hastily assembled...proof of concept...[that]
searches all links on a given webpage for references to unreliable sources,
checking against a manually compiled list of domains...[to] address the
proliferation of fake news". While I agree with their choice of sites, the
concept of manual curation deciding the legitimacy of news is horrifying and
something that we hoped the internet would move us away from.

2) We run into a problem where we want places like Facebook to censor 'fake
news'. But I imagine most of us can agree that, as autonomous moral agents,
we'd never want our access to information censored. Rather, we'd want the
tools to make an informed decision ourselves. In accepting that, you'd have to
extend that same right of autonomy to other individuals, no matter how 'wrong'
you think they may be.

A) We can't manually curate a legitimacy list, as that leads to more abuse.
And let's face it, people who read fake news wouldn't trust your list anyway.

B) We can't let tech companies filter out websites based on their own
interpretation; that leads to more bias, especially political bias.

C) We can't let crowd-sourced intelligence do it without being locked into the
tyranny of the majority, which, as we've seen, spreads fake news.

In a time when traditional news agencies are hurting for views, lost in the
digital noise and often slower to respond to breaking news than niche outlets,
we can't simply give them the benefit of the doubt either. (That would also
enshrine them as some sort of news aristocracy which emerging outlets and
voices couldn't compete with).

It seems that rather than trying to judge the news as fake based on the
piece or site itself, we need to take a step back and look at how, on the
consumption platform (browser, news channel, social media), we can display
relevant facts to people.

[Unsubstantiated claim warning] I imagine most credible journalists have at
one time or another cited an irrefutable fact: a raw figure from a report, the
percentage by which a poll was won, or the weight of the Eiffel Tower. It
seems that with datasets like Wikidata and Freebase (the old data dump, not
the Firebase mobile platform) we would be able to locate claims of fact within
the writings of journalists and outlets and compare them against the
irrefutable fact.

If a journalist or outlet states a number of facts incorrectly, well outside a
margin of error, a little pop-up could display those to the user, letting them
decide how to interpret them.
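The comparison step described above can be sketched as a relative-tolerance check between a number stated in an article and a reference value. A minimal sketch, assuming the hard parts (extracting claims from text and matching them to a Wikidata-style entry) are already done; the `Claim` type, the 5% default margin, and the sample figures are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Claim:
    """A numeric claim already extracted from an article (hypothetical type)."""
    subject: str
    stated: float


def flag_claims(claims: list[Claim], reference: dict[str, float],
                rel_margin: float = 0.05) -> list[str]:
    """Return pop-up messages for claims that fall outside the margin of error."""
    messages = []
    for claim in claims:
        expected = reference.get(claim.subject)
        if expected is None:
            # No reference value: say nothing rather than guess.
            continue
        # math.isclose compares |a - b| against rel_margin * max(|a|, |b|).
        if not math.isclose(claim.stated, expected, rel_tol=rel_margin):
            messages.append(
                f"{claim.subject}: article says {claim.stated:g}, "
                f"reference says {expected:g}"
            )
    return messages


if __name__ == "__main__":
    reference = {"poll margin (points)": 4.0, "population (millions)": 10.0}
    claims = [
        Claim("poll margin (points)", 4.1),   # within 5%: not flagged
        Claim("population (millions)", 12.0), # 20% off: flagged
        Claim("unknown figure", 1.0),         # no reference: skipped
    ]
    for message in flag_claims(claims, reference):
        print(message)
```

The function only reports discrepancies and leaves the verdict to the reader, which matches the idea of a pop-up rather than a "fake" label.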

I know we want to label things as 'Fake' or 'Manipulative' media, and I do
think it should be a criminal offense to _knowingly_ publish false information
outside the protections of entertainment/satire/art etc. But, no matter how we
spin it, categorizing media is more destructive than it is helpful.

We need to find ways to put the educated decision back in the hands of moral
agents who are struggling to find facts in an overwhelming sea of noise.

~~~
inimino
> it should be a criminal offense to knowingly publish false information
> outside the protections of entertainment/satire/art etc.

That is a rather dangerous idea.

~~~
AdamSC1
I can see elements of danger if improperly used, but, from my point of view it
is no more dangerous than libel laws or any other law which requires
explicitly "knowing" what one is doing.

A law like that is dangerous only if the burden of proof is on the journalist,
which in our modern society _should_ never be the case. Sadly, I admit there
seem to be many cases now where innocent until proven guilty isn't as strongly
held.

It is tremendously hard to prove that someone 'knew' the information they were
publishing was false (outside of an explicit confession or conversation) and
it absolutely should be hard. But, such a law may make people who are in it
for the money at least give pause.

I can't think of a case where, if it isn't fiction, satire, humor, commentary,
etc., and you know the information to be false but choose to spread it as
fact, you aren't doing so with malicious intent to harm others or to benefit
personally from exploiting others. The social contract of society exists
primarily for protection from such actions, and its tool of protection is the
law.

If there are cases you can think of where that wouldn't be malicious or
exploitative, I'm really interested. While I still hold to the idea that the
most important thing with fake news is to empower people to make their own
autonomous and informed decisions, I'd quite like to be wrong about
criminalizing the spread of false information, as I am a fan of as few laws
as possible to get the job done (as long as the job is done well).

~~~
nostrademons
Probably-not-hypothetical case:

Tensions rise with Russia over Ukraine/Syria/Baltics/Eastern Europe/etc. The
Pentagon draws up contingency plans for responding to an armed conflict in any
one of those areas.

When questioned by a reporter, the White House responds "We have no plans to
go to war with Russia."

Should the White House Press Secretary go to jail?

I sympathize with the information-libertarian position that holds that all
information should be free, truthful, & accessible to everyone. But there's a
really big pragmatic problem in that when individual soundbites are pulled out
and shared, they lose context. People don't understand the circumstances
behind the facts, and they bring all of their own prejudices and emotions
along. Oftentimes, reporting the truth brings about a _worse outcome_ (in some
cases, a disastrous outcome) than lying to the public.

It's not just the government either - corporations, institutions, and
individuals lie all the time too, and often for good reasons. I was a Google
employee when the Nexus line was under development, and I knew that the
official press statement of "We are not and have no plans to develop a phone"
was false. The reason for it was that if news of the Nexus line leaked, it
would've jeopardized supplier relationships such that there _wouldn't_ be a
phone. In some cases you can actually get into logical paradoxes where a
statement is true only if everyone believes that it is false and vice versa.

~~~
AdamSC1
I can entirely concede the case for secrecy in these situations.

Obviously, my statement is not a full proposal for a specific law, but rather
a starting point. There is certainly a level of nuance (as there is with all
laws), but I think it comes down to singling out content you know to be false
and publish anyway for personal gain, manipulation, malice, etc.

To be a practical law it would need some sort of clause to separate the
'actus reus' from the 'mens rea', as well as to account for the impact.

Consider libel. The 'actus reus' alone (the act itself: saying something
defamatory about someone that you believe to be true) is different from libel
with 'mens rea' (the intent behind the action), in which you make claims you
know to be false in order to harm someone else.

The 'actus reus' alone is often swept under the rug with a slap on the wrist,
unless the plaintiff proves substantial damages from the act itself. That
qualifier can make it a criminal act even without the 'mens rea'.

I agree, there is a lot of nuance to tease out in such a legal concept. My
retort to the other commenter was more that if you are going to say
something is a bad idea, you have to explain why and not just offer
rhetoric!

But, once again, I wholeheartedly agree with your examples. I couldn't call
for a law that compels the truth. Just something that prevents false
publication with the intent to deceive people that it is real for the purposes
of malice or ill-gotten self-gain.

------
garysieling
I'd like to see a dataset like this that includes email forwards, e.g. from
Snopes.

------
solotronics
nice! the digital equivalent of safe spaces. perfect for when I want to only
hear opinions that conform to my own.

