
Facebook Must Really Suck At Machine Learning - ktamiola
http://blog.eladgil.com/2016/11/facebook-must-really-suck-at-machine.html
======
mattzito
My suspicion is that it's not that Facebook sucks at machine learning, but
that they are scared of public backlash when their algorithm accidentally
takes a story from Breitbart and classifies it (arguably correctly) as fake
news. At that point Facebook is derided as a tool of the liberal msm (after
all it's run by liberal Silicon Valley technocrats), and from there eyeballs
go elsewhere and Facebook is harmed.

Better for them to feign ignorance and lie to themselves.

~~~
tomp
Funny you mention Breitbart, it's one of the rare media outlets that _didn't_
predict a landslide victory for Hillary.

~~~
notheguyouthink
I don't have a dog in this fight, but wouldn't that be quite
reasonable/expected if they are skewed towards Republican/Trump?

~~~
snewk
i think the bigger question, in the wake of hillary's missing landslide, is
whether facebook may actually be classifying CNN et al. as 'fake' news.

~~~
serge2k
Poor polling data doesn't mean you are publishing fake news.

------
eva1984
No. Google is fooled by fake news too.

[http://www.cbsnews.com/news/googles-top-search-result-for-
fi...](http://www.cbsnews.com/news/googles-top-search-result-for-final-
election-numbers-leads-to-fake-news-site/)

I have a problem with claims like this that offer little to no detail. Take
the above link as an example: how do we teach a machine to verify this news?
First, we need to teach it to formulate a question, in this case 'Who has the
larger popular vote count [as of now]?'. Then it needs to find a way to
verify that number from reliable sources, possibly having to reconcile
conflicting sources as well.

This is not trivial. If we solve this, we might as well have solved the
problem of understanding human language as a whole.
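For a sense of what that loop involves, here's a toy sketch; the "sources" are hardcoded stand-ins, since each real source lookup is itself a retrieval and reading-comprehension problem, and real reconciliation is far harder than a majority vote:

```python
# Toy sketch of the verification loop described above. Everything here is
# hypothetical: real sources would each need retrieval + comprehension.
def verify(question, sources):
    """Ask every source the same question, take the majority answer,
    and flag whether the sources conflicted."""
    answers = [source(question) for source in sources]
    majority = max(set(answers), key=answers.count)
    return {"answer": majority, "conflict": len(set(answers)) > 1}

# Stand-ins for, e.g., two wire-service tallies and the bogus blog post.
sources = [
    lambda q: "Clinton",
    lambda q: "Clinton",
    lambda q: "Trump",
]

result = verify("Who has the larger popular vote count as of now?", sources)
print(result["answer"], result["conflict"])  # Clinton True
```

Every hard part is hidden inside the lambdas, which is exactly the point.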

The mention of the Princeton students' hackathon project is really off base;
if the author had bothered to look into what that project actually is, it's
nothing more than an API mashup.

[https://github.com/anantdgoel/HackPrincetonF16/blob/master/b...](https://github.com/anantdgoel/HackPrincetonF16/blob/master/backend/imageverify.py)

Disclaimer: I worked for a major news app, and the effort to battle fake news
or low-quality, untrustworthy news is ever growing. While the application of
machine learning is promising, humans, especially experienced humans, are
still the real workhorse for this problem.

Edit: Typo.

~~~
alanbernstein
You're focusing on the content.

Look at the source. 70news.wordpress.com should not be at the top of anybody's
list of reliable news sources.

Look at the topic (e.g. from simple keywords). Look at the references (urls)
mentioned in the article. Look at the timeliness.

There is plenty of information available to machine learning algorithms,
without even needing to touch the natural language aspect.
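To make that concrete, here's a sketch of what those content-free signals might look like; the domain lists, field names, and thresholds are invented for illustration:

```python
# Sketch of non-language signals: source domain, lookalike TLDs, outbound
# references, and timeliness. The specific lists here are made up.
from datetime import datetime, timezone
from urllib.parse import urlparse

FREE_HOSTS = (".wordpress.com", ".blogspot.com")
WIRE_SERVICES = ("reuters.com", "apnews.com")

def source_signals(url, published, cited_urls):
    host = urlparse(url).netloc.lower()
    age_hours = (datetime.now(timezone.utc) - published).total_seconds() / 3600
    return {
        "host": host,
        "free_subdomain": host.endswith(FREE_HOSTS),
        "lookalike_tld": host.endswith(".com.co"),  # e.g. abcnews.com.co
        "cites_wire_service": any(
            urlparse(u).netloc.lower().endswith(WIRE_SERVICES)
            for u in cited_urls),
        "age_hours": age_hours,
    }

signals = source_signals(
    "https://70news.wordpress.com/final-election-numbers",
    datetime.now(timezone.utc),
    [],
)
print(signals["free_subdomain"])  # True
```

None of these features requires parsing a single sentence of the article.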

~~~
BinaryIdiot
> You're focusing on the content.

That's the only thing you can focus on. Why should someone be punished and
labeled as fake just because they're on a wordpress.com subdomain? Domain is
out as a signal; even spammy or weird-looking domains may end up being popular
or a cultural fit. If you test against domain names, you'd be preventing good
journalists from having any kind of reach, and the domain name is something
that arguably may disappear from the user's view within the next decade
anyway.

I'm unconvinced there is anything but content that you can really go off of.
Keywords are not even used in search results, why would they determine news
legitimacy? Even references are not always useful.

~~~
abiox
> Why should someone be punished and labeled as fake if they're on a
> wordpress.com subdomain

i don't know about "labeled as fake", exactly, but it's plausible that quickly
and easily acquired urls (such as a wordpress subdomain) warrant a different
reliability score (per whatever heuristic) in comparison with various
established urls.

------
jwtadvice
The problem is much more difficult than creating a classifier. The purpose of
filtering news is to control the impact, not the tactic. The last time
Facebook cracked down on 'spam' posts, for the same government concerns it
seeks to address with "fake news" today, it took out pro-Sanders articles,
censored sharing of the Snowden documents, prevented people from linking to
Wikileaks, and shut down Facebook organizing for May Day protests.

Facebook regularly censors and directs conversation online around the world at
the whims of various regimes and administrations, but it's just not as easy as
"use machine learning".

The issue is that machine learning is inherently modeled to solve decision
problems in spaces that separate high-level 'features' from low-level
observations. The study and practice of machine learning has thus far modeled
noise, uncertainty, and complex state spaces.

However, machine learning classifiers do not themselves have a concept of
adversarial data: this led, for example, to spammers who would "poison"
machine learning classifiers by biasing them to learn the wrong things, so
that they could get their spam through. Similarly, not too long ago, Microsoft
put a conversation bot on the web that 'learned the wrong things' because
people approached the bot in an attempt to trick it.
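As a toy illustration of that poisoning idea (nothing to do with Facebook's actual systems), here's a naive word-count spam classifier whose verdict flips once an attacker feeds it mislabeled training examples:

```python
# Toy poisoning demo: a naive word-count "spam" classifier being biased
# to learn the wrong thing by attacker-labeled training data.
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs, label in {'spam', 'ham'}."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

clean = [
    ("buy cheap pills now", "spam"),
    ("cheap pills online", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
honest_model = train(clean)
print(classify(honest_model, "cheap pills"))  # spam

# The attacker submits junk labeled "ham" but stuffed with spam vocabulary.
poison = [("cheap pills cheap pills", "ham")] * 5
poisoned_model = train(clean + poison)
print(classify(poisoned_model, "cheap pills"))  # ham
```

The same mechanism scales up: any classifier that keeps learning from user-supplied data inherits this attack surface.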

Facebook's requirement for battling unsanctioned opinions is that it solves
this problem - it hasn't been solved yet. Facebook needs to continuously
monitor the adapting activity and patterns of people who are trying to say
things that governments and Facebook customers don't want people saying.

Basically: machine learning doesn't adapt but people do, and naive attempts to
get machine learning to adapt make it vulnerable to being tricked, because
classifier theory inherently has no concept of adversarial data.

~~~
mathperson
umm, no! I don't know too much about this sub-field, but there are machine
learning methods designed explicitly for adversarial data. It even has a
Wikipedia page! I don't know if Facebook's methods implement any of this yet,
but your statement that machine learning doesn't model this just isn't so.

[https://en.wikipedia.org/wiki/Adversarial_machine_learning](https://en.wikipedia.org/wiki/Adversarial_machine_learning)

[http://www.cs.virginia.edu/~evans/](http://www.cs.virginia.edu/~evans/)

~~~
jwtadvice
Great resources!

It's important to understand that these approaches are on the leading edge of
research, and represent unsolved problems!

~~~
mathperson
yes precisely! i'm sure Facebook's methods are very susceptible to these types
of attacks. Hopefully they still are in 6 years and we can get jobs fixing
this for them lol.

I also really recommend this book [https://www.amazon.com/Trust-Me-Lying-
Confessions-Manipulato...](https://www.amazon.com/Trust-Me-Lying-Confessions-
Manipulator/dp/1591846285/ref=sr_1_1?ie=UTF8&qid=1479519844&sr=8-1&keywords=trust+me+im+lying)

The type of attack is a lot more oldschool but nonetheless, pretty effective.

------
nkurz
It would help me if the author were to point to more examples of what he means
by "fake news". As it is, I can't tell whether he's complaining about
"Onion"-style satire, copypasta with clickbait titles, evil government
propaganda, or just political opinions that he disagrees with. Does a public
training set exist that makes it clear what he's talking about? Does the
dividing line he wants actually exist?

Snopes published an article yesterday that asks the question more clearly:

 _There is much bad news in the online world, but not all of it is "fake"
(i.e., completely fabricated information that has little or no intersection
with real-world events). There are also partisan political sites that take
nuggets of real news and spin them into highly distorted, clickbait articles.
There are sites that misleadingly repackage old news as if it were current
information. There are sites that aggregate articles from a variety of dubious
and questionable sources. There are sites (especially in the fields of health
and science) that believe they're presenting pertinent information but are
woefully inaccurate in their information-gathering and reporting. These forms
of news are all bad in one way or another, but broadly classifying all such
information as "fake news" clouds an already confusing issue._

[http://www.snopes.com/2016/11/17/we-have-a-bad-news-
problem-...](http://www.snopes.com/2016/11/17/we-have-a-bad-news-problem-not-
a-fake-news-problem/)

~~~
sulam
Here's an easy example:

[http://abcnews.com.co/donald-trump-protester-speaks-out-i-
wa...](http://abcnews.com.co/donald-trump-protester-speaks-out-i-was-paid-to-
protest/)

Google is probably just as easy to fool as Facebook. This was the first hit on
a search for "protesters paid to protest", a story that is demonstrably fake:
[https://www.washingtonpost.com/news/the-
intersect/wp/2016/11...](https://www.washingtonpost.com/news/the-
intersect/wp/2016/11/17/facebook-fake-news-writer-i-think-donald-trump-is-in-
the-white-house-because-of-me/?hpid=hp_rhp-top-table-main_fake-
news-845a%3Ahomepage%2Fstory)

Immediate red flags should be: this isn't ABC News's website. There are
statements in the article like "No one still has an AOL email address except
people that would vote for Hillary Clinton", a made-up Trump supporter quoted
as saying "I knew those weren’t real protesters, they were too organized
and smart..." and (awesomely) "I think I was paid more than the other
protesters because I was white and had taken classes in street fighting and
boxing a few years back".

I mean, this may well be parody, but it gets picked up as real news and he's
making real money off it while impersonating a real news site, which is the
point at which it crosses the line from parody to fake, IMO.

~~~
nkurz
_I mean, this may well be parody_

"May well be" is being unnecessarily conservative. Maybe you didn't make it
far enough into the article to read the faked interview with the author of
the Snopes article I posted:

 _David Mikkelson, founder of Snopes.com, a website known for its biased
opinions and inaccurate information they write about stories on the internet
in order to generate advertising revenue, told ABC News that he approves of
what a story like this is accomplishing._

 _“You have to understand that when a story like this goes viral, and we spend
a minute or two debunking it, we make lots of money. Stories like this have
helped put my children through college, buy a new car, a home and even get the
Silverback gorilla my wife Barbara always wanted since she was a child,”
Mikkleson said._

Buried, but actually rather funny. In a way, this is both a fabulous and a
terrible example. Read in its entirety, it's clearly satire. Yet I'm sure that
many people never read it closely enough to notice.

Should Facebook be censoring this? No. Should they be labelling it? Yeah, it
might be a good idea. Is this the sort of story that the author is referring
to? I don't know, but it would be nice if he'd tell us explicitly.

~~~
hga
Given who Snopes assigned political "debunking" to and the predictable
results, you could also label this item "Fake but Accurate": the "website
known for its biased opinions and inaccurate information" line is accurate
enough for enough of their political items.

One thing I don't get, though, is why so many organizations destroyed
whatever credibility they had remaining in a futile attempt to win this
election for their preferred candidate. This was an "it's my turn" race, a
bit like Dole in 1996 and Romney in 2012. Especially prior to Trump winning
the Republican nomination, was it _really_ that important?

Note that this goes all the way to the DNC for the party itself....

------
mentat
I challenge anyone to put together a robust definition of what "real news" is.
This is a fundamentally unsolved problem at the philosophical level.

~~~
hota_mazi
Reporting an event that actually happened.

How hard a concept is this, really?

~~~
pixl97
Very. Prove I went to a restaurant yesterday. You could show a video of me at
a restaurant, but is it from yesterday, or some other day? Is it really me
there, or someone who looked like me?

~~~
hota_mazi
You are describing the difficulty of proving that an event really happened.

The discussion is about the definition of "real news".

Two very different things.

~~~
dingaling
Well perhaps that is the problem. "The news" _used_ to be about events that
had happened, often several days in the past due to the propagation speed of
information.

Nowadays "the news" also includes:

1. What someone says might be happening right now.

2. What someone says about something that happened, or which might be
happening right now.

3. What someone thinks might happen next.

None of which are necessarily true or grounded in fact but all of which are
valued by the media in the race to be first with the headlines.

So perhaps "real news" could be defined as "events which have occurred and for
which we have objective proof" and everything else is just "editorial".

------
whoopdedo
If Facebook has a fake news problem, it's in no small part because Facebook
users share fake news articles more than real news. Google's decision not to
serve AdWords on fake sites came about because those sites were exploiting
the high traffic fake news generates to make money.

Fake news is popular. And the entire social economy is based on facilitating
that which is popular. Because the more eyes you can drive to your content the
more advertising you can sell.

While Google vilifies the websites making the fake news, it also profited by
selling the AdWords those sites hosted. While Facebook says it's difficult to
distinguish legitimate news from fake news, it makes more money from the
frequently shared fake news than from the less popular real news.

The conflation of article content with advertising gives the advertising
brokers a disincentive to prevent fake articles. That will last until readers
stop clicking on the fake stories, and one glance at a typical supermarket
checkout aisle will tell you that isn't going to happen. If Google or
Facebook were serious about not encouraging fake news, they would stop using
the content of a webpage to drive the price of ads. But that obviously isn't
going to happen, as targeted advertising is their raison d'être.

TLDR, Facebook has met the enemy, and he is Facebook.

------
braindead_in
One person's fake news is another person's entertainment. The problem is not a
technical issue. It is a cultural issue. ML cannot solve a cultural issue.

~~~
tedmiston
It'd be cool if I could adjust a parameter on how gossip-y or factual I want
my news sources to be.

Google News seems to have something in place with the "Interested in? / Not
interested" underneath some story abstracts.

~~~
grzm
Another knob I'd like would be terseness. Sometimes I want a nice, wandering,
more literary piece. Sometimes I just want the facts.

~~~
tedmiston
That would be incredible.

The Information is an unpopular, I think even blacklisted, news source here
due to their paywall. But they do a really great job writing a very terse and
high signal two-sentence takeaway for each article. I'm not a subscriber
anymore (too pricy IMO), but here are some recent examples:

 _Surge-Price Builder Leaves Uber_ By Amir Efrati · Monday Oct 17, 2016

> THE TAKEAWAY: One of Uber’s secret weapons in recent years was an economics
> professor, Keith Chen. He overhauled Uber’s surge-price algorithm and made
> other key contributions that helped riders, and Uber, save money.

 _How Dropbox Doubled Down on Business Market_ By Steve Nellis · Thursday Sep
15, 2016

> THE TAKEAWAY: Dropbox founder Drew Houston has come to terms with the fact
> that his company probably isn’t the next consumer tech giant like Facebook
> or Google. That was reflected in a decision halfway through last year to
> sharpen the company’s focus on the business market, including shutting off
> consumer-focused apps like Carousel.

I would love to have that for all tech news.

~~~
grzm
I've heard good things about The Information as well. One of the reasons
they're able to do that, I suspect, is _because_ it's behind a paywall: they
can pay their journalists for quality content.

There are ML systems out there for summarization. I wonder how long until
those systems are good enough to provide this as well. The examples I've seen
(e.g., Summly, which uses SRI tech, IIRC) are likely good enough if you just
want the facts. I still want it to be well-written.

Ouch. I just checked out the subscription rates for The Information. AFAICT,
you can't even see the rates without supplying your name and email address. :/

Edit to add: From other sources on the internet, it looks like subscription
rates are $399/year.

~~~
tedmiston
I feel the same way. Maybe there's room for a small service ($5/month) where
we can submit articles that might be interesting and the most popular ones get
summarized in a well-written way.

I do remember the rates being hard to find as well: $39/month or $399/annual
[1]. They do offer you half-price initially etc for a few months.

[1]: [https://www.theinformation.com/payment-
policy](https://www.theinformation.com/payment-policy)

~~~
grzm
I've thought about something along the same lines, though more in terms of
"news source/quality certification". I just don't see a way to prevent the
summaries/ratings from being pirated/"re-used" elsewhere.

~~~
tedmiston
Hmm, well… I don't think a paywall is desirable. Maybe the summaries are free
and users can pay for individual summaries that they like with
microtransactions. A fractional penny up to a quarter perhaps.

~~~
grzm
Maybe that would work. Are you familiar with any successful use of
microtransactions? I haven't gone out of my way to look. Something like Patreon
also comes to mind, though I've only heard of that in the podcast sphere,
where people have a more personal relationship with the content creator given
the format.

~~~
tedmiston
I haven't seen one yet personally. I heard of a program Google did where users
paid a flat rate, say $10/mo, which was divvied up at the end of the month
amongst articles / authors each person chose. I saw the website once last year
and have not been able to figure out the name or find it since.

I suppose what I meant is technically not microtransactions (mostly just used
in the context of in-app purchases) but _micropayments_ [1]. I've added ko-fi
[2] to my personal site but no one has actually done it yet.

[1]:
[https://en.m.wikipedia.org/wiki/Micropayment](https://en.m.wikipedia.org/wiki/Micropayment)

[2]: [https://ko-fi.com/Home/About](https://ko-fi.com/Home/About)

~~~
grzm
Thanks for the links!

------
tall
It's not a talent problem. It's a culture/ management issue.

------
mcbits
Facebook's job is to show you what people in your network are sharing with
you, filtered to what seems most likely to keep you engaged with the site.
It's not to act as a peer reviewer for the "news". All these people suddenly
complaining about "fake news" are really complaining about the unwashed
masses' content preferences and world views. Maybe the urge to complain is
justified, but it's aimed in the wrong direction.

------
snipethunder
I think there is also an incentive problem. Internally, from what I know, the
performance of a team is always measured by how much it can move metrics: how
much incremental revenue it can generate, how much more engagement a feature
will drive, etc. The amount of content taken down doesn't seem like a sexy
metric, so I assume no team wants to take on this unsexy task.

------
danieltillett
The fundamental problem is the mixing of opinion and facts into the same story
combined with a complete lack of interest and care by writers in finding out
what happened.

I want and will pay for a news service which only reports facts (no opinions)
and which is concerned with accuracy. There are specialty services that
provide this in certain niche areas, but no general service.

------
yosito
My first reaction to this was to object to using machine learning to validate
news stories, since it's very hard for a machine to go out into the world and
verify that a story is based on reality. But the idea that fake news on
Facebook is similar to the content-farmed / bot-generated / spammy stuff
Google tries to eliminate makes me think. The curious uptick of fake news
over the last few months makes me wonder whether Facebook either tried to
sway public opinion before the election, or was complicit in it. I've long suspected
that Google/Facebook technically have enough power to effectively decide an
election by manipulating public opinion. But if they were going to do so,
they'd attempt to do it in a way that doesn't make them look culpable. Someone
should investigate.

------
captn3m0
Surprised no one has pointed out Google's approach to the problem:
[https://blog.google/topics/journalism-news/labeling-fact-
che...](https://blog.google/topics/journalism-news/labeling-fact-check-
articles-google-news/)

They leave the fact-check to the publisher and ask them to use
[http://pending.schema.org/ClaimReview](http://pending.schema.org/ClaimReview)
schema to tag news as fact-checked.
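For the curious, the markup is a JSON-LD block embedded in the fact-check page. The field names below follow the schema.org ClaimReview vocabulary, but the example itself (URL, claim, publisher) is invented:

```python
# Hypothetical example of schema.org ClaimReview markup, rendered as the
# JSON-LD a publisher would embed in a <script> tag on the fact-check page.
import json

claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.org/fact-checks/popular-vote",  # invented
    "datePublished": "2016-11-14",
    "claimReviewed": "Trump won the popular vote.",
    "author": {"@type": "Organization", "name": "Example Fact Checker"},
    "reviewRating": {
        "@type": "Rating",
        "ratingValue": 1,
        "bestRating": 5,
        "alternateName": "False",
    },
}

print(json.dumps(claim_review, indent=2))
```

Note that this is entirely self-reported by the publisher; Google is trusting the fact-checker, not verifying the claim itself.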

Already in use by some publications, and announced almost a month before
election day (Oct 13).

------
anonu
I think fake news is actually a somewhat intractable problem at the level of
ML today... News is usually "breaking" and new, meaning there's nothing out
there to cross-reference it with. Even if you were able to extract "facts"
from an article, you might find it difficult to find any other source (that
you trust) that is also reporting those facts. It is a very difficult
problem...

I think we will eventually get to a point where we can flag potentially fake
news. But we're still a few years away with our current technology.

------
bitL
A question: does anyone in robotics utilize the fact, observed in cognitive
neuroscience, that the best learning happens in organically growing systems,
not in universal systems where you run a huge batch on a clean, pre-configured
state? So far I see everyone designing the structure of an ML system in
advance and hoping for the best (e.g. defining the whole structure in
TensorFlow at the beginning), instead of allowing the system to self-organize
as the learning happens.

~~~
tedmiston
Are you referring to _reinforcement learning_ [1] as opposed to "standard"
supervised learning?

[1]:
[https://en.wikipedia.org/wiki/Reinforcement_learning](https://en.wikipedia.org/wiki/Reinforcement_learning)

------
throwaway9834
There is a large amount of (possibly dubious) research indicating it is
possible to detect whether a speaker is lying by examining movements and
changes in his face and/or body.

There has been a large amount of successful research using neural networks to
classify human behaviors (facial expressions, lip reading, etc.)

Is it possible to create a system that acts as a high-quality lie detector?

Such software could be a powerful tool in studying crime, politics and
business.

------
mirekrusin
When person A publicly says something that is fake and a newspaper writes an
article quoting that person - is the article fake or not?

It's very blurred indeed.

What would probably help is extracting a list of the (main)
statements/assumptions made, giving each a fact-probability score (ideally
with references) - and showing that to people.

But this is not a trivial exercise, is it?

I mean, once we have technology that's good at doing this, well... it means
we can probably rely on this AI to make 80% of the decisions needed to run
the country.

------
betolink
Their news feed sucks big time, and it's not just a Facebook issue; most
social media outsmarted themselves with ML.

~~~
grzm
My (limited) experience with feeds online has been disappointing. Do you have
examples of good feed implementations, especially at larger scales such as
the one Facebook operates at? What tech do they use?

~~~
tedmiston
Feedly's Today page kind of feels like a news feed, though of RSS feeds which
I've subscribed to by hand. That said, I believe it's just selecting the most
viewed stories over a very recent period from your most popular feeds (some
ambiguity here).
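If that guess is right, the heuristic is simple enough to sketch; all names, fields, and weights here are hypothetical, not Feedly's actual algorithm:

```python
# Hypothetical sketch of a "Today page" heuristic: recent stories ranked by
# views, weighted by how popular each subscribed feed is.
from datetime import datetime, timedelta, timezone

def today_page(stories, feed_weight, window_hours=24, limit=5):
    """stories: dicts with 'title', 'feed', 'views', 'published' (UTC)."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    recent = (s for s in stories if s["published"] >= cutoff)
    ranked = sorted(recent,
                    key=lambda s: s["views"] * feed_weight.get(s["feed"], 1.0),
                    reverse=True)
    return [s["title"] for s in ranked[:limit]]

now = datetime.now(timezone.utc)
stories = [
    {"title": "A", "feed": "hn",   "views": 100, "published": now},
    {"title": "B", "feed": "blog", "views": 300, "published": now},
    {"title": "C", "feed": "hn",   "views": 90,
     "published": now - timedelta(days=3)},
]
print(today_page(stories, {"hn": 5.0}))  # ['A', 'B']
```

Notice there's no notion of accuracy anywhere in the ranking, which is the thread's whole complaint.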

------
sprobertson
Most of the other ML applications listed (driving, voice recognition, content
recommendation) are somewhat easy tasks for humans, if not at scale...
Reading, understanding and fact checking a news article is comparatively
pretty difficult.

------
gr_thrwy
Leaving aside the Fake News controversy, given the number of fake accounts
polluting their social graph, I would say yes, FB does suck at ML.

------
psadri
My respect for Elad Gil just dropped a notch.

~~~
eladgil
Sorry that is your reaction. Will email you to discuss.

------
eip
That's what they want you to think.

------
msie
Facebook let itself be bullied by the right.

------
cocktailpeanuts
What a misleading title. The real title says "must suck at" (which is
probably insinuating that they don't actually suck), and somehow it's been
fabricated into this clickbait title.

~~~
sctb
We reverted the title from the submitted “Facebook Sucks at Machine Learning”.

------
BucketSort
This angers the LeCun

------
deepnotderp
Ummm, considering that until recently FB was second only to Google in deep
learning, I would beg to differ.

------
myf01d
Facebook actually sucks at most things, but it doesn't matter as long as the
stock goes up and it generates more revenues.

