Hacker News
The Bot Bubble: Click Farms Have Inflated Social Media Currency (newrepublic.com)
105 points by dredmorbius on Apr 26, 2015 | 62 comments



I don't understand, when Facebook sends a SMS for validation, can't they check from what country the number is?

If a person registers as an American while being in the Philippines, they could improve their security checks and ask for more details, for instance.

Seems to me ultimately Facebook needs these fake profiles so it can inflate its user base and its ads revenue. Isn't that what people call "growth hacking?"

Interesting article nevertheless. I don't condemn what these folks overseas are doing. It's not hacking, nor extortion. It's obviously some form of spam and against Facebook's TOS, but I prefer people working in these "digital sweatshops" rather than selling drugs, working in brick factories or prostitution.


You can register as whatever you want and after verification change it to American.


There has been very little incentive to verify any accounts when part of the selling point of your platform is "we have x billions of active accounts", whether or not the accounts are real.


Not really true. Spam accounts degrade the quality of the experience, leading to an eventual collapse of real users.

It's more a matter of long term vs short term incentives. Focusing on the short term might lead a fly-by-night startup to boost user counts unscrupulously, but FB is focused on long term sustainable growth.


>Not really true. Spam accounts degrade the quality of the experience, leading to an eventual collapse of real users.

You say it isn't true, then describe the scenario that is actually happening today, described in this article: a lower quality experience for users and advertisers. The problem exists because of misaligned incentives. Sure, if advertising and user growth plummet due to the problem, Facebook will be forced to do something. But it's happening now and little is being done.

The spam accounts are not new. I worked for a relatively successful gaming company during the social gaming goldrush a few years ago. While doing some research into cheating in our game, we looked at trying to determine players who were using multiple accounts. We figured that close to half of the daily user accounts were fake. Half.

Hell, each of us had multiple accounts ourselves, since we did a lot of testing and market-research, and didn't want to inundate our real-life streams with spammy game updates.

Of course, when the value of your company rests on those DAU figures, no one is going to talk about it. Just report what it says.


I think they give "long term sustainable growth" a lot of lip service, but what they're actually focused on is "as much outwardly-seeming growth as can be manufactured via whatever means possible". This is why they're slash-and-burning their way into "developing markets" with projects like internet.org: they're the only parts of the world left that they haven't sucked dry.


Especially bearing in mind that these figures affect the actual amounts of $$ that advertisers are charged!


As far as fake profiles and "growth hacking" go, I bet FB has solved for the optimal percentage of fake profiles. Above this level, the FB brand suffers; below it, ad revenues suffer. They probably also factor in the cost of effectively keeping fake profiles below this threshold. That's not to say FB encourages or promotes fake profiles, I just think they've studied it a lot more intensely than most of us commenters have. And given their size, few have a greater economic incentive than them to fully understand the problem.


These people use US based proxies and US registered SIM cards. From Facebook's point of view they are in the US.


Are you sure about US registered SIM cards? That would be a big security problem as well.


Why is it a security problem that a U.S. SIM card works in the Philippines?


Terrorists can get access to a U.S. SIM card and use it for communication; when someone sees the number, they will think it's a U.S. number. I mean, it adds another layer of difficulty in tracing it back to the user. Maybe I am wrong.


This is hilariously silly. You can buy US SIMs anywhere in the world with no registration. https://www.readysim.com/

SIM is not a national identity card scheme. Also, caller ID is entirely spoofable - this is something of a problem for fake SWAT calls.


I see.

From the readysim site:

>> For military and law enforcement personnel engaged in covert or undercover activities, Ready SIM offers voice, text, and even email communications.

But that makes it easy for the terrorists too, to do covert operations and hide their identities.

Also, it probably only works _in_ the US, even though you can order it _from_ anywhere.


If the worst thing terrorists are doing is fooling me into thinking an incoming call is from Iowa, I'm good with that.


Assuming the telcos are on the hook for the US govt, it probably narrows things down a whole lot more when a US sim connects to a tower in the Philippines


When companies like Facebook send SMS they do it through a third party provider who talks to the operators (e.g. Twilio). To Facebook, all that is just a black box - if they are lucky they get a callback saying the message has been delivered, but that's usually about it. It's not possible unless you have cooperation with each of the telecom operators to find out where that number is currently registered (i.e. which base station).

Even then as with everything telecoms related, everyone does things slightly differently. They would need to reimplement it for every single operator - even then I doubt it would be 100% accurate, especially once you take roaming into account.


Doesn't Facebook see the country code of the phone number on the registration page?
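For what it's worth, reading the country calling code off a number is the easy part. A minimal sketch, assuming the number arrives in E.164 form; the lookup table is a tiny illustrative subset and `calling_code_region` is a made-up helper name, not anything Facebook is known to use. Note that it only says where a number was issued, not where the handset currently is, and that +1 is shared by the US, Canada, and a number of Caribbean countries.

```python
# Illustrative subset of country calling codes (assumption: a real table would be complete).
CALLING_CODES = {
    "1": "NANP (US, Canada, parts of the Caribbean)",
    "44": "United Kingdom",
    "63": "Philippines",
    "880": "Bangladesh",
}


def calling_code_region(e164_number):
    """Map an E.164 number such as '+63...' to the region that issued it."""
    digits = e164_number.lstrip("+")
    # Calling codes are one to three digits long; try the longest prefix first.
    for length in (3, 2, 1):
        region = CALLING_CODES.get(digits[:length])
        if region:
            return region
    return "unknown"


# Obviously fake numbers, used only to exercise the lookup.
print(calling_code_region("+630000000000"))
print(calling_code_region("+12025550100"))
```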


Perhaps they are U.S. numbers with support for overseas roaming. Only the carrier would know the final location.

I'm curious how they can do this so cheaply though. Maybe they're using a sketchy carrier that FB could flag for further review.


Hmm, I guess they could buy prepaid SIM cards in bulk from all over the world.


With U.S. numbers though?


This is interesting. I ran into a guy who uses another part of this ecosystem.

He says he's able to show ads that violate FB's terms of service by selectively showing a different ad to people from within FB's network, so that they don't cancel the account. If they do find out, everything is done via anonymous credit cards, so no big deal, just rinse and repeat.


Google's Panda and Penguin algorithm updates really hurt spam and low quality "blogspam" type sites. It was a pretty huge overhaul that changed the game.

I wonder if facebook is working on such an update. One which would squash most of these click farms. I would think, given their resources, they would. The fact that they haven't yet leads me to believe they think this is a necessary evil and that they willingly allow it to continue.


"I wonder if facebook is working on such an update. One which would squash most of these click farms. I would think, given their resources, they would."

Judging from the anecdotal amount of blatantly "fake" accounts I've reported to Facebook, and seen nothing done about... I'd have to say that they don't "care" all that much short of plausible deniability. Accounts, likes, clicks, views are all currency to Facebook, which it trades to prospective companies in exchange for real money.


You think they will take manual action on individual accounts you reported?


That all depends on volumes, I guess. I don't expect them to put an investigator on every report. There are ways and means to get reliable data out of reports.

Example time.

FB gets 10 individual and separate reports for one specific account being fake. Let's assume this is some sort of "spike", and unusual. They then have an investigator manually check this account, and verify if it is a real account or not. If X people report it as fake, then it's likely a fake account, therefore human intervention is required to confirm.

Then, stage two. After taking action, they note that those 10 individuals who reported this account matched the outcome of the human investigator. We could then infer that those accounts are:

A. Less likely to be fake themselves. And B. More reliable signals of fake accounts.

So, next time around. Perhaps the cumulative, or "enhanced" scoring of those accounts' reports requires a lower threshold for human intervention.
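A minimal sketch of the two stages, with made-up numbers throughout: the threshold of 10, the doubling and halving of reporter weights, and the `ReportTracker` name are all assumptions for illustration, not how FB actually handles reports.

```python
from collections import defaultdict

REVIEW_THRESHOLD = 10.0  # hypothetical: total report weight that triggers manual review
DEFAULT_WEIGHT = 1.0     # weight of a reporter with no track record yet


class ReportTracker:
    def __init__(self):
        self.weights = defaultdict(lambda: DEFAULT_WEIGHT)  # reporter -> weight
        self.reports = defaultdict(set)                     # account -> reporters

    def report(self, reporter, account):
        """Stage one: record a report; True means 'send to a human investigator'."""
        self.reports[account].add(reporter)  # a set, so repeat reports count once
        score = sum(self.weights[r] for r in self.reports[account])
        return score >= REVIEW_THRESHOLD

    def resolve(self, account, was_fake):
        """Stage two: after the human verdict, reward or penalize each reporter."""
        for reporter in self.reports.pop(account, set()):
            if was_fake:
                self.weights[reporter] = min(self.weights[reporter] * 2.0, 5.0)
            else:
                self.weights[reporter] = max(self.weights[reporter] * 0.5, 0.1)


tracker = ReportTracker()
first_round = [tracker.report(f"user{i}", "acct-A") for i in range(10)]
tracker.resolve("acct-A", was_fake=True)
# The same reporters now carry more weight, so fewer of them are needed.
second_round = [tracker.report(f"user{i}", "acct-B") for i in range(5)]
```

In this toy run the first account needs all ten reports before review is triggered, while the second needs only five, because each of those reporters was proven right once.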


This is chasing your tail. Fighting spam at a big level is not as easy as people think it is.


> The fact that they haven't yet leads me to believe they think this is a necessary evil and that they willingly allow it to continue.

A cynical view would be: why are fake accounts bad for Facebook? These fake accounts help boost the number of users they report every quarter. These accounts don't use any server-side resources. They just sit around generating "likes", and with each "like" Facebook hears a cha-ching!


Another problem with this kind of update is that it drastically affects the service during the change. It took at least 6 to 8 months for Google to get back to the quality of search they had before the update. Now everything works fine, but just after the Panda update it was a complete disaster in terms of quality. But maybe it's a necessary evil; I can't think of another solution for this kind of problem.


Links to low quality blogspam threatened the quality of Google's results. Do inflated "likes" for brands or celebrities harm the core value prop for consumers or Facebook's business model with advertisers? If not, why do anything, esp. if addressing it kills their scary/differentiating "people in the world with Facebook accounts" metric.


>Do inflated "likes" for brands or celebrities harm the core value prop for consumers or Facebook's business model with advertisers?

The argument presented in the article is that they do:

>These fake likes weren’t just an empty number. Whenever Second Floor Music posted content, Facebook’s algorithms placed it on the newsfeeds of a small, random sample of fans—the people who had liked Second Floor Music—and measured how many “engaged” with the content. High levels of engagement meant that the content was deemed interesting and redistributed to more fans for free—the main goal of most businesses that use social media is to reach this tipping point where content spreads virally. But the fake fans never engaged, depressing each post’s score and leaving it dead on arrival. The social media boost Bronstein had paid for never happened. Even worse, she now had thousands of fake fans who made it nearly impossible to reach her real fans. Bronstein struggled to get help from Facebook, reaching out repeatedly through help forums, but, in the end, she scrapped the original page and started again from scratch. Second Floor Music had effectively paid to ruin one of its flagship Facebook pages.
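The dilution is easy to put into numbers. A small sketch, assuming fake fans never engage and that the sample is drawn uniformly from all fans; the fan counts and the 5% engagement probability are invented for illustration and do not come from the article.

```python
def expected_engagement_rate(real_fans, fake_fans, real_engage_prob):
    """Expected share of a uniformly random fan sample that engages,
    assuming only real fans ever engage."""
    total = real_fans + fake_fans
    return real_fans * real_engage_prob / total


before = expected_engagement_rate(2_000, 0, 0.05)
after = expected_engagement_rate(2_000, 8_000, 0.05)
print(f"before fake likes: {before:.1%}")
print(f"after fake likes:  {after:.1%}")
```

With four fake fans for every real one, the measured engagement rate falls to a fifth of what it was, even though the real fans behave exactly as before.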


> But the fake fans never engaged, depressing each post’s score and leaving it dead on arrival.

Sounds like a business opportunity. Fake fans who actually engage with content. They could even post fake comments and falsely claim they are planning to attend events. What a wonderful future we've built for ourselves.


The harm is coming from Second Floor Music chasing the wrong metric: advertising to get followers as if having more likes was valuable in and of itself. Over the long term, Facebook advertising is only useful if it builds a brand or prompts a direct response. Facebook may opportunistically make money in the short term by taking money from people looking to push up a vanity metric, but presumably they aren't addressing the fake follower problem because paying for followers isn't a sustainable business, or core to how they'll make money over the long run.


Hi Msabalau -- Building a Facebook page is an important part of building our brand and engaging with users to nurture leads -- not only to convert in the short-term, but to build long lasting relationships. It's especially important for us because our products are simple digital downloads that go for as little as $1.50 each, so we need lifelong customers.

We did not buy likes and I am not concerned with how many people like our page. I'm concerned with our engagement rate and reach -- which is (now) phenomenal. In the campaigns I have administered (after the incident that the article reported), I've created carefully targeted campaigns with high-quality copy. The New Republic article, however, discusses our first campaign, which was unfortunately low-quality and poorly targeted. Instead of learning that we hadn't created a good ad by seeing bad engagement, we learned by having our page overrun with fake likes who could not be removed. That's not a typical response for an ad buy, which is a real problem with Facebook's system.

Thanks for reading the story!

Rachel Bronstein, Web Editor, Second Floor Music


If they have an update, they would use it when they've peaked and can use it as an excuse for a huge drop in active users.


for those interested in this topic these two excellent videos from veritasium corroborate this article and go beyond to explain the stark implications for facebook's business model:

facebook fraud:

https://www.youtube.com/watch?v=oVfHeWTKjag

the problem with facebook:

https://www.youtube.com/watch?v=l9ZqXlHl65g

> From January 2013 to February 2014, a global team of researchers from the Max Planck Institute for Software Systems, Microsoft’s and AT&T’s research labs, as well as Boston and Northeastern Universities, conducted an experiment designed to determine just how often advertising campaigns resulted in likes from fake profiles. The researchers ran ten Facebook advertising campaigns, and when they analyzed the likes resulting from those campaigns, they found that 1,867 of the 2,767 likes—or about 67 percent—appeared to be illegitimate. After being informed of these suspicions, Facebook corroborated much of the team’s work by erasing 1,730 of the likes. Sympathetic researchers from a study run by the online marketing website Search Engine Journal have suggested that targeted Facebook advertisements can yield suspicious likes at a rate above 50 percent. In the fall of 2014, Professor Emiliano De Cristofaro of the University College of London presented research which found that even a page explicitly labeled as fake gained followers—the vast majority presumably bots.

> The bot buildup can even affect companies that aren’t advertising with Facebook, but are just passively hoping their pages gain real fans. In 2014, Harvard University’s Facebook fans were most engaged in Dhaka, Bangladesh. (They stated that they did not pay for likes.) A 2012 article in The New York Times suggested that as much as 70 percent of President Obama’s 19 million Twitter followers were fake. (His campaign denied buying followers.) Less prominent pages from across the world—from those belonging to the English metal band Red Seas Fire to international bloggers—have been spontaneously overwhelmed by bots that are attempting to mask their illicit activity by glomming on to real social media profiles.


The Veritasium videos are excellent explanations of the dynamics of the click/fan fraud and perverse incentives for Facebook.


Fake accounts would "like" Harvard and Obama in order to look like real Americans.


perhaps this is why FB has been de-emphasizing the importance of likes to advertisers.


Great article, and very long. I certainly learned a lot.


Long but easy and pleasant to read so it reads fast. No bullshit, a real investigation, the kind of journalism that should be encouraged and rewarded.


Are the SIM cards very cheap? They earned 70 cents per profile, are SIM cards cheaper than that?


You can buy empty but still receive-enabled SIM cards in bulk very cheaply in developing countries.


So ask people to send an SMS then?


Any sources or details on that?


There's a huge mobile phone recycling industry with its center of operations in Asia (Hong Kong). Old phones are traded in very large bulk lots and a lot of these will still have their SIM cards in. I believe the buyers take the phones apart and reuse/resell components, where SIMs are just one type of component that can be traded.


One of my thoughts is that there might well be an opportunity in bulk-providing SIM cards.


>> he processes SIM cards dropped off by men on motorcycles, paying a few cents for a card that would sell for $5 to $10 in the United States


I wonder how Facebook came up with the 7% fake accounts number.

Certainly social media profits short term from these schemes... as long as they don't get out of control and threaten the entire business model. So the cynic in me suggests Facebook and the like aren't actually interested in eliminating the scams entirely but rather keeping them managed within certain parameters while appearing to be trying to do everything they can to shut them down.


My approach would be stratified sampling of large data.

If this were my project to investigate, I'd want a random sampling of, say, new account registrations from within the past month or three. A 100-1,000 profile sample would be sufficient to get a strong estimate of the mean, based on scoring of accounts based on some sort of activity profile.

Remember: what's most critical in data analysis isn't the size of the sample, but the selection of it. Facebook have direct access to their own data and could ensure that any such selected sample was in fact random. Further Monte-Carlo re-sampling of a larger sample (e.g., subsets of the 1,000 sample set) could show possible non-randomness.
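A minimal sketch of that re-sampling check, assuming each sampled account has already been labeled 1 (fake) or 0 (real). The function name, subset size, and round count are arbitrary choices for illustration, and the 70-in-1,000 labels are synthetic.

```python
import random
import statistics


def subsample_rates(labels, subset_size, rounds, seed=0):
    """Fake-account rate in each of `rounds` random subsets of the labeled sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.mean(rng.sample(labels, subset_size)) for _ in range(rounds)]


# Synthetic labeled sample: 70 fakes among 1,000 accounts.
labels = [1] * 70 + [0] * 930
rates = subsample_rates(labels, subset_size=200, rounds=500)
print(f"mean of subset rates: {statistics.mean(rates):.3f}")
print(f"spread (std dev):     {statistics.stdev(rates):.3f}")
```

This only shows how much estimates vary between random subsets. Comparing subsets split by registration date or region against that baseline spread is one way to spot structure that a truly random sample should not have.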

Other identifiable parameters, especially clustering of registrations through proxies, would also be generally determinable.

Note that the decrease in uncertainty from increasing a sample size (and hence: analysis costs) 10x is only about 3.2 -- your standard deviation decreases with the square root of the sample size. For a normal distribution a sample of only 30 is generally considered sufficient for "large-sample" methods. Most national opinion and political surveys are based on samples of 300 people. The cost that's incurred comes from the fact that once you've selected your sample you make repeated contact attempts to reach all those identified for inclusion.
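To make the square-root relationship concrete, here is a rough sketch using the normal approximation for a sample proportion. The 7% rate is only an illustrative input, and `standard_error` is a helper name chosen for this example.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n independent draws."""
    return math.sqrt(p * (1 - p) / n)


for n in (100, 1_000, 10_000):
    margin = 1.96 * standard_error(0.07, n)  # approximate 95% margin of error
    print(f"n={n:>6}: 7% +/- {margin:.1%}")
```

Each tenfold increase in n shrinks the margin by a factor of sqrt(10), about 3.16. With a rate this small and n = 100 the normal approximation is itself rough, so treat the smallest case as indicative only.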

By contrast, self-selected surveys, and most particularly online surveys and popularity polls, are highly susceptible to sampling bias. Most often, they're not analysis tools but contact-selection or lead-generation tools. They may have limited utility in identifying issues of concern, but should not be relied on in ranking them absent further research.


Heh, cool stats lesson. Always learn something new on HN.

The only thing I didn't catch was how they were to figure out which of the accounts in the sample were fake.


You might have missed: "scoring of accounts based on some sort of activity profile."

Fake accounts are almost certainly going to have characteristics which distinguish them from legitimate ones, and likely a quantitative, manual, or combined assessment might determine that. Points to consider:

⚫ Photo images. Search for duplicates or analyze for manipulation.

⚫ Correlation with other data sources. Real names, Social Security, marketing, and numerous databases exist which tend to point to more legitimate profiles.

⚫ Direct contact. Set up events, meet-ups, special purchase offers, etc., or otherwise try to elicit direct action. Even statistically differential response rates are useful.

⚫ Social graph. Real people _interact_ with other real people, and there's a web of trust or certification which can be imputed or determined.

⚫ Network profiles. Identifying large numbers of accounts with similar or suspicious profiles _and_ originating from the same or adjacent network addresses (CIDR/BGP block), or from known nonresidential space that's _not_ a generally-used proxy, would be a strong indicator. If proxy use continues to climb, that's going to be less useful (and I suspect this will be the case).

Tests along these or similar lines could be used to generate more general scoring algorithms for identifying suspect accounts.
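As a rough illustration of how a few of these signals could be combined, the sketch below flags /24 blocks with many sign-ups and adds up points per signal. Every field name, weight, and threshold here is an assumption made up for the example, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress
from collections import Counter

PREFIX = 24  # hypothetical granularity for "same or adjacent network addresses"


def network_block(ip):
    """Return the enclosing /24 CIDR block for an IPv4 address."""
    return ipaddress.ip_network(f"{ip}/{PREFIX}", strict=False)


def crowded_blocks(signup_ips, min_accounts=3):
    """Blocks from which at least `min_accounts` sign-ups originated."""
    counts = Counter(network_block(ip) for ip in signup_ips.values())
    return {block for block, n in counts.items() if n >= min_accounts}


def suspicion_score(account, crowded, signup_ips):
    """Add up made-up points for three of the signals listed above (max 10)."""
    score = 0
    if account["duplicate_photo"]:           # photo found elsewhere
        score += 4
    if account["real_interactions"] == 0:    # no social-graph activity
        score += 3
    if network_block(signup_ips[account["id"]]) in crowded:  # clustered sign-ups
        score += 3
    return score


signup_ips = {
    "a": "203.0.113.5",
    "b": "203.0.113.77",
    "c": "203.0.113.200",
    "d": "198.51.100.9",
}
crowded = crowded_blocks(signup_ips)
suspect = {"id": "a", "duplicate_photo": True, "real_interactions": 0}
print(suspicion_score(suspect, crowded, signup_ips))
```

Scores like these would only be a starting point; the weights would need to be fitted against accounts that have already been classified by hand.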


With a small enough volume, you could do manual inspection. I'm guessing most fake accounts are easy to spot by an actual human, putting an effort into it.


Pretty much that. 100 accounts could be reviewed by one person pretty easily in a day or few, 1,000 by a small team, specifically trained and applying consistent standards, with cross-checks.

Once you've done the manual assessments, deriving heuristics for broader estimates is also possible. The key is having training data of known classified accounts.


Certainly. Facebook is selling likes and actions and is making good money from it. If they come from fake accounts - good for Facebook's business and not a problem for them as long as most advertisers don't complain about it...


I can think of a service that lets people upload their photos to some site, which would then tie each uploaded photo to the person who uploaded it. That is, sites like social networks, dating sites, and online galleries could then verify whether a photo is an original by checking a hashed version of the photo against the service.
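Roughly, the idea could look like the sketch below: a registry that remembers who first uploaded each exact file. The `PhotoRegistry` class and its methods are hypothetical names, and an exact SHA-256 digest is an assumption chosen for simplicity; it changes completely if even one byte of the image changes, so a real service would need some kind of perceptual matching instead.

```python
import hashlib


class PhotoRegistry:
    """Toy registry mapping a photo's digest to whoever uploaded it first."""

    def __init__(self):
        self._first_owner = {}  # SHA-256 hex digest -> first uploader

    def register(self, owner, image_bytes):
        """Record the upload; return the owner on record for this exact file."""
        digest = hashlib.sha256(image_bytes).hexdigest()
        # setdefault keeps the earlier owner if the digest is already known.
        return self._first_owner.setdefault(digest, owner)

    def is_original(self, owner, image_bytes):
        """True only if `owner` was the first to register this exact file."""
        digest = hashlib.sha256(image_bytes).hexdigest()
        return self._first_owner.get(digest) == owner


registry = PhotoRegistry()
registry.register("alice", b"raw bytes of a photo")
print(registry.is_original("alice", b"raw bytes of a photo"))
print(registry.is_original("mallory", b"raw bytes of a photo"))
```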


The problem with your solution is that it requires cooperation. If I'm running a competitor of facebook (and a dating site, in a way, is a competitor to facebook), I'm not going to want to just give them my content, you know?

You'd be better off with something like TinEye or Google image search; something that snarfs all the photos off the web, you know, does this regularly and timestamps where it saw the photo first. It wouldn't be 100% but would likely work better than the cooperative solution you propose, and it's something that is well within the capabilities of facebook.

Even then... yeah, I think this sort of thing is going to be pretty much impossible to completely stamp out as long as you can sell fake profiles for enough money that you can pay an actual human to create them.


Services like this do exist already, Airbnb uses one for passport verification. But would you sign up to a dating site or app that requires you to upload a copy of your passport?


Not exactly a passport. But more of an "I uploaded this image first on the internet therefore it's mine."


Or I photoshopped it enough that your filter can't see it is the same and therefore it is now mine (this is easier than you think).

Also images uploaded on Facebook aren't public by default, so TinEye etc. wouldn't have them.


Oh sorry, I misunderstood... Yes that could work, but you'd have to rely on all the big companies using it too. Getting something like Match.com to use it sounds like quite a challenge :)


What about something like "you can't be friends with these people unless you are pictured with at least some of them"? That would at least require a bunch of pictures to be taken.

Of course it's not fully thought out how it would work, maybe just for people who are new to FB? You also need some sensible image recognition algo, or you'll get a lot of chopped off head mashes. But I'm sure FB has staff for that sort of thing.



