
Mozilla blogger bought 1 million Facebook entries (full name, e-mail) for $5 - dbcooper
http://talkweb.eu/openweb/1819
======
unreal37
I suppose any 15 year old with programming skill can do this. Create an app
(kill zombies!) and then when the popularity dies down, sell out your users
info to the highest bidder.

Kinda sensationalist at the end - "DO YOU STILL FEEL SECURE?" Uh yeah, I do. I
get hundreds of spam a day that Google puts into a little folder for me to
erase. Go ahead and email me, and join that exclusive folder right away.

Oh you want to send me a private message on Facebook? Facebook is kind enough
to put messages from non-friends in a special folder too, and I never check
that.

I feel secure.

~~~
mandeepj
If the sender adds the required headers into his email then it is very likely
that google or any other email service provider will not consider the email as
spam.

Regarding Facebook I partially agree with you. Sometimes messages from
non-friends go into the Other folder and sometimes I can see them in the
regular mail folder.

~~~
jwegan
Proper headers such as DKIM, SPF, etc., are used to make sure a
spammer can't send mail pretending to be someone else.

All spam systems still look at the content of the message plus the reputation
of the IP/domain when determining if a message should be marked as spam or
not.
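
The distinction can be shown with a toy model. This is a minimal sketch in Python, not how any real mail provider works: the `Message` fields, the equal weights and the 0.5 threshold are all hypothetical, chosen only to illustrate that passing SPF/DKIM proves who sent a message and not that the message is wanted.

```python
from dataclasses import dataclass


@dataclass
class Message:
    passes_spf: bool
    passes_dkim: bool
    content_score: float      # hypothetical: 0.0 (clean) .. 1.0 (spammy)
    sender_reputation: float  # hypothetical: 0.0 (bad) .. 1.0 (good)


def is_spam(msg: Message, threshold: float = 0.5) -> bool:
    """Toy classifier: authentication is necessary here, but not sufficient."""
    # Assumption of this sketch: mail that fails authentication is rejected outright.
    if not (msg.passes_spf and msg.passes_dkim):
        return True
    # Authenticated mail is still judged on its content and the sender's reputation.
    risk = 0.5 * msg.content_score + 0.5 * (1.0 - msg.sender_reputation)
    return risk >= threshold


# A correctly authenticated message from a low-reputation sender with spammy content.
authenticated_spam = Message(True, True, content_score=0.9, sender_reputation=0.1)
print(is_spam(authenticated_spam))
```

In this sketch the authenticated message is still flagged, because the content and reputation terms decide the outcome once authentication has passed.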

------
cs702
Wow, five bucks.

As the saying goes, if you're not paying for the product, you're the product.
The new twist here is that the product (i.e., your Facebook info) is now being
sold in the open market for only $5.00/1,000,000, or $0.000005 per person.

Prices normally go down only when supply exceeds demand, so the inescapable
conclusion is that there's abundant oversupply of this product in the open
market. Yikes!
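
The per-person figure is easy to check. A minimal sketch in Python, using only the $5 price and the 1,000,000 entries quoted in this thread (the variable names are just for illustration):

```python
from decimal import Decimal

# Figures quoted in the thread: $5 for 1,000,000 entries.
price_usd = Decimal("5")
entries = 1_000_000

# Price per entry, kept as an exact decimal instead of a binary float.
price_per_entry = price_usd / entries

print(f"${price_per_entry:f} per person")
```

`Decimal` is used so the result prints in plain notation instead of `5e-06`.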

~~~
notatoad
>Prices normally go down only when supply exceeds demand

I don't think that rule applies for digital goods where the cost of
reproduction is zero. The supply is infinite.

~~~
TylerE
Nitpick: Cost per unit of digital goods is _low_ , but not _zero_. Servers,
bandwidth, CC fees, sysadmins, etc.

~~~
001sky
Not a nitpick, because the cost to enter the data by the user is also not zero
(thus supply is not infinite either). These are _subtle_ but not _trivial_
things to keep in mind when dealing with massive scale (a billion users,
etc.)

~~~
notatoad
Those are costs, but they're not marginal costs. They have no bearing on the
cost of each copy, nothing to do with scale.

~~~
TylerE
They're absolutely marginal costs, if you look at it the right way.

Takes more servers and fatter pipes to support 10,000 downloads a day rather
than 500.

~~~
notatoad
For something like a list of a million user names and email addresses? You put
it on pastebin and set up a script to email out links to it when you get a
Paypal payment confirmed email. The only cost is to acquire the data, once
that is done, there is zero cost.

If you want to talk in totally abstract terms, digital goods in general tend
to have marginal costs associated with them. In the context of this
discussion, there is no supply and demand factor, there are no marginal costs,
and there is no market force called scarcity.

~~~
001sky
Let N=(1000 items of unique information) Let W= (2000 items of unique
information)

2(N) does not yield W, regardless of cost to copy (N).

To get W, you will need to do something more. This will not be cost-less.
That's the more general case.

~~~
sirclueless
That's not what marginal cost means to a supplier. The question isn't whether
it costs more to acquire 2000 email addresses than it does to acquire 1000
email addresses, the question is whether it costs more to distribute to twenty
buyers than it does to distribute to ten.

Thus, the cost of hosting is a marginal cost (probably zero in this world of
pastebins and digital lockers). The fee taken by the payment processor is a
marginal cost. The cost of finding twice as many emails is not.

~~~
001sky
No, a "supplier" has to pay for all of his raw materials costs. That includes
inventory costs as well as distribution. Of course you can always restrict
your timeframe and assume away this cost (inventory as already incurred), but
this is not true in the general sense. In particular, if this is true, by
assumption, then there is a limited supply by deduction. If you increased your
supply [of information bits, not duplicate bits], you would have to pay to
incur inventory at that margin precisely. So you never have together zero
marginal cost and unlimited supply, this makes no sense.

 _notatoad 1 day ago | link

I don't think that rule applies for digital goods where the cost of
reproduction is zero. The supply is infinite._

To sum, "the cost of reproduction" is <not> the cost of "supply", unless the
supply is assumed fixed. Thus the second sentence does not follow _per-se_.

~~~
sirclueless
I don't think you are understanding. Of course there are big costs in
acquiring more product to sell. The question is: Do you have to pay those
costs for each customer, or can you pay them once and amortize the cost over
many sales?

For example, Adobe Photoshop probably costs a lot to design. It has really
high fixed costs, because you need to hire good developers and implement a
bunch of advanced operations. However, once Adobe pays the fixed costs, the
marginal cost of Photoshop is pretty minimal: packaging, printing a DVD, maybe
some marketing. It still costs a lot because the fixed costs are so high, and
there's not much competition.

Conversely, a plumber has relatively low fixed costs: a truck, some tools, and
some training. But plumbers also cost a lot, and this is because they have
really high marginal costs: they have to spend an hour at the house of each
and every customer.

So I agree with you, there may be high costs in acquiring email addresses to
sell. My point is that they are in no way marginal costs.
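
The fixed versus marginal distinction can be put in numbers. A minimal sketch in Python; every dollar figure below is hypothetical and only meant to show the shape of the two cost curves, not anyone's real costs:

```python
def average_cost_per_sale(fixed_cost: float, marginal_cost: float, sales: int) -> float:
    """Fixed cost is paid once and spread over all sales; marginal cost is paid per sale."""
    if sales <= 0:
        raise ValueError("sales must be positive")
    return fixed_cost / sales + marginal_cost


# Hypothetical numbers: high fixed / low marginal vs. low fixed / high marginal.
software = {"fixed_cost": 1_000_000.0, "marginal_cost": 2.0}
plumber = {"fixed_cost": 20_000.0, "marginal_cost": 80.0}

for sales in (100, 10_000, 1_000_000):
    s = average_cost_per_sale(sales=sales, **software)
    p = average_cost_per_sale(sales=sales, **plumber)
    print(f"{sales:>9} sales: software ${s:,.2f}  plumber ${p:,.2f}")
```

As sales grow, the software-style cost per sale approaches its marginal cost, while the plumber-style cost per sale can never drop below the cost of the hour spent on each job.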

~~~
001sky
The costs are marginal at the point of periodicity.

example: reseller> pays adobe every month/quarter

example: adobe> pays versioning costs every 24 months

Provided you shrink the window of analysis, you can say "already paid for
inventory, just amortizing it". But in that case, you _don't have unlimited
supply_ , you just have whatever you paid for.

In the case of adobe, despite having "unlimited copies" of CS5, they would
(eventually) run out of supply of salable product if they did not version into
CS6. So while it's trivially true they could make unlimited copies of CS5, it's
not a great idea to perceive this as unlimited supply. The supply that matters
is the part people are willing to pay for--this is the marginal information
content-- not the marginal bit content of what is delivered.

In some ways I don't think we're disagreeing, just focusing on different
elements of the analysis. My larger point was exactly that -- keep in mind the
broader elements that are considered as relevant by CxO.

The CEO of Adobe makes decisions, for example, about how often to incur the
marginal cost of versioning the next Creative Suite, how rapidly and how much
to budget, etc. COO of facebook looks at the marginal cost of data centers for
the next 200 million users, etc, in part because s/he is looking at timeframes
and scales which are not the same at the level of a project team, etc.

------
tbassetto
I don't see how the fact that he's a Mozilla blogger is relevant.

The file contains "just" full name, e-mail and URL. Thieves got the
information thanks to their Facebook app (no idea of its name); it could
happen with any third-party app.

~~~
jcromartie
I'd imagine you could scrape a million people's publicly available info, too.

~~~
sneak
Have you ever tried scraping Facebook?

~~~
Achshar
What should I expect?

~~~
Permit
They're pretty clever. When I started programming in 2009, I wrote a small
scraper that would create accounts, friend people and steal their info if they
accepted. (I never released it past my own friends list and never sold the
data).

There were the obvious checks for CAPTCHAs when too much activity was
detected, but other subtleties as well. If you looked at too many people's
profiles, emails wouldn't be displayed as text, but as images. A person would
be unlikely to notice as the pages looked identical, but dynamic changes like
that make it harder to scrape some things. Introducing even rudimentary OCR
requirements is enough to turn away a lot of programmers.

I'm not saying it's not possible to pull off. But Facebook has set it up so
any money you might make this way will likely not be worth the development
time required.
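
The "switch emails to images after too many profile views" behaviour can be sketched from the defending side. This is a guess at the general shape, not Facebook's actual implementation: the class name, the thresholds and the sliding-window approach are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class ProfileViewLimiter:
    """Toy sliding-window counter of profile views per viewer (hypothetical thresholds)."""

    def __init__(self, max_views: int = 50, window_seconds: float = 3600.0):
        self.max_views = max_views
        self.window_seconds = window_seconds
        self._views = defaultdict(deque)  # viewer id -> timestamps of recent views

    def _prune(self, viewer_id: str, now: float) -> deque:
        # Drop timestamps that have fallen out of the window.
        views = self._views[viewer_id]
        while views and now - views[0] > self.window_seconds:
            views.popleft()
        return views

    def record_view(self, viewer_id: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._prune(viewer_id, now).append(now)

    def render_email_as(self, viewer_id: str, now: Optional[float] = None) -> str:
        """Return "text" for ordinary viewers and "image" for unusually heavy ones."""
        now = time.monotonic() if now is None else now
        heavy = len(self._prune(viewer_id, now)) > self.max_views
        return "image" if heavy else "text"


# Usage with explicit timestamps so the behaviour is easy to follow.
limiter = ProfileViewLimiter(max_views=3, window_seconds=10)
for t in range(4):
    limiter.record_view("viewer-a", now=t)
print(limiter.render_email_as("viewer-a", now=3))
```

The page looks the same to a person either way, which is why a change like this is cheap for the site and costly for a scraper.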

~~~
mkjones
Glad you found our anti-scraping stuff to be neat! I work on the team that
builds a lot of that technology at Facebook. Any interest in interning here
sometime and helping us improve our systems even more?

~~~
Permit
You guys do a really great job.

To be perfectly honest, I've kind of fallen out of love with web development
in the last year and have taken more of an interest in algorithmic trading. I
appreciate the interest, though. :)

------
moeffju
Here's what I think is the offer on gigbucks: [http://gigbucks.com/Social-
Marketing/26055/instantly-give-yo...](http://gigbucks.com/Social-
Marketing/26055/instantly-give-you-an-email-list-of-11-million-valid-Facebook-
users-with-name-last)

~~~
bogomil
You don't have to show the URL. Those guys have to be banished, not
advertised.

~~~
Spearchucker
That's like saying "don't think of a camel". My immediate reaction to your
comment was to click on the link - which I otherwise wouldn't have. This is
not a criticism, just an observation.

------
windle
Why is he called a Mozilla blogger? He doesn't work for Mozilla foundation or
Mozilla corp as far as I can see (nor does he claim to). His linkedin states
he associates with the "Mozilla community", but that's hardly an official
representative of Mozilla as "Mozilla blogger" implies.

~~~
nnethercote
His blog is syndicated to planet.mozilla.org. It's hardly a high quality blog.

------
bergie
I don't see this as a particularly big deal. Databases of email addresses have
been available for cheap for a long time, as is evident from the amount of
spam we all get. This is after all why spam blockers are so important.

------
lazyjones
How is that news? Many of these names can be probed for using the public FB
API, without being logged on and without access_token:

e.g. try <https://graph.facebook.com/1112112584> with curl or so ... (sorry
random member from the published list)

Some spammers have probably been harvesting that API for a long time ...

~~~
Indyan
One additional piece of data that I see in the list he purchased is email address,
which is probably what the spammers would value the most.

------
pi18n
I think instead of having users agree to permissions on Facebook and Android
apps, they should have to explicitly grant permissions to the app. Maybe by
dragging an icon or something that represents "email address", "real name",
and other concepts.

~~~
Mahn
This would dramatically lower the conversion for developers, and unhappy
developers make facebook unhappy. Facebook could make it way more obvious if
they wanted to, but they just try to balance how much they screw users vs how
much they screw developers.

------
gondo
What's so surprising about this? You can get a list of valid emails and just run
them through Facebook search and you'll get names and profile URLs, both
public information no matter what your security settings are. Nothing special
about this.

~~~
mapleoin
Where do you get a list of 1 million valid emails?

~~~
dangrossman
Type "buy email lists" into Google and pick any ad. Even Salesforce appears to
be selling 30 million e-mails.

~~~
mapleoin
That's what I'm saying...

------
ballstothewalls
The email I used for facebook is facebook@mydomain.com. I have gotten several
credit score emails sent to it... This started happening after I had
deactivated my account. I never used apps that much, though I did sign up for
them occasionally.

I have never noticed this with any of my other unique emails, just the
facebook one.

~~~
theycallmemorty
Is that domain publicly accessible? Could facebook@(any domain you know
exists).com be a reasonable shot in the dark?

~~~
ballstothewalls
Yes, but it doesn't go anywhere except a blank page with my name on it.

------
hcarvalhoalves
The sheets are named "sayfa". That's a hint about who he bought the list from.
Does anyone know which language that is?

~~~
frkn
Turkish

------
brianbreslin
Excuse my naïveté, but what could one use this list for? Aside from spam email
blast?

~~~
jklio
The OED confirms naivety is also an English word that's been around since at
least the 1700s. I am not implying naïveté is wrong in English, just that
there is a simpler option should you so choose.

~~~
brianbreslin
autocorrect on my ipad went with the more pompous version :-P

------
diziet
One million?

Try 3.5 million: [http://fiverr.com/palash1987/provide-3500000-facebook-
emaii-...](http://fiverr.com/palash1987/provide-3500000-facebook-emaii-iist-
database-with-profile-link)

------
driverdan
This is one of the reasons why I don't use FB apps. The privacy and security
controls are far too loose and broad. It's never clear exactly how your data
will be used.

One of my clients has a FB app so I work with this stuff daily. It's so easy to
build a full profile of an active user's life, their interests, their
friends, their work and education history. Their geotagged photos and check-
ins tell you exactly where they like to go. Pure gold for marketers /
spammers.

The information is just too accessible and valuable for people to not abuse
it.

------
benjlang
And that's why I use the <https://mypermissions.com> plugin, so my email
doesn't end up in places like this.

------
gailees
I'm sure much of this information is public on their profiles to begin with;
with a simple web scraper, you can acquire information about millions of fb
accounts and their respective email addresses.

OFC there's an oversupply since they can give out an unlimited amount of
copies of this same million FB entries.

Big Data is great because it's super re-usable and can be purposed for
anyone's specific need.

~~~
seanlinehan
I actually did build an FB scraper about 2 years ago. If I'm remembering
correctly, something like 1/2000 people have their e-mail address public. Name
+ URL is always available, but the e-mail is a bit more valuable.

------
mandeepj
I just checked: the Facebook people search page does not show users' email addresses.

<http://www.facebook.com/directory/people>

------
erjierjtjrej
Wasn't there a massive dump of FB names paired with emails, released by
lulzsec or some other 'Anonymous' organization last year? Could make this
'deal' quite useless.

------
code_duck
I'm quite skeptical of just about every offer to login with Facebook, or
install an 'app' within the site, whether it's Apple, Etsy or some other
dishonest corporation. Just about everyone wants mainly to gather your
information for some other purpose.

Unfortunately, as usual your average user thinks nothing of clicking through
the permissions page on FB without reading or understanding what it says.

------
Fightback
I laugh at the censor fail. See selected field.

~~~
cpeterso
I'm not seeing it. What is the censor fail in line 66?

~~~
sold
The picture was edited in the meantime. Previously the formula field
(displaying content of current cell) was not grayed.

------
livebeef
That's hardly a deal, it is possible to scrape name, url, gender, locale and
profile picture just by bruteforcing user ids:

<http://graph.facebook.com/4/>

<http://graph.facebook.com/4/picture>

You don't get the email but that would be really bad.

------
tathagatadg
Just curious, is this illegal? What happens if the seller didn't disclose the
method in which he aggregated the data or says I just stood in a shopping mall
and asked people to voluntarily disclose these details? (Sure he'll have a
hard time proving that) ....

~~~
ceejayoz
Chances are it's in the app's privacy policy that they can share your data
with "carefully selected third parties" or something similar.

~~~
Evbn
Not allowed by Facebook developer TOS (which they never enforce).

------
justjohn
I wonder what other data the seller has. It seems likely that you could get
more valuable data from the users of these facebook apps since people tend to
say yes when apps ask for permission to access data.

------
mikeevans
Looks like Facebook isn't happy about it: <http://talkweb.eu/openweb/1842>

------
lysium
Maybe should have blurred out the edit line, too.

------
gailees
BTW, where can I find this deal? Would love to see this list in its
legitimate form if nothing else.

~~~
bluetidepro
The url is in a comment above. --
<http://news.ycombinator.com/item?id=4687676>

------
greenwalls
Does anyone have any idea how the Facebook accounts were taken from the
owners? Scary.

~~~
thomaslutz
What do you mean by "taken"? I guess the Facebook application just used the
privileges the users provided to it to compile this list.

------
wheelerwj
man that's nothing. You should see what I am able to get for free. although I
did pay $15 for a list of 100m emails, most of which have phone numbers,
names, addresses attached to them.

------
Mahn
According to the link the data was collected through an app. This is not
Facebook's fault, this is the developer being an idiot, because Facebook TOS
are very clear when it comes to data; the developer is in big trouble if
found.

------
rburhum
and then, the data provider sells the credit card information you gave them to
buy this list in another market... Do YOU feel safe?

------
witoldc
So what did he buy that he couldn't get free using Google search? Nothing?

There are phone books out there that list more information than that. Do you
feel secure?

------
catshirt
1 million Facebook entries isn't cool. u know what's cool?

