
More than 1MM Facebook accounts exposed  - nico-roddz
https://www.google.com/search?q=bcode%3D&oq=bcode%3D&sugexp=chrome,mod=0&sourceid=chrome&ie=UTF-8#q=inurl:bcode%3D%5B*%5D%2Bn_m%3D%5B*%5D+site:facebook.com&hl=en&safe=off&prmd=imvns&filter=0&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&fp=8c0bb27d33614e56&bpcl=37189454&biw=1560&bih=698
======
mkjones
My name is Matt Jones, and I work on the Facebook security team that looked
into this tonight. We only send these URLs to the email address of the account
owner for their ease of use and never make them publicly available. Even then
we put protection in place to reduce the likelihood that anyone else could
click through to the account.

For a search engine to come across these links, the content of the emails
would need to have been posted online (e.g. via throwaway email sites, as
someone pointed out - or people whose email addresses go to email lists with
online archives).

As jpadvo surmised, the nonces expire after a period of time. They also only
work for certain users, and even then we run additional security checks to
make sure it looks like the account owner who's logging in. Regardless, due to
some of these links being disclosed, we've turned the feature off until we can
better ensure its security for users whose email contents are publicly
visible. We are also securing the accounts of anyone who recently logged in
through this flow.

In the future if you run into something that looks like a security problem
with Facebook, feel free to disclose it responsibly through our whitehat
program: <https://www.facebook.com/whitehat>. That way, in addition to making
some money, you can avoid a bunch of script kiddies exploiting whatever the
issue is that you've found.

~~~
lazyjones
The URLs don't need to be posted online. Some browsers (Chrome, possibly
Firefox with Safe Browsing mode, very likely any browser with a Google Toolbar
installed) send visited URLs to Google and they will be indexed. I don't know
if this is officially documented by Google, but several people have reported
seeing this while testing new/beta websites that weren't published or linked
anywhere.

~~~
franze
an old meme, and my usual recommendation: just test it: create a page that is
not linked from anywhere. visit it with the browsers mentioned above. watch
the logfiles. wait for it. nope, no googlebot request. it is unbelievably easy
to test, i have done so on various occasions in the past, so there is no need
for you to spread a "several people have reported" rumor. just ... test ...
it.

as for the old stories, that google does this kind of thing: people,
especially SEOs or people who think they know SEO, always blame google. oh, my
beta.site has been indexed, it must be because of ... google is evil.

most of the time when i have seen googlebot find a not-yet-published site, it
was because of one of these (just some examples, not a complete list):

* error reporting was turned on (most of the PHP sites)
* the URLs were already used in some javascript
* server side analytics software, open to the public
* apache shows the file/folder structure
* indexable logfiles
* people linked to the site
* somebody tweeted about it
* site was covered on techcrunch (yes, really)
* all visited URLs in the network were tracked by a firewall, the firewall published a log on an internal server, the internal server was reachable from the outside
* internal wiki is indexable
* intranet is indexable
* concept paper is indexable

your hypothesis "chrome/google toolbar/... push URLs into the
googlebot discovery queue, which leads to googlebot visits" is easily
testable. no need to spread rumors. setup for testing this: make an html-page
(30 seconds max, basically ssh to your server, create a file, write some
html), tail & grep logfiles (30 sec max), wait (forever)
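
For anyone who would rather script the "tail & grep" step, here is a minimal sketch in Python. It assumes a combined-format access log; the test path `/unlinked-test-page.html` and the sample log lines are made up for illustration. Note that it only matches on the user agent string, which anyone can spoof, so a hit is a reason to look closer rather than proof of a Googlebot visit.

```python
import re

# Pulls the request path and the user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines, test_path):
    """Return the log lines where a Googlebot user agent requested test_path."""
    hits = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("path") == test_path and "Googlebot" in m.group("agent"):
            hits.append(line)
    return hits

# Hypothetical sample: a browser visit to the test page, and Googlebot
# fetching a different, publicly linked page.
sample_log = [
    '203.0.113.5 - - [01/Nov/2012:10:00:00 +0000] "GET /unlinked-test-page.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Chrome/22.0"',
    '66.249.66.1 - - [01/Nov/2012:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(googlebot_hits(sample_log, "/unlinked-test-page.html"))
```

With the sample lines above the list comes back empty, which is the "wait (forever)" outcome described in the comment.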

~~~
blauwbilgorgel
It is a myth that is hard to get rid of. No one wants to admit they tweeted
out a link to the dev website.

Though I recently found this on the Google+ FAQ:
[http://support.google.com/webmasters/bin/answer.py?hl=en&...](http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1140194)

    
    
      When you add the +1 button to a page, Google assumes that
      you want that page to be publicly available and visible in
      Google Search results. As a result, we may fetch and show
      that page even if it is disallowed in robots.txt.
    

I can understand adding a +1 button to a dev site, and then not understanding
why it shows up in the index.

------
surferbayarea
Facebook's privacy settings have a ton of bugs. Here's another one:

1\. Make a stupid status update post.
2\. It appears in all your friends' newsfeeds.
3\. You realize you said something stupid and private.
4\. Panic. Delete post.
5\. Breathe sigh of relief that it is no longer showing up in your profile.
6\. But wait a minute! It still keeps showing up in all your friends' newsfeeds.
7\. Now that you deleted the post, you can't even modify its visibility settings.
Heck, you can't even get to the url. But all your friends continue to see the
post in their newsfeeds.

So the way facebook implements a delete of any activity(status
post/like/comment) is that the owner stops seeing it but everyone else keeps
seeing it. That is simply the most retarded delete implementation ever!

~~~
nthj
to be fair,

cache invalidation is hard.

~~~
drivebyacct2
And naming things.

~~~
Laremere
Together with off by one errors, they're the two hardest things in computer
science...

~~~
alexkus
Yes, the three hardest things in Comp Sci are cache invalidation and off by
one errors.

~~~
keesj
Don't you mean the 10 hardest things?

~~~
jamesrom
I think he means the 11 hardest things actually.

~~~
alexkus
10 in ternary (which I think was his part of the joke)

------
Mithrandir
Weird. I clicked on one of the links and it asked me if I was that user, and,
if so, that I should click the login button. When I did, it logged me in as
that user.

Edit: This happens for multiple users.

Edit2: It looks like if you click on the link, it automatically expires. bCODE
is "an identifier that can be sent to a mobile phone/device and used as a
ticket/voucher/identification or other type of token." I'm guessing somehow
these tokens (the ones that auto log you in) never got used, plus the old ones
were saved and contain email info. Not sure how Google could have gotten them
though. Probably just got accidentally listed, despite robots.txt.

~~~
nico-roddz
It's really weird.

~~~
rxooo
how? I am unable to replicate this bug

------
jpadvo
Here's one theory and analysis of what might have happened. Some people's
emails got out into the public internet, and were indexed. Some of these
emails were from Facebook, and included links to resources that require login.
These links pre-populated the username field for convenience, or in some cases
auto-login the user. Facebook's engineers probably did not anticipate email
notifications to users being crawled by Google. Live and learn, eh?

But could Facebook have done something to prevent or minimize the damage
caused by these leaked emails?

1\. Let's start with the auto-login links, as those are the scariest. Do those
links use one-time-use tokens, and do the tokens expire? If either or both of
those steps were skipped, it makes this leak much more serious, and speaks to
negligence or disrespect for user security. If Facebook has both of those
security measures in place, though, they did all they realistically could. If
somebody lets their private email get indexed by Google (seriously, though,
how does that even happen??), that's their own problem.

2\. The other class of leaked urls link email addresses to Facebook profiles.
This isn't as immediately scary, and for a lot of people it wouldn't even
matter. But it is easy to imagine scenarios where this kind of privacy would
be important to someone, and this kind of leak would be just as scary as
someone being able to log in as them. Frankly, I never would have thought of
securing this, and I doubt Facebook did anything to secure it. Going forward,
though, it would probably be worth it for them to link auto-username-
populating through one-time-use, expiring tokens as well.
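
To make "one-time-use, expiring tokens" concrete, here is a minimal in-memory sketch in Python. It is not how Facebook implements its links; the class name `LoginTokenStore`, the one-hour default lifetime, and the user ids are all assumptions for illustration.

```python
import secrets
import time

class LoginTokenStore:
    """In-memory store of login tokens that are single-use and expire."""

    def __init__(self, ttl_seconds=3600):
        self.ttl_seconds = ttl_seconds
        self._tokens = {}  # token -> (user_id, expiry timestamp)

    def issue(self, user_id, now=None):
        """Create an unguessable token for user_id and remember when it expires."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user_id, now + self.ttl_seconds)
        return token

    def redeem(self, token, now=None):
        """Return the user_id once; None if the token is unknown, used, or expired."""
        now = time.time() if now is None else now
        # pop() removes the token on first use, which is what makes it single-use.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        if now > expires_at:
            return None
        return user_id
```

With both checks in place, a link that leaks into a search index is only useful to a stranger if the owner never clicked it and it has not yet expired.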

So, it looks like Facebook probably got hit with a bizarre edge case privacy /
security issue. There are likely things they could do to make their system
more resistant to this kind of thing, but at the same time they probably
didn't do as badly as this might make them look at first glance.

Again, this is speculation, any confirmation or disconfirmation would be
great.

~~~
nico-roddz
This is how everything started:

A friend forwarded me an email from a FB group notification

Something like:

http://www.facebook.com/n/?groups%[id here]%2Fpermalink%[id here]%2F&mid=[id here]&bcode=[id here]-mjoi&n_m=[email address here]

When I clicked the url I got automatically logged into my friend's account.

So it is definitely a Facebook security issue.

Then I tried some google searches to see if I could find some urls containing
the parameters:

bcode= &email= n_m= mid=

Not a big deal, really.

~~~
dhimes
Thanks for catching this nico-- looks like it's been removed from Google.

~~~
nico-roddz
You're welcome!

------
Matt_Cutts
Okay, I've been through all the comments and I'm going to try to summarize:

\- It looks like in some situations, Facebook will send an email that has a
link. That link expires after a certain amount of time, but in the meantime,
clicking that link lets people access that Facebook account.

\- A large number of services can be set up to automatically post any email
received onto the web. One major category is disposable email services such as
asdasd.ru. Any email to a throwaway account on asdasd.ru gets put up on the
web. Here's an example Facebook recovery email that got turned into a web
page: <http://asdasd.ru/read/414831>

\- Once these emails are just webpages, it's no surprise that search engines
discover those URLs. Note that this is not a Google-specific issue. When I
search on Bing for the query [site:facebook.com bcode n_m mid], the first
result is also one of these urls that has an email address embedded in it. For
a debunk of the misconception that this is related to the Google Toolbar or
Chrome, see my post elsewhere in this discussion at
<http://news.ycombinator.com/item?id=4733276>

So: an email gets sent to someone. That email gets put up on the web as a
webpage. Search engines (including both Google and Bing) find that webpage as
they follow links on the web.

~~~
blauwbilgorgel
I tried Bing and Yandex to find the email bodies. They didn't return many
results (but they do return results).

[http://www.bing.com/search?q=%22wants+to+be+friends+on+Faceb...](http://www.bing.com/search?q=%22wants+to+be+friends+on+Facebook%22+%22If+you+don%27t+want+to+receive+these+emails+from+Facebook+in+the+future%22)

When I try on Google to find the email bodies, I get 250k results, of which
the large majority are on blogspot.com sites.

While mail bodies can be found on a few other sites, like the asdasd.ru
example, and other search engines have found these links too, the main issue
still seems to be with blogspot.com -- These aren't throwaway accounts with
public inboxes, but likely some virus that is intercepting certain mails
(Facebook, Twitter, Youtube, Twoo) and reposting them as a blogpost for
everyone to see.

As Blogspot is Google-owned, this does seem to me a predominantly Google-
specific issue.

~~~
Matt_Cutts
No, Blogger also has a feature that will automatically post messages sent to
an email address. Here's an example email from Facebook that was posted to a
blogspot.com url: [http://weight-loss-
information-123.blogspot.com/2012/08/misb...](http://weight-loss-
information-123.blogspot.com/2012/08/misbaul-hussain-wants-to-be-friends-
on.html)

If you look at the bottom of that Blogger post, it says "This message was sent
to <a gmail address>." So an email from Facebook got posted as a web page to
this blog.

There's no need to suspect some virus that's intercepting emails. Plenty of
people have set up their systems such that email messages get turned into web
pages.

~~~
blauwbilgorgel
You are probably right and I apologize for any misinformation. To me it seemed
strange that the blogs first started spamming, followed by publishing only
certain emails. Wouldn't it make more sense if all emails were published, not
only from certain webservices? Why would a user want to publish their private
Facebook emails in the first place? None of these accounts posts normal
updates, they act compromised.

------
neya
Thank you for exposing this. Much appreciated. Here's one more - The last time
I checked, Facebook revealed 'what you liked' to search engines like Google.
For example, if you search for your name inside double quotes like this -
"Your Name" you will see your name listed virtually on every single page you
liked, for example, If you had liked Sony's Facebook fan page, then your name
would appear in the search results something like this - "[Your name] and 8
others like this"

That's strange because I did tell Facebook under my account settings NOT to
list my profile or my name on Search engines.

To summarize - So be careful with what you 'like', because it really just
takes a Google search to find out your interests. This could (potentially) be
a problem if you are actively seeking employment (and if you had 'liked' some
crazy stuff) or if you have a crazy girlfriend.

------
tszming
Common misinterpretation of how Google handles `Disallow` in robots.txt

Q. If I block Google from crawling a page using a robots.txt disallow
directive, will it disappear from search results? [1]

robots.txt Disallow does not guarantee that a page will not appear in results:
Google may still decide, based on external information such as incoming links,
that it is relevant. If you wish to explicitly block a page from being
indexed, you should instead use the noindex robots meta tag or X-Robots-Tag
HTTP header. In this case, you should not disallow the page in robots.txt,
because the page must be crawled in order for the tag to be seen and obeyed.

[1] [https://developers.google.com/webmasters/control-crawl-
index...](https://developers.google.com/webmasters/control-crawl-
index/docs/faq#h17)
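
A minimal sketch of the noindex approach described in the quoted FAQ, using only the Python standard library. The page body, the helper `noindex_headers`, and the port are assumptions for illustration; the point is only that the page stays crawlable and carries both the `X-Robots-Tag` header and the robots meta tag.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page that should stay out of search results.
PAGE = b'<html><head><meta name="robots" content="noindex"></head><body>staging</body></html>'

def noindex_headers(body):
    """Response headers for a page that may be crawled but should not be indexed."""
    return [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Robots-Tag", "noindex"),
    ]

class NoIndexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        for name, value in noindex_headers(PAGE):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PAGE)

def make_server(port=8000):
    """Build a local server; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), NoIndexHandler)
```

Running `make_server().serve_forever()` and requesting any path should show the header in the response. As the FAQ says, the same URL must not also be disallowed in robots.txt, or the crawler never gets to see the tag.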

~~~
lloeki
We develop and host a bunch of extranets, which without login consist of your
typical authentication page. We put a _robots.txt_ file there, and the only
sites that link there are our customers' company home sites.

Google still indexes them. The definition of "relevant" here defies my wildest
imagination.

~~~
kevinpet
robots.txt is not about indexing. It's about crawling.
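
A small illustration of that distinction, using Python's standard `urllib.robotparser`. The rules and the `extranet.example.com` URLs are made up. The parser can only answer "may this agent fetch this URL?"; nothing in robots.txt says whether a URL that other sites link to may appear in an index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an extranet login page.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /login",
])

# Crawl permission only: the login page may not be fetched, other pages may.
print(rules.can_fetch("Googlebot", "https://extranet.example.com/login"))
print(rules.can_fetch("Googlebot", "https://extranet.example.com/about"))
```

The first call reports that fetching is disallowed and the second that it is allowed. A disallowed URL can still be listed from inbound links alone, because the crawler never fetches the page and so never sees any noindex instruction on it.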

------
djsla
Large number of login emails seem to be from asdasd.ru domain. Googling one of
these emails I find a site that resembles a public inbox with emails from
Facebook in it, like this one - <http://asdasd.ru/read/414831>.

------
FedericoElles
You are able to post on blogger via email. If you register with this blogger-
email-address on facebook, all facebook notifications are published as blogger
posts and indexed by Google. Actually this might be used to circumvent a
firewall preventing you from using facebook. You can search for the leaked
email addresses on Google and probably find blogger blogs with facebook
notifications posted.

------
anonymouz
I'm somewhat curious, why MM for million/mega and not M? Or does the second M
stand for some unit?

~~~
accountoftheday
The Roman numeral M (mille) means 1000. M^2 therefore equals one million.

~~~
anonymouz
If that's truly how people use it, it is very strange. In actual roman
numerals MM = 2000. Using 'M' as a roman numeral but then multiplying digits
makes no sense at all (you'd need numerals for all the prime numbers to
represent arbitrary numbers...).

And in SI, the prefix 'M' (mega) already means 1 million, so to me it seems MM
is the notation that maximizes confusion.
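
The additive reading is easy to check with a short Roman numeral parser. This is a minimal sketch that handles only well-formed numerals; the function name `roman_to_int` is just for illustration.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def roman_to_int(numeral):
    """Convert a well-formed Roman numeral to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = VALUES[ch]
        # A smaller symbol directly before a larger one is subtracted (IV = 4);
        # in every other case the symbol's value is added.
        if i + 1 < len(numeral) and value < VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

print(roman_to_int("MM"))  # additive rule: 1000 + 1000
print(1000 * 1000)         # the "thousand thousand" reading used in finance
```

The first line prints 2000 and the second 1000000, which is exactly the gap between the numeral and the finance shorthand.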

~~~
campbellmorgan
I totally agree - there's not a roman numeral justification for it at all and
it's very confusing in normal situations. My understanding is that it comes
from financial (specifically trader) jargon and I suspect it probably
originated to differentiate it from some other use of "m", but don't know for
sure... Maybe someone else knows why it arose?

------
Yuioup
More than 1 millimeter Facebook accounts exposed?

~~~
jahewson
Millimetre is mm, not MM which would be "meter meter" which is nonsense. MM is
actually the roman numeral for 1 million.

~~~
anonymouz
Capital M would be 'mega' as a prefix, but I think it does not exist as a
unit. If you're willing to read MM as Mm, then it would be Megameter.

You really should revisit roman numerals. MM = 2000, you have to add them,
not multiply.

~~~
jahewson
Good point. So it turns out that MM is supposed to mean "thousand thousand" in
the world of finance, but it is indeed not a correct roman numeral. Old
school.

------
randartie
A large number of the emails are from 'blogger.com'. Aren't google and blogger
one and the same? Are the urls being crawled because google is reading its
emails and crawling contained urls?

------
amarcus
You think this is bad. Try doing the following google search:

"password" filetype:csv

~~~
jchavannes
I'm sure lots of people have had unwanted encounters with Google's crawlers,
but here's mine: I used to have a subdomain pointing to my home IP which was
protected using Apache htpasswd. I naively had all of my clients' credentials
stored in text files (conveniently named credentials.txt). Somehow I
accidentally removed the htpasswd authentication and it was publicly exposed
for a day or two. Of course Google indexed it and you could view everything in
Google's cache.

There was a process for removing content from Google, but it took a few months
to get completed. I never told anyone and I'm pretty sure all that info is now
purged (I've tried to find it multiple times and it doesn't seem to exist
anywhere).

I also downloaded a WoW guide that I had temporarily thrown up on one of my
servers and forgot to take down. Like a year later I randomly was running a
Google image search for 'Northrend Map' and happened to notice my site was the
THIRD image. At first I thought it was a personalized search result, but I
checked from multiple other places and it was still there even though there
were zero inbound links.

------
cft
What is the meaning of the square brackets in the Google query syntax? I could
not find any official documentation.

~~~
akaBruce
They don't seem to do anything. Searching without the brackets seems to give
the same results for me.

[https://www.google.com/search?q=inurl%3Abcode%3D*%2Bn_m%3D*+...](https://www.google.com/search?q=inurl%3Abcode%3D*%2Bn_m%3D*+site%3Afacebook.com)

------
pokoleo
Looks fixed.

~~~
camus
not fixed here, i can access (private?) infos from people i dont even know,
scary.

~~~
Calvin099
Yeah? Proof?

------
nico-roddz
[http://translate.google.com/translate?sl=es&tl=en&js...](http://translate.google.com/translate?sl=es&tl=en&js=n&prev=_t&hl=en&ie=UTF-8&layout=2&eotf=1&u=http%3A%2F%2Fnicoroddz.com%2Fagujero-
de-seguridad-en-facebook-permite-acceder-a-perfiles-sin-siquiera-pedir-
contrasena%2F&act=url)

------
olalonde
I'm only getting one result at the moment... what did I miss? Did Google
censor the results?

------
cleverjake
I don't understand how these pages could have been crawled - could someone
enlighten?

~~~
nico-roddz
It seems that Facebook uses robots.txt to block these pages

<https://www.facebook.com/robots.txt>

But, depending on the number of inbound links, Google will index the urls
anyway.

It's a common issue.

~~~
cleverjake
Google ignores robots.txt if the number of inbound links is > N?

Also - any speculation as to how so many sites were linking to people's login
pages?

------
nico-roddz
For the curious. Google results before patch:

[http://nicoroddz.com/wp-content/uploads/2012/11/google-
faceb...](http://nicoroddz.com/wp-content/uploads/2012/11/google-facebook-
leak.png)

------
sabret00the
What is the MM quantifier?

~~~
smcl
I think it's just meant to mean "million" but it's used a lot in the
VC/startup community in valuations (" _CompanyX_ receives $10MM in angel
funding" etc). I think it has come from finance originally, but either way I
think it's unnecessary.

------
jervisfm
Can someone write a summary of the exact issue? Was it that people's FB
accounts were accessible through an auto-login link? Also, I only see one
result returned, and not 1 million.

------
benjlang
This is insane. That's why I use the <https://mypermissions.com> plugin, so
these types of things don't happen to my Facebook...

------
stanislavb
Smart! You can extract the emails from result URLs :) param = n_m
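
For anyone auditing what such a URL gives away, here is a minimal sketch with Python's `urllib.parse`. The URL is a made-up example in the same shape as the ones posted elsewhere in this thread, with a placeholder `bcode` and an `example.com` address; the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def emails_from_login_url(url):
    """Collect addresses from the `email` parameter and from `n_m` in the nested `next` URL."""
    found = []
    query = parse_qs(urlsplit(url).query)
    found.extend(query.get("email", []))
    for nested in query.get("next", []):
        # parse_qs has already percent-decoded `next`, so it can be parsed as a URL itself.
        nested_query = parse_qs(urlsplit(nested).query)
        found.extend(nested_query.get("n_m", []))
    return found

# Made-up URL with placeholder values, not a real account.
url = (
    "http://www.facebook.com/login.php"
    "?next=http%3A%2F%2Fwww.facebook.com%2Fn%2F%3Fprofile.php%26id%3D1"
    "%26bcode%3DPLACEHOLDER%26n_m%3Duser%2540example.com"
    "&email=user%40example.com"
)

print(emails_from_login_url(url))
```

The address shows up twice, once in `email` and once in the double-encoded `n_m` inside `next`, so stripping only one of the two parameters would not be enough to hide it.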

------
tlrobinson
I wonder if this is the source of the Facebook data leak ("I just bought more
than 1 million Facebook data entries") last week?

~~~
trevorcreech
If you read that original article, they were harvested from a 3rd party
Facebook App. <http://talkweb.eu/openweb/1819>

------
melc
Google patched this , search results with emails no longer available.

------
hdragomir
Seems like Google have already pulled these results. That was fast.

------
senthilnayagam
saw couple of email accounts, though clicking on the links logged me out

it seems more of what google crawled and stored, another possibility could be
illegally via cookies for ads or analytics

------
benguild
I don't see anything except results for "bcode"

~~~
reg29
When I click on a link I see another person's email address in the login box.
I assume it's a facebook user's email.

See these two examples :

[http://www.facebook.com/login.php?next=http%3A%2F%2Fwww.face...](http://www.facebook.com/login.php?next=http%3A%2F%2Fwww.facebook.com%2Fn%2F%3Fprofile.php%26id%3D100001456572036%26mid%3D6fa41a9G5af3e5dcfef2Gae3ee8G96%26bcode%3DtKss9DeN_1.1351631995.AaTwYn8dLQ56MqMZ%26n_m%3D987654i%2540asdasd.ru&email=987654i%40asdasd.ru)

[http://www.facebook.com/login.php?next=http%3A%2F%2Fwww.face...](http://www.facebook.com/login.php?next=http%3A%2F%2Fwww.facebook.com%2Fn%2F%3Fphoto.php%26fbid%3D361719993917155%26set%3Da.115623531860137.28715.100002374734257%26type%3D1%26mid%3D6e9f28dG5af3c735c4fbG0G109%26bcode%3DKYQ4O9gF_1.1350563168.AaRdixP-
iiqGCuph%26n_m%3Dlachatelaine.baroque%2540mail2.blog.fr%26lloc%3Dphoto_image&email=lachatelaine.baroque%40mail2.blog.fr)

------
sidcool
Can someone explain this to me?

~~~
joeblau
Yeah, me too. I think I got to this too late to understand what's going on
here :)

------
jasongaya
i think it was really great work done by facebook team.

------
drivebyacct2
What exactly was exposed here? It looks like it's been blocked now...

Just stealing from another bit in this thread: somehow these urls got on the
Internet even though they shouldn't have. They are pre-authed urls that auto-
login and then expire.

~~~
runn1ng
Seconding this... I see just ordinary account numbers from here.

~~~
notatoad
Still exposed here - when you click a link, it pre-fills the login box with a
user's email. And I guess some of the links include auto-login tokens.

~~~
runn1ng
Oh, OK. I can see the mails too, I just didn't think it's such a security
risk.

------
clobber
This whole Facebook (lack of) privacy thing just keeps getting better and
better, doesn't it?

------
rorrr
1 million million? You mean a trillion?

------
drivebyacct2
I'm interested to speculate on how to best mitigate:

delete all bcodes? Ask Google for a full list of results, regex and a delete
statement? Disable the bcode login and then re-ask the question?

