Show HN: Test if your (US) phone number is in the leaked Facebook data (thenewseachday.com)
44 points by davidjohnstone 11 days ago | 66 comments

When a person submits their phone number to this website, the website operator can immediately search Facebook for the user name associated with the phone number. No need to save the phone numbers. He can turn them into user names. Edit: Not suggesting the website operator is doing that, but the point is he could.


> website operator can immediately search Facebook for the user name

I’m not seeing that. If Facebook recognizes the phone number, it offers to send an SMS to that number or to send an email to an obfuscated email address (like r***@***).

At most, you learn that the user’s email starts with an “r”, as in the example above, and possibly the number of letters in the username. However, my experiments indicate that Facebook does not show the precise number of asterisks, so the length cannot be deduced.

Maybe when you tried it, you were using a known device (e.g., having an IP address from which you logged into Facebook on an earlier occasion), and therefore Facebook was offering you more detail, such as your username. Could you try a random phone number from the 533M leaked numbers and see if you still get the username?

Yes, the results will vary depending on IP+device fingerprint.

At a minimum, what this search allows is confirmation that a phone number is associated with an existing Facebook account.

Interesting, I didn't know you could do that. When I write "I'm not saving the phone number you enter" I mean to convey the idea that I'm not doing anything else with that number. As far as I know, the number and any derivative products of that number only exist inside the request context and are not stored in any way. There could be bugs, or I could be lying, but I'm not doing anything underhanded like that.

If you are using POST over https, that should be fine.

But if it is a GET request with the number in the URL over http, yeah, that leaves a trail.
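To make the GET/POST distinction above concrete, here is a minimal sketch that only builds the two requests and sends nothing. The endpoint URL, the `phone` parameter name, and the number are all made up for illustration; nothing here reflects how the actual site is implemented.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://example.com/check"  # hypothetical URL, not the real site
number = "2125550123"                   # made-up number

# GET: the number becomes part of the URL, which commonly ends up in
# server access logs, browser history, and proxy logs.
get_request = Request(f"{ENDPOINT}?{urlencode({'phone': number})}")

# POST: the number travels in the request body instead of the URL.
post_request = Request(ENDPOINT, data=urlencode({"phone": number}).encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url)
```

Either way the server still receives the number, so this only affects incidental logging, not what the operator could choose to record.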

Any 3rd party JS analytics has access to everything, though.

But it appears he is not doing that. I was able to search the phone number of someone I know, and it was confirmed to be in the breached data. However, it is not present in your above password reset / recovery link. On the other hand, it is for some others I tested.

Reading some of the comments here about entering your phone number on this site, I don’t believe there’s as much risk as people seem to think. If the site were asking you to enter your username and password to check if it’s on a pwned list, well, yes, that would be a dumb thing to do.

But a telephone number is telling the website operator at most that the phone might be associated with you, but all he learns about you is your IP address and possibly your browser fingerprint. He doesn’t get your name, Facebook ID, email address, interests, password, or anything else.

Now you might say that he can see your Facebook ID or email address in the list of leaked data, and possibly through the Facebook password reset user interface as well. But he could have done that anyway without you ever having supplied the phone number. He has the entire leaked list and it seems that pretty much anyone can get the leaked list with modest effort.

Furthermore phone numbers by themselves carry very little information because they are not sparse. If I give you a correctly formatted 10 digit number like nnn-nnn-nnnn, there’s a quite good chance that it’s a working North American telephone number. By correctly formatted, I mean that it has a valid area code, that the prefix doesn’t begin with a 1 or 0, that the prefix is not 555 (that’s for movies you know), etc. If you follow those rules, I once worked out that you’d have a 20% chance that you’d get a working phone number.

The point is that keeping your random 10 digit phone number off the Internet offers you no additional security or privacy. Phone spammers can test call every possible North American telephone number just as hackers can scan every IPv4 network address in the world (only 2^32 of those).

Associating the phone number with your name is bad, I agree. That allows targeted attacks (and targeted spam calls). But you are not giving your name to this website operator. You’re giving him 10 digits — he could have pulled 10 digits out of thin air and it would likely have been someone’s phone number anyway.
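The density argument above can be sanity-checked with a rough sketch. It counts how many 10-digit strings pass a few basic North American Numbering Plan format rules: area code and exchange each start with 2-9, neither is an N11 service code, and the 555 exchange is excluded because the comment excludes it. The function names are made up, the rules are not exhaustive, and it ignores which area codes are actually assigned or which numbers are in service, so it does not reproduce the 20% figure.

```python
def is_plausible_nanp(number):
    """Format-only check for a 10-digit NANP number; illustrative, not exhaustive."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 10:
        return False
    area, exchange = digits[:3], digits[3:6]
    for code in (area, exchange):
        if code[0] in "01":   # first digit must be 2-9
            return False
        if code[1:] == "11":  # N11 codes (e.g. 911) are service codes
            return False
    if exchange == "555":     # excluded here because the comment excludes it
        return False
    return True


def count_plausible():
    """Count format-valid numbers without enumerating all 10**10 strings."""
    valid_area = sum(1 for c in range(200, 1000) if c % 100 != 11)
    valid_exchange = sum(1 for c in range(200, 1000) if c % 100 != 11 and c != 555)
    return valid_area * valid_exchange * 10_000


print(count_plausible(), "of", 10**10, "ten-digit strings pass the format check")
```

Under these assumptions well over half of all 10-digit strings are format-valid, which is the sense in which the space is "not sparse".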

In my country my phone number is associated with everything from my electronic identification to bank transactions. It is part of a multi-factor authentication. Giving up my phone number would weaken one of those factors. I don’t like it, but the system is like that, so I don’t have much choice.

If my phone number could be linked to my Facebook data then I am giving away a lot more information than just my phone number.

Best put your phone number into this site then, so you can find out if you need to change it.

On the other hand, putting one's phone number into that site, or any other utterly anonymous random site (what the F is "newseachday.com"?) on a whim, is a mistake.

Why is it a mistake? Of what value is a valid phone number absent any other contextual information? Given a day or two, I could write a script to generate and submit all valid phone numbers in the world. Would everyone be doomed by my having entered their numbers?

I've got several different phone numbers, and they each receive different amounts and types of spam. You'd think given the limited size of the phone number space, that spammers would just dial numbers indiscriminately. However, it appears that they still operate using lists of more-likely victims. By entering my phone number in random forms, I'd worry that it could be used as another indication of a good number to call to bother real humans. Of course that's in addition to the obvious doxing of my online nym if done from the wrong browser VM.

The contextual information is in the leaked data, which you should assume they have a copy of.

Ok, but then they already have that. What does entering the phone number do that makes it dangerous?

Not super dangerous maybe, but undesirable: It adds your IP address to the data they have.

You haven't heard of thenewseachday.com because I quietly put it up online last week.

So? What reason would I have to trust it? Seriously.

I expect you mean well, but the "news" on that site is just an aggregator, like millions of sites with not-so-good intentions. This isn't a good "trust-me" look.

How about answering ashkankiani’s question?

If you are in the breach, the site operator already has your name. And now they will also have your IP address. Yes I realize not all users will enter a phone number that is their own. But for those that do, they’ve just added valuable information to the data held by the site operator, in the nefarious scenario.

David may very well be a trustworthy individual, but any time you voluntarily transmit your phone number online (to Facebook, or otherwise...), you are putting your own personal data security at risk.

Why not just release a compressed CSV of just the numbers? I could grep it in 1 second without worrying about leaking information.

It would be nice if it let you search partial numbers and manually scan the results.

I get so many spam calls I'm afraid to enter my number anywhere.

I get so many spam calls I'm not sure it would make a difference. I'm just about ready to set my iPhone to send all calls to VM if they're not in my address book.

Any plans to add a Paranoid Mode that lets you search for a hash of a phone number (or email address)? I'd imagine that could be more successful on here, heh.

The search space of phone numbers is far too small to offer any type of anonymity. Submitting a phone number's hash is no more secure than submitting the phone number itself.

That is true in principle and I didn't take that into account when I thought about it. Though it would still be possible, in theory, to use a hash function that is cheap enough so it's feasible to hash all the leaked numbers once, but expensive enough that it would take a long time to brute-force the whole number space.

Since there are about 32 million leaked US numbers, but there potentially exist up to 10 billion, any hash function that would take a day to process the leak would still require over 300 days to brute-force the whole space.

Granted, any number in the leaked set could still be trivially reversed when submitted -- but those were known already anyway, they are just associated with more metadata now.
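As a rough illustration of why a plain hash offers little protection here, the sketch below reverses a SHA-256 digest by enumeration. It is a toy under stated assumptions: SHA-256 is my choice (the thread names no hash function), the number is made up, and the search is limited to four unknown digits so it finishes instantly. The last lines redo the work-factor arithmetic from the comment above.

```python
import hashlib
from typing import Optional


def sha256_hex(number: str) -> str:
    return hashlib.sha256(number.encode("ascii")).hexdigest()


def reverse_hash(target: str, prefix: str, unknown_digits: int) -> Optional[str]:
    """Try every number with the given prefix until one hashes to target."""
    for n in range(10 ** unknown_digits):
        candidate = f"{prefix}{n:0{unknown_digits}d}"
        if sha256_hex(candidate) == target:
            return candidate
    return None


# Demo on a deliberately tiny space: first six digits known, four unknown.
secret = "2125550123"  # made-up number
recovered = reverse_hash(sha256_hex(secret), prefix="212555", unknown_digits=4)
print(recovered == secret)

# Work-factor arithmetic from the comment: if hashing the leak takes one day,
# hashing the whole 10-digit space takes total / leaked days.
leaked, total = 32_000_000, 10_000_000_000
days_to_brute_force = total / leaked
print(days_to_brute_force)
```

The same loop over all ten digits is only a question of time, which is why the slow-hash idea trades on how expensive each individual hash is.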

Many methods could be used to reduce the search space, such as not allowing the seventh-from-last and tenth-from-last digit to equal 0. Now I've reduced the search space by two orders of magnitude, so those 300 days just became 3 days.

To be honest, I don't really know the rules governing US phone numbers and just did a cursory google search, which came up with the simple answer of "10 billion".

That is very likely quite an overestimation. But if I'm not mistaken, the limitations you describe only reduce the search space by 19% (since it's two 90%-steps).

  100 * 0.1 = 10  // First  90% Reduction
  10  * 0.1 = 1   // Second 90% Reduction

Not allowing a single digit to be 0 is a 10% reduction though, is it not?
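The disputed arithmetic can be checked in a few lines. This sketch takes the restriction exactly as stated in the thread (two specific digit positions may not be 0) and the thread's round figure of 300 days; the variable names are mine.

```python
from fractions import Fraction

total = 10 ** 10             # every 10-digit string
restricted = 9 * 9 * 10 ** 8  # two positions limited to 1-9, eight positions free

remaining = Fraction(restricted, total)  # share of the space still to search
reduction = 1 - remaining
days = 300 * remaining                   # 300 days scaled by the remaining share

print(remaining, reduction, days)
```

Each excluded value removes a tenth of the space for that position, so the two restrictions multiply to 81/100 remaining rather than 1/100. If both positions also excluded 1, as the real numbering rules do, the remaining share would be 64/100.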

Oh, you're right! Thank you.

No worries :)

I checked the math so many times because I was starting to suspect my mind was glitching, lol.

> I'd imagine that could be more successful on here, heh.

You're right about that!

If I was to make a HN-friendly version, I'd probably make static JSON files that list all the numbers, indexed by the first four or so digits. When you enter a number, the first digits are sent to the server, and the appropriate JSON file is returned. That list is then searched client-side for the full number and the result displayed. The code should be simple and easy to verify that the full number doesn't leave the client, while maintaining the same simple user interface I already have. Variations of this idea could be more secure (i.e., only enter the start of the number and search for your number yourself in a long list) but less user-friendly.

I don't actually have any plans on implementing this though. I feel satisfied enough with what I have.

(I don't think hashing would work because the address space is too small and reversing is too easy. There aren't any email addresses.)
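The prefix-bucket idea described in this comment might look roughly like the sketch below. It is not the author's code: the function names, the in-memory dictionary standing in for static JSON files, and the sample numbers are all assumptions made for illustration.

```python
import json
from collections import defaultdict

PREFIX_LEN = 4  # "the first four or so digits", per the comment


def build_buckets(leaked_numbers):
    """One-off server-side step: one JSON document per number prefix."""
    buckets = defaultdict(list)
    for number in leaked_numbers:
        buckets[number[:PREFIX_LEN]].append(number)
    return {prefix: json.dumps(sorted(nums)) for prefix, nums in buckets.items()}


def client_lookup(number, fetch_bucket):
    """Client-side step: only the prefix reaches fetch_bucket; matching is local."""
    payload = fetch_bucket(number[:PREFIX_LEN])
    return payload is not None and number in json.loads(payload)


# Made-up numbers standing in for the leaked data.
static_files = build_buckets(["2125550123", "2125550188", "4155550142"])
seen_by_server = []


def fake_fetch(prefix):
    seen_by_server.append(prefix)  # everything the "server" learns
    return static_files.get(prefix)


print(client_lookup("2125550123", fake_fetch))
print(client_lookup("2125559999", fake_fetch))
print(seen_by_server)
```

The trade-off is in `PREFIX_LEN`: a shorter prefix reveals less about the number but makes each downloaded file larger.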

Just release a CSV containing just the numbers as a zstd compressed file. We can search it ourselves.

True, hashes would be completely trivial to reverse, I didn't think that through :D

And you're right, the only way to build a HN-friendly version would probably be to basically do the checking client-side, since any additional information you send to the server could be directly used to narrow the search space.

I think I read that there are some email addresses in the leak though; wasn't HaveIBeenPwned searching only for those, but not for numbers?

Oh, you're right, there are some email addresses, but not many. In the first 10,000 rows of Australian data, there are 62. I could be wrong, but I think the extra data about users (i.e., location, email address, relationship status, workplace) was scraped from Facebook so it only includes it when it was already publicly visible.

Just brute-force his website’s form with a polite delay between requests and enumerate your own list of numbers!

I laughed, then had another idea: Rather than send the server one number to check, generate another 99 random numbers in the client and send them all to the server. The client receives the status of all of them and shows the status of the entered number. The server never knows the actual number, and the phone number address space is saturated enough that many or most of the random numbers are also real numbers.
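A minimal sketch of the decoy idea above, assuming a hypothetical batch endpoint that accepts a list of numbers and returns a status for each. The function names, the fake server, and the sample number are invented; the decoys follow only the basic NXX-NXX-XXXX shape.

```python
import random


def build_query(real_number, decoys=99, rng=None):
    """Return the real number hidden among random 10-digit decoys, shuffled."""
    rng = rng or random.SystemRandom()
    batch = {real_number}
    while len(batch) < decoys + 1:
        area = rng.randint(200, 999)
        exchange = rng.randint(200, 999)
        line = rng.randint(0, 9999)
        batch.add(f"{area}{exchange}{line:04d}")
    batch = list(batch)
    rng.shuffle(batch)  # so position does not give the real number away
    return batch


def check_privately(real_number, server_check):
    """server_check takes a list of numbers and returns {number: bool}."""
    results = server_check(build_query(real_number))
    return results[real_number]


leaked = {"2125550123"}  # made-up stand-in for the leaked data


def fake_server(numbers):
    return {n: n in leaked for n in numbers}


print(check_privately("2125550123", fake_server))
```

One caveat with this design: if the same number is checked repeatedly with fresh decoys each time, the server could intersect the batches and single out the number that keeps recurring.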

That is similar to how I checked, actually.


Why bother? People here would have the dump and just grep it, I think.

Thanks David! Was going to download/parse the data myself to see if I had been impacted so appreciate you making this tool.

I made a version for NL users, but based on first name/last name (returning the last three digits of their phone number): https://jstsch.com/facebook

Where did you get this data? I could only find Australia.txt but it was a misspelt Austria

Any plans to create it for other countries? With partial search, maybe?

Really helpful - thank you!

If anyone with the zip file is trying to "grep" themselves on Windows, the PowerShell command is:

Select-String "(firstName\b)+:(lastName\b)" '.\theUnzippedFolder*.txt'

I'm not familiar with Windows at all, but this worked for me when testing names I already know are in the txt, and I found some people with names like mine but not me, phew. It would've been faster to just open each text file and Ctrl+F, but I guess I learned something useful.

HN keeps removing the slash before * but you'll figure it out.
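For anyone without grep or PowerShell, a cross-platform sketch of the same local search follows. It assumes the unzipped folder holds `.txt` files with colon-separated fields, as described elsewhere in this thread for the American data. The demo file name, field order, and rows are invented so the example can run without the real dump.

```python
import tempfile
from pathlib import Path


def find_in_dump(folder, needle):
    """Yield (file name, line) where needle is a whole colon-separated field."""
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if needle in line.split(":"):
                    yield path.name, line


# Demo against a throwaway folder with made-up rows (field order is assumed).
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "demo.txt").write_text(
        "12125550123:1000001:Jane:Doe:female\n"
        "14155550142:1000002:John:Roe:male\n",
        encoding="utf-8",
    )
    matches = list(find_in_dump(folder, "12125550123"))
    name_matches = list(find_in_dump(folder, "Roe"))

print(matches)
print(name_matches)
```

Matching on whole fields avoids hits where the search term is only part of a longer number or name, and everything stays on your own machine.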

Yeah, not going to use any search tool where I need to enter my number. You could just post the data by area code: create a bland UI that lists all area codes, let the user click into an area code, and then on the next page list all the phone numbers in that area code that have been affected.

I'd use that but not searching by phone number.

Years of telling Facebook I won’t give them my number finally pay off.

I deleted my account a while back. (Well, whatever Facebook calls “deleted.” Incidentally, did you know you can’t delete your HN account, even if you email them and ask?) Curious whether there’s any data for me at all in this breach.

> Incidentally, did you know you can’t delete your HN account

Reason enough right there not to link an email to your HN account

If you did something like ask for the first and last digits, plus the other digits of the phone number in any order, and then return the list of phone numbers that contain those digits, even that would be better.

Is this satire? I hope it is. I hope that when you enter a number into the field, gigantic text appears on the screen in 150pt font saying "HAVEN'T YOU LEARNED ANYTHING???"

In what way, exactly, am I compromising my security by entering random phone numbers into that input box?

Best case scenario, they link a phone number to my IP address, which is dynamic. They can narrow my IP address down to the general area, but phone number area codes already accomplish that.

People seem to forget that we used to have things called PHONE BOOKS which had people's addresses and phone numbers publicly available...

They now know you have a Facebook account and can use that to extrapolate your username using the Account Recovery feature of Facebook. If you're part of the breach they get to add your IP to the data they've already got on you, along with whatever was in your user agent. This is usually your mobile device make and model and/or your OS and browser.

Eh... They still can't do anything meaningful with that information that I'm particularly worried about.

Facebook shows a profile picture associated with the phone number, but it can't be linked back to the actual Facebook profile because they use an obfuscated URL for the photo. Facebook also shows email addresses, but they are also obfuscated.

Or you could just download the data yourself and grep it. No need to submit your phone number to yet another website.

Where would one find the data though?

Where can I find the data set? I want to grep for my friends and family and see which data is leaked.

Probably here but you have to pay: https://raidforums.com/Thread-CSV-Facebook-370-Million-db-re...

I don't know of anywhere that has leaks for free, but I'm not really involved in that world so maybe it exists.

I didn't really want to give money to a scummy website like Raid Forums but I figured this would be a popular torrent. If you search BTDigg for "Facebook" and order by date then it is on the first page.


At least I think that's probably it. Not 100% sure.

Edit: Yeah, seems legit. It has me in it. It has my phone number, name, Facebook ID, address (though note that it is just the address you tell Facebook; mine is just "London, UK", which isn't even up to date), and then a date. Not sure what the date is; possibly retrieval time? Most are between 2016 and 2018.

Why only USA?

I also made an Australian version: https://www.thenewseachday.com/facebook-phone-numbers-austra...

The data is separated by country and it seems it isn't perfectly consistent (i.e., the Australian data is one CSV file while the American data is six colon-separated files), so there's effort in adding each additional country.

Is this actually an Austrian version? Per other threads, the dump included "Austriaia", which was "Austria", but didn't include "Australia". Also the country code for Australia is +61, and for Austria it's +43.

It is the Australian version (although I accidentally downloaded that Austrian data first). I turned the phone numbers into local Australian numbers without the international code, where mobile numbers are always 04?? ??? ???.
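The conversion described above (dropping the +61 country code and restoring the leading 0 so mobiles read 04?? ??? ???) could be sketched like this. It is a guess at the general approach, not the author's actual code, and the function name and sample numbers are made up.

```python
def to_local_au(number):
    """Return a 10-digit local mobile number (04...) or None if it is not one."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if digits.startswith("61"):
        digits = "0" + digits[2:]  # swap the country code for the trunk prefix 0
    if len(digits) == 10 and digits.startswith("04"):
        return digits
    return None


print(to_local_au("+61 412 345 678"))
print(to_local_au("0412 345 678"))
print(to_local_au("+43 660 1234567"))
```

Normalizing both the stored data and the user's input through the same function means a lookup works whichever way the number is typed.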
