533M Facebook users' phone numbers and personal data have been leaked online

hn_throwaway_99 · on April 3, 2021

I mean, at this point I think everyone should just accept that at the very least their name, age, address(es), email(s), phone number(s) and screen name(s) have been fully leaked if you have ever had any kind of online presence. Not saying that's right or good, but at this point it's just a fact.

So if that's the case, I think we should move beyond really even trying to think of this info as private or a marker of identity, and we need to move everyone to more secure forms of identity verification.

As has been pointed out on HN before, "identity theft" is a made-up concept to make it seem as if you had something stolen from you, when the real problem is banks and other service providers do an absolute shit job of identity verification. They're the ones at fault, and they try to shift the onus onto you to fix things when they screw up.

Indeed, a social security number is pretty much the only additional piece of data to the stuff above that one would need to open up a bank account in someone else's name, and those have been leaked plenty of times too.

The government needs to make harsher penalties for banks and others that can ruin your credit, etc. because they accept all this leaked info as "proof" of identity.

AnthonyMouse · on April 4, 2021

The problem is completely the opposite. The flaw is the existence of social security numbers. Prohibit them from being used for anything but social security.

Then there is no "your social security number" or "your identity" for someone to open a bank account against. You open a bank account and they give you a bank card and you set an address and phone number. The day you open it, they shouldn't need to know who you are, because it's a new account with $0 in it. After that, the owner of the account is the person with that bank card who can be reached at the address and phone number given when the account was created.

Get rid of centralized identity and there is no centralized identity to steal.

bboreham · on April 4, 2021

Money-laundering rules make it illegal for banks not to know who the customer is. Also various anti-bribery and sanctions rules.

AnthonyMouse · on April 5, 2021

None of those rules are worth the cost. Identity theft, created by the concept of centralized identity, costs billions of dollars. Investigations of those other crimes are still possible without government-mandated privacy invasions by banks, and the privacy invasions are a huge cost in themselves.

bboreham · on April 7, 2021

Maybe.

How would you feel if an investigation stalled when a bank said “we have no idea who these people are, or where the money went”?

novok · on April 4, 2021

KYC laws, and governments' desire for silent, complete financial surveillance, stop banks from acting that way too.

stretchwithme · on April 4, 2021

Why not FIX how an SSN can be used? It was not created for this purpose but that doesn't mean how we use it can't be fixed.

Any time someone attempts to use your SSN to identify themselves as you, you should be notified and your authorization should be required for that use to be allowed.

And the higher the value of the authorization, the more care should be required.

Companies are able to do this already with 2 factor authentication. And I think we SORT of have that for change of address, as the Post Office both requires ID and notifies you by mail. Maybe that's really just one factor.

Allow people to require as many additional factors as they want. Confirming my identity in 6 ways when I decide to sell my house sounds good. 4 ways when I buy a car. One or none is fine when I make a $10 purchase. Let me decide.

And let them CHOOSE what organization manages authentication. A private company might do the job a lot more effectively than the Post Office.

Add human methods too. A call from a sibling or child might be one people could set up. Require validation by a notary public.

AnthonyMouse · on April 5, 2021

> Any time someone attempts to use your SSN to identify themselves as you, you should be notified and your authorization should be required for that use to be allowed.

So now the government needs a way of contacting you. Suppose they have your address and phone number on file.

Then you lose your way a while and become homeless for two years. You can't afford a phone and no longer have the same address, and have lost your ID or it expired. You finally start to turn it around and go to open a bank account. The government contacts you how? How do they know it's you?

The answer is that it's a new account and you're not trying to prove anything about whether you're the same person who lived at the old address, so you shouldn't have to.

And once you have a bank account or a mortgage or such, you and the bank can arrange for any form(s) of authentication you like. It shouldn't have anything to do with the government, and it definitely shouldn't have anything to do with how you authenticate yourself to your job or your wireless carrier.

stretchwithme · on April 10, 2021

If a new account can never be used to establish your identity for other purposes, there's no problem opening such an account without the government identifying you.

jand · on April 4, 2021

I stand by your "SSN only for social security" point.

But the rest sounds plain wrong to me.

Your phone might get stolen, your name might change due to marriage, your address might change because you moved/got evicted. What now?

AnthonyMouse · on April 4, 2021

Your phone gets stolen so you use your bank card and password to set a new phone number on your account. Your bank card gets stolen so you use your phone and password to get a new bank card.

If someone steals your phone and password and bank card and ability to receive mail at your home address all at once then you're pretty screwed, but you're pretty screwed then regardless, right? That's the level of screwed where somebody else can also get a government ID in your name.

laurent92 · on April 4, 2021

For sufficient amounts or suspicious operations, banks should be in charge of authentication, i.e. showing up in person for large transfers such as buying a car, showing up and supplying fingerprints if the bank transfer buys a house...

And in fact, for poor people, anything that might engage their entire savings might also benefit from a more advanced identity verification.

For example my stock exchange account just takes a password. It’s a scandal.

jand · on April 4, 2021

Well, you did not mention a password before. With a password, you introduce a new 'security feature' which changes your argument in a non-trivial way.

This might work better than your proposal before.

dheera · on April 4, 2021

I use a slightly different address with every business I give my address to.

For example

123 Main St. Apt 45-WXYZ

Where "WXYZ" is different for each business, to pollute any data mining algorithms trying to collate my address across sites. In some cases when they use address resolving functions, it also helps to spell out your address e.g.

123 Main St. Apt FOURTY_FIVE WXYZ

uniformlyrandom · on April 4, 2021

SSN is used as taxpayer ID, that is why it is needed.

Now, there is absolutely no reason it could not be OTP'd (or uniquely derived for each account).
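A minimal sketch of the "uniquely derived for each account" idea, assuming the issuing agency holds a secret key and gives each institution an HMAC of the number bound to that institution's name. `AGENCY_KEY`, `derive_account_id`, and the institution names are hypothetical; this is an illustration, not an existing scheme.

```python
import hashlib
import hmac

# Hypothetical secret held only by the issuing agency (assumption).
AGENCY_KEY = b"example-secret-key-not-for-real-use"


def derive_account_id(ssn: str, institution: str) -> str:
    """Derive an identifier that is stable per (SSN, institution) pair."""
    message = f"{institution}:{ssn}".encode("utf-8")
    digest = hmac.new(AGENCY_KEY, message, hashlib.sha256).hexdigest()
    return digest[:16]  # shortened only for readability


# Placeholder number; each institution sees a different identifier.
bank_id = derive_account_id("000-00-0000", "example-bank")
broker_id = derive_account_id("000-00-0000", "example-broker")
```

Under these assumptions, a leak at one institution exposes an identifier that is useless at any other, because nobody without the key can map it back to the underlying number or to another institution's identifier.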

ubertoop · on April 3, 2021

The scary thing is how much one's phone number (a somewhat ephemeral thing) is actually bound to your IDENTITY.

Considering your phone number is more and more being used in 2FA ... if you were to ever change your number and someone else got it, this would pose a serious security risk if you failed to change over ALL of your internet accounts 2FA to the new number.

ourcat · on April 3, 2021

I've always thought the most scary thing about this practice is that your (unique) phone number is a powerful "foreign key" which could potentially join data from many other leaked databases, forming an even larger dataset on you.

There are plenty of other places we give our phone numbers to, which might not have anywhere near the protections that Facebook say they provide.

mopsi · on April 3, 2021

Absolutely, and e-mail or Paypal account name too. Neither of them are trivial to change. If you try to create a new account for each thing at a generic mail provider such as Gmail, your accounts will be shut down by automatic abuse filters. If you roll your own domain, then, well... the domain becomes the foreign key.

Perseids · on April 4, 2021

The solution to this is unlimited true email aliases, as e.g. StartMail [1] and Fastmail [2] provide. I wish this were more commonplace among email providers. Besides the up-front cost of developing / setting up the solution, email aliases have the marginal cost of one small database row per alias. And it would be such a boon for privacy.

[1] https://support.startmail.com/hc/en-us/articles/360007297457...

[2] https://www.fastmail.help/hc/en-us/articles/360060591073

seng · on April 4, 2021

Would using separate per-service email accounts help mitigate issues? seng-banking@gmail.com, then seng-banking+icici@gmail.com, seng-banking+axis@gmail.com, etc? That way my primary email would stay private and would be used only for email, not for identity.

sammax · on April 4, 2021

Your private email that you don't use for signing up anywhere is irrelevant except for phishing and spam. Your secondary email address will become the foreign key that is used to correlate the datasets from everywhere you signed up with it. The +tags can just be removed since it is known how they work. Might give you a small protection against attackers who don't know about email address tags.
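To illustrate how little work removing the +tags takes, here is a short sketch. The function name `strip_plus_tag` is made up for this example, and it assumes Gmail-style tags where everything after the first `+` in the local part is ignored for delivery.

```python
def strip_plus_tag(address: str) -> str:
    """Remove a '+tag' suffix from the local part of an email address."""
    local, _, domain = address.rpartition("@")
    base = local.split("+", 1)[0]  # keep only the part before the first '+'
    return f"{base}@{domain}".lower()


addresses = ["seng-banking+icici@gmail.com", "seng-banking+axis@gmail.com"]
# Both tagged addresses collapse to the same join key.
canonical = {strip_plus_tag(a) for a in addresses}
```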

dheera · on April 4, 2021

> more and more being used in 2FA

Lesson: Don't use your phone number for 2FA. Get a bunch of virtual numbers and redirect their SMS to your e-mail.

If U2F or TOTP is an option, use it. And use a physical key for TOTP, not some Google or Microsoft authenticator.

anticristi · on April 3, 2021

Like really? Don't you have to walk to a bank or show some ID?

I live in the EU and I do operate under the assumption that banks take reasonable measures to ensure an account is linked to a legal identity.

the_svd_doctor · on April 4, 2021

No, in the US you can easily open one online just by answering a few questions and giving your SSN. Sometimes they will ask you to upload an ID. But that could also have been leaked.

hn_throwaway_99 · on April 3, 2021

No. Many online services will let you open a bank account with name, address, phone, DOB and social security number.

iso1210 · on April 3, 2021

Without sending a confirmation letter to the address and SMS to the phone?

brendoelfrendo · on April 3, 2021

If you're the fraudster, you're providing the address and phone number.

iso1210 · on April 3, 2021

In which case it surely wouldn't match with credit report databases?

AnthonyMouse · on April 4, 2021

It turns out that signing up for a new bank account is something that people commonly do at the same time as they're moving to a new place and change their address and phone number.

iso1210 · on April 4, 2021

So you're saying there's effectively no checks for ID for opening a bank account in the US?

cwillu · on April 4, 2021

"As has been pointed out on HN before, "identity theft" is a made-up concept to make it seem as if you had something stolen from you, when the real problem is banks and other service providers do an absolute shit job of identity verification. They're the ones at fault, and they try to shift the onus onto you to fix things when they screw up."

black_puppydog · on April 4, 2021

Well, I think at this point we can conclude that companies will not be good shepherds of our should-be-private data if we don't incentivize them to. Who'd have guessed?

I appreciate the pragmatic stance you take, and we should definitely move to more robust identification methods.

But making this case for better ID tech should IMHO not be confused with giving facebook and the others a pass. This should never have happened, and the very fact that we have a data trove this big is already a problem. That they don't seem to even attempt to protect that data is another one.

Close that shop up. The fines for this kind of stuff (and the other stunts they've pulled) should make it economically no longer viable to keep it open, really. The first time you fuck up this hard, the fines should hurt. This is not their first time.

ldbooth · on April 4, 2021

Companies effectively run the government, via the 2 party system and the back and forth of high level corporate execs into gov't. They don't want to rock the boat once that bountiful position is achieved. That said, how do you get the government to regulate corporations who regulate the government? Maybe I'm becoming too cynical, but there has to be an externality like the Great Depression in order for The People to take back the reins of power (at least in the US).

op03 · on April 4, 2021

Won't happen. Politicians the world over have been trained like Pavlov's dogs to chase Likes as a route to power. So it's like asking addicts to not just give up their drugs but to take down the cartels.

The time to shut down Facebook has passed. Now we just have to suck it up and endure them and their effects like we do Cancer.

black_puppydog · on April 4, 2021

Again, I get the sentiment, but I disagree.

fragile_frogs · on April 4, 2021

> Indeed, a social security number is pretty much the only additional piece of data to the stuff above that one would need to open up a bank account in someone else's name

I am not from the US, but is this really all you need to "prove" your identity?

The most common thing I have seen in the EU when companies have a KYC requirement is that during the sign up process you will have a quick video call where you have to show your ID card while they verify that your ID is legit.

ransom1538 · on April 4, 2021

"I mean, at this point I think everyone should just accept that at the very least their name, age, address(es), email(s), phone number(s) and screen name(s) have"

Yes. I think the scary part is WHAT these identifiers are connected to. EG. your email being linked to a private antifa community, pro-Taiwan communities, adulteration meetups, etc.

seaman1921 · on April 3, 2021

s/if you have ever had any kind of online presence./if you, your friends, your family, your cleaning lady etc. has ever had any kind of online presence.

johnchristopher · on April 4, 2021

> So if that's the case, I think we should move beyond really even trying to think of this info as private or a marker of identity, and we need to move everyone to more secure forms of identity verification.

Even if this info wouldn't be good enough to sign for official stuff it's still private and unique enough to target you though.

uniqueid · on April 3, 2021

We should start thinking of these breaches in terms of their accumulated impact. It's not the 1990s anymore, where data is difficult to store and networking too slow to move it.

We should assume the leaked data doesn't go away; that instead people out there are consolidating Equifax data with Vastaamo data, adding data from Exchange hacks and the Accellion hack, to cross-reference with data from Facebook... it's like water flooding a levee now, instead of evaporating.

Not the first time I've harped here about this (ie: https://news.ycombinator.com/item?id=26604753, https://news.ycombinator.com/item?id=24586258), but I hope we start planning for that kind of future.

uyt · on April 3, 2021

Honestly sounds like a fun job for future historians. By aggregating all the leaks over a long period, how much of a person can you reconstruct?

For example even though I am using a throwaway account, HN's logs might one day get compromised. So now they can join the IP address to other compromised sites that I was logged into using my usual email. And from my email they already have my name, SSN, address, phone number, usernames, passwords, etc, exposed from prior breaches. But now they know about my shitposts too.

Scoundreller · on April 4, 2021

> logged into using my usual email

It's risky if things go sideways, but HN doesn't require an email address for an account.

uyt · on April 4, 2021

That's exactly my point. I think I am safe on HN because I'm using a random user name with no email attached. But their logs definitely have my ip address and that ip address will be common across other compromised logs on other sites, some of which I might be logged into with a real email (this is true regardless of incognito mode since it's the same computer).

Scoundreller · on April 4, 2021

Use more TOR or CG-NAT systems like mobile/satellite/campus internet.

_clhx · on April 4, 2021

Yes. Why can't you delete your HN account again?

anonymousiam · on April 3, 2021

The root of the problem is not the privacy policy or the system security. The root of the problem is the collection itself. All large businesses, health care providers, and governments maintain databases. Every one of them will eventually be leaked. All it takes is a corruptible trusted insider.

xtracto · on April 3, 2021

This.

I don't trust the government, but I think digital "personal data" should only be available for "confirmation" to companies that need it. Say, a government entity could have an API that allows you to send hashed personal data that they can verify is right. This way companies will ask the user for their data and hash it client-side. Then they can send the hashes (hashed with a custom provided salt) to the entity (government, maybe private), who will basically reply with a True or False on the verification of the different data.
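A rough sketch of that confirmation flow, under stated assumptions: the verifier already holds the reference record, issues a salt per request, and only ever answers True or False per field. `REGISTRY`, `salted_hash`, and `verify` are hypothetical names for illustration, not a real government API.

```python
import hashlib
import hmac

# Hypothetical reference record held by the verifying entity (assumption).
REGISTRY = {"person-123": {"name": "jane doe", "phone": "5551234567"}}


def salted_hash(value: str, salt: bytes) -> str:
    """Hash one field with the salt the verifier issued for this request."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


def verify(person_id: str, salt: bytes, claimed_hashes: dict) -> dict:
    """Return True/False per field without receiving the raw values."""
    record = REGISTRY[person_id]
    return {
        field: hmac.compare_digest(salted_hash(record[field], salt), claimed)
        for field, claimed in claimed_hashes.items()
    }


salt = b"per-request-salt"
claimed = {
    "name": salted_hash("jane doe", salt),
    "phone": salted_hash("5550000000", salt),  # wrong number on purpose
}
result = verify("person-123", salt, claimed)
```

One caveat with this sketch: fields with few possible values, such as phone numbers, can be recovered from a plain salted hash by trying every candidate, so the hash hides less than it appears to.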

It may even be an interesting use case for a public blockchain, where your personal data is stored in a Merkle Tree type of data structure, so that one can verify that certain personal data of a person is true, without disclosing the data.
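For readers unfamiliar with the structure, a minimal Merkle tree sketch follows. It assumes each personal-data field is one leaf and that only the root hash is published; the helper names and the sample fields are invented for illustration. A proof reveals the one field being checked plus sibling hashes, and nothing about the other fields.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every tree level, from the hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        levels.append(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    levels.append(level)  # the root level, a single hash
    return levels


def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """Recompute the path to the root from one disclosed leaf."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


fields = [b"name=jane doe", b"dob=1990-01-01", b"phone=5551234567"]
levels = build_levels(fields)
root = levels[-1][0]              # the only value that would be public
proof = make_proof(levels, 2)     # proof for the phone field
ok = verify_proof(b"phone=5551234567", proof, root)
bad = verify_proof(b"phone=5550000000", proof, root)
```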

sdfhbdf · on April 4, 2021

It's probably not implemented as closely as what you described but check out Europe, this small continent across the pond and the tech scene in the smaller countries.

For example, Estonia has famously had online identity stuff linked via a federal ID (in Europe there are more republics than federations, so it's easier to manage country-wise) [0] [1]

Or, more familiar to me and with a bigger sample, there is a movement in Poland which is gaining popularity: mojeID (myID), a Single Sign On system with major banks as providers (they really religiously check identities when you open a bank account), or the state-backed login.gov. The mojeID system allows other entities to use your actual identity as an authentication factor without having to keep that much data and take on the risks; for example, an online alcohol shop can verify your age. [2] [3]

[0]: https://en.wikipedia.org/wiki/Estonian_identity_card

[1]: https://e-estonia.com/solutions/e-identity/id-card/

[2]: https://www.kir.pl/en/administration/mojeid/

[3]: https://www.mojeid.pl/#zastosowanie

menomatter · on April 11, 2021

For many reasons, good or bad, the idea of mandated federal ID or national ID isn't quite accepted in the US.

africanboy · on April 4, 2021

Italy uses a similar system, it's called SPID (Italian acronym for Public System for Digital Identity)

https://www.spid.gov.it

spondyl · on April 3, 2021

Wouldn't this fall down as soon as someone enters your details with the wrong casing or uses a +country code rather than a localised phone number and so on?

Humans will err and enter data incorrectly, so the hash could potentially be different every time despite the data being "correct" to a human at a glance?

You could standardise everything (lowercase etc) but I imagine there are country and regional edge cases such as capitalisation having meaning in a given language (I can imagine it being a thing but I don't know it for sure)

webinvest · on April 4, 2021

There could be instructions in place to specify only all lowercase, or just use .toLowerCase() when sending. Also there could be a specified format for phone numbers, or a function that turns all input into the desired format by stripping all special characters. Possibly only hashing the last 9 digits of the phone number for non mission critical applications instead of the full 10 digits.

This sounds like a good use case for short brief documentation.
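A small sketch of that kind of normalization, assuming a convention of case-folded names and digits-only phone numbers truncated to their last 10 digits. The function names and the sample numbers are illustrative only, and this does nothing for the language-specific capitalisation cases raised above.

```python
import hashlib
import re


def normalize_name(name: str) -> str:
    """Case-fold and collapse runs of whitespace (assumed convention)."""
    return " ".join(name.casefold().split())


def normalize_phone(phone: str, digits: int = 10) -> str:
    """Strip everything but digits and keep the last `digits` of them."""
    only_digits = re.sub(r"\D", "", phone)
    return only_digits[-digits:]


def field_hash(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two spellings of the same number hash identically after normalization.
a = field_hash(normalize_phone("+1 (555) 123-4567"))
b = field_hash(normalize_phone("555.123.4567"))
```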

TheRealDunkirk · on April 3, 2021

> Every one of them will eventually be leaked.

Equifax has more at stake than most. And they've been hacked. Repeatedly. The government has been hacked. Yahoo was COMPLETELY owned. I mean, if someone would put together a list, it would make for shocking reading. It's become so common, that we go, "Oh no! Anyway."

dunham · on April 4, 2021

Yeah, my health insurance company and the company that eventually bought my mortgage have both sent me letters. At this point I just dump them in a folder in the filing cabinet with the rest.

tomComb · on April 3, 2021

Google has a huge number of activist (and surely some corruptible) employees, and yet the incidents of users data getting out are very close to zero.

I think this demonstrates that user data can be managed safely and effectively.

Usually the incident reports on user data leaks show that the company seemed to barely be trying. We need laws that force them (even small companies) to put serious effort into it.

varispeed · on April 3, 2021

You don't know that. While the publicly available data leaks are indeed rare, you cannot know if they don't use the data for trading or other purposes for their personal gain without disclosing it to the public.

Judgmentality · on April 3, 2021

Sure, but if you have no evidence of it happening you have a fairly weak argument.

varispeed · on April 4, 2021

People do have access to privileged information and that will influence their decisions on both conscious and unconscious levels. It's not possible to detach yourself from work completely and assess objectively whether your thinking is influenced by something you saw. It will also be difficult to prove. For example, if an employee's hobby is trading, how do you prove whether the trades they made are based on their own independent research or on what they saw? If they saw something that could make them money, they could easily create a trail of evidence that they researched the matter on their own. It will be difficult to prove that it originated from looking up the privileged information, and unless someone is going to be making millions, it's frankly not economical to commit resources to. It is also in the interest of the company that such incidents don't see the light of day.

Judgmentality · on April 4, 2021

I understand your point, but everything you're saying is hypothetical. You're not going to persuade anyone of something happening if you can't come up with any evidence of it happening.

tomComb · on April 3, 2021

There are infinite things we can't know - opening the discussion up to that really makes anything possible, but the discussion wasn't even about what they might do with the data beyond leaking or selling it.

cutemonster · on April 4, 2021

> Google has a huge number of activist (and surely some corruptible) employees, and yet the incidents of users data getting out are very close to zero

Am I reading this wrong, or are you saying that activists would be more likely to leak data? Then I would wonder what kind of activists you have in mind.

Agreed that yes indeed it seems possible to build a security serious company, and that Google is (seems to be) a good example. (Now, there are other things I don't like about Google but I guess that's off topic.)

thu2111 · on April 4, 2021

Surely they would. We already learned that members of their own security team don't seem to see any problems with employees abusing privileged access to mandatory Chrome extensions to agitate for unionisation (at Google of all places!!). Twitter employees screwed with the account of the president of the United States.

Ideological employees of big tech firms taking a sudden disliking to someone or some group and abusing privileged access is certainly a threat that ever larger numbers of people are taking seriously. In particular, it is a concern for industries that do things activists don't like, such as working with immigration control (though perhaps that's no longer an issue now Trump is gone).

cutemonster · on April 5, 2021

> employees abusing privileged access to mandatory Chrome extensions

Sounds interesting, you don't happen to have a link? (So I can read more)

thu2111 · on April 6, 2021

https://www.nbcnews.com/news/all/security-engineer-says-goog...

Kathryn Spiers, who worked as a security engineer, updated an internal Chrome browser extension so that each time Google employees visited the website of IRI Consultants — the Troy, Michigan, firm that Google hired this year amid a groundswell of labor activism at the company — they would see a pop-up message that read: “Googlers have the right to participate in protected concerted activities.”

Discussion here: https://news.ycombinator.com/item?id=21813619

Note that she wasn't able to do that unilaterally. Some other member of the team approved her CL and others defended her in public.

I have a vague feeling there was another case like this some years ago where some security engineer modified a Chrome extension for political reasons, but I can't remember the exact details and can no longer find it.

edanm · on April 4, 2021

I mean, yes, but.. what's the solution? Never collect data? In at least some of those cases (and arguably all), that data does need to be collected and stored. What is the government going to do, not maintain birth registries, tax registries, land owner registries etc? What is a big business like a bank going to do, not collect customer data like your name and address?

lovecg · on April 5, 2021

I have a different view: it’s not the collection that’s the problem, it’s the firehose attached to the database. For the applications you mention, make aggregation over all records prohibitively expensive by design.

edanm · on April 5, 2021

A good idea, but again, there are plenty of cases where that fails (depending on what you mean by "aggregate").

- "How much money is currently owed in taxes to the government?"

- "Can't tell you that, we're not allowed to aggregate data".

lovecg · on April 5, 2021

I still think there might be something here. You can allow certain aggregations (like “sum of the tax column”), but they have to be explicitly permitted; otherwise shuffle and hash everything enough times to make a single lookup sort of cheap while making a scan very expensive (plus distribute over enough physical servers and make the network between them low bandwidth to thwart lower level attacks). With enough regulatory or legal pressure on companies to lock down their data, paying this premium might start to look attractive; one could even found a startup peddling the World’s Slowest Database™!

Edit: what I was thinking originally was that in the world of paper-only archives, these massive leaks were all but impossible, yet business could still be done. It should be possible to combine this slowness with the convenience of computers.

edanm · on April 5, 2021

I mean, that slowness is the reason we moved to computers.

That said this is certainly interesting, I wonder if there has already been an exploration of this topic. Could definitely make an interesting startup idea :)

HenryKissinger · on April 3, 2021

[flagged]

HighlandSpring · on April 3, 2021

On a long enough timeline everything and everyone can be compromised (or the institution fails before then)

hobs · on April 3, 2021

Exactly - either the data is basically not valuable at all (the category for which PII rarely fits) or else when the company collapses or is bought, the data moves too.

There's always an incentive to steal or leak it to other companies for money; so as long as the incentives are aligned with GATHER ALL DATA and KEEP IT FOREVER then yes, it will just be a matter of time before each data store is compromised by mistake or purposefully.

BobbyJo · on April 3, 2021

I doubt the claim, but the sentiment I think is valid. If you think about what data these entities are holding, it's not unique to a single database or entity. Your name/address/phone/ssn/etc. is likely stored in so many places that the probability it eventually gets leaked from at least one is, I'd say, very nearly if not exactly 100%.

allworknoplay · on April 3, 2021

Why on earth did you pick this username

londons_explore · on April 3, 2021

Looks like this is the "To match users to their friends by phone number, you need an API which can take as input a phone number, and return information about if that number has an associated account" problem.

There is no way to let a user find their friends on a service without such an API. Yet if you have such an API, someone can simply brute force all phone numbers worldwide (there are only 10^10), and now they have a database of all users...

Rate limits can help defend, but considering many users might have 1000 phone numbers in their address book, you can't set the rate limit very low without impacting user experience. Attackers can reduce the search space dramatically by only checking phone numbers that resolve to an active line (using VoIP stuff to test a number).

The only real solution is for your app not to have a "Here is a list of your friends already in the app" screen... But as you can imagine that means you won't get any user growth or VC funding...
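A back-of-the-envelope sketch of the rate-limit trade-off described above. The search space is the 10^10 figure from this comment; the per-account limit and the number of attacker accounts are assumptions picked purely for illustration.

```python
SEARCH_SPACE = 10**10                 # figure quoted in the comment
LOOKUPS_PER_ACCOUNT_PER_DAY = 1_000   # assumed limit, roughly one address book
ATTACKER_ACCOUNTS = 10_000            # assumed number of throwaway accounts


def days_to_enumerate(space: int, per_account_per_day: int, accounts: int) -> float:
    """Days needed to try every number at the given aggregate rate."""
    return space / (per_account_per_day * accounts)


days = days_to_enumerate(SEARCH_SPACE, LOOKUPS_PER_ACCOUNT_PER_DAY, ATTACKER_ACCOUNTS)
print(f"{days:.0f} days to cover the whole space")  # 1000 days with these inputs
```

With these made-up inputs the full sweep takes 1000 days, and every tenfold increase in accounts, or tenfold reduction of the space by pre-filtering to active lines, cuts that by a factor of ten.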

Someone · on April 3, 2021

I think there are way more than 10^10 phone numbers in the world. I think there are 10^10 combinations in the USA alone (filtering by unused area code, etc will decrease that number, but even then https://www.ck12.org/c/probability/permutation/rwa/Wrong-Num... says almost 8×10⁹ remain)

Also, at least some countries have longer phone numbers (Germany, the UK and China have 11-digit ones, for example), and the international public telecommunication numbering plan says plan-conforming numbers are limited to a maximum of 15 digits, excluding the international call prefix (https://en.wikipedia.org/wiki/E.164), so the search space, potentially, is a lot larger.

Scoundreller · on April 4, 2021

> I think there are 10^10 combinations in the USA alone

Canada and USA share the same numbering plan and that's sometimes overlooked.

I always "press 1" to interact with the 'your social has been criminally suspended blah blah' or 'card-services' robo-calls.

When I have time I share circular stories about my grand-kids that don't visit me anymore, but when I don't, I just respond in French with something like "Je pense vous etes une pamplemousse."

Most recently, one responded with "NO HABLOS ESPANOL". lordy lordy...

amluto · on April 3, 2021

This is the same fallacy that leads to apps asking for permission to access your whole picture library.

Facebook could have an API by which an app can prompt its user to show a list of all of that user’s friends who have the app installed. The app would only learn the identities of people whom the user explicitly selects, and phone numbers would not be part of that identity.

progval · on April 3, 2021

It works for photos because the threat model is about protecting local files against malicious apps.

But for phone numbers, you are talking about protecting Facebook's API (which is publicly available via the internet) against arbitrary devices, which Facebook has no way to tell apart from legitimate ones.

amluto · on April 3, 2021

What I mean is: Facebook should remove that API entirely. Apps do not need a way to look up a phone number in Facebook’s database. The “find my friends using this app” feature does not require this capability.

progval · on April 3, 2021

What you are proposing is that third-party apps should ask Facebook's app to find the friends, right?

But Facebook's app needs to access Facebook's database somehow; and anyone can impersonate Facebook's app and query that database too.

amluto · on April 4, 2021

Facebook’s phone app does not need to access the database. The whole thing could be implemented on the backend.

progval · on April 4, 2021

This is what I meant. Replace "database" with "backend" in my previous message and it still stands.

varispeed · on April 3, 2021

I think it should be illegal for apps to help find friends. If you genuinely meet someone offline, then they could generate a token for you that you could then enter on the site to "connect".
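
A minimal sketch of what such a token flow might look like, assuming single-use tokens that expire after a few minutes. The `ConnectTokens` class and its method names are hypothetical, not any real site's API.

```python
import secrets
import time

class ConnectTokens:
    """Hypothetical single-use 'connect' tokens handed over in person."""

    def __init__(self, ttl_seconds=600):
        self._ttl = ttl_seconds
        self._pending = {}  # token -> (issuer_id, expiry timestamp)

    def issue(self, issuer_id, now=None):
        """Create an unguessable token the issuer can show to someone offline."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)
        self._pending[token] = (issuer_id, now + self._ttl)
        return token

    def redeem(self, token, redeemer_id, now=None):
        """Return (issuer, redeemer) once; None if unknown, expired, or self-redeemed."""
        now = time.time() if now is None else now
        entry = self._pending.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return None
        issuer_id, expires_at = entry
        if now > expires_at or issuer_id == redeemer_id:
            return None
        return (issuer_id, redeemer_id)
```

Because the token is random and short-lived, there is nothing like a phone number to enumerate; the design choice here is that a connection only exists if both people took part in the exchange.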

noxer · on April 3, 2021

Telegram had this issue too, and they added a setting, "who can find me by my number". You set it to "my contacts" so only mutual contacts can find each other.

noobquestion81 · on April 4, 2021

This is a genius fix because in order to enforce your own privacy you must betray the privacy of n+ friends. Awesome.

noxer · on April 4, 2021

I think you misunderstood that. This setting isn't about the privacy of your friends' numbers; it's about who can find you with your number, for example by brute-forcing numbers. Whether you upload your address book or not is another story, and it does ask if you want to do that. Obviously, if you decline and then set the above setting to "my contacts", then no one will be able to find you by your number (which is exactly what I personally want).

There is absolutely no need to upload your address book to make your number unsearchable.

Scoundreller · on April 3, 2021

And now you know how those cell phone farming programs were able to pay people a couple bucks a month to run crap on arrays of dozens of phones.

dan-robertson · on April 3, 2021

Obviously it is bad if your personal data is compromised after you (or someone else) upload it to an online service like Facebook.

But in this case, it’s important to remember that phone companies used to regularly leak most of their customers’ phone numbers (and names) in the form of a telephone directory. So a question to consider is: suppose that the white pages were still commonly produced and contained most people’s numbers. How would you then feel about something like this?

Personally I feel like the problem with phone numbers being leaked is mostly the epidemic of spam calls (especially in the US) rather than some particular breach of privacy.

Aside: I think it is good to consider these counterfactuals in general for questions about information privacy, for example how would you feel if everyone’s tax returns were published publicly like they are in Sweden?

eightysixfour · on April 3, 2021

The "new" risk with phone numbers is the overreliance on them for login and 2FA and the relative ease of taking one over. I use security keys but still have accounts I can't remove the phone 2FA from despite having two keys tied in.

whichquestion · on April 4, 2021

If you can avoid it, simply don’t give anyone your phone number. Then you can use security keys, hardware keys and recovery codes. Obviously some places require a phone number, but for those that don’t, avoid giving them one.

groby_b · on April 3, 2021

Spam calls are likely not even affected by leaked numbers. Source of suspicion: My partner and I have phone numbers in close numeric vicinity, and deliberately use one for public purposes and the other one is not known outside of a very close circle of family.

We still get spam on both numbers within short time frames - so I'd say it's likely spammers just auto-dial through.

coldcode · on April 3, 2021

That's been going on for many years. Brute force calling costs nothing. I've always wondered if charging 5 cents per call would stop them cold, but I am sure no one wants to implement that now.

dudul · on April 3, 2021

As far as I can remember, the white pages don't include "biographical information". The kind of details used for idiotic "security questions" on websites too lazy to implement 2FA (your mom's maiden name, your first school, the name of your first pet, etc).

As for public tax returns in Scandinavia, first of all it has guardrails - searches are recorded with your information when you look up someone - and second, countries have different cultures and histories for a reason.

djhn · on April 4, 2021

Finnish income for those earning 100k and over is bulk downloadable as a csv/json with full names, birth years and provinces, along with taxable salary, capital gains, etc.

varispeed · on April 3, 2021

You can't compare that at all! They leaked IDs and from that you can go to user profile and learn more about them. You cannot do that from a phone company leak.

InternetUser · on April 4, 2021

Before the Internet, all you could do with a person's phone number is call them. But now, with PeopleSearchNow.com (the ultimate free source of personal info--feel free to look up your own real phone number in it, assuming you're American) and with Google and the MULTIPLE social networks that virtually everyone between the ages of 20 and 45 has been on now--typically using the same username on most or all of them--you can find out way more in a matter of a few MINUTES than a 1980s private investigator ever could in a span of a month. I can look up your number on PeopleSearchNow and then, if I can't find YOU on Facebook, I'm confident I can find one of your relatives (several of whom will be listed on PeopleSearchNow), and get to you from there--meaning find your other online profiles, thanks to those Facebook friends who know you.

dan-robertson · on April 3, 2021

Phone companies didn’t leak phone numbers in the conventional sense of the word. I used it to try to draw a comparison. Phone numbers used to be printed in big books and you could usually look someone’s phone number up if you knew their name and rough location. That is, phone numbers were not considered to be particularly private information at all.

I think the comments I most agree with talk about the different security threats people face today with current usage of phones.

InternetUser · on April 4, 2021

The issue of having someone's phone number is completely different these days; in 1985, I couldn't find out your age and your previous addresses and then find dozens or even hundreds of photographs of you alone and/or with your relatives and friends, as I can now - see my other comment:

https://news.ycombinator.com/item?id=26687321

joshspankit · on April 3, 2021

I agree, but also we’ve made it more complicated by using phone numbers as 2FA credentials.

Now suddenly a “white pages of cell numbers” becomes a very convenient tool for getting in to people’s accounts.

ajross · on April 3, 2021

Only if you can hijack their number. Knowing a phone number seems like by far the easiest part of breaking SMS 2FA...

allworknoplay · on April 3, 2021

This is insane. Phone companies published numbers because it was generally considered helpful and the costs of unsolicited calling were relatively high. By the 70s delisting was an option, and by the late 90s it was very common (in the US). The internet made this a no-brainer, and to suggest that it’s somehow ok just because it used to be (in a totally different world) is beyond ridiculous.

We don’t have the option here — people provide their number to a service to be able to use it, and the numbers are then compromised, in breach of that contract and because of the service’s failures.

The two are not remotely alike, what the fuck are you even talking about.

dang · on April 4, 2021

Please make your substantive points without name-calling and swipes. Those are against the site guidelines and we ban accounts that post that way—it's because we're trying for a different fate than internet-default here, or at least to stave it off a while longer.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and sticking to the rules when posting here, we'd be grateful.

kwertyoowiyop · on April 3, 2021

Don’t worry, Facebook will soon put out a press release including the phrase “we need to do better.”

poqegjrioe · on April 3, 2021

I work in the security field and let me tell you something I realized: nobody cares about security. If someone cares about security, it's because they've had many many incidents in the past. We humans are not a species that is good at preventing, we are good at reacting.

The security handbook[^1] has a chapter on that actually, and they basically say that role playing is the only way of not getting burned. Humans are excellent at role playing, and it can help you prevent a lot of catastrophes without having experienced them before.

[^1]: https://securityhandbook.io/

anticristi · on April 3, 2021

I think part of the problem is that many orgs see security as an overhead that engineers do to sleep well at night. A few more breaches, a few more fines and it will finally be seen as a feature to keep the CEO out of jail.

hunter-gatherer · on April 3, 2021

This is just it. I also work in the security industry, and the fact of the matter is that we (security professionals) can't give guarantees. I don't know what exotic exploit or bug will exist tomorrow. Security professionals basically offer what (to me) seems like a crappy insurance policy. Depending on your org's threat model, it is often just cheaper to deal with the breaches. --- I am not saying facebook falls into this category. ---

anticristi · on April 11, 2021

Security is not a replacement for data breach insurance. Security is basic hygiene for insurance to work at all.

To me, a good parallel is home insurance. If you get robbed, a good insurance will cover your losses. However, if said insurance determines that you were negligent -- say you never lock your front door -- you are on your own.

Do you have precious art at home that you want insured? No problem. Just make sure you add an alarm and sprinklers.

That is how I want discussions around security to be held. Are you a start-up with 10 users? It's okay to do minimal security. Are you a bank whose wires carry $1B? Make sure you throw sufficient "bodies" at the problem, from the top of the hierarchy to bottom.

kevmo · on April 3, 2021

Probably 2/3 of billionaires belong in jail.

aloisdg · on April 3, 2021

Probably most of them if not all.

Judgmentality · on April 4, 2021

"Behind every great fortune is a crime."

RachelF · on April 3, 2021

The problem is that companies don't care about securing their data, because the data is not theirs, it is about their users.

Mark Zuckerberg probably spends more on personal and family security and privacy than Facebook spends on their users' security.

PietKachelhout · on April 7, 2021

That site is a scam. The listed email address does not exist and I have not received a book.

esnard · on April 3, 2021

"This is old data that was previously reported on in 2019. We found and fixed this issue in August 2019."

https://twitter.com/Liz_Shepherd/status/1378398417450377222

varispeed · on April 3, 2021

What a pathetic response. Does it mean users changed where they live? Changed their names? Deleted and started a new account so the ID is different?

TheRealDunkirk · on April 3, 2021

> Deleted.

"You keep using that word. I do not think it means what you think it means."

I "deleted my account" in the run-up to the 2016 election, because I could see how social media was being manipulated, and what it is doing to society. And I mean I _deleted_ it. I took the extra steps of researching how to REALLY delete it; not just suspend it.

A couple years later, I needed to get a new account to help admin a page. You can guess where this is going. I tried my usual email -- since it should be free, and it was deleted, right? And... what do you know? Everything was still there. All my posts. All my connections. Everything.

Facebook never deletes anything.

varispeed · on April 4, 2021

I think this is because it is very expensive to delete and also to develop a true delete process. I think that is not acceptable and cost shouldn't be an excuse for keeping content forever. Even if you manage to get deleted from the live servers, chances are your content is still going to live in backups and will never be deleted. Some backup systems are write-once and could only be deleted by physically destroying the medium. I wish this was properly addressed in legislation.

mrweasel · on April 3, 2021

That’s kinda sad, because that is what’s going to happen and then we’ll hear nothing more.

At this point I’m not really sure what it will take for companies, like Facebook, to understand that you need to not fuck around with people’s private data.

BoiledCabbage · on April 3, 2021

Put a monetary cost of holding user data, and a steep monetary cost on losing user data.

E.g., pay x amount per month in perpetuity for each piece of information about a user you keep. And have to pay the "net present value" of those payments if you lose the data.

Having to pay for hoarding user personal data changes the incentives from gobble up as much as possible, to instead only pay for a user's data that is worth the cost to your business.

And as an extra incentive to not hold unneeded user data, know the costs you'd pay if it was breached.
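
To make the "net present value" part concrete: for a fixed fee paid at the end of every month forever, the standard perpetuity formula gives a present value of fee divided by the monthly discount rate. A rough sketch, where the fee, the rate, and the function names are all made-up assumptions:

```python
def perpetuity_npv(monthly_fee: float, monthly_rate: float) -> float:
    """Present value of paying monthly_fee at the end of every month, forever."""
    if monthly_rate <= 0:
        raise ValueError("discount rate must be positive for the sum to converge")
    return monthly_fee / monthly_rate

def breach_penalty(items_lost: int, monthly_fee: float, monthly_rate: float) -> float:
    """Hypothetical penalty: the NPV of the holding fee for every item lost."""
    return items_lost * perpetuity_npv(monthly_fee, monthly_rate)

if __name__ == "__main__":
    # Assumed numbers: 1 cent per item per month, 0.5% monthly discount rate.
    print(perpetuity_npv(0.01, 0.005))
    print(breach_penalty(533_000_000, 0.01, 0.005))
```

With those assumed numbers, each lost item costs about 200 times its monthly holding fee, which is the lever that makes hoarding expensive.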

mrweasel · on April 3, 2021

Who would get this money? I agree that it needs to be some solution involving a cost, given that most of these companies have shown multiple times that profit isn’t just their main concern, it’s the only concern.

pharke · on April 3, 2021

Think of it like a class action lawsuit on behalf of investors. Instead of entrusting their savings to a company, people are entrusting them with their personal information. If there is gross negligence on part of the company leading to that data being leaked then all of the people whose data was stolen should be able to claim monetary damages. If a legal precedent is established so that these claims can be pursued whenever this happens it should provide enough motivation for these companies to take preventative measures.

gpm · on April 3, 2021

The government typically... who might in turn do something like a tax rebate (write a check to everyone, as Ontario has been doing with the carbon tax) or just stick it into the general pool of taxes (reducing everyone's taxes).

29083011397778 · on April 3, 2021

So the American government gets a cheque for every other nation's citizens that use FB, or FB has to determine where each of their users reside?

Respectfully, I'm not sure either of these leads to outcomes we want.

anticristi · on April 3, 2021

Sounds interesting. Shall we call it "GDPR"?

mrweasel · on April 3, 2021

Honestly the EU needs to finance an organisation to deal with GDPR violations; hell, it could finance itself. The GDPR is the single best piece of legislation ever written, in terms of privacy, but enforcement is lacking.

donalhunt · on April 3, 2021

They already do. The Data Protection Commissioners in each state are growing rapidly to try and tackle the volume of work.

rikkipitt · on April 3, 2021

I’ve been getting a lot of automated/unsolicited calls recently. Begs the question if this might be the source of my woes.

Is there a trustworthy phone number version of https://haveibeenpwned.com?

tyingq · on April 3, 2021

"Is there a trustworthy phone number version of https://haveibeenpwned.com?"

An "exact" google search excluding adjacent phone numbers seems to work well for my numbers, and culls a lot (not all) of the autogen pages. So if your number was 212-555-1239, search Google with these strings:

  "(212)555-1239" -1240 -1238

  "212-555-1239" -1240 -1238

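If you want to generate those query strings for several numbers, here is a small sketch. It assumes a 10-digit number already split into its three parts; the `exact_search_queries` name is made up.

```python
def exact_search_queries(area: str, exchange: str, line: str) -> list[str]:
    """Build quoted search strings that exclude the two adjacent line numbers."""
    last = int(line)
    # Neighbouring line numbers, zero-padded, skipping anything outside 0000-9999.
    neighbours = [f"{n:04d}" for n in (last + 1, last - 1) if 0 <= n <= 9999]
    exclusions = " ".join(f"-{n}" for n in neighbours)
    formats = [f"({area}){exchange}-{line}", f"{area}-{exchange}-{line}"]
    return [f'"{fmt}" {exclusions}' for fmt in formats]

if __name__ == "__main__":
    for query in exact_search_queries("212", "555", "1239"):
        print(query)
```

Excluding the neighbouring numbers is what filters out the auto-generated pages that list whole blocks of consecutive numbers.
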
neogodless · on April 3, 2021

Dear god, fastpeoplesearch.com is a horribly obnoxious treasure trove of information.

randerson · on April 3, 2021

Just submitted a removal request for myself, a flow full of dark patterns (in fact the Remove button didn't even show up until I disabled my Pi-Hole). Remains to be seen whether all I did was make the data more valuable by confirming my email address. The page recommends signing up at BrandYourself to prevent various other data brokers from showing the same data. How is this not extortion?

tyingq · on April 3, 2021

"The page recommends signing up at BrandYourself"

Is it a link? BrandYourself has an affiliate program, so they are probably making money on referrals.

tyingq · on April 3, 2021

Tried it, you're right. Got 6 of my past addresses, 9 past phone numbers, 8 relatives, all correct. Some incorrect info, but not much as a percentage.

If you reverse search the PO Box address listed on the site contact page, you'll find an Amateur Radio license listed to a person that is probably the owner of the site, based on his past experience.

tyingq · on April 3, 2021

Also, searching for their Adsense publisher id reveals some other sites they own: peoplesearchnow.com, fastbackgroundcheck.com, smartbackgroundchecks.com

Those sites have new and different PO Boxes in other cities, etc.

jrnichols · on April 3, 2021

I am amazed and horrified at the fact that data brokers like that are legal, and at the hoops through which you must leap in order to get your information out of them, even with California privacy laws.

HNfriend234 · on April 4, 2021

Yup. That's why I pretty much deleted all my social media with my real first name, last name etc.

Imagine getting into a facebook argument with someone and that person becomes so emotionally enraged with what you said that they go to your house.

neither_color · on April 4, 2021

I just tried it on myself and I felt violated reading how much info is out there.

dktalks · on April 4, 2021

This is just one of the websites. Here is a list of all the websites which have your information with easy links to opt-out from them.

I do not maintain this and don't know where I got it from; however, I get notifications when this spreadsheet gets updated, so I can remove my information from another website.

https://docs.google.com/spreadsheets/d/10wzyKEl-fAxjCD42u9HQ...

acjohnson55 · on April 4, 2021

Yikes. That's the only one of those that was even close to being accurate for me. And I'm not sure I can get it removed because they don't have any of the right email addresses. I don't usually leave much of a trail on these sites, but the info that is correct vs incorrect makes me suspect they probably got it from one of my parents.

Group_B · on April 3, 2021

so if I search my phone number, it brings me to my name and everything. But if I search my name it doesn't get my phone number right. Any ideas why it's like that?

rikkipitt · on April 3, 2021

Good idea, I’ll give that a whirl later. Great tip to filter out those auto-generated list sites. Thanks.

dreadlordbone · on April 3, 2021

you genius

OminousWeapons · on April 3, 2021

Not really an answer to your question, but one partial solution to the problem of having your number leaked or sold is to set up a service like Twilio to act like a phone proxy. You can have Twilio forward calls it receives on a different number ("spam number") to your actual phone number ("real number"). You provide the spam number to anyone who isn't a business or personal contact. Every few months, you rotate the spam number. If your spam number is leaked, you don't care because it's only a transient number which isn't more permanently associated with you.

You can also have more permanent proxy numbers for services or people that may need to get in touch with you long term.
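
As a rough sketch of the forwarding piece: as far as I understand it, Twilio fetches a small XML document (TwiML) when a call arrives, and a `<Dial>` verb in it connects the caller to another number. The snippet below only builds that XML with the standard library; the placeholder number and the `forwarding_twiml` helper are assumptions, and it is not wired to any Twilio account.

```python
import xml.etree.ElementTree as ET

def forwarding_twiml(real_number: str) -> str:
    """Build a minimal TwiML document that forwards the incoming call."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    dial.text = real_number  # the "real number" the spam number forwards to
    return ET.tostring(response, encoding="unicode")

if __name__ == "__main__":
    # Placeholder number; a webhook for the spam number would return this body.
    print(forwarding_twiml("+15555550100"))
```

Rotating the spam number then only means pointing a new number at the same response; your real number never changes.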

procombo · on April 3, 2021

It's what I have done for years. Only costs $1/mo for the number and a couple hours learning their API.

Your existing cell number can be ported over to Twilio if you are patient.

The only problem is trying to use the number for 2FA. A growing number of banks (like Capital One) block Twilio services from receiving their SMS.

cced · on April 3, 2021

Can you think of any drawbacks of using this for important services like say PayPal? Or are you strictly using this for throw away products and services?

Phenomenit · on April 3, 2021

Is this available to people outside of the US as well and is there a guide for setting this up? Last time I used twilio for a basic sms gateway there was a lot of clicking and typing.

29083011397778 · on April 3, 2021

I've been using voip.ms in Canada to great success. Even SMS codes from banks and Whatsapp work correctly. Excellent service, highly recommend, especially with voicemail auto-transcription (then sent to email) and SMS from desktop via email.

OminousWeapons · on April 3, 2021

I think it is available for people outside the US.

https://support.twilio.com/hc/en-us/articles/223179908-Setti...

I would recommend using the Studio workflow which is GUI based and easy.

https://support.twilio.com/hc/en-us/articles/115016033048-Fo...

OminousWeapons · on April 4, 2021

I forgot to mention this earlier, but you can also proxy outbound calls through Twilio:

https://www.twilio.com/blog/make-receive-calls-twilio-number...

criddell · on April 3, 2021

I've been getting a lot more recently as well and I figured it was due to the phone companies promising to get rid of caller id spoofing this year so scammers are working overtime until they can't anymore.

zeta0134 · on April 3, 2021

Oh, is that a real thing that's happening? Caller ID spoofing is the main reason I hold onto my phone number from [small town] Texas, since only my immediate family ever calls me from there, so I somewhat reliably know anything else from that area code is a scammer.

criddell · on April 3, 2021

I hope so. I believe it's this:

https://en.wikipedia.org/wiki/STIR/SHAKEN

timdaub · on April 3, 2021

intelx.io

Can't say too much about trustworthiness though.

You could also just download the set from e.g. raid forum to check for yourself.

rikkipitt · on April 3, 2021

Might have to I think.

tnolet · on April 3, 2021

European here. What are these bot calls exactly? Never had one as I guess it’s forbidden where I live.

Scoundreller · on April 4, 2021

I recall there were a ton of them in France. Usually pretending to be DHL or another courier asking about a package. Nobody I knew interacted much with the calls.

If you're in Europe, but don't share a language with a much poorer country, you're safe from these.

henadzit · on April 3, 2021

Telemarketing or political campaigns. Check out the Robocall article on wiki. In Europe it depends on the country. In Poland I receive a few calls daily but they are people calling me, not bots. Never received a robocall here.

pessimizer · on April 3, 2021

In the US, the vast majority of them are simple frauds. For a year I got a robocall every few days from a Chinese woman (in Chinese) that a friend of mine said is a threat to get (the hypothetical Chinese immigrant) me deported unless I pay them.

Right now I'm getting a fake credit card debt collection call (I've never had a credit card in my life, only debit), and a call telling me that I'm eligible to have my AT&T (phone) bill halved (I don't have AT&T phone service) and all I have to do is call the number "on my caller ID." I think those two are both being read by the same woman (not the Chinese one.)

I'm more of a texter than a caller, so the vast majority of calls I get are robocall frauds. I'd love to get a robocall that was just annoying for a change, rather than completely predatory.

spicyramen · on April 3, 2021

Same here, I started receiving both calls and SMS, the latter of which I find more annoying. I use Android, and these haven't been detected as spam.

rikkipitt · on April 3, 2021

I’m on iOS and don’t think there’s a way of blocking unsolicited calls until after the fact... I hope to be proven wrong though!

The odd thing is, the calls often come through having a caller ID very similar to my own number.

thechao · on April 3, 2021

The best I’ve found is to simply reject all calls not in my contacts. Real callers leave a voicemail, which gets transcribed.

Scoundreller · on April 4, 2021

Doesn't work so well if you're on-call or running a business.

coldcode · on April 3, 2021

Those are usually generated; they call numbers in your area code/exchange randomly, assuming you will pick up something that seems familiar. Joke's on them, I moved to another state, so it's easy for me to tell.

_xy8h · on April 4, 2021

Best thing I ever did for myself was to get a Google Voice number in an area code I've never lived in.

A lot of these fake calls rely on people assuming "local number = neighbor" kind of mentality which rarely comes into play these days as we're much more mobile than we used to be. Plus area code splits, separate area codes for cell vs landline services becoming increasingly common.

If my GV number were: AAA-BBB-CCCC

And my actual home area code is DDD-EEE-FFFF

Any and all numbers from AAA-BBB-xxxx can be ignored.

I never answer calls that come in on my cell carrier provided number, so that eliminates that issue. Silent ring, no forward to voicemail.

Anything left, about 95% of the time, tends to be legit.

With regional calling being a thing of the past and most cell plans being unlimited text and talk, it makes very little sense to keep a local number. Especially now as it's SOP for roboscammers to fake Caller ID and try to match the first 6 of your 10 digits.
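
A minimal sketch of that rule, assuming 10-digit numbers and that "neighbor spoofing" means the caller shares your first six digits. The function names are made up, and this is not tied to any real call-blocking API.

```python
import re

def last_ten_digits(number: str) -> str:
    """Strip punctuation and any leading country code, keeping the last 10 digits."""
    digits = re.sub(r"\D", "", number)
    return digits[-10:]

def looks_neighbor_spoofed(caller: str, own: str, prefix_len: int = 6) -> bool:
    """True if the caller shares our area code and exchange but is not our own number."""
    c, o = last_ten_digits(caller), last_ten_digits(own)
    if len(c) != 10 or len(o) != 10:
        return False  # not enough digits to compare; make no claim
    return c != o and c[:prefix_len] == o[:prefix_len]

if __name__ == "__main__":
    own_number = "212-555-0000"  # placeholder for the AAA-BBB-CCCC number
    print(looks_neighbor_spoofed("(212) 555-1239", own_number))
    print(looks_neighbor_spoofed("+1 646 555 1239", own_number))
```

This only works if nobody you care about actually shares that prefix, which is the whole point of picking an area code you have never lived in.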

ajanuary · on April 3, 2021

Not natively, but there is an API that apps can use to do it for you. I use Mr. Number because it’s literally the first one I found and it’s worked well enough for me.

JoshTko · on April 3, 2021

on iOS there is a lifesaving phone setting of sending unknown callers straight to voicemail.

rikkipitt · on April 3, 2021

I toyed with that for a while but I kept missing important work calls. I might have a look for an app later, but I have a feeling it might not exist...

ghaff · on April 3, 2021

Yeah. I tend not to pick up calls that are in the "Who would be calling me from Texas?" vein. But while it's annoying to have to look at my phone when it rings, I do get calls from locations that seem plausible and they usually are legit. I'm not really willing to make myself harder to reach for legitimate and even important reasons because of the occasional junk call.

Nextgrid · on April 3, 2021

I wonder if you can get a VoIP number from a different country (where good regulation means spam is less prevalent) and use that for work calls?

ronsor · on April 3, 2021

I'm almost 100% sure your employer wouldn't want to make an international call every time they wanted to contact you by phone.

lanstin · on April 3, 2021

Work uses Slack/Teams/Webex. One person sends me Signal messages. No one has ever used telephony, except I use it to call the dial-in numbers because my phone audio is better than Bluetooth / virus agent laden laptop displaying ten videos of people's homes through VPN.

fourier456 · on April 3, 2021

This also started a few weeks back for me, more unsolicited calls/texts.

ve55 · on April 3, 2021

This could be the first large breach we've seen from FB like this. Most past breaches were of a much different and smaller nature (scraping or API access abuse), and seeing a real leak like this could change the landscape for FB quite a bit, since historically companies like Facebook and Google have been very good with preventing them. I don't know a ton about FB's specifics, but there's a chance this data could be 'public' from people with the given privacy settings, if perhaps 25% of users have that turned on. If that is not the case though, then this would be the first serious breach from FB imo.

Either way at this point I operate under the expectation that most information I input into a database may be leaked at some point. This is particularly rough for services that demand and track a lot of things, but it cannot be helped.

banana_giraffe · on April 3, 2021

Looking at the leak others have pointed to, there are a surprising number of people working in a particular imaginary company:

    sqlite> select company, count(*) as c from usa where length(company) > 0 group by company order by c desc limit 10;
    company                                   c
    ----------------------------------------  ----------
    Self-Employed                             459119
    Facebook                                  181013
    Retired                                   71210
    The Krusty Krab                           61550
    Hollister Co.                             42304
    U.S. Army                                 39682
    Stay-at-home parent                       33095
    Walmart                                   31600
    McDonald's                                30792
    Student                                   25326

dhosek · on April 3, 2021

Before I deleted my Facebook account, it said I was chief sciolist at the Theodore J. Kaczynski Institute of Technology.

itronitron · on April 4, 2021

There is a particular scene in the movie America's Sweethearts that you would enjoy. I would post a link but I haven't been able to find it on YouTube.

edit: found it :) >> https://www.youtube.com/watch?v=q7Ufkf0YVAk&t=290s

_lffv · on April 3, 2021

Teens like to say they work at the Krusty Krab for some reason. No clue why but when I had Facebook I saw it a lot on peers' accounts.

userbinator · on April 3, 2021

https://en.wikipedia.org/wiki/Krusty_Krab

Probably SpongeBob fans.

_lffv · on April 4, 2021

I wouldn't call any of them SpongeBob "fans". We did grow up with SpongeBob though and SpongeBob is a common subject for Internet memes, so that might be it.

yaml-ops-guy · on April 4, 2021

> I wouldn't call any of them SpongeBob "fans".

There’s over 70k of them, how can you be so certain?

cable2600 · on April 4, 2021

SpongeBob was originally for adults. But adults who grew up on Nickelodeon cartoons. Kids found it funny but don't always get the jokes because they are adult humor. They had to dumb it down for the kids. Before Netflix kids watched cable TV and Nickelodeon, and made a lot of memes about SpongeBob and the Krusty Krab. Every one of them wanted to work at the Krusty Krab with SpongeBob and Squidward.

techmagus · on April 3, 2021

I associate "The Krusty Krab" with fake/secondary/tertiary/spam profiles. People I know personally and those I was able to confirm as legitimate profiles don't use that or any other fake information (or just leave it empty most of the time). I only see such fake information in accounts used to advertise their business in someone else's threads.

viraptor · on April 4, 2021

Culture / group related maybe? I have barely any friends on Facebook with real work details - a lot of them are made up. Especially doctors like to keep their profession (and real names) private.

gbear605 · on April 3, 2021

I definitely know real people (especially highschoolers or college students) who put fictional jobs in their profile. Also common is using some fake name, like that of a fictional character.

znpy · on April 4, 2021

Fake name, of course.

I deleted my account when Facebook started asking for ID verification of my name; that was absolutely unacceptable.

uyt · on April 3, 2021

Can you link me to where you found the data?

banana_giraffe · on April 3, 2021

https://news.ycombinator.com/item?id=26682774

dr_kiszonka · on April 3, 2021

In the US, are there any legal repercussions for accessing such data? I wouldn't want to get into any trouble.

flycaliguy · on April 4, 2021

You won’t get in trouble but trust your gut if you’re just curious.

b212 · on April 3, 2021

Could you please tell me how did you convert it to sqlite? I've got a huge 1 GB txt file that crashes my comp every time I try to search for myself there :( Thank you!

banana_giraffe · on April 3, 2021

Super hacky python script I used to turn the text files into a sqlite database:

https://pastebin.com/gBWhCVGz

kuyan · on April 4, 2021

Thanks, this worked for me! Here's my contribution; I tweaked your script a little for memory usage. https://pastebin.com/4hA8VACe

shazzdeeds · on April 4, 2021

Thanks. I tweaked the script to include an incremental printout across files and also creating an index on phone num to easily search.

https://pastebin.com/AxY5PeDQ
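
For anyone who can't reach the pastebins, here is a minimal sketch of the same idea and not the linked script itself. It assumes colon-delimited lines with a made-up three-column layout (phone, name, company); the real files will differ, so adjust `COLUMNS` and the table definition accordingly.

```python
import sqlite3
from itertools import islice

# Hypothetical column layout; the real dump's fields and order will differ.
COLUMNS = ("phone", "name", "company")

def load_lines(conn, lines, batch_size=50_000):
    """Insert delimited lines in batches so the whole file never sits in memory."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (phone TEXT, name TEXT, company TEXT)")
    rows = (tuple(line.rstrip("\n").split(":", len(COLUMNS) - 1)) for line in lines)
    rows = (row for row in rows if len(row) == len(COLUMNS))  # skip malformed lines
    total = 0
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        conn.executemany("INSERT INTO people VALUES (?, ?, ?)", batch)
        total += len(batch)
    # The index is what makes a lookup by phone number fast afterwards.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_people_phone ON people(phone)")
    conn.commit()
    return total

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = ["15550001:Alice:Acme\n", "15550002:Bob:The Krusty Krab\n"]
    print(load_lines(conn, sample))
    print(conn.execute("SELECT name FROM people WHERE phone = ?", ("15550002",)).fetchone())
```

For a real file you would pass an open file object instead of the sample list, e.g. `open(path, encoding="utf-8", errors="replace")`, since a file object yields lines lazily.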

bobbylarrybobby · on April 3, 2021

I suggest using a shell program like grep (or even the search feature of less), as shell programs are notoriously good at lazily seeking through files to keep memory use low.
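
If you'd rather stay in a script, the same streaming approach can be sketched in Python: iterating over a file object reads one line at a time, so memory use stays flat regardless of file size. The `grep_lines` name is made up, and this is a plain substring match rather than a regex.

```python
def grep_lines(lines, needle):
    """Yield (line number, line) for every line containing needle, one line at a time."""
    for lineno, line in enumerate(lines, start=1):
        if needle in line:
            yield lineno, line.rstrip("\n")

if __name__ == "__main__":
    sample = ["first line\n", "my number 5550100\n", "last line\n"]
    for lineno, line in grep_lines(sample, "5550100"):
        print(lineno, line)
```

For the real file, pass `open(path, encoding="utf-8", errors="replace")` instead of the sample list.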

knolan · on April 3, 2021

Firstly don’t do something like open it in notepad. 1GB text files are not exactly difficult to work with once you use a proper text editor or parsing tools.

throwmusic2366 · on April 3, 2021

Does it open with Geany?

https://www.geany.org/

It's FLOSS. Available on Windows, Mac and Linux.

cable2600 · on April 4, 2021

I used to use Borland's Brief to edit large files. But it is commercial. There is a free version in beta here https://sourceforge.net/projects/grief/ Grief. It pages the file in and out of memory to save space.

datavirtue · on April 3, 2021

Try Ultra Edit, free trial. It can read and search massive text files without crashing. Quite responsive on 10GB files.

max_hammer · on April 4, 2021

You can use `awk` for any SQL-like operations on a file.
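
The same stream-and-aggregate idea, sketched in Python rather than awk, roughly mirroring the `group by company` query upthread. The colon delimiter and the column position are assumptions about the file layout, and `top_values` is a made-up name.

```python
from collections import Counter

def top_values(lines, column, delimiter=":", limit=10):
    """Count non-empty values in one column and return the most common ones."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if column < len(fields) and fields[column]:
            counts[fields[column]] += 1
    return counts.most_common(limit)

if __name__ == "__main__":
    sample = ["1:A:Self-Employed\n", "2:B:Self-Employed\n", "3:C:\n", "4:D:Retired\n"]
    print(top_values(sample, column=2))
```

Only the counter lives in memory, so this scales with the number of distinct values rather than the size of the file.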

swiley · on April 4, 2021

sqlite can ".import" a number of formats. It's one of the more convenient ways to do joins on ugly tables.

dunham · on April 3, 2021

What's the count of people who elected not to enter their company?