Ask HN: Who really gives your personal info to Intelius, Instant Checkmate, etc?
304 points by dataflow on March 6, 2017 | 247 comments
Please excuse the frustrated tone; the "secrecy" all over the internet about the whole issue has been driving me insane.

The usual useless BS is "oh, these companies get it from governmental public records". Yeah, right. I'm pretty darn sure that some of the personal information I can find online on MyLife, Intelius, InstantCheckmate, Spokeo, etc. is not in some government agency's public record, and regardless, surely there's no way that hundreds of companies are repeating each others' work over and over again when they could just buy the information from someone.

Someone (or a few) hidden underneath has to be doing the heavy lifting of scraping people's data from sketchy sources and selling it to third-party companies while staying hidden. My question is, who are these, and (where it is possible to know) whom are they selling to? How can I find out? Surely someone knows, and I'm tired of playing this goose chase where those who don't know just make random guesses as to how the information must be coming from some public records, and those who do know say hardly anything beyond "you have to know where to look".

I'm not looking for just 1 pointer, though I would appreciate it. I'm tired of pointer chasing. I'm just looking for as comprehensive a list as possible. It has to exist somewhere... after all, when a court needs to order that someone's information be purged (for whatever reason, e.g. for safety), it's got to have a list of these data aggregators somewhere, so I'm sure some people must know. So how do I find out? I'm hoping to also learn to fish in addition to being given the fish.

Thank you!

I've had the misfortune to be present for an in-person demo of a verification service nearly a decade ago. It involved those relationship questions ("Which one of the following people have you not lived with in the last 5 years?") that are incredibly creepy.

I was shocked that they had so much data on me -- I have no debt, no credit cards, no house, no car, no bills, and I had always entered informal rent agreements (I was poor) up to that point -- yet the rep was easily able to list all the places I had resided from college to present, along with a host of off-the-books housemates.

"Where the fsck did you get all this?!" I demanded.

"Have you ever ordered a pizza?"

Turns out, some fast food chains do a brisk business in reselling customer data. I had ordered from Domino's once, but that was enough to link my name to a specific location.

This experience has made me extremely sensitive about the information I give while making a purchase. When I lived in the US, I stopped having food delivered, paid in cash, and never signed up for branded credit cards. I rented informally and, whenever possible, tried to just pay the landlord my share of utilities in cash. Anything to keep a lower profile.

Far more companies than you realize are collecting as much data as possible about you, your habits, and your relations. So I'm afraid your search for a canonical list of data sources is ultimately fruitless. In this new economy, you are always the product.

Do we forget that not long ago everyone's name, address, and phone number was published in the phone book (unless you paid extra for an unlisted number), and hospital admissions were published in the daily newspaper? It used to be unremarkable, and now people shriek "privacy!!" when it's discovered that some mundane detail about their life is not a closely held secret.

No we don't forget.

Those systems were analog and could only be scaled to a certain extent, after which you ran into management overhead.

So in the example given by the OP, Domino's would not be updating your name in the directory.

That would be data that evaporated and never made it to record.

To increase the contrast - whole new types of clustering and analysis are now possible, at the moment a new data point is received.

Indeed, the amount and types of published information haven't broadly changed; what has changed is the information horizon: the ability to search for information and analyse it. The world hasn't become smaller; our ability to see is much, much wider. The information horizon is much further away from zero than it used to be (thanks to the ability of computers and big data to communicate and analyse), and its impact on society is about much more than just privacy.

This reminds me of the decision on GPS trackers on cars: https://en.wikipedia.org/wiki/United_States_v._Jones_(2012)

Dreeben cited United States v. Knotts as an example where police were allowed to use a device known as a "beeper" that allows the tracking of a car from a short distance away. Chief Justice Roberts distinguished the current case from Knotts, saying that using a beeper still took "a lot of work" whereas a GPS device allows the police to "sit back in the station ... and push a button whenever they want to find out where the car is."

I once worked on a project (about 15 years ago) where we had access to a reverse phone book listing that was generated in part by having a copy of every distinct paper phonebook in the US (also things like school directories) sent overseas to be hand entered (2-3 times for QC). A "nice" feature of it was that the address info was also geocoded, so we could answer questions like "Who are my closest 100 neighbors and what are their home phone numbers?". Between that and Google buying the archives of usenet[1] in the same basic timeframe, I realized that if it's written down it's not likely to stay private for long.

Edit: add citation [1] https://en.wikipedia.org/wiki/Google_Groups#Deja_News
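
To make the "closest 100 neighbors" query concrete, here is a minimal sketch of how it could work once every listing has a latitude and longitude. This is an illustration only, not how that project did it: the `Listing` type, the sample rows, and the function names are all invented for the example, and a real system would presumably use a spatial index rather than scanning every record.

```python
import heapq
import math
from typing import NamedTuple


class Listing(NamedTuple):
    """One geocoded phone book entry (hypothetical schema)."""
    name: str
    phone: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def closest_neighbors(listings, lat, lon, k=100):
    """Return the k listings nearest to (lat, lon), nearest first."""
    return heapq.nsmallest(
        k, listings, key=lambda l: haversine_km(lat, lon, l.lat, l.lon)
    )


# Made-up sample data, purely illustrative.
LISTINGS = [
    Listing("A. Example", "555-0100", 40.0000, -75.0000),
    Listing("B. Example", "555-0101", 40.0010, -75.0000),
    Listing("C. Example", "555-0102", 40.0500, -75.0000),
    Listing("D. Example", "555-0103", 41.0000, -75.0000),
]

nearest = closest_neighbors(LISTINGS, 40.0, -75.0, k=2)
```

The point is how little code it takes: once the addresses are geocoded, the hand-entered phone book turns into a proximity search.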

The phonebook was a paper-based system, with significant costs to look up and act on information at scale. You needed the books, the researchers, and the time to find and act on specific numbers.

The numbers weren't widely cross-referenced across multiple other identifier databases: shopping, driving, voting, location to 1m accuracy and 30s time precision.

Every household, small business, mafioso, or political operative didn't have a full copy of the archive, and the ability to deploy it at a moment's notice.

Or in short: scale and costs matter. In fact they dominate all other effects.

Your objection is both meaningless and betrays a profound failure of understanding and sympathy.

"Do we forget that not long ago everyone's name, address, and phone number was published in the phone book [...]"

The phone book is an opt-in system, with the ability to remain "unlisted". There's absolutely no such choice, though, from these modern, private data collectors. And they're everywhere.

Consider the following hypothetical: A battered wife flees her husband and takes shelter in a shared house run by a women's organization.

In your phone book world, she didn't need to dramatically alter her existence to stay safe. She could continue to maintain a reasonable subset of her professional and social relationships. She could get a new job in a new city. She could buy groceries. She could pay her own electricity bill. And she definitely didn't need to take her life in her hands to order a cheese pizza.

In this brave new world, though, her ex can track her with the utmost ease. She has to shed nearly every piece of her modern identity -- reconfigure apps, wipe her phone/laptop/tablet, stop using sites that geolocate via IP, and halt nearly all interactions with the modern economy -- at significant social cost.

The sheer number and diversity of data collection points -- and their increasing necessity to participate in the digital economy -- makes opting out orders-of-magnitude more difficult. Yours is a false comparison.

It's likely the locality / ephemerality / repudiability of those identifiers made them less of a big deal in an era when people couldn't instantly look these details up or build a detailed profile of everybody given the right data sources.

Brad de Long addressed this a while back.


Wait hospital admissions? Really, why?

I imagine your friends didn't tweet or message a Telegram group about having had an accident back in the day where that was common (whenever that was, I've never seen it). When the telegraph is the fastest method of communication and landline phones are expensive, you might not learn about a friend being in the hospital until word of mouth reaches you.

Not saying this is the one true answer, but I don't find it hard to imagine why.

Temporary postal mail forwards.

USPS NCOA (Change of Address) file is another major data source.


What makes me crazy about having your mail forwarded is that they often don't even forward it to your new address. I know because I once lived with a crazy person and had to move out and they just kept delivering all my mail there despite duly submitting my forwarding address and stressing the importance of it.

I still get mail for the last folks who lived here and they moved out 13 years ago.

But anybody willing to pay the post office for the data can find out your new address.

And then there's how we pay higher rates for first class mail so that junk mail can be cheaper.

It's almost like the government doesn't work for the people.

Thanks for the tip, from the link:

"There is, however, a loophole that keeps data brokers from accessing your updated address. When you fill out the online form to change an address, you can indicate a temporary change that provides six months of forwarding that can then be extended for another six months. That information, unlike the changes marked as permanent, is not included in the master list sold to data brokers."

And the online link for online change of address with the loophole:


Yes, but will they fail to forward the mail as reliably as they fail to forward permanently forwarded mail?


(I've mentioned the temporary forwards tip elsewhere in the thread myself.)

Their privacy policy says they don't do this:


That link at the bottom of the email states that they unsubscribe you from the mailing list.

Odd how I always seem to get more spam, but from different companies, after I click one. Almost as though all I've done is confirm that the email address actually exists, and is therefore more valuable to sell on to others.

Perhaps the policy is new. This was about a decade ago.

Regardless, that data is out there now, and no change in privacy policy is going to put that cat back into the bag. Besides, someone else is surely acting as a new source for current data.

It wasn't just my location information, though. They were able to ask detailed questions about my personal social network, before Facebook was ubiquitous and LinkedIn was a thing. They were clearly joining several disparate datasets together to discover relationships.

You might enjoy the book "how to be invisible" by J. J. Luna

Good to know. Let's all start ordering pizzas, etc. under random other names. Create a haystack to hide the needle.

A better strat would be if we all agreed on a single name for everyone to use.

I propose Null or Drop Table.

Why not both? :)


easy to filter out

This would not work, the data is heavily cross-referenced. You would simply have more names (as possible neighbors, or rather co-habitators) in your account.
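
A rough sketch of why a fake name doesn't help, assuming (and this is only an assumption for illustration) that records are linked on a normalized address. The feeds, names, and the `normalize_address` helper are all invented; real matching is surely far more elaborate than this.

```python
from collections import defaultdict


def normalize_address(addr: str) -> str:
    """Crude normalization: lowercase, strip punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in addr.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def link_by_address(*sources):
    """Merge (name, address) records from several sources, keyed by address."""
    households = defaultdict(set)
    for source in sources:
        for name, addr in source:
            households[normalize_address(addr)].add(name)
    return households


# Hypothetical feeds; names and addresses are invented.
pizza_orders = [("Jane Roe", "12 Elm St."), ("Drop Table", "12 Elm St")]
utility_bills = [("John Roe", "12  ELM ST")]

households = link_by_address(pizza_orders, utility_bills)
```

Here `households["12 elm st"]` ends up holding all three names. The alias doesn't hide anyone; it just shows up as one more possible co-habitant at the same address.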

I have insider knowledge as I used to work for one of these companies.

Depending on the product you purchase the data comes from multiple sources. Also these companies have sophisticated machine learning capabilities to build a profile based on various attributes found in seemingly unrelated pieces of data.

So the list consists of credit reporting agencies, public records, your online profiles with public access, court records, aggregators like LexisNexis and dozens like them.

This heavy lifting you speak of is done differently by each company, and it consists of drawing on multiple sources to enrich your profile. These companies spend millions on data and engineering and make even more. Whatever preconceived notion you have about courts ordering your records sealed, it doesn't happen in a centralized fashion; you would need to contact each data vendor individually to be removed. But it would be like playing whack-a-mole.

Thank you for the reply. Can you actually give some kind of a list though? The entire problem here is everyone explains the how but no one is willing to explain the who. I certainly understand it's "multiple sources", I'm asking who are these sources I keep hearing about. There can't be nearly as many sources as there are sites who buy from them. If you'd like to not name the company you worked for yourself then could you at least please list as many other ones as possible? That would be far, far more helpful than just saying they use machine learning and that they use multiple sources, etc.

I can give you one very specific answer: my late father worked in IT for a collections agency decades ago. One way that biz works is they pay a fixed amount to buy some company's 120+ day accounts receivable file (the feds and every state highly regulate this, and it's highly variable and complicated, but this is the simplified version...), and this gave them a vast pile of records and the legal ability to collect the debt. Now obviously they got most of their revenue by annoying the heck out of debtors right up to the legal limit. But they're leaving money on the table if they don't resell all the data they can legally sell. So sure, even 20+ years ago collections agencies were uploading records to various data aggregators in exchange for a check. Not just the original debt but followup activity and intel WRT the collections process. I would imagine that's only increased over the last couple decades. Most of these data transfers were two-way streets, not simple data-for-money trades, and obviously they maximized profits by leaning the hardest on the people most likely to actually pay up; they did not waste time on debtors known to other collections agencies as hopeless deadbeats.

There is no such public list. Most of these companies are privately held, their methods are trade secrets, and there is no real form of legal recourse. Your best bet would be to buy a book like this: https://www.amazon.com/Hiding-Internet-Eliminating-Personal-... and follow the recommendations, but even that is only likely to be half-effective.

Much of the public information is mined from sources like credit headers, your court records, utility bills, property and tax assessment records, voter registration lists, motor vehicle registrations, etc.

Unfortunately, the legal and technological landscape is such that 'hiding' from these kinds of services is effectively impossible.

I have a friend who is super paranoid about privacy stuff. He was even able to get his utility bills to use a PO box rather than his home address, which he said was extremely difficult.

Depends where you are in the country. Many rural places don't have home delivery and only have P.O. Boxes. Home delivery from Amazon, etc. requires a bit of work.

> There is no such public list. Most of these companies are privately held, their methods are trade secrets, and there is no real form of legal recourse.

How can there not be a public list of these data miners? When e.g. a court needs to control someone's information surely they know who these people are and they can let them know? Is there a secret list in every courthouse or something?

Or when someone wants to start another one of the higher-level companies -- how do they know which core aggregators to buy from? If that's a secret then how would they find out? Surely someone's gotta be willing to tell?

> a court needs to control someone's information surely they know who these people are and they can let them know?

I think you have a misunderstanding of what a US court can do. A court can only tell a specific party to take some action, and generally only if that party is somehow related to the legal action (such as being a defendant). Generally, there is no judgement that a court can make that can affect unnamed parties (unless they are John Does, which later have to be named).

Theoretically, you could sue each company with your data, and a court could tell each of those companies to remove your information. But it would have to be for each one, and the judgement is only binding on those companies.

> Surely someone's gotta be willing to tell?

The techniques used are generally trade secrets, and amount to competitive advantage. There is little incentive for a company to reveal this information (or for an employee to do so, and thus open themselves up to legal liability).

What about this though:

>> Or when someone wants to start another one of the higher-level companies -- how do they know which core aggregators to buy from? If that's a secret then how would they find out?

   > If that's a secret then how would they find out?
You don't. I've worked in data acquisition in the past, both buying data and selling it. Sometimes as the original source of truth and sometimes as a middleman that does data cleaning, standardization, appending (from other sources), then selling the derived product downstream.

Companies in that space guard their upstream sources quite heavily, because they don't want to be cut out of the process. You won't find a centralized list of independent data feeds and providers specifically because of that. In one scenario, we were dealing with a substantial rate increase from one supplier. We spent time attempting to source an alternate supplier of that particular type of data, and could only find sources that were several months more stale than we were currently getting (i.e. these people were getting the feed several hops after we were). In the end we paid the rate increase because we couldn't find an alternate source that was as close to the original data provider as our current source. And without knowing who the original data provider was, we couldn't go around our supplier.

The lack of a centralized directory isn't just done to make things opaque for end users, it's done to make things opaque for business competitors as well. It's an industry that's very, very reliant on networking and introductions.

Edited to add: You're also asking a lot of people in here to name specific companies even if they can't give you huge lists. This space is super heavy on NDAs (and trigger happy on enforcing them). If you've actually worked in it, there's simply no way you're able to name drop legally.
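
For anyone wondering what the middleman step ("data cleaning, standardization, appending") looks like in practice, here is a toy sketch. Everything in it is assumed for illustration: the record layout, the choice of phone number as the join key, and the function names. It is not any particular vendor's pipeline.

```python
import re
from typing import Optional


def standardize_phone(raw: str) -> Optional[str]:
    """Reduce a US phone number to 10 digits, or None if that isn't possible."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


def clean_and_append(primary, supplement):
    """Standardize primary records, drop unusable or duplicate ones,
    then append extra fields from a second source keyed by phone."""
    out = {}
    for rec in primary:
        phone = standardize_phone(rec["phone"])
        if phone is None or phone in out:
            continue  # skip unparseable numbers and repeats
        merged = {"name": rec["name"].strip().title(), "phone": phone}
        merged.update(supplement.get(phone, {}))
        out[phone] = merged
    return list(out.values())


# Invented sample feeds.
primary = [
    {"name": " jane roe ", "phone": "(555) 010-0199"},
    {"name": "Jane Roe", "phone": "1-555-010-0199"},  # same number, duplicate
    {"name": "bad row", "phone": "12345"},            # too short, dropped
]
supplement = {"5550100199": {"city": "Springfield"}}

derived = clean_and_append(primary, supplement)
```

The derived record carries fields from both feeds, and nothing in it says which upstream source supplied which field. That is part of why the provenance is so hard to trace from the outside.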

+1 thanks for the explanation!

And regarding this:

> Edited to add: You're also asking a lot of people in here to name specific companies even if they can't give you huge lists. This space is super heavy on NDAs (and trigger happy on enforcing them). If you've actually worked in it, there's simply no way you're able to name drop legally.

I understand that an NDA would prevent you from naming your own company or your suppliers and clients, but surely it doesn't prevent you from listing some other companies in this space that you know of (including but not limited to your competitors)? I don't understand why you shouldn't be able to name any company just because you've worked at one of them.

Every company I know in the space is a company that we've had at least preliminary conversations with (seeing if there's any potential relationship to either purchase from or sell to that company).

Just having that conversation required getting a mutual NDA in place, since the conversation involves revealing your capabilities (even if not your sources). And that's assuming you're even aware of all the NDAs your company has signed with other companies, which isn't always the case. Speculation or name dropping in public could violate an NDA you're not even aware of, then you find yourself having to defend your speculation as just that, rather than as revealing proprietary knowledge (that you didn't actually have but your company did).

At the end of the day, it's easier to default to speaking in generalizations rather than risk the potential repercussions of not doing that. :-/

> Just having that conversation required getting a mutual NDA in place, since the conversation involves revealing your capabilities (even if not your sources). And that's assuming you're even aware of all the NDAs your company has signed with other companies, which isn't always the case.

Interesting. Eight years ago I worked as a buyer for the aggregator with the largest criminal database and I don't remember having to sign an NDA during those sorts of talks. It's possible I've forgotten, but I think it was more a matter of people wanting to know our coverage as much as we wanted to know theirs.

As an aside, the value of those talks wasn't usually in acquiring the data, except maybe in the short term. We always preferred to go directly to the source. The value was simply in learning that the data from a specific source was even available. In a few instances, that got pretty frustrating, since I knew that the seller of the data was scraping a given court's website against that court's wishes (and the TOS on the website), with all associated problems with accuracy and ethics that entails.

Okay, I see. But somehow someone in your company found out about them before they could sign mutual NDAs, right? How does that happen? Do you just need a higher-up who's friends with the right people?

Imagine the legal liability you'd incur if the layperson could track the source of inaccurate data, then proved in court it kept them from getting a loan.

Those are provable damages, and maybe slander if they can convince the court they're a publication

Another reason to be NDAed up.

I imagine it's similar to how drugs and weapons dealers meet. When you know seedy people, you get connected. Birds of a feather and all that.

Go to an industry site like www.napbs.com and look around.

> Theoretically, you could sue each company with your data, and a court could tell each of those companies to remove your information. But it would have to be for each one, and the judgement is only binding on those companies.

As an individual, that sounds like a lot of work for little gain, but if there were pre-filled out forms, and all I had to do was put in my name, I'd be willing to file lawsuits to get my name removed.

>Theoretically, you could sue each company with your data, and a court could tell each of those companies to remove your information. But it would have to be for each one, and the judgement is only binding on those companies.

I'd expect going after such companies to be the state prosecutor's job.

If there are many companies, surely it would be in their interests to leak one another for competitive advantage?

What about class action? Could you make a class action against those who have sold your information?

A class action is an action where a large, possibly unknown, number of plaintiffs sue a single group of named defendants. The thinking is that if a single case against a named defendant is won, then the same outcome would follow for every other plaintiff, and thus doing it once saves the court time.

What you're suggesting is more like a John Doe case where you sue a number of unknown entities, and that can be won, but at some point, the plaintiff has to name the John Does, so that they can defend themselves.

Couldn't you start a class action against one named defendant, and use the discovery process to uncover all the other unknown-at-the-time-of-filing defendants?

I'm pretty sure that if the named defendant coughs up an NDA that prevents them from disclosing the names of their business associates to a court, the judge is not just going to say, "I'll allow it."

It has successfully been done in the past, so yes, you could do that.

You grossly underestimate the scope and power of the United States judicial system, which does not ever have a need or desire to "control someone's information."

> When e.g. a court needs to control someone's information surely they know who these people are and they can let them know?

The companies which furnish personal data aggregated from courts are legally required to stay on top of out-of-date records and purge records which are inaccurate. (it would be completely untenable for things to work the other way, with every court and agency which made records available being required to reach out to every recipient of the data. For one thing, in some cases the data can legally be resold.) They can be held civilly liable for distributing false information and might also be in breach of their agreements with the agencies which give them access to the data.

The urgency with which this is required under the law depends on the use to which the data is being put. It's really important to keep data used in pre-employment reports up to date. Marketing data can generally be full of garbage.

This is all generally governed in the United States under the FCRA. It's not an area of the law that you're going to get comfortably familiar with in just an afternoon of reading.

> When e.g. a court needs to control someone's information surely they know who these people are and they can let them know? Is there a secret list in every courthouse or something?

The courts don't control information in that way.

That's surprising. So what happens when there is a high profile case and the court fears for the jurors' lives? Or the case itself involves someone whose life will be in danger afterwards? If anybody can find out where these people live then they're toast. The courts have to suppress the information legally somehow, right? If not the old information, at least they need some protection against mining of new information after the subjects move or change their names, no?

Are you literally saying they have no way to order all the first-level companies to stop sharing data on someone?

> Are you literally saying they have no way to order all the first-level companies to stop sharing data on someone?

This is exactly what he's saying. There is no central node of control for this kind of information. You are operating from entirely unjustified assumptions.

> So what happens when there is a high profile case and the court fears for the jurors' lives?

Jurors' identities are not generally a secret.[0] There are exceptions, but those exceptions do not extend to wiping that person's data from things like pharmacy and gas station reward card databases.

Honestly your entire premise shows a lack of understanding of how the criminal justice system works.

> Are you literally saying they have no way to order all the first-level companies to stop sharing data on someone?

Sue all of them individually, win each case, and have them ordered to stop collecting data on you. Something tells me this is tantamount to "don't use a computer or a credit card. Ever."

[0] http://www.legalmatch.com/law-library/article/public-access-...

> Jurors' identities are not generally a secret.[0] There are exceptions

I was talking about the exceptions. If there are exceptions, a way to handle them must exist, is all I was saying.

> but those exceptions do not extend to wiping that person's data from things like pharmacy and gas station reward card databases.

I was asking about the companies who obtain this original information, not pharmacies' or gas stations' databases themselves. I feel like you're not understanding my question?

> Honestly your entire premise shows a lack of understanding of how the criminal justice system works.

Quite likely (I'm not claiming otherwise; I'm not a lawyer and I haven't exactly been involved in legal proceedings). It also hardly undermines my point. Like I've said in some 3-4 other comments, someone who e.g. starts a new company like InstantCheckmate has to know whom to buy the data from -- like I said, there's no way all of these companies contact all grocery stores and all doctors. That's insane. Someone's gotta be doing the heavy lifting and making money off it. I'm asking who this is. I asked this in the original post. If the court example is wrong or otherwise bothers you just ignore it. If someone not already involved in the business knows to contact these companies to obtain information, someone must know who they are, is all I'm saying. Otherwise they would not exist.

> I was talking about the exceptions. If there are exceptions, a way to handle them must exist, is all I was saying.

There is. They put them in a hotel, under police guard, for the duration of the trial. https://en.wikipedia.org/wiki/Jury_sequestration

What if I told you there are countries where you can use a credit card, and still have reasonable expectations about privacy, where the information gatherer must erase personal information on demand and bears the burden of erasing it and notifying everyone they sold the information to erase it too?

I'd like to know which ones. I'd consider relocating.

1. They don't know, but assume some Western European country fits that description.

2. They misunderstand German privacy laws.

For very high profile cases, a jury would be sequestered, meaning they are housed somewhere and not allowed to interact with the public. If they were in danger, there would be guards protecting them.

After the trial is over, though, they are on their own. They will not continue to be protected, and could certainly suffer retaliation. It sucks, yes.


If I was a company, I would be writing scraping engines that scraped

* police 2 citizen (a platform many counties and municipalities use to report crime and accidents to the public)

* any public facing dataworks plus web application (or whatever various other municipalities/counties are running): the one for the county I live in lists the arrestee's employer

* district level and state level court dockets

* real estate records, which also link up to tax bills

* public voter records

Not surprisingly, only the information I've listed on my voter ID has ever shown up in Intelius/LexisNexis databases.

I can provide all the sources used for the company I used to work for, however that's irrelevant if you as you said wanted to learn how to fish.

Any and every company selling data (and there are thousands) is in fact a source you would need to deal with to be removed from the sites that sell it. If you think this answer isn't specific enough, you're not going to achieve anything by me spoonfeeding you info for one such company.

>I can provide all the sources used for the company I used to work for, however that's irrelevant if you as you said wanted to learn how to fish.

Personally, I think this should be a requirement of the government: "HERE IS A LIST OF ALL PEOPLE THAT ARE COLLECTING AND SELLING YOUR PERSONAL DATA, CLICK THIS BUTTON TO DELETE YOUR RECORDS" sort of thing... it should be a mandated public service of regulation.

Then, they have a list of all opt-outs from which companies you may have selected to opt-out from, and you can simply send them any further contact from the opted-out companies you may receive and the company gets a fine, you get a compensation fee...

It used to be like this in many European states back in the 80's. If you wanted to keep a computerized database of personal information, you had to apply to the state for a permit, so there was central control of all the lists. Of course this was extremely difficult to enforce... Eventually that type of legislation was replaced by the EU data directive.

e.g. In Sweden: http://www.datainspektionen.se/om-oss/historik/

2 things. 1.) If you make government enforce this, then we are all going to be paying more and things will take longer. This will push society toward a fascist or communist state where the government is the all-powerful and mighty beast. I do not like this.

2.) In western countries there are already laws that allow you as the consumer to stop any personally identifiable information from being shared. The onus is on YOU to go do that; it's not hard, but it will require tracking and following up every year, forever. However, you also have a choice of which companies you provide information to, and what you provide. Any credit agency, when you apply for credit, asks if you would allow them to resell this. Opt out!

The fact that in most cases people are too lazy to read and understand what they are providing and for what purpose is the real issue here. Government is not going to make things better or more secure or even be able to enforce this type of governance. Only you and the lawyers can do this.

I don't think it's fair to call people lazy for not reading every line of an exhaustingly-long TOS. Also, it isn't fair to expect every consumer to fully understand the legalese that they are reading if they do decide to read it end to end. "Don't click 'I Agree' if you don't know what you're agreeing to!" is unrealistic.

This is sort of how European data protection is supposed to work, although you have to maintain your own list as there is no central one. And the enforcement is intermittent.

(Data subject requests are a very powerful tool)

If you ever buy anything with a CC, use a "membership" card anywhere, have anything delivered, or for that matter purchase anything online, then chances are better than not that this data was shared by the merchant, the transaction provider and/or the credit card company themselves.

If you have an online profile, with any friends that aren't paranoid, and allow your friends to see any private information, then this can be collected/correlated by the various bot farms.

> There can't be nearly as many sources as there are sites who buy from them.

Sure there can. The sources include state, county, and local governments. There are a lot of those.

As a person who used to be involved in that industry, do you try to protect your data more? Or is it just a losing battle?

Yes, I protect the shit out of my personal info. I have introduced a seeding mechanism to figure out who is sharing my data whenever I sign up for things. Public records are different and you have to create a Trust or a company to hide behind to not be personally listed.

I would suggest that anyone who doesn't want to be exposed never use their real name or address online unless they're confident it won't be shared. Introduce typos or initials, or use a nickname, for online orders that let you list a billing address separately from the shipping address. Billing data typically has a better chance of not being resold, but there are no guarantees; read their TOS if not sure.
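
The seeding idea above could be sketched roughly like this. This is only an illustration, not the commenter's actual method: the middle-initial scheme, the `secret` value, and every function name here are assumptions made up for the example. Each service gets a stable, slightly different spelling of your name, so when mail arrives under that spelling you can narrow down who passed it on.

```python
import hashlib
import string


def seeded_middle_initial(service: str, secret: str) -> str:
    """Derive a stable middle initial for one service (hypothetical scheme)."""
    digest = hashlib.sha256(f"{secret}:{service.lower()}".encode()).digest()
    return string.ascii_uppercase[digest[0] % 26]


def seeded_name(first: str, last: str, service: str, secret: str) -> str:
    """Name variant to give to a single service at sign-up."""
    return f"{first} {seeded_middle_initial(service, secret)}. {last}"


def build_registry(first: str, last: str, services: list[str], secret: str) -> dict[str, str]:
    """Map each service to the name variant it was given."""
    return {svc: seeded_name(first, last, svc, secret) for svc in services}


def who_leaked(name_on_mail: str, registry: dict[str, str]) -> list[str]:
    """Return every service that was given this exact spelling.

    With only 26 initials, two services can share a variant, so the
    result is a list of candidates rather than a single culprit.
    """
    return [svc for svc, seeded in registry.items() if seeded == name_on_mail]


if __name__ == "__main__":
    registry = build_registry("Pat", "Doe", ["grocer", "gym", "bank"], "my-secret")
    for service, name in registry.items():
        print(service, "->", name)
```

A longer seed (two initials, a suite number on the address) would reduce collisions, at the cost of looking more obviously fake.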

I really think you should do a how-to / AMA

Financial ignorance is an exceedingly exploited issue in the US, and I do not think anyone is doing anything about this.

Would you be willing to write up a guide of some sort about the steps that someone could take to protect their information? Personally, I would be willing to pay for something like that.

Are any of these companies operating in Europe and/or with mostly European data?

If anyone reading this thread is interested: I would pay non-trivial amounts of money on a regular basis for a service that systematically worked to eliminate records like these (and the sources they draw from), as well as chasing down sources of junk mail and the lists they ultimately draw from.

The value would depend on effectiveness, and on the degree to which the service clearly reported exactly what they did. Calling and unsubscribing from sources of junk mail would be a moderate time-saver, but finding out where they got their names and addresses from and destroying those would be far more valuable.

It'd take some optimization and batching of the process to figure out how to avoid taking an excessive amount of time per person.

I've considered working on this problem from a business standpoint, but I couldn't figure out a good business model for it. I don't think too many people will pay a monthly fee to have their information removed from these services. My guess is that they would sign up for a month and after their information has been removed, immediately cancel their subscription until they needed to do it again. And a yearly fee seemed like it would cost too much for mass adoption.

There's also the problem that you'll often need to get the customer to "opt out" by providing their own information to verify they own it or they will need to click on a link from an email or receive an SMS text verification code. This gets really messy as an automated service.

I used to work for https://www.reputationdefender.com/privacy - that is (was) exactly their business model. One of the big selling points was the time savings, where it would take you 80 hours a week or some crazy number if you did it yourself, filling out forms and keeping up with all of the new services that aggregate and sell this stuff. Doing it once is great, but won't help you after that month - all of the same places that initially sent all your data in to the aggregators are just going to do it again next month, so you'll get a new record all over again.

They have largely pivoted since then, into a service primarily for reviews and feedback management. I don't have any insight into the quality of the existing service on either side.

It's possible that the business could be so successful that everyone uses it, the services selling this information all run out of customers and go out of business, none of them come up with newer and more evil ways to do this, and you run out of potential customers. In which case: mission accomplished, retire on your giant pile of money and bask in the knowledge that you made the world a far better place. (Avoid scenarios in which you have perverse incentives to allow the problem to continue.)

But in the meantime, tens of millions of potential customers times any reasonable fee seems more than enough to build a substantial business on.

You could tempt people in with a cheap fee to let them send in a few pictures of junk mail and stop those. (As you get more, find the biggest sources and automate or batch them so that they cost you almost nothing, which will pay for the higher-effort ones. Have an upper bound on effort expended, and tell people that they don't pay if you can't remove them.) You could then track down the underlying sources, and if you successfully identify them, contact the customer, and give them enough information to decide to pay you for a higher-end service to get them removed from those sources (and keep them removed).

The value that gets people to keep paying you would be a steady stream of reports of "we found this source leaking/selling your information, here's what we did about it". It'll take you years to track down all such sources and find paths to remove them; you will likely end up having to fund some legal work and possibly even a lawsuit or two, which will give you a giant pile of publicity.

(As one example of something much easier for a company optimized for the process to do than an individual: the USPS has a detailed process for formally putting a company on notice for mailing someone who has specifically unsubscribed, and that process ends in massive fines for continued mailing to that person. I read a report of someone doing that to stop receiving persistent Dell catalogs.)

If you're sufficiently creative, you could even pitch this as a service to marketing companies. You have a list of people who will not buy anything via direct mail, and who will despise any company that they receive such mail from. Convince the sources of postal spam that removing those people from their list makes the rest of their list more valuable. Convince the downstream customers of those sources that using your list directly is far more convenient for them than dealing with opt-outs from every individual on it.

That also gives people a continued incentive to pay to remain on that list.

>the USPS has a detailed process for formally putting a company on notice for mailing someone who has specifically unsubscribed, and that process ends in massive fines for continued mailing to that person. I read a report of someone doing that to stop receiving persistent Dell catalogs.)

I would love to learn this process! I've repeatedly asked for a certain mailing to stop and it hasn't ceased.

I'd rather see much larger bulk rates via the USPS for something like "environmental impact".

For example, I'm currently getting no less than 3 letters per week from Spectrum (formerly TWC) promoting their new triple-play plans. I drop all of them in the recycling bin.

I couldn't care less about their bottom line, but I do care about the environmental impact.

At 3x per week, it probably costs them around USD$0.80/week (postage, paper, printing, etc.) to send those 3 letters. Call it USD$1 to make the math easier. I currently pay about $9.03/week. If I upgraded, it would be at least 3x that amount.

I'm not sure how to calculate the profit they would gain after an upgrade, but I can't imagine it would take more than 2-3 months for the new rates to more than cover the mailing fees for an entire year of their letters.
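
A rough sketch of that break-even arithmetic, using only the figures quoted above (USD$1/week in mailing costs, a $9.03/week bill, an upgrade of at least 3x). The `margin` parameter is an assumption added for illustration, since the comment gives revenue figures and not profit; with `margin=1.0` the result is a revenue break-even, which comes out to roughly 2.9 weeks.

```python
WEEKS_PER_YEAR = 52


def break_even_weeks(
    mailing_cost_per_week: float = 1.00,   # rounded up from ~0.80 in the comment
    current_bill_per_week: float = 9.03,   # figure from the comment
    upgrade_multiplier: float = 3.0,       # "at least 3x that amount"
    margin: float = 1.0,                   # assumption: share of extra revenue kept as profit
) -> float:
    """Weeks of upgraded billing needed to cover one full year of mailings."""
    annual_mailing_cost = mailing_cost_per_week * WEEKS_PER_YEAR
    extra_revenue_per_week = current_bill_per_week * (upgrade_multiplier - 1)
    return annual_mailing_cost / (extra_revenue_per_week * margin)


if __name__ == "__main__":
    print(f"revenue break-even: {break_even_weeks():.1f} weeks")
    print(f"at a 25% margin:    {break_even_weeks(margin=0.25):.1f} weeks")
```

Even at a hypothetical 25% margin the year of letters is covered in under three months, which is consistent with the 2-3 month guess.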

And yet, I'll never upgrade and I hate the waste caused by their practices.

Why does this happen? Is there some subcontractor who just bills by the piece so they don't care about de-duping the idiotic mailmerge that has 5 entries for each entity I'm affiliated with?

In the case I listed above, these are separate, distinct letters. Certainly from an automated process, but clearly all part of a larger marketing program.

In the ones like Dell (which I've also experienced in the past), I suspect there's some metric involved for getting the most contacts for "coverage". The reality is unless they were tracking my moves to different companies, there could be different people with the same name. In that sense, I'm very glad they can't correlate with employment data.

There's also the idea that sending to multiple people within a company means someone may see something they want and try to go through the procurement process because of the catalog, rather than because "IT" says it's time for a refresh.

Not that I agree with any of those practices, for a number of reasons, but I could understand the case for them.

Cable and phone companies don't seem to care about efficiency in their marketing efforts. Almost everywhere they're allowed to count it as an expense and fold it into their regulated costs and make a regulated profit off it.

Putting on my cynic hat, buying lots of newspaper, television and radio ads is a good way to keep the media on the sidelines instead of criticizing the incumbents.

You want a USPS form 1500 (https://about.usps.com/forms/ps1500.pdf). The intended use is for reporting obscene mail but the decision about whether something is objectionable is left up to the addressee (Rowan v. United States Post Office Department - https://www.law.cornell.edu/supremecourt/text/397/728).

>"the USPS has a detailed process for formally putting a company on notice for mailing someone who has specifically unsubscribed, and that process ends in massive fines for continued mailing to that person."

I doubt this is enforced in any meaningful way. I don't doubt there is a well-detailed process; however, the US Post Office's bread and butter seems to be Dell catalogs and Southwest Airlines credit card offers. This fact is what killed Outbox:


Dell is pretty bad. Every month I get three catalogs (when one is more than enough) at my address because of slight variations in my name, even though they are all for the same person.

We get four catalogs a month to three different people, one of which gets two catalogs; one addressed to their role as Vice President, and one as Boardmember.

We don't even want _one_ of them. It's not like we're going to see a new printer in a catalog and say "hey, sounds like a great buy!"

Maybe the solution is to sign up about 3000 people at every address you've ever lived at. If all of us sign up for hardcopy catalogues everywhere under every variant of our own names and fictitious names, maybe they'll eventually stop or go out of business.

Why bother yourself or the new residents of your home? Sign up any Dell related outfits (resellers, etc) for them instead.

People, like myself, pay a monthly fee for credit monitoring services. A lot of these services will identify if personal information of yours is publicly available on the web.

I only know of Instant Checkmate because my fiancée uses Mint Credit Monitoring and they notified her that her info was available on that site. I promptly helped her opt out of the site. I would have loved it if Mint just had a 'Help me purge my info from this site' button, because I felt dirty just having to confirm my fiancée's info was on that site and then go through their process to remove it.

>"I don't think too many people will pay a monthly fee to have their information removed from these services."

I would be interested in hearing what you are basing this on. Did you do some market research?

My feeling is that there is money in privacy, people seem to have no problem paying 5 or 10 dollars a month for a VPN provider for instance.

I can't remember the exact reasons since it was something I was researching a couple years ago. I saw a few competing services that offered annual plans in the $50-$100 range and I think I divided that by 12 and then imagined what would happen when people got most of the value out of the service on first use. It seems to be a very niche market and I'm not sure how much of a pain point it really is to that niche.

I had some other ideas to differentiate, which were a bit of a gray-area hack. This included pulling from the same data sources that Spokeo and similar services use to be able to search for that data. That way, the user wouldn't have to enter any personal information other than say an email address.

You could then offer a free service that let users enter their email address and they would find all the sites you've crawled where you found their information (similar to haveibeenpwned.com), with an upsell to automatically remove it.

Once a subscriber was signed up or previously had used the free service and hadn't opted out, I would keep crawling services for additional data and then give them a warning email when their data popped up again.

It seemed like a semi-reasonable business model at the time, but just a really small market, and I had other business ideas I wanted to pursue.

Interesting points. I wonder if giving the customer visibility into events like "your data was successfully removed from X" would make a difference? Similar to peeking at your Spam folder now and again to see that it is working for you.

Also, about the market size: I imagine the market is just the US? It seems other places, at least Europe anyway, have better laws to protect against these kinds of invasive services.

What about pay-per-record? If you identify 100 records, you could charge $50 (50 cents per record, as an example). That shows the value versus a blanket annual fee. The price per record could be reduced the more records they have, or the total capped at $100 or something.
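
As a minimal sketch of that pricing, using the example numbers from the comment (50 cents per record, total capped at $100). The function name is hypothetical, and the sliding per-record discount is left out because no schedule for it is given.

```python
def per_record_fee(records: int, price_per_record: float = 0.50, cap: float = 100.00) -> float:
    """Fee for removing `records` records, never exceeding `cap`."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return min(records * price_per_record, cap)


if __name__ == "__main__":
    for n in (10, 100, 500):
        print(n, "records ->", per_record_fee(n))
```

With these defaults the cap kicks in at 200 records, so anyone above that pays the same flat $100.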

Why not structure the company as a public benefit corporation or a non-profit?

> Why not structure the company as a public benefit corporation or a non-profit?

This seems like a good idea; I'd happily support such an endeavor.

You could always "enhance" the business case by selling "whitelisting" to some of these data collection services. Just ask the guy from adblock plus.

What about letting a user enter all their pertinent details, then having a "NUKE ME" button and it would eval how many sites and places to nuke from, and you base your costs to the user based on the effort to nuke the info you find. Then give a very straightforward list to the user and let them select which site/item they want to nuke and provide an upfront cost to them.

* You're found on sites 1, 2, 3

* Nuking site 1 is $10.00

* 2 is $1.50

* 3 is $22.55


Or something like that?
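
A small sketch of that quote list, assuming the three example prices above and hypothetical site names. `Decimal` is used so that the dollar amounts add up exactly; the function names are made up for illustration.

```python
from decimal import Decimal

# Example quotes taken from the list above; the site names are placeholders.
QUOTES = {
    "site 1": Decimal("10.00"),
    "site 2": Decimal("1.50"),
    "site 3": Decimal("22.55"),
}


def total_for(selected: list[str], quotes: dict[str, Decimal]) -> Decimal:
    """Upfront cost for the sites the user ticked."""
    return sum((quotes[site] for site in selected), Decimal("0"))


if __name__ == "__main__":
    print("Nuke everything:", total_for(list(QUOTES), QUOTES))
    print("Only site 2:", total_for(["site 2"], QUOTES))
```

Nuking all three example sites would come to $34.05.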

Offering a reward that rises in proportion to the number of sites with your personal information might just provide an incentive to create more such sites.

The more you'd pay for privacy, the more it would be worth to violate it.

Game theory on your comment, sure... but I was just referring to "price per record/line item" -- how to separate these???

>I don't think too many people will pay a monthly fee to have their information removed from these services.

$5-10 a month right here, and I am not rich (yet). Time is valuable. My offer goes up with assurances/contingencies if my name is still on the lists/I am affected (e.g. LifeLock's $1M backing guarantee).

Well, there could be a possible solution to your first problem: offer both a high priced one-time removal attempt, or a recurring subscription with a minimum contractual obligation (6 months for example).

why would people pay for the opposite of what they daily work on at Facebook for free: sharing their private life.

the new generation grew up with spam and giving out private info like there's no tomorrow.

I may sound like a pessimist, but try to talk to any 14-16 year old and see for yourself.

What about a better business idea. Accept cryptocurrency to shut down these aggregating and re-selling services. Physically, by any means necessary. ANY.

Obligatory reference to https://cryptome.org/ap.htm

I actually am running a (year long) experiment. I found a data broker who had my info. They offered a way to "correct" the record, so I added a car that I don't own (whose warranty should be ready for extended warranty offers in a few months) and added many many "off by one" errors like an extra zero on my income, or transposed digits on the size of my house.

We'll see where it lands.

I personally would also pay for this sort of service.

> I found a data broker who had my info.

How...? Or do you just mean sites like InstantCheckmate themselves?

> They offered a way to "correct" the record, so I added a car that I don't own

That's amazing! They didn't need proof? How did you convince them? Is this legal?

Is this legal?


That said, fraud is usually defined as "An intentional misrepresentation of material existing fact made by one person to another with knowledge of its falsity and for the purpose of inducing the other person to act, and upon which the other person relies with resulting injury or damage."

So, in order for such misinformation to be illegal, the data broker needs only to demonstrate injury or damage. Unless the data broker makes assurances to their customer about the truth of the information and gets sued by one of their customers for providing false information, I find it hard to believe that the data broker will be able to demonstrate injury or damages.

I think the data broker knows better than to make such claims about their data but who really knows. And it isn't like the downstream customers are going to independently verify all the information.

That makes no sense, unless there's a contract, or at least a business relationship.

Why wouldn't it be? It's not like you are lying on tax forms.

> I found a data broker who had my info.

What's the name of it?

I've come across sites like these (SafeShepherd, though it doesn't claim to wipe your information from the "sources", only the sites they see your information on); the problem is I'm not sure I would trust them with something like my driver's license photo. Would you? Or what if they decide to sell your information behind your back later?

Edit: Maybe it was a different site that needed your license (I thought it was SafeShepherd but I can't find it anymore). I know I've definitely come across ones that do.

I wouldn't trust them with a driver's license (why would they even need that?), but I'd trust them with name/address/phone/email information, because that's the information that seems far too readily available already.

I know they need it because a lot of the info-searching sites that they "opt" you "out" from require it. I feel like I read that some need SSN too but I'm not sure about that one.

Also note that SafeShepherd has this fine clause right at the end [1]:

"You agree that Safe Shepherd isn't liable for any failure to comply with these Terms."

What is this supposed to mean? Would you accept it?

[1] https://www.safeshepherd.com/tos

Well that seems to imply that if you violate the TOS, it's your fault, not theirs. Seems reasonable?

It also implies if THEY violate their TOS, it's your fault, not theirs. Seems reasonable?

There was a time (not sure if it's still the case) when the sites that post your info required proof of identity to remove it.

SafeShepherd seems like it's dying. They were active shortly after launch, but it's basically been radio silence ever since. I used to have an account with them, and I didn't find much value from them after the initial cleanup wave (they didn't seem to be keeping up with the new sites that came online).

This service exists, I've used it, can vouch for it. https://www.abine.com/deleteme/landing.php

This Reddit comment suggests DeleteMe doesn't work so well.


Can confirm... manually went through and opted out of everything I could find... made me feel shitty because I had to provide photos and ID card photos to do it... less than a year later it was all back again. Hired one of the services to do it the second time around... they got maybe 50% of it... and less than a year again it was all back (note I was still paying the monthly subscription to have it all purged). Worse... I had to tell them what sites I found that they missed and waste time dealing with support issues... and let's be honest neither the companies who put the shit up, or the companies who you pay to take it down, are all that honorable. Extortion, pure and simple.

Wish Google would just kill their site rankings, that would largely make the problem go away. Google is allowed to de-list spam sites... why they haven't classified all this crap as spam yet is beyond me.

Quoting that comment:

> Your information will show up on those sites sometimes - it'll pop back up after being removed.

That suggests that they're not actually tracking down the sources, just poking the downstream sites that get data from those sources. Much less useful.

The sources include government agencies at the state, local, county, and federal level. Those agencies are not going to hide public documents with your name and address on them just because you ask them to.

There's a legal process to, for example, expunge a criminal record. On the other hand, most counties aren't going to seal the records associated with your house or other property you've bought just because you would like them to.

I've been recently:

1) Segregating automated email to some @customdomain.com address (Yandex and Zoho host vanity domains for free)

2) Forward all email @ that domain to a common inbox

3) Sign-up with servicename@customdomain.com (e.g: facebook@customdomain.com)

You no longer need to unsubscribe using dubious e-mail links, just automatically black-hole emails that come to spammer@customdomain.com

So far, so good.

EDIT: List formatting

Useful for email, but this is primarily about postal spam.

somehow misread this entire post

This also helps obscure the very widespread practice of selling data to third parties that identify you with a stable, cross-purpose ID. (Liveramp, et al)

i think the best way to handle this is exactly the opposite. go on mechanical turk, and pay $500 to have them fill out every single free offer, product trial, social media account, etc. with slightly similar but incorrect information. Give each turk 10 different pictures that look very similar to you, or are you in bad lighting, and pay them to upload them to G+, FB, etc.

It is much easier to make your correct info difficult to ascertain, than it is to remove it all.

Then they'll just spam all of them. And if you have other reasons to not want to be tracked down, then having 20 or 30 addresses will not help much.

> junk mail

Back in the day, junk mail often had "Return Service Requested", and senders had to pay return postage. So we would tape address labels to cardboard-wrapped bricks, and mark them as "moved with no forwarding address". But that doesn't work anymore.

Here you go: www.safeshepherd.com

I've been using them since way back when they were on earlibird. It's not perfect, but it has reduced the amount of times I show up in these sites significantly. The only disheartening thing is that most of these operators seem to just dump in batches of data periodically, ignoring any prior requests to remove so it has to be done again (which is fine, that's what you pay their automated search and remove request feature for)


These guys don't do it for you? Basically it is their business plan.

I have a foolproof plan for junk mail: I throw it away.

You mean data brokers (https://en.wikipedia.org/wiki/Information_broker)

Top US brokers:

- Acxiom

- Experian

- Epsilon

- CoreLogic

- Datalogix

- eBureau

- ID Analytics

- inome

- PeekYou

- Rapleaf

- Recorded Future

Protip: loyalty/reward cards are a gold mine, especially drug store purchase receipt data

Thanks, this is very useful.

The paranoid voice in my head is wondering if these forms don't actually opt me out of anything, and instead just confirm to these companies that the information they have on me is correct.

The LexisNexis link 404s (added a space), I believe this is correct:


Also, the following can't hurt:


+1 Thanks for sharing! If anyone's actually found these to work please share so we can get more confidence in them :)

The Epsilon link 404s, but I think this page is where it should have gone: https://www.epsilon.com/en_US/consumer-information/consumer-...

New York Times reporter Natasha Singer has extensively covered the data broker industry for the past several years.

Here are some of the key articles but you can find more at https://www.nytimes.com/by/natasha-singer

Jun 16, 2012 | Acxiom, the Quiet Giant of Consumer Database Marketing http://www.nytimes.com/2012/06/17/technology/acxiom-the-quie...

Jul 21, 2012 | Consumer Data, but Not for Consumers http://www.nytimes.com/2012/07/22/business/acxiom-consumer-d...

Jul 24, 2012 | Congress Opens Inquiry Into Data Brokers http://www.nytimes.com/2012/07/25/technology/congress-opens-...

Dec 08, 2012 | Company Envisions 'Vaults' for Personal Data http://www.nytimes.com/2012/12/09/business/company-envisions...

Aug 31, 2013 | A Data Broker Offers a Peek Behind the Curtain http://www.nytimes.com/2013/09/01/business/a-data-broker-off...

Sep 04, 2013 | Getting a Glimpse of Your Own Marketing Data Online http://bits.blogs.nytimes.com/2013/09/04/getting-a-glimpse-o...

Sep 04, 2013 | Acxiom Lets Consumers See Data It Collects http://www.nytimes.com/2013/09/05/technology/acxiom-lets-con...

Dec 23, 2014 | Data Broker Is Charged With Selling Consumers' Financial Details to Fraudsters https://bits.blogs.nytimes.com/2014/12/23/data-broker-is-cha...

Jun 28, 2015 | When a Company Is Put Up for Sale, in Many Cases, Your Personal Data Is, Too http://www.nytimes.com/2015/06/29/technology/when-a-company-...

Pro tip: If you want to avoid giving out your purchase data, pay cash wherever possible.

Right, but where do they get their data?

Are they rolling their own JS API that developers roll into each page? I certainly have never put any of that into any site I've made or seen.

They buy it direct from credit card companies, retail stores, 'free' online widgets (AddThis biz model: https://www.quora.com/Whats-the-business-model-for-AddThis-a...), etc

Right, that all makes sense but seems like it would be more of a grind than a simple JS API. You are effectively creating a marketplace.

Correct - see http://www.crosspixel.net for example:

"Cross Pixel's DMP is powered by our proprietary data relationships with more than 5,500 web sites and mobile apps where we identify and harvest the shopping and researching behaviors on over 650 million unique browsers. Our data partners are leading e-Commerce sites, search directories, comparison shopping engines, coupon sites and toolbars across North America and Latin America."

In general, the 'marketplace' is usually the DMP (Data Management Platform) where two parties can meet and share segments without data leakage (for example - Krux is a DMP used by a lot of Fortune 500 companies).

However the lines between DMP and Data Provider are blurring in recent years...

Great answer thanks! I hadn't heard the term DMP

BlueKai is one of the biggest; it's a data marketplace for cookie-tagged data. They were bought by Oracle a few years ago.

And Krux's website says they have been acquired by Salesforce.


Big thanks for the info, that was insightful.

> Right, but where do they get their data?

Yeah, I'm basically looking for the companies whose answer to this question is "by actually mining the data ourselves from your doctor, grocery store, Facebook, etc.".

Why do you think it works this way and not the other way around as well? Grocery stores shop their data around to see who will pay the most for it. A person who is out of work goes to the local courthouse and requests a bunch of records, compiles them into a spreadsheet and then cold (or warm with something like LinkedIn) calls to see if anyone is interested in the data. An online quiz company is going out of business, and as part of their bankruptcy settlement, they sell off their database of answers at auction. etc, etc, etc.

As pointed out elsewhere, it's a marketplace, and as such there are going to be buyers and sellers. Some of those sellers are going to be primary sources themselves.

Good question. I thought it works that way because it takes a lot of work to sanitize and cross-link people's data to other datasets accurately, so even if it's a "push" model, I still can't believe that every single website that does this does their own data cleaning & ML & whatnot. It's far too much repeated work and a good business to just do the work and sell it off to others. So I'd assume a few companies have to be making profits at the lower layer regardless of whether it's a pull or a push model.

As mentioned elsewhere, you seem to have a lot of unfounded assumptions, and misconceptions about this sector.

It's far too much repeated work

Companies will repeat work over and over again if it's cheaper than buying it, they have custom needs that aren't filled with the data available, etc. Businesses repeat work all the time, and this is not any different. Additionally, for many businesses in the sector, they themselves are the primary source for data. For them it's not repeated work.

a good business to just do the work and sell it off to others.

Yes, that's why some aggregators exist. They make money by brokering the data from multiple sources, some primary and some resold. But they are the tip of the iceberg.

You seem to be under the impression that there is some small list of companies who are all working from primary sources, and that everyone then gets feeds of data from those companies. This would make sense if gathering data was very difficult, or had natural-resource-like limitations. So that model works well for something like diamond mining (as compared to diamond growing), because the number of diamond mines is limited, and there is a natural entry barrier. However, that doesn't take into account the fact that gathering this data is generally easy. Sometimes it's very easy, such as a sftp feed of data from a government records database. Sometimes it's a bit harder, such as needing to physically be present to obtain the data.

That means there is very little barrier to entry, and thus generally there is going to be a lot of competition, and thus many companies vying to make money.

Personal data has value just like any other commodity. So a bit of economic theory goes a long way to understanding what the boundaries of a market might be. Low production cost, high profit goods generally have a large number of companies in the market.

> Right, but where do they get their data?

You give it to them when you sign up for that stupid rewards card.

When I signed up for a major grocery store rewards card, I was able to put a bogus name on the form. How can they correlate my purchases with me specifically? Can they correlate the charges to my credit card with the times the rewards card was used?

(This grocery store did not have a pharmacy)

Credit card or check payments will link records.

Cellphone or MAC tracking.

In the near future if not already, facial recognition.

+1 Thanks for actually posting a list!

It pisses me off too. US is so concerned about privacy, yet a LOT of your private information is made public once you start opening bank accounts, buying real estate, signing up for a gym, etc.

When I opened my first bank account they had a typo in my name, which I found out when I received my debit card. I asked them to fix it immediately, however, two to three weeks later I was already getting mail from stores addressed to the misspelled name.

When I was buying my first house I immediately started receiving mail from moving companies at my old address before I signed the closing. After I moved I got a lot of junk mail with other kinds of offers. I even started getting PHONE CALLS from a home monitoring/alarm company. When I asked them where they got my number they hung up.

It is like all the information is up for sale somewhere.

Taking power away from capital to give to individuals is currently positioned as unpatriotic in the US (e.g. the belief of pandabear187, above, that regulation of data brokers will lead to fascism or communism, even though he goes to great lengths to protect his own information).

> US is so concerned about privacy

This is not true, at all. HN may provide that appearance, but the vast majority of people who live in the US do not care about their privacy, based on their actions.

"when a court needs to order that someone's information be purged (for whatever reason, e.g. for safety)"

I believe that is the location of your confusion; that is a Hollywood fiction, mostly. If a collections agency is bugging you there is a way to resolve it via the legal system, but it's very much a case-by-case and company-by-company business. A judge can order one company whose officer or agent is present in the courtroom to do something to one record. A judge can purge his own legal system's record of an arrest if he wants to. Belief in this in general is analogous to non-computer people believing in the CSI TV show or Hollywood hacking.

What about the new higher-level companies that pop up akin to Instant Checkmate? How do they know which lower-level companies to buy your information from? There's no way they ALL do the heavy lifting themselves. Someone's gotta be making money off doing the real work and others must be buying from them.

If you have your own domain name with a wildcard (catch-all) mailbox, it's really helpful to enter someservice@mydomain.com as your email. That way if it leaks you'll know who did it and can set up much more robust rules to block. I'll use the site's domain name as the local part so I remember which address goes to which site.

For physical address mailings, you can hyphenate your name with the service (or use it as a middle name), so the addressee becomes First Service-Last. While it's harder to set up "mail rules" for, at least you'll know who to never trust again.
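For the email half of this, here is a minimal sketch of how you might attribute an incoming message to the service you gave that address to. It assumes a catch-all mailbox on a hypothetical domain (mydomain.com) and that the local part is simply the service name; the function name leak_source is made up for illustration.

```python
from email.utils import parseaddr
from typing import Optional

MY_DOMAIN = "mydomain.com"  # hypothetical catch-all domain


def leak_source(to_header: str, domain: str = MY_DOMAIN) -> Optional[str]:
    """Return the service name encoded in the local part of the recipient.

    Returns None when the message was not addressed to our catch-all domain.
    """
    # parseaddr handles both "Name <addr>" and bare "addr" forms.
    _, addr = parseaddr(to_header)
    local, sep, host = addr.rpartition("@")
    if not sep or host.lower() != domain:
        return None
    return local.lower()


if __name__ == "__main__":
    print(leak_source("Some Shop <someservice@mydomain.com>"))
```

If spam starts arriving at someservice@mydomain.com, the local part tells you which signup it came from, and you can block that one address without touching the rest.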

I've been doing exactly this for a while. Here is my list of companies that have leaked the email address I gave them to spammers: https://gist.github.com/eligrey/5084991

Awesome work!

I do the same with email addresses, but receive very little spam. Mostly I block addresses that start spamming me with newsletters. I've thought about keeping a list, but most companies actually stick to the Dutch anti-spam laws (which are quite good).

Only Dropbox and one personal contact ever actually sold/leaked my email address, and Paypal of course but they hand my email address out to all merchants so they're almost certainly not to blame themselves (not beyond the fact that they hand it out in the first place).

Note that (I think) Adobe was hacked, so that doesn't mean your email was "leaked" by them per se, not in the sense we mean anyway.

Also, out of curiosity, how long did it take these companies to leak your info, generally? Days, weeks, months, years...?

Dropbox was hacked as well.

I do the same, using Mailgun to forward them to my usual account. Though the day I actually have to send or reply to e.g. customer support from one of those addresses will be annoying. Any suggestions on that part?

Google apps can kind of do it[1] (although it appears to leak the main address on purpose), so I guess you can also do it with your own mailserver?

1 - https://support.google.com/mail/answer/22370?hl=en

I just started doing the wildcard domain thing last month, I'm happier knowing that I can shut the taps. I get annoyed just knowing that there's spam in my spam list.

The first time I was on the phone with a customer service rep after using the <website>@<domain>.com format, I was asked if I was sure my email address was correct. I lol'ed and told them not to worry about it.

I use a random 20-character string for the local part. That way there's no question about the leak. Spammers use a lot of dictionary words in the local part of the email address, so it's better to have a random string. If you're using a password manager anyway, there's no reason not to make the email/username random too.

For smaller e-shops you might find some with actively exploited 0days this way. I did.

That is useful, but how do you manage to remember the mappings? I want to know what random string corresponds to what service without having to search my password manager.

The best of both worlds would be random local part + a Chrome extension that manages the mappings. The Chrome extension can then replace the local part in Google Inbox with the corresponding site name.

Update: Just use service name + random string concatenated for the local part. Seems like the best solution.
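A rough sketch of that "service name + random string" scheme, in case it helps. The domain, the separator, and the function names (make_alias, service_of) are my own assumptions, not anything a mail provider requires.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def make_alias(service: str, domain: str = "mydomain.com", n_random: int = 12) -> str:
    """Build '<service>-<random>@<domain>' for a catch-all mailbox."""
    # Keep only letters and digits from the service name so the label
    # never contains the '-' used as separator.
    label = "".join(ch for ch in service.lower() if ch.isalnum())
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(n_random))
    return f"{label}-{suffix}@{domain}"


def service_of(alias: str) -> str:
    """Recover the readable service label from an alias made by make_alias."""
    local = alias.split("@", 1)[0]
    return local.rsplit("-", 1)[0]


if __name__ == "__main__":
    alias = make_alias("Some Shop")
    print(alias, "->", service_of(alias))
```

The readable prefix means you can tell at a glance who leaked the address, while the random suffix keeps it from being guessed by dictionary-style spam runs.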

not a bad tip, i'll do this for less reputable websites i suppose.

Even easier is to use john+someservice@doe.com. Works with any e-mail provider, ends up in your normal mailbox. Gmail even adds tags based on what follows the plus.

Might not work for some services due to ignorance of the spec or to prevent users doing this.

Doesn't really work since anyone with half a brain would remove the plus sign and after, knowing the email is more useful without that part. I've never caught anyone this way.

problem with that is websites who think they're super smart and believe that + is not a valid character in an email... sometimes it's just the javascript though and you can submit it via manual POST request.

You can do this with gmail.

For example if your gmail is root@gmail.com

You can do root+yahoo@gmail.com, root+reddit@gmail.com and so on.

It's relatively easy to use regular expressions to strip that out, though. I set up a catch-all for my domain, and give each entity its own unique address to guard against this.
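To show how little effort the stripping takes, here is a small sketch. The pattern and the function name strip_plus_tag are just an illustration of the idea, not what any particular spammer actually runs.

```python
import re

# Match a '+' and everything after it up to (but not including) the '@'.
PLUS_TAG = re.compile(r"\+[^@]*(?=@)")


def strip_plus_tag(address: str) -> str:
    """Remove a '+tag' suffix from the local part of an email address."""
    return PLUS_TAG.sub("", address, count=1)


if __name__ == "__main__":
    for addr in ("root+yahoo@gmail.com", "root+reddit@gmail.com", "root@gmail.com"):
        print(addr, "->", strip_plus_tag(addr))
```

Every tagged variant collapses to the same base address, which is why a per-entity address on a catch-all domain is harder to normalize away.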

Last year I was getting constant calls and snail mail about buying an extended car warranty on a car that I no longer own. I asked the place where I bought my car if they sell that information and they claimed not to.

So where do these sleazy companies get that data? The DMV?

This year, I'm getting two or three calls every week about buying a home security system and monitoring.

I don't understand why these calls aren't easier to block. Somebody knows where they are originating from. Why can't I get that information too?

Vehicle registration is most likely public record in your state.[0] You can't push a button and stop all of these calls because there are dozens (probably tens of dozens) of companies that search and process public records and other data sets then sell that information to various companies.

[0] http://www2.westlaw.com/CustomerSupport/Knowledgebase/Techni...

The "home security system" calls could be social engineering to find out if your home is protected or not. Just by listening to what they say and not saying you already have one is enough to tell the caller what they want to know.

I would never tell them anything, but I have listened to the recording and pressed '1' to speak to a representative and when I do that, nobody ever picks up. It's baffling.

The one that surprised me is to learn that virtually all health insurance companies sell your personal health information. Most people think this is illegal because the data is sensitive. But it turns out that if it's generated by a business transaction (i.e. a claim between your doctor and your insurance company) then it's not considered PHI and it's not protected.

For pre-employment screening, we had court runners literally sitting through the courthouses going through paper records for each candidate on an "ad-hoc" basis. Some companies do this in a more organized fashion by having a person just data enter ALL the records (like in North Carolina if I recall) and since they had this data, we just bought the company.

Pre employment screening is different though. They need your permission for a legal background check and of course they will do everything necessary to do it. I'm asking about the information that leaks without your permission.

Small tip: At the grocery store or anywhere else with a rewards account linked to a phone number, rather than signing up for one just use (Your Local Area Code)-867-5309

The number almost always exists and is a valid account. Get the discount, don't get tracked. Thanks, Tommy Tutone.

I've given (area code) 555-1212 and don't recall it ever being questioned. Technically I've given them my phone number, but with one indirection. ;-)


If I can't conveniently avoid those things, I like to use the names of famous serial killers, with a local address that would be in the ocean (if it existed).

I've yet to have a sales clerk question it (or perhaps they just don't care).

The clerks don't care. Why would they?

This has the added advantage of producing some very strange buying-habit data, as you are likely not the only person to have provided 867-5309 / 555-1212.


Since they're listening anyway. Cut out the middleman.

The last time I tried that, it didn't work (or there wasn't an account associated with it).

Make sure to tell them that your name is Jenny too.

Thanks - I always used my friends' parents old number - so someone I didn't know - this would be even better.

Have you ever gone to the doctor, signed up for a gym, or signed up for your local grocery store's membership rewards program?

That's how they get your information.

> Have you ever gone to the doctor...

At least in the US, "going to the doctor" is encumbered by several laws oriented toward the protection of a person's information conveyed in the course of a visit.

What is the mechanism by which these laws are sidestepped in providing patient information to third-parties?

> signed up for your local grocery store's membership rewards program?

Does e.g. Catalina give their information to Red Plum directly? Is it sold? How much?

I went to Pavilions grocery store the other day, and when I got home, the Facebook app said, "Have you been to Pavilions recently? Click here." I want Facebook to find friends and events nearby me, not track where I go.

I can only imagine all the location data Google, Apple and Facebook are collecting and what they're actually doing with it.

Google asks you to take a photo when it thinks you are somewhere like a newly built mall, or anywhere where they don't have many photos in general.

Unlikely to be your doctor. Even releasing your name in combination with the health provider's name provides evidence of a patient relationship and violates HIPAA.

Uh, what? This doesn't explain a lot of the information I see on some of these sites. It only explains the basics like your address and phone number.

well, don't underestimate the idiotic forms that many doctor's offices (in the US) want you to fill. Almost every time, I have seen forms that ask for everything about you. Personal information, Social Security Numbers, Employer information, heck even salary (not kidding). I just don't fill those of course but I am sure many people just fill them out because it is on the form.

Again: I see information out there that doesn't fall in these categories. And again: even for the information that does, that still wouldn't tell me which companies are the main ones aggregating them and selling them off to other aggregators. That's what I'm trying to find out.

Can you be more specific on what type of information you're talking about?

Political Party? Views & Opinions? Relatives? Education history even if it's outside the country? Check out all the categories, here's [1] a random example. And yes, I know all of this information exists out there somewhere (resumes, mailing lists, social sites, whatever); my question is who is the first layer aggregating them from these sources.

[1] https://www.mylife.com/john-smith

> who is the first layer aggregating them from these sources.

You clearly understand that you could get this data from various places, but maybe you don't understand how relatively easy this data is to procure (physical presence requests being the hardest). Why do you think there exists a "first layer" as some distinct class of business, and not a large number of primary source gatherers and a large number of aggregators?

There are a lot of business models here, and I think you may be underestimating the complexity. Like you want X number of companies to point the finger at, but that's not reality.

Instead, there are at least 3 axes. Does the company buy data from someone else, do they gather it in house, or both. Does the company sell data to other companies, or not. Does the company use data themselves. All of those combinations are going to be present.

1.) A company that generates its own data, but never sells it, and uses it itself

2.) A company that generates its own data, but buys additional data, sells its own gathered data, and uses the data itself

3.) A company that buys its data, sells that data, and never uses the data itself

etc for all of the rest of the combinations
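To make "all of the rest of the combinations" concrete, here is a quick sketch that enumerates them. The labels are my own paraphrase of the three axes above, and not every combination is necessarily a viable business (a company that neither sells nor uses its data, for instance).

```python
from itertools import product

# The three axes described above; the wording of each option is an assumption.
SOURCING = ("buys data", "gathers data in-house", "buys and gathers data")
SELLING = ("sells data", "does not sell data")
USAGE = ("uses data itself", "does not use data itself")

# Cartesian product of the axes: 3 * 2 * 2 = 12 possible business models.
models = list(product(SOURCING, SELLING, USAGE))

if __name__ == "__main__":
    for i, model in enumerate(models, start=1):
        print(f"{i}.) " + ", ".join(model))
```

Even this coarse breakdown gives a dozen distinct shapes of company, which is part of why a single tidy "first layer" list is hard to come by.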

I can say that within my sector, my company used to purchase data from a data broker (not PII, or anything you mention, but industry specific), then decided it was too expensive, and started gathering our own data. Now we use the data we gather, and also sell it. Suggesting that there is some "first layer", and that you might be able to identify them all, is just a basic misunderstanding of the entire business of data brokerage.

Political party is fairly easy in the US via voter rolls for primary elections. Even if this list is held privately by the Party, they have an incentive to trade it for data about independent or undecided voters.

Views & opinions are likely from member lists of organizations like the NRA. Again, these organizations will trade their data for valuable data on other people whom they want to reach.

Relatives is pretty easy to figure out if you have address history over time. People who are related tend to share an address at one time or another. Relationship status is similar when a couple moves in together. Obviously, there are also marriage records.
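As a rough illustration of the address-history idea, here is a sketch that flags pairs of people who lived at the same address during overlapping years. The records are made up, the function name possible_relatives is hypothetical, and I have no idea what any real broker's matching logic looks like; roommates would be flagged too, so treat the output as candidates only.

```python
from collections import defaultdict
from itertools import combinations

# (person, address, first_year, last_year): made-up sample records
HISTORY = [
    ("Alice Smith", "12 Oak St", 1995, 2010),
    ("Bob Smith", "12 Oak St", 1998, 2012),
    ("Carol Jones", "12 Oak St", 2015, 2020),
    ("Bob Smith", "7 Elm Ave", 2013, 2020),
]


def possible_relatives(history):
    """Return the set of person pairs who shared an address at the same time."""
    by_address = defaultdict(list)
    for person, address, start, end in history:
        by_address[address].append((person, start, end))

    pairs = set()
    for residents in by_address.values():
        for (p1, s1, e1), (p2, s2, e2) in combinations(residents, 2):
            # Two closed year ranges overlap when each starts before the other ends.
            if p1 != p2 and s1 <= e2 and s2 <= e1:
                pairs.add(frozenset((p1, p2)))
    return pairs


if __name__ == "__main__":
    for pair in possible_relatives(HISTORY):
        print(sorted(pair))
```

With the sample data only Alice and Bob are linked; Carol moved into the same address years after both had left, so she is not paired with either.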

Education history is often verifiable by employers, even outside the country.

>> yes, I know all of this information exists out there somewhere (resumes, mailing lists, social sites, whatever); my question is who is the first layer aggregating them from these sources.

Even facebook buys grocery rewards data, http://lifehacker.com/5994380/how-facebook-uses-your-data-to...

Many clients of these services have agreements that require them to contribute data back (or at least they get a discount for doing it).

I worked for a telephone company that used a service like this a long time ago and we shoveled customer data back at the provider. We did NOT give them call records though.

Almost any medium sized company you deal with is selling your data.

LexisNexis is one of the aggregators that are frequently used.

I thought it was a court case database, not a personal information aggregator! +1 thank you!

LexisNexis provides all sorts of databases.

I had an experience where I was shopping for car insurance, and just before I signed, my rate doubled from what was previously quoted by the same company. It turned out that a LexisNexis database had an erroneous record claiming I was at fault for an accident.

Wow! How did you find out it was LexisNexis? Did the insurance company just straight up tell you?

One book that covers the medical perspective is:

"Our Bodies, Our Data: How Companies Make Billions Selling Our Medical Records"


I read this book, it should be a crime what they do.

I saw a great talk by a private investigator at a security conference on this topic, and did some research after the talk to confirm what he was saying.

The short version is what you're asking for is going to be an uphill battle. Data aggregation companies don't want to disclose their sources because their services are often used by debt collectors and other organizations who want to find people who don't want to be found. If the sources were public knowledge, debtors could avoid them to escape debt collection agencies, for example.

The comprehensive list you're looking for doesn't exist. Each data aggregation company has a different list which is the result of many private agreements they have with their sources. If you want a list of the sources, you can't ask the data aggregators.

You can ask the sources themselves. Because you don't know who they are, you have to guess. Any company that puts a card in your wallet would be a good place to start. Under California Civil Code 1798.83, you can email companies and ask them to provide you with a list of all the direct marketing companies they sold your information to. Try making a request to your insurance agency.

1798.34 also allows you to ask California government agencies to provide you with an accounting of everyone they disclosed your information to. A 1798.34 request to the DMV should be a rich source of data providers.

There are also lists of companies involved in the data aggregation ecosystem in the Consumer Financial Protection Bureau's list of complaints:


One surprising thing I found out during the talk is that pizza chains are a rich source of data for these companies. If you think about it, this makes perfect sense. The data includes a guaranteed link between a person, a place, payment information, and a phone number.

> when a court needs to order that someone's information be purged

I don't think this is a thing. A friend had to deal with a DV case and I got to see how all the legal machinery involved works. The court doesn't even bother trying to purge the victim's data when they move out and try to stay away from the abuser. The court simply moved the victim and the DV support organization told them to stay off social media and not to give the new address to anyone (including pizza places.)

That being said, the federal government does have a comprehensive list of all the agencies which are members of the federal privacy council. In theory, these agencies are supposed to have a data integrity board which provides oversight for any data they keep on Americans.


Thank you for the legal information! That was enlightening.

The one question I have remaining though is: when these companies pop up, how do they know whom to buy your information from if there's no list and nobody tells them?

It's all very incestuous. I suspect that part of what happens is that new companies are comprised of people who have previously worked in the industry, so they have general knowledge of who the available providers are. Without prior experience in the industry, I suspect you're not going to be in business very long.

The other part of getting initial sources is probably calling companies in retail, insurance, etc and pitching them on how much they would make selling customer data to them. If I had to do it, I'd probably get industry reports, sort by annual revenue, start at the top and work my way down as I try to get a hold of the consumer data department at each company.

I've also been approached by these guys when I was part of shutting a company down. They wanted to buy our customer database. I guess when one of these companies shuts down, some new one can buy the database and use that as a seed for new operations.

Do you happen to have a link to that talk, or the name of the presenter?

I forget, and the conference site took the slides down. It was more than 10 years ago.

It sounds like based on the other comments that there's no way to track down one sole source because there are so many varying from public records to machine learning...

That being said, is there a simple way to better obscure yourself? Like using a business name and a PO box instead of your personal name when it comes to bills/addresses?

You don't think all those fake facebook/twitter profiles trying to follow/friend you are just for the lulz do you... They're mostly for mining your personal information. This extends to transaction processing agreements with advertisers and merchants for the purpose of analytics and tracking, as well as information sharing from credit card companies themselves, and peering agreements for data.

Just your email address alone, let alone combined with IP information can result in being able to find a lot of information about you... then you take that and correlate it to public information, cc purchase history, online profiles, it's a treasure trove. That doesn't even count extra data gleaned from all the tracking cookies.

All said, I'm still far more concerned about government use of similar data than I am private businesses.

As far as public records go, real estate data can be quite the treasure trove. There are data brokers that collect, analyze, and resell this information all day long.

For example, my startup can pull information on ownership, mortgage and sales history, liens, and foreclosure records, among many other things, for a given property. If you were to cross-reference the data with other public and proprietary sources, it could get pretty, umm... what's the right word?, "interesting", in terms of accuracy and level of detail.

Take a look at this (2015) pic of the landscape: https://www.slideshare.net/RaviralaKarunakar/luma-display-ad...

The companies you are interested in are in "DMP & Data Aggregators" and "Data Suppliers" section.

But this is only the largest, best known companies. There are many, many more.

There are far too many to name and they each collect data from different sources - both primary and secondary. Then the data is shared between them through intermediaries that would form a very long list. This duplication/redundancy makes it impossible to remove your data completely from all the owners. Experian, Acxiom, DataLogix, TransUnion, Innovis, Woodbridge/Thomson Reuters, DNB.

Easy one is they scrape Google for Linkedin bio info.

I'm paranoid that USPS sells your info once you fill out their change of address form.

> I'm paranoid that USPS sells your info once you fill out their change of address form.

They do: https://www.forbes.com/sites/adamtanner/2013/07/08/how-the-p...

You're not paranoid, I don't know if money changes hands but my changes of address have 100% been accompanied by spam snail mail.

I just don't change my mailing address anymore.

Money does in fact change hands.

WTF. So if Intelius has your address, they can pay the USPS to get your new address. That incentivizes me not to fill out that darn form.

Seems like getting mail you forgot to forward would outweigh this

Request a temporary forward. Up to 12 months.

Public records, which are accessible to anybody.

So paying to delete your info from one site is useless, because the next site that somebody sets up will have your info, if they use the same public records as the others.

Reading this thread I can answer the question because I have worked in this industry and given/sold data to Intelius specifically.

There are three main sources of ALL of the data:

1. Acxiom http://www.acxiom.com/

2. Experian http://www.experian.com/

3. Neustar https://www.neustar.biz/

Acxiom got its big start by developing a way to copy phone books in the 90s, and they won a court case that sided with them, saying the name and address information was basically public info. Acxiom aggregates something like 800 different attributes for each named person at each address using third party vendors and then resells the entire consumer database to list brokers, who oftentimes will add additional detail for smaller subsets of the data. You can opt out of Acxiom by going here


2. Experian. Same as above but they have a lot more specific data about you because you probably fill out forms related to credit and loan applications correctly. Thus they know all of your previous addresses and they sell that to companies like Intelius.

3. Neustar: ditto.

The main thing to keep in mind is that each of those three companies has slightly different channels through which it aggregates consumer data, so your info comes out a little differently in each database.

Almost any list broker or mailing house or telemarketer that you encounter is getting their data ultimately from one of those three companies (and in many cases they would buy data from all three sources).

Finally, a company like http://www.criteo.com/ uses a process they call "database cookie-ization" to match your online browsing history (hence interests and business) to those three databases via your email addresses. So they know what you look at online, where you live, everything you have ever requested credit for, etc etc.

There are hundreds of smaller companies like these (below) feeding data into those databases too:

https://www.hgdata.com/ https://www.fullcontact.com/ http://zetaglobal.com/ https://www.lotame.com/

If Google aggressively delisted shitty companies like this, the problem would go away.

There's no value for society in these shit services that make people register and pay to have their profiles taken down.

Then again, Google and Facebook are the biggest aggregators among them. Admittedly, they don't sell raw access to their data.

Lexis-Nexis? TRW/Experian/etc?


Information acquired from a number of sources, including working in the data industry (a decade or two back), privacy advocacy, working in Web space, research of my own, stories over beers, legal experiences, etc.

There's a large information-brokerage industry. If you want to find it, investigating the question from the consumption side (as in: who will sell me this information) should turn up the larger players, most of whom are already listed in this thread. https://news.ycombinator.com/item?id=13804795

The big players are much of the business: Power laws work here as anywhere else, and heading off the larger sources is pretty effective.

The value of individual data isn't all that great. Which leads to one of the major PITAs of this industry: there's a lot of invalid, false, or stale data around. The incentives to fix it simply don't exist.

The now-defunct Internet Junkbuster used to have a print-your-own set of letter templates which could be sent to various marketing organisations. Doing that in the early 2000s dropped my own junk-mail volumes tremendously, and for years afterward. I suspect SafeShepherd operates somewhat similarly. Finding and hitting the direct marketing association(s) was a big part of that.

Putting a fraud hold on your credit reports (TransUnion, Equifax, Experian) is useful.

Any account-based activities or activities in which you are specifically identified are fodder for capture. Credit cards, checks (Luddite! ... hang in there), "loyalty" cards. Gyms and pizza, as noted.

Facebook, which should go without saying. LinkedIn profiles.

Any online information service which has ever been hacked. (For safety, assume all of them.)

Online purchases. Through both the marketplace and your credit card.

Court and other public records are manually reviewed and entered.

Various school and alumni associations. Organisations such as Classmates.com, MyLife, etc., front-ended to skip-tracing and similar organisations (info via direct communications).

Your auto smog testing station. There's an outfit known as ISO, Insurance Services Office, which has a unit that tracks down odometer mileage data. They glean that by buying the state smog check data, which is indexed by VIN and driver's license in most cases. The notion is that miles driven is an excellent proxy for insurance risk. https://en.m.wikipedia.org/wiki/Insurance_Services_Office

The US Post Office NCOA (change of address) form, as noted. File a temporary COA to avoid getting listed.

Used to be you could submit a "pornographic materials" request to the USPO to have circulars and such removed from your delivery. Though online sources suggest it's possible to block 3rd class mail (Yahoo answers). That's more an annoyance than privacy issue.

Magazine subscriptions.

Request of any organisations you do business with that they not share your information. Use telltales to determine which do (additions to your address, name, etc.).

And, if the state of affairs bothers you, get on your government representatives to do something about it. Data are liability, and there's far too much of it floating around. The US in particular has taken an exceptionally piecemeal approach to the problem (video store rental records are protected, bookstore and pharmacy records are not).

Request comprehensive data privacy regulations, with teeth.

Would it be possible to copyright our names and use DMCA takedown notices? Just a random thought.

Names do not meet the threshold of originality test[0]. Beyond that, your parents would be the copyright holder until their death when you would inherit it via their estate.

They are potentially covered under trademark if you have a brand in a specific industry and others are using your name or a similar name in a way that could cause consumer confusion.

[0] https://en.wikipedia.org/wiki/Threshold_of_originality unrelated: the threshold for code is usually around 15 lines.
