
Database containing details of 56M US residents found on the public internet - ptah
https://www.theregister.co.uk/2020/01/09/checkpeoplecom_data_exposed/
======
PietdeVries
From the article: _However, under the laws of the People's Republic,
government agencies can more or less search any machine at any time in the
Middle Kingdom, meaning profiles on 56.5 million American residents appear to
be at the fingertips of China, thanks to CheckPeople – we assume Beijing has
files on all of us, though, to be fair._

And that is exactly how Europeans feel when their health records are
handled by Google and the TSA wants to know their Facebook handle: what
initially appeared to be a nice-to-have is now all of a sudden a government
source of data no one anticipated...

Edit - oh darn, I forgot Ancestry.com and 23andMe, which are even worse
examples: US police have full access to all DNA samples provided by anyone in
the past. That is 20 million DNA samples. Not name or age - full genetic
info...

~~~
Eikon
I’m starting to think the Facebook / Google way of harvesting data may really
be overcomplicated. These kinds of services show people are willing to _pay_
to give personal data that literally defines them as individuals to a
commercial entity.

~~~
quaquaqua1
A DNA test was really helpful in finding all my half siblings and other family
members because my biological dad is a sperm donor.

The fact that my DNA can somehow turn up at the scene of a crime somewhere and
the police can query the sample I sent to Ancestry x years ago, well, let's
say that I already knew that risk going into it and it will have to be what it
will have to be.

In our lifetimes, I'm not seeing a way for us to dismantle the forces that are
pushing for such a big surveillance state.

Therefore, by definition, you can either find a way to cope within that
surveillance state, or you can move to somewhere so remote and hidden that you
can't be caught doing what they don't like.

~~~
tapland
You are contributing to it though. You also added your entire close family to
the searchable registry by submitting your DNA.

It reads like 'people like me are out there and don't care, so if you do, you
have to move to the remaining square kilometers of jungle'.

~~~
awb
> You also added your entire close family to the searchable registry by
> submitting your DNA.

Noob question, but how does this work?

~~~
alistairSH
Roughly... Crime is committed, police find DNA sample at scene, police compare
sample with online DNA registry, hits as a near-match for personX, police now
know that a sibling/cousin of personX is the culprit. That sibling/cousin
never allowed their own DNA to be collected.

This happened in 2018... [https://www.washingtonpost.com/news/true-
crime/wp/2018/04/27...](https://www.washingtonpost.com/news/true-
crime/wp/2018/04/27/golden-state-killer-dna-website-gedmatch-was-used-to-
identify-joseph-deangelo-as-suspect-police-say/)
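
To make the near-match step above concrete, here is a toy sketch. It is not how GEDmatch or any real forensic tool works (those compare shared DNA segments); the marker names, the allele data, and both thresholds are made up purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical representation: marker name -> the two alleles at that marker.
Profile = Dict[str, Tuple[str, str]]


def shared_fraction(a: Profile, b: Profile) -> float:
    """Fraction of alleles shared, over markers present in both profiles."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    shared = 0
    for marker in common:
        remaining = list(b[marker])
        for allele in a[marker]:
            if allele in remaining:
                remaining.remove(allele)  # each allele can be matched once
                shared += 1
    return shared / (2 * len(common))


def classify(score: float, exact: float = 0.99, relative: float = 0.5) -> str:
    """Arbitrary illustrative thresholds, not real forensic cutoffs."""
    if score >= exact:
        return "same person"
    if score >= relative:
        return "possible relative"
    return "no lead"


crime_scene = {"m1": ("A", "G"), "m2": ("C", "C"), "m3": ("T", "A"), "m4": ("G", "G")}
person_x = {"m1": ("A", "A"), "m2": ("C", "T"), "m3": ("T", "A"), "m4": ("G", "C")}
stranger = {"m1": ("T", "T"), "m2": ("G", "G"), "m3": ("C", "C"), "m4": ("A", "A")}

for name, profile in [("personX", person_x), ("stranger", stranger)]:
    score = shared_fraction(crime_scene, profile)
    print(name, score, classify(score))
```

personX is not the source of the sample, but shares enough of it to be flagged as a possible relative, which is the point: the relative who left the sample never submitted anything.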

~~~
mindslight
... police now _think_ that ...

To me, that is the real threat: false positives, myopic reliance on the
database, and an assumption that the computer is always correct.
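
A rough back-of-the-envelope sketch of why false positives matter when searching a large database. Every number here (database size, per-comparison false match rate, chance that a relative of the real source is in the database at all) is an assumption picked for illustration, not a measured figure, and the expected-count formula is only an approximation.

```python
def hit_is_correct_probability(
    db_size: int,
    false_match_rate: float,
    p_relative_in_db: float,
    sensitivity: float = 1.0,
) -> float:
    """Approximate chance that a reported hit points at a real relative.

    Compares the expected number of true hits with the expected number of
    coincidental hits across the whole database.
    """
    expected_true = p_relative_in_db * sensitivity
    expected_false = db_size * false_match_rate
    return expected_true / (expected_true + expected_false)


# Assumed inputs: 1M profiles, one-in-a-million false match per comparison,
# and a 50% chance that a relative of the real source is in the database.
p = hit_is_correct_probability(1_000_000, 1e-6, 0.5)
print(f"chance a hit is a real lead: {p:.2f}")
```

Under these made-up inputs, only about one hit in three is a real lead, even though the per-comparison error rate sounds tiny.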

------
blackearl
"The repository's contents are likely scraped from public records"

Seems like a non-story to me. If you put it out there, someone is collecting,
cleaning, and selling it. The fix in this particular scenario is to put less
online.

~~~
auiya
Almost every city in the US has a GIS system which you can use to look up tax
record ownership of property parcels, and sales history too. This information
is also available via FOIA requests. There are probably easy ways to scrape it
out considering almost every city uses the same 2-3 GIS software systems. In
the case of home ownership, the only choice you have for privacy is to
register an LLC under a paid attorney as a proxy, and have the LLC buy the
home. And even then you may run into issues trying to file for homestead
exemption property taxes. You're not able to tell the city to unlist your
property info in most cases.

~~~
Mountain_Skies
Reading the Letters to the Editor and similar features in older magazines and
newspapers reveals something that would be unthinkable today: it was common
for the home address of the person writing to the periodical to be published
along with their letter. As far as I can tell, no one had a problem with this.
When I was young, driver's licenses included your Social Security number and it
was common to have it printed on your personal checks. This no longer happens.

Is it the information that's the problem or is the problem what others are
willing to do with that information?

~~~
lubujackson
This is an interesting insight.

I think most reactionary pro-privacy responses (the HN default perspective)
really stem from a complicated internalization of data gathering/use
capabilities that has been adjusted over time combined with a lizard brain
feeling of invasiveness. Because almost always no one particularly gives a
crap about YOU and your specific data, but they may profit from and misuse
your information in passing or in aggregate.

It is a lot like the feeling of having your car broken into or house robbed.
You feel personally violated but more than likely you are a victim of
circumstance. It can be hard to distinguish between faceless identification
(ad network data gathering, for the most part) and the risks of general
data availability that makes anyone as capable as an old P.I. (especially
when data leaks conflate the two).

~~~
perl4ever
"almost always no one particularly gives a crap about YOU and your specific
data"

It's really amazing how much time people spend these days trying to gather
information on people, considering how useless it is. I don't mean mass
surveillance, I mean like people that you want to do business with
individually.

------
Keverw
I’ve always wondered how those people finder sites get all this info. Do they
just send a bunch of FOIAs to courthouses and cities? I kinda figured maybe
they paid to be wired in with the DMV and police computers... I guess
background check companies would be similar, however they get the data.

I was looking at some before, just curious, and I noticed some information was
inaccurate. When I looked up an old address, it said a dead relative used to
live with us, when they never did. Then another site said one of our neighbors
was a sex offender, which is not true.

Then there are companies like LexisNexis that have massive databases on
people too. I think they have a way to run people’s credit without it actually
showing up on people’s reports, as I heard car dealers can get info on people's
credit without it showing up as a pull, so not sure if maybe it’s like a
cached version of a credit report sold and traded.

I remember watching some clips years ago on all these big data brokers on
YouTube. Last I heard some of these companies won’t even delete your
information unless you are a police officer who feels your life is in danger.
Seems to still have a similar policy. [https://www.lexisnexis.com/en-
us/privacy/for-consumers/opt-o...](https://www.lexisnexis.com/en-
us/privacy/for-consumers/opt-out-of-lexisnexis.page)

~~~
astura
Where I live court records and property records are completely public and
available for anyone to view online. So just simple web scraping. Voter
records aren't online as far as I'm aware, but they're easy to request.

~~~
Keverw
I know some of it makes it to the paper too, like bankruptcies or divorces if
you don't know where they currently live, but I know some of that stuff is
behind a paywall or you have to pay to make copies at the courthouse, but it
probably depends on the area.

I was looking at one site about red light camera tickets since there's been
debates over them, some states even outlawed them. But looking at one of the
examples, some county didn't even have a secure website to put in license and
credit card info. Chrome even put a warning next to the address bar. Not sure
why they are allowed to process credit cards, since a private website would
have to be compliant with PCI. But I guess some areas are more technical than
others.

~~~
astura
Divorces and bankruptcies are included in court records and are available
online in my state.

Property transfers are posted in the newspaper.

~~~
Keverw
For the entire state? I figured it was a county-by-county thing. But I do think more
legal case law and publicly funded research should be open to the public. I
know people feel that way about PACER, so someone created a plugin called
Recap.

~~~
astura
Yes, for the state.

We don't really have county courts here, just judicial districts which are all
operated by the state. (We have no county governments)

~~~
Keverw
Interesting, sounds more centralized maybe. I'm guessing Louisiana? or maybe
Alaska?

------
hn_throwaway_99
It's articles like this that make me realize many people really haven't come
to terms with how the Internet has changed the definition of "privacy".

As mentioned in the article, this is all public data that probably any script
kiddie with enough time could write scrapers for. It's a non-story. What's
different is that (some) people somehow expect that this public data is as
hard to collect and correlate as it was 30 years ago. Those days are long
gone, and people should realize it.

------
swarnie_
General question to HN because I don't know the answer.

Has anyone ever tried de-duplicating and matching up records between multiple
leaks to build user/person profiles?

I guess with enough unique keys, compute power and time you could build a
reasonably accurate profile of a person by matching up email addresses, phone
numbers or Social Security numbers?

Would probably make identity theft and fraud a lot easier going forwards.
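
For what it's worth, the matching described here is ordinary record linkage (the same thing CRM de-duplication does). A minimal sketch, assuming records are plain dicts and that a shared email or phone means "same person"; the field names and the sample rows are invented, and real systems need fuzzier matching plus protection against false merges such as shared household phones.

```python
from collections import defaultdict


def link_records(records, keys=("email", "phone")):
    """Group record indices that share any normalized key value (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (key, normalized value) -> index of first record with it
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if not value:
                continue  # missing values never link records
            token = (key, value.strip().lower())
            if token in first_seen:
                union(idx, first_seen[token])
            else:
                first_seen[token] = idx

    clusters = defaultdict(list)
    for idx in range(len(records)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


# Entirely synthetic rows standing in for three different sources.
records = [
    {"source": "set_a", "email": "Pat@example.com", "phone": None},
    {"source": "set_b", "email": "pat@example.com", "phone": "555-0100"},
    {"source": "set_c", "email": None, "phone": "555-0100"},
    {"source": "set_c", "email": "sam@example.com", "phone": "555-0199"},
]

print(link_records(records))
```

Rows 0 and 2 share no field directly; they end up in one cluster only because row 1 bridges them, which is also exactly how wrong merges creep in.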

~~~
astura
Data broker-type companies do this already and sell it. They even include
stuff like your probable religion, probable income/net worth, probable
hobbies/interests, etc.

For example:

>Each record contains entries that go far beyond contact information and
public records to include more than 400 variables on a vast range of specific
characteristics: whether the person smokes, their religion, whether they have
dogs or cats, and interests as varied as scuba diving and plus-size apparel.

[https://www.wired.com/story/exactis-database-
leak-340-millio...](https://www.wired.com/story/exactis-database-
leak-340-million-records/)

HN Discussion:
[https://news.ycombinator.com/item?id=17421140](https://news.ycombinator.com/item?id=17421140)

------
flurdy
Paging [https://haveibeenpwned.com](https://haveibeenpwned.com) and
[https://twitter.com/troyhunt](https://twitter.com/troyhunt)

~~~
astura
First off, this is all just public information.

Secondly the location of the database was not disclosed by the researcher, so
he can't exactly load it into haveibeenpwned.

~~~
flurdy
[https://twitter.com/troyhunt/status/1215829082920742914](https://twitter.com/troyhunt/status/1215829082920742914)

------
yoaviram
Following data leaks like this one I suggest sending the company, in this case
CheckPeople, a GDPR / CCPA deletion request. Here's a simple way to do it:
[https://yourdigitalrights.org/?company=checkpeople.com](https://yourdigitalrights.org/?company=checkpeople.com)

Disclaimer: I'm one of the creators of the service.

~~~
sylvanaar
How do you make money from it?

~~~
yoaviram
We don't.

------
ryanlol
Meh, it’s all public data anyway.

The only “victim” here is CheckPeople, but I doubt that this will have any
impact on their business.

