Inside the Largest US Voter Data Leak (upguard.com)
423 points by danso on June 19, 2017 | 329 comments

Speaking as a guy with a lot of experience with voter data (I built the first "where do I vote" apps for Google and helped found the Voting Information Project):

This is actually almost entirely public data. Yes, including addresses and phone numbers and political affiliation. There are some states where it is not public as part of the voter file, but you can still get it other ways publicly. For example: USPS, etc. Some states/players would make you sign agreements not to use it for commercial purposes.

The modeling info included is not public.

Acquiring 50 state data can be a bit of a pain, but there are at least two major players that will sell it to you. (I remember one of them literally laughed when I told them we would want the databases without any personal info included, because we just wanted the address to various political precinct mapping.)

> This is actually almost entirely public data

Birthday is an included item. That's definitely private as it is often used to confirm identity.

"almost public" is meaningless. One data item, like credit card number, or birthday, can make this a dangerous leak.

Birthday is almost certainly public. You can get it easily from the DMV or from any of several dozen commercial providers that resell government data:


That it's used to confirm identity shows how weak identity-theft protections are at most institutions, not what's public information. (For that matter, mother's maiden name is basically public information as well: you can get it from genealogical records.)

The site you linked to is not affiliated with any state DMV. It links to a sketchy background check service that appears to be a scam.

Edit: spelling

The link was preceded by the words "from any of several dozen commercial providers that resell government data:" so that's how I took it. It would be at a .gov domain if it were state affiliated.

It said from the DMV. No evidence of that here.

Also, "reselling government data" implies that the government explicitly gave permission to a business selling your data. I doubt that's true. More likely, these entities gathered data from whatever entities they could, probably various private companies from which you've made online purchases.

That, and, I've never consented to release my birthday publicly. I consider it to be private, therefore it is private. It's mine! I expect the government would not release it since they use it as part of an identifier all the time.

There's no perfect identifier. That's why we need security. This is a major lapse.

Pretty sneaky of the comment above yours to try to pass dmv.org as a government website.

I think you are confused about what "private" means.

I've never "consented" to "releasing" my address publicly, but real estate records are public information, thus the real estate I own is easily queried from my town's website and the real estate transaction was published in all the local newspapers. You can even look up my property tax bills from the town's website and my payment history of said taxes. If you're feeling generous you can pay my property taxes online too.

You're trying to redefine private.

Is the list of doctors you visit private? I could probably discover such information, but it is considered private.

Your address is usually public unless you go to lengths to obfuscate it. You can go to Hollywood and get a map of where all the movie stars live. And, the white pages list your phone number unless you opt out.

Birthday has never been publicly available until this voter data leak. Not good.

>Is the list of doctors you visit private?

Yes, this is explicitly protected by law.

>I could probably discover such information

No you can't get this information, and certainly not legally.

>Your address is usually public

Exactly but I never "gave my consent to release my address," it just is.

>Birthday has never been publicly available until this voter data leak.

Open up a newspaper or open up yourlocalpaper.com. There's a whole birth announcements section!

How do you think the RNC got this information in the first place? Public records. The RNC is NOT the government.

> How do you think the RNC got this information in the first place?

I imagine they used campaign dollars to buy the information from private companies with whom you've done business. Maybe legally, maybe not.

The RNC certainly isn't offering up such information as to where they sourced the data they leaked.

> I've never consented to release my birthday publicly.

Too bad. The government doesn't care about you consenting to things. In fact, making you do things without your consent is literally the entire point of organized government, even if we usually overlook that because it's for a good purpose (for example, taxation to pay for health care or national defense).

You're off topic. The government hasn't released my birthday. I question their data security practices, but any release of such information would be considered a mistake.

I don't care if you think paying for health care or national security is important or not. That's an unrelated issue to whether or not Birthday is private or public data.

How do you know if your birthday isn't in some publicly accessible government database somewhere? Did you try to find it? Did you hire a private investigator (who has quick access to all those databases) and ask them if they can find your birthday?

If it were, I would let the government know that's not okay. Hackers would be one step closer to being able to sign up for a credit card under my name.

If it's true that that information is out in the wild now, then I expect government to tighten their tech security procedures.

"I would let the government know that's not okay" - lol, yeah good luck with that.

How do you think the RNC got birthdays for 200 million Americans in the first place? Public records. 200 million Americans are NOT affiliated with the RNC.

> How do you think the RNC got birthdays for 200 million Americans in the first place?

As I replied elsewhere, there is a market for reselling your information on the internet. In some cases that is legal, and in many it's probably not. As a tech person you should know this.

Yes, the only reason that anyone cares about their date of birth is that in recent times it has been something that can be used in identity theft. Back in the pre-internet days, nobody cared. People had their SSNs pre-printed on their personal checks too.

The solution to all of this data privacy hysteria is to change our approach. The possession of common facts about a person should not be sufficient to masquerade as that person.

> The solution to all of this data privacy hysteria is to change our approach. The possession of common facts about a person should not be sufficient to masquerade as that person

I think the solution is better security. We'll only ever have basic facts to uniquely identify people. Biometrics can be hacked/copied too.

I think the problem could be mostly solved by requiring people who are getting credit to apply somewhere in person. Most credit card banks have branches everywhere (USA). The person has to have their face recorded and stored along with the credit line. This would have many benefits: legitimate people would think more about how serious getting a credit card is, fraudsters would be less likely to try and get fake credit, and it would be much clearer when the bank gave credit to the wrong person and has to eat the loss. Unlike in the past, taking and storing pictures is now a trivial task. This is not likely to happen, as the credit companies have little liability under the current system.

"from the DMV"? no.

> DMV.org is a privately owned website that is not owned or operated by any state government agency.

DMV as in Department of Motor Vehicles. Many states have names like that hence "DMV"

No evidence here that you can get someone's birthday from any state-run DMV

How is that evidence? Birthday isn't included and it notes,

> "Note: Residence address and SSN are confidential except when the requester is authorized by law to receive it."

>The Department of Motor Vehicles (DMV) maintains information on approximately 32 million vehicles/vessels registration (VR), 27 million driver licenses (DL) and/or identification (ID) cards, and over 437,000 occupational licensing (OL) records.

>Confidential information is not considered public record. This includes certain DMV personnel matters, physical/mental information, residence address, social security number (SSN), incomplete findings from research, results of ongoing investigations, operation plans, and electronic data security controls.

DOB is listed on ID and not considered confidential information.

> DOB is listed on ID and not considered confidential information.

Nowhere does it say that. If you think they mean that by omission, I highly doubt it.

Considering it's listed as public record in the first place, I doubt it. I was just directly attacking your claim.

Birthday is not public record, and there is no evidence that it is. That's the point. There's no government entity that will give you a list of people's names and dates of birth.

You cited California DMV which does not give up that information. They won't even give you someone's address unless you're legally entitled, like the police.

How many dark patterns are used by truth-finder.com? Seriously.

Um, no. Your birthday is certainly not private information. Births are openly published in the local paper and online. Not to mention stuff like this - http://www.wfsb.com/story/26835982/website-displaying-voters...

Um, yes. It certainly is. I don't want it released publicly, nor have I consented to release it to any website.

If there are websites out there sharing that private information, they're doing it without my consent.

You may consider it private, but that doesn't make it legally private. Which is the kind of privacy under discussion in this thread.

Nobody has restricted this conversation to the legal sense.

Is your daily schedule, when you go to work, drop off your kids, what route you take to work protected legally? No, but you probably wouldn't want to share that information publicly either. Yet, in the future, a data breach could reveal such information, and a business could seek to resell it.

Did you read the link I posted? If you are a registered voter in Connecticut your birthday is public information by law.

Yes. One state doesn't mean all states. Also, it's tbd whether this website will be allowed to continue to operate,

> "According to the webmaster, there is a class action lawsuit that is trying to have the site shut down"

The article also argues that many types of workers ought to seek removal of their name from this list in the interest of personal security.

The story is making the case that this should be considered private information.

If you don't want it released publicly, your best chance would have been for your parents not to release it when you were born. No offense, but many places, like small towns, will list the birth of a child in the local newspaper if the parents sign a form after the birth in the hospital, among a pile of other forms.

It may be discoverable in that way, but that doesn't mean government entities are or should be disclosing names with birthdays en masse. Certainly not the entire country's all at once, which is what the RNC appears to have done.

> Birthday is an included item. That's definitely private as it is often used to confirm identity.

Lots of things that are actually not private data are used to confirm identity; “it is used to confirm identity” is not a disproof that a piece of information is public information.

Would you like to share your full name and birthday with the world?

I wouldn't. Therefore it is private.

That would disclose way more information than just my name and age. You would then be able to associate that information with my HN username and all the data associated with it.

I don't follow.

idbehold can't post his or her name and birthday without revealing he or she is the owner of the idbehold Hacker News account. Which is more information than name+birthday, it's name+birthday+Hacker News posting history.


This is a random guy. His name is ERNEST BELL JR and his birthdate is 10/05/1959.

> idbehold can't post his or her name and birthday..

Not sure why you're replying for someone else..

> This is a random guy. His name is ERNEST BELL JR and his birthdate is 10/05/1959.

Inmates obviously lose some rights when convicted. Data security is minor compared to losing the freedom of movement.

Also doesn't surprise me that Florida would make all information on its inmates public.

Plot twist: creepydata is my alternate account and I am Ernest Bell Jr. born 10/05/1959.

> Birthday is an included item. That's definitely private as it is often used to confirm identity.

mylife.com publicly posts birthdays so apparently not

No kidding, but I don't want that information out there, and I imagine most people don't.

Just because some skeezy website shares all my personal information does not make it public information. It's private to me and I'd rather it remain undistributed, where possible.

Overall point being, this US voter data leak is bad. It appears to have released information on all of us that we consider private.

> Just because some skeezy website shares all my personal information does not make it public information.

I kinda think it does mean exactly that. Just because you don't want something to be public doesn't mean that it isn't public.

So do you consider your daily schedule, gps locations of where you go, to be public information?

Nothing about the definition of public or private restricts either to being protected by the government. I can consider some information private without it being protected in the legal sense.

That said, I expect that any government entity that released my birthday would be held responsible for a data leak.

> So do you consider your daily schedule, gps locations of where you go, to be public information?

Nope. What's your point?

Your words,

> "Just because you don't want something to be public doesn't mean that it isn't public."

In the future, your location data could be tracked, shared/hacked, analyzed to produce more information, and breached just like this RNC dataset.

My point is, something private doesn't become public when a criminal exposes it. It's still private to you.

What? It isn't public because it isn't public. Not because I don't want it to be public.

If in the future it is no longer private, then I won't consider it private. I may not be happy about it, but that doesn't change the fact that it is public at that point.

If a criminal releases 100,000 credit card numbers, we don't all of a sudden consider credit card numbers to be public. We try to limit the spread of the leak, if possible, and shore up whatever security lapse occurred. Nowhere in that scenario do we begin to consider credit card numbers public.

So it is with birthdays. There isn't any government organization who will distribute that information.

You most certainly do consider stolen credit cards public! You don't just limit the information and hope for the best—you attempt to inform everyone affected of the breach and try to get everyone to change their card number (because it's public at this point).

I think we're using public/private in different ways. You're referring to the data's present and future classification. I'm pointing to the data's past classification because that determines whether the collection of data was theft or not.

If all credit card numbers were stolen and distributed, we would still have considered them private before that.

The same is true for these birthdays. They were private, then stolen and made public. That they are now public doesn't retroactively make the theft okay. Theft is theft.

The lesson here is to tighten security, increase security education and awareness, and increase investigation into the most egregious of these crimes so that violators can be brought to justice.

The modeling info is the real value.

So taking that modeled data, loading it into Cambridge Analytica (which I understand is somewhat of a DMP in this sense), and leveraging it with highly-customized creative targeted against Custom Audience uploads using this modeled data would be insanely valuable to a political player with the capital to deploy this info weapon.

CA offers DMP services as part of their product suite, but they specialize in doing this kind of modeling as a professional service.

It's probably good that a lot of this is public, but one thing I think we're going to reckon with is how "available" all of this data should be, and how much it costs in terms of time or money to retrieve it.

Should I be able to find the public records for any citizen, anywhere in the country, at the snap of a finger? Or should I have to go to e.g. the courthouse and ask for the records in person, deal with a phone system, &c.?

Adding this kind of friction only discourages people with good intentions. It doesn't stop people with not so noble intentions (who are most likely backed with a lot of money).

This is how data brokers make money: go through all the labor intensive manual acquisition processes and then resell it through a slick web UI.

where can I get access to this data?

Contact anyone with a voter file (google "voter file purchase")

Trivially: http://nationbuilder.com/voterfile


As for "For free", states are generally required by law to give it to you if you ask. Some charge fees. Only two have crazy fees, 5k and 30k (though if you challenge them, ...etc)


There is a GitHub project I'm aware of to put together all of the data: https://github.com/national-voter-file/national-voter-file

FYI, be careful what you do with it. Some of this sort of data--like federal contribution data from the FEC--cannot be legally used for commercial purposes[0]. They do actually enforce it, at least occasionally.

[0] https://transition.fec.gov/pages/brochures/saleuse.shtml#anc... (publication is from 1992, but still accurate per disclaimer at the top)

As long as the CEO of a company (RNC) that gives data to an outsourcer (Deep Root Analytics) is not going to jail for giving data to an unqualified company, nothing will change.

If the CEO goes to jail, things will change very rapidly (the CEO will manage his CMO much more tightly, who will first want to see a security audit not older than 6 months).

At least the CEOs I have reported to as CTO were very sensitive to implementation issues in areas that could land them in jail.

Same for every other hacking (e.g. Sony) or IT failure (e.g. British Airways' crashed DC).

What law did they break, exactly? These aren't medical or financial records.

A careless programmer makes a bad choice and the CEO has to go to jail? Come on.

> A careless programmer makes a bad choice and the CEO has to go to jail? Come on

An institutional failure of review, testing and security that will lead to tens of billions of dollars of identity theft goes unpunished completely?

Come on.

A CEO is responsible for his organization. If you ruin lives, you have to pay the price.

Can't handle the heat?

Don't take the job.

I hate how CEOs get hundred million dollar parachutes because the risk and danger and difficulty of such a position warrants such extravagant pay.

But, then, we ask them to be responsible, bear responsibility for the organization which paid them a hundred million dollars to be responsible, and we say "come on?"

Utterly ridiculous.

CEOs bear responsibility for their organizations, or the organization should not exist. There must be responsibility for private organizations, lest the concept of private organization be nothing more than a cheap trick to remove criminal and civil liability from wrongdoing.

This isn't social security numbers.

This is all publicly available data scraping stuff. Like your public Facebook profile.

If you don't want that stuff to be leaked, then don't put your info publicly on Facebook.

I'm starting to kind of hope there is just a giant "leak" of every single US citizen's basic public information including SSN, etc. Get it all out there so we can stop having the debate over what is private information vs. public.

Then we can stop having this conversation constantly. None of this information is secret, I would put SSN into "quasi-secret" land since it takes such minimal effort to get at it.

We've been relying on security through obscurity for far too long. If the only thing stopping mass identity theft is someone compiling a list of otherwise public information, it's far beyond time we re-evaluate where the true problem lies.

So yeah, I agree. At some point society is going to actually have to confront this in a useful manner vs. hysterics and patching over an obviously failed system.

I didn't put mine on Facebook. What's step 2?

Are you registered to vote?

Then your info is publicly available for anyone to get. The government will just give it to you.

Don't want your freedoms threatened? Avoid attention by never exercising them!

I reject the argument that those who aggregate vast troves of data about people, publicly available or voluntarily shared though they may be, are exempt from any sort of responsibility for the curation and deployment of said data. Informational asymmetries lead to power imbalances, and sufficiently severe power imbalances lead to oppression.

It's still not clear that this is all publicly available information.

Jail is a bit much, but I do think corporations should be financially accountable for the clean-up of privacy spills. (E.g., similar to environmental disasters.)

If corporations face the prospect of a big bill, and the cost of that bill far exceeds the cost of keeping user data safe, a lot of the right things will start to happen.

But what law specifically was broken? Should we have a law that punishes the CEO for data breaches? Is a CEO responsible if his experts recommended the practice? Is the CEO responsible if their staff went around and did this without consent? That seems ripe for abuse. Don't like your CEO, leak some data and have him go to jail.

I think data that has to do with voting records, or suspected voting records, would be very reasonable to be under the purview of being treated as sensitive data that, if breached, should have consequences to a company.

If these are voting records, which are public, it may well be that they haven't done anything prohibited even if they intentionally distributed all this data to everyone.

As in, the company didn't want to distribute this data, so it's a breach, and the person who did that would be guilty of stealing the company's confidential information (i.e. the modelling info) but it seems quite likely that purely (re-)distributing the core data of people's names and addresses doesn't actually violate any US laws at all; US privacy laws (outside of medical data) are very lax compared to e.g. EU.

I could imagine that victims of a future identity theft might have a civil claim against the company if/when real losses have occurred, but it's quite possible that if the CEO personally published all this data, filmed all of this, and sent it to the prosecutor's office, that no crime (according to current USA privacy laws) could be found there.

I'm not saying this shouldn't have consequences. What I'm saying is this is far too nuanced to just say "lock up the CEO".

How about making positive proposals of your own instead of negating everyone else's? Clearly many people find the existing rules and practices inadequate and propose heavy burdens of responsibility commensurate with the substantial incentives and rewards that accrue to success in business.

CEOs are not an oppressed class groaning under the burden of social structures that keep them locked up in the C-suite. Even if they are confronted with draconian penalties for naive misadventure, most CEOs of medium and large firms can afford A+ legal representation. If you're more worried about them than you are about the potential first and second-order effects upon tens or (in this case) hundreds of millions of people, then you are essentially choosing to be a pawn of the powerful.

I think you're blowing grovegames' original post out of proportion — he asked some pretty reasonable questions.

Of course they're reasonable. But problematic large scale data breaches are not a new problem. The last financial crisis was almost a decade ago, and yet we haven't developed a new culture of organizational responsibility since, despite the massive societal costs.

Not to make overly sweeping generalizations, but 'hold on, let's think through all the ramifications here instead of being too hasty' is a great way to maintain the status quo while avoiding any responsibility for it. Who benefits? It sure ain't the general public.

Indeed, lock up a couple CEOs and the others will feel a much stronger need to create better protections. Right?

I don't think it's so simple, but it's clear that most businesses take a reactive rather than a proactive approach to security and many other important considerations. Guillotining a few corporations is likely to have a salutary effect upon the others.

To some extent this is a cultural divide; Anglo-Saxon capitalism has an unspoken ethic of 'forge ahead, cross bridges when you come to them' while continental European capitalism is far more accommodating of social considerations and has a 'first do no harm' approach. There are upsides and downsides to both approaches - and of course these are very shallow and incomplete characterizations of complex economic and cultural factors, which I have no intention of trying to defend if someone complains about them.

"A CEO is responsible for his organization. If you ruin lives, you have to pay the price."

If you break the law, you pay the price of jailtime. If you haven't broken the law, you might pay the price in the marketplace, but that's all.

If a law was broken, then of course whoever broke it should be prosecuted. But I don't think anyone disagrees with that, nor does it need to be explained in a lengthy HN comment. The only reason we need such an explanation is exactly because this isn't the way our society works. We set up the rules of the game and expect people to play within those rules, but we don't go around jailing people because we dislike them or disagree with their choices.

"CEOs bear responsibility for their organizations, or the organization should not exist."

Don't forget that a CEO is an employee, and just one employee. A particularly important and influential (and well-paid) one, but "just" an employee. An organization is not solely defined by its CEO, nor does it make sense to think of them as all-powerful in terms of what the organization does. A CEO who doesn't perform can (and probably will!) be fired at some point.

> An institutional failure of review, testing and security that will lead to tens of billions of dollars of identity theft goes unpunished completely?

I'm going to agree with you in wanting to see someone punished for this, but I'm not sure if I'm on the side of jail time in the absence of malicious intent.

You can't blame a "careless programmer" when the real problem is organizations simply not committing any resources to security. It's like a hospital only employing 1 nurse and blaming her when patients inevitably die. The organization has a clear responsibility to employ a sufficient number of experts to protect their data. DRA is not unique here but does demonstrate a pattern of companies playing fast and loose with sensitive data with minimal repercussions. We'll keep seeing things like this until our laws are such that stewards of data like these have some sort of incentive to protect them.

>You can't blame a "careless programmer" when the real problem is organizations simply not committing any resources to security.

How have you established that they didn't have a sufficient number of experts? What if they purchased a product or service and it simply didn't work? It's rather harsh to point fingers without having all of the information.

>We'll keep seeing things like this until our laws are such that stewards of data like these have some sort of incentive to protect them.

I think we need to give companies appliance-like products with a simple set of instructions that anyone can follow. Even a simple change where the data is stored in a 'vault' that requires the use of special tools with built-in access controls and auditing would prevent a lot of data breaches. This means you can't email files around or share them on Google Docs or whatever. I'm convinced that people will do the right thing if you make it easy enough for them.

I'm really impressed at the lengths people will go to defend those whose malfeasance or ineptitude unarguably worsen the lives of up to two hundred million people. You seem like a smart person, please tell me how you think it's OK that these folks' contact details and potentially very detailed psychological and political profile information are now likely available on the dark web?

Given that this data was collected for explicitly political purposes with the specific goal of shaping voting behavior - one of the few things in American life where privacy is considered sacrosanct - surely you don't need me to point out the potential for manipulation, exploitation, and intimidation that become available to bad actors in possession of this data.

Are you familiar with the concept of 'strict liability'? Do you have any policy reason why such a standard shouldn't apply in cases like this?

Well, for one, you can legally obtain much more detailed information on a person from any background check application currently available. So, it's arguable whether this has really increased the likelihood of a future crime.

"In criminal law, strict liability is liability for which mens rea (Latin for "guilty mind") does not have to be proven in relation to one or more elements comprising the actus reus (Latin for "guilty act") although intention, recklessness or knowledge may be required in relation to other elements of the offense."

Proving recklessness is harder than you think.

If data is leaked because someone within a company with the appropriate level of access decides to sell out to the dark web, all the security in the world won't protect you. Should the CEO go to jail because an employee turned on the company?

Heartbleed - you could have had 100 security professionals on your team, and you still would have been vulnerable. Should every CEO on the planet go to jail?

Security persons do make mistakes and leave keys in places they shouldn't, genuinely by accident. Who is going to jail for this error? If you think you are sending the security personnel to jail, well, we're going to have an exodus of people willing to call themselves security personnel, because no one is paid enough to risk jail for a job.

So, it's not a matter of defending ineptitude, it's a matter of recognizing the problem is complex and unless you can have clear boundaries of what is punishable and what is not, you're going to have a bad time enforcing anything that makes a difference. As a security person, I'm sure you know a policy without adequate enforcement is absolutely useless.

The hardest part of data processing is collecting it. If you can just grab already collated data for cheap, then it absolutely is more likely to be exploited.

That's all true, but continually talking about what a hard problem it is as a way to avoid settling on some harsh penalties (financial or custodial) for negligence of various types only perpetuates the problem. Courts can decide whether a harsh punishment should actually apply in any given case. But right now, no such punishments are even defined so there are no strong incentives to minimize negligence and restrict collection and distribution of that information.

tl;dr penal incentives function like a sword of Damocles. As long as we're debating whether and what size of sword to hang from a thread, Damocles has no reason to worry.

> I'm really impressed at the lengths people will go to defend those whose malfeasance or ineptitude unarguably worsen the lives of up to two hundred million people. You seem like a smart person, please tell me how you think it's OK that these folks' contact details and potentially very detailed psychological and political profile information are now likely available on the dark web?

So far I've seen people disagree with simply throwing them in jail and people who want more data (was it a screw-up? Even security people make mistakes. Did they not have appropriate resources? etc.).

I haven't seen anyone say this is OK in any way, shape or form. I see many reasonable discussions.

We've been having reasonable discussions on these topics for many years. Discussions that never lead to decisions and actions are a sideshow.

I'm calling for action - strong user-centric privacy protections with strict liability and significant personal and organization penalties for negligence, similar to the French model.

Criminalizing something doesn't mean you've "done something about it". It just means you've applied a traditionally flawed approach to fix a systemic problem that could be better handled through education, training and awareness. However, those things are much harder to do than to simply write something into the legal code without the public being any more prepared to deal with the situation.

> a systemic problem that could be better handled through education, training and awareness

How's that been working out for you? This isn't a new problem. Where are those educational, training, and social awareness resources? What budgets have been allocated to them? What mechanisms put in place to monitor the effectiveness of the deployment? How many more years of theoretical discussions about ideal solutions should we have before acting, notwithstanding the possibility of error? If your cautious incrementalist approach is so great (and heaven knows I've spent many years thinking and advocating within that framework) why does the problem keep getting worse? How long and to what extent are you willing to wait for this informed public to manifest and (somehow) overcome all the countervailing forces that have economic and political interests in quite different outcomes?

And why, I ask myself, did you respond to my positive proposal about "strong user-centric privacy protections with strict liability and significant personal and organization penalties for negligence" by ignoring it and instead knocking down a straw man of 'criminalization' that I took care to avoid?

You don't want to be the person responsible for taking or advocating for a decision that might work out poorly, fine. But reiterating the reasons for your hesitancy achieves nothing.

>Where are those educational, training, and social awareness resources? What budgets have been allocated to them? What mechanisms put in place to monitor the effectiveness of the deployment?

Just because there is no information about that in the article, doesn't mean they weren't in place.

> If your cautious incrementalist approach is so great (and heaven knows I've spent many years thinking and advocating within that framework) why does the problem keep getting worse?

How do we know it's getting worse? I work with a LOT of non-technical people, and they are very good at detecting spam emails, and not clicking on the fake bluescreen popups, etc., using just their intuition and general awareness. They DO pay attention whenever articles about viruses and hacking and whatnot hit the front page.

>You don't want to be the person responsible for taking or advocating for a decision that might work out poorly, fine. But reiterating the reasons for your hesitancy achieves nothing.

Would you consider flipping it then? Let's also put software developers who introduce security bugs in jail. Oh, but software is so so complex!! A million different pieces working together, and I didn't write all that other code, so how could __I__ possibly be held responsible?! Well, people to people interactions are complex too, and putting a process in place where every person is supposed to follow a protocol is hard too.

There are times when I disagree with you, but then there are times like this when I can't agree more.

Now, with that said, I find this position slightly juxtaposed to the position you appeared to hold on the privacy of citizens when the Snowden leaks happened.

Why would political preferences be more sacrosanct than other preferences or private predilections people hold...

It would be great to define all the data-types a citizen can hold a position on and determine those which the government / entities can gain access to, and those which a citizen can expect privacy with...

And have that as a simple checklist as opposed to hidden in lengthy language of laws?

A good question indeed. I was not at all thrilled about the Snowden leaks, but I had, and still have, some faith in the mechanisms of institutional governance, plus I viewed it in the context of strategic calculus.

Being a Euro I personally favor very rigorous privacy protections, and think you should be able to know who has data on you, get detailed copies of it in some accessible format, and request its deletion. Public institutions that do have a custodial data function should be subject to increasing levels of accountability and their powers should not be unlimited.

Now, since the US doesn't currently promulgate such strict data-gathering and retention standards in the public or private sector as I would like, it's a strategic reality that well-resourced actors like foreign governments can vacuum that up for their own ends, whether nefarious or merely curious. So I'm OK with the NSA collecting such data insofar as it seems irrational for the government to put itself at a disadvantage relative to everyone else in the private sector, in the same way that it would be irrational for police officers to have fewer powers than regular people, as opposed to greater responsibility in the exercise of those powers.

In short, if all that data on people can be legally bought or acquired, it'd be pretty stupid for the USA/NSA to be the only entity that didn't have a copy.

I do heartily agree that data aggregation in both public and private sectors is way, way out of control, and I also agree that a checklist approach would be far preferable to yet more books of rules. I have some radical (but inchoate) technical approaches to this problem in mind, if you want to get in touch via gmail.

> You can't blame a "careless programmer" when the real problem is organizations simply not committing any resources to security. It's like a hospital only employing 1 nurse and blaming her when patients inevitably die.

This isn't a very good example. A shortage of nurses will directly correlate to poor patient care and possible death. But a shortage of security experts? Who knows. I worked at an insurance company that left an access database open to the internet FOR YEARS. We ran analytics when I found it and it was never served from our web server.

So since it didn't get into the public does that mean they were responsible for their security? If the answer is "no" then how would you ever measure these unknowns?

Security is a major problem in tech. It's very difficult, it's nuanced and it's vast. Security covers so, so much that it would be difficult for one or maybe even a handful of security experts to fully cover all aspects of an app, depending on your scope.

Beyond that, though, mistakes happen. People will screw up. Even security people can screw something up. Throwing someone in jail for a screw-up reminds me of the war on drugs; it's not going to stop someone from making a mistake or simply not realizing an unknown unknown.

> Security is a major problem in tech. It's very difficult, it's nuanced and it's vast. Security covers so, so much that it would be difficult for one or maybe even a handful of security experts to fully cover all aspects of an app, depending on your scope.

Which is why it's so important to hold companies that screw it up accountable. That's the only way to get it to change. Forget about everything else, accountability will force new rules for data storage and protection. Without accountability, nothing will change.

> Which is why it's so important to hold companies that screw it up accountable. That's the only way to get it to change. Forget about everything else, accountability will force new rules for data storage and protection. Without accountability, nothing will change.

Sure but everyone on HN suggests accountability but never defines what they mean by it except for the few who think someone should just be thrown in jail.

So, what do you suggest for accountability?

I'm not sure it means jail time. It does need to be substantive. Holding the CEO, CTO and CSO directly accountable is a start. But I think it needs to be company-wide. It needs to be punitive damage to the company and its shareholders. The risk of mishandling PII and similar data needs to outweigh the benefit (that may differ per type of PII).

At that point, my theory is that consolidation around best practices, software, security audits... would become the norm. It would raise the cost of a company taking on the responsibility itself so much that they'd rely on others to reduce the cost through volume. It would probably start looking a lot like PCI and the credit card companies' requirements. The big difference here is that the responsible body wouldn't be an industry group but the government, which would always be political and probably not have enough teeth.

Careless programmers can also mishandle health/credit card/minor information. We have laws protecting all of that data in particular. I'm not sure the expansion of data privacy laws to include all PII is so farfetched.

It's a slippery slope. Analysis of writing style, patterns of use, etc can deanonymize data to the point where basically everything becomes PII.

Treat everything as PII and we are good. The constitution has an amendment to protect our rights. That seems important. I know of no guaranteed right of corporations to infringe on our privacy and to provide access to our data.

Forty years ago we didn't have this issue because there wasn't so much data for them to try to get their grubby greedy hands on. They don't need our data (ANY OF IT)!

The US constitution forbids the US government from acting in certain ways; it in no way constrains private organizations. Tort law and the like is what holds private organizations accountable. That is, the Fourth Amendment does not protect you from a private entity or individual; laws covering trespassing, theft, and breaking and entering do. Please don't drag the constitution into an argument it does not have a place in.

I didn't say the constitution protects me from private assholes. My point was privacy was important to our founders and remains important today. Had our founders known corporations would be a thing and grow as powerful as the government, I would argue they would be included.

We need laws to protect us from them. Much more and better laws. The fact that the constitution does protect us from gov. is a good argument that our gov. should be active in protecting us from corps. Anyway that is my conjecture.

I'd like to argue (not for the first time either) that the Constitution is seriously deficient in its failure to enshrine privacy as a personal right. Great as it has been for the last couple of centuries, I think it's obsolete and should be replaced rather than merely amended.

There's no reason that can't be done as an amendment.

It's not the only thing I'd like to change. Besides which, there is already a movement in progress to bring about another article V convention and I'm guessing the goal of the proponents is drastic rather than minimal alteration. Here's a recent summary article on developments:


The problem is everyone wants different drastic changes.

Obviously. This is going to lead to a bitter conflict in the not-too-distant future, unfortunately.

Let's slide a little more down that slope, then.

Where personal data and privacy is concerned, I'd rather err on the side of caution, than the world we live in now.

Dunno about anybody else, but I'd like to find a convenient canyon.

Literally everyone wants that, the problem is there isn't one, at least that anyone's been able to identify as of yet.

There certainly is one, if you only take into account public opinion. We're dealing with conflicting interests of people who generate data and corporations that collect and traffic data.

OP is clearly suggesting creating a law to avoid this sort of outcome. And while these aren't medical or financial records, they did include: "names, dates of birth, home addresses, phone numbers, and voter registration details, as well as data described as 'modeled' voter ethnicities and religions." Say on average people would pay $5 for this stuff not to be leaked. You're talking about a $1 billion fuckup resulting from a choice that is, as far as I can tell, gross negligence on the part of the programmer.
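For anyone who wants to check the back-of-the-envelope figure above, here is a minimal sketch of the arithmetic. Both inputs are assumptions taken from this thread (roughly 200 million affected people, and a guessed $5 average value per person), not measured values, and the function name is made up for illustration.

```python
# Rough aggregate-cost estimate. Both inputs are assumptions from the thread,
# not measurements: ~200M people affected, $5 average value of non-disclosure.
affected_people = 200_000_000
willingness_to_pay_usd = 5


def aggregate_cost(people: int, per_person_usd: int) -> int:
    """Multiply the number of affected people by the assumed per-person value."""
    return people * per_person_usd


total = aggregate_cost(affected_people, willingness_to_pay_usd)
print(f"${total:,}")
```

Halving or doubling the per-person guess moves the total proportionally, so the order of magnitude is what matters here, not the exact number.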

I dunno about the jail part, but being ultimately responsible for the actions of the people working for you sort of goes with the title.

Like, why is the organization set up so that 1 programmer can make a catastrophic mistake? The CEO is responsible for that.

So if the system is set up so that one programmer can make a catastrophic mistake, then the system is broken.

If the system is set up so that one general can launch a nuclear warhead, then the system is broken.

If the system is set up so that one politician can kill people without a trial, then the system is broken.

If the system is set up so that one nurse can release data on 1 million patients, then the system is broken.

It's not "what happens when a careless programmer does X." but rather "why do we have a system where a careless programmer can do X."

While I generally agree with your point, it's hard to make this into law. Do you think it should be illegal to have a company with only 1 programmer? If not, how do you prevent them from making catastrophic mistakes?

Law generally doesn't prevent catastrophic mistakes, it creates consequences for them which incentivizes those in a position to make them to find ways of preventing them.

As of right now there is not a Federal law describing exactly what is PII data (which would be a prerequisite for this).

There are many (48) different state laws that do define what PII is and how organizations (commercial and governmental) are to handle data breach notifications. If you want to see what a crazy patchwork map of laws this is, check out:


These only come into play if a certain minimum number of state residents have had their data compromised and if that data is of a certain class.

Typical classes are:

- Account info
- Financial info
- Health info
- Health insurance info
- DNA
- SSN
- Biometrics, etc.

And I'm not a lawyer, and we likely don't have all the facts, but at first glance the data released in this breach doesn't meet any of those classifications. It looks pretty much like the data you'd get out of a phone book (name, address, phone number) with a few data points like geocoding and their guess as to your religion and politics.

Which isn't to say that it's great, or that it's not a problem that this was all released, but it is pretty much public data.

> What law did they break, exactly? These aren't medical or financial records.

I know this is USA, but FYI in the EU, all personal data is protected.

A careless programmer making a bad choice should simply not be able to leak 200M personal details.

I am not sure that jail time is really the thing here, but there are institutional problems if this is something that happens.

It's not about the careless programmer IMO. It's that there was nothing in place to make sure the data was secure. Or if there was, it wasn't effective.

The point is that we need laws to govern the management of personally sensitive data like this. Privacy laws in the USA are appallingly weak.

Where does the buck stop then?

Careless programmers don't compile dossiers on over 200M American citizens just for funsies.

You might be interested in visiting :

Not sure I understand your response. I wouldn't describe Facebook's security processes as "careless", nor would I describe their vast and complete data collection as "for funsies"

I think their point is that there are thousands of programmers who handle this data on a daily basis as part of their job - intentionally limiting the scope to "funsies" is ignoring that.

Well, that's the whole game. Either we care and there are consequences, or we don't care and there aren't consequences. We decided we don't care.

> What law did they break, exactly?

That's not how laws work. Laws can be whatever we write them to be. Losing medical and financial records was once not illegal too.

It absolutely is how criminal laws work under the Constitutional prohibition of ex post facto laws; while we can write forward looking criminal laws however we want (within other Constitutional limits), we can't apply those new laws to past conduct.

You can apply them to discussions about hypothetical solutions to current problems, though.

No, that's not how laws work. The law comes first. Then its application to behavior. If they didn't break any existing laws then there's nothing to do but propose a new law.

So, what, it should be illegal to leak your data on public Facebook scraping?

None of the info they had was private info.

If you don't want your info to be leaked then don't make it public.

Not retroactively, that would be a disaster.

And specifically mentioned in the constitution (twice!) as something the government can't do.

I agree. As an example, over a decade ago I was on a team deploying an app to Switzerland. They took privacy really seriously, because they told me that employees get fined/imprisoned for privacy breaches, not just corporate fines.

Voter records are public data. What would they be penalized for?

There's more than voter data here.

The size and scope of the data are larger than most voter record data.

The data appear to include proprietary data from various sources who may not agree with the terms of disclosure here.

Scale matters.

Maybe, but that's a violation of the company's TOS, not the public's right to privacy.

Throwaway account because of involvement in this field.

Even though the RNC is a private organization, it doesn't operate like a normal company. The partnerships that it makes with companies that it awards contracts to are largely relationship-driven, not actually driven by objective analysis of value propositions. Those decisions tend to be made at the COO/CoS (Chief of Staff) level.

If this data has private information on non-republicans then jail them for having it, not for leaking it.

There is no special purpose in these two groups of colluding Americans that grants them special rights to gather data on their non-associates any differently than any other member of the public.

If they are gathering data through any means it's no different than other marketing firms. I think the real debate needs to be about what any companies can share.

But it is not about what they can share, it is about what they or anyone can collect, store and use.

What you can share lets companies store and make decisions based on data they shouldn't have and couldn't share as long as their inputs and outputs look clean.

I.e., Facebook or Google could help you intentionally run a race-biased campaign across all their assets as long as they don't tell you any specifics and include a little noise so you can't be sure of any one user's race. All thanks to what they can collect and use, but refuse to share.

This is a losing battle. The cost of rebuilding this kind of database, even from scratch, is only going to go down.

Why do you think any of this data is private?

They seem like they were doing reddit scraping/Facebook scraping.

Nothing illegal about that.

It's public information. The only difference is it's all in one place.

No special rights are needed - the kind of data that is described in the article may be difficult and/or expensive to get, but it's not illegal to get, not illegal to have, and not illegal to distribute to others. It would be legal for you to do the same - maybe it shouldn't be legal (and it isn't in e.g. the EU), but it currently is in the USA.

If the data had been published willingly by Deep Root Analytics, there would be no crime here at all (according to current USA privacy rules), and no currently valid reason to jail anyone.

Doesn't it have religious pref, ethnicity and much more?

> (e.g. Sony, Sony, Sony, Sony, Sony, Sony, Sony, Sony)


I hate to be in the position of defending a leak such as this. But if what they've done is "merely" compiling data that was available from our public profiles, are they obligated to secure that compilation? I'm asking -- I don't know for sure how the data was gathered, it just sounds like it was from scraping public records + public web sites.

Also, can someone ask Troy Hunt whether he has or can get access to this data so he can let us all know if we're on it? (But will it even matter if they don't have an email address field?)

Voter files being public data depends on which state you're talking about -- you're looking at ~50 different sets of laws and regulations. Some states restrict access to only political parties using it for political uses and prohibit distribution to non-parties/candidates. Some states (Florida, for example) will let anyone just apply to get it.

To your latter question, some states, like California, do include email addresses on their voter file, but the coverage tends to be poor.

You can probably assume that, if you're a registered voter in the United States, you're in this dataset, as is your age, gender, party affiliation if applicable, and race/ethnicity info if your state collects that, as well as modeled information projecting things like party support, race, age, and likelihood to turn out.

May I ask, where have you seen a voter file with email addresses? I haven't seen something like that myself.

https://en.wikipedia.org/wiki/Secret_ballot All 50 states have some form of secret ballot. Only WV has an open ballot.

This is voter registration data, not ballot data.

It's an interesting question, because there is a comparable concept in the intelligence community called aggregation.

Effectively, two pieces of information might not be classified when separate, but if they are linked with a third or combined, they become classified. Add more and it changes the classification further.
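To make the aggregation idea concrete, here is a minimal sketch of how such a rule could be expressed. Everything in it is hypothetical: the field names, the numeric levels, and the rule table are assumptions for illustration, not any real classification scheme or regulation.

```python
# Hypothetical sketch of "classification by aggregation": fields that are low
# sensitivity on their own push the whole record to a higher level once they
# are linked. Field names, levels, and rules are illustrative assumptions only.
from typing import FrozenSet, Iterable, List, Tuple

BASE_LEVEL = 0  # a record whose fields trigger no combination rule

# (combination of fields, level that applies once all of them are linked)
AGGREGATION_RULES: List[Tuple[FrozenSet[str], int]] = [
    (frozenset({"name", "address"}), 1),
    (frozenset({"name", "address", "date_of_birth"}), 2),
    (frozenset({"name", "address", "modeled_religion"}), 3),
]


def sensitivity_level(fields: Iterable[str]) -> int:
    """Return the highest level of any rule fully contained in `fields`."""
    present = set(fields)
    level = BASE_LEVEL
    for combo, combo_level in AGGREGATION_RULES:
        if combo <= present:  # every field named by the rule is in this record
            level = max(level, combo_level)
    return level


print(sensitivity_level(["name"]))
print(sensitivity_level(["name", "address", "date_of_birth"]))
```

The point of the design is that the level is a property of the combination, not of any single column, which is exactly why the specifics would be hard to nail down in law.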

I wonder if there should be something similar for data aggregation companies. Like what we see with HIPAA.

That's not a bad idea, but the specifics would be pretty hard to nail down.

Personally, while I understand that this is nothing illegal, I think it's terribly wrong.

We just handed everyone, including the 5% of society who tend towards sociopathy, a nicely tagged, collated (and yet probably slightly inaccurate) list of minorities.

Hate women? You have a nice list which includes the names, addresses, and telephone numbers of all those women.

Hate Muslims? Boy, do I have just the list for you. Blacks? Republicans? White Muslim men who live in the same neighborhood as you?

Let's omit the sociopaths for just a moment, and let's look at the ad networks. Can you picture how much more accurate a picture those companies have of you now? They no longer have to guess at your age, ethnicity or religion - they now know. What could go wrong when that list of "legally" collated data gets combined with the RNC leak, and is subsequently itself leaked?

So no. It's probably not illegal to compile these lists. It's probably not even illegal that it was released. But it was, for certain, a damned immoral thing to do, and there will be consequences.

I don't really disagree that this is a shitty data breach, but on the consequences, I'll point out that the 5% of society who tend toward sociopathy already have much lower-effort means to target minorities:

Hate women? Look for boobs.

Hate Muslims? Look for brown skin.

Hate Republicans? Look for MAGA hats.

That these are inaccurate signals is irrelevant: haters gonna hate, and I really don't think they care whether the brown-skinned person they're harassing is actually a Muslim, they just want a target for their anger.

As for ad networks - they already have much more accurate models of age, ethnicity, and religion than the RNC has. There's a lot more money involved in targeting ads, and so they've put a lot more effort into it than political consultancies. Worrying about them is like closing the barn door after the horse is out.

> lower-effort means to target minorities

It doesn't get much lower effort than downloading a list, popping it into Excel, and sorting on a column (I'm willing to bet that some hate groups are already doing this and will release very specific lists to their membership). And with phone numbers and a couple of bucks, you don't even have to leave your house to send them hate messages by the thousands.

Haters gonna hate - far too flippant a phrase to describe those who emotionally and physically assault their targets.

As for the ad models - the RNC release has a very specific DOB, location, gender, and phone number. Some of these the ad network could guess at, but this provides concrete data.

It gets a lot lower effort than downloading a list, popping it into Excel, and sorting on a column (really, how many non-techies would do that?). Get on the subway, find girls with brown skin, start screaming obscenities, pull knife when confronted, murder.

Look, I don't want to minimize the impact of hate or harassment on victims. It really is terrible, and should be challenged whenever possible. I do want to inject some realism into the discussion of the likely consequences of this breach. The people who would go out and harm other people because of their ethnicity or religion don't particularly care if they get the ethnicity or religion right. (I'm reminded of a time when I was carrying my wife's purse while she went shopping in a nearby store, I walk by a pickup truck, and the guy inside is loudly muttering "Fucking faggots" over and over again. And then I met my wife in the parking lot, give her a kiss, hand her back her handbag, and he laughs a big "Ha-ha!" of relief and drives off.)

And the people who would do mass harassment over the phone have much easier ways to get this data, like e-mailing their state voter registry and asking for it.

Certainly, it makes it easier to call SWAT on the home of a person in Georgia while being in Alabama; this is much lower effort than shouting hate slogans at some random group of people in the street. This is scary nasty.

Why do you assume sociopaths just want to pick on individuals? There are ambitious sociopaths that draw up plans for things like ethnic cleansing, genocide and so forth. We have abundant examples of such behavior within recent history.

I think your concept of sociopathy qua serial killer is uninformed and stereotypical. Most sociopaths are not stabby weirdos blinded by hatred, they're just self-centred people with different levels of emotional affect/susceptibility than the general population.

It seems not to have occurred to you that (depending on record quality) a database like this would be a great resource with which to find and recruit sociopaths of various stripes.

I don't assume that sociopaths just want to pick on individuals. I assume that the ambitious sociopaths drawing up plans for ethnic cleansing are the people who compiled this database in the first place.

Someone with the resources to perform ethnic cleansing could easily pony up the few thousand dollars that is required to buy this data in the first place:




I see, I misunderstood the point you were making in your earlier post, sorry.

One of the weird effects of the internet is that stuff that used to be considered obvious and self-evident is now treated as very sensitive. For example, no one used to consider their gender or race "private" information; rather, it was nearly impossible to interact with another human without these things being self-evident.

Not many years ago, there used to be a book distributed far and wide with the names, addresses, and phone numbers of virtually everyone who lived in your city. This was accepted as normal and routine, and you had to specifically opt out of being included.

The difference is in the scope. Back when phonebooks were widely used, there was no automated way to correlate those records with other records to create a highly specific list of all Asian women who live in a four block radius of a specific address.

You could walk around and find some of them, but that would take a significant amount of time and effort.

Phones were also not capable of interrupting you while you were out of the home, nor were they capable of receiving short messages on your dime.

Voter registration information is public. It doesn't seem like any of the information "leaked" is actually private.

I missed where it was all public; couldn't you share information with your political party you wouldn't want shared with others? And wouldn't the accumulation of all that data be valuable?

Many states have very loose control over their voter database. The control is mostly exerted through the force of the law prohibiting certain uses of the data. For example, I can order the voter database for the state of New York and receive it on a CD in a few weeks. Other states will simply email you a link to a big bundle of CSV files.

There is other information besides what is on the voter database in this disclosure, it appears, but the voter data itself is mostly not a secret and can be trivially accessed by any citizen. They just have to promise not to break the law around what they do with it.

Most data in there isn't public; voter files are only one part.

Wow, not good:

   "State", "Juriscode", "Jurisname", "CountyFIPS", "MCD", "CNTY", "Town", "Ward", "Precinct", "Ballotbox", "PrecinctName", "NamePrefix", "FirstName", "MiddleName", "LastName", "NameSuffix", "Sex", "BirthYear", "BirthMonth", "BirthDay", "OfficialParty", "StateCalcParty", "RNCCalcParty", "StateVoterID", "JurisdictionVoterID", "LastActiveDate", "RegistrationDate", "VoterStatus", "SelfReportedDemographic", "ModeledEthnicity", "ModeledReligion", "ModeledEthnicGroup", "RegistrationAddr1", "RegistrationAddr2", "RegHouseNum", "RegHouseSfx", "RegStPrefix", "RegStName", "RegStType", "RegstPost", "RegUnitType", "RegUnitNumber", "RegCity", "RegSta", "RegZip5", "RegZip4", "RegLatitude", "RegLongitude", "RegGeocodeLevel", "ChangeOfAddress", "COADate", "COAType", "MailingAddr1", "MailingAddr2", "MailHouseNum", "MailHouseSfx", "MailStPrefix", "MailStName", "MailStType", "MailStPost", "MailUnitType", "MailUnitNumber", "MailCity", "MailSta", "MailZip5", "MailZip4", "MailSortCodeRoute", "MailDeliveryPt", "MailDeliveryPtChkDigit", "MailLineOfTravel", "MailLineOfTravelOrder", "MailDPVStatus", "MADR_LastCleanse", "MADR_LastCOA", "AreaCode", "TelephoneNUm", "TelSourceCode", "TelMatchLevel", "TelReliability", "FTC_DoNotCall"

That does not even nearly fit on one line, so I broke it up. Yeah, it looks pretty bad. Here's hoping it's only sparsely filled out.

As I said in the other thread, this is all public data from various sources (except the modeling). It will be fully filled out. It's very common for political orgs to have the entire voter file for the US (or be able to query it using SQL)

It's the aggregation and easily searchable format for 2/3 of citizens that makes this REALLY dangerous for identity theft. They were still grossly irresponsible. No consequences will happen though.

It's already aggregated and easily searchable.


You could, in the next 10 minutes, go purchase a national voter file with all this info on 190 million plus voters.

Just Google "national voter file purchase".

Those are pre-aggregated, and honestly, really cheap (Usually ~2k).

You can also aggregate it yourself for less if you want.

Are you saying there's a $2k national voter file? Where?

He mentioned this in another thread:


Pardon my ignorance, what is the key data in this leak for identity theft? Eg, I didn't see anything really nasty like SSN, so what makes it easy for identity theft?

(not disagreeing, rather I'm seeking information)

Plus the supposedly "startlingly accurate" preference and views modeling that is linked to all personal details and publicly accessible. While someone could always deny it being correct, it again raises questions about responsibility in data collection.

> RNC_RegID, State, 2012ObamaVoter_DRA_12_16, 2012RomneyVoter_DRA_12_16, 2016ClintonVoter_DRA_12_16, 2016TrumpVoter_DRA_12_16, AmericaFirstForeignPolicy_agree_DRA_12_16, AmericaFirstForeignPolicy_disagree_DRA_12_16, AutoCompaniesShipJobsOverseas_agree_DRA_12_16, AutoCompaniesShipJobsOverseas_disagree_DRA_12_16, CorpReputs_AmericanMakers_DRA_12_16, CorpReputs_DailyLives_DRA_12_16, CorpReputs_Egalitarians_DRA_12_16, CorpReputs_EnviroConscious_DRA_12_16, CorpReputs_OpportunitySeekers_DRA_12_16, CorpReputs_STEMSupporters_DRA_12_16, CorpReputs_SupplyChainers_DRA_12_16, CorpReputs_Unifers_DRA_12_16, DemLeadersStandUpToTrump_DRA_12_16, DemLeadersWorkWithTrump_DRA_12_16, DParty_DRA_12_16, FinancialServicesHarmful_agree_DRA_12_16, FinancialServicesHarmful_disagree_DRA_12_16, FinServicesCompany_Dreamers_DRA_12_16, FinServicesCompany_RiskMitigators_DRA_12_16, FossilFuelsImportantForUSEnergySecurity_DRA_12_16, FossilFuelsNeedToMoveAwayFrom_DRA_12_16, InvestInfrastructure_agree_DRA_12_16, InvestInfrastructure_disagree_DRA_12_16, LowerTaxes_agree_DRA_12_16, LowerTaxes_disagree_DRA_12_16, NonReluctantDJTVoter_DRA_12_16, NonReluctantHRCVoter_DRA_12_16, PharmaCompsDoGreatDamage_agree_DRA_12_16, PharmaCompsDoGreatDamage_disagree_DRA_12_16, ReformGovtRegulations_agree_DRA_12_16, ReformGovtRegulations_disagree_DRA_12_16, ReluctantDJT_Above.5_DRA_12_16, ReluctantHRCVoter_DRA_12_16, RepealObamacare_agree_DRA_12_16, RepealObamacare_disagree_DRA_12_16, RParty_DRA_12_16, StopIllegalImmigration_agree_DRA_12_16, StopIllegalImmigration_disagree_DRA_12_16, TrumpStandUpToDems_DRA_12_16, TrumpWorkWithDems_DRA_12_16, USAFinancialSituation_Optimistic_DRA_12_16, USAFinancialSituation_Pessimistic_DRA_12



How could one tell if a vote is "reluctant" or not, based on the available data?

Lots of ways. Perhaps you know they're a super-loyal Republican voter and you have things that correlate with that, such as willingness to vote when presented with an online poll. You might know they were reluctant because the person shifted between the available alternatives during primary season - perhaps supporting Jeb! Bush, then Marco Rubio, then Ted Cruz, then John Kasich, before finally falling into line when Trump won the nomination. You could then infer that Trump was the absolute last choice but that they would still 'hold their nose' and vote for him in the general election, either due to loyalty to the GOP on particular policy issues or because of some long-standing hatred of Hillary Clinton, or just a history of being very conformist in political matters (as many, many people are).

More often than not, this data is collected in the millions of phone calls and door-to-door visits conducted by volunteers.

I cannot vouch for the accuracy of those, but as a power user of GOPDataCenter (the portal they provide to campaigns and GOP county chairs), I can say the Modeled Ethnicity is pretty terrible.

Can confirm.

“‘Microtargeting is trying to unravel your political DNA,’ [Gage] said. ‘The more information I have about you, the better.’ The more information [Gage] has, the better he can group people into "target clusters" with names such as ‘Flag and Family Republicans’ or ‘Tax and Terrorism Moderates.’ Once a person is defined, finding the right message from the campaign becomes fairly simple.”

Neal Stephenson wrote a book called Interface which predicted a form of tech-enabled micro-targeted politics over 20 years ago. It was disturbing at the time; it's almost considered business-as-usual now.

I believe American democracy would benefit from including the study of such techniques in our educational curriculum. When I was in school, we studied advertising techniques to help us be skeptical. We need the same for targeted political messages now.

I agree, but citizen education should not imho be the only approach here. I'm for much more muscular privacy laws and a slightly narrower tolerance on what's acceptable political speech.

Of course education is great, but look at the vast financial and operational asymmetries between even the most informed individual and well-resourced corporate actors like political parties. I have a super-strong political immune system but being politically engaged and navigating social media is exhausting. For the sake of objectivity I have to systematically expose myself to opinions I find disagreeable lest I retreat into a bubble and be surrounded by confirmation bias, but continuous exposure to countervailing political ideologies is intellectually and morally tiring, given the intense polarization and visceral rhetoric that prevails in today's political discourse.

Despite not liking programming, I've been seriously thinking about building a virtual assistant that I can train to pre-emptively tag people using my peculiar ideological criteria so that I can avoid or at least prepare for certain interactions that I know are going to be psychically difficult. By my value calculus, tuning out of politics is irresponsible at best and suicidal at worst; only communicating with people whose values you share exposes you to confirmation bias, and inevitably to manipulation; observation of and argumentation with antagonists is psychically expensive and potentially dangerous.

So much as I agree with you on education, it's not something we can just put on the to-do list and wait a generation to benefit from. And that would be true even if we had a well-functioning educational sector rather than one that fails a large number of children and adults by leaving them only semi-literate and semi-numerate. People who can't read or reckon well are poorly positioned to identify fallacious political discourse.

I revel in my filter bubble and labor to improve it.

For policy wonks and activists such as myself, discourse, persuasion, and marketing are distractions from the real work of getting things done.

Firstly, because people vote their identity. Period. Almost no one votes on the facts, the issues, the policy, the platforms, whatever. There are no undecideds, no independents. Cite "Democracy for Realists".

Secondly, victory is achieved by mobilizing your supporters. You bring the heat, whoever is sitting in the chair will see the light.

The only distinction is if a voter is willing or unwilling to bother casting a ballot.

This raises so many questions...

Why is U.S. voter registration made public at the individual name/address level?

Why do the states publish their voter registrations in the first place?

Why should private campaign operations (or anyone else) have access to this data?

Shouldn't voters' privacy be protected by the states?

Is there a privacy policy you can review when you register to vote?

So that the two parties who voted for this to be the case can have unfettered access to their potential voters, go to their houses, send them things, and know for sure they're only hitting up people in their party, so as not to mobilize the other side.

Nothing that would harm the 2-party system ever changes in the US, and nothing ever will.

But why does it need to be public?

In the US, you register with the party, no? So the party you register with has your information, and they can do what you said:

> go to their houses, send them things, and know for sure they're only hitting up people in their party, so as not to mobilize the other side.

Nope, in the US you register with the state and you can optionally tell the state that you are a member of some party.

You can register to vote and self-identify as 'Dem' 'Republican' etc while getting your driver's license.

Aside from the public voter file update, the Democratic Party doesn't get any special notification if you pick them, and you don't need to apply or get accepted in any way.

Depends on the state, but in general, no. You register as a member of the party. You register with the state. Voter registration information is used to determine who's eligible to vote, and you don't necessarily need to have a party affiliation to be able to vote (I don't, for example).

My guess is that it's public to avoid unfairly preferencing the major parties, so that they give at least lip service to the idea that "anyone can run for office". The U.S. is pretty sensitive about seeming like it has a fair and open political system, even if other aspects of the system mean that in practice a third party doesn't have a snowball's chance in hell.

Ah! I see, thanks.

Independent review of voter eligibility.

(Eligibility which will vary down to the smallest political divisions)

I don't typically don hats with this much tin foil, and I don't think this is likely, but...

The real danger of data like this, in my opinion, is illegal usage for voter fraud.

Find people who are likely to vote against you and likely to have poor voter registration documents, and remove them from the polls so they can't vote.

Find people who aren't likely to vote at all and vote on their behalf. In-person, the only verification required is name & address. By mail, the only requirement is a signature, which can be obtained from receipts (I assume this is available on black hat markets).

Leaving this S3 bucket as public-read allows for deniable coordination with illegal actors. I can't imagine they did this on purpose but that could be an explanation.

I don't know if it's possible, but I hope the FBI / Mueller team is able to get access logs.

No. The data that would be useful for wide-scale voter fraud is already widely available from public/free sources, including state Secretaries of State or Departments of Elections.

The loss here is all the very expensive extra modeling and demographic work that isn't included on those files. But having that doesn't massively alter the mechanics of the voter fraud effort you're describing.

The expensive modeling (and data collection) makes it much cheaper and more feasible.

I agree that it doesn't change the fundamental mechanics, or enable otherwise impossible attacks.

Please learn how election administration is conducted before continuing to speculate, criticize.

An easy way is to see it first hand, by working as a poll judge or inspector.

Happy hunting.

Every county is different so some of my statements may not apply everywhere (I live in San Francisco).

I've yet to work a poll (next election) but have gotten to know the system here pretty well through the SF elections commission.

Our system is very far from perfect. Many counties do not even audit ballots after every election (let alone use only paper ballots). Epollbook software can be all over the place. Voter verification at polling places is often quite minimal. The penalty for forging a signature on a mail in ballot in CA is only $1000 (I was in the room when the state assembly committee voted not to raise the fine to keep up with inflation).

I don't mean to be alarmist - like I said, I don't think these things took place, at least en masse - but it'd be quite naive to suggest there aren't vulnerabilities.

Nicely done. We can certainly compare notes.

I still encourage you to work or observe poll sites on election day. Soup to nuts. If you work it, you'll get training, see how the Australian Ballot is supposed to work. It requires many hands, eye balls, proper accounting.

I'm not so worried about identity theft for in-person voting. It just doesn't (didn't) seem to happen much on the west coast.

I vigorously opposed closing our poll sites in favor of all-mail postal balloting (WA state). With ballot scanners and electronic adjudication of ballots (changing records in the database per "voter intent"), it's roughly equivalent to electronic voting machines, with some new vulnerabilities added (e.g., tabulating ballots as they arrive, effectively a pre-count).

As various members of the election verification network (EVN) determined, auditing elections is infeasible, impractical, and does little or nothing to increase confidence or certainty.

The gold standard for our form of elections, which I continue to advocate, is the Australian Ballot. In place of auditing, use physical chain of custody. (As you likely know, election administration is not banking, where they have double entry bookkeeping.)


To truly fix our election integrity woes, we need to do two things.

First, replace our first past the post (FPTP) with a more robust voting system. Like approval voting (for executive races) and proportional representation.

Second, adopt universal voter registration, with automatic updates. Were our government to use any one of the number of existing demographic databases (Facebook, Seisint, ChoicePoint, NSA, etc.), then we'd know in near real-time who was eligible to vote. And save huge money doing it.

Certainly agreed re: approval/proportional/similar.

I'm not sure what you mean regarding the EVN. They provide auditing services, don't they? Also recommend auditing paper ballots here:

> Conduct post-election audits before certification of final results

> Without voter-verified paper ballots, effective audits are impossible.

From their top ten list: http://editions.lib.umn.edu/electionacademy/2016/09/08/evns-...

My impression is that SF actually uses a ballot designed for chain of custody accounting, but doesn't use it whatsoever in practice because of the effort involved. I may be wrong on this. But "many hands, eyeballs, proper accounting" is unfortunately not available for our elections in most areas.

Happy to chat more about this - email is in my profile!

Even better than Approval Voting.


The real danger of data like this is concentration camps and death squads. I don't like stating things so dramatically but that's the sort of thing that actually happens, not just in WW2 but also under the USSR, in many dictatorships in the developing world, when countries like the former Yugoslavia experience political collapse, and so on. Just a couple of months ago there was a systematic effort in the Russian federation of Chechnya to round up and incarcerate homosexuals.

Don't make the mistake of thinking that atrocities couldn't possibly happen here just because you're used to thinking of them as something that only happens in other places.

I have this theory that the only way regular people will start caring about privacy breaches such as this one is to use that data against them in a malicious way. Tell the average Joe that the data of all US voters has been leaked, "Hmmm. That's bad." and they move on with their lives as if nothing happened. Instead, if this data is used to impersonate the average Joe on social media or if it's used to trick their mobile carrier into porting out their number, then they'll take notice. (I am not suggesting people do this, it was just part of a thought experiment)

Unless the company involved is sued to bankruptcy and the people involved are prosecuted, sending a strong message to companies dealing with user data, nothing will change. But that's unlikely to happen as this company is backed by the RNC.

While we're on the topic of collecting personal data of people, there's a simple solution: just don't collect it unless it's absolutely necessary. Stop asking me to broadcast my address in my newsletter. Stop asking me to submit my billing address when I make payments online. Stop asking me for my mobile number when I visit a fast food restaurant. Most of the companies that collect this data are not competent enough to keep it secure. The reason companies ask for an address to broadcast in users' newsletters is some anti-spam act which does not prevent the spammers from doing their job. I imagine it's also a requirement for companies to collect a billing address for certain types of online payments. Change the law to remove these poorly thought-out requirements.

More generally, we need regulations on how user data is used by companies. They should not be allowed to store user data indefinitely. If a user closes an account with a company, retain the data for a short period (e.g., 1 year) and then delete the data automatically. Companies should not be allowed to build shadow profiles of users.

>I have this theory that the only way regular people will start caring about privacy breaches such as this one is to use that data against them in a malicious way

The other thought experiment is to adopt the opposite point of view, that privacy is overrated. You can adopt various worldviews from 'everything is grey' to 'this is good, that is bad', but punishing someone into caring about the same things you care about is a pretty terrible approach. If you can't convince someone, it doesn't mean they're stupid; it could also mean you just aren't that good at communicating, or that perhaps what you think is important isn't all that important.

>Unless the company involved is sued to bankruptcy and the people involved are prosecuted, sending a strong message to companies dealing with user data, nothing will change.

Or we can give them tools that make it easier to secure data. I've always found that if you make it easy for someone, they almost always end up doing the right thing. As it stands the security products/services domain is a complicated maze where you have to be an expert to evaluate how various products work internally and which services, if any are worth purchasing.

There's a pretty compelling argument though, that people aren't good at making long-term assessments of diffuse, but potent, risks, and/or are willing to, or can be coerced into, arbitraging long-term interests with short-term exigency.

Gresham's law, availability heuristics, optimism bias, distribution of cognitive skills, various aspects of game theory, and more, strongly suggest this.

Examples: global warming, pollution risks, resource depletion, moral and morale hazard, just off the top of my head.

Yeah, and all those biases apply to people on HN too - you (the proverbial you) aren't good at making assessments either. Also, flipping it for argument's sake, how come software companies are never penalized for introducing software bugs? How would you like it if an accountant wanted to send you to jail because you introduced a security bug and their critical data got wiped out and they went out of business? Or should we go back to blaming them for not having backups because we have an a priori assumption that "shit happens" when it comes to software? Well, the other side could say 'shit happens' too.

Those are actually questions I'm actively exploring.

Some recent discussion (from myself and others) on this G+ thread:


I've discussed Gresham's Law dynamics numerous times at my subreddit/blog. See particularly:


I've been meaning to write up a bit expanding market price dynamics beyond the set of goods that Adam Smith defined: labour, capital, commodities, rents, and (indirectly) interest.

In particular, the question of risk pricing, which is treated almost wholly as a financial question rather than an economic one.

The question of pricing under duress is a key one -- the Backward-'S' bending supply curve is a curious economic anomaly:


Also the behaviour of natural resource stocks under supplier pressure -- the price will fall to the lowest levels possible, and supplied volume will increase, if possible, for a number of highly perverse reasons. The collapse of oil prices following the East Texas oilfield discovery, from ~$1/bbl to first $0.13/bbl, then $0.02/bbl, before wellhead production was seized at force of arms by the Oklahoma and Texas national guard, and Texas rangers, comes to mind.


I'm sure some SV companies will be more than happy to comb over this leak to fire employees for wrongthink.

I think people are missing the big legal liability .... this information has been published to the world, it contains estimates of people's deeply held political beliefs - some of it will be wildly wrong and those people might consider that they have been libeled .... roll on the lawsuits

Libel, in the US, requires the publisher either to know the claim is false or to publish it with reckless disregard for the truth when it is, in fact, false; I don't think that there is any way that this could be construed to meet that, no matter what legal theory you use as to who the liable publisher is.

The liability wouldn't be with the RNC or Deep Root; the liability would be on the entity that actually published the information. That's if you consider this to meet the legal definition of publishing (the legal definition of libel, as I understand it, is "to publish in print (including pictures), writing or broadcast through radio, television or film...").

I still don't understand the concern.

Want the name, age, gender, home address, mailing address, party of registration, and voter history from every registered voter in North Carolina? Here is the "leak" on Amazon S3. http://dl.ncsbe.gov/index.html?prefix=data/

Except, by leak, I mean, link I got from my state board of elections' homepage.

Related: This legally mandated "leak" happened years ago, and it even included signatures. Most of the citizens pushing for the recall of a sitting Republican governor were Democrats, so it seemed like punishment for a Republican state assembly to pass this one-off policy. Especially for those who have zero internet presence. The first thing many employers see when they search for politically active Democrats is this information, paired with "quick searches" of their names on criminal, pedophile and dangerous persons databases. If you want to disincentivize political action, this is how you do it. I won't link to that site, but can confirm it's still up.


Are the leaked files floating around on the internet, or were they able to shut it down before anyone else got to it?

I've seen no evidence that anyone but the security researcher found this, but maybe that'll change.

Except this isn't a leak. This is publicly searchable information. I don't know why people are blowing this out of proportion.

Because technologists don't know everything about laws and society, even though they like to think they do. You might add that not only is it searchable, but the Freedom of Information Act and its derivative implementations in state laws mandate that it be made accessible in this way for no cost other than the expense in compiling the records.

People getting angry when "government transparency" is supposedly such a good thing no one questions? Go figure...

Because they feel it's wrong, legality aside.

> It would ultimately take days, from June 12th to June 14th, for Vickery to download 1.1 TB of publicly accessible files

Do security firms have special permission to do this? Because as a private citizen, I am pretty sure I would go to jail if I tried this.

IANAL, but I'm pretty sure if someone leaves a Top-Secret document on the ground, and you pick it up, you don't go to jail for that. The data is accessible to the public -- this isn't a hack, this is just downloading.

The only people who can go to jail for mishandling classified information are people with security clearances.

Didn't weev end up on the wrong side of this relatively blurry line?

They don't have special permission, but each case is treated individually: what exactly you did technically matters just a little; the intent (not the claimed intent, but the intent that the judge/jury would infer), the circumstances, and what you do afterwards with the data matter much more.

I.e., if a respected company downloads the data, reviews what horrible things it has and reports it to the proper authorities (and gets legal advice beforehand on how best to do it), then they're very likely to be treated as not having done anything bad;

If I'd do the same, contact them asking to fix the vulnerability "or else", and then download the data and publish an angry video rant on youtube, that might land me in trouble, as (expected) intent matters a lot for prosecuting crimes.

IANAL, but not as far as I'm aware.

You could make the argument that since the information was not protected in any way that you were allowed to download it, but try explaining that to a 65 year old judge who doesn't even comprehend the basic structure of the internet.

Great to merge the discussion, but IMO the Upguard article has the original/superior details.

Agreed -- the Upguard article is a better topic link for this.

If you're a glass half full person at least they care about the do not call list enough to give it its own column.

They take the DNC list and subsequent enforcement very seriously. https://www.ftc.gov/news-events/media-resources/do-not-call-...

source: work at place with large call center. Avoiding DNC fines is one of our top priorities.

secondary source: S.O. works at place with call center for a large bank. Same thing.

Also anecdotally: If you file a complaint with the FTC for an unknown number that keeps calling you back without them giving you a chance to "opt out" (this is most scammer numbers), you file a ticket with the FTC, and they usually respond to the ticket within 2-3 days. (Another funny thing -- they use Zendesk.). I stopped receiving the calls since filing the report.

As to why they take it so seriously? My guess is it's easy money for them. Kind of like traffic tickets for cops.

Political calls are exempt from the DNC list:


The parties take it seriously because they don't want to lose a vote by pissing someone off.

Is this available online in a searchable way? I want to see what it has on me...

What are the legal implications of a campaign using this leaked modelling data? Would it be breaking any existing laws to use data that was leaked to the public without the permission of the original company that did the modelling? (If nothing else, I suppose it's probably a copyright violation.)

Hypothetically, could one deliberately leak a trove of modelling data with some fake voters inserted, and then monitor the mailbox associated with that fake voter and sue any organization you don't like that sends campaign flyers for using the data without permission?

TLDR. Was any of the information non-public?

The results of data science models are contained in the data dump, and those are non-public. The rest of the information is accessible via public records or registries maintained by states.

> Spreadsheets containing this accumulated data—last updated around the January 2017 presidential inauguration—constitute a treasure trove of political data and modeled preferences used by the Trump campaign

Genuinely curious: can you really have 198 million rows in a spreadsheet?

Excel limits each "sheet" to 1,048,576 rows, but there's no limit to how many sheets you can have in a single file. https://support.office.com/en-us/article/excel-specification...

I used to work in this field (political consulting, the data around it, nothing terribly interesting on the tech front at the time though). It was most likely broken down by Congressional district.

It’s provably a CSV not meant to be used by Excel.


>Each file, formatted as a comma separated value (.csv), lists an internal, 32-character alphanumeric “RNC ID”—such as, for example, 530C2598-6EF4-4A56-9A7X-2FCA466FX2E2—used to uniquely identify every potential voter in the database.
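The quoted ID has the 8-4-4-4-12 grouping of a UUID, but the example contains "X", which is not a hex digit, so it would not parse as a standard UUID (possibly the article altered the example before publishing; that is a guess on my part). A quick sketch of both checks, where the pattern is my assumption from the single quoted example and not a documented format:

```python
import re
import uuid

# Assumption: IDs follow the 8-4-4-4-12 alphanumeric grouping of the quoted example.
RNC_ID_PATTERN = re.compile(
    r"[0-9A-Za-z]{8}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{4}-[0-9A-Za-z]{12}"
)

def looks_like_rnc_id(value):
    """True if value matches the assumed 32-character, hyphen-grouped shape."""
    return RNC_ID_PATTERN.fullmatch(value) is not None

def is_valid_uuid(value):
    """True if value parses as a standard (hexadecimal) UUID."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False

example = "530C2598-6EF4-4A56-9A7X-2FCA466FX2E2"  # example quoted in the article
alnum_count = len(example.replace("-", ""))

print(alnum_count)
print(looks_like_rnc_id(example))
print(is_valid_uuid(example))
```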

It's plural, spreadsheets. It could be thousands of separate files.

Good catch. According to Microsoft[0], the max number of rows is 1,048,576. So I guess there must have been a couple hundred!

[0] https://support.office.com/en-us/article/Excel-specification...
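As a rough check of the "couple hundred" estimate, assuming the 198 million figure from the question above and one header row per sheet (the header row is my assumption):

```python
import math

EXCEL_MAX_ROWS = 1_048_576   # per-sheet row limit cited above
TOTAL_RECORDS = 198_000_000  # approximate record count from the question above

def sheets_needed(records, rows_per_sheet=EXCEL_MAX_ROWS, header_rows=1):
    """Minimum number of sheets needed to hold `records` data rows."""
    usable = rows_per_sheet - header_rows
    return math.ceil(records / usable)

sheets = sheets_needed(TOTAL_RECORDS)
print(sheets)
```

That works out to 189 full sheets, so "a couple hundred" is about right if someone really did force this into Excel.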

I'm just waiting to hear Trump tell Deep Root Analytics, "You're Fired!"

It would be good to see him make this a clear case of responsibility. Also, someone on the RNC side needs to get fired, too. I'm not sure who, but errors this big demand it.

Trump only has "signaling authority" over the entities with whom the RNC contracts; he can't actually force a contract termination, but he can express distrust in an entity and encourage the RNC to terminate a contract. I don't see that happening, for a myriad of reasons.

The people involved with the decision to start working with Deep Root are mostly not with the RNC anymore. Even if they were, that's simply not how the industry works.

Is there any way to access the Deep Root data to see what it says about me? Has anybody posted it online in a searchable format?

Doesn't this kinda make many (flawed but still in use) "security measures" incredibly vulnerable to social engineering? That's all the birth dates and phone numbers. That's crazy!
