Speaking as a guy with a lot of experience with voter data (I built the first "where do I vote" apps for Google and helped found the Voting Information Project):
This is actually almost entirely public data. Yes, including addresses and phone numbers and political affiliation. There are some states where that is not public as part of the voter file, but you can still get it publicly in other ways, for example from USPS. Some states/players would make you sign agreements not to use it for commercial purposes.
The modeling info included is not public.
Acquiring 50-state data can be a bit of a pain, but there are at least two major players that will sell it to you.
(I remember one of them literally laughed when I told them we would want the databases without any personal info included, because we just wanted the address to various political precinct mapping.)
That it's used to confirm identity shows how weak identity-theft protections are at most institutions, not what's public information. (For that matter, mother's maiden name is basically public information as well: you can get it from genealogical records.)
The link was preceded by the words "from any of several dozen commercial providers that resell government data:" so that's how I took it. It would be at a .gov domain if it were state affiliated.
Also, "reselling government data" implies that the government explicitly gave permission to a business selling your data. I doubt that's true. More likely, these entities gathered data from whatever entities they could, probably various private companies from which you've made online purchases.
That, and, I've never consented to release my birthday publicly. I consider it to be private, therefore it is private. It's mine! I expect the government would not release it since they use it as part of an identifier all the time.
There's no perfect identifier. That's why we need security. This is a major lapse.
Pretty sneaky of the comment above yours to try to pass dmv.org as a government website.
I think you are confused about what "private" means.
I've never "consented" to "releasing" my address publicly, but real estate records are public information, so the real estate I own is easily queried from my town's website, and the real estate transaction was published in all the local newspapers. You can even look up my property tax bills on the town's website, along with my payment history for said taxes. If you're feeling generous, you can pay my property taxes online too.
Is the list of doctors you visit private? I could probably discover such information, but it is considered private.
Your address is usually public unless you go to lengths to obfuscate it. You can go to Hollywood and get a map of where all the movie stars live. And, the white pages list your phone number unless you opt out.
Birthday has never been publicly available until this voter data leak. Not good.
> I've never consented to release my birthday publicly.
Too bad. The government doesn't care about you consenting to things. In fact, making you do things without your consent is literally the entire point of organized government, even if we usually overlook that because it's for a good purpose (for example, taxation to pay for health care or national defense).
You're off topic. The government hasn't released my birthday. I question their data security practices, but any release of such information would be considered a mistake.
I don't care if you think paying for health care or national security is important or not. That's an unrelated issue to whether or not Birthday is private or public data.
How do you know if your birthday isn't in some publicly accessible government database somewhere? Did you try to find it? Did you hire a private investigator (who has quick access to all those databases) and ask them if they can find your birthday?
If it were, I would let the government know that's not okay. Hackers would be one step closer to being able to sign up for a credit card under my name.
If it's true that that information is out in the wild now, then I expect government to tighten their tech security procedures.
"I would let the government know that's not okay" - lol, yeah good luck with that.
How do you think the RNC got birthdays for 200 million Americans in the first place? Public records. 200 million Americans are NOT affiliated with the RNC.
> How do you think the RNC got birthdays for 200 million Americans in the first place?
As I replied elsewhere, there is a market for reselling your information on the internet. In some cases that is legal, and in many it's probably not. As a tech person you should know this.
Yes, the only reason that anyone cares about their date of birth is that in recent times it has been something that can be used in identity theft. Back in the pre-internet days, nobody cared. People had their SSNs pre-printed on their personal checks too.
The solution to all of this data privacy hysteria is to change our approach. The possession of common facts about a person should not be sufficient to masquerade as that person.
> The solution to all of this data privacy hysteria is to change our approach. The possession of common facts about a person should not be sufficient to masquerade as that person
I think the solution is better security. We'll only ever have basic facts to uniquely identify people. Biometrics can be hacked/copied too.
I think the problem could be mostly solved by requiring people who are getting credit to apply somewhere in person. Most credit card banks have branches everywhere (USA). The person has to have their face recorded and stored along with the credit line. This would have many benefits: legitimate people would think more about how serious getting a credit card is, fraudsters would be less likely to try to get fake credit, and it would be much clearer when the bank gave credit to the wrong person and has to eat the loss. Unlike in the past, taking and storing pictures is now a trivial task. This is not likely to happen, as the credit companies have little liability under the current system.
>The Department of Motor Vehicles (DMV) maintains information on approximately 32 million vehicles/vessels registration (VR), 27 million driver licenses (DL) and/or identification (ID) cards, and over 437,000 occupational licensing (OL) records.
>Confidential information is not considered public record. This includes certain DMV personnel matters, physical/mental information, residence address, social security number (SSN), incomplete findings from research, results of ongoing investigations, operation plans, and electronic data security controls.
DOB is listed on ID and not considered confidential information.
Birthday is not public record, and there is no evidence that it is. That's the point. There's no government entity that will give you a list of people's names and dates of birth.
You cited California DMV which does not give up that information. They won't even give you someone's address unless you're legally entitled, like the police.
Nobody has restricted this conversation to the legal sense.
Is your daily schedule (when you go to work, when you drop off your kids, what route you take to work) protected legally? No, but you probably wouldn't want to share that information publicly either. Yet, in the future, a data breach could reveal such information, and a business could seek to resell it.
If you don't want it released publicly, your best chance would have been to ask your parents not to do it when you were born. No offense, but many places, like small towns, will list the birth of a child in the local newspaper if the parents sign a form (among a pile of forms) in the hospital after the birth.
It may be discoverable in that way, but that doesn't mean government entities are or should be disclosing names with birthdays en masse. Certainly not the entire country's all at once, which is what the RNC appears to have done.
> Birthday is an included item. That's definitely private as it is often used to confirm identity.
Lots of things that are actually not private data are used to confirm identity; “it is used to confirm identity” is not a disproof that a piece of information is public information.
That would disclose way more information than just my name and age. You would then be able to associate that information with my HN username and all the data associated with it.
idbehold can't post his or her name and birthday without revealing he or she is the owner of the idbehold Hacker News account. Which is more information than name+birthday, it's name+birthday+Hacker News posting history.
> idbehold can't post his or her name and birthday..
Not sure why you're replying for someone else..
> This is a random guy. His name is ERNEST BELL JR and his birthdate is 10/05/1959.
Inmates obviously lose some rights when convicted. Data security is minor compared to losing the freedom of movement.
Also doesn't surprise me that Florida would make all information on its inmates public.
No kidding, but I don't want that information out there, and I imagine most people don't.
Just because some skeezy website shares all my personal information does not make it public information. It's private to me and I'd rather it remain undistributed, where possible.
Overall point being, this US voter data leak is bad. That appears to have released information on all of us that we consider private.
So do you consider your daily schedule, gps locations of where you go, to be public information?
Nothing about the definition of public or private restricts either to being protected by the government. I can consider some information private without it being protected in the legal sense.
That said, I expect that any government entity that released my birthday would be held responsible for a data leak.
What? It isn't public because it isn't public. Not because I don't want it to be public.
If in the future it is no longer private, then I won't consider it private. I may not be happy about it, but that doesn't change the fact that it is public at that point.
If a criminal releases 100,000 credit card numbers, we don't all of a sudden consider credit card numbers to be public. We try to limit the spread of the leak, if possible, and shore up whatever security lapse occurred. Nowhere in that scenario do we begin to consider credit card numbers public.
So it is with birthdays. There isn't any government organization that will distribute that information.
You most certainly do consider stolen credit cards public! You don't just limit the information and hope for the best—you attempt to inform everyone affected of the breach and try to get everyone to change their card number (because it's public at this point).
I think we're using public/private in different ways. You're referring to the data's present and future classification. I'm pointing to the data's past classification because that determines whether the collection of data was theft or not.
If all credit card numbers were stolen and distributed, we would still have considered them private before that.
The same is true for these birthdays. They were private, then stolen and made public. That they are now public doesn't retroactively make the theft okay. Theft is theft.
The lesson here is to tighten security, increase security education and awareness, and increase investigation into the most egregious of these crimes so that violators can be brought to justice.
So taking that modeled data, loading it into Cambridge Analytica (which I understand is somewhat of a DMP in this sense), and leveraging it with highly-customized creative targeted against Custom Audience uploads using this modeled data would be insanely valuable to a political player with the capital to deploy this info weapon.
It's probably good that a lot of this is public, but one thing I think we're going to reckon with is how "available" all of this data should be, and how much it costs in terms of time or money to retrieve it.
Should I be able to find the public records for any citizen, anywhere in the country, at the snap of a finger? Or should I have to go to e.g. the courthouse and ask for the records in person, deal with a phone system, etc.?
Adding this kind of friction only discourages people with good intentions. It doesn't stop people with not so noble intentions (who are most likely backed with a lot of money)
As for "For free", states are generally required by law to give it to you if you ask. Some charge fees.
Only two have crazy fees (5k and 30k) (though if you challenge them, ...etc)
FYI, be careful what you do with it. Some of this sort of data--like federal contribution data from the FEC--cannot be legally used for commercial purposes[0]. They do actually enforce it, at least occasionally.
As long as the CEO of a company (RNC) that gives data to an outsourcer (Deep Root Analytics) is not going to jail for giving data to an unqualified company, nothing will change.
If the CEO goes to jail, things will change very rapidly (the CEO will manage his CMO much more tightly, and the CMO will first want to see a security audit not older than 6 months).
At least the CEOs I have reported to as CTO were very sensitive to implementation issues in areas that could land them in jail.
Same for every other hack (e.g. Sony) or IT failure (e.g. the British Airways data center crash).
> A careless programmer makes a bad choice and the CEO has to go to jail? Come on
An institutional failure of review, testing and security that will lead to tens of billions of dollars of identity theft goes unpunished completely?
Come on.
A CEO is responsible for his organization. If you ruin lives, you have to pay the price.
Can't handle the heat?
Don't take the job.
I hate how CEOs get hundred-million-dollar parachutes because the risk and danger and difficulty of such a position supposedly warrant such extravagant pay.
But then, when we ask them to be responsible, to bear responsibility for the organization which paid them a hundred million dollars to be responsible, we say "come on"?
Utterly ridiculous.
CEOs bear responsibility for their organizations, or the organization should not exist. There must be responsibility for private organizations, lest the concept of private organization be nothing more than a cheap trick to remove criminal and civil liability from wrongdoing.
I'm starting to kind of hope there is just a giant "leak" of every single US citizen's basic public information including SSN, etc. Get it all out there so we can stop having the debate over what is private information vs. public.
Then we can stop having this conversation constantly. None of this information is secret, I would put SSN into "quasi-secret" land since it takes such minimal effort to get at it.
We've been relying on security through obscurity for far too long. If the only thing stopping mass identity theft is someone compiling a list of otherwise public information, it's far beyond time we re-evaluate where the true problem lies.
So yeah, I agree. At some point society is going to actually have to confront this in a useful manner vs. hysterics and patching over an obviously failed system.
Don't want your freedoms threatened? Avoid attention by never exercising them!
I reject the argument that those who aggregate vast troves of data about people, publicly available or voluntarily shared though they may be, are exempt from any sort of responsibility for the curation and deployment of said data. Informational asymmetries lead to power imbalances, and sufficiently severe power imbalances lead to oppression.
Jail is a bit much, but I do think corporations should be financially accountable for the clean-up of privacy spills. (E.g., similar to environmental disasters.)
If corporations face the prospect of a big bill, and the cost of that bill far exceeds the cost of keeping user data safe, a lot of the right things will start to happen.
But what law specifically was broken? Should we have a law that punishes the CEO for data breaches? Is a CEO responsible if his experts recommended the practice? Is the CEO responsible if their staff went around and did this without consent? That seems ripe for abuse. Don't like your CEO? Leak some data and have him go to jail.
I think data that has to do with voting records, or suspected voting records, would be very reasonable to be under the purview of being treated as sensitive data that, if breached, should have consequences to a company.
If these are voting records, which are public, it may well be that they haven't done anything prohibited even if they intentionally distributed all this data to everyone.
As in, the company didn't want to distribute this data, so it's a breach, and the person who did that would be guilty of stealing the company's confidential information (i.e. the modelling info). But it seems quite likely that purely (re-)distributing the core data of people's names and addresses doesn't actually violate any US laws at all; US privacy laws (outside of medical data) are very lax compared to, e.g., the EU's.
I could imagine that victims of a future identity theft might have a civil claim against the company if/when real losses have occurred, but it's quite possible that if the CEO personally published all this data, filmed all of this, and sent it to the prosecutor's office, no crime (according to current USA privacy laws) could be found there.
How about making positive proposals of your own instead of negating everyone else's? Clearly many people find the existing rules and practices inadequate and propose heavy burdens of responsibility commensurate with the substantial incentives and rewards that accrue to success in business.
CEOs are not an oppressed class groaning under the burden of social structures that keep them locked up in the C-suite. Even if they are confronted with draconian penalties for naive misadventure, most CEOs of medium and large firms can afford A+ legal representation. If you're more worried about them than you are about the potential first and second-order effects upon tens or (in this case) hundreds of millions of people, then you are essentially choosing to be a pawn of the powerful.
Of course they're reasonable. But problematic large scale data breaches are not a new problem. The last financial crisis was almost a decade ago, and yet we haven't developed a new culture of organizational responsibility since, despite the massive societal costs.
Not to make overly sweeping generalizations, but 'hold on, let's think through all the ramifications here instead of being too hasty' is a great way to maintain the status quo while avoiding any responsibility for it. Who benefits? It sure ain't the general public.
I don't think it's so simple, but it's clear that most businesses take a reactive rather than a proactive approach to security and many other important considerations. Guillotining a few corporations is likely to have a salutary effect upon the others.
To some extent this is a cultural divide; Anglo-Saxon capitalism has an unspoken ethic of 'forge ahead, cross bridges when you come to them' while continental European capitalism is far more accommodating of social considerations and has a 'first do no harm' approach. There are upsides and downsides to both approaches - and of course these are very shallow and incomplete characterizations of complex economic and cultural factors, which I have no intention of trying to defend if someone complains about them.
"A CEO is responsible for his organization. If you ruin lives, you have to pay the price."
If you break the law, you pay the price of jailtime. If you haven't broken the law, you might pay the price in the marketplace, but that's all.
If a law was broken, then of course whoever broke it should be prosecuted. But I don't think anyone disagrees with that, nor does it need to be explained in a lengthy HN comment. The only reason we need such an explanation is exactly because this isn't the way our society works. We set up the rules of the game and expect people to play within those rules, but we don't go around jailing people because we dislike them or disagree with their choices.
"CEO's bear responsibility for their organizations, or the organization should not exist."
Don't forget that a CEO is an employee, and just one employee. A particularly important and influential (and well-paid) one, but "just" an employee. An organization is not solely defined by its CEO, nor does it make sense to think of them as all-powerful in terms of what the organization does. A CEO who doesn't perform can (and probably will!) be fired at some point.
> An institutional failure of review, testing and security that will lead to tens of billions of dollars of identity theft goes unpunished completely?
I'm going to agree with you in wanting to see someone punished for this, but I'm not sure I'm on the side of jail time in the absence of malicious intent.
You can't blame a "careless programmer" when the real problem is organizations simply not committing any resources to security. It's like a hospital only employing 1 nurse and blaming her when patients inevitably die. The organization has a clear responsibility to employ a sufficient number of experts to protect their data. DRA is not unique here but does demonstrate a pattern of companies playing fast and loose with sensitive data with minimal repercussions. We'll keep seeing things like this until our laws are such that stewards of data like these have some sort of incentive to protect them.
>You can't blame a "careless programmer" when the real problem is organizations simply not committing any resources to security.
How have you established that they didn't have a sufficient number of experts? What if they purchased a product or service and it simply didn't work? It's rather harsh to point fingers without having all of the information.
>We'll keep seeing things like this until our laws are such that stewards of data like these have some sort of incentive to protect them.
I think we need to give companies appliance-like products with a simple set of instructions that anyone can follow. Even a simple change where the data is stored in a 'vault' that requires the use of special tools with built-in access controls and auditing would prevent a lot of data breaches. This means you can't email files around or share them on Google Docs or whatever. I'm convinced that people will do the right thing if you make it easy enough for them.
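To make the 'vault' idea a bit more concrete, here's a rough Python sketch. It's only an illustration under my own assumptions: the `Vault` class, its field names, and the sample record are all made up, and a real product would need persistence, authentication, and tamper-proof logs. The point is just that every read goes through one function that checks access and writes an audit entry.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Vault:
    """Hypothetical in-memory 'vault': records are only reachable via read()."""
    records: dict        # record_id -> data
    allowed_users: set   # users permitted to read
    audit_log: list = field(default_factory=list)

    def read(self, user: str, record_id: str):
        allowed = user in self.allowed_users
        # Every attempt is logged, whether or not it succeeds.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record_id": record_id,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not read {record_id}")
        return self.records[record_id]


# Made-up sample data for illustration only.
vault = Vault(records={"voter-1": {"precinct": "12A"}}, allowed_users={"alice"})
print(vault.read("alice", "voter-1"))
try:
    vault.read("mallory", "voter-1")
except PermissionError as err:
    print("denied:", err)
print(len(vault.audit_log), "audit entries")
```

The denied attempt still ends up in the audit log, which is the part that tells you after the fact who tried to get at what.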
I'm really impressed at the lengths people will go to defend those whose malfeasance or ineptitude unarguably worsen the lives of up to two hundred million people. You seem like a smart person, please tell me how you think it's OK that these folks' contact details and potentially very detailed psychological and political profile information are now likely available on the dark web?
Given that this data was collected for explicitly political purposes with the specific goal of shaping voting behavior - one of the few things in American life where privacy is considered sacrosanct - surely you don't need me to point out the potential for manipulation, exploitation, and intimidation that become available to bad actors in possession of this data.
Are you familiar with the concept of 'strict liability'? Do you have any policy reason why such a standard shouldn't apply in cases like this?
Well, for one, you can legally obtain much more detailed information on a person from any background check application currently available. So it's arguable whether this has really increased the likelihood of a future crime.
"In criminal law, strict liability is liability for which mens rea (Latin for "guilty mind") does not have to be proven in relation to one or more elements comprising the actus reus (Latin for "guilty act") although intention, recklessness or knowledge may be required in relation to other elements of the offense."
Proving recklessness is harder than you think.
If data is leaked because someone within a company with the appropriate level of access decides to sell out to the dark web, all the security in the world won't protect you. Should the CEO go to jail because an employee turned on the company?
Heartbleed - you could have had 100 security professionals on your team, and you still would have been vulnerable. Should every CEO on the planet go to jail?
Security people do make mistakes and leave keys in places they shouldn't, genuinely by accident. Who is going to jail for this error? If you think you are sending the security personnel to jail, well, we're going to have an exodus of people willing to call themselves security personnel, because no one is paid enough to risk jail for a job.
So, it's not a matter of defending ineptitude, it's a matter of recognizing that the problem is complex, and unless you have clear boundaries of what is punishable and what is not, you're going to have a bad time enforcing anything that makes a difference. As a security person, I'm sure you know a policy without adequate enforcement is absolutely useless.
The hardest part of data processing is collecting it. If you can just grab already collated data for cheap, then it absolutely is more likely to be exploited.
That's all true, but continually talking about what a hard problem it is as a way to avoid settling on some harsh penalties (financial or custodial) for negligence of various types only perpetuates the problem. Courts can decide whether a harsh punishment should actually apply in any given case. But right now, no such punishments are even defined so there are no strong incentives to minimize negligence and restrict collection and distribution of that information.
tl;dr penal incentives function like a sword of Damocles. As long as we're debating whether and what size of sword to hang from a thread, Damocles has no reason to worry.
> I'm really impressed at the lengths people will go to defend those whose malfeasance or ineptitude unarguably worsen the lives of up to two hundred million people. You seem like a smart person, please tell me how you think it's OK that these folks' contact details and potentially very detailed psychological and political profile information are now likely available on the dark web?
So far I've seen people disagree with simply throwing them in jail and people who want more data (was it a screw-up? even security people make mistakes. Did they not have appropriate resources? etc).
I haven't seen anyone say this is OK in any way, shape or form. I see many reasonable discussions.
We've been having reasonable discussions on these topics for many years. Discussions that never lead to decisions and actions are a sideshow.
I'm calling for action - strong user-centric privacy protections with strict liability and significant personal and organization penalties for negligence, similar to the French model.
Criminalizing something doesn't mean you've "done something about it". It just means you've applied a traditionally flawed approach to fix a systemic problem that could be better handled through education, training and awareness. However, those things are much harder to do than to simply write something into the legal code without the public being any more prepared to deal with the situation.
> a systemic problem that could be better handled through education, training and awareness
How's that been working out for you? This isn't a new problem. Where are those educational, training, and social awareness resources? What budgets have been allocated to them? What mechanisms put in place to monitor the effectiveness of the deployment? How many more years of theoretical discussions about ideal solutions should we have before acting, notwithstanding the possibility of error? If your cautious incrementalist approach is so great (and heaven knows I've spent many years thinking and advocating within that framework) why does the problem keep getting worse? How long and to what extent are you willing to wait for this informed public to manifest and (somehow) overcome all the countervailing forces that have economic and political interests in quite different outcomes?
And why, I ask myself, did you respond to my positive proposal about "strong user-centric privacy protections with strict liability and significant personal and organization penalties for negligence" by ignoring it and instead knocking down a straw man of 'criminalization' that I took care to avoid?
You don't want to be the person responsible for taking or advocating for a decision that might work out poorly, fine. But reiterating the reasons for your hesitancy achieves nothing.
>Where are those educational, training, and social awareness resources? What budgets have been allocated to them? What mechanisms put in place to monitor the effectiveness of the deployment?
Just because there is no information about that in the article, doesn't mean they weren't in place.
> If your cautious incrementalist approach is so great (and heaven knows I've spent many years thinking and advocating within that framework) why does the problem keep getting worse?
How do we know it's getting worse? I work with a LOT of non-technical people, and they are very good at detecting spam emails, not clicking on the fake bluescreen popups, etc., using just their intuition and general awareness. They DO pay attention whenever articles about viruses and hacking and whatnot hit the front page.
>You don't want to be the person responsible for taking or advocating for a decision that might work out poorly, fine. But reiterating the reasons for your hesitancy achieves nothing.
Would you consider flipping it then? Let's also put software developers who introduce security bugs in jail. Oh, but software is so so complex!! A million different pieces working together, and I didn't write all that other code, so how could __I__ possibly be held responsible?! Well, people to people interactions are complex too, and putting a process in place where every person is supposed to follow a protocol is hard too.
There are times when I disagree with you, but then there are times like this when I can't agree more.
Now, with that said, I find this position slightly at odds with the position you appeared to hold on the privacy of citizens when the Snowden leaks happened.
Why would political preferences be more sacrosanct than other preferences or private predilections people hold...
It would be great to define all the data-types a citizen can hold a position on and determine those which the government / entities can gain access to, and those which a citizen can expect privacy with...
And have that as a simple checklist as opposed to hidden in lengthy language of laws?
A good question indeed. I was not at all thrilled about the Snowden leaks, but I had, and still have, some faith in the mechanisms of institutional governance, plus I viewed it in the context of strategic calculus.
Being a Euro I personally favor very rigorous privacy protections, and think you should be able to know who has data on you, get detailed copies of it in some accessible format, and request its deletion. Public institutions that do have a custodial data function should be subject to increasing levels of accountability and their powers should not be unlimited.
Now, since the US doesn't currently promulgate such strict data-gathering and retention standards in the public or private sector as I would like, it's a strategic reality that well-resourced actors like foreign governments can vacuum that up for their own ends, whether nefarious or merely curious. So I'm OK with the NSA collecting such data insofar as it seems irrational for the government to put itself at a disadvantage relative to everyone else in the private sector, in the same way that it would be irrational for police officers to have fewer powers than regular people, as opposed to greater responsibility in the exercise of those powers.
In short, if all that data on people can be legally bought or acquired, it'd be pretty stupid for the USA/NSA to be the only entity that didn't have a copy.
I do heartily agree that data aggregation in both public and private sectors is way, way out of control, and I also agree that a checklist approach would be far preferable to yet more books of rules. I have some radical (but inchoate) technical approaches to this problem in mind, if you want to get in touch via gmail.
> You can't blame a "careless programmer" when the real problem is organizations simply not committing any resources to security. It's like a hospital only employing 1 nurse and blaming her when patients inevitably die.
This isn't a very good example. A shortage of nurses will directly correlate to poor patient care and possible death. But a shortage of security experts? Who knows. I worked at an insurance company that left an access database open to the internet FOR YEARS. We ran analytics when I found it and it was never served from our web server.
So since it didn't get into the public does that mean they were responsible for their security? If the answer is "no" then how would you ever measure these unknowns?
Security is a major problem in tech. It's very difficult, it's nuanced and it's vast. Security covers so, so much that it would be difficult for one or maybe even a handful of security experts to fully cover all aspects of an app, depending on your scope.
Beyond that, though, mistakes happen. People will screw up. Even security people can screw something up. Throwing someone in jail for a screw-up reminds me of the war on drugs; it's not going to stop someone from making a mistake or simply not realizing an unknown unknown.
> Security is a major problem in tech. It's very difficult, it's nuanced and it's vast. Security covers so, so much that it would be difficult for one or maybe even a handful of security experts to fully cover all aspects of an app, depending on your scope.
Which is why it's so important to hold companies that screw it up accountable. That's the only way to get it to change. Forget about everything else, accountability will force new rules for data storage and protection. Without accountability, nothing will change.
> Which is why it's so important to hold companies that screw it up accountable. That's the only way to get it to change. Forget about everything else, accountability will force new rules for data storage and protection. Without accountability, nothing will change.
Sure but everyone on HN suggests accountability but never defines what they mean by it except for the few who think someone should just be thrown in jail.
I'm not sure it means jail time. It does need to be substantive. Holding the CEO, CTO and CSO directly accountable is a start. But I think it needs to be company wide. It needs to be punitive damage to the company and its shareholders. The risk of mishandling PII and similar data needs to outweigh the benefit (that may differ per type of PII).
At that point it would be my theory that consolidation around best practices, software, security audits... would become the norm. It would raise the cost of a company taking on the responsibility itself so much that they'd rely on others to reduce the cost through volume. It would probably start looking a lot like PCI and credit card co. requirements. The big difference here being that there isn't an industry body responsible, but the government, which would always be political and probably not have enough teeth.
Careless programmers can also mishandle health/credit card/minor information. We have laws protecting all of that data in particular. I'm not sure the expansion of data privacy laws to include all PII is so farfetched.
Treat everything as PII and we are good. The constitution has an amendment to protect our rights. That seems important. I know of no guaranteed right of corporations to infringe on our privacy and to provide access to our data.
Forty years ago we didn't have this issue because there wasn't so much data for them to try to get their grubby greedy hands on. They don't need our data (ANY OF IT)!
The US constitution forbids the US government from acting in certain ways, it in no way impedes upon private organizations. Tort law and the like is what holds private organizations accountable. i.e. The fourth amendment does not protect you from a private entity or individual; laws covering trespassing, theft, breaking and entering do. Please don't drag the constitution into an argument it does not have a place in.
I didn't say the constitution protects me from private assholes. My point was privacy was important to our founders and remains important today. Had our founders known corporations would be a thing and grow as powerful as the government, I would argue they would be included.
We need laws to protect us from them. Much more and better laws. The fact that the constitution does protect us from gov. is a good argument that our gov. should be active in protecting us from corps. Anyway that is my conjecture.
I'd like to argue (not for the first time either) that the Constitution is seriously deficient in its failure to enshrine privacy as a personal right. Great as it has been for the last couple of centuries, I think it's obsolete and should be replaced rather than merely amended.
It's not the only thing I'd like to change. Besides which, there is already a movement in progress to bring about another Article V convention and I'm guessing the goal of the proponents is drastic rather than minimal alteration. Here's a recent summary article on developments:
There certainly is one, if you only take into account public opinion. We're dealing with conflicting interests of people who generate data and corporations that collect and traffic data.
OP is clearly suggesting creating a law to avoid this sort of outcome. And while these aren't medical or financial records, they did include: "names, dates of birth, home addresses, phone numbers, and voter registration details, as well as data described as 'modeled' voter ethnicities and religions." Say on average people would pay $5 for this stuff not to be leaked. You're talking about a $1 billion fuckup resulting from a choice that is, as far as I can tell, gross negligence on the part of the programmer.
While I generally agree with your point, it's hard to make this into law. Do you think it should be illegal to have a company with only 1 programmer? If not, how do you prevent them from making catastrophic mistakes?
Law generally doesn't prevent catastrophic mistakes, it creates consequences for them which incentivizes those in a position to make them to find ways of preventing them.
As of right now there is not a Federal law describing exactly what is PII data (which would be a prerequisite for this).
There are many (48) different state laws that do define what PII is and how organizations (commercial and governmental) are to handle data breach notifications. If you want to see what a crazy patchwork map of laws this is, check out:
These only come into play if a certain minimum number of state residents have had their data compromised and if that data is of a certain class.
Typical classes are:
- Account info
- Financial info
- Health Info
- Health Insurance info
- DNA
- SSN
- Biometrics, etc.
And I'm not a lawyer, and we likely don't have all the facts, but at first glance the data released in this breach doesn't meet any of those classifications. It looks pretty much like the data you'd get out of a phone book (name, address, phone number) with a few data points like geocoding and their guess as to your religion and politics.
Which isn't to say that it's great, or that it's not a problem that this was all released, but it is pretty much public data.
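To make the two-part trigger described above concrete (a minimum number of affected state residents, plus at least one covered data class), here is a rough sketch. The threshold of 500 and the class names are made-up placeholders for illustration, not taken from any actual state statute.

```python
from dataclasses import dataclass

# Hypothetical covered classes, loosely mirroring the list in the comment above.
COVERED_CLASSES = {
    "account", "financial", "health", "health_insurance",
    "dna", "ssn", "biometric",
}


@dataclass
class Breach:
    residents_affected: int  # residents of the state in question
    fields_exposed: set      # names of the exposed data fields


def notification_required(breach, min_residents=500):
    """True only if enough residents are affected AND a covered class leaked.

    min_residents=500 is an assumed placeholder; real thresholds vary by state.
    """
    has_covered_class = bool(breach.fields_exposed & COVERED_CLASSES)
    return breach.residents_affected >= min_residents and has_covered_class


phone_book_style = Breach(1_000_000, {"name", "address", "phone", "modeled_religion"})
with_ssn = Breach(1_000_000, {"name", "ssn"})

print(notification_required(phone_book_style))  # False
print(notification_required(with_ssn))          # True
```

Under these assumed rules, a phone-book-style record set never triggers notification no matter how many people are in it, which is the gap the comment is pointing at.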
It's not about the careless programmer IMO. It's that there was nothing in place to make sure the data was secure. Or if there was, it wasn't effective.
Not sure I understand your response. I wouldn't describe Facebook's security processes as "careless", nor would I describe their vast and complete data collection as "for funsies"
I think their point is that there are thousands of programmers who handle this data on a daily basis as part of their job - intentionally limiting the scope to "funsies" is ignoring that.
It absolutely is how criminal laws work under the Constitutional prohibition of ex post facto laws; while we can write forward looking criminal laws however we want (within other Constitutional limits), we can't apply those new laws to past conduct.
No, that's not how laws work. The law comes first. Then its application to behavior. If they didn't break any existing laws then there's nothing to do but propose a new law.
I agree. As an example, over a decade ago I was on a team deploying an app to Switzerland. They took privacy really seriously, because they told me that employees get fined/imprisoned for privacy breaches, not just corporate fines.
Throwaway account because of involvement in this field.
Even though the RNC is a private organization, it doesn't operate like a normal company. The partnerships that it makes with companies that it awards contracts to are largely relationship-driven, not actually driven by objective analysis of value propositions. Those decisions tend to be made at the COO/CoS (Chief of Staff) level.
If this data has private information on non-republicans then jail them for having it, not for leaking it.
There is no special purpose in these two groups of colluding Americans that grants them special rights to gather data on their non-associates any differently than any other member of the public.
If they are gathering data through any means it's no different than other marketing firms. I think the real debate needs to be about what any companies can share.
But it is not about what they can share, it is about what they or anyone can collect, store and use.
What you can share lets companies store and make decisions based on data they shouldn't have and couldn't share as long as their inputs and outputs look clean.
I.e. Facebook or Google could help you intentionally run a race-biased campaign across all their assets as long as they don't tell you any specifics and include a little noise so you can't be sure of any one user's race. All thanks to what they can collect, use, but refuse to share.
No special rights are needed - the kind of data that is described in the article may be difficult and/or expensive to get, but it's not illegal to get, not illegal to have, and not illegal to distribute to others. It would be legal for you to do the same - maybe it shouldn't be legal (and it isn't so in e.g. the EU), but it currently is in the USA.
If the leak would've been published willingly by Deep Root Analytics, there would be no crime (according to current USA privacy rules) here at all, no currently valid reason to jail anyone.
I hate to be in the position of defending a leak such as this. But if what they've done is "merely" compiling data that was available from our public profiles, are they obligated to secure that compilation? I'm asking -- I don't know for sure how the data was gathered, it just sounds like it was from scraping public records + public web sites.
Also, can someone ask Troy Hunt whether he has or can get access to this data so he can let us all know if we're on it? (But will it even matter if they don't have an email address field?)
Voter files being public data depends on which state you're talking about -- you're looking at ~50 different sets of laws and regulations. Some states restrict access to only political parties using it for political uses and prohibit distribution to non-parties/candidates. Some states (Florida, for example) will let anyone just apply to get it.
To your latter question, some states, like California, do include email addresses on their voter file, but the coverage tends to be poor.
You can probably assume that, if you're a registered voter in the United States, you're on this dataset, as is your age, gender, party affiliation if applicable, and race/ethnicity info if your state collects that, as well as modeled information projecting things like party support, race, age, and likelihood to turn out.
It's an interesting question, because there is a comparable in the intelligence community called aggregation.
Effectively two pieces of information when separate, might not be classified, but if they are linked with a third or combined they become classified. Add more and it changes the classification further.
I wonder if there should be something similar for data aggregation companies. Like what we see with HIPAA.
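A toy sketch of how such an aggregation rule might look if applied to commercial data sets: each field on its own is harmless, but certain combinations raise the sensitivity level. The field names and levels below are invented for illustration; this is not how any intelligence agency or HIPAA actually defines it.

```python
# Hypothetical rules: each maps a combination of fields to a sensitivity level.
RULES = [
    (frozenset({"name", "address"}), 1),
    (frozenset({"name", "address", "date_of_birth"}), 2),
    (frozenset({"name", "address", "date_of_birth", "modeled_religion"}), 3),
]


def sensitivity(fields):
    """Highest level among rules whose fields are all present; 0 if none match."""
    present = set(fields)
    return max((level for combo, level in RULES if combo <= present), default=0)


print(sensitivity({"name"}))                              # 0
print(sensitivity({"name", "address"}))                   # 1
print(sensitivity({"name", "address", "date_of_birth"}))  # 2
```

The point of the design is that the level is a property of the combined record, not of any single column, so adding one more field can move the whole data set into a stricter category.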
Personally, while I understand that this is nothing illegal, I think it's terribly wrong.
We just handed everyone, including the 5% of society who tend towards sociopathy, a nicely tagged, collated (and yet probably slightly inaccurate) list of minorities.
Hate women? You have a nice list which includes the names, addresses, and telephone numbers of all those women.
Hate muslims? Boy do I have just the list for you. Blacks? Republicans? White muslim men who live in the same neighborhood as you?
Let's omit the sociopaths for just a moment, and let's look at the ad networks. Can you picture how much more accurate a picture those companies have of you now? They no longer have to guess at your age, ethnicity or religion - they now know. What could go wrong when that list of "legally" collated data gets combined with the RNC leak, and is subsequently itself leaked?
So no. It's probably not illegal to compile these lists. It's probably not even illegal that it was released. But it was, for certain, a damned immoral thing to do, and there will be consequences.
I don't really disagree that this is a shitty data breach, but on the consequences, I'll point out that the 5% of society who tends to sociopathy already have much lower-effort means to target minorities:
Hate women? Look for boobs.
Hate muslims? Look for brown skin.
Hate Republicans? Look for MAGA hats.
That these are inaccurate signals is irrelevant: haters gonna hate, and I really don't think they care whether the brown-skinned person they're harassing is actually a Muslim, they just want a target for their anger.
As for ad networks - they already have much more accurate models of age, ethnicity, and religion than the RNC has. There's a lot more money involved in targeting ads, and so they've put a lot more effort into it than political consultancies. Worrying about them is like closing the barn door after the horse is out.
It doesn't get much lower effort than downloading a list, popping it into Excel, and sorting on a column (I'm willing to bet that some hate groups are already doing this and will release very specific lists to their membership). And with phone numbers and a couple of bucks, you don't even have to leave your house to send them hate messages by the thousands.
Haters gonna hate - far too flippant a phrase to describe those who emotionally and physically assault their targets.
As for the ad models - the RNC release has a very specific DOB, location, gender, and phone number. Some of these the ad network could guess at, but this provides concrete data.
It gets a lot lower effort than downloading a list, popping it into Excel, and sorting on a column (really, how many non-techies would do that?). Get on the subway, find girls with brown skin, start screaming obscenities, pull knife when confronted, murder.
Look, I don't want to minimize the impact of hate or harassment on victims. It really is terrible, and should be challenged whenever possible. I do want to inject some realism into the discussion of the likely consequences of this breach. The people who would go out and harm other people because of their ethnicity or religion don't particularly care if they get the ethnicity or religion right. (I'm reminded of a time when I was carrying my wife's purse while she went shopping in a nearby store, I walk by a pickup truck, and the guy inside is loudly muttering "Fucking faggots" over and over again. And then I met my wife in the parking lot, give her a kiss, hand her back her handbag, and he laughs a big "Ha-ha!" of relief and drives off.)
And the people who would do mass harassment over the phone have much easier ways to get this data, like e-mailing their state voter registry and asking for it.
Certainly, it makes it easier to call SWAT on the home of a person in Georgia while being in Alabama; this is much lower effort than shouting hate slogans at some random group of people in the street. This is scary nasty.
Why do you assume sociopaths just want to pick on individuals? There are ambitious sociopaths that draw up plans for things like ethnic cleansing, genocide and so forth. We have abundant examples of such behavior within recent history.
I think your concept of sociopathy qua serial killer is uninformed and stereotypical. Most sociopaths are not stabby weirdos blinded by hatred, they're just self-centred people with different levels of emotional affect/susceptibility than the general population.
It seems not to have occurred to you that (depending on record quality) a database like this would be a great resource with which to find and recruit sociopaths of various stripes.
I don't assume that sociopaths just want to pick on individuals. I assume that the ambitious sociopaths drawing up plans for ethnic cleansing are the people who compiled this database in the first place.
Someone with the resources to perform ethnic cleansing could easily pony up the few thousand dollars that is required to buy this data in the first place:
One of the weird effects of the internet is that stuff that used to be considered obvious and self-evident is now treated as very sensitive. For example, no one used to consider their gender or race "private" information; rather, it was nearly impossible to interact with another human without these things being self-evident.
Not many years ago, there used to be a book distributed far and wide with the names, addresses, and phone numbers of virtually everyone that lived in your city. This was accepted as normal and routine, and you had to specifically opt out of being included.
The difference is in the scope. Back when phonebooks were widely used, there was no automated way to correlate those records with other records to create a highly specific list of all Asian women who live in a four block radius of a specific address.
You could walk around and find some of them, but that would take a significant amount of time and effort.
Phones were also not capable of interrupting you while you were out of the home, nor were they capable of receiving short messages on your dime.
I missed where it was all public; couldn't you share information with your political party you wouldn't want shared with others? And wouldn't the accumulation of all that data be valuable?
Many states have very loose control over their voter database. The control is mostly exerted through the force of the law prohibiting certain uses of the data. For example, I can order the voter database for the state of New York and receive it on a CD in a few weeks. Other states will simply email you a link to a big bundle of CSV files.
There is other information besides what is on the voter database in this disclosure, it appears, but the voter data itself is mostly not a secret and can be trivially accessed by any citizen. They just have to promise not to break the law around what they do with it.
As I said in the other thread, this is all public data from various sources (except the modeling). It will be fully filled out.
It's very common for political orgs to have the entire voter file for the US (or be able to query it using SQL)
It's the aggregation and easily searchable format for 2/3 of citizens that makes this REALLY dangerous for identity theft. They were still grossly irresponsible. No consequences will happen though.
Pardon my ignorance, what is the key data in this leak for identity theft? Eg, I didn't see anything really nasty like SSN, so what makes it easy for identity theft?
Plus the supposedly "startlingly accurate" preference and views modeling that is linked to all personal details and publicly accessible. While someone could always deny it being correct, it again raises questions about responsibility in data collection.
Lots of ways. Perhaps you know they're a super-loyal Republican voter and you have things that correlate with that, such as willingness to vote when presented with an online poll. You might know they were reluctant because the person shifted between the available alternatives during primary season - perhaps supporting Jeb! Bush, then Marco Rubio, then Ted Cruz, then John Kasich, before finally falling into line when Trump won the nomination. You could then infer that Trump was the absolute last choice but that they would still 'hold their nose' and vote for him in the general election, either due to loyalty to GOP on particular policy issues or because of some long-standing hatred of Hillary Clinton, or just a history of being very conformist in political matters (as many, many people are).
I cannot vouch for the accuracy of those, but as a power user of GOPDataCenter (the portal they provide to campaigns and GOP county chairs), the Modeled Ethnicity is pretty terrible.
“‘Microtargeting is trying to unravel your political DNA,’ [Gage] said. ‘The more information I have about you, the better.’ The more information [Gage] has, the better he can group people into "target clusters" with names such as ‘Flag and Family Republicans’ or ‘Tax and Terrorism Moderates.’ Once a person is defined, finding the right message from the campaign becomes fairly simple.”
Neal Stephenson wrote a book called Interface which predicted a form of tech-enabled micro-targeted politics over 20 years ago. It was disturbing at the time; it's almost considered business-as-usual now.
I believe American democracy would benefit from including the study of such techniques in our educational curriculum. When I was in school, we studied advertising techniques to help us be skeptical. We need the same for targeted political messages now.
I agree, but citizen education should not imho be the only approach here. I'm for much more muscular privacy laws and a slightly narrower tolerance on what's acceptable political speech.
Of course education is great, but look at the vast financial and operational asymmetries between even the most informed individual and well-resourced corporate actors like political parties. I have a super-strong political immune system but being politically engaged and navigating social media is exhausting. For the sake of objectivity I have to systematically expose myself to opinions I find disagreeable lest I retreat into a bubble and be surrounded by confirmation bias, but continuous exposure to countervailing political ideologies is intellectually and morally tiring, given the intense polarization and visceral rhetoric that prevails in today's political discourse.
Despite not liking programming, I've been seriously thinking about building a virtual assistant that I can train to pre-emptively tag people using my peculiar ideological criteria so that I can avoid or at least prepare for certain interactions that I know are going to be psychically difficult. By my value calculus, tuning out of politics is irresponsible at best and suicidal at worst; only communicating with people whose values you share exposes you to confirmation bias, and inevitably exposes one to manipulation; observation of and argumentation with antagonists is psychically expensive and potentially dangerous.
So much as I agree with you on education, it's not something we can just put on the to-do list and wait a generation to benefit from. And that would be true even if we had a well-functioning educational sector rather than one that fails a large number of children and adults by leaving them only semi-literate and -numerate. People who can't read or reckon well are poorly positioned to identify fallacious political discourse.
I revel in my filter bubble and labor to improve it.
For policy wonks and activists such as myself, discourse, persuasion, marketing, are distractions from the real work of getting things done.
Firstly, because people vote their identity. Period. Almost no one votes on the facts, the issues, the policy, the platforms, whatever. There are no undecideds, no independents. Cite "Democracy for Realists".
Secondly, victory is achieved by mobilizing your supporters. You bring the heat, whoever is sitting in the chair will see the light.
The only distinction is if a voter is willing or unwilling to bother casting a ballot.
So that the two parties who voted for this to be the case can have unfettered access to their potential voters, go to their houses, send them things, and know for sure they're only hitting up people in their party, so as not to mobilize the other side.
Nothing that would harm the 2-party system ever changes in the US, and nothing ever will.
Nope, in the US you register with the state and you can optionally tell the state that you are a member of some party.
You can register to vote and self-identify as 'Dem' 'Republican' etc while getting your driver's license.
Aside from the public voter file update, the Democratic Party doesn't get any special notification if you pick them, and you don't need to apply or get accepted in any way.
Depends on the state, but in general, no. You register as a member of the party. You register with the state. Voter registration information is used to determine who's eligible to vote, and you don't necessarily need to have a party affiliation to be able to vote (I don't, for example).
My guess is that it's public to avoid unfairly preferencing the major parties, so that they give at least lip service to the idea that "anyone can run for office". The U.S. is pretty sensitive about seeming like it has a fair and open political system, even if other aspects of the system mean that in practice a third party doesn't have a snowball's chance in hell.
I don't typically don hats with this much tin foil, and I don't think this is likely, but...
The real danger of data like this, in my opinion, is illegal usage for voter fraud.
Find people who are likely to vote against you and likely to have poor voter registration documents, and remove them from the polls so they can't vote.
Find people who aren't likely to vote at all and vote on their behalf. In-person, the only verification required is name & address. By mail, the only requirement is a signature, which can be obtained from receipts (I assume this is available on black hat markets).
Leaving this S3 bucket as public-read allows for deniable coordination with illegal actors. I can't imagine they did this on purpose but that could be an explanation.
I don't know if it's possible, but I hope the FBI / Mueller team is able to get access logs.
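For anyone wondering what "public-read" looks like in practice, here is a minimal sketch that inspects a bucket ACL for a grant to the global "AllUsers" group. It assumes the ACL has already been fetched and has the dictionary shape that S3's GetBucketAcl call returns; the helper name and the sample ACLs are made up for illustration, and a real audit would also need to look at bucket policies and per-object ACLs.

```python
# URI that S3 uses for "anyone on the internet" in ACL grants.
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def is_publicly_readable(acl):
    """Return True if any grant gives READ or FULL_CONTROL to the AllUsers group.

    `acl` is assumed to be a dict shaped like a GetBucketAcl response,
    i.e. {"Grants": [{"Grantee": {...}, "Permission": "..."}]}.
    """
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if (grantee.get("URI") == ALL_USERS_URI
                and grant.get("Permission") in ("READ", "FULL_CONTROL")):
            return True
    return False


# Hypothetical sample ACLs, not taken from the actual incident.
private_acl = {"Grants": [
    {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
]}
public_acl = {"Grants": [
    {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
    {"Grantee": {"Type": "Group", "URI": ALL_USERS_URI}, "Permission": "READ"},
]}

print(is_publicly_readable(private_acl))  # False
print(is_publicly_readable(public_acl))   # True
```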
No. The data that would be useful for wide-scale voter fraud is already widely available from public/free sources, including state Secretaries of State or Departments of Elections.
The loss here is all the very expensive extra modeling and demographic work that isn't included on those files. But having that doesn't massively alter the mechanics of the voter fraud effort you're describing.
Every county is different so some of my statements may not apply everywhere (I live in San Francisco).
I've yet to work a poll (next election) but have gotten to know the system here pretty well through the SF elections commission.
Our system is very far from perfect. Many counties do not even audit ballots after every election (let alone use only paper ballots). Epollbook software can be all over the place. Voter verification at polling places is often quite minimal. The penalty for forging a signature on a mail in ballot in CA is only $1000 (I was in the room when the state assembly committee voted not to raise the fine to keep up with inflation).
I don't mean to be alarmist - like I said, I don't think these things took place, at least en masse - but it'd be quite naive to suggest there aren't vulnerabilities.
I still encourage you to work or observe poll sites on election day. Soup to nuts. If you work it, you'll get training, see how the Australian Ballot is supposed to work. It requires many hands, eyeballs, proper accounting.
I'm not so worried about identity theft for in person voting. Just doesn't (didn't) seem to happen much on the west coast.
I vigorously opposed closing our poll sites in favor of all mail postal balloting (WA state). With ballot scanners and electronic adjudication of ballots (changing records in the database per "voter intent"), it's roughly equivalent to electronic voting machines, with some new vulnerabilities added (e.g. tabulating ballots as they arrive, effectively a pre-count).
As various members of the election verification network (EVN) determined, auditing elections is infeasible, impractical, and does little or nothing to increase confidence or certainty.
The gold standard for our form of elections, which I continue to advocate, is the Australian Ballot. In place of auditing, use physical chain of custody. (As you likely know, election administration is not banking, where they have double entry bookkeeping.)
---
To truly fix our election integrity woes, we need to do two things.
First, replace our first past the post (FPTP) with a more robust voting system. Like approval voting (for executive races) and proportional representation.
Second, adopt universal voter registration, with automatic updates. Were our government to use any one of the number of existing demographic databases (Facebook, Seisint, ChoicePoint, NSA, etc.) then we'd know in near real-time who was eligible to vote. And save huge money doing it.
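To illustrate the difference between first past the post and approval voting mentioned above, here is a small sketch with invented ballots. Each ballot lists the candidates the voter approves of, with the favorite first; FPTP counts only the favorite, approval voting counts every approved candidate. The candidates "A", "B" and "C" and the ballot counts are hypothetical.

```python
from collections import Counter


def fptp_winner(ballots):
    """First past the post: only the first (favorite) candidate on each ballot counts."""
    return Counter(ballot[0] for ballot in ballots).most_common(1)[0][0]


def approval_winner(ballots):
    """Approval voting: every candidate listed on a ballot gets one vote."""
    tally = Counter(c for ballot in ballots for c in set(ballot))
    return tally.most_common(1)[0][0]


# Hypothetical electorate: B and C supporters split the first-choice vote.
ballots = (
    [["A"]] * 4            # 4 voters approve only A
    + [["B", "C"]] * 3     # 3 voters prefer B but also approve C
    + [["C"]] * 2          # 2 voters approve only C
)

print(fptp_winner(ballots))      # A  (A=4, B=3, C=2)
print(approval_winner(ballots))  # C  (A=4, B=3, C=5)
```

In this made-up example the plurality winner is opposed by a majority, while approval voting picks the candidate acceptable to the most voters.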
My impression is that SF actually uses a ballot designed for chain of custody accounting, but doesn't use it whatsoever in practice because of the effort involved. I may be wrong on this. But "many hands, eyeballs, proper accounting" is unfortunately not available for our elections in most areas.
Happy to chat more about this - email is in my profile!
The real danger of data like this is concentration camps and death squads. I don't like stating things so dramatically but that's the sort of thing that actually happens, not just in WW2 but also under the USSR, in many dictatorships in the developing world, when countries like the former Yugoslavia experience political collapse, and so on. Just a couple of months ago there was a systematic effort in Chechnya, a republic of the Russian Federation, to round up and incarcerate homosexuals.
Don't make the mistake of thinking that atrocities couldn't possibly happen here just because you're used to thinking of them as something that only happens in other places.
I have this theory that the only way regular people will start caring about privacy breaches such as this one is to use that data against them in a malicious way. Tell the average Joe that the data of all US voters has been leaked, "Hmmm. That's bad." and they move on with their lives as if nothing happened. Instead, if this data is used to impersonate the average Joe on social media or if it's used to trick their mobile carrier into porting out their number, then they'll take notice. (I am not suggesting people do this, it was just part of a thought experiment)
Unless the company involved is sued to bankruptcy and the people involved are prosecuted, sending a strong message to companies dealing with user data, nothing will change. But that's unlikely to happen as this company is backed by the RNC.
While we're on the topic of collecting people's personal data, there's a simple solution: just don't collect it unless it's absolutely necessary. Stop asking me to broadcast my address in my newsletter. Stop asking me to submit my billing address when I make payments online. Stop asking me for my mobile number when I visit a fast food restaurant. Most of the companies that collect this data are not competent enough to keep it secure. The reason companies ask for an address to broadcast in users' newsletters is some anti-spam act which does not prevent the spammers from doing their job. I imagine it's also a requirement for companies to collect a billing address for certain types of online payments. Change the law to remove these poorly thought-out requirements.
More generally, we need regulations on how user data is used by companies. They should not be allowed to store user data indefinitely. If a user closes an account with a company, the company should retain the data for a short period (e.g. 1 year) and then delete it automatically. Companies should not be allowed to build shadow profiles of users.
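To make the retention idea concrete, here is a minimal sketch of what "keep for a short period after closure, then delete automatically" could look like. The one-year window is the example from the comment above; the record layout (`closed_at`) and the function names are hypothetical, not taken from any real system or regulation.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the "1 year" example from the comment above, taken as 365 days.
RETENTION = timedelta(days=365)


def due_for_deletion(closed_at, now, retention=RETENTION):
    """True once an account has been closed for at least the retention window."""
    if closed_at is None:  # account is still open, nothing to delete
        return False
    return now - closed_at >= retention


def purge(accounts, now):
    """Return only the accounts whose data may still be kept."""
    return [a for a in accounts if not due_for_deletion(a["closed_at"], now)]


now = datetime(2018, 1, 1, tzinfo=timezone.utc)
accounts = [
    {"user": "a", "closed_at": None},
    {"user": "b", "closed_at": datetime(2017, 6, 1, tzinfo=timezone.utc)},
    {"user": "c", "closed_at": datetime(2016, 6, 1, tzinfo=timezone.utc)},
]
kept = purge(accounts, now)
```

A real system would also have to chase down backups and copies held by third parties, which is where this tends to get hard.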
>I have this theory that the only way regular people will start caring about privacy breaches such as this one is to use that data against them in a malicious way
The other thought experiment is to adopt the opposite point of view: that privacy is overrated. You can adopt various worldviews, from 'everything is grey' to 'this is good, that is bad', but punishing someone into caring about the same things you care about is a pretty terrible approach. If you can't convince someone, it doesn't mean they're stupid; it could also mean you just aren't that good at communicating, or that perhaps what you think is important isn't all that important.
>Unless the company involved is sued to bankruptcy and the people involved are prosecuted, sending a strong message to companies dealing with user data, nothing will change.
Or we can give them tools that make it easier to secure data. I've always found that if you make it easy for someone, they almost always end up doing the right thing. As it stands, the security products/services domain is a complicated maze where you have to be an expert to evaluate how various products work internally and which services, if any, are worth purchasing.
There's a pretty compelling argument though, that people aren't good at making long-term assessments of diffuse, but potent, risks, and/or are willing to, or can be coerced into, arbitraging long-term interests with short-term exigency.
Gresham's law, availability heuristics, optimism bias, distribution of cognitive skills, various aspects of game theory, and more, strongly suggest this.
Examples: global warming, pollution risks, resource depletion, moral and morale hazard, just off the top of my head.
Yeah, and all those biases apply to people on HN too. You (the proverbial you) aren't good at making assessments either. Also, flipping it for argument's sake, how come software companies are never penalized for introducing software bugs? How would you like it if an accountant wanted to send you to jail because you introduced a security bug and their critical data got wiped out and they went out of business? Or should we go back to blaming them for not having backups because we have an a priori assumption that "shit happens" when it comes to software? Well, the other side could say "shit happens" too.
I've been meaning to write up a bit expanding market price dynamics beyond the set of goods that Adam Smith defined: labour, capital, commodities, rents, and (indirectly) interest.
In particular, the question of risk pricing, which is treated almost wholly as a financial question rather than an economic one.
The question of pricing under duress is a key one -- the Backward-'S' bending supply curve is a curious economic anomaly:
Also the behaviour of natural resource stocks under supplier pressure -- the price will fall to the lowest levels possible, and supplied volume will increase, if possible, for a number of highly perverse reasons. The collapse of oil prices following the East Texas oilfield discovery, from ~$1/bbl to first $0.13/bbl, then $0.02/bbl, before wellhead production was seized at force of arms by the Oklahoma and Texas National Guard, and the Texas Rangers, comes to mind.
I think people are missing the big legal liability .... this information has been published to the world, it contains estimates of people's deeply held political beliefs - some of it will be wildly wrong and those people might consider that they have been libeled .... roll on the lawsuits
Libel, in the US, requires the publisher either to know the claim is false or to publish it with reckless disregard for the truth when it is, in fact, false; I don't think there is any way that this could be construed to meet that, no matter what legal theory you use as to who the liable publisher is.
The liability wouldn't be with the RNC or Deep Root; the liability would be on the entity that actually published the information. That's if you consider this to meet the legal definition of publishing (the legal definition of libel, as I understand it, is "to publish in print (including pictures), writing or broadcast through radio, television or film...").
Want the name, age, gender, home address, mailing address, party of registration, and voter history from every registered voter in North Carolina? Here is the "leak" on Amazon S3. http://dl.ncsbe.gov/index.html?prefix=data/
Except, by leak, I mean, link I got from my state board of elections' homepage.
Related: This legally mandated "leak" happened years ago, and it even included signatures. Most of the citizens pushing for the recall of a sitting Republican governor were Democrats, so it seemed like punishment for a Republican state assembly to pass this one-off policy, especially for those who have zero internet presence. The first thing many employers see when they search for politically active Democrats is this information, paired with "quick searches" of their names on criminal, pedophile and dangerous persons databases. If you want to disincentivize political action, this is how you do it. I won't link to that site, but can confirm it's still up.
Because technologists don't know everything about laws and society, even though they like to think they do. You might add that not only is it searchable, but the Freedom of Information Act and its derivative implementations in state laws mandate that it be made accessible in this way for no cost other than the expense in compiling the records.
People getting angry when "government transparency" is supposedly such a good thing no one questions? Go figure...
IANAL, but I'm pretty sure if someone leaves a Top-Secret document on the ground, and you pick it up, you don't go to jail for that. The data is accessible to the public -- this isn't a hack, this is just downloading.
They don't have a special permission, but each case is treated individually - what exactly technically you did matters just a little, the intent (not claimed intent, but the intent that the judge/jury would imply), circumstances and what you do afterwards with the data matter much more.
I.e., if a respected company downloads the data, reviews what horrible things it has and reports it to the proper authorities (and gets legal advice beforehand on how best to do it), then they're very likely to be treated as not having done anything bad;
if I did the same, but contacted them asking to fix the vulnerability "or else", and then downloaded the data and published an angry video rant on YouTube, that might land me in trouble, as (expected) intent matters a lot for prosecuting crimes.
You could make the argument that since the information was not protected in any way that you were allowed to download it, but try explaining that to a 65 year old judge who doesn't even comprehend the basic structure of the internet.
source: work at place with large call center. Avoiding DNC fines is one of our top priorities.
secondary source: S.O. works at place with call center for a large bank. Same thing.
Also anecdotally: if you file a complaint with the FTC about an unknown number that keeps calling you back without giving you a chance to "opt out" (this is most scammer numbers), they usually respond to the ticket within 2-3 days. (Another funny thing: they use Zendesk.) I stopped receiving the calls after filing the report.
As to why they take it so seriously? My guess is it's easy money for them. Kind of like traffic tickets for cops.
What are the legal implications of a campaign using this leaked modelling data? Would it be breaking any existing laws to use data that was leaked to the public without the permission of the original company that did the modelling? (If nothing else, I suppose it's probably a copyright violation.)
Hypothetically, could one deliberately leak a trove of modelling data with some fake voters inserted, and then monitor the mailbox associated with that fake voter and sue any organization you don't like that sends campaign flyers for using the data without permission?
The results of data science models are contained in the data dump, and those are non-public. The rest of the information is accessible via public records or registries maintained by states.
> Spreadsheets containing this accumulated data—last updated around the January 2017 presidential inauguration—constitute a treasure trove of political data and modeled preferences used by the Trump campaign
Genuinely curious: can you really have 198 million rows in a spreadsheet?
I used to work in this field (political consulting and the data around it; nothing terribly interesting on the tech front at the time, though). It was most likely broken down by Congressional district.
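On the row-count question: a single Excel worksheet tops out at 1,048,576 rows, so 198 million rows would not open as one sheet, but CSV files of any size can be streamed row by row. Below is a rough sketch of that, assuming a hypothetical `rnc_id` column name. The ID pattern is inferred only from the one example ID printed in the article (32 alphanumeric characters in hyphenated 8-4-4-4-12 groups) and may not match the real files.

```python
import csv
import io
import re

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single .xlsx worksheet

# Assumed format, inferred from the article's single example ID only.
RNC_ID_PATTERN = re.compile(r"^[0-9A-Z]{8}(-[0-9A-Z]{4}){3}-[0-9A-Z]{12}$")


def count_rows(csv_file, id_column="rnc_id"):
    """Stream a CSV, counting all rows and rows whose ID looks well formed."""
    total = well_formed = 0
    for row in csv.DictReader(csv_file):
        total += 1
        # DictReader yields None for missing trailing fields, so fall back to "".
        if RNC_ID_PATTERN.match(row.get(id_column) or ""):
            well_formed += 1
    return total, well_formed


# Tiny in-memory stand-in for one of the per-district files.
sample = io.StringIO(
    "rnc_id,state\n"
    "530C2598-6EF4-4A56-9A7X-2FCA466FX2E2,NC\n"
    "not-an-id,WA\n"
)
total, well_formed = count_rows(sample)
print(total, well_formed)
```

Because only one row is held in memory at a time, the same loop works whether the data is one giant file or split by district.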
>Each file, formatted as a comma separated value (.csv), lists an internal, 32-character alphanumeric “RNC ID”—such as, for example, 530C2598-6EF4-4A56-9A7X-2FCA466FX2E2—used to uniquely identify every potential voter in the database.
I'm just waiting to hear Trump tell Deep Root Analytics, "You're Fired!"
It would be good to see him make this a clear case of responsibility. Also, someone on the RNC side needs to get fired, too. I'm not sure who, but errors this big demand it.
Trump only has "signaling authority" over the entities with whom the RNC contracts; he can't actually force a contract termination, but he can express distrust in an entity and encourage the RNC to terminate a contract. I don't see that happening, for a myriad of reasons.
The people involved with the decision to start working with Deep Root are mostly not with the RNC anymore. Even if they were, that's simply not how the industry works.
Doesn't this kinda make many (flawed but still in use) "security measures" incredibly vulnerable to social engineering? That's all the birth dates and phone numbers. That's crazy!
So you think it is okay to model the ethnicity and religion of 200 million citizens? This is how my country got all their Jews killed in WWII [0]. The mere thought of compiling such information here would lead to some hefty jail time these days. I really cannot believe you can even defend such a thing.
Please re-read what I wrote and consider that I might have the best intentions to explore a discussion with HN. Note also that I discussed the nature of securing the data and not its compilation.
Note that IMO several European countries forfeit some freedoms that I consider valuable and critical for democracy. IMO it is your right to observe the world as it is and note what you observe. Yes, I understand that this right conflicts with privacy and find it an extremely unfortunate consequence, especially given the emerging power of AI. However, I think that it's also possible to find people culpable for the evil intent of a compilation, though that is difficult to prove.
It's not an attack on you personally. On the contrary. But I noticed that a lot of comments found this kind of data collection 'normal', and apparently in the US this is very much legal. And I as a person find this very troubling, because in Europe this kind of decision led to a genocide, which is a result that should not be taken lightly. Do you really want the RNC to collect religious information about all 200 million voters? What happens if this information falls into the wrong hands (in our case it was the Nazis)? You just handed someone a potential 200-million-long hit list.
So yes it went Godwin quickly, but it is the painful truth.
I'm no expert on the subject, so feel free to set me straight. It seems to me, though, that this is a bit like cargo culting: an association between some part of a process and the result that doesn't factor in the big picture. More specifically, wouldn't the presence of such lists have proven a minor and nonessential part of the genocide? If the Nazis didn't have those lists, would they have just thrown up their hands and gone away? It seems they'd have just gathered the information by the usual 'everyone accusing his neighbor' witch-hunt means, with people using it to settle grudges or for other nefarious reasons. Maybe it would be less accurate, and maybe a little slower, but ultimately, you'd probably have about the same amount of bloodshed, I would expect. The real keys to these situations are a powerful group with ill intent and a population that is at least willing to condone it. If those are in place, the list doesn't really matter, and if they aren't in place, the list doesn't really matter. So why does the list matter?
Note, this is an invitation to enlighten me, not a firm belief on my part.
Don't try to negate his point with a Godwin flag. This is not about you or your posting intentions; it's a fact that data like this can and has been used for precisely such purposes in many instances.
Yeah yeah, that's not what you are about, but citing your own comfy idealism to avoid dealing with unpleasant realities is exactly the mechanism that political bad actors seek to exploit.
In the United States, the First Amendment protects your right to compile publicly available data in any way you see fit. I have experience working on a political campaign and they ALL have databases such as this.
52 U.S.C. § 21083 has some relevant text (excerpts below). § 10101 (NVReg Act) does not. By my read, neither bars public access to this data. 5 U.S. Code § 552 (FOIA) probably supersedes.
> (1) ... (v)
> Any election official in the State, including any local election official, may obtain immediate electronic access to the information contained in the computerized list.
> (3) Technological security of computerized list
> The appropriate State or local official shall provide adequate technological security measures to prevent the unauthorized access to the computerized list established under this section
How one votes is secret. Whether one votes is typically compiled by the government and made available to political parties and often the general public.
Note that this is used by campaigns prior to the election to be able to target voters that have not yet voted. Here in Washington state, you receive your mail-in ballot 3-4 weeks before Election Day. I usually return mine as soon as I get it. Campaigns can query this data repeatedly up to Election Day. Knowing that I have already sent in my ballot spares me from needless phone and mailed campaign literature and saves campaigns the cost of sending out useless material.
I had another friend suggest it was probably fine, so I asked how they would feel if it had "ATF Form 4473 status" on it. Then they could grasp the importance. The person is a huge 2nd Amendment proponent and afraid the government is going to take away all their guns, so it really made the point in a way they understand.
I almost made this same comment; however, it's not comparable, because these were private firms compiling the data for a private political committee. This is NOT the government collecting this data.
Rest assured the government has much better data on everyone already.
The US most definitely does think it is okay. Every political party and many corporations and non-party entities will have similar databases. If you wanted a list of every Jew in the United States, with their current address, and accuracy exceeding 99%, any number of data vendors would be happy to sell it to you, today. For better or worse, the US makes very different choices about privacy than many other advanced nations do.
Yeah, and that's a Bad Thing. Turns out our institutions do not magically immunize the USA from authoritarian ideologies or operations, which is rather worrisome, because if the USA were to devolve from democracy to dictatorship then there's no other country that's going to ride to the rescue, so the likely outcome would be an extremely unpleasant civil war.
Who decides which ideologies are authoritarian or not? It isn't an objective or universal reality. There are plenty of people in this country who think a Bernie Sanders presidency would have been as dangerous as Stalinism.
If we didn't have freedom of speech, we'd be just like Russia. The opposition is demonized at best, snuffed out of existence at worst.
Woah. Why don't you calm down and read the comment you are replying to. The comment you were replying to wasn't taking a position on the issue, and certainly not a strong one, merely asking questions.
>The mere thought of compiling such information here would lead to some hefty jail time these days.
No, not in the United States it wouldn't. Not only is it not illegal, it's standard practice by marketing companies, data brokers, and others.
"Here" in GP's comment is stated as being The Netherlands. Europe does in general have stricter data protection laws than the US, though I don't know them well enough to know which bits of this leak would be illegal here. Allowing the leak itself would be, at least under UK law.
I think processing any data like that (ethnicity, political viewpoints, union memberships) is completely illegal in the Netherlands (and I assume in the EU) [1] (Dutch).
I would personally be very disturbed and angry if data like that about me existed, let alone leaked to the internet. No matter who has it, that data should not exist. Do people in the US really not feel that way?
So? The GP said "these days," not "in my country these days" stating it like it was some sort of universal truth. We aren't talking about compiling data on UK citizens or Dutch citizens, this is American companies compiling information about American citizens.
For a US example look at NAACP v. Alabama (1958). If you are on the side of revealing this information, you are on the same side as some pretty unpleasant folks.
> I look at the picker of sides and their motives to determine the consequences of picking a side.
Taken literally, this suggests that you can't work out the consequences without looking at the supporters. If that's what you mean, I have to say it seems weird to me.
Taken less literally: if you think the consequences of one side winning include "people get harmed", you can just say that. You can point at the consequences instead of the supporters.
The people picking sides provide data and case studies for what that side really means. It's nice to think in the abstract about issues, but it is helpful to see the movements of the various actors in a situation to determine if a decision has a net benefit while not violating what I hold dear. The whole "ignore the actors on both sides" approach gives up the benefit of the analysis they did to pick a side.
Plus bad actors are quite good at sounding very good in the abstract while showing their true colors in their actions.
It's just a different value calculus from your own. You mention that you try to make choices based on 'what seems right' but surely you're aware that different people have different conceptions thereof. Violent extremists are typically very sure of their moral premises even though they may be drastically different from the popular norm.
That's like saying "it's OK to have racist thoughts, just don't say them out aloud". Whether it's revealed or not is secondary to the act of compiling itself.
It is ok to have any thoughts you like about anything. Once you act on them, that's when other people have to take into consideration their consequences. And speech is certainly something with consequences.
So, you would have ruled for Alabama so the KKK and Jim Crow supporters could get the list and "talk" to the businesses and people supporting the NAACP?
Well, the KKK and Jim Crow supporters were a bit farther up the ladder of 'unpleasant'. 'Unpleasant' being used in the same meaning and tone as a southern lady saying "aren't you special".
In their day they were only mildly unpleasant. Furthermore, those groups believed they were acting to promote policy that they believed was in the best interest of the nation, regardless of how we see them in hindsight.
Imagine if there existed similar databases then. Would you hire someone if you knew they were firmly pro-segregation at age 20-something in 1950-something?
The context of the time doesn't make the actions of the KKK any less unpleasant or disgusting. Lynchings and laws codified to keep non-white people sub-citizens are and were far beyond unpleasant, whether the majority of white people at the time believed so or not.
> policy that believed was in the best interest of the nation
Which nation, exactly?
> Imagine if there existed similar databases then. Would you hire someone if you knew they were firmly pro-segregation at age 20-something in 1950-something?
Probably not, no. I would certainly probe very carefully to see whether they had abandoned such a stance, since many aspects of personality and political attitude are formed early in life. If you go back and look at documentary footage from around the time schools were integrated, in the 50s, you can see many youngish people protesting that and waving placards with swastikas and so forth on them (this less than 2 decades after the Nazis were defeated in WW2). A lot of those people held on to those ideas and passed them on to their kids.
Totally. I entered an address I don't care about, and it just kept asking me for what is essentially a marketing profile:
- Is this a personal email address?
- Please enter 10 more email addresses that you use (i.e. are associated with this email address) so that we can "narrow down the search results."
Really. How is whether or not this is a personal email address meaningful in searching hacked data for the email address? I'm literally providing you with text to search against a database. You don't need other information.
Yeah, same experience, same results, although at least it doesn't appear to make up fake results to force you to register (I used a completely fabricated address and it said it didn't find any match before it even asked me to register).
I went a step further. I added two email addresses (one ditched a long time ago) and a "report" was created, but I had to create an account to access it... which I did, received a temporary password via email (lol), and gained access to... pretty much the same data you get on HIBP, but even less detailed: https://imgur.com/i9sOU27
This is not spam, it is a public service to help those who have been compromised as Upguard is not providing the data for those that want to see if they have been breached. There are multiple sites you can check for free if you have been breached including haveibeenpwned which will inevitably be posted also.
This is spam -- it's not a public service if you require registration to see results. It feels more like a ploy at grabbing email addresses than it does an actual security check.
Hey, I see that you claim to have 4 billion records. Could you talk a bit about your verification process for said records?
My private collection has a bit over 5B records atm (and presumably largely the same data) but I would never advertise a service with that number as I know for a fact that various databases contain millions of generated entries.
The results for queries like "00000000000000000000023@inbox.ru" seem to indicate that you have large amounts of bullshit data included in that figure.
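As a rough illustration of why the raw record count overstates things, here is one crude heuristic for spotting obviously generated addresses like the one above. This is a hypothetical sketch, not the service's actual verification process, and the 12-digit threshold is an arbitrary assumption; real cleaning would need many more rules.

```python
import re

# Hypothetical rule: a local part that is nothing but a long run of digits.
# The threshold of 12 digits is an arbitrary assumption for illustration.
LONG_DIGIT_RUN = re.compile(r"^\d{12,}$")


def looks_generated(email):
    """Flag addresses that are malformed or have an all-digit local part."""
    local_part, _, domain = email.rpartition("@")
    if not local_part or not domain:
        return True  # not a usable address at all
    return bool(LONG_DIGIT_RUN.match(local_part))


def plausible_count(emails):
    """Count only the records that survive the heuristic."""
    return sum(1 for e in emails if not looks_generated(e))


records = [
    "00000000000000000000023@inbox.ru",
    "alice@example.com",
    "no-at-sign",
]
print(plausible_count(records))
```

Advertising the count after a pass like this, rather than the raw total, would be the more honest number.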
In other news, already known information. Every US state has clear laws on releasing who's registered with which party, past affiliation, and address.
> Along with home addresses, birthdates, and phone numbers, the records include advanced sentiment analyses used by political groups to predict where individual voters fall on hot-button issues such as gun ownership, stem cell research, and the right to abortion, as well as suspected religious affiliation and ethnicity.
As far as I understand, the religion/ethnicity columns are not a fact, not a record and not something private provided by the person themselves or some agency - it's a modelling guess that they've made based on other publicly available data.
So, if I look at your post history and make a guess about what religion and ethnicity you might have, and write that down, am I in possession of an illegal database if I lived in e.g. France?
I did take a quick look, and it seems that I have a good idea about your full name, face, hair color, political views and party affiliation, your employer, etc. All this data (together with the demographics of others) would also allow me to make some guess about your religious preferences. All of that might be wrong, but it might also be right, so let's assume for the sake of argument that I write my guesses down and happen to have an entry that contains your full name, ethnicity and religion, just like the leaked data has.
Should this be considered a crime?
Does this change depending on how many people I look at?
Does this change depending on how accurate my guesses are?
It is a "leak" when additional data that was previously private, such as modeling and analytic information, gets published against the will of its owners.
However, it isn't a "leak" in the traditional colloquial sense that someone stole the data and released it to the public. It's just a security leak.
It is mostly publicly available data, but not always easily accessible (states have varying requirements and methods of acquisition); firms go through quite a bit to get aggregate files in all 50 states. For them to be put up with no protection is jarring, but not surprising given other recent disclosures.
I only wish we had access to the files to do some queries across them!
No. Records released include "names, dates of birth, home addresses, phone numbers, and voter registration details, as well as data described as “modeled” voter ethnicities and religions." [0]
Dates of Birth, Phone Numbers, Email Addresses are not public information. [1] [2]
The data was amassed from a variety of sources—from the banned subreddit r/fatpeoplehate to American Crossroads, the super PAC co-founded by former White House strategist Karl Rove.
There is a significant amount of data here that is not public or covered by the laws.
> Clickbait.
Even if no laws were broken, it may be news to those who are unaware that carveouts to PII disclosure laws have been made and their PII (along with polling data indicating preferences on lots of topics[1]) has been leaked.
Also, "politicians are at the very best so indifferent to citizen-unit privacy concerns that they can't be arsed to hire competent admins" may not exactly be news, but additional data points illustrative of that fact are.
[1] What can be done with that? Well, at the very least, a PI/con-artist/other party with a motivation to approach you cold would certainly love this sort of profile for anyone they're interested in. Think about it for a minute.
The reddit posts are odd. What value do they have?
And why/how did they leak?
Fortunately, they don't hold much personal data, but given that they're looking to raise $$$, the fact that they had a security breach is interesting. Especially if they haven't disclosed it.