191M US Voters’ Personal Info Exposed by Misconfigured Database (databreaches.net)
307 points by rmxt on Dec 28, 2015 | 99 comments

This is public voter data that anyone can get from their county office or secretary of state's office. It's not being "exposed" because voter files and party registrations were never secret.

The data is not a well-kept secret, but there are restrictions on its use, and for some locations criminal laws about misuse. See http://nationbuilder.com/voterdata for state by state rules.

Note in particular, "Placing on the internet so anyone can find out where anyone else lives" is not exactly an approved use in locations like California and Hawaii. Which is why there are going to be consequences once the database owner is tracked down.

Further, some records may never be public, depending on local rules. Witness protection, victims of domestic violence, judges, etc.

Rant: The state-run VRDBs are a mess.

HAVA improved the situation, but 50+ separate systems lead to data quality issues, which then trigger alarms. False positives are "voter fraud". False negatives are "caging" and "purging".

Made worse by the requirement that only eligible (active & inactive) voters appear in each database. Versus a master list of everyone with an eligibility flag, backed by an audit log.
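A minimal Python sketch of that alternative, assuming a hypothetical `MasterList` where nobody is ever deleted: eligibility is just a status flag, and every change to it is appended to an audit log. The class name, status strings, and fields are illustrative only, not any state's actual VRDB schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional

ELIGIBLE = ("active", "inactive")  # statuses that count as eligible, per the comment above

@dataclass
class AuditEntry:
    when: date
    voter_id: str
    old_status: str
    new_status: str
    reason: str

@dataclass
class MasterList:
    """Hypothetical master list: people are never removed, only re-flagged."""
    status: Dict[str, str] = field(default_factory=dict)   # voter_id -> status
    log: List[AuditEntry] = field(default_factory=list)    # append-only audit trail

    def set_status(self, voter_id: str, new_status: str, reason: str,
                   when: Optional[date] = None) -> None:
        old = self.status.get(voter_id, "unknown")
        self.status[voter_id] = new_status
        self.log.append(AuditEntry(when or date.today(), voter_id, old, new_status, reason))

    def eligible(self) -> List[str]:
        # The "eligible voters" view is derived from the flag, not kept as a separate table.
        return sorted(v for v, s in self.status.items() if s in ELIGIBLE)

    def history(self, voter_id: str) -> List[AuditEntry]:
        return [e for e in self.log if e.voter_id == voter_id]

if __name__ == "__main__":
    roll = MasterList()
    roll.set_status("V1", "active", "registered")
    roll.set_status("V2", "active", "registered")
    roll.set_status("V2", "ineligible", "moved out of state")
    print(roll.eligible())                             # ['V1']
    print([e.new_status for e in roll.history("V2")])  # ['active', 'ineligible']
```

With this shape, a wrongly "purged" voter is a flag flip with a visible history rather than a row that silently disappeared.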

The optimal solution is automatic, universal voter registration, maintained by each jurisdiction and hosted by the feds (perhaps http://eac.gov). Which is exactly what both parties are doing in-house. Then we can put a stop to the recurring food fight every election cycle.

Thanks for listening. My work on this issue has made me grumpy.

What sort of work have you done?

Primarily election integrity and voter privacy. As an activist, lobbyist, and candidate. I've dabbled in public financing. And I contribute to causes I support.

I've been working on legislation information systems, as best able, to help future persons like myself be more effective. (Tools I wish we had when I got started.)

Thanks for asking. From your profile's links, it looks like we'd be allies.

As the article notes, a good portion of the information is already public like you've said. However, as I see it, there are two types of "exposure" at play here. I took the "exposure" in the headline to mean "a publicly accessible database is serving up information that the owner might not have intended to reveal," rather than "sensitive personal/private information has become public." In my opinion, the "exposure" being described does not have to be the latter type to be unsettling or worthy of mention.

Pretty sure it is 100% already public. And not that hard to access. See e.g. https://elections.nationbuilder.com

Came here to emphasize the difference between public data you have to retrieve manually and public data that is programmatically accessible, but it appears the API has no rate limiting in place.
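For anyone wondering what the missing rate limiting might look like, here's a minimal token-bucket sketch in Python. The class and its parameters are hypothetical; a real service would usually enforce this per client (per API key or IP), often at a proxy or gateway, rather than in a single in-process object.

```python
import time

class TokenBucket:
    """Allows `rate` requests per second on average, with bursts of up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full
        self.clock = clock          # injectable so the behavior can be tested
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

if __name__ == "__main__":
    t = [0.0]                                    # fake clock for a deterministic demo
    bucket = TokenBucket(rate=1, capacity=3, clock=lambda: t[0])
    print([bucket.allow() for _ in range(5)])    # [True, True, True, False, False]
    t[0] += 2.0                                  # two seconds later: two tokens refilled
    print([bucket.allow() for _ in range(3)])    # [True, True, False]
```

Even a limit this simple turns "dump 191 million records in an afternoon" into a much slower job, which is the manual-versus-programmatic distinction in practice.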

Is it public information which elections a person votes in?


There have actually been some interesting social science experiments to shame people into voting more often by using their turnout record http://www.usatoday.com/story/news/politics/2014/10/30/inter...

But eliminating the hoops you have to jump through to get that info isn't a good thing either.

Are you saying that accessing public data should be possible, but require a willingness to jump through hoops?

What's your thinking behind that?

I'm not the person you asked, but to play devil's advocate...

I think law enforcement agencies should be able to wiretap suspects of crimes to listen to their phone calls, subject to judicial oversight. I do not think they should have unrestricted, unlogged access to a database of recordings of every phone call ever made by every citizen.

Same line of thinking IMO. There's a difference between an individual identifying themselves and signing a usage agreement before requesting public records from an individual county, versus dumping online a public database that lets you query the personal details of 191 million people. The same data is available, but the extra hoops you have to jump through change the potential for abuse.

I appreciate your thoughtful comment.

With respect to the law enforcement example, the access there is to non-public data. That seems like an essential difference to me, but perhaps you're citing this situation in preparation for your third paragraph.

In your third paragraph, I think I'm seeing a dilemma. I don't understand how jumping through the hoops changes the potential for abuse.

Let's say 30 people go to the county records office and get the data. They all sign an agreement of some kind. 30 people now have (presumably) exactly the same data set. One of the 30 violates the agreement and publishes it anonymously on the Internet. The data is now "in the wild" and any potential for abuse that it held is now up for grabs.

I recognize that I may be missing something, but if I am, I'm not able to see it. I might need more coffee. ;-)

I agree that there's a difference between public and non-public data.

With respect to the public data though, this is an issue which we've been wrestling with (and largely punting on) since various public records became readily available in electronic form. The fact is that, in the US, there's quite a bit of public information available about individuals. Much of this information (think home sales, past addresses, any court records) is public for historically sound reasons. However, there is a huge difference between scouring dusty town, county, and state records at considerable cost of time and money and making a few quick queries on the Internet, possibly paying a nominal fee if you really want to go deep.

But, to your point, we effectively implement privacy through obscurity/bureaucracy to shield ostensibly public information that we aren't comfortable with just anyone accessing. And it probably works about as well as security through obscurity.

While it's poor form to have a leaky database, this information is largely public and dirt cheap. You can buy a whole state's worth of data for a couple hundred bucks or a few cents a name. That includes whether or not you're registered to vote in any specific primary.

Doesn't look like who you voted for is disclosed -- I'm not sure that this data even exists. I suspect in most states, you go in to vote, your name is crossed off a list, you're assigned a hash, and that hash votes, and there's no database of "John Smith voted for Jane Doe."

In my home state, your registration is printed in a poll book. It has a line number. As you come in to vote, you sign a log and the poll worker marks that you voted. The way they used to mark it was by writing your sequential line number from the sign-in log next to your name in the poll book. So, signature 73 would match up to James Smith.

We then implemented electronic voting machines with voter-verifiable paper tapes that allow you to see your votes and could, if absolutely necessary, be used to do a manual recount using paper records. These tapes were on the same type of paper used for other receipts, but were fed from one reel to another and stored in a locked box on the machine.

During the first election these machines were used, I went with another poll watcher to the precinct of a politician who was adamant about how secure and wonderful the machines were, and I kept my own log of which of the five machines each person used after signing in.

So, at the end, I had my log (line 73 to machine 5, line 74 to machine 1, etc), the nice sequential sign-in sheet that matched easily to the easy-to-read printed poll book, and the paper tapes (required to be open to inspection).
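A rough Python sketch of why those three records were enough, assuming each voter used a machine in the same order they signed in and each machine's tape lists ballots in the order it printed them. The function name and data layout are hypothetical; the point is that a per-machine counter turns sign-in order into a position on that machine's tape.

```python
from collections import defaultdict

def link_votes(poll_book_marks, machine_log, tapes):
    """
    poll_book_marks: sign-in number -> voter name (the number written in the poll book)
    machine_log:     sign-in number -> machine the voter used (the observer's own log)
    tapes:           machine -> ballots in the order that machine printed them
    Returns voter name -> ballot, assuming voters used machines in sign-in order.
    """
    used = defaultdict(int)          # ballots already attributed, per machine
    linked = {}
    for n in sorted(machine_log):    # walk the sign-in sheet in order
        machine = machine_log[n]
        position = used[machine]     # this voter's ballot is the next one on that tape
        tape = tapes.get(machine, [])
        if n in poll_book_marks and position < len(tape):
            linked[poll_book_marks[n]] = tape[position]
        used[machine] += 1
    return linked

if __name__ == "__main__":
    marks = {73: "Voter A", 74: "Voter B", 75: "Voter C"}  # made-up names
    log = {73: 5, 74: 1, 75: 1}                             # "line 73 to machine 5, ..."
    tapes = {5: ["Gov: X"], 1: ["Gov: Y", "Gov: X"]}
    print(link_votes(marks, log, tapes))
```

Breaking either link (tick marks instead of sign-in numbers, or tapes whose order is unrelated to cast order) defeats this join.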

We were able to match votes to people for all but seven of the votes (the last seven, actually, and we had a good idea who matched with which). The politician flipped his shit when I was able to demonstrably prove he voted for someone other than his party's candidate for governor.

The poll procedures were changed the next election cycle. The paper tapes were not allowed to be produced and the poll workers used a tick mark instead of a number in the poll books. The machines remain in use.

>The politician flipped his shit when I was able to demonstrably prove he voted for someone other than his party's candidate for governor. The poll procedures were changed the next election cycle. The paper tapes were not allowed to be produced and the poll workers used a tick mark instead of a number in the poll books. The machines remain in use.

So you're the person who killed democracy? Given that voting is a "trade secret" and the code will never be inspected, do you think abolishing the paper trail is a good idea?

A paper trail without voter secrecy is possibly worse than no paper trail. It's not hard to design a system with both (shuffling the ballots is usually a good measure).
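One way to get both, sketched in Python under the assumption that the paper trail only needs to support recounts: insert each ballot at a uniformly random position, so the stored order says nothing about cast order while the tally is unchanged. `ShuffledBallotBox` is a hypothetical name; real systems do this physically or with audited procedures, not a Python list.

```python
import secrets
from collections import Counter

class ShuffledBallotBox:
    """Keeps every ballot for a recount, but not the order in which they were cast."""

    def __init__(self):
        self._ballots = []

    def cast(self, ballot: str) -> None:
        # Inserting each new ballot at a uniformly random slot (one of len+1) yields a
        # uniformly random ordering overall, so position reveals nothing about cast order.
        slot = secrets.randbelow(len(self._ballots) + 1)
        self._ballots.insert(slot, ballot)

    def ballots(self) -> tuple:
        return tuple(self._ballots)

    def recount(self) -> dict:
        # A tally does not depend on order, so the shuffle costs nothing here.
        return dict(Counter(self._ballots))

if __name__ == "__main__":
    box = ShuffledBallotBox()
    for b in ["Gov: X", "Gov: Y", "Gov: X"]:
        box.cast(b)
    print(box.recount())   # X counted twice, Y once
```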

> I suspect in most states, you go in to vote, your name is crossed off a list, you're assigned a hash

I don't know about the US, but I once knew a programmer who worked on the Russian voting system.

I honestly don't think he was qualified enough to know what a "hash" is.

Given that it's a Russian voting system, I'm not sure that this is unintentional.

Stupidity, not malice.

In my experience as an election observer, the higher the official, the less interested he was in falsification; it was the lowest ranks who wanted to prove by any means necessary that their areas were loyal, while the higher-ups wanted to avoid embarrassment and didn't worry much about the outcome, since population's loyalty is pretty sincere, thanks to the propaganda machine.

> since population's loyalty is pretty sincere, thanks to the propaganda machine

Or is it that we believe the population is loyal because of the propaganda machine's effect on us?

Take my word for it, the propaganda machine lies about a lot of things, but unfortunately, it isn't one of them.

"Evil government oppressing discontented population" is a nice trope, but the reality is grimmer.

> Take my word for it

I don't think I will in this case. How do you know? I have yet to see any reliable data supporting it, and many dictators have claimed overwhelming public support with polls and elections to match.

I'm not sure how anyone could reliably poll Russian citizens. Who would dare say something negative about Putin? How do you know you can trust the interviewer? How could you rely on anyone keeping your opinion secret from the intelligence services? How much risk are you willing to take for something as trivial as answering a survey? What polling service would dare publish a negative result for Putin?

Lived there for almost all my life, mostly among well-educated, well-informed people, and still a depressingly large number of them are sincere in their support.

You know why I despise the Western anti-establishment activists who are afraid of the US and Europe turning into totalitarian regimes? Because they have a very naive image of what a totalitarian state is and how it starts. Today's Russia didn't start in the former KGB or in government offices; it started in Stalin-loving nutjob papers. It started with ideology, and that ideology filled a vacuum that the kleptocratic elites desperately needed.

If you fear for your country's future, know that a self-serving capitalist asshole is not nearly as dangerous as a sincere activist who really wants to make the world a better place.

> You can buy a whole state's worth of data for a couple hundred bucks or a few cents a name.

Where is this data sold?

Secretary of State's office for a particular state. WA charges some trivial amount for the trouble.

There are companies that add a bit of value by collecting the data and spiffing up the formatting a bit, then burning it to a CD/DVD for you. When I ran for city council of Redmond, WA, I went 15 minutes down the road to a place in Bellevue and just picked up the CD. It gives name, address, and whether or not one voted in each of the last X elections. SELECT * FROM voters WHERE "voter voted in 50% of elections" to get bang for the walking-door-to-door buck, throw that into MapPoint (tells you how long ago it was), and print out the walking sheets.
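The `SELECT` above is pseudocode; here is one way it might look as real SQL, run through Python's built-in `sqlite3` against a made-up two-table layout (one row per voter, one row per voter per election). The schema, column names, and the 50% threshold are assumptions for illustration, not the format of the actual CD.

```python
import sqlite3

def build_demo_db():
    """In-memory stand-in for a voter-file extract (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE voters  (voter_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE history (voter_id INTEGER REFERENCES voters, election TEXT, voted INTEGER);
    """)
    conn.executemany("INSERT INTO voters VALUES (?, ?, ?)", [
        (1, "Voter A", "1 Main St"), (2, "Voter B", "2 Main St"), (3, "Voter C", "3 Main St")])
    conn.executemany("INSERT INTO history VALUES (?, ?, ?)", [
        (1, "2011", 1), (1, "2012", 1), (1, "2013", 0),   # voted 2 of 3
        (2, "2011", 0), (2, "2012", 0), (2, "2013", 1),   # voted 1 of 3
        (3, "2012", 1), (3, "2013", 0)])                  # voted 1 of 2
    return conn

def frequent_voters(conn, threshold=0.5):
    """Voters who turned out in at least `threshold` of their recorded elections."""
    return conn.execute("""
        SELECT v.name, v.address, AVG(h.voted) AS turnout
        FROM voters v JOIN history h ON h.voter_id = v.voter_id
        GROUP BY v.voter_id, v.name, v.address
        HAVING AVG(h.voted) >= ?
        ORDER BY v.address              -- crude stand-in for a walking order
    """, (threshold,)).fetchall()

if __name__ == "__main__":
    for name, address, turnout in frequent_voters(build_demo_db()):
        print(name, address, round(turnout, 2))
```

Averaging a 0/1 `voted` column is a cheap way to get a turnout ratio per voter; the result can then be exported for mapping and walk sheets.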


Among many others.

The data itself is mostly free public records, but it's worth paying to get all 50 states in one place.

The Secretary of State's Office in each state is usually responsible for maintaining and selling copies of this information.

Doesn't Diebold determine who voted for whom? /s

"Could it be one of their non-hosted clients leaking the database? Maybe. Could it be that someone hacked one of their clients and stored a copy of the database at this IP address? Maybe. Could it be that an employee of a client decided to make themselves a copy for their own purposes? Maybe. The possibilities are numerous. We really don’t know and DataBreaches.net declines to speculate."

Umm... you just speculated.

They refuse to speculate if one of the speculations is superior to other speculations. Thus, if all speculations are still on the board to be further speculated upon, one cannot speculate further.

My head hurts.

You win the rationalization of the day award: two aspirins. ;-)

So where is the database?

If there's any legal or ethical problem with doing this using the Ohio Voter Registration files, I would like to know. I recently made an interface to it[1] to use when gathering ballot access petition signatures for Bernie Sanders in Ohio[2]. It's freely downloadable data, though[3], and the Board of Elections officials I shared it with weren't aghast at the idea.

[1] http://gobernie.net/ Source code: https://github.com/coventry/voter_lookup

[2] https://www.facebook.com/groups/929112173802716/

[3] http://www2.sos.state.oh.us/pls/voter/f?p=111:1:0::NO:RP:P1_...

My team has the same questions; we are working on a unified voter DB. Care to collaborate? Looks like The Hill just picked up on this. My username at gmail.

I'd be pleased to look for the source with you: emerges.com@gmail.com

Seeing the "_id" : ObjectId() fields indicates to me that this is likely a MongoDB instance that was open to everyone.
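Those `ObjectId` values also carry a little information of their own: in MongoDB's ObjectId format, the first 4 bytes are a big-endian Unix timestamp in seconds. A minimal Python sketch that pulls that timestamp out of `ObjectId("...")` literals in a dump or transcribed screenshot. Caveat: it reflects when the id was generated (normally at insert time, by the driver), and clients can supply their own `_id`, so treat it as a hint about when records were loaded, not proof.

```python
import re
from datetime import datetime, timezone

# 24 hex digits = 12 bytes; the first 4 bytes are the creation timestamp.
OBJECTID_RE = re.compile(r'ObjectId\(\s*"([0-9a-fA-F]{24})"\s*\)')

def objectid_times(text: str):
    """Return the creation time embedded in each ObjectId("...") literal in `text`."""
    times = []
    for hex_id in OBJECTID_RE.findall(text):
        seconds = int(hex_id[:8], 16)   # big-endian seconds since the Unix epoch
        times.append(datetime.fromtimestamp(seconds, tz=timezone.utc))
    return times

if __name__ == "__main__":
    sample = '{ "_id" : ObjectId("56807b800000000000000000") }'
    print([t.isoformat() for t in objectid_times(sample)])  # ['2015-12-28T00:00:00+00:00']
```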

There's been so much talk about these recently[1] that I'm surprised this didn't come up sooner.

[1] https://blog.shodan.io/its-still-the-data-stupid/

Yeah, the screenshot is using MongoVUE.

I own a few services that rely on voter file data we acquire from many sources, and I'm aware of quite a few others (so I feel like I need to chime in here, haha). I suppose sources aren't going to disclose the actual resource or IP address until law enforcement tracks them down? I haven't been able to find any reports of anything specific.

It may not be NationBuilder per se, but it could be one of their many integration points maintained by third parties:


A well crafted shodan.io search given the already public information (approximate size, in the US, no password, etc.) should give you a good start. It's already been found once.

So here's a starting point: https://www.shodan.io/search?query=port%3A27017+country%3AUS...

I'm not sure how to search by database size though. But I'd estimate that 190 million voter records, at 1 KB each, would be a little under 2 GB if my math is right.

Here is a MongoDB named voters† that claims to be 472166432768 bytes long (a little short of 2500 bytes per voter, if there are 191e6 voters).

I'm not familiar with MongoDB and don't have the time to learn right now. But do check it out!


Yup, you found it.

Confirmed: db.blackhole_nj.find({$and:[{"fname": "Christopher"},{"mname": "J"},{"lname": "Christie"}]})

The governor's DOB in the results matches what's in Wikipedia.
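For readers new to MongoDB: in the query above, `$and` over three single-field documents is equivalent to one filter document listing all three fields, because MongoDB ANDs the fields of a filter implicitly. A toy Python matcher covering only equality and `$and` (a tiny hypothetical subset, not how MongoDB actually evaluates queries) shows the equivalence.

```python
def matches(doc: dict, query: dict) -> bool:
    """Toy subset of MongoDB filter semantics: field equality and $and only."""
    # Every key in the filter must hold, so fields side by side are implicitly ANDed.
    for key, cond in query.items():
        if key == "$and":
            # Every sub-filter in the list must match.
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif doc.get(key) != cond:
            return False
    return True

if __name__ == "__main__":
    doc = {"fname": "Jane", "mname": "Q", "lname": "Doe"}   # made-up record
    explicit = {"$and": [{"fname": "Jane"}, {"mname": "Q"}, {"lname": "Doe"}]}
    implicit = {"fname": "Jane", "mname": "Q", "lname": "Doe"}
    print(matches(doc, explicit), matches(doc, implicit))    # True True
```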

Having never used MongoDB, I guess I should go searching for some code examples.

How did you run that?

Any MongoDB client, I used Robomongo at the time. But when I tried again the next day I could no longer connect.

It's not any of those. Are you sure it's running Linux? Also, you're off by about two orders of magnitude: 191M records * 1 KB each is 191 gigabytes.
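The back-of-envelope numbers in this sub-thread are easy to check. A small Python sketch, assuming decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes):

```python
def total_gb(records: float, bytes_per_record: float) -> float:
    return records * bytes_per_record / 1e9          # decimal gigabytes

def per_record_bytes(total_bytes: float, records: float) -> float:
    return total_bytes / records

if __name__ == "__main__":
    print(total_gb(191e6, 1_000))                         # 191.0
    print(round(per_record_bytes(472166432768, 191e6)))   # 2472
```

So 1 KB per record gives 191 GB, and the 472,166,432,768-byte "voters" database works out to roughly 2,472 bytes per voter, consistent with the "a little short of 2500" estimate.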

I included Linux in the search based on this earlier comment: https://news.ycombinator.com/item?id=10802149

Using census info for age distributions, this most likely amounts to every registered voter.

The site also seems to be having a rough time with the traffic. Here is the cached page: http://webcache.googleusercontent.com/search?q=cache:BXSmNL6...

Earlier this year I spent $25 for a FOIA request for my state's entire voter database. This isn't exactly private information.

What benefit does revealing registrations provide? How would the public interest be harmed by shielding names, addresses, and phone numbers from disclosure?

A lot of comments here saying "so what it's public record." But not a lot of asking if it should be. Something being the status quo doesn't make it right.

It seems like this is a MongoDB database.

Scanning the US IP ranges for Linux hosts (as mentioned in the article) with port 27017 open with ZMap and then running a script that connects to the open database and saves the size of the database in a file would be a good place to start for those who want to find it.

Undoubtedly, the torrent will be available in a matter of hours. Let someone else do the dirty work.

Haven't found one yet. Have you?

Someone mentioned a private torrent but I can't confirm (no access): https://www.voat.co/v/technology/comments/752962

Also a good way to go to jail. Please think long and hard before accessing random computers on the internet. The fact that they are unprotected is unfortunate but irrelevant.

I'm interested in the legal argument against this: if it is illegal, how do sites like this operate?


or even https://scans.io/ which uses something like https://zmap.io/

Not saying you're wrong, wondering where the line is if there is one.

Port scans (what Shodan does) are in a grey area. Actually using the port scan data to connect to a networked service and exfiltrate data is definitely illegal.

While you might appear to be overly cautious, I'm going to agree with you. Better not to get wrapped up in the witch hunt, especially now that we know authorities are investigating.

Look at our more detailed compilation of statutes at http://www.emerges.com/assets/images/docs/Restricted-State-V.... Note that NationBuilder is wrong about permissible MS data usage.

More critically, NationBuilder may erroneously be denying accountability.

“Nation Builder is under no obligation to identify customers, and once the data has been obtained, they cannot control what happens to it,”

Specifically, look at the statutes for MA and CA. Voter list purchasers are clearly required, in writing, to get written pre-approval from the respective state PRIOR to releasing the data. But what if NationBuilder did not sign the affidavit with the state, i.e., what if NationBuilder got the data from someone in the Democratic or Republican national or state parties?

If either of the two major parties released the data without getting written pre-approval from the state, then they may all be in breach of contract and liable, NationBuilder included.

Not sure what LE is doing here unless this is operated by an org in one of the states where it's illegal to publish this data.

E.g., a Florida company publishing California voter records in Florida can't possibly be committing a crime.

I don't see why the FBI would get involved either, since there don't seem to be any federal crimes happening here.

What's amazing is that there seems to be no way to contact anyone to take down this database; it's just sitting there happily serving up data to anyone who asks. No contact info, no way to track down the owners.

Almost makes you think knocking it offline would be worthwhile just so someone will take a look at it.

1) This information is mostly public. It contains party affiliations and participation in elections, but no details about votes.
2) It looks like the data came from NationBuilder, which spent around 2-3 years building/compiling a voter registration database that's more accessible to the public than other proprietary solutions.

Kentucky has a site where you can look up your party using your name and birthday. It also shows your home address. If you're wondering, John Calipari is an Independent while Rick Pitino is a Democrat. What's interesting is that it shows Cal's address, but Pitino's only shows U of L. https://vrsws.sos.ky.gov/VIC/

We actually found a ton of this stuff a couple of months ago: http://blog.binaryedge.io/2015/08/10/data-technologies-and-s...

Would be interesting to validate the voting record in elections according to the DB against the outcome.

As far as I can tell, the only "breach" here is revealing what candidates or parties voters chose. The voter registries are public in nearly all states. I've used public voter registries to look up addresses, even when I only had a name. Information such as a personal address, phone number, etc have always been trivial to look up.

> The voter registries are public in nearly all states.

The article specifically mentions this:

> While the majority of states make their voter registration lists available as a matter of public record and do not restrict use, some states restrict use. For example, South Dakota requires the requestor of voter registration data to sign a statement …. In California, information on voter registration cards is considered confidential, and subject to many restrictions to access and use …. And in Hawaii, voter registration information may only be used for elections and by the government.

It is implied that the victims include voters from these three states (and explicitly stated that they include voters from California), so it is a genuine data breach in that sense.

> As far as I can tell, the only "breach" here is revealing what candidates or parties voters chose.

Also, as other commenters have mentioned, the list of fields at http://www.databreaches.net/wp-content/uploads/DataFields.jp... does include your party (which I think is information as public as the voting record anyway), but does not seem to include your specific vote.

I don't disagree with the general notion of your comment, but in

> As far as I can tell, the only "breach" here is revealing what candidates or parties voters chose

Why put "breach" in double-quotes? That's a very serious privacy concern if voters did not want this information to be public.

> Why put "breach" in double-quotes? That's a very serious privacy concern if voters did not want this information to be public.

I think that this is a very important point. It doesn't matter how important it is to you that my information is public; the seriousness of its exposure depends on how important it is to me. (I am using 'you' and 'me' here not to argue with you specifically—in fact I agree with you!—but rather as generic pronouns.)

It's not a breach because nothing was exposed that wasn't already public. Public voter rolls are one way we prevent fraud.

I don't expect who I voted for to be public record. I'm not a lawyer, but I don't believe this information is intended to be public record. There are risks of not only employers, but government workers and officials discriminating against someone based on their vote. The ballot should be as secret as possible.

edit: I pieced together information from other comments and noticed that who one voted for is not available, even through this breach. That's great. My comment here was addressing the breach as it was presented in this comment thread.

Who you voted for isn't public, but if you voted in a primary, it is often public which party's primary you voted in. Voting in a particular party's primary doesn't necessarily mean you're an actual supporter of that party though. I've heard of plenty of cases where members of one party voted in the other party's primary because they wanted the less-electable candidate to be that party's nominee.

Your replies in this thread suggest that you take the public nature of voting registration lists to be axiomatic, but it is not; as the article discusses (and as I quoted at https://news.ycombinator.com/item?id=10801570 ), there are (more or less strict) laws regarding confidentiality of voter registration lists in some states, some of whose voters are affected by this breach.

Laws in various states restrict how you can use the data, but does any state prevent you from accessing it? It's illegal to misuse the data regardless of how you acquired it, including from this "leak".

I assure you there are already databases of every registered voter in America. You're not allowed to do certain things with some of the data, but it's always been available.

> Laws in various states restrict how you can use the data, but does any state prevent you from accessing it?

That's an interesting and subtle point, to which I don't know how to respond fairly. My information on these laws comes solely from the article, which says:

> In California, information on voter registration cards is considered confidential, and subject to many restrictions to access and use ….

"[S]ubject to many restrictions" links to https://www.lavote.net/Documents/purchase-order-for-voter-el.... I suppose that you could argue that these restrictions do not prevent you (assuming you are a US person) from accessing the data, but I think that the data are far from being public, which is what I understood you to be saying.

When they vote, they implicitly consent to that data being public. The problem is that this DB is serving the data to users who are not going to the proper office to ask for it.

> revealing what candidates or parties voters chose

This did not happen.

> > revealing what candidates or parties voters chose

> This did not happen.

The list of fields at http://www.databreaches.net/wp-content/uploads/DataFields.jp... does include 'party', although I suppose that information is only as confidential as the voter registration anyway.

EDIT: The replies indicate that my last sentence was unclear. Contrary to what it seemed to say, it meant that, since the voter registration isn't confidential, neither is the party information (except that sometimes the voter registration is confidential, as discussed in the article and my 'uncle' comment (https://news.ycombinator.com/item?id=10801570 )). Maybe I should have phrased it, equivalently but more clearly, as "that information is as public as the voter registration anyway".

The point of my comment was just that it is not true that "voters' parties weren't revealed", which I took to be part of dfc's comment (https://news.ycombinator.com/item?id=10801543 ).

Party registration is not private; it's public. Even if it were private, in many states you can only vote in a primary if you are a registered member of that party, and those members are then included in the voter registration lists.

The party affiliation data in the voter rolls is probably one of the most heavily used fields in there, to boot. A lot of petitions need to be signed by a certain number of registered party members: the local party apparatus has to get signatures each (election) year to demonstrate that they are indeed the local apparatus for that party, individuals wanting to run in a primary need a certain number of signatures to get on the primary ballot, and you may also need to collect signatures to force an open (write-in) primary with no specified candidates...

Campaigns will also use it to target their literature; you might send one mailer to the members of your own party ("X is a totally loyal party member, here's seven things he did to support our core party values!"), another mailer to the opposition party ("X is not a total baby eater, here's five ways he crossed party lines to support things you probably care about!"), and a smattering of other mailers to the third parties to emphasize support for their particular interests.

The weirder thing to me is that these databases also include birthdays. You could send birthday cards to everyone.

Party registration is not secret at all, at least not in any state I'm familiar with.

People should understand that the party declaration is public information (or at least, not very private). Whether they do is another question.

Texas doesn't have party registration, which is why that field might be blank for the example query listed in the article.

You have no way to know this, unless you are the researcher or have access to the full list of fields somehow.

The list of fields shown is not complete, as they clearly state. They may have a reason known only to them (don't ask me what it is) for holding back information on the fields they did not show.

> I've used public voter registries to look up addresses, even when I had only a name

This is why Wikipedia founder Jimmy Wales (and presumably other victims of stalking or DV) does not vote.

It does not have candidates voted for (how could it?) Just party registration, which is public. All of the data in this database appears to already be public records.

That's the question: why are the "primary_2000" and "general_2000" fields redacted strings? http://www.databreaches.net/wp-content/uploads/DataFields.jp...

The values for those fields would usually be a boolean indicating whether you voted or not. For a primary it might have the party, meaning which party's primary you voted in.

For a Get Out the Vote operation it's very valuable to know how consistently people have voted in the past. If you look like a supporter but you vote inconsistently, I might call and ask if you need a ride to the polls. If you already vote every time, I'll allocate resources elsewhere, or ask you to donate/volunteer.
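The targeting logic described here can be sketched in Python, assuming history fields like `general_2000` hold a boolean and `primary_2000` holds the party code of the primary voted in (empty if none). The function names and the 75% "consistent voter" cutoff are hypothetical, not any campaign's actual model.

```python
def turnout_rate(history: dict) -> float:
    """history: field name -> value. True or a party code means voted; False/None/'' means not."""
    if not history:
        return 0.0
    return sum(1 for v in history.values() if v) / len(history)

def gotv_action(is_supporter: bool, history: dict, consistent: float = 0.75) -> str:
    if not is_supporter:
        return "skip"                                # spend resources on likely supporters
    if turnout_rate(history) >= consistent:
        return "ask to donate or volunteer"          # already votes reliably
    return "call and offer a ride to the polls"      # supporter who votes inconsistently

if __name__ == "__main__":
    history = {"general_2000": True, "primary_2000": "DEM",
               "general_2004": True, "primary_2004": ""}
    print(turnout_rate(history))         # 0.75
    print(gotv_action(True, history))    # ask to donate or volunteer
```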

Yes, my point is that the author did not care to redact the other booleans for the other years, but did redact those two specifically.

It's probably whether or not you voted in those races, not who you voted for.

That said, both RNC and DNC databases track who you are likely to have voted for in each race, but that's just a guess. It's a secret ballot. Nobody knows who you voted for.

> revealing what candidates or parties voters chose

... thereby destroying the integrity of the secret ballot, enabling vote-selling, intimidation, etc.

It's just a consequence of the ridiculous party affiliation declaration of voter registration in the USA. It may or may not have anything at all to do with how a person actually votes, and is not a record of how people have ever voted (except coincidentally, if they both registered, say, as a Republican and voted for all Republican candidates). Elsewhere, party membership/affiliation is completely separate from voter registration; participation in the candidate selection process, etc., is governed internally by the party and its membership.

> revealing what candidates or parties voters chose

Isn't this supposed to be anonymous? Is it just the affiliation declared when registering for voting, or the actual vote itself?

It also seems to include data on when they have voted. Which year and primary versus general. I'm sure this is going to be valuable to someone.
