Email Address Disclosures, Preliminary Report (letsencrypt.org)
199 points by Aaronn on June 11, 2016 | 107 comments



Head of Let's Encrypt here. Our automated mail system had a bug that accidentally exposed about 1.9% of subscriber email addresses to the same 1.9% of recipients.

Our sincerest apologies for this mistake. We will be doing a thorough postmortem to determine exactly how this happened and how we can prevent something like this from happening again.

There is a preliminary report on the issue here:

https://community.letsencrypt.org/t/email-address-disclosure...


We've changed the URL to that from https://twitter.com/aaronraimist/status/741456355693760513 (and the title from "Let's Encrypt is sending out a list of all their users' email addresses").


What do you need email for, anyway? Not giving you one makes it pretty easy not to leak it.


They send reminder emails when certificates are due to expire.


FWIW providing an email address is optional.


Reminders/alerts of certificate expiration, for a start.


How big do you estimate your fine would have been if you were operating in the EU?


It's good to see that the new EU data protection regulation already influences the discourse, even though it only applies from May 25, 2018. More importantly, it applies directly or indirectly even though Let's Encrypt is a US-based organization.

For those interested, I can highly recommend the documentary Democracy: Im Rausch der Daten (2015) [0]. For nine months the documentary follows MEP and rapporteur Jan Philipp Albrecht and his policy advisor Ralf Bendrath [1] (hacked a C64 in his youth, frequent attendee at C3) and gives a rare insight into the negotiations and process in Brussels. They even recorded the final negotiations in the backrooms, and for those knowledgeable on the topic it is very interesting to see how some of the deals were made. After the screening at IDFA the director said he hadn't set out to make a documentary on the EU data protection reform, and that Jan Philipp's rapporteur topic was chosen as an example of the EU process. Regrettably, this documentary isn't representative of the EU process, because most of the time the lobbyists of large corporations have far more influence. The documentary clearly shows that the final text is the result of the persistence and integrity of the three main characters: Albrecht, Bendrath and Reding. And as we're experiencing now, this has great influence on data protection well beyond Europe.

[0] http://www.democracy-film.de http://www.imdb.com/title/tt5053042/

[1] http://www.bfna.org/person/ralpf-bendrath


In case you're referring to the General Data Protection Regulation, it does not apply until May 2018. The penalty that would be applied would probably be this one, given that the number of prior offenses, the gravity and duration of the offense, the number of affected users and the damage caused, etc. need to be taken into account:

> Each supervisory authority shall have all of the following corrective powers:

> [...]

> (b) to issue reprimands to a controller or a processor where processing operations have infringed provisions of this Regulation;

(IANAL)


There are already similar laws in the UK and other countries.

See this recent example, where a sexual health centre accidentally leaked a list of 800 people who had attended HIV clinics.

They ended up with a £180,000 fine: http://www.bbc.co.uk/news/technology-36247186


Most people would consider their possibly/definitely having HIV to be a more personal piece of information than their email address.

If I'd been affected by this Let's Encrypt email thing I frankly wouldn't care, other than to question why it happened. (Just speaking for myself, I'm sure some people would care.) But if I was data in a leak revealing a personal serious medical issue, I absolutely would care.


Data protection is already regulated by the EU (only by means of a Directive rather than a Regulation) so the same principles apply across the EU/EEA, if not the exact rules.


Good point and very similar case (though in an entirely different category of "damage caused", plus there seemed to be a prior offense).


For sharing a few email addresses? $0 I'd imagine. It's not like it's anything important like credentials for shopping/banking or other details which could be used in identity theft. Worst case scenario: Google's spam filters have to work a little harder. You'd not even notice. Yes, some people have chosen to run their own mail servers for some reason and those people might conceivably get a bit more spam for a while.


> Yes, some people have chosen to run their own mail servers for some reason and those people might conceivably get a bit more spam for a while.

Once your email is out there, there's no going back.

Email addresses are personally identifiable information, and revealing a relationship between a business and a person is potentially very dangerous.

For example: http://www.bbc.co.uk/news/technology-36247186


You guys should all switch to disposable email addresses. It solves so many problems. If your address is leaked like that, no big deal, you just delete that alias. If a password is leaked along, again having website specific emails will make it very difficult to correlate your credentials with another website. If you start receiving spam on that address you just delete it. And so it ceases to be personally identifiable information.


Should I use a disposable name and a disposable address when I buy physical goods from an online retailer?


I actually start to think you should, without joking. At least the name.

If you look at the past 5 years, there is almost not a single major website that hasn't been hacked and hasn't leaked personal data. Not only do I see no sign of improvement, but it is rather accelerating. Leaking information on 30m+ people is now becoming common and barely makes the news outside of a few specialized websites like HN.

If you have a better alternative than feeding garbage data to websites who want to collect data they won't need (why would an online retailer give a shit that they are shipping a product to someone called Mr X rather than Mr Y?), I want to hear it.


Why not? What's the cost? What's the benefit? I create new accounts for all sites I use, including hacker news, every year or so.


Email addresses are absolutely personal data, and in the EU companies need to protect that data from being leaked.

(There would probably not be a fine; the company would be investigated and warned by the various regulators).


The reason for this screw up was guessed by a Twitter user, and his theory was confirmed by Josh from Let's Encrypt [1].

The whole mess was caused by the Python `email` package, and specifically the behavior of the `MIMEMultipart` object [2]. When you reuse the same `MIMEMultipart` object for multiple emails, each destination address is appended. The same problem takes place when you use Python 3 [3].

[1]: https://twitter.com/0xjosh/status/741487697059946497

[2]: https://docs.python.org/3/library/email.mime.html#email.mime...

[3]: http://i.imgur.com/XwWlUXv.png
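
The header behaviour described above can be reproduced with the standard library alone. This is a minimal sketch, not Let's Encrypt's actual mailer: the addresses, subject and loop are made up for illustration. It relies on the documented fact that assigning to a header on a `Message` object (which `MIMEMultipart` inherits from) adds a header rather than replacing an existing one.

```python
from email.mime.multipart import MIMEMultipart

# Hypothetical recipients, for illustration only.
recipients = ["alice@example.com", "bob@example.com", "carol@example.com"]

msg = MIMEMultipart()  # one object reused for every recipient (the bug)
msg["Subject"] = "Subscriber Agreement update"

to_headers_per_send = []
for rcpt in recipients:
    # Message.__setitem__ appends a new header; it does not overwrite.
    msg["To"] = rcpt
    # Snapshot of every To header the message carries at "send" time.
    to_headers_per_send.append(msg.get_all("To"))

for seen in to_headers_per_send:
    print(seen)
```

Each later message carries the `To` headers of all earlier ones, so the last recipient sees every address that came before theirs.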


I see that it's confirmed, but find it a bit odd that they had originally said "prepended between 0 and 7,618 other email addresses to the body of the email.", as this way it would be just a lot of "To" headers.


The design of MIMEMultipart causes duplicate "key: value" header lines to be printed, rather than comma-separated addresses as specified in RFC 2822. [1]

[1] https://tools.ietf.org/html/rfc2822#section-3.6.3


Multiple occurrences of the To, Cc, and Bcc fields are permitted (though obsolete).

https://tools.ietf.org/html/rfc2822#section-4.5.3


I was part of the disclosed list and can confirm that this is exactly what happened (plus an extra newline!)


It was not just due to this but also lack of QA/testing.


Since both users mentioned were the last in the list of addresses for the email they received, my money's on a trivial mistake like:

    getEmailBody(users[:i])

instead of

    getEmailBody(users[i])

I typically prefer a high level of polymorphism in my code/APIs (sensibly handling single inputs vs. arrays), but this is a great counter-example even if not the actual root cause. Every feature is also a liability. Double-edged sword. Etc.
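
To make the difference concrete, here is a small sketch of the two calls. `get_email_body` and the user list are hypothetical stand-ins for whatever a real mailer does. Note that in Python the slice `users[:i]` stops before index `i`, so a slice that ends with the recipient's own address would be `users[:i + 1]`.

```python
def get_email_body(users):
    """Hypothetical body builder that 'helpfully' accepts one user or many."""
    if isinstance(users, str):
        users = [users]
    return "Dear " + ", ".join(users)


users = ["alice@example.com", "bob@example.com", "carol@example.com"]
i = 2

single = get_email_body(users[i])       # intended: just the current user
sliced = get_email_body(users[:i + 1])  # slip: every user up to and including i

print(single)
print(sliced)
```

Because the function accepts both shapes, the slip raises no error; it simply produces a body that lists everyone so far.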


People on twitter suggest the reason could be a feature of a Python library: https://twitter.com/TvdW/status/741481798014664704

Assignment is interpreted as add-to-the-list operator.
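
If that reading is right, two simple guards would avoid it: build a fresh message per recipient, or delete the header before assigning it. The sketch below shows both with the standard `email` package. The function names, subject and addresses are illustrative assumptions, not code from the affected mailer.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(recipient, body):
    """Create a brand-new message for each recipient (nothing is shared)."""
    msg = MIMEMultipart()
    msg["To"] = recipient
    msg["Subject"] = "Subscriber Agreement update"
    msg.attach(MIMEText(body, "plain"))
    return msg


def readdress(msg, recipient):
    """Reuse a message safely: drop all existing To headers first."""
    del msg["To"]  # removes every To header; no error if none exist
    msg["To"] = recipient
    return msg


first = build_message("alice@example.com", "Hello")
second = build_message("bob@example.com", "Hello")
reused = readdress(first, "carol@example.com")

print(second.get_all("To"))
print(reused.get_all("To"))
```

Building a new message per recipient is the less error-prone of the two, since it does not depend on remembering which headers were set earlier.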


It was confirmed by Josh, co-founder of Let's Encrypt: https://twitter.com/0xjosh/status/741487697059946497.



This looks like it belongs in a "fractal of bad design" article.


A type system can really help here as well.

    sendUserEmail :: FromEmail -> ToEmail -> [EmailHeader] -> EmailBody -> IO ()

However, that also requires the discipline of wrapping a lower-level function that is probably:

    sendEmails :: FromEmail -> [ToEmail] -> [EmailHeader] -> EmailBody -> IO ()

However, even a functional language would shy away from using indexing, which could be argued to be the source of this problem.


Nothing about your suggested solution requires or is exclusive to a statically-typed language. "Replace a send-to-arbitrary-number-of-addresses function with a send-to-exactly-one-address function" is possible in dynamically-typed languages, too, and as you openly admit even a statically-typed language is likely to model email sending in a way that accepts multiple recipient addresses.

So your "type system can really help" is really just an irrelevancy you've come up with to try hide your attempt to shove your preferred programming paradigm onto other people.

You should probably stop doing that.


> So your "type system can really help" is really just an irrelevancy you've come up with to try hide your attempt to shove your preferred programming paradigm onto other people.

Not exactly, but it is a weaker advantage than I thought when writing it. The advantage is that languages like Haskell encourage specializing functions in that manner which avoid that specific bug.

However so does test driven development which is just as possible in dynamic languages.

The advantage the statically typed language has over even the dynamically typed language plus TDD is that the program won't compile, whereas the dynamic language plus TDD relies on programmer discipline.


I thought of that as well. I went and looked at my mailing list programme in Racket, and the mail procedure takes many recipients, but in the form of rest arguments, like: (define (send-mail msg . recipients) ...)

Passing it a list makes the recipients arg a list containing a list of recipients. To actually send to multiple recipients I explicitly have to use the apply procedure, which passes all list elements as arguments.

As you said, not a statically typed language. Types would have made an eventual error easier to debug, though.


Well, using the type system, weak or strong, to force correct code either at compile time or runtime would help. I can write a function that uses Java reflection for polymorphism as easily as I can use JavaScript reflection to enforce an API contract.

    // JavaScript
    if ( Array.isArray(address) ) {
      throw new Error("Must only pass a single email address.");
    }

    // Java
    void sendEmail(Object recipientOrRecipients) { ... }

There are times when polymorphism is low-risk, and there are times when it's better safe than sorry. Best to know your risk model (and your libraries) and act accordingly.


Sounds like an argument to write

    for user in users:
        getEmailBody(user)

instead of

    for i in range(len(users)):
        getEmailBody(users[i])

(for the languages that let you do so).


I fixed a very similar bug in an ASP.NET site a few weeks back. Email generation had a factory class where, each time it wished to construct an email, it would set From = this, Subject = that, Body = the other, but To was a list, so it wrote To.Add(user.Email) without a To.Clear() call at the start. So the first email went out to one person, the second to two, the third to three… Someone else had made the site, and when I first received a report of it I briefly looked through the code and missed it. When another client complained of the addresses, plus having received sixteen emails, I looked again and realised what was going on.
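
The same accumulation is easy to reproduce in any language. Below is a Python analogue of that factory, with hypothetical class and field names. The only difference between the buggy and fixed paths is clearing the recipient list before adding to it.

```python
class EmailFactory:
    """Hypothetical stand-in for the reused email factory described above."""

    def __init__(self):
        self.to = []  # shared across every email the factory builds

    def build(self, user_email, clear_first):
        if clear_first:
            self.to.clear()  # the equivalent of the missing To.Clear()
        self.to.append(user_email)
        # Copy the list so each built email keeps its own recipients.
        return {"to": list(self.to), "subject": "Hello"}


users = ["alice@example.com", "bob@example.com", "carol@example.com"]

buggy = EmailFactory()
buggy_counts = [len(buggy.build(u, clear_first=False)["to"]) for u in users]

fixed = EmailFactory()
fixed_counts = [len(fixed.build(u, clear_first=True)["to"]) for u in users]

print(buggy_counts)  # recipients per email without clearing
print(fixed_counts)  # recipients per email with clearing
```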


Preliminary report out from Let's Encrypt: https://community.letsencrypt.org/t/email-address-disclosure...

As far as I know, all emails starting with 0-9, A-Z and at least part of 'a' were exposed. I did not get one starting with 'g', so it's somewhere between 'a' and 'g' that it got stopped.

Edit: "7,618 out of approximately 383,000 emails" were sent out


Was just able to confirm, it's up to and including your email address. Mine starts with m so I see 3,761 email addresses. But for me, none lexicographically after my email address are exposed.

Edit: Just want to add that I've made a similar mistake before (with a smaller user base). So I understand how easily these bugs occur. Given all their progress in the last few years, I still believe that the privacy and security of such a large portion of the Web could not be in better hands. Props to the LE team for a quick, responsible response.


You mean M, not m. (it's in ASCII order). Also giving out the number of addresses you see will allow someone after yours to connect your username with your email address if you weren't aware.
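
A quick illustration of why the case matters, assuming the list really was in plain ASCII order as suggested: Python's default string sort compares code points, so digits come before uppercase letters, which come before lowercase ones. The addresses are made up.

```python
addresses = [
    "mallory@example.com",
    "Mallory@example.com",
    "73bits@example.com",
    "alice@example.com",
    "Zed@example.com",
]

ordered = sorted(addresses)  # plain code-point (ASCII) ordering
for address in ordered:
    print(address)
```

Under this ordering an address starting with "M" sits well before one starting with "a", while one starting with "m" comes after it.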


Yes, sorry, "M". And if this comment[1] is true, then that number won't reveal much. Their mailer is probably distributed across several nodes.

[1] https://news.ycombinator.com/item?id=11881953


Multiple different images and a pastebin I saw posted on twitter showed the same starting set of emails.


I also didn't get one, and mine starts with "an." So a good chance they caught it pretty early.


Or, perhaps it's just been too long since I've used it actively? Sounds like parent did in fact have their email at least in the body somewhere, even if they didn't get one sent to them. Perhaps I'm the same.


Mine starts with 'ab' and is ~7200 on the list


It sucks this happened but I don't really care. You guys are providing such an amazing and sorely needed service I have no problem cutting you some slack. I hope others will too. Of course those working for companies whose lunch you're eating will likely run with this as far as they can.


Interesting that since the list of addresses was sequentially prepended to (if I understand the wording of the notice correctly), anyone who anonymously shares the list will incriminate themselves, though to a smaller and smaller pool of peer customers.


A simple solution to this would be to chop off an arbitrary number of addresses prior to disclosure. The first person can leak any number of emails, and the last can only leak one.


The list of addresses was prepended to the email, but the addresses were added to the end of the list itself. Thus, every recipient saw their own address as the last item in the list.


This reminded me exactly of Python's mutable default arguments:

http://docs.quantifiedcode.com/python-anti-patterns/correctn...
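
For readers unfamiliar with that pitfall, a minimal sketch: a default list is created once, when the function is defined, and is then shared by every call that does not pass its own. The function names here are illustrative.

```python
def add_recipient_buggy(addr, recipients=[]):
    # The default list is created once and shared between calls.
    recipients.append(addr)
    return recipients


def add_recipient_fixed(addr, recipients=None):
    # Use None as a sentinel and create a new list per call.
    if recipients is None:
        recipients = []
    recipients.append(addr)
    return recipients


add_recipient_buggy("alice@example.com")
leaky = add_recipient_buggy("bob@example.com")

add_recipient_fixed("alice@example.com")
clean = add_recipient_fixed("bob@example.com")

print(leaky)
print(clean)
```

The buggy version returns both addresses on the second call even though only one was passed in.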


Sounds like the popular "append e-mail address to e-mail text with each iteration of the loop while keeping the previous ones".


To clarify: the email addresses are in the body of the message, not the To field.


The Hyatt hotel in Switzerland did a similar thing a few weeks ago. They sent a mail shot to everyone using the CC function, not BCC. I complained and their response was that they'd recalled the mail so 'that was that'. Of course a recall means nothing to the hordes of Gmail addresses, etc. that the mail shot was sent to. It's a common problem and a big incentive to use throwaway addresses.


The new head of the IIA (Irish Internet Association) did a cc on the entire membership just a few days ago announcing her arrival. Felt pretty sorry for her. She actually did the bcc correctly the first time but forgot the attachment; then, correcting that, she did a cc.

A comedy of errors....


I once asked to be removed from a list and suggested they use BCC instead of CC. I accidentally did so by way of Reply-All. Boy did I feel stupid. For days. While everyone kept replying to me.


Interesting. Slightly embarrassing. Not a huge deal. Handled well.


For the curious this was the content of the email. Pretty generic.

"Dear Let's Encrypt Subscriber,

We're writing to let you know that we are updating the Let's Encrypt Subscriber Agreement, effective June 30, 2016. You can find the updated agreement (v1.1) as well as the current agreement (v1.0.1) in the "Let's Encrypt Subscriber Agreement" section of the following page:

https://letsencrypt.org/repository/

Thank you for helping to secure the Web by using Let's Encrypt"


Planned email blast accidentally cc'd other recipients, allowing users to see each other's email addresses. They caught it after <8,000 emails went out and are fixing the problem.


It wasn't a CC. If it were, there wouldn't have been a way to stop after only some had gone out, because that's not how CC works.


Isn't the CC header essentially an instruction for the local MTA? So their local MTA could have been relatively slowly working through the CC list (contacting each recipient mail server in turn). I'm not saying this is what actually happened.


No, MTAs don't look at the To or Cc headers at all. The addresses to deliver to are listed in the SMTP rcpt to command. The MUA can provide a completely different list to what's in the headers.
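
The distinction shows up directly in Python's `smtplib`, where the envelope recipients are a separate argument from the message text. This is a sketch under assumptions: the host name and addresses are placeholders, and `deliver` is defined but not called, so nothing here needs a mail server.

```python
import smtplib
from email.mime.text import MIMEText

# Envelope recipients: what is sent in SMTP "RCPT TO" commands.
envelope_rcpts = ["alice@example.com", "bob@example.com"]

msg = MIMEText("Hello", "plain")
msg["From"] = "noreply@example.org"
msg["To"] = "undisclosed-recipients:;"  # header only; not used for routing
msg["Subject"] = "Envelope vs. headers"


def deliver(host="localhost"):
    """Hand the message to an MTA; requires a reachable SMTP server."""
    with smtplib.SMTP(host) as smtp:
        # sendmail() routes on envelope_rcpts and does not parse the headers.
        smtp.sendmail(msg["From"], envelope_rcpts, msg.as_string())


print(msg["To"])
print(envelope_rcpts)
```

Neither envelope address appears anywhere in the message text, which is essentially how BCC works.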


Ok, good if pedantic response. I think the substance of my point stands: MTAs can take some time to work through the CC list (which, as you say, is passed to them with the "rcpt to" command by a well-behaved MUA)

In other words, delivering a message to copious CC: recipients is not an uninterruptible operation even after the MUA has finished its job. They might have had to/been able to stop the local MTA to interrupt the rogue emails.


Sorry, to clarify why this is not just pedantic, the MUA is likely to split up a long list of recipients rather than try to send an arbitrarily long list in one rcpt to command and potentially fail after a very long time processing and sending data. The fact that they don't have to be the same means that the list in the message can be arbitrarily long regardless of how the MUA batches it for the MTA.

In that event, it is interruptible.

Also, the MTA will batch its own outgoing sends to individual servers (giving them either just one rcpt to or all the recipients whose MX map to that server) and it could be interruptible there as well.

The point is that when sending to many, many delivery addresses, things get batched at various stages and become more interruptible than if your CC was the master list of how a message was routed.


Which, to clarify a bit further, is agreement with your conclusion but not how you got there, so maybe that's what you meant by pedantic. :)


In a scenario where the entire list is in CC, this could matter. But when your scenario is leaking a small fraction of the list members, you would have to be batching CC. Once you're batching CC, then it doesn't matter if all the emails in a batch are sent out at the same time.


Some people are saying the emails were in the message body, not CC field.

What's the limit for numbers of addresses in a CC field? Because this is several thousand addresses.


If you're sending to/from Gmail/Exchange the limit is somewhere around 100 addresses (I believe it's technically the byte-size of the field not strictly the number of addresses).

The actual spec though, AFAIK, has no limit.


Right. SMTP header lines can't exceed 998 characters, but RFC2822 now allows multi-line headers[1], so there's no limit except one that might be imposed by an MTA (and that would generally only be imposed on sending, not receiving.)

1. https://www.ietf.org/rfc/rfc2822.txt §2.2.3
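
Folding is something mail libraries do automatically. As a rough sketch (the addresses are generated placeholders), Python's `email` package wraps a long `To` header across several physical lines when serializing, while the logical header still holds every address.

```python
from email.message import EmailMessage

# 200 made-up addresses: far more than fits on one 998-character line.
addresses = [f"user{n:03d}@example.com" for n in range(200)]

msg = EmailMessage()
msg["To"] = ", ".join(addresses)
msg["Subject"] = "Folding demo"

raw = msg.as_string()
# The header section ends at the first blank line.
header_lines = raw.split("\n\n", 1)[0].split("\n")
longest = max(len(line) for line in header_lines)

print(len(header_lines), "physical header lines; longest is", longest)
print(len(msg["To"].addresses), "addresses in the logical To header")
```

Any practical cap on recipients therefore comes from the software handling the message, not from the line-length rule itself.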


I think it's positive that they own up to it and actually apologize.

One would also think that most subscribers of this newsletter have a positive attitude towards the general concepts of privacy and security, so I'm also positive in thinking that a list of these disclosed addresses will never see the light of day (hoping I'm not too naive).


I received one of these emails (most likely because my address begins with 73 and the emails are sorted alphanumerically). It looks like this: http://pastebin.com/vpPU5sLj


I wonder if people whose email addresses start with the letter 'a' also get more spam.


A reminder: If you got a copy of this e-mail (as I did), please don't repost it -- or if you do, don't include the e-mail addresses.


Not of those that did not specify any, I would imagine; you can use the ACME service without providing any email address at all.


Curious to see the reply-all thread that ensues.


It probably won't be very exciting considering the emails existed only in the Body of the email. The emails themselves were only addressed to individuals. You can see this in the linked screenshot.


Good to know, but really I don't care if they send my email address to every other registrant. I run a public web server, I already receive junk email that must be filtered, so I see no problems. It has zero impact on the free certificate service they provide.


It's disappointing to see this level of incompetence from a group responsible for such great leaps in web security. Let's Encrypt should take appropriate steps to ensure this never happens again lest they erode users' trust any further.


Things like this happen all the time. Give them a break. They already did what needed to be done. It's a bad bug, yes, but lots of people here could have done it.


Does this suggest that the first person got sent 7,618 e-mails?


Note to self: use a new email account when using this service.


I try to do this for every service I give an email address to.


Gmail still drops the ball, because you have to give your realaddress+marker rather than being able to request a marker.

The correct behavior is to be able to request a marker when signed into any email account, and on my side set the tag that it gets tagged with in that inbox as a result. The link between marker and inbox should remain secret.


I pay for a FastMail account which gives me up to 500 aliases. These are then sucked down into my Gmail account.


I can't see a better example of Google dropping the ball than you paying for some other service so that you can then consume it from gmail. :)

500 is a reasonable limit I think, in case spamming would be some reason for them not to do this. I don't have an opinion about whether people should be able to send from marker@gmail.com or if it should just end up in a real inbox but without the ability to send from that address.


Out of curiosity, why do you do this? FastMail has an amazing webmail interface and Android app. I'd never go back to Gmail.


The original thinking was to make it harder for casual snoopers. If FastMail gets compromised, then they'd need to be compromised over time and someone would need to review a lot of my email with a shelf-life of one-hour to understand who I was - I use auto-generated credentials, paid via Bitcoin, and login once a year via VPN to generate more addresses.

If Gmail gets compromised, you'd need to be looking for a bunch of Fastmail accounts in To: addresses to link my primary email with those emails.

If you wanted to track me down from an email address, you'd need a warrant in Australia (for FastMail), and the US (to find the account using those FastMail credentials), so I'd need to have actually done something wrong (which I haven't), and you'd need to convince judges in two jurisdictions of that. As I said, the threat model is against casual snoopers, rather than a determined state actor with proof of wrong-doing, as I don't think I'm even slightly interesting and I don't think I've done anything that would make me interesting.

As it turns out, you could probably just read enough of my FastMail email as it came in (before it gets deleted by Gmail) to figure out who I am, so this is imperfect.


It'd be a lot simpler to rent a VPS and set up your own mail server.


You don't even have to specify an email address at all; it's considered optional in ACME.[0]

[0] https://github.com/ietf-wg-acme/acme/blob/master/draft-ietf-...


Hm, I didn't get anything.

Did you sign up in the earlier or later stages?


It looks like it's sending them alphabetically. I wonder at what point in the alphabet someone hit the kill switch.


0-9, A-Z were sent out and part of 'a'


Mine starts with 'contact' and I didn't get it, so probably before that.


Security is not just a product.


What I ask myself every time I see a post about a data leak: could you sue a company for the leak?


You can sue almost anyone for almost anything. Could you win? IANAL, but I think you'd have to prove damages.


If they're in the UK you can report them to the regulator - the Information Commissioner.

They tend to take a warn then fine approach.


Directly below the apology for leaking email addresses, I get this message prominently displayed:

> "Hey there! Looks like you're enjoying the discussion, but you're not signed up for an account.

> When you create an account, we remember exactly what you've read, so you always come right back where you left off. You also get notifications, here and via email, whenever new posts are made. And you can like posts to share the love."

> [Sign up] [Remind me tomorrow]

No thanks :)


Note to self: keep using HTTP, and provide HTTPS for important website content (like shop payment) and use an SSL/TLS cert that lasts 1 year.


> keep using HTTP, and provide HTTPS for important website content (like shop payment)

What if someone MITM's your site, injects some code so that when the user clicks on the checkout link/button they get sent to a malicious site?


Have you asked Amazon the same question? It works fine for them since 1994.


Amazon deployed HTTPS across all their sites a couple of weeks ago.


Yes, just last week. And parts of site were down too for some hours. Is MITM something new? No. What is the point? Amazon worked fine since 1994/95. The whole HTTPS-only movement looks very orchestrated.


HTTPS everywhere is a good thing, and if you don't understand why that's fine, I don't have the time to explain to you why you are wrong, but you are. Good luck deploying Internet connected services in the past.


The difference is I understand the Pro and Cons. That's why HTTP and HTTPS is often better.


The jaded, bitter part of me hopes this will be another nail in the coffin for XaaS and the recent trend of centralizing everything onto Web services.

The rest of me which is more jaded and bitter knows that it won't.


What would your ideal online world be?



