Hacker News
New leak shows NSA harvests To, From, and Bcc lines of e-mail data (arstechnica.com)
403 points by evo_9 1401 days ago | 106 comments



NSA harvests the entire email.

Source: http://www.wired.com/threatlevel/2007/05/mark_klein_docu/


That's obvious since the email header and body are in the same document. Presumably what's going on is they capture all emails, process them to strip off the body, and then make the headers available to the analysis systems. They can then go back and retrieve the body of any email that interests them. One of the points that critics have been making is that analysts can make the decision to retrieve the content on their own without a court order, so this whole metadata safeguard isn't much of a protection at all.
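A minimal sketch of the "strip off the body, keep the headers" step speculated about above, assuming messages arrive as raw RFC 5322 text. It uses Python's standard `email` package; the function name `extract_metadata` and the sample message are hypothetical.

```python
from email.parser import Parser
from email.utils import getaddresses

def extract_metadata(raw_message: str) -> dict:
    """Keep only the addressing headers of a raw message; the body is discarded."""
    # headersonly=True parses the header block and skips MIME parsing of the body.
    msg = Parser().parsestr(raw_message, headersonly=True)
    fields = {}
    for name in ("From", "To", "Cc", "Bcc"):
        values = msg.get_all(name, [])
        # getaddresses() splits "Name <addr>, addr2" lists into (name, addr) pairs.
        fields[name] = [addr for _, addr in getaddresses(values)]
    return fields

raw = (
    "From: Alice <alice@example.com>\n"
    "To: bob@example.com, Carol <carol@example.org>\n"
    "Subject: lunch\n"
    "\n"
    "The body never reaches the metadata store.\n"
)
print(extract_metadata(raw))
```

The full message would still exist wherever it was captured, which is the commenter's point about going back for the body later.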


If I'm not mistaken, isn't fiber optic splitting done with a glass... Prism?


I talked to a company that did this kind of work, and they used tiny MEMS mirrors made of gold plated silicon.

They wouldn't tell me much about their customers, but they specialized in switching and splitting high speed data at the physical layer.

http://www.glimmerglass.com/solutions/cyber-security-and-law...


"You know, in case you want a redundant signal for...redundancy...reasons."


It's done by bending the fiber, causing leaks. The signal has a single wavelength, so a prism would serve no purpose.


There are several patents for fiber-optic beam splitters that use actual prisms for various purposes.

For example: http://www.google.com/patents/US4671613 (one of many). I'm not saying this is used, but it does exist.


Well, a prism makes sense if you're sending multiple signals in the same fiber, and want to separate and combine the signals at the ends. The fiber connection I'm renting has two wavelengths on the same fiber.

The installation guys actually used a device to bend individual fibers during installation, to see which ones were carrying signals. Here's a similar device: http://www.tuolima.com/optical-tool-series/test-equipment/op...


I think you can also use a prism to just split 1 fiber into 2 or more, much like a mirror, similar to how some laser systems work. So in essence the original beam goes on its merry way unaffected and the duplicate(s) go into a black box. Though I know very little about this subject, I would assume the splitter has to be exact and completely lossless. I also think there are tools to measure any interference.

For example a device like this is used to reflect fibers: http://www.ozoptics.com/ALLNEW_PDF/DTS0095.pdf


This makes sense. I would imagine a prism would also create a more noticeable loss of signal at the other end, since I'm guessing a splice is involved, whereas a passive leak would be less detectable.


Either way it is just refracted light, right? Same physics going on with either.


It's physics, yes, but since the light has a uniform wavelength you can't "split" the beam with a prism: you get one beam going in and only one beam going out.

Curving the cable, as the other comment mentions, might make sense if the curvature were calculated precisely, as then some (but not all) of the light would escape instead of being totally internally reflected back into the core.
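The effect being described is total internal reflection: a ray stays in the core only while its angle of incidence at the core/cladding boundary (measured from the normal) exceeds the critical angle θc = arcsin(n_clad / n_core). A minimal Python sketch; the refractive indices are assumed values chosen for illustration, not taken from any fiber datasheet.

```python
import math

def critical_angle_deg(n_core: float, n_clad: float) -> float:
    """Incidence angle (from the normal) above which light is totally internally reflected."""
    if n_clad >= n_core:
        raise ValueError("total internal reflection needs n_core > n_clad")
    return math.degrees(math.asin(n_clad / n_core))

def is_guided(incidence_deg: float, n_core: float, n_clad: float) -> bool:
    """True if a ray hitting the core/cladding boundary at this angle stays in the core."""
    return incidence_deg > critical_angle_deg(n_core, n_clad)

# Assumed, illustrative indices for a silica fiber.
N_CORE, N_CLAD = 1.450, 1.445
theta_c = critical_angle_deg(N_CORE, N_CLAD)
print(f"critical angle ~ {theta_c:.2f} degrees")

# A straight fiber keeps rays near grazing incidence (close to 90 degrees);
# bending it tilts the boundary so some rays arrive below the critical angle.
for angle in (89.0, 86.0, 84.0):
    print(angle, "guided" if is_guided(angle, N_CORE, N_CLAD) else "partly escapes")
```

Because the two indices are so close, θc sits only a few degrees from grazing, which is why a modest, controlled bend is enough to let a fraction of the light out. This is a ray-optics simplification; real bend loss also depends on bend radius and wavelength.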


My understanding is that you can use a beam splitter made from two prisms to split laser light. From what I understand, this is done in holography.

I don't entirely understand all of this, my optics is very rusty.

http://en.wikipedia.org/wiki/Beam_splitter

http://en.wikipedia.org/wiki/Holography#Apparatus

http://en.wikipedia.org/wiki/Total_internal_reflection#Fract...


Well, a beam splitter as described on the Wikipedia page is beyond my own ability to describe, but the net effect of the double-prism design seems to resemble the effect of precisely curving the cable: some light is diverted out of the splitter while the rest proceeds as normal.

But in this case the prism is 'just' a convenient way of getting the desired diagonal shape with materials of different refractive indices; you're not actually breaking light into its component wavelengths as you'd see in the NSA's PRISM logo.


What we've seen of the slides seems to indicate that the PRISM name refers specifically to the "direct collection" from cloud-data firms (using FAA70* directives), as distinct from the "upstream collection" from the network. https://news.ycombinator.com/item?id=5887627


Careful, you're going down the Glenn Beck-style "OLIGARHY" route.


The person in this thread that is closest to a Fox News cast member would be the local HN government apologist. Namely, yourself.


It's probably an optic with a coating that reflects some percentage (probably a small percentage) of the light, and a second detector.
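To put numbers on "some percentage": a path that keeps a fraction t of the optical power loses 10·log10(1/t) dB. The split ratios below are assumed examples, not figures from the linked catalog, and real taps add some excess loss on top.

```python
import math

def split_loss_db(fraction: float) -> float:
    """Loss in dB for the share of optical power that continues on a given path."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return -10 * math.log10(fraction)

# Hypothetical split ratios: (fraction kept on the main path, fraction tapped off).
for through, tapped in ((0.9, 0.1), (0.7, 0.3), (0.5, 0.5)):
    print(f"{through:.0%}/{tapped:.0%}: main path loses {split_loss_db(through):.2f} dB, "
          f"tap receives the signal {split_loss_db(tapped):.2f} dB down")
```

A 90/10 split costs the main path under half a decibel, which bears on the earlier question of how noticeable a tap would be at the far end.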

http://www.edmundoptics.com/optics/beamsplitters/


We use these extensively:

http://vssmonitoring.com/products/traditional_tap.asp

I'm not sure how they work, but the sales engineer said "mirrors". The fibre units are completely passive.


If you're a techie and you know that every bit of data you collect from customers will eventually end up in Utah, you have a duty either to collect the minimum data possible or to encrypt both transmission and storage and demand a warrant for access.


Question: Does anyone here understand where exactly systems that incorporate a zero-knowledge architecture fit into the recently illuminated legal framework (re: warrants, etc.)?

i.e. if I implement my service so that I don't have the keys and cannot reasonably obtain them, what does that mean for my users and their data, presuming the data is stored in the US? Juicy example: Lastpass. (I am not affiliated with Lastpass.)
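A rough sketch of what "I don't have the keys" usually means in practice: the client derives the encryption key from the user's password, and the server only ever receives a separate login verifier. This is a generic illustration using Python's `hashlib.pbkdf2_hmac`, not LastPass's actual scheme; the salt choice, iteration count, and function names are assumptions.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # assumed work factor; real services choose their own

def derive_vault_key(password: str, username: str) -> bytes:
    """Client side: the key that encrypts the vault. It never leaves the device."""
    salt = username.lower().encode()  # assumption: per-user salt taken from the username
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def derive_login_verifier(vault_key: bytes, password: str) -> bytes:
    """Client side: sent to the server at login; vault_key cannot practically be recovered from it."""
    return hashlib.pbkdf2_hmac("sha256", vault_key, password.encode(), 1)

def server_check(stored_verifier: bytes, presented: bytes) -> bool:
    """Server side: compares verifiers only; it stores ciphertext but no decryption key."""
    return hmac.compare_digest(stored_verifier, presented)

key = derive_vault_key("correct horse battery staple", "alice@example.com")
verifier = derive_login_verifier(key, "correct horse battery staple")
print(server_check(verifier, verifier))
```

The guarantee only holds while the server keeps shipping honest client code, which is exactly the weakness raised in the replies below.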

I'm sure this has already been discussed on HN recently, but with the dizzying number of PRISM/Snowden/Leaks/Wiretapping threads flying around it's difficult to keep up.


Lastpass can access your decrypted password vault if they are compelled to. All they have to do is send you some modified JavaScript which steals your password/key.

They're certainly worried about persistent XSS attacks being used to gain access to people's vaults. There's nothing stopping them from performing one of these attacks themselves, targeted at a specific user.

If you think this is unlikely, look up Hushmail being compelled to send modified Java applets to their users to steal their keys. It has been done before.

So yeah, if the US government wants access to a list of all of your accounts, when you logged in to them, what IPs you logged in with and your usernames and passwords, they'd probably be quite pleased to find out you're using Lastpass.


Well Lastpass was just an example. With enough effort any service can be hacked, but if the bar is high enough it means it's more likely that the US gov can't/won't do it en masse. I would note that Lastpass allows you to implement Google Authenticator/Yubikey/One-time-pad/Biometrics to help secure your key against a simple XSS attack. I think that probably qualifies as 'setting the bar high.'

In any case, my question was more about the _legal_ situation, not the technical one. Suppose you have a near-perfect zero-knowledge system: how does the US government view that entity? At least in theory, if they cannot reasonably force the company to give up the keys, what can they legally do? Can they force the company to shut down? Can they make the company push users off the service in an attempt to get them into a less secure realm? Are such systems even legal in the current climate?

Of course there is always a way to hack it, and the $5 wrench will beat anything (pun intended), but as far as the mass surveillance mandate goes those options are probably out.


To be clear, I was not describing a hack. LastPass can be forced by the US government to get a LastPass user's keys. All they need to do is get a court order and tell LastPass to send some backdoored code to the user, exactly like with Hushmail.


I disagree. I have an Amazon EC2 instance that I use as a backend for some apps. I doubt that any of the data transmitted over unsecured HTTP is in Utah.


If it went through one of the peering points with an NSA optical splitter, and it probably did, then it's probably sitting in an NSA data warehouse. The Utah site specifically is still being built.


It's been a while since I worked with email headers and smtp, but I don't think the Bcc header actually exists in transit. The mail user agent and/or the mail submission agent remove it.

They could reconstruct this information from the graph.


Apparently not always:

"In the second case, recipients specified in the "To:" and "CC:" lines each are sent a copy of the message with the "BCC:" line removed as above, but the recipients on the "BCC:" line get a separate copy of the message containing a "BCC:" line. (When there are multiple recipient addresses in the "BCC:" field, some implementations actually send a separate copy of the message to each recipient with a "BCC:" containing only the address of that particular recipient.)"

http://en.wikipedia.org/wiki/Blind_carbon_copy#Visibility

Depending on combination and location of the MUA and MSA, it is plausible that the NSA was able to get full BCC lines.


The BCC field itself is usually removed from the email, but if you're monitoring the SMTP session you can reconstruct the BCC from the RCPT TO commands in SMTP.


Yes: if somebody is in the RCPT TO but not in To: or the other header fields, then they were Bcc'd.
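A minimal sketch of that inference, assuming the observer sees both the SMTP envelope (the `RCPT TO` addresses) and the message headers in the same session. The helper name and sample addresses are hypothetical, and the sketch ignores aliases and mailing-list expansion, which would muddy the difference in practice.

```python
from email.parser import Parser
from email.utils import getaddresses

def infer_bcc(rcpt_to: list[str], raw_headers: str) -> set[str]:
    """Envelope recipients that appear in no visible recipient header were Bcc'd."""
    msg = Parser().parsestr(raw_headers, headersonly=True)
    visible = []
    for name in ("To", "Cc"):
        visible.extend(msg.get_all(name, []))
    shown = {addr.lower() for _, addr in getaddresses(visible)}
    return {addr.lower() for addr in rcpt_to} - shown

headers = "From: alice@example.com\nTo: bob@example.com\nCc: carol@example.org\n\n"
envelope = ["bob@example.com", "carol@example.org", "dave@example.net"]
print(infer_bcc(envelope, headers))
```

This only recovers the full Bcc list when one session carries every recipient; a sending server that delivers per destination domain exposes, in each session, only the Bcc recipients hosted on that server.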


Considering they have an upstream wiretap on all US internet traffic, it seems possible they could determine the BCC.


Why bother? Google knows the BCC line; your email provider by definition has to know it in order to BCC the message, and it's very much up to them to scrub it out properly as well (I know that for a while it was pretty trivial to see BCCs from most providers).


It depends on whether you get the envelope. The envelope is the list of recipients for a given server to handle... the headers of the actual email are a convenience for display and are not used for relay, forwarding, or delivery.


If you don't want to use that awful doc viewer:

  wget http://s3.documentcloud.org/documents/719116/pages/doc03-p1-large.gif
  ...
  wget http://s3.documentcloud.org/documents/719116/pages/doc03-p52-large.gif
The last time the Guardian had a document up and I provided these GIFs, someone replied with a PDF copy. I am unsure how to get the PDF from DocumentCloud, so feel free to post a PDF link, and please explain where you got the URL from.


I took a guess and got it right (don't sue me plz DocumentCloud):

  http://s3.documentcloud.org/documents/719116/doc03.pdf
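The pattern behind that guess, as far as these two examples show: page images live under `pages/<slug>-p<N>-large.gif`, and the PDF sits beside the `pages/` directory as `<slug>.pdf`. A small Python sketch that builds both from the document ID and slug; the pattern is inferred only from the URLs in this thread and may not hold for other DocumentCloud documents.

```python
BASE = "http://s3.documentcloud.org/documents"

def page_image_urls(doc_id: int, slug: str, pages: int) -> list[str]:
    """Large GIF of each page, following the URLs posted above."""
    return [f"{BASE}/{doc_id}/pages/{slug}-p{n}-large.gif" for n in range(1, pages + 1)]

def pdf_url(doc_id: int, slug: str) -> str:
    """Whole-document PDF, following the guessed URL above."""
    return f"{BASE}/{doc_id}/{slug}.pdf"

urls = page_image_urls(719116, "doc03", 52)
print(urls[0])
print(urls[-1])
print(pdf_url(719116, "doc03"))
```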


I regret that I have but one upvote to give you. Many thanks. I could have sworn that last time the PDF had no relation to the GIF names.


What you just did is no different than the felony they got weev for.

Edit: well, I guess you didn't falsify any headers, though.


Now make a script to download several of those at once, and you're an Aaron-class felon...

This makes me depressed :/


Someone else posted the link to the PDF, but for downloading sequences:

    wget http://s3.documentcloud.org/documents/719116/pages/doc03-p{1..52}-large.gif


Here's an idea: If you don't want to make an anxious public even more anxious, don't name your NSA surveillance program "EvilOlive." Or really, anything starting with 'evil.'


yep, and due to a bug in the perl script, it harvests all the lines to the next From.


Yes, it is probably a regexp bug, like

    /^ $/
Someone pressed the space key once at the wrong place...
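The joke works because mbox files separate messages with lines starting `From `, and a message's headers end at the first empty line. A hypothetical header extractor in Python shows what happens when the separator pattern wrongly requires a single space:

```python
import re

MBOX = (
    "From alice@example.com Mon Jun 24 10:00:00 2013\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "\n"
    "Meet me at noon.\n"
    "PS: bring the documents.\n"
    "From carol@example.org Mon Jun 24 11:00:00 2013\n"
)

def headers_until(pattern: str, text: str) -> str:
    """Return text before the first line matching `pattern` (the assumed header/body separator).

    If nothing matches, the extractor runs on to the next mbox "From " line.
    """
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        # Search from index 1 so the current message's own "From " line is skipped.
        next_from = re.search(r"^From ", text[1:], flags=re.MULTILINE)
        return text if next_from is None else text[: next_from.start() + 1]
    return text[: match.start()]

good = headers_until(r"^$", MBOX)    # empty line: stops after the To: header
buggy = headers_until(r"^ $", MBOX)  # a lone space never matches, so the body comes along
print(good)
print(buggy)
```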


What exactly is it supposed to mean that the NSA intercepts only data with one "foreign end"? That it intercepts all data that crosses e.g. a transatlantic cable? Or that it scans the IP header of absolutely everything and grabs anything with a non-US IP as either source or destination? Or something else?


That's where the "minimization" documents from last week come in. They actually collect everything, but then there is some kind of filter at the collection point that is supposed to remove any communication that they are certain is American-to-American. But that filter also has exceptions for things like encrypted messages or pretty much anything else they are interested in. They make the filter as loose as they can while still being able to maintain some deniability that they don't collect domestic communication.


That really depends on the details of what their collection equipment is capable of.

Transoceanic cables are obvious, but there are also possible satellite-to-satellite links that, for all we know, could terminate right in Iowa, and that the NSA wouldn't be able to retain unless/until the traffic reached one of the Tier 1 ISPs (they might use geolocated IPs for this).

And as the other comment mentions, the minimization procedures seem arranged to blacklist and discard only that data which is clearly US<->US so you could definitely end up with your data being socked away.
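A hypothetical sketch of a filter with that shape: it discards a flow only when both endpoints geolocate to the US, and keeps anything foreign, unknown, or encrypted (the exception mentioned in the parent comment). The geolocation table and rules are assumptions for illustration, the addresses come from reserved documentation ranges, and nothing here describes a real system.

```python
import ipaddress
from typing import Optional

# Hypothetical geolocation table: prefix -> country code.
GEO = {
    ipaddress.ip_network("192.0.2.0/24"): "US",
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "DE",
}

def country(ip: str) -> Optional[str]:
    """Look up an address in the table; None means the location is unknown."""
    addr = ipaddress.ip_address(ip)
    for net, cc in GEO.items():
        if addr in net:
            return cc
    return None

def retain(src: str, dst: str, encrypted: bool) -> bool:
    """Discard only flows that are clearly US-to-US; keep everything else."""
    if encrypted:
        return True  # assumed exception: encrypted traffic is kept regardless
    clearly_domestic = country(src) == "US" and country(dst) == "US"
    return not clearly_domestic

print(retain("192.0.2.10", "198.51.100.7", encrypted=False))  # US <-> US
print(retain("192.0.2.10", "203.0.113.5", encrypted=False))   # one foreign end
print(retain("192.0.2.10", "10.1.2.3", encrypted=False))      # one unknown end
print(retain("192.0.2.10", "198.51.100.7", encrypted=True))   # US <-> US, encrypted
```

Note the asymmetry: an endpoint with an unknown location counts as "not clearly domestic", so the default outcome is retention.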


They discontinued the program to save just the 3 headers because now they've got other programs that save the entire email message. And phone calls, and text messages and tweets, etc.


This headline is misleading. It implies the program is still ongoing, where the original article clearly states that the program was shut down 2 years ago.

Moreover, after a lot of cynical complaining about Obama not being meaningfully different than previous administrations, it's worth noting that Obama was the one to shut this down.

I'm not interested in reflexively defending the government or Obama but we still need to pay attention to the facts at hand.


If you read the article carefully, or even better the original Guardian article: http://www.guardian.co.uk/world/2013/jun/27/nsa-online-metad... you'll find claims that while the original program was shut down something very like it although probably larger in scope is still ongoing - details are murky though.

I instinctively like Obama, but I'm forced to admit that his policies on national security are by any objective measure worse than his predecessor's. He's just more eloquent when he talks about them.


I read both of the original articles (also http://www.guardian.co.uk/world/2013/jun/27/nsa-data-mining-... ) -- there is still nothing in them that would cause this statement to be reliably true. You may surmise it, but speculation != fact.

Also, I think you must have a phenomenally short memory if you think Obama's policies on national security are stricter than Bush's were.


Let's see:

  - He has expanded and extensively justified the drone strike program
  - His administration has denied more Freedom of Information Act requests than Bush did
  - His administration has prosecuted more whistleblowers than *all other administrations combined*
  - He's clearly in favour of all this surveillance, even though he campaigned with promises to remove it
I used to think there was a lot to like about Obama and there are still some things. At least he doesn't look like a chimp in photos. But it's naive to think he's not extremely hawkish on national security. Whether that's a good thing or not is up to each of us to decide.


Candidate Obama and President Obama are two very different people, despite what many reporters and big supporters would have you believe. But it takes a strong-willed human being not to give in to all the pressures that must be present in the Oval Office. Just imagine how much classified shit the CIA/FBI/HSA/NSA must have presented him with the day after he was sworn in. How would any individual be able to sift through it and call these agencies, who have a vested interest in protecting America AND expanding their own budgets, on their bullshit? I'm not saying all of it is, but the way they go about security leads me to believe a lot of it is. But who wants to be the next Bush and ignore a terrorism warning?


Naive, naive comment. It's all about the money. The cyber security apparatus is worth more than 80 billion. Terrorism is a creation of the same people seeking to profit from it. No, candidate Obama and President Obama are roles played by the same individual; neither is true. Just like an actor playing a script, his job is to convince you the script is real... What does matter is the money the actor brings to his sponsors.


Yeah, 9/11 definitely was a fabricated event...

People need to keep their comments reasonable and cut the hyperbole if they want to get anything done. When you keep crying wolf, people stop listening. Which is fine if you just want to always get the last word in, but if we're actually concerned with overreach and national security then choosing our messaging well and keeping our concerns focused, specific and provable with neat, incremental steps is the way to go.


Kill-lists (that cannot be challenged in court) of Americans that are not in combat roles and are far from any battlefield.


> I instinctively like Obama, but I'm forced to admit that his policies on national security are by any objective means worse than his predecessor.

Maybe the difference is a management style that does a better job following through with things.

Example: Bush spends 8 years going after bin Laden, finds nothing. Obama does it in 2. When I witnessed this it always struck me that probably Bush wasn't trying very hard.

Possible parallel: Bush hears from security apparatus that we need lots of wiretapping, so they do that for a while. Obama gets in there, gets similar advice ... You can guess where that goes.

At any rate I have a hard time believing Bush would say no to this stuff.


Programs are often shut down when superseded by new or better tools. In any case, both the WaPo and the Guardian articles on the same leak say metadata collection is ongoing:

A senior administration official queried by the Washington Post denied that the Obama administration was "using this program" to "collect internet metadata in bulk", but added: "I'm not going to say we're not collecting any internet metadata."

http://www.guardian.co.uk/world/2013/jun/27/nsa-data-mining-...


The information that has come out over the last 12 years has taught us that they will change the name of the program, calling "this program" "shut down," while continuing with what they were already doing under another name, sometimes under different authorities (but always the same conduct).


Don't you think that if there were evidence that the government was still harvesting CCs, BCCs, etc. en masse, the article would have made that its central claim? The vast majority of the article is about how this stopped in 2011. One line implies they still collect "internet metadata", but not in bulk.


So just use the CC field, problem solved!


Anyone ever hear the rumor that the reason why Google pulled out of China was because Chinese hackers had tapped into a feed of all email metadata? I heard it included subject. This news made me immediately think of that rumor.


Will google pull out of US now? :-)

Can't fight big government. China or US, it really doesn't matter.


How the heck do you track ad-interaction habits through just an IP address? I call BS on that particular paragraph.


They could track if you reply to an ad or, if they also track your IP connections, see if you click on a link in an email containing an ad.


The IP information is a common and general location, usually of your nearest Telecom tower. IP address in itself does not lead to your Internet device. So the paragraph is still inaccurate if the Ad agency is tracking through IP. Your IP was always naked and available to anyone you send email to (through the headers).


A couple of things:

1. That is true of some cell-based mobile data solutions, but others use an actual IPv4/v6 address assigned to each mobile session.

2. Some popular webmail systems hide the source IP address, while others include a special header with the data.


But your actual IP address still does not tell them your precise location. Unless they also separately get a log of what cell tower handled traffic for what IP addresses, they'd still be left with only the location of your local internet provider.


I might have been misled by the use of IP; the handset is what is tracked, and its location is usually more precise than just the tower the device is associated with. It can include measurements taken from multiple towers, which can be derived from data CDMA needs just to function, or it can include government-mandated E911 information, which is usually derived from an internal 4-channel GPS receiver. In theory this is only supposed to be used for E911 functions when the handset is in contact with a PSAP, but we as the public have no way to know what information from these systems is collected and stored, or for how long.


An IP on a mobile device is not the same as an IP on a desktop. As far as desktops are concerned, unless ISPs are willing to track your exact location for the government, there is no way anyone can pinpoint your exact location through your public IP address.


It's a stretch, admittedly, but most email marketing interactions have campaign-specific email addresses to follow conversions. They would certainly be able to see what companies you've previously interacted with by the fact that you are receiving emails from company.com.


via ad agencies?


And what do ad agencies have to do with this NSA mess? In other words, is there any disclosure in this article of an ad agency doing the NSA's job?


If you can monitor a meaningful fraction of traffic, then you can use an advertising agency's cookies to track a target.


Or really, any session cookie that is transmitted along with a normal HTTP request. Doesn't have to be just an ad agency's.
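A minimal sketch of the linking step, assuming a passive observer logs (time, client IP, cookie) tuples from unencrypted HTTP requests. All observations and cookie values are hypothetical; any stable identifier sent in the clear works the same way.

```python
from collections import defaultdict

# Hypothetical passive observations: (timestamp, client IP, cookie value).
observations = [
    ("09:00", "192.0.2.10", "uid=abc123"),    # home connection
    ("12:30", "198.51.100.7", "uid=abc123"),  # office connection
    ("18:45", "203.0.113.5", "uid=abc123"),   # coffee-shop Wi-Fi
    ("18:46", "203.0.113.5", "uid=zzz999"),   # someone else on the same Wi-Fi
]

def ips_by_cookie(rows):
    """Group every (time, IP) sighting under the cookie value it was seen with."""
    seen = defaultdict(list)
    for ts, ip, cookie in sorted(rows):
        seen[cookie].append((ts, ip))
    return dict(seen)

for cookie, sightings in ips_by_cookie(observations).items():
    print(cookie, sightings)
```

One cookie ties three different networks to the same browser, while two people sharing one IP stay distinguishable.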


Is email traffic typically encrypted between major providers? E.g. could a network attacker, located between Google and Microsoft, intercept unencrypted traffic between gmail and hotmail addresses?


It depends. Bigger providers like Google typically prefer to use TLS, but there are a great many smaller providers who do not. In order to keep everything flowing, there has to be the ability to fall back to unencrypted messages.


It is often encrypted, but since the encryption is negotiated using STARTTLS it can easily be stripped by an active attacker. It works fine against passive attackers.
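A toy model of why opportunistic STARTTLS fails against an active attacker: the client only upgrades if the server's EHLO reply advertises STARTTLS, and that reply travels in cleartext. The EHLO lines are simplified and the function names are hypothetical; this simulates the decision logic only and opens no connections.

```python
EHLO_REPLY = ["250-mx.example.com", "250-PIPELINING", "250-STARTTLS", "250 8BITMIME"]

def advertises_starttls(ehlo_lines):
    """True if any EHLO reply line names the STARTTLS extension."""
    return any(line[4:].strip().upper() == "STARTTLS" for line in ehlo_lines)

def strip_starttls(ehlo_lines):
    """Active attacker: silently drop the STARTTLS line from the cleartext reply."""
    return [line for line in ehlo_lines if line[4:].strip().upper() != "STARTTLS"]

def opportunistic_client(ehlo_lines):
    """Upgrade when offered; otherwise fall back to plaintext so mail keeps flowing."""
    return "TLS" if advertises_starttls(ehlo_lines) else "plaintext"

print(opportunistic_client(EHLO_REPLY))                  # passive observer sees only ciphertext
print(opportunistic_client(strip_starttls(EHLO_REPLY)))  # downgraded by the active attacker
```

Python's `smtplib` follows the same pattern: calling `SMTP.starttls()` is the caller's choice, typically after checking `has_extn("starttls")`. A client that refuses to send without TLS avoids the downgrade, at the cost of failing to deliver to servers that lack it, which is the fallback trade-off described above.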


SMTP relay is almost always unencrypted. The client that connects to the SMTP server may connect via an encrypted connection though, but that's mostly to prevent snooping on the client's local network.


This thread is a big, meaningless distraction... The main point is: you are all being illegally spied on. The land of the free is a big lie.


Can I ask how? How do you have 75% of the traffic or 75% of the servers (as the article states), how the hell is that logistically possible?


GCHQ in the UK have been doing it for most UK traffic, and it seems the approach is to store full data as long as they can, and store headers for longer. All the content is stored for 3 days, then the headers are kept for 30 days and shared with other agencies like the NSA, who may well keep it all indefinitely if they have enough storage available. They probably do some early filtering to keep it manageable (removing duplicate content, unwanted videos, etc.), and the headers and metadata are probably not that large. Before reading the GCHQ docs I wouldn't have found this claim credible...

If they are not collecting every communication in the world, you can be sure it is not from lack of ambition to do so. In the words of General Alexander:

“Why can’t we collect all the signals all the time?” the N.S.A. director was quoted as saying. “Sounds like a good summer project for Menwith."

Which is a worrying thought when you realise the implications of this ambition. We used to think that only a god could be omniscient, but that is the current ambition of our intelligence services and politicians.


What if the near-term result of all these revelations is that it just becomes the new normal? Is it necessarily such a bad thing for the snoops in the shadows if they know people will eventually just get used to it? After all nothing really feels different day-to-day so ... meh.


Data is being collected on all people. All people are guilty of something. When the time comes, a case will be built for the person currently under scrutiny.

http://marginalrevolution.com/marginalrevolution/2013/06/no-...


Are there any numbers on how many hard drives the NSA has?



> The center will reportedly be able to store five zettabytes worth of information

I am sure some day in the future there will be microSD cards with this storage capacity. But for now it is just mind-blowing.

~5,000,000,000,000 gigabytes (5 × 10^12 GB)


This website makes the unsourced claim about Yottabytes that "You can compare it to the World Wide Web as the entire Internet almost takes up about a Yottabyte."

(http://whatsabyte.com/)

1 zettabyte = 1,000,000,000 terabytes (in binary units, 1 zebibyte = 1,073,741,824 tebibytes).

This Quora answer says that the total number of hard drives shipped worldwide in 2011 was 6,800,000 units.

(http://www.quora.com/How-many-hard-drives-are-produced-each-...)

I find the 5 zettabyte figure hard to believe.


Tapes, not disks.


Assuming 3:1 compression our 1 zettabyte (or 1,000,000,000 terabytes) of data becomes 333,333,333.33 TB.

Using a nice IBM 4 TB tape we need 83,333,333.33 tapes for 1 zettabyte.

I still find the 5 zettabyte figure hard to believe.
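The same arithmetic scaled to the full 5 ZB claim, in decimal units (1 ZB = 10^9 TB) and keeping the commenter's assumptions of 3:1 compression and 4 TB per tape:

```python
ZB_IN_TB = 10**9   # decimal units: 1 ZB = 1,000,000,000 TB
COMPRESSION = 3    # commenter's assumed 3:1 ratio
TAPE_TB = 4        # commenter's assumed 4 TB per cartridge

def tapes_needed(zettabytes: float) -> float:
    """Cartridges needed to hold `zettabytes` of raw data after compression."""
    return zettabytes * ZB_IN_TB / COMPRESSION / TAPE_TB

print(f"{tapes_needed(1):,.0f} tapes for 1 ZB")
print(f"{tapes_needed(5):,.0f} tapes for 5 ZB")
```

Five zettabytes works out to roughly 417 million cartridges under these assumptions.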

But searching for tape does start producing a lot more government-like language and documents. Knowing that there is a "Summary Of Non Confidential Information On U.S. Magnetic Tape Coating Facilities" makes me want to read the confidential version.


What if it was a database that stores databases. mindfuck


So who the F can we trust? All those denials from everyone and now we see this, which I kinda suspected since Verizon was ordered to hand over the same for phone calls.


I am channeling grugq here. The answer is clearly stated in Biggie's 3rd Commandment of OPSEC:

  Number 3: Never trust nobody
  Your moms'll set that ass up, properly gassed up
  Hoodied and masked up, shit, for that fast buck
  She be laying in the bushes to light that ass up
The answer from the SIGINT community is

  In God We Trust, All Others We Monitor
The answer from President Reagan:

  Trust but verify


My favorite recent variant: "YES WE SCAN!"


I think that line pales in comparison, and that is being generous. Anyway, Carl Malamud was already using it for a different purpose: https://yeswescan.org/


You can't trust anyone. You can trust protocols and implementations and proven math.

That said, I would bet good money they already have working quantum computers, in which case current crypto may have quite a few problems.


That's pretty restrictive. I know a large number of scientists and computer programmers who believe the only thing you can trust are protocols and math, and they're usually completely unable to function outside of the narrow domain of their work.

I think a better philosophy is to trust that people will behave according to the incentives and information available to them. So if there is an organization out there, you can bet that it will act to expand the scope of the organization's actions, because organizations that don't do this eventually get replaced by ones that do. If the organization is tasked with keeping tabs on all of America's adversaries, you can bet that they will see adversaries wherever possible to preserve a purpose for the organization.


"Institutions will try to preserve the problem to which they are the solution." -- Clay Shirky

http://www.kk.org/thetechnium/archives/2010/04/the_shirky_pr...


> I would bet good money they already have working quantum computers

How much, at what odds, and under what conditions of settling?


The odds depend on whether you know the odds or not.


> I would bet good money they already have working quantum computers, in which case current crypto may have quite a few problems.

Both statements in that sentence are ridiculous. Do you also wear a tin foil hat while having such thoughts?

First, quantum computing is one of those fields for which you need the brightest minds to solve it. Government jobs may still be attractive for researchers, but if they need to keep such developments a secret, it means they have to limit themselves to the people they can actually hire. This means their talent pool will be more limited than that of a company like Google, or a university like MIT, organizations that can always collaborate with whomever they want in the open, including foreign companies and universities. For building a practical quantum computer, they can have big budgets too, given that companies like Google are interested in machine learning, not to mention the pool of investors that would be dying to be a part of the next revolution. Some of the brightest minds we have worked on quantum computing already, in the open. The idea that a single country's government would be able to do a better job, in secret, is preposterous.

Second, quantum computing doesn't solve P = NP. The difficulty of brute-forcing AES-256 is only reduced to that of AES-128. It is something, but not much and that's only speaking about asymptotic complexity. Going from a feeble experiment in building a quantum computer to building farms of such computers to run distributed algorithms on them - well, I can assure you that farms of commodity hardware with capable GPUs will be used instead for a really long time.
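The AES-256 to AES-128 comparison comes from Grover's algorithm, which searches N unstructured candidates in on the order of √N steps instead of about N/2 on average classically. For a 256-bit key:

```latex
N = 2^{256}, \qquad
\text{classical: } \approx \tfrac{N}{2} = 2^{255} \text{ trials}, \qquad
\text{Grover: } \approx \tfrac{\pi}{4}\sqrt{N} = \tfrac{\pi}{4}\cdot 2^{128} \text{ iterations}
```

So a 256-bit key facing a quantum brute-force search offers roughly the security a 128-bit key offers classically. As the comment says, this is only the asymptotic count; it says nothing about the cost of running each step on real hardware.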


https://news.ycombinator.com/item?id=5958879

It's not that unrealistic. Correct, it does not solve P = NP. It does, as another commenter pointed out, make it much faster (feasible) to reverse RSA by factorization.
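Why factoring is the part that breaks: Shor's algorithm uses the quantum computer only to find the period r of a^x mod N; turning that period into factors is classical. A toy Python sketch for small N that finds the period by brute force (the step a quantum computer would do efficiently, and which is exponential classically) and then applies the classical post-processing:

```python
from math import gcd

def find_period(a: int, n: int) -> int:
    """Smallest r > 0 with a**r = 1 (mod n). Brute force here; this is Shor's quantum step."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factor_from_period(a: int, n: int):
    """Classical post-processing from Shor's algorithm; returns None if this `a` is unlucky."""
    if gcd(a, n) != 1:
        return gcd(a, n), n // gcd(a, n)  # the guess already shares a factor with n
    r = find_period(a, n)
    half = pow(a, r // 2, n)
    if r % 2 == 1 or half == n - 1:
        return None  # try another a
    return gcd(half - 1, n), gcd(half + 1, n)

print(find_period(7, 15))
print(factor_from_period(7, 15))
```

For N = 15 and a = 7 the period is 4, and gcd(7^2 - 1, 15) and gcd(7^2 + 1, 15) give the factors 3 and 5. RSA moduli are thousands of bits long, which is why the brute-force `find_period` is hopeless classically.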

Re: Recruiting. There are a _lot_ of very bright minds working for the government. Don't forget that the government is willing to pay literally any price to get the talent they need, and say "we will give you unlimited resources to all materials, any budget, anything".

Investors look like a joke if you're paid a large sum and have unlimited resources. With TS technologies you can often still declassify parts of your research for the public and co-author papers. This is the same thing we do with, say, the M1 Abrams tank: we export everything except the still-classified parts to foreign countries for sale.

There are 5 Nobel prize winners at NIST alone, 4 in physics and 1 in chemistry.


It may be true that you can't trust anyone in the abstract but practically speaking we need to be able to trust our government, to a point, with all sorts of powers and abilities.

Consensus mistrust of the government should worry us more than any particular capabilities it has.


Consensus mistrust? Did you just coin a new phrase or is that some new lingo the kids are saying these days? I have never heard it before and google was no help.

I think the issue here is not so much "trust" but "trust and verify." With a proper level of effective oversight it seems that things would be much different.


Good money huh?

Let's do this. 10btc. Name your terms.


>So who the F can we trust?

Hint: not the secretive spy agency.


I think we would all forgive the intrusion if, as a side effect of this program, NSA was using this data to feed a spam filtering service to which we could all subscribe.


No, for me it would have to be Daniel Suarez' Daemon. Then I could skip over the forgive part, since many alphabet agencies may be rendered obsolete... And all spammers would have ceased or been desisted.

Good story, if you haven't yet read/heard it.



