

How a Theoretical Claim About the NSA Magically Transformed Into Factual Reality - brown9-2
http://thedailybanter.com/2013/07/how-a-wild-theoretical-claim-about-the-nsa-transformed-into-factual-reality/

======
Wilya
> In reality, it would take 18,918 days or 51 years for 550 analysts to listen
> to just one day’s worth of fiber optic data gathered for Tempora.

This sort of statement is so misleading it hurts. It's kind of sad that people
are so ignorant of what computers and modern data processing can do that
someone can go around saying this sort of thing and not be laughed at.

The analysis of the citation chain is pertinent, though.

~~~
snorkel
The dumb mistake of this article is assuming that all captured data has to be
analyzed hot off the wire. Obviously the data would be warehoused and indexed
for specific searches later.
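
Something like this toy sketch is the whole idea - store cheap, index the
metadata, query later (all names and the schema are made up):

    import sqlite3

    # Toy "warehouse": dump raw captures into a table, index the cheap
    # metadata, and run narrow queries later instead of reading anything
    # "hot off the wire".
    db = sqlite3.connect("warehouse.db")
    db.execute("""CREATE TABLE IF NOT EXISTS captures (
                     ts      REAL,
                     src     TEXT,
                     dst     TEXT,
                     payload BLOB)""")
    db.execute("CREATE INDEX IF NOT EXISTS idx_src ON captures (src)")

    def ingest(ts, src, dst, payload):
        # No analysis at capture time: just append.
        db.execute("INSERT INTO captures VALUES (?, ?, ?, ?)",
                   (ts, src, dst, payload))

    def lookup(selector):
        # A targeted query later touches only a sliver of the data.
        return db.execute("SELECT ts, dst FROM captures WHERE src = ?",
                          (selector,)).fetchall()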

~~~
brown9-2
or filtered by software.

------
grey-area
This article is full of holes itself.

The actual headline he's decrying is 'Millions Of Gigabytes Collected Daily'.
This is, according to the estimates and information he includes in the
article, _an accurate estimate_. It is also suitably vague given the
uncertainty over the estimates: it means anything from 2 million gigabytes up
to 21 million (the theoretical limit). We just don't know. What we do know is
that the ambition is to collect every communication, and they are nearing that
ambition in reality, if only for a few full-take days at GCHQ. I'm not sure of
the NSA's storage capacity in terms of internets/day - does anyone know what
their current capacity is?
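
Just to keep the units straight (quick sanity check):

    # "Millions of gigabytes" is just petabytes in other words:
    print(2e6 * 1e9 / 1e15, 21e6 * 1e9 / 1e15)   # 2.0 and 21.0 (PB per day)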

 _Put another way, no! NSA and GCHQ are absolutely not gathering and/or
analyzing that much data per day._

How does this follow? What has analysing got to do with collecting?

 _It’s an inconceivably big number meant to frighten readers._

2 petabytes per day would be just as frightening, and yet is well below the
theoretical limit. I don't think the exact numbers here matter; this is an
estimate meant to show just how much information is being collected.

But he saves the most bizarre assumptions for last:

 _As a group, and barring some sort of vortex, 550 analysts could only listen
to 1,100 gigabytes of phone conversations per day, and that’s if they worked
24 hours per day and listened constantly. In reality, it would take 18,918
days or 51 years for 550 analysts to listen to just one day’s worth of fiber
optic data gathered for Tempora._

Is he seriously contending that most of this data is telephone conversations,
that collection without immediate analysis by a human being is not just as
dangerous, or that humans actually analyse each piece of information
individually, without using filtering algorithms?
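
For what it's worth, the only way I can reproduce his figure is with an
assumption like ~2 GB of audio per analyst per day, which shows how contrived
the comparison is (my guesses at his inputs):

    # Reconstructing the article's "18,918 days / 51 years" figure.
    analysts = 550
    gb_per_analyst_per_day = 2                              # implicit assumption
    daily_listening_gb = analysts * gb_per_analyst_per_day  # 1,100 GB/day
    one_day_of_tempora_gb = 21 * 10**6                      # ~21 PB
    days = one_day_of_tempora_gb / daily_listening_gb
    print(days, days / 365)   # ~19,000 days, ~52 years (he used slightly lower inputs)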

This comparison is far more misleading and inaccurate than the headline he
decries.

------
betawolf33
There are a number of misrepresentations in this article.

He talks of the impossibility of the 550 (where has this number come from?)
GCHQ and NSA analysts reading all this data. As if people are claiming that
humans are individually watching it themselves. Even if they were, he
completely ignores the obvious method: that machines under those agencies'
control are the ones collecting this data, in favour of a hand-waving 'it's
impossible'.

His statement that it's illegal for the NSA analysts to read these
communications nicely sidesteps the strong evidence that they are doing it.

Even the focus of the article, the 21 million GB claim, tries to stoke up
something much madder than what actually happened. The Guardian's reporting of
the potential volume of data on the cables is not 'wild' in any sense. That
the telephone game translated that potential figure into an actual one is not
desirable, but also not particularly surprising. Notably, he doesn't take the
opportunity to find out the _actual_ average data rate on those cables and
correct what he's criticising.

Of course, he also takes the opportunity to ladle on some criticism of
Greenwald and the Guardian, and play up the 'it's a complex issue' angle which
discourages any kind of public outcry.

------
nl
I've always thought this is interesting, if not directly related.

Apache Accumulo:

 _Like Google, the [NSA] needed a way of storing and retrieving massive
amounts of data across an army of servers, but it also needed extra tools for
protecting all that data from prying eyes. They added “cell level” software
controls that could separate various classifications of data, ensuring that
each user could only access the information they were authorized to access._

[http://www.wired.com/wiredenterprise/2012/07/nsa-accumulo-
go...](http://www.wired.com/wiredenterprise/2012/07/nsa-accumulo-google-
bigtable/)

Then:

 _How will graph applications adapt to Big Data at petabyte scale?_

 _Brain scale. . ._ : 2.84 PB adjacency list, 2.84 PB edge list

 _Largest supercomputer installations do not have enough memory to process the
Brain Graph (3 PB)!_

[Accumulo gives] _linear performance from 1 trillion to 70 trillion edges_ [on
the 1 Pb Graph 500 benchmark]

From "An NSA Big Graph Experiment":
[http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013...](http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf)
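
If it helps, the idea behind "cell level" controls fits in a few lines of toy
Python (this is just the concept, not Accumulo's actual API, which is Java and
uses richer boolean visibility expressions):

    # Each cell carries its own visibility label; a scan only returns
    # cells whose label is satisfied by the reader's authorizations.
    table = {
        ("alice", "phone"): ("SECRET",    "555-0100"),
        ("alice", "email"): ("TOPSECRET", "alice@example.org"),
        ("bob",   "phone"): ("SECRET",    "555-0199"),
    }

    def scan(auths):
        return {key: value for key, (label, value) in table.items()
                if label in auths}

    print(scan({"SECRET"}))               # phone numbers only
    print(scan({"SECRET", "TOPSECRET"}))  # everything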

------
seabee
An interesting example of how journalists play telephone, but let's not let
quibbling over the volume of data they intercept distract us from the
problematic fact that they do this at all.

What would worry me is a chain of citations where the pertinent facts changed
(e.g. "celebrity X has the potential for Y" -> "X did Y"). This happens too,
as you sometimes find in issues of Private Eye.

------
Canada
Deduplication. 753,785 people watching the same video. The video can be stored
once, or even not at all. Storing 753,785 requests for the video is easy and
takes 0 analysts.
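
Roughly like this (toy sketch of content-addressed storage, numbers from
above):

    import hashlib

    blobs = {}      # content hash -> bytes, stored once
    requests = []   # tiny per-request records

    def record(user, payload):
        digest = hashlib.sha256(payload).hexdigest()
        blobs.setdefault(digest, payload)   # the video is stored at most once
        requests.append((user, digest))     # a few dozen bytes per viewer

    video = b"the same cat video" * 1000
    for user in range(753785):
        record(user, video)

    print(len(blobs), len(requests))        # 1 blob, 753,785 request records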

In fact, by storing only what users send rather than what they receive you get
everything that really matters.

------
Amadou
Is there any doubt that if the NSA _could_ store 22 petabytes per day, they
would do so? That they are limited by practicality to merely snooping through
or scanning 22 petabytes per day, rather than storing all of it, is, I think,
a really minor quibble.

~~~
JonSkeptic
According to the NSA[1], their Utah data center's capacity is "measured in
zettabytes".

Let's break this down:

>1,000 gigabytes is a terabyte.

>1,000 terabytes is a petabyte.

>1,000 petabytes is an exabyte.

>1,000 exabytes is a zettabyte.

>1,000 zettabytes is a yottabyte.

Assuming they count in decimal (1,000 bytes to a kilobyte, not 1,024), let's
say they have 1 yottabyte of data storage. That's 1,000 zettabytes. That's
1,000,000 exabytes. And, finally, that's 1,000,000,000 petabytes.

At a rate of 22 petabytes per day it will take approximately 45,454,545 days,
or about 124,533 years, to fill only the Utah data center. (Thank you, Wolfram
Alpha [2].)
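
Or, skipping Wolfram, the same rough decimal math:

    # 1 yottabyte = 1e9 petabytes (decimal); filling it at 22 PB/day:
    days = 1e9 / 22
    print(days, days / 365)   # ~45,454,545 days, ~124,533 years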

So the scary answer to your rhetorical question is: yes, the NSA has the
storage capacity to easily, if not comfortably, store 22 petabytes of data per
day - or even 500 times that - without breaking a sweat.

\----------

[1][http://nsa.gov1.info/utah-data-center/](http://nsa.gov1.info/utah-data-
center/)

[2][http://www.wolframalpha.com/input/?i=%281%C3%9710%5E9%2F22%2...](http://www.wolframalpha.com/input/?i=%281%C3%9710%5E9%2F22%29+days+to+years)

~~~
ars
> is "measured in zettabytes"

Yah right. That would require the ENTIRE world's production of hard disks for
several years.

At 3TB per hard disk you need 350 million of them. Total world production is
about 600 million - but the vast majority are much smaller than 3TB.

And that would require a building 1/5 of a mile on each side - just to hold
the hard disks, never mind power, cooling, computers or network.

It would require 2.5GW to power just the hard disks - and never mind cool
them, or power the computers and routers.
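
Back of the envelope, assuming 3 TB drives at ~7.5 W each (my numbers):

    ZB = 1e21                        # one zettabyte, decimal
    drives = ZB / 3e12               # ~333 million 3 TB drives, same ballpark as above
    power_gw = drives * 7.5 / 1e9    # ~7.5 W per spinning drive
    print(int(drives), power_gw)     # ~333,333,333 drives, ~2.5 GW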

So: Yah Right.

~~~
nl
It's not as simple as that.

Large-scale commercially available SANs advertise de-duplication on the order
of 50%[1].

Commercially available _backup_ software advertises de-duplication rates that
reduce storage requirements by up to 99%[2].

I don't know how they measure "storage", but the Utah data center requires
65 MW of power[3], which is a non-trivial amount.

[1] [http://www.netapp.com/us/products/platform-
os/dedupe.aspx](http://www.netapp.com/us/products/platform-os/dedupe.aspx)

[2]
[http://thebackupblog.typepad.com/thebackupblog/2011/07/insid...](http://thebackupblog.typepad.com/thebackupblog/2011/07/inside-
avamar-global-client-deduplication.html)

[3] [http://www.npr.org/2013/06/10/190160772/amid-data-
controvers...](http://www.npr.org/2013/06/10/190160772/amid-data-controversy-
nsa-builds-its-biggest-data-farm)

------
iamthepieman
The NSA and GCHQ have way more than 550 employees, and I guarantee that the
remainder is not all just janitors, secretarial staff and management.

This article and most others have failed to mention the technological force-
multipliers (the bread and butter of HN) that their developers, sysadmins, and
other technologists are constantly working on.

With the right search capabilities and filtering, that 21 million GB gets a
lot smaller.
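
Even a dumb selector pass makes the haystack manageable - the hit rate below
is completely made up, but it shows the shape of the thing:

    # 21 million GB of raw take, but analysts only see what the selectors hit.
    raw_gb_per_day = 21 * 10**6
    hit_rate = 0.00001                   # made-up 0.001% selector hit rate
    print(raw_gb_per_day * hit_rate)     # 210 GB/day left for humans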

~~~
fnordfnordfnord
Bamford put it at ~70,000 in the seventies in one of his books. Wikipedia
estimates 30,000. 550 is a comically low guess. We could arrive at a much
larger guess than 550 merely by counting parking places at Ft. Meade, and that
wouldn't include the many employees who don't work at Ft. Meade.

------
ig1
The LHC grid (as of 2010) had a storage capacity of 150 petabytes, so yes, it
is feasible. Efficient compression could probably increase the effective
capacity 5-10x if you're storing primarily textual data.
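
Rough math (my numbers):

    lhc_pb, daily_take_pb = 150, 21
    print(lhc_pb / daily_take_pb)        # ~7 days of full take, uncompressed
    print(lhc_pb * 7 / daily_take_pb)    # ~50 days at ~7x text compression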

------
nsxwolf
Hey they're wrong about the amount of the data, therefore we've got nothing to
worry about!

------
isleyaardvark
This should be called "How a Trivial Fact Check Became a Hyperbolic Article".

------
hochiwa
550 security guards, each in a little stadium of CCTVs

always watching

------
DanielBMarkham
Yes, the worst part about the security state/NSA story is listening to and
reading those who are most upset about it. They get on an emotional tear and
play loose with facts and sources. This hurts much more than it helps, because
it begins to paint anybody concerned about what we've created as a crank. I'm
much more concerned about the ranters than I am about those who are
apathetic.

This is an issue to be _intellectually passionate_ about, not emotionally
passionate. It's something I have to keep reminding myself of daily.

News outlets and bloggers are all too willing to play telephone and translate
emotional energy into pageviews. That's not a good thing for the discussion.

~~~
fnordfnordfnord
As soon as you boil it down into boring work, it will be forgotten about by
the proles.

~~~
DanielBMarkham
The problem with this strategy is that all you get is a mindless angry mob.
This is the Egypt scenario: everybody knows things are broken, but nobody can
agree on what it's supposed to look like.

Contrast this with the Federalist Papers, where an intellectual debate was
held to vet _solutions_.

You don't want "any change" to take the place of "fix what's broken". By
dumbing this down, you make what could be a productive movement into something
much more dangerous.

------
Alex3917
As a rule of thumb, conservatives base their public policy agenda on theories
that have no basis in reality, whereas liberals base their public policy
agenda on statistics that have no basis in reality. This is a good example of
the latter.

