
Estimated Cost to Store All US Phone Calls Made in a Year - danso
https://docs.google.com/spreadsheet/ccc?key=0AuqlWHQKlooOdGJrSzhBVnh0WGlzWHpCZFNVcURkX0E#gid=0
======
alexholehouse
So here's a related question I've been wondering (IANAL - _so_ hard).

A number of foreign blogs/companies/people have displayed outrage that, given
that they're not protected by the legal framework American citizens are, PRISM
may give the NSA access to their data.

Irrespective of the truth associated with this, you've got to expect at least
some percentage of paying customers will move business away from American
companies. Considering this, is it conceivable that companies like Apple,
Facebook, etc, could sue the government for lost earnings as a result of the
fallout from this? Or are there a bunch of reasons why they wouldn't/couldn't
(other than the obvious ones, like not pissing off the government)?

~~~
adventured
Companies like Apple count on the support of the US Government in all sorts of
trade / legal / international issues, now and in the future. Using Apple as an
example, when you're a half trillion dollar company and trying to navigate
practically every market on earth, it's an absolute nightmare at times and
helps to have a superpower that possesses the world's largest economy and
military in your corner. Having IP problems in Indonesia (wherever)? Ask your
buddies in Washington to help.

That's just one example, and a friendly one. The not-so-friendly example is
Qwest and Joe Nacchio. Or perhaps you lose lucrative government contracts for
iPads. Or perhaps the Senate massively turns up the heat on your tax avoidance
schemes, and it costs you billions because they have a record of everything
you've ever said or done via NSA spying and know where you broke some obscure
tax laws (guaranteed to happen at that scale). Or perhaps that damn pesky
anti-trust case just refuses to go away, and instead is expanded, as the Feds
start snooping around other business deals and practices (guaranteed to find
something).

Or if this were the past, maybe they light Steve Jobs on fire in the options
scandal, instead of being nice and letting it go away with the equivalent of a
slap on the wrist (with a few patsy scapegoats).

~~~
larrys
Excellent answer. It also illustrates something people don't understand, even
though it's obvious: why the right thing doesn't always happen in Washington.
Because it can't.

Politics is the art of compromise.

It's not about who is right and the right thing happening. It's about who can
best navigate the system to achieve what they want, or as much as they can
without losing too much.

Your comment covers this concept perfectly.

------
adventured
If YouTube exists (specifically their math of adding X hours of video per
minute), there's absolutely no question as to whether the Feds _can_ store
every phone call. That's trivial.

The NSA has a $15+ billion budget. The FBI has an $8 billion budget. The US
military has a $638 whatever odd billion budget. The intelligence budget is
$80 or so billion.

Yeah they can afford it. That is not an issue.

~~~
raverbashing
In terms of storage costs, a phone call can be compressed and stored quite
efficiently.

~~~
diydsp
Yeah, now that I read this, the scary thing is that it's so cheap, that /not
only organizations with huge budgets/ can store every single phone call.

IOW, many Fortune 500 companies and foreign governments, all with fewer legal
scruples, less security and fewer obligations than the NSA, could _also_ store
every single phone call made in the U.S. - if they can get their hands on a copy.

Hell, I imagine any decent organized crime syndicate could scrape up $27M to
store all the data and start mining it for information about when homes are
unprotected.

Having all of these calls stored in one place is a HUGE liability. I said -if
they can get their hands on a copy- and it will be hard to get one from the
NSA... unless there are inside jobs at the NSA, or external contractors with
fewer scruples, less security, etc. in place.

IOW, I'm not as scared of the government having access to this data (although
I'm against it), but it's even scarier that 3rd parties can gain access to it.

~~~
adventured
Very important to note: it's not so much an issue for private companies (non-
telecom carriers) as to whether they can get their hands on something like
phone calls. It's extraordinarily illegal to mass record / copy phone
conversations (that are not yours; and sometimes even when they are yours,
depending on states and notice given) if you're a private business or
individual. I can't emphasize extraordinarily illegal enough. If you wanted to
get into deep shit real fast, and you're a mid level Fortune 500 company,
start tapping into all the nation's phone calls and save a copy of said calls.
The Feds would literally destroy you, not an exaggeration; you would never
walk right again.

I'm assuming in my example that the government isn't the one giving the calls
/ data to a private company, and that said company is doing the spying itself.

~~~
gknoy
However, I imagine that the courts might be sympathetic to a large company
recording + mining all phone calls made with company phones, since they
already went that route with e-mail.

------
bradleyland
You might as well throw out any number.

The base here is developed from the author's "family average". That doesn't,
in any way, reflect "all US phonecalls". Consider business users. There are a
substantial number of business users who talk on the phone for >1,000 minutes
per month. "Family" averages are only going to reflect personal phone calls,
which are a fraction of the phone calls made.

We also cannot assume equivalency between what the Internet Archive pays per
petabyte and what the NSA pays per petabyte. When dealing with government
projects, you have all manner of requirements that have no parallel in the
rest of the business world.

That $27.2 million number might as well be $50 million, or $100 million. It
all depends on your input variables. This is napkin math at its worst.

~~~
ctdonath
Napkin math still gets a ballpark notion.

Let's compute an outer limit: record everyone, all the time, CD quality.

44100 samples per second * 2 bytes per sample * 2 channels * 60 seconds per
minute * 60 minutes per hour * 24 hours per day * 365 days per year *
313900000 people * $100 per terabyte / 1 terabyte = $175 billion per year.
That's an absolute outer limit for cost.

That's less than 5% of the federal budget, and we haven't started on the obvious
ways to cut costs by several orders of magnitude. Reduce it to 4410
samples/sec, 1 byte per sample, 1 channel, 1/10th the time and we're already
under $0.5B/yr, without even addressing audio compression (much less voice-to-
text).
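
For anyone who wants to check the arithmetic, here is a minimal sketch in Python that uses only the figures quoted above (population, $100 per terabyte, sample rates). The function name `yearly_cost` and the `duty_cycle` parameter are just illustrative, and a decimal terabyte (10^12 bytes) is assumed.

```python
# Back-of-the-envelope upper bound using the figures quoted in the comment.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
US_POPULATION = 313_900_000
DOLLARS_PER_TERABYTE = 100
BYTES_PER_TERABYTE = 10**12  # assumption: decimal terabytes


def yearly_cost(samples_per_sec, bytes_per_sample, channels, duty_cycle=1.0):
    """Dollars per year to store uncompressed audio for the whole population.

    duty_cycle is the fraction of the time each person is recorded.
    """
    bytes_per_person = (
        samples_per_sec * bytes_per_sample * channels * SECONDS_PER_YEAR * duty_cycle
    )
    total_bytes = bytes_per_person * US_POPULATION
    return total_bytes / BYTES_PER_TERABYTE * DOLLARS_PER_TERABYTE


# Everyone, all the time, CD quality.
cd_quality = yearly_cost(44100, 2, 2)

# 4410 samples/sec, 1 byte per sample, 1 channel, 1/10th of the time.
reduced = yearly_cost(4410, 1, 1, duty_cycle=0.1)

print(f"CD quality, 24/7: ${cd_quality / 1e9:.0f}B per year")
print(f"Reduced quality:  ${reduced / 1e9:.2f}B per year")
```

The reduced case is exactly 400x cheaper (10x fewer samples, half the bytes per sample, half the channels, a tenth of the time), which is how $175B drops below $0.5B before any compression.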

~~~
bradleyland
So what inference are we drawing? I honestly wasn't aware that the capability
to _store_ all voice traffic was in question. That might be owed to my
background in telecom though.

Telephone codecs are extremely low bitrate (relative to something like music,
much less video) because you have very clear design constraints, and those
constraints are forgiving. You just need to be able to understand the caller's
voice, not accurately reproduce a live performance of Beethoven's 5th. I agree
that the estimates to store the data are on the very, very high side, but I'm
not sure that number is even significant in the whole scope of the challenge.

The challenges in obtaining every phone call made in the US aren't storage
related; they're almost entirely collection and aggregation related. These
ballpark numbers don't even attempt to factor in that portion of the cost.
IMO, the collection, transport, and aggregation systems are easily 8/10ths of
the problem, not storage. Storage is an extremely scalable solution.

If the government were going to do something like this, they'd likely tap in
to the phone networks at the same places everyone else does. There are major
telecom aggregation points - called "tandems" [1] - around the country. If you
want to set up your own phone carrier with an actual physical network of your
own, this is where you plug in. The government would have to do the same. They
can't simply tap in at a single aggregation point, because not all phone calls
pass through the same points.

From there, you face the choice of storing the data regionally in several data
centers, or attempting to aggregate it all back to a central data center. IMO,
a disaggregated approach makes a lot more sense. You could reasonably expect
to transport all CDR [2] data back to a central location, but you wouldn't
want to send all media (audio data) back to one place. It would be simple
enough to only fetch what you need based on a query against call data. You'd
want all the CDR data in one place so you could perform your "big data"
analysis on it efficiently, then cherry pick media to pull in for analysis.

I'm certain I haven't even scratched the surface here. This only gets the
government long distance calls. It doesn't touch local, or even intraLATA
calls. Maybe the government isn't interested in those calls though. Maybe
they're only interested in international calls, which makes the problem
simpler, not harder.

Others in the thread have mentioned that this should be treated as an "order
of magnitude" estimation. I don't think we can draw any inference from this
estimation at all, because it represents such a small portion of the problem
domain. We also have no idea what the scope of the challenge is.

The entire exercise is pointless when you think about what we're asking. "Can
the government actually implement a solution to record every phone call in the
US?" I think that's a resounding yes. The solution would look a lot like
setting up a tier 2 network provider [3] with extra investment in a storage
back end. That's entirely within the realm of possibility given the size of
the US Dept of Defense budget.

[1]
[http://en.wikipedia.org/wiki/Class_4_telephone_switch](http://en.wikipedia.org/wiki/Class_4_telephone_switch)

[2]
[http://en.wikipedia.org/wiki/Call_detail_record](http://en.wikipedia.org/wiki/Call_detail_record)

[3]
[http://en.wikipedia.org/wiki/Tier_2_network](http://en.wikipedia.org/wiki/Tier_2_network)

~~~
ctdonath
I'm learning to approach such SWAG discussions filled with "golly, we just
have no idea" by computing outer limits to at least establish "it can't
possibly get any worse/bigger/costlier than $X".

In this case, I was responding to " _You might as well throw out any number_ "
by throwing out a ballpark figure for the absolute worst case audio
surveillance cost scenario: everyone, all the time. OK, so the final number is
really huge, and doesn't take into account the many nuances you mention...but
I know that reducing storage costs by orders of magnitude will leave plenty of
room to accommodate your real-world details. With that sweeping
overgeneralization, I've concluded that whatever the details of implementation
and whatever the extent of monitoring desired, the cost is well within the
operating expenditure of the US federal government.

Now that we know that it's not realistically going to cost more than 5% of
gov't spending, we can contrast that with the NSA's actual budget, and
guesstimate how big the eavesdropping effort really is from that.

If nothing else, it's an exercise to explain to common readers (not you,
you're a telecom guy who groks this stuff) that it IS in fact possible (not
easy, not cheap, but indeed practically possible) to record _every_ phone
call. The scale of capability of some modern technologies is otherwise
incomprehensible to most people; they'll dismiss the notion out of hand unless
you can quantify it in very simple terms they can instantly grasp, like
"recording everyone 24/7 would cost no more than 5% of federal operating
costs."

Kinda hard to mentally keep up with technology which has expanded a billion-
fold in capacity in just 3 decades.

~~~
bradleyland
I'd agree 100%. If I were establishing an outer limit, I would look at a
couple of tier 2 long distance carriers (because there are very few that are
nation wide), and add in a company like Backblaze or CrashPlan. It would make
a fun exercise to dig in to the quarterly filings of some tier 2 LD carriers,
but they buy and sell each other so often, and there are so few that operate
nation wide, it ends up being a lot of work. Another possibility would be to
compare to a mobile operator like Virgin or MetroPCS. Neither operates its
own network, but they do have their own backend systems.

That would give you an idea of the size we're talking. It's really not all
that big. I don't think people realize that the technology that drives
telephone networks (especially once you get outside the last mile) lends
itself very well to snooping.

------
aidenn0
Numbers are _way_ off. QCELP8 (which is what a lot of phone calls were
actually transmitted in on the cell phone network; it's since been replaced by
the more efficient EVRC) runs at 1/8 the uncompressed size. Offline
compression can be even more efficient, since you can (to a certain amount)
trade latency for compression (and QCELP is technology designed to run on low-
power hardware from 1994, so it's not exactly a beast processing wise).

~~~
epa
It's alright that the numbers are way off (too high), as this is just a crude
estimate. If anything, it shows that this is most likely the worst-case
scenario, and even then $27M is very cheap relative to government/army program
funding.

The only thing it does not take into account is business to business calling.

------
zerohm
For comparison, one modern fighter jet (F-22) runs about $357M.

~~~
astrodust
The F-35 is even more, and that hasn't stopped the US from ordering thousands
of them.

------
ajmarsh
I've worked in several government-run datacenters. No chance they stored all
that data that cheaply. They can find a way to overpay for anything.

------
IvyMike
I made a similar estimate (based on the premise of crude voice recognition of
calls, not storage) in Feb 2006.

[http://ivymike.blogspot.com/2006/02/could-nsa-wiretap-everyb...](http://ivymike.blogspot.com/2006/02/could-nsa-wiretap-everybody.html)

I used a different methodology to come up with call volume; I looked at the
Inter- and IntraLATA numbers (in 2004, which was the latest available when I
did this analysis.) My number was 700 billion minutes per year.

This new estimate is around 1100 billion minutes per year, which seems very
plausible to me.
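
As a rough cross-check, here is a small Python sketch of what per-minute size the new estimate implies. It combines the 1100 billion minutes above with the 272 PB total quoted elsewhere in this thread; decimal petabytes are an assumption, and the variable names are just illustrative.

```python
# Figures quoted in this thread; decimal units (1 PB = 10**15 bytes) are assumed.
TOTAL_BYTES = 272 * 10**15   # 272 PB total storage in the new estimate
MINUTES_NEW = 1100 * 10**9   # call minutes per year in the new estimate
MINUTES_2004 = 700 * 10**9   # call minutes per year from the 2006 analysis

# Average storage per call minute implied by the new estimate.
bytes_per_minute = TOTAL_BYTES / MINUTES_NEW
implied_kbit_per_sec = bytes_per_minute * 8 / 60 / 1000

# Storage needed for the 2004 call volume at the same per-minute size.
petabytes_2004 = bytes_per_minute * MINUTES_2004 / 10**15

print(f"{bytes_per_minute / 1000:.0f} kB per call minute")
print(f"{implied_kbit_per_sec:.1f} kbit/s implied bitrate")
print(f"{petabytes_2004:.0f} PB for 700 billion minutes")
```

Under those assumptions the two call-volume figures differ by well under a factor of two in storage, so the choice between them barely moves the order of magnitude.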

------
davepage
It is interesting to consider the purpose of the NSA Utah Data Center given
the space and cost requirements for storing phone calls are beneath trivial
(3e17 bytes, < one floor of an office building and < $0.1B).

At 1.5M square feet, it could hold 344 copies of the national phone call audio
database, based on the OP areal estimate.

An unconfirmed report [0] asserts the center will store 5e21 bytes. World
internet traffic was 3e21 bytes in 2012.

[0]
[http://en.wikipedia.org/wiki/Zettabyte](http://en.wikipedia.org/wiki/Zettabyte)

~~~
at-fates-hands
confirmed.

[http://www.npr.org/2013/06/10/190160772/amid-data-controvers...](http://www.npr.org/2013/06/10/190160772/amid-data-controversy-nsa-builds-its-biggest-data-farm)

The estimated power of those computing resources in Utah is so massive it
requires use of a little-known unit of storage space: the zettabyte. Cisco
quantifies a zettabyte as the amount of data that would fill 250 billion DVDs.

"They would have plenty of space with five zettabytes to store at least
something on the order of 100 years worth of the worldwide communications,
phones and emails and stuff like that," Binney asserts, "and then have plenty
of space left over to do any kind of parallel processing to try to break
codes."

~~~
acqq
Not confirmed. Apparently, the worldwide hard disk production is around 500
million devices per year (see wikipedia). Tera is 1e12, zetta is 1e21, so
they'd need a year's world production of hard disks for one data center?

Exabyte storage seems possible, however.
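
A quick Python sketch of that comparison, for what it's worth. The 500 million drives per year is the figure above; the per-drive capacity is NOT given in this thread, so the 1 TB used here is purely an assumption to plug in, and `years_of_production` is an illustrative name.

```python
ZETTABYTE = 10**21
EXABYTE = 10**18
TERABYTE = 10**12

WORLD_DRIVES_PER_YEAR = 500_000_000  # figure quoted above


def years_of_production(target_bytes, drive_bytes):
    """Years of worldwide drive output needed to reach target_bytes."""
    drives_needed = target_bytes / drive_bytes
    return drives_needed / WORLD_DRIVES_PER_YEAR


# Assumption: 1 TB per drive. Swap in whatever average capacity you believe.
assumed_drive = 1 * TERABYTE

print(years_of_production(5 * ZETTABYTE, assumed_drive), "years for 5 ZB")
print(years_of_production(1 * EXABYTE, assumed_drive), "years for 1 EB")
```

With 1 TB drives, 5 ZB works out to about ten years of world production, while an exabyte is a tiny fraction of one year's output.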

~~~
davepage
Maybe. But the feds have an historical fondness for tapes.
[http://www-03.ibm.com/systems/storage/tape/ts3500/index.html](http://www-03.ibm.com/systems/storage/tape/ts3500/index.html)

A mere 500 2.7EB complexes and it starts looking like real data.

~~~
acqq
Thanks for the suggestion. As far as I see, it's up to 50 PB raw data per
library, and such a library needs 16 frames (frame is what we call a rack,
"1,800 mm H × 782 mm W × 1,212 mm D") so it's cca 3 PB (3e15) per rack. With
2.5K racks on site (100K sqft data center space according to Wikipedia) it's
still cca 8 exabytes (8e18) in the whole data center.
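
The same estimate as a few lines of Python, using only the numbers quoted above (50 PB per library, 16 frames, 2.5K racks) and assuming decimal units:

```python
PETABYTE = 10**15
EXABYTE = 10**18

library_bytes = 50 * PETABYTE  # raw capacity per library, as quoted
frames_per_library = 16        # one frame is treated as one rack
racks_on_site = 2500           # rack count quoted above

bytes_per_rack = library_bytes / frames_per_library
site_bytes = bytes_per_rack * racks_on_site

print(f"{bytes_per_rack / PETABYTE:.1f} PB per rack")
print(f"{site_bytes / EXABYTE:.1f} EB for the whole site")
```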

~~~
davepage
It is not unreasonable to assume, however, that a facility such as this would
include a custom-designed media management system which could hold cartridges
at a higher density than COTS (perhaps with an access latency tradeoff).

1M sq ft of total area, and 100k in the data center may not preclude another
100k sq ft of 'warehouse' containing millions of carts in shoe boxes with GS5s
running around in sneakers.

------
Zigurd
There are only 7 billion human mouths on the planet. Perhaps 3 billion belong
to humans with enough money to talk on phones, which is also a good filter for
basic capacity to make trouble. Capture all of that, and processing and storing
it will cost a fraction of what it costs to do it the hard way:
[http://en.wikipedia.org/wiki/Signals_intelligence_operationa...](http://en.wikipedia.org/wiki/Signals_intelligence_operational_platforms_by_nation)

There's some expensive stuff in there.

------
tootie
Headline should be "cost of storage", not "cost to store". They have to pay a
lot of guys like Ed Snowden $200,000/yr to maintain a database that size.

------
epaga
It seems to me the bandwidth of retrieving all these Petabytes of data from
the various networks into your storage space is more of a bottleneck than the
cost of storing it.

------
iterationx
This was one of the many interesting points made in the book "Cypherpunks:
Freedom and the Future of the Internet" by Julian Assange.

[http://cryptome.org/2012/12/assange-crypto-arms.htm](http://cryptome.org/2012/12/assange-crypto-arms.htm)

------
snorkel
We think of lists of every phone call ever as a lot of big data, but consider
that your web browser produces many more requests per browsing session than
the total number of phone calls, texts, and tweets you produced all day.

------
randlet
This ignores the (likely) huge cost of personnel and software required to
manage it all.

~~~
spydum
Yes, and out of 272 PB, storage failures will be frequent. At what looked to be
$100/TB, I have to imagine this is not accounted for.

~~~
ctdonath
If the base cost is $27M, accounting for storage failures is cheap (in
government/intelligence budget terms). A SWAG of 10x price for dual RAID-5
storage brings it to just $270M. That's peanuts for NSA types.
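
Sketching that in Python with the numbers from this thread (272 PB at $100/TB, then a flat 10x multiplier for redundancy). The 10x is the SWAG above, not a measured overhead, and `storage_cost` is just an illustrative helper.

```python
TB_PER_PB = 1000  # assumption: decimal units


def storage_cost(petabytes, dollars_per_tb=100, redundancy_multiplier=1):
    """Raw storage cost in dollars, scaled by a flat redundancy multiplier."""
    return petabytes * TB_PER_PB * dollars_per_tb * redundancy_multiplier


base = storage_cost(272)
padded = storage_cost(272, redundancy_multiplier=10)

print(f"base:   ${base / 1e6:.1f}M")
print(f"padded: ${padded / 1e6:.0f}M")
```

Using the unrounded 272 PB gives $27.2M and $272M, which matches the rounded $27M and $270M figures.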

------
abe_duarte
I expected the cost to be so much higher. How would they record the calls
though?

~~~
DennisP
Beam splitters on the trunk lines?

