Estimated Cost to Store All US Phone Calls Made in a Year (docs.google.com)
91 points by danso on June 18, 2013 | 53 comments

So here's a related question I've been wondering (IANAL - so hard).

A number of foreign blogs/companies/people have displayed outrage that, given that they're not protected by the legal framework American citizens are, PRISM may give the NSA access to their data.

Irrespective of the truth associated with this, you've got to expect at least some percentage of paying customers will move business away from American companies. Considering this, is it conceivable that companies like Apple, Facebook, etc, could sue the government for lost earnings as a result of the fallout from this? Or are there a bunch of reasons why they wouldn't/couldn't (other than the obvious ones like, don't piss off the government).

Companies like Apple count on the support of the US Government in all sorts of trade / legal / international issues, now and in the future. Using Apple as an example, when you're a half trillion dollar company and trying to navigate practically every market on earth, it's an absolute nightmare at times and helps to have a superpower that possesses the world's largest economy and military in your corner. Having IP problems in Indonesia (wherever)? Ask your buddies in Washington to help.

That's just one example, and a friendly one. The not-so-friendly example is Qwest and Joe Nacchio. Or perhaps you lose lucrative government contracts for iPads. Or perhaps the Senate massively turns up the heat on your tax avoidance schemes, and it costs you billions because they have a record of everything you've ever said or done via NSA spying and know where you broke some obscure tax laws (guaranteed to happen at that scale). Or perhaps that damn pesky anti-trust case just refuses to go away, and instead is expanded, as the Feds start snooping around other business deals and practices (guaranteed to find something).

Or if this were the past, maybe they light Steve Jobs on fire in the options scandal, instead of being nice and letting it go away with the equivalent of a slap on the wrist (with a few patsy scapegoats).

Excellent answer and also illustrates why people don't understand, and it's obvious, why the right thing doesn't always happen in Washington. Because it can't.

Politics is the art of compromise.

It's not about who is right and the right thing happening. It's about who can best navigate the system to achieve what they want, or as much as they can without losing too much.

Your comment covers this concept perfectly.

According to their latest tax returns Facebook and Apple have a net loss in the US so if anything they should pay the government a commission for sending customers to their more profitable offices in Ireland and the Caribbean.

I'd love to see if/how traffic has changed since the announcements. I wouldn't be surprised if we look back on this as a watershed moment for the whole Internet industry.

I doubt it's changed much.

Where would the traffic go?

It's hard enough replacing products & services from the 'big companies' (Google, Facebook, Apple, etc) with smaller ones operating inside the US, let alone replacing them with services outside the US.

Where does the traffic come from?

Do the majority of users really care that they're being spied on? "If you have nothing to hide..." seems to be a reasonably common way of thinking.

Well, at the minimum, I guess businesses will think twice about using Skype or Google Voice.

If YouTube exists (specifically their math of adding X hours of video per minute), there's absolutely no question as to whether the Feds can store every phone call. That's trivial.

The NSA has a $15+ billion budget. The FBI has a $8 billion budget. The US military has a $638 whatever odd billion budget. The intelligence budget is $80 or so billion.

Yeah they can afford it. That is not an issue.

In terms of storage costs, a phone call can be compressed and stored quite efficiently.

Yeah, now that I read this, the scary thing is that it's so cheap, that /not only organizations with huge budgets/ can store every single phone call.

IOW, many Fortune 500 companies and foreign governments, all with fewer legal scruples, less security and fewer obligations than the NSA, could also store every single phone call made in the U.S. - if they can get their hands on a copy.

Hell, I imagine any decent organized crime syndicate could scrape up $27M to store all the data and start mining it for information about when homes are unprotected.

Having all of these calls stored in one place is a HUGE liability. I said -if they can get their hands on a copy- and a copy will be hard to get from the NSA... unless there are inside jobs at the NSA, or external contractors with fewer scruples, less security, etc. in place.

IOW, I'm not as scared of the government having access to this data (although I'm against it), but it's even scarier that 3rd parties can gain access to it.

Very important to note: it's not so much an issue for private companies (non-telecom carriers) as to whether they can get their hands on something like phone calls. It's extraordinarily illegal to mass record / copy phone conversations (that are not yours; and sometimes even when they are yours, depending on states and notice given) if you're a private business or individual. I can't emphasize extraordinarily illegal enough. If you wanted to get into deep shit real fast, and you're a mid level Fortune 500 company, start tapping into all the nation's phone calls and save a copy of said calls. The Feds would literally destroy you, not an exaggeration; you would never walk right again.

I'm assuming in my example that the government isn't the one giving the calls / data to a private company, and that said company is doing the spying itself.

However, I imagine that the courts might be sympathetic to a large company recording + mining all phone calls made with company phones, since they already went that route with e-mail.

It's illegal for the NSA to record it too though, right?

It was illegal for the telcos to go along with the first round of warrantless wiretapping; it took an act of Congress to make it retroactively legal.

...transcribed to text and compressed. Even better.

If it's automatic...

But I believe there's value in keeping the recording, for voice matching, recording of other sounds in the call, etc.

You might as well throw out any number.

The base here is developed from the author's "family average". That doesn't, in any way, reflect "all US phonecalls". Consider business users. There are a substantial number of business users who talk on the phone for >1,000 minutes per month. "Family" averages are only going to reflect personal phone calls, which are a fraction of the phone calls made.

We also cannot assume equivalency between what the Internet Archive pays per petabyte and what the NSA pays per petabyte. When dealing with government projects, you have all manner of requirements that have no parallel in the rest of the business world.

That $27.2 million number might as well be $50 million, or $100 million. It all depends on your input variables. This is napkin math at its worst.

Napkin math still gets a ballpark notion.

Let's compute an outer limit: record everyone, all the time, CD quality.

44100 samples per second * 2 bytes per sample * 2 channels * 60 seconds per minute * 60 minutes per hour * 24 hours per day * 365 days per year * 313900000 people * $100 per terabyte / 1 terabyte = $175 billion per year. That's an absolute outer limit for cost.

That's less than 5% of federal budget, and we haven't started on the obvious ways to cut costs by several orders of magnitude. Reduce it to 4410 samples/sec, 1 byte per sample, 1 channel, 1/10th the time and we're already under $0.5B/yr, without even addressing audio compression (much less voice-to-text).
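For anyone who wants to check the napkin, here's a rough Python sketch of the same arithmetic. It uses only the figures quoted above ($100 per terabyte, 313.9M people); the function and variable names are made up for illustration, and a decimal terabyte is assumed.

```python
# Outer limit: record every person, 24/7, for one year.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
PEOPLE = 313_900_000
DOLLARS_PER_TB = 100      # storage price assumed in the thread
BYTES_PER_TB = 10**12     # assumption: decimal terabyte

def yearly_cost(samples_per_sec, bytes_per_sample, channels, time_fraction=1.0):
    """Dollar cost to store one year of audio for PEOPLE people."""
    bytes_per_person = (samples_per_sec * bytes_per_sample * channels
                        * SECONDS_PER_YEAR * time_fraction)
    total_tb = bytes_per_person * PEOPLE / BYTES_PER_TB
    return total_tb * DOLLARS_PER_TB

# CD quality, stereo, all the time.
outer_limit = yearly_cost(44_100, 2, 2)
# 4410 samples/sec, 1 byte, mono, 1/10th of the time.
reduced = yearly_cost(4_410, 1, 1, time_fraction=0.1)

print(f"outer limit: ${outer_limit / 1e9:.0f}B per year")
print(f"reduced:     ${reduced / 1e9:.2f}B per year")
```

The four reductions multiply out to a factor of 400 (10 x 2 x 2 x 10), which is how the outer limit drops below half a billion before any compression.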

So what inference are we drawing? I honestly wasn't aware that the capability to store all voice traffic was in question. That might be owed to my background in telecom though.

Telephone codecs are extremely low bitrate (relative to something like music, much less video) because you have very clear design constraints, and those constraints are forgiving. You just need to be able to understand the caller's voice, not accurately reproduce a live performance of Beethoven's 5th. I agree that the estimates to store the data are on the very, very high side, but I'm not sure that number is even significant in the whole scope of the challenge.

The challenges in obtaining every phone call made in the US aren't storage related; they're almost entirely collection and aggregation related. These ballpark numbers don't even attempt to factor in that portion of the cost. IMO, the collection, transport, and aggregation systems are easily 8/10ths of the problem, not storage. Storage is an extremely scalable solution.

If the government were going to do something like this, they'd likely tap in to the phone networks at the same places everyone else does. There are major telecom aggregation points - called "tandems" [1] - around the country. If you want to set up your own phone carrier with an actual physical network of your own, this is where you plug in. The government would have to do the same. They can't simply tap in at a single aggregation point, because not all phone calls pass through the same points.

From there, you face the choice of storing the data regionally in several data centers, or attempting to aggregate it all back to a central data center. IMO, a disaggregated approach makes a lot more sense. You could reasonably expect to transport all CDR [2] data back to a central location, but you wouldn't want to send all media (audio data) back to one place. It would be simple enough to only fetch what you need based on a query against call data. You'd want all the CDR data in one place so you could perform your "big data" analysis on it efficiently, then cherry pick media to pull in for analysis.
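To make the split concrete, here's a toy Python sketch of "central CDRs, regional media". Everything in it is hypothetical (the `CallRecord` fields, the region names, the in-memory dicts standing in for data centers); it only illustrates the query-then-fetch pattern, not how any real system is built.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CallRecord:      # hypothetical CDR row: metadata only, no audio
    call_id: str
    caller: str
    callee: str
    region: str        # which regional store holds the audio

# Central index: all CDRs live in one place for analysis.
central_cdrs = [
    CallRecord("c1", "555-0100", "555-0199", "west"),
    CallRecord("c2", "555-0123", "555-0100", "east"),
    CallRecord("c3", "555-0150", "555-0151", "east"),
]

# Regional stores: audio stays near where it was collected.
regional_media = {
    "west": {"c1": b"audio-c1"},
    "east": {"c2": b"audio-c2", "c3": b"audio-c3"},
}

def fetch_media_for(number):
    """Query the central CDRs, then pull only the matching audio."""
    hits = [r for r in central_cdrs if number in (r.caller, r.callee)]
    return {r.call_id: regional_media[r.region][r.call_id] for r in hits}

selected = fetch_media_for("555-0100")
print(sorted(selected))
```

Only the two calls involving the queried number cross the wire; the third call's audio never leaves its regional store.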

I'm certain I haven't even scratched the surface here. This only gets the government long distance calls. It doesn't touch local, or even intraLATA calls. Maybe the government isn't interested in those calls though. Maybe they're only interested in international calls, which makes the problem simpler, not harder.

Others in the thread have mentioned that this should be treated as an "order of magnitude" estimation. I don't think we can draw any inference from this estimation at all, because it represents such a small portion of the problem domain. We also have no idea what the scope of the challenge is.

The entire exercise is pointless when you think about what we're asking. "Can the government actually implement a solution to record every phone call in the US?" I think that's a resounding yes. The solution would look a lot like setting up a tier 2 network provider [3] with extra investment in a storage back end. That's entirely within the realm of possibility given the size of the US Dept of Defense budget.

[1] http://en.wikipedia.org/wiki/Class_4_telephone_switch

[2] http://en.wikipedia.org/wiki/Call_detail_record

[3] http://en.wikipedia.org/wiki/Tier_2_network

I'm learning to approach such SWAG discussions filled with "golly, we just have no idea" by computing outer limits to at least establish "it can't possibly get any worse/bigger/costlier than $X".

In this case, I was responding to "You might as well throw out any number" by throwing out a ballpark figure for the absolute worst case audio surveillance cost scenario: everyone, all the time. OK, so the final number is really huge, and doesn't take into account the many nuances you mention...but I know that reducing storage costs by orders of magnitude will leave plenty of room to accommodate your real-world details. With that sweeping overgeneralization, I've concluded that whatever the details of implementation and whatever the extent of monitoring desired, the cost is well within the operating expenditure of the US federal government.

Now that we know that it's not realistically going to cost more than 5% of gov't spending, we can contrast that with the NSA's actual budget, and guesstimate how big the eavesdropping effort really is from that.

If nothing else, it's an exercise to explain to common readers (not you, you're a telecom guy who groks this stuff) that it IS in fact possible (not easy, not cheap, but indeed practically possible) to record every phone call. The scale of capability of some modern technologies is otherwise incomprehensible to most people; they'll dismiss the notion out of hand unless you can quantify it in very simple terms they can instantly grasp, like "recording everyone 24/7 would cost no more than 5% of federal operating costs."

Kinda hard to mentally keep up with technology which has expanded a billion-fold in capacity in just 3 decades.

I'd agree 100%. If I were establishing an outer limit, I would look at a couple of tier 2 long distance carriers (because there are very few that are nationwide), and add in a company like Backblaze or CrashPlan. It would make a fun exercise to dig in to the quarterly filings of some tier 2 LD carriers, but they buy and sell each other so often, and there are so few that operate nationwide, it ends up being a lot of work. Another possibility would be to compare to a mobile operator like Virgin or MetroPCS. Neither operates its own network, but they do have their own backend systems.

That would give you an idea of the size we're talking. It's really not all that big. I don't think people realize that the technology that drives telephone networks (especially once you get outside the last mile) lends itself very well to snooping.

It's safe to assume these kinds of estimates are "order of magnitude" estimates, meaning they could be off by a factor of 2 or more. The goal isn't to build your own system or to bill the government, it's to put the numbers in perspective. Case in point: if you had asked me if phone recording cost closer to 10 billion dollars or 100 million dollars I would have guessed 10 billion; now I know better.

Being able to quickly do order of magnitude estimates (especially in your head) is a very important skill IMO. I've heard McKinsey makes it part of their interview process.

Also need to tack on the cost of developing the system, oversight, bidding etc. Still, I would have guessed much higher as well.

Napkin math sure, but not at its worst. It's just a very rough estimation with explanations for numbers given and not claiming to be exact. I think the most valuable lesson here is that it's likely to be in the ballpark of millions of dollars and that in theory it's possible to do it for $27.2 million.

I think the point here is that it's not some ridiculously huge number, like $100 billion. It is feasible for the NSA to actually store a substantial percentage of US phone calls if they wanted.

OTOH, the estimated bytes per call truly is conservative. The 8KB/s estimate is based on G.711, and using G.726 or G.729 would yield a 2-8x reduction in storage. So this is definitely within an order of magnitude: even if the minutes of talk time are off by a factor of 10, you could offset most of that with better codecs.
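A small Python sketch of that codec comparison, for the curious. The bitrates are the nominal ones (G.711 at 64 kbit/s, G.726 at its common 32 kbit/s rate, G.729 at 8 kbit/s); the 272 PB baseline is the thread's storage figure, and treating it as a pure G.711 number is my assumption.

```python
# Nominal codec bitrates in kbit/s. G.726 also defines other rates;
# 32 kbit/s is used here as the common one.
CODEC_KBPS = {"G.711": 64, "G.726": 32, "G.729": 8}
BASELINE_PB = 272   # thread's estimate, assumed to be G.711-based

def bytes_per_second(kbps):
    """Convert a bitrate in kbit/s to bytes per second."""
    return kbps * 1000 // 8

def reduction(codec):
    """How many times smaller than G.711 the codec's output is."""
    return CODEC_KBPS["G.711"] / CODEC_KBPS[codec]

for name, kbps in CODEC_KBPS.items():
    print(f"{name}: {bytes_per_second(kbps)} B/s, "
          f"{reduction(name):.0f}x smaller, "
          f"~{BASELINE_PB / reduction(name):.0f} PB/yr")

# If talk time were underestimated 10x but everything used G.729:
worst_case_pb = BASELINE_PB * 10 / reduction("G.729")
print(f"10x the minutes at G.729: ~{worst_case_pb:.0f} PB/yr")
```

So a 10x miss on minutes with an 8x better codec lands at roughly 1.25x the original storage figure, which is the "offset most of that" claim in numbers.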

(Credit: commenter petrilli on Schneier's recent post about this.)

The point is that even quadrupled, the number is so low, it's a drop in the bucket for the government.

Sure, but the point doesn't change. All the calls made in the US can be stored on a budget that isn't out of reach for an organization like the NSA. Napkin math is all about getting you within the right order of magnitude and this manages that just fine.

Any B2C calls are reflected (well, half-reflected actually) on this sheet and B2B don't talk so much anyway, I guess.

Numbers are way off. QCELP8 (which is what a lot of phone calls were actually transmitted in on the cell phone network; it's since been replaced by the more efficient EVRC) runs at 1/8 the uncompressed size. Offline compression can be even more efficient, since you can (to a certain amount) trade latency for compression (and QCELP is technology designed to run on low-power hardware from 1994, so it's not exactly a beast processing wise).

It's alright that the numbers are way off (higher) as this is just a crude estimate. If anything, it shows that this is most likely the max scenario at which $27M is actually very cheap relative to government/army program funding.

The only thing it does not take into account is business to business calling.

Yep, only a full-bandwidth mu-law/A-law call over a T1/E1 or an aggregation of such is going to stream that many raw bits in each direction. Quickly transcode to linear via a lookup table, sum the two sides into a single channel, shift right one bit, then compress, even as mp3, and you're easily at 1/8 the assumed size.
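Here's roughly what the "lookup table, sum, shift right" part looks like in Python. The decode follows the standard G.711 mu-law expansion; the function names and the two sample bytes are made up for illustration, and the final compression step (mp3 or otherwise) is left out.

```python
BIAS = 0x84  # standard G.711 mu-law bias (132)

def ulaw_to_linear(code):
    """Decode one 8-bit mu-law code to a signed 16-bit linear sample."""
    code = ~code & 0xFF                       # mu-law bytes are stored inverted
    magnitude = (((code & 0x0F) << 3) + BIAS) << ((code & 0x70) >> 4)
    magnitude -= BIAS
    return -magnitude if code & 0x80 else magnitude

# 256-entry lookup table, built once; decoding is then a single index.
ULAW_TABLE = [ulaw_to_linear(c) for c in range(256)]

def mix_call(side_a, side_b):
    """Sum both directions of a call, then shift right one bit to avoid overflow."""
    return [(ULAW_TABLE[a] + ULAW_TABLE[b]) >> 1 for a, b in zip(side_a, side_b)]

# Two samples per direction: loud positive vs. silence, silence vs. loud negative.
mono = mix_call(b"\x80\xff", b"\xff\x00")
print(mono)
```

Merging the two directions halves the stream before any codec touches it; the rest of the 1/8 comes from the compressor.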

For comparison, one modern fighter jet (F-22) runs about $357M.

The F-35 is even more, and that hasn't stopped the US from ordering thousands of them.

I've worked in several government-run datacenters. No chance they stored all that data that cheaply. They can find a way to overpay for anything.

I made a similar estimate (based on the premise of crude voice recognition of calls, not storage) in Feb 2006.


I used a different methodology to come up with call volume; I looked at the Inter- and IntraLATA numbers (in 2004, which was the latest available when I did this analysis). My number was 700 billion minutes per year.

This new estimate is around 1100 billion minutes per year, which seems very plausible to me.

It is interesting to consider the purpose of the NSA Utah Data Center given the space and cost requirements for storing phone calls are beneath trivial (3e17 bytes, < one floor of an office building and < $0.1B).

At 1.5M square feet, it could hold 344 copies of the national phone call audio database, based on the OP areal estimate.

An unconfirmed report [0] asserts the center will store 5e21 bytes. World internet traffic was 3e21 bytes in 2012.

[0] http://en.wikipedia.org/wiki/Zettabyte



The estimated power of those computing resources in Utah is so massive it requires use of a little-known unit of storage space: the zettabyte. Cisco quantifies a zettabyte as the amount of data that would fill 250 billion DVDs.

"They would have plenty of space with five zettabytes to store at least something on the order of 100 years worth of the worldwide communications, phones and emails and stuff like that," Binney asserts, "and then have plenty of space left over to do any kind of parallel processing to try to break codes."

Not confirmed. Apparently, worldwide hard disk production is around 500 million devices per year (see Wikipedia). Tera is 1e12, zetta is 1e21, so they'd need a year's world production of hard disks for one data center?
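Spelling that out in Python: the 5 ZB claim and the 500 million drives per year are from the comments above, while the 4 TB drive size is my own assumption for a large drive of the time.

```python
TARGET_BYTES = 5e21        # 5 zettabytes, the unconfirmed claim
DRIVES_PER_YEAR = 500e6    # worldwide production figure quoted above
ASSUMED_DRIVE_TB = 4       # assumption: capacity of one large drive

# Capacity each drive would need if one year's production had to hold it all.
needed_tb_per_drive = TARGET_BYTES / DRIVES_PER_YEAR / 1e12

# Years of total world production needed at the assumed drive size.
drives_needed = TARGET_BYTES / (ASSUMED_DRIVE_TB * 1e12)
years_of_production = drives_needed / DRIVES_PER_YEAR

print(f"{needed_tb_per_drive:.0f} TB per drive to fit in one year's output")
print(f"{years_of_production:.1f} years of world production at {ASSUMED_DRIVE_TB} TB each")
```

Under that assumption it's more than a year's worth of every disk made, which is why the zettabyte figure looks implausible for disk.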

Exabyte storage seems possible, however.

Maybe. But the feds have an historical fondness for tapes. http://www-03.ibm.com/systems/storage/tape/ts3500/index.html

A mere 500 2.7EB complexes and it starts looking like real data.

Thanks for the suggestion. As far as I can see, it's up to 50 PB raw data per library, and such a library needs 16 frames (a frame is what we'd call a rack, "1,800 mm H × 782 mm W × 1,212 mm D"), so it's about 3 PB (3e15) per rack. With 2.5K racks on site (100K sqft data center space according to Wikipedia) it's still only about 8 exabytes (8e18) in the whole data center.
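The same tape estimate as a few lines of Python, using only the numbers in the comment above (50 PB per library, 16 frames per library, 2.5K racks); the comparison against 5 ZB is the unconfirmed figure from upthread.

```python
LIBRARY_PB = 50            # raw capacity per tape library
FRAMES_PER_LIBRARY = 16    # one frame is roughly one rack
RACKS_ON_SITE = 2_500
CLAIMED_EB = 5_000         # 5 zettabytes expressed in exabytes

pb_per_rack = LIBRARY_PB / FRAMES_PER_LIBRARY
site_eb = pb_per_rack * RACKS_ON_SITE / 1000
shortfall = CLAIMED_EB / site_eb

print(f"{pb_per_rack:.3f} PB per rack")
print(f"{site_eb:.1f} EB on site")
print(f"{shortfall:.0f}x short of 5 ZB")
```

Even filled wall to wall with off-the-shelf tape libraries, the floor space comes out hundreds of times short of the zettabyte claim.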

It is not unreasonable to assume, however, that a facility such as this would include a custom-designed media management system which could hold cartridges at a higher density than COTS (perhaps with an access latency tradeoff).

1M sq ft of total area, and 100k in the data center may not preclude another 100k sq ft of 'warehouse' containing millions of carts in shoe boxes with GS5s running around in sneakers.

Also, keep in mind that Utah has stupid cheap power (4.46¢ per kWh): http://www.edcutah.com/files/Section7_Utilities_09.pdf

Cheaper power = more cost-effective storage.

There are only 7 billion human mouths on the planet. Perhaps 3 billion belong to humans with enough money to talk on phones, which is also a good filter for basic capacity to make trouble. Capture all of that, and processing and storing it will cost a fraction of what it costs to do it the hard way: http://en.wikipedia.org/wiki/Signals_intelligence_operationa...

There's some expensive stuff in there.

Headline should be "cost of storage", not "cost to store". They have to pay a lot of guys like Ed Snowden $200,000/yr to maintain a database that size.

It seems to me the bandwidth of retrieving all these Petabytes of data from the various networks into your storage space is more of a bottleneck than the cost of storing it.

This was one of the many interesting points made in the book "Cypherpunks: Freedom and the Future of the Internet" by Julian Assange.


We think of lists of every phone call ever as a lot of big data, but consider that your web browser produces many more requests per browsing session than the total number of phone calls, texts, and tweets you produced all day.

This ignores the (likely) huge cost of personnel and software required to manage it all.

Yes, and out of 272 PB, storage failures will be frequent. At what looked to be $100/TB, I have to imagine this is not accounted for.

If the base cost is $27M, accounting for storage failures is cheap (in government/intelligence budget terms). A SWAG of 10x price for dual RAID-5 storage brings it to just $270M. That's peanuts for NSA types.
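As a sanity check on those two figures, in Python: 272 PB at $100/TB is where the base cost comes from, and the 10x multiplier is the SWAG above, not a measured redundancy overhead.

```python
STORED_PB = 272
DOLLARS_PER_TB = 100
REDUNDANCY_MULTIPLIER = 10   # the SWAG from the comment, not a real RAID figure

base_cost = STORED_PB * 1000 * DOLLARS_PER_TB      # 1 PB = 1000 TB
with_redundancy = base_cost * REDUNDANCY_MULTIPLIER

print(f"base:            ${base_cost / 1e6:.1f}M")
print(f"with redundancy: ${with_redundancy / 1e6:.0f}M")
```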

But maybe they don't need to manage it all? Maybe they need to keep records from the average person for about a month, and records for special targets much much longer. Perhaps that would ease the maintenance burden.

I expected the cost to be so much higher. How would they record the calls though?

Beam splitters on the trunk lines?
