This mentioned the NSA's "Mission Data Repository" in Bluffdale, Utah. They mentioned it could hold 1 yottabyte of data.

Let's put into perspective 1 yottabyte:

All Gmail accounts (~500 million users * 10GB/user = ~5000 PB) + All Facebook photos (~2 billion users * 1GB/user = ~2000 PB) + All of Netflix's videos (1-5 PB) + Library of Congress (10-30 PB) + Wikipedia (0.0005 PB)

= ~7000 PB = 7 Exabytes. = 0.0007% of 1 Yottabyte!!!
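For anyone who wants to check the sum, here is a quick sketch in Python. It assumes decimal (SI) units and takes the upper end of the Netflix and Library of Congress ranges; every input is just the rough guess from above, not a measured figure.

```python
# Decimal (SI) byte units.
GB = 10**9
PB = 10**15
EB = 10**18
YB = 10**24

# Rough estimates taken from the comment; none of these are measured values.
gmail = 500_000_000 * 10 * GB        # ~500 million users * 10GB/user
facebook = 2_000_000_000 * 1 * GB    # ~2 billion users * 1GB/user
netflix = 5 * PB                     # upper end of the 1-5 PB guess
library_of_congress = 30 * PB        # upper end of the 10-30 PB guess
wikipedia = PB // 2000               # 0.0005 PB

total = gmail + facebook + netflix + library_of_congress + wikipedia
total_pb = total / PB
percent_of_yb = 100 * total / YB

print(f"{total_pb:,.0f} PB = {total / EB:.2f} EB = {percent_of_yb:.5f}% of 1 YB")
```

Gmail and Facebook dominate the total; the other three terms barely move it.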

1 Yottabyte = 250 billion 4TB hard drives.

A hard drive is about 4" x 1" x 5.75".

The Pentagon is a big building (6,636,360 sqft over 5 floors). If you started stacking hard drives inside the Pentagon it would take about 50 pentagons to hold 250 billion hard drives.
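A rough sketch of how the drive count and the "about 50 pentagons" figure can be reproduced. The 10 ft usable ceiling height is my own assumption (the comment only gives floor area), and the drives are assumed to pack perfectly with no racks, aisles, or cooling.

```python
TB = 10**12
YB = 10**24

# Number of 4TB drives needed for 1 YB.
drives = YB // (4 * TB)

# Drive dimensions from the comment: 4" x 1" x 5.75".
drive_cubic_inches = 4 * 1 * 5.75
drive_cubic_feet = drive_cubic_inches / 12**3

pentagon_sqft = 6_636_360   # total floor area over 5 floors, from the comment
ceiling_ft = 10             # ASSUMPTION: usable height per floor, not from the comment
pentagon_cubic_feet = pentagon_sqft * ceiling_ft

pentagons = drives * drive_cubic_feet / pentagon_cubic_feet

print(f"{drives:,} drives fill about {pentagons:.0f} Pentagons")
```

A lower ceiling or any realistic rack spacing pushes the number of buildings up, so 50 is closer to a floor than a ceiling.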

At scale you might be able to make a 4TB hard drive for somewhere between $10 and $100.

1 Yottabyte would be $2.5 trillion - $25 trillion in hard drives. That's a couple USA GDPs.
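The cost range is a single multiplication; a small sketch, using the $10 and $100 per-drive guesses from above (both are guesses about at-scale pricing, not quotes):

```python
drives = 250_000_000_000      # 4TB drives needed for 1 YB
low_usd, high_usd = 10, 100   # per-drive price guesses from the comment

low_total = drives * low_usd
high_total = drives * high_usd

print(f"${low_total / 1e12:.1f} trillion to ${high_total / 1e12:.1f} trillion")
```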

Okay, I think a yottabyte clearly can't be what they mean because that's just unfathomable.

They also mention a 1 million sqft facility.

In a 1 million sqft facility you can probably pack about 250 million 3.5" hard drives. If each drive was 4TB you'd end up with 1 million PB, or 1000 EB, or 1 Zettabyte.
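The same estimate as a sketch in Python. The 250 million drive count is the guess from above (it works out to 250 drives per square foot of floor, so it implies tall racks); the units are decimal.

```python
TB, PB, EB, ZB, YB = 10**12, 10**15, 10**18, 10**21, 10**24

facility_sqft = 1_000_000
drives = 250_000_000          # guess from the comment, not a measured figure
drives_per_sqft = drives / facility_sqft

capacity = drives * 4 * TB    # bytes, assuming 4TB per drive

print(f"{drives_per_sqft:.0f} drives per sqft")
print(f"{capacity // PB:,} PB = {capacity // EB:,} EB = {capacity // ZB} ZB")
print(f"1 YB is {YB // capacity:,}x this capacity")
```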

So by Yottabyte they might (maybe) mean Zettabyte. Only off by a factor of 1,000.

Even still, all of the data of Gmail, Facebook, Netflix, Library of Congress, etc. is still probably only ~1% of this data center (about 7 EB out of 1,000 EB).


Some of the older estimates I've found:

NPR says zettabytes: http://www.npr.org/2013/06/10/190160772/amid-data-controvers...

Wired (2012) says yottabytes (maybe where this originally came from): http://www.wired.com/2012/03/ff_nsadatacenter/all/

From the NPR article:

"The NSA's Utah Data Center will be able to handle and process five zettabytes of data, according to William Binney, a former NSA technical director. Binney's calculation is an estimate. An NSA spokeswoman says the actual data capacity of the center is classified."

Isn't it possible that these numbers reference the amount of data moving through the system ("handle and process") instead of raw storage capacity, which is probably significantly less and matches more realistic private-sector abilities?

I think it's very plausible the datacenter will be capable of storing five zettabytes by the end of this decade (yottabytes are impossible, short of the NSA having wildly advanced technology we know nothing about yet; the energy cost alone based on today's technology would 'bankrupt' the NSA). I'm skeptical it got up and running with that kind of scale on day one. It'd be ridiculously far beyond their storage needs for many many years. I wouldn't be surprised if it was designed with that kind of scale in mind however, as the NSA would be thinking 10 and 20 years out.

Five zettabytes would of course be enough storage to allocate roughly one terabyte for every person on the planet the NSA could even theoretically hope to grab a single byte of data on in the next 20 years.
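A quick check of the per-person figure. The world population of about 7.3 billion is my assumption (roughly the mid-2010s figure); it is not stated in the comment.

```python
TB = 10**12
ZB = 10**21

capacity = 5 * ZB
world_population = 7_300_000_000   # ASSUMPTION: approximate mid-2010s world population

tb_per_person = capacity / world_population / TB

print(f"{tb_per_person:.2f} TB per person")
```

That comes in a little under 1 TB each, so "roughly one terabyte" holds as an order-of-magnitude claim.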

In ~20 years I have to suspect 5 ZB will not be nearly enough storage for what the NSA has in mind (due mostly to the expansion of information being stored in ever higher definition video and plausibly in some VR format within a decade).

Could they be using tape drives? This Wikipedia entry says limits of 35TB were achieved in 2011 and limits of 185TB were achieved in 2014. The 2011 achievement wasn't expected to be commercially available for 10 years. Would that also preclude industrial/government availability?


Edit: Ha, never mind, even if they had 185TB tape drives, they'd still need over 5 billion of them to reach a yottabyte (unless my math is way off).
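A sketch of the tape math, assuming 185TB per tape and decimal units. The 5 billion figure lines up with a 1 yottabyte target; the 5 zettabyte estimate quoted from NPR is shown next to it for comparison.

```python
TB = 10**12
ZB = 10**21
YB = 10**24

tape_capacity = 185 * TB   # the 2014 figure mentioned in the comment

tapes_for_1_yb = YB / tape_capacity
tapes_for_5_zb = 5 * ZB / tape_capacity

print(f"1 YB: {tapes_for_1_yb / 1e9:.1f} billion tapes")
print(f"5 ZB: {tapes_for_5_zb / 1e6:.0f} million tapes")
```

So the math is not way off for a yottabyte, and a 5 ZB target needs a couple of orders of magnitude fewer tapes.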

Backblaze wrote an article last year where, based on the government agencies that have expressed interest in building their own Storage Pods, including the CIA, they conclude:

"So does the NSA store surveillance data on Backblaze Storage Pods?

We don’t know for sure and certainly the NSA is certainly not publishing their storage architecture. However, between the multiple government agencies using and exploring Backblaze Storage Pods and the pods characteristics as highly-dense, cost-efficient, and open source systems, certainly makes them a very likely candidate." [1]

[1] - https://www.backblaze.com/blog/is-the-nsa-using-backblaze-st...

Ever heard of something called a web service? Why would the govt duplicate data if it is already available somewhere? These companies have built APIs and services that provide data directly to the NSA (allegedly!). Just because you can do multiplication and division doesn't mean they are doing it the way you are thinking. These people are 1000 times smarter than you and have pretty solid ways of handling petabytes of data. By the way, 1 yottabyte is the theoretical capacity, you dumb ass. Next time think a little before you present your hypothesis and before writing off all claims made by others.

I know it seems crazy but they might have a storage technique that is unknown to the public.

That seems unlikely, considering that there is a massive incentive for private firms with tons of money to attempt to create such a storage technique. I doubt any government could beat them to the punch while keeping it a secret.

Especially given the NSA borrows an awful lot of its software inspiration from the private sector (eg Google's BigTable). There's little reason to think their hardware efforts would be dramatically more advanced if their software effort isn't.

What the NSA has are three things: a lot of money coming in every year; a lot of relatively intelligent and highly skilled people working for it; and a license to do terrible things and get away with it (i.e. they can take incredible risks and try outrageous things with minimal concern, or at least previously could).

The NSA has a budget roughly the size of Microsoft's annual R&D budget, without needing to maintain an $80 billion highly profitable business. It's amazing what you can do or attempt with a 'free' $10 or $12 billion per year to burn.

> What the NSA has are three things: a lot of relatively intelligent and highly skilled people working for it;

Do they though? I have sometimes wondered about this. In order to work for the NSA you have to make a lot of sacrifices. You have to be a US citizen. You have to pass a background check. You have to be content with a government salary. You may never talk about your job (how does that look on a resume if you ever want to apply somewhere else? Prior experience: classified). You have to work in one of the few locations they operate in. You have to be a pretty hardcore patriot to put up with the things the NSA is doing and still be able to sleep at night.

In summary: it seems that the pool of potential employees should be severely limited. Hence my guess would be that the top talent ends up in the private sector instead of the NSA.

Or, just read the NSA quote from the movie Good Will Hunting again: http://www.imdb.com/title/tt0119217/quotes?item=qt0408102

I don't think they'd likely be way off what's available, but I also think if you're buying 5 million drives (probably over X years), I'm sure you can get some customizations done. So, considering 4TB SSDs are now arriving in consumer markets, that does change the equation a bit. And if you plan for 16TB drives in the next 3 years?