
NASA to launch 247 petabytes of data into AWS, but forgot about egress costs - nobita
https://www.theregister.co.uk/2020/03/19/nasa_cloud_data_migration_mess/
======
slowhand09
Wow! I worked on EOSDIS in '93-'96. We estimated 16 petabytes; at the time it
would have been one of the world's largest databases. We changed horses
midstream, moving our user interfaces from X Windows/Motif to the WWW, and
built a very early Oracle DB accessible via the WWW. There was no cloud then,
except for missions studying atmospheric water vapor. When this was originally
designed there were to be several (6-7) DAACs - Distributed Active Archive
Centers (https://earthdata.nasa.gov/eosdis/daacs) to store data near where it
was needed or captured. Now they have 12 and are storing on AWS. Amazon didn't
exist when this was originally built.

~~~
acruns
I was at ESRI when we were going to host this data; then Congress got involved
and blocked it.

~~~
sizzle
Awesome company, passed up on their offer at the beautiful Redlands campus.

------
anthonylukach
This article seems short-sighted.

1. Using the AWS cost calculator is pointless; naturally an entity the size of
NASA would get heavily discounted rates.

2. As data volume grows, the complexity of working with that data expands.
NASA appears to be embracing cloud computing by adopting a paradigm where
scientists push computation to where the data rests rather than downloading
data [1], [2], [3], thereby paying egress on only the higher-order data
products (see the sketch after the references).

3. The report notes that NASA has tooling to rate limit and throttle access to
data. This, in itself, proves that NASA didn't "[forget] about eye-watering
cloudy egress costs before lift-off".

People may scream about vendor lock-in, which is a fair complaint; but acting
like NASA just didn't think about egress is misleading.

NASA is ultimately a science institution; I think diverting effort away from
infrastructure management and towards studying data is likely a wise decision.

[1: https://www.hec.nasa.gov/news/features/2018/cloud_computing_services.html]
[2: https://link.springer.com/article/10.1007/s10712-019-09541-z]
[3: https://ui.adsabs.harvard.edu/abs/2017AGUFMIN21F..02P/abstract]
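
To make point 2 concrete, here's a minimal sketch of the push-compute-to-data
pattern (Python/boto3; the bucket and key names are invented, and the only
real claim is that S3 reads from EC2 in the same region carry no egress
charge):

    import boto3

    s3 = boto3.client("s3")  # assume this runs on EC2 in the data's region

    # In-region S3 reads incur no egress charge; stream the big granule.
    obj = s3.get_object(Bucket="nasa-example-archive", Key="granule-001.bin")

    # Reduce it to a small, higher-order product (toy computation: byte count).
    total_bytes = sum(len(chunk) for chunk in obj["Body"].iter_chunks())

    # Only this tiny derived product ever needs to leave AWS.
    s3.put_object(Bucket="nasa-example-products", Key="granule-001.summary",
                  Body=str(total_bytes).encode())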

~~~
Supermancho
> NASA would get heavily discounted rates

Having spent a lot of money with AWS, that's giving Amazon more credit than I
think is warranted.

~~~
ganstyles
+1 to this. I've been on teams that spent $75k/mo and didn't get any hint of a
discount, though we did get our own on-call rep to handle issues.

~~~
soared
$75k/mo is tiny in the enterprise world. At Oracle they’d give a 22-year-old
fresh out of school ~30 accounts that size, for reference. I worked on a team
of 9ish on a ~$5MM/mo account. (Not cloud, but a comparable business unit.)

~~~
bosswipe
At what level do you start having real negotiating power?

~~~
Mandatum
$10MM/year is considered a big deal to large enterprise in NA, $1MM outside of
NA.

~~~
gkolli
NA is North America, correct?

~~~
HeWhoLurksLate
Sí, señor ("yes, sir").

------
Dunedan
> “However, when end users download data from Earthdata Cloud, the agency, not
> the user, will be charged every time data is egressed.”

Not necessarily; it depends on how the users access the data. If users access
the data through their own AWS accounts, NASA could leverage S3's "Requester
Pays" feature [1] to let the user pay for downloading the data.

1: https://docs.aws.amazon.com/AmazonS3/latest/dev/RequesterPaysBuckets.html
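
A minimal sketch of what the user-side download looks like with boto3 (the
bucket and key are hypothetical; the point is that the requester's own AWS
account eats the transfer charge):

    import boto3

    s3 = boto3.client("s3")

    # On a Requester Pays bucket this call fails with 403 without the flag;
    # with it, the caller accepts the data-transfer charges.
    s3.download_file(
        "nasa-earthdata-example",    # bucket (hypothetical)
        "granules/granule-001.bin",  # key (hypothetical)
        "granule-001.bin",           # local filename
        ExtraArgs={"RequestPayer": "requester"},
    )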

~~~
dpcx
I immediately thought about this as well; however, I seem to recall reading
somewhere (and I could be entirely wrong here) that NASA has a requirement to
give away its science data freely.
~~~
jfk13
If there's a marginal cost for each copy of the data that's transferred to a
user, I don't think asking the user to cover that cost conflicts with a
requirement to "give away the data".

(If they distributed their science data in printed form, surely they'd be
allowed to charge people for the cost of printing & mailing the paper copies;
that's quite different from charging for the data itself.)

~~~
elcritch
Why the downvotes? This isn't uncommon or unreasonable if you're downloading
TBs of data. Also, the data would be freely redistributable if someone took
the data and put up a torrent. Still, I'd rather see NASA host its own data:
put up an FTP server and a torrent server and save a lot of money on hosting
fees.

~~~
topkai22
While proxying through a torrent system is a good idea, I doubt it would get
well seeded outside a few popular datasets; the agency would end up the sole
seeder of the long tail.

I’m willing to bet NASA saves a ton of money by going to a cloud provider; US
government storage setups are insanely expensive. I remember a project I was
on got a quote of over $10,000/TB in 2014, and there is no way egress is
actually free right now: they are paying for a government-regulation-compliant
internet connection one way or another.

I do worry about vendor lock-in to a degree, but I’m confident the agency and
taxpayers would save money going to any major cloud provider.

~~~
Spooky23
Sounds like there is a bigger story there and it's probably a managed SAN.

I've operated pretty significant government shared infrastructures like this
in the past... we were offering fast, flash-cached disk in 2010 for about
$5,000/TB. $10k/TB is not unreasonable for highly available Tier-1 storage for
something like SAP, especially in that era, when you couldn't use all-flash in
most cases.

Today, cost structures can be very different. You can land high-iop storage
for a fraction of the cost without the overhead of a big SAN. If you need
capacity focused storage, that is also much cheaper.

An agency like NASA gets hosed on services, and cloud is no different. AWS is
probably a net savings for operational workloads whose characteristics are
known. Backup is a no-brainer. But for a high-volume, operationally highly
variable thing like a public archive of data, AWS is a square peg in a round hole
because of the metered access.

~~~
topkai22
I’m sure that $10k/terabyte quote was complete overkill for what we needed,
but that’s what the stovepiped storage org was offering, and it killed the
project we were working on.

~~~
2J0
I hope you can correct my numbers, but I am pretty sure this is within the
same order of magnitude:

If 1-2 TB drives were handily $1k in 2010 (in 2005, $1k got you 128 GB at
15K RPM),

and your array set is at least RAID 10,

already raw storage is approaching half of that ten thousand dollars.

And this ignores controllers, cabling and chassis.

And this is before we look at our storage software licenses.

Are backup, point-in-time SLA, replication and availability in this budget?
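
Back-of-the-envelope in Python, under the same assumptions (the overhead
multipliers are guesses, not quotes):

    drive_cost_usd = 1000   # per 2 TB drive, ~2010
    drive_tb = 2
    raid10_factor = 2       # RAID 10 mirrors everything: 2x raw per usable TB

    raw_per_usable_tb = drive_cost_usd / drive_tb * raid10_factor  # $1,000/TB
    # Controllers, chassis, software licenses, backup, replication and support
    # contracts routinely multiply that several times over:
    for overhead in (2, 4, 8):
        print(f"{overhead}x overhead -> ${raw_per_usable_tb * overhead:,.0f} per usable TB")

So a $10k/TB quote is within an order of magnitude of first principles.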

~~~
topkai22
I wasn't really sure what they pitched us technically, but your breakdown
sounds reasonable. It was also complete overkill: we were hosting read-only
static images (map tiles). Azure and AWS were less than $300/TB/year at the
time, and their triple replication was more than what we needed
availability-wise.

------
djrogers
I'm not saying this won't be a financial cluster - it likely will cost many
times more than planned - but the headline here is just a flat-out lie.

TFA says:

"a March audit report [PDF] from NASA's Inspector General noticed EOSDIS
hadn’t properly modeled what data egress charges would do to its cloudy plan."

'Hadn't properly modeled' is very different from 'forgot about'. And if you
actually read the linked report, it says things like:

"ESDIS officials said they plan to educate end users on accessing data stored
in the cloud, including providing tools to enable them to process the data in
the cloud to avoid egress charges." and "To mitigate the challenges associated
with potential high egress costs when end-users access data, ESDIS plans to
monitor such access and “throttle” back access to the data"

Neither of those statements would be _in the audit_ if the entire topic had
been a surprise.

~~~
tyingq
From that linked report...

 _" In addition, ESDIS has yet to determine which data sets will transition to
the cloud nor has it developed cost models with the benefit of operational
experience and metrics for usage and egress."_

That sounds fairly close to the headline.

------
unhammer

        YOU ARE NOT AFRAID?
        'Not yet. But, er...which way to the egress, please?'
        There was a pause. Then Death said, in a puzzled voice: ISN'T THAT A FEMALE EAGLE?
    

I've been reading A Hat Full of Sky to my daughter these days, and there's a
running joke that "supposedly intelligent people" don't know the meaning of
the word "egress", mixing it up with things like egret, ogress or eagles.

(See also the inspiration for the joke:
https://unrealfacts.com/pt-barnum-would-trick-people-with-a-this-way-to-egress-sign/ )

------
ghostpepper
There's a joke around here somewhere about AWS pricing being too difficult
even for rocket scientists.

~~~
leni536
Apparently AWS pricing is not rocket science

------
movedx
It's The Register, people. Don't take it seriously. It's practically The Onion
of the IT industry, especially the comments sections.

I've written two articles for them, and the comments are a joke. They're all
anti-Cloud, anti-progress. Try telling them Kubernetes has a solution to
their problems: they'll think you've come to steal their children. I know,
I've tried.

In short: this never happened. NASA didn't forget anything. It does, however,
make for a great eye catching headline!

Sorry to be bitter about this, but publications like The Register serve little
purpose these days. They cater to a specific kind of IT personality that can't
let go of their physical tin and thinks public Cloud has no place or use at
all. Again, I know: I've tried convincing these people of such things.

~~~
mturmon
Smartest comment so far in the thread. The issue of cloud egress has been
known and worked on at NASA for a decade now, and the article treats it like an
OMG moment.
OMG moment.

Historically, data have been stored and processed on-premises, but NASA has
been migrating data and processing to the cloud where it makes sense. For instance,
it makes a lot of sense to burst out to the cloud for near-real-time
processing during and just after natural disasters like earthquakes and forest
fires.

The large missions they mention (SWOT, NISAR - big radars in Earth orbit) are
drivers of the shift of more processing + data to the cloud, because they will
generate an unprecedented amount of data. They are pathfinders. By percentage,
very little of that data will ever egress - it's low-level and uncalibrated -
so a caching strategy could be valuable.

Here are some slides giving background on the SWOT/NISAR data system. They are
from 2017, so more has happened in the meantime, but they touch on some of
these issues:

https://smd-prod.s3.amazonaws.com/science-red/s3fs-public/atoms/files/13%20-%20Day%202%20SDS%20Considerations%20for%20SWOT%20and%20NISAR%20-%20Hua.pdf

Regarding the step function in data volume, see the humorous slide #4.

------
pixelbath
Unless my numbers are _way_ off, I got around $15.5 million per year using
Backblaze's calculator: https://www.backblaze.com/b2/cloud-storage-pricing.html

Numbers used:

    Initial upload:   258998272 GB (1024*1024*247)
    Monthly upload:   100 GB (default)
    Monthly delete:   5 GB (default)
    Monthly download: 1048576 GB (1 PB)

    Period of Time:   12 months (default)
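
Same ballpark without the calculator, using B2's advertised rates of the time
($0.005/GB/month storage, $0.01/GB download; both rates are assumptions taken
from their pricing page of that era):

    stored_gb = 247 * 1024 * 1024              # 258,998,272 GB
    download_gb = 1024 * 1024                  # 1 PB per month
    storage_rate, download_rate = 0.005, 0.01  # USD per GB

    monthly = stored_gb * storage_rate + download_gb * download_rate
    print(f"${monthly:,.0f}/month -> ${monthly * 12:,.0f}/year")
    # -> $1,305,477/month, ~$15.7M/year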

~~~
adtac
It'll take 215,000 years to reach 247 petabytes if you averaged 100 GB of
upload a month.
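
The arithmetic:

    >>> round(247 * 1024 * 1024 / 100 / 12)  # GB / (GB per month) / (months per year)
    215832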

~~~
bhandziuk
I think they're saying NASA would add ~100GB of new data to this dataset every
month.

~~~
adtac
I know. And I'm saying that if that was the rate they've historically added
data to their dataset, it would've taken them 200,000+ years to get here.
Which is why 100GB/mo is virtually nothing for NASA -- it doesn't match their
historical throughput.

~~~
bhandziuk
I see what you're saying. Yeah, I agree.

------
ackbar03
Oh, but AWS didn't forget. AWS never forgets.

~~~
gonzo41
This is kinda bad press for AWS. If I were NASA I'd be shitty about the
relationship manager not hinting at this and trying to help architect for the
lowest cost.

~~~
NikolaeVarius
Since when the hell does NASA actually care about bad press regarding costs
these days?

~~~
badwolf
NASA is spending over $1B for a launch pad that will be used no more than 4
times.

https://spacenews.com/report-finds-delays-and-cost-overruns-in-sls-mobile-launch-platform-development/

~~~
whatshisface
How much would a launch pad that will be used four times normally cost for
what they're planning to launch? Without knowing that I can't say if they
overpaid 10x, 2x, got it exactly right or got an amazing bargain.

~~~
yborg
In 1965, the Vertical Assembly Building, which was at that time the largest
enclosed volume in the world, cost $117M (on a $23.5M original construction
contract). That would be about a billion dollars in 2020, but it was completed
in 3 years and was used to stack 13 Saturn Vs. It was later used for the 100+
Shuttle missions as well, but there were additional costs to modify the
building for this purpose. The VAB is still planned for use for future
missions.

https://www.popsci.com/blog-network/vintage-space/nasas-vab-garage-saturn-v/

~~~
dwighttk
so how many of those things were launched from the VAB?

~~~
yborg
I picked the VAB because its current-dollar cost was roughly a billion
dollars.

Total cost for constructing Launch Complex 39, which includes the VAB and the
crawler-transporter launchers, was estimated at $500M in 1962 for 2 pads. A
total of 153 launches have occurred from LC-39. This number is greater than 4.

------
NikolaeVarius
Senator Shelby should get AWS to launch a new region in Alabama for NASA at
this rate.

------
OzzyB
Looks like even the big boys get bitten by the Cloud Meme when forgetting
about bandwidth costs; glad I'm not the only one.

------
7777fps
I assume the data accessed follows a heavily skewed Pareto distribution.

Given that, it's maybe still cheaper to build their own serving/caching layer
in front to save egress costs than it would have been to construct the whole
storage solution themselves.

~~~
vidarh
Putting a caching layer in front of AWS is often very cost effective even
without much skew in the access pattern. It tends to take a very low hit rate
before it pays for itself.
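
The break-even is easy to model. A hedged sketch with made-up but plausible
numbers (~$0.09/GB internet egress of the era vs. an assumed ~$0.01/GB to
serve a GB from your own cache):

    aws_egress = 0.09   # USD/GB out to the internet (approximate)
    cache_cost = 0.01   # USD/GB served from your own cache (pure assumption)

    def cost_per_gb(hit_rate):
        # every GB is served by the cache; misses additionally pay an S3 fill
        return cache_cost + (1 - hit_rate) * aws_egress

    # Break-even where cost_per_gb(h) == aws_egress: h = cache_cost/aws_egress,
    # i.e. only ~11% of requests need to hit cache before it pays for itself.
    for h in (0.0, 0.12, 0.5, 0.9):
        print(f"hit rate {h:.0%}: ${cost_per_gb(h):.3f}/GB vs ${aws_egress:.2f} raw")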

------
knorker
This surely was entirely known to AWS, who were rubbing their hands at the
fact that every user of this data has to process it using EC2, on their
platform.

This is Cloud lock-in using data location.

------
tehalex
I wonder if this includes Direct Connect, or if they could use it? [1]

Cloud data transfers are too expensive; personally, I assume it costs more to
measure and bill for bandwidth than the usage itself...

1: https://aws.amazon.com/directconnect/
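
For scale, a rough comparison -- the ~$0.09/GB internet and ~$0.02/GB Direct
Connect rates below are the commonly cited US figures of the era, and
port-hour fees are ignored:

    gb_out = 1024 * 1024                 # 1 PB out per month
    internet_rate, dx_rate = 0.09, 0.02  # USD/GB (approximate)

    print(f"internet egress:    ${gb_out * internet_rate:,.0f}/month")  # ~$94,372
    print(f"via Direct Connect: ${gb_out * dx_rate:,.0f}/month")        # ~$20,972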

~~~
angry_octet
They could use Direct Connect from each of their data centres, essentially
turning AWS into a giant NAS. However, this gives up the idea of using AWS
compute to provide value-added analysis.

------
toomuchtodo
Cue the cloud apologists that “it’s better to use the cloud than to build and
manage your own infra”.

This is why you build and run your own storage, similar to Backblaze (which is
almost entirely bootstrapped, except for one reasonable round of investment).

~~~
Karunamon
Cue the cloud detractors that "a failure to do due diligence (in this case: 15
minutes on the pricing calculator) on your computing platform should be held
against the whole platform".

Snark aside, it entirely depends on what you're doing. AWS probably has better
engineers, better processes, and more of them than your company.

~~~
falcolas
None of which will _really_ help you, since AWS's priority is AWS, not the
uptime of your business. And no number of those better engineers or processes
has prevented downtime and service interruptions on AWS.

~~~
unethical_ban
Oh, man.

Better run your own Internet, after all, you care more about connectivity to
your friends than your ISP does!

Dogmatism is passé. There are good uses for cloud and good times for
on-premises, depending on what you need, what your skillsets are as an
organization, and the kinds and duration of the workloads involved.

AWS and others have absolutely outstanding amounts of infrastructure and
tooling. Their reliability is off the charts in the past few years, and (once
it actually gets figured out by your engineers) the cloud concept of IAM is
incredibly secure.

There are pitfalls - cost, up-front complexity and several other things - but
I no longer rag on "the cloud".

~~~
toomuchtodo
Amazon has outages all the time, hidden on their status board with a green
triangle, and you still lose S3 objects once you’re operating at a large
enough scale.

A quick Google search for “amazon outages” lists the numerous extended outages
they’ve experienced.

~~~
Karunamon
How many of those outages were multi-region and would have taken down a
properly distributed application? How many outages and instances of lost data
would the average enterprise, likely without their own datacenters, redundant
power, hardware staff, etc have taken in the same period?

~~~
toomuchtodo
Most applications will never be architected to be “properly distributed”
because of cost. Many popular web properties (Reddit) still have outages on
AWS even when architected properly. Netflix still distributes content from its
own CDN of OpenConnect appliances, and only uses AWS for non-streaming use
cases (jedberg will correct me on both the Netflix and Reddit points if I'm
missing something and he comes across this comment).

https://www.usatoday.com/story/tech/news/2017/02/28/amazons-cloud-service-goes-down-sites-scramble/98530914/

If my app is architected for reliability, I’ll run it on bare metal and keep
the cost savings. Why pay twice by building it for cloud durability _and_
running it on expensive cloud resources? Clearly the AWS marketing is working
(“you’re just building it wrong”).

We’ll see what happens when CFOs take the reins from CTOs and CIOs and start
putting cost controls in place during this recession (“why exactly are we
paying so much in opex when this could be capex we can depreciate?”).

~~~
Karunamon
Ok, so we replace a lot of opex with a little capex and a lot more opex. You
only need devops types if your business runs on a cloud provider; now you need
to employ facilities, sysadmins, security, etc. It's not just the cost of the
hardware we're talking about: your labor budget will necessarily increase as
well.

~~~
toomuchtodo
Devops types are sysadmins that cost more for mostly the same skillset (you
know cloud primitives, you know infra as code, you know some Python/Bash or
PowerShell depending on the underlying OS). Facilities, security, etc. are
usually covered by your hardware hosting provider or colocation provider.
Still a lower cost than cloud. You are paying similar labor costs regardless
of whether you're in the cloud or on your own metal.

Disclaimer: previously a devops/infra guy; before that, ops/networking/sysadmin;
built out colo facilities/datacenters/hosting companies before cloud. Have
done a lot of cost models for storage and compute, and still do on the side.

~~~
Karunamon
So who takes care of the non-development tasks that AWS (or any cloud
provider, really) handles on the backend? Schlepping the hardware around,
swapping failing drives, hardware monitoring, actually speccing out and
running a datacenter, physical security, and so forth?

It's generally not the same people who are going to be at their computers
running awscli. (And if it is, now we get to figure in how much time they're
spending on tasks that are not their primary job, and how many extra of them
we get to hire to maintain the same velocity -- not to mention the occasional
bit of firefighting you get to do when you manage your own infra.)

------
yosito
> You don't need to be a rocket scientist to learn about and understand data
> egress costs. Which left The Register wondering how an agency capable of
> sending stuff into orbit or making marvelously long-lived Mars rovers could
> also make such a dumb mistake.

I used to work very closely with this department at NASA. Without saying too
much, the short answer to how an agency could make such dumb mistakes is:
"tenured government employees more concerned about job security than the
success of the project."

------
jka
What's the opposite of AWS Snowmobile[0]?

[0] - https://aws.amazon.com/snowmobile/

~~~
chickenpotpie
Downloading no data extremely fast

------
Spooky23
Using AWS for this type of use case is dumb for an org as large as NASA, if
cost savings is a goal. It's cheaper to just land capacity at a datacenter.

~~~
toyg
I guess they have additional legal constraints that don’t allow them to just
“land space” here or there: the vendor must be security-vetted, compliant with
a hundred government-produced checklists, and willing to go through extra-long
sales and support cycles. That inevitably pushes up prices significantly.

In fact, I can imagine ops teams at NASA licking their lips at the idea of
doing away with a lot of that bureaucracy once they switch to AWS... note how
the report mentions that some of the controllers are actual sponsors of the
move: it’s obviously a conflict of interest, but one that might well arise
when the org as a whole is a bit too happy to steer away from a suboptimal
situation.

This said, AWS will rob them blind, simply because they can. Like all
outsourcers (which is effectively what they are), they get in with the
simplicity argument, then boil the frog with extra charges. It’s good that
somebody pointed out one of those charges, but I doubt anything will change
substantially: Amazon will probably cut them a discount and that will be it.
And once you’re invested in a cloud env to the tune of hundreds of petabytes,
you’ll likely not switch away for decades.

~~~
Karunamon
> _...then boil the frog with extra charges._

That implies a level of dishonesty or nontransparency that AWS doesn't have.
Their pricing is disclosed up front, and they offer a calculator to model your
costs. Knowing how much data egress you're going to have is not some arcane
art; NASA just plain forgot to do it.

It may be _complicated_, but so is any workload at this size. Figuring the
cost is part of due diligence, and they've made it as straightforward as
possible.

~~~
Spooky23
That's a half-truth.

All of the cloud vendors de-emphasize network egress costs. It's similar to
vendors of products that depend on Microsoft licensing, who will always omit
those types of costs. (Oh, so you needed to spend another $500k on SQL Server
Enterprise?)

Many organizations lack the operational metrics to effectively measure their
egress needs. And AWS/GCP/MS salesmen aren't in the business of slowing down
deals with awkward questions.

This is especially true where an org like NASA probably contracts out things
like network services. Going from a model where you make fixed capital
investments to paying by the byte is difficult to budget for.

~~~
Karunamon
I'm not sure what you mean by "de-emphasize".

Here's the official pricing calculator[1] - note that ingress and egress costs
are included in all relevant services. Also note that for something like S3
(which is probably what the article mentions the "earthdata cloud" is based
on), the pricing details are right there on the description page[2].

There is no evidence of any malfeasance by AWS here, just lots of casting
aspersions. What _specifically_ do you want that was not provided?

[1]: https://calculator.s3.amazonaws.com/index.html

[2]: https://aws.amazon.com/s3/pricing/

------
julienchastang
This article is misleading. The entire point is to not move data out of the
cloud: instead, bring your computing (analysis, visualization) to the data and
pay for compute cycles on AWS. If your workflows are short/bursty, you will
come out ahead. Moreover, you will be able to do big-data-style computations
that you cannot do in a local computing environment. This is bad journalism,
IMO.

------
chx
If you are facing similar problems, you should know that traffic from B2 via
Cloudflare is free. I am not 100% sure CF would be happy if NASA picked the CF
free tier, but their quote would probably be orders of magnitude lower than
Amazon's.

------
X6S1x6Okd1st
> NASA also knows that a torrent of petabytes is on the way.

Oh that sounds like a potential solution.

/s

------
gigatexal
might be cheaper to spin up virtual workstations on AWS and use the data there

~~~
julienchastang
Exactly. Move your computation to the data instead of the other way around. At
that point, there are many ways to keep costs down such as using spot
instances and tearing down VMs when your analysis is over.

~~~
gigatexal
And you get to rent the latest hardware rather than using likely-outdated
machines... I mean, you'd be using the existing machines as dumb terminals,
but still.

------
Havoc
Can't they just use the current DAACs as a caching layer? Seems like the least
ugly way out of this mess.

Also - can't they use torrent tech? I wouldn't mind helping out a bit on space
& data

------
CKN23-ARIN
Putting a dataset into AWS is a lot like putting a satellite into orbit. You
still need to pay later to get it down, or to safely destroy it.

------
Wheaties466
At that point, why not just use a P2P-based system?

------
szczepano
To sum up: no matter how big the hard drives or data centers we produce, we
will always have problems with storage capacity.

------
pontifier
Cloud egress costs killed the business I'm now trying to save. I won't fall
into that trap.

------
ralusek
I wonder why they wouldn't use Wasabi:

https://wasabi.com/cloud-storage-pricing/

Looks like egress is free.

Maybe because it's comparatively untested? Does anyone here have any
experience with it?

~~~
alexfromapex
Probably need assurances or regulatory solutions that only a cloud giant like
AWS could address

------
api
This is exactly why the costs are set up that way. The first time I saw AWS
pricing I chuckled and thought "roach motel": data goes in but it doesn't come
out. It's one of many soft lock-in mechanisms cloud hosts use.

------
tzm
$5,439,526.92 per month
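
Which looks like S3 Standard's tiered storage price applied to 247 PB; a quick
check against the published $0.023/$0.022/$0.021 per-GB tiers lands within a
cent of that figure:

    gb = 247 * 1024 * 1024               # 258,998,272 GB
    tiers = [(51_200, 0.023),            # first 50 TB
             (460_800, 0.022),           # next 450 TB
             (float("inf"), 0.021)]      # everything over 500 TB

    monthly, left = 0.0, gb
    for size, rate in tiers:
        used = min(left, size)
        monthly += used * rate
        left -= used

    print(f"${monthly:,.2f} per month")  # -> $5,439,526.91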

------
turdnagel
Requester pays!

------
Mave83
Just build your own storage and save an incredible amount.

You might think it's hard, but it's not. croit.io provides all you need to
deploy a scalable cluster, even across multiple geographic regions.

Price for a 1 PB cluster, including everything from rack to hardware to
license to labor: below 3€/TB/month, i.e. around the Amazon Glacier price tag
but with S3-IA-style access.

~~~
driverdan
Are you seriously suggesting that NASA didn't consider alternatives, like
their current self-hosted solutions?

~~~
GordonS
Given they "forgot" about egress bandwidth costs, I think the parent's comment
was fair.

------
oh_hello
"The audit, meanwhile, suggests an increased cloud spend of around $30m a year
by 2025"

Isn't this a rounding error for NASA?

------
mensetmanusman
This seems like a good use of torrenting?

~~~
caymanjim
Torrents are only helpful when there's a large number of people who download
the data and are willing to share it. There's not a large userbase for the
vast majority of NASA data. It wouldn't be distributed in any meaningful way.

~~~
mensetmanusman
Maybe various world governments could contribute bandwidth to accomplish these
types of missions.

------
beastman82
Torrent FTW

------
vnchr
Cloud VERSUS Space. Who will come out on top?

------
ph2082
1 terabyte of hard disk costs ~$50.

247 petabytes ~ 247,000 terabytes, so roughly $12.35M in raw drives.

Network cards, bandwidth, electricity cost: I can't guess.

A couple of good engineers (hardware and software ones), which they definitely
have.

Maybe they could have built their own cloud for ~$10-15 million. And that
won't be a recurring cost.

Maybe they missed the article about Bank of America saving ~$2 billion by
building its own cloud.

~~~
supdatecron
Your numbers are way off, as you didn't account for redundancy of the drives
(any failure or bit flip on one of those thousands of drives will likely
corrupt the data set).

> Network cards, bandwidth, electricity cost > I can't guess.

This is where a huge amount of cost is.

> And that won't be recurring cost.

Maintenance, humans, cooling, drive replacements, property, building, land
tax, payroll tax are all recurring costs.
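
To put rough numbers on "way off" (every figure here is a guess: $50/TB
drives, three full copies, and a 1.5x fudge factor for chassis, network and
power):

    usable_tb = 247 * 1000   # 247 PB in decimal TB
    usd_per_tb = 50          # consumer drive pricing (assumption)
    copies = 3               # minimal redundancy: three full replicas

    drives = usable_tb * usd_per_tb * copies  # $37.05M in bare drives
    infra_factor = 1.5       # servers, racks, network, power (guess)
    print(f"~${drives * infra_factor / 1e6:.0f}M up front, before labor and facilities")
    # -> ~$56M, several times the $10-15M estimate above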

~~~
ph2082
> Your numbers are way off, as you didn't account for redundancy of the
> drives (any failure or bit flip on one of those thousands of drives will
> likely corrupt the data set).

Take another setup of the same size as a backup, then another as a backup of
the backup: roughly 3x the drive cost, so ~$37M.

> This is where a huge amount of cost is.

Maintenance, humans, cooling, and drive-replacement costs can't be greater
than the first-time setup cost.

> property, building, land tax, payroll tax

NASA runs on a government budget; I am sure they can claim some tax break there.

The point I am trying to make is that it may be cheaper to do it in-house,
given the level of engineering talent they have.

~~~
sitkack
The government should be running its own object store. And by government, I
mean coordinated by Internet2/NSF with federation across all member orgs.

https://en.wikipedia.org/wiki/Internet2

Use Backblaze pods, and demand off-peak bandwidth from the gilded-age
megacorps that own said fiber for sync/replication.

https://www.backblaze.com/b2/storage-pod.html (480 TB in 4U)

Have 3+ sites around the US that build the pods; each new pod gets preloaded
with a smattering of rarely requested, low-replication-count objects (as a
redundant backup), then shipped to the site where it will be used. Local
writes go directly to pods, which are then kept in sync with the rest of the
cluster.

Edit, from TFA:

> And to put a cherry on top, the report found the project's organizers
> didn't consult widely enough, didn't follow NIST data integrity standards,
> and didn't look for savings properly during internal reviews, in part
> because half of the review team worked on the project itself.

