
Ice cold archive storage - tpetry
https://cloud.google.com/blog/products/storage-data-transfer/whats-cooler-than-being-cool-ice-cold-archive-storage
======
tpetry
The interesting part: it's cheaper than AWS Glacier ($4 per TB per month) and
slightly more expensive than AWS Glacier Deep Archive ($0.99 per TB per month),
but the data is available immediately and not in hours like Glacier, where you
have to pay a hefty premium for faster access to the data.
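
Back-of-the-envelope math for 10 TB (the ~$1.23/TB/month figure for Google's
new class is my reading of the announced $0.0012/GB/month, so treat it as an
assumption; the AWS prices are the published ones):

    # Rough monthly/yearly cost of 10 TB across archive tiers.
    # GCS figure is assumed from $0.0012/GB/month; AWS figures are published.
    TB = 10
    prices_per_tb_month = {
        "AWS Glacier": 4.00,
        "AWS Glacier Deep Archive": 0.99,
        "GCS archive class (assumed)": 1.23,
    }
    for tier, price in prices_per_tb_month.items():
        print(f"{tier}: ${price * TB:.2f}/month, ${price * TB * 12:.2f}/year")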

------
shittyadmin
Interesting. Unlike Glacier, this is significantly cheaper than Backblaze B2,
meaning I might have to reconsider how I do my backups again. Any good backup
tools supporting this type of service?

I rely on Restic at the moment, which seems to need fast read access to data,
but its incremental snapshotting is great. It'd be ideal if I could find
something like that supporting these "cold storage" solutions.

~~~
mmahemoff
AWS's new Glacier Deep Archive is actually cheaper than Google's Ice Cold, at
$1/TB/month.

[https://aws.amazon.com/about-aws/whats-new/2019/03/S3-glacier-deep-archive/](https://aws.amazon.com/about-aws/whats-new/2019/03/S3-glacier-deep-archive/)

~~~
m3nu
I expect that Google will also charge for retrieval. Their egress is really,
really expensive.

There may also be a minimum storage period, like Amazon has.

Let's wait and see.

~~~
puzzle
It looks like a lower tier than the existing Coldline and Nearline (7x cheaper
for storage than the former). Both have a minimum period, so this one is
likely to have one as well. Coldline and Nearline are more expensive than
regular storage when fetching objects, which means ice cold storage is
probably even more expensive when you restore (is it going to be 7x, too,
keeping symmetry?).

------
yread
> Unlike tape and other _glacially_ slow equivalent

 _shots fired_ I like it when multinational corporations with revenues the
size of midsized countries engage in some childish puns.

~~~
bzillins
The title seems like a reference to the 2003 Outkast song Hey Ya!
[https://www.youtube.com/watch?v=PWgvGjAhvIw](https://www.youtube.com/watch?v=PWgvGjAhvIw)

Alright alright alright!

------
dredmorbius
Contrast: Petabox, from the Internet Archive.

[https://archive.org/web/petabox.php](https://archive.org/web/petabox.php)

Density: 1.4 PetaBytes / rack

Power consumption: 3 KW / PetaByte

No air conditioning; instead, excess heat helps heat the building.

Raw Numbers as of August 2014:

4 data centers, 550 nodes, 20,000 spinning disks

Wayback Machine: 9.6 PetaBytes

Books/Music/Video Collections: 9.8 PetaBytes

Unique data: 18.5 PetaBytes

Total used storage: 50 PetaBytes

Costs are $2/GB, lifetime, I believe.

[https://help.archive.org/hc/en-us/articles/360014755952-Archive-org-Information](https://help.archive.org/hc/en-us/articles/360014755952-Archive-org-Information)

------
penagwin
Does anybody know what the retrieval fees will likely look like? I've been
wary of most of the "cloud archival" solutions because while they're cheap to
put data into, they seem to charge you a billion dollars to actually retrieve
it.

~~~
ocdtrekkie
FWIW, this is still an ideal model for backup storage: If your more regular
backup model is robust and your network is well-secured, you'll never need
retrieval. And if you need it, you need it, and it's justifiable to spend big
to save your business.

~~~
lreeves
Backup plans don't mean much unless you fully test a restoration process
periodically, though.

~~~
penagwin
I'd be confident with periodically testing just small random parts.

For me this is a "last resort backup": it costs little to keep around, and
god forbid we ever need it. BUT that means we need to account for the case
where we do need it! And if it's going to cost too much, then there's no point
in the backup anyway.

~~~
ocdtrekkie
I would generally agree. First of all, you're going to test a lot of your
restore processes with backups which are closer to home: You should make sure
your VMs can all restore from your onsite (or just less icy) backups, for
instance. As long as you're confident in that, the only thing you need to test
with "ice cold" storage is that you can successfully restore a single VM from
it, since you know all of your VMs can be restored.

------
idlewords
What is the meaning of the claim about "99.999999999% annual durability"? Does
that mean one chance in 100B of an object being unretrievable?

~~~
rsync
"What is the meaning of the claim about "99.999999999% annual durability"?"

It has no meaning whatsoever. Someone on the marketing side of the team
decided that was a "competitive" number to present, outwards, and someone in
engineering was tasked with, _working backward_ from that number, coming up
with some plausible calculation that resulted in it.

In the real world, they, like Azure and Amazon, will have single point in time
outages that will wipe that out for a year or more.

Here is what an honest assessment looks like:[1]

"Historically (updated April, 2019) we have maintained 99.95% or better
availability. It is typical for our storage arrays to have 100+ day uptimes
and entire years have passed without interruption to particular offsite
storage arrays."

...

"In the event of a conflict between data integrity and uptime, rsync.net will
ALWAYS choose data integrity."

[1]
[https://www.rsync.net/resources/notices/sla.html](https://www.rsync.net/resources/notices/sla.html)

~~~
jpatokal
You are mixing _availability_ (access at any given moment) with _durability_
(not losing data). From the FAQ:

 _Cloud Storage is designed for 99.999999999% (11 9's) annual durability,
which is appropriate for even primary storage and business-critical
applications. This high durability level is achieved through erasure coding
that stores data pieces redundantly across multiple devices located in
multiple availability zones._
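
As a toy illustration of why erasure-coded shards across zones multiply the
nines (the shard counts and failure rates below are made up by me, not
Google's actual model):

    from math import comb

    # Toy model (made-up numbers, NOT Google's actual calculation): an object
    # is cut into n erasure-coded shards in different zones and survives the
    # year as long as at least k shards do.
    def annual_loss_probability(n, k, shard_loss_p):
        # Data is lost only if more than n - k shards are lost.
        return sum(
            comb(n, lost) * shard_loss_p**lost * (1 - shard_loss_p) ** (n - lost)
            for lost in range(n - k + 1, n + 1)
        )

    # E.g. 9 shards, any 6 reconstruct the object, 0.05% loss per shard-year:
    p = annual_loss_probability(n=9, k=6, shard_loss_p=0.0005)
    print(f"annual durability ~ {1 - p:.13f}")  # lands around eleven nines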

Disclaimer: I work at GCP, although not in GCS specifically.

~~~
rsync
"You are mixing availability (access at any given moment) with durability (not
losing data)."

You are correct - I misread that as availability _even after quoting that very
same line_.

------
remus
Anyone care to speculate on the technology that allows them to offer the fast
retrieval times and low cost per GB?

~~~
puzzle
"Fast" is relative here. It's fast compared to Glacier and others, but it's
going to be slower than the more expensive tiers.

You might not need to speculate much about how it works; it's probably
implemented as described by Google themselves in slides 22ff. here:

[http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Keynote.pdf](http://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-Google-Keynote.pdf)

(In a nutshell, pack some hot data with a lot of cold data on many large
drives, then put a Flash-based cache in front of them to get long tail
performance predictability back.)
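
A minimal sketch of that idea as I read the slides (purely illustrative, not
Google's implementation): a small fast cache absorbs the hot reads so the
big, slow, mostly-cold drives rarely sit in the latency path.

    from collections import OrderedDict

    # Illustrative sketch only: big slow "drives" hold everything, a small
    # LRU "flash" layer in front serves the hot keys.
    class FlashFrontedStore:
        def __init__(self, disk, cache_size):
            self.disk = disk               # large, slow, cheap
            self.cache = OrderedDict()     # small, fast, expensive (LRU)
            self.cache_size = cache_size

        def read(self, key):
            if key in self.cache:          # hot path: served from "flash"
                self.cache.move_to_end(key)
                return self.cache[key]
            value = self.disk[key]         # cold path: touches the "drive"
            self.cache[key] = value        # promote, evicting the coldest
            if len(self.cache) > self.cache_size:
                self.cache.popitem(last=False)
            return value

    store = FlashFrontedStore({f"obj{i}": i for i in range(1000)}, cache_size=10)
    store.read("obj1"); store.read("obj1")  # second read never hits the disk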

~~~
remus
Thanks for the link! Interesting stuff.

~~~
puzzle
There was also a talk about the low level storage service and the performance
isolation work that allows it to mix batch and latency-sensitive traffic on
the same drive, but it doesn't seem to have been recorded:
[http://www.pdl.cmu.edu/SDI/2012/101112.html](http://www.pdl.cmu.edu/SDI/2012/101112.html)

Gory details are in the patents, 9781054, 9262093 and 8612990, which I'm not
linking directly, because your lawyers might not approve. There's even a
follow-up, 10257111. It's so new, from two days ago, that Google Patents can't
find it, while Justia can.

------
OrgNet
It's a pretty good price, but assuming you're storing 8 TB and you buy your
own drive, the drive would pay for itself in about 14 months... so you'd
basically get the next 4 years for free if you're willing to manage it...

~~~
dwild
Will that storage have "11 9's annual durability" and be stored in multiple
locations?

Let's say you only need to write to it once and have two secure locations
available for free; that would still mean you need two drives, which would
pay for themselves in 28 months instead.

Sure, it's "cheaper", but it's far from being as good, and the price
difference isn't that big.
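
The payback math, for reference (the $140 price for an 8 TB drive and the
~$1.23/TB/month cloud figure are both my assumptions):

    # Rough payback math for "just buy a drive" vs. the new archive class.
    # Drive price and the ~$1.23/TB/month figure are assumptions on my part.
    drive_price_usd = 140.0    # assumed street price of an 8 TB drive
    capacity_tb = 8
    cloud_per_tb_month = 1.23

    monthly_cloud_cost = capacity_tb * cloud_per_tb_month  # ~$9.84/month
    payback = drive_price_usd / monthly_cloud_cost         # ~14 months
    print(f"one drive pays for itself in ~{payback:.0f} months")
    print(f"two drives (redundancy): ~{2 * payback:.0f} months")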

~~~
votepaunchy
Optimally running your own drives also assumes that you can fill the drives
... and the next byte doubles your cost.

------
fiatjaf
Google offers some interesting services, but their APIs are always so awfully
complicated and cumbersome that I've given up entirely on trying to use any of
them.

~~~
jpatokal
If you can use cp to move files around, you can use gsutil to do the same for
GCS.

[https://cloud.google.com/storage/docs/gsutil](https://cloud.google.com/storage/docs/gsutil)
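
And for anything scripted, the google-cloud-storage Python client is about as
direct (bucket and object names below are placeholders):

    # Equivalent of `gsutil cp backup.tar.gz gs://my-backups/` with the
    # google-cloud-storage client; names are placeholders.
    from google.cloud import storage

    client = storage.Client()                   # default credentials
    bucket = client.bucket("my-backups")        # hypothetical bucket
    blob = bucket.blob("2019/04/backup.tar.gz")
    blob.upload_from_filename("backup.tar.gz")  # the "cp" up
    blob.download_to_filename("backup.tar.gz")  # the "cp" back down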

------
duxup
I use Glacier as a sort of backup of my backups... and was thinking about
Glacier Deep, but this is tempting too.

~~~
lucb1e
In case you're still thinking about options, I'd be happy to host a variant
of this for you:
[https://old.reddit.com/r/DataHoarder/comments/7rjcdn/home_made_non_gmo_cruelty_free_offsite_backup/](https://old.reddit.com/r/DataHoarder/comments/7rjcdn/home_made_non_gmo_cruelty_free_offsite_backup/)

------
scurvy
While one major use of something like this would be backups, how does one
handle these backup sets with respect to GDPR requests? The window to respond
is 30 days, so keeping backups longer than, say, 25 days seems cumbersome. You
would need hot access to the sets to load them up and delete the data.

~~~
jsnell
Encrypt backup data with a per-user key, keep the keys only in hot storage,
delete the key when a user is deleted.

~~~
anoncake
Won't that make the backups useless in case of a data loss (i.e. always)?

~~~
jsnell
You don't keep a single copy of each key, but store enough redundant copies to
get the proper number of nines. Preferably that's redundant geographically, in
terms of storage technology, and in write frequency.

The important part is just that the keys don't end up in long-term cold
storage: either they're retained only for a short period (e.g. tape backups
that get rotated after two weeks), or the key store supports live deletion.
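
A minimal sketch of the scheme, using Fernet from Python's `cryptography`
package (illustrative names, not production code):

    # Minimal crypto-shredding sketch: backups hold only ciphertext, so
    # deleting a user's key in hot storage makes their archived data
    # permanently unreadable.
    from cryptography.fernet import Fernet

    hot_key_store = {}   # stand-in for a replicated hot key database

    def backup_user_data(user_id: str, data: bytes) -> bytes:
        key = hot_key_store.setdefault(user_id, Fernet.generate_key())
        return Fernet(key).encrypt(data)   # ciphertext goes to cold storage

    def forget_user(user_id: str) -> None:
        # GDPR deletion: drop the key; the cold ciphertext becomes garbage.
        hot_key_store.pop(user_id, None)

    blob = backup_user_data("alice", b"alice's documents")
    forget_user("alice")
    # Restoring blob is now impossible: its only key no longer exists.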

------
patrickg_zill
What are the transfer costs for storage and retrieval?

~~~
tpetry
It seems to have the same pricing as the other storage classes: no fees for
accessing the files within the same region, and the typical bandwidth fees if
the backups are downloaded to somewhere else.

~~~
couterSpell
Intuitively, there has to be some cost for retrieval; otherwise you'd just use
the cold storage to store everything.

~~~
tpetry
As with every cloud storage service, file access costs money: you pay for the
operation that accesses it. But it's so minimal it's basically nonexistent for
a backup solution.

~~~
votepaunchy
Nearline and Coldline have a per-byte retrieval cost in addition to the
increased-but-still-low cost per operation.

[https://cloud.google.com/storage/pricing](https://cloud.google.com/storage/pricing)
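
To make the shape of a worst-case restore bill concrete (all rates below are
placeholders; check the pricing page above for the real ones):

    # Hypothetical restore-cost math; the per-GB rates are placeholders,
    # not Google's actual prices.
    gb = 8_000                   # size of the restore
    retrieval_per_gb = 0.05      # placeholder cold-tier retrieval fee
    egress_per_gb = 0.12         # placeholder internet egress rate

    print(f"retrieval: ${gb * retrieval_per_gb:,.0f}")  # $400
    print(f"egress:    ${gb * egress_per_gb:,.0f}")     # $960
    print(f"total:     ${gb * (retrieval_per_gb + egress_per_gb):,.0f}")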

------
JoblessWonder
Google has burned people so many times by shuttering products with little to
no warning that I'd be hesitant to trust them with my long-term data storage.

~~~
yjftsjthsd-h
Eh... for consumer stuff, sure, and perhaps even for new/experimental GCP
features. But this is storage, a core function, on GCP, an enterprise service
with actual contracts and SLAs attached.

------
imagetic
Just don't think of it as something you'll ever want to restore unless the
building burns down and you've lost everything.

Glacier's restores had a lot of fees in my one experience. We could have
bought several RAID units for the price of a fast restore. If you asked for
the data back over a long period of time, the price dropped dramatically.

