
New storage classes for Google Cloud Storage - AndrewDucker
https://cloudplatform.googleblog.com/2016/10/introducing-Coldline-and-a-unified-platform-for-data-storage.html
======
stemuk
I think there is really no doubt that the storage prices themselves (for all
types) are pretty amazing. But let's face it: traffic costs are the elephant
in the room. At 12ct and more per GB, traffic easily becomes your biggest
expense and makes the storage price reduction from 2.6 to 2ct per GB almost
forgettable.

For me this is in no way acceptable, and it seems like a vicious attempt to
sneak in some extra profit without the customer noticing it upfront. Sure,
these harsh words seem like a big exaggeration, but I literally never ever
hear or read anything about traffic costs in Google's fancy blog posts, and I
would assume that only a fraction of the HN community is aware of this. 120
bucks for a measly TB of traffic is just way too high.

~~~
kyledrake
This x 1000. Even when you throw your own CDN on top of this, the bandwidth
markup is nasty. Neocities would be unsustainable if we used GCS for our
hosting.

I'm paying about $0.01/GB right now, and I've seen market rate at half that.
And that's not even directly using IP transit providers. You can get a gigabit
unmetered for $450-1000/mo which you can shove a theoretical 324TB through
every month. The difference in the numbers is so staggering I sometimes wonder
if I'm even doing the math right.
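
To put rough numbers on it (the prices here are illustrative assumptions, not
quotes; check current price sheets before trusting any of this):

    # Back-of-the-envelope bandwidth cost comparison, illustrative prices only.
    GCS_EGRESS_PER_GB = 0.12    # roughly the internet egress rate cited upthread
    TRANSIT_PER_GB = 0.01       # roughly what I'm paying per GB today
    UNMETERED_GIGABIT = 1000.0  # $/month, upper end of the $450-1000 range
    
    # A saturated 1 Gbps link moves ~324 TB in a 30-day month:
    # 1 Gbps = 0.125 GB/s; 0.125 * 86400 * 30 = 324,000 GB.
    full_pipe_gb = 0.125 * 86400 * 30
    
    for gb in (1_000, 10_000, 100_000, full_pipe_gb):
        print(f"{gb / 1000:>5.0f} TB/mo: "
              f"GCS egress ~${GCS_EGRESS_PER_GB * gb:>9,.0f}  "
              f"transit ~${TRANSIT_PER_GB * gb:>7,.0f}  "
              f"unmetered gigabit ${UNMETERED_GIGABIT:,.0f} flat")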

Perhaps their bandwidth is better somehow (prove it), but 12-18x better it is
probably not, and having truffle shavings added to your IP transit really adds
up when you're hauling a lot of traffic. If you're doing something with heavy
BW usage and low margins, be careful with stuff like this. It quickly becomes
much more expensive than doing it yourself.

I'd love to be wrong here. I'm sitting next to 60 pounds of storage servers
I'm setting up for a data center, they're taking up my entire living room. I
would love to get out of the data persistence business forever. But at these
BW rates, it's never going to happen.

~~~
Veratyr
You can get gigabit + a full cabinet of colocation from Hurricane Electric for
$400/mo so you know:
[https://he.net/colocation.html](https://he.net/colocation.html)

I'm sharing it with a few other people and it's great.

~~~
kyledrake
I've looked at this. Easily the best deal in town. A few notes for people
considering it:

\- It's IP transit and a DC from HE, which means it's possibly not BGP peering
through other transit providers? If that's true, then when HE's network fails,
so does your server's internet connection. Typically you want to be multi-homed
to ensure redundancy, unless you're doing something like an Anycast CDN and you
don't care if it craps out for a few hours.

\- HE is the cheapest transit for reasons. It may not be a big deal for what
you're doing, but be aware of the differences
[https://news.ycombinator.com/item?id=5348624](https://news.ycombinator.com/item?id=5348624)

\- They only provide a 15A circuit (likely only 80% usable) for a 42U rack,
which is pretty ridiculous. You'll blow through that pretty quickly if you're
using dual Xeon servers. It's cheaper under their pricing to get a second 42U
rack than it is to get the proper amount of power needed for a full cabinet.

\- If you're doing Anycast, be aware that HE doesn't provide any BGP
communities except blackholing, which could make it hard to tune your network.

(If any of this sounds annoying to deal with and think about, you know exactly
why I'd prefer that the cloud providers had better BW costs so I could not
have to do this anymore. Again, if the business model works with GCS,
congratulations, use it.)

~~~
Veratyr
> It's IP transit and a DC from HE, which means it's possibly not BGP peering
> through other transit providers?

The gigabit transit that comes with the cabinet is HE-only, yep. You can
however buy transit and an interconnect from any other carrier in the
datacenter (there are quite a few in FMT2 at least).

> They only provide a 15A circuit for a 42U rack, which is pretty ridiculous.

Yeah, it's pretty shitty. I ended up going with Xeon D for the power
efficiency. Works pretty well.

------
user5994461
Summary: Google Cloud Storage is now (significantly) cheaper, faster and more
available than Amazon S3.

Also, it supports multi-region whereas S3 does not.

Love the direction Google Cloud is taking <3

~~~
tehbeard
What about ~~reliability~~ durability of storage (bitrot, not the uptime)? I
didn't see any information comparing this, is it practically the same across
both?

edit: durability, not reliability.

~~~
klodolph
At this point, durability is hard to compare because the published numbers are
total fantasies. S3's published durability is 11 9s, which I've complained
about in the past because the chance that the human race will be wiped out
next year by an asteroid impact is probably not too far from one in a hundred
million, which puts a hard cap on durability at around eight nines.
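
To spell out that arithmetic (the one-in-a-hundred-million figure is an
illustration, not a real risk estimate):

    # Illustrative only: if the annual chance of a civilization-ending event is
    # on the order of 1e-8, the chance that any object survives the year can't
    # honestly be quoted above ~1 - 1e-8, i.e. about eight nines.
    p_catastrophe = 1e-8
    max_honest_durability = 1 - p_catastrophe
    print(f"{max_honest_durability:.11f}")  # ~0.99999999000, eight nines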

Then ask yourself, "is the chance that someone made a mistake in designing the
system which causes catastrophic data loss higher or lower than one in a
billion?"

Yes, the number is total fantasy. The only actual meaning of the durability
SLA is that if the durability SLA is violated, your cloud provider will give
you some service credits. If the data is worth more than some service credits,
this is small consolation.

In practice, if you absolutely must compare durability, what you are trying to
do is compare the frequency of black swan events, but actually estimating the
frequency of those events requires access to proprietary data which you don't
have.

Both providers are undoubtedly storing the data in multiple locations on
multiple types of storage systems with multiple layers of error coding.

Having thought about this issue before, the most likely data loss scenario in
my opinion is "I get injured, the company hires someone incompetent who
doesn't pay the bill for a year while I'm in the hospital."

~~~
ghshephard
I've always seen durability from the perspective of Amazon telling me, "Look,
we're going to lose your data - here's how quickly you should expect it to
rot." Basically, take the number of objects you have and multiply by 1e-7
each year. So, if I have 400 billion objects (4e11), Amazon is going to lose
approx 40,000 of them per year, and I should be prepared for that.

With regard to your Asteroid Strike (and other disasters such as Riot,
Insurrection, War, Hurricane, Earthquake, etc.) - these are all disclaimed in
the agreement you sign with Amazon under a clause known as Force Majeure,
which essentially means "Acts of God". The durability clause applies in the
normal course of business, not to exceptional events. For those types of
scenarios, you'll want to have a business continuity plan in place, not a
durability formula on your storage service.

~~~
klodolph
Your numbers are off... according to Amazon's durability notes, if you have
400 billion objects then Amazon would expect to lose less than 40 per year,
not 40,000. And for that level of storage, you're paying Amazon something like
a quarter million dollars per year to do storage for you.
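
A quick sanity check of the two rates being discussed, treating the quoted
durability naively as an independent annual loss probability per object (which
is exactly the simplification in question):

    # Naive expected-loss arithmetic for the two per-object rates above.
    objects = 4e11                       # 400 billion objects
    
    eleven_nines_rate = 1e-11            # Amazon's published 99.999999999% durability
    print(objects * eleven_nines_rate)   # ~4 objects/year, i.e. "less than 40"
    
    assumed_rate = 1e-7                  # the 1e-7/year figure used upthread
    print(objects * assumed_rate)        # ~40,000 objects/year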

My point is that there is no point in taking that kind of durability guarantee
into account, because there are much bigger threats to data storage. It's kind
of like saying that driving by car is very safe if you don't get into an
accident--while technically true, it's not a very useful fact.

~~~
donavanm
The exact values may be off, but that is how to think of it. Losing any
specific object is less likely than the Earth ending. But these are providers
with many trillions of objects. At that scale, a handful of objects are going
missing each year, on average. The durability guarantee allows customers to
make an educated decision about how much they invest in loss prevention.

But let's get back to that 400B-object customer who's losing 40. How much does
it cost to verify (resilver) all those bits, even annually? Is it even
tenable?

~~~
klodolph
This response got a bit long.

No, that's absolutely not how to think about it. In short, it's a dangerous
simplification and it's not at all representative of how failures work in a
system like S3 or Google Cloud Storage. It does make good marketing copy, but
if you are an engineer or a program lead you have a certain level of
responsibility for understanding why cloud storage does not, in practice, give
you eleven 9s of durability per object every year. (At a first approximation,
you would _at least_ expect the data loss to follow a Poisson distribution,
but…)

Drilling down into the technical guts, S3 and Google Storage do not store
separate objects in the lower storage layers; it's too inefficient. So below
the S3 / GCS object API, you have stripes of data spread across multiple data
centers with error encoding, along with a redundant copy on tape or optical
media. Randall Munroe estimated Google's data storage at 15 EB
([https://what-if.xkcd.com/63/](https://what-if.xkcd.com/63/)), so taking that
number, let's suppose a stripe size of 100 MB (just picking a number out of
the air that seems reasonable) and you get 1.5e11 stripes. Taking Amazon's 11
nines, that gives a loss of 1 stripe every 8 months.
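
Spelled out, with the same guesses as above (15 EB fleet, 100 MB stripes,
eleven nines applied per stripe rather than per object):

    # Rough stripe-loss arithmetic using the guesses from the paragraph above.
    total_bytes = 15e18      # ~15 EB, Randall Munroe's estimate for Google
    stripe_bytes = 100e6     # assumed 100 MB stripe size
    stripes = total_bytes / stripe_bytes                  # 1.5e11 stripes
    
    annual_loss_rate = 1e-11                              # eleven nines, per stripe
    stripes_lost_per_year = stripes * annual_loss_rate    # ~1.5
    months_between_losses = 12 / stripes_lost_per_year    # ~8 months
    print(stripes, stripes_lost_per_year, months_between_losses)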

So, all we've done so far is look at how a system would be implemented, and
we've already completely destroyed the notion that you would expect 40 of 400B
objects to disappear due to bit rot in any particular year. Supposing the
objects are 10KB in average size, you might expect most years to lose no
objects at all, and if you lose any you might lose 10,000 at the same time—and
the entire extent of your recourse is to get a service credit from your cloud
provider.

The gotcha is that the system simply isn't that reliable. First of all,
engineers at Amazon and Google are constantly pushing new configuration and
software updates to their stack. Some of these software updates can result in
catastrophic data loss, and some of these errors will not get caught by
canaries. "Catastrophic" might mean metadata corruption, it might mean the
loss of many stripes all at the same time, but from the cloud provider's
perspective they're still meeting SLA for _most_ of their customers so _most_
of their customers are happy. On top of that, you also have to take into
account the possibility that a design flaw in the storage media would cause
massive data loss across multiple data centers simultaneously, or other
nightmare scenarios like that. Given that I've personally experienced data
loss due to design flaws in storage media, and I have only ever owned twenty
or so hard drives in my life, you can imagine that a fleet with millions of
hard drives presents some unique reliability and durability problems.

You can pretend that these configuration and programming errors are "unusual
events" but the fact is that stripe loss for any reason is already an unusual
event, and you might as well include the _most probable_ cause of data loss in
your model if you are going to model it at all.

So, what is the SLA? It's part of a contract. It defines when the contract is
performed and when it is broken. It's also a piece of marketing and sales
leverage. That's all. It's not a realistic or particularly useful description
of how a system actually works—so the responsible engineers and program
managers at companies which use cloud services are always asking themselves,
"What happens if Amazon or Google violates their SLA? Will I lose my job?"

(A footnote: You don't need to verify the bits yourself, cloud providers will
send you messages when your data is lost. If you want more durability then you
go multi-cloud or buy a tape library.)

(Disclosure: I work at a company that provides cloud storage.)

~~~
donavanm
Disclosure: I currently work at AWS and have a bit more than a passing
familiarity with large-scale storage systems. I know what erasure coding is. I
know from experience just how wrong things can go.

The passing comment on the distribution of errors is important. However, the
"1 stripe is 100 MB, and objects are 10 KB, ergo you'll lose 0 or 10,000
objects" bit is bizarro. I suspect you're letting your personal experience
lead to assumptions that may not be true in other implementations.

~~~
klodolph
Yes, that's definitely not true on all the storage systems I've worked on.
Some storage systems will pack a single customer's data into a single stripe
and others won't.

------
questionr
Google is making their cloud offering very attractive compared to AWS.

The issue with Glacier is the convoluted retrieval pricing. I understand they
want to dissuade people from using it as a primary storage, but the potential
for a surprise bill is hard to swallow.

Interesting how Amazon still offers unlimited storage through their consumer
Amazon Drive service.

~~~
Gigablah
The AWS web console is a pain point. I had a particularly frustrating
experience once where attempting to restore a Glacier storage class file in S3
silently failed due to permission issues (I had forgotten to assign
permissions to the bucket owner in my automated upload script), but the
console told me that the file would be "available in 3-5 hours". I wasted days
due to this.

Meanwhile other competitors (e.g. Nearline) promise to have your file
available within seconds...

~~~
boulos
Yep. Nearline is great. Just so we're clear though: milliseconds ;). There's
no latency penalty to get your bytes back, even with Coldline. We do charge a
retrieval fee because the pricing is determined by a "promise" that you won't
touch the data frequently (roughly once a month for Nearline and once a
quarter for Coldline), otherwise you should use our regular flavors.
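
To make the trade-off concrete, here's a rough sketch of cost per GB-month as
a function of how often you read the data back. The prices are illustrative
(roughly the list prices around this announcement) and the break-even points
are approximate; always check the current pricing page:

    # Rough monthly cost of keeping 1 GB and reading it back N times a month.
    # Prices are illustrative, not authoritative.
    CLASSES = {
        # name: (storage $/GB/month, retrieval $/GB)
        "multi-regional": (0.026, 0.00),
        "regional":       (0.020, 0.00),
        "nearline":       (0.010, 0.01),
        "coldline":       (0.007, 0.05),
    }
    
    def monthly_cost_per_gb(storage, retrieval, reads_per_month):
        """Storage cost plus retrieval fees for reads_per_month full reads."""
        return storage + retrieval * reads_per_month
    
    for reads in (0, 1 / 3, 1, 4):   # never, quarterly, monthly, weekly-ish
        row = "  ".join(
            f"{name}: ${monthly_cost_per_gb(s, r, reads):.4f}"
            for name, (s, r) in CLASSES.items()
        )
        print(f"{reads:>4.2f} reads/month -> {row}")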

Disclosure: I work on Google Cloud and am a happy GCS customer.

------
vbezhenar
Coldline seems to be a perfect solution for personal backups. I used Amazon
Glacier, but got hit with an extremely huge bill when I decided to retrieve my
backups, so I'm not going to deal with them anymore; their retrieval pricing
is absurd. If there are good UI and scripting solutions for manual and
automatic backups into Google Coldline, I'm in.

~~~
chris_st
I highly recommend the "Arq" [1] commercial program for backing stuff up to
these kinds of services. They support most of the big names (AWS, Google,
Dropbox, etc.) and will probably be supporting the new Google options soon.

It does local encryption before sending your data up to the cloud.

Notably Arq restores _all_ macOS permissions/meta-data, and there is an open-
source test kit to show whether such a program does so.

It also works on Windows.

I have no relationship with the company, other than as a happy user.

[1] [https://www.arqbackup.com](https://www.arqbackup.com)

~~~
questionr
Arq sounds almost ideal, except it's still intended more for "backup" than
"storage". It's more like an Apple Time Machine.

From what I read, you can still offload data from your local machine by
choosing to "archive" a directory.

But that's an extra step, and I haven't read how it handles conflicting
directories that are created later, after the original has been archived.

~~~
chris_st
That's an interesting use case. You ought to email them and ask about it!

My goal is just two off-site backups (AWS and Google).

------
rsync
In case it's not widely known, rsync.net maintains 'gsutil' in the environment
(along with s3cmd) so you can move data between Google cloud services and
rsync.net.

    
    
    ssh user@rsync.net gsutil cp mscdex.exe gs://my-bucket
    ssh user@rsync.net gsutil rsync -d data gs://mybucket/data

Although it should be noted that, circa 2016, the cool kids are all backing up
to rsync.net with borg[1][2] (the "holy grail of backup software") which
limits the use cases of gsutil with an rsync.net account.

HN Readers discount. Just email and ask.[3]

[1]
[https://borgbackup.readthedocs.io/en/stable/](https://borgbackup.readthedocs.io/en/stable/)

[2] [https://www.stavros.io/posts/holy-grail-
backups/](https://www.stavros.io/posts/holy-grail-backups/)

[3] info@rsync.net

~~~
Veratyr
I say this most times I see you advertising, and I'll say it again: you really
need to work on your pricing to be competitive with things like this. Even with
the HN discount your service is expensive, and even with the Borg/Attic
pricing, which is cheaper still, your service is 3x more expensive than
alternatives like Nearline (aside from bandwidth), which themselves can be
used as a backend for Attic (just mount GCS locally).

I really like the look of the service but I can't justify paying over 3x for
it.

~~~
rsync
"your service is 3x more expensive than alternatives like Nearline (aside from
bandwidth), which themselves can be used as a backend for Attic (just mount
GCS locally)."

Well, our storage platform is online and random access so it would be
inappropriate to compare it to either nearline or glacier.

The appropriate comparison is to S3 - or in this case, the multi-regional GCS
option that this discussion points to.

In that case, our attic/borg pricing is 3 cents, as compared to roughly 3
cents for S3 and 2 cents for GCS - and that assumes you use no bandwidth.
Since we charge nothing for usage/bandwidth, the comparable prices from
Amazon/Google would be slightly higher.

It sounds like a steal to me and every day plenty of people agree enough to
commence using our services.

Also, unrelated, you know you can just drive up to an rsync.net location and
get your data - even if the Internet is crippled.[1]

[1]
[http://www.rsync.net/products/oob.html](http://www.rsync.net/products/oob.html)

~~~
Veratyr
> Well, our storage platform is online and random access so it would be
> inappropriate to compare it to either nearline or glacier.

That would be true but you stated "circa 2016, the cool kids are all backing
up to rsync.net with borg", so comparing your service pricing as a backup
product to Nearline is appropriate I believe.

Don't get me wrong: your service definitely has a lot of good applications,
and the free bandwidth is great, but for those of us who just want somewhere
to park a whole pile of data we aren't going to touch much, rsync.net is quite
expensive.

Also a downside of rsync.net vs. AWS/GCS etc. is that you can't direct users'
HTTP requests to rsync.net. That dampens the usefulness of online storage and
unlimited bandwidth, since it essentially limits it to servers under our own
control. If I could run HTTP off it directly, my feelings would be very very
different.

------
AdmiralAsshat
Cold storage looks cheap at first glance, but for the amount of data I have to
back up (about 3 TB), I work out the cost as:

(0.7 / 100) * 3000 GB = $21 / month, i.e. $252 / yr

For that price, I'm better off just paying $60 / yr for CrashPlan, which offers
unlimited storage. I'd say that applies to anyone who has a terabyte or more of
stuff to back up.
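
The same arithmetic as a quick script (Coldline at $0.007/GB/month; retrieval
and egress fees would come on top of this):

    # Coldline storage cost for ~3 TB of backups vs. a flat "unlimited" plan.
    coldline_per_gb_month = 0.007     # $0.007/GB/month, i.e. 0.7 cents
    data_gb = 3000                    # ~3 TB
    
    coldline_yearly = coldline_per_gb_month * data_gb * 12   # $252/year
    crashplan_yearly = 60.0                                   # flat plan
    print(coldline_yearly, crashplan_yearly)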

~~~
devopsproject
Have you tried downloading any significant portion of your backup from
CrashPlan? If so, how was the performance?

~~~
AdmiralAsshat
Not a significant portion, no. CrashPlan is only my third layer of backup, the
offsite backup. Each of my drives has a clone backup drive (typically just a
WD My Passport, which makes them easy to transport when I travel), and then a
larger external holds backups of all of them--I usually keep that one in
storage and only plug it in once a month or so to sync. CP is the third layer
of backup. The most I do in a typical week is use the Android app to grab a
few movies or documents from CP storage to get them onto my phone and tablet.

I should probably "test" doing a recovery from CrashPlan at some point, but I
fear that even getting a third of it (1TB) would probably trip some sort of
overcharge from my ISP.

------
meirelles
Google is finally stepping up its cloud offerings. I was skeptical about them
~1.5 years ago; they were only following AWS without any big innovations.

------
boulos
For those that are asking:

\- Multi-regional is what we used to call Standard (highlighting that it's
always been awesome and replicated)

\- Regional / DRA collapse into one

\- New offering: Coldline

\- Massive price cut on operations

\- Per-object storage class, plus lifecycle (e.g., automatically go from
Nearline to Coldline after 60d)
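
For the lifecycle point above, here's a minimal sketch of the kind of rule
this enables; the bucket name is made up, and the exact schema should be
checked against the GCS lifecycle documentation:

    # Sketch: a lifecycle rule that moves objects from Nearline to Coldline
    # after 60 days. Saved as JSON, a config like this can be applied with
    # `gsutil lifecycle set lifecycle.json gs://my-bucket` (hypothetical bucket).
    import json
    
    lifecycle_config = {
        "lifecycle": {
            "rule": [
                {
                    "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                    "condition": {"age": 60, "matchesStorageClass": ["NEARLINE"]},
                }
            ]
        }
    }
    
    with open("lifecycle.json", "w") as f:
        json.dump(lifecycle_config, f, indent=2)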

Disclosure: I work on Google Cloud (and use Nearline at home!).

[Edit: Formatting. I always forget to put two newlines]

------
kozikow
I had to move to AWS entirely due to the lack of GPU instances on Google
Cloud. If someone from GCE is reading this - please add GPU instances.

Yes, I know this is unrelated, but I like GCS much more than S3, and I can't
use it since, by the "network" effect caused by GPUs, everything has to run on
AWS.

~~~
boulos
Please ping me with your project id, and your need (e.g., "I want to run
TensorFlow on 8 GPUs per VM, 4 VMs at a time, somewhere in Europe").

Disclosure: I work for Google Cloud (contact info in profile).

------
Yeroc
Google is smart to be aggressive on the storage side of their cloud business.
Once you have the data (assuming it's big data), it naturally makes sense to
move the processing of that data into Google Cloud services as well...

------
nodesocket
You can attach buckets to cloud instances natively using the gcloud cli[1] as
well. Really useful for static file storage.

[1] [https://cloud.google.com/compute/docs/disks/gcs-
buckets](https://cloud.google.com/compute/docs/disks/gcs-buckets)

My only complaint is Google Cloud persistent SSD disk pricing at $0.17 per GB
/ month. AWS EBS SSD is noticeably lower at $0.12 per GB / month.

------
Corrado
My hope is that this will spur AWS to lower their storage prices. Competition
is great!

------
plandis
Multi-region support is interesting, but haven't most of GCP's outages
affected all of their regions?

~~~
hurstdog
Checking specifically for GCS at
[https://status.cloud.google.com/summary](https://status.cloud.google.com/summary)
...

There are two outages there, one that only affected a few projects and one
that affected only service in the central US (we have regions over much of the
world).

Anecdotally, from watching and working with other services internally I don't
think most of our outages affect all regions. We actually spend a significant
amount of engineering effort ensuring that we're as decoupled as possible.

Disclaimer: I'm an Engineering Manager on Google Cloud Storage.

~~~
jonathanoliver
Is there any documentation anywhere that talks about how Google Cloud (any
product) creates isolation between regions while at the same time exposing a
simple "regionless" programming model?

~~~
hurstdog
Probably the best reference is the SRE book, specifically the chapters on
load balancing and distributed consensus protocols.

Other than that, the general approach is to minimize global control planes and
dependencies in our software stack. In the case of GCS, we do have a single
namespace which means we need to look up the locations of data early in the
request. Once we know the locations of data we can route the request to the
right datacenter to serve it. That global location table is highly replicated
and cached, of course.

When outages happen, most are caused by changes to the stack, so we are also
careful to roll out code or configuration slowly and carefully, increasing the
blast radius only after a change has been proven safe. For example, we roll
out new binaries first to a few canary instances in one zone, then to a few
instances in many regions, then to a full region, then to the world, all
spread over a few days.

Disclaimer: I'm an Engineering Manager on Google Cloud Storage.

------
Heliosmaster
Talking about cold storage, I'd like to pitch C14 from Online.net (the company
behind Scaleway):
[https://www.online.net/en/c14](https://www.online.net/en/c14)

0.2ct per GB.

------
brianwawok
So what happened to "Durable, reduced availability". It does not seem directly
mapped into the new system. I guess it roughly maps to "regional"???

~~~
cobookman
Yep. Regional is the equivalent of Durable Reduced Availability.

~~~
brianwawok
Doesn't look like things got auto-migrated... (whereas in the other cases they
did get auto-migrated)

------
banterfoil
Can someone explain to me why I shouldn't be worried about Google becoming a
monopoly? I think they make some fantastic products, and they have been
expanding their reach to hardware, DNS, web hosting, and storage.

~~~
Yeroc
You don't need to start worrying about them becoming a monopoly until they
actually figure out how to properly provide support for their products to the
people (e.g. enterprises) that need it. :P

------
naiv
I am confused. Is multi-regional the same as availability zones in S3?

Or do we upload to the European region and it gets replicated to the US
automagically?

------
throw2016
It's difficult to see how AWS and GCE can justify their ridiculous bandwidth
charges given how cheaply they buy it in bulk. This is plain profiteering.
With cloud, bandwidth was supposed to become something one takes for granted,
not something to fuss over with calculators.

And the nickel-and-dime charges that force users to needlessly separate their
computing needs into storage, compute, memory, bandwidth, reads, writes and
what not impose considerable complexity on users.

This cannot be brushed aside as a good model for cloud when valuable user time
is being wasted grappling with needless intricacies that have no reason not to
be flat.

End users cannot buy bandwidth at low bulk rates, and for things like backups
they may be faced with considerable overage charges on their local
connections.

------
ericd
I'm extremely reluctant to rely on Google services for core infrastructure for
anything revenue generating after seeing the shenanigans they've played on
pricing with their other services (especially Google Maps, where they
initially jacked up the pricing to an absurd level, then backpedaled and made
the prices 1/10 what they were initially threatening) and the relatively short
migration windows they typically provide.

Besides capricious pricing changes, their support is notoriously bad, even for
paid apps for business.

They don't seem to understand business customers' needs nearly as well as
Microsoft/Amazon do, from what I've seen.

Really not who I want running my business's server infrastructure, even if
it's completely free.

------
andybak
Dear god, the copy on that page is awful. The first paragraph is just vacuous
twaddle. Anyone who starts a minor product announcement with the words "Today,
we’re excited to announce..." deserves their own circle of hell. Most of that
first section is redundant preamble and should be rewritten as bullet points.

In fact the whole thing could do with being rewritten in a simple bullet-
pointed skimmable form. That way I wouldn't have to come to HN comments to
find out what the hell it's all about. Is it too early in the day for a drink?

~~~
devopsproject
Try some coffee with a light breakfast, then move on to the harder stuff if
you still can't cope with "words".

~~~
andybak
A couple of points:

1\. It was 4 in the afternoon here when I posted that - a touch too late for
breakfast even if it was a bit early for a drink

2\. I quite like words when they are strung together well ;-)

I still would like to hear someone defend that as a piece of marketing copy.
I've re-read it and stand by my earlier diatribe.

~~~
devopsproject
You've wasted a lot of time on something that is easily ignored. It's time to
move on.

~~~
andybak
And I feel better for it.

