
Google Cloud Storage Nearline graduates to general availability - vgt
http://googlecloudplatform.blogspot.com/2015/07/Google-Cloud-Storage-Nearline-graduates-to-general-availability.html
======
dragonwriter
From the details (in an image, some in fine print on the image) on the offer:
that's if you migrate at least 1PB into Google Cloud Nearline in the first 3
months, and maintain at least 1PB in Google Cloud Nearline for at least 12
months.

And it's only _1 month_ free at that level of commitment; anything more than
1 month free requires a migration commitment of more than 1PB.

EDIT: At the time this comment was written, the thread title referred to the
100PB free for up to 6 months offer, not the fact that Nearline was now in GA.
While the content of the comment is still true, the title that is being
clarified by the comment is no longer present. (Similar things seem to be true
of a number of the other top level comments.)

~~~
rugmug5
Once the free period is over the 100PB will cost $1 million per month in
storage fees alone. Will the free month(s) even cover the retrieval and
transfer fees to take 100PB out of another cloud?
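
Back-of-envelope, the $1 million/month figure follows from the $0.01/GB/month Nearline rate applied to 100 PB (binary units assumed here):

```python
# Monthly Nearline storage fee for 100 PB at $0.01/GB/month.
GB_PER_PB = 1024 * 1024            # 1,048,576 GB per (binary) petabyte
rate = 0.01                        # dollars per GB per month
monthly_fee = 100 * GB_PER_PB * rate
print(f"${monthly_fee:,.0f}/month")  # → $1,048,576/month, i.e. roughly $1M
```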

~~~
low_battery
To store 100PB, one could always buy 50,000 hard drives, which would only
cost $2-3 million up front. I wonder how much maintenance would cost to
provide the same availability as Nearline.

~~~
dekhn
Don't forget the full-time admins and hwops people. Operations costs are going
to dwarf the capital investment.

~~~
rugmug5
Or you could leave it in the cloud where it already is and not transfer it to
Google's cloud. I wasn't suggesting building your own...

------
fweespeech
As cute as it is to have $.01/GB backup storage... every time you pull out a
backup it costs you $.13/GB.

That is a pretty steep hit when you can use a lower quality object store:

[https://www.runabove.com/storage/object-storage.xml](https://www.runabove.com/storage/object-storage.xml)

For $.01/GB and $.01/GB to pull.

Yeah, it may not be as highly available as Nearline but for the 99% of the
time it is available...you don't have a 3 second delay and you aren't paying a
13x premium to pull data out.

For instance, let's say you have a 50GB backup that you automatically verify
via a testing process. You don't want to burden the disks of your primary
datastore longer than you have to [the source of the backup] so your workflow
is:

Database Server -> Create Backup Locally -> Push Backup to Object Store Bucket
[$.50/month to store the backup, assume you store 7 days] -> Spin up VM which
pulls from Object Store Bucket and verifies the backup [$.06/day for a 4GB
Linode for an hour to do this, which is plenty of RAM and time] -> Pull the
Backup & Verify [$.50/day]

_30 Days Backup Cost:_

$20.30 /month to maintain a week of verified backups with RunAbove = (7 * 50 *
.01) + (30 * .06) + (30 * .01 * 50)

$200.30/month to maintain a week of verified backups with Nearline = (7 * 50 *
.01) + (30 * .06) + (30 * .13 * 50)

I don't know why anyone would willingly pay an order of magnitude more for
cold storage.
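
(The two monthly totals above can be reproduced with a short sketch; the prices and the 50GB / 7-day workflow are the hypothetical figures from this comment.)

```python
# Monthly cost of a rolling week of 50 GB verified backups:
# storage for 7 daily backups, a daily verification VM, and a daily 50 GB pull.
def monthly_cost(store_per_gb, pull_per_gb, vm_per_day=0.06,
                 backup_gb=50, days_kept=7):
    storage = days_kept * backup_gb * store_per_gb  # rolling week of backups
    vm      = 30 * vm_per_day                       # daily verification VM
    egress  = 30 * pull_per_gb * backup_gb          # daily pull to verify
    return storage + vm + egress

print(round(monthly_cost(0.01, 0.01), 2))  # RunAbove → 20.3
print(round(monthly_cost(0.01, 0.13), 2))  # Nearline → 200.3
```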

~~~
vonmoltke
However, the egress charge only applies to data moved off Google Cloud
Services. If you use a Google compute instance to verify the backup (same size
as the Linode at $0.05/hr), the cost becomes:

(7 * 50 * .01) + (30 * .05) + (30 * .01 * 50) = $20.00/month

~~~
fweespeech
You forgot disk pricing, for apples to apples. You also lose multi-provider
redundancy [e.g. everything must be on Google Cloud, or you are maintaining
multiple builds of your production database machines].

[https://cloud.google.com/compute/pricing#localssdpricing](https://cloud.google.com/compute/pricing#localssdpricing)

$0.113/GB/month

So it's:

[apples to apples; local ssd]

$186.95 = (7 * 50 * .01) + (30 * .05) + (30 * .1213 * 50)

[persistent provisioned ssd]

$54 = (7 * 50 * .01) + (30 * .05) + ( ( ( (.17 * 96) / 720) + .01) * 50 * 30)
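
(As a sketch, plugging in the figures from the two formulas above; the $0.1213/GB local-SSD term and the provisioned-SSD term are this comment's own numbers.)

```python
# Reproducing the two apples-to-apples GCE totals above.
storage = 7 * 50 * 0.01    # week of 50 GB backups in Nearline
vm      = 30 * 0.05        # daily GCE verification instance, 1 hr at $0.05

local_ssd = storage + vm + 30 * 0.1213 * 50                        # local SSD
pers_ssd  = storage + vm + (((0.17 * 96) / 720) + 0.01) * 50 * 30  # prov. SSD

print(round(local_ssd, 2), round(pers_ssd, 2))  # → 186.95 54.0
```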

------
xhrpost
Are there any well-accepted theories as to how these systems (Nearline &
Glacier) function? With Glacier, I've heard powering down hard drives,
advanced compression (which would explain the 3-hour retrieval), and Blu-ray
storage (which would explain the 3-month minimum). With Nearline bringing
access down to 3 seconds for the same cost, and now with On-Demand I/O, I
really don't know what to speculate. Assuming I/O is limited because data is
spread across media, how could that be scaled up on demand?

~~~
bbrazil
My conjecture on Nearline is that it's unused disk space on disks that are
mostly I/O bound.

------
sz4kerto
It's basically impossible to upload this data in 6 months.

100 PB = 102400 TB = 104857600 GB = 107374182400 MB.

You have 365/2 * 24 * 3600 seconds in 6 months, that's 15768000 seconds.

So you need to upload ~7 GB/sec, that's 60 Gb/sec constant upload.
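
(A quick check of the arithmetic; the comment rounds its results up. Exact values, using the binary-unit conversion shown above:)

```python
# Sustained rate needed to upload 100 PB in 6 months.
total_mb = 100 * 1024 ** 3        # 100 PB in MB: 107,374,182,400
seconds = (365 / 2) * 24 * 3600   # ~6 months: 15,768,000 s
gb_per_s = total_mb / seconds / 1024  # sustained GB/s required
gbit_per_s = gb_per_s * 8             # same figure in Gb/s
print(round(gb_per_s, 2), round(gbit_per_s, 1))  # → 6.65 53.2
```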

~~~
jakob223
You can't upload this data from one place, but if you have lots of
installations of something, each of which is independently generating a lot of
data, it's feasible.

------
ohitsdom
6 months of free storage is a no-brainer business strategy. Once you get
anywhere near 100PB of data loaded, are you really going to move it elsewhere
after 6 months?

~~~
rsync
We (rsync.net) maintain 's3cmd' and 'gsutil' in our environment, so you can
do direct cloud <--> cloud transfers from either Amazon or Google to and from
rsync.net.

We _don't_ have a cold storage product, so perhaps not relevant for Nearline
... but it _is_ relevant for their online options. You do have the ability to
migrate to another provider.

Two random notes:

- we're announcing zfs send/recv[1] as a transport in the next few days, and
there will be S3-competitive pricing for this product at TB+ levels.

- the "HN readers discount" is still available all these years later. Just
email.

[1] over SSH

~~~
underscores
If you have tons of TBs, you don't want to worry about random failures at
some third party. Amazon S3 has a reputation, AFAIK, of never having lost a
file. At their scale, that's an argument to just fork over the cash and
forget about it.

While I think ZFS is great, it's just not a solution for really large storage.

Also, if you are still using s3cmd, that only tells me how inexperienced you
are.

~~~
PhantomGremlin
_if you are still using s3cmd, that only tells me how inexperienced you are_

Can we have a brief (one or two sentence) summary of what people _should_ be
using instead?

~~~
ac29
[https://aws.amazon.com/cli/](https://aws.amazon.com/cli/)

------
magic5227
Does anyone know why bandwidth costs for both AWS and Google have remained
flat while storage has rapidly approached $.01 a GB?

Do they have fixed costs, or is this one of the last ways they have to make
money on their platforms?

~~~
vgt
(disclosure: I work for Google). Not sure about AWS.

Google's network is a vast, secure, and performant global SDN. This allows us
to do some cool things, and it gives Google a fundamentally unique value
proposition:

- Traffic between zones/regions at Google never leaves the Google network, so
there's no need to set up a VPC or VPN between zones/regions. This makes
deployments much, much simpler.

- Google Compute Engine is a VPC out of the box, even cross-region. You can
carve out your own sub-VPCs easily using firewall rules.

- Traffic from Google to the end user is not dumped off the Google network as
soon as possible; Google carries your traffic as close to the end user as
possible.

Edit: take a look at this as well:
[http://googlecloudplatform.blogspot.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html](http://googlecloudplatform.blogspot.com/2015/06/A-Look-Inside-Googles-Data-Center-Networks.html)

~~~
boundlessdreamz
But you are much more expensive than S3 & CloudFront when it comes to request
pricing.

Cloud Storage counts retrieving an object through its HTTP URL as a class B
XML request, priced at $0.01/10,000 ops. That is 1.3x and 2.5x more expensive
than CloudFront and S3, respectively. It is also not well documented and
caused me some trouble; I didn't expect HTTP GET requests to be counted as
XML API requests.

~~~
skystorm
What did you expect GET requests to be counted as, then? As an aside, I think
it's fairly well documented:
[https://cloud.google.com/storage/pricing#operations-pricing](https://cloud.google.com/storage/pricing#operations-pricing)
(lists "GET Object" under class B) [but disclosure: I work for Google]

~~~
boundlessdreamz
It did not occur to me that an HTTP GET would be considered an XML API
operation, so I didn't expect HTTP GET requests to be charged, because
everywhere I read (including the page you listed) the pricing is for XML API
operations.

That, plus it being more expensive than CloudFront and S3, came as a rude
shock :(

In general I find Google Cloud's documentation and service much better and
more pleasant to work with than AWS's, but not in this case. Both S3 and
CloudFront have much clearer pricing (and positioning), and in network
pricing both are significantly cheaper.

~~~
vgt
Per my parent comment, comparing AWS networking functionality to Google's is
apples-to-oranges.

------
Smrchy
Very nice. We've been looking into this but the "beta" status kept us away.

So the bigger news here is Google Nearline Storage graduating to general
availability.

------
gabeio
I am absolutely blown away by the 100 PB for every customer... It means that
altogether they must have somewhere in the exabyte to zettabyte range, or
they are completely betting on the fact that no one will be able to, or
actually be willing to, fill this/their cloud. I guess I am just not used to
the fact that on average most people have at least a 1 TB hard drive, or hard
drives that add up to 1 TB...

~~~
dragonwriter
> I am absolutely blown away with the 100 PB for every customer...

It's 100PB free for _1 month_, if you commit to transfer in at least 1PB in
the first three months, and commit to keep at least 1PB for 12 months. (It's
"up to 6 months" free of 100PB, but more than 1 month free requires a
commitment of more than 1PB.) At $0.01/GB/mo, 1PB of data for 11 months (12
months minus 1 month free) is a $110k commitment.

Its not "100PB for every customer", at least not free, its a variable size
discount that covers up to 100PB for 1-6 months of a 12 month commitment
period for customers able to make up-front commitments of $110k or more.
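
(The $110k figure follows from the $0.01/GB/mo rate with one free month, using decimal units, 1 PB = 1,000,000 GB:)

```python
# Minimum spend implied by the offer: 1 PB held for 12 months, 1 month free.
gb_per_pb = 1_000_000   # decimal petabyte
rate = 0.01             # $/GB/month
paid_months = 12 - 1    # 12-month commitment minus the free month
commitment = gb_per_pb * rate * paid_months
print(f"${commitment:,.0f}")  # → $110,000
```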

------
carton
I am a bit puzzled by the lack of durability/data loss statistics. In
comparison, Amazon Glacier states that it "is designed to provide average
annual durability of 99.999999999% for an archive."

------
mrcactu5
What happens after 6 months? And who needs 0.1 EB anyway?

------
willcodeforfoo
Any tips for getting a fairly significant amount of data (50TB) from a local
NAS to Google quicker than a 5 Mbps upstream will allow?

Amazon has Import/Export which lets you ship drives, is this the best option?

~~~
extra88
Google has "Offline Disk Import" [0] which is an "alpha" release at right now

[0] [https://cloud.google.com/storage/docs/early-access](https://cloud.google.com/storage/docs/early-access)

------
fridek
It looks like an amount of space that's hard to fill with meaningful data. I
wonder if they would prevent people from filling it with random junk just for
fun.

~~~
3pt14159
Only hard to fill if you're dealing with text only. Say you're building a
video copyright detection crawler and you want to store videos from all over
the internet.

~~~
mprovost
An HD video stream from Netflix uses about 3GB per hour. So 100PB gives you
about 4000 years of content.
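
(Checking the estimate, assuming 3 GB/hour streams and binary petabytes:)

```python
# How many years of 3 GB/hour HD video fit in 100 PB?
total_gb = 100 * 1024 ** 2   # 100 PB in GB (binary): 104,857,600
hours = total_gb / 3         # ~35 million hours of video
years = hours / (24 * 365)
print(round(years))          # → 3990, i.e. roughly 4000 years
```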

~~~
3pt14159
In 2012 alone, Google stored 76 PB of content. With higher-quality feeds and
multiple sites you can quickly start getting beyond the limit.

------
s800
Why mention an SLA if it's 99%?

To me, "SLA" is an enterprise oriented term, which a lot of folks (us) don't
care about because we're scaling horizontally and architect with failure in
mind.

99% for a typical non-scaling enterprise app is crap.

~~~
dragonwriter
> Why mention an SLA if it's 99%?

Because at least part of the target market for this product cares about
quantifying the guarantee, whatever it is.

> To me, "SLA" is an enterprise oriented term,

I think that's excessively narrow in terms of who cares about it, but Nearline
is clearly in large part an enterprise-targeted offering, so even if it were a
purely enterprise-oriented term, it's appropriate.

> which a lot of folks (us) don't care about because we're scaling
> horizontally and architect with failure in mind.

Knowing expected failure characteristics can be an important input to
intelligently architecting with failure in mind.

> 99% for a typical non-scaling enterprise app is crap.

Nearline isn't an app; it's one of a set of closely related storage offerings
that, by design, would usually be used in coordination with each other, and
possibly with other storage systems, by an app. For its role in that stack,
99% doesn't seem immediately unreasonable to me.

