
Amazon S3 and Glacier Price Reductions - jeffbarr
https://aws.amazon.com/blogs/aws/aws-storage-update-s3-glacier-price-reductions/
======
DanBlake
Would really like to see some massive reductions in the operation costs and
most importantly, bandwidth costs.

The bandwidth costs are so far out of line with what the network transfer
actually costs, it just feels like price fixing between the major cloud
players that nobody is drastically reducing those prices, only storage prices.

Charging 5 cents per gigabyte (at their maximum published discount level) is
equivalent to paying $16,000 per month for a 1 gigabit line. This does not
count any operation costs either, which could add thousands in cost as well,
depending on how you are using S3.
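
For a quick sanity check on that figure (my arithmetic, assuming a fully
saturated line, a 30-day month, and the $0.05/GB rate):

    # What a saturated 1 Gbit/s line transfers in a month, priced at $0.05/GB.
    gb_per_second = 1 / 8.0                  # 1 gigabit/s = 0.125 gigabytes/s
    seconds_per_month = 30 * 24 * 3600       # 2,592,000 s
    gb_per_month = gb_per_second * seconds_per_month  # 324,000 GB
    print(gb_per_month * 0.05)               # ~$16,200/month, before request fees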

There are several providers that offer an unmetered 1Gbps line PLUS a dedicated
server for ~$600-750/mo. Providers like OVH offer the bandwidth for as little
as $100/month
([https://www.ovh.com/us/dedicated-servers/bandwidth-upgrade.xml](https://www.ovh.com/us/dedicated-servers/bandwidth-upgrade.xml)).
I am just not sure how Amazon can justify a 160x price increase over OVH or a
30x increase over dedicated server + transfer.

For the time being, the best bet is to use S3 for your storage and then put a
heavily caching non-Amazon CDN on top of it (like Cloudflare) to avoid the
ridiculous bandwidth costs.

~~~
bkruse
That's exactly why I started Ploid.

A consulting customer came to me a year ago with growth from 200TB/year in
data production to over 6PB/year, and their budget couldn't sustain that jump
(or anywhere close to it).

Having come from the mass-facilities and data center space with MagicJack, I
knew the wholesale costs of bandwidth, power, and drives were continuously
falling.

There are certain clients and use cases that need access to their data all of
the time and whose very bones are built on collaboration (genomics).

For example, this client is now storing 6PB of data with us, 3 copies in
separate data centers. We are half the price of S3, and we include all the
bandwidth for free, limited to 10GigE per PB stored. This has worked out
extremely well - we were about 20% (!!!!) the price of Amazon after you factor
in bandwidth.

There were lots of challenges we faced, like overzealous neighbors in the
environment, storing lots of small objects, and heavy usage of ancillary
features like metadata - but those come up for customers of any size. By
putting the "tax" on bandwidth, a lot of these business cases are solved. I
see why Amazon does that.

AWS is truly great, but as you get into very high scale (specifically in
storage - 2PB+), it becomes extremely cost-prohibitive.

~~~
jorangreef
"By putting the "tax" on bandwidth, a lot of these business cases are solved.
I see why Amazon does that."

However, S3 has the same egress pricing as EC2. Do you think it's really a
"business case tax" they're applying across all services?

~~~
dx034
It makes a lot of sense to be able to run loss-making products. Otherwise
everyone would use S3 together with Google Compute Engine and Azure databases
(let's assume those were the cheapest). In that scenario all providers would
lose out.

In the current world, they can keep prices for some products below costs but
make their money with bandwidth and the other services people are forced to
use to avoid egress traffic.

~~~
jorangreef
"In the current world, they can keep prices for some products below costs but
make their money with bandwidth and the other services people are forced to
use to avoid egress traffic."

Which AWS products are loss leaders?

S3 storage pricing is not exactly cheap. Neither is EC2 instance pricing.

"Otherwise everyone would use S3 together with Google compute engine and Azure
databases (let's assume they'd be cheapest). In this scenario all providers
would lose out."

No, S3 would do well, GCE would do well, Azure would do well. Providers only
lose out to the extent their products no longer compete on merit alone.

~~~
dx034
I can imagine that this is a good reason. Otherwise they could make bandwidth
cheaper so that people who cannot move everything can at least move part of
their applications.

I think the three providers are smart enough to know why they charge that much
for bandwidth. And this is the only reason I could think of why all 3 of them
charge that much. And I'm pretty sure that some products run at a loss, they
do for nearly every company. But AWS won't tell us which ones.

It's reasonable to think that S3 is loss making or about breakeven on its own
but recoups costs due to bandwidth charges.

------
cpkpad
Well, the costs are nicer, but mostly, Glacier goes from an unusable pricing
model to a usable one. I was terrified to use Glacier. Under the previous
model, if you made requests too rapidly, you might be hit with thousands of
dollars in bills for relatively small data retrievals -- it was very easy to
write a very expensive bug.

For a long time I had wanted Amazon to wrap it in something where they managed
that complexity. Looks like they finally did.

Now the only thing Amazon needs to do is expand free tiers to all of their
services, or at least offer very low-cost ones. I prototype a lot of things
from home for work -- kinda 20%-time-style projects where I couldn't really
budget resources for them. The free tier is great for that. All services ought
to have it -- especially RDS. I ought to be able to have a slice of a database
(even kilobytes/tens of accesses/not-guaranteed security/shared server) paying
nothing or pennies.

~~~
Johnny555
A t2.large costs 10 cents/hour and a t2.medium RDS instance costs 7
cents/hour. If you put in 50 hours/month on this side project, that's $8.50
for the instances plus maybe $3 for ~30GB of storage.

$11.50/month doesn't sound too hard to budget for.

~~~
andrioni
It might be more about bureaucracy than about cost. At my last job, even small
expenses required printing forms and getting the CFO to sign them, so in the
end it was pretty much not worth it for small tests.

~~~
brianwawok
Your work can't give you a few VMs in a lab somewhere that you get free rein
to prototype on? That is usually not too hard to get.

It seems the idea of the Amazon free tier is to give people a taste of AWS so
they can decide whether to go in deeper. It's not really designed to be a free
prototyping environment for existing large customers. Like the other poster
said, you can host a tiny VM for $6 or so a month - not a big expense.

If you are asking for a $4,000/month production cluster, yes, that is harder
to just get.

------
Alex3917
While I'm not going to complain about a price reduction, I'd honestly be more
excited if S3 implemented support for additional headers and redirect rules.
Right now, anyone hosting a single-page app (e.g. Angular/React) behind S3 and
CloudFront is going to get an F on securityheaders.io.

And even worse, there is no way to prerender an SPA site for search engines
without standing up an nginx proxy on EC2, which eliminates almost all of the
benefits of CloudFront. This is because right now S3 can only redirect based
on a key prefix or error code, not based on a user agent like Googlebot.

This means that even if you can technically drop a <meta name="fragment"
content="!"> tag in your front end and then have S3 redirect on the key prefix
'?_escaped_fragment_=', that will be a 301 redirect. As a result, Google will
ignore any <link rel="canonical" href="..."> tag on the prerendered page and
will instead index [https://api.yoursite.com](https://api.yoursite.com) or
wherever your prerendered content is being hosted, rather than your actual
site.

Not only is it a bunch of extra work to stand up an nginx proxy as a
workaround, but it's also a whole extra set of security concerns, scaling
concerns, etc. Not a good situation.

edit: For more info on the prerendering issues, cf.:

[https://github.com/prerender/prerender/issues/93](https://github.com/prerender/prerender/issues/93)

[https://gist.github.com/thoop/8165802](https://gist.github.com/thoop/8165802)

~~~
jeffbarr
Can you share your needs with me so I can pass them along to the S3 team?

~~~
Alex3917
Thanks Jeff, I just sent you an email:

[https://www.fwdeveryone.com/t/QOU4DQDbS8e4tddfI22J0w/s3-prerendering-issue-hacker-news](https://www.fwdeveryone.com/t/QOU4DQDbS8e4tddfI22J0w/s3-prerendering-issue-hacker-news)

Of course this won't yet show up in Google until we get that nginx proxy stood
up or this feature gets implemented. :-)

~~~
scrollaway
OT: That fwd:everyone service is really cool. Damn :)

~~~
Alex3917
Thanks! Consider making an account - we're re-launching the site in the next
couple of weeks, so there should be a lot of good content up there shortly.

~~~
0xmohit
Looks really good. Will sign up shortly.

------
Perceptes
Is anyone using either S3 or Glacier to store encrypted backups of their
personal computer(s)? I've only used Time Machine to back up my machine for a
long time, but I don't really trust it and would like to have another back up
in the cloud. Any tools that automate back up and restore to/from S3/Glacier?
What are your experiences?

~~~
icecreammatt
I use Arq ([https://www.arqbackup.com/](https://www.arqbackup.com/)) and it
works very well. I've only tested retrieving small amounts of data from it so
I can't comment much on a large bill. I only wish it worked on Linux. I've
been thinking about seeing if it would work with Wine.

~~~
chrislund
Ditto. I've been using Arq for 4 years now to back up nearly a TB of data to
S3 and Google Cloud Storage.

~~~
CoachRufus87
Out of curiosity, why both?

~~~
chrislund
I guess mostly to not have all my cloud backup eggs in one basket.

------
physcab
This is a really dumb question, but since I've never used Glacier, what does
the workflow for a Glacier application look like? I'm used to the world of
immediate access needs and fast API responses, so I can't imagine sending off
a request to an API and getting back "Your data will be ready in 1-5 hours,
come back later".

~~~
scrollaway
Glacier is not for data you want readily available. It's for when you care
more about storage than access.

~~~
pauloday
Right, but if you want to retrieve some data through their API, how does it
work? Normally you open the connection, ask for the data, then receive it and
close the connection - does that change if there's a 5+ hour wait between the
ask and the receive? Do you just leave the connection open? Provide them with
a webhook to call when it's ready? I don't personally care about the answer,
but I'm pretty sure that's what they were asking.

~~~
dgemm
Well, the CLI gives you a job ID that you can use to check on the status and
retrieve the data when it's ready. You can also ask to be notified via SNS.

[http://docs.aws.amazon.com/cli/latest/reference/glacier/initiate-job.html](http://docs.aws.amazon.com/cli/latest/reference/glacier/initiate-job.html)
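
A minimal sketch of that flow with boto3 (vault name and archive ID are
placeholders; the Tier field selects among the new Standard/Expedited/Bulk
retrieval options):

    import time
    import boto3

    glacier = boto3.client("glacier")

    # Kick off an archive retrieval; this returns immediately with a job ID.
    job = glacier.initiate_job(
        vaultName="my-vault",                   # placeholder vault name
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": "EXAMPLE_ARCHIVE_ID",  # placeholder archive ID
            "Tier": "Standard",                 # or "Expedited" / "Bulk"
        },
    )
    job_id = job["jobId"]

    # Poll until Glacier has staged the data (hours, for the Standard tier).
    while not glacier.describe_job(vaultName="my-vault", jobId=job_id)["Completed"]:
        time.sleep(15 * 60)

    # Only now are the bytes actually downloadable.
    output = glacier.get_job_output(vaultName="my-vault", jobId=job_id)
    data = output["body"].read()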

------
ww520
Is the outgoing bandwidth still the same price? Bandwidth cost is kind of high
compared to other services.

~~~
dx034
Yes, bandwidth prices are high for all 3 major cloud providers. I guess it's
to keep you from using only the cheapest part of each provider. That way you
have to use their whole ecosystem, and they can offer some products below cost
to lure people in.

------
woah
What is the mechanism that makes it cheaper to take longer getting data out?
Is it that they save money on a lower-throughput interface to the storage? Or
is it simply market segmentation?

~~~
wmf
In theory, tape [1], optical [2], or spun-down disk [3] are cheaper but slower
than spinning disk. Erasure coding [4] is also cheaper but slower than
replication. One could even imagine putting cold data on the outer tracks of
hard disks and warm on the inner tracks. In practice I suspect Glacier is
market segmentation.

[1] [http://highscalability.com/blog/2014/2/3/how-google-backs-up-the-internet-along-with-exabytes-of-othe.html](http://highscalability.com/blog/2014/2/3/how-google-backs-up-the-internet-along-with-exabytes-of-othe.html)

[2] [http://www.everspan.com/home](http://www.everspan.com/home)

[3] [https://www.microsoft.com/en-us/research/publication/pelican-a-building-block-for-exascale-cold-data-storage/](https://www.microsoft.com/en-us/research/publication/pelican-a-building-block-for-exascale-cold-data-storage/)

[4] [https://code.facebook.com/posts/1433093613662262/-under-the-hood-facebook-s-cold-storage-system-/](https://code.facebook.com/posts/1433093613662262/-under-the-hood-facebook-s-cold-storage-system-/)
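
For a rough sense of the erasure-coding point (my illustration, using a
generic 10+4 Reed-Solomon layout, not anything published about Glacier):

    # Raw bytes needed to durably store 1 PB of logical data.
    replication_3x = 1.0 * 3             # 3.0 PB raw for three full copies
    erasure_10_plus_4 = 1.0 * 14 / 10    # 1.4 PB raw for 10 data + 4 parity shards
    print(replication_3x / erasure_10_plus_4)  # ~2.1x less raw storage, similar durability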

~~~
ojiikun
it has more to do with clever uses of erasure coding than you might think.

------
QUFB
I currently use S3 Infrequent Access buckets for some personal projects.
These Glacier price reductions, along with the much better retrieval model,
look really great.

However using Glacier as a simple store from the command-line seems horribly
convoluted:

[https://docs.aws.amazon.com/cli/latest/userguide/cli-using-glacier.html](https://docs.aws.amazon.com/cli/latest/userguide/cli-using-glacier.html)

Does anyone know of any good tooling around Glacier for the command line?

~~~
res0nat0r
If you don't want to mess with all of that, you could use the standard "aws
s3" command to upload your files to your S3 bucket like normal, then apply a
lifecycle archive policy to your bucket or archive/ prefix or whatnot, and it
will automatically transition your files to Glacier for you.
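
A minimal sketch of such a policy with boto3 (bucket name and prefix are
placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Transition everything under archive/ to Glacier 30 days after creation.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",                  # placeholder bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-to-glacier",
                    "Filter": {"Prefix": "archive/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 30, "StorageClass": "GLACIER"},
                    ],
                }
            ]
        },
    )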

~~~
phonon
Yup, works great.

------
codedeadlock
Has anyone tried to migrate to Backblaze? Their pricing seems really
aggressive, but I am not sure if we can compare Amazon and Backblaze when it
comes to reliability.

[https://www.backblaze.com/b2/cloud-storage-providers.html](https://www.backblaze.com/b2/cloud-storage-providers.html)

~~~
boulos
I love the folks at Backblaze but the single-datacenter thing really worries
me (and again, disclosure, I work on Google Cloud). If you're just using it as
another backup, maybe that's less of a concern: your house would have to burn
down at the same time that they have a catastrophic failure. But it is part of
the reason you see S3 and GCS costing more (plus the massive amount of I/O you
can do to either S3 or GCS; I'd be curious what happens when there's a run on
the Backblaze DC).

Again, huge disclosure: I work on Google Cloud.

~~~
jorangreef
"But it is part of the reason you see S3 and GCS costing more"

I bet that when Backblaze increase their scale by adding data centers they
will decrease prices, not increase them.

According to Ford's laws of service: Price, volume (scale) and quality are
never opposed.

1. Decrease price and you can increase volume.

2. Increase volume and you can increase quality.

3. Increase quality and you can increase volume.

4. Increase volume and you can decrease price.

~~~
boulos
Sorry if I wasn't clear: your bytes on GCS and S3 are stored across multiple
buildings (GCS Regional, S3 Standard). More copies is more dollars not less
;).

~~~
jorangreef
"More copies is more dollars not less ;)"

As far as I am aware GCS does erasure coding across sites?

Backblaze could do multiple tiers of erasure coding and they would still be
able to reduce prices given more scale, ceteris paribus.

It's not a question of number of replicas, data centers or technical
implementation, but a question of pricing policy.

Does one want to use volume and scale to drive prices down (and cheaper prices
to increase volume) or does one want to use volume and scale to bloat margins?
Backblaze are arguably doing the former.

Does one want to lock customers into an ecosystem by enforcing excessive
bandwidth prices or does one want to pass on bandwidth cost-savings to
customers? Backblaze are arguably doing the latter.

Backblaze would continue to be cheaper because their pricing policy serves
customers across all dimensions.

More scale is definitely less dollars not more (even if it means a fraction of
a few more erasure coded shards across sites).

Disclosure: I do not work for Backblaze.

------
scrollaway
Anyone else finding their S3 bill consists mostly of PUT/COPY/POST/LIST
requests? Our service has a ton of data going in and very little going out,
yet we're sitting with 95% of the bill being P/C/P/L requests and only the
remaining 5% being storage.

Either way, good news on the storage price reductions :)

~~~
cyberferret
What app/site are you using to upload to S3? I use a combination of CloudBerry
Backup and Arq Backup on my Macs/PCs here and the request counts aren't that
high (on average about 30GB of data per machine in around 300K files).

I am guessing it comes down to the algorithm used to compare and
upload/download files. I believe the two solutions above use a separate
'index' file on S3 to track file comparisons.

~~~
scrollaway
It's more that we have a pretty high-throughput system, using Lambda.

Users authenticate with an API gateway endpoint, we do a PUT to store a
descriptor file, send a presigned PUT URL back so they can upload their file,
we then process the file and do a COPY+DELETE to move it out of the "not yet
processed" stage and finally do another PUT to upload the resulting processed
file.
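
A stripped-down sketch of that flow with boto3 (bucket and key names are made
up; note the "move" is really a COPY plus a DELETE, so it bills as two
requests per file):

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-upload-bucket"  # placeholder

    # 1. Hand the authenticated user a presigned PUT URL for their upload.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": BUCKET, "Key": "incoming/user123/file.bin"},
        ExpiresIn=3600,
    )

    # 2. After processing, "move" the object out of the incoming area.
    #    S3 has no rename, so this is a COPY followed by a DELETE.
    s3.copy_object(
        Bucket=BUCKET,
        Key="processed/user123/file.bin",
        CopySource={"Bucket": BUCKET, "Key": "incoming/user123/file.bin"},
    )
    s3.delete_object(Bucket=BUCKET, Key="incoming/user123/file.bin")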

Despite a lot of data, the storage bill is barely scratching $40, but we're at
almost $700/mo on API calls.
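
For rough scale (my arithmetic, assuming the published us-east-1 rate of about
$0.005 per 1,000 PUT/COPY/POST/LIST requests):

    # How many request-class operations does a $700/month bill imply?
    requests = 700 / 0.005 * 1000        # 140,000,000 requests/month
    print(requests / (30 * 24 * 3600))   # ~54 requests/second sustained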

~~~
ranman
Heyo, sounds not quite right if you wanna shoot me an email
randhunt@amazon.com I'd be happy to try to figure it out. Your API calls
shouldn't be that much more than the storage cost without a really strange
access behavior. I don't know the answer off the top of my head but I'm down
to try to find it out. GET's cost money and outbound bandwidth cost money but
PUTS/POSTS should be neglible.

~~~
scrollaway
Thanks! I'll shoot an email.

Edit: Sent!

------
jakozaur
Great discount. I'm only surprised that Infrequent Access doesn't get any
discount.

By the way, I wrote an article on how to reduce S3 costs:
[https://www.sumologic.com/aws/s3/s3-cost-optimization/](https://www.sumologic.com/aws/s3/s3-cost-optimization/)

------
MrBuddyCasino
Do we know now how Glacier actually works? Tape robots, spun-down disks, racks
of optical media?

Best source I could find was:
[https://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/](https://storagemojo.com/2014/04/25/amazons-glacier-secret-bdxl/)

------
deafcalculus
Any chance Google will match this price for their Coldline storage? I was
planning to archive a few TBs in Coldline, but Glacier is now cheaper and has
a sane retrieval pricing model.

------
msravi
> For example, retrieving 500 archives that are 1 GB each would cost 500GB x
> $0.01 + 500 x $0.05/1,000 = $5.25

Shouldn't that be $5.025? Or did I misunderstand?
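
Spelling out the quoted formula step by step (my arithmetic):

    data_cost = 500 * 0.01            # 500 GB at $0.01/GB          = $5.00
    request_cost = 500 / 1000 * 0.05  # 500 requests at $0.05/1,000 = $0.025
    print(data_cost + request_cost)   # 5.025 -> $5.025, not $5.25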

------
nakodari
In our startup, the biggest cost is bandwidth. We live in an age where videos
can be created and streamed in seconds to millions of people. With bandwidth
costs this high, it's very difficult for bootstrapped startups to grow as
quickly as those who raise VC funding. I hope AWS can reduce the outgoing
bandwidth cost by 50%.

------
jaytaylor
_EDIT:_ My mistake, this is the new S3 pricing! NOT Glacier pricing! Thank you
res0nat0r.

Am I understanding this right? $0.023/GB/month for Glacier, so * 12
months/year = $0.276/GB/year, which means:

    
    
        10GB  = $2.76/year
        100GB = $27.60/year
        1TB   = $276.00/year
        ...
    

And this is only the _storage_ cost. This doesn't take into account the cost
should you actually decide to retrieve the data.

So considering a 1TB hard drive [0] costs $50.00, how is this cost effective?
I can buy 5x1TB hard drives for the price of 1TB on Glacier.

I understand there is overhead to managing it yourself. So, is this just not
targeted to technically proficient folks?

[0] [https://www.amazon.com/Blue-Cache-Desktop-Drive-WD10EZEX/dp/B0088PUEPK](https://www.amazon.com/Blue-Cache-Desktop-Drive-WD10EZEX/dp/B0088PUEPK)

~~~
derekdahmer
That's comparing apples and oranges - your hard drive doesn't live in a
secure, fireproof data center in the cloud you can access from anywhere.

~~~
jeffbarr
Or eggs and omelettes...

------
lucb1e
If costs matter to you, e.g. for home backups, don't buy Glacier (and _heck_,
don't buy S3). A 3TB drive costs about €110, so even if you had to buy a new
one every year (you don't), that'd cost 110 / 3000GB / 12 months ≈ 0.31 euro
cents per gigabyte per month. Glacier? 7 times more expensive at 2.3ct.

Hardware is usually not a business' main cost, but it does matter for home
users, small businesses, or startups that haven't been funded yet, some of
whom might consider Tarsnap or some other online storage solution built on
Glacier at best and S3 at worst. You could suddenly be 7× better off doing the
upkeep yourself (read: buy a Raspberry Pi), and that's even assuming you throw
away your drives after one year.

~~~
x0x0
There is value to having off-premises replicated storage on something more
durable than home-user-targeted drives.

Google Cloud Nearline costs $0.12 per gigabyte-year, with prices that will
continue to fall. For a typical 500GB hard drive that saw perhaps 700GB of
unique data, that's $84/year to have an outside-the-house replicated backup
using something like Arq.

------
questionr
How does this compare to Google's Coldline storage?

~~~
boulos
The biggest difference is that Glacier is still a "suspend/resume" type of
access. However, if you just want to compare pricing, it'll depend on your
access pattern and object sizes.

Retrieval in all Google Cloud Storage classes is instant; for Coldline it's
$.05/GB (and for Nearline $.01/GB). If you value that instant access, it seems
the closest you'd get with the updates to Glacier is via Expedited retrieval
($.03/GB and $.01/"request", which is per "Archive" in Glacier). Then you have
to decide how much throughput you want to guarantee at $100/month for each
150MB/s. (Since this was just announced, it's naturally unclear what kind of
best-effort throughput you get without the provisioned capacity.)

If you're never going to touch the bytes, and each Archive is big enough to
make the 40 KB of metadata negligible then the new $.004/GB/month is a nice
win over Coldline's $.007. Somewhere in between and one of the bulk/batch
retrieval methods might be a better fit for you.
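
To make the storage-only comparison concrete (using just the per-GB rates
quoted above, ignoring retrieval, request, and metadata charges):

    # Cost to park 100 TB for a year, storage charges only.
    gb = 100 * 1000
    glacier_per_year = gb * 0.004 * 12   # $4,800 at $0.004/GB/month
    coldline_per_year = gb * 0.007 * 12  # $8,400 at $0.007/GB/month
    print(glacier_per_year, coldline_per_year)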

But again, it's still a bit of a challenge to go in and out of Glacier, while
Coldline (and Nearline and Standard) storage in GCS is a single, uniform API.
That's worth a lot to me and to our customers. But if Glacier is a good fit
for a problem you have, and you're talking about enough money to make the pain
worth it, you should seriously consider it.

Disclosure: I work on Google Cloud, so naturally I'd want you to use GCS ;).

------
thijsvandien
With S3 Standard essentially getting S3 Standard - Infrequent Access storage
pricing, where does that leave the latter?

~~~
larrymcp
Hey, did you mean Reduced Redundancy (not Infrequent Access)?

I just noticed that the new pricing for Standard (2.3¢) is now _less_ than the
pricing for Reduced Redundancy (2.4¢)! So there appears to be no reason to use
Reduced Redundancy anymore.

~~~
thijsvandien
You're right, I mixed up after spending too much time on the different pricing
pages yesterday. :)

------
adorable
Any EBS storage price reductions? Those are pretty high at this stage.

------
user5994461
That's both a good and a terrible change.

- The price reduction on S3 is good! Kudos, AWS.

- The price change on Glacier is a fucking disaster. They replaced the
_single_ expensive Glacier fee with a choice among 3 user-selectable fee
models (Standard, Expedited, Bulk). It's an absolute nightmare added on top of
the current nightmare (e.g. try to understand the disk specifications &
pricing - it takes months of learning).

I cannot follow the changes; they're too complicated. I cannot train my devs
to understand Glacier either; it's too much of a mess.

AWS if you read this: Please make your offers and your pricing simpler, NEVER
more complicated.

(Even a single pricing option would be significantly better than that, even
if it's more expensive.)

~~~
Lazare
I believe this _HUGELY_ simplifies Glacier pricing.

I don't think you understand just how insanely, incredibly, bizarrely
complicated the old Glacier pricing was.

~~~
user5994461
There are 3 pricing models for S3, with the 3rd one (Glacier) having 3
sub-pricing models to be chosen at request time.

I don't think you realize how insanely complex the entire S3 pricing model is
once you get out of the "standard price".

Maybe I just have too much empathy for my poor devs and ops who try to
understand how much what they're doing is going to cost. It's only one full
page of text, both sides, after all.

~~~
saranagati
Is that really that much different from hard drives? No hard drive
manufacturer uses the same standards to determine how much space there will
actually be on the disk. You can get hard drives that spin at many different
RPMs, hard drives with many different connector types, and drives with
different numbers of platters. An 8TB, 7200 RPM, SATA, Western Digital drive
is not going to have the same seek time as a 1TB, 7200 RPM, SATA, Western
Digital drive.

There are so many combinations of hard drives that will result in different
performance for different situations all with different costs. Then you start
talking about cold storage as well and you've moved into other media formats.

Just because there is a page's worth of pricing model doesn't mean AWS or any
cloud provider is doing anything incorrectly. You're paying for on-demand X,
and engineers who are going to utilize it should understand it as well as they
would understand how to build an appropriate storage solution of their own. On
demand just means they don't have to take the time to design, implement, and
operate it themselves.

