
Archiving Amazon S3 Data to Amazon Glacier - jeffbarr
http://aws.typepad.com/aws/2012/11/archive-s3-to-glacier.html
======
ck2
Overly complex but workable, which seems like a lot of how AWS operates.

I guess the "prefix" could be a pseudo-folder?

 _Although the objects are archived in Glacier, you can't get to them via the
Glacier APIs. Objects stored directly in Amazon Glacier using the Amazon
Glacier API cannot be listed in real-time, and have a system-generated
identifier rather than a user-defined name._

So, per usual, we have no clue how many actual files are in an AWS service,
the true directory structure, or the total size (without a complex/expensive
transaction).

Since there is no inbound bandwidth charge, you might as well upload directly
to Glacier via the API for more control.
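
As a rough sketch of what that direct route looks like (assuming boto3's
Glacier client, which is newer than this thread, plus made-up vault and file
names):

    # Sketch: upload a backup straight into a Glacier vault instead of S3.
    # The vault name and file path are hypothetical.
    import boto3

    glacier = boto3.client("glacier")

    with open("/backup/1/daily.tar.gz", "rb") as f:
        resp = glacier.upload_archive(
            vaultName="backups",
            archiveDescription="daily backup 1",
            body=f,
        )

    # Glacier hands back a system-generated archive ID; you have to track it
    # yourself, since the vault has no real-time listing.
    print(resp["archiveId"])

The trade-off is the one quoted above: you get control over what goes in, but
you take on the bookkeeping of archive IDs.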

~~~
jeffbarr
Think of Glacier as another storage class for S3 when used in this way. We
have regular, reduced redundancy, and now Glacier.

~~~
ck2
That actually should have been their entire API extension to integrate S3 with
Glacier.

Change the storage class, via the AWS console or API, from

Storage Class: Standard <-> Reduced Redundancy <-> _Glacier_

When toggled to Glacier, it archives the object; when toggled back to Standard
or Reduced Redundancy, it moves it back to S3.

Would have been beautifully simple and practical.
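
For reference, S3 already exposes a storage-class switch through its copy API
for the existing classes; a minimal sketch (boto3 and the names are my own
assumptions, and GLACIER was not an accepted value for this call at the time,
which is exactly the gap):

    # Sketch: "toggling" storage class today means copying an object onto
    # itself with a different storage class. Bucket and key are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    s3.copy_object(
        Bucket="my-bucket",
        Key="backup/1/daily.tar.gz",
        CopySource={"Bucket": "my-bucket", "Key": "backup/1/daily.tar.gz"},
        StorageClass="REDUCED_REDUNDANCY",  # or "STANDARD"
        MetadataDirective="COPY",
    )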

------
pshc
This is great news for my photo archives. I dump the raw images into S3, and
run a script to generate thumbnails and html indexes (in reduced redundancy).
Now I can have the raws flush to Glacier after a month or so, once I've
probably lost interest in them for the time being.

Now I just need to figure out an easy way to go from a static HTML index to a
RESTORE call...
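
One way that could look, as a hedged sketch (boto3's restore_object against
whatever keys the index already knows about; the bucket name, key list, and
7-day window are placeholders):

    # Sketch: issue a RESTORE for each archived raw referenced by the index.
    import boto3

    s3 = boto3.client("s3")
    keys_from_index = ["photos/2012/raw/IMG_0001.CR2"]  # placeholder list

    for key in keys_from_index:
        s3.restore_object(
            Bucket="my-photo-bucket",
            Key=key,
            RestoreRequest={"Days": 7},  # how long the restored copy stays in S3
        )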

------
ck2
After looking at this directly in the control panel, the automation rules are
just not going to be good enough.

Let's say you have a three-day rotation of daily backups, 1, 2, and 3, into S3.

If you wanted to use these controls to archive a weekly backup on Sunday
and/or a monthly backup on the 1st directly into Glacier, there is no way to
tell which of the rotations is the newest.

So it has to be done directly via the Glacier API anyway.

Unless I am missing something or my backup logic could be improved?

What would have actually been useful is a trigger command that could be sent
via the S3 API itself, i.e. copy|move|restore /backup/1 to Glacier.

Anyway, I will just use free inbound on Glacier instead. It's a shame, because
it uses up the sending server's bandwidth to duplicate the exact process it
just completed for S3.

Added: unless maybe a one-time lifecycle rule can be created via the S3 API to
tell it to copy the daily archive it just uploaded into Glacier (since the
sending server knows right then which directory it just used) - that might
work.
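
A rough sketch of that added idea, assuming boto3 (newer than this thread) and
made-up bucket/prefix names, with a zero-day transition so the rule fires as
soon as possible:

    # Sketch: right after uploading /backup/1, install a lifecycle rule that
    # transitions that prefix to Glacier. Names are hypothetical.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="my-backup-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-backup-1",
                    "Filter": {"Prefix": "backup/1/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )

One caveat: this call replaces the bucket's whole lifecycle configuration, so
any existing rules would have to be re-sent along with the new one, and the
rule keeps applying to the prefix rather than running once.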

------
PanMan
It seems strange to me that they do have a "Move files matching Y to Glacier
at moment X" interface, but not a "Move this to Glacier now" interface. Or am
I missing something? I guess you could do the second with the first, but it
seems more complex.

------
josephlord
I can't see anything to allow throttling the rate of Glacier->S3 transfer to
manage retrieval costs. I can only see an option to control how long the
restored object is kept in S3.

Have I missed something or is that not possible?

------
josteink
This of course comes two days after I manually migrated much of my S3 data to
Glacier.

As ck2 says, it seems a bit complex, but it also seems to offer other benefits
(like a real-time index).

I guess I'll have to reconsider my options, again.

------
pieter
This is pretty cool. Anyone know what the retrieval pricing is? With Glacier
you can limit the amount you pay by retrieving really slowly, but I'm not sure
that works with the S3 frontend.

~~~
rlpb
They say in the article that the pricing is the same as for Glacier.

> With Glacier you can limit the amount you pay by retrieving really slowly

This will only work if you have small enough Glacier archives and stagger the
retrieval requests. The pricing is based on the speed at which they retrieve
from Glacier internally during the job, not the speed at which you download
after the job is complete. I presume the same will apply to S3.
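
For what it's worth, staggering on the native Glacier side might look
something like this (a sketch only; boto3, the vault name, archive IDs, and
the pacing interval are all assumptions, not a guaranteed cost cap):

    # Sketch: spread archive-retrieval jobs out over time to keep the peak
    # retrieval rate low. Vault name and archive IDs are placeholders.
    import time
    import boto3

    glacier = boto3.client("glacier")
    archive_ids = ["archive-id-1", "archive-id-2"]  # placeholders

    for archive_id in archive_ids:
        glacier.initiate_job(
            vaultName="backups",
            jobParameters={"Type": "archive-retrieval", "ArchiveId": archive_id},
        )
        time.sleep(4 * 3600)  # pause a few hours between requests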

