
Amazon S3 now allows multi-object delete - andrevoget
http://aws.typepad.com/aws/2011/12/amazon-s3-multi-object-deletion.html
======
saurik
Ok, I can't help but wonder... ;P

So, I use S3. A lot. Only, I don't store many bytes: I just store lots of
objects... enough that I have in the past estimated that I owned >1% of all
objects on S3.

(<http://news.ycombinator.com/item?id=2154792> <- here I describe how I had
been making a billion PUTs per month when Amazon came out and claimed they
only had <300 billion objects stored in S3.)

However, in the last few days? I've binged, and this week I'm purging. I've
been doing as many as 10 million (and never less than 4 million) deletes per
hour for /days/.

Why? One of the things I use S3 for (storing SHSH data, a giant man-in-the-
middle attack against Apple's dysfunctional iPhone software protection
systems) was simply wasting space. :(

(Specifically, I realized that, due to some debugging code I never really
realized was for debugging only and thereby never got around to deleting, I
had been storing every single request in and out of the service throughout
its history in XML form, in addition to the tightly encoded binary.)

The thing to realize here: DELETE requests are free. PUT/LIST requests are
$0.01/1k, GET/HEAD are $0.01/10k, but DELETE requests are not billed by Amazon
at all; even the bandwidth for them "won't count" if they are made from EC2
(which I'm doing).
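
To make those rates concrete, here's a rough sketch of the request-pricing math at the 2011 prices quoted above (current prices differ; the helper name is mine, not Amazon's):

```python
# Request costs at the rates quoted above: PUT/LIST $0.01 per 1k,
# GET/HEAD $0.01 per 10k, DELETE not billed at all.
PUT_LIST_PER_1K = 0.01
GET_HEAD_PER_10K = 0.01

def request_cost(puts=0, gets=0, deletes=0):
    """Rough dollar cost of an S3 request mix at the 2011 rates."""
    return (puts / 1_000) * PUT_LIST_PER_1K + (gets / 10_000) * GET_HEAD_PER_10K
    # deletes contribute nothing: they are free

# A billion PUTs runs about $10,000; a billion DELETEs costs nothing.
print(request_cost(puts=1_000_000_000))
print(request_cost(deletes=1_000_000_000))
```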

However, one can only imagine that deleting a billion objects in a few days
with a billion separate tiny requests to Amazon is causing someone, somewhere,
a bad week (and possibly even a ton of meetings). :(

(Which, honestly, I feel sorry about, but my business isn't even that
profitable, so I've been having to figure out ways to cut costs, and this
useless data is apparently costing me nearly $1k a month.)

Well, with all of this in mind, and now after almost a week of "the onslaught"
(and even myself wondering multiple times what this many requests must be
doing to their stats), Amazon conveniently announces a new API feature that
lets me decrease the number of separate requests I'm making to their servers
by a thousand times over...

...really, I cannot help but wonder ;P.
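
For the curious, the thousand-fold reduction works like this: each multi-object delete is a single POST to `/?delete` carrying up to 1,000 keys in an XML `<Delete>` body. A rough sketch (not any particular SDK; in real use the keys would also need XML-escaping):

```python
# Batch keys 1,000 per POST: a billion single-key DELETEs become
# roughly a million multi-object POSTs.
MAX_KEYS_PER_POST = 1000

def batches(keys):
    """Yield up to 1,000 keys per multi-object delete request."""
    for i in range(0, len(keys), MAX_KEYS_PER_POST):
        yield keys[i:i + MAX_KEYS_PER_POST]

def delete_body(batch):
    """Build the XML payload for one POST /?delete request."""
    objects = "".join(f"<Object><Key>{k}</Key></Object>" for k in batch)
    return f"<Delete><Quiet>true</Quiet>{objects}</Delete>"

keys = [f"obj-{n}" for n in range(2500)]
posts = [delete_body(b) for b in batches(keys)]
print(len(posts))  # 3 POSTs instead of 2,500 DELETEs
```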

~~~
RyanGWU82
Nah, this has been in the works for a few months. I deleted 160 million
objects in the spring, which was enough for me to get a phone call from an
Amazon product manager. He asked if there was anything they could do to help
the deletes go smoothly, and outlined this feature way back then.

~~~
saurik
Did you, in fact, run into any serious problems with those deletes? (I now
have a range of key space that I seem to have "damaged": if I do a LIST with
prefix "requests_/" I now deterministically get an InternalError.)

Seriously, though, this new multi-delete is EPIC ;P. I am now deleting objects
at a rate of over 4 million per minute! (Although, as these are POSTs, I
wonder if they now cost me money; that said, it likely won't amount to much.)

(This is fast enough that I'm now "cleaning house" much more thoroughly than
before: going through some of those "database buckets" I have and deleting
obsolete indexes.)

~~~
jeffbarr
Can you file a bug or email me (jbarr at amazon.com) so that we can
investigate this issue for you?

~~~
saurik
(Done.)

------
elq
It's about damn time. This would've been nice a week ago when I deleted 140K
items from a bucket that were moved elsewhere. The deletes took _14 hours_.

~~~
saurik
The secret was to do the delete operations in parallel: I've been running
anywhere between 64 and 128 LIST->*DELETE processes at any given moment, and
was often able to sustain 10m DELETE operations per hour. This new feature is
still much faster, of course ;P.
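
That fan-out pattern looks roughly like this (a sketch with a stand-in for the real network call; the 64-worker count is just the low end of the range I mentioned):

```python
from concurrent.futures import ThreadPoolExecutor

# Parallel LIST->DELETE: partition the key space across many workers,
# each issuing single-key DELETEs. delete_key here is a stub standing
# in for a real S3 DELETE request.
deleted = []

def delete_key(key):
    deleted.append(key)  # a real worker would send DELETE /{key}
    return key

keys = [f"obj-{n}" for n in range(10_000)]
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(delete_key, keys))

print(len(deleted))  # 10000
```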

------
joelhaasnoot
CloudBerry Explorer has already been updated to support this. A good way to
lower my $0.60 worth of usage.

