
Amazon S3 Batch Operations - jeffbarr
https://aws.amazon.com/blogs/aws/new-amazon-s3-batch-operations/
======
zacharyozer
Batch delete, batch delete, wherefore art thou batch delete?

~~~
meritt
It's not exactly what you're asking for, but we have a large bucket with
billions of files (don't ever do this, it was a terrible idea) and we manage
deletions via lifecycle rules. If your file naming convention and data
retention policy permit it, it's far easier than calling delete with 1,000
keys at a time.
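
A rough sketch of both approaches with boto3 (bucket and prefix names are
made up):

```python
import boto3

s3 = boto3.client("s3")

# The "1,000 keys at a time" route: DeleteObjects accepts at most
# 1,000 keys per request, so you chunk your key list.
def batch_delete(bucket, keys):
    for i in range(0, len(keys), 1000):
        chunk = keys[i:i + 1000]
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in chunk], "Quiet": True},
        )

# The lifecycle route: expire everything under a prefix after N days
# and let S3 do the deleting for you.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-huge-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-old-objects",
            "Filter": {"Prefix": "logs/"},  # only works if your naming allows it
            "Status": "Enabled",
            "Expiration": {"Days": 30},
        }]
    },
)
```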

Also, just a word of warning: if you do have a lot of files and you're
thinking "let's transition them to Glacier", don't do it. The transition cost
from S3 to Glacier is absolutely insane ($0.05 per 1,000 objects). I managed
to generate $11k worth of charges doing a "small" test of 218M files and a
lifecycle policy (218M ÷ 1,000 × $0.05 ≈ $10,900). Only use Glacier for large
individual files.

[1] [https://docs.aws.amazon.com/AmazonS3/latest/user-guide/creat...](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html)

~~~
toomuchtodo
I have to ask: what’s performance like for operations on the bucket objects?

Edit: I ask because AWS suggests a key naming convention when you have large
numbers of objects, to ensure that you're distributing them across storage
nodes and to prevent bottlenecks.

[https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate...](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html)
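
For context, the old guidance amounted to something like the following sketch
(illustrative only; as the reply below notes, it's no longer needed):

```python
import hashlib

# Old-style prefix randomization: prepend a short hash so sequential
# key names spread across S3's index partitions.
def prefixed_key(name: str) -> str:
    # "2019-04-30/log.txt" -> e.g. "a1b2/2019-04-30/log.txt"
    return hashlib.md5(name.encode()).hexdigest()[:4] + "/" + name
```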

~~~
votepaunchy
“This S3 request rate performance increase removes any previous guidance to
randomize object prefixes to achieve faster performance. That means you can
now use logical or sequential naming patterns in S3 object naming without any
performance implications.”

[https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3...](https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/)

------
usr1106
Symptomatic of this business that cost is not mentioned anywhere in the
announcement. I am getting more and more skeptical of serverless in general
because the cost is really difficult to estimate, plan, and manage. Of course,
used right, some of these services can be cost-efficient. But in real life
not all software is done right...

If you buy a server and run a poorly architected system on it, you notice
that it doesn't perform and need to make changes.

If you use serverless and run a poorly architected system on it, you pay, and
you need to make changes (after someone notices the bill). Yes, there are cost
reports, but they are not easy to use and understand. With a performance
bottleneck, the system hits its limits while you are trying to understand the
performance measurements; in the cloud case, you keep paying while trying to
understand what is wrong.

Of course, in a big corporation money does not matter to a software developer.
But in a small company, the bill paid to the cloud provider might directly
affect whether the company can pay your salary in the near future.

~~~
heavenlyblue
>> If you buy a server and run a poorly architected system on it, you notice
that it doesn't perform and need to make changes.

The examples you provide are not equivalent. It's more like "we have poorly
architected software, so we had to buy 200 dedicated servers, because we
didn't know how to / couldn't make it work on 10 of them".

In the cloud you could simply update your software and then scale down. Of
course you pay more for the flexibility, but please stop with those straw men.

------
social_quotient
“Invoking AWS Lambda Functions ... I can invoke a Lambda function for each
object, and that Lambda function can programmatically analyze and manipulate
each object. ”

Wow thanks!
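
For reference, starting one of these jobs through boto3's s3control client
looks roughly like this (all ARNs, the account ID, and the manifest details
are placeholders):

```python
import boto3

s3control = boto3.client("s3control")

# Create a Batch Operations job that invokes a Lambda function once
# per object listed in a CSV manifest. Everything below is a placeholder.
s3control.create_job(
    AccountId="111122223333",
    Operation={
        "LambdaInvoke": {
            "FunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-object",
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-bucket/manifest.csv",
            "ETag": "example-manifest-etag",
        },
    },
    Report={
        "Bucket": "arn:aws:s3:::my-bucket",
        "Prefix": "batch-reports",
        "Format": "Report_CSV_20180820",
        "Enabled": True,
        "ReportScope": "FailedTasksOnly",
    },
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/batch-operations-role",
    ConfirmationRequired=False,
)
```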

~~~
0xCMP
Oh man, that's what Joyent's Manta could do a while ago. It's nice to have
this on AWS now.

~~~
kjeetgill
Or, you know, JUST map for map-reduce/hadoop.

------
seancoleman
A few months back, I designed a small background system requiring a flat
key/value store for tracking large amounts of data (>10 GB/day). I was hoping
to use S3 as a cheap key/value store, but the lack of batch operations (every
write is an individual PUT) made it prohibitive performance-wise, so I went
with DynamoDB. It's worked out great, but I'll always wonder what could have
been with S3 if I'd had batch operations back then.
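
For what it's worth, the DynamoDB side of this is pleasant precisely because
the SDK batches for you; a minimal sketch (table name and item shape are made
up):

```python
import boto3

table = boto3.resource("dynamodb").Table("flat-kv-store")  # hypothetical table

# batch_writer transparently chunks writes into 25-item BatchWriteItem
# requests and retries unprocessed items.
def put_many(items):
    with table.batch_writer() as batch:
        for key, value in items:
            batch.put_item(Item={"pk": key, "value": value})
```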

~~~
yazaddaruvala
This is a different type of “batch”.

You’re talking about API level batch calls. This is about simplifying
workflows which rely on Listing every object in S3 and doing “something”.

------
moes_dev
I was hoping to use this for moving large video files to a different prefix,
but I just spotted a limitation of the PUT Object Copy operation: "Objects to
be copied can be up to 5 GB in size."

Cool feature otherwise.
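
Outside of Batch Operations, objects over 5 GB can still be copied via
multipart copy; boto3's managed copy does the UploadPartCopy dance for you
automatically. A sketch, with made-up names:

```python
import boto3

s3 = boto3.client("s3")

# The managed transfer copy switches to multipart (UploadPartCopy) above
# the multipart threshold, so >5 GB objects work. Names are placeholders.
s3.copy(
    CopySource={"Bucket": "videos", "Key": "raw/huge-video.mp4"},
    Bucket="videos",
    Key="processed/huge-video.mp4",
)
```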

