

Amazon S3 – 2 Trillion Objects, 1.1 Million Requests/Second - jeffbarr
http://aws.typepad.com/aws/2013/04/amazon-s3-two-trillion-objects-11-million-requests-second.html

======
sylvinus
"If you added one S3 object every 60 hours starting at the Big Bang, you'd
have accumulated almost two trillion of them by now."

That actually sounds underwhelming! IMHO our brains have an easier time
thinking "hey, 1 every 60 hours, that's not much" than grasping that the
universe is really incredibly old ;-)

~~~
jeffbarr
I'm open to more creative analogies. I'll happily send some AWS stickers to
the first 5 truly great ones that show up under this reply...

~~~
columbo
My math is probably wrong... but I believe that if you ate a Twinkie for every
request, at the end of a year it would take 1,350 Blue Marlin heavy lift ships
to move you across the ocean.

~~~
somethingnew
Also, you would die.

------
MartinMond
So how many objects should have been corrupted/lost according to their SLA and
how many actually did get corrupted?

~~~
HeyImAlex
>how many objects should have been corrupted/lost according to their SLA

Anywhere from 20 to 200M per year depending on how many people use RRS...

~~~
gphil
It's staggering that out of 2 trillion only 20 might be corrupted, assuming
they lived up to their SLA.

2,000,000,000,000 - (0.99999999999 * 2,000,000,000,000) = 20

Makes me feel pretty OK about having backups there.
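
To tie this to HeyImAlex's 20-to-200M range above, here's the same arithmetic
as a quick Python sketch (the durability figures are the published
99.999999999% design target for standard storage and 99.99% for RRS;
everything else is just this thread's numbers):

    # Expected annual object loss if durability exactly matches the target.
    objects = 2 * 10**12                  # 2 trillion objects
    standard_durability = 0.99999999999   # 11 nines (S3 Standard)
    rrs_durability = 0.9999               # 4 nines (Reduced Redundancy Storage)

    print(objects * (1 - standard_durability))  # ~20 objects per year
    print(objects * (1 - rrs_durability))       # ~200 million objects per year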

------
chuckmans3
Considering we don't hear about problems that often, this is quite an
impressive feat of engineering. You really don't think about it until there's
a hiccup and half the Internet goes down.

------
Smrchy
It would be really interesting to know the average size of an object and to
visualize the number of hard disks it takes to store all this data.

~~~
ihsw
To put this into perspective it's 2.7B new objects per day (assuming 1
Trillion objects averaged over 365 days).

Assuming each object is 100KB (generous estimate, after compression) that
would be 270GB per day -- or assuming ten levels of redundancy and striped
across three RAID storage devices (per level of redundancy) then 8.1TB per
day.

I'm not familiar with their hard disk procurement policies but it wouldn't be
difficult to assume they've been purchasing 1TB drives, so 10 new disk drives
per day just for keeping ahead of growth. Furthermore let's assume their disk
drive failure churn rate is 10% per day so another 1 new disk drive for parts
replacement (so 11 disk drives per day).

These are really loose numbers not based on any actual data (or any personal
experience at all) but just napkin math, so take it all with a grain of salt.

~~~
stilldavid
I'm not convinced that 100KB is a great estimate on file size, but either way
you're off by a few zeroes. It's not 270GB per day, it's 270TB. Even if each
object were just one byte, _that_ would be 2.7GB. 100KB is one hundred
thousand bytes. So it's quite a bit more than eleven drives per day!

~~~
ihsw
You are correct, that would be 270TB.

Applying the same shoddy math with each object at 100KB: 270TB with 10 levels
of redundancy striped across 3 RAID drives comes out to 8,100TB per day. That
would be 8,100 drives (at 1TB per drive), or 8,910 drives after allowing 10%
for dead-on-arrival units.

The math is sketchy, so let's cut it down by 10x (10KB per object): 891 drives
per day. Keep in mind this is just for S3 and it doesn't account for existing
drives failing, growth, or what other services require (eg: EC2, RDS,
Cloudwatch, Cloudfront, etc).
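
For what it's worth, here is the whole napkin calculation in one place as a
small Python sketch; every input is an assumption from this thread (object
size, redundancy, RAID factor, drive size), not AWS data:

    # Napkin math only -- all inputs are guesses from the comments above.
    new_objects_per_day = 2.7e9    # ~1 trillion new objects / 365 days
    avg_object_size_kb = 100       # assumed average object size
    redundancy = 10                # assumed number of copies
    raid_factor = 3                # assumed RAID overhead per copy
    drive_tb = 1                   # assumed drive capacity in TB

    raw_tb_per_day = new_objects_per_day * avg_object_size_kb / 1e9  # KB -> TB
    stored_tb_per_day = raw_tb_per_day * redundancy * raid_factor
    drives_per_day = stored_tb_per_day / drive_tb * 1.1              # +10% spares

    print(raw_tb_per_day, stored_tb_per_day, drives_per_day)
    # ~270 TB raw, ~8,100 TB stored, ~8,910 drives per day

With the assumed object size knocked down to 10KB, the same sketch gives the
891-drives-per-day figure above.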

------
upthedale
Wow, Windows Azure seems to be beating it by a long way. Genuine surprise... I
assumed it would be much closer.

9 months ago, they announced they stored twice this amount - 4 trillion
objects. A year before that, 1 Trillion. Given that previous rate of growth,
we can expect they have a lot more than this now.

They also announced peaks of 880,000 requests a second. Whilst Amazon wins
here, I'd say it's fair to assume this number has increased in those 9 months.

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/07/20/windows-azure-storage-4-trillion-objects-and-counting.aspx

~~~
theatraine
Bear in mind that Azure storage includes "Blobs, Disks/Drives, Tables, and
Queues" however S3 is only blobs (other services like Amazon's DynamoDB would
be analogous to table storage). Hence, it's not an apples-to-apples
comparison.

~~~
upthedale
Fair point. Do we have any other numbers that might make it a more apples-to-
apples comparison?

------
TallboyOne
What is an 'object' in this sense?

~~~
kalleboo
A file

~~~
facorreia
i.e. a sequence of bytes

------
kenneth_reitz
I love S3, and have been experimenting with using it as a true key value
store.

<https://github.com/kennethreitz/elephant>

It's going quite well so far :)
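
For anyone curious what "S3 as a key-value store" looks like in practice,
here is a minimal sketch of the idea. This is not elephant's actual API; the
bucket name and keys are hypothetical, and it uses the boto3 SDK rather than
whatever elephant does under the hood:

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-key-value-store"  # hypothetical bucket name

    def kv_set(key, value):
        # Each entry is just an S3 object whose body is the JSON-encoded value.
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(value))

    def kv_get(key):
        resp = s3.get_object(Bucket=BUCKET, Key=key)
        return json.loads(resp["Body"].read())

    kv_set("user:42", {"name": "Kenneth"})
    print(kv_get("user:42"))  # {'name': 'Kenneth'}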

------
ConceitedCode
Anyone know what technology S3 uses to meet this kind of demand?

~~~
thelegit
Hard drives..

Jk... but seriously, good question! It's gotta just be a massive feat of
engineering.

------
cinquemb
Waiting for the comparison to the total outstanding US derivative exposure :P

~~~
jeffbarr
Dollars and objects don't compare very well.

------
raulonkar
What do you mean by objects?

~~~
jeffbarr
An object is a single blob of data. S3 objects vary in size from 1 byte up to
5 Terabytes. They can be uploaded with a single PUT, or with multiple PUTs in
series or in parallel (which we call multipart upload). They can be downloaded
as a unit, in full (GET) or in part (range GET).
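
A rough sketch of those upload and download styles, using the boto3 SDK
(the bucket and key names here are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Upload: boto3's transfer manager switches to parallel multipart PUTs
    # automatically once the file is large enough.
    s3.upload_file("local-big-file.bin", "my-bucket", "big-file.bin")

    # Full download: fetch the whole object as a unit.
    whole = s3.get_object(Bucket="my-bucket", Key="big-file.bin")["Body"].read()

    # Range GET: fetch only the first 1 MB of the same object.
    part = s3.get_object(
        Bucket="my-bucket",
        Key="big-file.bin",
        Range="bytes=0-1048575",
    )["Body"].read()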

~~~
raulonkar
Thanks, jeffbarr, for the valuable information...

~~~
Udo
Which you could have googled in 3 seconds.

------
spullara
I am surprised that the number of requests per second is this low — especially
if this includes PUTs. There must be a pretty huge multiple between CloudFront
and S3 that keeps this in check.

------
tomschlick
Anyone else find this surprisingly low? I'd imagine your typical web service
holds a few thousand objects in S3 for images, etc., plus backups and anything
else. Then you have the big players like Netflix, Dropbox, etc. that use the
service and store data for tens of millions of customers...

~~~
fooyc
1.1 Million Requests/Second seems especially low for Amazon.

The average server can serve more than 1,000 static objects / second easily.

~~~
amock
There is more to S3 than just serving static objects off of the local drive.
Maintaining the integrity of the data and ensuring that the data sent out is
consistent is not a trivial task. A constantly changing map with 2 trillion
keys is a hard problem on its own. Also, serving 1000 1MiB objects per second
is not the same as serving 1000 1KiB objects per second so it's hard to say
how many resources just the serving portion consumes.

~~~
fooyc
I'm not saying it's simple or easy.

I was assuming that the number of requests referred to read requests, and I
guess that the system is designed in a way that makes read requests very
cheap, maybe even cheaper than reading a file on a usual filesystem, at least
for hot data.

Just because it's huge and complex doesn't mean it's slow and that requests
are expensive.

