High performance single-threaded access to SimpleDB

bayareaguy · on June 29, 2008

Are your operations simple enough that you could perform equivalent ones using carefully constructed keys and S3 list/get/put?

cperciva · on June 29, 2008

Well, S3 doesn't have a Query operation. But I suppose you could do the same PUTs, GETs, and DELETEs using S3 if you wanted.

bayareaguy · on June 29, 2008

It does, however, support listing key ranges with the "prefix=" parameter.

cperciva · on June 29, 2008

That allows you to get a list of items with names in a given range, but doesn't allow you to do queries on the items' attributes.

bayareaguy · on June 29, 2008

For what I had in mind, the value you want to query could go in the "name".

cperciva · on June 29, 2008

If you want to use S3 LIST to emulate a SimpleDB Query, you can only Query one attribute and that attribute must be at the start of the name. Obviously you can't have two different attributes both at the start of the name, so any Query involving two or more attributes can't be emulated by S3 LIST.
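To make the limitation concrete, here is a minimal sketch that simulates an S3 LIST with "prefix=" over an in-memory sorted key list. It makes no real S3 calls. The key layout ("color=<value>/<item name>"), the attribute names, and the helper `list_prefix` are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_left

def list_prefix(sorted_keys, prefix):
    """Return the keys that start with prefix, in sorted order.

    This imitates the effect of an S3 LIST with prefix=; it is a local
    simulation, not a call to any real S3 API.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches

# Hypothetical layout: the one queryable attribute ("color") leads the key.
# A second attribute such as "size" would live in the object body or
# metadata, where a prefix listing cannot see it.
keys = sorted([
    "color=red/0001",
    "color=blue/0002",
    "color=red/0003",
])

red_items = list_prefix(keys, "color=red/")   # works: attribute leads the key
by_size = list_prefix(keys, "size=")          # finds nothing: not in the key
```

Only the attribute at the front of the key can be selected this way; any condition on a second attribute would have to be checked by the client after fetching the listed objects.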

bayareaguy · on June 29, 2008

Agreed, but if your values aren't large and if your client can do the obvious filtering then you can emulate it quite easily. This is actually what Mark Atwood's S3 storage layer for MySQL did.

I'm still curious about the values you're using in your test. Did you use exactly the same ones described on the "Indexing and Querying Amazon S3 Metadata with Amazon SimpleDB" page?

cperciva · on June 29, 2008

> Agreed, but if your values aren't large and if your client can do the obvious filtering then you can emulate it quite easily.

True, but the S3 LIST operations you need for that can get expensive quickly. I considered it for the application I intend to use SimpleDB for, and decided that SimpleDB was far cheaper.

> I'm still curious about the values you're using in your test.

They were utterly synthetic -- the first things which came to mind: The item names were 0000, 0001, 0002, ... 9999, the attribute names were "square" and "cube", and I think you can guess what the attribute values were. :-)
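A data set of that shape could be generated along the following lines. This is a sketch under assumptions: the comment only hints at the values, so taking them to be the square and cube of the item number, stored as unpadded decimal strings, is a guess, and `make_items` is a hypothetical helper name.

```python
def make_items(count=10000):
    """Build {item_name: {attribute_name: value}} for items 0000..9999.

    ASSUMPTION: values are the square and cube of the item number, written
    as plain decimal strings. The original test's exact encoding is unknown.
    """
    items = {}
    for n in range(count):
        name = f"{n:04d}"  # zero-padded item name, e.g. "0012"
        items[name] = {"square": str(n * n), "cube": str(n ** 3)}
    return items

items = make_items()
```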

bayareaguy · on June 29, 2008

Thanks. I remember your earlier article, where you wrote:

> Third, if you have an item with only one attribute, and your read:write ratio is more than 22:1, it's cheaper to use S3 instead of SimpleDB -- even ignoring the storage cost -- since S3's 1 μ$ per GET is cheaper than SimpleDB's 1.305 μ$ per GetAttributes request.

and wondered if that still held true for varying I/O rates.

cperciva · on June 29, 2008

Yes, still true -- the price (per BoxUsage) of SimpleDB requests doesn't depend on how many requests you're making per second.
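The break-even arithmetic behind a figure like 22:1 can be sketched as follows. The two read prices come from the passage quoted above. The two write prices do not appear anywhere in this thread; they are assumed values for mid-2008 pricing and should be checked before relying on the result. The function name is hypothetical.

```python
# Per-request prices in micro-dollars.
S3_GET = 1.0      # from the quoted passage
SDB_GET = 1.305   # from the quoted passage (GetAttributes)
S3_PUT = 10.0     # ASSUMPTION: not stated in the thread
SDB_PUT = 3.079   # ASSUMPTION: one-attribute PutAttributes, not stated in the thread

def break_even_reads_per_write(s3_get, sdb_get, s3_put, sdb_put):
    """Reads per write above which S3 is cheaper, ignoring storage cost.

    S3 costs more per write but less per read, so it wins once the read
    savings outweigh the extra write cost.
    """
    read_saving = sdb_get - s3_get
    extra_write_cost = s3_put - sdb_put
    if read_saving <= 0:
        raise ValueError("S3 reads are not cheaper; no break-even exists")
    return extra_write_cost / read_saving

ratio = break_even_reads_per_write(S3_GET, SDB_GET, S3_PUT, SDB_PUT)
```

With these assumed write prices the ratio lands between 22 and 23. The calculation has no requests-per-second term, which matches the point that the per-request price does not depend on the request rate.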