
I saw a few comments here asking what the point of the blog post was: after all, I already expected very poor results, and I had already tested this less formally in the past.

The point is simply to show that SSDs can't currently be considered a somewhat slower version of memory. Their performance characteristics are much more those of, simply, "faster disks".

Now those new disks are fast enough that, if you design a database specifically for SSDs, you can get interesting performance compared to old disks. However, the idea of using the disk as something you can allocate memory from will not work well, and complex data structures requiring many random-access writes will not work either.

Code can be optimized for SSD usage, of course, but this poses huge restrictions on what you can and can't do. This shows how the current Redis strategy of providing complex, fast operations using just memory makes sense. In the future, as SSDs converge more with memory, this may change.



As antirez knows, but others may not:

Redis' model is particularly vulnerable to the disk slowdown because a page fault blocks all requests.

Normally being single-threaded isn't a big deal for Redis because you are likely bound by network I/O or CPU, but using SSD swap is equivalent to using blocking synchronous disk I/O, which nobody would do :D
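To make the blocking point concrete, here's a back-of-envelope sketch (the latencies are my own illustrative assumptions, not measured numbers): in a single-threaded event loop, every page fault stalls all requests, so even a modest fault rate craters throughput.

```python
SERVICE_US = 10   # assumed in-memory service time per request, microseconds
FAULT_US = 100    # assumed SSD page-fault latency, microseconds

def throughput(fault_rate):
    """Requests/sec for a single thread; every fault stalls the whole loop."""
    avg_us = SERVICE_US + fault_rate * FAULT_US
    return 1_000_000 / avg_us

print(throughput(0.0))  # all requests served from memory
print(throughput(0.5))  # half the requests take a blocking fault
```

With these assumed numbers, faulting on half the requests cuts throughput by roughly 6x, which is why swap and a single-threaded server mix so badly.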

Could there be more aggressive memory allocation (basically a region per key, moving whole keys to new regions when their data structure outgrows the block)? Sure, but you're still going to pay dearly for the cost of a miss. This approach would help if you want to keep only your 'active' data structures in memory and let the OS page out cold keys, but it would require major rework of Redis internals (or so I thought six months ago, when I last considered this as a fun project).


> The point is simply to show that SSDs can't currently be considered a somewhat slower version of memory. Their performance characteristics are much more those of, simply, "faster disks".

I come to the opposite conclusion when thinking about the SSD's place between memory and disk. Let's talk about random I/O, which is the relevant metric for Redis: yes, memory is a lot faster than SSDs (roughly 2+ orders of magnitude), but SSDs are also a whole lot faster than disk (again roughly 2+ orders of magnitude). That makes them sound like they sit somewhere "in the middle".

But let's look at another crucial property, something I'll call the "characteristic size": the number of bytes such that the "seek cost" equals the cost of reading those bytes. You get this number by dividing the scan speed by the IOPS. In very rough numbers:

Memory: 20 GB/s / 20M IOPS = 1 KB

SSD: 500 MB/s / 100K IOPS = 5 KB

7K disk: 160 MB/s / 160 IOPS = 1000 KB

As you can see, SSDs are much more like memory than disk in this crucial parameter, which dictates what data structures and algorithms will be most efficient.
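The arithmetic above can be sketched directly, using the same rough bandwidth and IOPS figures from the comment:

```python
# "Characteristic size" = scan bandwidth / IOPS: the read size at which
# transfer cost roughly equals seek cost. Rough figures from the comment.

def characteristic_size(bandwidth_bytes_per_s, iops):
    return bandwidth_bytes_per_s / iops

media = {
    "memory":  (20e9,  20e6),   # 20 GB/s, 20M IOPS
    "ssd":     (500e6, 100e3),  # 500 MB/s, 100K IOPS
    "7k disk": (160e6, 160),    # 160 MB/s, 160 IOPS
}

for name, (bw, iops) in media.items():
    print(f"{name}: {characteristic_size(bw, iops) / 1000:.0f} KB")
```

On these numbers, memory and SSD land within a factor of 5 of each other (1 KB vs 5 KB), while the disk is 200x the SSD (1000 KB), which is the basis of the "SSDs are more like memory" claim.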

So, actually, I do think of an SSD as more like slower memory than like a faster disk.

* edit: Typo


You are just changing the metric of measurement to reach your conclusion. The OP said "throughput is closer to disk than memory", which your own numbers, 20 GB/s vs 500 MB/s vs 160 MB/s, line up with very well: a ratio of 40:1 vs roughly 3:1. You counter with "characteristic size is closer to memory than disk".

You mention that characteristic size is the crucial parameter, but isn't that only the case when you're considering a single storage medium in isolation? Since you always have memory, wouldn't a caching strategy between memory and SSD provide superior performance compared to only worrying about how the data is laid out on the SSD?

Overall I would agree that SSDs have seek performance similar to RAM's, in the sense that you can live with fragmentation, but I think the OP's point that their throughput is barely better than a disk's is still valid.


You make a good point about bandwidth--that SSDs are more like disks. However, OP doesn't mention bandwidth, nor is it relevant to his experiment.

What would it mean for "SSD to be like slow memory"? To me, it would mean that both bandwidth and seeks would run proportionally slower. This is why I'm using the "characteristic size" metric--to evaluate that proportionality (and to give it a physical interpretation).


I don't understand the utility of your homegrown metric.

Yes, a 7K disk can output 160 MB/s ... if it's doing a continuous read, with zero seeks per second.

If it's actually doing 160 seeks/second, it's not going to have time to read 1MB (taking a further 1/160th of a second) after each seek.

So this metric means "how much data you can read per iop if you want to cut your iops by about 50%".

How is that useful above & beyond the input numbers?
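The objection above can be checked with a simple model (my own rough figures): on a spinning disk, seek time and transfer time come out of the same budget, so reading the "characteristic size" after every seek roughly halves the achievable seeks per second.

```python
SEEK_S = 1 / 160      # assumed time per full seek (~6.25 ms)
BANDWIDTH = 160e6     # assumed sequential read bandwidth, bytes/s

def seeks_per_second(read_bytes):
    """Seeks/sec when each seek is followed by a read of read_bytes."""
    return 1 / (SEEK_S + read_bytes / BANDWIDTH)

print(seeks_per_second(0))          # pure seeks
print(seeks_per_second(1_000_000))  # 1 MB read per seek
```

Reading 1 MB per seek drops the disk from 160 to 80 seeks/sec in this model, matching the "cut your IOPS by about 50%" reading of the metric.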


The comparison gets closer to memory when you look at PCIe-based flash storage.

2.5 GB/s bandwidth; 540K 512-byte reads/s (70 microseconds); 1100K 512-byte writes/s (15 microseconds)


Thanks for posting "negative" results. I think it's good science.


Interesting. Have you tried running the same tests on other DBs as well? What would be great is a comparison with other databases, such as mongo/memcache and even mysql/pgsql.



