The point is simply to show that SSDs can't currently be treated as a slightly slower version of memory. Their performance characteristics are much closer to, simply, "faster disks".
Now those new disks are fast enough that if you design a database specifically for SSDs, you can get interesting performance compared to old disks. However, the idea of using the disk as something you can allocate memory from will not work well, and complex data structures requiring many random-access writes will not work either.
Code can of course be optimized for SSD usage, but this puts huge restrictions on what you can and can't do. This shows how the current Redis strategy of providing complex, fast operations using just memory makes sense. In the future, as SSDs converge more with memory, this may change.
Redis' model is particularly vulnerable to the disk slowdown because a page fault blocks all requests.
Normally being single-threaded isn't a big deal for Redis because you are likely bound by network I/O or CPU, but using SSD swap is equivalent to using blocking synchronous disk I/O, which nobody would do :D
Could there be more aggressive memory allocation (basically a region per key, plus moving whole keys to new regions when their data structures outgrow the block)? Sure.. but you're still going to pay dearly for the cost of a miss. This approach would help if you want only your 'active' data structures in memory and let the OS page out cold keys, but it would require major rework of Redis internals (or I thought it would 6 months ago, when I last considered this as a fun project..)
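To make the single-threaded point above concrete, here's a toy sketch (plain Python, not Redis code; the 50 ms stall is an assumed figure) of how one request that hits blocking I/O, the way a page fault against swap does, holds up every other queued request:

    import time
    from collections import deque

    # Toy single-threaded "server": one queue, one loop, no threads.
    requests = deque(["GET a", "GET cold_key", "GET b", "GET c"])

    def handle(req):
        if req == "GET cold_key":
            # Stand-in for a page fault serviced from swap: the whole
            # process blocks until the synchronous "disk" read completes.
            time.sleep(0.05)   # ~50 ms, an assumed figure
        return "OK"

    start = time.monotonic()
    while requests:
        req = requests.popleft()
        handle(req)
        print(f"{req!r} answered after {(time.monotonic() - start) * 1000:.1f} ms")

    # 'GET b' and 'GET c' wait the full ~50 ms even though they never touch cold_key.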
I come to the opposite conclusion when thinking about SSD's place between memory and disk. Let's talk about random I/O, which is the relevant metric for Redis: Yes, memory is a lot faster than SSDs (roughly 2+ orders of magnitude), but SSDs are also a whole lot faster than disk (roughly 2+ orders of magnitude). That makes them sound like they are sort of "in the middle".
But let's look at another crucial property: something I'll call the "characteristic size". This is the amount of data such that the "seek cost" is equal to the cost of reading those bytes. You get this number by dividing the scan speed by the IOPS. I'll work in very rough numbers:
Memory: 20GB/s / 20M IOPS = 1KB
SSD: 500MB/s / 100K IOPS = 5KB
7K disk: 160MB/s / 160 IOPS = 1000KB
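A back-of-the-envelope sketch of that division, using the same rough figures (assumptions, not measurements):

    # "Characteristic size": bytes you can stream for the price of one seek,
    # i.e. scan speed divided by random IOPS. All figures are rough assumptions.
    def characteristic_size(scan_bytes_per_sec, iops):
        return scan_bytes_per_sec / iops

    devices = {
        "Memory":  (20e9,  20e6),   # ~20 GB/s, ~20M IOPS
        "SSD":     (500e6, 100e3),  # ~500 MB/s, ~100K IOPS
        "7K disk": (160e6, 160),    # ~160 MB/s, ~160 IOPS
    }

    for name, (scan, iops) in devices.items():
        print(f"{name}: ~{characteristic_size(scan, iops) / 1e3:g} KB")
    # Memory: ~1 KB, SSD: ~5 KB, 7K disk: ~1000 KB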
As you can see, SSDs are much more like memory than disk in this crucial parameter which dictates what data structures and algorithms will be most efficient.
So, actually, I do think of an SSD as more like slower memory than like a fast disk.
* edit: Typo
You mention that characteristic size is the crucial parameter, but isn't that only the case when you are considering a single storage medium? Since you always have memory, wouldn't a caching strategy between memory and SSD provide superior performance compared to only worrying about how the data is formatted while on the SSD?
Overall I would agree that SSDs have seek performance similar to RAM, in the sense that you can live with fragmentation, but I think the OP's point that their throughput is barely better than a disk's is still valid.
What would it mean for "SSD to be like slow memory"? To me, it would mean that both bandwidth and seeks would run proportionally slower. This is why I'm using the "characteristic size" metric--to evaluate that proportionality (and to give it a physical interpretation).
Yes, a 7K disk can output 160MB/s ... if it's doing a continuous read, with zero seeks per second.
If it's actually doing 160 seeks/second, it's not going to have time to read 1MB (taking a further 1/160th of a second) after each seek.
So this metric means "how much data you can read per iop if you want to cut your iops by about 50%".
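A small sketch of that trade-off, using the rough 7K-disk figures from above (the seek time and scan speed are assumptions):

    # Each operation = one seek plus a sequential read of `read_bytes`.
    SEEK_TIME  = 1 / 160   # seconds per seek, assumed from ~160 IOPS
    SCAN_SPEED = 160e6     # bytes per second, assumed

    def effective_ops_per_sec(read_bytes):
        return 1 / (SEEK_TIME + read_bytes / SCAN_SPEED)

    for size in (4e3, 100e3, 1000e3, 10e6):
        print(f"{size / 1e3:>6.0f} KB per op -> ~{effective_ops_per_sec(size):6.1f} ops/s")

    # At the ~1000 KB characteristic size you land at ~80 ops/s,
    # i.e. roughly half the nominal 160 IOPS.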
How is that useful above & beyond the input numbers?
540K IOPS for 512-byte reads (70 microsecond latency)
1,100K IOPS for 512-byte writes (15 microsecond latency)
Compare a good SSD, the Samsung 840, to a normal PC using dual-channel 1600MHz DDR3.
Maximum sequential read speed: 0.5GB/s vs 25GB/s
Random read speed: 0.01-0.1GB/s vs 3GB/s
Access latency: 30,000-40,000ns vs 6-65ns
So we're dealing with (best case) a bandwidth difference of a factor of 30 and a latency difference of a factor of 500.
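A quick check of where those factors come from, using the rough figures above (approximate vendor-class numbers, not measurements):

    # SSD (Samsung 840 class) vs dual-channel DDR3, rough figures from above.
    bandwidth_gap_seq  = 25 / 0.5      # sequential read: ~50x
    bandwidth_gap_best = 3 / 0.1       # random read, SSD at its best: ~30x
    latency_gap_best   = 30_000 / 65   # best case on both sides: ~460x, call it ~500x

    print(f"sequential ~{bandwidth_gap_seq:.0f}x, "
          f"random (best case) ~{bandwidth_gap_best:.0f}x, "
          f"latency (best case) ~{latency_gap_best:.0f}x")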
Now this isn't taking other things into consideration, such as SSD performance degradation and the need to run garbage collection or TRIM.
Yes. Running tests to validate your assumptions is a big part of robust software engineering. In this case the results were unsurprising but not uninteresting.
Testing your assumptions is something that you're supposed to do when you hit a wall, not when you're driving through a field.
But do you have enough information about how Redis accesses memory under the benchmark in question, combined with the OS page replacement strategy, combined with the characteristics of SSDs, to know the results beforehand? You can guess, for sure; but do you know?
If we all follow your approach, we'll never be surprised unless we get stuck; and if we know what we think we know as well as you seem to think we know it, we shouldn't get stuck in the first place. We should have assumed that we would have gotten stuck, and avoided it.
The article has relatively low value in terms of information content, but the mindset is to be commended. It should have given the author better intuitions about the 3 factors mentioned above. Modern, non-budget systems very seldom thrash; there's a younger generation coming along who've never experienced systems frozen in that way.
An analogy I can think of is testing if a stock Ford Fiesta can reach the speed of sound. You know what the engine is capable of, the environment it is operating in and the tires it is running on - you simply don't need to floor the accelerator to come to a conclusion.
That saying about picking ones battles comes to mind. The mindset is certainly of a sharp character, but what good is a knife without a hand to guide it? The map is not the territory but it does save a lot of time if used strategically.
Of course you can do this in your own code, but then you step outside Redis. I think it would be nice to bake this into Redis, knowing that once a value is loaded back from secondary storage you get exactly the same object, avoiding the whole (de)serialization process. Of course you won't achieve the same performance, but at least the penalty is a known one.
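For reference, the "in your own code" route looks roughly like this with redis-py's DUMP/RESTORE (a sketch only; the key name, file path, and choice of redis-py are my assumptions), which is exactly the serialize/deserialize round trip I'd like Redis to hide:

    # Sketch of the "do it yourself" path using redis-py DUMP/RESTORE.
    # Key name and file path are illustrative; error handling is omitted.
    import redis

    r = redis.Redis()  # assumes a local Redis instance

    # Evict a cold key to secondary storage: Redis serializes the value for us...
    payload = r.dump("cold:user:1234")
    with open("/tmp/cold_user_1234.bin", "wb") as f:
        f.write(payload)
    r.delete("cold:user:1234")

    # ...and later we pay the deserialization cost to bring it back.
    with open("/tmp/cold_user_1234.bin", "rb") as f:
        r.restore("cold:user:1234", 0, f.read())  # ttl=0 means no expiry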
I'll go and play with this now... :)
Acceptable is a fuzzy standard. Different applications have different needs, and not all applications require thousands of transactions per second. I'd presume there is an I/O rate below which the performance remains stable. Do you know what this rate is, and how it compares to transfer speed or latency of the SSD?
Great analysis BTW.
Not sensible places to use SSD: swapfiles.
Swapping to SSD will also trash your SSD, because you are continually rewriting it.
Edit: The physical tech has also improved since then with "high-endurance MLC" cells. http://www.anandtech.com/show/5518/a-look-at-enterprise-perf...