
The 80/20 rule for storage systems - mrry
http://www.cohodata.com/blog/2014/12/10/8020-rule-storage-systems/
======
kijin
Seagate tried to take advantage of the 80/20 rule with their SSHD "hybrid
drives". Unfortunately, they decided to use only 8GB of flash on all of their
SSHDs. That's a far cry from 80/20. The solid-state cache only covers 1.6% of
a 500GB hybrid drive, 0.8% of a 1TB drive, and 0.2% of a 4TB drive (yes,
Seagate makes 4TB SSHDs). Those Seagate drives don't even have enough flash to
hold a recent version of Windows and some commonly used apps, not to mention
any games.

The laptop version is even worse because the non-flash part only spins at
5400rpm. Somehow they thought this would be okay in a performance-oriented
drive. The slow platters make Seagate SSHDs even slower in general than
regular 7200rpm laptop drives from WD and Hitachi/HGST. The only improvement
is slightly shorter boot time.

Since the day the first generation of SSHDs came out, reviewers have said that
you need at least 32GB of flash in a hybrid drive to see much real-world
benefit. Everyone has also been complaining about the choice of 5400rpm. But
three SSHD generations later, Seagate still hasn't listened.

Now the SSHD technology is on the way to irrelevance because (1) SSDs have
become much cheaper than three years ago, and (2) even if you need a lot of
space, the mSATA form factor now allows most laptop users to install an SSD
_in addition to_, not instead of, a 2.5" hard drive. These days, I wouldn't
even consider an SSHD as an option. I'd just buy a large hard drive _and_ an
mSATA SSD; there's no need for a hybrid anymore.

It's a sad tale of corporate boneheadedness killing off an idea that had so
much potential.

~~~
zurn
The world needs an explicit HDD + SSD combo in a single-drive form factor for
laptops. No changes to laptops needed: use a single cable and SATA's standard
PMP (port multiplier) feature inside the device, and the OS sees two separate
drives.

------
akanet
The article embeds a great talk by the author:
[https://www.youtube.com/watch?v=zDuxd1Enxj8](https://www.youtube.com/watch?v=zDuxd1Enxj8)

He talks about using HyperLogLog to store information about the IOPS a
workload is causing. This lets them analyze, at 4KB granularity, _just how
performant different cache sizes are_. Current solutions just measure
filesystem last-accessed times, which hide a lot of important data (notably
when storing VM images).

The tools he shows off are pretty awesome too, like letting you replay old
workloads and determining how best to allocate faster storage.
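
A rough sketch of the HLL idea he describes (a toy HyperLogLog written from
scratch, not Coho's actual implementation, fed with a synthetic trace standing
in for a real workload):

    import hashlib
    import math

    class HLL:
        def __init__(self, p=12):                # 2^12 = 4096 registers, ~1.6% error
            self.p = p
            self.m = 1 << p
            self.reg = [0] * self.m

        def add(self, item):
            h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
            idx = h >> (64 - self.p)             # first p bits pick a register
            rest = h & ((1 << (64 - self.p)) - 1)
            rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
            self.reg[idx] = max(self.reg[idx], rank)

        def count(self):
            alpha = 0.7213 / (1 + 1.079 / self.m)
            est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.reg)
            zeros = self.reg.count(0)
            if est <= 2.5 * self.m and zeros:    # linear-counting correction
                est = self.m * math.log(self.m / zeros)
            return int(est)

    # Usage: one insertion per 4KB block touched; the estimate is the working set.
    # Here a synthetic trace of 1M I/Os hitting 100k distinct blocks stands in
    # for a real workload.
    hll = HLL()
    for i in range(1_000_000):
        byte_offset = (i % 100_000) * 4096
        hll.add(byte_offset // 4096)
    print("approx. working set:", hll.count() * 4096, "bytes")   # ~= 100k * 4KB

The nice part is that the sketch stays a few kilobytes no matter how many
distinct blocks the workload touches, so you can afford to keep lots of them.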

~~~
andywarfield
Thanks! We're pretty early in our use of HLLs within the system but are
already managing to get some really cool data off of them. I'm excited to see
where this all goes as we build out the system over the next year.

Your last-accessed time point is exactly right. Storage systems used to be
able to do a lot with file system-level metadata, but with the size and
opaqueness of VM image files, those techniques have become a lot less
effective. We're currently exploring how we can use HLLs in combination with a
couple of other techniques to do things like clustering co-accessed data and
then managing operations like prefetching and demotions over much longer time
frames than are typically done in OSes and storage systems.
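
Purely as an illustration of what "clustering co-accessed data" could look
like with HLLs (reusing the toy sketch from the comment above, with made-up
block ranges, not our real pipeline): keep one sketch per object per time
window, merge them by register-wise max to estimate the union, and use
inclusion-exclusion to approximate the overlap.

    def merge(a, b):
        out = HLL(a.p)
        out.reg = [max(x, y) for x, y in zip(a.reg, b.reg)]   # sketch of A ∪ B
        return out

    def approx_overlap(a, b):
        # |A ∩ B| ≈ |A| + |B| - |A ∪ B|, all estimated from the sketches
        return a.count() + b.count() - merge(a, b).count()

    vm1, vm2 = HLL(), HLL()
    for blk in range(0, 50_000):          # blocks touched by VM 1 (hypothetical)
        vm1.add(blk)
    for blk in range(40_000, 90_000):     # blocks touched by VM 2 (10k shared)
        vm2.add(blk)
    print("estimated co-accessed blocks:", approx_overlap(vm1, vm2))  # ~10,000

Pairwise overlap estimates like that are one cheap way to build a similarity
measure between objects or time windows for clustering.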

~~~
donavanm
I haven't read the paper yet, but I'm a little surprised by the HyperLogLogs.
I was under the impression that HLLs break down when your symbol frequency
varies by orders of magnitude. Those are exactly the patterns I'd expect to
see in block/page access frequencies over time. Are the HLLs only tracked on a
smaller temporal scale to increment the distance matrix? Or is there something
else I'm missing?

~~~
nickharvey
The state of an HLL is completely determined by the set of distinct symbols
that appear, not the order or the frequency of those symbols. So, inserting a
billion A's and a single B into an HLL will have exactly the same outcome as
inserting a billion B's and a single A, or even just a single A and a single B.
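
A quick way to convince yourself, using the toy HLL sketch from earlier in the
thread (the counts here are arbitrary):

    a, b = HLL(), HLL()
    for _ in range(1_000_000):
        a.add("A")                  # a million A's...
    a.add("B")                      # ...and a single B
    b.add("A")
    b.add("B")                      # just a single A and a single B
    assert a.reg == b.reg           # identical registers, so identical estimate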

Does this address your concern, or did I misunderstand your point?

------
zaroth
Can a 'perfect' algorithm actually give end-users the consistent feel of flash
performance with 80% of data stuck on platters? I doubt it.

Access patterns having some statistical distribution is one thing. Writing the
allocator -- and always guessing right about where to place a file -- is
another thing altogether.

Flash offers massively higher performance, so every time you guess wrong the
user will definitely notice. All we hear about these drives is that they
aren't consistently fast, or that they're decent until you actually push them
too hard.

SSDs themselves used to have poor consistency, but drives are now manufactured
to ever-increasing standards. Samsung's 10-year warranty, for example, is
striking.

80% of your data on a medium that is less reliable over time?! More complex
moving parts increase failure rates. There are downsides here to consider, I
think.

~~~
nl
_Can a 'perfect' algorithm actually give end-users the consistent feel of
flash performance with 80% of data stuck on platters? I doubt it._

Of course not. That's not the point!

What may be possible is to give the "consistent feel of flash performance"
some very high percent of the time for a _dramatically_ lower price.

If it's possible to get flash-like performance 95%+ _of the time_ for <50% of
the price then I'd find that pretty compelling.
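
Back-of-the-envelope, with made-up placeholder prices rather than real market
figures:

    hdd_per_gb, flash_per_gb = 0.04, 0.40   # hypothetical $/GB
    flash_fraction = 0.20                   # put the hot 20% of data on flash
    blended = flash_fraction * flash_per_gb + (1 - flash_fraction) * hdd_per_gb
    print(f"hybrid: ${blended:.3f}/GB, i.e. {blended / flash_per_gb:.0%} "
          f"of the all-flash price")

With a 10x price gap, a 20% flash tier lands at under a third of the all-flash
cost per gigabyte, so the real question is how often the hot tier misses.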

~~~
zaroth
But flash will have higher performance and higher capacity, at least that's
the projection. I don't know why flash shouldn't be able to win at
price/capacity, longevity, total capacity, robustness, performance, etc. At
some point, there just shouldn't be any reason to manufacture storage in any
other form. In a few generations, isn't that the level of dominance flash will
have?

~~~
rasz_pl
> I don't know why flash shouldn't be able to win at price/capacity,
> longevity, total capacity, robustness, performance, etc.

Yes, you don't, and this is why you think this:

> But flash will have higher performance and higher capacity

Longevity, robustness, endurance: it all goes down with every new generation
of flash. Performance at the silicon level is already as good as it gets; we
only win by making wider buses and accessing more data in parallel. Price per
gigabyte comes at a price (haha) of endurance. Speed comes at the price of
more chips per disk (or more wafers/structures inside chips) = money.

Want that sweet new 1TB PCIe SSD able to write at 2GB/s? It will shit itself
after ~24 hours of continuous writes.

There is no free lunch with SSDs; they are a dead end.

------
detroll9823
From the article:

"...there are now more than three wildly different connectivity options for
solid state storage (SATA/SAS SSDs, PCIe/NVMe, and NVDIMM), each with
dramatically different levels of cost and performance.

So even if disks go away, storage systems will still need to mix media..."

~~~
jajaja123
The point is that once storage is all-flash, people outside the storage
industry have little incentive to think about differences in storage speed.
What will matter broadly is:

1) Code that scales well over a cluster.

2) Code that distinguishes between RAM and storage well.

3) Code that distinguishes between cache and RAM well (for really
computationally intensive stuff).

------
jajaja123
Irrelevant - storage is going to be all-flash pretty soon anyway!

~~~
jude-
I doubt it. If your access patterns follow a power law distribution, why
should you pay more for less space when the performance boost gained by doing
so (i.e. putting the "long tail" onto fast storage) is negligible? Moreover,
the need for storage space certainly isn't decreasing, and the capacity/cost
ratios for disk and tape certainly haven't stopped improving. With that kind
of access distribution, a multi-tiered storage system lets you find the
optimal cost/performance trade-off.
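
A toy illustration of why the long tail can stay on slow media (the Zipf
exponent, block count, and cache fractions are all assumed, not measured):

    def zipf_hit_rate(n_blocks, cached_blocks, s=1.0):
        # Fraction of accesses that land on the `cached_blocks` most popular
        # blocks when block popularity follows a Zipf(s) distribution.
        weights = [1.0 / (rank ** s) for rank in range(1, n_blocks + 1)]
        return sum(weights[:cached_blocks]) / sum(weights)

    n = 1_000_000                           # total 4KB blocks (hypothetical)
    for frac in (0.01, 0.05, 0.20):
        k = int(n * frac)
        print(f"cache {frac:.0%} of blocks -> ~{zipf_hit_rate(n, k):.0%} of accesses")

Under those assumptions, caching the hottest 20% of blocks captures roughly
90% of accesses, which is exactly the 80/20-style skew the article is about.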

~~~
zaroth
Mostly reliability. I would never carry spinning rust in my laptop again,
those days are gone. But also form factor, simplicity, consistency of end-user
experience, and eventually solid-state will be cheaper than spinning platters
in any case. Every graph I've seen of storage price with series for platters
and flash shows those lines eventually intersecting.

~~~
bsder
> Mostly reliability. I would never carry spinning rust in my laptop again,
> those days are gone.

Exactly. Spinning rust drives are now far inferior to SSDs for reliability. No
2.5" spinning rust drive I have bought in the last 7 years has lasted more
than two years.

I have yet to have an SSD fail on me.

~~~
mitchty
I've had 2 SSDs fail on me, to the point that the data was unrecoverable; SSDs
fail worse than HDDs in my limited experience so far. Hope you have backups
either way.

I also have at least 2 2.5" hard drives that are going on 3 years old.
Anecdotes make for poor data sets. (Note that one of those 2.5" drives is
getting SMART errors now after 12800 hours of use, so it will probably die in
short order.)

Besides, with spinning rust getting to 8TB now, hard drives are just taking
the old place of tape in a proper backup solution. That is:
memory -> flash -> HDD -> tape -> optical (the last two could possibly be the
same layer).

~~~
zaroth
Would you trust a drive to carry that kind of lifespan? I think solid state
will ultimately make the rest irrelevant, and in just a few more generations'
time.

Then again, when storage and compute are integrated, even solid state should
some day become irrelevant. So I just think 'hard drives' are going away, and
'store' will be a resource, like 'compute', most likely implemented in flash
for bulk storage and backed up on separate flash.

~~~
mitchty
Carry what kind of lifespan? Generally I don't trust anything anymore and have
a policy of two local copies plus one offsite for data I care about.

If I don't have that, I generally don't care enough about the data.

As for storage and compute being integrated, do you mean memristors or
something FPGA-like?

Solid state has a rather huge capacity gap to overcome. It's hard to beat
spinning rust for bulk storage at the moment, and for at least the near
future.

