

Digging Into Apple’s Fusion Drive Details - barista
http://www.macobserver.com/tmo/article/digging-into-fusion-drive-details

======
mistercow
So they seem to be trying to differentiate this from a typical hybrid hard
drive, or HHD (besides the fact that it's two separate devices), by saying
it's "not a cache". But practically speaking, are the benefits of "not
duplicating" really worth it? Sure, you get 12.5% more space that way, but it
seems like the performance implications would be almost exclusively negative.

On a typical HHD, under typical circumstances, all writes go to the SSD, and
are copied to the HDD later on. So far so good; Apple's solution will do the
same thing. And stuff being read gets copied to the SSD in both cases. Also
great.

Now, suppose a file is loaded to the SSD and we never write to it. Now the SSD
gets full, and the file has to be moved off of the SSD to make room. On a
typical HHD, all that happens is the file gets deleted from the SSD. It's
still on the HDD, so no further action is necessary. But on Apple's Fusion
Drive, this file will have to be copied _back_ to the HDD.

Considering that some of the largest kinds of files (e.g. videos) are exactly
the sort of thing you will want to put on the SSD and not modify, this seems
like a pretty terrible idea.
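
To make the eviction difference concrete, here's a toy sketch (Python dicts
standing in for the two devices; the names are hypothetical):

    # Toy model: dicts stand in for the SSD and HDD. In a cache, the HDD
    # always holds a copy, so evicting a clean file is just a delete. In a
    # non-duplicating tier, eviction must first copy the file back.

    def evict_cached(ssd, path):
        del ssd[path]              # HDD copy already exists; no extra I/O

    def evict_tiered(ssd, hdd, path):
        hdd[path] = ssd.pop(path)  # full write back to the HDD first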

~~~
gmac
Also, by not duplicating, you presumably increase probability of data loss
considerably. Now you lose data if either of the devices (SSD, HDD) fails.
That certainly seems a bad trade-off for a few percent more space.

~~~
simonh
But in a cache, if the main storage fails you can't just get the data off the
cache anyway. On its own it's not a filesystem the OS can read, as it's
managed by the device driver. In theory the data is there, but there'd be no
practical way to get at it. In a fusion drive, if one drive fails it looks
like you'll still be able to recover the data on the other drive.

~~~
gmac
OK, but the probability of data loss is still worse than with a cache, because
failure of (HDD or SSD) is likelier than failure of HDD alone.

~~~
simonh
But an HDD with a flash cache is still an HDD component and a flash component.
Putting them in the same enclosure and using one of them as a cache for the
other doesn't magically merge their combined MTBFs into a lower number.

~~~
mistercow
That is correct. gmac is technically correct that you have a higher expected
amount of data lost with the non-caching system than you would with the
caching system, but that's just the trivial fact that more space = higher
loss expectation.
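
As a back-of-envelope illustration, with annual failure rates invented purely
for the example:

    # Illustrative annual failure rates, invented for this example.
    p_hdd, p_ssd = 0.03, 0.015

    # Cache design: user data survives unless the HDD itself fails.
    p_loss_cache = p_hdd                           # 3.0%

    # Non-duplicating tier: some data is lost if either device fails.
    p_loss_tiered = 1 - (1 - p_hdd) * (1 - p_ssd)  # ~4.5%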

------
cbsmith
"There are other hybrid models of blending SSD and mechanical disks to save
money, but none of them are able to integrate with the OS and do it as
intelligently as Apple is able to with Fusion Drive."

Seriously? Aside from hybrid drives, Windows has had a very similar technology
since Vista (SSDs weren't as inexpensive back then, so it wasn't terribly
well received), and of course there is ZFS with its L2ARC.

The whole "it's not a cache" thing is a canard. Doing it right means doing it
much like swap: you may have dupes in certain cases. Going beyond that
actually hurts performance.

~~~
ryannielsen
_The whole "it's not a cache" thing is a canard. Doing it right means doing it
much like swap: you may have dupes in certain cases. Going beyond that
actually hurts performance._

It's fundamentally _not_ a cache. If it were a cache, Fusion Drive would not
increase the total storage space. Anandtech[1] seems to confirm this, noting
that Fusion Drive creates a 4GB write buffer on the SSD and that read
operations influence a pinning algorithm that moves, not caches, frequently
accessed files from the HDD onto the SSD.

ReadyBoost is exactly a cache, as is ZFS's ARC and L2ARC.

Now then, regarding the statement you question… yes, it is incorrect.
ReadyBoost and L2ARC are pre-existing examples of "hybrid models of blending
SSD and mechanical disks" that "integrate with the OS" and do so quite
intelligently. You are correct to call out that statement. But it really does
appear that Fusion Drive is not a cache, but a true unioning of SSDs and HDDs
managed by the OS to ensure hot files are on the SSD whenever possible.
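
A rough sketch of that promote-on-read behavior (the 4GB buffer figure is
from Anandtech; the threshold and names here are invented for illustration):

    from collections import Counter

    WRITE_BUFFER = 4 * 2**30  # 4GB SSD write buffer (per Anandtech)
    PIN_THRESHOLD = 8         # invented: read count before promotion

    reads = Counter()

    def on_read(path, ssd, hdd):
        reads[path] += 1
        if path in hdd and reads[path] >= PIN_THRESHOLD:
            ssd[path] = hdd.pop(path)  # moved, not cached: one copy exists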

[1] http://www.anandtech.com/show/6406/understanding-apples-fusion-drive

~~~
rhplus
The hybrid drive feature in Vista was Ready _Drive_, not ReadyBoost. It never
took off due to lack of driver support from manufacturers and/or Microsoft.
Even so, you're right that technically it's a cache, because copies are stored
on both physical drives backing the logical one. I just wanted to point out
that ReadyBoost was about using additional drives as caches, while ReadyDrive
was about using hybrid drives.

<http://en.wikipedia.org/wiki/ReadyDrive#ReadyDrive>

<http://msdn.microsoft.com/en-us/library/windows/hardware/gg463388.aspx>

------
ChuckMcM
I wonder if they can do a write log to the spinning rust (sequential writes
are pretty fast there, and slow on the SSD) and then periodically resolve the
deltas. I played around briefly with a system which put metadata in a RAM disk
(battery backed up to SSD) and actual data blocks on the disk. There was a
huge benefit in that system of splitting the metadata I/Os from the read/write
I/Os; it really helped overall performance. I look forward to seeing the
whitepaper on this tech if they publish it.
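
Something like this, presumably; a toy sketch of the log-then-resolve idea,
not anything Apple has documented:

    log = []  # append-only region on the HDD: sequential, so cheap

    def write(block_no, data):
        log.append((block_no, data))  # defer the random write

    def resolve_deltas(disk):
        for block_no, data in log:    # replay in order; last write wins
            disk[block_no] = data
        log.clear()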

~~~
cynicalkane
ZFS uses a log-structured filesystem journal, and in my experience can do a
little less than twice as many 'random' writes to a spinning hard drive as
that hard drive is natively capable of. Of course the random writes are
actually sequential.

If you have an SSD log device configured, writes greater than 128k (IIRC) are
sent directly to disk for performance reasons. This threshold is configurable
and probably should be higher for modern SSDs... but anyway it's a neat idea
that has been used in practice.
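
In rough Python, that routing reads something like this (the 128k cutoff is
the commenter's recollection, not a verified ZFS default):

    SLOG_CUTOFF = 128 * 1024  # per the comment above; IIRC, so unverified

    def route_sync_write(data, slog, disk):
        if len(data) >= SLOG_CUTOFF:
            disk.write(data)   # large writes go straight to the main pool
        else:
            slog.write(data)   # small sync writes hit the fast SSD log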

------
vubuntu
It's the poor man's (or rather consumer's) 3PAR!

HP 3PAR, StorSimple, etc. all did this long ago, but at a bigger scale. Apple
is just scaling it down and bringing it in front of the regular Joe (as
regular as Apple customers tend to be) in a non-complex manner.

~~~
cokernel_hacker
Tiered storage has been done before; this is true. However, those are
dedicated systems. If "Fusion Drive" is at the OS level, that means it shares
resources with the rest of the system, which presents more (and different)
challenges.

~~~
vubuntu
StorSimple implements auto-tiering at the block level and presents an iSCSI
target block device transparently to the OS/filesystem layer. It is built on
top of already-available open source Linux iSCSI target software. If Fusion
were done in a similar manner, it would not require any modifications to the
regular OS/filesystem layers at all. The intelligent auto-tiering iSCSI
target portion would sit at the driver level, well isolated from the rest of
the OS/filesystem components.

It probably even has simpler rules for tiering: keep OS files/dirs always on
the SSD, and apply tiering logic to app data using a simple LRU scheme. And
there are only two tiers to deal with. I am no expert in Linux/iSCSI or
auto-tiering, but I bet this is something that HP/3PAR or Dell Compellent
could spin up at very short notice using their existing technologies.

Apple is very good at identifying a complex (and in most cases an existing
but end-user-unfriendly) technology that has been under-utilized in the
consumer space, bringing it down to the consumer level, delivering it in a
consumer-friendly manner, putting a wonderful marketing spin on it with cool
terminology (Retina display, Fusion Drive), and possibly even patenting some
arcane and obvious variations of the main theme.
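
The two-tier policy outlined above fits in a few lines. A toy sketch, with
the one-week threshold and the pinned set invented for illustration:

    import time

    COLD_AFTER = 7 * 24 * 3600  # invented: a week untouched = cold
    last_access = {}

    def touch(path):
        last_access[path] = time.time()

    def demote_cold(ssd, hdd, pinned):
        now = time.time()
        for path in list(ssd):  # pinned OS files never leave the SSD
            cold = now - last_access.get(path, 0) > COLD_AFTER
            if path not in pinned and cold:
                hdd[path] = ssd.pop(path)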

~~~
cokernel_hacker
StorSimple, 3PAR, etc. all appear to be appliances. What I am trying to get
at is subtle, so I will try to be a little bit more explicit.

StorSimple can dominate the appliance it is made out of to provide its
services. Fusion Drive cannot dominate the owner's machine.

~~~
vubuntu
This might be oversimplifying a bit:

- How about keeping all OS files, app files, etc. on the SSD, with no
auto-tiering of those?

- Then some equivalent of a 'cron job + filter driver' combo that moves data
files to and from the SSD and the 1/3TB disk? When I say filter driver, I am
thinking of a Windows file system mini-filter driver. The Linux equivalent
would be VFS, I guess.

How difficult is it to implement something like that in a known two-tier
configuration context? And what would be the effectiveness of such a solution?

In Windows, filter drivers have been used effectively to implement file
tombstoning for archival purposes, utilizing reparse points/extended
attributes to store the remote URI (for recalling file contents from a remote
archive upon intercepting a read request). It's not too hard to write a mini-
filter driver in Windows and use it in conjunction with reparse points to
implement tombstoning in the filesystem.
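
A toy version of that tombstoning pattern; the stub format and recall hook
are invented stand-ins, where a real Windows implementation would use reparse
points rather than magic strings:

    STUB = "TOMBSTONE:"  # invented stub format standing in for a reparse point

    def tombstone(path, remote_uri):
        with open(path, "w") as f:
            f.write(STUB + remote_uri)    # replace contents with a pointer

    def read_with_recall(path, fetch):
        with open(path) as f:
            content = f.read()
        if content.startswith(STUB):
            content = fetch(content[len(STUB):])  # recall from the archive
            with open(path, "w") as f:
                f.write(content)                  # rehydrate the real file
        return content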

~~~
cokernel_hacker
Mini-filters are FS-level, as are VFS solutions. How do you tell a thing that
sits at the VFS level about multiple devices? You could probably do it;
nothing storage engineers face is unsolvable :)

It certainly seems unnatural.

Most of those appliance solutions (and probably Fusion Drive, for that
matter) implement their mojo at the block level and provide block device
interfaces. This aids their seamlessness, makes them a lot more
straightforward to implement, and makes them a lot more flexible. I would
have trouble believing it would be simpler to implement at a mini-filter/VFS
level.

------
Derbasti
I think it's funny how Apple tries to solve its file system issues with SSDs.
No other popular file system will get quite as unbearably slow when in heavy
use as HFS+. No other file system will just randomly create errors and little
corruptions here and there in regular use. No other popular OS will come to a
complete halt as soon as it starts swapping because its file system can't
handle it.

And instead of fixing that stuff, Apple is pushing SSDs to everyone. Way of
the future.

~~~
taligent
Sorry but none of what you said is true.

There may be bugs now and there have been bugs in the past. But there is no
systemic "this FS will lose data under heavy use" design fault. If there
were, millions of people would have been affected.

And of course Apple is pushing SSD to fix a FS bug, not because it is 10x
faster and 1/4 the size of a mobile HDD.

~~~
Derbasti
HFS is producing errors all the time. Run Disk Utility to see them. They may
not be fatal, but they happen.

Apple computers with HDDs do get unusably slow once they start swapping. So
much so that it is usually impossible to terminate the memory-hungry
application. With SSDs the computer becomes usable again. Just ask some
plastic MacBook users.

So, what exactly is untrue about what I said? Just claiming that it is not is
not very productive.

~~~
ryannielsen
_So, what exactly is untrue about what I said?_

Potentially nothing, but it is all anecdotal. Where's the widespread or
researched evidence of HFS (especially in its HFS+ or HFSX variants) being a
slow and easily corruptible file system?

It's not like HFS is a rare and infrequently used file system – it's the
primary filesystem for all Macs since System 3, for many iPods, and for all
iOS devices.

The scaling limits present in HFS+ will rarely be hit by most users and,
while it's not as resilient as more modern FSs, there are no inherent fatal
design flaws in HFS that I'm aware of. Perhaps you've pushed HFS+ beyond its
limits, or
have usage patterns that trigger a rare fatal bug, but just claiming that
something is true is not very productive either.

~~~
Derbasti
On many occasions, I have tried to load some data set that would not fit in
memory, usually by accident. If I do that on Windows or Linux, the computer
becomes slow, I cancel the operation, and everything is well again. Same with
OS X and an SSD. On OS X with an HDD, the computer becomes completely
unresponsive with no recovery beyond a forced reboot.

This happened particularly frequently with virtual machines and/or Matlab,
both of which can very easily exhaust any amount of memory. But maybe no one
else is doing stuff like that.

~~~
r00fus
> This happened particularly frequently with virtual machines and/or Matlab,
> both of which can very easily exhaust any amount of memory. But maybe no one
> else is doing stuff like that.

I used to do a lot of heavy VM work on OS X, back in the early days of VMware
Fusion v1 and later. I never had the OS become unstable due to a storage
bottleneck/thrashing.

Not that OSX is unbreakable, but it's been a whole lot better/more stable than
my experience with Windows. At least on the Mac, a large chunk of the drivers
are supported by the folks who wrote the OS.

------
Mythbusters
Seems like an interesting choice not to use the fast disk as a cache. This is
the design that Windows 7 and beyond went with.

<http://technet.microsoft.com/en-us/magazine/ff356869.aspx>

------
tomkinstinch
Reminds me of Seagate's Momentus XT, except it sounds like the file management
is done in the OS and not on the drive, and that the SSD is not being used as
a cache.

http://www.anandtech.com/show/3734/seagates-momentus-xt-review-finally-a-good-hybrid-hdd

I'm guessing the SSD portion will be MLC?

Are there any other hybrid drives like this in the wild?

~~~
sehugg
I'm surprised that the benchmarks for the Momentus XT are so good, especially
since it only has 4 GB (8 for the newer model) of SSD. It would seem that in
theory it would be better to just add 4 or 8 GB of DRAM to your main system,
but apparently the drive only caches non-sequential data that isn't likely to
be read quickly from the platters. I'm thinking of trying one on my aging
MacBook Pro.

Anyway, the Apple approach should be better because of the bigger SSD and the
more intelligent caching available at the OS level.

~~~
yeureka
I installed the 750GB/8GB Momentus XT on my late 2008 MBP and it has been a
breath of fresh air. Previously I had a 500GB WD Scorpio Black. That was fast
but it vibrated a lot and I think battery life was worse than with the
Momentus. I mainly use the machine for Xcode and Visual Studio, and now I am
CPU bound.

------
epistasis
I wonder if this was implemented as a combination of LVM with an extended
area for Hot File Clustering [1], which currently reserves only 0.5% of
spinning boot disks for small files with a high temperature. Lots to
speculate about, little to know at this point, it seems.

[1] <http://osxbook.com/book/bonus/misc/optimizations/#THREE>

------
owenfi
I wonder if it's this:
<http://en.wikipedia.org/wiki/Smart_Response_Technology>

Oh, it's probably not, because that page says the maximum size is 64GB.

(I work @ Intel and have no inside knowledge about this feature at all.)

------
bane
This is really cool tech. Not only does it solve the problem of "having more
stuff than I can fit on an SSD I can afford" but also "long-term storage that
I still need periodic random access to, where I/O performance is not
paramount but, heavens no, it needs to be faster than a tape backup".

