Z-410: How ZFS is slowly making its way to Mac OS X (arstechnica.com)
138 points by thehigherlife on Mar 18, 2011 | 30 comments

This. I want this!

Hard disks are the only part of a computer whose contents aren't replaceable. Anything that improves data integrity improves computer usage as a whole, and improves user confidence in the machine. This is something OSes should have done years ago. All any user ever does on a machine is manipulate and work with their personal data; absolute data integrity should be paramount to any computing technology.

Alas, it is not, and we all have (hopefully) several backups. Again: Anything that improves data integrity is improving computing as a whole. I want this.

"16 exabytes ought to be enough for anybody"

> "There's a huge chasm between using Xsan over Fibre Channel and a USB drive with Time Machine," Brady told Ars. "That middle piece is what we're looking at—users that want the convenience of a device like a Drobo, but with more reliability and [easy verifiability]."

So, ultimately, this is a hardware play, then. That makes sense, because most of the features of ZFS don't add any value to a Mac with a single internal drive and at most one or two USB drives, which is how almost all Macs are used.

> most of the features of ZFS don't add any value to a Mac with a single internal drive

My favorites:

    * Data reliability (checksumming, etc...)
    * Compression
    * Lots and lots of undo
    * Including undoing entire OS upgrades
    * Cheap and quick partitioning
    * Easy disk replacements
I've made use of all of these things this week. It happens that the disk I replaced is part of a two-disk flat pool, but old disk out, larger new disk in and zero downtime is awesome.
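The zero-downtime disk swap described above is a couple of commands; a sketch with hypothetical pool and device names:

```shell
# 'tank' is a hypothetical two-disk flat pool; da1 is the old disk, da2 the
# larger replacement. ZFS resilvers onto the new disk while the pool stays online.
zpool replace tank da1 da2
zpool status tank    # watch resilver progress; da1 can be pulled when it's done
```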

I don't recall the details of a nice article I read a few years ago, mainly about RAM and ECC, but as storage size grows, the probability of any one of its bits flipping "by itself" rises. Data reliability and self-correction become an essential part of storage.

I've played a bit with ZFS myself and really enjoyed it. It's like the sum of everything we learned about filesystems and storage and previously used to shoehorn into various historical filesystems, only this time it's a clean and simple implementation.

I think the term you're looking for might be "adjacent track erasure", which is the idea that the magnetic fields of adjacent tracks on a hard drive interfere, so if one track is written frequently (such as a file system metadata block), the tracks that are physically adjacent on disk may eventually have some bits flip. It's hard to find concrete numbers on whether this happens, or how often, but it's something that people are getting more concerned with on the higher-density drives that are being made now. Of course, ZFS makes the issue pretty much moot.

Except you could do better-than-time-machine, cheap, continuous replication.

You could even TimeMachine-without-additional-drives... I mean, why should my snapshot system and my backup system be one? How about letting me use TimeMachine locally, and if I plug in an additional drive, TimeMachine asks me if I want to backup to it as well?

Not a game-changer exactly, but a nice-ish upgrade potentially.

The biggest upgrade would honestly just be quicker/continuous replication that doesn't drag the rest of your system down.

> You could even TimeMachine-without-additional-drives... I mean, why should my snapshot system and my backup system be one? How about letting me use TimeMachine locally, and if I plug in an additional drive, TimeMachine asks me if I want to backup to it as well?

Unless something changes between now and RTM, Lion will provide that feature out of the box.
And that's about the only feature that a typical Mac user would benefit from with ZFS, without some kind of "prosumer" or "enterprise-y" external storage device.

I almost always use lzjb compression. It's nearly free and gives good returns for mixed data like the sort you'll find on system drives. Binaries, text, etc.
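Turning on lzjb is a one-liner per dataset; a sketch with a hypothetical dataset name:

```shell
# Enable lightweight lzjb compression on a hypothetical dataset.
# Only newly written blocks are compressed; existing data is untouched.
zfs set compression=lzjb tank/home
zfs get compressratio tank/home    # see what it's actually saving you
```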

Also the reliability aspect. I haven't had HFS+ go bad on my own systems, but I've seen it several times on servers. It's on about the same level as ext3fs in that arena, which is pretty bad in my book.

I get the "less is more" argument and I agree generally. ZFS is so far beyond a traditional FS though.

One of the big reasons for Time Machine is to make it harder for Mac owners to lose data, either through hard drives failing, or user error. Backing up to the same drive only protects against user error. Instead of displaying a checklist of things you're protected against, Time Machine is an all-or-nothing deal, which makes the interface a lot simpler.

It also gives retail locations extra options to make users happier, quicker: if a customer comes in pissed with a logic board problem that's the "third strike" for that computer, the support staff can simply ask if the customer has Time Machine backups, and send the customer away with a new-in-box computer without having to worry about moving drives around and migrating files in-store.

Apple's vertical enough that keeping Time Machine simple saves them employee hours around the world every day.

Lion actually makes local snapshots by default, but the option to turn this on and off is quite hidden, and even with local snapshots enabled, the huge on/off switch for Time Machine in the preferences shows that Time Machine is turned off. All other Time Machine behavior is unchanged. This is, I think, quite a clever solution to the problems you were mentioning.

(This might still change before they release Lion, though.)

Lion does this by default.

Does ZFS do anything to optimize for SSDs?

There are three ways you can use SSDs with ZFS. First, you can use SSDs in place of rotating rust. Second, as was previously mentioned, you can use an SSD as an extension of your filesystem cache. That's called Level-2 Adaptive Replacement Cache (L2ARC). You can also use an SSD as a logging disk to accelerate writes; that's called "Logzilla", IIRC.

In commercial offerings built around these features, you typically source an SSD that is either write-biased (logzilla) or read-biased, depending on what you are trying to do.
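Both kinds of SSD are attached with `zpool add`; a sketch with hypothetical device names (ada1 a read-biased SSD, ada2/ada3 write-biased):

```shell
# L2ARC: spill the in-RAM ARC cache onto a read-biased SSD.
zpool add tank cache ada1

# Separate intent log ("logzilla"): absorb synchronous writes on fast,
# write-biased SSDs, mirrored since a lost log is painful to recover from.
zpool add tank log mirror ada2 ada3
```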

Just some formal clarifications:

Logzilla's formal name is the ZIL, the ZFS Intent Log. Generally these are SLC flash SSDs, mirrored, as losing the ZIL on a ZFS pool can lead to "interesting" recovery situations.

L2ARC is basically an extension of main memory, used to cache data from the drives. If you lose the L2ARC, there aren't any serious consequences, so it's usually implemented with less expensive MLC flash SSDs.

On a related note, Seagate sells a hybrid SSD+rust drive called the Momentus XT, which uses its 4GB of flash in a similar manner to the L2ARC.

I'm pretty sure the ZIL is not mirrored. If it were, it wouldn't increase performance since you'd still have to wait for the disks to fsync their copy to guarantee reliability.

This is why Sun Storage products (and Nexenta, or homegrown clusters) put the ZIL SSDs in the JBODs and not the storage heads.

You could of course create a pool that used a disk to mirror the ZIL, but again, there'd be no point, since the mirror has to be kept consistent; otherwise it's worthless.

Since the cache (L2ARC) is just cache, and you can lose it at anytime without data-loss, it is in the heads.

Nit-picking. To be clear, the ZIL can (and should) be mirrored; you just want to do it with SSDs consistently. Mixing devices is possible, but you'll be limited to the performance of the slowest device. You can also create storage clusters with multiple heads. Again, a good idea IMO, but the ZIL is a critical component of the FS: if you lose it, you lose the whole pool (as a rule, I think, though there might be clever hacks around that under special circumstances if you're lucky). So if you're going to multi-home your zpool, make sure the ZILs are in the shared-storage region. It's no good having redundant controllers if your ZIL is on the dead one.

Yes. ZFS has the concept of storage pools--what used to be called hierarchical storage management. So it can deal explicitly with storage in which some members of the pool, e.g. SSDs, are faster than others. However, this is basically a server concept. It has very little to do with how MacBooks might be used.

Its data dedupe feature is only really fast enough to use on SSDs.

It also takes a lot of RAM: 380 bytes for each de-dupe entry (block pointer), IIRC? I think it translated to something like 1GB of RAM per TB of storage, assuming half of it was de-duped.

It's really a useless feature when you do the math. Unless you're running a small amount of storage with a very VERY large amount of basically identical volumes (think Amazon) you'll spend more on memory than you did on disk and once you get to that point you'd have almost certainly been better off spending that money on SSDs and using any extra memory for primary cache.
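A rough back-of-the-envelope using the ~380 bytes/entry figure quoted above (the 128 KiB record size and fully-deduped pool are assumptions; larger records or less dedup shrink the table, which is one way to land near the ~1GB/TB figure):

```python
# Estimate the RAM needed to keep the dedup table resident, using the
# figures from the thread (~380 bytes per entry). Record size and pool
# size are assumptions for illustration.
BYTES_PER_DDT_ENTRY = 380
RECORD_SIZE = 128 * 1024      # assume 128 KiB records
POOL_SIZE = 2 ** 40           # 1 TiB of (fully deduped) data

blocks = POOL_SIZE // RECORD_SIZE          # number of block pointers
ddt_ram = blocks * BYTES_PER_DDT_ENTRY     # bytes of RAM to hold the DDT

print(blocks)                  # 8388608 blocks
print(ddt_ram / 2 ** 30)       # ~3.0 GiB of RAM per TiB, under these assumptions
```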

That's my point: it's very useful with SSDs, because you don't need all the RAM, because SSD is so fast at random seeks.

I forget the exact names, but go with it. :-)

ZFS has two tables. The first is the block pointer table: hashes of blocks, kept in RAM to let you know where on disk you'll find a particular block. It's also referenced during writes to ensure that the COW operations are transactional.

The DDT (dedup table) is similar, but points to de-duped blocks. You need enough RAM to keep this in memory at all times; otherwise, for every write you'll have to scan through the entire table on disk, find matching blocks, then modify it, and you'll also have to scan through it on every read to see if the block you're looking for is de-duped.
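The core idea of that table can be sketched in a few lines. This is a toy illustration of content-addressed dedup with refcounts, not ZFS's actual on-disk DDT format; all names here are made up:

```python
import hashlib

class DedupTable:
    """Toy dedup table: maps a hash of block contents to one stored copy
    plus a reference count. A write of duplicate data stores nothing new."""

    def __init__(self):
        self.blocks = {}  # hash -> (data, refcount)

    def write(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self.blocks:
            stored, refs = self.blocks[key]
            self.blocks[key] = (stored, refs + 1)  # duplicate: just bump refcount
        else:
            self.blocks[key] = (data, 1)           # new block: store one copy
        return key

    def read(self, key: str) -> bytes:
        return self.blocks[key][0]

t = DedupTable()
k1 = t.write(b"same block")
k2 = t.write(b"same block")   # deduped: same key, refcount 2, one copy stored
print(k1 == k2, len(t.blocks))
```

The lookup-on-every-read-and-write in `write()` is exactly why the real table has to stay in RAM: push it to spinning disk and each I/O pays an extra seek.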

I've probably mixed up the details. I haven't read the source, just used it. But that's my (maybe flawed) understanding of how it works.

Bottom line, though: you DO need enough RAM to keep the DDT in memory. If you don't, you'll see SEVERE thrashing, and SSD or no SSD, your disk performance will slow from thousands of IOPS to tens of IOPS or lower.

This exact thing happened to us with Sun's 7310, loaded with the Sun specified SSDs. You need to be very sure you have the hardware for de-dupe, and even then the payoff is so small...

The only real use-case I can think of is if you're running a VPS.

AFAIK, ZFS does copy-on-write updates, so that's a slight form of wear levelling.

So yes it does, but unintentionally.

L2ARC lets you use SSD as a disk cache. I wonder how well it works with an SD card in a MBP...

Newer Macs use PCI-E for the SD card interface, so it could potentially work, though you'd need a very fast SD card sticking out the side of your Mac all the time.

We've seen in the past that Apple is willing to adopt superior technology. They saw the potential of adopting a unix system as well as adopting an Intel chip. This allows them to provide the same user experience while providing those users with power, should they want it.

Nice. I just started using zfs on a home linux server. We have some large Macs, too; it would be nice to have better data integrity there.

Try it on FreeBSD or Solaris 11; in general I have found it to perform better on the same hardware.

After many, many disk benchmarks, we eventually found that ZFS had the best performance of all the file systems included in the latest version of FreeBSD. I highly recommend the combination.

Now Apple just has to buy them and make it standard.

I think the article explains why this isn't going to happen.
