The UI stuff is great, but the tricky bit about building a storage system is not provisioning it or getting the access protocols right; it is finding all the ways that data can be destroyed (both silently and noisily) and guarding against them. So if you want to stick with the Enterprise target, you need something like the ZFS On Linux page that describes every way you can get data zapped and how you will prevent that from happening.
If you just want to be an off-the-shelf "hey, here's something that will turn your access point into something like a NAS device," then you get to lose data when a disk goes bad, or a memory chip goes bad, or a network cable is loose, or the power supply cuts out, or the cat knocks it off the table, etc.
I've started to do some DR testing myself, but it will take a little while to publish our findings and recommendations.
NetApp tracks it with their Nearstore product line, which used SATA drives in a NAS box (they have been for a while actually; when I left they had data on about 65 million drive hours), and while Seagate quotes it at 1 in 10^15 bits, it's actually closer to 5 in 10^15 bits. A 3TB drive holds about 3x10^13 bits of data (closer to 3x10^14 when you account for track markers and error recovery bits).
If you're bored some time, try reading every sector from one of these drives. To maximize your chance of success, operate the drive at a slightly warm temperature (it keeps the lubricant from sticking), isolate it from vibration, and read sequentially rather than randomly. (You'll get some arm servo movement anyway, because the drive will have remapped some blocks to spares, but minimizing seeks also keeps vibration down.)
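If you want to try it, here's a minimal sketch of such a full-surface read in Python; the device path is a placeholder (you'd need root and the right device for your system), and it just reads the raw block device front to back in large chunks, noting any regions that fail:

```python
import sys

CHUNK = 1024 * 1024  # read 1 MiB at a time, front to back

def scan(dev):
    """Read the whole device sequentially, recording unreadable regions."""
    errors = []
    offset = 0
    with open(dev, "rb", buffering=0) as f:
        while True:
            try:
                data = f.read(CHUNK)
            except OSError as err:
                errors.append((offset, err))  # an unreadable region
                offset += CHUNK
                f.seek(offset)                # skip past it and keep going
                continue
            if not data:
                break
            offset += len(data)
    return errors

if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdX"  # placeholder path
    bad = scan(dev)
    print(f"{len(bad)} unreadable region(s) on {dev}")
```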
Long before it became an issue on single drives, like it is today, it was an issue when trying to reconstruct a RAID 4 (or 5) group that was 3.5TB (which at the time was a 7-disk RAID group of 0.5TB drives). 14-disk groups (a full shelf) were pretty much guaranteed to see a second error in the shelf during reconstruction, which is also why RAID 6, or dual-parity RAID, became a must-have enterprise feature back in 2005 or thereabouts.
On an interesting side note, because the chance of hitting an unrecoverable read error is evenly distributed through a drive, 3x replication is still recoverable even with intermittent read failures. There isn't really a RAID number for that, but it works reasonably well and avoids a pesky parity calculation if you embed check data in your blocks, as they do in GFS.
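As a rough illustration of that embed-the-check-data idea (a sketch only, not GFS's actual on-disk format), assume a simple CRC32 stored alongside each block:

```python
import struct
import zlib

def pack_block(payload: bytes) -> bytes:
    """Prepend a CRC32 so every block carries its own check data."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack(">I", crc) + payload

def unpack_block(block: bytes) -> bytes:
    """Return the payload if the stored CRC matches, else signal a bad read."""
    (crc,) = struct.unpack(">I", block[:4])
    payload = block[4:]
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        # with 3x replication you would simply re-read this block
        # from one of the other two replicas
        raise ValueError("checksum mismatch")
    return payload
```

A reader that hits a mismatch just falls back to another replica; no parity math is involved.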
https://www.usenix.org/legacy/publications/library/proceedin... -- Peter Corbett's paper (he's the guy who invented NetApp's dual-parity system). From that paper:
"Disks protect against media errors by relocating bad blocks, and by undergoing elaborate retry sequences to try to extract data from a sector that is difficult to read . Despite these precautions, the typical media error rate in disks is specified by the manufacturers as one bit error per 1014 to 1015 bits read, which corresponds approximately to one uncorrectable error per 10TBytes to 100TBytes transferred. The actual rate depends on the disk construction. There is both a static and a dynamic aspect to this rate. It represents the rate at which unreadable sectors might be encountered during normal read activity. Sectors degrade over time, from a writable and readable state to an unreadable state."
And experience from the field puts it at about one error per 15TB transferred, so 3TB into 15TB: about one in five.
[1 - (1 - 5 × 10^-15)^(3 × 10^12)] ≈ 0.01488...
If your 20% figure is accurate, the actual uncorrectable bit error rate would need to be something like 7 in 10^14. I am not disputing your empirical information, but your numbers do not agree with it. The difference between what your numbers say and what you report is only about one order of magnitude. Doing the statistics on better records could help identify the cause of that gap.
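For what it's worth, here is the arithmetic behind both numbers written out, using the same figures as above (5 in 10^15 per bit, 3 × 10^12 bits read); log1p/expm1 just avoid floating-point cancellation in the otherwise straightforward 1 - (1 - p)^n formula:

```python
import math

bits_read = 3e12   # figure used in the calculation above
ber = 5e-15        # 5 uncorrectable errors per 10^15 bits

# P(at least one unrecoverable error) = 1 - (1 - ber)^bits_read
p_fail = -math.expm1(bits_read * math.log1p(-ber))
print(f"P(error) at 5e-15: {p_fail:.5f}")         # matches the ~0.01488... above

# Back-solve: the per-bit rate that would give a 20% failure probability
implied_ber = -math.expm1(math.log1p(-0.20) / bits_read)
print(f"rate implied by 20%: {implied_ber:.1e}")  # ~7.4e-14, i.e. ~7 in 10^14
```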
But you don't...
Once I won a bet that RAID 1 was actually faster than RAID 0 in a particular scenario.
1. Can we enumerate the data loss scenarios?
2. How is drive failure handled?
3. How may data be corrupted and such corruption detected?
4. For every data loss scenario, what is the recovery procedure?
Of course, there is a wealth of information on such questions for standard RAID, but I would suggest, for marketing purposes, that Rockstor synthesize the available information (from the many relevant layers of data management) in a coherent fashion, specific to their product. It doesn't have to be deep, but it should be at least minimally comprehensive and broad, with pointers to more detailed, layer-specific information.
Also, it's fine if the recovery procedure is "restore from backup" for, e.g., the scenario where data is deleted by an authorized user. If so, there should be at least a minimal "backup story".
We recently added appliance <-> appliance replication, which can play an important role in recovering from bigger disasters.
We'll have all that documented. Please feel free to participate on our github.
I also administer a FreeNAS box for a small business and this stuff is rock solid; I only wish there were an _easy_ way to get the permission stuff right in a multi-user setting.
Nonetheless, thumbs up for creating this, cool stuff!
Also, the fsck tool is still very immature. It takes many years to get good at detecting and recovering from corruption.
Be prepared to update your kernels and tools often, independently of your vendor. Btrfs-progs will likely need to come from the git repo, so building your own packages for distribution to your production nodes will probably be necessary too.
A word of caution: do not run btrfsck without consulting the wiki and mailing list first, and ideally not unless you know exactly what you are doing. There are situations you'll encounter that do not need btrfsck to repair (other tools are the right fix instead), and running it can make a recovery less likely.
FWIW, I have been watching the list for years, and reading regularly for about 6 months trying to get a sense of stability with respect to the features I want.
I would not put btrfs in production yet, though likely soon... I'd guess another year or so.
New kernels should be no problem, as Ubuntu will likely provide an HWE stack in the future, and btrfs-tools is in a well-maintained PPA...
Damn, I should have pushed ZoL through.
I have been waiting and watching for a long time for most of these "new" filesystem features (pools, fs-level RAID, checksums, send/receive), but I am a "filesystem conservative" (especially in production; less so on my own machines) -- I'll keep waiting a while longer. On production Linux today, I stay with EXT4 or XFS.
Can you elaborate on the permission-stuff issue you have? We'd love to get it right in Rockstor.
You can't remove raidz vdevs from zpools, but `btrfs device delete` exists.
I'm currently running Freenas with ZFS.
Would be curious to see how this compares.
The one thing missing for me on FreeNAS is some kind of file search/indexing feature.
I wonder if the fact that this is Linux based will make adding something like that easier.
Hmm, stats - I don't know much about this topic, but I'd be keen to hear more about what's possible.
On the file indexing front, I think Recoll and Tracker/MetaTracker are the two most active projects - Recoll being the more active one. Strigi and Beagle are both discontinued.
I've had the opportunity to install Rockstor on various HP Gen7 and Gen8 servers and had no problems.
I witnessed Rockstor install just fine on an old Isilon node and was told that the performance was quite good -- sorry I have no specifics.
> Since Apple has supported SMB for a long time, and actually made it the default
> protocol in 10.9, is there much need for AFP?
I'm using OSX 10.9.4, and I've seen better performance over AFP than with SMB.
So yes, it'd be nice to have AFP support.
We changed the key in our live demo, but for our users we'll roll out the fix in the next update. As part of that fix, we'll also remove the key file from git.
I think that's a reasonable plan. Hope I am not missing something.