We were running some services on a stack of older Dell servers with mostly identical hardware. The RAID controller and mainboard on one of them let the magic smoke out. We (a sysadmin team of three) promptly decided to move the RAID array to another unused server in the stack in order to avoid a lengthy reinstall and manual restoring of daily backups – this happened before the almost-one-click provisioning of devops and virtualization tools was the norm.
The other server lacked the RAID license dongle that sat on a pin header on the mainboard, so we just tried to cannibalize it from the server corpse that had barely stopped smoking. The new mainboard promptly rejected the license because the service tag differed, and the only way to change the tag was to install Windows 95 or 98 and run the tag changer tool. In addition, we had to downgrade the controller firmware to match that of the old controller. One nightlong unattended rebuild later, the server was up and running the next morning.
When we upgraded the server hardware a bit later, we migrated it over to mdraid and haven't had any problems with hardware failures since.
Moral of the story: If you are depending on certain hardware features, you need to have a cold spare available with identical hardware and firmware. If you cannot afford one, it's better to use a software implementation, have a really good service contract with the vendor, or just let a VM provider take care of your infrastructure.
+1 for this. Hardware RAID absolutely has its place, but if you expect to be able to rebuild an array without restoring from a backup, you absolutely need to have spare parts that are tested in advance.
Also, RAID is not a backup. RAID provides high(er) availability. You need to have backups still!
Absolutely true. In this case we underestimated the workload caused by the service tag and firmware version issues and the fact that the new controller wanted to check and rebuild the array, when the disks themselves should have been fine. In hindsight it might have been faster to reinstall the OS and services plus restore the backups.
You can have nice redundant storage for local performance and resilience, and even geographic replication to survive the loss of a single site, but you still need snapshots of some kind to protect against destructive updates being faithfully replicated everywhere.
No, please don't say something like that if you love your data. RAID is for the availability of your data; a backup is the secure storage/versioning of your data. Two completely different things.
For example, my irreplaceable photos (probably around 3TB worth) get packed into year-based TAR files, then uploaded to Glacier Deep Archive. It's only around $3 USD a month.
I won't have to care about retrieval cost for the moment, because I think I'll only need to touch it in the event of a catastrophe.
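For anyone wanting to script that kind of upload, here's a minimal sketch with boto3 (the bucket and key names are placeholders, not from the comment above):

```python
import boto3

BUCKET = "my-photo-archive"  # placeholder bucket name

s3 = boto3.client("s3")

def archive_year(tar_path: str, year: int) -> None:
    """Upload one year's tar of photos straight into Glacier Deep Archive."""
    # upload_file handles multipart uploads automatically for large archives.
    s3.upload_file(
        tar_path,
        BUCKET,
        f"photos/{year}.tar",
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
    )

# Rough cost check: Deep Archive storage is about $0.00099 per GB-month,
# so 3 TB ~ 3072 GB * $0.00099 ~ $3/month, which matches the figure above.
```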
Most media files and Linux ISOs don't get backed up at all.
BTW: For tape it's REALLY important to have a steady data flow. Anything too slow, like USB, is a big NO-NO; you NEED something like SAS/SATA or Fibre Channel. The tape can stop and wait for new data to be written, but it shouldn't have to.
EDIT: Well, then look at LTO-5 (double the capacity)... LTO-6 is probably still too expensive.
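Some rough numbers as a sketch of why this matters (the exact speeds are approximations and vary by drive and bus):

```python
# Approximate figures only; check your drive's specs.
LTO5_NATIVE_MBPS = 140      # LTO-5 native (uncompressed) write speed, ~140 MB/s
LTO5_MIN_STREAM_MBPS = 45   # roughly the slowest rate the drive can speed-match down to
USB2_EFFECTIVE_MBPS = 35    # real-world USB 2.0 throughput
SATA_SSD_MBPS = 500         # a single SATA SSD staging area

def can_stream(source_mbps: float) -> bool:
    """Can the source keep the tape streaming, or will it stop-start ('shoe-shine')?"""
    return source_mbps >= LTO5_MIN_STREAM_MBPS

print(can_stream(USB2_EFFECTIVE_MBPS))  # False: USB 2.0 can't even reach the drive's minimum
print(can_stream(SATA_SSD_MBPS))        # True
```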
Agreed that identical or confirmed working spare components are important.
I've migrated RAID arrays through several generations of Areca cards successfully and even tested back and forth between models. I like them.
Also remember RAID is not backup; keep a second copy of your data around (offsite if you're serious).
I think I had a RAID card with two mirrored drives. The thing is, when one drive failed, I couldn't just pull the data off, because the remaining drive could only be accessed by that RAID controller.
I'm very surprised you went through the trouble of building such an amazing NAS and skipped ZFS! Definitely give it a try. It's especially useful for combating disk rot, not to mention how easy it is to set up and use.
However, do some include support for CRC? That would allow not only detecting bitrot, but (given some redundancy) also correcting it.
I assume that's what RAID6 does, albeit less granular than per file, and more aimed at recovering a full file than a few bits?
Can anyone point me to a FS that supports it, or to the reason it's not done?
As for ZFS, I think I'll keep avoiding it until it is mainlined. I don't really want to play that game.
The next thing I will try out is dm-integrity.
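In the meantime you can get the detection half at the application level with a checksum manifest. A minimal sketch (the manifest name and the "build"/"verify" CLI are arbitrary choices, not an existing tool):

```python
import hashlib
import json
import os
import sys

MANIFEST = "checksums.json"  # arbitrary manifest file name

def sha256(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    return {
        os.path.relpath(os.path.join(dirpath, name), root): sha256(os.path.join(dirpath, name))
        for dirpath, _, names in os.walk(root)
        for name in names
    }

def verify(root):
    with open(MANIFEST) as f:
        recorded = json.load(f)
    for rel, digest in recorded.items():
        path = os.path.join(root, rel)
        if not os.path.exists(path):
            print(f"MISSING {rel}")
        elif sha256(path) != digest:
            print(f"BITROT? {rel}")  # contents changed without you touching the file

if __name__ == "__main__":
    cmd, root = sys.argv[1], sys.argv[2]  # "build <dir>" or "verify <dir>"
    if cmd == "build":
        with open(MANIFEST, "w") as f:
            json.dump(build_manifest(root), f, indent=1)
    else:
        verify(root)
```

Of course legitimately edited files get flagged too; ZFS and dm-integrity checksum at write time, per block, so they can tell the difference, and correction still needs a second good copy (mirror, parity, or backup).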
Hacker News discussion: https://news.ycombinator.com/item?id=24372662
I went for a Chenbro SR301 case (I don't know what's changed in the still-available SR301 Plus model) + modest commodity parts over the HP or Supermicro turnkey boxes in a similar form-factor mostly because it takes standard components (mini-ITX boards, ATX PSU, standard fans... nothing I couldn't overnight from any major vendor for cheap) for easy replacement. It was also a little cheaper.
I did btrfs "RAID10" instead of hardware RAID. Hardware RAID is always a hassle, btrfs is in-kernel, and it has all the nice modern data integrity features (two spindle rule, a timer that scrubs it periodically to verify checksums, etc.). I've been really pleased with that, except for the lack of good automated monitoring/notification tools for btrfs.
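On that monitoring gap, a small cron-able sketch that mails you when any btrfs device error counter is non-zero. The mountpoint and address are placeholders, it assumes a local MTA, and the parsing assumes the usual `btrfs device stats` "name value" output lines:

```python
import smtplib
import subprocess
from email.message import EmailMessage

MOUNTPOINT = "/mnt/pool"        # placeholder mount point
ALERT_TO = "admin@example.com"  # placeholder address

def btrfs_error_counters(mountpoint):
    """Parse `btrfs device stats <mountpoint>` into {counter_name: value}."""
    out = subprocess.run(["btrfs", "device", "stats", mountpoint],
                         capture_output=True, text=True, check=True).stdout
    counters = {}
    for line in out.splitlines():
        if not line.strip():
            continue
        name, _, value = line.rpartition(" ")
        counters[name.strip()] = int(value)
    return counters

def main():
    bad = {k: v for k, v in btrfs_error_counters(MOUNTPOINT).items() if v != 0}
    if bad:
        msg = EmailMessage()
        msg["Subject"] = f"btrfs errors on {MOUNTPOINT}"
        msg["From"] = ALERT_TO
        msg["To"] = ALERT_TO
        msg.set_content("\n".join(f"{k}: {v}" for k, v in sorted(bad.items())))
        with smtplib.SMTP("localhost") as s:  # assumes a local MTA is listening
            s.send_message(msg)

if __name__ == "__main__":
    main()
```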
Documentation I wrote at the time, more about the backup scheme than the machine, is at https://pappp.net/?p=1627
I'm still using that setup; eventually I'll need to swap some discs as capacities increase, but that machine has required zero fucking around since it was set up.
I found it very strange the guy is happy with & measured 100 Mbps throughput over the network. Any 2+ drive NAS from the last 5 years should be able to saturate a 1 Gbps link.
EDIT: He upgraded to a Xeon, so forget all of the above.
RAID6 makes sense for 5-8 drives.
With RAID10, if the 'wrong' two disks fail, the array will go offline.
With RAID6, any two disks can fail and the array will still be online.
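A tiny sketch that makes the difference concrete for an 8-disk array:

```python
from itertools import combinations

def fatal_two_disk_failures_raid10(n_pairs):
    """Count two-disk failure combinations that take down a RAID10 of n_pairs mirror pairs."""
    disks = [(pair, side) for pair in range(n_pairs) for side in (0, 1)]
    fatal = sum(1 for a, b in combinations(disks, 2) if a[0] == b[0])  # both halves of one mirror
    total = len(list(combinations(disks, 2)))
    return fatal, total

fatal, total = fatal_two_disk_failures_raid10(4)               # 8-disk RAID10
print(f"RAID10: {fatal}/{total} two-disk failures are fatal")  # 4/28, about 14%
print(f"RAID6 : 0/{total} two-disk failures are fatal")        # any two disks can fail
```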
EDIT: The author has also demonstrated that they've gone through various growth permutations over the years. A 4-disk RAID6 under md-raid has a supported growth mechanism, allowing you to add additional disks and have it restripe the data automagically for you. It's not a good idea as far as performance is concerned, but for a home NAS that is primarily intended for streaming media it's not a major issue, and it enables you to grow to your heart's content (even if it is a bad idea, as I successfully proved by growing from 4 disks to 16 over a few years, before respinning as multiple 6-disk arrays).
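A rough sketch of that growth path, wrapped in Python for illustration. The device names are placeholders, whether --backup-file is actually needed depends on your mdadm version and layout, and you should have a backup before any reshape:

```python
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Placeholder device names; double-check against your own system first.
run(["mdadm", "--add", "/dev/md0", "/dev/sde1"])          # add the new disk as a spare
run(["mdadm", "--grow", "/dev/md0", "--raid-devices=5",
     "--backup-file=/root/md0-reshape.backup"])           # reshape 4 -> 5 devices
# Once /proc/mdstat shows the reshape has finished, grow the filesystem on top, e.g. ext4:
run(["resize2fs", "/dev/md0"])
```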
The testing is focused on ZFS but, quoting from the conclusion:
If you're looking for raw, unbridled performance it's hard to argue against a properly-tuned pool of ZFS mirrors. RAID10 is the fastest per-disk conventional RAID topology in all metrics, and ZFS mirrors beat it resoundingly—sometimes by an order of magnitude—in every category tested, with the sole exception of 4KiB uncached reads.
Emphasis on 'properly-tuned'. This doesn't hold as true as the OP states for the defaults (which is what the majority of home users will be running).
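For reference, these are the kinds of knobs such benchmarks typically tune, sketched with illustrative pool/dataset/device names (not a recommendation for every workload):

```python
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

# Pool, dataset and device names are illustrative only.
run(["zpool", "create", "-o", "ashift=12", "tank",
     "mirror", "/dev/disk/by-id/ata-DISK_A", "/dev/disk/by-id/ata-DISK_B"])
run(["zfs", "set", "compression=lz4", "tank"])
run(["zfs", "set", "atime=off", "tank"])
run(["zfs", "create", "tank/media"])
run(["zfs", "set", "recordsize=1M", "tank/media"])  # large records suit big sequential media files
```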
You install it on a USB stick, and the web interface has you up and running within like 2 minutes. The only hard rule is that your parity drives must be at least as large as your largest data drive, but other than that you're completely free to add or remove disks one, two, or ten at a time, and all it takes is like 3 button clicks.
The community is large so there’s always someone to help should you run into trouble but in general the whole thing is pretty brainless. Click button, server work.
Otherwise, Synology, for example, produces excellent devices.
Sorry, but I had quite the opposite experience, even with the professional lines. The only NAS I trust is a server (at least an HP mini server) and ZFS. Stuff like Unraid is for Windows mindsets, I don't trust HW RAID, and I especially don't trust closed-source software BS for stuff like that; exceptions are enterprise stuff like EVAs or NetApp and EMC.
This is how you'd mount a synology array in a Linux server: https://www.synology.com/en-us/knowledgebase/DSM/tutorial/St... - it's basically two terminal commands after all the preparation stuff.
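Roughly, the sequence looks like this (a sketch only; mdadm and lvm2 need to be installed first, and the volume group / logical volume names are typical Synology defaults but not guaranteed, so follow the KB article for the exact steps):

```python
import os
import subprocess

os.makedirs("/mnt/synology", exist_ok=True)

subprocess.run(["mdadm", "-Asf"], check=False)    # assemble + scan + force: find the Synology md arrays
subprocess.run(["vgchange", "-ay"], check=False)  # activate LVM volume groups (used by SHR volumes)
# vg/lv names below are a common default, not guaranteed; list /dev/mapper to find yours.
subprocess.run(["mount", "-o", "ro", "/dev/vg1000/lv", "/mnt/synology"], check=True)
```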
I believe QNAP uses mdraid as well but I haven't had any experience with those.
Yes, and that RAID destroyed itself many times, which btw never happened to me on a self-installed Linux.
I'm sorry you've had the misfortune of encountering such problems.
I've been using Synology/Xpenology (a fork of their GPL bootloader) for 8 years, and it's still going great for me.
Btrfs is in the kernel and just works.
I'm really impressed with how stable it is: never had any issues with it and no data loss, even with a couple of hard drive failures along the way.
I know many of you will prefer to customise everything and take a deep dive into the way things are set up, but I really do enjoy the simplicity of Unraid.
This is why I picked Synology as my NAS. The Synology Hybrid RAID system let me just shove all my extra drives into the 4-bay NAS and still have parity protection.
Then I just slowly upgraded all the drives to the current money/capacity optimum, one or two at a time.
If I was made of money and could afford to upgrade all the drives in a pool in one go, I'd definitely go for ZFS too.
Btw I have run a NAS similar to this machine for 6 years now using FreeNAS. It’s awesome.
The 10+ seems to run FreeNAS fine.
Also note that the FreeBSD implementation of ZFS has been replaced by a port of OpenZFS (formerly ZFS on Linux).
Another issue I've seen people report as a reason for the increased fan speed was HDDs which didn't report their temperature properly to the controller, so the controller didn't know whether the drives were cool or hot.
What BIOS/iLO versions are you running?
The latest B120i driver was released on 30 Apr 2020. I don't think it's retired yet, but they don't support RHEL 8, so I'm forced to stay on RHEL 7.
I can see that the iLO firmware is not the latest, but it's the BIOS that's responsible for fan speed control, and I think I'm running the latest BIOS.
That's not really about BIOS versions. When an HDD runs in B120i fakeraid mode, the server is aware of the HDD temperature, which is displayed in iLO. As long as this temperature is low, the fans stay low. When the HDD runs in AHCI mode, the server is no longer aware of the HDD temperature, so it uses a high fan speed to ensure that cooling is good enough even if the drives are hot. That's how I interpret this whole situation.
Ok, I might have misunderstood the info posted at: http://downloads.linux.hpe.com/SDR/project/ubuntu-hpdsa/ which states: "NOTICE! The hpdsa driver is no longer being developed past the versions indicated. Do not upgrade the Ubuntu kernel or try to use this binary driver with Xenial. If you have a B-series Smart Array, it is advised to use the Linux md (software raid) driver, or upgrade to an H or P series hardware-based host bus adapter."
I thought the hpdsa driver was the same across distros, not unique to Ubuntu. Based on what you're telling me it seems that only the Ubuntu version was discontinued.
Unsure about ASRock. I think they may have pivoted to a non-Renoir CPU.
The A500 is very nice, though in practice it needs a USB3 hub sitting next to it.
I would say that a good router is the way to go, and I think I should replace the one my ISP has given me. Can anyone point me in the direction of a guide to a hardware router and appropriate firmware, so I can be sure to control what the NAS (or any other device within the network) can connect to on "the internet"?
If you're looking to build something more powerful, OPNsense on an x86 box is the way to go. I also think that the interface is much more intuitive compared to OpenWRT's.
The only somewhat difficult part was preserving data from the old disks, as QNAP has custom extensions to LVM or mdraid which work nowhere else. So there were a few rounds of back and forth: removing a few disks from the pool, rebooting into a normal Linux, formatting them without the custom garbage, booting back, and copying over.
In the end I decided to go with NixOS, as I've already used it on my laptop/desktop for about 5 years. I also wanted a modern CoW file system and went with bcachefs. Getting it working with NixOS takes some fiddling, especially if you also want encryption, but it works. It now runs Samba, matrix-synapse and soon maybe Nextcloud. Not quite there yet with everything, but quite happy that I went down this route.
The only thing missing from the HW side is an SSD, but the box has USB3 and UAS support so I might add an external one that bcachefs can use as a faster tier.
I always use /dev/disk/by-id/* .
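A quick way to see what those stable names map to on the current boot (pure stdlib):

```python
import os

BYID = "/dev/disk/by-id"

# Show which stable by-id names point at which (unstable) /dev/sdX nodes right now.
for name in sorted(os.listdir(BYID)):
    target = os.path.realpath(os.path.join(BYID, name))
    print(f"{name} -> {target}")
```

Then reference the by-id path (it encodes model and serial) in mdadm, fstab, or zpool commands instead of /dev/sdb, which can change between boots.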
Why? Because storing a single copy of your data on a single local RAID array doesn't protect you from disasters such as fire, theft, accidental fluid spills, small children/pets, a power surge after a lightning strike, flooding, and so on. Basically, a lot of the stuff you'd find in your home insurance policy.
RAID doesn't guarantee data integrity either. You lost data after an unintended deletion or corruption by some computer program? RAID won't help you get your data back any more than a single-disk scenario would. If the original file got overwritten and the change mirrored across the array, and all you had was that one copy, you won't be able to go back to that original.
A more exceptional scenario would be a drive silently failing and RAID happily mirroring corrupted data across the entire array. Unless you use a filesystem that implements checksums, like ZFS, you wouldn't notice this until it was too late.
I'm not advising against RAID. I think there's merit in using RAID when it comes to convenience and availability. Like, your data remaining available when you experience a sudden disk failure, and not having to spend hours or days getting everything back online again.
I'd say that RAID and backups are complementary.
So, what about backups then? The 3-2-1 backup strategy is a good start: 3 copies, 2 local, 1 physically remote. That could be as simple as copying your drives every week to a separate HD or SSD, and every month to another separate drive which is safely stored outside of your home, e.g. with a family member, a friend, or at the bank.
You could step it up a notch and choose to rsync your data across two connected drives, while rsyncing a third copy to a specialized backup tool or storage service such as Borgbackup or Wasabi. Add in scripts that perform regular checksum verification and report the results daily, and you've got a pretty solid solution.
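A minimal sketch of that kind of script, with placeholder paths and host names (a dedicated tool like Borg does this better with deduplication and history, but it shows the idea):

```python
import hashlib
import os
import subprocess
from datetime import date

SOURCE = "/data/important"                        # placeholder paths
LOCAL_COPIES = ["/mnt/backup1", "/mnt/backup2"]   # two local copies
REMOTE = "user@offsite.example.com:backups/"      # third, offsite copy

def rsync(src, dst):
    # -a preserve attributes, -x stay on one filesystem, --delete mirror deletions
    subprocess.run(["rsync", "-ax", "--delete", src + "/", dst], check=True)

def checksum_tree(root):
    """One digest over the whole tree (paths + contents), for comparing copies."""
    h = hashlib.sha256()
    for dirpath, dirnames, names in os.walk(root):
        dirnames.sort()  # deterministic walk order
        for name in sorted(names):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                while chunk := f.read(1 << 20):
                    h.update(chunk)
    return h.hexdigest()

for dst in LOCAL_COPIES + [REMOTE]:
    rsync(SOURCE, dst)

digests = {p: checksum_tree(p) for p in [SOURCE, *LOCAL_COPIES]}
status = "OK" if len(set(digests.values())) == 1 else "MISMATCH"
print(f"{date.today()}: backup {status} {digests}")
```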
The final step is looking at your data and differentiating between the stuff you absolutely can't afford to lose (e.g. family albums) and the stuff you can afford to lose (e.g. software downloads). The goal here is to calculate the costs associated with the required storage volume and bandwidth, and land on an appropriate backup solution that matches your budget and your needs.
Finally, that's just safeguarding your data in the present and the short-term future. Neither RAID nor backups are a long-term preservation strategy. That's a whole other can of worms, including such challenges as migrating obsolete data formats (yeah, I have WordPerfect 5 and AutoCAD files in my own personal archive) or dealing with obsolete hardware I/O (yeah, I have a bunch of old IDE drives in a cardboard box for which I had to buy an I/O converter).
When I was younger, all these experiments with firewalls, routers, VLANs, RAID and other more professional technologies were a good learning experience, but after all those years I'd rather keep work at work and keep it simple at home than simulate a corporate environment on a shoestring budget.
What you say about protection from fire/theft is still real, though. I'm looking for a good (which includes easy) solution for that.