

Ask HN: Does anyone here use software RAID on Linux? - RiderOfGiraffes

We've been struggling with software RAID.  First on SuSE 10.3, now on SuSE 11.0.  We're finding that we occasionally have a disk failure, and then the entire RAID device becomes unavailable and dismounts itself.  Using mdadm tells us the individual disk isn't working, and that the RAID isn't available.

(Note: I'm deliberately not using the technical terms here because I need to find out what other people's experience is of this.)

Does anyone here use software RAID 5 on SuSE?  If so, what are your experiences?  I've searched around using Google but there doesn't seem to be any problem that crops up repeatedly.  In fact, the general impression is that software RAID works on SuSE Linux 10.3 and above.

I would welcome constructive comments and advice.

Thank you.
======
soult
I am using RAID1 on my PC and RAID0 on some servers, all software RAID with
mdadm. I never had any problems temporarily removing one HDD from the array
(when I needed some free space for a short time) and resyncing it again, but I
always marked the drive as faulty (mdadm --manage /dev/md0 --fail /dev/sdb2)
and then removed it (mdadm --manage /dev/md0 --remove /dev/sdb2) before re-
using it.

Resyncing the array is just as simple: partition the disk, create a partition
that is the same size as the other RAID1 partition, and re-add it to the array
(mdadm --manage /dev/md0 --add /dev/sdb2).
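
Putting those steps together, the whole swap cycle looks roughly like this
(just a sketch of my workflow -- /dev/md0 and /dev/sdb2 are the example names
from my own setup, so adjust for yours, and run it as root):

```shell
# Mark the member faulty, pull it, then re-add it after repartitioning.
mdadm --manage /dev/md0 --fail   /dev/sdb2   # tell md the member is bad
mdadm --manage /dev/md0 --remove /dev/sdb2   # detach it from the array
# ... replace or repartition the disk with an identically sized partition ...
mdadm --manage /dev/md0 --add    /dev/sdb2   # re-add; the resync starts on its own
cat /proc/mdstat                             # watch the rebuild progress
```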

------
RobGR
The most reliable general solution on Linux is 3Ware RAID cards, but Linux
software RAID generally works, as other posters have noted.

If you are having persistent problems, you should search carefully for a flaky
hardware cause. Are the drives in an external SATA enclosure that might have a
barrel-type power connector that is sometimes jiggled? Are all the problems
always on the same motherboard -- perhaps it has a bad IDE controller? Are
the drives overheating?

If your drives support it, use the smartmontools package to see if one or both
are reporting errors. Also, consider writing a script to dump and archive the
output of smartctl -a. Next time you have an issue, look at those outputs
over time, and in addition to the bad blocks and so on, look at the recorded
number of times the disk has been powered up, and see if this is consistent
with the number of times the computer has been rebooted.
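
As a rough sketch of that archiving idea (the device names and output
directory here are placeholders, not anything specific to your setup --
substitute your actual RAID member disks):

```shell
#!/bin/sh
# Dump SMART data for each RAID member into a timestamped file, so that
# after the next failure you can diff attributes like reallocated-sector
# and power-cycle counts over time. Assumes smartmontools is installed.
OUTDIR=/var/log/smart-history
mkdir -p "$OUTDIR"
for disk in /dev/sda /dev/sdb; do
    name=$(basename "$disk")
    smartctl -a "$disk" > "$OUTDIR/$name-$(date +%Y%m%d-%H%M%S).txt"
done
```

Run it from cron once a day and the history builds itself.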

I have done a lot of RAID stuff, and currently I am moving away from it and
making servers that have just one disk in them. The reason is that I have
decided that the best way to achieve reliability is to have two copies of any
important computer. RAID sometimes seems to introduce more problems than it
solves, and switching to a second machine that is maintained by rsync is quick
and always works.

~~~
sounddust
_I am moving away from it and making servers that have just one disk in them._

Going forward, I've also decided to stop using RAID and simply replicate
across servers. One major reason is the arrival of high-quality, low-cost
SSDs. I expect a single SSD to be less likely to fail than two HDDs are to
fail at the same time.

------
sounddust
Can you afford a 3Ware card -- even a used one? Nothing compares to a solid
hardware RAID solution in terms of quality, reliability, and performance. I
suffered the failure of several RAID-5 software arrays on Linux about 5 years
ago (even though only one disk had failed in each case) and decided that it
wasn't worth the trouble. I switched to 3Ware and have never had a problem
since, despite numerous disk failures and replacements.

~~~
RiderOfGiraffes
I've suffered failures with hardware RAID, two different cards, two different
manufacturers. I was assured by several people that software RAID is now
reliable, so I switched from hardware RAID.

I note that you recommend the 3Ware card - not one I've used. Thank you for
the information, I'll do some research on that.

In the meantime, more comments, advice, information and anecdotes welcome.

