ASRock motherboard destroys Linux software RAID (asrock.com)
226 points by nh2 on Nov 27, 2018 | 88 comments



I suspect what's actually happening here is that there was an old GPT partition table on the disk which was not wiped before `mdadm -C` was run. This means it has a correct primary partition table (with correct CRC etc.), and MD RAID has overwritten the secondary PT at the end of the disk.
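
One rough way to check for such a leftover backup table at the end of a disk (a sketch only, assuming 512-byte logical sectors; /dev/sdX is a placeholder):

    SECTORS=$(blockdev --getsz /dev/sdX)   # size in 512-byte sectors
    # Dump the last sector; a backup GPT header starts with the "EFI PART" signature.
    dd if=/dev/sdX bs=512 skip=$((SECTORS - 1)) count=1 2>/dev/null | hexdump -C | head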

Unfortunately (and I genuinely hate to be "language lawyering" when someone has lost all their data), the ASRock firmware is doing the right thing according to the letter of the UEFI spec -- see section 5.3.2 of http://www.uefi.org/sites/default/files/resources/UEFI%20Spe... :

> If the primary GPT is invalid, the backup GPT is used instead and it is located on the last logical block on the disk. If the backup GPT is valid it must be used to restore the primary GPT. If the primary GPT is valid and the backup GPT is invalid software must restore the backup GPT.

Good time to promote my videos visualizing how GPT and Linux RAID work :-) I did some cool live visualizations of mdadm, mkfs.ext4, gdisk, injecting errors etc. https://rwmj.wordpress.com/2018/11/26/nbdkit-fosdem-test-pre... https://rwmj.wordpress.com/2018/11/06/nbd-graphical-viewer-r...


In addition, had the OP used disk partitions, a valid partition table (either MBR or GPT) would have been written to disk.

Every Linux installer that I've used sets up disk partitions for software RAID volumes. I don't think an installer for a major distro will let you use a bare device, even if you wanted to, and all of the instructions/tutorials I've seen on setting up mdadm include the configuration of partitions.
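
For reference, a minimal sketch of the partition-based setup (device names are placeholders, not the OP's): create one full-size partition per disk with the "Linux RAID" type code, then build the array on the partitions.

    sgdisk -n 1:0:0 -t 1:FD00 /dev/sdc    # one partition spanning the disk, type FD00 (Linux RAID)
    sgdisk -n 1:0:0 -t 1:FD00 /dev/sdd
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1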

Seems like the OP went out of their way to configure software RAID in a non-standard manner, without understanding the consequences.

I hate to kick the OP while they're down, but it doesn't look like this is an issue that someone would encounter with a default/recommended configuration.


No worries, this is good advice, I appreciate it.

(Also note I managed to fully recover from this without data loss, and I have backups.)

This RAID setup is already quite old. For newer RAID setups I routinely put the RAID on partitions as you said, after more advice has appeared on the Internet that this is recommended.

However, I didn't find hard arguments for this beyond "if you make the partitions a bit smaller, you can accommodate disks that aren't exactly equally large" -- certainly I didn't encounter "if you don't do this, UEFI may trash your setup".

It would be inaccurate to say I went "out of my way to set it up in a non-standard manner" though. I didn't use a distro installer, simply because the machine was already installed when I added this RAID to it. I just used a straightforward `mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc /dev/sdd` on bare devices, which is a use case supported by mdadm, and much documentation and tutorial material still mentions that you can choose bare devices vs. partitions (probably hard to blame them for that, given that this is the first and only mainboard I have encountered that does this).


Additionally, unless you do an mdadm software RAID using full disks, you cannot ensure that the EFI system partition will be RAIDed. But if you want to avoid having your mdadm metadata at the start of the drives, you can have it placed in various locations which may help, see: https://raid.wiki.kernel.org/index.php/RAID_superblock_forma...

I use mdadm RAID1 on full disks on my desktop with the metadata at the end of the disks (superblock version 1.0). Partitioning tools will complain that my backup GPT partition table is invalid and offer to fix it (I always politely decline) but the primary one is fine. My Dell desktop doesn't mind this situation and also does not ever write any data into the EFI partition so it works well for me.


Why would you want the EFI system partition inside of a RAID volume that's not understood by your system firmware?


When my OS updates the bootloader it stores in the EFI system partition, I want that data to be RAIDed. I set the EFI boot order to use the first disk's EFI system partition then the second disk's. Because at the OS level they are RAIDed and because I put the mdadm metadata at the ends of the disks, the UEFI firmware sees them as normal GPT partitions (with the secondary GPT partition table corrupted at the end of the disk due to mdadm data). When the EFI firmware runs, it should only be performing reads from either of the disks and even if one entire disk is dead it can still boot the system.
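
A minimal sketch of that kind of setup, as I understand it (device names, sizes and the loader path are placeholder assumptions, not necessarily what the parent uses):

    # RAID1 over whole disks with the superblock at the end, so each member
    # still presents a readable GPT to the firmware.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 /dev/sda /dev/sdb
    # Partition the md device; the GPT lands at the start of both members.
    sgdisk -n 1:0:+512M -t 1:EF00 /dev/md0    # EFI system partition
    sgdisk -n 2:0:0     -t 2:8300 /dev/md0    # root filesystem
    # Register each member's copy of the ESP so either disk can boot.
    efibootmgr -c -d /dev/sda -p 1 -L "Linux (disk A)" -l '\EFI\BOOT\BOOTX64.EFI'
    efibootmgr -c -d /dev/sdb -p 1 -L "Linux (disk B)" -l '\EFI\BOOT\BOOTX64.EFI'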


It's easier to get alignment right if you use raw disk, especially if you then run dmcrypt on top of your RAID.

Unaligned partitions are a common source of bad performance, and it's not a thing that (used to) "just work".

Therefore I've seen "don't use partitions" be recommended, since partitioning is otherwise a source of potential misalignment.


This hasn't been a problem since around 2008 when all OS vendors changed their partitioning tools to use at least 1M alignment by default. I also wrote a tool (http://libguestfs.org/virt-alignment-scan.1.html) which can be used to scan and in a few cases correct for this problem.
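
If you want to check an existing disk, one option (a sketch; the device and partition number are placeholders) is parted's built-in check, or the scanner above for VM disk images:

    parted /dev/sdX align-check optimal 1    # is partition 1 optimally aligned?
    virt-alignment-scan -a guest-disk.img    # scan a disk image (name is a placeholder)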


As a side point, I've seen mention recently that 1M alignment may not be optimal in some cases either. The recommendation was to use 4M instead, which apparently worked 100%.

Just wish I could remember where I read that. I remember taking it on board as it made sense and the source seemed very credible.


Incidentally, thank you for all your work on libguestfs! It's spectacularly useful.


I don't agree with this assessment; the UEFI spec also states that:

> If the primary GPT is corrupt, software must check the last LBA of the device to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array. If it points to a valid GPT Partition Entry Array, then software should restore the primary GPT if allowed by platform policy settings (e.g. a platform may require a user to provide confirmation before restoring the table, or may allow the table to be restored automatically). Software must report whenever it restores a GPT.

Emphasis on "if allowed by platform policy settings". Booting up the system should require only read-only access to the disks, and writing a new GPT should be prevented in that case -- at the very least it should happen only after asking for user confirmation, as the following section suggests:

> Software should ask a user for confirmation before restoring the primary GPT and must report whenever it does modify the media to restore a GPT.

In that case, the motherboard UEFI neither asked nor reported that the primary GPT was restored.


Good technical response!

Is the part you quoted the one that's relevant here though? As per my post, gdisk reports "corrupt GPT" (due to CRC mismatch). In section 5.3.2 you posted, it says:

> If the primary GPT is corrupt, software must check the last LBA of the device to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array.

> If it points to a valid GPT Partition Entry Array, then software should restore the primary GPT if allowed by platform policy settings (e.g. a platform may require a user to provide confirmation before restoring the table, or may allow the table to be restored automatically).

> Software must report whenever it restores a GPT.

So wouldn't the spec-compliant behaviour be to "provide confirmation before restoring", or at the very least, "must report whenever it restores a GPT"?


Yes I believe you are right. It's possible the ASRock firmware did "report" this - I guess it doesn't need to ask for confirmation however ... arrgh. Did the ASRock firmware provide any "report" or feedback about what it was doing?


No, it didn't report anything.

After cycling to the shop and buying a couple TB disks to make extra backups before messing with headers, I created new superblocks (you can tell `mdadm --create` to pick up existing data).
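
For the record, a rough sketch of that kind of re-creation (the parameters must match the original array exactly; device names here are placeholders, and --assume-clean skips the initial resync):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 --assume-clean /dev/sdc /dev/sdd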

I then repeated the repairing and subsequent re-deletion by the mainboard on boot around 10 times, in order to ensure it was reproducible before posting on their forums.

I did not spot it outputting any kind of message about it.


In fact, gdisk even says so after the Main partition table CRC mismatch:

Loaded backup partition table instead of main partition table!

And there's no "backup partition table CRC mismatch".

This tool's messages can be very cryptic to decode, though.


Well! This explains some odd behaviour I've seen on some servers that were wiped and reinstalled frequently, often with differing legacy BIOS vs UEFI vs GPT vs MBR settings.

I had a `dd if=/dev/zero of=/dev/sda bs=1k count=1000` or similar to wipe the disk just enough that the provisioning would be happy... Or, at least, it was happy most of the time.

I bet this was my root cause!


wipefs(8) is your friend here, it knows about where various signatures are and usually erases things more accurately.

Of course there is also the "nuclear" option of running blkdiscard.
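
Minimal usage sketch (/dev/sdX is a placeholder):

    wipefs -a /dev/sdX     # erase all known filesystem/RAID/partition-table signatures
    blkdiscard /dev/sdX    # discard every block on the device (SSDs / thin storage)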


Oh, the times of BIOS, when you'd just get a "no boot disk found" message to let you know your partition table was messed up :D ... such simpler times.. :D

Interesting comment though, thanks a lot. It makes much more sense now than just reading the article.


Simpler times.

This mainboard's UEFI even has a built-in GUI for filing tech support tickets, for the case where things are broken to the extent that you cannot boot.

(This feature is highlighted in the manual, "if you are having trouble with your PC".)


So UEFI requires a GPT to be present, doesn't it? Otherwise what's the objective difference between an invalid GPT and no GPT?


A valid GPT header either at the beginning or the end of the disk. Due to how the backup GPT is laid out, the secondary copy starts with the partition entry array and ends with the backup header in the last sector of the disk.

So unlike with MBR disks, you can't just nuke the first few sectors and call it a day, you have to wipe the whole disk (or use a GPT-aware tool like wipefs).


Can I wipe the first few sectors and the last few sectors?


To make it look empty, yes.

It's much easier to use a tool like sgdisk:

    sgdisk -Z
    sgdisk --zap-all
(note the capital Z)


Use wipefs.


Going by GP, if neither the primary nor backup GPTs are valid then nothing should happen.

I would still recommend having a GPT, though. There's nothing useful to be gained by avoiding it, and we've just seen what sort of failure modes can come up.


No GPT is required. Problems only happen when you have partial or conflicting partition table information. If you want to not use GPT on a disk, you need to make sure to delete both copies.
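
If you want to do it by hand, a rough sketch (assuming 512-byte logical sectors; /dev/sdX is a placeholder) is to zero the first and last MiB so both the primary and backup GPT are gone:

    dd if=/dev/zero of=/dev/sdX bs=1M count=1
    SECTORS=$(blockdev --getsz /dev/sdX)
    dd if=/dev/zero of=/dev/sdX bs=512 seek=$((SECTORS - 2048)) count=2048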


The number of bugs in the UEFI of new motherboards shocks me. Bugs are all over the place; sometimes I change an option in Setup only to find it's disabled after reboot, and switching the option ON a second time works. Or, on my motherboard, the boot menu is bugged: to boot my Windows disk, I have to choose the 'Switch to English' menu entry. I'm afraid to clear the EFI vars, because currently it at least works, and who knows what'll happen when I clear them.

This really discourages me from buying new hardware.


I feel your pain, but it really doesn't shock me at all. Closed-source blackbox firmware made mainly by hardware vendors? What could possibly go wrong. Even AMD and Intel are very hit-or-miss with the quality of the software they produce; can you imagine the random mobo maker? Have a look at some of the vendor-maintained forks of u-boot and Linux (whose code they have to publish because of the GPL) and see how many bugs you can find.

Meanwhile the software embedded in these boards becomes more and more complex with basically a full blown embedded OS, graphical interface etc...

Software shops have a hard enough time writing robust software these days; ASRock probably has a team of a dozen or so glorified interns copy-pasting from Stack Overflow (or worse, contracting third parties to do it). I'm exaggerating a bit of course, but not by a massive amount, going from my personal experience working with hardware shops.


Almost certainly the code doing the GPT partition table recovery here is stock edk2 + a bunch of proprietary drivers to initialize clocks on the motherboard. The edk2 code is open source under a BSD-ish license, although I take your point that the full blob running on the ASRock motherboard doesn't come with compilable source so it's not much help. https://github.com/tianocore/edk2


After this incident, I am seriously considering buying a device with all-open-source firmware (like the Raptor Talos/Blackbird on HN recently) for work, just so that I don't have to guess what the mainboard is doing, but can actually look.

And perhaps other people will even have had a look at what the firmware actually does, or I could fix the behaviour myself (like adding a "Do you really want to?" popup before it wipes the superblocks).


UEFI implementations are a bugfest, with every other board exhibiting a new idiosyncrasy. I probably don't even remember half of the bugs I've encountered.

My favorite one was when the integrated EFI shell wouldn't scroll, so every new line would just print over the previous one.

The most obnoxious and common is hardcoding the boot path to be "\EFI\BOOT\BOOTX64.efi", ignoring what the boot entry says on that matter.


> The most obnoxious and common is hardcoding the boot path to be "\EFI\BOOT\BOOTX64.efi", ignoring what the boot entry says on that matter.

Tell us what manufacturer does that, so I can avoid ever buying from them.


I've only ever seen \EFI\BOOT\BOOTX64.efi used as a last resort, if the system's entries don't point to a valid loader. Pretty grim if that was all your system would allow you to use.


Most recent bug: UEFI ignores POST timeout and basically skips it when a specific SATA drive is connected.


> of new motherboards

This mainboard was released in Q2 2014. It still has DDR3. Its manufacturer warranty expired last year!

This is hardly representative of new hardware.


New enough in the sense of UEFI vs. BIOS, which is what's being discussed. UEFI is from 2007, so they had 7 years to learn how to write code for it and for their own hardware. If it were a 2008 board, it would be new enough that you could expect this kind of random bug. Seven years on... it's becoming a sign of something that is inherently broken / never going to work as we hope.


Are you sure they are talking about the motherboard in the article, or other boards they use?


I don't like this either, especially because the bugs are often in places which aren't even necessary to begin with. If, as firmware, you find invalid data on some device you need to use, just give a message about it, BUT LEAVE IT ALONE, DON'T DESTROY IT! People think it's a clever trick to do these things, but really it's asking for trouble once you scale it up to loads and loads of users, who YOU CAN'T EXPECT TO FOLLOW DEFAULT / RECOMMENDED PROCEDURES.

I mean really, all that was needed was to read the contents of the disk, and if you couldn't use it, just print that it was unusable. But somehow someone put in write functions that destroy people's data, because they assumed they knew all the cases that would come up, instead of assuming they don't know everything and building a more careful product. The unknown unknown is what hit this developer and user of the product. It's a shame this happens, especially with a framework that is supposed to replace 'legacy' things. If this is a sign of how modern frameworks handle these kinds of interactions, I'll take my legacy rubbish over it any day of the week.


This is very discouraging to hear, especially since I'm currently choosing a motherboard for a new PC build. Is there any good way to determine the quality of motherboard firmware? Are there any recommended vendors?


ASUS are generally good for consumer level boards. I've heard of the occasional bad board, but (from my perspective) they seem to be rare.

ASRock, as you see from this HN posting, has a reputation for not being quite as polished. Probably best to skip for now if you don't want to be messing with potential weirdness.

If you're looking for higher-end things, Supermicro boards are good. Recent weirdness - "Chinese installed backdoors on Supermicro boards" - aside, that is. ;)


Urgh - I've found this too; since upgrading, Bluetooth randomly becomes disabled on boot :/


> It is important to know that a disk configured to be part of an mdadm RAID array can look like a broken EFI disk.

Great detective work. When reporting a bug like this, it's this extra mile investigation that gives a lot of weight/credibility.


In my experience, configuring whole disks as software RAID is a world of pain; a disk with a single partition is much more reliable in an mdadm array. I wonder whether this was because I was also using a mainboard with a buggy EFI.


The mdadm manpage specifically recommends using partitions with a special partition type, not whole disks, to prevent various 3rd-party software tools from messing with the array.


Which part of `man mdadm` says that?

My `man mdadm` shows:

   RAID devices are virtual devices created from two or more real block devices.
   This allows multiple devices (typically disk drives or partitions thereof) to
   be  combined ...

"Disk drives or partitions thereof".

I couldn't spot the mention you're referring to.


Perhaps, but either way it's not great to hear that some UEFI has been designed to be "clever" to the point that it tries to hand-hold users by deciding what should be in the superblock of a device without your explicit permission.

I can almost forgive it in an OS designed for novices, but even then they tend to ask permission - however, putting these assumptions into hardware and not even asking permission - that's a broken piece of hardware IMO.


GPT has a backup/recovery feature. If the disk was ever a valid GPT disk, then there'll be a copy of the partition table in the final block -- and if that wasn't cleared properly then a functioning UEFI is supposed to restore the primary table from it.

The UEFI firmware may be following the spec to the letter here. If so, the takeaway is to always use `wipefs -a`.


The spec actually says that software should prompt for confirmation and report that it performed changes (see elsewhere in this thread).

`wipefs -a` is a good recommendation, but perhaps it can delete too much? In my case, I used `sgdisk --zap` to delete specifically the GPT bytes without deleting other data.
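
For reference, the two zap variants as I understand sgdisk's options (/dev/sdX is a placeholder):

    sgdisk -z /dev/sdX    # --zap: destroy the GPT structures, leave the MBR alone
    sgdisk -Z /dev/sdX    # --zap-all: destroy both GPT and MBR structures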


Why don't you contact them directly [0] instead of writing in some forum where no staff member will ever read it? They have a contact form on their website for technical support.

0: http://event.asrock.com/tsd.asp | http://forum.asrock.com/forum_posts.asp?TID=5265&title=howto...


Non-public issue trackers are a waste of time because they don't allow users who have the same issue to congregate around a discussion of that issue.

If ASRock wanted people to contact them about bugs like this, they should create a public bug tracker; otherwise they deserve the delayed response time from not getting this via their preferred channels.


Googlability for people who have the same problems.


Exactly!


I recently encountered a bug in an ASRock UEFI (CPU power limits ignored in the X299 ITX motherboard) and contacted ASRock about it. Got a fixed version emailed to me within a couple of days. Perhaps try that instead of the forums.


For every good experience like that, there is an equal and opposite bad one.

I have an X399 Taichi for my TR2. I'm using 2 older U.2 NVMe drives that have a legacy boot option rom that causes the board to hang 9/10 times at POST. The only option is to disable CSM, so it will only try to use the UEFI option rom. This works fine, EXCEPT that the CSM disable is lost every time the board loses power. Eg, that configuration setting is not properly saved to non-volatile storage.

When I reported this to ASRock (via email), I was told in very broken English to "re install windows". This is great advice, considering I'm running FreeBSD, not to mention that the entire issue happens at POST. Sigh.


That's very interesting; that just made me realize my X299 ITX board has the exact same issue with CSM while trying to boot into OS X (so also BSD-ish, but hackintosh). There's a good chance of hanging on boot unless I disable CSM, and a power loss event will clear that CSM setting.


This is interesting. FreeBSD has some... deficiencies... in the EFI bootloader's selection of memory to use at boot time. I wonder if the FreeBSD EFI boot loader and your hackintosh boot loader could be trashing the persistent memory where this CSM setting is stored. That might be why it is not a commonly reported issue.


It may be a problem with your specific unit.

I'm also using an X399 Taichi; it has had probably every released non-beta firmware version flashed except 3.00 and 3.10, with CSM permanently disabled, and it has never lost that setting.


FWIW, I'm currently running 3.2, and that's all I've ever run. I think there have been a few BIOSes since then, but I have not tried them. This bug is annoying but manageable (esp. since the box is on a UPS), so I don't want to go through the hassle of an upgrade.


There are other benefits to using FW 3.30[1] over 3.2 wrt hardware support and features actually working (i.e. IOMMU, PCI reset, ACS, etc.).

[1] https://www.asrock.com/MB/AMD/X399%20Taichi/index.asp#BIOS


All I see in 3.30 is "Update "MdeModulePkg" module to improve M.2 RAID compatibility." I'd be thrilled if IOMMU support got better, but how can I tell from that? Eg, why do you say IOMMU support will improve? Are there more detailed release notes hidden someplace?

FWIW, I'm typing this on chrome running in a bhyve VM, and I pass a USB controller through via PCI-passthru (using IOMMU) for webcam and U2F dongle use by chrome.. The FreeBSD IOMMU driver whines a bit, but I just assumed that was buggy FreeBSD IOMMU support. Eg: "ivhd0: Error: completion failed tail:0x7e0, head:0x0."


Ah, ASRock. The company that doesn't know what 'UUID' stands for.


I appreciate the courtesy of this gentleman's language. It's cool not only that he pointed out something potentially product-defeating, but that he kept a kind and informative tone all the while. I like it when people are nice on the internet.


> I use software RAID1 on Linux using mdadm using whole disk devices (no partitions).

> It is important to know that a disk configured to be part of an mdadm RAID array can look like a broken EFI disk.

I hate to say this, but it seems obvious that a software RAID like mdadm cannot grab raw disks as a whole. If the disk doesn't have a valid GPT and EFI partition at the beginning, how is it possible to boot from GRUB, load the initramfs, and then load mdadm?

I think something must be misconfigured in the first place.


It's a fully supported use case of mdadm RAID to use bare devices.

These are data disks containing no operating system, so there's no booting from them and GRUB doesn't come into play.


Oh, I read the context wrong. I thought he/she used the disks as bootable disks as well. But do the disks have to have a proper GPT to be safe?


No. Just like with hardware-based RAID, you can set mdadm to use a full raw disk.

A GPT is only useful when you want to use part of the disk for RAID - partitioning the disk into multiple parts where only some of them will be RAIDed. Some people prefer the opposite: first RAID the raw disks and then partition that md storage.

The other reason to put a GPT (or old-style partition table) on the disk is so that no system or tool will mistakenly try to be helpful by detecting an "uninitialized drive" and then asking to perform a quick format.
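
A minimal sketch of the "RAID the raw disks first, then partition the md device" approach (device names are placeholders):

    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
    sgdisk -n 1:0:0 -t 1:8300 /dev/md0    # put a GPT and one partition on top of the array
    mkfs.ext4 /dev/md0p1                  # partitions of md devices show up as md0p1, md0p2, ...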


These disks almost certainly don't contain an operating system, so they don't need to be bootable.


I had this exact problem three years ago. My software RAID was over a raw device, not a partition, and when I got new hardware for a home file server that had been running for seven years, the new motherboard wiped the superblock. I was able to recover most of the data with some data recovery tool. Unfortunately, some of the files were research data where the filename was significant, and I lost all filenames and paths during recovery.


Don't get me started with ASRock. I'm definitely NOT buying anything from them again.


Well, I originally got this ASRock mainboard because they seem to be much better on support.

For example, they have BIOS updates still in 2018, while the vendor of my previous mainboard from the same generation stopped pushing updates in 2016.


I've been using my ASRock Z97 Extreme 6 for about two years now and it's been very reliable for me. I ended up building a few systems for friends and went with ASRock for those (one a Z77 for an older 3rd-gen Intel, another two were for Ryzen builds) and those have been solidly reliable as well.


What made you decide that? I've personally also written them off after I bought a mobo from them with two non-functioning RAM slots a few years ago, but I'm curious what others have seen. So far the only ones I have not had problems with were Gigabyte, but I suspect they have their occasional issues as well.

Edit: Before you ask, yes the RAM was listed in the compatibility list. Same RAM, same CPU, same chipset on a Gigabyte mobo works just fine.


Hopefully ASRock will respond. They often don't.


Looks like it has been two weeks so far and no response. That's pretty bad. Someone should at least acknowledge that they are looking into it.


If a user buys some second-hand hardware that's 3 years obsolete and does out-of-spec things to it… yeah, I'm hardly surprised the vendor isn't looking into it.


Really? If that's the case, you think it's too difficult for them to just say, "Thank you for your post, but we are doing what the spec says to do, see the relevant portion here: ..."? Courtesy is a thing. At least I think it still is.


I doubt whoever's paid to go through that forum (if there is anyone) is paid enough to even understand the problem...


https://xkcd.com/979/

See other sections in this thread: mdadm itself recommends using a partition to prevent hardware from tampering with the data.

Therefore I don't see any real problem with anything around here.


Original author(s): Neil Brown. Developer(s): Jes Sorensen. Initial release: 2001.

Why do these random authors of code need to inform people that their hardware has shitty bugs which are apparently never fixed, on the assumption that 'everyone will read best practices'? That seems problematic to me...

I would expect it sooner from the mainboard manufacturer in their own documentation - with perhaps a warning and a choice (UEFI has a GUI, so why not, eh!) before destroying disk content... (the OP's system clearly just does it without even prompting or giving an indication that it will do it). Before these firmwares modify the disk, they could easily prompt to check whether it's OK with the data's owner to modify this data.

Manufacturers shouldn't rely on assumptions in their code for processes which can destroy someone's personal data in any case. The existence of a warning, from some people who dive into disk drives a lot, that this kind of problem exists doesn't make the situation any better.


Well, I may be imagining a pattern here that's biased by my own mileage, but starting from what computers usually come pre-installed with (Windows)...


mdadm tells you to do that as a safeguard against buggy implementations. That does not make it okay to have buggy implementations.


It also helps to safeguard against semi-buggy and forgetful humans. It's literally the software equivalent of putting a label on your data.

GPT with 1MB-aligned partitions will probably work for the foreseeable future, if for no other reason than inertia (popularity across OSes) and it being a nice binary multiple.

As a note: I refuse to use the silly 'MiB' style syntax. 1M bytes is base 10; 1MB is the base-2 value 'near' to that, since bytes imply the device-native base-2 alignment rather than the base-10 default that is common among humans.


Reading the title, I first thought it was metaphorical - that the board destroyed Linux software RAID in performance. Now I see it's literal.


Does software RAID do anything for performance in the era of NVMe? It seems like the bottleneck would be the PCIe bus.


What about recoverability features? Starting with RAID 1 mirroring and then going through to less storage-wasteful options like RAID 5? Or more resilient RAID 6?


Apologies, I was specifically referring to performance.


You will still want striping (RAID 0) if you truly require more performance and more unified storage (a single storage pool as opposed to 6 individual drives).

Usually each NVMe drive has its own PCIe lanes, so having two drives RAIDed gains performance over having one drive. AMD is now really pushing bonus lanes; for Threadripper they published reports on a 6x RAID 0 of NVMe SSDs, where they got 21 gigabytes/s reads out of storage.

https://community.amd.com/community/gaming/blog/2017/10/02/n...

Please note, it is now important to check the motherboard information regarding PCIe lane availability, and ensure that between the GPU and the M.2 slots there are plenty of PCIe lanes. Some early motherboards had 4 M.2 slots, but only the first 3 slots guaranteed PCIe lanes, while utilizing the 4th would force sharing of PCIe lanes and be slower than the first 3.
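
A minimal striping sketch with mdadm (device names are placeholders; the chunk size is just an example tunable):

    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512K /dev/nvme0n1 /dev/nvme1n1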


The PCIe bus can only be a bottleneck if you have a fan-out PCIe switch somewhere. If all your drives are directly attached to the CPU, then the bottleneck is usually the software overhead of the RAID. Sometimes with fast SSDs, the abstraction layer can add so much latency that your investment in fast low-latency drives is squandered.



