I suspect what's actually happening here is there was an old GPT partition table on the disk which was not wiped before mdadm -C was run. This means it has a correct primary partition table (with correct CRC etc), and MD RAID has overwritten the secondary PT at the end of the disk.
Unfortunately (and I genuinely hate to be "language lawyering" when someone has lost all their data), the ASRock firmware is doing the right thing according to the letter of the UEFI spec -- see section 5.3.2 of http://www.uefi.org/sites/default/files/resources/UEFI%20Spe... :
> If the primary GPT is invalid, the backup GPT is used instead and it is located on the last logical block on the disk. If the backup GPT is valid it must be used to restore the primary GPT. If the primary GPT is valid and the backup GPT is invalid software must restore the backup GPT.
In addition, had the OP used disk partitions, a valid partition table (either MBR or GPT) would have been written to disk.
Every Linux installer that I've used sets up disk partitions for software RAID volumes. I don't think an installer for a major distro will let you use a bare device, even if you wanted to, and all of the instructions/tutorials I've seen on setting up mdadm include the configuration of partitions.
Seems like the OP went out of their way to configure software RAID in a non-standard manner, without understanding the consequences.
I hate to kick the OP while they're down, but it doesn't look like this is an issue that someone would encounter with a default/recommended configuration.
(Also note I managed to fully recover from this without data loss, and I have backups.)
This RAID setup is already quite old. For newer RAID setups I routinely put the RAID on partitions, as you say, now that more advice recommending this has appeared on the Internet.
However, I didn't find hard arguments for this beyond "if you make the partitions a bit smaller, you can accommodate disks that aren't exactly equally large" -- certainly I didn't encounter "if you don't do this, UEFI may trash your setup".
It would be inaccurate to say I went "out of my way to set it up in a non-standard manner" though. I didn't use a distro installer, simply because the machine was already installed when I added this RAID to it. I just ran a straightforward `mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc /dev/sdd` on bare devices, which is a use case supported by mdadm, and much documentation and tutorial material still mentions that you can choose bare devices vs partitions (probably hard to blame them for that, given that this is the first and only mainboard I have encountered that does this).
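For comparison, the partition-based setup that people recommend is only slightly more work. This is just a rough sketch (same device names as my command above; the FD00 type code marks a partition as Linux RAID for sgdisk):

    # create a GPT with a single "Linux RAID" partition on each disk
    sgdisk --new=1:0:0 --typecode=1:FD00 /dev/sdc
    sgdisk --new=1:0:0 --typecode=1:FD00 /dev/sdd
    # then build the array from the partitions instead of the bare devices
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1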
Additionally, unless you do an mdadm software RAID using full disks, you cannot ensure that the EFI system partition will be RAIDed. But if you want to avoid having your mdadm metadata at the start of the drives, you can have it placed in various locations which may help, see: https://raid.wiki.kernel.org/index.php/RAID_superblock_forma...
I use mdadm RAID1 on full disks on my desktop with the metadata at the end of the disks (superblock version 1.0). Partitioning tools will complain that my backup GPT partition table is invalid and offer to fix it (I always politely decline) but the primary one is fine. My Dell desktop doesn't mind this situation and also does not ever write any data into the EFI partition so it works well for me.
When my OS updates the bootloader it stores in the EFI system partition, I want that data to be RAIDed. I set the EFI boot order to use the first disk's EFI system partition and then the second disk's. Because at the OS level they are RAIDed, and because I put the mdadm metadata at the ends of the disks, the UEFI firmware sees them as normal GPT disks (with the secondary GPT partition table corrupted at the end of the disk due to mdadm data). When the EFI firmware runs, it should only be performing reads from either of the disks, and even if one entire disk is dead it can still boot the system.
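In case it helps anyone reproduce this layout, it's roughly the following (device names and boot entry numbers are just placeholders):

    # superblock version 1.0 puts the mdadm metadata at the END of each member,
    # so the primary GPT at the start of each disk stays valid
    mdadm --create /dev/md0 --metadata=1.0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
    # boot order: first disk's ESP, then the second disk's as fallback
    efibootmgr --bootorder 0000,0001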
This hasn't been a problem since around 2008 when all OS vendors changed their partitioning tools to use at least 1M alignment by default. I also wrote a tool (http://libguestfs.org/virt-alignment-scan.1.html) which can be used to scan and in a few cases correct for this problem.
As a side point, I've seen mention recently that 1M alignment may not be optimal in some cases either. The recommendation was to use 4M instead, which apparently worked 100%.
Just wish I could remember where I read that. I remember taking it on board as it made sense and the source seemed very credible.
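If it's useful: the 1M/4M alignment is just about where the first partition starts, so something like this is what I'd expect to do with parted (disk name is a placeholder):

    # start the first partition at 4MiB so it's aligned for common SSD erase
    # blocks and RAID stripe sizes (1MiB also works on most hardware)
    parted -s /dev/sdX mklabel gpt
    parted -s /dev/sdX mkpart primary 4MiB 100%
    parted -s /dev/sdX align-check optimal 1   # reports whether partition 1 is optimally aligned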
I don't agree with this assessment; the UEFI spec also states that:
> If the primary GPT is corrupt, software must check the last LBA of the device to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array. If it points to a valid GPT Partition Entry Array, then software should restore the primary GPT if allowed by platform policy settings (e.g. a platform may require a user to provide confirmation before restoring the table, or may allow the table to be restored automatically). Software must report whenever it restores a GPT.
Emphasis on "if allowed by platform policy settings". Booting the system should require only read-only access to the disks, so writing a new GPT should be prevented in that case, or at the very least only happen after asking for user confirmation, as the following passage suggests:
> Software should ask a user for confirmation before restoring the primary GPT and must report whenever it does modify the media to restore a GPT.
In this case, the motherboard UEFI neither asked nor reported that it had restored the primary GPT.
Is the part you quoted the one that's relevant here though? As per my post, gdisk reports "corrupt GPT" (due to CRC mismatch). In section 5.3.2 you posted, it says:
> If the primary GPT is corrupt, software must check the last LBA of the device to see if it has a valid GPT Header and point to a valid GPT Partition Entry Array.
> If it points to a valid GPT Partition Entry Array, then software should restore the primary GPT if allowed by platform policy settings (e.g. a platform may require a user to provide confirmation before restoring the table, or may allow the table to be restored automatically).
> Software must report whenever it restores a GPT.
So wouldn't the spec-compliant behaviour be to "provide confirmation before restoring", or at the very least, "must report whenever it restores a GPT"?
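For anyone who wants to check their own disks from Linux without writing anything, something like this should show whether the two GPT copies disagree (/dev/sdX is a placeholder):

    # verify the GPT structures; reports problems such as a corrupt primary header
    sgdisk --verify /dev/sdX
    # gdisk's list mode also prints warnings when the primary and backup differ
    gdisk -l /dev/sdX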
Yes I believe you are right. It's possible the ASRock firmware did "report" this - I guess it doesn't need to ask for confirmation however ... arrgh. Did the ASRock firmware provide any "report" or feedback about what it was doing?
After cycling to the shop and buying a couple TB disks to make extra backups before messing with headers, I created new superblocks (you can tell `mdadm --create` to pick up existing data).
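For the record, the command was roughly of this shape; treat it as a sketch, since the metadata version, device order and data offset all have to match what the original array used, and --assume-clean stops mdadm from starting a resync:

    # the metadata version shown here is illustrative; it must match the
    # original array (check old mdadm --examine output or backups first)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          --metadata=1.2 --assume-clean /dev/sdc /dev/sdd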
I then repeated the repair, and the mainboard's subsequent re-deletion on boot, around 10 times to make sure it's reproducible before I post on their forums.
I did not spot it outputting any kind of message about it.
Well! This explains some odd behaviour I've seen on some servers that were wiped and reinstalled frequently, often with differing legacy BIOS vs UEFI vs GPT vs MBR settings.
I had a `dd if=/dev/zero of=/dev/sda bs=1k count=1000` or similar to wipe the disk just enough that the provisioning would be happy... Or, at least, it was happy most of the time.
A valid GPT header can be either at the beginning or the end of the disk. Due to how GPT is structured, the backup copy puts its partition entry array first and the header itself in the very last logical block of the disk.
So unlike with MBR disks, you can't just nuke the first few sectors and call it a day, you have to wipe the whole disk (or use a GPT-aware tool like wipefs).
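If it helps, wiping both copies doesn't take much; roughly (disk name is a placeholder, and double-check the target before running any of these):

    # wipefs knows about both GPT copies as well as MBR, filesystem and RAID signatures
    wipefs -a /dev/sdX
    # or with dd: zero the start of the disk AND the last couple of MB, where the
    # backup GPT lives (sector count via blockdev; sizes here are illustrative)
    dd if=/dev/zero of=/dev/sdX bs=1M count=16
    dd if=/dev/zero of=/dev/sdX bs=512 count=4096 \
       seek=$(( $(blockdev --getsz /dev/sdX) - 4096 ))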
Going by GP, if neither the primary nor backup GPTs are valid then nothing should happen.
I would still recommend having a GPT, though. There's nothing useful to be gained by avoiding it, and we've just seen what sort of failure modes can come up.
No GPT is required. Problems only happen when you have partial or conflicting partition table information. If you want to not use GPT on a disk, you need to make sure to delete both copies.
The amount of bugs in the UEFI of new motherboards shocks me. Bugs are all over the place; sometimes I change an option in Setup only to find it's disabled again after a reboot, and switching the option ON a 2nd time works. Or, on my motherboard, the boot menu is bugged: to boot my Windows disk, I have to choose the 'Switch to English' menu entry. I'm afraid to clear the EFI vars because at least it currently works; who knows what'll happen when I clear them.
This really discourages me from buying new hardware.
I feel your pain but it really doesn't shock me at all. Closed-source blackbox firmware made mainly by hardware vendors? What could possibly go wrong. Even AMD and Intel are very hit-or-miss with the quality of the software they produce; can you imagine the random mobo maker? Have a look at some of the vendor-maintained forks of u-boot and Linux (whose code they have to publish because of the GPL) and see how many bugs you can find.
Meanwhile the software embedded in these boards becomes more and more complex, with basically a full-blown embedded OS, graphical interface, etc.
Software shops have a hard enough time writing robust software these days; ASRock probably has a team of a dozen or so glorified interns copy-pasting from Stack Overflow (or worse, contracting third parties to do it). I'm exaggerating a bit of course, but not by a massive amount, from my personal experience working with hardware shops.
Almost certainly the code doing the GPT partition table recovery here is stock edk2 + a bunch of proprietary drivers to initialize clocks on the motherboard. The edk2 code is open source under a BSD-ish license, although I take your point that the full blob running on the ASRock motherboard doesn't come with compilable source so it's not much help. https://github.com/tianocore/edk2
After this incident, I am seriously considering buying a device with all-open-source firmware (like the Raptor Talos/Blackbird on HN recently) for work, just so that I don't have to guess what the mainboard is doing, but can actually look.
And perhaps other people will already have had a look at what the firmware actually does, or I could fix the behaviour myself (like adding a "Do you really want to?" popup before it wipes the superblocks).
UEFI implementations are a bugfest, with every other board exhibiting a new idiosyncrasy.
I probably don't even remember half of the bugs I've encountered.
My favorite one was when the integrated EFI shell wouldn't scroll, so every new line would just print over the previous one.
The most obnoxious and common is hardcoding the boot path to be "\EFI\BOOT\BOOTX64.efi", ignoring what the boot entry says on that matter.
I've only ever seen \EFI\BOOT\BOOTX64.efi used as a last resort, when the system's boot entries don't point to a valid loader. Pretty grim if that was all your system would allow you to use.
New enough in the sense of UEFI vs BIOS, which is what's being talked about here.
UEFI is from 2007, so they had 7 years to learn how to write code for it and for their own hardware. If it was a 2008 board, it would be new enough that you could expect this kind of random bug. 7 years on... it's becoming a sign of something which is inherently broken / never going to work as we hope.
I don't like this either, especially because the bugs are often in places which aren't even necessary to begin with. Like, if your firmware finds invalid data on some device you need to use, just give a message about it, BUT LEAVE IT ALONE, DON'T DESTROY IT! People think it'll be a clever trick to do these things, but really it's asking for trouble once you scale it up to loads and loads of users who YOU CAN'T EXPECT TO FOLLOW DEFAULT / RECOMMENDED PROCEDURES.
I mean really, all that was needed was to read the contents of the disk and, if it couldn't be used, just print that it was unusable. But somehow someone put in write functions that destroy people's data, because they assumed they knew all the cases that would come up, instead of assuming they don't know everything and building a more careful product. The unknown unknown is what hit this developer and user of the product. It's a shame this happens, especially with a framework that is supposed to replace 'legacy' things. If this is a sign of how modern frameworks handle these kinds of interactions, I'll take my legacy rubbish over it any day of the week.
This is very discouraging to hear, especially since I'm currently choosing a motherboard for a new PC build. Is there any good way to determine the quality of motherboard firmware? Are there any recommended vendors?
ASUS are generally good for consumer level boards. I've heard of the occasional bad board, but (from my perspective) they seem to be rare.
ASRock, as you see from this HN posting, has a reputation for not being quite as polished. Probably best to skip for now if you don't want to be messing with potential weirdness.
If you're looking for higher end things, Supermicro boards are good. Recent weirdness - "Chinese installed backdoors on Supermicro boards" - aside that is. ;)
In my experience, configuring whole disks as software RAID is a world of pain; a disk with a single partition is much more reliable in an mdadm array. I wonder whether that was because I was also using a buggy EFI mainboard.
The mdadm manpage specifically says to use partitions with a special partition type, not whole disks, to prevent various 3rd-party software tools from messing with the array.
> RAID devices are virtual devices created from two or more real block devices. This allows multiple devices (typically disk drives or partitions thereof) to be combined ...
Perhaps, but either way it's not great to hear that some UEFI firmware has been designed to be "clever" to the point of trying to hand-hold users by deciding what should be in the superblock of a device without your explicit permission.
I can almost forgive it in an OS designed for novices, but even those tend to ask permission. Putting these assumptions into hardware and not even asking permission, though -- that's a broken piece of hardware IMO.
GPT has a backup/recovery feature. If the disk was ever a valid GPT disk, then there'll be a copy of the partition table in the final block -- and if that wasn't cleared properly then a functioning UEFI is supposed to restore the primary table from it.
The UEFI BIOS may be following the spec to the letter here. If so, the takeaway is to always use `wipefs -a`.
The spec actually says that software should prompt for confirmation and report that it performed changes (see elsewhere in this thread).
`wipefs -a` is a good recommendation, but perhaps it can delete too much? In my case, I used `sgdisk --zap` to delete specifically the GPT bytes without deleting other data.
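For anyone comparing the two, this is my rough understanding of their scopes (worth checking the man pages before running them on a real disk):

    sgdisk --zap /dev/sdX       # destroys the GPT structures but leaves the MBR alone
    sgdisk --zap-all /dev/sdX   # destroys both GPT and MBR structures
    wipefs -a /dev/sdX          # erases every signature wipefs recognizes
                                # (partition tables, filesystems, mdadm superblocks, ...)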
Why don't you contact them directly [0] instead of writing in some forum where no staff member will ever read it? They have a contact form on their website for technical support.
Non-public issue trackers are a waste of time because they don't allow users with the same issue to congregate around a discussion of that issue.
If ASRock wanted people to contact them about bugs like this, they should create a public bug tracker; otherwise they deserve the delayed response time from not getting this via the preferred channels.
I recently encountered a bug in an Asrock UEFI (CPU power limits ignored in the X299 ITX motherboard) and contacted Asrock about it.
Got a fixed version emailed to me within a couple of days. Perhaps try that instead of the forums.
For every good experience like that, there is an equal and opposite bad one.
I have an X399 Taichi for my TR2. I'm using 2 older U.2 NVMe drives that have a legacy boot option ROM that causes the board to hang 9/10 times at POST. The only option is to disable CSM, so it will only try to use the UEFI option ROM. This works fine, EXCEPT that the CSM-disable setting is lost every time the board loses power. I.e., that configuration setting is not properly saved to non-volatile storage.
When I reported this to ASRock (via email), I was told in very broken English to "re install windows". This is great advice, considering I'm running FreeBSD, not to mention that the entire issue happens at POST. Sigh.
That's very interesting; it just made me realize my X299 ITX board has the exact same issue with CSM while trying to boot into OS X (so also BSD-ish, but hackintosh).
So there's a good chance of hanging on boot unless I disable CSM, and a power-loss event will clear the CSM setting.
This is interesting. FreeBSD has some... deficiencies... in the EFI bootloader's selection of memory to use at boot time. I wonder if the FreeBSD EFI boot loader and your hackintosh boot loader could be trashing the persistent memory where this CSM setting is stored. That might be why it is not a commonly reported issue.
I'm also using the X399 Taichi; it has had probably every released non-beta firmware version flashed except 3.00 and 3.10, with CSM permanently disabled, and it has never lost that setting.
FWIW, I'm currently running 3.2, and that's all I've ever run. I think there have been a few BIOSes since then, but I have not tried them. This bug is annoying, but manageable (esp. since the box is on a UPS), so I don't want to go through the hassle of an upgrade.
All I see in 3.30 is "Update "MdeModulePkg" module to improve M.2 RAID compatibility." I'd be thrilled if IOMMU support got better, but how can I tell from that? Eg, why do you say IOMMU support will improve? Are there more detailed release notes hidden someplace?
FWIW, I'm typing this on Chrome running in a bhyve VM, and I pass a USB controller through via PCI passthrough (using the IOMMU) for webcam and U2F dongle use by Chrome. The FreeBSD IOMMU driver whines a bit, but I just assumed that was buggy FreeBSD IOMMU support. E.g.: "ivhd0: Error: completion failed tail:0x7e0, head:0x0."
I appreciate the courtesy of this gentleman's language. It's cool that he pointed out something potentially product-defeating while keeping a kind and informative tone the whole way. I like it when people are nice on the internet.
> I use software RAID1 on Linux using mdadm using whole disk devices (no partitions).
> It is important to know that a disk configured to be part of an mdadm RAID array can look like a broken EFI disk.
I hate to say this, but it is obvious that a software RAID like mdadm cannot grab raw disks as a whole. If the disk doesn't have a valid GPT and EFI partition at the beginning, how is it possible to boot from GRUB, load the initramfs, and then load mdadm?
I think something must be misconfigured in the first place.
No. Just like with hardware-based RAID, you can set mdadm to use a full raw disk.
The GPT is only useful when you want to use part of the disk for RAID, i.e. partitioning the disk into multiple parts where only some of them will be RAIDed. Some people prefer the opposite: first RAID the raw disks and then partition that md storage.
The other reason to put a GPT (or an old-style partition table) on the disk is so that no system or tool will mistakenly try to be helpful by detecting an "uninitialized drive" and then asking to perform a quick format.
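A rough sketch of that "RAID first, partition later" order, in case it's not obvious (device names are placeholders):

    # build the mirror from the raw disks, then treat the md device as the disk
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
    parted -s /dev/md0 mklabel gpt
    parted -s /dev/md0 mkpart primary 1MiB 100%
    mkfs.ext4 /dev/md0p1    # partitions on an md device show up as md0p1, md0p2, ...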
I had this exact problem three years ago. My software RAID was over a raw device, not a partition, and when I got new hardware for a home file server that had been running for seven years, the new motherboard wiped the superblock. I was able to recover most of the data with some data recovery tool. Unfortunately, some of the files were research data where the filename was significant, and I lost all filenames and paths during recovery.
I've been using my ASRock Z97 Extreme 6 for about two years now and it's been very reliable for me. I ended up building a few systems for friends and went with ASRock for those (one a Z77 for an older 3rd-gen Intel, another two were for Ryzen builds) and those have been solidly reliable as well.
What made you decide that? I've personally also written them off after I bought a mobo from them with two non-functioning RAM slots a few years ago, but I'm curious what others have seen. So far the only ones I have not had problems with were Gigabyte, but I suspect they have their occasional issues as well.
Edit: Before you ask, yes the RAM was listed in the compatibility list. Same RAM, same CPU, same chipset on a Gigabyte mobo works just fine.
If a user buys some second-hand hardware that's 3 years obsolete and does out-of-spec things to it… yeah, I'm hardly surprised the vendor isn't looking into it.
Really? If that's the case, you think it's too difficult for them to just say, "Thank you for your post, but we are doing what the spec says to do, see the relevant portion here: ..."? Courtesy is a thing. At least I think it still is.
> Original author(s): Neil Brown
> Developer(s): Jes Sorensen
> Initial release: 2001
Why do these random authors of code need to inform people that their hardware has shitty bugs which apparently never get fixed, because "everyone will read the best practices"? That seems problematic to me...
I would expect it sooner from the mainboard manufacturer in their documentation, perhaps with a warning and a choice (UEFI has a GUI, so why not, eh!) before destroying disk contents... (The OP's system clearly just does it without even prompting or giving any indication that it will.) Before these firmwares modify the disk, they could easily prompt to check whether the data's owner is OK with modifying it.
Manufacturers shouldn't rely on assumptions in their code for processes that can destroy someone's personal data in any case. The fact that some people who dive into disk drives a lot warn that this kind of problem exists doesn't make the situation any better.
It also helps to safeguard against semi-buggy and forgetful humans. Literally the software equivalent of putting a label on your data.
GPT with 1MB-aligned partitions will probably work for the foreseeable future, if for no other reason than inertia (popularity across OSes) and it being a nice binary multiple.
As a note: I refuse to use the silly 'MiB'-style syntax. 1M bytes is base 10; 1MB is the base-2 value 'near' it, since 'bytes' implies the device-native base-2 alignment rather than the base-10 default common among humans.
What about recoverability features? Starting with RAID 1 mirroring and then going through to less storage-wasteful options like RAID 5? Or the more resilient RAID 6?
You will still want striping (RAID 0) if you truly require more performance and more unified storage (a single storage pool as opposed to 6 individual drives).
Usually each NVMe drive has its own PCIe lanes, so having two drives RAIDed gains performance over having one drive. AMD is now really pushing extra lanes; for Threadripper they published reports on a 6x RAID 0 of NVMe SSDs, where they got 21 gigabytes/s of reads out of storage.
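Something like this is all it takes to build such a stripe with mdadm; the device names and chunk size here are just illustrative:

    # RAID 0 across two NVMe drives; reads and writes are spread over both devices
    mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512K \
          /dev/nvme0n1 /dev/nvme1n1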
Please note, it is now important to check the motherboard information regarding PCIe lane availability, and ensure that between the GPU and the M.2 slots there are plenty of PCIe lanes. Some early motherboards had 4 M.2 slots, but only the first 3 slots have guaranteed PCIe lanes; using the 4th forces sharing of PCIe lanes and is slower than the first 3.
The PCIe bus can only be a bottleneck if you have a fan-out PCIe switch somewhere. If all your drives are directly attached to the CPU, then the bottleneck is usually the software overhead of the RAID. Sometimes with fast SSDs, the abstraction layer can add so much latency that your investment in fast low-latency drives is squandered.
Good time to promote my videos visualizing how GPT and Linux RAID work :-) I did some cool live visualizations of mdadm, mkfs.ext4, gdisk, injecting errors etc. https://rwmj.wordpress.com/2018/11/26/nbdkit-fosdem-test-pre... https://rwmj.wordpress.com/2018/11/06/nbd-graphical-viewer-r...