
ASRock motherboard destroys Linux software RAID - nh2
http://forum.asrock.com/forum_posts.asp?TID=10174
======
rwmj
I suspect what's actually happening here is there was an old GPT partition
table on the disk which was not wiped before mdadm -C was run. This means it
has a correct primary partition table (with correct CRC etc), and MD RAID has
overwritten the secondary PT at the end of the disk.

Unfortunately (and I genuinely hate to be "language lawyering" when someone
has lost all their data), the ASRock firmware is doing the right thing
according to the letter of the UEFI spec -- see section 5.3.2 of
[http://www.uefi.org/sites/default/files/resources/UEFI%20Spe...](http://www.uefi.org/sites/default/files/resources/UEFI%20Spec%202_7_A%20Sept%206.pdf)
:

> If the primary GPT is invalid, the backup GPT is used instead and it is
> located on the last logical block on the disk. If the backup GPT is valid it
> must be used to restore the primary GPT. If the primary GPT is valid and the
> backup GPT is invalid software must restore the backup GPT.
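
The rule quoted above can be sketched as a small decision function. This is only an illustration of the quoted wording: the function name and the returned strings are made up here, and nothing in it is taken from any actual firmware.

```python
def gpt_repair_action(primary_valid: bool, backup_valid: bool) -> str:
    """Map the validity of the two GPT copies to the action the quoted
    spec text asks for. Names and return strings are hypothetical."""
    if primary_valid and backup_valid:
        return "nothing to repair"
    if primary_valid:
        # Primary is fine, backup is not: rewrite the backup at the disk's end.
        return "restore backup from primary"
    if backup_valid:
        # Primary is broken, backup is fine: rewrite the primary at the start.
        return "restore primary from backup"
    return "no valid GPT"
```

In the scenario described here (old primary GPT still intact, backup area overwritten by MD RAID), the second branch applies, which is why data at the end of the disk gets rewritten.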

Good time to promote my videos visualizing how GPT and Linux RAID work :-) I
did some cool live visualizations of mdadm, mkfs.ext4, gdisk, injecting errors
etc.
[https://rwmj.wordpress.com/2018/11/26/nbdkit-fosdem-test-pre...](https://rwmj.wordpress.com/2018/11/26/nbdkit-fosdem-test-presentation/#content)
[https://rwmj.wordpress.com/2018/11/06/nbd-graphical-viewer-r...](https://rwmj.wordpress.com/2018/11/06/nbd-graphical-viewer-raid-5-edition/#content)

~~~
leni536
So UEFI requires a GPT to be present, doesn't it? Otherwise what's the
objective difference between an invalid GPT and no GPT?

~~~
creshal
A valid GPT header at either the beginning or the end of the disk. Because of
how GPT is laid out, the secondary copy _starts_ with the partition table and
_ends_ with the header, which occupies the last sector of the disk.

So unlike with MBR disks, you can't just nuke the first few sectors and call
it a day, you have to wipe the whole disk (or use a GPT-aware tool like
wipefs).
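
For illustration, a minimal Python sketch that looks for a GPT header signature at both ends of a disk image. It assumes 512-byte logical sectors and only compares the 8-byte `EFI PART` signature at the start of LBA 1 and at the start of the last LBA; it does not verify the header CRC, so it is a quick check rather than a real validity test. `find_gpt_signatures` is a hypothetical helper name.

```python
import os

GPT_SIGNATURE = b"EFI PART"  # first 8 bytes of a GPT header


def find_gpt_signatures(path, sector_size=512):
    """Return (primary_found, backup_found) for a disk or disk image.

    Assumption: sector_size matches the device's logical sector size.
    Only the signature is compared; CRCs are not checked.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size < 2 * sector_size:
            return (False, False)
        f.seek(sector_size)            # LBA 1: primary header
        primary = f.read(8) == GPT_SIGNATURE
        f.seek(size - sector_size)     # last LBA: backup header
        backup = f.read(8) == GPT_SIGNATURE
    return (primary, backup)
```

A result of `(True, False)` on a whole-disk md member would match the situation discussed in this thread: a leftover primary header with the backup area overwritten.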

~~~
vbezhenar
Can I wipe just the first few sectors and the last few sectors?

~~~
mjevans
To make it look empty, yes.

It's much easier to use a tool like sgdisk:

    sgdisk -Z
    sgdisk --zap-all

(note the capital Z)
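
For anyone curious what "first few and last few sectors" means in practice, here is a rough Python sketch that zeroes both GPT areas. It assumes 512-byte sectors and the default 128-entry table, so the primary copy (protective MBR, header, entries) spans the first 34 sectors and the backup copy (entries, header) spans the last 33. `wipe_gpt_areas` is a made-up helper name, this destroys data, and on a real disk a dedicated tool like sgdisk is the safer choice.

```python
import os


def wipe_gpt_areas(path, sector_size=512, head_sectors=34, tail_sectors=33):
    """Zero the regions where the primary and backup GPT normally live.

    Assumptions: 512-byte sectors and a default-sized (128-entry) table.
    Intended for experiments on image files; it irreversibly erases data.
    """
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        head = min(size, head_sectors * sector_size)
        tail = min(size, tail_sectors * sector_size)
        f.seek(0)
        f.write(b"\0" * head)          # protective MBR + primary header + entries
        f.seek(size - tail)
        f.write(b"\0" * tail)          # backup entries + backup header
        f.flush()
        os.fsync(f.fileno())
```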

------
self_awareness
The number of bugs in the UEFI of new motherboards shocks me. Bugs are all over
the place; sometimes I change an option in Setup only to find out it's
disabled after reboot, and switching the option ON for the 2nd time works. Or,
on my motherboard, the boot menu is bugged: to boot my Windows disk, I have to
choose the 'Switch to English' menu entry. I'm afraid to clear EFI vars because
at least it currently works, and who knows what'll happen when I clear them.

This really discourages me from buying new hardware.

~~~
avhception
UEFI implementations are a bugfest, with every other board exhibiting a new
idiosyncrasy. I probably don't even remember half of the bugs I've
encountered.

My favorite one was when the integrated EFI shell wouldn't scroll, so every
new line would just print over the previous one.

The most obnoxious and common is hardcoding the boot path to be
"\EFI\BOOT\BOOTX64.efi", ignoring what the boot entry says on that matter.

~~~
Filligree
> The most obnoxious and common is hardcoding the boot path to be
> "\EFI\BOOT\BOOTX64.efi", ignoring what the boot entry says on that matter.

Tell us what manufacturer does that, so I can avoid ever buying from them.

------
wyldfire
> It is important to know that a disk configured to be part of an mdadm RAID
> array can look like a broken EFI disk.

Great detective work. When reporting a bug like this, it's this extra-mile
investigation that gives the report a lot of weight and credibility.

------
richardwhiuk
In my experience, configuring whole disks as software RAID is a world of pain;
a disk with a single partition is much more reliable when in an mdadm array.
I wonder whether this was because I was also using a mainboard with buggy EFI
firmware.

~~~
creshal
The mdadm manpage specifically says to use partitions with a special
partition type, not whole disks, to prevent various 3rd party software tools
from messing with it.

~~~
nh2
Which part of `man mdadm` says that?

My `man mdadm` shows:

    RAID devices are virtual devices created from two or more real block devices.
    This allows multiple devices (typically disk drives or partitions thereof) to
    be combined ...

_"Disk drives or partitions thereof"_.

I couldn't spot the mention you're referring to.

------
nik736
Why don't you contact them directly [0] instead of writing in some forum where
no staff member will ever read it? They have a contact form on their website
for technical support.

0: [http://event.asrock.com/tsd.asp](http://event.asrock.com/tsd.asp) |
[http://forum.asrock.com/forum_posts.asp?TID=5265&title=howto...](http://forum.asrock.com/forum_posts.asp?TID=5265&title=howto-contact-tech-support)

~~~
MrStonedOne
Non-public issue trackers are a waste of time because they don't allow users
having the same issue to congregate around discussion of that issue.

If ASRock wanted people to contact them about bugs like this, they should
create a public bug tracker; otherwise they deserve the delayed response time
from not getting this via the preferred channels.

------
Arie
I recently encountered a bug in an Asrock UEFI (CPU power limits ignored in
the X299 ITX motherboard) and contacted Asrock about it. Got a fixed version
emailed to me within a couple of days. Perhaps try that instead of the forums.

~~~
drewg123
For every good experience like that, there is an equal and opposite bad one.

I have an X399 Taichi for my TR2. I'm using 2 older U.2 NVMe drives that have
a legacy boot option ROM that causes the board to hang 9/10 times at POST. The
only option is to disable CSM, so it will only try to use the UEFI option ROM.
This works fine, _EXCEPT_ that the CSM disable is lost every time the board
loses power. I.e., that configuration setting is not properly saved to
non-volatile storage.

When I reported this to ASRock (via email), I was told in very broken English
to "re install windows". This is great advice, considering I'm running
FreeBSD, not to mention that the entire issue happens at POST. Sigh.

~~~
Arie
That's very interesting; it just made me realize my X299 ITX board has the
exact same issue with CSM while trying to boot into OS X (so also BSD-ish, but
hackintosh). So there is a good chance of hanging on boot unless I disable
CSM, and a power loss event clears that CSM setting.

~~~
drewg123
This is interesting. FreeBSD has some... deficiencies... in the EFI
bootloader's selection of memory to use at boot time. I wonder if the FreeBSD
EFI boot loader and your hackintosh boot loader could be trashing the
persistent memory where this CSM setting is stored. That might be why it is
not a commonly reported issue.

------
sova
I appreciate the courtesy of this gentleman's language. It's cool that he
pointed out something potentially product-defeating and all the while kept a
kind and informative tone. I like it when people are nice on the interneb.

------
kbumsik
> I use software RAID1 on Linux using mdadm using whole disk devices (no
> partitions).

> It is important to know that a disk configured to be part of an mdadm RAID
> array can look like a broken EFI disk.

I hate to say this, but it is obvious that a software RAID like mdadm cannot
grab raw disks as a whole. If the disk doesn't have a valid GPT and EFI
partition at the beginning, how is it possible to boot from GRUB, load
initramfs, and then load mdadm?

I think something must be misconfigured in the first place.

~~~
nh2
It's a fully supported use case of mdadm RAID to use bare devices.

These are data disks containing no operating system, so there's no booting
from them and GRUB doesn't come into play.

~~~
kbumsik
Oh, I read the context wrong. I thought he/she used the disks as bootable
disks as well. But do the disks have to have a proper GPT to be safe?

~~~
adamzochowski
No. Just like with all hardware-based RAIDs, you can set mdadm to use the full
raw disk.

The GPT is only useful when you want to use part of the disk for RAID:
partitioning the disk into multiple parts where only some of them will be
RAIDed. Some people prefer the opposite: first RAID the raw disks and then
partition that md storage.

The other reason to put a GPT (or old-style partition table) on the disk is so
that no system or tool will mistakenly try to be helpful by detecting an
"uninitialized drive" and then asking to perform a quick format.

------
dadrian
I had this exact problem three years ago. My software RAID was over a raw
device, not a partition, and when I got new hardware for a home file server
that had been running for seven years, the new motherboard wiped the
superblock. I was able to recover most of the data with some data recovery
tool. Unfortunately, some of the files were research data where the filename
was significant, and I lost all filenames and paths during recovery.

------
Uhrheber
Don't get me started with ASRock. I'm definitely NOT buying anything from them
again.

~~~
nh2
Well, I originally got this ASRock mainboard because they seem to be much
better on support.

For example, they are still releasing BIOS updates in 2018, while the vendor
of my previous mainboard from the same generation stopped pushing updates in
2016.

~~~
chaoticmass
I've been using my ASRock Z97 Extreme 6 for about two years now and it's been
very reliable for me. I ended up building a few systems for friends and went
with ASRock for those (one a Z77 for an older 3rd gen Intel, another two were
for Ryzen builds) and those have been solidly reliable as well.

------
shmerl
Hopefully ASRock will respond. They often don't.

~~~
JustSomeNobody
Looks like it has been two weeks so far and no response. That's pretty bad.
Someone should at least acknowledge that they are looking into it.

~~~
creshal
If a user buys some second-hand hardware that's 3 years obsolete and does
out-of-spec things to it… yeah, I'm hardly surprised the vendor isn't looking
into it.

~~~
JustSomeNobody
Really? If that's the case, you think it's too difficult for them to just say,
"Thank you for your post, but we are doing what the spec says to do, see the
relevant portion here: ..."? Courtesy is a thing. At least I think it still
is.

~~~
mar77i
[https://xkcd.com/979/](https://xkcd.com/979/)

See other sections in this thread: mdadm itself recommends using a partition
to prevent hardware from tampering with the data.

Therefore I don't see any real problem with anything around here.

~~~
vectorEQ
Original author(s): Neil Brown. Developer(s): Jes Sorensen. Initial release:
2001.

Why do these random authors of code need to inform people of the fact that
their hardware has shitty bugs which are apparently never fixed because
'everyone will read best practices'? That seems problematic to me....

I would sooner expect it from the mainboard manufacturer in their information,
with perhaps a warning and a choice (UEFI has a GUI, so why not, eh!) before
destroying disk content.... (The OP's system clearly just does it without even
prompting or giving an indication it will do it.) Before these firmwares modify
the disk they could easily prompt to check if it's OK with the data's owner to
modify this data.

Manufacturers shouldn't rely on assumptions in their code for processes which
can destroy someone's personal data in any case. There being a warning from
some people who dive into disk drives a lot that this kind of problem exists
doesn't make that situation any better.

~~~
mar77i
Well I may be imagining a pattern here that's biased by my own mileage, but
starting from what computers usually come pre-installed with (windows)...

------
dalore
Reading the title, I first thought it was metaphorical, as in they destroyed
it on performance. Now I see it's literal.

------
exabrial
Does software raid do anything in the era of NVMe for performance? Seems like
the bottleneck would be the PCIe bus.

~~~
adamzochowski
What about recoverability features? Starting with RAID 1 mirroring and then
going through to less storage-wasteful levels like RAID 5? Or the more
resilient RAID 6?

~~~
exabrial
Apologies, I was specifically referring to performance

~~~
adamzochowski
You will still want striping (RAID 0) if you truly require more performance
and more unified storage (a single storage pool as opposed to 6 individual
drives).

Usually each NVMe drive has its own PCIe lanes, so having two drives RAIDed
gains performance over having one drive. AMD is now really pushing into having
bonus lanes, so for Threadripper they published reports on 6x RAID 0 of NVMe
SSDs, where they got 21 gigabytes/s reads out of storage.

[https://community.amd.com/community/gaming/blog/2017/10/02/n...](https://community.amd.com/community/gaming/blog/2017/10/02/now-available-free-nvme-raid-upgrade-for-amd-x399-chipset?sf118245427=1)
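
As a back-of-the-envelope check on that number, here is a small Python sketch of the theoretical ceiling. It assumes each drive sits on its own PCIe 3.0 x4 link (an assumption on my part; the per-lane figure follows from 8 GT/s with 128b/130b encoding) and ignores protocol overhead. The function name is just for illustration.

```python
def pcie3_gb_per_s(lanes):
    """Raw PCIe 3.0 bandwidth in GB/s for a given lane count.

    8 GT/s per lane with 128b/130b encoding; protocol overhead is ignored.
    """
    bits_per_second_per_lane = 8e9 * 128 / 130
    return lanes * bits_per_second_per_lane / 8 / 1e9


per_drive = pcie3_gb_per_s(4)      # about 3.94 GB/s for one x4 drive
six_drives = 6 * per_drive         # about 23.6 GB/s for six drives
print(f"{per_drive:.2f} GB/s per drive, {six_drives:.1f} GB/s for six")
```

Under those assumptions, 21 gigabytes/s would be roughly 89% of the raw link ceiling for six drives.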

Please note, it is now important to check the motherboard information regarding
PCIe lane availability, and ensure that between the GPU and M.2 slots there are
plenty of PCIe lanes. Some early motherboards had 4 M.2 slots, but only the
first 3 slots had guaranteed PCIe lanes, while utilizing the 4th would force
sharing of PCIe lanes and be slower than the first 3.

