Microsoft went out of its way looking for methods to make ACPI not work well with Linux. Microsoft viewed ACPI as its spec, one it didn't want Linux "getting for free".
And it's not like the distros had a choice. ACPI is the only way to, for instance, enumerate cores other than the boot processor on x86. Features as simple as SMP require ACPI on x86.
There's much worse in UEFI - UEFI, while a lot better than the awful real-mode boot process that came before it, actually requires building Windows PE executables and DLLs, even for a Linux system. And the sky didn't fall.
And then it continued to be bolted onto, not only by the PC's BIOS but also by the possibility of secondary boot ROMs on expansion cards, over 30 years later.
 Call it paraphrasing, as I don't remember right now whether you could even boot from the 2nd floppy drive.
So the low-level boot mechanisms on Arm SBCs don't really matter, because they are just used to boot an SBC-specific UEFI image. The UEFI image then provides a standard mechanism for booting your OS of choice. The fact that it's taken 15+ years for the Arm ecosystem to notice this might be an interesting thing to study.
* https://github.com/pftf (SBBR-compliant (UEFI+ACPI) AArch64 firmware for the Raspberry Pi)
"Well, ARM should just be more standardized in its hardware interface then" is a common refrain I hear, but that's just nerfing a major benefit of ARM in the first place. It's a strategic advantage that it didn't have its own set of legacy cruft, or an extremely heavyweight interface to everything like PCI is.
So, at this point Arm machines have shown that everyone doing their own thing just creates giant piles of DT cruft and the associated mess in the kernel, as each SoC designer thinks they have the one true way to do interrupts, or whatever other portion of the SoC one chooses to look at. Hence the standards dictate the use of Arm-compatible GICs and the like.
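The practical difference shows up in the device tree's `compatible` string. A hypothetical fragment (addresses made up) illustrating why a standard GIC is cheap to support: the generic kernel driver binds on a well-known string, whereas a bespoke controller would need its own string, binding document, and driver:

```dts
/* Standard GICv2 node: every mainline kernel already has a driver
 * that matches "arm,gic-400", so no new code is needed. */
intc: interrupt-controller@2c001000 {
        compatible = "arm,gic-400";
        interrupt-controller;
        #interrupt-cells = <3>;
        reg = <0x2c001000 0x1000>,   /* distributor */
              <0x2c002000 0x2000>;   /* CPU interface */
};

/* A vendor's "one true way" controller would instead look like
 * compatible = "socinc,custom-intc-v7"; -- and someone has to write
 * and maintain the driver behind that string forever. */
```

Multiply that second case by every SoC vendor and every block in the SoC, and you get the DT cruft pile described above.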
It's also a large part of why many IoT-like devices get updates for a year or two, then get dropped as the maintenance burden of all these custom devices/etc. gets too great.
Also, much of the work (like pinmuxing/clock management/etc.) is completely redundant if the firmware configures it. That is part of the abstraction.
edit: At this point, time to market is a more important metric than the few thousand transistors saved by failing to provide standard machine-autodetection logic.
In both cases an engineer has to get involved to describe the address/interrupt/configuration mechanism to the OS, and likely provide a driver or set of tweaks for every single OS in existence that might ever run on the platform.
The point of the standardization is that SoCInc adds a device that behaves like one of the standard devices in the ecosystem (say NVMe, AHCI, or SDHCI for a disk), does the low-level platform-specific configuration in the firmware, and hands the OS something that looks like a standard PCI, ACPI, USB, or whatever device. Then it can boot any of the dozen OSes or hypervisors the end customer might be interested in, using the drivers they already have built in. The onboard management processor then does all the realtime power/perf monitoring, responding to coarse-grained OS/ACPI requests with custom code that is aware of SoC-specific load/bus-utilization/thermal/etc. values.
So, all this ends up being an engineering tradeoff: is it worthwhile to attempt to cram one's application into an ESP8266 (it's, after all, quite capable and has excellent wifi/web-style interfaces built in), or to run a full-blown Linux kernel/etc.? By the time you're running Linux, you're already talking about a pretty massive system just to get to the capabilities of that ESP8266, so the overhead of these standards is quite trivial. They were, after all, invented for machines with performance and resource requirements closer to that ESP8266 than to anything in the entire Cortex-A line.
The choice to dedicate engineers at SoCInc to duplicating the efforts of those at Microsoft or Canonical, in order to shave off frontend engineering costs, is likely a very poor one. As I mentioned above, it's probably a large reason why so many IoT and Android devices don't get updates after a year or two. Those engineers are off writing new autoexec.bat files for the latest SoCs. Dedicating them instead to fixing security holes and bugs in the OS for the next 20 years, for every SoC shipped, costs a lot of money.
There is very nearly zero value added by hiring an engineer to design an interrupt controller, another to write a driver for it for every OS in existence, and another to maintain the result for 20 years. For what? Lower interrupt latency? To save a thousand transistors or a license fee? If your customer doesn't have some very specific use case for that interrupt controller, the value is likely negative.
An interrupt controller takes maybe a couple of weeks to design and write a driver for, in my experience. Round up and call it a man-month. It doesn't take a lot of units sold for that to be cheaper than licensing an IP core.
I suspect there is a large difference between wired-OR interrupts and something that can inform all the Arm exception levels, IPIs, direct injection into VMs, etc. AKA all the things required to run Linux on a modern Arm core in a reasonable fashion.
But you didn't address the important part: your calculus is wrong on the support time frames. They aren't supporting these things for 20 years. And I guarantee you they've crunched the numbers; much of this design effort is focused on taking a fraction of a penny off each shipped unit.
Adding to that, the underlying piece is that they don't care about the bootloader being nice, and "they" is several entities, each with their own goals and requirements. The chips in SBCs are almost entirely leftovers: the vendor made too many for some big customer that, for whatever reason, didn't use the whole batch, and sold them off at fire-sale prices. Then some incredibly cost-conscious shop buys them up, sticks them on cost-reduced dev boards with the custom kernel, and calls it a day. The chip company doesn't care because the major customers don't care. The major customers (i.e. the integrator for whom the chip was designed in the first place) don't care because they just want a Linux kernel booting, and don't care how that happens. The SBC customer doesn't care because they're already running on such low margins they can't afford to spend anything on software as it is.
And quite frankly, the ARM standards you've given as an example, "just buy more ARM IP blocks", feel more like a cash grab on ARM's part than a real attempt at a standard. "Just pay us for a GIC, and implementing a GIC interface yourself or buying from someone else will probably have our lawyers calling you" is not a real solution in a cost-conscious space like SBCs.
Response to edit: ACPI doesn't make time to market any better, and they'll just phone it in there too. I think the ACPI tables of the PS4 would crash Linux if you tried to boot without going in and manually editing them, because they cared about time to market and only about booting their own OS on their own hardware.
For a long time neither were documented so people just sort of assumed they were the same thing.
The whole block with the VPUs, ISPs, MPEG-4 codec, and 3D pipeline could be described as a "GPU", and the GPU firmware is the firmware that runs it.
Modern desktop GPUs also contain a management/security CPU core which runs everything. And that is correctly described as GPU firmware, though it's commonly called the vBIOS.
But the VPUs are so much more than a management processor for the 3D pipeline. They have a huge vector array and can do software encoding/decoding of video up to 480p MPEG-2 (VideoCore 4 added hardware H.264 codecs that could do 1080p).
I like to think of the VPUs as multimedia co-processor cores.
The iPod Video used the VideoCore 2 as its GPU. It had an ARM SoC as its main CPU and a separate VideoCore chip. The VideoCore handled all display output and video decoding. It had no 3D pipeline, but they wrote a software OpenGL implementation which ran on the VPU to implement the games.
There are actually a few set-top boxes (and maybe a cellphone) which just use the VideoCore VPUs as the main CPU. It's flexible enough for that.
In none of these cases (including the RPi) is this core the GPU.
It's not where you should be looking at if you want to make the ARM booting space any better.
It feels like a self defeating cash grab.
Take the Odroid Go Advance, a Rockchip RK3326-based "retro handheld": it and its clones were a tremendous hit last year, and they're still stuck on a 4.4 kernel.
The LBBR spec seems nice. The article does mention that it's aimed at some hyperscale datacenter stuff, but to me it seems like it could work well for SBCs too, though I don't know if I'm missing something.
I suppose the only phone manufacturers that are at all likely to do something like this are companies like Purism and Pine64.