AArch64 Boards and Perception (juszkiewicz.com.pl)
67 points by pabs3 9 days ago | 31 comments

> I heard several rumours about why ACPI. Someone said that ACPI was forced by Microsoft. In reality it was decision taken by all major distros and Microsoft.

Microsoft went out of their way looking for methods to make ACPI not work well with Linux. And Microsoft viewed ACPI as their spec that they didn't want Linux "getting for free".


And it's not like the distros had a choice. ACPI is the only way to, for instance, enumerate other cores than the boot processor on x86. Features as simple as SMP require ACPI on x86.
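
As a rough illustration of the core enumeration mentioned above: ACPI lists processors in the MADT table as "Processor Local APIC" entries. The sketch below walks such entries in a byte blob. The helper names (`enabled_apic_ids`, `fake_madt`) and the synthetic table are assumptions made for illustration; this is not how any particular kernel does it.

```python
import struct

# MADT layout per the ACPI specification: a 36-byte standard table header,
# a 4-byte local interrupt controller address, 4 bytes of flags, then a list
# of variable-length entries that each start with (type, length).
MADT_ENTRIES_OFFSET = 44
TYPE_LOCAL_APIC = 0   # "Processor Local APIC" entry, 8 bytes long
FLAG_ENABLED = 1      # bit 0 of the entry's flags field


def enabled_apic_ids(madt: bytes) -> list:
    """Return the APIC IDs of all enabled processors listed in a MADT blob."""
    ids = []
    offset = MADT_ENTRIES_OFFSET
    while offset + 2 <= len(madt):
        entry_type, length = struct.unpack_from("<BB", madt, offset)
        if length < 2 or offset + length > len(madt):
            break  # malformed entry: stop instead of looping forever
        if entry_type == TYPE_LOCAL_APIC and length >= 8:
            # Fields after (type, length): processor UID, APIC ID, flags.
            _uid, apic_id, flags = struct.unpack_from("<BBI", madt, offset + 2)
            if flags & FLAG_ENABLED:
                ids.append(apic_id)
        offset += length
    return ids


def fake_madt(cpus) -> bytes:
    """Build a synthetic table for demonstration (hypothetical helper).

    `cpus` is a list of (apic_id, enabled) pairs. The 44 leading bytes are
    left zeroed; a real table carries a signature, checksum, and so on.
    """
    body = b"".join(
        struct.pack("<BBBBI", TYPE_LOCAL_APIC, 8, apic_id, apic_id, int(enabled))
        for apic_id, enabled in cpus
    )
    return bytes(MADT_ENTRIES_OFFSET) + body
```

Calling `enabled_apic_ids(fake_madt([(0, True), (1, True), (2, False)]))` skips the disabled third entry and reports only the first two IDs.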

I'm pretty sure it's referring to the use of ACPI on AArch64, which came way later and was (probably) primarily implemented to run linux.

The real energy behind ACPI was Intel, not Microsoft - Intel ironically wanted an instruction-set neutral firmware glue given that they were splitting their efforts between Itanium and x86. And Intel is pretty pro-linux.

There's much worse in UEFI - UEFI, while a lot better than the awful real-mode boot process that came before it, actually requires building Windows PE executables and dlls, even for a linux system. And the sky didn't fall.

Fully agreed on the horribleness of real-mode (BIOS) boot, but for others it might be interesting to know why: It was made to boot a relatively simple 16bit computer with a pretty limited configuration space ("do you have two floppy drives or one[1], and do you have the monochrome MDA graphics adapter or the unprofessional 4-color CGA one?").

And then it continued to be bolted onto, not only by the PC's BIOS but also by the possibility of having secondary boot ROMs on extension cards, over 30 years later.

[1] Call it paraphrasing, as I don't remember right now whether you could even boot from the 2nd floppy drive.

Note that the whole blog is about Arm architectures and platforms rather than x86(-64) ones.

I think the Rpi is perhaps the best example of the craziness. It's the GPU that actually controls the first few stages of booting.

In the end none of that really matters, as evidenced by the PFTF* bringing it into compliance with the relevant standards. The UEFI/ACPI stack is sufficiently abstract that it has allowed x86 to evolve and scale from little embedded devices like the atomic pi, to HP superdomes using the same basic OS for a couple decades.

So the low-level boot mechanisms on Arm SBCs don't really matter because they are just used to boot an SBC-specific UEFI image. The UEFI image then provides a standard mechanism for booting your OS of choice. The fact that it's taken 15+ years for the Arm ecosystem to notice this might be an interesting thing to study.

* https://github.com/pftf (SBBR-compliant (UEFI+ACPI) AArch64 firmware for the Raspberry Pi)

It's more complex than that. One of the major reasons why there isn't a universal ARM standard of booting is that once you've gained control of execution, now you just have a bunch more per SoC work that needs to happen in the kernel compared to x86. ARM is simply far far far more heterogeneous on much deeper levels than x86-pc.

"Well, ARM just should be more standardized on its hardware interface then" is a common refrain I hear, but that's just nerfing a major benefit of ARM in the first place. It's a strategic advantage that it didn't have its own set of legacy cruft or an extremely heavyweight interface to everything like PCI is.

Well neither did x86 back in the < 486 days, but ACPI, PCI and USB were developed in response to exploding complexity of systems that needed to have their machines described in config.sys and autoexec.bat files with matching user set PIO/MMIO/DMA/IRQ jumpers set.

So, at this point Arm machines have shown that everyone doing their own thing just creates giant piles of DT cruft and the associated mess in the kernel, as each SoC designer thinks they have the one true way to do interrupts, or whatever other portion of the SoC one chooses to look at. Hence the standards dictate the use of Arm-compatible GICs and the like.

It's also a large part of why many IoT-like devices get updates for a year or two, then get dropped as the maintenance burden of all these custom devices/etc gets too great.

Also, much of the work (like pinmuxing/clock mgmt/etc) is completely redundant if the firmware configures it. That is part of the abstraction.

edit: At this point, time to market is a more important metric than saving a few thousand transistors failing to provide standard machine autodetection/etc logic.

Everything has changed though, no longer do you have your beige-box 486 connected to a totally different brand mouse over a serial connection. You have your ARM SOC on a phone or tablet talking to the touchscreen physically glued to the front of said-device. The device manufacturers know exactly what hardware is on and connected to the SOC, and can optimize accordingly. There is very little physical device-interop anymore, and anything still left is talking over higher-level busses (USB) or wireless (BLE). Nothing an end-user has is directly driving interrupt pins to a CPU anymore.

Has it? What is the difference from the perspective of a generic OS whether the disk controller is provided by BrokenHWCo and plugged into an ISA slot, or it's provided by SoCInc and glued to an AXI bus in the SoC?

In both cases an engineer has to get involved to describe the address/interrupt/configuration mechanism to the OS, and likely provide a driver or set of tweaks for every single OS in existence that might ever run on the platform.

The point of the standardization is that SoCInc adds a device that behaves like one of the standard devices in the ecosystem (say NVMe, AHCI or SDHCI for a disk), does the low-level platform-specific configuration in the firmware, and hands the OS something that looks to be a standard PCI, ACPI, USB, or whatever device. Then it can boot one of the dozen OSes or hypervisors the end customer might be interested in, using the drivers they already have built in. The onboard mgmt processor then does all the realtime power/perf monitoring in response to coarse-grained OS/ACPI requests and custom code which is aware of SoC-specific load/bus utilization/thermal/etc values.

So, all this ends up being an engineering tradeoff: is it worthwhile to attempt to cram one's application into an ESP8266 (it's after all quite capable and has excellent wifi/web style interfaces built in) or run a full-blown Linux kernel/etc? By the time you're running Linux, you're already talking about a pretty massive system just to get to the capabilities of that ESP8266, so the overhead of these standards is quite trivial. They were after all invented for machines with performance and resource requirements closer to that ESP8266 than anything in the entire Cortex-A line.

The choice to dedicate engineers at SoCInc to duplicate the efforts of those at Microsoft or Canonical to shave frontend engineering costs off is likely a very poor choice. As I mentioned above, it's probably a large reason why so many IoT and Android devices don't get updates after a year or two. Those engineers are off writing new autoexec.bat files for the latest SoCs. Dedicating them to fixing security holes and bugs in the OS for the next 20 years for every SoC shipped costs a lot of money.

There is very nearly 0 value add to hiring an engineer to design an interrupt controller, another to write a driver for every OS in existence for it, and another for 20 years to maintain the result. For what? Lower interrupt latency? To save a thousand transistors or a license fee? If your customer doesn't have some very specific use case for that interrupt controller the value is likely negative.

They're not spending 20 years supporting it though. They're spending six months and then moving all of their engineers to the next chip. An important customer might be able to get them to get a couple of interns to look at something, but even that's like pulling teeth.

Interrupt controllers take maybe a couple weeks to design and write a driver for, in my experience. Round up and call it a man-month. It doesn't take a lot of units sold for that to be cheaper than licensing an IP core.

I designed and wrote an entire CPU in Verilog in a couple weeks in college.

I suspect there is a large difference between wired-OR interrupts and something that can inform all the Arm exception levels, IPIs, direct injection into VMs, etc. AKA all the things required to run Linux on a modern Arm core in a reasonable fashion.

No, like, I've written interrupt controllers that shipped to customers, not a school project. The meat of most of what you're talking about happens in the CPU cores. The interrupt controller just exports a "this is the highest priority interrupt" signal based on some input signals, and some interrupt signals that can be triggered off of a memory write (IPIs). The shit's really almost hello world HDL. VM passthrough is just a parallel set of comparison logic and another "this is the highest priority signal" to the CPU core.
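
To make the description above concrete, here is a minimal software model of that kind of controller: pending and enabled bits per line, a priority per line, and a "highest priority pending" output. The class and method names are hypothetical, and "lower number wins, ties go to the lower line" is an assumption of this sketch. It is not the register layout of the GIC or of any shipped part.

```python
class ToyInterruptController:
    """Software model of a priority-resolving interrupt controller (a sketch)."""

    def __init__(self, num_lines):
        self.pending = [False] * num_lines
        self.enabled = [False] * num_lines
        self.priority = [0] * num_lines

    def configure(self, line, priority, enabled=True):
        """Set the priority of a line and enable or disable it."""
        self.priority[line] = priority
        self.enabled[line] = enabled

    def raise_line(self, line):
        """A device asserts its interrupt wire."""
        self.pending[line] = True

    def write_soft_trigger(self, line):
        """Model a memory-mapped write that raises an interrupt (an IPI)."""
        self.pending[line] = True

    def highest_pending(self):
        """Return the line the core should service next, or None if idle.

        Assumption: a lower priority number wins; ties go to the lower line.
        """
        candidates = [
            (self.priority[line], line)
            for line in range(len(self.pending))
            if self.pending[line] and self.enabled[line]
        ]
        if not candidates:
            return None
        return min(candidates)[1]

    def acknowledge(self, line):
        """The core has taken the interrupt; clear its pending bit."""
        self.pending[line] = False
```

In hardware the `highest_pending` comparison would be combinational logic rather than a loop, and the VM passthrough case described above would amount to a second instance of the same comparison feeding a separate output.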

But you didn't address the important part, that your calculus is wrong on the support time frames. They aren't supporting these things for 20 years. And I guarantee you they've cranked the numbers, so much of these designs are focused at taking a fraction of a penny off of each shipped unit.

Adding to that, the underlying piece is that they don't care about the bootloader being nice, and 'they' is several entities with their own goals and requirements. The chips in SBCs are almost entirely leftovers. They made too much for some big customer that for whatever reason didn't use the whole batch, and went to sell them off at fire sale prices. Then some incredibly cost conscious shop buys them up, sticks them on cost reduced dev boards with the custom kernel and calls it a day. The chip company doesn't care because the major customers don't care. The major customers (ie. the integrator for who the chip was designed in the first place) don't care because they just want a Linux kernel booting, and don't care how that happens. The SBC customer doesn't care because they're already running on such low margins they can't afford to do anything to software as it is.

Right, but that solution ('make the hardware more complex, because who cares when it's all to support literally the fastest CPU cores you can buy') comes with a different cost tradeoff. Other than servers, ARM isn't really in the same niche that can handle that additional gate/licensing cost to paper over the differences.

And quite frankly, ARM's standards that you've given the example of "just buy more ARM IP blocks" feel more like a cash grab on ARM's part than a real attempt at a standard. "Just pay us for a GIC, and implementing a GIC interface yourself or buying from someone else will probably have our lawyers calling you" is not a real solution in the cost conscious space like SBCs.

Response to edit: ACPI doesn't make time to market any better, and they'll just phone it in there too. I think the ACPI tables of the PS4 would crash Linux if you try to boot without going in and manually editing them because they cared about time to market and only booting their OS on their hardware.

Sort of... There's two main architectures in the VideoCore. The QPUs are different than the main VideoCore, and the main videocore (which is also what runs the boot code) isn't really the GPU.

For a long time neither were documented so people just sort of assumed they were the same thing.

The original raspberry pi documentation did kind of give people the wrong idea.

The whole block with the VPUs, ISPs, mpeg4 codec and 3d pipeline could be described as a "GPU", and the gpu firmware is the firmware that runs it.

Modern desktop GPUs also contain a management/security CPU core which runs everything. And that is correctly described as GPU firmware, though it's commonly called vBIOS.

But the VPUs are so much more than a management processor for the 3D pipeline. They have a huge vector array and can do software encoding/decoding of video up to 480p MPEG2 (VideoCore 4 added hardware H.264 codecs that could do 1080p).

I like to think of the VPUs as multimedia co-processor cores.

The iPod Video used the VideoCore 2 as its GPU. It had an Arm SoC as its main CPU and a separate VideoCore chip. The VideoCore handled all display output and video decoding. It has no 3D pipeline, but they wrote a software OpenGL implementation which runs on the VPU to implement the games.

There are actually a few set-top boxes (and maybe a cellphone) which just use the VideoCore VPUs as the main CPU. It's flexible enough for that.

Well, yes, but changing how the initial boot code works still means changes in the thing we call the "videocore firmware", right?

Yeah, but lots of SoCs have special boot and SoC management cores with a different ISA. The Tegra X1 in the switch uses an ARM7, CortexMs are in vogue in a lot of designs, AMD and Intel have the PSP and ME respectively, and even the SiFive SoCs have a E core that boots the main core complex of U cores.

In none of these cases (including the RPi) is this core the GPU.

Appreciate the clarification, but your paragraph above is exactly what I was getting at. Roughly that "the article is right...ARM booting on dev boards is a crazy world". As crappy as legacy BIOS and UEFI is, booting on x86/x64 is more standardized, despite things like PSP and ME.

The point I'm getting at is that sub cores to boot a main core isn't a crazy world, and you're stuck with the same thing on almost all x86 designs as well.

It's not where you should be looking at if you want to make the ARM booting space any better.

IMHO the most important part of the x86 standardization is the IBM PC platform. Arm boards don't have an equivalent platform to standardize on, so everybody does their own thing.

BSA and BBR specs from Arm fix that. They also have a certification program https://developer.arm.com/architectures/system-architectures...

I find it unfortunate though that quite a few of the requirements for those standards are "buy more ARM IP blocks" rather than defining blocks that can be implemented by others.

It feels like a self defeating cash grab.

There is the device tree[0], which Linux uses to know where all the ports are on an SoC. It’s become the de facto standard for ARM. Even non-Linux OSes like iOS adopted it.

[0]: https://www.kernel.org/doc/html/latest/devicetree/usage-mode...
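
A toy sketch of the idea, assuming a device tree modelled as nested Python dicts: each node carries a list of `compatible` strings, most specific first, and the OS binds the first one it has a driver for. The tree contents, addresses, driver names, and the `match_drivers` helper are all made up for illustration; real kernels parse a binary flattened device tree instead.

```python
# Hypothetical board description. The "arm,..." compatible strings mirror the
# naming style of real bindings; the addresses are invented for this example.
TOY_TREE = {
    "name": "/",
    "children": [
        {
            "name": "serial@9000000",
            "compatible": ["arm,pl011", "arm,primecell"],
            "reg": (0x09000000, 0x1000),  # (base address, size)
        },
        {
            "name": "interrupt-controller@8000000",
            "compatible": ["arm,gic-400", "arm,cortex-a15-gic"],
            "reg": (0x08000000, 0x10000),
        },
        {
            "name": "widget@a000000",
            "compatible": ["examplevendor,widget"],  # no driver: left unbound
            "reg": (0x0A000000, 0x100),
        },
    ],
}

# Hypothetical driver table: compatible string -> driver name.
DRIVERS = {
    "arm,pl011": "uart_driver",
    "arm,cortex-a15-gic": "gic_driver",
}


def match_drivers(node, drivers):
    """Walk the tree depth-first and bind each node to the first matching driver.

    Returns a list of (node name, driver name, base address) tuples.
    """
    matches = []
    for compat in node.get("compatible", []):
        if compat in drivers:
            base = node.get("reg", (None, None))[0]
            matches.append((node["name"], drivers[compat], base))
            break  # the most specific match wins
    for child in node.get("children", []):
        matches.extend(match_drivers(child, drivers))
    return matches
```

The fallback entry in the `compatible` list is what lets an older driver bind to a newer variant of a block: here the interrupt controller is picked up through its second, more generic string.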

Interestingly, Apple's use of device tree is older than even ARM's and Linux's. It was part of OpenFirmware and used in almost all of their Macs (and Mac derived lines like iOS) since the PowerMacs gained PCI slots. Even Intel Macs would use device tree internally too, for instance passing the user's password in the FDE unlock screen in the bootloader to the main OS via a DT chosen variable.

Unfortunately this is not common on ARM SBCs, at least not yet, they usually come with a SoC vendor supplied old kernel, and that's about it. And we forgive them, for the sake of low price.

Like the Odroid Go Advance, a Rockchip 3326 based "retro handheld" that, along with its clones, was a tremendous hit last year and is still stuck with a 4.4 kernel.

There is a project to create an open source version of the proprietary GPU firmware that boots into the ARM processor:


I'm glad there is finally some standardization. The previous "Standards in Arm space" articles[1] are a good illustration of why this was needed.

The LBBR spec seems nice. The article does mention that it's for some hyperscale datacenter stuff, but to me it seems like it could work well for SBCs too, though I don't know if I'm missing something.

[1] https://marcin.juszkiewicz.com.pl/2020/10/12/standards-in-ar...

I'd like to see phones get to that level of standardization too. Maybe wishful thinking. But imagine being able to grab a generic GNU/Linux image and slap it on your phone no matter what that phone is!

I suppose the only phone manufacturers that are at all likely to do something like this are companies like Purism and Pine64.

For phones there is the bigger problem of SoC and device vendors not mainlining Linux drivers for the hardware they have produced. This is less common with SoC vendors but I think there are still some problematic ones, and of course all the ARM GPU drivers are non-vendor ones created using reverse engineering (except the RPi one).
