Access to most of the hardware and real-time deterministic behavior. It's a really great project and lets you twiddle those GPIO pins at ridiculous speeds with perfect timing (sub-millisecond).
A Pi comes with a whole bunch of great hardware baked in, so if you have one lying around and want to do some microcontroller stuff, I think it's a great choice.
It's still an A-profile MPU and not an R- or M-profile MCU, and while it will be fast, it will have less deterministic behaviour than we might like. If you disable the caches and MMU you'll get better consistency. But wouldn't we expect ~microsecond accuracy from a properly configured MCU? ~Millisecond accuracy is not a particularly high bar.
You can read pins (well, one) with sub-microsecond latency using the Fast Interrupt Request (FIQ), but I have not tried this myself. I think a Pi would be more than capable of matching most microcontrollers just due to its very fast clock speed. Add multiple cores with the Pi 4 and you get a crazy amount of compute between each pulse as well.
There are a bunch of clocks that run fast enough to enable high-resolution timing as well.
The high clock speed and multiple cores are great. It's definitely a beefy system. But this is completely orthogonal to timing accuracy and consistency. Speed does not make it more consistent. Tiny low power MCUs have much more accurate and consistent timing.
Low latency can be a good thing, but it's also not related to consistency, particularly when you start looking at what the worst-case scenario can be.
So I am the opposite of an expert here, but I don’t follow. If I have control over the interrupts (which I do) and I have high precision timers (which I have), why can I not drive a gpio pin high for X microseconds accurately? What’s going to stuff it up?
As I mentioned in the previous reply, the CPU caches and the MMU to begin with. You're probably running your application from SDRAM and an SD card. The caches and page tables result in nondeterminism, because the timing depends upon existing cache state, and how long it takes to do a lookup in the page tables. And as soon as you have multiple cores, the cache coherency requirements can cause further subtle effects. This is why MCUs run from SRAM and internal FLASH with XIP, and use an MPU. It can give you cycle-accurate determinism.
The A-profile cores are for throughput and speed, not for accurate or consistent timing. However, you can disable both the cache and the MMU, if you want to, which will get you much closer to the behaviour of a typical M-profile core, modulo the use of SRAM and XIP. If you're running bare metal with your own interrupt handlers, you should get good results, except for the above caveats, but I don't think you'll be able to get results as accurate and consistent as you would be able to achieve with an MCU. But I would love to be proven wrong.
While most of my bare metal experience has been with ST and Nordic parts, I've recently started playing around with a Zynq 7000 FPGA which contains two A-profile A9 cores and a GIC. It's a bit more specialised, since you need to define the AXI buses and peripherals in the FPGA fabric yourself, but it has the same type of interrupt controller and MMU. It will be interesting to profile it and see how comparable it is to the RPi in practice.
This is something that could only really be proven by actually testing and I don’t have a fast enough scope to really prove things.
Having said that, I think some of the concerns have fairly simple mitigations. Given the high clock speed, I can't see that disabling the cache and MMU is required. The maximum "stall times" from either of those components should still fall well below what would be needed. It's bounded nondeterminism. That's completely different to running things under Linux.
Secondly, having multiple cores allows for offloading non-deterministic operations: the primary core can be dedicated to real-time work, while the non-deterministic operations run on the others. The only thing to consider is the maximum possible time for synchronization (for which there are some helpful tools).
As I said, I’m far from an expert. It was close to 20 years ago when I last did embedded development for a job, and I was a junior back then anyway. Still, I’d be interested to know if you think I’m way off beam.
I think you're pretty much correct. Whether these details matter is entirely application-specific, but you can go the extra mile if your application requirements demand it.
There are certainly multi-core MPUs and MCUs with a mixture of cores. The i.MX series from NXP has multi-core A7s with an M4 core for realtime use. Some of the ST H7 MCUs have dual M7 and M4 cores for partitioning tasks. There are plenty of others as well; these are the ones I've used in the past and present.