You're a bit pessimistic, but beyond that I feel like you're missing the point.
The purpose of an RTOS on big hardware is to provide bounded-latency guarantees to many things with complex interactions, while keeping system throughput high (though not as high as a non-RTOS).
A small microcontroller can typically only service one interrupt in a guaranteed-fast fashion. If you don't use interrupt priorities, it's a mess; and if you do, the latencies add up, so the lowest-priority interrupt can end up waiting indefinitely.
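To make that concrete, here's a minimal sketch for an ATmega-class AVR (the vector choices and `do_slow_housekeeping` are placeholders of mine, not from anything above). By default the AVR runs every ISR with global interrupts masked, so a slow handler delays every other interrupt by its full run time; re-enabling interrupts inside the ISR lets urgent handlers nest, which is exactly the stacked-latency tradeoff:

```c
#include <avr/io.h>
#include <avr/interrupt.h>

static void do_slow_housekeeping(void)  /* stand-in for many-microsecond work */
{
    for (volatile uint16_t i = 0; i < 10000; ++i)
        ;
}

ISR(TIMER1_COMPA_vect)          /* slow, low-urgency timer work */
{
    sei();                      /* allow more urgent interrupts to nest */
    do_slow_housekeeping();     /* without sei() above, this whole run time is
                                   added to every other interrupt's latency */
}

ISR(INT0_vect)                  /* urgent external-pin handler */
{
    PORTB ^= _BV(PB0);          /* respond as fast as possible */
}

int main(void)
{
    DDRB |= _BV(PB0);           /* pin as output */
    sei();                      /* global interrupt enable */
    for (;;)
        ;                       /* timer/INT0 peripheral setup omitted */
}
```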
So, we tend to move to bigger microcontrollers (or small microprocessors) and run an RTOS on them for timing-critical stuff. You can get latencies of several microseconds with hundreds of nanoseconds of jitter fairly easily.
But bigger RTOSes are kind of annoying; you don't have the option to run all the world's software out there as lower-priority tasks, and their POSIX layers tend to be kind of sharp and inconvenient. With preempt-rt, you can have all the normal Linux userland around, and if you don't have any poorly behaving drivers, you can do nearly as well as a "real" RTOS. So, e.g., I've run a 1.6 kHz flight control loop for a large hexrotor on a Raspberry Pi 3 plus a machine vision stack based on python+opencv.
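For reference, the usual userspace shape of that kind of loop on preempt-rt looks something like the sketch below (the period and priority are illustrative, not the actual hexrotor numbers): lock memory so page faults can't bite mid-cycle, go SCHED_FIFO, and sleep to absolute deadlines so error doesn't accumulate.

```c
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define PERIOD_NS    625000L    /* 1.6 kHz -> 625 us period */

int main(void)
{
    struct sched_param sp = { .sched_priority = 80 };
    struct timespec next;

    /* Lock all pages so a page fault can't add latency mid-loop. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall");
    /* Real-time FIFO scheduling; needs CAP_SYS_NICE or root. */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        perror("sched_setscheduler");

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        /* control-law step would go here */

        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        /* Absolute-deadline sleep: drift doesn't accumulate per cycle. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}
```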
Note that wherever we are, we can still choose to do stuff in high priority interrupt handlers, with the knowledge that it makes latency worse for everything else. Sometimes this is worth it. On modern x86 it's about 300-600 cycles to get into a high priority interrupt handler if the processor isn't in a power saving state, which might be about 100-200ns. It's also not mutually exclusive with using things like PIO: on i.MX8 I've used their rather fancy DMA controller, which is basically a Turing-complete processor, to do fancy things in the background while RT stuff of various priorities runs on the processor itself.
That's a best-case number, based on warm power management, an operating system that isn't disabling interrupts, and the interrupt handler being warm in L2/L3 cache.
Note that things like PCIe MSI can add a couple hundred nanoseconds themselves if this is how the interrupt is arriving. If you need to load the interrupt handler out of SDRAM, add a couple hundred nanoseconds more, potentially.
And if you are using power management and let the system get into "colder" states, add tens of microseconds.
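If you're under Linux rather than bare metal, one common way to keep the system out of those "colder" states is the /dev/cpu_dma_latency PM QoS interface; here's a minimal sketch (the 0 μs bound is just the most aggressive setting):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Writing a latency bound (microseconds, as a binary int32) to
     * /dev/cpu_dma_latency registers a PM QoS request; the kernel then
     * avoids idle states whose exit latency exceeds it. The request
     * holds only while the fd stays open; 0 pins cores out of deep
     * C-states entirely. */
    int32_t max_latency_us = 0;
    int fd = open("/dev/cpu_dma_latency", O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, &max_latency_us, sizeof max_latency_us)
            != sizeof max_latency_us)
        perror("write");
    pause();   /* keep the fd, and hence the QoS request, alive */
    return 0;
}
```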
hmm, i think what matters for hard-real-time performance is the worst-case number though, the wcet, not the best or average case number. not the worst-case number for some other system that is using power management, of course, but the worst-case number for the actual system that you're using. it sounds like you're saying it's hard to guarantee a number below a microsecond, but that a microsecond is still within reach?
But you make the choices that affect these numbers. You choose whether you use power management; you choose whether you have higher priority interrupts, etc.
> that they couldn't get better than 10μs,
There are multiple things discussed here. In this subthread, we're talking about what happens on amd64 with no real operating system, a high priority interrupt, power management disabled and interrupts left enabled. You can design to consistently get 100ns with these constraints. You can also pay a few hundred nanoseconds more of taxes with slightly different constraints. This is the "apples and apples" comparison with an AVR microcontroller handling an interrupt.
Whereas with rt-preempt, we're generally talking about the interrupt firing, a task getting queued, and then run, in a contended environment. If you do not have poorly behaving drivers enabled, the latency can be a few microseconds and the jitter can be a microsecond or a bit less.
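A quick way to see that distribution for yourself is a cyclictest-style probe like the sketch below: program an absolute wakeup, then measure how late the task actually ran. Run it under SCHED_FIFO (e.g. `chrt -f 80 ./probe`), ideally on a preempt-rt kernel, for meaningful numbers.

```c
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

int main(void)
{
    struct timespec next, now;
    long worst_ns = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < 100000; i++) {
        next.tv_nsec += 1000000;               /* 1 ms period */
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        /* how late did we actually wake relative to the deadline? */
        long late_ns = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                     + (now.tv_nsec - next.tv_nsec);
        if (late_ns > worst_ns)
            worst_ns = late_ns;
    }
    printf("worst wakeup latency: %ld ns\n", worst_ns);
    return 0;
}
```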
That is, we were talking about interrupt latency (absolute time) under various assumptions; osamagirl69 was talking about task jitter (variance in time) under different assumptions.
You can, of course, combine these techniques; you can do stuff in top-half interrupt handlers in Linux, and if you keep the system "warm" you can service those quite fast. But you lose abstraction benefits and you add latency to everything else on the system.
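As a sketch of what that looks like (the IRQ number and names here are placeholders of mine): a top-half handler is just what request_irq installs, and on a preempt-rt kernel you'd pass IRQF_NO_THREAD to keep it in hard interrupt context instead of letting it be pushed into a handler thread.

```c
#include <linux/interrupt.h>
#include <linux/module.h>

#define IRQ_NUM 42   /* placeholder IRQ line */

static irqreturn_t fast_handler(int irq, void *dev_id)
{
    /* Do only the genuinely time-critical work here: every cycle spent
     * in this handler delays all lower-priority work on this core. */
    return IRQ_HANDLED;
}

static int __init fast_init(void)
{
    /* IRQF_NO_THREAD: stay a hard top-half even under PREEMPT_RT. */
    return request_irq(IRQ_NUM, fast_handler, IRQF_NO_THREAD,
                       "fast-top-half", NULL);
}

static void __exit fast_exit(void)
{
    free_irq(IRQ_NUM, NULL);
}

module_init(fast_init);
module_exit(fast_exit);
MODULE_LICENSE("GPL");
```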
i didn't realize you were proposing using amd64 processors without a real operating system; i thought you were talking about doing the rapid-response work in top-half interrupt handlers on linux. i agree that this adds latency to everything else
with respect to latency vs. jitter, i agree that they are not the same thing, because you can have high latency with low jitter, but i don't see how your jitter can be more than your worst-case latency. isn't the jitter just the variance in the latency? if all your latencies are in the range from 0–1μs, how could you have 10μs of jitter, as osamagirl69 was reporting? i guess maybe you're saying that if you move the work into userland tasks instead of interrupts you get tens of microseconds of latency
i'm not sure that the 'apples to apples' comparison between amd64 systems and avr microcontrollers is to use equal numbers of cores on both systems. usually i'd think the relevant comparison would be systems of similar costs, or physical size, or power consumption, or difficulty of programming or setting up or something. that last one might favor a raspberry pi or amd64 rig or something though...
> i thought you were talking about doing the rapid-response work in top-half interrupt handlers on linux.
When we talk about worst-case latency to high priority top-half handlers on linux, it comes down to
A) how much time all interrupts can be disabled for. You can drive this down to near 0 by e.g. not delivering other interrupts to a given core.
B) whether you have any weird power saving features turned on.
That is, you can make choices that let you consistently hit a couple hundred ns.
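Point A is mostly a configuration exercise. For example, a sketch of steering an interrupt's delivery mask via procfs (IRQ 42 and the mask are placeholders): move everything except your critical IRQ onto housekeeping cores, and pair this with `isolcpus=` or cpusets so the reserved core stays quiet.

```c
#include <stdio.h>

int main(void)
{
    /* Restrict IRQ 42 to CPUs 0-2 (hex mask 0x7), keeping CPU 3 free
     * for the RT work; repeat for each unrelated IRQ on the system. */
    FILE *f = fopen("/proc/irq/42/smp_affinity", "w");
    if (!f) { perror("fopen"); return 1; }
    fprintf(f, "7\n");
    fclose(f);
    return 0;
}
```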
> i guess maybe you're saying that if you move the work into userland tasks instead of interrupts you get tens of microseconds of latency
I think "tens" is unfair on most computers. I think "several" is possible on most, and you can get "a couple" with careful system design.
> i'm not sure that the 'apples to apples' comparison between amd64 systems and avr microcontrollers is to use equal numbers of cores on both systems.
I wasn't saying equal numbers of cores. I was saying:
* Compare interrupt handlers with interrupt handlers; not interrupt handlers with tasks. Task latency on FreeRTOS/AVR is not that great.
* Compare latency to latency, or jitter to jitter.
> be systems of similar costs
The price of a microcontroller running an RTOS is trivial, and you can even get to something running preempt_rt for about the cost of a high-end AVR (which is not a cheap microcontroller).
You have to sell a lot of units and have a particularly trivial problem to be ahead doing things the "hard way."