
The BeagleBone's I/O pins: inside the software stack that makes them work - dwaxe
http://www.righto.com/2016/08/the-beaglebones-io-pins-inside-software.html
======
corysama
TLDR for old C hackers:

The pins are memory-mapped registers at fixed physical memory addresses. You
can use mmap to map the physical address to your process's virtual address
space. From there, you can just read and write the array. Note that you will
run into hardware timing issues if you try to do this too fast.
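
For anyone who wants to see what that looks like, here is a minimal sketch in C. The GPIO1 base address and the register offsets are my reading of the AM335x reference manual and should be treated as assumptions to verify; the helper names (`gpio_map_bank`, `gpio_write`) are made up for illustration. It also assumes the pin has already been configured as an output by other means.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed AM335x values (verify against the TRM for your chip). */
#define GPIO1_BASE        0x4804C000u /* physical base of the GPIO1 bank */
#define GPIO_BANK_SIZE    0x1000u     /* one 4 KiB page per bank */
#define GPIO_CLEARDATAOUT 0x190u      /* write 1 to drive a pin low */
#define GPIO_SETDATAOUT   0x194u      /* write 1 to drive a pin high */

/* Map one GPIO bank from /dev/mem. Needs root. Returns NULL on failure. */
volatile uint32_t *gpio_map_bank(off_t phys_base)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, GPIO_BANK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, phys_base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Drive one pin of the bank high or low by writing its bit to the
 * set or clear register. Offsets are in bytes, the array is in words. */
void gpio_write(volatile uint32_t *bank, unsigned pin, int high)
{
    assert(pin < 32);
    uint32_t offset = high ? GPIO_SETDATAOUT : GPIO_CLEARDATAOUT;
    bank[offset / sizeof(uint32_t)] = UINT32_C(1) << pin;
}
```

On a board you would call `gpio_map_bank(GPIO1_BASE)` as root and then `gpio_write(bank, pin, 1)`. Passing the bank pointer in explicitly also means the write logic can be exercised against an ordinary array.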

> By using mmap and toggling the register directly from a C++ program, the
> GPIO can be toggled at about 2.8 MHz, much faster than the device driver
> approach.[3] If you think this approach will solve your problems, think
> again. As before, there are jitter and multi-millisecond dropouts if there
> is any other load on the system.

There is a system file abstraction in place. But it seems like a lot more
work for a 20x overhead vs manipulating a struct. The tiny example program at
the bottom is more my style.
[http://www.righto.com/2016/08/the-beaglebones-io-pins-inside...](http://www.righto.com/2016/08/the-beaglebones-io-pins-inside-software.html#ref3)
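
For comparison, the file abstraction boils down to writing short strings into attribute files. A rough sketch in C, assuming the usual `/sys/class/gpio/gpioN/value` layout and a pin that has already been exported; the function names and the GPIO number in the usage note are hypothetical:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<root>/gpio<N>/<attr>", e.g. /sys/class/gpio/gpio60/value.
 * Returns 0 on success, -1 if the buffer is too small. */
int gpio_sysfs_path(char *buf, size_t len, const char *root,
                    unsigned gpio, const char *attr)
{
    int n = snprintf(buf, len, "%s/gpio%u/%s", root, gpio, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a short string such as "out" or "1" to one attribute file.
 * Every call is an open, a write and a close, which is where the
 * overhead relative to a single register store comes from. */
int gpio_sysfs_write(const char *root, unsigned gpio,
                     const char *attr, const char *text)
{
    char path[128];
    assert(text != NULL);
    if (gpio_sysfs_path(path, sizeof path, root, gpio, attr) != 0)
        return -1;
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t want = strlen(text);
    ssize_t n = write(fd, text, want);
    close(fd);
    return n == (ssize_t)want ? 0 : -1;
}
```

Setting a pin high would then be something like `gpio_sysfs_write("/sys/class/gpio", 60, "value", "1")`. Taking the root directory as a parameter is only there so the code can be pointed at a scratch directory.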

I read through this article rather quickly. I'd be happy to hear if I got
anything wrong.

~~~
kens
The interesting thing about the BeagleBone is there are two microcontrollers
inside the processor chip that you can use if you want to do anything that's
genuinely real-time. (I'm in the middle of using these to emulate an Ethernet
for the Alto restoration. There's a big learning curve to use them.)

As far as the file system abstraction, I'm not sure if it's clever or over-
engineered. It seems like a whole lot of infrastructure to simply flip a bit.
And the BeagleBone device trees - I can see why they did that, but it seems
like there must be a better way.

~~~
revelation
Do you use the MII interface for the PRU-ICSS to do the Ethernet?

I can't really understand why they don't have even a one-page documentation on
that stuff, but I've found code to interface that in parts of their industrial
SDK.

~~~
kens
I'm using bit-banging since the Alto's Ethernet is only 3 megabits/sec so the
PRU (the microcontroller inside the processor chip) is fast enough. I'm pretty
sure that using the MII (Media-Independent Interface, i.e. the Ethernet
controller inside the PRU inside the chip) would just cause me more problems.

------
revelation
Hilarious, GHz+ processors and they can toggle a GPIO pin at 2.8MHz with
millisecond jitter (so, not really 2.8MHz).

This is what it has come to.

Honestly, the BeagleBone (or the TI AMx series) has got it right: modern
processors with their layers upon layers of caching and modern operating
systems are utterly useless for this task. The deterministic 200MHz PRU-ICSS
processors however are perfectly suited to do it. Don't bother fighting the OS
and 30 years of CPU evolution when a PRU can toggle at 5ns intervals.
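
To put numbers on that, a small sketch of the arithmetic in C, using only figures quoted in this thread (the 200MHz PRU clock, the 2.8MHz mmap toggle rate, the 3 megabits/sec Alto Ethernet). The function names are just for illustration and everything is rounded down to whole units.

```c
#include <assert.h>
#include <stdint.h>

/* Length of one cycle in whole nanoseconds, rounded down. */
uint64_t period_ns(uint64_t hz)
{
    assert(hz > 0);
    return UINT64_C(1000000000) / hz;
}

/* Whole clock cycles available per event, rounded down. */
uint64_t cycles_per_event(uint64_t clock_hz, uint64_t event_hz)
{
    assert(event_hz > 0);
    return clock_hz / event_hz;
}
```

With these, `period_ns(200000000)` gives the 5ns PRU step, `period_ns(2800000)` gives roughly 357ns per cycle for the mmap approach, and `cycles_per_event(200000000, 3000000)` gives about 66 PRU cycles per bit at 3 megabits/sec. A single 1ms dropout at 2.8MHz swallows about 2800 periods.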

~~~
bsder
> Hilarious, GHz+ processors and they can toggle a GPIO pin at 2.8MHz with
> millisecond jitter (so, not really 2.8MHz).

Um, allow me to flip your statement around "GHz+ processors so _CHEAP_ that
you can use them to toggle a GPIO pin."

However, that's not generally how I use a BeagleBone. The BeagleBone provides
the user interface (USB, Ethernet, etc.) while a cheap cape provides the
hardware-specific part. BeagleBones have absolutely proliferated around me
once they started being able to run mainline Debian instead of weird things
like Angstrom.

That having been said--I have bit-banged the bone. The annoyance isn't so much
the speed as that you have to be in Supervisor mode to change the direction of
the pin. That's really annoying and is a limitation of the TI Sitara SoC. The
RaspberryPi does not seem to have that limitation. So, if you're gonna bit-
bang, an RPi is probably a better choice.

As for why it's so slow, there are two parts.

1) mmap() marks the writes as strongly-ordered and slows things down

You can actually get to 12.5MHz.
[https://groups.google.com/forum/#!category-topic/beagleboard...](https://groups.google.com/forum/#!category-topic/beagleboard/xuHRlDL19NI)

2) The system seems to have internal packetized transfer buses that gate the
speed

The _bus frequency_ determines the ultimate speed. This is no different from
the much smaller microcontrollers, except that the smaller microcontrollers
tend to run their bus frequencies at the same speed as the internal frequency
since they are running much slower.

~~~
revelation
The PRUs have no bus, their GPIO is in the register file and memory is SRAM.

They do have access to the beefy ARM's bus, which makes them infinitely more
useful than a cape. More often than not a cape will not solve the problem
because quelle surprise, there is then just no low latency reliable path to
control the cape from the ARM. So you've just shifted the issue for a 100x
increase in complexity.

~~~
bsder
Sure, if you need hard real-time control, a much simpler processor with a much
simpler OS (or none at all) is going to do it better. Absolutely. Arduinos and
Cortex-M's win here.

_However_, I find that a _VERY_ small amount of programmable logic on a cape
often takes care of the hard real-time problem while the BeagleBone gives me a
Linux backend to work against. Even if I find that the Linux side of things
has too many hiccups to be reliable, the time saved in development because I
can lean into a very powerful processor with quick iterations before
transferring to a leaner, more restricted processor is valuable.

And, often, people find out that their hard real-time isn't as hard as they
thought. Sure, if I have a safety system, I'm going to be religious about
maximum latency. However, if I have a machine controller, overkilling
processing power is often way cheaper (and faster to develop) even if I get
the occasional glitch that causes a fault which needs human intervention.

~~~
willglynn
The BeagleBone's CPU has a normal ARM processor and a pair of PRUs. (This is
much like how the Cell processor had a normal Power processor alongside some
Cell SPUs.) PRUs can control GPIO pins running your dedicated, hard-realtime
code without an OS, _and_ they can do it while Linux runs on the same chip.

[http://beagleboard.org/pru](http://beagleboard.org/pru)

~~~
bsder
I'm well aware of the on-board PRUs. They are very cool.

They also are _very specific_ to the TI processors and lock you into using
Sitaras forevermore. Not so good if I have a product with volume.

In addition, the PRUs had _abysmal_ documentation when I last checked.
Fortunately, that seems to be changing.

------
fake-name
The BeagleBone suffers from the same exact issues the Raspberry Pi does,
namely that there are only two real ways to run the peripherals: The horribly,
horribly slow filesystem interface, and "Open /dev/mem RW! YOLO! Run Apache as
root for your crappy internet of shit project!".

There are basically no actual sane drivers for doing anything fast.

At one point, I spent a while looking at writing my own driver, but apparently
I'm not a good enough hacker to write kernel drivers.

~~~
snops
Except the BeagleBone has PRUs, which are two 32 bit microcontrollers on the
same silicon that can also access the GPIO peripherals. They share memory with
the main processor but can be exclusively dedicated to your needs, and so
peripheral access can be very fast indeed.

Here is someone who has used them to turn the BeagleBone into a 100MHz Logic
Analyser:
[http://theembeddedkitchen.net/beaglelogic-building-a-logic-a...](http://theembeddedkitchen.net/beaglelogic-building-a-logic-analyzer-with-the-prus-part-1/449)

------
ambrop7
It's sad how the software stacks have gotten to this point where toggling a
pin is so horribly inefficient. And there's no good reason it has to be.

\- Instead of filesystem operations there could be ioctl, which is still safe
from the OS perspective (yes there is overhead compared to direct register
manipulation but much less than putting strings into the /sys filesystem).

\- With a real-time kernel combined with understanding of the scheduling and
relevant concepts (e.g. locking memory), it should be possible to achieve much
better latencies, possibly on the order of tens of microseconds. I'm not sure
how much variance there invariably will be due to processor cache or other
sources but I suspect it's too little for most people to care.

\- Finally, does anyone have an idea about the possibility of handling
interrupts in user space? I don't mean "second-level" (bottom-half) handlers,
but the raw interrupt handlers, without any overhead of scheduler or work
queue in front of your code. Sure there are more things to be careful about
like making sure you're acknowledging an interrupt but that's a given. Doesn't
it mostly boil down to the kernel setting up appropriate context like virtual
memory within the ISR? Do CPU designs prevent doing this?
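
On the ioctl idea in the first bullet: newer kernels (4.8 and later, as far as I know) ship a GPIO character device that works roughly this way. Below is a sketch using the older v1 handle interface from `linux/gpio.h`, assuming that header is installed. The function name, the consumer label, and the chip path in the usage note are placeholders, and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/gpio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Request one line of a GPIO chip as an output and set its value.
 * Returns 0 on success, -1 on any failure. */
int gpio_cdev_set(const char *chip_path, unsigned offset, int value)
{
    assert(chip_path != NULL);
    int chip = open(chip_path, O_RDWR);
    if (chip < 0)
        return -1;

    struct gpiohandle_request req;
    memset(&req, 0, sizeof req);
    req.lineoffsets[0] = offset;
    req.lines = 1;
    req.flags = GPIOHANDLE_REQUEST_OUTPUT;
    req.default_values[0] = value ? 1 : 0;
    strncpy(req.consumer_label, "hn-sketch", sizeof req.consumer_label - 1);

    /* On success the kernel returns a new descriptor for the line in req.fd. */
    int rc = ioctl(chip, GPIO_GET_LINEHANDLE_IOCTL, &req);
    close(chip);
    if (rc < 0)
        return -1;

    struct gpiohandle_data data;
    memset(&data, 0, sizeof data);
    data.values[0] = value ? 1 : 0;
    rc = ioctl(req.fd, GPIOHANDLE_SET_LINE_VALUES_IOCTL, &data);

    /* Closing the handle releases the line; a real program would keep it
     * open and repeat only the set-values ioctl for each change. */
    close(req.fd);
    return rc < 0 ? -1 : 0;
}
```

A call would look like `gpio_cdev_set("/dev/gpiochip1", 28, 1)`. The point is that each change costs one ioctl on an already open descriptor instead of an open, a string write and a close.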

Saying to "leave the real-time things to microcontrollers" is basically giving
up. Hardware should be as general-purpose as possible and this also applies to
being able to do real-time things. I'm not saying the PRUs are not useful, but
they are a very awkward solution for probably most real-time programming
problems that are not currently easily solvable with just the main CPU. These
things use a custom assembly language and there isn't even a C compiler for
them! I would be much happier with an integrated ARM microcontroller.

~~~
revelation
There is a C compiler for the PRUs, but frankly you're missing the point if
you're asking for that. The whole reason for their existence is to manage
processes that require extremely deterministic timing. You might update your C
compiler, find something is compiled differently and now all the timings are
off. They are not so much meant as microcontrollers, they are for things you
would otherwise have to do on an FPGA to get the timing reliably correct.

Tens of microseconds just don't cut it. Some areas such as realtime ethernet
protocols require nanosecond-accurate packet insertion.

The summary of why things are the way they are is that modern operating
systems and CPUs value throughput over everything.
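
To see how far a given kernel is from that, one can measure wake-up lateness directly. Here is a rough sketch in C that sleeps to absolute deadlines and records the worst overshoot. The function names are made up, and the result depends entirely on the kernel, the load and the hardware, so no particular number should be expected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

static int64_t ts_to_ns(struct timespec t)
{
    return (int64_t)t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Sleep until deadlines spaced period_ns apart and return the worst
 * lateness (actual wake-up time minus deadline) in nanoseconds. */
int64_t worst_wakeup_latency_ns(long period_ns, int iterations)
{
    assert(period_ns > 0 && period_ns < 1000000000L && iterations > 0);
    struct timespec next, now;
    int64_t worst = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < iterations; i++) {
        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        int64_t late = ts_to_ns(now) - ts_to_ns(next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Running something like `worst_wakeup_latency_ns(1000000L, 10000)` with and without other load on the system gives a feel for the jitter being discussed.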

~~~
ambrop7
Let me reiterate, I don't think the PRU is a problem. The problem is the many
orders of magnitude between the real-time latency achievable with the PRU and
that achievable with the CPU under the default software stack. I shouldn't have to
use the PRU for every problem that has some real-time requirements, I should
be able to use the actual real-time abilities of the CPU. There are many
benefits to being able to do so (e.g. portability, less effort).

Real-time is not a binary thing, there's a whole spectrum. In many cases
compiling real-time code is a non-issue.
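
As a concrete starting point for the CPU side, this is roughly what "lock memory and use real-time scheduling" looks like in C on Linux. It is only a sketch: the function name is made up, the calls normally need root or the matching capabilities, and on a stock (non real-time) kernel they reduce latency without bounding it.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/mman.h>

/* Lock all current and future pages into RAM, then switch the calling
 * process to SCHED_FIFO at the given priority.
 * Returns 0 on success or the errno of the first step that failed. */
int enter_realtime(int priority)
{
    assert(priority >= sched_get_priority_min(SCHED_FIFO) &&
           priority <= sched_get_priority_max(SCHED_FIFO));

    /* Avoid page faults in the timing-critical path. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return errno;

    /* Run ahead of all normal (non real-time) tasks. */
    struct sched_param sp = { .sched_priority = priority };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;

    return 0;
}
```

Calling `enter_realtime(10)` early in `main`, before the timing-sensitive loop starts, is the usual pattern. A non-zero return such as `EPERM` means the process lacks the privileges for one of the two steps.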

