
PRU tips: Understanding the BeagleBone's built-in microcontrollers - dwaxe
http://www.righto.com/2016/08/pru-tips-understanding-beaglebones.html
======
nneonneo
The PRU is a fantastic bit of hardware. With two of these running at 200MHz,
and direct register-mapped access to GPIO pins, there's a lot of cool stuff
you can do with them.

For example, in a recent project, I used one of the PRUs to generate a precise
40MHz square wave clock signal with 40% duty cycle, and the other to read the
signal pin of a camera module into shared RAM. It worked extremely well,
allowing me to obtain camera data at hundreds of FPS, and freed up the main
CPU to do some fairly heavy image processing - all without involving an
expensive camera capture rig or an external PC.
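
The clock numbers above work out neatly on a 200MHz PRU, where one cycle is 5ns: a 40MHz period is exactly 5 cycles, and a 40% duty cycle is 2 cycles high and 3 cycles low. An exact 50% split is not available with 5 whole cycles. Here is a minimal sketch of that arithmetic in plain C. The names `duty_cycles` and `struct wave_cycles` are hypothetical, not from the project described above, and the sketch assumes the output pin can be toggled on any core cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle budget for one period of a square wave (hypothetical helper type). */
struct wave_cycles {
    uint32_t period; /* core cycles per output period */
    uint32_t high;   /* cycles the pin is driven high */
    uint32_t low;    /* cycles the pin is driven low  */
};

/* Split one output period into whole core cycles.
 * Assumes core_hz is an exact multiple of out_hz and duty_percent <= 100. */
static struct wave_cycles duty_cycles(uint32_t core_hz, uint32_t out_hz,
                                      uint32_t duty_percent)
{
    struct wave_cycles w;
    assert(out_hz != 0 && core_hz % out_hz == 0 && duty_percent <= 100);
    w.period = core_hz / out_hz;
    w.high = (w.period * duty_percent) / 100; /* rounds down */
    w.low = w.period - w.high;
    return w;
}
```

Calling `duty_cycles(200000000u, 40000000u, 40u)` gives the 5/2/3 split described above.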

~~~
chrissnell
Neat! Is the source available?

~~~
nneonneo
It's for an academic project which is ongoing (the camera control is just one
part of the system). When it's done, I'd be happy to share source (and indeed
we may simply open-source everything related to the project to accompany the
publication).

~~~
derekja
Hmm, I wish HN had reddit's remindme bot! Sounds great.

------
jarmitage
BeagleBone's PRUs are an enabler of Bela, "the embedded platform for ultra-low
latency audio and sensor processing" which was ~1000% funded on Kickstarter
earlier this year.

Bela is being used at the Augmented (Musical) Instruments Lab in London (and
now the community) to make rich, responsive digital musical instruments. It
has a Web IDE, supports C++, Pure Data, Faust, SuperCollider, etc., but again
thanks to the BB's PRUs, it supports audio-rate sensor sampling!

Site: [http://bela.io/](http://bela.io/)

Code:
[https://github.com/BelaPlatform/Bela](https://github.com/BelaPlatform/Bela)

Videos:
[https://www.youtube.com/channel/UCgWd1Q2dcWdqCGNl5BijFsA](https://www.youtube.com/channel/UCgWd1Q2dcWdqCGNl5BijFsA)

Paper: "An environment for submillisecond-latency audio and sensor processing
on BeagleBone Black"
[http://www.eecs.qmul.ac.uk/~andrewm/mcpherson_aes2015.pdf](http://www.eecs.qmul.ac.uk/~andrewm/mcpherson_aes2015.pdf)

Lab:
[http://www.eecs.qmul.ac.uk/~andrewm/](http://www.eecs.qmul.ac.uk/~andrewm/)

~~~
alex_hirner
What a beautiful project. I started to hack around for a guitar amp emulator
on the bb in 2014, but it quickly became too involved. Thanks for sharing.

------
JoeAltmaier
The PRU is pretty cool. Just had to simulate a dozen eCAP devices and it's just
a few lines of code. An eCAP records the period (in ticks) of a signal on an
input pin. It can be triggered on a rising or falling signal, and can record
absolute ticks or the number of ticks since the last trigger (differential
mode). Here's the main loop:

    
    
    while (1)
    {
        uint32_t sample = __R31;
        uint32_t change = sample ^ pEcap->sample;
    
        if (change)
        {
            // Calculate which bits have seen a desired edge (it changed, and we trigger on that change)
            uint32_t edge = change & trigger;
    
            if (edge)
            {
                uint32_t ts = ReadTimestamp();
    
                int bit = 1;
                int iBit = 0;
                while (iBit < 30)
                {
                    if (edge & bit)
                    {
                        // Store the ts (or the difference in ts) into the capture table for this bit
                        //  and increment the slot index (aICap)
                        pEcap->ECAP[iBit][pEcap->aICap[iBit]++] = (pEcap->differential & bit) ? (ts - tsLast[iBit]) : ts;
                        pEcap->aICap[iBit] %= 4;
    
                        // Next time we calculate the difference from this time
                        tsLast[iBit] = ts;
                    }
                    iBit++;
                    bit <<= 1;
                }
    
                pEcap->edgeDetected |= edge;
            }
    
            pEcap->sample = sample;
            trigger = sample ^ pEcap->edgeUpDown;
        }
    }
    
    

It detects an edge within a few dozen nanoseconds (low jitter). A while loop
like this would monopolize a main processor thread, and it would have terrible
latency whenever other threads were scheduled during a trigger event.

And it detects edges on 30 pins in parallel! I could work on the "which pin
had a trigger" code to reduce the scan from 30 iterations to roughly log(30),
but I have no need for that fine latency in my current application.
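
One way to avoid scanning all 30 bit positions is to visit only the bits that are set in `edge`, clearing the lowest set bit on each pass. The sketch below is plain host-side C rather than PRU code. `for_each_edge`, `edge_fn`, and `record_pin` are hypothetical names, and `__builtin_ctz` is a GCC/Clang builtin that may not be available in a PRU toolchain. The cost is proportional to the number of pins that triggered, which is not strictly log(30) but is usually small.

```c
#include <assert.h>
#include <stdint.h>

/* Callback invoked once per pin that saw an edge (hypothetical signature). */
typedef void (*edge_fn)(int pin, void *ctx);

/* Visit only the set bits of `edge`, lowest pin first.
 * Returns the number of pins visited. */
static int for_each_edge(uint32_t edge, edge_fn fn, void *ctx)
{
    int count = 0;
    assert(fn != 0);
    while (edge != 0) {
        int pin = __builtin_ctz(edge); /* index of the lowest set bit */
        fn(pin, ctx);
        edge &= edge - 1;              /* clear that bit */
        count++;
    }
    return count;
}

/* Example callback: record the pins seen as a bitmask. */
static void record_pin(int pin, void *ctx)
{
    uint32_t *seen = ctx;
    *seen |= (uint32_t)1 << pin;
}
```

In the loop above, the callback body would be the capture-table update for that pin.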

------
amorphic
I actually had no idea that the Beaglebone had these guys hiding onboard. A
nice alternative to something like a RasPi with an Arduino hat or a USB-
connected Arduino.

This is great for projects where you control the whole stack and don't plan to
support anything else. I guess the only downside to going down this path is
that you're locking in the Beaglebone Black as your sole hardware platform and
losing some modularity.

For example it would be great to control a 3D printer by running something
like OpenGB ([http://opengb.readthedocs.io/](http://opengb.readthedocs.io/))
on the main CPU and something like Marlin
([https://github.com/MarlinFirmware](https://github.com/MarlinFirmware)) on a
PRU. But that would require a Beaglebone-centric approach which wouldn't work
on other hardware combinations.

That's just an observation though - I'm really impressed that the PRUs are
there!

~~~
cnvogel
So, structure your project so that you can use either the internal PRU or an
external motor controller… I don't see the big problem.

I'd say that most embedded related projects will have to deal with newer
revisions of their hardware, maybe because an older part is no longer
available or because people came up with more intelligent or less buggy
circuits over time. So that's already a few (prob. very minor) variations
you'll have to support.

And except for very trivial projects, you should always try to have at least one
"dummy" implementation of everything, to facilitate automated testing of your
code.

So, in your case, your 3D printer controller could support the internal PRU of
the Sitara, an external servo controller, or some dummy library that just logs
positions to a file, for testing.
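
The layering described here can be sketched in C as a small table of function pointers with interchangeable backends. Everything named below (`motion_backend`, `dummy_move_to`, `run_square`) is hypothetical and only illustrates the idea. A PRU backend or an external-controller backend would supply the same `move_to` function with a different implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Interface every backend implements (hypothetical). */
struct motion_backend {
    int (*move_to)(void *state, double x, double y); /* returns 0 on success */
    void *state;                                     /* backend-private data */
};

/* Dummy backend: logs positions to a stream instead of driving hardware. */
struct dummy_state {
    FILE *log;
    int moves;
};

static int dummy_move_to(void *state, double x, double y)
{
    struct dummy_state *d = state;
    d->moves++;
    return fprintf(d->log, "move %.3f %.3f\n", x, y) < 0 ? -1 : 0;
}

/* Application code sees only the interface, never a concrete backend. */
static int run_square(const struct motion_backend *b, double side)
{
    const double pts[4][2] = { {0, 0}, {side, 0}, {side, side}, {0, side} };
    assert(b != 0 && b->move_to != 0);
    for (int i = 0; i < 4; i++) {
        if (b->move_to(b->state, pts[i][0], pts[i][1]) != 0)
            return -1;
    }
    return 0;
}
```

An automated test can then run `run_square` against the dummy backend and check the logged moves without any hardware attached.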

~~~
amorphic
On reflection - you're absolutely right. In fact this is exactly the way
OpenGB is designed!

OpenGB uses an abstract base class (called IPrinter) to describe a printer
interface. At the moment there exists a Marlin implementation and a Dummy
implementation (as you describe) of IPrinter.

Other comments mention BeagleG and MachineKit. It _should_ be pretty trivial
to add IPrinter implementations of both of these.

Thanks for the inspiration! :)

------
AceJohnny2
Modern SoCs are a circus of various CPU cores, and there's a huge amount of
work going on at the companies that design them to make them work together.
Even for those SoCs that run open-source software, many of those secondary
cores aren't visible to people outside the company creating the SoC. They're
hidden behind binary blob "firmware". One of the systems I worked on over 5
years ago had ARM Cortex-A7s running the main OS (could be Linux), talking to
a proprietary DSP running some in-house RTOS, talking to a 6502... The
systems I'm currently working on have way more and more modern cores than
that. As a user, or even as a developer, you wouldn't know.

Kudos to Ken for lifting the curtain a bit on the Sitara's PRUs!

It's a shame there isn't more developer availability of these cores. I'm kind
of shocked that the way to program these is still just through assembly, but
really not that surprised.

~~~
fest
> I'm kind of shocked that the way to program these is still just through
> assembly, but really not that surprised.

This is not true anymore (though it was in the beginning): there is a C/C++
compiler available.

------
kens
Wow, a lot more PRU interest than I expected. Thanks everyone for the
comments! Looking online, I got the impression that hardly anyone was using
the PRU. Are there any forums I should visit where PRU programmers hang out?

------
Taniwha
I think the main thing to understand about PRUs is that they let you do in
software a class of things for which you'd normally break out some custom logic
or an FPGA. Sure, they're hard to set up initially, but that's way less effort
than breaking out the PCB design software.

There is also a C compiler available, I haven't used it, but Tridge (of Samba
fame) gave a demo at linuxconf a couple of years back in which he launched a
plane (remotely) and ran the guidance software on a BB, with the PRUs doing
all the servo/etc work coded in C ..... and compiled the linux kernel onboard
at the same time ....

~~~
fest
Two things from my experience:

1) Read the fine print when it comes to processor manufacturers telling how
much time something takes. Although the PRU subsystem is deterministic and
_most_ instructions take 5ns, there are quite a few cases which take an
order of magnitude more time[1]. Sure, the access times might be deterministic
but that doesn't make it easy to know how long something will take.

2) Remote-controlled airborne vehicles need a lot less computation power than
I expected. PX4 runs its "main loop" at just 400Hz and I've seen PX4 or
ArduPilot devs (might even be tridge) saying that 50Hz would be enough. Sure,
you need accurate timing for PWM outputs (~1us resolution) and, most
importantly, low jitter.

The first point has bitten me personally: I found it non-trivial to get
reliable < 100ns interrupt jitter on Cortex-M4. It really came down to what was
happening on the bus between CPU/memory/peripherals at the point the interrupt
was supposed to fire (e.g. getting the documented latency of 10-odd cycles when
the CPU is idle but a lot more when there is a DMA transfer in progress).

1:
[http://processors.wiki.ti.com/index.php/AM335x_PRU_Read_Late...](http://processors.wiki.ti.com/index.php/AM335x_PRU_Read_Latencies)

~~~
MegaDeKay
Tridge gives a good summary of the different timing requirements in a
presentation he gave back in 2014 (page 8).

[http://uav.tridgell.net/LCA2014/AP_Linux.pdf](http://uav.tridgell.net/LCA2014/AP_Linux.pdf)

------
greggyb
This article makes repeated references like the following:

> If you want to perform real-time operations, the BeagleBone's ARM processor
> won't work well since Linux isn't a real-time operating system.

I get that Linux is not a real time system, but the article seems to be making
the implication that the ARM processor cannot support real time operations. I
was under the impression that the real-ness of time was determined solely by
the operating system and not by the hardware. Is this not the case, or is the
article just making the assumption that no one is going to port a real time
operating system to the BeagleBone?

~~~
AceJohnny2
Amusingly enough, performance enhancing features like cache or a flexible
interrupt controller get in the way of a system's "real-timeness", because the
timing of operations is no longer clearly deterministic. That's why ARM has
cores like the Cortex-R series (R as in Realtime) that do promise real-time
characteristics. The Sitara in the BeagleBone has a Cortex-A (A as in
Application), which is not designed for realtime.

So yeah, it's more than just the OS.

~~~
Gibbon1
Simply having interrupts will mess with a hard real-time process. Totally
sketchtastic: some modern processors have background stuff that takes control
every so often.

------
curveship
The PRU is the unsung gem of the BeagleBone. I used it to prototype an optical
control system that needed sub-100ns timing. As the article hints, it's not an
easy beast to get started with, but it offers crazy realtime abilities.

~~~
seren
That's pretty interesting. Could you detail a little what was executed by
the PRU and what was executed by the CPU?

~~~
curveship
The CPU loaded a bytecode program into the PRU's memory and told it to run.

The PRU ran a bytecode execution engine that was handcoded in assembly. The
core of the engine was a loop:

- perform the routine for the current bytecode

- when the routine finishes, spin on the PRU's embedded timer (the IEP) until
256 ticks (1.28us) have passed since the start of the cycle

- advance to the next bytecode and repeat

External devices were connected either via I2C (those that didn't need super-
tight timing) or to the pins controlled by the PRU. Some of the bytecodes
controlled execution flow like loops while others were specific to the
application, things like turn light on, move lens, trigger camera, etc.

There were a few more bells and whistles, like memory locations where flags
were set by the engine to let the CPU know how progress was going.
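
The fixed-slot loop described above can be sketched in C. This is a host-side simulation, not the original hand-coded assembly: the opcodes, `struct vm`, the per-opcode tick costs, and the simulated `ticks` counter standing in for the IEP timer are all assumptions made for illustration. Each routine costs a different number of ticks, but the spin at the end of every slot makes each step take exactly 256 ticks (1.28us at 200MHz).

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_TICKS 256u /* 256 ticks * 5 ns = 1.28 us per bytecode */

enum op { OP_LIGHT_ON, OP_LIGHT_OFF, OP_HALT }; /* hypothetical opcodes */

struct vm {
    uint32_t ticks;    /* simulated free-running timer (stand-in for the IEP) */
    uint32_t pc;       /* index of the current bytecode */
    int light;         /* simulated output pin */
    uint32_t progress; /* flag the host CPU could poll */
};

/* Run until OP_HALT or the end of `code`; returns the number of steps executed. */
static uint32_t vm_run(struct vm *vm, const uint8_t *code, uint32_t len)
{
    uint32_t steps = 0;
    assert(vm != 0 && code != 0);
    while (vm->pc < len && code[vm->pc] != OP_HALT) {
        uint32_t start = vm->ticks;
        switch (code[vm->pc]) {
        case OP_LIGHT_ON:  vm->light = 1; vm->ticks += 3;  break; /* made-up cost */
        case OP_LIGHT_OFF: vm->light = 0; vm->ticks += 17; break; /* made-up cost */
        default: break;
        }
        /* Spin until the slot is used up; unsigned subtraction survives wraparound. */
        while ((uint32_t)(vm->ticks - start) < SLOT_TICKS)
            vm->ticks++;
        vm->pc++;
        vm->progress = vm->pc; /* let the host know how far we got */
        steps++;
    }
    return steps;
}
```

Measuring elapsed time as `ticks - start` rather than comparing against an absolute deadline keeps the slot length correct even when the counter wraps.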

It was a fun project. Unfortunately, the manufacturer of one of the core
devices decided to stop making it, so that start-up had to go in a different
direction.

------
heeen2
I am driving an RGB LED matrix from a PRU:
[http://heeen.de/proj/rgbmatrix/fablab.mp4](http://heeen.de/proj/rgbmatrix/fablab.mp4)

------
vibrolax
Machinekit, originally derived from LinuxCNC, uses the BeagleBone's PRUs to
handle the real-time I/O.

Code: [https://github.com/machinekit/](https://github.com/machinekit/)

PRU links:
[http://blog.machinekit.io/2013/06/beagle-bone-pru-links.html](http://blog.machinekit.io/2013/06/beagle-bone-pru-links.html)

------
ambrop7
Can someone say how the PRU would compare to a "similar" microcontroller? And
what the most similar core would be, e.g. M0 or M3.

