
Optimizing latency of an Arduino MIDI controller - unkown-unknowns
http://www.jonnor.com/2017/04/optimizing-arduino-midi-controller-latency/
======
jarmitage
Relevant plug: if you're interested in making ultra low latency (as low as
80us) embedded musical instruments, check out Bela:

\- [http://bela.io](http://bela.io)

\- [http://github.com/belaplatform](http://github.com/belaplatform)

\- Many of these papers from 2015 or later feature Bela:
[http://instrumentslab.org/publications/](http://instrumentslab.org/publications/)

~~~
kazinator
80 microseconds is an insignificant increment of time relative to the attack
of a musical note for most musical instruments.

The 80 microsecond period corresponds to 12.5 kHz. That's in the range of
the upper harmonics that determine the "crispness" or "air" of the tone.

Loudspeakers and filters will introduce more phase shift than this.

Oh, ... and sound travels a whopping _27 millimeters_ through air in 80 us.

I don't think any event in music needs to be timed to 80 us.

"Dude, did you pull down the 12.5 kHz band on the 31 band eq again? My hi-hat
sounds late!"

"No way man, look: you moved your friggin' stool 27 mm from what it was
before, see?"
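
The period-to-frequency and distance figures can be checked with a few lines of arithmetic. This is only a sketch: the speed of sound (about 343 m/s in air near 20 °C) is an assumption, and the function names are made up for illustration.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def frequency_hz(period_s):
    """Frequency of a wave whose period is period_s seconds."""
    return 1.0 / period_s


def sound_travel_m(duration_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """Distance sound covers in duration_s seconds."""
    return speed_m_per_s * duration_s


latency_s = 80e-6  # the 80 us figure under discussion
print(f"{frequency_hz(latency_s) / 1e3:.1f} kHz")
print(f"{sound_travel_m(latency_s) * 1e3:.1f} mm")
```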

~~~
jarmitage
I am definitely not trying to argue that people can respond musically to
events on the order of 80us! Although maybe some augmented future humans will
prove that to be the case ;D

But think about it, having latency below (even way below) the threshold of
human perception in a digital musical instrument increases the possibility
space, in the same way that in digital recording you might use 192kHz sample
rate even though we don't hear in that range.

It also means you can add extra components to your system that might add more
latency without crossing the perception threshold.

So, to me there's plenty of advantages of having a system capable of this,
many of which are still to be explored.

~~~
kazinator
Low latency is useful if many devices are chained together. If you have a
chain of ten, 80us becomes 800us: 0.8ms. That is still very good.

The original MIDI was designed for (reasonable) serial chaining; many devices
have a MIDI IN and OUT port (and some have a THROUGH).

In spite of this, the protocol runs at only 31250 bps. It takes 10 bits (8N1)
to encode one byte, and it takes something like 3 bytes to encode a "note on"
message (for instance). The message is consequently 960 us wide: almost 1 ms!

So with no chaining of anything, just connecting a MIDI source (like a
keyboard) to a synthesizer with a MIDI cable, we have a 1 ms _minimum_ delay
to turn on a note caused by the sheer duration of the message on the wire.
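
The wire-time arithmetic above, as a small sketch. The constants come from the comment (31250 bps, 10 bits per byte with 8N1 framing, a 3-byte message); the function names are hypothetical.

```python
MIDI_BAUD = 31250    # bits per second on a classic serial MIDI link
BITS_PER_BYTE = 10   # 8N1 framing: 1 start bit + 8 data bits + 1 stop bit


def midi_wire_time_us(num_bytes, baud=MIDI_BAUD):
    """Microseconds needed just to clock num_bytes onto the wire."""
    return num_bytes * BITS_PER_BYTE * 1e6 / baud


def chain_latency_us(per_device_us, num_devices):
    """Total latency when identical devices are chained in series."""
    return per_device_us * num_devices


print(midi_wire_time_us(3))      # 3-byte "note on" -> 960.0
print(chain_latency_us(80, 10))  # ten 80 us devices in series -> 800
```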

192 kHz sample rate for _storage_ and _transmission_ of audio is complete,
utter bunk.

For _sampling_ , oversampling is useful because it's easier and cheaper to
make a fast ADC, and couple it with a cheaper, simpler analog filter. If you
want to sample at 44.1 kHz or even 48 kHz, and capture a decent range of the
audio spectrum without aliasing, you need a very steep "brick wall" filter at
the Nyquist frequency. But if you sample at 192 kHz (with an aim to capturing
the same spectrum), the filter doesn't need to be that steep. You still roll
off past 20 kHz, but less aggressively. Not only is that simpler and cheaper,
but the filter can be designed with better properties in regard to phase shift
and group delay, and flatter response near the threshold. Of course, the idea
is then to immediately reduce the data from the sampler to a lower rate. It's
like moving much of the filter into the digital domain.
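
To put rough numbers on "steep", here is a sketch that estimates the average filter slope needed between a 20 kHz passband edge and the Nyquist frequency. The 96 dB attenuation target is an illustrative assumption (roughly 16-bit dynamic range), and treating the roll-off as a constant dB-per-octave slope is a simplification.

```python
import math


def required_slope_db_per_octave(passband_hz, sample_rate_hz, attenuation_db=96.0):
    """Average slope needed to reach attenuation_db by the Nyquist frequency."""
    nyquist_hz = sample_rate_hz / 2.0
    octaves = math.log2(nyquist_hz / passband_hz)  # width of the transition band
    return attenuation_db / octaves


for rate in (44_100, 48_000, 192_000):
    slope = required_slope_db_per_octave(20_000, rate)
    print(f"{rate} Hz: about {slope:.0f} dB/octave")
```

Under these assumptions the analog filter needs several hundred dB per octave at 44.1 kHz, but only a few tens of dB per octave at 192 kHz.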

------
joren-
In the publication below [1] a comparison with respect to latency is made
between Teensy, Arduino Uno, xOSC, Bela and Raspberry Pi. One of the
findings is that serial over USB is slower than MIDI over USB, even though the
two are technically very similar. The Axoloti [2] is not included in the publication
but is of interest as well when building low latency audio devices.

[1]
[http://www.eecs.qmul.ac.uk/~andrewm/mcpherson_nime2016.pdf](http://www.eecs.qmul.ac.uk/~andrewm/mcpherson_nime2016.pdf)

[2] [http://www.axoloti.com/](http://www.axoloti.com/)

~~~
fit2rule
The Axoloti is a superlative design for audio, at both the software and
hardware layers. The Arduino, not so much.

The article author doesn't mention whether they've also abandoned the Arduino
MIDI libs and written their own. There's probably some latency upstream that
can be reduced as well.

~~~
jononor
The Axoloti looks like the device I dreamt of creating many years ago when I
was very into music instruments and just got into embedded hardware/software.

------
jononor
Surprised to be on HN! Open for questions if anyone has got any.

~~~
bjt2n3904
Are you polling the sensors, or using interrupts? I don't see how going from
one sensor to eight increases latency.

~~~
jononor
Polling. The atmega32u4 only has 4/5 external interrupts, and our instrument
has 8 pads. The CapacitiveSensor Arduino library used does this sequentially,
busy-looping for each pin. It would be possible to rewrite this to do all 8
pins in parallel. Right now the readouts for different pads are sampled at
slightly different times, which works but is non-ideal. A more modern uC can
have capacitive sensing peripherals, like in the Teensy 3.0.
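
A rough model of why sequential busy-looping makes scan time grow with the number of pads, while a parallel readout is bounded by the slowest pad. The per-pad times are hypothetical, not measurements from the instrument, and the function names are made up.

```python
from itertools import accumulate


def sequential_scan_us(pad_times_us):
    """Busy-loop over one pad at a time: the waits add up."""
    return sum(pad_times_us)


def parallel_scan_us(pad_times_us):
    """Start all pads together and watch them in one loop: the slowest pad sets the time."""
    return max(pad_times_us)


def sequential_start_offsets_us(pad_times_us):
    """When each pad's measurement starts, relative to the start of the scan."""
    return [0] + list(accumulate(pad_times_us))[:-1]


# Hypothetical per-pad measurement times in microseconds.
pads = [120, 130, 125, 118, 140, 122, 128, 135]
print(sequential_scan_us(pads), parallel_scan_us(pads))
print(sequential_start_offsets_us(pads))
```

With these made-up numbers the sequential scan takes just over 1 ms, and the last pad is sampled most of a millisecond after the first. The parallel scan takes 140 us.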

~~~
the-dude
IIRC, on a 328 you can turn on external interrupts for an entire port (8 pins)
at once. Once the interrupt fires, you detect which pin has changed.

This doesn't work for the 32u4 ?
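
The "detect which pin has changed" step usually comes down to comparing the current port reading with the previous one. A sketch of that logic in Python, for illustration only. On the actual chip this would be C inside the interrupt handler, reading the port's input register. The function name and example values are made up.

```python
def changed_pins(previous, current):
    """Return the bit positions (0-7) that differ between two 8-bit port readings."""
    diff = (previous ^ current) & 0xFF  # XOR leaves a 1 wherever a pin flipped
    return [bit for bit in range(8) if diff & (1 << bit)]


previous_reading = 0b00010010
current_reading = 0b00110000
print(changed_pins(previous_reading, current_reading))  # -> [1, 5]
```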

~~~
jononor
Looking at the datasheet for 32u4, port B does have pin change interrupt. So
one could maybe use that. Depending on what pins are exposed on Leonardo
board, might need to combine with external interrupts on pins to get a
full 8. A challenge is that the capacitive sensing works by sensing how long
it takes. For small capacitance (depends on your sensor pads) and resistor
values, this is in the order of tens of clock cycles. That can be challenging
to measure with a timer. Several such measurements are summed up to suppress
noise. However for bigger sensors with higher capacitance, like for distance
sensing, then interrupt-based code makes more sense because the uC would
actually wait for a significant time.
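
To see where "tens of clock cycles" can come from, here is a sketch that models the sensor pad as a simple RC circuit charging toward the supply voltage. The resistor and capacitance values, the 50% input threshold and the 16 MHz clock are all illustrative assumptions, not figures from the article.

```python
import math

F_CPU_HZ = 16_000_000  # assumption: 16 MHz clock, as on an Arduino Leonardo


def charge_time_s(resistance_ohm, capacitance_f, threshold_fraction=0.5):
    """Time for an RC node charging toward Vcc to cross threshold_fraction * Vcc."""
    return -resistance_ohm * capacitance_f * math.log(1.0 - threshold_fraction)


def to_cycles(time_s, f_cpu_hz=F_CPU_HZ):
    """Convert a duration into CPU clock cycles."""
    return time_s * f_cpu_hz


def summed_reading(cycle_counts):
    """Sum repeated measurements so that random noise partly cancels out."""
    return sum(cycle_counts)


# Hypothetical small pad: 1 Mohm resistor, 5 pF capacitance.
cycles = to_cycles(charge_time_s(1e6, 5e-12))
print(f"about {cycles:.0f} cycles per measurement")
print(summed_reading([55, 57, 54, 56]))  # made-up raw counts
```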

------
revelation
Not directly related to the OP, but since it's a popular setup: the best you
can hope for with a USB 2.0 FTDI or USB serial converter is USB fullspeed
frame rate, or 1 kHz. So 1 millisecond in one direction.

~~~
voltagex_
What are the other options?

------
tzs
> My first idea was to use a high-speed camera, using the video image to
> determine when pad is hit and the audio to detect when sound comes from the
> computer. However even at 120 FPS, which some modern cameras/smartphones can
> do, there is 8.33 ms per frame. So to find when pad was hit with higher
> accuracy (1ms) would require using multiple frames and interpolating the
> motion between them.

I wonder how accurate you could get if you hit the pad with the phone and used
the phone's accelerometer to figure out when the impact occurred?

~~~
jononor
Sample rates of accelerometers are usually around 100Hz, so 10ms between each
sample. Some phones might be as high as 250Hz which might start to be usable.
One challenge when using different sensors is to establish a joint timeline
precisely. Might need to synchronize them with an event observed in both at
the same time, like the 'clapper' used in filmmaking.
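
A minimal sketch of both points: the time resolution implied by a sample rate, and putting two sensors on one timeline using a single shared event. The timestamps are invented and the function names are hypothetical. One sync event only corrects a constant offset between the clocks, not drift.

```python
def sample_period_ms(rate_hz):
    """Time between consecutive samples, in milliseconds."""
    return 1000.0 / rate_hz


def to_reference_time(t_sensor_s, sync_sensor_s, sync_reference_s):
    """Map a sensor timestamp onto the reference clock using one shared event."""
    return t_sensor_s - sync_sensor_s + sync_reference_s


for rate in (100, 120, 250):
    print(f"{rate} Hz -> {sample_period_ms(rate):.2f} ms per sample")

# Invented example: the same clap is seen at 2.500 s on the accelerometer
# clock and at 7.100 s on the audio clock.
impact_on_audio_clock = to_reference_time(3.250, 2.500, 7.100)
print(f"{impact_on_audio_clock:.3f} s")
```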

------
MrZeus
When I read the title of the article I wondered "did they just discover
ASIO4ALL?"

Yes. Yes, they did.

\- [http://asio4all.com/](http://asio4all.com/)

