The Horror of Cross-platform Audio Output (camlorn.net)
57 points by ctoth on Dec 10, 2014 | 30 comments

I still remember when audio output meant writing your own interrupt handler and directly poking IO ports, so at least we've come a little way since then.

I've done this, and I'd rather do it again than deal with cross platform audio.

Unless I was allowed to use jackd, but that probably doesn't count.

or breadboarding a DAC-to-parallel-port "sound card" because the piezo buzzer just didn't cut it

Ah the Covox[1] craze, sweet memories.

BTW: I think you meant to say "parallel-port-to-DAC", where "DAC" all too often meant just a simple cobbled-together resistor ladder.

Who cares about HiFi when you can have actual sampled sounds instead of tinny beeps? :-)

It was also possible to play sampled sounds through the PC speaker using pulse-width modulation. Now that was a hack.
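For the curious, the core of the PC-speaker trick can be sketched in a few lines. This is only an illustration of the arithmetic, not period-accurate driver code: it assumes unsigned 8-bit samples and a carrier of roughly 18.6 kHz (an assumption chosen to sit near the edge of hearing), and the function name `pwm_counts` is made up here. The only hard fact used is the PC timer's 1,193,182 Hz input clock. Each sample becomes a pulse width in timer ticks, and the speaker cone's inertia averages the pulses into an approximate analog level.

```python
PIT_CLOCK_HZ = 1_193_182  # input clock of the PC's 8253/8254 interval timer


def pwm_counts(samples, carrier_hz=18_643):
    """Map unsigned 8-bit samples to pulse widths measured in timer ticks.

    carrier_hz is an assumed PWM carrier rate, not a value from the thread.
    Returns (ticks per carrier period, list of pulse widths).
    """
    period = PIT_CLOCK_HZ // carrier_hz  # timer ticks in one carrier period
    counts = []
    for s in samples:
        if not 0 <= s <= 255:
            raise ValueError("samples must be unsigned 8-bit values")
        # Scale 0..255 onto 1..period: a wider pulse gives a higher average level.
        counts.append(1 + (s * (period - 1)) // 255)
    return period, counts


period, counts = pwm_counts([0, 128, 255])
print(period, counts)
```

With these assumptions there are only 64 ticks per period, so the effective resolution is about 6 bits, which goes some way toward explaining how it sounded.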

Interesting to see that a number of complaints have been levelled against the build systems.

It shows just how careful maintainers should be about creating smoother build experiences. Anyway I hope the author of the article will post a follow up with his new library.

That is 100% the reason why LibreOffice has left Apache OpenOffice in the dust. Seriously.

Interesting idea. Do you have any evidence for that theory?

Ever tried to build Apache OpenOffice?

I'm in the final stages of development, the point where it needs to be tested in real software. I'm also approaching the point of some cool demos. I'm working on something that I've not seen or really heard of in other libraries for this kind of stuff, but going into it needs the wave files and an article, not a brief comment here. Nevertheless, there will certainly be followups. Also, complaining about the build systems is hypocritical. Mine requires CMake and something like 5 Python packages. I generate half my code by crawling my own headers and a YAML file describing my library. This makes the build system a little inconvenient but, among other things, lets me spit out automatically maintained language bindings that are far above what SWIG would give. Which is also something I need to write a blog post about.

Without a hint of irony, the very next section complains about RtAudio, which has everything in one file to fix exactly this problem ;)

(And it's only 10,000 lines.)

10000 lines is twice the size of my whole library, but half my library codes itself from a YAML description (there were good reasons for this; it's actually worked out really, really well). I don't honestly care if the build process is a little awful. I'll deal, if it works on my platform. Having to read an entire 10000-line source file to build up familiarity with the library before fixing it bothers me much more and, if it's still going to need a build step, stopping at "no external dependencies" ought to be enough.

I place great stock in the organization of my files as well as my code, and rarely let them go above 1000 lines each. Perhaps this is a weakness.

>I don't honestly care if the build process is a little awful. I'll deal, if it works on my platform.

My hopes have just been dashed. I was really hoping you'd put a smooth cross-platform build system at the top of your list.

I still wish you all the best anyway.

I have written quite a bit of cross-platform audio code. PortAudio really works fine. If you need low latency on Windows, use the ASIO backend. Compiling it on Windows is a bitch, though.

Didn't find compilation particularly hard on Windows, might be because I know the tools very well though.

On the latency: do you (or anyone reading this) happen to know the current state of low-latency audio (say <5 ms) on Linux? A couple of years ago we did an application for human auditory system research which required stable (as in, not ever dropping any buffers) low latency. The original idea was to do it for both Windows and Linux; the GUI would be Qt anyway, so that got the cross-platform part covered already. We were using an RME Multiface, and on Windows we'd just hook it up and select e.g. 48 kHz, a 128-sample buffer size, full duplex, and it just worked without any problems with the standard ASIO samples. On Linux we initially hardly managed to get anything out at all. And once that was fixed, it was impossible to get down to the required latencies without dropping buffers. Even with the semi-realtime kernel patches or whatever they're called.

Anyway, we gave up back then but I still wonder: was it just our lack of knowledge, or was Linux back then (about 7 years ago) really not up to the task, or was it a driver problem? And what is the state today?

Today the state is (almost) as it always was:

1. Install and set up JACK. Enjoy your lack of latency. I haven't profiled it scientifically, but I've tried microphone monitoring with filters layered on and couldn't detect any latency between me speaking and my voice coming out the speakers.

2. Modern Linux uses PulseAudio. Pulse is kind of jingoist and slightly hostile to foreign presence. You'll need to edit its default.pa file to not load the udev-detect module and not try to just grab the soundcard for itself, but to load the jack-source and jack-sink modules. You'll also need to write a script to do the module loading, because modern desktop Linux doesn't have a race-free way to start daemons when a user logs in, and you'll have to run that script manually once you log in. It looks like this:

  pacmd load-module module-jack-source channels=1
  pacmd load-module module-jack-sink channels=2

Oh, and if it wasn't apparent, only one user can use the audio.

As anon4 mentioned, it's pretty easy now to set up low-latency audio with jack. The hardest thing is probably navigating through the tons of documentation on it, much of which is outdated (the newer version of jack is a bit different to set up: you call jack_ctl instead of directly using jackd).

For best results, you should install a low-latency or realtime kernel. On Ubuntu, it's easy to find low-latency kernels through the repos, and Fedora has a realtime kernel through the PlanetCCRMA repos (although this randomly crashed on my last, approx 5-year old laptop). On other distros you may have to compile your own patched kernel, and there's plenty of documentation on this.

There are also some distros specifically for audio work that are set up out of the box with great low-latency settings (Ubuntu Studio and KXStudio, for example).

On my current laptop, I'm running jack at 48kHz, with 3 buffers of 64 samples, using a Focusrite Scarlett 2i4 interface. I haven't really noticed any significant dropouts so far. This is on Ubuntu with a low-latency kernel.
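To put numbers on those settings, here is a back-of-the-envelope sketch. It only counts the time audio spends sitting in the buffers (frames per period times number of periods, divided by the sample rate) and ignores converter and USB transfer delays, so treat the result as a lower bound rather than a measured round-trip figure. The function name is made up for illustration.

```python
def buffer_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Time in milliseconds that audio spends queued in the buffers."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz


# 3 buffers of 64 samples at 48 kHz, as in the jack setup described above.
print(buffer_latency_ms(64, 3, 48_000))

# A single 128-sample buffer at 48 kHz, as in the ASIO setup asked about earlier.
print(buffer_latency_ms(128, 1, 48_000))
```

The first case works out to 4 ms and the second to roughly 2.7 ms, so both sit under the 5 ms target from the original question.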

With the Pulseaudio jack sink, pulse is basically piping its output into jack, which is especially convenient if I want to do something like record audio streaming from a browser.

Using Gentoo (no GUI, just a bash command-line environment) and the ALSA back-end, I got a latency of 64 samples, which seemed indistinguishable from dedicated sampler hardware when triggering audio files in Csound via a USB MIDI keyboard. This was using a 1.2 GHz Pentium M Toshiba laptop with a basic on-board Intel soundcard. An i5 540M @ 2.67 GHz goes down to 16 samples (at least with no complaints from ALSA or Csound). An external USB soundcard on the i5 seems to go no lower than 32 samples. The main thing is making sure ALSA connects directly to your soundcard without going through dmix. This of course limits you to using one audio app at a time. If you need to have multiple applications sending/receiving audio at low latencies, you will need to deal with the tedious per-session configuration of JACK.

My problems with compilation of PortAudio were primarily related to ksguid.lib, which is apparently now removed from Visual Studio. It's been a while since I did that one so I don't remember specifically what I had to do, but combined with the fact that (on my machine) all the backends but WinMM had to be turned off, it wasn't ultimately important either way. As I recall, it needed a code crawl to find the guilty pragma and some preprocessor definitions that killed it. PortAudio powered the audio for about 4 months before I finally got around to dropping it for the next option, primarily because of latency. Downgrading VS wasn't an option because I lean very heavily on C++11, especially smart pointers and closures.

I've got a lot of open questions about ASIO, including one interesting one: will it knock out other apps and can this be avoided? Finding this out has been on my to-do list forever, but I'm visually impaired and many of my users will ultimately be as well, so I can't have it just killing the assistive technology. I'm not sure how that fixes the higher channel count problems with PortAudio anyway. If you've got more info, I'm open to it; I'd love a solution to fall from the sky.

Why not simply use FMOD / FMODEx? AFAIR their cross-platform support is quite excellent, and their license is free for anything open source or <$100k budget

For a library like that, isn't it also a good idea to let me plug in my own backend? That is, let me specify the sample format I want and either provide a function I can call that will fill a buffer with samples, or ask me to give you a callback that you can call once you have samples ready. Preferably, let me choose which model fits my use better.
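To make the two models concrete, here is a minimal sketch. None of these names come from Libaudioverse or any other real library; `PullSource`, `PushSource`, `fill` and `on_block` are invented for illustration, and a sine generator stands in for the actual audio engine. In the pull model the caller owns the buffer and asks for samples; in the push model the library drives and hands each finished block to a callback.

```python
import math


class PullSource:
    """Pull model: the host asks for samples whenever its device needs them."""

    def __init__(self, freq_hz=440.0, sample_rate=48_000):
        self.freq_hz = freq_hz
        self.sample_rate = sample_rate
        self._n = 0  # running sample index, so the phase continues across calls

    def fill(self, buffer):
        """Overwrite the caller's buffer in place; return the number of samples written."""
        for i in range(len(buffer)):
            phase = 2 * math.pi * self.freq_hz * self._n / self.sample_rate
            buffer[i] = math.sin(phase)
            self._n += 1
        return len(buffer)


class PushSource:
    """Push model: the library hands each finished block to a user callback."""

    def __init__(self, source, block_size, on_block):
        self.source = source
        self.block_size = block_size
        self.on_block = on_block

    def run(self, blocks):
        """Produce the given number of blocks, calling on_block once per block."""
        for _ in range(blocks):
            block = [0.0] * self.block_size
            self.source.fill(block)
            self.on_block(block)


# Pull: I decide when and how much.
buf = [0.0] * 4
PullSource().fill(buf)

# Push: the library decides when, and I just receive blocks.
received = []
PushSource(PullSource(), block_size=8, on_block=received.append).run(3)
print(len(received), len(received[0]))
```

The push side here is built on top of the pull side, which is one reason a library that exposes the pull model can usually offer both.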

Libaudioverse separates the audio backend from the audio simulation. It is already possible to do this: you create what I call a read simulation, set up your objects and connect them, and then use getBlock on it. Computing an arbitrary number of samples is not permitted because it introduces a number of performance headaches and code complexities; in addition, I've not yet seen anything related to audio output that randomly resizes the blocks on you. If enough people need me to add a queue that automatically breaks up the blocks, I will do so. Either way, you can already send it anywhere, and I could have it working on platforms without backends with a minor code change.
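The queue mentioned there is simple enough to sketch on the caller's side. This is not Libaudioverse code: `Reblocker` and its `get_block` argument are hypothetical, with `get_block` standing in for whatever call returns one fixed-size block of samples. The queue pulls whole blocks as needed and serves reads of any length from the leftovers.

```python
from collections import deque


class Reblocker:
    """Serve arbitrary-length reads from a source that only yields fixed-size blocks."""

    def __init__(self, get_block):
        self._get_block = get_block  # hypothetical callable returning one block of samples
        self._pending = deque()      # samples fetched but not yet handed out

    def read(self, count):
        """Return exactly `count` samples, fetching more blocks only when needed."""
        while len(self._pending) < count:
            block = self._get_block()
            if not block:
                raise RuntimeError("source returned an empty block")
            self._pending.extend(block)
        return [self._pending.popleft() for _ in range(count)]


# Stand-in source: three blocks of four samples each.
blocks = iter([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
queue = Reblocker(lambda: next(blocks))
print(queue.read(3))
print(queue.read(6))
```

The cost is the one hinted at above: a read that straddles a block boundary forces a whole extra block to be computed early, so the latency and CPU load become less predictable than with fixed blocks.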

More interestingly, if you are connected to an audio device, you can use a GraphListener. GraphListeners call a callback on you every time a block gets processed. As a consequence, it's also more than possible to do stuff like live stream over the internet. I am debating the merit of a backend that advances the simulation without playing: I can see potential uses for it, I'm just not sure if they're good ones.

As one of the developers of PortAudio, I would appreciate it if the author of the post could clarify the specific bugs that he found. We received no bug reports from the author.

MathJax (http://www.mathjax.org/) will render LaTeX math in the browser. The result is quite pretty.

Use Qt, as a bonus it has a widget kit?

As a downside, you don't get access to the audio data.

Why is cubeb not its own library?

It is?

The problem is that the authoritative source sits inside the Firefox tree. Mozilla makes a standalone version (that doesn't require that tree) available on GitHub, but you're not going to have much luck contributing back if your contributions break the Firefox build.

The solution proposed would be to make the GitHub the authoritative source, and patch the Firefox version. But that shifts the maintenance burden in the way that's opposite to the main developers' interests. I can see why there's no enthusiasm.

I've contributed to libcubeb and had no problems with the above. But my contributions were not of the "lol I don't like automake so let me replace your entire buildsystem" kind. Honestly if that's the kind of thing you're trying to do, I think almost any open source project will push back.

Ok, didn't realize that. I'm a bit suspicious of the main article now.
