Hacker News
Paul Davis, lead developer of Ardour, on fixing big Linux audio issues (libregraphicsworld.org)
95 points by reddotX 3 months ago | 41 comments

Happy to answer any further questions here too!

Great interview! What wasn't entirely clear to me is the conceptual difference between LV2 extensions and other plugin extensions:

"Whereas with LV2, you can actually say: 'Here is the spec for this extension. And it's written down and this is how it works'. So I think this is a great thing and I don't think it'll go away."

Isn't this merely a matter of documentation? See for example the REAPER VST 2.x extensions (which you've mentioned): https://www.reaper.fm/sdk/vst/vst_ext.php

With VST 3.x (which has a COM-like API) you can even add entire custom interfaces which the plugin or the host can query.

What Cockos did with Reaper and their VST "extensions" is absolutely not part of Steinberg's conception of VST. Because of this, there's no namespacing: two organizations could "define" an extension that uses the same integer value for the audiomaster callback, and there's nothing to resolve that except for one of them to say "Hey, guys, please don't do that, we were using that first".

LV2 uses URIs to identify extensions, and there is an expectation that the URI is based on a URL controlled by the extension's proponents. Namespacing, put simply.
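The URI-based namespacing can be sketched in C. The `LV2_Feature` struct below mirrors the one in LV2's `lv2/core/lv2.h` (copied here so the sketch is self-contained); `find_feature` is a hypothetical helper, not part of the LV2 API, showing how a plugin or host matches an extension by URI string rather than by a bare integer opcode — so two vendors' extensions can never collide as long as each uses a URL under its own domain:

```c
#include <string.h>
#include <stddef.h>

/* Mirrors LV2_Feature from lv2/core/lv2.h (illustrative copy; real code
 * would include the header instead). */
typedef struct {
    const char* URI;  /* globally unique identifier for the extension */
    void*       data; /* extension-specific payload */
} LV2_Feature;

/* Hypothetical helper: scan a NULL-terminated feature array for an
 * extension identified by URI. */
static const LV2_Feature* find_feature(const LV2_Feature* const* features,
                                       const char* uri)
{
    for (; features && *features; ++features) {
        if (strcmp((*features)->URI, uri) == 0)
            return *features;
    }
    return NULL;
}
```

Contrast this with the VST2 situation described above, where an extension is just an integer passed to the audiomaster callback and nothing prevents two parties from picking the same value.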

VST3. Cough. Choke. Do you know that the SDK Steinberg distributes for VST3 is larger than the entire Ardour codebase?

> What Cockos did with Reaper and their VST "extensions" is absolutely not part of Steinberg's conception of VST.

I tend to disagree. What else is the purpose of the effVendorSpecific/effCanDo and audioMasterVendorSpecific/audioMasterCanDo opcodes?

> two organizations could "define" an extension that uses the same integer value for the audiomaster callback

Yes, this is indeed a problem.

> LV2 uses URIs to identify extensions

And VST3 uses COM like interfaces with GUIDs. I think it's just two different ways to achieve the same thing.

> Do you know that the SDK Steinberg distributes for VST3 is larger than the entire Ardour codebase?

:-) 99% of the VST3 "SDK" is just unnecessary cruft, the actual plugin interface is pretty small (although not as small as VST2). I've written a cross-platform VST3 host, and "pluginterfaces" is really all you need (https://github.com/steinbergmedia/vst3_pluginterfaces). I don't think the actual VST3 plugin API is significantly larger than LV2.

I have mixed feelings about VST3. I see some of the advantages, but some design decisions are just awful. In fact, some things, like multi-channel support, are even worse than in VST2 (see https://github.com/steinbergmedia/vst3sdk/issues/28).

Sadly, I'm no fan of COM either. It's an engineering design that, while not conceptually unrelated to better similar systems of its time, just reeks of everything that was wrong with MS during the 1990s. I don't think that anyone who actually cares about cross-platform would ever choose COM-like stuff for anything. Or maybe I'm just no fan of "component object models" in general.

But anyway, Robin Gareus has already implemented the bulk of VST3 support inside Ardour, so we should see that emerge sometime (7.0 or before).

> Sadly, I'm no fan of COM either.

Me neither :-) A C API should still be the weapon of choice.

> Robin Gareus has already implemented the bulk of VST3

Cool! I remember I had a short e-mail conversation with Robin about the VST3 SDK last year after he kindly helped me to get his lv2vst plugin to work in my host. Looks like he started to work on the VST3 implementation shortly after that :-)

Out of curiosity, what's your host's name? :)


It's for Pure Data and SuperCollider.

GitHub mirror: https://github.com/spacechild1/vstplugin

BTW, the actual VST3 plugin interface only consists of a couple of header files. I didn't have to compile a single .cpp file from the VST3 SDK. So the actual API is pretty light-weight, but it's quasi hidden inside this humongous SDK. It's both funny and sad.

Edit: larger than libardour if you exclude the CMake sources included within it; larger than all of Ardour if you include them.

Thank you for developing and maintaining such a fantastic piece of software!

And thanks to you and @prokoudine for this great series of interviews.

I was very surprised to read:

> But we do now generally tell new users "You don't have to use JACK. And in fact, if you don't use JACK, your initial experience is going to be a lot easier". That's particularly true for MIDI devices. Most people using JACK2 have to go through some extra loops to actually get hardware to show up. Whereas if they use the ALSA backend on Linux, it just works.
>
> So JACK will be there, we will suggest and make it more and more obvious that JACK is not the obvious thing for you to use.

I recently helped a friend set up his Linux laptop to record audio (USB interface, mics to record acoustic instruments). I installed Ubuntu 20.04 and used the great Ubuntu Studio tools to set up JACK [1]. It's still a pain, as you mention, to save/restore session states for my friend, and to tune the settings for latency vs. xruns.

My friend doesn't need to route audio from one program to another, so I guess he could just use ALSA directly, but then how can he monitor/optimize the latency?

[1] https://help.ubuntu.com/community/UbuntuStudio/UbuntuStudioC...

The latency is set in the audio/MIDI dialog that shows up when you start Ardour. You can also measure systemic (hardware analog-digital/digital-analog conversion) latency from there.

Interesting interview; thank you for doing it. Here's a few questions.

You say that for end users, ALSA just works, and you tend to encourage it for newbies. What are the tradeoffs? What kind of latency penalty am I taking by using ALSA vs JACK? Can it scale to as many channels?

What's your preferred JACK frontend? The UbuntuStudio UI has some nice bridging set up out of the box, so you can e.g. play Youtube while Ardour or Reaper is active. QJackCtl certainly works (and offers a UI around the patchbay you describe) but it doesn't have that feature at least.

Does Pipewire aim to solve the problem of persistent device names? I had a bit of an adventure figuring out how to set up ALSA so that it didn't constantly give new numbers to my audio interface, USB midi IO, control surface, etc (which was painful, because it meant having to reconfigure the DAWs after each reboot.) Once I found the right howto, it wasn't too bad, but it'd be nice if that was taken care of.

Do you have any background knowledge on Bitwig's weird MIDI setup? They support JACK audio, but not JACK midi, so you have to do this weird juggling act to bridge midi to ALSA, which was too much friction for me to bother with (and so I don't use Bitwig.)

Finally, is there much difference between audio interfaces in terms of latency as long as they are USB class compliant? If there is, are there any good references that you know of? Or is it a case of experimentation?

ALSA vs. JACK: there's no latency difference at all. There is a very tiny decrease in CPU load from using ALSA directly (no context switching between the JACK server and clients). No difference in channel counts. Note also that my recommendation was within the context of Ardour: use Ardour's own ALSA audio/MIDI backend, rather than its JACK audio/MIDI backend (unless you actually need JACK-like capabilities).

I use QJackCtl. I use an ALSA loopback device to bridge between PulseAudio and JACK. Documented at http://ardour.org/jack-n-pulse.html

I don't know the answer to the question about Pipewire. However, devices names: https://jackaudio.org/faq/device_naming.html

We have had users complaining about Ardour in the context of Bitwig's weird MIDI, indeed. I've never spoken to anyone involved in Bitwig development. I could make guesses about why they did what they did, but they would just be guesses. It is certainly less than ideal.

USB audio on Linux sadly incurs an extra latency penalty due to a kernel-side buffer whose size varies every time the device is opened/started. This means that you will not see constant latency numbers for any USB audio interface on Linux (there are some attempts being made to fix this). However, other than that, they are all functionally equivalent, certainly from a latency perspective. I would give MOTU's recent devices a shout out purely because they've done all device configuration (the "device panel") via a web browser, and thus removed the one barrier that exists for a number of devices on Linux - the audio/MIDI side works, but you cannot configure it since that requires a dedicated Windows/macOS tool. MOTU were the first company I know of to do this, and despite their virulently anti-Linux attitude in the past, it really makes Linux a first class platform for their newer devices (ignoring a few SNAFUs with the firmware, unfortunately).

There's also the first part of the interview available :)


I should've linked to it directly perhaps :)

The technical discussion was a ways over my head, but maybe you'll indulge a more basic question: I'd like to hook my Linux machine up to a (music) keyboard with minimal latency. What do I need to know/understand/possess for a good experience? Can I just install AVLinux, plug in a MIDI-USB cable, and fire up Ardour?

Yes, absolutely. Just tell Ardour to use its ALSA audio/MIDI backend, and skip JACK. It will all Just Work (TM).

Congratulations on Ardour 6, it really seems like you focused on the right things!

What are your thoughts on MPE and Midi 2.0 in Ardour?

Ardour can already record "MPE" (they are just normal MIDI messages), and play it back. What you can't do is meaningfully edit it, because you need to totally change the idea of what "a note" is ... it's no longer just a pitch, velocity and on/off times - it can be an arbitrary set of time-varying parameters as well. This requires a substantive redesign of the internal data model for MIDI. Currently, I don't have any plans for this, though I would like it to happen.
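The data-model shift described here can be sketched in C. These structs are purely illustrative (not Ardour's internal types, which are not public in this form): the "old" note is a fixed tuple, while an MPE-style note additionally carries arbitrary per-note automation curves:

```c
#include <stddef.h>

/* Illustrative sketch only -- these are NOT Ardour's actual data types. */

/* One point on a time-varying per-note parameter (e.g. pressure, slide). */
typedef struct {
    double time;   /* seconds from note start */
    double value;  /* normalized 0..1 */
} ControlPoint;

typedef struct {
    int           parameter;  /* which expression dimension this curve drives */
    ControlPoint* points;
    size_t        n_points;
} ParameterCurve;

/* The classic model: a note is just pitch, velocity, and on/off times.
 * For MPE, it must also own a set of time-varying parameter curves. */
typedef struct {
    double          start;     /* note-on time, seconds */
    double          duration;
    int             pitch;     /* base MIDI note number */
    int             velocity;
    ParameterCurve* curves;    /* per-note expression; empty for plain MIDI */
    size_t          n_curves;
} MpeNote;
```

The editing problem follows directly from the shape of the struct: moving or transposing a note now means deciding what happens to every attached curve, not just rewriting two events.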

I have not looked at MIDI 2.0 yet in any level of detail, amazing as that may seem. I don't think MIDI 2.0 really brings much to the table for most users of MIDI, and what it does bring is largely addressed by MPE. However, given some of the deeper changes, it would make sense that whenever we try to tackle the "MPE model", we also pay attention to what MIDI 2.0 requires at the same time.

Well I'm really looking forward to MIDI Capability Inquiry. (well and all the high resolution bits too).

Having a midi device be able to describe and name its parameters and then be able to ... let me quote soundonsound -


"""MIDI‑CI will allow DAWs to discover a lot more about external gear than they can at the moment, and might even allow editing panels to be built automatically."""

I have a lot of gear that will never speak midi 2.0 but my assumption is some clever person will make a proxy (with a big list of existing gear mappings) that would allow me to select that gear and then expose a midi 2.0 interface to the DAW or a keyboard controller that was midi 2.0

This would really be huge for me and just make life a lot better around the studio.

About 10 years ago, Digidesign bought a nice protocol that was really tightly engineered to allow DAWs to build interfaces with control surfaces in an incredibly deep way. The DAW and the surface could negotiate based on their capabilities and needs. It looked really promising.

It has gone nowhere.

There are a number of reasons why, but I would wager that the most important is that people simply do not need this kind of complexity. It is the kind of thing that OEMs can use to build cool devices, rather than the kind of things the overwhelming majority of users ever wants to deal with.

Note that the MIDNAM specification also does a significant part of what you describe above, and although it's not true to say that it has gone nowhere, the industry essentially abandoned it rather than push it forward.

I apologize for sounding cynical or skeptical about MIDI 2.0. MIDI 1.0 was a technical and social engineering miracle. But having been around the audio tech industry for 20+ years, I'm not optimistic that the MMA's processes are capable of replicating what was so good about MIDI 1.0 or avoiding the screw-ups they've been involved with since then.

Some things come too early and/or are underspecified.

MIDI 2.0 seems like it's about a sweet spot that will solve real problems.

I'm looking at the midnam files that come with Ardour...

It seems like these are focused on just named patches? I see the spec can do more like parameter names but it doesn't seem like that's been the focus? Is there a bigger pile of work somewhere?

Thanks, I might poke at this a bit and see what I can get working in Ardour.

Between Pipewire and LV2, is Linux finally ready for audio stuff like the Mac? Are there music producers who use Linux exclusively for live music, production and mixing?

There are such people. Since the total number of Linux users is low compared to Windows, probably not very many.

But "is Linux finally ready for audio stuff like mac?" is a meaningless question because there are as many different workflows and definitions of "audio stuff" as there are styles of music.

Also, in this context, even the term "Linux" is not well-defined. You probably mean "Linux in a conventional desktop or laptop computer", but there are many high end digital audio mixing devices you could buy from any major audio tech company that run Linux internally. That's not what you meant, probably, but it's still Linux, thus making the term a bit unclear in this context.

I liked the “CoreAudio got this right” bits weaved throughout the interview; it makes for a good “sometimes you have the wrong interface and need to force a change”.

Paul: I’m curious which “industry / workflow” is likely to be the best driver of improvements in Linux audio. Is film production or post-production the best bet?

(My assumption is that as long as musicians “grow up” on macOS and Logic and so on, you’re fighting with the “perennial year of the Linux desktop”. That wouldn’t apply to audio production teams who don’t need to also use their machines for their personal life)

"Professional" workflows on Linux are not affected by the sorts of things that bug desktop users. ALSA just works (and in some senses, works better than CoreAudio), all the more so for nice fixed studio configurations where the expectation that you can just arbitrarily plug/unplug devices and everything keeps working isn't part of day to day life.

Consumer/desktop audio on Linux will only advance (to the extent that it needs to - it already works quite well) because of people just sitting down and doing the work. Nothing is "driving" this forward.

The problems really arise at the junction between the two worlds: think bedroom/basement music/video production on your one laptop. This is the part that macOS/CoreAudio gets so right - there's no difference in any of this for consumer apps or "pro" apps. On Linux, you have to grapple with the awkward disconnect between these two "workflows". It's not THAT hard, but it's not THAT easy either. And this affects developers just as much as it affects users: which APIs to use?

Thanks. In that case, I liked your bit about userspace APIs as a potential path forward. Much like with userspace networking and storage layers that have become so pervasive lately.

> Paul: I’m curious which “industry / workflow” is likely to be the best driver of improvements in Linux audio. Is film production or post-production the best bet?

Well, you probably need an application which has real-time requirements. This has always been the most difficult aspect of audio processing pipelines.

That was solved on Linux over 10 years ago. It really isn't an issue now.

Real time audio programming within applications tends to be a problem, mostly because too many developers have never read or do not understand:


Ok. But in case an application still fails to prevent buffer underruns, perhaps it's a good idea to at least let the driver generate a reverb to fill the empty space.

At least in the electronic music world, production has long ago moved from Mac to Windows.


OK, I don't have a citation, other than that amongst the hundreds of DJs and producers I've had contact with in the rave/club scenes over the past couple of decades, I've seen that Macs, Atari, and Amiga were dominant in the early 90s; by the early 2000s the tide was turning, and now it's almost exclusively Windows everywhere other than people doing all-hardware sequencing and modular stuff.

I haven’t exactly pestered everyone with a notepad and pencil taking a survey so I do not have a citation. I’d be VERY curious to hear a credible citation to the contrary showing that Macs are growing in use or even maintaining steady market share. That’s not what I see out in the world.

Since we are exchanging anecdotal evidence :), every single one of my musician friends except just one person has moved to the mac in the last 10 years. The reason why that one person didn't is that he needs some Windows-specific software for work.

Weird. What genres are your musician friends into? Must just be the circles I travel in, but in the parts of the techno/house/drum and bass scenes that I'm privy to in the US, UK, and Germany, I see less than 5% Macs, probably, and as the hardware has gotten less expandable (soldered RAM, soldered SSD) that number has been decreasing further. Odd that we see such drastically different trends in our respective subcultures.

I can concur: many, if not most, musicians I know use MacBooks for their live performances. Low latency audio on MacBooks just works (tm). It's certainly possible on Windows and Linux, but it requires the right hardware and some tweaking. I'm on Windows, btw.

I’m talking in the studio when the actual work is getting done. A soldered down ssd and ram is not a good investment for a studio musician who wants to be able to expand. Definitely for live shows fashion plays a role so you’re gonna see more macs. Re: the latency thing everybody I know just drops the $50 for a Focusrite 2x2 and installs the right ASIO driver.

> I’m talking in the studio when the actual work is getting done

OK, the situation in studios might be different (but I would be curious to see actual numbers). In a live show you want reliable low latency audio (especially if you play VSTis on a keyboard or do any kind of live processing), and macOS is very good at this, better than Windows I would say. Dismissing this as "fashion" is a bit simplistic.

I'm a computer musician and audio programmer. Although I use Windows most of the time, I would be the first one to admit that the audio situation on Windows is a total mess. As Paul said, on macOS you have a single API and it just works (tm). On Windows, we needed an external company (Steinberg) to come up with a usable solution for low latency audio (ASIO).

Thanks for your work on Ardour by the way. I’ve been attempting to move to a fully Linux-based solution for audio production on and off since the mid-2000s and Ardour is one of the best tools on the platform in my opinion.

Tangentially related, sorry. I have a problem with a MIDI keyboard connected via USB to a Raspberry Pi running the latest Ubuntu MATE. When trying to use the keyboard with the timidity synthesizer, there is a lot of latency, maybe half a second from key press to sound played out the speakers. Makes learning piano impossible.

I tried editing lots of files, installing jack, uninstalling jack, googling, and bashing my head against the table for hours. Nothing helped. Is this impossible or is there a buffer setting I could tweak somewhere?

It is likely that your buffer size is too large. When starting jack, try for a buffer size of 256 or smaller and see if it works for you.
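The arithmetic behind that advice is simple: the one-way latency added by one buffer is just frames divided by sample rate. A minimal sketch (the function name is illustrative; total round-trip latency also includes the number of periods and any converter/USB latency mentioned elsewhere in the thread):

```c
/* One-way buffering latency in milliseconds for a given buffer size
 * (in frames) and sample rate (in Hz). */
static double buffer_latency_ms(unsigned frames, double sample_rate)
{
    return (double)frames / sample_rate * 1000.0;
}
```

At 48 kHz, 256 frames works out to about 5.3 ms per buffer, whereas a half-second delay like the one described above corresponds to tens of thousands of frames of buffering somewhere in the chain.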

Ask this question at linuxmusicians.com and you will likely get some more helpful answers.
