Show HN: Working on a new network transport for PulseAudio and ALSA (gavv.github.io)
107 points by gavv42 8 months ago | 43 comments



A brief summary.

I'm working on Roc, a toolkit for real-time streaming over the network. Among other things, it provides command-line tools and PulseAudio modules that can be used for home audio. It can be used with PA, with bare ALSA, and with macOS CoreAudio.

The main difference from other transports, including PulseAudio TCP and RTP streaming, is the better quality of service when the latency is low (100 to 300 ms) and the network is unreliable (Wi-Fi). The post explains why and provides some comparison, usage instructions, and future plans.

There is still a long way to go, and now we're looking for people's thoughts and feedback. Do you find the project useful? How would you use it? What features would you like to see?


It's very disappointing that 300ms is considered "low" latency. The circumference of the Earth is 40075km, and the speed of light in optical fiber is slightly faster than 2/3rds of that in vacuum, so it's physically possible to send a signal to any place on Earth within 100ms, and get a reply within 200ms.
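As a rough check of that arithmetic, here is a small sketch. It assumes an idealized great-circle fiber path with no routing or switching delay and uses exactly 2/3 of c, so the result lands a hair above 100 ms one way; with fiber being slightly faster than that, the one-way figure drops just under it.

```python
# Back-of-the-envelope propagation delay through optical fiber.
EARTH_CIRCUMFERENCE_KM = 40075.0     # figure quoted in the comment above
SPEED_OF_LIGHT_KM_S = 299_792.458    # speed of light in vacuum
FIBER_FACTOR = 2.0 / 3.0             # assumption: exactly 2/3 c (fiber is slightly faster)


def one_way_delay_ms(distance_km, factor=FIBER_FACTOR):
    """Propagation delay in milliseconds over `distance_km` of fiber."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * factor) * 1000.0


# The farthest point on Earth is half the circumference away.
antipode_km = EARTH_CIRCUMFERENCE_KM / 2.0
one_way = one_way_delay_ms(antipode_km)
round_trip = 2.0 * one_way
print(f"one way: {one_way:.1f} ms, round trip: {round_trip:.1f} ms")
```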

IMO "low latency" should mean low enough that it's very unlikely to be noticed, which most musicians seem to accept as 5ms. (Theoretically, even microsecond level delayed audio can be noticed if mixed with the original signal because of comb filtering.)
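To make the comb filtering remark concrete: mixing a signal with a copy of itself delayed by τ gives a magnitude response of 2|cos(πfτ)|, with full cancellation at f = (2k + 1) / (2τ). A minimal sketch, assuming an equal-level mix of the dry and delayed signals (the function names are made up for illustration):

```python
import math


def comb_gain(freq_hz, delay_s):
    """Gain of y(t) = x(t) + x(t - delay) at freq_hz: 2 * |cos(pi * f * tau)|."""
    return 2.0 * abs(math.cos(math.pi * freq_hz * delay_s))


def notch_frequencies(delay_s, max_hz=20_000.0):
    """Frequencies up to max_hz where the two copies cancel: (2k + 1) / (2 * tau)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches


# With a 5 ms delay the first notch sits at 100 Hz and repeats every 200 Hz.
first_notches = notch_frequencies(0.005)[:3]
print(first_notches)
```

Shorter delays push the first notch higher, so the audible effect shifts from obvious hollowness toward a gentle high-frequency roll-off.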


I see your point.

Many audio streaming apps require 1-2 seconds of latency (especially on Wi-Fi), which is why I called the 100-300 ms range "low". 100ms is the minimum I've seriously tested on Wi-Fi so far. 300ms is, roughly, the maximum UI delay that feels acceptable (you press "play" and hear the sound).

I'll think about the wording..


300ms is still noticeably laggy when the audio is part of a video. Some media players can delay their audio to account for playback delay in the audio device, if the audio stack supports that. Does Roc support that, or if not, is it on your roadmap?


> 300ms is still noticeably laggy when the audio is part of a video.

Agree.

> Some media players can delay their audio to account for playback delay in the audio device, if the audio stack supports that. Does Roc support that, or if not, is it on your roadmap?

We have an open issue for implementing correct latency reports in PA modules. Once we fix it, players that support that feature should automatically start taking the latency into account.

Thanks for reminding me, I'll test this feature specifically.

https://github.com/roc-project/roc/issues/171


Many years ago we did some unscientific testing of how latency affected a telephony application. Already at around 80ms there was a measurable impact on the dialogue, with an increased frequency of the callers interrupting each other's sentences. Even modern VoIP applications can still be problematic in this regard, and additional latency from the software stack wouldn't help.


I see you've got Opus on your to-do list. I would really appreciate that! I find Opus (appropriately configured) to be audibly indistinguishable from CD audio, and it would really help with the bandwidth requirements.

I've always been really excited by the possibilities implied by PulseAudio's network capabilities, but disappointed by their latency and bandwidth requirements. Roc + Opus would be amazing.


Check out https://github.com/eugenehp/trx for Opus streaming inspiration, I've played around with their code and found it easy to work with. Opus would be great with ROC because in case of buffer over/under runs the codec provides features to mask dropouts based on previous content. This is critical when using Wi-Fi.


Thanks.

> Opus would be great with ROC because in case of buffer over/under runs the codec provides features to mask dropouts based on previous content. This is critical when using Wi-Fi.

Are you talking about its PLC or FEC? I didn't test it yet and I'm interested if people are using both of them with music.

BTW it would be also interesting to combine our FECFRAME support with Opus.
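For readers unfamiliar with the distinction: PLC conceals a lost packet at the decoder by extrapolating from previous audio, while FEC sends redundancy so the lost packet can be rebuilt exactly. A toy sketch of the FEC idea using a single XOR parity packet per block follows. This is deliberately much simpler than anything used with FECFRAME or Opus in-band FEC, and the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Build one repair packet as the XOR of all equal-sized source packets."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild a block in which at most one packet is missing (marked as None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("a single XOR parity packet can repair only one loss")
    present = [p for p in received if p is not None]
    repaired = list(received)
    # XOR of the parity with every surviving packet leaves exactly the lost one.
    repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


block = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(block)
restored = recover([block[0], None, block[2]], parity)
```

The cost is extra bandwidth and a little extra latency (the receiver has to wait for the repair packet), which is the trade-off against PLC.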


Good to know. Yes, Opus will be one of the highest priorities for us after we make the very first (0.1) release.


Excellent; a few years ago I even started hacking on my own transport, very roughly as a PA module, but life got in the way and it never got very far. So I'm very pleased to see this great project. Thanks, and good luck!


Fun fact: end-to-end latency in midrange digital wireless microphone systems is 2.7ms [0]. People sometimes wonder why we bother with 5-6 figure specialty audio and RF gear when we could "just" use general purpose computers and WiFi. This is one of the reasons.

[0] https://www.shure.com/en-US/support/find-an-answer/ulxd-dual...


I've got a home HifiBerry streaming setup over Ethernet. I am using TCP streaming and the latencies are low enough not to be noticeable at all while watching YouTube or playing games and streaming the audio output to my speaker setup on the RPi.

1) Would this make any difference?

2) Does it currently support online plug-unplug the way RTP works without restarting pulseaudio?


> Would this make any difference?

If you have no issues with 1) latency, 2) packet losses, and 3) clock differences, there would be no difference, at least until Roc can offer some new encodings.

(If you're using PA, it handles the clock difference for you. Its RTP transport sometimes behaved strangely for me, but its "native" tunnels handled it well.)

> Does it currently support online plug-unplug the way RTP works without restarting pulseaudio?

Roc sinks and sink inputs may be loaded and unloaded at any time without restarting PA. But there is no service discovery yet, which means that 1) when a remote sink input appears, sink is not automatically added 2) when a remote sink input disappears, sink is not automatically removed. (We will add this in upcoming releases). Currently the remote sink input can appear and disappear at any time and the local sink will just continue streaming packets to the specified address.


This is exactly what I did - creating an ALSA plugin and leveraging snd-loopback to pass PCM to a streaming process. I would be interested in incorporating your protocol into SlimStreamer. Currently it uses SlimProto, which is TCP based (so sync part is a nightmare to get working on a reasonable level). How far are you with supporting multiple sampling rates and multiple receivers?


Great!

> How far are you with supporting multiple sampling rates

Roc currently supports arbitrary input/output rates but only a single network rate (44100). If the network rate differs from the input/output rate, Roc performs resampling.
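To illustrate what converting between an input rate and the 44100 network rate involves, here is a minimal sketch using linear interpolation. This is only an illustration: production resamplers use band-limited (windowed sinc) interpolation to avoid aliasing, and nothing here reflects Roc's actual resampler. The function name is made up.

```python
def resample_linear(samples, in_rate, out_rate):
    """Resample a mono sequence of floats from in_rate to out_rate."""
    if not samples:
        return []
    step = in_rate / out_rate                      # input samples per output sample
    out_len = int((len(samples) - 1) / step) + 1
    out = []
    for n in range(out_len):
        pos = n * step                             # fractional position in the input
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out


# One second of a 48000 Hz ramp becomes one second at 44100 Hz.
ramp = [float(i) for i in range(48000)]
converted = resample_linear(ramp, 48000, 44100)
```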

We're now finishing the 0.1 release, and I was planning to add support for more network encodings, including more rates, in 0.2. Feel free to file an issue or mail us with a list of encoding/rates you need.

> and multiple receivers?

No support yet. If you use a multicast address, it would probably just work though.

Again, feel free to file an issue and describe what you would expect from such support. I'll be happy to implement it if someone needs it.

Another question is how Roc will interact with your sync part. How do you perform synchronization?


How did you measure latency? I'm thinking of contributing to a similar project that has 1-2 seconds of delay, but I don't know what tools I need to benchmark latency.


That small experiment in my post does not include a correct latency estimation. I just configured all three transports with the desired latency. Actually I'm thinking about writing a tool for such benchmarks that will measure the overall latency (PA + network + PA).
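One common way such a tool could work, sketched here under assumptions: play a known test signal, record what comes out at the far end (for example via a loopback cable), and find the lag that maximizes the cross-correlation between the two. The function names are hypothetical, and the brute-force loop is only practical for short signals.

```python
import random


def estimate_delay(reference, recorded, max_lag):
    """Return the lag (in samples) at which `recorded` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[i] against recorded[i + lag] over their overlap.
        n = min(len(reference), len(recorded) - lag)
        if n <= 0:
            break
        score = sum(reference[i] * recorded[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_ms(lag_samples, sample_rate):
    """Convert a lag in samples to milliseconds."""
    return 1000.0 * lag_samples / sample_rate


# Simulated measurement: noise burst delayed by 50 samples of silence.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]
recorded = [0.0] * 50 + reference
lag = estimate_delay(reference, recorded, max_lag=100)
```

A noise burst or chirp works better than a pure tone as the test signal, because a periodic signal correlates equally well at many lags.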


I would like to show your project to some friends, thanks! Does it support h323?


Thanks.

> Does it support h323?

No, and there are no plans yet. But we can probably add support if someone needs it.


is 100ms really low latency?


I agree, calling it low was not quite correct. See the thread above: https://news.ycombinator.com/item?id=19828567

I didn't perform serious testing on latencies below 100ms yet. I've added to my todo that we should investigate the minimum supported latency.


I wish you luck with your project! sorry if my comment was snarky but a lot of modern software seems to essentially not care about latency or responsiveness, and it's something that bothers me more and more these days


Thanks :)


Sounds great! What about encryption? If that is used in any environment not in complete control of the user that should be mandatory. E.g. in an open wifi, a shared wifi, or directly over the internet. As the protocol is using RTP anyway, it should be easy to slap on SRTP or DTLS? For the beginning, it may even be sufficient to use a static symmetric key. Or directly use WebRTC which has that already included, but what about the FEC then?


Update: I just discovered SRTP in the advanced features list. Awesome!


Yeah, SRTP is in the list :) It's not the highest priority right now, but I'll get to that sooner or later (sooner if somebody asks for it).


This is cool! I've recently been playing with streaming local audio to another system (a raspberrypi with a dac). I tried Pulseaudio's builtin way of streaming, but lag was pretty bad. I found JACK[0] to work well (<20ms) once I got it configured correctly. Kind of a complicated setup (including getting it all to be "automatic", systemd unit files and all, realtime kernels, etc), and not particularly stable. Unfortunately the latency makes up for it.

[0] http://www.jackaudio.org/ (is this link dead? here's a few more-

https://github.com/jackaudio/

https://en.wikipedia.org/wiki/JACK_Audio_Connection_Kit )


This is great stuff! Finally there is an Open Source initiative to create a robust transport for sync audio streaming. Existing Open Source solutions that come to my mind (like SlimProto, SnapCast, ffmpeg) focus on providing 'a product' rather than a reusable 'transport'. Few questions: how do you 'capture' PCM stream in case of ALSA? It is straightforward to create a PA sink and plug it into PA configuration, but I am wondering about pure ALSA. Disclaimer: I am an author of https://www.github.com/gimesketvirtadieni/slimstreamer


Thanks.

> Few questions: how do you 'capture' PCM stream in case of ALSA? It is straightforward to create a PA sink and plug it into PA configuration, but I am wondering about pure ALSA.

Good question :) Roc does not implement any special capturing code for ALSA, it just reads from the given device (using SoX currently). The user is supposed to use something like snd-aloop.

It would be possible to create a custom ALSA plugin I guess, but we have no plans for that currently.

You're right about the transport vs product part. I would prefer to work on the transport. And an ALSA plugin would be a product on top of it so it should be a separate project ideally. Actually, the same is true for our PulseAudio modules. I hope later we will either submit them to upstream or separate into a standalone project.

> Disclaimer: I am an author of https://www.github.com/gimesketvirtadieni/slimstreamer

Interesting, didn't see it before.


Network audio is pretty nifty! I run a Snapcast[0] setup at home, tied into Home Assistant[1] automation for multi-room audio. Some notes:

- I have six total audio zones, including my desktop computer.

- Audio for a room turns on/off with a room. It's neat to walk from my office into the kitchen, and have the kitchen lights come up and audio follow me in when the motion detectors fire. Some speakers don't mute when "off", but change source to a text-to-speech only channel (e.g. for door/window contact notifications and other messages).

- Everything but my desktop (macOS) are speakers connected to a Raspberry Pi via USB DAC.

- One of my motivations here was multi-room audio, but a big one was to connect a Linux VM's audio output directly to the speakers so I could use the official Spotify client, instead of a 3rd-party library that will eventually break.

- Snapcast is really quite DIY for config, but I could set up other sources--an Airplay target, a line in target with a cable hanging off the server so people could plug in devices at a party, etc. I've seen setups online where people do this, and someone in a room can change that room's "channel" to another source.

- Spotify's DRM-as-feature is nice here, because I just use the Spotify client on my desktop normally, with output coming out elsewhere. I run 700ms of buffer, which is just low enough that clicking play/pause doesn't feel broken. I could probably drop it more, since everything is hardwired in the house.

- Before Snapcast, I just toggled Spotify's source when I walked between rooms, but there's quite a bit of dead air there, and it's a hassle to set up, plus multi-room audio sync is nice with people over.

[0] https://github.com/badaix/snapcast

[1] https://www.home-assistant.io/


Interesting, thanks for sharing.


Can this also play to multiple devices (at the same time), while keeping the audio synchronized ?


Not yet. This is in our roadmap however.


This would be very cool. I've been pondering about how to do this every now and then (but never worked on any real time stuff, so...)

If you ever get around to adding this I will start building my el cheapo raspberry pi based sonos clone. ;-)


Regarding realtime stuff there's a project out there to implement the Ethernet AVB (audio/video bridging) standard on BeagleBone using PTP (precision time protocol) for synchronization.

Some of the older network synchronized transports like CobraNet and Dante might also be interesting for anyone wanting to learn more about this stuff.


gstreamer already does that fairly well (https://arunraghavan.net/2017/01/quantifying-synchronisation...), is there a reason to duplicate the work ?


Thanks for info.

> is there a reason to duplicate the work ?

I don't know yet. When it comes to implementation, we'll see whether we can re-use either the code or the ideas, or maybe instead integrate Roc into GStreamer as a network transport (actually I was already thinking about it, and there is an item in the roadmap for it).


Looks awesome! Functionally speaking, it reminds me of Snapcast. Compare/contrast?


Thanks, I didn't know about this project and will definitely look at the implementation.

Their documentation says they use TCP, which usually means that it won't handle low latencies on Wi-Fi due to packet losses.

On the other hand, they have service discovery, remote control, and multi-room synchronization. All three features are planned but not yet supported in Roc. We'll add the first two in upcoming releases, but the multi-room support requires serious research.

Their documentation also says the client can correct time deviations by playing faster or slower. We use resampling for that instead. I'm wondering how they can avoid glitches without using a resampler.
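To show the general idea of correcting clock drift with a resampler, here is a toy simulation. It assumes a simple proportional controller that nudges the resampling ratio based on how far the receiver queue is from its target length. This is not Roc's or Snapcast's actual algorithm, and all names and constants are made up for illustration.

```python
def scaling_factor(queue_len, target_len, gain=1e-6, max_dev=0.005):
    """Consume slightly faster when the queue is too long, slower when too short."""
    dev = gain * (queue_len - target_len)
    dev = max(-max_dev, min(max_dev, dev))   # keep pitch changes inaudibly small
    return 1.0 + dev


def simulate(drift_ppm, seconds, rate=44100, target_len=8820, correct=True):
    """Return the receiver queue length after `seconds` of a fast sender clock."""
    queue = float(target_len)
    for _ in range(seconds * 100):           # simulate in 10 ms ticks
        produced = rate * (1.0 + drift_ppm * 1e-6) * 0.01
        factor = scaling_factor(queue, target_len) if correct else 1.0
        consumed = rate * factor * 0.01      # resampler input consumed per tick
        queue += produced - consumed
    return queue


uncorrected = simulate(100, 600, correct=False)   # queue keeps growing
corrected = simulate(100, 600)                    # queue settles near the target
```

Without correction, a 100 ppm drift grows the queue by a few thousand samples in ten minutes (and eventually overflows or adds latency); with the controller, the queue settles at a small fixed offset from the target.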

One more difference is that they use their own protocols (both for streaming and control) while Roc relies on standard RFCs.


This is a cool project! I always wonder how to measure latency of sound-related programs. Are there any tools to benchmark latencies?


You've mentioned PA, ALSA and macOS CoreAudio. That's Linux and Mac. Will Windows users be able to use this as well somehow?


Currently, no. A Windows port is on our roadmap but not a priority right now. However, if someone wants to maintain it, I'm ready to accept PRs and help with porting.



