Sonobus: Open-source app for low latency peer-to-peer audio (sonobus.net)
388 points by Eduard 61 days ago | hide | past | favorite | 88 comments

This is incredible. I've been wanting a piece of software to stream high quality, low latency Mac audio to my iPhone, so I can use my headphones wirelessly while working. I'll have to play with it more, but I was set up in mere minutes and it works at extremely low latency.

The fact that this does bidirectional audio is icing on the cake; can't wait to share this with some quarantined bands.

Magic, well done.

15 years ago I played in a band with some friends.

We now all live in different cities. Due to the pandemic and lockdowns we ended up reconnecting, and despite the distance we are jamming and working on songs like we never left the drummer's basement.

Strongly recommend jamulus.io. It is fast enough to actually jam on music with other players.

How? The network itself should add around 15 ms, which is at the perceptible edge of reverb. Genuinely interested.

The most important thing when you do this is that you don't listen to your local audio signal, you only listen to the mixed signal coming from the server. That puts you in time with everyone else, more or less.

Roundtrip latency was under 30ms, which you definitely feel but your brain can compensate for.

The funny effect overall though is that because you're delayed and everyone else is delayed, you end up sort of waiting for the other player, and they end up waiting for you, so instead of the classic band "everyone speeds up" it's like everyone progressively slows down as the song progresses.

We haven't tried with our drummer actually playing drums, he's just been recording against other parts we provide. I don't think it'll work very well because his acoustic kit will be too loud and he'll hear that as well as the delayed drum sound, which for him will feel like a stuttery mess

> don't listen to your local audio signal, you only listen to the mixed signal coming from the server ... I don't think it'll work very well because his acoustic kit will be too loud

Yes, that sounds right. Whether this sort of approach works depends a lot on whether your instrument is naturally quiet enough that you can just focus on what you hear from the server. People are actually surprisingly good at adjusting to instruments that are "slow to sound"; just like you can bounce a basketball on the beat with a little practice, even though it takes many milliseconds for the ball to hit the ground after leaving your hand, you can learn to play keyboard etc with a similar delay.


We had a zoom video going at the same time so we could look at each other. What tripped my brain up the most was if I looked for rhythm cues from the video while playing, even my own video would not be in sync and I would lose the zone where I could play in delayed time and have to restart

15 ms is the same as people standing 15 ms * 300 m/s = 4.5 m apart.

You can make music together at this distance (even though it's perceptible).
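The equivalence is easy to check with back-of-the-envelope arithmetic (a sketch; 343 m/s is the speed of sound in air at roughly room temperature, while the 300 m/s above is just a rounder approximation):

```python
# Convert a one-way audio latency into the equivalent physical
# separation between two musicians playing acoustically.

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 °C

def latency_to_distance(latency_ms: float, speed: float = SPEED_OF_SOUND) -> float:
    """Distance (in metres) that sound travels in `latency_ms` milliseconds."""
    return speed * latency_ms / 1000.0

for ms in (15, 30, 50):
    print(f"{ms:>3} ms is like standing {latency_to_distance(ms):.1f} m apart")
```

So 30 ms of round-trip latency is roughly like playing with someone across a large stage.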

Hmm… Wonder whether it would improve things to create a virtual environment with point sources actually that distance apart, and model the reverb / echoing you'd get from that, so your brain "intuitively" understands the delay.

It is perceptible, but doesn't matter that much in practice. If you're able to achieve the overall latency of around 30ms, you'll be able to play just fine - your mind will quickly adjust to the latency, you just need to listen to your delayed output from the server and disable any local monitoring so you stay synchronized with others.

As a musician I kind of strongly disagree with that. I was never able to adjust to latency above ~8ms. Some people might. I can't. I spent as much effort as necessary to ensure that my setup has low latency.

Of course if you're playing with quantization on, or sequencers, it's not as much of a problem

That always struck me as the solution: pump a click centrally to everyone and then everyone synchronizes to it. You might not be able to monitor but you've "jammed" and actually created music.

That's pretty much what Jamulus does: sound is mixed server-side, so you have a single point of reference (can be click track, can be drummer, can be sequencer...). You can even get a multi-track recording saved on the server so you can then adjust timings and give it a proper mix afterwards.

The answer is pretty obvious: by not caring if it is under 500ms delay.

unless you get a Quantum A.I. that can predict seconds into the future... mine just said Musk will start a Q.A.I. company with some shady zero-equity A-round disguised as a pre-sale. Oh wait, that's not my QuantumAI, it's just my Twitter tab.

500 ms?! At 500 ms I can assure you you'll have trouble with normal conversations. You're nowhere near collaborative music playing.

Everything in that comment was exaggerated for comedy. But the truth is still the same: you have to accept the network latency. Period.

anyone promising faster than light communication could very well be claiming a patent to a perpetual motion machine.

Yes. Transatlantic latency is <100 ms. People in this thread have said that 50 ms is doable for making music. Why are you derailing the conversation?

This use case can be done much simpler; this is something I've been doing with PulseAudio on my phone for years.


Have to link to that because I chuckled at your post.

Is that relevant though? ;) I don't think "using mobile phone as a wireless headset" is a primary use case for low-latency P2P apps and I certainly wouldn't propose using PulseAudio instead of such apps for, well, low-latency P2P communications.

Though PulseAudio doesn't really minimize latency ...

It seems on par with a Bluetooth headset to me. Of course you have to use it over LAN and not over the Internet, but I don't think that's an issue for this particular use-case.

Could you elaborate on your setup and use case?

I just enable zeroconf discovery and allow network access in PulseAudio on my laptop and my phone, so both devices automatically see each other as a regular audio device. You can do that from UI via `paprefs`.
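For reference, the `paprefs` checkboxes roughly correspond to these module lines in PulseAudio's `default.pa` (a sketch; module options may need adjusting for your setup, and note that `auth-anonymous=1` trusts every client on your LAN):

```
# /etc/pulse/default.pa (or ~/.config/pulse/default.pa)
# Accept network clients and publish local sinks/sources over mDNS.
load-module module-native-protocol-tcp auth-anonymous=1
load-module module-zeroconf-publish
# On the other machine: discover published devices automatically.
load-module module-zeroconf-discover
```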

I use Soundwire for the same purpose.

Out of curiosity, what's your use case?

I have a nice pair of wired headphones and I listen to a lot of lossless audio. I like being able to get up from my desk to fill up my water glass or grab a snack without breaking the flow of music. Bluetooth isn't ideal due to the quality loss and other factors. I tried some airplay based solutions but the 2 second delay was too much. I'm seeing anywhere from 25ms-200ms latency here and it's working flawlessly.

I could just listen to music on my phone directly, but it becomes such a pain because watching a YouTube video or anything with audio on my computer completely breaks my flow.

Granted this is a completely niche persnickety problem, but what is this site if not for solving niche persnickety problems.

My friend who happens to be an acclaimed tubist just noted that he could use this for online music lessons. So there's earnestly a wide variety of use cases here.

> Granted this is a completely niche persnickety problem

I'm in the exact same boat - completely resonate with your desire to avoid breaking flow - and I doubt I would have thought of this solution if it weren't for your post above.


I can't figure out how latency matters when you're listening to prerecorded music. You hear the whole song from start to finish without interruption, yes? Why does it matter if that whole uninterrupted song is heard 2 seconds later than when it was sent from the streaming device? Oh... you're watching the music video at the same time?

Perhaps it is a usability issue. Low latency makes using the software much more pleasant. I once tried streaming lossless audio to my phone from an mpd running on my home server. Pressing pause and having to wait two seconds for the audio to actually pause killed it for me.

That’s a fair question, but I do a lot more than just listen to music. I jump on a call, I watch a YouTube video, I put on a basketball game or even the news. Those are all latency sensitive.

Why does latency matter for any of these except the call where you're interacting with the stream?

Have you given snapcast a look? https://github.com/badaix/snapcast/

For what it's worth, I have the exact same use case

I'll chime in with the same intense positivity. In the early days of the quarantine I set up a "walk-in theater" on my property where I wanted friends to be able to watch a film without bothering my neighbors with a massive stereo system. The natural option was headphones, and I chose between setting up an FM transmitter and trying to get everyone to buy a Walkman... or trying to come up with a network streaming solution. I found the open source implementation of VBAN and tried to write a Python script to launch the server and manage finding new connections. It failed absolutely miserably, and I ended up setting up an amplified audio system in the rain, much to the annoyance of my neighbors after all.

Needless to say, I love this. The only thing that's stopping it from being truly wonderful is an Android app <3

There actually is an Android version in beta: https://play.google.com/store/apps/details?id=com.sonosaurus...

Needs some work still...

Is the source available somewhere?

Isn't snapcast the de-facto solution for that kind of use-case?

This would be my pick as well. I've researched this in the context of a silent disco, and tried an Airplay based solution described here: https://chrislivengood.net/the-do-it-yourself-silent-disco/, but snapcast just works.

Yeah, but the latency I had with snapcast was too much for video syncing. Maybe I had some issue other people don't have, though. If you just want all audio sinks synchronized, then snapcast is great, because latency doesn't matter all that much. But if you want the words you're hearing to match the moving lips on screen... then you're in a different realm entirely.

That makes sense - I believe by default snapcast has a 1000ms buffer.

I wonder if you couldn't grab the latency correction factor from snapcast in real time and somehow apply it to your video stream as well? This has been raised before, it seems [1], and in another issue the snapcast author recommends looking at RTP-based streaming instead of snapcast [2].

Seems like snapcast may not be ideal for this after-all :)

[1] https://github.com/badaix/snapcast/issues/57

[2] https://github.com/badaix/snapcast/issues/731#issuecomment-7...

I think parent says that +/-200ms is acceptable for snapcast, but unacceptable for lip-syncing.

I am not sure snapcast can't sync down to a few ms. If so, the issue would be syncing video with the audio clients, which certainly sounds feasible if integrated in the video player. That's what jellyfin does: https://github.com/jellyfin/jellyfin-web/pull/1011 (I tried to help a bit with that one).

Ooh this looks great! I designed a pair of nice 3D printable headphones [1] and my plan was to make them Wifi headphones so I could roam the house while listening to them. I made a battery powered raspberry pi zero with a nice DAC and set up shairport sync as an Apple Airplay device. That works fine for Spotify, but there’s instances (watching videos while walking in and out of the office) where the latency is a problem. Maybe this will work for me!

[1] https://github.com/tlalexander/reboot-headphones

Airpods could do this as long as you keep your phone nearby.

But, I guess they're not open source...

Nope. Plus these headphones actually sound very good (compared to my roommate's high-end cans) and they cost $50. If they break I can fix 'em. It's kind of going for the opposite of AirPods: anti-disposable.

This reminds me a bit of ROC [0], though definitely seems more polished.

One thing nice about ROC is its pulseaudio plugin for native integration as a sink.

[0]: https://gavv.github.io/articles/roc-tutorial/

From the screenshots this looks quite straightforward, which is a failing of most other software of this type. I'm quite curious to try it out.

I joined the public group, and even though it's just everyone kind of experimenting with their tunes, I love it. This has a lot of potential. Best of luck!

I wish there was a chat integration. I'd like to just be able to say: this is a cool tune.

I was about to say this as well. For instruments, though, a GPS-synced clock would help everyone jam in sync, because anything above 50 ms is too much and your drummer will go nuts.

Drummers are already nuts. They’re drummers.

A bit off topic, but if you just want fast and lossless music playback for multiple desktop & mobile clients on your home network, with a minimal toll on bandwidth, look into the MPD Satellite Setup: https://www.musicpd.org/doc/html/user.html#satellite-setup

Playing back a 24-bit/96 kHz FLAC uses approximately 400 kB/s, and a regular 16-bit/44.1 kHz one ~200 kB/s. In such a setup the main instance of MPD acts as a query and file server, and your MPD clients are configured as proxies.

On Android you need the official MPD app, which works as said proxy: https://f-droid.org/packages/org.musicpd

Then choose a front-end client, e.g. M.A.L.P. (https://f-droid.org/packages/org.gateshipone.malp) or MPDroid (https://f-droid.org/packages/com.namelessdev.mpdroid). Happy listening!
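Those bandwidth figures are easy to sanity-check against the uncompressed PCM rate, which is the upper bound a FLAC stream stays under (a sketch; FLAC's actual compression ratio varies with the material, typically 50-70% of raw):

```python
# Raw PCM bandwidth = sample_rate * bytes_per_sample * channels.
# FLAC streams below this; the raw rate is the upper bound.

def raw_pcm_rate_kbs(sample_rate: int, bits: int, channels: int = 2) -> float:
    """Uncompressed stereo PCM bandwidth in kB/s."""
    return sample_rate * (bits // 8) * channels / 1000.0

print(f"16-bit/44.1 kHz: {raw_pcm_rate_kbs(44100, 16):.0f} kB/s raw")
print(f"24-bit/96 kHz:   {raw_pcm_rate_kbs(96000, 24):.0f} kB/s raw")
```

The ~400 kB/s figure above is consistent with FLAC compressing the 576 kB/s raw 24/96 stream to roughly 70%.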

Is there a way for the iOS app to continue functioning in the background with the device locked?

I’ve been dreaming of a scenario where iPhones, or really any mobile device, could be setup as a two-way wireless mic/monitor system.

The idea came up because of things moving to video conference solutions. Any time you have multiple people at one end of the camera your audio almost by necessity becomes some mash up of microphone and speakers. The half-duplex nature of acoustic echo cancelled audio in this setting drives me insane.

I’ve done wild setups of splitting ear buds to a bunch of people but the cables become a mess. It would be so nice for each person in the room to just plug their earbuds with mic into their mobile and then somehow point it at the device hosting the video conference.

I’ve experimented with OBS.ninja for this but it has the same limitation that mic input dies once the device screen is locked.

Yes, the iOS version does stay active in the background when connected to others. It should pretty much do what you want for this use case, but you will need headphones everywhere because this thing is focused on quality, and so there is no echo cancellation.

Thanks! I did not go as far as connecting it to another device. I just noticed the mic indicator disappeared when I locked the screen. I’ll be sure to experiment with a full setup later.

And yes, definitely everyone on earbuds. Software or even DSP echo cancellation drives me crazy and I’m trying to eliminate it from the equation as much as possible.

Thanks for this project!

I just wanted to follow up that I had the chance to play with it and I apologize for jumping to conclusions about capability.

What work needs to be done to move the iOS app to the app store rather than beta? I'm not an iOS developer but if the issue is money I would like to help out!

Is there a place to get prebuilt Linux executables?

AUR, predictably, and likely others. You can also use JackTrip/jamtrip.

Is SonoBus compatible with JackTrip?

It isn't... just a similar use case.

Looks great! Could I use it to get better AirPods Pro mic audio quality on my Mac via the iPhone?

To clarify: AirPods audio quality works fine paired to a Mac, but when you also want to use the mic (e.g. a Zoom call), the mic audio quality is horrible. Some Bluetooth limitation on Macs? Not sure, but it's a known problem [0]. Audio and mic quality on the iPhone, however, are great. No problem at all.

So that's the reason I'm asking whether it's possible to connect both audio and mic to the Mac via the iPhone to work around this limitation, i.e. Mac <> SonoBus <> iPhone <> AirPods.

[0] https://apple.stackexchange.com/questions/282705/airpods-ext...

As the other commenter said, the main limitation with AirPods (and any other commodity/non-specialized Bluetooth headphones/headsets/buds/etc.) is that the Bluetooth protocol only allows for either high quality audio streaming or microphone/headset mode. In microphone/headset mode, the audio quality is severely limited. Thus, if you want to use AirPods (or any other Bluetooth audio device) for two-way communication, you're stuck with the low quality headset mode.

Some devices that are designed for high quality audio and microphone usage simultaneously get around this limitation by actually exposing two Bluetooth "devices" simultaneously, one used exclusively for the microphone and the other for receiving high-quality audio. (Some gaming headsets use this technique.)

Thank you. Yes. However they work great on the iPhone. Sound and mic quality are both totally fine. The problem only happens on the Mac.

So this is why I was wondering if using Sonobus + iPhone might work around this issue?

From my knowledge - no.

The reason AirPods "worsen" on calls is that the Mac switches them to the headset Bluetooth profile, which carries only 8 kHz audio.

A possible workaround is setting the AirPods as output and your Mac's internal microphone as input. It'll have other drawbacks for latency/echo cancellation.

Christ, this explains so much. I just thought my ceiling fan was loud! Does the problem resolve on the iPad? I've been meaning to move all my Zoom calls to the iPad anyway; this might be the final nail!

I spent some time helping someone get this set up for their music group, and it's really really nice just as a voice chat program. It's amazing how much of the unpleasantness of something like a Skype call goes away with really low latency

I've been using JackTrip which accomplishes the same goal, but without a fancy GUI.

SonoBus has a couple of advantages over JackTrip:

* doesn't need Jack

* (optional) compressed audio using the Opus codec

* public and private groups

* automatic resampling and reblocking between peers

* (optional) dynamic jitter buffer adjustment

* built-in panning, mixing and some FX (compressor, eq, reverb, etc.)

* metronome

* record output to disk

* also available as a VST plugin

* and probably more

Also, don't underestimate the value of a good UI ;-)

Finally, be aware that apps like SonoBus are targeted towards musicians, few of whom are comfortable running command line programs, let alone setting up Jack on Windows (which is an adventure on its own).

OK, thanks. Yeah, I'm trying it out now. It seems like it will be useful for non-technical musicians. And the GUI is "just enough": it has a mixer and configuration for my audio device and a few things like record and metronome, but isn't overdone like JamKazam.

Seems to me a killer feature is the ability to run as a VST plugin inside of a DAW or OBS, so users can incorporate SonoBus into their DAW's workflow rather than setting up a new thing entirely.

Nice. My mind is circling around audio use cases these days (thanks Clubhouse ;-)). There is definitely a lot of untapped potential and tools like these make it easy to prototype ideas.

this is great! it looks fantastic, and the promise of low latency audio across networks is amazing.

the thing I looked for, hoped to see, but didn't find is a standalone non-gui version, or a library implementing the protocol. I would love to drop this onto a headless pi racked up as a eurorack module broadcasting low latency output from hardware to join up. talk about an amazing jam session.

are there any plans for a library, or command line version?

Yes, there are plans for a headless standalone version on platforms where people want that. You can actually do it right now with Jambox pi dist, which includes SonoBus: https://github.com/kdoren/jambox-pi-gen

This sounds so much like Sonos, I had to make a genuine effort not to confound the two

Good luck on that trademark case

I really hope we don't live in a world where Latin prefixes can be trademarked. Are Micron and Microsoft confounding names? What about Instagram and Instacart?

It's more like Microsoft and Microbusoft or Instagram and Instagrabum

Well, not really: the common part between Sonos and SonoBus is the Latin root [0] sono. Are we saying that now no one can use sono as a prefix for things relating to sound?


It's literally the exact same scenario:

Latin prefix + single letter: micro-n, sono-s

Latin prefix + short word: micro-soft, sono-bus

I came here to see if it supported streaming to Sonos.

There's a rock concert going on in the US/Canada group! This is great

Has anyone used this or a similar tool to successfully sing in a group, live?

Check out https://ccrma.stanford.edu/software/jacktrip/ instead. It is designed for that.

In its early days - https://www.pajam.live

Here's how you could improve it on low latency networks: the Auto functionality does not "adjust" to the network capabilities.

The current version 1.3.2/3 has a bug in Auto, it will only adjust up. Use Initial Auto instead until the next update, or periodically reset the jitter buffers with the |< buttons.

Make the interface more intuitive to the user and fix a couple of bugs on macOS.

other than that, it is great!

Is this new or previously existing?

Somewhat new? https://github.com/essej/sonobus/graphs/contributors

I'm assuming the first date there is the initial commit date, which here is Aug 2, 2020.

Is there a web version available?
