Ask HN: Why isn't there a standard network audio protocol?
149 points by armagon on April 14, 2022
Having been frustrated yet again using Bluetooth from a computer to a smart speaker -- ugh! I swear connections only work half the time, and it isn't due to RF interference -- I'm wondering why there isn't a standard protocol for transmitting audio over the network. I think it would be so much easier to use.

[I'm talking about having my devices at home talk to each other. They are already on the same network.]

Edit/Addendum: Are there any streaming audio protocols that work from Mac/Windows/iOS to Amazon Echo Dots? I'm looking for a drop-in replacement for bluetooth audio streaming, where I can play sounds on my computer (ex. a youtube video) and hear it on a louder speaker.




There are several standards in the professional space, Dante being the biggest for "audio over Ethernet". It's packet-switched, so buffering is required.

AES50 works over cat5 cables, but doesn't use ethernet; it uses a synchronous clock to transmit PCM audio. A lot of the Midas/X32 product lineup uses this to great effect.

Dante allows normal IP equipment to function as audio distribution devices, but has noticeable latency for close-quarters stuff (sound travels ~ .9ms / foot +-10%).

AES50 has extraordinarily low latency, pretty much as good as analog, but only allows point-to-point links.

On the consumer side, RAOP existed for a while before Silicon Valley elitism infected Apple: https://en.wikipedia.org/wiki/Remote_Audio_Output_Protocol

EDIT ====

I had it in my head that RAOP was an open standard; it's not.


Regular ethernet can do much better than .9 ms / ft, but it would indeed require some buffer size tuning to a level most consumer routers/switches wouldn't allow.

Or even better: https://en.wikipedia.org/wiki/Cut-through_switching


The real problem with Dante is the buffering required; the wire speed is more than sufficient. Ethernet packets are "best effort", making latency less predictable.

Let's say your original source could be heard by the audience. The audio can travel from the source to a mic, be digitized, transmitted over IP, arrive at the console, be digitally mixed, then be transmitted over IP to a speaker and played back at an unknown distance from the source. How much latency occurred there?

The trick is to get the original source wave to roughly line up with the time the amplified wave leaves the PA speaker, otherwise weird echoes are heard by the audience at best, and awful band-specific noise cancellation happens at worst.
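To make that concrete, here's a back-of-the-envelope latency budget in Python. Every number below is an assumed placeholder rather than a measurement; the point is just that the acoustic paths and the two network hops all add up:

    # Rough, illustrative latency budget for the chain described above.
    # All figures are made-up placeholders; real values depend on the gear,
    # the Dante latency setting, and the room.
    SPEED_OF_SOUND_FT_PER_MS = 1.13   # ~343 m/s at room temperature

    budget_ms = {
        "source to mic (3 ft of air)":         3 / SPEED_OF_SOUND_FT_PER_MS,
        "mic preamp + ADC":                    0.5,
        "Dante hop: stage box to console":     1.0,
        "console DSP / mixing":                1.5,
        "Dante hop: console to amp":           1.0,
        "DAC + amplifier":                     0.5,
        "speaker to listener (30 ft of air)":  30 / SPEED_OF_SOUND_FT_PER_MS,
    }

    for stage, ms in budget_ms.items():
        print(f"{stage:38s} {ms:5.2f} ms")
    print(f"{'total':38s} {sum(budget_ms.values()):5.2f} ms")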


It's designed for big studio buildings and stadiums. Surely they need latency correction anyway, so the design didn't need to fix it? And you'd think slapping timestamps on all the packets would make it pretty easy to calculate the latency.


Dante now has an OCI runtime that can do this purely in software without an Ultimo chip, a la Dante Virtual Soundcard on the Windows side, which is a good example of how the embedded space is adapting to this prolonged parts shortage.

https://www.audinate.com/products/manufacturer-products/dant...


> I'm wondering why there isn't a standard protocol for transmitting audio over the network.

Bluetooth isn't the same as your WiFi network. Most of the comments here are talking about IP-based protocols that aren't relevant for Bluetooth anyway.

Bluetooth is probably the best example of a widely adopted protocol for connecting to devices and sending audio streams. The protocol isn't exactly the problem. It's the buggy implementations of Bluetooth stacks and Bluetooth software in embedded devices.

Getting it right is actually extremely difficult because Bluetooth grew in complexity to be everything to everyone. It isn't only an audio sending protocol. Almost nobody owns the entire Bluetooth stack, so it's a mix of pieces from different companies and vendors.

Apple's implementation isn't perfect, but from experience I can tell you it's 10X better than the nightmare that is Android Bluetooth. It's getting better, but for years you had to collect a lot of different Android phones so you could make your software work around all of the different quirks in each vendor's different Bluetooth stacks.


> Apple's implementation isn't perfect, but from experience I can tell you it's 10X better than the nightmare that is Android Bluetooth

Seems like I have the opposite experience with them. I don't have issues with Android Bluetooth; I do have issues with Apple Bluetooth.

Half of the time, my iPad doesn't detect that my Bluetooth devices (a keyboard and an audio accessory) are trying to connect to it, even though they're already paired. When that happens, I have to go to Control Center to force the connection, and half of the time the iPad will oblige and connect. The other half it just gives up and says it couldn't connect or can't find the device (while my Bluetooth device is poking the iPad to connect). It's a hassle to use my Bluetooth devices with the iPad daily.

On the Android side, it instantly connects, even when my phone is sleeping.


It sounds like you are talking from a user perspective and the parent is talking from a vendor perspective, no?


I've had the exact same experience when trying to connect my Airpods to my iPhone SE 2nd Gen. When I still used a Samsung S8 the phone would instantly connect to my Airpods. Same experience with Bluetooth headphones.


That is the exact reason I don't like to use Bluetooth for audio devices. Nothing beats a physical jack cable.


Seconded. Any Bluetooth issues I have on Android are specific to a particular device.

The cheap Anker headsets I mostly use are rock solid. I have an Android head unit (my second one, actually; the first was garbage) and a Bluetooth radar detector. The detector always works with my phones, and never with my head unit(s).


> Bluetooth isn't the same as your WiFi network. Most of the comments here are talking about IP-based protocols that aren't relevant for Bluetooth anyway.

I'm aware of that. I want audio over WiFi and audio over LAN, as Bluetooth has left me scarred.


> The protocol isn't exactly the problem. It's the buggy implementations of Bluetooth stacks and Bluetooth software in embedded devices.

The spec is way too complicated, at some 3,000 pages, which apparently leads to faulty implementations left and right.

https://www.wired.com/story/bluetooth-complex-security-risk/


There are some projects like Snapcast[1] or SoundSync[2] (disclaimer: I'm the creator of Soundsync) that let multiple devices communicate together on the same network. The transmission side isn't that complex: you choose an audio codec, transmit chunks of data, and add a synchronization layer (to keep multiple outputs in sync and to delay video playback appropriately so it matches the soundtrack). The bigger problem is building an ecosystem big enough to make it attractive. Bluetooth sucks, but it's everywhere.

[1] https://github.com/badaix/snapcast [2] https://github.com/geekuillaume/soundsync
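To illustrate what that synchronization layer boils down to (this is just a sketch, not Snapcast's or Soundsync's actual wire format): the sender stamps every chunk with a "play at" time a fixed delay in the future, and each receiver holds the chunk until that shared deadline, which also tells a video player how far to delay the picture.

    import json, socket, time

    PLAYOUT_DELAY_S = 0.5   # shared buffer so every output can keep up

    def send_chunk(sock, addr, pcm_bytes, seq):
        # Hypothetical framing: 2-byte header length, JSON header, raw PCM.
        header = json.dumps({
            "seq": seq,
            "play_at": time.time() + PLAYOUT_DELAY_S,  # assumes synced clocks (e.g. NTP)
        }).encode()
        sock.sendto(len(header).to_bytes(2, "big") + header + pcm_bytes, addr)

    def play_when_due(header, pcm_bytes, output):
        # Receiver side: wait until the agreed deadline, then play.
        delay = header["play_at"] - time.time()
        if delay > 0:
            time.sleep(delay)   # a real player would feed a ring buffer instead
        output.write(pcm_bytes)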


I hadn't seen SoundSync before. It looks neat.

Like a lot of other people doing (or trying to do) Whole Home Audio, I'm using the Home Assistant open source platform as the central automation controller. You may want to look at creating a Home Assistant integration for SoundSync as it will expose it to the massive HA community (https://developers.home-assistant.io/docs/development_index/).


Multi-output audio is one thing, but for me, something similar to Spotify Connect (having one master player, either elected or dedicated, with the others acting as remote controls for it) is more important.

I'm boycotting Spotify, so I'm looking for something for SoundCloud, Deezer, or YouTube Music.

Tbh, skip Deezer, as they actively refuse to create something similar to Spotify Connect. IMO this is the USP of Sonos: it acts as Spotify Connect for all services.


That looks really neat. I'm not this far along in my home automation dreams (yet), but as I get closer to settling on how I'll communicate back and forth with each room, I may need to take a closer look here.


Ooh, SoundSync sounds awesome (no pun intended).


What’s the latency like on Soundsync compared to Snapcast?


There is a standard protocol for audio (and other realtime media) over an IP network: RTP.

But your problem doesn't have anything to do with a lack of standards; Amazon has no incentive to just let you send RTP to the Echo Dot on a port — nobody is asking for that, and they would have no control over the "experience".
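For reference, RTP itself is tiny: a 12-byte header in front of each payload. A minimal sketch of packing uncompressed L16 stereo audio into RTP over UDP (payload type 10 per RFC 3551) could look like this; the receiver address and port are of course made up:

    import socket, struct

    def rtp_packet(payload, seq, timestamp, ssrc, payload_type=10):
        # 12-byte RTP header: version 2, no padding/extension/CSRC, no marker.
        header = struct.pack("!BBHII",
                             0x80,                  # V=2
                             payload_type & 0x7F,   # PT 10 = L16 stereo @ 44.1 kHz
                             seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF,
                             ssrc)
        return header + payload

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    samples = 882                              # 20 ms at 44.1 kHz
    chunk = b"\x00\x00" * 2 * samples          # 16-bit stereo silence
    sock.sendto(rtp_packet(chunk, seq=0, timestamp=0, ssrc=0x1234),
                ("192.0.2.10", 5004))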


Good point -- I just put in a feature request, so now someone has asked for it.

I don't see that this is any worse for the experience than the bluetooth situation. It'd certainly make the device more valuable for me. But that doesn't mean Amazon will see it as worth their time.


I'll probably get flamed for this, but PulseAudio is good enough for IP networks and handles delay calculation pretty well. Used via multicast it's reasonable, but the lack of an ecosystem means non-Linux support is poor and control is basically nonexistent. Still, I did run PulseAudio as my home audio for TV/PlayStation/phone audio for a time; with some extras like casting receivers etc. it's almost useful, but not convenient (there's a gap here someone could fill).


Yeah, PulseAudio network sinks are useful once everything is properly set up. My media system was a LibreELEC box that played movies, and the high-end audio system was plugged into it. But I'd rather have that sound when playing some YT video on my laptop: just select the media center sink and you have it.

It's really a shame there's nothing multiplatform just like that except for, maybe, JACK.


Yeah, I've used PulseAudio to play the same music on multiple computers and their speakers in multiple rooms, and it worked well enough for that. However, that won't solve the issue for the original poster, who wants their music to go to a "smart" speaker.


JACKv1 (JACK Audio Connection Kit) has working audio networking for both Windows and Linux.


There sort of is, RTP/RTSP, and in fact it's been around since the earliest pre-web days of the Internet.

The problem is that it's a protocol with a ton of warts -- having two connections, one UDP and one TCP, has been a massive headache for decades now. But it's not awful enough to get ripped up and redone.

The Asterisk VOIP platform had a really awesome protocol called IAX that was basically RTP with the two streams merged into a single UDP connection (and a bare-bones TCP-like reliability layer for the control frames inside of UDP). IAX was never meant for anything other than VOIP, but I wish it had been turned into a wholesale replacement for RTP. If that had happened, it would have been wonderful.


I was thinking about something like Dante, AES50, AES67, AVB, Ultranet, Ravenna or any other of those professional Audio over Ethernet standards out there.

Those are working and used in live venues and studios, the hardware used for those might however be out of scope in terms of price for the typical user and it certainly won't work with your smart speaker.

Does that smart speaker have line in (3.5 mm TRS)? If so you could just send your audio analog over the ethernet cable and build an ethernet to TRS adapter on the other end : ) For longer distances balanced line drivers might be needed, tho.

But shielded Ethernet cables work surprisingly well for analog audio purposes, especially if you send balanced signals through them. If you transformer-balance them you even get galvanic isolation for free.


There are already AVB and Dante in the pro audio world. You're not aware of them because you're probably not in the recording/audio/music business.

Bluetooth sucks because it was invented by a bunch of guys in suits and consumer electronics companies rather than people who understand latency, performance, etc. I designed my own protocol in the 2.4 GHz band and wrote firmware and middleware for it, and it deals with all the weaknesses of BT.

BT should have been designed by those who design the products and applications and deal first hand with end users.


BT was designed to be a general purpose peer to peer wireless communication protocol.

It was not designed to solely carry audio. It just sort of morphed into being primarily used as an audio exchange format (because it's "good enough"). A little bit like how USB morphed into a peripheral bus even though it was designed to be more all encompassing (USB Ethernet, for example). In fact, the USB protocol is somewhat mucked up by the fact that it was designed to be a network instead of a more direct connection.


Now it's actually used in this way with USB4/Thunderbolt 4.


Yes, general purpose: another expression for mediocre or garbage.

I think BT was first designed for exchanging photos, so mass storage transfer. It should have been designed for streaming latency-sensitive data like audio first, and then the "easier" scenarios could have been built on top of that.

At least with USB there was the common sense to include isochronous transfers, although OS drivers for them arrived relatively late and OS vendors ignored the standard for many years, requiring the purchase of analyzers.

In that regard there is a similarity with BT, but with USB it seems easier to come up with a solution as a firmware/driver/application developer, at least in my experience.


Pro stuff gives a glimpse of what could be: if we lived in a perfect world, SDI (and HD-SDI) would have been the de facto standard for video everywhere.


It's almost like including a BNC connector automatically rules a product out of consumer use, as if there's some ridiculous royalty payment owed or something. I love BNC over every other type of coax connection. Nothing in the consumer world makes as sure a connection.


Before entering the pro A/V industry I used to equate BNC with "ewww, old as dirt."

I then came to love the simplicity and reliability of SDI. Nowadays I work in uncompressed ST2110, and while there are many advantages of network based video and audio, paying $1,000 for a QSFP to handle just a few streams is a hard pill to swallow!


BNC == featureComplete


Bluetooth was for sure designed by committee, as no sane person would intertwine software protocols with wire protocols. But here we are, with an endless myriad of profile/protocol mixes, all doing essentially the same thing (moving bytes back and forth through the air) but with different levers for each.


USB suffers similarly but it’s not as bad IMHO


Dante is great but sadly it's proprietary. Low latency and allows you to replace a loom of analog cable with a single ethernet run.

There's Ravenna and AES67 (which I believe Dante supports), which are open standards but are not as common as Dante.


Dante supports AES67 in a degraded mode (multicast only, 1ms minimum latency, 48kHz only, at least if you're not using Dante Domain Manager).


> The good thing about standards is that there are so many to choose from.

― Andrew S. Tanenbaum

But really, RTP is the closest thing. Outside of the consumer space nearly everything is RTP plus some out-of-band signalling protocol. It's low latency, designed to be multicast, has RFCs for every codec under the sun, etc.


There is. You build a network of analog cables. Use a sound board to 'switch' the channels. This leads to headphones on 3.5mm jacks, and one or more zones of stereos connected by RCA. This 'network' is as solid in 2022 as it was in 1975.


Glad someone brought this up. Wire is always better than wireless. I wish the world would go back to everything being wired; it's more secure as well.


Clearly (given all the other responses), there are a bunch of different, conflicting requirements, which lead to different protocols.

As for your Bluetooth issues, PC Bluetooth is a mess.

Some of Bluetooth's messiness comes from having the higher-level elements of the stack designed 20+ years ago to operate on microcontrollers of that era. There are N different audio profiles because the hardware it was originally expected to run on would have been hard pressed to handle a single audio profile that could negotiate the full gamut of use cases.


Specifically for computers to smart speakers, I use AirPlay 1, but this works better from Windows with a 3rd-party app than from iOS or macOS: the 3rd-party app is perfectly happy to play to as many endpoints as I like, while Apple will only transmit to one endpoint at a time if it's an AirPlay 1 device.

From my Windows 10 PC, TuneBlade AirPlay streaming provides a great experience:

I can stream anything that is playing on the PC to any AirPlay device on my LAN, and all the playback devices will be in perfect audio sync.

AirPlay 1 devices on my network include an AppleTV, Apple HomePod Minis, Nexum Airplay receivers attached to powered speakers, and DAPs with Airplay reception.

There is a significant buffer delay—about 2 seconds—that messes with video streaming. TuneBlade has the ability to stream video to VLC with synced audio, but doesn't support other video streaming endpoints. There is a bufferless mode with no delay, but it doesn't work well on my network.


There is Airplay 1, which is the only widely supported protocol I'm aware of. See for example https://github.com/mikebrady/shairport-sync.

There is also DLNA, which is actually a standard. I think it's rarely supported for push audio streaming since the protocol is poorly specified.


Some people might say DLNA, but trust me you want absolutely nothing to do with that disaster of a protocol and tech. I have tried off and on for _15 years_ to use different DLNA tech and every single time it ends in total disappointment and failure.


I've got an external HDD with battery and its own small WiFi, it makes its contents available through DLNA. It works great, I usually connect through VLC or a gaming console.


I'm using DLNA at the moment to play music from my laptop (PulseAudio sink, Opus-encoded) to a Raspberry Pi (gmediastreamer) that uses PulseAudio to upmix to 5.1 and play on a USB sound card. It works, and the quality is good, but the lag is crap and I had to wrap everything in crappy scripts that fix everything if it dies. It's been in place for a year, but I'd love to ditch it.


There is, it's called AES67. It just isn't used much in consumer products. The acronym to google is AoIP ("audio over IP")


If you want to turn your home into a TV/radio station, have a look at Audio Video Bridging[1]. It requires special hardware, but once you're set up, devices can reserve bandwidth for their streams, which switches will prioritize over other Ethernet traffic, ensuring 100% reliability and sub-2 ms latency across 7 hops.

[1] https://en.m.wikipedia.org/wiki/Audio_Video_Bridging



Ha! Came to post this...I assumed I was the only one to remember it. I got it working when it was part of NCDWare for the NCD X terminals (mostly on the later 700-series terms). Worked, though the audio hardware on the terminals was basic, so it wasn't exactly an audiophile experience. Very clever work, tho.


I remember it from the times when you had ESD (Enlightenment Sound Daemon) running on Linux, and this in addition. At least that was the default on some Red Hat systems, IIRC?


I keep wanting JACK / netJACK[1] to be this. I've only actually gotten it to work once, though, and it required several hours of fiddling with configuration files and restarting daemons. Usually I run into a problem like...not being able to get certain applications on Windows to route audio through JACK at all.

So, maybe the answer to your question is that there are too many standards for sending audio over the network, which means none of them get enough polish to really become the standard, with support on multiple platforms, requisite drivers actually working, etc.

And this is why I string old-fashioned audio cables around my house.

[1] https://jackaudio.org/faq/netjack.html


If I were in the Windows ecosystem, I'd imagine there's some DLL injection that would make them work. I wonder if there's work going toward that.


Another question: why aren't my Bluetooth headphones better at buffering larger amounts of data? I should be able to load a complete song and play it without skipping, even with interference.


Because you might be unhappy if there were 30 seconds of latency on a Bluetooth voice call, and there would be a whole lot of overhead in an already complex protocol to enable buffered audio instead of live audio.


Imagine watching a movie with this. I believe apple actually does something like this, slightly delaying the video playback so the AirPods can buffer and the video stays in sync. But this only works if the video player and headphones can communicate.


In fact, Apple aren’t doing this alone. It’s a pretty common feature of video players. I’m pretty sure even VLC supports this.


Because that adds a massive amount of latency, something that is a no. 1 complaint for Bluetooth headphones.


This will change (hopefully) soon with Bluetooth LE audio!


The protocol doesn’t support that - it’s streaming audio.


Why wouldn't a streaming audio protocol allow for that?


I don’t know if you understand what ‘streaming’ means? Streaming doesn’t support large buffering… because that’s not streaming.

But more broadly, not everything can be in scope. At the time of design having 10 MB and a decompressor in earbuds wasn’t realistic.

But blaming your headphones is ignorant - the headphones implement a protocol. They don’t have control over the protocol.


The headphones and earbuds could easily and realistically incorporate a buffer today. How’s that being ignorant?


To be clearer:

Yes, the headphones could store up N seconds of audio data ahead of playback. However, the value of buffering is that if you miss a chunk of data, you can tell the sender "give me that again". Protocols that allow buffering account for that by giving the data sink a means to tell the source "send me chunk F again". Bluetooth A2DP and other streaming protocols, because they prioritize constant latency over data reliability, don't have a means to allow that; the source keeps sending new chunks even if the sink didn't receive one.

As a result, there would be no value in headphones storing up a bunch of audio before playback; if a chunk is missing, there are no means to remedy that in the protocol, so it will still be missing when you play it back.
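A sketch of the mechanism being described as missing, i.e. what the receive side would need for a big buffer to pay off (a generic illustration, not any real Bluetooth profile):

    class ReceiveBuffer:
        """Sequence-numbered jitter buffer with a retransmission hook."""
        def __init__(self, request_retransmit):
            self.chunks = {}                   # seq -> audio bytes
            self.next_to_play = 0
            self.request_retransmit = request_retransmit

        def on_chunk(self, seq, data):
            self.chunks[seq] = data            # chunks may arrive out of order

        def pop_for_playback(self):
            if self.next_to_play in self.chunks:
                data = self.chunks.pop(self.next_to_play)
                self.next_to_play += 1
                return data
            # Gap detected: with a retransmit channel we could recover;
            # without one (the A2DP-style case) we can only play silence.
            self.request_retransmit(self.next_to_play)
            return None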


> Bluetooth A2DP and other streaming protocols, because they prioritize constant latency over data reliability, don't have a means to allow that; the source keeps sending new chunks even if the sink didn't receive one.

That's true for the Bluetooth Headset profile (the low quality one you get on calls), but A2DP goes over Bluetooth ACL[0] which resends dropped frames.

A2DP headphones do have a short buffer on the order of a second or whatever to deal with the jitter from retransmissions (and devices like an iPhone have the logic to delay the display of video appropriately to keep the audio in sync).

Now none of this allows for a whole song to be buffered though.

[0] https://en.wikipedia.org/wiki/Asynchronous_Connection-Less


The protocol doesn’t support that. The headphones can do nothing about that.


In theory, headphones could store music in a buffer instead of playing it, and then delay playing it by say 2 minutes (or 5 seconds or whatever). Even if existing BT profiles preferred losing quality, you could have BT headsets that pretend to be storage devices and accept file uploads and which then play them after they've been completely received. Ideally though, you'd use one of the BT profiles that already provide guaranteed lossless audio transmission (or develop one if there's none). In a sense, BT profiles are protocols within a protocol, so you can develop almost anything you want (ofc, you need devices to support those profiles too).

Of course, the experience of clicking play on a song and having it only start a number of seconds later is not something that'd sell particularly well, I guess. And then you'd have to renegotiate the BT profile if a call comes in that has to happen live. And switching back to the song will have another big delay.


So the upload speed per song is real-time? Come on, this conversation has turned silly.


BT 3.0 offered up to 24 Mbps of bandwidth, with other variants offering up to 3 Mbps. CD-quality music is 1.4 Mbps. If you cannot come up with an error-correcting scheme that will let you upload music in real time with those parameters, what parameters would you need? (And sure, these rates are hard to achieve with BT in the real world because of varying distance and interference, and yes, CD quality is not the highest-quality encoding you can use, but you can achieve similar or better quality with less bandwidth too.)

And let's not forget this was a discussion of buffering. A buffer of 5 minutes (about 50 MiB) buys you 5 minutes of not having to be real-time, or of slowly lagging behind; if that covers 3 hours of continuous listening time, you've probably covered 99% of uses where latency is not a big deal anyway (like playing music; calls and movies are another game).
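The arithmetic, spelled out (the 3-hour figure above is the parent's assumption about battery life, not something derived here):

    bits_per_second = 44_100 * 16 * 2            # CD-quality PCM
    buffer_seconds  = 5 * 60
    buffer_bytes    = bits_per_second * buffer_seconds / 8
    print(f"{bits_per_second / 1e6:.2f} Mbps")   # 1.41 Mbps
    print(f"{buffer_bytes / 2**20:.0f} MiB")     # ~50 MiB for a 5-minute buffer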

I already acknowledged the practical UX problems with just relying on buffering, but it doesn't make much sense to say it can't be done because of the protocol either.


But the protocol just doesn't support sending audio faster than it's supposed to be played. The sender doesn't know what to send to do what you want. There's no mechanism for the headphones to do what you want.


Sure, the current protocol doesn't support it.

But wanting a better protocol isn't 'silly'.


Why can’t the headphones buffer the sound for a second? Why would it need protocol support? I’m thinking something like anti-disk-skipping on portable CD players.


If it's actually streaming, a buffer at the headphones wouldn't help anything, since any missed data would not get re-sent anyway (the sender wouldn't keep a buffer and would have no data to re-send) and would still cause a skip.


They already do keep a buffer for a second or so


Was a full song, now it’s a second?


I only suggested a buffer, not one of an entire song's length, so maybe you've mistaken me for someone else. What I'm trying to figure out is why we can't apply the same concept as the anti-skip technology to Bluetooth cutouts.


> I don’t know if you understand what ‘streaming’ means? Streaming doesn’t support large buffering… because that’s not streaming.

That's not how those words work.

Twitch is streaming, right? Under certain flaky playback conditions it can buffer a full minute. Which is 50 megabytes at full quality.


> But blaming your headphones is ignorant - the headphones implement a protocol. They don’t have control over the protocol.

If the headphones are implementing a protocol that isn't suitable for purpose, there is very good reason to blame the headphones. What's the point in having headphones if you need to be in a Faraday cage to use them?


If you buy Bluetooth headphones and complain they don’t buffer full songs then that’s your problem, not the headphones.

> What's the point in having headphones if you need to be in a Faraday cage to use them?

Surely it’s the opposite? They don’t work in a Faraday cage, because they’re streaming and need to be connected.


> If you buy Bluetooth headphones and complain they don’t buffer full songs then that’s your problem, not the headphones.

What is the use case for headphones that cut out every couple of seconds?

> Surely it’s the opposite? They don’t work in a Faraday cage, because they’re streaming and need to be connected.

In this case the broadcast source would be in the Faraday cage along with the listener.


This is a great question, one I've been playing with since getting some JBL PA speakers, seeing the Ethernet jacks, and looking into what they support. It's SO difficult to get something working without buying really locked-down and expensive gear meant for much larger applications. It's also pretty cumbersome to enable on my Ubiquiti router, but I'm getting there!

I ended up settling on CobraNet, buying some fairly affordable AudioScience GPIO rack units on Reverb from a defunct church. AudioScience doesn't really want to sell to me directly, but they sell to a lot of churches, and churches scale up and down a lot, so nice gear is really affordable used.

I wouldn't call it easy to use, but it's compatible, affordable, and pretty cool. I've been working on a 4.2 mini "ambisonic" setup... it's a long-term goal, like an old guy's basement train set. Heh.

http://www.audioscience.com/internet/products/cobranet/cobra...

I hope this helps and thanks for bringing this up to HN. I look forward to reading all the replies.


A HiFiBerry [1] plus a speaker. The HiFiBerry is a Raspberry Pi HAT. There are HATs for digital output, analog output, and one with amplified analog output that can drive a 4-ohm speaker quite respectably.

I have some old Cambridge Soundworks speakers outside that I drove from a HifiBerry Amp2 hat for a while. Not quite enough power for outside so I switched to a Hifiberry DAC Pro and a pretty capable amp off eBay. The sub to go with them uses a HifiBerry DAC too.

My main listening speakers are active digital speakers, with SPDIF inputs, and Hifi Berry Digi+.

HiFiBerry has an RPi image you can just stick on a card, though I now use DietPi as it survives power loss better. Once that's done you can connect over WiFi or wired, using AirPlay, Roon, and a bunch of other things I forget. Basically you can run any music server on it, because Linux. With AirPlay or Roon (and probably others) you can make all the speakers in the house play the same music, which is awesome for parties.

There's a bunch of manufacturers that make wifi/wired active speakers. Bluesound. KEF make some nice ones for $2000. Not portable, mind you.

[1] https://www.hifiberry.com


I haven't seen VBAN mentioned yet:

https://vb-audio.com/Services/support.htm#VBAN

https://vb-audio.com/Voicemeeter/vban.htm

It's a free, open protocol that runs over UDP and transmits PCM audio and MIDI.
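For a sense of how simple it is: a VBAN audio packet is a small fixed header followed by raw PCM in a UDP datagram. The sketch below reflects my reading of the published spec and is not authoritative; the sample-rate index, bit-format code, and default port should be double-checked against VB-Audio's header description before relying on them.

    import socket, struct

    def vban_packet(pcm_int16, frame_counter, stream_name="Stream1",
                    sr_index=3, channels=2):
        # 28-byte header as I understand it: 'VBAN' magic, sample-rate index,
        # samples-per-frame - 1, channels - 1, bit-format code, 16-byte
        # stream name, 32-bit frame counter. sr_index 3 is assumed to mean
        # 48 kHz and bit format 1 to mean 16-bit PCM.
        samples_per_frame = len(pcm_int16) // (2 * channels)
        header = (b"VBAN"
                  + bytes([sr_index, samples_per_frame - 1, channels - 1, 1])
                  + stream_name.encode().ljust(16, b"\x00")
                  + struct.pack("<I", frame_counter))
        return header + pcm_int16

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    silence = b"\x00\x00" * 2 * 256              # 256 stereo 16-bit frames
    sock.sendto(vban_packet(silence, frame_counter=0),
                ("192.0.2.20", 6980))            # 6980 is, I believe, the usual VBAN port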


RTSP/RTMP are not to your liking?


To pile on further, you may have better success getting a small device (like a Pi) and connecting its audio out to your speaker.


I'm using Amazon Alexa Echo Dots. I really wish they had a line-in connection, as it, too, would make life much easier when I want to play audio from a device.


I too found bluetooth to be unreliable.

For that reason, I have worked extensively with PulseAudio over the network. There is no UI that works for this. NTP is for some reason important, which seems like bad design to me. Zeroconf doesn't work at all.

Once you get it working... don't dare change anything. It will break in inexplicable ways that drive you up a wall.


As for making a Bluetooth alternative, I think Apple is the only company that can pull it off. But they will absolutely do it such that only their headphones work with Apple devices. And then they will license other manufacturers to use their tech to connect to Apple devices.

Otherwise, I see a huge opportunity for a consortium to develop a new hardware and software stack for high-quality, low-latency audio and bring it as a package to their products. I would love a completely wireless Dolby Atmos-like setup with no central receiver, with your mobile device itself being the AV receiver. New speakers from any manufacturer, in any form factor, could be added wherever you want as you buy them. Calibration according to your speaker placement would be wireless and automatic.


> Making a Bluetooth alternative, I think apple is the only company that can pull it off.

Microsoft did it with the Xbox Wireless protocol, which is used to transfer input from controllers and high-quality sound with very low latency.

But, yes. It only works on Xbox, or on Windows with an adapter, and you can count the manufacturers using it on one hand. Microsoft being the thumb.


I’ll link to https://news.ycombinator.com/item?id=29514876 here, it may have valuable insight.

I can use my HDMI ARC soundbar from my computer. We live in a backwards world.


What about HD Radio? A home-scale FM broadcast could accomplish this efficiently and cheaply. Each speaker would just need an FM receiver.

I guess the downside is that your neighbors could listen to whatever you're listening to, but who listens to terrestrial radio received OTA in their home anymore?

https://en.wikipedia.org/wiki/HD_Radio

https://www.amazon.com/Home-FM-Transmitter-Whole-House/dp/B0...


iBiquity (now owned by DTS) has never, to the best of my knowledge, open sourced their HDC codec, nor has it been reverse-engineered. To me that's a show-stopper towards any kind of widespread buy-in of HD Radio beyond commercial stations.

Also, authorities like the FCC take a dim view of FM broadcasting beyond miniscule power levels as seen in car radio adapters due to the easy potential for intentional or unintentional abuse. For example, a 5 Watt FM transmitter sold on eBay may have you thinking it will yield a small amount of power, but spitballing some numbers: outputting it through an FM band turnstile antenna atop a high building or hill could have an Effective Radiated Power in the 7 or 8 kW range, great enough to cover a small city in a round pattern.

Your proposed devices would therefore fall into that very low power range for certification, but some sort of clear-channel hopping would be required. That's fine in rural areas but quite difficult in large metropolitan areas.


The answer is DRM. In fact, almost any audio/video standardization attempt has to address the elephants in the room: Disney, Warner Media, Universal Music Group, etc., and they all require DRM.


Is there DRM added to bluetooth audio connections?


See SCMS-T DRM. But Bluetooth has been around since the 90s.[1]

1 https://en.wikipedia.org/wiki/Bluetooth#History


From my experience, no. The other comment mentions a standard but I have yet to see it in use anywhere.


Generally you don't see it, but then again Bluetooth became available to consumers in the late 90s, shortly before the big push to lock up all multimedia sharing.



If you don't like audio over ethernet do not even think about doing high quality video :-D

I hate to say it, but you are probably better off getting something other than Echo Dots for music. Too bad Google discontinued the Chromecast Audio; I love mine. The biggest plus for me is that once you have a compatible app (such as Spotify), streaming is done entirely on the Chromecast Audio over your internet connection, instead of continuing to use your phone's battery and WiFi.


I've run into this. I would like to use my Android phone as a speaker and microphone for my laptop, so I can walk around without leaving my call. For some reason this is impossible: Bluetooth supports it, of course, and so does PulseAudio on my laptop, but an Android phone will only act as the host, not the speaker/mic.

I found SnapCast which lets me send audio from laptop to phone (with huge latency) but not the other way (phone mic to laptop).


And while we’re on the subject, why is cell phone audio so horrible? It is worse than that delivered by the cast metal telephones with rotating dials of my youth.


It doesn't need to be. With VoLTE the sound quality is usually pretty crisp in my experience. It all depends on the carrying technology, bandwidth, compression parameters and codecs used. EVS supports up to 128kbps audio streams, which makes voice data come across crystal clear, and that's a technology from 8 years ago.

One problem is the fact that the codec needs to be negotiated, and if you're unlucky with codec compatibility, both callers fall back to crappy old codecs. Then there are tons of options for audio profile selection depending on requirements and bandwidth available (see https://en.wikipedia.org/wiki/Enhanced_Voice_Services for an overview), which makes it difficult to say what causes your specific problems.

Without VoLTE, you're falling back to 3G audio, probably AMR or AMR-WB, which is quite old and doesn't compress as well as modern standards.

Unless you mean the headphone profile for Bluetooth headsets: that's terrible because the standard is ancient, back when Bluetooth had even less capacity for low latency data transfer, and the codec is suboptimal making the situation even worse. There are better codecs out there, and some headsets will support what some call mSBC, which massively improves the audio quality (but not exactly to a HD audio stream because of limitations). There have been several proprietary attempts to fix this issue, but implementing those solutions costs money so many headphones ship without them.


Most likely because those landline phones transmitted via a copper cable while mobile phones send the audio via a heavily compressed and shared wireless connection that isn't exactly all that reliable.

Cabled connections are superior to wireless ones, even more so because traditional landlines had dedicated connections and as such had no need to compress anything.


Analog-only phones had great quality because they didn't sample voice. Once phone systems were changed to digital backbones, it became necessary to sample voices, and the sampling rates were chosen for efficiency with the tech of the time: usually 8 kHz sampling, which gives about 4 kHz of usable bandwidth. While there are better-quality standards today, many phone systems will fall back on the old ones.


Does your phone not support VoLTE? You might have to explicitly turn it on. Sounds great on my phone.


There is at least an attempt at WiFi audio I've seen in the wild, at IKEA of all places: https://www.ikea.com/au/en/cat/wi-fi-speakers-46194/

No idea of the broader ramifications of this, but any move away from the shitshow that is Bluetooth can only be a good thing.


The IKEA wi-fi speakers are running Sonos software internally. It's "yet another" wi-fi streaming platform - along with the offerings from Google, Apple, Spotify etc.


If you are just looking to move audio from one computer to another you can do it with netcat, no protocol needed:

https://www.audio-digital.net/a-pages/audio-over-netcat.html

I've tried this and it works.
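If you'd rather not depend on netcat, the same idea fits in a few lines of Python: read raw PCM on one box, write it to the playback device on the other. This is a sketch under the same assumptions as the netcat trick: both ends agree out-of-band on sample rate and format, /dev/dsp only exists if OSS (or its emulation, e.g. the snd-pcm-oss module) is enabled, and the receiver address is made up.

    # --- sender.py: stream raw PCM to the receiver over TCP ---
    import socket

    with socket.create_connection(("192.0.2.30", 9000)) as sock, \
         open("/dev/dsp", "rb") as src:            # or any raw PCM file
        while chunk := src.read(4096):
            sock.sendall(chunk)

    # --- receiver.py: accept one connection, write to the playback device ---
    import socket

    srv = socket.socket()
    srv.bind(("0.0.0.0", 9000))
    srv.listen(1)
    conn, _ = srv.accept()
    with open("/dev/dsp", "wb") as out:
        while chunk := conn.recv(4096):
            out.write(chunk)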



Yes, but PulseAudio requires loading a module, authentication, and configuration of several files. It's more versatile, I'm sure, but if you just want something that works quickly, using netcat to forward from /dev/dsp seems simpler on your own network not connected to the internet. Netcat has no fault tolerance (a temporary disconnection is likely to cause it to fail), but it's an easy way to get something working quickly, and for some people that might be good enough. Netcat may also run on more platforms than PulseAudio does, especially small embedded systems.


As someone who produces music with synthesizers: there's also no (well-working and established) standard for running/managing multiple audio sources over USB.

There are some valiant efforts (e.g. Overbridge by Elektron), but even those took a lot of effort, are proprietary and buggy.



Bluetooth has been in development for over 20 years now; you'd assume they would have solved all the problems by now. Meanwhile, consumer WiFi is faster than copper Ethernet.


FireWire was supposed to be an AV standard that allowed connecting anything to anything and completely eliminated RCA cables and the like.

And then the RIAA and MPAA discovered the plan and killed it good.


The PulseAudio protocol supports network audio.


In this thread:

https://xkcd.com/927/


At work: why do we have 9 different ways to identify a physical location? Because there are 5 different teams that need to do that, and our team hasn't gotten around to re-inventing the wheel like the other teams have.


It looks like it’s still under active development (on SourceForge!)

https://sourceforge.net/p/nas/activity/?page=0&limit=100#61f...


audio signal is enough. jackd is nice as an option.


You can't even send anything from Apple to non-Apple devices over Bluetooth. Why would you expect audio to work?


I don’t understand what you’re saying here.

I listen to my Apple devices on a knock-off add-on Bluetooth for my car with no issues. I’ve sent audio to a vast variety of non-Apple Bluetooth devices. In fact the only Apple-branded BT device I use are my AirPods.


Have you ever tried to send a file, a picture, or something over Bluetooth from Apple to Android?


No, not in the last decade. It's probably such a remote use case nowadays that it hasn't gotten any recent attention. If someone needs to forward me something like a picture or a PDF, they typically use email, WhatsApp, LINE, Signal, Dropbox, Google Drive, etc... I think it must have been 10 years since I had my phone's Bluetooth set to discoverable and anyone actually tried to beam me something. Bluetooth, to me, is nowadays just something to connect to my AirPods, but that's largely invisible, so I wouldn't notice if they changed to something other than Bluetooth.


Why is sending a file relevant in an audio protocol discussion?


ROC looks pretty good.


Dante? AVB? over tcp


There is, Dante.


It's a bit funny that there are already a bunch of comments that are stating "There already is, it's called 'X'", each with a different value for X.

I think this paints a better picture of the situation than any one person can provide.


>It's a bit funny that there are already a bunch of comments that are stating "There already is, it's called 'X'", each with a different value for X.

It's because the replies are interpreting the op's question differently from the intent.

When the OP asks "why isn't there a standard protocol?", he's asking "why isn't there a SingleDominantThatWorksOnEveryDevice audio protocol that lets me connect devices seamlessly?"

The OP's word "standard" is just doing a lot of heavy lifting to convey frustration with stuff not working intuitively.

The analogy is TCP/IP being a standard (SingleDominantThatWorksOnEveryDevice) network protocol that won out over Apple's AppleTalk, Novell's SPX/IPX, and Microsoft's LAN Manager NetBIOS.

But many replies interpreted "standard" as "any available existing specification regardless of marketshare or device availability" -- so that's where you get various examples of audio protocols that are idiosyncratic to particular domains and not analogous to the ubiquity and reliability of TCP/IP, e.g. the Dante audio protocol, which doesn't seem relevant to the OP's use case.

And what's the scope of an "audio protocol"? Is it a "media query of music files" protocol like DLNA? Or is it a "virtual hardware audio device endpoint" like Bluetooth Audio?


Yes, that's what I meant.

Why isn't there a widely interoperable audio-over-the-network transmission protocol I can use, so that when I am playing sound (from a song, a video, or a game), I can hear it on an external speaker? [The scope is just a 'virtual hardware audio device endpoint' like Bluetooth audio.]


As someone who works on the code for a competitor of Sonos, the answer is that it is hard to do, depending on your requirements.

> ...(from) a video, or a game

So then you need low latency, like less than 10ms? So that lip-sync works, and the game is playable?

Do you need it distributed across different endpoints, also with low latency?

Does it need to run using unreliable WiFi connections, and not kill all audio just because one endpoint is under-performing?

These are all hard, hard enough that doing it well (and keeping it proprietary) makes companies like Sonos big.

OTOH, streaming mp3 from one endpoint to another is trivial.


True enough. Somehow bluetooth audio manages these issues.

I for one would accept latency and the audio going silent (or better, an audio indicator) if the connection isn't up to snuff, but I don't know if other people would.


> Somehow bluetooth audio manages these issues.

It manages it... sometimes.

I have an NVIDIA Shield I use for my video needs, and attempting to pair my Sony WH-1000XM4 headphones with it results in crappy latency and out-of-sync audio. These are both high end products from respected companies, and they work together with pretty shitty results.

Edit: I just tried this again after writing that and magically things work much better than they did before... but I stick by the general point.

In general, I'd describe the Bluetooth experience as mediocre at best.


In true HN fashion, first-mover / market-leader Sonos isn't even mentioned yet.

The reason there isn't a standard (other than Sonos, or those discontinued Chromecast dongles) is that you need the following to work seamlessly:

- network attached DAC of some sort (in-speaker, or not; don't care)

- iOS app

- Android app

- the top 10 streaming services

- radio streaming directories, like TuneIn, or the open source ones.

- airplay

- Chromecast

- network / device auto discovery

- sound synchronization

- power management

- desktop apps

- NFS/cifs/etc bridge

- hdmi/fiber/??? bridge

- N.M surround sound (for N = 2, 3,5,7,9 and M=0,1,2)

- Some battery powered, waterproof speaker that works in direct sunlight on hot days

- Hardware distribution at places like IKEA, BestBuy, Amazon, etc.

- A healthy used hardware market

- 10+ year support lifetimes on the speakers + amps (note: discrete, cheap DAC dongles could disrupt sonos on this point)

And other things I forgot about.


> In true HN fashion, first-mover / market-leader Sonos isn't even mentioned yet.

I bought my first Sonos device last year, the Roam. Using it as a Bluetooth speaker is fine and I love the sound and portability, but oh boy do I hate the experience of trying to use Sonos services over WiFi.

Nine times out of ten, perhaps even more often, the iOS app says it can't connect and "let's fix it". If I go through the slow reconnection wizard it invariably ends up telling me to reboot my router(!?). I learned to either switch the Roam on/off a bunch of times, or kill and restart the app a bunch of times, before the app eventually decides yes, it can find the device ... only to then fail again when half an hour later I want to add something else to the queue or switch stations.


Interesting. My experience with the Sonos app has been a revelation in GOOD audio networking experiences. It just works. I download the app, connect to a Play:1/3/5 near me, and stream music, all in the space of about 2 minutes. Nothing else I've tried comes close to this experience.


I've had an (S1) Sonos for many years. That only happens to me if the speakers (or phone) are repeatedly falling off the WiFi network.

Try plugging into Ethernet, or placing it close to your router. If that fixes it, then you have a root cause.


The Roam has no Ethernet port. Holding a WiFi signal has never been a problem (if I set it playing something, I can't recall it ever having stopped).

Someone else has contacted me privately and recommended I switch off private MAC address on my phone. That seems to have improved things.


> In true HN fashion, first-mover / market-leader Sonos isn't even mentioned yet.

Market leader? No idea. First mover? Wrong. Slim Devices was the first mover in this space with the Squeezebox (subsequently purchased by Logitech). Sonos came shortly afterwards.


In true HN fashion, they ignore that people were pirating and streaming content to multiple rooms before a big company brand caught onto the idea and profited from it.

I have multi-room streaming using “dumb” speakers, and copper wire (for audio and network). I control one content box and aim it at different speakers from my phone, tablet, laptop. Siri Shortcuts decouple me from waiting for an MBA to approve adding voice commands.

I know; brave flex sticking with simple wire versus going wireless.


In the pro world there were Dante, Ravenna, and to a lesser extent AVB. People didn't like that nothing worked with anything else. The AES got the AoIP manufacturers together and standardized a union of these technologies, calling it AES67. Now most pro gear is compatible and it is in widespread use in (mostly) large audio installations (think stadiums/venues, broadcast, theme parks, etc.).

There's not much in the way of open source solutions for using it, though, and not many devices a consumer would want to buy support it.


> There's not much in way of open source solutions to using it

But there are some? Can an arbitrary Linux box or Raspberry Pi be fitted with free software to receive AES67 over Ethernet from commercial solutions, or is there a catch?


Ooooh something I know quite a lot about:

So for AES67 receive: in principle no, as no PTP stack exists for the RPi yet. You could cheat like the majority of manufacturers do and just play the audio as it arrives instead of using the timestamps. You'd also need a way of drifting the audio output clock to match the frequency of the PTP clock. If you didn't care about bit-exact audio, you could resample, though ALSA's clock measurement kind of sucks.
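A sketch of that resampling dodge: estimate how fast the local audio clock runs relative to the PTP clock over a window, then nudge the resampler by that ratio. Purely illustrative; a real implementation would use a proper clock servo/PLL rather than this naive one-shot ratio.

    # Naive clock-drift compensation: compare PTP elapsed time with the
    # time implied by how many samples the audio output actually consumed.
    def drift_ratio(ptp_elapsed_s, samples_consumed, nominal_rate=48_000):
        local_elapsed_s = samples_consumed / nominal_rate
        return ptp_elapsed_s / local_elapsed_s    # > 1.0 means the local clock runs slow

    # Example: after 10 s of PTP time the output only consumed 479,950 samples.
    ratio = drift_ratio(10.0, 479_950)
    corrected_rate = 48_000 * ratio               # feed this rate to the resampler
    print(f"ratio {ratio:.6f} -> resample to {corrected_rate:.1f} Hz")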


There's a kernel module for handling the networking connection and exposing it as an ALSA device: https://bitbucket.org/MergingTechnologies/ravenna-alsa-lkm/s..., and some FOSS stuff for managing the discovery/control layer. It's not as simple as plugging in a USB device and selecting your I/O, though.


That kernel module's userland part has an EULA that makes it very much non-free. Is it required, or do the FOSS alternatives work with the kernel module?


That's because "sending audio over a network" isn't a single self-contained problem but a huge area which requires lots of different approaches depending on the specific use case.


Obligatory xkcd: https://xkcd.com/927/



