There are many codecs that address this, such as G.711. Raw PCM is not the right approach.
Would an expert like to chime in? What are good (open) protocols for low-latency home streaming given occasional packet loss (<10%)?
I assume PCM was used because it is simple and could be done with existing Windows frameworks. That said, there are many approaches that could be taken to improve this implementation over the wire (or air):
- Applications like this are what CELT was designed for. You could certainly use straight CELT, but these days you're better off grabbing OPUS and configuring it for 48 kHz, essentially putting it in CELT mode (see the sketch after this list). I don't think standalone CELT libraries are maintained by anyone anymore anyway...
- While the 44.1 kHz sample rate is famous for its use in CD audio, it's kind of ridiculous for other applications. 48 kHz is a far better starting choice for many, many reasons. Even better, adjusting the rate on the fly depending on the source could be cool, but I bet at the end of the day everything ends up getting resampled to some fixed sample rate somewhere in the audio stack anyway. Audio in operating systems is actually quite interesting, what with the differing sample rates of sources, the capabilities of physical hardware, mixing, etc., etc.
- The packet format sounds goofy. Multicast RTP and other established formats and implementations already handle most of the issues the author is (rightfully) concerned about: packetization, MTU, and so on (the fixed RTP header is sketched below). NIH/DIY approaches like this are one of my pet peeves...
- Regarding WiFi and frame format, some compression should certainly be used. At 1/180 second of audio per frame, the packetization interval is about 5.5 ms, which produces a borderline insane flood (about 180 packets per second) of still relatively large packets. The author says 980 bytes of payload at the UDP layer, and that still doesn't include the IP, Ethernet, etc. frame overhead. For comparison, most realtime applications start with a packetization interval of about 20 ms, i.e. about 50 packets per second. Of course the encoded frame size combined with the packetization interval determines how much bandwidth is actually used (fun fact: with some codecs you can end up with far more bandwidth consumed by headers than by payload). Long story short, CELT with a standard RTP implementation running at 20 ms will use substantially less bandwidth (a rough back-of-envelope comparison follows below).
- OPUS incorporates SILK, which has a very robust FEC (forward error correction) and PLC (packet loss concealment) implementation. However, this only really applies at more-or-less speech sample rates, when OPUS is running in SILK or hybrid mode (AFAIK). Even then, very few implementations fully realize the dynamic packet loss adaptation SILK is capable of. It requires an exquisitely complex implementation: the decoder exports loss statistics in real time to the encoder, which then determines the exact amount of FEC parity (essentially) to include in the packets it sends; on the decoder side, the OPUS implementation needs access to the jitter buffer so it can reconstruct lost frames from the parity information in adjacent packets. In the end, most implementations just hardcode the expected packet loss to some fixed value (10%), wasting some bandwidth (and potentially exacerbating the loss) - see the FEC sketch below. Fixed values are usually "good enough", but you still need to peek the jitter buffer on decode, and that's tough enough for a variety of reasons. All of this also ignores asymmetric routing scenarios, where send and receive packet loss aren't necessarily equal, but that's a completely different rabbit hole and doesn't apply to "simple" scenarios like this one on a flat LAN.
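To make the CELT-mode suggestion concrete, here's a minimal sketch against the libopus API; the 128 kbit/s bitrate and the function names are my own assumptions, just a starting point:

    #include <opus/opus.h>

    #define FRAME_SIZE 960  /* 20 ms at 48 kHz, per channel */

    OpusEncoder *make_celt_encoder(void)
    {
        int err;
        /* RESTRICTED_LOWDELAY disables SILK entirely: CELT-only, lowest latency */
        OpusEncoder *enc = opus_encoder_create(
            48000, 2, OPUS_APPLICATION_RESTRICTED_LOWDELAY, &err);
        if (err != OPUS_OK)
            return NULL;
        opus_encoder_ctl(enc, OPUS_SET_BITRATE(128000)); /* assumed bitrate */
        return enc;
    }

    /* Encode one 20 ms interleaved-stereo frame; returns bytes written. */
    opus_int32 encode_frame(OpusEncoder *enc, const opus_int16 *pcm,
                            unsigned char *packet, opus_int32 max_bytes)
    {
        return opus_encode(enc, pcm, FRAME_SIZE, packet, max_bytes);
    }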
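On the packet format point, the RTP fixed header (RFC 3550) costs only 12 bytes per packet and buys you sequence numbers and timestamps that off-the-shelf tools already understand. Roughly:

    #include <stdint.h>

    /* RFC 3550 fixed RTP header: 12 bytes, all fields in network byte order */
    typedef struct {
        uint8_t  vpxcc;     /* version (2 bits), padding, extension, CSRC count */
        uint8_t  mpt;       /* marker bit + 7-bit payload type */
        uint16_t seq;       /* sequence number: lets you detect loss/reordering */
        uint32_t timestamp; /* media clock ticks (48 kHz here) for the jitter buffer */
        uint32_t ssrc;      /* randomly chosen stream identifier */
    } rtp_header;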
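And for the FEC point, the relevant libopus knobs look roughly like this (function names are mine; the encoder has to be in a SILK-capable mode such as OPUS_APPLICATION_VOIP, since as noted above inband FEC doesn't apply in CELT-only mode):

    #include <opus/opus.h>

    /* Encoder side: enable inband FEC with the typical hardcoded loss estimate */
    void enable_fec(OpusEncoder *enc)
    {
        opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1));
        opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(10)); /* the fixed 10% */
    }

    /* Decoder side: packet N was lost; if packet N+1 is in the jitter buffer,
       recover frame N from its FEC data, otherwise fall back to plain PLC. */
    void decode_lost_frame(OpusDecoder *dec, const unsigned char *next_pkt,
                           opus_int32 next_len, opus_int16 *pcm, int frame_size)
    {
        if (next_pkt)
            opus_decode(dec, next_pkt, next_len, pcm, frame_size, 1 /* decode_fec */);
        else
            opus_decode(dec, NULL, 0, pcm, frame_size, 0);
    }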
On paper, latency will be higher with this approach, but the quality will be substantially better and you won't be thrashing the hell out of your network infrastructure/spectrum/etc. Actually, I'd be interested to see a real-world comparison; it's quite likely that this ~180 pps approach, using a ton of bandwidth, has deleterious effects on latency, and the CELT+RTP+etc. approach may end up with better latency too in real-world scenarios.
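Back of the envelope, assuming the stream is 16-bit stereo at 44.1 kHz: that's 44100 × 2 × 2 = 176,400 bytes/s of payload, which split into 1/180 s frames is exactly the author's 980 bytes per packet. Add roughly 42 bytes of Ethernet+IP+UDP headers per packet at 180 pps and you're around 1.47 Mbit/s before any 802.11 overhead. OPUS/CELT at an assumed 128 kbit/s in 20 ms packets is 50 pps at about 320 payload bytes each, or roughly 145 kbit/s all in - about a tenth of the bandwidth at a quarter of the packet rate.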
Can you list a few? Honest question.
- While 44.1 kHz can theoretically sample the entire range of human hearing, there are some practical considerations that give 48 kHz an advantage. That said, there are few reasons to use sample rates above 48 kHz:
- Most importantly, 48 kHz is essentially "the standard" across most operating systems, hardware, hardware drivers, audio frameworks, codecs, audio applications, audio editors, etc. For example, 48 kHz is the only sample rate the Intel High Definition Audio Specification requires.
Section 2.2 also notes that the timing for the HDA stream format is based on a 48 kHz sample rate; other sample rates require some gymnastics (section 5.4.1) to implement.
- Perhaps not surprisingly to the HN crowd, there can be various issues with other sample rates and resampling across different hardware, operating systems, etc.
In 2018 we should be faithfully playing, recording, and transporting audio at 48 kHz. Legacy applications such as the PSTN, CDs, etc. will continue to run at their various rates, but I can't think of many valid reasons for new applications to use anything other than 48 kHz.
I have separate monitors but use Symless Synergy to share the mouse and keyboard.
I've added it as a scheduled task, set to start at user login, so I don't have to mess with installing it as a service. I'm not going to listen without logging in to my desktop anyway, so for me it's not a huge restriction.
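For anyone wanting to do the same, a one-liner along these lines should do it (task name and path are placeholders for wherever your receiver executable lives):

    schtasks /Create /SC ONLOGON /TN "ScreamReceiver" /TR "C:\Tools\ScreamReader.exe"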
I've been wary of stuff like Sonos due to the cost and the proprietary nature of the tech, but this Scream driver seems like a nice, simple alternative.
For a one-speaker-per-room setup, it would probably be 'good enough'.
Needs investigation, I think.
Driver signing enforcement enhances the security of the OS by preventing malware from installing kernel-mode drivers, which would otherwise have essentially unlimited access to the machine, without the consent or knowledge of the user.
There really is no downside to this.
This is straight-up untrue. You disable it to install the driver. It's fine after that.
If you're an amateur/hobbyist/tinkerer and just want to play with kernel driver development, then you can disable the signing enforcement.
Anyway I'm sorry for derailing from the actual point of this post - it's a very cool project!
It also means that we won't get some nice things we would otherwise have. See also the iSCSI initiator or FUSE for macOS: they have the same problem - they need to be signed with the right certificate, so nobody bothered.
Other systems, when running in Secure Boot mode, also accept kernel modules signed by the same keys that are enrolled into UEFI. Why can't Windows and macOS do the same?
Which ones? Linux certainly isn't made by hobbyists today. It used to be, but that was a very long time ago.
Because of corporate control.
I wasn't aware that FUSE for Mac is signed nowadays. When I checked it a few years ago, it was an issue.
The iSCSI initiator for Mac still has the same issue. The code is on GitHub; making it work is the user's problem.
Yeah, cool project, but if you read the issues on GitHub, you might realize that at some point nobody will be able to use it hassle-free, once the author stops renewing the code-signing certificate. And for a hobby project, I wouldn't blame him. Some people can't use it hassle-free now because of signing issues.
I certainly can't install it on my gf's notebook so that she can play audio over my speakers, because it would stop working after each reboot.