You are absolutely right. I have used an incorrect source. The search was for forward error-correction in modern protocols... G.711 has neither property. SMPTE 2022-5/7 have such provisions, but I am not an expert in these protocols.
Would an expert like to chime in? What are good (open) protocols to do home-streaming with low latency given occasional packet-loss (< 10%)?
Well, I know for sure that PCM is not the right format for this kind of application. However, I couldn't remember the abbreviations for the protocols anymore.
Let me start by saying this is a cool tool and I'm glad the author created it and released it. I wouldn't know the first thing about making such a beast. That said there is quite a bit they could learn from decades of progress in this field.
I assume PCM was used because it is simple and could be done with existing Windows frameworks. That said there are many approaches that could be taken to improve this implementation over the wire (or air):
- Applications like this are what CELT was designed for. You could certainly use straight CELT but these days you're better off grabbing OPUS and configuring it for 48 kHz, essentially putting it in CELT mode. I don't think standalone CELT libraries are still maintained by anyone anyway...
- While the 44.1 kHz sample rate is famous for use in CD audio it's kind of ridiculous for other applications. 48 kHz is a far better starting choice for many, many reasons. Even better adjusting the rate on the fly depending on the source could be cool but I bet at the end of the day everything ends up getting resampled to some fixed sample rate somewhere in the audio stack. Audio in operating systems is actually quite interesting, what with the different sample rates of sources, abilities of physical hardware, mixing, etc, etc.
- The packet format sounds goofy. Multicast RTP and other formats and implementations are a thing and they handle most of the issues the author is (rightfully) concerned about - packetization, MTU, etc. NIH DIY approaches such as this are one of my pet peeves...
- Regarding Wifi and frame format, some compression should certainly be used. The 1/180 second of audio per frame comes to a packetization interval of about 5.6 ms, which would cause a borderline insane flood (180 packets per second) of still relatively large packets. The author says 980 bytes at UDP, and that still doesn't include the IP, Ethernet, etc frame overhead. For comparison purposes most realtime applications start with a packetization interval of about 20 ms, resulting in about 50 packets per second. Of course the encoded frame size combined with the packetization interval determines how much bandwidth is actually used (fun fact, with some codecs you can end up with far more bandwidth used by headers than actual payload). Long story short, CELT with a standard RTP implementation running at 20 ms will utilize substantially less bandwidth (too lazy to calculate it out).
- OPUS incorporates SILK, which has a very robust FEC (forward error correction) and PLC (packet loss concealment) implementation. However, this only really applies at more-or-less speech sample rates when OPUS is running in SILK or hybrid modes (AFAIK). Even then very, very few implementations fully realize the dynamic packet loss adaptation that SILK is capable of. It requires an exquisitely complex implementation where the decoder exports loss statistics in real time to the encoder, which then determines the exact amount of FEC parity (essentially) to include in encoded packets to send. Then on the decoder side, the OPUS implementation needs to be able to access the jitter buffer to reconstruct lost frames from the parity information in adjacent packets. In the end most implementations just hardcode packet loss to some fixed value (10%), wasting some bandwidth (and potentially exacerbating packet loss). Fixed values are usually "good enough" but you still need to peek the jitter buffer on decode, and that's tough enough for a variety of reasons. Of course all of this also ignores asymmetric routing scenarios where send and receive packet loss potentially isn't equal, but that's a completely different rabbit hole and doesn't apply to "simple" scenarios like this where usage is on a simple, flat LAN.
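To make the FEC point a bit more concrete, here is a toy model in Python. It is not the Opus API and not how SILK actually encodes redundancy; it only assumes that each packet carries its own frame plus a reduced-quality copy of the previous frame, which is loosely the idea behind in-band FEC. The function name `playable_frames` and the loss pattern are made up for illustration.

```python
def playable_frames(arrived, redundancy):
    """Return, per frame, whether the receiver can play it.

    arrived[i] is True if packet i reached the receiver.
    Toy assumption: packet i carries frame i and, when redundancy is on,
    a lower-quality copy of frame i-1.
    """
    playable = []
    for i, got in enumerate(arrived):
        # A lost frame can only be rebuilt from the NEXT packet, so the
        # decoder has to look ahead into the jitter buffer.
        next_got = i + 1 < len(arrived) and arrived[i + 1]
        playable.append(got or (redundancy and next_got))
    return playable


# Hypothetical loss pattern: packets 1, 3 and 4 are lost.
arrived = [True, False, True, False, False, True]

without_fec = playable_frames(arrived, redundancy=False)
with_fec = playable_frames(arrived, redundancy=True)

print("without FEC:", without_fec.count(True), "of", len(arrived))
print("with FEC:   ", with_fec.count(True), "of", len(arrived))
```

In this model a single lost packet is recovered, but a burst of two or more still leaves a gap, since the copy of the first lost frame went missing along with the second packet. It also shows why the decoder needs jitter buffer access: frame i can only be rebuilt once packet i+1 is in hand.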
On paper latency will be higher with this approach but the quality will be substantially better and you won't be thrashing the hell out of your network infrastructure/spectrum/etc. Actually I'd be interested to see a real-world comparison. It's quite likely that this roughly 200 pps approach using a ton of bandwidth actually has deleterious effects on latency, and the CELT+RTP+etc approach may actually end up with better latency too in real world scenarios.
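For anyone who wants the bandwidth comparison worked out, here is a back-of-the-envelope sketch in Python. The assumptions are mine: 16-bit stereo PCM (which is what makes 1/180 s at 44.1 kHz come out to 980 bytes), minimum header sizes with no VLAN tags or IP options, no Wi-Fi framing overhead, and an Opus bitrate of 128 kbit/s picked purely as an example. `stream_stats` is a hypothetical helper.

```python
# Minimum header sizes in bytes (Ethernet preamble and inter-frame gap ignored).
ETH_HEADER = 18    # 14-byte header + 4-byte frame check sequence
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def stream_stats(payload_bytes, packets_per_second, rtp=False):
    """Return packet rate, per-packet overhead and on-the-wire kbit/s."""
    overhead = ETH_HEADER + IPV4_HEADER + UDP_HEADER + (RTP_HEADER if rtp else 0)
    wire_bytes = payload_bytes + overhead
    return {
        "pps": packets_per_second,
        "overhead_bytes": overhead,
        "kbps": wire_bytes * packets_per_second * 8 / 1000,
    }


# Raw PCM: 44.1 kHz, 2 channels, 2 bytes per sample, 1/180 s per packet.
pcm_payload = 44100 * 2 * 2 // 180           # 980 bytes
pcm = stream_stats(pcm_payload, 180)

# Assumed Opus stream: 128 kbit/s, 20 ms frames (50 per second), over RTP.
opus_bitrate = 128_000
opus_payload = opus_bitrate // 8 // 50       # 320 bytes per frame
opus = stream_stats(opus_payload, 50, rtp=True)

print(f"PCM : {pcm['pps']} pps, {pcm['kbps']:.1f} kbit/s on the wire")
print(f"Opus: {opus['pps']} pps, {opus['kbps']:.1f} kbit/s on the wire")
```

Under these assumptions the PCM stream needs roughly ten times the bandwidth and well over three times the packet rate. The exact ratio depends entirely on the bitrate chosen for the codec.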
>While the 44.1 kHz sample rate is famous for use in CD audio it's kind of ridiculous for other applications. 48 kHz is a far better starting choice for many, many reasons.
Not the author but I like 48 kHz because it's easy to generate on most systems, and with a 24 kHz Nyquist, if you set your low-pass 3 dB point to 24 kHz you have very little attenuation from DC to 22 kHz.
- 48 kHz can be more easily (and more importantly - accurately) resampled to other common sample rates (which tend to be round multiples of 8).
- While 44.1 kHz can theoretically sample the entire range of human hearing there are some practical considerations that give 48 kHz an advantage. That said, there are few reasons to use sample rates above 48 kHz:
- Most importantly 48 kHz is essentially "the standard" on most operating systems, hardware, hardware drivers, various audio frameworks, codecs, audio applications, audio editors, etc, etc. For example, 48 kHz is the only required sample rate in the Intel High Definition Audio Specification (section 7.3.4.7):
Section 2.2 also describes that the timing for the HDA stream format is based on 48 kHz sample rates and other sample rates require some gymnastics (section 5.4.1) to implement.
- Perhaps not surprising to the HN crowd there can be various issues with other sample rates and resampling across different hardware, operating systems, etc:
In 2018 we should be faithfully playing, recording, and transporting audio at 48 kHz. Legacy applications such as the PSTN, CDs, etc will continue to run at their various rates but I can't think of many valid reasons for new applications to use anything other than 48 kHz.
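The resampling point above is easy to check with a few lines of Python. This only computes the integer up/down factors a rational resampler would need; it does not resample anything, and `resample_ratio` is a name made up for this sketch.

```python
from fractions import Fraction


def resample_ratio(src_hz, dst_hz):
    """Return (up, down) integer factors for converting src_hz to dst_hz."""
    ratio = Fraction(dst_hz, src_hz)   # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator


for src in (48000, 44100):
    for dst in (8000, 16000, 32000):
        up, down = resample_ratio(src, dst)
        print(f"{src} -> {dst}: upsample by {up}, downsample by {down}")

# Converting between the two rates themselves is the awkward case.
print("44100 -> 48000:", resample_ratio(44100, 48000))
```

From 48 kHz the factors stay tiny (1/6, 1/3, 2/3), while from 44.1 kHz every target needs a denominator of 441. That generally means longer filters or more intermediate steps for the same accuracy.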
If you're working digitally, you won't introduce any noise. There is the possibility of errors in sounds between 22.05 kHz and 24 kHz, but they're easily fixed with a low-pass filter at 22 kHz before resampling.
While there are many codecs that can somehow cope with noisy and lossy channels, G.711 (i.e. what is known in the telco world as "The PCM") is not one of them.
Personally I've used PulseAudio to send audio over the network from my Linux NUC to my Windows Desktop, so I could watch videos and similar on it without using noisy analog audio or getting a new soundcard with optical input. This seems like it could fill a similar role except in reverse.
I have separate monitors but use Symless Synergy to share the mouse and keyboard.
I just used the Cygwin version, worked straight up.
I've added it as a scheduled task, set to start on user login, so I don't have to mess with installing it as a service. I'm not going to listen without logging in to my desktop anyway so for me not a huge restriction.
With this you can cheaply replicate a Sonos-style whole-home networked music system, by having the receivers be something like some Raspberry Pi Zeros wired up to some amplified speakers.
I've been wary of stuff like Sonos due to the cost and the proprietary nature of the tech, but this Scream driver seems to be a nice simple alternative.
Well, if the receivers are always-on, and listening for the UDP packets from the Windows PC, then surely any lag causing de-syncs would be minimal - in the millisecond range.
For a one-speaker-per-room setup, it would probably be 'good enough'.
I think you'll be surprised how quickly it drives you nuts (if there's ever the situation where you are hearing another speaker). There's also the rather brilliant https://github.com/badaix/snapcast if you do want sonos-like synchronised playback.
Definitely stay away from Sonos. I bought a pair of cheap Ones a month ago and next thing I knew I had four Ones, four Play:1s, two Beams, and a Connect.
You can disable driver signing enforcement if you want to install drivers without having them signed.
Driver signing enforcement enhances the security of the OS by preventing malware from installing kernel mode drivers that would hypothetically have unlimited access to the device without the consent or knowledge of the user.
AFAIK you cannot just 'disable' it. You have to disable it every time you boot the computer, and Secure Boot has to be disabled. I would agree with you if Microsoft's signing process was 21st century and anyone could do it through proper channels, but last time I checked it's archaic and targetable by anyone wanting to make a virus anyway. https://news.ycombinator.com/item?id=17195758
If you're a professional or part of a professional organization, this is a small hurdle to clear and most would agree that the increased security posture for the end user is worth the inconvenience/price for developers.
If you're an amateur/hobbyist/tinkerer and just want to play with kernel driver development, then you can disable the signing enforcement.
Anyway I'm sorry for derailing from the actual point of this post - it's a very cool project!
Professional organization vs hobbyist is a false dichotomy. We have entire operating systems that would be classified as made by hobbyists, yet are used by "serious organizations".
It also means that we will not get some nice things we would otherwise have. See also iSCSI initiator or FUSE for macOS. They have the same problem: they need to be signed by the right certificate, so nobody bothered.
Other systems, when they are running in Secure Boot mode, also accept kernel modules signed by the same keys that are enrolled into UEFI. Why can't Windows and Mac do the same?
FUSE for macOS is signed by its developer, so I'm not sure what you mean by "not get some nice things"? Or did you mean for Windows? (Both Windows and macOS require paid developer certificates to load kernel drivers.)
> Anyway I'm sorry for derailing from the actual point of this post - it's a very cool project!
Yeah, cool project, yet if you read the issues on GitHub, you might realize that at some point no one will be able to use it hassle-free, once the author stops renewing the code signing certificate. And for a hobby project, I wouldn't blame him. Some people can't use it hassle-free now, because of signing issues.
I certainly can't install it on my gf's notebook, so that she can play audio over my speakers, because it would stop working after each reboot.
That's not how it works at all--I run unsigned drivers on some of my machines all the time because occasionally user-hacked drivers work better with this or that video capture device. You have to reboot to install an unsigned driver; it works just fine afterwards.