Scream – virtual network sound card for Windows (github.com/duncanthrax)
106 points by ivank on Aug 5, 2018 | 52 comments



Proprietary, but there’s also Dante Virtual Soundcard.


This approach has a rather severe disadvantage: packet loss. Especially in crowded WiFi networks the performance is going to be dismal.

There are many codecs that address this such as G.711 codec. PCM is not the right approach.


To be precise, G.711 is PCM but with a much lower bitrate (and correspondingly lower quality.)


You are absolutely right. I used an incorrect source. The search was for forward error correction in modern protocols... G.711 has neither property. SMPTE 2022-5/7 have such provisions, but I am not an expert in these protocols.

Would an expert like to chime in? What are good (open) protocols to do home-streaming with low latency given occasional packet-loss (< 10%)?


You sounded quite expert in your first post.


Well, I know for sure that PCM is not the right format for this kind of application. However, I couldn't remember the abbreviation for the protocols anymore.


Let me start by saying this is a cool tool and I'm glad the author created it and released it. I wouldn't know the first thing about making such a beast. That said there is quite a bit they could learn from decades of progress in this field.

I assume PCM was used because it is simple and could be done with existing Windows frameworks. That said there are many approaches that could be taken to improve this implementation over the wire (or air):

- Applications like this are what CELT was designed for. You could certainly use straight CELT but these days you're better off grabbing OPUS and configuring it for 48 kHz, essentially putting it in CELT mode. I don't think standalone CELT libraries are still maintained by anyone anyway...

- While the 44.1 kHz sample rate is famous for use in CD audio, it's kind of ridiculous for other applications. 48 kHz is a far better starting choice for many, many reasons. Even better, adjusting the rate on the fly depending on the source could be cool, but I bet at the end of the day everything ends up getting resampled to some fixed sample rate somewhere in the audio stack. Audio in operating systems is actually quite interesting, what with the different sample rates of sources, abilities of physical hardware, mixing, etc.

- The packet format sounds goofy. Multicast RTP and other formats and implementations are a thing and they handle most of the issues the author is (rightfully) concerned about - packetization, MTU, etc. NIH DIY approaches such as this are one of my pet peeves...

- Regarding WiFi and frame format, some compression should certainly be used. The 1/180 second of audio per frame comes to a packetization interval of about 5.6 ms, which would cause a borderline insane flood (about 180 packets per second) of still relatively large packets. The author says 980 bytes at UDP, which still doesn't include the IP, Ethernet, etc. frame overhead. For comparison purposes, most realtime applications start with a packetization interval of about 20 ms, resulting in about 50 packets per second. Of course the encoded frame size combined with the packetization interval determines how much bandwidth is actually used (fun fact: with some codecs you can end up with far more bandwidth used by headers than actual payload). Long story short, CELT with a standard RTP implementation running at 20 ms will utilize substantially less bandwidth (too lazy to calculate it out).

- OPUS incorporates SILK, which has a very robust FEC (forward error correction) and PLC (packet loss concealment) implementation. However, this only really applies at more-or-less speech sample rates when OPUS is running in SILK or hybrid modes (AFAIK). Even then very, very few implementations fully realize the dynamic packet loss adaptation that SILK is capable of. It requires an exquisitely complex implementation where the decoder exports loss statistics in real time to the encoder, which then determines the exact amount of FEC parity (essentially) to include in encoded packets to send. Then on the decoder side, the OPUS implementation needs to be able to access the jitter buffer to reconstruct lost frames from the parity information in adjacent packets. In the end most implementations just hardcode packet loss to some fixed value (10%), wasting some bandwidth (and potentially exacerbating packet loss). Fixed values are usually "good enough" but you still need to peek the jitter buffer on decode, and that's tough enough for a variety of reasons. Of course all of this also ignores asymmetric routing scenarios where send and receive packet loss potentially isn't equal, but that's a completely different rabbit hole and doesn't apply to "simple" scenarios like this where usage is on a simple, flat LAN.

On paper latency will be higher with this approach, but the quality will be substantially better and you won't be thrashing the hell out of your network infrastructure/spectrum/etc. Actually I'd be interested to see a real-world comparison; it's quite likely that this roughly 200 pps approach using a ton of bandwidth actually has deleterious effects on latency, and the CELT+RTP+etc approach may actually end up with better latency too in real-world scenarios.
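
For the calculation left out above, here is a rough sketch of the arithmetic. It assumes the 980 bytes are the UDP payload, that the stream is 16-bit stereo at 44.1 kHz (which is what makes 1/180 second come to 980 bytes), and, purely for illustration, a hypothetical 128 kbit/s codec stream sent as RTP every 20 ms. Header sizes are the usual IPv4 (20), UDP (8), RTP (12) and Ethernet (14 + 4 FCS) figures; WiFi framing differs, so treat the totals as ballpark only.

```python
from dataclasses import dataclass

# Standard header sizes in bytes (no IP options, no VLAN tag, preamble ignored).
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
ETHERNET_OVERHEAD = 14 + 4  # header + frame check sequence


@dataclass(frozen=True)
class Stream:
    name: str
    payload_bytes: int          # audio bytes carried in each packet
    packets_per_second: int
    extra_header_bytes: int = 0  # e.g. RTP on top of UDP

    @property
    def interval_ms(self) -> float:
        """Packetization interval in milliseconds."""
        return 1000 / self.packets_per_second

    @property
    def frame_bytes(self) -> int:
        """Bytes on the wire per packet, including headers."""
        return (self.payload_bytes + self.extra_header_bytes
                + UDP_HEADER + IPV4_HEADER + ETHERNET_OVERHEAD)

    @property
    def wire_bits_per_second(self) -> int:
        return self.frame_bytes * 8 * self.packets_per_second


# Assumption: 44.1 kHz, 2 channels, 2 bytes per sample, 1/180 s per packet.
PCM_BYTES_PER_SECOND = 44_100 * 2 * 2
raw_pcm = Stream("raw PCM", PCM_BYTES_PER_SECOND // 180, 180)

# Assumption: hypothetical 128 kbit/s codec stream, 20 ms packets (50 pps), RTP.
CODEC_BITRATE = 128_000
codec_rtp = Stream("codec over RTP", CODEC_BITRATE // 8 // 50, 50, RTP_HEADER)

for s in (raw_pcm, codec_rtp):
    print(f"{s.name}: {s.packets_per_second} pps, {s.interval_ms:.2f} ms interval, "
          f"{s.frame_bytes} B/frame, {s.wire_bits_per_second / 1e6:.2f} Mbit/s on the wire")
```

Under these assumptions the raw PCM stream comes out near 1.5 Mbit/s at 180 packets per second, and the hypothetical codec stream near 0.15 Mbit/s at 50 packets per second. The codec bitrate is the knob that matters most here, so plug in whatever rate you would actually use.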


>While the 44.1 kHz sample rate is famous for use in CD audio it's kind of ridiculous for other applications. 48 kHz is a far better starting choice for many, many reasons.

Can you list a few? Honest question.


Not the author, but I like 48 kHz because it's easy to generate on most systems, and with a 24 kHz Nyquist frequency, if you set your low-pass 3 dB point to 24 kHz you have very little attenuation from DC to 22 kHz.


48 kHz does give you a stream that's easy to upsample to 96 kHz or 192 kHz. IMO the OP is overstating the case, but it's nice to have.


- 48 kHz can be more easily (and more importantly - accurately) resampled to other common sample rates (which tend to be round multiples of 8).

- While 44.1 kHz can theoretically sample the entire range of human hearing there are some practical considerations that give 48 kHz an advantage. That said, there are few reasons to use sample rates above 48 kHz:

https://xiph.org/~xiphmont/demo/neil-young.html

- Most importantly 48 kHz is essentially "the standard" on most operating systems, hardware, hardware drivers, various audio frameworks, codecs, audio applications, audio editors, etc, etc. For example, 48 kHz is the only required sample rate in the Intel High Definition Audio Specification (section 7.3.4.7):

https://www.intel.com/content/dam/www/public/us/en/documents...

Section 2.2 also describes that the timing for the HDA stream format is based on 48 kHz sample rates and other sample rates require some gymnastics (section 5.4.1) to implement.

- Perhaps not surprising to the HN crowd there can be various issues with other sample rates and resampling across different hardware, operating systems, etc:

https://bugs.webkit.org/show_bug.cgi?id=154538

https://productforums.google.com/forum/#!topic/chromecast/k6...

https://bugs.maemo.org/show_bug.cgi?id=5794

https://groups.google.com/forum/#!topic/discuss-webrtc/EGvSb...

https://bugs.chromium.org/p/chromium/issues/detail?id=86767

In 2018 we should be faithfully playing, recording, and transporting audio at 48 kHz. Legacy applications such as the PSTN, CDs, etc will continue to run at their various rates but I can't think of many valid reasons for new applications to use anything other than 48 kHz.
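
To make the resampling point a bit more concrete, here is a small sketch that reduces each conversion to a rational ratio (upsample by L, then downsample by M). The list of target rates is my own pick for illustration, and "smaller factors are easier" is a rule of thumb about rational resamplers, not a measurement of any particular implementation.

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Return dst/src in lowest terms: numerator = upsample factor L,
    denominator = downsample factor M."""
    return Fraction(dst_hz, src_hz)


# Illustrative target rates (an assumption, not an exhaustive list).
COMMON_RATES = [8_000, 16_000, 24_000, 32_000, 96_000, 192_000]

for src in (48_000, 44_100):
    for dst in COMMON_RATES:
        r = resample_ratio(src, dst)
        print(f"{src} -> {dst}: up by {r.numerator}, down by {r.denominator}")

# The conversion between the two rates under discussion:
r = resample_ratio(44_100, 48_000)
print(f"44100 -> 48000: up by {r.numerator}, down by {r.denominator}")
```

From 48 kHz the factors stay tiny (for example 1/6 to reach 8 kHz), while from 44.1 kHz they involve 147 or 441 in the denominator, which is where the "gymnastics" come from.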


Thanks for the informative response. (You, too, Justin & Chuck.) Much appreciated.


What if the final sample rate of whatever you're recording will be 44.1kHz? How much is audio affected by resampling 48kHz to 44.1kHz?


If you're working digitally, you won't introduce any noise. There is the possibility of errors in sounds between 22.05 kHz and 24 kHz, but they're easily fixed with a low-pass filter at 22 kHz before resampling.


While there are many codecs that can somehow cope with noisy and lossy channels, G.711 (i.e. what is known in the telco world as "The PCM") is not one of them.


Why G.711? G.722 is much better at the same 64 kbit/s.


So what do you use this for? Playing audio back in a large room like a restaurant?


Personally I've used PulseAudio to send audio over the network from my Linux NUC to my Windows Desktop, so I could watch videos and similar on it without using noisy analog audio or get a new soundcard with optical input. This seems like it could fill a similar role except in reverse.

I have separate monitors but use Symless Synergy to share the mouse and keyboard.


This, but in reverse. My soundcard was on a Linux host and I used scream from a Windows VM. Worked beautifully.


PulseAudio works on Windows? Last time I tried installing it I didn't get very far.


I just used the Cygwin version, worked straight up.

I've added it as a scheduled task, set to start on user login, so I don't have to mess with installing it as a service. I'm not going to listen without logging in to my desktop anyway so for me not a huge restriction.


With this you can cheaply replicate a Sonos-style whole-home networked music system by having the receivers be something like Raspberry Pi Zeros wired up to some amplified speakers.

I've been wary of stuff like Sonos due to the cost and the proprietary nature of the tech, but this Scream driver seems to be a nice simple alternative.


I don't have a Sonos, but isn't one of the selling points the fact that clients are kept in sync? Does this actually do that? Shairport-sync does.


Well, if the receivers are always-on, and listening for the UDP packets from the Windows PC, then surely any lag causing de-syncs would be minimal - in the millisecond range.

For a one-speaker-per-room setup, it would probably be 'good enough'.

Needs investigation I think.
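
As a starting point for that investigation, here is a minimal sketch of an always-on receiver that joins a multicast group and hands each UDP payload to a callback. The group address and port below are assumed values (check the project's README for the real ones), `handle_pcm` is a hypothetical callback you would supply, and nothing here does any clock synchronization between rooms.

```python
import socket

# Assumed values for illustration; the thread does not state them.
GROUP = "239.255.77.77"
PORT = 4010


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte ip_mreq structure expected by IP_ADD_MEMBERSHIP:
    multicast group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket bound to `port` and join `group` on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock


def receive_forever(sock: socket.socket, handle_pcm) -> None:
    """Block on the socket and pass each datagram's payload to handle_pcm."""
    while True:
        data, _sender = sock.recvfrom(2048)
        handle_pcm(data)


# Example use (not run here, since it blocks):
#   receive_forever(open_receiver(), lambda pcm: print(len(pcm), "bytes"))
```

Each receiver just plays whatever arrives when it arrives, so any offset between rooms comes from network jitter plus each device's own output buffering; measuring that offset would show whether it is 'good enough'.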


I think you'll be surprised how quickly it drives you nuts (if there's ever the situation where you are hearing another speaker). There's also the rather brilliant https://github.com/badaix/snapcast if you do want sonos-like synchronised playback.


Snapcast would likely work better for this use case - it handles audio endpoint synchronization.

https://github.com/badaix/snapcast


Oooh wasn't aware of Snapcast. Thanks! Bookmarked.


Definitely stay away from Sonos. I bought a pair of cheap Ones a month ago and next thing I knew I had four Ones, four Play:1s, two Beams, and a Connect.


Looking through the source, this implementation is only stereo (so far). Might not be too hard to expand that out to 5.1/7.1.

https://github.com/duncanthrax/scream/blob/35bc85bd830fe40e8...


The Raspberry Pi outputs audio at 14 bits. Its DAC might be too noisy for music applications.


I think most people doing audio playback would be using an external DAC (I2S or USB).


I use it to send audio from my Windows QEMU VM (running on its dedicated GPU) to the Linux host for playback.


That. Or hotels. Multicast is great for things like this; it just doesn't tend to work over ISP networks, only on a local network.


sonos alternative


Does anyone know of any iOS apps that can act as a multicast receiver?


Wait... so you are unable to install drivers unless signed by MS now? Wtf...


You can disable driver signing enforcement if you want to install drivers without having them signed.

Driver signing enforcement enhances the security of the OS by preventing malware from installing kernel mode drivers that would hypothetically have unlimited access to the device without the consent or knowledge of the user.

There really is no downside to this.


AFAIK you cannot just 'disable' it. You have to disable it every time you boot the computer, and Secure Boot has to be disabled. I would agree with you if Microsoft's signing process was 21st century and anyone could do it through proper channels, but last time I checked it's archaic and targetable by anyone wanting to make a virus anyway. https://news.ycombinator.com/item?id=17195758


> You have to disable it every time you boot the computer

This is straight-up untrue. You disable it to install the driver. It's fine after that.


The downside is having to pay hundreds of dollars for an EV code signing cert just to submit a driver to Microsoft for Microsoft to sign.


If you're a professional or part of a professional organization, this is a small hurdle to clear and most would agree that the increased security posture for the end user is worth the inconvenience/price for developers.

If you're an amateur/hobbyist/tinkerer and just want to play with kernel driver development, then you can disable the signing enforcement.

Anyway I'm sorry for derailing from the actual point of this post - it's a very cool project!


Professional organization vs hobbyist is a false dichotomy. We have entire operating systems that would be classified as made by hobbyists, yet are used by "serious organizations".

It also means that we will not get some nice things we would otherwise have. See also iSCSI initiator or FUSE for macOS. They have the same problem: they need to be signed by the right certificate, so nobody bothered.

Other systems, when they are running in Secure Boot mode, also accept kernel modules signed by the same keys that are enrolled into UEFI. Why can't Windows and Mac do the same?


> We have entire operating systems that would be classified as made by hobbyists, yet are used by "serious organizations".

Which ones? Linux certainly isn't made by hobbyists today. It used to be, but that was a very long time ago.


Why Windows of course. ba dum tish


> Why can't Windows and Mac do the same?

Because of corporate control.


FUSE for macOS is signed by its developer, so I'm not sure what you mean by "not get some nice things". Or did you mean for Windows? (Both Windows and macOS require paid developer certificates to load kernel drivers.)


I meant both.

I wasn't aware that FUSE for Mac is signed nowadays. When I was checking it a few years ago, it was an issue.

The iSCSI initiator for Mac still has the same issue. The code is on GitHub; making it work is the user's problem.


> Anyway I'm sorry for derailing from the actual point of this post - it's a very cool project!

Yeah, cool project, yet if you read the issues on GitHub, you might realize that at some point no one will be able to use it hassle-free, when the author stops renewing the code signing certificate. And for a hobby project, I wouldn't blame him. Some people can't use it hassle-free now because of signing issues.

I certainly can't install it on my gf's notebook so that she can play audio over my speakers, because it would stop working after each reboot.


That's not how it works at all--I run unsigned drivers on some of my machines all the time because occasionally user-hacked drivers work better with this or that video capture device. You have to reboot to install an unsigned driver; it works just fine afterwards.


I'll try, then. Thanks.


If you speak to someone who knows a thing or two about security they may tell you this is a good idea.


Are there any plans for JACK support?



