VoRS: Vo(IP) Simple Alternative to Mumble (stargrave.org)
86 points by stargrave 9 months ago | 71 comments



> users tend to complain about [Mumble's] newer client versions quality and convenience.

I have never heard anyone complain about Mumble's UX. Especially not that it has gotten worse. Or its resource usage, which is practically unnoticeable on any average system sold past 2005. It's best-in-class and has only gotten better.


And even if I had, I'm quite sure they wouldn't have meant they wanted it in the terminal instead.


Well cover me in foil and charge extra, I guess I'm rare.

Why would I ever want a GUI for a voice application?


You are aware that nearly all computer users never use the terminal at all, right? Why would they want anything other than a GUI?


> WebRTC-based solutions are insane bloated incredible monsters [...] They work mainly only if you use the same kind of software and codecs, for example Chromium[...]

That's just not true anymore these days. I've used WebRTC between Firefox, Safari, and Chrome without any issues.

> only if you use the same kind of software and codecs, for example Chromium, that requires dozens of gigabytes of disk space and much RAM, CPU time to build it.

There seem to be non-browser implementations [1] [2], although I can't vouch for their quality.

[1] https://liburtc.org/

[2] https://github.com/awslabs/amazon-kinesis-video-streams-webr...


libdatachannel[0] is exciting me the most for 'low footprint'. str0m[1] is especially exciting, but you have to be in a device/ecosystem that allows Rust.

> only if you use the same kind of software and codecs

With WebRTC I can connect clients with varying support of H264, VP8, VP9, AV1, Opus, ulaw and alaw. WebRTC has plenty of flaws, but 'lack of negotiation' is not one of them.

[0] https://github.com/paullouisageneau/libdatachannel

[1] https://github.com/algesten/str0m
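
To make the negotiation point concrete, here is a minimal Python sketch of offer/answer codec selection. It is a simplification, not real SDP handling: the `negotiate` helper and both capability lists are hypothetical, and actual WebRTC stacks negotiate per media section with payload types and format parameters.

```python
def negotiate(offered, supported):
    """Return the codecs both sides support, in the offerer's preference order."""
    supported_names = {name.lower() for name in supported}
    return [name for name in offered if name.lower() in supported_names]


# Hypothetical capability lists for two very different clients.
browser_offer = ["AV1", "VP9", "VP8", "H264", "opus", "PCMU", "PCMA"]
embedded_device = ["H264", "PCMU", "opus"]

common = negotiate(browser_offer, embedded_device)
print(common)
```

The call only fails when the resulting list is empty for a media type, which is why mixed clients usually still end up with something usable.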


Oh, this looks great! Also thank you for Pion :)

Edit: Woah, I somehow completely missed that Pion does WebRTC media too (I've only seen it in a data channel project I've toyed around with in the past). Definitely add that to my list above, and I can somewhat vouch for it, it's cool!


Thank you so much for using it. I feel so honored (and surprised) that people use it. I totally stumbled into a successful project.

If I ever can be helpful with your projects reach out!


I'm generally not a person who rails at bloat, but WebRTC is definitely a package. It's a suite of SDP, SRTP, ICE, STUN, TURN, and codecs to do voice/video calls. This makes it versatile and seamless but yes, also heavy and opinionated. A good example is just how few blessed ways there are to use WebRTC outside of the browser.


What makes something a 'blessed way'? Are you looking for community support, large corporate users etc..? Not being snarky, curious about how people see the space. Lots of implementations exist and I have used most of them for different projects, not all these projects went to production though.

* https://github.com/elixir-webrtc (Elixir)

* https://github.com/pion/webrtc (Golang)

* https://github.com/webrtc-rs/webrtc (Rust)

* https://github.com/algesten/str0m (Rust)

* https://github.com/sepfy/libpeer (C/Embedded)

* https://github.com/awslabs/amazon-kinesis-video-streams-webr... (C/Embedded)

* https://github.com/paullouisageneau/libdatachannel (C++)

* https://webrtc.googlesource.com/src/ (C++)

* https://github.com/shinyoshiaki/werift-webrtc (Typescript)

* https://github.com/sipsorcery-org/sipsorcery (C#)

* https://github.com/aiortc/aiortc (Python)

* GStreamer’s webrtcbin (C)


> Are you looking for community support, large corporate users etc..?

Lots of documentation and demonstrations. And honestly, probably a bit better SEO or maybe searching on my part. For some reason GStreamer's webrtcbin totally flew under my radar.

Also not trying to be snarky/negative about WebRTC. I think it's fantastic at what it does. But as a tinkerer, I want something with more tinkering surface area. Not to detract from the great work on WebRTC and how accessible it's made low-latency streaming.


WebRTC actually feels exceptionally tinkerable to me, with human(-ish) readable SDP offers, public RFCs, various implementations, and importantly it being accessible from JavaScript playgrounds right within the browser!

I've learned a lot about it over the years by following https://webrtchacks.com/, which does just that, i.e. taking apart various open and closed VoIP implementations (WebRTC or otherwise) and seeing how they work.


> You have to verify downloaded tarballs authenticity to be sure that you retrieved trusted and untampered software.

I'm not able to access the site over TLS, so this is currently impossible. Anyone else have better luck?

    (from the installation instructions)

    $ [fetch|wget] http://www.vors.stargrave.org/download/vors-2.3.0.tar.zst
    $ [fetch|wget] http://www.vors.stargrave.org/download/vors-2.3.0.tar.zst.sig
    [verify signature]
    $ tar xf vors-2.3.0.tar.zst

Guarantees nothing, if you're actually being attacked. You can't serve out the tarball and the public key[0] and the signature insecurely and get any guarantees about authenticity.

[0]:http://www.vors.stargrave.org/PUBKEY-SSH.pub
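
A signature check only helps if the public key is trusted through some channel the attacker does not control. As a rough sketch of that idea, the Python below computes an OpenSSH-style SHA256 fingerprint of a one-line public key and compares it with a fingerprint pinned out of band. The helper names and the placeholder key material are hypothetical; this is not the project's documented verification procedure.

```python
import base64
import hashlib
import hmac


def ssh_fingerprint(pubkey_line: str) -> str:
    """OpenSSH-style SHA256 fingerprint of a '<type> <base64 blob> [comment]' line."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 with a "SHA256:" prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def key_is_trusted(pubkey_line: str, pinned_fingerprint: str) -> bool:
    """True only if the downloaded key matches the fingerprint pinned earlier."""
    return hmac.compare_digest(ssh_fingerprint(pubkey_line), pinned_fingerprint)


# Placeholder blobs, not real keys: they only stand in for two different keys.
genuine = "ssh-ed25519 " + base64.b64encode(b"placeholder-genuine-key").decode() + " author"
swapped = "ssh-ed25519 " + base64.b64encode(b"placeholder-attacker-key").decode() + " author"

pinned = ssh_fingerprint(genuine)  # assumed to come from a trusted channel
print(key_is_trusted(genuine, pinned), key_is_trusted(swapped, pinned))
```

If the fingerprint itself is fetched over the same plain-HTTP connection, the comparison proves nothing, which is the point being made above.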


I can see your point about a man-in-the-middle attack on HTTP, but realistically if the attacker can do the MITM he can probably also figure out how to modify the contents on the webserver, in which case TLS does nothing.


> realistically if the attacker can do the MITM he can probably also figure out how to modify the contents on the webserver, in which case TLS does nothing

Sorry, I don't see how that follows. If you had access to the webserver you would never even attempt a MITM attack in the first place.


Most websites these days use Crimeflare or equivalent already anyways which is a MITM proxy, so you're still not safe there either.


Yes, however there's a big difference between Cloudflare and an arbitrary MITM. If you think Cloudflare is mangling traffic in a malicious way you would need a shred of evidence to substantiate that claim.

I would also push back that "most websites" use a 3rd party TLS terminating CDN/proxy layer in front of the actual webserver.


> If you think Cloudflare is mangling traffic

I'm more concerned with the data they receive being leaked or sold.

And by most websites I guess I meant large ones that people frequent every day, because these days it's almost impossible to have any sort of useful DDoS protection without using such a service.


> I'm more concerned with the data they receive being leaked or sold.

But that's not what we're talking about here. We were talking about a MITM attack on key/tarfile distribution in the context of verifying file integrity.


The _biggest_ selling point with Mumble (though I've not used it in anger in about a decade) was the way it tied in with the game you were playing and gave you in-game spatial audio.

Stand next to someone in game and they came through loud and clear. Move away from them sideways and it's now quieter and only through one headphone... I found it extremely novel and immersive when I used to play FPS games; everyone on Mumble but absolutely not like being in a chatroom.


>...its server side is still written on Qt, which requires hundreds of megabytes of additional libraries to build it up.

See:

https://github.com/umurmur/umurmur


I wonder if there are plans to offer a "headless" mode where call-status is streamed to an FD instead of using a TUI. I'd love to be able to compose/wrap this tool outside of a terminal emulator.

I've been idly hacking on something quite similar but using QUIC instead. One of the downsides with QUIC, of course, is that it assumes TLS by default, which makes it inefficient for use on already-encrypted transports (like WireGuard, ZeroTier, Tailscale, or Yggdrasil). I'll definitely give this a go as I'm a big fan of NNCP, another project by the author.


I'm not sure I understand the reason why; for non-browser, you can set up a FreePBX or something in 20 minutes and it will work with any softphone. But if I was to use a chat today, I'd use something in-browser, so based on WebRTC and incidentally peer-to-peer (well, kinda, but voice/video flow will be). So not sure I understand who/what this is for.


This is REALLY important software nowadays imho.

I'm old enough to have spoken to people on analogue land lines: the sound was crisp, you could hear small background noises, you could hear someone breathe.

Nowadays we usually speak to people on digital lines that are highly compressed (to the extent that it messes with the sound quality), have a narrow frequency range (no bass, no very high sounds) and are cut up (without enough sound, or when the other party makes more sound, the stream is completely interrupted).

And it does not have to be like this! All of this is in favour of the network operator (or centralized chat servers e.g. whatsapp) trying to save some data/money. While many of us have paid for unlimited data!

On top of that much of the conversations are not properly e2e encrypted!

I've used Mumble to speak to people I love over long distance and the quality is just so much better: it's like the analog experience of my childhood. Hearing every breath, background noise and all in high quality makes all the difference sometimes.


> I'm old enough to have spoken to people on analogue land lines: the sound was crisp

You're old enough to have forgotten what land lines sounded like.

They intentionally dropped frequencies from the audio signal so that they wouldn't have to carry the data contained at those frequencies. This is why nobody ever sounded like themselves over the phone.


Old phone signals (plain old telephone service, or POTS) cut off frequencies above 4 kHz. You cannot hear the difference between f and s, as these are higher-frequency sounds at around 8 to 12 kHz. You'd have to say "f like in Fred" or "s like in Steve", because you literally could not hear the difference.

This is the reason broadband internet (ADSL) is called broadband: dial-up internet used the POTS frequency band below 4 kHz, and broadband used the (broad) frequencies above.

It's also the reason you'd have to get a technician out to your house in order to get ADSL. They would install a frequency cutoff filter on the phone line into your house, and hook up the landline to the below-4 kHz side and the ADSL modem to the above-4 kHz side, so the POTS signal would not interfere with the ADSL signal.


Pet peeve: This is why we use "broadband" as a synonym for "fast internet", but we should stop using it like that, since it completely neglects the symbol rate in that equation :)


I wonder if it was more of an issue with the mic and/or mic placement on phones? Old landline phones had a large mic right by your mouth, whereas cell phones have little pinhole mics near the bottom of the phone.

I feel like I've actually noticed the opposite. My parents still have a landline, although it's basically VoIP (through their cable company) but it's connected to the analog lines in their house. I've noticed they generally sound clearer on their cell phones than they do on the landline.


It’s different audio codecs and data connections which can be changed and adjusted.

Voice over mobile data will benefit from a different type of data compression relative to how data packets are handled in a cellular radio vs a wired internet.

The landline service could be as clear or clearer than mobile data, it just isn’t configured to do so in your case. I have seen VoIP setups using a high quality codec and a handset that can use those frequencies.

Phone companies use different setups that compress to their advantage for many more callers.


I'd not really call it different packet handling (except for some of the earlier, mostly 2G ones not providing FEC at lower layers and delegating some level of error concealing to the codec, as far as I remember):

The main difference is that the bandwidth available was just much lower, so mobile codecs are compressed more. (Satellite phones take this to the extreme – 2.4 kbps is a typical data rate after compression there!)

But so were e.g. international trunk lines; they squeezed a lot more than one voice channel into 64 kbps using compression, silence suppression etc.

> The landline service could be as clear or clearer than mobile data, it just isn’t configured to do so in your case.

An analog landline has relatively little chance of ever gaining wideband support, since that would require swapping out line cards at the provider, and the trend seems to be to get rid of these entirely (in favor of a VoIP adapter in the CPE).

I think I've once used "HD voice" on a "landline" when calling a mobile phone, but that only worked because my home router was actually doing SIP in the background.


Cell phone audio traditionally only covered 300 Hz-3.4 kHz before being lossily compressed down to 4 kbps (or sometimes higher, depending on network, load in that service area, etc). That is complete shit. Recently, other protocols have been adopted with greater bandwidth (all the way up to 7 kHz, which is still several kHz short of covering all the content of speech, but considerably less terrible) and less compression, but if you have audio that's actually good and not merely passable, it's probably because your phone is actually transmitting audio as VoIP, with a much better codec than is used for cellphone audio transmitted over the standard channel.


G.711, the standard encoding for home phone systems since they went digital, is usually filtered at 300–3400Hz as well. Chances are if you had a home phone in the 80s or 90s it was filtered at 300-3400Hz somewhere along the path.


And the history of that filtering even predates digital lines, due to frequency multiplexing: https://en.wikipedia.org/wiki/L-carrier


Ironically, you've got a better chance of getting acceptable quality on mobile-to-mobile calls these days than when calling from a landline:

The big mobile carriers actually have VoIP interconnects preserving wideband audio, while connecting to an (especially smaller) landline carrier might still involve a circuit switched path (going to the physical location of the area code dialed, too!) that inevitably forces everything through a 4 kHz, 8 bit bottleneck.


> They intentionally dropped frequencies from the audio signal so that they wouldn't have to carry the data contained at those frequencies. This is why nobody ever sounded like themselves over the phone.

I remember switching from early GSM phone to a landline because I like to hear my love breathe


Remember what you like. The facts are that sound over landlines was not crisp, and in fact was a train wreck.

It's particularly ridiculous that you're complaining about digital lines having a low frequency range.

Whatever you remember, it isn't reality.


I remember landlines fondly, it felt like I could hear people better back then. At the same time, I also remember the 4 kHz cutoff. It was the reason music sucked over landlines. Like if you were listening to music with a friend from high school over the phone, it'd have no depth and you couldn't understand half of it, from lyrics to beats. Hold music was a lot worse back then; these days it doesn't sound nearly as bad.

I think for a lot of us later Gen-X and older Millennials, as we age our ears don't work as well as they used to. Especially those of us like me that didn't heed good ear protection. I can't speak for everyone, but I suspect if today me tried to talk to someone on a landline back in the early 90s it'd still suck as bad as it feels like it does with cellphones today. We get more range with our phones now but our ears have a harder time processing it.

Just a thought.


Very good point – the "landline generation's" ears have aged considerably since the 90s.

On top of that, I think many remember "landline quality" in terms of a relative comparison with potato-quality early mobile phone codecs, analog mobile phones, heavily compressed discounted long-distance calling circuits etc. of the time.

Yes, landlines were better than any of that, but it doesn't mean that they were actually good by today's standards.


> by today's standards.

Which? WhatsApp calls sound shit. Mobile phone calls sound shit.

I used Mumble to get some acceptable quality.


> WhatsApp calls sound shit.

Not for me; it's way better than any landline I've ever used.

Not sure what we're doing differently – are you sure it's not your or the other party's speaker or microphone?

> Mobile phone calls sound shit.

Not for me either, at least not when EVS ("HD voice") is used, which is more often than not these days when calling friends/family.

2G connections used to sound quite bad, but since 4G, the limiting factor for me has been the other side being on a landline (mostly for business calls), which usually doesn't support wideband audio.


WhatsApp uses Opus for its voice functionality.

The default codec configuration for Mumble is Opus.


Maybe I don't live near you.


GSM was indeed heavily compressed (voice channels were only around 12 kbit/s, compared to 56-64 kbit/s on landline), but comparing a modern VoIP codec like Opus (which is what almost every VoIP solution uses these days) to GSM FR or even EFR compression is like complaining about 64 kbps MP3 compressed by an early 2000s codec sounding bad and going back to vinyl for quality.


How old of a landline are you talking about here?

Anything newer than (heavily depending on the country, probably) the 70s or 80s or so would have been very likely PCM u-law or a-law at 64 kbps (i.e. 4 kHz audio bandwidth at 8 bit), which is literally a mandatory codec in WebRTC.

It would have to be a really old, purely analog baseband line without filters (maybe a local call between offices), frequency modulation etc. to preserve more than the typical 4 kHz of audio bandwidth you'd get on these. Inter-trunk connections were often frequency multiplexed to fit more channels onto a physical wire, which also limited them to 4 kHz.
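
For reference, the 64 kbps and 4 kHz figures fall straight out of the sampling parameters. The Python sketch below works through that arithmetic and shows the continuous mu-law companding curve with mu = 255. Note the assumption: real G.711 uses a segmented, piecewise-linear approximation of this curve with 8-bit code words, so this illustrates the idea rather than a bit-exact codec.

```python
import math

SAMPLE_RATE_HZ = 8000  # G.711 sampling rate
BITS_PER_SAMPLE = 8    # one companded byte per sample
MU = 255               # mu-law parameter

bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # bits per second on the wire
nyquist_hz = SAMPLE_RATE_HZ // 2                # highest representable frequency


def mu_compress(x: float) -> float:
    """Continuous mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


print(bitrate_bps, nyquist_hz)
print(round(mu_compress(0.01), 3))  # quiet samples get a much larger code value
```

Companding spends most of the 8 bits on quiet sounds, which is why speech stays intelligible even though the channel is narrowband.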

Today, 64 kbps gets you much farther using a modern codec like Opus. WhatsApp sounds better than any landline or native mobile phone connection I've ever used in my life.

> (or centralized chat servers e.g. whatsapp) trying to save some data/money

WhatsApp uses P2P for (non-group) calls if at all possible.

There's also a "save data for calls" option in the settings which is off by default.

Modern codecs are so good, adding even more data would literally not make any discernible difference. A sizable fraction of all data transmitted/received by modern VoIP is IP and UDP framing overhead.
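
A quick worked example of the framing-overhead point, as a Python sketch. The assumptions are mine, not measurements of any particular app: IPv4 (20 bytes), UDP (8 bytes), and a plain RTP header (12 bytes) per packet, with no encryption tags or link-layer framing counted, and hypothetical bitrate and frame-length values.

```python
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12  # assumes RTP framing without extensions


def overhead_fraction(codec_bitrate_bps: int, frame_ms: int) -> float:
    """Share of each packet taken up by IP/UDP/RTP headers."""
    payload_bytes = codec_bitrate_bps * frame_ms / 1000 / 8
    header_bytes = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    return header_bytes / (header_bytes + payload_bytes)


# Hypothetical settings: a 24 kbps codec vs. 64 kbps G.711, both with 20 ms frames.
print(overhead_fraction(24_000, 20))
print(overhead_fraction(64_000, 20))
```

Under these assumptions a 24 kbps stream carries 60 bytes of audio and 40 bytes of headers per packet, so lowering the codec bitrate further saves less and less on the wire.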


Mobile connections are often especially crap, using something much lower bit rate than G.711 µlaw. Even modern landlines use voice activity detection and comfort noise generation instead of just passing through the background sounds.

I've never used WhatsApp to call. I have used decent SIP connections with G.722 wideband or Opus. They sound better than the old landlines. Discord sounds better too. Signal, much worse.

I think often the problem is that cell phones really have crappy speaker and microphone placement for calls, as basically nobody actually makes calls on them anymore.


These days, unless you're on 2G or 3G (if they're even still available in your country), mobile phones will often use AMR-WB or EVS when calling over IMS (i.e. VoIP over LTE or 5G), which are both wideband and considerably better than G.711 (and probably even G.722; while they have lower bitrates, they're also considerably more modern).

The problem is that when calling across networks, the connection might still go over a legacy circuit-switched exchange, and that compresses everything down to narrowband again.

I hope that whoever regulates the PSTN in the US will force a switch to all-IP interconnects at some point, since now we get the worst of both worlds (often somewhat lower reliability due to badly managed VoIP services, combined with potato quality because of a legacy interconnect somewhere between VoIP networks).

All IP could also provide much more efficient routing: Right now, as I understand it, if you're calling somebody with a 212 area code and both you and the callee are physically in San Francisco, your connection might still be routed through some circuit-switched exchange in Manhattan, which isn't great for latency or high availability.


Cool project anyone reading this may be interested in:

https://github.com/Johni0702/mumble-web

I've never used it but it should make having a p2p conversation through Mumble as easy as pointing your browser to some URL. UX matters (Mumble clients, including mobile apps, were not very user friendly last time I checked: they require some level of skill to use).

Unmaintained for the last 4 years, sadly.


This is great but would need to run on windows for the typical mumble use-case of gaming.

It would also be great to have this running on ESP32 or similar, so you could make dedicated IP desk intercoms - I envisage a star-trek style intercom, with each button being a channel that you can join by pushing it in (can join multiple simultaneously).


> It would also be great to have this running on ESP32 or similar

ESP32 has its own port of the Codec2 library, which allows intelligible communications using very low bandwidth (a few kb/s). It could make the ideal solution for creating small cheap intercoms scattered around a small area using WiFi, or employing very low bandwidth radio connection such as LoRa for wider coverage. I still didn't see any real application, beside a few simple proof of concept videos on YouTube, though.

More info:

http://www.rowetel.com/?page_id=452

https://www.arduino.cc/reference/en/libraries/esp32_codec2/


It's so nerd-sad to see the Ukrainian Army doing just fine with Discord screen sharing on Windows[1], realistically obsoleting a large bulk of proper intercom solutions. I want to have a working RTS KP-32 on my desk!

1: https://i.redd.it/hjcg9hdnqvqa1.jpg


I'm more concerned about the WinRAR in that photo. Someone please tell them about 7z.


It's clown world level to see any military using a consumer chat application for sensitive operations, especially one made by a foreign corporation. They really don't seem to take opsec seriously even when leaks have already cost lives.


They made very successful use of consumer-level tech to defeat military-grade tech, e.g. various uses of drones to deliver munitions, and GSM networks as beacons over enemy territory. No wonder: given their constraints, they jump on every solution that can work right now, with near-zero effort. While using a foreign-controlled tech is usually frowned upon, the US is not the side from which they particularly need to hide. I would rather see them using Signal though.


> This is great but would need to run on windows for the typical mumble use-case of gaming.

You can game on Linux just fine.


You can certainly play a number of popular and/or high quality games on Linux, but it's hardly ubiquitous. If you're interested in a given specific game I would say your odds of it having a viable Linux experience are still worse than 50/50, and if it's a AAA title I'd say it's closer to 10-25% IME.

If not for Valve/SteamDeck this would be even more dire, and many of those experiences are still emulation-based i.e. Proton.

If you want to be able to play any given computer game being sold this year, that's only reasonably certain on Windows. Gaming is the only reason I still have a Win10 install.


This is simply not true. As someone who has been gaming exclusively on Linux for over a decade I don't even bother checking for support before getting new games anymore because almost all of them just work with Wine/Proton or only need minor workarounds.

The only things that do give problems (besides new releases, which are promptly fixed) are some multiplayer games with anti-cheat that intentionally breaks under Linux. But if you support those kinds of games then you reap what you sow.

Proton is also not really magic - it's mostly just Wine + DXVK and while Valve has improved both they didn't start either of them and weren't involved before they became viable. SteamDick is entirely irrelevant here - if anything it has caused developers to not release native Linux builds because of the way Valve does certification (or unfounded fears on the devs' side).

> If you want to be able to play any given computer game

Moving goalposts much? This is like saying you need to be an American in order to use computers because there are some computer models that were never exported. There are more games than you will ever be able to play which work under Linux, for any genre. For older games it is often even easier to get them to work under Wine/Proton compared to native Windows.


Here's my report:

1. Overwatch -> works great in Proton, albeit about 20% slower than Windows

2. Factorio -> works great natively on Linux, maybe even better

3. Warframe -> works okay in Proton but crashes sometimes

4. Call of Duty (recent) -> kernel anti-cheat both fails to stop cheaters and refuses to work on Linux

I tried living the Proton lifestyle for about 2 years, from around 2019 to around late 2021. Then I just dug my old computer out, reinstalled Windows 10, and went back to using Windows for games.


As a counterexample, I switched to Linux in 2013 for all use cases, gaming included, and haven't looked back. Many of my favorite games do have native Linux builds, but I've had little issue with those that don't.


We can probably cut this back and forth short just by linking to ProtonDB [0] and AreWeAntiCheatYet [1].

[0] https://ProtonDB.com

[1] https://AreWeAntiCheatYet.com

The Steam top 1000 games:

Rating System: ProtonDB Medals

29% Platinum

47% Gold

11% Silver

3% Bronze

4% Borked

In aggregate:

76% P+G (Games you can just play)

86% P+G+S (Games you can play with minor tweaks)

Anticheat:

154 Supported (45%)

39 Running (12%)

3 Planned (1%)

118 Broken (35%)

25 Denied (7%)
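
For anyone who wants to re-derive the anti-cheat percentages, they follow from the raw counts. A small Python sketch, using only the counts quoted above (the combined "Supported or Running" figure is my own aggregation, not something either site reports):

```python
anticheat_counts = {
    "Supported": 154,
    "Running": 39,
    "Planned": 3,
    "Broken": 118,
    "Denied": 25,
}

total = sum(anticheat_counts.values())

# Percentage share of each status, rounded to whole percent.
shares = {status: round(100 * n / total) for status, n in anticheat_counts.items()}

# Aggregated share of titles listed as Supported or Running.
working_share = round(
    100 * (anticheat_counts["Supported"] + anticheat_counts["Running"]) / total
)

print(total, shares, working_share)
```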


This reminds me of an ESP32 Walkie-Talkie: https://www.reddit.com/r/esp32/comments/mbwq6f/walkietalkie_...


It mentions WebRTC but this project wouldn't work in browsers as it's a Go CLI tool. Is there any alternative to WebRTC nowadays? How far along is WebTransport? Anything else that could rely on older web tech?


The project just mentions WebRTC as bloated, it's not WebRTC compatible at all. WebTransport isn't there yet so browsers are hamstrung by not having access to real sockets.


Of course, WebTransport also isn't real sockets. It's more like WebSockets but with more modern transport semantics such as unreliable data and multiplexing without head of line blocking (as long as it's run over QUIC).

For security reasons, browsers don't allow Web sites access to raw TCP or UDP sockets.


Yeah, I was being a bit sloppy conflating the two. WebTransport gives QUIC-based messaging in the browser but not TCP/UDP.


Anyone able to find the source code? Can't see it listed anywhere



I see the source is in the tar.zst file and there is documentation for uncompressing it.

Only I spent minutes looking for the git repo link... There is none?


Unrelated, but I wonder if anyone can identify the terminal font used in the screenshot? I am looking for a good (free) serif monospaced font.



>No GUI requirement. Why would someone need a GUI for voice application? But a fancy real-time refreshing TUI would be desirable. Mumble tends to output no information, sometimes hiding the fact of a problem and that everything stopped working.

Eh why would someone need a fancy real-time refreshing TUI for a voice application? Just write logs to stderr and status updates to stdout like regular people.

Honestly, go a step further. Take microphone audio in on stdin and produce speaker audio out on stdout. Then you can skip ALSA/OSS/jack/pipewire/pulseaudio support and just leave that to the user to compose.
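
As a rough illustration of what such a composable tool could look like, here is a Python sketch of a stream filter that reads raw PCM from one file object and writes it to another. Everything here is hypothetical: the `pump` and `apply_gain` names, the 16-bit native-endian mono format, and the 1920-byte chunk size (20 ms at 48 kHz) are assumptions for the example, and the gain step merely stands in for real encode/decode work.

```python
import array
import io


def apply_gain(pcm: bytes, gain: float) -> bytes:
    """Scale 16-bit native-endian PCM samples, clipping to the int16 range."""
    samples = array.array("h")
    samples.frombytes(pcm)
    for i, sample in enumerate(samples):
        samples[i] = max(-32768, min(32767, int(sample * gain)))
    return samples.tobytes()


def pump(src, dst, gain: float = 1.0, chunk_bytes: int = 1920) -> None:
    """Copy PCM from src to dst chunk by chunk until src reaches EOF."""
    while True:
        chunk = src.read(chunk_bytes)
        if not chunk:
            break
        # Drop a trailing odd byte so only whole 16-bit samples are processed.
        chunk = chunk[: len(chunk) - len(chunk) % 2]
        dst.write(apply_gain(chunk, gain))


# In-memory demo; a real tool would pass sys.stdin.buffer and sys.stdout.buffer.
src = io.BytesIO(array.array("h", [1000, -2000, 30000]).tobytes())
dst = io.BytesIO()
pump(src, dst, gain=2.0)
```

Wired to stdin and stdout, a script like this could sit in a pipeline such as `arecord -t raw -f S16_LE -r 48000 -c 1 | python3 filter.py | aplay -t raw -f S16_LE -r 48000 -c 1` on a little-endian machine, leaving the choice of audio backend to the user.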


You've practically described baresip.



