Hacker News
Show HN: Jam, an Open Source Clubhouse (w/ WebRTC) (jam.systems)
317 points by tosh 11 months ago | 66 comments

Hi everyone,

A few days ago @DoubleMalt, @mitschabaude and I did a one-day hackathon to see if we could get a minimal WebRTC-based version of a Clubhouse-style "room" to work.

Since then we have added a TURN server, bug fixes ("can you hear me?") and a bit of UI polish. It is still early (please don't be too harsh), but you should find Jam quite usable as it is right now.

Jam runs in any modern browser, at least in the ones we could get our hands on. But the long tail of browsers and operating systems is … long, so please let us know if it does not work for you, and what you are using. We want Jam to run wherever it can run. If your toaster supports WebRTC, it might run Jam as well.

We tried to keep the implementation fairly minimal; I'm sure there is more we can remove and simplify once we spend more time on it.


Please give Jam a try and let us know what you think. Any thoughts, comments, feature requests, hosting/deploy-target requests, etc. are highly appreciated.

(for those who are not familiar with Clubhouse yet:)

A room in Clubhouse (or on Twitter Spaces) is a bit like Zoom without video, without screen sharing and without text messaging.

You see a grid of all the people in the room, who is speaking, who would like to say something (slow mute/unmute flashing), who is clapping (fast mute/unmute flashing) and that's it.

Over the past few years I experienced what we all now call "Zoom fatigue". I just don't enjoy video calls. When the pandemic started it felt like even people who in the past were happy with phone calls also wanted to jump on Zoom calls. I'm glad that eventually slowed down again.

I'm super excited about Clubhouse, Twitter Spaces and the activity around audio spaces and live calls.

In hindsight it is fascinating how little the UI for group phone calls on smartphones advanced beyond that of non-smart phones for such a long time.

It was interesting to me to hear the concept of "Zoom fatigue". From my perspective as a deaf person, the rapid adoption of video calling in 2020/21 represents a huge leap forward in accessibility: I can now lip-read people and converse much more naturally.

It was interesting to me too. I'm autistic, so I'm used to communicating with people focused almost entirely on their faces, not giving or receiving many cues from the rest of their and my body language.

But it turns out most people depend on being able to read and send those nonverbal signals subconsciously with the rest of their bodies to get their points across, and not being able to do so is exhausting for them. It's easier to communicate by phone, where that limitation is expected.

I'm not sure how it can or will, but I'd love to see this Zoom fatigue motivate people who are used to having a wider array of communication methods available to them to figure out how to capture that in remote comms.

Thank you for bringing this up!

Do you have experience with speech-to-text systems for live calls? Which product/service does this really well? I'm super motivated to find a way to add great speech-to-text and text-to-speech support to Jam.

Thanks for building this. Maybe you can point me in the right direction with the naming.

Names like /club/ (house) or /jam/ sound like they might appeal to an audiophile. Apart from using a name that suggests it lends itself to collaborative online music sessions, is there anything in this technology that improves on the sound quality/delivery of the existing platforms?

tl;dr: Music production evolves over time and digital content distribution is very different in the mixing business. Older formats are not built with streaming in mind and so you lose a lot of quality during upload to streaming services: https://www.youtube.com/watch?v=cHxMsawJsTc

Instead of just another SaaS that claims they've invented the group-call (which we could already do over circuit-switched networks minus the data monetization & user tracking) is there any benefit for those of us who really take the "listening" part seriously?

I like the UX, very simple and responsive. It would be great if users on the stage could share a file with everyone else.

Oh yeah, and sometimes it’s nice to be able to post text messages too, but I mean, ideally you could upload images and post gifs and have video as well.

...of course, we’re now describing a different product.

This is deliberately minimal, so you can't do those things, because they’re undesirable for this product.

Sure, those are useful things too... but surely not every thing needs to have every feature.

Have you used Teams recently? Did you know it’s soon gonna have a “data” tab so you can live-explore your data from Power BI without ever leaving Teams?

Featuuuuuure creep is what turns nice things into Visual Studio: bloated monoliths that tick every possible feature box but slow down, both at runtime and in terms of features over time.

Having a sharp, refined product problem with a limited feature set is really nice.

...but it’s open source if you wanna mod it yourself...

This could easily run on top of Matrix, which would provide all that out of the box.

Could you point to some examples/tutorials of audio chats based on matrix?

throw in file.pizza!

Nice work.

What type of backend do you need in terms of servers/users if this was a real product? Clubhouse has been suffering under heavy load in certain cases. WebRTC is mostly P2P, but there is still a significant server load to handle edge cases and large rooms.

You should show this off to Mozilla or some other web company that's backing WebRTC and try to get some funds for scaling up your servers if you need to in the future.

Spot on. For larger rooms we will probably have to add mixing on the server side and that will definitely put more load on the server.
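A rough idea of what server-side mixing involves, sketched in TypeScript under stated assumptions: each participant's audio has already been decoded into Float32 PCM frames of equal length (decoding and re-encoding Opus are left out), and `mixMinus` is a hypothetical helper, not part of Jam. Each listener gets the sum of every other participant ("mix-minus"), clamped to [-1, 1] so loud overlaps clip instead of overflowing.

```typescript
// Hypothetical sketch: one frame of decoded PCM per participant, samples in [-1, 1].
type Frames = Record<string, Float32Array>;

// Clamp a sample so overlapping loud voices clip rather than exceed full scale.
function clamp(x: number): number {
  return Math.max(-1, Math.min(1, x));
}

// Sum every participant's frame into one buffer of `length` samples.
function sumFrames(frames: Frames, length: number): Float32Array {
  const total = new Float32Array(length);
  for (const id of Object.keys(frames)) {
    const frame = frames[id];
    for (let i = 0; i < length; i++) total[i] += frame[i] ?? 0;
  }
  return total;
}

// "Mix-minus": each participant receives everyone else's audio,
// so nobody hears their own voice echoed back.
function mixMinus(frames: Frames, length: number): Frames {
  const total = sumFrames(frames, length);
  const out: Frames = {};
  for (const id of Object.keys(frames)) {
    const own = frames[id];
    const mix = new Float32Array(length);
    for (let i = 0; i < length; i++) mix[i] = clamp(total[i] - (own[i] ?? 0));
    out[id] = mix;
  }
  return out;
}
```

Compared with p2p, each client then uploads one stream and downloads one, but the server has to decode, sum and re-encode audio for every participant on every frame, which is where the extra server load comes from.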

We also want to make it super simple to spin up your own Jam for example if you are organizing a micro conference over a weekend and for that we will need to better understand what that means so we can recommend what kind of server/specs to go for.

Thanks a lot for bringing this question up. We need a good answer.

We are currently reaching out to friends and strangers who have a bit more experience running high demand WebRTC + mixing setups to learn about the scaling challenges.

That said, we are quite surprised how well p2p WebRTC works for audio-only in smaller rooms if most participants have a decent internet connection.

WebRTC + Opus is magic.

Amazing work! The WebRTC community needs something like this so bad. Not only will this push a bunch of users toward self-hosted/free software but will also inspire others to build cool things :)

If/when you hit scaling challenges I would love to help! I maintain github.com/pion/turn and github.com/pion/webrtc. One of the reasons I built it was so that I could put my TURN and Signaling server in the same process. It makes it way easier to tie your auth together for signaling+TURN. Then if you do go down the SFU route, there are lots of interesting things you could do. You can see that with how screego[0] does it.
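One common way to tie signaling and TURN auth together, even when TURN runs as a separate process, is to let the signaling server mint short-lived TURN credentials with the shared-secret scheme from the IETF draft "A REST API For Access To TURN Services", which coturn supports via its `use-auth-secret` option. This is a hedged TypeScript sketch for Node, not Pion's API or Jam's code; `turnCredentials` and the example TURN URL are hypothetical.

```typescript
import { createHmac } from "crypto";

// Same shape as an entry in RTCConfiguration.iceServers.
interface IceServer {
  urls: string;
  username: string;
  credential: string;
}

// Hypothetical helper: derive time-limited TURN credentials from a secret
// shared between the signaling server and the TURN server.
function turnCredentials(
  secret: string,
  userId: string,
  turnUrl: string,
  ttlSeconds = 3600,
  nowSeconds = Math.floor(Date.now() / 1000),
): IceServer {
  // The username carries the expiry timestamp; the TURN server rejects it afterwards.
  const username = `${nowSeconds + ttlSeconds}:${userId}`;
  // The password is base64(HMAC-SHA1(secret, username)); the TURN server
  // recomputes it from the same secret, so no per-user storage is needed.
  const credential = createHmac("sha1", secret).update(username).digest("base64");
  return { urls: turnUrl, username, credential };
}
```

The signaling server would hand this object to the browser as one entry of `iceServers` when it creates its `RTCPeerConnection`, so only users who passed signaling auth ever receive valid TURN credentials.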

Happy to help however I can (even if not using Pion!)

[0] https://github.com/screego/server/blob/e845b3d29c4b5794ed10f...

Very interesting, thank you for the pointer, Sean! Followed you on GitHub and will check out Pion!

Perhaps Scalable WebRTC could help you there? Pure client mesh https://github.com/muaz-khan/WebRTC-Scalable-Broadcast without having to add more servers on your end to mix anything
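The client-mesh idea can be illustrated with a relay tree: the broadcaster feeds a few listeners directly, and each listener re-forwards the stream to a few more, so no single peer's upload grows with the audience. This is a hedged sketch of the general technique, not the algorithm used by WebRTC-Scalable-Broadcast; `RelayNode`, `attachListener` and the `fanOut` limit are hypothetical, and the actual peer connections and stream forwarding are left out.

```typescript
// Hypothetical relay-tree node: the broadcaster is the first node in the array.
interface RelayNode {
  id: string;
  children: string[];
}

// Attach a new listener to the earliest-joined node that still has a free slot.
// Because nodes are stored in join order, this fills the tree level by level,
// which keeps the number of hops (and the latency they add) low.
function attachListener(tree: RelayNode[], id: string, fanOut: number): string {
  if (fanOut < 1) throw new Error("fanOut must be at least 1");
  for (const node of tree) {
    if (node.children.length < fanOut) {
      node.children.push(id);
      tree.push({ id, children: [] });
      return node.id; // the peer that should relay the stream to `id`
    }
  }
  throw new Error("tree must start with the broadcaster node");
}
```

The trade-off is that every extra hop adds latency, and when a relaying listener leaves, its whole subtree has to be re-attached somewhere else.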

Definitely will look into that

Nice project. For server-side mixing, how hard would it be to integrate Jam with Mumble? There is a GitHub repository that does it: https://github.com/Johni0702/mumble-web

I highly recommend mediasoup for this. It’s all Node-based and is great for creating “rooms” from WebRTC connections.

Thanks for the pointer!

Janus is another well-known SFU, and it also has a plugin called AudioBridge that mixes all audio on the server: https://janus.conf.meetecho.com/audiobridgetest.html

Works flawlessly; my only wish would be some sort of public lobby (or many small lobbies) to try it out and listen in for a bit to get a feel for it. Great conversations don’t require hundreds of people, so I wouldn’t worry too much about supporting larger rooms just yet. It certainly would be nice, however, to just browse through or tune in to random rooms (like a radio station) so I can discover interesting stuff (if meant to be public).

Love the open-source approach!

btw one thing we could not solve so far:

It seems like on iOS Safari there is _no way_ to tell whether sound output goes through the loudspeaker or the earpiece. This is super annoying when you have the phone at your ear, the audio stream reconnects, and output randomly switches to the loudspeaker (which is really loud). There might be a way to work around this without introducing a native app, but so far we have not found one.


The dreaded iOS Safari WebRTC audio. I never had much luck; the volume kept changing. The best thing I came up with was creating a new AudioContext, which seemed to fix it 50% of the time.

I don't have access to Clubhouse but don't you lose an additional ton of valuable information when there is no video? Like non-physical (video) communication already has its challenges but this takes them to another level. Or am I missing something here?

I prefer audio only, it removes the stress of how you look, and you can do something else while talking (without feeling watched).

Clubhouse has proven quite popular for traditionally underrepresented groups, and I believe it is in a large part due to being audio-only. For speakers, they can focus on what they are saying and hearing, rather than what they look like.

It also lowers the cost of entry. While smartphone cameras are quite good, the expectation for video quality has risen to the point where it is prohibitively expensive for a large portion of the population. When you factor in lighting, hardware and software for editing, bandwidth and time, it excludes a huge number of people and narrows down the diversity of content. Traditional podcasting is also suffering from a similar rise in total cost. Clubhouse eliminates most of those costs - all you need is a smartphone. Granted, it is iOS-only, but this will not be the case for much longer. We can expect Clubhouse to explode even more when it launches for Android.

For listeners, everyone is subject to unconscious bias. By being audio-only, it helps reduce a lot of the factors that would bias our opinion of someone.

It's a feature. Not a bug.

It’s like podcasting vs YouTube. Audio-only can be better in certain contexts.

I don't get the point of Clubhouse. Don't many video meeting apps already support turning off video during a call?

I think Clubhouse’s community and interest-oriented spaces will probably be its key differentiating feature. Video meeting apps are generally targeted towards known audiences, while Clubhouse lets you spontaneously hop into a room with strangers.

Nice work! A colleague of mine did a similar thing in 2014 but it bitrotted terribly. Code is at https://github.com/tOkeshu/bananaphone. Great to see some alternatives popping up!

He presented it at a mini conference in Berlin: https://www.youtube.com/watch?v=pyIIkUV3moM&list=PL37ZVnwpes...

Fantastic, so great to see how quickly you were able to get this up, and it's also a blessing for those of us not allowed into the iProduct citadel. Also awesome to see Berlin & Vienna represented.

Re: growth - the crypto community has been jumping into Clubhouse but I think pushing this as the local + opensource alternative could help Jam see very quick adoption.

Love this! Please make the landing screen a teeny bit prettier and you could definitely start recommending this to regular 'retail' folk. For example, I'd consider putting the about text below the main 'create room' panel.

It looks like a nerd app, just needs a tiny bit of swish to make it a mom and pop app too

Thank you, now that basic calls are working we will definitely put more effort into the look and feel.

We also plan to allow some basic customization, like adding a logo and selecting 1-2 colors, so you can make Jam better fit the identity of your community (a bit like on Reddit).

Appreciate the suggestion of switching the "about" and "create room" segments, great idea!

We just released a new version that has the "about" and "create room" segments flipped around :))


Is there a way to view a list of rooms? I want to listen to something!

Yeah I second this, a discovery mechanism for public spaces would be nice.

This is a really neat and clean design, but I can't do much in a room by myself. Might give this a try with some family members later on.

This is great! I've used Clubhouse over the past couple of weeks, and have been looking forward to seeing the opensource/self-hosted version.

Imho tho, one super important thing that Jam needs is federation. The best thing about Clubhouse is the discoverability of other people's public rooms and being able to dip in and out of random conversations. The combination of self-hosting, open source, and federation would make a great alternative.

Thanks! We are definitely looking into ways to help with discovery, including federation (for Clubhouse, social media currently also tends to work better than the hallway).

Thank you for bringing up federation.

I was thinking about building something like this based on Jitsi. But I got 20% CPU utilisation on a 2GB/1-core Hetzner Cloud instance with only three test audio connections. I wonder what's going on. I already tried switching off logging, but it didn't help much. Do you have data on how many resources you need to host this?

Could you build this on top of Agora? That would make it much nicer from an open-source perspective. It's one of the only services that can carry the backend audio scalably.

The product side is far more interesting than the carrier API. Not worth spending time on that.

Other alternatives are Twilio, Daily.co, etc

And then you would be risking leaking your data to the CCP. Not worth the risk. It will be bad PR for Clubhouse too once the media gets wind of it.

It's kinda sad that nobody cares, especially because we are dealing with a genocidal regime. It's going to bite us all back in the near future.

And to be honest their documentation is very poor.

Has an open source clone of some wildly popular commercial service ever had substantial success?

Signal, Telegram, Moodle, Discourse


Wikipedia (Encarta, Encyclopedia Britannica)



I wouldn't say that counts as a service.

What about the latency? Online jamming software such as Jamulus or SonoBus works hard on reducing it. But this is very specific to online rehearsals. If that were achievable through WebRTC, it would be awesome!

Looks great! Would be great if you could share how you went about building this!

Just tried it and it's awesome. Looking forward to having more jams on it

thank you Faisal, appreciate it =[^_^]=

I've created a room here if anyone is up for it https://jam.systems/hn-9p1r

How is this {better,different} from dogehouse.tv?

The page isn't loading for me; I get "Uncaught SyntaxError: Unexpected token ." at bundle.js:25391

Interesting, looking into it. Can you share which browser you are using?

Thanks a lot, it's the optional chaining operator. We'll have to avoid it or change the ES build target.
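For anyone hitting the same error: older browsers fail to parse `?.` and report exactly this "Unexpected token ." message. The sketch below, using a hypothetical `Peer` type, shows the modern syntax next to roughly what a transpiler emits for an ES2019 target. Assuming an esbuild or TypeScript build, lowering the target (esbuild's `--target=es2019`, or `"target": "ES2019"` in tsconfig.json) makes the tool emit the explicit checks instead.

```typescript
// Hypothetical shape, only for illustration.
interface Peer {
  info?: { name?: string };
}

// Modern syntax: parsers older than ES2020 stop at `?.`.
function nameModern(peer?: Peer): string | undefined {
  return peer?.info?.name;
}

// Down-leveled equivalent: explicit null/undefined checks that ES2019 engines accept.
function nameEs2019(peer?: Peer): string | undefined {
  return peer == null ? undefined : peer.info == null ? undefined : peer.info.name;
}
```

Both functions return `undefined` as soon as any link in the chain is `null` or `undefined`, so swapping one for the other does not change behavior.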

thank you!

This is awesome! Keep it up - I'll see how I can contribute.

wow this is incredible! How many concurrent users does it support?

Thank you.

That's a great question. Our current setup is p2p WebRTC and should scale to many small rooms (up to ~12 people) fairly well. That said, there is also a TURN server for NAT traversal and a signaling server for coordination.

We'll try to get a better picture of where the limits are this week. I don't have a good answer atm to be honest.

Not an expert here but have some experience with it:

Assuming that each peer is connected to every other peer via a mesh network [see this image for reference: https://github.com/feross/simple-peer/blob/master/img/full-m...], each outgoing stream (esp. audio / video) is likely going to be duplicated, per recipient.

Scalability over a mesh network is fully dependent on the CPU and network performance of all of the connected devices, and I doubt it could handle 12 participants if there is video involved, unless all participants are running relatively high-end and modern devices, with optimal network conditions.

You'll need an SFU or an MCU running on the server to handle larger rooms, so that each connected device only has to send one output stream per media type, regardless of how many participants there are.
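To make the mesh-versus-server trade-off concrete, here is a back-of-the-envelope sketch that counts audio streams in a room of n participants. It assumes every participant sends one audio stream, no active-speaker filtering, and no TURN relaying; the 32 kbit/s per Opus voice stream used in the example is an illustrative assumption, not a measurement of Jam.

```typescript
// Stream counts for an audio-only room of n participants.
interface Topology {
  uploadsPerPeer: number; // streams each client sends
  downloadsPerPeer: number; // streams each client receives
  serverStreams: number; // streams entering plus leaving the media server
}

// Full mesh: every peer encodes and sends its stream separately to each of the other n - 1.
function mesh(n: number): Topology {
  return { uploadsPerPeer: n - 1, downloadsPerPeer: n - 1, serverStreams: 0 };
}

// SFU: one upload per peer; the server forwards the other n - 1 streams to each peer.
function sfu(n: number): Topology {
  return { uploadsPerPeer: 1, downloadsPerPeer: n - 1, serverStreams: n + n * (n - 1) };
}

// MCU: one upload per peer; the server mixes and sends one stream back to each peer.
function mcu(n: number): Topology {
  return { uploadsPerPeer: 1, downloadsPerPeer: 1, serverStreams: 2 * n };
}

// Upload bandwidth per client for an assumed per-stream bitrate.
function uploadKbps(t: Topology, kbpsPerStream: number): number {
  return t.uploadsPerPeer * kbpsPerStream;
}
```

For 12 participants, `mesh(12)` means 11 uploads per device, i.e. 352 kbit/s of upload at the assumed 32 kbit/s, while `sfu(12)` keeps each device at a single 32 kbit/s upload and moves 144 streams through the server instead; an MCU cuts that to 24 streams at the cost of decoding and mixing on the server.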

Looks like just a WebRTC "hello world" example from MDN.
