Ask HN: Are there good self-hosted video conference tools?
198 points by ketzu on March 16, 2020 | 87 comments
As we switched to full home office at my university, we started establishing short regular video conferences. Unfortunately, most tools are overwhelmed and we would like to self-host our infrastructure.

Do any of you have experience with self-hosted tools for video conferences?

You need a lot of GPU time to run a decent video conferencing system.

That's because every video feed usually needs to be realtime, low-latency transcoded to match the receiver's bandwidth requirements. If some people in the meeting are on 3G while others are on fast internet, you can't send the same data to all of them! You also can't send the same data to all of them if different client devices have different hardware video encoders/decoders. Start doing software decoding and you'll soon end up draining users' batteries like Zoom!

In a 10-person meeting, that's 10 incoming video feeds, and 100 outgoing video feeds. Not many machines can encode 100 video feeds in realtime! Obviously you can skimp on quality a bit and bucket users (i.e. we'll have a high, a mid, and a low res feed, and just pick which to send).

For all the above reasons, self-hosted video conferencing systems tend to be kinda laggy, gobble battery, and have poor client support.

Big companies offering hosted VC solutions tend to have dedicated video encoding chips, so they can cheaply make hundreds of video streams to send to every participant.

For a low number of people in the room, WebRTC will work well in a fully P2P way. There's no need for a centralized server unless you have more than 10 users connected.

Also, in practice for a big number of users you won't need to encode the video streams more than 10 times because most people have the same kind of characteristics (you'll have up to three levels of network connectivity, and most of your users will have h264 support so you will have few codecs to support).

For instance, you could go for 720p h264, 480p h264, 240p h264, and audio only for people with too little bandwidth or no h264 support, and you'll be good for most use cases.
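
To make the bucketing idea concrete, here is a minimal Python sketch that picks one of the four tiers above for a given receiver. The bitrate thresholds and the name `pick_tier` are illustrative assumptions, not figures from this thread or from any particular product.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    min_kbps: int  # assumed minimum downlink for this tier; illustrative only


# Ordered from best to worst. The thresholds are guesses for illustration.
TIERS = [
    Tier("720p h264", 1500),
    Tier("480p h264", 700),
    Tier("240p h264", 250),
]
AUDIO_ONLY = Tier("audio only", 0)


def pick_tier(downlink_kbps: int, supports_h264: bool) -> Tier:
    """Return the best tier a receiver can take, falling back to audio only."""
    if not supports_h264:
        return AUDIO_ONLY
    for tier in TIERS:
        if downlink_kbps >= tier.min_kbps:
            return tier
    return AUDIO_ONLY
```

With this layout the server (or the sending client) only ever produces three video encodes per source, however many receivers there are.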

> WebRTC will work well in a fully P2P way. There's no need for a centralized server unless you have more than 10 users connected.

I wish. With 10 users you have 10^2 - 10 = 90 different (unidirectional) data flows.

Each of them can run into a different bottleneck where person X cannot hear Y and Y cannot hear W...

And they cannot be monitored centrally (!).

Compared to a centralized location where you would have 10 user-->server + 10 server-->user flows.
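
The two counts being compared here can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for illustration.

```python
def mesh_flows(n: int) -> int:
    """Unidirectional flows in a full P2P mesh: each of n peers sends to n - 1 others."""
    return n * (n - 1)


def star_connections(n: int) -> int:
    """Links with a central server: one user-->server and one server-->user per client."""
    return 2 * n


for n in (3, 5, 10):
    print(n, mesh_flows(n), star_connections(n))
```

For 10 users this gives 90 mesh flows against 20 client-server links. Note that each server-->user link still has to carry the other participants' media, so the saving is in the number of paths that can fail, not in download volume.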

On top of that, ISPs care about bandwidth between customers and large services e.g. AWS and don't care about ISP-to-ISP traffic.

> I wish. With 10 users you have 10^2 - 10 = 90 different (unidirectional) data flows.

I don't understand your maths: in a p2p network with 10 clients, each client ends up having 9 (up) + 9 (down) streams. With a centralized server, there are still 9 streams to download, but only a single upload.

Of course it requires 9 times less upload bandwidth with a centralized server (or you get 9 times better quality for the same bandwidth), but as long as you don't have too many clients, p2p works fine.
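
As a rough illustration of the "9 times" figure, the sketch below computes per-client upload for both topologies. The 500 kbps stream bitrate is an assumed number chosen only to make the example concrete.

```python
def upload_kbps(participants: int, stream_kbps: int, via_server: bool) -> int:
    """Per-client upload: one copy via a server, or one copy per peer in a mesh."""
    copies = 1 if via_server else participants - 1
    return copies * stream_kbps


def download_kbps(participants: int, stream_kbps: int) -> int:
    """Per-client download is the same in both topologies: everyone else's stream."""
    return (participants - 1) * stream_kbps


STREAM_KBPS = 500  # assumed bitrate of one video stream

p2p_up = upload_kbps(10, STREAM_KBPS, via_server=False)
server_up = upload_kbps(10, STREAM_KBPS, via_server=True)
print(p2p_up, server_up, p2p_up // server_up)
```

Under these assumptions a 10-person mesh needs 4500 kbps of upload per client against 500 kbps with a server, while download stays at 4500 kbps either way.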

Keep in mind that for video chat, the biggest QoS issues are latency and jitter. Using a centralized server won't help with those, and if you have a good connection in terms of jitter and latency, you usually don't have throughput problems either.

> And they cannot be monitored centrally (!).

Why? You definitely can push QoS metrics to a central server if you want (but this server will have a much lower load to handle than if it had to centralize the video chats themselves).

> On top of that, ISPs care about bandwidth between customers and large services e.g. AWS and don't care about ISP-to-ISP traffic.

And AWS makes you pay a lot for every byte.

> That's because every video feed usually needs to be realtime, low-latency transcoded to match the receivers bandwidth requirements.

Scalable video (and audio) never seems to have taken off:

> SVC standardizes the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset video bitstream is derived by dropping packets from the larger video to reduce the bandwidth required for the subset bitstream. The subset bitstream can represent a lower spatial resolution (smaller screen), lower temporal resolution (lower frame rate), or lower quality video signal.
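
The "dropping packets" idea in the quote above can be pictured with a toy model in Python. This is a deliberately simplified sketch: the `Packet` fields and the `subset_bitstream` helper are invented for illustration, and real SVC bitstreams have layer dependencies and signaling that are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    frame: int
    spatial_layer: int   # 0 = base resolution, higher = enhancement layers
    temporal_layer: int  # 0 = base frame rate, higher = extra frames
    size_bytes: int


def subset_bitstream(packets: List[Packet], max_spatial: int, max_temporal: int) -> List[Packet]:
    """Derive a lower-quality stream by dropping packets above the chosen layers.

    No decoding or re-encoding happens: packets are only filtered.
    """
    return [
        p for p in packets
        if p.spatial_layer <= max_spatial and p.temporal_layer <= max_temporal
    ]


full = [
    Packet(frame=0, spatial_layer=0, temporal_layer=0, size_bytes=400),
    Packet(frame=0, spatial_layer=1, temporal_layer=0, size_bytes=900),
    Packet(frame=1, spatial_layer=0, temporal_layer=1, size_bytes=200),
    Packet(frame=1, spatial_layer=1, temporal_layer=1, size_bytes=500),
]

low_res_full_rate = subset_bitstream(full, max_spatial=0, max_temporal=1)
print(sum(p.size_bytes for p in low_res_full_rate))
```

The point of the sketch is that a forwarding server can serve a slower receiver by filtering, which costs far less CPU than transcoding.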

* https://en.wikipedia.org/wiki/Scalable_Video_Coding

* https://en.wikipedia.org/wiki/Bitrate_peeling

Yes, but perhaps now is the time, given the circumstances.

What's the client support for it?

There does seem to be a general push for it, including AV1:

* https://www.w3.org/TR/webrtc-svc/

nitpick because the slant of your argument is correct, but in a self-hosted system you don't necessarily need a single machine to encode 10 video feeds, each client (assuming this is a P2P approach) only needs to encode one - the user's. It does need to transcode that to 9 outbound streams, which is asking quite a lot of even current high end laptop hardware, and basically impossible on a smartphone.

Add in the unrealistic upstream bandwidth needs and, yeah, your point still stands.

I think a lot of WebRTC video conferencing solutions don't do server-side video encoding: the clients encode their own stream(s) for high/low bandwidth, and on the server side they have a selective forwarding unit, or SFU. The clients are usually smart enough to only send the high-bandwidth stream when that speaker is in focus.
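
A minimal Python sketch of that forwarding decision is shown below. It assumes each client uploads a "high" and a "low" encoding; the bitrates, the `Receiver` type, and the `choose_layers` function are hypothetical and not taken from any real server.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Receiver:
    name: str
    downlink_kbps: int


# Assumed bitrates for the two encodings each client uploads.
HIGH_KBPS = 1200
LOW_KBPS = 150


def choose_layers(receiver: Receiver, senders: List[str], focused: str) -> Dict[str, str]:
    """Decide, per sender, which already-encoded stream to forward to one receiver.

    The server never re-encodes: it only picks "high" or "low" for each sender.
    """
    others = [s for s in senders if s != receiver.name]
    plan = {s: "low" for s in others}
    budget_after_low = receiver.downlink_kbps - LOW_KBPS * len(others)
    # Upgrade the focused speaker only if the receiver can afford the difference.
    if focused in plan and budget_after_low >= HIGH_KBPS - LOW_KBPS:
        plan[focused] = "high"
    return plan


senders = ["ann", "bob", "cy"]
print(choose_layers(Receiver("ann", 3000), senders, focused="bob"))
print(choose_layers(Receiver("ann", 400), senders, focused="bob"))
```

Because the server's work is limited to choosing which packets to pass on, its CPU cost stays low even as the number of participants grows.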

Modern video codecs support SVC, which makes producing several lower quality levels computationally inexpensive.

Firstly, a client will have 10 outgoing video streams at most, not 100!

Secondly, if we consider receivers' bandwidth, a client can encode 2 or 3 video streams, e.g. low, medium, and high quality video.

I think he's saying the server would have to handle 100 streams. 10 clients * 10 streams each.

Still, isn't it one stream per client? Why 10 streams each?

If you're all connecting directly to a streaming server (instead of using something p2p), the server will have 10 connections for inbound video feeds and then 10*9 open outbound video connections to send each video to each client connection.

> 100 outgoing video feeds. Not many machines can encode 100 video feeds in realtime

The server will send 9 video streams to each connection, but it has to process only 10 video streams since it will send the same video data to all clients, if we ignore client bandwidth.

> if we ignore client bandwidth.

The parent post explicitly took the client bandwidth into account, that's why it was 100 instead of 10.

But your point about 10 vs 9 still stands. Still, 81, and if you bucket into 3 tiers, that's still 27. Which at least scales linearly instead of quadratically.
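
The scaling difference can be sketched in Python. Exact totals depend on whether you count 9 or 10 on each side; this sketch counts n senders, each watched by n - 1 receivers, so its numbers differ slightly from the ones quoted above. The function names are invented for illustration.

```python
def per_receiver_encodes(n: int) -> int:
    """One tailored encode per (sender, receiver) pair: grows quadratically."""
    return n * (n - 1)


def tiered_encodes(n: int, tiers: int = 3) -> int:
    """One encode per sender per quality tier: grows linearly in n."""
    return n * tiers


for n in (10, 20, 40):
    print(n, per_receiver_encodes(n), tiered_encodes(n))
```

Doubling the meeting size roughly quadruples the per-receiver count but only doubles the tiered one.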

Exactly. It scales linearly, not quadratically. I forgot to mention that since clients do the "original" encoding, the server actually does less work (regarding resolution tiers).

agreed. do not agree with OP that all 100 require transcoding

I wonder if OP meant "1 outgoing and 9 incoming" streams.

Is there no video compression algorithm that lets you drop every third "line" or something from a higher resolution feed, to let you downscale video efficiently for bandwidth savings, but using only little CPU, and providing okay-ish quality?

That's not really how video codecs work. Below is a gross oversimplification to bootstrap the parent's knowledge.

Video is usually compressed both temporally (across frames) and in the frequency domain (à la JPEG) within the frame itself.

Transcoding is essentially what you're asking for: it downscales the video. It's the cheapest approach currently available, or we'd be using something else.

If you just dropped data you'd get that blocky square thing that you get when watching tv or satellite and the signal goes lossy.

There are, but they are not widely used.

Matrix is a fascinating decentralized chat and VoIP conferencing option that can be self-hosted. I’m running the Synapse server and recently tried it with a couple of friends. The 1 to 1 video experience was noticeably sharper than Google Hangouts.


Edit: This is the client software: https://about.riot.im/features

How did you set up video? Last time I checked it was not straightforward. Do you have any software/plugins/repositories to recommend?

The official install docs have a TURN server option to help with VoIP and all of it worked without modification. What problems did you run into?

I don't really remember. Compiling a lot of things maybe, or running out of space or RAM on my VPS.

I'll try to get it running again this weekend.

You can try Jitsi meet[1]. Open-source and with all the usual features.


The CPU usage of their web client is a pity, everything else is perfect.

We (the company/department I work in) set up Riot.im for chat/room based communication while everyone works remotely. As part of this, we added the Jitsi add-in for conference calls.

At the end of day one I can say that Jitsi audio is pretty good. However, the video does seem to freeze quite regularly. This was in calls with only 3 to 4 people. I don't know if it's a Jitsi problem, or somehow an integration problem with Riot.im.

Then again, because I can imagine that Jitsi has probably experienced astronomical traffic growth over the last few days as more and more people transition to remote working during the corona crisis, I don't want to be overly critical. I'll see how it plays out over the next couple of weeks. However, right now I'm not sure I'd want to use it for client based external calls.

If you’re using the default jitsi we provided, then yesterday it was incredibly overloaded. We are working on it currently; typically the video quality is rock solid.

That's the only one I found so far, but we'll definitely check it out!

I've recently set jitsi up. Its dockerized version is pretty easy to get working and supports ssl automatically if you have a domain name.

Yeah, this works like a charm for me. I host daily with 5-10 people without any problems.

Jitsi beats anything else, including commercial stuff. You don't need any account, just a browser.

It's simple and works very well! Maybe a little bit harder to self-host, but definitely a top option!

There are .deb packages for Ubuntu and Debian available. I installed it on a freshly provisioned VM running Debian 10 in about 15 minutes. They even provide a shell script to obtain a Let's Encrypt SSL certificate.


If you use it via docker compose there is nothing hard about self-hosting it. I literally tried this out within minutes. A few more minutes to put it on the normal port with a real certificate.


Thanks to the other comments, I now think it's not that hard to self-host it :-)

edit: still, self-hosted won't have the phone numbers to join the room

It says that I should use Chrome instead of Firefox.

Firefox does have issues selecting the correct microphone device.

A group of friends were and still are evaluating video conferencing for small groups (<6 people).

Here's what we've tried so far:

Nextcloud: I've been running a Nextcloud instance for ~2 years (it's awesome) and tried Nextcloud Talk several times, though for some reason we could not get audio and/or video to work.

Just this week we tried Jitsi because it's open source and can be self-hosted. Unfortunately, while it worked in principle, Jitsi uses large amounts of CPU cycles when running in the browser, and latency and video quality were also an issue.

For us the best option as of now is still https://whereby.com which works well regarding stability, audio/video quality and latency, though it's not self-hosted. I'm not affiliated in any way with them, I just like their product.

I'd love nextcloud talk to work, and I'm very curious about other solutions I'm not aware of - self-hosted and otherwise.


After reviewing my Nextcloud Talk setup I saw that I didn't set up a TURN server, which might be the reason it's not working. So it's not Nextcloud, but an incomplete setup.

I use Nextcloud Talk with coturn installed (STUN/TURN) to regularly connect with friends. But it is mostly one-on-one; I have no experience with 4+ people. Sometimes browsers have trouble picking up the mic, but this will get sorted out. I recommend it. Before that we used appear.in (now whereby.com). For work, face-to-face is nice and human, but screensharing+audio is where the productivity really is. I think Mumble would be fine too if it can be transport encrypted.

Regarding Whereby, I used them actively until they became paid for groups bigger than four and introduced sign-up. Then I decided to build my own free solution, porting many features in the form of plugins to keep the main tool light and fast. You can check it out if interested: https://xroom.app

Much as I like Nextcloud, Nextcloud Talk doesn't really scale worth a darn. I'd use it for no more than a dozen or two people. For even a medium-sized business it's way too little.

Twilio just released a (mostly) full featured client for their Programmable Video service: https://github.com/twilio/twilio-video-app-react

We decided to roll this out to our users as a free service for 6 months. Our users are small business owners in the physical meat-world and don't typically have a Zoom subscription already.

I made a couple of tweaks to the app, namely to be able to embed it as an iframe in our web application (and pass the room + user name in as GET parameters).
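
For readers who want to picture the embedding side, here is a hedged Python sketch that builds such an iframe tag on the host application's side. The parameter names `roomName` and `userName`, the example URL, and the `embed_snippet` helper are all hypothetical; check the linked fork for the names it actually reads.

```python
from html import escape
from urllib.parse import urlencode


def embed_snippet(base_url: str, room: str, user: str) -> str:
    """Build an <iframe> tag whose src carries room and user as GET parameters."""
    # "roomName" and "userName" are hypothetical parameter names.
    query = urlencode({"roomName": room, "userName": user})
    src = f"{base_url}?{query}"
    # The allow attribute lets the embedded page request camera and microphone.
    return f'<iframe src="{escape(src, quote=True)}" allow="camera; microphone"></iframe>'


snippet = embed_snippet("https://video.example.com/", "Team Sync", "Ana")
print(snippet)
```

URL-encoding the values and HTML-escaping the final URL keeps room or user names with spaces and special characters from breaking the markup.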

It works great! We've deployed it to Twilio's infrastructure so we didn't have to worry about any of that. Will be rolling out to users today.

Can you share the iframe embed code?

Sure! Check out https://github.com/leesalminen/twilio-video-app-react and https://github.com/twilio/twilio-video-app-react/pull/95 . It's a single commit to add support for <iframe> embed.

Don't forget the value of finding ways to avoid the need for video conferencing. While many meetings may feel unavoidable, alternatives like delegating authority, simplifying tasks, and such can often achieve more than a meeting.

Before COVID-19, a friend who organizes conferences told me how his firm achieved more by his not going to the second annual conference he organized in Shanghai and delegating more to his former helper, now the main Shanghai organizer.

Even if you can't avoid everything, the more you find ways to simplify, the more you build your experience and skills to do so more the next time.

My experience with my customers is that we hardly need to see each other. Chat all the time, voice calls when they save time, screen sharing about 50% of the time we're in a call, video almost only by misclicks on camera icons.

You can try setting up https://www.kurento.org/

Note that all modern browsers support WebRTC. I wrote a simple server to set up WebRTC sessions between browsers that worked, but it's probably easier to use Kurento.

Kurento looks pretty great. Here's a brief snippet from their 'Getting Started' guide which could be useful too:

"If your intended application consists of a complex setup with different kinds of sources and varied use cases, then Kurento is the best leverage you can use.

However, if you intend to solve a simpler use case, such as those of video conference applications, the OpenVidu project builds on top of Kurento to offer a simpler and easier to use solution that will save you time and development effort."

(edit: and here's a link to OpenVidu: https://openvidu.io/ )

You could try https://www.yameeting.com/ we give free service for small meetings < 10 ppl with no time restrictions (no enforced limit except server capacity)

You can also use the Ninja Mode (disposable VMs), but you'll have to pay for that.

Disclaimer: I'm the owner of yameeting.com and we use jitsi meet behind the scenes.

School is closed, but class isn't. Classrooms and play-from-home groups urgently need a solution for parents/teachers. What I'm looking for is a solution for classrooms or other activities for tweens (think 4th graders). I need to be able to limit the users to only those who are authorized, but leave the rooms open for joining/exiting much like a chat server. Jitsi itself is great for that except there is no white-list authorization and that I'd have to host it myself (the free host is overburdened). I think you're targeting a different use case, but schools and parents everywhere need a classroom solution urgently.

Hey, thanks for your comment. The idea behind yameeting is the "Ninja Mode", where you have disposable VMs to reduce costs and help prevent intrusions on the servers where the meetings are taking place.

You can't limit who enters the room using a whitelist with the out-of-the-box Jitsi installation, but since it's using XMPP you can add that functionality.

I'm pretty sure there are solutions out there that focus on those features.

If you want more information or help with this just contact me hello at yameeting.com

But if you use the Jitsi SFU (media server), the streams are not really encrypted between the participants; it's only encrypted peer1<->sfu and sfu<->peer2.

Yes, that's true it's not end-to-end encrypted as you say.

Here's more information about that https://github.com/jitsi/jitsi-meet/issues/409

I'm surprised that no one mentioned BigBlueButton.

Open source, easy installation, good documentation and has a great community.

It's the first time I hear of it, but https://bigbluebutton.org/ times out and it doesn't seem to be packaged for Arch Linux (not even AUR) so it doesn't look like it's very well known?

Yes! We use this one at work (150 employees) and it's really good. You can do only audio or audio + video. It supports recordings and is self-hosted. Give it a try!

It's really lacking in Linux distributions

You are right. 64 bit Ubuntu 16.04 alone is supported. It allows them to focus and ship faster updates.

I have recently implemented a Janus server for my client and it's working great for them!


There's Nextcloud Talk: https://nextcloud.com/talk/ but I haven't personally tested it yet.

Never worked for me.

Doubango https://github.com/DoubangoTelecom/doubango

Use with any standard SIP client. It can be standalone or part of a PBX... It can be crabby to build and, as others have pointed out, it's gonna need a bit of horsepower. By no means do you need a super high end 48 core monster, but you're not gonna host this thing on an AWS micro instance either.

We were looking for a good LAN video conference tool, since we (ZeroTier) could use it on virtual LANs for telework. The ideal would be something where you can do a conference call by just entering IPs or finding participants with mDNS. It seems nothing like this exists beyond some really old abandonware for Windows. I spent a little time seeing if we could trick VNC into doing this but it's too clunky.

Keep in mind that depending on the model used by the video/audio processing this might require some hardcore transcoding and/or bandwidth.

Normally a few SD/HD streams are not that hard, but it does add up.

- matrix-synapse as the server

- coturn to help webrtc audio & video get through

- riot.im as a client (includes jitsi)

It's not rocket science. All the server side stuff is in Debian main. Clients available for every major platform. The riot desktop client even has a properly maintained apt repository...

How is the quality and stability?

Feels rather solid in most ways I interact with it. Try the public matrix.org instance through https://riot.im/app to get a feel for it...

Since nobody has mentioned it - multi platform seems to be a necessity.

Another SFU is https://github.com/pion/ion

Someone also built a CLI based conferencing solution https://github.com/dialup-inc/ascii :) stop wasting all those resources on X11!

Do you know the OSS project https://jitsi.org/jitsi-meet/ ? They also have Docker containers: https://github.com/jitsi/docker-jitsi-meet

Not sure how well it works for actual video conferencing, but https://www.vidyo.com/ is an on-premise product.

We use it for its audio/video in an ATM/VTM (virtual teller) system.

During testing I did 1-1 video calls and it works well.

It uses 3 Linux VMs that were delivered to us as an OVA.

Working on a web-based FOSS video conferencing/telephony solution: https://github.com/garage11/ca11

You should definitely take a look at Elos[1]:

Every common feature you need, multiplatform, and customizable

1: https://elos.vc/site/en/

We’ve been using these guys to pick tools, though we didn’t ask for self-hosted vidcon:


Isn't SfB (née Lync) still available as a self-hosted ("on-prem") solution? Of course it's not necessarily exactly "good", but it exists and can work.


I have no experience with this. But I imagine XMPP solves your problem in some way: https://en.wikipedia.org/wiki/XMPP

Why do you need video? Most video conferences are about watching someone's lips move - so you can drop to audio (you still need to deal with the deaf though). With a little work you can turn most video calls into audio only calls and make this greatly easier.

Sometimes writing on a blackboard while talking seems to be the best way to teach. However if you can encourage those who can teach other ways to do so it would be a great help.

No, they're about having a more personal interaction absent an in-person meeting. Body language, facial expressions and even some inflection is lost on phone and completely absent in text.

My company is remote and all of our meetings, internal and with clients, are done by video conference. I wouldn't dream of ditching the video just for a call bridge.

As someone who doesn't read body language well I'm glad for that lack.

Though your point does stand, people do need to work without it.

Totally agree. I find video is ok for initial meetings, but after that I vastly prefer audio-only, for small groups. Works well with inferior wifi connections in coffee shops, etc. Using this for global discussions every day. Currently I use Skype since it works well for audio-only. I prefer zoom for video, for one thing because it usually works well in China, and the 40 minute limit for free calls has been removed now. Also zoom doesn't force the use of a web client; in my experience, reliability, performance and web clients for videoconferencing don't go together. May explain why they are now worth more than several airlines.

There is no 40-minute limit for <=2 persons in a Zoom meeting. Are you sure the limit is removed for larger groups? I don't see this stated on https://zoom.us/pricing .

It's removed for users in China (1). Nota bene: Zoom in China has now been somewhat separated from global Zoom (2), and is now hosted in China. Haven't been on a call with anyone over Zoom in China for a while. I don't think that people in China using Zoom are always able to join Zoom calls elsewhere. Will be verifying this shortly. For me, the only thing that always works to China is WeChat (Weixin) (I'm careful as to what I say on it).

(1) https://www.businessinsider.com/coronavirus-covid-19-spread-...

(2) https://technode.com/2019/09/19/chinas-zoom-users-switch-to-...
