
How to create a video call application with WebRTC - xueyongg
https://blog.phuaxueyong.com/post/2020-06-15-how-to-make-a-video-chat-app/
======
bkanber
I also decided to learn WebRTC and built a video chat app project:
[https://zonko.chat](https://zonko.chat)

The last time I did any p2p networking was back in 2002 or something when you
still had to do it all manually. We used all sorts of fun tricks like NAT hole
punching, and using little script endpoints to capture and forward along port
and public IP address information.

It was fun to see that all of this has since been formalized under the "ICE
framework". I was surprised to see that the STUN spec is only 12 years old
now, despite the techniques involved being used for at least 20 years,
probably more like 30+.

So if anyone who's new to this whole p2p world feels that WebRTC and the ICE
framework are confusing or onerous, I would point out that just a short while
ago these were basically just a handful of heuristic techniques developed
through trial and error over the years. It's really much easier nowadays!
zonko.chat only took me 12 or so hours to build (and seems to be well-
supported by chrome and ff, even mobile).

Edit: Upon reflection, I don't even remember how I learned about some of them.
The concept of TURN was probably one that I, and many thousands of others,
invented from scratch due to necessity (failed to punch the hole? fall back to
this custom relay I wrote in perl). STUN was an easy one to figure out
yourself, too. I don't remember how I learned about hole punching though.
Probably a forum or a book. Or possibly just an experiment ("what if the two
connections touch somewhere in the internet at the same time... hey wait, it
worked?") What's interesting to me is that the core "ICE" concepts (hole
punching, STUN, TURN) are still pretty simple even in their mature,
formalized, scientific form. But the concept of "SIP" is _much_ more
sophisticated today than it was back then.

~~~
vxNsr
> _The last time I did any p2p networking was back in 2002 or something when
> you still had to do it all manually._

> _It's really much easier nowadays! zonko.chat only took me 12 or so hours
> to build_

While it may have only taken you 12 hours to build in "real time", I'd say
you've been "building" it for the last 20 years. My guess is that a newbie
attempting this could expect to spend a few weeks or more on the project.

~~~
bkanber
That's a good point! I did not need to relearn core concepts.

The bulk of that 12 hours was actually spent debugging negotiation and
timing/race issues in the signaling/SIP layer. Even for me, figuring out how
the WebRTC API is supposed to work was a little difficult.
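
One common source of the negotiation races described above is "glare": both peers send an offer at the same time. Below is a minimal sketch of the collision check used in the widely documented "perfect negotiation" pattern. It is not necessarily how zonko.chat handles it, and `NegotiationState` and `decideOnRemoteDescription` are hypothetical names chosen for illustration.

```typescript
// Signaling states as exposed by RTCPeerConnection.signalingState.
type SignalingState =
  | "stable"
  | "have-local-offer"
  | "have-remote-offer"
  | "have-local-pranswer"
  | "have-remote-pranswer"
  | "closed";

// Hypothetical snapshot of one peer's negotiation state.
interface NegotiationState {
  polite: boolean;        // exactly one of the two peers is assigned "polite"
  makingOffer: boolean;   // true while this peer is in the middle of creating an offer
  signalingState: SignalingState;
}

type Decision = "apply" | "ignore";

// Decide what to do with a description that just arrived over signaling.
function decideOnRemoteDescription(
  kind: "offer" | "answer",
  state: NegotiationState
): Decision {
  // A collision happens when an offer arrives while we are offering ourselves.
  const offerCollision =
    kind === "offer" &&
    (state.makingOffer || state.signalingState !== "stable");

  // The impolite peer drops the colliding offer; the polite peer accepts it
  // and gives up its own offer.
  return !state.polite && offerCollision ? "ignore" : "apply";
}
```

Assigning the polite role deterministically (for example, to whichever peer joined second) means both sides resolve a collision the same way without extra signaling messages.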

~~~
vxNsr
I hope I didn't come off as overly aggressive. The comment was mostly for
myself: I was feeling bad that there was no way I would be able to
implement something like this in 12 hours.

Often veterans here will talk about projects they did trivially, but unless
you dig into their profile and find out who they are, it feels like every
joe-shmo on HN is a 1/2mil+ SWE at Google.

~~~
bkanber
No no, that didn't sound aggressive at all! Though I still wouldn't say the
project was trivial, just very limited in scope. And had I done everything
perfectly -- meaning, being able to stream code from my head with no errors --
the project may only have taken 3 hours. That's how small it is. I think the
finished product is 800 lines of code, server and client together. It's
conceptually very small too. (Maybe this is a 'veteran' skill as well, being
able to keep things small.)

My point is that a whole 75% of the time I spent on this project was just me
flailing around with stuff that wasn't working as I expected (that's relatable
at all experience levels!). Perhaps it's true that veterans can get through
certain things more quickly than novices can, but we're not immune to the
80/20 rule either! We just get stuck on different types of problems.

------
figers
WebRTC seems easy when you're creating a proof of concept within your own
network, but once you get into complex situations behind firewalls across the
internet it's a whole different story.

The article mentioned "This section we will just touch and go about when do
you need a TURN server. It is not needed in all situations but a component
needed if you have to deal with slightly less straightway use cases." A TURN
server is a must in the real world...

~~~
mothepro
Not necessarily, TURN servers should only be required for strict NAT. It seems
that 92%* of connections are not behind strict NATs.

*[https://developers.google.com/talk/libjingle/important_conce...](https://developers.google.com/talk/libjingle/important_concepts?csw=1#portssocketsconnections)

~~~
bkanber
STUN fails under symmetric NAT, not strict NAT. That google document makes no
citations of that 92% figure, but I assume that's for desktop traffic only.
Pretty much all mobile/cellular connections would require TURN too.

~~~
cma
I haven't found Verizon or AT&T to need TURN, but maybe clients on them would
if connecting to each other.

------
zumachase
The issue with webrtc is once you step out of the side-project domain, you
have to confront the endless implementation differences between browsers,
whether it's undocumented SDP behavior, different codecs, non-conformant
behavior for low level calls, etc.

We've built a push-to-talk walkie talkie system called Squawk[0] which holds
long lived webrtc connections in the background throughout the day. We use
simplepeer[1] as the base to help bootstrap some of the browser shimming, but
it's not perfect. So ultimately we've had to build all sorts of checks into
our protocols like an audio keepalive where we send periodic frames (20ms) of
silence down the media channel, and verify that we received some additional
header bytes on the remote end, because otherwise webrtc would let the
connections rot and you wouldn't know until you needed them which in a push-
to-talk situation is too late.
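
The liveness check described above can be sketched roughly as follows. This is an illustration of the general idea only, not Squawk's actual implementation: `KeepaliveMonitor` is a hypothetical name, and the byte counter is assumed to be sampled by the caller (in a browser it could come from the `bytesReceived` field of the inbound RTP reports returned by `RTCPeerConnection.getStats()`).

```typescript
// Tracks whether a media channel is still delivering bytes.
// The caller feeds in a monotonically increasing byte counter plus a timestamp.
class KeepaliveMonitor {
  private timeoutMs: number;
  private lastBytes = 0;
  private lastProgressMs: number;

  constructor(timeoutMs: number, startMs: number) {
    this.timeoutMs = timeoutMs;
    this.lastProgressMs = startMs;
  }

  // Record a sample; only an increase in the counter counts as progress.
  observe(bytesReceived: number, nowMs: number): void {
    if (bytesReceived > this.lastBytes) {
      this.lastBytes = bytesReceived;
      this.lastProgressMs = nowMs;
    }
  }

  // True when no new bytes have arrived for longer than the timeout,
  // which suggests the connection has silently rotted.
  isStale(nowMs: number): boolean {
    return nowMs - this.lastProgressMs > this.timeoutMs;
  }
}
```

With silence frames flowing every 20ms, a healthy connection's counter keeps growing, so a stale result is a signal to reconnect before the user presses the talk button.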

[0] [https://www.squawk.to](https://www.squawk.to)

[1] [https://github.com/feross/simple-peer](https://github.com/feross/simple-peer)

~~~
gurjeet
I loved the idea and interface etc. so I registered and downloaded the app.
But upon launching, my corporate (ge.com) proxy blocked it:

    
    
    Not allowed to browse Newly Registered Domains category
    URL: https://app.squawk.to/
    

This is perhaps because your whois records are bare minimum!

    
    
    $ whois squawk.to
    Tonic whoisd V1.1
    squawk noah.ns.cloudflare.com
    squawk kim.ns.cloudflare.com
    

You might want to look into that.

Also, to increase adoption, perhaps give users a direct link to the
app.squawk.to URL as well. I tried that on my personal device and it
works as advertised.

------
hardwaresofton
For those interested in seeing how WebRTC can really scale, check out some of
the media servers/SFUs (Selective Forwarding Units) that are out there:

\- janus[0][1]

\- mediasoup[2][3]

\- Medooze[4][5]

It's never been easier to start your own video streaming platform.

[0]: [https://janus.conf.meetecho.com/](https://janus.conf.meetecho.com/)

[1]:
[https://www.youtube.com/watch?v=zxRwELmyWU0](https://www.youtube.com/watch?v=zxRwELmyWU0)

[2]: [https://mediasoup.org/](https://mediasoup.org/)

[3]:
[https://www.youtube.com/watch?v=_GhdFOZTWTw](https://www.youtube.com/watch?v=_GhdFOZTWTw)

[4]: [https://github.com/medooze/media-server](https://github.com/medooze/media-server)

[5]:
[https://www.youtube.com/watch?v=u8ymYTdA0ko](https://www.youtube.com/watch?v=u8ymYTdA0ko)

~~~
deskamess
Any solutions where WebRTC can be transcoded to an Mpeg2 transport stream?

~~~
hardwaresofton
What you want are transcoding features. Some of the servers offer it (e.g.
Kurento[0], which is not listed above), some don't (e.g. mediasoup[1]), and
some offer recording but you need to wrangle formats yourself (e.g.
janus[2]).

[0]:
[https://www.kurento.org/tags/transcoding](https://www.kurento.org/tags/transcoding)

[1]: [https://mediasoup.org/faq/#does-mediasoup-transcode](https://mediasoup.org/faq/#does-mediasoup-transcode)

[2]:
[https://janus.conf.meetecho.com/recordplaytest.html](https://janus.conf.meetecho.com/recordplaytest.html)

~~~
deskamess
Thank you! Will take a look at Kurento.

------
davidsawyer
A 19-year-old[0] built a video call app[1] with WebRTC and open-sourced[2] it.

Here's a podcast interview[3] about how he did it.

[0]: [https://github.com/ianramzy](https://github.com/ianramzy)

[1]: [https://zipcall.io](https://zipcall.io)

[2]: [https://github.com/ianramzy/decentralized-video-chat](https://github.com/ianramzy/decentralized-video-chat)

[3]: [https://syntax.fm/show/256/webrtc-and-peer-to-peer-video-
cal...](https://syntax.fm/show/256/webrtc-and-peer-to-peer-video-calling-with-
ian-ramzy)

~~~
fictorial
This is an iteration on Twilio Video's example:

[https://www.twilio.com/blog/2014/12/set-phasers-to-
stunturn-...](https://www.twilio.com/blog/2014/12/set-phasers-to-stunturn-
getting-started-with-webrtc-using-node-js-socket-io-and-twilios-nat-traversal-
service.html)

[https://github.com/philnash/video-chat/tree/intro](https://github.com/philnash/video-chat/tree/intro)

------
jpgvm
IPv6 is becoming increasingly common here in Asia (I'm based in Thailand but
travel a lot around Asia). For context I have used 3 different ISPs here and
all have dual stack, both my cellular connections have also been dual stack.

This has meant NAT is less of an issue for native IPv6 endpoints, including
P2P.

Hopefully when IPv6 is finally widespread in US/Europe we will see stuff
taking more advantage of this fact.

~~~
Orphis
It's not just about NAT but firewalls too. I have native IPv6 but you can't
reach my computer directly from anywhere.

------
flyGuyOnTheSly
I'm eager to create a higher quality video broadcasting (not web meeting, one
way only) app for some local yoga studios I help out with and am hoping this
article gives me a push in the right direction.

The audio quality on zoom is just terrible no matter if you disable DSP or
not.

So many yoga classes require high quality music.

It's frustrating that chaturbate provides top notch video and audio quality
for free essentially, while paying $20/mo for zoom gives you what looks like
380p video quality and audio quality I have yet to find a poor comparison
for...

Does anyone know how one could emulate what chaturbate does?

Any good articles outlining how they do what they do?

Ideally, the teacher would just plop their phone down in front of them, hit
broadcast, and a few seconds of buffering later 1080p video and quality audio
would be visible through a browser.

Why is that so tough to do??? I haven't been able to find a single article
that simplifies or distills it at all.

~~~
grayfaced
Zoom getting the music audio through the mic sounds like the real problem. You
should be aiming to stream the audio from a digital source. Then you could have
the song titles overlaid on the video. There are definitely licensing issues,
though the instructors are probably already not using legit licenses for their
classes.

Also a lot of audio codecs are tuned towards speech and filter out high
frequencies. You should pick one meant for music.

~~~
flyGuyOnTheSly
We thought about suggesting spotify playlists while zoom classes are live...
and zoom manages to screw with other apps' audio sources as well somehow.

Give it a try.

Start playing any form of music through your phone.... mp3... youtube...
spotify app... and then open a zoom meeting.

It distorts the sound of the music so much and I can't figure out why they
would do that or what purpose it serves.

Probably just poor coding.

------
talkingtab
I've been looking into webrtc and used the "webrtc samples" which are good in
many ways. It is fairly easy to get something up and running, but I found
several areas that were difficult.

* debugging. One user's sound just doesn't work while it works perfectly for me with different machines. I am clueless as to how to debug it.

* ICE. While it works, I had a hard time understanding, tracking and debugging what was going on.

* closing and restarting connections

* multiple clients in one room?

* echo cancellation. This was frustrating for users.

* TURN. Is there a tool or way to know which clients need a TURN server, or which are using one?

I ended up guessing that getting it to be a product would actually be fairly
time consuming.
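
On the TURN question in the list above: one rough way to see what a client is doing is to look at the `typ` field of the ICE candidates it gathers and selects, where `relay` means the candidate goes through a TURN server. The sketch below parses raw candidate lines as they appear in signaling logs. `candidateType` and `usesTurn` are hypothetical helper names; in a browser the same information is also available as the `candidateType` field on the candidate entries returned by `RTCPeerConnection.getStats()`.

```typescript
// Candidate types defined by ICE:
// host = local address, srflx = discovered via STUN,
// prflx = discovered during connectivity checks, relay = allocated on a TURN server.
type CandidateType = "host" | "srflx" | "prflx" | "relay";

// Extract the type from a raw candidate line, or null if it is not recognisable.
function candidateType(candidateLine: string): CandidateType | null {
  const match = /\btyp (host|srflx|prflx|relay)\b/.exec(candidateLine);
  return match ? (match[1] as CandidateType) : null;
}

// A connection is relayed if either side of the selected pair is a relay candidate.
function usesTurn(localLine: string, remoteLine: string): boolean {
  return candidateType(localLine) === "relay" || candidateType(remoteLine) === "relay";
}
```

Logging this for the selected candidate pair of each call gives a simple measure of how many real users end up relayed.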

~~~
bkanber
WebRTC doesn't do everything for you; it's really just responsible for tying
together ICE with media streams. Signaling is up to you to figure out. For
instance, multiple clients in one room: this is part of the signaling layer
and is not WebRTC's responsibility (I built this into zonko.chat if you want
to see how it works though).

Closing and restarting connections is signaling layer stuff, i.e. your
responsibility.
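
To make the "rooms live in the signaling layer" point concrete, here is a minimal in-memory sketch of the bookkeeping a signaling server might do. It is an assumption about one possible design, not the zonko.chat code, and `RoomRegistry` is a hypothetical name. The idea is that a newcomer is told who is already in the room so that it can start one peer connection per existing member.

```typescript
// Keeps track of which peers are in which room.
class RoomRegistry {
  private rooms = new Map<string, Set<string>>();

  // Add a peer to a room and return the peers that were already there.
  // The newcomer would typically send an offer to each of them.
  join(roomId: string, peerId: string): string[] {
    const members = this.rooms.get(roomId) ?? new Set<string>();
    const existing = Array.from(members).filter((id) => id !== peerId);
    members.add(peerId);
    this.rooms.set(roomId, members);
    return existing;
  }

  // Remove a peer and return the remaining peers, who should be notified
  // so they can close their connection to the departed peer.
  leave(roomId: string, peerId: string): string[] {
    const members = this.rooms.get(roomId);
    if (!members) return [];
    members.delete(peerId);
    if (members.size === 0) {
      this.rooms.delete(roomId);
      return [];
    }
    return Array.from(members);
  }
}
```

How the join and leave notifications travel (WebSocket, long polling, anything else) is up to the application; WebRTC itself has no opinion on it.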

Echo cancellation is really _supposed_ to be application layer and up to you
as well, but I think this will probably shift to be the
browser's/WebRTC's/getUserMedia's responsibility at some point.

Re. TURN: ICE is the process that works out whether a specific client needs to
relay through a TURN server. The question is: do you need to implement a TURN
server? The answer is: yes, you need a TURN server. If you built a P2P app
that you want to work for all users, you will always need a TURN server. You
can run coturn on the same box that you serve your app from. Most likely a
side project will never hit the scale requiring more than a $5 digitalocean
box for TURN.
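
For illustration, the client-side configuration for such a setup might look roughly like the sketch below. The host name and credentials are placeholders, `buildPeerConfig` is a hypothetical helper, and port 3478 is assumed because it is the standard STUN/TURN port that coturn listens on by default. The returned object has the shape that `new RTCPeerConnection(config)` accepts in a browser.

```typescript
// Minimal local copies of the browser's RTCIceServer / RTCConfiguration shapes,
// so the sketch type-checks outside a browser.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy: "all" | "relay";
}

// Build a configuration that offers both STUN and TURN on the same host.
// Setting forceRelay to true restricts ICE to relay candidates, which is a
// handy way to test that the TURN server actually works.
function buildPeerConfig(
  host: string,
  username: string,
  credential: string,
  forceRelay = false
): PeerConfig {
  return {
    iceServers: [
      { urls: `stun:${host}:3478` },
      {
        urls: [
          `turn:${host}:3478?transport=udp`,
          `turn:${host}:3478?transport=tcp`,
        ],
        username,
        credential,
      },
    ],
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}
```

ICE then tries direct and STUN-derived candidates first and only falls back to the relay when nothing else connects.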

And yes, it should not be a surprise that products are time consuming to build
:) WebRTC is plumbing; you probably were expecting something more like Jitsi.

~~~
mebeam
FYI, echo cancellation actually does work (in Chrome, definitely); just make
sure you specify the audio constraint so that it has a sample rate of 16 kHz
(AEC does not work in the default 44/48 kHz modes).
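
A sketch of what such a constraint object could look like is below. `echoCancellation`, `noiseSuppression` and `sampleRate` are standard media track constraints, but whether a given browser honours the requested sample rate, and whether 16 kHz is really required for echo cancellation, is the claim above rather than something this snippet verifies. `buildAudioConstraints` is a hypothetical helper name.

```typescript
// Local copy of the relevant subset of the browser's audio track constraints.
interface AudioConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  sampleRate?: number;
}

// Build constraints for an audio-only capture with echo cancellation requested.
// The sample rate is only included when the caller asks for one.
function buildAudioConstraints(sampleRateHz?: number): {
  audio: AudioConstraints;
  video: boolean;
} {
  const audio: AudioConstraints = {
    echoCancellation: true,
    noiseSuppression: true,
  };
  if (sampleRateHz !== undefined) {
    audio.sampleRate = sampleRateHz;
  }
  return { audio, video: false };
}
```

In a browser the result would be passed to `navigator.mediaDevices.getUserMedia(buildAudioConstraints(16000))`, and the settings actually applied can be checked afterwards with `getSettings()` on the returned audio track.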

~~~
bkanber
Good suggestion, I will have to try that out!

------
perenzo
[https://brie.fi/ng](https://brie.fi/ng) \- a modern pure open source WebRTC
implementation. It can even blur people's backgrounds for visual privacy.
Sources at
[https://github.com/holtwick/briefing/](https://github.com/holtwick/briefing/)

~~~
perenzo
I added a separate entry here
[https://news.ycombinator.com/item?id=23523830](https://news.ycombinator.com/item?id=23523830)

Learn more about the details:
[https://brie.fi/ng#help](https://brie.fi/ng#help) Also see [https://webrtc-
security.github.io/](https://webrtc-security.github.io/)

WebRTC is end-to-end encrypted by default. There is a signaling server that
helps establish the connections between users in a room, but after that the
communication is encrypted. Also those TURN and STUN servers are only required
for technical reasons to get peer-to-peer working. So no content is ever
passed unencrypted.

That's the difference from other services like Zoom and Jitsi, where a server in
the middle is receiving the video streams unencrypted and then redistributes.
Although Jitsi is adding encryption support for that as well soon.

------
xueyongg
Took some time over the weeks to play and figure out WebRTC. Made a simple app
out of it. Do check it out!

~~~
EGreg
We started a simple webrtc app in 2018. Thought it would be simple. Now two
years later we are still tweaking the code and dealing with handshakes and
codecs across browsers, as well as edge cases involving firewalls and what to
do if someone disconnects for longer than the timeout.

Finally we had to invent workarounds for Cordova:
[https://mobile.twitter.com/qbixapps/status/11564841564250398...](https://mobile.twitter.com/qbixapps/status/1156484156425039872)

Did anyone set up a WebRTC that was super easy and worked rock solid with just
a few lines?

~~~
Mulpze15
Aren't those codec exchanges and handshakes managed by ICE? Why would you need
to go into this layer yourself?

~~~
fictorial
One example – H.264 is hardware accelerated on iPhone so one might prefer this
over VP8 which could drain the device's battery pretty quickly when used in a
P2P mesh setup.
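
As a sketch of how an application might express that preference: the helper below reorders a codec list so that one codec comes first while everything else keeps its original order. `preferCodec` and `CodecInfo` are hypothetical names. In a browser the input list would typically come from `RTCRtpReceiver.getCapabilities("video")` and the reordered list would be handed to `RTCRtpTransceiver.setCodecPreferences()`, where supported.

```typescript
// Local copy of the relevant subset of the browser's codec capability shape.
interface CodecInfo {
  mimeType: string;       // e.g. "video/H264", "video/VP8"
  clockRate: number;
  sdpFmtpLine?: string;
}

// Move every entry matching the preferred MIME type to the front,
// preserving relative order within both groups.
function preferCodec(codecs: CodecInfo[], preferredMimeType: string): CodecInfo[] {
  const wanted = preferredMimeType.toLowerCase();
  const preferred = codecs.filter((c) => c.mimeType.toLowerCase() === wanted);
  const rest = codecs.filter((c) => c.mimeType.toLowerCase() !== wanted);
  return preferred.concat(rest);
}
```

Keeping the other codecs in the list, rather than removing them, leaves a fallback for peers that cannot do H.264.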

------
shyamady
Twilio costs money, but it's not a bad idea. I created
Remotehour([https://remotehour.com](https://remotehour.com)) which allows you
to have an 'open-door' policy video call easily. It works with Twilio :)

------
whoatethedonut
This reminds me of Icecomm[0] from a few years back. Unfortunately, it didn't
stick around for too long. It was pretty easy to use, as well, and a lot of
people here ended up in a video chat together[1]. LOL!

[0]:
[https://news.ycombinator.com/item?id=8952880](https://news.ycombinator.com/item?id=8952880)

[1]: [https://medium.com/@icecomm/how-launching-icecomm-on-
hacker-...](https://medium.com/@icecomm/how-launching-icecomm-on-hacker-news-
created-the-most-curious-chat-roulette-ever-731fb22dc072)

------
panpanna
Does anyone have a better tutorial on webrtc?

I didn't find the article particularly good.

------
mcjiggerlog
I have some experience in this from developing
[https://p2p.chat](https://p2p.chat) a while back.

As others have mentioned, building a simple project is fairly simple. The
difficulty comes when you want to scale to more than ~4 users without the app
becoming unusable. Adjusting audio/video constraints to ensure that you get
optimal media streams is quite difficult, also. Never mind dynamically tweaking
them!
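
The scaling problem is easy to see with some arithmetic. In a full mesh every participant sends its stream to every other participant, whereas with a forwarding server each participant uploads once. The sketch below just counts streams under those assumptions; `meshLoad` and `sfuLoad` are hypothetical names, and real bandwidth also depends on resolution, simulcast and so on.

```typescript
interface Load {
  totalConnections: number;    // peer connections in the whole call
  uploadsPerClient: number;    // outgoing media streams per participant
  downloadsPerClient: number;  // incoming media streams per participant
}

// Full mesh: every pair of participants has its own connection.
function meshLoad(participants: number): Load {
  const others = Math.max(0, participants - 1);
  return {
    totalConnections: (participants * others) / 2,
    uploadsPerClient: others,
    downloadsPerClient: others,
  };
}

// Forwarding server: each participant has one connection to the server,
// uploads once, and still receives everyone else's stream.
function sfuLoad(participants: number): Load {
  const others = Math.max(0, participants - 1);
  return {
    totalConnections: Math.max(0, participants),
    uploadsPerClient: participants > 1 ? 1 : 0,
    downloadsPerClient: others,
  };
}
```

At 4 users a mesh client encodes and uploads 3 streams; at 10 users that becomes 9, which is usually where home upload bandwidth and CPU give out.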

------
rergaerg
In real life, the STUN server rarely works, and thus the myth of this peer to
peer utopia was never realised, which is why webrtc did not receive any
attention.
------
ryanrolds
A small group of friends and I are working on a virtual karaoke club using
WebRTC and Go,
[https://github.com/ryanrolds/club](https://github.com/ryanrolds/club). 100%
agree that WebRTC makes it easy to create proof-of-concepts, but there are a
lot of edge cases and browser differences that have to be worked through.

~~~
xueyongg
What are your plans to deal with the scalability issue thus far? I think that
is always the biggest challenge.

~~~
ryanrolds
The current plan is to keep everyone in "groups", think friends at a table,
small (max 6). The server will maintain peer connections with everyone in the
"room" and broadcast the singer via that peering. As the singer changes, the
server will simply allow the KJ to pick who is getting broadcast over the
other server -> client peer.

------
xueyongg
I really learnt so much from the entire thread discussion here today! I've
gathered that WebRTC is oftentimes easy to get started with, but the real
challenge is scalability. From the way it seems, scalability is only possible
with a forwarding architecture that uses a Selective Forwarding Unit or the
like.

I always wonder if there is a way to think outside of this 'box'.

------
cf
One thing I've wanted to make is something like gather.town where I can remix
the audio and video so that different users sounded louder or quieter. But, I
never figured out where in the WebRTC API that is done. It seems like I need
to set up my own SFU and put the necessary logic over there.

~~~
Sean-Der
Are you trying to adjust the volume of the remote audio streams? Can you
change the volume of the `audio` element in the DOM?

I think you could also do it with the WebAudio API. If you throw up a repo
would love to try and help :) having a backend makes it so much harder to
deploy/maintain stuff.

~~~
cf
Yes I want to adjust remote audio streams but I don't see where in the API I
can iterate over each user's audio feed.

~~~
moron4hire
It's not in the WebRTC API. You can change the volume on the audio tags you
create, or you can pipe the audio media stream into a WebAudio graph and
modify it there.
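
For a gather.town-style effect, the per-user volume could be computed from on-screen distance along these lines. This is only a sketch with a simple linear falloff, not Calla's actual algorithm; `gainForDistance`, `Position` and both radius parameters are hypothetical. The result is in the range 0 to 1, so it can be assigned directly to an audio element's `volume` or to a WebAudio `GainNode`'s `gain.value`.

```typescript
interface Position {
  x: number;
  y: number;
}

// Map the distance between listener and speaker to a gain between 0 and 1.
// Inside fullVolumeRadius the speaker is at full volume, beyond silentRadius
// they are muted, and in between the gain falls off linearly.
// Assumes silentRadius is greater than fullVolumeRadius.
function gainForDistance(
  listener: Position,
  speaker: Position,
  fullVolumeRadius: number,
  silentRadius: number
): number {
  const dx = speaker.x - listener.x;
  const dy = speaker.y - listener.y;
  const distance = Math.sqrt(dx * dx + dy * dy);

  if (distance <= fullVolumeRadius) return 1;
  if (distance >= silentRadius) return 0;
  return 1 - (distance - fullVolumeRadius) / (silentRadius - fullVolumeRadius);
}
```

Recomputing this whenever an avatar moves and applying it to each remote user's audio element is enough for the effect, with no server-side mixing involved.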

I've built a number of WebRTC apps over the years. Recently, I built just such
a thing as you described and open sourced it:
[https://www.calla.chat](https://www.calla.chat). I opted to build it on top
of Jitsi Meet this time. It's actually advantageous that it's not through the
WebRTC API because Jitsi doesn't give access to the raw WebRTC commands. But
hijacking the audio elements it creates is completely doable.

~~~
cf
Oh man this is like 90% of what I wanted to make! Thanks for making this open-
source!

------
ronlobo
Pretty cool!

Check out

[https://github.com/meething/meething](https://github.com/meething/meething)

dWebRTC Video Meetings MESH/SFU hybrid using GunDB, MediaSoup and Beyond!

This seems to be one of the most promising projects in the WebRTC space with
support from Mozilla Builders.

------
buboard
Sadly, webrtc p2p does not scale. Sadly, current HTML-based solutions for media
servers are slow and highly CPU intensive. Even more sadly, adobe flash solved
the problem of multi video chat decades ago, but we have decided to deprecate
it without an alternative.

------
julius_set
This article is a strange one. They mention WhatsApp and some other mobile
products but then proceed to frame everything into the context of a browser.

WebRTC works without a browser too FYI

------
kbumsik
Can it be used as a UDP alternative for server-to-client communication, not
browser-to-browser? If so, are there any projects implementing it?

------
gandutraveler
Does webrtc also work on native Android and iOS apps?

~~~
Orphis
There are WebRTC SDKs for Android and iOS.

------
remotists
Most ISPs here in Asia have dual stacks.

------
spicyramen
It's interesting to see the rise again of unified communications. All this
technology, specifically WebRTC, has been around for a few years now. The
innovation is minimal. Why? Most of the problems are solved. When a technology
is mature, most of the focus is on security or on applying other technologies
to improve it, such as Machine Learning. The field of VoIP and video apps has
been maturing since the inception of H323, SIP, SCCP, RTP, sRTP, and most
recently JS and WebRTC.

