
Mediasoup – WebRTC Video Conferencing - ofrzeta
https://mediasoup.org/
======
manthideaal
To understand the terms WebRTC, STUN, TURN, mesh, SFU, MCU, ICE and trickle
ICE, there is (1): 15 minutes to understand what all this is about. What about
IPv6, STUN and TURN? It seems other people have asked the same question I had
(2), and one of the answers is: As IPv6 takes over the complexity of new
networks, STUN and ICE will become irrelevant. I think that with the surge in
video conferencing and RTC, IPv6 will take off.

In my very humble opinion, I would suggest reserving some address space in
IPv6 for RTC, so that a peer is able to adopt a new special IP reserved for
RTC. Nothing new under the sun: in 2014 someone commented along this line of
thought, see (2) and (3).

So what are we waiting for?

(1) [https://webrtcglossary.com/](https://webrtcglossary.com/)

(2)
[https://www.quora.com/Will-the-IPv6-result-in-the-death-of-S...](https://www.quora.com/Will-the-IPv6-result-in-the-death-of-STUN-and-ICE)

(3) 2014, AshleysBrain,
[https://news.ycombinator.com/item?id=7496986](https://news.ycombinator.com/item?id=7496986)
I think the solution is IPv6. Once every device on the Internet is uniquely
addressable again, we can do away with these NAT hacks and two endpoints
should be able to reliably connect to each other again, no matter where they
are. Of course, that's assuming we don't get more short-sighted engineering
that breaks things again...

~~~
supermatt
addressable !== routeable !== reachable.

IPv6 will certainly REDUCE the need for STUN, but there are still (many) cases
where you don't want to be "reachable by default", in which case you need a
stable reference for negotiating routing and reachability (e.g. STUN).

~~~
manthideaal
IPv6 could have an address for "reachable by default". I would find that very
useful, and IPv6 allows so many addresses that it would not be wasteful.

~~~
supermatt
Yeah, but if it’s reachable by default then it’s (by definition) open to the
world. Otherwise (if you mean routable by default) you will still end up
temporarily punching holes in your firewall, which you will need to close
afterwards, and possibly recycling your ip so you aren’t still routable on
that last used address. Sounds like you would personally end up being a STUN
server!

------
imtringued
Can this be used as an alternative to Kurento? I use an ffmpeg command to send
an RTP stream to Kurento which then broadcasts that to all clients that
connect via WebRTC. I've had to rewrite the code that interacts with Kurento
twice already. It doesn't feel like a rock solid solution. e.g. when you
accidentally switch audio and video ports in the ffmpeg command the entire
kurento instance crashes. I can't expose Kurento to untrusted instances
otherwise there is a risk of DOS. I don't feel like running a whole Kurento
instance per ffmpeg command is a sustainable solution.

It might not be obvious but I am not using WebRTC for video conferencing. My
use case is basically regular live streaming but with latencies in the 100ms
range.

Edit: I just compared the docs
[https://raw.githubusercontent.com/versatica/mediasoup/v3/art...](https://raw.githubusercontent.com/versatica/mediasoup/v3/art/mediasoup-v3-architecture-01.svg?sanitize=true)
vs
[https://doc-kurento.readthedocs.io/en/6.13.0/features/kurent...](https://doc-kurento.readthedocs.io/en/6.13.0/features/kurento_api.html#endpoints)
This is exactly what I need!

------
AnonC
This website could be a lot more useful if it answered these questions:

* How good is this compared to other alternatives? (Or why choose this instead of something else?)

* Where are the client apps for this, more importantly on mobile?

~~~
ofrzeta
Sure, the website could be better. Here's a demo
[https://v3demo.mediasoup.org/](https://v3demo.mediasoup.org/)

And here's the source code for the demo:
[https://github.com/versatica/mediasoup-demo/](https://github.com/versatica/mediasoup-demo/)

Libs for native apps:
[https://github.com/haiyangwu/mediasoup-client-android](https://github.com/haiyangwu/mediasoup-client-android)

[https://github.com/ethand91/mediasoup-ios-client](https://github.com/ethand91/mediasoup-ios-client)

~~~
anovikov
Libs for native apps suck. But, it is compatible with React Native and works
fine there.

In general, I recommend it. The only media server that I saw working (slightly)
better is Jitsi, but it is 10x more cumbersome and time-consuming to learn.

~~~
deepinder10
making a single customisation in jitsi is a cumbersome task. You have to
accept it how it is.

------
deepinder10
The performance mentioned on webrtchacks seems promising. I was planning to
go with OpenVidu but now I am more inclined towards mediasoup. My only
concern: please provide simple tutorials with a demo and code explanation
(it would be 30 lines of code) so that beginners like me can easily understand
it. If you check out OpenVidu, they provide demos for every use case with
explanations. It is hard for a beginner to understand all this stuff, and there
isn't a JS v3 broadcasting demo available. Also, if you could mention some
performance results on various machines like c5.large, c5.xlarge, c5.2xlarge
with the number of participants, it would be helpful. Thanks a lot.

------
ac130kz
"Cutting-edge" implies that these guys managed to apply secure E2E encryption,
but I can't find any references to it on the website. Correct me if I'm wrong.

~~~
supermatt
You are wrong. This is the current state of webrtc.

Without considerable hacks (patent-encumbered unaccelerated wasm ffmpeg
encoding of pixel data, rolling your own SRTP-like encrypted stream over
datachannels and full mesh distribution of keys) this is not currently
possible for anything other than a full mesh. Whenever an SFU (effectively a
MITM) is involved (needed for more than a small number of participants) e2e
encryption is lost.

If/when Insertable Streams are commonplace, this will be possible without so
many hacks.

Edit: I see I was downvoted for correcting you, but there was an article on
this very subject yesterday:
[https://webrtchacks.com/you-dont-have-end-to-end-encryption-...](https://webrtchacks.com/you-dont-have-end-to-end-encryption-e2ee/)

~~~
fulafel
The video transmitted over webrtc srtp is natively dtls encrypted between
peers.

~~~
supermatt
Yes, that is the "full mesh" that I mention in my comment...

This platform - and jitsi/janus/zoom/whatever - all use an SFU for more than a
handful of participants.

~~~
fulafel
But thats very doable. And no need for the unaccelerated ffmpeg wasm afaics,
and no need to use a "sfu".

I don't know what Mediasoup does, was just commenting on what webrtc can do re
the "the current state of webrtc" subject.

~~~
supermatt
Do you have any examples of a webrtc video conferencing platform that provides
a sufficient user experience over full mesh with 4 or more participants on
disparate internet connections?

I'm just saying that this library (and similar platforms) ARE the cutting edge
of WebRTC. At a small number of participants, they use full mesh (which is
e2ee), but at larger scale, they need to use an SFU (which is not e2ee without
jumping through some crazy hoops - but this is being worked on).
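
To make the scaling argument concrete, here is a rough sketch of how the stream counts grow in a full mesh versus an SFU. It assumes every participant publishes exactly one stream and ignores simulcast; the function names are made up for illustration.

```python
def mesh_streams(n: int) -> dict:
    """Stream counts for a full mesh of n peers (one published stream each)."""
    return {
        "uploads_per_peer": n - 1,       # one copy sent to every other peer
        "downloads_per_peer": n - 1,     # one stream received from every other peer
        "total_connections": n * (n - 1) // 2,  # one peer connection per pair
    }


def sfu_streams(n: int) -> dict:
    """Stream counts when n peers all publish to a single SFU."""
    return {
        "uploads_per_peer": 1,           # a single copy sent to the SFU
        "downloads_per_peer": n - 1,     # the SFU forwards everyone else's stream
        "total_connections": n,          # one connection per peer, to the SFU
    }


for n in (2, 4, 8):
    print(n, "mesh:", mesh_streams(n), "sfu:", sfu_streams(n))
```

The upload side is what usually hurts on asymmetric connections: in the mesh case it grows with the number of participants, while with an SFU it stays at one.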

~~~
fulafel
No (but i hope they exist, havent looked) - however not all webrtc apps are
general videoconf platforms. You don't even need servers if you go full p2p.

~~~
supermatt
You still need servers to handle the signalling, perform ICE and TURN where
applicable.

~~~
fulafel
Only if you need to support users without full internet connectivity. Again
webrtc can be used in many different requirement contexts.

~~~
supermatt
You can certainly manually handle the signalling to set up the offer/answer.
e.g. read them out to your peer(s).

You would still need some form of STUN server (but there are a number of
'freely accessible' ones, even configured as defaults in some browsers) to get
your reachable address/port/proto. You cant, AFAIK, handle this manually - it
is handled internally as part of ICE. Then you would need to manually handle
the signalling of these as well.
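
For anyone trying the manual route, the address/port/proto information travels as ICE candidate lines. A minimal sketch of reading one, assuming the standard `candidate:` attribute layout (foundation, component, transport, priority, address, port, `typ`, type); the sample lines use documentation addresses and are invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class IceCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    typ: str  # "host", "srflx", "prflx" or "relay"


def parse_candidate(line: str) -> IceCandidate:
    """Parse the mandatory fields of an ICE candidate line."""
    # Accept both the SDP form ("a=candidate:...") and the bare form.
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not an ICE candidate line")
    fields = line[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    return IceCandidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        typ=fields[7],
    )


# Invented sample lines: a local (host) candidate and a server-reflexive
# (srflx) candidate, i.e. the public mapping learned from a STUN server.
EXAMPLE_LINES = [
    "candidate:1 1 udp 2130706431 192.168.1.10 54321 typ host",
    "a=candidate:2 1 udp 1694498815 203.0.113.7 61000 typ srflx "
    "raddr 192.168.1.10 rport 54321",
]

for example in EXAMPLE_LINES:
    print(parse_candidate(example))
```

Optional trailing attributes such as `raddr` and `rport` are ignored here.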

Thats the bare minimum you have to do to peer over webrtc under ideal
circumstances, but its certainly doable.

So you do that for each peer. If you (or a peer) change streams you will do
the same thing all over again

~~~
fulafel
STUN is not needed if the participants have full internet connectivity, only
if you have to work around NAT.

~~~
supermatt
As mentioned, you cant manually set up your listening ports - this is handled
internally by ICE, which needs to connect to a STUN server to get (at a
minimum) your public IP address.

edit: cant reply to your comment fulafel, but on that project you posted:
[https://github.com/cjb/serverless-webrtc/blob/master/serverl...](https://github.com/cjb/serverless-webrtc/blob/master/serverless-webrtc.js#L25)
Also, note that even if this wasnt defined, some browsers contain defaults. You
100% need STUN, but you can handle the signalling manually - as I stated.

edit2: cant reply to your comment ibc, but I was explaining the bare minimum
'serverless' webrtc case still required a STUN server. I appreciate that
mediasoup SFU uses ice-lite instead.

~~~
fulafel
I know I've seen at least data channel working without STUN, this was the demo
I think:
[https://github.com/cjb/serverless-webrtc/](https://github.com/cjb/serverless-webrtc/)

If you say it's different with media channels, I'll believe you.

WebRTC can tell you your ip address without any external server, that only
breaks if you are dealing with NAT.

edit: to clarify the NAT-less use case, i'm thinking of apps that can rely
on/require p2p supporting IPv6 connectivity.

~~~
ibc
And somehow this discussion became a general topic about WebRTC scenarios.
They do exist, yes, but mediasoup is a SFU scenario. No STUN is required at
all. TURN may be needed if the client network/router blocks UDP.

------
sandGorgon
any comparisons with jitsi ? really would like to understand how these
technologies stack against each other.

jitsi has been production tested far longer i suppose, through its freely
available videoconferencing service
[https://meet.jit.si/](https://meet.jit.si/)

~~~
ibc
mediasoup co-author here.

Comparing Jitsi with mediasoup is like comparing Netflix (backend + apps) with
Express.js + libcurl.

Jitsi developers could replace their RTC core internals (including the SFU)
with mediasoup + mediasoup-client and you wouldn't even notice. Hope this
helps.

~~~
sandGorgon
Thanks for this. Do you plan to go higher in the stack and release components
to get-started-quickly ?

~~~
ibc
Not at all.

------
ibc
Hi, mediasoup co-author here.

TL;DR: Pornhub uses mediasoup.

I've read many comments here asking about "how mediasoup is different than
XXX" or about "mobile apps". I think the Overview in the website should be
self explanatory, I'll just paste a fragment here:

[https://mediasoup.org/documentation/overview/](https://mediasoup.org/documentation/overview/)

Design goals of mediasoup and its client side libraries:

* Be a SFU (Selective Forwarding Unit).
* Support both WebRTC and plain RTP input and output.
* Be a Node.js module in server side.
* Be a tiny JavaScript and C++ libraries in client side.
* Be minimalist: just handle the media layer.
* Be signaling agnostic: do not mandate any signaling protocol.
* Be super low level API.
* Support all existing WebRTC endpoints.
* Enable integration with well known multimedia libraries/tools.

Use cases:

* Group video chat applications.
* One-to-many (or few-to-many) broadcasting applications in real-time.
* RTP streaming.

~~~
bryanrasmussen
I read the whole thing. I am really confused as to why out of that you chose
"Pornhub uses mediasoup" as the TLDR?

~~~
ibc
Just to easily explain that mediasoup is not a replacement for Jitsi or Zoom,
but a low level set of libraries for building different kinds of real-time
applications, including multi-party videoconference apps (such as Jitsi or
Zoom) and others that are completely different.

------
pkstn
Does it do TURN/STUN?

~~~
ibc
mediasoup is a SFU that must be deployed in a reachable server, so STUN is not
needed at all. You may need a TURN server if a client has a restrictive
firewall that blocks UDP. mediasoup is not a TURN server but you can deploy a
TURN server (i.e. coturn) in your backend.

~~~
supermatt
Are there plans to support a TCP candidate so a TURN server isn't needed at
all? It feels a bit wasteful to effectively use a TURN server as a TCP->UDP
proxy for a publicly accessible server.

~~~
ibc
We have supported TCP ICE candidates for a long time:

[https://mediasoup.org/documentation/v3/mediasoup/api/#WebRtc...](https://mediasoup.org/documentation/v3/mediasoup/api/#WebRtcTransportOptions)

> TCP candidate so a TURN server isn't needed at all?

This is not true. A router may still block TCP traffic other than TLS, or
traffic that does not have destination port 80 or 443. So ICE TCP candidates
do not avoid the need for a TURN server in certain cases.

~~~
supermatt
Maybe I could allocate a port to use for an additional low priority tcp
candidate via configuration, or would I need to dive into the code for this?

For example, I could supply the generated udp and tcp candidates in addition
to a tcp:443?
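
For context on what "low priority" means here: the ICE specification (RFC 8445) recommends computing a candidate's priority from a type preference, a local preference and the component ID. The sketch below implements that recommended formula; the lower local preference used for the TCP candidate is an arbitrary illustrative value, not something taken from mediasoup.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(typ: str, local_preference: int, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    if not 0 <= local_preference <= 65535:
        raise ValueError("local preference must be in 0..65535")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be in 1..256")
    return (TYPE_PREFERENCE[typ] << 24) + (local_preference << 8) + (256 - component_id)


# A UDP host candidate with the highest local preference.
udp_host = candidate_priority("host", 65535)

# Hypothetical: give a TCP host candidate a lower local preference so it is
# tried after UDP. The value 32767 is an assumption for illustration only.
tcp_host = candidate_priority("host", 32767)

# A relayed (TURN) candidate sorts below both.
relay = candidate_priority("relay", 65535)

print(udp_host, tcp_host, relay)
```

Priority only affects the order in which candidate pairs are checked. It does not help if a firewall drops the traffic on that port in the first place.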

What are your thoughts?

~~~
ibc
You cannot select a specific listening port for a specific transport, because
each WebRTC transport requires, at least, a different listening port in the
server:

[https://mediasoup.org/documentation/v3/mediasoup/api/#WebRtc...](https://mediasoup.org/documentation/v3/mediasoup/api/#WebRtcTransportOptions)

If you want to listen on TLS 443 for all clients, add a TURN server into
your backend. Just that.

~~~
supermatt
Is there a reason for the restriction of one connection per port? I would have
thought you would be able to use the same port for each peer source ip/port
tuple?

Not doubting you - but I never experienced this limitation with other
client/server applications. I have an http server serving over 200k concurrent
websockets on port 443, for example.

I'm happy to help out with this if I can.

~~~
ibc
The majority of RTP media servers listen on a separate port for each
connection. That's how RTP typically works. These are not TCP connections.

~~~
supermatt
rfc3550 states that it is per destination ip/port tuple. So you should be able
to support multiple connections per local port. Is it possible this is an
oversight in the current implementation? I appreciate this isn’t TCP, which is
why I have just read through all relevant RFCs.

~~~
ibc
Why is that so important? As I said, choosing a specific port is not enough.
This is not TLS. An aggressive firewall may drop those TCP connections because
there is no TLS data on them.

~~~
supermatt
The TLS port was just a thought, as I want to reduce cases where a TURN server
is used because of a scalability limitation (65k connection limit per TURN
server due to a shared source IP). But our discussion has raised another issue
regarding mediasoup's limitation of one source per local port - which compounds
the issue.

I’m replacing a WebSocket server with a data channel server. If I use
mediasoup then I will need to listen on 4 IPv4 addresses to support the 200k
clients I can currently support on 1 IP address with WebSockets. Not a huge
deal right now, but if I want to support millions of users it means managing 40
or so IP addresses instead of 1 or 2.
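
The arithmetic behind those numbers, as a quick sketch. It assumes one local port per client and that all 65,535 ports on an address are usable, which is optimistic; the 2.6 million figure is just an example chosen to land on "40 or so".

```python
import math

# Assumption: ports 1..65535 are all available on each IP address.
PORTS_PER_IP = 65_535


def addresses_needed(clients: int, ports_per_ip: int = PORTS_PER_IP) -> int:
    """IP addresses required when each client occupies one local port."""
    return math.ceil(clients / ports_per_ip)


for clients in (200_000, 1_000_000, 2_600_000):
    print(f"{clients:>9} clients -> {addresses_needed(clients)} IP address(es)")
```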

Not knocking mediasoup at all, just now aware of a limitation that sounds like
it doesn’t need to exist so seeing if we can do something about it.

~~~
ibc
This is RTP, not WebSocket or HTTP. Media servers need a separate port for each
RTP communication. A hack could be done to make all WebRTC endpoints use a
single port on the mediasoup side. However, mediasoup also supports plain RTP
endpoints and, in those, you need to be ready to listen for RTP from _any_
remote IP:port (you don't know it in advance due to NATs). In WebRTC we can
use ICE user/pwd (previously given to the server via signaling) but that's not
possible with plain/regular RTP (no ICE).
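
A rough sketch of what such a single-port hack could look like for WebRTC endpoints. This is not mediasoup's implementation; the class and method names are hypothetical. It relies on the ICE rule that the STUN USERNAME in a connectivity check has the form `<receiver ufrag>:<sender ufrag>`, and it skips the MESSAGE-INTEGRITY check against the ICE password that a real server must perform.

```python
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]  # (remote IP, remote port)


class SinglePortDemux:
    """Hypothetical demultiplexer for many WebRTC transports on one local port."""

    def __init__(self) -> None:
        self._by_ufrag: Dict[str, str] = {}  # server-side ICE ufrag -> transport id
        self._by_addr: Dict[Addr, str] = {}  # learned remote tuple -> transport id

    def register(self, local_ufrag: str, transport_id: str) -> None:
        """Called when signaling hands out ICE credentials for a new transport."""
        self._by_ufrag[local_ufrag] = transport_id

    def on_stun_request(self, remote: Addr, username: str) -> Optional[str]:
        """Bind the remote tuple to a transport using the STUN USERNAME."""
        local_ufrag = username.split(":", 1)[0]
        transport_id = self._by_ufrag.get(local_ufrag)
        if transport_id is not None:
            # NOTE: a real server verifies MESSAGE-INTEGRITY before trusting this.
            self._by_addr[remote] = transport_id
        return transport_id

    def on_packet(self, remote: Addr) -> Optional[str]:
        """Route a later DTLS/RTP packet by its already-learned remote tuple."""
        return self._by_addr.get(remote)


demux = SinglePortDemux()
demux.register("srvA", "transport-1")
client = ("198.51.100.1", 4000)
print(demux.on_packet(client))                      # unknown tuple, nothing to route to
print(demux.on_stun_request(client, "srvA:cli1"))   # ICE check identifies the transport
print(demux.on_packet(client))                      # now routable
```

With plain RTP there is no STUN step, so the first packet from an unknown IP:port has nothing to identify it. That is the case where a dedicated port per endpoint does the identifying.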

~~~
supermatt
Isn’t that what the SSRC is for? I.e you use the SSRC (sent as part of media
in the SDP) to identify the stream, rather than trusting an authentic stream
is the only one to send to an open port? At least, that is how I understood
the (multiple) rfcs. Not an expert here by any means.

~~~
ibc
In WebRTC spec (although not super mandatory but the current way to go), the
client no longer signals its sending SSRCs into the SDP but a MID and optional
RID values (if simulcast is in use), and those MID and RID are not supposed to
be unique across all participants (not at all, but neither SSRCs are supposed
to). Those MID and RID values are signaled in the SDP and then included into
RTP packets as header extensions. The remote matches RTP packets based on them
and then learns the associated SSRC for a faster lookup for future packets.
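
The matching described above can be sketched as a small lookup table: route by MID/RID the first time, then remember the SSRC. The class and stream names are hypothetical, and real RTP header extension parsing is left out; the MID and RID are passed in as already-decoded values.

```python
from typing import Dict, Optional, Tuple


class RtpDemux:
    """Hypothetical receiver-side demux: MID/RID first, learned SSRC afterwards."""

    def __init__(self) -> None:
        # (MID, RID) pairs announced in the SDP -> local stream name.
        self._by_mid_rid: Dict[Tuple[str, Optional[str]], str] = {}
        # SSRCs learned from packets that carried MID/RID header extensions.
        self._by_ssrc: Dict[int, str] = {}

    def expect(self, mid: str, rid: Optional[str], stream: str) -> None:
        """Register a stream announced via signaling (RID is None without simulcast)."""
        self._by_mid_rid[(mid, rid)] = stream

    def route(self, ssrc: int, mid: Optional[str] = None,
              rid: Optional[str] = None) -> Optional[str]:
        """Return the stream for a packet, learning its SSRC on first match."""
        stream = self._by_ssrc.get(ssrc)
        if stream is not None:
            return stream          # fast path: SSRC already known
        if mid is None:
            return None            # no header extension and unknown SSRC
        stream = self._by_mid_rid.get((mid, rid))
        if stream is not None:
            self._by_ssrc[ssrc] = stream
        return stream


demux = RtpDemux()
demux.expect("0", "hi", "video-high")
demux.expect("0", "lo", "video-low")
print(demux.route(1111))                      # SSRC not learned yet
print(demux.route(1111, mid="0", rid="hi"))   # matched via MID/RID
print(demux.route(1111))                      # matched via learned SSRC
```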

Anyway, WebRTC is not just about RTP. In fact, before RTP happens, ICE and
DTLS must be done.

~~~
supermatt
Many thanks. Sorry to be a bother :) I'll have a play around and see what I can
get working for my use case.

