I created Broadcast Box originally as a reference server to test OBS against. It made it way easier for people to test my WebRTC/WHIP PRs. Now that people are using it, I am seeing the benefits/excitement more.
* low latency means you have a relationship with your audience. These intimate broadcasts are a new medium.
* Simulcast means it is way cheaper to run a streaming site. No more running ffmpeg/generating transcodes server side.
* AV1/H265/Opus means users with lower bandwidth can now broadcast. Users with enough bandwidth can stream at quality levels they couldn’t before
* UDP gives us IRL/Roaming streams. No custom setup for re-connects.
* Multi-track lets you send multiple video feeds or languages at once
* E2E Encryption means that P2P distribution could be a thing
I think this is a highly underrated comment. It seems that today people won't create for free any more, just for the joy of creating and distributing something you care about. It's always please donate x to y, or subscribe to get access to premium content.
this seems like a reasonable comment on its face, but then it reads selfishly. why should a developer donate time when a house isn’t donated to that programmer?
> mediamtx is a great project! Use w/e works best for your use case.
I agree, I'm just struggling to differentiate what BB offers that MMTX does not, so I can identify if there's a USP. If it's a passion project to scratch a personal itch, that's also great!
Also, can you share your latency measuring methodology?
I think Broadcast Box may have implemented WHIP/WHEP before MediaMTX so it tends to be one of the first results when looking up a WHIP-capable server. Plus it has a public instance so if you want to test WHIP real quick, it's pretty easy to just point OBS somewhere instead.
MediaMTX used to be "rtsp-simple-server" which really undersold what it was capable of (even back then it could ingest RTMP and output HLS and WebRTC).
Overall I think MediaMTX has more features and can do everything that Broadcast Box can.
This is all done client side. OBS sends up multiple renditions, and the server is just in charge of forwarding the requested layer (a sketch of the forwarding idea follows the list). I think this is better in a few ways.
* Lower latency - The extra decode + encode adds enough latency that you start to lose the real-time feel.
* Better Quality - You get generational loss from transcoding. You get better quality by encoding only once. Streaming services also optimize for cost. Having broadcasters control the encoding quality of everything makes for a better experience.
* Security/Trust - Servers shouldn't be able to modify video at all. I would like for broadcast services to eventually offer E2E encryption. It feels wrong to me that a streaming service is able to modify video however it pleases.
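To make that concrete, here is a minimal sketch of selective forwarding with pion/webrtc (which, as I understand it, is the stack Broadcast Box builds on); the rid value and track wiring are illustrative, not Broadcast Box's actual code:

```go
package relay

import "github.com/pion/webrtc/v3"

// forwardLayer relays RTP from one simulcast layer of the broadcaster's
// track to a local track that viewers subscribe to. No transcoding:
// the server only copies packets for the requested rid ("low"/"med"/"high").
func forwardLayer(pc *webrtc.PeerConnection, wantRID string, out *webrtc.TrackLocalStaticRTP) {
	pc.OnTrack(func(remote *webrtc.TrackRemote, _ *webrtc.RTPReceiver) {
		if remote.RID() != wantRID {
			return // a different simulcast rendition; ignore it here
		}
		for {
			pkt, _, err := remote.ReadRTP()
			if err != nil {
				return // broadcaster disconnected
			}
			if err := out.WriteRTP(pkt); err != nil {
				return
			}
		}
	})
}
```

Because packets are only copied, never re-encoded, this kind of relay adds essentially no generational loss or encode latency.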
> It feels wrong to me that a streaming service is able to modify video however it pleases.
This would significantly complicate services such as Twitch and YT Live moving to server-side ad insertion if the source video were E2E encrypted. I think server-side ad insertion is likely the only method available to providers that can't be circumvented by ad-block or DNS blocking plugins.
Why? The advertisements could still be added as separate video layers? Sure, they are also easier to circumvent. But I'd rather support something like this than end up with real-time edited video to insert ads, or even worse have "influencers" promote something without ever having really tried it.
If it's a separate layer or different server you can bet that someone will figure out how to remove it from the DOM, hide it, or prevent it from even loading.
There is by definition a layer of trust between the broadcaster, the server, and the client.
That said, you could use TLS from the server to the client, which is what many services do (since it's actually quite difficult to use http these days for streamed content due to browser policies).
> I would like for broadcast services to eventually offer E2E encryption
Is this even conceptually possible? I suppose you could sign the stream, but if you want to hide it how would you prevent the server from simply adding itself as a viewer?
(also, if you do this, start a countdown for getting raided as a CSAM distributor)
Hi, this looks great, but I tried to do the setup as described in the readme using OBS and streaming to your server, and I saw 3-4 seconds of delay; how exactly can I reach sub-second?
I always wondered why broadcasters don't transcode on the client side.
RTMP can't handle it, but SRT (and apparently WebRTC) can. It would reduce latency for sure. Of course, it does require a good network connection on the client end, but so does streaming.
I feel the opposite. I’ve watched streaming as my primary form of media for over 10 years now, and it seems a majority of the small, “intimate” broadcasters that were fun to watch have had to get real jobs, or are sailing the world, or what have you. To me, it seems like it’s dying in favor of short form doomscroll videos.
> I’ve watched streaming as my primary form of media for over 10 years now
My goodness, you are living in the future. What else are you into? Where can I follow you (blog, twitter, warpcast, etc?)? Love to follow early adopters.
WebRTC is codec agnostic as far as I know, it's up to the peers to negotiate the codecs used. Of course browsers will have more narrow support, but can sometimes activate more codecs via flags or you can use a non-browser client.
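As a rough illustration with pion/webrtc (a non-browser client), registering an extra codec before creating the offer is all the negotiation needs; the MIME string and payload type here are assumptions, so check them against the Pion docs:

```go
package peers

import "github.com/pion/webrtc/v3"

// newH265Peer builds a PeerConnection that can offer H.265.
// Browsers generally won't accept it, but two non-browser peers can.
func newH265Peer() (*webrtc.PeerConnection, error) {
	m := &webrtc.MediaEngine{}
	if err := m.RegisterCodec(webrtc.RTPCodecParameters{
		RTPCodecCapability: webrtc.RTPCodecCapability{
			MimeType:  "video/H265", // assumed MIME string; verify for your Pion version
			ClockRate: 90000,
		},
		PayloadType: 96,
	}, webrtc.RTPCodecTypeVideo); err != nil {
		return nil, err
	}
	api := webrtc.NewAPI(webrtc.WithMediaEngine(m))
	return api.NewPeerConnection(webrtc.Configuration{})
}
```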
People interested in this project might also be interested in Cloudflare's webrtc streaming service¹ as a cloud hosted solution to this same problem. "Sub-second latency live streaming (using WHIP) and playback (using WHEP) to unlimited concurrent viewers." Using the same OBS WHIP plugin, you can just point to Cloudflare instead. Their target pricing model is $1 per 1000 minutes,² which equates to $0.06 per hour streamed.
They mention "$1 per 1000 minutes of video delivered", does that mean per viewer? $0.06 per hour per viewer seems like a lot, although I have no idea if that's "competitive" or in line with actual bandwidth costs.
Per the direct quote in my parent comment, "unlimited concurrent viewers", I can only assume they mean exactly what it says, going by the text at both links ¹⁺²
> Broadcast Box uses WebRTC for broadcast and playback. By using WebRTC instead of RTMP and HLS you get the fastest experience possible.
Nothing in RTMP prevents you from achieving low latency; it's the software stack around it that determines latency for both RTMP and WebRTC. Only HLS has some built-in deficiencies that cause extra latency.
WebRTC is basically just signalling around RTP in a browser-compatible manner. The problem with RTMP is that nobody is using Flash-based video players anymore, so in web browsers you're stuck with the "RTMP in, HLS out" bundle.
If this does what it says it does I'll be a very happy user. Playing RPGs 'together' with somebody over the internet isn't that much fun if they are a second or more behind in what's going on. I've actually looked for a solution to this problem (low-latency P2P streaming) quite some time ago, and couldn't get it to work with just OBS because of strange bugs and other issues, so I really appreciate you including this use-case :)
Steam has this functionality built in, including streaming keyboard and mouse and controllers over the open internet. It's called "remote play together"
There is also the "event" nature of streaming, where the audience is excited to react to things together in real-time, like when people Tweet when watching the Superbowl or a show's season finale. Even for mundane streams, the "chat" will happily talk amongst themselves reacting to whatever the streamer is doing, which can feel like hanging out with friends.
There are also times when the streamer is discussing something that is happening now or just happened (world events, the Superbowl, a game update, etc), where viewers are excited about the content right now and don't want to wait for the traditional record->edit->release cycle.
The very important thing in modern internet media: parasociality with the audience. Only by being live can you have a conversation with "chat", whether that's individual chatters or the hivemind of very large audience streams.
There is a special skill of streamers who can play an action game on one half of the screen and read enough of chat on the other half to make and respond to jokes, at the same time.
Another consideration: youtube and twitch apply different music licensing considerations to live, so almost anything that involves music MUST be live and unarchived.
Some viewers prefer livestreaming, and we see streamers catering to those audiences. It depends on the streamer, but oftentimes there is a much stronger sense of community when there is a live, topic-oriented chat, especially when the streamer engages with the chat. This is not something you can satisfactorily replicate with prerecorded streams.
Streaming is also much cheaper to produce; editing can often represent an unwanted source of complexity and loss of creative control.
Think of the difference between watching a live sporting event or a concert vs watching a recording of the same event. The latter might satisfy you but there's something different about the former.
Live streaming is inherently more interactive, and there's a shared experience simply because you can't speed it up.
I guess you could see that live-streaming is experience-based. VODs are results-based. Not strictly the case but there's a trend.
What's the state of the art in distributing WebRTC to 100k+ clients?
When I was more into the low latency streaming space a few years ago, it felt like WebRTC was there when it came to << 1 second latency, but the infrastructure to actually distribute it was not really. I think Cloudflare (and maybe some other vendors) were working on creating some standard; has it landed? Can I run my own horizontally scalable WebRTC broadcaster (are there open source implementations)?
Something like Low-Latency HLS or CMAF was at like < 5 second latency, but was on the other hand stupidly easy to distribute widely (just static files on a plain old CDN / http server).
I think the state of the art in this space is WHIP/WHEP, which is what this project is implementing. I'm guessing 100k+ clients would be pretty difficult for Broadcast Box (out of the box), however Cloudflare has a beta stream service with the tagline "Sub-second latency live streaming (using WHIP) and playback (using WHEP) to unlimited concurrent viewers." https://developers.cloudflare.com/stream/webrtc-beta/
To build this quickly I would chain Broadcast Box instances. Have it look like an n-ary tree and spin up another Broadcast Box instance when you need it.
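Rough napkin math for that relay tree (all numbers invented): with fanout f per instance you reach f^d viewers at depth d, at the price of one extra forwarding hop of latency per level.

```go
package main

import "fmt"

func main() {
	fanout := 500 // viewers (or child relays) one instance can feed; illustrative
	hopMs := 30   // added latency per relay hop; illustrative
	viewers := 1
	for depth := 1; depth <= 3; depth++ {
		viewers *= fanout
		fmt.Printf("depth %d: up to %d viewers, ~%d ms extra latency\n",
			depth, viewers, depth*hopMs)
	}
	// depth 1: 500 viewers; depth 2: 250,000; depth 3: 125,000,000
}
```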
I'm unaware of WebRTC being used for this sort of "multicast" context, especially with such a high "fanout" -- all I've used/built are p2p contexts like video/data.
Not knowledgeable here, but also interested if someone actually knows the answer to your question...
Microsoft's now-shut-down Mixer (formerly Beam.pro) primarily used WebRTC and had streams with up to 100k viewers at some point, all via WebRTC, but I can't find much info about it anymore now, sadly.
I previously was involved in a service where WebRTC was used for a fairly large number of N streams to M viewers. Like ~75 streams total and 500+ viewers. Solution space was public sector incident command (think police officers with multiple cameras on their body streaming to command center).
you know, almost every time i try to talk with my family on jitsi, there's some kind of glitch. they can't see my screen, or i can't see theirs, or they can see it but only in super low resolution, or they have the camera turned on but i can't see it, or we all get kicked off, or something. can broadcast box allow us to use obs studio (or some other free, open-source software) to stream to each other, without relying on a proprietary server? i don't need 100k+ clients, i'd be satisfied with reliable connectivity between 2–4 clients! and i could run a server outside of nat
i'm not going to get 120ms latency though. i'm in argentina, they're mostly in the usa, and i have 200+ milliseconds of latency over the internet to anything in the usa
if broadcast box isn't what i'm looking for, is there something else? i already know about zoom, google, and teams, but those all make us vulnerable to proprietary servers
Having discovered this 15 minutes ago I can't really speak in absolutes, but firing up OBS and using their test server, I think this would allow you to do exactly what you want. It's trivial to get OBS set up to stream to this, and while I haven't tried it, the docs say that it supports multiple streams to the same stream key. I guess it would just show each stream side by side in the client?
I think you could run your own server, run an instance of the front end, and distribute the instructions to set up OBS to your family.
Lots of options! More and more WebRTC SFUs are adding WHIP support. Check out https://galene.org/ it lets you do OBS in and has things like chat. Might be better if you are streaming to family and want chat. I would try and find a VPS that is best geographically located.
Cloudflare if you want this quick. Even though it is proprietary you have no vendor lock in, just change your WHIP URL in OBS if things aren’t going right.
apparently 'an sfu' is 'a media server component capable of receiving multiple media streams and then deciding which of these media streams should be sent to which participants [of which the] main use is in supporting group calls and live streaming/broadcast scenarios.' according to https://bloggeek.me/webrtcglossary/sfu/
i suspect webrtc implementations in browsers are the source of many of the problems but i don't know how to debug that
i suspect that the problems with jitsi are maybe problems with webrtc in general, because i've experienced very similar things on the proprietary services. my limited experience with obs studio, though, has been flawless, and it seems to be the main daily driver for lots of people who make their living streaming?
looks like that's hosted in DE - might not be the best latency? would try self hosting it on a box in Florida or something to try and get the best connection to you both
As someone unfamiliar with video broadcasting latencies, how does this compare to alternatives? Also, what are the hardware specs used to achieve the 120ms measurement?
Some firms were broadcasting live feeds with sub-second (~200ms) latencies to broad audiences not long after the dot com bust.
There've always been (a majority of) broadcasters that had seconds to half-a-minute (!) delays. Very few understood, or even today understand, how to tune every single touchpoint (including what's safe to do, and what one must not do.)
Having an ultra low latency “video toaster” / broadcast mixer is a critical piece for sure.
For us the motivations to work on this started with "backstage feeds" synced with live TV that needed to look simultaneous to the home cable viewer (think World Wrestling Entertainment) as well as Wall Street calls that needed to be (perceptually) in sync with dial-ins. And of course, Times Square NYE ball drops!
In reality, almost nothing matters that way. For the vast majority of content, the viewer is consuming only your stream, without some more real-time channel at the same time. In most cases, the viewer cares far more about visual quality than about delay.
It drives me crazy that we have had the technical ability to do these things, but just not the demand. I believe it is a chicken/egg problem. It is one of those things that you don't get until you try it yourself.
I hope if more broadcasters (technical and non-technical) realize what is possible they will ask for it more!
> For the vast majority of content, the viewer is consuming only your stream
Agree! I think it is that way because you can't have intimate/interactive streams yet. When it becomes possible I hope to see more streaming to smaller/more connected audiences.
You see it everywhere you look. Audio is my favorite pet example. We hit the limit of human perceptibility in the 1980s. The marketing continued to push for two more decades. Now we have a situation where the average consumer is less educated and the enthusiasts are the least educated of all.
We push technical advance until it hits a cliff in returns. We will never get there. There will be no 10 million mile drivetrain in a car. There will be no multi-generational fridge. There will be no house that withstands a tornado.
Software feels like it should be different because it's "free", but really any advance beyond the cliff is a great gift and not something to take for granted.
Cable has large latency too, in the seconds range, compared to OTA broadcasts.
Even live news has large latency with remote interviews. Some of them are so bad it's uncomfortable watching the anchor have the patience to wait for a response before stepping on the remote feed.
If I'm watching the world cup final penalty shootout and I hear my neighbours cheering before I'm watching the player running up that basically breaks the entire game.
Some streaming services are 30 seconds plus behind OTA, at that level twitter and half a dozen news apps have pushed the fact the goal went in while the ball is still in the other half.
“You could also use P2P to pull other broadcasters into your stream. No special configuration or servers required anymore to get sub-second co-streams.”
I currently have this setup for doing a co-stream with a friend and it is terrible:
1. Friend is running OBS to capture his gameplay.
2. Friend has OBS streaming to a Raspberry Pi I have running at my house.
3. The Raspberry Pi is running nginx configured to accept the RTMP stream.
4. I run OBS on another machine to capture my gameplay, add overlays, etc.
5. My OBS has an input source using VLC to capture the stream from the Raspberry Pi.
The setup is awful. Video is pretty delayed and it often just stops working. I would love to look into this project but after reading through the README, I am unclear how I would use this for my setup. Any pointers?
For now I would do Browser Source (it's easier/no custom builds). In the future, when WHEP support is merged, that is the way to go. If you get stuck, jump in the Discord and I'm happy to help debug! https://discord.gg/An5jjhNUE3
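If it helps to see what's happening under the hood: WHEP playback is just an HTTP POST of an SDP offer that returns an SDP answer. A rough Go sketch; the /api/whep path and stream-key Bearer auth match my reading of the Broadcast Box README, but verify against your instance:

```go
package whep

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// whepOffer POSTs an SDP offer and returns the SDP answer, per the WHEP spec.
// The endpoint path and stream-key auth scheme are assumptions; check your server.
func whepOffer(server, streamKey, offerSDP string) (string, error) {
	req, err := http.NewRequest("POST", server+"/api/whep", strings.NewReader(offerSDP))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+streamKey)
	req.Header.Set("Content-Type", "application/sdp")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusCreated { // WHEP answers with 201 + SDP
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	answer, err := io.ReadAll(resp.Body)
	return string(answer), err
}
```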
So can I use this to record multiple streamers, add commentary and tracker overlays, to stream it back to twitch? I've been wanting to make something like this for a while...
IMO this has the same problem inherent to WebRTC itself, which is that a large percentage of users in the world are behind symmetric or CG NAT, so TURN relay servers are needed, but they are never actually defined by the application because there are no good free ones. Personally I've never been able to use any WebRTC-enabled service for this reason, nor has anyone else I know.
Good, free TURN servers probably don't exist because the protocol is basically open proxy as a service --- who wants to run that without limits?
But something like this is a selective forwarder, so as long as you can find someplace to host it that can get an ip:port to listen on, all your users/compatriots should be able to connect to it, and you're ready to go.
There are free VPN services though. But yes you're right, I forgot to qualify my statement saying that I was referring to bidirectional WebRTC... which I had assumed the author was referring to as well since they explicitly mentioned "WebRTC comes with P2P technology".
My public instance runs on a world addressable host and doesn't have TURN. I have no problem using it from my phone and it is behind a CG NAT. No TURN/Relay servers are needed.
> I've never been able to use any WebRTC-enabled service for this reason, nor anyone else I know
Is this a browser/agent issue? You aren't able to use Google Meet or Zoom in the browser?
It may not use it if you don't explicitly need it, there could be out-of-band detection that adds a TURN server or some other OOB relay if it's needed. I've also seen comments online that say Chrome in particular can support TCP for WebRTC which would negate the need for a relay (as normally only DTLS over UDP is used). But based on my understanding of how WebRTC and NAT works, using the typical UDP approach for bidirectional communication over symmetric or CGNAT absolutely should not work, barring some other method of NAT traversal such as a browser-based UPnP client or such.
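For reference, explicitly configuring a relay is only a few lines in most stacks; here's a pion/webrtc sketch (the TURN URL and credentials are placeholders, not a real service):

```go
package peers

import "github.com/pion/webrtc/v3"

// newRelayedPeer adds STUN plus a TURN relay candidate, which is what
// symmetric-NAT/CGNAT clients need when hole punching fails.
func newRelayedPeer() (*webrtc.PeerConnection, error) {
	return webrtc.NewPeerConnection(webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{
			{URLs: []string{"stun:stun.l.google.com:19302"}},
			{
				// Placeholder relay; you'd run coturn or similar here.
				URLs:       []string{"turn:turn.example.com:3478?transport=udp"},
				Username:   "user",
				Credential: "secret",
			},
		},
	})
}
```

Browsers take the equivalent RTCConfiguration, so the missing piece is usually someone willing to pay for the relay, not the code.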
Broadcasting over WebRTC can be smooth if there is a single source. But in a scenario with multiple sources (take a conference of 100 people as an example), will it be smooth?
Yes. Me clicking the mouse, the software transmitting the mouse click to a datacenter, software rendering the web browser using AVX2, software encoding the stream, sending it to the local browser and decoding it on the screen, shining photons in my eyes, me clicking the button a second time (which also needs to be transmitted to the datacenter), gets me around ~400ms over WebRTC on the reaction time benchmark vs ~200ms on the local computer. I'm not even trying. It's a janky as hell solution that is about to fall apart the moment you look at it funny.
Also, I hate ffmpeg for streams that last longer than a day. The latency creep of streams lasting weeks is horrible.
> We've shown that many measurements of latency [...] ignore the full capture and playback pipeline
In the repo linked in OP is a screenshot showing a wall clock next to its playback on the streaming site -- that's end-to-end to me. So how is this relevant?
Because latency is a distribution and these photos are often selected at the best-case P0 end of all the encode/decode processes whereas actually what matters is the worst case P99.
A proper implementation will make sure the worst-case latency is accounted for and not cherry-pick the best case.
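A tiny sketch of what honest reporting looks like, with invented sample numbers: sort the measured glass-to-glass samples and report the high percentiles, not the prettiest photo.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile does a nearest-rank lookup on an already-sorted slice.
func percentile(sorted []float64, p float64) float64 {
	if p <= 0 {
		return sorted[0]
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	return sorted[rank-1]
}

func main() {
	// Invented glass-to-glass samples in ms; one slow outlier.
	samples := []float64{110, 120, 125, 130, 140, 180, 240, 600}
	sort.Float64s(samples)
	fmt.Printf("best case: %.0f ms, p99: %.0f ms\n",
		percentile(samples, 0), percentile(samples, 99))
	// A screenshot of a wall clock captures the best case;
	// viewers experience the whole distribution.
}
```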
Could you add more to your question? Where did you already do research? What providers are you considering? How much bandwidth is “free” with the tier of VPS you’re paying for? Etc
This is a question that sales reps spend lots of work hours trying to help clients work out.
There's no way that people cannot tell the difference; I can, with various streaming methods from my PC to my Shield/TV, wired, in the same house. Mouse-to-photon latency of a good PC will be in the range of 10-20ms; best case you're doubling or tripling that.
Nah, to be fair it's fine for a lot of games, which are also played on old-gen consoles with terrible gamepad-to-TV latency. Sure, twitchy multiplayer games are definitely not among them. I'm not big on competitive multiplayer, only Rocket League, and I can't do that over local streaming. Pretty much anything else I play is OK though.
You, my dear Internet friend, are confidently expressing your lack of experience. No one who has played multiplayer LAN games, or low-latency Internet games, could or would ever say that streamed gaming, such as the dead Stadia, or Moonlight, whatever, is comparable to the alternative. Nah, they couldn't.
Most online games use client-side prediction, so any input made by the client happens almost instantly on the client and feels really good, and can be rolled back if the server disagrees. If you stream your game remotely with 40ms it will add 40ms to your input, and that just feels bad (not to mention jitter, especially if you're on semi-congested wifi), but it's not unplayable or even that noticeable in many games. Would I play some casual Dota like that? Sure. But not high-ranked games.
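A toy sketch of that prediction-plus-rollback loop (purely illustrative, not any real game's netcode): the client applies its own input immediately, then replays unacknowledged inputs on top of whatever authoritative state the server sends back.

```go
package netcode

type Input struct {
	Seq int
	Dx  float64
}

type State struct{ X float64 }

type Client struct {
	predicted State
	pending   []Input // inputs the server hasn't acknowledged yet
}

// Apply applies an input locally right away, so the player feels zero delay.
func (c *Client) Apply(in Input) {
	c.predicted.X += in.Dx
	c.pending = append(c.pending, in)
}

// Reconcile rolls back to the server's authoritative state, then replays
// unacknowledged inputs on top of it (the "rollback" described above).
func (c *Client) Reconcile(server State, ackedSeq int) {
	c.predicted = server
	kept := c.pending[:0]
	for _, in := range c.pending {
		if in.Seq > ackedSeq {
			c.predicted.X += in.Dx
			kept = append(kept, in)
		}
	}
	c.pending = kept
}
```

Remote game streaming can't do this, which is why a fixed 40ms of video latency feels worse than 40ms of network latency in a well-predicted game.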
yeah, I just feel the lag and everything; even on monitors claiming 1ms I can feel it while playing an FPS, and it is really annoying to me. If the game is not fluid I will not play it.
Cloud gaming is streaming from a server in a data center to one nearby client. Twitch-style live streaming is from a client, to a data center, to a CDN, to multiple clients.
TLDR, there are a lot of moving pieces, but people are working on it at the moment. I try to summarize below what some of the challenges are
Bandwidth requirements are a big one. For broadcasts you want your assets to be cacheable on the CDN and on the device, and without custom edge + client code + a custom media package, that means traditional URLs, each containing a short (e.g. 2s) mp4 segment of the stream.
The container format used is typically mp4, and you cannot write the mp4 metadata without knowing the size of each frame, which you don't know until encoding finishes. Let's call this "segment packaging latency".
To avoid this, it's necessary to use (typically invent) a new protocol other than DASH/HLS + mp4. Also need cache logic on the CDN to handle this new format.
For smooth playback without interruptions, devices want to buffer as much as possible, especially for unreliable connections. Let's call this "playback buffer latency".
Playback buffer latency can be minimized by writing a custom playback client, it's just a lot of work.
Then there is the ABR part, where there is a manifest being fetched that contains a list of all available bitrates. This needs to be updated, and devices need to fetch it and then fetch the next content. Let's call this "manifest rtt latency".
Lastly (?) there is the latency from video encoding itself. For the most efficient encoding / highest quality, B-frames should be used. But those are "lookahead" frames, and a typical 3 frame lookahead already adds ~50 ms at 60fps. Not to mention the milliseconds spent doing the encoding calculations themselves.
Big players are rewriting large parts of the stack to have lower latency, including inventing new protocols other than DASH/HLS for streaming, to avoid the manifest RTT latency hit.
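Putting invented-but-plausible numbers on those buckets shows why segmented HLS/DASH lands in the multi-second range no matter how fast the encoder is:

```go
package main

import "fmt"

func main() {
	const fps = 60.0
	encodeLookahead := 3 / fps * 1000 // 3 B-frame lookahead = 50 ms
	segmentPackaging := 2000.0        // must finish a 2 s segment before publishing
	manifestRTT := 200.0              // fetch updated manifest, then the segment
	playbackBuffer := 3 * 2000.0      // players commonly buffer ~3 segments

	total := encodeLookahead + segmentPackaging + manifestRTT + playbackBuffer
	fmt.Printf("rough glass-to-glass: %.0f ms (~%.1f s)\n", total, total/1000)
	// => ~8250 ms: the protocol, not the encoder, dominates.
}
```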
For HLS you can use MPEG-TS, but mp4 is also an option (with the problem you talk about).
IMO one of the issues is that transcoding to lower resolutions usually happens on the server side. That takes time. If the client transcoded that latency would go away (mostly).
Because there are so many middlemen buffering frames in series, and also because the access circuit between the user terminal and the nearest CDN is jittery too. The latency must be a few times the max jitter for a passable user experience.
Slightly tangential, do you think Moonlight (with say Sunshine) is good enough for proper work? I've used a few "second screens" apps like spacedesk on my iPad but generally when the resolution is good enough for text, it's too laggy for scrolling (and vice-versa).
(For more details, I'm planning to stream from an old laptop after turning it into a hackintosh. I'm hoping staying on the home network's going to help with latency.)
Absolutely yes, especially if you use a GPU with a recent video encoder.
I have all my PCs connected to each other via Moonlight+Sunshine and the latency on the local network is unnoticeable. I code on my Linux workstation from my laptop, play games on the gaming PC from my workstation, etcetera and it is all basically perfect.
Thank you! When you say GPU with a recent video encoder, do you mean them separately (i.e. don't use software/cpu streaming; and use an efficient encoder), or do you mean use a GPU that supports recent encoders? I'm afraid my Intel HD 520 isn't particularly new and likely doesn't support modern encoders.
A GPU with a dedicated encoder for H.264 or preferably H.265/AV1. You can do without it, but you'll spend a core or two on software encoding and the overhead will add a few tens of ms of lag.
With full hardware capture and encode (default on Windows, can require tweaking on Linux) it's virtually free resource-wise.
I wouldn't be surprised to learn that Nvidia is doing exactly that on their cloud: Compressing the video on the GPU using NVENC, building a package around it and then passing it to a NIC under the same PCIe switch (mellanox used to call that peerdirect) and sending it on its way.
The tech is all there, it just requires some arcane knowledge.
"arcane knowledge" is too strong of a phrase. You need someone who is familiar with Nvidia hardware and is willing to write software that only works on Nvidia hardware.
This is premature optimisation. The bus bandwidth and latency needed to get a few Mbps of compressed video to the PC is microscopic. It's completely unnecessary to lock yourself into NVIDIA just to create some UDP packets.
Exactly this, with "…NVIDIA GPUDirect for Video, IO devices are fully synchronized with the GPU and the CPU to minimize wasting cycles copying data between device drivers".[1]