A quick search for "latency" in here has one little hand-wavey blurb about Mux working to optimize HLS.
>Using various content delivery networks, Mux is driving HTTP Live Streaming (HLS) latency down to the lowest levels possible levels, and partnering with the best services at every mile of delivery is crucial in supporting this continued goal.
In my experience, HLS and even LLHLS are a nightmare for latency. I jokingly call it "High Latency Streaming", since it seems very hard to (reliably) obtain glass-to-glass latency in the LL range (under 4 seconds). Usually, latency with cloud streaming ends up at 30+ seconds.
I've dabbled with implementing WebRTC solutions to obtain Ultra Low Latency (<1s) delivery, but that is even more complicated and fragmented with all of the browsers vying for standardization. The solution I've cooked up in the lab with mediasoup requires an FFmpeg shim to convert from MPEG-TS/H.264 via UDP/SRT to MKV/VP9 via RTP, which of course drives up the latency. Mediasoup has a ton of opinionated quirks for RTP ingest too, of course. Still, I've been able to prove out 400ms "glass-to-glass", which has been fun.
I wonder if Mux, or really anyone, has intentions to deliver scalable, cloud or on-prem solutions to fill the web-native LL/Ultra LL void left by the death of Flash. I'm aware of some niche solutions like Softvelum's Nimble Streamer, but I hate their business model and I don't know anything about their scalability.
Hmm, we're getting <200 ms glass-to-glass latency by streaming H.264/MP4 video over a WebSocket/TLS/TCP to MSE in the browser (no WebRTC involved). Of course browser support for this is not universal.
The trick, which maybe you don't want to do in production, is to mux the video on a per-client basis. Every wss-server gets the same H.264 elementary stream with occasional IDRs, the process links with libavformat (or knows how to produce an MP4 frame for an H.264 NAL), and each client receives essentially the same sequence of H.264 NALs but in an MP4 container made just for it, with (very occasional) skipped frames so the server can limit the client-side buffer.
When the client joins, the server starts sending the video starting with the next IDR. The client runs a JavaScript function on a timer that occasionally reports its sourceBuffer duration back to the server via the same WebSocket. If the server is unhappy that the client-side buffer remains too long (e.g. minimum sourceBuffer duration remains over 150 ms for an extended period of time, and we haven't skipped any frames in a while), it just doesn't write the last frame before the IDR into the MP4 and, from an MP4 timestamping perspective, it's like that frame never happened and nothing is missing. At 60 fps and only doing it occasionally this is not easily noticeable, and each frame skip reduces the buffer by about 17 ms. We do the same for the Opus audio (without worrying about IDRs).
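For the client side, the moving parts look roughly like this (a minimal sketch with a made-up message shape and codec string, not our production code):

```typescript
// Sketch only: append per-client-muxed MP4 fragments from the WebSocket into
// MSE, and periodically report how much is buffered ahead of the playhead so
// the server can decide when to skip a frame. The wire format is hypothetical.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

const ws = new WebSocket('wss://example.invalid/stream');
ws.binaryType = 'arraybuffer';

mediaSource.addEventListener('sourceopen', () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"');

  // MSE only accepts one append at a time, so queue incoming fragments.
  const queue: ArrayBuffer[] = [];
  const pump = () => {
    if (!sb.updating && queue.length > 0) sb.appendBuffer(queue.shift()!);
  };
  sb.addEventListener('updateend', pump);
  ws.onmessage = (ev) => { queue.push(ev.data as ArrayBuffer); pump(); };

  // Report the buffer depth (seconds buffered ahead of currentTime) back to
  // the server a few times per second.
  setInterval(() => {
    const b = sb.buffered;
    const ahead = b.length ? b.end(b.length - 1) - video.currentTime : 0;
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(JSON.stringify({ type: 'buffer-report', seconds: ahead }));
    }
  }, 250);
});
```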
In our experience, you can use this to reliably trim the client-side buffer to <70 ms if that's where you want to fall on the latency-vs.-stall tradeoff curve, and the CPU overhead of muxing on a per-client basis is in the noise, but obviously not something today's CDNs will do for you by default. Maybe it's even possible to skip the per-client muxing and just surgically omit the MP4 frame before an IDR (which would lead to a timestamp glitch, but maybe that's ok?), but we haven't tried this. You also want to make sure to go through the (undocumented) hoops to put Chrome's MP4 demuxer in "low delay mode": see https://source.chromium.org/chromium/chromium/src/+/main:med... and https://source.chromium.org/chromium/chromium/src/+/main:med...
We're using the WebSocket technique "in production" at https://puffer.stanford.edu, but without the frame skipping since there we're trying to keep the client's buffer closer to 15 seconds. We've only used the frame-skipping and per-client MP4 muxing in more limited settings (https://taps.stanford.edu/stagecast/, https://stagecast.stanford.edu/) but it worked great when we did. Happy to talk more if anybody is interested.
[If you want lower than 150 ms, I think you're looking at WebRTC/Zoom/FaceTime/other UDP-based techniques (e.g., https://snr.stanford.edu/salsify/), but realistically you start to bump up against capture and display latencies. From a UVC webcam, I don't think we've been able to get an image to the host faster than ~50 ms from start-of-exposure, even capturing at 120 fps with a short exposure time.]
Why even bother with the mp4? For audio sync, or just to use <video> tags?
On the web I got latency down by just sending NALUs and decoding the H.264 with a WASM build of Broadway, but now with WebCodecs (despite some quirks), that's even simpler (and possibly faster too, though that depends on the encoding, B-frames, etc.)
Of course, trying to get the lowest-latency video, I'm not paying attention to sound atm :)
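For reference, here's roughly what the WebCodecs route looks like (a sketch under assumptions: the codec string and the way access units arrive are placeholders for however your encoder and transport are set up):

```typescript
// Feed H.264 access units straight to a VideoDecoder and paint decoded frames
// onto a canvas. Assumes Annex B framing (no `description` in the config);
// AVCC-framed streams would need an avcC `description` instead.
const canvas = document.querySelector('canvas') as HTMLCanvasElement;
const ctx = canvas.getContext('2d')!;

const decoder = new VideoDecoder({
  output: (frame) => {
    ctx.drawImage(frame, 0, 0, canvas.width, canvas.height);
    frame.close(); // release the frame's memory promptly
  },
  error: (e) => console.error('decode error', e),
});
decoder.configure({ codec: 'avc1.64001f', optimizeForLatency: true });

// Called for each incoming access unit (e.g. from a WebSocket or WebTransport).
function onAccessUnit(data: Uint8Array, timestampUs: number, isKeyframe: boolean) {
  decoder.decode(
    new EncodedVideoChunk({
      type: isKeyframe ? 'key' : 'delta',
      timestamp: timestampUs,
      data,
    })
  );
}
```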
Hey, I work in the Product team at Mux, and worked on the LL-HLS spec and our implementation; I own our real-time video strategy too.
We do offer LL-HLS in an open beta today [1], which in the best case will get you around 4-5 seconds of latency on a good player implementation, but this does vary with latency to our service's origin and edge. We have some tuning to do here, but best case, the LL-HLS protocol will get to 2.5-3 seconds.
We're obviously interested in using WebRTC for use cases that require more real-time interactions, but I don't have anything I can publicly share right now. For sub-second streaming using WebRTC, there are a lot of options out there at the moment though, including Millicast [2] and Red5Pro [3] to name a couple.
Two big questions come up when I talk to customers about WebRTC at scale:
The first is how much reliability and perceptual quality people are willing to sacrifice to get to that magic 1 second latency number. WebRTC implementations today are optimised for latency over quality, and have a limited amount of customisability - my personal hope is that the client side of WebRTC will become more tunable for PQ and reliability, allowing target latencies of ~1s rather than <= 200ms.
The second is cost. HLS, LL-HLS etc. can still be served on commodity CDN infrastructure, which can't currently serve WebRTC traffic, making it an order of magnitude cheaper than WebRTC.
It's usually layers of HLS at that. For live broadcasts, someone has a camera somewhere. Bounce that from the sports stadium to a satellite, and someone else has a satellite pulling that down. So far so good, low latency.
But that place pulling down the feed usually isn't the streaming service you're watching! There are third parties in that space, and third-party aggregators of channel feeds, and you may have a few hops before the files land at whichever "streaming cable" service you're watching on. So even if they do everything perfectly on the delivery side, you could already be 30s behind, since those media files and HLS playlist files have already been buffered a couple of times; they can come late or out of order at any of those middleman steps. Going further and cutting out all the acquisition latency? That wasn't something commonly talked about a few years ago when I was exposed to the industry. It got complained about once a year for the Super Bowl, and then fell down the backlog. You'd likely want to own in-house signal acquisition and build a completely different sort of CDN network.
Last I talked to someone familiar with it, the way stuff that cares about low latency (like streaming video game services) does it is much more like what you talk about with custom protocols.
The funny thing is that the web used to have a well-supported low latency streaming protocol… and it was via Flash.
When the world switched away from Flash, we created a bunch of CDN-friendly formats like HLS but by their design, they couldn’t be low latency.
And it broke all my stuff because I was relying on low latency. And I remember reading around at the time — not a single person talked about the loss of a low latency option so I just assumed no one cared for low latency.
Flash "low latency" was just RTMP. CDNs used to offer RTMP solutions, but they were always priced significantly higher than their corresponding HTTP solutions.
When the iPhone came out, HTTP video was the ONLY way to stream video to it. It was clear Flash would never be supported on the iPhone. Flash was also a security nightmare.
So in that environment, the options were:
1) Don't support video on iOS
2) Build a system that can deliver video to iOS, but keep the old RTMP infrastructure running too.
3) Build a system that can deliver video to iOS and deprecate the old RTMP infrastructure. This option also has the byproduct of reduced bandwidth bills.
For a company, Option 3 is clearly the best choice.
edit: And for the record, latency was discussed a lot during that transition (maybe not very publicly). But between needing iOS support and reducing bandwidth costs, latency was a problem that was left to be solved later.
Google puts quite a lot of effort into low-latency broadcast for their YouTube Live product. They have noticed that they get substantially more user retention with a few seconds of latency vs. a minute. When setting up a livestream, there are even choices for the user to trade quality for latency.
That's mostly because streamers want to interact with their audience, and lag there ruins the experience.
What's wrong with WebRTC? Other than it not being simple. In my experience it's supported well enough by browsers.
On the hosting side, you've got Google's C++ implementation, or there's a GStreamer backend, so you can hook it up with whatever GStreamer can output.
In the stuff I'm doing for work, we can get well below 100ms latency out of it. Since Google uses it for Stadia, I'm pretty sure it can do far better than that? What do you need low latency for, what's your use case? Video conferencing? App/Game streaming?
It's just packet switching with much larger packets; the streaming you're thinking of is essentially the same, just with a 16-50 ms sample size rather than 2-10 seconds.
"Streaming" in the media industry just means you don't need to download the entire file before playing it back. The majority of streaming services use something like HLS or DASH that breaks up the video into a bunch of little 2 to 10 seconds files. The player will then download them as needed.
But even then, many CDNs CAN "stream" using chunked transfer encoding.
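To make that concrete, here's roughly what the player's download loop boils down to (a toy sketch against a hypothetical playlist URL; real players handle live playlist refreshes, ABR, byte ranges, and so on):

```typescript
// Fetch an HLS media playlist, pull out the segment URIs (the non-"#" lines),
// and download each ~2-10 s segment in order. A real player would append these
// to a decode buffer as needed rather than collecting them all up front.
async function fetchSegments(playlistUrl: string): Promise<ArrayBuffer[]> {
  const playlist = await (await fetch(playlistUrl)).text();
  const segmentUris = playlist
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith('#')); // skip #EXT tags

  const base = new URL(playlistUrl);
  const segments: ArrayBuffer[] = [];
  for (const uri of segmentUris) {
    const res = await fetch(new URL(uri, base).toString());
    segments.push(await res.arrayBuffer());
  }
  return segments;
}
```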
Having to download the whole file before playing it back is kind of the exception, isn't it?
As the article says, HLS or DASH are specifically about not having to suffer through buffering by auto-dialing quality down, otherwise you can also start viewing during download with the browser <video> tag, over FTP with VLC, or even with peer-to-peer software like eMule or torrents!
I'm not sure what "real" streaming would even be? (It probably wouldn't be over HTTP...)
Yeah, as the sibling comment mentions, these WebRTC implementations do not scale. While you "can hook it up" for hyper-specific applications and use cases, it does not scale to, say, an enterprise where a single SA needs to support LL streaming out to tens of thousands of users.
I imagine the (proprietary) Stadia implementation is highly tuned to that specific use case, with tons of control over the video source (cloud GPUs) literally all the way down to the user's browser (modern Chrome implementations). Plus their scale likely isn't in the tens of thousands from a single origin.
Even still, I continue to be blown away by the production latency numbers achieved by game streaming services.
And my use-case is no use-case or every use-case. I'm just a lowly engineer that has seen this gap in the industry.
Well, clearly it wouldn't work for something with always-unique files like video-conferencing or game streaming, but with a limited number of files we already have an example of a working non-HTTP solution: Popcorn Time.
Also, PeerTube seems to have found a way to combine the cheapness of peer-to-peer and the reliability of an (HTTP?) dedicated server; I wonder how they achieved this?
What makes you write that “these” WebRTC implementations do not scale? Which implementations do you have in mind and why do you think they do not scale? Where do they fall over, and at what point?
Live streaming latency does not jibe well with sports. I’ve since learned to disable any push notifications that reveal what happened 30 seconds prior to my witnessing it. What can be done, at scale, to get us back to the “live” normally experienced with cable or satellite?
> What can be done, at scale, to get us back to the “live” normally experienced with cable or satellite?
Stick with satellite distribution? You're going to have a devil of a time scaling any sort of real-time streaming over an IP network. Every hop adds some latency and scaling pretty much requires some non-zero amount of buffering.
IP Multicast might help but you have to sacrifice bandwidth for the multicast streams and have support all down the line for QoS. It's a hard problem which is why no one has cracked it yet. You need a setup with real-time capability from network ingest, through peering connections, all the way down to end-user terminals.
Good overview of all the parts involved! I was hoping they’d talk a little more about the timing aspects, and keeping audio and video in sync during playback.
What I’ve learned from working on a video editor is that “keeping a/v in sync” is… sort of a misnomer? Or anyway, it sounds very “active”, like you’d have to line up all the frames and carefully set timers to play them or something.
But in practice, the audio and video frames are interleaved in the file, and they naturally come out in order (ish - see replies). The audio plays at a known rate (like 44.1 kHz) and every frame of audio and video has a “presentation timestamp”, and these timestamps (are supposed to) line up between the streams.
So you’ve got the audio and video both coming out of the file at way-faster-than-realtime (ideally), and then the syncing ends up being more like: let the audio play, and hold back the next video frame until it’s time to show it. The audio updates a “clock” as it plays (with each audio frame’s timestamp), and a separate loop watches the clock until the next video frame’s time is up.
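In code, the idea is roughly this (a toy sketch of the clock-driven approach, not lifted from ffplay or any real player):

```typescript
// The audio output path updates a shared clock as samples are played; the
// video path holds each decoded frame until its presentation timestamp is due.
interface Frame { pts: number /* seconds */ }

let audioClock = 0; // PTS of the most recently played audio frame

function onAudioFramePlayed(frame: Frame) {
  audioClock = frame.pts;
}

// Video loop: runs on its own timer (e.g. once per display refresh) and shows
// every queued frame whose timestamp has come up relative to the audio clock.
const videoQueue: Frame[] = [];
function onVideoTick(display: (f: Frame) => void) {
  while (videoQueue.length > 0 && videoQueue[0].pts <= audioClock) {
    display(videoQueue.shift()!);
  }
}
```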
There seems to be surprisingly little material out there on this stuff, but the most helpful I found was the “Build a video editor in 1000 lines” tutorial [0] along with this spinoff [1], in conjunction with a few hours spent poring over the ffplay.c code trying to figure out how it works.
> let the audio play, and hold back the next video frame until it’s time to show it. The audio updates a “clock” as it plays (with each audio frame’s timestamp), and a separate loop watches the clock until the next video frame’s time is up.
Yes... but. They're interleaved within the container, but the encoder does not guarantee that they will be properly interleaved or even that they will be particularly temporally close to each other. So if you're operating in "pull" mode, as you should, then you may find that in order to find the next video frame you need to de-container (even if you don't fully decode!) a bunch of audio frames that you don't need yet, or vice versa.
The alternative is to operate in "push" mode: decode whatever frames come off the stream, audio or video, and push them into separate ring buffers for output. This is easier to write but tends to err on the side of buffering more than you need.
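To make the two modes concrete, a toy sketch against a hypothetical Demuxer interface (nothing like libavformat's real API):

```typescript
// Illustrative only: a made-up Demuxer with readPacket(); real demuxers differ.
interface Packet { streamType: 'audio' | 'video'; pts: number; data: Uint8Array }
interface Demuxer { readPacket(): Packet | null }

// Push mode: take packets in container order and buffer both outputs.
function pushMode(demux: Demuxer, audioOut: Packet[], videoOut: Packet[]) {
  let pkt: Packet | null;
  while ((pkt = demux.readPacket()) !== null) {
    (pkt.streamType === 'audio' ? audioOut : videoOut).push(pkt);
  }
}

// Pull mode: the video path asks for "the next video packet", which may force
// reading (and setting aside) interleaved audio packets it doesn't want yet.
function nextVideoPacket(demux: Demuxer, audioSideBuffer: Packet[]): Packet | null {
  let pkt: Packet | null;
  while ((pkt = demux.readPacket()) !== null) {
    if (pkt.streamType === 'video') return pkt;
    audioSideBuffer.push(pkt); // audio arrived first; keep it for the audio path
  }
  return null;
}
```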
Interesting, I think I just dealt with this problem! I'd heard of the push/pull distinction but had interpreted it as "pull = drive the video based on the audio" and "push = some other way?". I think I saw "pull mode" referenced in the Chromium source and I had a hard time finding any definitive definition of push/pull.
What I was originally doing was "push", then: pull packets in order, decode them into frames, put them into separate audio/video ring buffers. I thought this was fine and it avoided reading the file twice, which I was happy with.
And then the other day, on some HN thread, I saw an offhand comment about how some files are muxed weird, like <all the audio><all the video> or some other pathological placement that would end up blocking one thread or another.
So I rewrote it so that the audio and video threads are independent, each reading the packets they care about and ignoring the rest. I think that's "pull" mode, then? It seems to be working fine, the code is definitely simpler, and I realized that the OS would probably be doing some intelligent caching on the file anyway.
Your mention of overbuffering reminds me, though - I still have a decent size buffer that's probably overkill now. I'll cut that back.
Good top-level summary of an extremely complicated subject.
The containers diagram is surprising since the arrow means "derived from" - the earlier formats are at the top; I initially thought the arrow was in the wrong direction. Containers are kind of a nightmare since there is a lot of hidden knowledge you need. Many of the tools will let you put any video in any container, but only certain combinations actually work properly across the various devices and operating systems you will actually need to use. It's easiest to stick to "H264 + AAC in an MPEG4 container".
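For example, if the codecs are already H.264 + AAC, moving them into an MP4 container is just a remux, no re-encode needed; a sketch using ffmpeg from Node (file names are made up, and it assumes ffmpeg is on PATH):

```typescript
// Copy the existing audio/video streams into an MP4 container without
// re-encoding. "-movflags +faststart" puts the index at the front so the
// file can start playing before it has fully downloaded.
import { spawn } from 'node:child_process';

function remuxToMp4(input: string, output: string) {
  const ff = spawn('ffmpeg', [
    '-i', input,
    '-c', 'copy',
    '-movflags', '+faststart',
    output,
  ]);
  ff.on('close', (code) => console.log(`ffmpeg exited with ${code}`));
}

remuxToMp4('recording.mkv', 'recording.mp4');
```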
HLS is a horrible hack that also has the side effect of making it harder to download and store videos.
While HLS isn't the cleanest protocol (Yay for extensions of the M3U playlist format...), it's actually really good at what it's designed to do - provide reliable video streaming while using HTTP/S over variable networks.
Ultimately, HLS isn't designed for downloading and storing videos, it's designed for streaming.
Came to say this exact thing. HLS and its fancy brother LLHLS aren't storage formats like MP4/FLV are. I think of HLS as a playback format: I'd play a playlist when watching a VOD/livestream, but I'd probably save it as an MP4.
But in reality, it hasn't, right? There are so many tools available that will download a video as a single file for you that it's not even an issue. You should try looking for the Wizard with that strawman.
There's a difference between not being designed for something, and being designed to prevent something.
HLS isn't designed to prevent saving, it's just optimised for streaming.
The GP stated that HLS makes it harder to download and store videos, which is a mostly dubious claim, but even if it were true, in most cases where it’s used this is (if only hypothetically) a benefit, since major stream providers generally don’t want their streams stored. The reply sounds appropriate.
Not to judge the content quality, but I'm curious about the motivation to build this website. Why? Because in recent days we were discussing the poor quality of Google search results, and one argument is that the results are poor because the quality of the content has degraded due to the motivations of the content creators, i.e. optimising everything for SEO, views, likes, etc., guided by analytics.
So how does this work? Is the creator of the website doing SEO here? Will this be sold later? Is it a project for a portfolio? Why are we getting this good-quality content? Will it be surfaced by Google?
Not that I suspect anything nefarious, just curious about how original content (beyond commentary on social media) is made these days. What made that content possible? Why did the creator(s) spend time creating graphics and text and pay for a domain name and probably hosting?
We actually built How Video Works as a side project at Mux [1] (inspired by How DNS Works [2]) - there's a note about it at the top of the page. We have contributions from our own team as well as others in the industry.
Our main motivation is to try to educate on the complexities and intricacies of streaming video. Despite streaming video representing 80+% of internet traffic, it's all underpinned by a fairly small community of engineers, which we're eager to help grow through tools like this and the Demuxed community [3].
Edit: I should also mention that Leandro was kind enough to adapt this content from his amazing Digital Video Introduction [4].
Uh, if you read a bit about Leandro, you'll learn that he's a senior engineer at Grupo Globo in Brazil. I'll leave it to you to discover more about Globo.
Does all the 1.7GB of the decoded video get copied to the GPU? Or is there some playback controller that knows how to read the “delta format” from the codecs and only copies deltas to the framebuffer?
It still blows my mind that we can stream video at 60FPS. I was making an animation app that did frame-by-frame playback and 16.6ms goes by fast! Just unpacking a frame into memory and copying it to the GPU seemed like it took a while.
Smart people chuck the encoded video at the GPU and let that deal with it: e.g. https://docs.nvidia.com/video-technologies/video-codec-sdk/n... ; very important on low-end systems where the CPU genuinely can't do that at realtime speed. Raspberry Pi and so on.
> 16.6ms
That's sixteen million nanoseconds, you should be able to issue thirty million instructions in that time from an ordinary 2GHz CPU. A GPU will give you several billion. Just don't waste them.
Agreed. GPUs support decoding a wide range of codecs (even though you are probably using something like H.264), so it doesn't make sense to waste time decoding the data on the CPU and then piping the raw frames out to the GPU.
You shouldn’t copy the frame data to the GPU (assuming that’s literally what your code was doing).
Instead create a GPU texture that’s backed by a fixed buffer in main memory. Decode into that buffer, unlock it, and draw using the texture. The GPU will do direct memory access over PCIe, avoiding the copy.
The CPU can’t be writing into the buffer while the GPU may be reading from it, so you can either use locks or double buffering to synchronize access.
> Instead create a GPU texture that’s backed by a fixed buffer in main memory. Decode into that buffer, unlock it, and draw using the texture. The GPU will do direct memory access over PCIe, avoiding the copy.
With a dedicated GPU with its own memory there still is usually a memory to memory copy, it just doesn’t have to involve the CPU.
Yeah, like so many other things in the GPU world, main RAM texture storage is more of a hint to the graphics card driver — "this buffer isn't going away and won't change until I explicitly tell you otherwise".
It definitely used to be that GPUs did real DMA texture reads though, at least in the early days of dedicated GPUs with fairly little local RAM. I'm thinking back to when the Mac OS X accelerated window compositor was introduced — the graphics RAM simply wouldn't have been enough to hold more than a handful of window buffers.
What I'm curious about is the actual hardware behind video sharing sites. Like, how can Streamable, Reddit, or Twitter encode such a massive amount of video at scale? Do they have GPU farms? Dedicated encoding hardware that us mortals can't buy? I left out YT on purpose because they have practically endless money to throw at the problem.
Great question! The real answer is it varies, but for H.264, most just encode in software right now, because GPUs are expensive (especially in the cloud), and the failure rates are really high (if you try to build your own). ffmpeg and libx264 are really fast on modern hardware with decent x86 extensions.
It's also worth noting that YouTube now builds its own transcoding chips [1], and AWS just launched dedicated transcoding instances based on Xilinx chips.
Software can be done at utterly stupid multiples of realtime at SD resolutions with only a few cores, depending on your quality target. Cores are very cheap
Fancy GPUs tend to support 8 or more HD streams, even consumer cards using patched drivers.
Then you have dedicated accelerator hardware, these can pack a tremendous amount of transcode into a tiny package. For example on AWS you have vt1 instances which support 8 (or 16?) simultaneous full HD/SD/QHD ladders at 2x realtime for around $200/mo.
In answer to your actual question, at least YouTube selectively transcodes using fancier/more specific methods according to the popularity of the content. They do the cheap thing in bulk and the high quality thing for the 1% of content folk actually watch
In it two guys discuss how to reconstruct an audio file from its image representation, and it turns out to be pretty straightforward.
In the end they are discussing the legal implications of being able to reconstruct audio from an image: if you buy the rights to the image does it give you the right to the audio (probably not!)
But what it makes me wonder is how one could maybe draw sound, or have some kind of generative art program that could be used to first draw a wave and then listen to it. Maybe this has been done already?
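Something like this has certainly been done in various forms, but the basic Web Audio plumbing is simple enough to sketch, assuming the drawn curve has already been turned into sample values in [-1, 1] (the sine wave below is just a stand-in for the drawing):

```typescript
// Put "drawn" samples into an AudioBuffer and play it back.
// (Browsers may require a user gesture before audio can start.)
function playDrawnWave(drawnSamples: number[], sampleRate = 44100) {
  const ctx = new AudioContext({ sampleRate });
  const buffer = ctx.createBuffer(1, drawnSamples.length, sampleRate);
  buffer.getChannelData(0).set(drawnSamples);

  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}

// Stand-in "drawing": one second of a 440 Hz sine wave.
const samples = Array.from(
  { length: 44100 },
  (_, i) => Math.sin((2 * Math.PI * 440 * i) / 44100)
);
playDrawnWave(samples);
```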
This is a really good introduction to how video works. The only thing I find missing from a production OTT environment is DRM. Also, I assume when the author says "http" he means "https"; there are very few providers (I knew of only one, and even they used https termination by Akamai) who use "http" these days for streaming, even with DRM.
If anyone is building an economically viable OTT platform they should consider building their own CDN. Insiders tell me the Singapore based OTT provider HOOQ went bankrupt paying huge bills to AWS and Akamai.
> Also, I assume when the author says "http" he means "https"; there are very few providers (I knew of only one, and even they used https termination by Akamai) who use "http" these days for streaming, even with DRM.
For what the guide is, I think HTTP is accurate enough, as HTTPS is not its own protocol; it's always HTTP + TLS/SSL, so just saying HTTP is good enough. Those who need the distinction will know it. For video editors (and other parts of this guide's target audience), it's more than likely superfluous information.
Also, as far as I know, DRM and HTTPS have different use cases. The DRM we see in browsers is about controlling playback on the client device, not encrypting the content between the media server and the client device, while HTTPS is the opposite. So whether they are using DRM or not doesn't matter when we talk about using HTTPS or not.
As good a thread as any to ask: I have a lot of footage that is recorded continuously over several hours. Most of the time nothing notable happens in the video, but I have a set of timestamps of interesting events. (Think security camera with motion detection events, though this is not that.)
Currently I extract the frame at each interesting timestamp and serve them as JPEGs. I have a web-based playback UI that includes a scrubber and list of events to jump to a particular timestamp.
I would love to upgrade to a real video player that streams the underlying video, allowing the user to replay footage surrounding a timestamp. I have to be able to reliably seek to an exact frame from the event list, though.
I've been looking for something web-based, self-hostable, that doesn't require I transcode my media up front (e.g., break down into HLS chunks). I have few users accessing it at a time so I can transcode on the fly with some caching (though I think it is already H.262 or H.264). Is there anything suitable out there?
What is possible is going to depend a lot on the CPU you have and the media you have.
That said, ffmpeg is going to be the best tool (IMO) to handle this. You may also look at a tool like Vapoursynth or AviSynth if you want to do any sort of preprocessing to the images.
If the video is H.262 (or it is H.264 at an insane bitrate like 50Mbps), I'd encourage transcoding to something not as bitrate-heavy. AV1 and HEVC are two of the best-in-class targets (but require a LOT of computational horsepower... OK, there is also technically VVC, but nothing really supports that).
If time is of the essence, then I'd suggest looking into what sort of codecs are supported by your CPU/GPU. They won't give you great quality but they will give you very fast transcoding. You'll want to target the latest codec possible.
H.264 is pretty old at this point; H.265 (HEVC) or VP9 will do a better job at a lower bitrate if your card supports either. They are also relatively well supported. VP9 is royalty-free.
If your GPU or CPU do not support any recent codec, you might look into the SVT encoders for AV1/VP9, and x264/5 for H.264 or H.265.
All this said, if the codec is fine and at a streamable bitrate, ffmpeg totally supports copying the stream from timeslices. You'll have to play around with buffering some of the stream so you can have ffmpeg do the slicing, but it's not too hard. That's the best option if the stream is streamable (transcoding will always hurt quality).
Oh, and you'll very likely want to compile ffmpeg from source. The version of ffmpeg bundled with your OS is (likely) really old and may not have the encoders you are after. It's a huge PITA, but worth it, IMO. Alternatively you can likely find a build with all the stuff you want... but you'll need a level of trust in the provider of that binary.
Just because a file is encoded as H.264 does not mean it is streamable. The encoding needs to be done in a way that makes streaming realistic.
For the rest of it, I would suggest an ffmpeg solution on a server. You can have it re-encode just the requested times while encoding to a streaming friendly format. There are JS libraries available that allow you to use HLS in the native <video> tag.
I had a similar issue and worked completely around it by using ffmpeg to generate a video file around the point of interest. Instead of serving up the JPEG as in your case, I would serve a small 10s mp4. It was something like (-5s to +5s). Trying to directly stream the file and seek was unreliable for me and I didn't want to get into setting up a full streaming server.
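If it helps, a sketch of that approach (paths and offsets are illustrative; with `-c copy` the cut snaps to the nearest keyframe, so drop the copy and re-encode if you need frame accuracy):

```typescript
// Extract a ~10 s MP4 around an event timestamp using ffmpeg (must be on PATH):
// seek ~5 s before the event, copy 10 s of stream into a standalone file.
import { spawn } from 'node:child_process';

function extractClip(input: string, eventSeconds: number, output: string) {
  const start = Math.max(0, eventSeconds - 5);
  const ff = spawn('ffmpeg', [
    '-ss', String(start),      // fast (keyframe-based) seek before the input
    '-i', input,
    '-t', '10',                // clip duration
    '-c', 'copy',              // no transcode; cut lands on a keyframe
    '-movflags', '+faststart', // playable before fully downloaded
    output,
  ]);
  ff.on('close', (code) => console.log(`clip done, ffmpeg exit ${code}`));
}

extractClip('camera-2024-01-01.mp4', 3600, 'event-3600.mp4');
```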
On the contrary, I think introducing video as RGB is definitely the right choice. RGB is familiar to most people, and is ultimately how things get displayed on your monitor. For digital video, YUV is really just a convenient format for compression reasons, as far as I'm aware.
I do think it would be neat to point out YUV in an aside though. The history behind the format's origins (compatibility with black-and-white TVs) is interesting.
As far as the frame rate goes, I'm willing to bet that more people are aware of 60fps as a standard frame rate for videogames than 24fps as a standard frame rate for film.
That's a really nice introduction. One newbie question: how is this influenced by DRM in the browser? Is it all the same plus some security on top, or do videos with DRM use proprietary codecs and packaging?
DRM'd video content uses the same video codecs and containers, but introduces segment encryption during the packaging phase. In most cases, this encryption is within the audio and video samples rather than on the entire segment. Most content is encrypted using MPEG Common Encryption (CENC) - though there are a couple of variants.
Decryption keys are then exchanged using one of the common proprietary DRM protocols, usually Widevine (Google), PlayReady (Microsoft), or FairPlay (Apple). The CDM (Content Decryption Module) in the browser is then passed the decryption key, so the browser can decrypt the content for playback.
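For a rough idea of what that exchange looks like from the browser side, here's a hedged EME sketch (Widevine key system, hypothetical license server URL; real integrations usually go through a player library or the DRM provider's SDK):

```typescript
// Set up EME: pick a key system, then forward CDM license requests to the
// license server and hand its response back to the CDM for decryption.
async function setUpDrm(video: HTMLVideoElement, licenseUrl: string) {
  const access = await navigator.requestMediaKeySystemAccess('com.widevine.alpha', [
    {
      initDataTypes: ['cenc'],
      videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.64001f"' }],
    },
  ]);
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  // Fired when the player hits encrypted samples in the stream.
  video.addEventListener('encrypted', async (event) => {
    if (!event.initData) return;
    const session = mediaKeys.createSession();
    session.addEventListener('message', async (msg) => {
      // The CDM's license request goes to the license server; the response
      // (containing the keys) goes back into the CDM, never to page JS.
      const res = await fetch(licenseUrl, { method: 'POST', body: msg.message });
      await session.update(await res.arrayBuffer());
    });
    await session.generateRequest(event.initDataType, event.initData);
  });
}
```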
I think the DRM part happens during the packaging phase (https://howvideo.works/#packaging), where they need to signal that the media has DRM and also encrypt the media.