Apple’s Low Latency HLS differs from the community-developed solution (mux.com)
178 points by UkiahSmith 3 months ago | 106 comments

I've recently been tasked with finding a live video solution for an industrial device. In my case, I want to display video from a camera on a local LCD and simultaneously allow it to be live streamed over the web. By web, I mean that the most likely location of the client is on the same LAN, but this is not guaranteed. I figured this has to be a completely solved problem by now.

Anyway, so I've tried many of the recent protocols. I was really hoping that HLS would work, because it's so simple. For example, I can use the gstreamer "hlssink" to generate the files, and basically deliver video with a one-line shell script and any webserver. But the 7 second best case latency is unacceptable. I really want 1 second or better.

I looked at MPEG-DASH: it seems equivalent to HLS. Why would I use it when all of the MPEG-DASH examples fall back on HLS?

I looked at WebRTC, but I'm too nervous to build a product around the few sample client/server code bases I can find on GitHub. They are not fully baked, and then I'm really depending on a non-standard solution.

I looked at Flash: but of course it's not desirable to use it these days.

So the solution that works for me happens to be the oldest: Motion JPEG, where I have to give up on good video compression (MPEG). I get below 1 second of latency and no coding (I use ffmpeg + ffserver). Luckily Internet Explorer is dead enough that I don't have to worry about its lack of support for it. It works everywhere else, including Microsoft Edge. MJPEG is not great in that the latency can be higher if the client can't keep up. I think WebRTC is likely better here.
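For anyone curious what MJPEG-over-HTTP actually involves: it's a single multipart/x-mixed-replace response where each part is a complete JPEG, and the browser repaints on every part. A minimal Python sketch of the framing (my illustration, not the commenter's ffserver setup):

```python
def mjpeg_frame(jpeg_bytes, boundary=b"frame"):
    """Wrap one complete JPEG as a multipart/x-mixed-replace part.

    A server sends a response with header
        Content-Type: multipart/x-mixed-replace;boundary=frame
    and then emits one of these parts per camera frame, forever.
    The browser replaces the displayed image as each part arrives.
    """
    return (b"--" + boundary + b"\r\n"
            + b"Content-Type: image/jpeg\r\n"
            + b"Content-Length: " + str(len(jpeg_bytes)).encode() + b"\r\n\r\n"
            + jpeg_bytes + b"\r\n")
```

Latency is low because there is no GOP structure to buffer: every frame is independently decodable, which is also why the bandwidth cost is so much higher than H.264.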

Conclusion: here we are in 2019 and the best low latency video delivery protocol is from the mid-90s. It's nuts. I'm open to suggestions in case I've missed anything.

A fairly long time ago (3-4 years) I was tasked to do something fairly similar (though running on Android as the end client). HLS was one of the better options but came at the same costs you describe here. However, it was fairly easy to reduce the segment size to favor responsiveness over resilience. Essentially you trade buffer size and bitrate-switching quality for more precise scrubbing through the video and faster start times.

I had to hack it quite severely to get fast load with fair resilience for my usecase as the devices are restricted in performance and can have fairly low bandwidth. Since you're looking at a relatively fast connection, simply reducing the chunk size should get you to the target.
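Concretely, with stock HLS this just means generating shorter segments, e.g. a media playlist with 1-second chunks instead of the usual several seconds (a sketch; with gstreamer's hlssink this is the target-duration property, and players typically still buffer about three segments before starting playback):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:1
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:1.0,
segment100.ts
#EXTINF:1.0,
segment101.ts
#EXTINF:1.0,
segment102.ts
```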

As a follow up - I've spent a couple years working on a video product based on WebRTC. This either works for a PoC where you just hack things together or on a large scale where you have time and resources to fight odd bugs and work through a spectrum of logistical hoops in setting it up. So unless you plan to have a large-ish deployment with people taking care of it I would stick to HLS or other simpler protocols.

> I looked at Flash: but of course it's not desirable to use it these days.

RTMP protocol has a lot of implementations and is still widely used for the backend part of transmitting video at a low latency (i.e. from the recorder to the server).

RTSP with or without interleaved stream is another option.

DASH/HLS is a solution for worldwide CDN delivery and browser based rendering. Poorly suited for low latency.

If you need low latency and browser based rendering you need something custom.

I think OP wanted a browser-based non-flash player solution. Pretty much rules out RTMP.

I have used nginx-rtmp in the past - a very solid solution for my use case.

With some friends we are still using rtmp for our own little game stream server. It's the only way we managed to have very low latency (1-2s).

But it's also a little annoying to be forced to use Flash to open the stream. I wonder if we could find something better now.

Speaking of nginx-rtmp, it looks like development has stalled. Anyone know if there is an alternative or someone who took over?

FLV stream might be an option.


You can also consider tunneling over WebSockets. It's a lot easier than WebRTC, especially since you don't need the handshaking nonsense, which often requires self-hosting STUN and TURN servers if you don't want to rely on third parties. IIRC the performance of WebSockets is good enough for companies like Xoom.


Some VNC services like noVNC and Xpra also use WebSockets.

You should probably try Mixer. They rolled their own low latency protocol. It uses WebSockets as a bidirectional channel to let the server push whatever it wants to the client directly, achieving sub-second delay. (The model here looks more like WebRTC than HLS, though.)

I have no idea what the underlying tech is, but Steam Link can do extremely low latency on the same network and very low latency over the internet. It can also stream non-game applications, though I imagine automating steam is a nightmare.

Parsec works similarly: https://parsecgaming.com

And Steam uses RTMP for upload, like anyone else would. Just like OBS and other tools.

Sadly browsers do not come RTMP enabled.

Thanks for the info. I did a quick search before commenting but couldn't find out what the backend was.

My friends and I have our own little streaming website and manage to get 1-2 seconds of delay. It's nothing fancy: NGINX with the RTMP plugin, which receives the streams and only passes them through; once we added encoding we had a noticeable delay. This is Flash tech that can now be played via HTML5, but I didn't see it in your list, so perhaps you haven't looked at it.

I wonder why serving an endless http stream of something like h264 is not an option? Can't ffmpeg produce it in real time at your resolution?

H264 over Websockets and played by Media Source Extensions is pretty simple too (<200LOC): https://github.com/elsampsa/websocket-mse-demo

Interesting. I had tried to get an HTML5 video element to read from a gstreamer-based MPEG source, but it would not work. I'm pretty sure that's because gstreamer did not provide a real HTTP server, so the headers were messed up. It's odd, because oggmux did work over tcpserversink. Anyway, I will try this because I'm interested in the resulting latency.

> Live video rolls for a minute or so, until it stops. Don't know why, but you may continue from here.. :)

Check out NDI. Near real-time local video streaming. It’s inexpensive/no cost if you are doing it all in software.

Keep in mind that NDI is a proprietary technology from NewTek, not an open spec like SMPTE 2110/2022. That being said it does work remarkably well in my experience, provided you have a dedicated network for it.

Similar situation here, ended up with the same solution, after an initial attempt with HLS. jsmpeg (https://github.com/phoboslab/jsmpeg) made it pretty easy.

Try streaming TS packets over WebSockets and decoding with FFmpeg compiled to WASM in the browser. I wrote https://github.com/colek42/streamingDemo a couple of years back, and despite the hacky code it worked really well. You could probably do much better today.
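The plumbing for that approach mostly amounts to keeping the WebSocket messages aligned on MPEG-TS packet boundaries before they reach the decoder. A hypothetical Python sketch of that alignment step (not the linked repo's actual code):

```python
TS_PACKET = 188  # MPEG-TS packets are fixed-size and start with sync byte 0x47

def split_ts(buf):
    """Align a raw byte buffer on MPEG-TS packet boundaries.

    Returns (packets, remainder): complete 188-byte packets ready to
    send as individual WebSocket messages, plus trailing bytes to
    prepend to the next read. Resyncs by scanning for the 0x47 sync
    byte -- a simplification, since 0x47 can also occur in payload.
    """
    start = buf.find(b"\x47")
    if start < 0:
        return [], b""
    buf = buf[start:]
    n = len(buf) // TS_PACKET
    packets = [buf[i * TS_PACKET:(i + 1) * TS_PACKET] for i in range(n)]
    return packets, buf[n * TS_PACKET:]
```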

We recently completed a project with similar requirements. We ended up using RTSP from the camera and packing it up in WebSockets using ffmpeg. We had sub-second latency. The camera gave us H.264, so we could just repack that. We're giving a talk about the project at the MonteVIDEO Tech meetup, though it will be in Spanish.

Contact me if you want to discuss it further.

WebRTC options are improving, check out


Well I was hoping to not have to use a commercial product. From the front page, "Ultra Low Latency WebRTC" is supported only in the Enterprise Edition. I may as well use Flash.

"8-12 seconds End-to-End Latency" for community edition.

Actually a commercial product is not necessarily a problem, but the monthly fees are. If there was a one time fee version (perhaps with limited number of clients or something), then this might work.

I used jsmpeg to live stream camera feeds from robots. There are a few others that do the same. In my case I wrote a custom Go server to handle the multiplexing. It did fairly well, and was able to support something like 60 clients at a time. This was a weekend project and I don't have time to keep the robots online, so I will leave you with some video of an early client I built. There are some other videos showing off the robots on my channel.

I also poked around with making a real-time remote desktop client for Linux that could be accessed via a web browser. It too -- at least on local LANs -- got very low latency video. The link for that is below too.

- https://www.youtube.com/watch?v=7kSbm-IQjK0

- https://www.youtube.com/watch?v=JJ_srz7Ti8Y

Edit: I should mention latencies were measured in ms, not seconds, even for many clients. I am sure that to scale out to 1000s of users I would have to add a bit of latency, but not by much.

Oh yeah, I saw that. I'm also hoping to be able to use the h.264 compression hardware built into the SoC we're using and it was my understanding that jsmpeg was MPEG1 only.

That being said, the ffmpeg solution is not using the hardware accelerator either, even though it does support MJPEG. But I think with work we can get a gstreamer based solution: the missing part is an equivalent of ffserver that works with gstreamer. The hardware vendors like to provide gstreamer plug-ins for their accelerators.

Also, it's weird to me that this needs a giant javascript client library. What about the HTML5 built-in video support?

If you are using MPEG1 you can just dump the packets on the line. And if you want to get fancy, you can read in an HQ stream, set up a beefy server to run 3 or 4 conversions to different bandwidth classes, and move clients up and down as required.
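The multiplexing described here boils down to one-to-many fan-out with a bounded queue per client, so one slow client can't stall the stream. A hypothetical Python sketch (not the commenter's Go server):

```python
import queue

class StreamFanout:
    """Minimal one-to-many packet fan-out, the core of a jsmpeg-style relay."""

    def __init__(self, maxlen=64):
        self.maxlen = maxlen
        self.clients = []

    def subscribe(self):
        """Register a client; returns its private bounded queue."""
        q = queue.Queue(maxsize=self.maxlen)
        self.clients.append(q)
        return q

    def broadcast(self, packet):
        """Push one packet to every client, dropping the oldest packet
        for clients that have fallen behind so latency stays bounded."""
        for q in self.clients:
            if q.full():
                q.get_nowait()  # drop oldest instead of blocking the source
            q.put_nowait(packet)
```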

My code is geared to robots -- and has not been updated recently -- but there is at least an example of the simpler multiplexing in Go.


Cool, I'll try it.

How does Twitch do low latency mode? Isn't that 1-2 seconds?

Their own variation of HLS. Note that except for Safari, browsers don't implement HLS directly, but rather websites do, through HLS.js etc. So you can implement whatever low latency version of HLS you want (assuming it is constructed of HTTP primitives that JS can access).

HLS is also natively supported in Edge on Windows and Chrome on Android: https://en.m.wikipedia.org/wiki/HTTP_Live_Streaming#Clients

Does it need to work in a browser or could you provide VLC or similar as the client and use the 90s protocol that actually addresses your use case?

You want WebRTC

You should try WebRTC. Google's Stadia is built upon WebRTC, so I assume it should be able to give latency in milliseconds.

Why can't you simply use plain HTTP streaming, directly loaded by the browser with a video tag?

The major criticism the author has is the requirement for HTTP2 push for ALHLS, which many CDNs don't support. While I agree it is a valid criticism, I am glad Apple is forcing the CDNs to support push. Without the 800lb gorilla pushing everyone to upgrade, we would still be using pens on touchscreens.

I am not a fan when Apple obsoletes features that people love and use. But I always support when Apple forces everyone to upgrade because friction from existing providers is what keeps things slow and old. Once Apple requires X, everyone just sighs and updates their code, and 12mo later, we are better off for it.

That being said, I agree with author's disappointment that Apple mostly ignored LHLS instead of building upon it. Chunked encoding does sound better.

There are good reasons CDNs don't support http/2 push. It’s hard to load balance and hard to operate, since it requires a persistent TCP connection with the client for the entire duration of the stream, which can be hours. It has consequences that echo almost everywhere in the architecture.

I agree. But now there is motivation for the CDNs to solve these problems in a clever way so that users can get better performance and lower latency.

They are likely to “solve” them by charging more. Serving low-latency HLS to Apple devices will cost more, continuing consolidation into the few tech giants big enough to pay what it takes to get inside the iOS walls. Hardly progress.

I believe Cloudflare supports it and it's free.

What exactly is the benefit of HTTP2 for HLS CDN use, particularly?

The obvious benefit of not using it is that you don't need your CDN to do TLS, likely to be utterly superfluous if video chunks are validated through the secure side-channel of the playlist already.

TLS provides privacy, not merely validity. Some folks don't want others knowing what they watch after connecting to, er, Disney+.

(I have no idea why video streaming might be better over HTTP/2 either.)

TLS specifically does not prevent a passive eavesdropper from telling what compressed video you’re watching. If they can drop a few packets and force you to rebuffer, they can tell very quickly—plausibly faster than you can tell watching the video start!

Dropping packets is not passive...

Okay so ignore that part. It's a tangent to the actual point.

Can you point us to some documentation? I’m having trouble seeing how this would work absent a huge vulnerability in TLS.




There's variation in the segment-to-segment sizes of video. Watching the stream of data, you can pretty easily find many of the segment sizes, and from there you just need a lookup table.
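A toy sketch of that lookup-table idea, assuming you have already extracted per-segment transfer sizes from the encrypted stream (all names and numbers hypothetical; real attacks along these lines use more robust sequence matching):

```python
def identify(observed_sizes, fingerprints, tolerance=0.02):
    """Match observed (encrypted) segment sizes against known titles.

    TLS hides the content but not its length, and variable-bitrate
    encoding makes each title's sequence of segment sizes nearly
    unique. `fingerprints` maps title -> list of known segment sizes.
    Returns the title whose sequence matches the most observed sizes.
    """
    best_title, best_hits = None, 0
    for title, sizes in fingerprints.items():
        hits = sum(
            abs(obs - ref) <= tolerance * ref
            for obs, ref in zip(observed_sizes, sizes)
        )
        if hits > best_hits:
            best_title, best_hits = title, hits
    return best_title
```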

Figuring out spoken language or which web pages are in packets is fuzzier but still viable.

Ah, thanks. That makes sense.

That’s interesting. Do you have a reference to a working example?

How does that work?

The main redeeming feature of traditional HLS is that it can use ordinary HTTP CDN infrastructure. If you're going to require video-streaming-specific functionality in CDNs anyway there is absolutely no justification for designing your protocol in this horrendously convoluted, inefficient, poorly-performing way.

It's ironic that "live streaming" has gotten worse since it was invented in the 1930s. Up until TV went digital, the delay on analog TV was just the speed-of-light transmission time plus a little bit for broadcasting equipment. It was so small it was imperceptible. If you had a portable TV at the live event, you just heard a slight echo.

Now the best we can do is over 1 second, and closer to 3 seconds for something like satellite TV, where everything is in control of the broadcaster from end to end.

I suppose this is the tradeoff we make for using more generalized equipment that has much broader worldwide access than analog TV.

Yes, and it's driven by consumers.

Unless your content operates in a very small niche, "real time" is far less important than continuity.

In rough order of preference for the consumer:

1) It starts fast
2) It never stops playing
3) It looks colourful
4) Good quality sound
5) Good quality picture
10) Latency

One of the main reasons why "live" broadcast over digital TV has a stock latency of >1 second is FEC (forward error correction). This allows a continuous stream of high quality over a noisy transport mechanism. (Yes, there are also the local operating rules for indecent behaviour, and switch and effects delays, which account for 10 seconds and >250ms respectively.)

For IPTV it's buffering. Having a stuttering stream will cause your consumers to switch off/go elsewhere. One of the reasons why RealPlayer held on for so long was that it was the only system that could dynamically switch bitrates seamlessly and reliably.

There is a reason why Netflix et al start off with a low quality stream and then switch out to HD 30 seconds in: people want to watch it now, with no interruption. They have millions of data points to back that up.

Google seems to think they can implement video gaming over IP. And they probably can, my ping to them is only 9ms, less than a frame.

There is just a broad lack of interest in reducing latency past a certain point unless there is a business reason for it. People don't notice 1 second of latency.

Why didn't they use that capability for voice/video communication? Are games a better business?

I remember Hangouts being better than Skype, but that's not high praise. Every calling service I've used has been laggy and disconnects often.

They do, voice and video calls are intolerable at 1s of latency.

> Google seems to think they can implement video gaming over IP.

No they didn't. Early attempts at streaming videogames were unplayable even with a server in another room or a direct DC connection.

And they want it to work over an average internet connection in America.

No, they definitely did not solve the issue.

And yet, I was able to game competitively from my apartment in Brisbane, using a server in Sydney, using Parsec; usually coming in at less than a frame of latency, sometimes just over a frame. This was two years ago, too. And Australia isn't known for its amazing internet connections (though mine was better than most).

Just because one group was incompetent doesn't mean another will be.

It has been possible for years to get a total encode+decode latency of less than one frame with x264.

Meanwhile many people are gaming on TVs that impose 3-8 frames of processing lag.

And you can beat most current tech by more than half a frame just by supporting HDMI 2.1 or variable refresh rate. (Instead of taking 1/60 of a second to send a frame, you send it as fast as the cable can support, which is 5-12x faster)

I played over 20 hours of Assassin's Creed through Chrome during the Stadia beta and I couldn't notice any latency. While it might not work for games like CS:GO, AR, or bad networks, they 100% have a working product today for Assassin's Creed.

> less than a frame

my 144Hz screen would disagree.

It's not surprising if you think about how our ability to store video has changed over the years. The delay on analog TV is so low because the picture data had to go straight from the camera to the screen with basically no buffering since it was infeasible to store that much data. (PAL televisions buffered the previous scanline in an analog delay line for colour decoding purposes, but that was pretty much cutting edge at the time.) Now that we can buffer multiple frames cheaply, that makes it feasible to compress video and transmit it without the kind of dedicated, high-bandwidth, low-latency links required in the analog days. Which in turn makes it possible to choose from more than a handful of channels.

We also lost the ability to broadcast things like rain, snow, fireworks, dark scenes and confetti with the loss of analog.

I missed something. What do you mean?

Compression artifacts ruin scenes with those things in them. Analog isn’t compressed so it has no artifacts.

I see.

That seems to be mostly solved with high speed links and better encoding technology.

No, it got worse. Try H.265: compression artifacts are pretty bad in certain scenarios, even at high bitrate. Same with H.264 -- it can be solved with high bitrate, but your file size also gets much, much bigger, which means you will need a very low latency, high-speed internet connection.

I think YouTube is the only streaming service that does it very well without any issues for the end-user, anywhere in the world. Mostly because of their free peering service that is extremely ubiquitous. https://peering.google.com/#/

He meant better encoder technology at the same bitrate.

H265 encoding/compression tech is the best in the world right now - unless I missed something.

How you encode something goes beyond the standard. You can encode the same source at the same bitrate and with the same standard in different ways.

For example, there were noticeable quality differences between MP3 coders.

I’m having trouble finding good examples of confetti or fireworks on Netflix. I’ve noticed fewer problems as time goes on, but anecdata and all that.

Different encoders and standards will have different problematic scenarios. It seems like the number of problematic scenarios is decreasing.

I can encode DVD video (MPEG2) as h.264 at a huge bitrate decrease with no apparent quality loss.

Certainly streaming real time is harder with digital formats, but it’s generally good enough.

You can't find good examples because people actively avoid recording and uploading videos containing things that are hard to encode.

I find this hard to believe. Netflix is going to avoid certain titles because they aren’t going to look perfect?

No one's producing titles that can't look good on most platforms.

Some delay from many producers is almost certainly intentional. Live content providers want to be able to have a second to cut a stream if something unexpected (profanity, nudity, injury...) occurs on set.

Analog TV is also massively less spectrum efficient. You can fit 4+ digital channels in the same spectrum as one analog TV channel.

And don't forget how low and inconsistent the quality of analog TV was compared to what we can broadcast digitally.

The real story here is that latency isn't actually important to live TV, so it's a no-brainer trade-off to make. If you look at other transmission technologies where latency is more important, like cellular data transmission, latency has only decreased over the years.

> Now the best we can do is over 1 second

Mixer can do about .2 seconds.

This title is unnecessarily inflammatory with intent to gain our sympathy to the position presented.

The technical writeup of this post are spot-on, though. I prefer less drama with my bias but I’m very glad I read this.

Thanks, that's exactly how I felt — that there’s a really good and useful article in here, but clouded by assumptions and an attempt to create controversy.

Note: HN altered the title from the original to which my comment refers.

Looks like as far back as 2014 research has pointed to some big gains using HTTP/2 push: https://dl.acm.org/citation.cfm?id=2578277

> A Partial Segment must be completely available for download at the full speed of the link to the client at the time it is added to the playlist.

So with this, you cannot have a manifest file that points to future chunks (e.g. for up to the next 24 hours of a live stream) and delay processing of the HTTP request until the chunk becomes available, like HTTP long polling applied to chunks.

> On the surface, LHLS maintains the traditional HLS paradigm, polling for playlist updates, and then grabbing segments, however, because of the ability to stream a segment back as it's being encoded, you actually don’t have to reload the playlist that often, while in ALHLS, you’ll still be polling the playlist many times a second looking for new parts to be available, even if they’re then pushed to you off the back of the manifest request.

Which could be avoided if Apple didn't enforce availability of the download "at the full speed" once it appears in the manifest (long polling of chunks).

LHLS doesn't have this issue, as the manifest file itself is streamed with chunked responses, hence it makes sense (a streaming manifest file).
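For illustration, a low-latency media playlist under Apple's preliminary spec advertises sub-segment "Parts" alongside ordinary segments, roughly like this (tag names from the draft; attributes abridged and filenames invented):

```
#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-PART-INF:PART-TARGET=0.333
#EXTINF:4.0,
fileSequence100.mp4
#EXT-X-PART:DURATION=0.333,URI="fileSequence101.part1.mp4"
#EXT-X-PART:DURATION=0.333,URI="fileSequence101.part2.mp4"
```

Each poll returns the newest Parts, and per the quoted rule each listed Part must already be fully downloadable at link speed.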

> For the time being at least, you’ll have to get your application (and thus your low latency implementation) tested by Apple to get into the app store, signaled by using a special identifier in your application’s manifest.

And this makes me think about the implementability of the 1st and 2nd points in ALHLS. Maybe the current "implementation" is compatible, but not with the spec itself.

To your last point,

> Maybe the current "implementation" is compatible but not with the specs itself.

It's perhaps worth noting that this is a "preliminary specification" and an extension of HLS. HLS itself is an IETF standard (well - an "Internet Draft"): https://tools.ietf.org/html/draft-pantos-http-live-streaming...

It is not an IETF standard - those have RFC numbers. It is just a personal draft - any IETF member can upload one of those, regardless if it's useful. I'm happy that it has a specification - but it's just a one-man project, not something that has gone through the IETF standards process.

A couple of years ago it finally was published as an RFC: https://tools.ietf.org/html/rfc8216

Any chance some of this is related to patent avoidance?

> measuring the performance of a blocking playlist fetch along with a segment load doesn’t give you an accurate measurement, and you can’t use your playlist download performance as a proxy.

I don’t see why this would be the case. If you measure from the time the last bit of the playlist is returned to the last bit of the video segment is pushed to the client, you’ll be able to estimate bandwidth accurately.

> from the time the last bit of the playlist is returned to the last bit of the video segment

Based on my loose understanding of HTTP/2 server push and ALHLS, the sequence of events will be:

1. Client requests playlist for future media segment/"Part"

2. Server blocks (does not send response) until the segment is available

3. Server sends the playlist ("manifest") as the response body along with a push promise for the segment itself

The push then begins with the segment.

The push stream can presumably occur concurrently with the response body stream. So I don't think you can wait until every bit of the playlist comes in. Likewise, you can't use the playlist bytes themselves to gauge bandwidth, because the server imposes latency by blocking.

It’s still over the same TCP connection, so there is no way for the push to beat the playlist.

Apple low latency test stream I set up if useful (uses CDN) https://alhls-switchmedia.global.ssl.fastly.net/lhls/master....

As usual, Apple pushes NIH instead of supporting DASH, which is the common standard. And they also tried to sabotage adoption of the latter by refusing to support MSE on the client side, which is needed for handling DASH.

> As usual, Apple pushes NIH, instead of supporting DASH which is the common standard.

I mean... HLS predates DASH. It would've been hard for them to support a common standard which didn't even exist at the time. Initial release of HLS was in 2009[0], work started on DASH in 2010[1].

I'd also disagree with the characterization of DASH as "the common standard" - it's certainly a legitimate standard, but I feel like support for HLS is more ubiquitous than support for DASH (please correct me if I'm wrong).

[0] https://en.wikipedia.org/wiki/HTTP_Live_Streaming

[1] https://en.wikipedia.org/wiki/Dynamic_Adaptive_Streaming_ove...

Predating doesn't stop them from supporting something else once it becomes common. They don't do it since they want to impose HLS on others. And their refusal to support MSE[1] on iOS stinks even more clearly as an anti-competitive method to do it.

[1]. https://en.wikipedia.org/wiki/Media_Source_Extensions

Why didn’t MPEG just adopt HLS?

I think you have the NIH cause-effect the wrong way around.

Apple isn't exactly the champion of free standards. Is HLS free for others to adopt? DASH is. The same messed up story happened with touch events for JavaScript. What Apple were pushing wasn't free.

> DASH is free to adopt

Citation needed :)

From https://www.mpegla.com/programs/dash/

> The royalty for DASH Clients, which is now payable from January 1, 2017 forward, remains US $0.05 per unit after the first 100,000 units each year.

And that sounds like reason enough for Apple to ignore the DASH spec. Case closed IMHO.

Apple should be out of video business, if they are scared of patent trolls.

But the problem is the reverse here. Apple have patents on HLS itself which are not free, so it's not suitable for general adoption.

That's the MPEG-LA patent trolls, nothing to do with the MPEG in MPEG-DASH, for reference. They can claim they own the Moon the same way. They do it with anything that looks usable and related to video. Patent trolls aren't really the measure of how free a standard is.

Apple, however, are the owners of HLS, and unlike some random patent trolls, if they are insisting on its adoption, they have to make sure their patents on it are royalty free. Not that it will protect anyone from further patent troll attacks from the likes of MPEG-LA, but that's a requirement.

Apple has never been shy about communicating which technologies of theirs they have patented. While it would be nice to have an explicit statement from Apple, the absence of it speaks volumes.

Has MPEG made any statement about the MPEG-LA patent pool?

> While it would be nice to have an explicit statement from Apple, the absence of it speaks volumes.

I'm not sure what that means. Either they released it royalty free or not. Since there is no public statement about it, there is no reason to assume it's free.

> Has MPEG made any statement about the MPEG-LA patent pool?

I don't think anyone cares to make statements about patent trolls. The only effective way to deal with them is to bust their claims in court, which not many want to do.

> Is HLS free for others to adopt?

Yes, and v7 of the protocol is documented here: https://tools.ietf.org/html/rfc8216

I don't see it saying there it's free for everyone to use. It also says:

> This document is not an Internet Standards Track specification; it is published for informational purposes.

An IETF standard should be free as far as I know. So the fact that it's not one is already suspicious.

Knowing Apple, they probably patented it to the brink. So they released it royalty free? And when exactly?

> I don't see it saying there it's free for everyone to use.

Note the word 'and'. That link wasn't to explain that it's free.

Also "is not an IETF standard" is a pretty low bar for suspicion!

Well, exactly the point. So this link doesn't say anything about whether it's free.
