Show HN: WebRTC Insertable Streams and server processing (github.com)
86 points by Sean-Der 18 days ago | 24 comments

Hey HN!

This is a little demo of WebRTC insertable streams. This API opens up some exciting possibilities for WebRTC.

The gist of it is that you now have access to the video frames in the browser! You can see here [0] the browser flipping bits back (a simple XOR cipher) that were flipped here [1] in the Go code.
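To make the idea concrete, here is a minimal sketch of the XOR cipher itself, written in plain Go. This is illustrative only, not the actual Pion example code: the function name and key value are made up, and the real demo applies this to encoded frame payloads flowing through the pipeline.

```go
package main

import "fmt"

// xorCipher applies a single-byte XOR key to a frame payload.
// XOR is its own inverse, so applying the same key twice restores
// the original bytes. That is why the browser-side code can undo
// exactly what the Go server did to each frame.
func xorCipher(payload []byte, key byte) []byte {
	out := make([]byte, len(payload))
	for i, b := range payload {
		out[i] = b ^ key
	}
	return out
}

func main() {
	frame := []byte("example frame bytes")
	scrambled := xorCipher(frame, 0xAA)  // server side
	restored := xorCipher(scrambled, 0xAA) // browser side
	fmt.Println(string(restored)) // prints "example frame bytes"
}
```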

The use case I am most excited about is attaching metadata to frames. For teleoperation especially, a big ask has been attaching metadata to a specific frame, and this makes it possible!
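One way the per-frame metadata idea could look, sketched in Go under assumptions of my own: append a fixed-size trailer (here an 8-byte capture timestamp) to each encoded frame before it leaves the sender, and strip it off before decoding on the receiver. The function names and trailer layout are hypothetical, not part of the Pion API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendMetadata tacks an 8-byte big-endian capture timestamp
// (microseconds) onto the end of an encoded frame.
func appendMetadata(frame []byte, captureMicros uint64) []byte {
	trailer := make([]byte, 8)
	binary.BigEndian.PutUint64(trailer, captureMicros)
	return append(frame, trailer...)
}

// splitMetadata undoes appendMetadata: it returns the original
// frame bytes and the timestamp carried in the trailer.
func splitMetadata(buf []byte) (frame []byte, captureMicros uint64) {
	n := len(buf) - 8
	return buf[:n], binary.BigEndian.Uint64(buf[n:])
}

func main() {
	tagged := appendMetadata([]byte("encoded frame"), 1718000000000000)
	frame, ts := splitMetadata(tagged)
	fmt.Println(string(frame), ts) // prints "encoded frame 1718000000000000"
}
```

Because insertable streams expose the encoded frame on both ends, the receiver can recover exactly which metadata belongs to which frame, with no out-of-band synchronization.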

I also have a video demo on Twitter here [2], but it's not that exciting.

[0] https://github.com/pion/webrtc/blob/master/examples/insertab...

[1] https://github.com/pion/webrtc/blob/master/examples/insertab...

[2] https://twitter.com/_pion/status/1271956810015010816

Hi, perhaps you could be a little more explicit in the README about what you mean by "insertable stream" for people without the background. Is this like a dataflow-graph where you can combine streams using generic "operation" nodes/functions?

> The gist of it is that you have access to the video frames in the browser now!

As a total newbie in this area, it sounds surprising that this wasn't already the case. Would you access these frames from JavaScript or WebAssembly? Would this be efficient enough (in terms of speed and power use)?

Pion seems promising!

Is there anything like Jitsi built on Pion?

(I have been using Jitsi a lot lately. It is great software and a good example that free software can be just as featureful as the proprietary alternatives, but it is also a pretty complex software stack on its own; there's a lot to learn just to extend it.)

Thank you :) Yes! There are three 'media servers' that I know of. You can also find more stuff in awesome-pion [2]

* https://github.com/pion/ion

* https://github.com/peer-calls/peer-calls

* https://www.irif.fr/~jch/software/sfu/

Ion (the first one) was built with extensibility in mind. Each service runs in its own container, so you can just fork ion-app-web [0]. You continue to get the backend improvements though.

We are working on adding RTMP/SIP bridges as their own nodes. There is also a generic 'AVP' node that allows media processing. You can save to disk, or do other custom stuff you define.

If you are interested, I would love to have you involved. Hop on Slack [1] or feel free to just leave ideas/feedback on GitHub!

[0] https://github.com/pion/ion-app-web

[1] https://pion.ly/slack

[2] https://github.com/pion/awesome-pion

I have worked with the library a good bit, and it is definitely getting there. There is still too much reliance on SSRCs, and simulcast is not directly supported yet. There is discussion of a v3 API on the horizon, which is interesting as well. I have used it well in multiple cases; the library is clean, and the responsiveness of the devs on the Slack channel is great.

Once it gets simulcast, I may build my own Jitsi kinda thing with it. Also as already mentioned by the other comment, Ion is the main one they are building.

I am kinda annoyed by VP8. It is so widespread and yet so awful due to the lack of hardware support. I think even the latest macOS doesn't support hardware encoding. Once video is on VP8, an iPhone becomes hot and a 5-year-old MBP turns on all its fans.

Just switch to H264 and everything works just fine.

Quality is not that different.

Why is everyone obsessed with it?

> latest macos doesn't support hardware encoding. Once video is on VP8 iPhone will became hot, 5 year old MBP turns on all fans

I don't understand. Why isn't your question "Why is Apple so extremely late to the party?" ?

An iPhone becoming hot with VP8 is caused by Apple not improving VP8 support, not by any inherent quality of H.264. For a long time they ignored the fact that the industry standardized WebRTC with mandatory support for both H.264 and VP8. Apple disregarded that, and until very recently they decided to ignore half of those codecs.

If I had to guess, maybe "everyone" is "obsessed" with VP8 because H.264 is encumbered by royalties and stuff. VP8 is royalty-free, and its successor VP9 is too.

What? If your CPU has H.264 built in, then you shouldn't have to pay any licenses. You will find it in any CPU, everywhere.

It is not that Apple is late to the party; it just doesn't make sense to implement VP8 in silicon. Also, burning MacBooks are an Intel issue, not a Mac one. Or maybe the Chrome team isn't using hardware acceleration.

VP8 is supported for sure, but this codec is subpar compared to H.264. I can easily encode H.264 even on a Raspberry Pi Zero, and I can't do the same for VP8 at all.

Also, the average engineer doesn't care about royalties, but still everyone uses VP8 as the default.

Disclaimer: I work on WebRTC somewhere in the codec stack. Certainly not the most knowledgeable in that field but here are a few reasons.

Software encoders will always have more features than hardware encoders, which are prone to bugs.

A bug in a HW encoder needs either a new driver or new silicon. You can't work around those, and sometimes you can't even detect which version you are dealing with.

So if you want to ship a reliable application, you can either use a software codec (which you control) or hope that the HW encoder, which is essentially a black box, works as intended.

HW encoder bugs show up in many forms. Sometimes the stream is just broken and can't be decoded, sometimes the encoder won't listen to encoding parameter updates you send it (e.g. max bitrate or quality settings), sometimes it generates a valid but bad stream (too many I-frames, breaking the max bitrate setting), and sometimes it has TERRIBLE latency.

Sometimes, they just plainly lack features: for example, they can't encode temporal layers, don't have any denoising (useful for VP8), have a limited number of instances (limiting simulcast opportunities), or have extreme frame size limitations (can't do screen sharing with them)...

And that's just what I could come up with in a few minutes of thinking, I'm sure people who have worked on it for a much longer time would be able to say much more.

We could argue that those issues would apply exactly the same for both H.264 and VP8 hardware-based encoders, so these aren't reasons for lacking VP8 encoders in phones and other devices... otherwise we wouldn't have H.264 hardware either.

I agree in spirit with the parent comment (although judging from my post it would seem the contrary). Ideally, both codecs would have hardware encoders. The difference in performance and battery savings is huge. I just understand that the industry players felt it was necessary to provide a royalty-free option among the proposed codecs... either that or it was just Google pushing their own thing, which also sounds very likely.

They do apply to all codecs, absolutely. Some just have historically more usage, so the encoders might behave better. There are still bugs though, I do remember hearing about some device producing an H264 stream that another device couldn't decode in HW. It's still not perfect.

Also, as many people have pointed out, VP8 is a more complex codec than H.264, so it is harder to test coverage of all its features, particularly since most applications (especially on desktop) are not really doing RTC, which is quite a specific use case.

And phones DO have VP8 encoders. It's just that iPhones, for historical reasons, don't have one. Pure speculation on my part, but I'm guessing they'll do AV1 someday, as Apple is working on the spec.

But in general, even though HW encoders would be awesome for all codecs if they worked well in all situations, the reality is they don't always work; you usually need to build lists of HW that supposedly behaves well with your use case. Advanced features (e.g. spatial layering) are tricky to implement in HW and are usually not configurable, so SW codecs will still see quite a lot of usage for the sake of quality.

> If your CPU have H264 then you shouldn't pay any licenses. You will have it in any CPU everywhere.

Except for software that runs on multiple platforms, which might not have H.264 hardware support (or might not be compatible with WebRTC even if they do exist). If you're anyone other than Safari, you have exactly zero reason to believe you'll be able to run WebRTC without a software implementation of H.264. Software encoding means royalty payments if you're a legitimate business.

I have yet to see a CPU without H.264 built in. Even the cheapest mobile CPU (more like an MCU) has it.

Again, no one wants software encoding. This is a thing to avoid.

In regards to WebRTC: VP8 supports temporal layers, H.264 does not. This means you can encode and upload one stream, while participants with different network conditions are served the best possible framerate. VP9 and AV1 also support spatial layers, which do the same for resolution. To get that today, you need simulcast with VP8 or H.264.

Yeah, it is like people care about this when they literally can't hear the person on the other side because of fans. Without hardware support, all these codecs are useless on mobile.

If you take a look at Zoom, they are essentially doing simulcast over WebSockets/WebRTC data channels, and it works well for a large number of participants. Unlike VP8, which can't even handle two.

Apparently, Zoom encodes and decodes video in software using Webassembly.

WebAssembly is browser-only; I bet they use proper codecs in the desktop apps, where most users are.

There are HW VP8 encoders around. Can't find a definitive list, but I suspect most of the ones that can do VP9 will do VP8 as well: https://en.wikipedia.org/wiki/VP9#Hardware_implementations

How does WebRTC go together with HTTP/3 and QUIC, or is it unrelated at all?

Hey, I used to work on this :).

It's complicated :).

Basically, WebRTC is a combination of a bunch of protocols: ICE, DTLS, SCTP, and RTP. You could theoretically reduce that to ICE + QUIC for p2p use cases and just QUIC for client/server use cases.

For p2p, there is RTCQuicTransport (see https://developers.google.com/web/updates/2019/01/rtcquictra...)

For client/server, there is QuicTranport (see https://web.dev/quictransport/)

Of course, you'll probably want to be able to encode and decode some audio and video as well to make that useful. For that, there is WebCodecs (see https://github.com/WICG/web-codecs or https://www.chromestatus.com/feature/5669293909868544)

For use cases like live streaming and cloud gaming, I did a presentation about the combination of WebTransport + WebCodecs: https://www.w3.org/2018/12/games-workshop/slides/21-webtrans...

And then there is the work happening in the IETF along these same lines: https://datatracker.ietf.org/wg/ript/documents/

Thank you and others for answering me! :-)

Broadly speaking, WebRTC addresses P2P and does things like NAT traversal; HTTP/3 is "classic" server/client communications, with the server needing a publicly reachable hostname.

That being said, nothing specifically prevents a server from being one of the peers in WebRTC, but the overhead can be significant, so it's not commonly used like that.

I'm not aware of any way of doing something like WebSockets with datagram semantics (i.e. over UDP), but it seems like this is something that might be addressed once WebSockets can be used over HTTP/3.

Unrelated! (for now)

WebRTC uses decades old protocols (RTP/RTCP) for media exchange and pre-dates QUIC. There is talk of using QUIC instead, but I have no idea how that is going!

There's the Web Transport proposal. It provides client/server streams similar to what you can do with data channels (ordered, unordered, multiple streams) and it's using QUIC -

