Sean-Der's comments | Hacker News

I am excited for VAD to go away. PersonaPlex totally seems like the future.

However, for things like a 'call center helpline', turn-based actually seems better! You don't want to be interrupted when giving information back and forth (I think?)


Did you use libwebrtc on the backend? When you say `libwebrtc` is the only game in town are you talking about clients or servers?

Even for clients you have things like libpeer that hit targets libwebrtc can't.


yes - i used libwebrtc on the backend and, pre-LLM, patched it to work around a lot of the things i discovered that were directly related to low latency AV streaming. pion didn't exist then.

i think the challenge is that pion is an excellent product today. it would benefit me if its innovations were subsumed into libwebrtc, because eventually those innovations will show up in the iOS stack, which is one of the customers that matter to me. it is subjective if it is the MOST important customer, that is my belief and it is probably true of openai, at least until they get their own device out the door.

there can be many, many use cases though! not everything has to be 'try to make the thing for 1b people that has to interact with all the most powerful and meanest businesses on the planet.'


Check out [0]. You can do 'Voice AI' on small/cheap hardware. It's the most fun you can have in the space ATM :) It's been a while, but I posted a demo here [1]

[0] https://github.com/pipecat-ai/pipecat-esp32

[1] https://www.youtube.com/watch?v=6f0sUEUuruw


beautiful demo - is it running fully locally or talking to 3rd party APIs? That box was jaw-droppingly small

For the best experience, you'll still want it to communicate with 3rd party APIs to handle the speech to text, text to speech, and LLM.

It doesn't today, but you could with something like this [0]. You can save/suspend all WebRTC state and bring it back with the next process.

[0] https://github.com/pion/webrtc-zero-downtime-restart


Very grateful that OpenAI published the article/publicized their usage of Pion [0], a library I work on. If you aren't familiar with WebRTC, it's a super fun space. I work on a book, WebRTC for the Curious [1], that details how it works.

[0] https://github.com/pion/webrtc

[1] https://webrtcforthecurious.com


I use pion, thanks for making it!

Curious if you thought their approach was necessary, it seemed like a ton of complexity to reduce one of the faster parts of a voice AI setup. Having a fast model and accurate VAD seems way more important than fine tuning WebRTC transit times.


Thanks for using it :)

I think it's a case of 'you improve what you own'. The owners of WebRTC servers were aggressively improving their part. They don't own the inference servers.


Appreciate you putting the entire book online!

I read parts of it a while ago when I had an idea on using webRTC data channels to pass data from databases to browser clients via a CLI. Your book made me understand that it's probably not a great fit for my use case. I just used a centralized control plane and websockets instead.

I still feel like there is something fun that we can do with webRTC data channels + zero copy Apache Arrow arraybuffers + duckdb WASM, but haven't figured it out yet


Thanks for reading it!

You can't beat Websockets :) Especially since you have so much tooling/existing stuff that works with HTTP.

I have been trying to get a website off the ground that does DataChannels + SQLite in the browser, with users syncing between each other. I have gotten distracted so many times though.


Super cool! Please give me a ping if you ever launch that (my email is on my website, linked in my profile)

What is preventing the fun is that even though IPv6 is now widely enough available, we still can't have p2p connections in the browser without a cumbersome control plane of servers. If you could join a federation in the browser from some bootstrap IPs, then I think we could have some real distributed fun.

Thanks for WebRTC for the Curious and for Pion! Not using the latter directly, but have used both to better understand WebRTC.

For those unfamiliar with WebRTC, the Pion FAQ page has a good description:

> WebRTC is a standardized protocol for P2P communication. It allows two peers to exchange media and data. It is encrypted by default, and handles connectivity establishment in many different network conditions. It is supported in browsers, and has multiple out of browser implementations.[0]

[0]: https://github.com/pion/webrtc/wiki/FAQ#what-is-webrtc


slightly unrelated but what’s with storing the entire codebase in the root directory instead of a nested src folder? It makes getting to the README a lot more difficult

That's the default for Go projects. Go imports are repository strings, e.g.:

     import ("github.com/go-sql-driver/mysql")
so it's standard to have the library files in the root directory.
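Concretely, the repository-string convention falls out of the `module` line in go.mod (the module name below is hypothetical):

```go
// go.mod at the repository root:
//
//     module github.com/example/mylib
//
// A file mylib.go sitting next to go.mod just declares its package:
package mylib

// Hello is an exported function consumers can call.
func Hello() string { return "hello from the root package" }

// Any consumer then imports the repository string:
//
//     import "github.com/example/mylib"
```

Since the module path already names the repository, there is no need for a src/ layer; the root of the repo is the root of the package.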

FWIW I usually don't structure my Go projects this way unless they're very very small. This is what I usually do for anything larger than 2-3 files:

  ├── cmd
  │   └── binary-name
  │       └── main.go (may subpackage for things like CLI porcelain, etc)
  ├── go.mod
  └── internal
      └── app.go (and subpackages, etc)

I assume this is why GitHub has the annoying #readme-ov-file slug

This is valid criticism. Go fanbois don't like listening to any Go criticism. They were all like 'who needs templates in Go?' and now Go has templates.

To me Go code looks like somebody vomited stuff in the root dir, and I have to wade through that every time. No namespacing, nothing.


I don't like go as a personal preference but reducing them to "fanboys" is a bit reductive. I'm sure the same could be said about your own favorite language.

Is it reductive when it's describing a group of people that like something and refuse to hear any ill of it? The comment wasn't shade at people using the language in general.

And you're right, fanboys are in every language. But resorting to changing the argument by whataboutism is a bit reductive.


I’m not a go fanboy, but I do know from other contexts that so-called “fanboy” behaviour is frequently associated with level-headed supporters getting defensive in the face of imprecise criticism.

There’s an oft-repeated pattern where valid specific criticisms morph into broad criticism, which morphs into judgement, which breeds defensiveness, which feeds the criticism. Once you recognise this pattern, you see it everywhere.


Sure, and there's the near-identical pattern where valid specific criticisms are taken as broad criticism even though they aren't, etc., etc..

That's the defensive step outlined above.

Ok... The question was why it's like that. The answer is: because it's Go. Nobody was anything other than civil before you neckbearded in here. Chill. There's a sane way to say what you said.

I believe it was "They were all like who needs generics in go. and now go has go generate and templates."

I guess I qualify as a Go fanboi -- it is not perfect but gets the job done for a lot of us, sorry it doesn't work for you.

But back to your point about "vomit in the root dir", Go does have namespacing of sorts via packages, and the pattern you criticized is not the only way -- often just a simple main.go at the root bringing in packaged functionality.


WebRTC is great and so is Pion, thanks for help making and maintaining it! I loved learning about WebRTC from WebRTC for the Curious!!!

I used pion and it was fantastic. Most of the article seems like pretty standard WebRTC techniques for performant voice.

Only a software dev would start their referencing at 0 lol

I do this too; I never made the connection.

What USB tuners do people like? I have a Hauppauge WinTV-dualHD Dual Tuner - 955D and I have to restart the server every 3 days because it deadlocks.

Do any alternatives to Tvheadend exist? I looked a bit and it seems like it's the best. So much customization that it would be hard to match its quality.


It's been a few years since I last tried, but I was pretty satisfied with TBS PCI-e tuners [1]. Good Linux support, stable, and you can get them with multiple tuners to stream/record many channels at once.

[1] www.tbsiptv.com


Anysee 30 something. It's USB but I have uptimes close to a year

This is a route that I was planning to take on Linux, so it is disappointing to hear that this was your experience. Is it possible that HDHomeRun as the backend to Tvheadend is the way to go?

> use WebRTC and deploy selective forwarding units, which are going to be something custom

Would you mind explaining more? If you are doing WHIP/WHEP you should be able to drop in Broadcast Box/MediaMTX etc... and switch out servers and no one should notice. You can use browser/mobile/ffmpeg/OBS etc... get the same behavior. I care a lot about the broadcast space, want to learn about other problems.

> subtly speed up audio/video to keep everything in sync

You can use https://webrtc.googlesource.com/src/+/refs/heads/main/docs/n... to add more delay (if you want to force more buffering). Or if you don't link the media together (via MediaStream) you don't get the behavior you describe either!

> capture each participant's audio individually

That's a neat problem. I haven't solved this one myself, I wonder if it's easier with RtpTransport or insertable streams?


Regarding SFUs - with something like HLS, I can really easily scale up using something like a caching CDN (not entirely sure if that's the right term). But the idea goes: I can distribute the HLS media playlist, and have my media segment entries prefixed with a caching/CDN service. The service will be configured with the actual origin server, and when a segment isn't in the CDN, the CDN fetches from the origin, on-demand. That was a nice option when I was doing owncast streaming since I really only paid based on viewership, and just had to make sure I had the correct cache-related headers on my media segments.

Or alternatively - I can push media segments up to a CDN and distribute that way, using an s3-compatible service, or just rsyncing to a server with better bandwidth, etc. One thing I didn't care for - again back when I was broadcasting with Owncast - was that I needed to make sure old media segments were expired, otherwise I would rack up an insane bill. I had a 24/7 owncast stream and if you're not on top of expiring media segments with your CDN, it gets expensive fast.

The overall idea is: serving HLS is ultimately serving files, and there's a good amount of tooling for that, right?

Now that you mention it, I think WHIP/WHEP can solve some of that. I just don't know of any service where I can have that same cache/CDN-like experience, of either having the CDN connect to the origin as needed and fan-out, or where I can push up and let the service distribute. (though - now I'm googling for "webrtc sfu as a service" and see that is a thing!).

Didn't know about the playout delay extension.

Whether capturing individual audio is easier with RtpTransport or insertable streams - I'm unsure. Possibly? I just figure since MoQ is going to rely on things like WebCodec/WebAudio there's hopefully a bit more control over what happens with audio as it comes in.

I'll admit though - I've started noticing how often podcasts are clearly recorded using something that doesn't allow per-participant recordings and, I'm guessing as long as the quality is good enough most aren't worrying about it.

EDIT: feel like I should mention Pion rules, I used it a few years ago to put together an SRT-to-WebRTC thing and RTMP-to-WebRTC thing to use with Janus Gateway, it was so easy.


Huge fan of BigBlueButton! It’s been cool to watch the project go through multiple big tech changes and still keep going.

Never give up! Free/Open Source software especially in schools is so important


You know... As it turns out, that's a different piece of software! (I was super confused at the comment and had to google it.)

My company, and long before that just the website I used to host different projects on, is "BigBlueCeiling" so I tend to default to "BigBlue_________" for project names or offshoots now.

I'm a huge fan of FOSS in general though. And if BigBlueBam found life inside of education I wouldn't hate it.


Yea!

* Do video playback out of the browser. You can render a subset of frames, use a different pipeline for decode etc...

* Pull video from a different source. Join Google Meet on current computer, but stream from another host.


I can't wait for https://w3c.github.io/webrtc-rtptransport/. When you talk about pulling video out, it seems like the perfect fit.

I ended up doing a proxy because Google Meet doesn't let me hook into any RTCPeerConnection APIs at all. I wanted to send synthetic media in, but couldn't get it working. Ended up doing a virtual webcam on Linux.

