More

dmazzoni · 2024-11-21T04:25:33 1732163133

I'm a fan of more frameworks for desktop apps that wrap system webview rather than embedding Chromium.

Chromium is awesome and all, but it's just way overkill for many apps, and doesn't make sense for the download size and the need to support auto-update for security features.

hresvelgr · 2024-11-21T04:49:06 1732164546

I feel the reason the system web-view is eschewed in these frameworks is because you have no choice but to maintain compatibility with 3-4 browsers: Edge/IE on Windows, Safari on Mac, and Firefox on Linux (usually).

thayne · 2024-11-21T05:42:31 1732167751

> Firefox on Linux (usually).

No, linux is usually webkit (which is also what safari is based on). Both Gtk and Qt have webkit-based widgets.

I'm not aware of a gecko based webview for desktop. Unfortunately, Firefox's technology is even more poorly suited for embedding than chromium's.

yoav · 2024-11-21T04:54:37 1732164877

The number 1 request I get from people is to add the ability to optionally bundle chromium.

I personally prefer the system webview because you don’t have to rush update your app for every chromium security update. And on the web making things cross browser is a normal part of the job and instinct imo.

But there are a ton of early startups that only have bandwidth to support chrome/chromium in their complex webapps and want a quick way to port their web app to desktop app. For them taking on the security burden and increasing bundle size is a good tradeoff to getting that consistency.

Luckily electrobun has a custom zig bsdiff implementation that generates update diffs as small as 4KB and self extracting executable that uses zstd so at least the file size is less relevant of a concern compared to electron.

lolinder · 2024-11-21T06:07:55 1732169275

> you don’t have to rush update your app for every chromium security update

I'm interested to hear more about this—if you're using security-sensitive features in a WebView, aren't you then at the mercy of the OS to patch them whenever they see fit? And if you're not using features that have security implications, why do you need the latest version of Chromium at all times?

littlestymaar · 2024-11-21T06:48:24 1732171704

> But there are a ton of early startups that only have bandwidth to support chrome/chromium in their complex webapps and want a quick way to port their web app to desktop app.

Ugh!

People writing web apps without supporting anything else than Chrome should burn in hell. (And that's a pretty useless decision anyway since “supporting chrome” really means supporting two engines: Chromium and WebKit, because Chrome on iOS uses WebKit internally …)

jamager · 2024-11-21T11:36:56 1732189016

Or email clients that only support Gmail...

dmazzoni · 2024-11-09T16:08:10 1731168490

Yes. There are far fewer people at each higher level.

dmazzoni · 2024-11-07T07:08:52 1730963332

> It's not even clear why they need a browser in the mix; most of these services have APIs you can use

They have APIs to schedule meetings.

They don't have APIs that give you access to the compressed video stream.

Sesse__ · 2024-11-07T08:15:38 1730967338

Many of them do, they're just not something any random person can go and sign up for.

dmazzoni · 2024-11-07T07:06:21 1730963181

They join a 3rd-party meeting using a browser.

Then they capture the video from the meeting in Chromium.

Then they need to send that captured video to another process for compression and processing.

No, WebSockets isn't the most efficient, but there aren't that many options once you're capturing inside a web page.

dilyevsky · 2024-11-07T08:25:21 1730967921

Not totally sure but they probably extract video via Chrome DevTools Protocol which uses WebSocket as transport.

dmazzoni · 2024-11-07T07:04:54 1730963094

> Their current approach is what I'd think would be a temporary solution while they reverse engineer the streams (or even get partnerships with the likes of MS and others. MS in particular would likely jump at an opportunity to AI something).

They support 7 meeting platforms. Even if 1 or 2 are open to providing APIs, they're not all going to do that.

Reverse-engineering the protocol would be far more efficient, yes - but it'd also be more brittle. The protocol could change at any time and reverse-engineering it again could days between days and weeks. Would you want a product with that sort of downtime?

Also, does it scale? Reverse-engineering 7+ protocols is a lot of engineering work, and it's very specialized work that not any software engineer could just dive into quickly.

In comparison, writing web scrapers to find the video element for 7 different meeting products is super easy to write, and super easy to fix.

lostmsu · 2024-11-07T08:42:20 1730968940

If they forked Chromium, they should have direct access to compressed stream of a particular video element without much fuss.

dmazzoni · 2024-11-07T07:00:55 1730962855

> Here they have a nicely compressed stream of video data

But they don't.

They support 7 different meeting providers (Zoom, Meet, WebEx, ...), none of which have an API that give you access to the compressed video stream.

In theory, you could try to reverse-engineer each protocol...but then your product could break for potentially days or weeks anytime one of those companies decides to change their protocol - vs web scraping, where if it breaks they can probably fix it in 15 minutes.

Their solution is inefficient, but robust. And that's ultimately a more monetizable product.

dmazzoni · 2024-11-07T06:56:22 1730962582

> Why don't they just write the meeting output to hls

They're capturing video from inside a Chromium process. How exactly do you expect to send the raw captured video frames to hls?

Are you proposing implementing the HLS server inside a web process?

mannyv · 2024-11-08T16:54:33 1731084873

No, what I'm saying is pipe the video output to an HLS encoder. HLS live will rewrite the m3u8 as more segments come in. In fact, they can render the audio into its own m3u8 and use that as input for their transcriber, saving even more bandwidth/data transfer/etc.

Since it's coming from a headless process, they can just pipe it into ffmpeg, which is probably what they're using on the back-end anyway. Send the output to a file, then copy those to s3 as they're generated. And you can drop the frame rate and bitrate on that while you're at it, saving time and latency.

It's really not rocket science. You just have to understand your problem domain more better.

Shipping uncompressed video around is ridiculous, unless you're doing video editing. And even then you should use low-res copies and just push around EDLs until you need to render (unless you need high-res to see something).

Given that they're doing all that work, they might as well try to get an HLS encoder running in chrome. There just was an mp3 codec in web assembly on HN, so an HLS live encoder may not be too hard. I mean, if they were blowing a million because of their bad design they could blow another million building a browser-based HLS encoder.

dmazzoni · 2024-11-07T06:54:44 1730962484

The initial approach was shipping raw video over a WebSocket...between two processes running on the same machine.

That doesn't sound like a ridiculous idea to me. How else would you get video data out of a sandboxed Chromium process?

sfink · 2024-11-07T20:38:51 1731011931

Short answer: raw video is big.

With my mindset, you have a gigantic chunk of data. Especially if you're recording multiple streams per machine. The immediate thought is that you want to avoid copying as much as possible. If you really, really have to, you can copy it once. Maybe even twice, though before moving from 1 to 2 copies you should spend some time thinking about whether it's possible to move from 1 to 0, or never materializing the full data at all (i.e., keep it compressed, which could apply here but only as an optimization for certain video applications and so is irrelevant to the bootstrapping phase).

WebSockets take your giant chunk of data and squeeze it through a straw. How many times does each byte get copied in the process? I don't know, but probably more than twice. Even worse, it's going to process it in chunks, so you're going to have per-chunk overhead (maybe including a context switch?) that is O(number of chunks in a giant data set).

But the application fundamentally requires squishing that giant data back down again, which immediately implies moving the computation to the data. I would want to experiment with a wasm-compiled video compressor (remember, we already have the no GPU constraint, so it's ok to light the CPU on fire), and then get compressed video out of the sandbox. WebSockets don't seem unreasonable for that -- they probably cost a factor of 2-4 over the raw data size, but once you've gained an order of magnitude from the compression, that's in the land of engineering tradeoffs. The bigger concern is dropping frames by combining the frame generation and reading with the compression, though I think you could probably use a Web Worker and SharedArrayBuffers to put those on different cores.

But I'm wrong. The data isn't so large that the brute force approach wouldn't work at all. My version would take longer to get up and running, which means they couldn't move on to the rest of the system.

dmazzoni · 2024-11-07T06:51:48 1730962308

I'm assuming it's because the compressed video on the wire is encrypted?

dmazzoni · 2024-11-07T06:50:15 1730962215

Their business is joining meetings from 7 different platforms (Zoom, Meet, WebEx, etc.) and capturing the video.

They don't have control of the incoming video format.

They don't even have access to the incoming video data, because they're not using an API. They're joining the meeting using a real browser, and capturing the video.

Is it an ugly hack? Maybe. But it's also a pretty robust one, because they're not dependent on an API that might break or reverse-engineering a protocol that might change. They're a bit dependent on the frontend, but that changes rarely and it's super easy to adapt when it does change.

h4ck_th3_pl4n3t · 2024-11-07T08:53:57 1730969637

I'm not sure you understood what I meant.

They are in control of the bot server that joins with the headless chrome client. They can use the CDP protocol to use the screencast API to write the recorded video stream to filesystem/disk, and then they can literally just run ffmpeg on that on-disk-on-server file and stream it somewhere else.

But instead they decided to use websockets to send it from that bot client to their own backend API, transmitting the raw pixels as either a raw blob or base64 encoded data, each frame, not encoded anyhow. And that is where the huge waste in bandwidth comes from.

(The article hints to this in a lot of places)

yencabulator · 2024-11-07T22:17:22 1731017842

They are doing e.g. transcriptions live as the stream happens, not writing a file and batch processing later.

lostmsu · 2024-11-07T08:29:44 1730968184

Even in this case it is non-sensical. Dunno about Linux, but on Windows you'd just feed the GPU window surface into a GPU hardware encoder via a shared texture with basically 0 data transmission, and get a compressed stream out.