This article is correct: Wasm has a code size problem. This is a problem in browsers because all that code has to be downloaded before the site can start. It's also a problem for serverless architectures, where code is often loaded from cold storage to a specific server on-demand while a client waits.
Tree-shaking might help, but I feel like it's only an incremental optimization. Fundamentally the reason Wasm programs are so bloated is because they have to bring their whole language runtimes and standard libraries with them. In contrast, with JavaScript, the implementation and basic libraries are provided by the browser. But obviously the browser can't be expected to have language runtimes for every language pre-loaded...
... or... could it?
I think we need to consider another approach as well: Shared libraries and dynamic linking.
WebAssembly supports dynamic linking. Multiple Wasm modules can be loaded at the same time and call each other. However, many Wasm toolchains do not attempt to support it. Instead, they are often designed to statically link an entire program (plus language runtime) into a single gargantuan module.
Pyodide (CPython on Wasm) is a counter-example. It is designed for dynamic linking today. This is precisely why Cloudflare Workers (a serverless platform) was recently able to add first-class support for Python[0]. (I'm the tech lead for the overall Workers platform.) A single compiled copy of the Pyodide runtime is shared by all Workers running on the same machine, so it doesn't have to be separately loaded for each one.
If dynamic linking were more widely supported, then we could start thinking about an architecture where browsers have various popular language runtimes (and perhaps even popular libraries) preloaded, so that all web pages requiring that runtime can share the same (read-only) copy of that code. These runtimes would still run inside the sandbox, so there's no need for the browser to trust them, just make them available. This way we can actually have browsers that have "built-in" support for languages beyond JavaScript -- without the browser maintainers having to fully vet or think about those language implementations.
> browsers have various popular language runtimes (and perhaps even popular libraries) preloaded, so that all web pages requiring that runtime can share the same (read-only) copy of that code.
That sounds a lot like the idea from some years past that commonly used JavaScript frameworks would be served from a few common CDNs, and would be widely used enough to almost always be in the browser's cache, so they wouldn't need to actually be downloaded for most pages (hence, the size of the JS frameworks shouldn't matter so much).
I'm no expert but from what I understand, that didn't really work out very well. A combination of too many different versions of these libraries (so each individual version is actually not that widely used), and later privacy concerns that moved browsers toward partitioning cache by site or origin. Maybe other reasons too.
Of course, you didn't mention caching and perhaps that's not what you had in mind, but I think it's a tricky problem (a social problem more than a technical one): do you add baseline browser support for increasing numbers of language runtimes? That raises the bar for new browsers even further and anyway you'll never support all the libraries and runtimes people want. Do you let people bring their own and rely on caching? Then how do you avoid the problems previously encountered with caching JS libs?
These are good questions and I think there's more than one answer that's worth exploring.
I think that the privacy problems caused by shared caches could be solved, without simply prohibiting them altogether. Like, what if you only use the shared cache after N different web sites have requested the same module?
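The "promote after N sites" idea can be sketched in a few lines. This is purely illustrative logic, not a real browser API; all names here (`SharedCachePolicy`, `request`) are made up.

```javascript
// Sketch of a "promote after N distinct sites" shared-cache policy.
// A module (keyed by content hash) is served from the shared cache only
// once some threshold of distinct top-level origins has requested it;
// before that, each origin gets its own partitioned copy.
class SharedCachePolicy {
  constructor(threshold) {
    this.threshold = threshold;
    this.requesters = new Map(); // moduleHash -> Set of requesting origins
  }

  // Record a request and report whether the shared copy may be used.
  request(moduleHash, origin) {
    if (!this.requesters.has(moduleHash)) {
      this.requesters.set(moduleHash, new Set());
    }
    const origins = this.requesters.get(moduleHash);
    origins.add(origin);
    return origins.size >= this.threshold;
  }
}

const policy = new SharedCachePolicy(3);
policy.request("sha256-abc...", "https://a.example"); // false: 1 origin, partitioned
policy.request("sha256-abc...", "https://b.example"); // false: 2 origins
policy.request("sha256-abc...", "https://c.example"); // true: threshold reached
```

The weak point, of course, is what counts as a "distinct" origin, since nothing stops one operator from registering N domains.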
But if we really can't get around that problem, then I think another approach worth exploring is for there to be some sort of curated repository somewhere of Wasm modules that are popular enough that browsers should pre-download them. Then the existence of the module in a user's browser doesn't say anything about what sites they have been to.
Versioning is a problem, yes. If every incremental minor release of a language runtime is considered a separate version then it may be rare for any two web sites to share the same version. The way the browser solves this for JavaScript is to run all sites on the latest version of the JS runtime, and fully commit to backwards compatibility. If particular language runtimes could also commit to backwards compatibility at the ABI level, then you only need to pre-download one runtime per language. I realize this may be a big cultural change for some of them. It may be more palatable to say that a language is allowed to do occasional major releases with breaking changes, but is expected to keep minor releases backwards-compatible, so that only a couple of different runtime versions are needed. And once a version gets too old, it falls out of the preload set -- websites which can't be bothered to stay up to date get slower, but that's on them.
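The resolution policy described above can be sketched as a small resolver. This is hypothetical: the preload set, the compatibility rule (newer releases within the same major satisfy older requests), and the function names are all invented for illustration.

```javascript
// Hypothetical preload set shipped with the browser: a couple of
// versions per language runtime, newest last.
const preloaded = {
  python: ["3.11.8", "3.12.2"],
  go: ["1.22.0"],
};

// Compare dotted version strings numerically, part by part.
function cmp(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Resolve a site's requested runtime version against the preload set,
// assuming releases within a major version are backwards-compatible.
// Returns null when nothing matches: the site must ship its own copy.
function resolveRuntime(lang, requested) {
  const major = Number(requested.split(".")[0]);
  const ok = (preloaded[lang] || [])
    .filter((v) => Number(v.split(".")[0]) === major && cmp(v, requested) >= 0)
    .sort(cmp);
  return ok.length ? ok[ok.length - 1] : null;
}

resolveRuntime("python", "3.11.0"); // "3.12.2": newest compatible preload
resolveRuntime("python", "3.13.0"); // null: too new, site ships its own
```

A version that falls out of the preload set simply stops resolving, which is the "get slower, but that's on them" outcome.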
This is definitely the kind of thing where there's no answer that is technically ideal and people are going to argue a lot about it. But I think if we want to have the web platform really support more than just JavaScript, we need to figure this out.
I think a better model would be for the site itself to provide the modules, but the browser will hash and cache them for the next site that may want to use the same module.
This way, there's no central authority that determines what is common enough.
This model does not allow for versioning: the cache can only match exact bytes. Anything looser would be risky (one website could provide a malicious module that infects the next site you visit).
> Like, what if you only use the shared cache after N different web sites have requested the same module?
That would still let websites perform timing attacks to deanonymise people. There's no way to verify that "N different websites" isn't just the same website with N different names.
Though, we could promote certain domains as CDNs, exempt from the no-shared-cache rules: so long as we added artificial delay when it "would have" been downloaded, that'd be just as safe. We're already doing this with domains (HSTS preload list), so why not CDNs?
Web browser developers seem to labour under the assumption that anyone will use the HTML5 features they've so lovingly hand-crafted. Who wants something as complicated as:
<details>
<summary>Eat me</summary>
<p>Lorem ipsum and so on and so forth…</p>
</details>
when they could have this instead?

<Accordion>
<AccordionSummary id="panel-header" aria-controls="panel-content">
Eat me
</AccordionSummary>
<AccordionDetails>
Lorem ipsum and so on and so forth…
</AccordionDetails>
</Accordion>
Maybe the problem isn't the libraries. Maybe the problem is us.
The problem is the libraries. Browsers are still mostly incapable of delivering usable, workable building blocks, especially in the realm of UI. https://open-ui.org/ is a good start, but it will be a while before we see major payoffs.
Another reason is that the DOM is horrendously bad at building anything UI-related. Laying out static text and images? Sure, barely. Providing actual building blocks for a UI? Emphatically no.
And that's the reason why devs keep reinventing controls. Because while details/summary is good, it's extremely limited, does not provide all the needed features, and is impossible to properly extend.
It seems like limited dynamic linking support could go a long way.
For example, there could be a Go shared library that includes the runtime and core parts of the standard library that many programs use. It would decrease the size of all Go programs, without needing to have dynamic library support within an app. The language runtime might not need heavy optimization for space. It’s already loaded, and as long as any program uses a function, it’s not wasted space.
It changes the cost model for optimizing programs in that language for space. Since included standard library functions are free (if you’re using the language at all), you might as well use them.
Though, the problem reoccurs with commonly used libraries and frameworks. You’d also want Cloudflare’s standard library for Go to be shared when running on Cloudflare.
One problem with this model is that languages don’t evolve in lockstep with the runtime. Either there would be limited support for different versions of a language, or the shared libraries available would pile up over time, resulting in limited sharing between apps. JavaScript has the “you don’t get a choice” versioning model, which requires strong backward compatibility and sometimes polyfills. It might not be as suitable for other languages.
If a platform really wants to cut down on space, it can do so by limiting plugin diversity. Though there are complaints, "you must use JavaScript" worked out pretty well for browsers.
Maybe we don't need a lot of different WebAssembly-based languages? It's a Tower of Babel situation. Diversity has costs.
Could it be possible to do "profile guided tree-shaking" to build a small module with all the code that's necessary for the application and pull-in less used functionality on-demand using dynamic linking?
If tree-shaking was done based on production information it may be possible to prune a lot of dead/almost-dead code without having to implement sophisticated static analysis algorithms.
A lazy chunked delivery strategy like the one used in the k8s stargz-snapshotter[0] project could be effective here, where chunks are only pulled as needed, but it would probably require Wasm platform changes.
There is a substantial risk there unless you can hit all the edge cases and error conditions when profiling. Even a good fuzzer can miss a very rare state. Then when you hit it in real use there's no code to handle it!
Profile-based optimization and JITting is plausible because the corner cases are still there, just not optimized.
I completely agree, that's why in that case you could download the missing code from the server and load it using dynamic linking.
The server would then mark it as reachable so it's delivered as part of the main bundle next time.
I would expect the bundle to converge quickly to the set of functions that are actually reachable.
Additionally, it's very likely that the sets of reachable code of two versions of the same app have significant overlap, so the information collected for version N could be used as a starting point for N+1, and so on.
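The feedback loop being described can be simulated in a few lines. Everything here is illustrative: the class, the profile format, and the "download on demand" step are stand-ins for a real toolchain and server.

```javascript
// Simulation of profile-guided tree-shaking with an on-demand fallback:
// ship only functions observed in production profiles; when an
// unprofiled function is hit at runtime, load it dynamically and mark it
// reachable so the next build includes it.
class ProfileGuidedBundler {
  constructor(allFunctions) {
    this.allFunctions = allFunctions; // full static program: name -> fn
    this.reachable = new Set();       // names observed in profiles
  }

  // Feed in names of functions seen called in production.
  recordProfile(calledNames) {
    for (const name of calledNames) this.reachable.add(name);
  }

  // The "main bundle": only functions believed reachable.
  buildBundle() {
    return [...this.allFunctions.keys()].filter((n) => this.reachable.has(n));
  }

  // Runtime fallback: a cold function is "downloaded" via dynamic
  // linking and recorded as reachable for the next build.
  call(name) {
    if (!this.reachable.has(name)) this.reachable.add(name);
    return this.allFunctions.get(name)();
  }
}

const bundler = new ProfileGuidedBundler(new Map([
  ["main", () => "ok"],
  ["rareErrorPath", () => "handled"],
]));
bundler.recordProfile(["main"]);  // profiles never saw the error path
bundler.buildBundle();            // ["main"] -- small initial bundle
bundler.call("rareErrorPath");    // cold hit: loaded on demand, still correct
bundler.buildBundle();            // ["main", "rareErrorPath"] next time
```

This captures why the scheme is safe against missed corner cases: a cold hit costs a round trip, not a crash, and the bundle converges toward the truly reachable set.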
> we could start thinking about an architecture where browsers have various popular language runtimes (and perhaps even popular libraries) preloaded
that could potentially lead to hundreds of versions of runtimes downloaded in the browser, filling up the cache with binaries that might be used by 1 site each
I think I agree overall, just want to point out that with Wasm you still end up using a fair bit of the built-into-the-browser JS to accomplish anything that isn't purely computational. Especially in this context with Hoot [1], where things like appendChild are external functions you call from inside the Scheme code. One could theoretically do this for much of the JS standard library in any kind of Wasm context.
Indeed, I/O APIs (anything that talks to the outside world) are another sore point for WebAssembly, as browsers do not currently expose any particular APIs directly to Wasm, only to JavaScript. So Wasm has to make calls to a JavaScript middleman layer to use those APIs.
But browsers are understandably hesitant to create a whole parallel API surface designed specifically for Wasm callers. That's a lot of work.
I am not totally convinced that this is a real problem, vs. just something that makes people feel bad. Like, if you are coding Rust, the idea that all your "system calls" are calling into a layer of JavaScript feels disgusting. But is it a real problem? Most of these calls are probably not so performance sensitive that this FFI layer matters that much.
If it is a real problem, I'd guess the answer is for browsers to come up with a more efficient way to expose WebIDL-defined APIs to Wasm, but without reinventing any individual APIs. Being derived from WebIDL, they are still going to have JS idioms in their design, but maybe we can at least skip invoking actual JavaScript.
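The "JavaScript middleman" is concrete enough to demonstrate: a Wasm module cannot reach any browser API directly; everything external arrives through the import object at instantiation. The byte array below is the standard binary encoding of a tiny module that imports `env.log(i32)` and exports `run()`, which calls `log(42)`; in a real page the glue function could wrap a DOM call like appendChild.

```javascript
// Hand-assembled Wasm module: imports env.log, exports run() -> log(42).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00,       // type 0: (i32) -> ()
  0x60, 0x00, 0x00,                               // type 1: () -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import "env"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00,             //   "log": func of type 0
  0x03, 0x02, 0x01, 0x01,                         // one func of type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run"
  0x0a, 0x08, 0x01, 0x06, 0x00,                   // code section
  0x41, 0x2a, 0x10, 0x00, 0x0b,                   //   i32.const 42; call log; end
]);

const received = [];
// The middleman layer: plain JS functions handed to the module.
const glue = { env: { log: (x) => received.push(x) } };

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), glue);
instance.exports.run();
// received is now [42]
```

Every such crossing is a JS call, which is the FFI overhead in question; a WebIDL-derived direct path would remove the JS frame but keep the same shape of import.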
There's a lot I don't know about how browsers are shipped, but it seems to me like browsers could easily get away with packing in a few languages and their standard libraries as part of their default installs. Python is what, 25MB? Would another couple hundred megs of disk space be such a big deal?
Possibly – if you can find a single version of Python that everybody will be happy with, forever.
Being able to cache runtimes and libraries like that across sites would be nice, though (but probably enables fingerprinting, so one Python runtime per origin it is).
The way Emscripten does it, IIRC, doesn't require any special browser support. The toolchain generates glue code in JavaScript to support calls between dynamically linked Wasm modules.
Do you happen to know where can I check out the cutoff version for each browser? https://caniuse.com/?search=wasm doesn't have it (or other things like WasmGC for that matter)
I believe dynamic linking has been a core feature of WebAssembly from the beginning. You have always been able to load multiple Wasm modules in the same isolate and make them call each other.
(But, language toolchains have to actually be designed to use this feature. Most aren't.)
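As a minimal demonstration of this, here are two hand-assembled Wasm modules linked at runtime with no special browser support: module A exports `add(a, b)`; module B imports it (as `"lib"."add"`) and exports `double(x) = add(x, x)`. The byte arrays are the standard binary encoding of those two tiny modules.

```javascript
// Module A: exports add(a, b) = a + b.
const bytesA = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32,i32)->i32
  0x03, 0x02, 0x01, 0x00,                               // one func of that type
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   //   local.get 0/1; i32.add
]);

// Module B: imports lib.add, exports double(x) = add(x, x).
const bytesB = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x0c, 0x02, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32,i32)->i32
  0x60, 0x01, 0x7f, 0x01, 0x7f,                         // type 1: (i32)->i32
  0x02, 0x0b, 0x01, 0x03, 0x6c, 0x69, 0x62,             // import "lib"
  0x03, 0x61, 0x64, 0x64, 0x00, 0x00,                   //   "add": func of type 0
  0x03, 0x02, 0x01, 0x01,                               // one func of type 1
  0x07, 0x0a, 0x01, 0x06, 0x64, 0x6f, 0x75, 0x62,       // export "double"
  0x6c, 0x65, 0x00, 0x01,
  0x0a, 0x0a, 0x01, 0x08, 0x00,                         // code section
  0x20, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b,             //   call add(x, x)
]);

const libA = new WebAssembly.Instance(new WebAssembly.Module(bytesA));
const libB = new WebAssembly.Instance(new WebAssembly.Module(bytesB), {
  lib: { add: libA.exports.add }, // dynamic linking: wire B's import to A's export
});
// libA.exports.add(2, 3) === 5; libB.exports.double(21) === 42
```

A real toolchain would generate this wiring (and the shared-memory setup a language runtime needs) rather than hand-writing bytes, which is exactly the toolchain work most languages haven't done.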
[0] https://blog.cloudflare.com/python-workers