I've kept openEtG's wasm blob (card game engine) <400kb while migrating a lot of logic (card text generation, etc) to wasm. It's Rust. Doing so takes some care:
1. avoid floats (fixed point arithmetic saved quite a bit of space - see the sketch below)
2. avoid hashmaps (originally used hashmaps since they were an easy port from JS Maps; have since ported everything to vecs)
3. avoid strings (for a while there were no strings, but eventually brought them in for display logic)
4. use a small allocator, like talc
5. avoid dependencies. I only use rand & fxhash. I should probably get rid of rand (fxhash is only used to hash game state to check for desyncs)
6. avoid generic diversity. I try to keep a small set of instantiated types; for example, Vec<i16> is already there, so there's no need to bring in Box<[i16]> or anything. Getting away from floats/hashmaps helped reduce type diversity
7. design algorithms with size in mind. I have a couple lookup tables where I pack bits: https://github.com/serprex/openEtG/blob/2011007dec2616d1a24d... encodes an adrenaline mechanic where multiple attacks give lower-attack-power creatures more attacks than higher-attack-power creatures. Care was taken comparing how much the decoding logic cost versus storing unpacked values. AI evaluation uses 6-bit fixed precision because 64 encodes more efficiently than 128 in WebAssembly
Similarly there's a targeting mechanism with AND/OR & predicates. I used to have an AST-like format with each predicate getting an enum & AND/OR being a slice of expressions. Now each expression is encoded as 32-bit integers in Polish notation; AND/OR have 2-bit codes & predicates are 6 bits (Polish notation won over reverse Polish here because with Polish notation I was able to have AND/OR short-circuit evaluation)
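To make point 1 concrete, here's a minimal fixed-point sketch in the spirit of the 6-bit precision mentioned in point 7 (hypothetical code, not openEtG's actual implementation):

    // 6 fractional bits: 1.0 is stored as 64, the constant noted above
    // as encoding more compactly than 128 in wasm.
    const FRAC_BITS: u32 = 6;
    const ONE: i32 = 1 << FRAC_BITS; // 64

    fn from_int(n: i32) -> i32 { n << FRAC_BITS }
    fn mul(a: i32, b: i32) -> i32 { (a * b) >> FRAC_BITS }

    fn main() {
        // weight an attack score by 2.5 without linking any f32/f64 code
        let weight = ONE * 5 / 2;                  // 2.5 -> 160
        let score = mul(from_int(7), weight);      // 7 * 2.5
        assert_eq!(score, from_int(17) + ONE / 2); // 17.5
    }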
> 1. avoid floats (fixed point arithmetic saved quite a bit of space)
This is really interesting, since WASM has built in float types of course - is there any more detail you can go into / examples you can show us?
I have a problem at work where I was thinking fixed point arithmetic might help, because I know my maximum resolution needs (e.g. I know the problem will never care about sub-millimetre positioning). I'd be interested to hear more about related issues.
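For what it's worth, a minimal sketch of how that can look (hypothetical, not from the parent's code): pick millimetres as the base unit and positions become plain integers:

    // Positions in whole millimetres; i64 gives ~9.2e15 mm of range,
    // so overflow is a non-issue for most positioning problems.
    type Mm = i64;

    fn from_metres(metres: i64, extra_mm: i64) -> Mm {
        metres * 1_000 + extra_mm
    }

    fn midpoint(a: Mm, b: Mm) -> Mm {
        (a + b) / 2 // exact to the resolution you chose up front
    }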
Are you sure they're not talking about code size? Not sure what limiting instantiations would have to do with heap size...
(I also find it surprising that avoiding floats helps anything; maybe just in avoiding more instantiations? Because if the std floating point functions don't fully inline, you've got bigger problems than wasm blob size...)
- Use brotli compression for serving wasm bundles to the browser, and make sure your web server is set up to serve the brotli-compressed files. (It's a 1-line change in nginx, for example.) Brotli drops the size of wasm bundles by about 3x. It's much better than gzip for wasm.
I've been working on a C++ deep RL library (RLtools) and also created some WASM examples (https://rl.tools). I didn't pay any attention to the binary size at all but it turns out it is also just around 200-300kb (including everything, deep learning forward/backward, RL algo and dynamics simulation).
Even though it's not prohibitive rn, I'm curious how small it could be and hopefully find some time soon to squeeze it down
And even though JavaScript only conceptually uses doubles everywhere, JavaScript engines do use integers where they can get away with it. Your loop counter is almost certainly an int. You can use this to your advantage for optimization purposes if you carefully craft your statements to probably fit in integers (e.g. by adding bitwise operations)
> Especially for step 6, using a Vec as a Box, I would not expect that to save much.
It's not that using a Vec as a Box saves anything at all. It's that generics require monomorphisation, which eats up a bunch of space, and more instances of generic types translate into more space.
If your application uses Vec<i16> elsewhere, you've already paid the binary size price for using it. Vec<i16> is close enough to Box<[i16]> that the functional differences don't warrant the extra code generation.
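As a hypothetical illustration (not openEtG's actual types):

    // Each distinct generic instantiation is monomorphised into its own code.
    struct DeckA { cards: Vec<i16> }   // reuses the Vec<i16> code already in the binary
    struct DeckB { cards: Box<[i16]> } // would add Box<[i16]> drop glue & conversion code

Since Vec<i16> converts to Box<[i16]> trivially (into_boxed_slice), sticking with Vec<i16> avoids carrying both instantiations.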
Unfortunately nothing concrete, it's a few kb here & there. I'd estimate I've saved ~300kb overall. I'd have to go back to compare these things. Removing floats saved more space than I expected
Strings seem to have a high cost because there's a lot of complexity in rust's format! macros. And the generated code seems to end up with a lot of obscure ways it can panic.
I've found just having one stray dbg!() in my rust code can add ~20kb to my wasm bundle size. Look at what a single dbg!(some_u32) generates: https://rust.godbolt.org/z/bex9z8vx7 . It's also calling into a bunch of garbled functions in the standard library - which will bring in a lot more code.
I suspect zig's comptime approach might work much better for this sort of thing, if we want smaller binary sizes.
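A common workaround is to skip core::fmt entirely: import a logging function from the JS host and pass raw integers. A hedged sketch of that standard wasm pattern (log_u32 is a made-up import the host would have to supply):

    // The embedder provides `log_u32` in the import object, e.g.
    // WebAssembly.instantiate(mod, { env: { log_u32: x => console.log(x) } })
    #[link(wasm_import_module = "env")]
    extern "C" {
        fn log_u32(x: u32);
    }

    // No formatting machinery at all - this compiles to a single call.
    pub fn debug_u32(x: u32) {
        unsafe { log_u32(x) }
    }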
jesus christ, just do a desktop app. by your own admission you only saved 300k with all that. a desktop app at 700k is nothing, which means you could ignore all these optimizations and focus on improving speed or adding features
Arguably, on the web a difference between 300 KB and 700 KB doesn't matter either, many web pages have a single hero image bigger than that.
But for a long time one main critique point was that WASM apps are "bloated", so it's good to see counterexamples, and that devs start caring about trimming fat in general (this would also be a good thing for many desktop applications tbh).
Tree-shaking is such a bad misnomer. Virgil's compiler calls this "reachability analysis" and it's built into the compilation model. The compiler will parse and typecheck a program's (and libraries') code, and run initializers, but after that the compilation proceeds by exploring from the main entrypoint(s), and only reachable code is analyzed and ends up in the final binary. It will happily generate a program (without runtime system) that consists of just a single main function. The runtime system is only necessary for stacktraces and GC, and can be omitted if desired.
The name "treeshaker" for an application delivery tool possibly originated in Lisp. The first time I've found it was in Lucid Common Lisp, a (no longer available) commercial implementation of Common Lisp for UNIX. Lucid CL 4.1 in 1992 included a tool called Treeshaker. Lucid CL was one of those Lisps which have the idea of an image (see for example the "image" feature of Lisp 1 in 1960), a saved memory dump of the current heap of a running Lisp. An application then is an image plus a runtime. The image typically includes almost all code AND data from the memory, which sometimes creates the wish to create smaller images for application delivery. Lucid CL then included a tool, which before saving that image, removes all kinds of unused code and data, depending on some meaning of what „unused“ means. For example the symbol table may have variables, types, functions, etc. which are not referenced anywhere. A treeshaker tool might also get a list of things to remove. Thus in a graph of reachable Lisp data&code, the connections are pruned. Either a GC or a specialized piece of code then collects the garbage, shrinks memory (-> shakes the tree and everything which is cut loose is falling down) and dumps it as a possibly smaller image.
The treeshaker thus was not a compiler tool, but a tool to remove what was determined to be unused code of a Lisp heap. Remember, by default such a Lisp image would also contain a compiler, an interpreter and an implementation of a read-eval-print loop. Thus if we break (-> interrupt a thread which then provides a REPL) a running program into such a read-eval-print loop, we could still use all the code, which is in this heap (which was restored from an image). Thus it would make sense to remove the compiler too, and possibly the read-eval-print loop, too.
I could be wrong, but I've heard that the term originated in Smalltalk. Smalltalk is also image-based, and tree-shaking was a way to produce a smaller image for deploying.
From above: "In contrast, tree shaking uses the approach of eliminating --shaking out-- what is not needed in a static fashion. Programmers specify the requirements of their application and the unneeded parts of the Lisp system are removed using their detailed knowledge of the application. The disadvantages of tree shaking are that it requires programmer intervention and that a function which is shaken out cannot be easily restored."
> Most Lisp development systems, including Lucid’s, provide all the resources of the Lisp system by default, and this in turn leads to a style of development in which the programmer makes use of whatever tool happens to be most convenient. Because much of the basic Lisp system (or any development system built on top of the basic Lisp system) will generally be unused by a given application, it is very worthwhile to have a tool for excising these unused parts. This tool is called the Treeshaker.
> Treeshaker execution occurs in three phases: walking, testing and writing. In the walking phase, the Treeshaker accumulates a set of objects that need to be included in the saved image. After making this set, the treeshaker runs a test of the application to check that all objects which are used in a typical run have been included. The writing phase then generates an executable image which will run the application.
> To a first approximation, the walk phase is just a matter of computing the connected component of the Lisp image (treated as a directed graph in the obvious way) generated by the application’s toplevel function. However, because of the way that Lisp objects are generally connected this usually includes almost the entire Lisp image including the unused subsystems. Therefore the treeshaker uses several techniques to find connections between objects that do not actually need to be followed in the walk."
> "The name Treeshaker is meant to be evocative of the idea of actually shaking a tree to dislodge dead branches or other trash."
On the funnier side, LispWorks has a keyword to the DELIVER function, :shake-shake-shake, which invokes the treeshaker during application delivery.
The correct general term is “dead-code elimination” [0]. Reachability analysis is commonly used to determine what code can be removed, but other methods are possible, and the analysis by itself doesn’t remove any code; that’s a subsequent step.
“Tree-shaking” as commonly used implies that the granularity of the removal is functions, whereas dead-code elimination can generally be at arbitrarily fine levels, for example eliminating branches of conditional expressions, and can be based on all kinds of static program analyses.
Although true, in most cases when people talk about dead code elimination, they refer to eliminating code inside a function, whereas tree-shaking unambiguously refers to inter-procedural dead code elimination.
Probably because traditionally, unused 'objects' in static link libraries were ignored in the first place by linkers. But with LTO the term dead-code-elimination makes sense IMHO (since the LTO pass will drop any code and data that ends up being unused).
The reason I didn't like the name when I first came across it is that I think of shaking the tree for harvesting the fruit. The fruit is what you want, not what you want to eliminate.
Dead branches and loose leaves are still connected to a real tree, that's why it's a misnomer. "Raking" would be a better name if you want to keep to the metaphor. Dead code elimination is the most precise term, and is already well established.
dead branches and loose leaves are connected to a real tree in the same way that dead code is connected to the program in a file. if you break off the dead branch, nothing happens to the tree, just like when you remove unused code, nothing happens to the program.
I learned of tree shaking first, and DCE is clearly superior as a term: 1) it's actually descriptive, 2) there's a large literature using this term to look for further information, and 3) as the original poster noted, it's not actually a misnomer. There is literally no advantage to the new term that I can think of.
Tree-shaking is specifically the form of DCE where you remove unreferenced functions/modules.
Especially important is that you can do tree-shaking without analyzing control flow, while by default "DCE" implies you're analyzing control flow.
And I don't think tree-shaking is a misnomer. Depending on how you visualize the metaphor, unreferenced functions are either barely attached or not attached. Shaking them off is simple and sufficiently realistic.
Tree shaking is the established Lisp term, but it was used not for compilers but for packagers, to shrink images. Dead branches are NOT stored in the pruned image.
Dead code elimination came 20 years later with C compilers.
The problem with treeshaking - I wrote my first for my lisp 30 years ago, it was trivial - is the lack of compile-time evaluation. The more the compiler knows, the more it can prune. Every run-time branch, late binding and esp. dynamic call by string kills it. With simple tricks you can eliminate 90% of your code: IO, error handling, the numeric tower, lots of slack in the stdlibs. With GUI even more. I heard of CL images shrunk from 2GB down to a floppy disk.
DCE is a standard compiler optimization that's been around since at least the 80s. It's done in multiple different ways, and "tree shaking" is just one more way. Whether the term DCE is known doesn't seem relevant to what is the most descriptive and meaningful term for this optimization.
I can't speak to origins, but currently in the JS world DCE and tree-shaking refer to different things. "Tree-shaking" normally refers to when the bundler omits unreachable code, that a more naive bundler would have included. It's an oft-discussed topic because it wasn't possible to do in some earlier module formats, and some bundlers do it better than others. In this context the "tree" mostly refers to the dependency tree.
In contrast, DCE usually refers to what the JS engine does at runtime, via whatever means. But DCE isn't much discussed, unless one is talking about v8 internals or the like.
DCE has been around at least since Frances E. Allen's 1971 "A catalogue of optimizing transformations". I don't have her earlier papers/internal IBM memos to be able to say more, but the section on DCE in that paper didn't appear to contain any further references.
So in your mind, a compiler pass is somehow supposed to be similar to the wind blowing, and despite the fact that "tree shaking" makes no reference to the wind or dead branches or leaves, you think anyone seeing this term will immediately make this "completely obvious" connection, and to you, this makes more sense than simply calling an optimization a straightforward, self-explanatory name like "reachability analysis" or "dead code elimination".
It's no wonder naming things is considered one of the hardest things in computer science if this kind of convoluted argument makes sense to people.
Virgil's reachability analysis also applies to initialized objects and their fields. I.e. it will only include the reachable parts of the heap and will remove fields that are never accessed from objects. This analysis happens on the entire program at once.
Proving yet again that there aren’t enough gardeners in computer science.
The metaphor we use for optimization is “low hanging fruit” which no orchard owner would ever do. It’s massively wasteful, be you a programmer or a farmer. It’s what amateurs do. I do tree shaking. Pick a tree (subject matter in the code) and get all of the fruit that’s willing to fall off before moving to the next. It’s more efficient, more effective, and more sustainable. It works with human factors instead of against them.
What we call tree shaking is more like tree pruning. Specifically what you’d call thinning (tracing a misplaced or damaged branch back to the parent and cutting it there).
> The metaphor we use for optimization is “low hanging fruit” which no orchard owner would ever do.
This is just over-extending the metaphor.
The term has existed long before software was a thing, and refers simply to grabbing something that's easy. That's it.
The same goes for tree-shaking. The author does the same thing - over-extends the metaphor. Tree shaking simply means giving everything a jiggle and seeing what comes loose. It's easy to understand and shouldn't be read into any more than that.
I think the idea behind calling it "shaking" is these branches and leaves that are already cut (inaccessible from the root), and just need a strong breeze to shake the tree and make them fall out.
I believe low hanging fruit is not specific to CS.
Regardless, it is a perfect metaphor. You want to eat an apple: which one do you pick? Taking the low-hanging fruit is less work right now and gets you to your immediate goal, but disregards general efficiency. Sure, picking a whole tree is more efficient. But if you want a single apple, taking the low hanging fruit is the fastest approach. The metaphor works because it actually implies that it's not the most efficient approach, just the easiest
Everyone, even the amateurs and complete non-gardeners, knows that this is not the best way to pick fruit. The whole point of the phrase is to point out that someone was lazy. It is saying that all they did was the absolute minimum amount of work.
You think of orchard owners, but that's not the image it conjures for me. I remember the neighborhood of my childhood. The fruit trees there were 90% decoration. Usually someone would pick one (1) fruit whenever they felt like getting one. Most of the fruits would never get picked at all. The ground would be full of overripe and rotting fruit, until someone could be bothered to clean it up.
> If your language’s compiler toolchain can manage to produce useful Wasm in a file that is less than a handful of over-the-wire kilobytes, you can win.
I agree that tiny binaries will open up new use cases for wasm! And WasmGC definitely helps.
As more context, Java and Kotlin can do fairly well there today, around 2-3 K:
Though as Andy says, it depends which APIs you use - I am sure there are Java/Kotlin APIs that would pull in large amounts of code, so you do need to be careful there. But these languages are already doing a lot better than C++ and Rust on code size, thanks to WasmGC (no need to bundle several K of memory management code, in particular).
I'm curious if wasmgc will help with rust as well. I can imagine it helping when handling javascript objects. And I think it could be used as an alternative, less efficient memory allocator.
But even so, the default rust allocator in wasm is probably fine in most cases. Once you start compiling for size, using wasm-opt and brotli compressing your wasm code, you can fit a massive amount of code in less than 100kb of downloaded content. And it's a mistake to directly compare the cost of 100kb of wasm with 100kb of bundled javascript. Javascript is many times slower to parse and initialize. The download time is real, but actual time-to-first-paint is much better using 100kb of wasm vs 100kb of javascript.
But smaller is better. I'm quite excited for Java, Kotlin, C#, Python, Go and friends to all become viable languages for web applications. I'm curious what the resulting size of real applications will end up being. I suspect one of the biggest differences will be in how the frameworks are made. Virtual DOM diffing will always be more complex and slow than reactive component libraries like Svelte, Solidjs, Leptos (rust) and so on. Once wasmgc lands everywhere, I think which web framework you use will have a much bigger impact on performance than language.
> Wasm makes it thinkable to do DOM programming in languages other than JavaScript
Does it really? AFAIK, if I want to do any kind of DOM manipulation in say, rust, I need bindings that will basically serialize calls to be done on the JS side. So with the current incarnation of wasm, I believe you're still stuck with JS.
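For example, with the wasm-bindgen/web-sys crates (a rough sketch; these crates generate exactly that kind of JS glue, so every DOM call below crosses the wasm/JS boundary):

    use wasm_bindgen::prelude::*;

    #[wasm_bindgen]
    pub fn add_greeting() -> Result<(), JsValue> {
        // Each of these is a wasm import backed by a generated JS shim.
        let document = web_sys::window().unwrap().document().unwrap();
        let p = document.create_element("p")?;
        p.set_text_content(Some("hello from wasm"));
        document.body().unwrap().append_child(&p)?;
        Ok(())
    }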
In theory you can now import DOM functions from the runtime & call them with references to DOM objects, bypassing JS to call directly into the runtime (I say in theory because I'm not in the know whether this is actually possible, but GC at least brings the prerequisite mechanisms to get to that point)
The GC feature in wasm extends to having opaque references outside linear memory (which may reference objects controlled by the host system's GC), avoiding the need for the JS caller to handle resources freed by wasm code with a pool or something keyed by handles (integers) passed to wasm
I've been playing around with leptos for rust lately - which is a super fast framework for doing web frontend work with rust via wasm. It seems fine, honestly. Basically the same as solidjs:
There's some extra size overhead from wasm compared to javascript, but it's honestly not that bad. After wasm-opt and brotli compression, the wasm bundle for this counter app is 37kb. So it's in the same general ballpark as react, but much faster once it's up and running.
I haven't tried doing direct DOM manipulation with it. But for general components it seems great.
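For a flavor of what that looks like, a counter sketch roughly following leptos's own examples (0.5-era API; the signal functions have shifted between versions):

    use leptos::*;

    #[component]
    fn Counter() -> impl IntoView {
        let (count, set_count) = create_signal(0);
        view! {
            <button on:click=move |_| set_count.update(|n| *n += 1)>
                "Count: " {count}
            </button>
        }
    }

    fn main() {
        mount_to_body(|| view! { <Counter/> })
    }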
The intent of WASI is to provide (among other things) a direct to DOM API. One where e.g. a DOM implemented in Rust can be used from a WASM module written in Rust without executing any Javascript.
Some of that gets fiddly because the DOM API is sort of specified with Javascript semantics.. so they're going for things with less Javascript legacy baggage first, like HTTP requests (server & client), TCP sockets, filesystem access.
"Dead code elimination" usually refers to smaller-scale compiler optimizations where within a function, you're discarding pieces of code that will never be reached.
"Tree-shaking" refers to a whole-program analysis where you discard entire modules and functions if they are never invoked.
They are conceptually the same, but a compiler author will likely have to implement them separately, so having two names helps.
FWIW I never encountered 'tree shaking' in the C/C++ world (only in the Javascript world). Commonly in the C/C++ world, dead-code-elimination is used for anything that removes (or ignores in the first place) unused code and data.
And I think with LTO it's all the same anyway (in the past there were a lot of gnarly details about how the compiler/librarian actually created a static link library to avoid linking code that's not actually used, but none of that matters anymore with LTO).
1) What words persist or become mainstream has little to do with how old they are. Tree shaking is evocative and is probably more appealing/approachable to say than "dead code elimination", so it became the more popular term.
2) I was curious about which term actually came first.
I couldn't find any use of the term "tree shaking" or "tree shaker" in the realm of computing on Google Scholar (it was all citrus tree or other arboreal topics, weird). The earliest discussion I could find with the word is this on comp.lang.lisp: https://groups.google.com/forum/#!topic/comp.lang.lisp/pspFr...
If it really originated in the Lisp context (as someone claimed here), it's because it's about as much about eliminating unnecessary data and metadata as it's about executable code.
Isn't the tree shaking metaphor based on how some fruit trees are harvested? When you shake the tree, the ripe fruit falls off. It's not the best metaphor in that with fruit harvesting you want the fruit, whereas when storing a serialized image you discard what falls off, but alas.
My philosophy is to work hard at optimizing my javascript with algorithms and design simplifications for size/performance to make room (in bundle size and CPU) for parts of my code that require brute force compute to solve.
In my ClubCompy project, I use WASM to implement a FAT filesystem atop local storage, which has proven to be very computationally expensive. And, I plan to use WASM for pixel-perfect sprite collision detection when I reintroduce that feature later this year.
My first implementation of collision detection was in plain javascript. With 256 sprites on the screen all colliding with one another, the framerate dropped to below 1fps. I believe I can get that done on a worker thread basically for free and have no performance impacts.
This is something that sounds like a good idea when you first hear it, but feels unpleasant when you actually play it. Most 2D games use rectangular collision hitboxes for good reason: it's easier for the player to predict if sprites will collide or not. With pixel-perfect collisions, the same movement will collide or not depending on the phase of the animation cycles. It feels bad to fail a movement that always worked before just because the animations happened to line up poorly. And pixel perfect isn't even realistic in many cases; small details on sprites can represent things like cloth or hair that in reality wouldn't cause a hard collision. And sprites often move multiple pixels per frame, so colliding individual pixels increases the chance of them clipping through each other. Simple, predictable collision detection is generally best.
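For reference, the predictable rectangular check being advocated here is only a few comparisons (a generic AABB sketch, not from any particular engine):

    struct Aabb { x: i32, y: i32, w: i32, h: i32 }

    // Two axis-aligned boxes overlap iff they overlap on both axes.
    fn overlaps(a: &Aabb, b: &Aabb) -> bool {
        a.x < b.x + b.w && b.x < a.x + a.w &&
        a.y < b.y + b.h && b.y < a.y + a.h
    }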
Agreed, pixel perfect sprite collision detection has limited applicability in games, like missiles vs ship in shmups. And even then, one would probably want to implement a hitbox system smaller than the sprite images for fairness reasons.
My pixel perfect collision detection is not just sprite-to-sprite, but also sprite-to-playfield. I also signal on overlapping sprite-to-sprite rectangles.
Users can "turn off" the pixel-perfect collision detection to save on compute by resizing the sprite to anything other than 24x24 (even 24.1x24, which rounds down to 24x24 on the displayed sprite image.)
I want pixel perfect collision-detection for authenticity and to provide an eclectic programming tool that creators may find clever uses for.
1fps is the kind of frame rate you should be getting with 20000 collisions, not 256. Your algorithm is the bottleneck, here, not the programming language.
You say "pixel-perfect". If you have enough spare memory, one simple algorithm would be: render an offscreen canvas of the whole arena, draw each sprite as a stencil in a different colour, and test against that. Linear time, and no need to segment anything. (You might need to use the high bits of the canvas, though: I don't know how anti-fingerprinting measures work, but I expect they replace the low bits of a canvas' data with noise.)
The last time this code ran was in 2012, and computers and JS engines were slower then.
Also, my Comfy language interpreter is easy to bog down with sprite signal processing. The only cure for that is improving interpreter performance. I need it to run well on tablets and phones, lots of tuning to do in general.
I like your algorithm idea! Stencils are indeed a fast way to work. I think I'll still do better on a worker thread with bit swizzling in the CPU rather than reading back stencil video memory. I'll have to spike it.
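A sketch of what that CPU-side bit swizzling could look like for the 24x24 sprites mentioned above (hypothetical: it assumes each sprite's opaque pixels are precomputed into the low 24 bits of one u32 per row):

    // mask[r] holds row r's opaque pixels in its low 24 bits.
    fn pixel_collide(a: &[u32; 24], ax: i32, ay: i32,
                     b: &[u32; 24], bx: i32, by: i32) -> bool {
        let (dx, dy) = (bx - ax, by - ay);
        if dx.abs() >= 24 || dy.abs() >= 24 {
            return false; // bounding boxes don't even touch
        }
        for ra in 0..24 {
            let rb = ra - dy; // row of b aligned with row ra of a
            if (0..24).contains(&rb) {
                // shift b's row into a's coordinate space, then test overlap
                let brow = if dx >= 0 { b[rb as usize] << dx } else { b[rb as usize] >> -dx };
                if a[ra as usize] & brow != 0 {
                    return true;
                }
            }
        }
        false
    }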
This article is very correct: Wasm has a code size problem. This is a problem in browsers because all that code has to be downloaded to start the site. It's also a problem for serverless architectures, where code is often loaded from cold storage to a specific server on-demand while a client waits.
Tree-shaking might help, but I feel like it's only an incremental optimization. Fundamentally the reason Wasm programs are so bloated is because they have to bring their whole language runtimes and standard libraries with them. In contrast, with JavaScript, the implementation and basic libraries are provided by the browser. But obviously the browser can be expected to have language runtimes for every language pre-loaded...
... or... could it?
I think we need to consider another approach as well: Shared libraries and dynamic linking.
WebAssembly supports dynamic linking. Multiple Wasm modules can be loaded at the same time and call each other. However, many Wasm toolchains do not attempt to support it. Instead, they are often designed to statically link an entire program (plus language runtime) into a single gargantuan module.
Pyodide (CPython on Wasm) is a counter-example. It is designed for dynamic linking today. This is precisely why Cloudflare Workers (a serverless platform) was recently able to add first-class support for Python[0]. (I'm the tech lead for the overall Workers platform.) A single compiled copy of the Pyodide runtime is shared by all Workers running on the same machine, so it doesn't have to be separately loaded for each one.
If dynamic linking were more widely supported, then we could start thinking about an architecture where browsers have various popular language runtimes (and perhaps even popular libraries) preloaded, so that all web pages requiring that runtime can share the same (read-only) copy of that code. These runtimes would still run inside the sandbox, so there's no need for the browser to trust them, just make them available. This way we can actually have browsers that have "built-in" support for languages beyond JavaScript -- without the browser maintainers having to fully vet or think about those language implementations.
> browsers have various popular language runtimes (and perhaps even popular libraries) preloaded, so that all web pages requiring that runtime can share the same (read-only) copy of that code.
That sounds a lot like the idea from some years past that commonly used JavaScript frameworks would be served from a few common CDNs and would be widely enough used to be almost always in cache in the browser, and therefore won't need to actually be downloaded for most pages (hence, the size of the js frameworks shouldn't matter so much)
I'm no expert but from what I understand, that didn't really work out very well. A combination of too many different versions of these libraries (so each individual version is actually not that widely used), and later privacy concerns that moved browsers toward partitioning cache by site or origin. Maybe other reasons too.
Of course, you didn't mention caching and perhaps that's not what you had in mind, but I think it's a tricky problem (a social problem more than a technical one): do you add baseline browser support for increasing numbers of language runtimes? That raises the bar for new browsers even further and anyway you'll never support all the libraries and runtimes people want. Do you let people bring their own and rely on caching? Then how do you avoid the problems previously encountered with caching JS libs?
These are good questions and I think there's more than one answer that's worth exploring.
I think that the privacy problems caused by shared caches could be solved, without simply prohibiting them altogether. Like, what if you only use the shared cache after N different web sites have requested the same module?
But if we really can't get around that problem, then I think another approach worth exploring is for there to be some sort of curated repository somewhere of Wasm modules that are popular enough that browsers should pre-download them. Then the existence of the module in a user's browser doesn't say anything about what sites they have been to.
Versioning is a problem, yes. If every incremental minor release of a language runtime is considered a separate version then it may be rare for any two web sites to share the same version. The way the browser solves this for JavaScript is to run all sites on the latest version of the JS runtime, and fully commit to backwards compatibility. If particular language runtimes could also commit to backwards compatibility at the ABI level, then you only need to pre-download one runtime per language. I realize this may be a big cultural change for some of them. It may be more palatable to say that a language is allowed to do occasional major releases with breaking changes, but is expected to keep minor releases backwards-compatible, so that there are only a couple of different runtime versions needed. And once a version gets too old, it falls out of the preload set -- websites which can't be bothered to stay up to date get slower, but that's on them.
This is definitely the kind of thing where there's no answer that is technically ideal and people are going to argue a lot about it. But I think if we want to have the web platform really support more than just JavaScript, we need to figure this out.
I think a better model would be for the site itself to provide the modules, but the browser will hash and cache them for the next site that may want to use the same module.
This way, there's no central authority that determines what is common enough.
This model does not allow for versioning. For this model, it would be risky to allow it (one website could provide a malicious module that infects the next site you visit).
> Like, what if you only use the shared cache after N different web sites have requested the same module?
That would still let websites perform timing attacks to deanonymise people. There's no way to verify that "N different websites" isn't just the same website with N different names.
Though, we could promote certain domains as CDNs, exempt from the no-shared-cache rules: so long as we added artificial delay when it "would have" been downloaded, that'd be just as safe. We're already doing this with domains (HSTS preload list), so why not CDNs?
Web browser developers seem to labour under the assumption that anyone will use the HTML5 features they've so lovingly hand-crafted. Who wants something as complicated as:
<details>
<summary>Eat me</summary>
<p>Lorem ipsum and so on and so forth…</p>
</details>

when they could have something nice and simple like:
<Accordion>
<AccordionSummary id="panel-header" aria-controls="panel-content">
Eat me
</AccordionSummary>
<AccordionDetails>
Lorem ipsum and so on and so forth…
</AccordionDetails>
</Accordion>
Maybe the problem isn't the libraries. Maybe the problem is us.
The problem is the libraries. Browsers are still mostly incapable of delivering usable workable building blocks especially in the realm of UI. https://open-ui.org/ is a good start, but it will be a while before we see major pay offs.
Another reason is that the DOM is horrendously bad at building anything UI-related. Laying out static text and images? Sure, barely. Providing actual building blocks for a UI? Emphatically no.
And that's the reason why devs keep reinventing controls. Because while details/summary is good, it's extremely limited, does not provide all the needed features, and is impossible to properly extend.
It seems like limited dynamic linking support could go a long way.
For example, there could be a Go shared library that includes the runtime and core parts of the standard library that many programs use. It would decrease the size of all Go programs, without needing to have dynamic library support within an app. The language runtime might not need heavy optimization for space. It’s already loaded, and as long as any program uses a function, it’s not wasted space.
It changes the cost model for optimizing programs in that language for space. Since included standard library functions are free (if you’re using the language at all), you might as well use them.
Though, the problem reoccurs with commonly used libraries and frameworks. You’d also want Cloudflare’s standard library for Go to be shared when running on Cloudflare.
One problem with this model is that languages don’t evolve in lockstep with the runtime. Either there would be limited support for different versions of a language, or the shared libraries available would pile up over time, resulting in limited sharing between apps. JavaScript has the “you don’t get a choice” versioning model, which requires strong backward compatibility and sometimes polyfills. It might not be as suitable for other languages.
When a runtime really wants to cut down on space, it can be done by limiting plugin diversity. Though there are complaints, “you must use JavaScript” worked out pretty well for browsers.
Maybe we don’t need a lot of different WebAssembly-based languages? It’s a tower of babel situation. Diversity has costs.
Could it be possible to do "profile guided tree-shaking" to build a small module with all the code that's necessary for the application and pull-in less used functionality on-demand using dynamic linking?
If tree-shaking was done based on production information it may be possible to prune a lot of dead/almost-dead code without having to implement sophisticated static analysis algorithms.
A lazy chunked delivery strategy like used in the k8s stargz-snapshotter[0] project could be effective here, where it only pulls chunks as needed, but it would probably require wasm platform changes.
There is a substantial risk there unless you can hit all the edge cases and error conditions when profiling. Even a good fuzzer can miss a very rare state. Then when you hit it in real use there's no code to handle it!
Profile-based optimization and JITting is plausible because the corner cases are still there, just not optimized.
I completely agree, that's why in that case you could download the missing code from the server and load it using dynamic linking.
The server would then mark it as reachable so it's delivered as part of the main bundle next time.
I would expect the bundle to converge quickly to the set of functions that are actually reachable.
Additionally, it's very likely that the sets of reachable code of two versions of the same app have significant overlap, so the information collected for version N could be used as a starting point for N+1, and so on.
> we could start thinking about an architecture where browsers have various popular language runtimes (and perhaps even popular libraries) preloaded
that could potentially lead to hundreds of versions of runtimes downloaded in the browser, filling up the cache with binaries that might be used by 1 site each
I think I agree overall, just want to point out that with Wasm, you still end up using a fair bit of the built-into-browser js to accomplish things not purely computational. Especially in this context with Hoot [1], where things like appendChild are external functions you call inside the scheme. One could theoretically do this for much of the js standard library in any kind of wasm context.
Indeed, I/O APIs (anything that talks to the outside world) are another sore point for WebAssembly, as browsers do not currently expose any particular APIs directly to Wasm, only to JavaScript. So Wasm has to make calls to a JavaScript middleman layer to use those APIs.
But browsers are understandably hesitant to create a whole parallel API surface designed specifically for Wasm callers. That's a lot of work.
I am not totally convinced that this is a real problem, vs. just something that makes people feel bad. Like, if you are coding Rust, the idea that all your "system calls" are calling into a layer of JavaScript feels disgusting. But is it a real problem? Most of these calls are probably not so performance sensitive that this FFI layer matters that much.
If it is a real problem, I'd guess the answer is for browsers to come up with a more efficient way to expose WebIDL-defined APIs to Wasm, but without reinventing any individual APIs. Being derived from WebIDL, they are still going to have JS idioms in their design, but maybe we can at least skip invoking actual JavaScript.
Lot I don't know about how browsers are shipped, but it seems to me like browsers could easily get away with packing in a few languages and their STLs as part of their default installs. Python is what, 25MB? Would another couple hundred megs of disk space be such a big deal?
Possibly – if you can find a single version of Python that everybody will be happy with, forever.
Being able to cache runtimes and libraries like that across sites would be nice, though (but probably enables fingerprinting, so one Python runtime per origin it is).
The way Emscripten does it, IIRC, doesn't require any special browser support. The toolchain generates glue code in JavaScript to support calls between dynamically linked Wasm modules.
Do you happen to know where can I check out the cutoff version for each browser? https://caniuse.com/?search=wasm doesn't have it (or other things like WasmGC for that matter)
I believe dynamic linking has been a core feature of WebAssembly from the beginning. You have always been able to load multiple Wasm modules in the same isolate and make them call each other.
(But, language toolchains have to actually be designed to use this feature. Most aren't.)
> If your language’s compiler toolchain can manage to produce useful Wasm in a file that is less than a handful of over-the-wire kilobytes, you can win.
Zig is perfect for this.
Personally, I don't think this is an important factor if the Wasm file size is less than 100K. It does matter if the file size is over a MB.
Builtin GC is only important for some apps, not all. It is best to make your web app GC-free.
The most important factor for apps using Wasm to succeed is still performance benefit.
I have Blazor apps running on Cloudflare pages. They download fast and the performance is great. The load time is terrible though. I think it is unsolvable with .NET. I think the core issue is how everything is entangled by design in oo langs.
Also, the amount of money put into js is hard to compete with. Then to also have copy/paste as a lang feature in js is like cheating.
Third. With Blazor at least you still need js and skills in that area. I think this is my main issue.
> I think the core issue is how everything is entangled by design in oo langs.
It's not an issue with object orientation. An issue is that languages of that era often relied on reflection for many use cases. People are actually working hard to rid large parts of .NET from these use cases and mark them as safe for eliminating unused code.
When there's the possibility of using reflection to call a method, you don't really know what you can safely eliminate. It might look like nothing is calling `Foo.Bar()`, but what if someone has done `Reflection.getClass(someClass).runMethod(someVar)` and those variables have been set to "Foo" and "Bar"?
For example, Dart doesn't allow reflection with precompiled apps because it allows them to safely eliminate unused code (https://docs.flutter.dev/resources/faq#does-flutter-come-wit...). Dart is an object oriented language, but it has eschewed runtime code generation and runtime reflection for compile time code generation.
.NET is also headed in this direction, but that doesn't happen overnight.
However, as others have pointed out, part of the issue will be that non-JS languages will still need to ship implementations of standard library stuff that's included in the JS runtime in the browser (at least the pieces of the standard library you're using post-tree-shaking).
Blazor...is not great at this. Currently, the packaging model does not do justice to the capabilities of .NET trimming because of the limitations of how the WASM is currently packaged on top of Mono. If you want to see how well it can actually trim (which everyone seems to like calling tree-shaking, even though it's not exactly right), it is better to try out building regular applications with AOT - they produce small binaries.
The experimental support in runtimelab for NativeAOT-LLVM targeting WASM provides much smaller bundle sizes and much better performance but given that it still is under dotnet/runtimelab rather than dotnet/runtime, I don't know when it will be available.
> Wasm makes it thinkable to do DOM programming in languages other than JavaScript
Can't help but pick this out for correction - people have been doing it for a long time in compile-to-JS languages - eg ClojureScript, TypeScript, ReasonML and many others.
And people have also been compiling native-ish stuff for the web a long time before Wasm through asm.js & emscripten, like C and through that C-based languages such as Python: https://en.wikipedia.org/wiki/Asm.js#Adoption
But can't you use WASM to create GUIs like Photoshop, with no JavaScript or DOM?
Isn't the bigger goal of GUIs on WASM that we can jettison JavaScript/the DOM and go back to writing GUIs like 10-20 years ago, with simpler libraries? Like SKIA, or something. Using non-web GUI libraries, since they could be compiled to WASM and run in the web.
EDIT: Native. I mean pre-web, when GUI libraries were native, everything ran locally.
Seemed like WASM would let apps be built like that again, but would be in browser for deployment.
I don't think that's a very good goal. Jettisoning the DOM means jettisoning accessibility and being able to leverage everything that the browser gives you out-of-the-box. You have to render to a canvas and build everything from scratch. I think Wasm is great for supplementing a JS app, not replacing it (e.g. using a Wasm module to do some calculations in a Worker). I like to use the right tool for the job, and trying to use something other than JS to build a web app just seems a little janky to me.
At one point, there was a Host Bindings proposal that would enable you to do DOM manipulation (it looks like it was archived and moved to the Component Model spec [1]). That would probably be the ideal way to avoid as much JS as possible. However, browser vendors have been heavily optimizing their JS runtimes, and in some cases, Wasm may actually be slower than JS.
I've been following Wasm's progress for several years, which has been slow, but steady. Ironically, I think the web is actually the worst place to use it. There's so much cool non-web stuff being done with it and I'm more interested to see where that goes.
> But can't you use WASM to create GUI's like Photoshop, with no JavaScript or DOM?
Yes, you can. If you have the time and the money. To quote Figma: "Pulling this off was really hard; we’ve basically ended up building a browser inside a browser.": https://www.figma.com/blog/building-a-professional-design-to... And that's just for the canvas part. Figma's UI is React.
you need a good UI library if you want to do it like native. And native platforms have those. There's nothing of the sort for any of WebGL/Canvas/WebGPU
It'll have to be GPU accelerated somehow, outside the wasm boundary. At 1080p it's hard to make pixels move fast enough to get away with CPU rasterization.
"Old-fashioned" desktop GUI systems, like Win32 MDI, are built out of composeable widgets. HTML is built out of composeable widgets. Whatever you do with Wasm and direct pixel rendering? That isn't.
The travesty that is modern web development is, absolutely, a step back from what we had back in the day. But that's not inherent to the web: it's a deficiency of practice. Many web APIs are delightful, and while they're all clunky, Win32 MDI also has its footguns. https://devblogs.microsoft.com/oldnewthing/20120213-00/?p=83...
I agree. Still looking for something that bridges that gap, to make modern web pages as simple as old WinForms.
The only web based tools that I've come across that seem to hide all the JavaScript/html, and be 'simple' are tools like Elmish, or some of the other functional programming 'DSL's. Where the page is kind of composed by functions that compile into the JavaScript. With JavaScript, by the time I'm manipulating the DOM, I keep thinking some tool should be handling this for me.
It feels like OCaml is well positioned for this kind of flow analysis to perform aggressive tree shaking. With GC support in WASM OCaml should produce tiny binaries.
As things go along, I’m more and more mystified by WASM’s apparent design. It feels like 1996 Java for applets, but without the built-in GC, stdlib, or even the most basic hooks into the browser. Which means it’s basically useless for what I assume its goal is, of letting you use languages other than JavaScript to code a web page. Without trivial DOM access, what’s the point?
Other proposed usages like FaaS-style edge widget computing honestly don't make a lot of sense. Why target an artificial VM for that purpose instead of existing architectures that work just fine with less overhead?
For compute that could happen in either the browser or the edge, maybe, but how much is that level of portability really realistic or worthwhile? I’m guessing it might be in a few narrow cases, but it’s not going to be a common pattern.
In a nutshell, Wasm is essentially a CPU with a tiny instruction set. It's very primitive and minimal, but I think that was the point. If you need to do something with numbers in a web app, it's pretty neat. If you need to work with strings, you're going to end up crying in the fetal position under your desk.
The current iteration of WebAssembly was always an "MVP." It's got the core instruction set and memory model to run, essentially, C programs safely and efficiently, and just enough interop with the host to get data in and out.
But it was always the plan to expand on that, and make a wider set of use cases easier and more efficient. Working directly with the DOM really requires some amount of integration with the GC that manages the DOM, for example.
The thing that makes this interesting outside the browser is the security model. Unlike typical environments used for FaaS or whatever else, it's capability based and starts from literally zero- everything a WebAssembly module can touch has to be passed in explicitly when it's instantiated. That's a lot narrower, lighter weight, and more flexible than things like containers.
But I thought WASM was to replace .NET and Java VMs. But, without a VM, to be more a pass through to the underlying CPU. So much faster. So compile down to a 'byte code' like thing, that does not run on a VM, but runs on the underlying HW.
> So compile down to a 'byte code' like thing, that does not run on a VM, but runs on the underlying HW.
I think you may be confusing "system" virtual machines and "process" virtual machines. The VM here is the thing executing the WASM bytecode; it works the same way as .NET and Java VMs.
I understand it is a VM like Java. I was just under the impression that it was somehow better, more streamlined, that would offer enough performance improvement that you could start treating it like running a 'native' local app. Like if I build a 'native' app, a 'thick client', I could now run it on WASM in a browser. Thus not need any local installs, but have same performance.
I've seen some apps doing that. But guess it isn't considered 'the way' for the future?
> But I thought WASM was to replace .NET and JAVA VM's.
Its name is literally web assembly. Its goal was never and still isn't to replace those VMs. It was literally an idea to create a faster code sandbox for the web based on the ideas from Mozilla's asm.js
> to be more a pass through to the underlying CPU. So much faster. So compile down to a 'byte code' like thing, that does not run on a VM, but runs on the underlying HW.
wat.
> So goal was speed. And to allow other languages to compile to it.
Yes, it was. Has nothing to do with the fantasy of replacing JVM and .Net VM or running directly on hardware.
Of course, that is where it started.
I'm probably being loose in terminology.
I assumed that to gain this speed, it was a little closer to the metal than a VM like Java. That it must have some kind of pass-through to allow commands to run on the local HW, not just emulated in a VM. So like a VM in that you can compile to it, but it would execute natively.
From the FAQ.
"WebAssembly aims to execute at native speed by taking advantage of common hardware capabilities available on a wide range of platforms. It is a low-level assembly-like language with a compact binary format that runs with near-native performance"
And when looking at the Use Cases, it seems to be trying to do a lot more than javascript.
The big utility of WASM for me, like OP hints at, is bringing things that would be infeasible to port to the web to it, like, say https://pgaskin.net/kepubify/ and other conversion tools (eg ffmpeg-wasm). Much preferable to downloading something or uploading a file to some random person’s server