Hacker News new | past | comments | ask | show | jobs | submit login
Tree Shaking (wikipedia.org)
36 points by hidden-spyder on Aug 1, 2021 | hide | past | favorite | 33 comments



1. In real life, you shake real trees, to get fresh fruit.

2. In JS, you shake fake trees, to shed bad fruit.

3. Common wisdom says: "Ye shall know them by their fruits."

Therefore, all JS dependency trees are rotten. /s


What's the term used for tree shaking outside JS? Do Java or Python or Ruby or PHP do something as advanced as tree shaking? Having migrated a whole company from Java to JS I know that the JS ecosystem is very large and birthed incredible tools that are lacking in other languages/playgrounds. webpack is an incredible builder compared to what's available for Java/Kotlin on Android and Swift on iOS. And webpack just got blown out of the water from esbuild. So my hypothesis (until proven wrong with links) is that the other languages never really reached the awesomeness in tooling when tree shaking is done.


It’s called Dead Code Elimination in the context of an optimizing compiler. Both GCC and LLVM have supported naive versions for a long time; more recently, Link-Time Optimization has allowed both to fully prune functions that can’t be called by the program.

I don’t know about PHP, but both Python and Ruby have module/import semantics that explicitly allow functions to be smuggled across “visibility”/access boundaries. So they can’t really do this sort of optimization, at least not without breaking a whole bunch of real code.


I also forgot to mention: these sorts of optimizations probably aren't very effective on Ruby and Python anyways. Compiled code benefits from DCE because it simplifies control/call flow and decreases instruction cache pressure, while JS benefits from it because it decreases the amount of text you're serving over the network.

Python and Ruby are aren't going to benefit (much) from the former and the latter doesn't apply, which is probably why it hasn't been a priority in either language.


Server-side programming languages and apps do not need to optimize "bundle" size as JS need to because client downloads (expensive and slow to download large js file as most internet usage now is via mobile phones) and executes the code (also slow as it need to be done on client every time). PHP often does not even need build system as we rarely use transpiling to older/newer versions because we don't have this problem of supporting different browsers. Package/file count does not have any meaningful impact of application speed


In general it’s called “dead code elimination.” [1] It’s been around for a long time. A linker can do a basic version of this, but advanced optimizations require whole-program optimization and it’s not typically done since it affects compile time and binary size isn’t as important as it is for web browsers.

GWT was doing tree shaking for Java to JavaScript compilation before most JavaScript programmers had the tool chain for it.

[1] https://en.m.wikipedia.org/wiki/Dead_code_elimination


Here’s a conversation about tree shaking in Lisp from 1994 [1] (from the posted Wikipedia page).

[1] https://groups.google.com/g/comp.lang.lisp/c/pspFr1XByZk


One of the earliest "tree shaker" might be the "selective linker" of MacScheme (an early Scheme implementation for the Apple Macintosh), which was developed in 1985 by William Clinger. It's for example mentioned in the 1987 release announcement for MacScheme+Toolsmith Version 1.0:

https://ml.cddddr.org/scheme/msg00189.html

Lucid Common Lisp had a tree shaker as part of their delivery tool. That one was developed from 1988 onwards.


JS doesn’t really peak in anything. Everything in JS people think is unique has been done decades ago. The comments show that tree shaking is no exception.


Linking?


Computer terms often fail to live up to their colloquial use!

The joke in college was that parent (processes) must be sure to kill their children, otherwise they turn into zombies. (Unix / POSIX model of processes)


In a similar vein, one of my first FreeBSD commits was "serial killers shouldn't commit suicide" -- preventing the killall utility from killing itself.


Bad fruit yields good alcohol, therefore JS brings the party.


Tree shaking is such a well-named concept that I could tell you exactly what it was the first time I heard someone say it, even though they hadn't defined it for me yet. It's such a joy when we can map abstract mechanisms to plain language so easily.


One issue that I haven't seen discussed often yet are "dead dependencies" - something where I believe tree shaking is essential.

Image the following scenario: You have an app A that pulls in library B as a direct dependency. B pulls in another library C as a transitive dependency. C may have further transitive dependencies of its own (D, E, F...)

As it happens, B is a "toolbox/commons/utils" library, which bundles several small, unrelated bits of functionality into a single asset for convenience. A only uses a fraction of B's functionality and it does not use the parts of B that require C.

So not only does B suddenly contains lots of dead code, the entire library C and all of transitive dependencies are dead code. None of those libraries are actually used by anything and A's developers will likely have no idea what they do at all - yet the build system considers them essential and will bundle them with the application.

There is one escalation of this, which I guess you could call "undead code": Some frameworks (e.g. Spring for Java) use classpath scanning or similar mechanisms to automatically detect and execute modules across your codebase. This scan includes modules pulled in through transitive dependencies!

So in the worst case, a library that you never consciously added to your project and that has no actual reason to be there will execute code when your app is running.


(Discussed briefly in the article) This is one of the benefits of ESM. Top-level imports and exports are static, allowing significantly greater confidence in identifying which code is actually referenced, versus previous module systems like CJS or AMD.

This is also why named imports are often encouraged over whole-module namespace imports (though in theory that shouldn’t make much of a difference unless the namespace is accessed with dynamic keys).

For example, tree shaking ESM makes this equivalent to the sibling comment example’s first line:

    import { array } from 'lodash';
Dynamic imports can be similarly optimized, but only to the extent the import path is static, e.g.

    import(`./foo/bar/${quux}.js`)
Can eliminate everything adjacent to /foo/bar, but cannot eliminate anything within /foo/bar without more advanced analysis of dynamic usage. Of course TypeScript can help here if `quux` is narrower than `string`, but I’m not sure how much that’s in use by existing bundlers.


JS module namespacing resolves this. Take a look at how lodash separates its utilities:

  var array = require('lodash/array');
  var object = require('lodash/fp/object');
https://github.com/lodash/lodash


When this happens, tree shaking is acting as a workaround for unfortunate library design. Utilities should be split up based on what they depend on.

Otherwise, you end up downloading lots of libraries you don’t use, and then tree shaking prunes them. The final result is okay, but this results in unnecessary compiles and makes builds slower.


I wonder why tree-shaking wasn't always the default for, say, JS bundlers. If a compiler/analyzer knows what the entry point of a program is, as well as any symbols it exports to the outside world, isn't it relatively simple to figure out what's not being used?

I could be misunderstanding something.


Couple of weird language edge cases

JS allows functions to by called by name.

A.foo();

Can be

A["foo"]()

Because foo is now a string it's possible to add levels of indirection.

Action(name) { A[name](); }

It's possible the list of actions to perform are sent to the client as data.

This amiguity has left enough doubt for most tools, and people making tree shaking a practice on code not commonly performed. The longer that happens the scarier it becomes to start investigating.

Since the browser will tree shake for you the incentive to cleanup your source is also less important.


But in this case A is exported. No one is going to tree shake methods of a class.


It doesn't need to be a class; that's a detail:

  function foo() { ... }
  var callthis = "foo"
  window[callthis]()
And this is true for any object; attaching functions to objects is kinda common.

And of course, "callthis" is usually defined in a much more complex way, such as if (user_input_is_this) callthis = "foo" else callthis = "bar". Or what about callthis = "foo"; callthis += "_bar"?

In general this kind of stuff is tricky in dynamic languages (and also in static languages once you start using reflection); JavaScript isn't really an exception here. You really need to have a deep understanding of the code and logic to truly be certain that something will never be called.


> [...] Often contrasted with traditional single-library dead code elimination techniques common to minifiers, tree shaking eliminates unused functions from across the bundle by starting at the entry point and only including functions that may be executed.

Assuming, that methods of classes count as "function" in the terminology of that wiki page, then it seems to say methods of a class are shaken out. If they are not seen as functions (again, in that terminology), then I guess not.


I think the article is over-simplifying, I believe most tree-shaking is done at the `import`/`export` boundaries.

Code that isn't used locally in a module and is not imported by any other modules is omitted.

---

It helps that there isn't a global object for the local module scope, otherwise in-module dead code detection would also not be possible, since any line of code within it could use dynamic access and static analysis wouldn't be able to prove that it isn't used.


Dead code elimination is only straightforward if your language has sane naming semantics (lexical scope, static name resolution, etc). If your language does crazy stuff like allowing function lookup by the string representation of its name in the source code (e.g. JavaScript) it becomes much harder.


Which is why Google's jscompiler requires a stricter subset of JS for "advanced optimizations" including dead code elimination.


C allows you to call functions by string...

C++ also allows it, but the function names are a bit harder to guess.


> C allows you to call functions by string

please elaborate


Possibly OP meant dlopen/dlsym on itself? But that's a stretch, and not really language feature...


And functions aren't guaranteed to be included in the binary unless you pass additional compiler flags with that strategy.


Very few things are guaranteed in C. If you want to guarantee the function exists let the operating system look up the function names by string for you.


Create a string, look up the address of a function in the library. Assign the address to a function pointer, call it.


I think tree-shaking will be a default in new JS bundler. ESbuild seem to enable it by default: https://esbuild.github.io/

I think the reason some other bundlers like webpack didn't/don't have it by default is because tree-shaking became popular after they were invented and tree-shaking was added to the bundler later on as an extra nice-to-have feature.

It seems webpack has built-in support for tree-shaking from version 2. Latest version of webpack might have it enabled by default (not 100% sure but possible) https://webpack.js.org/guides/tree-shaking/




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: