
IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem. That's essentially where we are today both in language repositories for OSS languages and private monorepos.

This is partly due to how we've distributed software over the last 40 years. In the 80s, the idea of a library of functionality was something you paid for, and painstakingly included parts of in your size-constrained environment (it had to fit on a floppy). You probably picked apart that library and pulled the bits you needed, integrating them into your builds to be as small as possible.

Today we pile libraries on top of libraries on top of libraries. It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running. Who knows or cares what all 'foolib' contains.

At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file. Adding optional functionality can get ugly when it would require creating new modules, but if you only want to use a tiny part of the module, what do you do?

The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.

It's a terrible idea and I'd hate it, but how else do you address the current setup of effectively building the whole universe of code branching from your dependencies and then dragging it around like a boat anchor of dead code?




> IMO any system where taking a dependency is "easy" and there is no penalty for size or cost is going to eventually lead to a dependency problem.

Go and C# (.NET) are counterexamples. They both have great ecosystems and just as simple and effective package management as Rust or JS (Node). But neither Go nor C# has issues with dependency hell like Rust, or even more so JavaScript, because they have exceptional std libs and even large frameworks like ASP.NET or EF Core.

A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as a counterexample. But again, Go and C# are proving them wrong. A great std lib is a solution, but one that comes with huge effort that can only be made by large organisations like Google (Go) or Microsoft (C#).


No it doesn't.

A large stdlib solves the problems the language is focused on. For C# and Go that is web hosts.

Try using them outside that scope and the dependencies start to pile in (Games, Desktop) or they are essentially unused (embedded, phones, wasm)


> A large stdlib solves the problems the language is focused on

That's part of it, but it also solves the problem of vetting. When I use the Go stdlib I don't have to personally spend time vetting it like I do when looking at a crate or npm package.

In general, Go & Rust packages on GitHub are high quality to begin with, but there is still a pronounced difference between OSS packages and what is approved to be part of the language's own stdlib.

It's nice to know thousands of different companies already found the issues for me or objected to them in reviews before the library was published.


“Web server” is a pretty big use case though.

But I agree that graphics is often overlooked in std libs. However that’s a bit of a different beast. Std libs typically deal with what the OS provides. Graphics is its own world so to speak.

As for Wasm: first, that’s a runtime issue and not a language issue. I think GC is on the roadmap for Wasm. Second, Go and C# obviously predate Wasm.

In the end, not every language should be concerned with every use case. The bigger question is whether it provides a std lib for the category of programs it targets.

To take a specific example: JS isn’t great at efficiently and conveniently generating dynamic HTML. You can go far with no (or minimal) dependencies and some clever patterns. But a lot of pain and work hours would have been saved if it had something that people want to use out of the box.


> “Web server” is a pretty big use case though.

You don't consider games, desktop and mobile applications big use cases, each being multi billion industries?

I don't know man, I feel like you're arguing in bad faith and are intentionally ignoring what athrowaway3z said: it works there because they're essentially languages specifically made to enable web development. That's why their standard lib is plenty for this domain.

I can understand that web development might be the only thing you care about though, it's definitely a large industry - but the thesis of a large standard lib solving the dependency issue really isn't true, as (almost) every other use case beyond web development shows.


> but the thesis of a large standard lib solving the dependency issue really isn't true, as (almost) every other use case beyond web development shows.

I don't think the dependency issue can be solved by a good std lib, but it certainly can be mitigated as some languages show.

I think JS is a very pronounced case study here.


Web is likely bigger than all of those together. And a large part of mobile and desktop apps depend on web tech these days.


Specifically, those languages are back-end focused, so about 28% of developers; 55% focus on front end. If you add up games, desktop and mobile, oddly you get 28% as well. So not bigger, but the same size. Good intuition! That leaves out embedded (8%) and systems (8-12%), which are probably more what Rust is used for. There is obviously overlap, and we haven't mentioned database or scientific programming at 12 and 5 percent respectively.

Edit: after rereading this I feel like I may have come across as sarcastic; I was legitimately impressed that a guess without looking it up would peg the ratio that closely. It was off topic as a response too. So I'll add that Rust never would have had an async as good as Tokio, or been able to have async in embedded as with Embassy, if it hadn't opted for batteries excluded. I think this was the right call given its initial focus as a desktop/systems language. And it is what allowed it to be more than that as people added things. Use cargo-deny, pin the oldest version that does what you need and doesn't fail cargo-deny. There are several hundred crates brought in by just the rust-lang repo; if you only vet things not in that list, you can save some time too.


"Web server" is, more or less, about converting a database into JSON and/or HTML. There are complexities there, sure, but it's not like it's some uniquely monumental undertaking compared to other fields.


Not all web servers deal in HTML or JSON, many don't have databases outside of managing their internal state.

Even ignoring that, those are just common formats. They don't tell you what a particular web server is doing.

Take a few examples of some Go projects that either are web servers or have them as major components like Caddy or Tailscale. Wildly different types of projects.

I guess one has to expand "web server" to include general networking as well, which is definitely a well supported use case or rather category for the Go std lib, which was my original point.


> Web server" is, more or less, about converting a database into JSON and/or HTML

You seem to have a very different definition of "web server" to me.


Just to explain this confusion, the term “web server” typically refers specifically to software that is listening for HTTP requests, such as apache or nginx. I would use the term “application server” to refer to the process that is processing requests that the web server sends to it. I read “web server” in their comment as “application server” and it makes sense.


Yes. That's the same distinction I would expect. Although I'm not sure that the database stuff is the role I'd usually look for in the application server itself.

Maybe it's a language community thing.


Ah, yeah, I did mean “application”. You’re right about the “application server” being a weird place for db connections.


actually dotnet also does not need too many dependencies for games and desktop apps.


So it comes out of box with good renderers, physics engines, localization, input controllers and in-game GUIs?


The libraries you listed are too specialized. And they require integration with the asset pipeline, which is well outside the scope of a programming language.

As for the generic things, I think C# is the only mainstream language which has small vectors, 3x2 and 4x4 matrices, and quaternions in the standard library.


> I think C# is the only mainstream language which has small vectors, 3x2 and 4x4 matrices, and quaternions in the standard library.

They've got SIMD-accelerated methods for calculating 3d projection matrices. No other ecosystem is even close once you start digging into the details.


To be fair, there is no language that has a framework that contains all of these things... unless you're using one of the game engines like Unity/Unreal.

If you're willing to constrain yourself to 2D games, and exclude physics engines (assume you just use one of the Box2D bindings) and also UI (2D gamedevs tend to make their own UI systems anyway)... Then your best bet in the C# world is Monogame (https://monogame.net/), which has lots of successful titles shipped on desktop and console (Stardew Valley, Celeste)


> To be fair, there is no language that has a framework that contains all of these things.

Depends. There is GDScript, seeing as it comes with a game engine.

But the original claim was

    > actually dotnet also does not need too many dependencies for games and desktop apps.
If you're including languages with big game engines, it's a tautology: languages with good game engines have good game engines.

But a general-purpose programming language has very little to gain from including a niche library, even if it's the best in the business. Imagine if C++ shipped with Unreal.


Those are extremely specialized dependencies. Whereas in Rust, we talk about e.g. serde, whose equivalent is included in the std libs of many major languages.

Are you really trying to compare serde to rendering engines?


> A great std lib is obviously the solution. Some Rust defenders are talking it down by giving Python as a counterexample.

Python's standard library is big. I wouldn't call it great, because Python is over 30 years old and it's hard to add things to a standard library and even harder to remove them.


There are things added from time to time, but yeah, some stuff in there just feels dated at this point.

I’m still hoping we can get a decently typed argparse with a modern API though (so much better for tiny scripts without deps!)


I'm thankful argparse exists in Python's stdlib. But argument parsing is not that hard, especially for simpler programs. Programmers should be able to think for a minute and figure it out instead of always reaching for clap; that's how you get dependency hell.

Argument parsing, in particular, is a great place to start realizing that you can implement what you need without adding a dozen dependencies.
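
For the simple cases, a sketch like this (Rust, std only, flags made up for illustration) goes a long way:

  // minimal hand-rolled flag parsing with std only; the flags are made up
  fn main() {
      let mut verbose = false;
      let mut output: Option<String> = None;
      let mut args = std::env::args().skip(1);
      while let Some(arg) = args.next() {
          match arg.as_str() {
              "-v" | "--verbose" => verbose = true,
              "-o" | "--output" => output = args.next(),
              other => eprintln!("unknown argument: {other}"),
          }
      }
      println!("verbose={verbose}, output={output:?}");
  }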


Hard disagree. Standardized flag parsing is a blessing on us all; I do not want to have to figure out which of the many flag conventions the author picked to implement, like one does with non-getopt C programs.

Don't disagree with the principle, there are a lot of trivial Python deps, but rolling your own argument parsing is not the way


Again, argument parsing is not that hard most of the time. You don't have to make your own conventions. That's just weird.

If you've never thought about it, it might seem like you need an off-the-shelf dependency. But as programmers sometimes we should think a bit more before we make that decision.


Argument parsing is absolutely the kind of thing where I'd reach for a third-party library if the standard library didn't provide (and in Python's case, maybe even then - argparse has some really unpleasant behaviours). When you look through library code, it might seem like way more than you'd write yourself, and it probably is. But on a conceptual level you'll probably actually end up using a big chunk of it, or at least see a future use for it. And it doesn't tend to pull in a lot of dependencies. (For example, click only needs colorama, and then only on Windows; and that doesn't appear to bring in anything transitively.)

It's a very different story with heavyweight dependencies like Numpy (which include reams of tests, documentation and headers even in the wheels that people are only installing to be a dependency of something else, and covers a truly massive range of functionality including exposing BLAS and LAPACK for people who might just want to multiply some small matrices or efficiently represent an image bitmap), or the more complex ones that end up bringing in multiple things completely unrelated to your project that will never be touched at runtime. (Rich supports a ton of wide-ranging things people might want to do with text in a terminal, and I would guess most clients probably want to do exactly one of those things.)


You can, but there’s always a tradeoff: as soon as I’ve added about the 3rd argument, I always wish I had grabbed a library, because I’m not getting paid to reinvent this wheel.


Sure. And that's how you get leftpad and dependency "supply chain" drama.


Are you really comparing clap with leftpad?


While not everything in Python's stdlib is great (I am looking at you urllib), I would say most of it is good enough. Python is still my favorite language to get stuff done exactly because of that.


Maybe Python 4 will just remove stuff.


My personal language design is strongly inspired by what I imagine a Python 4 would look like (but also takes hints from other languages, and some entirely new ideas that wouldn't fit neatly in Python).


> but neither Go nor C# has issues with dependency hell like Rust, or even more so JavaScript, because they have exceptional std libs

They also have a much narrower scope of use, which means it is easier to create a stdlib usable for most people. You can't do that with a more generic language.


I would say C# gets used for almost everything at Microsoft, between GUIs, backends, DirectX tooling (the new PIX UI, Managed DirectX and XNA back in the Creative Arcade days), Azure,..., alongside C++, and even if Microsoft <3 Rust, in much bigger numbers.


I didn't understand the embedded systems argument. Just because a standard lib is large doesn't mean it all ends up in the compilation target.


Indeed, it has no bearing on binary size at all, because none of it will be included. If you are coming from the perspective where the standard library is entirely unusable to begin with, then improving the standard library is irrelevant at best. It also likely means that at least some time and effort will be taken away from improving the things that you can use to be spent on improving a bunch of things that you can't use.

I feel like this is an organizational problem much more than a technical one, though. Rust can be different things to different people, without necessarily forcing one group to compromise overmuch. But some tension is probably inevitable.


> Indeed, it has no bearing on binary size at all, because none of it will be included.

That depends on the language. In an interpreted language (including JIT), or a language that depends on a dynamically linked runtime (e.g. C and C++), it isn't directly included in your app because it is part of the runtime. But you need the runtime installed, and if your app is the only thing that uses that runtime, then the runtime size effectively adds to your installation size.

In languages that statically link the standard library, like go and rust, it absolutely does impact binary size, although the compiler might use some methods to try to avoid including parts of the standard library that aren't used.


Embedded Rust usually means no_std Rust, in which case no, neither the standard library nor any runtime to support it get included in the resulting binary. This isn't getting externalized either; no_std code simply cannot use any of the features that std provides. It is roughly equivalent to freestanding C.

What you say is true enough for external-runtime languages and Go, though TinyGo is available for resource-constrained environments.


Well, Rust's standard library has three components, named core, alloc and std

The no_std Rust only has core, but this is indeed a library of code, and freestanding C does not provide such a thing: the freestanding C stdlib provides no functions, just type definitions and other stuff that evaporates when compiled.

Two concrete examples to go along with this: suppose we have a mutable foo; it's maybe foo: [i32; 40]; (forty 32-bit signed integers), or in C maybe it's int foo[40];.

In freestanding C that's fine, but we're not provided with any library code to do anything with foo; we can use the core language features to write it ourselves, but nothing is provided.

Rust will happily do foo.sort_unstable(); this is a fast custom in-place sort, roughly a modern form of introspective sort, written for Rust by its creators, and because it's in core, that code just goes into your resulting embedded firmware or whatever.

Now, suppose we want to perform a filter-map operation over that array. In C, once again you're left to figure out how to write that yourself; in Rust, foo impls IntoIterator, so you can use all the nice iterator features, and the algorithms just get baked into your firmware during compilation.
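
A rough sketch of both examples as a no_std library crate (names made up), just to show they compile against core alone:

  #![no_std]

  pub fn demo(foo: &mut [i32; 40]) -> i32 {
      // in-place unstable sort, provided by core
      foo.sort_unstable();
      // filter-map via the Iterator machinery, also provided by core
      foo.iter()
          .filter_map(|&x| if x % 2 == 0 { Some(x * 2) } else { None })
          .sum()
  }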


I don’t want a large std lib. It stifles competition and slows the pace of development. Let libraries rise and fall on their own merits. The std lib should limit itself to the basics.


I think this is partially true, but more nuanced than just saying that Rust std lib is lacking.

Compared to Go and C#, the Rust std lib is mostly lacking:

- a powerful http lib

- serialization

But the Rust approach (no runtime, no GC, no reflection) makes it very hard to provide those libraries.

Within these constraints, some high quality solutions emerged, Tokio, Serde. But they pioneered some novel approaches which would have been hard to try in the std lib.

The whole async ecosystem still has a beta vibe, giving the feeling of programming in a different language. Procedural macros are often synonymous with slow compile times and code bloat.

But what we gained, is less runtime errors, more efficiency, a more robust language.

TLDR: trade-offs everywhere, it is unfair to compare to Go/C# as they are languages with a different set of constraints.


I would say compared to other languages Rust feels even more lacking.

All those AFAIR need 3rd party packages:

Regex, DateTime, base64, argument parsing, url parsing, hashing, random number generation, UUIDs, JSON

I'm not saying it's mandatory, but I would expect all those to be in the standard library before there is any http functionality.


Having some of those libraries listed and then not being able to change API or the implementation is what killed modern C++ adoption (along with the language being a patchwork on top of C).

As some of the previous commenters said, when you focus your language to make it easy to write a specific type of program, then you make tradeoffs that can trap you in those constraints like having a runtime, a garbage collector and a set of APIs that are ingrained in the stdlib.

Rust isn't like that. As a systems programmer I want none of them. Rust is a systems programming language. I wouldn't use Rust if it had a bloated stdlib. I am very happy about its stdlib. Being able to swap out the regex, datetime, arg parsing and encoding is a feature. I can choose memory-heavy or cpu-heavy implementations. I can optimize for code size or performance or sometimes neither/both.

If the trade-offs were made to appease the easy (web/app) development, it wouldn't be a systems programming language for me where I can use the same async concepts on a Linux system and an embedded MCU. Rust's design enables that, no other language's design (even C++) does.

If a web developer wants to use a systems programming language, that's their trade-off for a harder-to-program language. Similar type safety to Rust's is provided by Kotlin or Swift.

Dependency bloat is indeed a problem. Easy inclusion of dependencies is also a contributing factor. This problem can be solved by making dependencies and features granular. If the libraries don't provide the granularity you want, you need to change libraries/audit source/contribute. No free meals.


Yeah I’ve encountered the benefit of this approach recently when writing WASM binaries for the web, where binary size becomes something we want to optimize for.

The de facto standard regex library (which is excellent!) brings in nearly 2 MB of additional content for correct unicode operations and other purposes. The same author also makes regex-lite, though, which did everything we need, with the same interface, in a much smaller package. It made it trivial to toss the functionality we needed behind a trait and choose a regex library appropriately in different portions of our stack.
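
In case it helps anyone, the trait trick looks roughly like this (a sketch; the `full-regex` feature name and the trait are made up, with regex and regex-lite declared as optional dependencies tied to that feature):

  // pick the regex engine at compile time behind a tiny local trait
  pub trait Matcher {
      fn is_match(&self, haystack: &str) -> bool;
  }

  #[cfg(feature = "full-regex")]
  impl Matcher for regex::Regex {
      fn is_match(&self, haystack: &str) -> bool {
          regex::Regex::is_match(self, haystack)
      }
  }

  #[cfg(not(feature = "full-regex"))]
  impl Matcher for regex_lite::Regex {
      fn is_match(&self, haystack: &str) -> bool {
          regex_lite::Regex::is_match(self, haystack)
      }
  }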


> Being able to swap out the regex, datetime, arg parsing and encoding is a feature

A feature present on every language that has those in the stdlib.


Not necessarily, when other components of the stdlib depend on them


Also not necessarily with third-party libraries.


Indeed. However, you need to recognize that having those features in the stdlib creates a huge bias against swapping them out. How many people in Java actually use DB APIs other than JDBC? How many alternative encoding libraries are out there for JSON in Go? How about async runtimes: can you replace that in Go easily?


True! Although it’s easier to swap out a third party lib that’s using a bloated dependency than it is to avoid something in std.


> All those AFAIR need 3rd party packages: Regex

Regex is not 3rd party (note the 'rust-lang' in the URL):

https://github.com/rust-lang/regex


3rd party relative to the standard library. In other words: not included.


Create a new library, name it "Standard library", include and re-export all those libraries, profit.


This won't solve supply chain issues.


Linux distributions are built this way. Distro maintainers select libraries and versions to include, to create a solid foundation for apps.


Which still doesn't solve the supply chain issues...


> Procedural macros are often synonymous with slow compile times and code bloat.

In theory they should reduce it, because you wouldn’t make proc macros to generate code you don’t need…right? How much coding time do you save with macros compared to manually implementing them?


To be fair I think Rust has very healthy selection of options for both, with Serde and Reqwest/Hyper being de-facto standard.

Rust has other challenges it needs to overcome but this isn't one.

I'd put Go behind both C#/F# and Rust in this area. It has spartan tooling in odd areas it's expected to be strong at, like gRPC, and the serialization story in Go is quite a bit more painful and bare-bones compared to what you get out of System.Text.Json and Serde.

The difference is especially stark with Regex where Go ships with a slow engine (because it does not allow writing sufficiently fast code in this area at this moment) where-as both Rust and C# have top of the line implementations in each which beat every other engine save for Intel Hyperscan[0].

[0]: https://github.com/BurntSushi/rebar?tab=readme-ov-file#summa... (note this is without .NET 9 or 10 preview updates)


> (because it does not allow writing sufficiently fast code in this area at this moment)

I don't think that's why. Or at least, I don't think it's straight-forward to draw that conclusion yet. I don't see any reason why the lazy DFA in RE2 or the Rust regex crate couldn't be ported to Go[1] and dramatically speed things up. Indeed, it has been done[2], but it was never pushed over the finish line. My guess is it would make Go's regexp engine a fair bit more competitive in some cases. And aside from that, there's tons of literal optimizations that could still be done that don't really have much to do with Go the language.

Could a Go-written regexp engine be faster or nearly as fast because of the language? Probably not. But I think the "implementation quality" is a far bigger determinant in explaining the current gap.

[1]: https://github.com/golang/go/issues/11646

[2]: https://github.com/matloob/regexp


> At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

I'm not convinced that happens that often.

As someone working on a Rust library with a fairly heavy dependency tree (Xilem), I've tried a few times to see if we could trim it by tweaking feature flags, and most of the times it turned out that they were downstream of things we needed: Vulkan support, PNG decoding, unicode shaping, etc.

When I did manage to find a superfluous dependency, it was often something small and inconsequential like once_cell. The one exception was serde_json, which we could remove after a small refactor (though we expect most of our users to depend on serde anyway).

We're looking to remove or at least decouple larger dependencies like winit and wgpu, but that requires some major architectural changes, it's not just "remove this runtime option and win 500MB".


I was very 'impressed' to see multiple SSL libraries pulled into rust software that never makes a network connection.


This is where a) a strong stdlib and b) community consensus on common packages tends to help at least mitigate the problem.

My feeling is that Python scores fairly well in this regard. At least it used to. I haven't been following closely in recent years.


A lot of people dunk on Java, but its standard library is rock solid. It even is backward compatible (mostly).


Did you dig any deeper into which paths that was pulled in through?


Not in Rust, but I've seen it with Python in scientific computing. Someone needs to do some minor matrix math, so they install numpy. Numpy isn't so bad, but if installing it via conda it pulls in MKL, which sits at 171MB right now (although I have memories of it being bigger in the past). It also pulls in intel-openmp, which is 17MB.

Just so you can multiply matrices or something.


> Someone needs to do some minor matrix math, so they install numpy

I’m just not convinced that it’s worth the pain to avoid installing these packages.

You want speedy matrix math. Why would you install some second rate package just because it has a lighter footprint on disk? I want my dependencies rock solid so I don’t have to screw with debugging them. They’re not my core business - if (when) they don’t “just work” it’s a massive time sink.

NumPy isn’t “left pad” so this argument doesn’t seem strong to me.


Because Rust is paying the price to compile everything from scratch on a release build, you can pay a little extra to turn on link-time optimization and turn off codegen parallelism on release builds, and absolutely nothing gets compiled in that you don't use, and nothing gets repeated. Also, enabling symbols to be stripped can take something with tokio, clap, serde, nalgebra (matrix stuff) and still be a 2-5 MB binary. That is still huge to me because I'm old, but you can get it smaller if you want to recompile std along with your other dependencies.
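
For reference, that's roughly these knobs in Cargo.toml (typical size-focused values; tune to taste):

  [profile.release]
  lto = true           # link-time optimization across the whole dependency graph
  codegen-units = 1    # give up parallel codegen for better optimization
  strip = "symbols"    # strip symbols from the final binary
  opt-level = "z"      # optional: optimize for size rather than speed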


MKL is usually what you want if you are doing matrix math on an Intel CPU.

A better design is to make it easy for you to choose or hot-swap your BLAS/LAPACK implementation, e.g. OpenBLAS for AMD.

Edit: To be clear, Netlib (the reference implementation) is almost always NOT what you want. It's designed to be readable, not optimized for modern CPUs.


I would argue that BLIS is what you want. It is proper open source and not tied to Intel platforms.


Symbol culling and dead code removal is already a thing in modern compilers and linkers, and rust can do it too: https://github.com/johnthagen/min-sized-rust


Others have made similar comments, but tree-shaking, symbol culling and anything else that removes dead code after it's already been distributed and/or compiled is too late IMO. It's a band-aid on the problem. A useful and pragmatic band-aid today for sure, but it fundamentally bothers me that we have to spend time compiling code and then spend more time to analyze and rip it back out.

Part of the issue I have with the dependency bloat is how much effort we currently go through to download, distribute, compile, lint, typecheck, whatever 1000s of lines of code we don't want or need. I want software that allows me to build exactly as much as I need and never have to touch the things I don't want.


> Others have made similar comments, but tree-shaking, symbol culling and anything else that removes dead code after it's already been distributed and/or compiled is too late IMO.

Why, in principle, wouldn't the same algorithms work before distribution?

For that matter, check out the `auditwheel` tool in the Python ecosystem.


As others have pointed out elsewhere, that only removes static dependencies. If you have code paths that are used depending on dynamic function arguments static analysis is unable to catch those.

For example, you have a function calling XML or PDF or JSON output functions depending on some output format parameter. That's three very different paths and includes, but if you don't know which values that parameter can take during runtime you will have to include all three paths, even if in reality only XML (for example) is ever used.
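
A small self-contained sketch of that shape (formats and helpers made up): nothing here tells the compiler that `format` is always "xml" in practice, so all three branches, and whatever they pull in, stay in the binary.

  fn export(doc: &str, format: &str) -> String {
      match format {
          "xml" => format!("<doc>{doc}</doc>"),
          "pdf" => format!("%PDF-1.7 {doc}"),    // stand-in for a real PDF writer
          _ => format!("{{\"doc\":\"{doc}\"}}"), // default: JSON-ish
      }
  }

  fn main() {
      // the format only arrives at runtime, e.g. from the command line
      let format = std::env::args().nth(1).unwrap_or_else(|| "xml".to_string());
      println!("{}", export("hello", &format));
  }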

Or there may be higher level causes outside of any analysis, even if you managed a dynamic one. In a GUI, for example, it could be functionality only ever seen by a few with certain roles, but if there is only one app everything will have to be bundled. Similar scenarios are possible with all kinds of software, for example an analysis application that supports various input and output scenarios. It's a variation of the first example where the parameter is internal, but now it is external data not available for an analysis because it will be known only when the software is actually used.


The situation isn't quite as dire as you portray. Compilers these days can also do devirtualization. The consequent static calls can become input to tree shaking in the whole program case. While it's true that we can't solve the problem in general, there's hope for specific cases.


Way back when, I used to vendor all the libraries for a project (Java/Cpp/Python) into a mono repo and integrate building everything into the projects build files so anyone could rebuild the entire app stack with whatever compiler flags they wanted.

It worked great, but it took diligence; it also forces you to interact with your deps in ways that adding a line to a deps file does not.


One nice thing about cargo is that it builds all your code together, which means you can pass a unified set of flags to everything. The feature of building everything all the time as a whole has a bunch of downsides, many which are mentioned elsewhere, but the specific problem of not being able to build dependencies the way you want isn't one.


This is the default way of doing things in the monorepo(s) at Google.

It feels like torture until you see the benefits, and the opposite ... the tangled mess of multiple versions and giant transitive dependency chains... agony.

I would prefer to work in shops that manage their dependencies this way. It's hard to find.


I've never seen a place that does it quite like Google. Is there one? It only works if you have one product or are a giant company as it's really expensive to do.

Being able to change a dependency very deep and recompile the entire thing is just magic though. I don't know if I can ever go back from that.


It's the same that we're doing for external crates in QEMU's experiments with Rust. Each new dependency is added to the build by hand.


[flagged]


It absolutely is so, or was for the 10 years I was there. I worked on Google3 (in Ads, on Google WiFi, on Fiber, and other stuff), in Chromium/Chromecast, Fiber, and on Stadia, and every single one of those repos -- all different repositories -- used vendored deps.


I would absolutely do this for any non-toy project.

Alternatively, for some project it might be enough to only depend on stuff provided by Debian stable or some other LTS distro.


Maven was the one that started the downfall into dependency hell. (Ant as well, but it was harder to blindly include things into it.)

Kids today don't know how to do that anymore...


Yet the Maven repository is still not that bloated, even after 20+ years of Java et al. being among the most popular languages.

Compare that to Rust, where my experience with protobuf libs some time ago was that there was a choice of not one but three different libraries: one didn't support services, another didn't support the syntax we had to support, and the third was unmaintained. So out of three choices not a single one worked.

Compare that to Maven, where you have only one officially supported choice that works well and is well maintained.


More time enables more consolidation.


No, there were never several unofficial libraries, one of which eventually won the popularity contest. There was always only one official one. There is some barrier to adding your project there, so maybe that helped.

It's even more pronounced with the main Java competitor: .NET. They look at what approach won in the Java ecosystem and go all in. For example, there were multiple ORM tools competing, and Microsoft adopted the most popular one. So it's an even easier choice there, well supported and maintained.


> Microsoft adopted the most popular one

That's still consolidation, and it also needs time.

Even in Rust, crates like hashbrown or parking_lot have been basically subsumed into the standard library.


Agreed, after I thought of more examples. Thanks; I don't know why others downvoted.

Besides consolidation point, I still think that "barrier to entry" point is still valid -- if it's more effort to even publish a library, its author is probably more committed already.


This works very well until different parts of the deps tree start pulling in the same Foo with slightly different flags/settings. Often for wrong reasons, but sometimes for right ones, and then it's a new kind of “fun”. Sometimes the build system is there to help you, but sometimes you are on your own. Native languages like C++ bring a special kind of joy called ODR violations to the mix…


>At each level a caller might need 5% of the functionality of any given dependency. The deeper the dependency tree gets the more waste piles on. Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call, but all you did was take that one dependency to format a number.

So, what is the compiler doing that it doesn't remove unused code?


"dependency" here I guess means something higher-level that your compiler can't make the assumption you will never use.

For example you know you will never use one of the main functions in the parsing library with one of the arguments set to "XML", because you know for sure you don't use XML in your domain (for example you have a solid project constraint that says XML is out of scope).

Unfortunately the code dealing with XML in the library is 95% of the code, and you can't tell your compiler I won't need this, I promise never to call that function with argument set to XML.


Why can't the compiler detect it will not be used? Tree shaking is well implemented in JavaScript compilers, an ecosystem which extensively suffers from this problem. It should be possible to build a dependency graph and analyze which functions might actually end up in scope. After all, the same is already done for closures.


A more realistic example: something like printf or scanf. It can take objects of multiple types as arguments. It takes the computer's locale from the environment and does locale-dependent number and date formatting, and also supports various timezones that it reads from the OS.

And you always run it in a data center that uses a specific locale, and only UTC time zone, and very few simple types. But all this can only be known at runtime, except maybe types if the compiler is good.


As a poster said deeper in the thread below, something like this can happen:

  doc_format = get_user_input()
  parsed_doc = foolib.parse(doc_format)

You as the implementer might know the user will never input xml, so doc_format can't be 'xml' (you might even add some error handling if the user inputs this), but how can you communicate this to the compiler?


That's called bad library design. Rather than a global, make an instantiated parser that takes in specific codecs.


It doesn't matter; if the format comes from runtime input then the compiler will not know.


What you're calling "tree shaking" is more commonly called "dead code elimination" in compilers, and is one of the basic optimisations that any production compiler would implement.


A surprising amount of code might be executed in rarely-used or undocumented code paths (for example, if the DEBUG environment variable is 1 or because a plugin is enabled even if not actually used) and thus not shaken out by the compiler.


What makes you think that a lot of code is hidden behind a debug env variable instead of, e.g., a debug build?


Plenty of libraries with "verbose" logging flags ship way more than assumed. I remember lots of NPM libs that require `winston`, for example, are runtime-configurable. Or Java libraries that require Log4J. With Rust it's getting hard to remember because everything today seems to pull the fucking kitchen sink...

And even going beyond "debug", plenty of libraries ship features that are downright unwanted by consumers.

The two famous recent examples are Heartbleed and Log4shell.


> It's super easy to say `import foolib`, then call `foolib.do_thing()` and just start running.

It's effectively an end-run around the linker.

It used to be that you'd create a library by having each function in its own compilation unit, you'd create a ".o" file, then you'd bunch them together in a ".a" archive. When someone else is compiling their code, and they need the do_thing() function, the linker sees it's unfulfiled, and plucks it out of the foolib.a archive. For namespacing you'd probably call the functions foolib_do_thing(), etc.

However, object-orientism with a "god object" is a disease. We go in through a top-level object like "foolib" that holds pointers to all its member functions like do_thing(), do_this(), do_that(), then the only reference the other person's code has is to "foolib"... and then "foolib" brings in everything else in the library.

It's not possible for the linker to know if, for example, foolib needed the reference to do_that() just to initialise its members, and then nobody else ever needed it, so it could be eliminated, or if either foolib or the user's code will somehow need it.

> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

I can say that, at least for Go, it has excellent dead code elimination. If you don't call it, it's removed. If you even have a const feature_flag = false and have an if feature_flag { foobar() } in the code, it will eliminate foobar().
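
(For what it's worth, the same pattern holds for Rust in optimized builds — a minimal sketch:)

  // with a const flag, the branch is folded away and foobar()
  // never makes it into the optimized binary
  const FEATURE_FLAG: bool = false;

  fn foobar() {
      println!("feature code");
  }

  fn main() {
      if FEATURE_FLAG {
          foobar();
      }
  }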


foolib is the name of the library, not an object.

It also happens to be an object, but that's just because python is a dynamic language and libraries are objects. The C++ equivalent is foolib::do_thing(); where foolib is not an object.


> Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

Clarification: Go allows for very simple multi-file packages. It’s one feature I really like, because it allows splitting an otherwise coherent module into logical parts.


Further: I’ve never seen rust encourage anything of the sort. Module directory with a mod.rs and any number of files works just fine.


I probably mischaracterized this, as it's been a while since I did more than trivial Rust. AFAIK it's not possible to depend on only a part of a module in Rust though, right? (At least without an external build system.)

For example, you can't split up a module into foo.rs containing `Foo` and bar.rs containing `Bar`, both in module 'mymod', in such a way that you can `use mymod::Bar` and foo.rs is never built/linked.

My point is the granularity of the package/mod encourages coarse-grained deps, which I argue is a problem.


You'd use feature flags to enable certain parts of the library.


> not possible to depend on only a part of a module in Rust though right

yesn't, you can use feature flags similar to `#if` in C

but it's also not really a needed feature, as dead code elimination will prune out all the functions, types, etc. you don't use. None of it will end up in the produced binary.
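
For completeness, the feature-flag approach mentioned above looks roughly like this (hypothetical `xml` feature):

  # in the library's Cargo.toml
  [features]
  default = []
  xml = []

and in the library's code:

  // only compiled, and only pulled into downstream builds, when enabled
  #[cfg(feature = "xml")]
  pub mod xml;

Downstream users then opt in with something like foolib = { version = "1", features = ["xml"] } in their own Cargo.toml.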


Yeah, likewise Rust is completely fine after you say `mod foo` and have a file named foo.rs, if you also make a foo/ directory and put foo/whatever.rs and foo/something_else.rs that those are all part of the foo module.

Historically Rust wanted that foo.rs to be renamed foo/mod.rs but that's no longer idiomatic although of course it still works if you do that.
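
i.e. a layout like this is perfectly fine (names made up):

  src/lib.rs                // declares `mod foo;`
  src/foo.rs                // declares `pub mod whatever; pub mod something_else;`
  src/foo/whatever.rs
  src/foo/something_else.rs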


to extend on this:

in Rust, crates are semantically one compilation unit (where in C, oversimplified, it's a .h/.c pair), though in practice rustc will try to split it into more units to speed up build time.

the reason I'm pointing this out is that many sources of "splitting a module across files" come from situations where one file is one compilation unit, so you needed a way to split it (for organization) without splitting it (for compilation).


Not just multiple files, but multiple directories. One versioned dependency (module) usually consists of dozens of directories (packages) and dozens to hundreds of files. Only newcomers from other languages create too many go.mod files when they shouldn't.


> Eventually you end up in a world where your simple binary is 500 MiB of code you never actually call,

It’s getting hard to take these conversations seriously with all of the hyperbole about things that don’t happen. Nobody is producing Rust binaries that hit 500MB or even 50MB from adding a couple simple dependencies.

You’re also not ending up with mountains of code that never gets called in Rust.

Even if my Rust binaries end up being 10MB instead of 1MB, it doesn’t really matter these days. It’s either going on a server platform where that amount of data is trivial or it’s going into an embedded device where the few extra megabytes aren’t really a big deal relative to all the other content that ends up on devices these days.

For truly space constrained systems there’s no-std and entire, albeit small, separate universe of packages that operate in that space.

For all the doom-saying, in Rust I haven’t encountered this excessive bloat problem some people fret about, even in projects with liberal use of dependencies.

Every time I read these threads I feel like the conversations get hijacked by the people at the intersection of “not invented here” and nostalgia for the good old days. Comments like this that yearn for the days of buying paid libraries and then picking them apart anyway really reinforce that idea. There’s also a lot of the usual disdain for async and even Rust itself throughout this comment section. Meanwhile it feels like there’s an entire other world of Rust developers who have just moved on and get work done, not caring for endless discussions about function coloring or rewriting libraries themselves to shave a few hundred kB off of their binaries.


I agree on the bloat: considering my Rust projects typically don't use any shared libraries other than a libc, a few MB for a binary including hundreds of crates in dependencies (most of which are part of rustc or cargo itself) doesn't seem so bad. I do get the async thing. It just isn't the right tool for most of my needs. Unless you are in the situation where you need to wait faster (for connections, usually), threads are better for trying to compute faster than async is.


This idea is already implemented in Dotnet, with Trimming and now ahead of time compilation (AOT). Maybe other languages can learn from dotnet?

https://learn.microsoft.com/en-us/dotnet/core/deploying/trim...

https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...


dead code elimination is very old hat

which gets reinvented all the time, like in dotnet with "trimming" or in JS with "tree-shaking".

C/C++ compilers have been doing that since before dotnet was a thing, same for Rust, which has done it since its 1.0 release (because it's done by LLVM ;) )

The reason it gets reinvented all the time is that while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.

Which also brings us to one area where it's not out of the box: if you build a .dll/.so in one build process and then use it in another. Here additional tooling is needed to prune the dynamically linked libraries. But luckily it's not a common problem to run into in Rust.

In general most code size problems in Rust aren't caused by too huge LOC of dependencies but by an overuse of monopolization. The problem of tons of LOC in dependencies is one of supply chain trust and reviewability more than anything else.


> The reason it gets reinvented all the time is that while it's often quite straightforward in statically compiled languages, it isn't for dynamic languages, as finding out what actually is unused is hard (for fine-grained code elimination) or at least unreliable (pruning submodules). Even worse for scripting languages.

It seems to me in a strict sense the problem of eliminating dead code may be impossible for code that uses some form of eval(). For example, you could put something like eval(decrypt(<encrypted code>,key)), for a user-supplied key (or otherwise obfuscated); or simply eval(<externally supplied code>); both of which could call previously dead code. Although it seems plausible to rule out such cases. Without eval() some of the problem seems very easy otoh, like unused functions can simply be removed!

And of course there are more classical impediments, halting-problem like, which in general show that telling if a piece of code is executed is undecidable.

( Of course, we can still write conservative decisions that only cull a subset of easy to prove dead code -- halting problem is indeed decidable if you are conservative and accept "I Don't Know" as well as "Halts" / "Doesn't Halt" :) )


Yes, even without Eval, there's a ton of reflective mechanisms in JS that are technically broken by dead code elimination (and other transforms, like minification), but most JS tools make some pretty reasonable assumptions that you don't use these features. For example, minifiers assume you don't rely on specific Function.name property being preserved. Bundlers assume you don't use eval to call dead code, too.


reflective code is evil.


> In general most code size problems in Rust aren't caused by too huge LOC of dependencies but by an overuse of monopolization

*monomorphization, in case anyone got confused


And here I thought that Rust already killed unions


Those are done at compile time. Many languages (including Rust, which this story is about) also remove unused symbols at compile time.

The comment you're replying to is talking about not pulling in dependencies at all, before compiling, if they would not be needed.


I don't think libraries are the problem, but we don't have a lot of visibility after we add a new dependency. You either take the time to look into it, or just add it and then forget about the problem (which is kind of the point of having small libraries).

It should be easy to build and deploy profiling-aware builds (PGO/BOLT) and to get good feedback around time/instructions spent per package, as well as a measure of the ratio of each library that's cold or thrown away at build time.


I agree that I don't like thinking of libraries as the problem. But they do seem to be the easiest area to point at for a lot of modern development hell. It's kind of crazy.

I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.

Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.

This gets equally frustrating when our metrics for determining the safety of something largely discourage inaction on any dependency. They have to keep adding to it, or people think it is abandoned and not usable.

Note that this isn't unique to software, mind. Hardware can and does go through massive changes over the years. They have obvious limitations that slow down how rapidly they can change, of course.


> Instead, the problem is one of stability. In a "foundation that doesn't move and cause you to fall over" sense of the word. Consider if people made a house where every room had a different substructure under it. That, largely, seems to be the general approach we use to building software. The idea being that you can namespace a room away from other rooms and not have any care on what happens there.

I'm not sure what the problem is here.

Are you after pinning dependencies to be sure they didn't change? Generally I want to update dependencies to fix bugs in them.

Are you after trusting them through code review or tests? I don't think there's shortcuts for this. You shouldn't trust a library, changing or not, because old bugs and new vulnerabilities make erring on both sides risky. On reviewing other's code, I think Rust helps a bit by being explicit and fencing unsafe code, but memory safety is not enough when a logic bug can ruin your business. You can't avoid testing if mistakes or crashes matter.


Stability in that you don't want to take on a dependency that will throw a migration at you within the next decade. Or longer. You also don't want one that will introduce enabled sweeping features in the common path.

Examples: Google's Guava for the migration department. Apache Commons would be a good example of how not to make life painful for users there.

For sweeping features, Log4j introduced some pretty terrible security concerns.


> I'll note that it isn't just PGO/BOLT style optimizations. Largely, it is not that at all, oddly.

Well, it's not required to trim code that you can prove unreachable, true. But I was thinking about trying to measure if a given library really pulls its non-zero weight, and how much CPU is spent in it.

A library taking "too much time" for something you think can be done faster might need replacement, or swapping for a simple implementation (say the library cares about edge cases you don't face or can avoid).


Fully agreed in that this can be heavily subjective. And some things are flat out difficult with no real way of saying what is "too big."

My point on PGO/BOLT not being relevant was more that I see people reaching for libraries to do things such as add retries to a system. I don't think it is a terrible idea, necessarily, but it can be bad when combined with larger "retries plus some other stuff" libraries.

Now, fully granted that it can also be bad when you have developers reimplementing complicated data structures left and right. There has to be some sort of tradeoff calculation. I don't know that we have fully nailed it down, yet.


> It's a terrible idea...

It's a terrible idea because you're trying to reinvent section splitting + `--gc-sections` at link time, which rust (which the article is about) already does by default.


The article is about Rust, but I was commenting on dependencies in general.

Things like --gc-sections feels like a band-aid, a very practical and useful band-aid, but a band-aid none the less. You're building a bunch of things you don't need, then selectively throwing away parts (or selectively keeping parts).

IMO it all boils down to the granularity. The granularity of text source files, the granularity of units of distribution for libraries. It all contributes to a problem of large unwieldy dependency growth.

I don't have any great solutions here; it's just observations of the general problem from the horrifying things that happen when dependencies grow uncontrolled.


A consideration that is often overlooked is that the waste accumulates exponentially!

If each layer of “package abstraction” is only 50% utilised, then each layer multiplies the total size by 2x over what is actually required by the end application.

Three layers — packages pulling in packages that pull their own dependencies — already gets you to 88% bloat! (Or just 12% useful code)

An example of this is the new Windows 11 calculator that can take several seconds to start because it loads junk like the Windows 10 Hello for Business account recovery helper library!

Why? Because it has currency conversion, which uses a HTTP library, which has corporate web proxy support, which needs authentication, which needs WH4B account support, which can get locked out, which needs a recovery helper UI…

…in a calculator. That you can’t launch unless you have already logged in successfully and is definitely not the “right place” for account recovery workflows to be kicked off.

But… you see… it’s just easier to package up these things and include them with a single line in the code somewhere.


if only we had a system that we could all operate on with a standard set of tools that would take care of shared resource access like this.


As far as I'm aware, LTO completely solves this from a binary size perspective. It will optimise out anything unused. You can still get hit from a build time perspective though.


"completely solves" is a bit of an overstatement. Imagine a curl-like library that allows you to make requests by URL. You may only ever use HTTP urls, but code for all the other schemas (like HTTPS, FTP, Gopher) needs to be compiled in as well.

This is an extreme example, but the same thing happens very often at a smaller scale. Optional functionality can't always be removed statically.


That only applies when dynamic dispatch is involved and the linker can't trace the calls. For direct calls and generics (which idiomatic Rust code tends to prefer over dyn traits), LTO will prune extensively.


    let uri = get_uri_from_stdin();
    networking_library::make_request(uri);
How is the compiler supposed to prune that?


  let uri: Uri<HTTP> = get_uri_from_stdin().parse()?; 
If the library is made in a modular way this is how it would typically be done. The `HTTP` may be inferred by calls further along in the function.


So what happens if the user passes an url containing ftp:// or even https:// to stdin? Or is this an HTTP only library?


Depends on what is desired, in this case it would fail (through the `?`), and report it's not a valid HTTP Uri. This would be for a generic parsing library that allows for multiple schemes to be parsed each with their own parsing rules.

If you want to mix schemes you would need to be able to handle all schemes; you can either go through all variations (through the same generics) you want to test or just just accept that you need a full URI parser and lose the generic.


If you want to mix schemes you should just mix schemes.

  let uri: Uri<FTP or HTTP or HTTPS> = parse_uri(get_uri_from_stdin()) or fail;


See, the trait system in Rust actually forced you to discover your requirements at a very core level. It is not a bug, but a feature. If you need HTTPS, then you need to include the code to do HTTPS of course. Then LTO shouldn't remove it.

If your library cannot parse FTP, you either enable that feature, add it yourself, or use a different library.


No, this wouldn't work. The type of the request needs to be dynamic because the user can pass in any URI.


Then they can also pass in an erroneous URI. You still need some way to deal with the ones you're not accepting.


I guess that depends on the implementation. If you're calling through an API that dynamically selects the protocol, then I guess it wouldn't be removable.

Rust does have a feature-flagging system for this kind of optional functionality, though. It's not perfect, but it would work very well for something like curl's protocol backends.
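
A hedged sketch of what that can look like (hypothetical names, not curl's actual layout): each backend lives behind a Cargo feature declared in the library's `[features]` section, so a consumer that enables only `http` (e.g. with `default-features = false`) never compiles the FTP code at all, rather than hoping LTO proves it dead later.

    // Each backend is compiled only when its feature is enabled.
    #[cfg(feature = "http")]
    mod http {
        pub fn fetch(url: &str) -> String {
            format!("GET {url} via the HTTP backend")
        }
    }

    #[cfg(feature = "ftp")]
    mod ftp {
        pub fn fetch(url: &str) -> String {
            format!("RETR {url} via the FTP backend")
        }
    }

    /// Returns None when the URL's scheme wasn't compiled into this build.
    pub fn fetch(url: &str) -> Option<String> {
        #[cfg(feature = "http")]
        {
            if url.starts_with("http://") {
                return Some(http::fetch(url));
            }
        }

        #[cfg(feature = "ftp")]
        {
            if url.starts_with("ftp://") {
                return Some(ftp::fetch(url));
            }
        }

        let _ = url; // avoid an unused warning when no backends are enabled
        None
    }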


That's a consequence of crufty complicated protocols and standards that require a ton of support for different transports and backward compatibility. It's hard to avoid if you want to interoperate with the whole world.


Yes, it's not an issue of code size but an issue of supply-chain security/reviewability.

It's also not always a fair comparison: if you include tokio in the LOC count, then surely you should also count V8's LOC for Node, or the JRE's for Java projects (but not the JDK's), etc.


And, reductio ad absurdum, you perhaps also need to count those 27 million LOC in Linux too. (Or however many LOC there are in Windows or macOS or whatever other OS is a fundamental "dependency" for your program.)


Or you could use APE (Actually Portable Executable) and then all of those LOC go away. APE binaries can boot on bare metal and run on the big three OSes from the same file.


It's certainly better than in Java, where LTO is simply not possible due to reflection. The more interesting question is which code effectively gets compiled, so you know what has to be audited without disassembling the binary. Maybe debug information can help?


Not only is it possible, it has been available for decades in commercial AOT compilers like Aonix, Excelsior JET, PTC, and Aicas.

It is also done on Java's cousin Android, and available as free beer in GraalVM and OpenJ9.


Those all break compatibility to achieve that.


No they don't; PTC, Aicas, GraalVM, and OpenJ9 all support reflection.

The others no longer matter; they're out of business.


You can't LTO code in the presence of reflection. You can AOT-compile it, but there will always be a "cold path" where you have to interpret whatever is left.


Yet it works, thanks to additional metadata: either in a dynamic compiler, which effectively does it in memory, throwing away execution paths and trapping to redo them when required, or with PGO-like metadata for AOT compilation.

And since we are always wrong unless proven otherwise,

https://www.graalvm.org/jdk21/reference-manual/native-image/...

https://www.graalvm.org/latest/reference-manual/native-image...


You do understand that the topic at hand is not shipping around all that code needed to support a trap, right?


In Go, the symbol table contains enough information to figure this out. This is how https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck is able to limit vulnerabilities to those that are actually reachable in your code.


The symbol table might contain reflection metadata, but it surely can't identify what part of it will be used.


It's possible and in recent years the ecosystem has been evolving to support it much better via native-image metadata. Lots of libraries have metadata now that indicates what's accessed via reflection and the static DCE optimization keeps getting better. It can do things like propagate constants to detect more code as dead. Even large server frameworks like Micronaut or Spring Native support it now.

The other nice thing is that bytecode is easy to modify, so if you have a library with features you know you don't want, you can just knock them out and bank the savings.


Doesn’t Java offer some sort of trimming like C#? I know it won’t remove everything, but at least it can trim down a lot of things.


Yes: jlink, code guard, and R8/D8 on Android if you want to stay at the bytecode level; plus all the commercial AOT compilers and the free-beer ones offer similar capabilities at the binary level.


Everyone in this thread is debating whether LTO "completely" solves this or not, but why does this even need LTO in the first place? Dead code elimination across translation units in C++ is traditionally accomplished by something like -ffunction-sections combined with --gc-sections at link time, as well as judiciously moving function implementations into the header (inline).


Clang also supports virtual function elimination with -fvirtual-function-elimination, which AFAIK currently requires full LTO [0]. Normally, virtual functions can't be removed because the vtable references them. It's very helpful in cutting down on bloat from our own abstractions.

[0] https://clang.llvm.org/docs/ClangCommandLineReference.html#c...


> As far as I'm aware, LTO completely solves this from a binary size perspective.

I wouldn't say completely. People still sometimes struggle to get this to work well.

Recent example (Go Qt bindings):

https://github.com/mappu/miqt/issues/147


LTO only gets you so far; IMO it's more kicking the can down the road.

The analogy I use is cooking a huge dinner, then throwing out everything but the one side dish you wanted. If you want just the side dish, you should be able to cook just the side dish.


I see it more as having a sizable array of ingredients in the pantry, and using only what you need or want for a given meal.


Then another group of armchair programmers will bitch you out for using small dependencies.

I just don't listen. Things should be easy. Rust is easy. Don't overthink it.


Some of that group of armchair programmers remember when the npm left-pad drama broke a noticeable portion of the internet.

Sure, don't overthink it. But underthinking it is seriously problematic too.


LTO gets a lot of the way there, but it won't, for example, help with eliminating unused enums (and their associated code paths). That happens during per-crate MIR optimisation IIRC, which runs prior to the LLVM optimisation that LTO relies on.


The actual behavior of Go seems much closer to your ideal scenario than what you attribute to it, although it is more nuanced, so both are true. In Go, a module is a collection of packages. When you `go get` a module, the entire module is pulled onto the host, but when you vendor, only the packages you use (and I believe only the symbols used from those packages, but I am not certain) are vendored into your module as dependencies.


> In some cases the languages make this worse. Go and Rust, for example, encourage everything for a single package/mod to go in the same file.

What? I don't know about Go, but this certainly isn't true in Rust. Rust has great support for fine-grained imports via Cargo's ability to split up an API via crate features.


There's an interesting language called Unison, which implements part of this idea (the motivation is a bit different, though).

Functions are defined by their AST structure and are effectively content-addressed. Each function is then keyed by hash in a global registry from which you can pull it for reuse.


> The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library. You end up with the minimal set of code for the functionality you need.

Or you have ultra-fine-grained modules and rely on existing tree-shaking systems...?


If you think about it, every function already declares what it needs simply by using things. You know a function needs another function because it calls it. So what exactly are you asking? That the programmer insert a list of dependent functions in a comment above every function? The compiler could do that for you. The compiler could even go up a level and insert the names of the modules those functions belong to.


My understanding is that the existing algorithms for tree shaking (dead code elimination, or whatever you want to call it) work exactly on that basis. But Python is too dynamic to just read the source code and determine what's used ahead of time: eval and exec exist; just about every kind of namespace is reflected as either a dictionary or an object with attributes, and most are mutable; and the import system works purely at runtime and has a dazzling array of hooks.


> The only real solution I can think of to deal with this long term is ultra-fine-grained symbols and dependencies. Every function, type, and other top-level language construct needs to declare the set of things it needs to run (other functions, symbols, types, etc). When you depend on that one symbol it can construct, on demand, the exact graph of symbols it needs and dump the rest for any given library.

That’s literally the JS module system? It’s how we do tree shaking to get those bundle sizes down.


As many others have mentioned, "tree shaking" is just a rebranded variation of dead code elimination, which is a very old idea. I don't think JS does what the OP is suggesting anyway; you certainly don't declare the exact dependencies of each function.


> At each level a caller might need 5% of the functionality of any given dependency.

I think that is much more of a problem in ecosystems where it is harder to add dependencies.

When it is difficult to add dependencies, you end up with large libraries that do a lot of stuff you don't need, so you only need to add a couple of dependencies. On the other hand, if dependency management is easy, you end up with a lot of smaller packages that just do one thing.


The late Joe Armstrong had an idea about open source: that it should just be a collection of functions that we publish. It would solve this problem.


OTOH it also depends on the architecture you build. If you have a local-first thick client, an 800 MB initial install is less relevant when, after install, you communicate over a tightly controlled (by you) p2p networking stack, but take on heavy dependencies in the UI layer to provide, e.g., infinite-canvas collaboration and diagramming.


Small libraries are nice for reducing bloat, but are npm's isEven, isOdd, and leftpad really the right solution? Instead of a bunch of small libraries maintained by many individual maintainers, I'd prefer a larger lib maintained by a group, where continuity is more likely and the different parts work together.


I am just a college student, so sorry if this is stupid, but we know that the Rust compiler can detect unused code, variables, functions, and so on, as can IDEs for all languages. So why don't we just remove those parts? The unused code would simply not be compiled.


Mainly because in some libs some code is activated at runtime.

A lot of the bloat comes from functionality that can be activated via flags, methods that set a variable to true, environment variables, or even via configuration files.
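
A tiny sketch of why the compiler can't help there (names made up): the branch depends on a value that only exists at runtime, so both paths have to ship even if the flag is never set in practice.

    use std::env;

    fn heavyweight_telemetry() {
        // Stands in for a large optional subsystem pulled in "just in case".
        println!("collecting and uploading detailed metrics...");
    }

    fn main() {
        // Hypothetical runtime switch: the compiler cannot prove this is
        // always false, so heavyweight_telemetry() cannot be eliminated.
        let telemetry_enabled = env::var("MYAPP_TELEMETRY").is_ok();

        if telemetry_enabled {
            heavyweight_telemetry();
        } else {
            println!("running lean");
        }
    }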


When talking about LTO, we don't expect it to remove code that is used at runtime; such code is not dead code, by definition.

If you want to disable certain runtime features, you'd do so with feature flags.


Sure, but I'm talking about bloat in libraries that don't get LTO'd. If there are no feature flags and no plugin functionality, LTO can't do its job. There are plenty of non-core libraries like this.


Agreed, it's a problem, and I can't propose a solution other than what you've suggested: referencing functions by their value (tl;dr: hashing them), kind of like what Unison(?) proposes.

But I think the best defense against this problem at the moment is to be extremely defensive/protective about your system's dependencies. You should not import that random library that has a 10-line function; you should just copy that function into your codebase. Don't just slap random tools together. Developing libraries in a maintainable and forward-looking manner is the exception, not the rule. Some ecosystems excel here, but most fail. Ruby and JS are probably among the worst. Try upgrading a Rails 4 app to modern tooling.

So… be extremely protective of your dependencies. Very easy to accrue tech debt with a simple library installation. Libraries use libraries. It becomes a compounding problem fast.

Junior engineers seem to add packages to our core repo with reckless abandon, and I have to immediately come in and ask why it was needed. Do you really want to break prod some day because you needed a way to print a list of objects as a table in your CLI for dev?
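
To illustrate the "copy the 10-line function" advice above: something like a left-pad helper (a made-up example, not any specific crate) is small enough to own outright instead of adding a dependency for it.

    // A ten-line helper worth copying into the codebase rather than importing.
    fn left_pad(s: &str, width: usize, fill: char) -> String {
        let missing = width.saturating_sub(s.chars().count());
        let mut out = String::new();
        for _ in 0..missing {
            out.push(fill);
        }
        out.push_str(s);
        out
    }

    fn main() {
        assert_eq!(left_pad("42", 5, '0'), "00042");
        println!("{}", left_pad("42", 5, '0'));
    }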


> In the 80s the idea of a library of functionality was something you paid for, and painstakingly included parts of into your size constrained environment (fit it on a floppy). You probably picked apart that library and pulled the bits you needed, integrating them into your builds to be as small as possible.

If anything, the 1980s is when the idea of fully reusable, separately-developed software components first became practical, with Objective-C and the like. In fact it's a significant success story of Rust that this sort of pervasive software componentry has now been widely adopted as part of a systems programming language.


You're talking about a different 80s. On workstations and Unix mainframes, beasts like Smalltalk and Objective-C roamed the Earth. On home computers, a resident relocatable driver that wasn't part of ROM was an unusual novelty.


Yeah, the 1990s is more accurate. There was a huge market for COM controls and widget libs, and a lot of that Obj-C stuff came with a price tag.


This has been the #1 way to achieve code re-use and I am all for it. Optimize it in post where it is necessary and build things faster with tested code.


Size issues and bloat can be solved by tree shaking, which is orthogonal to the granularity of the package ecosystem. It doesn't matter for the server side (at least people don't care). On the client side, most ecosystems have a way to do it: Dart does it, and Android does it with ProGuard.

The more pressing issue with dependencies is supply-chain risk, including security. That's why larger organizations have approval processes for using anything open source. Unfortunately, the new crop of open-source projects in JS and even Go seems to suffer from "IDGAF about what shit code from the internet I am pulling" syndrome.

Unfortunately granularity does not solve that as long as your 1000 functions come from 1000 authors on NPM.


I can't remember the last time I saw someone so conclusively demonstrate they know nothing about the basics of how libraries, compilers, and linkers work.


Dead code elimination means binary size bloat does not follow from dependency bloat. So this point is pretty much invalid for a compiled language like Rust.


Dead code elimination is exactly the same as the halting problem. It’s approximate (and hopefully conservative!) at best.


No, dead code elimination in a statically-dispatched language is not equivalent to the halting problem: reachability over a static call graph is straightforwardly computable. The undecidable question is whether reachable code ever actually executes, and compilers simply answer that conservatively.



