
I think the dependency situation is pretty rough, and very few folks want to admit it. An example I recently stumbled upon: the cargo-watch[0] crate.

At its core it's a pretty simple app: it watches for file changes and re-runs the compiler. The implementation is less than 1000 lines of code. But what happens if I vendor the dependencies? It turns out the deps add up to almost 4 million lines of Rust code, spread across 8000+ files. For a simple file-watcher.

[0] https://crates.io/crates/cargo-watch






That's what inevitably happens when you make transitive dependencies easy and you have a culture of "if there's a library for it you must use it!"

C/C++ are the only widely used languages without a popular npm-style package manager, and as a result most libraries are self-contained or have minimal, often optional, dependencies. efsw [1] is a 7000-line (wc -l on the src directory) C++ FS watcher without dependencies.

The single-header libraries that are popular in the game programming space (stb_* [2], cgltf [3], etc) as well as of course Dear ImGui [4] have been some of the most pleasant ones I've ever worked with.

At this point I'm convinced that new package managers forbidding transitive dependencies would be an overall net gain. The biggest issue is large libraries that other ones justifiably depend on - OpenSSL, zlib, HTTP servers/clients, maybe even async runtimes. It's by no means an unsolvable problem, e.g. instead of having zlib as a transitive dependency:

1. a library can still hard-depend on zlib, and just force the user to install it manually.

2. a library can provide generic compress/decompress callbacks, that the user can implement with whatever.

3. the compress/decompress functionality could be made standard

[1] https://github.com/SpartanJ/efsw

[2] https://github.com/nothings/stb

[3] https://github.com/jkuhlmann/cgltf

[4] https://github.com/ocornut/imgui


> The single-header libraries that are popular in the game programming space (stb_* [2], cgltf [3], etc) as well as of course Dear ImGui have been some of the most pleasant ones I've ever worked with.

Mainstream game programming doesn't use C at all. (Source: I was a gamedev for almost a decade, and I mostly dealt with C# and sometimes C++ for low-level stuff.) Even C++ has been out of fashion for at least a decade; anyone claiming that C++ is necessary for game programming is likely either an engine developer---a required, but very small portion of all gamedevs---or someone who hasn't done significant game programming recently.

Also, the reason that single-header libraries are rather popular in C is that otherwise they would be so, SO painful to use by modern standards. As a result, those libraries have to be much more carefully designed than normal libraries in either C or other languages, which contributes to their seemingly higher quality. (Source: Again, I have written sizable single-header libraries in C and am aware of many issues from doing so.) I don't think this approach is scalable in general.


> The mainstream game programming doesn't use C at all. (Source: I had been a gamedev for almost a decade

Game programming has changed a lot; the parent is talking about stuff older than 10 years.

There was a lot of PC gaming in C/C++, and engines were developed together with games for the most part. Think of the whole Doom and Quake saga.

That's what he's talking about


Most libraries mentioned in the GP are less than 10 years old, except perhaps the stb libraries (which date back to the early 2000s). Single-header libraries are definitely a recent phenomenon, possibly inspired by the stb libraries after all.

> as a result most libraries are self-contained or have minimal, and often optional dependencies

If you ignore the OS, then sure. Most C/C++ codebases aren't really portable however. They're tied to UNIX, Windows or macOS, and often some specific version range of those, because they use so many APIs from the base OS. Include those and you're up to millions of lines too.


> That's what inevitably happens when you make transitive dependencies easy and you have a culture of "if there's a library for it you must use it!"

1. This doesn't mean that C++'s fragmented hellscape of package management is a good thing.

2. "Inevitably"? No. This confuses cause and effect.

3. This comment conflates culture with tooling. Sure, they are related, but not perfectly so.


> a library can provide generic compress/decompress callbacks, that the user can implement with whatever.

This only works for extremely simple cases. Beyond toy examples, you have to glue together two full-blown APIs with a bunch of things not aligning at all.


Having a quick look at efsw, it depends on both libc and the Windows API, both of which are huge dependencies. The Rust bindings for libc come to about 122 thousand lines, while the winapi crate is about 180 thousand lines.

[Edit] And for completeness, Microsoft's Windows crate is 630 thousand lines, though that goes way beyond simple bindings, and actually provides wrappers to make its use more idiomatic.


> At this point I'm convinced that new package managers forbidding transitive dependencies would be an overall net gain.

Composition is an essential part of software development, and it crosses package boundaries.

How would banishing inter-package composition be a net gain?


The fact is that the dependency jungle is the prevalent way to get shit done these days. The best the runtime can do is embrace it, make it as performant and safe as possible, and try to support minimum-dependency projects by having a broad std library.

Also I am no expert, but I think file-watchers are definitely not simple at all, especially if they are multi-platform.


https://github.com/eradman/entr is

    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C                                4            154            163            880
    Bourne Shell                     2             74             28            536
    C/C++ Header                     4             21             66             70
    Markdown                         1             21              0             37
    YAML                             1              0              0             14
    -------------------------------------------------------------------------------
    SUM:                            12            270            257           1537
    -------------------------------------------------------------------------------
including a well-designed CLI.

entr supports BSD, Mac OS, and Linux (even WSL). So that's several platforms in <2k lines of code. By using MATHEMATICS and EXTRAPOLATION we find that non-WSL Windows file-watching must take four million minus two thousand equals calculate calculate 3998000 lines of code. Ahem.

Though to be fair, cargo watch probably does more than just file-watching. (Should it? Is it worth the complexity? I guess that depends on where you land on the worse-is-better discussion.)


You are comparing a bicycle and a car; while you might only need a bicycle for your daily life, they are not directly comparable.

BSD, Mac OS and Linux share the same interface that approximates POSIX---so it only supports a single platform with different variants. Its CLI is not well-designed: it's just a fixed, unconditional terminal sequence that doesn't even look at $TERM, and its options have no long counterparts (probably because it couldn't use getopt_long, which is a GNU extension). And cargo-watch actually parses the `cargo metadata` JSON output (guess what's required for parsing JSON in C) and deals with ignore patterns with a consistent syntax (guess what's required for doing that beyond fnmatch).

And I don't even mean to say that the supposed figure of 4M LoC is all required. In fact, while the problem itself does exist, I don't think that figure is accurate at all, given that the massive `windows` crate was blindly counted toward it. I guess a faithful reproduction of cargo-watch without any external library would take about 20--50K lines of code, in Rust or in C. But doing it in C would be much more painful and you would instead cut requirements.


> (guess what's required for parsing JSON in C)

Certainly nothing on the order of MLOC. Ditto for other features you listed.


See my other comment about the inexactness of 4M LoC figure. The total amount of true dependencies would be probably a tenth of that, which is still large but much more believable.

There's no standard way of doing file watching across BSDs, Mac OS and Linux.

> it's just a fixed unconditional terminal sequence

Are you referring to the clear feature? Yes, it's fixed. It's also pretty standard in that regard. It's optional so if it breaks (probably on xterm because it's weird but that's about it) you don't have to use it and can just issue a clear command manually as part of whatever you're running in the TTY it gives you. Honestly I don't think the feature is even really needed. I highly doubt cargo-watch needs to do anything with TERM so I am not sure why you mention it (spamming colours everywhere is eye candy not a feature).

But more importantly, this is just a convenience feature and not part of the "CLI". Not supporting long options isn't indicative of a poorly designed CLI. However, adding long option support without any dependencies is only a couple of hundred lines of C.

> And cargo-watch actually parses the `cargo metadata` JSON output

Which is unnecessary and entirely cargo specific. Meanwhile you can achieve the same effect with entr by just chaining it with an appropriate jq invocation. entr is more flexible by not having this feature.

> (guess what's required for parsing JSON in C)

Not really anywhere near as many lines as you seem to think.

> deals with ignore patterns which are consistent in syntax (guess what's required for doing that besides from fnmatch).

Again, entr doesn't deal with ignore patterns because it allows the end user to decide how to handle this themselves. It takes a list of filenames via stdin. This is not a design problem, it's just a design choice. It makes it more flexible. But again, if you wanted to write this in C, it's only another couple of hundred lines.

From my experience doing windows development, windows support probably isn't as painful as you seem to think.

All in all, I imagine it would take under 10k lines to have all the features you seem to care about AND nothing non-eye-candy would have to be cut. (Although for the eye candy, it's not exactly hideously difficult to parse terminfo: the terminfo crate for Rust is pretty small (3.2k SLOC), and it would actually be that small (or smaller) if it didn't over-engineer the fuck out of the problem by using the nom, fnv, and phf crates, given we're parsing terminfo, not genetic information, and doing it once at program startup, not 10000 times per second.)

Yes, I think trying to golf the problem is probably not appropriate. But 4M LoC is fucking ridiculous by any metric. 1M would still be ridiculous. 100k would also be ridiculous; 50k is still pretty ridiculous.


> There's no standard way of doing file watching across BSDs, Mac OS and Linux.

You are correct, but that's about the only divergence that matters in this context. As I've noted elsewhere, you can't even safely use `char*` for file names in Windows; it should be `wchar_t*` in order to avoid any encoding problems.

> Are you referring to the clear feature? Yes, it's fixed. It's also pretty standard in that regard.

At the very least it should have checked for a TTY in advance. I'm not even interested in terminfo (which should go die).

> spamming colours everywhere is eye candy not a feature

Agreed that "spamming" is a real problem, provided that you don't treat any amount of color as spamming.

> Which is unnecessary and entirely cargo specific. Meanwhile you can achieve the same effect with entr by just chaining it with an appropriate jq invocation. entr is more flexible by not having this feature.

Cargo-watch was strictly designed for Cargo users, which would obviously want to watch some Cargo workspace. Entr just happens to be not designed for this use case. And jq is much larger than entr, so you should instead consider the size of entr + jq by that logic.

> Not really anywhere near as many lines as you seem to think.

Yeah, my estimate is about 300 lines of code with a carefully chosen interface. But you have to ensure that it is indeed correct yourself, and JSON is already known for its sloppily worded standard and varying implementations [1]. That's what is actually required.

[1] https://seriot.ch/projects/parsing_json.html

> Yes, I think trying to golf the problem is probably not appropriate. But 4M LoC is fucking ridiculous by any metric. 1M would still be ridiculous.

And that 4M LoC figure is fucking ridiculous because it includes all `#[cfg]`-ignored lines in various crates, including most of the 2.2M LoC in the `windows` crate. That figure is just fucking incorrect and not relevant!

> 100k would also be ridiculous 50k is still pretty ridiculous.

And for this part, you would be correct if I hadn't said "faithful" reproduction. I'm totally sure that some thousands of lines of Rust code would be enough to deliver a functionally identical program, but that falls short of a faithful reproduction. This faithfulness issue actually occurs in many comparisons between Rust and C/C++; even the simple "Hello, world!" program does a different thing in Rust and in C, because Rust panics when it can't write the whole text, for example. 50K is just a safety margin for such subtle differences. (I can for example imagine some Unicode stuff...)


> As I've noted elsewhere, you can't even safely use `char*` for file names in Windows; it should be `wchar_t*` in order to avoid any encoding problem.

Yes, this is true. But I think the overhead of writing that kind of code would not be as enormous as 30k lines or anything in that order.

> At the very least it should have checked for TTY in advance. I'm not even interested in terminfo (which should go die).

Maybe. It's an explicit option you must pass. It's often useful to be able to override isatty decisions when you want to embed terminal escapes in output to something like less. But for clear it's debatable.

I would say it's fine as it is.

Also, if isatty is "the very least" what else do you propose?

> Agreed that "spamming" is a real problem, provided that you don't treat any amount of color as spamming.

I treat any amount of color as spamming when alternative options exist. Colours are useful for: syntax highlighting, additional information from ls. Not for telling you that a new line of text is available for you to read in your terminal.

There are many things where colours are completely superfluous but are not over-used. I still think that colours should be the exception not the rule.

> Cargo-watch was strictly designed for Cargo users, which would obviously want to watch some Cargo workspace. Entr just happens to be not designed for this use case. And jq is much larger than entr, so you should instead consider the size of entr + jq by that logic.

Yes jq is larger than entr. But it's not 3.9M SLOC. It also has many features that cargo-watch doesn't. If you wanted something cargo specific you could just write something specific to that in not very much code at all. The point is that the combination of jq and entr can do more than cargo-watch with less code.

> and JSON is already known for its sloppily worded standard and varying implementation [1]. That's what is actually required.

I hope you can agree that no number of millions of lines of code can fix JSON being trash. What would solve JSON being trash is if people stopped using it. But that's also not going to happen. So we are just going to have to deal with JSON being trash.

> And for this part, you would be correct if I didn't say the "faithful" reproduction. I'm totally sure that some thousand lines of Rust code should be enough to deliver a functionally identical program, but that's short of the faithful reproduction. This faithfulness issue actually occurs in many comparisons between Rust and C/C++; even the simple "Hello, world!" program does a different thing in Rust and in C because Rust panics when it couldn't write the whole text for example. 50K is just a safety margin for such subtle differences. (I can for example imagine some Unicode stuffs around...)

Regardless of all the obstacles. I put my money on 20k max in rust with everything vendored including writing your own windows bindings.

But neither of us has time for that.


> It's an explicit option you must pass. It's often useful to be able to override isatty decisions when you want to embed terminal escapes in output to something like less. But for clear it's debatable.

In terms of UX it's just moving the burden to the user, who may not be aware of that problem or even the existence of `-c`. The default matters.

> I treat any amount of color as spamming when alternative options exist. Colours are useful for: syntax highlighting, additional information from ls. Not for telling you that a new line of text is available for you to read in your terminal.

I'm a bit more lenient but agree on the broad points. The bare terminal is pretty bad UX-wise, which is why I'm generous toward any attempt to improve UX (but not color spamming).

I'm more cautious about emojis than colors by the way, because they are inherently colored while you can't easily customize emojis themselves. They are much more annoying than mere colors.

> It also has many features that cargo-watch doesn't. If you wanted something cargo specific you could just write something specific to that in not very much code at all. The point is that the combination of jq and entr can do more than cargo-watch with less code.

I think you have been sidetracked then, as the very starting point was about cargo-watch being apparently too large. It's too large partly because of bloated dependencies, but also because dependencies are composed instead of being inlined. Your point shifted from no dependencies (or no compositions, as an extension) to minimal compositions, or at least that's how it feels to me. If that's your true point I have no real objection.

> I hope you can agree that no number of millions of lines of code can fix JSON being trash. What would solve JSON being trash is if people stopped using it. But that's also not going to happen. So we are just going to have to deal with JSON being trash.

Absolutely agreed. JSON only survived because of the immense popularity of JS and good timing, and continues to thrive because of that initial momentum. It's not even hard to slightly amend JSON to make it much better... (I even designed a well-defined JSON superset many years ago!)


> The default matters.

The default is no clear.

If the default were to clear, then I would agree that an isatty check would be necessary. But an isatty check for an explicit option here would be as weird as an isatty check for --color=always in something like ls.

> The bare terminal is too bad for UX

I think it depends on the task and the person. You wouldn't see me doing image editing, 3d modelling, audio mastering, or web browsing in a terminal. But for things which do not suffer for it (a surprising number of tasks) it's strictly better UX than a GUI equivalent.

> emojis

Yes, I dislike these. I especially remember when suddenly my terminal would colour emojis because someone felt it was a good idea to add that to some library as a default. :(

> I think you have been sidetracked then, as the very starting point was about cargo-watch being apparently too large. It's too large partly because of bloated dependencies but also because dependencies are composed instead of being inlined. Your point shifted from no dependencies (or no compositions as an extension) to minimal compositions, at least I feel so. If that's your true point I have no real objection.

Well no, I think you can build a cargo-watch equivalent (with a bit of jank) from disparate utilities running in a shell script and still have fewer total lines.

And sure, the line count is a bit inflated with a lot of things not being compiled into the final binary. But the problem we're discussing here is if it's worth to depend on a bunch of things when all you're using is one or two functions.

As I understand it, whenever you do anything with Windows, you pull in hideous quantities of code wrapping entire swathes of the Windows API. Why can't this be split up more, so that if all I want is e.g. file watching, I get just file watching? I know Windows has some basic things you inevitably always need, but surely this isn't enough to make up 2M SLOC. I've written C code for Windows, and yes it's painful, but it's not 2M-SLOC-of-boilerplate painful.

Large complex dependency graphs are obviously not a problem for the compiler, it can chug away, remove unnecessary shit, and get you a binary. They're usually not a big problem for binary size (although they can still lead to some inflation). But they are a massive issue for being able to work on the codebase (long compilation times) or review the codebase (huge amounts of complexity, even when code isn't called, you need to rule out that it's not called).

And huge complex dependency graphs where you're doing something relatively trivial (and honestly, file watching isn't re-implementing cat, but it's not a web browser or an OS either) should just be discouraged.

We both agree that you can get this done in under 50k lines. That's much easier to manage from an auditing point of view than 4M lines of code, even if 3.9M lines end up compiled out.


Yeah, I think we are largely on the same page. The only thing I want to correct at this point is that Rust has no OS-level support yet, so any "system" library will necessarily come as a third-party crate. Including the `windows` crate in this context is roughly equivalent to including the fully expanded lines of `#include <windows.h>`, which is known to be so massive that there is a recommended macro, `WIN32_LEAN_AND_MEAN`, to skip a large portion of it in typical uses; but that should still count according to you, I think? [1] Thankfully for auditors, this crate does come from Microsoft, which gives it a basic level of authority, but I can understand if you are still unsatisfied with the crate, and that was why I stressed that the original figure was very off.

As noted in my other comments, I'm very conscious of this problem and tend to avoid excess dependencies when I can do the work myself with a bit of effort. I don't even use itertools (a popular convenience library that extends `Iterator`), because I normally want only a few of its methods (`Itertools::format` is one of the things I really miss) and I can write them without making other aspects worse. But I'm also in the minority: I only tend to depend on "big" dependencies that would not be sensible to write myself, while others are much more liberal, and I do think that cargo-watch is already optimal in the number of such dependencies. More responsibility and more challenges remain for library authors, whose decisions directly contribute to the problem.

[1] I haven't actually checked the number of lines under this assumption, but I recall that it exceeds at least 100K lines of code, and probably much larger.


> But neither of us has time for that.

Indeed, so why waste it reinventing the wheel instead of using a high quality third party package?

Or arguing about highly suspect LoC numbers pulled out of thin air as if it’s not an apples-to-oranges comparison, for that matter.


I mean there is an issue here with inflated line counts. It makes the whole solution more complex and more difficult to troubleshoot. It makes the binary size inflated. It likely makes the solution slower. And, probably most important, it makes auditing very difficult.

> BSD, Mac OS and Linux

Do these OSs share file watch interfaces? Linux itself has, last I checked, three incompatible file watch APIs.


Hence "variants". Many other aspects, including the very fact that you can safely use `char*` for the file name, are shared.

Good argument. I do prefer bicycles.

Note that entr doesn't recursively watch for file changes. It has a list of files it watches for changes, but this list isn't amended when new files are added. Fundamentally that's a fairly small subset of proper recursive file watching. In terms of just watching files a better project to compare against is https://github.com/inotify-tools/inotify-tools.

from man entr:

       Rebuild  project  if a source file is modified or added to the src/ di‐
       rectory:

             $ while sleep 0.1; do ls src/*.rb | entr -d make; done

Though I shudder to think of the amount of code needed to rewrite that in a compiled language while sticking to the principle of not reinventing anything remotely wheel-shaped. (Btw the libinotify/src is like 2.3kloc, inotify cli <1.2kloc.)

> By using MATHEMATICS and EXTRAPOLATION we find that non-WSL Windows file-watching must take four million minus two thousand equals calculate calculate 3998000 lines of code

You joke, but Windows support is the main (probably the only?) reason why cargo-watch is huge. The Rust ecosystem has some weird shit when interacting with Windows.


That's the usual response I get when I bring this issue up: "file watching is actually very complicated" or "if you avoided deps, you'd just reimplement millions of loc yourself".

Forgive me if I'm making a very bold claim, but I think cross-platform file watching should not require this much code. It's 32x larger than the Linux memory management subsystem.


Good file watching that provides flexible primitives absolutely requires:

- ok, a single ext4 file inode changes, and its filename matches my hardcoded string

- oh, you don’t want to match against just changes to “package.json” but you want to match against a regex? voila, now you need a regex engine

- what about handling a directory rename? should that trigger matches on all files in the renamed directory?

- should the file watcher be triggered once per file, or just every 5ms? turns out this depends on your use case

- how do symlinks fit into this story?

- let’s say i want to handle once every 5ms - how do i actually wait for 5ms? do i yield the thread? do i allow other async contexts to execute while i’m waiting? how do those contexts know when to execute and when to yield back to me? now you have an async runtime with timers

- how does buffering work? are there limits on how many file change events can be buffered? do i dynamically allocate more memory as more file changes get buffered? now you need a vector/arraylist implementation

And this is before you look at what this looks like on different platforms, or if you want polling fallbacks.

Can you do it with fewer dependencies? Probably, if you start making hard tradeoffs and adding even more complexity about which features you activate - but that only adds lines of code, it doesn't remove them.

What you describe is ideologically nice, but in practice it’s over-optimizing for a goal that most people don’t really care about.


Are you 100% sure all these cases are handled by cargo-watch?

Yup. Every user facing case mentioned has a corresponding flag. The non-user facing stuff, like being cross platform, is common sense.

https://crates.io/crates/cargo-watch/8.5.2


The regex one stands out as a negative example here. Why does it have to be built in instead of exposing a str -> bool filter lambda?

How does one pass a lambda to a CLI tool? Outside of using a regex or equivalent pattern syntax, I'm struggling to understand what you are proposing here.

This is in the context of a file-watching library. Although I'm not sure cargo-watch supports, or even needs to support, regexes.

Now you need a general purpose embedded language interpreter to express your filter lambda? I'm not sure you've really made anything simpler.

I don't see why you want an embedded interpreter for this. Can you explain?

If you give a lambda to cargo-watch instead of a regexp, it has to be evaluated. Hence interpreter.

Why would you evaluate it using an interpreter? Since you are using it in the context of a rust lambda you compile it. You just have a rust file that calls cargo-watch as a library. Crafting an interpreter seems like an incredibly bad idea.

But now you have to know at compile time what you're watching, which is not what cargo-watch or any of the similar commands like entr do.

Considering you have a rust file and the dependency why wouldn't you compile it? It's not like rustc is not available when you are using cargo-watch.

Anyway, doing all that just so you don't have to write a regexp has all the hallmarks of overkill. Keep it simple.

It's not just file watching (that would be the watchexec crate); cargo-watch properly integrates with cargo. Moreover cargo-watch also includes:

- proper CLI support, with help messages, subcommands and so on

- support for reading cargo's metadata

- logging

- support for dotenv files

- proper shell escaping support

- and, it seems, also support for colored terminal writing.

Moreover both watchexec and cargo-watch end up depending on winapi, which includes bindings for a lot of Windows APIs, some of which might be needed and some not.

This could also be worse if the official windows crate by Microsoft were used (or maybe it's already used due to some dependency, I haven't checked), since that's gigantic.


I think the issue with file-watching is that the libs usually support multiple implementations (with different tradeoffs and multiple fallbacks), a lot of them platform-specific.

Eh. The standard library is also a gigantic dependency written entirely by volunteers.

I am not a Rust expert, but the thing with the standard library is that it only has peer dependencies with itself, and they are all synced to the same version.

Meaning if you only use the std lib you:

1) Will never include two different versions of the same peer dependency because of incompatible version requirements.

2) Will usually not have two dependencies relying on two different peer-dependencies that do the same thing. This can still happen for deprecated std lib features, but tends to be a much lesser issue.

These two issues are usually the ones that cause dependency size explosion in projects.


I don't have a problem with dependencies in principle. There's a good reason for standard libraries to contain a decent amount of code. It is a vector for supply chain attacks, but I also have a lot of trust in the Rust maintainers. The Rust standard library is exceptionally well-written from what I've seen, so I'm not too worried about it.

FWIW I checked out the nightly toolchain, and it looks like the stdlib is less than 400k SLoC. So literally 10x smaller.


> try to support minimum-dependency projects by having a broad std library.

Since everyone depends on the standard library, this will just mean everyone depends on even more lines of code. You are decreasing the number of nominal dependencies but increasing how much code those amount to.

Moreover the moment the stdlib's bundled dependency is not enough there are two problems:

- it can't be changed because that would be a breaking change, so you're stuck with the old bad implementation;

- you will have to use an alternative implementation in another crate, so now you're back at the starting situation except with another dependency bundled in the stdlib.

Just look at the dependency situation with the python stdlib, e.g. how many versions of urllib there are.


You do have good points as well and it depends heavily on how disciplined the std lib makers are. Go for example has a very clean and stable std lib.

I posted this in some other thread:

I am not a Rust expert, but the thing with the standard library is that it only has peer dependencies on itself, and they are all synced to the same version. Meaning if you only use the std lib you:

1) Will never include two different versions of the same peer dependency because of incompatible version requirements.

2) Will usually not have two dependencies relying on two different peer-dependencies that do the same thing. This can still happen for deprecated std lib features, but tends to be a much lesser issue.

These two issues are usually the ones that cause dependency size explosion in projects.


For 1), Cargo already takes care of that if you use the same major version. Bundling dependencies in the stdlib "solves" the problem by making new major versions impossible.

This means that if a bundled dependency in the stdlib is ever found to have some design issue that requires breaking changes to fix, then you're out of luck. As you said, the stdlib could deprecate the old version and add a new one, but then you're just making problem 2) worse by forcing everyone to include the old deprecated dependency too! Or you could use a third-party implementation, especially if the stdlib doesn't have the features you need, but even then you will still be including the stdlib version in your dependency graph!

Ultimately IMO bundling dependencies in the stdlib just makes the problem worse over time, though it can raise awareness about how to better handle them.


> Cargo already takes care of that if you use the same major version.

Most dependency management systems do that, but large projects often end up pulling multiple different major versions of (often very large) dependencies.

> 2) worse by forcing everyone to include the old deprecated dependency too!

Like I said, I am no expert on Rust, but I assume that Rust can eliminate stdlib dead code from the final binary? So unused deprecated features shouldn't be included in every build? Also, deprecated features are often modified to use the new implementation under the hood, which reduces the code duplication problem.

> Bundling dependencies in the stdlib "solves" the problem by making new major versions impossible.

Yes, which is a feature. For example, Go is very strict about this, and not only in the stdlib: https://go.dev/doc/go1compat A lot of 3rd party libs follow this principle as well.

I bring up Go a lot even though I actually don't like the language that much, but it gets some pragmatic things right.

I am not saying everything should be in the stdlib, but I tend to think that the stdlib should be fairly big and tackle most common problems.


> > Bundling dependencies in the stdlib "solves" the problem by making new major versions impossible.

> Yes, which is a feature. For example Go is very annoying about this not only on the stdlib. https://go.dev/doc/go1compat a lot of 3rd party libs follow this principle as well.

But there's no reason such a "feature" requires bundling dependencies in the stdlib. As you mention 3rd party Go libs manage to do this perfectly fine.

> but I tend to think that the stdlib should be fairly big and tackle most common problems.

I tend to disagree with this, because the way to tackle those common problems will likely change in the future, but the stdlib will be stuck with it for eternity. I would rather have some community-standard 3rd party crate that you can replace in the future when it grows old. See also "Where modules go to die" https://leancrew.com/all-this/2012/04/where-modules-go-to-di...


I would argue that Go is focused on writing web services and its stdlib is focused on providing those primitives. On the other hand, Rust is a more general programming language, so it's harder to add something to the stdlib that would benefit the broader range of users.

"I think file-watchers are definitely not simple at all"

I don't really know much about Rust, but I got curious and had a look at the file watching APIs for Windows/Linux/macOS, and it really didn't seem that complicated. Maybe a bit fiddly, but I have a hard time imagining how it could take more than 500 lines of code.

I would love to know where the hard part is if anyone knows of a good blog post or video about it.
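For what it's worth, a naive dependency-free watcher really is short if you settle for polling. The sketch below is std-only Rust and deliberately skips everything that makes real watchers big: recursive walks, the native inotify/FSEvents/ReadDirectoryChangesW backends, debouncing, and rename/deletion handling.

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

// Newest modification time directly under `dir` (non-recursive for
// brevity; a real watcher would walk subdirectories too).
fn latest_mtime(dir: &Path) -> std::io::Result<SystemTime> {
    let mut latest = SystemTime::UNIX_EPOCH;
    for entry in fs::read_dir(dir)? {
        if let Ok(mtime) = entry?.metadata()?.modified() {
            if mtime > latest {
                latest = mtime;
            }
        }
    }
    Ok(latest)
}

fn main() -> std::io::Result<()> {
    let dir = Path::new(".");
    let mut last = latest_mtime(dir)?;
    // Bounded here; a real watcher loops forever.
    for _ in 0..3 {
        std::thread::sleep(Duration::from_millis(100));
        let now = latest_mtime(dir)?;
        if now > last {
            last = now;
            println!("change detected; a cargo-watch-alike would re-run cargo here");
        }
    }
    Ok(())
}
```

The catch is that polling wastes wakeups and can miss sub-interval churn; the moment you want the native notification APIs portably, you are writing (or depending on) exactly the platform-binding code this thread is weighing.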


If you lose track of your dependencies you are just asking for supply chain attacks.

And since xz, we know resourceful and patient attackers are a reality, not just an "it might happen".

Sorry, but sprawling transitive micro-dependencies are not sustainable. They're convenient, and many modern projects utilize them, but they require a high-trust environment, and we don't have that anymore, unfortunately.


This is a natural and not really scary thing.

All code is built on mountains of dependencies that by their nature will do more than what you are using them for. For example, cargo-watch brings in a win32 API wrapper library (which is just autogenerated bindings for win32 calls). Of course that thing is going to be massive, while cargo-watch uses only a sliver of it in the case where it's built for Windows.

The standard library for pretty much any language will have millions of lines of code, that's not scary even though your apps likely only use a fraction of what's offered.

And have you ever glanced at C++'s boost library? That thing is monstrously big yet most devs using it are going to really only grab a few of the extensions.

The alternative is the npm hellscape, where you have a package for "is-odd" and a package for "is-even" that can break the entire ecosystem if the owner is disgruntled, because everything depends on them.

Having fewer, larger dependencies maintained and relied on by multiple people is much more ideal, and that's where Rust mostly finds itself.


> The alternative is the npm hellscape where you have a package for "isOdd" and a package for "is even" that can break the entire ecosystem if the owner is disgruntled because everything depends on them.

The is-odd and is-even packages are in no way situated to break the ecosystem. They're helper functions that their author (Jon Schlinkert) used as dependencies in one of his other packages (micromatch) 10 years ago, and consequently show up as transitive dependencies in antiquated versions of micromatch. No one actually depends on this package indirectly in 2024 (not even the author himself), and very few packages ever depended on it directly. Micromatch is largely obsolete given the fact that Node has built in globbing support now [1][2]. We have to let some of these NPM memes go.

[1] https://nodejs.org/docs/latest-v22.x/api/path.html#pathmatch...

[2] https://nodejs.org/docs/latest-v22.x/api/fs.html#fspromisesg...


> The alternative is the npm hellscape where you have a package for "isOdd" and a package for "is even" that can break the entire ecosystem if the owner is disgruntled because everything depends on them.

This used to be true 5-10 years ago. The js ecosystem moves fast and much has been done to fix the dependency sprawl.


I… don’t think that’s true.

Just look at how many downloads some of those packages have today.

Look at the dependency tree for a next or nuxt app.

What the js world did is make their build systems somewhat sane, what with not needing Babel in every project anymore.


> Look at the dependency tree for a next

Looks ok to me: https://npmgraph.js.org/?q=next

Ironically, most of the dependencies are actually Rust crates used by swc and turbopack [1][2]. Try running `cargo tree` on either of those crates, it's enlightening to say the least. And of course, Node has a built in file watcher, and even the most popular third party package for file watching (Chokidar) has a single dependency [3].

[1] https://github.com/vercel/next.js/blob/07a55e03a31b16da1d085...

[2] https://github.com/swc-project/swc/blob/b94a0e1fd2b900b05c5f...

[3] https://npmgraph.js.org/?q=chokidar


Not sure what npmgraph is doing exactly, but there are more dependencies than that.

React is a dependency of every next application, but I don’t see it there.

Maybe next, the singular package, has fewer dependencies than a project made with next.


React and react-dom are peer dependencies (npmgraph lists them but doesn't graph them visually). The actual full installation command is: `npm install next@latest react@latest react-dom@latest`[1]. Even if you include react and react-dom, the dependency graph still looks tolerable to me: https://npmgraph.js.org/?q=next%4014.2.13%2C+react%4018.3.1%...

[1] https://nextjs.org/docs/getting-started/installation#manual-...


I consciously remove and rewrite various dependencies at work, but I feel it's only half of the whole story, because both 1K and 4M lines of code seem equally inaccurate as estimates of the appropriate LoC for this project.

It seems that most dependencies of cargo-watch are pulled in by three direct requirements: clap, cargo_metadata and watchexec. Clap pulls in lots of CLI things that are naturally platform-dependent, while cargo_metadata will surely pull in most of the serde machinery. Watchexec does have room for improvement though, because it depends on command-group (maintained in the same org), which unconditionally requires Tokio! Who would have expected that? Once watchexec is improved on that front, however, I think these requirements are indeed necessary for the project's goal, and any further dependency removal will probably come with some downsides.

A bigger problem here is that you can't easily fix other crates' excessive dependencies. Watchexec can surely be improved, but what if other crates are stuck on an older version of watchexec? There are some cases where you can just tweak Cargo.lock to get things aligned, but generally you can't do that. You have to live with excessive and/or duplicate dependencies (not a huge problem by itself, so it's the default for most people) or work around it with `[patch]` sections. (Cargo is actually in a better shape given that the second option is even possible at all!) In my opinion there should be some easy way to define a "stand-in" for a given version of a crate, so that such dependency issues can be worked around more systematically. But any such solution would be a huge research problem for any existing package manager.
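For readers who haven't used it, the `[patch]` mechanism looks like this in the top-level manifest; the path and the fork are illustrative:

```toml
# Workspace Cargo.toml. Every edge in the dependency graph that points at
# the crates.io `watchexec` gets redirected to this local checkout, e.g.
# a hypothetical fork with the unconditional Tokio requirement removed.
[patch.crates-io]
watchexec = { path = "../watchexec-local-fork" }
```

Note that `[patch]` only swaps the source for a semver-compatible version; it cannot unify two incompatible major versions, which is exactly the gap a shareable "stand-in" would have to fill.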


It's frustrating because the grand-daddy of build systems with automatic transitive dependency management -- Maven -- already had tools from day one to handle this kind of thing through excluded dependencies (a blunt instrument, but sometimes necessary). In my experience, [patch] doesn't cut it or compare.

That, and the maven repository is moderated. Unlike crates.io.

Crates.io is a real problem. No namespaces, basically unmoderated, tons of abandoned stuff. Version hell like you're talking about.

I have a hard time taking it at all seriously as a professional tool. And it's only going to get worse.

If I were starting a Rust project from scratch inside a commercial company at this point, I'd use Bazel or Buck or GN/Ninja and vendored dependencies. No Cargo, no crates.io.


> In my experience, [patch] doesn't cut it or compare.

AFAIK what Maven does is an exclusion of dependency edges, which is technically an unsafe thing to do. Cargo [patch] is a replacement of dependency vertices without affecting any edges. (Maven surely has a plugin to do that, but it's not built-in.) They are different things to start with.
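For contrast, Maven's built-in edge exclusion looks like this (coordinates are illustrative). The exclusion removes the edge but leaves the library compiled against the excluded API, which is why it is unsafe unless something compatible is supplied another way:

```xml
<!-- pom.xml fragment, illustrative coordinates. Drops the edge from
     some-lib to old-zlib without replacing old-zlib with anything. -->
<dependency>
  <groupId>com.example</groupId>
  <artifactId>some-lib</artifactId>
  <version>1.0</version>
  <exclusions>
    <exclusion>
      <groupId>com.example</groupId>
      <artifactId>old-zlib</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```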

Also, I believe that the edge exclusion as done by Maven is (not just "technically", but) really unsafe, and only supported due to the lack of better alternatives. Edges are conceptually dependent on the incoming vertex, so it should be that vertex's responsibility to override problematic edges. An arbitrary removal of edges (or vertices) is much harder to track, and many other systems have related pains from that.

What I'm proposing here is therefore an extension of Cargo's vertex replacement: you should be able to share such replacements so that they can be dealt with systematically. If my transitive dependencies contain some crate X with two different versions 1.0 and 2.0 (say), I should be able to write an adapter from 2.0 to 1.0 or vice versa, and ideally such an adapter should be available from the crate author or from the community. I don't think Maven ever tried any such systematic solution.

> That, and the maven repository is moderated. Unlike crates.io.

Only the central repository is moderated by Maven. Maven is not much better than Cargo once you have more remote repositories.

> Crates.io is a real problem. No namespaces, basically unmoderated, tons of abandoned stuff. Version hell like you're talking about.

Namespaces are not a solution for name squatting: a namespace is just yet another identifier that can be squatted. If you are worried about squatting, the only effective solution is sandboxing; everything else is just moving the goalposts.

The very existence of remote repositories also means that you can't always moderate all the stuff and get rid of abandoned stuff. You have to trust repositories, just like you have to trust crates with crates.io today.


A small fix: Maven Central is not moderated by Maven developers themselves but by Sonatype [1].

[1] https://www.sonatype.com/blog/the-history-of-maven-central-a...


namespaces are a solution to having "authenticated groups of crates", it helps structuring and more importantly restructuring crates.

That's an intention, not the outcome. You might assume that having both `@chrono/chrono` and `@chrono/tz` shows a clear connection between them, but that connection has nothing to do with namespaces (the actual crate names are `chrono` and `chrono-tz`), and any authority provided by the `@chrono/` prefix is offset by the availability of similar names like `@chrno/tz` or `chrono-tz`. The only thing a namespace can prevent is names starting with the exact prefix `@chrono/`, and that's not enough.

"... any authority provided by [a] prefix is offset by the availability of [similar prefixes]"

I'm not buying this, sorry. Yes, typos and other deceptive things are possible, but having this authority data would allow tools to then use this signal. Not having it seems strictly worse.


> Namespace is not a solution for name squatting: namespace is just yet another identifier that can be squatted. If you are worried about squatting, the only effective solution is sandboxing, everything else is just moving the goal post.

The problems crates.io struggles with have never been an issue with Maven, regardless of how creatively you try to redefine words.

That's a fact. Deal with it.


How can you be that sure? :-) It is not even like that Maven repositories don't suffer from malicious packages with confusing names (for example, [1])...

[1] https://github.com/spring-projects/spring-ai/issues/537


That seems to be an absolute win to be honest. Not sure how you think this is helping your case.

Maven Central people nuked the artifact that may have caused confusion, and if the owners try anything like that again, it's likely their domain will be banned from publishing.


Yes, but that's not unique to Maven because virtually all software repositories have such policies. If that's about the required amount of "moderation" you claim, I don't see how Maven can even be considered better than others.

Or maybe you don't want to.

If that's the hill you want to die on, good luck.


Maybe you wanted to say that policies do not imply actual "moderation". But that is demonstrably false: there are documented cases where crates.io removed packages solely because they were malicious, and all those cases were handled as soon as possible by crates.io. So Maven Central has to do something more in order to ever be considered better than crates.io, but I have no idea what, since it accepts any well-formed uploads after all. Do elaborate on that.

Only sith speak in absolutes.

I bet most of those lines are from the generated Windows API crates. They are notoriously monstrous.

You're right, the windows crate alone contributes 2.2M. I wonder if there's a way to deal with this issue.

The exact size of the `windows` crate depends on feature flags, because parsing 2.2M lines of code is always going to be very expensive even when you immediately discard them.

The parser is shockingly fast. The slow parts come after parsing, where we process all those function definitions and structure definitions, only to end up throwing 98% of them away.

A challenging architectural problem that several of us are trying to get someone nerdsniped into: inverting the dependency tree, such that you first check what symbols exist in a large crate like windows, then go to all the crates depending on it and see what they actually consume, then go back and only compile the bits needed for those symbols.

That'd be a massive improvement to compilation time, but it's a complicated change. You'd have to either do a two-pass compilation (first to get the symbol list, then again to compile the needed symbols) or leave that instance of the compiler running and feed the list of needed symbols back into it.


Agreed, though by "parsing" I meant to include easy steps like cfg checks. In fact, cfg checks should probably be done at the same time as parsing and disabled items should be discarded as soon as possible---though I don't know whether that is already done in the current compiler, or whether it's beneficial or even possible at all.

We do some parsing "underneath" disabled cfg checks, in order to support user-friendliness features like "that function you tried to call doesn't exist, but if you enabled feature xyz then it would". But we do discard cfg-protected items before doing any subsequent heavier operations.

Enabling FAT LTO reduces the final binary size but it isn't a permanent fix.

Not permanent how?

Because the compiler ends up looking at all the functions anyway, only for the linker to discard them all.

I love and hate the Windows API crates. They're amazing in that they bring pretty much the entire modern Windows API into the language without needing to touch FFI generators yourself, but the Windows API is about as large as almost every package that comes with a desktop Linux install.

I wish crates that used Windows stuff wouldn't enable it by default.


Well, they do! What happens here is that the `windows` crates are lightly processed even when they are disabled, for the reason JoshTriplett mentioned elsewhere. Such "light" processing is negligible in practice, but technically all those lines are processed (and `Cargo.lock` will list them even when they are entirely unused), hence the overblown and extremely misleading figure.

Some amount of the risk from the "dependency jungle" situation could be alleviated by instituting a "trusted" set of crates, selected based on some popularity threshold, with a rolling-release, linux-distro-like stabilization chain graduating from "testing" to "stable". If the Rust Foundation raised more money from the large companies and hired devs to work as additional maintainers for these key crates, adding their sign-offs, it would be highly beneficial. That would be a naturally evolving and changing equivalent of an extensive standard library. Mandating at least two maintainer sign-offs for such a critical set of crates would be a good policy. Instead, the large companies that use Rust prefer to vet the crates on their own individually, duplicating the work the other companies do.

The fact that nothing has changed in the NPM and Python worlds indicates that market forces pressure the decision makers to prefer the more risky approach, which prioritizes growth and fast iteration.


vendor + linecount unfortunately doesn't represent an accurate number of what cargo-watch would actually use. It includes all platform-specific code behind compile-time toggles even though only one branch would be used at any particular time, and doesn't account for the code excluded because a feature wasn't enabled. https://doc.rust-lang.org/cargo/reference/features.html
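To make the compile-time-toggle point concrete, here is the pattern in miniature: both branches sit in the vendored source (and in a `wc -l` count), but `cfg` discards the non-matching one before any real compilation work.

```rust
// Both functions are present in the source tree, and a vendored line
// count sees both, but `cfg` strips the non-matching one before type
// checking and code generation.
#[cfg(windows)]
fn backend() -> &'static str {
    "ReadDirectoryChangesW"
}

#[cfg(unix)]
fn backend() -> &'static str {
    "inotify / kqueue"
}

fn main() {
    println!("watching via {}", backend());
}
```

Cargo features work the same way one level up: code behind a disabled feature is downloaded and vendored, but never compiled into the binary.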

whether those factors impact how you view the result of linecount is subjective

also as one of the other commenters mentioned, cargo watch does more than just file watching


Agree. That was always my major gripe with Rust: it's not batteries-included. The big selling point of Go was the batteries-included part, and I think that's really what is missing in Rust. I hope that with time more stuff can get into the Rust stdlib.

Why, concretely, does this matter?

Other than people who care about relatively obscure concerns like distro packaging, nobody is impeded in their work in any practical way by crates having a lot of transitive dependencies.


Author here. If I compile a package which has 1000 transitive dependencies written by different authors, there's ~1000 people who can execute arbitrary code on my computer, with my full user permissions. I wouldn't even know if they did.

That sounds like a massive security problem to me. All it would take is one popular crate to get hacked / bribed / taken over and we're all done for. Giving thousands of strangers the ability to run arbitrary code on my computer is a profoundly stupid risk.

Especially given it's unnecessary. 99% of crates don't need the ability to execute arbitrary syscalls. Why allow that by default?


Rust can't prevent crates from doing anything. It's not a sandbox language, and can't be made into one without losing its systems programming power and compatibility with C/C++ way of working.

There are countless obscure holes in rustc, LLVM, and linkers, because they were never meant to be a security barrier against the code they compile. This doesn't affect normal programs, because the exploits are impossible to write by accident, but they are possible to write on purpose.

---

Secondly, it's not 1000 crates from 1000 people. Rust projects tend to split themselves into dozens of micro packages. It's almost like splitting code across multiple .c files, except they're visible in Cargo. Many packages are from a few prolific authors and rust-lang members.

The risk is there, but it's not as outsized as it seems.

Maintainers of your distro do not review code they pull in for security, and the libraries you link to have their own transitive dependencies from hundreds of people, but you usually just don't see them: https://wiki.alopex.li/LetsBeRealAboutDependencies

Rust has cargo-vet and cargo-crev for vetting of dependencies. It's actually much easier to review code of small single-purpose packages.


There’s two different attack surfaces - compile time and runtime.

For compile time, there’s a big difference between needing the attacker to exploit the compiler vs literally just using the standard API (both in terms of difficulty of implementation and the ease of spotting what should look like fairly weird code). And there’s a big difference between runtime Rust and compile-time Rust: there’s no reason cargo can’t sandbox build.rs execution (not what josephg brought up, but honestly my bigger concern).

There is a legitimate risk of runtime supply chain attacks, and I don’t see why you wouldn’t want facilities within Rust to help you contractually enforce what code is and isn’t able to do when you invoke it, as a way to enforce a top-level audit. Even though Rust today doesn’t support it, that doesn’t make it a bad idea or one that can’t be elegantly integrated into today’s Rust.


I agree there's a value in forcing exploits to be weirder and more complex, since that helps spotting them in code reviews.

But beyond that, if you don't review the code, then the rest matters very little. Sandboxed build.rs can still inject code that will escape as soon as you test your code (I don't believe people are diligent enough to always strictly isolate these environments despite the inconvenience). It can attack the linker, and people don't even file CVEs for linkers, because they're expected to get only trusted inputs.

Static access permissions per dependency are generally insufficient, because an untrusted dependency is very likely to find some gadget to use by combining trusted deps, e.g. use trusted serde to deserialize some other trusted type that will do I/O, and such indirection is very hard to stop without having fully capability-based sandbox. But in Rust there's no VM to mediate access between modules or the OS, and isolation purely at the source code level is evidently impossible to get right given the complexity of the type system, and LLVM's love for undefined behavior. The soundness holes are documented all over rustc and LLVM bug trackers, including some WONTFIXes. LLVM cares about performance and compatibility first, including concerns of non-Rust languages. "Just don't write weirdly broken code that insists on hitting a paradox in the optimizer" is a valid answer for LLVM where it was never designed to be a security barrier against code that is both untrusted and expected to have maximum performance and direct low-level hardware access at the same time.

And that's just for sandbox escapes. Malware in deps can do damage in the program without crossing any barriers. Anything auth-adjacent can let an attacker in. Parsers and serializers can manipulate data. Any data structure or string library could inject malicious data that will cross the boundaries and e.g. alter file paths or cause XSS.


> the exploits are impossible to write by accident, but they are possible to write on purpose.

Can you give some examples? What ways are there to write safe rust code & do nasty things, affecting other parts of the binary?

Is there any reason bugs like this in LLVM / rustc couldn't simply be fixed as they're found?


https://github.com/Speykious/cve-rs

They can be fixed, but as always, there’s a lot of work to do. The bug that the above package relies on has never been seen in the wild, only from handcrafted code to invoke it, and so is less of a priority than other things.

And some fixes are harder than others. If a fix is going to be a lot of work, but is very obscure, it’s likely to exist for a long time.


Yes, true. But as others have said, there’s probably still some value in making authors of malicious code jump through hoops, even if it will take some time to fix all these bugs.

And the bugs should simply get fixed.


Are there any attempts to address this at the package management level (not a cargo-specific question)? My first thought is that the package could declare in its config file the "scope" of access that it needs, but even then I'm sure this could be abused or has limitations.

Seems like awareness of this threat vector is becoming more widespread, but I don't hear much discussion trickling through the grapevine re: solutions.


Not that I know of - hence talking about it in this blog post!

Package scope is typically too coarse - a package might export multiple different pieces of related functionality and you’d want to be able to use the “safe” parts you audited (eg no fs access) and never call the “dangerous” ones.

The harder bit is annotating things: while you can protect against std::fs, it’s likely harder to guarantee that malicious code doesn’t just call syscalls directly via assembly. There are too many possible escapes, which is why I suspect no one has particularly championed this idea.


> it’s likely harder to guarantee that malicious code doesn’t just call syscalls directly via assembly.

Hence the requirement to also limit / ban `unsafe` in untrusted code. I mean, if you can poke raw memory, the game is up. But most utility crates don't need unsafe code.

> Package scope is typically too coarse - a package might export multiple different pieces of related functionality and you’d want to be able to use the “safe” parts you audited

Yeah; I'm imagining a combination of "I give these permissions to this package" in Cargo.toml. And then at compile time, the compiler only checks the call tree of the functions I actually call. It's fine if a crate has utility methods that access std::fs, so long as they're never actually called by my program.
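A per-package grant might look something like the fragment below. To be clear, this is entirely hypothetical syntax: no such Cargo feature exists today, and the crate names other than serde are made up for illustration.

```toml
# Hypothetical: Cargo has no `permissions` key today.
[dependencies]
serde = { version = "1", permissions = [] }                  # pure computation only
http-client = { version = "0.9", permissions = ["net"] }     # made-up crate
config-loader = { version = "2", permissions = ["fs:read"] } # made-up crate
```

The build would then reject any call path from your code into a dependency whose transitive requirements exceed the declared grants.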


> Hence the requirement to also limit / ban `unsafe` in untrusted code

I think you’d be surprised by how much code has a transitive unsafe somewhere in the call chain. For example, RefCell and Mutex need unsafe, and I think you’d agree those are “safe constructs” that you would want available to “utility” code that shouldn’t have filesystem access. So now you have to go and re-enable constructs that use unsafe but should be allowed anyway. It’s a massively difficult undertaking.

Having easier runtime mechanisms for dropping filesystem permissions would definitely be better. Something like: you are required to do filesystem access through an ownership token that determines what you can access, and you can specify a “none” token for most code and even do a dynamic downgrade. There are some such facilities on Linux, but they’re quite primitive: they’re process-wide, and once dropped you can never regain the permission. That’s why the model is to isolate the different parts into separate processes, since that’s how OSes scope permissions, but it’s super hard and a lot of boilerplate to do something that feels like it should be easy.


> I think you’d be surprised by how much code has a transitive unsafe somewhere in the call chain. For example, RefCell and Mutex would need unsafe and I think you’d agree those are “safe constructs” that you would want available to “utility” code that shouldn’t have filesystem access. So now you have to go and re-enable constructs that use unsafe that should be allowed anyway. It’s a massively difficult undertaking.

RefCell and Mutex have safe wrappers. If you stick to the safe APIs of those types, it should be impossible to read / write to arbitrary memory.

I think we just don't want untrusted code itself using unsafe. We could easily allow a way to whitelist trusted crates, even when they appear deep in the call tree. This would also be useful for things like tokio, and maybe pin_project and others.


Because for a lot of companies, especially ones in the industries Rust is supposedly hoping to displace C and C++ in, dependencies are a much larger concern than memory safety. They slow down velocity way more than running massive amounts of static and dynamic analysis tools to detect memory issues does in C. Every dependency needs explicit approval. And frankly, most crates would never receive that approval, given the typical quality of a lot of the small utility crates and other transitive dependencies. Not to mention, the number and size of transitive dependencies in a lot of popular crates makes them functionally unauditable.

This more than any other issue is I think what prevents Rust adoption outside of more liberal w.r.t dependencies companies in big tech and web parts of the economy.

This is actually one positive in my view behind the rather unwieldy process of using dependencies and building C/C++ projects. There's a much bigger culture of care and minimalism w.r.t. choosing to take on a dependency in open source projects.

Fwiw, the capabilities feature described in the post would go a very long way towards alleviating this issue.


Those companies can just ban using new rust dependencies, if they want to. Writing with minimal dependencies is just as easy in rust as it is in c++

You can't "just ban" new dependencies in an ecosystem where they are so pervasive; the ban becomes a roadblock to progress in no time.

Sorry, I have a problem with the word "just" in tech.


This appears to be implying that rolling your own libraries from scratch is not a roadblock in C and C++, but somehow would be in Rust. That's a double standard.

Rust makes it easy to use third-party dependencies, and if you don't want to use third-party dependencies, then you're no worse off than in C.


Or, you know, leverage Go/.NET/JVM standard libraries for 99.999% of software and get shit done because there's more to memory safe solutions than just Rust.

Not to mention C/C++ dependency situation is a low bar to clear.


If you can tolerate a garbage collector and interpreter overhead, sure. Rust's main niche is things that would have formerly been written in C++.

Indeed, you wouldn't really be participating in the "rust ecosystem" at that point. I'm not disputing that it'd be a lot more difficult. The experience would be similar to using C++.

No one is forcing you to use a dependency. Write the code yourself just like you would in another language. Or vendor the dependency and re-write/delete/whatever the code you don't like.
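To make "write the code yourself" concrete: a bare-bones change detector needs only the standard library if you accept polling. This is a sketch under that assumption; real watchers such as notify use OS facilities (inotify, FSEvents) rather than polling, and polling does not replicate those.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Record the modification time of every file directly under `dir`.
fn snapshot(dir: &Path) -> HashMap<PathBuf, SystemTime> {
    let mut map = HashMap::new();
    if let Ok(entries) = fs::read_dir(dir) {
        for entry in entries.flatten() {
            if let Ok(meta) = entry.metadata() {
                if let Ok(mtime) = meta.modified() {
                    map.insert(entry.path(), mtime);
                }
            }
        }
    }
    map
}

/// Paths that are new, or whose mtime differs, between two snapshots.
fn changed(old: &HashMap<PathBuf, SystemTime>, new: &HashMap<PathBuf, SystemTime>) -> Vec<PathBuf> {
    let mut out = Vec::new();
    for (path, mtime) in new {
        if old.get(path) != Some(mtime) {
            out.push(path.clone());
        }
    }
    out
}

fn main() {
    // One poll cycle against a scratch directory; a real tool would loop
    // with a sleep and re-run the compiler on every non-empty diff.
    let dir = std::env::temp_dir().join("watch_demo");
    fs::create_dir_all(&dir).unwrap();
    let before = snapshot(&dir);
    fs::write(dir.join("touched.txt"), "hello").unwrap();
    for path in changed(&before, &snapshot(&dir)) {
        println!("changed: {}", path.display());
    }
}
```

This deliberately punts on recursion, deletions, debouncing, and mtime granularity, which are exactly the edge cases that make crates like notify worth depending on.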

Sorry, but down here on Earth, not having unlimited resources and time does force us to use dependencies if we want to get things done.

The line has to be drawn somewhere. And that line is much more reasonable when you can trust large, trillion-dollar-backed standard libraries from the likes of Go or .NET, in contrast to a fragmented ecosystem from other languages.

What good is vendoring 4 million lines of code if I have to review them anyway at least once? I'd rather have a strong MSFT/GOOGL standard library which I can rely upon and not have to audit, thank you very much.


I disagree. I think avoiding dependencies is partly how we end up with codebases like Chromium's, where you can't easily separate out the functionality you want and consume it as a library. That, to me, isn't minimalism.

Do C++ codebases with similar feature parity somehow require less code?

There's probably a similar amount of code in the execution path, but the Rust ecosystem reliance on dependencies means that you're pulling in vast amounts of code that doesn't make it to your final application.

A C++ library author is much more likely to just implement a small feature themselves rather than look for another 3rd party library for it. Adding dependencies to your library is a more involved and manual process, so most authors would do it very selectively.

That said, a C++ library might depend on Boost and its 14 million LOC. Obviously it's not all being included in the final binary.


> A C++ library author is much more likely to just implement a small feature themselves rather than look for another 3rd party library for it.

This is conflating Javascript and Rust. Unlike Javascript, Rust does not have a culture of "microdependencies". Crates that get pulled in tend to be providing quite a bit more than "just a small feature", and reimplementing them from scratch every time would be needlessly redundant and result in worse code overall.


My comment had nothing to do with Javascript.

Rust may not have "left pad" type micro-dependencies, but it definitely has a dependency culture. Looking at `cargo tree` for the medium size project I'm working on, the deepest dependency branch goes to 12 layers deep. There's obviously a lot of duplicates - most dependency trees eventually end with the same few common libraries - but it does require work to audit those and understand their risk profile and code quality.


>Rust does not have a culture of "microdependencies"

It absolutely does by C/C++ standards. Last time I checked, the zed editor had 1000+ dependencies. Running 'cargo supply-chain' on that many crates usually turns up at least 300-400 separately maintained projects. This is an absurd number.


Yes and by orders of magnitude.

There's been many massive supply chain attacks happening.

And people are still calling it "obscure concerns"...


Same problem with JavaScript's NPM. And Python's PIP.

This isn't necessarily a language problem, though, more of a "culture" problem, I think.

I write in Clojure and I take great pains to avoid introducing dependencies. Contrary to the popular mantra, I will sometimes implement functionality instead of using a library, when the functionality is simple, or when the intersection area with the application is large (e.g. the library doesn't bring as many benefits as just using a "black box"). I will work to reduce my dependencies, and I will also carefully check if a library isn't just simple "glue code" (for example, for underlying Java functionality).

This approach can be used with any language, it just needs to be pervasive in the culture.


> This isn't necessarily a language problem, though, more of a "culture" problem, I think.

Author here. We could make it a language problem by having the language sandbox dependencies by default. Seems like an easy win to me. Technical solutions are almost always easier to implement than social solutions.


Edit: replied to wrong person.

Huh?

> It's throwing the baby and bathwater into lava.

Is it really so controversial to want to be able to limit the access that utility crates like humansize or serde have to make arbitrary syscalls on my computer?

Seems to me like we could get pretty far with just compile-time checks - and that would have no impact whatsoever on the compiled code (or its performance).

I don't understand your criticism.


I thought you wanted to prevent transitive dependencies. For sandboxing crates, as JoshTriplett said it's another can of worms.

By default, yes. But it probably makes sense to let people whitelist specific crates in their dependency tree. Crates like std and tokio, or blas libraries that make heavy use of simd. Stuff like that.
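For illustration only, a whitelist like that could look something like this in a manifest. This is hypothetical syntax that neither Cargo nor Rust supports today; the capability names are made up:

```toml
# Hypothetical: deny all ambient capabilities by default,
# then grant them per-crate across the whole dependency tree.
[capabilities]
default = []

[capabilities.grants]
tokio = ["net", "fs", "unsafe"]   # async runtime legitimately needs syscalls and unsafe
serde = []                        # pure data transformation: nothing granted
humansize = []                    # string formatting: nothing granted
```

The interesting policy question is the one raised above: whether grants apply only to direct dependencies or are inherited transitively.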

I think this is made easier by Clojure's macro capabilities. In general, if you have powerful metaprogramming tools, you trade dependency complexity for peace of mind (I still have flashbacks of C++ templates when I talk about metaprogramming :/. Does this qualify as PTSD?).

Maybe they can learn from the Javascript folks, I heard they're very good at this.

I think the interaction between the two communities is exactly the reason for the current state.

Not sure if you're serious and talking about tree-shaking - or joking and talking about left-pad.

No, they are the worst perpetrators re dependency hell.

The Javascript folks are at least aware and self critical of this. In the Rust community it's sold as a great idea.

Yes, unironically, they are now.

Node has improved greatly in the last two years. It always had native JSON support; now it has a native test runner, watch mode, fetch, and WebSockets, and is working on a permission system à la Deno and a native SQLite driver. All of this makes it a really attractive platform for prototyping, scaling from a dependency-free hello world to production.

Good luck experimenting with Rust without pulling half the internet with it.

E: and they’re working on native TS support


> without any dependencies

Nah, you still have those dependencies, they're just integrated in your interpreter. That has advantages (you're now only trusting a single source) and disadvantages (you always get all the goodies and the associated risks with that, even if you don't need them).


You’re being pedantic for the sake of being pedantic.

Another example is Axum. Using Go, C#, Deno or Node, you don't even need any third-party, more-or-less secure and maintained lib. It all comes from the core teams.

The friction in the C and C++ library ecosystem is sometimes a feature for this sole reason. Many libraries try to pull in as little as possible and make other things optional.

Why do you care how many lines of code the dependencies are? Compile time? Lack of disk space?

Think of the problem as a bill of materials. Knowing the origin and that all the components of a part are fit for purpose is important for some applications.

If I am making a small greenhouse, I can buy steel profiles and not care what steel they are made from. If I am building a house, I actually want a specific standardized profile, because my structural calculations rely on it; my house will collapse if the profiles don't meet it. If I am building a jet engine part, I want a specific alloy with all the component metals and foundry details, and I will reject the part if the provenance is not known or suitable[1].

If I am writing a small script for personal purposes, I don't care much about packaging and libraries, just that it accomplishes my immediate task in my environment. If I have a small Tetris application, I also don't care much about libraries or their reliability. If I have a business selling my application and I am liable for its performance and security, I damn sure want to know all about my potential liabilities and mitigate them.

[1] https://www.usatoday.com/story/travel/airline-news/2024/06/1...


Security and maintenance. That's what's so compelling about Go. The std lib is not a pleasure to use, nor especially fast or featureful. But you can rely on it. You don't depend on 1000 strangers on the internet who might have abandoned their Rust crate 3 years ago without anybody noticing.

Some of us like to understand what's happening in the software we work on, and don't appreciate unnecessary complexity or unknown paths in the codebase that come through third party transitive dependencies.

Some of us have licensing restrictions we have to adhere to.

Some of us are very concerned about security and the potential problems of unaudited or unmoderated code that comes in through a long dependency chain.

Hard learned lessons through years of dealing with this kind of thing: good software projects try to minimize the size of their impact crater.


This is the main reason we have banned Rust across my Org. Every third party library needs to be audited before being introduced as a vendored dependency which is not easy to do with the bloated dependency chains that Cargo promotes.

The dependency hell issue is not directly related to Rust. The Rust language can be used without any dependencies. Have you banned JavaScript and Python too?

And in a similar vein, have they audited the runtimes of all the languages they use? Those are dependencies too, and in many ways even more critical than libraries.

Good on you, this approach will keep you employed for a looooooooong time, because someone has to write all that code then, right? ;)

TBH, I have adjusted my programming recently to write more stuff myself instead of finding a library. It's not that bad. I think ChatGPT is really good at those types of questions, since it can analyze multiple implementations from GitHub and give you an answer averaging them together.

Also, if you have a really well-defined problem, it's easy to just whip out 10-50 lines to solve the issue and be done with it.
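As a concrete instance of the 10-50 line case: a human-readable byte formatter (the kind of utility the humansize crate gets pulled in for, as mentioned elsewhere in this thread) fits in a dozen lines of std-only Rust. A sketch, not a drop-in replacement for any particular crate:

```rust
/// Format a byte count with binary units, e.g. 1536 -> "1.5 KiB".
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

fn main() {
    assert_eq!(human_size(0), "0 B");
    assert_eq!(human_size(1536), "1.5 KiB");
    assert_eq!(human_size(1 << 20), "1.0 MiB");
}
```

This is also the trade-off the thread is debating: it handles exactly these cases and nothing more (no SI units, no locale, no configurable precision), which is either the point or the problem.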


And that's how you end up with solutions that don't handle edge cases.

Anything worth writing isn't covered by ChatGPT.

Why ban Rust instead of just banning Cargo?

It's entirely possible to use Rust with other build systems, with vendored dependencies.

Crates.io is a blight. But the language is fine.


How do you solve this for other languages you use?

I've seen this approach go a long way with languages that have a large standard library. Go and C#/.NET come to mind.

Our main languages are Go and OCaml. We can leverage third-party libraries without easily running into transitive dependency hell, as there's an implicit understanding in these communities that a large number of dependencies is not a good thing. Or, expressed differently, there is coarser granularity in what ends up being a library. This is not the case with Cargo, which has decided to follow the NPM approach.

At least in my experience, Go packages and Rust crates are much coarser than NPM packages. (Look at the actual direct and indirect dependencies of cargo-watch to judge for yourself.) I think Go prefers, and actually has the resources to sustain, a mostly centralized approach, while Rust crates are heavily distributed and it takes longer for the majority to settle on a single solution.

I'm sorry, but that feels like an incredibly poorly informed decision.

One thing is to decide to vendor everything - that's your prerogative - but it's very likely that pulling everything in also pulls in tons of stuff that you aren't using, because recursively vendoring dependencies means you are also pulling in dev-dependencies, optional dependencies (including default-off features), and so on.

For the things you do use, is it the number of crates that is the problem, or the amount of code? Because if the alternative is to develop it in-house, then...

The alternative here is to include a lot of things in the standard library that don't belong there, because people seem to exclude standard libraries from their auditing, which is reasonable. Why is it not just as reasonable to exclude certain widespread ecosystem crates from auditing?


> One thing is to decide to vendor everything - that's your prerogative - but it's very likely that pulling everything in also pulls in tons of stuff that you aren't using, because recursively vendoring dependencies means you are also pulling in dev-dependencies, optional dependencies (including default-off features), and so on.

What you're describing is a problem with how Cargo does vendoring, and yes, it's awful. It should not be called vendoring, it is just "local mirroring", which is not the same thing.

But Rust can work just fine without Cargo or Crates.io.


This is what lockfiles are for.

> It turns out, the deps add up to almost 4 million lines of Rust code, spread across 8000+ files

(Putting aside the question of whether or not that pulls in dev dependencies; that watching files can easily have OS-specific aspects, so you might have different dependencies on different OSes; that neither lines nor, even less, files are a good measurement of complexity; and that these dependencies include a lot of code for features that aren't used, which, since Rust is compiled sensibly, is reliably excluded from the final binary in most cases. Also ignoring that cargo-watch doesn't implement file watching itself: it is in many ways a wrapper around watchexec, which makes it much "thinner" than it would be otherwise.)

What if that is needed for a reliable robust ecosystem?

I mean, I know, it sounds absurd, but give it some thought.

I wouldn't want every library to reinvent the wheel again and again for all kinds of things, so I would want them to use dependencies, and I would also want them to use robust, tested, mature and maintained dependencies. Naturally this applies transitively. But which libraries become "robust, tested, mature and maintained": those that provide just a small, good-enough-for-you subset of some functionality, or those that support the full functionality, making them usable for a wider range of use cases?

And with that in mind let's look at cargo-watch.

First, it's a CLI tool, so with the points above in mind you need a good CLI parser, so you use e.g. clap. But at this point you are already pulling in a _huge_ number of lines of code, the majority of which will be dead-code-eliminated. You don't have much choice, though: you don't want to reinvent the wheel, and for a CLI library to be widely successful (often a prerequisite for it being long-term tested, maintained, and e.g. forked if the maintainers disappear), it needs to cover all widely needed CLI features, not just the subset you use.

Then you need to handle configs, so you include dotenvy. You have a desktop-notification feature; again, no reason to reinvent that, so you pull in rust-notify. Handling paths in a cross-platform manner has tricky edge cases, so camino and shell-escape get pulled in. You log warnings, so log + stderrlog get pulled in, which for message coloring and similar pull in atty and termcolor, even though you probably need just a small subset of atty. But again, no reason to reinvent the wheel, especially for things as iffy/bug-prone as reliable tty handling across many different terminals. Lastly, watching files is harder than it seems, and the notify library already implements it, so we use that. Wait, it's quite low-level, and there is watchexec, which provides exactly the interface we need, so we use that instead (and if we didn't, we would still use most or all of watchexec's dependencies).

And ignoring watchexec (around which the discussion would become more complex): with the standards above, you wouldn't want to reimplement the functionality of any of these libraries yourself. It's not even about implementation effort, but about things like overlooked edge cases, maintainability, etc.

And while you can definitely make the point that in some respects you can, and maybe should, reduce some dependencies, this doesn't IMHO change the general conclusion: you need most of these dependencies if you want to conform to the standards pointed out above.

And tbh, I have seen way, way, way too many cases of projects shaving off dependencies, adding "more compact wheel reinventions" for their subset, and then running into all kinds of bugs half a year later. Sometimes the partial reimplementations keep growing until they aren't much smaller than the original project.

Don't get me wrong, there definitely are cases where (the things you use from) a dependency are too small to make it worth it (e.g. left-pad), or, more commonly, where it takes more time (short term) to find and review a good library than to reimplement the functionality yourself (though long term that's quite often a bad idea).

So I don't know that the issue is transitive dependencies, or too many dependencies at all.

BUT I think there are issues w.r.t. handling software supply chain aspects. That is a different kind of problem with different solutions. And sure, not having dependencies avoids that problem, somewhat, but IMHO it just replaces it with a different, equally bad problem.
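One mitigation that does exist today: Cargo features let a consumer opt out of unused functionality instead of dropping a dependency entirely. The `default-features = false` syntax below is real Cargo; the specific feature selections are just illustrative and should be checked against each crate's documentation:

```toml
[dependencies]
# Pull in only the runtime and macros instead of tokio's full feature set.
tokio = { version = "1", default-features = false, features = ["rt", "macros"] }
# clap without its default color/suggestion extras.
clap = { version = "4", default-features = false, features = ["std", "help"] }
```

This shrinks what gets compiled and audited, though the source of every transitive crate still lands in the lockfile and vendor directory.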


What do you propose? To include it as part of std? Are you insane? That would bloat your binaries! (Still don’t understand how the smart compiler isn’t smart enough to remove dead code) And imagine if there’s an update that makes cargo-watch not BlAzInGlY fAsT™ but uLtRa BlAzInGlY fAsT™? /s

How does Go compare?

I'm curious as I don't know Go but it often gets mentioned here on HN as very lightweight.

(A quick googling finds https://pkg.go.dev/search?q=watch which makes me think that it's not any different?)


https://pkg.go.dev/std

They’re much better.


I recall being very surprised to hear that the Go standard library has extensive cryptographic functionality. Generally that would be very unwise, because it becomes much harder to change or remove in the face of security issues. Turns out this particular portion is maintained by dedicated maintainers who are actually trained in cryptography and security---something almost no other language could manage with its resources.

You fuck around...


