I like this idea, but I don't know if Hyper is the best package to go with.
Hyper occupies part of the Rust ecosystem that I think suffers from package
bloat, like much of NPM. For example, currently Hyper requires 52 packages:
Part of this is just crates being broken up more in Rust. For example the `http` crate only contains trait (interface) definitions. They break down like so:
If those platform crates are just backends for libc (or similar to libc), why aren't they all folded into a single project? Having them as separate crates opens the gate for supply-chain attacks without really allowing greater control or expressiveness. You are always going to pull all of them in, you are unlikely to ever use them directly, and they have no more impact on compilation than features.
Having to use proc-macro2+quote+syn for macros feels wrong, considering it's a language feature.
Having to pull in 4 crates for pinning also seems wrong. This could easily be a single utility crate, and if you really need all this cruft to use Pin (a language feature) most of this should probably be in std.
The async I/O feels like definite bloat. Not only is that a lot of futures-* crates, but I know from first-hand experience that those tend to implement multiple versions of some primitives (like streams) that are incompatible.
> If those platform crates are just backends for libc (or similar to libc),
They're not; they're bindings to the specific platforms' APIs. The libc crate is a package that targets libc on every platform.
> Having to use proc-macro2+quote+syn for macros feels wrong, considering it's a language feature.
In order to ship procedural macros as a feature, they're pretty minimal. There's a tradeoff here; others could be chosen, but weren't for good reasons.
> Having to pull in 4 crates for pinning also seems wrong.
Really depends; network calls are a textbook use case for async I/O. And there are so many futures-* crates specifically so that you can depend on only the bits you need.
I imagine that it's there because it is older than that happening. HashBrown's README says
> Since Rust 1.36, this is now the HashMap implementation for the Rust standard library. However you may still want to use this crate instead since it works in environments without std, such as embedded systems and kernels.
I don't know if that use case is important to Hyper or not.
Not depending on OpenSSL is kind of the point of the original post, if I understand it correctly.
Also, I find the dependency on OpenSSL a major pain in my Rust projects. When you want to build a statically linked binary you need to supply a statically built OpenSSL, and if your distro doesn't come with one (like Ubuntu) you are on your own. Yes, there is a Docker container that comes with all the prerequisites, but I think that's a bit heavy for my purposes.
I wish there was a single switch in Cargo.toml and every dependency would automagically use rustls.
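As far as I know there's no global switch, but many crates do expose a per-dependency feature for it; roughly (reqwest is just an example here, and the exact feature name varies by crate):

```toml
[dependencies]
# Opt out of the default (native-tls/OpenSSL) backend and opt into rustls instead.
# The feature name differs from crate to crate; this is how reqwest spells it.
reqwest = { version = "0.10", default-features = false, features = ["rustls-tls"] }
```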
Containers are becoming the way to build stuff, partly for that reason.
I think any dependency adds a level of burden, but some things are better delegated to a library. I think crypto is a good case; btw, OpenSSL is not the only TLS lib curl can use.
> Containers are becoming the way to build stuff, partly for that reason.
This sits wrong with me, but thinking about it: I rewrote the sentence about containers in my comment three times before posting it and it still doesn't sound compelling. Maybe you have a point here.
So this also gets you different behaviour, which may or may not be what you want, depending.
Specifically if you use SChannel, you get the CA roots from Microsoft's CA Root programme, whereas ordinarily you'll end up with (some derivative of) the Mozilla CA root programme.
You also get the local policy root overrides. So for example in many corporate networks with a middlebox ensuring employees don't look at porn, the middlebox is trusted according to Windows Group Policy. Now your Curl program works the same way as Internet Explorer does, if the site is trusted in IE then it's trusted in Curl.
On the other hand, this means that the SChannel enabled Curl trusts different things from the Curl on platforms with OpenSSL. Maybe this new setup works "fine" in SChannel Curl, but only when you try from a Linux do you discover that your new site doesn't work at all any more without Microsoft's trust list, which explains the thousands of new tickets filed by (mostly Linux using) customers whose product just mysteriously broke even though it looked fine on your Windows test machine and you've just closed a dozen of those tickets as WORKSFORME...
You could argue the same thing for many big monolithic C projects though. How many of the original authors/maintainers are left in OpenSSL or the Linux kernel?
My main worry about Rust dependencies is not so much the number, it's that it's still a fairly young ecosystem that hasn't stabilized yet, packages come and go fairly quickly even for relatively basic features. For instance for a long time lazy_static (which is one of the dependencies listed here) was the de-facto standard way of dealing with global data that needed an initializer. Apparently things are changing though, I've seen many people recommend once_cell over it (I haven't had the opportunity to try it yet).
Things like tokio are also moving pretty fast, I wouldn't be surprised if something else took over in the not-so-far future.
It's like that even for basic things: a couple of years ago, for command line parsing in an app, I used "argparse". It did the job. Last week I had to implement argument parsing for a new app; at first I thought about copy/pasting my previous code, but I went on crates.io and noticed that argparse hadn't been updated in 2 years and apparently the "go to" argument parsing lib was now "clap". So I used clap instead. Will it still be used and maintained two years from now? Who knows.
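For anyone curious, a minimal clap setup looks roughly like this (2.x-era builder API, written from memory, so details may be off):

```rust
use clap::{App, Arg};

fn main() {
    // Declare the CLI surface up front; clap generates --help and error messages.
    let matches = App::new("myapp")
        .version("0.1")
        .arg(Arg::with_name("verbose").short("v").long("verbose"))
        .arg(Arg::with_name("INPUT").help("file to process").required(true))
        .get_matches();

    let input = matches.value_of("INPUT").unwrap();
    println!("input = {}, verbose = {}", input, matches.is_present("verbose"));
}
```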
I switched ripgrep to clap 4 years ago. And that was well after clap had already become the popular "go to" solution.
Some parts of the ecosystem are more stable than others. That's true. And it takes work to know which things are stable and which aren't.
And yet, some things just take a longer time to improve. lazy_static has been stable and unchanged for a very long time and it works just fine. You don't need to switch to once_cell if you don't want to. lazy_static isn't going anywhere. The real change here, I think, is that we're hoping to get a portion of once_cell into std so that you don't need a dependency at all for it.
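For what it's worth, the two look almost identical in use; a rough sketch from memory:

```rust
use std::collections::HashMap;

use lazy_static::lazy_static;
use once_cell::sync::Lazy;

// lazy_static does it with a macro...
lazy_static! {
    static ref OLD_WAY: HashMap<&'static str, u32> = {
        let mut m = HashMap::new();
        m.insert("answer", 42);
        m
    };
}

// ...once_cell does the same thing with an ordinary type.
static NEW_WAY: Lazy<HashMap<&'static str, u32>> = Lazy::new(|| {
    let mut m = HashMap::new();
    m.insert("answer", 42);
    m
});

fn main() {
    assert_eq!(OLD_WAY.get("answer"), NEW_WAY.get("answer"));
}
```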
The async ecosystem is definitely moving more quickly because it just hasn't had that much time to stabilize. If you're using async in Rust right now then you're probably an early adopter and you'll want to make sure you're okay with the costs that come with that.
Interesting that I missed clap when I wrote that program a few years ago then. In my defence, "argparse" is a lot more explicit than "clap" for such a library. Also, argparse's last update was 2 years ago, so there's been quite a bit of overlap.
I guess what I'm saying is that it's another problem with the current package ecosystem: you often end up finding multiple packages purporting to do what you need, and it can be tricky to figure out which one you want. As an example, if you want a simple logging backend, the log crate currently lists 6 possibilities: https://crates.io/crates/log
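To be fair, that split is deliberate: the log crate is just the facade and those 6 are interchangeable backends. A minimal sketch with env_logger as the backend (purely as an example):

```rust
use log::{info, warn};

fn main() {
    // Any of the listed backends could sit behind these macros;
    // env_logger reads the RUST_LOG environment variable for filtering.
    env_logger::init();

    info!("starting up");
    warn!("something looks odd");
}
```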
> it's another problem with the current package ecosystem: you often end up finding multiple packages purporting to do what you need, and it can be tricky to figure out which one you want
I'm trying to remember the last language I've used where people didn't say that.
Hmm... clojure? Nop.
Javascript? Nop nop nop.
Python? Hahaha I can't even remember all the package managers: virtualenv, venv, pipenv, poetry, ...
Seems like an unavoidable problem unless you buy into a curated ecosystem. Like, yeah, the cost of a decentralized ecosystem is that you have to do your due diligence on which crate to use, if any. (For example, I don't even bother with a log helper crate because it just isn't necessary for simple cases.)
Well yes if they're published as part of the same project as lots of these are. In C/C++ you wouldn't do this because consuming a library is a pain so you want to minimise the number of dependencies. In Rust, what would be 1 library in C often gets broken up into a few that are published together in order to allow people to depend on only the functionality they need.
It's also worth pointing out that once a version is published to crates.io, it can't be altered, specifically to prevent social engineering attacks. If you're worried about it, that means you can audit the frozen codebase for any given version from a top-level crate down through the dependencies, and once that trust is established, it can't be leveraged for a silent dependency change later on, which can only happen through a version update on the end-user's side.
C/C++ suffers from severe wheel reinvention due to a lack of package management, but do you see people cautioning against the use of programs written in those languages for that reason?
The STL package contains lots of things, but most of them are pretty ugly to use and quite a few of them are quite complicated to use because they are generalized for as many use cases as possible. There are so many authors out there that decide they can do memory management manually, write their own thread or process pools, implement sorting in a different way, rewrite basic algorithms for list operations like mapping, filtering, zipping, reducing...
Simply counting the number of dependencies isn't a great indicator of dependency bloat. There are extremes on both ends: no deps --> I know everything better and reimplemented the world, and thousands of deps --> I put a one-liner in a package, ma! One should not judge too quickly.
> These complaints are valid, but my argument is that they’re also not NEW, and they’re certainly not unique to Rust ... The only thing new about it is that programmers are exposed to more of the costs of it up-front.
Hm, this response essentially says "other languages have this problem too, so deal with it".
While that's true, it completely misses the point. The point is not that dependencies exist, or even that a package might have many dependencies.
The point is that, in my experience, Rust (and NPM) projects often don't care about or even consider the impact of a large number of dependencies, and take no steps to mitigate or reduce that number.
As others said, some features could be split off into other crates. Maybe someone only needs HTTP, or maybe they need HTTPS but no async. Or maybe they don't need logging. With Hyper and others you just have to build everything whether you want it or not.
When I find a project that is a handful of .c files and a Makefile, they almost always compile and run. Sometimes with warnings because the features used in the code are deprecated, but usually without too much fanfare.
Same in Rust. Actually, it's quite a bit better than in C. The only time Rust projects fail to compile is when they pull in some C library and something there (like configure.ac) messed up. :D
And if this C project does anything interesting, it pulls in a bunch of C libraries that came precompiled with your OS, and might be stale and contain unpatched security vulns.
C/C++ developers pointing at other languages about dependency hell is a curiosity.
> other languages have this problem too, so deal with it
This is the primary reason I try to avoid projects built with npm. Fucking dependency hell. If the project hasn't been actively maintained in the last 3 months your chances of getting it to work drop precipitously.
Github and npm are both graveyards filled with dead JS libraries. They make it too easy to litter the universe with sub-par orphaned software. And somehow, it's up to each individual to filter it all out. You have critical software such as React sitting next to mountains of bad nonsense code. And they are all on equal footing.
People love to trash Perl on HN. But among many other things that Perl devs understood, they deeply understood issues that come up with dependencies. Most CPAN modules are namespaced, have unit tests, and unit tests run when modules are installed. Not only that, the people behind CPAN understood that it is a community effort and you, as a library author, have certain responsibilities to your community.
> The only thing new about it is that programmers are exposed to more of the costs of it up-front.
That's funny, because it's only "new" if your experiences primarily lie in newer languages and communities. There's a lot of criticism of C, but one thing it does is make dependencies pretty explicit. Some say that's good, some say that's bad, I guess it can be both at different times.
Let's say you open up a new C codebase: What are its dependencies? You'll have to hunt through its README (hopefully it's up to date!), other build instructions, maybe CMake, maybe some custom build system, etc.
What version of the dependencies does it use? If the code has been vendored, you at least know what code it's using - but where do you look for updates to that code? Do you manually go out to wherever it was copied from now and then and look for updates? If the code isn't vendored, then how was it installed? From the package manager? If so, what operating system and version was used during development? If it's not from a package, it might have been downloaded and installed manually. Again, where was it downloaded from? Where was it installed? What options did it use when it was compiled?
How do you handle transitive dependencies? Probably by hand. How well documented are they?
But each one of these dependencies is pretty consciously and manually added, most of the time. In new code, to introduce a new one usually requires thought, and that creates a culture of caution.
Also if you are dynamic linking, ldd(1) can give you a pretty good picture.
Being small is not the question. Being there is. Otherwise we wouldn't be here discussing all of hyper's dependencies in detail, as some of these are a lot smaller than zlib.
I am late to reply. What I mean to say is that zlib is a small, self-contained dependency, suitable for static linking, with no major dependencies of its own, just math. A lot of more modern libraries and languages tend to have their small dependencies pull in a massive hydra of dependencies.
Also, I could imagine Rust's type system raises the number of dependencies that can be managed before everything breaks down. So a lib with 100 deps in NPM isn't the same as a lib with 100 deps in Cargo.
Yes, but people often throw out empty criticism without offering solutions or alternatives; it gets annoying after a while and is just a complaint rather than a healthy critique. As programmers we've heard them all before, I guarantee it, so why add another one to the pile?
I mean, you're just repeating a sibling comment, but if development has been this way for a long time, it's on the folks who are suggesting the new way to get out there and prove that it's a viable model for software development. It appears that most real-world, actually used software works like this.
I am all about improving the world, don't get me wrong, but saying "hey this software works just like all the other software" isn't really the insult that you seem to think that it is.
> saying "hey this software works just like all the other software" isn't really the insult that you seem to think that it is.
Well, it's common knowledge that most existing software is complete and utter crap, as evidenced by the fact that our first thought upon hearing that a particular piece of software is no longer being updated is not "oh good, it is (probably) finished and we can rely on it", but rather "oh no, now the innumerable defects no doubt still latent in it will remain unfixed". So "this software is just as bad as all the other software" is, while not a very grave insult in a relative sense, still quite damning in absolute terms.
It depends. If it's in a GitHub repo and there isn't a massive backlog of issues for software that hasn't been updated in a while, I might think that.
One good thing about stat counters for packages, combined with GitHub for issue tracking, is that you can kind of tell.
It does take some level of due diligence and isn't easy. But neither is anything relying on, say, system-installed libraries in C projects.
Can you name an example of:
- a significant (eg, at least as complex as wget) software project,
- that has been unmaintained (no updates, code has the same MD5/etc hash),
- with a significant userbase (not sure exactly how to define that one),
- for a significant amount of time (at least five years),
- which is generally regarded as finished and bug-free (not in need of further development) rather than abandoned?
Because I can't think of a single one, and the only ones that even come close are video games where the known bugs were co-opted into gameplay features. The general consensus seems to be that any system that doesn't have automatic updates running is de-facto insecure (which, since every update mechanism I've heard of can introduce new code (ie new security vulnerabilities), means any system whatsoever is insecure).
(I don't quite disagree with the tacit assertion that actually getting things right on - if not the first try - then at least one of the first thirty or so is an extremely, maybe even unreasonably, high standard, but it manifestly is a standard that basically all existing nontrivial software projects fail to meet.)
5 years is a relatively rough one... in terms of libraries, I come across a lot that are 2+ years old that are feature complete and work. In terms of applications, there are a couple other responses in this thread, but the specific focus in reference was really on libraries themselves, which shouldn't be as complex as wget in general.
It seems to me Rust folks have developed this habit of deflecting blame by pointing to shallow commentary. One good example is the reasons given for compiler slowness: sometimes it's LLVM, or it's the large number of optimizations, or it's not really slow compared to C++, and so on.
They could have said it straight: "Guys, a highly optimized, safe compilation of a medium-size project will be in the range of 20-30 min." That would be a great and honest way to deal with it. Instead we get "oh, we have reduced compile times from 26 minutes to 21 minutes, so it is a 19% improvement in just one year and there is more to come." Now, this is hard work and great, but I am sure Rust committers understand that when people say fast compile times they are most likely comparing to Go etc., which would be under a minute for most mid-size projects. And that is very much not going to happen.
Same now with the Cargo dependency situation. Cargo and NPM are ideologically in agreement that people should in general just pull from the package manager instead of rewriting even a little bit of code. And again, instead of owning it, there will be a list of shallow reasons: others do it too, reusing is better than rewriting, Cargo is so awesome that pulling in a crate is much easier, and so on.
No, my point is Rust fans try to win very narrow technical arguments even when they should clearly know the discussion is about the big picture. And yes, it seems like a bad thing to me.
On an internet forum when someone brings up a topic lots of people will respond with different opinions about that topic. You can act like this is somehow specific to the Rust community, but I don't think it is.
TBH you seem to have a really odd bias against Rust; you repeatedly take something about it that's positive and try to spin it as negative, possibly even going back years? Maybe examine that.
Most of these are maintained as sets of crates under the same project/maintainers. For example, everything starting with futures comes from one repo, everything starting with tokio plus mio (plus some others) are under the tokio-rs GitHub organization, all the windows bindings packages are from the same repo, etc.
Plus some of the dependencies are also dependencies of the standard library (hashbrown, cfg-if, libc).
I've recently taken a different stance on this. The issue isn't the number of dependencies, it's the reliability of said dependencies in terms of the following:
- Secure. Does this contain malicious code, exploits, or otherwise known bugs that could be fixed but aren't? This of course is hard, and will never be perfect. There are static security scans, though, especially in a language like Rust, that should be able to verify what the code does before it is published for consumption via Cargo. NPM is trying to do something similar. This isn't foolproof and more sophisticated exploits will always exist, but getting low-hanging and mid-level fruit should be within reach, which is a net win.
- Is it quality? This is beyond just secure, but does it provide real utility value? One thing we always hear is DRY, which may mean sometimes you consume a lot of dependencies, since the problem space you work in involves a lot of things, so why re-invent every single fix if something exists that you can glue together to start making an impact in your problem domain? I don't think this is an issue, especially if #1 is true.
So I don't know, I think it's fine to have a lot of dependencies; I think the justification for those dependencies is often related to the complexity of the work involved. I'd expect a cURL replacement to have quite a few dependencies, since it's complicated software with lots of edge cases, so for instance not re-inventing an HTTP parser is a great idea.
Now if only we, as developers, all shared this sentiment when it comes to upstreaming contributions back, too. The more we contribute and share with each other, the more productive we can be.
Of course, sometimes package managers do a terrible job at ensuring some sort of base quality, around security or otherwise, and that's never great. So it's important to be aware of the trade-offs you make when you source your dependencies.
By no means does this excuse developers from understanding their dependency tree either. It's really the opposite.
* tracing is only present for logging purposes. Does curl need it? It should be configurable.
* itoa is only present for performance purposes, and only seems to be used by the server.
* It seems that a bunch of projects in the dependency tree use pin-project, which is heavyweight, and instead could use pin-project-lite. Some already do, which results in both being used, so you are worse off than with just pin-project alone...
* hyper contains code for both http servers and clients. Even if the dependencies are the same (and as seen above they are not), having to compile the server code for the client means an increase in compile time. It would be cleaner to provide separate server and client flags to make it possible to turn one off.
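Sketched out, such flags might look something like this in Cargo.toml (the feature names are made up for illustration; hyper doesn't expose this split today):

```toml
[dependencies]
# Hypothetical: build only the client half with HTTP/1 support.
hyper = { version = "0.13", default-features = false, features = ["client", "http1"] }
```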
There’s a third option if the package is so important: put it in the standard library.
Technically it's still a dependency, but the standard library is maintained to a standard that is rarely matched by third-party libraries, and can dramatically simplify the ecosystem's dependency graph.
Rust's standard library is intentionally kept small since it's initially not always obvious what the best solution is and the stability guarantee makes it the wrong place for evolving, diverse or opinionated APIs.
E.g. the async ecosystem lets you pick between runtimes of different complexity and tradeoffs. Std only picked up the essential traits that allow other crates to interoperate with each other and the language itself to define async functions.
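Concretely, the main interop piece std did pick up is the Future trait, which is roughly (paraphrased from std::future):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

// This is the contract every runtime and every async fn agree on;
// the runtimes themselves live outside std.
pub trait Future {
    type Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
}
```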
> with a standard that is rarely matched by third party libraries
I mean, you can go both ways with this. Standard libraries are significantly more difficult to work on than third party libraries, and I've seen a lot of code in standard libraries that is objectively worse than ecosystem equivalents because of it.
I agree, there's no silver bullet, only tradeoffs and our appetites for them.
From looking at the list of crates above, I see a lot that are often part of the stdlib of other languages, such as HTTP, concurrency/nio primitives, and logging.
My perception of Rust's not including such fundamental primitives in the standard library is that Rust is still very much experimental, and the ecosystem values tinkering and experimenting with new ideas. The cost of that is increased dependency hell, especially over time. There's obviously huge value in experimentation for the industry, but it makes me hesitate to use Rust for projects where I just need to get stuff done, and stay done.
HTTP libraries are a prime example of where many, many standard libraries are considered old and crufty, and there are much better ecosystem libraries that end up being far more widely used.
You may have that perception, and that is fine, but it's not likely to be a thing that changes significantly, even when Rust is quite old. There's just not a lot of advantage to being in the standard library, and numerous downsides.
We should be more nuanced than that. There are also many standard libraries where the HTTP implementation is the standard. Why?
> There's just not a lot of advantage to being in the standard library, and numerous downsides.
Look at those huge lists of dependencies and the complaints of Cargo dependency hell. That's the downside. Every node in your dependency graph has overhead for everyone involved, and it's even worse when it's something as fundamental as HTTP.
Isn't that exactly why you'd use a package like hyper that wraps the pieces together for you?
I'm less experienced with Rust, but with Node, there are many times I'll use a specific dependency over another because it's already in the dependency tree.
Aside, it's rough actually trying to keep node dependencies in check in a project. Especially in web UI projects using npm.
As inconvenient as it is, I tend to agree. Having worked with C#/.Net, where almost everything is in the box, Node, where very little is in the box, and a minuscule amount of Rust, which is more towards the latter, I prefer the latter.
Now, I am somewhat surprised that, say, tokio or similar hasn't made it into the box yet, but it allows for much greater experimentation.
Aside, I wouldn't be surprised to see MS generate a massive suite of libraries if they shift more internal development to Rust. Not sure if it'll be good/bad or otherwise.
I also wonder if Hyper is really the right tool for the job here. When I was looking into HTTP/S crates, I decided against Hyper because it seemed to require bringing in a runtime for HTTPS, and the async nature of Hyper did not seem necessary for my totally synchronous CLI tool. For something like cURL it seems like you would just want the leanest, simplest synchronous HTTP implementation possible.
1. CURL without https seems insufficient nowadays.
2. CURL could be improved by running multiple downloads at once. I'm not sure that curl command line utility could do it, but certainly libcurl.so has this ability, it allows client code to work with multiple connections.
3. Any application having a UI could benefit from async: input/output and the main task are async by nature. For example, curl might want to show progress/status in the terminal despite a stalled connection.
So maybe Hyper is too much code for curl, but it is arguable that it is not.
1. So as to your first point, I totally agree CURL needs to support HTTPS. My point is that Hyper needs a runtime for HTTPS, and it doesn't necessarily make sense for CURL to have a runtime.
2. I'm not sure that CURL should necessarily support multiple concurrent downloads. It could also be argued it's more UNIX-y to make it just do one thing and allow the caller to run multiple CURL processes at the same time
3. You wouldn't necessarily need async to achieve these goals. You could easily have a synchronous http implementation which allows for printing to the console between receiving chunks of data from the network. And if you really didn't want blocking to have a "spinning" activity indicator or something, you could still achieve it with threads.
I think ultimately you'd have to decide based on the relative cost of including an entire runtime vs. just launching a second thread.
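A rough sketch of that thread-based version, with a placeholder download() standing in for a blocking HTTP call (error handling trimmed):

```rust
use std::io::Write;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Stand-in for a blocking HTTP GET done by a synchronous client.
fn download() -> Vec<u8> {
    thread::sleep(Duration::from_secs(2));
    b"response body".to_vec()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The worker thread blocks on the network; the main thread stays responsive.
        let _ = tx.send(download());
    });

    let spinner = ['|', '/', '-', '\\'];
    let mut i = 0;
    let body = loop {
        match rx.recv_timeout(Duration::from_millis(100)) {
            Ok(body) => break body,
            Err(mpsc::RecvTimeoutError::Timeout) => {
                print!("\r{}", spinner[i % spinner.len()]);
                std::io::stdout().flush().unwrap();
                i += 1;
            }
            Err(mpsc::RecvTimeoutError::Disconnected) => panic!("download thread died"),
        }
    };
    println!("\rdone, {} bytes", body.len());
}
```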
Hyper only brings in a single-threaded runtime, so it's not much of a runtime at all. That is to say, driving a future returned from Hyper in a blocking fashion and then dropping it is all that's required. Once you drop the client, the runtime will be dropped too. I usually take issue with runtimes due to the added complexity and the lack of clarity about resource usage - I'm mostly worried about superfluous memory usage and some rogue threads going off and doing a quest and a half doing god knows what, hogging or otherwise interfering with my application's threads. I don't think that's possible in this case. Do you know if there are any other concerns that I should be worried about when bringing in something that has a runtime?
> You could easily have a synchronous http implementation which allows for printing to the console between receiving chunks of data from the network. And if you really didn't want blocking to have a "spinning" activity indicator or something, you could still achieve it with threads.
One could do a spinning indicator, but not a STALLED label. To get that, one needs to restart read(2) every so often, and there we arrive at an implementation with complexity on par with async. Things become even more interesting if a program wants to process user input asynchronously. The UNIX way is to send signals, but that is just plain ugly. dd from coreutils allows using signals to trigger it to print progress; it is a very inconvenient way to do it.
> I think ultimately you'd have to decide based on the relative cost of including an entire runtime vs. just launching a second thread.
I'm not so sure. A runtime for user-space context switching is very small. I did it for educational purposes at some point in the past in C. It is an operation like: save registers, switch stacks, restore registers, and jump to another thread. If you have more than two threads, then you'd need some kind of structure to store all the contexts and to decide which one to choose next. Add some I/O code (like epoll) to track the state of file descriptors, and you are done. One could do it without async, but it wouldn't become much smaller, because it would be the same logic; just instead of stack switching the program would recreate stack frames.
IMO "how many packages are the dependencies broken into" is a far less useful question than "how many maintainers have commit access to the dependency subtree".
The latter is a better question because:
* It's directly connected to your security posture.
* It's a stable metric across languages with different norms about module size.
The number is smaller than that. You need to pass -e no-dev to cargo tree to filter out the dev-dependencies (which are only relevant for hyper development).
A quick check with cargo-geiger shows many hundreds of unsafe invocations in the dependencies of hyper. I think it's hard to argue that some Rust HTTP library is irrefutably safer when you've thrown out so many of the static guarantees of the language and replaced them with "dude, trust me".
This quote is interesting: “ I’m a bit vague on the details here because it’s not my expertise, but Rust itself can’t even properly clean up its memory and just returns error when it hits such a condition. Clearly something to fix before a libcurl with hyper could claim identical behavior and never to leak memory”.
So Rust aborts on invalid memory accesses, unwrap on None, etc. It does not abort on memory leaks. I don’t see Rust aborting in that context as much different from a segfault, and it guards against more situations than a segfault is able to do. Additionally, when stack unwinding is enabled (default) aborts can be caught during runtime and handled specially, if that’s necessary.
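Concretely, catching one of those failures at a boundary looks like this (a minimal sketch; it relies on panics unwinding, which is the default):

```rust
fn main() {
    // An out-of-bounds index panics instead of touching invalid memory,
    // and with the unwind strategy the panic can be caught here.
    let result = std::panic::catch_unwind(|| {
        let v: Vec<i32> = Vec::new();
        v[0]
    });
    assert!(result.is_err());
    println!("recovered from the panic");
}
```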
> So Rust aborts on invalid memory accesses, unwrap on None, etc.
Rust will panic on these things, and panics can abort, or unwind. Unwind is the default.
That's not what's being talked about here, I don't think. This is about alloc::alloc::handle_alloc_error, which was not allowed to unwind at the time the linked comment was made. But in the last few hours, https://github.com/rust-lang/rust/pull/76448 was linked to, which shows how that has since changed.
This is an interesting case for error-handling, because a) it can happen virtually anywhere, so handling it with Result would add huge API overhead in terms of ergonomics (given that most code never really has to think about it), yet b) it's very important that it is able to be handled in certain kinds of system contexts, which panics are not designed to facilitate. Most error cases seem to fall neatly into one camp or the other (something you always want to explicitly handle, or something where you just want to abort), but this one doesn't.
handle_alloc_error seems to do well enough as a workaround, but (from my superficial reading of the GitHub thread) it feels like just a very specific "poor-man's try/catch" for this one particular case. It feels like a workaround.
In general I'm a big fan of Result/panic in place of traditional exceptions, but this use case makes it really quite unideal.
> This is an interesting case for error-handling, [...]
Part of the problem is that it's actually two different error cases: A, the data we're operating on is too large, versus B, the system as a whole doesn't have enough memory. Case A is obviously[0] an explicitly-handled error (just like "the data we're operating on is malformed"), whereas case B is obviously a just-give-up error (like "the system doesn't have a floating-point unit"). But there doesn't seem to be any practical way to reliably distinguish the two cases, and it's not clear they can be rigorously separated even in principle.
0: we might decide to 'handle' it by aborting, but that's not special to allocation
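One way to handle case A explicitly today is fallible allocation on the specific container involved, e.g. Vec::try_reserve. A sketch (availability depends on your toolchain, since this API was still being stabilized):

```rust
// Turn "this particular allocation is too big" into an ordinary Result instead of
// an abort; every other allocation still goes through the infallible path.
fn read_into_buffer(expected_len: usize) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    buf.try_reserve(expected_len)
        .map_err(|e| format!("input too large to buffer: {}", e))?;
    // ... fill `buf` from the input here ...
    Ok(buf)
}

fn main() {
    // An absurd size fails cleanly (case A); a genuine system-wide OOM (case B)
    // is still hard to tell apart from it.
    assert!(read_into_buffer(usize::MAX / 2).is_err());
    assert!(read_into_buffer(1024).is_ok());
}
```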
> This is an interesting case for error-handling, because
I think this is a very good articulation of this space, thank you.
> It feels like a workaround.
I agree, but the needed work to improve this has taken a very, very, very long time, and so I think workarounds are ultimately helpful. We'll get there...
I think it can honestly be traced down to just how wide a stretch of usecases Rust itself has managed to span. For systems, this arguably is "something you always want to explicitly handle", whereas for applications it's almost always "something where you just want to abort". The level of abstraction is very different, but a single language is spanning both. Or, more precisely, a single standard library.
Just spitballing: maybe one solution could be an alternate set of standard primitives, specifically for low-level work, that do return a Result for everything that might trigger OOM? Maybe those could be abstracted out of the current standard library, and the existing ("application-level") APIs could wrap them with a panic?
This is AFAICT essentially what wg-allocators is working on where you can directly specify an allocator in collections (and alloc() returns a Result): https://github.com/TimDiekmann/alloc-wg
From the readme it sounds like this has actually started to be upstreamed now?
I'm still annoyed that the Rust people screwed up error handling so badly. The designers should have gone with classical exceptions like other languages, but instead went for a fashionable-at-the-time combination of error codes and added exceptions (spelled "panics"), cribbed for some reason from Go. And on top of that, the Rust designers copied one of the most annoying parts of the C++ ecosystem: a compiler switch for changing exception behavior.
The overall result is that everyone pays the cognitive cost of exception (spelled "unwind" in Rust) safety, pays the syntactic and runtime costs of error code checking, pays the runtime cost of unwind tables, and still can't actually rely on unwinding to actually work, because anyone can just turn panics into aborts.
I hope for a language with Rust's focus on memory safety but without Rust's weird fashionable-in-the-2010s language design warts.
> The overall result is that everyone pays the cognitive cost of exception (spelled "unwind" in Rust) safety
Very, very few people have to think about unwind safety, because it really only comes into play when you're writing unsafe code, and relying on the ability for panics to be caught. Many folks aren't writing any unsafe, and many who are are doing it explicitly in a panic=abort environment. And most don't rely on panics being able to be caught in the first place.
So, some subset of library authors have to pay attention to unwind safety in some cases. This is hardly "everyone."
> pays the syntactic
It's one character.
> and runtime costs of error code checking,
Exceptions also have runtime costs. I'm not aware of anything demonstrating that there is a really major difference between the two in real systems. I would be interested in reading more about this! Most of the discussion I've seen focuses on microbenchmarks, which can be very different than real code.
> pays the runtime cost of unwind tables,
Only if you want them, as you yourself mention, you can turn this off. And many do.
I do think it's not quite what you're saying, though: due to Rust's design, we would have checked exceptions, and so the Result part would be there with exceptions; it's the Ok() part that would change.
In theory you could make a language without checked exceptions that would make it be like your example.
Okay so, technically in theory you can introduce logic bugs, but not memory safety bugs, if you don't consider unwind safety in safe code. Logic bugs can happen in any code, of course.
The practical, day-to-day implications of this still round to zero, though.
> The practical, day-to-day implications of this still round to zero, though.
Vendors have been using similar language to downplay potential bugs for decades, usually to disastrous results. At one point, even memory safety wasn't a big deal. I'm just waiting for a software package to have a security vulnerability when an attacker is able to trigger an untested Rust unwind path and put some Rust daemon into a state it didn't expect.
There are many good parts of Rust, but I don't think I'm ever going to be convinced that the error handling wasn't a huge and unfixable mistake. It's because error handling is such a big mistake that Rust has grown layers of syntactic sugar --- try!, ?, etc. --- to paper over the ugly spot in the language.
Logic bugs can be just as disastrous as memory safety bugs. From an attacker's point of view, they're ultimately about making a program do something not intended. Downplaying logic bugs (where Rust is weak) and emphasizing memory safety (where Rust is strong) might make Rust look better, but it's not doing any favors for computing.
I agree that they can be. There are a few differences:
1. It is not clear that we will ever be free of logic bugs to the same degree that we can minimize memory safety bugs.
2. We're in a specific context in this sub-thread, and that's that you claimed that this is a pervasive issue that everyone must consider all the time. These logic bugs can only be introduced in a context that is very unusual, and so my claim is not that logic bugs in general are irrelevant, but that the context that this kind of bug can appear is smaller than you say it is.
> logic bugs (where Rust is weak)
Rust gives you way more tools than C to reduce logic bugs as well.
The whole point of secure programming is caring about those "unusual" contexts. And the comparison to C isn't really fair: C is dangerous for everything. Yes, Rust is better than C, and even better in some ways than C++, but my point is that there's another Rust out there, a Rust^, that's even better than Rust, and Rust^ uses exceptions for error handling throughout.
Most applications are not supposed to survive a panic!(), to the point where I personally tend to use "panic = 'abort'" for any code where performance matters. As such it's not really a mental overhead, and definitely not the Rust way of doing error handling (in the same way than using "assert" and a signal handler is not the way you're supposed to do error handling in C++).
The Rust way of dealing with errors is to return a Result<> and in my experience it's the best system I've used. It doesn't have the code flow breaking aspect of exceptions while being a lot nicer and less error-prone than C or Go style error handling.
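A minimal sketch of what that looks like in practice, with `?` propagating the error to the caller:

```rust
use std::num::ParseIntError;

// The error type is part of the signature, so callers have to deal with it.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    let port: u16 = s.trim().parse()?; // on failure, return the error immediately
    Ok(port)
}

fn main() {
    match parse_port("8080") {
        Ok(port) => println!("listening on {}", port),
        Err(e) => eprintln!("bad port: {}", e),
    }
}
```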
Libraries can declare "I require semantic x or semantic y." Applications can as well. If an application says it needs semantic x, and a library says it requires semantic y, you get a compile error.
Most libraries do not depend on a particular semantic, in my experience.
I assume library authors can specialize code to either semantic as well, and carefully target both in a single library? E.g. with cfg attributes/macros.
I actually don't think so, but I'm also not sure what use case would require you to do this. Generally, you're agnostic, and only in very specific circumstances would you require unwind. I'm not sure when a library would require abort.
I can't come up with a case for requiring abort, but perhaps there are some performance optimizations possible when not unwinding? I assume the compiler does plenty already, but e.g. there would be no need to hang on to any data for making better error messages from a panic, since you can't capture it.
If Rust had gone with exceptions, we'd have lost the entire embedded community. You have to realize that -C panic=abort was added in no small part because of a hostile fork of the language that removed the unwinding mechanism.
I happen to like exceptions in principle, but for a lot of low-level embedded code they're a deal breaker.
Basically there are already a lot of interchangeable backends; they are going to introduce a new one written in Rust, which should increase memory safety, but Rust itself does not clean up after itself when it panics.
I see that Stenberg (bagder) is receiving funding for the work from the ISRG, but I wonder if McArthur (seanmonstar) is, too? It seems like a sizable amount of work on their part, too.
Not that I consider the overall sentiment of the linked article wrong, but this...
> Dereferencing a nullptr gives a segfault (which is not a security issue, except in older kernels). Dereferencing a nullopt however, gives you an uninitialized value as a pointer, which can be a serious security issue.
...betrays a complete lack of understanding what Undefined Behavior is/implies. That's not something you want to see in an article discussing memory safety.
The language's semantics are such that by default, you get memory safe code. This is checked at compile time. While many languages are memory safe, they often require a significant amount of runtime checking, with things like a garbage collector. Rust moves the vast majority of these kinds of checks to compile time, and so has the performance profile of C or C++, while still retaining memory safety.
> How does it compare with modern C++?
One way to look at Rust is "modern C++, but enforced, and by default." But that ignores some significant differences. For example, Rust's Box<T> and std::unique_ptr are similar, but the latter can still be null, whereas Rust's can't. C++ cannot be checked statically for memory safety; even if modern C++ helps improve things, it doesn't go as far as Rust does.
Rust's type system is able to carry a lot of information it can use to verify the memory safety of programs at compile time.
For example, the type system includes a piece called the borrow checker, which is able to guarantee that references are still valid when you use them, which eliminates use-after-free bugs (bounds checks separately take care of buffer overflows).
In a similar vein, the type system includes information about in which ways types may be shared across threads, and by using this information, the compiler can guarantee that there are no data races whatsoever in multi-threaded programs.
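A small sketch of that second point: to mutate shared state from several threads, the types push you towards something like Arc<Mutex<...>>, and handing threads a plain mutable reference simply doesn't compile.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared, mutable state is wrapped so the compiler can prove it's race-free.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}
```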
> We’d like to thank Daniel for his willingness to be a leader on this issue. It’s not easy to make such significant changes to how wildly successful software is built, but we’ve come up with a great plan and together we’re going to make one of the most critical pieces of networking software in the world significantly more secure. We think this project can serve as a template for how we might secure more critical software, and we’re excited to learn along the way.
> one of the most critical pieces of networking software in the world
cURL is widely available and widely used, obviously, but I'm surprised to see it described this way. I've always seen it mainly as a way for people and scripts to conveniently try out endpoints and download files. But this makes it sound like more than that; does it get widely used in an infrastructural capacity?
the binary `curl` is just a CLI frontend for `libcurl`. curl can do a lot more than HTTP. it can transfer data over 20-odd different protocols, including real-time streaming media. it's huge in embedded software.
Once you go looking, you'll find libcurl everywhere. I wouldn't be surprised if your grandma interacts with libcurl multiple times per day without knowing.
> At an estimated six billion installations world wide, we can safely say that curl is the most widely used internet transfer library in the world. [...] curl runs in billions of mobile phones, a billion Windows 10 installations, in a half a billion games and several hundred million TVs - and more.
I remember from very old PHP days that the default http client everyone reached for in PHP land was curl: https://www.php.net/manual/en/book.curl.php . I don't doubt that this is the case for many other older languages as well. This means, in turn, that there are large swathes of the internet communicating with one another via methods like this.
I don't have any data on exploitability, but 19 of the last 22 vulnerabilities (since 2018) have C-induced memory unsafety as a cause: https://curl.haxx.se/docs/security.html
Oh thanks, that at least gives some idea of the potential. I see e.g. "HTTP/2 trailer out-of-bounds read" and "SSL out of buffer access"... I guess there might be some candidates.
Switching immediately to building with C++, and then migrating incrementally to safe forms in C++, would provide much more value per unit effort. It would also enable engagement by the orders-of-magnitude more available skilled C++ programmers, who could also pick up new skills writing modern, safe C++ to apply in other migrations.
It is not an either/or proposition. Certain, select modules could be recoded in Rust by particularly motivated Rust coders, leaving the huge amount of other code, for which there are too few Rust enthusiasts to work on, to be modernized in C++, and still able to call into the Rust code.
I think you may have misunderstood what the post is saying they're going to do. It is significantly more in line with your suggestion than you seem to think.
Yes, curl has a concept of "backends," which you can choose at compile time. This is about providing an option for a new backend, based on Rust libraries. That's it. Nobody is re-writing anything.
I'm unifying Rust's async HTTP implementations H1, H2, H3 and Google's tarpc in Rust~Actix~Torchbear. I just don't have a lot of time now since my house got broken into and I don't have enough money to rent anywhere. It also needs a lot of work on the parsing layer, and the laptops with my notes on them are hard to keep with me as I move around.
I like how the comment referenced in the article with the description "Rust itself can't even properly clean up its own memory" was answered today saying the restriction of unwinding on oom is going away; it's not a fundamental issue, just something that wasn't implemented that way the first time.
I have found myself in need of a "lib_download" a few times, a high-level library that:
- support HTTP/HTTPS
- support proxies (for bypassing firewalls, censorship, etc.; http/https/socks5)
- download one large file in parallel (configurable temporary directory)
- download many small files in parallel (seems too high-level to put in a library, not sure this is a good feature)
- configurable retry (maybe too high-level to put in a library)
- resume download
- good error semantics
- an interface with defined behaviour
- progress report (useful for downloading large files)
I tried using a wrapped (in Rust) version of libcurl, and in the end I decided to just use the curl CLI, read through the man page, and pass about 13 arguments to it to make its behaviour defined (to me, to a certain confidence level). I also pinned the curl executable to a specific version to avoid unknown changes.
The end result works, but the process is unnecessarily complicated (invoke the CLI binary, know what arguments to pass, know the meaning of the many error codes), and resume is not pleasant to use. I guess libcurl is designed that way so that a curl master can tune all the knobs to do what he wants, but for an average library user who just wants to download things, it requires more attention than I'm willing to give.
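To give a flavour of it, the wrapper ends up looking something like this (illustrative flags only, not the exact set described above):

```rust
use std::process::Command;

// Shelling out to the pinned curl binary with behaviour-defining flags.
fn download(url: &str, out: &str) -> std::io::Result<bool> {
    let status = Command::new("curl")
        .args(&[
            "--fail",             // HTTP errors become a non-zero exit code
            "--silent", "--show-error",
            "--location",         // follow redirects
            "--retry", "3",
            "--connect-timeout", "30",
            "--continue-at", "-", // resume a partial download when possible
            "--output", out,
        ])
        .arg(url)
        .status()?;
    Ok(status.success())
}

fn main() {
    let ok = download("https://example.com/file.tar.gz", "file.tar.gz").unwrap();
    println!("download ok: {}", ok);
}
```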
Used in an interactive context, the issue of defined behaviour is usually overlooked, but when used as a library in a program that runs unattended and is expensive to upgrade/repair, achievable defined behaviour is a must, and testing is not an alternative to it; even experience is not an alternative (experience is time-consuming to get, and not transferable to others).
All package managers need to download packages from the internet, often via HTTP, so it's good to have an easy-to-use, well-defined, capable download library. Many of them use curl (Arch Linux's pacman, the Rust installation script), and many use others with varying levels of capability. I think it would be beneficial if we had a good library (in Rust) for downloading things.
Certainly, SPARK's GPLv3 license is incompatible with Curl's license. Curl's license is MIT-ish, but adds a prohibition for those who use it from using the author's name to promote their business. That restriction is not compatible with GPLv3. Since the author of curl took the time to write that restriction, I would imagine it is more important to them than SPARK. Since the copyleft licenses are hostile to restricting free speech, it's safe to assume that all copyleft options would be similarly unacceptable.
Rust is dual APL2/MIT, and Curl's modified MIT is compatible with MIT, so no such issue would exist for Rust.
(Standard disclaimer, I'm not your lawyer, no citations offered, seek legal counsel.)
I do not know much about Wuffs, but it seems to be completely safe. No arithmetic overflows, no bound checking failures, no None unwrapping panics, no memory allocation failure panics.
it's an interesting choice. i would have thought that fortifying http client libraries for major languages would be more important, but maybe they've already been hardened and interactive use of curl is a vector.
makes me wonder about other interactive tooling. would be interesting if there were malicious binaries that were benign at runtime but triggered bugs in debuggers and profilers.
I was under the impression that curl worked on more platforms than Rust and LLVM. It will be interesting to see what happens to curl support on those platforms going forward.
As the article indicates, it would be but one of dozens of existing backends, although it'd be one where few alternative backends currently exist (HTTP/1 and HTTP/2).
For instance libcurl can use any of 13 different TLS backends (one of which is already in Rust), or 3 different HTTP/3 backends (one of which is in Rust).
Libcurl supports multiple compile-time backends for http support, encryption, and so forth. This will be no different. Hyper will be just one option among many, as will Rustls.
First, in a philosophical sense: pointers and x86 CPUs are real, ultimately any safe abstraction must be built on unsafe primitives. The ability and need to do that aren't specific to memory unsafety, we do that all over software engineering.
Second, empirically, my experience has been that the design of these abstractions can be safe, but moreover that the cordoning off of unsafe blocks makes 3p auditing for memory unsafety _much_ easier to do. It can be orders of magnitude faster than reviewing an entire C or C++ codebase.
A TCB should be dozens of lines, not thousands. More code means more places for more bugs to hide.
My experience in Safe Haskell was that, if you have to ask each module individually whether it has a safety property, then you've already created too much work for yourself. Instead, require every module to structurally encode the desired invariant.
Or, in fewer words: If you want memory safety, don't have `unsafe` blocks.
OCaml has its fair share of unsafe features, with the functions helpfully prefixed by "unsafe_". If you look through the stdlib you'll find dozens of such functions. Plus real world code uses them to do things like avoiding bounds checks. Much as I'm a fan of OCaml, even a pure OCaml implementation could do unsafe things.
Especially given that this is not a full rewrite from scratch - it follows more the model that Firefox uses: integrate components written in Rust into an existing C code base. Rust's binding friendliness from and towards C will be a major point in such an effort.
A language like OCaml can still have memory unsafety issues introduced by the compiler or standard library. It just makes it much more manageable to effectively audit for and fix such issues. `unsafe` blocks serve the same purpose.
Not only that, but to write anything more than a hello world requires lots of unsafe blocks. There are loads of them everywhere, in all of those crates, incl. the standard library.
Moderately low-level crates require some unsafe blocks. Rust is supposed to be a systems language from what I heard, so if you are going to write something larger, and/or complicated, and/or low-level, you will have to resort to unsafe blocks, I think.
Yep, there are indeed loads of them. Feel free to clone all of those repositories and look for those unsafe blocks. You could even get it per-crate.
Of course there are some false positives in there, and yes, some crates may not have unsafe blocks (so not all crates have them, you got me), but still... It is a bit too many unsafe blocks.
Or I do not know, maybe they just have a thing for unsafe blocks, for example in the standard library (std) you can find 2267 unsafe blocks.
One thing to bear in mind is that the libc and winapi crates are bindings to the libc and Windows C headers, so every single function signature in there will be marked unsafe because it's FFI, so those will be heavily impacting your results.
You could also avoid some false positives in comments by searching for "unsafe fn" and "unsafe {", rather than just the word "unsafe", as those are the only tokens (to my knowledge) that can follow "unsafe".
The point is that they want further assurances that the "curl https://totally-not-evil.example.com/install.sh" part won't, in certain environments, screw up some pointer arithmetic and write the buffer into executable memory, or cause some other heartbleed-esque bug which can be exploited.
Piping it to "sudo bash" is perfectly acceptable in the eyes of the system. It's doing the instructions the user has asked it to, they've explicitly been configured as sudoers, and usually have been prompted to enter their password.
Historically there was a long period where this didn't do what you expect, which is very bad.
What this looks like it does, and indeed does today (modulo bugs some of which could be prevented using Rust) is:
Ask totally-not-evil.example.com for this install.sh resource and then run that as root as a Bash script. This is no worse than if you were to have totally-not-evil.example.com give you the bash script on a floppy disk or something. If you suspect they might actually be evil, or just incompetent, that's on you either way.
But for some years curl didn't make any effort to confirm it was getting this file from totally-not-evil.example.com. Connect over SSL, ignore all this security stuff, fetch the file. So then it's like you just accepted a floppy disk you got in the mail which says it's "from totally-not-evil.example.com" but might really be from anybody. That's definitely worse. Today you have to specify the --insecure flag to do this if you want to (Hint: You do not want to)
Yeah, and it still doesn't matter. Because if you run that command you are already trusting that site with RCE on your machine, so if the premise of the attack is "the site is bad" you were owned anyways.
Moreover, the site could be legitimate and ship you something legitimate, but if it's truncated for any reason it could still be a "valid" bash script that now does incorrect things. Consider various prefixes of `rm -rf /tmp/thisscript.working/...`
The website could detect whether you are using a regular browser or curl itself to download the .sh file and return something different. So inspecting the .sh using your browser before you run that line would not protect you.
In your example, bash, sudo, linux, your DNS stack, ISP, router, clipboard and keyboard all play a role that is just as essential as curl in that command working the way you (cynically) expect it to.
The bug did pass the type checker. Memory-safe languages also have security issues. The program never run is the most secure; otherwise, just as a programmer gains experience, programs get "battle hardened".
> Hyper is a fast and safe HTTP implementation
Well.. Hyper does rely on unsafe blocks (14 at first glance[2]), so I don't know if we can just assume that it's safe. When Sergey Davidoff did their big smoke test of popular Rust HTTP implementations they found a couple of bugs[1] (through Reqwest).
I love the idea of a safer cURL, but I don't think you should take this as a magical answer to all of cURL's problems.
autocfg, bitflags, bytes, cfg-if, fnv, fuchsia-zircon, fuchsia-zircon-sys, futures-channel, futures-core, futures-sink, futures-task, futures-util, h2, hashbrown, http, http-body, httparse, httpdate, indexmap, iovec, itoa, kernel32-sys, lazy_static, libc, log, memchr, mio, miow, net2, pin-project, pin-project-internal, pin-project-lite, pin-utils, proc-macro2, quote, redox_syscall, slab, socket2, syn, tokio, tokio-util, tower-service, tracing, tracing-core, try-lock, unicode-xid, want, winapi, winapi-build, winapi-i686-pc-windows-gnu, winapi-x86_64-pc-windows-gnu, ws2_32-sys