At the risk of being slightly tangential, I've been sorely wanting to air this particular grievance with Rust for some time. It's somewhat related, since the author mentions Rust's package system. Its package ecosystem isn't nearly in the horrible state that node's is, but having a package system shouldn't be a substitute for designing a useful standard library for a language. I think that the attraction to 'small languages' is very much misplaced. If I can't get through Rust's official documentation without being recommended third-party packages for basic functionality (getopt, interfacing with static libraries, etc.), then the designers have made a terrible error.
Opinions on this are a dime a dozen. You often see the reverse of it too, for example, you might have heard that "Python's standard library is where things go to die." You could just as easily call that a "terrible error." The fact that Python's standard library has an HTTP client in it, for example, doesn't stop everyone from using requests (and, consequently, urllib3) for all their HTTP client needs. So despite the fact the standard library provides a lot of the same functionality as a third party dependency, folks are still using the third party dependency.
I think the size of the standard library is just one of possibly many contributing factors that lead to a large number of dependencies. I think a part of it is culture, but another part of it is that the tooling _enables_ it. It's so incredibly easy to write some code, push it to crates.io and let everyone else use it. That's generally a good thing, but it winds up creating this spiral where there's almost no backpressure _against_ including a dependency in a project. This means there's very little standing in the way of letting the fullest expression of DRY run wild. There are some notable examples in the NPM ecosystem where it reaches ridiculous levels. But putting the extremes aside, there's a ton of grey area, and it can be pretty difficult to convince someone to write a bit more code when something else might work off the shelf. (And I mean this in the most charitable way possible. I find myself in that situation.)
I do hope we can turn the Rust ecosystem around and stop regularly having dependency trees with hundreds of crates, but it's going to be a long and difficult road. For example, not everyone even agrees with my perspective that this is actually a bad thing.
Python's standard library is where things go to die because of its terrible ad hoc versioning system (the module name is the version number), and because dynamic typing means they are afraid to change anything. But even then it's still better than having no standard library at all.
The advantage of a standard library is that you only need to learn one API instead of a dozen different APIs for doing the same thing, which means you can develop a degree of mastery over it. It also reduces the friction for using better abstractions. E.g., every professional Python programmer knows defaultdict, whereas I rarely see that data structure used in other programming languages; it's too much of a leap to install a dependency to save a few if statements, but it all adds up.
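For what it's worth, Rust's standard HashMap can express the defaultdict pattern through its entry API, with no extra dependency. A minimal sketch (the word-count example is my own, not from the thread):

```rust
use std::collections::HashMap;

fn main() {
    let words = ["a", "b", "a", "c", "b", "a"];
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for &w in words.iter() {
        // `or_default()` inserts the type's default value (0 for u32) on first
        // access, which is roughly what Python's defaultdict(int) does implicitly.
        *counts.entry(w).or_default() += 1;
    }
    println!("{:?}", counts);
}
```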
> The advantage of a standard library is that you only need to learn one API instead of a dozen different APIs for doing the same thing, which means you can develop a degree of mastery over it.
The rust ecosystem has done well to converge on certain crates as sort of replacement for missing std features.
In practice (at least in the Rust ecosystem), I only need to learn one interface for each of these: regular expressions (regex), serialization (serde), HTTP (reqwest), and so on.
As a relative outsider, it’s not obvious at all that these are the right crates to choose. I appreciate the commitment to long-term stability that the standard library appears to have, but that benefit goes out the window if I accidentally rely on a third-party crate that changes its API every six months.
Looking at crates.io, regex looks pretty safe, as it’s authored by “The Rust Project Developers” and includes explicit future compatibility policies. Unfortunately, I can’t find an index of only the crates maintained by the Rust team.
Serde is obviously popular, but at first glance is a giant Swiss Army knife that will likely have lots of updates to keep track of that are completely unrelated to my project (whatever it is). If I search for JSON, I get an exact match result of the json crate, followed by a bunch of serde-adjacent crates, but not serde itself.
Request hasn’t been updated in 4 years, and has a total of less than 7000 downloads.
They probably meant reqwest (https://github.com/seanmonstar/reqwest), not request. Reqwest is maintained by the same developer (seanmonstar) as hyper, the de facto standard http library.
All these libraries are very well known within the community and are what I would come up with as a complete outsider (I don't think I've written more than a hundred lines of Rust code to this date).
There's actually a more official resource: the rust cookbook[0]. This is maintained by the rust-lang team (rust-lang-nursery is an official place for crates maintained by the rust language maintainers).
That sounds like something that could be solved by having crates.io provide a curated list of common popular crates for certain features. That is, this seems to be mostly a documentation issue.
It’s really a reputation bootstrapping problem, for which popularity can be a useful proxy. For me to use third-party code, I have to trust that the future behavior of the developers will be reasonable: I want my side projects that don’t get touched for months or years to still mostly work when I get back around to them.
Not everyone or every project will have the same desires, though. Sometimes, a fast-moving experimental library is the right choice. The trouble is figuring out which I’m looking at.
I'm not sure I follow these concerns about "working in the future" - as long as you specify versions that work for you in your Cargo.toml file, that should work at any point in the future given that you use Rust 1.x.
If you want to always be on the latest version of each crate, well, the discomfort of them potentially not working is part of the price.
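As a concrete illustration of the version-specification point above, a minimal Cargo.toml sketch (the crate names and version numbers are just placeholders):

```toml
[dependencies]
# Caret requirement (Cargo's default): any semver-compatible 1.x release may be used.
serde = "1.0"
# Exact requirement: Cargo will only ever select this specific version.
regex = "=1.3.9"
```

And with a committed Cargo.lock, even the caret-style requirements keep resolving to the same versions on future builds until you explicitly run `cargo update`.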
If I come back to something, it’s because I want to resume active development. Keeping a dependency pinned at an old version makes that more difficult in various ways, so I personally value forward compatibility.
Not everyone does, and that’s fine. I just want to know what a library developer’s stance on it is before I try to use their library.
Not really. Rust supports 8-bit microcontrollers. Lots of libraries, including parts of the standard library, make no sense on those kinds of platforms.
The standard library and 3rd party crates generally have excellent compatibility across mainstream platforms.
Except the small detail that C's POSIX support is much wider than Rust's tier 1 platforms.
I am fairly certain that, without profiling against a defined test configuration for a specific set of C++ compilers and standard library implementations, I wouldn't assert anything about std::regex performance on a machine with 4 MB of RAM.
In other words, not all functions in the stdlib of Ada, Pascal, C and C++ can be used in all possible target environments? Sounds like a failure to quality gate those standard libraries.
I'm not sure if you're just being disingenuous here. You're right that you're not going to be able to use all functionality from the stdlib of Ada (and others) on every possible target, but you were never, ever going to. And Rust certainly won't solve this problem for you. It's not a consequence of poor standard library design either.
It might not be immediately obvious, but even C has a runtime library, which needs to be specific to the architecture and OS that you're targeting. Just for a quick example, `malloc` is going to need to function differently depending on what OS you're running, and if you're targeting a microcontroller with extremely limited RAM it might not be implemented at all.
I don't think the parent was claiming that Rust was better in this regard, just that it was no worse. Other languages also restrict standard library features on some platforms.
Not exactly. Rust can be made to run on a 16-bit toaster or even OsIJustWrote. Just because it can be run doesn't mean Rust std lib devs will support 16-bit toasters or OsIJustWrote.
Each platform has a different level of support, the primary ones being Windows, Mac and Linux, where every pure Rust crate runs.
The std lib makes certain reasonable assumptions for which it works, e.g. that malloc exists and panic! is implemented.
In Python, if you're running 3.7.1 then you're also running the standard library for 3.7.1. Sure, I guess it would be possible for a programming language to decouple these things so that it's possible to ask for a particular version of the standard library (in its entirety), or a particular version of a standard library... but then programmers can no longer rely on the standard library to "just work" and "just be there", which is its appeal. If you decouple the standard library from the language, might as well switch to a Rust-like system where you simply give an official stamp of approval to certain packages regardless of who developed them.
I think you might have misunderstood me. I was contrasting one extreme interpretation with another. I was not really criticizing Python. Its large standard library is one of the things I like about it.
> dynamic typing means they are afraid to change anything
When talking of the standard library, static typing doesn't save you when breakage happens. It's better, of course; at least the compiler protects you from obvious errors (although that doesn't work for transitive dependencies, given the dynamic linking of binaries that Java / the JVM does ;-))
The problem is when a piece of code that was compiling fine a year ago fails to compile on a newer version of the standard library due to breaking changes; that's going to take time and effort to fix.
And this gets worse when the breakage happens in dependencies and those dependencies are no longer maintained. This can always happen of course, not just due to the standard library, but due to transitive dependencies too. But still, breakage in the standard library, or in libraries that people depend on, is a bad thing. And consider that as the number of dependencies grows, so does the probability for having dependencies that are incompatible with one another (compiled against different versions of the same dependencies).
And semantic versioning doesn't work. Breaking compatibility will inflict pain on your downstream users, no matter how many processes you have in place for communicating it. And this is especially painful when you're talking about the standard library.
If the standard library introduces breaking changes, regardless if the language is static or dynamic, then it's not a standard library that you can trust. Period.
Also — when should you break compatibility, in the standard library or in any other library?
The answer should be: never. When we want to change things, we should change the namespace and thus publish an entirely new library that can be used alongside the old one. Unfortunately this isn't a widely held view, but I wish it was.
---
Going back to the batteries included aspect of some standard libraries, like that of Python, there's one effect that I don't like and that's not very visible in Python since the bar is pretty low there.
The standard library actively discourages alternatives.
When a piece of functionality from the standard library is good enough, it's going to discourage alternatives from the ecosystem that could be much better.
Some pieces of functionality definitely deserve to be "standard". Collections for example, yes, should be standard, because libraries communicate between themselves via collections. And that's what the primary purpose of a standard library is ... interoperability. Anything else is a liability.
In my post, I specifically call on proc-macro support (syn and quote) plus rand (not all of rand though, just the "give me a random number" functionality that comprises 99% of the use cases but 10% of the implementation complexity) to be added to the Rust standard library. But I feel these are particularly justified because they're already present, just not accessible. Overall, I think Rust's "batteries not included" approach is a good tradeoff.
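For anyone who hasn't written one: virtually every derive-style proc macro follows the same syn-plus-quote shape, which is why those two crates show up in so many dependency trees. A minimal sketch (the `Describe` derive is invented for illustration, and it assumes a library crate with `proc-macro = true` in Cargo.toml):

```rust
use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

#[proc_macro_derive(Describe)]
pub fn derive_describe(input: TokenStream) -> TokenStream {
    // syn parses the raw token stream into a typed syntax tree...
    let input = parse_macro_input!(input as DeriveInput);
    let name = &input.ident;
    // ...and quote turns a Rust-looking template back into tokens.
    let expanded = quote! {
        impl #name {
            pub fn describe() -> &'static str {
                stringify!(#name)
            }
        }
    };
    expanded.into()
}
```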
I don't necessarily disagree. I have my own pet things that I'd like to see in std too. I've long wanted to see lazy_static in std. Funnily enough, it looks like `once_cell` might wind up being a nicer approach to achieving a similar end, but without using a macro. So if we had added it to std many years ago, we might find ourselves with an API that we regret! It's tough.
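To make the contrast concrete, a minimal sketch of the two approaches (assuming both crates are added as dependencies; the map contents are arbitrary):

```rust
use std::collections::HashMap;

use lazy_static::lazy_static;
use once_cell::sync::Lazy;

lazy_static! {
    // Macro-based lazy initialization: the block runs on first access.
    static ref BY_MACRO: HashMap<&'static str, &'static str> = {
        let mut m = HashMap::new();
        m.insert("NL", "Netherlands");
        m
    };
}

// once_cell gets the same laziness from an ordinary type plus a closure,
// with no macro involved.
static BY_TYPE: Lazy<HashMap<&'static str, &'static str>> = Lazy::new(|| {
    let mut m = HashMap::new();
    m.insert("NL", "Netherlands");
    m
});

fn main() {
    println!("{:?} {:?}", BY_MACRO.get("NL"), BY_TYPE.get("NL"));
}
```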
With that said, if `syn` finds itself in a spot where it needs to do a breaking change release for a new language feature, then that would be tricky. It sounds like syn's architecture is pretty flexible (non-exhaustive enums), but it's not clear to me that it could support all possible language additions without breaking changes.
> So if we had added it to std many years ago, we might find ourselves with an API that we regret! It's tough.
That is the ultimate problem of batteries-included vs a small standard library, distilled into a specific example.
People used to tout Python's "batteries included" line as positive almost universally a decade and more ago. Then better replacements were developed, and those batteries started looking less appealing.
There's really no escaping that while including any base functionality. Either you include it and portions will be stale later as better interfaces and paradigms are developed, or you don't, and you risk slower adoption and harder usability as people need to figure out solutions for common tasks, even if that solution is as simple as finding the crate that provides it. Because eventually that becomes a problem not of finding the crate, but of finding the best crate out of the multiple that exist, which itself causes fragmentation of the common developer experience (and thus makes it harder to share knowledge and have a good community).
I think I get the tradeoffs, but while my company would be a perfect fit for Rust, we do all our development on an airgapped network. Custom registries are a thing now, but picking and choosing which packages our IT will consider trustworthy, then taking the subset of those with a license our legal will approve, and then pruning things with dependencies that are now missing is a huge task that needs to happen at regular intervals and probably leaves some pretty big gaps in functionality. It's a shame; I like the language.
At least from a licensing perspective, you should be all set. Virtually everything in the Rust ecosystem is permissively licensed.
Trustworthy is another thing altogether though. How do you handle this process in C and C++? Both of those languages have fairly spartan standard libraries as well.
> Virtually everything in the Rust ecosystem is permissively licensed.
I just did our license audit and 100% of our shipped deps were Apache | MIT. If you can clear those two licenses with legal, you should be good to go for virtually any crate.
Might be worth releasing my one liner for this if anyone else finds it useful.
The spartan C standard library was what my comment regarding 'small languages' was referencing. I don't have much experience with many of C's contemporaries, but for an opposing view take, for instance, the standard library of Ada (I use the term 'standard' here to connote the library that is mandated by the standard of these languages). It is definitely orders of magnitude larger than that of C, and takes a fundamentally different perspective. Ada's standard library, while dated by modern standards, looks more like it was designed to address the specific needs of its domain. Mind you, it could be argued that C's stdlib was also, if you restrict its domain to 'OS development'. My point is that for a language like Rust, which does not have a tiny stdlib like C, the standard library should address the common use cases required by its developers. Node.js is another bad offender here. If you look at the most popular npm modules, you'll see things like 'body-parser' and 'async', which clearly show gaps in the functionality that Node's stdlib caters for.
Actually they don't; C standardization just dumped that standard library into POSIX instead, which is why all major OSes end up supporting POSIX if they want to make C developers feel at home.
And you usually see them cursing the platforms that don't care about POSIX support.
And ISO C++ has repented of following in C's footsteps and has been improving the standard library since C++11, mostly by integrating Boost libraries.
I don't understand what point you're trying to make. All I did was ask a simple question: if trust in Rust is hard, how do you handle trust in other ecosystems? The details on exactly how big or how small the standard libraries are (or why they are that way) are less important for this particular question. Consider the size of Python's standard library vs C or C++ or Rust. The size of C or C++ is much closer to Rust's size than Python's size.
Also, while some of your historical context is interesting, I don't really appreciate your editorialization, which I often find is off the mark personally.
My point is that although many only think about ISO C libc when talking about C's standard library, the reality is that with a few exceptions, libc always goes alongside POSIX across the large majority of platforms with a C compiler.
So in reality POSIX complements libc as C's "runtime platform", even though ISO C never considered to make libc that big.
My historical context is how I experienced things as they happened, through the media we had available at the time.
I surely welcome factual corrections when I am off the mark.
Everyone benefits from learning accurate history.
> My point is that although many only think about ISO C libc when talking about C's standard library
I wasn't. Even with POSIX, it is spartan by today's standards. Look at the standard libraries of Python and Go. POSIX doesn't have JSON (de)serialization, HTTP servers, XML parsing and a whole boatload of other crap. So I don't think there is anything wrong with my characterization.
> not all of rand though, just the "give me a random number" functionality that comprises 99% of the use cases but 10% of the implementation complexity
Are you suggesting including the core `rand` crate with some traits and a basic pseudo-random generator in the stdlib, and allowing other crates to use those traits to implement the many other[1] RNGs?
I'm honestly not sure this is a good idea; the rand crate has undergone a few iterations, and crystallizing this inside the stdlib might lead to problems.
I admit, the design of a proper random number library API is tricky, and experience from deploying rand will no doubt be invaluable. The point I was trying to make is that some use cases require sophistication, such as being able to choose different algorithms, but a lot of the time when rand shows up in a build-time histogram, it's just because the user wanted some pretty good random numbers; a much simpler API would suffice.
I agree that in many cases a much simpler API would suffice, but it strikes me as odd to suggest adding a convenient but insufficient-for-security API to the same standard library that protects HashMap against DoS.
The request here is for a simpler API. I and many others would be more than happy to get a function that returns a random u64. No traits, no crippling commitments to a specific API design.
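To be concrete about how little is being asked for, this is roughly what "give me a random u64" looks like today with the rand crate added as a dependency; a hypothetical std version would presumably be about as terse:

```rust
fn main() {
    // rand::random() draws from a thread-local generator that is seeded from
    // the operating system on first use.
    let x: u64 = rand::random();
    println!("{}", x);
}
```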
But the std hashmap implementation[0] already depends on a PRNG via the rand crate[1], and thus OS random (for the initial seed). So it's in the standard library even if you can't use it directly. Which is what makes me think this is an API issue, not an implementation one.
It does depend on this yes, but the random seed isn’t part of the public API. An RNG API in std would need to pick an algorithm and make that part of its stability guarantees.
In this case I'd quibble that it wouldn't need to guarantee a specific algorithm, just a minimal set of properties. If the user can't manually set the seed then there will be no expectation that it's deterministic or follows any particular algorithm. So the precise implementation can change with versions or even platforms.
JavaScript's Math.random and Crypto.getRandomValues work this way.
Which is fine. Not everything needs SecureRandom. I'd wager more code using random numbers is for A/B tests, sample data generation or video games than people implementing crypto.
I get that security people don't want non-secure RNGs to exist out of fear that someone might try to implement security-related functions with them, but why should I care if I just want to choose between 3 types of enemies this wave?
No worries, no offense taken. I agree, designing a good random number API is in fact hard. I just think we can give developers a better out-of-the-box experience.
Having a very simple API is not at odds with security, though. A simple API could automatically seed from the OS and then run a tiny loop with sha2 or sha3, for example.
SHA2 is not designed to be used as a RNG. SHA3 might be a little bit better with its "streaming" modes, but there are many far better and faster ways to get random bits.
2. You definitely don't want your std to be designed with a simple API to a fast, secure, and popular crypto primitive? Pretend I named your favorite one, to avoid bikeshedding issues.
Inclusion of parts of rand might be a good idea, but syn seems to need breaking changes as for long as the Rust language is getting new features:
> Be aware that the underlying Rust language will continue to evolve. Syn is able to accommodate most kinds of Rust grammar changes via the nonexhaustive enums and Verbatim variants in the syntax tree, but we will plan to put out new major versions on a 12 to 24 month cadence to incorporate ongoing language changes as needed.
My core argument is that libraries usually do need breaking changes, and including them into std turns them into zombies that can never be removed. I think python has issues with this. And the syn example that I pointed out above shows that Rust isn't immune from this either.
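The `#[non_exhaustive]` mechanism mentioned in the quoted syn documentation works roughly like this; a minimal sketch with an invented enum, not syn's actual types:

```rust
// Marking the enum non-exhaustive means downstream crates cannot match it
// exhaustively; they must keep a wildcard arm.
#[non_exhaustive]
pub enum Expr {
    Literal(i64),
    Binary(Box<Expr>, Box<Expr>),
}

pub fn describe(e: &Expr) -> &'static str {
    match e {
        Expr::Literal(_) => "literal",
        // In a downstream crate this arm is mandatory, so adding a new variant
        // later is not a breaking change; it just falls through to here.
        _ => "something else",
    }
}

fn main() {
    let e = Expr::Binary(Box::new(Expr::Literal(1)), Box::new(Expr::Literal(2)));
    println!("{}", describe(&e));
}
```

The limitation still stands, though: this absorbs new variants, but a grammar change that alters the shape of an existing variant would still be a breaking change.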
Right, I see your point here. There are non-trivial issues that need to be worked out, which I'm sure is one reason it's not in the library yet. Another (mentioned elsewhere) is that it takes time to converge on what a good API might look like.
I feel like a reasonable middle ground to these issues is for communities to perhaps embrace “metapackages” that serve as community maintained “standard libraries”. In the R community we have tidyverse, which is practically a full mirror of R’s standard libs at this point.
These metapackage communities can then focus on interoperability of constituent packages w/o overburdening the standard libraries, and core language can remain lean and concise.
Tidyverse is written by a strong contributor to R, but it isn't used regularly by a large percentage of the community, so I'd be averse to calling it part of the standard libraries - although I appreciate this is what you're referring to with the speech marks.
On the point of the post, Tidyverse adds a great deal of bloat by loading unnecessary packages instead of specific ones, and it creates namespace issues where two functions have the same name. It's generally fine for interactive work, but it causes so many issues in development, as you can't pick and choose what's loaded into the environment.
In line with what you're saying, meta-libraries make sense in terms of developing a line of packages towards a singular vision, but only when the core libraries aren't expanded regularly. Maybe in these cases more of a push needs to be made towards supporting the existing tools?
R is an odd example though, as the standard libraries are loaded by default and I think really only comprise basic stats/graphics/data.frame tools. I don't believe a great deal of extra tooling has been added to base R in the last few years; it has mostly been improved in terms of speed and memory use (e.g. 3.5.0's change to compiled packages).
Several attempts have been made at this in Rust in the past, but few people use them. It just adds even more dependencies that you’re not actually using.
You mention Python, but I am pretty sure you've written some popular Go packages, so what is your opinion on the Go standard library? I think it's a good example of 'batteries included' in the right sense. Though some of the reason it's good may just be by virtue of the fact that it's newer and there are fewer rotting packages; I guess only time will tell for sure, but it definitely feels right to me in many cases.
I like Go's standard library. It's well designed. The number of pitfalls is pretty small.
I have more thoughts, but they are very hand wavy and ill-formed, so please take them with a grain of salt. One of my theories for why the Go standard library has had as much success as it has, is that it doesn't necessarily provide implementations that go as fast as reasonably possible, and that tends to give more flexibility for exposing simpler APIs. A good microcosm of this idea is JSON (de)serialization. Without even blinking, I can think of three reasonably popular third party JSON (de)serialization libraries in the Go ecosystem. The one provided by the standard library is pretty slow compared to some of them. It's not clear to me that it can be fixed without changing the API. But it's a good example where the standard library has provided something, but it isn't good enough in a lot of cases, so folks wind up bringing in a third party dependency for it anyway.
But even that alone isn't necessarily a bad thing. encoding/json is likely good enough for a really large number of use cases. On top of that, it's very convenient to use. (I'd still take serde in Rust over Go's system any day, but that's a different conversation.) And this kind of fits within Go norms pretty well. Go was never built to be the fastest, so the fact that some of its standard library has perhaps sacrificed performance for some API simplicity is totally consistent with that norm. And I don't think that norm is a bad thing.
There are some other examples where Go's standard library is slower than what it could be, for example, CSV parsing and walking a directory hierarchy.
Overall, I think the balance struck by Go's standard library was very nicely done. However, I'm not convinced it could have been replicated by Rust. The reasons for that are just guesses, and wander too far into musings about how the language is itself developed. But even putting that aside, Rust is going to have stricter requirements, because people tend to gravitate toward Rust when performance is important. So if std doesn't provide the fastest possible thing, then it's going to be a bigger deal than if Go does the same thing.
Again, above is super hand wavy and just a bunch of opinions from my own personal perspective.
> I like Go's standard library. It's well designed.
It feels to me more like code incidental to other purposes than a library-as-library. Some folks like that and extracting from a concrete use is often instructive and efficient.
On the other hand, there is a truly maddening inconsistency in whether you get an interface or a physical struct.
One of these lends itself to easy replacement, injection and mocking.
The other lends itself to writing the nth-tillion interface wrapper for the parts of the standard library which use physical structs. Which are, naturally, slightly different from and therefore incompatible with everyone else's bangzillionth interface wrapper for the parts of the standard library which use physical structs.
Then there's errors. But that's another day's rant.
> On the other hand, there is a truly maddening inconsistency in whether you get an interface or a physical struct.
That’s an issue, though it’s also one with most Go libraries too.
The thing is though, unnecessary interfaces also kind of suck. It makes code harder to follow, when the concrete type is hidden behind an interface. Also, one of the things that’s common in Go is testing with real implementations, rather than mocking - and I used to do just that. Even with Redis, I had a tiny shim server that implements the Redis protocol, that I would use in tests. Not absolutely everything can be done efficiently this way, but the virtues of testing with real clients are hard to ignore. Mocks and stubs can hide a lot of bugs that you would need to hope are caught by slower integration or e2e tests... And by virtue of being slower, they generally would cover less branches, too.
> Then there's errors. But that's another day's rant.
Have you kept up with the latest? I think they’re headed in the right direction with errors. Specifically with Is and As, along with the %w directive. The %w directive is a bit weird, but honestly, it’s a clever solution, and it seems like it would work.
Thanks, that seems reasonable. I still hope Rust can strike a better balance than it has now, which perhaps will become easier as champions emerge in the ecosystem. (Crossbeam stuff in standard library would be nice.)
A better choice would be enterprise titans like Java and .NET, more bases are covered. Go's GUI story for example is just endless fragmentation compared to e.g. Java.
This comment indicates the problem! Taking the scare-quotes "enterprise" as a synonym for "bloat", one person's necessary feature is another person's bloat.
.NET is an interesting case, because of WinForms, which is a de facto part of the standard library. I think few C# developers think of WinForms as bloat; it's very convenient for making simple UIs. Yet putting, say, GTK+, in the Go standard library would doubtless be considered bloat. I don't think there are easy answers to these questions.
Agree, the "enterprise" jibe here is misguided - as an OSS developer, I find the dotnet standard library to be fantastic. I don't find it to be at all bloated or 'enterprisey'.
This is why I chose the word "anachronistic". It seems of another time, because it is. It's definitely hard to figure out what will and won't be timeless, but it isn't hard to look back with hindsight and point out things that definitely weren't.
Yes, and instead now we have devs forced to build SPAs and architect every single thing as a client-server app with a database attached. I get it, SaaS is great for vendor lock-in. But not every in-house tool needs to be run as a service.
I think not anymore, since java.time - which is incredibly similar to Joda - came around? This is actually an illustrative example of the process I like and hope Rust will develop over time: let the community reach a consensus on the best third-party libraries, then consider pulling them in to, or at least taking the best parts of their APIs for, the standard library.
Other Java stdlib packages can't depend on Joda-time. If it wasn't added and I used joda-time I'd have to convert to the old datetime classes if I wanted to use it with stdlib packages.
Another example was CompletableFutures which were inspired by ListenableFutures from Guava.
I can now use these with guarantees that they will be stable as Java has strong commitments to backwards compatibility.
The JRE should be self-sufficient. By bundling java.time, it can finally start offering methods that take and return those types, instead of the current jumble of millis, nanos, long+TimeUnit pairs, Dates, and Calendars.
There's a cost to discovering the consensus choices that I think inclusion in a standard lib minimizes. But there may be other similarly good ways to accomplish this. If I'm remembering correctly, doesn't Rust have a set of libraries that aren't in the standard library but are somehow vouched for? Maybe that's a similarly good approach to solve the discovery problem, I'm not sure.
Ok, thanks. Is the intention for that to grow, as a curated set of libraries that isn't quite the standard library? Or do you think some of those will move into the standard library if they become canonical or stable enough?
I can think of a bunch of stuff that has moved or is moving from third party crates into the standard library: parking_lot, hashbrown, (minimal) Future trait. But I agree that no crate has moved wholesale into the standard lib.
parking lot and hashbrown moved their internals, replacing ones that already existed. This is probably a distinction that doesn’t actually matter but in my head it’s different for some reason, thanks for pointing it out :)
> A case in point is how the ORM Entity Framework that comes with .NET has made the older NHibernate (a separate package) obsolete
This did happen over time, but NHibernate was still really popular for a long time after EF came out, because of limitations it had.
I also don't think it was entirely because EF existed - over time, EF implemented more and more features that NHibernate had, yet at the same time it seemed like the NHibernate team had given up - there were no updates to it for a long time. It was the lack of updates that moved me to EF, but I always preferred NHibernate.
I may not be a typical .Net developer, but my personal feeling is that this is too general and somewhat glib.
If you're inexperienced you will (and should!) choose the default option, if there's one available, and the most commonly used option if there isn't a default. So, if you wanted an ORM, then before EF there was NHibernate. But after a while you become able (from painfully gained experience) to determine what you want from a tool and what trade-offs you want to make, so you might use something like Dapper or no ORM at all.
Personally, I have found some of the MS provided implementations - shall we say - less than optimal. The Unity Framework. EF (which I dearly wish I have never had to suffer using). MSTest. Enterprise Library (oh god ugh).
So I make other - informed - choices about what to use. Some things I keep using because they are 'good enough' and I have a library of utilities and a mental map of how they work - log4net, NUnit - and some things I find I just don't need any more (mocking libraries, for one).
MS still support their provided implementations - as they should - but even their own projects can and do use third-party frameworks rather than the MS provided implementation (for example, Bot Framework used AutoFac (back when I was using it, anyway) [1]) - because their developers have been released from the requirement to exclusively use MS provided frameworks, and are making their own choices about what to use in their projects, and consequently what their users should use when using those tools.
Eventually, of course, if a tool is so crucial that it becomes part of .Net itself - the best example I can think of is dependency injection in .Net Core, which is in Microsoft.Extensions.DependencyInjection - you'd have to be mighty stubborn to use anything else.
One other point: I wouldn't describe NHibernate as 'obsolete', but given their historic and chronic inability to keep their documentation sites live and working ([2] referenced from [3]) it's easy to get that impression. But people are certainly still using it [4]. Just not as many as there used to be.
I'm not saying .NET developers will never use other libraries as an alternative, but they won't reinvent wheels (or use reinvented wheels) where the .NET-provided one is solid. I am speaking generally of course. And there are many developers who are .NET + something else. I'm .NET and have dabbled with Haskell and Node JS, for example. I'm looking at Lisp as it is interesting. But let's write a web app in .NET. How many .NET developers will think "which framework should I use?"? That is a valid question in JS/Haskell/Lisp. It's quite interesting, and it's a positive for .NET in many ways. You can get stuff done, and also come in on a project and have it be familiar.
The interesting thing about the way things have developed with using third party dependencies early on is that these dependencies provide us with data.
We don't have to just guess or make biased claims about what would be useful to move to the standard library. We can look at the numbers and see what people are actually using.
I am a Rust user myself and I think one major problem is that being on crates.io says nothing about the quality of the code. I have never published a single crate myself because, in my eyes, to be worth publishing a crate should work decently.
That being said, I think the whole Rust userbase would benefit from having some sort of collection of well-tested crates and a strict division between private, work-in-progress, and production crates.
I only know Node and Python well enough. I find with NPM that dependencies are indeed hell because anything goes. With Python it's slightly less bad because packages have a fixed tree of versions (although I don't know what's up with Conda or others yet), not to mention the more extensive standard lib. I suppose if there are real conflict workarounds required you could always use virtual environments all the way down, but I have generally just done whatever upgrading is required to resolve incompatibilities (which can sometimes preclude using third-party deps or require forking and upgrading their own deps).
I also remember from years ago that Bundler in Ruby allowed version mismatches to coexist and would install both deps in the same tree, punting any runtime issues (same as NPM).
Any Gentoo or NIX users might have something to add here, as I can remember being regaled at conferences by them about this topic.
All that said, it would seem that Rust could have some kind of super-intelligence about dependencies due to all the static goodness. So my questions are:
a) who cares how much is userland and how much is std lib if everything is equally safe and documentable?
b) if "too much" becomes std lib, could any feelings of overwhelmingness not be mitigated with more namespacing?
The canonical example I was thinking of was this: https://doc.rust-lang.org/nomicon/ffi.html
We only get as far as the second paragraph before the official manual is recommending that we use a 3rd party crate. I understand that this crate is made by the Rust developers, but this doesn't seem like a good approach to me.
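For the narrow case of calling into a static library, the language plus std is already enough; the `libc` crate mostly supplies type aliases and bindings for convenience. A minimal sketch, assuming a C function `int add_numbers(int, int)` is linked in (the function name is made up; linking the library itself would be done via a build script or a `#[link]` attribute):

```rust
use std::os::raw::c_int;

// Declaration of a foreign function provided by a linked C static library.
extern "C" {
    fn add_numbers(a: c_int, b: c_int) -> c_int;
}

fn main() {
    // Calls across the FFI boundary are unsafe because the compiler cannot
    // check the foreign side's signature or behavior.
    let sum = unsafe { add_numbers(2, 3) };
    println!("2 + 3 = {}", sum);
}
```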
I feel like the role of a standard library has become kind of overloaded. Back when you had to manage all your dependencies yourself, having that baseline of functionality bundled with the language and maintained by its authors was a logistical necessity. You just don't have time to chase deps for every little thing.
But in the modern world, I agree with the small language people: the logistical problem is dead. Almost every modern language has push-button dependency management, so the difficulty overhead of using a third-party library is basically zero. And, as you pointed out, this means that the stdlib now competes with third-party libraries, and the third-party libraries proliferate wildly. So there's still a problem, it's just not logistical anymore.
The other problem a stdlib solves is making decisions. This is the npm nightmare: which of these fifty libraries do I use? They're easy to install, but hard to evaluate. This compounds because every library is also making those decisions with their dependencies and so on, so you can end up with 8 different implementations of basically the same thing. You don't have this problem with a stdlib because if there's a std::string, everyone expects your code to work with std::string.
So perhaps the modern stdlib is better off as a standards library: no code, just a mapping from name (ie "option_parser") to package@version (ie "clap@1.1.0"). The work of the stdlib would then be to curate this mapping in a cohesive way, so the packages are high quality individually, but also work well together and reflect the general direction of the language and its community. Whether or not these packages are actually bundled with the compiler is really just an implementation detail. The library isn't code; it's decisions.
Updates would be pinned to release versions, so backwards-incompatible changes coincide with a major version/edition/etc of the language. This would be an expected process, because, as with urllib{n+1}, the first answer isn't always the best. Third-party packages are just as available as ever, of course, and a destandardised package is only a (trivially automated) rename away.
Aside from providing a layer of simplicity and stability over the third-party ecosystem, the standardised libraries would serve as reference implementations for a common interface, like Node's express/connect, or Rust's tokio/futures. In other words, it could help concentrate community effort around emerging standards.
I grant that this is a very, uh, political approach to what has historically been a code problem, but if anything I think the current situation owes itself largely to treating community problems ("how do we agree on a foundational layer of library code?") with technical solutions (package registry + sort by stars).
The role of the standard library is entangled with the role of the language as a whole. Many people use the Python interpreter as a sort of advanced calculator. If batteries weren't included there, this whole use case would suffer.
There are a lot of us out there with dev and prod environments that aren’t allowed to be connected to the Internet or otherwise strictly controlled. Languages that rely on me downloading packages are mostly dead in the water at my workplace.
Virtually no modern language or package system requires this. All of them allow you to cache dependencies locally, so they can be committed to version control or otherwise put in place during deployment however you see fit.
Maybe I didn't communicate my point clearly. I cannot just go out and get whatever dependencies I want to use. It isn't possible. What I have on the system is what I've got, and I can't add anything. I cannot put the dependencies on the system by any means.
Thus a lot of things get done in Python and Java, because of the ample standard libraries. I was able, after a year of lobbying and procedures and approvals, to get a Rust compiler, but there is zero chance of me ever getting anything off crates.io.
Firefox, Debian, and many other build systems aren’t either. That’s orthogonal from all of this; the use case is well supported and has been for years.
My main gripe with crates.io is that they allowed everyone to take every name. The "foo" crate just belongs to the first person who took (or squatted) that name, regardless of whether that person actually implements a nice library under that name.
What they should've done is allow uploads only to "<username>/<cratename>". So if I decide to make a regex crate, it's called "majewsky/regex" at first and Alice can make "alice/regex" and Bob can make "bob/regex". Then at some point, the community (through some open process) decides that Alice's crate has the best API, so "alice/regex" gets aliased to just "regex".
That way, everything in the main namespace adheres to some sort of quality standard and has community support behind it. Because there's some explicit process gate to getting stuff into the main namespace, you could attach any number of beneficial requirements to it, e.g.:
- test coverage
- documentation coverage
- at least 3 people having committer rights to the repo, at least 2 of which must not be affiliated with the same company
Apparently that was done in the Ruby community and Github "helpfully" uploaded all the Ruby packages to the gems repository meaning all 217 forks of https://github.com/httprb/http became potential gems.
But the underlying idea is sound: namespace and federation. Rust should ideally follow Java's solution. And it's completely forward portable: when namespacing is introduced, the crates.io/regex crates become io.crates.regex. Then you can have your com.github.majewsky.regex.
I don't particularly like URL-derived package names, esp. with Go where the repo URL is auto-derived from the package name, because that makes it an absolute pita to move the repo to a new canonical location.
That basically just moves the problem to a land grab for the group name. Because people don't think Alice/regex looks professional enough, there will be a rush for regex/regex.
The language is still 'fairly new'. I'd rather the lang team let popular things cook as a 3rd party package for a while until a good default solution shakes out. For instance, hashbrown was just pulled into the stdlib. Some people are taking a look at bringing Crossbeam into the stdlib.
[1] https://internals.rust-lang.org/t/proposal-new-channels-for-...
Rust does need a better package curation story, a Rust "expanded universe" of recommended packages whose provenance is more carefully tracked than the average package and whose APIs are stable. The Rust community needs to encourage other packages to depend on stable versions in that recommended set to reduce duplication.
But going further and actually making that set to be part of the "standard library", and thus released at the cadence of the Rust language itself, and managed under the same umbrella, would be harmful.
Rust used to have getopt in the standard library. It was removed because it was bad. Clap only came later. I don't think Rust would have been a better language if the old getopt were kept around.
One very quick comment in support of a strong first party library: it’s proven to be much simpler, and much more common, for third party libraries to be compromised and everything from intentional security holes to full on malware to be bundled into the distributed packages.
I could pin every dependency, and monitor CVEs for all my dependencies, and monitor the ownership/code changes for all my dependencies, and hope for updates in a timely manner should there be CVEs... or I can use the standard library and move on with my life.
Personal opinion of course. Some people enjoy monitoring CVE lists. :)
Isn't it at least partly explained by wanting to allow implementations of this functionality to evolve more rapidly than they could if they were in the stdlib and thus reach good solutions more rapidly? getopt in particular seems like something that programmers haven't reached a consensus on despite 40 years of experimentation.
I would beg to differ with this. The kind of functionality expected from getopt is fairly standard these days. I also disagree that features like getopt really need to 'evolve' much at this point in time. At the risk of courting controversy here, I'm going to confess that often I just don't want the community involved in the development of a language. On a project I inherited recently (with the mission to save it) I unfortunately had to suffer using TypeScript. If you look at the community discussions regarding the language's future direction, you'll be treated to the uninformed arguing with the ignorant about the language's direction.
Every time I download an external crate, I have to learn a different developer's way of doing things, suffer their idiosyncratic ways of creating an API and potentially expose myself or my application to a new set of vulnerabilities. The more I can get away with not doing this, the better.
More often than not, absolutely. It requires a much higher level of competence to design a language and develop a functional compiler than it does to design libraries. There are packages in Node with millions of downloads, packages that are basically ubiquitous in certain domains that are riddled with bugs, with terrible interfaces and documentation. I can even think of libraries I've worked with in Java/C/.net/etc that are just horrific... If the languages themselves were as badly designed as the average library, they'd never succeed.
Language design and compiler implementation are definitely high-competency skills, but they don't necessarily correlate with library design skills.
For instance, http client and server libraries are often in this gray area of uncertainty about whether they should be in the stdlib or not. Is this something language designers or compiler implementors have a lot of experience in? I would say not; sending and serving http requests are not something compilers need to do. Or take GUI libraries, what do language designers know about that?
I also know of many bad third party libraries, but there are tons of examples of really awful parts of standard libraries. The original date/time APIs in Java are a mess (they finally fixed this by in essence bringing in a third party API). The ssl bindings in the Ruby stdlib were a common source of bugs back when I was paying attention to this (maybe they've fixed it), same thing for the built in http stuff. Someone else mentioned the similar weakness of python's built in http, such that most people use a third party library instead. Even the java collections APIs are pretty poor such that people often augment them with things like guava or apache libraries.
My point is just that developing good libraries is a hard thing and I don't see any reason to think language designers or compiler maintainers are any better (or worse!) at it than other people. There isn't really a shortcut, you can't just cede authority to the powers that be on the core language teams, you just have to evaluate the quality of libraries for your use case yourself.
You've made a very good point that proficiency in developing compiler infrastructure does not imply that you're qualified to develop every specific aspect of a standard library. Date/Time, as you pointed out, is a very good example of this. It's a very complex domain that requires specialised knowledge.
I'll counter this by saying that one aspect of language design is choosing the scope of the project and deciding how best to implement a standard library targeting the language's intended domain. If your language is designed for implementing web servers, then developing a GUI library might be a poor investment. Conversely, if your language is designed for implementing system applications, then investing time and talent into developing things like FFI, filesystem, and GUI functionality are just the prerequisites to the language being useful in its intended domain.
Yes I do sympathize with this in that, put bluntly, I'd also prefer that a programming language be developed by a smaller team of highly skilled people, as opposed to making the process as accessible as possible. But I really think there are ways for everyone to help out while retaining the feature that the most important and challenging parts are contributed to by the appropriate people, without upsetting anyone. And I think it's possible that Rust might be a shining example of such a thing. Clearly, in modern western society, it's hard to avoid this discussion acquiring a political dimension. And honestly that is something the open source community might need to address openly and attempt to do a better job of schism-avoidance than other areas of society.
On the flip side Lua is a wonderful language because its small lib lets it go anywhere.
We used to run a whole gamestate of a shipped title on PSP in a 400kb block allocation. I've yet to see that in any other dynamic language of consequence.
When I was still in that 'I should write a programming language' stage of career development, I worked on a pretty sophisticated (for the era) mobile app. PyPy was getting quite a bit of press around that time and my brain connected some dots.
One of the ideas I wanted very much to explore is scaling the API, both up and down. For building something akin to PyPy, you might want a 'kernel', a small set of libraries that were available everywhere, and several other levels that include more or different things.
Mobile takes the base and adds a few things suitable for mobile (storage, UI, broader networking). Desktop has a real UI, and then there's the kitchen sink like .Net and Java have.
But you have the same problems you always have with decomposition - if you didn't guess the right boundaries when you built the thing then removing or rearranging bits is a serious PITA. Sometimes I think the best we can hope for is to leave clear messages for the next language so that it doesn't organize things the way we did.
LuaJIT is incredibly fast. It might just be the fastest scripting language out there.
I think this was only possible because Lua itself is such a simple language. It says a lot that LuaJIT's implementation of this simple language is extremely complex in comparison to vanilla 5.1. Imagine the added complexity for something like Ruby that offers more than one associative data structure.
With LOVE you get a complete game engine runtime with all the boring OS abstractions taken care of and you can just start working.
I tried a few game development bouts with Rust since it seemed like a natural step up from C++ but the compile cycle kills a lot of my creative drive.
I think this is just a consequence of the language being compiled, as with C++. Rust's wide variety of language features increases compile time further. At least we're lucky to have languages today that offer far shorter turnaround times for building prototypes.
Statically compiled languages like Rust can have as large a standard library as they want, because only the code that is actually used will get included in the binary.
> but having a package system shouldn't be a substitute for designing a useful standard library for a language.
Completely agree. I hold firm to the belief that a strong standard library that considers modern software development goals will drive the success of a language.
Look at Go. Which is usually my example since not many do what Go does. You can do a whole web application in Go with minimal use (if any) of external libraries.
Back to Rust and, in its defense: they intend to adopt popular / quality packages from Cargo into the standard library, or to fill the gap between packages.
I secretly wish D had similar things to Go in its standard library: HTTP, SMTP, etc. Python seems to have at least HTTP, and sadly Go ditched its SMTP package for whatever reason, making it awkward.
I hate having to learn a new package manager per new language. I love programming, so I try a lot of languages out of love. Package managers and build systems are horrible UX in every language. I prefer to not rely on third-party (oh look, now deprecated) packages and to start out with what's out of the box.
Given there are no infinite resources, one has to choose where to focus. Rust team seems to be focusing on the language and solving hard-to-deal-with but relevant problems - and on evolving the language, which is truly a remarkable feat when you are pushing the boundaries.
Libraries can be implemented by 3rd parties and maybe later adopted as standard libraries or become de-facto standards. That's how it has been done for most popular languages out there, and I think it's a smart decision.
Go's standard library works for Go because it has a rather sharp focus on implementing web services. It's worthless for implementing a GUI application, or a particle physics simulation, or a PID-1 daemon.
Rust has a much broader aim, so a stdlib accommodating all of its usecases would be as comically huge as Python's, with all the problems that come from that.
It's also good for crypto, image processing, logging, file compression, and other commonly useful things. Go does have an emphasis on network services but is not limited to that.
"a stdlib accommodating all of its usecases would be as comically huge as Python's, with all the problems that come from that."
That's a straw man. I don't want a stdlib like Python's, I want one like Go's.
I think it is a little ironic that he speaks of performance culture but simultaneously advises using dynamic dispatch and avoiding polymorphism. I can see the justification in non-critical code paths, but serialisation is a pretty important part of most networked software nowadays, so I do not think that smaller binaries and faster compilation times (a better developer experience) justify a performance hit in the form of dynamic dispatch through crates like miniserde.
Performance culture has you measure the actual performance implications, then make an informed decision. Is the code on a performance-critical path? Maybe some of your serialization code is, but it's extremely unlikely that a dynamic dispatch when parsing command line args is the reason your app is slow. Also be aware that highly inlined code does nicely in microbenchmarks but might have significantly negative performance implications in a larger system when it blows out the I-cache.
> Also be aware that highly inlined code does nicely in microbenchmarks but might have significantly negative performance implications in a larger system when it blows out the I-cache.
I see this assertion a lot, but I have never actually seen a system in which inlining that would otherwise be a win in terms of performance becomes a loss in a large system. LLVM developers seem to agree, because LLVM is quite aggressive in inlining (the joke is that LLVM's inlining heuristic is "yes").
I'd be curious to see any examples of I$ effects from inlining specifically mattering in practice in large systems.
Fiora refactored the MMU code emitted by Dolphin to a far jump, which had significant performance improvements over inlining the code [0]. She had an article about it in PoC || GTFO [1].
Interesting, that's a good case. Though it's a bit of an extreme one, because it's jitcode for a CPU emulator. I'm not sure how relevant that is to Rust, though it's certainly worth keeping in mind.
In my experience, i$ is much bigger than everyone thinks, and they over-emphasize optimizing for it whenever someone brings up code size. It can soak up a lot. That said, for JITs, where code is not accessed very often and in weird patterns, it can matter quite a lot.
Hm, I've run a lot of profiling of various software through the years, and not once have instruction cache misses been a problem, even in large, template-rich, Boost-heavy C++ codebases.
Systems that JIT large amounts of code (HHVM, etc) deal with this trade-off all the time. See e.g. https://qconsf.com/sf2012/dl/qcon-sanfran-2012/slides/KeithA... for an old discussion of some of the issues (e.g. inlined/specialized versions of memcpy were slower overall than a standalone-slower outlined version).
You're asking for something that is a bit awkward to find, because it requires a bunch of code in a loop to pressure the cache, and then someone has to notice that inlining one thing vs. not makes all the difference.
The most likely people to be able to answer this one would be game devs or video codec hackers, at a guess.
I do know that inlining choices can have massive effects on executable size. I've seen more people complain about this kind of thing. It's most noticeable when controlling the inlining of a runtime library function in a language a bit more high level than Rust - I'm thinking of Delphi, with its managed strings, arrays, interfaces etc.
One somewhat related example I can think of was how the v8 javascript implementation switched from a baseline compiler to a baseline interpreter. The interpreter version has less startup latency (because compiling to bytecode is less work) and uses less RAM (because bytecodes are more compact).
It isn't exactly about inlining, but it is an example where optimizing for size also optimized for speed at the same time.
Any time you have an error/exception/abort path, you almost never want to inline it (LLVM probably has attributes to prevent that, but I'm not sure if they are used by Rust). Also, LLVM does get a little too aggressive with things like unrolling, so I wouldn't be surprised if it inlined too aggressively too.
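For what it's worth, Rust does expose hints for exactly this; a minimal sketch (the function names and error handling are made up for illustration):
    // Mark the failure path as unlikely and keep it out of the hot caller.
    #[cold]
    #[inline(never)]
    fn report_parse_error(line: usize) -> ! {
        eprintln!("parse error on line {}", line);
        std::process::exit(1);
    }
    fn parse(line: usize, input: &str) -> u64 {
        match input.trim().parse() {
            Ok(v) => v,
            // The cold, never-inlined path keeps this function small.
            Err(_) => report_parse_error(line),
        }
    }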
I do agree with you on the measuring aspect. Part of building high-performance systems is being able to measure performance in an accurate and actionable way, and consequently optimise code paths that have significant performance impacts.
However, I do believe that a competent engineer would have the judgement to be able to see, roughly, where performance hits would likely arise and optimise accordingly. Command-line arguments would likely not fall under this mandate, but serialisation to stdout is likely a good candidate for well-designed and well-optimised code. A nice side-effect is that this also avoids significant refactors down the line when you need to, in this case, change your serialisation from dynamic dispatch to static dispatch.
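As a purely illustrative sketch of that "measure before optimising" step, using only std (the workload and iteration count are placeholders for whatever path you suspect is hot):
    use std::time::Instant;
    fn main() {
        // Stand-in workload: pretend this is the code path under suspicion.
        let args = vec!["--verbose", "--output", "out.txt"];
        let start = Instant::now();
        let mut hits = 0usize;
        for _ in 0..1_000_000 {
            hits += args.iter().filter(|a| a.starts_with("--")).count();
        }
        // Print the result so the optimizer can't discard the loop.
        println!("counted {} flags in {:?}", hits, start.elapsed());
    }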
One thing I like about Rust a lot is that it lets you choose which hit you want to take when it comes to polymorphism. You can manually (and without much difficulty!) prefer dynamic dispatch if it's important to you to keep your binary size small, but you can also choose static dispatch and allow some replicated copies of your parametric code if that's what you want to optimize for.
It's also worth noting that if you are using polymorphic functions that are only ever called in your code with a single known type parameter, then your program should be just as efficient as if you wrote a monomorphic version with that fixed type in both runtime _and_ binary size, which means the only disadvantage to the polymorphism there is compile time.
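A tiny illustration of that choice, with a made-up trait; both functions do the same work, they just pay for it differently:
    trait Shape {
        fn area(&self) -> f64;
    }
    // Static dispatch: monomorphized per concrete type, fastest calls,
    // but one compiled copy per type you use it with.
    fn total_area_generic<T: Shape>(shapes: &[T]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }
    // Dynamic dispatch: a single compiled copy, calls go through a vtable.
    fn total_area_dyn(shapes: &[&dyn Shape]) -> f64 {
        shapes.iter().map(|s| s.area()).sum()
    }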
It would be nice if Box<dyn Trait> implemented Trait. Then we would be able to write the function only once with parametric polymorphism and then at callsite decide if we want to monomorphize for the given type or not.
You can provide this implementation yourself easily enough though. I agree it's maybe not ideal that this needs to be done for every Trait you want this behavior for.
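For a made-up trait, the forwarding impl is just a few lines:
    trait Draw {
        fn draw(&self) -> String;
    }
    // Forward the trait through the box so a Box<dyn Draw>
    // can be passed anywhere a T: Draw is expected.
    impl Draw for Box<dyn Draw> {
        fn draw(&self) -> String {
            (**self).draw()
        }
    }
    fn render<T: Draw>(item: &T) -> String {
        item.draw()
    }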
I’ve been using C and C++ (sorry Rust), like, forever, and I think I should hate them with every fiber of my soul. A high-level programming language is “supposed” to let me forget, for the higher being’s sake, all machine-specific details and focus on the logic of the problem at hand. (Hello FORTRAN.)
If you, as the programmer, would prefer not to know and remember machine specific details, you can use rust or c++ easily. Just don't think you will always get the best performance possible for your hardware. And I think that is justified and reasonable.
I've always thought the fact that parametric polymorphism always results in monomorphised code is just a temporary deficiency in the language/compiler.
I think if a problem can be solved with either, the default choice should generally be parametric polymorphism rather than subtype polymorphism, simply from a logical point of view.
Haskell is often described as passing around "dictionaries" corresponding to class instances. Presumably Rust could add the same functionality depending on how a type parameter is declared (eg, `fn foo<T: Foo>() ..` denotes a monomorphised function whereas `fn foo<%T: Foo>() ..` could denote a non-monomorphised function which at runtime takes arguments specifying the size and alignment of `T` as well as a vtable corresponding to the `Foo` trait). This would also make polymorphic recursion possible.
Swift takes this approach: monomorphisation is an implementation detail/optimisation.
In languages like Rust and Swift where values don't have a uniform representation (that is, not always a pointer), this takes a lot of infrastructure, and a lot of performance/optimiser work to get reasonable performance for common code: the Swift compiler has quite a bit of code devoted to making generically-typed values behave mostly like statically-typed ones, with minimal performance cliffs.
Rust's approach is that this sort of vtable-/dictionary-passing has to be done explicitly (with a trait object), and such values have restrictions.
To a very large extent, we already have this with `fn foo(foo: &dyn T)` (or `foo: Box<dyn T>` for the owned version). What I would find even more interesting is the compiler much more aggressively factoring out the common code from the multiple instances, ideally compiling it only once and putting only that one version in the binary.
`T` there would be a trait though, not a type. You should still be able to have, for example, `fn foo<%T>(v: Vec<T>)` in which case you can still call the function with a regular `Vec<i32>`, since if the size/alignment is simply passed as an argument at runtime, you can operate on the existing non-boxed representations of data. The only thing that's different is the specialisation of the instructions essentially happens at runtime rather than at compile time.
I have however thought that it should be automatically done based on heuristics, but since the notion of "zero-cost abstraction" is considered the default, I don't think this would be desirable.
Well, I have this anecdote. We switched from serde to our own serialization / deserialization scheme (it still uses serde, but only for the JSON part), which is heavily based on dynamic dispatch, and actually got it faster.
It wasn't an apples-to-apples comparison, but it was several times faster at the time (my memory doesn't serve me well, but something around 3x to 5x). Compile times also went down (well, at the time :) ). It was mostly due to how some of the features work in serde (flatten and tagged enums), though.
I made a separate, cleaner, experiment (https://github.com/idubrov/dynser), which does not show that dramatic improvement (again, wasn't apples to apples, there were other factors which I don't remember), but shows some.
Can you do both? Fast compile time and slightly slower execution time for debug build using dynamic dispatch and long compile time and fast execution time for release build using static polymorphism from the same code base.
It could probably be done using conditional compilation, which Cargo supports, though that would require the programmer to write two versions of the same code. I doubt that the compiler could do this optimisation automatically.
Recall that dynamic dispatch does not require you, as the programmer, to know which implementation is being used for a given polymorphic method or function - I find it difficult to see how the compiler would be able to work out which implementation is being referred to in order to generate statically dispatched code without the programmer being explicit. If it could, there would be no need to be explicit at all (and consequently no need for static dispatch in Rust code); all you would need for polymorphism is dynamic dispatch. However, although this would be incredibly convenient and ergonomic, the Rust compiler is unfortunately not capable of magic.
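If you did write the two versions by hand, one purely illustrative sketch keeps a single call-site signature by having a thin generic shim erase the type in debug builds only (cfg(debug_assertions) is set for debug profiles by default):
    trait Render {
        fn render(&self) -> String;
    }
    // Release builds: fully monomorphized, best runtime performance.
    #[cfg(not(debug_assertions))]
    fn draw<T: Render>(item: &T) -> String {
        item.render()
    }
    // Debug builds: the generic shim immediately erases the type, so the
    // real body is compiled once and calls go through a vtable.
    #[cfg(debug_assertions)]
    fn draw<T: Render>(item: &T) -> String {
        draw_dyn(item)
    }
    #[cfg(debug_assertions)]
    fn draw_dyn(item: &dyn Render) -> String {
        item.render()
    }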
I think it's considered old hat by now for JITs in languages with polymorphism to inline a little bit of dynamic dispatch code into the call sites. The branch predictor gets to work its magic and removes the call overhead in a high number of cases.
I think I read somewhere that Javascript engines do something similar, with some extra code to de-optimize when you fiddle with the object prototype.
I’m not super familiar with the details, but all the the major Javascript engines definitely do extremely involved runtime optimizations, and I wouldn’t be surprised at all if the case you described is one of them.
It looks like the author is most bothered by compile times of dependencies.
Cargo needs to do better with shared caches (so you compile each dep at most once per machine) or ability to get precompiled crates (so you don't even compile it).
Incremental improvements of compiler speed or trimming of individual dependencies won't bring the 10x improvement it needs.
One of the biggest issues that I face with Rust is that its builds are enormous, and I often work on machines with limited disc space.
The actual binary sizes are fine - even with embedded devices that have <1MB of program space, there doesn't seem to be much bloat. But I need to remember to clean every project when I finish working on it, because otherwise the build folders will eat 100s-1000s of megabytes each on a 16-32GB partition.
I really appreciate Rust's inclusive approach to learning and teaching, but I can't justify using it for education on ultra-affordable machines for that reason. People often scatter one-off projects all over the place as they learn, and when they run out of space it takes a long time to clean everything up.
So I would also be very in favor of some sort of simple shared package cache.
It looks like you can set it using a CARGO_TARGET_DIR environment variable or build.target-dir config value. So I guess you could create a $HOME/.cargo/config file with the following in it (letting you skip the --target-dir part):
    [build]
    target-dir = "/home/user/build-artifacts"
But I don't know if the different projects will trample each other that way. If that's the case you could just go with a .cargo/config per-project targeting a subdirectory of the build-artifacts directory.
You can kind of hack a shared cache using cargo workspaces (put all your rust projects in one directory as part of a workspace) but that is far from ideal. All your projects will share a target directory and Cargo.lock.
I agree though. I don't mind binary sizes in the single megabytes for a release build, and I don't mind clean/rebuild build times that are a few dozen seconds. I do mind the 500MB of build artifacts in a single target directory, multiplied by all the projects I have that share the same version of serde/winapi/syn/quote/insert-common-dependency-here.
Have you tried sccache [0]? It doesn’t always choose to cache a dependency, but it helps about 70% of the time. Anecdotally, it hastened a release build of a pretty standard CLI tool (with incremental compilation) by almost 4x.
In the context of resource-constrained machines, one can always host it remotely on S3. (or mount an NFS share as the CARGO_TARGET_DIR, if you’re feeling adventurous or want fast CI)
Yes! Even using the local server on your own machine. It’s not as good as incremental, but it really does help when working with multiple projects that use the same deps in the same ways.
NB: it works a bit better on not-Mac because staticlibs are deterministic.
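For reference, wiring sccache in is usually just a matter of pointing cargo at it, e.g. in $HOME/.cargo/config (or by setting the RUSTC_WRAPPER environment variable to "sccache"):
    [build]
    rustc-wrapper = "sccache"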
But two minutes is still a lot of time to wait for a build, especially if you’re doing gamedev and want to prototype something fast.
It seems nalgebra is the culprit here: because Rust doesn’t yet support const generics, it has to use some hacky type-level metaprogramming to represent numbers, and that will definitely destroy build times.
2 minutes for a full build from scratch. Incremental builds afterwards take seconds, though unfortunately linking of big projects can still sometimes takes up to around a minute.
My toy project, which I ported from a Gtkmm article done in the days of "The C/C++ Users Journal", takes around 25 minutes to build from scratch on an Asus 1215B netbook (dual core, 8GB, HDD).
The original code, after being migrated to an up-to-date version of Gtkmm, takes a couple of seconds with GCC 7, and not more than a minute at worst.
The big difference? I don't need to compile from scratch all the 3rd party dependencies.
With every Rust release I do a clean build to assess how much it has improved.
It was much worse before, so congrats on the work achieved thus far, but setting up a project from scratch is still a pain.
And it's actually gotten worse since then, significantly worse. In the two months since then, the installer has increased from 203 MB to 299 MB. Also, unbelievably, Rust has failed to address package balkanization, which I would say has ruined the Node community. A popular package is "cargo-edit", which currently pulls in 239 other crates:
Unless I've misunderstood you, that would appear to be a different issue. The size of Rust platform tools doesn't necessarily lead to longer build times or bloat for applications built with Rust.
And IMHO your issue does appear to be taken seriously, at least judging from the link provided.
AFAIK the real problem in node is not number of packages in itself, but number of independent trust relationships implied by transitive dependencies. Basically how many separate people's integrity and security practices are you counting on when you install your reqs?
So a language could:
- identify a blessed set of packages that don't imply separate trust relationships (a stdlib, or packages maintained or audited by the language team)
A few ideas:
1) Remove the ability to unpublish / yank crates. A published crate should be immutable, but the crate's metadata should always be updateable by the maintainer.
2) Improve the metadata that describes a crate so that it is easy to tell if a crate should be used. For example, is the crate beta quality? Was a serious error found in the crate and it needs to be marked as "not safe"? Is it a Long-Term Support release? Etc.
3) As a culture, disallow trivial crates. No "is-odd" or similarly low-effort crates. These just add bloat, since they have so little functionality compared to their overhead. If your crate's toml is larger than the crate's code, you are doing it wrong.
Ironically, "no trivial crates" is almost exactly the opposite of what the article seems to want, which is only small crates so you're not importing lots of needless bloat. It's hard to please everyone!
I talk about this a bit, I'm in favor of at least medium granularity crates, but if they break down into smaller features, where different use cases will meaningfully choose different sets of features, use feature gates. So, for example, you might have a "string formatting utilities" crate with a "left-pad" feature. (Note: this particular example is unlikely because the `format!` macro in the standard library can do it just fine)
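A rough sketch of what that hypothetical string-utilities crate could look like, with the left-pad feature off by default (everything here is made up for illustration):
    # In the hypothetical crate's Cargo.toml:
    [features]
    default = []
    left-pad = []

    // In its lib.rs, compiled only when the feature is enabled:
    #[cfg(feature = "left-pad")]
    pub fn left_pad(s: &str, width: usize) -> String {
        // Right-align `s` in a field of `width`, padding with spaces.
        format!("{:>width$}", s, width = width)
    }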
You could require that packages pass community review before they're allowed to be published on (the main) registry. Emacs-lisp (Melpa) does this. Although it's a different thing, Homebrew does that also.
Balkanize definition: to break up (a region, a group, etc.) into smaller and often hostile or uncooperative units.
Are you implying there are competing crates with belligerent attitudes to each other?
> which I would say has ruined the Node community
I thought it was the basis and strength of node... (However I am too scared to use node or node toolchains: I fear trojans since I don't trust all dependencies).
One sovereign political unit (the Ottoman Empire) was weakened by ethnic nationalism (alongside political decay, external pressure, etc.); ended up in war; and the Balkan region was broken into several small states. Instead of being able to make regional policy, amortize common costs, suppress local divisions under a common rule, communally defend against outside pressure/threats, etc., these new states were weak and unstable. Ultimately their conflict drew in outside powers and led to the first world war.
The suggestion in this context is that if a library is replaced by several pieces, they will adopt different conventions, duplicate effort, not interoperate as freely, etc., instead of benefiting from a unified vision and design.
> There’s also an effort to analyze binary sizes more systematically. I applaud such efforts and would love it if they were even more visible. Ideally, crates.io would include some kind of bloat report along with its other metadata, [...]
This is what I always wanted (for Rust as well as for C) but never got around to hacking together myself. I dreamt it up more as a feature of cargo though, something like 'cargo stats' or so. Shouldn't be too hard, and cargo is extensible.
A typical smartphone ships with around 10,000 times this much storage capacity and enough RAM to hold it 100 times over.
This is bloat?
I mean, I get it, the binary used to be only 2MB, 1/3rd the size. But are numbers this low really worth worrying about? I think a GUI app in 6MB is hugely impressive.
I genuinely thought he was going to say it was 100MB or something higher.
Windows 3.11 required 4MB of RAM and the whole install took <20MB of disk space, and that's the entire OS with all of its utilities and libraries.
> A typical smartphone ships with around 10,000 times this much storage capacity and enough RAM to hold it 100 times over.
The fact that it can hold that much does not make it right to waste resources. To contrast, video or audio is a good use of the space it takes up in general, because there has been and continues to be research in compressing that data, and it's pretty close to being as small as it can practically be. Apps are not a good use of space because we know roughly what the lower limit is --- and the current average is a few orders of magnitude more than that.
> Windows 3.11 required 4MB of RAM and the whole install took <20MB of disk space, and that's the entire OS with all of its utilities and libraries.
Sure, and the moon landing used computers with less processing power than your kid's calculator. That doesn't mean we should use those, rather than faster hardware, to put people on the moon.
Does the fact that older, slower and smaller hardware and software once existed mean we should spend time, resources and potentially sacrifice features to... what? Hark back to the old days where we had 128kb of memory and hard disks the size of vinyl records?
A 6mb GUI app _is_ impressive for right now. At some point in time it would have been absolutely massive, and the way it's going, at some point in the future it may well be absolutely minuscule. And that's not a bad thing.
I'm constantly torn between these two perspectives. Practically speaking, 6mb is fine. Ideologically, not as much.
In general, complexity is indicative of a poorer design, and the fact that an entire operating system + utilities can be delivered in a comparable size to a gui app is hard to look away from.
The author says, "Once you accept bloat, it’s very hard to claw it back". The fear is that bloat increases at higher rate in proportion to its capability and computing power.
I'm cognizant that I'm very much in the minimalist camp with regard to computers, so I do make sure to peek outside and remind myself that stressing about an executable size is a bit unwarranted. But that being said, we must remain vigilant!
If today's computers did things a million times better, or did a million more things than 25 years ago, I'd agree with you, but from a user perspective a modern computer is not really all that different from a Windows 3.11 machine. The screens are bigger and we have Internet now, but the experience of, e.g., writing a letter in Word is basically the same.
The screens alone are responsible for a lot of size increases (framebuffers in RAM, high res media) but also, unlike in Windows 3.11, modern Word allows you to mix English, Japanese and Arabic in a document, allows use of a screen reader, and has a thousand features that you personally don't need but everyone has some set of features that they use, and taking away any of them would offend someone.
But what if GUI applications that were considered lean and mean in the 90s were common on today's hardware? Wouldn't those applications be really fast and light on resources?
Fast and light on resources isn't the same thing: a GPU-accelerated GUI toolkit is going to be bigger than a full software one, but also much faster on modern hardware.
It's an apples to oranges comparison, you can't really say that one is "wasting" resources or not.
If you want to use windows as an example. Why not take the current version? It uses what like 16GB for a basic install (I wouldn't know, I don't use it)? Compare the 6Mb GUI framework to that.
Just go at any iOS or Android conference discussing user monetization.
APK/IPA sizes are the number one reason why consumers uninstall software and the reason behind the ongoing module format efforts from Apple and Google.
I have written GUI apps that could fit on a floppy.
I think of myself as someone who cares about bloat — I define it by practical impact.
For example, I keep Facebook off my phone, and when I need to download it (for example to RSVP for an event) the app is ~100MB, which means 1) I can’t even do it unless I’m on wifi and 2) it’s slow even there.
I’m having trouble understanding the practical impact of 6MB. Many web pages are larger than that. Even on the Mac I bought in 1994, that would have been a reasonable size for an application (and it only shipped with 120MB hard drive).
But maybe I’m missing something. Some folks have mentioned CPU caches which is interesting — is that the problem?
A large app in C++ with ATL (Active Template Library) or WTL (Windows Template Library) would have about the same size and speed as in C with plain Win32 API, leveraging the whole new level of abstractions, code reuse and developer productivity. The point is such things are possible, bloat is not inevitable.
Arguing about 10KB or 100KB applications on a comment page that's 40kb in size is somewhat silly.
It's _worth_ it to trade 1mb, 10mb or 100mb of app size in some cases. That's why you don't hand-craft your "simple 100kb guis" in assembly and have them only be 1kb. Exactly the same principle applies here.
Plus, realistically, users don't care. Not one bit. That's why Slack is out here capturing the market, while some other lightweight and exquisitely coded 10kb tool written in C isn't. Because they are shipping features the users want and iterating fast on their memory-hungry and bloated platform, while the other one segfaults when it hits an unexpected error.
> Plus, realistically, users don't care. Not one bit.
I disagree. I develop an audio workstation - https://ossia.io ; the total size is between 50 and 100 megabytes depending on the platform. It uses Qt and LLVM and is itself around 500kloc, so I'm already around the lower limit of what I can do.
My users, and many people on the internet, keep comparing its size to Reaper, another DAW whose binary is around 10 megabytes (https://www.reaper.fm/download.php) - but they wrote their own GUI toolkit and language interpreter.
Well, I still think that Qt is the better solution, and by an order of magnitude :-) It allowed me to write the software, which is being used in production on mac / linux / windows, while doing a PhD at the same time; not sure I could have done the same with any of the other options out there (an older version used JUCE, but it was full of problems; in particular, JUCE's software renderer is much less efficient than QPainter).
And for all of REAPER's goodness, it took decades before getting a linux version.
Oh yeah, I agree. It's a good one. Several folks in the comments called out the author for not bringing it up. Personally, I think the author had a bias where any solution had to be close to the size of the native, standalone app.
One thing I wondered about Qt is if there's a way to trim out anything an app doesn't use. Have you seen anything like that?
The reason that it isn't silly is that people keep saying these larger sizes are necessary when anyone with some perspective and history in computers knows that it's ridiculous.
You can say users don't care about bloat and speed, but when they have an alternative that is clearly not the case. uTorrent destroyed the market share of other torrent clients by being lightning fast and tiny. Chrome captured market share off of being fast. IE originally killed Netscape because it 'loaded' much faster. Winamp won because it was fast and tiny. Google won because it loaded fast and the searches were fast. Google maps won because it was full screen and still faster than MapQuest. People hate the Reddit redesign because it is slow and bloated. People like hacker news' interface because it is fast. People upgrade their phones to see dramatic speed differences. A major advantage of apple is their faster CPUs.
When users have no choice, they put up with whatever bloated nonsense they have to. When they have a choice, they do actually go with interactivity and less latency.
You can be patronizing and pretend that it's archaic to care about well made software that doesn't take up 100x the resources it should need, but when someone wants software that gets out of their way, scales well, or runs on a low power platform, that 300 MB chat client isn't going to cut it.
Even Raspberry Pis have many GBs of both disk and RAM and super fast USB, MicroSD, and network transfer speeds.
I don't really see the difference between 5mb and 20mb bins. And I'm a fan of minimalist OS systems like Archlinux with a slim set of running services. Disk space isn't really the main concern besides as a symbolic measure of cruft. No system is ever going to fill up space by having too many OS/terminal programs installed.
But I also don't come from the nostalgic Unixy C programming world. Just a Unix user. So I may be biased.
No I'm not. The Pi 2 meets all of those requirements and has MicroSDHC, which offers tons of high-speed space even at entry level.
Besides, that was an example; I have plenty of other IoT and embedded-style hardware and it all fits these criteria.
Simply put, the size of binaries hasn't been a real concern for some time now, and the few rare cases where it might matter probably aren't worth all the effort spent obsessing over sub-50MB binary sizes.
99%+ of it will be people who "feel" like it should be small, without any good reason.
Rust is used a lot for embedded development, and the memory available on an IoT device is nowhere near that of a smartphone.
You probably won't want a GUI running on an IoT sensor, but you might need a network stack, maybe some data processing...
It is the mindset. Slacking off leads to increasingly suboptimal apps. As long as the sales dept picks up the slack, it will work out, but it will bite you one of these days.
India, China, Africa. Huge, huge markets. Slow, old, crap hardware and networks. Do the math.
It's a little unfortunate that some of the cool features of Rust need to be put into the "use sparingly" category. I know that polymorphism, async, etc. should be used judiciously anyway, but still.
One recent case where we saw a similar tradeoff was the observation that the unicase dep adds 50k to the binary size for pulldown-cmark. In this case, the CommonMark spec demands Unicode case-folding, and without that, it’s no longer complying with the standard. I understand the temptation to cut this corner, but I think having versions out there that are not spec-compliant is a bad thing, especially unfriendly to the majority of people in the world whose native language is other than English.
In the Python world, you can install an "extra" along with a package. So you can make the deliberate decision to omit Unicode case folding from your CommonMark parser. Maybe something like that is possible with a crate?
That said, I think this is a non-feature in the spec, if anything. I see the value in recommending (but not requiring) Unicode normalization, but I don't see the added value of Unicode-aware case-insensitivity. Maybe it's more important in non-Latin text.
> It's a little unfortunate that some of the cool features of Rust need to be put into the "use sparingly" category.
This is just the reality of engineering -- there are no silver bullets.
Sure, it'd be nice if any language could give you the space efficiency of dynamic dispatch with the runtime efficiency of monomorphized generics, but those two things are fundamentally in tension. Neither Rust nor any other language can fix that.
Rust at least gives you a fairly easy choice of which you want in any given circumstance. Most languages just pick one or the other universally.
Also not a silver bullet, because there's non-zero costs in memory, CPU overhead, I$, etc to the statistics gathering, stop-and-jit-the-dynamic-call-and-change-the-call-sites and so forth.
In long running server processes this can mostly amortize out nicely over a very long run, but for interactive applications it can add noticeable lag.
> I don't see the added value of Unicode-aware case-insensitivity. Maybe it's more important in non-Latin text
I believe that -- to give one example -- this is something that Chrome implements, but Firefox does not. It's a huge pain to have to match accents perfectly in a text search on a page when you often want your search to be accent and case insensitive. This sort of thing is very important for any text search workload, I imagine.
> In the Python world, you can install an "extra" along with a package. So you can make the deliberate decision to omit Unicode case folding from your CommonMark parser. Maybe something like that is possible with a crate?
This is definitely possible in Rust with Cargo crate features! [0]
For instance in a game library like quicksilver you can opt in to WebAssembly support, images, fonts, audio handling, etc. Then the image or audio library themselves can put formats and codecs behind feature flags for instance. Each crate then uses conditional compilation to branch out these features at compilation time. More commonly as an end-user, test modules use a special conditional compilation macro [1].
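And on the consumer side you opt in or out in your own Cargo.toml; the crate and feature names below are made up:
    [dependencies]
    some-game-lib = { version = "0.4", default-features = false, features = ["fonts", "sounds"] }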
It's a complex trade-off of runtime performance, code clarity, productivity, safety, binary size, and compile times. Rust tries hard to eat all the cakes and have them too, but it can't do miracles.
And looks like the author is on the right track to tackling the problem — moving less common functionality behind feature flags, pregenerating data "offline", profiling and trimming code with the help of cargo-bloat.
> For one, it’s common that you get different versions anyway (the Zola build currently has two versions each of unicase, parking_lot, parking_lot_core, crossbeam-deque, toml, derive_more, lock_api, scopeguard, and winapi).
This seems like a specific thing that would be a measurable win and a not uncommon problem (e.g., I have a test kernel module that doesn't have many dependencies, and it still has two generic-arrays, two proc-macro2s, and two unicode-xids). Is there something that could be done here technically, such as pull requests to bump common crates to using the same version of even-more-common dependencies?
This is up to 0.13.2... plenty of supposedly breaking API churn, no wonder you have two copies.
Yes, your dependencies probably rely on different 0.x versions, and a pull request could fix that. You can inspect your Cargo.lock file to figure out which ones are to blame.
> Is there something that could be done here technically, such as pull requests to bump common crates to using the same version of even-more-common dependencies?
Suppose it cuts build time from 5 minutes to 30 seconds....
Technically, it would take a transparently mirrored file system, a strong compiler cluster (memory, cores, etc.), and some predictive ML. But you end up with binary-equivalent (verifiable) output.
I wouldn't use it. Incremental builds aren't that bad, and I'd have a hard time trusting third party compiled libs enough to include in a release from a new service that hasn't built up trust the way that say a Linux distro has.
While you could assume any maliciousness or security compromise would be caught, as you can see from the rubygems news today this is not instant and it adds another point of failure.
I would likely use this depending on how transparent it was and the pricing and if it supported RLS etc. I was writing a rust project on a terrible old laptop that took 2-5 minutes for compilation time. I ended up standing up a Digital Ocean instance and used VScode (via Coder https://github.com/cdr/code-server ). And this ended up being the most workable solution. Cut down my compilation times significantly and also helped w/ VScode RAM usage.
Not sure how the business model for a cloud compiler would work, but I would be interested.
Do dynamic libraries really help? They only save space in the case where 1) you are running many different programs and 2) they all use the same libraries in the same versions. If you're running multiple copies of the same program, they share code, at least on Linux.
Depends on the use case. I'm thinking about GUI, where on Linux you can write code that links against (say) Gtk, and thus can pull in lots of libraries while the executable itself is tiny. With the current state of the Rust ecosystem, you basically have to build the GUI toolkit and bundle it with your app.
Dynamic libs on the file system are awful. It's OK if the kernel wants to conserve RAM by sharing code pages that are identical, but saving hard disk space with dynamic libs is so painful.
> running many different programs and they all use the same libraries in the same versions
It's distro maintainers' job to test packaged apps and converge on a known-good version of a shared lib. This is only a problem if your distro doesn't package the apps you need, and you build from upstream source ignoring the distro and using all the random versions upstream happened to test.
It highly depends on the use case. If I have a web application running on 100 nodes, 5MB vs. even 10MB has such a minuscule cost that it's not even worth calculating. And the benefits for ops of just having to push a single static binary without external dependencies are fairly decent.
Good points, but I'd like to point out that nothing on this post is Rust-specific.
You can have the same issue if programming in any other language, including C: excessive indirections, inefficient algorithms, bad abstractions, excessive use of unnecessary libraries, etc.
It's indeed easier to "bloat" your resulting binary in C++ or Rust, given how easy it is to build higher-level abstractions; since you can more easily program complex solutions, you also need to consider your design and the trade-offs in your code.
I'd also like to point out that, in comparison:
* Rust bloat is a speck compared to hundreds of megabytes for a similar Python program + runtime including the same amount of code.
* Rust bloat can be mostly optimized away for systems that really care about excess/unused code - you have #![no_std], disabling of backtraces, aborting on panic, and a lot of other optimizations that throw away a big portion of extra functionality not needed for things like embedded. You have little alternative on things like Go besides removing debugging symbols and doing tricks like "dynamic decompression" (which could also be applied to Rust programs to further reduce their size, btw).
Bottom line is: Rust makes it easier for you to "just add a new library", and it also makes you more mindful of the bloat, but we need to keep it in perspective.
It looks like the `fluent` library is to blame for the bloat here.
I have myself made another localization library that uses a simpler model and compiles quite fast when using the Rust `gettext` backend: https://github.com/woboq/tr
I don't understand the advice: "Use async sparingly"
It doesn't make sense. Either your complete code base is async or it is not. If you have a single blocking call in it, it is not async anymore, since it can't schedule any other async tasks while waiting on that blocking call.
Probably could have worded it better. I didn't mean, "only use a little bit of async," I agree that doesn't make sense. I meant, "use async if your problem really needs it, otherwise avoid it."
The main thing is that you can't really add async later when you need it, because once you have written enough sync code, turning it into async amounts to a major rewrite/refactor, including different choices of dependencies, different patterns for parallel execution under async, etc.
> Once you accept bloat, it’s very hard to claw it back. If your project has multi-minute compiles, people won’t even notice a 10s regression in compile time. Then these pile up, and it gets harder and harder to motivate the work to reduce bloat, because each second gained in compile time becomes such a small fraction of the total.
This particular problem can be addressed head-on, I think. It would seem feasible to have the compiler distinguish between the target application (or library) and the dependencies. Then it could report the compile times as separate values. This could be built into CI/CD tools as a way to catch application-level compile time changes.
Of course, this approach wouldn’t tell the entire story, but it would likely serve as a canary in the coal mine at least.
I'm new to Rust, but I do get your concern. For a small part of a project, I used a tokio-based library, and it massively increased my build time.
What if Rust could support something like dynamic linking/loading? Like having some crates globally installed, so that while building we could link against the global ones instead of fetching and building all the crates locally, the way C/C++ does it?
Serialization will always be slow for obvious reasons, that's why binary protocols and messages started to emerge (HTTP/2, gRPC) instead of serializing/deserializing everything to JSON and back.
grpc uses protobuf which still does traditional serialization. The cool stuff is capnp/flatbuffers, where you write to and read from the "serial" memory directly.
Seems like a lot of this comes from Rust defaulting to allowing several different versions of a library to be linked in... there are definitely some other pieces, but that seems like a biggie.
So far in my toy projects I've mostly seen this crop up from depending on a lot of 0.x crates still going through significant version churn and legitimate breaking changes, where A and B legitimately can't use the same version of C as-is due to API changes, because they haven't been keeping their dependencies up to date.
The fix is simple, when it happens: Patch A/B to use the latest major version of C, fixing the source code as necessary. You can [patch] locally until upstream accepts your Pull Request - which might include a https://deps.rs/repo/github/rust-lang/cargo badge in their README.md, to encourage them to continue to keep C up-to-date.
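For reference, the [patch] override mentioned above looks roughly like this in the top-level Cargo.toml (the repository URL and crate names are placeholders):
    # Force every crate in the graph that depends on `c` to use your
    # fixed fork until upstream publishes a release.
    [patch.crates-io]
    c = { git = "https://github.com/you/c", branch = "bump-deps" }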
Since it's not possible for a Standard Library to have everything a programmer needs, programming languages shouldn't have Standard Libraries but Standard Repositories.
I just don't feel like rust can replace C/C++, because the syntax is not simple enough. Rust has a lot of cool things, but to me syntax simplicity is more important.
Maybe I have a hard time adapting myself to Rust? Maybe my brain is too "wired" to C-style syntax. It still seems to me that C-style syntax is just better.
I would rather prefer a C++-like language that breaks away from backward compatibility with C while keeping its simplicity, has STL containers, and is simpler to read and use.
Rust is cool, but I'm just curious whether it can really be adopted for large projects to justify rewriting existing code.
Using "let thing: type" seems bloated, I prefer the C style "type thing;"
; can also make a lot of difference, which is a little weird.
I don't understand why there is both str and String, seems complicated for nothing.
Option types are not really clear to me yet, and I don't understand what their use is; they seem like an alternative to unions, but few developers use unions anyway...
Pattern matching seems powerful but I fail to understand its usage, and I'm a little skeptical about the machine code it generates.
D is fine, but I think that even D is not simple enough.
Rust seems like it's awesome, but it requires to entirely rethink how you write code, and I've never been a fan of high level abstraction.
Most of the time, just `let thing`. Let (ha) type inference do its thing. Explicit type annotations are for when it fails.
`String` is roughly a Java `StringBuffer`, `&str` is roughly a C++ `string_view`.
Yes, you need to rethink how you write code. Rust relies a lot on the basics of typed functional programming. Reading a Haskell tutorial up until the M word would help a lot. But surely there's a lot of good guides written for Rust specifically.
tl;dr the core concept you're looking for is sum types, aka discriminated/tagged unions. If you're going "up" from C level abstraction, imagine a C union starting with an enum that indicates which value is there. That's essentially what it is at the memory level (modulo optimizations). But conceptually, you just have a "this OR that" type. Pattern matching is how you access values of that kind of type.
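A small illustrative sketch pulling those pieces together (inferred lets, String vs &str, Option, and match):
    fn describe(len: Option<usize>) -> String {
        // Pattern matching over a sum type: the compiler checks every case is handled.
        match len {
            Some(0) => String::from("empty"),
            Some(n) => format!("{} bytes", n),
            None => String::from("unknown length"),
        }
    }
    fn main() {
        let name = "rust";                        // &str: a borrowed string slice
        let owned = name.to_string();             // String: owned, growable buffer
        let answer = describe(Some(owned.len())); // types of these lets are inferred
        println!("{}: {}", name, answer);
    }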