I sped up serde_json strings by 20% (purplesyringa.moe)
336 points by purplesyringa 43 days ago | 117 comments



The UTF-8 tricks make me very nervous, since I have seen too many attacks based on parser confusion. I go with serde for correctness, not speed. I hope this was fuzzed all the way with a bunch of invalid UTF-8 strings.


Luckily utf-8 structure is _very_ trivial compared to the average parser. Not to say there can't be bugs, but that the internal states of a parser shouldn't be large, and can be exhaustively tested.
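
As a rough illustration of the "exhaustively tested" point, here is a minimal sketch that checks a hand-written validity rule against the standard library on every possible two-byte input. The helper names `two_bytes_valid` and `exhaustive_check` are made up for this example; it is not the test strategy serde_json actually uses.

```rust
/// Hand-written rule for when a two-byte input is valid UTF-8
/// (hypothetical helper, written for this illustration only).
fn two_bytes_valid(b0: u8, b1: u8) -> bool {
    // Either two ASCII bytes...
    let two_ascii = b0 < 0x80 && b1 < 0x80;
    // ...or one two-byte sequence: a lead byte in 0xC2..=0xDF
    // (0xC0 and 0xC1 could only produce overlong encodings),
    // followed by a continuation byte in 0x80..=0xBF.
    let one_pair = (0xC2..=0xDF).contains(&b0) && (0x80..=0xBF).contains(&b1);
    two_ascii || one_pair
}

/// Compares the rule with `std::str::from_utf8` on all 65,536 two-byte
/// inputs, panicking on the first disagreement, and returns how many
/// inputs were valid.
fn exhaustive_check() -> usize {
    let mut valid = 0;
    for b0 in 0..=u8::MAX {
        for b1 in 0..=u8::MAX {
            let std_says = std::str::from_utf8(&[b0, b1]).is_ok();
            assert_eq!(
                std_says,
                two_bytes_valid(b0, b1),
                "mismatch at {b0:#04x} {b1:#04x}"
            );
            if std_says {
                valid += 1;
            }
        }
    }
    valid
}

fn main() {
    println!("valid two-byte inputs: {}", exhaustive_check());
}
```

Longer sequences have too many combinations to enumerate this naively, which is where a fuzzer or a smarter enumeration of parser states would come in.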


This is the sort of space where I’d like to see a fuzzer.


Any bugs you can point to that come to mind of this class?


https://en.wikipedia.org/wiki/UTF-8#Invalid_sequences_and_er...

> Many of the first UTF-8 decoders would decode these, ignoring incorrect bits and accepting overlong results. Carefully crafted invalid UTF-8 could make them either skip or create ASCII characters such as NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security validations in high-profile products including Microsoft's IIS web server[26] and Apache's Tomcat servlet container.[27] RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."
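
To make the overlong-encoding attack concrete, here is a small sketch. `naive_decode_two` is a hypothetical decoder written for this example that masks off the payload bits without validating them, the way the early decoders described above did; `std_accepts` simply wraps the standard library's validator.

```rust
/// Hypothetical naive decoder for a two-byte sequence: it keeps the payload
/// bits and ignores whether the encoding is the shortest possible form.
fn naive_decode_two(b0: u8, b1: u8) -> u32 {
    (((b0 & 0x1F) as u32) << 6) | ((b1 & 0x3F) as u32)
}

/// Returns true if the standard library accepts the bytes as UTF-8.
fn std_accepts(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

fn main() {
    // 0xC0 0xAF is an overlong two-byte encoding of '/' (U+002F).
    // The naive decoder happily produces a slash from it...
    assert_eq!(naive_decode_two(0xC0, 0xAF), '/' as u32);
    // ...while the standard library rejects it, along with an overlong NUL
    // and a three-byte overlong slash.
    assert!(!std_accepts(&[0xC0, 0xAF]));
    assert!(!std_accepts(&[0xC0, 0x80]));
    assert!(!std_accepts(&[0xE0, 0x80, 0xAF]));
    // The shortest-form encoding is fine.
    assert!(std_accepts(b"/"));
    // Lossy decoding substitutes U+FFFD instead of creating a slash.
    let lossy = String::from_utf8_lossy(&[0xC0, 0xAF]);
    assert!(!lossy.contains('/'));
}
```

A security check that scans the raw bytes for `/` before handing them to a decoder like the naive one would miss the slash entirely, which is the bypass the quote describes.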


awesome that serde moves so quickly. i just ran across simdutf8 and realized the pr for simd-enabled utf8 parsing is coming up on 5 years:

https://github.com/rust-lang/rust/issues/68455



The parent is comparing the speed of Rust std improvement to the faster pace of Serde.


Very strong jart feel about this person's blog, that was a nice read

> We would need to reinvent the wheel, but this is quite neat if you think about it.

Is this real or ironic though? I read it and started laughing at the writer but the rest of the page seems quite heavy on self-deprecation


I think it means the approach is quite neat, not the fact that it requires reinventing the wheel.


What does jart mean?


HN username for an extremely talented software engineer and software-engineering communicator, justine.lol. Probably most known around here for her cross-platform C code (cosmopolitan) and relatedly redbean, a zip file and tool that is also an executable file-server-file that hosts itself and can produce other such self-hosting cross platform executable zip file servers.


A commonly posted author: https://justine.lol/


> Teaching to _think_ is just as important as teaching to code, but this is seldom done

Oh, the arrogance of thinking that the other person doesn't think.


I don't think there's any arrogance in the statement. It doesn't assume others don't think. It's simply observing that most blog posts and how-to articles show the final result, but not necessarily the steps that were needed to get there.


I don't see what one has to do with the other.


I'm not really sure I understand what you're saying and/or asking.

As far as I can tell, the statement in question was simply saying that showing the steps to come to a solution is rare, and also that it helps to teach how to think (how to solve a problem). I guess I don't see where the arrogance lies.


That isn't what the author argues at all. It's about teaching how to think: that you have to do some research and it's not a step you can skip; having done your research doesn't free you from having to draw your own conclusions too. Skipping either of those steps is easy but wrong.


> is seldom done

It's this part


Perhaps the wording was off on my part. What I meant is not that people don't think, it's that people seldom teach others to think, at least in web articles.

Most posts of such format I have seen are "we did this and got this", not "we tried this, it failed because of this, then we figured out something else might work and it worked after these modifications".


> Most posts of such format I have seen are "we did this and got this", not "we tried this, it failed because of this, then we figured out something else might work and it worked after these modifications".

That doesn't resemble anything remotely related to teaching how to think. You're just logging your trial and error process, which is exactly what each and every single developer goes through on a daily basis.

What exactly do you think other developers do?


In context they're very clearly talking about blogging/write-ups/presentation of technical things. A lot of the material about making / fixing things we're presented with in life are finished products, the results clean and tidy, and the steps to accomplish the result obvious with the benefit of someone else to tell you what they are. It's much less common to see even a glimpse of the effort it took to get there, or for someone to document the process, including dead ends and false starts.

Even here, we can imagine that had the author failed to actually make anything faster, they might not have written anything at all. And yet, wouldn't that still have had benefit to people? To see things attempted that didn't work, to understand why those things didn't work? Maybe it wouldn't have been as interesting to as wide an audience, but it's important to see failure. Both as a way of learning from others to not repeat the same efforts, but also because it's really easy to fall into the trap of assuming you're incapable if you do fail when everyone around you always seems to be succeeding.

Or perhaps as an analogy, almost everyone creates some art in life, and certainly every artist struggles to create that art. Yet it would be a disservice to only ever present art to learning artists as complete master works and paint by numbers replications. We need to see the "happy little accidents" of Bob Ross, the sketch books of iterations on a design, the piles of failed clay firings. Not because no one experiences these things, but because they are instructive on their own in a way that only seeing success is not.


Again, they aren't saying developers don't think. They're talking about blogging

I had this issue at PeerDB where we'd blog about some dev, when I wrote it'd be a stream of consciousness trying to communicate the mood, frustration, & flailing process. It wouldn't get published, in favor of blogs with clearer product messaging


Serde json has 3gb of dependencies once you do a build for debug and a build for release. Use serde on a few active projects and you run out of disk space. I don’t know why json parsing needs 3gb of dependencies.

I’m all for code reuse, but Serde for JSON is a bit of a dog’s breakfast when it comes to dependencies. All you need is an exploit in one of those dependencies and half of the Rust ecosystem is vulnerable.

Rust should have JSON built in.


It has 5 dependencies, one of which is optional, and another is serde itself: https://github.com/serde-rs/json/blob/master/Cargo.toml

    indexmap = { version = "2.2.3", optional = true }
    itoa = "1.0"
    memchr = { version = "2", default-features = false }
    ryu = "1.0"
    serde = { version = "1.0.194", default-features = false }

I don’t think you’re measuring what you think you’re measuring when you say it has 3GB of dependencies. But I can’t say for sure because you don’t provide any evidence for it, you just declare it as true.

If I were to guess, I’d say you’re doing a lot of #[derive(Serialize, Deserialize)] and it’s generating tons of code (derive does code generation, after all) and you’re measuring the total size of your target directory after all of this. But this is just a guess… other commenters have shown that a simple build produces code on the order of tens of MB…


> Rust should have JSON built in.

I don't think this is a reasonable approach. That's just a way to introduce bloat. Importantly, std does not differ from other crates, except for stability guarantees, so there would be no positive here. All it does is link the library's release cycle to the compiler's. (In fact, rustc-serialize used to be built-in, so Rust tried to go that way.)

But also, serde_json isn't large by default. I'm not sure where you are getting those numbers from. serde_json isn't large, serde isn't large. They both have very low MSRVs few other crates support, so in all truth they can't even have many dependencies.


> That's just a way to introduce bloat.

I don't think "bloat" is the issue; as I'm sure you know, Rust programs only contain code for the features of the stdlib they use; if it had JSON support and it wasn't used, the linker would omit that code from the final binary. (Granted, that would make the linker work a little harder.)

More at issue is maintenance burden. Data formats come and go over time. In 10 or 20 years when JSON has faded to XML's level of relevance, the Rust stdlib team shouldn't have to continue maintaining a JSON parser.


Std definitely differs from other crates:

1. There's only one version so you can't end up with multiple copies of the crate.

2. It is precompiled, so it doesn't bloat your target directory or compile time.

3. It is able to use unstable features without using the nightly compiler.

It's a totally reasonable approach. Many other languages have JSON support in their standard libraries and it works fine. I'm not sure I'd want it, but I wouldn't say it's an obviously bad idea.


> Many other languages have JSON support in their standard libraries and it works fine. I'm not sure I'd want it, but I wouldn't say it's an obviously bad idea.

I would say it's a bad idea. JSON is, for lack of a better (less derogatory) term, a data-format fad. If Rust had been designed back in 2000 we'd be having this discussion about XML. Hell, consider Javascript (where JSON comes from), with XHR: remember that stands for "XMLHttpRequest"! Of course it can be used with data payloads other than XML; fortunately the people who added it weren't that short-sighted, but the naming is an interesting historical artifact that shows what was only fleetingly dominant at the time as an API data format.

In another 20 years, when Rust is hopefully still a relevant, widely-used language, we may not be using JSON much at all (and oof, I really hope we aren't), and yet the Rust team would still have to maintain that code were it in the stdlib.

Consider also that Rust's stdlib doesn't even have a TOML parser, even though that seems to be Rust's configuration format of choice.


Ha I really hope you're right about JSON being a fad, and something better will come along.

I would bet against it though. JSON's flaws (ambiguous spec re numbers & duplicate keys etc; no comments, strings aren't zero copy...) are pretty minor compared to XML's (completely wrong data model, insanely verbose, not even basic data types).

There was much more motivation to replace XML than JSON.

Also, even though XML has been broadly replaced, it's still widespread and I don't think it would be out of place to have it in a standard library. Go has one, and Go's standard library is one of its highlights.
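
The number ambiguity mentioned above can be sketched with the standard library alone. The JSON grammar does not say how much precision a number carries, so a parser that keeps integers and a parser that stores everything as an IEEE 754 double can disagree on the same token. The helpers `read_as_u64` and `read_as_f64` are hypothetical stand-ins for those two kinds of parser, not serde_json's API.

```rust
/// Reads a number token the way an integer-preserving parser might
/// (hypothetical stand-in).
fn read_as_u64(token: &str) -> Option<u64> {
    token.parse().ok()
}

/// Reads the same token the way a double-only parser would
/// (hypothetical stand-in).
fn read_as_f64(token: &str) -> Option<f64> {
    token.parse().ok()
}

fn main() {
    // 2^53 + 1 is the first positive integer a double cannot represent.
    let token = "9007199254740993";
    let exact = read_as_u64(token).unwrap();
    let rounded = read_as_f64(token).unwrap();
    println!("integer parser: {exact}");
    println!("double parser:  {rounded}");
    // The two readings of the same document differ by one.
    assert_ne!(exact, rounded as u64);
}
```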


> I don't think this is a reasonable approach. That's just a way to introduce bloat.

Can this meme die already? The fact that out of the box install of Rust can’t parse JSON is a joke, and you know it.


Rust has a great package manager, so moving libs into std doesn’t bring much benefit.

On the other hand a change like this perf improvement can be released without tying it to a language version, that’s good too.


> On the other hand a change like this perf improvement can be released without tying it to a language version, that’s good too.

And you pay for that by having literally no way to parse something as ubiquitous as JSON out of the box, relying either on installing a third-party lib (which is yet another attack vector, requires yet another approval for every upgrade, has an API the maintainer can change on a whim, and opens other cans of worms) or on using another language.


I think you can consider a few extremely common crates (serde, tokio, etc.) to basically not be "third-party". The risk that dtolnay will randomly break serde_json is not meaningfully different from the risk that the rust core team will randomly break the standard library.

> requires yet another approval for upgrade

Approval from whom?


If Rust were a "web language", sure, I'd think it would have to have JSON support built in.

Rust is a systems programming language. If Rust had JSON support built in, I'd take it much less seriously. JSON is a fad, just like XML was 20 years ago. In 20 years, when JSON goes the way of XML, the Rust stdlib team should not have to continue maintaining a JSON parser.

An out of the box install of C can't parse JSON either. Do you think C is a joke? C++? Java?


I don’t want to wait on language releases to get updates to json, regex, etc. Nor do I want a crappy stdlib impl of something to become widespread just because it comes out of the box like Go’s worst of breed html templating and http “routing”.


Somehow Python and JS can get away with json in std lib, but thing that builds binaries can’t?

How often does it even need to be updated to parse freaking json?


Python's json is an often-quoted example of why not to have it in the standard lib. There are some bad defaults that no one can fix for stability reasons. Though I admit it does come in handy sometimes.

In production, I've lately seen serde_json-backed Python implementations, which makes sense for performance and memory safety.


Out-of-the-box you can add serde_json to your Cargo.toml file in a single line and have JSON parsing.

    serde_json = "*"

I'm not sure I see the problem.


If you have a vaguely modern Rust, you can just "cargo add serde_json" and Cargo will make the change for you.


Repeating the same claim more incredulously isn’t really a good debating tactic.


Dependency bloat is an issue with Rust in general. The dependency trees for any meaty Rust project quickly become pretty horrifying. Auditing all these dependencies is infeasible, and my level of confidence in a lot of them is fairly low.

I worked with Rust for a few years, and with the benefit of a few years' experience, I don't think I'll be touching Rust again until the ecosystem matures a great deal (which will only come with significant corporate adoption), or if I need something for a no-std, no-deps, strictly-a-C-replacement kind of project. (Though Zig might edge out Rust for this use case once it stabilises.)


> The dependency trees for any meaty Rust project quickly become pretty horrifying.

s/Rust//

This is really no different from any other language.

At least Rust, with Cargo, makes it easy to scan your dependencies. And many notable Rust projects attempt to keep third party dependencies to a minimum.

C++ gives you absolutely nothing to work with. Other languages with package managers don't keep dependency trees shallow. You're holding Rust up to a standard that nothing meets.


> And many notable Rust projects attempt to keep third party dependencies to a minimum.

I don't think this is true. The only two major Rust crates that manage to keep their dependencies light are tokio and serde, and these are highly atypical projects. For a more typical example, look at something like axum (running `cargo tree` for a project with a single dependency on axum returns 121 lines).

> This is really no different from any other language.

> You're holding Rust up to a standard that nothing meets.

Respectfully, I think you're creating a bit of a false dichotomy here. I'm not demanding perfection, I'm merely noting that I've found Rust dependency trees to grow noticeably faster than dependency trees in equivalent languages. You add two dependencies in Rust, and suddenly you have a dozen dependencies of dependencies of dependencies, including at least three different logging crates. In the world of C, which is what Rust is trying to displace, that's just not going to pass muster.

Rust is a very fine language with a bit of a dependency addiction (a dependency dependency?). I honestly don't see what service it does to the language to pretend otherwise.


> You add two dependencies in Rust, and suddenly you have a dozen dependencies of dependencies of dependencies, including at least three different logging crates. In the world of C, which is what Rust is trying to displace, that's just not going to pass muster.

In the world of C, how many libraries just vendor their dependencies, so they're just not easily visible in a dependency tree?


I think the GP's point was that all modern languages (and some of the older ones) have this problem - in fact, JS is infamous for it. Therefore, I don't think it's really a pro or a con on its own - only in the context of your business and problem.


tokio and serde are certainly not the only ones. You can put almost all of my crates into that category too.

The problem with your framing is that you look at this as a "dependency addiction." But that doesn't fully explain everything. The `regex` crate is a good case study. If it were a C library, it would almost certainly have zero dependencies. But it isn't a C library. It exists in a context where I can encapsulate separately versioned libraries as dependencies with almost no impact on users of `regex`. Namely, it has two required dependencies: regex-syntax and regex-automata. It also has two optional dependencies: memchr and aho-corasick.

This isn't a case of the regex crate farming out its core functionality to other projects. Indeed, it started as a single crate. And I split its code out into separately versioned crates that others can now use. And this has been a major ecosystem win:

* memchr is used in all sorts of projects, and it promises to give you exactly the same implementation of substring search that the regex crate (and also ripgrep) use in your own projects. Indeed, that crate is used here! What would you do instead? If you were in C-land, you'd re-roll all of the specialized SIMD that's in memchr? For x86-64, aarch64 and wasm32 right? If you haven't done that sort of thing before, good luck. That'll be a long ramp-up time.

* aho-corasick is packaged as a stand-alone Python library that is quite a bit faster than pyahocorasick: https://pypi.org/project/ahocorasick-rs/ There are tons of other projects on crates.io relying on aho-corasick specifically, separately from how it's used inside of `regex`.

* regex-syntax gives you a production grade regex parser. More than that, it gives you exactly the same parser used by the regex crate. People have used this for all sorts of things, including building their own regex engine without needing to re-create the parser (which is a significant simplification).

* regex-automata gives you access to all of the internal APIs of the regex engine. This is all the stuff that is too complex to put into a general purpose regex library targeting the 99% use case. As far as I know, literally no other general purpose regex engine has ever attempted this because most regex engines are written in C or C++ where you'd be laughed out of the room for suggesting it because dependency management is such a clusterfuck. Yet, this has been a big benefit to other folks. The Yara project uses it for example, and the Helix editor uses it to search discontiguous strings: https://github.com/helix-editor/helix/pull/9422 (Instead of rolling your own regex engine, which is what I believe vim does.)

This isn't dependency addiction. This is making use of separately versioned libraries to allow other projects to depend on battle tested components independent of their primary use case. Yet, if people repeat this kind of process---exposing internals like I did with the regex crate---then you wind up with a bigger dependency tree.

Good dependency management is a trade-off. On the one hand, it enables the above to happen, which I think is an objectively Good Thing. But it also enables folks to depend on huge piles of code so easily that it actively discourages someone from writing their own base64 implementation. But as should be obvious, it doesn't prevent them from doing so: https://github.com/BurntSushi/ripgrep/blob/ea99421ec896fcc9a...

Good dependency management is Pandora's box. It has been opened and it is never going to get closed again. Just looking on and calling it an addiction isn't going to take us anywhere. Instead, let's look at it as a trade-off.
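
For a sense of scale on the base64 example above, here is a minimal sketch of a hand-rolled encoder for the standard alphabet with `=` padding. It is written for this illustration and is not the code behind the ripgrep link; the name `base64_encode` is an assumption.

```rust
/// The standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encodes bytes as padded base64 (illustrative sketch, no decoder).
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // The first two symbols are always present.
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // The last two are replaced by padding when the chunk is short.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

fn main() {
    println!("{}", base64_encode(b"hello"));
}
```

The encoder is short, but a complete replacement for a crate would also need a decoder, URL-safe and unpadded variants, and tests, which is the cost side of the trade-off.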


Super interesting take, thanks for sharing your thoughts.

I like your trade-off framing, but I do think that your argument fails to take account of something important here, which I'll try to illustrate with an example.

Let's say I want to run a simple modern web stack. In C#, I might pull in ASP.NET and EF Core. In Java, I might pull in Spring. In Rust, I might pull in axum, axum-server, rustls and sqlx. From those four deps alone, and before I have written a single line of code, I now have a three hundred and fifty-two line long `cargo tree`. There are three things to note on this.

First, and most obviously, this is far too many dependencies of dependencies, and this is before I add in a lot of other crates I'll need to work with this stack, or for my business logic. I can't seriously rely on - or even keep up with - a dep tree like this for a single and relatively simple project.

Second, a lot of these dependencies are made by anonymous users on GitHub, are documented solely on Discord, change frequently, and do not appear to warrant a high degree of confidence. If Rust continues to gain traction, we will absolutely see security issues à la polyfill.io or xz. For my livelihood, I'm necessarily going to place more trust in Microsoft's ASP.NET or VMWare's Spring than in @animewaifu420's three dozen half-abandoned packages and their 'dank' Discord.

Third, despite pulling in twenty-five times more dependencies than Alpine requires to run an OS, this is actually an extremely barebones web stack. No code-first database migrations, no authentication or authorisation, no internationalisation, no named routes, just to take some random basic features - all of this requires further crates (and, in turn, many dozens of further dependencies of dependencies.)

I'm using web as an illustrative example here, but this is an issue that permeates the ecosystem. I've faced similar issues for other use cases as well.

I like your trade-off framing, but I respectfully think it comes out strongly against Rust from an end user's perspective, at least at this point in Rust's history. I also think it elides over something important - the fact that Rust need not actually be like this. There's no reason Rust couldn't, in the future, have fully-featured crates, with dramatically curtailed dep trees and wide industry buy-in. But it's just not there at this stage. And so I return to my initial post - I quite like Rust, I've previously used it for years, but the ecosystem is years away from a sufficient level of maturity for me to feel inclined to return to it for any significant work at this stage.


> I can't seriously rely on - or even keep up with - a dep tree like this for a single and relatively simple project.

Our dependency trees in `uv` and `ruff` at Astral are bigger than this. We rely on it and keep up with it. We do not spend most of our time dealing with the dependency tree. (Although I do think our compile times are longer than would be nice.)

> For my livelihood, I'm necessarily going to place more trust in Microsoft's ASP.NET or VMWare's Spring than in @animewaifu420's three dozen half-abandoned packages and their 'dank' Discord.

If this ends up being true, then that means there is financial incentive to provide something more holistic, right?

I don't know what "dank" Discords have to do with anything here.

From my perspective, I see a lot of derision and vague feelings in your commentary, but little hard data. Tons of production web applications are built in NodeJS on orders-of-magnitude larger dependency trees. Tons of peoples' livelihoods depend on it.

To be honest, I have a lot of the same feelings as you. I don't like large dependency trees either. But that tends to be more connected to tangible problems like compile times.

> There's no reason Rust couldn't, in the future, have fully-featured crates, with dramatically curtailed dep trees and wide industry buy-in.

There's no reason C couldn't have better dependency management either, right? But it doesn't. And the problem isn't just technical.

> but the ecosystem is years away from a sufficient level of maturity for me to feel inclined to return to it for any significant work at this stage.

I suspect the ecosystem is more mature than you think it is, particularly if your understanding of maturity boils down to just a single number of dependencies. But like, if you had written this comment originally, I probably wouldn't have replied. I don't think what you're saying is totally unreasonable, but I also don't see this changing either. Like, if Microsoft wanted to come in and build a nice web framework for Rust, I think it's very likely it would be built on top of the existing crate ecosystem. I don't think they would, for example, roll their own HTTP implementation unless there was some technical reason that `hyper` wouldn't work for them. I do not think they would do it just to get the dependency count down.

What you're seeing is a manifestation of a million tiny little decisions. Should I write my own URL parser? Or should I use a crate for it that I know is also used in production by a number of large companies? Why would you choose to roll your own? (Again, excepting technical reasons.) You wouldn't. You would use the one that already exists. And this decision procedure repeats over-and-over. Big companies aren't immune to it either.

In any case, I did say "trade off" and not "Rust's way is always better." So my reply makes room for viewpoints like yours, where as I think your original comment didn't make room for a more balanced take. That's why I responded.


Responding to the Discord point, charitably, I think the person you're replying to is saying that Discord is a poor documentation platform that has some fatal flaws, and I do think it has unfortunately replaced web forums in a lot of contexts, which means a couple things:

1. It's unsearchable and opaque: you can't really do anything but the most basic of searches on Discord, and getting support or finding previous/similar questions can be incredibly difficult. It also has the strong disadvantage of being unarchivable, which makes the next issue more important.

2. It's controlled by a single entity and the viability of Discord, from both a service provider perspective and a community perspective is uncertain. For all we know, Discord, the company, could die without any warning, or there could be policy decisions which make hosting communities untenable. If that happens, it'd be incredibly difficult to get stuff off of there. With more decentralized communities, or even centralized, but internet-visible ones, there are potentially archives and backups that can help restore information.

This, imo, makes it not a great choice for a software project's forum or whatever. Maybe general IRC-style chat where conversations tend to be relatively ephemeral anyway? Sure. But it's not a great forum or wiki replacement imo.


I'm not sure languages without a package manager being the default are nearly as bad.

When it is trivial (like running a single command or adding a single line to a file like Cargo.toml), a proliferation of dependencies is so easy that it almost becomes guaranteed. But in languages like C++, where not everyone uses a package manager (even if several exist), adding a dependency to a library is a much bigger deal, as you either have to use something like git submodules or manage them yourself OR make any user of your library go get the correct versions themselves.


> I'm not sure languages without a package manager being the default are nearly as bad.

The autoconf or cmake list of included libraries for many non-trivial C++ projects is usually just as long as stuff in Cargo, especially when breaking up Boost, Qt, or other megaframeworks into individual libraries.

They do "make any user of your library go get the correct versions themselves" but that has been far less of a problem this century thanks to OS package management.


The thing is, when adding dependencies _isn't_ trivial, you end up vendoring. You aren't gonna thoroughly test your command line parser if it is a very small feature of your overall project. A dedicated command line parser lib will (more likely than you in this case) thoroughly test their implementation.


I think it's important to figure out what you're comparing though.

A language that makes it easy to pull in dependencies also encourages breaking code into separate modules.

Languages that make dependency management hard tend to have larger, heavier dependencies. Two dependencies in such a language are more likely to have duplicate functionality between them, instead of sharing the functionality through another dependency.

Is it better to vet many smaller dependencies, or fewer large ones that likely duplicate a lot of stuff? It depends on what those dependencies are.

I don't think just looking at dependency counts is that useful. Many libraries that would be a single dependency in other languages are split into several because they are useful on their own.


> I'm not sure languages without a package manager being the default are nearly as bad.

Javascript doesn't have a specified package manager.


It’s different from Go, but then, Go is probably not the language you’re going to replace Rust with.

(I know it’s exactly different from Go’s dependency management, but you frankly rarely need anything outside of the standard library in Go.)


And yet with Python and JS I don’t need to pull a freaking third-party library for something as basic as JSON.


Most of my Rust projects don't need a JSON parser. You probably haven't been a programmer that long if you think JSON is "basic". If it was 20 years ago, you'd be arguing for an XML parser in Rust's stdlib. In 20 years I'm sure it will be something else. File/data formats come and go. The stdlib of a systems programming language shouldn't be taking on a forever maintenance burden for something that likely won't be in widespread use for all that long.


Nope, instead it's pulled into every project whether you need it or not.


Pulled how?


If I don't need it, it's not basic it's just useless.


> This is really no different from any other language.

There are languages with big standard libraries and first party frameworks.

I can build a complex web app in C# using only packages published by Microsoft in ASP.NET and EF.

Python ships with a lot of "batteries included".

Not saying I expect that from Rust considering the funding/team size discrepancy and language targets - but I disagree that every language is the same in this regard - JS/Node is notoriously bad, Rust is around C++ level, and plenty of higher-level languages have first-party/standard library stacks.


The moment you add something to std, it has effectively died. Strong backwards-compatibility guarantees will choke development of the library.

Python std is where libraries go to die.


Who says that's a bad thing? Sometimes things are good enough and just need to be vetted and maintained.


That's rarely true. Hardware and software assumptions change over time, e.g. whether SIMD is rare or available on 99% of hardware, whether pointers are 16, 32 or 64 bits wide, whether a given API is available, or whether SGML/XML/JSON is all the parsing you will ever need.
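
As a small sketch of how code can avoid baking in two of these assumptions, the snippet below queries pointer width and SIMD support instead of hard-coding them. The function names are made up for this example, and AVX2 is picked arbitrarily as the feature to probe.

```rust
/// Pointer width of the target in bits, taken from the type system
/// rather than assumed.
fn pointer_width_bits() -> u32 {
    usize::BITS
}

/// Reports whether AVX2 is usable on the machine running the program.
/// On non-x86 targets the check is compiled out and this returns false.
fn has_avx2() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return true;
        }
    }
    false
}

fn main() {
    println!("pointer width: {} bits", pointer_width_bits());
    println!("AVX2 available: {}", has_avx2());
}
```

A library outside std can add a new detection path or a new target in a point release; the argument above is that code frozen into std has a harder time keeping up with such shifts.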



> A medium-sized Rust project can easily tip the scales at 2-300 crates, which is still rather more dependencies than anything I’ve looked at here, but that’s explained by the simple fact that using libraries in C is such a monumental pain in the ass that it’s not worth trying for anything unless it’s bigger than… well, a base64 parser or a hash function.

It's odd that this article spends so much time arguing that Rust is no different than C or C++, only to concede at the end that Rust projects do have more dependencies.


Because Rust has Cargo which makes it trivial to include dependencies.

C developers constantly reinvent because C dependency management is such a joke that no one bothers.


This is so on-point. Number of dependencies is correlated with ease of package management.

If you have a well-working package manager you're more likely to use a dependency than just rewrite that little part of your code.

For little programs written in C or C++ this usually means that people write their own buggy CLI parsing that doesn't produce a proper help message and segfaults with the wrong combination of arguments. In Rust people use clap. And they just need to derive on their CLI arguments struct.

And this process happens for every "small" dependency. In the end you're faster developing with cargo, you get a more professional result, probably you can even generate an executable for ARM and Intel without changing anything.

But OMG, you have a dependency tree.


> Number of dependencies is correlated with ease of package management.

I disagree with this, Python makes it trivial to add dependencies. [1] And yet Python developers tend to have the mentality of an older generation of software developers, pre-node/npm, and believe in keeping most things light. That starts with the language maintainers having a vision that says this should be possible, by including the batteries most people need for mid-size projects.

This has a compounding effect. People writing the dependencies that your dependencies will use might not even need dependencies outside of the Python standard library. In Rust, those dependencies have dependencies which have dependencies. You get a large tree even when everyone involved is being relatively sane. When one or two projects are not careful, and then everyone uses those projects ...

When I'm writing Rust, I eternally have the infuriating dilemma of picking which dependency to use for a relatively common operation. Having one blessed way to do something is an enormous productivity advantage, because it's never on you to vet the code. The disadvantages of a large stdlib are pretty overstated; even when an outside library outperforms the internal one (like Python's "requests"), the tendency will be for everyone to form a consensus about which outside library to use (like Python's "requests"), because that's how the ecosystem works.

Using technology designed 25 years ago has big advantages for Linux developers as well. Even when I need to add a dependency in Python, chances are I don't have to install one in my development environment because I can experiment with the shared dependencies already installed on my operating system. And I can look to the maintainers of the software I use - and my system maintainers - for implicit advice about which libraries are reliable and trustworthy, and use those. Rust, Go, etc., throw all of this out in the name of convenience for developers, but it's precisely as a developer that I find this mentality frustrating.

[1] People complain about the UX of e.g. pip in certain situations, but (regardless of whether it's a reasonable complaint) it's not the same issue as maintaining a stack of dependencies in the language itself being difficult.


> I disagree with this, Python makes it trivial to add dependencies.

In Python it's surely easier to add dependencies than in C/C++, but it's harder than in Rust and not so trivial. Everyone using your script has to manually run pip to _globally_ install dependencies, unless you use virtual environments, which can take a requirements.txt file, which just shows how the builtin dependency management is not enough.

> When I'm writing Rust, I eternally have the infuriating dilemma of picking which dependency to use for a relatively common operation.

Likewise in Python you might wonder which library to use between urllib, urllib2, urllib3 and requests. People not familiar with Python might even fail to guess which ones are part of the stdlib.

Granted, that's just one specific example, but it shows pretty nicely how there will always be a library with a better design, and a stdlib module that's good enough right now will likely not stand the test of time, ultimately resulting in bloat.


C has no dependency management. There are various other package managers you can use with C though. I am quite happy with my Linux distribution's package manager.

But I have to say it clearly: cargo is a supply chain disaster and ever-changing dependencies are a major problem for Rust. Rust programs having many dependencies is not a good thing.


I wonder if Go tooling does the job umm... "better" here. At least build times are faster and the debug build is much smaller.

At my current company, we handle payment, transaction etc with Go (some Fiber, some Echo). None of the projects reach 100 MB, and my pkg folder size is around 2.5 GB-ish. Those are the dependencies of all of my Go codebase. Well not bad.

Compare that with building a Rust SQLite web API, which easily eats 3 GB of disk. 10 similar projects may eat at least 30 GB... :D

Disclaimer: I don't use Rust for work... yet. Only for personal tinkering.


To me the size of the codebase after installing dependencies isn’t that relevant. Are you starved for 30GB for your work? My current company has projects in PHP, rust, and a bunch of frontend projects with all the modern build tooling.

The largest service we deploy is ~300 MB, maybe 200 MB if you exclude the copy of Monaco that we ship in the dist folder. That web server will have terabytes of database storage behind it and 32 or 64 GB of Redis/Memcached behind it. If we add in the Elasticsearch we’ve got another terabyte and a ton of memory.

If those dependencies aren’t checked into version control or being shipped to prod does it really matter?


For an application developer like me? No. It's technically not a dealbreaker. But probably more like a question for compiler devs.


Odd indeed.

It's also worth noting that the article has a real apples != oranges problem. Things like 'libsystemd' are not in any way comparable to your typical Rust crate.


> Though Zig might edge out Rust for this use case once it stabilises.

Zig has a meaningful advantage in the context of this discussion: lazy compilation. The compiler won't even semantically analyze a block of code unless the target requires that to happen.

Currently, dependencies are more eager than they really need to be, but making them just as lazy as compilation is on the roadmap. Lazy compilation means no tree shaking is needed, and it means that the actual build-graph of a program can be traced on a fine-grained level. We might not be able to audit a hundred dependencies, but auditing the actual used code from all those dependencies might be more practical.

This is well positioned to handle a common pattern: `small-lib` provides some useful stuff, but also has extensions for working with `huge-framework`, and it needs to have `huge-framework` as a dependency to do so. Currently this means that the build system will fetch `huge-framework`, but if we can get it lazy enough, even that won't have to happen unless the code consuming `small-lib` touches the `huge-framework` dependency, which won't happen unless the program itself needs `huge-framework`.

Existing build systems don't do such a great job with that kind of structure, and the culprit is eagerness.


Rust emits an unreasonable amount of debug information. It's so freakishly large, I expect it's just a bug.

Anything you compile will dump gigabytes into the target folder, but that's not representative of the final product (after stripping the debug info, or at least using a toned-down verbosity setting).
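
For reference, a minimal sketch of what a toned-down setting could look like in Cargo.toml. It assumes a reasonably recent toolchain (the `line-tables-only` value needs a newer Cargo than `strip` does), and the right values depend on how much you lean on a debugger:

```toml
# Dev builds: keep only line tables, enough for file/line info in backtraces.
[profile.dev]
debug = "line-tables-only"

# Release builds: strip debug info from the final binary.
[profile.release]
strip = "debuginfo"
```

The first setting reduces the debug info written into target; the second only affects the final binary.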


> Rust emits unreasonable amount of debug information. It's so freakishly large, I expect it's just a bug.

Rust relies on the linker (via -ffunction-sections and --gc-sections) to delete functions that aren't ever used, but the linker isn't capable of removing the corresponding debug info.

https://github.com/rust-lang/rust/issues/56068


Most of your target folder isn't debug info, but stale build artifacts because Cargo doesn't do any garbage collection.


Does it need a more compact representation of its debug info?


I swear the target folder for literally any project of any scale is at least several GB in size.


I get a progress bar when I run `cargo clean` because it's so large.


For a work project, my recent cargo clean removed 90 GB.


jaw drops


From crates.io, `serde` is a 76.4 KiB dependency. And from what I've seen looking through the code, it's pretty minimal.


Please show your work. I cannot reproduce "3gb of dependencies".

Here is my test:

Cargo.toml

   [package]
   name = "serde-test"
   version = "0.1.0"
   edition = "2021"
   
   [dependencies]
   serde = { version = "1.0.208", features = ["derive"] }
   serde_json = "1.0.127"

src/main.rs

    use serde::Deserialize;
    
    #[derive(Deserialize)]
    struct Foo {
        bar: String,
    }
    
    fn main() {
        let foo: Foo = serde_json::from_str("\"bar\": \"baz\"").unwrap();
    
        println!("{}", foo.bar);
    }

$ cargo build && cargo build --release && du -sh target

    ...

    78M target


I arrive at almost the same result as you, with 76MB.

I've also checked .cargo, .rustup, and my various cache folders (just in case) and haven't found any additional disk usage.

OP is clearly mistaken.


The first thing that jumps out is that the code example doesn't work.

The next thing is that the example merely calls cargo build. Using an IDE of any sort will typically invoke rust-analyzer which will bloat the target directory quite a bit. I've also found that stale build artifacts tend to chew up a lot of space (especially if you're trying to measure the typically smaller release builds).

Beyond that, none of the serde features that will tend to generate a ton of code are being used.

So yeah a minimal example won't use a lot of space but if you start to use the bells and whistles serde brings you will definitely bloat your target directory. I expect a typical rust project to take around 3–4 gigs for build artifacts depending.


> The first thing that jumps out is that the code example doesn't work.

Good catch. I forgot the braces. It does not change the target directory size in a significant way.

As for your other comments: sure! We can have a real conversation about rust-analyzer and other serde features (though I am not sure which specific features you are referring to) causing the target directory to increase drastically in size. However, a sensationalist comment that claims the _dependencies_ are 3gb appears to be misleading at best.


> So yeah a minimal example won't use a lot of space but if you start to use the bells and whistles serde brings you will definitely bloat your target directory.

Which seems orthogonal to the number of dependencies?


Jesus, 78MB is still a lot for such a simple program.


That’s not the size of the program but the size of the build artifacts folder that includes all the intermediate files like .o in a C project and more.


That still seems like a lot of build artifacts for a 10 line program?


It deserializes a unicode string to a custom structure. Do not mistake it with C character-shuffling hello-world-programs.

Edit: s/to JSON/to a custom structure/


I’m on mobile so I can’t check at the moment. But I’d be shocked if the equivalent go binary was anywhere near as big, or took anywhere near as long to build.

I’ll check later


I don't know what you consider big and long, but on my computer (Ryzen 5700X) it took 7 seconds to build, and the resulting binary is 556 kB.


I'm on an M1 mac, and I followed the instructions above (cargo build --release && du -sh target) and it took 6 seconds, and the target dir is 35MB. I ran it twice to make sure I wasn't pulling the remote dependencies

go.mod

    module json-test

    go 1.21.1

main.go

    package main

    import (
        "encoding/json"
        "fmt"
    )

    type Foo struct {
        Bar string `json:"bar"`
    }

    func main() {
        var foo Foo

        json.Unmarshal([]byte(`{"bar": "Hello, World!"}`), &foo)

        fmt.Println(foo.Bar)
    }

time go build && du -sh .

    go build  0.07s user 0.11s system 348% cpu 0.051 total
    2.4M .

I'd say 15x larger and 12x slower counts as "bigger and longer", at least.


A 10 line program with two dependencies and all their transitive dependencies.


Built in JSON encoding/decoding is one of the things I’ve enjoyed about Swift. It’s nice when it’s not necessary to shop around for libraries for common needs like that.


Almost nobody is shopping around Rust JSON libraries unless they need some specific feature not provided by serde and serde_json. They are the default everyone reaches for.


The fact that a JSON parser is the sort of ecological niche where you naturally get a single best library which does basically what everyone wants it to do with minimal dependencies is exactly the argument for putting it in the stdlib, though.


That would require moving the entirety of serde and its traits (or equivalent) into the standard library. Considering how long it takes to stabilize basic things in the stdlib, I think that’s a terrible idea.

IMO Rust has the best tradeoff between stdlib and ecosystem libraries I’ve ever seen in a (compiled) language, and that includes serde/serde_json.


The moment you NEED to include a library to parse some basic JSON file - you’ve lost already.


Based on your other reply about JSON being a "basic" feature I assume you do a lot of work with JSON.

What you need to understand is not everyone works with JSON, and for them it's a feature to not have JSON parsing code in their binaries. It's not a loss for them.


Where did you get the notion that JSON parsing code will end up in a binary if it’s not used? Or is the Rust compiler so obtuse that it can’t tree shake unused code?


How did you get that from what I said? JSON isn't included in Rust binaries because of its ability to bring in only what's needed, and my ability as a developer to specify that at a fine-grained level at compilation time.

Using a language where you don't bring things in as needed means they're built-in at the interpreter level or some other scheme like a large required standard library.

Maybe in those languages your compiler is smart enough to filter out unused code, maybe you don't even have a compiler and you have to distribute your code with an interpreter or a virtual machine with all the batteries. Either way, languages where libs are brought in as-needed are at an advantage when it comes to compiler tech to generate such binaries.
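
To make "specify that at a fine-grained level at compilation time" concrete, here is a minimal sketch using conditional compilation. The `json` feature name and the `export_json` function are hypothetical, invented for illustration; in a real crate the feature would be declared under `[features]` in Cargo.toml.

```rust
/// Compiled only when the crate is built with the hypothetical `json` feature,
/// e.g. `cargo build --features json`.
/// Note: this toy formatter does no string escaping.
#[cfg(feature = "json")]
fn export_json(name: &str) -> String {
    format!("{{\"name\":\"{}\"}}", name)
}

/// Compiled when the feature is off. In that build the function above
/// is never compiled at all, so none of its code reaches the binary.
#[cfg(not(feature = "json"))]
fn export_json(_name: &str) -> String {
    String::from("json support not compiled in")
}

fn main() {
    println!("{}", export_json("demo"));
}
```

Built without the feature, only the fallback exists in the binary. Recent compilers may warn that `json` is an unexpected cfg value when this is compiled outside a Cargo project that declares it.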


Why have I lost, and what exactly?


I just added `serde_json` to my small GUI program. It alone increased the binary size from 6 MB to 8 MB; unacceptable for what it does.


What kind of machine are you developing on that runs out of space that quickly…?


this 100%. serde is a bloated monster, it's sad that it's the popular JSON library, because all it does is make Rust look bad in my opinion. here are some smaller options:

https://lib.rs/crates/humphrey_json

https://lib.rs/crates/rust_json

https://lib.rs/crates/sj


The value prop isn't serde_json, it's automatically generated serializers and deserializers for structured data without needing an extra codegen step like with protobufs/capnproto, plus all that machinery decoupled from the actual data format you're reading.

It essentially generates a massive amount of code that you need to write anyway, at the cost of code size and compile time. And a lot of people are happy to make that trade off.

I wouldn't call that a "bloated monster" because of that. Also, none of those options are alternatives to serde_json, unless you restrict yourself to serde_json::Value - which no one does in practice.
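
As a rough illustration of the "code that you need to write anyway" point, here is a hand-written, std-only serializer for the `Foo` struct from the test program earlier in the thread. It is a sketch of the kind of boilerplate a derive saves, not what serde actually generates; `escape_json` and `to_json` are names made up for this example.

```rust
/// Same shape as the `Foo` struct from the test program in this thread.
struct Foo {
    bar: String,
}

/// Escape the characters that must be escaped inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl Foo {
    /// Hand-written serialization for this one struct (hypothetical helper).
    /// Every new field or struct needs more code like this.
    fn to_json(&self) -> String {
        format!("{{\"bar\":\"{}\"}}", escape_json(&self.bar))
    }
}

fn main() {
    let foo = Foo { bar: String::from("baz") };
    println!("{}", foo.to_json());
}
```

This only covers serialization of one struct into one format; deserialization with error handling is considerably more code, which is the trade-off being described.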


> none of those options are alternatives to serde_json, unless you restrict yourself to serde_json::Value - which no one does in practice.

check your facts, all the above options have derive support, serde is not special in that.


merde_json should also be relatively small.


can rust use the json-c library?


I'd assume you could use bindgen and create bindings no problem.


People use rust for its memory safety.



