The Serde Rust Framework (serde.rs)
360 points by ur-whale 50 days ago | 152 comments

serde is one of the best things about Rust in practice.

It is more convenient to use serde to serialize/deserialize to some standard format like JSON or YAML than it is to write your own half-baked format and serialization/deserialization code --- even for the simplest tasks --- so you just stop doing the latter. This is a fundamental shift, and very good for your software.

"Convenience" here actually covers a lot of ground. The APIs are convenient. Boilerplate code is minimal (#[derive(Serialize)]). serde's attributes give you lots of control over the serialization/deserialization, while still being convenient to use. cargo makes importing serde into your project effortless. All of these are necessary to achieve that shift away from custom formats.

Before Rust I used to write a lot of one-off half-baked formats. Pernosco uses Rust and serde and it uses no half-baked formats. Everything's JSON, or YAML if it needs to be more human-editable, or bincode if it needs to be fast and compact and not human-readable or extensible.

The only downside is compile time bloat.

Serde generates heaps and heaps of generic code. This all gets optimized away to be very efficient, but only once it reaches LLVM.

Ever tried working on a crate with hundreds or thousands of de/serializable types? Compile times shoot through the roof really quickly, and serde is often the culprit.

The maintainer of serde also created `miniserde` [1] to tackle this problem, which uses dynamic dispatch instead of generics and can have 4x compile time improvements.

Due to Rust's lack of orphan instances you really depend on a pervasive standard for serialization, though, so the ecosystem is pretty locked in to serde.

[1] https://github.com/dtolnay/miniserde
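The tradeoff described above, generic code that is instantiated once per type versus a single copy called through dynamic dispatch, can be sketched with nothing but the standard library. This only illustrates the two dispatch styles; it is not how serde or miniserde are implemented, and `describe_generic` and `describe_dyn` are made-up names.

```rust
use std::fmt::Display;

// Static dispatch: the compiler emits a separate copy of this function for
// every concrete `T` it is called with (monomorphization). More code for the
// backend to compile, but each copy can be optimized for its type.
fn describe_generic<T: Display>(value: &T) -> String {
    format!("value = {}", value)
}

// Dynamic dispatch: one compiled copy. The `Display` implementation is found
// through a vtable at run time.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {}", value)
}
```

Calling `describe_generic(&42)` and `describe_generic(&"hi")` produces two instantiations, while both calls to `describe_dyn` share the same machine code.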

I've been using Rust for just a little bit, building applications/services/scrapers/databases/forking libraries, but I haven't noticed these issues--the biggest compile time issue I had was including RocksDB. I'm trying to imagine what you're building. :} 99% of the time I'm just doing incremental debug builds, and haven't really minded the compile times when doing release builds. Overall everything has been fast, to the point where I want to start scripting with Rust.

Studies have shown that more than 87% of complaints about rust compile time are because people are enabling serde everywhere and for everything.

These concerns are valid. Data serialization/deserialization is an extremely standard thing to need to be able to do. It's not the user's fault if they use the obvious tool for an obvious job and suddenly have bloated compile times.

Seems like there’s opportunity for tighter integration with the compiler where serde moves into the compiler or there are acceleration primitives rather than doing the AST macros thing. This sounds similar to the problems of c++ template bloat.

Except C++ template bloat has solutions not yet available in Rust.

For example you can pre-instantiate the most common type parameters and just stick those instantiations into a binary library.

Also if you have a nice C++ compiler environment like Visual C++, you can even do edit-and-continue for many kinds of changes, and VS 2022 will double down on that capability.

C++ has theoretical solutions to it. I’ve yet to see anyone actually do anything like that in any professional environment (from small startup codebases to the largest C++ codebases that exist in the world) because that’s an unmaintainable solution with a high burden placed on the developer.

Whatever the solution Rust comes up with I think will look at the practical component so as to actually see adoption. It seems like the Rust team is very good at studying the mistakes other languages make in their approaches to a problem and finding the right fit within the language/targeted domain.

We used to do such stuff at Nokia, on HP-UX with Clearcase and .o build caching supported by cleartool using aCC, while working on NetAct, professional enough?

My point was more meant to be that it's extremely rare, not that it doesn't ever happen. Do people partially specialize templates? I'm sure yes, it happens from time to time. Lots of things happen from time-to-time depending on the mix of the team & the problem domain. If you picked a random team/codebase, how many do you think are carefully optimizing partial template specialization? Moreover, what tools even exist to let you profile the contribution of a missing specialization and measure the impact of your change? Not sure how .o build caching relates to partial template specialization though.

With regards to your specific experience, I don't know that I'd say that Clearcase, HP-UX or aCC are mainstream professional C++ development environments. I think for that you're really looking at clang, gcc, & msvc with icc being a distant fourth that has itself moved to clang recently. The primary target OS would either be server Linux, Android, macOS, iOS with maybe FreeBSD picking up the slack. HP-UX has a niche but it's a small niche comparatively in terms of $ spent on developers in that ecosystem. In terms of source control, everyone is mostly on git or mercurial. Anyone who isn't is just refusing to get with the times: they have bought into their existing tool expertise and don't know how to adapt.

@burntsushi had a post where he explained that the common error-handling crates also add big compile-time costs. Most projects use them as well as serde, so the problem compounds.

I think that's a misunderstanding. That's a very different problem from the serde one.

The compile time cost of the error handling crates is due to an increase in build graph size, not due to the cost of running the error handling proc macros or the cost of compiling the code generated from them. For anything but a small project, that increase in graph size is probably not a big deal (and likely the project is already using much of the same deps). This cost does not increase as a project gets larger.

serde also probably has the build graph size problem, but the bigger problem is the cost of generating and compiling the generated code. This cost tends to increase as a project gets larger.

Can you provide a link to that post?

That's great. That means that if the Rust compiler is made faster in this one case it'll make a huge difference across the board.

That's true. Pernosco has lots of serialization code and compilation is slower than I'd like. I feel that it's still easily worth it.

Lack of orphan instances really isn't that bad of a problem in practice. If really necessary, the newtype pattern is very easy to do in Rust and has the benefit of never conflicting if the dependency adds a trait implementation. What is way more important is keeping your crate sizes manageable since IIRC Rust doesn't do any incremental/parallel compilation within a single crate.
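For anyone who hasn't met the newtype pattern mentioned above, here is a minimal std-only sketch. The wrapper name `Tags` is hypothetical; the point is only that a local tuple struct around a foreign type may implement a foreign trait.

```rust
use std::fmt;

// `Vec<String>` and `fmt::Display` are both defined in the standard library,
// so the orphan rule forbids `impl fmt::Display for Vec<String>` here.
// Wrapping the foreign type in a local tuple struct makes the impl legal.
struct Tags(Vec<String>);

impl fmt::Display for Tags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `self.0` is the wrapped vector.
        write!(f, "[{}]", self.0.join(", "))
    }
}
```

Because `Tags` is a distinct type, this impl keeps compiling even if the upstream crate later adds its own implementation for the wrapped type.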

Compilation speed isn't just one number. While 5min isn't bad at all for compiling a whole project from scratch including dependencies, 5min is a really long time for incremental builds.

You just do `incremental = true` and `codegen-units = 1024` (or how many shards you want, afaik per-crate), and it's perfectly incremental. Even thinLTO does incremental, but I suggest you use a modern linker instead of old GNU ld.

It's a tradeoff. Switch to Python and watch your serialization code become at least 10x slower at runtime.

10x slower seems unlikely - if in Python you’re using the same Rust serde library for your JSON parsing or whatever, e.g. you “poetry add orjson” to pull Rust's serde lib into your Python project, then the actual serde operations themselves will be native Rust speed.

The problem is IMHO less the increased runtime cost than that it also tends to be less reliable, less robust, harder to get actually right (instead of just somehow working), etc.

In my observation, the Rust community is very hesitant to use dynamic dispatch, even where it wouldn't hurt performance (true for most things that involve I/O) and would dramatically reduce compile times.

I'm not sure why. The fact that Rust makes dyn traits significantly more verbose doesn't help, of course, but that can't be the only reason.

Writing `dyn` in front of types doesn't really make a difference, it's only a little bit of added verbosity.

The main problem is that trait objects are very limited.

There is no downcasting or upcasting (eg C++ dynamic_cast) and trait objects are limited to a single trait. You can't have `Box<dyn A + B>`.

That leads to lots of headaches in practice.

I hear downcasting might be on the horizon, so that's something to look forward to.
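A common workaround for the `Box<dyn A + B>` limitation described above is a combined trait with both traits as supertraits. A minimal sketch, with all trait and type names (`Shape`, `Label`, `LabeledShape`, `Square`) invented for the example:

```rust
trait Shape {
    fn area(&self) -> u32;
}

trait Label {
    fn label(&self) -> String;
}

// `Box<dyn Shape + Label>` is rejected by the compiler, but a trait that has
// both as supertraits can be used as a single trait object.
trait LabeledShape: Shape + Label {}

// Blanket impl: anything implementing both traits is a `LabeledShape`.
impl<T: Shape + Label> LabeledShape for T {}

struct Square(u32);

impl Shape for Square {
    fn area(&self) -> u32 {
        self.0 * self.0
    }
}

impl Label for Square {
    fn label(&self) -> String {
        "square".to_string()
    }
}

// Methods from both supertraits are callable through the one trait object.
fn summarize(item: &dyn LabeledShape) -> String {
    format!("{} with area {}", item.label(), item.area())
}
```

It works, but every combination of traits needs its own named trait, which is part of the headache being described.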

I think in most ecosystems it's easier to use a prebuilt serializer/deserializer than your own format.

Nah, not like it is in Rust. For the vast majority of projects, serde makes parsing / serializing something they don't even have to think about, without having to sacrifice type safety to get there. Until you experience that in a large project with a lot of moving parts and developers, it's hard to appreciate just how many developer cycles you were wasting before on stuff that can be totally automated. It's format-agnostic, it's really fast (not the absolute fastest, but fast enough for most situations), it's insanely customizable for even the weirdest use cases, it's absolutely everywhere, it incorporates proper error handling, and adding it to your own types is trivial.

Is there any actual reason something like serde couldn't exist in other languages? None that I know of, but it doesn't, at least not when you consider just how pervasive serde is in the ecosystem (almost every library that stores things you want to serialize will have it). I really hope every language can standardize on something similar; it just makes development so much nicer and faster when you don't have to worry about this stuff.

> For the vast majority of projects, serde makes parsing / serializing something they don't even have to think about, without having to sacrifice type safety to get there.

This has not been remotely my experience. Deserializing has a lot of implications for ownership and you get into complicated deserializer trait specifications pretty quickly.

In particular, there doesn’t seem to be a way to deserialize a type that has a reference because you don’t seem to be able to tell serde who should own that memory. Maybe I’m wrong, but I’ve walked through this multiple times with experienced Rust users and no one could get it working.

Serde is impressive, I’m sure, but far from “don’t even have to think about it”.
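For what it's worth, serde does model this case: `Deserialize<'de>` carries the lifetime of the input, `&str` and `&[u8]` fields borrow from the input buffer, and other borrowing types opt in with `#[serde(borrow)]`. The ownership arrangement is the part that tends to confuse, so here is a std-only sketch of it rather than of serde itself. `Pair` and `parse_pair` are hypothetical names.

```rust
// The struct borrows both fields from the input text; it owns no strings.
#[derive(Debug, PartialEq)]
struct Pair<'a> {
    key: &'a str,
    value: &'a str,
}

// Parses "key=value". The returned `Pair` cannot outlive `input`, which is
// how the compiler knows who owns the memory: the caller's buffer does.
fn parse_pair<'a>(input: &'a str) -> Option<Pair<'a>> {
    let (key, value) = input.split_once('=')?;
    Some(Pair {
        key: key.trim(),
        value: value.trim(),
    })
}
```

The consequence is the same as with a borrowing deserializer: the buffer has to stay alive for as long as the parsed value is used, so data that must outlive its input needs owned fields such as `String`.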

We almost always deserialize to owned data, partly because our deserialization source is almost always compressed so the data has to be copied. There are a few cases where we want to serialize with a reference and deserialize to owned, but that's no big deal using custom "with" functions.

We have hit one serde footgun, which is that skip-serializing-if corrupts your data with formats like bincode :-(. https://github.com/serde-rs/serde/issues/1732 really wish that could be fixed.

> Is there any actual reason something like serde couldn't exist in other languages?

C# standard library has had DataContractSerializer for decades. That thing reads/writes text XML by default, but with minor tweaks can do binary XML or JSON.

I like their binary XML format the most. Very fast because doesn’t waste time printing or parsing numbers or Base64 bytes. Also, pre-shared XML dictionary, and session-accumulating dynamic dictionary, makes the serialized representation very compact, sometimes an order of magnitude better than text/xml.

These sorts of things are really convenient, but the idea of tightly binding your type definitions to serialization formats always makes me a little nervous: I haven’t used Rust enough in practice to have an opinion here, but I’m wary of a solution in this space that makes it too easy to ignore the forward/backwards compatibility of your external format.

Yeah, for sure that is something to be cautious of. For cases where I expect backwards compatibility to be a concern, I will usually make the top-level type of the message or format an enum with each variant being a version (initially with just a single version, V1 or V0). That way, if we need to make backwards-incompatible format changes, we can create a new variant, and potentially even automatically migrate between formats (I've done this a few times). Then all we have to do is call a function to migrate to the latest format, and we can use whatever that is for all the actual program code (generally it will be a type alias for the version structure of the same name). Of course, if this gets too complex or you start to have non-treelike relationships between values, you still want to fall back on a more traditional database schema, but this kind of versioning can get you a long way.

That said, it's incredibly refreshing to be able to focus on these semantic and migration concerns without having to worry about how to deal with parsing them later. serde doesn't solve enforcing backwards compatibility for you, but it sure makes doing the right thing a lot easier. You never have to worry about some code accidentally forgetting to check the version field, people carelessly mixing and matching messages with different versions, or fields that "should" never be set but are there for backwards compatibility. That's worth a lot to me!
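The versioned-enum approach described above might look roughly like this. It is a sketch under assumptions: the type names and the default of 3 retries are invented, and the serde derives that a real project would put on each type are left out so the example depends only on the standard library.

```rust
// Each stored version keeps its own struct; old ones are never edited.
#[derive(Debug, Clone, PartialEq)]
struct SettingsV1 {
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
struct SettingsV2 {
    name: String,
    retries: u32,
}

// The top-level type of the format: one variant per version.
#[derive(Debug, Clone, PartialEq)]
enum StoredSettings {
    V1(SettingsV1),
    V2(SettingsV2),
}

// Program code only ever sees the latest version, via a type alias.
type Settings = SettingsV2;

impl StoredSettings {
    // Migrate whatever version was stored to the latest one.
    fn into_latest(self) -> Settings {
        match self {
            // Assumed default for the field that V1 did not have.
            StoredSettings::V1(old) => SettingsV2 {
                name: old.name,
                retries: 3,
            },
            StoredSettings::V2(current) => current,
        }
    }
}
```

Adding a V3 later means adding a struct, a variant, and one match arm; the compiler flags every `match` that forgot the new variant.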

I rarely have to think about deserialisation in c# or java either...

I wrote a lot of Java back in the day. I elaborated on why I disagree in the other comment. I do not know if the situation in C# is similar.

For context, the example from the Serde wiki written in Java using Jackson

  public record Point(int x, int y) {}

  // [...]

  var objectMapper = new ObjectMapper();
  var jsonString = objectMapper.writeValueAsString(new Point(2,1));

It isn't obvious to me what advantage Serde offers that you couldn't get with Jackson or other similar libraries in Java (although I get that Java and Rust are different languages, and that there may not be something this ergonomic in the likes of C++)

For that example there's no advantage. But it's really, really easy to find examples where there are (the simplest example is, anything with pointers with @NonNull annotations, or anything which is performance-sensitive and contains other objects).

That has been the same in Java and .NET world for 20 years now, just plug the desired serializer from the standard library and be done with it.

It absolutely does not provide the same guarantees, and I've covered why in several posts now. Do you believe that people who disagree with you have never used any language other than Rust before? Having to deal with data structure invariants getting violated due to reflection is a huge headache and makes the parsers way less useful than they would be otherwise.

Yes, because those same people seem to be unaware reflection is not the only way to use such APIs, revealing a superficial knowledge of how those ecosystems have been working for the last 25 years.

How is it implemented in Java or C# if not via runtime reflection? I've always assumed that's the way it's done...

Java has annotation processors which can perform code generation, just like Serde does.

Manifold [1] is an example of a framework that relies heavily on that, achieving things similar to Serde.

Java developers don't like to use these things too much though, unlike Rust developers who love their `#[derive]`, so you're mostly right that reflection-based serialization is still more common, but that's by choice.

Dependency Injection is in a similar situation: frameworks like Micronaut [2] can do it without reflection, and things like Google Dagger [3] have existed for several years that do the same thing... but still, most Spring Boot-based projects I know of use reflection-based DI as well... it's fast and the security issues have been largely mitigated nowadays.

[1] http://manifold.systems/

[2] https://docs.micronaut.io/latest/guide/

[3] https://rskupnik.github.io/dependency-injection-in-pet-proje...

Thank you for the detailed write-up! I love finding out about cool language features/tools.

I'd always recommend folk use constructor-based deserialisation, rather than getting the deserialiser to create empty objects and poke at fields.

That's still (at least naively) using reflection, but it's a lot safer.

In practice, using reflection for every object being deserialised is far too slow and Jackson at least generates code at runtime rather than repeatedly reflecting. That's magic, but it's nice magic because it's not breaking the language: one can see that it's possible to implement it in pure Java.

I don't program Java, but I have implemented and used similar things for C#.

Reflection is indeed used there, but not to serialize. It's only used once per type, to generate code.

C# has multiple ways to generate code in runtime. One is Reflection.Emit, allows to manually generate bytecode instructions + metadata. For instance, one can build new types in runtime. A typical pattern is implementing manually-written abstract class or interface with generated types: this way the manually-written code can call into runtime generated one.

Another method is System.Linq.Expressions. This one allows to generate code (no new types though, just functions) from expression trees, and provides API to build and transform these expression trees.

Regardless of the method, the generated code is no different from manually written code. The JIT compiler can even inline across the boundary, when generated code calls manually written code or vice versa.

Ah, the bytecode generation via Reflection.Emit seems like a really powerful feature. Does it feel "brittle" to use, or is it expressive enough to aid writing more robust code (to a degree, I realise it's not possible to eliminate every problem)?

There's a weird (not in a bad way) elegance here that reminds me of lisp.

> Does it feel "brittle" to use

Yes and no.

No, because when you try to do unsupported things like calling a method on an object which doesn’t support one, you're going to get an appropriate runtime exception.

Yes, because if you get lower-level things wrong, like local parameter allocation, you're going to get an appropriate runtime exception, but that one (1) comes too late (I’d prefer such things to be detected when you emit the code, not when trying to use the generated code) and (2) lacks context.

Overall, when I can, I use the higher-level System.Linq.Expressions for runtime codegen. Things are much nicer at that level. I only use the low-level thing when I need to emit new types, like here: https://github.com/Const-me/ComLightInterop/blob/master/ComL...

You can use stuff like VS T4 templates or compiler plugins via annotation processors.

So at compile time, just like serde works, you create your type safe serializer.

Why would reflection break invariants? Most deserializers can pass through data into the constructor via reflection as well, where you can protect against invalid objects.
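The "pass the data through a constructor" idea translates to Rust as a conversion from an unvalidated raw type into a validated one. serde has a container attribute for exactly this shape, `#[serde(try_from = "...")]`; the sketch below shows only the std-level half of it, with hypothetical `RawRange` and `Range` types.

```rust
use std::convert::TryFrom;

// Shape of the data as it arrives from the wire: no invariants yet.
struct RawRange {
    start: u32,
    end: u32,
}

// Validated type. Its fields are private outside this module, so code
// elsewhere can only obtain one through `try_from`, where `start <= end`
// is checked.
#[derive(Debug, PartialEq)]
struct Range {
    start: u32,
    end: u32,
}

impl TryFrom<RawRange> for Range {
    type Error = String;

    fn try_from(raw: RawRange) -> Result<Self, Self::Error> {
        if raw.start <= raw.end {
            Ok(Range {
                start: raw.start,
                end: raw.end,
            })
        } else {
            Err(format!("start {} is after end {}", raw.start, raw.end))
        }
    }
}
```

The decoder only ever fills in the raw type, and an invalid value becomes an ordinary error instead of an object that violates its invariant.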

Yay, except that at least Java does so by using reflection, which has led to more than just a few massive security vulnerabilities.

Just to be clear, there are Java libraries which do things better, in a less problematic way; it's just that the facilities shipped in the standard library are IMHO inadequate.

Macros are definitely key. I wonder if ownership helps too? Circular references are always weird when serializing

The trick with Serde is that it decodes straight into a native Rust struct. You get most of validation for free (it also nicely takes advantage of Rust enums with data), and struct access is maximally fast.

A generic JSON decoder would give you a dynamic structure that can contain anything, and then you'd have to pick it apart.

Most deserializers in typed languages allow you to deserialize straight into structs or typed objects...

Serde just doesn't use reflection to do it.

APIs like GSON and Jackson have a lot more footguns, because (1) they attempt to (de)serialize types whether or not they were ever intended to be (de)serialized, (2) they are missing information about important constraints due to losing information at runtime (the excuse usually used here is that "they don't do validation" when in reality they are violating expectations of package modularity and encapsulation in a pretty blunt way). For example, they will happily ignore non-null constraints on your fields. In practice, this is a major annoyance and ends up making you very paranoid about (de)serializing any structure you don't have full control over.

Performance of a lot of the reflection-based parsing APIs also leaves something to be desired, which means that projects with more stringent performance requirements often have to resort to stuff like Avro. This still happens with serde, but much more rarely--besides having more compile-time information at its disposal and needing to allocate less, it also provides relatively straightforward hooks to achieve things like zero-copy deserialization for strings where the whole buffer is available at once.

I’m pretty sure the non-null issue is fixable with a single config option, at least in GSON IIRC.

I don't believe it's fixable at all if you're using someone's third-party library that wasn't designed for GSON (unless something has changed a lot since I last used it). The annotation has to be stuck on the original field, AFAIK. And similar stuff applies to other invariants on the type. I believe these kinds of issues are somewhat fundamental when you decide you're going to start (de)serializing structures that never thought about whether or how they'd like that to happen.

In Jackson, you can use mixin classes to define annotations for 3rd party libraries.

C++ isn't super easy about it, though. The best C++ json library (in my opinion), nlohmann's json, still requires you to define to_json and from_json for your types to use them. It's not too bad as a one-off, but when you have dozens of types, it gets really tedious compared to Serde, and managing std::variant with it is way more annoying than Serde with Rust enums.

However, if you just want a very minimal subset of a JSON response (i.e. a 'status' string field out of say 20 other values in a map), is it not better to just look for one field, as opposed to have to create an entire struct in Rust to represent the full response?

How does Serde work with optional extra fields? I know you can use Option<>, but that implies you know they exist - what happens if fields get added in the future silently? I guess the schema changes then, which isn't good, but that might happen?

An "entire" Rust struct containing a single string field is just a struct with a single field. You don't have to specify the fields you don't need and it's unlikely your handrolled parser would be faster than serde (unless you are using memchr or something, then in that case you aren't parsing JSON).

You can add a catch all field that captures those in a map. You can also choose to ignore things you don’t explicitly mention. It is up to you how strict you want the parsing to be.

So you can define a struct with one status field and the right annotations and you will get exactly what you describe without having to write the code and it will still be almost as fast as doing that parsing yourself.

If you only want a single field in json, you define a struct with just that single field. Serde does have the ability to return an error about unknown fields, but it is not enabled by default.

This also informs your second question. If new optional fields in json are added, unless you tell serde to complain about them, it won't, and you can add the new Option at your leisure.
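The three policies described in these replies (skip unknown fields, reject them, or collect them in a catch-all map) correspond in serde to the default behaviour, `#[serde(deny_unknown_fields)]`, and a map field marked `#[serde(flatten)]`. The following is a std-only toy model of those policies over already-split key/value pairs, not serde's implementation; `decode` and `UnknownFields` are invented names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct Response {
    status: String,
    // Catch-all for fields the struct does not name (filled by `Collect`).
    extra: BTreeMap<String, String>,
}

#[derive(Clone, Copy)]
enum UnknownFields {
    Skip,    // analogous to serde's default
    Deny,    // analogous to #[serde(deny_unknown_fields)]
    Collect, // analogous to a #[serde(flatten)] map field
}

fn decode(fields: &[(&str, &str)], policy: UnknownFields) -> Result<Response, String> {
    let mut status = None;
    let mut extra = BTreeMap::new();
    for &(key, value) in fields {
        match (key, policy) {
            ("status", _) => status = Some(value.to_string()),
            (_, UnknownFields::Skip) => {}
            (other, UnknownFields::Deny) => {
                return Err(format!("unknown field `{}`", other));
            }
            (other, UnknownFields::Collect) => {
                extra.insert(other.to_string(), value.to_string());
            }
        }
    }
    // A required field that never showed up is an error under every policy.
    let status = status.ok_or_else(|| "missing field `status`".to_string())?;
    Ok(Response { status, extra })
}
```

With the `Skip` policy, a response that silently gains new fields keeps decoding, which matches the behaviour the replies describe.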

As far as I understand, unmapped fields are simply ignored, so you can write your code like

    #[derive(serde::Deserialize)]
    struct Response { status: u64 }

> However, if you just want a very minimal subset of a JSON response (i.e. a 'status' string field out of say 20 other values in a map), is it not better to just look for one field, as opposed to have to create an entire struct in Rust to represent the full response?

You can define a struct with just the status field.

> How does Serde work with optional extra fields? I know you can use Option<>, but that implies you know they exist - what happens if fields get added in the future silently? I guess the schema changes then, which isn't good, but that might happen?

Unknown fields are skipped by default.

Fair enough. I have not found that to be true in C/C++, which is my main point of comparison. For very simple cases it's generally easier to write a few lines to a text file, maybe with some whitespace separators in the lines, than to use a "real" format.

For JS and Python, JSON.stringify/parse and json.loads/dumps are a step up, but you still end up with an untyped mess with no schema validation, which makes them only halfway solutions to me. I'm a static typing guy at heart, sue me.

Yea, but with serde, you don't get an untyped mess. You either get the type you expected or you get an error.

Same is true in any typed language.

I don't think that's true. Or at least we have different ideas about what's being discussed...

Let's take JSON or YAML for example:

Rust's closest siblings C and C++ are typed, and you get a pretty much untyped mess when working with serialisation or deserialisation. You have to manually inspect every node or write manual "NodeType"-to-struct conversions. They allow unwrapping to primitives at best (eg via template specialisation).

In Haskell it's a bit more awkward, but you can kinda pull something similar off in terms of ergonomics. I've not worked with JSON in Haskell that much that I need to look for it, but I don't remember there being something as ergonomic as serde.

Typescript doesn't support this type of validation either out of the box (there might be tools that add validation).

In Elixir there's the Poison library, or it could be done with Kernel.struct/2 (not 100% sure this will work though).

Of the "mainstream" languages Go is pretty much the only one that has as good ergonomics as Rust in that it supports it pretty much natively (via struct tags)


This list excludes codegen tools which can generate (de)serialisers from an external schema (a la capnp, grpc, jsonschema, etc).

> In Haskell it's a bit more awkward, but you can kinda pull something similar off in terms of ergonomics. I've not worked with JSON in Haskell that much that I need to look for it, but I don't remember there being something as ergonomic as serde.

I think Haskell's most popular JSON library aeson is basically the same thing and probably a source of inspiration for serde.

  data MyCustomType = ... deriving (Generic, ToJSON, FromJSON)

or if you want more control you can use deriving-aeson

  -- I prefer snake case instead
  data MyCustomType = ...
    deriving Generic
    deriving (FromJSON, ToJSON)
    via CustomJSON '[FieldLabelModifier CamelToSnake]

Ah, I've heard about aeson, but not used it. Thanks for pointing it out!

> Of the "mainstream" languages Go is pretty much the only one that has as good ergonomics as Rust in that it supports it pretty much natively (via struct tags)

Swift does this too, using the Codable Protocol. The compiler generates the protocol conformance code for you, if you say a type is `Codable`, and the properties the type contains are all Codable types (Int, Float, String, URL, etc. conform to it), you don't have to do anything.


Ah, I don't use Apple devices, so I've never been able to play with Swift(1). So I didn't know Swift had it too; thank you for pointing it out!

(1): which is a shame, I remember when it came out that it seemed like a nice language

Swift is available for Windows 10 and a few Linux distros (Ubuntu, CentOS, and Amazon Linux 2) according to their download page.[0] If you're still interested, go give it a shot.

[0]: https://swift.org/download/

In most typed languages it's like this:

  var account = JsonConvert.DeserializeObject<Account>(string/stream)

That's it. No inspection or looping through trees. You can often get it to create objects via constructor for validation.

It's been like this for 15 years...

Maybe we are having different ideas about what is being discussed, but in Java and C# (arguably the most mainstream languages) you have the same experience supported natively via annotations and attributes.

Ah, I stand corrected then.

I've worked with the Microsoft tech stack very very little, so I've never bothered to learn C#.

I've completely forgotten about Java; I haven't used it in ages. Now that you mentioned Java, I realised Kotlin has it too.

Swift also has native support. I am actually not a huge fan of the ergonomics of Go’s struct tags because of how cluttered they get in practice. It feels so unstructured like the rest of the language, but maybe that makes it perfect for Go.

Yeah I don’t know why you would try to write your own text-based format. GSON and Jackson have been around for over a decade in Java, and there are equivalents in other languages. Unless you are in some perf-critical application, a library is almost always better for this problem.

At least in languages with similar native performance as Rust (C/C++), in my personal experience, serialization has often been difficult and manual, and most libraries have fallen short of my expectations, or had serious performance, security, or toolchain issues, to the point I usually hand-serialize myself.

The other benefit is that almost all of the Rust ecosystem supports serde, often by way of a feature. So using serde can mean being able to embed a type from a crate in your serializable type or being able to pass your own types to crate functions/methods that serialize/deserialize. It all ends up being interoperable.

It's also blazing-fast.

I've done some benchmarking on JSON -> in-memory borrow structs and it was pretty darn impressive while being fairly ergonomic to use compared to a streaming parser.

Completely agree. FWIW, I think procedural macros make a huge difference for ergonomics. For example, a "similar" library in C++ (libcereal) is still a PITA to use and debug.

The extendibility of it is impressive, as well as the list of formats supported by the community. The ability to output JSON for debugging next to tight-concise binary formats, is really appealing.

I've dealt with a number of C++ based serialization libraries, and they always have serious downsides to the point I often end up hand serializing structures with one-off half-baked formats just like you, which is error prone and laborious.

Whilst not classical de/serialization I wrote serde_v8 (https://github.com/denoland/serde_v8), an expressive and ~maximally efficient bijection between v8 & rust.

It has powered Deno's op-layer since 1.9 (https://deno.com/blog/v1.9#faster-calls-into-rust-with-serde...) and has enabled significant improvements in opcall overhead (close to 100x) whilst also simplifying said op-layer.

Honestly using serde for language interop is one of my favorite things about serde, whether it's "classical de/serialization" or not. I've recently had the very-pleasant experience of writing some code that needs to pass geospatial data back and forth between Python and Rust, and found that the geojson crate, even though it's nominally for JSON, actually works with other serde-compatible things, including (something I found kind of miraculous) Python objects, using the pythonize crate, which can walk them with serde visitors. So as long as I can get my data into a roughly-geojson-shaped thing on the python side, I can consume it on the Rust side, without having to ever actually produce json.

That's fantastic.

I maintain the nodejs bindings for foundationdb. Foundationdb refuses to publish a wire protocol, so the bindings are implemented as native code wrapped by n_api. The code is a rat's nest of calls to methods like `napi_get_value_string_utf8` to parse javascript objects into C. (Eg [1]). As well as being difficult to read and write, I'm sure there's weird bugs lurking somewhere in all that boilerplate code. I've made my error checking a bit easier using macros, but that might have only made things worse.

I'd much prefer all that code to just be in rust. serde-v8 looks way easier to use than all the goopy serialization nonsense I'm doing now. (Though I'd want a serde_napi variant instead of going straight to v8).

[1] https://github.com/josephg/node-foundationdb/blob/c1165539e5...

I'm not using it for performance reasons, but I'm also using Serde as a bridge for my template engine and it's such a nice experience. I just wish it was possible to pass auxiliary information between serializers without having to resort to thread locals.

To me, Serde is one of the killer features of Rust. Once you realize how much of programming is (for better or worse) shuttling data between different representations, it's hard for me to program in a language where I don't have a tool like Serde.

A big part of its flexibility is how modular it is. Most common Rust libraries support serde serialization (at least as an optional feature), so if you use crates that do, you can plug any backend in and serialize those data structures to it. It doesn't even have to be string-based; I've been using Serde to store arbitrary data structures as objects on Google Cloud Firestore.

Two things make serde great: (1) It's format agnostic (2) everyone in the community uses it.

My experience with other language ecosystems (I'm thinking Haskell or Scala) is that serialization/deserialization libraries are limited to one format, with incompatible APIs. Even if the libraries in those ecosystems provide the same kind of guarantees, it still doesn't compare to serde. Typically any framework or library (cats, play, scalaz, akka) will be either inflexible in its format support or require glue code (provided by the user of the library or a 3rd party) to even work with your pet codec library, which ends up also limited to a specific storage format.

With serde, the assumption is just that things work, and it's fantastic. Example: I have my web server (written in rust) communicating messages to the frontend (in elm) with a websocket. For fun, I decided to test druid's wasm output when it came out (druid is a UI framework), see how feasible it would be to replace the frontend. And I could just pick up the type definitions from my rust server, add a websocket dependency that works on wasm and _it just worked_! I could then switch from json to msgpack and it worked the same.

The strength of rust is the community. There is a strong sense of collaboration, and under the rich diversity of frameworks and high level libraries there is a culture of modular and shared code and contributions that makes everything that much more flexible and reliable.

Ah, cool. If you implement ToJSON from aeson for a type in Haskell, you also can do yaml serialization without any extra work (besides bringing in the YAML library, but the point is that the yaml library reuses ToJSON), but I don't know about alllll of those others that I see here: https://serde.rs/#data-formats

Pretty cool!

I just came from rust to go and… what a disappointment.

Missing fields default to some kind of default “zero value” - for any type, even full blown structs. You can’t tell the difference between a missing field and a field having the default value.

So if a field has a validation of “must be greater than zero”, you can’t really give a proper error message. If user puts in “0” or omits the field, you always get a value of “0”.

I just went from two years of professional Rust development to Go, and I’m right there with you. Thankfully we have some terrible C++ that looks to be a good candidate for a rewrite.

If you want a field to be able to be not-set, you just need to specify that by making it an `Option<_>` type. So like, if the field is a number, make it an `Option<u64>` -- that way you can distinguish between not-set (`None`) and set to zero (`Some(0)`).

Edit: whoops, misread which language it was you were frustrated with. Nevermind!
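For the Rust side of this comparison, here is a minimal sketch of how `Option<u64>` keeps "missing" and "zero" apart so that each gets its own error message. The `Request` struct and `validate` function are hypothetical names made up for illustration. With serde's derive, a missing field of type `Option<_>` normally deserializes to `None`, but the sketch below is std-only and builds the values by hand.

```rust
/// Hypothetical request type: `count` may be absent from the input.
#[derive(Debug, PartialEq)]
struct Request {
    count: Option<u64>,
}

/// Checks "must be greater than zero" and reports a distinct
/// message for a missing field versus an explicit zero.
fn validate(req: &Request) -> Result<u64, String> {
    match req.count {
        None => Err("count is missing".to_string()),
        Some(0) => Err("count must be greater than zero".to_string()),
        Some(n) => Ok(n),
    }
}
```

Calling `validate(&Request { count: None })` and `validate(&Request { count: Some(0) })` yields two different errors, which is exactly the distinction that a zero-value default erases.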

You misunderstood, parent is complaining about Go, not about Rust. In Go you would have to use a pointer to represent Option<T>, which makes code really awkward.

Go serialization really has quite a few issues apart from default initialization. Configuration by somewhat weird struct tag strings which are only evaluated/validated at runtime, de/serialization is all done via reflection (unless you want to use code generation), ...

> In Go you would have to use a pointer to represent Option<T>, which makes code really awkward.

This sounded surprising to me, then I remembered Go doesn’t have generics. I imagine this won’t be as much of an issue when it does?

Emulating ADTs without language support/pattern matching is quite awkward. A prime example is std::optional in C++.

.map() et al only get you so far.

When extracting the value in Go you either have to return a pointer / error pair or abort. Both of which introduce awkward code patterns and potential for misuse.

Perhaps if they were to remove the concept of nil in golang, but since they are all about backwards compatibility (which is nice to be fair), I don't think there's much chance of seeing mainstream adoption, because of how many existing projects rely on default values in certain circumstances.

I'd love to be proven wrong, but even with generics if you start using option types, it's going to be like JS and TypeScript where you'll have some parts nicely typed and others are the wild west

I know you’re giving helpful advice, but the author is talking (complaining) about Golang, which has no option typing (unless you want to use a pointer).

When Option<u64> is used in rust, what's the memory layout like? Is the absence tracked using a pointer which is null or a bitvector of optional fields?

Option<u64> would look like a tagged union in memory: a tag (one bit of information, though it occupies at least a byte), the 8-byte u64, and padding.

If the type inside of Option<_> has invalid values (like references, which can't be null), one of these would be used for None, so Option<&T> is the same size as &T.

> If type inside of Option<_> has invalid values

Can it actually use any general invalid bit pattern? I expect that it only supports an optimization for values that cannot be zero (and then it uses zero to indicate None).

thanks and is the padding for aligned access (wasted space) or just full byte (poor performance on non x86/64 like architecture)? because that's the interesting part as to what's the tradeoff chosen and whether a dev can choose a different one. Any doc references are appreciated.

They are padded for aligned access, so on amd64 Option<u64> will be 16 bytes.[0] However, the unused discriminant values are counted as niches for enums wrapping that, so Option<Option<u64>> is still 16 bytes (for example).

[0]: https://play.rust-lang.org/?version=stable&mode=debug&editio...
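The sizes discussed above can be checked with `std::mem::size_of`. This is a small sketch, assuming a 64-bit target such as x86-64; the helper name `sizes` is made up for illustration. Only some of these layouts are guaranteed by the language (for example `Option<&T>` and `Option<NonZeroU64>`); the rest are what current compilers do in practice.

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

/// Returns (size of T, size of Option<T>) in bytes.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}

/// Collects a few interesting cases for printing or inspection.
fn report() -> Vec<(&'static str, (usize, usize))> {
    vec![
        ("u64", sizes::<u64>()),                 // no niche: tag plus padding is added
        ("&u64", sizes::<&u64>()),               // null is the niche
        ("NonZeroU64", sizes::<NonZeroU64>()),   // zero is the niche
        ("bool", sizes::<bool>()),               // any byte other than 0 or 1 is a niche
        ("Option<u64>", sizes::<Option<u64>>()), // reuses spare tag values
    ]
}
```

On such a target, `sizes::<u64>()` gives `(8, 16)` while `sizes::<&u64>()` gives `(8, 8)`. The `bool` case also speaks to the question of whether only zero can serve as the niche: an invalid non-zero bit pattern is used there.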

I've usually seen this handled by having everything be a pointer, the default value of the pointer is nil. You can also use the SQL values.

Isn’t it insane to change your types to accommodate this? You have to change all fields accesses, it behaves differently, you lose non-nullability…

> Isn’t it insane to change your types to accommodate this?

Welcome to Go! This was a real pain for us at my last job, where we used go-swagger to generate REST API code. All the generated structs had pointers for _non-optional_ fields so it could also check that they were populated. Meanwhile, optional fields were not pointers because they could use the default value.

So then we ended up with lots of code doing checks for `is X nil` all over the place. Many of those wouldn't be needed in normal usage, since a nil value would be rejected by the validator earlier, but we also sometimes needed to make these structs by hand and pass them around.

All of this is necessary because of the lack of generics. With generics you can solve this with things like an `Option<T>` type, like Rust does.

Yea exactly, pointers allow nils, which I don’t want to conflate with omitting either.

In the end what I did is parse the JSON-string twice, once more with a “checkjson” library that tells me about missing fields.

Pretty excited for Go 1.18 for that reason! Will likely take some time for the ecosystem to shuffle out good solutions for different bits, though

Well that's go for you.

Go is simple, with a "default case" that is simple and works for most cases... and then you have edge cases where you need to bend over backwards.

As an exercise, try to use comma (",") in JSON field name.

Don't you do that in rust too? You'd be working with Option<T> instead of T.

The critical difference is that an Option<T> is not a T, but a nil pointer is still a pointer.

The stylistic difference is that the Rust type is semantic: your optional values are Options. The Go type is silly: why does being a pointer (which is allowed to be nil) imply required?

Option<Int> is a really good way to describe a field that if present is an int, but may not be present, though. Pointer isn't.

I don't think so. Speaking as a Rust dev, if you want to do anything slightly different from how things currently work, you need to change the types, which can be a major hassle.

This is a thread about Serde, not about how superior Rust is and how shitty Go is. We've had more than enough from this ...

Right, serde is one of the many features about Rust that people miss when writing Go. Simply a group of downtrodden programmers bonding over a shared experience, the memory of which is evoked when reminiscing on a grand cathedral. But alas the mere reminder of the existence of such a grand structure weighs on the heart of the plebeian. No need to rub it in sorry gogrammers’ faces that their language is a limp noodle in comparison. Thanks for reminding us (;

> No need to rub it in sorry gogrammers’ faces that their language is a limp noodle in comparison.

Say what you want about Go, but gophers get better jobs than you, meanwhile almost all rust jobs are shady crypto startups. Hard truth lol.

I know how to write both, thank you very much. And I predominately use Rust and Swift at my current gig, which is not a crypto anything. Did you know Microsoft, Cloudflare, Google, and Mozilla all use Rust? Wow. Such shade. Much crypto. I guess you're right about them all being startups though.

Frankly Serde’s continued success has made it basically necessary for most Rust development. When the Bazaar organically builds and loves a mini-Cathedral it’s probably time to upgrade it to standard.

The increased compile times are the only real issue with Serde.

Especially if upgrading this crate into std would allow for a way to reduce the compile time “penalty” of very common formats like JSON, etc.

Because Serde is a common API for pretty much all serialisation formats supported in Rust, it's very easy to try multiple formats to choose the best size/speed you need. JSON too big? You can make it CBOR or Msgpack with a couple of lines. Want faster? Call bincode instead.
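To illustrate why swapping formats is cheap when the data type and the format are decoupled, here is a toy, std-only imitation of the idea. The `Describe` and `Format` traits and both backends are hypothetical and far simpler than serde's real `Serialize`/`Serializer` traits; this is a sketch of the architecture, not of serde's API.

```rust
/// Toy stand-in for a format backend (hypothetical, not serde's `Serializer`).
trait Format {
    fn write_field(&mut self, name: &str, value: u64);
    fn finish(self) -> Vec<u8>;
}

/// Toy stand-in for a type that can describe itself to any backend.
trait Describe {
    fn describe<F: Format>(&self, out: &mut F);
}

struct Point {
    x: u64,
    y: u64,
}

impl Describe for Point {
    fn describe<F: Format>(&self, out: &mut F) {
        out.write_field("x", self.x);
        out.write_field("y", self.y);
    }
}

/// Human-readable backend: emits a JSON-like object.
#[derive(Default)]
struct JsonLike {
    fields: Vec<String>,
}

impl Format for JsonLike {
    fn write_field(&mut self, name: &str, value: u64) {
        self.fields.push(format!("\"{}\":{}", name, value));
    }
    fn finish(self) -> Vec<u8> {
        format!("{{{}}}", self.fields.join(",")).into_bytes()
    }
}

/// Compact backend: drops field names and writes little-endian u64s.
#[derive(Default)]
struct Compact {
    bytes: Vec<u8>,
}

impl Format for Compact {
    fn write_field(&mut self, _name: &str, value: u64) {
        self.bytes.extend_from_slice(&value.to_le_bytes());
    }
    fn finish(self) -> Vec<u8> {
        self.bytes
    }
}

/// The only line a caller changes to switch formats is the type parameter `F`.
fn encode<T: Describe, F: Format + Default>(value: &T) -> Vec<u8> {
    let mut out = F::default();
    value.describe(&mut out);
    out.finish()
}
```

With this shape, `encode::<_, JsonLike>(&point)` and `encode::<_, Compact>(&point)` differ only in the backend named at the call site, while `Point` itself is written once.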

Yeah, the format independence is amazing. In Pernosco we use bincode for IPC payloads and when we want to log IPCs we just serialize the payloads to JSON with zero developer effort.

Have you considered using a zero-copy serialization format like capnproto, Flat Buffers, or rkyv for binary IPC payloads? Or did you decide that getting that last increment of performance wasn't worth the trouble of using something less easy than serde?

Yeah our IPC marshalling just isn't that relevant for performance AFAIK.

I'm glad serde exists but I don't like HOW it works:

I think there should be a push for rust-lang to provide introspection. If you want to use serde, every crate needs to derive serde traits as well. This is not only bad architecture but it also prevents other libraries from getting popular. For example, there are many "mini-serde" variants and I personally think they are awesome (and surprisingly fast), yet you cannot practically use them because literally no crate supports anything other than serde.

I wonder if the c++ approach of boost.pfr would be portable to rust ? It allows reflection on aggregates without needing to annotate anything: https://github.com/boostorg/pfr

I think currently the closest thing is bevy-reflect. Which is nice, but IMHO it should be in the language because nobody supports it ATM and to get there, it will take at least the same time it took for serde to get supported.


Serde solves a lot of problems but using a dedicated macro for it feels like a real missed opportunity compared to using a generic macro for the part that needs a macro (treating structures generically), i.e. what frunk does, and then building serialization as just one of many possible use cases on top of that. Compare how this is done in the Scala ecosystem, where circe-generic is just one of many libraries that use shapeless to represent datatypes as generic records (and then serialization proceeds from there), and the same representation code is also used by e.g. doobie (a sort of lightweight ORM) to transform records to and from database rows.

(Or, better still, building generic representation of structs as records into the language/compiler, where the whole ecosystem can depend on it and we don't have to bloat compile times generating it with a macro).

On the other hand, Scala's uPickle works almost identically to Serde. That design was originally chosen for performance, but the final visitor-based architecture ended up being almost the same.

KotlinX serialization works basically the same way too; it definitely seems like a case of convergent evolution

Well "walk the object graph with a visitor" goes back at least to early (reflection-based) Java XML serializers; a generated-code version of the same thing is not a huge leap. I find it a frustratingly imperative way of approaching the problem: you can't think about what happens without thinking about control flow, because the visit can't be defunctionalized as a value. You may not want to physically reify the generic representation of the value - there would be a performance cost to that - but having it available to think about helps keep your code (and understanding) clear, in the same way that when doing a series of transformations of a large collection you probably want to be able to conceptualise "the collection in between step 1 and step 2" even if you don't want to actually instantiate that in memory at runtime.

I’m surprised to see that they don’t support protobuf yet, which seems to be gaining ground. But otherwise it seems like a good framework.

Yes, that would be really nice. Protobuf doesn't really fit the model of serde though because of how closely coupled it is to its schema files. Still, there's two good implementations with the prost [0] and the protobuf [1] crates to choose from.

0: https://crates.io/crates/prost

1: https://crates.io/crates/protobuf

I believe that is implemented by a separate crate that simply implements ‘(De)Serialize’

Is there any Rust web framework that does not have a problem with Slow Loris?


Because of the way Rust's async works, you can put a timeout on anything (e.g. you don't need your http parser to support timeouts; you can kill it at any await point). So it's only a matter of choosing timeout policy for your app. One person's DoS attack is another person's long polling API.

Anyway, it has nothing to do with Serde, which doesn't have a network component. For Serde you'd typically buffer the input first, or use external framing (like line-oriented JSON).

None of the current Rust async frameworks will prevent your synchronous code from going haywire.

Tokio or async-std timeouts only cancel futures. The dedicated spawn_blocking threadpools also do not support cancellation.

To prevent such cases you would have to spawn the work into a dedicated thread pool and then kill the thread on timeout.

Terminating threads is a very complicated topic though and not generally possible for compiled languages, especially without introducing memory unsafety.

The only real workaround is to have the synchronous code regularly check for cancellation via an atomic bool or similar, and terminate if required.
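The cooperative approach described above can be sketched with only the standard library. Everything here is an assumption for illustration: `busy_work` stands in for some synchronous, CPU-bound job split into chunks, and `run_with_timeout` is a hypothetical wrapper that flips an `AtomicBool` once a deadline passes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Hypothetical synchronous job: works in chunks and checks the flag
/// between chunks. Returns None if it observed a cancellation request.
fn busy_work(cancel: &AtomicBool, chunks: u64) -> Option<u64> {
    let mut total = 0u64;
    for i in 0..chunks {
        if cancel.load(Ordering::Relaxed) {
            return None;
        }
        total = total.wrapping_add(i);
        thread::sleep(Duration::from_millis(1)); // stand-in for real work
    }
    Some(total)
}

/// Runs the job on its own thread and requests cancellation after `timeout`.
fn run_with_timeout(chunks: u64, timeout: Duration) -> Option<u64> {
    let cancel = Arc::new(AtomicBool::new(false));
    let worker_flag = Arc::clone(&cancel);
    let (done_tx, done_rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let result = busy_work(&worker_flag, chunks);
        let _ = done_tx.send(()); // wake the waiter early on completion
        result
    });
    // Wait for completion or the deadline, whichever comes first.
    if done_rx.recv_timeout(timeout).is_err() {
        cancel.store(true, Ordering::Relaxed);
    }
    // The thread is never killed; it exits on its own at the next check.
    handle.join().expect("worker thread panicked")
}
```

The thread is only ever asked to stop, so the latency of cancellation depends on how often the job checks the flag. Code that never checks it cannot be stopped this way.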

Slow loris is I/O based, so it’s easy to abort. The rest will follow from that. Even if you piped your request stream into synchronous blocking io::Read adapter for use with Serde (which IMHO is pointless), the adapter will see an error due to its channel dropping, and bail out immediately.

Native busy-looping or deadlocked threads are tricky to kill safely, but that’s a general problem with threads in non-interpreted languages. Not specific to Serde, not related to Slow loris.

If you’re blocking on IO, that’s your cancellation signal. You can of course close the pipe you’re reading from, which surfaces an error in your synchronous thread.

How is that related to serde?

I thought I'd share this project in case anyone wanted to quickly evaluate rust for web


Although there's a lot of work left to be done, I'd love to hear feedback, painpoints and ideas you have :)

Hoping to have documentation up at create-rust-app.dev soon~

Please consider giving warp as choice for the web server.

Any particular reason?

It's more lightweight and usually doesn't enforce any particular way of doing things. Regardless of warp support, I have most of the other things on your list integrated one by one (JWT/dashboards/react/diesel) but it was time consuming as I didn't have a quick start project template.

So I think the work you are doing is going to be valuable for the community. I guess giving an option between the three popular web frameworks (actix, warp, rocket) won't be so easy as you might have very different ways of integrating the authentication etc with each. Probably one way to do this is to keep things more modular so that the authentication can be called independently (among other features).

This is interesting, and news to me. Your comment suggests something about Rust imposes this problem on all web frameworks, is that true? If so could you share the root cause?

I'm just a newbie in Rust. Do some web frameworks not use hyper? I'm trying to find a web framework that does not have a problem with Slow Loris.

This doesn't solve your problem out of the box, but I think most reverse-proxies have configurations that can protect the services behind them from these sorts of attacks.

I don't think anyone will want to avoid hyper, tbh; that's a lot of work to hand-roll instead. So you'd better advertise the issue there instead, and if they refuse to do anything about it then make a big stink downstream.

I think this should be fixed in hyper.

Serde is great. I wish it did XML as well but that's a trickier proposition

Perhaps XML serializers could be derived from schema. That way you always know which tag name, attribute or child node is meaningful.

I have found the quick-xml crate to work well.

one thing that has kinda irked me about serde is that there's relatively limited support for bounding memory usage for types like a Vec<T> at deserialize time without a lot of manual work.

It is doable (e.g. see SafePSBT deserializer here https://github.com/sapio-lang/sapio/blob/master/ctv_emulator...), just not particularly ergonomic.

addendum: if you're like 'when could you want this?', imagine you are deserializing from a Reader which has an internal circular buffer or something from the network (bounded) and you want to limit a peer's total resident memory (or something) on your server. Very difficult to do this with Serde!
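To make the concern concrete, here is a hand-rolled, std-only sketch of the general idea: reject a declared collection length before allocating anything for it. This does not use serde and is not how the linked deserializer works. The wire format (a little-endian `u32` count followed by little-endian `u64` items) and the function name `read_bounded_vec` are assumptions made up for illustration.

```rust
use std::io::{self, Read};

/// Reads a u32 length prefix followed by that many u64 items
/// (hypothetical wire format), refusing any declared length above
/// `max_items` before any allocation happens.
fn read_bounded_vec<R: Read>(reader: &mut R, max_items: usize) -> io::Result<Vec<u64>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;
    if len > max_items {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("declared length {} exceeds limit {}", len, max_items),
        ));
    }
    // Safe to pre-allocate: `len` is known to be within the cap.
    let mut items = Vec::with_capacity(len);
    let mut item_buf = [0u8; 8];
    for _ in 0..len {
        reader.read_exact(&mut item_buf)?;
        items.push(u64::from_le_bytes(item_buf));
    }
    Ok(items)
}
```

The awkward part with a derive-based approach is that a cap like `max_items` has to be threaded into every collection field by hand, which is roughly the ergonomic complaint above.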

But but apache arrow!

I know it's nit picking but I would say it's a library not a framework.

There are varying definitions of framework, but what tends to be common is that they have a strong effect of "locking you into their ecosystem", which isn't really the case for serde.
