Serde 1.0.0 for Rust released (github.com)
273 points by taheris on Apr 20, 2017 | 51 comments

Serde really is one of the gems of the Rust ecosystem.

It is a (de)serialization framework that can be quite easily implemented for various serialization formats like JSON, MessagePack, YAML, TOML, ...

It enables automatic and very performant (de)serialization of your data into different formats.

Often with a simple:

  #[derive(Serialize, Deserialize)]
  struct Data { ... }

For existing formats Serde is indeed pretty awesome. However when I tried to pick it up I found the lack of support for fixed arrays >32 a show-stopper for me.

I just found out recently that there are a couple of workarounds, but none of them were obvious, and for a framework built around zero allocation, not having great support for arrays was a non-starter, since that's 90% of the structures I was working with.

I love Rust but I feel like arrays in general are somewhat of an unfinished part of the language, you run into similar problems with Clone and Debug which can be frustrating.

Active work is going into integer generics, which will solve this issue. It has taken a while but isn't exactly simple either!
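For readers arriving later: the integer generics mentioned here have since landed in stable Rust as const generics (Rust 1.51). A minimal sketch of what the language feature allows, using only the standard library; the `checksum` helper is hypothetical and says nothing about how Serde itself handles large arrays:

```rust
/// Hypothetical helper: accepts a byte array of any length N,
/// with no special-casing of lengths up to 32.
fn checksum<const N: usize>(data: &[u8; N]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let small = [1u8; 4];
    let large = [2u8; 64]; // longer than the old 32-element limit

    println!("{}", checksum(&small));
    println!("{}", checksum(&large));

    // Clone and Debug are available for the 64-element array as well.
    let copy = large.clone();
    println!("{:?}", copy);
}
```

The length is part of the type, so one generic function covers every array size without falling back to slices.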

Glad to hear it!

I guess it's just jarring since the rest of Rust is so well crafted it felt really abrupt to run up against the 32 length fixed size limits. It's also somewhat annoying to have to coerce it into a slice via &array[..] instead of &array.

It's workable and when I finally get the library to a stable state I definitely plan to share what went well and where I saw pain points.

Yup, totally hear you. It's totally achievable, but not simple; Haskell has this issue too, for example. There's been a few iterations of the design already. The current approach is to cut scope as much as possible to ship the basics sooner.

It is this lack of non-type template parameters, template templates (i.e. higher-kinded types), and variadics that makes Rust feel like a step backwards compared to C++. When Rust gets these, it will remove a major blocker for me.

I really really would love Avro support added... I'd do it if I had more Rust experience. The icing on the cake is if it could also interact with the Avro registry somehow.

Avro support, just like all other Serde data formats, lives in its own crate. https://crates.io/crates/serde-avro

I keep picking this up to try and help and then dropping it. I wish I had more time in the day.

It looks like deserialization is mostly there, but not serialization.

What's Avro?

It's not as popular as Protocol Buffers, Thrift, MessagePack etc., probably because it started out in Java land (I believe it came from Doug Nutch of Lucene fame).

I've never used it, but one benefit of Avro is that it can embed the schema in the serialized output, which allows clients to consume data even though they don't have the IDL, which isn't possible with Protobuf and Thrift. MessagePack does include types and map keys, but doesn't have a schema.

s/Doug Nutch/Doug Cutting/

Oops. That should have been "Doug Cutting of Nutch and Lucene fame".

Looks like a data serialization format: https://avro.apache.org/

(Never heard of it either)

It allows you to define a schema, and then compile it to various languages and have multiple applications in multiple languages "speak" the same class, down to the binary level (for RPC, blob storage etc).

Really glad to see some cool libraries coming up for Rust. When I last tried to learn it, I got the basics then sort of just gave up because there didn't seem to be anything cool or unique I could do with it.

Serde is the kind of magic that we all hoped a type system like Rust's would cause. The transcode and zero-copy stuff is so neat. Doing stuff like this under the hood and "for free" is a nightmare in other languages.

I'm working on an strace implementation in Rust, with an enum (tagged union) for system calls:

  enum Syscall {
      Open { pathname: Buffer, flags: u64, count: u64 },
      Read { fd: u64, buf: Buffer, count: u64 },
      // ...
  }

I was very pleasantly surprised at being able to add a couple of dependencies, add #[derive(Serialize)] right above the enum, change two lines of my main driver program, and get an strace --json with useful output with no further effort. It's the sort of experience I expect from a higher-level language with dynamic types and runtime reflection, but available to me in a systems language.

Yes, and sometimes you really pay for runtime reflection - I've seen Java programs reduced to Python performance by excessive use of reflection. Whereas, this is _compile-time reflection_!

Very nice:

> Zero-copy deserialization

> […] The semantics of Rust guarantee that the input data outlives the period during which the output struct is in scope, meaning it is impossible to have dangling pointer errors as a result of losing the input data while the output struct still refers to it.

>The semantics of Rust guarantee that the input data outlives the period during which the output struct is in scope

To be fair, so do the semantics of every language with garbage collection -- keeping things alive while there are references to them is the bread and butter of GC.

EDIT: I do think it's impressive that Rust can manage this without the overhead of GC. But the sentence from the release notes immediately before 'killercup's quote was: "This uniquely Rust-y feature would be impossible or recklessly unsafe in languages other than Rust", which struck me as a bit over-hyped.

No, I think Rust still does this uniquely in a way that isn't available with a GC language. Imagine the following design (assume fd is a UDP socket or something, and so each read returns a complete message):

    char buf[1024];
    while (read(fd, buf, 1024)) {
        /* ... deserialize buf in place and append the result to messages ... */
    }
    for (message: messages) {
        /* ... use message, which still points into buf ... */
    }

Garbage collection will keep buf alive, but won't guarantee that buf isn't being mutated while it's alive. Rust's ownership system will guarantee that. In Rust, the read() function would require a mutable (i.e., unique) reference to the buffer, and serde's deserialization function also requires a reference to the buffer, preventing read() from being callable while the deserialized objects continue to exist.

I think doing reader/writer refcounting at runtime is hard because this is a case where there's nothing reasonable to do at runtime if you have incompatible references. At best you can do copy-on-write, but then you silently lose the zero-copy performance. You really want a compile-time error saying "You structured this code wrong, go redesign it or add some copies."
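A rough Rust sketch of that read loop, assuming nothing beyond the standard library. `Message`, `parse`, and `fake_read` are hypothetical stand-ins for a borrowed output struct, a zero-copy deserializer, and read(); they are not Serde APIs:

```rust
/// A message that borrows its text from the receive buffer (no copy).
#[derive(Debug, PartialEq)]
struct Message<'a> {
    text: &'a str,
}

/// Hypothetical stand-in for a zero-copy deserializer: the result
/// holds a shared borrow of `buf` for as long as it lives.
fn parse<'a>(buf: &'a [u8]) -> Option<Message<'a>> {
    let text = std::str::from_utf8(buf).ok()?;
    Some(Message { text })
}

/// Hypothetical stand-in for read(): needs a unique (&mut) borrow of `buf`.
fn fake_read(buf: &mut [u8], packet: &[u8]) -> usize {
    let n = packet.len().min(buf.len());
    buf[..n].copy_from_slice(&packet[..n]);
    n
}

fn main() {
    let mut buf = [0u8; 1024];
    let packets: [&[u8]; 2] = [b"hello", b"world"];

    for &packet in packets.iter() {
        let n = fake_read(&mut buf, packet);
        // `msg` borrows `buf`, so it must be gone before the next fake_read.
        let msg = parse(&buf[..n]).expect("valid UTF-8");
        println!("{}", msg.text);
    }

    // Collecting the messages into a Vec that outlives the loop and then
    // calling fake_read(&mut buf, ..) again is rejected at compile time,
    // because the stored messages would still borrow `buf`.
}
```

Written this way, the buggy version of the loop fails to compile; it cannot silently read stale data at runtime.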

I think the point here is that the source data might be mutated, without you touching the deserialized struct. Rust prevents that from becoming a problem.

In addition to geofft's mention of mutability, you also couldn't do this safely in other languages without allocating. Garbage-collected languages tend to keep GC-able objects on the heap, not the stack. This serde feature allows you to allocate an array on the stack, deserialize from it into a structure on the stack that references the array, and never allocate anything on the heap.
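A small sketch of the stack-only case, assuming a made-up wire format (byte 0 is an id, byte 1 is the name length, then the name as UTF-8). `Record` and `decode` are hypothetical names used only for illustration:

```rust
/// Output struct: `name` points into the input bytes instead of owning a String.
#[derive(Debug, PartialEq)]
struct Record<'a> {
    id: u8,
    name: &'a str,
}

/// Decodes the hypothetical format described above without copying the name.
/// Returns None if the input is truncated or the name is not valid UTF-8.
fn decode<'a>(input: &'a [u8]) -> Option<Record<'a>> {
    let id = *input.get(0)?;
    let len = *input.get(1)? as usize;
    let name_bytes = input.get(2..2 + len)?;
    let name = std::str::from_utf8(name_bytes).ok()?;
    Some(Record { id, name })
}

fn main() {
    // The input lives in a stack array ...
    let input: [u8; 7] = [7, 5, b's', b'e', b'r', b'd', b'e'];
    // ... and so does the decoded struct; nothing here allocates on the heap.
    let record = decode(&input).expect("well-formed input");
    println!("{:?}", record);
}
```

The `'a` lifetime ties `Record` to `input`, so the struct cannot outlive the array it points into.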

Rust does the inverse: it doesn't extend the lifetime of the input data, it restricts the lifetime of the output struct.

But Rust doesn't have garbage collection, which is why this is interesting in the first place. :P And being able to statically rule out any spurious copies/allocations (the "zero-copy" bit) while guaranteeing safe pointer usage (especially for string data) is something that Rust is very good at.

I think the uniquely Rust bit is the memory efficiency. Instead of allocating memory for both the raw data and the deserialized data, you can simply reference the raw data directly through a typed variable, if I read this correctly. I'd assume a garbage collector doesn't really help with this kind of efficiency. The cost here is compile time, but for performance-critical code, it might well be a good trade-off.

As an aside:

> Rust's "orphan rule" allows writing a trait impl only if either your crate defines the trait or defines one of the types the impl is for. That means if your code uses a type defined outside of your crate without a Serde impl, you resort to newtype wrappers or other obnoxious workarounds for serializing it.

I hadn't realized this. The justification is reasonable - preventing ambiguity when resolving traits - but it precludes one of the major use cases for traits/type classes: adapting a type from one library to an interface in another library without wrapper types.

This is true, however you can get around a lot of the annoyance of it with conversion traits like From<T> and Into<T>.

It still takes a bit of boilerplate to write unfortunately.

The good news is that this likely won't be the case forever. It looks like specialization (I think it's called) and some other type system features, intersection impls I think they're called, will allow you to write some code to resolve this. Correct me if I'm wrong, anyone.

Luckily, because of Rust's "newtype" types and the Deref trait, wrapping isn't too much of a pain.
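A minimal sketch of the newtype approach, using only standard-library types so it stays self-contained. `HexBytes` is a hypothetical wrapper; `fmt::Display` stands in for a foreign trait such as Serde's `Serialize`, and `Vec<u8>` for a type from someone else's crate:

```rust
use std::fmt;
use std::ops::Deref;

/// Newtype around a foreign type (Vec<u8>). Implementing the foreign trait
/// `fmt::Display` directly for Vec<u8> would violate the orphan rule.
struct HexBytes(Vec<u8>);

impl fmt::Display for HexBytes {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}

/// Deref lets callers use the wrapper much like the wrapped Vec<u8>.
impl Deref for HexBytes {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        &self.0
    }
}

/// From makes wrapping a one-liner at call sites (`.into()`).
impl From<Vec<u8>> for HexBytes {
    fn from(bytes: Vec<u8>) -> Self {
        HexBytes(bytes)
    }
}

fn main() {
    let wrapped: HexBytes = vec![0xde_u8, 0xad, 0xbe, 0xef].into();
    // Display comes from the wrapper, len() from the inner Vec via Deref.
    println!("{} ({} bytes)", wrapped, wrapped.len());
}
```

The boilerplate is the three impl blocks; after that, the wrapper is mostly invisible at call sites.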

Yeah, in Haskell it's allowed but outputs a warning. So a lot of my modules use -fno-warn-orphans :D

The reason is that class instances are really not first-class in Haskell. You can never import a module and choose not to import its instances, not even with

  import Module (just_one_symbol_i_need)

or to not import a couple of them. I think named instances a la PureScript let you do

  import Module (instance toJsonText, ...)

which would be great to have in Haskell, but probably breaks something deep inside instance resolution.

PureScript as a "Haskell without the mistakes" really is a great idea.

I really love Serde, but one consequence of splitting out the formats into different libraries is that different formats can be of substantially different levels of quality. My impression is that JSON support is best-of-class, but I have no idea about the others.

For example, I've been looking at the CBOR library for Serde [1], and it's not obvious whether the library is full-featured, robust, actively supported, etc. Same goes for many of the other Serde formats. At the moment I'm likely to just choose JSON for new projects since I don't want to build on top of something that isn't known to be solid, but it would be really nice to be able to use binary formats for what I want to do.

Now that Serde is 1.0 it would be nice to do a push on the individual formats so that users coming in can tell what's active and well-supported vs a (possibly inactive) community contribution.

[1]: https://github.com/pyfisch/cbor

This is a global problem for packages, and I think there's a push for a feature where developers can mark how stable a crate is. Anyway, for packages without passing tests we already know they can't be stable yet, since they don't even guarantee that their unit tests pass :)

There's also capnproto-rust[1] if anyone was wondering.

[1]: https://github.com/dwrensha/capnproto-rust

... but it doesn't use Serde.

Should it? Or can it? Am curious, not a criticism.

They target different things. Serde is something that simply serializes data into a common text-based or binary format.

But there's been some innovation done to make that even more efficient. It started with protocol buffers[1]. You write .proto files in protocol buffers' own schema language[2], which look like this[3]. What's special about these schemas is that they can be strongly typed, and once a schema is written as a .proto file (in the case of protocol buffers), code for any language can be generated to receive and parse the binary-encoded message properly, with proper error checking. This avoids rewriting code in different languages if an RPC protocol is changed. It also offers other advantages, and you can look into the docs for those.

Then the author of protocol buffers left Google and created something called Cap'n Proto[4], which improved on it in many ways. What I linked to is a Rust program supporting Cap'n Proto's own schema[5].

[1]: https://developers.google.com/protocol-buffers/

[2]: https://developers.google.com/protocol-buffers/docs/proto3

[3]: https://github.com/WhisperSystems/libsignal-protocol-c/blob/...

[4]: https://capnproto.org/

[5]: https://capnproto.org/language.html

I'm aware of the details of Cap'n Proto.

But I thought since Avro is somewhat similar to capnproto and it uses Serde (in Rust) then capnproto could/should too.

But it sounds like it is a "should not"

This is really cool. But how does zero-copy deserialization work for &'a str with formats like JSON where the input data may have string escapes (and therefore needs to be mutated during deserialization)?

It doesn't. You can use Cow<'a, str> if you want string escapes to work, but then it is not really zero-copy anymore.

So, what, if you use &str you get the raw data, escapes and all? That's not very good, especially because it means round-tripping your struct through JSON can return the wrong result.

I assume that Cow<'a, str> will only produce an owned string if there are string escapes that need decoding? If so, that's probably the best approach for decoding appropriately, as you'd get zero-copy as long as no mutations are required, but round-tripping through JSON would still work right.

That makes sense. I want to rewrite my plist parser to both use Serde and be zero copy (https://github.com/conradev/plist-rs/issues), and for binary plists you would similarly need to convert UTF-16 strings to UTF-8, but wouldn't need to touch UTF-8 strings.

I wonder how the API would work. Would json::Value be modified to have a lifetime and contain Cow enums? How would it implement ToOwned? It almost seems like there would have to be separate types, a json::Value<'a> and json::OwnedValue.

Yes, I don't think you can do better if forced to use &str. The point of using Cow is indeed to only copy when necessary like you assumed.
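A minimal sketch of the copy-only-when-necessary idea, using only the standard library. `unescape` is a hypothetical, heavily simplified helper (it knows \n and passes any other escaped character through unchanged); it is not Serde's or serde_json's implementation:

```rust
use std::borrow::Cow;

/// Hypothetical, simplified unescaper; not a full JSON implementation.
fn unescape<'a>(raw: &'a str) -> Cow<'a, str> {
    if !raw.contains('\\') {
        // No escapes: hand back a borrow of the input, zero-copy.
        return Cow::Borrowed(raw);
    }
    // At least one escape: build an owned, decoded copy.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some(other) => out.push(other), // e.g. \" and \\
            None => out.push('\\'),         // trailing backslash kept as-is
        }
    }
    Cow::Owned(out)
}

fn main() {
    for raw in ["plain text", r"line one\nline two"].iter() {
        match unescape(raw) {
            Cow::Borrowed(s) => println!("borrowed: {}", s),
            Cow::Owned(s) => println!("owned: {}", s),
        }
    }
}
```

Callers can treat both variants as a `&str` through `Deref`, so the allocation only shows up for inputs that actually contain escapes.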

You could have zero-copy roundtrips if you used escapes in the deserialized strings as well.

For the record, I just tested, and it turns out that if you try and decode a string with escapes into a &str, it produces an error rather than giving you the escaped version of the string.

Awesome! I end up using Serde in just about every project. It's great! Congrats guys.

Does serde support serializing to protobuf and back?

From the README: "Serialization is not yet implemented in this version."

We really have to start rendering READMEs on crates.io...

IMO you should just merge with docs.rs.

That's a good idea!

