Schema-ful, copying: Protobuf, Thrift, plenty more
Schema-ful, zero-copy: Cap'n Proto, FlatBuffers
Schema-less, copying: JSON (binary and other variants included), XML
Schema-less, zero-copy: FlexBuffers (Any others? This seems new to me)
It's one of the reasons I don't mind Maven. Yeah, there's a 1000+ line XML config file, but the Maven DTD is so tight that nearly any syntax issue will be flagged. That's easy to appreciate when you're used to giant config files that don't get validated until runtime.
My comment was only aimed at service payload serialization; as to whether markup makes for a good config file format, I'm not entirely sure. It certainly is better than inventing an ad-hoc format IMO, but OTOH there are a couple of not-quite-SGML/XML config formats such as Apache's httpd.conf, or in fact maven's that just give XML/SGML a bad name IMHO, because they generally inherit the downsides of markup without bringing its benefits (such as being able to assemble a configuration from fragments using regular entity expansion).
Look at the number of DLs on this JSON schema validator https://www.npmjs.com/package/ajv
Just as an aside: the new version of OpenAPI (v3.1 RC) for REST interfaces now fully supports the JSON Schema validation mechanism, so we may see an uptake in the use of JSON Schema.
It's not perfect and has some holes.
We currently use this to store security log data, but think it's an interesting midpoint between having no schema at all vs requiring schema registries to do useful work.
I will consider it in future though, while I'm familiar with SBE it's not one I'd have thought of when thinking about serialisation.
I did end up writing a simple schema verifier for Ruby (ClassyHash, on GitHub) in one of the jobs where I used msgpack, but I no longer have access to maintain it. My benchmarks showed msgpack+classyhash was faster than native JSON (didn't test oj I think) and other serialization formats, and faster than all the other popular Ruby schema validators at the time.
TL;DR: msgpack rocks; use it instead of JSON for internal services.
This probably comes down to a disagreement on what "zero-copy" means.
Some people use the term "zero-copy" to mean only that when the message contains a string or byte array, the parsed representation of those specific fields will point back into the original message buffer, rather than having to allocate a copy of the bytes at parse time.
Cap'n Proto and FlatBuffers implement a much stronger form of zero-copy. With them, it's not just strings and byte buffers that are zero-copy, it's the entire data structure. With these systems, once you have the bytes of a message mapped into memory, you do not need to do any "parse" step at all before you start using the message.
For example, if you have a multi-gigabyte file formatted with one of these, you can mmap() it, and then you can traverse the message tree to read any one datum without touching any bytes of the message except the chain of pointers (parent to child) leading to that one datum. Aside from the mmap() call, you can do all this without allocating any memory at all.
That is absolutely not possible with Protobuf, because Protobuf encoding is a list of tag-value pairs each of which has variable width. In order to read any particular value, you must, at the very least, linearly scan through the tag-values until you find the one you want. But in practice, you usually want to read more than one value, at which point the only way to avoid O(n^2) time complexity while keeping things sane is to parse the entire message tree into a different set of in-memory data structures allocated on the heap.
That is not "zero-copy" by Cap'n Proto's definition.
(Disclosure: I am the author of Cap'n Proto and Protobuf v2.)
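To make the linear-scan point concrete, here is a minimal sketch of finding one field in protobuf-style wire data. It is hand-rolled for illustration and only handles varint (wire type 0) and length-delimited (wire type 2) fields; `read_varint` and `find_field` are hypothetical helpers, not part of any protobuf library.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def find_field(buf, wanted):
    """Return (value, fields_scanned) for the first field numbered `wanted`.

    Hypothetical helper: only wire types 0 (varint) and 2 (length-delimited)
    are handled, which is enough to show the scan.
    """
    pos = 0
    scanned = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        scanned += 1
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number == wanted:
            return value, scanned
    return None, scanned


# field 1 = varint 150, field 2 = bytes "hi", field 3 = varint 7
message = bytes([0x08, 0x96, 0x01, 0x12, 0x02, 0x68, 0x69, 0x18, 0x07])
value, scanned = find_field(message, 3)
```

Reading field 3 here has to decode the keys (and skip the values) of fields 1 and 2 first, since nothing in the encoding says where field 3 starts.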
By the way I worked on protobuf performance at Google for years and we could never get flatbuffers to go any faster.
Whoa, that's a very strong statement. But then the rest of the paragraph gets a lot weaker.
> there are high-performance computation packages that compress data structures in L1-cached-sized blocks
This seems like a non sequitur. Of course hand-tuned data structures can achieve higher performance than any serialization framework, but what does that have to do with zero-copy vs. protobuf? Are you suggesting that protobuf encoding would be a good choice for these people?
> So, you've used the word "achieve" to decorate an outcome that might not be optimal.
I'm not sure why "achieve" would imply "optimal". Of course whether this is an advantage depends on the use case.
There are many cases where zero-copy doesn't provide any real advantages. If you're just sending messages over a standard network socket, then yeah, zero-copy probably isn't going to make things faster. There are already several copies inherent in network communication.
But if you have a huge protobuf file on disk and you want to read one field in the middle of it, that's just not something you can do in any sort of efficient way. With zero-copy, you can do this trivially with mmap().
Or if you're communicating over shared memory between processes on the same machine, then the entire serialize/parse round trip required with protobuf is 100% waste. Zero-copy would let you build and consume the structure from the same memory pages.
These seem "intrinsically valuable"?
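The on-disk case can be sketched roughly like this, assuming a toy file of fixed-width records (one int64 and one float64 each) rather than any real Cap'n Proto or FlatBuffers encoding. `write_records` and `read_record` are made-up names for illustration.

```python
import mmap
import os
import struct
import tempfile

# Toy fixed-width record (an assumption for this sketch): int64 id, float64 score.
RECORD = struct.Struct("<qd")


def write_records(path, count):
    """Write `count` records back to back, with no header or index."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i, i * 0.5))


def read_record(path, index):
    """Map the file and decode only the record at `index`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The offset is computed, not searched for: no scan over earlier records.
            return RECORD.unpack_from(mapped, index * RECORD.size)


path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, 1000)
middle = read_record(path, 500)
```

Because every record sits at a computable offset, the read touches only the bytes of record 500; with a tag-value encoding the position of that record would not be known without scanning what comes before it.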
> we could never get flatbuffers to go any faster.
What use case were you testing? Did you test any zero-copy serializations other than flatbuffers?
I've heard from lots of people that say Cap'n Proto beat Protobuf in their tests... but it definitely depends on the use case.
This is the part I don’t agree with. What I’m saying is that there is value in using encoded structures not only between servers and not only over shared memory but even within a single process. Yes, you discard the ability to just jump to any random field, but that is not always important. Often it can be better to spend some compute cycles and L1 accesses to save main memory accesses. If you have to read all of some data anyway, then packing it makes a ton of sense. Consider any kind of delta-encoded column of values ... you can’t seek within it, but if the deltas are smaller than the absolutes, this can save massive amounts of main memory bandwidth. This is why I argue that representing something as a C struct in main memory is not obviously advantageous, outside some given workloads.
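As a rough illustration of the delta-encoded column idea, here is a sketch that assumes zigzag plus varint encoding of the deltas. That is one common choice, not something the comment above specifies, and all function names are made up.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(n):
    """Map signed ints to unsigned so small negatives stay small."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)


def delta_encode(values):
    """Store each value as the varint-encoded difference from its predecessor."""
    out = bytearray()
    previous = 0
    for value in values:
        out += encode_varint(zigzag(value - previous))
        previous = value
    return bytes(out)


def delta_decode(data):
    """Decode sequentially; there is no way to jump to the k-th value."""
    values = []
    previous = 0
    pos = 0
    while pos < len(data):
        z = 0
        shift = 0
        while True:
            byte = data[pos]
            pos += 1
            z |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        previous += unzigzag(z)
        values.append(previous)
    return values


column = [1_000_000, 1_000_003, 1_000_001, 1_000_010]
packed = delta_encode(column)
```

The four values take 6 bytes packed versus 32 bytes as raw int64s, at the price of having to decode from the start to reach any one of them.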
As for flatbuf at Google, I’m sure you’re aware that the only way to get the kind of mindshare you’d need to ship it would be to make websearch measurably faster.
It doesn't seem to me like Protobuf is ideal for this use case, but sure, I see how the light compression afforded by Protobuf encoding could lead to a win vs. bare C structs.
I think a better answer here, though, would be to use an actual compression algorithm that has been tuned for this purpose.
Of course, then the uncompressed data needs to be position-independent, so no (native) pointers. You could use something hand-rolled here... but also, this is exactly the problem zero-copy serializations solve, so they might be a good fit here. Hmm!
I'd be pretty interested to compare layering a zero-copy serialization on top of compression vs. protobuf encoding in these high-performance computing scenarios. Is that something you tried doing?
I don't think this is the criticism being raised.
The main criticism being raised against non-zero-copy serialization is that this often requires maintaining different memory representations for the same value - the copies are the consequence of transforming from one representation to another one.
They are not talking about "unpacking on the fly for processing", but rather about "unpacking in memory to be able to call an opaque API outside your control that expects the unpacked representation". That requires copying in memory to interface with that API.
Your approach only works if you are willing to "re-implement the world" to interface with whatever packed format suits your application.
With zero-copy serialization you don't have to do that.
It feels like you're conflating things here.
It's not always a better algorithm, but it's a better definition.
What about in embedded code, or in a game? A place where memory bandwidth is scarce, or where we're trying desperately to reduce the number of syscalls and jumps back and forth between kernel and user space?
Having the entire payload memory mapped, and copies avoided, makes an absolutely huge difference once these kinds of concerns are real. Having something mmap'd in theory means it's nice and hot and fresh in the kernel's mind, and potentially kept in cache.
Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship. Using something like flatbuffers would have made a lot more sense.
At Google protobuf is the veritable "once I have a hammer everything looks like a nail", to the point where I've seen protobufs in JSON payloads, or vice versa, or even deeper... protobuf in JSON inside a Chromium Mojo payload... because, well, how good could it be without protobuf?
I shipped an embedded project using an RTOS and < 256kB RAM using protocol buffers (even nested ones) and zero-copy deserialization for byte arrays some time ago. We used nanopb, and it worked just fine if you understand how it works - although it certainly is less pleasant to work with than a fuller implementation which would just have copied out the internal bytes into new arrays and not have let us deal with lots of internal pointers into byte arrays.
Overall using Protocol Buffers was a great success in that project, since we could share schemas between IoT devices and the backend, and were able to generate a lot of code which would otherwise have been hand-written.
> Using something like flatbuffers would have made a lot more sense.
It might be able to solve the same problem. But it also needs to answer the questions: Is a suitable library available for all consumers of the data-structure? I don't think this was the case for our use-case, so it wouldn't have made more sense to use it back then.
Weird rebuttal, considering that my argument is that the copying nature of protobuf can save memory bandwidth.
Huh? An extra pass over the data to parse it obviously uses more memory bandwidth.
I might understand your argument if parsing converted the data into a memory-bandwidth-optimized format, but the protobuf parsed form is certainly not that, unless things have changed very drastically since I worked on it.
At Google scale protobuf works perfectly fine. It's our lingua franca and a lot of work (apparently by yourself included) has gone into making it performant. But it comes with normative lifestyle assumptions.
Sure you could change the wire format and implementation to be mappable; but then it wouldn't be compatible with the mainstream implementation.
FlatBuffers can be accessed instantly without deserialization or allocation, so clearly in some cases a huge speedup is possible. If in your case there was no speedup, there must be other bottlenecks.
> A direct mapping to memory is not always optimal for performance.
Your comment seems to imply that the first statement follows from the second, but it does not. You're right, a direct memory mapping might not be optimal in some situations, but then again in some others it might be optimal. So this feature isn't always useful to everyone, but it doesn't follow that it's never useful to anyone.
Even if you worked on this on a range of projects at Google, isn't it possible that there are people working on systems that have rather different performance characteristics than Google's systems?
I think this is the important point when it comes to discussing zero-copy. I've written a custom protobuf implementation for java which can do exactly that.
It's a bit tricky since protobuf supports recursive messages and java's Unsafe is not as powerful as what you can have in C++. My trade-off was to require the caller to pre-allocate messages needed before parsing the data. This works great when working with multi-gigabyte files where you want to process a large number of (possibly nested) messages, but is not as ergonomic as normal protobuf code.
It obviously doesn't come for free, as you need to do a linear scan to find those tag-values, but there are ways to speed that up too, so it becomes very fast in practice.
I'm sure Cap'n Proto and FlatBuffers are faster for some use-cases (I haven't tested), but a very important point for me is to be wire-compatible with protobuf3 and its ecosystem... and still be zero-copy/zero-alloc.
And what happens if you copy two different branches of one message into another, and they happen to share some children? Do you have to keep your seen-pointer map around across multiple copies?
For a cyclic graph, things get more confusing. Copying one branch of a fully-connected cyclic graph always means copying the entire graph. Apps can easily get into trouble here. Imagine an app that implements its own tree structure where nodes have "parent" pointers. If they try to copy one branch into another message, they accidentally copy the entire tree (via the parent pointers) and might not even realize it.
The one way that I think pointer aliasing could be made to work is if pointers that are allowed to alias are specially-marked, and are non-owning. So each object still has exactly one parent object, but might have some other pointers pointing to it from elsewhere. A copy would not recurse into these pointers; it would only update them if the target object happened to be part of the copy, otherwise they would have to become null.
But I haven't yet had any reason to try implementing this approach. And apps can get by reasonably well without it, by using integer indexes into a table.
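For what it's worth, the "non-owning pointers plus integer indexes" idea could look something like the sketch below. This is purely hypothetical and not how Cap'n Proto works today: nodes live in a flat table, `children` holds owning indexes, `parent` is a non-owning index, and `copy_branch` is a made-up helper.

```python
def copy_branch(nodes, root):
    """Copy the subtree under `root` into a new table.

    Only owning links (children) are followed. Non-owning links (parent)
    are remapped if their target is part of the copy, otherwise set to None.
    """
    remap = {}   # old index -> new index
    order = []   # old indexes in the order they are copied
    stack = [root]
    while stack:
        idx = stack.pop()
        remap[idx] = len(order)
        order.append(idx)
        stack.extend(reversed(nodes[idx]["children"]))

    copied = []
    for idx in order:
        node = nodes[idx]
        copied.append({
            "name": node["name"],
            "children": [remap[c] for c in node["children"]],
            "parent": remap.get(node["parent"]),  # None if outside the copy
        })
    return copied


tree = [
    {"name": "root", "children": [1, 2], "parent": None},
    {"name": "a", "children": [3], "parent": 0},
    {"name": "b", "children": [], "parent": 0},
    {"name": "c", "children": [], "parent": 1},
]
branch = copy_branch(tree, 1)
```

Copying the branch rooted at "a" brings along "c" but not the rest of the tree: the parent link of "a" becomes None instead of dragging in "root" and "b".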
"Skyscrapers have been solved since the 20's" doesn't answer whether I should use steel beams when building a house.
Records in Cap'n Proto are laid out like C structs. All fields are at fixed offsets from the start of the structure. For variable-width values, the struct contains a pointer to data elsewhere in the message.
Each new object is added to the end of the message, so that the message stays contiguous. This does imply that if you resize a variable-width object, then it may have to be moved to the end of the message, and the old space it occupied becomes a hole full of zeros that can't really be reused. This is definitely a down-side of this approach: Cap'n Proto does not work great for data structures that are modified over time. It's best for write-once messages. FlatBuffers has similar limitations, IIRC.
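A toy version of this layout, to show the shape of it. The field layout below is an assumption made up for the sketch and is not Cap'n Proto's actual encoding: a fixed-size record holds an id plus an (offset, length) pair pointing at text appended after it.

```python
import struct

# Toy record (not the real Cap'n Proto format):
#   offset 0: uint32 id
#   offset 4: uint32 text_offset, counted from the start of the message
#   offset 8: uint32 text_length
RECORD = struct.Struct("<III")


def build_message(record_id, text):
    """Write the fixed-size record first, then append the variable-width data."""
    data = text.encode("utf-8")
    return RECORD.pack(record_id, RECORD.size, len(data)) + data


def read_id(buf):
    """Fixed offset: no need to look at anything else in the message."""
    return struct.unpack_from("<I", buf, 0)[0]


def text_view(buf):
    """Follow the pointer and return a view into the buffer, without copying."""
    offset, length = struct.unpack_from("<II", buf, 4)
    return memoryview(buf)[offset:offset + length]


message = build_message(42, "hello")
```

Replacing "hello" with a longer string in place is where this gets awkward: the new bytes would have to go at the end of the message and the old five bytes would become dead space, which is the write-once limitation described above.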
Off-topic: I see a few links in your HN profile, is that the best place to keep up with the projects you're working on?
The reference implementation (in go) is 90% complete, enough to marshal objects except for recursive support.
[EDIT] and I forgot to mention re-entrant as well (again if done correctly).
Would BIPF fall into this category? It's a serialization format for JSON: https://github.com/dominictarr/bipf
if performance isn't an issue, then just use any of these tools. unless the tooling cost and representational issues make it easier to just use bytes.
all for abstractions..but people seem to be blind to the idea that there's a perfectly good one a short step down from capn proto
Protobuf can support zero-copy of strings and byte arrays embedded in the message. Cap'n Proto and FlatBuffers support zero-copy of the entire message structure.
(Disclosure: I'm the author of Cap'n Proto and Protobuf v2.)
Just to be clear, even this more limited notion of "zero copy" isn't currently supported in the open-sourced version of protobuf. It is supported in the internal Google version, which is presumably what you're thinking of. There is an open issue tracking the possibility of making it available in the open source version too.
My guess is that FlatBuffers is more flexible and slower to read. And 5 years too late to the party.
This HN article, though, is about FlexBuffers. FlexBuffers appears to be based on FlatBuffers, but does not use schemas. Cap'n Proto, FlatBuffers, and Protobuf are all schema-driven (you must define your message types in a special language upfront). FlexBuffers is more like JSON in that all types are dynamic.
Personally I'm a strong believer that schemas are highly desirable, but some people argue that schema-less serializations let you get stuff done faster. I think it's very analogous to the argument between type-safe languages vs. dynamically-typed languages. Obviously there are a lot of smart people on both sides of these arguments.
For a document database, I don't agree at all. Some time back I spent more time than I'd like developing on Mongo, and boy did I wish I could actually tell it the schema of documents in each collection and have it enforce that (not to mention optimize based on it). A lot of developers actually use libraries on top of Mongo to define and enforce schemas.
I think protocols still don't have the level of tooling that makes it really obvious why schemas are better.
> Personally I'm a strong believer that schemas are highly desirable
I'm currently working on a project that's very sensitive to executable size, and having a fixed schema is crucial to that, as we can optimize a lot more aggressively. We can even use dead code feedback to identify schema parts that aren't consumed on the client.
That article is a bit old; is there anything that stands out to you in the last ~5 years where things have diverged?
I haven't looked into exactly how FlexBuffers work but off the top of my head I suspect that leveraging FlatBuffers' "virtual table" technique probably helps here. In (normal, schema-ful) Cap'n Proto, fields within a struct have fixed offsets, meaning that unused fields still take space. As I understand it, FlatBuffers tries to avoid this by adding an extra layer of indirection -- each struct has a sort of "virtual table" which stores the offsets of each field, where some fields might not be present at all. If multiple structs in a message happen to end up with the same virtual table, then the virtual table is only written once.
Totally speculating here since, again, I haven't actually looked at FlexBuffers, but if I were building something isomorphic to JSON on top of FlatBuffers, I'd probably look into extending the virtual tables to index fields by name rather than number. So if you have two structures with the same set of field names, they can share a virtual table, and those field names only have to appear once. That'd be a pretty great way to compress JSON.
Back in Cap'n Proto, we don't have these vtables. For data with fixed schemas, my opinion is that these vtables seem like they require more bookkeeping than they are worth. But for dynamic schemas they seem like a much bigger win. So if you wanted to encode dynamic schemas layered on top of Cap'n Proto, you'd probably have to come up with some similar vtable thing yourself.
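The shared-table idea speculated about above can be sketched in a few lines. To be clear, this is a toy in-memory model of "records with the same set of field names share one name table", not the FlatBuffers or FlexBuffers encoding; `pack_records` and `get_field` are invented names.

```python
def pack_records(records):
    """Split dict records into shared name tables plus per-record value rows."""
    vtables = []       # each entry is a tuple of field names
    vtable_index = {}  # field-name tuple -> position in vtables
    rows = []          # (vtable position, values in vtable order)
    for record in records:
        keys = tuple(sorted(record))
        if keys not in vtable_index:
            vtable_index[keys] = len(vtables)
            vtables.append(keys)
        rows.append((vtable_index[keys], [record[k] for k in keys]))
    return vtables, rows


def get_field(vtables, row, name):
    """Look the field up through the row's table; absent fields give None."""
    table_id, values = row
    keys = vtables[table_id]
    try:
        return values[keys.index(name)]
    except ValueError:
        return None


records = [{"x": 1, "y": 2}, {"y": 5, "x": 4}, {"name": "n"}]
vtables, rows = pack_records(records)
```

The first two records have the same field names, so "x" and "y" are stored once and both rows just point at table 0; the third record gets its own table and simply has no slot for "x".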
Funny you should say vtables may not be worth it.. I was of a similar opinion (why would you have many fields that are not in use??) until people showed me some of the Protobuf schemas in use at Google, with hundreds of fields, most unused. This is what pushed me in the direction of the vtable design.
Data always starts out neatly.. but the longer things live, the more this kind of flexibility pays off.
Yeah, I was quite familiar with those back when I was at Google. But, IIRC, a lot of them were semantically unions, but weren't actually marked as such mostly because "oneof" was introduced relatively late in Protobuf's lifetime.
No joke, I originally was arguing for 8-bit vtable entries because surely no-one ever needs more than 256 bytes worth of fields. Good thing my co-workers were smarter than me.
And yes, FlatBuffers has built-in unions from day 1, which was probably helpful.
I've started to prefer functions that are explicit about their error cases and have interfaces that make it obvious about what errors you'd want to handle. Unexpectedly getting a zero when you get garbage data may cause issues.
Thinking about it, zero is often used as a success code: ERROR_SUCCESS is 0x0 on Windows. So garbage in, ERROR_SUCCESS out??
value, err = root.AsInt64();
> A type byte is made up of 2 components (see flexbuffers.h for exact values):
> * 2 lower bits representing the bit-width of the child (8, 16, 32, 64). This is only used if the child is accessed over an offset, such as a child vector. It is ignored for inline types.
> * 6 bits representing the actual type (see flexbuffers.h).
> Thus, in this example 4 means 8 bit child (value 0, unused, since the value is in-line), type SL_INT (value 1).
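The bit layout in the quoted text works out like this. A small sketch based only on the quote above; `pack_type` and `unpack_type` are made-up names, and the integer type value 1 is taken from the quoted example rather than from flexbuffers.h.

```python
BIT_WIDTHS = (8, 16, 32, 64)  # encoded in the 2 lower bits as 0, 1, 2, 3
TYPE_INT = 1                  # value from the quoted example


def pack_type(type_id, bit_width):
    """Combine a 6-bit type with a 2-bit width code into one type byte."""
    if not 0 <= type_id < 64:
        raise ValueError("type must fit in 6 bits")
    return (type_id << 2) | BIT_WIDTHS.index(bit_width)


def unpack_type(type_byte):
    """Split a type byte back into (type_id, bit_width)."""
    return type_byte >> 2, BIT_WIDTHS[type_byte & 0x3]


example = pack_type(TYPE_INT, 8)
```

So the 4 in the example is type 1 shifted left by two bits, with width code 0 in the low bits.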
Type aware methods like `.isInt()` wouldn't make sense if they couldn't determine the underlying type.
Having the error can signal either that a value could not be retrieved, or that it could be retrieved but was coerced, as an example.
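A sketch of the difference being argued about here, using hypothetical accessors (these are not FlexBuffers functions): one silently falls back to zero, the other returns the value together with an error.

```python
def as_int64_lenient(value):
    """Return the value if it is an integer, otherwise silently return 0."""
    if isinstance(value, bool) or not isinstance(value, int):
        return 0
    return value


def as_int64_checked(value):
    """Return (value, error); error is None only when the read was clean."""
    if isinstance(value, bool) or not isinstance(value, int):
        return 0, "not an integer"
    if not -2**63 <= value < 2**63:
        return 0, "out of range"
    return value, None


lenient = as_int64_lenient("garbage")
checked, err = as_int64_checked("garbage")
```

Both calls hand back 0 for garbage input, but only the second one lets the caller tell that 0 apart from a real zero.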
I'm glad to have seen this, which was on https://serde.rs/, which seems to be a recent addition. I was looking for an alternative to messagepack.
There are different ways to make people aware of alternatives. I chose to spread word of this product on HN :)
FlexBuffers is a schemaless format much like JSON, so is naturally extensible, and typeless, so I don't see how it would need any such thing.
MessagePack allows applications to define application-specific types using the Extension type. Extension type consists of an integer and a byte array where the integer represents a kind of types and the byte array represents data. Applications can assign 0 to 127 to store application-specific type information. An example usage is that application defines type = 0 as the application's unique type system, and stores name of a type and values of the type at the payload.
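For reference, this is roughly what an extension value looks like on the wire. The sketch is a hand-rolled encoder following the ext format family in the MessagePack spec (fixext for payloads of 1, 2, 4, 8, or 16 bytes, otherwise ext 8/16/32); `pack_ext` is my own helper name, not a function from any msgpack library.

```python
import struct

# Header bytes for the fixed-size ext formats, keyed by payload length.
FIXEXT = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def pack_ext(type_code, payload):
    """Encode an application-specific ext value: header, type byte, payload."""
    if not 0 <= type_code <= 127:
        raise ValueError("application types must be in 0..127")
    n = len(payload)
    if n in FIXEXT:
        header = bytes([FIXEXT[n]])
    elif n < 2**8:
        header = bytes([0xC7, n])                 # ext 8
    elif n < 2**16:
        header = b"\xc8" + struct.pack(">H", n)   # ext 16
    elif n < 2**32:
        header = b"\xc9" + struct.pack(">I", n)   # ext 32
    else:
        raise ValueError("payload too large for ext 32")
    return header + bytes([type_code]) + payload


small = pack_ext(5, b"\x01")
odd = pack_ext(0, b"abc")
```

A one-byte payload of type 5 costs three bytes in total, so the type information is fairly cheap when the payload size happens to match a fixext format.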
FlexBuffers could support a similar scheme if we wanted to, though putting custom data in the format is already easy enough using the "blob" type, which is arbitrary bytes much like the MessagePack feature.
Sure, it works via stuffing the custom things in blobs, but then you have (1) an extra layer of indirection during dispatch, and (2) always waste a few bits or bytes to store the custom type information, even when you don't need it for actual blobs.
In principle, it's not a lot of effort to reserve a fraction of the type ID space for custom types. From a user's perspective, it's also much nicer to hear that a library cleanly uses existing extensions to the type system, rather than hijacking a general-purpose blob type.
If there's nothing that prevents it, feel free to take this as a feature request :-)
So this feature would look like a new function IsCustom1() or AsCustom1() etc, where the latter would give you a pointer to the bytes to do your own thing.. so not much difference with Blobs.
Also note that because FlexBuffers is O(1) access to elements, inline data (most scalars) is all the same size, whereas variable sized data is stored over an offset. So again, would not be very different from blobs.
So, would you recommend Protobuf, Cap'n'Proto, FlatBuffers or FlexBuffers for multiplayer games? The usual packets are game states or user input sent at high frequency.
So FlatBuffers will make an incoming packet waaay faster to work with than Protobuf. On the downside, Protobuf tends to be a little smaller, so if bandwidth is a greater concern than (de-)serialization speed, you might still prefer it. Additionally, receiving data over the network raises the question of how you handle packets that have been corrupted (or intentionally malformatted by an attacker), and in the case of FlatBuffers you'd need to at least run the "verifier" over the packet before accessing it if you don't want your game servers to crash when this happens. That slows it down a little bit, but is still fast, i.e. still doesn't allocate etc.
Cap'n Proto will perform similarly, though does have the downside that all fields take space on the wire, regardless of whether they're set or not. So which is better depends on the kind of data you want to send and how it is likely to evolve.
Frankly, for the absolute highest performance (and lowest bandwidth) game networking you still need a custom encoding.
This is really the truth here. The efficiency of a networking protocol for a multiplayer game is somewhat sensitive to the context: are you trying to do a fast-paced game with lots of client prediction? Do you need to have guaranteed delivery? Do you expect your game to be used with highly-lossy networks, or mostly from stable connections?
It's not uncommon to find something like FlatBuffers or FlexBuffers available in a multiplayer game engine, but the high-performance systems like movement or AI will probably utilize a custom protocol better suited to their task.
Quite apart from its neglect for years while Kenton was working on sandstorm.io, I found the low-level access interfaces needed to extract the best performance to be fiddly, and they often rely on an intimate knowledge of the internals.
As is often the case, maximum performance for your use case requires more intimate detail.
I can't remember, but some of these zero copy protocols may still require byte order swapping on mixed endian systems. (But cache access times probably dwarf the byte swapping)
FlatBuffers is used to great effect, but often with new teams building new things, that are not yet shackled in Protobuf everywhere, like many of the AI related teams.
Thanks a lot for your time (and work!). FYI, I'm planning on using FlatBuffers for communication between a mobile app's native code and a cross-platform library, hoping to save on data decoding time. A bit similar to what the Xi editor is doing. If you have any advice, it's welcome! :D
- serialisation cannot be intrusive (i.e. I don't want a class definition to be generated from a schema file)
- zero copy
- some sort of versioning so any accidental message version mismatch between sender and receiver doesn't cause a crash / undefined behaviour.
- support for many languages
Protobufs are intrusive, so that rules them out.
JSON is just too verbose and slow.
MessagePack looked like the most promising path to pursue at the moment. Flatbuffers look quite similar.
It's a pretty stellar cross-platform serialization toolkit and the zero-copy support for reading is no joke.
It is one of the best serialisation methods currently available.
Consider that you're writing a custom struct/data type called MyStruct in your language. Now, can you use std::vector<MyStruct>? This requires your other language to read the entire vector header and instantiate a new type based on both your definition of your custom type and the vector template. The vector instantiation will ask questions like whether or not your type is copy constructible and move constructible, and if so, use these implementations. Designing this into your other language requires a lot of compromises.
And that's just templates. ADL poses an even bigger difficulty. Again when you write your custom type in your language and define a swap function for it, will C++ standard algorithms that need swapping find your implementation via ADL? Recall that swapping is invoked via having `using std::swap;` and then invoking the bare function `swap` which triggers ADL. Again, designing this into your other language requires a lot of compromises.
The only language that I know of that comes close to having full interoperability with C++ is Objective-C, and it does so by creating a new Frankenstein language called Objective-C++. It has two different kinds of classes, two different exception mechanisms, etc.
In fact the doc says,
> Being 100% compatible with C++ means more or less adding a fully functional C++ compiler front end to D. Anecdotal evidence suggests that writing such is a minimum of a 10 man-year project, essentially making a D compiler with such capability unimplementable.
It then goes on to describe the pragmatic approach that doesn't allow full compatibility:
> D takes a pragmatic approach that assumes a couple modest accommodations can solve a significant chunk of the problem:
> matching C++ name mangling conventions
> matching C++ function calling conventions
> matching C++ virtual function table layout for single inheritance
Can you see how limited this subset of interoperable API is?
Basically COM? Which is not surprising as it was designed as an OO native FFI.
I'm not sure if that's the same meaning in this context, however.
This has worse performance for decoding, but far better performance for encoding. Encoding any traversable format is guaranteed to be expensive because you have to encode it inside-out, calculating the nested sizes of everything as you go. For messages that are only encoded once and read once, for example network messages, a format that is traversable in place is a poor choice.
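The inside-out point can be illustrated with a toy size-prefixed format. This is an invented encoding for the sake of the example (4-byte big-endian size, 1-byte kind, then the body), not any of the formats discussed in this thread.

```python
def encode(node):
    """Encode bytes (leaf) or a list of nodes as: size, kind byte, body.

    A parent's size prefix cannot be written until every child has been
    encoded, so the work proceeds from the leaves outward.
    """
    if isinstance(node, bytes):
        body = b"\x00" + node
    else:
        body = b"\x01" + b"".join(encode(child) for child in node)
    return len(body).to_bytes(4, "big") + body


def child_spans(buf, pos=0):
    """Yield (start, end) of each child of the list at `pos`, using only size prefixes."""
    size = int.from_bytes(buf[pos:pos + 4], "big")
    if buf[pos + 4] != 1:
        raise ValueError("not a list node")
    cursor = pos + 5
    end = pos + 4 + size
    while cursor < end:
        child_size = int.from_bytes(buf[cursor:cursor + 4], "big")
        yield cursor, cursor + 4 + child_size
        cursor += 4 + child_size


encoded = encode([b"a", [b"bc", b"d"], b"e"])
spans = list(child_spans(encoded))
```

The reader gets to hop over the nested list without looking inside it, but the writer paid for that by encoding the nested list in full before it could emit the outer size.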