Schema-ful, copying: Protobuf, Thrift, plenty more
Schema-ful, zero-copy: Cap'n Proto, FlatBuffers
Schema-less, copying: JSON (binary and other variants included), XML
Schema-less, zero-copy: FlexBuffers (any others? This one seems new to me)
It's one of the reasons I don't mind Maven. Yeah, there's a 1000+ line XML config file, but the Maven schema is so tight that nearly any syntax issue will be flagged. That's easy to appreciate when you're used to giant config files that don't get validated until runtime.
My comment was only aimed at service payload serialization; as to whether markup makes for a good config file format, I'm not entirely sure. It's certainly better than inventing an ad-hoc format IMO, but OTOH there are a couple of not-quite-SGML/XML config formats, such as Apache's httpd.conf, or in fact Maven's, that just give XML/SGML a bad name IMHO, because they generally inherit the downsides of markup without bringing its benefits (such as being able to assemble a configuration from fragments using regular entity expansion).
Look at the number of downloads for this JSON Schema validator: https://www.npmjs.com/package/ajv
Just as an aside: the new version of OpenAPI (v3.1 RC) for REST interfaces now fully supports JSON Schema validation, so we may see an uptake in the use of JSON Schema.
It's not perfect and has some holes.
We currently use this to store security log data, and we think it's an interesting midpoint between having no schema at all and requiring schema registries to do useful work.
I will consider it in the future, though; while I'm familiar with SBE, it's not one I'd have thought of when thinking about serialisation.
I did end up writing a simple schema verifier for Ruby (ClassyHash, on GitHub) at one of the jobs where I used msgpack, but I no longer have access to maintain it. My benchmarks showed that msgpack+ClassyHash was faster than native JSON (I don't think I tested oj) and other serialization formats, and faster than all the other popular Ruby schema validators at the time.
TL;DR: msgpack rocks; use it instead of JSON for internal services.
This probably comes down to a disagreement on what "zero-copy" means.
Some people use the term "zero-copy" to mean only that when the message contains a string or byte array, the parsed representation of those specific fields will point back into the original message buffer, rather than having to allocate a copy of the bytes at parse time.
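To make this weaker notion concrete, here is a minimal sketch in Python: a hand-rolled reader for a single length-delimited protobuf field (wire type 2) that returns either a copy of the payload or a `memoryview` still pointing into the original buffer. The function names are hypothetical and this is not the protobuf library's API; it only illustrates the idea.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def bytes_field(message, copy):
    """Return the payload of one length-delimited field (wire type 2).

    copy=True copies the bytes into a new object; copy=False returns a
    memoryview slice that still points into the original message buffer.
    """
    view = memoryview(message)
    tag, pos = read_varint(view, 0)
    assert tag & 0x7 == 2, "expected a length-delimited field"
    length, pos = read_varint(view, pos)
    payload = view[pos:pos + length]
    return bytes(payload) if copy else payload


# Field 1, wire type 2 -> tag byte 0x0A, then length 5, then the bytes.
message = bytearray(b"\x0a\x05hello")
shared = bytes_field(message, copy=False)
copied = bytes_field(message, copy=True)

message[2] = ord("j")      # mutate the original buffer in place
print(bytes(shared))       # b'jello' (the view sees the change)
print(copied)              # b'hello' (an independent copy)
```

Only the byte payload avoids a copy here; everything else about the message still has to be parsed.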
Cap'n Proto and FlatBuffers implement a much stronger form of zero-copy. With them, it's not just strings and byte buffers that are zero-copy, it's the entire data structure. With these systems, once you have the bytes of a message mapped into memory, you do not need to do any "parse" step at all before you start using the message.
For example, if you have a multi-gigabyte file formatted with one of these, you can mmap() it, and then traverse the message tree to read one particular datum without touching any bytes of the message except the chain of pointers (parent to child) leading to that one datum. Aside from the mmap() call, you can do all this without allocating any memory at all.
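A rough Python sketch of that access pattern follows. The layout is hypothetical and much simpler than Cap'n Proto's or FlatBuffers' real encodings: each node is two little-endian u64 words, a child offset (0 meaning "none") and a payload value, with a megabyte of unrelated filler between nodes. The reader maps the file and follows offsets straight to the leaf, unpacking only the nodes on that path.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout (NOT a real wire format): [u64 child offset or 0][u64 value]
NODE = struct.Struct("<QQ")


def build_file(path, depth, filler_bytes):
    """Write a chain of `depth` nodes separated by unrelated filler bytes."""
    data = bytearray(filler_bytes)           # leading filler, so offset 0 is never a node
    offsets = []
    for _ in range(depth):
        offsets.append(len(data))
        data += NODE.pack(0, 0)
        data += bytes(filler_bytes)          # data the reader never needs
    for i, off in enumerate(offsets):
        child = offsets[i + 1] if i + 1 < depth else 0
        NODE.pack_into(data, off, child, 1000 + i)
    with open(path, "wb") as f:
        f.write(data)
    return offsets[0]


def read_leaf(buf, root_offset):
    """Follow child offsets from the root; only nodes on the path are unpacked."""
    offset = root_offset
    while True:
        child, value = NODE.unpack_from(buf, offset)
        if child == 0:
            return value
        offset = child


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "message.bin")
    root = build_file(path, depth=4, filler_bytes=1 << 20)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        leaf_value = read_leaf(mm, root)

print(leaf_value)  # 1003
```

No parse step builds an in-memory tree; `struct.unpack_from` reads the mapped pages directly, so the OS only has to page in what the pointer chain touches.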
That is absolutely not possible with Protobuf, because Protobuf encoding is a list of tag-value pairs each of which has variable width. In order to read any particular value, you must, at the very least, linearly scan through the tag-values until you find the one you want. But in practice, you usually want to read more than one value, at which point the only way to avoid O(n^2) time complexity while keeping things sane is to parse the entire message tree into a different set of in-memory data structures allocated on the heap.
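The linear scan can be seen in a small sketch of the protobuf wire format: each field starts with a varint tag `(field_number << 3) | wire_type`, and the only way to reach field N is to decode and skip every field before it. This is a minimal hand-written reader for illustration, not the protobuf library's parser.

```python
def read_varint(buf, pos):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint(value):
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def find_field(message, wanted):
    """Scan tag-value pairs until field `wanted` is found (first occurrence;
    real protobuf parsing uses last-one-wins for non-repeated scalars)."""
    pos = 0
    while pos < len(message):
        tag, pos = read_varint(message, pos)
        field, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:                       # varint
            value, end = read_varint(message, pos)
        elif wire_type == 1:                     # fixed 64-bit
            value, end = message[pos:pos + 8], pos + 8
        elif wire_type == 2:                     # length-delimited
            length, pos = read_varint(message, pos)
            value, end = message[pos:pos + length], pos + length
        elif wire_type == 5:                     # fixed 32-bit
            value, end = message[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if field == wanted:
            return value
        pos = end                                # skip this field entirely
    return None


# Fields 1..500 as varints holding f * 1000; tags and values vary in encoded
# width, so no field's position can be computed without scanning.
msg = b"".join(encode_varint(f << 3 | 0) + encode_varint(f * 1000) for f in range(1, 501))
print(find_field(msg, 500))  # 500000
```

Reading k different fields this way costs O(k·n), which is why real implementations parse the whole message into heap objects once instead.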
That is not "zero-copy" by Cap'n Proto's definition.
(Disclosure: I am the author of Cap'n Proto and Protobuf v2.)
By the way, I worked on protobuf performance at Google for years, and we could never get flatbuffers to go any faster.
Whoa, that's a very strong statement. But then the rest of the paragraph gets a lot weaker.
> there are high-performance computation packages that compress data structures in L1-cached-sized blocks
This seems like a non sequitur. Of course hand-tuned data structures can achieve higher performance than any serialization framework, but what does that have to do with zero-copy vs. protobuf? Are you suggesting that protobuf encoding would be a good choice for these people?
> So, you've used the word "achieve" to decorate an outcome that might not be optimal.
I'm not sure why "achieve" would imply "optimal". Of course whether this is an advantage depends on the use case.
There are many cases where zero-copy doesn't provide any real advantages. If you're just sending messages over a standard network socket, then yeah, zero-copy probably isn't going to make things faster. There are already several copies inherent in network communication.
But if you have a huge protobuf file on disk and you want to read one field in the middle of it, that's just not something you can do in any sort of efficient way. With zero-copy, you can do this trivially with mmap().
Or if you're communicating over shared memory between processes on the same machine, then the entire serialize/parse round trip required with protobuf is 100% waste. Zero-copy would let you build and consume the structure from the same memory pages.
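A minimal Python sketch of the shared-memory case, assuming Python 3.8+ for `multiprocessing.shared_memory`. The record layout is hypothetical. The producer writes fields directly into the shared pages and the consumer reads them in place. Both sides run in one process here for brevity; normally the consumer would attach by name from another process.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical fixed layout: [u32 sequence][f64 temperature][u16 status]
RECORD = struct.Struct("<IdH")

# Producer: build the record directly in the shared pages.
producer = shared_memory.SharedMemory(create=True, size=RECORD.size)
try:
    RECORD.pack_into(producer.buf, 0, 42, 21.5, 7)

    # Consumer: attach by name and read fields straight out of the same
    # pages; there is no serialize step and no separate parse step.
    consumer = shared_memory.SharedMemory(name=producer.name)
    try:
        seq, temp, status = RECORD.unpack_from(consumer.buf, 0)
    finally:
        consumer.close()
finally:
    producer.close()
    producer.unlink()

print(seq, temp, status)  # 42 21.5 7
```

With protobuf, the producer would instead serialize into a byte string, copy it into the shared region, and the consumer would parse it back into objects.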
These seem "intrinsically valuable"?
> we could never get flatbuffers to go any faster.
What use case were you testing? Did you test any zero-copy serializations other than flatbuffers?
I've heard from lots of people that say Cap'n Proto beat Protobuf in their tests... but it definitely depends on the use case.
This is the part I don’t agree with. What I’m saying is that there is value in using encoded structures not only between servers and not only over shared memory, but even within a single process. Yes, you give up the ability to jump to any random field, but that is not always important. Often it can be better to spend some compute cycles and L1 accesses to save main memory accesses. If you have to access all of some kind of data anyway, then packing it makes a ton of sense. Consider any kind of delta-encoded column of values: you can’t seek within it, but if the deltas are smaller than the absolute values, this can save massive amounts of main memory bandwidth. This is why I argue that representing something as a C struct in main memory is not obviously advantageous, outside of certain workloads.
As for FlatBuffers at Google, I’m sure you’re aware that the only way to get the kind of mindshare you’d need to ship it would be to make web search measurably faster.
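The delta-encoded column argument can be sketched in Python. This assumes a column sorted in ascending order (so every delta is non-negative) and packs the deltas as base-128 varints, which is one common choice among several. Decoding is strictly sequential, since each value depends on all the deltas before it.

```python
def encode_deltas(values):
    """Delta-encode an ascending integer column and varint-pack the deltas."""
    out = bytearray()
    previous = 0
    for v in values:
        delta = v - previous               # assumes ascending order: delta >= 0
        previous = v
        while True:                        # base-128 varint, low bits first
            byte = delta & 0x7F
            delta >>= 7
            if delta:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)


def decode_deltas(buf):
    """Sequential scan only: no seeking to the i-th value."""
    values, current, delta, shift = [], 0, 0, 0
    for byte in buf:
        delta |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
            continue
        current += delta
        values.append(current)
        delta, shift = 0, 0
    return values


timestamps = [1_600_000_000 + 3 * i for i in range(10_000)]
packed = encode_deltas(timestamps)
assert decode_deltas(packed) == timestamps
print(len(packed), 8 * len(timestamps))  # 10004 80000
```

The packed column is roughly 8x smaller than fixed-width 64-bit values, which is the memory-bandwidth trade described above.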
It doesn't seem to me like Protobuf is ideal for this use case, but sure, I see how the light compression afforded by Protobuf encoding could lead to a win vs. bare C structs.
I think a better answer here, though, would be to use an actual compression algorithm that has been tuned for this purpose.
Of course, then the uncompressed data needs to be position-independent, so no (native) pointers. You could use something hand-rolled here... but also, this is exactly the problem zero-copy serializations solve, so they might be a good fit here. Hmm!
I'd be pretty interested to compare layering a zero-copy serialization on top of compression vs. protobuf encoding in these high-performance computing scenarios. Is that something you tried doing?
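One way to picture layering a zero-copy layout over compression: use offsets from the start of the buffer instead of native pointers, so the data stays valid wherever the decompressed bytes land in memory. This Python sketch uses a hypothetical linked-list layout and `zlib` as a stand-in for a compressor actually tuned for this purpose.

```python
import struct
import zlib

# Hypothetical layout: records [u32 next][u32 value], where `next` is a byte
# offset from the start of the buffer (0 = end). Bytes 0..3 hold the offset
# of the first record. Offsets, not pointers, keep it position-independent.
REC = struct.Struct("<II")


def build_list(values):
    buf = bytearray(REC.size)                     # reserved so 0 can mean "none"
    next_offset = 0
    for v in reversed(values):                    # build back to front
        offset = len(buf)
        buf += REC.pack(next_offset, v)
        next_offset = offset
    struct.pack_into("<I", buf, 0, next_offset)   # header: first record
    return bytes(buf)


def walk(buf):
    offset = struct.unpack_from("<I", buf, 0)[0]
    while offset:
        offset, value = REC.unpack_from(buf, offset)
        yield value


original = build_list([10, 20, 30])
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)    # a new buffer at a different address
print(list(walk(restored)))               # [10, 20, 30]
```

After decompression the buffer is used as-is; there is no second parse into heap objects.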
I don't think this is the criticism being raised.
The main criticism being raised against non-zero-copy serialization is that this often requires maintaining different memory representations for the same value - the copies are the consequence of transforming from one representation to another one.
They are not talking about "unpacking on the fly for processing", but rather about "unpacking on memory to be able to call an opaque API outside your control that expects the unpacked representation". That requires copying in-memory to interface with that API.
Your approach only works if you are willing to "re-implement the world" to interface with whatever packed format suits your application.
With zero-copy serialization you don't have to do that.
It feels like you're conflating things here.
It's not always a better algorithm, but it's a better definition.
What about in embedded code, or in a game? A place where memory bandwidth is scarce, or where we're trying desperately to reduce the number of syscalls and jumps back and forth between kernel and user space?
Having the entire payload memory mapped, and copies avoided, makes an absolutely huge difference once these kinds of concerns are real. Having something mmap'd in theory means it's nice and hot and fresh in the kernel's mind, and potentially kept in cache.
Some people on my team had gRPC foisted on them to run on an ARM Cortex-class device running a real-time OS. It boggled my mind that they were even able to get it to ship. Using something like flatbuffers would have made a lot more sense.
At Google, protobuf is the veritable "if all you have is a hammer, everything looks like a nail", to the point where I've seen protobufs in JSON payloads, or vice versa, or even deeper: protobuf in JSON inside a Chromium Mojo payload... because, well, how good could it be without protobuf?
I shipped an embedded project some time ago using an RTOS and < 256 kB of RAM, with protocol buffers (even nested ones) and zero-copy deserialization for byte arrays. We used nanopb, and it worked just fine if you understand how it works, although it is certainly less pleasant to work with than a fuller implementation, which would simply have copied the internal bytes out into new arrays instead of making us deal with lots of internal pointers into byte arrays.
Overall using Protocol Buffers was a great success in that project, since we could share schemas between IoT devices and the backend, and were able to generate a lot of code which would otherwise have been hand-written.
> Using something like flatbuffers would have made a lot more sense.
It might be able to solve the same problem, but it also needs to answer the question: is a suitable library available for all consumers of the data structure? I don't think that was the case for our use case, so it wouldn't have made more sense to use it back then.
Weird rebuttal, considering that my argument is that the copying nature of protobuf can save memory bandwidth.
Huh? An extra pass over the data to parse it obviously uses more memory bandwidth.
I might understand your argument if parsing converted the data into a memory-bandwidth-optimized format, but the protobuf parsed form is certainly not that, unless things have changed very drastically since I worked on it.
At Google scale protobuf works perfectly fine. It's our lingua franca and a lot of work (apparently by yourself included) has gone into making it performant. But it comes with normative lifestyle assumptions.
Sure you could change the wire format and implementation to be mappable; but then it wouldn't be compatible with the mainstream implementation.
FlatBuffers can be accessed instantly without deserialization or allocation, so clearly in some cases a huge speedup is possible. If in your case there was no speedup, there must be other bottlenecks.
> A direct mapping to memory is not always optimal for performance.
Your comment seems to imply that the first statement follows from the second, but it does not. You're right, a direct memory mapping might not be optimal in some situations, but then again in some others it might be optimal. So this feature isn't always useful to everyone, but it doesn't follow that it's never useful to anyone.
Even if you worked on this on a range of projects at Google, isn't it possible that there are people working on systems that have rather different performance characteristics than Google's systems?
I think this is the important point when it comes to discussing zero-copy. I've written a custom protobuf implementation for Java that can do exactly that.
It's a bit tricky, since protobuf supports recursive messages and Java's Unsafe is not as powerful as what you have in C++. My trade-off was to require the caller to pre-allocate the messages needed before parsing the data. This works great with multi-gigabyte files where you want to process a large number of (possibly nested) messages, but it is not as ergonomic as normal protobuf code.
It obviously doesn't come for free, as you need to do a linear scan to find those tag-values, but there are ways to speed that up too, so it becomes very fast in practice.
I'm sure Cap'n Proto and FlatBuffers are faster for some use cases (I haven't tested), but a very important point for me is being wire-compatible with protobuf3 and its ecosystem... while still being zero-copy/zero-alloc.
And what happens if you copy two different branches of one message into another, and they happen to share some children? Do you have to keep your seen-pointer map around across multiple copies?
For a cyclic graph, things get more confusing. Copying one branch of a fully-connected cyclic graph always means copying the entire graph. Apps can easily get into trouble here. Imagine an app that implements its own tree structure where nodes have "parent" pointers. If they try to copy one branch into another message, they accidentally copy the entire tree (via the parent pointers) and might not even realize it.
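The parent-pointer trap is easy to reproduce with a naive graph copy that keeps a "seen" map to handle shared nodes and cycles. In this Python sketch (the `Node` class and `copy_graph` are hypothetical), asking to copy only the `left` branch drags in the whole tree through `parent`.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent              # back-pointer: makes the graph cyclic
        self.children = []
        if parent is not None:
            parent.children.append(self)


def copy_graph(node, seen):
    """Copy everything reachable from `node`; `seen` maps old node -> copy."""
    if node is None:
        return None
    if node in seen:                      # already copied: reuse (handles cycles)
        return seen[node]
    clone = Node.__new__(Node)
    seen[node] = clone
    clone.name = node.name
    clone.parent = copy_graph(node.parent, seen)
    clone.children = [copy_graph(c, seen) for c in node.children]
    return clone


root = Node("root")
left, right = Node("left", root), Node("right", root)
leaf = Node("leaf", left)

seen = {}
copy_graph(left, seen)   # intended: copy just the "left" branch
print(sorted(n.name for n in seen.values()))  # ['leaf', 'left', 'right', 'root']
```

Reusing the same `seen` map across several copies would also dedupe shared children, at the cost of keeping that map alive.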
The one way that I think pointer aliasing could be made to work is if pointers that are allowed to alias are specially-marked, and are non-owning. So each object still has exactly one parent object, but might have some other pointers pointing to it from elsewhere. A copy would not recurse into these pointers; it would only update them if the target object happened to be part of the copy, otherwise they would have to become null.
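A Python sketch of that idea, which the comment describes as untried. It is not Cap'n Proto code, and the `Obj` class and `copy_tree` are hypothetical. The copy recurses only through owning `children`; afterwards, non-owning `refs` are retargeted if they point inside the copied subtree and become `None` otherwise.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.children = []   # owning pointers: each object has exactly one parent
        self.refs = []       # non-owning pointers: may alias any object


def copy_tree(root):
    """Copy via owning pointers only, then patch non-owning ones."""
    mapping = {}

    def clone(obj):
        new = Obj(obj.name)
        mapping[obj] = new
        new.children = [clone(c) for c in obj.children]
        return new

    new_root = clone(root)
    for old, new in mapping.items():
        # Refs into the copied subtree are retargeted; refs outside it are
        # not followed and become None.
        new.refs = [mapping.get(r) for r in old.refs]
    return new_root


top = Obj("top")
a, b = Obj("a"), Obj("b")
top.children = [a]
a.children = [b]
b.refs = [a, top]        # one ref inside the branch, one outside it

a_copy = copy_tree(a)
b_copy = a_copy.children[0]
print(b_copy.refs[0] is a_copy, b_copy.refs[1])  # True None
```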
But I haven't yet had any reason to try implementing this approach. And apps can get by reasonably well without it, by using integer indexes into a table.
"Skyscrapers have been solved since the 20's" doesn't answer whether I should use steel beams when building a house.
Records in Cap'n Proto are laid out like C structs. All fields are at fixed offsets from the start of the structure. For variable-width values, the struct contains a pointer to data elsewhere in the message.
Each new object is added to the end of the message, so that the message stays contiguous. This does imply that if you resize a variable-width object, then it may have to be moved to the end of the message, and the old space it occupied becomes a hole full of zeros that can't really be reused. This is definitely a down-side of this approach: Cap'n Proto does not work great for data structures that are modified over time. It's best for write-once messages. FlatBuffers has similar limitations, IIRC.
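A simplified Python sketch of the two properties described above: fixed-offset fields with a pointer to variable-width data, and append-only growth that leaves zeroed holes. The layout and `Builder` class are hypothetical and do not match Cap'n Proto's real encoding, which uses relative pointers and word alignment.

```python
import struct

# Hypothetical struct with fixed-offset fields:
#   offset 0: u32 id,  offset 4: u32 text_offset,  offset 8: u32 text_length
HEADER = struct.Struct("<III")


class Builder:
    """Append-only message: new variable-width data always goes at the end."""

    def __init__(self, record_id, text):
        self.buf = bytearray(HEADER.size)
        self.id = record_id
        self.set_text(text)

    def set_text(self, text):
        data = text.encode()
        _, old_off, old_len = HEADER.unpack_from(self.buf, 0)
        if len(data) <= old_len:                       # fits: overwrite in place
            self.buf[old_off:old_off + old_len] = data.ljust(old_len, b"\0")
            new_off = old_off
        else:                                          # grows: move to the end
            self.buf[old_off:old_off + old_len] = bytes(old_len)  # zeroed hole
            new_off = len(self.buf)
            self.buf += data
        HEADER.pack_into(self.buf, 0, self.id, new_off, len(data))

    def text(self):
        _, off, length = HEADER.unpack_from(self.buf, 0)
        return self.buf[off:off + length].decode()


msg = Builder(7, "short")
msg.set_text("a much longer string")
print(msg.text(), len(msg.buf))  # a much longer string 37
```

The five bytes that held `"short"` are now a hole of zeros that the message carries around but never reuses.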
Off-topic: I see a few links in your HN profile, is that the best place to keep up with the projects you're working on?
The reference implementation (in Go) is 90% complete: enough to marshal objects, except for recursive support.
[EDIT] and I forgot to mention re-entrant as well (again if done correctly).
Would BIPF fall into this category? It's a serialization format for JSON: https://github.com/dominictarr/bipf
If performance isn't an issue, then just use any of these tools, unless the tooling cost and representational issues make it easier to just use bytes.
I'm all for abstractions, but people seem to be blind to the idea that there's a perfectly good one a short step down from Cap'n Proto.
Protobuf can support zero-copy of strings and byte arrays embedded in the message. Cap'n Proto and FlatBuffers support zero-copy of the entire message structure.
(Disclosure: I'm the author of Cap'n Proto and Protobuf v2.)
Just to be clear, even this more limited notion of "zero copy" isn't currently supported in the open-sourced version of protobuf. It is supported in the internal Google version, which is presumably what you're thinking of. There is an open issue tracking the possibility of making it available in the open source version too.
My guess is that FlatBuffers is more flexible and slower to read, and five years too late to the party.
This HN article, though, is about FlexBuffers. FlexBuffers appears to be based on FlatBuffers, but does not use schemas. Cap'n Proto, FlatBuffers, and Protobuf are all schema-driven (you must define your message types in a special language upfront). FlexBuffers is more like JSON in that all types are dynamic.
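What "all types are dynamic" means in practice is that every value carries its own type information, so a reader needs no schema. This Python sketch uses hypothetical one-byte type tags. FlexBuffers' real encoding is different and more compact, and unlike this copying decoder it can be read in place.

```python
import struct

# Hypothetical type tags, for illustration only.
INT, STR, LIST = 0, 1, 2


def encode(value):
    """Every value is prefixed with a type tag, so no schema is needed."""
    if isinstance(value, int):
        return bytes([INT]) + struct.pack("<q", value)
    if isinstance(value, str):
        data = value.encode()
        return bytes([STR]) + struct.pack("<I", len(data)) + data
    if isinstance(value, list):
        items = b"".join(encode(v) for v in value)
        return bytes([LIST]) + struct.pack("<I", len(value)) + items
    raise TypeError(type(value))


def decode(buf, pos=0):
    """Read the tag first, then interpret the following bytes accordingly."""
    tag = buf[pos]
    pos += 1
    if tag == INT:
        return struct.unpack_from("<q", buf, pos)[0], pos + 8
    if tag == STR:
        (n,) = struct.unpack_from("<I", buf, pos)
        return buf[pos + 4:pos + 4 + n].decode(), pos + 4 + n
    (count,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    items = []
    for _ in range(count):
        item, pos = decode(buf, pos)
        items.append(item)
    return items, pos


value, _ = decode(encode([1, "two", [3]]))
print(value)  # [1, 'two', [3]]
```

The price is a tag on every value and no compile-time check that a reader and writer agree on structure; a schema-driven format moves that information into generated code.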
Personally I'm a strong believer that schemas are highly desirable, but some people argue that schema-less serializations let you get stuff done faster. I think it's very analogous to the argument between type-safe languages vs. dynamically-typed languages. Obviously there are a lot of smart people on both sides of these arguments.
For a document database, I don't agree at all. Some time back I spent more time than I'd like developing on Mongo, and boy did I wish I could actually tell it the schema of documents in each collection and have it enforce that (not to mention optimize based on it). A lot of developers actually use libraries on top of Mongo to define and enforce schemas.
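The kind of enforcement such libraries add on top of a schema-less store can be sketched in a few lines of Python. The `validate` function and `USER_SCHEMA` are hypothetical: the schema maps each required field to an expected type, and anything missing, mistyped, or unexpected is reported before the document is written.

```python
def validate(document, schema):
    """Return a list of problems; an empty list means the document conforms.

    `schema` maps each required field name to its expected Python type.
    """
    problems = []
    for field, expected in schema.items():
        if field not in document:
            problems.append(f"missing field {field!r}")
        elif not isinstance(document[field], expected):
            problems.append(
                f"{field!r} should be {expected.__name__}, "
                f"got {type(document[field]).__name__}"
            )
    for field in document.keys() - schema.keys():
        problems.append(f"unexpected field {field!r}")
    return problems


USER_SCHEMA = {"name": str, "age": int}   # hypothetical collection schema

print(validate({"name": "Ada", "age": 36}, USER_SCHEMA))   # []
print(validate({"name": "Ada", "age": "36", "emial": "x"}, USER_SCHEMA))
```

The second call catches both the string-typed age and the misspelled key, exactly the class of bug a schema-less store accepts silently.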
I think protocols still don't have the level of tooling that makes it really obvious why schemas are better.
> Personally I'm a strong believer that schemas are highly desirable
I'm currently working on a project that's very sensitive to executable size, and having a fixed schema is crucial to that, as we can optimize a lot more aggressively. We can even use dead-code feedback to identify parts of the schema that aren't consumed on the client.
That article is a bit old; is there anything that stands out to you in the last ~5 years where things have diverged?
I haven't looked into exactly how FlexBuffers work, but off the top of my head I suspect that leveraging FlatBuffers' "virtual table" technique probably helps here. In (normal, schema-ful) Cap'n Proto, fields within a struct have fixed offsets, meaning that unused fields still take space. As I understand it, FlatBuffers tries to avoid this by adding an extra layer of indirection: each struct has a sort of "virtual table" which stores the offsets of each field, where some fields might not be present at all. If multiple structs in a message happen to end up with the same virtual table, then the virtual table is only written once.
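A Python sketch of the vtable idea as described: each row stores only its present fields plus a vtable index, the vtable lists each field's offset within the row (0 = absent), and identical vtables are stored once per message. This mimics the concept only; FlatBuffers' actual binary layout differs, and the functions here are hypothetical.

```python
import struct


def encode_tables(rows, num_fields):
    """Encode rows of optional u32 fields with deduplicated vtables."""
    vtables = {}                 # vtable tuple -> vtable index
    encoded = []
    for row in rows:             # row: dict of field number -> value
        offsets, body = [], bytearray(4)         # first 4 bytes: vtable index
        for field in range(num_fields):
            if field in row:
                offsets.append(len(body))
                body += struct.pack("<I", row[field])
            else:
                offsets.append(0)                # absent: takes no space in the row
        index = vtables.setdefault(tuple(offsets), len(vtables))
        struct.pack_into("<I", body, 0, index)
        encoded.append(bytes(body))
    return list(vtables), encoded


def read_field(vtable_list, table, field):
    """Look up a field through the row's vtable; None if absent."""
    vt = vtable_list[struct.unpack_from("<I", table, 0)[0]]
    offset = vt[field]
    return struct.unpack_from("<I", table, offset)[0] if offset else None


# 100 declared fields, but each row sets only two of them.
rows = [{3: i, 97: 2 * i} for i in range(1000)]
vts, tables = encode_tables(rows, num_fields=100)
print(len(vts), len(tables[0]), read_field(vts, tables[5], 97))  # 1 12 10
```

With fixed offsets, each row would need 400 bytes for 100 u32 slots; here each row is 12 bytes and the one shared vtable is paid for once.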
Totally speculating here since, again, I haven't actually looked at FlexBuffers, but if I were building something isomorphic to JSON on top of FlatBuffers, I'd probably look into extending the virtual tables to index fields by name rather than number. So if you have two structures with the same set of field names, they can share a virtual table, and those field names only have to appear once. That'd be a pretty great way to compress JSON.
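A Python sketch of that speculation, indexing by field name instead of field number: objects with the same set of keys share one "shape", so each key name is stored once per shape rather than once per object. This illustrates the idea in the comment only; it makes no claim about how FlexBuffers is actually encoded.

```python
def pack_json_objects(objects):
    """Group objects by their key names; each distinct key tuple is stored once."""
    shapes = {}                      # tuple of key names -> shape index
    packed = []
    for obj in objects:
        keys = tuple(obj)            # dicts preserve insertion order
        shape = shapes.setdefault(keys, len(shapes))
        packed.append((shape, tuple(obj[k] for k in keys)))
    return list(shapes), packed


def unpack(shapes, record):
    shape, values = record
    return dict(zip(shapes[shape], values))


people = [{"name": f"user{i}", "age": 20 + i} for i in range(3)]
people.append({"name": "admin", "role": "ops"})
shapes, packed = pack_json_objects(people)
print(shapes)                      # [('name', 'age'), ('name', 'role')]
print(unpack(shapes, packed[1]))   # {'name': 'user1', 'age': 21}
```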
Back in Cap'n Proto, we don't have these vtables. For data with fixed schemas, my opinion is that these vtables seem like they require more bookkeeping than they are worth. But for dynamic schemas they seem like a much bigger win. So if you wanted to encode dynamic schemas layered on top of Cap'n Proto, you'd probably have to come up with some similar vtable thing yourself.
Funny you should say vtables may not be worth it.. I was of a similar opinion (why would you have many fields that are not in use??) until people showed me some of the Protobuf schemas in use at Google, with hundreds of fields, most unused. This is what pushed me in the direction of the vtable design.
Data always starts out neat... but the longer things live, the more this kind of flexibility pays off.
Yeah, I was quite familiar with those back when I was at Google. But, IIRC, a lot of them were semantically unions, but weren't actually marked as such mostly because "oneof" was introduced relatively late in Protobuf's lifetime.
No joke, I originally was arguing for 8-bit vtable entries because surely no-one ever needs more than 256 bytes worth of fields. Good thing my co-workers were smarter than me.
And yes, FlatBuffers has built-in unions from day 1, which was probably helpful.