FlexBuffers (google.github.io)
325 points by excerionsforte 11 months ago | 163 comments

So here's how I think this fits in among the other types of data serialization:

Schema-ful, copying: Protobuf, Thrift, plenty more

Schema-ful, zero-copy: Cap'n Proto, FlatBuffers

Schema-less, copying: JSON (binary and other variants included), XML

Schema-less, zero-copy: FlexBuffers (any others? This seems new to me)

While it's true that XML is a generic serialization of angle-bracket markup that doesn't need a schema for serialization (which was exactly the main motivation for its "invention" as an SGML subset), the reason it's used for inter-party or long-term/loosely-coupled service payload serialization is that it has fairly powerful and well-established schema languages (XML DTDs and XML Schema) for validation. This is unlike JSON, which thrives in ad-hoc serialization for tightly-coupled back-ends and web front-ends (and only in that scenario IMHO). So I don't think XML belongs in the "schema-less, copying" category.

Very true. IMO many people who hate XML config files just haven't used an IDE that validates against a schema. It's super nice to have auto-complete and property validation on config files, something not offered by JSON or YAML. That's a good reason to stick with XML for complicated configs.

It's one of the reasons I don't mind Maven. Yeah, there's a 1000+ line XML config file, but the Maven DTD is so tight that nearly any syntax issue will be flagged. That's easy to appreciate when you're used to giant config files that don't get validated until runtime.

Visual Studio does schema validation for JSON and gives errors inline. There is a big list of supported schemas built in, and you can define your own. I've never actually looked at the format personally.

JSON Schema is what VS Code uses. It’s a standard nowadays, and just as effective as XML Schema.



While I generally agree, I'm not sure Maven's pom.xml (aka porn.xml), of all things, is a paragon of good markup design ;) For one, Maven actively forbids/rejects use of ordinary XML entity references and invents its own text expansion instead (so strictly speaking pom.xml isn't even XML proper). Then, using XML for a relatively simple EAV format seems like overkill. But yeah, over a decade ago the Maven developers had great plans to open up the format for alternative serializations/DSLs; thankfully they didn't, if only because they realized it isn't worth the maintenance effort. I'll add that a format for describing software builds is probably the wrong place to let a thousand flowers bloom, something you realize soon enough if you've worked as a freelancer on Java-ish projects for any amount of time, where every other project feels the urge to use the oh-so-great Gradle as an alternative to pom.xml.

My comment was only aimed at service payload serialization; as to whether markup makes for a good config file format, I'm not entirely sure. It certainly is better than inventing an ad-hoc format IMO, but OTOH there are a couple of not-quite-SGML/XML config formats, such as Apache's httpd.conf or, in fact, Maven's, that just give XML/SGML a bad name IMHO, because they generally inherit the downsides of markup without bringing its benefits (such as being able to assemble a configuration from fragments using regular entity expansion).

pom.xml is not a great example of proper XML use. For example, Maven should use additional namespaces for plugin configurations, but it does not.

JSON Schema exists.


This is a later creation, not yet even finalized.

It was used in multiple production systems I have worked on.

Look at the number of DLs on this JSON schema validator https://www.npmjs.com/package/ajv

It works though.

Schema-less and schema-ful would be better split into schema-for-parsing, schema-for-validation, and no-schema-invented-yet.

Or even schema-for-type, schema-for-value-validation

Just as an aside: the new version of OpenAPI (v3.1 RC) for REST interfaces now fully supports the JSON Schema validation mechanism, so we may see an uptake in use of JSON Schema.

It's not perfect and has some holes.

We're still early in development, but the ZNG format is one we're working on for schema-less, semi-structured, zero-copy operations:


We currently use this to store security log data, but think it's an interesting midpoint between having no schema at all vs requiring schema registries to do useful work.

Where does Apache Arrow fit in? Schema-ful, zero-copy?


Arrow would be schema-ful, zero-copy, yeah. Parquet isn't though.

Arrow's metadata is based on FlatBuffers; Parquet's is based on Thrift.

One thing I wish people knew more about is SBE (Simple Binary Encoding); it seems to be super fast by design:


As far as I know, SBE was designed with financial protocols in mind, though possibly more for order entry (FIX-SBE) than market data. I'm not saying it is the case, but it's possible that it's better suited than Cap'n Proto or FlatBuffers to the IEX market data that the article uses for the performance test.

I will consider it in future, though. While I'm familiar with SBE, it's not one I'd have thought of when thinking about serialisation.

SBE is a zero-copy schema-ful serialization format. I don't think there's anything that limits the format to the financial domain. For example, here's [1] a toy example of a schema describing a car.

[1] https://github.com/real-logic/simple-binary-encoding/blob/ma...

Apache Avro is another data serialization format. It's schema-ful; I'm unsure whether it's zero-copy or copying.


Not zero copy, requires deserialization.

Apache Arrow is schema-ful, zero-copy (it is both an in-memory and serialization format)

Sounds like some implementations of MessagePack may be in the same category.

MessagePack/msgpack is great for that middle ground where JSON ser/des is too slow, but you don't have enough engineers to justify the maintenance burden of a heavier-weight schemaful protocol.
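To make that middle ground concrete, here is a minimal sketch of how MessagePack encodes a small map with no schema: type and length live in a one-byte header, and field names travel on the wire. It hand-encodes only a tiny subset of the format (nil, booleans, positive fixint, fixstr, fixmap) and is an illustration, not a replacement for a maintained library such as the `msgpack` package.

```python
def pack(value) -> bytes:
    """Encode a small subset of MessagePack: nil, bool, fixint, fixstr, fixmap."""
    if value is None:
        return b"\xc0"
    # Check booleans before ints, since bool is a subclass of int in Python.
    if value is True:
        return b"\xc3"
    if value is False:
        return b"\xc2"
    if isinstance(value, int) and 0 <= value <= 0x7F:
        return bytes([value])  # positive fixint: the byte is the value itself
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data  # fixstr header carries length
    if isinstance(value, dict) and len(value) <= 15:
        out = bytearray([0x80 | len(value)])  # fixmap header carries pair count
        for k, v in value.items():
            out += pack(k)
            out += pack(v)
        return bytes(out)
    raise ValueError(f"outside the subset this sketch supports: {value!r}")
```

For `{"id": 7, "ok": True}` this yields 9 bytes, versus 18 bytes for the compact JSON text `{"id":7,"ok":true}`.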

I did end up writing a simple schema verifier for Ruby (ClassyHash, on GitHub) in one of the jobs where I used msgpack, but I no longer have access to maintain it. My benchmarks showed msgpack+classyhash was faster than native JSON (didn't test oj I think) and other serialization formats, and faster than all the other popular Ruby schema validators at the time.

TL;DR: msgpack rocks; use it instead of JSON for internal services.

There's also ASN.1, the dreadful ancient giant of the scene.

Copying is more a facet of the implementation than the architecture, and relates strongly to the language and runtime. There's no reason that protobuf needs to copy. The only reason most C++ protobuf libraries copy is because ownership in C++ is hard and that makes zero-copy hard to use safely. By contrast it's easier to write a protobuf codec in Go that just aliases everything, because the Go runtime keeps any referenced buffer alive and deletes it when it's not referenced. In any case, it's always been possible to have zero-copy protobuf, you just don't get that from the "Hello, World!" protobuf tutorial.

This is incorrect -- it is not possible to implement Protobuf in a way that achieves the notion of "zero-copy" that Cap'n Proto and FlatBuffers achieve.

This probably comes down to a disagreement on what "zero-copy" means.

Some people use the term "zero-copy" to mean only that when the message contains a string or byte array, the parsed representation of those specific fields will point back into the original message buffer, rather than having to allocate a copy of the bytes at parse time.

Cap'n Proto and FlatBuffers implement a much stronger form of zero-copy. With them, it's not just strings and byte buffers that are zero-copy, it's the entire data structure. With these systems, once you have the bytes of a message mapped into memory, you do not need to do any "parse" step at all before you start using the message.

For example, if you have a multi-gigabyte file formatted with one of these, you can mmap() it, and then you can traverse the message tree to read any one datum without touching any bytes of the message except the chain of pointers (parent to child) leading to that one datum. Aside from the mmap() call, you can do all this without even allocating any memory at all.
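A minimal sketch of the "no parse step" idea, assuming a hypothetical fixed-width record layout (not Cap'n Proto's or FlatBuffers' actual wire format): each record is a little-endian `u32` id, 4 padding bytes, and an `f64` price, so record *i* sits at a computable offset and can be read straight out of an mmap'd file without decoding anything before it.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: little-endian u32 id, 4 pad bytes, f64 price (16 bytes).
RECORD = struct.Struct("<I4xd")

def write_records(path, records):
    with open(path, "wb") as f:
        for rid, price in records:
            f.write(RECORD.pack(rid, price))

def read_record(path, index):
    """Read one record by index without decoding any of the others."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The offset is pure arithmetic; only these 16 bytes are touched.
            return RECORD.unpack_from(mm, index * RECORD.size)
```

With nested structures the same arithmetic is applied along a chain of stored offsets instead of a single index.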

That is absolutely not possible with Protobuf, because Protobuf encoding is a list of tag-value pairs each of which has variable width. In order to read any particular value, you must, at the very least, linearly scan through the tag-values until you find the one you want. But in practice, you usually want to read more than one value, at which point the only way to avoid O(n^2) time complexity while keeping things sane is to parse the entire message tree into a different set of in-memory data structures allocated on the heap.
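To illustrate the linear scan, here is a minimal sketch that walks Protobuf's tag/value pairs looking for one field number. It handles wire types 0 (varint), 1 (fixed64), 2 (length-delimited) and 5 (fixed32) only, and the `memoryview` it returns for length-delimited fields is the weak, strings-and-bytes-only kind of zero-copy described above. `find_field` is a hypothetical helper, not part of any protobuf library.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7

def find_field(buf, field_number):
    """Scan tag/value pairs in order; return the first value for field_number.

    Varints come back as ints; other wire types come back as memoryview
    slices of buf, so the payload bytes are not copied. (For a non-repeated
    scalar field, protobuf's rule is last-one-wins; this sketch stops early.)
    """
    view = memoryview(buf)
    pos = 0
    while pos < len(view):
        key, pos = read_varint(view, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(view, pos)
        elif wire_type == 1:  # fixed64
            value, pos = view[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: length prefix, then payload
            length, pos = read_varint(view, pos)
            value, pos = view[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed32
            value, pos = view[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if number == field_number:
            return value
    return None
```

Every lookup starts from byte 0, so reading *k* different fields this way costs *k* scans, which is why real decoders parse the whole message once instead.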

That is not "zero-copy" by Cap'n Proto's definition.

(Disclosure: I am the author of Cap'n Proto and Protobuf v2.)

Nothing in your comment was false, but there's no intrinsic value to your stronger definition of zero-copy. A direct mapping to memory is not always optimal for performance. Indeed, there are high-performance computation packages that compress data structures in L1-cached-sized blocks, to save main memory bandwidth. So, you've used the word "achieve" to decorate an outcome that might not be optimal.

By the way I worked on protobuf performance at Google for years and we could never get flatbuffers to go any faster.

> no intrinsic value to your stronger definition of zero-copy.

Whoa, that's a very strong statement. But then the rest of the paragraph gets a lot weaker.

> there are high-performance computation packages that compress data structures in L1-cached-sized blocks

This seems like a non sequitur. Of course hand-tuned data structures can achieve higher performance than any serialization framework, but what does that have to do with zero-copy vs. protobuf? Are you suggesting that protobuf encoding would be a good choice for these people?

> So, you've used the word "achieve" to decorate an outcome that might not be optimal.

I'm not sure why "achieve" would imply "optimal". Of course whether this is an advantage depends on the use case.

There are many cases where zero-copy doesn't provide any real advantages. If you're just sending messages over a standard network socket, then yeah, zero-copy probably isn't going to make things faster. There are already several copies inherent in network communication.

But if you have a huge protobuf file on disk and you want to read one field in the middle of it, that's just not something you can do in any sort of efficient way. With zero-copy, you can do this trivially with mmap().

Or if you're communicating over shared memory between processes on the same machine, then the entire serialize/parse round trip required with protobuf is 100% waste. Zero-copy would let you build and consume the structure from the same memory pages.

These seem "intrinsically valuable"?

> we could never get flatbuffers to go any faster.

What use case were you testing? Did you test any zero-copy serializations other than flatbuffers?

I've heard from lots of people that say Cap'n Proto beat Protobuf in their tests... but it definitely depends on the use case.

> Or if you're communicating over shared memory between processes on the same machine, then the entire serialize/parse round trip required with protobuf is 100% waste.

This is the part I don’t agree with. What I’m saying is that there is value in using encoded structures not only between servers and not only over shared memory but even within a single process. Yes, you discard the ability to just jump to any random field, but that is not always important. Often it can be better to spend some compute cycles and L1 accesses to save main memory accesses. If you are having to make full access to some kind of data anyway, then packing it makes a ton of sense. Consider any kind of delta-encoded column of values ... you can’t seek within it, but if the deltas are smaller than the absolutes, this can save massive amounts of main memory bandwidth. This is why I argue that representing something as a C struct in main memory is not obviously advantageous, outside some given workloads.
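A minimal sketch of the delta-encoded column mentioned above, assuming non-negative integers sorted ascending: each value is stored as a varint-encoded gap from its predecessor, so small gaps take one byte instead of eight, but reading the *n*th value means decoding everything before it. This is a generic illustration, not any specific Google format.

```python
def encode_deltas(sorted_values):
    """Store each value as a varint-encoded gap from its predecessor."""
    out = bytearray()
    prev = 0
    for v in sorted_values:
        gap = v - prev
        if gap < 0:
            raise ValueError("input must be non-negative and sorted ascending")
        prev = v
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, continuation bit set
            gap >>= 7
        out.append(gap)  # final byte, continuation bit clear
    return bytes(out)

def decode_deltas(data):
    """Sequential decode: there is no way to seek to the nth value directly."""
    values = []
    prev = gap = shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:  # more bytes follow for this gap
            shift += 7
            continue
        prev += gap
        values.append(prev)
        gap = shift = 0
    return values
```

For the 334 values in `range(1_000_000, 1_001_000, 3)` this produces 336 bytes (3 for the first value, 1 per gap of 3) versus 2,672 bytes as raw 64-bit integers.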

As for flatbuf at google I’m sure you’re aware that the only way to get the kind of mindshare you’d need to ship it would be to make websearch measurably faster.

OK, so we're talking about a use case where you're compressing data in main memory and trying to decompress it only within L1 cache. I guess there must be a lot of data sitting around in RAM that isn't accessed very often. Search index leaf nodes I suppose?

It doesn't seem to me like Protobuf is ideal for this use case, but sure, I see how the light compression afforded by Protobuf encoding could lead to a win vs. bare C structs.

I think a better answer here, though, would be to use an actual compression algorithm that has been tuned for this purpose.

Of course, then the uncompressed data needs to be position-independent, so no (native) pointers. You could use something hand-rolled here... but also, this is exactly the problem zero-copy serializations solve, so they might be a good fit here. Hmm!

I'd be pretty interested to compare layering a zero-copy serialization on top of compression vs. protobuf encoding in these high-performance computing scenarios. Is that something you tried doing?

If this is your thing, then this is your book:


> Yes, you discard the ability to just jump to any random field, but that is not always important.

I don't think this is the criticism being raised.

The main criticism being raised against non-zero-copy serialization is that this often requires maintaining different memory representations for the same value - the copies are the consequence of transforming from one representation to another one.

We do that all the time in high-performance computing. You keep a packed representation in memory and unpack it in small pieces to operate on it. Sparse matrices, compressed columns, etc. This is not evil, it’s an adaptation to the way the machine works. Saying that Kenton’s definition of zero-copy is unconditionally better is an aesthetic argument and I don’t buy it.

You are still not getting it.

They are not talking about "unpacking on the fly for processing", but rather about "unpacking in memory to be able to call an opaque API outside your control that expects the unpacked representation". That requires copying in memory to interface with that API.

Your approach only works if you are willing to "re-implement the world" to interface with whatever packed format suits your application.

With zero-copy serialization you don't have to do that.

We can both advocate for different perspectives on this issue without either of us “not getting it”.

> Saying that Kenton’s definition of zero-copy is unconditionally better

It feels like you're conflating things here.

It's not always a better algorithm, but it's a better definition.

There is a big difference to me. I haven't used any of those systems but I have written plenty of console games where the artists and designers want to fill memory. That means I don't have memory for 2 representations, unparsed and parsed. I also want fast loading so loading say 4k at a time into some temp buffer and parsing into memory is also out. I load the file directly into memory, fix up the pointers, and use it in place.

Not-Faster on what platform? In Borg, in Google3 code, deployed on a fast machine with a nice fast wide memory bus and a large cache?

What about in embedded code, or in a game? A place where memory bandwidth is scarce, or where we're trying desperately to reduce the number of syscalls and jumps back and forth between kernel and user space?

Having the entire payload memory mapped, and copies avoided, makes an absolutely huge difference once these kinds of concerns are real. Having something mmap'd in theory means it's nice and hot and fresh in the kernel's mind, and potentially kept in cache.

Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship. Using something like flatbuffers would have made a lot more sense.

At Google, protobuf is the veritable "once I have a hammer everything looks like a nail", to the point where I've seen protobufs in JSON payloads, or vice versa, or even deeper... protobuf in JSON inside a Chromium Mojo payload... because, well, how good could it be without protobuf?

> Some people in my team had gRPC foisted on them to run on an ARM Cortex class device running a real time OS. It boggled my mind they were even able to get it to ship.

I shipped an embedded project using an RTOS and < 256kB RAM using protocol buffers (even nested ones) and zero-copy deserialization for byte arrays some time ago. We used nanopb, and it worked just fine if you understand how it works, although it certainly is less pleasant to work with than a fuller implementation, which would just have copied the internal bytes out into new arrays instead of making us deal with lots of internal pointers into byte arrays.

Overall using Protocol Buffers was a great success in that project, since we could share schemas between IoT devices and the backend, and were able to generate a lot of code which would otherwise have been hand-written.

> Using something like flatbuffers would have made a lot more sense.

It might be able to solve the same problem. But it also needs to answer the questions: Is a suitable library available for all consumers of the data-structure? I don't think this was the case for our use-case, so it wouldn't have made more sense to use it back then.

> Not-Faster on what platform? In Borg, in Google3 code, deployed on a fast machine with a nice fast wide memory bus and a large cache?

Weird rebuttal, considering that my argument is that the copying nature of protobuf can save memory bandwidth.

> my argument is that the copying nature of protobuf can save memory bandwidth.

Huh? An extra pass over the data to parse it obviously uses more memory bandwidth.

I might understand your argument if parsing converted the data into a memory-bandwidth-optimized format, but the protobuf parsed form is certainly not that, unless things have changed very drastically since I worked on it.

Right, but the parsed form is exactly what I meant when I said it’s an aspect of the implementation. The generated C++ code you get from Google’s protoc is a sparse thing, to be sure. But that is an artifact of the implementation. You can do anything you want with a protobuf and there are numerous independent implementations in the wild, in many languages.

I believe the point is that the actual wire format of protobuf is not amenable to memory mapping and direct manipulation or access. So to use it this way a copy would have to be made. The appeal of cap'n'proto and flatbuffers is the ability to map and work with the serialized format with minimal overhead.

At Google scale protobuf works perfectly fine. It's our lingua franca and a lot of work (apparently by yourself included) has gone into making it performant. But it comes with normative lifestyle assumptions.

Sure you could change the wire format and implementation to be mappable; but then it wouldn't be compatible with the mainstream implementation.

Couldn't find your LDAP internally. Would be happy to chat about what you tried to make FlatBuffers go faster. You can find mine on the go link page for FlatBuffers.

FlatBuffers can be accessed instantly without deserialization or allocation, so clearly in some cases a huge speedup is possible. If in your case there was no speedup, there must be other bottlenecks.

I don’t work there any more. If you want to talk to someone who knows a lot more about it than me, try haberman (who also hangs out here).

> there's no intrinsic value to your stronger definition of zero-copy

> A direct mapping to memory is not always optimal for performance.

Your comment seems to imply that the first statement follows from the second, but it does not. You're right, a direct memory mapping might not be optimal in some situations, but then again in some others it might be optimal. So this feature isn't always useful to everyone, but it doesn't follow that it's never useful to anyone.

Even if you worked on this on a range of projects at Google, isn't it possible that there are people working on systems that have rather different performance characteristics than Google's systems?

> Aside from the mmap() call, you can do all this without even allocating any memory at all.

I think this is the important point when it comes to discussing zero-copy. I've written a custom protobuf implementation for Java which can do exactly that.

It's a bit tricky since protobuf supports recursive messages and Java's Unsafe is not as powerful as what you can have in C++. My trade-off was to require the caller to pre-allocate the messages needed before parsing the data. This works great when working with multi-gigabyte files where you want to process a large number of (possibly nested) messages, but is not as ergonomic as normal protobuf code.

It obviously doesn't come for free, as you need to do a linear scan to find those tag-values, but there are ways to speed that up too, so it becomes very fast in practice.

I'm sure Cap'n Proto and FlatBuffers are faster for some use-cases (I haven't tested), but a very important point for me is to be wire-compatible with protobuf3 and its ecosystem... and still be zero-copy/zero-alloc.

Author of protobluff [1] here - a zero-copy, mostly stack-based implementation for Protocol Buffers in C. Yes, you must scan through the message to find the corresponding tag/value pair you're interested in, but in many cases, it can be fast enough if you're only interested in a few values of a large message. Sure, using a vtable is much faster, as it provides O(1) lookup semantics, but sometimes you may not be able to change to another format without touching the entire stack.
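A minimal sketch of the vtable idea, loosely modeled on FlatBuffers but simplified and not wire-compatible with it: the table starts with a field count and one `u16` offset per field (0 meaning "absent, use the default"), followed by `u32` values. Looking up field *k* is one read of the vtable slot plus one read of the value, however many fields precede it. The layout and helper names are hypothetical.

```python
import struct

def build_table(fields):
    """fields: dict of field index -> u32 value.

    Hypothetical layout: u16 field count, u16 offset per field (0 = absent),
    then the u32 values themselves.
    """
    count = max(fields) + 1 if fields else 0
    header_size = 2 + 2 * count
    offsets, data = [], bytearray()
    for idx in range(count):
        if idx in fields:
            offsets.append(header_size + len(data))  # absolute offset of the value
            data += struct.pack("<I", fields[idx])
        else:
            offsets.append(0)  # absent field: nothing stored
    return struct.pack(f"<H{count}H", count, *offsets) + bytes(data)

def get_field(buf, idx, default=0):
    """O(1) lookup: read the vtable slot, then the value; nothing is scanned."""
    (count,) = struct.unpack_from("<H", buf, 0)
    if idx >= count:
        return default
    (offset,) = struct.unpack_from("<H", buf, 2 + 2 * idx)
    if offset == 0:
        return default
    (value,) = struct.unpack_from("<I", buf, offset)
    return value
```

The cost is the vtable itself (two bytes per field slot here), which a tag/value encoding does not pay.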

[1]: https://github.com/squidfunk/protobluff

Sounds like there's a point to be made for referential integrity too. That is, if a struct contains string A twice, when you read it back in you'd want both those pointers to be identical. You'd get this for free with Cap'n Proto, but it would require extra care with "one-copy" or looser definitions of "zero-copy."

Heh, well, you could get it for free in Cap'n Proto if Cap'n Proto allowed pointer aliasing. It doesn't, though, because if it did, then messages would not be trees, they'd be graphs, which ruins a lot of stuff. For example, a very common thing to do with a message is copy one branch of the tree into a different message. Deep-copying a branch of a tree is easy. Deep-copying a branch of a graph, though -- what does that even mean?

Copying a branch of a DAG has about the same meaning as copying a branch of a tree, right?

For a DAG, maybe, but it requires a lot more bookkeeping. Now you have to remember all the pointers you've seen before in order to detect dupes. To do that you probably need a hash map and some dynamic memory allocation, ugh.
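The bookkeeping described above can be sketched with a small node type and a deep copy that keeps a seen-node map keyed by `id()`, so shared children are copied once and stay shared in the result. `Node` and `copy_dag` are hypothetical in-memory types for illustration, not a Cap'n Proto API.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def copy_dag(root, seen=None):
    """Deep-copy a DAG, preserving sharing via a map of already-copied nodes."""
    if seen is None:
        seen = {}  # id(original) -> copy: the extra allocation a tree copy avoids
    if id(root) in seen:
        return seen[id(root)]
    clone = Node(root.label)
    seen[id(root)] = clone  # register before recursing so cycles also terminate
    clone.children = [copy_dag(c, seen) for c in root.children]
    return clone
```

Passing the same `seen` dict into several calls is one answer to copying two branches that share children; Python's own `copy.deepcopy` keeps a similar `memo` dict keyed by `id()`.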

And what happens if you copy two different branches of one message into another, and they happen to share some children? Do you have to keep your seen-pointer map around across multiple copies?

For a cyclic graph, things get more confusing. Copying one branch of a fully-connected cyclic graph always means copying the entire graph. Apps can easily get into trouble here. Imagine an app that implements its own tree structure where nodes have "parent" pointers. If they try to copy one branch into another message, they accidentally copy the entire tree (via the parent pointers) and might not even realize it.

The one way that I think pointer aliasing could be made to work is if pointers that are allowed to alias are specially-marked, and are non-owning. So each object still has exactly one parent object, but might have some other pointers pointing to it from elsewhere. A copy would not recurse into these pointers; it would only update them if the target object happened to be part of the copy, otherwise they would have to become null.

But I haven't yet had any reason to try implementing this approach. And apps can get by reasonably well without it, by using integer indexes into a table.
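A minimal sketch of the integer-index workaround: nodes live in a flat table and refer to each other by index, so the serialized message stays a tree of plain values while the application treats some integers as references. Copying one branch then means remapping the indices it reaches. The `(label, [child indices])` table shape and the `extract` helper are hypothetical.

```python
def extract(nodes, root):
    """Copy the nodes reachable from `root` into a new table, remapping indices.

    nodes: list of (label, [child indices]). Shared children are copied once.
    """
    remap = {}  # old index -> new index
    out = []

    def visit(i):
        if i in remap:
            return remap[i]
        remap[i] = len(out)
        out.append(None)  # reserve the slot before visiting children
        label, children = nodes[i]
        out[remap[i]] = (label, [visit(c) for c in children])
        return remap[i]

    visit(root)
    return out
```

Unreachable entries are simply left behind, and because the slot is reserved before recursing, cyclic references terminate as well.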

This problem (persistent graph structures) has been solved since the 90s: http://citeseerx.ist.psu.edu/viewdoc/download?doi=

This isn't a question of finding a magical solution, it's a question of trade-offs in performance, complexity, and usability that any solution necessarily imposes, and whether those trade-offs are worth it to support a feature that 99% of use cases don't need.

"Skyscrapers have been solved since the 20's" doesn't answer whether I should use steel beams when building a house.

Maybe it should be called zero parse and zero copy? Seems like two different terms...

Worth noting that by the weak definition of zero-copy, JSON can also have zero-copy deserialization; many embedded JSON libraries either write a null terminator into the original buffer or return a pointer and a length.
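A minimal sketch of the pointer-and-length variant for JSON string tokens: instead of building new strings, the scanner returns `(offset, length)` pairs into the original buffer, and a `memoryview` slice reads them without copying. It deliberately assumes there are no backslash escapes, which is exactly where real libraries have to copy or rewrite in place. `string_spans` is a hypothetical helper.

```python
def string_spans(buf: bytes):
    """Return (offset, length) of each JSON string literal's contents.

    Simplification: assumes no backslash escapes inside strings.
    """
    spans = []
    i, n = 0, len(buf)
    while i < n:
        if buf[i] == 0x22:  # opening double quote
            start = i + 1
            end = buf.index(b'"', start)  # closing quote (no escapes assumed)
            spans.append((start, end - start))
            i = end + 1
        else:
            i += 1
    return spans
```

Reading a token back is then `memoryview(doc)[off:off + length]`, which references the original bytes rather than copying them.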

Could you point to documentation on how Cap'n Proto achieves this? Does it keep a header with the byte offsets of individual fields? What happens when a variable-sized field is edited?


Records in Cap'n Proto are laid out like C structs. All fields are at fixed offsets from the start of the structure. For variable-width values, the struct contains a pointer to data elsewhere in the message.

Each new object is added to the end of the message, so that the message stays contiguous. This does imply that if you resize a variable-width object, then it may have to be moved to the end of the message, and the old space it occupied becomes a hole full of zeros that can't really be reused. This is definitely a down-side of this approach: Cap'n Proto does not work great for data structures that are modified over time. It's best for write-once messages. FlatBuffers has similar limitations, IIRC.
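A minimal sketch of that append-only behaviour, using a toy arena rather than Cap'n Proto's real segment format: a fixed-size struct holds a `(position, length)` reference to its text, and replacing the text with a longer value appends a new copy at the end, zeroing the old bytes into a hole that is never reclaimed. The layout and function names are hypothetical.

```python
import struct

class Arena:
    """Toy append-only message: every allocation goes at the end of one buffer."""
    def __init__(self):
        self.buf = bytearray()

    def alloc(self, size):
        pos = len(self.buf)
        self.buf += bytes(size)  # zero-filled new space
        return pos

# Hypothetical fixed-offset struct: u32 id, u32 text position, u32 text length.
STRUCT = struct.Struct("<III")

def new_record(arena, rid, text):
    data = text.encode("utf-8")
    rec = arena.alloc(STRUCT.size)
    pos = arena.alloc(len(data))
    arena.buf[pos:pos + len(data)] = data
    STRUCT.pack_into(arena.buf, rec, rid, pos, len(data))
    return rec

def set_text(arena, rec, text):
    rid, old_pos, old_len = STRUCT.unpack_from(arena.buf, rec)
    data = text.encode("utf-8")
    arena.buf[old_pos:old_pos + old_len] = bytes(old_len)  # zero the old bytes
    # A shorter value can reuse the old slot; a longer one must move to the end,
    # leaving the old slot as a zero-filled hole.
    pos = old_pos if len(data) <= old_len else arena.alloc(len(data))
    arena.buf[pos:pos + len(data)] = data
    STRUCT.pack_into(arena.buf, rec, rid, pos, len(data))

def get_text(arena, rec):
    _, pos, length = STRUCT.unpack_from(arena.buf, rec)
    return bytes(arena.buf[pos:pos + length]).decode("utf-8")
```

Growing `"hi"` to `"hello"` takes the buffer from 14 to 19 bytes, and bytes 12 to 13 remain as dead zeros.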

Thanks for this explanation!

Off-topic: I see a few links in your HN profile, is that the best place to keep up with the projects you're working on?

I think all my major projects are in my profile... Cloudflare Workers (and Cap'n Proto, which it uses) are my day job; I don't get much time to work on other projects these days unfortunately.

Indeed, there have been several projects that replace the generator (really, you can come up with something completely different). One of the latest I've seen is https://perfetto.dev/docs/design-docs/protozero

I'm building Concise Encoding, which is schema-less, with a 1:1 compatible text and binary representation. The binary format supports zero copy for string and binary values.

The reference implementation (in go) is 90% complete, enough to marshal objects except for recursive support.


https://github.com/rentzsch/lich is schemaless and potentially zero copy.

XDR (schema-ful, zero-copy (if done correctly)), but it's older and even less cool.

[EDIT] and I forgot to mention re-entrant as well (again if done correctly).

So in this/your view, Protobuf and CBOR would be used in different scenarios?

Yeah, protobuf requires that both parties agree on a schema whereas CBOR is self-describing like XML or JSON.
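To make the difference concrete, here is a minimal sketch that hand-encodes a one-field record `name = 1` both ways. CBOR (RFC 8949) puts the field name and a type tag on the wire; Protobuf puts only a field number and wire type, so a reader needs the schema to know that field 1 is called `name`. Both encoders cover only tiny subsets and are illustrations, not either library's API.

```python
def cbor_small(obj):
    """Encode a tiny CBOR subset: ints 0..23, short text, small maps."""
    if isinstance(obj, int) and not isinstance(obj, bool) and 0 <= obj < 24:
        return bytes([obj])  # major type 0, value packed into the header byte
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) < 24:
            return bytes([0x60 | len(data)]) + data  # major type 3: text string
    if isinstance(obj, dict) and len(obj) < 24:
        out = bytes([0xA0 | len(obj)])  # major type 5: map with N pairs
        for k, v in obj.items():
            out += cbor_small(k) + cbor_small(v)
        return out
    raise ValueError("outside this sketch's subset")

def protobuf_varint_field(field_number, value):
    """Encode one varint field (wire type 0); single-byte tag and value only."""
    if not (1 <= field_number < 16 and 0 <= value < 128):
        raise ValueError("outside this sketch's subset")
    return bytes([(field_number << 3) | 0, value])
```

`cbor_small({"name": 1})` gives 7 self-describing bytes, while `protobuf_varint_field(1, 1)` gives 2 bytes that mean nothing without the `.proto` file.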

CBOR also has a schema format, but yes, it's typically used schemaless.

I hadn't heard of this, so I had to look it up: https://tools.ietf.org/html/rfc8610

> Any others?

Would BIPF fall into this category? It's a serialization format for JSON: https://github.com/dominictarr/bipf

but i really do wonder why this is such a matter of debate. if performance and specific semantics are an issue...just use the standard tricks and write bytes into a buffer and push it onto the wire.

if performance isn't an issue, then just use any of these tools. unless the tooling cost and representational issues make it easier to just use bytes.

all for abstractions..but people seem to be blind to the idea that there's a perfectly good one a short step down from capn proto

Except in truth Protobuf, Thrift, Avro, etc. all support doing the same "schemaless" decoding as FlexBuffers.

The msgpack parser handles strings zero copy as well. Some tiny JSON parsers for embedded stuff do that as well.

Some formats optionally support schemas, or bundle the schema with the file. I think Avro does that.

Aren't the newer versions of protobuf zero-copy?

No, not in the way that Cap'n Proto and FlatBuffers are.

Protobuf can support zero-copy of strings and byte arrays embedded in the message. Cap'n Proto and FlatBuffers support zero-copy of the entire message structure.

(Disclosure: I'm the author of Cap'n Proto and Protobuf v2.)

> Protobuf can support zero-copy of strings and byte arrays embedded in the message.

Just to be clear, even this more limited notion of "zero copy" isn't currently supported in the open-sourced version of protobuf. It is supported in the internal Google version, which is presumably what you're thinking of. There is an open issue [1] tracking the possibility of making it available in the open source version too.

[1] https://github.com/protocolbuffers/protobuf/issues/1896

Kenton, so what are the main differences from your Cap'n Proto?

My guess is that FlatBuffers are more flexible and slower to read. And 5 years too late to the party.

FlatBuffers has been around almost as long as Cap'n Proto. I wrote this comparison back in 2014, but it may be outdated now: https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-...

This HN article, though, is about FlexBuffers. FlexBuffers appears to be based on FlatBuffers, but does not use schemas. Cap'n Proto, FlatBuffers, and Protobuf are all schema-driven (you must define your message types in a special language upfront). FlexBuffers is more like JSON in that all types are dynamic.

Personally I'm a strong believer that schemas are highly desirable, but some people argue that schema-less serializations let you get stuff done faster. I think it's very analogous to the argument between type-safe languages vs. dynamically-typed languages. Obviously there are a lot of smart people on both sides of these arguments.

One use case where schema-less is the way to go is when you provide the infrastructure but have no „ownership“ of the data it will be used for. E.g. you build a logging or analytics tool where customers can send arbitrary data. Or a document database, for that matter. There, schema-less/self-described data is a must.

Not necessarily. For logging/analytics, you could have customers upload their schema when configuring the service. I would think that doing so would allow for some powerful optimization opportunities, enabling your service to save quite a bit of CPU and maybe some bandwidth, too. It would probably also allow you to provide a better user experience, like making it easier to construct dashboards and such because you actually know how the data is structured.

For a document database, I don't agree at all. Some time back I spent more time than I'd like developing on Mongo, and boy did I wish I could actually tell it the schema of documents in each collection and have it enforce that (not to mention optimize based on it). A lot of developers actually use libraries on top of Mongo to define and enforce schemas.

You can apply a schema to your collection and have MongoDB enforce it. This has been in MongoDB since version 3.6. We support the JSON Schema standard. https://docs.mongodb.com/manual/core/schema-validation/

Cool! My time using Mongo was pre-3.6 so I didn't know about this.

True, I guess it all boils down to ease of use (convenience). You can build a system which accepts schema + data and builds dashboards, relevant data processing, and optimisations, but that results in a much more complex system with a higher barrier to entry. Sadly, convenient systems always get broader adoption.

Yeah, I see this as pretty similar to the debate over type-safe vs. dynamic languages. It used to be that people argued that dynamic languages were just way easier to use, and type-safe languages were just an optimization. But I think the real issue was that the tooling enabled by type-safe languages didn't exist yet. These days, TypeScript is no faster than JavaScript but is very popular, because of the tooling it enables, like editors with auto-complete and jump-to-definition.

I think protocols still don't have the level of tooling that makes it really obvious why schemas are better.

I own a service where users simply upload their schemas along with their data. It doesn't seem to be a major issue.

FlatBuffers can support some writeable updates, and its language support is pretty comprehensive.

> Personally I'm a strong believer that schemas are highly desirable

I'm currently working on a project that's very sensitive to executable size, and having a fixed schema is crucial to that, as we can optimize a lot more aggressively. We can even use dead-code feedback to identify schema parts that aren't consumed on the client.

You can also combine flatbuffers and flexbuffers: https://users.soe.ucsc.edu/~jlefevre/skyhookdb/fb_structure_...

Thanks for linking the article. Despite the original article being about FlexBuffers, I happen to have been looking at FlatBuffers vs Cap'n Proto today.

That article is a bit old; is there anything that stands out to you in the last ~5 years where things have diverged?

I haven't kept track of FlatBuffers so I don't know what might have changed there, except that I imagine they support a lot more languages now (probably more than Cap'n Proto honestly). Cap'n Proto's serialization layer hasn't changed very much in those 5 years; development focus has been more on the RPC system.

In theory, you could build something like flexbuffers for cap’n’proto if someone was really motivated, no?

I suppose you could layer something on top. But I think it would be tricky to come up with something with satisfying performance properties. A naive encoding of JSON into Cap'n Proto would result in messages that are much larger than JSON messages, because of all the pointers and padding and text field names.

I haven't looked into exactly how FlexBuffers work, but off the top of my head I suspect that leveraging FlatBuffers' "virtual table" technique probably helps here. In (normal, schema-ful) Cap'n Proto, fields within a struct have fixed offsets, meaning that unused fields still take space. As I understand it, FlatBuffers tries to avoid this by adding an extra layer of indirection: each struct has a sort of "virtual table" which stores the offsets of each field, where some fields might not be present at all. If multiple structs in a message happen to end up with the same virtual table, then the virtual table is only written once.

Totally speculating here since, again, I haven't actually looked at FlexBuffers, but if I were building something isomorphic to JSON on top of FlatBuffers, I'd probably look into extending the virtual tables to index fields by name rather than number. So if you have two structures with the same set of field names, they can share a virtual table, and those field names only have to appear once. That'd be a pretty great way to compress JSON.
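To make the shared-key-table idea above concrete, here is a toy Python sketch. It is a made-up in-memory model, not the real FlatBuffers vtable or FlexBuffers wire encoding; `encode_maps` and `lookup` are hypothetical names. Maps with the same set of field names point at one key table that is stored only once, and each map keeps just its values in key order.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple

KeyTable = Tuple[str, ...]
Record = Tuple[int, List[Any]]  # (index of shared key table, values in key order)

def encode_maps(objs: List[Dict[str, Any]]) -> Tuple[List[KeyTable], List[Record]]:
    """Hypothetical encoder: store each distinct set of field names only once."""
    tables: List[KeyTable] = []
    index: Dict[KeyTable, int] = {}
    records: List[Record] = []
    for obj in objs:
        keys = tuple(sorted(obj))  # sorted so lookups can binary-search
        if keys not in index:
            index[keys] = len(tables)
            tables.append(keys)
        records.append((index[keys], [obj[k] for k in keys]))
    return tables, records

def lookup(tables: List[KeyTable], record: Record, name: str) -> Optional[Any]:
    """Find a field by name through the record's shared key table; None if absent."""
    table_id, values = record
    keys = tables[table_id]
    i = bisect.bisect_left(keys, name)
    return values[i] if i < len(keys) and keys[i] == name else None

tables, recs = encode_maps([{"x": 1, "y": 2}, {"y": 5, "x": 3}, {"x": 0, "y": 0, "z": 9}])
# The first two maps share the ("x", "y") table; only the third needs a new one.
```

In this toy, two maps with the same field names cost one key table plus two value lists, which is the kind of saving the comment speculates about for JSON-like data.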

Back in Cap'n Proto, we don't have these vtables. For data with fixed schemas, my opinion is that these vtables seem like they require more bookkeeping than they are worth. But for dynamic schemas they seem like a much bigger win. So if you wanted to encode dynamic schemas layered on top of Cap'n Proto, you'd probably have to come up with some similar vtable thing yourself.

FlexBuffers are actually not built on top of the FlatBuffers encoding, they have their own special purpose encoding, which tries to be as compact as possible while still allowing in-place access (details, search for FlexBuffers here: https://google.github.io/flatbuffers/flatbuffers_internals.h...).

Funny you should say vtables may not be worth it.. I was of a similar opinion (why would you have many fields that are not in use??) until people showed me some of the Protobuf schemas in use at Google, with hundreds of fields, most unused. This is what pushed me in the direction of the vtable design.

Data always starts out neatly.. but the longer things live, the more this kind of flexibility pays off.

> some of the Protobuf schemas in use at Google, with hundreds of fields, most unused

Yeah, I was quite familiar with those back when I was at Google. But, IIRC, a lot of them were semantically unions, but weren't actually marked as such mostly because "oneof" was introduced relatively late in Protobuf's lifetime.

Agreed, most use cases are not this extreme. But I saw them as an "upper bound" on how people would stretch a serialization system. I didn't want to be the guy saying "640k should be enough for everyone".

No joke, I originally was arguing for 8-bit vtable entries because surely no-one ever needs more than 256 bytes worth of fields. Good thing my co-workers were smarter than me.

And yes, FlatBuffers has built-in unions from day 1, which was probably helpful.

Now that absl::Cord has been open-sourced, I'd expect a release of protobuf that takes advantage of it. It's up to you to use the API, though.

> if you supply a buffer that actually contains a float, or a string with numbers in it, it will convert it for you on the fly as well, or return 0 if it can't. If instead you actually want to know what is inside the buffer before you access it, you can call root.GetType() or root.IsInt() etc.

I've started to prefer functions that are explicit about their error cases and have interfaces that make it obvious about what errors you'd want to handle. Unexpectedly getting a zero when you get garbage data may cause issues.

Thinking about it, zero is often used as a success code: ERROR_SUCCESS is 0x0 on Windows. So garbage in, ERROR_SUCCESS out??

Something like:

    value, err = root.AsInt64();
makes it clear you have an error path. You can still ignore `err`, like when you are moving fast and breaking things or it's somehow certain to always succeed, but it's clear that there's an unhandled error path.

As you can see from the documentation, `AsInt64` is a convenience method that either says, I know this is an int, or make it so. There are also ways to check the type before you access, if you prefer. You can even check if it's unsigned, if you want that level of type-safety.

Right, but GP is suggesting that having to explicitly ignore an 'err' value would make the API user think twice about failure cases that could otherwise go unnoticed with the convenience API.

You are choosing to use a schema-less, dynamically typed representation (where a strongly typed alternative is directly available). I'd say, the convenience of being able to just say I am going to assume this is an int is fitting. If you wanted to be forced to error check (which needs an if-then in most languages), you might as well use the schema based, strongly typed version of the system instead, which would guarantee correct types for you, and requires no error checking.

But look at the encoding: they compact the int down to the number of bytes it needs for storage. There is no space to encode the type, and there is no way to distinguish an encoded uint from an int. You must layer your own schema on top.

I believe these are the docs that explain their type byte. That information is stored:


> A type byte is made up of 2 components (see flexbuffers.h for exact values):

> * 2 lower bits representing the bit-width of the child (8, 16, 32, 64). This is only used if the child is accessed over an offset, such as a child vector. It is ignored for inline types.

> * 6 bits representing the actual type (see flexbuffers.h).

> Thus, in this example 4 means 8 bit child (value 0, unused, since the value is in-line), type SL_INT (value 1).

Type aware methods like `.isInt()` wouldn't make sense if they couldn't determine the underlying type.
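The bit layout quoted above can be sketched in a few lines of Python. Only the layout (2-bit width code in the low bits, 6-bit type above it) and the `SL_INT = 1` example come from the quoted docs; `pack_type_byte`, `unpack_type_byte` and `min_width_code` are hypothetical helpers, and the last one only illustrates the "compact the int to the bytes it needs" idea from upthread.

```python
WIDTH_BITS = (8, 16, 32, 64)  # index = 2-bit width code, as in the quoted docs
SL_INT = 1                     # type value for a signed int, per the quoted example

def pack_type_byte(type_id: int, width_code: int) -> int:
    """Upper 6 bits: type; lower 2 bits: child bit-width code."""
    if not 0 <= type_id < 64 or not 0 <= width_code < 4:
        raise ValueError("type needs 6 bits, width code needs 2 bits")
    return (type_id << 2) | width_code

def unpack_type_byte(b: int) -> tuple:
    """Return (type_id, width_code) from a type byte."""
    return b >> 2, b & 0b11

def min_width_code(value: int) -> int:
    """Hypothetical helper: narrowest signed width code that can hold value."""
    for code, bits in enumerate(WIDTH_BITS):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return code
    raise OverflowError("value does not fit in 64 bits")
```

With this layout, `pack_type_byte(SL_INT, 0)` gives 4, matching the example in the quoted docs, so the type does survive even though the value itself is stored compactly.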

If you use C++17, you can use the built-in std::optional to solve this problem.

Strictly, not the same. One returns a value and an error. One returns a value or nothing.

Having the error can signal both if a value could not be retrieved, or if it could be retrieved, but was coerced, as an example.
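A rough Python sketch of the value-plus-error idea from this subthread, where the error channel also reports coercion. This is not the FlexBuffers API (which returns 0 on failure, per the quote above); `as_int64` and `Status` are hypothetical names, and the conversion rules are assumptions chosen for illustration.

```python
import math
from enum import Enum
from typing import Optional, Tuple

INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

class Status(Enum):
    OK = "ok"            # value was already an int
    COERCED = "coerced"  # value was converted from a float or numeric string
    FAILED = "failed"    # no sensible conversion; the value is None

def as_int64(value) -> Tuple[Optional[int], Status]:
    """Hypothetical accessor: returns (value, status) instead of a silent 0."""
    if isinstance(value, bool):
        return None, Status.FAILED
    if isinstance(value, int):
        result, status = value, Status.OK
    elif isinstance(value, float) and math.isfinite(value):
        result, status = int(value), Status.COERCED  # truncates toward zero
    elif isinstance(value, str):
        try:
            result, status = int(value.strip()), Status.COERCED
        except ValueError:
            return None, Status.FAILED
    else:
        return None, Status.FAILED
    if not INT64_MIN <= result <= INT64_MAX:
        return None, Status.FAILED
    return result, status
```

Unlike a plain optional, the caller can tell "no value" apart from "a value, but it was coerced", which is the distinction drawn in the comment above.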

https://google.github.io/flatbuffers/flatbuffers_white_paper... has the “why”, if anyone is curious.

I made this thing! AMA :)

When I searched HN for this, there were no results. Curious: why didn't you market it? It seems production-ready, or maybe that was planned for the future.

I'm glad to have found this; it's listed on https://serde.rs/, where it seems to be a recent addition. I was looking for an alternative to MessagePack.

"market" it? This is an open source project, with zero budget. I'd rather spend my time writing more code than trying to market anything, which seems hard in todays climate :) Good systems will eventually reach people organically, even if it takes longer that way.

Marketing is an umbrella term for spreading awareness. The fewer people who know about it, the less likely your product is to be tried out and then chosen. I don't think any recent serialization benchmarks include FlexBuffers.

There's different ways to make people aware of alternatives. I chose to spread word of this product on HN :)

Even that kind of "marketing" takes time :) But thanks for helping out.

Rather than "market", you could "promote" or "evangelize" instead...

Unlike MsgPack, FlexBuffers doesn’t support custom/extension types. Is this something that is planned for the future?

I searched for "msgpack extension types" and "msgpack custom types" and nothing conclusive came up, so you'll have to explain what you mean.

FlexBuffers is a schemaless format much like JSON, so is naturally extensible, and typeless, so I don't see how it would need any such thing.

Here's the link to the spec about extension types: https://github.com/msgpack/msgpack/blob/master/spec.md#exten...

From there:

MessagePack allows applications to define application-specific types using the Extension type. Extension type consists of an integer and a byte array where the integer represents a kind of types and the byte array represents data. Applications can assign 0 to 127 to store application-specific type information. An example usage is that application defines type = 0 as the application's unique type system, and stores name of a type and values of the type at the payload.
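For reference, here is a minimal Python encoder for the Extension format described in the quote, following the MessagePack spec's fixext and ext 8/16/32 layouts (marker byte, optional big-endian length, signed type byte, payload). `pack_ext` is a hypothetical helper name; real MessagePack libraries ship their own ext support.

```python
import struct

# fixext marker bytes keyed by exact payload length, per the MessagePack spec.
FIXEXT = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}

def pack_ext(type_code: int, data: bytes) -> bytes:
    """Encode a MessagePack Extension value (application types 0..127 only)."""
    if not 0 <= type_code <= 127:
        raise ValueError("application ext types are 0..127; negatives are reserved")
    t = struct.pack("b", type_code)  # the type is a signed 8-bit integer on the wire
    n = len(data)
    if n in FIXEXT:
        return bytes([FIXEXT[n]]) + t + data
    if n < 1 << 8:
        return b"\xc7" + struct.pack(">B", n) + t + data  # ext 8
    if n < 1 << 16:
        return b"\xc8" + struct.pack(">H", n) + t + data  # ext 16
    if n < 1 << 32:
        return b"\xc9" + struct.pack(">I", n) + t + data  # ext 32
    raise ValueError("payload too large for ext 32")
```

The type code lives in the same header as the length, so a decoder dispatches on it in one place, which is the "one type dispatching logic" benefit argued for below.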

Ah, thanks! I presume that is done for compactness, because I otherwise see no reason to pack an application type in with the built-in types.

FlexBuffers could support a similar scheme if we wanted to, though putting custom data in the format is already easy enough using the "blob" type, which is arbitrary bytes much like the MessagePack feature.

The nice thing with custom types is that you only need one type dispatching logic.

Sure, it works via stuffing the custom things in blobs, but then you have (1) an extra layer of indirection during dispatch, and (2) always waste a few bits or bytes to store the custom type information, even when you don't need it for actual blobs.

In principle, it's not a lot of effort to reserve a fraction of the type ID space for custom types. From a user's perspective, it's also much nicer to hear that a library cleanly uses existing extensions to the type system, rather than hijacking a general-purpose blob type.

If there's nothing that prevents it, feel free to take this as a feature request :-)

You'd also need a way to patch the custom (de-)serializer into your "dispatch", which currently there isn't.. as unlike MsgPack, FlexBuffers doesn't actually unpack anything.

So this feature would look like a new function IsCustom1() or AsCustom1() etc, where the latter would give you a pointer to the bytes to do your own thing.. so not much difference with Blobs.

Also note that because FlexBuffers provides O(1) access to elements, inline data (most scalars) is all the same size, whereas variable-sized data is stored over an offset. So again, it would not be very different from blobs.
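A sketch of why fixed-width inline elements give O(1) indexing: with every element the same width, element `i` sits at `header + i * width`. The layout here (1-byte width, 4-byte little-endian count, then the elements) is made up for illustration and is not FlexBuffers' actual layout; `build_int_vector` and `get` are hypothetical names.

```python
import struct
from typing import List

FMT = {1: "b", 2: "h", 4: "i", 8: "q"}  # struct codes for signed ints of each byte width
HEADER = struct.Struct("<BI")            # 1-byte element width + 4-byte element count

def build_int_vector(values: List[int]) -> bytes:
    # Pick the narrowest width that fits every element (StopIteration if > 64 bits).
    width = next(w for w in (1, 2, 4, 8)
                 if all(-(1 << (8 * w - 1)) <= v < (1 << (8 * w - 1)) for v in values))
    body = struct.pack("<%d%s" % (len(values), FMT[width]), *values)
    return HEADER.pack(width, len(values)) + body

def get(buf: bytes, i: int) -> int:
    width, count = HEADER.unpack_from(buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    # Every element has the same width, so element i starts at a computable offset.
    return struct.unpack_from("<" + FMT[width], buf, HEADER.size + i * width)[0]

buf = build_int_vector([1, -2, 300])
```

Because 300 does not fit in one byte, all three elements are widened to two bytes; that uniform width is what makes the offset arithmetic possible.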

I think it is called polymorphism, at least on msgpack libraries I used.

What do you think is the best buffer protocol to use for multiplayer games? We used Protobuf for a fast-paced .io game, but encoding/decoding turned out to be pretty slow and generated a lot of garbage in JS. We were in the process of switching to FlatBuffers (before the company went bankrupt), but the syntax made it feel harder to use than Protobuf. Not sure about the performance, though (we expected it to be faster and to create less garbage because of the zero-copy design).

So, would you recommend Protobuf, Cap'n'Proto, FlatBuffers or FlexBuffers for multiplayer games? The usual packets are game states or user input sent at high frequency.

I originally designed FlatBuffers for games (though admittedly more for things like level data or save game data than network packets), so I'd think it is pretty suitable. I had actually used Protobuf on a game project just before, and its performance problems led directly to the no-unpacking no-allocation design that FlatBuffers has.

So FlatBuffers will make an incoming packet waaay faster to work with than Protobuf. On the downside, Protobuf tends to be a little smaller, so if bandwidth is a greater concern than (de-)serialization speed, you might still prefer it. Additionally, receiving data over the network raises the question of how you handle packets that have been corrupted (or intentionally malformed by an attacker), and in the case of FlatBuffers you'd need to at least run the "verifier" over the packet before accessing it if you don't want your game servers to crash when this happens. That slows it down a little bit, but it is still fast, i.e. it still doesn't allocate etc.
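To illustrate what a verifier buys you, here is a toy Python check for a made-up layout where a 4-byte offset points at a length-prefixed string. It only shows the core idea (check that every offset and length stays in bounds before dereferencing); the real FlatBuffers verifier checks far more, and `verify_string_ref`/`read_string_ref` are hypothetical names. In Python a bad offset would merely raise an exception, but in C++ the same unchecked read would go out of bounds, which is what the verifier prevents.

```python
import struct

U32 = struct.Struct("<I")

def verify_string_ref(buf: bytes, pos: int) -> bool:
    """Toy verifier: the u32 offset at pos must point to an in-bounds length-prefixed string."""
    if pos < 0 or pos + U32.size > len(buf):
        return False
    (target,) = U32.unpack_from(buf, pos)
    if target + U32.size > len(buf):
        return False
    (length,) = U32.unpack_from(buf, target)
    return target + U32.size + length <= len(buf)

def read_string_ref(buf: bytes, pos: int) -> bytes:
    """Unchecked accessor; only call it after verify_string_ref succeeded."""
    (target,) = U32.unpack_from(buf, pos)
    (length,) = U32.unpack_from(buf, target)
    start = target + U32.size
    return buf[start:start + length]

good = U32.pack(4) + U32.pack(2) + b"hi"
bad_length = U32.pack(4) + U32.pack(100) + b"hi"  # claims 100 bytes, only 2 present
bad_offset = U32.pack(255)                        # points past the end of the buffer
```

The verification pass reads each offset once and allocates nothing, which is consistent with the comment's point that it costs a little but stays fast.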

Cap'n Proto will perform similarly, though does have the downside that all fields take space on the wire, regardless of whether they're set or not. So which is better depends on the kind of data you want to send and how it is likely to evolve.

Frankly, for the absolute highest performance (and lowest bandwidth) game networking you still need a custom encoding.

> Frankly, for the absolute highest performance (and lowest bandwidth) game networking you still need a custom encoding

This is really the truth here. The efficiency of a networking protocol for a multiplayer game is somewhat sensitive to the context; Are you trying to do a fast-paced game with lots of client prediction? Do you need to have guaranteed delivery? Do you expect your game to be used with highly-lossy networks, or mostly from stable connections?

It's not uncommon to find something like FlatBuffers or FlexBuffers available in a multiplayer game engine, but high-performance systems like movement or AI will probably use a custom protocol better suited to their task.

My experience with capn proto was a mixed bag.

Quite apart from its neglect for years while Kenton was working on sandstorm.io, I found the low-level access interfaces needed to extract the best performance to be fiddly and often reliant on intimate knowledge of the internals.

As is often the case, maximum performance for your use case requires more intimate detail.

I can't remember, but some of these zero-copy protocols may still require byte-order swapping on mixed-endian systems. (But cache access times probably dwarf the byte swapping.)

There is a comment upthread from people at Google claiming FlatBuffers aren't faster than Protobuf in practice. That surprises me a lot, as I was interested in FlatBuffers for performance gains. Do you have an opinion?

I've worked with people internally, and it is indeed a challenge to speed up services with FlatBuffers. This is because 99.9% of internal communications flow over Protobuf, and many such services are simply: receive Protobuf, modify a few things, send it on the next service. So to deploy FlatBuffers there, you usually have to translate from Protobuf to FlatBuffers, do some high intensity stretch of communication in FlatBuffers only, then convert back. Given the cost of conversions, it would require replacing fairly large amounts of infrastructure for it to be a clear win, and given that those conversions have to be written entirely by hand, that is often not feasible. So translation of what the other comment said: "we gave up".

FlatBuffers is used to great effect, but often with new teams building new things, that are not yet shackled in Protobuf everywhere, like many of the AI related teams.

Ok, now that gives a very different picture from what the original comment explained. Of course, protobuf to flatbuffer is slower than just protobuf...

Thanks a lot for your time (and work!). FYI, I'm planning on using FlatBuffers for communication between a mobile app's native code and a cross-platform library, hoping to save on data decoding time. A bit similar to what the Xi editor is doing. If you have any advice, it's welcome! :D

My tip: if you use it for networked communication, use the verifier on the C++ side by default.

As i understood it, this is not about being faster, but about being more memory efficient.

Zero-copy supposedly means you save on memory allocation when reading. You also skip the decoding step, so my guess would have been that in practice you gain on both memory consumption and CPU.

I'm looking to replace our current Boost.Serialization code with something else. I have the following requirements:

- serialisation cannot be intrusive (i.e. I don't want a class definition to be generated from a schema file)

- zero copy

- some sort of versioning, so that an accidental message version mismatch between sender and receiver doesn't cause a crash / undefined behaviour.

- support for many languages

Protobufs are intrusive, so that rules them out. JSON is just too verbose and slow. MessagePack looked like the most promising path to pursue at the moment. Flatbuffers look quite similar.

My predecessor employed Smile in all our m2m communication. As far as I understand it, it aims to be a binary representation of JSON. I can't speak for its speed, but I trust him to a certain extent.

There's a lot of C++ around this FlexBuffers thing, so I'm not sure how relevant it is outside of C++. Any idea?

We use FlatBuffers (the parent project) in our code @ FS... that means it has to work in C++, Rust, Java, and TypeScript. It's mostly the schema processing that requires a C++ compiler to make it work.

It's a pretty stellar cross-platform serialization toolkit and the zero-copy support for reading is no joke.

We've been using it across multiple languages too as the data serialization format between server and (web) client. It's fantastic.

FlexBuffers currently is implemented in C++, Java, Rust, Dart, Swift, (and JS under review).

You forgot Python and C# :). Plus a Unity3D specific version, which is battle tested in a game with more than half a million downloads ;).

We use FlexBuffers to serialize arbitrary Objective-C objects through reflection (rather than NSCoding). It has been faster and stable for us for a few years now (we implemented it in early 2019).

The codegen for flatbuffers is fairly extensive, but C++ is still the blessed language for gRPC, however [0].

[0] https://google.github.io/flatbuffers/flatbuffers_grpc_guide_...

gRPC with FlatBuffers is supported in quite a few languages besides C++: Java, Go, Python, Swift, off the top of my head..

FlatBuffers libraries are implemented in a big list of languages, as you can see on the website. FlexBuffers is not implemented in all the same languages yet, at least in the official Google repo. Otherwise, it is very relevant in any language--it's a binary serialization format.

If you need FlexBuffers support for a language, please create an issue on the GitHub repo. I've already ported it to Swift, C#, Dart, Python and JS (in review). If I'm familiar with the language you need, it will take me just a couple of days to port it.

It is basically a code template generator. It is written in C++ but it can generate code for other languages.

It is one of the best serialisation methods currently available.

AFAIK some languages can use C++ fairly easily, including D and Rust

These languages can use C fairly easily. To use C++, you must wrap the C++ interface in `extern "C"` interfaces. It's very hard to use the full spectrum of C++ features.

Consider that you're writing a custom struct/data type called MyStruct in your language. Now, can you use std::vector<MyStruct>? This requires your other language to read the entire vector header and instantiate a new type based on both your definition of your custom type and the vector template. The vector instantiation will ask questions like whether or not your type is copy constructible and move constructible, and if so, use these implementations. Designing this into your other language requires a lot of compromises.

And that's just templates. ADL poses an even bigger difficulty. Again when you write your custom type in your language and define a swap function for it, will C++ standard algorithms that need swapping find your implementation via ADL? Recall that swapping is invoked via having `using std::swap;` and then invoking the bare function `swap` which triggers ADL. Again, designing this into your other language requires a lot of compromises.

The only language that I know of that comes close to having full interoperability with C++ is Objective-C, and it does so by creating a new Frankenstein language called Objective-C++. It has two different kinds of classes, two different exception mechanisms, etc.

Wrong, D has support for C++ FFI.


It supports neither templates nor ADL, the two of my examples.

In fact the doc says,

> Being 100% compatible with C++ means more or less adding a fully functional C++ compiler front end to D. Anecdotal evidence suggests that writing such is a minimum of a 10 man-year project, essentially making a D compiler with such capability unimplementable.

It then goes on to describe the pragmatic approach that doesn't allow full compatibility:

> D takes a pragmatic approach that assumes a couple modest accommodations can solve a significant chunk of the problem:

> matching C++ name mangling conventions

> matching C++ function calling conventions

> matching C++ virtual function table layout for single inheritance

Can you see how limited this subset of interoperable API is?

> matching C++ name mangling conventions

> matching C++ function calling conventions

> matching C++ virtual function table layout for single inheritance

Basically COM? Which is not surprising as it was designed as an OO native FFI.

And UWP improves on it, giving more C++ features across the ABI, basically what .NET v1.0 should have been all along.

I didn't state it supports 100% compatibility, and it is still better than being stuck with an OS ABI from the 70s, that is only meaningful when the underlying OS is written in C anyway.

What does zero-copy mean in this context?

Normally, zero-copy refers to the fact that data can be decoded without copying it in memory. I.e. a pointer cast is zero-copy because there's no need to copy the memory elsewhere; that memory can simply be reinterpreted as the target type.

I'm not sure if that's the same meaning in this context, however.
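A rough Python analogue of the zero-copy sense described above: `memoryview` slices and `struct.unpack_from` read from the original buffer at an offset instead of copying bytes out first. This illustrates the concept only; it is not how FlatBuffers or FlexBuffers are implemented.

```python
import struct

buf = bytearray(struct.pack("<iI", -7, 42) + b"payload")

# Read the two header ints straight from the buffer at offset 0, no sub-buffer copy.
a, b = struct.unpack_from("<iI", buf, 0)

copied = bytes(buf[8:])     # copying: a new bytes object with its own storage
view = memoryview(buf)[8:]  # zero-copy: shares the original buffer's memory

buf[8] = ord("P")           # mutate the original buffer in place
```

After the mutation, `view` sees `b"Payload"` while `copied` still holds `b"payload"`, which shows that only the view aliases the original memory.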

I wonder how it compares to messagepack's zero copy serializer: https://blog.treasuredata.com/blog/2011/11/21/messagepack-th... ?

I wish Google open sourced RecordIO instead (or in addition). People reinvent this particular bicycle, poorly, pretty much in every project where engineers are smart enough to introduce a _structured_ application log.

It looks like https://github.com/google/riegeli might be what you're looking for? (from a search of "RecordIO")

Far too complicated for what it's supposed to do. Google's RecordIO basically just concatenates messages together, with a few additional provisions for compression and handling small messages. In contrast, Riegeli depends on three different compressors and Abseil, and its file format specification is quite elaborate: https://github.com/google/riegeli/blob/master/doc/riegeli_re.... This could explain its lack of market penetration (only 221 stars at the time of this writing). The authors apparently don't care that it's not popular, since they don't even provide a code example on how to use it. It also apparently uses HighwayHash for integrity checking, which is overkill - such things should use crc32c, for which most modern CPUs have a dedicated, fast instruction.
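Since the comment describes RecordIO as essentially concatenated messages plus integrity checks, here is a hypothetical Python sketch of that kind of framing: each record is a little-endian u32 length, a u32 CRC-32C of the payload, then the payload. This is not Google's actual RecordIO format, and it omits compression. The CRC-32C is computed bit by bit for portability; the speed the comment refers to comes from hardware CRC instructions, which this sketch does not use.

```python
import struct
from typing import Iterator, List

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78); slow but dependency-free."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def write_records(records: List[bytes]) -> bytes:
    out = bytearray()
    for rec in records:
        out += struct.pack("<II", len(rec), crc32c(rec))  # hypothetical 8-byte header
        out += rec
    return bytes(out)

def read_records(data: bytes) -> Iterator[bytes]:
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated header")
        length, crc = struct.unpack_from("<II", data, pos)
        start, end = pos + 8, pos + 8 + length
        if end > len(data):
            raise ValueError("truncated record")
        payload = data[start:end]
        if crc32c(payload) != crc:
            raise ValueError("checksum mismatch")
        yield payload
        pos = end
```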

TensorFlow's TFRecord is RecordIO.

So, is this basically Android's Bundle semantics made standard and cross-platform? Are there differences implementation-wise?

I'd rather use MessagePack. Clean, simple, small.

It has worse performance though. Not zero-copy.

It depends what you mean by zero-copy. It can't be traversed without parsing the structure, but strings and binary blobs can be used in-place without copying. (This is also possible with JSON even, for example with RapidJSON's in-situ parser.)

This has worse performance for decoding, but far better performance for encoding. Encoding any traversible format is guaranteed to be expensive because you have to encode it inside-out, calculating the nested sizes of everything as you go. For messages that are only encoded once and read once, for example network messages, a format that is traversible in-place is a poor choice.
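To illustrate the inside-out point, here is a hypothetical Python encoder for a traversable format: each list carries a byte-size prefix so a reader can skip over it without parsing, which means every child has to be encoded before its parent's header can be written. `encode` and `skip` are made-up names, and the layout is not any real format's.

```python
import struct
from typing import List, Union

Value = Union[int, List["Value"]]

def encode(value: Value) -> bytes:
    if isinstance(value, int):
        return b"i" + struct.pack("<q", value)  # tag + 8-byte signed int
    if isinstance(value, list):
        body = b"".join(encode(v) for v in value)  # children must be encoded first...
        return b"l" + struct.pack("<I", len(body)) + body  # ...before the size is known
    raise TypeError(type(value).__name__)

def skip(buf: bytes, pos: int) -> int:
    """Return the position just past the value at pos, without decoding its contents."""
    tag = buf[pos:pos + 1]
    if tag == b"i":
        return pos + 1 + 8
    if tag == b"l":
        (size,) = struct.unpack_from("<I", buf, pos + 1)
        return pos + 1 + 4 + size
    raise ValueError("unknown tag at %d" % pos)

buf = encode([1, [2, 3], 4])
```

A count-prefixed format like MessagePack (arrays store an element count, not a byte size) can instead be written in a single forward pass, but a reader then has to parse every element to skip the array.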

That's too general. Sometimes there's [almost] nothing to calculate, so no slowdown. Sometimes the traversible format can be used as your working data structure and then encoding is a no-op.

Java example is very weird.
