
Flatbuffers by Google – CapnProto alternative - halayli
https://github.com/google/flatbuffers
======
kentonv
Detailed (though perhaps biased) comparison I wrote up a few months ago:

[http://kentonv.github.io/capnproto/news/2014-06-17-capnproto...](http://kentonv.github.io/capnproto/news/2014-06-17-capnproto-flatbuffers-sbe.html)

~~~
tbastos
Even though capnproto would be our first choice, the lack of support for
Windows/CMake is kind of a party killer. FlatBuffers doesn't offer everything
we need either, but its codebase is simpler to grasp and hack, so it may end
up being the safer choice... which is unfortunate

~~~
kentonv
I'm trying to get basic MSVC support into version 0.5.0, which is planned for
release in late November. The reflection and dynamic APIs probably won't be
supported initially because too much would have to be rewritten to work around
missing C++11 features in MSVC -- they'll come online as soon as MSVC adds
support. But, most common use cases don't need them anyway.

0.5.0 will also feature cmake support (this is already in git).

------
kevinbowman
Wow: "For applications on Google Play that integrate this tool, usage is
tracked" without even an option to disable that. Sure, it's open source so can
be changed by editing the source code, but does anyone else find that kinda
creepy?

e.g. if I make an app using 10 FOSS libraries, then I wouldn't want my app
reporting everything the user is doing to 10 different places.

Also, on the actual homepage for it
([http://google.github.io/flatbuffers](http://google.github.io/flatbuffers)),
the only mention of this call-home feature is buried at the bottom of the
"building" page.

[EDIT] This is incorrect; see below comments about this tracking not being a
call-home feature but instead just Google scanning apps submitted to the Play
Store

~~~
eridius
I'm a bit confused. Is it actually calling home? That seems kind of unlikely,
given that a) that seems pretty egregious for a library like this, and b) they
said this doesn't affect the application at all beyond consuming "a few
bytes".

Could it instead be just that Google scans Google Play apps for a string in
the binary that matches the Flatbuffers version string format? That seems more
likely given what the README does say about this. And it also seems more
useful in general; Google would benefit more from knowing how many
applications use the library than knowing how popular these applications are.

~~~
subim
> Could it instead be just that Google scans Google Play apps for a string in
> the binary that matches the Flatbuffers version string format?

Of course that's what's happening:
[https://github.com/google/flatbuffers/blob/master/include/fl...](https://github.com/google/flatbuffers/blob/master/include/flatbuffers/flatbuffers.h#L977-994)

------
sandGorgon
Again, taking this from a previous conversation on this topic -
[https://news.ycombinator.com/item?id=7904443](https://news.ycombinator.com/item?id=7904443)
\- it seems CapnProto and Flatbuffers are much faster in C++, Go and Rust...
the benchmarks may be very different in Javascript, Python, Ruby, etc.

It would be really interesting (and possibly more relevant for HN) to have
benchmarks based on one dynamic language - say Python.

Oh and @kentonv - I'm not a native speaker of American English (rest of the
world really). I really, really have trouble pronouncing Cap'n Proto. It's
even more difficult to pronounce in a meeting and have people recall/Google
it.

~~~
kentonv
To be clear, the thing that you'd think would be a problem in dynamic
languages -- lack of pointer arithmetic -- actually isn't a problem. Every
language has a way to extract values from a byte string, e.g. the `struct`
module in Python, TypedArrays in Javascript, ByteBuffer in Java, etc.

The real problem in dynamic languages is that they tend to be worse at
inlining accessor functions. This is not really because inlining is impossible
-- v8 can do it -- but because most dynamic languages don't prioritize
performance in the first place and so haven't implemented such optimizations.
This is actually a problem in Go as well, weirdly. Because of this, if you
actually intend to consume most of the content of a message, it may make sense
to parse it into a language-native data structure up front so that access
doesn't need to go through accessor functions. Most Cap'n Proto
implementations support this. Doing this will still be much faster than using
Protobufs because the Cap'n Proto format is naturally faster to decode.
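
For instance, a minimal sketch of that kind of byte-string extraction with
Python's `struct` module (the message layout here is hypothetical, not
Cap'n Proto's actual encoding):

```python
import struct

# Hypothetical layout: a little-endian 32-bit id at offset 0,
# 4 bytes of padding, then a 64-bit float at offset 8.
buf = struct.pack("<Ixxxxd", 42, 2.5)

def read_id(buf):
    # unpack_from decodes in C; no byte-by-byte Python loop needed
    return struct.unpack_from("<I", buf, 0)[0]

def read_score(buf):
    return struct.unpack_from("<d", buf, 8)[0]
```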

As David says, "Cap'n" should be pronounced like "happen", though pronouncing
it as "captain" is OK as well (and will still get people to the right place if
they Google it).

~~~
sandGorgon
@kentonv - you misunderstand. I do know that all of this can be implemented in
dynamic languages. The question is whether the benchmarks there will be
significantly different than benchmarks on languages with direct unsafe memory
access.

I don't know the answer, ergo the question.

~~~
kentonv
Hmm, I thought that was what I was answering. Maybe I'm still
misunderstanding. You're asking if Cap'n Proto's advantage over something like
Protobufs will be less pronounced in a dynamic language compared to C++? Yes,
that is likely the case, due to one or both of the inlining issue and the
language's general slowness dwarfing any gains from the encoding library.

Of course, in cases where Cap'n Proto has a more-than-constant speedup, such
as reading a single field from a large message (O(1) in Cap'n Proto, O(n) in
Protobufs), then the difference will still be huge regardless of language.
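
As a toy illustration of that asymptotic difference (this mimics neither
library's real wire format):

```python
import struct

# Sequential, Protobuf-like framing: each field is
# (1-byte tag, 1-byte length, payload), so finding field k
# means walking past every field before it -- O(n).
def read_field_sequential(buf, wanted_tag):
    pos = 0
    while pos < len(buf):
        tag, length = buf[pos], buf[pos + 1]
        if tag == wanted_tag:
            return buf[pos + 2:pos + 2 + length]
        pos += 2 + length
    raise KeyError(wanted_tag)

# Fixed-offset, Cap'n-Proto-like access: the schema already says
# where the field lives, so one unpack suffices -- O(1).
def read_field_at(buf, offset):
    return struct.unpack_from("<I", buf, offset)[0]
```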

If you're looking for specific benchmark numbers, I don't have any handy,
sorry. (But benchmarks can be manipulated to show any result, so you shouldn't
trust any author-provided numbers anyway.)

~~~
sandGorgon
@kentonv - yes that is what I was asking, and thank you for answering. I
think one aspect of my question got lost in the noise, and that is my
fault. On Python, protobuf vs capnproto is not apples to apples, since the
former is pure Python... while yours is a Python wrapper over C. I have
read your justifications [1] and I agree with you. But do note that there
are some large use cases for Python in desktop software. C extensions turn
out to be blockers in those cases. In many ways, I was hoping that you
would have a pure-Python version as well (since you did build one at
Google) which sacrifices speed for compatibility.

I was thinking in that context.

[1]
[http://kentonv.github.io/capnproto/news/2013-09-04-capnproto...](http://kentonv.github.io/capnproto/news/2013-09-04-capnproto-0.3-python-tools-features.html)

~~~
kentonv
It would be great if someone were to contribute a pure-Python implementation,
but it's unlikely the sandstorm.io team will work on this since it has no real
use to us.

I actually think it's likely that a pure-Python version of Cap'n Proto would
be significantly faster than the pure-Python protobuf implementation. Parsing
Protobufs in Python is really horrible performance-wise since you have to
inspect and branch on almost every byte. The way to make Python fast is to
delegate as much work as possible to the built-in libraries that are written
in C. But, there's just nothing that can be delegated in the case of
Protobufs. In contrast, a Cap'n Proto parser could pretty easily leverage the
existing `struct` module.
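
To make the contrast concrete, here is a sketch of the two decoding styles
(simplified; neither is the real wire format of either library):

```python
import struct

# Protobuf-style varint: Python must inspect and branch on every byte.
def read_varint(buf, pos):
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

# Cap'n-Proto-style fixed-width words: one C-level struct call
# pulls out an entire array of 64-bit values at once.
def read_words(buf, count):
    return struct.unpack_from("<%dQ" % count, buf, 0)
```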

That said, if you enable Cap'n Proto's "packed" mode, then this advantage is
lost, since that's another byte-by-byte algorithm that will perform poorly in
pure Python.

------
userbinator
I looked at their implementation at
[http://google.github.io/flatbuffers/md__internals.html](http://google.github.io/flatbuffers/md__internals.html)
and found this rather confusing paragraph:

 _Strings are simply a vector of bytes, and are always null-terminated.
Vectors are stored as contiguous aligned scalar elements prefixed by a 32bit
element count (not including any null termination)._

So... does the count include the null terminator byte or not?

~~~
zeroxfe
That sounds like an unambiguous no to me (null-terminator not included in
count.)

~~~
robert_tweed
I believe you are correct, mainly because the basic purpose of the protocol
is "no parsing", so it must work when loaded directly into RAM.

The spec is a bit confusing, though, because of the statement that "Strings
are simply a vector of bytes". The way I understand it is that a string is
a vector FOLLOWED BY a null terminator. The spec should probably say that
rather than the current wording.

This would appear to be necessary so that a string can be treated either as
a vector like any other (with the correct number of elements) or accessed
directly through a (char *) pointer without things going awry.

Disclaimer: I haven't read the whole spec yet; this is my
off-the-top-of-my-head interpretation and I may have misunderstood it
completely.
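
For what it's worth, that reading (count excludes the trailing NUL) can be
sketched like so -- ignoring FlatBuffers' alignment and padding rules, so
this is an illustration, not a conformant encoder:

```python
import struct

def encode_fb_string(s):
    data = s.encode("utf-8")
    # 32-bit little-endian element count, NOT counting the
    # null terminator that follows the bytes
    return struct.pack("<I", len(data)) + data + b"\x00"

def decode_fb_string(buf, offset=0):
    (n,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4:offset + 4 + n].decode("utf-8")
```

The bytes after the length prefix can also be handed to C code as a
NUL-terminated char pointer, which is presumably the point of the extra
byte.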

~~~
mockery
I'd expect that strings are just a vector of bytes, with strlen()+1 bytes, the
last of which is always null.

------
jhallenworld
I used the C preprocessor as the schema compiler in my serialization library:
[https://github.com/jhallen/joes-sandbox/tree/master/lib/sdu](https://github.com/jhallen/joes-sandbox/tree/master/lib/sdu)

~~~
ultimape
Cool. I'd love to know more about what makes your system awesome - it is a
very creative idea! Have you thought about creating a DTD out of the
schema, or vice versa? Having a DTD to validate the file against would
allow for some serious robustness in hot-loading stuff from the web.

I think the big draw for the flatbuffer system is that it can stream data
in with a low memory footprint.

------
desdiv
Related discussion 130 days ago:
[https://news.ycombinator.com/item?id=7901991](https://news.ycombinator.com/item?id=7901991)

------
WhitneyLand
So when did it become acceptable to leave tracking code turned on by default
(opt out) in open source repos?

~~~
maxerickson
Are there rules attached to the plurality of (so-called) open source
licenses, or even OSI-approved licenses? Not really.

Are you finding this particular tracking code acceptable? Apparently not.

So there was never any coherent whole that could have found something
unacceptable to begin with, and in the end there are still disparate parts
that continue to find it unacceptable.

I guess this is an obvious and tiresome answer, but I'm not sure what else you
would expect anyone to say.

~~~
WhitneyLand
Just because there are no rules against something does not make it acceptable
or good.

Just because I review code doesn't mean I accept or use it, so your
assumption is wrong.

There should be nothing tiresome about calling for discussion of evolving
trends in software or anything else.

If you see nothing negative about this practice then speak out constructively.

~~~
maxerickson
I assumed that you did _not_ find the tracker here acceptable.

What I find tiresome is insisting that the use of some license or the other is
a statement of values (your phrasing also implies that history clearly agrees
with you, which I tend to find tiresome).

If the readme and other materials made repeated attempts to invoke some set
of values and the source was contrary to that, fine, you have a valid
gripe; but the readme doesn't mention the license, and the homepage (
[http://google.github.io/flatbuffers/](http://google.github.io/flatbuffers/) )
keeps it to "It is available as open source under the Apache license, v2
(see LICENSE.txt)."

