

Capnproto-rust vs. C++ - steveklabnik
http://dwrensha.github.io/capnproto-rust/2013/11/16/benchmark.html

======
pcwalton
As I said earlier on the mailing list, I suspect (without looking into it)
that the cause of the I/O performance slowdown relative to C++ is something
related to buffering—perhaps the I/O is not being buffered, or the buffer
isn't functioning right. This would be consistent with the serialization-based
I/O leading to larger slowdowns, because there would then be more calls to
write(2). If so, then this should be fixable.

I'm pleasantly surprised to see the object mode slightly faster than C++.

~~~
renshaw
I should note that the Rust version currently omits some features of the C++
version, such as read-limiting and the actual counting of the throughput.
These things are cheap, but they may explain why Rust is faster in that one
case.

~~~
pcwalton
We did some benchmarking on IRC today in light of this post and we found that
Rust's stdio is currently quite slow due to a flag not being set properly in
libuv which causes it to punt to a thread pool. There is currently a fix in
the queue:
[https://github.com/mozilla/rust/pull/10558](https://github.com/mozilla/rust/pull/10558)

~~~
srean
In case you get some time would appreciate a blog post or a comment describing
what the issue was on the rust side as well as on the libuv side. Upvoted in
advance.

------
eonil
I had thought that all the serialization formats such as Protobuf, Thrift,
BSON, or MessagePack compress as much as possible, because reducing the amount
of I/O is the ultimate win for overall performance, rather than fast
computation via memory alignment.

When I see Cap'n Proto, I'm confused about which approach is right. Alignment
is not an exotic technique, so why didn't the other formats align their data,
if there's no reason to save I/O?

Or am I totally misunderstanding these implementations?

~~~
kentonv
Well, it depends on the environment. If you are doing interprocess
communication, then I/O bandwidth is obviously not a concern at all. On the
other hand, over the internet, it clearly is the biggest concern. For intra-
datacenter traffic on a 10Gbit NIC, it's harder to say, but it _probably_
isn't the bottleneck. Cap'n Proto supports both cases by making additional
packing optional, so you can choose the best trade-off for your application.
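The optional packing kentonv mentions exploits the fact that aligned messages are full of zero bytes. This is a simplified sketch of the idea, not the actual wire format: each 8-byte word gets a tag byte whose bits mark its nonzero bytes, and the zero bytes are dropped. (The real Cap'n Proto scheme adds run-length special cases for all-zero and all-nonzero words, omitted here.)

```rust
/// Simplified Cap'n Proto-style packing: emit one tag byte per 8-byte
/// word, with bit i set iff byte i is nonzero, followed by only the
/// nonzero bytes.
fn pack(words: &[[u8; 8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for word in words {
        let mut tag = 0u8;
        let mut nonzero = Vec::new();
        for (i, &b) in word.iter().enumerate() {
            if b != 0 {
                tag |= 1 << i;
                nonzero.push(b);
            }
        }
        out.push(tag);
        out.extend(nonzero);
    }
    out
}

fn main() {
    // A word holding the little-endian u64 value 5 is mostly zeros.
    let words = [[5, 0, 0, 0, 0, 0, 0, 0], [1, 0, 2, 0, 3, 0, 4, 0]];
    let packed = pack(&words);
    // 16 bytes of aligned data shrink to 2 tag bytes + 5 payload bytes.
    assert_eq!(packed, vec![0b0000_0001, 5, 0b0101_0101, 1, 2, 3, 4]);
    println!("{} -> {} bytes", words.len() * 8, packed.len());
}
```

Because packing is a cheap byte-level pass over an already-built message, it can be switched on for network traffic and skipped for shared memory, which is the trade-off being described.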

Regarding the other formats you mention, I think you may be imagining that the
designers of these protocols thought more carefully about them than they
really did. Protobuf, for example, was designed pretty ad-hoc to solve an
immediate problem in Google's search infrastructure, and then stuck mostly
because as more and more things used it, it was easier to keep using it than
start over. The designers readily acknowledge that it is not an ideal format
-- in fact, there are other ways they could have done the encoding which would
have taken no more space but would have saved significant CPU time.
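The space-vs-CPU tension is concrete in protobuf's varint encoding, which stores integers in 7-bit groups with a continuation bit. A small value takes one byte instead of eight, but decoding needs a data-dependent branch per byte rather than a single fixed-width read. A minimal sketch:

```rust
/// Protobuf-style varint (LEB128): 7 payload bits per byte, high bit
/// set on every byte except the last.
fn encode_varint(mut v: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte); // last byte: continuation bit clear
            break;
        }
        out.push(byte | 0x80); // more bytes follow
    }
    out
}

fn decode_varint(bytes: &[u8]) -> u64 {
    let mut v = 0u64;
    for (i, &b) in bytes.iter().enumerate() {
        v |= ((b & 0x7f) as u64) << (7 * i); // branch per byte
        if b & 0x80 == 0 {
            break;
        }
    }
    v
}

fn main() {
    assert_eq!(encode_varint(1), vec![0x01]);         // 1 byte vs 8 fixed
    assert_eq!(encode_varint(300), vec![0xac, 0x02]); // 2 bytes
    assert_eq!(decode_varint(&encode_varint(300)), 300);
}
```

Cap'n Proto's answer is to keep integers fixed-width and aligned, then recover most of the space with the optional packing pass described above, shifting the cost from every field access to a single optional pass.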

(Disclosure: I was the maintainer of protobufs for a long time, though not the
original creator. I am also the author of Cap'n Proto.)

------
zik
It'd be nice to see some description of which compiler, version and flags are
used in each case.

~~~
renshaw
Author here. The makefiles I'm using are on github:

[https://github.com/dwrensha/capnproto-rust/blob/master/Makef...](https://github.com/dwrensha/capnproto-rust/blob/master/Makefile)

[https://github.com/dwrensha/capnproto/blob/benchmark/c%2B%2B...](https://github.com/dwrensha/capnproto/blob/benchmark/c%2B%2B/src/capnp/benchmark/Makefile)

I compiled libcapnp with the latest Clang that ships with XCode. It would
perhaps be fairer to also compile the C++ benchmarks with Clang, instead of
the MacPorts gcc 4.8 that I'm using, but unfortunately Clang barfs on some
template hackery in the benchmark driver.

~~~
Game_Ender
That would be quite useful. Otherwise it's very hard to tease apart the
differences between the optimization back ends of GCC and LLVM and the Rust
and C++ front ends. I feel like I've seen benchmarks showing speed differences
of around 5-20% between LLVM and GCC, so it will have an effect.

