
Cap'n Proto v0.3: Python support, better tools, other improvements - kentonv
http://kentonv.github.io/capnproto/news/2013-09-04-capnproto-0.3-python-tools-features.html
======
srollyson
Hi, Kenton.

First: thanks for your work on Protocol Buffers. I've used it fairly
extensively for RPC communications between C++/Java clients and a Java
service. It made things so much easier to get native objects in each language
using a well-defined protocol.

One thing that bugged me about Protobuf is that it provided a skeletal
mechanism for RPC (e.g. RpcController/RpcChannel) but later deprecated the use
of that mechanism in favor of code-generating plugins. Since Cap'n Proto is
billed as an "RPC system", do you have plans to include a more fleshed-out
version of RPC than was provided in Protocol Buffers? Having abstract classes
for event-handling and transport mechanisms is a good idea for extensibility
but it sure would make it easier for your users if there was at least one
default implementation of each.

I imagine that Google has standard implementations of these things internally
but balked at trying to support them for multiple languages as an open source
project.

~~~
kentonv
Yes, in fact, the next release of Cap'n Proto (v0.4) is slated to include RPC
support. There are some hints on what it might look like in the docs already:

[http://kentonv.github.io/capnproto/rpc.html](http://kentonv.github.io/capnproto/rpc.html)
[http://kentonv.github.io/capnproto/language.html#interfaces](http://kentonv.github.io/capnproto/language.html#interfaces)

The reason Google never released an RPC system together with protobufs is
that Google's RPC implementation simply had too many dependencies on other
Google infrastructure, and wasn't appropriate for use outside of Google
datacenters. There were a few attempts to untangle the mess and produce
something that could be released, but it never happened.

The public release had support for generating generic stubs, as you mentioned,
but it was later decided that these stubs were actually a poor basis for
implementing an RPC system. In their attempt to be generic, their interface
ended up being rather awkward. We later decided that it made more sense to
support code generator plugins, so that someone implementing an RPC system
could provide a plugin that generates code ideal for that particular system.
The generic interfaces were then deprecated.

Cap'n Proto also supports code generation plugins. But, as I said, we will
soon have an "official" RPC layer as well -- and it will hopefully be
somewhat pluggable itself, so that you can use a different underlying transport
with the same generated interface code. Anyway, this will all become clearer
with the next release, so stay tuned!

~~~
srollyson
I'm not going to lie; it took me a little while to wrap my head around those
stubs before implementing a TCP transport and semaphore triggers to unblock
outstanding RPC function calls. However, it seemed much easier to do that than
write a plugin for protoc to generate code that did roughly the same thing.

I'm currently considering RPC implementations for a personal project I'm
working on. Right now I may end up trying Thrift since it seems to support RPC
out of the box, but my ultimate goal is to have a WebSockets transport which
Thrift doesn't provide. I may end up contributing to Cap'n Proto if it looks
like the effort required to get RPC up and running has at least some parity
with the effort required to extend Thrift for my needs.

It's clear from your planned use of futures and shared memory that your goal
for Cap'n Proto is to make it the go-to library for communication in parallel
computing. I'm definitely eager to see Cap'n Proto succeed in that endeavor.
JSON is great for readability, but it really isn't going to cut it when
efficiency matters!

~~~
kentonv
I look forward to hearing from you, should you decide to contribute. :) A
WebSocket transport for Cap'n Proto would make a lot of sense, particularly if
paired with a JavaScript implementation, which one or two people have claimed
they might create. I expect it will be easy to hook this in as a transport
without disturbing much of the RPC implementation.

~~~
btilly
One random idea that just hit me, since you're thinking about RPC layers
anyway: make sure that Cap'n Proto plays well with 0MQ. It probably does
already, but a published example or two demonstrating it would not be a bad
thing.

~~~
kentonv
You can certainly send Cap'n Proto messages over 0MQ (or nanomsg) pretty
easily -- Cap'n Proto gives you bytes, 0MQ takes bytes. Done deal.

However, supporting Cap'n Proto's planned RPC system on top of 0MQ may not
work so well. The thing is, 0MQ implements specific interaction patterns, such
as request/response, publish/subscribe, etc. Meanwhile, Cap'n Proto RPC is
based on a different, more fundamental object-oriented model that doesn't fit
into any of these patterns. A Cap'n Proto connection does not have a defined
requester or responder -- both sides may hold any number of references to
objects living on the other side, to which they can make requests at any time.
So it fundamentally doesn't fit into the req/rep model, much less things like
pub/sub. On the other hand, you can potentially build a pub/sub system _on top
of_ Cap'n Proto's model (as well as, trivially, a req/rep system).

I discussed this a bit on the mailing list:

[https://groups.google.com/d/msg/capnproto/JYwBWX9eNqw/im5r_E...](https://groups.google.com/d/msg/capnproto/JYwBWX9eNqw/im5r_E-vlyIJ)

At least, this is my understanding based on what I've managed to read so far
of 0MQ's docs. I intend to investigate further, because it would be great to
reuse existing work where it makes sense, but at the moment it isn't looking
like a good fit. If I've missed something, definitely do let me know.

~~~
btilly
The killer feature that I like for 0MQ is that you can support message passing
asynchronously, even when the other side is not currently up. For instance in
a request/response pattern, one side might go away, get restarted,
reinitialize, and then they carry on as if there wasn't a period in the middle
where there was no connection. This kind of robust handling of network
interruptions is very convenient for many use cases.

However, what you describe isn't necessarily going to fit into that. The #1
thing that your description makes me wonder about is whether RPCs are going to
be synchronous or asynchronous. So, for instance, if you hand me a data
structure with a list of objects that are references to data that I want to
have, and I decide that I need 10 of them, do I have to pay for the overhead
of 10 round trips, or can I say, "I need these 10" and get them all at once?

~~~
kentonv
> _support message passing asynchronously, even when the other side is not
> currently up._

That's probably something that could be implemented in Cap'n Proto as some
sort of persistent transport layer. But since the connections are stateful,
it does require that when one end goes down, it comes back up with its state
still intact. I have a lot of ideas for how to make this possible in big
systems, but it's a long way off.

Of course, in the simple case where you _do_ have a defined client and server
and the server is only exporting one stateless global service object that the
client is using -- which is roughly what 0mq req/rep sockets are for -- then
it should be no problem to support this.

> _whether RPCs are going to be synchronous or asynchronous_

The interface will be asynchronous based on E-style promises (similar to
futures). In fact, say you call an RPC which returns a remote object
reference, and you immediately want to call another method on that reference.
With Cap'n Proto's approach, you will be able to do this whole interaction in
_one_ round trip instead of two. This is called "Promise Pipelining". There's
a bit (just a bit) more detail here:

[http://kentonv.github.io/capnproto/rpc.html](http://kentonv.github.io/capnproto/rpc.html)
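To make the pipelining idea concrete, here is a toy sketch in plain Python (not pycapnp's actual API, and all the names are made up for illustration): a method call on a not-yet-resolved result is queued against that result's slot in the batch, so dependent calls travel to the server in the same round trip instead of waiting for the first reply.

```python
class PipelinedPromise:
    def __init__(self, batch, slot):
        self.batch = batch   # shared list of queued calls
        self.slot = slot     # where this promise's result will land

    def call(self, method, *args):
        # Queue a call on the (future) result; no network wait here.
        slot = len(self.batch)
        self.batch.append((self.slot, method, args))
        return PipelinedPromise(self.batch, slot)

def run_batch(batch, root):
    """Simulated single round trip: the 'server' resolves queued calls
    in order, wiring each call's target to an earlier call's result."""
    results = []
    for target_slot, method, args in batch:
        target = root if target_slot is None else results[target_slot]
        results.append(getattr(target, method)(*args))
    return results

# Hypothetical server-side objects for the demo:
class File:
    def __init__(self, data): self.data = data
    def read(self): return self.data

class Directory:
    def __init__(self, files): self.files = files
    def open(self, name): return self.files[name]

root = Directory({"hello.txt": File(b"hi!")})
batch = []
dir_ref = PipelinedPromise(batch, None)
file_promise = dir_ref.call("open", "hello.txt")  # queued, no wait
read_promise = file_promise.call("read")          # pipelined on the result
results = run_batch(batch, root)                  # one trip for both calls
print(results[read_promise.slot])                 # b'hi!'
```

The real system does this over a network with proper capability references, of course; the point is just that the second call names the first call's result rather than its value.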

------
zheng
The claims on this site are pretty impressive, but I have close to zero
knowledge of the history here, so can someone comment on how many grains of
salt this should be taken with? Otherwise, this looks pretty cool. Something
that beats protobufs in overall speed could be really helpful depending on the
application.

~~~
haberman
Just to give a bit of counterpoint, here are some trade-offs that Capn Proto
makes compared with protobufs. (Full disclosure: I work at Google and know
Kenton from his time here; I have my own protobuf library that I've worked on
for several years). I'm sure Kenton will correct me if I get anything wrong.
:)

Capn Proto's key design characteristic is to use the same encoding on-the-wire
as in-memory. Protobufs have a wire format that looks something like:

    
    
      [field number 3][value for field 3]
      [field number 7][value for field 7]
      etc.
    

The fieldnum/value pairs can come in any order, and may define as many or as
few of the declared fields as are present. This serialization format doesn't
work for in-memory usage because for general programming you need O(1) access
to each value, so protobufs have a "parse" step that unpacks this into a C++
class where each field has its own member.
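The tag/value scan can be sketched in plain Python. This toy decoder handles only varint-typed fields (wire type 0) and is an illustration of the linear parse step, not a full protobuf parser:

```python
def read_varint(buf, pos):
    """Decode a protobuf-style base-128 varint starting at pos."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def parse_message(buf):
    """Linear scan: each entry is [tag varint][value varint]. Fields may
    appear in any order, so the whole buffer must be walked and unpacked
    into a dict (or generated class) before you get O(1) field access."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_num = tag >> 3          # low 3 bits are the wire type (0 here)
        value, pos = read_varint(buf, pos)
        fields[field_num] = value
    return fields

# field 3 = 150, field 7 = 1  ->  tags (3<<3)|0 = 24 and (7<<3)|0 = 56
wire = bytes([24, 0x96, 0x01, 56, 1])
print(parse_message(wire))   # {3: 150, 7: 1}
```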

Protobufs are heavily optimized so this parsing is fast, but it's still a very
noticeable cost in high-volume systems. So Capn Proto defines its wire format
such that it _also_ has O(1) access to arbitrary fields. This makes it
suitable as an in-memory format also.

While this avoids a parsing step, it also means that your wire format has to
preserve the empty spaces for fields that aren't present. So to get the
"infinitely faster" advantage, you have to accept this cost. For dense
messages, this can actually be smaller than the comparable protobuf because
you don't have to encode the field numbers. But for very sparse messages, this
can be arbitrarily larger.
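A toy fixed-offset layout shows both halves of this trade-off. The format below (four 64-bit fields) is made up for illustration and is not the real Cap'n Proto encoding:

```python
import struct

NUM_FIELDS = 4  # hypothetical schema: four 64-bit integer fields

def encode(values):
    """Absent fields are stored as zero padding; the message never
    shrinks below the full fixed layout."""
    words = [values.get(i, 0) for i in range(NUM_FIELDS)]
    return struct.pack("<4q", *words)

def get_field(buf, i):
    """O(1) access: read directly at offset 8*i, with no parse step."""
    return struct.unpack_from("<q", buf, 8 * i)[0]

sparse = encode({2: 42})      # only one field set...
print(len(sparse))            # 32 -- still pays for all four slots
print(get_field(sparse, 2))   # 42
```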

As Kenton points out on
[http://kentonv.github.io/capnproto/encoding.html](http://kentonv.github.io/capnproto/encoding.html)
, lots of zeros compress really well, so even sparse messages can become
really small by compressing them. To do this you lose "infinitely faster", but
according to Kenton this is still faster than protobufs.

In both cases though, the tight coupling between the (uncompressed) wire
format and the in-memory format imposes certain things on your application
with regards to memory management and the mutation patterns the struct will
allow. For example, it appears that the in-memory format was not sufficiently
flexible for Python to wrap it directly, so the Python extension does in fact
have a parse step.

Other cases where you could need a parse/serialize step anyway: if you want to
put the wire data into a specialized container like a map or set (or your own
custom data classes), or if the supported built-in mutation patterns are not
flexible enough for you (for example, the Capn Proto "List" type appears to
have limitations on how and when a list can grow in size).

It's very cool work, but I don't believe it obsoletes Protocol Buffers. I'm
actually interested in making the two interoperate, along with JSON -- these
key/value technologies are so similar in concept and usage that I think it's
unfortunate they don't interoperate better.

~~~
kentonv
Generally a fair analysis. A few comments/corrections:

> _For example, it appears that the in-memory format was not sufficiently
> flexible for Python to wrap it directly, so the Python extension does in
> fact have a parse step._

This is not correct. The Python wrapper directly wraps the C++ interface. You
might be confused by Jason's claim that "The INFINITY TIMES faster part isn't
so true for python", but this was apparently meant as a joke.

It is true, though, that the constraints of arena-style allocation (which
Cap'n Proto necessarily must use to be truly zero-copy) mean that working with
Cap'n Proto types is not quite as convenient as protobufs, although most users
won't notice much of a difference. Lists not being dynamically resizable is
the biggest sore point, though most use cases are better off not relying on
dynamic resizing (it's slow), and the use cases that really do need it can get
around the problem using orphans (build an std::vector<Orphan<T>>, then
compile that into a List<T> when you're done).
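The grow-then-commit pattern can be sketched in plain Python terms (the names below are illustrative; the real API is the C++ Orphan<T> mentioned above):

```python
class Arena:
    """Toy arena: lists are fixed-size once allocated."""
    def __init__(self):
        self.words = []

    def init_list(self, n):
        start = len(self.words)
        self.words.extend([0] * n)   # one allocation, never resized
        return start

# Accumulate in a resizable scratch container (the "orphans")...
staging = []
for value in (3, 1, 4, 1, 5):
    staging.append(value)

# ...then allocate the final fixed-size list once the length is known.
arena = Arena()
start = arena.init_list(len(staging))
arena.words[start:start + len(staging)] = staging
print(arena.words[start:start + 5])   # [3, 1, 4, 1, 5]
```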

OTOH, over the years, many people have requested the ability to use arena
allocation with Protobufs due to the speed benefits, especially with Protobufs
being rather heap-hungry. I always had to tell them "It would require such a
massive redesign that it's not feasible."

And yes, there is the trade-off of padding on the wire. You have to decide
whether your use case is more limited by bandwidth or CPU. With Cap'n Proto
you get to choose between packing (removing the zeros, at the cost of a non-
free encode/decode step) and not packing (infinitely-fast encode/decode,
larger messages). For intra-datacenter traffic you'd probably send raw,
whereas for cross-internet you'd pack. Protobufs essentially always packs
without giving you a choice. And because it generates unique packing code for
every type you define (rather than use a single, tight implementation that
operates on arbitrary input bytes), Protobuf "packing" tends to be slower.
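As a rough illustration of why a single byte-oriented routine can be tight, here is a much-simplified zero-run compressor in Python. This is not Cap'n Proto's actual packing scheme (which operates on 8-byte words with per-word tag bytes); it only shows the flavor of one small loop that works on arbitrary input bytes:

```python
def pack_zeros(data):
    """Replace each run of zero bytes with (0x00, run_length)."""
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == 0:
            j = i
            while j < len(data) and data[j] == 0 and j - i < 255:
                j += 1
            out += bytes([0, j - i])
            i = j
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def unpack_zeros(packed):
    """Inverse: expand (0x00, n) back into n zero bytes."""
    out, i = bytearray(), 0
    while i < len(packed):
        if packed[i] == 0:
            out += bytes(packed[i + 1])
            i += 2
        else:
            out.append(packed[i])
            i += 1
    return bytes(out)

sparse = bytes(24) + b"\x2a" + bytes(7)   # 32 bytes, one nonzero byte
packed = pack_zeros(sparse)
print(len(sparse), "->", len(packed))     # 32 -> 5
assert unpack_zeros(packed) == sparse
```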

~~~
haberman
Thanks for the correction on the Python point.

> OTOH, over the years, many people have requested the ability to use arena
> allocation with Protobufs due to the speed benefits, especially with
> Protobufs being rather heap-hungry. I always had to tell them "It would
> require such a massive redesign that it's not feasible."

Yes totally, I agree that arena allocation is great. I think we both agree on
this point, though we've taken two different paths in attempting to solve it.

Your approach is to say that arena allocation can be made pretty convenient,
and sparse messages can compress really well, so let's design a message format
that is amenable to arena allocation and then implement a system that uses
this format both on-the-wire and in memory.

My approach is to say that we can solve this (and many other related problems)
by decoupling wire formats from in-memory formats, and having the two
interoperate through parsers that implement a common visitor-like interface.
Then a single parser (which has been optimized to hell) can populate any kind
of in-memory format, or stream its output to some other wire format. Of course
this will never beat a no-parser design in speed, but the world will never
have all its data in one single format.
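A toy version of this decoupling in Python; the names are illustrative, not from any real library. One "parser" walks the wire data and calls into a visitor interface, and different visitors build different in-memory (or wire) representations from the same parse:

```python
import json

class DictBuilder:
    """Visitor that builds a plain dict (an 'in-memory format')."""
    def __init__(self):
        self.out = {}
    def on_field(self, num, value):
        self.out[num] = value

class JsonWriter:
    """Visitor that streams straight toward another wire format."""
    def __init__(self):
        self.pairs = []
    def on_field(self, num, value):
        self.pairs.append((str(num), value))

def parse(fields, visitor):
    # Stand-in parser: 'fields' plays the role of decoded wire data.
    for num, value in fields:
        visitor.on_field(num, value)
    return visitor

wire = [(3, 150), (7, 1)]
d = parse(wire, DictBuilder()).out                   # {3: 150, 7: 1}
j = json.dumps(dict(parse(wire, JsonWriter()).pairs))
print(d, j)
```

One optimized parser, many back ends: that is the shape of the argument, even though a real implementation has far more to worry about (nesting, repeated fields, performance).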

I think of these two approaches as totally complementary; to me Capn Proto is
simply another key/value serialization format with a particular set of nice
properties, and I want it to be easy to convert between that and other
formats.

Since your approach is much more focused, you have been able to turn out
usable results orders of magnitude faster than I have. I'm spending time
implementing all of the various protobuf features and edge cases that have
accumulated over the years, while simultaneously refining my visitor interface
to be able to accommodate them while remaining performance-competitive with
the existing protobuf implementation (and not getting too complex). As much as
I believe in what I'm doing, I do envy how you have freed yourself from
backward compatibility concerns and turned out useful work so quickly.

~~~
kentonv
It's more like I started from "Let's design a message format that can be
passed through shared memory or mmap()ed with literally zero copies", and then
arena allocation was a natural requirement. :)

> _Since your approach is much more focused, you have been able to turn out
> usable results orders of magnitude faster than I have._

To be fair, the fact that I'm working on it full-time -- and with no review,
approval, or other management constraints of any kind -- helps a lot. :) (Down
side is, no income...)

------
bsimpson
The Python implementation is a wrapper around a C++ module, so it's probably
not practical to use on AppEngine.

~~~
kentonv
Yep, that is a big trade-off. There's room for someone to write a pure-Python
implementation as well, to fill that niche. You could probably get a lot of
the way there using Python's `struct` module. But it would be slower.
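For the curious, a minimal sketch of what such a pure-Python reader might do: interpret received bytes in place (via memoryview, no copy) rather than unpacking them into objects. The offsets below are invented for illustration, not a real Cap'n Proto schema:

```python
import struct

def read_u64(buf, word_index):
    """Read the 64-bit little-endian word at the given word offset."""
    return struct.unpack_from("<Q", buf, 8 * word_index)[0]

# memoryview lets us read fields without copying the received buffer.
msg = memoryview(bytes(8) + (12345).to_bytes(8, "little"))
print(read_u64(msg, 1))   # 12345
```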

