
Introducing BERT and BERT-RPC: GitHub's new serialization and RPC protocol - mojombo
http://github.com/blog/531-introducing-bert-and-bert-rpc
======
vicaya
Any benchmarks compared with Thrift and Protobuf? BTW, Hadoop's new Avro
project is yet another serialization project that's designed to be dynamic
language friendly.

I personally find the Thrift IDL trivial to manage, and it is probably the only
sane way to support a dozen languages (statically and dynamically typed) in a
type-safe and efficient way.

BERT-RPC is simple, but what if you find your server too slow and it needs to
be rewritten in another language? I think the Thrift approach makes that simpler.

------
antonovka
This is just silly:

 _I just can’t. I find the entire concept behind IDLs and code generation
abhorrent. Coming from a background in dynamic languages and automated
testing, these ideas just seem silly._

The point of IDLs is to provide a stable, exact, portable and succinct
definition of an inter-application _protocol_. In defining the protocol
explicitly in a language neutral way, it is easier to ensure conformance and
correctness of implementation.

As a side effect, it does make code generation easy, allows one to optionally
apply static typing to ensure that messages are always correctly formed, and
ensures a _DRY_ approach across the board.

You could just as easily implement poorly defined interfaces using Protobuf or
Thrift -- nobody says that you actually have to use an IDL, and the
serialization format doesn't require it as long as you keep the messages
self-describing. Moreover, a lot of effort has gone into making Protobuf (and even
Thrift) as efficient as possible -- yet another serialization format is wholly
unnecessary.

[Edit] Google even outlines one way to implement self-describing messages
using the existing protobuf standard in the project documentation:
[http://code.google.com/apis/protocolbuffers/docs/techniques....](http://code.google.com/apis/protocolbuffers/docs/techniques.html#self-description)

Other methods include simply encoding fields as tuples of (name, value) --
protobuf includes value types in the field encoding.
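
To make the (name, value) idea concrete, here is a minimal sketch in plain Python. It does not use protobuf at all: the `encode_fields` and `decode_fields` names and the JSON wire format are assumptions chosen only to show that a receiver needs no schema when every field carries its own name.

```python
import json

def encode_fields(record):
    """Serialize a dict as a list of [name, value] pairs (hypothetical wire format)."""
    pairs = [[name, value] for name, value in record.items()]
    return json.dumps(pairs).encode("utf-8")

def decode_fields(payload):
    """Rebuild the record with no schema: each field names itself."""
    pairs = json.loads(payload.decode("utf-8"))
    return {name: value for name, value in pairs}

message = {"user": "mojombo", "repo_count": 42}
wire = encode_fields(message)
restored = decode_fields(wire)
```

The trade-off is the usual one: field names travel with every message, so the payload is larger than a schema-driven encoding would be.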

~~~
SirWart
I think the code generation part is a lot worse than the IDL part.

~~~
antonovka
It makes usage quite a bit cleaner for some target languages, but it's also
optional.

------
jerf
Is this intended to be put out for others to use? My reaction to it as an
internal protocol is "OK, that's interesting" (I have my own streaming JSON
message protocol), but putting it out for others to use clogs up an
awfully full space (you'll note I'm not linking you to a "library" since my
protocol isn't worth releasing).

One issue that leaps to mind (based on the fact my streaming JSON protocol
also travels through Erlang as it happens) is what you do with dicts? Erlang's
native representation for dicts is extremely hostile as a term to send to
other languages:

    
    
        Erlang R13B01 (erts-5.7.2) [source] ...etc
    
        Eshell V5.7.2  (abort with ^G)
        1> dict:from_list([{"abc", "def"}, {"efg", "hij"}]).
        {dict,2,16,16,8,80,48,
              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
              {{[],
                [["efg",104,105,106]],
                [],[],[],
                [["abc",100,101,102]],
                [],[],[],[],[],[],[],[],[],[]}}}
    

That's a hash with two entries, abc -> def and efg -> hij. Note that's the
internal representation; dumping out a perl hash gives you a semantic-level
representation but under the hood it too has dirty nasty bits like that.

If that's not easy, that's a big price to pay for moving binaries around, in
general. In specific cases it could be fine.

~~~
evgen
Why would you send an actual dict instead of a proplist? Every serialization
protocol I am aware of does not send the actual internal dict/hash/map
representation but instead flattens it down to what is effectively a list of
key-value pairs that may or may not be tagged by the protocol as something to
be reconstituted as a dict at the remote end. Looking at the spec it appears
that this is what BERT does as well.
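
A rough sketch of that flatten-and-tag approach, in Python for brevity. The `DICT_TAG` marker and the `flatten`/`rebuild` helpers are hypothetical names for illustration and are not taken from the BERT spec or any library.

```python
# Hypothetical marker saying "these pairs should become a dict on the far end".
DICT_TAG = ("tagged", "dict")

def flatten(value):
    """Turn a dict into a tagged list of (key, value) pairs; recurse into lists."""
    if isinstance(value, dict):
        pairs = [(flatten(k), flatten(v)) for k, v in value.items()]
        return DICT_TAG + (pairs,)
    if isinstance(value, list):
        return [flatten(item) for item in value]
    return value

def rebuild(value):
    """Inverse of flatten: only tuples carrying the tag become dicts again."""
    if isinstance(value, tuple) and len(value) == 3 and value[:2] == DICT_TAG:
        return {rebuild(k): rebuild(v) for k, v in value[2]}
    if isinstance(value, list):
        return [rebuild(item) for item in value]
    return value

flat = flatten({"abc": "def", "efg": "hij"})
```

An untagged list of pairs passes through `rebuild` unchanged, so the receiver can still tell a plain key/value list from one that is meant to be a dict.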

~~~
jerf
Because the result in Erlang will not be a dict, it will be a list of
key/value pairs. That's not the same.

Doing that automatically means you have to do something else to label it as a
key/value list meant to be converted back to a dict.

These are not insurmountable problems, just issues that have to be dealt with.
Any problem in Erlang can be solved with another layer of record indirection,
but those don't come for free.

I must not have looked closely enough to find the source; I did try.

~~~
bham
[http://github.com/mojombo/bert/blob/master/lib/bert/encoder....](http://github.com/mojombo/bert/blob/master/lib/bert/encoder.rb)

[http://github.com/mojombo/erlectricity/blob/master/lib/erlec...](http://github.com/mojombo/erlectricity/blob/master/lib/erlectricity/encoder.rb)

------
simonw
I thoroughly enjoyed the justification given for inventing something new
despite the existence of XML / JSON / Thrift etc.

------
rads
Why not REST? What's wrong with HTTP?

~~~
jacobolus
Because not everything in the world maps well onto request-response, and
trying to put stateful bi-directional communication on top of HTTP is an ugly
hack.

~~~
DougWebb
While that's true, it seems that a function call maps precisely onto
request-response: call a function, get back a return value. Statefulness isn't a
problem for REST or HTTP either, because it's the communication protocol
that's stateless, not the resources. For example, websites are REST services,
the pages on the website are the resources, and they certainly exist: that's
their state. What's stateless is that it doesn't matter what order you
retrieve the pages; you'll always get the same page for a given URL. The only
exception is when the request is intended to change the state of some
resource, such as this thread changing state when I submit the comment I'm
writing.

For a RESTful function call service, each function would be a resource, and
the functions would be required not to have side-effects (eg: don't change the
state of the service). That's often good design for any API, not just RPC-type
services. Again, the exception would be function calls whose purpose is to
modify the service state in some way.

When the service state is intended to be different for each user of the
service, HTTP has a solution for that too: a cookie which contains a token
identifying the session. Any state changes can be associated with the token.
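
Here is a toy sketch of that split between side-effect-free function resources and per-session state. It uses plain Python functions as stand-ins for HTTP handlers; the resource paths, `SESSIONS` store, and helper names are all made up for illustration.

```python
import uuid

# Side-effect-free functions exposed as resources; the paths are hypothetical.
RESOURCES = {
    "/functions/square": lambda x: x * x,
    "/functions/negate": lambda x: -x,
}

SESSIONS = {}  # session token -> per-user state kept on the server

def call_function(path, arg):
    """Stand-in for a GET: changes no state, so repeated calls agree."""
    return RESOURCES[path](arg)

def open_session():
    """Stand-in for issuing a cookie: returns a fresh token for one session."""
    token = uuid.uuid4().hex
    SESSIONS[token] = {"history": []}
    return token

def record_call(token, path, arg):
    """Stand-in for a state-changing request: the change is filed under the token."""
    result = call_function(path, arg)
    SESSIONS[token]["history"].append((path, arg, result))
    return result
```

Each request carries everything the server needs (the path, the argument, and optionally the token), which is what keeps the protocol itself stateless even though the service holds state.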

Now you mention bi-directional. If you mean that either side of the
interaction can initiate an exchange, then you're right; HTTP is no good for
that. Not many RPC services work that way though; it's more complex to design.
It's much more common and simpler to have client/server interactions where the
client initiates every exchange, and the server listens and responds. You can
do a lot with that model.

~~~
jacobolus
> _It's much more common and simpler to have client/server interactions_

I can’t agree with such a blanket statement. It completely depends on your
goals and application logic. All kinds of protocols/applications are designed
to be bi-directional: multiplayer games (think MUDs, FICS, etc.),
IRC/Jabber/other chat, VoIP, push email, message queues, multi-user document
editors, telnet/SSH, remote device monitoring apps, etc.

Just because such apps aren't so often seen on the web, where their function
would have to be hacked on top of HTTP, doesn't mean they aren't common, in
general.

Running your multi-player game protocol over HTTP, using cookies to record the
session, and polling for updates, just so you can say you’re being RESTful
(and "simpler"?) is a terrible terrible idea, because if the semantics of your
app stay the same, you basically have equally complex logic, now with several
layers of indirection and overhead tossed in for no reason.

> _For a RESTful function call service, each function would be a resource, and
> the functions would be required not to have side-effects_

Okay, I want my function call to be "I just captured your queen with my rook,
and you'd better find out about it because it's now your turn". How does that
work exactly?

~~~
DougWebb
I didn't say that everything was a client/server type of interaction; I said
that client/server interactions are more common than bi-directional
client/client interactions. You've provided a list of client/client protocols,
which is great, but I still think client/server is much more common.

For client/client, you're right: HTTP is not appropriate. If you can stomach
the XML, Jabber (aka XMPP, right?) looked like a good general-purpose
bi-directional protocol, and if I'm not mistaken that's what Google Wave is
using. If you don't want to pay the XML parsing/serializing overhead, other
protocols are available, and maybe that's why BERT-RPC was created. My
original point was simply that, depending upon the requirements, the existing
HTTP protocol might be sufficient for the kind of interaction needed and it's
supported by a rich infrastructure.

