
Reasons to use protocol buffers instead of JSON - brynary
http://blog.codeclimate.com/blog/2014/06/05/choose-protocol-buffers/
======
falcolas
> Schemas Are Awesome

No reason you can't implement schemas over JSON. In fact, you typically do so
implicitly: the schema is whatever your code expects to be present in the data
structures deserialized from JSON.
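
For what it's worth, making that implicit schema explicit is only a few lines
with a validation library. A minimal sketch using the third-party jsonschema
package (the field names here are made up for illustration):

    import json
    from jsonschema import validate, ValidationError  # third-party package

    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "tags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["name"],
    }

    try:
        validate(instance=json.loads('{"name": "widget", "tags": ["a", "b"]}'),
                 schema=schema)
    except ValidationError as err:
        print("bad payload:", err.message)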

> Backward Compatibility For Free

JSON is unversioned, so you can add and remove fields as you wish.

> Less Boilerplate Code

How much boilerplate is there in parsing JSON? I know in Python, it's:

    
    
        structure = json.loads(json_string)
    

Now then, if you want to implement all kinds of type checking and field
checking on the front end, you're always welcome to, but allowing "get
attribute" exceptions to bubble up and signal a bad data structure has always
appealed to me more. Most of the time I'm writing in Python/Ruby/Javascript
precisely to avoid rigid data structures and boilerplate in the first place.

[EDIT] And for languages where type safety is in place, the JSON libraries
frequently allow you to pre-define the data structure which the JSON will be
parsed into, giving type safety and a well-defined schema for very little
additional overhead as well.
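
(A rough Python analogue of the same idea, declaring the target structure up
front; purely an illustrative sketch with made-up fields:

    import json
    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        tags: list

    # Unexpected or missing keys raise TypeError at construction time.
    user = User(**json.loads('{"name": "ada", "tags": ["admin"]}'))

)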

> Validations and Extensibility

Same as previous comment about type checking, etc.

> Easy Language Interoperability

Even easier: JSON!

And you don't have to learn yet another DSL and compile it down into lots of
boilerplate!

I'm not trying to say that you shouldn't use Protocol Buffers if it's a good
fit for your software, but this list is a bit anemic on real reasons to use
them, particularly for dynamically typed languages.

~~~
Arkadir
> structure = json.loads(json_string)

Not quite. Let's say your JSON data contains the following attribute:

    
    
        "access" : [ "view", "edit", "admin" ],
    

This field should be represented (in the language) as a set of values from an
"access-levels" enumeration.

In C#, you'd have the following boilerplate:

    
    
        [DataMember(Name = "access")]
        public HashSet<AccessLevels> Access { get; private set; }
    

In OCaml, it would be:

    
    
        access : Access.Set.t ;
    

The simple "json.loads" solution would return a list of strings instead.
What's the Python code for turning it into a set of enumeration values, and
failing if one of the values does not match ?

~~~
falcolas
Python doesn't have enumerations, per se. Here's how I'd represent that:

    
    
        if 'edit' in structure['access']:
            pass  # can edit
        

In which case it really doesn't matter if there's junk in access. If I really
felt the need to validate the structure, I could do so with:

    
    
        if not set(structure['access']).issubset(all_access_privs):
            raise ValueError('invalid access types passed in')
    

but more realistically, I'd rely on the ORM object to validate against the
authoritative source - the database - and key off the errors there.

~~~
TomaszZielinski
Python 3.4 has enums, if that's what you meant:
[https://docs.python.org/3/library/enum.html](https://docs.python.org/3/library/enum.html)
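
With them, the conversion-plus-validation Arkadir asked for is only a few
lines. A rough sketch (the enum name is made up):

    from enum import Enum

    class AccessLevel(Enum):
        VIEW = "view"
        EDIT = "edit"
        ADMIN = "admin"

    def parse_access(values):
        # Enum lookup by value raises ValueError for anything unknown.
        return {AccessLevel(v) for v in values}

    parse_access(["view", "edit"])       # {AccessLevel.VIEW, AccessLevel.EDIT}
    parse_access(["view", "superuser"])  # ValueError: 'superuser' is not a valid AccessLevel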

------
wunki
Another alternative is Cap'n Proto [1], from the primary author of Protocol
Buffers v2. It smooths out some of the bumps of protocol buffers.

[1]:
[http://kentonv.github.io/capnproto/](http://kentonv.github.io/capnproto/)

~~~
ntoshev
Came here to write this. Promise pipelining is an especially interesting
attempt to solve latency in RPC (although it doesn't always work).

~~~
ithkuil
I'd love to see Go support for capnproto RPC; I wish I had spare time.

------
Arkadir
Easy language interoperability as a reason to choose Protobuf over JSON?
Mainstream languages support both JSON and Protobuf equally well, and the
others tend to support JSON more often than Protobuf.

Free backwards compatibility? No. Numbered fields are a good thing, but they
only help in the narrow situation where your "breaking change" consists of
adding a new, optional piece of data (a situation that JSON handles just as
well). New required fields? A new representation of old data? You'll need to
write code to handle these cases anyway.

As for the other points, they are a matter of libraries (things that the
Protobuf gems support and the JSON gems don't) rather than of the protocol ---
the OCaml-JSON parser I use certainly provides benefits #1 (schemas), #3 (less
boilerplate) and #4 (validation) from the article.

There is, of course, the matter of bandwidth. I personally believe there are
few cases where it is worth sacrificing human-readability for, especially for
HTTP-based APIs, and especially for those that are accessed from a browser.

I would recommend gzipped msgpack as an alternative to JSON if reducing the
memory footprint is what you want: encoding JSON as msgpack is trivial by
design.
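
A sketch of what that looks like in Python, assuming the msgpack package is
installed (the payload is made up):

    import gzip
    import json
    import msgpack  # third-party package

    doc = json.loads('{"access": ["view", "edit"], "user": "ada"}')

    packed = msgpack.packb(doc)     # same data model as JSON, binary encoding
    wire = gzip.compress(packed)    # gzip on top for the remaining redundancy

    # Round-trips cleanly with msgpack >= 1.0 defaults.
    assert msgpack.unpackb(gzip.decompress(wire)) == doc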

------
CJefferson
Reasons not to use protocol buffers (in C++ at least):

    
    
        1) Doesn't support Visual Studio 2013.
        2) Doesn't support Mac OS X Mavericks.
        3) No "nice" support C++11 (i.e. move constructors)
    

(These can be at least partly solved by running off svn head, but that doesn't
seem like a good idea for a product one wants to be stable)

With JSON I can be sure there will be many libraries which will work on
whatever system I use.

~~~
nly
All 3 of your points stem from Protobufs being an almost stagnant project.
Nothing significant has changed for a number of years. You only have to look
at the public repo commit log to see that Protobufs may as well be a
release-tarball-only distribution[0], with already infrequent releases[1].

[0]
[https://code.google.com/p/protobuf/source/list](https://code.google.com/p/protobuf/source/list)

[1]
[https://code.google.com/p/protobuf/downloads/list](https://code.google.com/p/protobuf/downloads/list)

~~~
kentonv
The management chain for Protobufs goes up through the internal infrastructure
org. Sadly, they never really seemed to understand the importance of the open
source release. Even though Chrome, Android, and other open source Google
products rely on it, those orgs generally don't work with internal
infrastructure very much, so they naturally seem distant and unimportant.

Meanwhile, there are two branches of Protocol Buffers: internal and external.
They contain mostly the same code, and there are scripts that automatically
merge changes back and forth, but those scripts do require some amount of
human supervision. And since the internal users are a lot more demanding than
the external ones, most development occurs internally.

This is a pretty crappy way to run an ostensibly open source project, and
that's my fault. I probably should have put in the effort to better unify the
two branches so that changes could be integrated automatically. It would have
been a pretty significant amount of work, though, and I was mostly a one-man
team (with a few "20% time" helpers), and most of my time was spent trying to
figure out how to convert millions of lines of existing internal application
code over from proto1 to proto2 (where proto1 is the original version of
protobufs which has never been released publicly).

What I did do was make sure to run the fiddly release process on a semi-
regular basis, so that the open source release did not fall behind. Luckily
the core code was pretty stable, so it wasn't necessary to do releases all
that often. But pushing releases was something I pretty much had to do on my
own initiative; management never cared.

So when I eventually moved off protobufs, the replacement team (which
management didn't even realize was needed until about six months later)
naturally deprioritized open source releases. They've done a couple over the
years, but as you've noted, it's rare.

On the "bright" side, judging from the way things were going when I left the
company in early 2013, it's unlikely that there has been any significant
change in the internal code from which you'd benefit, so maybe it doesn't
really matter.

~~~
nly
Wow, thanks for the insight into what goes on behind the curtain :}

------
ardit33
We at Spotify use them extensively and are actually moving away from
Protobufs, which we now consider 'legacy'. The advantages of Protobufs don't
make up for their disadvantages compared to plain JSON. With JSON you get
universal support, simple parsing, developer- and debug-friendliness, much
easier mocking, etc.

------
AYBABTME
I think the main advantages are:

    
    
        - network bandwidth/latency: smaller RPCs consume less
          space, and are received and responded to faster.
        - memory usage: less data is read and processed while
          encoding or decoding protobuf.
        - time: haven't actually benchmarked this one, but I
          assume CPU time spent decoding/encoding will be
          smaller since you don't need to go from ASCII to
          binary.
    

Which means, all performance improvements. They come, as usual, at the cost of
simplicity and ease of debugging.

~~~
al2o3cr
"I assume X is more performant ... I haven't actually benchmarked it"

I believe that's usually spelled "I am making this up".

~~~
kentonv
The vast majority of performance judgments are made without benchmarking,
because most of the time the answer is obvious enough without the need to
measure.

This is one of those times. Parsing the text "1234" is obviously going to be
slower than loading its binary value, and text obviously takes more space than
binary.

That said, the internet is full of benchmarks backing this up if you care to
Google it.
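
For the curious, a crude illustration of the text-vs-binary point in Python.
This compares the json module against a fixed-width struct read, not protobuf
itself, so treat it as a sanity check rather than a benchmark:

    import json
    import struct
    import timeit

    packed = struct.pack('<i', 1234)

    text_time = timeit.timeit(lambda: json.loads('{"n": 1234}'), number=100000)
    bin_time = timeit.timeit(lambda: struct.unpack('<i', packed), number=100000)

    print("text parse: %.3fs  binary read: %.3fs" % (text_time, bin_time))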

------
gldalmaso
> _When Is JSON A Better Fit?_
> _Data from the service is directly consumed by a web browser_

This seems to me like a key issue: you need to know beforehand that this won't
ever be the case, or else you'll have to make your application polyglot
afterwards. A risky bet for any business data service.

Maybe it's fine if it's strictly an infrastructure-glue type of internal
service. But even then, maybe someone will come along wanting to monitor this
thing in the browser.

~~~
tbrownaw
I thought the reasoning behind JSON was that Javascript has a parser built in
(eval), not that Javascript is incapable of parsing other formats?

------
znt
Not so sure about "backwards compatibility" part.

From Protocol buffer python doc: [https://developers.google.com/protocol-
buffers/docs/pythontu...](https://developers.google.com/protocol-
buffers/docs/pythontutorial)

"Required Is Forever You should be very careful about marking fields as
required. If at some point you wish to stop writing or sending a required
field, it will be problematic to change the field to an optional field – old
readers will consider messages without this field to be incomplete and may
reject or drop them unintentionally. You should consider writing application-
specific custom validation routines for your buffers instead. Some engineers
at Google have come to the conclusion that using required does more harm than
good; they prefer to use only optional and repeated. However, this view is not
universal."

So basically I will be in trouble if I decide to get rid of some fields that
aren't actually necessary but were defined as "required" in the past.

This will potentially result in bloated protobuf definitions that have a bunch
of legacy fields.

I will stick to the JSON, thanks.

~~~
gohrt
That argument never made sense:

" old readers will consider messages without this field to be incomplete and
may reject or drop them unintentionally."

That means the old readers -- the ones that are _expecting_ required fields,
can't accept new messages. That's good! The readers don't know how to read the
new messages! The readers need to be updated to a new version before they can
correctly start reading new version of the schema.

~~~
kentonv
Yeah, the problem is that because "required" is checked at the protobuf parser
layer, it will be enforced whether or not the application itself actually
depends on that field being present. In practice this tends to escalate minor
bugs into full outages. Like, there have actually been outages in Google
Search, Gmail, and others that wouldn't have happened if all required fields
were treated as optional instead.

I've written more about this here:

[https://kentonv.github.io/capnproto/faq.html#how_do_i_make_a...](https://kentonv.github.io/capnproto/faq.html#how_do_i_make_a_field_required_like_in_protocol_buffers)

------
pling
This is a fine reason to use protocol buffers instead of JSON:

    
    
       [ 4738723874747487387273747838727347383827238734.00 ]
    

Parsing that consistently across languages is a pain.
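
A quick illustration with Python's json module (other parsers make different
choices here, which is exactly the interoperability problem):

    import json

    doc = "[ 4738723874747487387273747838727347383827238734.00 ]"
    value = json.loads(doc)[0]   # parsed as a float, so precision is silently lost

    print(value)  # a float approximation of the original number
    print(int(value) == 4738723874747487387273747838727347383827238734)  # False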

------
KaiserPro
The biggest thing is that it's smaller over the wire.

However, if schemas scare you (shame on you if they do), then msgpack might be
a better choice.

------
jacob019
Definitely the right direction for performance. My company ended up going with
python-gevent and zeromq to implement an asynchronous API server with
persistent TCP connections. Our application servers are able to make remote
calls over a persistent TCP connection without any noticeable overhead. You
could still use JSON, and we tried it--but since we're all Python anyway, we
decided to just pickle the objects, which is way faster. We looked at protocol
buffers but found them a bit cumbersome. It's been stable for two years and
has completely solved our scaling problems.
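
A minimal sketch of that pattern with pyzmq (the endpoint and message shape
are made up, and pickle is only safe when both ends are fully trusted):

    import pickle
    import zmq

    context = zmq.Context.instance()
    socket = context.socket(zmq.REQ)
    socket.connect("tcp://api-server:5555")   # hypothetical endpoint

    def remote_call(method, *args):
        # One persistent TCP connection, reused for every call.
        socket.send(pickle.dumps((method, args)))
        return pickle.loads(socket.recv())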

------
tieTYT
> There do remain times when JSON is a better fit than something like Protocol
> Buffers, including situations where:

> * You need or want data to be human readable

When things "don't work" don't you always want this feature? Over a long
lifetime, this could really reduce your debugging costs. Perhaps protocol
buffers has a "human readable mode". If not, it seems like a risk to use it.

~~~
kentonv
You can run the binary format through `protoc --decode` to get text, or use
the `toString()` method in your code. It's an extra step, but for the 99.99%
of queries that no human looks at, it saves a lot of CPU.
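
For instance, the Python bindings can render any message as text. A tiny
sketch using a well-known type so it runs without a custom .proto:

    from google.protobuf import text_format
    from google.protobuf.timestamp_pb2 import Timestamp

    msg = Timestamp(seconds=1402000000)

    wire = msg.SerializeToString()        # compact binary, not human readable
    parsed = Timestamp.FromString(wire)   # what a service would do on receipt

    print(text_format.MessageToString(parsed))   # "seconds: 1402000000"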

------
redthrowaway
...in Ruby. For some applications.

With Node, I'd have to see a very good argument for why I should give up all
of JSON's great features for the vast majority of services. Unless the data
itself needs to be binary, I see no reason why I shouldn't use the easy,
standard, well-supported, nice-to-work-with JSON.

~~~
kybernetikos
In JavaScript, JSON is also much faster to deserialize than protobufs.

------
jweir
Has anyone tried ProtoBuf.js ("Protocol Buffers for JavaScript") on the client
side?

[https://github.com/dcodeIO/ProtoBuf.js](https://github.com/dcodeIO/ProtoBuf.js)

~~~
kybernetikos
Yes. It's a decent implementation. However, binary data access (particularly
DataView, which is used by protobufjs) is embarrassingly slow in all browsers
at the moment, so JSON (possibly gzipped if you're worried about bandwidth) is
much better if you're going to be mainly targeting browsers.

------
jayvanguard
What a huge step backwards. We had decades of binary protocols. They sucked.
Then everyone moved to non-binary protocols and it was much better. Let's not
do it all over again.

~~~
kentonv
Don't forget about XML, MIME, SGML, SOAP... We've had decades of protocols
both binary and text, and they've all sucked.

The thing it seems we've learned more recently is that the more features and
complication you add to a protocol, the more it sucks. At the end of the day
your data is composed of primitives, records, and lists, and if your protocol
offers to structure things in any other way, it's just creating confusion.
This is why JSON beats XML, and Protobufs beats many of its binary
predecessors.

------
nly
I'd rather use Avro. The binary encoding is more dense and there's an
officially supported JSON encoding (the text encoding for Protobufs is mostly
intended for debugging)

------
pkandathil
When you convert an object from language X to JSON and validate it against a
schema before deserializing, don't you get the same guarantees? And with JSON
you get human-readable data, which is great when debugging issues. I am not
seeing the advantage of protocol buffers. It would be great to compare payload
sizes and see whether there is a significant saving from that perspective.
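
As a very rough stand-in (not protobuf itself, just fixed-width binary packing
of the same made-up record versus its JSON form):

    import json
    import struct

    record = {"id": 123456, "score": 98.5, "name": "Ada"}

    as_json = json.dumps(record).encode()
    as_binary = struct.pack("<Id3s", record["id"], record["score"],
                            record["name"].encode())

    print(len(as_json), len(as_binary))   # e.g. 44 bytes vs 15 bytes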

------
mrinterweb
Personally, I prefer Thrift over protocol buffers. I'm surprised Thrift wasn't
mentioned.

------
don_draper
"Reason #3: Less Boilerplate Code"

Anyone who has used Avro in Java knows that this is not true.

~~~
nly
Avro code is generated though?

~~~
don_draper
For anything non-trivial, putting data into an Avro object and pulling it out
is manual and tedious. Instead of json_parse("{'nice json':'string'}") you
have to assemble the object, and for large hierarchical objects this is a
pain. And then you want to return that object to the UI. Well, it's not
standard JSON. For example, the arrays aren't simple arrays. My team has code
to fix these kinds of things, but it's not nearly as easy to use as JSON.

tl;dr Avro JSON != Standard JSON

The OP's assertion that protocol buffers are awesome and reduce boilerplate
conflicts with my experience.

~~~
nly
I don't see how "standard JSON" could be any easier when it's even less
structured.

------
AnimalMuppet
Use protocol buffers because you get versioning? How hard would it be to add a
version number to your JSON struct?

(Answer: Trivial.)
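
e.g., a sketch of what that looks like in Python (field names made up):

    import json

    payload = json.dumps({"version": 2, "name": "Ada", "access": ["view", "edit"]})

    msg = json.loads(payload)
    if msg.get("version", 1) >= 2:   # a missing field means the original format
        access = msg["access"]
    else:
        access = ["view"]            # whatever default the old format implied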

