
Using Protobuf instead of JSON to communicate with a front end - teh
https://blog.wearewizards.io/using-protobuf-instead-of-json-to-communicate-with-a-frontend
======
oppositelock
I worked on a product inside Google which used protos (v1) as the data format
to a web front end, and in practice, that system was a failure, in part due
to the decision to use protos. The deserialization cost of protocol buffers
is too high if you're pushing complex data through at volume; even though
the data size is smaller, it's better to send larger gzipped JSON (which
will be decompressed in native code) and deserialize it into JS (also via
native code). We weren't using ProtoBuf.js, but our own internal JavaScript
implementation of a similar library, and doing all of this in JS was too
expensive. Granted, we were at times sending around protos with
multi-megabyte payloads.

We eventually rewrote our app to send protos in JSON format to the front
end, while letting our backends still pass around native protos. That
worked a lot better.

~~~
haberman
Things have changed a lot since your experience, I think. For one, a different
encoding called "JSPB" has become the de facto standard for doing Protocol
Buffers in JavaScript, at least inside Google. JSPB is parseable with
JSON.parse(), so it avoids the speed issues you experienced.

And looking forward, JavaScript parsing of protobuf binary format has gotten a
lot faster, thanks in large part to newer JavaScript technologies like
TypedArray. Ideally JSPB would be deprecated as a wire format in favor of fast
JavaScript parsing of binary protobufs, but this would of course be contingent
on the performance being acceptable.
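
To give a flavor of what TypedArray-based parsing looks like, here is a
minimal sketch of decoding a protobuf varint straight out of a Uint8Array
(illustrative only, not the actual library code):

    // Decode one varint; fine for values below 2^31.
    function readVarint(buf: Uint8Array, pos: number): [number, number] {
      let value = 0;
      let shift = 0;
      for (;;) {
        const b = buf[pos++];
        value |= (b & 0x7f) << shift;              // low 7 bits are payload
        if ((b & 0x80) === 0) return [value, pos]; // high bit clear: done
        shift += 7;
      }
    }

    console.log(readVarint(new Uint8Array([0xac, 0x02]), 0)); // [300, 2]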

Finally, JSON is becoming a first-class citizen in proto3, so protobuf vs.
JSON will no longer be an either/or; it can be a both/and.
[https://developers.google.com/protocol-buffers/docs/proto3#j...](https://developers.google.com/protocol-buffers/docs/proto3#json)

~~~
boomzilla
What benefits do I get from ProtoBuf, apart from the standard binary wire
format?

JSON is just more popular as a serialization format. It doesn't matter what
programming language or OS I am on, there is almost always a built-in
library that de/serializes JSON at reasonable speed. To send JSON objects
around from one service to another, I can just gzip the string if it's big,
or send it as a plain UTF-8 string if it's not.

ProtoBuf has to provide more value for people like me to switch. I would
rather try out Apache Avro first as a replacement for what I am doing right
now.

~~~
haberman
In my opinion, the biggest benefit from using protobuf is that the schema
exists in a .proto file. This can be used to provide all sorts of
conveniences.

With a plain JSON-based API, you copy and paste field names out of sample code
or the documentation. If you spell a field name wrong, there will be no error
on the client. If you're lucky, the server _might_ error out because it didn't
recognize the property name, but it also might not. If you send an integer
when the server was expecting a string, the server might automatically convert
or it might not.

With protobuf, the schema is explicit in a .proto file. That means that the
client library can tell you, at the precise moment that you say
msg.misspledFieldName, that the field name doesn't exist. Or if you try to put
an integer in there instead of a string, it can tell you about that too.
Basically it makes for a tighter feedback loop, which is almost always better.

In statically-typed languages like C++ or Java, the schema can be used to
generate static types too, so it's actually a _compile-time_ error when you
misspell a field name.
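
As an illustration (hypothetical message and field names), a schema like

    // user.proto
    syntax = "proto3";
    message User {
      string display_name = 1;
      int32 visit_count = 2;
    }

lets generated bindings in, say, TypeScript reject mistakes before anything
hits the wire, assuming codegen that exposes fields as typed properties:

    const u = new User();
    u.misspledFieldName = "x"; // compile error: property does not exist
    u.visitCount = "7";        // compile error: string is not a number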

> It doesn't matter what programming language or OS I am on, there is
> almost always a built-in library that de/serializes JSON at reasonable
> speed.

Yep, that's one reason that proto3 will support JSON as a first-class citizen:
[https://developers.google.com/protocol-buffers/docs/proto3#j...](https://developers.google.com/protocol-buffers/docs/proto3#json)

~~~
gcb0
summary: xml annoyances for json

:-)

~~~
haberman
XML isn't painful because it has a schema, XML is painful because it wasn't
really designed for RPC, so getting to feature parity with something like
Protocol Buffers takes a whole stack of XML technologies and a huge mess of
complexity.

Protocol Buffers were designed from the ground up for RPC, and as a result are
_far_ simpler and more convenient to use than XML. Seriously, nobody who uses
Protocol Buffers compares them to XML, because it's not even a comparison.

[https://developers.google.com/protocol-buffers/docs/overview...](https://developers.google.com/protocol-buffers/docs/overview#whynotxml)

------
haberman
Making JSON first-class is an explicit design goal of proto3, the next version
of Protocol Buffers currently in alpha:
[https://developers.google.com/protocol-buffers/docs/proto3#j...](https://developers.google.com/protocol-buffers/docs/proto3#json)

This will allow you to switch easily between JSON and protobuf binary on
the wire, while using official protobuf client libraries. You can choose
whether you care more about size/speed efficiency or wire readability. Best
of both worlds!

I work on the protobuf team at Google and would be happy to answer any
questions.

~~~
caust1c
> Message field names are mapped to lowerCamelCase

Why is a mapping to camel case necessary? I imagine it creates the potential
for collisions, no?

~~~
wora
Co-author of proto3 here. The reasons we chose lowerCamelCase are
compatibility with Google's REST APIs (like the Gmail API) and readability
for users who work with JSON output directly, without a client library. API
designers should not define confusing data schemas, let alone allow
collisions. Most proto messages have a small number of fields, so avoiding
name collisions is trivial for an API designer.
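
For reference, the mapping itself is mechanical: snake_case proto field
names become lowerCamelCase JSON keys. A hypothetical example:

    message Profile {
      string display_name = 1; // appears in JSON as "displayName"
    }

A collision would require something like defining both display_name and
displayName in the same message, which is exactly the kind of confusing
schema an API designer should not write.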

------
skybrian
It's possible to encode a protobuf as JSON and we do it all the time at
Google. In browsers, native JSON parsing is very fast and the data is
compressed, so going to a binary format doesn't seem worthwhile. The .proto
file is used basically as an IDL from which we generate code.

~~~
cletus
Personally I've found JSON encoded protobufs to be almost universally awful.

The most common method is to use an array indexed by the field number. I've
seen protobufs with hundreds of fields, so that's hundreds of nulls
serialized as the string "null".

The alternative is to have JSON objects with attributes named after the
protobuf field names. This isn't without warts either, and seems to be less
prevalent in my experience.

Another problem is JavaScript doesn't support all the data types you can get
in protobufs, most notably int64s.
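
Roughly, for a message with fields name = 1 and id = 3 (exact details vary
by implementation), the two encodings look like:

    // Field-number-indexed array; unset fields pad with nulls:
    ["alice", null, 9007199254740993]

    // Named-attribute JSON:
    {"name": "alice", "id": 9007199254740993}

    // The int64 hazard: JS numbers are doubles, exact only up to 2^53.
    JSON.parse('{"id": 9007199254740993}').id // => 9007199254740992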

Protobufs are relatively space-efficient (e.g. variable-width int types);
JSON-encoded protobufs are much less so.

Perhaps the rise of browser support for raw binary data will make this less
awful.

Many consider it a virtue to use the same code on the client and server.
That explains things like this and GWT. Personally I think this is horribly
misguided and a fool's errand. You want to decouple your client and server
as much as possible (IMHO).

Disclaimer: I work for Google

~~~
haberman
> The most common method is to use an array indexed by the field number.
> I've seen protobufs with hundreds of fields, so that's hundreds of nulls
> serialized as the string "null".

What you are describing here is known as the "JSPB" wire format. This is a
serialization that is only ever used for JavaScript, and only used there
because, historically, parsing binary protobufs in JavaScript was too slow.
With TypedArray and other JavaScript enhancements, this is changing. Ideally,
JSPB wire format would be phased out completely.

> The alternative is to have JSON objects with attributes named after the
> protobuf field names. This isn't without warts either, and seems to be
> less prevalent in my experience.

It's about to become a lot more prevalent with proto3, which features first-
class JSON support. See: [https://developers.google.com/protocol-buffers/docs/proto3#j...](https://developers.google.com/protocol-buffers/docs/proto3#json)

Disclaimer: I work on the protobuf team.

~~~
skybrian
proto3's JSON is an improvement on ASCII (text-format) protobufs, but since
it uses field names, it doesn't have the same backward-compatibility
guarantees as a format that uses tag numbers.

It would be nice if we had a standardized JSPB wire format that used tag
numbers, rather than the various unofficial implementations we have now.
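
Purely as a hypothetical sketch (no such standard exists today), a
tag-numbered JSON encoding might key objects by field number, so renaming a
field in the .proto wouldn't break already-serialized data:

    {"1": "alice", "3": 42}     // survives renaming field 1
    {"name": "alice", "id": 42} // proto3 JSON: breaks if "name" is renamed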

------
justinsb
I like to use Protobuf in my server code, but then support JSON _or_ Protobuf
as the encoding. So browsers can continue to use JSON, but the server gets
strongly-typed Protobuf structures.
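
A minimal sketch of that kind of dual-encoding endpoint, using Express and
the protobufjs npm package (route and message names are made up):

    import express from "express";
    import protobuf from "protobufjs";

    const app = express();
    app.use(express.raw({ type: "application/x-protobuf" }));
    app.use(express.json());

    protobuf.load("user.proto").then((root) => {
      const User = root.lookupType("User");

      app.post("/users", (req, res) => {
        // Either encoding arrives; the handler sees one typed message.
        const msg = req.is("application/x-protobuf")
          ? User.decode(new Uint8Array(req.body))
          : User.fromObject(req.body);

        // Reply in whichever encoding the client prefers.
        if (req.accepts(["json", "application/x-protobuf"]) === "json") {
          res.json(User.toObject(msg));
        } else {
          res.type("application/x-protobuf")
            .send(Buffer.from(User.encode(msg).finish()));
        }
      });

      app.listen(8080);
    });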

~~~
haberman
What you describe is exactly how proto3, the latest version of protobuf, will
work!

proto3 supports both binary protobuf encoding and JSON natively, so you can
switch between them as desired. [https://developers.google.com/protocol-buffers/docs/proto3](https://developers.google.com/protocol-buffers/docs/proto3)

proto3 is currently in alpha, but we are working to bring it closer to release
(I work on the protobuf team at Google).

~~~
hesdeadjim
Oh cool, didn't know there was a new version of protocol buffers. I ended
up choosing Thrift for my current project due to wider language support,
but I have been frustrated with some of the limitations of the IDL
(primarily no recursive data structures, so no generic storage of JSON-like
objects).

~~~
haberman
Another one of the goals of proto3 is to increase language support a lot. Just
this year we have added Ruby, Objective C, and C#, and keep your eyes peeled
for more. proto3 is still alpha, but this is the direction things are going.

proto3 is especially designed to be paired with gRPC, which is also in alpha
but also going for wide language support:
[http://www.grpc.io/](http://www.grpc.io/)

------
sbarre
The biggest takeaway for me from this experiment was "always make sure you are
gzipping your output".

------
zubspace
One thing where protobuf (at least protobuf-net) really shines is
serialization of data into a binary format, which is incredibly fast. In
.NET, all the built-in alternatives are slower by a large margin.

[https://code.google.com/p/protobuf-net/wiki/Performance](https://code.google.com/p/protobuf-net/wiki/Performance)

~~~
dmsimpkins
I agree. I recently converted some large files that were previously stored
using XmlSerializer to use protobuf-net, and found an 8x improvement in
space efficiency and a 6-7x improvement in (de)serialization speed. It
really is a fantastic library, and if your classes are already marked up
for serialization, there is very minimal work required to make the switch.
For files that need not be human-readable, protobuf is definitely the way
to go.

------
benjaminjackman
It would probably be better to try something like Cap'n Proto or SBE if
you're worried about performance. Otherwise I think sticking to gzipped
JSON isn't going to lag that far behind. Protocol Buffers' biggest benefit
IMHO is just the .proto file for cross-language code generation.

I have it on a todo list to port an SBE parser to Scala.js, which already
backs Java ByteBuffers with JavaScript TypedArrays. That should be really
fast. The same work being done to make asm.js fast will also make the Cap'n
Proto / SBE approach fast, so I think this has the most promise for
bringing really high-performance data transfer to the browser.

------
rqebmm
Having used both on a few projects, including a JS frontend, my advice is:

"Don't use protobufs if you don't have to".

Protobufs can be much faster and provide a strict schema, but that comes at
the price of higher maintenance costs. JSON is much simpler, easier to
implement, and MUCH easier to debug. If your GPB looks like it's building
properly but fails to parse, it's a huge pain to try to decode/debug the
binary. You'll wish you could just print the JSON string.
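
One mitigation worth knowing about, assuming the protoc CLI is on hand: it
can pretty-print an opaque binary message, either raw (fields keyed by tag
number) or against the schema:

    protoc --decode_raw < message.bin
    protoc --decode=MyMessage my_schema.proto < message.bin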

If you need the speed and schema, then GPBs are great. In our case, we got
a huge speed boost just by avoiding the string building/parsing inherent in
JSON.

~~~
nfmangano
Could you elaborate on the maintenance costs? We use ProtoBuf.js for our
own real-time whiteboarding webapp over WebSockets, and in the long run
having strict schemas has saved us a lot of time. We're a distributed team
with different members working on the front and back ends, and we
frequently refer to our proto files to remember how data is transferred and
how it should be interpreted (explained in comments in our proto files).

Are the maintenance costs related to debugging unparsable messages? We've
almost never had an issue there, so maybe we've just been lucky?

------
mhahn
I'm curious whether Google has a common envelope they send all service
messages with, i.e. a common way of specifying pagination parameters, auth
tokens, etc. when sending protobuf messages between services. I've been
using protobufs for my services and wrote a ServiceRequest object, which
has worked well. I was mostly just surprised at not being able to find much
documentation on actual deployments, as opposed to just simple tutorials.

------
dustingetz
Transit is similar but addresses the flaws described in this article

[http://blog.cognitect.com/blog/2014/7/22/transit](http://blog.cognitect.com/blog/2014/7/22/transit)

~~~
teh
Not sure Transit is designed for the same space. E.g. there seems to be no
schema, and the default JSON encoding isn't super readable either.

Protobufs can be encoded as JSON and as text, so there are some ways to
address the readability, I guess.

------
Animats
With one end in Python 2 and the other end in JavaScript, using binary
protobufs seems like misplaced optimization. It's nice to know the support
is there (well, not in Python 3, apparently), in case you need to talk to
something that speaks protobufs.

I'm looking forward to seeing protobufs in Rust as a macro. It should be
possible; there's an entire regular expression compiler for Rust as a compile-
time macro, which is a useful optimization.

~~~
kibwen
It's not protobuf, but Rust has quite good Cap'n Proto support:
[https://crates.io/crates/capnp](https://crates.io/crates/capnp)

------
rikrassen
One of the comments on that article was "YAY! JSON is wastefully large. I'd
love to replace it." Is this true? I'm confused why JSON would be seen as a
wasteful format. It seems to me that with any decent compression it would
be hard to get much smaller. I'm not talking about the other advantages
Protobuf offers here; I just want to know about size.

~~~
shanemhansen
There are basically 2 areas where JSON is really wasteful. Compression can
help with both of those.

      1. Dictionary keys are repeated when you have an array of similar objects.
      2. Non-text data. JSON can't natively represent binary data, forcing people to use things like base64 for binary and base10 for numbers.
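
A quick Node.js sketch of point 1, with a made-up payload, showing how key
repetition inflates the raw bytes and how gzip claws most of it back:

    import { gzipSync } from "node:zlib";

    const rows = Array.from({ length: 1000 }, (_, i) => ({
      userId: i,
      displayName: "user" + i,
      lastSeenMillis: 1430000000000 + i,
    }));

    const json = JSON.stringify(rows); // every key repeated 1000 times
    console.log(Buffer.byteLength(json));            // raw size
    console.log(gzipSync(Buffer.from(json)).length); // compressed size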

~~~
rikrassen
I hadn't considered binary data. Thanks.

------
gobengo
+1 to "Did this in a real product and fully regret it"

------
laurentoget
Another way to do this is to specify the protocol in protobuf but have the
server translate responses and requests to and from JSON. The Java protobuf
library does that for you out of the box. This is easier to implement. I
would be curious to compare the performance of both approaches in different
contexts.
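
A minimal sketch of that translation layer in JavaScript terms, using the
protobufjs npm package rather than the Java library (type names made up):

    import protobuf from "protobufjs";

    const root = await protobuf.load("api.proto");
    const Response = root.lookupType("api.Response");

    // backend -> browser: binary proto in, JSON text out
    const toJson = (wire: Uint8Array) =>
      JSON.stringify(Response.toObject(Response.decode(wire)));

    // browser -> backend: JSON text in, binary proto out
    const toProto = (json: string) =>
      Response.encode(Response.fromObject(JSON.parse(json))).finish();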

------
krapht
How does Protobuf compare with CORBA? I'd be interested in anybody's
experience if they have used both.

~~~
nostrademons
CORBA was ridiculously complex, because they tried to make remote objects look
like local ones, with messages, reference counting, naming, discovery, etc.
Protobuf is just a serialization mechanism. You're thinking at a lower level
of abstraction - it's all just PODs that go over the wire, you build your own
RPC framework on top of that (or use gRPC, which is Google's protobuf-over-
HTTP2 RPC library) and think in terms of requests & responses.

IMHO trying to make everything look like an object was a mistake, and newer
RPC frameworks like gRPC, Thrift, and JSON-over-HTTP are _much_ easier to use
than the late-90s frameworks like RMI, CORBA, and DCOM. Sometimes you don't
want abstraction, because it abstracts away details you absolutely need to
think about.

------
flavor8
> Reading time: ~15 minutes.

842 words including code.

Average adult reading speed: 300 words/minute.

Does not compute.

~~~
Keats
I know, I included some time for people wanting to open some links, the
GitHub project, etc.

Only reading the text itself indeed takes less than 5 minutes; not sure
which approach people prefer.

------
drawkbox
There is definitely a place for binary serialization/deserialization and
transmission. Inter-system communication is probably the best place for
binary, as is any place that needs high-speed real-time communication with
the smallest size to fit in MTU limits (game protocols over UDP, for
instance). Anywhere you control both the client and the server, binary is
fine.

However, I do feel there is a strange swing back to binary
(Protobuf/HTTP2/etc). Developers are now trying to wedge it into places
where it may cause more problems, because it is more efficient in
performance but not in use or implementation. Plus, as mentioned in this
thread, you can compress JSON to be very small to send over the wire, which
makes the compactness a non-issue in non-real-time cases. Going binary just
to go binary is more trouble than it is worth in most cases.

- Binary, compared to keyed plain text (JSON), is harder to parse
generically, e.g. pulling just a few fields/keys out of dictionaries/lists.

- Binary also seems to lock down messaging more: changing explicit binary
messages is more work because of offset issues, and client/server tools
must stay in sync, rather than just adding a new key that can be pulled as
needed.

- Third-party implementation and parsing of JSON/XML is more forgiving,
making version upgrades and changes easier to do. This is especially
apparent on projects that are taken over by other developers.

- The language/platform on the backend leaks into the messaging. For
instance, Protobuf only runs on js/python currently and has various
versions. The best messaging is independent of the platform, and versioning
is easier.

I would bet binary formats end up causing more bugs than keyed plaintext
(JSON/XML, possibly compressed), though I have nothing to back that up
except my own experience, largely in game development, where networking
state is almost always binary. For server/data I wouldn't use it unless it
needs to be real-time.

That being said, Protobuf is awesome, and I hope developers are using it
where it is best suited, and that they don't start obfuscating messaging
for performance where it doesn't really need to be. Better to be simple
unless you need to make it more complex, at every level.

------
omouse
At work we're using HTTP requests, and in the last few months we added
RabbitMQ to deal with the fact that our frontend has to talk to our
backend. After seeing this article it feels like we chose the wrong tool
for the job; protobuf/thrift appear to be _typed_, which would have saved
us a lot of frustration, as we've already run into multiple cases where the
receiver or sender messed up the type conversion or parsing.

~~~
tokenizerrr
I don't see how protobuf is mutually exclusive with RabbitMQ. RabbitMQ is a
message broker and can send around byte arrays. These byte arrays can be
anything, including protobuf messages.

~~~
kajecounterhack
(The above, but yes, send protobufs because they are typed.)

~~~
tokenizerrr
Sorry, what do you mean?

------
vruiz
I guess it only makes sense if you are already using protobuf everywhere
else in your stack, especially if you are leveraging gRPC[0], which is
already protobuf over HTTP/2. The network-tab problem could be solved by an
extension, or browsers could offer the tools built in if this were to
become a trend.

[0] [http://www.grpc.io/](http://www.grpc.io/)

~~~
soldergenie
gRPC doesn't work in the browser. This is stated explicitly in the FAQ.

See [https://groups.google.com/forum/#!topic/grpc-io/5Ic8MKgltwY](https://groups.google.com/forum/#!topic/grpc-io/5Ic8MKgltwY)

~~~
rektide
It's rather painful that they don't seem to have any design docs up for
their HTTP transport, leaving it to things like FAQ entries to explain
these details. This was my first thought too: gRPC does this.

~~~
soldergenie
The protocol is documented: [https://github.com/grpc/grpc-common/blob/master/PROTOCOL-HTT...](https://github.com/grpc/grpc-common/blob/master/PROTOCOL-HTTP2.md)

However, you still need the FAQ to figure out that browser transport isn't
supported.

------
swalsh
I always wondered why google decided to build Protocol Buffers. ASN.1 seemed
like it worked well, and it covered all the corners.

~~~
VikingCoder
Here was Kenton Varda's response:

[https://groups.google.com/forum/#!topic/protobuf/eNAZlnPKVW4](https://groups.google.com/forum/#!topic/protobuf/eNAZlnPKVW4)

 _My understanding of ASN.1 is that it has no affordance for forwards- and
backwards-compatibility, which is critical in distributed systems where the
components are constantly changing._

...

 _OK, I looked into this again (something I do once every few years when
someone points it out)._

 _ASN.1 by default has no extensibility, but you can use tags, as I see you
have done in your example. This should not be an option. Everything should
be extensible by default, because people are very bad at predicting whether
they will need to extend something later._

 _The bigger problem with ASN.1, though, is that it is way over-complicated.
It has way too many primitive types. It has options that are not needed.
The encoding, even though it is binary, is much larger than Protocol
Buffers'. The definition syntax looks nothing like modern programming
languages. And worst of all, it's very hard to find good ASN.1
documentation on the web._

 _It is also hard to draw a fair comparison without identifying a
particular implementation of ASN.1 to compare against. Most implementations
I've seen are rudimentary at best. They might generate some basic code, but
they don't offer things like descriptors and reflection._

 _So yeah. Basically, Protocol Buffers is a simpler, cleaner, smaller, faster,
more robust, and easier-to-understand ASN.1._

~~~
kentonv
Man that guy sounds full of himself.

Ugh, was that only 5 years ago?

------
labianchin
I wonder what that would be like with Avro. It also has a JSON encoding:
[https://avro.apache.org/docs/1.7.7/spec.html#json_encoding](https://avro.apache.org/docs/1.7.7/spec.html#json_encoding)

------
jebblue
"While I see the need for Protobuf and Thrift for services communication, I
don't really see the point of using it instead of JSON for the frontend."

Ah Ok whew, so the title was wrong or designed for click bait.

~~~
ohitsdom
Has the title been updated? It currently is "Using Protobuf instead of JSON to
communicate with a front end", which is not click bait at all. The author used
Protobuf instead of JSON as an experiment, and concluded that there is no
reason to use it.

------
nly
Thrift has a JSON encoding out of the box.

~~~
haberman
proto3 (currently in alpha) does too: [https://developers.google.com/protocol-buffers/docs/proto3#j...](https://developers.google.com/protocol-buffers/docs/proto3#json)

------
zapov
You can use JSON just as a codec, which can give you the performance of
Protobuf with much better debuggability.

------
imaginenore
Have you guys tried MsgPack? If so, is it worth it?

[http://msgpack.org/](http://msgpack.org/)

~~~
7b64f0f2
Used it to transmit data over 0MQ, worked flawlessly.

~~~
placebo
same here :)

