
Show HN: ffjson: faster json serialization in Go - pquerna
https://journal.paul.querna.org/articles/2014/03/31/ffjson-faster-json-in-go/
======
haberman
As someone who has been working on parsing/serialization for many years, I can
absolutely confirm that generating schema-specific code will beat generic code
almost every time.

The article discovers an important tradeoff: speed vs. convenience. Generated
code is faster but less convenient, because it adds steps to your compile. And
you pay this compile-time overhead for every message type you want to
manipulate. The pain of this generated code was one of Kenton Varda's
motivations for creating Cap'n Proto after working on Protocol Buffers for
years. Unlike Protocol Buffers, Cap'n Proto doesn't need to generate
parsing/serialization code because its serialization format also works as an
in-memory format.

I have taken a somewhat different approach to the problem, with my Protocol
Buffer / JSON / etc. serialization framework upb
([https://github.com/haberman/upb](https://github.com/haberman/upb)). Instead
of using static code generation, I use a JIT approach where I generate
specialized code on-the-fly. This approach is particularly important when you
are wrapping the library in an interpreted language like
Python/Ruby/Lua/JavaScript, because users of these languages don't have a
compile cycle at all, so adding one is a large inconvenience.

My library isn't ready for prime time but I'm hoping to have something ready
to use this year.

~~~
pcwalton
> The article discovers an important tradeoff: speed vs. convenience.
> Generated code is faster but less convenient, because it adds steps to your
> compile.

Rust's approach to serialization eliminates this unnecessary tradeoff, IMO: it
uses the built-in macro system to parse the structure of your serializable
data types and generates the specialized code at compile time. There is no
special build step, you get the full speed of specialized code, and it's as
simple as writing "#[deriving(Encodable)]" at the top of your data structures.

(As an added bonus, we're using the same infrastructure to generate trace
hooks for precise GC in Servo, which eliminates a major pain point of
integrating systems-level code with precise garbage collectors.)

~~~
p0nce
Well, yeah, D does this routinely too, no macros needed.

[https://github.com/atilaneves/cerealed](https://github.com/atilaneves/cerealed)

[https://github.com/msgpack/msgpack-d](https://github.com/msgpack/msgpack-d)

[https://github.com/Orvid/JSONSerialization](https://github.com/Orvid/JSONSerialization)

~~~
pcwalton
"static if", __traits, and mixin are the D equivalent of Rust's macros, more
or less.

------
hierro
This is way more complicated than it needs to be. Use
code.google.com/p/go.tools/go/types to process the AST and you'll get
basically the same information that the compiler sees. With that you can
generate the code pretty easily. For comparison, our JSON code generator
implementation is just ~350 lines and supports having different JSON
representations for the same type, varying depending on the container type.

Also, if you want to make the serialization faster you need to understand
exactly what makes encoding/json slow (hint: it's not only reflect) and remove
all reasonable bottlenecks. You state that megajson does not support the
MarshalJSON interface like that's a bad thing, but I'm pretty sure that's
deliberate, because it's indeed a feature. When encoding/json encounters a type
which implements MarshalJSON it does the following:

1 - Call MarshalJSON to obtain its JSON representation as []byte

2 - Validate the produced JSON using a slower-than-the-bad-guy's-horse
function-based state machine which processes each character individually

3 - Copy the []byte returned by MarshalJSON into its own buffer

Unsurprisingly (after reading encoding/json's code, of course), having a
MarshalJSON method is way slower than letting encoding/json use reflection
unless the JSON you're generating is trivial and has almost no nesting,
because the reflection path avoids the extra allocations, the copy and the
validation step.

------
chimeracoder
Shameless plug, but this looks like the exact inverse of gojson[0], which
generates code (struct definitions) based on sample JSON.

I originally wrote it when writing a client library to interface with a third-
party API; it saves a lot of effort compared to typing out struct definitions
manually, and a lot of type assertions compared to using
map[string]interface{} everywhere.

[0]
[https://github.com/ChimeraCoder/gojson](https://github.com/ChimeraCoder/gojson)

------
hannibalhorn
The implementation of this is pretty interesting, in that it generates code
that imports your code, compiles it, and then uses reflection to generate the
serialization code. And in the end, that worked out better for the author than
using the AST.

~~~
lucian1900
If anything, that would be an indication that the tooling is nowhere near good
enough. Any reflection should be doable entirely with data the compiler has
anyway.

~~~
timtadh
Actually there is a tool for this. I think the author was simply unaware of
it. It is called oracle* and is able to answer this typing question and many
more.

*: [https://code.google.com/p/go/source/browse/cmd/oracle/main.go?repo=tools](https://code.google.com/p/go/source/browse/cmd/oracle/main.go?repo=tools)

------
AYBABTME
I made a library this weekend that doesn't need code generation to achieve a
2x improvement[1] over the standard library.

While the OP only implemented the encoding part, I only implemented the
decoding part =):

[https://github.com/aybabtme/fatherhood](https://github.com/aybabtme/fatherhood)

So I guess they overlap nicely in that respect.

[1]:[https://github.com/aybabtme/fatherhood#performance](https://github.com/aybabtme/fatherhood#performance)

------
United857
Looking at the name, at first I thought ffmpeg author and general programming
god Fabrice Bellard had come up with it.

------
kevrone
Seems like this approach might also work for building type-specific collection
implementations!

------
mwsherman
On the .Net side, Jil is doing something similar, creating a custom static
serializer per type. It's able to do the code generation at runtime by
emitting IL:
[https://github.com/kevin-montrose/Jil](https://github.com/kevin-montrose/Jil)

------
eikenberry
You can get nearly the same speedup by just avoiding reflection. You unmarshal
into an interface{}, then pull the data out manually using type assertions as
necessary. In my last project I think I got about a 1.6-1.7x speedup this way.

~~~
alecthomas
Nice hack (in the hacker sense), but not exactly convenient.

------
otterley
Good stuff, there.

Feature request: optionally emit canonicalized (key-sorted) JSON.

------
knodi
This is a good example of why everyone should avoid using the reflect package
as much as possible.

I use reflect for quick development and then remove it before production roll
out.

~~~
Dobbs
I've had to use reflect due to missing functions in an upstream API. We could
have forked the upstream package or used reflect in one function.

We decided to put in a patch and use reflect until it is accepted.

