
Show HN: A Fast, Malloc-Free C++14 Json Parser and Encoder - matt42
https://github.com/matt-42/iod#a-fast-malloc-free-json-parser--encoder
======
jlarocco
This uses std::string all over the place, which allocates memory under the
hood.

I'm not sure how any JSON parser could avoid memory allocation without a
difficult-to-use interface. Numbers, arrays, strings, and objects are
unbounded by the JSON spec, so a truly malloc-free library would need to
provide a kind of streaming interface where things are returned in
fixed-size chunks.

JSON wouldn't be my first choice for data storage in situations where I needed
to avoid dynamic memory.

~~~
areop
I wrote a JSON-Parser in C with minimal validation in ~250 LOC that does not
allocate memory, but leaves the type check and conversion to the user.
[https://github.com/vurtun/json](https://github.com/vurtun/json)

~~~
klibertp
This is absolutely beautiful. I didn't know you could take the address of a
label in C, or goto an address held in a variable instead of a literal
label. Is this new in C, or was it always there?

~~~
dietrichepp
It's a GCC extension.
[https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html](https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html)

~~~
techdragon
Is there an equivalent for Clang/LLVM?

~~~
cjcole
Yes.

[http://blog.llvm.org/2010/01/address-of-label-and-
indirect-b...](http://blog.llvm.org/2010/01/address-of-label-and-indirect-
branches.html)

A venerable interpreter implementation trick.
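
For readers who have not seen the trick, here is a minimal sketch of computed-goto dispatch in a toy interpreter. The opcodes and the `run` function are made up for illustration; the only real feature used is the GCC/Clang "labels as values" extension (`&&label` and `goto *ptr`), which is not ISO C or C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical opcodes for a toy accumulator machine (illustration only).
enum Op : unsigned char { OP_INC, OP_DOUBLE, OP_HALT };

// Executes `code` until OP_HALT and returns the accumulator.
// Relies on the GCC/Clang extension: &&label yields a label's address,
// and `goto *ptr` jumps to an address held in a variable.
int run(const unsigned char* code) {
    assert(code != nullptr);
    // One handler address per opcode, indexed by the opcode value.
    static void* const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;
    std::size_t pc = 0;

    goto *dispatch[code[pc++]];   // jump to the first instruction's handler

do_inc:
    acc += 1;
    goto *dispatch[code[pc++]];   // each handler dispatches the next opcode itself
do_double:
    acc *= 2;
    goto *dispatch[code[pc++]];
do_halt:
    return acc;
}
```

Compared with a `switch` inside a loop, each handler ends with its own indirect jump, which is the property interpreters usually want from this technique. The sketch does no bounds or opcode validation.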

------
haberman
> As of today, all json parsers rely dynamic data structures to parse and
> store the json objects.

I'm not sure that's entirely fair. Callback-based parsers like YAJL leave the
application free to store the data in whatever data structure they want, or
even to stream-process the input without storing in a data structure at all.

But regardless, the meta-programming approach described here is interesting
and novel. Generating structure-specific parsing code is a well-explored area
(for example, Protocol Buffers is designed entirely around this idea), but
doing it as C++ metaprogramming is a novel approach (Protocol Buffers relies
on compile-time code generation).

I don't actually understand how the object inspection and compile-time codegen
works with this meta-programming approach; will be interesting to dig in a
little deeper and learn more.
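
A rough sketch of the general idea, not of how iod itself works: in plain C++14 a struct can be described by a hand-written table of (name, pointer-to-member) pairs, and a template can then expand one statement per field at compile time. Everything here (`Field`, `field`, `fields_of`, `encode`, the `User` struct) is a hypothetical name for illustration, and the output uses `std::string`, so this sketch is not malloc-free.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>

// A field descriptor: the JSON key plus a pointer to the struct member.
template <class T, class M>
struct Field { const char* name; M T::* member; };

template <class T, class M>
constexpr Field<T, M> field(const char* name, M T::* member) { return {name, member}; }

struct User { int id; std::string name; };

// Hand-written field table, standing in for real static introspection.
inline auto fields_of(const User*) {
    return std::make_tuple(field("id", &User::id), field("name", &User::name));
}

// One overload per supported value type; string escaping is omitted.
inline void write_value(std::string& out, int v) { out += std::to_string(v); }
inline void write_value(std::string& out, const std::string& v) {
    out += '"'; out += v; out += '"';
}

// Expands to one "key":value emission per field, in declaration order.
template <class T, class Tuple, std::size_t... I>
void encode_fields(std::string& out, const T& obj, const Tuple& fs,
                   std::index_sequence<I...>) {
    int expand[] = {0, ((out += (I ? ",\"" : "\""),
                         out += std::get<I>(fs).name,
                         out += "\":",
                         write_value(out, obj.*(std::get<I>(fs).member))), 0)...};
    (void)expand;
}

template <class T>
std::string encode(const T& obj) {
    auto fs = fields_of(&obj);
    std::string out = "{";
    encode_fields(out, obj, fs,
                  std::make_index_sequence<std::tuple_size<decltype(fs)>::value>{});
    out += "}";
    return out;
}
```

With this table, `encode(User{7, "ann"})` yields `{"id":7,"name":"ann"}`. Because the field list and each member's type are known to the compiler, the per-field code is selected by overload resolution rather than by a runtime type tag; a decoder could use the same table to check for missing fields.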

~~~
nly
I've used Boost Fusion associative maps, in conjunction with a bit of glue
code and an appropriate JSON or XML library, to provide pretty seamless
serialisation and de-serialisation for both XML and JSON for a while. For XML
it's a little tricky because you have to have a means of mapping the simple
flat relationship between a struct and its fields, to the various
relationships between XML elements, subelements, attributes, and CDATA etc.
Here's an example of what my declarations for XML look like in some code I
wrote last year.

    
    
        BOOST_FUSION_ADAPT_ADT (
            xml_encoded<Description>,
            XML_ATTR        (string, "summary")
            XML_TEXT        (string)
        )
    
        BOOST_FUSION_ADAPT_ADT (
            xml_encoded<Event>,
            XML_ATTR        (string, "name")
            XML_SUBTREE     (shared_ptr<Description const>, "description")
            XML_ATTR        (optional<unsigned>, "since")
        )
    

The #defines for each macro are short, and beyond Fusion there's only a small
support header.

Here's a talk from CppCon describing a similar use case, but for binary data
formats
[https://www.youtube.com/watch?v=wbZdZKpUVeg](https://www.youtube.com/watch?v=wbZdZKpUVeg)

------
huhtenberg
On one hand this is undoubtedly a very clever use of C++ features. On the other
hand that's a _heck of a lot_ of scaffolding (just look in the /iod directory) and the
more scaffolding there is, the more caution is needed in adopting the code.

The same goal - not parsing what's not needed - can be achieved with
conventional callback-based C code. You basically go through the json data,
parse, say, a field and call the app asking "is this ok? shall I proceed?". If
it's a yes, then you indeed proceed and parse out the value chunk and pass it
to the app the same way. If it's a no, you either abort or skip over the
value. The end effect is the same - parsing of an invalid json input is
aborted as soon as the app flags the first bad entry; and unwanted fields are
never parsed in full.
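
The flow described above can be sketched roughly as follows. This is an illustrative sketch under loud assumptions: it handles only a flat object with string or bare scalar values, does almost no validation, and every name (`Slice`, `Decision`, `Handler`, `parse_flat_object`, `parse_age`) is hypothetical rather than taken from vurtun's library or YAJL. It is written in C++ to match the rest of the thread, but uses only plain function pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Slice { const char* ptr; std::size_t len; };   // view into the input, no copy
enum class Decision { Take, Skip, Abort };

// on_key is the "is this ok? shall I proceed?" question; on_value gets the raw token.
struct Handler {
    void* ctx;
    Decision (*on_key)(void* ctx, Slice key);
    bool (*on_value)(void* ctx, Slice key, Slice raw);
};

static bool is_ws(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }
static const char* skip_ws(const char* p, const char* end) {
    while (p < end && is_ws(*p)) ++p;
    return p;
}
// Delimits a quoted string (contents only); returns one past the closing quote.
static const char* scan_string(const char* p, const char* end, Slice* out) {
    if (p >= end || *p != '"') return nullptr;
    const char* start = ++p;
    while (p < end && *p != '"') {
        if (*p == '\\' && ++p >= end) return nullptr;  // step over the escaped char
        ++p;
    }
    if (p >= end) return nullptr;
    out->ptr = start; out->len = static_cast<std::size_t>(p - start);
    return p + 1;
}
// Delimits a value without converting it; string values keep their quotes.
static const char* scan_value(const char* p, const char* end, Slice* out) {
    const char* start = p;
    if (p < end && *p == '"') {
        p = scan_string(p, end, out);
        if (!p) return nullptr;
    } else {
        while (p < end && *p != ',' && *p != '}' && !is_ws(*p)) ++p;
        if (p == start) return nullptr;
    }
    out->ptr = start; out->len = static_cast<std::size_t>(p - start);
    return p;
}
// Walks {"k": v, ...}; stops at the first Abort or rejected value.
bool parse_flat_object(const char* p, const char* end, const Handler& h) {
    p = skip_ws(p, end);
    if (p >= end || *p != '{') return false;
    p = skip_ws(p + 1, end);
    if (p < end && *p == '}') return true;
    for (;;) {
        Slice key{nullptr, 0}, value{nullptr, 0};
        p = scan_string(skip_ws(p, end), end, &key);
        if (!p) return false;
        p = skip_ws(p, end);
        if (p >= end || *p != ':') return false;
        Decision d = h.on_key(h.ctx, key);
        if (d == Decision::Abort) return false;
        p = scan_value(skip_ws(p + 1, end), end, &value);
        if (!p) return false;
        if (d == Decision::Take && !h.on_value(h.ctx, key, value)) return false;
        p = skip_ws(p, end);
        if (p < end && *p == ',') { ++p; continue; }
        return p < end && *p == '}';
    }
}

// Demo application: wants only an integer "age", rejects any input with a "bad" key.
struct AgeCtx { long age; bool seen; };
static bool key_is(Slice k, const char* s) {
    return k.len == std::strlen(s) && std::memcmp(k.ptr, s, k.len) == 0;
}
static Decision demo_on_key(void*, Slice key) {
    if (key_is(key, "age")) return Decision::Take;
    return key_is(key, "bad") ? Decision::Abort : Decision::Skip;
}
static bool demo_on_value(void* ctx, Slice, Slice raw) {
    long v = 0;
    for (std::size_t i = 0; i < raw.len; ++i) {
        if (raw.ptr[i] < '0' || raw.ptr[i] > '9') return false;  // type check by the app
        v = v * 10 + (raw.ptr[i] - '0');
    }
    AgeCtx* c = static_cast<AgeCtx*>(ctx);
    c->age = v; c->seen = true;
    return true;
}
bool parse_age(const char* json, long* out) {
    AgeCtx ctx{0, false};
    Handler h{&ctx, demo_on_key, demo_on_value};
    if (!parse_flat_object(json, json + std::strlen(json), h)) return false;
    if (ctx.seen) *out = ctx.age;
    return ctx.seen;
}
```

Skipped values are only delimited, never converted, and nothing is allocated; the cost is that type checks and required-field tracking live in the application's callbacks.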

So I seriously doubt that this is anything more than the marketing spin of a
proud developer -

    
    
      This makes its performances impossible to match
      in other languages such as C or Java that do not
      provide static introspection.
    

I am fairly certain that vurtun's code [1] can match and most likely beat this
lib's performance, with ease.

[1]
[https://news.ycombinator.com/item?id=8609236](https://news.ycombinator.com/item?id=8609236)

~~~
matt42
Vurtun's code is probably faster than the iod json parser since it leaves the
type check and conversion to the user. Since iod knows the structure of the
object you are parsing, it directly checks types and throws exceptions if the
json string does not contain all the required fields of the destination
object, things that you cannot do without compile-time introspection, and that
vurtun's library leaves to the user.

~~~
huhtenberg
No, I meant of course that vurtun + all the callbacks are going to be at least
as fast as iod. You'd basically pass a bit of context to every callback, so
that the app would know where exactly it is in its parsing. It's clearly a bit
more legwork in terms of coding, but I would personally take that over
introducing a dependency on a larger abstract framework. I come from an
embedded programming background, so I would opt for a pre-processor step that
takes in a struct definition and generates a proper set of parsing callbacks.
This way you'd at least see the interim code before it gets compiled. But I
digress, to each his own and that's not the point here. I have spent a lot of time
optimizing C code and I find your blanket performance claims to be too
absolute and cocky for comfort.

------
densh
See also: Scala Pickling [1]. Serialization and deserialization logic
optimised for a specific datatype is generated purely at compile time using
Scala Macros [2].

[1]
[http://lampwww.epfl.ch/~hmiller/pickling/](http://lampwww.epfl.ch/~hmiller/pickling/)

[2] [http://scalamacros.org](http://scalamacros.org)

------
twic
> This makes its performances impossible to match in other languages such as C
> or Java that do not provide static introspection.

A CHALLENGE!

So, er, who's up for it?

You could implement an analogue of this approach in Java. It's true that Java
doesn't have language constructs that would let you do this as part of
compilation, but Java has its ways. You could write an annotation processor to
do this at compile time, or use a bytecode parser at runtime (this is yucky,
but a fairly standard technique these days). Either way, the output would be a
pair of synthetic classes which implemented the parser and encoder. A tool
like this would be moderately laborious to write, but a straightforward matter
of programming.

~~~
astral303
It reminds me of jackson-afterburner[0] in Java, which generates byte code for
parsing and generating JSON.

[0]
[https://github.com/FasterXML/jackson-module-afterburner](https://github.com/FasterXML/jackson-module-afterburner)

------
nly
RapidJSON claims to support "in-situ parsing", which is presumably mostly zero
copy, and presumably doesn't allocate much either. I'd like to see benchmarks
over comparable code.

~~~
masklinn
in-situ parsing would only work when decoding from an existing buffer
(returning slices from the original buffer instead of copying the data), not
when decoding JSON from e.g. a socket, right?
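
To make the constraint concrete, here is a minimal sketch of what decoding a string "in place" can look like. It is not RapidJSON's API; `decode_string_in_place` is a hypothetical helper, and it assumes a NUL-terminated, writable buffer and supports only three escapes. The decoded text overwrites the encoded text, so the whole token has to be sitting in a mutable buffer first.

```cpp
#include <cassert>
#include <cstring>

// Decodes the JSON string whose opening quote is at *cursor, writing the
// result over the encoded bytes and NUL-terminating it inside the buffer.
// Returns a pointer into the same buffer (no copy, no allocation) and
// advances *cursor past the closing quote. Returns nullptr on malformed
// input or an unsupported escape; the buffer may be partially rewritten then.
char* decode_string_in_place(char** cursor) {
    assert(cursor != nullptr && *cursor != nullptr);
    char* read = *cursor;
    if (*read != '"') return nullptr;
    ++read;
    char* const start = read;
    char* write = read;               // never runs ahead of `read`
    while (*read != '\0' && *read != '"') {
        if (*read == '\\') {
            ++read;
            switch (*read) {
                case '"':  *write++ = '"';  break;
                case '\\': *write++ = '\\'; break;
                case 'n':  *write++ = '\n'; break;
                default:   return nullptr;  // other escapes omitted in this sketch
            }
            ++read;
        } else {
            *write++ = *read++;
        }
    }
    if (*read != '"') return nullptr;  // unterminated string
    *write = '\0';                     // terminator lands on or before the closing quote
    *cursor = read + 1;
    return start;
}
```

Data arriving from a socket would first have to be read into such a buffer (and kept alive as long as the returned pointers are used), which is where the allocation question reappears.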

------
dvt
If you're using C++, it's by definition malloc-free ;)

~~~
dvt
Random downvote? Using malloc in C++ is considered terrible practice. Should I
even source it?

------
rdtsc
> In classic C or C++, you would define a function taking optional arguments
> as :

Is that true? Can classic C have default (optional) arguments?

~~~
cremno

      void fun(int mandatory_arg, int optional_arg1 = 1, int optional_arg2 = 12, int optional_arg3 = 12);
    

No, this isn't legal ISO C. There are ways to simulate them though:
[https://gustedt.wordpress.com/2010/06/03/default-
arguments-f...](https://gustedt.wordpress.com/2010/06/03/default-arguments-
for-c99/)

~~~
matt42
My mistake, I just fixed the readme.

------
rurban
Sorry, didn't read the code yet. But malloc free means stack allocation, thus
vulnerable to stack attacks. Please clarify.

~~~
BinaryIdiot
I also haven't read the code yet but malloc free may simply mean the code
doesn't call malloc but it can still allocate things on the heap using new.

~~~
k4st
This program appears to use heap allocation, at least indirectly through its
use of std::string.

On a more general note, libraries that perform zero dynamic allocation (and
instead require the library user to pass in memory) can be very convenient for
systems programming where portability is a concern. For example, I use Intel
XED instruction encoder/decoder, which is a library that performs no dynamic
memory allocation. This allows me to use it in user space and kernel space
without hijacking the malloc symbol.
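
As a small illustration of the "caller passes in memory" style, here is a sketch of a bump arena that carves allocations out of a buffer supplied by the user. It is not XED's interface; `Arena`, `arena_init`, and `arena_alloc` are hypothetical names, and the sketch has no way to free individual allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// All state lives in caller-owned storage; the library never calls malloc or new.
struct Arena {
    unsigned char* base;   // start of the caller's buffer
    std::size_t size;      // total bytes available
    std::size_t used;      // bytes handed out so far, including padding
};

inline Arena arena_init(void* buffer, std::size_t size) {
    return Arena{static_cast<unsigned char*>(buffer), size, 0};
}

// Returns `bytes` bytes aligned to `align` (a power of two), or nullptr
// when the caller's buffer is exhausted.
inline void* arena_alloc(Arena* a, std::size_t bytes, std::size_t align) {
    assert(a != nullptr);
    assert(align != 0 && (align & (align - 1)) == 0);
    std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(a->base) + a->used;
    std::size_t pad = static_cast<std::size_t>((align - (cur & (align - 1))) & (align - 1));
    std::size_t left = a->size - a->used;
    if (pad > left || bytes > left - pad) return nullptr;
    void* p = a->base + a->used + pad;
    a->used += pad + bytes;
    return p;
}
```

The same code works whether the buffer lives on the stack, in static storage, or in memory obtained from a kernel allocator, which is what makes this style portable across user space and kernel space.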

