

Show HN: Very low footprint JSON parser in portable ANSI C  - udp
https://github.com/udp/json-parser

======
aliguori
There definitely is a lack of good JSON parsers for C. We wrote our own in
QEMU. The relevant code is:

<http://git.qemu.org/?p=qemu.git;a=blob;f=json-lexer.c;h=3cd3285825d1f8433da982eba6059169bd6776c3;hb=HEAD>

<http://git.qemu.org/?p=qemu.git;a=blob;f=json-parser.c;h=849e2156da4e7a3fad8f890370236cb6da9be716;hb=HEAD>

Among other things, this supports streaming, is fairly fast, and has gotten a
fair bit of scrutiny against malicious input.

The lexer is a hand-written state machine, which seems like something you
should never do, but it turned out to be pretty reasonable.

~~~
haberman
What's wrong with YAJL?

~~~
lflux
Nothing, YAJL got a lot of things very right. We use it in our C daemons for a
bunch of things. Not having to unpack the whole JSON into memory is pretty
handy.

------
mape
Another alternative, <https://github.com/esnme/ultrajson>

"Ultra fast JSON decoder and encoder written in C with Python bindings"

From the people that built the Battlefield 3 web portal.

Medium complex object (calls/sec):

    library       encode         decode
    ujson         18757.01101    10759.69649
    yajl           6315.14030     5887.38201
    simplejson     5542.03928     8148.35221
    cjson          4651.59072     7931.04387

------
spullara
This Show HN makes me think there needs to be a site for more formalized code
reviews of open software. Ideally with some great game mechanics to make sure
engagement is high and things are getting reviewed well.

~~~
alexchamberlain
As a younger programmer with great ambitions, some sort of code review site
would be awesome!

~~~
zoul
<http://codereview.stackexchange.com/>

------
tptacek
What happens when the input length is longer than 2^31? You used an "int" for
the length (also, why ever use a signed value for length?) --- even on LP64,
that counter wraps at ~32 bits.

(Same question applies to how you handle the max_memory computation).
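
For illustration, here is a minimal sketch of the kind of guard this is asking for. It is not the fix that went into json-parser; `add_checked` and its parameters are hypothetical names, and a `max` of 0 is assumed to mean "no limit".

```c
#include <stddef.h>

/* Hypothetical helper: adds `extra` bytes to a running total kept in an
   unsigned size_t. Returns 1 on success, 0 if the sum would wrap around
   or exceed `max` (max == 0 is taken to mean "no limit"). */
static int add_checked(size_t *total, size_t extra, size_t max)
{
    if (extra > (size_t)-1 - *total)
        return 0;                       /* the addition would wrap */
    if (max != 0 && *total + extra > max)
        return 0;                       /* over the caller's budget */
    *total += extra;
    return 1;
}
```

The wrap check is done by subtraction before the addition, so the sum is only ever computed once it is known to fit.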

~~~
udp
Added some protection against that, thanks.

------
halayli
You aren't checking the return value of json_alloc() in new_value()

~~~
udp
Well spotted! Fixed, thanks.

------
feralchimp
Is JSON guaranteed to be ASCII?

To clarify: Any "lookup table" that maps hex values to assumed character
values is a portability red flag. When using them, it's polite to add comments
to explicitly call out the code page dependency and argue (from a spec or RFC,
say) why that assumption is okay.

~~~
michael_miller
<http://www.ietf.org/rfc/rfc4627> specifies that the encoding must be Unicode:
"JSON text SHALL be encoded in Unicode. The default encoding is UTF-8."

~~~
pantaloons
Neither the execution nor source character set (of C) is guaranteed to be
ASCII though. This makes the general parsing as well as lines like "if (c >=
'A' && c <= 'F')" non-portable.

~~~
udp
Non-portable to different character sets, not platforms. One could argue that
the argument to json_parse is a UTF-8 string.
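
As a sketch of what a character-set-independent hex decode could look like (the name `hex_value` is made up here, and this is not code from the project): C only guarantees that '0' through '9' are contiguous, so the version below avoids range comparisons on letters and instead searches a string of character constants.

```c
#include <string.h>

/* Returns 0-15 for a hexadecimal digit, or -1 for anything else.
   Relies only on character constants, not on their numeric values. */
static int hex_value(int c)
{
    static const char digits[] = "0123456789abcdefABCDEF";
    const char *p;
    int idx;

    if (c == '\0')
        return -1;          /* strchr would otherwise match the terminator */
    p = strchr(digits, c);
    if (p == NULL)
        return -1;
    idx = (int)(p - digits);
    return idx < 16 ? idx : idx - 6;    /* map 'A'..'F' onto 10..15 */
}
```

Whether this matters depends on which side of the argument above you take: if the input is defined to be UTF-8 bytes, comparing against numeric values such as 0x41 is arguably the more correct choice than comparing against 'A'.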

------
lflux
ts=3? Is this some sort of subtle troll to irritate all factions of tab stop
religions?

------
chops
Interesting. I may try my hand at learning some Erlang NIF creation using
this. Then I can benchmark it against Bob Ippolito's mochijson2 module.

Could be a fun little exercise.

------
roschdal
I like using Jansson: <http://www.digip.org/jansson/>

~~~
scumola
+1 for Jansson here too. Lightweight and works really well.

------
schlecht

      const json_char *cur_line_begin, *i;
      ...
      top->u.dbl = strtod (i, (json_char **) &i);
      top->u.integer = strtol (i, (json_char **) &i, 10);
    

Ick
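
Presumably the complaint is the cast that strips `const` so the cursor can double as the end pointer. One way to avoid it, sketched with a hypothetical helper `parse_double` (not part of json-parser), is to give `strtod` a separate non-const end pointer and copy it back afterwards:

```c
#include <stdlib.h>

/* Hypothetical helper: parses a double at *cursor. On success stores the
   value in *out, advances *cursor past the number and returns 1.
   Returns 0 and leaves *cursor untouched if no number was found. */
static int parse_double(const char **cursor, double *out)
{
    char *end;                      /* strtod wants a plain char ** */
    double v = strtod(*cursor, &end);

    if (end == *cursor)
        return 0;                   /* nothing was consumed */
    *out = v;
    *cursor = end;                  /* char * converts to const char * */
    return 1;
}
```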

------
peter_l_downs
Seems similar to the one up on CCAN [1], which is also BSD-MIT licensed.

EDIT: forgot to mention that it includes a bunch of great helper functions,
too.

~~~
rmgraham
[1] <http://ccodearchive.net/info/json.html>

------
tcas
I've used cJSON (<http://sourceforge.net/projects/cjson/>) in the past, which
worked very well for what I needed (simple 1 file JSON parser for config
files). Maybe I'll give this a shot the next time I need to do some simple
JSON parsing.

You should get the project listed on <http://www.json.org/>

~~~
mikepurvis
If that's the same cJSON I was using a few months ago, I found it a lot more
memory-hungry than it needed to be. I was doing some network code with lwIP on
an embedded system, so the all-static nature of js0n (with some helper
functions) was a better fit for me.

------
m_eiman
Here's another minimalist alternative:
<https://bitbucket.org/zserge/jsmn/wiki/Home>

I've used it and think it's pretty neat. One of these days I'll get around to
releasing the helper functions we've written to make it easier to use too.

~~~
fmardini
I use it as well, and I'm very happy with it! It's been running in production
for quite a while now without any hiccups.

------
avar
Any reason you roll your own numeric() instead of using isdigit()? It's in
C89.

~~~
udp
Just to be sure it's inlined, really. Although I assume isdigit would be,
being a compiler built-in.

~~~
shtylman
Seems like a premature optimization. Don't assume, check the assembly if you
care :)

------
andrewcooke
why are the flag values not enums (and why is 4 missing?)? is using a lookup
table for decoding hex really faster than the (minimal) logic (what if it
causes cache misses)? do you really think that a state machine with bit flags
is the best way to express the logic here? is string_add meant to increment
string_length on subsequent passes? what is "json_value * cur_value" supposed
to do at the top of json_value_free (maybe i am missing some c trick here?)?

[not dissing you, just bored on a sunday afternoon...]

~~~
udp
_> why are the flag values not enums (and why is 4 missing?)?_

What would the advantage of using an enum be? (and I guess I used 4 and then
removed it later.)

 _> is using a lookup table for decoding hex really faster than the (minimal)
logic (what if it causes cache misses)?_

No idea, that's just the way I did it. Feel free to try something else and
profile if you're really that concerned.

 _> do you really think that a state machine with bit flags is the best way to
express the logic here? is string_add meant to increment string_length on
subsequent passes?_

There are only two passes, and it increments the length on both (the first is
to measure the string, the second is to know where to write in it).

 _> what is "[..] cur_value" supposed to do at the top of json_value_free
(maybe i am missing some c trick here?)?_

You're not supposed to mix code and value declarations in ANSI C, so I put it
at the top of the function. It's just used to temporarily store the value
while reading the parent.
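
The measure-then-write idea can be sketched roughly as follows. This is an illustration of the general technique, not the project's `string_add`; the function `unescape` and its simplified escape rule (a backslash just makes the next character literal) are assumptions made for the example.

```c
#include <stdlib.h>

/* Copies src, dropping the backslash in front of escaped characters.
   Pass 0 only counts the output length; pass 1 runs the same loop again
   and writes into the buffer allocated between the passes.
   Returns a malloc'd string, or NULL if allocation fails. */
static char *unescape(const char *src)
{
    char *out = NULL;
    size_t len = 0;
    int pass;

    for (pass = 0; pass < 2; ++pass) {
        const char *p;
        len = 0;
        for (p = src; *p != '\0'; ++p) {
            char c = *p;
            if (c == '\\' && p[1] != '\0')
                c = *++p;           /* take the escaped character as-is */
            if (out != NULL)
                out[len] = c;       /* second pass: write */
            ++len;                  /* both passes: advance the length */
        }
        if (pass == 0) {
            out = malloc(len + 1);
            if (out == NULL)
                return NULL;
        }
    }
    out[len] = '\0';
    return out;
}
```

The length counter is incremented on both passes, which is why it has to be reset before the second one.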

~~~
pjscott
I've converted the lookup table to a few lines of logic. I think it's more
readable, and I would definitely bet on it being faster, though since I
haven't profiled I don't know how much difference it would make.

<https://github.com/PeterScott/json-parser/commit/db9c326f74709b5a5f2904e11f44b9c165613433>

~~~
udp
Yeah, I'll go with that - cheers.

------
mappu
What are you doing with json.h:121 in _json_value::&operator[](const char*
index) when your key doesn't exist?

Still, very nice. Comparable to jsonxx, which I've been using up until now.

~~~
udp
Hmm, what should I do? (since it returns a reference). I could make it a
pointer instead, but then you wouldn't be able to chain it.

Maybe some kind of const json_null value to return when the key isn't found.

edit: Done that.
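
A rough C rendering of the sentinel idea, for anyone curious how it chains. The types and names here (`val`, `val_none`, `val_get`) are invented for the sketch and are not json-parser's actual structures.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_NULL, T_INT, T_OBJECT } val_type;

/* Hypothetical, stripped-down value type. */
typedef struct val {
    val_type type;
    long integer;                   /* used when type == T_INT */
    size_t length;                  /* number of members for T_OBJECT */
    const char **names;
    const struct val **values;
} val;

/* Shared "not found" value, so lookups never return NULL. */
static const val val_none = { T_NULL, 0, 0, NULL, NULL };

/* Looks up key in obj; returns &val_none if obj is not an object
   or the key is missing. */
static const val *val_get(const val *obj, const char *key)
{
    size_t i;

    if (obj->type != T_OBJECT)
        return &val_none;
    for (i = 0; i < obj->length; ++i)
        if (strcmp(obj->names[i], key) == 0)
            return obj->values[i];
    return &val_none;
}
```

Because a failed lookup returns a real value of type `T_NULL`, something like `val_get(val_get(root, "a"), "b")` stays safe even when "a" is missing; the caller checks the type once at the end.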

~~~
mahmud
longjmp to an earlier stage where you can "retract" the error or somehow wrap
it in a chainable form (e.g. add a union to your result to signal whether it's
a value or error, or whatever)

That's what exceptions are supposed to do. C doesn't have exceptions, so you
use setjmp/longjmp.
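
A toy sketch of the setjmp/longjmp pattern, assuming a single global jump buffer (so it is not reentrant or thread-safe). The functions `must_be_digit` and `sum_digits` are made up for the example.

```c
#include <setjmp.h>

static jmp_buf on_error;            /* where to land when something fails */

/* Returns the digit's value, or jumps back to the setjmp site. */
static int must_be_digit(char c)
{
    if (c < '0' || c > '9')
        longjmp(on_error, 1);       /* "throw" */
    return c - '0';
}

/* Returns the sum of the digits in s, or -1 if s contains a non-digit. */
static int sum_digits(const char *s)
{
    int total = 0;

    if (setjmp(on_error) != 0)
        return -1;                  /* "catch": reached via longjmp */
    for (; *s != '\0'; ++s)
        total += must_be_digit(*s);
    return total;
}
```

Note that the error path does not read `total` after the jump. Locals modified between `setjmp` and `longjmp` have indeterminate values afterwards unless they are declared `volatile`, and nothing allocated in between is freed automatically.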

------
krakensden
On a related note, if you want to get something done on a sunday afternoon,
writing a simple recursive descent JSON parser from scratch is both doable and
fun.

~~~
lmm
This should go without saying, but never use such a thing on user-supplied
data.
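
One plausible reason for the caution (my assumption, not something stated above) is that a naive recursive parser recurses once per nesting level, so hostile input can exhaust the stack. Here is a small sketch of recursive descent with a depth cap. It only accepts nested arrays of unsigned integers, which is a tiny subset and not real JSON, and `MAX_DEPTH`, `parse_value` and `is_valid` are names invented for the example.

```c
#include <stddef.h>

#define MAX_DEPTH 64                /* arbitrary cap chosen for the sketch */

static const char *skip_ws(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        ++p;
    return p;
}

/* Parses one value (an integer or an array of values). Returns a pointer
   just past it, or NULL on a syntax error or excessive nesting. */
static const char *parse_value(const char *p, int depth)
{
    p = skip_ws(p);
    if (*p == '[') {
        if (depth >= MAX_DEPTH)
            return NULL;            /* refuse to recurse any deeper */
        p = skip_ws(p + 1);
        if (*p == ']')
            return p + 1;           /* empty array */
        for (;;) {
            p = parse_value(p, depth + 1);
            if (p == NULL)
                return NULL;
            p = skip_ws(p);
            if (*p == ']')
                return p + 1;
            if (*p != ',')
                return NULL;
            ++p;
        }
    }
    if (*p >= '0' && *p <= '9') {
        while (*p >= '0' && *p <= '9')
            ++p;
        return p;
    }
    return NULL;
}

/* Returns 1 if text is exactly one valid value, 0 otherwise. */
static int is_valid(const char *text)
{
    const char *end = parse_value(text, 0);
    return end != NULL && *skip_ws(end) == '\0';
}
```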

------
cpeterso
The _inline_ keyword is optional and redundant for member functions defined
within a class or struct declaration.

------
schlecht
Where c is `json_c', "return c > 127 ? 0xFF : hex_table [c];" always returns
false due to the range of types.
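
To spell out the issue with a sketch: assuming the type in question is plain `char` and that `char` is a signed 8-bit type on the platform, its maximum value is 127, so `c > 127` can never be true and bytes above 0x7F show up as negative numbers. Casting to `unsigned char` before comparing (or before indexing a table) sidesteps that. `is_high_byte` is a made-up name for the example.

```c
/* Returns 1 if the byte is outside the 7-bit range, 0 otherwise.
   The cast makes the comparison meaningful where plain char is signed. */
static int is_high_byte(char c)
{
    return (unsigned char)c > 127;
}
```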

~~~
pjscott
Nice catch. It's fixed in the latest version.

------
schlecht
Why are you doing "#define numeric(b) ((b) >= '0' && (b) <= '9')" instead of
say isdigit(3) ?
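
For comparison, a short sketch of both forms side by side. The wrapper name `is_digit_char` is hypothetical. Two things worth knowing: `isdigit` must be given a value representable as `unsigned char` (or EOF), hence the cast, and the macro evaluates its argument twice, so something like `numeric(*p++)` would misbehave.

```c
#include <ctype.h>

/* The macro as quoted above; note that (b) is evaluated twice. */
#define numeric(b) ((b) >= '0' && (b) <= '9')

/* Wrapper that makes isdigit safe to call with a plain char. */
static int is_digit_char(char c)
{
    return isdigit((unsigned char)c) != 0;
}
```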

------
robocop
Where are the tests?

~~~
reidrac
Oh, thanks. I was starting to feel weird because nobody was saying anything
about the lack of tests.

HN may or may not work as a code review platform, but I don't think I would
use third-party software that doesn't provide tests myself.

