
Mison: A Fast JSON Parser for Data Analytics [pdf] - fanf2
http://www.vldb.org/pvldb/vol10/p1118-li.pdf
======
posnet
Someone already implemented it in Rust.

[https://github.com/pikkr/pikkr](https://github.com/pikkr/pikkr)

~~~
XR0CSWV3h3kZWg
Looks like this is just a library so far. It'd be nice to see something that
could compete with q using this.

~~~
mohaba
Do you mean the tool 'jq'?

~~~
XR0CSWV3h3kZWg
whoops yep.

------
zapov
Gson best of the breed? Oh my :(

[https://github.com/eishay/jvm-serializers/wiki](https://github.com/eishay/jvm-serializers/wiki)

[https://github.com/fabienrenaud/java-json-benchmark](https://github.com/fabienrenaud/java-json-benchmark)

[https://github.com/kostya/benchmarks](https://github.com/kostya/benchmarks)

~~~
javajosh
They quite clearly state that Jackson is the fastest, later in the paper.

~~~
zapov
But it's not. It's just the default/most popular one.

It's upwards of 2x slower than the fastest Java one, as clearly shown in the
links.

~~~
javajosh
I read the paper quickly, but Jackson was really really fast. Note that their
charts are measuring _throughput_, so a larger bar is better.

~~~
javajosh
Dammit, I re-read the paper and I was wrong, you were right. But I can't
delete/change my comment, so please downvote away. (I hate wrong information
on the internet, and to be the source of it is a horrible feeling. Sorry.)

------
XR0CSWV3h3kZWg
So the big wins are when only a couple of fields need to be read and the
majority of the data can be ignored. They've even integrated this into Spark.
It'd be really nice to see the code released!

~~~
lower
> It'd be really nice to see the code released!

I'm surprised the conference accepted the paper without the source code being
made public.

------
jarym
"A key challenge for achieving these features is to jump directly to the
correct position of a queried field without having to perform expensive
tokenizing steps to find the field" - I think jsoniter does that (or
something similar) [http://jsoniter.com](http://jsoniter.com)
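
For anyone wondering what "jumping directly to a field" could look like, here is a rough scalar sketch in Python. It is not the paper's code (which has not been released) and not jsoniter's API. The names `top_level_colons`, `name_before` and `SpeculativeLookup` are made up for illustration. It assumes keys contain no escaped quotes, and it scans character by character where a real implementation would use SIMD bitmaps. The idea is to index the colons at nesting depth 1, remember at which ordinal a field was last found, and check that guess first on the next record.

```python
def top_level_colons(record):
    """Positions of ':' at nesting depth 1 that are outside string literals."""
    colons, depth, in_string, escaped = [], 0, False, False
    for i, ch in enumerate(record):
        if in_string:
            if escaped:
                escaped = False          # this character was escaped, skip it
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        elif ch == ":" and depth == 1:
            colons.append(i)
    return colons


def name_before(record, colon):
    """Key that precedes a colon (assumes the key has no escaped quotes)."""
    end = record.rindex('"', 0, colon)
    start = record.rindex('"', 0, end)
    return record[start + 1:end]


class SpeculativeLookup:
    """Hypothetical helper: remembers the ordinal where each field was last seen."""

    def __init__(self):
        self.guess = {}

    def colon_for(self, record, field):
        colons = top_level_colons(record)
        k = self.guess.get(field)
        # Fast path: the field sits at the same ordinal as in an earlier record.
        if k is not None and k < len(colons) and name_before(record, colons[k]) == field:
            return colons[k]
        # Slow path: check every top-level key and update the guess.
        for k, colon in enumerate(colons):
            if name_before(record, colon) == field:
                self.guess[field] = k
                return colon
        return None
```

The value of the field starts right after the returned colon, so nothing else in the record has to be tokenized. When records share a layout, the fast path needs a single key comparison.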

------
fefe23
SIMD optimization is actually not as easy as it sounds. You can use SIMD to
greatly speed up looking for ", for example, but with JSON that is not enough,
because there could be an embedded " in the string, escaped as \". And if you
find a ", checking whether the previous character is a \ is not enough either,
because it could be \\" (an escaped backslash at the end of the string).

The main takeaway of this, to me, is: if you design a language like JSON, make
the grammar easily parsable. Escape " as \22 for example, or =22 or basically
anything not containing the escaped character, then you can use SIMD to look
for the end of the string very efficiently.
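
To make the escaping problem concrete, here is a minimal scalar sketch in Python (the function name `structural_quotes` is made up for illustration). A quote is a real string delimiter only if it is preceded by an even number of consecutive backslashes. A SIMD parser would presumably compute the same parity with bit operations over quote and backslash bitmaps; this loop only shows the rule itself.

```python
def structural_quotes(data):
    """Return the indices of double quotes in `data` (bytes) that are not escaped."""
    positions = []
    backslash_run = 0  # number of consecutive backslashes right before index i
    for i, byte in enumerate(data):
        if byte == 0x5C:  # backslash: extend the current run
            backslash_run += 1
            continue
        # An even run means every backslash is itself escaped, so the quote is real.
        if byte == 0x22 and backslash_run % 2 == 0:
            positions.append(i)
        backslash_run = 0
    return positions
```

On the bytes `{"a":"x\"y","b":"z\\"}` this skips the quote after `x\` (one backslash) but keeps the quote after `z\\` (two backslashes), which is exactly the case where looking only at the previous character gives the wrong answer.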

~~~
fnord123
My main takeaway is that if you care about performance so much, just ETL the
data to be avro, thrift, protobuf, hdf5, netcdf, parquet, arrow, anything but
plain text.

~~~
frankmcsherry
When I was grinding away on the 128B edge graph on my laptop, literally 85% of
the time was parsing integers. Get your data to native binary as soon as you
can (edit: or out of plain text at least, per parent).
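
As an illustration of the difference (not the format actually used above, which the comment does not describe), here is a small Python sketch of the same edge list as text and as packed little-endian 32-bit pairs. The function names and the `<II` layout are assumptions, and node ids are assumed to fit in 32 bits.

```python
import struct


def edges_from_text(text):
    """Parse 'src dst' lines: every number is converted digit by digit."""
    edges = []
    for line in text.splitlines():
        src, dst = line.split()
        edges.append((int(src), int(dst)))
    return edges


def edges_to_binary(edges):
    """Pack each edge as two unsigned 32-bit little-endian integers (8 bytes)."""
    return b"".join(struct.pack("<II", src, dst) for src, dst in edges)


def edges_from_binary(blob):
    """Decode fixed-width records: no delimiter search, no digit parsing."""
    return list(struct.iter_unpack("<II", blob))
```

With fixed-width records the reader knows where every edge starts, so the file can also be memory-mapped or split across workers without scanning for newlines.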

------
kuwze
Reminds me of the fastest JSON parser written in D[0].

[0] [http://forum.dlang.org/thread/20151014090114.60780ad6@marco-...](http://forum.dlang.org/thread/20151014090114.60780ad6@marco-toshiba)

------
js8
I am surprised they don't mention succinct trees in the references. Looks like
this was invented independently.

