
Building a high performance JSON parser - grey-area
https://dave.cheney.net/high-performance-json.html
======
masklinn
[in Go] seems relevant enough to be added to the title, as other languages
could fairly easily bind to existing high-performance JSON parsers like
sajson or simdjson, but doing so from Go incurs all the usual cgo issues.

~~~
tych0
What are the usual cgo issues? I've used it a fair bit and haven't had any
general problems.

There are specific problems, like not being able to use syscalls that require
a single-threaded process (a royal pain for container managers, since setns()
is one), but binding to a library like simdjson wouldn't have this problem.

~~~
masklinn
> What are the usual cgo issues? I've used it a fair bit and haven't had any
> general problems.

1\. cgo calls have much higher overhead than regular function calls, so for
small documents you'd likely lose performance rather than gain it. Even for
large documents, depending on how you read from the parser, it can be
terrible; callback-based libraries are worse still, as calling Go from C is
even slower

2\. concurrency can suffer a lot, as a cgo call will prevent the
corresponding goroutine from being switched out, locking up a scheduler
thread

3\. cgo complicates builds, especially cross-compilation

4\. it also makes deployments more complicated if you were relying on just
syncing a statically linked binary

5\. this might have improved since, but most of the built-in Go development
tools used to be unable to cross the cgo barrier, and non-Go devtools
generally don't support Go

------
sbr464
In the main scanner function, a few minor performance-squeezing notes I'd
like to test:

1\. Moving "length := 0" above the for loop, since it's reassigned in all the
needed cases.

2\. To avoid an extra "if whitespace[c]" check, including the whitespace
cases in the main switch statement, even if it means duplicating or moving
"s.br.release"?

Or, using a switch statement vs a lookup ("whitespace[c]"), if it must be
done.

3\. In the switch statement, using multiple assignment (in most cases):

      length = validateToken(&s.br, "false")
      s.pos = length

      // vs. (note: Go doesn't support chained assignment, so the combined
      // form below would need to be written as two statements)
      s.pos = length = validateToken(&s.br, "false")

4\. In the String and default cases, inlining the length assignment within the
if statement.

5\. Returning "s.br.window()[:length]" in each case vs breaking out of the
switch statement to return, even though it's ugly, to avoid one step.

6\. I'm curious if any performance could be gained by including more cases
for common characters (A-Z, a-z, 0-9) to avoid using the default case,
testing whether there is a penalty for a default case vs more cases, even if
it's ugly.

7\. Including additional cases for exact values to avoid extra function calls
to "parseString(&s.br)" or "s.parseNumber()".

8\. I'm curious whether, in some cases, peeking at the next character with a
nested switch statement could avoid additional iterations or function calls
to validate/release.

9\. In the whitespace check, peeking for common JSON formatting patterns to
avoid iterations, such as 2- or 4-space-indented JSON, or a newline followed
by tabs or spaces. Or possibly establishing that the JSON is
"probably2Spaced/probably4Spaced" and then peeking more efficiently?
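A minimal sketch of point 2, folding whitespace into the main switch instead of a separate lookup. The function name and buffer shape here are my own, not the article's scanner:

```go
package main

import "fmt"

// nextTokenStart is a hypothetical sketch: whitespace bytes are handled
// as ordinary cases of the main switch rather than via a separate
// "if whitespace[c]" table lookup before it.
func nextTokenStart(buf []byte) int {
	for i := 0; i < len(buf); i++ {
		switch buf[i] {
		case ' ', '\t', '\n', '\r':
			// whitespace folded into the switch; keep scanning
		default:
			return i // first non-whitespace byte starts the token
		}
	}
	return len(buf)
}

func main() {
	fmt.Println(nextTokenStart([]byte("  \t{\"a\":1}"))) // → 3
}
```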

~~~
e12e
> 7\. Including additional cases for exact values to avoid extra function
> calls to "parseString(&s.br)" or "s.parseNumber()".

I can see how you might choose some numbers to optimize for (1..10 for
example) - but strings? You could of course do a frequency analysis of the
test data - but would that help in general, beyond just cheating on the
benchmark?

I guess you could try for "key" and "value", and maybe "id"? Possibly adding
"email" and "name"?

Also, from TFA regarding numbers:

> Scanner.parseNumber is slow because it visits its input twice; once at the
> scanner and a second time when it is converted to a float. I did an
> experiment and the first parse can be faster if we just look to find the
> termination of the number without validation, canada.json went from 650mb/s
> to 820mb/sec.
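A rough sketch of the quoted experiment's idea (the function name and shape are assumptions, not the article's actual code): scan to the end of the number without validating, and let the later float conversion reject malformed input on the second pass.

```go
package main

import (
	"fmt"
	"strconv"
)

// scanNumber returns the length of the number-like run at the start of
// buf without validating it; strconv.ParseFloat does the real validation
// when the bytes are later converted. Sketch only.
func scanNumber(buf []byte) int {
	i := 0
	for i < len(buf) {
		c := buf[i]
		if (c >= '0' && c <= '9') || c == '-' || c == '+' ||
			c == '.' || c == 'e' || c == 'E' {
			i++
			continue
		}
		break
	}
	return i
}

func main() {
	buf := []byte("-12.5e3,\"next\"")
	n := scanNumber(buf)
	f, err := strconv.ParseFloat(string(buf[:n]), 64)
	fmt.Println(n, f, err) // → 7 -12500 <nil>
}
```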

------
dzsekijo
Adding this for the sake of completeness (I have no experience with it; it's
just yet another Go JSON project aiming at high performance).

[https://github.com/minio/simdjson-go](https://github.com/minio/simdjson-go)

~~~
glangdale
I wrote the original simdjson code along with Daniel Lemire. The Go version
of simdjson (it's a rewrite, not just a binding to the C++ code) is slower
than the original simdjson but still 8-15x faster than encoding/json.

I don't know exactly how the two compare, as I don't really know where the
overheads are in the Go version. Assuming the analogous case is decoding into
an interface{}, the simdjson port would be considerably faster.

~~~
throwaway894345
According to the article, encoding/json is particularly slow with respect to
decoding due to allocations. Notably, the API makes it difficult (impossible?)
to avoid these allocations. Do you know if simdjson is significantly faster in
this regard? And if you don't know that, do you know if the decoding API is
the same as with encoding/json?

~~~
glangdale
I don't know; this isn't a huge source of performance problems in the C++
version and I don't know much about how the go version works.

------
vlowther
There is also
[https://github.com/segmentio/encoding](https://github.com/segmentio/encoding),
which aims to have a zero allocation fastpath while preserving compatibility
with encoding/json.

~~~
Gys
From the OP: 'I believed that I could implement an efficient JSON parser based
on my assumption that encoding/json was slower than it could be because of its
API. It turned out that I was right, it looks like there is 2-3x performance
in some unmarshalling paths and between 8x and 10x performance in
tokenisation, if you’re prepared to accept a different API.'

------
zamalek
The cache line on x86 is 64 bytes. Your whitespace lookup table is way too
big. At the very least, you can subtract (with overflow) '\t' and check that
the character is not greater than ' ' before hitting the LUT.

The ASCII table is ripe for bit twiddling (I suspect it was organized with
that in mind). You may find useful bit patterns in the whitespace characters.

------
peterohler
Interesting approach. A different approach was taken with OjG
([https://github.com/ohler55/ojg](https://github.com/ohler55/ojg)) which shows
a significant performance improvement over the current golang JSON parser.

~~~
todotask
Wasn't aware you had developed OjG. How did it come about? IIRC I asked about
a Go version a few years ago.

~~~
peterohler
Just finished it last week. I'm pretty happy with the results but even more
pleased with the JSONPath. It even works on regular types using reflection.

------
ypcx
I've tried to parse those test files in Deno (in a very rudimentary test[1]);
the results (on my i9-9980HK @ 2.40GHz) are:

      canada.json          -->   31 ms,   73 MB/s
      citm_catalog.json    -->   13 ms,  135 MB/s
      code.json            -->   17 ms,  113 MB/s
      example.json         -->    0 ms,   73 MB/s
      sample.json          -->    6 ms,  124 MB/s
      twitter.json         -->    6 ms,  114 MB/s

[1]
[https://gist.github.com/youurayy/18553475c5a9f81a17345cddeeb...](https://gist.github.com/youurayy/18553475c5a9f81a17345cddeebc5d08)

~~~
tptacek
Is that code unmarshalling to types?

~~~
ypcx
Deno is a JavaScript runtime built in Rust, so I guess it depends on what you
mean by "types". Definitely not strong typing, although the example could be
rewritten in TypeScript, which Deno supports natively. I use Deno sometimes,
so I was interested in comparing it to the advancements presented here for
Go.

~~~
tptacek
I like Deno (V8 is written in C++, not Rust, for what it's worth); the
question is just: is it doing the same work that the encoding/json benchmark
is doing? What are we comparing?

------
otabdeveloper4
All this effort on building a better JSON parser for language 'X' should have
been better spent on inventing a statically-typed JSON. (Type inference seems
like a thing that should work well in this space.)

~~~
jcelerier
> All this effort on building a better JSON parser for language 'X' should
> have been better spent on inventing a statically-typed JSON

How does that help when you want to implement a JSON-based protocol?

~~~
otabdeveloper4
I'd guess that 99.9% of those protocols are actually statically-typed.

E.g., an object's "foobar" field is always an array of int, and won't suddenly
become a string from one invocation to the next.

It seems incredibly strange that we're somehow not leveraging this.

~~~
fnord123
What you're saying exists: Protobuf, FlatBuffers, Cap'n Proto, XDR, Parquet,
Arrow, Avro, ORC, SQLite. Indeed, if you use these formats you will be able
to load the data into memory more quickly than with JSON.

But this article isn't about those protocols. It's about JSON.

~~~
chrismorgan
You can do it with JSON too. Various statically-typed languages do JSON
parsing straight into structs all the time. (e.g. in the Rust ecosystem, Serde
is the most popular implementation of this concept.)

------
Xophmeister
Given that almost all JSON applications are going to be I/O bound (reading
from disk or, more likely, the network), what’s the fascination with making
super-fast JSON decoders, beyond the engineering challenge? Sure, it’s an
overhead on the processing you’re actually interested in, but I can hardly
imagine that it’s in “straw that breaks the camel’s back” territory.

~~~
jonstewart
Because it’s not true that JSON parsing is I/O bound. It might be if you have
an old 5400 RPM laptop hard drive; otherwise it’s CPU bound. There are many,
many benchmarks that will show you this, including the ones in TFA.

~~~
throwaway894345
Don't benchmarks like these usually pre-warm the file cache? If so, then these
benchmarks wouldn't be evidence that parsing is I/O bound, since they're
reading from kernel memory, right?

~~~
jonstewart
Correct, good parsing benchmarks will ensure the input is in memory or, at
least, heavily cached with a linear access pattern. And, the point is, those
benchmarks often show libraries parse JSON at a good deal less than 100 MB/s,
or, for the fast ones, maybe 100-400 MB/s. Those parsing rates are not fast
enough to claim that JSON parsing is I/O bound.

Of course, the original comment is saying that most “apps” are I/O bound
anyway (shall we assume web apps?). I think this is a lazy argument, or at
least an ignorant/self-centered one — plenty of apps are not web apps running
in an embarrassingly slow context like Django or Rails. For example, I work in
digital forensics/cyber security, and we have to scan through TBs of logs
(sometimes in JSON).

------
stephc_int13
It makes me sad every time I see a from-scratch rewrite of a working,
optimized, and well-tested library for the sake of writing it in a
newer/trendier language...

In my opinion this is much worse than reinventing the wheel.

Especially when the result is, in the end, slower.

~~~
enriquto
> In my opinion this is much worse than reinventing the wheel.

You say it as if reinventing the wheel was a bad thing, but it is not.
Technology advances by the continued reinvention of the wheel.

