
Show HN: Gojay – Performant JSON encoder/decoder for Golang - francoisllm
https://github.com/francoispqt/gojay
======
nieksand
The fastest Go JSON parser I know of is
[https://github.com/buger/jsonparser](https://github.com/buger/jsonparser).
I've used that one in production quite successfully.

I don't see that in the comparative benchmarks for Gojay.

~~~
latch
> I don't see that in the comparative benchmarks for Gojay.

Look again, JsonParser is included in some, but not all, of the benchmarks.
JsonParser [apparently] allocates no memory, which, to me, seems pretty
compelling.

~~~
nieksand
You are right. I totally missed it in the first few tables.

The lack of allocs for JsonParser was definitely a big win for my project.
When processing at high rps (40k msg/sec), GC is near the top of all my
performance profiles. Driving allocs down with Buger's parser was very helpful
for getting the throughput up.

~~~
zerr
What's the project? Seems like a language with a GC is not a good choice, is
it?

~~~
nieksand
Essentially a glorified batcher / data transformer consuming from Kafka.

Even with the GC overhead, I was still able to hit my performance goal of 40k
msg/sec per AWS c3.4xl.

In a vacuum I would have picked Rust over Go for this. But Go is widely used
at my work, whereas I'm the only Rust guy.

~~~
woolvalley
How much of a difference do you think Rust would have made? Have you made
something similar?

~~~
nieksand
I've written some high req/sec projects in Rust, but nothing that would be
apples-to-apples to that particular Go project.

The memory access patterns are straightforward enough that the remaining GC
could definitely go away. If I had to make a wild, from the hip, and
unsubstantiated guess, maybe a 2x to 5x performance multiple. That would
include a comparable amount of time spent optimizing and would depend greatly
on the quality of the Kafka crates for Rust (which I've never used).

------
tptacek
Performance is a real issue for Go's standard JSON, but this is a lot of extra
boilerplate code to have to write (I'd probably codegen most of this if I had
to), so I'd assume the reasonable strategy would be to implement with
encoding/json, profile, and then just GoJay the hotspots.

~~~
laumars
It looks like this has a Marshal and Unmarshal function just like its core
library counterpart. So I'd guess you might be able to use this as a drop in
replacement. However I'm yet to prove that theory.

~~~
tptacek
encoding/json's Marshal and Unmarshal use reflect and struct tags. This
library doesn't: you have to define a function to make your struct satisfy an
interface.
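
To make the contrast concrete, here is a minimal sketch of the two styles. The `appender` interface and `AppendJSON` method are hypothetical names invented for this example and are not Gojay's actual API; they only illustrate the "write a method per type" approach next to the reflection-based one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// User relies on struct tags; encoding/json discovers them via reflection.
type User struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// appender is a hypothetical interface for this sketch (not Gojay's API):
// the type writes its own JSON, so no reflection is needed.
type appender interface {
	AppendJSON(dst []byte) []byte
}

// AppendJSON is the hand-written boilerplate the interface approach requires.
func (u User) AppendJSON(dst []byte) []byte {
	dst = append(dst, `{"id":`...)
	dst = strconv.AppendInt(dst, int64(u.ID), 10)
	dst = append(dst, `,"name":`...)
	// Go-style quoting; it matches JSON quoting for plain ASCII names only.
	dst = strconv.AppendQuote(dst, u.Name)
	return append(dst, '}')
}

// encodeReflect uses the standard library's reflection-based encoder.
func encodeReflect(u User) string {
	b, err := json.Marshal(u)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// encodeManual uses the hand-written method through the interface.
func encodeManual(a appender) string {
	return string(a.AppendJSON(nil))
}

func main() {
	u := User{ID: 7, Name: "gopher"}
	fmt.Println(encodeReflect(u)) // {"id":7,"name":"gopher"}
	fmt.Println(encodeManual(u))  // {"id":7,"name":"gopher"}
}
```

The second form is faster to run but has to be written (or generated) for every type, which is the boilerplate being discussed.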

~~~
laumars
You don't have to use struct tags (nor even structs) to use encoding/json. In
fact I often don't as a lot of my usage is with maps rather than structs.

I did some testing with gojay and it did work as a drop in replacement for
most of my usage. However the performance improvements I saw in my benchmarks
were not nearly as favourable as those published in the project's git
repository. I'm sure I could get better results if I played around with their
APIs a little more rather than just using the marshallers but frankly the
utility I'm using it in favours the flexibility of encoding/json a little more
anyway, even if that does cost me a little in performance (and to be clear, it
really wasn't much as I needed to iterate the execution of my utility a
thousand times just to get any meaningful differences between the two
libraries. So we're not talking real world usages).

That said, if you're building high performance servers then I'm sure gojay
would really stand out. On reflection (no pun intended), my requirements
weren't really the target use case for this package.

~~~
tptacek
That's fine, but this library isn't a drop-in replacement for encoding/json;
to be that, it has to work for people who expect tagged structs to round-trip
through it, which won't happen with this library.

~~~
laumars
That's fine but I never stated "drop in for all use cases". A point I made
very clear in my second post. But for some use cases it can be. As I had
explained already.

The rest should be abundantly clear which use case would apply to anyone who's
spent more than five minutes in Go (or any programming language that supports
reflection) and had read the first line of the package's readme (ie that it
doesn't use reflection).

It's definitely worth remembering that one doesn't have to use structs and
tags to write nor read JSON in Go before people start bitching about boiler
plate code and lack of macros.

------
krylon
I have been using ffjson[1] in a couple of private projects, which generates
code to encode/decode data, which supposedly is faster than the json library
from Go's standard library.

Does anybody have first-hand experience of how this compares to the competition?

(I could, of course, do my own benchmarks, but so far I have not had any
performance issues on this end, so it has not been a pressing need. The only
problem I have run into with ffjson is that various linters will bark at the
generated code, but again, that has not been a big problem.)

[1] [https://github.com/pquerna/ffjson](https://github.com/pquerna/ffjson)

~~~
pquerna
(ffjson author here)

The main feature that ffjson has that most of the non-stdlib JSON libraries
lack is stdlib compatibility. Eg, the same struct-tags and interfaces used in
stdlib are used by ffjson. It's just trying to move most of the reflection /
allocations / etc to a `go generate` step vs runtime.

If you abandon trying to stay consistent with the stdlib JSON, and make new
APIs/interfaces for propagating the encoder or decoder state, as Gojay has, it
will undoubtedly be faster than ffjson or stdlib.
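
As an illustration of what "the same interfaces used in stdlib" buys you, here is a hand-written stand-in for the kind of method a generator could emit. It is a sketch only, not ffjson's actual output; the `Point` type and `encodePoints` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Point is a hypothetical type for this sketch.
type Point struct{ X, Y int }

// MarshalJSON satisfies encoding/json's json.Marshaler interface, so
// json.Marshal calls it instead of walking the struct with reflection.
func (p Point) MarshalJSON() ([]byte, error) {
	b := make([]byte, 0, 32)
	b = append(b, `{"x":`...)
	b = strconv.AppendInt(b, int64(p.X), 10)
	b = append(b, `,"y":`...)
	b = strconv.AppendInt(b, int64(p.Y), 10)
	return append(b, '}'), nil
}

// encodePoints goes through the ordinary stdlib entry point; callers do not
// need to know that Point supplies its own encoder.
func encodePoints(ps []Point) string {
	out, err := json.Marshal(ps)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	fmt.Println(encodePoints([]Point{{1, 2}, {3, 4}})) // [{"x":1,"y":2},{"x":3,"y":4}]
}
```

Because the call site is still `json.Marshal`, existing code keeps working unchanged, at the cost of staying within the stdlib's interfaces.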

~~~
krylon
Thank you very much!

------
BillinghamJ
Would be handy if it included a utility to generate the (un)marshaling
functions, so that can just be done as a build step.

So far we’ve used
[https://github.com/mailru/easyjson](https://github.com/mailru/easyjson) for
that, but the code generation step is unbelievably slow - multiple minutes for
about 100 structs.

~~~
francoisllm
It's actually the next milestone, a generator for structs, maps and slices.
Until now the goal was to make it ready for some high traffic services in
production. These services receive JSON with Chinese and Vietnamese characters,
so we needed to make sure it works well first (boilerplate was not an issue).

~~~
anothergoogler
Go is pretty good about Unicode, did you run into issues with Chinese and
Vietnamese with standard encoding/json package or with other third-party Go
JSON libraries?

~~~
francoispqr
Nope, it's quite easy to integrate unicode parsing :) just need to check for
"\u1234" strings in JSON as it is valid, also need to check for utf16
surrogates. Standard package does it already just had to implement it in
Gojay.
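
A minimal sketch of the check described above, using only the standard library. This is not Gojay's implementation; `decodeEscapes` and `parseHex4` are hypothetical helpers, and escapes other than `\uXXXX` are deliberately left untouched to keep the example short.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf16"
	"unicode/utf8"
)

// parseHex4 reads exactly four hex digits from the start of s.
func parseHex4(s string) (rune, bool) {
	if len(s) < 4 {
		return 0, false
	}
	v, err := strconv.ParseUint(s[:4], 16, 16)
	if err != nil {
		return 0, false
	}
	return rune(v), true
}

// decodeEscapes expands \uXXXX escapes in s and combines UTF-16 surrogate
// pairs into a single code point. Unpaired surrogates become U+FFFD.
func decodeEscapes(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		if strings.HasPrefix(s[i:], `\u`) {
			if r, ok := parseHex4(s[i+2:]); ok {
				i += 6
				if utf16.IsSurrogate(r) {
					// Look ahead for a second \uXXXX escape to pair with.
					r2 := rune(-1)
					if strings.HasPrefix(s[i:], `\u`) {
						if v, ok := parseHex4(s[i+2:]); ok {
							r2 = v
						}
					}
					// DecodeRune yields U+FFFD unless r, r2 form a valid pair.
					r = utf16.DecodeRune(r, r2)
					if r != utf8.RuneError {
						i += 6 // consume the low surrogate as well
					}
				}
				b.WriteRune(r)
				continue
			}
		}
		b.WriteByte(s[i])
		i++
	}
	return b.String()
}

func main() {
	fmt.Println(decodeEscapes(`\u4e2d\u6587`))  // 中文
	fmt.Println(decodeEscapes(`Vi\u1ec7t`))     // Việt
	fmt.Println(decodeEscapes(`\ud83d\ude00`))  // 😀 (surrogate pair)
}
```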

------
dtolnay
To see how this fares against native code, I ported their benchmark to Rust's
JSON library [https://serde.rs/](https://serde.rs/). Disclaimer: I am a
maintainer of Serde.

Numbers and graphs:
[https://github.com/serde-rs/json-benchmark/tree/gojay#serde](https://github.com/serde-rs/json-benchmark/tree/gojay#serde)

Rust source code:
[https://github.com/serde-rs/json-benchmark/blob/gojay/src/li...](https://github.com/serde-rs/json-benchmark/blob/gojay/src/lib.rs)

TL;DR GoJay ranges from 20% slower to 2.7x slower depending on workload.

~~~
IshKebab
Go is "native code" but anyway nice work!

~~~
Groxx
As always, it's a range.

Go has a runtime which you cannot control which schedules your code when it
feels like it and a garbage collector. Rust has neither.

Sure, Go isn't parsing and interpreting its files at runtime. But neither does
Python, so I'm not sure that's a meaningful line to draw.

~~~
abiox
> Sure, Go isn't parsing and interpreting its files at runtime

afaik, referring to something as "native code" just indicates it isn't
compiled to/executing as bytecode. not anything to do with gc, runtime, etc.

~~~
Groxx
Depends on who you ask. As demonstrated by literally every commenter in this
thread so far.

Besides, define bytecode. Once it has run through a JIT and diverged from the
on-disk representation, is it now native? How is it distinguishable from a
binary that detects your CPU architecture and executes different branches of
code? What about other forms of self-modifying code?

There are cases where an easy argument can be made (e.g. Java, which has a
separately-supplied VM to run your bytecode... but then where's the line with
DLLs?), but there isn't an unambiguous line in the sand here. At any line you
can produce a new system that straddles it (e.g. a JAR which ships with its
own VM. the VM is native code, is the binary now native or not?), and often
there are already widely-used examples.

------
laumars
This looks perfect for a project I'm working on which is a UNIX / Linux $SHELL
that makes heavy use of JSON pipelining.

~~~
agumonkey
structured shell, go on

~~~
zbentley
[https://github.com/PowerShell/PowerShell](https://github.com/PowerShell/PowerShell)

~~~
laumars
Powershell is object oriented and, in my opinion at least, an absolute pig for
doing quick one liners (overly verbose syntax, pipelines bork if types
mismatch even when the data getting passed is still essentially just textual).
I wanted something that was still in the same realm of typical UNIX shells
(even with Bash compatibility where sensible) but with an awareness of complex
data formats.

However everyone has their own opinions and preferences. I wrote my shell to
scratch my own personal itch and if others like / use it that is a bonus.

------
edhelas
How does it compare with encoding the same kind of data in a XML stream?

~~~
laumars
That would depend on the XML interface you used. I don't really rate the XML
parser in the Go core library much. But I think much of that is my own
personal bias against XML.

~~~
dullgiulio
Encoding XML to a structure makes little sense, as there is not a direct map
between tags, attributes and struct fields. Something like a DOM makes lots
more sense, but it is also much more verbose. This is the big reason to prefer
JSON over XML.

~~~
laumars
Yeah, that was the point I was hinting at :)

------
segmondy
How does this compare to
[http://ugorji.net/blog/go-codecgen](http://ugorji.net/blog/go-codecgen) ?

~~~
francoispqr
It's faster :)

