From the documentation it seems to me that they're mostly concerned about resource utilization when processing APIs, and not dealing with files. 635KB may be representative of a large API payload in their environment.
It is mentioned here: https://github.com/bytedance/sonic/blob/a577eafc253adb943924..., but it isn't included in the benchmark graphs. This repo seems specifically focused on Golang and isn't necessarily motivated by being the fastest JSON [de]serializer on the planet.
Right, they show MB/s of parsed JSON, and it definitely surprised me: simdjson-go is among the slowest! It looks like there's something fishy going on there, as it is _orders of magnitude_ slower than vanilla simdjson.
Yes, but on a bar graph showing rate, taller bars mean faster, while on a bar graph showing time to complete a task, taller bars mean slower.
The graphs show rate, and simdjson-go has shorter bars (or nonexistent bars, which suggests a quantity too small to show up at the graphs' resolution). TkTech thought the graphs showed time, and so read them as saying simdjson-go is very fast; skavi is pointing out that the reading is reversed, because they show rate.
I regularly deal with JSON documents several MB in size, but do developers frequently deal with JSON documents several GB in size? If so, where do you encounter something like that? Surely "processing" that much data (for whatever definition of process you have) is orders of magnitude slower than parsing it.
I love the idea of a library trying to squeeze every last bit of performance out of the CPU, but I'm genuinely curious at the problems it solves in the real world.
Depends. I've had multi-gigabyte `[{..}, {..}, ...]` JSON arrays from database dumps, and doing even basic things with them in jq takes ages unless you use the (highly obtuse, IMO) streaming methods. Sometimes you can pre-grep to filter the results down to something trivial to process, but sometimes the structure isn't unique enough to let you do that, or the filter depends on multiple field values; filtering that with a real JSON parser makes perfect sense, and then speed can matter.
That said, a 2x+ improvement for a couple megabytes, especially if done many times per second, is still a significant improvement.
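Since the thread is about Go: here's a minimal sketch of that kind of filtering with nothing but the standard library's streaming decoder, so the whole multi-gigabyte array never has to be in memory at once. The file name, struct fields, and filter predicate are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Record mirrors only the fields the filter needs; everything else is skipped.
type Record struct {
	ID     string `json:"id"`     // hypothetical field
	Status string `json:"status"` // hypothetical field
}

func main() {
	f, err := os.Open("dump.json") // multi-GB [{..}, {..}, ...] array
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := json.NewDecoder(f)

	// Consume the opening '[' of the top-level array.
	if _, err := dec.Token(); err != nil {
		log.Fatal(err)
	}

	// Decode one element at a time; memory use stays bounded by one record.
	for dec.More() {
		var r Record
		if err := dec.Decode(&r); err != nil {
			log.Fatal(err)
		}
		// Example filter on multiple fields, the kind a pre-grep can't express.
		if r.Status == "failed" && r.ID != "" {
			fmt.Println(r.ID)
		}
	}
}
```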
Monthly general ledger entries for the largest real estate companies. We tried XML and JSON, and eventually landed on compressed CSV as the best trade-off between human-readable large files (~1-3 GB) and compressibility.
Not JSON, but we process XML files on the order of a GB. The largest ones are consolidated invoices (i.e. a lot of separate invoices in one file). Other large ones contain rules and codelists in multiple languages.
All the time. It's not that there's a single record that size, but all sorts of things log in JSON, so you wind up with multi-GB JSONL files. As an example: AWS CloudTrail logs.
Aren't log files processed a line at a time? Last time I had to deal with a structured log, I streamed lines concurrently into a JSON parser and it went pretty fast.
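Roughly like this, for anyone curious: a rough sketch (not any particular library's API) of fanning JSONL lines out to a pool of goroutines, with a hypothetical CloudTrail-ish `eventName` field as the thing being extracted.

```go
package main

import (
	"bufio"
	"encoding/json"
	"log"
	"os"
	"sync"
)

// Event holds just the field we care about; "eventName" is a hypothetical example.
type Event struct {
	EventName string `json:"eventName"`
}

func main() {
	f, err := os.Open("trail.jsonl") // hypothetical multi-GB JSONL log
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	lines := make(chan []byte, 1024)
	var wg sync.WaitGroup

	// Worker pool: each goroutine parses lines independently.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for line := range lines {
				var e Event
				if err := json.Unmarshal(line, &e); err != nil {
					log.Printf("bad line: %v", err)
					continue
				}
				_ = e.EventName // do something with the parsed event
			}
		}()
	}

	// Read one line at a time; copy the scanner's buffer before handing it
	// off, since Scan reuses it.
	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 1<<20), 1<<20) // allow lines up to 1 MB
	for sc.Scan() {
		lines <- append([]byte(nil), sc.Bytes()...)
	}
	close(lines)
	wg.Wait()

	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
}
```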
Since when is 635 KB “large”? I suppose it depends on your use case: since they consider 400 B small, they probably use lots of JSON APIs for many small things.
How many libs have we seen claiming to be the fastest deserialization library, until you try to open files with corner cases, like UTF-8 characters...
Why does it feel like new JSON parsing libraries are constantly popping up? People joke about JS frameworks, but JSON parsers almost feel like they're at the same level.
It sounds like it essentially is written in C. INTRODUCTION.md says:
> As for insufficiency in compiling optimization of go language, we decided to use C/Clang to write and compile core computational functions, and developed a set of asm2asm tools to translate the fully optimized x86 assembly into plan9 and finally load it into Golang runtime.
GitHub says it's 59.6% assembly and 6.5% C. Possibly the assembly is just the checked-in result of compiling+translating the C?
Gross that they have to do this. I know the Go folks really prioritize speed of compilation, but I wish they'd support debug builds with their own backend and release builds with LLVM so you could get this kind of performance when actually writing Go. I see there have been a few attempts at Go + general-purpose backend (gccgo, llgo, gollvm) but none seem to use the official Go frontend written in Go, so I think they're doomed to be second-class at best.
edit: and/or, if the Go folks and the GCC and/or LLVM folks could negotiate a shared ABI (not necessarily switching to the platform's default C ABI, but having "cc --abi=special-golang-stack-copying-thing"), people could just link against something compiled from C/Rust/whatever without requiring this separate compilation+translation step (the high-maintenance, high-performance path) or CGo overhead (the easy but slow-for-frequent-calls path).
I really don't care what language it is written in. As long as there is full test coverage, I'm happy.
I've been a long-time user of jsoniter and it is much faster than the standard lib. It really makes a difference for the work I do. If this is even faster and has tests, even better.
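For anyone who hasn't tried it: jsoniter is designed as a drop-in replacement for encoding/json, so switching is roughly this (a minimal sketch using its documented compatibility config):

```go
package main

import (
	"fmt"
	"log"

	jsoniter "github.com/json-iterator/go"
)

// This config mimics encoding/json's behavior, so existing call sites keep working.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

type User struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

func main() {
	data, err := json.Marshal(User{Name: "alice", Age: 30})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(data)) // {"name":"alice","age":30}

	var u User
	if err := json.Unmarshal(data, &u); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", u)
}
```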
Out of curiosity, what does "working with" mean here? What operations did you need to perform on it? Streaming reads, transforms, indexed reads, appends, edits?
I'm thinking that any general-purpose JSON loader is likely to perform badly for a 100GB file, purely because it'll use 2x (or more likely 10x) as much memory for the parsed representation. So you'd want some kind of special-case reader for huge files -- maybe it just builds some kind of sparse index with pointers back into an mmap of the raw data.
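As a rough illustration of that idea (a sketch, not how any existing library does it): with the standard library you can record the byte range of each top-level array element via `Decoder.InputOffset`, then later re-read and parse individual elements straight from the file (or an mmap of it). The file name and the `ReadAt` re-read are illustrative, and this assumes `json.RawMessage` ends up holding each element's bytes exactly as they appear in the input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// span records where one top-level array element lives in the file.
type span struct{ start, end int64 }

func main() {
	f, err := os.Open("huge.json") // hypothetical [{..}, {..}, ...] file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := json.NewDecoder(f)

	// Consume the opening '[' of the top-level array.
	if _, err := dec.Token(); err != nil {
		log.Fatal(err)
	}

	// One pass to build the index; only offsets are kept, not parsed values.
	var index []span
	for dec.More() {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			log.Fatal(err)
		}
		// InputOffset (Go 1.14+) is just past the element; RawMessage holds
		// the element's bytes from the input, so start = end - len(raw).
		end := dec.InputOffset()
		index = append(index, span{end - int64(len(raw)), end})
	}
	if len(index) == 0 {
		log.Fatal("empty array")
	}
	fmt.Printf("indexed %d elements\n", len(index))

	// Later: re-read a single element by offset (here via ReadAt; an mmap of
	// the file would work the same way) and parse only that slice.
	s := index[0]
	buf := make([]byte, s.end-s.start)
	if _, err := f.ReadAt(buf, s.start); err != nil {
		log.Fatal(err)
	}
	var elem map[string]interface{}
	if err := json.Unmarshal(buf, &elem); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("element 0 has %d top-level keys\n", len(elem))
}
```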