Due to the inherent constraints of JSON, the exact use case matters a lot for such comparisons. simdjson is generally faster when you only need a well-formedness check or a very small portion of a large input document, but "well-formedness" for JSON guarantees only a small subset of what "well-formedness" means for binary formats, and partial-parsing performance is often dominated by language bindings rather than the underlying parser (it is a wise move that simdjson also supports JSON pointers for this reason, because that greatly reduces FFI overhead). Binary formats, in comparison, tend to have a generally flat performance profile.
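To make the JSON-pointer point concrete, here is a minimal sketch assuming simdjson's On-Demand API with exceptions enabled (the file name and field path are made up): a single pointer lookup parses only what it needs and returns one value, rather than materializing a whole tree to walk across an FFI boundary.

    #include <simdjson.h>
    #include <iostream>

    int main() {
      simdjson::ondemand::parser parser;
      // Load a (hypothetical) large document; padded_string adds the padding simdjson requires.
      simdjson::padded_string json = simdjson::padded_string::load("big.json");
      simdjson::ondemand::document doc = parser.iterate(json);
      // JSON pointer: jump straight to one value instead of materializing a tree.
      double price = doc.at_pointer("/items/0/price").get_double();
      std::cout << price << std::endl;
    }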
Look again at Arrow. Part of the concept is that the wire format is the same as the in-memory format, so there is no serialization or deserialization step. Arrow isn't intended for storage; they point to Parquet for that.
Apache Arrow defines an inter-process communication (IPC) mechanism to transfer a collection of Arrow columnar arrays (called a “record batch”). It can be used synchronously between processes using the Arrow “stream format”, or asynchronously by first persisting data on storage using the Arrow “file format”.
The Arrow IPC mechanism is based on the Arrow in-memory format, such that there is no translation necessary between the on-disk representation and the in-memory representation. [0]
The Arrow spec aligns columnar data in memory to minimize cache misses and take advantage of the latest SIMD (single instruction, multiple data) and GPU operations on modern processors. [1]
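To make the "no translation" point concrete, here is a rough sketch using the Arrow C++ API (column name and values are arbitrary; error handling uses Arrow's status macros): a record batch is written in the IPC stream format and read back, with the columnar buffers carried through as-is rather than re-encoded.

    #include <arrow/api.h>
    #include <arrow/io/api.h>
    #include <arrow/ipc/api.h>
    #include <iostream>

    arrow::Status RoundTrip() {
      // Build one int64 column (values are arbitrary).
      arrow::Int64Builder builder;
      ARROW_RETURN_NOT_OK(builder.AppendValues({1, 2, 3}));
      std::shared_ptr<arrow::Array> column;
      ARROW_RETURN_NOT_OK(builder.Finish(&column));

      auto schema = arrow::schema({arrow::field("id", arrow::int64())});
      auto batch = arrow::RecordBatch::Make(schema, column->length(), {column});

      // Write the batch in the IPC stream format; the columnar buffers go out as-is.
      ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::BufferOutputStream::Create());
      ARROW_ASSIGN_OR_RAISE(auto writer, arrow::ipc::MakeStreamWriter(sink, schema));
      ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
      ARROW_RETURN_NOT_OK(writer->Close());
      ARROW_ASSIGN_OR_RAISE(auto buffer, sink->Finish());

      // Read it back; the reader hands out the same columnar layout, no re-encoding.
      auto source = std::make_shared<arrow::io::BufferReader>(buffer);
      ARROW_ASSIGN_OR_RAISE(auto reader,
                            arrow::ipc::RecordBatchStreamReader::Open(source));
      std::shared_ptr<arrow::RecordBatch> loaded;
      ARROW_RETURN_NOT_OK(reader->ReadNext(&loaded));
      std::cout << loaded->ToString() << std::endl;
      return arrow::Status::OK();
    }

    int main() {
      auto status = RoundTrip();
      if (!status.ok()) std::cerr << status.ToString() << std::endl;
      return status.ok() ? 0 : 1;
    }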
During my PhD studies, we developed a JSON parser for FPGAs [0] that could in theory be turned into silicon if somebody really wanted to. In our evaluation we showed that it is an order of magnitude faster on a single JSON document than at least the AVX2 (256-bit) version of simdjson that was available at the time.
FJCVTZS has JavaScript in the name, and is useful for implementing JavaScript, yes, but as you say, it's just a float-to-int conversion with a particular rounding mode. "Convert float to int, rounding towards zero" is a typical scalar instruction, very different from the kind of instruction you'd want for optimizing parsing.
I swear the only reason people pay FJCVTZS any attention at all is that it has "JavaScript" in its name. If it was just a "convert float to int, round towards zero" instruction, everyone would see it as a normal, boring part of the ISA.
FJCVTZS is really just "float to int with x86 semantics", which also happens to be the semantics that were baked into JavaScript. The name is probably just ARM dancing around mentioning a competing ISA.
> I swear the only reason people pay FJCVTZS any attention at all is that it has "JavaScript" in its name.
You’re not wrong, and yet—
> convert float to int, round towards zero
... and take the result modulo 2^32 as two’s complement. That’s still rather specific and likely useless for numeric computing, which is the canonical application area for this kind of thing. So the “JavaScript” in the name is fair, if only to dispel confusion around why one would ever want this.
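For anyone curious what that combination actually computes, here is a rough portable sketch of JavaScript's ToInt32, which is the operation FJCVTZS implements in one instruction (the flag behavior of the real instruction is not modeled here):

    #include <cmath>
    #include <cstdint>
    #include <cstring>
    #include <iostream>

    // Truncate toward zero, wrap modulo 2^32, reinterpret as two's complement.
    // NaN and infinities map to 0, as in JavaScript's ToInt32.
    int32_t js_to_int32(double x) {
      if (!std::isfinite(x)) return 0;
      double t = std::trunc(x);                 // round toward zero
      double m = std::fmod(t, 4294967296.0);    // reduce modulo 2^32
      if (m < 0) m += 4294967296.0;             // bring into [0, 2^32)
      uint32_t u = static_cast<uint32_t>(m);
      int32_t r;
      std::memcpy(&r, &u, sizeof r);            // reinterpret as two's complement
      return r;
    }

    int main() {
      std::cout << js_to_int32(3.9) << "\n";             // 3
      std::cout << js_to_int32(-3.9) << "\n";            // -3
      std::cout << js_to_int32(4294967301.0) << "\n";    // 5 (2^32 + 5 wraps)
      std::cout << js_to_int32(2147483648.0) << "\n";    // -2147483648 (2^31 wraps negative)
    }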
I mean, these things seem very different? AES is "just" a bunch of mathematical operations done on 16-byte chunks; you can make instructions that optimize that fairly easily. JSON is mainly a parsing problem, which is much harder to do in hardware. Hardware acceleration shines when you can do operations on large chunks of data at once.
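To illustrate the first half of that: with x86 AES-NI, a single intrinsic runs a full AES round on a 16-byte block, which is why it maps onto hardware so cleanly (the round keys below are placeholders, not a real key schedule; compile with -maes).

    #include <wmmintrin.h>  // AES-NI intrinsics

    // Each _mm_aesenc_si128 performs one full AES round
    // (ShiftRows, SubBytes, MixColumns, then XOR with the round key).
    __m128i aes_two_rounds(__m128i block, __m128i rk1, __m128i rk2) {
      block = _mm_aesenc_si128(block, rk1);
      block = _mm_aesenc_si128(block, rk2);
      return block;
    }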
SSE/AVX/NEON/RVV are all general sets of vector operations, which can be used to optimize certain parts of parsing, as simdjson does. If instructions were added to further optimize JSON parsing, I have a feeling they would be new general-purpose vector instructions rather than anything JSON-specific.
Or do you have any JSON-specific instructions in mind which would help beyond what the existing vector instructions already do?
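For reference, here is the sort of thing the existing instructions already handle well (a rough sketch in the spirit of simdjson's structural scan, not its actual code): flag every quote and backslash in a 16-byte block and hand scalar code a bitmask.

    #include <emmintrin.h>  // SSE2
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Bit i of the result is set if input[i] is '"' or '\'.
    uint32_t quote_backslash_mask(const char *input) {
      __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i *>(input));
      __m128i quotes  = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('"'));
      __m128i slashes = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('\\'));
      return static_cast<uint32_t>(_mm_movemask_epi8(_mm_or_si128(quotes, slashes)));
    }

    int main() {
      char block[16];
      std::memcpy(block, "{\"key\": \"va\\\"l\"}", 16);
      std::printf("%04x\n", quote_backslash_mask(block));  // bits 1,5,8,11,12,14 -> 5922
      return 0;
    }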
When writing parsers, NEON suffers a bit from the lack of a movemask and from tbl requiring you to mask off more high bits of the indices before using it (compared to pshufb). I mostly don't have complaints about AVX.
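For what it's worth, the usual workaround for the missing movemask (a well-known NEON idiom, sketched from memory rather than lifted from any particular parser) is a narrowing shift that packs each byte's comparison result into a nibble, giving a 64-bit mask with 4 bits per input byte:

    #include <arm_neon.h>
    #include <cstdint>

    // Each nibble of the result is 0xF if the corresponding input byte equals
    // `needle`, else 0x0; scalar code can then scan it much like a movemask.
    uint64_t neon_nibble_mask(uint8x16_t bytes, uint8_t needle) {
      uint8x16_t eq = vceqq_u8(bytes, vdupq_n_u8(needle));       // 0xFF / 0x00 per byte
      uint8x8_t nib = vshrn_n_u16(vreinterpretq_u16_u8(eq), 4);  // pack 16 bytes into 16 nibbles
      return vget_lane_u64(vreinterpret_u64_u8(nib), 0);
    }

A count-trailing-zeros on the result, divided by 4, recovers the index of the first matching byte, which is what the scalar follow-up loop typically wants anyway.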