
I really like using line-delimited JSON [0] for stuff like this. If you're looking at a multi-GB JSON file, it's often made of a large number of individual objects (e.g. semi-structured JSON log data or transaction records).

If you can get to a point where each line is a reasonably-sized JSON document, a lot of things get way easier. jq will be streaming by default. You can use traditional Unixy tools (grep, sed, etc.) in the normal way because it's just lines of text. And you can jump to any point in the file, skip forward to the next line boundary, and know that you're not in the middle of a record.
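For example, pulling a field out of a multi-GB log becomes a one-liner (a sketch; the file name logs.jsonl and the status/url fields are made up for illustration):

    # grep narrows by plain text (assuming the serializer emits no space after ':'),
    # then jq sees each surviving line as its own JSON document
    $ grep '"status":500' logs.jsonl | jq -r '.url' | sort | uniq -c | sort -rn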

The company I work for added line-delimited JSON output to lots of our internal tools, and working with anything else feels painful now. It scales up really well -- I've been able to do things like process full days of OPRA reporting data in a bash script.

[0]: https://jsonlines.org/




+1. Yes, you can have a giant JSON object and hack your way around the obvious memory issues, but it’s still a bad idea, imo. Even if you solve it for one use case in one language, you’ll have a bad time as soon as you use different tooling. JSON really is a universal message format, which is useful precisely because it’s so interoperable. And it’s only interoperable as long as messages are reasonably sized.

The only thing I miss from JSON Lines is a type specifier, so you can mix different kinds of messages. It’s easy enough to work around by wrapping messages or rolling a custom format, but still, it would be great to have a little bit of metadata for those use cases.


An out-of-band type specifier would be cool, though you'd still have to know the schema implied by each type.

In the system I work with, we standardized on objects with a top-level "type" key containing a string that identifies the type. Of course, that only works because we have lots of different tools that all output the same 30 or so data types. It definitely wouldn't scale to interoperability in general. But that's also one of the great things about JSON: it's flexible enough that you can work out a system that works at your scale, no more and no less.
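For illustration, it looks something like this (the type names and fields here are invented, not our real ones):

    $ cat events.jsonl
    {"type":"trade","symbol":"XYZ","qty":100}
    {"type":"quote","symbol":"XYZ","bid":9.99}
    $ jq -c 'select(.type == "trade")' events.jsonl
    {"type":"trade","symbol":"XYZ","qty":100}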


Confusingly, jq also has a streaming mode (https://stedolan.github.io/jq/manual/#Streaming) that streams JSON values as [<path>, <value>] pairs. This can be combined with null input (-n) to reduce, foreach, etc. in a memory-efficient way, e.g. summing all .a in an array without loading the whole array into memory:

    $ echo '[{"a":1},{"b":2},{"a":3}]' | jq -n --stream 'reduce (inputs | select(.[0][1:] == ["a"])[1]) as $v (0; .+$v)'
    4
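For comparison, the same sum over line-delimited input needs no path bookkeeping (a sketch; the inline printf stands in for a real file with one object per line):

    $ printf '{"a":1}\n{"b":2}\n{"a":3}\n' | jq -n 'reduce (inputs | .a) as $v (0; . + $v)'
    4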


Isn't this pretty much what JSON streaming does?


Yep, it’s a subset of JSON streaming (using Wikipedia’s definition [0], it’s the second major heading on that page). I like it because it preserves existing Unix tools like grep, but the other methods of streaming JSON have their own advantages.

[0]: https://en.m.wikipedia.org/wiki/JSON_streaming



