I did an experiment[1] last year that demonstrated JSON parsing only requires 1 bit on the stack per array or associative array encountered (as they are the two non-terminals that lead to different rules).
In my implementation, I let a 64-bit integer represent the non-terminal stack (then used recursion if it went over that). So you'd need a JSON structure that has "a hash of a hash of a hash ... pointing to an array" with a depth in the 60s before you'd get a recursion call. And the real program stack shouldn't take that much per call (I didn't optimize this part, so I'm not sure what the minimum is).
I only wrote a JSON validator, but my method should apply to cases where JSON elements are being placed into a data structure.
I'm pretty sure that a JSON parser could be written in C/C++ that requires less than 1kb overhead, where most of that is the string buffer.
I make no claims about speed though, as my concern was proving the bit part (I suspect speed using my method would be worse than the naive per character jump table).
Once you have more than 50MB of JSON, maybe it's time to switch to a streaming parser. While that's not going to give you more parsing speed, it will solve all your memory issues and if you are getting that data from a slow source (like the network) then you can even parse as you are reading it, which then would give you a considerable speed advantage.
I had a very similar problem with Ruby and some of our workloads. I ended up utilizing this evented parser [1] to process smaller subsets of the JSON document and then clear them from memory. The result is a memory usage that is at most the size of one of these subsets (plus some overhead).
I have been wanting to generalize this approach with some open source software for some time, but I haven't gotten around to it. I think an attractive solution is for a user to specify a JSONPath [2] upfront about the types of subsets they are interested in. In the case of Ruby these chunks could then be yielded by an iterator and cleared, but I wonder if a more general-purpose CLI program which prints the subsets to stdout would also work if the subsets are small enough. But this adds another serialization/deserialization step.
I did something very similar with XML in Ruby. All the libraries are either event-based (SAX, which is awful for actual parsing code) or snarf the whole document into RAM.
I wrote a simple SAX wrapper [1] that applies a (so far, extremely simplified) subset of XPath to the stream, then parses each match into a DOM using Nokogiri. Works exceedingly well, though it's not faster than reading everything into RAM.
In the project that I am involved[0], I have been doing stress tests with very large JSON messages. After evaluate a few options I came up that the project 'JSMN' was good enough for my purposes.
https://github.com/zserge/jsmn
I am not sure if there are Python or another language bindings available but that library is very clean and simple to use. It does not perform memory allocations as you (the caller) have to provide the buffers so it gives you good flexibility.
I've worked with jsmn, and it was really a pleasure. I loved that you had one big memory allocation up front (which could even be on the stack) and then no more. In my case I was filling in a data structure where I was comfortable capping the length of all list-like elements, so I also had just one big allocation for that.
Using it reminded me a lot of my days writing XML SAX parsers. I understand why some people don't like that pattern, but if you've ever written a Finite State Machine it will feel very familiar. Personally I like how the event-based model lets me break things up into little chunks.
I made a "JSON parser generator" a few months ago. There is no memory overhead because it just puts the json data straight into a C struct while it's parsing.
I'd be curious to see how FlatBuffers lines up against MsgPack seeing as it requires just a single allocation and read.
Heck if you want to get fancy(and have access to mmap) you can just mmap() your FlatBuffer data and let the kernel handle paging it in/out for you for very low memory overhead.
Sometimes you dont have a choice, say receive your data in Jason. In those cases, my experience is that ujson is much faster than the alternatives. Interesting that the author concludes the same for small objects but a different picture with bigger objects.
In my implementation, I let a 64-bit integer represent the non-terminal stack (then used recursion if it went over that). So you'd need a JSON structure that has "a hash of a hash of a hash ... pointing to an array" with a depth in the 60s before you'd get a recursion call. And the real program stack shouldn't take that much per call (I didn't optimize this part, so I'm not sure what the minimum is).
I only wrote a JSON validator, but my method should apply to cases where JSON elements are being placed into a data structure.
I'm pretty sure that a JSON parser could be written in C/C++ that requires less than 1kb overhead, where most of that is the string buffer.
I make no claims about speed though, as my concern was proving the bit part (I suspect speed using my method would be worse than the naive per character jump table).
[1] http://web.cs.dal.ca/~sjackson/bitJson/JSON.html