I have been looking for exactly this for a while now:
“For my usecase, I wanted a way to:
Stash my data in its "original" JSON form
Explore it later & build whatever views I want
Keep costs & infrastructure complexity low
Self-host it / own my data”
However, I think compression is a key part. Large scraped JSON datasets can be compressed by a few orders of magnitude, which is the difference between scp’ing a DB up to AWS in minutes vs days. SQLite has an official proprietary extension to DEFLATE backing pages, but that’s not easy to buy/distribute for hobby projects. I’ve tried row compression with zstd dictionaries and it works well, but then you lose native indexing.
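For anyone curious, the zstd-dictionary approach looks roughly like this (a minimal, illustrative sketch using the python-zstandard package; the table layout and document shapes are made up, not from any project in this thread):

```python
import json
import sqlite3
import zstandard as zstd

# Fake "scraped" documents, just to have something repetitive to train on.
docs = [
    {"id": i, "title": f"scraped item {i}", "tags": ["news", "tech", "archive"],
     "body": "lorem ipsum dolor sit amet " * 8}
    for i in range(2000)
]

# Train one small shared dictionary on the serialized docs, then compress each
# row with it. Small JSON rows compress far better with a dictionary than alone.
samples = [json.dumps(d).encode() for d in docs]
dictionary = zstd.train_dictionary(4096, samples)
compressor = zstd.ZstdCompressor(dict_data=dictionary)
decompressor = zstd.ZstdDecompressor(dict_data=dictionary)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")
conn.executemany(
    "INSERT INTO docs (id, body) VALUES (?, ?)",
    [(d["id"], compressor.compress(json.dumps(d).encode())) for d in docs],
)

# The trade-off mentioned above: SQLite's json_* functions and indexes can't
# see into the compressed BLOB, so filtering on a field means decompressing
# every candidate row in application code.
row = conn.execute("SELECT body FROM docs WHERE id = ?", (42,)).fetchone()
print(json.loads(decompressor.decompress(row[0]))["title"])
```

You also have to ship the dictionary alongside the database, since the rows are unreadable without it.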
Mongo’s WiredTiger does pretty much what I want; it’s just not a neat flat file :/
Nice! Glad it resonated. Never quite sure how a project like this will land.
Thanks for sharing those - will check them out. Interested to see what happens as the size of the dataset grows.
I have not looked deeply, but Typesense[1] seems like another interesting project. Similar to ES or Algolia, easy to self-host, & with a seemingly efficient memory & disk footprint.
No compression of JSON docs. JSONB is even bigger than a JSON text column. There’s an extension out there, but it requires manual compression passes and doesn’t allow indexing. Might as well use SQLite.
Looks very similar to something I made a while ago[1]. SQLite is an awesome piece of software, and with JSON and full-text search baked in, it's ideal for these kinds of things. Glad to see that people are finally realizing it!
We've been sticking JSON documents into SQLite databases for years, simply because we can't be bothered to map thousands of business facts to individual columns. Turns out that this approach is fast enough that our customers can't tell the difference either way.
Decisions like this can make for one of those 10-100x speedups in development timelines. Even if you are convinced that JSON serialization+SQLite are too slow, you should still try it to make sure. You will almost certainly be surprised.
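Not their code, obviously, but the pattern being described is basically this (a minimal sketch; the table and field names are made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One TEXT column holds the whole serialized document, instead of mapping
# every business fact to its own column.
conn.execute("CREATE TABLE state (user_id TEXT PRIMARY KEY, doc TEXT NOT NULL)")

def save_state(user_id, doc):
    conn.execute(
        "INSERT INTO state (user_id, doc) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET doc = excluded.doc",
        (user_id, json.dumps(doc)),
    )

def load_state(user_id):
    row = conn.execute("SELECT doc FROM state WHERE user_id = ?", (user_id,)).fetchone()
    return json.loads(row[0]) if row else None

save_state("alice", {"plan": "pro", "facts": {"invoices": 12, "region": "eu"}})
print(load_state("alice"))
```

The serialize/deserialize round trip is usually far cheaper than people expect, which is the point being made above.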
Love it. Got some questions: What request volume does it handle? Do you shard requests? How large is the dataset? What is an acceptable response time? Any major slowdowns at the highest percentiles?
Our request volume has a strict upper bound that we can depend on. The part of the user base touching JSON+SQL code paths is well-constrained.
For the most contentious area of our application - updating a business state instance per user - we find that we are able to serve on the order of 1~10k requests per second. The size of these datasets is around 0.5~15 megabytes.
There isn't a whole lot of other magic involved. WAL is the most important thing for improving throughput.
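For anyone who hasn't run into it, the WAL bit is a one-line pragma. A minimal sketch (the file name and the `synchronous` setting are just illustrative, not something the parent comment specified):

```python
import sqlite3

conn = sqlite3.connect("app.db")
# Write-ahead logging lets readers proceed while a writer commits, which is
# usually the single biggest throughput win for this kind of workload.
conn.execute("PRAGMA journal_mode=WAL;")    # persistent: stored in the db file
conn.execute("PRAGMA synchronous=NORMAL;")  # a common pairing with WAL
print(conn.execute("PRAGMA journal_mode;").fetchone())  # -> ('wal',)
```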
This is phenomenal. The React app alone is truly impressive. You've put a lot of attention into things like displaying the query time and round-trip time.
At a glance (and it's been a while since I've touched MongoDB) the query language looks very similar, maybe compatible with a couple of very minor changes. Was that intentional?
Thank you! Appreciate it! Glad you noticed some of the details.
I didn't intentionally model the Mongo query language. Some requirements were:
- JSON-based: so I did not have to parse the query
- Easy to extend: so I could add new operators later
- DB independent: so I could swap out SQLite with postgres, mongo, cockroach, duckdb, etc. IF I ever wanted to
- Expressive: so I could do aggregations / group-bys / etc
I considered using more popular standards like the Elasticsearch API, but they weren't quite what I wanted. I liked how jsonlogic was basically just an AST. Its primitive structure makes code generation to other DB query languages straightforward.
I started this project with my own "JSON lisp" variant, but then found jsonlogic & used that since mine was _pretty_ close (only lists, no objects).
The top-level fields of my queries look like SQL (select, where, group_by, order_by) - I really like the Honeycomb query UI (https://www.honeycomb.io/query/) - so I wanted to be able to support user experiences like that ... but also go further via arbitrary jsonlogic expressions.
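To make the "AST means you can skip parsing" point concrete, here is a toy generator (not the project's actual code; the operator coverage and the `doc` column are assumptions):

```python
OPS = {"==": "=", "!=": "<>", ">": ">", ">=": ">=", "<": "<", "<=": "<="}

def to_sql(node, params):
    """Walk a jsonlogic-shaped dict and emit a SQLite WHERE expression."""
    if isinstance(node, dict):
        (op, args), = node.items()
        if op == "var":                      # {"var": "a.b"} -> json_extract(doc, '$.a.b')
            return f"json_extract(doc, '$.{args}')"
        if op in ("and", "or"):
            joined = f" {op.upper()} ".join(to_sql(a, params) for a in args)
            return f"({joined})"
        if op in OPS:
            lhs, rhs = (to_sql(a, params) for a in args)
            return f"({lhs} {OPS[op]} {rhs})"
        raise ValueError(f"unsupported op: {op}")
    params.append(node)                      # literal -> bound parameter
    return "?"

query = {"and": [{"==": [{"var": "status"}, "active"]},
                 {">": [{"var": "stats.score"}, 40]}]}
params = []
print(to_sql(query, params), params)
# ((json_extract(doc, '$.status') = ?) AND (json_extract(doc, '$.stats.score') > ?)) ['active', 40]
```

Because the query arrives already tree-shaped, there is no tokenizer or grammar anywhere, just a recursive walk.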
I built something kind of similar, a JSON document explorer/viewer specifically for the database from a video game - https://data.destinysets.com. Something I think is neat about this is that it uses the OpenAPI spec to link relationships. This is all completely client side for better or worse.
Nice! The nested explorer you have there is REALLY slick. Great work!
This is something I had been thinking about - when dealing w/ normalized JSON documents, how might I allow exploration of referenced documents? & how might I encode those relationships in some way that's lightweight?
I think this is great. And on a tangent, a few months ago I was searching "everywhere" to find a somewhat standard REST API query format. There's OData, but that was way too heavy.
I don't know how popular jsonlogic itself really is, but it definitely seems like a good fit for queries.
Yeah, jsonlogic does not seem THAT popular. A JSON-based solution that looks like an AST means you can skip query parsing & just do generation. I started this project with my own JSON Lisp DSL before going with jsonlogic.
In either case, I like the idea of being able to swap out arbitrary storage layers - SQLite, Postgres, Mongo, etc. - while keeping the JSON API.
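As a tiny illustration of that idea (a made-up helper, not from the project): the same pre-parsed comparison can be emitted for different backends, because there is no query text to re-parse, only a tree to walk:

```python
def emit_eq(path: str, target: str) -> str:
    """Emit 'field = parameter' for one backend; path looks like 'stats.score'."""
    if target == "sqlite":
        return f"json_extract(doc, '$.{path}') = ?"
    if target == "postgres":
        pg_path = "{" + path.replace(".", ",") + "}"   # '{stats,score}'
        return f"doc #>> '{pg_path}' = %s"             # JSONB path-as-text operator
    raise ValueError(target)

print(emit_eq("stats.score", "sqlite"))    # json_extract(doc, '$.stats.score') = ?
print(emit_eq("stats.score", "postgres"))  # doc #>> '{stats,score}' = %s
```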
Somebody is using a database (SQLite in this case) to store JSON objects. There is nothing strange or unusual here.
When doing this, though, the problem is that it's harder to run search queries with the same expressiveness you would have if the data had been properly "unfolded" into many columns of a relational database.
This tool lets you make such expressive queries, which is very useful.
Interestingly, the search query itself is expressed as a JSON object (a design choice; it could have been otherwise. In any case, it's cute.)
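For anyone wondering how much expressiveness you actually give up: not that much, at least in SQLite with the JSON functions available (any recent build). A minimal sketch with made-up table and fields, showing filtering, grouping, and even indexing on fields inside the blob:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events (doc) VALUES (?)",
    [(json.dumps({"kind": k, "ms": ms}),) for k, ms in
     [("click", 12), ("click", 40), ("page_view", 7)]],
)

# Expression index so the filter below doesn't have to scan every document.
conn.execute("CREATE INDEX idx_kind ON events (json_extract(doc, '$.kind'))")

rows = conn.execute("""
    SELECT json_extract(doc, '$.kind')       AS kind,
           COUNT(*)                          AS n,
           AVG(json_extract(doc, '$.ms'))    AS avg_ms
    FROM events
    WHERE json_extract(doc, '$.ms') > 5
    GROUP BY kind
    ORDER BY n DESC
""").fetchall()
print(rows)  # e.g. [('click', 2, 26.0), ('page_view', 1, 7.0)]
```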