A neat trick for working with JSON is to store each JSON document as-is in one column, then create virtual (generated) columns that extract the specific fields you want to query, using json_extract, and index those columns.
This makes for super-fast search, and the best part is you don't have to choose what to index at insert time; you can always make more virtual columns when you need them.
(You can still also search non-indexed, raw json, although it may take a long time for large collections).
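A minimal sketch of that pattern via Python's stdlib sqlite3 module (the table and column names here are invented; generated columns require SQLite 3.31+):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE docs (
        body   TEXT,  -- the raw JSON document, stored as-is
        author TEXT GENERATED ALWAYS AS (json_extract(body, '$.author')) VIRTUAL
    )
""")
# Index the virtual column so lookups on it don't scan the raw JSON
db.execute("CREATE INDEX idx_docs_author ON docs(author)")
db.execute("INSERT INTO docs (body) VALUES (?)",
           ('{"author": "alice", "title": "hello"}',))
author = db.execute("SELECT author FROM docs WHERE author = ?",
                    ("alice",)).fetchone()[0]
print(author)  # alice
```

The virtual column costs no extra row storage; it's computed from `body` at read time, and only the index persists a copy of the extracted values.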
It's a shame SQLite doesn't support indexing arbitrary JSON data the way Postgres does. Postgres's GIN indexes work by hashing the keys and values within each JSON doc (or entire paths, if you choose jsonb_path_ops, which you probably should). https://www.postgresql.org/docs/current/datatype-json.html#J...
Unfortunately I don't think SQLite has generalized support for multi-valued indexes, though perhaps it could be implemented using the virtual table mechanism, like full-text search. https://www.sqlite.org/fts5.html
With a feature like this is there still a good reason to use MongoDB?
Honest question: I know Oracle and Postgres well, but not their JSON features. I'm just starting to learn MongoDB seriously because of a current project.
Not really, imo. The biggest advantage of Mongo was being able to store a single hierarchical document and search it. Now that you can do that in the RDBMS there's no advantage at all. You also throw away all capability of doing relations in the database with Mongo which is optional in an RDBMS and sometimes very beneficial.
Mongo could win if it were more performant or had a better query syntax... but it isn't and it doesn't. It might have some slight advantage in the speed you can set up a replicated cluster but you'll pay long-term in overall performance from my experience. If all you are doing is storing documents, just use an S3 bucket, etc...
To me the only really good reason to use MongoDB these days is a very high data-ingestion rate. That was the main reason CERN used it at the LHC, but I don't know whether that solution is still in place.
But you need to know in advance which keys/paths you want to index.
Postgres can index the entire JSON document (or parts of it), and such an index can support query conditions that aren't known in advance (e.g. using a JSON path). This isn't as fast as a proper B-tree index, but it's still much faster than a full table scan.
I think the limiting factor is that there's no way to split the single JSON document out into all its paths and values in a way that SQLite can index it. There's `json_each` and `json_tree` but there's no way to generate a (dynamic) table using these functions that you could then index, as far as I know.
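To illustrate: json_tree does explode a document into (path, value) rows, but only at query time, so there is nothing persistent for SQLite to index (sketched here with Python's sqlite3; the document is made up):

```python
import sqlite3

db = sqlite3.connect(":memory:")
doc = '{"user": {"name": "alice", "tags": ["a", "b"]}}'
# json_tree walks the document depth-first; filter out the container
# nodes so only leaf path/value pairs remain.
rows = db.execute(
    "SELECT fullkey, value FROM json_tree(?) "
    "WHERE type NOT IN ('object', 'array')",
    (doc,),
).fetchall()
for fullkey, value in rows:
    print(fullkey, value)
# $.user.name alice
# $.user.tags[0] a
# $.user.tags[1] b
```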
I believe this could probably be done using a trigger on document insert, but that would mean actually inserting each path and value into its own table, rather than SQLite generating them on the fly, so it would likely more than double the storage requirement and require inserting potentially hundreds of rows per document, depending on what your original document structure looks like.
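A rough sketch of that trigger approach (the table and trigger names are hypothetical): each insert into docs also materializes every leaf path/value pair into a side table, which can then be indexed:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE doc_paths (doc_id INTEGER, fullkey TEXT, value TEXT);
    CREATE INDEX idx_doc_paths ON doc_paths(fullkey, value);

    -- Materialize every leaf (path, value) of the new document
    CREATE TRIGGER docs_ai AFTER INSERT ON docs BEGIN
        INSERT INTO doc_paths (doc_id, fullkey, value)
        SELECT new.id, fullkey, value
        FROM json_tree(new.body)
        WHERE type NOT IN ('object', 'array');
    END;
""")
db.execute("INSERT INTO docs (body) VALUES (?)",
           ('{"a": {"b": 1}, "c": [2, 3]}',))
# Three leaves: $.a.b, $.c[0], $.c[1]
n = db.execute("SELECT count(*) FROM doc_paths").fetchone()[0]
print(n)  # 3
```

As noted, this trades storage and insert cost for indexability; a matching DELETE/UPDATE trigger would also be needed to keep the side table in sync.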
I love this setup for working with external APIs. Often I'll only want a couple of fields from a response and don't want to build out a huge schema to match the entire response, but it's hard to know in advance which fields will be most useful. By spending some disk space to store the entire response you can then add fields you need ad hoc as virtual cols, like you describe.
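A sketch of that workflow (the response shape is hypothetical): SQLite lets you bolt on a VIRTUAL generated column after the fact with ALTER TABLE, and existing rows pick it up immediately, since virtual columns are computed at read time:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE responses (body TEXT)")
# Store the whole API response up front, schema-free
db.execute("INSERT INTO responses (body) VALUES (?)",
           ('{"id": 42, "user": {"login": "alice"}, "extra": {}}',))

# Later: it turns out the login field is needed, so surface and index it
db.execute("""
    ALTER TABLE responses ADD COLUMN login TEXT
    GENERATED ALWAYS AS (json_extract(body, '$.user.login')) VIRTUAL
""")
db.execute("CREATE INDEX idx_responses_login ON responses(login)")
login = db.execute("SELECT login FROM responses").fetchone()[0]
print(login)  # alice
```

Note that ALTER TABLE can only add VIRTUAL generated columns, not STORED ones, which is exactly what you want here anyway.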
I've baked a ton of different SQLite tricks - including things like full-text indexing support and advanced alter table methods - into my sqlite-utils CLI tool and Python library: https://sqlite-utils.datasette.io
My Datasette project provides tools for exploring, analyzing and publishing SQLite databases, plus ways to expose them via a JSON API: https://datasette.io
I've also written a ton of stuff about SQLite on my two blogs:
Your excellent work around SQLite led me to build https://dmd.tanna.dev and some other projects with it recently, and it's been such a game changer for me, so thank you.
Oh wow, thanks very much :D Yes, Datasette was _so_ useful in helping my organisation be more productive with understanding the data; it was much nicer than folks writing SQL locally. Huge props for building it.
SQLite is good stuff. Fan as I am of BSON over JSON as a more precise and comprehensive datatype for data management, I wrote this:
https://github.com/buzzm/sqlitebson
I love SQLite so much.