It's not like datomic claims to be lightening fast, it's not. The real problem i...

quoll · on Nov 18, 2022

Not sure what you mean here? as_of() is easy to do efficiently, so long as you have tree indexes (it doesn't really matter what type of tree, though fractal trees would not be good). But you need direct access to the tree, rather than having an index that's built using a tree, since then you can't build the phases.

packetlost · on Nov 18, 2022

Right, if you have direct access to the tree it's possible and not really inefficient at all (a modified B+Tree would be my first pick, but I digress). Part of my problem was I was trying to offload as much of the work to SQLite as possible. I think if I were to go back and try again, I would make it a bit further :)

harryvederci · on Nov 18, 2022

I'm talking about Datalog, I mentioned Datomic purely in the context of syntax.

And to sum up what I know about Datalog, it's basically just this article (not reference material, but it got me interested in the concept), plus some light wikipedia-ing: https://www.instantdb.com/essays/datalogjs

Edit: I think you added the part with your git repo after I wrote this. Looks cool, could you add a license file to it?

packetlost · on Nov 18, 2022

Got it. Yeah, datalog is cool. I really like it (if that wasn't obvious from my attempt at implementing it). While not built on SQLite, this project has caught my attention recently: https://github.com/cozodb/cozo

It's built on RocksDB and has slightly different query syntax (supposedly to be more similar to Python's, as a primary target usecase is within Jupyter notebooks). Check it out!

zh217 · on Nov 19, 2022

More precisely, Cozo currently has a RocksDB backend. The development version also has SQLite backend (for easier compilation on mobile platforms), in-memory backend, etc. It now can run as a web assembly module as well (see https://cozodb.github.io/wasm-demo/)

With respect to the SQLite backend, Cozo uses SQLite purely as a KV store with range scan capabilities, with both keys and values stored as bytes. This actually results in faster queries than offloading some stuff to the SQLite engine for the (very few) cases we tested.

Disclaimer: I'm the author of Cozo.

packetlost · on Nov 19, 2022

Oh wow, I didn't catch that. Awesome work man, I'm fan of Cozo and hope to make contributions in the future!

zh217 · on Nov 19, 2022

What you described can be implemented efficiently with B-trees. The problem is, the required operation is low-level and not usually exposed by the database engines. Basically you need to be able to walk the tree, up and down.

Say your table is `[data, timestamp]`, with data sorted ascendingly and timestamp sorted descendingly in the tree, and you want to scan for values for `as_of(T)`. Assume you have found `[data1, T1]` as a valid row. To get the next row, instead of the usual scanning, you seek to the value greater than `[data1, neg_inf]`. Now assume the value you get for the seek is `[data2, T2]`. If `T2 <= T`, then you get a match and can continue with the loop by seeking to `[data2, neg_inf]`, otherwise seek to `[data2, T]` and see what you get.

It is doable in SQLite, but you must use its VM directly, as you cannot do tree-walking with SQL.

Assume you have `M` keys and that for every key you have `N` timestamped data points. The above algorithm cuts down the as-of query time from linear complexity in `N` (`MN log(M)`) to logarithmic complexity in `N` (`M log(MN)`), which I think is the best we can do.

packetlost · on Nov 19, 2022

If I were to implement it, I would have a custom B+tree where you have a double-layer. The "upper" half of the tree the blocks are sorted by the "key". Once you get to the "leaf" nodes of the upper layer, you have a "lower layer" that contains blocks, all with the same key (or omit it), but sorted by the timestamp/tx instant. You get O(log(n) + log(m)) for historical queries, but O(log(n)) for "current" queries.

Or you can just index all modifications to all keys and have them ordered by key then tx, which will give you O(log(n)), but where your n is all values that have ever been set set.