Hacker News new | past | comments | ask | show | jobs | submit login

It's not like datomic claims to be lightening fast, it's not. The real problem is if you want to support the historical queries "as_of()", which is very difficult to do efficiently with only B-tree indexes. So you're either left maintaining your own indices as tables or externalizing it to something else. I think the latter would actually be a pretty ok option if your dataset fits entirely in memory (though... for historical queries it will grow much faster than a normal database).

Regardless, I have a partial attempted implementation of this on top of SQLite using Python: https://git.sr.ht/~chiefnoah/quark

No query engine, I didn't get that far.




Not sure what you mean here? as_of() is easy to do efficiently, so long as you have tree indexes (it doesn't really matter what type of tree, though fractal trees would not be good). But you need direct access to the tree, rather than having an index that's built using a tree, since then you can't build the phases.


Right, if you have direct access to the tree it's possible and not really inefficient at all (a modified B+Tree would be my first pick, but I digress). Part of my problem was I was trying to offload as much of the work to SQLite as possible. I think if I were to go back and try again, I would make it a bit further :)


I'm talking about Datalog, I mentioned Datomic purely in the context of syntax.

And to sum up what I know about Datalog, it's basically just this article (not reference material, but it got me interested in the concept), plus some light wikipedia-ing: https://www.instantdb.com/essays/datalogjs

Edit: I think you added the part with your git repo after I wrote this. Looks cool, could you add a license file to it?


Got it. Yeah, datalog is cool. I really like it (if that wasn't obvious from my attempt at implementing it). While not built on SQLite, this project has caught my attention recently: https://github.com/cozodb/cozo

It's built on RocksDB and has slightly different query syntax (supposedly to be more similar to Python's, as a primary target usecase is within Jupyter notebooks). Check it out!


More precisely, Cozo currently has a RocksDB backend. The development version also has SQLite backend (for easier compilation on mobile platforms), in-memory backend, etc. It now can run as a web assembly module as well (see https://cozodb.github.io/wasm-demo/)

With respect to the SQLite backend, Cozo uses SQLite purely as a KV store with range scan capabilities, with both keys and values stored as bytes. This actually results in faster queries than offloading some stuff to the SQLite engine for the (very few) cases we tested.

Disclaimer: I'm the author of Cozo.


Oh wow, I didn't catch that. Awesome work man, I'm fan of Cozo and hope to make contributions in the future!


What you described can be implemented efficiently with B-trees. The problem is, the required operation is low-level and not usually exposed by the database engines. Basically you need to be able to walk the tree, up and down.

Say your table is `[data, timestamp]`, with data sorted ascendingly and timestamp sorted descendingly in the tree, and you want to scan for values for `as_of(T)`. Assume you have found `[data1, T1]` as a valid row. To get the next row, instead of the usual scanning, you seek to the value greater than `[data1, neg_inf]`. Now assume the value you get for the seek is `[data2, T2]`. If `T2 <= T`, then you get a match and can continue with the loop by seeking to `[data2, neg_inf]`, otherwise seek to `[data2, T]` and see what you get.

It is doable in SQLite, but you must use its VM directly, as you cannot do tree-walking with SQL.

Assume you have `M` keys and that for every key you have `N` timestamped data points. The above algorithm cuts down the as-of query time from linear complexity in `N` (`MN log(M)`) to logarithmic complexity in `N` (`M log(MN)`), which I think is the best we can do.


If I were to implement it, I would have a custom B+tree where you have a double-layer. The "upper" half of the tree the blocks are sorted by the "key". Once you get to the "leaf" nodes of the upper layer, you have a "lower layer" that contains blocks, all with the same key (or omit it), but sorted by the timestamp/tx instant. You get O(log(n) + log(m)) for historical queries, but O(log(n)) for "current" queries.

Or you can just index all modifications to all keys and have them ordered by key then tx, which will give you O(log(n)), but where your n is all values that have ever been set set.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: