SirixDB – An Evolutionary, Temporal NoSQL Storage System (github.com)
88 points by burtonator 15 days ago | 25 comments



It would be interesting to hear about the use cases that prompted the creation of Sirix. For example, we were inspired to build Crux[0], another bitemporal document database (although we opted for Datalog rather than XQuery), by our experiences of integrating timestamped data from multiple upstream systems, whilst coping with delays and ad-hoc corrections, and also maintaining efficient time-travel auditability.

[0] https://juxt.pro/crux


Some of the use cases can be found here:

https://sirix.io/documentation.html

The main distinctive features are: our main document store index; features borrowed from ZFS (checksums in parent pointers, the main index being a trie, compression and hopefully soon encryption of page fragments, an always-consistent on-disk state, a log-structured design without the need for a WAL...); versioned user-defined indexes; highly concurrent data structures (every transaction has access to one snapshot and we only allow one read/write transaction, so parallelization has to be done by the client code); and record-level versioning (including a novel sliding window algorithm, whereas a slightly different implementation is patented by the founder, Marc Kramis).
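To make the ZFS-style "checksums in parent pointers" idea concrete, here is a minimal sketch: the checksum of a page lives in the pointer that references it, not in the page itself, so a corrupted child is detected when it is read through its parent. The class and method names are hypothetical, SHA-256 is an arbitrary choice, and this is not Sirix's actual page layout.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Hypothetical parent-side pointer: the child's location plus the child's checksum."""
    page_key: int
    checksum: bytes


class PageStore:
    """Toy append-only (log-structured) page store; page keys are list offsets (assumption)."""

    def __init__(self):
        self._pages = []  # list of page payloads as bytes

    def write(self, payload: bytes) -> PageRef:
        # Append the page and return a reference carrying its checksum,
        # so the *parent* stores the checksum rather than the child.
        self._pages.append(payload)
        return PageRef(len(self._pages) - 1, hashlib.sha256(payload).digest())

    def read(self, ref: PageRef) -> bytes:
        payload = self._pages[ref.page_key]
        # Verify the child against the checksum held in the parent pointer.
        if hashlib.sha256(payload).digest() != ref.checksum:
            raise IOError(f"checksum mismatch for page {ref.page_key}")
        return payload

    def corrupt(self, page_key: int) -> None:
        """Test helper: flip the first byte of a non-empty page to simulate bit rot."""
        data = bytearray(self._pages[page_key])
        data[0] ^= 0xFF
        self._pages[page_key] = bytes(data)
```

Because the checksum is stored one level up, a page cannot silently vouch for its own corrupted contents; the same reasoning applies recursively up to the root.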

I might implement pointer swizzling for the upcoming 1.0.0 release, which should speed up Sirix considerably :-)

https://sirix.io/features.html

I think XQuery is great for querying JSON data. That said, in the future I'd also like to implement something based on Spark to distribute queries...


Cool, thanks for the summary! The user-defined indexes sound particularly intriguing.

The documentation page alludes to payroll, audit and decision support applications -- have you implemented one or more of those already? I have been compiling a list of known uses for bitemporality in the Crux docs which you are more than welcome to borrow from:

https://juxt.pro/crux/docs/bitemp.html#_known_uses

It is great to see all this new enthusiasm for temporal databases now that they are finally viable, decades after all the major research happened :)


I have to sleep now... but yes, the ideas and the first prototype emerged as early as 2006, I think, from Marc Kramis (back then it was internally named Idefix, then Treetank... but I think Treetank is a pretty strange name ;-))


How is Crux different from Datomic?


These are the differences I think I know about so far:

- You can't lie to Datomic: it's immutable and has some really interesting performance characteristics because of that, i.e. TTL = infinity. The downside from a legal POV is that no information is really gone, it's only redacted from now on. I think the law should be updated so that redaction is enough (if it isn't already; I'm not a lawyer). Crux can delete things; I'm not sure about the performance ramifications.

- Datomic stores information as datoms, which are single pieces of information that are easy to rearrange at run time into any shape of tree you like. Crux is a document store, where each document is already a small tree; I'm not sure what abilities there are to rearrange the tree at query time.

- Storing business time in Datomic is inefficient because it has to be added to the transaction, which I don't think is added to indexes. This is the primary use case of Crux, though.

- Cardinality is defined up front in Datomic, which I suspect catches data integrity errors. I haven't seen anything in Crux to support this.

- Crux doesn't support either the pull or the entity syntax (I don't know which one, or why that's important; I'm a bit out of my depth now).
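The "business time" point is about bitemporality: a fact has a valid (business) time, when it became true in the world, and a transaction time, when the database learned about it. A minimal in-memory sketch of an "as of" lookup over both axes follows; it is a toy model (no end times, linear scan), not how Crux or Datomic actually index or query this, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    valid_from: int  # valid/business time: when the fact became true in the world
    tx_time: int     # transaction time: when the database recorded it
    value: str


def as_of(versions: List[Version], valid_time: int, tx_time: int) -> Optional[str]:
    """Return the value valid at `valid_time`, as the database knew it at `tx_time`."""
    # Only consider facts already recorded by tx_time (the audit view)
    # and already in effect at valid_time.
    known = [v for v in versions if v.tx_time <= tx_time and v.valid_from <= valid_time]
    if not known:
        return None
    # Latest valid_from wins; for the same valid_from, the later
    # transaction is treated as a correction of the earlier one.
    best = max(known, key=lambda v: (v.valid_from, v.tx_time))
    return best.value
```

For example, an address change that took effect at time 5 but was only recorded at time 10 is invisible to a query with `tx_time=6`, even though its valid time has already passed; a later correction for the same valid time wins once its transaction time is reached.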


Wow, great summary, thanks from me, too :)


Has Brackit work moved over to SirixDB? I've been looking at Brackit but don't see any commits to the Brackit repo in over a year. I'm looking forward to checking out SirixDB.


No, I had to "fork" it while it was still on Google Code, because of the temporal extensions and, lately, the JSON stuff. Sebastian sadly doesn't seem to be on GitHub a lot. I would love to contribute my changes, but I also need the ability to change/fix stuff on my own. That said, I really love the whole idea of Brackit: a flexible query compiler framework with proven rules to rewrite the AST, usable by any data store... First of all, I want to plug in rewrite rules for index accesses as well as for operators, and so on (as of now, the indexes can be created and queried manually). Sirix currently has to use raw document scans, but I'd love to work on rewrite rules and then on a cost-based optimizer (that said, it would be great to find someone experienced who can help a bit). His Ph.D. thesis can be found here: http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publicatio...
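As a toy illustration of the kind of rewrite rule mentioned above, the sketch below replaces a full document scan followed by a path filter with an index lookup whenever a matching index exists. The plan node names and the index registry are hypothetical and unrelated to Brackit's actual AST or rule API; a real optimizer would also verify that the predicate is an equality comparison and weigh costs instead of always preferring the index.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Scan:
    collection: str          # raw document scan over a collection


@dataclass(frozen=True)
class Filter:
    child: "Plan"
    path: str                # e.g. "/address/city"
    value: str               # equality predicate: path = value


@dataclass(frozen=True)
class IndexLookup:
    index_name: str
    value: str


Plan = Union[Scan, Filter, IndexLookup]


def use_path_index(plan: Plan, indexes: Dict[Tuple[str, str], str]) -> Plan:
    """Rewrite Filter(Scan(c), path = v) into IndexLookup if an index on (c, path) exists."""
    if isinstance(plan, Filter):
        # Rewrite bottom-up so nested filters are handled too.
        child = use_path_index(plan.child, indexes)
        if isinstance(child, Scan):
            index_name = indexes.get((child.collection, plan.path))
            if index_name is not None:
                return IndexLookup(index_name, plan.value)
        return Filter(child, plan.path, plan.value)
    return plan
```

When no index matches, the plan is returned unchanged, so the rule is always safe to apply.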

And Sebastian worked under supervision from Dr. Theo Härder :-)


so anyway -

Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

from here https://www.w3.org/TR/xml/#sec-common-syn

so I think the following

  <rest:sequence xmlns:rest="https://sirix.io/rest">
    <rest:item>
      <xml rest:id="1">
        foo
        <bar rest:id="3">
          <test rest:id="4">
            yikes
            <bar rest:id="6"/>
          </test>
        </bar>
      </xml>
    </rest:item>
  </rest:sequence>

is not well-formed, or has there been some change I've missed?


Okay, thanks. I'd need to change the example and add this check to the token checker before inserting elements/attributes...
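A minimal sketch of the check proposed for the token checker, following the quoted rule from the XML 1.0 specification: a name is reserved if it begins with the letters x, m, l in any combination of case. The function names are hypothetical. Note that the `xmlns` prefix used for namespace declarations also matches, so a real checker would treat namespace declarations as a separate case.

```python
def is_reserved_xml_name(name: str) -> bool:
    """True if name matches (('X'|'x') ('M'|'m') ('L'|'l')) at its start."""
    # Explicit per-character check, mirroring the grammar in the spec,
    # instead of relying on Unicode case folding.
    return (
        len(name) >= 3
        and name[0] in "Xx"
        and name[1] in "Mm"
        and name[2] in "Ll"
    )


def check_insertable_name(name: str) -> str:
    """Hypothetical pre-insert check for user-supplied element/attribute names."""
    if is_reserved_xml_name(name):
        raise ValueError(f"name {name!r} is reserved by the XML specification")
    return name
```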


My latest article released a few days ago can be found here:

https://hackernoon.com/asynchronous-temporal-rest-with-vert-...


What's the use-case for time-travel queries vs. just using time series databases? (ie. influx, timescale, kdb etc.)


Hey, I think the best explanation is this:

https://stackoverflow.com/questions/51533143/temporal-vs-tim...

:-)


The use case listed there (the change of address) can be implemented perfectly fine with any rational or non-rational database. I personally would not switch to a completely new database and a new paradigm just because of the requirement to show a user's address based on the date. And for data that changes a lot, a time series DB still seems like a better choice.

So what's the real use-case?


You can reconstruct a revision in O(n), you can search for a specific revision in O(log n), and the transaction time (the time the transaction commits) is stored in a revision root page. Thus, you do not have to store the time for each node (not even start and end times).
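The O(log n) search for a revision follows from storing one commit timestamp per revision: since revisions are committed in time order, the timestamps form a sorted sequence that can be binary-searched for "the latest revision committed at or before time t". A minimal sketch under that assumption; the class and method names are hypothetical, not Sirix's API.

```python
import bisect
from typing import List, Optional


class RevisionIndex:
    """Commit timestamps, one per revision (as if read from the revision root pages)."""

    def __init__(self) -> None:
        self._commit_times: List[int] = []

    def commit(self, commit_time: int) -> int:
        # Revisions are appended in commit order, so times must not decrease.
        if self._commit_times and commit_time < self._commit_times[-1]:
            raise ValueError("commit times must be non-decreasing")
        self._commit_times.append(commit_time)
        return len(self._commit_times) - 1  # the new revision number

    def revision_at(self, point_in_time: int) -> Optional[int]:
        """Latest revision committed at or before point_in_time, in O(log n)."""
        i = bisect.bisect_right(self._commit_times, point_in_time)
        return i - 1 if i > 0 else None
```

Only the revision roots carry a timestamp; the nodes inside a revision need none, which matches the point about not storing per-node start and end times.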

Furthermore, Sirix does not have to copy whole record pages that have changed; this depends on the chosen versioning algorithm. The whole structure is highly concurrent and allows client-side parallelism. We also do not have to write to a WAL first, yet the structure is always consistent (if no hardware failure occurs...).
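One way to see why a changed page does not have to be copied in full: each revision can write only a page fragment holding the records that changed, and a page is read back by merging fragments from newest to oldest. The sketch below shows a generic incremental scheme with a full dump every k revisions to bound read cost; it is an illustration under that assumption, not necessarily one of the algorithms Sirix ships (its sliding window variant differs). Names and the dump interval are hypothetical, and record deletions, which would need tombstones, are omitted.

```python
from typing import Dict, List

Fragment = Dict[int, str]  # record slot -> record value (changed records only)


def reconstruct(fragments_newest_first: List[Fragment]) -> Dict[int, str]:
    """Merge page fragments; the newest version of each record slot wins."""
    page: Dict[int, str] = {}
    for fragment in fragments_newest_first:
        for slot, record in fragment.items():
            page.setdefault(slot, record)  # keep only the first (newest) value seen
    return page


def fragments_to_read(revision: int, full_dump_every: int) -> List[int]:
    """Revisions whose fragments must be read: back to the last full dump (hypothetical policy)."""
    last_full_dump = revision - revision % full_dump_every
    return list(range(revision, last_full_dump - 1, -1))
```

The trade-off is the usual one: a smaller `full_dump_every` means fewer fragments to merge on reads but more data written per revision.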


*relational (damnit!)


By the way, if you check it out and find bugs, please let me know :-) It's still at an early stage and for sure has some rough edges.

Thanks and have a great day :)

Oh and maybe another interesting article: https://hackernoon.com/sirix-io-why-copy-on-write-semantics-...


By the way, would it make sense to work on these visualizations again for comparing revisions?

https://youtu.be/l9CXXBkl5vI

Or would anyone be interested in porting the processing.org stuff to the web? Sadly I'm a backend developer and feel a bit overwhelmed by all the front-end frameworks and JavaScript stuff, but I think the visualizations for comparing revisions are really helpful.

More information: https://github.com/JohannesLichtenberger/master-thesis/raw/m...


Oh, and I think with pointer swizzling (Java object references instead of references to the page IDs in the buffer manager) we can gain quite some speed :-)
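A rough illustration of pointer swizzling, in Python rather than Java: a reference starts out holding only a page key, and on first access it is resolved once through a loader (standing in for the buffer manager) and then "swizzled" into a direct object reference, so later accesses skip the lookup entirely. Names are hypothetical; a real implementation would also have to unswizzle references when a page is evicted from memory.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SwizzlingRef(Generic[T]):
    """Page reference that starts as a page key and becomes a direct object reference."""

    def __init__(self, page_key: int, loader: Callable[[int], T]) -> None:
        self.page_key = page_key
        self._loader = loader            # hypothetical buffer-manager/disk lookup
        self._page: Optional[T] = None   # set once the reference is swizzled

    @property
    def swizzled(self) -> bool:
        return self._page is not None

    def get(self) -> T:
        if self._page is None:
            # Slow path: resolve the page key through the loader exactly once.
            self._page = self._loader(self.page_key)
        # Fast path: follow the in-memory reference, no key lookup needed.
        return self._page
```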

And maybe adding memory mapped files in the future.


And I also have an idea of how to save even more space on disk/on the flash drive :-)


Oh BTW: I'm also not a native English speaker, so any hints about stuff that's wrong are also greatly appreciated :-) I just got my first spelling-correction pull request. You guys are really awesome, thanks :)


(the first in a few years), but thanks so much :)


By the way I've added some documentation lately on https://sirix.io :-)


Wow, thank you so much for posting/mentioning :-)

Have a great evening Johannes



