
SirixDB – An Evolutionary, Temporal NoSQL Storage System - burtonator
https://github.com/sirixdb/sirix
======
refset
It would be interesting to hear about the use-cases that prompted the creation
of Sirix. For example, we were inspired to build Crux[0], another bitemporal
document database (although we opted for Datalog rather than XQuery),
following our experiences of integrating timestamped data from multiple
upstream systems, whilst coping with delays and ad-hoc corrections, and also
maintaining efficient time-travel auditability.

[0] [https://juxt.pro/crux](https://juxt.pro/crux)

~~~
lichtenberger
Some of the use cases can be found here:

[https://sirix.io/documentation.html](https://sirix.io/documentation.html)

The main distinctive features are the main index of our document store,
features inspired by ZFS (checksums stored in parent pointers, the main index
being a trie, compression and hopefully soon encryption of page fragments, an
on-disk format that is always consistent, log-structured storage without the
need for a WAL...), versioned user-defined indexes, highly concurrent data
structures (every transaction is bound to one snapshot and we only allow one
read/write transaction; parallelization has to be done by the client code) and
record-level versioning (including a novel sliding window algorithm, while a
slightly different implementation is patented by the founder, Marc Kramis).
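To make "checksums in parent pointers" concrete: as in ZFS, the checksum of a page is stored in the reference that points to it rather than next to the page data, so every read that follows the pointer can verify what it got back. Here is a minimal sketch, assuming a toy in-memory append-only log; `PageRef`, `PageStore`, `write_page` and `read_page` are hypothetical names, not Sirix's actual API.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Hypothetical parent-side pointer: child location plus child checksum."""
    offset: int
    checksum: bytes


class PageStore:
    """Toy append-only page store (stand-in for a log-structured file)."""

    def __init__(self) -> None:
        self._log: list[bytes] = []

    def write_page(self, data: bytes) -> PageRef:
        # Append the page and return a reference carrying its hash; the
        # caller is expected to store this reference inside the *parent* page.
        self._log.append(data)
        return PageRef(len(self._log) - 1, hashlib.sha256(data).digest())

    def read_page(self, ref: PageRef) -> bytes:
        data = self._log[ref.offset]
        # Verify against the checksum held by the parent, so silent corruption
        # of the child (or a misdirected write) is detected on read.
        if hashlib.sha256(data).digest() != ref.checksum:
            raise IOError(f"checksum mismatch for page at offset {ref.offset}")
        return data
```

Because the parent page is itself referenced (and checksummed) by its own parent, the checks chain up to the root, which is what makes the whole tree self-validating.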

I might implement pointer swizzling for the upcoming 1.0.0 release, which
should speed up Sirix considerably :-)

[https://sirix.io/features.html](https://sirix.io/features.html)

I think XQuery is great for querying JSON data. That said, in the future I'd
also like to implement something based on Spark to distribute queries...

~~~
refset
Cool, thanks for the summary! The user-defined indexes sound particularly
intriguing.

The documentation page alludes to payroll, audit and decision support
applications -- have you implemented one or more of those already? I have been
compiling a list of known uses for bitemporality in the Crux docs which you
are more than welcome to borrow from:

[https://juxt.pro/crux/docs/bitemp.html#_known_uses](https://juxt.pro/crux/docs/bitemp.html#_known_uses)

It is great to see all this new enthusiasm for temporal databases now that
they are finally viable, decades after all the major research happened :)

~~~
lichtenberger
Will have to sleep now... but yes, the ideas and the first prototype emerged,
I think, as early as 2006 from Marc Kramis (back then it was internally named
Idefix, then Treetank... but I think Treetank is a pretty strange name ;-))

------
zcw100
Has Brackit work moved over to SirixDB? I’ve been looking at Brackit but don’t
see any commits to the Brackit repo in over a year. I’m looking forward to
checking out SirixDB

~~~
lichtenberger
No, I had to "fork" it back when it was still on Google Code, because of the
temporal extensions and, lately, the JSON stuff. Sebastian sadly doesn't seem
to be on GitHub a lot. I would love to contribute my changes back, but I also
need the ability to change/fix stuff on my own. That said, I really love the
whole idea of Brackit: a flexible query compiler framework with proven rules
to rewrite the AST, usable by any data store... First of all, I want to plug
in rewrite rules for index accesses as well as operators, and so on (as of
now, the indexes can be created and queried manually). Sirix basically still
has to use raw document scans, but I'd love to work on rewrite rules and then
on a cost-based optimizer (that said, it would be great to find someone who is
experienced and can help a bit). Sebastian's Ph.D. thesis can be found here:
[http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publicatio...](http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2013/Dissertation-Baechle.pdf)

And Sebastian worked under the supervision of Dr. Theo Härder :-)
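As a rough illustration of the kind of index-access rewrite rule meant here (replacing a raw document scan plus a filter with a lookup in a user-defined index), here is a minimal sketch over a toy plan AST. `Scan`, `Filter`, `IndexLookup` and `rewrite` are hypothetical names for illustration only; they are not Brackit's operator classes or its rule API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical operator: raw scan over every node of a document."""
    document: str


@dataclass(frozen=True)
class Filter:
    """Hypothetical operator: keep nodes whose value at `path` equals `value`."""
    source: Plan
    path: str
    value: str


@dataclass(frozen=True)
class IndexLookup:
    """Hypothetical operator: point lookup in a user-defined index on `path`."""
    document: str
    path: str
    value: str


Plan = Union[Scan, Filter, IndexLookup]


def rewrite(plan: Plan, indexed_paths: set[tuple[str, str]]) -> Plan:
    """Turn Filter(Scan(doc)) into IndexLookup if an index on (doc, path) exists."""
    if isinstance(plan, Filter):
        source = rewrite(plan.source, indexed_paths)  # rewrite bottom-up
        if isinstance(source, Scan) and (source.document, plan.path) in indexed_paths:
            return IndexLookup(source.document, plan.path, plan.value)
        return Filter(source, plan.path, plan.value)
    return plan
```

A cost-based optimizer would go one step further and choose between the scan and the index lookup using statistics, instead of always preferring the index as this rule does.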

------
bryanrasmussen
so anyway -

Names beginning with the string "xml", or with any string which would match
(('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or
future versions of this specification.

from here [https://www.w3.org/TR/xml/#sec-common-syn](https://www.w3.org/TR/xml/#sec-common-syn)

so I think the following

  <rest:sequence xmlns:rest="https://sirix.io/rest">
    <rest:item>
      <xml rest:id="1">
        foo
        <bar rest:id="3">
          <test rest:id="4">
            yikes
            <bar rest:id="6"/>
          </test>
        </bar>
      </xml>
    </rest:item>
  </rest:sequence>

is not well-formed, or has there been some change I've missed?

~~~
lichtenberger
Okay, thanks. I'll need to change the example and add this rule to the token
checker that runs before elements/attributes are inserted...
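A check like the one described could look roughly like this. It is a minimal sketch, assuming names arrive as plain strings before insertion; `check_name` is a hypothetical helper, and it only covers the reserved `xml` prefix from the quoted rule, not the full XML `Name` production.

```python
import re

# Matches names starting with (('X'|'x') ('M'|'m') ('L'|'l')), the prefix
# the XML 1.0 specification reserves for standardization.
_RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def check_name(name: str) -> None:
    """Reject element/attribute names that use the reserved "xml" prefix.

    Hypothetical pre-insert check: namespace declarations (xmlns, xmlns:*)
    and predefined attributes such as xml:lang are assumed to be handled by a
    separate code path and are never passed in here.
    """
    if _RESERVED_PREFIX.match(name):  # re.match anchors at the start
        raise ValueError(f"name {name!r} starts with the reserved prefix 'xml'")
```

With such a check in place, inserting the `<xml rest:id="1">` element from the example above would be rejected, while `bar` and `test` would pass.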

------
lichtenberger
My latest article, released a few days ago, can be found here:

[https://hackernoon.com/asynchronous-temporal-rest-with-vert-...](https://hackernoon.com/asynchronous-temporal-rest-with-vert-x-keycloak-and-kotlin-coroutines-217b25756314?source=friends_link&sk=5eabb36b2984cf61a2dff3f9fe45addc)

------
reykjavik
What's the use case for time-travel queries vs. just using time series
databases? (e.g. InfluxDB, TimescaleDB, kdb+, etc.)

~~~
lichtenberger
Hey, I think the best explanation is this one:

[https://stackoverflow.com/questions/51533143/temporal-vs-tim...](https://stackoverflow.com/questions/51533143/temporal-vs-time-series-database)

:-)

~~~
reykjavik
The use case listed there (the change of address) can be implemented perfectly
fine with any relational or non-relational database. I personally would not
switch to a completely new database and a new paradigm just because of the
requirement to show a user's address for a given date. And for data that
changes a lot, a time series DB still seems like a better choice.

So what's the real use-case?

~~~
lichtenberger
You can reconstruct a revision in O(n), you can search for a specific revision
in O(log n), and the transaction time (the time the transaction commits) is
stored once in each revision's root page. Thus, you do not have to store a
timestamp for each node (not even start and end times).
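To make the O(log n) lookup concrete: if each revision root page records its commit time and revisions are committed one after another, the commit times form a sorted sequence, so the revision valid at a given point in time can be found by binary search. A minimal sketch, assuming the commit timestamps are available as a list indexed by revision number; `revision_at` is a hypothetical helper, not Sirix's API.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence


def revision_at(commit_times: Sequence[datetime], point_in_time: datetime) -> Optional[int]:
    """Return the last revision committed at or before point_in_time.

    Assumption: commit_times[i] is the commit timestamp stored in the root
    page of revision i, and the list is sorted ascending because there is
    only one read/write transaction committing at a time.
    Returns None if point_in_time lies before the first commit.
    """
    idx = bisect_right(commit_times, point_in_time)  # O(log n) comparisons
    return idx - 1 if idx > 0 else None
```

Only one timestamp per revision is needed for this, which is why no per-node start/end times have to be stored.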

Furthermore, Sirix does not have to copy whole record pages that have changed;
how much is written depends on the chosen versioning algorithm. The whole
structure is highly concurrent and allows client-side parallelism. We also do
not have to write to a WAL first, because the structure is always consistent
(as long as no hardware failure occurs...).
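To illustrate why a changed page does not have to be copied in full, here is a heavily simplified sketch of a sliding-window style versioning scheme, assuming a page is just a map from record slot to record (no deletions). The function names and the `window` parameter are hypothetical, and this is neither Sirix's actual implementation nor the patented variant. Each commit writes only the changed records, plus any record from the fragment about to leave the window that has not been superseded, so a page can always be rebuilt from at most `window` fragments.

```python
from typing import Dict, List

Fragment = Dict[int, str]  # record slot -> record payload (simplified, no deletes)


def commit(fragments: List[Fragment], changes: Fragment, window: int) -> None:
    """Append a new page fragment using a simplified sliding-window scheme."""
    new_fragment = dict(changes)  # only the changed records...
    if len(fragments) >= window:
        first_in_window = len(fragments) - window
        leaving = fragments[first_in_window]          # slides out after this commit
        newer = fragments[first_in_window + 1:]       # stays inside the window
        for slot, value in leaving.items():
            superseded = slot in new_fragment or any(slot in f for f in newer)
            if not superseded:
                new_fragment[slot] = value  # ...plus records that would be lost
    fragments.append(new_fragment)


def reconstruct(fragments: List[Fragment], window: int) -> Fragment:
    """Rebuild the full page from the `window` most recent fragments."""
    page: Fragment = {}
    for fragment in fragments[-window:]:  # oldest to newest, newer values win
        page.update(fragment)
    return page
```

The window size trades write amplification (records carried forward) against read cost (fragments to combine), which is the knob the different versioning algorithms tune.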

------
lichtenberger
By the way, if you check it out and find bugs, please let me know :-) It's
still at an early stage and for sure has some rough edges.

Thanks and have a great day :)

Oh and maybe another interesting article: [https://hackernoon.com/sirix-io-why-copy-on-write-semantics-...](https://hackernoon.com/sirix-io-why-copy-on-write-semantics-and-node-level-versioning-are-key-to-efficient-snapshots-754ba834d3bb)

~~~
lichtenberger
By the way, would it make sense to work on these visualizations again for
comparing revisions?

[https://youtu.be/l9CXXBkl5vI](https://youtu.be/l9CXXBkl5vI)

Or would anyone be interested in porting the processing.org stuff to the web?
Sadly I'm a backend developer and feel a bit overwhelmed by all the front-end
frameworks (JavaScript stuff), but I think the visualizations for comparing
revisions are really helpful.

More information: [https://github.com/JohannesLichtenberger/master-thesis/raw/m...](https://github.com/JohannesLichtenberger/master-thesis/raw/master/Master/Thesis/thesis.pdf)

~~~
lichtenberger
Oh and I think with pointer swizzling (Java object references instead of
references to the page IDs in the buffer manager) we can gain quite a bit of
speed :-)
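Pointer swizzling here means that once a page has been loaded, the reference to it keeps a direct in-memory object reference, so later traversals skip the buffer-manager lookup by page ID. A minimal sketch under that assumption, with Python object references standing in for Java ones; `PageReference`, `BufferManager` and `load_page` are hypothetical names, not Sirix's classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


class BufferManager:
    """Toy buffer manager: resolves page IDs to pages, loading them on a miss."""

    def __init__(self, load_page: Callable[[int], object]) -> None:
        self._load_page = load_page  # hypothetical loader, e.g. a disk read
        self._cache: Dict[int, object] = {}
        self.lookups = 0  # counts page-ID lookups, to show what swizzling saves

    def get(self, page_id: int) -> object:
        self.lookups += 1
        if page_id not in self._cache:
            self._cache[page_id] = self._load_page(page_id)
        return self._cache[page_id]


@dataclass
class PageReference:
    """Parent-side pointer that can be swizzled to a direct object reference."""
    page_id: int
    _swizzled: Optional[object] = field(default=None, repr=False)

    def resolve(self, buffers: BufferManager) -> object:
        if self._swizzled is None:
            # Unswizzled: resolve via the buffer manager by page ID once...
            self._swizzled = buffers.get(self.page_id)
        # ...afterwards follow the in-memory reference directly.
        return self._swizzled

    def unswizzle(self) -> None:
        """Drop the direct reference, e.g. when the page gets evicted."""
        self._swizzled = None
```

The catch is eviction: every swizzled reference to an evicted page must be unswizzled again, which is why the sketch exposes `unswizzle`.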

And maybe add memory-mapped files in the future.

~~~
lichtenberger
And I also have an idea for saving even more space on disk/on flash drives
:-)

------
lichtenberger
Oh, BTW: I'm also not a native English speaker, so any hints about things that
are wrong are also greatly appreciated :-) I just got my first spelling
correction pull request. You guys are really awesome, thanks :)

~~~
lichtenberger
(the first in a few years), but thanks so much :)

------
lichtenberger
By the way, I've recently added some documentation on
[https://sirix.io](https://sirix.io) :-)

------
lichtenberger
Wow, thank you so much for posting/mentioning :-)

Have a great evening, Johannes

