
RecallGraph – an open-source graph database, for version controlled graph data - adityamukho
https://github.com/RecallGraph/RecallGraph
======
Bedon292
I had recently been thinking about something like this. A project I have been
working on has lots of relational data in Postgres, and keeps the history of
all the data and relationships, but it's so relational that a graph is
something we were considering for the next version. But the history is
important. So this might actually fit the bill.

Has anyone had any experience with this yet?

~~~
refset
I don't have experience with either RecallGraph or ArangoDB but I have been
following the project for a few months now - kudos to the team for releasing
1.0!

Naturally there are a few other open source technologies exploring the
"temporal graph" space, including TerminusDB [0], which uses a git-like
branching history model for data collaboration, and Crux [1], which supports
bitemporal history for transactional workloads (full disclosure: I work on
Crux directly). Datahike [2] is also worth checking out if you've not seen it.
Both Crux and Datahike are heavily inspired by Datomic.

However, if the history-aspects of your data model end up being more
complicated than the graph-aspects of your data model, then you may well be
better off using SQL temporal tables as implemented by Teradata / SAP HANA /
DB2 etc.

[0] [https://terminusdb.com/](https://terminusdb.com/)

[1] [https://opencrux.com/](https://opencrux.com/)

[2]
[https://github.com/replikativ/datahike](https://github.com/replikativ/datahike)

~~~
Bedon292
Of these, Crux looks quite interesting. How well can it scale? Can it handle
billions of documents?

For us, history is mostly about being able to audit a record and understand
its history, and occasionally undo a mistake. We don't need or want to go with
things like SAP.
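
To make the "audit a record and occasionally undo a mistake" requirement concrete, here is a minimal sketch of an append-only version log in Python. It is only an illustration of the idea; `VersionedStore` and its methods are hypothetical names, not the API of RecallGraph, Crux or any other project mentioned in this thread.

```python
from typing import Any, Dict, List, Optional, Tuple


class VersionedStore:
    """Hypothetical append-only store: writes add versions, nothing is overwritten."""

    def __init__(self) -> None:
        # key -> list of (version number, value), oldest first
        self._log: Dict[str, List[Tuple[int, Any]]] = {}
        self._clock = 0  # monotonically increasing version counter

    def put(self, key: str, value: Any) -> int:
        """Record a new version of `key` and return its version number."""
        self._clock += 1
        self._log.setdefault(key, []).append((self._clock, value))
        return self._clock

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the value visible at version `as_of` (latest if omitted)."""
        result = None
        for version, value in self._log.get(key, []):
            if as_of is not None and version > as_of:
                break
            result = value
        return result

    def history(self, key: str) -> List[Tuple[int, Any]]:
        """Return the full audit trail for `key`."""
        return list(self._log.get(key, []))

    def undo(self, key: str) -> int:
        """Revert the last change by writing the previous value as a new version."""
        versions = self._log.get(key, [])
        if len(versions) < 2:
            raise ValueError("nothing to undo")
        return self.put(key, versions[-2][1])


store = VersionedStore()
v1 = store.put("user:1", {"name": "Alice"})
v2 = store.put("user:1", {"name": "Alicia"})  # the "mistake"
store.undo("user:1")                          # recorded as a third version
print(store.get("user:1"))                    # {'name': 'Alice'}
print(len(store.history("user:1")))           # 3
```

Note that the undo is itself a new entry in the log, so the mistake and its correction both stay auditable.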

~~~
refset
Crux is designed to scale directly based on how RocksDB (or LMDB) performs on
a single node, in terms of: sustained ingestion throughput, KV seeks/sec, and
the sheer quantity of KV data that can be supported on an array of local SSDs
(i.e. easily many billions of small docs). At a higher level this means that
point-in-time queries will maintain good performance regardless of how much
history is stored, thanks to Z-order indexing [0], and the query algorithm
only requires very modest amounts of memory because the KV indexes are lazily
streamed out of Rocks and processed tuple-by-tuple (though having more memory
is always going to speed things up!).

Beyond the scope of a single Crux node, horizontal read scaling comes for free
due to the transaction time model of history (i.e. you can spin up N identical
nodes to service all manner of wholly unrelated use-cases with consistent
reads).

[0]
[https://en.wikipedia.org/wiki/Z-order_curve](https://en.wikipedia.org/wiki/Z-order_curve)
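
For anyone unfamiliar with the Z-order curve linked above, a minimal Python sketch of the bit interleaving behind it follows. This shows the general technique only; how Crux actually encodes its index keys is not described in this thread, and treating the two dimensions as "valid time" and "transaction time" is an assumption made for illustration.

```python
from typing import Tuple


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Combine two non-negative integers into one Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


def deinterleave_bits(z: int, bits: int = 16) -> Tuple[int, int]:
    """Recover the two original integers from a Z-order key."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


# Hypothetical example: x = valid time, y = transaction time (small integers).
key = interleave_bits(3, 5)
print(key)                     # 39
print(deinterleave_bits(key))  # (3, 5)
```

Because the key is a single sortable integer, points that are close in both dimensions tend to end up near each other in an ordered KV store, which is what makes a two-dimensional range lookup practical on top of a one-dimensional index.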

------
malloryerik
Seems useful to note that this is built on ArangoDB, which is an open source
multi-model NoSQL database supporting document, key-value and graph models,
plus fast full-text search. You can use the same language for all of them and
combine query
styles (graph + document in the same query, etc). I'm not affiliated in any
way but ArangoDB fulfills requirements of mine and is awesome when Postgres
isn't the right fit so I'm hoping it grows and thrives.

[https://www.arangodb.com/](https://www.arangodb.com/)

------
kayza
What are the most common Graph database use cases? When should I go with a
Graph DB instead of a NoSQL or a SQL database?

~~~
mumblemumble
Well, first off: a graph database is typically considered a type of NoSQL
database. Second, a lot of graph databases use a SQL database such as
PostgreSQL as the storage engine.

What really distinguishes graph databases is the query language. There are
a lot of these out there - SPARQL (for RDF), Datalog, Cypher and Gremlin. These are
typically optimized for modeling and making it easy to query against data with
a high degree of interconnectedness. So, taking an RDBMS as the baseline, and
assuming that by NoSQL you meant something like a column or document store
that offers poorer support for ad-hoc queries than an SQL database, a graph
database would be moving in the opposite direction.

Sort of. There's technically not anything a graph database can do that can't
be expressed in modern (i.e., since the early 2000s for most, or 2018 if MySQL
is your jam) SQL. But sometimes it can take a fair bit of effort to do so. If
you find yourself frequently getting lost in a quagmire of complex joins and
recursive CTEs, a graph DB can be a real boon for the maintainability of your
data layer.
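
As a rough illustration of the recursive CTE approach mentioned above, here is a small Python sketch that uses the standard-library `sqlite3` module to find every node reachable from a starting node. The `edges` table, its columns and the sample rows are made up for this example.

```python
import sqlite3

# In-memory database with a hypothetical edge list: a -> b -> c -> d, and x -> y.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")],
)

# UNION (rather than UNION ALL) drops duplicates, so the recursion
# also terminates on graphs that contain cycles.
REACHABLE_SQL = """
WITH RECURSIVE reachable(node) AS (
    SELECT ?
    UNION
    SELECT e.dst FROM edges AS e JOIN reachable AS r ON e.src = r.node
)
SELECT node FROM reachable ORDER BY node
"""


def reachable_from(start: str) -> list:
    """Return the start node plus every node reachable from it."""
    return [row[0] for row in conn.execute(REACHABLE_SQL, (start,))]


print(reachable_from("a"))  # ['a', 'b', 'c', 'd']
print(reachable_from("x"))  # ['x', 'y']
```

This is manageable for a single traversal. The maintainability problem shows up when several of these are nested or joined together, which is where a dedicated graph query language tends to be easier to read.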

~~~
rotten
I'm not so sure that many graph databases use a relational database as the
data store. Some use linear algebra representations of the graph. Some use
key-value stores. Some are proprietary implementations, so we'll never know
exactly how the data is represented under the covers.
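
To give a feel for what a "linear algebra representation" can mean, here is a minimal pure-Python sketch that stores a graph as an adjacency matrix and does a breadth-first traversal as repeated boolean vector-matrix products. It is a toy under stated assumptions: the function names are hypothetical, and real engines that take this approach would use sparse matrices rather than nested lists.

```python
from typing import List, Tuple


def adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Build an n x n matrix where matrix[src][dst] == 1 means an edge src -> dst."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in edges:
        matrix[src][dst] = 1
    return matrix


def step(matrix: List[List[int]], frontier: List[int]) -> List[int]:
    """One traversal hop: the boolean product (frontier x matrix)."""
    n = len(matrix)
    return [
        int(any(frontier[i] and matrix[i][j] for i in range(n)))
        for j in range(n)
    ]


def bfs_levels(matrix: List[List[int]], start: int) -> List[int]:
    """Return the hop distance from `start` to every node (-1 if unreachable)."""
    n = len(matrix)
    levels = [-1] * n
    frontier = [0] * n
    frontier[start] = 1
    depth = 0
    while any(frontier):
        for j in range(n):
            if frontier[j]:
                levels[j] = depth
        reached = step(matrix, frontier)
        # Keep only nodes that have not been assigned a level yet.
        frontier = [reached[j] if levels[j] == -1 else 0 for j in range(n)]
        depth += 1
    return levels


# Edges 0 -> 1, 1 -> 2, 0 -> 2; node 3 is isolated.
graph = adjacency_matrix(4, [(0, 1), (1, 2), (0, 2)])
print(bfs_levels(graph, 0))  # [0, 1, 1, -1]
```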

~~~
rapnie
RecallGraph at first glance looks a bit like TerminusDB, which was recently
featured on HN [0]. In TerminusDB, data is stored like code in git, and you can
time travel and do branch, merge, squash, rollback, diff, blame, etc. But
TerminusDB is a semantic graph database based on OWL schemas, which stores
data as RDF and returns query results as JSON-LD. I will certainly give
RecallGraph a closer look.

[0]
[https://news.ycombinator.com/item?id=22867767](https://news.ycombinator.com/item?id=22867767)

