
Cayley – An open-source graph database - iamjeff
https://github.com/cayleygraph/cayley
======
barakm
Maintainer here, good to see Cayley on HN again :)

We've got a lot of new features on master (GraphQL support, Gephi interfaces,
recursive iterators, etc.) and are cutting a release next week.

Active work in the coming releases on tightening down the indexing and really
bringing it into prod.

EDIT: Feel free to join the new Slack or the Discourse mailing list/discussion
board!

~~~
chaostheory
What's the difference between Cayley and Neo4J?

~~~
edem
Think about Datomic. It can use Riak as a backend or something else but it is
still seen as Datomic to the outside world. (You can also compare it to Lucene
vs Solr)

~~~
foobarchu
Wait, Datomic with Riak as the backend? That seems a ridiculous level of
abstraction, since Riak itself uses pluggable backend stores (Bitcask or
LevelDB). Like creating a programming language whose interpreter runs on the
JVM.

------
sverhagen
We have Elasticsearch as a generic document search engine; each document has a
non-trivial number of properties (let's say 50 or so). It's incredibly
performant for all sorts of searches, the details of each of which this
solution wasn't specifically designed for, hence me calling it a _generic_
search engine. Every time I contemplate bringing the graph relationships that
exist between these documents into the mix, I get stuck. Elasticsearch
doesn't quite do graph (natively), but the graph databases I tried don't do
properties too well (OrientDB, Neo4J). I'm not talking about one or two
properties, but multiple properties across multiple hops in the graph that I
envision querying for. Let alone full text searches. I emailed back and forth
with the helpful folks at Orient, but it always came down to optimising for
specific queries, gone the "generic". Is anyone solving that problem? Cayley?

~~~
janemanos
Sounds like something ArangoDB could be a good solution for. Full disclosure
I'm from ArangoDB team and happy to help. If you like just drop me a line to
jan.stuecke (at) arangodb.com

~~~
lolive
Being a graph database user, I always have to manage a replication of the
"molecules" of my graph in ES for a user-friendly search experience. Can
ArangoDB help with such a use case? Or maybe Dgraph?

~~~
janemanos
With ArangoDB you can choose between synchronous and asynchronous replication.
With the Agency of ArangoDB you also have a Raft-based consensus protocol
which holds the state of the cluster. My teammate wrote a nice article about
our approach. You might want to have a look:
[https://www.arangodb.com/2017/01/reaching-harnessing-consens...](https://www.arangodb.com/2017/01/reaching-harnessing-consensus-arangodb/).
In single-instance mode you have full transactional guarantees with
multi-collection and multi-document transactions. In cluster mode we provide
single-document transactions. More guarantees will follow.

------
jnordwick
What exactly is this? The GitHub page speaks of different backends, and those
appear to just be databases or key-value stores in themselves (e.g., Postgres
and Bolt).

Is Cayley basically a query rewriter? That is, it has some tables in the
backend and, when queried, Cayley then goes to the "real" (for lack of a better
word) database? Cayley's query language might be more full-featured, but it
isn't a storage mechanism in itself?

There are two things from that:

1\. There is no way for Cayley to take the graph structure of the data into
account when laying it out on disk or when executing the query. Is this the
long-term decision, or is this just a stop-gap until a storage mechanism can
be done?

2\. This would seem to imply that the abstraction layer from Cayley to the
backend storage would be relatively slim. How difficult is it to add another
storage driver for another SQL database or for one with a custom query
language?

Another thing I noticed:

> query -- films starring X and Y -- takes ~150ms

Even on two-year-old hardware that seems dog slow - less than 7 queries a
second - for a very simple query.

~~~
dimfeld
Cayley's graph data layout is most similar to a Hexastore-style [1] triple
store, though IIRC it doesn't do the full six-way index that the original
Hexastore paper describes. The Redis page on secondary indexing [2] has a
great quick intro to what this actually entails (search the page for
Hexastore).

As you might guess from the Redis link, this style of graph lends itself well
to KV stores, so the answer to your question #1 might be that it's a long-term
decision, but the style of graph is really designed for a KV store anyway. But
I haven't discussed this at all with the Cayley devs so I can't actually speak
for them.
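
To make the Hexastore idea concrete, here is a minimal sketch in Go of how one triple can be written under all six orderings into an ordered key-value store, so that a lookup becomes a prefix scan. The key prefixes, the ":" separator, and the `Store` type are assumptions made for illustration; this is not Cayley's actual key layout, and as noted above Cayley does not necessarily keep all six indexes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Triple is a subject-predicate-object statement.
type Triple struct{ S, P, O string }

// indexKeys builds one key per ordering of (s, p, o): six in total, as in the
// Hexastore paper. The prefixes and ":" separator are illustrative only and
// assume values never contain ":".
func indexKeys(t Triple) []string {
	join := func(name, a, b, c string) string {
		return name + ":" + a + ":" + b + ":" + c
	}
	return []string{
		join("spo", t.S, t.P, t.O),
		join("sop", t.S, t.O, t.P),
		join("pso", t.P, t.S, t.O),
		join("pos", t.P, t.O, t.S),
		join("osp", t.O, t.S, t.P),
		join("ops", t.O, t.P, t.S),
	}
}

// Store stands in for an ordered key-value store: a sorted slice of keys.
type Store struct{ keys []string }

// Add inserts all six index keys for t, keeping the slice sorted.
func (s *Store) Add(t Triple) {
	for _, k := range indexKeys(t) {
		i := sort.SearchStrings(s.keys, k)
		if i < len(s.keys) && s.keys[i] == k {
			continue // key already present
		}
		s.keys = append(s.keys, "")
		copy(s.keys[i+1:], s.keys[i:])
		s.keys[i] = k
	}
}

// Subjects answers "which subjects have predicate p and object o?" with a
// single prefix scan over the "pos" ordering.
func (s *Store) Subjects(p, o string) []string {
	prefix := "pos:" + p + ":" + o + ":"
	var out []string
	for i := sort.SearchStrings(s.keys, prefix); i < len(s.keys) && strings.HasPrefix(s.keys[i], prefix); i++ {
		out = append(out, strings.TrimPrefix(s.keys[i], prefix))
	}
	return out
}

func main() {
	st := &Store{}
	st.Add(Triple{"film1", "starring", "X"})
	st.Add(Triple{"film2", "starring", "X"})
	st.Add(Triple{"film2", "starring", "Y"})
	fmt.Println(st.Subjects("starring", "X"))
}
```

A query like "films starring X and Y" would then be two such scans followed by an intersection of the results.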

I'm using it with the BoltDB backend and have been pleased with the
performance overall. I haven't looked at the backends for more complex
databases like Postgres in detail, but it does appear that the backend
interface has potential for predicate pushdown as well. The repository's graph
directory [3] contains the various backends if you want to check it out.
Overall it doesn't look very difficult to add another backend type, but I
haven't tried it yet. Looking at the existing SQL backend, it appears to
already support MySQL, PostgreSQL, and CockroachDB (but I've tried none of
these with Cayley).

[1]
[http://www.vldb.org/pvldb/1/1453965.pdf](http://www.vldb.org/pvldb/1/1453965.pdf)
[2] [https://redis.io/topics/indexes](https://redis.io/topics/indexes) [3]
[https://github.com/cayleygraph/cayley/tree/master/graph](https://github.com/cayleygraph/cayley/tree/master/graph)

~~~
itamarhaber
Speaking of which, take a look at this Redis module that marries Hexastore and
Neo4j-like queries:
[https://github.com/RedisLabsModules/redis-module-graph](https://github.com/RedisLabsModules/redis-module-graph)

------
ceyhunkazel
Good work! An alternative is Dgraph [https://dgraph.io/](https://dgraph.io/)
which I am considering for my next project.

~~~
dyu-
Note that they changed the license from APL to AGPL just recently (13 days
ago, based on their commits).

------
zmanian
Here are my experiences with Cayley. 100% positive for building graph
microservices.

1\. Use Cayley as a library

2\. Put metadata in separate nodes.

~~~
dimfeld
This is the approach I ended up using too. Works great.

------
badtuple
Anyone use Cayley in prod? An old job used Neo4j, and the graph concept was
great for specific use cases. As a lightweight graph store, Cayley was really
exciting when it came out, but I haven't had a need for it since I left that
job. It strikes me as really well made, and I'd love to hear any war stories.

~~~
LukaAl
Tried to use it in production a couple of years ago hosting a mirror copy of
Freebase with mixed results:

\- There were a couple of issues loading the data; we fixed them and contributed
the patches back

\- Loading the data was really slow, and it got slower with every new entry
added (loading the full Freebase dump took 1 week on a very beefy
machine with an SSD, using LevelDB)

\- Then the queries were relatively slow. Without going too much into details,
we were using the data to analyze texts and extract entities, and the
relationship between them, and even parallelizing the queries, they were
relatively slow (depending on complexity, between 0.1 and 1 sec on average). We
solved the issue by implementing a robust caching layer in front of it and
carefully planning the queries.

\- In general, it was stable and performant enough for a backend service. But
we were really pushing the envelope of what it could do.
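
The caching layer itself is not described in the comment, so the following is only a generic sketch of the idea in Go: memoize the results of a slow query function behind a mutex. The names `QueryFunc` and `CachedStore` are hypothetical, and a real deployment would also need eviction and invalidation, which are left out here.

```go
package main

import (
	"fmt"
	"sync"
)

// QueryFunc is a hypothetical stand-in for a slow call into the graph store.
type QueryFunc func(query string) ([]string, error)

// CachedStore memoizes successful query results in memory.
type CachedStore struct {
	mu      sync.Mutex
	cache   map[string][]string
	backend QueryFunc
}

func NewCachedStore(backend QueryFunc) *CachedStore {
	return &CachedStore{cache: map[string][]string{}, backend: backend}
}

// Query returns a cached result when one exists and otherwise asks the
// backend. The lock is released during the backend call, so two concurrent
// misses for the same query may both reach the backend.
func (c *CachedStore) Query(q string) ([]string, error) {
	c.mu.Lock()
	res, ok := c.cache[q]
	c.mu.Unlock()
	if ok {
		return res, nil
	}

	res, err := c.backend(q)
	if err != nil {
		return nil, err // errors are not cached
	}

	c.mu.Lock()
	c.cache[q] = res
	c.mu.Unlock()
	return res, nil
}

func main() {
	calls := 0
	slow := func(q string) ([]string, error) {
		calls++
		return []string{"result for " + q}, nil
	}
	c := NewCachedStore(slow)
	c.Query("films starring X")
	c.Query("films starring X")
	fmt.Println("backend calls:", calls)
}
```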

All in all, I would say that I was happy with it. In comparison, I tried a
year earlier to use Neo4J in a similar role and I gave up after 2 weeks
because I wasn't even able to get it to load part of the dataset without
crashing on similar hardware.

~~~
frik
What's the best way to load Freebase in 2017? Cayley with Postgres storage? Or
some other RDF/graph DB? Or ElasticSearch? Or dump it in Postgres/MySQL? I am
not interested in complex queries, but simple queries that execute reasonably
fast.

~~~
pawanrawal
We have it loaded on a Dgraph instance, in case you want to play around with
it: [https://play.dgraph.io](https://play.dgraph.io)

~~~
mring33621
How is Dgraph licensed? I see both Apache and AGPL in GitHub.

~~~
mrjn
Dgraph follows MongoDB licensing. The clients are all in Apache, and the
server code is AGPL. This doesn't affect anyone using Dgraph for commercial
purposes; but if they make changes to the server code, they'll have to release
them under AGPL. Blog post here:
[https://open.dgraph.io/post/licensing/](https://open.dgraph.io/post/licensing/)

------
fiatjaf
I can't understand how to use the query language. It all seems so magical!

I tried building something with Cayley once but couldn't fetch all the data I
wanted in a single query, or didn't know how to, then got frustrated and
deleted everything.

~~~
rektide
Which of the three query languages are you having trouble with? All of them?
MQL has been around a long long time (2006). Gizmo is new but based on & very
similar to Gremlin (2009). GraphQL is the newest (2015). Did you try them all?
Or is one in particular rough?

~~~
fiatjaf
I'm talking about that Gizmo/Gremlin.

------
wut42
Very nice! Any plans to use it as a backend for Google's badwolf [1] (a
temporal graph store)?

[1] [https://github.com/google/badwolf](https://github.com/google/badwolf)

~~~
barakm
The thing badwolf brings to the table (and respect to the author -- super nice
fellow) is adding metadata (namely, a timestamp) to the links.

The topic of 'reification' found throughout our recent discussion is how we
can generally add metadata to links, thereby making it a lot easier to fit the
two models together.
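
As a rough illustration of reification (not Cayley's or badwolf's actual API), the Go sketch below turns a direct link into a statement node that points at the original subject, predicate, and object, and then hangs metadata such as a timestamp off that node. The predicate names "subject", "predicate", "object", and "since", and the node ID "stmt1", are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Quad is a plain link between two nodes (label omitted for brevity).
type Quad struct{ Subject, Predicate, Object string }

// Reify replaces a direct link with a statement node identified by id.
// The statement node points at the link's three parts, and each metadata
// entry becomes one more link from the statement node.
func Reify(id string, link Quad, meta map[string]string) []Quad {
	out := []Quad{
		{id, "subject", link.Subject},
		{id, "predicate", link.Predicate},
		{id, "object", link.Object},
	}
	// Sort metadata keys so the output order is deterministic.
	keys := make([]string, 0, len(meta))
	for k := range meta {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		out = append(out, Quad{id, k, meta[k]})
	}
	return out
}

func main() {
	link := Quad{"alice", "follows", "bob"}
	for _, q := range Reify("stmt1", link, map[string]string{"since": "2017-05-01"}) {
		fmt.Println(q.Subject, q.Predicate, q.Object)
	}
}
```

The cost of this design is that one link becomes several, so traversals that need the metadata take an extra hop through the statement node.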

~~~
wut42
awesome :) thanks

------
guillem_lefait
Being on HN while the docs are lorem ipsum
([https://cayley.io/1-getting-started/](https://cayley.io/1-getting-started/)), damn!

~~~
barakm
Yeah, we're still getting the marketing site up, complete with docs. Until then,
there's
[https://github.com/cayleygraph/cayley/tree/master/docs](https://github.com/cayleygraph/cayley/tree/master/docs)
with the content

------
jraedisch
There is an asterisk in the docs behind "inspired" but no footnote for it.
What does it mean?

------
d0vs
Can anyone give concrete examples of datasets that are better suited for a
graph database and why?

~~~
maxdemarzi
Anything Social. Product or Person hierarchies. Network datasets. Ancestry
(genetic or data), etc.

They are better suited for Graph Databases because the queries tend to be many
joins traversing paths both deep and wide.

------
schemathings
Any screenshots of the visualizer?

------
frik
How mature is it? Has anyone used it for big datasets? Last time I tried, it
wasn't ready to cope with a 250 GB N-Triples RDF. (two years ago)

~~~
frik
I see Cayley now supports other backends besides LevelDB. "PostgreSQL and
MongoDB for distributed stores" \- that's good to read.

~~~
lolive
Is there a description of the RDBMS low-level model? Is it something like a
single s,p,o table, with indexes (s, so, spo, , po, etc)?

~~~
barakm
Exactly so, for the RDBMS model. Yeah, this has its own issues, but it's the
most direct method. We do a little extra trick with the indexing: joining on
fixed hashes instead of the full value, but nothing crazy.
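
A minimal in-memory sketch of that layout in Go, under stated assumptions: one table of (s, p, o) rows that holds only fixed-width hashes, a separate nodes table mapping each hash back to its full value, and one secondary index on (p, o). The use of SHA-1, the type names, and the choice of index are all assumptions for illustration rather than Cayley's actual schema.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// Hash is a fixed-width stand-in for a node value. SHA-1 is an assumption
// made for this sketch.
type Hash [sha1.Size]byte

func hashOf(v string) Hash { return sha1.Sum([]byte(v)) }

// quadRow mirrors one row of a single (s, p, o) table holding hashes only.
type quadRow struct{ S, P, O Hash }

// Tables models two relational tables plus one secondary index.
type Tables struct {
	nodes map[Hash]string   // nodes table: hash -> full value
	quads []quadRow         // quads table: fixed-width hash columns
	byPO  map[[2]Hash][]int // index on (p, o) -> row numbers in quads
}

func NewTables() *Tables {
	return &Tables{nodes: map[Hash]string{}, byPO: map[[2]Hash][]int{}}
}

// Insert stores each value once in nodes and a hash-only row in quads.
func (t *Tables) Insert(s, p, o string) {
	hs, hp, ho := hashOf(s), hashOf(p), hashOf(o)
	t.nodes[hs], t.nodes[hp], t.nodes[ho] = s, p, o
	t.quads = append(t.quads, quadRow{hs, hp, ho})
	key := [2]Hash{hp, ho}
	t.byPO[key] = append(t.byPO[key], len(t.quads)-1)
}

// SubjectsOf finds rows by hashed (p, o), then joins back to nodes only to
// resolve the subject values that are returned.
func (t *Tables) SubjectsOf(p, o string) []string {
	var out []string
	for _, i := range t.byPO[[2]Hash{hashOf(p), hashOf(o)}] {
		out = append(out, t.nodes[t.quads[i].S])
	}
	return out
}

func main() {
	t := NewTables()
	t.Insert("film1", "starring", "X")
	t.Insert("film2", "starring", "X")
	fmt.Println(t.SubjectsOf("starring", "X"))
}
```

The point of the hashes is that index entries and join keys stay the same small size no matter how long the underlying values are.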

