
Cayley – An open-source graph database - _pius
http://cayley.io
======
bratsche
For people here experienced with graph databases, do you typically use the
graph db as your primary data store or do you use it in combination with
something like postgresql? If you're using both, can you talk about how that
works and if it's been successful for you?

I'm curious because I've had a couple situations where I thought using neo4j
(or some graph db) would be a natural fit for something I wanted to do, but
otherwise I thought most of my other data fit into postgresql just fine. My
instinct is that if I'm doing this in a web app then querying from two
different databases is going to slow down my responses a lot.

~~~
AlisdairO
I've got a lot of background in RDF graph stores. It depends a lot on your
usage, but I think for your typical web app, you'd be better off using a
Postgres install, and making use of fancier features like WITH RECURSIVE as
necessary. Graph stores often miss out on features like guaranteed relational
integrity and guaranteed constraints, which I find invaluable for safe
application development in the face of concurrent updates.

Graph stores are typically much slower for repetitive data that fits cleanly
into a relational model. This isn't to say they're not useful - for more
irregular data they're a fantastic fit - it's just that very irregularly
structured data isn't the common case.

Of course, you can always use two different stores - much like many sites do
with a separate lucene/elasticsearch index for text search - but your graphing
needs must be relatively componentised for that to work well.

~~~
JPKab
Curious: What RDF triple stores have you used, and in what kind of
application?

I was looking into using Stardog for a metadata repository I was building, but
we ended up (probably unwisely) bastardizing Postgres into a bunch of self-join hierarchies.

~~~
AlisdairO
The ones I've spent most time with were Jena/TDB, Virtuoso, 3store, along with
a couple of proprietary engines. BigOWLIM is also a strong contender in the
space. I've used them in the context of both object storage and semantic web
data storage.

My experience is that if you don't need constraints/enforced relational
integrity, RDF stores make for really simple/easy object storage. There's
definitely a performance tradeoff, though - depends on what you need, really!

------
sfvisser
Interesting 'triple' they use :)

    
    
        // Our triple struct, used throughout.
        type Triple struct {
            Sub        string `json:"subject"`
            Pred       string `json:"predicate"`
            Obj        string `json:"object"`
            Provenance string `json:"provenance,omitempty"`
        }

~~~
AlisdairO
This is weirdly ubiquitous in the RDF store world as well. I eventually gave
up talking about 'triple stores' and called them 'RDF stores' instead :-)

~~~
jerven
It is basically required for SPARQL 1.1 named graph support. People often
call them quad stores.

------
slapresta
I can't get over the logo being a 3-colored version of a 2-colorable graph.

~~~
nandemo
That's intentional. It represents robustness and scalability: you can add an
edge between a red and yellow node, without having to re-color.

(actually I've just made that up)

------
nicklovescode
I've been using a graph db (Neo4j) as the primary database for my last
project and it has been a pleasure. I wish a good graph database were hooked
in with the new Google Cloud stuff so it could be queried/visualized/performance
analyzed in a similar fashion to the BigTable demo earlier

~~~
cihangirsavas
are you using it for production purposes or for your small projects?

if it is for production, how is your read/write performance?

~~~
kaonashi
Neo4j wants everything in memory, so the bottlenecks would come after your
data-set size outstrips the memory available.

~~~
bunkat
Fast disks are also important for write performance since Neo4j syncs every
change - big RAID arrays of SSDs help.

~~~
jmlvanre
This is one place where log-structured merge systems can really prove
themselves. There is rarely a need to sync on every change, as most of them
are independent. You can usually gain a lot of write throughput by syncing at
the speed the hardware is optimized for and squeezing as many append-only
changes into those logs as possible. At Orly we've spent quite some time
looking at ways to deal with these bottlenecks.

------
hugofirth
One question: I'm curious what your motivation was for providing a Gremlin-
style query language, as opposed to something like Cypher? Was it a case of
expressiveness, personal preference, etc.?

~~~
barakm
I wanted something easy to pick up -- Javascript seemed natural. I also wanted
to be as agnostic as possible, because graph query languages are interesting
and worth experimenting with. That's why it has submodules per-language and
it's (relatively) straightforward to write another.

Cypher's not bad either, but it's not "just Javascript". But I'm totally
taking commits if someone wants to port it :)

------
cnbuff410
Here is a brief introduction

[http://google-opensource.blogspot.hk/2014/06/cayley-graphs-in-go.html](http://google-opensource.blogspot.hk/2014/06/cayley-graphs-in-go.html)

------
brickcap
Almost all graph databases use Gremlin as the query language, but I really
love OrientDB's approach of using SQL[1] for querying graphs. It feels more
natural in my opinion, plus it lowers the barrier to entry for people who
already know SQL.

Some of the things I like about cayley

1. Switchable backends (I wonder if I can configure it to use CouchDB as a
store)

2. Documentation gets right to the point. When I first tried my hand at graph
databases I could not understand where to start, but Cayley's approach is
pretty straightforward, and it wins bonus points from me for including a big
dataset :)

A question: I see no mention in the docs about running it on multiple nodes
(where does it stand with regards to CAP etc)

[1] [https://github.com/orientechnologies/orientdb/wiki/SQL](https://github.com/orientechnologies/orientdb/wiki/SQL)

------
preillyme
Has anybody tried out Orly?
[https://github.com/orlyatomics/orly](https://github.com/orlyatomics/orly)

~~~
JasonL9000
I have. I'm on the team that's been developing it for the past four years.
It's nice to see graph databases getting some popular traction at last. (I was
into them before they were cool, of course.) I've only just started looking at
Cayley but it looks like there are some significant differences between the
projects. Orly is designed for high-speed, high-volume applications that need
large-scale storage and consistent transactions. We're more OLTP than OLAP,
which seems to be the way Cayley leans. Most graph systems tend toward
analytic applications.

~~~
preillyme
Could Cayley use Orly for a storage engine?

~~~
JasonL9000
Maybe so. Orly has a somewhat leveldb-like component called a Repo which might
slot in well. A Repo is a log-structured merge storage system that also
provides indexing, access to previous values, and, most importantly,
consistent reads and writes, even across multiple Repos. It also makes good
use of resources (RAM, SSD, HDD) to optimize performance and cost. (Also
working on letting it run on GPUs, for insanely high performance for people
with the power and air conditioning budgets.)

~~~
e12e
I've seen a couple of mentions of graph dbs on GPUs in this thread, and it
does seem like a (somewhat) obvious fit. Anyone aware of any projects that
make (good) use of that right now? Not necessarily a stand-alone service, but
also things like an embedded graph db ("Berkeley DB for graphs") or something
like that?

~~~
jerven
BigData by Systap is working on this combination; see this blog post
([http://blog.bigdata.com/?p=658](http://blog.bigdata.com/?p=658))

------
hugofirth
Awesome - all my work is with graph data and graph databases, so any additions
to the space are great news.

------
bobbriody
Can anyone provide a working example for the visualization feature? The docs
say that you can use the Tag functionality to label source/target nodes for
sigma.js rendering, but bridging the gap between that suggestion and the
actual query does not seem trivial.

------
philjohn
Sad to see no SPARQL support as of yet, it looks like it's on their longer
term goals though as they are query language agnostic.

Interesting to see more products entering this space.

------
dajohnson89
Has anyone tried out Tinkerpop?

[http://www.tinkerpop.com](http://www.tinkerpop.com)

I played with it, and it was kinda fun.

------
dingdingdang
Demo version somewhere?

~~~
barakm
[http://cayley-graph.appspot.com](http://cayley-graph.appspot.com) is a live
demo on App Engine.

------
ForHackernews
> Not a Google project

~~~
riffraff
and obviously, not "google's new graph database".

~~~
nostrademons
It looks like a Googler's 20% project.

However, I'll point out that most of the actually-used open-source projects
coming out of Google aren't actually "Google projects", they are "projects by
Googlers released under the OSPO process". LevelDB (Jeff Dean & Sanjay
Ghemawat), Protocol Buffers (Kenton Varda), Guice (Bob Lee, Jesse Wilson, and
Kevin Bourillion), Gumbo (myself), and angular.js (a team within DoubleClick)
all started out as small internal projects built to scratch an individual's
itch that were then released externally because hey, why not. The "corporate"
open-source projects have been things like Android, Chrome, GWT, Closure,
Polymer, etc.

~~~
barakm
Exactly this. And I'm the Googler. Hi!

~~~
zaphar
Hi. Former googler who does work that Graph Databases would be useful for. If
you don't mind I'd like to ask:

Does it have bulk import, and if so, what is its speed for bulk import,
roughly speaking?

~~~
barakm
It does have bulk import; I've been loading largish subsets of the Freebase
dumps.

Load speed is pretty good (into persistent storage, I assume) and can be
improved with some of the database parameters. A rough estimate is that a
million triples or so takes about 5 minutes, but that slows down as it gets
bigger. 134m triples took me 6-8hrs, so I slept on it.

~~~
krato
Since no one's mentioned it yet, another alternative to Neo4J is AllegroGraph
(though you need to pay for it; the free version supports 5 million tuples).

It _does_ support bulk loading, with over 500K triples per second. According
to
[http://franz.com/agraph/allegrograph/agraph_benchmarks.lhtml](http://franz.com/agraph/allegrograph/agraph_benchmarks.lhtml),
given enough RAM, it can load over a billion tuples in just over half an hour.

~~~
jerven
Virtuoso 7.1 and OWLIM 5.5 have similar loading speeds. In this case the
decompression algorithm is often the bottleneck, requiring multiple files to
be read in parallel to go faster. Oracle 12c Semantic Network and YarcData
uRiKA can also load faster if correctly set up.

------
leccine
Woo, this is probably the coolest project from Google in a long while. Is
there anybody using a graph database here? What is your use case? Would you
mind sharing?

~~~
jonathanmarvens
At my company, we're using graph-based models for most of our NLP stuff
(mainly entity extraction ATM) and our ANNs (mainly object classifiers).

\- Jonathan

~~~
leccine
Thanks Jonathan. I am not terribly familiar with NLP; would you mind expanding
the three-letter acronyms (ATM, ANN)?

------
tempodox

      OS X:  Homebrew is the preferred method.
    

Over my dead body. Homebrew messes up my /usr/local, leaving its heaps of crap
everywhere. I don't find that acceptable. MacPorts at least has the decency to
put itself into /opt/local, which isn't normally used by anything else.

