
Graph databases and Python - bjerun
http://de.slideshare.net/MaxKlymyshyn/odessapy2013-pdf
======
sophacles
Does anyone have a good list of books, tutorials, howtos and whatnot about
graph databases in general, preferably with python examples (but any language
is good really...)?

I've used graphdbs in the past but a nice collection of patterns and best
practices would be nice - upping my game on this topic is a current interest
of mine!

~~~
schiang
[http://shop.oreilly.com/product/0636920028246.do](http://shop.oreilly.com/product/0636920028246.do)

This is a good book for a quick intro to graph dbs for anyone interested.

~~~
FraaJad
The book is free for download at
[http://graphdatabases.com/](http://graphdatabases.com/)

------
gngeal
Could anyone explain to me what it means "native" versus "non-native" graph
processing in that slide show? Ditto for "native" versus "non-native" graph
storage. I simply have no idea what I'm supposed to picture when I see that.

Also, on the neo4j.org page, the claim that "graph data model['s]
expressiveness supersedes the relational model" seems a little bit spurious,
seeing as, as I understand it, the relational model and graph data are both
anchored in first-order predicate logic, and therefore should be able to do
the same things essentially (although Codd-style RDBMS with a little bit more
fuss regarding the necessary schemas).

~~~
espeed
_Could anyone explain to me what it means "native" versus "non-native" graph
processing in that slide show?_

One of the leading native graph processing engines is GraphLab
([http://graphlab.org/](http://graphlab.org/)); however, the creator of
GraphLab, Dr. Joey Gonzalez, is now focused on GraphX, which is essentially
GraphLab built on Spark
([http://spark.incubator.apache.org](http://spark.incubator.apache.org)),
which is a non-native analytics platform.

Building a graph-processing engine on a general processing system like Spark
makes pre-processing and post-processing much easier.

See "Introduction to GraphX - Presented by Joseph Gonzalez, Reynold Xin - UC
Berkeley AmpLab 2013"
([http://www.youtube.com/watch?v=mKEn9C5bRck](http://www.youtube.com/watch?v=mKEn9C5bRck))

Also, a bunch of advancements in graph processing are coming down the pipe,
which will be released in a few months (see
[https://news.ycombinator.com/item?id=6786563](https://news.ycombinator.com/item?id=6786563)).

 _Ditto for "native" versus "non-native" graph storage._

See this post by Dr. Matthias Broecheler, the creator of Titan
([https://github.com/thinkaurelius/titan/wiki](https://github.com/thinkaurelius/titan/wiki))...

"A Letter Regarding Native Graph Databases"
([http://thinkaurelius.com/2013/11/01/a-letter-regarding-native-graph-databases/](http://thinkaurelius.com/2013/11/01/a-letter-regarding-native-graph-databases/))

~~~
gngeal
So essentially, it's totally meaningless marketing bullshit? As much as I
favor memory optimizations, I think that merely trying to linearize the access
patterns is completely futile in the case of graph databases. At that level of
brute-force approach to speeding things up, you'll most likely gain more
performance by using lower-latency memory modules, or simply by using
different data structures to accommodate your specific cache line sizes
and latencies, than by trying to linearize generic graphs.

------
aidos
I just joined a project using neo4j. They're using the latest version, so we've
had to build our own python tooling. It's still very immature, but hopefully
we'll get it into an open-sourceable state.

Modelling in graphs is new to me so I was wondering if anyone had any tips or
pointers.

~~~
espeed
It shouldn't take too much to update Bulbs/Neo4j to Neo4j 2.0 -- add the
Gremlin Plugin on Neo4j Server (which isn't installed by default anymore) or
swap out the Bulbs built-in Gremlin scripts for Cypher equivalents, if Cypher
will let you do everything you need...

Bulbs Python Client:
[https://github.com/espeed/bulbs](https://github.com/espeed/bulbs)

------
techtalsky
I'm working on a project that requires a tree data structure (basically a graph
but with only single-direction parent-child relationships), and the number of
nodes will stay under a thousand. I could have chosen a graph database, but
for my level of complexity I just used postgres and a table that has foreign-
key relationships to itself.

Then I made a rails front end using the acts-as-sane-tree gem, which is
designed to use this postgres data model and recursive queries:
[https://github.com/chrisroberts/acts_as_sane_tree](https://github.com/chrisroberts/acts_as_sane_tree)
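For anyone curious what this pattern looks like without the gem, here is a
minimal sketch: a self-referential table plus a recursive query. SQLite (bundled
with Python) also supports `WITH RECURSIVE`, so it stands in for postgres here;
the table and column names are invented for illustration.

```python
# A tree stored as a table with a foreign key to itself, queried
# with a recursive CTE -- the same idea acts_as_sane_tree relies on.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes(id),
        name TEXT
    );
    INSERT INTO nodes VALUES
        (1, NULL, 'root'),
        (2, 1,    'child-a'),
        (3, 1,    'child-b'),
        (4, 2,    'grandchild');
""")

# All descendants of node 1, walking parent->child edges recursively.
rows = conn.execute("""
    WITH RECURSIVE descendants(id, name) AS (
        SELECT id, name FROM nodes WHERE parent_id = 1
        UNION ALL
        SELECT n.id, n.name
        FROM nodes n JOIN descendants d ON n.parent_id = d.id
    )
    SELECT id, name FROM descendants ORDER BY id
""").fetchall()
print(rows)  # [(2, 'child-a'), (3, 'child-b'), (4, 'grandchild')]
```

At this scale (under a thousand nodes) one recursive query per page load is
perfectly serviceable.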

~~~
olefoo
Have you looked at the ltree extension for postgres?
[http://www.postgresql.org/docs/9.3/static/ltree.html](http://www.postgresql.org/docs/9.3/static/ltree.html)

It's quite fast.
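A rough sketch of what ltree usage looks like, per the postgres docs (the table
and label names here are made up):

```sql
-- store each node's materialized path as an ltree value
CREATE TABLE sections (path ltree);
INSERT INTO sections VALUES ('top'), ('top.science'), ('top.science.astronomy');

-- a GiST index makes ancestor/descendant queries fast
CREATE INDEX sections_path_idx ON sections USING GIST (path);

-- all rows at or under 'top.science' (<@ is "is a descendant of")
SELECT path FROM sections WHERE path <@ 'top.science';
```

The trade-off versus a plain foreign key is that you maintain the path column
on moves, but subtree reads become a single indexed query.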

~~~
techtalsky
No, but it looks cool. That would impact deployability to heroku though, yes?

~~~
olefoo
It's on their list of approved extensions
[https://postgres.heroku.com/blog/past/2012/8/2/announcing_su...](https://postgres.heroku.com/blog/past/2012/8/2/announcing_support_for_17_new_postgres_extensions_including_dblink/)

You should be able to say:

    CREATE EXTENSION ltree;

in the database you want it in.

------
bsaul
I had a look at the python client libraries for neo4j a year ago, and couldn't
find a way to perform multiple graph writes in a single transaction, because
the only API available was the http one. Has that changed since then?

~~~
espeed
You can use client-side or server-side Gremlin scripts for this...

[http://stackoverflow.com/questions/16759606/is-there-a-equivalent-to-commit-in-bulbs-framework-for-neo4j/16764036#16764036](http://stackoverflow.com/questions/16759606/is-there-a-equivalent-to-commit-in-bulbs-framework-for-neo4j/16764036#16764036)

Here's how to use server-side Gremlin scripts in Python with Rexster, which is
TinkerPop's open-source server that runs multiple graph databases, including
Neo4j...

[https://groups.google.com/d/topic/gremlin-users/Up3JQUwrq-A/discussion](https://groups.google.com/d/topic/gremlin-users/Up3JQUwrq-A/discussion)
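As an alternative to Gremlin scripts, Neo4j 2.0 added a transactional Cypher
HTTP endpoint that batches several statements into one commit. A hedged sketch
of building such a request body (the endpoint path and payload shape follow the
Neo4j 2.0 REST docs; the node data and helper function are invented):

```python
# Bundle several Cypher writes into one transactional HTTP request
# against Neo4j 2.0's /db/data/transaction/commit endpoint.
import json

ENDPOINT = "http://localhost:7474/db/data/transaction/commit"

def cypher_payload(*statements):
    """Pack (query, params) pairs into one transactional request body."""
    return {
        "statements": [
            {"statement": query, "parameters": params}
            for query, params in statements
        ]
    }

body = cypher_payload(
    ("CREATE (n:Person {name: {name}})", {"name": "alice"}),
    ("CREATE (n:Person {name: {name}})", {"name": "bob"}),
)
# POSTing this JSON to ENDPOINT commits both writes atomically, e.g.:
#   requests.post(ENDPOINT, data=json.dumps(body),
#                 headers={"Content-Type": "application/json"})
print(json.dumps(body))
```

Both `CREATE` statements either commit together or not at all, which answers
the single-transaction question without any server-side scripting.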

------
tux
Page 10: "Я из Одессы я просто бухаю." Translation: "I'm from Odessa, I just
drink." Meaning he drinks a lot of vodka ^_^

~~~
maxmaxmaxmax
This is a local meme, used when someone asks a question and you would look
stupid if you don't have an answer.

------
amirouche
Many things in the graphdb space are broken.

The TinkerPop people are pushing the Gremlin DSL/API too hard; AFAIK it is only
useful in somewhat complex situations, and is more or less a nice way to write
some common queries. In simple situations, any language with the raw Graph API
can do the job. And there are still no Python drivers for Rexster. I tried, but
it was too complicated. Rexster itself is too complicated.

Neo4j, with its own query language, made things even more complicated. Instead
of a “graph that can be queried with your preferred language” you get a “graph
that can be queried with something that looks like SQL but is not.”

ArangoDB is nice for people who want to do full-stack JavaScript, which is not
the case for people doing Python.

Also, nobody marketing graphdbs just says “it solves the general problem.”
Period.

The only thing that may hold you back from using graphdbs is performance, but
in _a lot of situations_ you don't care, especially in situations where you
want to be flexible and to move fast. That's where graphdbs shine. Of course
there is also the graph/tree problem-solving space, but that is taken for
granted.

GraphDB vendors market the _specialized database_ aspect of graphdbs a lot;
nonetheless, graphdbs are good even for solving generic webdev problems.

Also if you are looking for a Graph Database server that does just that, and
where you can query the graph in Python 2.7 (or Scheme) have a look at:
[https://github.com/python-graph-lovestory/Java-GraphitiDB](https://github.com/python-graph-lovestory/Java-GraphitiDB)

~~~
amirouche
Please explain the downvote.

------
Patrick_Devine
The last graph database that I used for a large project was Virtuoso, which
wasn't mentioned in the slides. It's worth a look:

[http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/](http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/)

------
vram22
graph-tool is a Python library for graph handling:

[http://jugad2.blogspot.com/2013/01/graph-tool-python-module-for-graph.html](http://jugad2.blogspot.com/2013/01/graph-tool-python-module-for-graph.html)

------
mcphilip
For interactive neo4j examples see:

[http://blog.neo4j.org/2013/10/the-first-graphgist-challenge-completed.html?m=1](http://blog.neo4j.org/2013/10/the-first-graphgist-challenge-completed.html?m=1)

------
olefoo
The one thing I thought was missing from the python tooling for Neo4J was that
because *.cyp files are so new they aren't yet handled by the standard
documentation toolchain.

------
linux_devil
Tried neo4j, and I find it handy using python and py2neo. Since my laptop is
limited in memory, I couldn't visualize graphs properly from the web interface.

------
knowitall
What is the secret ingredient of graph databases? The presentation linked from
the presentation mentions physical addresses instead of IDs. I get that that
would be a speedup, but I would expect it to be more like a constant factor?

Then maybe you can save all links from a node in the node, so you can get all
the links with one read access. Fine. But as soon as you get to the second or
third level, I would expect the magic to be gone. Say every node has 100
links. OK, so the first 100 links you get in constant time c. But to get the
second level, you already need 100 requests (one for each node and its
attached link list), so 100c time. For the third level you need 10000 reads,
10000c time. The next level would be 1000000 requests.
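The back-of-the-envelope count above can be written down directly: with a
uniform branching factor of 100 and one read per node's adjacency list, the
reads per level grow geometrically no matter what the database is.

```python
# Reads needed to expand each level of a traversal when every node
# has `branching` neighbors and one read fetches a node's whole
# link list: level k costs branching**(k-1) reads.
branching = 100

def reads_at_level(level, b=branching):
    return b ** (level - 1)

print([reads_at_level(k) for k in (1, 2, 3, 4)])
# [1, 100, 10000, 1000000]
```

So the constant-factor win from physical addresses helps each individual read,
but it cannot change this exponential growth in the number of reads.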

Just saying I'd expect things to get ugly with a graph database pretty fast,
too (not as fast as with a relational db, but still).

I haven't really coded a big graph-based app, but my expectation would be that
to get really good performance, a hand-coded solution would always be required.
For example, trying to squeeze as much of the relevant data into memory in a
compressed way. Am I wrong?

Oh and also I am not sure how good relational DBs are at query optimization.
Just because the visible model is "one row per link" doesn't mean the db
couldn't do some intelligent caching internally.

~~~
mhluongo
Data models that use deep "JOIN"s are _way_ faster on a graph database. You're
right about branching factor: if you always traverse all relations, any
database will be slow. In most cases, however, you don't.

