

MySQL vs. Neo4j on a Large-Scale Graph Traversal - espeed
http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-large-scale-graph-traversal/

======
sthlm
While the conclusion that Neo4j is usually faster will generally be true for
most such comparisons, the method used to reach this result is badly flawed.

Traversal methods are built-in for Neo4j, so those can be used for
representative testing. The MySQL traversal methods chosen are very basic and
don't necessarily reflect the optimized methods people will use in production.

Neo4j performance is highly dependent on caching. If it has to fall back to
access the disk store, performance goes down rapidly, which can happen with a
large data set [1]. MySQL optimization is a whole different matter. I suppose a
BTREE index might be the right choice here, but it would also help to know
what cache size was used.

With such performance experiments it's always useful to add a memory profiler
to the discussion. Java VisualVM is a nice one that integrates well with
Eclipse [2].

[1]
[http://wiki.neo4j.org/content/Performance_Guide#Disks.2C_RAM...](http://wiki.neo4j.org/content/Performance_Guide#Disks.2C_RAM_and_other_tips)

[2]
[http://download.oracle.com/javase/6/docs/technotes/guides/vi...](http://download.oracle.com/javase/6/docs/technotes/guides/visualvm/profiler.html)

~~~
espeed
_Traversal methods are built-in for Neo4j, so those can be used for
representative testing. The MySQL traversal methods chosen are very basic and
don't necessarily reflect the optimized methods people will use in
production._

It's not that the traversal methods are built in; it's that each node has a
built-in index of its adjacent nodes, so it doesn't have to do external
lookups on each traversal step.
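
As a rough illustration of that difference (a toy sketch, not Neo4j's or
MySQL's actual internals), the Python below contrasts a shared edge index that
every step must consult with node objects that hold direct references to their
neighbors. The graph, `EDGES`, `Node`, and the function names are all
hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph as (source, target) pairs.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

# Relational style: one shared index from source id to row positions.
# Every traversal step has to go back through this external lookup.
edge_index = defaultdict(list)
for row, (src, _dst) in enumerate(EDGES):
    edge_index[src].append(row)


def neighbors_via_index(node_id):
    """Look up outgoing neighbors through the shared index."""
    return [EDGES[row][1] for row in edge_index[node_id]]


# Graph style: each node carries its own list of adjacent nodes.
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.out = []  # direct references to neighboring Node objects


nodes = {}
for src, dst in EDGES:
    for n in (src, dst):
        nodes.setdefault(n, Node(n))
    nodes[src].out.append(nodes[dst])


def two_steps_via_index(start_id):
    """Two-step traversal, one index lookup per visited node."""
    return sorted(
        n2
        for n1 in neighbors_via_index(start_id)
        for n2 in neighbors_via_index(n1)
    )


def two_steps_via_adjacency(start):
    """Two-step traversal that only follows per-node references."""
    return sorted(n2.node_id for n1 in start.out for n2 in n1.out)
```

Both functions return the same paths; the point is only where the neighbor
information lives, which is what changes the cost per step on a large graph.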

~~~
sthlm
I meant that the Neo4j API offers methods for graph traversal out-of-the-box
which are likely the same ones that most people will use. MySQL doesn't have a
default data structure for graphs (that I know of) or default algorithms.

So while the methods used to traverse the Neo4j graph are fairly
representative, the data structure and algorithms used for the MySQL traversal
are not.
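
For readers unfamiliar with the relational side, a basic graph layout is
usually just an edge table, with one self-join per traversal step. The sketch
below is an assumption about what such a basic approach might look like, not
the benchmark's actual schema or queries; it uses SQLite through Python's
`sqlite3` module as a stand-in for MySQL, and the table and column names are
hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
# An index on the source column so each step is an index lookup, not a scan.
conn.execute("CREATE INDEX edge_src ON edge (src)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)],
)


def count_paths_two_steps(conn, start):
    """Count two-step paths from `start` using one self-join."""
    row = conn.execute(
        "SELECT COUNT(*) FROM edge AS a "
        "JOIN edge AS b ON b.src = a.dst "
        "WHERE a.src = ?",
        (start,),
    ).fetchone()
    return row[0]
```

Each extra step adds another join, which is one reason a hand-rolled layout
like this says little about what a tuned production setup would do.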

Of course, I agree with you that the data structure itself is optimized. In
general I'm not doubting Neo4j's ability to excel in most benchmarks. I just
think the approach is very basic.

~~~
espeed
_Neo4j API offers methods for graph traversal out-of-the-box which are likely
the same ones that most people will use._

Interestingly, Marko didn't use Neo4j's native API (<http://api.neo4j.org>) --
he used a dataflow framework he wrote called Pipes
(<https://github.com/tinkerpop/pipes/wiki/>).

You probably have heard of the graph programming language Gremlin
(<https://github.com/tinkerpop/gremlin/wiki>). Gremlin is a thin wrapper over
Pipes.

~~~
sthlm
Oh, I'm sorry, I didn't misread that. That's very good to know. I know
Gremlin, didn't know it was based on Pipes. Thanks!

~~~
sthlm
Of course I meant, "I misread that".

------
teh
I checked all the links but didn't find any source?

I suspect a lot of the slowness is round trips to MySQL. In PostgreSQL you
can, for example, use "WITH RECURSIVE" queries to traverse simple graphs like
your example, avoiding the round trips.
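
A minimal sketch of the recursive-query idea, assuming a hypothetical `edge`
table and using SQLite (which also supports "WITH RECURSIVE") through Python's
`sqlite3` module so it runs without a server. It is not the benchmark's
schema, and PostgreSQL syntax may differ in details such as parameter
placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src INTEGER NOT NULL, dst INTEGER NOT NULL)")
conn.execute("CREATE INDEX edge_src ON edge (src)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)],
)

# The whole traversal runs inside the database in a single statement.
# The depth limit also keeps the recursion finite if the graph has cycles.
QUERY = """
WITH RECURSIVE walk(node, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT edge.dst, walk.depth + 1
    FROM walk JOIN edge ON edge.src = walk.node
    WHERE walk.depth < ?
)
SELECT DISTINCT node FROM walk WHERE depth > 0 ORDER BY node
"""


def reachable(conn, start, max_depth):
    """Return the nodes reachable from `start` within `max_depth` steps."""
    return [row[0] for row in conn.execute(QUERY, (start, max_depth))]
```

Compared with issuing one query per step from the client, this costs a single
round trip no matter how deep the traversal goes.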

Neo4j's Cypher [1] query language may be a much better selling point.

[1]
[http://docs.neo4j.org/chunked/milestone/cypher-query-lang.ht...](http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html)

------
mwexler
I am underwhelmed with this post that shows that, metaphorically, when using a
hammer vs. a screwdriver to insert a nail into wood, the hammer tends to
perform better. Perhaps I missed some big discussion on how great MySQL
performs on graphs, because it would seem the wrong tool for graph traversal.

And a stacked bar graph was probably not the best way to display these
findings.

However, other posts on the blog (markorodriguez.com) are more interesting,
such as the application of a graph traversal approach to a collaborative
filtering recommendation algo.

~~~
rkalla
I see your point; I interpreted the post to be more targeted at folks that
weren't aware of Neo4j or graph-specific data stores and might be trying to
hammer their nails in with screwdrivers (only know MySQL and nothing else) and
the post was enlightening them to what is waiting at the end of the rainbow...
or some analogy like that :)

------
cbs
I don't expect MySQL to ever win at graph traversal when pitted against a
graph database, but...

_However, no attempts have been made to optimize the Java VM, the SQL
queries, etc. These experiments were run with both Neo4j and MySQL “out of the
box” and with a “natural syntax” for both types of queries._

This, along with all the other factors overlooked, makes the results nothing
more than a very cursory starting point for actually comparing the two data
stores. There is not enough data, or enough effort put into collecting
accurate data, to draw any conclusions.

------
scubaguy
Isn't Neo4j a graph database?

