MySQL vs. Neo4j on a Large-Scale Graph Traversal

sthlm · on Sept 28, 2011

While the conclusion that Neo4j is usually faster will generally be true for most such comparisons, the method to get to this result is very flawed.

Traversal methods are built-in for Neo4j, so those can be used for representative testing. The MySQL traversal methods chosen are very basic and don't necessarily reflect the optimized methods people will use in production.

Neo4j performance is highly dependent on caching. If it has to fall back to the disk store, performance drops rapidly, which can happen with a large data set [1]. MySQL optimization is a whole different horse. I suppose a BTREE might be the correct index type to use here, but it would also be good to know the cache size.

With such performance experiments it's always useful to add a memory profiler to the discussion. Java VisualVM is a nice one that integrates well with Eclipse [2].

[1] http://wiki.neo4j.org/content/Performance_Guide#Disks.2C_RAM...

[2] http://download.oracle.com/javase/6/docs/technotes/guides/vi...

espeed · on Sept 28, 2011

Traversal methods are built-in for Neo4j, so those can be used for representative testing. The MySQL traversal methods chosen are very basic and don't necessarily reflect the optimized methods people will use in production.

It's not that the traversal methods are built in; it's that each node has a built-in index of its adjacent nodes, so it doesn't have to do external lookups on each traversal step.
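To make that distinction concrete, here is a minimal Python sketch. It is only an illustration, not how Neo4j or MySQL are implemented: the toy edge list, the sorted list standing in for a B-tree index, and every name in it (`EDGE_TABLE`, `Node`, `two_hops_index`, `two_hops_adjacent`) are assumptions made up for the example.

```python
from bisect import bisect_left

# Hypothetical toy graph: (source, target) pairs.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

# Relational-style storage: one shared, sorted edge table. The sorted list
# is a stand-in for a B-tree index on the source column.
EDGE_TABLE = sorted(EDGES)


def neighbors_via_index(node):
    """Find a node's targets by searching the shared index (O(log E) per step)."""
    i = bisect_left(EDGE_TABLE, (node,))
    targets = []
    while i < len(EDGE_TABLE) and EDGE_TABLE[i][0] == node:
        targets.append(EDGE_TABLE[i][1])
        i += 1
    return targets


class Node:
    """Graph-style storage: each node holds direct references to its neighbors."""

    def __init__(self, node_id):
        self.id = node_id
        self.out = []


def build_nodes(edges):
    nodes = {}
    for src, dst in edges:
        for node_id in (src, dst):
            if node_id not in nodes:
                nodes[node_id] = Node(node_id)
        nodes[src].out.append(nodes[dst])
    return nodes


def two_hops_index(start):
    """Two traversal steps, each one going back to the shared index."""
    return sorted({n2 for n1 in neighbors_via_index(start)
                   for n2 in neighbors_via_index(n1)})


def two_hops_adjacent(start_node):
    """Two traversal steps that only follow references held by the nodes."""
    return sorted({n2.id for n1 in start_node.out for n2 in n1.out})


NODES = build_nodes(EDGES)
```

Both functions return the same nodes; the difference is that the first pays for an index search at every step, while the second pays only for the neighbors it actually visits.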

sthlm · on Sept 28, 2011

I meant that the Neo4j API offers methods for graph traversal out-of-the-box which are likely the same ones that most people will use. MySQL doesn't have a default data structure for graphs (that I know of) or default algorithms.

So while the methods used to traverse the Neo4j graph are fairly representative, the data structure and algorithms used for the MySQL traversal are not.

Of course, I agree with you that the data structure itself is optimized. In general I'm not doubting Neo4j's ability to excel in most benchmarks. I just think the approach is very basic.

espeed · on Sept 29, 2011

Neo4j API offers methods for graph traversal out-of-the-box which are likely the same ones that most people will use.

Interestingly, Marko didn't use Neo4j's native API (http://api.neo4j.org) -- he used a dataflow framework he wrote called Pipes (https://github.com/tinkerpop/pipes/wiki/).

You probably have heard of the graph programming language Gremlin (https://github.com/tinkerpop/gremlin/wiki). Gremlin is a thin wrapper over Pipes.

sthlm · on Sept 29, 2011

Oh, I'm sorry, I didn't misread that. That's very good to know. I know Gremlin, didn't know it was based on Pipes. Thanks!

sthlm · on Sept 30, 2011

Of course I meant, "I misread that".

teh · on Sept 28, 2011

I checked all the links but didn't find any source?

I suspect a lot of the slowness is round trips to MySQL. In PostgreSQL you can, e.g., use "with recursive" queries to traverse simple graphs like your example, avoiding the round trips.
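A rough sketch of the difference, using Python's built-in sqlite3 module as a stand-in (an assumption made so the example is self-contained; SQLite also supports WITH RECURSIVE, and the PostgreSQL query would look much the same). The `edge` table, its columns, and the function names are hypothetical, not taken from the benchmark.

```python
import sqlite3


def make_db():
    """Build a small in-memory edge table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.execute("CREATE INDEX edge_src ON edge (src)")
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?)",
        [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)],
    )
    return conn


def reachable_round_trips(conn, start, max_depth):
    """Client-side traversal: one query per frontier node per step."""
    frontier, seen, queries = {start}, set(), 0
    for _ in range(max_depth):
        next_frontier = set()
        for node in frontier:
            rows = conn.execute(
                "SELECT dst FROM edge WHERE src = ?", (node,)
            ).fetchall()
            queries += 1
            next_frontier.update(row[0] for row in rows)
        seen |= next_frontier
        frontier = next_frontier
    return sorted(seen), queries


RECURSIVE_SQL = """
WITH RECURSIVE walk(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT edge.dst, walk.depth + 1
    FROM edge JOIN walk ON edge.src = walk.node
    WHERE walk.depth < ?
)
SELECT DISTINCT node FROM walk WHERE depth > 0 ORDER BY node
"""


def reachable_recursive(conn, start, max_depth):
    """Server-side traversal: the whole walk happens inside a single query."""
    rows = conn.execute(RECURSIVE_SQL, (start, max_depth)).fetchall()
    return [row[0] for row in rows]
```

With an in-memory database the round trips cost almost nothing, so this shows only the shape of the two approaches, not their relative speed against a networked server.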

Neo4j's Cypher [1] query language may be a much better selling point.

[1] http://docs.neo4j.org/chunked/milestone/cypher-query-lang.ht...

mwexler · on Sept 28, 2011

I am underwhelmed with this post that shows that, metaphorically, when using a hammer vs. a screwdriver to insert a nail into wood, the hammer tends to perform better. Perhaps I missed some big discussion on how great MySQL performs on graphs, because it would seem the wrong tool for graph traversal.

And a stacked bar graph was probably not the best way to display these findings.

However, other posts on the blog (markorodriguez.com) are more interesting, such as the application of a graph traversal approach to a collaborative filtering recommendation algo.

rkalla · on Sept 28, 2011

I see your point; I interpreted the post to be more targeted at folks that weren't aware of Neo4j or graph-specific data stores and might be trying to hammer their nails in with screwdrivers (only know MySQL and nothing else) and the post was enlightening them to what is waiting at the end of the rainbow... or some analogy like that :)

espeed · on Sept 28, 2011

Perhaps I missed some big discussion on how great MySQL performs on graphs, because it would seem the wrong tool for graph traversal.

You can represent a graph in almost any data structure, and because of this people do it all the time, but it isn't as performant: in MySQL you have to hit an external index on each traversal step, whereas in a graph DB each node carries an index of its adjacent nodes.

cbs · on Sept 28, 2011

I don't expect MySQL to ever win at graph traversal when pitted against a graph database, but...

However, no attempts have been made to optimize the Java VM, the SQL queries, etc. These experiments were run with both Neo4j and MySQL “out of the box” and with a “natural syntax” for both types of queries.

This, along with all the other factors overlooked, makes the results nothing more than a very cursory starting point for actually comparing the two data stores. There is not enough data, nor enough effort put into collecting accurate data, to draw any conclusions.

scubaguy · on Sept 28, 2011

Isn't Neo4j a graph database?