
While the conclusion that Neo4j is usually faster will generally be true for most such comparisons, the method to get to this result is very flawed.

Traversal methods are built-in for Neo4j, so those can be used for representative testing. The MySQL traversal methods chosen are very basic and don't necessarily reflect the optimized methods people will use in production.

Neo4j performance is highly dependent on caching. If it has to fall back to the disk store, performance drops rapidly, which can happen with a large data set [1]. MySQL optimization is a whole different horse. I suppose a BTREE index is the right choice here, but it would also help to know the cache size.

With such performance experiments it's always useful to add a memory profiler to the discussion. Java VisualVM is a nice one that integrates well with Eclipse [2].

[1] http://wiki.neo4j.org/content/Performance_Guide#Disks.2C_RAM...

[2] http://download.oracle.com/javase/6/docs/technotes/guides/vi...




> Traversal methods are built-in for Neo4j, so those can be used for representative testing. The MySQL traversal methods chosen are very basic and don't necessarily reflect the optimized methods people will use in production.

It's not that the traversal methods are built in, it's that each node has a built-in index of its adjacent nodes, so it doesn't have to do external lookups on each traversal step.
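
To make the distinction concrete, here is a minimal Python sketch. It is a toy illustration, not Neo4j's or MySQL's actual storage code. The edge list, node names, and function names are all made up for the example, and the list scan only stands in for the external lookup an edge table needs on each step (a real database would serve that lookup from an index rather than a full scan).

```python
from collections import defaultdict

# Hypothetical toy graph as (source, target) pairs.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]


def neighbors_by_scan(edges, node):
    """Edge-table style: every step searches the shared edge list."""
    return [dst for src, dst in edges if src == node]


def build_adjacency(edges):
    """Adjacency style: each node carries its own list of neighbors."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def reachable(start, hops, neighbors):
    """Return the set of nodes reached in exactly `hops` steps."""
    frontier = {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in neighbors(node)}
    return frontier


ADJACENCY = build_adjacency(EDGES)

# Same traversal, two ways of finding the neighbors at each step.
via_scan = reachable("a", 2, lambda n: neighbors_by_scan(EDGES, n))
via_adjacency = reachable("a", 2, lambda n: ADJACENCY.get(n, []))
```

Both calls return the same set; the difference is only in how much work each step does to find a node's neighbors.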


I meant that the Neo4j API offers methods for graph traversal out-of-the-box which are likely the same ones that most people will use. MySQL doesn't have a default data structure for graphs (that I know of) or default algorithms.

So while the methods used to traverse the Neo4j graph are fairly representative, the data structure and algorithms used for the MySQL traversal are not.

Of course, I agree with you that the data structure itself is optimized. In general I'm not doubting Neo4j's ability to excel in most benchmarks. I just think the approach is very basic.
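
For reference, one basic way to model a graph relationally is an edge table traversed with self-joins. A rough sketch, using Python's built-in sqlite3 as a stand-in for MySQL; the table name, column names, index, and sample data are assumptions for illustration, not the schema from the benchmark:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL)")
# Index on the source column so each hop is an index lookup, not a table scan.
conn.execute("CREATE INDEX edge_src ON edge (src)")
conn.executemany(
    "INSERT INTO edge (src, dst) VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")],
)


def two_hop(connection, start):
    """Return the distinct nodes exactly two hops from `start`, sorted."""
    rows = connection.execute(
        """
        SELECT DISTINCT e2.dst
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        WHERE e1.src = ?
        ORDER BY e2.dst
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


two_hop_from_a = two_hop(conn, "a")
```

Each additional hop means another self-join, so how the table is modeled and indexed matters a lot for a comparison like this.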


> Neo4j API offers methods for graph traversal out-of-the-box which are likely the same ones that most people will use.

Interestingly, Marko didn't use Neo4j's native API (http://api.neo4j.org) -- he used a dataflow framework he wrote called Pipes (https://github.com/tinkerpop/pipes/wiki/).

You probably have heard of the graph programming language Gremlin (https://github.com/tinkerpop/gremlin/wiki). Gremlin is a thin wrapper over Pipes.


Oh, I'm sorry, I didn't misread that. That's very good to know. I know Gremlin, didn't know it was based on Pipes. Thanks!


Of course I meant, "I misread that".



