
Nope, or at least none that called themselves such. We tried Neo4j, which exploded trying to import data at the scale we're working with, and a couple of RDF databases, which survived the import but were a couple of orders of magnitude off from the performance we were hoping for.

After writing some eight different backends for our store class, none of which came within an order of magnitude of our own prototype for the sorts of applications we're doing, it seemed more fruitful to round out our own implementation rather than continue the seemingly endless recursion through possible data backends, which ranged from mildly to amazingly disappointing.

If you've got something specific that you've worked with in the past that you think would be worth our while to evaluate, I'd consider investing the time to try it out. But the mere fact that more options exist doesn't necessarily mean it's reasonable to keep writing new backends, which can take a non-trivial amount of effort.

I'm part of the Neo4j team and I'm puzzled about the import problem. I don't know the size requirements you have, but you mention 2.5M nodes and 60M edges, and we run systems in production with a LOT more data (in the billions range). So it definitely shouldn't blow up. Maybe you ran into a bug in an older release, or something else was wrong.

It's also important to note that Neo4j through the normal API is optimized for the most common use cases: reading data and transactional updates. Those operations are executed all the time during normal operation, whereas an import is typically done once at system bootstrap and then never again.
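To make the trade-off concrete, here's a toy sketch in plain Python (not Neo4j code; the class and method names are made up for illustration): a transactional store pays a durability cost on every write, while a batch inserter buffers everything and flushes once at the end of the import.

```python
# Toy illustration of transactional writes vs. a one-time batch path.
# Names (TransactionalStore, BatchInserter) are hypothetical, for
# illustration only -- this is not the Neo4j API.

class TransactionalStore:
    """Simulates a store that commits (flushes) on every write."""
    def __init__(self):
        self.data = []
        self.flushes = 0

    def insert(self, record):
        self.data.append(record)
        self.flushes += 1  # one durability flush per transaction

class BatchInserter:
    """Simulates a bulk-import path: buffer everything, flush once."""
    def __init__(self):
        self.data = []
        self.flushes = 0

    def insert(self, record):
        self.data.append(record)  # buffered only, no flush yet

    def shutdown(self):
        self.flushes += 1  # single flush for the whole import

tx = TransactionalStore()
batch = BatchInserter()
for i in range(1000):
    tx.insert(i)
    batch.insert(i)
batch.shutdown()

print(tx.flushes)     # 1000 flushes on the transactional path
print(batch.flushes)  # 1 flush on the batch path
```

The per-operation overhead is negligible in normal operation but dominates a bulk import, which is why a separate batch path can be dramatically faster.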

To ease migration, as part of our 1.0 release (June time frame) we will expose a new "batch injection" API that is faster for one-time imports of data sets. This is currently being developed. If you have feedback on how an API like that should behave, feel free to join the discussions on the list:



I'm assuming this is proprietary, so any other comments regarding RDF databases would be helpful. I've used ARC (arc.semsol.org) before, and it works adequately. I haven't run performance tests personally, but since ARC is written in PHP, it's probably blown away by this C++ version.
