
We help bring GPU visual analytics & investigation automation to users of all sorts of graph DBs (think Tableau & ServiceNow for graph), so based on our enterprise/big tech/gov/startup interactions:

1. Shortlist (in no particular order): Neo4j, AWS Neptune, DataStax Graph, TigerGraph, Azure CosmosDB, and JanusGraph (a Titan fork) are the ones we see the most in practice. Dgraph, RedisGraph, & ArangoDB we hear about through the rumor mill but haven't seen in production. The three-and-four-letter types seem to roll their own, for better or worse. There are also some super cool ones that don't get visibility outside of the HPC+DoD world, like Stinger & Gunrock. Interestingly, the reality is that a ton of our graph users aren't even on graph DBs (think Splunk/ELK/SQL), and data scientists often just do ephemeral Pandas/Spark. As folks from the early days of the end-to-end GPU computing movement, we're incorporating cuGraph (part of NVIDIA's RAPIDS, rapids.ai) into our middle tier, so you get to transparently benefit from it while looking at data in any of the above.

2. I now slice graph DBs more in terms of OLTP (Neo4j, Janus, Neptune, maybe Tiger) vs OLAP (Spark GraphX, cuGraph) vs batch (Janus, Tiger) vs friendly BI/data science (Neo4j) vs friendly app dev / multi-model add-on (CosmosDB, Neo4j, Arango, Redis). Curious to see how this goes -- given the number of contributors, I'm guessing it's doing well in at least one of these. +1 to hearing reports from others!

Thanks, I really appreciate the comprehensive write-up of what your team is seeing. Any chance of a longer blog post that expands on this, especially pros/cons and performance?

Yes, that is a great idea!

For someone who just wants to run some (intensive) OLAP graph queries on the “graph formulation” of a relational or hierarchical dataset every once in a while (maybe batch, maybe user-initiated, but either way <1 QPS), but who doesn’t yet have a graph DB and doesn’t really want to maintain their data in a canonical graph formulation: which type of graph DB would you recommend as the simplest-to-maintain, simplest-to-scale “adjunct” to their existing infra?

I.e. what’s the graph DB that best fits the use-case equivalent to “having your data in an RDBMS and then running an indexer agent to feed ElasticSearch for searching”?

My default nowadays is to minimize work via "no graph DB": CSV/Parquet extract -> Jupyter notebook of pandas/cuGraph/Graphistry, and if that isn't enough, then dockerized (=throwaway) Neo4j, or if the env has it, Spark+Graphistry. The answers to a few questions can easily switch the recommendation to, say, "Kafka -> TigerGraph/JanusGraph/Neptune", or some push-button Neo4j/CosmosDB stuff:

* Primary DB: type / scale, and how fresh do the extracts need to be (daily, last minute?)

* Are queries more search-centric ("entities 4 hops out") or analytics ("personalized pagerank")?

* Graph size: 10M relations, or 10B? Document heavy, or mostly ints & short strings?

* Is the client consuming the graph via a graph UI, or API-only?

* Licensing and $ cost restrictions?

* Push-button or inhouse-developer-managed?

The (valid) engineering trade-offs made by graph DB dev teams mean that, currently, adding a graph DB as a second system can be tricky. The questions above flag potential mismatches between source DB / graph stack / workload and team burden. Feels like this needs a flow chart!

Happy to answer based on the above, and you can see why I'm curious which areas Nebula will help straddle :)
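To make the "no graph DB" default and the search-vs-analytics question above a bit more concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions: the edge list, the function names (`load_adjacency`, `k_hop`, `personalized_pagerank`), and the damping value are all made up for the example, and a real notebook would more likely hand the same extract to pandas/cuGraph rather than hand-rolled loops.

```python
import csv
import io
from collections import defaultdict, deque

# Hypothetical edge extract; in practice this would be a CSV/Parquet dump
# pulled from the primary DB on whatever schedule freshness requires.
EDGES_CSV = """src,dst
a,b
b,c
c,d
d,e
e,f
a,g
"""


def load_adjacency(csv_text):
    """Build an undirected adjacency map from a src,dst CSV extract."""
    adj = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        adj[row["src"]].add(row["dst"])
        adj[row["dst"]].add(row["src"])
    return adj


def k_hop(adj, start, k):
    """Search-style query: all nodes within k hops of start (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(start)
    return seen


def personalized_pagerank(adj, seed, alpha=0.85, iters=50):
    """Analytics-style query: power iteration that restarts at a single seed."""
    nodes = list(adj)
    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = 1.0 - alpha  # restart mass goes back to the seed
        for n in nodes:
            # Every node here has at least one neighbor, so no division by zero.
            share = alpha * rank[n] / len(adj[n])
            for nbr in adj[n]:
                nxt[nbr] += share
        rank = nxt
    return rank


if __name__ == "__main__":
    adj = load_adjacency(EDGES_CSV)
    print(sorted(k_hop(adj, "a", 4)))
    scores = personalized_pagerank(adj, "a")
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

At 10M relations something like this (or its pandas/cuGraph equivalent) is plausibly all you need; at 10B, or with fresh-to-the-minute requirements, the other questions in the list start to dominate.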

Very insightful answer! Thanks for sharing your opinions here. Nebula Graph is good at OLTP use cases where high QPS and low latency are required.
