
Nebula Graph: A Linearly Scalable, Distributed Graph Database Written in C++ - jamie-vesoft
https://nebula-graph.io/
======
jandrewrogers
It looks like a competent and thoughtful implementation but, as best I can
determine and not to take anything away from it, it uses an old design. The
performance and scalability are throttled by the use of secondary indexing
structures. You would have to use some pretty expensive hardware for the
performance cliffs to not be immediately evident.

I don’t do a lot of work on graph databases these days, but I’ve seen
state-of-the-art implementations do 10x this many inserts/sec/server on EC2
VMs where the local data model size was 100x the available RAM. And in
principle these architectures could easily scale out. Indexing structure and
storage engine design figure prominently; both usually need to be
purpose-built.

~~~
continuations
What are the differences between Nebula's old design and the newer graph DB
architectures?

Any open source graph DB that uses a new architecture?

------
moab
Does the OP have links to any benchmarks? Specifically, what kind of ingestion
rates can one expect with a modest number of machines? Does it support a
single-machine (shared-memory parallel) environment? What kind of algorithms
are supported?

It would be good to add some information about the features/capabilities on
the homepage. Right now the blurbs make vague statements like "high
throughput", which could be 1000 edge updates/sec or 10M.

~~~
jamie-vesoft
Thanks so much for your suggestion regarding the website! I have been thinking
about the same thing. We will keep improving the site along the way. Really
appreciate it.

As for throughput data, there are some PoC projects going on, and according
to data from production, one of our clients inserted 300 billion records into
6 servers within 20 hours, which is about 690k inserts/sec/server.
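
The per-server figure quoted above can be checked with a quick calculation. This is only a back-of-the-envelope sketch: it assumes the load was spread evenly across the 6 servers and sustained for the full 20 hours, which the comment does not state, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the quoted ingestion rate.
# Inputs are the numbers from the comment; even distribution across
# servers and a constant rate over the whole window are assumptions.
TOTAL_RECORDS = 300_000_000_000  # "300b records"
SERVERS = 6
HOURS = 20


def inserts_per_sec_per_server(total_records: int, servers: int, hours: float) -> float:
    """Average insert rate per server over the whole ingestion window."""
    seconds = hours * 3600
    return total_records / servers / seconds


rate = inserts_per_sec_per_server(TOTAL_RECORDS, SERVERS, HOURS)
print(f"{rate:,.0f} inserts/sec/server")  # prints "694,444 inserts/sec/server"
```

Under those assumptions the average comes out slightly above the rounded 690k figure.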

We want the benchmark data to be verified by decent clients in their
production environments, and we will reveal more data in the future.

Thanks again!

~~~
harikb
Just curious, didn’t you have to do some basic benchmarking using your own
data to get these clients to sign up in the first place? Or is this part of a
larger engagement/partnership in which these clients trust you enough to
embark on this?

------
gigatexal
I can't consider this until the folks at Jepsen have put it through its paces,
or until it has matured and been battle tested. A database is so important to
anything these days that it has to have a seal of approval from the likes of
Jepsen before I trust my data to it, which is why I bias towards existing
solutions before jumping on a new db.

~~~
jamie-vesoft
Good point. Software systems mature with ongoing testing. Nebula Graph has
been running Jepsen tests for quite some time already. See
[https://nebula-graph.io/en/posts/detect-data-consistency-iss...](https://nebula-graph.io/en/posts/detect-data-consistency-issues-in-raft-implementing-with-jepsen/)

~~~
gigatexal
That is really good then! I’ll check it out now

------
FridgeSeal
Oooh this looks interesting.

A comparison with the likes of DGraph and Neo4J would be really useful!

------
VHRanger
What graph algorithms are implemented beyond querying?

Also, how does a node locally store where its neighbours are stored in the
cluster?

~~~
boxfire
I found these in the docs which are verbose and helpful:

[https://github.com/vesoft-
inc/nebula/blob/master/docs/manual...](https://github.com/vesoft-
inc/nebula/blob/master/docs/manual-EN/1.overview/3.design-and-
architecture/1.design-and-architecture.md)

[https://github.com/vesoft-
inc/nebula/blob/master/docs/manual...](https://github.com/vesoft-
inc/nebula/blob/master/docs/manual-EN/1.overview/3.design-and-
architecture/2.storage-design.md)

~~~
jamie-vesoft
Thanks for sharing! Yes, you are right, the architecture articles are meant to
help users understand how Nebula Graph stores and processes data.

------
justicezyx
[https://news.ycombinator.com/item?id=22051271](https://news.ycombinator.com/item?id=22051271)
(3 months ago)

Someone mentioned benchmarks there, and the authors said they were working on
them. I have not checked the current state.

~~~
jamie-vesoft
Great digging! Thanks so much for paying attention to the benchmark report
data. We apologize that you have had to wait so long!

Yes, we have been working on the benchmark data for quite some time because we
have been working with our clients to verify our capability. For example, one
of our clients inserted 300 billion records into 6 servers within 20 hours, so
we are confident in saying that Nebula Graph can manage about 690k
inserts/sec/server.

We will keep working and provide a trustworthy benchmark report for you as
soon as we can.

Thanks again!

------
vardump
Would love to see a distributed hypergraph database. Do such things exist in a
practical form yet?

------
kvbe
What are the main use cases for this type of graph database?

~~~
jamie-vesoft
Graph databases are efficient at exploring multi-hop relationships, which are
common in many business scenarios. So basically, if your application needs to
query n-hop relationships all the time, then a graph database is a better
choice. Some main use cases include real-time recommendation
(product/content/shop), risk management such as fraud detection in the
financial services industry, knowledge graphs, and machine learning.
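
To make "n-hop relationships" concrete, here is a minimal sketch of an n-hop neighbourhood lookup as a breadth-first search over an in-memory adjacency dict. It is plain Python for illustration only, not Nebula Graph's query language or API; the function name `n_hop_neighbors` and the sample `follows` data are hypothetical.

```python
from collections import deque


def n_hop_neighbors(graph, start, hops):
    """Return every vertex reachable from `start` in at most `hops` edges.

    `graph` maps a vertex to an iterable of its outgoing neighbours.
    The start vertex itself is not included in the result.
    """
    seen = {start}
    result = set()
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested hop count
        for neighbor in graph.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return result


# Hypothetical "follows" relationships for illustration.
follows = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"], "dave": []}
two_hop = n_hop_neighbors(follows, "alice", 2)  # bob (1 hop) and carol (2 hops)
```

In a relational store each extra hop typically means another self-join, whereas a traversal like this only touches the vertices it actually reaches, which is the kind of access pattern the use cases above rely on.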

