
Show HN: Nebula – a distributed graph database written in C++ - jamie-vesoft
https://github.com/vesoft-inc/nebula
======
FillardMillmore
Interesting stuff, thanks for sharing!

How did you decide to use an SQL-like query language rather than a declarative
query language like Neo4j uses (Cypher query language)? What do you see as the
pros and cons of that decision?

Do you have any plans currently to design an ETL tool to make extracting data
from a RDBMS and loading into NebulaGraph easier?

I didn't see this in any of the documentation (though I did admittedly skim),
but is there any sort of visual front-end built in? If not, do you have any
plans to make one?

~~~
vvern
SQL is a declarative language. Cypher is a somewhat SQL-like, also declarative,
language that exists to make graph queries with predicates over nodes and
edges easier to express.
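
For illustration only, here is a minimal Python sketch of what "predicates over nodes and edges" means. It is not Cypher, nGQL, or any real database API; the toy graph and the `match` helper are made up for this example. The caller only states what a match looks like, and the helper decides how to find it, which is roughly the declarative part.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Node = Dict[str, Any]
Edge = Tuple[str, str, str]  # (source id, edge type, destination id)

# Hypothetical toy graph: people who follow or block each other.
NODES: Dict[str, Node] = {
    "a": {"name": "Ann", "age": 34},
    "b": {"name": "Bob", "age": 19},
    "c": {"name": "Cy", "age": 41},
}
EDGES: List[Edge] = [
    ("a", "follows", "b"),
    ("a", "follows", "c"),
    ("b", "follows", "c"),
    ("c", "blocks", "a"),
]


def match(
    src_pred: Callable[[Node], bool],
    edge_pred: Callable[[str], bool],
    dst_pred: Callable[[Node], bool],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (src name, edge type, dst name) for every edge whose
    endpoints and type all satisfy the given predicates."""
    for src, etype, dst in EDGES:
        if src_pred(NODES[src]) and edge_pred(etype) and dst_pred(NODES[dst]):
            yield NODES[src]["name"], etype, NODES[dst]["name"]


# Roughly: "who follows someone older than 30?"
results = list(match(
    lambda n: True,               # any source node
    lambda t: t == "follows",     # predicate on the edge
    lambda n: n["age"] > 30,      # predicate on the destination node
))
print(results)
```

A graph query language lets you write the same pattern as a single expression instead of three lambdas, and leaves the traversal strategy to the engine.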

~~~
FillardMillmore
Thank you for the correction. You are correct, SQL is a declarative language.
It would've been more accurate for me to refer to Cypher as 'a more expressive
declarative query language' (specifically for the graph database paradigm)
than a typical SQL-like query language.

------
ablekh
Looks interesting. Good luck! A couple of questions: 1) why did you decide to
create your own graph query language instead of following the recent graph
query language standardization trend (e.g., see
[https://www.tigergraph.com/2019/02/25/the-road-to-
standardiz...](https://www.tigergraph.com/2019/02/25/the-road-to-standardized-
graph-query-language-gql-part-1) and
[https://www.tigergraph.com/2019/03/15/the-road-to-a-
standard...](https://www.tigergraph.com/2019/03/15/the-road-to-a-standardized-
graph-query-language-gql-part-2)); 2) do you plan to support semantic
features [i.e., RDF/S, Gremlin, SPARQL, inference]; 3) do you plan to have
SDKs for popular languages [e.g., Python, TypeScript]; 4) did you benchmark
(or plan to do so) Nebula Graph against competition in terms of performance
[i.e., data import, query throughput and latency, inference speed, if/when
supported]; 5) have you figured out which features will remain open source and
which ones will be enterprise-only (I assume that you plan to follow the Open
Core business model)? Thanks!

~~~
jamie-vesoft
Thanks for the questions! Good ones indeed.

1) nGQL will be compatible with the international standard GQL.

2) Currently, no.

3) So far the following SDKs are available:

    Python  https://github.com/vesoft-inc/nebula-python
    Java    https://github.com/vesoft-inc/nebula-java
    Go      https://github.com/vesoft-inc/nebula-go

4) We are working on the benchmark. Stay tuned.

5) Yes we'll follow the Open Core business model like you said. All the
features that are open source today will remain open source. :)

Please let me know if you have any further questions.

~~~
ablekh
It's my pleasure! Thank you for the clarifications. I will definitely follow
your efforts. :-)

~~~
jamie-vesoft
Much appreciated! You can find us on Twitter to follow the most recent
updates. :) You're also welcome to become a contributor!

------
rs23296008n1
This looks _good_. I like the query language.

What's the status of the C++ client?

[https://github.com/vesoft-
inc/nebula/blob/master/src/client/...](https://github.com/vesoft-
inc/nebula/blob/master/src/client/cpp/GraphClient.cpp)

I see it has connect/execute/disconnect. This seems like the minimum required.
What can't I do with that basis?

~~~
jamie-vesoft
Glad you loved the query language!

The C++ client is PR ready now:

[https://github.com/vesoft-inc/nebula/pull/1013](https://github.com/vesoft-
inc/nebula/pull/1013)

------
shermanye
Nice to meet everyone here. As a newcomer, I would like to introduce ourselves
a little bit. Nebula is inspired by the Facebook internal project Dragon
([https://engineering.fb.com/data-infrastructure/dragon-a-
dist...](https://engineering.fb.com/data-infrastructure/dragon-a-distributed-
graph-query-engine/)). Fortunately I was one of the founding members of the
project. The project was started in 2012. Since then I've spent all my
time working on graph databases.

The goal of Nebula is to be a general-purpose, distributed graph database. We
welcome any feedback and technical discussion. We would love to learn
from the community and to provide a product which truly satisfies customers'
needs.

------
maxpert
Are there any production deployments that can tell us how much it can handle?
How does it compare to Dgraph?

~~~
jamie-vesoft
Thanks for your interest in Nebula Graph!

In some production deployments, the graph size reaches hundreds of billions of
edges and data size reaches dozens of TB.

------
ragerino
I gave the Docker tutorial a try. The SQL-like query language is OK
until it gets to queries with WHERE clauses on the data. Some form of
help and autocompletion would be great.

I recommend aligning with the GQL standard:
[https://www.gqlstandards.org/](https://www.gqlstandards.org/)

And with Apache TinkerPop, e.g., an RDF Sail implementation for Nebula:
[http://tinkerpop.apache.org/](http://tinkerpop.apache.org/)

~~~
jamie-vesoft
Thanks so much for trying Nebula! We really appreciate it.

Sorry about the WHERE clause issue. Would you mind opening an issue about it
on our GitHub repo so that we can assign it to the relevant staff?

As for the query language, thanks for the suggestion. nGQL will certainly be
aligned with the GQL standard. We are keeping a close eye on it. :)

We are planning to support OpenCypher in the first half of 2020, with
TinkerPop next.

Thanks again! Here's our Slack group, by the way; you can raise any questions there:
[https://join.slack.com/t/nebulagraph/shared_invite/enQtNjIzM...](https://join.slack.com/t/nebulagraph/shared_invite/enQtNjIzMjQ5MzE2OTQ2LTM0MjY0MWFlODg3ZTNjMjg3YWU5ZGY2NDM5MDhmOGU2OWI5ZWZjZDUwNTExMGIxZTk2ZmQxY2Q2MzM1OWJhMmY#)

------
lmeyerov
Wow, not sure how I missed this! Bravo!

If anyone involved would appreciate access to Graphistry to experiment with a
gpu client/cloud visual analytics side (e.g., jupyter & react), let me know
(leo@....com). Would love to see them together!

~~~
jamie-vesoft
Exciting! I know Graphistry from Twitter.

------
dang
We changed the URL from [https://github.com/vesoft-
inc/nebula](https://github.com/vesoft-inc/nebula) to the project page.

~~~
jamie-vesoft
Hey, thanks for letting me know. May I ask why?

~~~
dang
Because it's more project-specific and thus stands out more. There are a ton
of GitHub links posted to HN, obviously.

If you'd rather we switch to the other URL, let us know and we can do that.

~~~
jamie-vesoft
Ah I see. Thanks for the explanation! Could you please change the URL back to
the GitHub URL? Also the title? Much appreciated!

~~~
dang
I've changed the URL back from [https://nebula-graph.io/](https://nebula-
graph.io/).

The submitted title ("Show HN: An open-source distributed graph database
written in C++") led to lots of arguments, so we changed it in accordance with
HN's rules about baity titles. That's in
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html).

~~~
jamie-vesoft
Thanks for the explanation. I thought that was the reason you changed the
title. Makes sense to me. :)

------
slowmotarget
It looks really neat! What's your plan for making this project viable in the
long run? Do you envisage monetizing hosting, or maybe creating community and
paid editions?

~~~
jamie-vesoft
Glad you liked the project! A hosting service would be the main monetization
method. In addition, we will be providing consulting, training and all sorts
of enterprise services.

------
bane
This is very interesting. Anybody use it? Alternatives?

~~~
lmeyerov
We help bring gpu visual analytics & investigation automation to users of all
sorts of graph DBs (think tableau & servicenow for graph), so based on our
enterprise/big tech/gov/startup interactions:

1. Shortlist (in no particular order): Neo4j, AWS Neptune, Datastax Graph,
TigerGraph, Azure CosmosDB, and JanusGraph (Titan fork) are the ones we see
the most in practice, and, not in production but via the rumor mill, Dgraph,
RedisGraph, & ArangoDB. The three-and-four-letter types seem to roll their
own, for better or worse. There are also some super cool ones that don't get
visibility outside of the HPC+DoD world, like Stinger & Gunrock.
Interestingly, the reality is a ton of our graph users aren't even on graph
DBs (think Splunk/ELK/SQL), and for data scientists, just do ephemeral
Pandas/Spark. As someone from the early days of the end-to-end GPU computing
movement, we're incorporating cuGraph (part of nvidia rapids.ai) into our
middle tier, so you get to transparently benefit from it while looking at data
in any of the above.

2. I now slice graph DBs more in terms of OLTP (neo4j, janus, neptune, maybe
tiger) vs OLAP (spark graphx, cugraph) vs batch (janus, tiger) vs friendly
BI/data science (neo4j) vs friendly app dev / multi-modal add-on (CosmosDB,
Neo4j, Arango, Redis). Curious to see how this goes -- given the number of
contributors, I'm guessing it's doing well in at least one of these. +1 to
hearing reports from others!

~~~
bane
Thanks, I really appreciate the comprehensive write up of what your team is
seeing. Any chance of a longer blog post that expands on this, especially
pros/cons and performance?

~~~
lmeyerov
Yes, that is a great idea!

------
foota
The architecture diagram looks interesting, would love to read more about it
if anyone finds something.

~~~
rainyi2007
I found some articles about the architecture and design of the database in
their repo:

Overall structure: [https://github.com/vesoft-inc/nebula/wiki/Nebula-Graph-
Archi...](https://github.com/vesoft-inc/nebula/wiki/Nebula-Graph-Architecture-
Overview)

Storage engine design: [https://github.com/vesoft-
inc/nebula/blob/master/docs/manual...](https://github.com/vesoft-
inc/nebula/blob/master/docs/manual-EN/1.overview/3.design-and-
architecture/2.storage-design.md)

Query engine design: [https://github.com/vesoft-inc/nebula/wiki/Query-Engine-
Overv...](https://github.com/vesoft-inc/nebula/wiki/Query-Engine-Overview)

Hope that helps. :)

~~~
ioli
Thanks

------
winrid
Love the query language! Very easy to dive into. Any companies using this in
production?

~~~
jamie-vesoft
Glad you loved the query language. Simplicity and versatility are our design
goals for the language.

Currently the project is deployed at multiple leading internet companies
in China, including Tencent, MeiTuan (Chinese Yelp), Red (Chinese Pinterest),
Vivo, and so on.

------
gibsonf1
SPARQL?

------
cetra3
It's not open source, as it is licensed under Commons Clause according to the
README, which according to the FAQ is not open source.

I'd be interested in knowing whether the Commons Clause license has been
challenged, as the wording is rather simple.

~~~
thunderbong
Honestly, I don't see what's wrong with expecting payment for your work if
someone else decides to sell it. Why should 'open source' get conflated with
free (as in gratis)?

For me, open source has been an incredible way to learn software - its
syntax, its architecture, its control flow, its gotchas.

From my understanding of the license [1], you can see the code, learn from it,
do whatever you want with it, modify it if you so please, improve on it,
whatever. The only thing you cannot do is sell it. Because you've taken
someone else's idea in the first place.

I see this happening all the freakin' time and it pisses me off no end. If I
suggest a piece of software to someone, the first thing they ask is 'Is it
open source?' What they really mean is 'Is it free?' Why? If someone is expecting to get
paid for creating software for others, why is the feeling not reciprocated
towards the person who's created the software in the first place?

From what I've seen, most managers and software engineers expect to get paid
for their work, but all the software which helps them make that money they
expect for free.

I find that attitude extremely hypocritical, honestly.

[1]: [https://commonsclause.com/](https://commonsclause.com/)

~~~
zozbot234
Why should open source get conflated with things that are NOT open source?
Putting restrictions around "commercial" use (which is notoriously hard to
define) is not open source. Discriminating against fields of endeavor is not
open source.

If you want to get paid for developing genuine open source software, there are
things you can do to that effect. Get paid for support (even maintaining the
code _is_ support). Offer to highlight companies that support your software
(even if the highlighting is quite trivial, this is enough to unlock
'marketing' expenses and make it easier for business-oriented entities to
support you). Start a Patreon page. There are _lots_ of things that can be
done without adding any licensing restrictions.

~~~
brobdingnagians
> "without adding any license restrictions"

That would imply public domain. Every license has some licensing restrictions.
MIT, BSD, and associated ones are closest to that, but still have
restrictions. "Open source" in the literal sense in English is where the
source is open to be looked at by everyone. Lots of software is like that,
even fully commercial offerings. AGPL, GPL, and co have pretty drastic
limitations on commercial usage (much more than the Commons Clause), but are
obviously open source. The author should decide licensing, and if the source
is available to be perused-- the English language would tend to call that,
"open source". I think "OSI Approved Open Source License" would be a better
phrase than the linguistically vague "open source". English has proper nouns
for that sort of thing, and if we can go around writing "GNU/Linux", I think
specifying the _type_ of open source license really isn't too much to ask for.

~~~
zzo38computer
There are some licenses effectively like public domain, such as zero-clause
BSD, CC0, WTFPL, Unlicense, etc.

GPL does not restrict commercial use any more than non-commercial use. What it
does restrict is adding additional restrictions; it also requires source code
to be distributed, and it does not allow preventing the user from substituting
their own version.

If the source is available to be perused I think it is called "shared source"
(or "source available"); "open source" is a subset of that, and is according
to the OSI definition. "Free software" is also a subset of "source available".
"OSI approved" is a subset of "open source" because OSI approved does not
include public domain, even if it is still open source (which in some cases it
is) (also some stuff that meets the OSI definition (by both words and
intention) might not be OSI approved because OSI has not looked at it yet).
And then there is also "FOSS".

------
pojntfx
It's a proprietary license, false advertising ...

~~~
jamie-vesoft
Thanks for your feedback!

While our original intention is to provide a real open-source graph database
project for the community, we also want to prevent cloud vendors from
monetizing the project without contributing back to the community. Exactly
like what's explained in this TechCrunch article:
[https://techcrunch.com/2018/09/07/commons-clause-stops-
open-...](https://techcrunch.com/2018/09/07/commons-clause-stops-open-source-
abuse/)

That being said, Commons Clause seems to be the only license that can be used.
To quote the article: "Academics, hobbyists or developers wishing to use a
popular open-source project to power a component of their application can
still do so."

However, we will seriously consider the license issue. Please do let us know
if you know of any better licenses that can be used.

Much appreciated!

------
essive
Why is this project tied so specifically to China? Honest question here....

~~~
flohofwoe
What are those specific ties other than the company behind the project being
located in China?

~~~
apta
Still a red flag.

~~~
SQueeeeeL
Is it? That's like saying that Intel being located in the US is a red flag for
them leaving backdoors in their hardware... oohhh

[https://en.wikipedia.org/wiki/RDRAND#Reception](https://en.wikipedia.org/wiki/RDRAND#Reception)

~~~
apta
And China has backdoors in their stuff. We shouldn't be using it.

