How did you decide to use an SQL-like query language rather than a declarative query language like Neo4j uses (Cypher query language)? What do you see as the pros and cons of that decision?
Do you currently have any plans to design an ETL tool to make extracting data from an RDBMS and loading it into NebulaGraph easier?
I didn't see this in any of the documentation (though I did admittedly skim), but is there any sort of visual front-end built in? If not, do you have any plans to make one?
Good question re data loading! Currently there are two tools for this purpose:
1) Spark Writer which enables you to load data from DWH to Nebula Graph via Spark: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...
2) CSV importer if you don't have to import very large data from RDBMS to Nebula Graph: https://github.com/vesoft-inc/nebula-importer
The visual will be released by the end of this week. :)
1) nGQL will be compatible with the international standard GQL.
2) Currently no
3) So far the following SDKs are available:
5) Yes we'll follow the Open Core business model like you said. All the features that are open source today will remain open source. :)
Please let me know if you have any further questions.
What's the status of the C++ client?
I see it has connect/execute/disconnect. This seems like the minimum required. What can't I do with that basis?
The C++ client is PR ready now:
The goal of Nebula is to be a general-purpose, distributed graph database. We welcome any positive feedback and technical discussion. We would love to learn from the community and to provide a product which truly satisfies customers' needs.
I recommend to align with the GQL standard:
And Apache TinkerPop. E.g. an RDF Sail implementation for Nebula.
Sorry about the where clause issue. Would you mind opening an issue about it on our GitHub repo, so that we can assign it to the relevant staff?
As to the query language, thanks for your suggestion and nGQL will surely be aligned with the GQL standard. We are keeping a close eye on it. :)
We are planning to support OpenCypher in the first half of 2020 and TinkerPop would be the next.
Thanks again! Here's our slack group btw and you may raise any question there: https://join.slack.com/t/nebulagraph/shared_invite/enQtNjIzM...
In some production deployments, the graph size reaches hundreds of billions of edges and data size reaches dozens of TB.
If anyone involved would appreciate access to Graphistry to experiment with a gpu client/cloud visual analytics side (e.g., jupyter & react), let me know (leo@....com). Would love to see them together!
If you'd rather we switch to the other URL, let us know and we can do that.
The submitted title ("Show HN: An open-source distributed graph database written in C++") led to lots of arguments, so we changed it in accordance with HN's rules about baity titles. That's in https://news.ycombinator.com/newsguidelines.html.
1. Shortlist (and in no order): Neo4j, AWS Neptune, Datastax Graph, TigerGraph, Azure CosmosDB, and JanusGraph (Titan fork) are the ones we see the most in practice, and not in production but rumor-mill, Dgraph, RedisGraph, & ArangoDB. The three-and-four-letter types seem to roll their own, for better or worse. There are also some super cool ones that don't get visibility outside of the HPC+DoD world, like Stinger & Gunrock. Interestingly, the reality is a ton of our graph users aren't even on graph DBs (think Splunk/ELK/SQL), and for data scientists, just do ephemeral Pandas/Spark. As someone from the early days of the end-to-end GPU computing movement, we're incorporating cuGraph (part of nvidia rapids.ai) into our middle tier, so you get to transparently benefit from it while looking at data in any of the above.
2. I now slice graph DB's more in terms of OLTP (neo4j, janus, neptune, maybe tiger) vs OLAP (spark graphx, cugraph) vs batch (janus, tiger) vs friendly BI/data science (neo4j) vs friendly app dev / multi-modal add-on (CosmosDB, Neo4j, Arango, Redis). Curious to see how this goes -- given the number of contributors, I'm guessing it's doing well in at least one of these. +1 to hearing reports from others!
I.e. what’s the graph DB that best fits the use-case equivalent to “having your data in an RDBMS and then running an indexer agent to feed ElasticSearch for searching”?
* Primary DB: type / scale, and how fresh do the extracts need to be (daily, last minute?)
* Are queries more search-centric ("entities 4 hops out") or analytics ("personalized pagerank")?
* Graph size: 10M relations, or 10B? Document heavy, or mostly ints & short strings?
* Is the client consuming the graph via a graph UI, or API-only?
* Licensing and $ cost restrictions?
* Push-button or inhouse-developer-managed?
The result of (valid) engineering trade-offs by graph db dev teams means that, currently, adding a graph db as a second system can be tricky. The above represent potential mismatches between source db / graph stack / workload and team burden. Feels like this needs a flow chart!
Happy to answer based on the above, and you can see why I'm curious which areas Nebula will help straddle :)
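To make the search-centric vs. analytics distinction from the list above concrete, here is a minimal sketch in plain Python. It is not tied to Nebula or any other graph DB; the adjacency-dict graph format and the function names `k_hop` and `personalized_pagerank` are assumptions made for illustration only.

```python
from collections import deque


def k_hop(graph, start, k):
    """Search-style query: every node reachable from start within k hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def personalized_pagerank(graph, source, damping=0.85, iterations=50):
    """Analytics-style query: power iteration with all teleports going to source."""
    nodes = set(graph) | {n for outs in graph.values() for n in outs}
    rank = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            outs = graph.get(n, ())
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # Assumption: dangling nodes hand their mass back to the source.
                nxt[source] += damping * rank[n]
        nxt[source] += 1.0 - damping  # teleport mass (total rank stays 1.0)
        rank = nxt
    return rank


if __name__ == "__main__":
    g = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
    print(k_hop(g, "a", 2))
    print(personalized_pagerank(g, "a"))
```

The first function touches only the neighborhood of one entity, which is the typical OLTP-style access pattern; the second sweeps the whole graph repeatedly, which is why it tends to land on the OLAP side of the split.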
I'd also include redis because of the graph module (https://oss.redislabs.com/redisgraph/).
I've likely missed a bunch of others. Add them as I'm interested in graph db and have only scratched the surface myself.
OrientDB is an alternative to Neo: https://orientdb.org/
There are a bunch of DBs compatible with TinkerPop and e.g. queryable with Gremlin: http://tinkerpop.apache.org/
And this awesome page has some good entries: https://github.com/jbmusso/awesome-graph/blob/master/README....
In architecture and goals it actually closely resembles Dgraph. I would love to see an (opinionated) comparison by Manish, the CEO of Dgraph.
- geospatial features
- good speed, as it is based on the BadgerDB key-value database and the Ristretto cache library.
- http library and other features
One of the advantages I saw in Nebula Graph is role-based access control, which is not available in Dgraph as of today.
I am very curious about benchmarks between Nebula Graph and Dgraph.
Also, what storage system does Nebula Graph use?
As to the storage system, Nebula Graph is based on multi-group raft and RocksDB.
You may take a look at this article about the design of our storage engine: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...
In 2020 we will be working on more plugins. You may stay tuned if that interests you. :)
Does Nebula also store data multiple times for multiple indexes?
Nebula doesn't store data multiple times for indexes.
And here's how the indexing works in Nebula Graph:
You are allowed to create multiple indexes for one tag or edge type in Nebula Graph. For example, if a vertex has 5 properties attached to it, then you can create one index for each of them if necessary. Both the indexes and the raw data are stored in the same partition, each with its own data structure, so that query statements can scan them quickly. Whenever a query contains a "where" clause, the index optimizer decides which index file should be traversed.
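As a rough illustration of the description above (and not Nebula Graph's actual implementation), here is a toy Python sketch. The `Partition` class, its method names, and the "pick the most selective index" rule are all assumptions made up for this example; the real index optimizer may choose differently.

```python
from collections import defaultdict


class Partition:
    """Toy partition: raw rows are stored once, next to their property indexes."""

    def __init__(self):
        self.rows = {}     # vertex id -> property dict (the raw data)
        self.indexes = {}  # property name -> {value -> set of vertex ids}

    def create_index(self, prop):
        """Build an index over one property from the rows already stored."""
        index = defaultdict(set)
        for vid, props in self.rows.items():
            if prop in props:
                index[props[prop]].add(vid)
        self.indexes[prop] = index

    def insert(self, vid, props):
        """Store a new row and add its id to every existing index.

        Toy limitation: updates and deletes are not handled.
        """
        self.rows[vid] = props
        for prop, index in self.indexes.items():
            if prop in props:
                index[props[prop]].add(vid)

    def choose_index(self, filters):
        """Hypothetical optimizer rule: the indexed filter with the fewest matches."""
        candidates = [p for p in filters if p in self.indexes]
        if not candidates:
            return None
        return min(candidates,
                   key=lambda p: len(self.indexes[p].get(filters[p], ())))

    def where(self, **filters):
        """Answer an equality-only "where" clause, using one index if possible."""
        prop = self.choose_index(filters)
        if prop is None:
            vids = self.rows.keys()  # no usable index: full scan
        else:
            vids = self.indexes[prop].get(filters[prop], set())
        return sorted(vid for vid in vids
                      if all(self.rows[vid].get(k) == v
                             for k, v in filters.items()))


if __name__ == "__main__":
    p = Partition()
    p.insert(1, {"name": "Ann", "age": 30})
    p.insert(2, {"name": "Bob", "age": 30})
    p.create_index("name")
    p.create_index("age")
    print(p.where(name="Ann", age=30))
```

Note that the indexes hold only vertex ids, so the property values themselves live in one place, which matches the statement that data is not stored multiple times for indexes.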
We (I work at Dgraph) have data redundancy when you have multiple replicas for a given group - but that's an optional feature.
The goal of Nebula is to be a general graph database, not just a knowledge graph database. There are some fundamental differences between the two.
We welcome any positive feedback and technical discussion. We would love to learn from the community and to provide a product which truly satisfies customers' needs.
One of the most interesting picks: RDF4J (Java based). It can connect to a lot of different SPARQL servers, but the RDF4J Native Store should be good enough for data sets on the order of "100 million triples", according to the docs.
I don't know much about it, but not long ago they announced integrated support for "federated queries", which means that if your data set can't fit on a single node, they have a solution for querying different servers in the same query.
I'm slowly learning my way through the forest of related technologies. One of the most useful is SHACL, which is a language to validate and extract pieces of the graph that match a pattern (very loosely, think a "schema" for graphs).
Take your time and be wary of how objective the articles are. Vendors try to lure you in. The following has some good info (but a Neo4j bias): https://dzone.com/articles/rdf-triple-stores-vs-labeled-prop...
Also, both those articles are a bit old: RDF* is a new extension for RDF that makes it easier to accomplish the same kind of things you can do with property graphs. RDF4J has support for RDF* on the roadmap!
To me, the fact that RDF is 1) a simpler and more general model and 2) an open standard with multiple free and commercial implementations makes RDF a more attractive option than locking into a single proprietary implementation like Neo4j.
Check out the Getting started YouTube video here if you prefer video tutorials:
Also some FAQs:
If you are interested in the architectural design of the project, here are some articles for your reference:
Feel free to contact us if anything is missing. :)
Overall structure: https://github.com/vesoft-inc/nebula/wiki/Nebula-Graph-Archi...
Storage engine design: https://github.com/vesoft-inc/nebula/blob/master/docs/manual...
Query engine design: https://github.com/vesoft-inc/nebula/wiki/Query-Engine-Overv...
Hope that helps. :)
Currently the project has been deployed in multiple leading internet companies in China, including Tencent, MeiTuan (Chinese Yelp), Red (Chinese Pinterest), Vivo, and so on.
I'd be interested in knowing whether the commons clause license has been challenged as the wording is rather simple
For me, open source has been an incredible way to learn software - its syntax, its architecture, its control flow, its gotchas.
From my understanding of the license, you can see the code, learn from it, do whatever you want with it, modify it if you so please, improve on it, whatever. The only thing you cannot do is sell it. Because you've taken someone else's idea in the first place.
I see this happening all the freakin' time and it pisses me off no end. If I suggest a piece of software to someone, the first thing they ask is 'Is it open source?' What they really mean is 'Is it free?' Why? If someone is expecting to get paid for creating software for others, why is the feeling not reciprocated towards the person who created the software in the first place?
From what I've seen, most managers and software engineers expect to get paid for their work, but all the software which helps them make that money, they expect for free.
I find that attitude extremely hypocritical, honestly.
If you want to get paid for developing genuine open source software, there are things you can do to that effect. Get paid for support (even maintaining the code is support). Offer to highlight companies that support your software (even if the highlighting is quite trivial, this is enough to unlock 'marketing' expenses and make it easier for business-oriented entities to support you). Start a Patreon page. There are lots of things that can be done without adding any licensing restrictions.
That would imply public domain. Every license has some licensing restrictions. MIT, BSD, and associated ones are closest to that, but still have restrictions. "Open source" in the literal sense in English is where the source is open to be looked at by everyone. Lots of software is like that, even fully commercial offerings. AGPL, GPL, and co have pretty drastic limitations on commercial usage (much more than the Commons Clause), but are obviously open source. The author should decide licensing, and if the source is available to be perused-- the English language would tend to call that, "open source". I think "OSI Approved Open Source License" would be a better phrase than the linguistically vague "open source". English has proper nouns for that sort of thing, and if we can go around writing "GNU/Linux", I think specifying the _type_ of open source license really isn't too much to ask for.
GPL does not restrict commercial use any more than non-commercial use. What it does restrict is adding additional restrictions; it requires source code to be distributed, and it does not allow preventing the user from substituting their own version.
If the source is available to be perused I think it is called "shared source" (or "source available"); "open source" is a subset of that, and is according to the OSI definition. "Free software" is also a subset of "source available". "OSI approved" is a subset of "open source" because OSI approved does not include public domain, even if it is still open source (which in some cases it is) (also some stuff that meets the OSI definition (by both words and intention) might not be OSI approved because OSI has not looked at it yet). And then there is also "FOSS".
That's... somewhat accurate.
Let's not pretend the GPL team itself didn't have issues with Tivo-ization, that prompted license changes.
Cloud servic-ization is the virtualization of hardware modification locks.
So call it opinions about "commercial" or use another word, but the GPL definitely has them.
Hence why the FSF advocates the AGPL for software that's designed to be performed "as a service" over a computer network. But "no tivoization" and AGPL clauses do not deny these uses; they simply enable the end user of the software to exercise her rights with respect to it.
It's all well and good, and nothing immoral is done by offering code under this license, but that doesn't make it open source.
Meanwhile the creator gets to share their work freely with anyone who wishes to use it as a component of their own product/software in the spirit of open source.
Let’s say it’s a full DB option as part of AWS RDS (or whatever that graph DB equivalent is). That probably is clearly monetizing the product. But what if they completely abstract the API and not expose the original one, it’s just the backing engine for a graph DB product?
Now moving away from a direct product, what if it’s just the backing DB AWS uses for managing all of their infrastructure? It’s not being directly monetized at that point but it might be the most critical component for the AWS operations, which means that it is helping them monetize other products. Do they owe in this case? (I’m speaking about the license here, not whether or not they should or should not based on goodness or feature improvements they want to pay to see).
As the DB moves further away from profit centers in an organization, at what point is it no longer being monetized?
Personally, I’d like to see a model where the OSS developers can and are paid in all of these cases for their work, but I’m not always sure there is anything better than a contract to support and build new features (classic OSS support model).
Discriminating by field of endeavor is contrary to the definition of open source software, and has been since before the term even existed. It's not open source, it's effectively Shared Source and developers who care about open source should stay away from this.
AGPL requires network-accessible code to be disclosed & licensed under an AGPL-compatible license.
The Commons Clause license outright prohibits SaaS-style offerings of the licensed code.
A lot of startups licensing their code under AGPL might still have AWS et al. eat their lunch, because all Amazon needs to do to remain compliant is to publish any modifications made to the AGPL-ed code.
> For purposes of the foregoing, "Sell" means practicing any or all of the rights granted to you under the License to provide to third parties, for a fee or other consideration (including without limitation fees for hosting or consulting/support services related to the Software), a product or service whose value derives, entirely or substantially, from the functionality of the Software.
So, you cannot pay a contractor to set this up, because they can't deliver to you if they charge for setup or hosting?
Please DO let us know if you have any better license options than the Commons Clause that can help provide an open-source project for the community while stopping cloud vendors from monetizing it without contributing back.
https://writing.kemitchell.com/ (blog & the blogroll for finding others). See also: https://www.google.com/search?q=site%3Awriting.kemitchell.co...
While our original intention is to provide a real open-source graph database project for the community, we also want to prevent cloud vendors from monetizing the project without contributing back to the community. Exactly like what's explained in this TechCrunch article: https://techcrunch.com/2018/09/07/commons-clause-stops-open-...
That being said, the Commons Clause seems to be the only license that can be used. To quote the article: "Academics, hobbyists or developers wishing to use a popular open-source project to power a component of their application can still do so."
However, we will seriously consider the license issue. Please do let us know if you know any better licenses that can be used.
I don't want to defend that company or the product, or the country they operate from, but the source code is all on GitHub under a permissive license and thus can easily be audited for government backdoors. Where's the problem?