
DegDB, an open-source distributed graph database - cydrobolt
https://github.com/degdb/degdb
======
barakm
Hey, maintainer of Cayley (
[https://github.com/google/cayley](https://github.com/google/cayley) ) here.
Glad to see more people interested in this space!

It looks like your code is mostly stubbed out right now -- using gorm for any
persistence. You'll find that storing a basic graph is easy; doing more
complex traversals and optimization is a harder problem.

I can appreciate your concept of experimenting with adding monetary incentives
on top of requests and datasets. Graphs can be useful for this, if you view a
graph as a (requestable) URI of triples.

I'd recommend linking in Cayley as your backend (you can use it as a library),
and dealing with the requests/economics as an API layer on top. The benefit of
open source is you don't have to reimplement everything yourself.
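
Roughly the layering I have in mind: the graph library sits behind a small
interface, and the request/economics logic lives in a thin layer on top. All
type and function names below are made up for illustration, not Cayley's
actual API.

```go
// Sketch of the layering: a graph library handles storage and traversal,
// and the economics logic wraps it as an API layer. Names are illustrative.
package main

import (
	"errors"
	"fmt"
)

// Triple is the minimal unit the backend stores.
type Triple struct{ Subject, Predicate, Object string }

// GraphBackend is whatever library you link in (e.g. Cayley used as a
// library), hidden behind a small interface so it stays swappable.
type GraphBackend interface {
	Add(t Triple)
	BySubject(subject string) []Triple
}

// memBackend is a trivial in-memory stand-in for a real graph store.
type memBackend struct{ triples []Triple }

func (m *memBackend) Add(t Triple) { m.triples = append(m.triples, t) }
func (m *memBackend) BySubject(s string) []Triple {
	var out []Triple
	for _, t := range m.triples {
		if t.Subject == s {
			out = append(out, t)
		}
	}
	return out
}

// economicsLayer charges a flat per-query fee before delegating to the backend.
type economicsLayer struct {
	backend GraphBackend
	credits map[string]int // requester -> remaining credits
}

func (e *economicsLayer) Query(requester, subject string) ([]Triple, error) {
	if e.credits[requester] <= 0 {
		return nil, errors.New("insufficient credits")
	}
	e.credits[requester]--
	return e.backend.BySubject(subject), nil
}

func main() {
	api := &economicsLayer{backend: &memBackend{}, credits: map[string]int{"alice": 2}}
	api.backend.Add(Triple{"degdb", "written_in", "go"})
	res, err := api.Query("alice", "degdb")
	fmt.Println(res, err)
}
```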

And if you have novel notions on how to distribute a graph that could be
interesting, feel free to ping me. I warn you that it's a hard problem and a
bold claim in a number of ways -- it's not something you just build without
working with a couple of people.

~~~
d4l3k
Hey, I'm the developer for degdb. I'm really glad to see all the interest in
this project, even in its very rough initial state.

I'm aware that gorm isn't a great option for graph storage, but it seemed to
be the easiest way to handle data storage initially. A lot of this project was
written at a hackathon in ~36 hours, but I've been refactoring.

I looked at Cayley (and have it as a dependency in an attempt to borrow the
Gremlin parser). However, it doesn't seem to have a great way to store
"metadata". How would you recommend adding fields to triples such as language,
author, creation date, and cryptographic signature? Serializing and shoving
them into Quad.Label seems kinda hacky.
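
Concretely, the hacky approach I mean looks roughly like this: serialize the
extra fields and stuff them into the quad's label slot. The Meta struct and
makeQuad helper below are mine, just for illustration, not Cayley's API.

```go
// Illustration of the "hacky" approach: pack per-triple metadata into the
// label field as JSON. Every consumer then has to know to unpack it.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Meta is the extra per-triple data that has no natural home in a plain quad.
type Meta struct {
	Language  string    `json:"lang"`
	Author    string    `json:"author"`
	CreatedAt time.Time `json:"created_at"`
	Signature []byte    `json:"sig"`
}

// Quad mirrors the subject/predicate/object/label shape of a labeled triple.
type Quad struct {
	Subject, Predicate, Object, Label string
}

// makeQuad serializes the metadata into the label -- workable, but opaque to
// anything that expects the label to be a plain graph/provenance name.
func makeQuad(s, p, o string, m Meta) (Quad, error) {
	raw, err := json.Marshal(m)
	if err != nil {
		return Quad{}, err
	}
	return Quad{Subject: s, Predicate: p, Object: o, Label: string(raw)}, nil
}

func main() {
	q, _ := makeQuad("degdb", "written_in", "go", Meta{
		Language: "en", Author: "d4l3k", CreatedAt: time.Now(),
	})
	fmt.Println(q.Label) // the serialized metadata riding along in the label
}
```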

------
amitport
Nice. I implemented basically the same thing about 5 years ago:
[https://code.google.com/p/graphpack/](https://code.google.com/p/graphpack/)
(future dev on
[https://github.com/amitport/graphpack](https://github.com/amitport/graphpack))

I never had time to publish proper documentation, though.

You should also check out the paper that was published a few years later:
[http://onlinelibrary.wiley.com/doi/10.1002/spe.2226/abstract](http://onlinelibrary.wiley.com/doi/10.1002/spe.2226/abstract)

~~~
d4l3k
That's really neat. Any reason you decided to abandon work on it?

Non pay-walled version:
[http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2...](http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2012/CS/CS-2012-10.pdf)

~~~
amitport
As @barakm said in the comment above, "it's not something you just build
without working with a couple people"...

It requires a lot of work to get to a "real" product, which includes
documentation, a basic website, traction with users, etc. I eventually got
side-tracked by life and other work. I will be very happy to continue work on
it if someone is willing to _actively_ help out.

------
marknadal
Interesting proposition, but the majority of its design is flawed. Responding
to the design doc:

Triples are common in graph databases; however, both gun and Neo4j lean toward
property graphs (Neo4j has mandatory edge nodes, gun does not), which in my
personal opinion is actually useful, while triples are more of an academic
thing (note: I am biased because I am the author of gun). He chalks conflict
resolution up to some fairly nondeterministic behavior that will ultimately
require a lot of gossip, which makes resolution hard and untimely. He also
suggests that less popular content should be charged more, which I think
worsens problems that already exist in things like BitTorrent rather than
mitigating them.

~~~
d4l3k
Yeah, a lot of what you're saying I agree with.

In the several months since I wrote that, I've changed my mind on a lot of
things. The new design uses a sharded architecture (on the hash of the topic
id), with each node owning a specific keyspace. This makes it much more robust
and allows for actual consistency. Since all data will be treated equally,
there will no longer be a penalty for less used data.
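
Roughly what I mean by sharding on the hash of the topic id: each node owns a
slice of the hash keyspace, and a triple is routed to whichever node's range
covers its topic's hash. The hash function and the equal-width ranges below
are just illustrative, not the final degdb design.

```go
// Sketch of the sharding idea: hash the topic id into a fixed keyspace and
// route to whichever node owns that slice of the keyspace.
package main

import (
	"fmt"
	"hash/fnv"
)

// Node owns a half-open range [Start, End) of the 64-bit hash keyspace.
type Node struct {
	Addr       string
	Start, End uint64
}

func hashTopic(topic string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(topic))
	return h.Sum64()
}

// ownerOf returns the node whose keyspace covers the topic's hash.
func ownerOf(nodes []Node, topic string) Node {
	k := hashTopic(topic)
	for _, n := range nodes {
		if k >= n.Start && k < n.End {
			return n
		}
	}
	return nodes[len(nodes)-1] // catch-all for the very top of the keyspace
}

func main() {
	const max = ^uint64(0)
	nodes := []Node{
		{"node-a:7946", 0, max / 2},
		{"node-b:7946", max / 2, max},
	}
	fmt.Println(ownerOf(nodes, "Barack Obama").Addr)
}
```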

The main reason I originally thought that less used data should be penalized
is that it takes up a lot of resources for things that aren't used by the
majority of users. However, that's a hard thing to track, and it makes it
difficult to propagate inserts.

As for triples vs property graphs, they're functionally equivalent. I'm using
triples because they've been shown to work quite well at scale, for example in
Google's massive Knowledge Graph.
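
As a rough sketch of why I consider them equivalent: a property-graph edge
with attributes can be flattened into plain triples by giving the edge its own
id and hanging the properties off it. The encoding below is just an
illustration, not how degdb, gun, or Neo4j actually store data.

```go
// Flattening one property-graph edge (src -[rel {props}]-> dst) into triples
// keyed by a synthetic edge id.
package main

import "fmt"

type Triple struct{ Subject, Predicate, Object string }

// edgeAsTriples reifies a property-graph edge into a set of triples.
func edgeAsTriples(edgeID, src, rel, dst string, props map[string]string) []Triple {
	out := []Triple{
		{edgeID, "source", src},
		{edgeID, "relation", rel},
		{edgeID, "target", dst},
	}
	for k, v := range props {
		out = append(out, Triple{edgeID, k, v})
	}
	return out
}

func main() {
	for _, t := range edgeAsTriples("edge:42", "alice", "follows", "bob",
		map[string]string{"since": "2015-11-01"}) {
		fmt.Println(t)
	}
}
```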

~~~
jerven
In the long run, having a triple-based API does not mean you need a single
triple-table-based storage layout.

In the SPARQL world that is actually quite interesting, as different systems
have very different data layouts while maintaining the same basic query
language.
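
As a sketch of that point: the same triple-shaped API can sit on very
different physical layouts, say a single flat triple table versus
predicate-partitioned buckets (a common vertical-partitioning layout). The
type names below are purely illustrative.

```go
// Two backends with different physical layouts behind one triple-shaped API;
// callers can't tell which layout they're talking to.
package main

import "fmt"

type Triple struct{ S, P, O string }

// TripleAPI is the query surface; the storage layout behind it is free to vary.
type TripleAPI interface {
	Add(t Triple)
	ByPredicate(p string) []Triple
}

// flatStore: one big triple table, scanned on every lookup.
type flatStore struct{ triples []Triple }

func (f *flatStore) Add(t Triple) { f.triples = append(f.triples, t) }
func (f *flatStore) ByPredicate(p string) []Triple {
	var out []Triple
	for _, t := range f.triples {
		if t.P == p {
			out = append(out, t)
		}
	}
	return out
}

// partitionedStore: one bucket per predicate, so predicate lookups are direct.
type partitionedStore struct{ byPred map[string][]Triple }

func (s *partitionedStore) Add(t Triple) {
	if s.byPred == nil {
		s.byPred = map[string][]Triple{}
	}
	s.byPred[t.P] = append(s.byPred[t.P], t)
}
func (s *partitionedStore) ByPredicate(p string) []Triple { return s.byPred[p] }

func main() {
	for _, store := range []TripleAPI{&flatStore{}, &partitionedStore{}} {
		store.Add(Triple{"degdb", "written_in", "go"})
		fmt.Println(store.ByPredicate("written_in"))
	}
}
```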

Comparing to the top post: I think triple systems are much more scalable than
Neo4j, even if not as popular. There are a few triple systems with
trillion-node benchmarks, and even more with 100 billion plus. Neo4j has at
most ~34 billion relationships, and no more than 274 billion triples; those
are hard limits per the current Neo4j documentation. But I have not heard of
any Neo4j systems in production at that scale, while I know of at least one
SPARQL system that is running with 4 trillion edges
([http://allegrograph.blogspot.ch/2015/11/allegrograph-news-no...](http://allegrograph.blogspot.ch/2015/11/allegrograph-news-november-2015.html)).

------
jerven
Nice to see. Is SPARQL support planned? I am wondering because it has a
triplestore directory.

In the meantime, a more interesting, production-ready, open-source distributed
graph database worth looking at is
[https://www.blazegraph.com/](https://www.blazegraph.com/). It scales really
well and will soon have GPU support for graph traversals. It has TinkerPop and
SPARQL support.

~~~
kajecounterhack
[https://github.com/google/badwolf](https://github.com/google/badwolf) is
another similar project being developed at Google.

BadWolf has a SPARQL-like query language too.

~~~
jerven
BadWolf is interesting in its temporal aspect, but IMHO it has dropped a bit
too much from the RDF world to really take off.

Also, I paid the price of early adoption with SPARQL/RDF. I'm not looking to
repeat that with an even earlier adoption of a non-standard system, especially
if the temporal aspect does not appear in the data I work with.

------
wilsonfiifi
Before I clicked the actual link, I was secretly praying it would be written
in Go so I could comb through the source code and understand its inner
workings! Thanks for sharing this and making it open so that less experienced
systems devs like us can learn!

