Ask HN: If you've used a graph database, would you use it again? - networked
======
gr__or
I used Neo4j for a few side projects but my go-to is still PostgreSQL. The
largest flaw I see with Neo4j (and probably other graph databases as well) is
that it forces you to think of your entities as either vertices or edges, and
that line tends not to be as clear as you might expect.

For example: (:Person)-[:BEFRIENDS]->(:Person)

If we want to store a date with that relationship, Neo4j has your back;
that's entirely possible (relationships can have attributes). But now our
requirements change and we also want an entity for Events shared by friends
(e.g. a friendship anniversary), so we have to remodel our data to something
like:

(:Person)-[:IS_IN]->(:Friendship)-[:HAS]->(:Event)

In SQL that wouldn't have been a remodel, because there's no difference
between a relationship and an entity. We would've gone from:

Person(id)
Friendship(person1_id, person2_id)

to:

Person(id)
Friendship(person1_id, person2_id)
Event(friendship_id)
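
To make that concrete, a minimal DDL sketch of the second schema (table and
column names are just illustrative):

    CREATE TABLE person     (id SERIAL PRIMARY KEY);
    CREATE TABLE friendship (id SERIAL PRIMARY KEY,
                             person1_id INT REFERENCES person(id),
                             person2_id INT REFERENCES person(id),
                             since DATE);  -- the date on the relationship
    CREATE TABLE event      (id SERIAL PRIMARY KEY,
                             friendship_id INT REFERENCES friendship(id),
                             name TEXT);

Adding Event is just one more CREATE TABLE; Person and Friendship stay
untouched.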

So I feel like the vertex/edge distinction Neo4j makes gets in the way of
changing data-model needs, and I ultimately think that modeling your data as a
graph is not helpful. Though it can be extremely helpful in querying, and
that's where its biggest strength lies.

~~~
pikchurn
Maybe you want a hypergraph?

[http://www.hypergraphdb.org/](http://www.hypergraphdb.org/)

~~~
vinceguidry
I wish this were a networked database that I could just slap on a Digital
Ocean droplet and query from anywhere.

------
andrewstellman
I've been using RDF and triplestores / RDF databases for the last half-decade,
developing both front-end and back-end systems, and training many developers
to work in RDF. If you're used to either relational databases or object-
oriented design, it's a really different way of thinking about data. Just like
OOP is really good for certain kinds of problems and models, and RDBMS is good
for other kinds of problems and models, RDF is great for specific kinds of
problems. For example, if you need to combine data from several
somewhat incongruent sources into a single coherent database (e.g. dozens of
data feeds that are _almost_ the same, but you need to preserve the
differences while combining the parts that are the same), you might end up
with headaches trying to come up with a good RDBMS design; RDF is really well-
suited for that kind of problem.

While a lot of the work I do is covered by NDA, one problem I've applied it
to that I can talk about is analyzing basketball play-by-plays. I've spent
some time talking to the analytics team at an NBA franchise, and it turns out
doing interesting analytics on play-by-plays can be a surprisingly tough nut
to crack. RDF was a great tool for tackling this. Here's the source (written
in Scala), for anyone interested in having a look:
[https://github.com/andrewstellman/pbprdf](https://github.com/andrewstellman/pbprdf)

~~~
ar-jan
Do you know if there's off-the-shelf software (GUI) to create/edit/explore
your own RDF dataset? Or does it always involve building your own front-end?

~~~
andrewstellman
I like WebVOWL for visualizing the RDF ontology. Here's the pbprdf ontology
displayed in it:
[http://www.visualdataweb.de/webvowl/#iri=https://raw.githubu...](http://www.visualdataweb.de/webvowl/#iri=https://raw.githubusercontent.com/andrewstellman/pbprdf/master/generated/ontology.ttl)

A few years ago I put together a quick GUI in C# to make it easier to run
SPARQL queries: [https://github.com/andrewstellman/sparql-explorer](https://github.com/andrewstellman/sparql-explorer)

I haven't found an RDF editor or visual tool that I like. Some people like
Topbraid Composer: [https://www.topquadrant.com/tools/modeling-topbraid-composer...](https://www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition/) (commercial, closed source)

------
nsedlet
We had a production Rails app running with postgres, and we decided to
implement some of our models with Neo4j. Graphs felt like the right way to
represent the data, and all of the models were new, so we felt more free to
choose the approach that seemed best.

A month later we rewrote everything in SQL - the main drivers were:

\- as we refined our model, we realized that a relational DB with a bunch of
join tables was good enough

\- our developers were more comfortable working with SQL

\- it wasn't possible to run complicated queries involving both databases
simultaneously

\- the Rails ORM felt easier to use than the Neo4j Ruby APIs (though this was
certainly a function of our own familiarity with Rails and relational
databases in general)

\- having the extra database complicated our codebase and complicated our
deployment

There was nothing horrifying or surprising in our encounter with graph
databases. It just felt like we made the wrong initial architectural
decision. We were still trying to define the problem and were trying to use
something we didn't fully understand.

I'd hesitate to use graph dbs in the future unless I needed a high-performance
app with a lot of data that only a graph could model well. Otherwise having
two different types of databases is annoying.

~~~
lairdpopkin
My instinct is that using two different databases in one app adds a lot of
complexity, but if you need to pick just one database, it's not clear to me
that SQL is a better choice than a graph database. It really depends on the
application. For many purposes, a graph database can do what a SQL database
can do, because a SQL table is very much like a collection of graph nodes. Of
course, if you're starting with an app on a SQL database and just adding to
it, that's very different from a "green field" project where you can pick
technologies freely.

------
maxdemarzi
I started using Neo4j 8 years ago after a long time as a relational database
developer. I needed it for a project building a LinkedIn clone with skills (at
the time LinkedIn didn't have skills). I was going to need a massive join
table of user-skill-user and decided it was best modeled as a graph. I built a
Ruby gem, "neography", as a Neo4j driver and became an open source
contributor. Later Neo4j contracted me to build a rules engine in a week for
one of their clients. That got me a job as a Sales Engineer at Neo4j. 100+
blog posts and 200+ GitHub repos later, after lots of travel and many wins, I
still love the job, and still love the database.

~~~
somehnreader
How is a graph database built under the hood?

I know that a decent RDBMS (simplified) will consist of the following:

\- data in blocks organised with a block-size that the underlying filesystem
likes

\- a cache for the most frequently used blocks

\- every index is a B-Tree with pointers to the blocks containing the tuples

Then there are column stores as well as row stores, and for compression you
might have some dictionary encoding going on.

Now, how does the Graph Database look under the hood and what are the
complexities involved? How is the Graph persisted?

~~~
moxious
Graph databases are built a lot of different ways; for example, Neo4j's
architecture is very, very different from something like an RDF triple store,
or DataStax on top of Cassandra.

[Neo4j internals can be seen here](https://www.slideshare.net/thobe/an-overview-of-neo4j-internals)
...it's a bit old but I think mostly still accurate.

In graphs you have to persist nodes and edges, though you may partition nodes
by label/category. In the case of Neo4j there is a property store rather than
a set of columns.

~~~
somehnreader
Thanks, very helpful. I am just looking at it and will have a bit of a think
about this later :)

------
v-yadli
Interesting question. I think it is critical to point out that the underlying
principles of a graph database are different from an RDBMS, because the
operators in a graph database may not comply with relational algebra.

Consider the following case:

Client A issues a query -- starting from a vertex, conduct a bounded closure
search, giving every visited vertex a mark (coloring, or a lexical flag,
whatever you would expect from a graph algorithm).

Client B issues a query -- clearing any marks applied to a particular vertex,
which happens to be one of the vertices visited by Client A's query.

Now, race conditions aside, let's assume we first process query A and then B.
Would we allow query B to succeed? It is clearly possible for query B to break
the semantics of query A; for example, query A goes through a bridge and then
query B cuts the bridge, so the connectivity information is lost.

Of course we could say that such query A should be a part of a transaction,
and isolation can be more strictly enforced -- but again, to what degree? Poor
locality will cause the transactions to be interconnected with each other. How
does a graph database determine what is the true purpose of the algorithm
under each query? What does it guarantee?

Many graph databases now claim ACID, but what do they really mean?

Is it just a fancy query language over a traditional data model? Say, you
could also build graph queries for a SQL database -- what does a graph
database provide that such graph-over-SQL cannot?

p.s. I work on Microsoft Graph Engine:
[https://github.com/Microsoft/GraphEngine](https://github.com/Microsoft/GraphEngine).
We decided to build a modular graph processor rather than calling it a graph
database, because we don't really know, by default, what kind of semantics a
user wants. With GraphEngine, you can plug in linear query languages like
Gremlin or GraphQL; you can also plug in SPARQL, or a traditional relational
model with strong guarantees, or go down to a bare-metal key-value store with
atomicity and durability only. I do think that a graph data model is very
helpful in many scenarios, but I think we really need to advance the research
on the semantics of graph management.

~~~
jexp
Transaction isolation is a no-brainer, so I don't think your example holds.
Also, your example is not related to the algebra but to isolation.

"Claiming ACID" what is ambiguous about that? Transaction support with
different serialization levels, like other databases that offer it.

And Neo4j originally started because RDBMSs were not able to execute the
complex deep traversals needed in real time. A dedicated storage & query
engine for graphs allows you to run statements quickly that would otherwise
take too long to execute.

Regarding the data model, the property-graph model is much closer to the
object model but with richer relationships; it doesn't suffer from the object-
RDBMS impedance mismatch and is better suited to express real-world domains &
scenarios. It also represents semantically relevant relationships as first-
class citizens in the database, allowing for proper information representation
and much faster retrieval.

Disclaimer: I've worked with/for Neo4j for 8+ years and still love it.

~~~
v-yadli
> "Claiming ACID" what is ambiguous about that? Transaction support with
> different serialization levels, like other databases that offer it.

A non-graph database would not provide operators like deep traversals.
Operations are tightly bound to ACID as a whole, not just isolation. Of course
ACID would always hold if you strictly linearized everything, but that defeats
the purpose of data management, and one could achieve the same goal with even
a macro processor like `m4`.

Getting traversals and other graph algorithms into the business means that
there are a lot of things that should be reconsidered, like constraints and
triggers.

For example, if you cannot write a constraint to limit the local clustering
coefficient of every entity, you cannot proceed in your traversal with a good
upper bound on your time budget. However, it is the vertices that you _don't_
visit that will propagate these constraints back while you are halfway there.
Parallelizing such queries, in my opinion, is beyond state-of-the-art
research.

~~~
twic
> A non-graph-database would not provide operators like deep traversals

You can do this with a recursive common table expression.
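
For example, a minimal sketch against a hypothetical edges(src, dst) table,
finding everything reachable from node 1 within 10 hops:

    WITH RECURSIVE reachable(id, depth) AS (
        SELECT dst, 1 FROM edges WHERE src = 1
      UNION
        SELECT e.dst, r.depth + 1
        FROM edges e
        JOIN reachable r ON e.src = r.id
        WHERE r.depth < 10  -- bound the traversal so cycles terminate
    )
    SELECT * FROM reachable;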

~~~
moxious
While this is technically true, in the SQL world this requires wizard-level
skills that most SQL developers do not possess, and when you arrive at this
spot, you end up with a query that performs really, really badly.

Look, between the database formalisms, they're all "complete" in the sense
that you can choose any database and solve all the problems. But certain
databases are going to be pathologically bad at solving certain types of
problems, which is why there are so many sub-niches that persist over time.

For deep path traversals, you _can_ do it with an RDBMS, but a graph DB is
going to win every time, in part because the data structure is just set up for
that purpose. There are other queries where an RDBMS will be best too. So it
goes.

------
kenning
The coolest thing to me about Neo4j is that it spins up a little web server
with an extremely friendly UI that allows people to build queries and run them
locally. My non-coder coworker wrote all her own queries and found, then
fixed, errors in the data entirely on her own.

Our data set could have been handled fine with a relational database,
honestly. However this was a rare case where over-engineering a problem and
using the latest technology saved time.

~~~
rspeer
So I used Neo4j in 2011. It was very exciting at first, and then I got quite
burned by it when I tried to make something real. Many people in this thread
are describing a very different experience, and I want to know if it has
really dramatically improved, or if the use cases of Neo4j users are just
different from mine.

\- In 2011, it worked great on small data that fit in RAM, but once the data
became bigger than RAM, queries would take unexpectedly large numbers of
seconds. How much data do you put into Neo4j?

\- I admired the friendly little web server until I realized that it was a
massive security hole: anyone who could access it could run arbitrary code on
the server, it ran over plain HTTP, and if you put it behind an HTTPS proxy,
it stopped working. I hope this isn't still the case. Does it have reasonable
access control and HTTPS now? Could you use the Web interface in production?

~~~
jakewins
On (1), the memory layer has been entirely rewritten since 2011. It used to be
a combination of MMAP and on-heap caching; mmap in java being notoriously
terrible and caching on the heap being even worse. The memory layer now works
similar to postgres, with a user-space page cache managing blocks of RAM. So:
It's certainly changed, and in my experience much for the better.

On (2), yes, the little UI now requires a username/password, and it supports
HTTPS. HTTP remains available, defaulting to localhost access.

------
voondo
I used the graph part of ArangoDB for a recent project and I appreciated the
flexible nature of the relationships between entities (being edges, so always
n-n). For example, my customer often changed their mind about some critical
parts of the business logic (and thus the relations between entities) and it
was a pleasure to update without rewriting too much code. Also, queries
involving many relationships seem more powerful and simpler than in an RDBMS.
Anyway, maybe not a universal solution but, as a web/mobile developer, I can't
see any actual limitations for my daily use case.

------
Maro
Facebook uses a custom graph database called TAO (nodes, edges, traverse them
[1]) for storing (almost) all production data. Based on DBMS classes from my
uni days this is counterintuitive, but <scale>. In practice it just worked,
and it didn't hold SWEs back / it enabled them to move fast. Having said that,
I don't see why I would use a graph database unless I had >10M DAUs.

[1] [https://www.facebook.com/notes/facebook-engineering/tao-the-...](https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/)

~~~
exceptione
Does that mean Daily Active Users? Please spell out abbreviations!

~~~
sarabande
TAO = The Associations and Objects (from the linked paper)

SWE = Software engineer

DBMS = Database management system

DAU = Daily active users

------
eudora
I'm currently using a home-baked graph database built on top of PostgreSQL for
[https://unlikekinds.com](https://unlikekinds.com)

It stores information as triples (Bob -> Married to -> Gary) and with
properties (Bob.last_name = Stamper).

I've been finding that the benefits keep paying off. I can arbitrarily relate
any thing to any other thing (and query those relationships) without changing
code or the database schema at all.
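
The general shape is a single table; a minimal sketch (illustrative, not the
actual unlikekinds schema):

    -- one row per relationship ("triple")
    CREATE TABLE triples (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    );

    INSERT INTO triples VALUES ('Bob', 'married_to', 'Gary');

    -- everything related to Bob, in either direction
    SELECT * FROM triples WHERE subject = 'Bob' OR object = 'Bob';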

And the fact that it's a literal, intuitive representation of reality makes
things much easier to reason about.

When viewing something and seeing all the related info, the data nerd in me
loves it: [https://unlikekinds.com/t/unlike-kinds](https://unlikekinds.com/t/unlike-kinds) (meta)

------
nickstefan12
Our app models directed graphs in Postgres with a closure table (the
transitive edges between nodes).

The advantages are that it's just SQL, has good performance, and we can query
the graph using relational logic rather than n+1 traversals. The trade-off is
space (the closure table has the potential to be huge).

So it depends on the size of the data set. Part of me wishes we'd built
something that's easier to partition, but for now that's a future concern.
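
For anyone unfamiliar with the pattern, a minimal sketch with hypothetical
tables: the closure table stores one row per (ancestor, descendant) pair, so
reachability becomes a single indexed lookup instead of a traversal.

    -- direct edges only
    CREATE TABLE edges (parent INT, child INT);

    -- one row for every transitive (ancestor, descendant) pair,
    -- maintained on every edge insert/delete
    CREATE TABLE edge_closure (
        ancestor   INT,
        descendant INT,
        depth      INT
    );

    -- "is node 42 reachable from node 1?"
    SELECT 1 FROM edge_closure WHERE ancestor = 1 AND descendant = 42;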

~~~
quietbritishjim
Surely another disadvantage of storing all that redundant data is that either
(a) you lose atomicity of updates or (b) you have some terrifyingly large
transactions for a relatively simple change to the underlying data.

Whether this is acceptable depends not only on the size of the data set but
also on how often it changes compared to how often you query it.

------
mrjn
There is a lot of stigma attached to graph DBs. Would it provide good
performance? Should I ever use it as my primary database? Is my data ever safe
with a graph DB?

If we go beyond that, assuming there was one which provided great performance
and data integrity and could be reliable as a primary database — then graph
DBs are just better.

First, schema and data modeling are incredibly simple. Our minds think in
graph terms. Things connecting to each other is very natural to us as human
beings. Graph DBs replicate that in a very straightforward way.

Then, many graph DBs, being modern, support flexible schemas, something which
is a huge win for speed of application iteration.

Graph DBs are also sparse, which means it's a lot easier to model many
different kinds of data sources and data types in the same "table." What that
gives you is the ability to query across anything in the entire DB, without
being concerned about table-level boundaries.

We were solving this problem with Google's Knowledge Graph, where we had to
fit a movie dataset into the DB. The film industry has so many roles
(director, producer, actor, cinematographer, and so on), often with the same
person doing multiple roles, that having a table for each is just super
fucking hard. With hundreds of such roles, each role being a table would be
insane. Representing this information in a graph is a cakewalk in comparison.
And this problem gets a lot worse if you then switch to the music industry,
books and others (hence the decision to be a knowledge "graph").
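
In relational terms, the idea is roughly to make the role data instead of
schema: one edge table rather than hundreds of role tables. A hypothetical
sketch, not Dgraph's actual storage model:

    -- one "edge" table instead of director/producer/actor/... tables
    CREATE TABLE credits (
        person_id INT,
        film_id   INT,
        role      TEXT  -- 'director', 'producer', 'actor', ...
    );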

Functionality-wise, graph DBs provide a superset of SQL. They support all the
(equivalents of) "select x from y where z" type statements, while also doing
fast and recursive traversals and joins at the DB level.

And recursive traversals and joins are a huge deal. The rise of GraphQL over
REST APIs is in a way indicative of that. To render a page on a modern
website, you need to recursively ask for components (think questions on Quora
or Stack Overflow). I remember Quora would have thousands of such components
on a single page. GraphQL made it easier to query for those by expressing a
way to retrieve this tree in a single query. But the internal mechanics of
doing this via relational tables are still the same: a repeat-a-query-and-
collect cycle. Graph DBs natively support things like these; imagine how much
more efficient and powerful that is.
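
The "repeat a query and collect" cycle looks roughly like this (hypothetical
Q&A tables), repeated once per component instead of once per page:

    SELECT * FROM answers WHERE question_id = 42;
    -- ...then, for each answer id that came back:
    SELECT * FROM comments
    WHERE answer_id = 7
    ORDER BY created_at DESC LIMIT 5;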

Once you start to wrap your head around graphs, it’s hard to not be
wholeheartedly impressed by their power.

Disclaimer: I'm the author of dgraph.io. But don't let the genetic fallacy
blind you. My points above stem from the reasons that propelled me to jump
into the graph DB world.

~~~
srd31
The film industry example makes sense. Partially duplicative tables make
development confusing.

Do you still feel there is an advantage of graph over relational when we have
a known schema and known relationships without deep recursive relationships?
For example, in an inventory tracking system we have items, customers,
deliveries, etc. I like the idea of being able to throw some metadata onto any
of those tables quickly during prototyping, but my gut feeling is that long
term we run into the need to be more structured and explicit, like we do with
a relational DB. It reminds me somewhat of the tradeoffs with NoSQL DBs during
development.

~~~
mrjn
Not only do I think there's a lot of benefit in representing data for which
the schema is already fixed, we've actually gone to the extent of showing this
by building a whole replica website for Stack Overflow.

[https://github.com/dgraph-io/graphoverflow](https://github.com/dgraph-io/graphoverflow) (unmaintained, so please don't complain if it doesn't work :-)).

If you build systems like inventory tracking, question answering, etc., the
hard logic of relevant data retrieval can lie either in your application or
within your DB. The former is the case when you use relational DBs, the latter
when you use graph DBs.

With a graph DB, you can put the data together quickly, but then have the DB
do the heavy lifting of "given a customer, find me all the items and the
locations of delivery" (just a random query I spent 2 seconds on, not
representative of a real workload); or "given a question, find me all the
answers, sorted by a score; the top 5 comments on these answers sorted by
date, with a count of total comments, count of likes, count of dislikes, etc."
(a real workload for Q&A sites). Then application iteration becomes largely a
factor of query iteration, not backend logic iteration.

^ And that's solid! That kind of stuff is what makes developers love JS over
C++ (random comparison).

------
solresident
Currently using Gremlin on AWS Neptune. The learning curve was steep.

If the situation calls for it, sure! The current use case is sort of up in the
air. The decision was made to use a graph database to store the mutation of
records over time, but then the higher-ups wanted to limit what's put in it,
so... I'm not sure if the computation costs are worth what it's actually
capable of. From what I'm gathering, if one is looking to store complex data
that is highly connected through many kinds of relationships (as in, greater
than 5 types of edges), then it might be worth looking into, but I can't
imagine how a dynamic/traditional table database wouldn't have been faster for
what we need it for: querying large lists of data with 3-4 edge traversals.

I only have a year+ of experience with databases, so grain of salt. In terms
of personal preference, working with a graph database has been quite fun.

------
pimmen
We use Titan DB and are updating it to Janus since Titan is dead. We used
Neo4j for a small hack at an event at the office, mapping bus routes and the
homes along them to find out quickly how accessible they were, and it worked
well ... for a hack.

I really like Gremlin, and I like how you can easily extend the relations and
do new computations you never thought of, but it's not the savior it's been
hailed as, in my opinion. For a lot of problems SQL will do you well, and
migrating can be a bitch with SQL, but if it's a domain where the basic
functionality is solved (such as a web shop) I wouldn't bother with a graph
database until I find a good use case for it. You can always migrate your SQL
tables to a graph DB later on if you think it's worth it.

------
lolive
We have a production system based on Neo4j (and Elasticsearch). We have had
some hard times with Neo4j (cluster issues, deadlocks) but their support
helped us figure out those issues.

Honestly, a graph database made discussions with the domain experts MUCH
easier. And the schemalessness made evolution much easier. Our technical team
embraced the concept really quickly. And the domain experts now have a clean
mental model of data that was otherwise split between very unfriendly
technologies (XML databases, files, CSV).

We wonder whether Elasticsearch could be removed from our architecture
(because managing 2 databases is a mess). But we do not know yet if Neo4j can
handle both the load and the variety of search use-cases.

------
voodootrucker
I've used TitanDB multiple times for non-production projects, and would
absolutely use it again. It runs in-process on the JVM, and models true graph
problems (dependencies in large code bases, social networks) well.

------
WalterGR
What's your use case?

As for me, for decades I've wanted to be able to have everything stored on my
computer represented as a graph. (Times have changed, so there's obviously a
strong network-connected aspect now.)

------
jerven
Yes, using RDF+SPARQL I would use it any day of the week. The power of RDF
comes when having to deal with other people's data, or providing your data to
other people. This is not a use case everyone has, but if you do, nothing
beats RDF+SPARQL in a financial sense.

The variety of DBs available if you use RDF is great as well. Different DBs
have different strengths, but you keep the same data model and query
language.

------
jjirsa
Helped build one on top of cassandra at past employer - they’re still
apparently happy with it. It’s something like 5-6 petabytes and handles
millions of writes per second, hundreds of thousands (or maybe millions) of
reads/traversals per second. Powers a successful and growing APT-hunting SaaS
platform, and I’m still pretty proud of it, even though I don’t work there
anymore.

------
bullen
I made this: [http://root.rupy.se](http://root.rupy.se)

It's also based on a small HTTP server:
[http://github.com/tinspin/rupy](http://github.com/tinspin/rupy)

I will use it for the rest of my life in every project that needs relations.

------
Jeff_Brown
I am finding Datalog* more general and easier. It lets you write rules
declaratively rather than procedurally. I don't know how its performance
compares to a traditional graph database, but it sure saves a lot of
programming time.

(I'm using pyDatalog, which is open source and works with a variety of
database backends.)

------
ktk
I'm an engineer that used to do RDBs for a long time. One day a customer of a
friend came with an issue that was, in my opinion, impossible to solve with
relational DBs: he described data that is in flux all the time, and there was
no way we could come up with a schema that would fit his problem for more than
one month after we finished it. Then I remembered that another friend once
mentioned this graph model called RDF and its query language SPARQL, and I
started digging into it. It's all W3C standards, so it's very easy to read
into, and there are competing implementations.

It was a wild ride. At the time I started there was little to no tooling,
only a few SPARQL implementations, and SPARQL 1.1 was not released yet. It was
a PITA to use but it still stuck with me: I finally had an agile data model
that allowed me and our customers to grow with the problem. I was quite
sceptical about whether it would ever scale but I still didn't stop using it.

Initially one can be overwhelmed by RDF: it is a very simple data model but at
the same time it's a technology stack that allows you to do a lot of crazy
stuff. You can describe the semantics of the data in vocabularies and
ontologies, which you should share and re-use; you can traverse the graph with
its query language SPARQL; and you have additional layers like reasoning that
can figure out hidden gems in your data and make life easier when you consume
or validate it. And most recently people have started integrating machine
learning toolkits into the stack so you can directly train models based on
your RDF knowledge graph.

If you want to solve a small problem RDF might not be the most logical choice
at first. But then you start thinking about it again and you figure out that
this is probably not the end of it. Sure, maybe you would be faster by using
the latest and greatest key/value DB and hack some stuff in fancy web
frameworks. But then again, there is a fair chance the customer will want you
to add stuff in the future, and you can be quite certain that at some point it
will blow up because the technology cannot handle it anymore.

That will not happen with RDF. You will have to invest more time at first: you
will talk about things like the semantics of your customer's data and you will
spend quite some time figuring out how to create identifiers (URIs in RDF)
that are still valid years from now. You will have a look at existing
vocabularies and only refine things that are really necessary for the
particular use case. You will think about integrating data from relational
systems, Excel files or JSON APIs by mapping them to RDF, which again is all
defined in W3C standards. You will mock up some data in a text editor, written
in your favourite serialization of RDF. Yes, there are many serializations
available, and you should most definitely throw away any book/text that starts
with RDF/XML; use Turtle or JSON-LD instead, whatever fits you best.

After that you start automating everything, you write some glue-code that
interprets the DSL you just built on top of RDF and appropriate vocabularies
and you start to adjust everything to your customer's needs. Once you go live
it will look and feel like any other solution you built before but unlike
those, you can extend it easily and increase its complexity once you need it.

And at that point you realize that this is all worth it and you will most
likely not touch any other technology stack anymore. At least that's what I
did.

I could go on for a long time; in fact I teach this stack in companies and
government organizations over several days and can still only scratch the
surface of what you can do with it. It does scale, I'm convinced of that by
now, and the tooling is getting better and better.

If you are interested, start by having a look at the Creative Commons
course/slides we started building. There is still lots of content that should
be added but I had to start somewhere:
[http://linked-data-training.zazuko.com/](http://linked-data-training.zazuko.com/)

Also have a look at Wikipedia for a list of SPARQL implementations:
[https://en.wikipedia.org/wiki/Comparison_of_triplestores](https://en.wikipedia.org/wiki/Comparison_of_triplestores)

Would I use other graph databases? Definitely not. The great thing about RDF
is that it's open; you can cross-reference data across silos/domains and
profit from work others did. If I created another silo in a proprietary graph
model, why would I bother?

Let me finish with a quote from Dan Brickley (Google's schema.org) and Libby
Miller (BBC) in a recent book about RDF validation:

> People think RDF is a pain because it is complicated. The truth is even
> worse. RDF is painfully simplistic, but it allows you to work with real-
> world data and problems that are horribly complicated. While you can avoid
> RDF, it is harder to avoid complicated data and complicated computer
> problems.

Source:
[http://book.validatingrdf.com/bookHtml005.html](http://book.validatingrdf.com/bookHtml005.html)

I could not have come up with a better conclusion.

------
rambossa
Are GraphDBs ever the right move in terms of performance?

~~~
threepipeproblm
For things that can reasonably be done with RDBMS, probably not... see "Do we
need specialized graph databases? Benchmarking real-time social networking
applications" (PDF link
[https://event.cwi.nl/grades/2017/12-Apaci.pdf](https://event.cwi.nl/grades/2017/12-Apaci.pdf))

But it seems likely that for queries that differentiate graph databases, such
as finding long, variable-length paths, there are cases where they can excel.

~~~
SOLAR_FIELDS
I found the same. If you're in a very specific use case, they excel.

I work with large amounts of geographic data. We use Cassandra and an RDBMS as
the traditional storage, but whenever we want to do network analysis it goes
into a graph DB just to take advantage of the tooling.

And one of our use cases is exactly what you mention. If you are interested in
the properties of the edges of long highways in a road network, which can
stretch across hundreds of edges, an RDBMS ain't gonna cut it.

------
marknadal
I wrote a driver for MongoDB in 2010, but then moved on to Neo4j in late 2013.

I liked Neo4j quite a bit; it could handle all the sensor/IoT data we could
throw at it. Back then it had (and I'm sure still does) a beautiful
interactive data visualization dashboard, great Cypher tutorials, and more.

Neo4j is a good database. I went to write a database driver for it, and found
it extraordinarily difficult. I knew it would take at least a month of work to
build.

At the same time, really cool tools like Firebase were becoming popular, and
multi-master database architectures like Cassandra and Riak were showcasing
what high availability could do.

So rather than implementing the Neo4j driver, which I knew was bound to
Neo4j's master-slave architecture, I decided to either switch to Firebase or
build my own mashup of all the tools I wanted:

\- Firebase (realtime)

\- Neo4j (graphs)

\- Cassandra (multi-master / P2P)

\- CouchDB (offline-first)

I spent a few weeks building a prototype and submitted it to Hacker News in
early 2014. It was a huge success.

Since then, we've gotten 7.5K+ stars
([https://github.com/amark/gun](https://github.com/amark/gun)), raised venture
capital money, introduced decentralized cryptographically secure user
blockchains, and a ton more.

Graph databases, to me, are so compelling that I have not only "used them
again" but have spent the last 3.5+ years of my life building, improving, and
making them more awesome.

I certainly hope others try them, even if it isn't GUN. They're worth a shot,
but aren't a silver bullet, so use them where it makes sense.

~~~
zaphirplane
What's gun? Do you mean GNU open source?

~~~
grzm
Likely they're referring to
[https://github.com/amark/gun](https://github.com/amark/gun), which they
referred to earlier. amark is Mark Nadal.

~~~
marknadal
Thanks, yes (you and xumingmingv) are correct.

Hey, I noticed your nice resources in your profile - particularly Haidt. For
my wife's PhD, she worked with Baumeister, a colleague of Haidt. Would love to
hear more about your interest in civil discourse and other such things! Shoot
me an email?

~~~
grzm
Done!

