
Getting Started with Graph Databases - arunmoezhi
https://academy.datastax.com/demos/getting-started-graph-databases
======
jmiserez
FYI: There are two "Graph Databases 101" posts on the front page now. This one
and the older one here:

[https://news.ycombinator.com/item?id=11257280](https://news.ycombinator.com/item?id=11257280)
(4 hours ago, 15 comments)

~~~
fibo
It seems he did it to gain votes.

~~~
ehartsuyker
Except they're two different articles posted by two different people. This
seems like an amusing coincidence.

------
dcw303
As I had zero experience with graph databases, this was generally a good
intro, but the article could do with some polishing to save newbies like me
from the underlying suspicion that they're missing something obvious. I tried
just reading the tutorial without watching the video, and suffered some
cognitive dissonance.

At the end of the article, there are two diagrams that show the behaviour of
jcvd.out() and jcvd.outE(). The little gremlins are pointing at two vertices
and two edges respectively, but from the 15 lines of code earlier, they're the
wrong connections, right? jcvd only has edges to kickboxer and bloodsport, but
the diagrams show connections to kickboxer and timecop.

So I looked at the code again, and realized the timecop vertex was never
created, which seems kinda odd if you're going to use it in the diagram.

I eventually watched the video and saw animations where the little gremlins go
to all three vertices/edges, so it's probably just a badly timed screencap for
the article. Not that that explains why timecop is not in the code example,
but _whatever_.
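To make the difference between the two steps concrete, here is a toy in-memory model in Python. It is not the TinkerPop/Gremlin API; the class names and the `acted_in` edge label are hypothetical. It rebuilds only what the article's code creates (jcvd with edges to kickboxer and bloodsport), so `out()` returns those two vertices, `outE()` returns the two edges themselves, and there is no timecop to find.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct references to outgoing Edge objects

    def outE(self):
        """Outgoing edges, analogous to Gremlin's outE() step."""
        return list(self.out_edges)

    def out(self):
        """Vertices at the far end of outgoing edges, analogous to out()."""
        return [edge.target for edge in self.out_edges]


class Edge:
    def __init__(self, label, source, target):
        self.label = label  # hypothetical label, e.g. "acted_in"
        self.source = source
        self.target = target


def add_edge(source, label, target):
    edge = Edge(label, source, target)
    source.out_edges.append(edge)
    return edge


# Only the vertices and edges the article's code actually creates.
jcvd = Vertex("jcvd")
kickboxer = Vertex("kickboxer")
bloodsport = Vertex("bloodsport")
add_edge(jcvd, "acted_in", kickboxer)
add_edge(jcvd, "acted_in", bloodsport)

print([v.name for v in jcvd.out()])                      # ['kickboxer', 'bloodsport']
print([(e.label, e.target.name) for e in jcvd.outE()])   # [('acted_in', 'kickboxer'), ('acted_in', 'bloodsport')]
```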

~~~
rustyrazorblade
You're right, there are some inconsistencies in the code vs the slides. Must
have missed that. I'll fix that going forward.

------
fibo
Ahah you put the same title as
[https://news.ycombinator.com/item?id=11257280](https://news.ycombinator.com/item?id=11257280)

------
mitsoz
I was very interested in the subject, but hated this video.

Very superficial, started off with a complicated relational schema to
criticize relational databases, but never ended up explaining how a graph
database would simplify the problem. I thought that the graph database
concepts + language were way more complex than SQL schema + language.

Very fast talking and slide changes; is this supposed to sound or look
smart? On top of that, 50% of the time the video was a close up to the
presenter's face moving left and right in an awkward fashion.

~~~
owen11
Good feedback. I might need similar feedback for my upcoming talk. I am about
to give a talk about Cayley (open source graph db written in Go) and I am
working on my slides [http://oren.github.io/adventure-graphs](http://oren.github.io/adventure-graphs)

Let me know what you think and also join us on IRC (#cayley on freenode) if
you find it interesting.

~~~
throwaway41597
TL;DR: graphs are everywhere in the real world, so using a graph DB will be
simpler and more efficient; examples of graph queries follow.

Thank you, but may I ask who this presentation is for? Because from a quick
glance, it's not very deep in technical details. I mean I'm curious about
graph databases, but comparing them to vanilla SQL schemas isn't very
informative. What I really want to know is what makes them different from
denormalized schemas (which is what I expect most people would use).

------
mullsork
Kudos for providing both a video walkthrough AND an identical text version.
Made me really happy!

------
dperfect
Maybe I still just don't "get it", but this explanation didn't really show me
how a graph database is any better than an RDBMS, apart from a somewhat
simpler interface (which in my opinion is still no better than many ORMs).

For good performance, it sounds like you _still_ need to make good decisions
about what to index, as well as putting hard limits on your data - even if not
strictly enforced by the data model. And if those kinds of things affect
performance, then surely changes to the schema (or whatever you'd call it
here) will result in a need for migration/reoptimization. The trouble is, when
that needs to happen, I personally would rather have tight control over when
and how it happens (with a migration), rather than rely on a black box that
supposedly makes everything simple. I'm assuming graph databases have ways to
control that process, but that kind of proves my point - you don't get greater
performance, simplicity, _and_ flexibility for free, especially when you
compare it to something as mature as the current RDBMSs. So what problem is
it really solving?

Also, the comparison is a little unfair to RDBMSs - this makes it sound like
you'd need separate join tables for every kind of person-media relationship,
when you could certainly just use one join table with a column for various
relationship types. And the complexity of TV shows with seasons and episodes?
I'm pretty sure those distinctions would still need to be modeled in a
thoughtful way with a graph database, but I could be wrong.
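A minimal sketch of the single-join-table design, using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical; the point is that the kind of relationship lives in a `role` column, so a new relationship type is a new row rather than a new table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE media  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- One join table for every person-media relationship; the kind of
-- relationship is data in the role column, not a separate table.
CREATE TABLE person_media (
    person_id INTEGER NOT NULL REFERENCES person(id),
    media_id  INTEGER NOT NULL REFERENCES media(id),
    role      TEXT    NOT NULL,
    PRIMARY KEY (person_id, media_id, role)
);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO media VALUES (?, ?)", [(10, "Movie A"), (11, "Show B")])
conn.executemany(
    "INSERT INTO person_media VALUES (?, ?, ?)",
    [(1, 10, "actor"), (1, 11, "director"), (2, 10, "writer")],
)

# Everyone involved with media 10, whatever their role.
rows = conn.execute("""
SELECT p.name, pm.role
FROM person_media pm JOIN person p ON p.id = pm.person_id
WHERE pm.media_id = ?
ORDER BY p.name
""", (10,)).fetchall()
print(rows)  # [('Alice', 'actor'), ('Bob', 'writer')]
```

Adding, say, a "producer" relationship later is just another `INSERT` into `person_media`; the schema does not change.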

~~~
jonpaine
Index-free-adjacency.

There are myriad pros/cons between graph/relational/NoSQL, but to me, a "real"
graph db will have index-free adjacency, allowing it to do deep traversals
(friend of a friend-of a friend-oaf-oaf....) at a constant cost per hop. It finds its
value in traversal of deeply connected datasets.

Any article or comparison that doesn't at least try to explain index-free
adjacency isn't going to make a compelling case for a graph db, let alone a
native graph db. One reason for that may be that many "graph" databases don't
have index-free adjacency, so they have worse than expected deep traversal
characteristics.
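A toy Python sketch of what index-free adjacency means, assuming each vertex keeps a list of direct references to its neighbour objects rather than IDs that must be resolved through an index. `Node`, `connect`, and `within_hops` are hypothetical names, not any database's API. "Constant time" is best read per hop: each hop costs the same regardless of how large the graph is, but a deep traversal still does work proportional to the vertices and edges it visits.

```python
from collections import deque


class Node:
    """A vertex holding direct references to its neighbours (toy model)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # Node objects, not IDs to be looked up


def connect(a, b):
    a.neighbours.append(b)
    b.neighbours.append(a)


def within_hops(start, max_hops):
    """Names of nodes reachable from start in 1..max_hops hops (BFS order).

    Each hop just reads node.neighbours; there is no global index lookup,
    so the cost per hop does not depend on the total number of nodes.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop limit
        for nxt in node.neighbours:
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt.name)
                frontier.append((nxt, depth + 1))
    return found


# A simple chain A - B - C - D - E.
people = {name: Node(name) for name in "ABCDE"}
for a, b in ["AB", "BC", "CD", "DE"]:
    connect(people[a], people[b])

print(within_hops(people["A"], 2))  # ['B', 'C']
```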

~~~
dperfect
That makes sense. I'm seeing index-free adjacency mentioned in some other
comparisons. Sounds pretty cool.

So if each node has pointers directly to related nodes (without needing an
index lookup), does that also mean that inserts and updates are slower? From
what I understand, if you're bypassing the need for an index lookup at query
time, you have to pay for that at some other point in time - specifically by
looking up the appropriate pointers at the time of insert/update. Is that
accurate?

~~~
jonpaine
That's right. However, there are still indexes, even if they aren't necessary
for traversal. Ideally you'll use an index to find a start node and traverse
on from there (or in the case of your question, update from there).
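The trade-off in this exchange can be sketched in Python as well (all names hypothetical, and a big simplification of what a real storage engine does): a secondary index, here a plain dict, is used once to find the start vertex, and each edge insert pays extra work up front by writing direct references onto both endpoints so that later traversals can skip the lookup.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct references, filled in at write time


class TinyGraph:
    def __init__(self):
        self.by_name = {}  # secondary index: only used to find a start node

    def add_node(self, name):
        node = Node(name)
        self.by_name[name] = node
        return node

    def add_edge(self, a_name, b_name):
        # Write-time cost: resolve both endpoints through the index once,
        # then store references on both sides so reads can skip the lookup.
        a = self.by_name[a_name]
        b = self.by_name[b_name]
        a.neighbours.append(b)
        b.neighbours.append(a)

    def friends_of_friends(self, name):
        start = self.by_name[name]  # one index lookup to find the start node
        result = set()
        for friend in start.neighbours:       # pointer hop
            for fof in friend.neighbours:     # pointer hop
                if fof is not start and fof not in start.neighbours:
                    result.add(fof.name)
        return sorted(result)


g = TinyGraph()
for n in "ABCD":
    g.add_node(n)
for a, b in [("A", "B"), ("B", "C"), ("B", "D"), ("A", "D")]:
    g.add_edge(a, b)

print(g.friends_of_friends("A"))  # ['C']
```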

------
lqdc13
Is Titan going to survive even though DataStax bought out the team? Their
GitHub repo hasn't been very active recently.

My issue with graph dbs is that as requirements change you usually have to add
more granularity to the edges and nodes. Eventually the schema becomes much
more complicated than an RDB.

~~~
rail2rail
FWIW AWS recently added DynamoDB integration support for it.

[https://aws.amazon.com/blogs/aws/new-store-and-process-graph...](https://aws.amazon.com/blogs/aws/new-store-and-process-graph-data-using-the-dynamodb-storage-backend-for-titan/)

------
sschueller
Very cool. What are some of the issues to look out for switching from an old
SQL model?

~~~
woodman
I don't have any experience with this particular product, but I've done work with a
bunch of semantic web software (which is graph based). The most difficult part
of migration is related to ontology, the edges. Feature creep is very easy and
if you don't set hard limits you can easily find yourself graphing metadata
about graph metadata :) You can do this with relational databases as well -
recursive logging tables and the like, but it is easier to catch because of
the exploding table count. Authorization isn't as easy either, so you'll want
to give that some thought before you jump in.

------
marknadal
This article overly inflates the complexity of graphs and databases in order
to sound fancy. I've written a response that is very direct and shows how
simple a graph database can be: [https://github.com/amark/gun/wiki/Graph-Databases-101](https://github.com/amark/gun/wiki/Graph-Databases-101).

~~~
rustyrazorblade
Hi. Presenter here. Honestly the goal wasn't to sound fancy. If you're going
to work in the GraphDB world, you're going to come across this terminology.

If your concern around my intro is the complexity described of the relational
world, well, that's kind of the point. Anyone with at least a few years of
experience in the RDBMS world has probably come across a project that's
spiraled completely out of control with an outrageous number of many-to-many
relationships that are almost impossible to work with. The role of the DBA
just to manage your queries and tables is a reflection of that difficulty.

GUN looks like a cool project. Good intro, & thanks for the feedback.

~~~
marknadal
Sorry for the abrasiveness, I actually liked your overview of table outrage (I
should have been positive and mentioned this). What I didn't like is that the
article starts right away with mentioning Gremlin - which is very popular
within the academic community but difficult for most developers. Honestly,
anything outside of SQL and MongoDB's query spec can be frightening for devs,
because it's genuinely another language you have to learn.
This complexity makes graphs themselves look hard, like something only for
serious people, like machine learning. So I think it is dangerous to
introduce people to complicated new query languages in a 101 article because
people will feel discouraged that if they can't get past Gremlin then they'll
never be able to use graphs at all (despite the fact that they use them all
the time, especially on the frontend, without even realizing it). Otherwise I
thought you did a good job explaining the problem and even talking about the
basics of semantics. Just going down the Gremlin route either scares people
off or appeals to the more academic elite.

~~~
phpnode
Gremlin is a practical query language designed for developers and has very
little (maybe zero) usage in academia. SPARQL is the more established option
which does get some academic (and production) usage, particularly as a lot of
research on graph databases has a cross over with semantic web research.

This isn't the first time I've seen you criticise the "academic elite". You
seem to use it as a crutch, an excuse for sloppy thinking and poor quality
software.

