
Graph query languages: Cypher vs. Gremlin vs. nGQL - jamie-vesoft
https://nebula-graph.io/en/posts/graph-query-language-comparison-cypher-gremlin-ngql/
======
motohagiography
I've used Cypher most days over the last few years, and I like it because it's
a complete level of abstraction over the database.

In Gremlin and this new nGQL, the whole notion of "INSERT" implies that the
graph is just a representation on top of a relational DB, and that power users
should first understand RDBMS concepts articulated in SQL queries, with this
novel and cute graph thingy added afterwards for managers. It's like asking
Excel users to understand and care about pointer arithmetic.

Gremlin leverages some people's mental sunk costs in SQL, whereas Cypher is
very verbose but lets me reason purely about my graph model without hacking
over the implementation. The others aren't bad, but the people who will be
using graphs won't be DBAs.

In this sense, Cypher/Neo4j isn't a competitor to NoSQL and RDBMS products,
it's a competitor to spreadsheets, where the majority of people actually get
work done themselves, instead of specifying it and having others engineer it.

~~~
EGreg
I was really attracted to graph databases mainly for the ability to do joins
in effectively constant time rather than O(log N) time.

But then I realized that sharding and localizing data can accomplish roughly
the same thing.

Also the graph database doesn’t have to duplicate data so much for joins,
saving on memory.

If you are going to have a huge dataset, build your data in an RDBMS first and
then make a cache in a graph database.

I say this only because graph databases are not mainstream yet.
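
A minimal sketch of the join-cost point above, under stated assumptions: a
sorted edge list searched with binary search stands in for an indexed
relational join (O(log N) per lookup), and a per-node adjacency list stands in
for a graph store that keeps direct references to neighbors (average cost
independent of the total edge count). The names `edges`,
`neighbors_via_index`, `neighbors_via_adjacency`, and `two_hop` are
illustrative only and are not any database's API.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical edge list: (source, target) pairs, kept sorted like an index on source.
edges = sorted([("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")])
sources = [src for src, _ in edges]


def neighbors_via_index(node):
    """Relational-style lookup: binary search the sorted index, O(log N) in edge count."""
    lo = bisect_left(sources, node)
    hi = bisect_right(sources, node)
    return [dst for _, dst in edges[lo:hi]]


# Graph-style storage: each node holds direct references to its neighbors.
adjacency = defaultdict(list)
for src, dst in edges:
    adjacency[src].append(dst)


def neighbors_via_adjacency(node):
    """Graph-style lookup: one hash access, independent of total edge count on average."""
    return adjacency.get(node, [])


def two_hop(lookup, start):
    """A two-step 'join' (friends of friends) using whichever lookup is supplied."""
    return sorted({far for near in lookup(start) for far in lookup(near)})
```

Both lookups return the same neighbors; only the per-hop cost model differs,
which is the trade-off the comment describes.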

~~~
karatestomp
I came to exactly the same conclusion after having Neo4j pushed on a project
by managers who'd been sold by their "it's great for everything!" marketing.
At least as of ~2 years ago, no, it wasn't. Fine for a narrow set of query
types for data of a very specific kind of shape (dense graph) that you don't
care about much and can re-generate if it gets screwed up. Unsuitable as a
"database of record" (poor integrity enforcement, transactions very limited)
and quite bad (slow, awkward) at a lot of things one might want to do with it,
even kinda "graphy" things, that didn't happen to fall in its strengths.
Memory hog, too. Seems like most graph databases, though possibly better at
some of those things, will tend to be similar, since they _have to_ make
trade-offs to achieve notably good performance at whatever they're
benchmarking to put on their marketing pages.

Cypher was really nice, though.

~~~
frant-hartm
What version did you use? What you are describing sounds like Neo4j from
around 6 years ago, not today.

~~~
karatestomp
About two years ago. Whatever was current then. We had a paid license too
because this client was all-in on N4J. They had their own internal champion
who'd set himself up as the "Neo4j expert" and got them to send him to
conferences and hang out on phone calls and try to fix the fires he was
partially responsible for but didn't get blamed for. All their projects had
serious issues resulting from their insistence on a particular stack, which
had basically nothing really capable of protecting data consistency at any
point. We had to fight for TypeScript (vs. their preferred vanilla JavaScript)
to gain a tiny semblance of sanity, productivity, and stability in that
environment.

Transactions in N4J couldn't handle modifying one entity [edit: the schema of
one entity type, I mean] while updating another, which was pretty limiting
(say, you want to do safe Rails-migrations-style version bumps in the DB in
the same transaction as your modifications, so they can't get out of sync).
Constraint capabilities and data types were very limited. Perf if you stepped
off the Golden Path, which was easy to do by accident with something that
looked boring and normal, was mediocre at best for our use case (smallish
sparse sub-graph fetching, mostly).

[EDIT] meanwhile, all the official material from N4J was doing its MongoDBest
to sell itself as 100% suitable for production for 100% of use cases, because
of course it was. Look anywhere else and you got a very different, more
accurate story.

------
pgt
No mention of Datalog?
[http://www.learndatalogtoday.org/](http://www.learndatalogtoday.org/)

~~~
carapace
FWIW there's a Prolog-based graph DB called Terminus:
[https://medium.com/terminusdb](https://medium.com/terminusdb)
[https://github.com/terminusdb](https://github.com/terminusdb)

I don't know much about it but they are associated with Seshat Global History
Databank [http://seshatdatabank.info/](http://seshatdatabank.info/) which, if
I understand correctly, is something like a serious attempt at "psychohistory"
(like in Asimov's Foundation.) [https://medium.com/terminusdb/stuffing-the-
whole-human-histo...](https://medium.com/terminusdb/stuffing-the-whole-human-
history-in-one-knowledge-graph-e52984f562c7)

~~~
pgt
Thanks for TerminusDB. I'm currently implementing a toy datalog-based graph DB
in Clojure backed by RocksDB.

(I wish Terminus didn't publish articles on Medium, where you need an account
to read)

------
mark_l_watson
Sorry, but this article fails big time: no mention at all of SPARQL.

For application developers, having access to general public Knowledge Graphs
like DBpedia and Wikidata can be a very good resource.

While I am also a big fan of more general graph databases like Neo4J, not even
mentioning SPARQL is such a HUGE OMISSION that I have to suspect some
commercially motivated bias in this article.

The decision of which graph data platform to use is not always black and
white. Use SPARQL with an RDF/OWL data store or a more general graph data
store like Neo4j as appropriate. Learn both technologies.

~~~
tasogare
Is there any SPARQL implementation that returns results in under 10 seconds on
a big dataset? I have never found a public SPARQL endpoint that gives remotely
acceptable response times.

~~~
bjoernbu
While there may be several things missing for many production use cases
(especially inserts/updates), I think QLever
([https://github.com/ad-freiburg/QLever](https://github.com/ad-freiburg/QLever))
fits that description very well. There's also a public endpoint linked there.

------
souenzzo
There is also EQL. A query language written in EDN

[https://edn-query-language.org/](https://edn-query-language.org/)

Since it uses vectors/maps to describe the query, you can
compose/generate/transform queries with simple data operations (concat,
filter, assoc...)
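
A rough sketch of the "query as data" idea above, with Python lists and dicts
standing in for EDN vectors and maps. That substitution is an assumption made
for illustration: real EQL uses EDN keywords such as `:person/name`, and the
helper names `compose` and `without` are hypothetical, not part of any EQL
library.

```python
# Python lists and dicts stand in for EDN vectors and maps (an assumption for
# illustration; real EQL uses EDN keywords such as :person/name).
base_query = ["person/id", "person/name"]

# A join is modeled as a single-entry map: key -> sub-query.
address_join = {"person/address": ["address/city", "address/zip"]}


def compose(*parts):
    """Concatenate query fragments into one query (plain list concat)."""
    merged = []
    for part in parts:
        merged.extend(part)
    return merged


def without(query, attribute):
    """Drop one plain attribute from a query (plain filter)."""
    return [item for item in query if item != attribute]


full_query = compose(base_query, [address_join])
public_query = without(full_query, "person/id")
```

Because the query is ordinary data, no string templating or parser is needed
to build variants such as `public_query`.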

------
bionhoward
We evaluated umpteen graph dbs this past year and chose vanilla Postgres
instead because Neo4j/RedisGraph have insane licenses.

It’s useful that you’re comparing these languages. I would suggest making
yours more like Cypher: specifically, the arrows <-[]-> are much less verbose
than BIDIRECT/REVERSELY, and MATCH is a lot less verbose and more powerful
than the equivalents in other query languages. It sucks to want to use Neo4j
and then not be able to due to license and business issues.

If there were a quality graph db with a permissive license and Cypher,
serverless hosting, search and JSON, we’d use it... Neo4j didn’t work out
because they require an NDA to get a price quote (!) and the Redis Source
Available License basically reads, “you can’t use this for startups”;
RedisLabs.com quotes a “low” price of $500 monthly to get modules with basic
stuff like JSON, Search, and Graphs (“cloud pro”) - but then the pricing page
triples that number. We pointed this out to Redis Labs through at least 3
different channels (email, git, Twitter) but the pricing error still exists on
their cloud page. If Redis Labs leaves an $800 / mo typo sitting on their page
for months, how do you trust them with sensitive customer data? We went with
Amazon Aurora PostgreSQL instead. Love Row Level Security (but wish you could
specify columns inside your row policies).

You might also include ArangoDB AQL

~~~
henryfjordan
Neo4j is licensed as GPLv3 unless you want the enterprise features
(replication). I've run the non-enterprise version in production and it was
working fine for a limited workload. Replication would have been nice at some
scales, but it wasn't the reads that were the issue; it was the writes, which
replication wouldn't have helped with anyway.

RedisGraph is licensed under their weird license but as far as I can tell you
just can't expose the RedisGraph API directly to your customers. Building an
API on top of it that adds some abstractions should be fine (think a social
network powered by the extension). I could be wrong about that though.

Are you actually doing graph work in Postgres? I learned about recursive CTE
queries once upon a time and that was what prompted adoption of Neo4j.

(I'm not a lawyer and don't take legal advice from the internet)
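
For readers who haven't seen one, a recursive CTE of the kind mentioned above
might look like the sketch below. In-memory SQLite stands in for Postgres so
the example is runnable on its own; the `WITH RECURSIVE` form is standard SQL
that Postgres also accepts, though the `?` placeholder style is specific to
the Python `sqlite3` driver. The `follows` table and `reachable_from` helper
are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here (an assumption for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "alice")],
)

# Walk outward from a start node. UNION (not UNION ALL) deduplicates rows,
# which also stops the recursion on cycles such as dave -> alice.
REACHABLE = """
WITH RECURSIVE reachable(node) AS (
    SELECT dst FROM follows WHERE src = ?
    UNION
    SELECT f.dst FROM follows AS f JOIN reachable AS r ON f.src = r.node
)
SELECT node FROM reachable ORDER BY node
"""


def reachable_from(start):
    """Return every node reachable from `start` by following edges."""
    return [row[0] for row in conn.execute(REACHABLE, (start,))]
```

Calling `reachable_from("alice")` walks the whole cycle, so the start node
itself appears in the result once the traversal loops back to it.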

------
jjgreen
I rather like Cypher, easy to get into with the (node)-[edge]->(node)
construction, difficult in the middle (until you realise that WITH is very
different to SQL's), then a delight. Gremlin, so they let Java's horrible
camelCase leak into their syntax? Oh my ...

~~~
spmallette
You write Gremlin in the programming language of your choice, and with that
come the idioms of that language. For Java, that means camelCased syntax, as
that is what Java developers expect. So a function like hasLabel("person")
looks right to them. In C#, that same function is HasLabel("person") and in
Clojure it is (has-label :person) and so on. Gremlin isn't meant to be an
embedded string within your programming language. It is actual code within
your programming language.
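
A toy illustration of that point, assuming nothing about the real TinkerPop
bindings: the `Traversal` class below is a hypothetical stand-in that only
records steps, and `friends_of` is a made-up helper. It is meant to show how a
traversal written as host-language code can follow that language's naming
idioms and be extended with ordinary constructs such as loops and functions.

```python
class Traversal:
    """Hypothetical toy traversal builder; not the TinkerPop API."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def _add(self, name, *args):
        # Each step returns a new traversal, so partial traversals can be reused.
        return Traversal(self.steps + ((name, args),))

    def V(self):
        return self._add("V")

    def has_label(self, label):
        # snake_case because this sketch is Python; a Java binding would say hasLabel.
        return self._add("hasLabel", label)

    def out(self, edge_label):
        return self._add("out", edge_label)

    def values(self, key):
        return self._add("values", key)


def friends_of(traversal, hops):
    """Ordinary host-language code (a loop) extends the traversal."""
    for _ in range(hops):
        traversal = traversal.out("knows")
    return traversal


people = Traversal().V().has_label("person")
query = friends_of(people, 2).values("name")
```

Here `people` stays reusable after `query` is built, which is awkward to get
from a query embedded as a string.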

~~~
jjgreen
Well I stand corrected. I was judging from the examples in the linked article,
so will have another look ...

------
dreamcompiler
A new "graph query language" that doesn't even mention Prolog is like
describing "a new way to create illness immunity by injecting the body with a
weakened version of the virus that causes the illness" or "a new computer
architecture wherein both instructions and data reside in random-access
memory."

------
quantified
I may be missing something, but do any of them let you return information that
is not a node in the underlying graph? I had a project once where a user
request would result in a graph that was derived: result nodes and edges were
derived based on the database’s nodes and edges. Rolled my own system because
nothing supported that.

Related, do any of them let you do something simple like return a count of
vertices obtained from a traversal, or do you need to walk a result and count
them yourself?

------
gozzoo
Is anybody still using SPARQL?

~~~
fils
Yes. I work on several projects that leverage SPARQL. The article is also
remiss in not mentioning the work the W3C is doing along with Neo4j to do its
own alignment
([https://www.w3.org/Data/events/data-ws-2019/](https://www.w3.org/Data/events/data-ws-2019/)).
Indeed, this article seems very self-serving in its omissions. There is a
follow-up meeting planned soon for that too.

Also, much of the work in validation (SHACL, ShEx) of graphs leverages SPARQL,
so it's not going anywhere soon. I would like to see it evolve to allow more
vertex-based searches without the need for extensions, though.

~~~
tasogare
From your link "W3C's RDF uses URIs (Web addresses) for nodes and link labels
in directed graphs. This has the advantage of enabling them to be dereferenced
to obtain further information, making for a Web of linked data. In particular,
nodes can be dereferenced to graphs on remote databases."

I think spreading those kinds of lies[1] is part of why there is a divide
between the W3C (at least the RDF community) and the rest of the world. The
W3C should be more transparent about RDF capabilities and realistic about the
real power of the Semantic Web.

[1] This is false because RDF uses IRIs, not URLs, and not all IRIs are
dereferenceable. Moreover, even URLs used as resource ids are not constrained
to be dereferenceable per the spec (and when they are, you'll get a lot of
404s in practice). Also, the same effect can be obtained just as easily with
properties (key-value pairs) attached to the nodes of a graph database.

~~~
AtlasBarfed
Yeah, the SPARQL people need to realize that the semantic web was a massive
dud in the programming and database market, and a lot of that was overreach,
overpromise, and a lack of focus on "real-world" problems.

Thus, if someone is looking to unify graph QLs that are in actual use in
"business" problems, SPARQL and the overall RDF aren't going to get attention.
You can start with the fact that RDF basically assumes you want to globally
address all your data with URIs, which will result in ridiculously verbose
overhead in naming/addressing. Never mind the fact that such things basically
promise some sort of long-term durability that the actual web has shown
doesn't exist. After all, today's URI www.tla.com/link/to/some/data can mean
the World Wrestling Federation one day and the World Wildlife Fund the next.

In particular, Gremlin was adopted by DSE / Titan / successor to Titan which
ran atop Cassandra for near-limitless scalability.

RDF and the Semantic web, while being intended for the massive WWW, seemed to
not have any care for demonstrating techniques, queries, and architectures at
scale.

Likewise, are Datalog and Prolog used extensively?

~~~
p_l
Both Prolog and SPARQL/RDF/Semantic Web are used at scale and pretty
extensively. Unfortunately often behind closed doors, but there are very
performant systems involved.

URIs and RDF in general don't need to use public HTTP links or anything like
that. Meanwhile, the layered systems like OWL and RDFS provide some impressive
features for implementing complex systems, especially when you actually want
to use a semantic graph instead of the loosely schema'd bag of nodes and edges
common in non-RDF graph databases.

~~~
AtlasBarfed
So SPARQL/RDF should stick to its limited application space and let more
generally applicable graph technologies go their own way.

~~~
p_l
Except it's actually a superset of those simplistic models.

It's just not getting the "kool" looks and gushing reviews, mostly written by
people who totally missed all the previous research. Which is a common problem
all around in computing.

------
johnnycat650
Have any of you checked out TigerGraph yet? I have heard that if you have run
into scale issues with any other graph DB, you should check them out. At first
glance, their benchmarks blow Neo4j out of the water... GSQL seems intuitive.
Does anybody have any experience with it?

------
hans_castorp
They forgot to mention that Cypher is also supported by AgensGraph

~~~
jamie-vesoft
Hey, thanks for pointing it out. We'll add that soon.

------
zachrip
Unrelated to the content, but can you please not put white on yellow like
that? It's basically impossible for me to read with cataracts.

~~~
jamie-vesoft
You mean the pictures, right? Sorry for the bad experience. We will definitely
improve our color palette for future posts.

~~~
zachrip
Thank you :)

------
ChicagoDave
Huge fan of cypher. Used it for a previous start up, including the data store
for an OAuth2 implementation.

~~~
jamie-vesoft
Good to know. Cypher is quite popular indeed.

