Hacker News
Ask HN: What was your experience using a graph database?
161 points by tiuPapa 83 days ago | 93 comments
I have an idea that I want to work on during the break, and I think it is something that would suit a graph db (a service that would link users together depending on their choice profile). So what was your experience of working with one? Which graph db did you use?

Edit: What sort of problems are well-suited to graph databases? In other words, what are some scenarios where something like Postgres is not suitable anymore?

Graph databases are the NoSQL of this half decade. Move cautiously. Just because you conceptualize it in your mental model does not mean you need a graph database. Further, recognize most (all?) implementations are not yet as performant or scalable as traditional data storage solutions.

Design your data schema first, then design your queries and finally your data lifecycle pipeline. Run some estimates on the order of magnitude for inserts, query rates, query types and storage sizes - then compare those numbers to the real-world perf of the various graphdb solutions. In general, compared to more typical solutions, you have more expensive inserts, query costs and storage sizes in exchange for more expressive queries. There aren't many applications where those cost tradeoffs make sense.

Source: Twice now (2012 and 2018) I've reviewed available graphdbs for storage of enterprise security data when doing the initial platform technology selection. Both times the team fell back onto more traditional approaches.

I agree completely with this. move cautiously. I personally found the entire space very immature.

neo4j is the most mature solution I found (in the Java space). if you want to use something else go for it, but you may be surprised at the low quality.

op: I strongly recommend implementing most/all of your pipeline using graph & non-graph approaches. choose the graph approach iff you can demonstrate with hard evidence that it makes sense.

+1. We take for granted the maturity of RDB systems, but it makes for a stark comparison to GraphDBs.

Calculating over or walking over graphs sucks because there is usually a better, less brute way for any particular query.

Unless you have a set of use cases that require the ability to query across near enough random and unindexable subsets of a graph (eg Facebook), you’ll probably be better off with a DB and a spot of flattening.

Same here, I've gotten used to moving thousands of records a second even with MySQL's InnoDB on spinning metal. Then tried Neo4j and, I think, one other software—and that was the end of my experience with graph dbs.

If I were interested in them again now, I would try new and fancy solutions first, to see if there are nosql-level performance improvements in the graph db space.

> In general, compared to more typical solutions, you have more expensive ... query costs

Not to detract from your general point, but curious whether you looked at Dgraph in your analysis. It's quite fast and was built for speed.


> Just because you conceptualize it in your mental model does not mean you need a graph database.

Yes! When I was younger I worked on a problem once that needed to compute some very basic graph metrics. My seniors tried to do the work in an early graph database and it was a disaster. It turns out literally just reading in the lines from a file and counting things got the job done in a few seconds.

They refused to use the results until they were coming out of the graph database because "just in case we needed other metrics". We never needed the other metrics.
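For a sense of how little code "reading in the lines from a file and counting things" can take, here is a minimal Python sketch. It assumes a hypothetical edge-list format with one whitespace-separated `source target` pair per line; the format and the name `degree_counts` are illustrative and not taken from the project described above.

```python
from collections import Counter

def degree_counts(lines):
    """Count node degrees and edges in an undirected edge list.

    Each usable line is assumed to hold "source target" (hypothetical format).
    """
    degrees = Counter()
    edges = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, dst = parts
        degrees[src] += 1
        degrees[dst] += 1
        edges += 1
    return degrees, edges

# In practice `lines` could be an open file handle; a list keeps the sketch self-contained.
sample = ["a b", "a c", "b c", "c d", ""]
degrees, edge_count = degree_counts(sample)
```

Because the function only iterates once over its input, it works the same on a file handle as on a list.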

Storage is very cheap these days.

Disclaimer: I’m an 8 year Neo4j user and 6 year employee.

Neo4j is a great database if you learn how to use it and are willing to get your hands dirty every once in a while (write Java). I keep a blog at maxdemarzi.com on the things you can do with it. See the dating site blog series; it may be relevant to what you are doing.

We have thousands of videos, slideshares, blog posts trying to teach graphs. If you take the time to learn you will be successful. If you connect with us on Slack and ask you will be successful.

If you treat it like an rdbms you will fail. See https://m.youtube.com/watch?v=oALqiXDAYhc for a primer and see https://m.youtube.com/watch?v=cup2OyTfrBM for the crazy stuff you can do that most DBs can’t.

Every time that Neo4J is mentioned here, the pricing issues are raised.

No exception today: I used Neo4J at a previous startup, but after using the "free" non-scaled version of it, we hit a hard bottleneck due to a lack of scalability. When we looked to scale Neo4J, we almost had a heart attack when we saw the price. As a 50-person "developing country" startup, we could not afford to pay the very steep prices.

Disclaimer: I know nothing about Neo4J or their pricing structure. To be fair, though, niche products are often very expensive because there aren't many customers to pay for the development. If you have a million paying customers you can charge them $10 a year each and fund a reasonable sized company. If you have 5, then you need to charge them $2 million each ;-) Having said that, I tried to find pricing information on their website and couldn't find it. For me that's a sign of "If you have to ask, you can't afford it" level of pricing...

did they not offer a competitive rate when you contacted them based on your country's exchange rate?

Actively using ArangoDB. It has good performance and features. The only thing lacking is something akin to “views” but you can always denormalize into another collection albeit managed by the application. Graphs are a very natural way of thinking about entities and relationships and in its simplicity my development sped up. I personally like to try and schema out in MySQL Workbench etc. but if you want to get started doing something you can basically just make a mind-map and that’s effectively your schema, very intuitive and quick. Great for proof of concept.

Oh and the ability to query for shortest path and similar graph computations, and the DB does all the heavy lifting, is super nice.
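To illustrate the kind of work a built-in shortest-path query takes off the application, here is a rough Python sketch of an unweighted shortest-path search (breadth-first search). This is not ArangoDB's implementation or API; the graph, the names, and the function `shortest_path` are made up for the example.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent pointers back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None

# Hypothetical directed graph stored as an adjacency list.
graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
}
path = shortest_path(graph, "alice", "erin")
```

Doing this client-side means pulling the relevant edges out of the database first, which is the heavy lifting the comment above is glad to hand to the DB.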

One of the latest releases added views to arangodb in what they call "ArangoSearch"


New graph DBs implemented with the GraphBLAS linear algebra model will be orders of magnitude more performant than previous gen DB models. RedisGraph 1.0 is the first public GraphBLAS database implementation. And things are about to get even faster with the GraphBLAS GPU implementations in the works.

See previous discussions on GraphBLAS https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&pr...

Thanks for this link, very interesting talk from the graphblas author:


I've never seen any advantage to graphdbs over relational models until I saw this talk. Raising graph analysis to the level of linear algebra is brilliant.

> Raising graph analysis to the level of linear algebra is brilliant.

Adjacency matrices are how graph problems were handled by APL programmers in the 1980s, mostly because there was no alternative before nested arrays. At the time, there was not a lot of vectorized hardware, and main memory sizes were too small for many problems, so the adjacency matrix representation was more of a problem than a good solution. It's really the advances in vectorized hardware and main memory sizes that have made this technique practical for large graphs.

Yes, a lot of innovation had to occur to make the GraphBLAS linear algebra model practical, which has been in the works for more than 10 years. It began with Jeremy Kepner and John Gilbert's work formalizing the linear algebra model over semi-rings [1]. And then working with Intel, IBM, Nvidia and the other hardware vendors to define and implement a standard set of hardware primitives. But you could really see the stars align a few years ago when the deep-learning ML wave hit, because it paved the way with a surge in demand for GPU/TPU accelerators in the data center. A lot of things had to happen to make this all come together. It's been an industry-wide effort.

[1] Graph Algorithms in the Language of Linear Algebra, by Jeremy Kepner and John Gilbert https://epubs.siam.org/doi/book/10.1137/1.9780898719918

Here's the Redis Day London Nov '18 launch video [1] of RedisStreams and RedisGraph 1.0 GA [2] -- their benchmarks show it to be up to 600X faster than previous-gen graph DBs [3] (and that's before the coming parallel/distributed GraphBLAS GPU implementations that are in the works)...

[1] RedisGraph 1.0 launch video https://www.youtube.com/watch?v=S5WWBzi0LcM

[2] RedisGraph 1.0 https://github.com/RedisLabsModules/RedisGraph

[3] RedisGraph benchmarks https://redislabs.com/redis-enterprise/redis-modules/redis-e...

The insight there is turning the adjacency list of a graph into a matrix, and suddenly the goodness of linear algebra can be utilized. That's ingenious!
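A small sketch of that idea in plain Python, under the assumption of a tiny made-up graph: the adjacency list becomes a 0/1 matrix, and one breadth-first step becomes a vector-matrix product over the boolean (OR, AND) semiring. This is only an illustration of the concept, not the GraphBLAS API; real implementations use sparse, optimized primitives.

```python
def adjacency_matrix(nodes, adjacency_list):
    """Build a dense 0/1 matrix where matrix[i][j] == 1 means an edge i -> j."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for src, targets in adjacency_list.items():
        for dst in targets:
            matrix[index[src]][index[dst]] = 1
    return matrix

def step(matrix, frontier):
    """One BFS step: frontier (row vector) times matrix, using OR for + and AND for *."""
    n = len(matrix)
    return [int(any(frontier[i] and matrix[i][j] for i in range(n))) for j in range(n)]

# Hypothetical example graph.
nodes = ["a", "b", "c", "d"]
adjacency_list = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
A = adjacency_matrix(nodes, adjacency_list)

frontier = [1, 0, 0, 0]      # start at "a"
one_hop = step(A, frontier)  # nodes one hop away from "a"
two_hops = step(A, one_hop)  # nodes two hops away from "a"
```

Swapping the (OR, AND) pair for other operator pairs, such as (min, +), is what lets the same product express other graph algorithms.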

Thanks for the link.

And it's not just graphs...as referenced in the linked comments above, there are now linear algebra models that encode Datalog and/or the entire typed-lambda calculus as linear algebra matrix operations (and as shown in the Datalog paper [1], the Datalog linear algebra implementation is the fastest Datalog implementation to date).

But MIT and Sandia Labs are taking the linear algebra model to the next level, and are now working on encoding an entire operating system in the language of linear algebra...

Jeremy Kepner (head of the MIT Lincoln Laboratory Supercomputing Center and GraphBLAS lead) and his team just published a paper [2] where they define an entire Unix operating system using the same linear algebra model as they used for D4M/GraphBLAS. The linear-algebra OS model scales linearly way beyond the Linux limits, and since the entire OS kernel representation is defined as generic matrix transformations, it can run on any processor, including CPUs, GPUs, or a cluster of TPUs.

[1] A Linear Algebraic Approach to Datalog Evaluation (2017) [pdf] https://arxiv.org/abs/1608.00139

[2] TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines (2018) [pdf] https://arxiv.org/pdf/1807.05308.pdf

Nice! I think the resource management and accounting parts in OS might be a good fit. Access control definitely is a matrix.

Thanks for algolia link. I found this particular comment very useful: https://news.ycombinator.com/item?id=18353943

IBM also has a GraphBLAS [1] implementation in the works [2], and this summer Jenna Wise [3], a PhD candidate at CMU, spent the summer at IBM working on a formal verification proof of the GraphBLAS code [4].

[1] http://graphblas.org

[2] IBM GraphBLAS https://github.com/IBM/ibmgraphblas

[3] Jenna Wise, CMU http://www.cs.cmu.edu/~jlwise/

[4] Formal verification proof of the GraphBLAS C API https://github.com/jennalwise/graphblas-verif

I've played around with RDF/N3 databases a bit, but mostly then from a document-oriented storage angle. I believe this has much of what you're looking for, ie getting away from table-based databases, and moving closer to optimal data architectures.

Also, if you're curious about this sort of database design, NASA shared some interesting work on XDF: The Extensible Data Format Based on XML Concepts[0][1], which was part of their long-range Constellation Project[2] toolset for building, launching, and operating the Ares spacecraft. They detailed it in this NExIOM slideshow[3], which reading again after quite a while brings back some very good memories. Enjoy!

0: https://nssdc.gsfc.nasa.gov/nssdc_news/june01/xdf.html

1: https://github.com/sccn/xdf/wiki/Specifications

2: https://en.wikipedia.org/wiki/Constellation_program

3: https://step.nasa.gov/pde2009/slides/20090506145822/PDE2009-...

Datomic (which I already shilled in this thread) is basically immutable RDF with an explicit time dimension, plus an intuitive sql-ish relational/graph query library

And very buggy and slow.

I have worked on a few projects where graph databases were used. I have not personally seen a case where I feel they add much value relative to their complexity and tradeoffs.

One of the projects was a business workflow application centered around validating business processes by collecting and reporting on process data (think manufacturing quality control). A graph database was used in an attempt to allow application users more control in defining their workflows and give them more expressive semantic reporting abilities. We tried several graph databases. In reality what happened was that the schema became implicit and performance was truly awful. The choice of a graph db was a strategic decision; we wanted to enable a different user experience. We probably could have done this project in 20% of the time with a standard database and wound up with a better result.

I have also worked on a problem related to storing and retrieving graph data for image processing. The graph db was obscenely slow and inefficient despite the data models being actual graphs.

Both of the projects I worked on involved people who are experts in graph databases. The level of nuance and complexity was astounding. Even simple tasks like trying to visualize the data became monumentally complex.

My takeaway from both of these experiences was that unless you intend to ask questions about the relationships, a graph might not be a very good fit. Even in that case, other databases will likely perform just as well.

(edit: add last sentence)

Holy crap, have we worked on some of the same projects? GraphDB inappropriately applied to a process definition/exploration application. In my case I'm pretty sure the correct solution taking into account all desired functionality and existing support was a desktop (sigh, maybe Electron I guess) app and good ol' SQLite, but nooooo, we did a web app with server-side storage in Neo4j. I tried to sell PostgreSQL as it was 100% for sure a better fit for the kind of queries we'd be running, but that didn't fly. They had a Neo4j "expert" to whom I sometimes had to explain how Neo4j worked. The highest-tier tech manager at the client with whom I interacted was learning about Neo4j from what was mostly a marketing book from the Neo4j folks, turns out.

They burned a shitload of money on those bad decisions, on that and other products they'd previously stuck on Neo4j for no good reason, which were also seeing poor and unpredictable performance and having a rough time with immature supporting tools for the DB. Whole thing's closely related to the "we have big data, it's in the single digit GB range, so big! We need big data tools!" error, I think.

> My takeaway from both of these experiences was that unless you intend to ask questions about the relationships, a graph might not be a very good fit. Even in that case, other databases will likely perform just as well.

Precisely the same conclusion I reached, at least in the case of Neo4j. If the main thing you need to do is answer questions about graphs, it might be an OK DB to use. If the main thing you need to do is extract data from graphs, then you sure as hell don't want it as your primary datastore. Maybe—maybe—some kind of supplement to a SQL DB or whatever depending on your exact needs, but it shouldn't be what you're actually storing most of the data in.

Your experience sounds identical to my own. Particularly the "marketing book from the Neo4j folks" starting you down this journey.

I have used Dgraph on a couple of projects and enjoyed it. It seems a little more natural to me to think about all of the relationships between my data. It also can be queried using a GraphQL(ish) request.

But, it hasn't been around for long and doesn't have any options for hosting, and the JS client library is pretty basic - so you need to do a lot to have something a little more abstracted like Mongoose.

I enjoyed learning something new and will use it again - but unless I need queries that traverse many relationships, I'll probably use Postgres.

My experience with graph DBs: schemas usually become implicit and ad hoc, tooling support is minimal and even less mature, it is difficult to explore and understand your data, performance is often poor, memory usage can be problematic because they usually pull the graph into RAM, and there are few people with real experience (compared to traditional DBs), which can create a negative feedback loop.

I have yet to run into a use case where a graph provided more value than a relational model. I'm sure they exist, but I haven't found them yet.

What sort of problems are well-suited to graph databases? In other words, what are some scenarios where I will run into trouble using Postgres?

edit: I've previously looked into ArangoDB.com, Dgraph.io, JanusGraph.org, and Cayley.io (to run on top of CockroachDB). I understand all of these are scalable distributed systems, and Postgres is not (CitusData.com aside). Do the benefits of these other systems mainly come when you outgrow single-node Postgres (which has JSONB for "document" storage, PostGIS.net, Timescale.com, etc)?

edit 2: where can I find more technical, concrete examples than https://neo4j.com/use-cases?

> What sort of problems are well-suited to graph databases?

If you have a graph, then a graph database, with built-in graph algorithms, will be able to run operations on your graph without pulling all the data out to a client. I'm not an expert in PostgreSQL but I don't think it has any graph algorithms?

My team opted for postgres over a graph db in one of our projects to model a network of devices. Primarily due to the pgRouting extension, which implements many graph algorithms in postgres.


I've implemented graphs and trees in a relational DB quite a few times. For graphs you can just store an adjacency list. I think most of the large DBs now have tree traversal extensions to SQL (I know Oracle does). If you need to manipulate on the server then PL/SQL (or equivalent) can do the job.

The only time I could imagine that you'd need a specialist graph database would be if you had a very large number of nodes and some time / space intensive algorithm to run over them. Even in this case you could just store the data in a relational DB and use a low level interface (e.g. C) to write the specialist algorithm.

Yes, you need a NoSQL graph database only for applications that require eventual consistency - and that's a hard thing to get right in any case; an application can do without that if it does not have to scale to a very large user base.

In any case, putting the graph nodes and links into SQL tables is a much easier option.

PG has recursive common table expressions, which allow you to traverse a tree or graph represented as an adjacency list in a single query. (No doubt there are other important things that graph DBs do.)
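As a rough illustration of that single-query traversal, here is a sketch using Python's built-in `sqlite3` module so it runs without a server. SQLite is a stand-in here; PostgreSQL also accepts the `WITH RECURSIVE` form, though details may differ. The `edges` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency-list table: one row per directed edge.
conn.execute("CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO edges (src, dst) VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "y")],
)

# UNION (rather than UNION ALL) drops rows already produced,
# so the a -> b -> c -> a cycle does not loop forever.
REACHABLE = """
WITH RECURSIVE reachable(node) AS (
    SELECT ?
    UNION
    SELECT e.dst FROM edges AS e JOIN reachable AS r ON e.src = r.node
)
SELECT node FROM reachable ORDER BY node
"""

def reachable_from(conn, start):
    """All nodes reachable from start (including start itself)."""
    return [row[0] for row in conn.execute(REACHABLE, (start,))]

result = reachable_from(conn, "a")
```

On a real table you would want an index on `src` so each recursive step is an index lookup rather than a scan.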

Are the built-in graph algorithms useful these days? I tried neo4j a few years back and didn't find anything similar to e.g. networkx, which is the level of graph/network algorithms I use regularly. Neo4j made life much harder for me at the time, and I had to migrate that project back to postgres.

Your question got me thinking, and led me to find this: https://github.com/bitnine-oss/agensgraph "AgensGraph, a transactional graph database based on PostgreSQL"

> If you have a graph, then a graph database

How does one know if it's time to migrate to a graph database?

Right now I have a large (6.5GB) database that I have to sometimes work with, and the queries are getting more complex as more features are added to the project.

Right now the data is stored in MySQL, but to get anything interesting out of it requires multiple queries and then sifting through the data later. It all seems cumbersome.

It depends on the structure of the DB - 6.5GB is large if it's attributes; if it's documents then not so much. If it's an index of documents then that's a different problem. Access patterns matter too. Like if it's a 6.5 GB database that stores OLTP data, then yes, that's large. If it's a history of document modifications then probably small. If it's historical data for a data cube sort of app, then it's pretty small, I'd say tiny - I've worked on ones that get into the TBs.

It's all attributes; no documents being stored. About 2m records.

So that's a bit over 3k per record. Forgive my curiosity, but what's the application domain? That's a lot of info per record.

Better port that sucker to Postgres sooner rather than later. You know, for future proofing.

How about hierarchical data? E.g., a country's laws / statutes? Thousands of text files organized in a hierarchy. I've resorted to relational denormalizing and hacks to get decent performance. So I'm wondering if a graph database would be a better fit.

E.g., I frequently need to query, "What is the list of ancestors from the object to the top of the tree?"

In a relational system, this needs to be stored in some kind of data structure, which is redundant. But theoretically in a graph database, it'd be a fast O(log n) query if I'm not mistaken.
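For comparison, here is a sketch of the "list of ancestors" question answered from a plain `parent_id` column with a recursive query, using Python's built-in `sqlite3` as a stand-in for a relational database. The `sections` table, its rows, and the function name are hypothetical. Under these assumptions the query does one primary-key lookup per level, so its cost follows the depth of the node rather than the size of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hierarchy: each row points at its parent; the root has parent_id NULL.
conn.execute(
    "CREATE TABLE sections (id INTEGER PRIMARY KEY, parent_id INTEGER, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO sections (id, parent_id, title) VALUES (?, ?, ?)",
    [
        (1, None, "Statutes"),
        (2, 1, "Title 10"),
        (3, 2, "Chapter 4"),
        (4, 3, "Section 4.2"),
        (5, 1, "Title 11"),
    ],
)

# Start at the given row and repeatedly join to the parent until parent_id is NULL.
ANCESTORS = """
WITH RECURSIVE chain(id, parent_id, title, depth) AS (
    SELECT id, parent_id, title, 0 FROM sections WHERE id = ?
    UNION ALL
    SELECT s.id, s.parent_id, s.title, c.depth + 1
    FROM sections AS s JOIN chain AS c ON s.id = c.parent_id
)
SELECT title FROM chain WHERE depth > 0 ORDER BY depth
"""

def ancestors(conn, section_id):
    """Titles of all ancestors, nearest parent first, root last."""
    return [row[0] for row in conn.execute(ANCESTORS, (section_id,))]

result = ancestors(conn, 4)
```

No extra denormalized structure is stored here; whether this is fast enough for thousands of documents is something to measure rather than assume.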

My understanding is that graph databases are better suited to handling many-to-many relationships

I'd put it slightly differently, and more simply:

It's about the relationships (between the nodes).

Good question, I should probably add this to my original question. Personally, I am trying to build a recommendation engine of sorts and I think Graph DB is suited for this but I am no expert.

I gave a talk about this at ClojureNYC in 2017, the second half of the talk is about modeling graphs in various popular databases (SQL, Neo4j, Mongo, Datomic) and the problems you encounter https://s3.amazonaws.com/www.dustingetz.com/Getz+2017+Datomi...

Datomic (an immutable database for doing functional programming in the database) is central to my startup http://www.hyperfiddle.net/ , I don't think Hyperfiddle is possible to build on other databases that exist today. The future lies in immutability, full stop.

Datomic : databases :: git : version control

Couple of quick points:

a. If you are primarily dealing with categorical data (strings as opposed to numbers, e.g. genes, diseases etc.) and require a lot of graph algorithms (eigenvalue, shortest path etc.), graphs are pretty good for storage, retrieval and visualization. The biggest difference in querying is that in SQL you say "what" you want, while in SPARQL / Gremlin you say "how" you want it, i.e. what relationships to take.

b. Graph as a representation format shines, but as a storage mechanism I have not found it to be optimal. Many go for graph as a layer on top of an RDBMS.

c. RDF is better in terms of standardization than a prop. graph database. This is because you can arbitrarily decide what should be a vertex vs. what should be a property. In things like Neo4J it gets fixed once you decide. Virtuoso comes pretty close since it implements RDF on an RDBMS (my limited understanding).

d. It is good for representing knowledge / metadata (at least RDF), but again I would stay away from representing data.

e. Your choice of graph algorithms typically ends up being what comes prepackaged (say Gremlin etc.), or you pull the data into an intermediary and use algorithms there (Networkx / igraph (igraph is awesome)), or you write your own (this is typically not trivial).

f. Many pointed to the schema; I actually think this is the advantage of RDF. My typical workflow is to start with RDF, do my basic stuff on RDF until I have a good understanding of what the queries are and therefore the optimal schema, and then migrate to an RDBMS as needed. Trying to do large scale on RDF on laptop infra is not optimal.

I would try to use some combination of RDBMS with runtime graph like igraph. YMMV.

GrapheneDB (Neo4j) on Heroku here - relatively small scale project so far (1000's of users) but very easy to use, no problems, great support. If your problem space is a graph, using a graph will make your life easy once you get over the Cypher learning curve.

We're on Rails and so use the Neo4j.rb gem which has been around for quite a while and also has a ton of work and support around it. The Ruby DSL for it makes it as easy as you would expect in Rails for most basic relationships and queries, and you can access more advanced features or drop into Cypher as needed.

For our use case, a graph DB was definitely easier than trying to manage relationships and categories in a relational DB but it will definitely depend on your use case. Good luck!

I used the same technologies, but neo4jrb did have a few rough spots. The change from v7 to v8 was a large, breaking change, and I ran into a number of (minor) compatibility issues with other libraries.

As you say, the appropriate use of a graph database is highly situational. I certainly would not advise doing so without a great deal of careful consideration. However, I tend to doubt whether any amount of consideration would be sufficient; it may be that the most reliable way to determine whether a graph database is the best solution is to try it and see. At least, I was never really able to stop wondering whether I might be able to do things just as easily using Postgres; I would defer to those with more experience, however.

Stay away from OrientDB: it is a super-hyped multi-model database, but it's unreliable and hard to maintain. I worked with it for 2 years, incl. paid professional support.

Can you elaborate a bit on the problem(s) you encountered?

RedisGraph! it’s a module added to redis. very simple and performant, they supply a docker image already running it and clients exist in most languages.

I wanted to use it, but it was not compatible with cluster redis on AWS. That seems ridiculous to me

You might want to consider using Managed Redis by Redislabs https://redislabs.com/redis-enterprise/vpc/

I've been using Datomic for 2 years and it's awesome. When I look at other graph DBs, I see that they don't have a powerful query engine.

I've endured Datomic for 4 years and I really wonder what the folks who enjoy it are doing, because for me it is utter misery.

- It's miserably slow (if you wish to contradict this statement, please provide numbers)
- Consumes gobs of memory (export our data to JSON and it's orders of magnitude smaller)
- Full text search will consume all your CPU cores if you give it a short query (seriously, don't touch this feature, it is a basket full of footguns)
- Resource leaks (the Cassandra backend used to leak full databases!)

I've been up to 3:00 a.m. dealing with bugs in Datomic. What use cases does it actually work for?

Did you get in touch with Cognitect? My Datomic has been running for more than 2 years with no crashes. I'm using it on DynamoDB.

The datalog like query language combined with immutability is quite nice. I'm no expert in Datomic yet, but it is pretty nice that the queries are simple Clojure data structures. I can use regular old Clojure code to build queries dynamically (if needed).

"It's awesome" seems like an understatement :D

Understand the difference between graph querying and graph processing. If you have a question like, “give me all the friends of friends of person X that share a college”, then that’s a query that a graphdb would be helpful for. Saying, “find clusters of nodes based on relationships of types x, y, and z” is a processing job. You might need to query the graphdb to get the graph data that’s then loaded into a graph processing engine.

So be clear on exactly what you’re trying to do.
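To make the distinction concrete, here is a small Python sketch over a made-up in-memory friendship graph. The first function is a query-style question (a bounded two-hop lookup from one person); the second is a processing-style job, with connected components used as a simple stand-in for "find clusters". All names and data are hypothetical.

```python
def friends_of_friends_same_college(friends, college, person):
    """Query-style: a bounded two-hop lookup starting from a single node."""
    direct = friends.get(person, set())
    own_college = college[person]
    result = set()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            if candidate != person and candidate not in direct:
                if college.get(candidate) == own_college:
                    result.add(candidate)
    return result

def connected_components(friends):
    """Processing-style: has to visit the whole graph to find its clusters."""
    seen, components = set(), []
    for start in friends:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(friends.get(node, set()) - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical symmetric friendship graph and college attribute.
friends = {
    "ann": {"bob"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"bob"},
    "dan": {"bob"},
    "eve": {"fay"},
    "fay": {"eve"},
}
college = {"ann": "MIT", "bob": "CMU", "cat": "MIT", "dan": "CMU", "eve": "MIT", "fay": "MIT"}

fof = friends_of_friends_same_college(friends, college, "ann")
clusters = connected_components(friends)
```

The first function touches only the neighbourhood of one person, while the second touches every node, which is why the two workloads tend to end up in different tools.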

I've been using Neo4j to store the bitcoin blockchain. Bitcoin transactions have a graph structure, and so by storing the entire blockchain in a graph database you can easily query for connections between different bitcoin addresses.

If you're interested, I've done some explanation of it here: http://learnmeabitcoin.com/neo4j/

My experience with Neo4j has been a good one. The database is currently around 1TB and runs continuously without a problem. It's fast enough to use it as a public blockchain explorer, whilst simultaneously keeping up with importing all the latest transactions and blocks on the network.

It took some time to get the hang of the Cypher query language to get it to do what I want, but the browser it comes with is handy for learning via trial and error. I found the people on the Neo4j slack channel to be incredibly helpful with my questions.

We used Neo4j in a prototype rebuild of our primary application for about a month before switching back to MySQL. In that time it crashed and lost all of our data several times. We found it untenable.

My faith in it was soured at that point. That said, this was probably 5ish years ago now, so I cannot speak to how much it's stabilized since then.

Consider using ArangoDB, a multi-model db: document store and graph db in one. It has a query language (AQL) that is easy to understand. Joins and relations are easy to accomplish. Version 3.4 has many new features, like full text search and GeoJSON.

As someone who is working on a project trying to visualize person -> company relations, I went through GraphDB (that's the name of the product), which uses SPARQL, and then migrated to Neo4j. I prefer Neo4j's Cypher language over SPARQL for graphs. From what I've found, SPARQL cannot do "find all related entities from starting entity X with depth n".

Also I prefer projects that have web UI for queries and visualization

This isn't true; you can achieve that easily with SPARQL using property paths. I've been doing that and much more with SPARQL for years.

I worked on a project which employed the Gremlin API on an Azure Cosmos DB instance. This configuration worked (largely) seamlessly, though it is worth accentuating how we worked explicitly with graphs and consequently benefited from being able to actually store and query our data as graphs. An alternative could have been Neo4j, which, based on preliminary research, would have done the job, too. Overall, it was a nice, new experience which arguably presented a learning curve. One piece of advice, however, would be to thoroughly evaluate your requirements and strongly consider whether you actually need a graph database in the first place! If you are dealing with an excessive amount of interconnected relationships, definitely consider this option. If not, chances are you will have something functioning faster with a conventional database.

Postgres not suitable? Postgres is probably the most powerful multimodel data store with the lowest TCO on the market today.

Postgres can be used for columnar data, as a graph database (using https://github.com/bitnine-oss/agensgraph), as a timeseries database (using https://github.com/timescale/timescaledb), and as a KV store (which is astoundingly simple to do using its builtin jsonb column type)

In fact at this point the only thing other than Postgres I would look at is FoundationDB due to the fact that (although it takes some time) you can model and run ANY kind of data store on top of it.

We started using AWS Neptune (you can use it either with SPARQL or Gremlin) for a medical knowledge graph and while the AWS service is very good, the problem is the same as with any other NoSQL database:

Schemaless is ok if you just need to throw data and then analyze it in whatever way you find, but for two way data flow (ie: using written data in a user facing application), schemaless is a true headache and you end up keeping your schema in the application level anyway.

Besides that, the experience has been great; our hypothesis that we could freely evolve our schema of concepts and the relations between them has proved correct.

I am thinking about building an app using Neo4j.

I built a prototype with Postgres but there were bottlenecks in the IO and querying. Did some research and it turns out Neo4j might be better suited.

The app is a beer recommendation system using data scraped from Beer Advocate. It makes recommendations based on the location of the user and beers that the user enters in.

Why a graph database might be suitable for this is that there are no null relationships with Neo4j. In my limited understanding, this means that full table scans don't need to occur for making recommendations based on location.

Supposedly cosine similarity and other recommendation algorithms are built in, so looking forward to using that.
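
For reference, cosine similarity itself is simple to compute. Below is a minimal pure-Python sketch of the formula; the users and ratings are made up, and this is not Neo4j's built-in implementation:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings of the same four beers by three users (0 = not rated).
alice = [5, 3, 0, 1]
bob = [4, 2, 0, 1]
carol = [0, 0, 5, 4]

print(cosine_similarity(alice, bob))    # close to 1: similar taste
print(cosine_similarity(alice, carol))  # much lower: little overlap
```

Treating "not rated" as 0 is a simplification; a real recommender would probably handle missing ratings more carefully.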

Highly recommend you look at JanusGraph. The Gremlin query language is quite easy, it's open source and the underlying graph DB for most cloud providers. I've extensively looked at the others. JanusGraph was our choice.

I'm interested in using graph databases to model history and the humanities. I guess knowledge management, which is something I find RDBMSs are weak on. Not due to the schema, but because if you want to make relationships between anything possible, you end up with one table containing all nodes, and one containing edges. Which if you have multiple types of node (e.g. People, Places, Battles) gets very messy.

I'm keen to hear from anyone who's worked in this field.

Not to steal thunder from graph databases, I want to point out that recursive SQL enables support for graph-like data structures in a relational database. You write constraint logic in the form of trigger functions to enforce directionality and prevent cycles and so forth. With this, you may not need a full graph database for your work. It's up to you to decide which is the best tool for the work, but you can't make a sound decision without being aware of the tool in the first place.
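
A minimal sketch of that idea, using Python's built-in sqlite3 module so it runs without a server. The `edge` table and the `descendants` / `add_edge` helpers are names made up for illustration, and the cycle check is done in application code here as a stand-in for the trigger functions mentioned above. Postgres supports the same `WITH RECURSIVE` syntax, though the parameter placeholder style would differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge ("
    " src TEXT NOT NULL,"
    " dst TEXT NOT NULL,"
    " PRIMARY KEY (src, dst))"
)

# Recursive CTE: start from the direct children of a node, then keep
# following edges. UNION (not UNION ALL) drops duplicates, so the
# recursion terminates even if the data contains a cycle.
REACHABLE_SQL = """
WITH RECURSIVE reachable(node) AS (
    SELECT dst FROM edge WHERE src = ?
    UNION
    SELECT e.dst FROM edge AS e JOIN reachable AS r ON e.src = r.node
)
SELECT node FROM reachable
"""


def descendants(start):
    """All nodes reachable from `start` by following directed edges."""
    return {row[0] for row in conn.execute(REACHABLE_SQL, (start,))}


def add_edge(src, dst):
    """Insert src -> dst, refusing any edge that would create a cycle."""
    if src == dst or src in descendants(dst):
        raise ValueError(f"edge {src} -> {dst} would create a cycle")
    conn.execute("INSERT INTO edge (src, dst) VALUES (?, ?)", (src, dst))


add_edge("a", "b")
add_edge("b", "c")
add_edge("a", "d")
print(sorted(descendants("a")))  # ['b', 'c', 'd']
```

Calling `add_edge("c", "a")` at this point raises `ValueError`, because `c` is already reachable from `a`.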

The problems that are well suited to graph databases are the ones where you tried them on a more practical technology first and compared performance and maintenance costs.

^^This, exactly. First see how far you can push a decent relational DB, then try harder. When you really can’t push it any harder, or it becomes uneconomical to do so, _then_ look at graph solutions.

I used neo4j on a prototype [0] in 2015 trying to get better insight on how our various business users collected data from external partners. So it was like a giant metadata repository.

It was useful in trying to find connections and shortest paths. The project wasn’t built out for reasons unrelated to graph dbs.

[0] http://jupiter.phiresearchlab.org/#/main
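
For anyone unfamiliar, a shortest-path query over unweighted edges boils down to breadth-first search. Here is a minimal Python sketch with made-up node names; it is not how Neo4j implements it, only an illustration of the kind of question being asked:

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical metadata graph: who feeds data into what.
graph = {
    "partner_a": ["intake_form"],
    "intake_form": ["staging_db", "report"],
    "staging_db": ["warehouse"],
    "report": ["warehouse"],
    "warehouse": [],
}

print(shortest_path(graph, "partner_a", "warehouse"))
# ['partner_a', 'intake_form', 'staging_db', 'warehouse']
```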

I wrote a program that stores code in a DAG using Neo4j. It was fun, I enjoyed it. The performance was OK, but once I had finished prototyping it, I moved it into PostgreSQL and never looked back. Neo4j was excellent for prototyping, but it requires way more memory and CPU for a production instance than PostgreSQL. I want to do more projects with Neo4j, but I haven't had the time.

Neo4j was easy to get started with though I have not built any really large projects. Graph databases are ideal for when you have lots of entities but are not sure about a classification schema, as you can just create classification entities and connect them to other entities as you see fit. If you like working with linked lists you'll pick up the concepts very quickly.

spent months evaluating graph databases, finally implemented it in Cassandra. Don't believe the hype.

Can you explain further? I tend to agree because the claim that "graph databases are good for highly connected data" seems funny - can't relational databases be "highly connected" in the same way? Isn't it really about query languages or maybe writing expressive schemas? I don't see how claims about efficiency stand up to an objective critique.

The issue at hand was to define vertexes with relationships and look up and track those. More writes than reads. I wasn't lead on it after the initial review and testing, but if you're doing something super simple, do it on Postgres, I'd say. If you're doing something medium complex, use an off-the-shelf graph DB but beware rough edges and newness bugs. If you're doing something large scale and unique, talk to smarter people than me - I can introduce you :-)

I asked a similar question some months ago: https://news.ycombinator.com/item?id=16542183.

Slightly related: I've been looking for a distributed graph database (preferably an RDF triple-store) for weeks, and haven't been able to find anything promising.

I am working on one on top of foundationdb. See https://github.com/amirouche/asyncio-foundationdb. Chime in!

I was deeply disappointed. I need very rich algebraic structures, like hypergraphs and lattices, and no graph database seems to be up to the challenge.

I'm also curious about this. For those who are using it, are you using it as a primary datastore?

One question though: how do people deal with a changing schema in a NoSQL database?

They use migration scripts.

Procedural and navigational data manipulation with graph, declarative with SQL. Views in SQL, no equivalent of them in graph.

It should not even take half a brain to understand why [pseudo-]relational replaced and obsoleted graph 4 decades ago.

No. Incorrect on all levels. You can do declarative with both, or take a procedural approach with both. You can use SQL or Datalog with both. You can create views with both, materialized or otherwise. Graphs have never been obsolete. Graphs and the networks they represent are the foundation for some of the most sophisticated algorithms we know.

4 decades ago, mate. 4 DECADES AGO. Relational replaced graph 4 decades ago because graph lacked all that functionality 4 decades ago. And that it took 4 decades for graph to merely level up (and mostly by mocking relational-ish behaviour at that) is only telling of how advanced relational has always been.

(And please note carefully that I didn't say graph was obsoleted as a mathematical theory. What I did say was that graph was obsoleted as an appropriate mathematical theory *for underpinning data management*.)
