Dagoba: an in-memory graph database (2016) (aosabook.org)
133 points by jxub 9 months ago | 12 comments

This is a really, really great article. Shame it didn't get on HN 2 years ago.

For anyone interested in the history of databases: as the author eluded to, it began with graph databases (then called navigational databases), and SQL came in later and actually ruined the approach and performance. (Despite people thinking SQL databases were designed for storage constraints, that motivation didn't come until later, after the relational model had been developed further.) If you want to learn more about this history, I gave a talk in Italy (later recorded as a webcast) about it here: https://vimeo.com/208899228/b9bc9eaaa4

"eluded" (escaped/avoided) -> "alluded [to]" (referenced)

Here's a link to this article with working source on GitHub: https://github.com/dxnn/dagoba

Really useful to see it all put together.

This is an excellent article -- not just technically, but tonewise as well; kudos to the author.

> Fortunately, we're building an in-memory database, so we don't have to worry about any of that!

If I wanted to do something in memory, I'd just use data structures in my process. Not so much a database if there is no persistence.
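To make the "just use data structures" point concrete, here's a minimal sketch (names are illustrative, not from the article or Dagoba): an adjacency list plus a lookup helper, with no database machinery at all.

```javascript
// A plain in-process graph: vertex -> list of neighbour vertices.
const edges = new Map();

// Record a directed edge in the adjacency list.
function addEdge(from, to) {
  if (!edges.has(from)) edges.set(from, []);
  edges.get(from).push(to);
}

// Look up the outgoing neighbours of a vertex.
function neighbors(v) {
  return edges.get(v) || [];
}

addEdge('alice', 'bob');
addEdge('alice', 'carol');

console.log(neighbors('alice')); // ['bob', 'carol']
```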

Haven't read the article yet, about to, but that sounds a bit naive.

In-memory is not exactly about persistence; you can still have non-volatile data in an "in-memory" DB and take snapshots of it. But even so, what about when you decide you need to scale your compute but not your RAM and move the data structure to a networked box? Or when you need more than one local process to read the same data structure? Or when you want to ensure your architecture will always support such a thing if you need it, or when your language doesn't have a decent implementation of a particular data structure, etc.?

It might make it convenient to do those things, but it doesn't really make it a database, does it? The article admits it doesn't have to handle the most complex problems associated with data persistence (getting transactional blocks of data to a storage medium).

Your examples become a problem if your graph can't fit in the RAM of one host. If not that, then you then have distinct graphs to handle, and so a parallelisable problem to be handled by multiple processes. All you have in that case is an abstraction which gives you a unified interface to all of them. Notably, that also isn't relevant here, as the one in the article doesn't do any of that. It is a graph library.

You're running a small social network, and you have ~200 containers running your Rails app, which gets scaled up and down by k8s all the time, not to mention twice-daily deploys.

Seems like you might want that 100GB of user relationship graph to live somewhere else, so you don't (1) massively inflate your RAM requirements on the app, (2) have to fan out updates to hundreds of boxes instead of a couple, (3) deal with cold start problems during the relatively frequent app restarts, (4) etc.

How does the application described in the parent help with any of that? From what I can read, it doesn't allow other processes to access the data, and it doesn't allow distributed unified access to anything.

> "If I wanted to do something in-memory, I'd just use data structures in my process"

I think most of the repliers are responding to the blanket statement you made about _all_ classes of in-memory databases (think memcached, Redis, VoltDB, HANA, etc.)

Ah no, clearly there is a need for a process-based service for such use cases. But this is a graph library, not a database.

This _is_ a data structure. For scalable graph queries.
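The "data structure for graph queries" point can be sketched like this. This is not Dagoba's actual implementation, just a hypothetical illustration of the idea: a chainable, Gremlin-style query layer sitting directly on top of a plain in-process adjacency list.

```javascript
// Plain in-process adjacency list (illustrative data).
const edges = { alice: ['bob'], bob: ['carol', 'dave'] };

// v() starts a traversal from the given vertices and returns a
// chainable query object, in the style of a Gremlin pipeline.
function v(...starts) {
  let frontier = starts;
  const q = {
    // Step to all outgoing neighbours of the current frontier.
    out() {
      frontier = frontier.flatMap(x => edges[x] || []);
      return q; // chainable
    },
    // Materialise the result of the traversal.
    run() {
      return frontier;
    },
  };
  return q;
}

console.log(v('alice').out().out().run()); // ['carol', 'dave']
```

The structure is still just an object of arrays; the query interface is what makes it read like a database.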
