RedisGraph: A High Performance In-Memory Graph Database as a Redis Module (redisgraph.io)
237 points by wll on Mar 17, 2018 | hide | past | favorite | 41 comments

A week ago, while in Tel Aviv, I talked with my colleague who is the lead developer of RedisGraph. He will be in San Francisco at RedisConf with us and will announce exciting news about RedisGraph; development is currently very active.

Graph on Redis is cool, and I get that hexastore is the way to go. It's very different from Neo4j-like graphs where the nodes and edges have properties; hexastore is based more on the W3C RDF model. What confuses me, then, is why not have SPARQL as the query language?

Although the current version of RedisGraph uses the concept of a hexastore to represent the graph, I'm not sure this is the best way to go, both memory-wise (×6) and CPU-wise (when traversing the graph).

That said, RedisGraph does support both node and edge properties, as these are not stored within the hexastore but on the actual entity object.
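To make the trade-off concrete, here is a toy sketch of the hexastore idea (not RedisGraph's actual key layout, just the concept): every (subject, predicate, object) triple is indexed under all six orderings, so any traversal pattern becomes a prefix scan, at the cost of roughly 6× the memory of storing the triple once.

```python
from itertools import permutations

class Hexastore:
    """Toy hexastore: each triple is stored under all six key orderings."""
    def __init__(self):
        self.keys = set()

    def add(self, s, p, o):
        parts = {"S": s, "P": p, "O": o}
        # SPO, SOP, PSO, POS, OSP, OPS -> ~6x memory per triple
        for order in permutations("SPO"):
            self.keys.add("".join(order) + ":" +
                          ":".join(parts[c] for c in order))

    def scan(self, prefix):
        """Prefix scan, e.g. scan('SPO:alice:knows:') -> whom alice knows."""
        return sorted(k[len(prefix):] for k in self.keys if k.startswith(prefix))

hs = Hexastore()
hs.add("alice", "knows", "bob")
hs.add("alice", "knows", "carol")
print(hs.scan("SPO:alice:knows:"))  # ['bob', 'carol']
print(hs.scan("OPS:bob:knows:"))    # ['alice']  (reverse-edge lookup)
```

Note that the index only holds ids; node and edge properties can live on the entity objects themselves, which is why they don't multiply by six.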

The new version of RedisGraph, which will be presented at RedisConf 2018, will completely replace the hexastore with a new data structure that is better suited in terms of both memory and CPU.

That’s great to hear! I see RedisConf is at April’s end. Would it be possible to get early access? RedisGraph is a crucial component of a system I am designing.

Well, RedisGraph is an open source project; you can browse its repository at https://github.com/RedisLabsModules/redis-graph

The work that will be presented at RedisConf, however, has yet to be published.

I was indeed wondering about the upcoming release.

This is great!

I'm curious how it performs in comparison to Neo4j, especially in memory usage.

Regarding graph databases: does anyone have specific recommendations for something suitable to hold two-digit TB of graph data and allow many queries about how incoming datapoints are connected to areas of interest within the big knowledge graph[0]? I'm looking into doing some prediction work based on it and have not found any suitable data store so far, because the size and budget constraints limit storage to NVMe modules instead of DRAM. Any help finding a suitable datastore would be greatly appreciated.

[0]: http://data.gdeltproject.org/documentation/GDELT-Global_Know... https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-...

I haven't tried it in production, but JanusGraph[0] is supposed to be able to scale to the sizes you're describing. Another one I looked at is ArangoDB[1], but again, I've never used it in production.

[0] http://janusgraph.org

[1] https://www.arangodb.com

The production stack for this is almost always PostgreSQL/Cassandra + Spark. Some people also use things like JanusGraph, but that is still too new.

Thank you for the suggestion; this seems to be exactly what I needed. It seems that its newness prevented me from finding it earlier.

Edit: I just read some more, and it seems like this is a graph-native "brother" of CockroachDB, which was the one I previously considered, until I learned of its inadequate ability to push graph queries down, which would increase network needs a lot. Let me know if you want me to try to notify you once there are benchmarks or production tests successfully completed.

Dgraph is very, very immature. I'd monitor their forum for a bit before deciding whether to use it; you quickly get a good feel for the issues and rough edges: https://discuss.dgraph.io/

Hi Jan from ArangoDB here.

Two-digit TB of graph data is for sure a serious graph :) Depending on what you actually want to analyze and how the graph is structured, you can either use our Pregel integration for high-level analytics of the data set, or use SmartGraphs to shard the data across a cluster and perform in-depth traversals, pattern matching, and such. SmartGraphs is an Enterprise feature but free for evaluation and testing. Please note that the suitability of SmartGraphs depends on the structure of the graph itself (e.g. whether you have, or can identify, communities to shard by efficiently).

We created a series of tutorials that guide you through such a process: https://www.arangodb.com/pregel-community-detection/ (the next step is always at the end of each tutorial). You can choose between two storage engines; I think RocksDB is the better choice in your case, as everything is persisted to disk (data and indexes) and you can configure how much main memory should be used, so you can tune the trade-off between performance and main memory yourself. If you have fast SSDs it's even better, as RocksDB is optimized for them.

I think this is an unfair comparison, as RedisGraph keeps everything in memory (graph index + graph entities) for the sake of performance; AFAIK Neo4j utilizes disk.

In addition to my previous comment, I would like to add that RedisGraph's memory usage is proportional to the size of the graph stored. AFAIK Neo4j utilizes disk as the main storage, but the Java VM may hold objects in memory in order to traverse the graph, so it may also use some memory.

I've never tested the memory usage difference between the two, but it should be an interesting test to perform.

Generally speaking, it seems like common practice nowadays to load large portions of your data from disk into memory, for obvious performance reasons; no one is willing to pay the penalty of a disk seek when it comes to real-time data querying.

This makes sense; however, even on-disk stores written in Java sometimes have non-trivial memory usage characteristics. Elasticsearch is an example, so I would check what happens in the real world. And of course an in-memory system will always end up having memory usage proportional to the graph size...

For many use cases, ES has great memory characteristics because of doc values, which are now the default.


As an example, I have the HN data set on my laptop: about 15 GB of data, with max memory set to 3 GB (the heap is usually ~1 GB), and I can search, aggregate, etc. very quickly without memory problems.

Sure, I don't want to say that on-disk systems always need to keep the whole set of pages in memory, but depending on the performance you want to obtain, at least a percentage of the pages will need to be in memory. So the assumption that in-memory systems have memory usage proportional to the amount of data, while on-disk systems have fixed memory usage (just more disk), is usually not true.

It's 100% in memory, so I would imagine memory consumption will be bigger. But being in C and in-memory is meant for CPU efficiency.

Oh man. This is my idea of nirvana. Basically done with any other data-store except maybe for full-text search.

Lightweight Python implementation of both graph and full-text search (plus other fun stuff):


Seriously. Have you tried this out for full-text search? https://github.com/RedisLabsModules/RediSearch
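For intuition about what a full-text engine like RediSearch is doing under the hood (its actual C implementation is far more sophisticated), the core data structure is an inverted index mapping each term to the documents that contain it. A toy pure-Python sketch:

```python
import re
from collections import defaultdict

class TinyFullText:
    """Toy inverted index: maps each term to the set of doc ids containing it."""
    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        # Crude tokenizer: lowercase, split on non-word characters
        for term in re.findall(r"\w+", text.lower()):
            self.index[term].add(doc_id)

    def search(self, query):
        """AND semantics: return ids of docs containing every query term."""
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return set()
        result = set(self.index[terms[0]])
        for term in terms[1:]:
            result &= self.index[term]
        return result

ft = TinyFullText()
ft.add(1, "Redis is an in-memory data store")
ft.add(2, "RedisGraph is a graph database module for Redis")
print(ft.search("redis memory"))  # {1}
print(ft.search("redis graph"))   # {2}
```

Real engines add stemming, scoring, and compressed postings lists on top of this basic shape.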

Memory storage is fun, but there's a very hard limit on how much data you can hold. For many use cases with constant storage needs, it's awesome though.

While there are certainly plenty of people who need more storage than is practical to put in-memory, most people don't. E.g. it's rare enough to find people with single datasets in the 1TB+ range. Most databases I've come across at clients max out in the tens to hundreds of GB range. Many more could be easily split (e.g. they may be multi-TB or above, but are that way because of aggregation of data that is logically separate, such as relating to different customers, and can easily be sharded) - though whether or not that's worthwhile is a separate issue.

So while you're right, for most people it's not going to be a practical problem. Query performance tends to be a bigger problem in practice, and IO bandwidth is often the main cost driver for servers. NVMe SSDs provide so much additional IO bandwidth that they often pay for themselves several times over by reducing the number of servers needed for people who insist on "traditional" disk-focused databases. That pushes server costs into ranges where spending more on RAM but needing fewer servers to handle the IO load is more and more often a good tradeoff.

How much data do you have? You can get 24TB RAM on a server these days.

That sounds extremely expensive

They're also really high latency, unless ridiculous-amounts-of-ram server technology has gotten better in the past few years.

Here's one I found that supports up to 48 TB:


They don't include a price, so I'd assume it's in the low millions or high six figures.

The latency doesn't sound that bad. It's 200ns to access memory on another local NUMA node, and 500ns to access memory on another machine using NUMAlink.

Fascinating tech. I had no idea this could be done with Xeons.

I believe SAP has had this for a while[0] (but they are columnar while Redis is key-value).

[0]: http://www.ziti.uni-heidelberg.de/ziti/uploads/ce_group/2015...

SAP HANA is a full-fledged in-memory computing DB.

Looks good! Two questions come to mind:

- Is it possible to embed RedisGraph inside an Android application?

- Given that it is memory-only, is it possible to "dump" the graph to local storage and then restore it to memory later?

Another, more general question: given an X-only application (X being memory or disk), it is possible to emulate one API on top of the other, for example filesystem-in-memory and memory-on-disk (basically swap, I guess). Would you rather build an application using exclusively a filesystem API, or exclusively a memory-oriented API?

Best Regards

Can RedisGraph be embedded inside an Android application? I don't have much experience developing applications for Android, but it mostly comes down to the underlying architecture you're aiming for. Although Redis 4.0 can run on ARM, RedisGraph currently can't, as it accesses unaligned memory.

But in case that's resolved, or in the case of an Intel CPU, you'll have to embed a binary executable within your Android application; I'm pretty sure that's possible.

With regard to your second question, I think this is a matter of personal preference; as I've been dealing mostly with in-memory applications for the past couple of years, I'd prefer a memory-oriented API.

> Given it is memory only, is it possible to "dump" the graph to local storage then restore it to memory later ?

That's how Redis operates; it happens automatically. You can also log every command that changes the graph to a durable transaction log.
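Concretely, Redis offers two standard persistence mechanisms that module data types participate in: point-in-time RDB snapshots and the append-only file (AOF, the "durable transaction log" above). A minimal redis.conf fragment enabling both (plain Redis settings, nothing RedisGraph-specific) might look like:

```
# RDB: snapshot if >=1 write in 900s, >=10 writes in 300s,
# or >=10000 writes in 60s
save 900 1
save 300 10
save 60 10000

# AOF: append every write command, fsync once per second
appendonly yes
appendfsync everysec
```

On restart, Redis loads the latest RDB snapshot or replays the AOF to rebuild the in-memory dataset, including data held by modules that implement the persistence callbacks.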

The main features list includes “On disk persistence”. Is that what you were looking for?

>RediSearch has a distributed cluster version that can scale to billions of documents and hundreds of servers. However, it is only available as part of Redis Labs Enterprise.

This is really cool. I wrote something similar in Python (it was slow, ha) as an exercise to learn Redis a few years back. I used Gremlin/Groovy as inspiration for the query language, rather than Cypher as this project does. https://github.com/emehrkay/rgp

Would this work if I wanted to store the graph forever? We generally use Redis as a memcache to store data, with the expectation that it could be gone at any point in time.

This is a misconception, as Redis data can be persistent: https://redis.io/topics/persistence

Modules extending Redis with new capabilities and data types can, and probably should, use Redis persistence.

As long as you have the recommended 1:1 backup set up, it should last almost forever.
