I'm curious how it performs in comparison to Neo4j, especially in memory usage.
http://data.gdeltproject.org/documentation/GDELT-Global_Know... https://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-...
Edit: I just read some more, and it seems like this is a graph-native "brother" of CockroachDB, which was the one I previously considered, until learning that its ability to push graph queries down was inadequate, which would increase network needs a lot.
Let me know if you want me to try to notify you once benchmarks or production tests have been completed successfully.
2-digit TB of graph data is for sure a serious graph :) Depending on what you actually want to analyze and how the graph is structured, you can either use our Pregel integration for high-level analytics of the data set, or use SmartGraphs to shard the data across a cluster and perform in-depth traversals, pattern matching and the like. SmartGraphs is an Enterprise feature but free for evaluation and testing. Please note that the suitability of SmartGraphs depends on the structure of the graph itself (e.g. whether you have, or can identify, communities to shard by efficiently).
We created a series of tutorials which guide you through such a process: https://www.arangodb.com/pregel-community-detection/ (the next step is always at the end of each tutorial). You can choose between two storage engines. I think RocksDB is the better choice in your case, as everything is persisted to disk (data and indexes) and you can configure how much main memory should be used, so you control the trade-off between performance and main memory yourself. If you have fast SSDs it's even better, as RocksDB is optimized for them.
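As a concrete sketch of that memory trade-off: arangod exposes RocksDB tuning flags at startup (flag names taken from arangod's RocksDB engine options; the 4 GiB value is just an example, not a recommendation):

```shell
# Start ArangoDB with the RocksDB engine and cap its block cache,
# bounding how much main memory is spent on caching data from disk.
arangod --server.storage-engine rocksdb \
        --rocksdb.block-cache-size-limit 4294967296   # ~4 GiB
```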
I've never tested the memory usage difference between the two, but it should be an interesting test to perform.
Generally speaking, it seems like a common practice nowadays to load large portions of your data from disk into memory,
for performance reasons obviously: no one is willing to pay the penalty of a disk seek when it comes to real-time data querying.
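The load-from-disk-once, serve-from-memory pattern can be sketched in a few lines of stdlib Python (the dataset names and sizes here are made up for illustration):

```python
import functools
import json
import os
import tempfile

# Write a small "dataset" to disk, standing in for a large on-disk store.
path = os.path.join(tempfile.mkdtemp(), "records.json")
with open(path, "w") as f:
    json.dump({"hn": {"items": 15000000}, "gdelt": {"items": 500000000}}, f)

@functools.lru_cache(maxsize=None)
def load_dataset():
    # The disk hit happens at most once; every later call is a dict lookup.
    with open(path) as f:
        return json.load(f)

def query(name):
    return load_dataset()[name]["items"]

print(query("hn"))   # first call reads from disk
print(query("hn"))   # subsequent calls are pure in-memory lookups
```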
As an example, I have the HN data set on my laptop - about 15 GB of data and I have the max mem set to 3GB (heap is usually ~1) and I can search, aggregate, etc. very quickly without memory problems.
That said, RedisGraph does support properties on both nodes and edges, as these are not stored within the Hexastore but on the actual entity object.
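To make the distinction concrete, here is a minimal hexastore sketch in Python: each triple is indexed under all six orderings so any access pattern is a direct lookup, while properties live on a separate per-entity map outside the index. This is a hypothetical illustration of the data structure, not RedisGraph's actual implementation:

```python
from collections import defaultdict

class HexaStore:
    # Six permutations of (subject, predicate, object).
    ORDERS = ("spo", "sop", "pso", "pos", "osp", "ops")

    def __init__(self):
        self.index = {order: set() for order in self.ORDERS}
        # Node/edge properties are kept on the entity, not in the index.
        self.properties = defaultdict(dict)

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            self.index[order].add(tuple(triple[c] for c in order))

    def neighbors(self, s, p):
        # The "spo" ordering answers "all objects for (s, p)".
        return {o for (s2, p2, o) in self.index["spo"] if (s2, p2) == (s, p)}

g = HexaStore()
g.add("alice", "follows", "bob")
g.add("alice", "follows", "carol")
g.properties["alice"]["age"] = 30   # stored on the entity object
print(sorted(g.neighbors("alice", "follows")))  # ['bob', 'carol']
```

The memory cost is visible immediately: every triple is stored six times, which is part of why replacing the structure can pay off.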
The new version of RedisGraph, which will be presented at RedisConf 2018, will completely replace the Hexastore with a new data structure that is better suited in terms of both memory and CPU.
The work to be presented at RedisConf has yet to be published, though.
So while you're right, for most people it's not going to be a practical problem. Query performance tends to be a bigger problem in practice, and IO bandwidth is often the main cost driver for servers. NVMe SSDs provide so much additional IO bandwidth that they often pay for themselves several times over by reducing the number of servers needed for people who insist on "traditional" disk-focused databases. That pushes server costs into ranges where spending more on RAM but needing fewer servers to handle the IO load is more and more often a good tradeoff.
Here's one I found that supports up to 48 TB:
They don't include a price, so I'd assume it's in the low millions or high six figures.
Fascinating tech. I had no idea this could be done with Xeons.
- Is it possible to embed RedisGraph inside an Android application?
- Given that it is memory-only, is it possible to "dump" the graph to local storage and then restore it to memory later?
Another question, more general:
Given an X-only application (X being memory or disk), it is possible to emulate one API on top of the other, for example filesystem-in-memory and memory-on-disk (basically swap, I guess). Would you rather build an application using exclusively a filesystem API, or exclusively a memory-oriented API?
But in case that's resolved, or in the case of an Intel CPU, you'll have to embed a binary executable within your Android application; I'm pretty sure that's possible.
With regard to your second question, I think this is a matter of personal preference; as I've been dealing mostly with in-memory applications for the past couple of years, I'd prefer a memory-oriented API.
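The "emulate one API on top of the other" idea from the question is easy to demonstrate with the stdlib: `io.BytesIO` exposes the file API (read/write/seek) over a memory buffer, so code written against a file-like interface runs unchanged whether the backing store is RAM or disk. A toy sketch (the graph format here is invented for illustration):

```python
import io

def store_graph(fileobj, edges):
    # Write edges through the file API, one "src dst" pair per line.
    for s, o in edges:
        fileobj.write(f"{s} {o}\n".encode())

def load_graph(fileobj):
    fileobj.seek(0)
    return [tuple(line.split()) for line in fileobj.read().decode().splitlines()]

buf = io.BytesIO()           # swap for open(path, "w+b") to target disk instead
store_graph(buf, [("a", "b"), ("b", "c")])
print(load_graph(buf))       # [('a', 'b'), ('b', 'c')]
```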
That's how Redis operates. It happens automatically. You can also back every command that changes the graph to a durable transaction log.
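For reference, the snapshot-plus-command-log behavior described above maps to standard redis.conf settings; the thresholds below are example values, not recommendations:

```conf
# Take an RDB point-in-time snapshot if at least 1000 keys changed in 60s.
save 60 1000

# Append-only file: log every write command so the dataset can be rebuilt
# by replaying the log after a restart.
appendonly yes
appendfsync everysec
```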
Modules extending Redis with new capabilities and data types can, and probably should, use Redis persistence.