Hacker News new | comments | show | ask | jobs | submit login
Mosaic: processing a trillion-edge graph on a single machine (acolyer.org)
156 points by contingencies on May 30, 2017 | hide | past | web | favorite | 11 comments

This is really well executed. I am always surprised that more graph engines are not designed using the internal architecture here. Similar designs actually work well in more distributed environments. The design is close to a graph engine I wrote for a supercomputer in 2009, particularly how the parallelization and scale-out work, which performed well. The overhead of networking is significant but can be partially mitigated with aggressive latency hiding techniques.

The article demonstrates how efficiently a very large graph can be processed on a single machine but the architecture has analogues that work nicely on large compute clusters too. It is just a good way to process graphs.

Do you have any papers about your work online? That sounds super interesting.

Unfortunately not. I've recently started working on a series of articles and papers to close that gap though.

Not the OP, but googled and this was the second after LinkedIn: http://www.jandrewrogers.com/2015/10/08/spacecurve/

Stupid question by someone who finds this very interesting but lacks some background information: What kind of processing are we talking about? What do you do with the graph? Searching? Finding paths? Mutating contents? Changing the graph structure? Aggregating data about the graph?

The purpose seems to be presupposed, so I guess it's some common use case that I should know.

The major motivation for Giraph (Facebook) and GraphJet (Twitter) was the ability to do graph based recommendations such as Friend of Friend scoring (Facebook) or Who To Follow (Twitter). Basically looking at the followers or friends of each of your followers or friends and using some kind of scoring algorithm to pick which users you will most likely be interested in.

Another common example is doing some kind of clustering (community detection) of vertices like for example labeling the users on the social graph by likely political affiliation or likely hobbies. These labels could be used as features for advertising recommendations later.

Anything - even traversal. Basically, whatever combination of functions you can feed to the EP/LR/GR to achieve whatever task.

This begs the question if this scales linearly. The 1 Trillion Edge Graph on Xeon Phi ( Knight Corners ) and Intel SSD. How well would it do on Knight landing ( The Current Gen )? Or Knight Hill, next gen with 10nm. And with Optane Memory instead of SSD.

Could we handle Facebook's edge graph on a single machine?

Good question - sounds like you have a paper to write.

> But, with Mosaic, we are able to process large graphs, even proportional to Facebook’s graph, on a single machine.

What exactly does the word "proportional" mean here?

On the order of one trillion edges like Giraph.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact