Blazegraph 2.0 – GPU-accelerated distributed graph database (blazegraph.com)
103 points by espeed on Feb 29, 2016 | 23 comments

We've used Blazegraph at work for a project, and we have used a number of other graph databases.

Blazegraph is a very niche product and requires a lot of time for setting it up and adjusting it for your workload.

If Blazegraph piques your interest, then you should also look into the Yarc platform by Cray.

If you want to look further into graphs but don't want to spend endless nights just trying to load your data, then I would recommend Stardog, which has just been a pleasure to work with.

The promises they make are really tempting. However, when I experimented with Blazegraph last year, it ended up stuck more often than I could count, whether during bulk import or in answering some ad-hoc SPARQL queries.

I can imagine that with extensive tuning, BlazeGraph provides a good database. Just don't expect it to have the polish and convenience of a modern RDBMS or a shiny NoSQL store :)

WikiData selected BlazeGraph to back its new Query Service:


Here's a spreadsheet showing WikiData's evaluation of each candidate graph database:


Their selection "process"[1] was... not what I'd choose to use, especially since they changed the priorities as they evaluated. But then they abandoned that process[2], so I wouldn't read too much into the evaluation.

Basically, as far as I can see, the main reason BlazeGraph was chosen was this: "[they] had me out to their office (a house an hour and half from mine)"[3]

I'm sure BlazeGraph is fine. We were doing a very similar evaluation at the same time, and the Titan situation screwed us over too. But we took a look at BlazeGraph after Wikidata chose it, and found it pretty rudimentary at that time.

[1] https://phabricator.wikimedia.org/T90101

[2] "As you can also see we didn't finish filling them all out. But we've still pretty much settled on BlazeGraph anyway. Let me first explain what BlazeGraph is and then defend our decision to stop spreadsheet work" https://lists.wikimedia.org/pipermail/wikidata-tech/2015-Mar...

[3] https://lists.wikimedia.org/pipermail/wikidata-tech/2015-Mar...

I just had a look at some of the URLs in your profile. http://proof.social/ and http://whybase.com/ are down or dead, but they sounded interesting.

We've definitely been working on improving the ease of getting started. The number of configuration options is a strength and a challenge.

There's now an updated user's guide on the wiki: https://wiki.blazegraph.com/wiki/index.php/Main_Page

and additional code samples: https://github.com/blazegraph/blazegraph-samples/


Stardog is great and would be even better with a friendly licensing policy towards indie developers. Their community version is too limited, and the enterprise version is priced per server ("contact sales"). The current developer version is for testing only.

I would happily pay for a developer license or something like that.

We sell Developer licenses to developers, indie or not, all the time. Send email to sales@complexible.com! :)

Great news, I will. I guess that option wasn't obvious to me when I was looking over http://stardog.com/#plans

MapGraph -- the GPU-accelerated graph engine -- has been rolled into Blazegraph 2.0, and it looks like this means OLTP and OLAP can be combined into a blazing-fast, single OLXP system.

From: https://www.blazegraph.com/product/gpu-accelerated/

  The original work was funded by DARPA and presented at the 
  2014 SIGMOD conference in a paper entitled, MapGraph: A 
  High Level API for Graphs [1]. This work is available in 
  open source. Later work, in collaboration with the 
  University of Utah SCI Institute [2] and funded by DARPA, 
  applied multi-core techniques running on over 750 M cores 
  on the Titan Supercomputer to extend this to Multi-GPU 
  traversal with Breadth First Search (BFS).  On a cluster of 
  64 NVIDIA K40 GPUs, it demonstrated a throughput of 32 
  Billion Traversed Edges Per Second (32 GTEPS), traversing a 
  scale-free graph of 4.3 billion directed edges in 0.15 
  seconds, which was featured in a presentation IEEE Bigdata 

[1] https://www.blazegraph.com/whitepapers/MapGraph-SIGMOD-2014....

[2] http://www.sci.utah.edu/publications/Fu2014a/UUSCI-2014-002....

VIDEO: Blazegraph GPU and DASL at Super Computing 2015 (https://vimeo.com/148519808)
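
For readers unfamiliar with the TEPS metric in the quote above, here is a minimal single-threaded sketch of a level-synchronous BFS that also counts the edges it inspects. It is not MapGraph's implementation and says nothing about how the GPU version partitions work; the function name `bfs_levels` and the adjacency-dict representation are assumptions made purely for illustration.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS over a dict {vertex: [neighbors]}.

    Returns (level, edges_traversed), where level maps each reachable
    vertex to its hop distance from source, and edges_traversed counts
    every outgoing edge inspected while expanding a frontier.
    """
    level = {source: 0}
    frontier = [source]
    edges_traversed = 0
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        # Expand the whole current frontier before moving to the next level.
        for u in frontier:
            for v in adjacency.get(u, ()):
                edges_traversed += 1
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level, edges_traversed


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_levels(graph, 0))
```

Dividing `edges_traversed` by the wall-clock time of the traversal gives traversed edges per second; each frontier expansion is the step a parallel implementation would distribute across threads.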

It's GPLv2 and now has support for SPARQL and TinkerPop3/Gremlin.

See https://groups.google.com/d/topic/gremlin-users/8fS_ak-tWNs/...


Meanwhile, check out:

Also down:


Should be all resolved. Tuning EC-2 up for the /. effect...

"Your connection is not secure

The owner of www.blazegraph.com has configured their website improperly. To protect your information from being stolen, Firefox has not connected to this website."

We've seen cases where the newer Thawte root certs were not included in the trust chain. That might be the issue.

Got any questions on Blazegraph GPU? Just let us know...

What's the performance like on data sets that are too big for any single GPU's memory?

Can the GPU be used to accelerate shortest-path queries (e.g. Dijkstra's algorithm) and if so, where can I read more about how that's achieved?

The graph does need to fit into GPU RAM. We use graph partitioning for multi-node, multi-GPU configurations.

Dijkstra's algorithm, as mentioned by Davidson et al. [1], is a "sequential algorithm [that] is poorly suited for parallel architectures like GPUs that require large numbers of parallel threads for efficient execution."

Instead, we have variants of the algebraic formulation of the Bellman-Ford algorithm as given in Kepner and Gilbert's book [2].

[1] Andrew A. Davidson, Sean Baxter, Michael Garland, and John D. Owens: "Work-Efficient Parallel GPU Methods for Single-Source Shortest Paths." In Proceedings of the IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS), 2014. http://dx.doi.org/10.1109/IPDPS.2014.45

[2] Kepner and Gilbert: "Graph Algorithms in the Language of Linear Algebra."
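
To illustrate the general idea behind an algebraic Bellman-Ford (not Blazegraph's actual code, which is not shown in this thread), here is a minimal CPU sketch. Each round behaves like a matrix-vector product over the (min, +) semiring: every edge is relaxed independently against the previous round's distances, which is what makes the formulation friendlier to parallel hardware than Dijkstra's priority queue. The function name `min_plus_sssp` and the edge-list format are assumptions for illustration only.

```python
import math

INF = math.inf


def min_plus_sssp(num_vertices, edges, source):
    """Single-source shortest paths by repeated (min, +) relaxation.

    edges is a list of (u, v, weight) directed edges.
    Returns a list of distances indexed by vertex (math.inf if unreachable).
    Raises ValueError if a negative cycle is reachable from source.
    """
    dist = [INF] * num_vertices
    dist[source] = 0.0
    for _ in range(num_vertices - 1):
        # One round: new[v] = min(dist[v], min over (u, v, w) of dist[u] + w).
        # Every edge reads only the previous round's dist, so the edge loop
        # has no ordering dependency and could run in parallel.
        new = list(dist)
        for u, v, w in edges:
            candidate = dist[u] + w
            if candidate < new[v]:
                new[v] = candidate
        if new == dist:
            break  # converged early
        dist = new
    # If any edge can still be relaxed, a negative cycle is reachable.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
    print(min_plus_sssp(4, example_edges, 0))
```

The trade-off is more total work than Dijkstra (up to V - 1 passes over all edges) in exchange for passes that are embarrassingly parallel.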

> provides a Scala-based language to write graph and big data analytics and is complementary to the Spark and Hadoop ecosystems

It'd be awesome to have Blazegraph as a backend for Spark's Pregel queries.

With TensorFlow bindings in place, and the BIDMach/BIDMat libraries, it is very nice seeing Spark getting some serious GPU attention.

We definitely see Spark + Scala + Blazegraph DASL as a sweet spot for combining the ease of Spark with GPU performance. We have a submission in to the Hadoop Summit in June on it: https://hadoopsummit.uservoice.com/forums/344955-data-scienc...

You guys should submit a talk to Spark Summit. Look forward to it.
