Data-Driven, Recursive Interfaces for Graph Data

bsaunder · on Oct 22, 2010

Interesting article, thanks for posting. Together with the recent "Rands in Repose" post, this gave me some relevant points to ponder for a while.

I've developed a similar fascination with graph data over the past couple of years. My particular affliction has been treating program code as data: nodes stitched together with edges.
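To make "code as nodes and edges" concrete, here is a minimal Python sketch that uses the standard library's `ast` module as a stand-in for whatever language and parser the real work targets. The function name `code_to_graph` is hypothetical, and treating every syntax-tree parent/child link as an edge is just one assumed choice; call graphs or data-flow edges would be others.

```python
import ast

def code_to_graph(source):
    """Parse Python source into a graph of (nodes, edges).

    nodes maps an integer id to the AST node's type name (e.g. "FunctionDef");
    edges is a list of (parent_id, child_id) pairs from the syntax tree.
    """
    tree = ast.parse(source)
    nodes = {}
    edges = []
    ids = {}  # id(ast_object) -> integer node id

    def node_id(n):
        # Give each distinct AST object one integer id, the first time we see it.
        key = id(n)
        if key not in ids:
            ids[key] = len(ids)
            nodes[ids[key]] = type(n).__name__
        return ids[key]

    # ast.walk visits every node; ast.iter_child_nodes yields its direct children.
    for parent in ast.walk(tree):
        pid = node_id(parent)
        for child in ast.iter_child_nodes(parent):
            edges.append((pid, node_id(child)))
    return nodes, edges


nodes, edges = code_to_graph("def f(x):\n    return x + 1\n")
print(len(nodes), "nodes,", len(edges), "edges")
```

Once code is in this shape, it can be stored, queried, and visualized with the same tools as any other graph data.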

For visualization, I was considering generating partial 3D models with something like StructureSynth and placing them in a 3D world like opencobalt. I also stumbled across Orange (http://www.ailab.si/orange/), which looks useful too.

rjurney · on Oct 22, 2010

Check out WireIt; there is a lot of good stuff there for visual programming in a graph interface: http://neyric.github.com/wireit/

I used it here to make a web version of PigPen: http://github.com/rjurney/Cloud-Stenography http://vimeo.com/6032078 http://wiki.apache.org/pig/PigPen http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.134...

jerf · on Oct 22, 2010

FWIW, what you say is basically true, and in fact if you dig into actual relational theory (as opposed to the bastardized subset that SQL gives you) you'll find something that itself looks an awful lot like a graph.

But the problem you will eventually run into, and should always be mindful of, is that nobody has solved the problem of representing arbitrary graphs while getting the kind of performance you expect from a web service. This has been one of the big obstacles for RDF, for instance, which is still worth checking out if you don't already know what it is.

Intuitively, and I'll admit I'm not an expert on the topic (just a dilettante who has gone down some similar thought paths), the problem is that a full graph has no structure to get a hold of and take advantage of in your query. A traditional SQL table has a regular, recurring structure and obvious indexes to use to optimize performance (and in fact this is the source of most if not all of SQL's deviations from relational theory, IMHO). A NoSQL database strictly limits itself to what is usually the equivalent of an SQL record with one key (more or less) and a blob in it. (Some of them do moderately fancy things with that blob, but even so, it's a blob.) A graph can do anything it damn well pleases, and graphs do, not only in theory but in practice, which makes them hard to deal with even when the theory is beautiful.
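One way to see the "no structure to get a hold of" point: the same question, "what is within two hops of this node?", touches one new node per hop on a chain but a hundred at once on a hub, and nothing in the query itself tells you which case you are in. A minimal Python sketch with toy adjacency dicts; `reachable_within` and both example graphs are hypothetical.

```python
def reachable_within(adj, start, max_hops):
    """Breadth-first search from `start`, at most `max_hops` hops out.

    adj maps each node to a list of neighbors. Returns the set of nodes
    reached and the number of *new* nodes discovered at each hop.
    """
    seen = {start}
    frontier = [start]
    per_hop = []
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for nb in adj.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    nxt.append(nb)
        per_hop.append(len(nxt))
        if not nxt:  # nothing new: the search has run out of graph
            break
        frontier = nxt
    return seen, per_hop


# The same "two hops out" question against two differently shaped graphs.
chain = {i: [i + 1] for i in range(5)}   # 0 -> 1 -> 2 -> 3 -> 4 -> 5
hub = {0: list(range(1, 101))}           # one node linked to 100 others
print(reachable_within(chain, 0, 2)[1])  # [1, 1]
print(reachable_within(hub, 0, 2)[1])    # [100, 0]
```

A table lookup by key costs roughly the same every time; here the work depends entirely on how the data happens to be wired, which is what makes general-purpose graph queries hard to plan for.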

rjurney · on Oct 22, 2010

The key point is that you've done all your processing in batch, and you're only displaying the most prominent or interesting properties and links for each record. Hadoop/Pig/Python together are much more powerful than SQL if you don't have a real-time requirement, and by the time we get to the key/value store all the data processing is done. If you think you have a real-time analytic requirement... well you may, but quite possibly you really don't.

Getting to that point through batch processing can be hard, but the infrastructure is ideally suited to it. NoSQL doesn't impede you whatsoever. It enables you to think correctly about packaging your data for recursive consumption in trivial interfaces.
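A minimal sketch of the batch-then-serve split described above, with plain Python standing in for the Hadoop/Pig stage and a dict standing in for the key/value store. Ranking links by raw edge count, the `top_k` cutoff, and the names `batch_summarize` and `render` are all assumptions for illustration, not rjurney's actual pipeline.

```python
import json
from collections import Counter

def batch_summarize(edges, top_k=3):
    """Offline (batch) step: keep only each node's top_k strongest links.

    "Strength" here is simply how many times an edge occurs in the input,
    a stand-in for whatever ranking the real batch job computes. Returns a
    dict standing in for the key/value store: node id -> small JSON document.
    """
    weights = Counter(edges)  # (src, dst) -> number of occurrences
    links = {}
    for (src, dst), w in weights.items():
        links.setdefault(src, []).append((w, dst))
    store = {}
    for src, ranked in links.items():
        ranked.sort(key=lambda t: (-t[0], t[1]))  # strongest first, ties by id
        top = [dst for _, dst in ranked[:top_k]]
        store[src] = json.dumps({"id": src, "links": top})
    return store

def render(store, key):
    """Online step: a single key lookup per page; no joins, no traversal.

    Every link is itself a key, so the interface recurses simply by
    following a link and calling render() again.
    """
    doc = json.loads(store[key])
    return doc["id"], doc["links"]


edges = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "d"), ("b", "a"), ("b", "c")]
store = batch_summarize(edges, top_k=2)
page = render(store, "a")
print(page)                        # ('a', ['b', 'c'])
print(render(store, page[1][0]))   # ('b', ['a', 'c'])
```

All the expensive graph work happens once, offline; the web tier only ever does key lookups, which is exactly the access pattern a key/value store is good at.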

Real-time, large-scale graph processing isn't possible, or is very hard, but... it's not really needed to do amazing things.

th0ma5 · on Oct 22, 2010

I would argue that when you store a graph in the database as Entity, Attribute, Value rows (just like an RDF triple), with only those three columns, there are a lot of opportunities for indexing. You're right, though: a large, fully rendered graph is unwieldy by definition.
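A minimal sketch of that three-column layout using Python's built-in `sqlite3`. The table and index names are hypothetical, and indexing both (entity, attribute, value) and (value, attribute, entity) is one assumed choice; triple stores often maintain several such orderings so that common lookup patterns always have an index to hit.

```python
import sqlite3

def make_triple_store(triples):
    """Store a graph as (entity, attribute, value) rows in three columns.

    Two composite indexes cover the common access patterns: "what does X
    point at?" (entity first) and "what points at Y?" (value first).
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE eav (entity TEXT, attribute TEXT, value TEXT)")
    db.execute("CREATE INDEX eav_by_entity ON eav (entity, attribute, value)")
    db.execute("CREATE INDEX eav_by_value ON eav (value, attribute, entity)")
    db.executemany("INSERT INTO eav VALUES (?, ?, ?)", triples)
    return db

def outgoing(db, entity, attribute):
    """Values reached from `entity` via `attribute` (eav_by_entity can serve this)."""
    rows = db.execute(
        "SELECT value FROM eav WHERE entity = ? AND attribute = ? ORDER BY value",
        (entity, attribute),
    )
    return [v for (v,) in rows]

def incoming(db, value, attribute):
    """Entities pointing at `value` via `attribute` (eav_by_value can serve this)."""
    rows = db.execute(
        "SELECT entity FROM eav WHERE value = ? AND attribute = ? ORDER BY entity",
        (value, attribute),
    )
    return [e for (e,) in rows]


db = make_triple_store([
    ("alice", "knows", "bob"),
    ("carol", "knows", "bob"),
    ("bob", "knows", "dave"),
    ("bob", "name", "Bob"),
])
print(outgoing(db, "bob", "knows"))  # ['dave']
print(incoming(db, "bob", "knows"))  # ['alice', 'carol']
```

Prefixing either query with SQLite's `EXPLAIN QUERY PLAN` shows which index the planner actually picked.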

besquared · on Oct 22, 2010

I have no idea what this person is trying to say. Can anyone elaborate?

rjurney · on Oct 22, 2010

I can help. Which bit wasn't clear?

I was trying to explain the human interaction consequences of batch processing and NoSQL in presenting mined data in web applications.

If you're not into any of those things, I could explain, but... it's a niche. There's probably not much point.