
Microsoft GraphView: Use SQL Server or Azure SQL to efficiently manage graphs - us0r
https://github.com/Microsoft/GraphView
======
pbnjay
I'm hoping I'm missing something important, or that there's a reason the
documentation isn't more clear, especially if this is coming from Microsoft.

    
    
        "efficiently manage graphs" != "efficiently traverse graphs"
    

It's very very easy to store a graph in a database. It's very very hard to
efficiently walk an arbitrary path between nodes using SQL.

Browsing the user manual, I don't see any examples of graph walks. There are
some glorified joins, but there's no real explanation of why a graph database
is better suited for the queries show (quite the opposite I feel -- it's
harder to read the non-standard SQL!).

~~~
yid
Seems like its more geared towards matching relatively small template-based
subgraph patterns rather than doing arbitrary walks efficiently. From page 9
of the user manual:

> Below is an example query matching a triangle over the employee-client
> graph:
    
    
        SELECT En1.Name, En2.Name, Cn.Name
        FROM EmployeeNode AS En1, EmployeeNode AS En2, ClientNode AS Cn
        MATCH En1-[Colleagues AS Cg]->En2-[Clients AS C1]->Cn,
              En1-[Clients AS C2]->Cn
        WHERE En1.Name = ‘Jane Doe’ AND C1.credit > 1000

~~~
pbnjay
Clearer and more standard SQL Join:

    
    
        SELECT En1.Name, En2.Name, Cn.Name
          FROM EmployeeNode AS En1, EmployeeNode AS En2, ClientNode AS Cn,
               Colleagues AS Cg, ClientOf AS C1, ClientOf AS C2
         WHERE En1.Name = ‘Jane Doe’ AND C1.credit > 1000
           AND En1.ID=Cg.ID1 AND En2.ID=Cg.ID2
           AND En1.ID=C1.EnID AND En2.ID=C2.EnID
           AND C1.CnID=C2.CnID
    
    

Really all the "MATCH" gets you is cleaner many-to-many join. But is that
tradeoff worth the DLL dependency and burden to bringing on new developers?

~~~
Someone
[http://research.microsoft.com/pubs/259290/GraphView%20User%2...](http://research.microsoft.com/pubs/259290/GraphView%20User%20Manual.pdf):

    
    
      CREATE TABLE EmployeeNode
      (
        [ColumnRole:”NodeId”] 
        WorkId varchar(32),
        [ColumnRole:”Property”]
        Name varchar(32),
        [ColumnRole:”Edge”, Reference:”EmployeeNode”]
        Colleagues VARBINARY,
        [ColumnRole:”Edge”, Reference:”ClientNode”, Attributes: {credit:”int”, note:”string”}]
        Clients VARBINARY
      )
    

=> they also appear to replace "many rows each with key and a single
referenced value" by "row for each unique key with packed values for that key"
(possibly with multiple rows for a key if the packed data doesn't fit in a
VARBINARY(MAX))

So, the database structure looks a bit like an inverse index. That could speed
up some queries.

I have been playing with something similar in pure T-SQL, using a cross apply
over a table-valued function to get from this packed format back to the
normalized table, but wasn't impressed with its performance. Using CLR was
next on my list, but I haven't come around it because I don't have high hopes
that it will perform, but I may try this.

------
theatraine
I wonder how this will compare with TitanDB?

[http://thinkaurelius.github.io/titan/](http://thinkaurelius.github.io/titan/)

