Hacker News new | past | comments | ask | show | jobs | submit login
Don't bring a tree to a mesh fight (valand.dev)
48 points by valand on Nov 23, 2021 | hide | past | favorite | 22 comments



Don't bring a mesh / graph to a hypergraph fight. Hypergraphs seem to be the "biggest generalization" I've come across in the list -> tree -> graph -> hypergraph hierarchy. (That is: all lists are trees. All trees are graphs. All graphs are hypergraphs)

In a graph, an "edge" connects two nodes. In hypergraphs, a "hyperedge" can connect 2-or-more nodes.

-------

You might think that hypergraphs are purely theoretical, except... they are in common usage all the time. You might also know hypergraphs by their alternative name: Relational Tables.

Well, I'm being loose with the language last paragraph. A relational-table is really a hyper-edge. And the nodes of the hypergraphs are the domains of whatever you're working with.

If your relational-table only has 2-columns, you have a normal graph relationship between the data. But as anyone who has worked with relational databases knows: you often have 3-columns, 4-columns or more to represent the data. Once you reach 3-columns / 4-columns or beyond, you no longer are working with "simple" graphs but instead hypergraphs, and the theoretical complexity of the whole system reaches a new level.

--------

    CREATE TABLE PersonsHyperedges (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    );
You can imagine this Table as a list of hyperedges, connecting "PersonID", "LastName", "FirstName", "Address" and "City" nodes together.

--------

    CREATE TABLE PersonsGraphEdges (
        LastName varchar(255),
        FirstName varchar(255),
    );
In this case, since there's only 2-elements in the "hyperedge", this table is a list of "normal graph-edges"

-----------

There are also things "in-between". Bipartite graphs are very common for example. The hierarchy fits as "tree -> bipartite graph -> graph" (all trees are bipartite graphs).

If you have assurances that your problem you're working with is bipartite, then favor bipartite graph algorithms. There are a lot of NP-complete problems over graphs that are in P for bipartite-graphs


A hypergraph can always be viewed as a bipartite graph


I'm not sure if you're correct or wrong? But its a strange statement to make.

Like, any graph can have a "minimum spanning tree" representation for example. But the minimum-spanning tree has some degree of information loss compared to the original.

Similarly: a graph / tree could have a Lexicographic ordering (ie: a list representation), but this list-representation lost a lot of information in the translation process.

---------

It just turns out that a lot of algorithms could work on the minimum-spanning tree (or the lexicographic list) of the graph.

So instead of working on the general graph of a problem, you simplify it down into a simpler structure, and THEN work on the simplified structure.


The duality is super useful in practice!

In the table/dataframe -> hypergraph dataframe transform @ https://github.com/graphistry/pygraphistry , we do `hypergraph(multicolumn_table, direct=True | False)['graph'].plot()` , which renders hypergraphs as a regular graph. That lets you pick. Saves a lot of wrangling!

Consider exploring some logs of customer activity or security events. A hyperedge becomes either:

- a node of a bipartite graph. Ex: each log event becomes a node connecting the various entity nodes it mentions. Event <> IPs, accounts, countries, ...

- .. or a bunch of pairwise entity<>entity edges. Ex: connect each IP<>account<>country directly, and label each edge with the hyperedge event id it came from.

In both cases, you can now directly leverage a lot of traditional graph thinking, and in our case, GPU acceleration & visualization.

Other systems might render hyperedges as say circles encomposing their nodes, but that's trickier at even small/medium scales, so I haven't seen much widely popular

I increasingly just directly equate 'logs' with 'property hypergraphs' and skip the relational step. Funny enough, a lot of our enterprise+gov work is undoing the weird tabular & taxonomy optimizations of SQL engines that leak into the user experience when we're to get simple 360 views for users. It's cool an org built out 10,000 tables, but..... :) Same thing when we're doing graph neural networks: logs => hypergraphs (encoded as bipartite or regular property graphs) => event embeddings / classifications / predictions .


> I increasingly just directly equate 'logs' with 'property hypergraphs' and skip the relational step. Funny enough, a lot of our enterprise+gov work is undoing the weird tabular optimizations of SQL engines to get 360 views of data back to users. It's cool someone has 10,000 tables, but..... :)

The more I program, the more I realize that all these data-structures are conceptually the same if you squint enough.

Graphs are matrices (Edge == 1. Non-edge == 0). Matrices are graphs (see sparse matricies, like COO and its really obvious). Computer Code is graphs, Databases are Hypergraphs. Circuits are hypergraphs. Etc. etc. Everything seems to convert into each other.

The study of NP-completeness is the study of graph (or hypergraph) complexity. Chordal graphs and Bipartite Graphs seem to have significantly more algorithms in P-complexity rather than NP-complexity. (Assuming P =/= NP)

--------

There are questions about which forms are easier to think about... both for the human and the computer. At least, for the equivalent forms (Relational Databases vs Hypergraph representations seem to be equivalent, though I'm not 100% sure)

When one form is "less powerful" than another form, you often get significant improvements in speed. Lists are less powerful than trees, but almost all list algorithms operate way faster.


"Hypergraphs can be viewed as incidence structures. In particular, there is a bipartite 'incidence graph' or 'Levi graph' corresponding to every hypergraph, and conversely, most, but not all, bipartite graphs can be regarded as incidence graphs of hypergraphs."


A hypergraph over vertices V can be viewed as a subset of the power set of V, i.e. E ⊆ 2^V. Which can be represented by a bipartite graph over vertices (V ∪ E), where v ∈ V and e ∈ E are connected iff V ∈ E.


Representing a hypergraph as a bipartite graph doesn't lose any information. The entire structure is still there. It will take you two pointer indirections instead of one to traverse an edge, but otherwise it's exactly the same - all the connections are still in the right places.


There must be something being lost in translation here, because bipartite graphs cannot represent all graphs, let alone hypergraphs.

* A -> B

* B -> C

* C -> A

This is a simple, 3-node cyclical graph that fails to be bipartite. All graphs are hypergraphs, so this is also an example of a hypergraph that fails to be bipartite.

EDIT: Funny enough, if we remove C->A and replace it with "C -> D" and "D->A" it'd be bipartite, with A/C as one side and B/D being the other side.

-------

We can convert this cyclic graph into other representations. For example:

* A-> B

* B-> C

Now we have a minimum spanning tree, which also happens to be a list [A, B, C]. This was accomplished by removing an edge (the C->A edge).

----

There are a great many operations that "see" a graph in other forms. For example, the following list describes the graph:

    1. A->B
    2. B->C
    3. C->A
This "list representation" of the graph is certainly... a list... and it captures all of the information of the graph. But its also not really useful to do graph operations in this form. It may take a long time to find a node's edges for example.

I'm assuming you're saying that there's some kind of "bipartite form" that can accurately describe all hypergraphs. But while that's technically true, that's also not really what people mean.


I'm not saying that all hypergraphs - or all graphs - literally are bipartite graphs, I'm saying that you can construct a bipartite graph that represents the same structure as the hypergraph.

Given the hypergraph:

  A -> (B, C)
you can convert it to a bipartite graph by introducing an extra vertex of the second type to represent the hyper-edge:

  A -> e
  e -> B
  e -> C
You can do the same thing starting from a "regular" graph, although it's kind of silly for most purposes; for the basic cycle, you'd have:

  A -> e1
  e1 -> B
  B -> e2
  e2 -> C
  C -> e3
  e3 -> A
I'd say that's a much more valid representation of the graph than the spanning tree, which is lossy (since you can't tell from [A, B, C] whether or not there's an edge C->A).


You have a vertex set {A, B, C} and an edge set {A->B,B->C,C->A}. I'll label the edges D,E,F for convenience.

We define an incidence structure between the vertices and edges, by making red vertices to represent original vertices, and blue vertices to represent original edges. A directed edge A->B is represented by the path A->D->B in our incidence structure. In total, our incidence structure has six nodes {A,B,C,D,E,F} and six edges:

  A->D
  B->E
  C->F
  D->B
  E->C
  F->A
Said differently, we have subdivided every edge of the original graph into two edges in the incidence structure. That doubles the lengths of all cycles, so we don't see any odd cycles; also we have an explicit bipartition into red and blue.

All that said... directed hypergraphs are kinda wild.


The idea is that if you have a bipartite graph and the sets V and W are the vertex sets for the bipartition, then V is the hypergraph's vertices and W is the hypergraph's hyperedges. A hyperedge w in W is incident to whatever it's adjacent to according to the bipartite graph.


The undirected graph is isomorphic to this bipartite graph:

  U: { a, b, x }
  V: { e1, e2, e3 }
  E: { (a, e1), (b, e1), (b, e2), (c, e2), (c, e3), (a, e3) }
> But while that's technically true, that's also not really what people mean.

Not sure who these people are, but there are definitely cases where the bipartite form is useful.


Great comment. I've worked on modeling a specific kind of data as graph and it was too limited. I thus used hypergraph and realized after a while it was in fact relational data, with hyperedge being like table like you explained. I still sticked to the graph model as it was convenient to not have the relationships hardcoded in a database.


For people who want to understand more of the math behind things like hypergraphs.

Hypergraphs as math object:

https://en.wikipedia.org/wiki/CW_complex

Or with data:

https://en.wikipedia.org/wiki/Sheaf_(mathematics)

And then you find things like hypergraphs in data fusion:

https://www.youtube.com/watch?v=b1Wu8kTngoE

Or related to logic:

https://ncatlab.org/nlab/show/cubical+type+theory


I think there's a typo:

> It is in the shape of a Directed Acyclic Rooted Tree or simply a tree. The neat thing is, a tree-shaped computation is always "one-pass". Each node can only be passed at least once.

Should be:

> Each node can only be passed at most once.


Author here. Thank you for catching that (and taking time to read)! Fixing it up tonight.


I think I found one more minor typo – in the Control Flow Graph section I believe "The possible number of calls to c is zero to infinity;" should be "zero to one" instead, as the flow terminates as soon as it reaches "c", so there is no chance for it to be greater than one.


Thanks. Great catch!

One thing to learn from this, working past 6pm will most likely yield bugs rather than values.


No, thank you. I thoroughly enjoyed the article. I also enjoy the title, although it caused some cognitive dissonance by implying that the primary thesis would be: to be careful trying to use trees, where meshes are more appropriate.

Instead the concluding thesis was: "Whenever Possible, Make a Tree" and due to my expectations I thought for a second that I may have accidentally skipped over some section which championed meshes.

Potentially, putting the thesis as a single sentence near the top would prime readers so they can accurately ingest the rest of the article with proper prejudice.


"Whenever Possible, Make a Tree" I've thought about it, but I added " and the Other Way Around" instead, just because I feel the original title gets more glances than the alternatives.


Is “mesh” in this context a term I’m not aware of? The author never defines it.

My guess is that “mesh” here is supposed to mean “directed cyclic graph” but it’s not very clear.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: