
Computation Graphs and Graph Computation - bmc7505
http://breandan.github.io/2020/06/30/graph-computation/
======
dan-robertson
I don’t really understand what the post is trying to say or whom it is
targeted at. But maybe that’s because I only skimmed it. It feels like a big
brain dump of random notes rather than anything cohesive or particularly
complete.

I found the frequent self-quotes odd and the self-congratulatory introduction
more odd. An example of the lack of target audience would be the section “what
are graphs?” It starts with a paragraph which seems to be for someone who has
never heard of graphs (yet it doesn’t actually define what a graph is, but
maybe that’s because there’s so much variety in what different things may be
graphs). It also calls graphs “data structures” which I don’t think I would
agree with. It’s then followed by a big list of links and then a paragraph
referencing Kotlin∇ and talking about trees and such even though the previous
paragraph seems to be for someone who doesn’t know what a graph is.

I found other sections weird. E.g. the author seems to think that it is
important to treat a matrix as a function between vector spaces but somehow
misses the main point (this function must be _linear_ and the numbers in a
matrix are in some sense unimportant because they are relative to some basis).
Then there’s a section about raising matrices to a power except when you look
at it you realise it’s actually a product of different matrices and actually
the topic just changed halfway through the paragraph.

The “matrix multiplications” in the section on data flow graphs don’t make
sense to me, even with the random extra rules given above (the third entry in
the second row of S' should be 5, but this causes the next row to not have the
idempotence property the author wants). It also seems to be confusing matrices
with manipulating random numbers in a grid.

I don’t want to be overly negative. There’s probably some good stuff in here
but at the moment it just appears to be a dump of the author’s notes and
thoughts.

~~~
brzozowski
Hi Dan, thanks for taking the time to read this and leave your feedback. I'll
try to address your remarks one-by-one.

> I found the frequent self-quotes odd and the self-congratulatory
> introduction more odd.

Thank you for the candid feedback! I have reflected a bit about this and where
it comes from. It detracts from the main theme, which I regret, but also gives
the reader context. Like many, I have flaws and ego is one of them. I try to
be upfront about this, so the reader can bail early.

> It starts with a paragraph which seems to be for someone who has never heard
> of graphs.

This is a good point. I did not think very carefully about the audience. It is
mostly, as you mention, a brain dump. At the same time, I am trying to explain
some ideas which may be familiar to you, but others are encountering for the
first time. I agree, it could be clearer which parts are for which audience.

> this function must be linear and the numbers in a matrix are in some sense
> unimportant because they are relative to some basis

I was not attempting to be mathematically rigorous, but I think you raise an
important point here, and I will try to clarify it more carefully.
Linearity can be a constraint in real vector spaces, but is less of a problem
in Boolean algebra. The goal here is just to show how matrices can represent a
function between vector spaces.
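
To illustrate with a toy sketch (not code from the post): over the Boolean
semiring, an adjacency matrix acts as a function mapping a set of vertices,
encoded as a 0/1 characteristic vector, to the set of vertices reachable in
one step. The 3-vertex graph below is made up for illustration.

```python
import numpy as np

# Made-up 3-vertex graph: 0 -> 1, 1 -> 2.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])

s = np.array([1, 0, 0])          # characteristic vector of the set {0}
step = (s @ A > 0).astype(int)   # one step of reachability
print(step)                      # [0 1 0], i.e. the set {1}
```

Over the (OR, AND) semiring, linearity comes for free, which is partly why I
glossed over it in the real-vector-space setting.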

> Then there’s a section about raising matrices to a power except when you
> look at it you realise it’s actually a product of different matrices.

Not sure I follow, can you be more specific? I will try to clear this up.

> The “matrix multiplications” in the section on data flow graphs don’t make
> sense to me, even with the random extra rules given above.

Completely agree, this algorithm needs to be made more explicit, because it is
a poorly written example. The full example can be found in Miller (1987).

[https://www.cs.cmu.edu/~glmiller/Publications/MRK86b.pdf](https://www.cs.cmu.edu/~glmiller/Publications/MRK86b.pdf)

> I don’t want to be overly negative.

Not at all! Again, thanks so much for your comments. I would be happy to discuss
them further here or by email.

------
jix
I haven't read everything, but it does contain a section about something I'm
quite familiar with:

The article says "We compute the hashcode of the entire graph by hashing the
multiset of WL labels. With one round, we’re just comparing the degree
histogram. The more rounds we add, the more likely we are to detect a
symmetry-breaker:", which is not correct.

Repeatedly re-labeling a graph by assigning a unique label to each unique
neighborhood histogram, until the corresponding partition reaches a _fixpoint_
would compute the _one-dimensional_ Weisfeiler-Lehman refinement. But there
are _many_ non-isomorphic graphs that you cannot tell apart with this! It
cannot tell apart any two regular graphs of the same size and degree, and you
can easily run into those in practice.

Even if the code repeated this process until a fixpoint is reached, instead of
hoping that 10 iterations suffice, the existence of non-isomorphic graphs that
result in the same WL partition means that "The more rounds we add, the more
likely we are to detect a symmetry-breaker" doesn't hold.
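
To make this concrete, here is a rough sketch (my own code, not the
article's) of 1-WL refinement run to a fixpoint. It cannot tell a 6-cycle
from two disjoint triangles, since both are 2-regular on 6 vertices:

```python
def wl_refine(adj):
    """1-WL color refinement to a fixpoint; adj maps vertex -> neighbors."""
    colors = {v: 0 for v in adj}
    while True:
        # Signature = own color plus sorted multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in adj}
        relabel = {}
        new = {v: relabel.setdefault(sigs[v], len(relabel)) for v in adj}
        if len(set(new.values())) == len(set(colors.values())):
            return sorted(new.values())  # stable color multiset
        colors = new

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# Non-isomorphic, but the stable WL color multisets coincide:
print(wl_refine(cycle6) == wl_refine(two_triangles))  # True
```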

There are also more efficient ways of computing this.

Now apart from one-dimensional WL refinement, where you compute a label from
the neighborhood of individual vertices, there is also k-dimensional WL
refinement, which very roughly speaking looks at all length k sequences of
vertices and computes statistics for those. Here it is true that as k reaches
|V| you will be able to tell apart any two non-isomorphic graphs, but only
because if k=|V| you're trying all permutations! There is no fixed k that will
tell apart any two graphs.

If you want to perform graph isomorphism tests in practice, I'd recommend
Traces: [http://pallini.di.uniroma1.it/](http://pallini.di.uniroma1.it/)

It combines two-dimensional WL refinement with a recursive search to tell
apart all graphs, while using computational group theory algorithms to
efficiently prune the search space for graphs with many symmetries. It is very
efficient in practice.

It is described in
[https://arxiv.org/abs/0804.4881](https://arxiv.org/abs/0804.4881) (also
linked from the website) and also has references for everything I wrote
(modulo any mistakes I made while summarizing).

~~~
brzozowski
Thanks for your feedback! I like the WL algorithm for its simplicity, but
agree there are specific cases it does not handle well. It is meant to
illustrate a simple message passing algorithm, and is not a particularly
efficient implementation. Will clarify this point, thanks!

------
Kednicma
This is a big post. I'll point out what is especially interesting in terms of
equivalences.

A square matrix on the Booleans is indeed also equivalent to a directed graph,
as explained. It is also equivalent to a Chu space [0], specifically a sort of
unitary Chu transformation.

We can specialize slightly. Let our square matrices be upper triangular; that
is, let the lower triangle not including the diagonal be all 0/false. Then the
corresponding graph is a DAG. (Apologies if this was in the article; I saw
hints of it, but never the direct statement!) The corresponding Chu space has
poset structure. These three facts are all related by the fourth fact that the
matrix's rows are labels for a topological sorting of the DAG, and the fifth
fact that DAGs are equivalent to posets; we obtain a sixth fact that posets
are equivalent to upper triangular Boolean matrices.

[0] [http://chu.stanford.edu/](http://chu.stanford.edu/)
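
A quick sketch of the matrix/DAG/poset correspondence, with a made-up
example:

```python
import numpy as np

# Strictly upper triangular Boolean matrix: every edge i -> j has i < j,
# so the row order 0, 1, 2, ... is already a topological sort of the DAG.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

assert (np.tril(A) == 0).all()  # lower triangle (incl. diagonal) is empty

# The poset's strict order is the transitive closure, obtained by
# iterating Boolean matrix products: i < j iff i reaches j.
reach = A.copy()
for _ in range(len(A)):
    reach = ((reach + reach @ A) > 0).astype(int)

print(reach)  # row 0 reaches {1, 2, 3}; rows 1 and 2 reach {3}
```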

~~~
bmc7505
Interesting! I was not aware of the connection to Chu spaces, will have a look
into that! Algebraic topology seems to have an endless trove of
interesting dualities. I did find the connection between topological ordering
and boolean matrix multiplication to be really interesting. I was taught a
queue-based algorithm as an undergrad, so learning about this was a pleasant
surprise.
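
For reference, the queue-based algorithm I mean is Kahn's topological sort; a
rough sketch (the example graph is made up):

```python
from collections import deque

def kahn(adj):
    """Topological sort: repeatedly remove a vertex with in-degree zero."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for u in adj[v]:
            indeg[u] += 1
    queue = deque(v for v in adj if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            indeg[u] -= 1
            if indeg[u] == 0:
                queue.append(u)
    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order

print(kahn({'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}))
# ['a', 'b', 'c', 'd']
```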

~~~
adamnemecek
All dualities are the same duality, namely adjoint functors. Wrote a brain
dump on it
[https://github.com/adamnemecek/adjoint](https://github.com/adamnemecek/adjoint)

~~~
bennofs
I was about to comment that these Chu spaces look vaguely like Galois
connections. Turns out Galois connections are just a special case of adjoint
functors as well, so thanks for making that connection!

------
diydsp
I read the whole thing - delighted. More of a programming "implementer" than a
"designer," I have intuited things like this before... And when I was required
to learn control theory and simulink (which uses graphs), it grew deeper in
me. I had a system of equations with a dozen variables (representing a medical
device and body physiology)... Some measured accurately, some interstitial
guesses. I wanted to be able to transform the system around into many forms. I
eventually learned Minsky, simulink and modelica, but wanted better tools.

From that experience, one thing I would consider adding to your system is
physical units. I independently discovered that physics units (Ohms, inertia,
etc.) can be described as a vector. The other thing that is missing, and would
be a boon, is a real-time element for design. So many of the UIs are blocky in
their design, meaning you sketch for a while, then press a large, heavy
compile button, and wait for your results... To counteract this frustration, I
have been undertaking experiments in real-time compiling. Please contact me if
you're looking for a co-conspirator! I have been
conceiving and part-time developing an environment where you can model systems
continuously in real-time (or faster). Each time you make a change, it runs a
number of tests and in fact, your changes don't have to be "complete." The
system can iterate in the direction you send it and converge to your spec.
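
Roughly, the units-as-vectors idea looks like this (a sketch using the
standard SI base dimensions; my actual representation differs in its details):

```python
import numpy as np

# Each unit is a vector of exponents over the base dimensions
# (kg, m, s, A); multiplying quantities adds vectors, dividing subtracts.
KG, M, S, A = np.eye(4, dtype=int)

volt = KG + 2 * M - 3 * S - A  # V   = kg.m^2.s^-3.A^-1
ohm = volt - A                 # Ohm = V / A = kg.m^2.s^-3.A^-2
print(ohm)                     # [ 1  2 -3 -2]
```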

~~~
bmc7505
Hey, thanks for your comment! I'm a big fan of DIY "implementers" like
yourself and am happy to hear some of these ideas resonated with you. I come
from a similar background, and agree we need better tools for thinking and
working with complex software systems.

> One thing I would consider adding to your system is physical units.

Totally agree. This is high up there on my priority list. Kennedy (2006) did
some early work on dimension and unit types that I was quite inspired by:

[http://typesatwork.imm.dtu.dk/material/TaW_Paper_TypesAtWork...](http://typesatwork.imm.dtu.dk/material/TaW_Paper_TypesAtWork_Kennedy.pdf)

> I independently discovered that physics units (Ohms, inertia, etc.) can be
> described as a vector.

Interesting! I guess that makes sense. Do you have any links describing this
perspective a bit? I would be interested in reading about it.

> I have been undertaking experiments in real-time compiling. Please contact
> me if you're looking for a co-conspirator!

Sounds cool! One of the things I'm interested in is REPLs and interactive
programming environments. Do you have anywhere I can read about this project
in more detail?

edit: I noticed you were interested in musical instruments and DSPs. One of my
colleagues, Maxime, is also working on synthesizers and music production,
which I thought you might be interested in reading about:

[https://pointersgonewild.com/2019/10/06/zupiter-a-web-based-...](https://pointersgonewild.com/2019/10/06/zupiter-a-web-based-modular-synthesizer/)

~~~
diydsp
> physics units (Ohms, inertia, etc.) can be described as a vector.

It really resembles that Kennedy paper quite a bit. The big difference is that
Kennedy's version focuses on using human-readable descriptions in code, albeit
with heavy syntax, whereas my version is more normalized and suitable for
machine implementation, e.g. FPGA/ASIC.

I made a brief video for you explaining it:
[https://www.youtube.com/watch?v=REZcTVS4TCk](https://www.youtube.com/watch?v=REZcTVS4TCk)

One thing I realized while making it is that it's a great way to increase your
intuition of what the units mean in physics!

------
bmc7505
Hi HN, author here! This post discusses some research I and my colleagues at
McGill have been working on recently, borrowing ideas from graph theory and
linear algebra to model computation. If any of these topics interests you,
feel free to drop me an email at bre@ndan.co -- would love to get your feedback
about GPGPU compilation, boolean function analysis, abstract rewriting systems
or any of the topics discussed. Thanks!

------
goodmattg
Intriguing post! Back in my undergrad days I wrote a course thesis on
Syntactic Tree Kernels applied to the problem of determining duplicate
question on Quora. Like you, I am convinced that using graphs to link
semantics with structure is under-explored:

[https://github.com/goodmattg/quora_kaggle](https://github.com/goodmattg/quora_kaggle)

[https://goodmattg.github.io/articles/2017-09/Syntactic-Tree-...](https://goodmattg.github.io/articles/2017-09/Syntactic-Tree-Kernels)

Further, as someone who came out of the graph signal processing world
(Ribeiro, Gama, et. al.), and with recent experience in types and functional
programming (Pierce, etc.), I too admire the beauty of using types and graphs,
but as you note, hardware implementation is a fundamental constraint to
production. Perhaps the IPU units from Graphcore are a move towards this? I
could use clarification on your points regarding next steps in low-level
languages and hardware.

~~~
brzozowski
Thanks for your comments! I enjoyed reading about STKs, which I had not heard
about before. I know lots of people are interested in applying kernel methods
to graphs, and it seems like an important area of research. As your post
discusses, one of the problems with defining an algebraically valid graph
kernel arises with Mercer's symmetric condition. As far as I know, defining a
valid convolutional kernel on graphs is still an active area of research.
Hamilton (2020) discusses this problem in the notes from his recent GRL
seminar here:

[https://cs.mcgill.ca/~wlh/comp766/files/chapter6_draft_mar29...](https://cs.mcgill.ca/~wlh/comp766/files/chapter6_draft_mar29.pdf)

> I am convinced that using graphs to link semantics with structure is under-
> explored.

I hope you continue to explore this direction! Designing better semantic
parsing algorithms seems really important for knowledge retrieval. I would
love to learn more about natural language parsing in general (e.g. grounding,
PCFGs, HMMs), this is one area I feel we could learn a lot from in programming
languages research.

> Perhaps the IPU units from Graphcore are a move towards this? I could use
> clarification on your points regarding next steps in low-level languages and
> hardware.

Not very familiar with the Graphcore architecture, although their engineers
have tried to explain it to me several times. I think this is going to be a
co-design problem, but we can start to realize some progress by compiling to
pure BLAS primitives. GPGPU-based approaches seem like a good start.

------
Kinrany
The article starts with details from the author's biography of dubious
relevance, and there's no table of contents.

~~~
bmc7505
Agreed, the biographical details are not super relevant and it would help to
have a ToC. I have added one to the intro. Thanks for reading!

~~~
ojnabieoot
Not to pile on, but: I would urge you to completely delete all of those
biographical details. They are not only irrelevant - they are a huge turnoff
which destroys your credibility for what is otherwise a decent undergraduate-
level blog post. The details about how brilliant you are make you seem both
arrogant and deeply ignorant. Nobody cares about a dubious and meaningless
"prediction" about automations or pandemics or whatever. Nobody thinks your
prediction that MS would buy GitHub indicates a deeper understanding of typed
languages and machine learning. None of it makes you look intelligent or
prescient. It just makes you look like a foolish narcissist.

~~~
brzozowski
Thanks, this is helpful feedback. I was thinking about this, and where it
comes from in my writing. I’m not going to erase these details, because it
documents who I am, and my frame of mind. If you leave with a poor opinion of
the writer because of it, that’s something we share in common. I think people
should keep their egos in check, and I try my best to do so. But you cannot
erase ego entirely, and to hide it would be disingenuous. Ego is part of who
we are as human beings, and part of what gives us motivation. This blog is
partly a research journal, and partly because I seek validation from Hacker
News. I am no genius, I just listen to the right people and write about ideas
that interest me. If it makes the reader uncomfortable, I can completely
understand that. I’m not trying to hide anything, ego included.

Anyhow, I just wanted to say I appreciate your taking the time to write this
comment!

------
de6u99er
Disclaimer: didn't read the whole thing, but here's some feedback:

> Graphs are not only useful as cognitive aides, but are suitable data
> structures for a wide variety of tasks, particularly on modern parallel
> processing hardware.

I hope that by parallel processing you mean on a single machine. Reasoning on
distributed graphs is difficult to do. The semantic web community could talk
for days about what doesn't work.

> Sets: datasets, multisets, posets, alphabets

I guess by alphabets you mean dictionaries.

When you talk about types as graphs, I guess you mean nested objects with
collection attributes/types.

Where I agree with you is that graphs feel more natural than other structures
because our brain is a gigantic graph of neurons.

I don't agree that the web is a directed graph. Not even the semantic web
because of a lack of data quality, ridiculous ontologies, and a massive amount
of partially duplicated instances.

I wouldn't include formula or code parsing because in such cases they are just
a means to an end.

What I would include, on the other hand, are database execution plans, which
are used for query optimization. You might want to talk here to someone at
Oracle.

That all being said, I'd like to reiterate my suggestion of talking to the
semantic web community. Maybe start networking at the ISWC conference first. I
always dreamt of everybody's personal knowledge graph, where explicit data is
linked through everybody's individual ontology into something that is
particularly understandable for the creator of the specific ontology.

They definitely need someone with a different perspective. The guys from
Cambridge Semantics are quite good.

~~~
brzozowski
> I hope you mean by parallel processing, on a single machine.

Not necessarily! Parallelism in the more general sense. We can parallelize
matrix multiplication and there is good research on distributing graph
algorithms across multiple machines. Spielman discusses this in his recent
talk on Algebraic Graph Theory:

[https://youtu.be/CDMQR422LGM?t=1755](https://youtu.be/CDMQR422LGM?t=1755)

> I guess by alphabets you mean dictionaries.

Just unordered collections of symbols. Depending on the dictionary or alphabet
implementation, it might have an ordering.

> I don't agree that the web is a directed graph.

This was intentionally vague, but depending on which types of links you
consider it may or may not have directionality. Hyperlinks, for example, are
directional.

> I wouldn't include formula or code parsing because in such cases they are
> just a means to an end.

Part of my goal is to show how simple implementing these algorithms can be.
Agree it doesn't flow very well, maybe it should be in the appendix.

> You might want to talk here to someone at Oracle.

Agreed, I think the work they're doing in probabilistic programming languages
is super interesting. I recently gave a short presentation about loopy belief
propagation (another interesting graph algorithm for PGMs).

[https://github.com/breandan/kaliningraph/blob/master/latex/c...](https://github.com/breandan/kaliningraph/blob/master/latex/comp766/presentation.pdf)

> That all being said, I'd like to reiterate on talking to the semantic web
> community...Cambridge Semantics are quite good

Will definitely look into them, thank you! I am fascinated by the whole topic
of knowledge bases and semantic parsing, and appreciate the suggestions you
provided.

------
maxsun
I enjoyed reading through this -- maybe more-so than other commenters since
I'm less familiar with the material. It seems like the overall goal is to
provide some intuition for how automatic differentiation & program synthesis
might be implemented efficiently using matrices. However, I'm a little
confused about how the section on Weisfeiler-Lehman & checking for isomorphic
graphs ties in to the rest of the article?

~~~
brzozowski
> It seems like the overall goal is to provide some intuition for how
> automatic differentiation & program synthesis might be implemented
> efficiently using matrices.

Right! I wish I had emphasized that theme a little more. This is basically a
long digression on how to think about automatic differentiation in the context
of program synthesis, two topics which I enjoy thinking about.

> However, I'm a little confused about how the section on Weisfeiler-Lehman &
> checking for isomorphic graphs ties in to the rest of the article?

Great question! I introduce isomorphism testing for two reasons:

1. It is an example of a general-purpose algorithm which, although we do not
show it, can be implemented completely using matrix multiplication.

2. It is an example of how message passing on graphs works. This helps us
build an intuition for how to implement more complex graph algorithms.

Hamilton (2020), does a much better job at introducing the WL algorithm and
why it matters in the context of graph representation learning:

[https://cs.mcgill.ca/~wlh/comp766/notes.html](https://cs.mcgill.ca/~wlh/comp766/notes.html)

Feel free to drop me an email if I can help clear anything up. Thanks for
reading carefully and leaving a comment!

------
hasperdi
Interesting article, but some parts in the beginning are useless and detract
value from the interesting bits.

The author boasted that he made some correct predictions about certain events
such as Microsoft acquiring GitHub in 2018. Well I bet he made other
predictions too in the past that didn't come true, but they didn't get any
mention. Hindsight bias is a real thing.

Idea for next article: explainer of human cognitive biases
([https://en.wikipedia.org/wiki/List_of_cognitive_biases](https://en.wikipedia.org/wiki/List_of_cognitive_biases))

~~~
brzozowski
Sure, I made plenty of bad predictions too! I have written about some of the
harmful effects of trust and bias in automation, feel free to check it out:

[https://breandan.net/2017/02/02/trust-in-automation/](https://breandan.net/2017/02/02/trust-in-automation/)

------
_pjs_
The Roadmap part reminds me of the [Futamura Projections]. GraalVM tries to
implement them. Although the method is different, the outcome is strikingly
similar.

[Futamura Projections]:
[https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_pr...](https://en.wikipedia.org/wiki/Partial_evaluation#Futamura_projections)

~~~
brzozowski
Yes, I was originally inspired by GraalVM’s work! They were one of the first
to realize a practical implementation of Futamura’s ideas. I briefly discuss
it towards the end:

[http://breandan.net/2020/06/30/graph-computation/#programs-a...](http://breandan.net/2020/06/30/graph-computation/#programs-as-graphs)

~~~
_pjs_
oops, skimmed the last part tbh (was a bit too long of a read for my morning
tea). Just saw the roadmap in the end and figured this is where it's heading.
Have you thought of applying this to web-stacks?

But thanks for the great article! Will read the second half more thoroughly
this evening.

~~~
brzozowski
> Have you thought of applying this to web-stacks?

Yes, it seems like a good place to start. I was thinking of writing a PoC
interpreter for a subset of the JS syntax. Thanks for the suggestion!

------
titzer
See "The Program Dependence Graph and Its Use in Optimization"

[https://www.cs.utexas.edu/~pingali/CS395T/2009fa/papers/ferr...](https://www.cs.utexas.edu/~pingali/CS395T/2009fa/papers/ferr..).

~~~
brzozowski
Very cool, thank you! Link was broken, here is the full reference:
[https://www.cs.utexas.edu/~pingali/CS395T/2009fa/papers/ferr...](https://www.cs.utexas.edu/~pingali/CS395T/2009fa/papers/ferrante87.pdf)

------
jvanderbot
This reads like one of those "Golden ratio appears everywhere!" posts. It's
overly self-congratulatory and a scattershot of ideas with varying levels of
technical depth which makes it very hard to track the topic.

~~~
brzozowski
Definitely the product of an over-excited grad student who’s fallen in love
with these ideas for the first time. Thanks for reading, I hope you were able
to get something from it (if not what to avoid)!

------
jcelerier
isn't differentiable programming a concept that has existed in one shape or
another since the 80s, likely earlier? E.g.
[https://dl.acm.org/doi/abs/10.1145/322609.322611](https://dl.acm.org/doi/abs/10.1145/322609.322611)

Like, I'm pretty sure writing a generic "differential function" is one of the
first things you teach to undergrads taking a LISP class - isn't "differential
programming" as mentioned here just the same concept baked into PLs?
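
E.g. the classic exercise, sketched here in Python rather than LISP (the
tuple representation is my own):

```python
# Minimal symbolic differentiation over expressions built from
# ('+', a, b) and ('*', a, b), numbers, and variable names.
def deriv(expr, var):
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):  # a variable symbol
        return 1 if expr == var else 0
    op, a, b = expr
    if op == '+':
        return ('+', deriv(a, var), deriv(b, var))
    if op == '*':              # product rule
        return ('+', ('*', deriv(a, var), b), ('*', a, deriv(b, var)))
    raise ValueError(op)

# d/dx (x * x + 3), before any simplification:
print(deriv(('+', ('*', 'x', 'x'), 3), 'x'))
```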

~~~
brzozowski
It has been rebranded a number of times, from automatic differentiation to
algorithmic differentiation to differentiable programming. Takeo Imai wrote a
blog post about the origins and history of the term and its modern usage:

[https://bonotake.github.io/deep%20learning%20and%20cs/2019/0...](https://bonotake.github.io/deep%20learning%20and%20cs/2019/08/02/origin-of-differentiable-programming.html)

