
Viewing Matrices and Probability as Graphs - _Microft
https://www.math3ma.com/blog/matrices-probability-graphs
======
edflsafoiewq
Re viewing matrices as graphs

Under the usual matrices-as-linear-maps interpretation, an n×m matrix is a
linear map from the free vector space on m elements to the free vector space
on n elements, M : Vect(Y) -> Vect(X), where X and Y are as in the article.

For a regular map of finite sets Y -> X, we can draw a picture showing how
each element of Y goes to an element of X by drawing a bunch of dots for Y and
a bunch of dots for X with an arrow from each dot in Y to its image in X. This
is a directed bipartite graph where each dot in Y is the source of exactly one
edge.

How is a linear map, Vect(Y) -> Vect(X), different? Each element of Y goes not
to a single element of X, but to a linear combination of elements of X. If y
goes to a·x1 + b·x2, we can think of this as two weighted arrows out of y:
one arrow to x1 with weight a and one to x2 with weight b
(considering edges with weight 0 to be non-existent if you like). This is the
graph shown in the article.

This interpretation shows why matrix multiplication is path composition. It's
exactly analogous to how, for the map of finite sets, you find the image of z
by first following its arrow to y in Y, then following _its_ arrow to x in
X. Here you first follow z to all its images in Y, then you follow those to
all their images in X. Relation composition is just a special case.
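
This path-composition reading can be checked directly with a small sketch (the weights here are made up for illustration): summing over every intermediate dot in Y reproduces exactly the matrix product.

```python
import numpy as np

# Made-up weights: N carries dots in Z to combinations of dots in Y,
# M carries dots in Y to combinations of dots in X.
M = np.array([[1, 2],      # arrows from y1, y2 into x1
              [0, 3],      # arrows into x2
              [4, 0]])     # arrows into x3
N = np.array([[5, 1],      # arrows from z1, z2 into y1
              [2, 0]])     # arrows into y2

# Path composition by hand: for each pair (x, z), sum over every
# intermediate y of (weight of z -> y) * (weight of y -> x).
paths = np.zeros((3, 2))
for i in range(3):          # target dot x
    for k in range(2):      # source dot z
        for j in range(2):  # intermediate dot y
            paths[i, k] += M[i, j] * N[j, k]

assert np.array_equal(paths, M @ N)  # same answer as matrix multiplication
```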

And of course the connection between X×Y -> R and Vect(Y) -> Vect(X) is the usual
correspondence between a rank (1,1) tensor and a linear map.

Re probabilities: this corresponds to imagining a random process that
transforms elements of Y into elements of X where y turns into x with
probability p(x,y). The probability that you end up with x (the marginal
probability) is the probability that any dot in Y transformed into x, i.e. the
sum over all the arrows that wind up at x.
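
As a small sketch of that random process (the probabilities are made up): with a column-stochastic matrix P and a starting distribution q over Y, "summing the arrows that wind up at x" is just the matrix-vector product.

```python
import numpy as np

# Entry P[x, y] = p(x, y): the probability that y turns into x.
# Columns sum to 1 (made-up numbers for illustration).
P = np.array([[0.2, 0.7],
              [0.8, 0.3]])

# A made-up starting distribution over Y.
q = np.array([0.5, 0.5])

# Marginal over X: for each x, sum over all arrows landing at x,
# weighted by how likely we were to start at each y.
marginal = P @ q
print(marginal)   # [0.45 0.55]
assert np.isclose(marginal.sum(), 1.0)
```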

------
ironSkillet
If you like this idea, also check out
[https://graphicallinearalgebra.net](https://graphicallinearalgebra.net).

It also describes fundamental linear algebra operations in terms of graphs,
but with a slightly different flavor, and much more detail.

~~~
blondin
thank you so much for sharing! this looks very interesting.

------
michelpp
There is an API formalizing this style of matrix/graph computation called
GraphBLAS, coming soon to a language and database near you.

Here is an easy-to-read but math-focused introduction to the topic by Dr.
Jeremy Kepner, head and founder of the MIT Lincoln Laboratory Supercomputing
Center:

[http://www.mit.edu/~kepner/GraphBLAS/GraphBLAS-Math-release.pdf](http://www.mit.edu/~kepner/GraphBLAS/GraphBLAS-Math-release.pdf)

~~~
espeed
What do you mean "coming soon" -- GraphBLAS 2.3.0 was just released [1] --
it's been here! ;)

[1]
[http://faculty.cse.tamu.edu/davis/GraphBLAS.html](http://faculty.cse.tamu.edu/davis/GraphBLAS.html)

[2] [http://graphblas.org](http://graphblas.org)

For more context, depth, and previous discussions, see...

[https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&prefix&page=0&dateRange=all&type=all](https://hn.algolia.com/?query=GraphBLAS&sort=byPopularity&prefix&page=0&dateRange=all&type=all)

~~~
michelpp
It's true. I look forward to seeing more language bindings, like Python,
JavaScript, etc. I expect it will be soon. For my own GraphBLAS education I
have been working on a PostgreSQL extension:

[https://github.com/michelp/pggraphblas](https://github.com/michelp/pggraphblas)

------
Cogito
The semiring described at the end on Z_2 = {0,1}, with the 'interesting'
addition function where 1+1=1, might be more familiar here as Boolean logic,
where:

0 -> False

1 -> True

× -> AND

+ -> OR

(a AND b)=True iff a=True and b=True; (a×b)=1 iff a=1 and b=1

(a OR b)=False iff a=False and b=False; (a+b)=0 iff a=0 and b=0
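
A quick sketch of this semiring in action (the 0/1 matrices are made up): replacing + with OR and × with AND in the matrix product computes relation composition.

```python
# Boolean-semiring matrix "product": + becomes OR, × becomes AND.
# Two made-up 0/1 matrices standing for relations.
A = [[1, 0],
     [1, 1]]
B = [[0, 1],
     [1, 0]]

n = len(A)
# C[i][k] = OR over j of (A[i][j] AND B[j][k]): is there any
# intermediate j connecting i to k?
C = [[int(any(A[i][j] and B[j][k] for j in range(n)))
      for k in range(n)]
     for i in range(n)]
print(C)  # [[0, 1], [1, 1]]
```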

------
mike_ivanov
And vice versa. Take a graph, write it down as a matrix, apply a discrete
Laplacian operator. Then - just as an example - count the zero eigenvalues
(with multiplicity). This is exactly the number of connected components in
the graph. Pure magic.
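
A minimal sketch of that computation, on a made-up graph with two components:

```python
import numpy as np

# Adjacency matrix of a small graph with two components: {0,1} and {2}.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])

# Discrete (graph) Laplacian: degree matrix minus adjacency matrix.
L = np.diag(A.sum(axis=1)) - A

# The multiplicity of the zero eigenvalue counts connected components.
eigvals = np.linalg.eigvalsh(L)
components = int(np.sum(np.isclose(eigvals, 0.0)))
print(components)  # 2
```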

------
lmeyerov
A fun before and after for me was when trying to explore correlations over
samples with a lot of features. In particular, on survey results on why teams
adopt different programming languages:

V1: Correlations as an interactive heatmap --

[http://lmeyerov.github.io/projects/socioplt/viz/rank.html](http://lmeyerov.github.io/projects/socioplt/viz/rank.html)

V2: Correlations as an interactive graph --

[https://labs.graphistry.com/graph/graph.html?dataset=PyGraph...](https://labs.graphistry.com/graph/graph.html?dataset=PyGraphistry%2FPC7D53HHS5)

Nowadays when we're dealing with wide event data (security, fraud, operations,
etc. investigations), even more fun!

------
TuringNYC
I like how the material was presented. Anyone know what program was used to
create the illustrations -- these are fantastic for explanations!

~~~
abnry
I believe they are hand drawn. Check out more posts on the blog, they are
similarly wonderfully presented!

------
outlace
There's a probability theory and associated software package that uses
stochastic matrices for inference/conditioning/etc:
[https://efprob.cs.ru.nl/](https://efprob.cs.ru.nl/)

------
CMCDragonkai
There's also graphical linear algebra, which takes matrices into a more
category-theoretic interpretation that is also visual:
[https://graphicallinearalgebra.net](https://graphicallinearalgebra.net)

------
Twisol
> [...] what do matrices corresponding to equivalence relations look like?

Actually, the graphic under "Block matrices correspond to disconnected graphs"
illustrates this for us already! (Just mentally replace all the non-zero
entries with 1; the graphs structurally remain the same.)

I initially thought it would be "a disconnected set of cliques", but we've
already established that we're modeling rectangular matrices as bipartite
graphs between the input and output spaces. Square matrices in particular can
be considered to operate over the same vector space, so we could have more
arbitrary graphs -- such as cliques.

------
ajuc
This seems related to the way graphs are often implemented in code -
[https://en.wikipedia.org/wiki/Adjacency_matrix](https://en.wikipedia.org/wiki/Adjacency_matrix)

Except in that case you can start with any graph, not necessarily bipartite, so
you don't divide the vertices into two sets X and Y; you just use the same set
of all vertices V twice, so the matrix is always square:

    V * V -> R
I wonder if the properties from this article still hold for adjacency matrices.
That would be very useful.

~~~
edflsafoiewq
Yes, it's really the same graph. There's really no difference between drawing
two copies of V with arrows going from one to the other, and drawing one copy
of V with arrows staying inside that copy. Either way, if you trace the
arrows from a given vertex you will end up at the same spots.

So for example the interpretation of matrix multiplication as path composition
given here explains why the nth power of the adjacency matrix gives the number
of length-n walks.
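
A small sketch of the walk-counting fact, using a made-up directed 3-cycle:

```python
import numpy as np

# Directed triangle 0 -> 1 -> 2 -> 0.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

# (A^n)[i, j] counts length-n walks from i to j: each entry of the
# product sums over all intermediate vertices, i.e. over all paths.
A3 = np.linalg.matrix_power(A, 3)
print(A3)  # the identity: exactly one length-3 walk from each vertex to itself
```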

(I think the precise relation is that if A is the adjacency matrix for G, and
you draw the graph from this article for the matrix A, then the adjacency
matrix of _that_ graph will be the block matrix

        0 A
        A 0

That's assuming the edges in the article are regarded as undirected; if they
are directed, just replace one of the As with 0.)

~~~
edflsafoiewq
Heh. No one noticed but that should have been

        0 A^T
        A  0

In the matrix-as-arrows-from-dots-in-Y-to-dots-in-X picture, the transpose
means "turn all the arrows around". This also explains the connection between
the transpose and the categorical dual.
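
The corrected block form is easy to sanity-check in code (A is a made-up square matrix; vertices are ordered with the Y dots first, then the X dots):

```python
import numpy as np

# Made-up matrix A, read as arrows from dots in Y to dots in X.
A = np.array([[1, 0],
              [2, 3]])
n = A.shape[0]
Z = np.zeros((n, n), dtype=int)

# Adjacency matrix of the bipartite picture, vertices ordered Y then X,
# with the edges taken as undirected.
B = np.block([[Z, A.T],
              [A, Z]])
assert np.array_equal(B, B.T)  # symmetric, as an undirected graph requires
```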

------
activatedgeek
Given this seamless transition between linear algebra and graph theory, could
someone summarize how exactly one or the other formulation helps to solve
problems better?

I wouldn't want to read through a huge list of things; I'm just trying to get
a sense of what's being achieved by this kind of (re)formulation.

~~~
sjg007
See Bayesian networks, Markov chains...

------
elbear
I've been thinking about this lately. Specifically, I see a connection between
systems theory and graph theory. My intuition tells me that all relationships
between systems can be expressed as graphs.

Can anyone confirm this? I'm not very familiar with either field, so I may be
asking a basic question.

~~~
ironSkillet
You would have to be more precise in describing what you mean by "all
relationships between systems". A simple graph (meaning vertices and edges
with no additional information) can encode the notion of a system and the
existence of relationships between two systems, but without more structure you
can't say much more. But this is already useful: if you have a big system
you're trying to understand, a good first step is to draw a graph of the major
components with edges indicating dependencies.

~~~
elbear
Systems interact through inputs and outputs, right? I mean, one system's
output can be another system's input and even the second system's output can
be the first one's input.

I guess my question was if there's something that can't be described or
modelled with graphs.

As I said, my intuition is fuzzy, because I don't know much graph theory.

------
leetrout
Is there a typo in the header "Matrices with no nonzero entires correspond to
complete bipartite graphs"??

Shouldn't it be "Matrices with all nonzero entries" or "Matrices with no
zero entries"?

------
bhl
I do recall looking up the definition of 'bijection' and seeing that it
slightly resembles a graph. As a generalization: can an n-dimensional tensor
be viewed as a weighted n-partite graph?

~~~
Twisol
An n-partite graph has the problem that edges can still only connect two
vertices at a time. I think you would also need to include hyper-edges (faces,
solids, etc.). The joint probabilities would be represented by these
hyper-edges, and the bounding ("lower") edges represent progressively more
marginal probabilities. Higher category theory studies these kinds of objects,
so it wouldn't particularly surprise me.

I wonder if simplicial complexes would work here?
[https://en.wikipedia.org/wiki/Simplicial_complex](https://en.wikipedia.org/wiki/Simplicial_complex)

------
cpburns2009
I'm not following. Why are _M12_ and _M31_ ignored in the first example, and
why are only the edges given values while the vertices' coordinates are ignored?

------
justAlittleCom
It's funny, because I could write a "viewing probability graphs as matrices".

For me, probabilities are graphs, and matrices are just a way to represent
them.

