
Understanding Programs Using Graphs - caution
https://engineering.shopify.com/blogs/engineering/understanding-programs-using-graphs
======
simonw
One of the many neat uses of Observable is quickly prototyping these kinds of
graphics - often by starting with a notebook someone has already created and
iterating on it.

I have a notebook here which can visualize the table structure of any
Datasette-hosted database: [https://observablehq.com/@simonw/datasette-table-
diagram](https://observablehq.com/@simonw/datasette-table-diagram)

My first version used a force-directed graph for this, but then Thomas
Ballinger from the Observable team showed me how to use a DOT diagram, which
is a much better visualization for relational tables.

------
kipply
Seas of nodes are also used as IR in Java Hotspot and the V8 engine
([https://doar-e.github.io/blog/2019/01/28/introduction-to-
tur...](https://doar-e.github.io/blog/2019/01/28/introduction-to-turbofan/))

------
ncfausti
I started learning Blueprints in Unreal Engine recently, I swear it has helped
my problem solving ability in other general areas of programming. Watching the
links light up as nodes are executed is pretty great for debugging.

~~~
johnsonjo
I honestly feel I had a similar feeling when I played Little Big Planet 2 when
I was in my late teens. It had a creative mode where you could use basic logic
gates like and gates and or gates, and other utilities like counters and the
like (including not just digital inputs but also analog ones). Understanding
inputs and outputs from a chip level was quite interesting to me. I tried to
make a calculator and learned all about bit adders and such, but was never
able to figure out as a teenager how to make a conversion of the binary bits
to decimal so that I could display it. I'm still not entirely sure what the
proper algorithm is for that (if by chance anyone knows that would be nice to
know), but I'm sure I could find it out now that I have a degree in CS (just
saying that's plenty sufficient but not necessary to have a degree).

~~~
indiv0
You'd use a BCD to 7 segment display circuit [0]. At least that's what we did
in my digital systems design course. Depending on what number you're trying to
display you light up different parts of the segmented display.

[0]: [http://electronics-course.com/bcd-7-segment](http://electronics-
course.com/bcd-7-segment)

~~~
johnsonjo
Sweet! Thanks for sharing! I’ll have to look into this more when I have time.
I think the missing key for me was the BCD part. I had figured out how to do
the seven segment display at some point using other utilities that LBP2 had.

------
zomglings
This was a nice article, and I thank you for providing those references at the
end.

The article makes it seem as if TruffleRuby is a Shopify project, where this
GitHub repo seems to differ on that point:
[https://github.com/oracle/truffleruby](https://github.com/oracle/truffleruby).
What's the deal?

I'm also very curious about how much TruffleRuby helps you save in
infrastructure costs. It looks like Shopify is starting to put together
serious systems around optimization of Ruby code.

~~~
chrisseaton
We're working on TruffleRuby at Shopify, but it's an Oracle-led project yes.

> I'm also very curious about how much TruffleRuby helps you save in
> infrastructure costs.

We don't have an empirical answer to this yet. We're working on it. We're
investing in MRI, TruffleRuby, and many other parts of the Ruby ecosystem.

~~~
zomglings
Cool. Another question since you are around: How many of your sea-of-nodes-
level optimizations do you think can be applied instead as AST-level
optimizations?

~~~
chrisseaton
Well you can see the one in the article about sharing the computation from the
two branches - there's no way to represent that in an AST, for the very reason
that it's a tree - there can only be one user of a node in an AST, but in a
graph, multiple nodes can use it.

You could introduce some kind of hidden local variables and assign to those,
but then you're kind of building a graph in an AST, and might as well use a
graph, and those variables are harder to understand for other optimisations.

(Note that these aren't 'my' optimisations, I'm just writing about them.)

------
retox
Aren't we coming full circle back round to UML? UML is very good at designing
and documenting complex code and it's interactions at many different levels,
and having a common graphing notation helps ensure everyone is on the same
page by looking at the same diagram.

The problem comes when the design and the code get out of step, but there are
tools available to round-trip the code and model to keep everything in sync.
Executable UML used to be a thing too, but I was never very convinced by it.

~~~
RNCTX
Hopefully they get it right this time when someone reinvents UML.

By right I mean, the first time someone comes along and says “it’s not a
visual language anymore” the response is “wrong, and you’re fired.”

------
bllguo
not _directly_ related, but one interesting thing my coworker has been working
on is visualizing ML models using graphs, which has made them much more
interpretable for business stakeholders. Being able to break the model down
into steps, feed the graph individual observations to see how the model flow
would work, etc. I was surprised how receptive non-technical people were to
the concept

------
Woberto
I enjoyed the article, but am maybe too dense to understand the repercussions.
My understanding of the summary is that a goal is to be able to develop these
graphs and then convert them back to ruby so that developers who know ruby can
see how the compiler is optimizing. My honest question is, why does this
matter? If the developers then start writing code more in line with what a
compiler would do, then it seems compilation would be faster - is it
significant, and is this a reason why? What else could it be? Thanks for the
article and any thoughts!

~~~
pubby
My apologies if you know this, but by graph it means the mathematics
definition (a type of data structure) and not the actual visualization image:
[https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)](https://en.wikipedia.org/wiki/Graph_\(discrete_mathematics\))

Compilers convert source code into graphs and then perform optimizations on
these graphs. They never work directly on the source code itself. Once the
graphs have been optimized, they're then converted into the target format,
which is typically assembly language or bytecode.

So mostly this is about techniques compilers use to optimize and transform
code.

~~~
Woberto
I didn't know that, thank you for kindly enlightening me! So from the
blogpost, the "compiler debug dumps" are the optimized graph data struct, and
instead of converting to assembly or byte code, they want to "decompile" or
convert back to ruby. Do you think they're doing this to better understand and
improve the compiler? Or to write code that is more easily compiled?

~~~
jimsmart
I suspect this may have originally developed as a debugging aid, your “to
better understand and improve the compiler”, but also to better understand how
Ruby programs execute. With non-trivial data structures and algorithms (such
as sea-of-nodes here), being able to visually introspect statically, and/or
visualise its operation dynamically, can often provide informative insights.

Instead of, or perhaps as well as, converting to assembly or byte code, here
the author is producing a visualisation of the internal graph structure
contained within the compiler.

The graph isn’t necessarily a one-to-one match with the AST derived from the
source, due to graph transformations that take place during the optimising
phase of the compiler/runtime. This makes it even more important/interesting
to see what is going on under the hood.

Usually, in high-level programming languages one rarely has concerns regarding
how easily the compiler might deal with the code - one just codes, and uses
language features appropriately, as needed. With the only caveat here being
the extreme performance crew, e.g. games programmers might use a restricted
subset of C++, and also might disassemble compiler output, and use that
information to help try and optimise their source somewhat.

------
flexvision
What is a simple tool to produce ASTs from code? Can it work cross-language?

~~~
AntonyGarand
I would start with [https://astexplorer.net](https://astexplorer.net), which
handles many languages, and provides interactivity: click on a node within the
ast, and il will highlight the corresponding code and vice versa.

The ast's are usually language specific, therefore it's not cross language.

------
srikz
The big value of the visual representation is if it is dynamic. That is, it
gets updated while you are running the code and can use it during debugging.
Otherwise, the novelty wears off quickly.

Static graphs are still useful to understand a new piece of code, see Source
Trail[1] [1]: [https://www.sourcetrail.com/](https://www.sourcetrail.com/)

------
Sevaris
Interesting topic. I feel like the representation of the Ruby fib misses the
mark though because of how many language implementation details are caught up
in it. The lack of abstraction here takes away from the usefulness of using a
graph for understanding control/data flow in a program.

~~~
tom_mellior
> I feel like the representation of the Ruby fib misses the mark though
> because of how many language implementation details are caught up in it.

The article's title is not optimal. It's really about compilation to machine
code, and for that you do need all these details. The only "understanding" of
these graphs is about understanding what the compiler is doing with the code,
it's not for program understanding on a high level.

------
navan
Interesting article and good question about whether we should program using
graphs. In a different domain, languages like halide, tvm and tiramisu or in a
way letting programmers create graphs for high performance.

~~~
lmkg
Programming via graphs is common in big-data machine-learning as well. Apache
Beam and Tensorflow are both programmed by creating an execution graph object,
and handing that off to the runtime.

~~~
rcfox
Makefiles basically just trees.

------
thelazydogsback
The graphic is too fuzzy -- but I'm trying to figure out why the AST for fib
in Ruby would be so complex...

~~~
chrisseaton
Ruby has more complicated semantics.

An example of a concrete difference here is that Ruby checks for overflow on
integer maths, where Java does not. This means each maths operation is a
little cluster of nodes, rather than a single node.

Another example is that Ruby has dynamic typing, so values that come into the
function need to be type checked, and values going out need to put into a
format where they can have their type inspected.

One more example is that the call to fib in Java is static - we know exactly
where that goes. In Ruby the call is dynamic, so we need to check we're
calling the correct method.

Here's a full SVG copy of that diagram
[https://www.dropbox.com/s/k68id9aqu258blw/fib-
ruby.svg](https://www.dropbox.com/s/k68id9aqu258blw/fib-ruby.svg).

~~~
thelazydogsback
Thanks - I knew all that, but I guess I expected those semantics to be
implicit rather than explicit. I guess because we're not really looking at an
"AST" but an intermediate executable representation. Certainly this is good
for understanding what Ruby is doing, but not so useful for understanding what
fib is doing - trees vs. forest. I guess what you really want is to be able to
zoom into IL-semantics only if you care about such things, otherwise you show
only enough detail to get across the algorithm being expressed -- possibly
with the addition of declaratively showing additional constraints that have
been added by the runtime without showing how the contraints have been
satisfied.

