The one distinction I would add for neural networks is that it's not just the recursive tree traversal you'd get when evaluating an arithmetic expression, but an actual graph: a computation node can receive gradients from multiple sources (e.g. when a skip connection is added), so each node needs to keep accumulated state around that can be updated by arbitrary callers.
Of course, optimized autograd / autodiff is more parallelized than node-based message passing, but it's a useful model to start with.
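To make "accumulated state" concrete, here's a minimal toy sketch (illustrative names, not any particular framework's API) where a value consumed by two downstream ops, like the input of a residual block, collects gradient from both:

```python
# Toy reverse-mode autodiff: each Node keeps an accumulated .grad that
# any downstream consumer adds into during the backward pass.

class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents     # upstream nodes this one was computed from
        self.backward_fn = None    # pushes this node's grad to its parents
        self.grad = 0.0            # accumulated state, updated by arbitrary callers

def add(a, b):
    out = Node(a.value + b.value, parents=(a, b))
    def backward_fn(g):
        a.grad += g                # += rather than =, so multiple consumers compose
        b.grad += g
    out.backward_fn = backward_fn
    return out

def mul(a, b):
    out = Node(a.value * b.value, parents=(a, b))
    def backward_fn(g):
        a.grad += g * b.value
        b.grad += g * a.value
    out.backward_fn = backward_fn
    return out

# y = x*x + x: x is consumed by two ops (think skip connection),
# so its gradient is the sum of the contributions from both paths.
x = Node(3.0)
h = mul(x, x)
y = add(h, x)

y.grad = 1.0
y.backward_fn(y.grad)   # adds 1.0 into h.grad and x.grad
h.backward_fn(h.grad)   # adds 3.0 + 3.0 more into x.grad
print(x.grad)           # dy/dx = 2x + 1 = 7.0
```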
I'd have to think about this for a while, but I'm not sure I see that as a distinction. If you have a skip connection, that's just another node in the graph that you have to execute topologically before your dependent node, and then you pass the data over the edge when the child node is ready to consume it.
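For what it's worth, a tiny sketch of that view (the layer names are made up): the skip connection is just one more edge in the DAG, and an ordinary topological order already schedules it correctly.

```python
from graphlib import TopologicalSorter

# edges: each node depends on the nodes whose outputs it consumes
deps = {
    "conv1": {"input"},
    "conv2": {"conv1"},
    "add":   {"conv2", "input"},   # "input" -> "add" is the skip edge
    "out":   {"add"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # e.g. ['input', 'conv1', 'conv2', 'add', 'out']
```

Walk the same order in reverse for the backward pass and each node's gradient is already complete before it gets handed over the edge to its parents.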
What you're describing with node-based message passing sounds much more like a Petri net, or some other agent-based discrete-event modelling system, which is another powerful mental paradigm, but one that's challenging to reason about.
In terms of abstract algebra, there is no distinction. What you call gradients are really just data flows, and "from multiple sources" means that a function can take multiple parameters (= inbound gradients, inflows).
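As a toy illustration of that equivalence (hand-written derivatives, not a real framework): for y = f(x) + g(x), the backward pass is itself just a dataflow graph, and the "two gradient sources" at x are simply the two parameters of an ordinary sum.

```python
import math

# forward functions and their hand-written derivatives
def f(x):     return math.sin(x)
def df_dx(x): return math.cos(x)
def g(x):     return x * x
def dg_dx(x): return 2.0 * x

def forward(x):
    return f(x) + g(x)            # x fans out into two consumers

def backward(x, upstream=1.0):
    inflow_from_f = upstream * df_dx(x)   # one inbound gradient / inflow
    inflow_from_g = upstream * dg_dx(x)   # another inbound gradient / inflow
    return inflow_from_f + inflow_from_g  # just a plain two-parameter function

print(forward(1.0), backward(1.0))   # sin(1) + 1, cos(1) + 2
```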