
Can Programming Be Liberated from the von Neumann Style? (1977) [pdf] - kick
https://www.thocp.net/biographies/papers/backus_turingaward_lecture.pdf
======
dang
2016:
[https://news.ycombinator.com/item?id=13210988](https://news.ycombinator.com/item?id=13210988)

[https://news.ycombinator.com/item?id=12159792](https://news.ycombinator.com/item?id=12159792)

2015:
[https://news.ycombinator.com/item?id=10182712](https://news.ycombinator.com/item?id=10182712)

2014:
[https://news.ycombinator.com/item?id=7671379](https://news.ycombinator.com/item?id=7671379)

2009:
[https://news.ycombinator.com/item?id=768057](https://news.ycombinator.com/item?id=768057)

(Links for the curious.)

------
carapace
FWIW, I think the idea he presents of performing algebra (or other formal
syntactic or semantic operations) to derive and transform programs deserves
more attention than it seems to get.

I was doing some experiments with Joy
([https://joypy.osdn.io/](https://joypy.osdn.io/)), and some meta-programming
that would have been pretty gnarly in most other notations was very
straightforward:

[https://joypy.osdn.io/notebooks/Quadratic.html#derive-a-defi...](https://joypy.osdn.io/notebooks/Quadratic.html#derive-a-definition)

[https://joypy.osdn.io/notebooks/Recursion_Combinators.html#d...](https://joypy.osdn.io/notebooks/Recursion_Combinators.html#derivation-of-hylomorphism-combinator)

[https://joypy.osdn.io/notebooks/Generator_Programs.html#maki...](https://joypy.osdn.io/notebooks/Generator_Programs.html#making-generators)

[https://joypy.osdn.io/notebooks/Newton-Raphson.html#finding-...](https://joypy.osdn.io/notebooks/Newton-Raphson.html#finding-consecutive-approximations-within-a-tolerance)

Joy is similar to Backus' FP and to the point-free Haskell of Conal Elliott's
"Compiling to categories"
[http://conal.net/papers/compiling-to-categories/](http://conal.net/papers/compiling-to-categories/)
[https://joypy.osdn.io/notebooks/Categorical.html](https://joypy.osdn.io/notebooks/Categorical.html)
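
For a rough taste of the same algebra-on-programs idea in Haskell terms (my
own toy example, not taken from those notebooks), the map fusion law
`map f . map g = map (f . g)` is enough to derive a one-pass program from a
two-pass one by pure rewriting:

    doubleShifted :: [Int] -> [Int]
    doubleShifted = map (* 2) . map (+ 1)       -- two traversals

    --   map (* 2) . map (+ 1)
    -- = map ((* 2) . (+ 1))                    -- map fusion law
    -- = map (\x -> 2 * (x + 1))                -- inline the composition

    doubleShifted' :: [Int] -> [Int]
    doubleShifted' = map ((* 2) . (+ 1))        -- one traversal

    main :: IO ()
    main = print (doubleShifted [1 .. 5] == doubleShifted' [1 .. 5])  -- True

Every step is a syntactic rewrite justified by a known law, which is exactly
the style Backus was advocating.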

------
DayDollar
The actual mental slavery is the notion of instructions. You have Operand A,
Operand B, and a hidden Operand I, which operates on the given Operands,
encoded in an endless chain of drudgery.

Now imagine a system which is deterministic but purely physics-based: data
flows continuously at maximum possible speed by being copied from A to *, and
operations are executed purely as an attribute of the data. Meaning: this
river carrying sand carries certain sand, again purely as a property, to a
stone in the river, where it is ground, split, etc.

Then again, the new data, having different "physical" properties, is sped
along to other streams.

The art of programming such a machine would be to understand the physics of
data and of the system: to place a stone upriver, and to know where downriver
to reach to collect the gem, liberated and polished.

Such dataphysics also has interesting inherent properties. Interaction can
never reach farther than the number of processors in parallel times the
maximum length of data, so dataphysics has a lightcone within which it is
fully parallel. One could take data out and insert it into this deterministic
simulation, simply by staying out of sight and keeping causality. But I
digress.

~~~
gradschool
Do you mean like Sutherland and Sproull's counterflow pipeline processor
developed at Sun in the 90s?

[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.4...](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.4810)

~~~
dukoid
Shameless plug: try flowgrid.org -- unfortunately, it tends to be more
cumbersome than "traditional" programming. I hope the tutorial is some fun,
though. Perhaps that's because one just gets used to traditional programming
too much? Dataflow programming does seem to be successful for Node-RED,
though...

------
aasasd
I wonder if anyone has noticed how people first came up with the architecture
of a single bus for data and code, and how they then had endless problems due
to writing data where code should be and executing data as code. And they
plug the problems by forbidding writing data and code in the same place and
reading data as code (afaik).

So yeah, maybe computers should be liberated from the single pipeline?

~~~
cwzwarich
> So yeah, maybe computers should be liberated from the single pipeline?

Internally, they basically are. Instruction and data caches are separate and
only connect at the L2 level. Aliasing between the two pipelines is only
allowed if software requests it, with the specific architecture determining
the amount of automatic vs. manual synchronization required. What would you
prefer, physically disjoint DRAM?

~~~
aasasd
I think I've read about programs accidentally misapplying the execution bits,
such that malicious data can slip by. Buffer overflows still don't seem to
have been shrugged away, despite the bits.

Maybe finish the move and do exactly what was proposed before the single
pipeline: don't ever mix data and code in the same memory. No need to
physically separate the RAM. Just have designated areas, maybe even allocate
code areas on the fly, only not by the program itself.

Even JIT can be accommodated this way, I think: since you know that code comes
from the program and not errant buffers, and if the program is marked as JIT-
type, let it write to a code area allocated just for this program. But
probably not copy from the data memory.

------
_pmf_
A rebuttal by Dijkstra to Backus' paper is here (actually, a deconstruction):
[http://www.cs.utexas.edu/users/EWD/ewd06xx/EWD692.PDF](http://www.cs.utexas.edu/users/EWD/ewd06xx/EWD692.PDF)

~~~
joe_the_user
A rough summary of his points:

* It is not a given that the "von Neumann bottleneck" is what generates flabby, verbose programs.

* Computers execute programs; humans reason about them. There's no reason to expect the same tools to apply. This has similarities to Linus' objections to C++ in the Linux kernel: kernel code is an implementation of high-level concepts, and there's no need to have the language structures _partly_ support or impose those high-level concepts; instead one should understand them abstractly and then bring them to implementation.

* The operations of functional programming aren't efficient in their naive implementation. They can be made efficient with an optimizing compiler, but this effectively results in the programmer thinking about more, not fewer, elements (see the sketch after this list).

* Dijkstra takes issue with Backus' assumption that functional programs will make program proving accessible to the average programmer. I think Dijkstra wants to raise the skill of the average programmer rather than expecting that functional programs will make proving easier. (This one I'm less sure of, but it seems like you'd have a situation where only "unaverage" programmers are going to be using functional languages.)
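
To make that efficiency point concrete, a stock example (mine, not
Dijkstra's) in Haskell: the naive functional reverse is quadratic, and the
linear version demands an extra accumulator the programmer must now reason
about.

    -- Naive: (++) is O(n) in its left argument, so this is O(n^2).
    rev :: [a] -> [a]
    rev []     = []
    rev (x:xs) = rev xs ++ [x]

    -- Linear: the programmer must invent and reason about an extra
    -- accumulator parameter -- more elements to think about, not fewer.
    rev' :: [a] -> [a]
    rev' = go []
      where
        go acc []     = acc
        go acc (x:xs) = go (x : acc) xs

    main :: IO ()
    main = print (rev [1 .. 5 :: Int] == rev' [1 .. 5])  -- True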

Dijkstra closes: _"In short, the article is a progress report on a valid
research effort but suffers badly from aggressive overselling of its
significance, long before convincing results have been reached."_

And I think the situation now is that FP has shown itself to be definitely
useful in some domains but still suffers from publicity that implies it is a
paradigm that will "eat" ordinary software engineering.

That it has huge mind-share for what it isn't might be as bad as its not
being widely used for what it is useful for.

------
mpweiher
"John told me he considered FP a failure because, to paraphrase him, FP made
it simple to do hard things but almost impossible to do simple things." \--
Grady Booch,
[https://twitter.com/Grady_Booch/status/1153348388827480065](https://twitter.com/Grady_Booch/status/1153348388827480065)

------
rstuart4133
We probably will see the end of the von Neumann Style, but it won't be
because programmers don't like it. It will be because, in its modern
incarnation, it's horribly inefficient.

This breakdown of the power consumption of a 32-bit ADD by various bits of
the CPU was taken from a Tesla presentation about their navigation computer,
[https://youtu.be/Ucp0TTmvqOE?t=5247](https://youtu.be/Ucp0TTmvqOE?t=5247)

    icache:        25.0 pJ
    register file:  6.0 pJ
    control:       39.0 pJ
    ALU add:        0.1 pJ

(sometimes I curse HN limited formatting. it doesn't do much, and yet the
little it does often manages to get in the way.)

The ADD takes 0.15% of the total energy consumption (0.1 pJ out of roughly
70 pJ in the figures above). His breakdown doesn't show it, but I'm guessing
the rest could be divided up into three things: getting the data to the ALU;
decoding the instruction stream (which can probably be better thought of as a
form of decompression); and control overhead, mostly devoted to extracting
parallelism from a serial set of instructions.

0.15% is woeful, particularly so when you realise the main constraint on
speed now is power, or more precisely getting rid of it. We already have
architectures that do better than von Neumann (eg, GPUs, NPUs), but even
those are orders of magnitude away from what the brain achieves in terms of
MIPS/watt.

The main constraint here is not building it. IBM did build a computer that
mimics how the brain does things. It was a neural network and so massively
parallel, but just as importantly information was sent by varying the number
of pulses sent in a unit of time. That's important because if no information
is sent there are no pulses, so potentially bugger all power was used.

The main constraint is programming it. We can't do it, and we have no idea
where to begin. The debate about FP vs stateful programming seems to be a
never-ending source of entertainment in our field, but it's beside the point,
as neither has much relevance to solving this problem.

------
tyri_kai_psomi
If the Alonzo Church model and the Turing model of computation are
computationally equivalent, then does it really matter? I may not be
understanding correctly.

------
cryptica
Programming will never be 'liberated' from the von Neumann style because the
most successful programs will always be written in that style.

That's like asking if dentistry can be liberated from the toothbrush because
we have dental floss.

~~~
Frost1x
Be careful speaking negatively about functional programming... there's an
almost cult-like obsession around it, driven partly by anti-establishment
feeling, partly by boredom with imperative paradigms, and partly by the
pursuit of leaving a mark/legacy.

I've used functional and imperative programming for varied tasks throughout
my career. Sometimes functional programming approaches are fantastic and
intuitive for a task; other times they're counterintuitive and become
cumbersome, depending on what you're doing.

While functional programming can be academically interesting in many
respects, for most development needs it can hamper progress more than assist
(humans have lots of tendencies and history that give imperative approaches
an edge).

For humans, computers exist to help us perform tasks we can't do very well or
quickly ourselves, and to do so consistently. As long as you can do that,
people in general don't care about what underlying theoretical computational
abstractions you choose--let's not lose sight of that fact. Only those in
computing will really care.

Functional programming isn't new per se; there's just a trend of rediscovery
and advancement going on. There's something to be said for supporting the
underdog. Whether imperative paradigms will _always_ dominate as the
higher-level abstraction of choice is questionable, since the software field
is still, relatively speaking, new. It likely isn't going to change much in
the near future, however.

~~~
cryptica
I don't see FP (and I mean the pure FP style) as the underdog. I think the FP
community tends to blame FP's repeated failure to reach the mainstream on
either developers' stubbornness or FP's poor implementation in many
languages.

The way I see it, however, is that FP has had plenty of attention and plenty
of opportunities to go mainstream, but it has failed to do so simply because
it's not as effective as imperative programming when it comes to building
most real-world software.

It's not like the market doesn't know. Every developer on the planet knows
what FP is and knows all the benefits. Some imperative languages like
JavaScript have even adopted aspects of FP for certain use cases (parts of
the front-end UI related to dynamic templates; e.g. VueJS, React).

But FP is ideal only for certain narrow niches. FP proponents should just
leave it at that and stop trying to promote FP as if there were a fundamental
law of science saying it's superior to imperative programming, because there
isn't.

My personal experience with FP is that it's not good for creating large
modular software. You can't create effective abstractions that are accurate
models of real-world entities without encapsulating their state as well.
State is an essential and inseparable part of the abstraction when we talk
about modeling real-world entities in software.
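
For readers who haven't seen it: the standard pure-FP move being pushed back
against here is to thread state explicitly rather than hide it in an object.
A minimal Haskell sketch (my own, assuming the mtl package):

    import Control.Monad.State

    -- A toy "real-world entity": an account balance as explicit,
    -- threaded state rather than state hidden inside an object.
    type Account = Int

    deposit, withdraw :: Int -> State Account ()
    deposit n  = modify (+ n)
    withdraw n = modify (subtract n)

    demo :: State Account ()
    demo = deposit 100 >> withdraw 30

    main :: IO ()
    main = print (execState demo 0)   -- prints 70

Whether explicit threading like this scales to large modular systems is
exactly the disagreement in this subthread.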

------
Ericson2314
HNers, let me show you what Backus had in mind finally come to fruition; I
only figured this out recently:
[https://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-...](https://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-130.pdf)

Here's the thing: Backus was not coming from the Lisp and embryonic ML
communities that were doing functional programming research at the time. This
was a career-retrospective speech by a newcomer to the field. I think the
existing community was surprised and bemused.

Certainly your regular functional programmer would have chuckled at all the
"point-free" style FP. This was a decade before Haskell, mind you, and at
least a decade before lots of function reuse became popular (being blocked on
better strictness analysis and a lack of type classes and instances).
Remember, in then-cutting-edge Scheme, (define ...) has an implicit
(begin ...) rather than a single-expression body, and don't get me started on
the other imperative Lisps. APL before FP just made the existing community
laugh harder.

So fast forward a while, and even today implicit parallelism is not happening
much in production Haskell. Applicative is much more popular than Arrow, and
thus everyone is still writing von Neumann code, even if it's dressed up
functionally.

Well, let me tell you, that thesis puts a lot of the pieces together:

The real genius of point-free isn't getting rid of the names; we have
de Bruijn indices for that. Nor is it allowing other models of computation.
Computer scientists found good ole' 1970s category theory (in principle,
Backus could have done this too!), and now we know about (symmetric) monoidal
categories and their internal languages and all the stuff we need to
point-free-ify the programs in lambda-calculus-influenced calculi.

The real genius is dealing with dynamism / higher-order programs. With the
raw lambda calculus, you are in a fun house of higher-orderism, and it's hard
to jump out of the substitution / term-rewriting mindset. CPS, which occupied
much of the functional research community for at least a man-decade, only
further serializes and higher-order-ifies things. Monadic programs bind
promiscuously; every step is potentially dynamic af, once again serializing
and higher-order-ifying things.

With arrow-like things, however (and Control.Category.* and Profunctor are
really much better at distilling the concepts than the legacy Arrow class,
thanks Ed), suddenly we have a decently expressive form of computation with
no lambdas in sight. We've lifted the "fog of war", freeing the first-order
"90%" of the work from the higher-order "10%".
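
For a taste of what "no lambdas in sight" looks like (my toy example, using
the legacy Arrow instance for plain functions, not the thesis' improved
classes):

    import Control.Arrow

    -- The wiring is static: the two branches of (&&&) are independent
    -- by construction, so the shape of the data flow is visible to the
    -- compiler instead of being hidden behind a chain of monadic binds.
    mean :: [Double] -> Double
    mean = (sum &&& (fromIntegral . length)) >>> uncurry (/)

    main :: IO ()
    main = print (mean [1, 2, 3, 4])   -- prints 2.5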

Arrow notation as implemented in Haskell today, however, abuses `arr` for the
structural rules of the guest language (weakening, contraction, and exchange,
per Wikipedia [1]). This reshrouds everything and ruins all the promise of
Arrows. Yuck.

Back to the thesis, there are 2 big contributions:

1) A practical version of Arrow which doesn't need arr for the structural
rules. (Control.Category.* has since done this better and with more
faithfulness to the math, but it had to start somewhere.)

2) A theory of heterogeneous metaprogramming, so the first-order and
higher-order bits don't even have to be the same language. We're talking
things like an FPGA that computes its own re-flashing.

There is also a prototype implementation of various lambdas so humans can
still use names, but I hear it's bitrotted, so I can't vouch for it.

So, back to John Backus. The thing is, programmers are lazy. They'd rather
write in the language they know, or learn the language their teachers know.
Also, humans need at least the option of some names (even if we have to name
too many things today), so mandatory combinators will cause problems as long
as we're stuck programming in shitty, 1-dimensional text. Finally, without at
least the _taste_ of actually compiling to other programming models, along
with scaling programs with the von Neumann model, the economic and curiosity
incentives just aren't there. And in 1978 the von Neumann model was still
fucking killing it. Wires were fast, and components, be they spinning disks,
ALUs, or DRAM, were all kinda slow but in similar and predictable ways.
Programming DOS was a shit-show, of course, but so were non-trivial
compilers. Nothing was ready.

Now we know how to program better, we have the power to run the compilers,
and the legacy machines have a cost model I like to think of as "molasses in
capillaries with huge bottlenecks". We're ready.

[1]:
[https://en.wikipedia.org/wiki/Structural_rule](https://en.wikipedia.org/wiki/Structural_rule)

~~~
fallat
Are the arrows presented in the paper something you think I should really
familiarize myself with and use?

I did a deep dive into the lambda calculus over the past year, and I noticed
limitations similar to the ones you mention. So this seems interesting.

I have seen arrows before, but they just seem like a rehash of the other
lambda-like calculi. But maybe this paper's version of arrows is different?

~~~
Ericson2314
I think it's worth it. Everything is lambda-in-a-straitjacket-like. The
question is how you can mix and match straitjackets, and how convoluted the
programming is.

Also look at
[https://github.com/conal/concat](https://github.com/conal/concat)

~~~
zozbot234
Categorical logic reframes the lambda calculus as being "simply" the internal
language of cartesian-closed categories. Doing the same for other categories
leads to other languages that are not so lambda-like. Sometimes categories
are best characterized by _graphical_ languages, as seen in string diagrams
and generalizations thereof. Studying these things is the business of a
rather thriving subfield at the intersection of PLT, category theory, and
foundational mathematics/logic.
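
A minimal Haskell rendering of that "internal language" view (a sketch in the
spirit of concat, not its actual API): the first-order, pairing-and-projection
fragment of the lambda calculus makes sense in any cartesian category, with
ordinary functions as just one instance.

    {-# LANGUAGE TypeOperators #-}
    import Prelude hiding (id, (.))
    import Control.Category

    -- Any cartesian category can interpret the first-order,
    -- pairing-and-projection fragment of the lambda calculus.
    class Category k => Cartesian k where
      exl   :: (a, b) `k` a
      exr   :: (a, b) `k` b
      (&&&) :: (a `k` b) -> (a `k` c) -> (a `k` (b, c))

    -- Ordinary functions are the standard model...
    instance Cartesian (->) where
      exl = fst
      exr = snd
      (f &&& g) x = (f x, g x)

    -- ...but swapC, the point-free form of \p -> (snd p, fst p),
    -- now makes sense in any Cartesian k: circuits, diagrams, etc.
    swapC :: Cartesian k => (a, b) `k` (b, a)
    swapC = exr &&& exl

    main :: IO ()
    main = print (swapC (1 :: Int, "a"))   -- prints ("a",1)

Reinterpreting the same swapC term in a category of circuits or string
diagrams is exactly the "compiling to categories" move.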

------
segmondy
I love this paper, and I believe it will be: something like APL meets Prolog
with a little side of Lisp.

------
DonHopkins
John von Neumann's 29-state cellular automata machine is (ironically) a
classic, decidedly "non von Neumann architecture".

[https://en.wikipedia.org/wiki/Von_Neumann_cellular_automaton](https://en.wikipedia.org/wiki/Von_Neumann_cellular_automaton)

He wrote the book on "Theory of Self-Reproducing Automata":

[https://archive.org/details/theoryofselfrepr00vonn_0](https://archive.org/details/theoryofselfrepr00vonn_0)

He designed a 29-state cellular automaton architecture to implement a
universal constructor that could reproduce itself (which he worked out on
paper, amazingly):

[https://en.wikipedia.org/wiki/Von_Neumann_universal_construc...](https://en.wikipedia.org/wiki/Von_Neumann_universal_constructor)

He actually philosophized about three different kinds of universal
constructors at different levels of reality:

First, the purely deterministic and relatively harmless mathematical kind
referenced above, an idealized abstract 29-state cellular automaton, which
could reproduce itself with a Universal Constructor, but was quite brittle,
synchronous, and intolerant of errors. These have been digitally implemented
in the real world on modern computing machinery, and they make great virtual
pets, kind of like digital tribbles, but not as cute and fuzzy.

[https://github.com/SimHacker/CAM6/blob/master/javascript/CAM...](https://github.com/SimHacker/CAM6/blob/master/javascript/CAM6.js#L4569)

Second, the physical mechanical and potentially dangerous kind, which is
robust and error tolerant enough to work in the real world (given enough
resources), and is now a popular theme in sci-fi: the self reproducing robot
swarms called "Von Neumann Probes" on the astronomical scale, or "Gray Goo" on
the nanotech scale.

[https://en.wikipedia.org/wiki/Self-replicating_spacecraft#Vo...](https://en.wikipedia.org/wiki/Self-replicating_spacecraft#Von_Neumann_probes)

[https://grey-goo.fandom.com/wiki/Von_Neumann_probe](https://grey-goo.fandom.com/wiki/Von_Neumann_probe)

>The von Neumann probe, nicknamed the Goo, was a self-replicating nanomass
capable of traversing through keyholes, which are wormholes in space. The
probe was named after Hungarian-American scientist John von Neumann, who
popularized the idea of self-replicating machines.

Third, the probabilistic quantum mechanical kind, which could mutate and
model evolutionary processes, and rip holes in the space-time continuum,
which he unfortunately (or fortunately, for the sake of humanity) didn't have
time to fully explore before his tragic death.

p. 99 of "Theory of Self-Reproducing Automata":

>Von Neumann had been interested in the applications of probability theory
throughout his career; his work on the foundations of quantum mechanics and
his theory of games are examples. When he became interested in automata, it
was natural for him to apply probability theory here also. The Third Lecture
of Part I of the present work is devoted to this subject. His "Probabilistic
Logics and the Synthesis of Reliable Organisms from Unreliable Components" is
the first work on probabilistic automata, that is, automata in which the
transitions between states are probabilistic rather than deterministic.
Whenever he discussed self-reproduction, he mentioned mutations, which are
random changes of elements (cf. p. 86 above and Sec. 1.7.4.2 below). In
Section 1.1.2.1 above and Section 1.8 below he posed the problems of modeling
evolutionary processes in the framework of automata theory, of quantizing
natural selection, and of explaining how highly efficient, complex, powerful
automata can evolve from inefficient, simple, weak automata. A complete
solution to these problems would give us a probabilistic model of self-
reproduction and evolution. [9]

[9] For some related work, see J. H. Holland, "Outline for a Logical Theory of
Adaptive Systems", and "Concerning Efficient Adaptive Systems".

[https://www.deepdyve.com/lp/association-for-computing-machin...](https://www.deepdyve.com/lp/association-for-computing-machinery/outline-for-a-logical-theory-of-adaptive-systems-efsWyqMa1l)

[https://deepblue.lib.umich.edu/bitstream/handle/2027.42/5578...](https://deepblue.lib.umich.edu/bitstream/handle/2027.42/5578/bac4296.0001.001.pdf?sequence=5)

[https://www.worldscientific.com/worldscibooks/10.1142/10841](https://www.worldscientific.com/worldscibooks/10.1142/10841)

~~~
Ericson2314
> Although I refer to conventional languages as "von Neumann languages" to
> take note of their origin and style, I do not, of course, blame the great
> mathematician for their complexity. In fact, some might say that I bear some
> responsibility for that problem.

From the paper. Whew.

