
Programming as Theory Building (1985) - onlurking
https://gist.github.com/onlurking/fc5c81d18cfce9ff81bc968a7f342fb1
======
optymizer
The topic of how we as developers implement solutions in code has been on my
mind for years.

The one insightful idea I found in this essay is that coding is a lossy one-
way operation, from which you cannot fully derive the original idea or the
'theory'. That seems similar to losing information when compiling source code,
making it impossible to restore the exact source code from its machine code
representation.

So if we work backwards, it's: machine code (bits) -> source code (text) ->
idea/solution (human thought?)

Despite losing some information, machine and source code have interesting
properties, such as being able to copy them easily, transpile them to
different formats, etc.
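The lossy-compilation analogy above can be made concrete with a minimal Python sketch (the `area` function and its comment are invented for the demo): the compiled code object keeps the constants and names the machine needs, while the comment carrying the human "why" is discarded and cannot be recovered from the artifact.

```python
# Compile a tiny function and inspect what survives compilation.
src = '''
def area(radius):
    # the 'theory': we approximate pi because precision doesn't matter here
    pi = 3.14159
    return pi * radius * radius
'''

module_code = compile(src, "<example>", "exec")

# Find the code object for `area` among the module's constants.
fn_code = next(c for c in module_code.co_consts if hasattr(c, "co_consts"))

print(fn_code.co_consts)    # the constant 3.14159 survives
print(fn_code.co_varnames)  # local names survive (in CPython)
# The comment - the human-readable rationale - is gone entirely;
# no decompiler can restore it from the code object.
```

This is only a sketch of the direction source -> machine code; the thread's point is that the same kind of loss happens one level up, from theory to source.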

What I'd like to ask the HN brain is if anyone can think of another way to
express a higher level thought other than language? In his essay, Naur implies
that there is no such thing. I wonder if we had made any progress on that
front in the 35 years that have elapsed since this essay was written.

The only thing I can think of is something like UML, which has tons of diagram
types for structural and behavioral properties of a system, but I've always
found it hard to 'see' the real idea they're trying to describe, in the same
way that I find it hard to imagine a 4D object by looking at its 3D
projections. With enough effort it's certainly doable, but I wouldn't say the
process is intuitive or easy. To me, diagrams are like projections of an
idea from different points of view, but how do we encode the
idea/thought/theory itself?

What is it about language and apprenticeship that makes conveying ideas or
theories possible? I view this process as an inefficient way of serializing an
idea and transmitting it over voice to another person, who has to deserialize
the sounds, convert them to words, create the associations in their brain
based on the meaning of those words, and then probe the correctness of those
associations by asking clarifying questions.

Is this really the best we can do in 2020? How are other fields conveying
complex abstract notions and ideas?

~~~
thristian
You're butting your head against the fundamental paradox of communication: in
order to communicate an idea that's in your head to somebody else, you have to
encode it in a way that the other person will recognise and decode; that is,
you need to already have some shared context. However, if you have a _new_
idea then by definition it can't be part of the shared context, so it can't be
communicated.

We get around this by invoking combinations of existing ideas and hoping that
the recipient puts them together in more or less the right way: we might say
"a leopard sits in the tree to your left", invoking the existing ideas
"leopard", "tree", "to your left" and "sits", which can be combined in the
obvious way. UML, musical notation, mathematics... all these are variations on
"language" in the sense that they have a vocabulary of existing ideas, and a
grammar of natural ways to combine them, and so you can bootstrap ideas in
another person's brain by giving them pieces they already know and hoping they
can assemble the idea correctly.

Language is messy and non-portable and unreliable, and it is exactly those
properties which allow it to convey novel ideas from one person to another.

~~~
mjburgess
We get "around" this by ostension: words in a spoken (natural) language
_refer_ to the world.

I point to examples of trees and say "tree", etc.

"New" ideas are acquired by example -- language is not a closed system.

~~~
Ma8ee
Which is harder than it sounds if you are trying to do it from scratch. Even
pointing is a part of language, and trying to convey that concept is far from
trivial. I think a big part of language learning in children involves the
child seeing other people react to language and imitating. E.g., the father
points, the mother gazes in the direction of the finger, and the child
follows the mother's gaze.

~~~
musingsole
To further this idea, dogs intuit the meanings of commands, some more easily
than others, but teaching a dog to understand a human pointing at something is
hard. My own dogs seem to interpret it as "just keep looking".

------
at_a_remove
"If you lose the people, you lose the program."

I had a protracted, bitter struggle with this at one job. We had a business
process, really in the top two of our business processes, that was neglected
even though it was critical. When I was new to that role, I said, "We should
rewrite this. This is scattered across multiple servers, the code has almost
no comments (in the places where we still _had_ source code), and more
importantly, people are leaving."

One of the original programmers had died. People responsible for the _why_ of
certain decisions were retiring or leaving. And so on and so forth. I used to
joke that our process was documented in C, except for the places it was bash,
or borrowed Powershell, or ... Like an evolved process, instead of problems
being solved, later systems were added on top to correct issues - epicycles,
really - even as the bus factor continued to decrement every so often.

I still have a low level of sour antipathy when I think of it, that my efforts
to "do the right thing" came to nothing.

It's a shame. When I had a freer hand to work as I liked, systems I built
could detect, in a limited fashion, when the world had changed, that is, if
the theory of the world was wrong. If the vendor changed some critical portion
of the database, the program would identify the new column or the missing
table, then loudly expire after issuing its complaint.

This understanding of the slice of the world a program must interact with is
so critical and, worse yet, so fragile, subject to both breakage and decay.
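The "loudly expire" pattern described above can be sketched minimally in Python with sqlite3. The `orders` table and its columns are invented for illustration: on startup the program checks that the slice of the world it depends on still matches its theory, and complains loudly instead of silently misbehaving.

```python
import sqlite3

# The program's "theory" of the database: which tables and columns it expects.
EXPECTED = {"orders": {"id", "customer_id", "total"}}

def check_schema(conn):
    """Compare the live schema against EXPECTED; return a list of complaints."""
    problems = []
    for table, cols in EXPECTED.items():
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        if not rows:
            problems.append(f"missing table: {table}")
            continue
        actual = {row[1] for row in rows}  # row[1] is the column name
        for col in sorted(cols - actual):
            problems.append(f"{table}: missing column {col}")
        for col in sorted(actual - cols):
            problems.append(f"{table}: unexpected new column {col}")
    return problems

# Simulate the vendor changing a critical table under the program's feet.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL, coupon TEXT)"
)
print(check_schema(conn))  # flags the new `coupon` column the theory knows nothing about
```

In the pattern described above, a non-empty complaint list would be logged and the program would exit rather than run against a world its theory no longer describes.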

------
onlurking
OP here. Much of this paper revolves around the knowledge we need to build a
system - a lot of it, like the understanding of business rules, is the
context we need to build working software in the first place.

The problem is that knowledge is mostly "tacit", and tends to grow as the
software evolves. For example, several development tasks are normally
completed not only based on the documented user stories, but they also carry
the context from meetings or discussions that aren't documented.

When you lose the original authors of the program, it becomes very difficult
to rebuild the context necessary to understand how the system works - tasks
like adding new features or modifying existing behavior become very hard.
Also, in the "The Metaphor as a Theory" part, much of the work is shared
knowledge between the developers; when you have several programmers working
in parallel as fast as they can, the design of the program can become highly
incoherent.

Nowadays we have practices like testing, which can be a really helpful
companion when it comes to understanding how the system works and the
expected behavior of its parts, and which can be treated as documentation. We
also have code reviews, which, if done right, can guarantee that any addition
to the system is consistent with the system's design.

But still, this dependency on context is a very hard problem to solve.
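The tests-as-documentation idea above can be sketched in a few lines: the test names and assertions record part of the "theory" in executable form. The shipping rule below is a hypothetical example invented for the sketch, not something from the thread or the paper.

```python
def shipping_cost(order_total):
    """Orders of 100 or more ship free; everything else pays a flat 5."""
    return 0 if order_total >= 100 else 5

# Each test name states a business rule in plain language; together they
# document the expected behavior and fail loudly if the theory drifts.
def test_orders_of_100_or_more_ship_free():
    assert shipping_cost(100) == 0
    assert shipping_cost(250) == 0

def test_small_orders_pay_flat_rate():
    assert shipping_cost(99.99) == 5

test_orders_of_100_or_more_ship_free()
test_small_orders_pay_flat_rate()
```

Unlike prose documentation, these assertions cannot silently go stale: change the rule and the suite fails, forcing the written-down theory and the code back into agreement.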

~~~
azhu
> The problem is that knowledge is mostly "tacit", and tends to grow as the
> software evolves

Totally agreed. The missing codification is the context that was needed at
the genesis of the program -- the context necessary to understand it
holistically.

Seems to me a valid way of understanding this problem is that there is a
missing piece of documentation. It's generally well documented where/what the
starting point product/business idea is as well as where/what the end point
code is, but how point B was derived from point A is oftentimes not well
documented. This method of derivation highly informs the structure of the
result, and is a lossy conversion. It can be thought of as the "spirit" of a
program/an org's engineering department (what happens in between the product
ideas you can see going in and the lines of code output you can see coming
out).

If that part, and the methods and techniques by which an org maps and
translates an idea from the domain of business into the domain of code, is
codified and documented and evolved alongside the programs the org produces
then it may help.

------
alexashka
I think this could be applied to all spheres of human knowledge.

We have books written in languages we don't understand. There _has_ to be
shared context beneath all forms of communication, and communication itself
can be seen as CRUD operations on the layered cake of shared
contexts/stories/narratives/ideas that have dependencies between them.

A study of these CRUD operations and the interactions between the layers -
what the various types are, what common dependencies occur and when - would
be a very fruitful field of psychology, if not perhaps some AI modelling. I
don't know if there already is a psychological model along these lines.

For example when you haven't seen a friend for a long time and you reconnect,
if your life experiences haven't updated or deleted large parts of your shared
context, you'll fall right back into the groove.

It does make me wonder if there needs to be an update of the shared context
country or even world-wide to make people feel a sense of community again.
We've done away with religion and the many community bonding experiences that
it offered. We've tried the 'get rich or die tryin' context and I don't think
it has been fulfilling for the majority of the beta testers :)

Time to try something else, perhaps with a little more thought and rigor put
into it?

------
bonestormii
This concept evokes many familiar memories of reading other people's code,
which is always extremely hard. I feel multiple ways about this concept. On
the one hand, I believe programs can be sectionally reduced to inputs,
outputs, and a sequence of states in between, and a programmer can understand
those things well enough to extend an existing program competently in many
cases. It must be true, because it does happen.

On the other hand, while a programmer can learn from source and documentation
the wheres, whens, and whats of the program, there is always the remaining
question of "Why?", which is central to this discussion. Here, I think good
high-level examples of usage tend to do a good job of covering the inputs and
outputs. But with regard to all of the intermediary states of the program...
there is too much detail there to really document it. Those details emerge
through evolution more than design. There is code added, then replaced or omitted
entirely. Things are designed which work, but then are restructured for
performance, organization, or to eliminate repetition. In these cases, there
is information that is manifested in the _absence_ of code, and the second
rendering of the code better captures its function, but obscures its
evolutionary history.

Here's something I've been consuming lately:
[https://www.youtube.com/watch?v=wbpMiKiSKm8](https://www.youtube.com/watch?v=wbpMiKiSKm8)

This game programmer (Sebastian Lague, who is excellent, by the way) walks
through the development of procedural terrain generation in Unity. What's
fascinating to me is the way he does this really effective job of
"theory building". Things are implemented; results are observed; some code
that was only ever present to allow building up to that illustration is
deleted altogether, because it will no longer be necessary at the next stage
of evolution.

This is the way programmers work. Information is lost. If you weren't there to
experience it at inception, only a great imagination and testing can replace
it--at which point, you may find yourself actually rewriting the code, using
existing code as a reference.

~~~
ximm
So if reading code means reconstructing a theory, that could explain why we
have so much NIH.

~~~
bonestormii
NIH?

~~~
lioeters
[https://en.wikipedia.org/wiki/Not_invented_here](https://en.wikipedia.org/wiki/Not_invented_here)

------
fsloth
Peter Naur is one of the unsung(?) giants of software. His name _should_ be
instantly recognizable.

This is perhaps the best paper ever to explain the nature of collaborative
program development and maintenance.

If I had a company I would make this mandatory reading for everybody -
everybody, not just programmers.

~~~
munichpavel
I am currently leading a topic group at work on code documentation in AI
development. I was already planning to skip the lists of dos and don'ts, as
well as the usual best practice language, but maybe I should just refer
everyone to this gist and Naur ...

------
azhu
> A program is a shared mental construct (he uses the word theory) that lives
> in the minds of the people who work on it.

Absolutely. The plain English definition for the word "program" that Google
shows is (noun) "a set of related measures or activities with a particular
long-term aim", (verb) "arrange according to a plan or schedule".

A software program fulfills a certain set of behaviors, serves a certain
purpose, is a materialization of an idea, or otherwise is a transcription of
something from a certain domain into the domain of software. The "source of
truth" for what that something is lies outside the program.

The knowledge domain of a programmer is therefore not only their code and
the idea it represents but also the mapping in between. Both ends are easily
documentable (in the narrow context of their specific domains), and it feels
like this could have led to a possible convolution of what it takes to be a
good programmer.

We have endless measures, philosophies, and codifications for what makes for
good form when drawing back the bowstring, what release techniques make for
the least disturbance to the arrow's path, what arrow shape makes for the most
optimal flight, etc, but less for an archer's aiming technique. All we can do
is just look to see if the target's been hit or not.

How an org "aims" the "arrow" of code towards the business target is a higher
level concern than how awesome the arrow shot is or how straight it flies. If
you're not controlling how you aim you lack the context to fully qualify your
assessment of how arrow choice, pull/release form, or even flight path
affected your result.

Codifying not only how your org builds product ideas and how it implements
software, but how your org maps from one domain to the other helps mitigate
knowledge siloing.

------
Ididntdothis
This makes sense to me. I think it’s really important to be able to predict
what the software will do under certain circumstances. You can do that only if
you have a pretty good concept of the thinking behind the code. I usually get
nervous when something doesn’t behave as expected because it indicates that
there is a mismatch between the theory and its implementation.

This would explain why a lot of corporate software isn’t good. There is no
shared understanding and a lot of people make changes without understanding
the big picture.

------
igravious
Related discussions on HN:

10 months ago:
[https://news.ycombinator.com/item?id=20487652](https://news.ycombinator.com/item?id=20487652)

2 years ago:
[https://news.ycombinator.com/item?id=10833278](https://news.ycombinator.com/item?id=10833278)

5 years ago:
[https://news.ycombinator.com/item?id=7491661](https://news.ycombinator.com/item?id=7491661)

------
imprettycool
I only read the abstract (first paragraph), so maybe I'm way off base here.
I'm about to go to bed and want to bang this out:

I think it's three pieces that need to come together: the source, the system
it's running on, and the user. You don't necessarily need all three.

1\. If you only have the user and the source but not the system then you're
screwed. I could print out the entire FreeBSD source and docs, go back in time
to 1820 and it would be pretty much useless since I need a C compiler and a
million transistors, power supply and a bunch of other stuff. Obviously this
is an extreme example, since most of the time you'd just have a slightly
incomplete system (e.g. crappy build scripts but you know they built it on a
unix system 2 years ago) so it's usually workable

2\. If it's the user and the system then that's basically proprietary
software. You can reverse engineer the source. Tedious but doable

3\. If it's the source and the system, then you might be able to get a new
user to study both and understand everything again. It depends on the
complexity of the source/system and the docs.

I think of it as an organism, like it can be damaged and heal itself. There is
redundancy between these 3 axes. Depending on the circumstances, you can heal
it, or it might be permanently damaged.

------
akavel
The basic notion of program as theory fits what I stumbled upon recently on
my own. Notably, expanding on it, I like to see every
execution of a program as an _Experiment_ \- in that it may support or
invalidate the _Theory_ (by manifesting bugs/undesired behaviors). I'm happy
to see I'm not the first one to think of this idea. However, I am not
necessarily convinced by the main claim the article seems to make based on it,
that the Theory cannot be resurrected from the code of the program +
documentation. I think it may be very hard, and depend a lot on many factors
(quality of code, docs, the resurrecting team, their time, and as suggested,
access to the domain where the program is used), but it may still be possible
to a huge extent. I believe some sentences used by the author actually provide
hints in support of this claim. Also, in other sciences like math or physics,
albeit not easily, knowledge/theory transfer through writing can be done, or
at least helped significantly.

------
ximm
> the primary aim of programming is to have the programmers build a theory

I don't agree with that statement, but I don't think the primary aim is to
produce a program either.

I believe the primary aim is to enable users to use a program. For that they
need a mental model. Maintaining a consistent and simple theory among
developers is a means to that end.

------
p4bl0
Very interesting article. Thanks for sharing. I think it explains very well
why software developed by large IT companies¹ that put developer after
developer on their clients' projects for a few months at a time is
systematically very bad, to stay polite.

[1] I refer to what are called SSII ( _société de services en ingénierie
informatique_ ) in France.

------
mbrodersen
The primary aim of programming is to solve problems. The moment you forget
that one simple fact you are already heading in the wrong direction.

