Date: 2016-01-01
However, look at Physically Based Rendering (http://www.pbrt.org/). It won an Academy Award. It should win a Pulitzer. All of the actual source code is in the book and the book is quite readable.
Or look at Lisp in Small Pieces (https://pages.lip6.fr/Christian.Queinnec/WWW/LiSP.html). There you will find a complete lisp system surrounded by enlightening text.
Your confusion stems from the coding constraints of the 1970s. My first machine had 8k of memory, 4k of which was operating system. So programs could not exceed 4k, requiring #include statements. As programs grew they adopted the "pile of sand" (POS) format, where "meaningful" directory names like 'src', 'doc', 'test', etc. contain the semantic content.
Of course nobody reads POS code. It wasn't written to communicate to humans.
It's the late 90s... code like humans matter, like your code needs to be understood, maintained, and modified. Write literate programs similar to Physically Based Rendering and Lisp in Small Pieces. Stop programming like your grandfather did. Oh, yeah, and write provable programs... but that's another rant.
Actually those that had family members working at Burroughs, ETHZ, DEC, Xerox PARC, Texas Instruments or Genera, probably should start coding as their grandfather did. :)
And yet, and yet.
Also, writing your thoughts into concrete, descriptive comments helps you with rubber-duck debugging, often pointing out weaknesses or edge cases you didn't think of when you were heads-down in the code. It's far faster to fix issues that come up ahead of time than after you've swapped it out of mind.
If only we as an industry had some label for this Technical Debt so that we could appeal to practices like refactoring, unit tests and continuous integration to pay it down.
Provable with respect to what specification?
Documenting a general computer program is much more involved.
Do you have any background on this term?
The code-starers and the experimentalists. When debugging, the code-starers would, well, stare at the code and try to reason this out logically. There will be statements such as "this cannot be the case, because ...".
The experimentalists would run the program with variations in input and code and observe the effects in order to figure out what was really happening.
As you can probably tell, I am an experimentalist, and I find the code-staring approach puzzling: (a) the world is not as you expect it to be, so reasoning about what can or cannot be the case is of limited use, as your reasoning is evidently faulty (usually in its assumptions). (b) there is a real world you can ask questions of, so why would you not want real answers??
Philosophically and historically, I consider the move towards empirical science (Bacon etc.) probably the greatest advancement humanity has made to date. It certainly is dramatically more effective than any other method of separating truth from non-truth we have, whereas scholasticism...
While the shotgun approach to science (run random experiments and see what correlations fall out) can be useful at times, there's a benefit to figuring out precisely what you're looking for before you start looking.
Not sure where you get that from someone extolling the empirical scientific method and science in general.
You obviously don't type random strings until by chance you hit on something.
Scientific method includes cycles of hypothesis, experiment, validation/refutation.
As with anything, I think you can swing too far. I had one junior developer under me who would constantly try different things, but this person was not an experimenter. There was no hypothesis being tested or reasonable mental model that they were working against. It was just a painful-to-watch spew of ill-formed and poorly thought out code of questionable syntax.
I tend to fall more on the experimenter side of things, but sometimes code staring is exactly the right approach.
For me the reason is that the bandwidth is too narrow. To make it work well you often have to guess just right, and the problem with guessing just right is similar to code-staring: your mental model of the code is currently wrong, therefore your chances of guessing just right are also not too good. And if you guess wrong, you often have to start from scratch. Another reason might be that I was scarred for life by early versions of gdb :-) And of course there's a lot of code nowadays you don't want to stop in a debugger, because it will behave quite differently (your network request just timed out...)
I rather collect diagnostics and then try to figure out what happened.
This approach also answers (to my own satisfaction, if nobody else's) the old question "how can you refactor confidently without unit tests?" Because I can prove the equivalence of any refactoring in my head, the exact same way I can recheck an algebra problem and know it's right. To me, code is a math problem. I don't mean the arithmetic in the expressions. I mean the whole structure of the program is one big math problem.
I'm a code starer. But I have my limits. If I can't reason about some piece of code and it meets the following criteria:
1. it's sufficiently complex; and
2. I need to understand it so that
3. I can extend it
Then I usually write an interface to it that I can reason about. That interface should add type information and will be peppered with assertions about invariants I expect to hold. As I understand more about the underlying code I add more assertions. If it's a dynamic language I will write many tests against the interface. And I just program against the interface instead of trying to dissect the beast and programming-by-hypothesizing.
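To make that concrete, here is a minimal sketch of what such an interface might look like in Python. Everything in it is hypothetical: `legacy_merge` stands in for the code that is hard to reason about, and `merge_sorted` is the typed, assertion-guarded wrapper you would program against instead.

```python
from typing import List, Sequence


def legacy_merge(a, b):
    """Hypothetical stand-in for the opaque code being wrapped."""
    return sorted(list(a) + list(b))


def _is_sorted(values: Sequence[int]) -> bool:
    # True when every element is <= its successor.
    return all(x <= y for x, y in zip(values, values[1:]))


def merge_sorted(left: Sequence[int], right: Sequence[int]) -> List[int]:
    """Typed interface over legacy_merge, with the invariants spelled out."""
    # Preconditions: what I currently believe the legacy code requires.
    assert _is_sorted(left), "left must be sorted"
    assert _is_sorted(right), "right must be sorted"

    result = legacy_merge(left, right)

    # Postconditions: what I expect to hold afterwards.
    assert len(result) == len(left) + len(right), "no elements lost or added"
    assert _is_sorted(result), "result must be sorted"
    return result
```

As understanding of the underlying code grows, more assertions go into the wrapper, and callers never touch `legacy_merge` directly.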
For projects where being correct is even more critical or bugs are more costly I'll even reach for higher-level tools like TLA+, Event-B, etc. It's amazing what these tools can do and I wish I'd known about them earlier in my career.
In late March 2016 I was working out a problem in Openstack that had the appearance of a race condition. It had to do with the vif_unplugged_event sent from Nova to Neutron. In certain situations when the L2 service component in Neutron was behind on its work it would fail to respond to the event before Nova carried on with its work leaving the network in a bad state. We'd ultimately found a solution to the problem but while I was in the middle of it I was talking about it to someone who would graciously introduce me to the idea of modelling the system and using a model checker to find the race condition for me. I had heard about TLA+ from somewhere, once, so I listened.
I had been an enthusiastic enough student that this someone would become a good friend. Together we decided to work on some models of some underlying components in Openstack. Driven by the Amazon paper on their use of TLA+ on the AWS services it seemed like a worthwhile cause: see if we could convince open source projects to adopt these tools and techniques in the critical parts of their systems. Improve the reliability and safety of infrastructure projects like Openstack which continue to be used as components in applications such as Yahoo! Japan's earthquake notification service.
We haven't published our models yet but we started with a model of the semaphore lock in greenlet. It started as a high-level model using sets, simple actions, and some invariants in pure TLA+. We then added a model of the greenlet implementation in TLA+'s PlusCal algorithm language and used the model checker to prove the invariant in the higher-level specification still held when refined by the implementation model. We then refined the specification and the model in TLA+ until we came quite close to a representative implementation of the semaphore in PlusCal that was very close to how the Python code was written. We didn't find any errors which I think was satisfying.
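For readers who haven't seen a model checker at work, the basic idea can be sketched in plain Python. This is not TLA+ or PlusCal and it is not the greenlet model described above; it is a toy, assumed model of a counting semaphore with three worker states ("idle", "waiting", "critical"). It explores every reachable state, checks an invariant in each one, and records any state with no possible next step as a deadlock.

```python
from collections import deque

CAPACITY = 2  # permits the semaphore starts with (assumed for this toy)
WORKERS = 3   # number of modelled workers (assumed for this toy)


def next_states(state):
    """Yield every state reachable from `state` in one step."""
    count, workers = state
    for i, w in enumerate(workers):
        if w == "idle":                      # worker asks for a permit
            new = "waiting", count
        elif w == "waiting" and count > 0:   # acquire: takes one permit
            new = "critical", count - 1
        elif w == "critical":                # release: returns the permit
            new = "idle", count + 1
        else:
            continue                         # waiting with no permit: blocked
        yield (new[1], workers[:i] + (new[0],) + workers[i + 1:])


def invariant(state):
    """Permits are never negative and are always fully accounted for."""
    count, workers = state
    return count >= 0 and count + workers.count("critical") == CAPACITY


def check():
    """Breadth-first search over all reachable states."""
    initial = (CAPACITY, ("idle",) * WORKERS)
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        assert invariant(state), f"invariant violated in {state}"
        successors = list(next_states(state))
        if not successors:
            deadlocks.append(state)  # nothing can move: deadlock
        for s in successors:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return len(seen), deadlocks
```

Calling `check()` returns the number of distinct states visited and the list of deadlocked states. Real tools handle far larger state spaces and richer properties, but the exhaustive-search principle is the same.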
We decided to take our little project to the Openstack design summit in Austin. My enthusiastic partner in maths and I found a handful of naive souls to come to an open discussion about formal methods, software specifications, and Openstack. It went quite well. We unfortunately haven't been able to expand on that effort as I'd lost my employment and he had to focus on his PhD thesis.
Needless to say though I've since used verification tools like TLA+ to model design ideas and I continue my studies in predicate calculus and logic-based proofs. I just don't talk about it too much at work. Tends to frighten the tender souls.
Update: I should clarify that the errors we were particularly interested in were deadlocks in the implementation of the semaphore.
What I should have added is that invariably, the problem would be in one of these places that they had eliminated by reasoning.
While you obviously need to think about your code, otherwise you can't formulate useful hypotheses, you then must validate those hypotheses. And if you've done any performance work, you will probably know that those hypotheses are also almost invariably wrong. Which is why performance work without measurement is usually either useless or downright counterproductive. Why should it be different for other aspects of code?
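A small illustration of "hypothesis first, then measure", sketched in Python with the standard `timeit` module. The two functions and the workload are made up for the example. The hypothesis under test is "repeated `+=` is slower than `str.join` for this input", and the point is that you run the measurement rather than trust the guess.

```python
import timeit


def build_with_concat(parts):
    """Candidate A: repeated string concatenation."""
    out = ""
    for p in parts:
        out += p
    return out


def build_with_join(parts):
    """Candidate B: a single str.join call."""
    return "".join(parts)


def measure(fn, parts, repeat=5, number=200):
    """Best-of-`repeat` wall time, in seconds, for `number` calls of fn(parts)."""
    return min(timeit.repeat(lambda: fn(parts), repeat=repeat, number=number))


if __name__ == "__main__":
    parts = [str(i) for i in range(1000)]
    # Both candidates must agree before their timings mean anything.
    assert build_with_concat(parts) == build_with_join(parts)
    for fn in (build_with_concat, build_with_join):
        print(f"{fn.__name__}: {measure(fn, parts):.6f}s")
```

The numbers depend on the interpreter, the machine, and the input size, which is exactly why the hypothesis has to be checked and not assumed.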
Again, forming hypotheses is obviously crucial (I also talk about this in my performance book, iOS and macOS Performance Tuning [1]). I've also seen a lot of waste in just gathering reams of data without knowing what you're looking for.
That's why I wrote experimentalist, not "data gatherer". An experiment requires a hypothesis.
For example, I learned about pointers by debugging them and coercing random memory addresses (which were ints) into pointers. Once the concept was learned, I could reason about it.
So nowadays I can reason about code, provided I understand the concept. But I learn about the concept by debugging/experimenting -- in most cases.
Of course I am a big fan of automated tests.
You can start experimenting with the program at runtime while the program still is very incorrect. The type checker would require that the program at least makes sense from a typing perspective.
There's a continuum of how nicely things are written in real life: if you just crashed your car, you might text your family quickly telling them what happened. Or if the site went down, you might hack together a quick fix. That'll be utilitarian, since in a situation like that you don't give a damn about how well the text/code "flows". Then there's writing where you're not pressured for time, but you don't care about the style: most code people write is like this. Then finally there's code, and writing, that's intended to be beautiful. It's written with the intent of showing it to other people, and with the hope that someone will read it and say "wow".
The reality, however, is that most code - and most writing - is merely utilitarian. And even when people -do- try to make their code (or writing) beautiful, they aren't skilled enough to do it. Hell, just because I _want_ to write like Márquez doesn't mean I can.
There is code that is literature, but it's hard to find. It may be lost, written on a whiteboard in a job interview to impress the company. It may be hidden from the world, the author embarrassed by it. Or, it might be out there somewhere, but nobody's discovered it yet.
* An informative program that you will learn something (good) from.
* A cryptic cipher (seemingly) that will only make your eyes bleed and your head ache.
Just sayin'.
Hmm.. Isn't the TL;DR supposed to be for readers who don't have that "moment" for you to get to the point?
Some of the other comments point out that the literature idea doesn't work for all code. It probably helps if both the author and the reader buy in to the idea. But I can't get behind the idea that it's true to say "code isn't literature" when what seems to be the case is that literature is a thing we can do to texts, not a thing we must do. Code might be a genre of text? I don't know. But maybe the baby is being thrown out with the bathwater here.
I appreciate that he's trying to dispel the idea that we "read" code as we read for pleasure. I learn from code by experimenting with it. I open up the debugger and step through it, watch the variables change and see where it goes when I execute it. Most of all, I learn by changing that code and trying to build on it. I have enhanced my JavaScript skills immensely in recent years by cloning various projects on GitHub and trying to expand on them or adapt them to my own purposes. I don't recommend opening up a code base and just reading it; actively engage with it, break it, and enhance it.
Sans culture, written text is just that: written text, and there can be no "literature".
And given that literature is 'machine code' for the 'psychological machine' of the Human 'reader', it should have been clear to OP from day 1 that if there is such a thing as "software as literature" it must be 'read' from the pov of the Machine, SICP gods notwithstanding.
[edit]
For people who believe code isn't literature, descriptive names are wasted characters. The biggest difference between code and literature is that code is structured where literature isn't (literature makes up for this with a flow like a plot). Explicit structures are things like subtypes and nested scopes, qualities that exist directly and not through inference or reference resolution. There are things that can be inferred directly from structures, like context and scope. For these developers code can be less explicit and less wordy because they aren't reading code line by line and guessing at how the pieces come together.
The "code is literature" camp represents an educated formalism. In the absence of explicit structures you need to understand what the code is doing by simply reading the code.
The problem with the "code is literature" approach is that it is for humans only. The computer absolutely doesn't care, and so therefore it can be wrong or deceptive.
But if your names and variables don't explicitly explain what each code section tries to achieve, there's no way to figure out what it's supposed to do and why it's coded the way it is, if you don't already know that. From the code itself you can only learn what the code does, not what the developer intended to achieve; if you don't know the latter, there's no way to tell if there's a bug in it and it's working in a wrong way.
This is especially important for the parts of the program you don't understand, which are the ones where you would benefit the most from proper comments/explanations, i.e. literature.
Data point: I do read code like books and I can't stand longAndMeaningful variable names. Rather, cut to the chase.
It takes no time at all to read a longAndMeaningful variable name, and no time at all to work out what it's doing.
Assuming you don't name variables a1 to z256, calling a variable something like "items" makes it context dependent, which is a little too much like building state into the syntax.
Long names provide an extra informal level of type safety. You should be able to see what a variable is doing by looking at it, and you should also be able to see that it provides a clean level of abstraction/composition instead of just being an arbitrary container with a label.
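A tiny, made-up Python example of what that informal type safety looks like. Both functions below compute the same thing; the names and the millisecond/second units are assumptions chosen purely for illustration.

```python
def deadline(t, d):
    """Short names: nothing tells the reader that d is in milliseconds."""
    return t + d / 1000


def deadline_seconds(start_seconds: float, timeout_ms: int) -> float:
    """Longer names: the units travel with the variables."""
    timeout_seconds = timeout_ms / 1000
    return start_seconds + timeout_seconds
```

With the second version, a mistake such as `start_seconds + timeout_ms` looks wrong on sight, even though no type checker would complain about adding two numbers.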
And of course, there's a difference between longAndMeaningful and reallyVeryLongWindedAndExplicitButOhSoMeaningful. And let's not start about the FactoryFactoryBuilderEtc thingies.
That whole paragraph describes exactly how I go about learning sections of a new codebase.
Rename stuff, pull out methods and classes, reformat etc until I have a few "a-ha" moments then I discard my changes and re-read the original with a little more enlightenment.
This reminds me of something a friend said to me a long time ago about his lack of interest in crosswords: "If I want a good puzzle I read a poem."
I've always approached other people's code as a puzzle.
