Hacker News
Code is not literature (2014) (gigamonkeys.com)
112 points by setra on March 14, 2017 | 65 comments



Code is not literature the same way equations are not mathematics. Take any math or physics book, extract all of the equations, and throw away the text. That's how we write programs... just the equations.

However, look at Physically Based Rendering (http://www.pbrt.org/). It won an Academy Award. It should win a Pulitzer. All of the actual source code is in the book and the book is quite readable.

Or look at Lisp in Small Pieces (https://pages.lip6.fr/Christian.Queinnec/WWW/LiSP.html). There you will find a complete lisp system surrounded by enlightening text.

Your confusion stems from the coding constraints of the 1970s. My first machine had 8k of memory, 4k of which was operating system. So programs could not exceed 4k, requiring #include statements. As programs grew they adopted the "pile of sand" (POS) format, where "meaningful" directory names like 'src', 'doc', 'test', etc. contain the semantic content.

Of course nobody reads POS code. It wasn't written to communicate to humans.

It's the late 90s... code as if humans matter, as if your code needs to be understood, maintained, and modified. Write literate programs similar to Physically Based Rendering and Lisp in Small Pieces. Stop programming like your grandfather did. Oh, yeah, and write provable programs... but that's another rant.


> Stop programming like your grandfather did.

Actually, those who had family members working at Burroughs, ETHZ, DEC, Xerox PARC, Texas Instruments, or on Genera probably should start coding as their grandfathers did. :)


I think the post does a good job of arguing against your position here (and thus of explaining why literate programming has been such a failure). Literate programming misunderstands what code is, and so misunderstands how we go about understanding code. Literate programming imagines code is like a book, i.e., something that we understand by contemplating it. The post argues (I think correctly), that code is more like an organism or a mechanism, which can best be understood by observing it in action, or by interacting with it.


I am rather conflicted to read this advice. I think we all want to do this; on the other hand, I think we also generally cannot afford the time to do it. In the project that I run, many major subsystems have been obsoleted by other algorithms, changes in the underlying hardware, etc. Every hour spent not writing beautiful literate code to communicate with other humans in these cases is an hour saved.

And yet, and yet.


Every hour saved by skipping code readability is four hours wasted later trying to reverse-engineer what the code author was doing. On-the-job code that can be discarded with no maintenance or bug-fixing is astonishingly rare. Even building different parts of a system often requires code updates in other portions, even if that code isn't executed yet.

Also, writing your thoughts into concrete, descriptive comments helps with rubber-duck debugging, often pointing out weaknesses or edge cases you didn't think of while you were heads-down in the code. It's far faster to fix issues ahead of time than after you've swapped the code out of mind.


I have been that second person many times and I agree completely with this comment. When you skip readability you are saving your own dev time at the expense of other future devs'.


That sounds like debt in time against the technical aspects of the system in question.

If only we as an industry had some label for this Technical Debt so that we could appeal to practices like refactoring, unit tests and continuous integration to pay it down.


All business is about borrowing stuff from the future -- getting as much gain as possible now and deferring the costs as far as possible into the future, when someone else needs to pay back the debt, or when it can be defaulted on. Business managers don't let tech workers spend time refactoring because it violates this basic business principle. They will make up some other positive-sounding label, e.g. Agile Development, and bandy that around instead. When the system's technical debt is called in, those managers will have already moved to another system to pull the same con, or they'll replace the system with a fresh one complete with green-card-seeking programmers. The programmers on the old system will be retrenched without payment via stack-ranking, and their health problems caused from stress and lack of sleep forgotten about.


Every non-throwaway project involves communicating with another human, which includes yourself in six months.


> Oh, yeah, and write provable programs

Provable with respect to what specification?


But I think rendering is a specific kind of programming problem that allows simple mathematical reasoning (even if the equations are complicated). To state it differently, information is only flowing one way, i.e. from the model to the display.

Documenting a general computer program is much more involved.


I've never heard "pile of sand" used to describe that layout before. I tried googling but it only brought me back to this post.

Do you have any background on this term?


There's a distinction I've noticed ever since university CS:

The code-starers and the experimentalists. When debugging, the code-starers would, well, stare at the code and try to reason it out logically. There would be statements such as "this cannot be the case, because ...".

The experimentalists would run the program with variations in input and code and observe the effects in order to figure out what was really happening.

As you can probably tell, I am an experimentalist, and I find the code-staring approach puzzling: (a) the world is not as you expect it to be, so reasoning about what can or cannot be the case is of limited use, as your reasoning is evidently faulty (usually in its assumptions); (b) there is a real world you can ask questions of, so why would you not want real answers?

Philosophically and historically, I consider the move towards empirical science (Bacon etc.) probably the greatest advancement humanity has made to date. It certainly is dramatically more effective than any other method of separating truth from non-truth we have, whereas scholasticism...


I'm a "code starer" that uses experiments to inform the "staring" phase. I look hard at the code, think to myself "this can't happen... unless", I formulate the hypothesis and then use experiments to prove/disprove it.

While the shotgun approach to science (run random experiments and see what correlations fall out) can be useful at times, there's a benefit to figuring out precisely what you're looking for before you start looking.


> shotgun approach to science...random experiments

Not sure where you get that from someone extolling the empirical scientific method and science in general.

You obviously don't type random strings until by chance you hit on something.

Scientific method includes cycles of hypothesis, experiment, validation/refutation.


If you present your view as a simplistic dichotomy, you should not be surprised when it is taken as such, and if you put questions in your post, you should not be surprised if someone answers them.


The dichotomy was between experimentalists and code-starers. Not between random-code-typers and code-starers.


Your characterization of 'code-starers' is a caricature, and the questions you posed suggested that you did not fully understand the scientific method. TeMPOral's post was a reasonable response to it.


And "code-staring" is also not simply running your hand over what "must" be right and continually asserting that the bug is "impossible". It is exactly the step of debugging that informs the hypothesis to be tested.


Agreed. I can get caught up in a mess of different 'experimental' approaches and need that 'staring' phase to re-contextualize.


Especially when dealing with concurrency.


I think the experimentation helps identify the piece of code to stare at. Some people are just choosing staring at an earlier point than you are.

As with anything, I think you can swing too far. I had one junior developer under me who would constantly try different things, but this person was not an experimenter. There was no hypothesis being tested or reasonable mental model that they were working against. It was just a painful-to-watch spew of ill-formed and poorly thought out code of questionable syntax.

I tend to fall more on the experimenter side of things, but sometimes code staring is exactly the right approach.


Don't forget the "step-througher". I'm surprised nobody in this thread has mentioned using a debugger. I tend to rely heavily on the debugger and stepping through the code. The two most common mistakes beginners make are 1. "printf debugging" and 2. simply trying different inputs and reasoning about what outputs they give. You just wrote the code yourself (or at least have the source)! It's not a black box with just inputs and outputs. Step through the darn thing and see if it's really doing what you think it should be doing.
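For a concrete sense of what stepping through buys you, here's a throwaway Python sketch (the function and values are invented for illustration); uncommenting `breakpoint()` drops you into pdb right at that line:

```python
def running_total(xs):
    """Cumulative sums; small enough to step through in a minute."""
    total, out = 0, []
    for x in xs:
        total += x
        # breakpoint()  # uncomment: `n` steps a line, `p total` inspects state
        out.append(total)
    return out

print(running_total([1, 2, 3]))  # prints [1, 3, 6]
```

Inside pdb you watch `total` and `out` evolve on each iteration, which is exactly the "see if it's really doing what you think" check.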


I agree stepping through is incredibly useful, but I think you're wrong to characterize "printf debugging" as a mistake. A printout of several well-formatted printfs can be a good way to view more than one moment of a program at once. It's less useful for control-flow errors but can be good for other problems.
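To illustrate that point with a made-up example (a bisection routine, not code from the thread): a few well-formatted prints give you a timeline of the whole run at a glance, where a debugger shows only one moment at a time.

```python
def find_root(f, lo, hi, tol=1e-6):
    """Bisection search; each print is one frame of the run's timeline."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        print(f"lo={lo:.6f}  mid={mid:.6f}  hi={hi:.6f}  f(mid)={f(mid):+.6f}")
        if f(lo) * f(mid) <= 0:
            hi = mid  # sign change lies in [lo, mid]
        else:
            lo = mid
    return (lo + hi) / 2

root = find_root(lambda x: x * x - 2, 0.0, 2.0)  # converges toward sqrt(2)
```

Reading the printed frames top to bottom shows the interval shrinking, so a stall or oscillation would jump out immediately.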


I tend to write JS and languages that compile to it, and I feel debugging tools are the area most lacking in that space.


I have to admit I find stepping-through generally not too useful (although there are obviously exceptions when it works well).

For me the reason is that the bandwidth is too narrow. To make it work well you often have to guess just right, and the problem with guessing just right is similar to code-staring: your mental model of the code is currently wrong, therefore your chances of guessing just right are also not too good. And if you guess wrong, you often have to start from scratch. Another reason might be that I was scarred for life by early versions of gdb :-) And of course there's a lot of code nowadays you don't want to stop in a debugger, because it will behave quite differently (your network request just timed out...)

I rather collect diagnostics and then try to figure out what happened.


I am in the staring camp. Until you understand what the code is supposed to be doing, what is there to experiment on? Code is not as complex as most things in the world, and reasoning about it is actually extremely useful.


Moreover, I think that if you can't reason about the code you're looking at, you don't understand it and you definitely should not be changing anything in it. Otherwise you're like a cat under the car's hood, saying "I found the problem - the engine is made out of parts".


Experimenting and reasoning ("staring") solve two different problems: the "what" and the "why". Both are useful tools suited to different tasks, and neither should be used to the exclusion of the other.


I've been a "code starer" for 30 years and emphatically wouldn't do it differently. I want the model of the program in my head to be so faithful that I can, and do, spot bugs by 'running' it there. I'll happily stare at code until that's the case. That state of things maximizes the speed I can work at relative to the stability of what I write. It's not at all uncommon for me to write code for weeks without even compiling once. And usually when I finally do compile and run, there is very, very little to debug.

This approach also answers (to my own satisfaction, if nobody else's) the old question "how can you refactor confidently without unit tests?" Because I can prove the equivalence of any refactoring in my head, the exact same way I can recheck an algebra problem and know it's right. To me, code is a math problem. I don't mean the arithmetic in the expressions. I mean the whole structure of the program is one big math problem.


Reasoning is one of the most powerful tools we have.

I'm a code starer. But I have my limits. If I can't reason about some piece of code and it meets the following criteria:

1. it's sufficiently complex; and

2. I need to understand it so that

3. I can extend it

Then I usually write an interface to it that I can reason about. That interface should add type information and will be peppered with assertions about invariants I expect to hold. As I understand more about the underlying code I add more assertions. If it's a dynamic language I will write many tests against the interface. And I just program against the interface instead of trying to dissect the beast and programming-by-hypothesizing.
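A minimal Python sketch of that kind of assertion-laden facade; the legacy lookup, its names, and the invariants are all invented stand-ins for illustration, not real code from any project:

```python
def _legacy_lookup(user_id):
    # Stand-in for the opaque code we can't yet reason about.
    return {"id": user_id, "email": f"user{user_id}@example.com"}

def lookup_user(user_id: int) -> dict:
    """Typed facade over the legacy lookup, peppered with invariants."""
    assert isinstance(user_id, int) and user_id > 0, "ids are positive ints"
    record = _legacy_lookup(user_id)
    assert record is not None, "observed: never None for a valid id"
    assert "email" in record, "observed: every record carries an email"
    return record
```

Callers program against `lookup_user`; any assumption about the underlying code that turns out to be false then fails loudly at the boundary instead of corrupting state somewhere deeper.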

For projects where being correct is even more critical or bugs are more costly I'll even reach for higher-level tools like TLA+, Event-B, etc. It's amazing what these tools can do and I wish I'd known about them earlier in my career.


Can you talk about cases that you've modeled in TLA+? I love to read about practical uses of modeling techniques, because the more I code, the more I am convinced that unit testing is just a stab in the dark.


Sure.

In late March 2016 I was working on a problem in Openstack that had the appearance of a race condition. It had to do with the vif_unplugged_event sent from Nova to Neutron. In certain situations, when the L2 service component in Neutron was behind on its work, it would fail to respond to the event before Nova carried on with its own work, leaving the network in a bad state. We ultimately found a solution to the problem, but while I was in the middle of it I was talking about it with someone who graciously introduced me to the idea of modelling the system and using a model checker to find the race condition for me. I had heard about TLA+ from somewhere, once, so I listened.

I had been an enthusiastic enough student that this someone would become a good friend. Together we decided to work on models of some underlying components in Openstack. Driven by the Amazon paper on their use of TLA+ for AWS services, it seemed like a worthwhile cause: see if we could convince open source projects to adopt these tools and techniques in the critical parts of their systems, and improve the reliability and safety of infrastructure projects like Openstack, which continue to be used as components in applications such as Yahoo! Japan's earthquake notification service.

We haven't published our models yet but we started with a model of the semaphore lock in greenlet. It started as a high-level model using sets, simple actions, and some invariants in pure TLA+. We then added a model of the greenlet implementation in TLA's PlusCal algorithm language and used the model checker to prove the invariant in the higher-level specification still held when refined by the implementation model. We then refined the specification and the model in TLA+ until we came quite close to a representative implementation of the semaphore in PlusCal that was very close to how the Python code was written. We didn't find any errors which I think was satisfying.

We decided to take our little project to the Openstack design summit in Austin. My enthusiastic partner in maths and I found a handful of naive souls to come to an open discussion about formal methods, software specifications, and Openstack. It went quite well. We unfortunately haven't been able to expand on that effort as I'd lost my employment and he had to focus on his PhD thesis.

Needless to say, though, I've since used verification tools like TLA+ to model design ideas, and I continue my studies in predicate calculus and logic-based proofs. I just don't talk about it too much at work. Tends to frighten the tender souls.

Update: I should clarify that the errors we were particularly interested in were deadlocks in the implementation of the semaphore.


> "this cannot be the case, because ...".

What I should have added is that invariably, the problem would be in one of these places that they had eliminated by reasoning.

While you obviously need to think about your code, otherwise you can't formulate useful hypotheses, you then must validate those hypotheses. And if you've done any performance work, you will probably know that those hypotheses are also almost invariably wrong. Which is why performance work without measurement is usually either useless or downright counterproductive. Why should it be different for other aspects of code?

Again, needing to form hypotheses is obviously crucial (I also talk about this in my performance book, iOS and macOS Performance Tuning [1]), but I've also seen a lot of waste in gathering reams of data without knowing what you're looking for.

That's why I wrote experimentalist, not "data gatherer". An experiment requires a hypothesis.

[1] https://www.amazon.com/gp/product/0321842847/ref=as_li_tl?ie...


By simply experimenting without understanding, you run a good chance of producing code that is only correct by accident, or code that is technically correct but wildly inefficient. Also, by first taking the time to build up a mental model of what is going on, you might spend longer fixing the first and second bugs (which tend to be the easy ones), but you'll spend dramatically less time fixing the 12th and 13th (which tend to be the really tricky ones).


I'm pretty sure, as you've heard from others, that everyone does a bit of both. If you never look at the code and reason about why it gave you the result you got in your experiment, how do you get to the point of changing the code? Surely you don't experiment with code changes until the process gives you the result you want.


I'm an experimentalist. For me, it feels too hard to reason about code. I prefer to see what happens until an error occurs, and at that moment I feel a lot more incentivized to reason about why it goes wrong. With experimenting I can ask questions about the context, so I don't need to remember the context. I'm not even sure I have the memory for it; all I know is that I process information quite well, but my active memory is quite limited.

For example, I learned about pointers by debugging them and coercing random memory addresses (which were ints) into pointers. Once the concept was learned, I could reason about it.

So nowadays I can reason about code, provided I understand the concept. But I learn about the concept by debugging/experimenting -- in most cases.
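The commenter presumably did this in C, but the same experiment can be sketched safely in Python with the stdlib ctypes module (using a known-valid address rather than a random one):

```python
import ctypes

buf = ctypes.create_string_buffer(b"hello")
addr = ctypes.addressof(buf)              # an address is just an int...
first = ctypes.c_char.from_address(addr)  # ...until you reinterpret it
print(first.value)                        # prints b'h'
```

Poking at memory like this, with the interpreter answering every question immediately, is exactly the kind of debugging-as-learning the comment describes.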


The best way I know to debug a piece of software (or just to understand it) is to rewrite it. I keep rewriting it until a) the error presents itself or b) I have internalized the model so profoundly that I can understand where the problem is.

Of course I am a big fan of automated tests.


The old meaning of the word "code hacker" from the 1980s (before ycombinator took and redefined it) was someone who writes and maintains programs by running them with variations in input and observing the effects, until they basically work. "Basically" here is of course defined as passing over half of the unit tests, which of course are performed by the business users after the change is in production. Starers are the ones who didn't send a double to the aptitude test at their job interview.


This also explains why dynamic languages can be easier to work with, especially for less-than-expert programmers.

You can start experimenting with the program at runtime while the program is still very incorrect. A type checker would require that the program at least make sense from a typing perspective first.
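A small Python sketch of that trade-off (the names are invented): the unpaid path already runs even though `ship` doesn't exist yet, whereas a static checker would flag the whole file until it does.

```python
def plan(order):
    if order["paid"]:
        return ship(order)  # ship() isn't written yet: NameError if reached
    return "awaiting payment"

# The incomplete program is already testable on the path we care about.
print(plan({"paid": False}))  # prints: awaiting payment
```

Python only resolves the `ship` name when that branch actually executes, which is what makes this kind of early experimentation possible.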


I do both. Sometimes one solves the problem, sometimes the other does. I like having as many debugging techniques as I can.


I think, just like writing, some code is meant to be read but most is produced as a fact of life. For example, of what I write in my daily life - emails, HN comments, papers, text messages... the majority aren't meant to be aesthetically pleasing at all. But occasionally I will write something, either on purpose or by accident, that's just a little bit poetic and nice. Or I'll sit down to write a story, or a blog post, or something that's meant to be a bit nicer to read.

There's a continuum of how nicely things are written in real life: if you just crashed your car, you might text your family quickly telling them what happened. Or if the site went down, you might hack together a quick fix. That'll be utilitarian, since in a situation like that you don't give a damn about how well the text/code "flows". Then there's writing where you're not pressured for time, but you don't care about the style: most code people write is like this. Then finally there's code, and writing, that's intended to be beautiful. It's written with the intent of showing it to other people, and with the hope that someone will read it and say "wow".

The reality, however, is that most code - and most writing - is merely utilitarian. And even when people do try to make their code (or writing) beautiful, they aren't skilled enough to do it. Hell, just because I want to write like Márquez doesn't mean I can.

There is code that is literature, but it's hard to find. It may be lost, written on a whiteboard in a job interview to impress the company. It may be hidden from the world, the author embarrassed by it. Or, it might be out there somewhere, but nobody's discovered it yet.


To further extend that idea, you can lump code you (must?) read into 2 broad overgeneralizations:

* An informative program that you will learn something (good) from.

* A cryptic cipher (seemingly) that will only make your eyes bleed and your head ache.

Just sayin'.


Seibel's observation that reading code is less like reading literature and more like doing science is dead on. No matter how readable the code is, when I'm confronted with 10,000 lines of it spread across numerous encapsulated functions, I must tackle it very differently from how I read prose. With a complex literary text, I can just read in linear fashion, with occasional segues to look up words and concepts; with well-engineered code, I must follow numerous cases through different flows of logic. These aren't the same at all.

I appreciate that he's trying to dispel the idea that we "read" code as we read for pleasure. I learn from code by experimenting with it: I open up the debugger and step through it, watch the variables change, and see where it goes when I execute it. Most of all, I learn by changing that code and trying to build on it. I have enhanced my javascript skills immensely in recent years by cloning various projects on github and trying to expand on them or adapt them to my own purposes. Don't just open up a code base and read it; actively engage it, break it, and enhance it.


> Tl;dr: don’t start a code reading group. What you should start instead I’ll get to in a moment but first I need to explain how I arrived at my current opinion.

Hmm.. Isn't the TL;DR supposed to be for readers who don't have that "moment" for you to get to the point?


Blogs are also not literature, apparently. :) Otherwise they would merit more careful editing.


The tl;dr is just "don’t start a code reading group."


Literature is not an inherent quality, for me. Literature (and art more broadly) is, among other things, a lens that humans look at statements and groups of statements through. There is no "Is this literature?"; there is only "what happens when I treat this 'as literature'?" Is it useful to treat the complex interconnected statements in a codebase as literature? Yes, probably, to some people. For me, music and plays both provide good structural references for learning to code. It is occasionally useful to me to connect these things. But I also treat literature itself as a bit of a specimen, as the author suggests: I want to know about a poem, what is it, how does it work, why is it this way and not some other way. I want to "run" a poem out loud and see what the sounds are.

Some of the other comments point out that the literature idea doesn't work for all code. It probably helps if both the author and the reader buy in to the idea. But I can't get behind the idea that it's true to say "code isn't literature" when what seems to be the case is that literature is a thing we can do to texts, not a thing we must do. Code might be a genre of text? I don't know. But maybe the baby is being thrown out with the bathwater here.


Reading code may not be like reading literature, but I find writing code to be remarkably similar to writing prose. I wrote about that in more detail here: http://www.ericsuh.com/blog/posts/2016/01/writing-code.html


You make good points there, I've proudly omitted so many needless words over the years.


Apropos of Asimov's remarks, I find it rather sad that a literature major is completely ignoring that literature can only exist in a cultural context.

Sans culture, written text is just that: written text, and there can be no "literature".

And given that literature is 'machine code' for the 'psychological machine' of the human 'reader', it should have been clear to the OP from day 1 that if there is such a thing as "software as literature", it must be 'read' from the POV of the machine, SICP gods notwithstanding.




There are two schools of thought on this: it is and it isn't.

For people who believe code isn't literature, descriptive names are wasted characters. The biggest difference between code and literature is that code is structured where literature isn't (literature makes up for this with flow, like a plot). Explicit structures are things like subtypes and nested scopes: qualities that exist directly, not through inference or reference resolution. Things like context and scope can be inferred directly from structures. For these developers, code can be less explicit and less wordy, because they aren't reading code line by line and guessing at how the pieces come together.

The "code is literature" camp represents an educated formalism. In the absence of explicit structures, you need to understand what the code is doing by simply reading the code.

The problem with the "code is literature" approach is that it is for humans only. The computer absolutely doesn't care, and so the prose can be wrong or deceptive.


There is one thing that the structure and choice of commands in code cannot communicate: intent. For most parts of the program this doesn't matter much, because many programs are conventional and do pretty much the same things; so the programmer can reasonably guess why the code is doing what it does if they have already seen similar programs.

But if your names and variables don't explicitly explain what each code section tries to achieve, there's no way to figure out what it's supposed to do and why it's coded the way it is, if you don't already know that. From the code itself you can only learn what the code does, not what the developer intended to achieve; and if you don't know the latter, there's no way to tell whether there's a bug in it and it's working in a wrong way.

This is especially important for the parts of the program you don't understand, which are the ones where you would benefit the most from proper comments/explanations, i.e. literature.


> For people who believe code isn't literature descriptive names are wasted characters.

Data point: I do read code like books and I can't stand longAndMeaningful variable names. Rather, cut to the chase.


I'm now reading a book in which the author uses variable names like "iep" (of type IfExpressionParser), and so on. If the code is longer than ten or twenty lines, I need to go back to the variable definition to make sure what it really is. Not convenient reading at all.


The chase being what?

It takes no time at all to read a longAndMeaningful variable name, and no time at all to work out what it's doing.

Assuming you don't name variables a1 to z256, calling a variable something like "items" makes it context dependent, which is a little too much like building state into the syntax.

Long names provide an extra informal level of type safety. You should be able to see what a variable is doing by looking at it, and you should also be able to see that it provides a clean level of abstraction/composition instead of just being an arbitrary container with a label.


I think it's somewhat of a middle ground. What hampers your ability to understand code, next to not knowing what variable names are supposed to mean, is not being able to easily discern variables from each other, making you do double-takes while reading and thus making the code more difficult to understand. If your code is littered with variables like longAndMeaningful and longAndWithMeaning and lengthyAndMeaningful etc., that becomes really annoying to read.

And of course, there's a difference between longAndMeaningful and reallyVeryLongWindedAndExplicitButOhSoMeaningful. And let's not start on the FactoryFactoryBuilderEtc thingies.


What do you do if you find a section of code that solves a problem that you have never seen before? How do you decide what it's supposed to do, and whether it is doing it right or it has a bug?


>> in order to grok it I have to essentially rewrite it

That whole paragraph describes exactly how I go about learning sections of a new codebase.

Rename stuff, pull out methods and classes, reformat, etc. until I have a few "a-ha" moments, then I discard my changes and re-read the original with a little more enlightenment.


The author is looking at this from the perspective of someone who reads 'for fun' or generalized learning. When you have to maintain (or god forbid, rewrite) a module, you do a lot of code reading. At that point the more like reading a good essay it is, the better.


I would take it one step further and say that a better analogy than collecting specimens would be that code-reading is like doing secondary research: Read an overview, dive down one path, take notes, find and read three more articles, etc.


What came to my mind while reading this is that code is closer to poetry than it is to 'literature'. Not a perfect analogy, mind you, but it shares these qualities:

- It tends to be dense.

- It is meant to be decoded.

- The terms/words/commands/formatting are picked carefully to evoke concepts (and in poetry, feelings).

This reminds me of something a friend said to me a long time ago about his lack of interest in crosswords: "If I want a good puzzle I read a poem."

I've always approached other people's code as a puzzle.


Literature also feels a lot like decoding science specimens, in the same sense as in the article: getting at, or understanding, the author's mind. I feel that code and 'literature' are the same. The reader of code plays the part of the compiler, computer, or conveyor. The reader of prose literature plays the part of spectator, character, or anything. These words, too, are code for our human computers to syntax-check, preprocess, and accept/install or discard/patch.


Reading code is more like figuring out all the plumbing of a large building.



