But as soon as the skilled code reader has a purpose in mind -- a question to answer -- he or she can rapidly find a meaningful narrative. Put into that context, programmers read code constantly, and the more they read the better they get.
So I don't like the "nobody actually reads code" claim. It's a strawman. When I tell people to read code, it's always in the context of "pick something you want to understand or fix, and read with that purpose in mind." Not "the Linux kernel is like Moby Dick, you should really read it all."
That doesn't mean that reading it is any less important, or that writing readable code is any less important, or that code is all an ugly mess not worth reading.
It may mean that the reading pattern for code doesn't lend itself to a reading group in the same way that literature does. So we might need to either spend more time reading code in solitude or develop new ways of running reading groups that suit this style of writing.
And it just seems plain ridiculous to say that people don't spend time reading code. People do read code, but they don't think about it as "reading" in the same way as someone reads a book, so they don't have it readily at hand when asked.
I think that asking "what repos have you cloned just so that you could look at something, not intending to actually build or use that particular code" is probably roughly analogous to asking "what Wikipedia pages have you read recently". I frequently do both of those for the same reasons.
I might pull up the Wikipedia page for the Apollo Lunar Module because I suddenly realize that I don't know how its RCS thrusters work, or I might clone git's repo because I suddenly realize that I don't know how git-notes is implemented.
In either case, I don't read the whole thing (and certainly not straight through in a linear manner, like I read Moby Dick), but rather I'm going to grep/ctrl-f to the part that I'm interested in, then probably jump around a dozen or more times until I am satisfied. That's still reading though.
I still end up pulling down the repo and using grep 90% of the time. OpenGrok is what I use when I don't even know which repo I'm interested in.
Seibel studied English, has written some popular programming books, and has had the experience of trying to set up code-reading seminars at multiple companies. The key points of the article were 1) many programming gurus recommend reading code yet nobody does this; and 2) applying a lit-seminar approach to investigating code doesn't really work. That's all he was saying. There's no need to imply that his understanding of literature is limited to pulp fiction (I highly doubt that it is).
If you go through life looking for opportunities to argue semantics, you won't be disappointed. But you'll also miss most of the meaning.
Personally, I have to use the skills I gained learning rhetoric and analyzing literature to make sense of some code bases. I've seen some shocking ball-of-mud code bases, and the only way to make sense of them was to understand the author(s), despite never having met them. And to understand the subtext of the syntax, despite its inconsistent application (poor naming conventions and a mix of ladder logic and structured text make it like reading illuminated medieval engravings). There are code smells and slight variations in copy/paste blocks that betray the history of edits. There are threads woven through multiple volumes, where deprecated interlocks and crossed wires lead to bizarre plot twists. It's easy to think of certain machines as having personalities (or mental disorders, as the case may be). And there's never just an atomic dozen lines of code; at best you can reference five subroutines across two PLCs to describe what may be going on. And like good literature, you can't spoil the ending, as the joy is in the retelling of the story (of how you tracked down what was really happening).
It doesn't have to be a fascinating travesty, but I've yet to see a boring, dull, straightforward control system. Perhaps industrial programming logic just naturally turns out that way.
But it sure feels like Moby Dick, in its size, its depth, and its unreliable narrator.
Of course I have. Any time I've picked up a reference book, textbook, or anthology.
books != novels
I really enjoyed reading this article, but I would argue with its headline. Based on the author's experience and the example from Donald Knuth, it seems like the best way to read code is to go through it multiple times to the point where you could reimplement it or provide complete documentation for it.
The literary analog for code reading might be writing a scholarly reader's companion to a book.
You can't write a secondary source for a work of literature by reading it once through like a drugstore thriller or romance. A literary analyst would read the book through completely more than three times and spend hours on certain key passages. They would take extensive notes reconstructing the inner workings of the characters, the relationships between them, and key themes. Once the work has been comprehensively understood, the scholar can write out in an expository manner what is going on in the piece of literature, the same way that a thoroughly digested piece of software can be rewritten based on the mental model that develops as you read.
Obviously software and novels do not map completely one onto the other. I think the key similarity is that they both can be created with sufficient complexity to require taking multiple passes and following along with the author, building something similar yourself in order to truly understand them.
The potential advantages include:
- More source code and more documentation on the screen(s) at once
- Ability to edit documentation independently of source code (regardless of language?)
- Write documentation and source code in parallel without merge conflicts
- Real-time hyperlinked documentation with superior text formatting
- Quasi-real-time machine translation into different natural languages
- Every line of code can be clearly linked to a task, business requirement, etc.
- Documentation could automatically timestamp when each line of code was written (metrics)
- Dynamic inclusion of architecture diagrams, images to explain relations, call-graph hierarchies, etc.
- Single-source documentation (e.g., tag code snippets for user inclusion in manual[s]).
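The last item, single-source documentation, doesn't necessarily need a new file format; a rough version works on plain text. This is a minimal Python sketch that assumes a made-up marker convention (`# [snippet:NAME]` ... `# [/snippet]`); the markers and the `extract_snippets` helper are hypothetical, not part of any existing tool.

```python
import re

# Hypothetical marker convention (an assumption for this sketch, not a standard):
#   # [snippet:NAME]   opens a named region
#   # [/snippet]       closes it
START = re.compile(r"#\s*\[snippet:(\w+)\]")
END = re.compile(r"#\s*\[/snippet\]")


def extract_snippets(source: str) -> dict:
    """Map each snippet name to the source lines between its markers."""
    snippets = {}
    current = None  # name of the region we are currently inside, if any
    for line in source.splitlines():
        opened = START.search(line)
        if opened:
            current = opened.group(1)
            snippets[current] = []
        elif END.search(line):
            current = None
        elif current is not None:
            snippets[current].append(line)
    return {name: "\n".join(lines) for name, lines in snippets.items()}


example = '''
def helper():
    pass

# [snippet:greet]
def greet(name):
    return "Hello, " + name
# [/snippet]
'''

print(extract_snippets(example)["greet"])
```

A manual's build step could then splice the extracted text into the user guide, so the documented snippet and the shipped code come from the same source.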
There have been countless proposals over the years for some kind of richer file format for representing code, and they have all been busts, because so much of our tooling, assumptions, interoperability, and culture is centered on flat-text code that it has proven impossible thus far to switch.
If there are other known features that are more important, they would have taken off by now.
If you find yourself repeating an assignment, pull it out into a function and remove the duplication.
Rinse and repeat until everything in the file is assigned to a descriptive function or variable. Doing this with someone else's code gives you a feel for how to do it with your own.
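A tiny before/after sketch of that exercise in Python; the names (`TAX_RATE`, `total_with_tax`) and the numbers are invented for illustration, not taken from any real code base.

```python
TAX_RATE = 0.08  # hypothetical flat sales-tax rate

# Before: the same expression is repeated in every assignment.
book_total = 12.50 * 2 * (1 + TAX_RATE)
pen_total = 1.25 * 10 * (1 + TAX_RATE)


# After: the repeated expression is pulled out and given a descriptive name.
def total_with_tax(unit_price: float, quantity: int) -> float:
    """Cost of `quantity` items at `unit_price`, including TAX_RATE."""
    return unit_price * quantity * (1 + TAX_RATE)


book_total_refactored = total_with_tax(12.50, 2)
pen_total_refactored = total_with_tax(1.25, 10)
```

The refactored values are computed by the same arithmetic in the same order, so they match the originals exactly; what changes is that the reader now sees the intent ("total with tax") instead of re-deriving it from each expression.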
Code which can be represented as plaintext is versatile and portable and, more importantly, has its comprehensibility decoupled from any specific company, project, or group providing the necessary tools to make it human-readable. If an alternative file format isn't natively human-readable, then it is by my definition less readable, regardless of the standards put into writing the code itself.
Their simplicity is also a strength. That richer file format will at some point be written in a text file.
I am the founder of Crudzilla Software (see profile for link), a web dev platform.
We use JSR-223, which allows scripting engines for other languages to be integrated into the JVM. What we did was create a file format that serves as a meta-wrapper around pieces of code; we call this wrapper a "crud".
This turns out to be quite powerful as it allows the code to have additional instructions associated with it. For instance input validation, security and configuration can be specified along with the code they apply to.
Although Smalltalk's image-based systems are an interesting alternative.
Freeing code from the constraints of text files would allow different ways to interact with it. Visual programming, REPLs, etc., could work more smoothly in the same workflow as a traditional text editor.
"Good enough" principles make inline comments a clear choice, I'd think. Plus, seeing the comment in context may provide cognitive benefits; the comment and code block grouped in the same space helps recognize and associate their linkage in a way cross-highlighting one or the other may not.
"In my life as an architect, I find that the single thing which inhibits young professionals, new students most severely, is their acceptance of standards that are too low. If I ask a student whether her design is as good as Chartres, she often smiles tolerantly at me as if to say, 'Of course not, that isn't what I am trying to do. I could never do that.'"
Then: "That standard must be our standard. If you are going to be a builder, no other standard is worthwhile."
And so he asks the same thing about programming.
"But at once I run into a problem. For a programmer, what is a comparable goal? What is the Chartres of programming? What task is at a high enough level to inspire people writing programs, to reach for the stars? Can you write a computer program on the same level as Fermat's last theorem? Can you write a program which has the enabling power of Dr. Johnson's dictionary? Can you write a program which has the productive power of Watt's steam engine? Can you write a program which overcomes the gulf between the technical culture of our civilization, and which inserts itself into our human life as deeply as Eliot's poems of the wasteland or Virginia Woolf's The Waves?"
Maybe code is just bad literature?
This strategy may work for small programs, but it doesn't scale to large programs. For example, most people aren't going to have the time to refactor Firefox or the Linux kernel to figure out how they work.
Also, it's hard to tell a lot about a large program just by reading a listing of the source code. Certain things about the code become much more obvious if you step through the running code with a debugger. To extend the author's analogy of a program being a scientific specimen: the code is a living specimen whose behavior can be studied, not just a dead specimen that can be stained and looked at under a microscope.
However, even with a large program, sometimes I find it helpful to write a smaller program that does much the same thing as a small part of it. For example, last year I wrote a debugger frontend in Dart, based on the Chrome DevTools debugger. Whenever I wanted to implement something I'd first look at how the Chrome debugger did it.
Currently I'm working on a reimplementation of the React framework, also in Dart.
I still look through other sources, including man pages, books and a lot of googling. But sometimes I just want to see what it is I'm dealing with. I do this with all code bases I deal with. I think it's a good practice to get into.
If I come across a track that I really like, I don't just listen to it. I put it on the decks, try to mix it with something else, and listen to how it interacts with it. I put it on the grid and sample loops, hits, and small sounds. If you don't understand what I'm talking about, here's a video of Four Tet doing something similar to Jackson's Thriller:
Sometimes I analyze its structure, laying empty loops in muted tracks alongside it. Sometimes I try to recreate the synths that are used. Sometimes I go to whosampled.com and try to recreate the sampling process.
I'm sure writers do the same with literature they read, too.
I wonder how blurred the lines can truly become between code and literature, though. If a piece of code is primarily intended to be read and discussed, does that make it literature?
I personally use the development tools in Safari most of the time.
I think Chrome and/or Firefox provide more support for live editing of the current web page, however I've never made much use of that functionality.
Don't read code. Read papers. Build a model of your algorithms etc. in your mind. Describe this model in a wiki. Translate the model into interfaces. Then write the code that implements those interfaces.
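A minimal Python sketch of the last two steps (interfaces first, then an implementation), assuming a model of a priority queue has already been worked out on paper and in the wiki. `PriorityQueue` and `HeapPriorityQueue` are hypothetical names, and the binary heap is just one possible implementation.

```python
import heapq
from abc import ABC, abstractmethod


class PriorityQueue(ABC):
    """Interface written from the model: what the structure must do, not how."""

    @abstractmethod
    def push(self, item, priority: float) -> None:
        """Add `item` with the given priority."""

    @abstractmethod
    def pop(self):
        """Remove and return the item with the lowest priority value."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of items currently queued."""


class HeapPriorityQueue(PriorityQueue):
    """One implementation of the interface, backed by a binary heap."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = 0  # insertion order breaks ties, so items are never compared

    def push(self, item, priority: float) -> None:
        heapq.heappush(self._heap, (priority, self._counter, item))
        self._counter += 1

    def pop(self):
        _priority, _order, item = heapq.heappop(self._heap)
        return item

    def __len__(self) -> int:
        return len(self._heap)


queue = HeapPriorityQueue()
queue.push("write code", 3)
queue.push("read paper", 1)
queue.push("update wiki", 2)
print(queue.pop())  # read paper
```

Because callers depend only on `PriorityQueue`, the heap could later be swapped for a different implementation without touching them, which is roughly the payoff of writing the interfaces before the code.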
I feel that this is really nothing that a good compiler couldn't do with a higher level language today. However, in doing so I would wind up with a heavily polluted namespace of helper methods and such that really don't help me understand what I was trying to do.
So, in the vein of reading code: I've only read a few sections of "The Stanford GraphBase," as I just got it a couple of weeks ago, but I can already tell it would have been a much better introduction to a few graph algorithms than the one I had in my undergrad.
Further, all of the "literate" programs I have written have been much easier for me to jump back into, precisely because they contain much of my "decoding" notes. So, code isn't literature, because we don't write it with a narrative for humans in mind. But there is no real reason we couldn't.
In particular I, and the IT shop at Tachyus, have chosen F# as the way to go forward for a number of reasons. Sticking to readability, F# (and other FPs to a greater or lesser extent) allow production code that "reads" more expressively in terms of conveying what the code is actually accomplishing to the reader (and to the compiler) rather than the frequently tangled instructions to the compiler on how to accomplish the task coming from traditional imperative and OO languages. F# also has some very useful tools to emit a form of literate code that produces publication ready HTML or MD, http://tpetricek.github.io/FSharp.Formatting/ (This project will soon be accepted as a top-tier project by the F# Software Foundation, http://fsharp.org/) It may not be to the letter of Knuth's idea of literate programming, but certainly in the spirit.
I did read some code lately. Actually, I had to go so far as stepping through it in the debugger to properly decode it, http://jackfoxy.com/transparent-heterogeneous-parallel-async... (the code snippets in my article have tool-tips, just one of the features available with FSharp.Formatting), but this is really the exception in F#. The vast majority of code is easily accessible to any programmer of reasonable quality (with a proper introduction to FP) in any IT shop. The deeper functional stuff, like Continuation Passing Style and Applicative Functors (e.g. heterogeneous parallel async), is in most cases already available in core libraries. And when it is not, a literature search and/or getting in touch with the FP community helps.
Example of a conditional statement:
Am I better than you?
If so, let us proceed to scene III.
What is the goal of reading literature we're talking about? We're mixing up reading a book for pleasure and gaining a deep understanding of a piece of literature to become a better writer.
Reading a piece of code or a book once is not going to do anything for your skillset as a producer; books, at least, are specifically written to be read once for pleasure. The equivalent for code would be using a piece of software, not reading its code once.
If you want to be a better writer then you get a deep understanding of a piece of literature, the same applies to code. I have recently read a lot of code, because I was debugging/modifying a library I was using (the Requests lib in Python). It's very nicely written and I did get some good ideas from it, but it was work.
I don't think the metaphor is flawed at all. I think that this was a result of coders thinking that people would get better at writing by reading literature or that this was the point of literature seminars. I guess a lesson in understanding other disciplines at least a little bit before trying to take lessons from them?
If you asked a different question, like "explain how you read code in the course of a typical project or experiment", you would get a ton of examples. They might describe how they first try to understand the basic data structures, then imagine some sample data flowing through the algorithm to understand its purpose, and then examine the details, edge cases, and interactions to see why some non-obvious choices were made. Then they might describe how they use this to find which parts of the code should be generalized, specialized, or extended to fit new functionality.
It might be interesting to incorporate code reading into an interview to see the strategies that people use. It would be quite difficult to make it a fair question, though, because patterns vary widely and it often takes more than an hour or so to adapt.
For the first code reading session I held, I chose underscore.js, and it was a successful session because, unlike most libraries and programs, a functional utility library makes for a nice linear read with mostly self-contained functions. However, when we got to more complex programs and libraries with more code to handle accidental complexity (e.g. handling browser and DOM inconsistencies, or UNIX fragmentation), they were considerably harder to read, and the presenter found themselves jumping between different code paths and functions as if they were debugging the program.
I guess Google has spoiled me. When reading code, I constantly look at its development history - commit messages, diffs and line-by-line "blame", linked bugs and code review threads. If you have good tools for that, there's much less need for inline comments.
A good code reader should be like a tour guide, and a good tour guide doesn't visit every single building and street in a neighborhood but rather describes the historical context of the neighborhood and then visits a few interesting places.
I do get lots of value out of that. My favorite example is Beazley's GIL talk: http://www.youtube.com/watch?v=Obt-vMVdM8s
This is how I code and read code. Damn, and I thought I would never see the day when someone finally got it.
One could read Salinger or Pamuk or Sartre or Hesse to realize that this second component is much more important, while masters like Nabokov, whose speciality is playing with words, might show you that wording is also important.)
The transition from reading to writing one's own texts, not imitating or copy-pasting, is also not clear, and, of course, one could never become a good writer only by excessive reading. Writing and speaking are different cognitive tasks from reading or listening.
So what? Reading of good code is important, it teaches style, how to be brief, concise, precise. But where to find the good code? Well, the recursive list functions in Scheme are worth reading. Some parts of Haskell Prelude are worth reading, some macros of Common Lisp, etc.
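For a taste of what makes those recursive list functions worth reading, here is a Python transliteration of the textbook Scheme definitions of `append` and `map` (shown in the comments). Cons cells are modelled as `(head, tail)` tuples with `None` as the empty list; that encoding, the `cons`, `from_py`, and `to_py` helpers, and the name `map_list` (chosen to avoid shadowing Python's built-in `map`) are assumptions made so the sketch runs in Python.

```python
# Scheme originals, for comparison:
#   (define (append xs ys)
#     (if (null? xs) ys (cons (car xs) (append (cdr xs) ys))))
#   (define (map f xs)
#     (if (null? xs) '() (cons (f (car xs)) (map f (cdr xs)))))


def cons(head, tail):
    """Build a pair; a list is either None (empty) or (head, rest-of-list)."""
    return (head, tail)


def append(xs, ys):
    """Copy the cells of xs in front of ys, as the Scheme version does."""
    if xs is None:
        return ys
    head, tail = xs
    return cons(head, append(tail, ys))


def map_list(f, xs):
    """Apply f to every element, rebuilding the list cell by cell."""
    if xs is None:
        return None
    head, tail = xs
    return cons(f(head), map_list(f, tail))


def from_py(items):
    """Convert a Python list into cons cells."""
    result = None
    for item in reversed(items):
        result = cons(item, result)
    return result


def to_py(xs):
    """Convert cons cells back into a Python list."""
    out = []
    while xs is not None:
        head, xs = xs
        out.append(head)
    return out


print(to_py(append(from_py([1, 2]), from_py([3, 4]))))       # [1, 2, 3, 4]
print(to_py(map_list(lambda x: x * x, from_py([1, 2, 3]))))  # [1, 4, 9]
```

Python's recursion limit makes this unsuitable for long lists; the point is only how directly each definition follows the shape of the data it walks.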
The code of "the top writers" is worth reading. Code from PAIP, On Lisp, or SICP are obvious examples, while some code, like that in Practical CL, which is mostly a mechanical translation of OO stuff, only adds more confusion.
So, reading "good" code is still a must, the same way that reading Catcher In The Rye or Zen And The Art Of Motorcycle Maintenance or Atlas Shrugged is still a must.
But programming is about writing, which means expressing one's own ideas, realizations, and understanding, so one must have these in the first place.
In this sense programming is like writing poetry: it must emerge and form in one's mind before it can be written down. The best poetry is written exactly like this, committed to paper suddenly as it emerges, without any later changes.
This reflects the process of "emergence" of ideas or proofs in the minds of scientists who pursue a problem for years: suddenly it is here, as if it came from the subconscious. It seems that the best code, like these classic Lisp procedures or parts of the Prelude, has been written this way.
Of course, reading Java is as meaningless as reading graphomaniacs or some lame and lengthy political pamphlet in a third-rate newspaper.)
Are you sure that's true? Can you cite some examples?
Lisp is famous for its interactivity: the read-eval-print loop, SLIME, Lisp Machines, Emacs, etc. Avid Lisp hackers even edit code inside of running systems. The "bottom-up approach" to programming (as advocated by Paul Graham) is almost the opposite of what you describe, isn't it?
Generally speaking, I think both programmers and poets work in a dynamic way with their texts: moving stuff around, seeing what works, doing experiments, asking others, etc.
That's one reason why Knuth's idea of literate programming seems so academic and remote for most programmers: how are you going to keep all of that text up-to-date when you start refactoring?
Each iteration in a bottom-up process could be based on a small insight gained by thinking about a subproblem. Later one just reuses one's own realizations and adapts them to new requirements.
Also, I think that it should be not just a linear bottom-up process but a recursive one, where you regularly "call yourself" with the old problem, but as a "new you, evolved with experience". Starting from the bottom, from basic building blocks, is crucial. The only "addition" is that nothing is set in stone, and you should come back to "simplify" and refactor even what is at the very bottom.
I also never advocated Knuth's idea, or that whole programs should be printed as books (though some procedures, such as map or append, are worth printing and framing).
As for poetry, well, I think almost everyone wrote some in their late teens or early twenties, and yes, I put it wrong: not a whole poem emerges in one's mind, but a few central passages, the main scheme, to which some ornaments can be added later.