Another example is TDD. People espouse the benefits, then some study comes along (http://www.neverworkintheory.org/?p=139) saying the benefits are largely illusory and that code reviews are more effective.
Instead of listening to the experts at programming, listen to the experts on programming. Read some studies about the effectiveness of various tools and methods. Try new things. Programming is a craft, and like many crafts it contains significant amounts of dogma passed from teacher to apprentice.
Wow. What strange advice. All it takes to be an expert on anything is labeling yourself as such. I'm an expert on a lot of things.
Being competent at something, on the contrary, means actually going there, doing the stuff and learning the craft.
If the guy trying to teach me mechanics doesn't have dirty hands and nails, I'm not very interested in what he has to say.
Sometimes outside perspective is important.
-- Niels Henrik David Bohr
The Copy & Paste one is rubbish. We say copy & paste is bad because 99% of the time you see it, it is bad.
The authors blithely ignore this to make an intellectual point that there are occasional uses to cloning code. Of course there are. A complete waste of words.
"For example, one way to evaluate possible new features for a system is to clone the affected subsystems and introduce the new features there, in a kind of sandbox testbed. As features mature and become stable within the experimental subsystems, they can be migrated incrementally into the stable code base; in this way, the risk of introducing instabilities in the stable version is minimized."
One might say, branching? Indeed, the paper mentions "forking" and boilerplate code. Many examples are poor, where a better language would be able to abstract at a higher level and not require "cloning". One example required "cloning" because the developer didn't have write access to the section he wanted to fix.
As far as real "copy and pasting": "Common examples include the initial lines of for loops".
But hey, I don't have a good survey to back up the fact that most of the real "copy and paste" I see in programming is laziness or poor platform limitations that end up being a pain in the ass and introducing more bugs.
Makes it sound like another candidate for new abstraction facilities (not really all that new -- see APL and its descendants).
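The "initial lines of for loops" case is exactly where higher-level abstraction facilities pay off. A minimal Python sketch (the data and names are made up for illustration) of the cloned prologue versus a parameterized version:

```python
records = [{"id": 1, "ok": True}, {"id": 2, "ok": False}, {"id": 3, "ok": True}]

# Cloned style: the same loop prologue gets copy-pasted for every field.
ids = []
for r in records:
    if r["ok"]:
        ids.append(r["id"])

# Abstracted style: the loop shape is written once and parameterized.
def select(rows, key):
    return [r[key] for r in rows if r["ok"]]

print(ids, select(records, "id"))  # both yield [1, 3]
```

In a language with first-class functions the "clone" shrinks to a one-line call, which is the point being made about APL and its descendants.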
But beyond that, the truth is they are not the first to come up with those results. What follows is a blatant selection of near-verbatim quotes from Code Complete (the author does a great job covering the subject).
"Microsoft's applications division has found that it takes three hours to find and fix a defect by using code inspection, a one-step technique, and 12 hours to find and fix a defect by using testing, a two-step technique (Moore 1992)."
"Collofello and Woodfield reported on a 700,000-line program built by over 400 developers (1989). They found that code reviews were several times as cost-effective as testing - a 1.38 return on investment vs. 0.17."
"[...]the Software Engineering Laboratory found that code reading detected about 80 percent more faults per hour than testing (Basili and Selby 1987). "
"A later study at IBM found that only 3.5 staff hours were needed to find each error when using code inspections, whereas 15-25 hours were needed to find each error through testing (Kaplan 1995)."
Table 20-2. Defect-Detection Rates

  Removal Step                            Lowest Rate  Modal Rate  Highest Rate
  Informal design reviews                     25%          35%         40%
  Formal design inspections                   45%          55%         65%
  Informal code reviews                       20%          25%         35%
  Formal code inspections                     45%          60%         70%
  Modeling or prototyping                     35%          65%         80%
  Personal desk-checking of code              20%          40%         60%
  Unit test                                   15%          30%         50%
  New function (component) test               20%          30%         35%
  Integration test                            25%          35%         40%
  Regression test                             15%          25%         30%
  System test                                 25%          40%         55%
  Low-volume beta test (<10 sites)            25%          35%         40%
  High-volume beta test (>1,000 sites)        60%          75%         85%

Source: Adapted from Programming Productivity (Jones 1986a), "Software Defect-Removal Efficiency" (Jones 1996), and "What We Have Learned About Fighting Defects" (Shull et al. 2002).
"one study of "junior and senior computer science students" (described in an article I have to pay $19 to read) doesn't make somebody an expert on anything"
... is really wrong.
[Edit] Formatting nightmare
Your additional citations are irrelevant because "testing" and "TDD" are not the same thing.
* The size of the company something is being developed for. IBM is not the same environment as some startup, and what works for one may well not work for the other.
* The economic goals and risks of the system: in other words, like someone else mentions, flight control systems have different incentives than a web page. What works for one doesn't work for the other (if you develop a web page as slowly and carefully as the flight control thing, your competitors will eat you alive).
* A long term look at how the code lives and evolves. Perhaps some things are quicker to code up. How do they stand up to maintenance, and adding new features with time? And as new team members are added?
As an example, the TDD study you mentioned compared the defect rates of new software developed once with TDD and once with code review.
At work we do TDD mostly to help us develop (faster feedback on whether the code does what I want, since running it on the target takes 10+ minutes), to have an example of how the code should be used, and to know when a refactoring broke something unrelated.
If it helps with a refactoring or reduces the defect rate, that's a nice benefit, but it's not the main reason we use TDD.
So whether a study is applicable or not depends on what you do.
The study is much more thoroughly debunked in the comments than I will attempt to reproduce here. Important points:
1) The study's hypothesis was that TDD would produce software with equal or fewer defects than inspection. Defect reduction is only one benefit of TDD, and depending on who you talk to, by no means the most significant one.
2) The actual data collected by the researchers showed no statistically significant difference between the two approaches.
Reading studies about the effectiveness of different tools and methods is a great idea. Treating them uncritically is not. :)
John Graham-Cumming, the author of the article submitted here, has a review of this book on Amazon.com:
"This isn't a book about evangelizing the latest development fad, it's about hard data on what does and does not work in software engineering."
"Are you going to believe me, or what you see with your own eyes?"
Besides the assorted cognitive biases that lead people to be convinced of the (non-)existence of $DEITY (and the absolute superiority of $EDITOR), there's the risk of deliberate trickery by salesmen and consultants who've been studying stage magicians.
One fine day a consultant arrived and announced that she was going to institute a new development paradigm. It was going to be world-class cool and whizzy, and improve our productivity and reduce our bug count.
The process? It consisted of a whiteboard mounted in the engineering area, with everyone's name and some hieroglyphs by the names, and some dates.
"Huh?" we said.
"It's the new (whatever her last name was) software development process. We put your name up here, with these symbols that tell us whether you're behind or ahead of schedule that get updated every day by me."
"WTF?", we said.
"I'm doing this with you guys, for free, because I want to get a business process patent out of it."
She lasted two days.
Any solution that might work, except good software engineering and project management practices, that is. Sigh. There are no silver bullets.
Up to this point, I thought you were going to tell us the new process turned out to be a genuine improvement.
IME, there are few ways to improve productivity in a small development team with a better ROI than providing vast amounts of whiteboard space right next to where people work.
A bonus point is awarded for each different colour of pen.
Ten bonus points are awarded for any technology that can immediately capture the current state of the board and save it in a standard graphics file format for future reference or wider circulation.
I just take photos with my phone and email them around. No need for expensive captures.
[Within minutes of our first meeting with the aforementioned short-timer consultant we were calling the whiteboard a "wall of shame"]
Then these people get up on their high horse about how they've got the solutions to programming and would you idiots just listen to this wisdom and why aren't you listening and come on man up this needs to be made scientific come on. Sorry, no, your wisdom is paper thin to the point that it can't even support its own weight when you pick it up, let alone try to actually apply it to anything. It's orders of magnitude of value away from being strong enough to support you standing on it and preaching.
Yes, the preaching makes people a bit grumpy.
I find that too. Most of them seem so incommensurate with what they purport to be studying as to be nearly trivial. The researchers rarely seem to address (or even be aware of) the assumptions they're making, and their assumptions are usually significant enough to dominate the data.
That's not to say one can't extract value from such studies, but what the value is is so open to interpretation that everyone ends up relying on their pre-existing preferences to decide the issue, which defeats the purpose.
Edit: that study on code cloning that AngryParsley cited above is an example. They make some good distinctions among reasons why programmers duplicate code. But their empirical findings are dominated by their own view of what's valuable vs. not. They admit as much:
Rating how a code clone affects the software system is undoubtedly the most controversial aspect of this study, and also the most subjective.
I have mixed feelings about this study. On the one hand, it's good to see people working diligently to study real codebases. At least someone is trying to look at data. On the other hand, how they're interpreting it is no different than what we all do when we argue this shit online or over beers or – more to the point – when hashing out a design decision. This isn't science, it's folklore with benefits. The problem is that it's being shoehorned into a scientific format it can't live up to.
Their title, by the way, is a straw man. What they're really arguing is that not all forms of code duplication are equally bad, that some are good choices under certain circumstances like platform and language constraints. That's reasonable (if bromidic) and even interesting, but it's just musing. It's not at all up to the authoritative status that AngryParsley gave it; it merely looks that way because it was published in a journal. The reality is that they have an opinion and looked at some code. At least they did look at some code.
Nobody is going to change their mind because of such work, nor should they. It isn't nearly strong enough to justify throwing out one's own hard-won opinions-based-on-experience-to-date. The net result is that everyone will look at it and see what they already believe. For example, I look at it as a Lisp programmer and the examples seem almost comedic. It's obvious that in a more powerful language, you could eliminate most if not all of that duplication, so what the paper really shows is that language constraints force programmers into tradeoffs where duplication is sometimes the lesser evil. Exactly what I already believed.
To me, being afraid of a debugger is like being afraid of actually knowing exactly what is going on - being lazy, just reading logs and guessing what might have gone wrong, instead of letting the debugger scream in your face all the idiotic mistakes you have made.
I would argue that using the debugger is being lazy in an intelligent way, instead of spending hours reading endless logs trying to puzzle together logic the debugger can show you directly.
Logs are good. They're effective. They're easy to use. You can filter, search, and aggregate them. Without printf()s and log statements, my world would be chaos and darkness.
But a debugger gives you superpowers:
    thread apply all bt

(that's gdb for "give me a backtrace of every thread at once")
Changing log statements means stopping and re-running your program. If startup time is large, this can hurt productivity.
It's rare, but logs can mislead. With async stuff, logs don't always get printed out in the right order (hello, Node.js and Twisted). A debugger is crucial for figuring out that sort of unintuitive behavior.
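A minimal sketch of that async reordering, using plain asyncio rather than Node.js or Twisted (task names and delays are made up): the task scheduled first logs last, so the log order inverts the scheduling order.

```python
import asyncio

log = []

async def task(name, delay):
    # Each task logs when it actually finishes, not when it was scheduled.
    await asyncio.sleep(delay)
    log.append(name)

async def main():
    # "first" is scheduled first but logs last, because it sleeps longer.
    await asyncio.gather(task("first", 0.02), task("second", 0.01))

asyncio.run(main())
print(log)  # ['second', 'first'] - the reverse of the scheduling order
```

Reading only the log, you might wrongly conclude "second" started first; stepping through in a debugger makes the actual interleaving visible.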
I seldom use debuggers, but sometimes it's the right tool. Others I know and respect use debuggers much more. Even among people who are great at what they do, people work differently.
For those of us who prefer to use debuggers sparingly, the worst abuses stand out: coders wasting hours playing with breakpoints and stepping through code, eventually finding the point where it breaks, and then still not understanding the actual problem or how it should be fixed. This kind of situation is certainly not the case for good developers, but it's depressingly common. Good developers who use debuggers also see these abuses, but they respond with "you're doing it wrong" rather than "put away the debugger." IOW, anyone/everyone tends to correct someone by showing them how they do it.
Like him, I use a debugger rarely. Not because I'm opposed; they're great when they work. But it means I don't understand what my software is up to. Which for me is a sign of design and code quality issues. Or just ignorance. Both of which are solved by working to clean things up.
My code is often just a class or two plugged into a behemoth. I know exactly what my code is doing, but not exactly how it's being called by the platform, or what responses it's getting.
I agree a debugger can help you figure out mysteries. But when I find myself using one, I try to ask: how could I have avoided having a mystery in the first place. Common answers: better tests, cleaner code, better design.
I wouldn't be so quick to say that these help "avoid mysteries" in the first place. Algorithms with better time and space complexity are often much harder to understand than ones that sacrifice those qualities but can be understood at first glance. For example, is code that performs a lot of bitwise operations - arguably better designed and cleaner - really easier to understand than code that performs the same operations with strings and objects?
- Your program runs on a client system and logs help you understand or reproduce the system without having to do a remote session (which might be impossible on some firewalled envs).
- Your code crashes without stack trace and you want to understand where to begin the search.
Agree with your overall statement though.
In fact I would say that if you don't know how to use a debugger, you really have no business avoiding one. Part of the point of knowing the debugger, especially with more dynamic languages, is to do better debugging in your head.
I don't build bridges but I would be very surprised if an architect described his work as "pure science and no craft at all" (how would it be possible, then, to build beautiful / ugly bridges?)
I do a little woodworking and have many tools; friends sometimes look at my shop and ask if I really need all that -- yes, I do. In the course of a project you get to use many different tools. You can get around to missing one but it takes exponentially longer to work with not the exact tool. (Same thing with photography).
I'm learning to fly, and the most important word regarding human factors is "honesty". The way to fly is not to avoid mistakes, it's to detect them and minimize the consequences; if you feel you can do no wrong you'll eventually kill yourself.
Unless, of course, you are software engineer-ing.
Flight guidance-and-control systems, among many other things, are precisely engineered software systems. In a world of web apps and mobile apps, people tend to forget this kind of software exists.
Sure, working on your web app, writing some JQuery widgets, or coding up some python scripts is a craft.
From my limited reading, it seems that most "mission critical" software is achieved by applying a lot of resources, especially testing, to the project. Not to mention having a very well-defined (and relatively unchanging?) problem space.
Surely, if there were engineering principles that enabled folks to reliably create high quality software, we wouldn't see the horrible failure rates across all sorts of software projects.
There's no special software engineering sauce that gets used. But there is a dedicated commitment to careful code review, and exhaustive testing, as well as a very rigorous process for defect handling. We have a dedication from the top down to ensure our stuff doesn't break our customers. We don't have a "software pirate ninja rock star" culture, we have a culture of careful work.
Doesn't mean null pointers don't get accessed, or that weird code doesn't show up. It just means that hey, we worked real hard to ensure that these issues are minimized. Quality is a journey, and every day we have to work on it.
This is a great article about how this sort of stuff is done and the kind of culture that you want to cultivate. It's a bit dated, but still solid.
And the "software pirate ninja rockstar" thing is a shameless strawman.
I don't really consider disciplined programming to be a branch of engineering. While I don't have a sophisticated metaphysics of code, it seems that there is an essential ontological difference between "engineering" a software system and "engineering" a bridge or a chemical process.
Richard Gabriel once suggested the idea of a MFA in software, and I think that he is onto something.
It is difficult to put into words, but I would say that the heart of engineering is the discipline of understanding how and why something is useful, as distinguished from feelings or hopes about its utility.
An MFA in software is pretty much the opposite of engineering. Engineering is not a matter of taste or opinion, it is about creating such hard sparkling truths that opinion would be superfluous.
Lots of static analysis also helps, as well as systematic human review of any code, test, or test result.
The best example I witnessed was an engine control system (FADEC software) written in a dialect of Ada called SPARK. The code was so clear that it was self-explaining, and a requirements database could explain any given statement in the program. SPARK has some nice properties (e.g. no recursion, which makes it possible to check stack depth limits statically...)
So while more manpower is an important element, it is not the only one. Simplifying the problem space to an extreme is also essential.
Is there anyone who uses a debugger for more than inspecting state?
EDIT: I guess lower level languages and more involved applications use debuggers much more extensively.
One example. Sometimes what you want is a time machine: figuring out how a particular variable reached its value. So you swap out the Windows memory allocator (which randomizes initial heap addresses) for one with predictable addresses, run the program until you find the dodgy value, take its address, then restart the program with a hardware breakpoint set up to monitor and log the stack whenever that memory address is modified.
This kind of "backward tracking" takes no more than 5 minutes on a project that's set up for it (i.e. with the appropriate allocator available and switchable). Solving the problem by guessing locations, dropping printfs in the code, etc. is rather less productive.
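In a managed language you can fake that "time machine" without a custom allocator. A rough Python analogue - my own sketch, not the native-code setup described above - wraps the suspect field in a property that records who wrote to it:

```python
import traceback

class Watched:
    """Logs the call stack whenever .value is assigned -- a poor man's
    hardware watchpoint for tracking down who set a dodgy value."""
    def __init__(self):
        self._value = None
        self.writes = []  # (new_value, name of writing function) pairs

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        # Record where the write came from before applying it.
        stack = traceback.extract_stack()[:-1]  # drop the setter's own frame
        self.writes.append((new, stack[-1].name))
        self._value = new

def dodgy_setter(obj):
    obj.value = 42  # the write we want to track down

w = Watched()
dodgy_setter(w)
print(w.writes[0])  # (42, 'dodgy_setter') - the value and who wrote it
```

Same idea as the hardware breakpoint: instead of guessing locations and dropping printfs, every write announces itself with its call site.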
Other uses: debugging code you don't have the source for, OS code, binary compatibility issues etc.
(Crash bugs are usually not a big deal; usually, when the code crashes, the crash location is relevant. Bugs that corrupt state are much worse (heap corruption, races / concurrent modification, etc.). The original bug resulting in the final bad behaviour may be completely different from where it appears.)
* Tracing code execution paths to load the code into your mind
* Modifying state inline so you get a different path without re-running
* Modifying state like FirstName to trace just how the bugger gets into the output
* Some sophisticated debuggers, like Java's, allow you to change code, then hot-recompile and deploy it LIVE.
* Suspend threads when you reach a breakpoint so you can dig around at all the multi-threaded state at that point
- Although multi-threaded debugging is where printfs shine
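The first bullet - tracing execution paths to load the code into your mind - can be approximated in plain Python with `sys.settrace`, which is roughly what a debugger does under the hood when single-stepping. A minimal sketch (the `buggy` function is made up):

```python
import sys

executed = []

def tracer(frame, event, arg):
    # Record every line executed in traced frames, like stepping in a debugger.
    if event == "line":
        executed.append((frame.f_code.co_name, frame.f_lineno))
    return tracer  # keep tracing inside this frame

def buggy(x):
    if x > 0:
        return "positive"
    return "non-positive"

sys.settrace(tracer)
result = buggy(-1)
sys.settrace(None)

names = {name for name, _ in executed}
print(result, names)  # shows which path the code actually took
```

The recorded line numbers tell you exactly which branch ran, without touching the source.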
Just a few days ago I had a strange problem with the order of imports in Python at the border between my code and an external library (Celery). There were import hooks involved, but they didn't seem to be executed properly under certain conditions. I could reproduce the problem quite reliably, but I needed to pinpoint the exact import (inside Celery itself, mind you) that was causing it. pdb (the Python debugger) was indispensable in solving it.
On the other hand, though, it was probably the first time in many months that I used pdb for more than 5 minutes, and for something more complicated than checking why a particular test fails.
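For import-order mysteries like that, one trick - my own minimal sketch, not what was actually used on Celery - is a `sys.meta_path` hook that records every import attempt in order, so you can see exactly which import fires when:

```python
import sys
import importlib.abc

class ImportLogger(importlib.abc.MetaPathFinder):
    """Records every module name the import machinery tries to resolve."""
    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # defer to the normal finders

logger = ImportLogger()
sys.modules.pop("colorsys", None)   # force a real import below
sys.meta_path.insert(0, logger)
try:
    import colorsys  # stand-in for the import chain under suspicion
finally:
    sys.meta_path.remove(logger)

print(logger.seen)  # the exact order the import machinery walked
```

Combine this with a `breakpoint()` inside `find_spec` when `fullname` matches the suspect module, and pdb drops you into the exact moment the problematic import happens.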
This post made me realize that 95% of the time I fix errors from just the stack trace, 4.9% from logging, and 0.1% from actually debugging.
It's good for white box testing. It's also good for compiled languages. You know something is wrong, but you don't know why it's wrong.
...just based on years of observation
Having recently started mentoring/managing the first really junior engineer on our team (self-taught, <1 year programming experience), boy does this ring true. Luckily I'm of the temperament to find the "advanced beginner" stage of learning more funny than annoying.
I think it's possible to understand as little about your code when using loggers as when using debuggers, so I have a hard time agreeing with him there. I think his general point about having tools and knowing when to use them applies just as much to that as it does to language, so he contradicts himself.
1. Don't be lazy and just do something that works without taking the time to learn why it works.
2. Don't be lazy and just stop when you have something that works. Go through the code again and see if you can make it better.
3. If you find yourself writing the same thing twice, don't be lazy and carry on; put the code in a single place and call it from wherever you need it.
Or at least that's how I see it. I do all of the things I shouldn't do, largely because doing things the wrong way is so much easier!
Edit: Rather than rewritten, I meant "falls under the general category of". The article was great!
Is it lazy to accept something that works, and meet the deadline?
Is it OK as a craftsman to miss your deadlines, because you want to know why something works?
These decisions are like a craft in themselves - sometimes we can just trust a library works. Sometimes, we need to understand more. And sometimes we need to re-negotiate deadlines. And sometimes we lose clients.
Your function may need some refactoring to make sense, but just cutting that block of text out and then calling it in a function can usually be done in seconds.
Copy the code. Paste it in the new place. Change variable names to fit the new place.
Cut the code. Place it inside a function declaration. Create calls to the function in the old and new places.
Copy-pasting and just dropping the useless parts is easier than making a new function, because you have to think about how to make the new function apply to both cases (the easiest way is "well, I'll just put a switch in the parameters and add if statements", but that doesn't really lead to better code).
And also: decouple the code from its context. Rename variables. Rename the function. Possibly add a new module for it.
Moving repeated code to a function is rarely just about relocating a piece of code. Quite often (especially if you have cross-file code repetition) you're identifying an abstraction, which is Serious Business (TM), requiring you at least to think about where to introduce it, how to fit it into the existing mental model of the program, and, while you're at it, how to make it useful for the code you'll be writing next week, because it would be a waste not to do that now.
It's not that difficult, and it's quite rewarding, but it also takes significantly more time than just copy-pasting and carrying on.
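The mechanics being debated above, sketched with made-up names: the duplicated block becomes one function called from both sites.

```python
# The previously copy-pasted validation, extracted once.
def validate_email(user):
    # This block used to appear verbatim in register() and update_profile().
    if "@" not in user.get("email", ""):
        raise ValueError("bad email")

def register(user):
    validate_email(user)
    return {"registered": user["email"]}

def update_profile(user):
    validate_email(user)
    return {"updated": user["email"]}

print(register({"email": "a@example.com"}))
```

The extraction itself takes seconds; the real cost, as noted above, is deciding what the shared function should mean and where it should live.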
>Be promiscuous with languages
>I hate language wars. ... you're arguing about the wrong thing.
It's easy to take this for granted, but it's a concept that is very important to stress to new coders. If you spend too much time focusing on one language, you run the risk of the form becoming the logic. This is a dangerous place, where your work is better analogized to muscle memory than to logical thought.
At least in a college environment, I think this lack of plasticity causes discomfort with different representations of similar logic - and so flame wars abound.
You seem to be implying that the latter statement doesn't apply to the disciplines of science and engineering. "Skill and experience expressed through tools" is highly important in both watchmaking and bridge building. I would advise anyone who says otherwise to reconsider.
I understand your point, but why create a hugely false dichotomy between a craft discipline and the science and engineering disciplines?
I strongly concur with points 2 and 6.
There is something different between software and other engineering methods. I'm not sure he expresses it properly, but it's definitely there.
No two developers develop the same way, and even though there are some obvious ways, there seldom exists an absolute, all-cases best way.
It is the thesis of The Mythical Man-Month (F. Brooks), and I do like this theory, since its corollary, the "no silver bullet" syndrome, is quite accurate.
The essence of programming is creativity, thus no tools can improve software productivity in its essence.
The problem with school is that studious, dull boys with no imagination think they're worth something in programming by incanting mantras of pseudo-tech gibberish. They have a $90K loan and no gift, and they pollute the ecosystem because otherwise they'd become hobos. At least most of them are hired as Java, C++, or PHP developers, where they fit best.
Completely untrue. Creativity is a fragile thing, and anything that stands between you and expressing your thoughts may break the creative process altogether. Good tools can also enhance the process, for example by allowing you to see the thing you're working on in real time (see e.g. Bret Victor's "Inventing on Principle").
Also, software productivity is a function of both creativity AND being able to turn the idea into reality efficiently. Good tools do a great job on the second part.
 I do have the feeling though that most of the creativity still happens on paper and/or whiteboard, not inside computer programs.
And most breakthroughs also come from simple, efficient conception.
Delivering is what most people call craft.
I do pride myself in delivering, however, any monkey coder can deliver.
Engineers (and I mean real ones) have enough trouble with people thinking they are blue-collar craft workers.
One thing I would add though is that there are many times when there is time pressure and a kludge works. The right thing to do here is to document that it is a kludge so that if/when it bites you later you have a comment that attracts your attention to it.
"I don't understand why this fixes the problem of X but this seems to work" is a perfectly good comment. It's great to admit in your comments what you don't know. (That's why questions relating to commenting are great interview questions IMO.)
Finally, I think it's important in the process of simplification to periodically revisit and refactor old code to ensure it is consistent with the rest of the project. This should be an ongoing gradual task.
Anyway, great article.
In Smalltalk, you practically live inside the debugger. Also, if you are an ASM programmer, the debugger is indispensable.
I used to do precisely that. Sprinkle code with log messages, recompile and run. When I finally learned how to use gdb, my debugging productivity increased tenfold.
I mean, just the ability to stop your program at any given point gives you an enormous advantage. You can not only examine the local state of your program, but also you can see how the state of systems outside of your program (e.g. database) changes, and all of this without polluting the code with tons of useless debug messages.
Often when I had new ideas during bug hunts, testing a hypothesis without a debugger meant going back to add new logs, recompiling, then running again (and making sure it reached the same state as before!) - lots of wasted time. With a decent debugger it's as easy as typing an expression.
And I don't think debuggers lead to lazy thinking. The process of finding the problem is the same whatever method you use - you analyze the code, have an idea about what could be wrong, change one thing, then see what happens. Debuggers just make it easier.
It is harder to grow software than it is to initially build it. Preconceptions bite you on the ass, data structures don't allow for new features, side effects multiply.
You don't need to learn the layers. In fact, if you're learning all the layers, you're probably an ineffective coder. This is not to say that you shouldn't investigate the layers or have a poke around them. But software is about reuse, and reuse is about reusing other people's work via known interfaces without worrying overmuch about what goes on under the hood.
I'm actually more of a debugger than a profiler, and as much as I'd like to believe that my way is as valid as his, I suspect that he's probably right on this and I'm probably wrong.
It appears to me that a lot of folks are actually incapable of stopping and thinking how stuff might even plausibly work. As soon as you rephrase the question in terms of fundamentals, it becomes clear, but people allow themselves to get confused by all the high-level whizbang stuff, without remembering that there is no magic.
I've recently rediscovered the joy of writing GEM apps in m68k ASM...
"All problems in computer science can be solved by another level of indirection." - Butler Lampson
And, from what limited sources I could find:
"[above] ... Except for the problem of too many layers of indirection" - David Wheeler
Wilkes, Wheeler, and Gill wrote a book, "Preparation of programs for an electronic digital computer", that pretty much describes the core software engineering precepts we use today - in 1951.
Gill wrote "The Diagnosis of Mistakes in Programmes on the EDSAC", which gives a good snip of what they were doing at the time. I think that it can be obtained for free online.