Why Good Developers Write Bad Code: An Observational Case Study [pdf] (upedu.org)
173 points by danielalmeida 980 days ago | 80 comments

TL;DR key recommendations:

- Once a project is completed, the team must ensure that the “What” and “Why” of each software item are properly documented.

- In the cases of parallel development of inter-dependent software modules set up a negotiation table to solve conflict between the development teams.

- Make sure that the development team is aware of the CMMI-ACQ or ISO12207 processes for negotiating with third parties.

- Make sure that testers are involved when negotiating with a third party for a potentially vulnerable software component. 

- Plan organization-wide process reviews to detect isolated processes and to promote information flows between processes.

- Planned special budget items to support long lasting corrections or corrections that are likely to benefit many modules. 

- Projects with strict deadlines are risky, and should be carefully monitored to avoid last minute unplanned activities.

- Team members should maintain a careful balance between the flows of information within formal development processes and informal human interactions.

- Team members should make sure that knowledge is appropriately distributed amongst them. For example, pair programming is a practice which can promote knowledge sharing.

- Any intrusion into the team dynamics by outsiders should be done very carefully.

I think there are a lot of conflicting ideas at play here. As a coder, yes I can write good code. But everyone I know in the valley tells me "just shut up and launch, it doesn't have to be good. You should have launched yesterday, that's what they would have told you at YC". This advice has a lot of truth to it, because you need to get feedback, validate your project, and perhaps have the first-to-market advantage in the highly competitive world of technology today, with investors and customers hounding you on both sides. However, this hasty, rushed culture necessarily will result in a lot of poorly-written code being deployed (followed by a lot of rewrites), even by decent coders, because they simply aren't given the time and runway.

The situation is very different when creating software for internal use in large corporations, which I think was the case with this study. With a startup, launching early can get you the advantages you mentioned, and your code quality does not necessarily have to be high because you generally will not have many users at the start. Large internal software must be highly functional and performant from the first deployment because it will immediately be used by the whole company and will often affect revenue streams. This allows low quality to have a huge impact from day 1.

If you write code that solves a people problem and allows your project to move forward, no matter how stinky that code is, that code is "good code".

By nature, we programmers often judge our code on its technical merits. But a stinky piece of crap that I might write that shows me clearly what I need to build, or demonstrates the techniques that are required, or shows that something isn't going to work early, etc can have immense value.

Writing stinky code is a tool that every programmer can use. The trick is to know when you should use it and when you should not. If done properly, you can solve some intractable problems dealing with requirements gathering, design, or even political problems. When used improperly it can have exactly the opposite effect -- it pushes requirements gathering back, or defers important design decisions until it is too late, or creates conflict within the team.

I know some people in the industry who are pretty good at this, but I know of nobody who is a master at it. Not only do you need to be good at it personally, you need to be able to influence the team to coordinate their work in a way that is not destructive. This requires you to have impeccable taste, amazing technical ability and sublime communication skills. It is a skill that I wish more programmers would value and work towards.

If the code is stinky, it's bad code. It can still have business value, which is why somebody coined the term "technical debt".

I get really crabby about bad commit messages because on some projects, the commit history is the only 'Why' you ever get.

Scratch that, on MOST projects that's the case.

I'm a fan of whys. Even for yourself. I have code I have to maintain that I wrote 10 years ago.

But it can make for long commit messages. Why also includes what alternatives you considered, and why you didn't use those.

I also like to have a digital "worksheet" for each change I do, where all my thoughts and research goes. So if all else fails, I can reference that. But no-one else can, so I like to transfer as much knowledge as possible to the commit message. At the same time, some of these go on for 5 or more pages. They also tend to be very messy. I'm not sure if it's all appropriate for a commit message.

In my case, only key whats and whys go to the commit message. Most of the discussion is left in the issue tracker.

(Of course, the commit message refers to issue ID, and vice versa.)

Is there any scenario under which a long commit message is detrimental, even if "messy"?

It's an issue when dealing with management or clients. They can see long commits as a problem with someone with too much time on their hands.

The whole point of a version control system is that it contains everything related to the code. On lots of web dev and game projects you also commit the finalized assets (the generated javascript from coffeescript, the compressed 3d textures, etc. etc.)

That's not bad commit messages, that's bad client management.

IMO most of your "why" ought to be code-comments, not commit-messages, because most of the time people say: "Why is this code this way" as opposed to "Why did this specific transition occur in the past".

I completely agree.

There are many times when I have seen a crappy piece of code that I wrote a while ago, and decided to "tidy it up". Then I test it, and remember there is a strange edge case that required me to write it the "crappy" way rather than the clean way. It always gets commented the second time.

There's nothing worse than a project full of bad commit messages.

There was one project I saw that had a two line commit message, and there were 29,000 changed lines across 44 files. Who does that?

I'm looking at a commit message from one of my clients today, which is just "Not, new bins, cant stop bins" and a check-in of several hundred dlls generated as build artifacts. Six days later another developer at the client checked-in a commit that removed all of the dlls, with the message "remove BIN folder".

Some other examples: "Fixes to make work", "Left Over", and their most common message " ".

They're nice guys to work with, but their VC habits are awful.

I have to admit, I've made a number of commits with "." as the message.

No excuse, beyond it usually being a minor change well documented in the code ( I always write comments ) and me being utterly buried under work... You know, start the day with 20 things to do, crack off 4 of them and have 23 things in queue at the end of your day.

The "." was sort of a placeholder for "Fuck This, I'm ready to quit." ( and I did eventually )

That's when I prefer to use `git commit --amend` or rebasing. I'll make "WIP doing things" commits, and then later squish a few together before making the code more public.

I could see the latter being something like "Apply coding standards", where somebody went in and fixed all the lines that went over the right margin or had wonky whitespace.

Of course, we can debate over and over whether mass-correcting existing files is actually helpful (one unofficial rule we have here is "only use the auto-formatter on code you've personally worked on or have taken over responsibility for", because it affects the blame history).

Also, I changed my name last year, and when I did that, I ran a mass find-and-replace to correct my credit in every Javadoc I'm credited on (I particularly detest my deadname, and I want it dead and buried), with the commit message being something like "Correct my credit to match my new name".

Guilty as charged. Or maybe 'nolo contendere'. I just dropped 5kloc, 6 weeks work, into a repo with the message "Initial commit"

I'm currently working on the changes that will actually document what is otherwise a wall of code.

The road to hell is paved with good intentions.

I can one-up you there: one of my "initial commit" messages was a dedication to my cat, who passed away a couple of days before I checked the code in. Better yet, we released that project as open-source, so that commit message is still floating around Assembla for all to see.

I could see that. Initial imports and all that.

However, my example was a project that was around for about 3 years at that point, and it was basically just labelled: "Upgrade to version 2.0".

Face → Palm

I've made that commit before. It was actually an initial import of a from-scratch rewrite that happened to have utility libraries and such in common. Delete everything, drop in the new files you've been working on elsewhere, make that a commit. Not everything changes, because some things are the same by coincidence. Otherwise, it's just a discontinuity.

Eh, uninformative commit messages aren't a sin, they're a trade-off. Right now, most of my commit messages are one word. Why? Because at the moment nobody cares about my commit messages because nobody cares about my project because it doesn't yet do anything useful. Getting to the point where it does something useful has higher priority than writing long messages nobody will read. Once the commit messages have an audience, then I'll put work into writing better ones.

It's a self fulfilling prophecy though. If you want to be important you have to act important.

Yes, nobody cares about your commit messages until there's a bug or a major refactor. By then, if the commit messages suck, it's too late to do anything about it.

"Programs are meant to be read by humans and only incidentally for computers to execute"

Is never more true than when you're trying to fix bugs or make improvements. A project you can't improve is a dead project.

This just reminded me of whatthecommit [0]. IIRC they get these from actual commit messages; if they do, there is no hope. :-)

[0] http://whatthecommit.com

I once saw a coworker commit a reshuffling of a project's directory structure with the message "YOLO". We made fun of him for a while for that.

(edit: He made this change after a lot of discussion with a bunch of people, and he also sent out a mass email describing the changes to the structure, so he wasn't totally being irresponsible there.)


I'd assume the keys are the commit hashes? Now how to search github for them...

this line made my day: ' "7142cd872a703392c1b094a18a1e229e": "LAST time, XNAMEX, /dev/urandom IS NOT a variable name generator...",'

I'll take a sound codebase and zero commit messages over a poorly structured codebase any day.

This choice makes no sense but.... hmm, really? I think I'd rather have good commit messages that explain the "whys" that went into the code.

Code typically isn't rocket science. It's the human knowledge that goes into it that's irreplaceable.

Example 1: OK, you're using a third-party CSV parser instead of the one built into the standard library. Why? If your code is crap but well-documented, I can read what you were thinking: "Using non-standard CSV parser because the standard one chokes on files bigger than 2gb" At that point I can refactor your code, or perhaps see that this issue has been fixed in a newer version of the standard library. Or maybe I realize that you confused gigabits and gigabytes and you made a bad choice in the first place, and I realize I can safely remove this dependency. But if your code is tight but undocumented... I would have no idea why this third-party library is being used unless I do some painful trial-and-error that still might not definitively answer the why.

Example 2: You inherit Mary's code. It calculates commissions for our salespeople. The code is sloppy and convoluted, because the sales guys change the commission formulas every month... and these changes have been happening for over ten years, often on very short notice, often contradicting basic assumptions made when the software was originally architected. But Mary documented every change. Which is good, because the fucking sales guys sure don't. Her code is literally the only coherent record in the entire company of the commission process. Remove her comments and commit messages, and none of the code would make sense, even if it was tightened up into a sounder codebase of seven modules with 300 LoC each instead of ten modules with 500 LoC each.

So yeah. Totally fictional choice but I'll take documentation every time. Code is just code, I can fix it.

(Both those examples are fictional, but I've been coding professionally for nearly twenty years and I've seen variations of them countless times...)

In both of your examples, I believe comments in the code are the real winning strategy, rather than the individual commit-messages.

99% of the time what you want is to understand the current code--or at least code at a specific past point in time--as opposed to every transition that occurred.

For the CSV parser, a comment ("/* We use this for >2gb support */") or a test case ( testOverTwoGigsParseable() ) would be a lot more useful than any level of discipline over commit-messages.
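A rough Python sketch of what that could look like, with the "why" sitting in a comment and in a test name. Everything here is an assumption for illustration: `parse_rows` and `LARGE_INPUT_ROWS` are made-up names, the standard `csv` module stands in for whichever parser the project actually uses, and a small row count stands in for the ">2gb" file so the test runs quickly.

```python
import csv
import io

# Hypothetical stand-in for the ">2gb" case; kept small so the test is fast.
LARGE_INPUT_ROWS = 10_000


def parse_rows(stream):
    """Yield parsed rows one at a time instead of loading the whole file."""
    # WHY: rows are streamed so that very large inputs stay parseable.
    # The reason lives next to the code, where the next reader will look.
    yield from csv.reader(stream)


def test_large_input_parseable():
    """The test name records the requirement; the body enforces it."""
    data = io.StringIO(
        "".join(f"{i},name{i}\n" for i in range(LARGE_INPUT_ROWS))
    )
    rows = list(parse_rows(data))
    assert len(rows) == LARGE_INPUT_ROWS
    assert rows[0] == ["0", "name0"]
```

If somebody later swaps the parser out, the failing test tells them what the old one was there for, without anyone digging through the log.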

For Mary's commission-calculator, it sounds like nobody has access to good "whys" anyway, because they boil down to "salesguy X insisted on it". Instead, the commits are functioning as an auditing/blame tool.

That statement makes no sense. It's not like anyone ever has to make a choice between making sound decisions while coding and making sane commits and commit messages.

Well, yes, but it's true :-D


fixed indentation updated to pass new code linter

That's still a why. An irritating one, sure, but it at least tells me to look at the previous commit to find the real 'why'.

I had a local git repo with a month's worth of commits. I hadn't pushed any of them so they were all _just_ local (though the head was constantly being uploaded to an online store).

I got a new computer and when I was getting rid of my old one, it didn't occur to me to push all the commits or save the git repo. So, my next commit consisted of all the changes for that entire month.

It sounds like you forgot to copy the .git folder that had all of the commit information.

If you make a commit, it would copy over with that, because it's written in that folder.

I'd wager that you simply copied the main files over and tried to re-commit.

but wouldn't those be represented as individual commits (just happened to be pushed at the same time)?

Sounds like he lost the local repo, and only the actual final state of the files was backed up. Thus, a single new commit representing everything at once.

yes, that's right. When I got rid of my computer, I didn't save the folder containing the repo; all the commits were lost.

I was able to retrieve the full final content from the place where it was being uploaded to, but that didn't have any git-related files.

What was the commit message?

"Issue #27" is terrible. Something like (in C#/VS) "Execute CodeMaid against solution" would be perfectly fine.

Depends on how you have things set up. I once worked at a company with SVN/Trac integration. If you put "Refs #27" in a commit message, it would actually add a link to the commit as a comment on the ticket. If you put "Fixes #27", it would do that and close the ticket. Our system was also set up so you had to ref or fix a valid ticket in order to commit.
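For anyone who wants something similar, here is a minimal Python sketch of the checking half of such a setup. It is not how the SVN/Trac integration itself is implemented; the regex, the function names `ticket_actions` and `check_commit`, and the idea of passing in a set of valid ticket IDs are all assumptions made for illustration.

```python
import re

# Assumed convention: "Refs #27" or "Fixes #27", case-insensitive.
TICKET_RE = re.compile(r"\b(refs|fixes)\s+#(\d+)\b", re.IGNORECASE)


def ticket_actions(message):
    """Return (action, ticket_id) pairs found in a commit message."""
    return [
        (match.group(1).lower(), int(match.group(2)))
        for match in TICKET_RE.finditer(message)
    ]


def check_commit(message, valid_tickets):
    """Reject a message that references no ticket or an unknown ticket."""
    actions = ticket_actions(message)
    if not actions:
        return False, "commit message must reference a ticket"
    unknown = [tid for _, tid in actions if tid not in valid_tickets]
    if unknown:
        return False, f"unknown ticket(s): {unknown}"
    return True, "ok"
```

A pre-commit hook could call `check_commit` with the message and the set of open ticket IDs, then refuse the commit when the first element of the result is `False`. Linking the commit to the ticket or closing it would be a separate step on the tracker side.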

I miss having a system like that...

That doesn't sound Agile.

Really? Let's review: - Once a project is completed, the team must ensure that the “What” and “Why” of each software item are properly documented. That's essentially a retrospective.

- In the cases of parallel development of inter-dependent software modules set up a negotiation table to solve conflict between the development teams. Role of the scrum master to remove impediments.

- Make sure that the development team is aware of the CMMI-ACQ or ISO12207 processes for negotiating with third parties. Previous bullet point.

- Make sure that testers are involved when negotiating with a third party for a potentially vulnerable software component.  Testers are stakeholders in this, they should be there.

- Plan organization-wide process reviews to detect isolated processes and to promote information flows between processes. Removal of impediments.

- Planned special budget items to support long lasting corrections or corrections that are likely to benefit many modules.  Got me on this one.

- Projects with strict deadlines are risky, and should be carefully monitored to avoid last minute unplanned activities.

- Team members should maintain a careful balance between the flows of information within formal development processes and informal human interactions.

- Team members should make sure that knowledge is appropriately distributed amongst them. For example, pair programming is a practice which can promote knowledge sharing.

- Any intrusion into the team dynamics by outsiders should be done very carefully.

These last four are key elements of Scrum - managing the burn down, being flexible, and ensuring chickens cannot interrupt the pigs.

Agile does not mean there is no process or formal rules. It's not a free for all, agile is about the ability to quickly respond to change.

Standardized processes defined by committee like CMMI-ACQ and ISO12207 are pretty much the exact opposite of Agile, so, yeah.

> Standardized processes defined by committee like CMMI-ACQ and ISO12207 are pretty much the exact opposite of Agile, so, yeah.

Meh. All the Agile Manifesto really says about process is:

"Individuals and interactions over processes and tools" and "Responding to change over following a plan"

Nothing says you can't (or shouldn't) use a pre-defined process, or have a plan. If anything, the core of what "Agile" is, is about being flexible and responsive to change. As long as your process allows for that, it can be implemented as an Agile process even if it was created by a committee.

That's not to say that most firms using things like CMMI aren't doing it in a way that is far removed from Agile principles, but I blame that on the implementors more than the process. YMMV.

I don't have enough context to determine your tone. Are you saying that the above suggestions are not Agile, and so are not compatible with Real Software Development (or at least default to that state)? Or that the above list brings into question the Agile methodology? Or something else entirely?

I'm very interested in the processes of developing good software so I hope there is some good discussion around this paper.

"The project was a second attempt to overhaul this complex Module. A first attempt was made between 2010 and 2012 but was abandoned after the fully integrated software did not work. Because this project was a second attempt, many specifications and design documents could be reused. Accordingly, the development has essentially followed a waterfall process, as few problems were expected the second time around."

I love the sheer insanity of this.

1. Write up a plan.

2. Execute the plan.

3. Total project failure. Abandon it.

4. Decide to try again.

5. "It will be easy this time! We've got an existing plan we can use!"

It's like using a treasure map, not finding any treasure, and now thinking it will be even easier to find the treasure now since you already have a map!

An interesting article, and we need more like it, but I feel the author didn't go far enough in understanding the root causes of complexity.

For example, they cite both schedule/budget pressure, and insufficient docs. The incomplete docs were "thousands" of pages. Does anyone really believe even half the team would read "complete" documentation that is thousands upon thousands of pages? Would complete docs speed up development time? Would they speed up development time even taking into account the cost of producing and consuming complete docs? Is the size of the module truly essential complexity, or is part of the problem that they're building on legacy code?

The author mentions interop with other teams and third party software as a large source of friction. Why are other modules so large and complex that they're maintained by separate teams? Is that essential complexity, or was it caused by previous attempts to patch their way to release? Would the modules be more manageable, and hence, require smaller teams, if they used other practices/languages/tools?

Access to test hardware, and managing the test personnel was another source of friction. What is the cost of buying more test hardware? In previous companies where I've worked, with manual QA depts and under-funded test hardware budgets, doubling the hardware budget would allow halving the QA salary budget (primarily because you don't need to hot-swap equipment several times a day), and shorten test cycles. Is that the case here?

Further, how much manual test is truly necessary, and how much is caused by the developers not writing automated tests?

Offhand and IMHO, some possible causes:

  - fear or lack of understanding of a legacy codebase
  - fear or lack of understanding of a 3rd party component
  - time pressure
  - too many edge cases to handle
  - intra-team communication issues or stylistic differences
  - lack of domain experience to understand those edge cases
  - lack of time to build the correct design, or rebuild a design if it won't fit
  - project management pressure against big-design-up-front and to "start now" when making something more organized is required
  - unclear requirements
  - not interested in work or company anymore
  - lack of use cases representing all possible usage scenarios
  - it's actually not their code that was bad, but they are coupled to bad systems that are themselves bad/leaky/error-prone

For me it's simple:

1. Lack of risk management

2. Poor project management

3. Allowing others to set my urgency (like deadlines but more psychological)

4. Not enforcing professional practises I know I should follow

3 and 4 are to all intents and purposes symptoms of 1 and 2

Interesting, but for me, there's a key piece missing. What did they use to determine that a developer was "good" to begin with?

If you read the text you can see that it's not even important if the developers are "good," the internal policies limit what even the best developers can do.

It's an attempt at a catchy title by appealing to the 10X developer belief/archetype.

To mitigate these problems I try to always completely finish the current task including doc updates etc before moving to the next thing. I do this pretty much always even under management pressure. It can be awkward at times.

Good code doesn't take more time than bad code. I think bad code takes more time than good code.

Does anyone recognize which company this study examined?

your bad code is somebody else's good code.

I'd put it rather like... for any reasonable definition of "good code" and a sample that highlights the characteristic features of such good code, you will find a competent, intelligent developer that will consider it bad.

Which is more or less the same thing, except that IMHO it does not suggest that every piece of bad code shall be "good" for someone. There's such a thing as code that is universally considered awful (except by its creators, maybe).

I had to write bad code because sales and management usually dictate the timeline of the project.

As an example: sales promises new customer feature X in one month and we will lose money unless it's implemented.

My choice is to either: 1) Complete it the right way in double the time or 2) use hacks and cut corners to get it done on time. Since non-technical people just see the output and not what's going on underneath, they often times don't see the difference and when they explain to the boss that they might lose money, it's almost always option #2.

I hated being forced to do this so many times in my career, I started my own company.

Observationally this seems to be why sales driven organizations might have trouble keeping good developers. They force the build up of so much technical debt and never give the time to fix it that the good developers just leave rather than maintain the nightmare of code they've had to write.

This weakness in sales-driven organizations applies to other types of employees, too. I used to be a graphic designer for one such organization, and it was next to impossible to create marketing collateral based on design standards; they'd rather buy it in a template kit from an office supply store than wait for a better plan to unfold. I later watched as they completely missed a huge opportunity to go upmarket with the proprietary hardware technology that the product staff developed. They were so obsessed with what was right in front of them that they became embroiled in tactics without strategy, and only the amazing breadth and depth of the market allowed them to continue making a profit without punishing them too severely.

Are you implying logic driven companies exist?

yes, they do. I'm contracting (in Germany, if that's important) and about a third of the companies that hire me have high or very high standards when it comes to avoiding technical debt, adherence to sane development cycles and clean code. So yes, there are those companies.

Sadly offshoring craziness seems to be increasing here.

> I hated being forced to do this so many times in my career, I started my own company.

It depends a lot on what your company is doing, but for the most part, you'll probably discover that cutting corners to save your short-term time is often the right thing to do.

It depends, right?

It's right to the biz side of the house, because they have no idea what the costs of those cut corners are. They just know that they asked for shiny feature X, and then that shiny feature X was delivered, albeit with a complaint that the codebase was now Y harder to work on.

Rinse and repeat, and their interface is still "Oh, well, if I ask, the engineers will make it happen".

As this goes on, though, the engineers incur costs and suffering in order to keep delivering. And for any developer worth their salt, there is a very real cost--I almost want to say psychic and emotional trauma--associated with working on bad code.

And these engineers? Odds are, they have no big stake in whether or not the feature works. They're paid (badly) the same way either way, and don't get any benefit whether or not the new feature works.

So, they'll just leave when the continual cost of dealing with a shitty codebase outweighs the benefits of staying at that company. And they'll probably be rewarded with more money for switching!

The incentives are all wrong, and then one day your sales folks realize that any request suddenly takes waaay longer than is acceptable.

This all depends of course on the definition of "cutting corners". Sometimes one person's "cutting corners" is another person's "striving for excellence" and sometimes it's "run project into oblivion".

All too often the short-sighted "quickest" way forward is through copy-paste some code that does something remotely similar and massage it into giving the desired output. Sometimes this is necessary, but a lot of times it is not. And every time it leads to technical debt. I've worked on a lot of code bases like this: "Copy file x, rename copy, add copy to project file, rename global names, massage into y, rinse repeat". And then you find a bug in x.

Developers are typically too far removed from customers to ever know, but often the customer is just using time as a bargaining chip. Everybody wants all their needs filled right this instant but most people are reasonable enough to know that the universe doesn't work that way. You just need to gently remind them of it and they will respect you all the more for it, at least if you then deliver quality on time.

Of course this is harder with customers who believe they know a thing or two about development. In their mind it's just change the output of x, but what they don't know is that because you cut corners in the past, you'll need to change the output in y, z, x2, x3, y2, z3 and zz as well...

Indeed. Very frequently "getting it out the door" can mean enough money to hire a great developer vs just a good one. There are salespeople whose over-promising causes more harm than good, but it's not black and white.

This is definitely true in the interest of shipping anything. The problem is when new features are added to the thing that shipped, small refactorings should make it a manageable beast. Instead, <rant>you end up chasing bugs across six different files and fixing the entire thing will take a week and a half of your time because it involves moving things out and straightening the screwup that went on for three days before anybody noticed, so they insist you live on with the bug and keep chasing bugs down as they occur</rant>

Bullshit. Most of the time the corner cutting is done solely because shithead sales drones make promises without consulting engineering in the slightest. They get a huge commission for getting the sale and go home at the end of the day. We have to put in more late nights to cover their asses, and we barely get anything for a raise because of that one time we weren't able to meet sales' physically impossible promise.

I wonder what going into sales would be like. You could try the other side of the coin.

If there is an emergency situation where I need to apply a hack, of course. But, I always go back and take the time to fix it with an engineered, long-term, solution.

I also don't over promise features in an un-realistic amount of time.

I will tell you, as someone in the same position, that even when you get all of the time you need, and you write elegant, beautiful code, in five years that code will probably seem clunky and outdated, and someone will ask you what the heck you were thinking when you wrote it.

So, there's no such thing as good quality code that saves time in the long run? So we should just give up and hire some cheap overseas contractors because the result will be the same?

To anyone except an MBA running a software project for the first or second time, that's pretty obviously not true. (Some people, when they realize this, do "additional process will be mandated until quality improves".)

I'm not clear what your last sentence means at all. You write the best code you can at the time, but a dogmatic approach - even the one that you think will save so much time in the long run, will likely be completely useless within a few years. It sucks, but that's what happens. This has a positive spin, too - you will be much less likely to adopt flavor-of-the-month programming tactics until you really vet them, unless it's incredibly obvious as an improvement.

This is usually one of the fastest ways in an interview to tell how experienced someone is. Younger, less experienced programmers are dogmatic, but their view is extremely limited - they believe defiantly in the current coding practices, but haven't yet learned that their dogma is just the latest, and will be replaced in short order.

Hey, I was once one of those dogmatic kids, too.

Not to speak for someone else, but the last sentence was clearly a reference to "The beatings will continue, until morale improves."

You're right though. And it's a balance between technical debt incurred having not enough time (hacky shortcuts) vs technical debt incurred from having full freedom (overengineering of unnecessary layered abstractions and indirection).

It's a tightrope and we're always going to stumble somehow. In retrospect, we tend to focus on the stumbles (wouldn't do that now!) rather than the fact that we actually got across the canyon and delivered business value. A natural bias - we discount the fact that we're making new mistakes even as we look back and criticize our past mistakes.

After five years, you'll probably ask that of yourself! But if I was really given all the time I preferred, I'd have brought the code current with my ideals.
