Startup Suicide -- Rewriting the Code (steveblank.com)
217 points by terrisv on Jan 25, 2011 | 111 comments

Except when it isn't!

Sometimes during the initial phase of building a product you realize you're on the wrong road and it's actually faster to toss out what you've got and start over. Typically I take about three tries to get it 'right', the first is to get a good feel for the problem space, the second when I have a first working version and the third one will actually last for a long long time.

The first two last for respectively as long as it takes to type them in and a couple of days to weeks, and I think part of the secret here is to actually plan to throw away that second version instead of hanging on to it after getting past the point of no return in terms of sunk cost.

Most 'rewrites' are not done for valid reasons but simply because a new guy was brought on board who has not yet learned to read code in order to understand it, and whose gut response to anything they didn't write themselves is to throw it out and do a rewrite, even if that means killing the company.

Just look at Netscape to get an idea of what that mentality will do to your corporation.

> Sometimes during the initial phase of building a product YOU realize you're on the wrong road and it's actually faster to toss out what you've got and start over.

But what if it is years later, and you are gone and it's up to some new guys to do the rewrite?

Or in the words of Steve: A CEO who had lived through a debacle of a rewrite or understood the complexity of the code would know that with the original engineering team no longer there, the odds of making the old mistakes over again are high.

So I think you and Steve aren't really disagreeing. You're talking about the early time frame, when you are writing all the code. He is talking about the later time frame, when the company has grown and you are long gone.

Agreed. Rapidly chewing through prototypes in the early stages of a project is a great way to weed out bad ideas.

Rewriting stable production code, on the other hand, is almost always a mistake.

Depends on what you mean by stable, and what you mean by rewrite. If your end result is buggy and unstable, it's sometimes worth using it as a rough draft and starting over. Ground up rewrites in a successful environment (which was the subject of the article) are usually wrong.

Planned, incremental refactoring or re-architecture can be quite beneficial if done right.

Certainly. Continuous & judicious refactoring is the best way to avoid getting into this fix in the first place and is also generally the best way to dig yourself out.

Just had this experience. Took an existing code base and worked about four weeks to try and modify it to support new functionality. Had an epiphany that 'this shouldn't take this long', started from scratch and duplicated all existing functionality as well as the new code within two days.

Sometimes tossing badly designed code seems like two steps backward, but that isn't always the case.

The big trick of course is that you were in a good position to actually make that call and it worked. The difference is that there are people that will shout 'rewrite it' without being in a good position to make the call and as some kind of personal 'NIH' syndrome.

I've done what you just did to a package that I maintained literally for years for a company that I contracted for and at some point the same realization hit me, it shouldn't be this hard to just toss this and do it again. But by then I had a pretty thorough understanding of the problem and of the flow of the code that solved the problem (even if it was weighed down with a lot of junk). Rewriting it made life much better. But typically if someone is new to a project and needs to fix a minor bug and starts to say we should re-write all of this they're just plain wrong.

I don't think it is just NIH; the new people just have a very superficial view of the number of issues, choices and decisions that were faced in building the system. That means they have a tendency to underestimate the costs that go into building the system.

Exactly, that's the core of the issue: it is very easy to overestimate the advantages of the new design/rewrite. But even worse very easy to underestimate the advantages of the old design. Because in a bad system (the ones you want to rewrite), things are generally not that well specified.

That also defines what may and may not work as a rewrite: if your application has a lot of external dependencies, and is used by customers in a tightly integrated way, rewriting it will take forever unless you don't care about losing your existing customers (because you lose what works and what does not). What makes matters worse is that you are more likely to make those mistakes early in the business.

If your application does not have tight integration with the customer, then it becomes much easier to replace it, one part at a time. Otherwise, you are likely to just recreate the same monstrosity anyway once you have managed to support half of the features from the old version.

I think a rewrite makes sense in some cases, but the natural reaction should be don't, especially when you don't have that much experience. Successful rewrites are the exception, not the rule. To make a dubious analogy, it's like the junior programmer who thinks that his bugs are actually in the libraries/compiler he is using. The senior programmer knows it is almost always his own fault, and the very senior one has a few stories about long nights of debugging caused by compiler bugs in the old times.

You are right, of course. One of the larger issues in this kind of endeavor is that many start ups aren't really using seasoned engineers.

I'd think that that requires that you really understand the code top to bottom. I know I've looked at code other people have written and gone, "that doesn't look right" and redone it 'right', only to discover the hard way that there was a good but subtle reason the code looked the way it did and my fix introduced a couple of very subtle bugs.

Three tries seems about right. FWIW: the second version has an actual name: http://en.wikipedia.org/wiki/Second-system_effect

The problem I've been running into is that my second version took 1 month to reach 80% "complete" and is taking 3 more months to finish that last 20%.

That's just yet another instance of the 80/20 rule (derived from the Pareto Principle).

In my industry (games) it's well known that 80% of the effort goes into the last 20% of the product, but also the last 20% of polish makes 80% of the quality.

> but also the last 20% of polish makes 80% of the quality.

I've noticed this, too. When a product is 80% of the way there, the people behind it see a product that's nearly done. Your audience, on the other hand, sees a piece of garbage.

I understand that what Steve Blank (and also Joel in his famous article) refer to is the rewrite of existing applications with a lot of customers, something that brings a lot of cash to the company already.

Also, it seems that you describe a rewrite you did by yourself: the dynamics of a rewrite by a single person are quite different I believe, and are more likely to be successful (unless that's one guy who rewrites everything and asks others to support it, which sadly happens too many times).

So you would say "Startup Suicide -- Don't rewrite your code too late"

Even 'later' there can be valid reasons to rewrite. I don't like blanket statements on complicated issues, the real answer here is 'it depends', and it depends on a lot of factors. There are good rewrites and there are bad ones, and it's not always clear-cut what the best decision is. If it isn't broken don't fix it seems to be a good rule, it makes you err on the side of caution.

The fact is that plenty of times rewriting is not the answer; refactoring consistently and aggressively over a longer period of time, for instance, can have just the same effect without most of the downsides.

An all-or-nothing hail Mary rewrite where some poor sob has to turn the switch on D day is typically a recipe for failure but that's not the only option on the menu.

I have watched a large rewrite fail and cost an engineering manager his job. The next manager, perhaps learning from his fallen comrade, did something that worked spectacularly well. He did a gradual, component focused rewrite. With each release, they would carve out a part and rewrite only that chunk. For anyone looking at the big rewrite, I would suggest this as an alternative.

I've noticed this as well. You simply can't start from scratch.

The first thing you do is make sure you have good tests built around your old code base. Then you slowly start refactoring/rewriting pieces out. After every small refactoring round, run your tests and make sure everything is working. The key is to break down the rewrite into small steps and make sure you have a fully functioning product at each step. This might even mean that you need to write code that will be removed after a couple of refactoring iterations.

Wish I could vote you up a few more times on this. Michael Feathers calls them characterization tests. Sure, run them as unit tests, but get them under CI right away too. Test everything that moves all the time.

This type of work, and this approach, appeals to a limited set of people, though. It's painstaking, detailed work. The other problem is that businesses don't understand its value, and don't want to pay for it, ime. I've seen two companies go down not paying attention in this area, and two more who are currently dying.

Course, if the thing was under test to start with then things would be so much simpler ;)

This is what Michael Feathers calls 'seams' in his book, Working Effectively with Legacy Code. Often, you have to do exploratory testing, that is, you don't really know the requirements but you make tests that the current code passes. Then you can refactor it. That way, current code behavior won't be changed.

Very good read, if you need to deal with legacy code and you don't know where to start.
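
To make the "tests that the current code passes" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `legacy_shipping_cost` stands in for some untested legacy routine, and the expected values were obtained by running that routine once, not from any spec. The point is only that the test pins down what the code does today, quirks included, so a later refactoring can be checked against it.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Hypothetical legacy function; the odd rules are intentional."""
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost *= 2
    if weight_kg > 20:
        cost -= 3  # undocumented discount nobody remembers adding
    return round(cost, 2)


# Recorded by running the current code once, not derived from requirements.
RECORDED_BEHAVIOR = {
    (1, False): 6.5,
    (1, True): 13.0,
    (25, False): 39.5,
    (25, True): 82.0,
}


class CharacterizationTest(unittest.TestCase):
    def test_matches_recorded_behavior(self):
        for (weight, express), expected in RECORDED_BEHAVIOR.items():
            with self.subTest(weight=weight, express=express):
                self.assertEqual(legacy_shipping_cost(weight, express), expected)
```

Once something like this is green under CI, you can start moving code around; if the undocumented discount turns out to be a bug, you change the recorded value deliberately rather than by accident.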


This is exactly what we just did to ww.com, and while we're still on the fence whether that was the right thing to do or not the result is that we did not have big continuity issues and that we now have a completely fresh codebase.

Where I see rewrites fail is when the company has one huge monolithic and interconnected application which really has to be rewritten all at once or not at all.

Dividing your application up into logical libraries and services goes a long way towards making it easy to rewrite. Essentially, you want to obey the single responsibility principle. This means that each component should not need to change if a change is made in another component. From the Wikipedia article:

> Martin defines a responsibility as a reason to change, and concludes that a class or module should have one, and only one, reason to change. As an example, consider a module that compiles and prints a report. Such a module can be changed for two reasons. First, the content of the report can change. Second, the format of the report can change. These two things change for very different causes; one substantive, and one cosmetic. The single responsibility principle says that these two aspects of the problem are really two separate responsibilities, and should therefore be in separate classes or modules. It would be a bad design to couple two things that change for different reasons at different times.[1]

For example, some of your core logic might be written as a library or a stand-alone server. This separates it from the GUI logic and the database logic. So if you need to rewrite the interface, you do that in the GUI code. If you want to refactor the business logic, you rewrite that component. If you want to change the database, you alter your data-access layer.

This makes incremental improvement easy.
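
A rough sketch of the report example from the quote, in Python. The function and class names are my own and purely illustrative; the only thing it tries to show is that the code that decides what goes into the report and the code that decides how it looks live in separate places, so either can be rewritten without touching the other.

```python
from dataclasses import dataclass


@dataclass
class Report:
    title: str
    rows: list


def compile_report(sales):
    """Content: changes when the rules for what the report contains change."""
    total = sum(amount for _, amount in sales)
    rows = [(name, amount) for name, amount in sales] + [("TOTAL", total)]
    return Report(title="Sales", rows=rows)


def format_as_text(report):
    """Presentation: changes when the layout changes."""
    lines = [report.title] + [f"{name}: {amount}" for name, amount in report.rows]
    return "\n".join(lines)


def format_as_csv(report):
    """A second format, added without touching compile_report."""
    return "\n".join(f"{name},{amount}" for name, amount in report.rows)
```

Swapping `format_as_text` for `format_as_csv` is the "cosmetic" change; changing how the total is computed is the "substantive" one, and each stays inside its own function.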

1. http://en.wikipedia.org/wiki/Single_responsibility_principle

> one huge monolithic and interconnected application which really has to be rewritten all at once or not at all

I disagree. Any program can always be refactored and compartmentalized. It may be a slow process but it is possible to do a rewrite one chunk at a time.

Absolutely -- a total rewrite is only necessary if broad architectural decisions are all wrong. Otherwise, working to decouple the monolithic system is the crucial step which will allow the chunk-by-chunk rewrites.

The thing that popped into my head while reading the piece is that sparing one star developer might be pretty cheap; how about putting him on a Skunkworks rewrite for 6 months and see how it goes? If it's looking good, give him whatever resources are necessary to finish. Do you think that would work?

I think that's a good way to minimize risk, but also consider the morale issue. This would be the equivalent of giving one developer a corner, window office while the rest stay in their cubes.

Perhaps you could apply Google's 20% rule? Every Friday the entire staff breaks up into teams and works on skunkworks rewrite projects. I think this will boost overall morale and you might find your developers staying late on Friday nights :)

I guess I'm on four major rewrites so far - two complete disasters, two great successes.

The only pattern I've worked out so far is: disaster, success, disaster, success...

Good luck in your next project.

I wonder what predicts success in the rewrite?

The advantage to the component based rewrite is that it doesn't cost you your head if it fails. You can still push out new features in each version, and you have a fallback plan if the component rewrite fails or is delayed (just use the old one).
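
The "just use the old one" fallback can be as simple as a wrapper around the two implementations. This is only a sketch in Python under my own assumptions: `old_search` and `new_search` are invented stand-ins for an existing component and its rewrite, and the `use_new` flag is a hypothetical switch you would wire to whatever configuration you already have.

```python
import logging

logger = logging.getLogger("migration")

WORDS = ["apple", "apricot", "banana"]


def old_search(query):
    """Stand-in for the existing, trusted component."""
    return sorted(w for w in WORDS if query in w)


def new_search(query):
    """Stand-in for the rewritten component, which may still have gaps."""
    if not query:
        raise ValueError("empty query not handled yet")
    return sorted(w for w in WORDS if query in w)


def search(query, use_new=True):
    """Route to the rewrite, falling back to the old code if it fails."""
    if use_new:
        try:
            return new_search(query)
        except Exception:
            logger.exception("new_search failed; falling back to old_search")
    return old_search(query)
```

Callers only ever see `search`, so a release can ship with the new component switched on, and switching it off again (or letting the fallback catch a failure) does not require another release.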

The disasters were driven mainly by an attempt to create a manageable codebase while keeping the end user experience fairly similar.

The successes were things that built on existing systems but focused on delivering things that were actually radical improvements in functionality, with the rewrites being driven by this, not an end in themselves.

I am a big proponent of gradual change like this. I've used it multiple times on major enterprise systems.

I'm still baffled by why more people don't do this.

I feel like the mental stumbling block that stops people has something to do with our ideas of purity and cleanliness. Maybe people intuitively feel contact with the old system would make the new code unclean.

It is slow and painful and requires you to pay lots and lots of careful attention to the behavior of the existing enterprise system that you may very well hate.

Greenfield development sings a siren song. You get to scribble on a gloriously empty page in your imagination, free of such mundane concerns as cash flow, near-term customer demands, day-to-day stability, vitally important edge cases, and hard-won but crufty bug fixes.

Part of the reason could be technical analysis paralysis. I've worked on projects rewriting PowerBuilder components in C#. Getting the two to talk to each other is non-trivial, so determining where to slice off chunks to rewrite is an anxiety-inducing prospect.

We've been doing a variant of this for the last year. We rewrote the core and one large component first and have been gradually moving over the other components to the new architecture. It's definitely a safer way to go, though implementation time is longer.

I'll lend my anecdotal experience here as well.

I was the fourth employee at a startup (that is still going strong nearly 10 years later). Pretty early on we were forced into a major pivot which saw us make the transition from Palm OS to Symbian. At that point we were left with a bunch of really difficult questions.

The code-base, as it was, was not in very good shape. It was full of bugs, suffered from some ahem questionable design decisions, and ran rather poorly. On the other hand, it was the basis for a product already doing millions in sales.

At that point I had grown into a technical lead role (we were probably more in the 25 employee range at that point). I looked over everything and championed a strategy in which we would simultaneously port the "bad" code over while splitting off a much smaller team (that I ended up leading directly) to rewrite everything from scratch.

That rewrite literally saved our business. Our 'old' code base had suffered through that year as we struggled to patch it up enough to meet acceptance requirements from a device OEM we were working with. It was a terrible project, and no one was particularly happy. Meanwhile our direct competitor had brought out a new product that raised the bar in terms of quality by about 1000%.

As we transitioned to the new, much saner, codebase we were able to very quickly respond. We built better features that worked more reliably. Our next experience with that same OEM went incredibly smoothly. They went from nearly dropping us to being among our biggest champions going forward.

So in that case, rewriting the thing probably saved the business. In the ensuing years, as more and more platforms have come about, that same code has continued to evolve nicely to meet the demands of those platforms as well.

So I'll agree with jacquesm: except when it isn't indeed:)

We did a full re-write last summer.

It was absolutely worth it and has since allowed us to iterate at a much higher speed. It was a bit terrifying to be at a feature standstill.

Our re-write was from Java(struts2) + BDB to python(pylons) + MongoDB.

If anybody is interested we are giving a talk at PyCon.

MongoDB + Pylons at https://catch.com: Scalable Web Apps with Python and NoSQL: http://us.pycon.org/2011/schedule/sessions/131/

Off-topic, do you have any plans for moving over to Pyramid ?

Yeah, Pylons 1.1 will have some forward-compatibility stuff with Pyramid. From what we know it should not be hard to migrate since it provides a lot of the same fundamentals as Pylons.

Famous rewrites:

- Microsoft Word, "Project Pyramid". Never finished, the company decided it would take too long to rewrite + keep up with adding new features.

- Netscape 6. Practically killed the company. Dragged on for years.

- IE4. Turned out OK and made IE the leading browser.

- Ericsson AXE-N. Huge project to rewrite the successful AXE phone system in an object-oriented way. Failed miserably.

I'm sure you can think of a few more. I wonder what Microsoft did right with IE.

Perl 5 was a complete rewrite of Perl 4. Great success.

Git was a rewrite of BitKeeper. Great success.

BIND 9 was a rewrite of BIND 8. Significant improvement.

In 2002 MediaWiki was created as a complete rewrite of the previous software that Wikipedia was running. Astounding success.

Mozilla is a famous rewrite disaster. I have my opinions on it, but this is not the place for that.

PHP 3 was a rewrite of PHP 2. I hate to say good things about PHP, but that rewrite has not been bad for PHP.

Project Xanadu went through a rewrite. This seems to have been a bad thing.

vBulletin only became popular after a version 2 rewrite.

Zope 3 is a rewrite of Zope 2. It does not seem to be a success.

I generated this list by taking the first two off of the top of my head, then I went to http://en.wikipedia.org/wiki/Rewrite_%28programming%29 and clicked through to the links to all of the listed projects. If a quick scan for "rewrite" followed by information about how good it was gave me an opinion, I added it to the list.

Other than the obvious effects of survivorship bias, this should be relatively unbiased. From this it doesn't seem that rewrites are necessarily a bad thing.

Incidentally in my personal experience I've been involved with a number of rewrites. Most succeeded. I've seen a number of other rewrites from a distance. Most failed. I consider this mostly luck.

> Git was a rewrite of BitKeeper.

Um, not exactly in the same sense as the article. I would say that Git was inspired by BitKeeper.

> Project Xanadu went through a rewrite.

I thought Xanadu was vaporware?

Judging from http://en.wikipedia.org/wiki/Project_Xanadu there is a possibility that, had they not chosen to rewrite their prototype, they might have released something years before they actually did.

I also have the sense that this project suffered a lot from the desire to try to be perfect, and hence failing to be good.

Windows NT - a complete rewrite of Windows, which became the core of XP and Windows Server. This was a major success; IMO, NT/XP was far, far more stable than the 3.1/95 family.

It also took Microsoft almost 10 years and millions of man hours to get NT to a stable state and iron out compatibility and performance issues before they eventually replaced 95 with XP. It was a huge undertaking - probably not something a startup can afford. IIRC Microsoft had a completely separate team working on Win NT initially, led by David Cutler.

edit: fixed 'not something a startup can't afford' -> 'not something a startup can afford'

> IIRC Microsoft had a completely separate team working on Win NT initially, led by David Cutler.

If you're interested in this, I highly suggest you read "Show Stopper!". It provides some interesting insights into Microsoft's early days with NT.


... and I still can't wait for asynchronous I/O events on both a socket and the console...

Hmm, that's interesting. I was about to say that NT wasn't a rewrite but it was a completely different operating system designed from scratch, well I guess a little of the design came from VMS:)

I see your point on how you consider this to be a rewrite.

I guess the question I have is, what's the difference between a brand new product, and a rewrite? Up until now I considered windows NT to be a brand new product.

AutoCAD was rewritten to be object-oriented in the early 1990s. The first rewritten version, Release 13.0 [1994], was such a flea-ridden dog that many customers skipped it. But Release 14 in 1997 was a marvel, and Autodesk has had success since and been able to extend the product into new areas due to the object-oriented architecture.

Foursquare went through a rewrite but I guess that was pretty early once the team had got a proof point that there were people out there who would use the app.


The rewrite of Netscape's code actually turned into Firefox, which was fairly successful by any metric.

Sometimes rewrites are necessary, but they have to be driven from necessity, not simply from the desire to "start fresh".

Not quite. There was the rewrite that killed the company, and then there was another rewrite, which turned into Firefox.

There was no rewrite that killed the company. There was an attempted rewrite that went nowhere, but the company was killed by IE4 being good enough and free.

Was it successful for Netscape the company, or for the eventual users (MANY years later) of Firefox? I agree with you on the latter, but the decision makers at Netscape need[ed] to be concerned about the former.

My lessons from painful rewrites:

Even if you have little time to invest in refactorings along the way, at least do this: try and plan from the start to componentize wherever practical as you go along -- as soon as you start to get a decent feel for how the parts fit together, but not any sooner.

That means that if things do become a mess, at least you have the option to rewrite or refactor different components without having to tear down the whole edifice.

You'll never achieve the dream of perfect decoupling and don't die trying, but at a minimum doing what you can to break a big problem up into smaller ones will make it all a bit less scary from a psychological point of view.

Not going to help you if you get your component architecture wrong, either, which is why you don't do it all upfront. But do try and decouple fairly aggressively as soon as you do get a respectably stable insight into the structure of the problem (or of a particular part of the problem), because the longer you leave it the harder it's going to be.

If the structure of the problem never stabilises, and it's a complex problem, then good luck to you.

It also helps a lot if you start out with a framework and development tools which make it easy to be modular and easy to develop in a modular fashion. As always there are trade-offs between this and speed of development, but I suspect the suggestion of an extreme, mutually exclusive trade-off, e.g. along the lines of "rails vs j2ee", is a false dichotomy. Both can (and should, and gradually seem to) meet in the middle.

As always YMMV.

I've also been part of a successful rewrite for an enterprise software project. Keys to success:

1. Cut features. Aggressively scoped the project to only core functionality and cut down a bloated feature set by about 50%. This allowed us to complete the rewrite much faster. Existing customers weren't forced to migrate, but often wanted to because we'd...

2. Deliver new features that weren't possible on the legacy code base. This included:

- 100x performance improvements (hours to seconds)

- Versioning and audit tracking

- Better, faster, sexier user interface

3. Leverage new technologies and frameworks. This is an obvious part of any re-write because it enables a team to move faster. Technology changes so quickly. Think of how a small team today can accomplish so much more than a large team stuck on a platform from 5+ years ago.

I'm not saying that a rewrite is always right, but I believe you can make a good case for it. In this case, I guess you could consider it a part of a business pivot.

Sometimes you just burn the boats...

You never want to 'rewrite' the codebase, but you do want to 'refactor' it. The concepts are basically the same, but you gain the incremental benefits by not starting from a new codebase. Rewrites always suffer from second system effect and disperse your efforts away from the actual problem. Rewrites are the economic equivalent of quantitative easing, it's just a nice way to avoid dealing with the actual problem while pretending to fix it.

The process I always use for refactoring an existing project is to first isolate the code that you want to remove, then create an interface that is EXACTLY the same as the isolated code, and then build a new implementation behind that same interface. Then you change the interface so that the leaky abstractions from the old interface are gone.

This is the process I used to migrate from OpenLDAP to SQL Server. And yes, there was a DN field in SQL Server for many years that emulated the DN structure of LDAP, and we had a bunch of hacky stored procs that would emulate LDAP semantics for search.
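
Roughly, the steps look like this in code. This is a sketch in Python, not the actual migration: a dict stands in for the LDAP server, SQLite stands in for SQL Server, and every name (`UserDirectory`, `find_email`, the `users` table) is invented for the example.

```python
import sqlite3
from typing import Optional, Protocol


class UserDirectory(Protocol):
    """Step 2: an interface that mirrors the old code exactly, DN and all."""

    def find_email(self, dn: str) -> Optional[str]: ...


class LegacyDirectory:
    """Step 1: the isolated old code (a dict stands in for the LDAP server)."""

    def __init__(self, entries):
        self._entries = entries

    def find_email(self, dn):
        return self._entries.get(dn)


class SqlDirectory:
    """Step 3: a new backend behind the same interface.

    The dn column exists only to emulate the old lookup key.
    """

    def __init__(self, conn):
        self._conn = conn

    def find_email(self, dn):
        row = self._conn.execute(
            "SELECT email FROM users WHERE dn = ?", (dn,)
        ).fetchone()
        return row[0] if row else None


def welcome_line(directory: UserDirectory, dn: str) -> str:
    """Caller code: unchanged when the backend is swapped."""
    email = directory.find_email(dn)
    return f"Welcome, {email}" if email else "Unknown user"


def demo_sql_directory():
    """Build an in-memory SQL-backed directory with one sample user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (dn TEXT PRIMARY KEY, email TEXT)")
    conn.execute(
        "INSERT INTO users VALUES (?, ?)",
        ("cn=ann,dc=example", "ann@example.com"),
    )
    return SqlDirectory(conn)
```

The last step, once nothing depends on `LegacyDirectory` any more, would be to change `UserDirectory` itself (for example, looking users up by an ID instead of a DN) so the leaky abstraction goes away.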

Rewriting is the 'easy' answer, but it's usually not the best. Even if you're 'refactoring' your app to another language, it's usually best to create some bridges so that both work in parallel: if you have a web app, use some proxy hackery; if you have a desktop app, use IPC. I've switched apps from Perl to ASP.NET using this method.
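
For the web app case, the "proxy hackery" boils down to a routing decision in front of both apps. A minimal sketch in Python of just that decision; the backend addresses and the list of migrated paths are assumptions for illustration, and a real setup would more likely express the same rule in the config of whatever reverse proxy you already run.

```python
OLD_BACKEND = "http://127.0.0.1:8001"  # hypothetical address of the legacy app
NEW_BACKEND = "http://127.0.0.1:8002"  # hypothetical address of the rewritten app

# Path prefixes already ported; this tuple grows as the migration proceeds.
MIGRATED_PREFIXES = ("/login", "/reports/")


def pick_backend(path):
    """Return the base URL of the app that should serve this request path."""
    if path.startswith(MIGRATED_PREFIXES):
        return NEW_BACKEND
    return OLD_BACKEND


def upstream_url(path):
    """Full URL the proxy would forward the request to."""
    return pick_backend(path) + path
```

Moving a page from the old language to the new one then means adding one prefix, and moving it back means removing it.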

The other thing that refactoring in this manner benefits is team cohesion, by switching languages slowly it allows the people proficient in the old language to add a lot of value to the new team, transfer knowledge slowly, and also get up to speed on the new language. When people feel like they are going to be out of a job when the rewrite is complete they will not be thrilled about it. When they have an opportunity to learn a new language, contribute meaningfully to getting rid of things they always hated, they will be much more engaged.

No offense to the author, but going through one bad rewrite doesn't make anyone qualified to declare the idea unilaterally unsound.

Having gone through dozens of complete rewrites, I can agree that engineers too often want to start from scratch, because it seems easier to build it 'the right way' than continue to wrestle with old code. But that doesn't mean it's always the wrong idea.

I've seen it work brilliantly. I was over one rewrite where we were struggling to get the existing code base to adapt, so I pared off a couple of devs, rewrote the whole app in a few weeks (under a month) while the primary team continued to support the existing app. Maintenance became a breeze.

But I've seen the other side of it, too. I've seen total teardowns and chucking years of QA'd functionality go horribly wrong.

1) Often the perception that code quality is so bad that a rewrite is needed stems at least in part from the "not built by me/us" syndrome (related to the "not built here" syndrome). Developers tend to overestimate their ability to write good quality code in real world situations.

2) A lot of folks here are talking about "throwing away the initial prototype". That makes a lot of sense at an early stage when it's just you and maybe two other people, and progressively less sense as you grow larger, modulo some other factors. If the company in question has a 50M run rate, we are talking about a really mature product. I don't think you can call it the initial prototype any more.

My opinions are colored by observing, at close range, a failed multi-year attempt to rewrite a mature product from scratch at a major corp and success at refactoring in parts to significantly improve code quality at my own startup.

> Often the perception that code quality is so bad that a rewrite is needed stems at least in part from the "not built by me/us" syndrome

That plus the lack of understanding of why the code is complicated. Corner cases and exceptions are a huge PITA.

The trouble with special cases is that they come in two very different flavours. Some are essential complexity, inevitable consequences of the problem you're trying to model. Others are accidental complexity, artifacts of the development process, often things that came along when requirements changed after the initial design was set and didn't fit in neatly but didn't justify reworking the whole thing either.

You can never get rid of the essential complexity, but with the wisdom of hindsight you can often produce new design that integrates the accidental special cases into a coherent whole. I've seen modules cut to 1/3 their former size and various "can't fix" bugs eliminated as a consequence.

"If the company in question has a 50M run rate, we are talking about a really mature product. I don't think you can call it the initial prototype any more."

But some of those same companies would love to be called a "startup" years into their operations.

A lesson can be derived from the Facebook, PHP and HipHop story. They did not rewrite the PHP code in another language; they just made it go faster with HipHop by compiling PHP into C++, and then compiling that into a binary.

The end users neither cared nor knew about this change. I do not think the development or release timelines of new features were impacted at all.

What this example serves to illustrate is that one should consider all possibilities before making a decision regarding code that has been in production and has active users. Rewriting production code should not be the only option on the table.

Sure, HipHop gives a fixed percentage increase. It feels a little strange to compare this to a software re-architecture though, since fixing something Fundamentally Wrong can have compoundingly good effects down the road.

HipHop makes PHP run as C++. It is a massive boost and a kind of a port.


I think the answer is not to 'not rewrite' but 'rewrite well'. Rewrite, to me, doesn't mean throwing out the whole thing -- it means start from a blank slate and leverage what you have as you rebuild something that matches your problem.

I'm a big proponent of 'rewrite often'. Instead of building your software like you're playing Katamari Damacy, take the time to rewrite to your current specs as a whole -- playing Red-Green-Refactor on a bigger scale.

I know the article is addressing 'bad rewrites', but I think all rewriting gets an unfairly bad rep.

It's a highly contextual decision. If it's coming from an experienced engineer who doesn't want to make more work for themselves than absolutely necessary, I would take their advice in stride. If you've just hired college kids with no experience, do the opposite of what they say.

I currently maintain a legacy web application built over 12 years ago in C++. When it was originally built, it was that developer's first programming project. It has since survived several aborted attempts to extend it with and port it to Python before landing in my lap. It's horrible, to say the least.

However, the approach to the problem was decided before I got here. It's a smart plan and well executed by the very smart developer who worked here before me. He's wrapped the legacy application with an FFI and has written a slew of heavily tested code to sync the legacy data from the old application to a relational database that they want the new platform to be written on. From there most of the application has actually been ported to a Python web framework. Those parts that haven't are still supported by the legacy application. My job is to finish this process and then look at "re-normalizing" the data and start re-developing and designing the features our clients have been asking for.

The problem with this approach is that it's not cheap. It's not glamorous work dealing with someone else's poor design choices, bugs, and lack of documentation. It's not easy grasping the amount of complexity that goes into running a system that maintains the legacy application and the new code in parallel. A green developer simply cannot do it. People with the kind of expertise it takes to manage this approach to dealing with legacy applications come with a premium.

One of the first questions I asked was, "why didn't you just rewrite this?" They certainly had enough time. The decision to take the approach we're on now was made four years ago and was not expected to take this long. A rewrite, even a mis-estimated one, would not have taken near as long and would have been far cheaper. They also wouldn't still be suffering some of the crippling bugs that are left in the legacy code that are affecting their customers to this day.

"Rewrite," isn't an ugly word that should be avoided like the plague. In many cases it's a very reasonable answer to a difficult problem. Like anything you just have to evaluate the pros and cons effectively. Only experience can help you there. So if your seasoned technical lead says, "rewrite," you might want to consider it.

Creating a legacy compatibility layer, so that you can rewrite each component or area piece by piece is definitely the way to go. That's how I've been reworking Appleseed into a component-based MVC, and it's worked really well. People still can use the legacy code, they don't have to wait in the dark while things get rewritten.

I think that there is a scale depending on how far you've got with the startup. The pain of a rewrite rises the further along you are, which is why it's important to:

a) Make good architectural decisions (good luck)

b) Rewrite a lot as early as possible, as those decisions turn out to be wrong.

You know, fail fast.

I'm at an early stage, and I have rewritten twice this year. The pain has definitely been worthwhile, as my system is now beautifully designed and organised.

It might be possible to design the perfect architecture on a whiteboard and then go ahead and execute it, but that's an order of magnitude harder than writing a subroutine and having it execute first time with no errors. And most of us can't even do that regularly.

Your product is like a bit of jello which is solidifying fast; you need to make the dramatic changes early to avoid being stuck with an ugly lump later on.

I led a counter case where a rewrite was very successful. There had been a major component of the architecture that just approached the problem wrong at its inception. Changes that one would have thought should take a few hours or a day to make took weeks. And once you dug into the code, you learned why. It lacked adequate tests, proper componentization, error handling and operational visibility. It was far beyond refactoring; it was just ill conceived. Its poor functioning had cascaded into other systems; they were riddled with hacks to compensate for the problem system's deficiencies; technical debt had become a cancer that spread around the architecture. At a certain point, we declared the technical debt had reached technical bankruptcy and acquired buy-in across the organization (execs to engineers) that we needed to start a new code base.

However, part of making the rewrite succeed was sucking it up and doing continued maintenance on the legacy system. It was no fun but it had to be done. Things that couldn't be implemented in a reasonable time with the legacy system but were high priorities were implemented in the new system to assure that the win wasn't just one of purity of essence, it was enhanced functioning. Enough was learned from what worked and what didn't in the legacy system that we had a good deal of clarity on what requirements we wanted to fulfill. The hand wringing over excessive feature creep and other foibles that can make rewrites fail were attacked with discipline.

I've heard of many big rewrites that failed but don't buy the argument that they demonstrate that it can't be done. It can.

I am torn.

We are not really a startup, but we are small and have a startup attitude among the technology people. The platform we are running on goes back about 10 years, and it's starting to show.

There is a single shared database that is used by MS Access, classic ASP, and ASP.NET applications. This means you can't change any one piece without affecting all of the others. We've had people leave because of the resistance to change inherent in the platform. Tiny little changes are very very hard to make sometimes. Small changes can take weeks. Certain major changes might as well be impossible.

But then all of the advice I've ever heard says "don't rewrite." What if we don't rewrite, but build a new platform? Solve different problems than the original?

My default answer: build a new modern database, create a datapump to and from. Don't touch the old stuff, write any new stuff to access the modern database directly.

Inasmuch as new features require storing new data that doesn't fit in the old schema, just don't send that data back.

You get to implement new features piece by piece. But always have your fallback that --crufty or not-- keeps the company going. The big hairball in side-by-side systems is operations that require atomicity. Those can get hairy, so I'd ignore them, leaving that stuff in the old system as long as possible, until you're ready to rewrite those modules in the new system and sunset the old one altogether.
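The datapump idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the table names (`customers`, `customer`), the new-only column `loyalty_tier`, and both helper functions are hypothetical, and SQLite stands in for whatever databases are actually involved. It shows one direction of the pump (old to new); the reverse direction would mirror it while selecting only the columns the old schema knows about.

```python
import sqlite3


def make_demo_databases():
    """Build two in-memory databases standing in for the old and new systems."""
    legacy = sqlite3.connect(":memory:")
    legacy.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    modern = sqlite3.connect(":memory:")
    modern.execute(
        "CREATE TABLE customer ("
        "legacy_id INTEGER PRIMARY KEY, "
        "full_name TEXT NOT NULL, "
        "loyalty_tier TEXT)"  # hypothetical new-only column, never sent back
    )
    return legacy, modern


def pump_legacy_to_modern(legacy, modern):
    """Copy legacy rows into the modern schema without touching new-only data.

    The pump is idempotent: running it twice leaves the same result, so it
    can be scheduled repeatedly while both systems stay live.
    """
    rows = legacy.execute("SELECT id, name FROM customers").fetchall()
    for legacy_id, name in rows:
        # Create the row if it is new to the modern database...
        modern.execute(
            "INSERT OR IGNORE INTO customer (legacy_id, full_name) VALUES (?, ?)",
            (legacy_id, name),
        )
        # ...then refresh only the columns the legacy system owns.
        modern.execute(
            "UPDATE customer SET full_name = ? WHERE legacy_id = ?",
            (name, legacy_id),
        )
    modern.commit()
    return len(rows)
```

Updating only the legacy-owned columns is what lets new features store extra data in the modern schema without the pump overwriting it.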

Are you familiar with refactoring? It's difficult to give specific advice because it depends so critically on the local landscape, but in general, factor out a layer and give it a good healthy coating of unit tests, and repeat until done. It is more complicated than I make it sound, sure, but in some sense it is also really that easy. Factoring out the data access layer sounds like a first step, creating some sort of service that actually unifies the data access patterns and then moving up from there, but I can't guarantee that.

All but the truly worst scenarios are better met with this approach than a true rewrite. In this case by "rewrite" I mean the creation of a new system next to the existing system that doesn't work until all (or at least most) of the new pieces are in place. If you've truly got an epic snarl, rewrite may be the only option, but odds are you don't actually have that epic of a snarl. With the proper approaches and tons of unit tests, a refactoring is like a rewrite in the end, except you have a running system the whole time. It can actually be slightly slower in total, but you also get the value of being able to choose when to stop and a continually improving system that is always actually running; it's only slower vs. a rewrite that actually succeeds and completes and that is not a sure thing!

With a database backing you do also have the option of trying to bring up a new system that also hits the old databases, but that will in practice require the first step I laid out anyhow, the refactoring of the data access layer, and once you've done that the use of the rewrite value goes down a lot.
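As a rough illustration of "factor out the data access layer and coat it with unit tests", here is a minimal sketch. Everything named in it is hypothetical (the `customers` table, the `CustomerStore` interface, the `reminder_recipient` function); the point is only that business logic talks to a small interface, so it can be tested against an in-memory fake while the real implementation keeps hitting the existing database.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class CustomerStore(Protocol):
    """The only data access the business logic is allowed to use."""

    def get_email(self, customer_id: int) -> Optional[str]: ...


class SqlCustomerStore:
    """Real implementation: wraps the existing database connection."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_email(self, customer_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT email FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


class FakeCustomerStore:
    """Test double: lets unit tests run without any database."""

    def __init__(self, emails: Dict[int, str]) -> None:
        self._emails = dict(emails)

    def get_email(self, customer_id: int) -> Optional[str]:
        return self._emails.get(customer_id)


def reminder_recipient(store: CustomerStore, customer_id: int) -> str:
    """Business logic that no longer knows where the data lives."""
    email = store.get_email(customer_id)
    if email is None:
        raise LookupError(f"no customer with id {customer_id}")
    return email.strip().lower()
```

Once callers depend on the interface rather than on raw SQL, the storage behind it can be changed one method at a time.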

Refactoring only goes so far in this case, since we can only realistically refactor one application. Unit tests help the refactored application but they don't guarantee that all of the other pieces (DB stored procs, MS Access, classic ASP, assorted other bits) still work.

Why can you only refactor one application?

Rewrite. I led the development team for a rewrite of a successful web product for a small-mid size web company that was built on PHP4 in 2002-2003.

Like you said, the way it was built, a change to one thing would affect everything else, which didn't exactly encourage improvements and innovation in the product. We were able to maintain the existing product (5 hours a week) while rewriting it from the bottom up to include the modularity and extensibility that would drive the future of the company and the product.

The rewrite was successful after 8 months of coding and testing, and in the following year the traffic increased by 200M pv/m.

Don't rewrite.

Instead get yourselves Microsoft MVC 3 and learn that.

- Since you already do asp.net you don't have to learn a new language, C# or whatever you currently use will be fine.

- It installs side-by-side with your existing dev and production environments and you can use the very latest and completely awesome Razor view engine[0]

- Make a new version of a single ASPX page that you want to rewrite. Use the fancy URL rewriting features of MVC 3 to send some clients to the new version of the page while everyone else still gets to see the old one.

- Upgrade your database to SQL Server?

This way you bring your tech up to date without the risk of a huge rewrite. And you get benefits for your clients very quickly even if you can't help everyone at the same time. Even if there's always some legacy ASP pages left in your system, so what? If they still work then just leave them there.

[0] http://weblogs.asp.net/scottgu/archive/2010/10/22/asp-net-mv...
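The "send some clients to the new version of the page" step can be sketched independently of any framework. The snippet below is not ASP.NET MVC code and does not use its routing API; it is a Python illustration of one common way to make the split, with hypothetical names (`use_new_page`, `route`, and both paths). Each client is hashed into a stable bucket, so the same client always sees the same version while the rollout percentage is raised gradually.

```python
import hashlib


def use_new_page(client_id: str, rollout_percent: int) -> bool:
    """Decide deterministically whether a client gets the rewritten page."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable value in 0..99
    return bucket < rollout_percent


def route(client_id: str, rollout_percent: int) -> str:
    """Return the path to serve; both paths are placeholders."""
    if use_new_page(client_id, rollout_percent):
        return "/orders-v2"
    return "/orders.aspx"
```

Setting the percentage to 0 sends everyone to the old page and 100 sends everyone to the new one, which gives a simple rollback switch if the new page misbehaves.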

Ah, this old chestnut again. Just because Joel said it, doesn't make it true. Granted, the line between "refactor aggressively" and "rewrite" can be pretty blurry at times.

> Granted, the line between "refactor aggressively" and "rewrite" can be pretty blurry at times.

Not at all. The line isn't blurry in the slightest. In the "refactor aggressively" case, the code is continuing to run while you make the changes. In the "rewrite" scenario, you start with a blank page, and don't have an even minimally functioning system until you build enough of it to get to that point.

In my experience, the advantages to the former approach cannot be overstated.

> In the "refactor aggressively" case, the code is continuing to run while you make the changes.

I've seen (and done) plenty of refactorings that involved rewriting large parts of code and/or leaving things in a non-running state for a while. I've also seen (and done) rewrites that involved liberal re-use of code from the previous code base.

I think that's an overly narrow definition of "rewrite". In fact, I'd argue that piecemeal is the smart way to go about a rewrite, if that is at all possible. In which case the rewrite is just an aggregate of refactorings.

That "narrow" definition is the one being used by Joel, and the other authors who have weighed in on the debate. A rewrite isn't an aggregate of refactorings-- for the purposes of this debate, it is the opposite of an aggregate of refactorings.

Man, I regret not spending more time on the quality of the code at IMVU. I'm not a big fan of rewriting from scratch, but we've basically scaled a prototype into a crazy successful business, and it's had some nontrivial effects. For example, product owners now believe that you may as well timebox all refactoring, because you can never get it all. I wrote about the hidden costs of dirty code a while back: http://chadaustin.me/2008/10/10-pitfalls-of-dirty-code/

In general I'd prefer to refactor, but you need to _explicitly_ adopt that cultural shift early on.

Sometimes a full rewrite is useful and necessary, often it isn't.

The typical failure scenario for rewrites is that development teams don't appreciate that they are more difficult than writing something from scratch. When you create something new you have time for it to grow to scale, you have time for bugs and design defects to be worked out of the system, you have the ability to concentrate on all of these problems without distraction, and the consequences of a late project (which should be expected) are less.

A naive rewrite will ignore the fact that the rewrite will take more time to reach the QA level of the old system (especially at scale), that it'll take more time to develop because resources are split between supporting the old system and the new, that it'll be more difficult to support backwards compatibility or data migration, and that it'll be risky to deploy (you either do so incrementally or you take a huge risk and do a big-bang deployment). Combined with the regular schedule pressure of software development what you usually end up with is your typical big-bang integration/deployment CF.

Stepping into a time machine, going back to the origin of your software system and rewriting it to be better is far, far easier than writing a new system to replace an existing, running system that's supporting a large volume of business.

An important question to ask is: when are you fully committed to the new system? The right answer should be "when it's proven itself better", but typically the answer is "when we've started coding it"; the latter is hugely risky. It's best to think of rewrites as creating an internal competitor that has to prove itself at every point.

tl;dr Rewrites are a different kind of software project than normal development; failing to appreciate that typically leads to failure of the rewrite.
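One way to make the "internal competitor that has to prove itself" idea concrete is a shadow comparison: keep serving results from the old code, run the new code on the same inputs, and record every disagreement. The sketch below is a minimal illustration; `shadow_compare` and its arguments are hypothetical names, and a real setup would sample production traffic rather than take a list.

```python
def shadow_compare(old_impl, new_impl, inputs):
    """Run both implementations on the same inputs and collect mismatches.

    The old implementation is treated as the source of truth. Each mismatch
    is recorded as (input, old_result, new_result_or_error).
    """
    mismatches = []
    for item in inputs:
        expected = old_impl(item)
        try:
            actual = new_impl(item)
        except Exception as exc:  # a crash in the new code counts as a mismatch
            mismatches.append((item, expected, repr(exc)))
            continue
        if actual != expected:
            mismatches.append((item, expected, actual))
    return mismatches
```

An empty mismatch list over a representative set of inputs is evidence for committing to the new system; a non-empty one is a to-do list.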

At one company, I went through two rewrites, one bad, one good. Sort of.

In January 1999, I was hired by appoint.net (which became eCal) to be their Chief Scientist; my main job was to rewrite their Web calendar in C++. (The original version was in ASP.) It was a disaster. I started by building the infrastructure API, but I made it much too complicated. The team working on the rewrite got up to about 5 people before they admitted that they just couldn't understand most of what I had done. We canned it sometime around November, having spent something like 3 hacker-years on it.

The team then went off and implemented a bunch of API features that talked to the existing application's database, but were written in Perl. They discovered they loved Perl, and were highly productive in it.

About June 2000, the company decided to pivot. The company's business was an OEM calendar; the whole app ran on our servers, but anybody could pay us to get a custom skin, accessed through their own domain. It was popular with dot-coms who wanted sticky features to keep people coming back. One thing that we billed as an advantage was that all the customer calendars were actually running against the same database, meaning that someone who had a calendar with Foo could send invitations to someone who had a calendar with Bar. (What we didn't emphasize was that users could switch back and forth between Foo and Bar at will. That was less sticky.)

But, in mid-2000, we started losing customers--not to competition, but to bankruptcy. Management decided we needed to switch to the enterprise market--and, in the enterprise market, Foo did not want Bar users being able to see their calendars. We needed a packaged solution; but our ASP implementation couldn't do it.

So, we started on a rewrite. All Web stuff would be done in Perl; all database transactions would be stored procedures. I was asked to build an infrastructure for the team to use. I made a template engine--fairly standard stuff these days, but nice; the team loved using it. I eventually got it pretty fast, too, by compiling the templates down to Perl on demand. (We had a budget of 140ms on the Web server and 140ms on the database, which was computed to support 1,000,000 users on 4 Sun ES-450s--about $16K, I think. We hit both budgets easily.)

It rocked. Technically, it was a complete success. Business-wise, well, no--we released in June 2001, a time when enterprises were just not willing to spend money. Our pitch was that it would save them money compared to Exchange or Notes, and we were probably right; but they already had Exchange or Notes.
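For readers unfamiliar with the "compile the templates down to code on demand" trick mentioned above, here is a toy sketch in Python rather than Perl. It is not the engine described in the comment; the `{{ name }}` placeholder syntax and the `compile_template` function are assumptions made for illustration. The template is turned into the source of a `render` function once, and the compiled function is cached so later requests skip the parsing.

```python
import re
from functools import lru_cache

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


@lru_cache(maxsize=None)
def compile_template(source: str):
    """Compile a template string into a Python function, once per template."""
    parts = []
    pos = 0
    for match in _PLACEHOLDER.finditer(source):
        if match.start() > pos:
            # Literal text becomes a string literal in the generated code.
            parts.append(repr(source[pos:match.start()]))
        # A placeholder becomes a lookup in the context dictionary.
        parts.append(f"str(ctx[{match.group(1)!r}])")
        pos = match.end()
    if pos < len(source):
        parts.append(repr(source[pos:]))
    code = "def render(ctx):\n    return ''.join([" + ", ".join(parts) + "])\n"
    namespace = {}
    exec(code, namespace)  # only ever run on trusted template source
    return namespace["render"]
```

A call such as `compile_template("Hello, {{ name }}!")({"name": "Ada"})` returns the filled-in string. This sketch does no HTML escaping, which a real web template engine would need.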

My favorite Joel essay: "Things You Should Never Do, Part I"


This discussion comes up a lot. People make strong arguments for both approaches.

If you have an established product/business, rewriting is very dangerous. I prefer to refactor aggressively.

I was about to post this very link! It takes upwards of 1 year to get stable and rewriting at that point without considering refactoring and/or creative solutions is cumulative team/company failure.

Steve's advice is sound in some situations but one factor not mentioned by him is the stability of the current platform. Are customers affected by bugs, poorly thought out process flow, etc.? Rewriting simply to add new features may be a losing proposition. But, rewriting because the current code base is so poorly thought out that it affects customers...now we MAY have good justification.

Also, the following struck me as odd, "Our CEO doesn’t have a technology background, but he’s frustrated he can’t get the new features and platforms he wants (Facebook, iPhone and Android, etc.)" Not knowing the situation better, I'm a little leery that the CEO is driving the new features when a product manager or someone closer to the customer base might be in a better position to determine what's needed. Perhaps in this company the CEO is close enough to the customer base, but I can't tell from the post.

One of the things I've seen people consistently screw up in rewrites is perf.

I was in the unfortunate seat more than a decade ago of having MY code up for rewriting. It worked just fine, but people argued that I was the only one who could work on it (not true, but it could have been cleaner, no doubt).

Anyways, a small team dedicated six months to rewriting my server code. Six months later they scrapped the rewrite. It ran at something like 20% of my sustained throughput. I used to joke that their code was "copies all the way down". And sure, my code made a lot of use of inline asm and some nasty tricks, but it flew.

Would I write it the same way again? Probably not. But when you're doing a rewrite, people have something to compare you against.

The interesting thing is how easily you can 'slide' into a full rewrite, when your old code is aging legacy and everybody wants to get rid of it. I've participated and seen it happen many times. It has never worked particularly well.

In my experience, what you mostly underestimate is the amount of hidden features lurking inside a mature product, which all need to be rewritten in the new implementation. The stuff you can easily estimate is 20% of the work (main features), and 80% is all the small stuff you afterwards realize you also need.

This is why you MUST throw away the prototype.

Ditching bad code is much easier when there’s less of it. Throwing away the first iteration takes discipline and must be executed cold and ruthlessly. It is not enough to admit the code needs rewriting. You must be prepared to delete code from day 1.

3 months is a good life cycle for a functional prototype. Budget for 1 month of overhaul for the first 3 months of development and 1 month of overhaul for every 1 month of development beyond that.

I can vouch for this having gone through this myself. We built our beta in about a year with a codebase full of "technical debt" and a very bulky application which was "feature rich" but "usability poor" and also started introducing all these back end problems (some a manifestation of our web framework as well). So we thought we were "too cool for school" and started architecting (in hindsight, "over-architecting") what was at heart a simple application which just needed a facelift and some consolidation.

We distributed it into multiple pieces and started a massive re-write with all these services which ended up with more code in the shared plugin (duplicated across all services) than in the service itself. Eventually, we realized that we were "over-engineering", cut our losses and quickly glued our little pieces back together into a slightly slimmer application which had all that we really needed to begin with - a facelift and some consolidation.

This is one of my favorite lessons learned, also covered here: http://www.joelonsoftware.com/articles/fog0000000069.html

I think a good analogy is a messy house. Do you bulldoze the house and rebuild it, or do you just go ahead and tidy what's in it?

Only in the most unbelievably dire and awful circumstances do you take the former option...

The situation is never 'tidy up' though, when it comes to software. When was the last time your assignment was to 'move 4 buttons on the screen, and make them shinier'?

Assignments to existing systems involve activities like adding new db fields, new validation logic, integration with external systems, rerouting/duplication of existing data in to new modules, and so on.

Comparing these to activity on a house, all would involve construction/building of some sort. If the foundation is weak - to the point where hammering a nail in a wall causes the floor in another room to cave in a bit, most sane contractors would not get involved, or require severe structural work to get things (back?) to a minimum safety code.

Admit it - you've worked on projects where introducing a variable in module X causes havoc in some screen which seems unrelated to module X. You can harp about test cases catching all this stuff, but the types of systems we're talking about - the ones we're talking about replacing wholesale - don't have that infrastructure in the first place. They weren't built according to best/good practices, which is why the people in the article were considering rebuilding from scratch.

Only in software do we think we can order people to continue to work on systems that are visibly falling apart without having to put in the required infrastructure work to make sure things don't keep breaking.

See my other reply below; there are definitely times when shit needs to be done. Perhaps the analogy should be the state of the house in general; if the foundation's fucked you really have no choice but to start again.

And yes, there are definitely situations where that's been necessary and I've worked with some. I've seen a piece of software that got so damn complicated that the only thing you could do with it was to add stuff exactly according to the retarded design, anything even slightly varying from that would have taken literally weeks to implement.

I think the point here is that utterly fucked software occurs way more times than people would care to admit. A lot of the problem is the disconnect between non-technical managers and coders. Coding isn't factory work.

No, there are other circumstances, like you have enough money and time to rebuild the whole house to be how you want and make your future better. Of course, other considerations are taken into account, like buying new land and selling the old one, etc... But then that makes the analogy bad.

well, yeah, but all analogies to coding are bad in the sense that they never quite capture everything and coding is honestly just not like anything else.

But at least with respect to emphasising the cost of a rewrite, it gets at it... somewhat :-)


Perhaps it's better to say 'the condition of a house' - there are definitely occasions where a rebuild is necessary. If the foundation is completely screwed, for example.

The idea that you ought never rewrite is mistaken, but it's important not to jump into it.

Maybe the general principle is 'weigh stuff up'. Perhaps too general to be useful, however...

It depends.

For example, if your startup is a web-based service written in PHP, then in most cases you shouldn't rewrite it, because your customers won't get much out of it, and you'll be left behind.

But if you're a database startup and your storage engine is doing too much random disk I/O (which is slow), then you'll have to rewrite that part, otherwise you don't have a usable product. However, you should keep in mind that this is super-dangerous, so you should get something out the door ASAP, even if it means taking terrible shortcuts.

Depends how bad the code base is, I can see security being a big problem in a PHP code base loaded with technical debt. Speed is another one, certain things in PHP are just going to be slow with bad implementation.

IMO the prudent thing to do as a businessman is to wait until your startup is no longer a startup but a well operating business when you can deal with such technical debt by hiring a smart guy who'll refactor piece-by-piece in the background. Until then you just patch security issues and cache the hell out of it.

For me it depends. I think that rewriting 100% of the code is a crazy thing. I work for some web startups, and especially in the beginning, with really little experience, you or your colleagues can create something that fits the need at that specific moment but may not hold up in the future, for example because it doesn't scale well. In that case, it is better to rewrite. You might save the company and learn a lot from that terrible experience.

Show me an example of a startup that scaled up massively and didn't rewrite their code. I'd suggest here that we have a case of correlation without causation. Startups tend to need a big rewrite early in their history, which is also coincidentally when they are likely to fail. People looking for a reason (or an excuse) for failure will blame the rewrite. Not unlike the 'vaccines cause autism' meme.

A rewrite is an overarching, daunting task. It's so much better to break rewrites into multiple steps. I went through several major rewrites with various startups, so I think it works, but take it with a grain of salt:

1. Throw away garbage first. Most companies, including startups, accumulate garbage code fairly quickly. Just by throwing away old ideas that didn't work, you have already achieved a lot. This alone has many benefits: tests run faster, compile times shrink, awk/grep become faster, etc. As a startup founder/CTO, you can even hold a garbage-throwing party once every few months. Every programmer I know loves throwing away garbage code.

2. Ask stakeholders what the primary use cases are. Don't write a single line before doing this because it will be wrong, again.

3. When new technology is involved, perform load tests. Even the most trivial load tests will do. They give you a basic idea of how robust the tools are.

4. Rewrite 1 thing at a time and run tests in between so that your confidence stays high.

counter point: Greenplum.

Anecdotal evidence that it is possible to fire all of sales, rewrite, and not only survive but also have a good exit.


Sometimes when you have a blighted building on your hands, the only thing you can do is burn it to the ground, clear the lot and start fresh. It just doesn't make sense to replace a window here, the plumbing there when the entire frame is rotten and about to collapse.

If Steve had written the ending of the post as the beginning, it would have clearly conveyed what he was trying to say.

The key factors involved in such a decision are:

1) Why 2) When

I don't think it is a blanket vote for or against a rewrite.

The important bits come in the end under "Lessons Learned".

Joel Spolsky has written a very good essay about this using Netscape as an example: http://www.joelonsoftware.com/articles/fog0000000069.html

"don’t rewrite the code base in businesses where time to market is critical and customer needs shift rapidly."

I don't mean this sarcastically, but in 2011 what businesses are there where "time to market" is not important?
