A rewrite is very hard, especially from an ancient spaghetti monster. Whether to keep it in the same language is not the biggest issue, IMHO, except when we're talking about a key library not available elsewhere or a niche where that language is just the BEST.
I have done several: changing languages, database engines, architectures, styles... even doing the same thing several times in the same project!
And each time, I see a lot of code reduction, especially if I can change the language! In one of them, I shrank a badly written C# project down to Python by the boatloads. I mean, close to 1,000 files down to a few dozen. Yep, if we are talking about spaghetti, it can compress that well ;)
In fact, I think changing the language (or moving to the most recent versions with the most modern libraries/dependencies possible) is the SIMPLEST way to reduce the size of the job.
I do this all the time. Each time Obj-C gains some new trick that cuts code, I apply it as fast as possible, across my whole codebase. I learned to do that after the most insane upgrade/rewrite from .NET 1 to 1.1 and then 2.0, which killed us because the boss waited too long.
The BEST way is obviously not to repeat the same mistakes that created that monster in the first place. THAT is what makes the task hard/impossible in the average corporation, because the cultural problems are what cause the biggest mess.
Also, it's necessary to keep the old project alive, and (this is something that bit me once) to truly have the most hyper-perfect data upgrade/synchronization possible, to minimize downtime and have real data from the start... real but clean! This is how I created a 3-tier version, in Visual FoxPro + SQL Server, of a Fox/DOS app that was deployed in 2000+ places with non-technical people before the internet, successfully (bar the first couple of tries ;)).
It's coincidental that this should be on the front page of HN the night before my team and I release a big migration that has taken the best part of a year.
The codebase he describes is an eerily accurate representation of where we started, with the added complication that it was built by a sole developer who wasn't using any version control at all.
There are roughly 600k lines of Perl code in the back-end alone, but because there was a lot of duplication in lieu of version control, I have no way of knowing how much of this was actually in use. I suspect roughly 100-150k.
Our approach was pulling the platform apart into distinct (Ruby) services and putting HTTP interfaces around some of the legacy services where possible. We've ended up with < 15k of Ruby, including the front-ends. It's not perfect but there haven't been many major issues in our pilot release, and the team is happy. Fingers crossed.
While a lot of the commenters here seem somehow outraged at the computations, 5.5 man years seemed like a very aggressive schedule for rewriting a 1MLOC system, even if the end result were a 100KLOC one. My initial guess would have been 10 man years, and a cost in the millions.
So here you have what was probably a 150KLOC original system. How many man-years would you guess the rewrite took?
Three, plus a part-time contractor. One of the team is front-end so had little to do with replacing the Perl part.
The truth of it is, in our case it isn't (yet) a full rewrite, and there is still a lot of functionality tied up in the existing codebase. So it's not easy to answer the question about man years, but I would guess around 1.5.
The biggest wins for us in terms of lines of code were not the language (we did consider sticking with Perl), but re-assessing the business logic, ridding the codebase of legacy junk and using existing libraries instead of hand-rolled solutions.
However, I would mention that this doesn't match the alleged situation in the article: a million lines in heavy use.
Your app sounds like its real complexity was considerably less. Given that complexity grows at least as fast as code size, your success still might not mean that diving into the situation described in the article would be a good idea.
This blog post seems like a case of being stuck between a "second system syndrome" and a sunk cost fallacy.
It would seem like the obvious answer would be to not sit down and rewrite the whole thing from scratch, but start replacing pieces (with whatever language they think they'll be successful at).
The idea that they should somehow just be stuck forever with a shitty mess of a perl application seems incredibly defeatist.
Protecting sunk costs is only a fallacy if the money was entirely wasted.
If you spent money on something and that something isn't worth what you put into it but still is worth a lot, you want to protect that investment.
As far as incremental improvements go, rewriting some part in another language seems pretty bad. I mean, if you are being incremental, then you have to make changes that might be interrupted in the middle, and then you'd be saddling the system with two different languages.
You could just as easily rewrite the worst parts to conform to whatever existing or new standard you have. I'm no fan of Perl, but I'm pretty sure you could at least create a subset that conforms to standard object-oriented practices and not have the problem of a system written in two different languages.
I have a hard time considering this large application as an "investment". Presumably, it provides some business function, and has been hopefully generating more revenue in its lifetime than was spent creating it.
I don't think the software has any intrinsic value; it wasn't constructed from precious metals, which could be re-smelted and sold for scrap. If continuing to develop and maintain it is costing them money (either in real terms because of development costs, or in lost opportunity costs), then they should look at how that's trending. At what point does the thing cost more than it's worth to them? (Maybe it never does; I've had banking customers who spend millions of dollars a year keeping a thirty-year-old p.o.s. COBOL application running because it's the backbone of their operations.)
And I didn't mean to imply that they should rewrite parts of it in another language, just that they should start decomplecting pieces of it so that they can replace those pieces with better-designed ones. If they want to stick with Perl, because that's where their expertise is, then I think they should do that. I would agree that it's probably counterproductive to take on rebuilding the application while at the same time switching to a new language.
But unless the entire application is passing around internal Perl data structures, it seems crazy that they can't identify edges to the application functionality, and start to peel those edges away and encapsulate that functionality in a better way.
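The edge-peeling approach reads, in sketch form, like a routing shim: endpoints that have been rewritten get new handlers, and everything else falls through to the legacy app. A minimal illustration (all names hypothetical, in Python for brevity; in reality the legacy side would be a proxy, CGI call, or FFI rather than a stub):

```python
# Strangler-style routing shim: rewritten endpoints are served by new
# handlers; everything else falls through to the legacy application.

def legacy_app(path):
    # Stand-in for the old monolith (in practice: reverse proxy, CGI, ...).
    return f"legacy handled {path}"

def new_invoice_handler(path):
    return f"new code handled {path}"

# Grow this table one endpoint at a time as pieces are rewritten.
REWRITTEN = {
    "/invoices": new_invoice_handler,
}

def route(path):
    handler = REWRITTEN.get(path, legacy_app)
    return handler(path)
```

The point of the table is that each entry is an independent, reversible migration step: delete the entry and the legacy path is back in charge.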
You can write a million lines of code that do arbitrarily little, particularly if you're following poor development practices. I've seen a million-line codebase that could have been replaced with 10k lines of python. Depending on what functionality they actually need and how much they need the developers, Acme could well be making the right decision.
You know why you can do that? Because you have hindsight. You see exactly where the business went.
And this is the same damn trap all neophyte developers fall into. "Let's rewrite!"
Once that first wave of business requests and demands comes along, your precious sandcastle will crumble. Because the business team is fickle. And they stick you with deadlines. And then, mid-deadline, they change their mind. Or are forced to go a different direction because some shit government law is passed that requires you to broadcast your service requests with encrypted messages tied to pigeons (because, realistically, that's how the government does APIs).
Here on the internet where everything is made up and the points don't matter, you can get away with rewrites. Agile not working for you? Let me introduce you to the CADT model (http://www.jwz.org/doc/cadt.html).
All rewrites become tomorrow's bug-infested legacy ghettos.
You're usually correct. But a great developer is one who can spot the cases where the patterns don't apply. There are cases where the legacy code is so bad that it's better to throw it away, and there are rewrites that work out well (I've been part of one of them).
(Don't listen to JWZ, at least on business/strategy. His employers don't exactly have a great history of success on that front)
I think this is the key. "We have this big honking system that nobody really understands. We need a system to do X; please implement X" is way easier than the whole rewrite-a-million-lines problem.
What is also key is understanding that this is almost never the way that happens.
It goes more like this:
We have this big honking system that nobody really understands. We need a system that does X (where X is the list of features we think the other system does that are critical. X is subject to change as we discover other features to add and features that actually weren't necessary.) Please implement X this way.
And that can easily become an exercise in extreme frustration.
In my experience, the reason they will not even consider "Please implement X" as an option is because they see ANY additional effort as more expensive. They figure taking the existing code and producing a new system that EXACTLY duplicates the functionality will guarantee controlled costs.
They're wrong of course, but this is how it is done, at least in government contracting circles. I have literally seen developers required to get out a RULER and measure user interfaces as they appear on one screen so that they can exactly duplicate them on newer higher-dpi screens without changing the appearance or functionality whatsoever.
Everyone presumes the million-line system is one which could be implemented much more simply... there are some systems, however, that need those million lines. Rewriting one of those in a new language is a whole different level of nightmare. Especially when the requirements are 'make it work like the old system' and nothing else, and the requirements for the old system are 30 years old and not even close to portraying the system as it stands currently... But hey, to avoid 'being the next healthcare.gov', political types will push anything out the door and just shoot anyone who points out problems in the back.
Ah. And then a week later you hear that Janice in the accounting office in New Zealand depended on one of those features. That's when you learn there is a Janice in accounting. And that your company has an office in New Zealand.
There are times where a legacy technology has limitations that ultimately prevent progress and create maintenance nightmares. A good example is some of the older database technologies where a 1TB database machine could cost 100k (example: Sybase).
You could try to maintain a series of expensive databases, but between replication backups, dev boxes, team of DBAs etc suddenly your costs to keep the old technology are really high. And if these are databases storing expensive financial data, well, maybe the risk of hitting your 1TB limit is an expensive risk to have on your plate.
And it might take 10MM to migrate to a new technology, but in this case it would probably be worth a switch.
When technology is a commodity, and working poorly is still working, then maybe it's hard to justify a switch. But if your old technology starts to limit your performance and introduce risks or problems that detract from your competitive advantage, then you might not have a choice.
Minor point, but I don't like the citation of Netscape as a failed rewrite.
It's worth remembering that the Mozilla project ended up being a success in that it dealt the blow that finally dislodged IE from dominance. The original Netscape codebase could not do this because at the time IE4 was already way ahead in terms of CSS support and Netscape was hitting an architectural dead-end. Maybe they could have piled on some more hacks to get NS5 out quicker, but even if it had feature parity to IE5 they had to contend with Microsoft's bundling which was Netscape's real undoing.
Lots of absurd assumptions in here. Many of them were acknowledged, but the author seems to think that this `millions of lines of spaghetti code` with `little use of existing libraries` will be rewritten as millions of lines of spaghetti code with little use of existing libraries. The rewrite should decrease the workload to a fraction of what the author used for his fuzzy math.
Changing the language isn't a big deal. You can always go SOA. In fact, even if you stick with the same language and are doing a rewrite, SOA is probably the right answer.
If the original system is THAT bad, the language isn't the problem, it's the architecture, and you should probably refactor it into multiple components which could potentially be multiple different languages.
That's my thoughts. Get away from app and more into portal. What still befuddles programmers on large projects is that they still think from the whole front to the whole back and vice versa. MVC hasn't helped this one bit.
Is this train of thought meant to be specific to server-side apps? An example of a large rewrite was the classic Netscape browser engine (to Gecko), and I'm having trouble picturing how a browser engine could use a SOA.
SOA is just the general engineering pattern for providing services to anything that can consume them. The internet is fully realized if all endpoints can both produce and consume services.
The scary thing for me in ditching the old code base has been those undocumented corner cases. You generally find them by saying "We don't need that" to some piece of code and later find out it was a bug fix 5 years ago. It always seems worse with stored procedures when moving databases.
If the only spec is the old code base, then you are probably doomed.
I haven't worked on a big project without version control, and I don't see it as a substitute for an actual specification. I have also never seen patches explained in enough detail to substitute for a spec.
What really surprises me is that I got to the end and no one mentioned what seems obvious to me: if they are doing a rewrite (in any language) of a horrible code base they made, what reason do they have to believe this time will be different? Yes, we can assume some learning, hopefully some improvement, but as the saying goes, you can write FORTRAN in any language. Switching languages won't magically fix things. Training up your dev team and getting them to start refactoring, OTOH, is a much more interesting proposition.
"fired the dev team that had been working for one and a half years to develop a complicated project in part because an outsourcing company in India promised they could replicate it in two months" - when will they learn?
More appalling (or hilarious) is the footnote: "Except for an insurance company who decided to switch their accounting software from COBOL to C++. They gave their COBOL devs a two week training course in C++ and told 'em to rewrite the system. I don't need to tell you how that turned out."
I can't possibly imagine that the 1000000 lines of code couldn't be replaced piecemeal (in giant blocks). Then again, I've never seen 1000000 lines of code.
As some previous commenters have noted, there is a spectrum here between full-bore rewrite and don't-touch-a-thing. Parts of the code could be factored out, modularized, rewritten in another language, and all that without huge impacts on the existing functionality. Joel Spolsky wrote an article a long time ago about the value of software that has already been written and put through its paces (though I don't agree with /never/ rewriting, it should be done only as last resort):
"The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. "
There's value baked into that old code: lessons learned, bugs fixed, workarounds put into place. These things can be lost during a rewrite (and sure, sometimes you don't need them in the new version because the original problem has a better solution or doesn't happen in the new language), and the risk of losing them should be considered carefully.
This is a pretty terrible analysis. It completely ignores that things might, just MIGHT actually be built correctly this time. Ignores that the developers redoing the code might know the problem domain better. Ignores improvements in the quality of the development teams. Ignores far too much and then advocates staying with a bad solution out of fear that it might not go well.
I think this is a good, well reasoned writeup. One thing I'd take exception with is the idea that they have to rewrite a million lines of code.
> Very little use of existing libraries ("not invented here" syndrome)
The senior dev and PM could sit down over a few weeks and do an assessment of other languages with good library coverage for lots of the existing system, without writing a line of code.
I'd bet the project size would shrink considerably.
At an old startup I worked in, we had a legacy codebase of about 600k lines with 15 years of cruft in an ancient dialect of C++ with lots of not invented here syndrome.
By that point the system was so old and fragile that it simply had to be rewritten. The few libraries that we did use and didn't write were no longer supported, modern OSs wouldn't run the software correctly, vendors had simply gone out of business and so on.
A good 70% of the system functionality was rewritten in C# by just a couple guys part-time over the course of a year basically just gluing together existing libraries.
OK, I'll bite. Why the hell would anyone want to go back to Perl? That is one disgusting mashup of a language.
I'm horrified by it daily when I have to use scripts written by older bioinformaticians. The best benefits I've heard are string processing speed (<3 you Ruby) and package management (hello Python, Ruby, R).
Sometimes Perl works really well. For example, just the other day I wrote a tokenizer in Perl after attempting to do the same in other languages, doesn't it look pretty?
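Not the commenter's actual code, but to give a sense of why regex-heavy languages shine here, a table-driven tokenizer can be sketched in a few lines (Python for illustration; the token names are invented):

```python
import re

# Token spec as (name, pattern) pairs; order matters, most specific first.
# This mirrors the regex-centric style that makes Perl terse for this job.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

The whole grammar lives in one data table, which is roughly the property that makes such tokenizers "look pretty" regardless of host language.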
What position? The position I see it in is a legacy language that used to run the internet and was the first language that really worked for bioinformatics. Is it different outside science?
Yes--people who have practical experience writing and maintaining code that has to be maintained tend to write maintainable code.
Scientists (and, in my experience, especially bioinformaticians) tend to make horrible, awful messes no matter how maintainable you think a language is. (You can hand them Inform 7 and it'll still end up looking like Fortran ate the csh manual and vomited all over an APL keyboard.)
Perl has always been designed from a get-things-done point of view, rather than adhering to a particular philosophy. It also has a huge number of well-maintained libraries available. These are both big plusses in some contexts.
(Not speaking for myself here, BTW. A decade ago I tried Python & never looked back.)
In your case, part of the problem may not be due so much to the language, as to the authors of those scripts you mention. People who have not studied software development as their primary discipline have typically not been exposed to ideas about good design, writing maintainable code, etc.
Yeah, I agree the programmers whose code I'm reading are likely skewing the sample. But, as anecdata, when I moved to my current lab I requested that we not code in Perl because people were writing unreadable, unmaintainable code. When we switched to Ruby en masse, people really pulled it out of the bag, and we now have really quite a nice codebase. I think the very strong community focus on standards in Ruby helped that along, as did the basic aesthetic of the language.
Yes, I know quite a few. In computational genomics in the south-east UK there's us (University of Cambridge Plant Sciences), Queen Mary University (Yannick Wurm's group), and The Sainsbury Lab at the John Innes Centre in Norwich (Dan Maclean's lab).
May need to expand. I saw several posts about shrinking, but sometimes the best way to go is to expand, then start replacing little parts. Say A, B, and C all feed into a magic box that produces X, Y, and Z respectively. Well, make two (or more) magic boxes, and rather than trying to write an (ABC) -> (XYZ) converter all at once, write an A-to-X converter, then a B-to-Y converter... Given wildly different languages, they may no longer belong in the same function anyway.
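That decomposition can be sketched with hypothetical converters (the transformations here are placeholders): the monolithic magic box becomes a composition of small pieces, each replaceable on its own schedule.

```python
# Hypothetical per-stream converters replacing one monolithic
# (A, B, C) -> (X, Y, Z) transformation. Each can be rewritten,
# tested, and deployed independently of the others.

def a_to_x(a):
    return a.upper()   # placeholder transformation

def b_to_y(b):
    return b[::-1]     # placeholder transformation

def c_to_z(c):
    return len(c)      # placeholder transformation

def convert_all(a, b, c):
    # The old "magic box" is now just composition of small converters.
    return a_to_x(a), b_to_y(b), c_to_z(c)
```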
The only way to pull off something like this is to do it slowly, carefully, and in pieces. You aren't going to rewrite a system that large all in one go.
Of course, rewriting something just so you can say it's in a different language is silly anyway. Whoever set that goal is being overly simplistic. They need to step back and re-examine their actual needs and real problems.
PayPal's Node.js experiences are totally different from what the blog is describing. Besides, the "number of lines" calculation is useless, and even though the blog says so, it still runs with it. Also, a Perl blog post is not a very neutral source on the cost of rewriting Perl into something else.
Very misleading post. It's obviously not that expensive to start slowly replacing pieces and have some people work on new features isolated from the main code base.
But I understand Ovid's frustration with so many people successfully switching from Perl to things like Go and being happy about it in their blogs ;)
We're still in the dark ages when it comes to software development, and I don't believe it's because we're using Java instead of Haskell.
I believe we need much more powerful tools that help us in understanding large code bases. Tools that can help us visualize what's going on. Tools that can do testing for us. Tools that can rewrite code for us (think Resharper or other Jetbrains refactoring tools), but an order of magnitude better.
Why don't you think Haskell will help with this? If Java can't prevent a NullPointerException I don't see how static analysis can take the tooling where you want it to go.
It's zero-sum if the tooling around Haskell is antiquated compared to Java's. But the idea that a language is going to make us "that" much more productive has to go. We need much, much better tools.
> That's roughly 5.5 person years of effort to rewrite the code base, but that assumes you're working seven days a week, 365 days a year. In reality US workers typically work roughly 2,000 hours per year, or about 250 days out of the year. That means it would take eight person years of effort to replicate the above code base (over ten years at average working hours here in France).
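For what it's worth, the quoted back-of-the-envelope arithmetic does check out:

```python
# Reproducing the quoted figures: 5.5 person-years of seven-day weeks,
# rescaled to a US working year of roughly 250 days.
calendar_person_years = 5.5                  # assumes 365 working days/year
days_of_effort = calendar_person_years * 365
us_person_years = days_of_effort / 250       # ~250 US working days/year

print(round(us_person_years, 1))  # about 8.0
```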
If he's from France, why not just stick to talking about the French working hours? Or just one of the two countries. Bringing up two countries like that was weird to read, but maybe that's just me.