A rewrite is very hard, especially from an ancient spaghetti monster. Whether to keep it in the same language is not the biggest issue, IMHO, except when we're talking about a key library not available elsewhere or a niche where that language is just the BEST.
I have done several: changing languages, database engines, architectures, styles... even doing the same thing several times in the same project!
And each time, I see a lot of code reduction, especially if I can change the language! In one of them, I shrank a badly written C# project down to Python by the boatloads. I mean, close to 1,000 files down to a few dozen. Yep, if we are talking about spaghetti, it can compress that well ;)
In fact, I think changing the language (or moving to the most recent versions with the most modern libraries/dependencies possible) is the SIMPLEST way to reduce the size of the job.
I do this all the time. Each time Obj-C gains some new trick that cuts code, I apply it as fast as possible, across my whole codebase. I learned to do that after the most insane upgrade/rewrite from .NET 1 to 1.1 and then 2.0, which killed us because the boss waited too long.
The BEST way is obviously not to repeat the same mistakes that created that monster in the first place. THAT is what makes the task hard/impossible in the average corporation, because the cultural problems are what cause the biggest mess.
Also, it's necessary to keep the old project alive, and (this is something that bit me once) to truly have the most hyper-perfect data upgrade/synchronization possible, to minimize downtime and have real data from the start... real but clean! This is how I created a 3-tier version, in Visual FoxPro + SQL Server, of a Fox/DOS app that was deployed in 2000+ places with non-technical people before the internet, successfully (bar the first couple of tries ;)).
It's coincidental that this should be on the front page of HN the night before my team and I release a big migration that has taken the best part of a year.
The codebase he describes is an eerily accurate representation of where we started, with the added complication that it was built by a sole developer who wasn't using any version control at all.
There are roughly 600k lines of Perl code in the back-end alone, but because there was a lot of duplication in lieu of version control, I have no way of knowing how much of this was actually in use. I suspect roughly 100-150k.
Our approach was pulling the platform apart into distinct (Ruby) services and putting HTTP interfaces around some of the legacy services where possible. We've ended up with < 15k of Ruby, including the front-ends. It's not perfect but there haven't been many major issues in our pilot release, and the team is happy. Fingers crossed.
While a lot of the commenters here seem somehow outraged at the computations, 5.5 man years seemed like a very aggressive schedule for rewriting a 1MLOC system, even if the end result were a 100KLOC one. My initial guess would have been 10 man years, and a cost in the millions.
So here you have what was probably a 150KLOC original system. How many man-years would you guess the rewrite took?
Three, plus a part-time contractor. One of the team is front-end so had little to do with replacing the Perl part.
The truth of it is, in our case it isn't (yet) a full rewrite, and there is still a lot of functionality tied up in the existing codebase. So it's not easy to answer the question about man years, but I would guess around 1.5.
The biggest wins for us in terms of lines of code were not the language (we did consider sticking with Perl), but re-assessing the business logic, ridding the codebase of legacy junk and using existing libraries instead of hand-rolled solutions.
However, I would mention that this doesn't match the alleged situation in the article: a million lines in heavy use.
Your app sounds like its real complexity was considerably less. Given that complexity grows at least as fast as code size, your success still might not mean that diving into the situation described in the article would be a good idea.
This blog post seems like a case of being stuck between a "second system syndrome" and a sunk cost fallacy.
It would seem like the obvious answer would be to not sit down and rewrite the whole thing from scratch, but start replacing pieces (with whatever language they think they'll be successful at).
The idea that they should somehow just be stuck forever with a shitty mess of a perl application seems incredibly defeatist.
Protecting sunk costs is only a fallacy if the money was entirely wasted.
If you spent money on something and that something isn't worth what you put into it but still is worth a lot, you want to protect that investment.
As far as incremental improvements go, rewriting some part in another language seems pretty bad. I mean, if you are being incremental, then you have to make changes that might be interrupted in the middle, and then you'd be saddling the system with two different languages.
You could just as easily rewrite the worst parts to conform to whatever existing or new standard you have. I'm no fan of Perl, but I'm pretty sure you could at least create a subset that conforms to standard object-oriented practices and not have the problem of a system written in two different languages.
I have a hard time considering this large application as an "investment". Presumably, it provides some business function, and has been hopefully generating more revenue in its lifetime than was spent creating it.
I don't think the software has any intrinsic value; it wasn't constructed from precious metals, which could be re-smelted and sold for scrap. If continuing to develop and maintain it is costing them money (either in real terms because of development costs, or in lost opportunity costs), then they should look at how that's trending. At what point does the thing cost more than it's worth to them? (Maybe it never does; I've had banking customers who spend millions of dollars a year keeping a thirty-year-old p.o.s. COBOL application running because it's the backbone of their operations.)
And I didn't mean to imply that they should rewrite parts of it in another language, just that they should start decomplecting pieces of it so that they can replace those pieces with better-designed ones. If they want to stick with Perl, because that's where their expertise is, then I think they should do that. I would agree that it's probably counterproductive to take on rebuilding the application while at the same time switching to a new language.
But unless the entire application is passing around internal Perl data structures, it seems crazy that they can't identify edges to the application functionality, and start to peel those edges away and encapsulate that functionality in a better way.
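The edge-peeling approach reads, in sketch form, like a routing shim: endpoints that have been rewritten get new handlers, and everything else falls through to the legacy app. A minimal illustration (all names hypothetical, in Python for brevity; in reality the legacy side would be a proxy, CGI call, or FFI rather than a stub):

```python
# Strangler-style routing shim: rewritten endpoints are served by new
# handlers; everything else falls through to the legacy application.

def legacy_app(path):
    # Stand-in for the old monolith (in practice: reverse proxy, CGI, ...).
    return f"legacy handled {path}"

def new_invoice_handler(path):
    return f"new code handled {path}"

# Grow this table one endpoint at a time as pieces are rewritten.
REWRITTEN = {
    "/invoices": new_invoice_handler,
}

def route(path):
    handler = REWRITTEN.get(path, legacy_app)
    return handler(path)
```

The point of the table is that each entry is an independent, reversible migration step: delete the entry and the legacy path is back in charge.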
You can write a million lines of code that do arbitrarily little, particularly if you're following poor development practices. I've seen a million-line codebase that could have been replaced with 10k lines of python. Depending on what functionality they actually need and how much they need the developers, Acme could well be making the right decision.
You know why you can do that? Because you have hindsight. You see exactly where the business went.
And this is the same damn trap all neophyte developers fall into. "Let's rewrite!"
Once that first wave of business requests and demands comes along, your precious sandcastle will crumble. Because the business team is fickle. And they stick you with deadlines. And then, mid-deadline, they change their mind. Or are forced to go a different direction because some shit government law is passed that requires you to broadcast your service requests with encrypted messages tied to pigeons (because, realistically, that's how the government does APIs).
Here on the internet where everything is made up and the points don't matter, you can get away with rewrites. Agile not working for you? Let me introduce you to the CADT model (http://www.jwz.org/doc/cadt.html).
All rewrites become tomorrow's bug-infested legacy ghettos.
You're usually correct. But a great developer is one who can spot the cases where the patterns don't apply. There are cases where the legacy code is so bad that it's better to throw it away, and there are rewrites that work out well (I've been part of one of them).
(Don't listen to JWZ, at least on business/strategy. His employers don't exactly have a great history of success on that front)
I think this is the key. "We have this big honking system that nobody really understands. We need a system to do X; please implement X" is way easier than the whole rewrite-a-million-lines problem.
What is also key is understanding that this is almost never the way that happens.
It goes more like this:
We have this big honking system that nobody really understands. We need a system that does X (where X is the list of features we think the other system does that are critical. X is subject to change as we discover other features to add and features that actually weren't necessary.) Please implement X this way.
And that can easily become an exercise in extreme frustration.
In my experience, the reason they will not even consider "Please implement X" as an option is because they see ANY additional effort as more expensive. They figure taking the existing code and producing a new system that EXACTLY duplicates the functionality will guarantee controlled costs.
They're wrong of course, but this is how it is done, at least in government contracting circles. I have literally seen developers required to get out a RULER and measure user interfaces as they appear on one screen so that they can exactly duplicate them on newer higher-dpi screens without changing the appearance or functionality whatsoever.
Everyone presumes the million-line system is one which could be implemented much more simply... there are some systems, however, that need those million lines. Rewriting one of those in a new language is a whole different level of nightmare. Especially when the requirements are 'make it work like the old system' and nothing else, and the requirements for the old system are 30 years old and not even close to portraying the system as it stands currently... But hey, to avoid 'being the next healthcare.gov', political types will push anything out the door and just shoot anyone who points out problems in the back.
Ah. And then a week later you hear that Janice in the accounting office in New Zealand depended on one of those features. That's when you learn there is a Janice in accounting. And that your company has an office in New Zealand.
There are times where a legacy technology has limitations that ultimately prevent progress and create maintenance nightmares. A good example is some of the older database technologies where a 1TB database machine could cost 100k (example: Sybase).
You could try to maintain a series of expensive databases, but between replication backups, dev boxes, team of DBAs etc suddenly your costs to keep the old technology are really high. And if these are databases storing expensive financial data, well, maybe the risk of hitting your 1TB limit is an expensive risk to have on your plate.
And it might take 10MM to migrate to a new technology, but in this case it would probably be worth a switch.
When technology is a commodity, and working poorly is still working, then maybe it's hard to justify a switch. But if your old technology starts to limit your performance and introduce risks or problems that detract from your competitive advantage, then you might not have a choice.
Minor point, but I don't like the citation of Netscape as a failed rewrite.
It's worth remembering that the Mozilla project ended up being a success in that it dealt the blow that finally dislodged IE from dominance. The original Netscape codebase could not do this because at the time IE4 was already way ahead in terms of CSS support and Netscape was hitting an architectural dead-end. Maybe they could have piled on some more hacks to get NS5 out quicker, but even if it had feature parity to IE5 they had to contend with Microsoft's bundling which was Netscape's real undoing.
Lots of absurd assumptions in here. Many of them were acknowledged, but the author seems to think that this `millions of lines of spaghetti code` with `little use of existing libraries` will be rewritten as millions of lines of spaghetti code with little use of existing libraries. The rewrite should decrease the workload to a fraction of what the author used for his fuzzy math.
Changing the language isn't a big deal. You can always go SOA. In fact, even if you stick with the same language and are doing a rewrite, SOA is probably the right answer.
If the original system is THAT bad, the language isn't the problem, it's the architecture, and you should probably refactor it into multiple components which could potentially be multiple different languages.
That's my thoughts. Get away from app and more into portal. What still befuddles programmers on large projects is that they still think from the whole front to the whole back and vice versa. MVC hasn't helped this one bit.
Is this train of thought meant to be specific to server-side apps? An example of a large rewrite was the classic Netscape browser engine (to Gecko), and I'm having trouble picturing how a browser engine could use a SOA.
SOA is just the general engineering pattern for providing services to anything that can consume them. The internet is fully realized if all endpoints can both produce and consume services.
The scary thing for me in ditching the old code base has been those undocumented corner cases. You generally find them by saying "We don't need that" to some piece of code and later find out it was a bug fix 5 years ago. It always seems worse with stored procedures when moving databases.
If the only spec is the old code base, then you are probably doomed.
I haven't worked on a big project without version control, and I don't see it as a substitute for an actual specification. I have also never seen patches explained in enough detail to substitute for a spec.
What really surprises me is that I got to the end and no one mentioned what seems obvious to me: if they are doing a rewrite (in any language) of a horrible code base they made, what reason do they have to believe this time will be different? Yes, we can assume some learning, hopefully some improvement, but as the saying goes, you can write FORTRAN in any language. Switching languages won't magically fix things. Training up your dev team and getting them to start refactoring, OTOH, is a much more interesting proposition.
"fired the dev team that had been working for one and a half years to develop a complicated project in part because an outsourcing company in India promised they could replicate it in two months" - when will they learn?
More appalling (or hilarious) is the footnote: "Except for an insurance company who decided to switch their accounting software from COBOL to C++. They gave their COBOL devs a two week training course in C++ and told 'em to rewrite the system. I don't need to tell you how that turned out."
I can't possibly imagine that the 1000000 lines of code couldn't be replaced piecemeal (in giant blocks). Then again, I've never seen 1000000 lines of code.
As some previous commenters have noted, there is a spectrum here between full-bore rewrite and don't-touch-a-thing. Parts of the code could be factored out, modularized, rewritten in another language, and all that without huge impacts on the existing functionality. Joel Spolsky wrote an article a long time ago about the value of software that has already been written and put through its paces (though I don't agree with /never/ rewriting, it should be done only as last resort):
"The idea that new code is better than old is patently absurd. Old code has been used. It has been tested. Lots of bugs have been found, and they've been fixed. "
There's value baked into that old code: lessons learned, bugs fixed, workarounds put into place. These things can be lost during a rewrite (and sure, sometimes you don't need them in the new version because the original problem has a better solution or doesn't happen in the new language), and the risk of losing them should be considered carefully.
This is a pretty terrible analysis. It completely ignores that things might, just MIGHT actually be built correctly this time. Ignores that the developers redoing the code might know the problem domain better. Ignores improvements in the quality of the development teams. Ignores far too much and then advocates staying with a bad solution out of fear that it might not go well.
I think this is a good, well reasoned writeup. One thing I'd take exception with is the idea that they have to rewrite a million lines of code.
> Very little use of existing libraries ("not invented here" syndrome)
The senior dev and PM could sit down over a few weeks and do an assessment of other languages with good library coverage for lots of the existing system, without writing a line of code.
I'd bet the project size would shrink considerably.
At an old startup I worked in, we had a legacy codebase of about 600k lines with 15 years of cruft in an ancient dialect of C++ with lots of not invented here syndrome.
By that point the system was so old and fragile that it simply had to be rewritten. The few libraries that we did use and didn't write were no longer supported, modern OSs wouldn't run the software correctly, vendors had simply gone out of business and so on.
A good 70% of the system functionality was rewritten in C# by just a couple guys part-time over the course of a year basically just gluing together existing libraries.
OK, I'll bite. Why the hell would anyone want to go back to Perl? That is one disgusting mashup of a language.
I'm horrified by it daily when I have to use scripts written by older bioinformaticians. The best benefits I've heard are string processing speed (<3 you Ruby) and package management (hello Python, Ruby, R).
Sometimes Perl works really well. For example, just the other day I wrote a tokenizer in Perl after attempting to do the same in other languages, doesn't it look pretty?
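Not the commenter's actual code, but to give a sense of why regex-heavy languages shine here, a table-driven tokenizer can be sketched in a few lines (Python for illustration; the token names are invented):

```python
import re

# Token spec as (name, pattern) pairs; order matters, most specific first.
# This mirrors the regex-centric style that makes Perl terse for this job.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens
```

The whole grammar lives in one data table, which is roughly the property that makes such tokenizers "look pretty" regardless of host language.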
What position? The position I see it in is a legacy language that used to run the internet and was the first language that really worked for bioinformatics. Is it different outside science?
Yes--people who have practical experience writing and maintaining code that has to be maintained tend to write maintainable code.
Scientists (and, in my experience, especially bioinformaticians) tend to make horrible, awful messes no matter how maintainable you think a language is. (You can hand them Inform 7 and it'll still end up looking like Fortran ate the csh manual and vomited all over an APL keyboard.)
Perl has always been designed from a get-things-done point of view, rather than adhering to a particular philosophy. It also has a huge number of well-maintained libraries available. These are both big plusses in some contexts.
(Not speaking for myself here, BTW. A decade ago I tried Python & never looked back.)
In your case, part of the problem may not be due so much to the language, as to the authors of those scripts you mention. People who have not studied software development as their primary discipline have typically not been exposed to ideas about good design, writing maintainable code, etc.
Yeah, I agree the programmers whose code I'm reading are likely skewing the sample. But, as anecdata, when I moved to my current lab I requested that we not code in Perl because people were writing unreadable, unmaintainable code. When we switched to Ruby en masse, people really pulled it out of the bag, and we now have really quite a nice codebase. I think the very strong community focus on standards in Ruby helped that along, as did the basic aesthetic of the language.
Yes, I know quite a few. In computational genomics in the south-east UK there's us (University of Cambridge Plant Sciences), Queen Mary University (Yannick Wurm's group), and The Sainsbury Lab at the John Innes Centre in Norwich (Dan Maclean's lab).
May need to expand. I saw several posts about shrinking, but sometimes the best way to go is to expand, then start replacing little parts. Say A, B, and C all feed into a magic box that produces X, Y, and Z respectively. Well, make two (or more) magic boxes, and rather than trying to write an (ABC) -> (XYZ) converter all at once, write an A-to-X converter, then a B-to-Y converter... Given wildly different languages, they may no longer belong in the same function anyway.
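That decomposition can be sketched with hypothetical converters (the transformations here are placeholders): the monolithic magic box becomes a composition of small pieces, each replaceable on its own schedule.

```python
# Hypothetical per-stream converters replacing one monolithic
# (A, B, C) -> (X, Y, Z) transformation. Each can be rewritten,
# tested, and deployed independently of the others.

def a_to_x(a):
    return a.upper()   # placeholder transformation

def b_to_y(b):
    return b[::-1]     # placeholder transformation

def c_to_z(c):
    return len(c)      # placeholder transformation

def convert_all(a, b, c):
    # The old "magic box" is now just composition of small converters.
    return a_to_x(a), b_to_y(b), c_to_z(c)
```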
The only way to pull off something like this is to do it slowly, carefully, and in pieces. You aren't going to rewrite a system that large all in one go.
Of course, rewriting something just so you can say it's in a different language is silly anyway. Whoever set that goal is being overly simplistic. They need to step back and re-examine their actual needs and real problems.
PayPal's Node.js experiences are totally different from what the blog is describing. Besides, the "number of lines" calculation is useless, and even though the blog says so, it still runs with it. Also, a Perl blog post is not a very neutral source on the cost of rewriting Perl into something else.
Very misleading post. It's obviously not that expensive to start slowly replacing pieces and have some people work on new features isolated from the main code base.
But I understand Ovid's frustration with so many people successfully switching from Perl to things like Go and being happy about it in their blogs ;)
We're still in the dark ages when it comes to software development, and I don't believe it's because we're using Java instead of Haskell.
I believe we need much more powerful tools that help us in understanding large code bases. Tools that can help us visualize what's going on. Tools that can do testing for us. Tools that can rewrite code for us (think Resharper or other Jetbrains refactoring tools), but an order of magnitude better.
Why don't you think Haskell will help with this? If Java can't prevent a NullPointerException I don't see how static analysis can take the tooling where you want it to go.
It's zero-sum if the tooling around Haskell is antiquated compared to Java's. But the idea that a language is going to make us "that" much more productive has to go. We need much, much better tools.
> That's roughly 5.5 person years of effort to rewrite the code base, but that assumes you're working seven days a week, 365 days a year. In reality US workers typically work roughly 2,000 hours per year, or about 250 days out of the year. That means it would take eight person years of effort to replicate the above code base (over ten years at average working hours here in France).
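For what it's worth, the quoted back-of-the-envelope arithmetic does check out:

```python
# Reproducing the quoted figures: 5.5 person-years of seven-day weeks,
# rescaled to a US working year of roughly 250 days.
calendar_person_years = 5.5                  # assumes 365 working days/year
days_of_effort = calendar_person_years * 365
us_person_years = days_of_effort / 250       # ~250 US working days/year

print(round(us_person_years, 1))  # about 8.0
```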
If he's from France, why not just stick to talking about the French working hours? Or just one of the two countries. Bringing up two countries like that was weird to read, but maybe that's just me.