
Ditching a Language - chromatic
http://blogs.perl.org/users/ovid/2014/01/ditching-a-language.html
======
mamcx
A rewrite is very hard, especially from an ancient spaghetti monster. Whether
you keep it in the same language is not the biggest issue, IMHO, unless we are
talking about a key library not available elsewhere or a niche where that
language is just the BEST.

I have done several: changing languages, database engines, architecture,
styles... even doing the same thing several times in the same project!

And each time, I see a lot of code reduction, especially if I can change the
language! In one of them, I shrank a bad C# project by the buttload by moving
it to Python. I mean, from close to 1000 files to a few dozen. Yep, if we are
talking about _spaghetti_, then it can compress that well ;)

In fact, I think changing language (or moving to the _most_ recent versions
with the most modern libraries/dependencies possible) is the SIMPLEST way to
reduce the load of the job.

I do this all the time. Each time Obj-C gives me some new trick that cuts
code, I apply it as fast as possible across my whole codebase. I learned to do
that after the most insane upgrade/rewrite from .NET 1 to 1.1 and then 2.0,
which killed us because the boss waited too long.

The BEST way is obviously not to bring along (again) the same mistakes that
created the monster in the first place. THAT is what makes the task
hard/impossible in the average corporation, because it is the cultural
problems that cause the biggest mess.

Also, it is necessary to keep the old project alive, and (this is something
that bit me once) to truly have the most hyper-perfect data
upgrade/synchronization possible, to minimize downtime and have real data from
the start... real but clean! This way I created a 3-tier version in Visual
FoxPro + SQL Server of a Fox/DOS app that was deployed in 2000+ places with
non-tech people before the internet, successfully (bar the first couple of
tries ;)).

------
anarchitect
It's coincidental that this should be on the front page of HN the night before
my team and I release a big migration that has taken the best part of a year.

The codebase he describes is an eerily accurate representation of where we
started, with the added complication that it was built by a sole developer
who wasn't using any version control at all.

There are roughly 600k lines of Perl code in the back-end alone, but because
there was a lot of duplication in lieu of version control, I have no way of
knowing how much of this was actually in use. I suspect roughly 100-150k.

Our approach was pulling the platform apart into distinct (Ruby) services and
putting HTTP interfaces around some of the legacy services where possible.
We've ended up with < 15k of Ruby, including the front-ends. It's not perfect
but there haven't been many major issues in our pilot release, and the team is
happy. Fingers crossed.
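
A minimal sketch of the "HTTP interface around a legacy service" idea, in
Python rather than the Ruby the team actually used. Everything here is
hypothetical: `legacy_lookup` stands in for a call into the old code, and the
`/orders/<id>` route is made up for illustration. It only shows the shape of
such a wrapper, not how this particular migration was built.

```python
import json
from wsgiref.simple_server import make_server


def legacy_lookup(order_id):
    """Hypothetical stand-in for a call into the legacy code base."""
    return {"order_id": order_id, "status": "shipped"}


def app(environ, start_response):
    """WSGI app exposing the legacy call as GET /orders/<id> (made-up route)."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if len(parts) == 2 and parts[0] == "orders":
        status = "200 OK"
        body = json.dumps(legacy_lookup(parts[1])).encode("utf-8")
    else:
        status = "404 Not Found"
        body = json.dumps({"error": "not found"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve(port=8000):
    """Run the wrapper locally; the port is an arbitrary choice."""
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

New services then talk to the HTTP endpoint and never import the legacy code
directly, so the implementation behind `legacy_lookup` can be replaced later
without touching the callers.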

~~~
jsnell
The obvious question is: how large was your team?

While a lot of the commenters here seem somehow outraged at the computations,
5.5 man years seemed like a very aggressive schedule for rewriting a 1MLOC
system, even if the end result were a 100KLOC one. My initial guess would have
been 10 man years, and a cost in the millions.

So here you have what was probably a 150KLOC original system. How many man-
years would you guess the rewrite took?

~~~
anarchitect
Three, plus a part-time contractor. One of the team is front-end so had little
to do with replacing the Perl part.

The truth of it is, in our case it isn't (yet) a full rewrite, and there is
still a lot of functionality tied up in the existing codebase. So it's not
easy to answer the question about man years, but I would guess around 1.5.

The biggest wins for us in terms of lines of code were not the language (we
did consider sticking with Perl), but re-assessing the business logic, ridding
the codebase of legacy junk and using existing libraries instead of hand-
rolled solutions.

~~~
bluej4ack
The way you did it seems to be the most logical (and obvious) approach, which
the article completely neglects to mention.

------
m0nastic
This blog post seems like a case of being stuck between a "second system
syndrome" and a sunk cost fallacy.

It would seem like the obvious answer would be to not sit down and rewrite the
whole thing from scratch, but start replacing pieces (with whatever language
they think they'll be successful at).

The idea that they should somehow just be stuck forever with a shitty mess of
a perl application seems incredibly defeatist.

~~~
joe_the_user
Protecting sunk costs is only a fallacy if the money was entirely wasted.

If you spent money on something and that something isn't worth what you put
into it but still is worth a lot, you want to protect that investment.

As far as incremental improvements go, rewriting some part in another language
seems pretty bad. I mean, if you are being incremental, then you have to make
changes that might be interrupted in the middle and then you'd be saddling the
system with two different languages.

You could just as easily rewrite the worst parts to conform to whatever
existing or new standard you have. I'm not a fan of Perl but I'm pretty sure
you could at least create a subset that would conform to standard
object-oriented practices and not have the problem of now having a system
written in two different languages.

~~~
m0nastic
I have a hard time considering this large application as an "investment".
Presumably, it provides some business function, and has been hopefully
generating more revenue in its lifetime than was spent creating it.

I don't think the software has any intrinsic value; it wasn't constructed from
precious metals, which could be re-smelted and sold for scrap. If continuing
to develop it/maintain it is costing them money (either in real terms because
of development costs, or by lost opportunity costs), then they should look at
how that's trending. At what point does the thing cost more than it's worth to
them? (Maybe it never does, I've had banking customers who spend millions of
dollars a year keeping a thirty year old, p.o.s. Cobol application running
because it's the backbone of their operations).

And I didn't mean to imply that they should rewrite parts of it in another
language, just that they should start decomplecting pieces of it so that they
can replace those pieces with better-designed ones. If they want to stick with
Perl, because that's what their expertise is, then I think they should do
that. I would agree that it's probably counterproductive to take on both
rebuilding the application while at the same time switching to a new language.

But unless the entire application is passing around internal Perl data
structures, it seems crazy that they can't identify edges to the application
functionality, and start to peel those edges away and encapsulate that
functionality in a better way.

------
lmm
You can write a million lines of code that do arbitrarily little, particularly
if you're following poor development practices. I've seen a million-line
codebase that could have been replaced with 10k lines of python. Depending on
what functionality they actually need and how much they need the developers,
Acme could well be making the right decision.

~~~
jfoutz
I think this is the key. I have this big honking system that nobody really
understands. We need a system to do X. "Please implement X" is way easier than
the whole "rewrite a million lines" problem.

~~~
zaphar
What is also key is understanding that this is almost never the way that
happens.

It goes more like this:

We have this big honking system that nobody really understands. We need a
system that does X (where X is the list of features we _think_ the other
system does that are critical. X is subject to change as we discover other
features to add and features that actually weren't necessary.) Please
implement X this way.

And that can easily become an exercise in extreme frustration.

------
guard-of-terra
Whoa I once got a 150kLOC code base in my possession and got it to 37kLOC in
maybe two years.

I cut a lot of stupid boilerplate and was willing to cut features nobody
needed. I added a lot of new, good features (and some not so good, too).

It was still not perfect, these days I would be much more fierce at shaking
old stuff.

Still, you just delete and fix and delete and fix. That's how you make a turbo
plasma rifle out of a Singer sewing machine.

~~~
aryastark
> was willing to cut features nobody needed

Ah. And then a week later you hear that Janice in the accounting office in New
Zealand depended on one of those features. That's when you learn there is a
Janice in accounting. And that your company has an office in New Zealand.

~~~
guard-of-terra
Well, then you fix it. It's better to try than to keep coding around code of
unknown origin that is likely defunct.

------
squigs25
So... what is the solution?

There are times where a legacy technology has limitations that ultimately
prevent progress and create maintenance nightmares. A good example is some of
the older database technologies where a 1TB database machine could cost 100k
(example: Sybase).

You could try to maintain a series of expensive databases, but between
replication backups, dev boxes, a team of DBAs, etc., suddenly your costs to
keep the old technology are really high. And if these are databases storing
expensive financial data, well, maybe the risk of hitting your 1TB limit is an
expensive risk to have on your plate.

And it might take 10MM to migrate to a new technology, but in this case it
would probably be worth a switch.

When technology is a commodity, and working poorly is still working, then
maybe it's hard to justify a switch. But if your old technology starts to
limit your performance and introduce risks or problems that detract from your
competitive advantage, then you might not have a choice.

------
dasil003
Minor point, but I don't like the citation of Netscape as a failed rewrite.

It's worth remembering that the Mozilla project ended up being a success in
that it dealt the blow that finally dislodged IE from dominance. The original
Netscape codebase could not do this because at the time IE4 was already way
ahead in terms of CSS support and Netscape was hitting an architectural dead-
end. Maybe they could have piled on some more hacks to get NS5 out quicker,
but even if it had feature parity to IE5 they had to contend with Microsoft's
bundling which was Netscape's real undoing.

------
Arnor
TL;DR Straw men and fuzzy math.

Lots of absurd assumptions in here. Many of them were acknowledged, but the
author seems to think that this `millions of lines of spaghetti code` with
`little use of existing libraries` will be rewritten as millions of lines of
spaghetti code with little use of existing libraries. The rewrite should
decrease the workload to a fraction of what the author used for his fuzzy
math.

~~~
baldfat
You are not allowed to dissect an article and make fuzzy math and straw man
claims with TL;DR!!!!!

~~~
Arnor
You're right. That was snide and disingenuous. Unfortunately, I've lost my
edit link or I'd remove it. Sorry.

------
andrewvc
Changing the language isn't a big deal. You can always go SOA. In fact, even
if you stick with the same language and are doing a rewrite, SOA is probably
the right answer.

If the original system is THAT bad, the language isn't the problem, it's the
architecture, and you should probably refactor it into multiple components
which could potentially be multiple different languages.

~~~
jksmith
That's my thoughts. Get away from app and more into portal. What still
befuddles programmers on large projects is that they still think from the
whole front to the whole back and vice versa. MVC hasn't helped this one bit.

~~~
millstone
Is this train of thought meant to be specific to server-side apps? An example
of a large rewrite was the classic Netscape browser engine (to Gecko), and I'm
having trouble picturing how a browser engine could use a SOA.

~~~
jksmith
That's an interesting example because the existence of a thick client allows
for SOA on everything else. The browser is the last thick application.

------
protomyth
The scary thing for me in ditching the old code base has been those
undocumented corner cases. You generally find them by saying "We don't need
that" to some piece of code and later find out it was a bug fix 5 years ago.
It always seems worse with stored procedures when moving databases.

If the only spec is the old code base, then you are probably doomed.
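
One hedged way to turn "the old code base is the only spec" into something
checkable is a characterization test: run the old and new implementations on
the same inputs and flag every disagreement. The sketch below is illustrative
only; `legacy_price`, `new_price`, and the bulk-discount corner case are
invented, not taken from any system discussed here.

```python
def legacy_price(quantity, unit_price):
    """Hypothetical legacy routine hiding an undocumented corner case."""
    total = quantity * unit_price
    if quantity >= 100:
        # The kind of five-year-old "bug fix" nobody wrote down.
        total *= 0.9
    return round(total, 2)


def new_price(quantity, unit_price):
    """Hypothetical rewrite that dropped the branch as "not needed"."""
    return round(quantity * unit_price, 2)


def find_mismatches(old, new, cases):
    """Return (args, old_result, new_result) for every input where they differ."""
    mismatches = []
    for args in cases:
        expected = old(*args)
        actual = new(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Inputs would ideally come from recorded production traffic; these are made up.
CASES = [(1, 9.99), (10, 2.5), (100, 1.0), (250, 4.0)]

for args, expected, actual in find_mismatches(legacy_price, new_price, CASES):
    print(f"{args}: legacy={expected} new={actual}")
```

This does not replace a real specification, and it only catches corner cases
that the chosen inputs happen to exercise.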

~~~
bluej4ack
version control would help with that

~~~
protomyth
I haven't worked on a big project without version control, and I don't see it
as a substitute for an actual specification. I have also not seen patches
explained in enough detail to substitute for a spec.

------
npsimons
What really surprises me is that I got to the end and no one mentioned what
seems obvious to me: if they are doing a rewrite (in any language) of a
horrible code base _they_ made, what reason do they have to believe this time
will be different? Yes, we can assume some learning, hopefully some
improvement, but as the saying goes, you can write FORTRAN in any language.
Switching languages won't magically fix things. Training up your dev team and
getting them to start refactoring, OTOH, is a much more interesting
proposition.

------
_random_
"fired the dev team that had been working for one and a half years to develop
a complicated project in part because an outsourcing company in India promised
they could replicate it in two months" - when will they learn?

~~~
reinhardt
More appalling (or hilarious) is the footnote: "Except for an insurance
company who decided to switch their accounting software from COBOL to C++.
They gave their COBOL devs a two week training course in C++ and told 'em to
rewrite the system. I don't need to tell you how that turned out."

------
rtpg
I can't possibly imagine that the 1000000 lines of code couldn't be replaced
piecemeal (in giant blocks). Then again, I've never seen 1000000 lines of
code.

------
digisth
As some previous commenters have noted, there is a spectrum here between full-
bore rewrite and don't-touch-a-thing. Parts of the code could be factored out,
modularized, rewritten in another language, and all that without huge impacts
on the existing functionality. Joel Spolsky wrote an article a long time ago
about the value of software that has already been written and put through its
paces (though I don't agree with /never/ rewriting, it should be done only as
a last resort):

http://www.joelonsoftware.com/articles/fog0000000069.html

The most important part:

"The idea that new code is better than old is patently absurd. Old code has
been used. It has been tested. Lots of bugs have been found, and they've been
fixed. "

There's value baked into that old code: lessons learned, bugs fixed,
workarounds put into place. These things can be lost during a rewrite (and
sure, there are other times you don't need them in the new version because the
original problem has a better solution/doesn't happen in the new
language/whatever), and potentially losing/missing them should be considered
carefully.

------
jaegerpicker
This is a pretty terrible analysis. It completely ignores that things might,
just MIGHT actually be built correctly this time. Ignores that the developers
redoing the code might know the problem domain better. Ignores improvements in
the quality of the development teams. Ignores far too much and then advocates
staying with a bad solution out of fear that it might not go well.

------
bane
I think this is a good, well reasoned writeup. One thing I'd take exception
with is the idea that they have to rewrite a million lines of code.

> Very little use of existing libraries ("not invented here" syndrome)

The senior dev and PM could sit down over a few weeks and do an assessment of
other languages that had good library coverage for lots of the existing system
without writing a line of code.

I'd bet the project size would shrink considerably.

At an old startup I worked in, we had a legacy codebase of about 600k lines
with 15 years of cruft in an ancient dialect of C++ with lots of not invented
here syndrome.

By that point the system was so old and fragile that it simply _had_ to be
rewritten. The few libraries that we did use and didn't write were no longer
supported, modern OSs wouldn't run the software correctly, vendors had simply
gone out of business and so on.

A good 70% of the system functionality was rewritten in C# by just a couple
guys part-time over the course of a year basically just gluing together
existing libraries.

------
scott_meyer
Can you name any widely used piece of software that has not been rewritten at
least once?

Rewrites: Firefox, IE, Word, Windows, MacOS, ...

Many of these have been rewritten multiple times and they are all orders of
magnitude more complex than some random million-line hunk of perl.

Rewriting is hard and may absolutely fail or just take a long time, but
failure to rewrite is pretty much guaranteed to fail.

------
Blahah
OK, I'll bite. Why the hell would _anyone_ want to go _back_ to Perl? That is
one disgusting mashup of a language.

I'm horrified by it daily when I have to use scripts written by older
bioinformaticians. The best benefits I've heard are string processing speed
(<3 you Ruby) and package management (hello Python, Ruby, R).

~~~
zzzcpan
Sometimes Perl works really well. For example, just the other day I wrote a
tokenizer in Perl after attempting to do the same in other languages. Doesn't
it look pretty?

    
    
      sub tokenize {
        while ($_[0] =~ m!
          (?<whitespace>  [\x20\x09]+                   ) |
          (?<lf>          [\x0a]                        ) |
          (?<cr>          [\x0d]                        ) |
          (?<ident>       [A-Za-z_]+[A-Za-z0-9_]*       ) |
          (?<float>       [0-9]*\.[0-9]+                ) |
          (?<float>       [0-9]+\.[0-9]*                ) |
          (?<int>         [0-9]+                        ) |
          # ...
          (?<unknown>     .                             )
        !gsx) {
          my ($k, $v) = each %+;
          # $k: token, $v: data
          # pos($_[0]): current offset
          # ...
        }
      }

~~~
Blahah
nice... it looks _quite_ pretty, but would be much prettier (but very similar)
in Ruby :P

------
VLM
May need to expand. Saw several posts about shrinking, but sometimes the best
way to go is to expand and then start replacing little parts. So A, B, and C
all feed into a magic box that produces X, Y, and Z respectively. Well, make
two (or more) magic boxes, and rather than trying to write an (ABC) -> (XYZ)
converter all at once, write an A-to-X converter, then a B-to-Y converter...
Given wildly different languages, they may no longer belong in the same
function anyway...
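
A rough sketch of that "expand, then replace little parts" idea: keep the old
magic box behind a small dispatcher and peel off one input kind at a time. All
names below (`legacy_magic_box`, `convert_a_to_x`, `NEW_CONVERTERS`) are
hypothetical and only illustrate the routing.

```python
def legacy_magic_box(kind, payload):
    """Hypothetical stand-in for the old box that handles A, B, and C at once."""
    return f"legacy:{kind}:{payload}"


def convert_a_to_x(payload):
    """Hypothetical new converter covering only the A -> X path."""
    return f"X({payload})"


# Registry of input kinds that already have a replacement.
# B -> Y and C -> Z get added here as they are rewritten.
NEW_CONVERTERS = {"A": convert_a_to_x}


def process(kind, payload):
    """Use the new converter when one exists, otherwise fall back to the old box."""
    converter = NEW_CONVERTERS.get(kind)
    if converter is not None:
        return converter(payload)
    return legacy_magic_box(kind, payload)


print(process("A", "1"))
print(process("B", "2"))
```

Once every kind has an entry in the registry, the legacy fallback has no
callers left and can be deleted.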

------
skywhopper
The only way to pull off something like this is to do it slowly, carefully,
and in pieces. You aren't going to rewrite a system that large all in one go.

Of course, rewriting something just so you can say it's in a different
language is silly anyway. Whoever set that goal is being overly simplistic.
They need to step back and re-examine their _actual_ needs and _real_
problems.

~~~
vampirechicken
A special pig like that, you don't eat all at once.

------
nawitus
PayPal's Node.js experiences are totally different than what the blog is
describing. Besides, the "number of lines" calculation is useless, and even
though the blog says so, it still runs with it. Besides, a Perl blog post is
not a very neutral source on the cost of rewriting Perl to something else.

------
zzzcpan
Very misleading post. It's obviously not that expensive to start slowly
replacing pieces and have some people work on new features isolated from the
main code base.

But I understand Ovid's frustration with so many people successfully switching
from Perl to things like Go and being happy about it in their blogs ;)

------
seivan
Could this be booking.com?

------
mortyseinfeld
We're still in the dark ages when it comes to software development, and I
don't believe it's because we're using Java instead of Haskell.

I believe we need much more powerful tools that help us in understanding large
code bases. Tools that can help us visualize what's going on. Tools that can
do testing for us. Tools that can rewrite code for us (think ReSharper or
other JetBrains refactoring tools), but an order of magnitude better.

~~~
dasil003
Why don't you think Haskell will help with this? If Java can't prevent a
NullPointerException I don't see how static analysis can take the tooling
where you want it to go.

~~~
mortyseinfeld
It's zero-sum if the tooling around Haskell is antiquated compared to Java's.
But the idea that a language is going to make us "that" much more productive
has to go. We need much, much better tools.

------
Dewie
> That's roughly 5.5 person years of effort to rewrite the code
> base, but that assumes you're working seven days a week, 365 days a year. In
> reality US workers typically work roughly 2,000 hours per year, or about 250
> days out of the year. That means it would take eight person years of effort
> to replicate the above code base (over ten years for the average hourly
> hours here in France).

If he's from France, why not just stick to talking about the French working
hours? Or just _one_ of the two countries. Bringing up two countries like that
was weird to read, but maybe that's just me.
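
For what it's worth, the US figure in the quote can be re-derived from its own
numbers. The sketch below assumes the 5.5 person years were counted as 365
working days each, as the quote says; the function name is made up. The quote
gives no French figure, so the last line only computes the break-even point
implied by "over ten years".

```python
def person_years(nonstop_years, working_days_per_year):
    """Convert effort counted in 365-day years into years of real working days."""
    total_days = nonstop_years * 365
    return total_days / working_days_per_year


# US case from the quote: about 250 working days per year.
us_estimate = person_years(5.5, 250)
print(round(us_estimate, 2))

# "Over ten years" is only true below this many working days per year.
break_even_days = 5.5 * 365 / 10
print(break_even_days)
```

That gives about 8.03 person years for the US case, matching the "eight person
years" in the quote, and "over ten years" holds only below roughly 200 working
days a year.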

