
Why don't you just rewrite it in X? - buovjaga
http://nibblestew.blogspot.com/2017/04/why-dont-you-just-rewrite-it-in-x.html
======
orf
Do you really want to convert the lines one-to-one? This seems pretty
wasteful. Surely a better way is to just do step 3, drawing inspiration (and
copying where appropriate) from the original code based on the _features_
rather than the structure of the original code.

For example gtk-doc generates documentation from comments in c code. I'm
guessing there is some perl code to parse c code, which i would substitute for
the great 'pycparser' library. With this method you would copy the existing
code, fix it, then throw it away.

The logic to generate HTML from the comments is likely a lot of code. Sure,
you could copy whatever method the current perl code does... Or you could
substitute a lot of it for Jinja2.

Perhaps these examples are incorrect in the context of this specific project,
but i don't see the point in copying the lines one for one. Copy the _meaning_
of the code into idiomatic python from the get go, and test the output against
the known good perl code. I doubt perl code copied one for one is ever going
to be idiomatic python code, so why bother, especially if it takes so long?

Edit: Yeah, tests are good. No tests are bad. Having a complete understanding
of the code doesn't require translating it line-for-line. Re-writing a project
in a different language is a breaking change, but in the context of this
project your initial tests could be (for every bit of C code you can possibly
find that has gtk-doc comments):

    
    
       > gtk-doc.py > a
       > gtk-doc.pl > b
       > diff a b
    

After you can find no differences, great, release it as an alpha or beta.
People can then feed back any internal, weird quirky behavior they depend on.
Rinse and repeat.
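The diff loop above can be automated. Below is a minimal sketch, assuming both tools take a source file as their last argument and write the generated documentation to stdout; that calling convention, and the helper names `run_tool` and `compare_outputs`, are assumptions for illustration and not how gtk-doc is actually invoked.

```python
import subprocess


def run_tool(command, source_file):
    """Run one tool on a source file and return whatever it wrote to stdout."""
    result = subprocess.run(
        [*command, str(source_file)],
        capture_output=True,
        text=True,
        check=False,  # a crash shows up as differing output instead of an exception
    )
    return result.stdout


def compare_outputs(old_command, new_command, source_files):
    """Return the source files for which the two tools produce different output."""
    mismatches = []
    for source_file in source_files:
        old_output = run_tool(old_command, source_file)
        new_output = run_tool(new_command, source_file)
        if old_output != new_output:
            mismatches.append(source_file)
    return mismatches
```

Something like `compare_outputs(["perl", "gtk-doc.pl"], ["python3", "gtk-doc.py"], files)` (hypothetical command lines) would then list every input where the port still disagrees with the known good Perl output.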

My point is spending 1000000000 hours hand-converting perl into python then
again re-writing it into idiomatic python so you can catch any theoretical
small perl-specific edge case in what is effectively a complete rewrite and a
major version change is IMO a bit pointless, especially when converting
Frankenstein python-perl into idiomatic Python may result in deleting large
parts of the said slow-to-convert code.

~~~
bsder
It is Perl code to parse X using regular expressions.

This is _always_ the worst kind of Perl code imaginable to try to tinker with.

It will be fiddly. It will have exceptional cases piled upon exceptional
cases. It will have hideous numbers of undocumented corners.

All because it isn't a real, _actual_ , fscking parser.

Because we don't really need a parser for this, right now, do we, really?

Yeah, this is one of those cases where "Do the simplest thing" breaks down
horribly and very rarely does someone have the wherewithal to apply 2x4
cluebats to the people who _persist_ in writing parsers with regular
expressions.

Yes, I have done so (in Python actually). However, my code has a little
"counter" in the comments along the lines of "You have debugged a regular
expression in this code: <n> times. Use a real parser you idiot."

It is depressing how large I have let that number get before I actually pull
my head out of my ass.

~~~
zzzcpan
Yeah, proper parsers are actually super easy once you are comfortable with
them and take much much less time than a pile of regexps. Like higher-order
parser combinators. You can implement them for most languages in no time and
then just construct your parser from a few basic higher-order primitives:
regexp tokens, alternations, sequences and repetitions.

Getting there though takes a bit of practice, from recursive descent parsers
to parser generators.
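For anyone who hasn't seen the idea, here is a minimal sketch of those four primitives in Python. The names (`token`, `sequence`, `alternation`, `repetition`) and the convention that a parser returns `(value, new_position)` or `None` are my own choices for illustration, not any particular library's API.

```python
import re


def token(pattern):
    """Match a regular expression at the current position, skipping leading whitespace."""
    compiled = re.compile(r"\s*(" + pattern + r")")

    def parse(text, pos):
        match = compiled.match(text, pos)
        if match is None:
            return None
        return match.group(1), match.end()

    return parse


def sequence(*parsers):
    """Succeed only if every parser matches, one after another."""
    def parse(text, pos):
        values = []
        for parser in parsers:
            result = parser(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def alternation(*parsers):
    """Return the result of the first parser that matches."""
    def parse(text, pos):
        for parser in parsers:
            result = parser(text, pos)
            if result is not None:
                return result
        return None

    return parse


def repetition(parser):
    """Match the parser zero or more times."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            # Stop on failure, or on a zero-width match that would loop forever.
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)

    return parse


# Grammar: item ("," item)*  where an item is an identifier or a number.
item = alternation(token(r"[A-Za-z_]\w*"), token(r"\d+"))
item_list = sequence(item, repetition(sequence(token(","), item)))
```

Calling `item_list("foo, 42, bar", 0)` returns the nested match values plus the position where parsing stopped, and an input like `", foo"` gives `None` instead of a half-matched regexp.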

~~~
weaksauce
are you handwriting your parsers or using some kind of parser generator
typically?

~~~
girvo
I usually just pull a Parsing Expression Grammar library in.

------
WalterBright
> Manually converting code from one format to another is the most boring,
> draining and soul-crushing work you can imagine.

It's not that bad. I've converted many programs from one language to another.
For example, in the 1980s I converted the FutureNet DASH schematic editor from
16 bit x86 assembler to C (so it could be ported to other platforms). I
converted my game Empire from BASIC to Fortran to PDP11 assembler to C. I've
converted parts of Optlink from assembler to C. I've converted a lot of the
DMD compiler from C++ to D.

The conversions I've done have all produced valuable results.

It's actually rather enjoyable work, but I like obsessive detail work.

~~~
majewsky
There's a certain breed of developer (me included, and probably also Walter
above) who really like refactoring and porting tasks which others would
consider soul-crushing.

When the codebase has evolved for a really long time, and you can see the
different coding styles of different generations of developers, you get to
feel like an archeologist or a geologist who's digging through multiple layers
of rock.

~~~
tofflos
I often use the archeology analogy to describe my work to others. I like how
you become familiar with your predecessors and their strategies for solving
problems. It's a way of getting to know them even if you never actually get a
chance to meet them.

Some of the best parts:

1. Trying something totally new only to realize that one of your predecessors
has attempted to do it before. Bonus points if you can solve it this time
around.

2. Digging around and realizing that your predecessor was in fact yourself.

3. Randomly encountering one of your predecessors in real life and becoming
instant friends while reminiscing about old times. ;-)

------
siliconc0w
Applications can accumulate 'business cruft' as much as they do code cruft.
Things were built for one reason, then were hacked to do something different.
I.e:

"Note that GTK-Doc wasn't originally intended to be a general-purpose
documentation tool, so it can be a bit awkward to setup and use."

So pure 'rewriting' is not usually what you want. You want the same
functionality but in an easier to use and maintain package. The approach I
usually take is finding a well supported project that is already solving the
same problem and extending it to solve mine.

In this case, a documentation generator seems a suspect thing to rewrite as
there are already many of those out there. I'd look at extending Doxygen to
support GTK-Doc's syntax (or automating a GTK-Doc to Doxygen translator).

Anyway this is an approach i've used with some success before. It can't always
work, sometimes the problem you're solving is fairly unique or sometimes the
available open source projects don't have much better code bases than the one
you're trying to replace.

As far as "Chesterton's Fences" I say tear 'em down (but instrument and be
prepared to rollback). Sometimes the only way to tell why something exists is
to remove it. This is effectively just paying the price of technical debt - a
cost of doing business. The higher cost is to live in a world littered with
rusting fences.

------
makecheck
Rewrites lead to all those wonderful feature-removing and compatibility-
breaking scenarios that you see in “new versions” of software. They can really
suck for consumers and that is reason alone to be much more cautious.

On the other hand, you can’t expect to keep your entire software world
standing still. At some point, the operating system or even the _hardware_
will make it really hard to keep an old code base going, and you’ll have to do
very creative things just to keep it all working. And no, you typically don’t
have the freedom to force people to use some ancient system for your benefit;
you’re not in a vacuum, your users are doing lots of other things and their
_other_ apps are going to keep moving even if you don’t.

You will reach a point of real regret if you don’t spend at least some time to
move code to more modern concepts/languages/libraries. It doesn’t have to be
100% at once, nor does it have to be an outright replacement; leverage multi-
language bindings, testing frameworks, etc. and beta users to make progress.
And whatever you do, never release a “MyApp 5” that is a completely different
code base than “MyApp 4”; this just aggravates people when nothing works quite
right. You need MyApp 4.1, 4.2, 4.3, 4.4, etc.

------
danieldk
_The cURL project consists of roughly 100 thousand lines of C code according
to Ohloh._

You do not have to when you choose a more modern safer language that can
export functions with C linkage and can trivially call C functions. You decide
to use a new language and write new functionality or functions that have to be
rewritten anyway.

For instance, gcc first switched to g++ as the default compiler and started
allowing a subset of C++. Firefox started using Rust here and there where/when
it makes sense. There are countless other examples.

I think very few folks would actually propose converting a 100KLOC or an MLOC
project to another language. (Though Go did it with their runtime and
compiler :), though that's quite a special case).

~~~
vvanders
Yup, I'm in the middle of porting something over to Rust that's C based and
this works _really well_. You can stub in the Rust parts piece-by-piece via
FFI and incrementally bring up the port testing along the way instead of a
big-bang integration at the end.

------
peterwwillis

                  Should I rewrite X in Y?
                     /               \
                    /                 \
                   /                   \
                  |                     |
                  |                     |
      Am I doing this just        Is my team full of
      for the features in Y?        experts in Y?
                  |                       |
                 Yes                  ___/ \___
                   \                 |         |
                    \               No        Yes
                     \               |         |__________
                      \              |                    |
                       \         Are there any experts    |
                        \          on my team in Y?       |
                         \           |           |        |
                          \          No          |        |
                           \         |          Yes       |
                            \        |     _____/         |
                             \       |     |              |
                              \      |  Did they          |
                               \     |  propose it?       |
                                \    |     |      \       |
                                 \   |    Yes      |      |
                                  \  |     |       No     |
                                 Don't rewrite.    |      |
                                     |             |      |
                                     |           Were you going to
                                     |           rewrite it anyway?
                                     |              No       |
                                     |______________|       Yes
                                                             |
                                                              \
                                                               \
                                                                \
                                                                 \
                                                                  \
                                                                   \
                                                                    \
                                                                     \
                                                                      \
                                                                       \
                                                                        |
                                                                 Think about it.

------
ams6110
Although LOC metrics are always questionable, converting code from one
language to another at 100 lines/hr seems absurdly optimistic to me.

~~~
TheCoelacanth
Definitely seems completely unrealistic for any conversion involving idiomatic
Perl. I could see it being possible for Java to C# or vice versa.

------
hawski
A nice example of rewrite in X is the rewrite of 0install in OCaml from Python
[0]. A very systematic approach, replacing code gradually. There are multiple
posts on this and they were discussed here a few times.

[0] [http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-
ret...](http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-
retrospective/)

------
luckydude
``Every now and then the code used "fancy" Perl features. Converting those
parts was 10x, 100x, and sometimes up to 1000x slower.''

That is why, when I run an engineering team, I constantly push back against
"clever" or "fancy" stuff. Engineers love that stuff, eat it up. I get it,
it's fun. But when they move on to something else and a mere mortal has to
maintain, fix, or convert it, the cost of that clever stuff becomes apparent.
I've had to be that mere mortal and I can assure you it is a miserable
experience.

I've often thought that one of the most useful things about me is that I'm not
smart enough to be that clever, I like simple, obvious code. It's a little bit
of a lie, I'm smart enough, sort of, to be that clever but it's a lot more
work. And I know that any code that is more than 6 months old, doesn't matter if I
wrote it or someone else wrote it, it always feels like someone else wrote it.
And man, do I love it when that someone else went for straightforward rather
than clever.

~~~
pekk
The corollary is that entire languages and libraries dedicated to doing things
cleverly, even if they are technically usable, are often bad choices.

But unless you can point at benchmarks, nobody will listen to you, and the
better choice may even be described as blub.

------
janwillemb
Summary:

> Manually converting code from one format to another is the most boring,
> draining and soul-crushing work you can imagine.

> we can estimate that a sustained rate of conversion one person can maintain
> is around 100 lines of code per hour

> This gives us a clear answer on why people don't just convert their projects
> from one language to another: There is no such thing as "just rewrite it in
> X"

------
bluetwo
The answer to the question "why don't you re-write in X" is the question "what
benefit to the client would this create and is it worth the effort?"

~~~
ams6110
The other answer is "great idea, please send your pull request when it's
ready"

~~~
ars
Don't do that unless you actually intend to accept the pull request. Saying
this as a way of telling someone to go away is not right.

~~~
ams6110
If the demand is "you should rewrite in X" from some random user I think it's
a perfectly fine way to say "go away."

If there's some indication that the request is from someone qualified _and
willing to help_ , maybe you give it more consideration.

~~~
bluetwo
I get what you're saying. A brilliant idea that someone isn't willing to put
any effort into really isn't likely to be brilliant.

I'll ask someone to write a one page summary of the project goals. Similar
result.

------
exabrial
I will smack the next person that says "because x is more productive than y".
That's an excuse to scratch an itch to use the latest popular framework or
language. If that was actually true, it would be easily measurable with a
double blind test. What's usually true is "rewrites (in any language) are more
productive than the original team, because now we actually understand the
requirements"

~~~
cle
Easily measurable? With what metric?

Or are you implying that, because you don't know a measurement that has a
causal relationship with it, that you can't be improving it?

~~~
exabrial
I'm not the person making the absurd claim "x is more productive than y". I've
merely pointed out their lack of proof for such a ridiculous statement.

------
CodeWriter23
Solution: reply to bug report, "please reopen when you have finished that pull
request, complete with unit tests", then mark as WONTFIX.

~~~
luckydude
Have an upvote, that's brutally honest.

------
drej
People mention that said rewrite needs to have some immediate benefits. I
actually like rewrites where no obvious changes are visible. These are
rewrites, where one anticipates future problems and does this work
preventively.

Three projects come to mind. 1) Rewriting a set of internal libraries from
Python 2 to 3, because it's better to do early on, before compatibility
problems arise. 2) Rewriting an ancient cli tool written in C++ into Go
(nobody here knows C++, some speak Go). It has worked for years untouched, but
you never know. 3) Rewriting a Fortran modelling tool into Go, because while
people know Fortran, its maintenance costs are annoying and adding
functionality is difficult or impossible.

In all cases, no functionality was gained or removed by the rewrite, but
future pain was spared thanks to these endeavours.

~~~
p4lindromica
You would not consider all of these maintainability improvements to be
immediate benefits?

~~~
braveo
"maintainability" by DEFINITION is not immediate.

------
abraae
So disappointing! I thought I was going to read a call to arms to stop
building systems as web applications, and move to the venerable window
protocol.

------
alangpierce
I've thought about codebase conversions quite a bit while working on the
decaffeinate project [1] to help speed up the conversion from CoffeeScript to
JavaScript at my work. I've also given some advice on different strategies
[2][3], although some of that is specific to the CoffeeScript -> JavaScript
problem, which is easier than other situations since the languages can interop
easily and are so similar.

The "1 hour per 100 lines" rough number from the article is probably
optimistic, at least in some settings. As one example, a coworker of mine
manually converted a complex 500-line file, and it took two days to convert,
one day to go through code review, and introduced two bugs.
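To put the two rates side by side, here is a back-of-the-envelope calculation. The 8-hour working day and the 100,000-line codebase are assumptions picked for illustration; only the 100 lines/hour estimate and the 500-line, three-day anecdote come from the text above.

```python
HOURS_PER_DAY = 8  # assumption: the anecdote does not say how long a "day" was


def lines_per_hour(lines, days):
    """Effective conversion rate for work measured in working days."""
    return lines / (days * HOURS_PER_DAY)


def hours_to_convert(total_lines, rate):
    """Hours needed to convert total_lines at a given lines-per-hour rate."""
    return total_lines / rate


article_rate = 100                      # lines/hour, the article's estimate
anecdote_rate = lines_per_hour(500, 3)  # two days converting plus one day in review

codebase = 100_000  # illustrative size, not a figure from this comment
print(f"anecdote rate: {anecdote_rate:.1f} lines/hour")
print(f"at article rate: {hours_to_convert(codebase, article_rate):.0f} hours")
print(f"at anecdote rate: {hours_to_convert(codebase, anecdote_rate):.0f} hours")
```

Under those assumptions the anecdote works out to roughly 21 lines per hour, close to five times slower than the article's number, and that is before counting the time spent on the two bugs.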

Probably any reasonable strategy needs to find a way to work incrementally and
focus on the most valuable parts first. For example, if you focus your efforts
on converting files that are the most active, then you may be able to
culturally move your team to the new language even if there's still a lot of
legacy code in the old language.
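One rough way to find the "most active" files is to count how many commits touched each one. The sketch below is not part of decaffeinate; the function names and the `.coffee` default suffix are assumptions for illustration, and it treats commit count as a proxy for activity.

```python
import subprocess
from collections import Counter


def count_file_changes(git_log_output, suffix=".coffee"):
    """Count commits per path from `git log --name-only` output, filtered by suffix."""
    counts = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path and path.endswith(suffix):
            counts[path] += 1
    return counts


def most_active_files(repo_dir, suffix=".coffee", limit=10):
    """Return the `limit` most frequently changed files in a git repository."""
    log = subprocess.run(
        # An empty pretty format leaves only the file names of each commit.
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_file_changes(log, suffix).most_common(limit)
```

`most_active_files(".")` gives a list of `(path, commit_count)` pairs that can serve as a first-pass conversion order; renames and very old history will skew it, so treat it as a starting point.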

In terms of the three phases from the article, I'm hoping that decaffeinate
can become stable enough that it completely automates step 1 and avoids the
need for step 2, but step 3 will always take time. In my case, my plan is to
do a fully-automated conversion over the ~150k lines of code (broken up into
maybe 10 chunks), call the style issues tech debt, and slowly clean those up
as we work through each part of the code.

[1]
[https://github.com/decaffeinate/decaffeinate](https://github.com/decaffeinate/decaffeinate)

[2]
[https://github.com/decaffeinate/decaffeinate/blob/master/doc...](https://github.com/decaffeinate/decaffeinate/blob/master/docs/conversion-
guide.md#converting-a-whole-project)

[3]
[https://github.com/codecombat/codecombat/issues/4276#issueco...](https://github.com/codecombat/codecombat/issues/4276#issuecomment-291954686)

------
jack9
> There is no such thing as "just rewrite it in X".

People do it all the time, every day. Code has side effects and features you
don't necessarily need. NIH is usually, "don't need most of that here" or
"language X would do it cleaner/with-less-bugs(tm)", which is different
wording for the exact same reasoning. So I'm not sure why the focus on this
line-by-line conversion that nobody would choose to do.

On the other hand, rewriting my java chat server in erlang (in 2008) went from
5000 lines to about 500. That was a point by point functionality conversion of
my own project, for existing java clients.

------
alexeiz
For any reasonably complex codebase the complete rewrite is not possible. The
first several versions of the rewrite will inevitably lack the functionality
present in the original version and have more bugs compared to the original
version. In my experience, you can only achieve the feature and quality parity
if you continue to work on the code long after the initial rewrite.

Take for example a rewrite of coreutils in Rust. The LS utility in original C
([https://github.com/goj/coreutils/blob/rm-d/src/ls.c](https://github.com/goj/coreutils/blob/rm-d/src/ls.c))
and its rewrite in Rust
([https://github.com/uutils/coreutils/blob/master/src/ls/ls.rs](https://github.com/uutils/coreutils/blob/master/src/ls/ls.rs))
are quite different. The rewrite doesn't implement half of the original
options. It's reasonable to assume that not many people even use the rewritten
LS, so it contains bugs that the original widely used version doesn't have.

The bottom line is that if you want to achieve comparable quality and
features, the time and effort invested in rewriting some code is almost the
same as the time and effort invested in creating that code from scratch.

~~~
dragonwriter
> For any reasonably complex codebase the complete rewrite is not possible.

If it's reasonably complex but also reasonably well designed, it can be
completely but incrementally rewritten by module, without the problems you
describe.

OTOH, those are usually the systems where there is less pain with the existing
code driving a desire to rewrite. Chances are if you want to rewrite it,
it's going to be a nightmare.

------
empath75
At least where I work, switching from perl to python happens when the guy that
wrote the perl leaves the company and nobody else wants to touch it.

Every time I've been handed a perl script to 'fix', if I think I'm ever going
to have to touch it again, I rewrite it in python. Our perl code is a
migraine-inducing mess of unmaintainable punctuation vomit.

------
adamnemecek
Generally you'll be porting to a more expressive language so you'll be
hopefully removing lines overall.

------
rwmj
When Red Hat converted our OVirt VM/cloud management tool from C# to Java, it
was done using an external tool (I think using a tool written by Tangible
Inc). This converted the code mostly automatically, with some manual tidying
up afterwards. It was about 100K lines of code, and I believe the conversion
took about 6 months (while existing development continued). The project was
ultimately successful. It is now a respected open source Java project and
widely used and sold.

More on the conversion here: [https://lpeer.blogspot.co.uk/2010/04/switching-
from-c-to-jav...](https://lpeer.blogspot.co.uk/2010/04/switching-from-c-to-
java.html)

------
dilap
An interesting recent example of successful rewrite was moving the Go compiler
from C to Go. They wrote a transpiler from the (very disciplined) C code to
Go, and then have been slowly idiomaticifying the code since.

~~~
bostik
You have to realise the scope and context: the Go compiler rewrite was done by
people who were:

* _working on the codebase full time_

* _intimately familiar with the quirks_

* relying on _the transpiling to be mostly automated_

IMO as far as rewrites go, that's a pretty tall order.

I'm personally contemplating a rewrite of an old project (mostly due to GTK
having gone downhill in past years), and harbour no misconceptions about the
scale of the effort. It's only about 10k lines of python, across two different
projects, but if it takes me less than 2 months (of _active_ spare time work),
I will be very much surprised.

~~~
dilap
Oh, totally. Very much agree.

Out of curiosity, what's the app, and what UI toolkit would you port to? (And
would you stick with python?)

~~~
bostik
The app is my old online poker tracker, written with pygtk.

I am tempted to move to Qt instead, and for a number of reasons will probably
try Go instead.

1) As much as I have traditionally disliked Go's dependency management, with
the vendoring support it can be made reasonable

2) I never got python's threading to work properly, which made the UI
unresponsive when dealing with database operations; and threading in python is
a hack anyway

3) I believe ( _cough_ hubris!) that I can simplify the internals by further
splitting the architecture

One of the best ideas I ever had was to forego all attempts at subprocess
management and just go for DBus and its process autolauncher instead. I mean,
have you ever tried to make a fork/exec work from within GTK software without
causing X hiccups?

~~~
dilap
Cool stuff. I think the UI library story is still pretty raw, but I've done
several backend applications in Go, and I'm pretty much in love. It's a very
well-engineered language with a strong appreciation for real-world
practicalities.

------
wolco
I was thinking many of the manual changes in step one could have been done
through a program or script, freeing you to help with phases 2 and 3. Rewriting
without introducing something new (features, stability, cost reduction, easier
maintenance, etc.) is usually a waste of time but could be fun if you are
using the exercise to learn.

------
jcoffland
I mostly agree with the sentiment of this article but converting a bunch of
shell, Awk and Perl into Python 3 is probably a good idea. It's not like
Python is some newfangled untested language. Making the build system portable
and easy to use is _really_ important for an Open-Source project.

------
joezydeco
The Eighth Dirty Word - "Just"

[https://blog.forrestthewoods.com/the-eighth-dirty-word-
just-...](https://blog.forrestthewoods.com/the-eighth-dirty-word-
just-2d2386850cda)

------
stcredzero
I'm not sure that large organizations are soon going to grok this trick to
save them millions of dollars and tens of thousands of hours of meetings and
employee debate. Anyhow, here it is:

 _Only allow coding that lets the company change its mind._

If you choose a language which has conservative, well established features,
devoid of parsing quirks, and you enforce very strong coding standards per-
team, then it is possible for every program to be composed of per-team idioms
which can then be automatically translated to idiomatic code in a different
language through isomorphic changes.

You will still need to have a project to port your program, but instead of
trying to outstrip the rate at which production code changes, the effort is
instead going towards perfecting a software machine which translates the
program in an instant. By doing it this way, you can always upgrade your
technology without slowing down the production team, or without trying to
outstrip a production team which is already trying to move competitively fast.

I know this trick works. I've done it. What makes this challenging:

    
    
        - Management will be afraid of this, because it's unconventional
        - Teams may not want to or be able to usefully adhere to such standards 
    

But if your organization can pull this off, then you will be able to upgrade
technology without slowdowns in implementing new business functionality. If
organizations can pull this off, they can "just rewrite it in X" any time they
want to! ( _And_ wind up with idiomatic code in the new language! If you are
trying to do this, and not winding up with idiomatic new code, then you have
tackled this at the wrong level of abstraction, possibly being hampered by a
team which didn't have suitable coding standards.)

(Think of it this way: Do you think it's easier for a very capable team to
outstrip your ability to keep up with features, or your ability to keep up
with new programming idioms? If your team is competent, they will be _quickly_
producing new features, but _seldom_ producing new coding idioms!)

------
doggydogs94
Another flavor of this topic is the following: just convert from database X to
database Y. It should be easy, they are both SQL.

