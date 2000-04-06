
Do you really want to convert the lines one-to-one? This seems pretty wasteful. Surely a better way is to just do step 3, drawing inspiration (and copying where appropriate) from the original code based on the features rather than the structure of the original code.

For example, gtk-doc generates documentation from comments in C code. I'm guessing there is some Perl code to parse C code, which I would replace with the great 'pycparser' library. With the line-by-line method you would copy the existing code, fix it, then throw it away.

The code that generates HTML from the comments is likely large. Sure, you could copy whatever the current Perl code does... Or you could replace a lot of it with Jinja2.

Perhaps these examples are incorrect in the context of this specific project, but I don't see the point in copying the lines one for one. Copy the meaning of the code into idiomatic Python from the get-go, and test the output against the known good Perl code. I doubt Perl code copied one for one is ever going to be idiomatic Python code, so why bother, especially if it takes so long?

Edit: Yeah, tests are good. No tests are bad. Having a complete understanding of the code doesn't require translating it line-for-line. Re-writing a project in a different language is a breaking change, but in the context of this project your initial tests could be (for every bit of C code you can possibly find that has gtk-doc comments):

   > gtk-doc.py > a
   > gtk-doc.pl > b
   > diff a b
After you can find no differences, great, release it as an alpha or beta. People can then feed back any internal, weird quirky behavior they depend on. Rinse and repeat.
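
The compare-the-outputs loop above can be scripted. Here is a minimal sketch in Python, assuming both tools take an input file as their last argument and write their result to stdout. The helper name `diff_outputs` and the command lines in the usage comment are hypothetical, not part of gtk-doc.

```python
import difflib
import subprocess


def diff_outputs(old_cmd, new_cmd, input_path):
    """Run both commands on the same input and return unified-diff lines.

    An empty list means the two tools produced identical stdout.
    """
    outputs = []
    for cmd in (old_cmd, new_cmd):
        result = subprocess.run(
            cmd + [input_path], capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout.splitlines(keepends=True))
    return list(
        difflib.unified_diff(outputs[0], outputs[1], fromfile="old", tofile="new")
    )


# Hypothetical usage, one call per C file that carries gtk-doc comments:
#   diff_outputs(["perl", "gtk-doc.pl"], ["python3", "gtk-doc.py"], "widget.c")
```

Looping this over every input you can find and failing on the first non-empty diff gives a cheap regression gate before any alpha release.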

My point is spending 1000000000 hours hand-converting perl into python then again re-writing it into idiomatic python so you can catch any theoretical small perl-specific edge case in what is effectively a complete rewrite and a major version change is IMO a bit pointless, especially when converting Frankenstein python-perl into idiomatic Python may result in deleting large parts of the said slow-to-convert code.


It is Perl code to parse X using regular expressions.

This is always the worst kind of Perl code imaginable to try to tinker with.

It will be fiddly. It will have exceptional cases piled upon exceptional cases. It will have hideous numbers of undocumented corners.

All because it isn't a real, actual, fscking parser.

Because we don't really need a parser for this, right now, do we, really?

Yeah, this is one of those cases where "Do the simplest thing" breaks down horribly and very rarely does someone have the wherewithal to apply 2x4 cluebats to the people who persist in writing parsers with regular expressions.

Yes, I have done so (in Python actually). However, my code has a little "counter" in the comments along the lines of "You have debugged a regular expression in this code: <n> times. Use a real parser you idiot."

It is depressing how large I have let that number get before I actually pull my head out of my ass.


Yeah, proper parsers are actually super easy once you are comfortable with them and take much much less time than a pile of regexps. Like higher-order parser combinators. You can implement them for most languages in no time and then just construct your parser from a few basic higher-order primitives: regexp tokens, alternations, sequences and repetitions.

Getting there though takes a bit of practice, from recursive descent parsers to parser generators.
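
To make the four primitives concrete, here is a minimal sketch of such combinators in Python. The names `token`, `seq`, `alt` and `many` are illustrative choices, not taken from any particular library. Each parser is a function that takes the text and a position and returns `(value, new_position)` or `None`.

```python
import re


def token(pattern):
    """Match a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def seq(*parsers):
    """Match every parser in order; the value is a list of their values."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def alt(*parsers):
    """Return the result of the first parser that matches."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None

    return parse


def many(parser):
    """Match the parser zero or more times."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            # Stop on failure, or when a match consumed nothing.
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)

    return parse


# Toy grammar: a comma-separated list of integers or identifiers.
item = alt(token(r"\d+"), token(r"[A-Za-z_]\w*"))
item_list = seq(item, many(seq(token(r"\s*,\s*"), item)))
```

Calling `item_list("1, foo,22", 0)` walks the whole string, while `item_list("?", 0)` returns `None`. A real grammar would add error reporting and value-building callbacks on top of this.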


Are you typically handwriting your parsers or using some kind of parser generator?


I usually just pull a Parsing Expression Grammar library in.


> "Do the simplest thing"

You have to do the simplest thing that could actually, possibly work!


With Python in particular, I think it's because it's the jump from "I can use the standard library" to choosing from the large list of other parser libraries out there. (Or writing your own parser from scratch, but that has some of the same drawbacks as just using regular expressions.)


> Do you really want to convert the lines one-to-one? This seems pretty wasteful.

I did this once - converting a few kloc of Java to C# - I'm sure it helped that these are very similar languages in many respects. I actually copied the Java source code into C# source files, and then tweaked them until they compiled. The initial commit even had e.g. a "polyfill" for Java's ArrayList<T> class.

https://github.com/MaulingMonkey/poly2tri-cs

By virtue of being such a mechanical translation, it went very fast. I intentionally avoided translating into C# idioms - or even naming conventions - in the initial commit. I wanted to avoid refactoring the entire codebase simultaneously - my experience has been that multi kloc refactors are a great way to bite off more than I can chew, and spend a lot of time in refactoring or debugging hell, and to create an unreviewable mess, all of which is very slow.

By saving all the refactoring into C# idioms etc. for followup changelists, the translation went so smoothly that I still haven't learned the math behind constrained Delaunay triangulation. A non-1:1 translation I'm sure would've forced me to learn it, if only for debugging purposes. ArrayList<T> is now gone - replaced with standard List<T> and IEnumerable<T> from .NET's standard libraries, foo.setX(value) replaced with foo.X = value;, etc.

This isn't an approach I'd recommend for all such projects, but I'm surprised just how well it worked for that conversion. The initial translation took maybe a few hours (if that), and the subsequent cleanup a few more.


Given how powerful it is, when used correctly, it's tough to see why refactoring has such a bad reputation amongst software developers.

For instance, using refactoring, when possible, provides far better guarantees of correctness than tests do.


Sure, you can absolutely write a new program with the same feature list instead of more closely porting the old logic over to the new language.

But now you have a new problem: a Chesterton's Fence (https://www.chesterton.org/taking-a-fence-down/) problem. The old logic is in the form it is (knotty and complicated, most probably) because previous developers made it that way in response to problems they were facing at the time -- problems you may not be aware of, because their old changes made those problems go away. Those knots represent places where those previous developers learned that the problem they were trying to solve wasn't as simple as it first appeared.

You can definitely simplify the porting job if you just ignore all that and write new logic that does The Simplest Thing That Could Possibly Work, but by doing so you can set yourself up for future heartburn as all those forgotten edge cases come roaring back. You end up re-learning the hard way lessons your predecessors learned the first time around.


One problem is that the "known good perl code" likely doesn't have comprehensive tests, it was probably written before TDD became prevalent. So you'll probably need an initial phase to develop tests for all the functionality so you can verify that your new code still passes all the tests.

Also you probably still have to understand every line of the old code, because within will be buried the many undocumented special case workarounds and bug fixes that code accumulates over time. See the classic Spolsky "Things You Should Never Do"

https://www.joelonsoftware.com/2000/04/06/things-you-should-...


> One problem is that the "known good perl code" likely doesn't have comprehensive tests, it was probably written before TDD became prevalent.

I'd argue that perl made TDD mainstream, though we did not call it that 20 years ago. Well, 30, but I did not join in until the mid 90s. Perl's test protocol TAP has been around since at least 1988[1], and back when I was coding Perl for a living, I rarely encountered "known good perl code" that did not have tests (notable exception: usemod wiki).

[1] https://testanything.org/history.html


Having used TAP I'm glad it existed for the things it arguably precipitated but I will never do it voluntarily again.


>I'd argue that perl made TDD mainstream,

because wtf else are you going to do when you develop using lines like

  next if (/^(#|$)/);
(totally 100% typical totally normal line - I found it by clicking the source for the very first CPAN module for a very common word - I knew I would find a typical line like that. I swear there was zero bias to this methodology).

I just use this example because it ends in what is line noise. I'm pretty sure it means "don't process this line (skip to the next one) if it's a line that either starts with a # or is an empty line". But how do I really know what it does? By testing it, of course!

When good code is line noise, how else do you know that it works? Of course you're going to test it. :D

-

EDIT: Downvoters aren't disagreeing with me. You can't downvote me just because you don't like the truth. Perl coders have a heavy culture of testing. It's the only way anyone writes Perl code - period. This is in contrast with C++ programmers, for example, who historically were more content to reason about their code. Historical fact - sorry.


That's a weak argument. Just because you are not familiar with Perl regexes doesn't mean that this line is unreadable. I'm not saying that there is no unreadable Perl code; there are troves of it. But this line is reasonably clear and idiomatic Perl.


It's not about being unreadable (I gave you my reading!). It means skip this line if it starts with a literal # character or "starts with" the end-of-line marker, i.e. is blank. # is a literal, $ is the end-of-line anchor, and | separates them with an "or". You have to be completely familiar with regexes to know that this is correct.

This line is simple.

More complicated lines are...more complicated. Testing lets you know that code is working. Perl is the first major project that had amazing, obscene amounts of test coverage. I would say it is due to the grammar and difficulty.

I agree with you that it is clear and idiomatic Perl! Clear and idiomatic Perl is also easy to misparse using just your brain. Test code is what tells you what you wrote is doing what it is supposed to be doing.

Nobody writes Perl code, reads it carefully and reasons about it, and figures that if it compiles it is probably correct. Further, nobody has ever written code in Perl that way. Instead Perl has a heavy culture of testing.

You can argue with it but it's simply true. Contributors to CPAN have had more test coverage than the average project, and this has been true for a very long time. There's no real way to debate this.


FTFY:

    my $comment_or_blank = qr/
         ^       # beginning of line
         \s*     # bugfix - any amount of whitespace
         (\#|$)  # comment char or blank line
    /x;          # x modifier allows commented regexen.
    next if /$comment_or_blank/;


In a C-like programming language (yes, I know that == may not be string equality and so on, but that's a minor thing) it could be written like this:

    if (line == "" || line[0] == '#') continue;
Seems fairly straightforward to me. A Perl programmer won't have any problem understanding the line you provided (with the `next` keyword). If you see such a line right at the start of a loop, it's clear that it skips particular lines (in this case, empty lines and probably comment lines).

Such a thing may look cryptic when you're not used to regular expressions. Once you are, however, it's actually easier to understand and less bug-prone than the imperative implementation (there is less code, which means less chance of a bug somewhere).
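
The claimed equivalence is easy to check rather than eyeball. Here is a minimal sketch in Python (the function names are made up for illustration) that puts the regex form next to a hand-written form and asserts they agree on a few sample lines. Note that Python's `$`, like Perl's, also matches just before a single trailing newline, which the imperative version has to handle explicitly.

```python
import re

SKIP_RE = re.compile(r"^(#|$)")


def should_skip_regex(line):
    """Regex form, mirroring the Perl test /^(#|$)/."""
    return SKIP_RE.match(line) is not None


def should_skip_plain(line):
    """Imperative form: drop one trailing newline, then test by hand."""
    body = line[:-1] if line.endswith("\n") else line
    return body == "" or body.startswith("#")


samples = ["# comment\n", "\n", "", "value = 1\n", "  # indented\n", "x # trailing\n"]
for sample in samples:
    assert should_skip_regex(sample) == should_skip_plain(sample)
```

Neither form skips the indented comment, which is the case the whitespace "bugfix" in the commented regex above is aimed at.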


Historically (not today) C++ programmers were far less likely to ever write a test to make sure

  if (line == "" || line[0] == '#') continue;
does what they think it does than the Perl programmer was to write a test to make sure

   next if (/^(#|$)/);
does what they think it does. The C++ programmer was far more likely to eye-ball it. This is a simple, historic fact. TDD came to Perl before it came to C++.

I am making a simple statement about actual practices, not some kind of larger point. TDD came to Perl earlier. simple fact.


“if (line == "")” most certainly does not do what you "think it does" in C++.


> Do you really want to convert the lines one-to-one?

Yes, as much as possible!

> This seems pretty wasteful.

No, it's not! The more 1 to 1 the rewrite is, the more of it is possible to be done automatically.

Assuming your language gives you enough expressive power, it's often faster to write a library of shims for semantics and translate syntax automatically... in the worst case with a bunch of regexes - even that works better than manually writing the code from scratch, even if you read and drew inspiration from another implementation.

As an example, I started porting Python code, the PyParsing library, to Io[1]. I stripped docstrings for the moment, but if you open it side-by-side with the Python code, you'll see that it's almost line-for-line identical, except for the syntax. To do this I first wrote a library emulating some of Python's features in Io[2], then passed the original code through a series of around 20 regexes, which resulted in not-working-but-close Io, which I manually corrected.

Now, writing 1.5k lines of a library from scratch (it's about half of PyParsing for now, not finished), not to mention three times as many docstrings, which you can reuse if you keep your translation close to the original, would take more than two days, even if you already had a quite good understanding of such a library. Two days is exactly how much time it took me using this approach.

Of course, how beneficial this method may be to you in practice depends on the target language, how well it matches the features of the original, and how easy it is to extend. Io and Racket are two examples of languages extremely well suited to this approach, but you can do the same with JavaScript, Python, Ruby, Perl and many other high-level, meta-programmable languages, just with a bit more effort.
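
For readers who have not seen a regex-driven syntax pass, here is a minimal sketch in Python. It is not the regex set used for the Io port; the rules below are hypothetical and cover only a toy subset of Perl-like syntax. The point is the shape: an ordered list of rewrite rules applied line by line, producing close-but-not-working output for manual correction.

```python
import re

# Hypothetical rewrite rules for a toy Perl-to-Python pass. Order matters:
# each rule sees the output of the previous one.
RULES = [
    (re.compile(r"^(\s*)my\s+"), r"\1"),   # drop `my` declarations
    (re.compile(r"\$(\w+)"), r"\1"),       # $name -> name
    (re.compile(r"\belsif\b"), "elif"),    # elsif -> elif
    (re.compile(r";\s*$"), ""),            # strip a trailing semicolon
]


def translate_line(line):
    """Apply every rule, in order, to a single line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


def translate(source):
    """Translate a whole source string line by line."""
    return "\n".join(translate_line(line) for line in source.splitlines())
```

With these rules, `translate("my $count = $total + 1;")` gives `count = total + 1`. Anything the rules do not understand passes through unchanged, which is where the shim library and the manual pass come in.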

Anyway, my two cents, as I recently had such an experience :)

[1] https://github.com/piotrklibert/ioparsing/blob/master/src/pa...

[2] https://github.com/piotrklibert/ioparsing/blob/master/src/sy... and str.io


This requires you to fully understand up front all of the code you are trying to convert. While this is feasible in small projects, for a large code base this can be very, very dangerous. The advantages of going the slow way:

* you can reuse test-cases

* you have something to check the program flow against (in worst case with print statements)

* by the time you are done you really should know the code and step #3 is a breeze

PS: Your example of "just use jinja2" - have you verified that a rewrite is non-trivial? Have you checked that there are test cases you can run against your code, or are you risking having to verify manually that your new version does the same?


Hopefully there is a comprehensive set of integration and acceptance tests to verify that the output of the new tool matches the output of the old tool, bugs and weirdness and all. Unfortunately, this is rarely the case.


Isn't that a moot point? If they don't exist then this endeavour is dangerous no matter the method you go about it - translating Perl to Python doesn't ensure it's a 1-1 mapping functionality-wise, and if you're going to rewrite it in idiomatic Python anyway....


Run the old tool, run the new tool, compare outputs. If you can't do this, you shouldn't be rewriting anything.


I do this a lot (end-to-end tests that just `diff` the results from the current code with the expectation that I previously committed into the repo), and it makes one appreciate how much work it is to make a program's output deterministic. When you look around some of my code, there's a lot of loops over maps/dicts/hashes where, instead of iterating over the map directly, I first gather the keys, sort them, and then iterate over these to ensure that the test is deterministic.
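
The sorted-keys habit described above looks roughly like this in Python. This is a sketch with made-up names (`render_report`, the sample data), not code from any real project.

```python
def render_report(counts):
    """Render a mapping as text, visiting keys in sorted order.

    Sorting makes the output independent of insertion order, so two runs
    that build the same mapping differently still produce identical text.
    """
    lines = []
    for key in sorted(counts):
        lines.append(f"{key}: {counts[key]}")
    return "\n".join(lines) + "\n"


a = {"widget": 3, "button": 1}
b = {"button": 1, "widget": 3}
assert render_report(a) == render_report(b)
```

The cost is one sort per loop, which is usually negligible next to the value of a test that can be checked with a plain `diff`.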


Plus one for deterministic software. It's an incredibly useful property when you want to test something. Particularly when doing major version upgrades and when replacing entire systems.

Unfortunately strict determinism and test coverage often go hand in hand. Either the developer gets it or (s)he doesn't.

Fortunately you often do get at least a moderate level of determinism even from the most awful software.


If you don't have a comprehensive set of test inputs, then "compare the outputs" doesn't suffice. (If you do, then by all means, take that approach.)


> Manually converting code from one format to another is the most boring, draining and soul-crushing work you can imagine.

It's not that bad. I've converted many programs from one language to another. For example, in the 1980s I converted the FutureNet DASH schematic editor from 16 bit x86 assembler to C (so it could be ported to other platforms). I converted my game Empire from BASIC to Fortran to PDP11 assembler to C. I've converted parts of Optlink from assembler to C. I've converted a lot of the DMD compiler from C++ to D.

The conversions I've done have all produced valuable results.

It's actually rather enjoyable work, but I like obsessive detail work.


There's a certain breed of developer (me included, and probably also Walter above) who really like refactoring and porting tasks which others would consider soul-crushing.

When the codebase has evolved for a really long time, and you can see the different coding styles of different generations of developers, you get to feel like an archeologist or a geologist who's digging through multiple layers of rock.


I often use the archeology analogy to describe my work to others. I like how you become familiar with your predecessors and their strategies for solving problems. It's a way of getting to know them even if you never actually get a chance to meet them.

Some of the best parts:

1. Trying something totally new only to realize that one of your predecessors has attempted to do it before. Bonus points if you can solve it this time around.

2. Digging around and realizing that your predecessor was in fact yourself.

3. Randomly encountering one of your predecessors in real life and becoming instant friends while reminiscing about old times. ;-)


Yes please :) I definitely enjoy this sort of work.


Most of those have relatively high value added from the conversion. Perl to Python is essentially meaningless as the languages have feature parity. Take complaints like "remove all the global variables and add objects": Perl already has all the constructs necessary to do that, so far fewer hours could be spent just architecturally refactoring what's there for the same actual value. Something like homogenising dependencies at the project level is an abstract win that is hard both to quantify and to be motivated by. Going from assembly to C, by contrast, is a huge productivity improvement.


> Most of those have relatively high value added from the conversion.

That is true. There are so many things I can work on, I try to pick only the ones with the highest ROI of my time.

> feature parity

The initial conversion is feature parity. This is deliberate, as conversion only works if you ruthlessly avoid any attempt at fixing, improving, or refactoring code. Just translate.

But once the translation is done, and the new program is working exactly like the old one, then the benefits start accruing as you can start removing the technical debt, and take advantage of what the new language offers.


> Perl to Python is essentially meaningless as the languages have feature parity.

Well there is some greater context here, namely that they want to put all their tools on one platform. That's not at all meaningless, though I too doubt the importance here. The poor deployment story of scripting languages likely does not play a big role here.


Applications can accumulate 'business cruft' as much as they do code cruft. Things were built for one reason, then were hacked to do something different. E.g.:

"Note that GTK-Doc wasn't originally intended to be a general-purpose documentation tool, so it can be a bit awkward to setup and use."

So pure 'rewriting' is not usually what you want. You want the same functionality but in an easier to use and maintain package. The approach I usually take is finding a well supported project that is already solving the same problem and extending it to solve mine.

In this case, a documentation generator seems a suspect thing to rewrite as there are already many of those out there. I'd look at extending Doxygen to support GTK-Doc's syntax (or automating a GTK-Doc to Doxygen translator).

Anyway, this is an approach I've used with some success before. It can't always work: sometimes the problem you're solving is fairly unique, and sometimes the available open source projects don't have much better code bases than the one you're trying to replace.

As for "Chesterton's Fences", I say tear 'em down (but instrument and be prepared to roll back). Sometimes the only way to tell why something exists is to remove it. This is effectively just paying the price of technical debt - a cost of doing business. The higher cost is to live in a world littered with rusting fences.


Rewrites lead to all those wonderful feature-removing and compatibility-breaking scenarios that you see in “new versions” of software. They can really suck for consumers and that is reason alone to be much more cautious.

On the other hand, you can’t expect to keep your entire software world standing still. At some point, the operating system or even the hardware will make it really hard to keep an old code base going, and you’ll have to do very creative things just to keep it all working. And no, you typically don’t have the freedom to force people to use some ancient system for your benefit; you’re not in a vacuum, your users are doing lots of other things and their other apps are going to keep moving even if you don’t.

You will reach a point of real regret if you don’t spend at least some time to move code to more modern concepts/languages/libraries. It doesn’t have to be 100% at once, nor does it have to be an outright replacement; leverage multi-language bindings, testing frameworks, etc. and beta users to make progress. And whatever you do, never release a “MyApp 5” that is a completely different code base than “MyApp 4”; this just aggravates people when nothing works quite right. You need MyApp 4.1, 4.2, 4.3, 4.4, etc.


The cURL project consists of roughly 100 thousand lines of C code according to Ohloh.

You do not have to when you choose a more modern safer language that can export functions with C linkage and can trivially call C functions. You decide to use a new language and write new functionality or functions that have to be rewritten anyway.

For instance, gcc first switched to g++ as the default compiler and started allowing a subset of C++. Firefox started using Rust here and there where/when it makes sense. There are countless other examples.

I think very few folks would actually propose converting a 100KLOC or an MLOC project to another language. (Though Go did it with their runtime and compiler :), though that's quite a special case).


Yup, I'm in the middle of porting something over to Rust that's C based and this works really well. You can stub in the Rust parts piece-by-piece via FFI and incrementally bring up the port testing along the way instead of a big-bang integration at the end. 


              Should I rewrite X in Y?
                 /               \
                /                 \
               /                   \
              |                     |
              |                     |
  Am I doing this just        Is my team full of
  for the features in Y?        experts in Y?
              |                       |
             Yes                  ___/ \___
               \                 |         |
                \               No        Yes
                 \               |         |__________
                  \              |                    |
                   \         Are there any experts    |
                    \          on my team in Y?       |
                     \           |           |        |
                      \          No          |        |
                       \         |          Yes       |
                        \        |     _____/         |
                         \       |     |              |
                          \      |  Did they          |
                           \     |  propose it?       |
                            \    |     |      \       |
                             \   |    Yes      |      |
                              \  |     |       No     |
                             Don't rewrite.    |      |
                                 |             |      |
                                 |           Were you going to
                                 |           rewrite it anyway?
                                 |              No       |
                                 |______________|       Yes
                                                         |
                                                          \
                                                           \
                                                            \
                                                             \
                                                              \
                                                               \
                                                                \
                                                                 \
                                                                  \
                                                                   \
                                                                    |
                                                             Think about it.


Although LOC metrics are always questionable, converting code from one language to another at 100 lines/hr seems absurdly optimistic to me.


Definitely seems completely unrealistic for any conversion involving idiomatic Perl. I could see it being possible for Java to C# or vice versa.


A nice example of rewrite in X is the rewrite of 0install in OCaml from Python [0]. A very systematic approach, replacing code gradually. There are multiple posts on this and they were discussed here a few times.

[0] http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-ret...


``Every now and then the code used "fancy" Perl features. Converting those parts was 10x, 100x, and sometimes up to 1000x slower.''

That is why, when I run an engineering team, I constantly push back against "clever" or "fancy" stuff. Engineers love that stuff, eat it up. I get it, it's fun. But when they move on to something else and a mere mortal has to maintain, fix, or convert it, the cost of that clever stuff becomes apparent. I've had to be that mere mortal and I can assure you it is a miserable experience.

I've often thought that one of the most useful things about me is that I'm not smart enough to be that clever; I like simple, obvious code. It's a little bit of a lie: I'm smart enough, sort of, to be that clever, but it's a lot more work. And I know that any code that is more than 6 months old, doesn't matter if I wrote it or someone else wrote it, always feels like someone else wrote it. And man, do I love it when that someone else went for straightforward rather than clever.


The corollary is that entire languages and libraries dedicated to doing things cleverly, even if they are technically usable, are often bad choices.

But unless you can point at benchmarks, nobody will listen to you, and the better choice may even be described as blub.


Summary:

> Manually converting code from one format to another is the most boring, draining and soul-crushing work you can imagine.

> we can estimate that a sustained rate of conversion one person can maintain is around 100 lines of code per hour

> This gives us a clear answer on why people don't just convert their projects from one language to another: There is no such thing as "just rewrite it in X"


The answer to the question "why don't you re-write in X" is the question "what benefit to the client would this create and is it worth the effort?"


The other answer is "great idea, please send your pull request when it's ready"


Don't do that unless you actually intend to accept the pull request. Saying this as a way of telling someone to go away is not right.


If the demand is "you should rewrite in X" from some random user I think it's a perfectly fine way to say "go away."

If there's some indication that the request is from someone qualified and willing to help, maybe you give it more consideration.


I get what you're saying. A brilliant idea that someone isn't willing to put any effort into really isn't likely to be brilliant.

I'll ask someone to write a one page summary of the project goals. Similar result.


I will smack the next person that says "because x is more productive than y". That's an excuse to scratch an itch to use some latest popular framework or language. If that was actually true, it would be easily measurable with a double blind test. What's usually true is "rewrites (in any language) are more productive than the original team, because now we actually understand the requirements"


Easily measurable? With what metric?

Or are you implying that, because you don't know a measurement that has a causal relationship with it, that you can't be improving it?


I'm not the person making the absurd claim "x is more productive than y". I've merely pointed out their lack of proof for such a ridiculous statement.


Solution: reply to bug report, "please reopen when you have finished that pull request, complete with unit tests", then mark as WONTFIX.


Have an upvote, that's brutally honest.


People mention that said rewrite needs to have some immediate benefits. I actually like rewrites where no obvious changes are visible. These are rewrites, where one anticipates future problems and does this work preventively.

Three projects come to mind. 1) Rewriting a set of internal libraries from Python 2 to 3, because it's better to do early on, before compatibility problems arise. 2) Rewriting an ancient cli tool written in C++ into Go (nobody here knows C++, some speak Go). It has worked for years untouched, but you never know. 3) Rewriting a Fortran modelling tool into Go, because while people know Fortran, its maintenance costs are annoying and adding functionality is difficult or impossible.

In all cases, no functionality was gained or removed by the rewrite, but future pain was spared thanks to these endeavours.


You would not consider all of these maintainability improvements to be immediate benefits?


"maintainability" by DEFINITION is not immediate.


So disappointing! I thought I was going to read a call to arms to stop building systems as web applications, and move to the venerable window protocol.


I've thought about codebase conversions quite a bit while working on the decaffeinate project [1] to help speed up the conversion from CoffeeScript to JavaScript at my work. I've also given some advice on different strategies [2][3], although some of that is specific to the CoffeeScript -> JavaScript problem, which is easier than other situations since the languages can interop easily and are so similar.

The "1 hour per 100 lines" rough number from the article is probably optimistic, at least in some settings. As one example, a coworker of mine manually converted a complex 500-line file, and it took two days to convert, one day to go through code review, and introduced two bugs.

Probably any reasonable strategy needs to find a way to work incrementally and focus on the most valuable parts first. For example, if you focus your efforts on converting files that are the most active, then you may be able to culturally move your team to the new language even if there's still a lot of legacy code in the old language.
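
A rough sketch of how "most active" could be measured: count how often each path shows up in the commit history. The function below assumes its input is the text printed by `git log --name-only --pretty=format:` (one path per line, blank lines between commits). The name `rank_active_files` and the sample paths are made up for illustration.

```python
from collections import Counter


def rank_active_files(log_text, top=10):
    """Count how many commits touched each path; busiest paths come first.

    Assumes log_text looks like the output of
    `git log --name-only --pretty=format:` (paths on their own lines,
    blank lines separating commits).
    """
    counts = Counter(
        line.strip() for line in log_text.splitlines() if line.strip()
    )
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical history: three commits touching two files.
    sample = (
        "app/cart.coffee\napp/user.coffee\n\n"
        "app/cart.coffee\n\n"
        "app/cart.coffee\n"
    )
    for path, touches in rank_active_files(sample):
        print(touches, path)
```

Commit count is only a proxy for activity; weighting recent commits more heavily, or counting distinct authors, may fit some teams better.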

In terms of the three phases from the article, I'm hoping that decaffeinate can become stable enough that it completely automates step 1 and avoids the need for step 2, but step 3 will always take time. In my case, my plan is to do a fully-automated conversion over the ~150k lines of code (broken up into maybe 10 chunks), call the style issues tech debt, and slowly clean those up as we work through each part of the code.

[1] https://github.com/decaffeinate/decaffeinate

[2] https://github.com/decaffeinate/decaffeinate/blob/master/doc...

[3] https://github.com/codecombat/codecombat/issues/4276#issueco...


> There is no such thing as "just rewrite it in X".

People do it all the time, every day. Code has side effects and features you don't necessarily need. NIH is usually, "don't need most of that here" or "language X would do it cleaner/with-less-bugs(tm)", which is different wording for the exact same reasoning. So I'm not sure why the focus on this line-by-line conversion that nobody would choose to do.

On the other hand, rewriting my java chat server in erlang (in 2008) went from 5000 lines to about 500. That was a point by point functionality conversion of my own project, for existing java clients.


For any reasonably complex codebase the complete rewrite is not possible. The first several versions of the rewrite will inevitably lack the functionality present in the original version and have more bugs compared to the original version. In my experience, you can only achieve the feature and quality parity if you continue to work on the code long after the initial rewrite.

Take for example a rewrite of coreutils in Rust. The LS utility in original C (https://github.com/goj/coreutils/blob/rm-d/src/ls.c) and its rewrite in Rust (https://github.com/uutils/coreutils/blob/master/src/ls/ls.rs) are quite different. The rewrite doesn't implement half of the original options. It's reasonable to assume that not many people even use the rewritten LS, so it contains bugs that the original widely used version doesn't have.

The bottom line is that if you want to achieve comparable quality and features, the time and effort invested in rewriting some code is almost the same as the time and effort invested in creating that code from scratch.


> For any reasonably complex codebase the complete rewrite is not possible.

If it's reasonably complex but also reasonably well designed, it can be completely but incrementally rewritten by module, without the problems you describe.

OTOH, those are usually the systems where there is less pain driving a desire to rewrite in the first place. Chances are if you want to rewrite it, it's going to be a nightmare.
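
One way a module-by-module rewrite can be kept honest, as a sketch only: run the old and new implementation of a module's entry point side by side, keep serving the old result, and log any disagreement. This "shadow" wrapper is a swapped-in technique, not something described above, and `shadow`, `old_slug` and `new_slug` are hypothetical names.

```python
def shadow(old_impl, new_impl, mismatches):
    """Wrap an entry point: return the old result, record disagreements."""
    def wrapper(*args, **kwargs):
        expected = old_impl(*args, **kwargs)
        try:
            actual = new_impl(*args, **kwargs)
        except Exception as exc:
            # The rewrite must never break callers; just note the failure.
            mismatches.append((args, kwargs, expected, repr(exc)))
            return expected
        if actual != expected:
            mismatches.append((args, kwargs, expected, actual))
        return expected
    return wrapper


# Hypothetical module function and its rewrite.
def old_slug(title):
    return title.strip().lower().replace(" ", "-")


def new_slug(title):
    return "-".join(title.lower().split())


if __name__ == "__main__":
    log = []
    slug = shadow(old_slug, new_slug, log)
    slug("Hello World")  # both implementations agree
    slug("a  b")         # double space: the rewrite collapses it
    for entry in log:
        print(entry)
```

Once the mismatch log stays empty for long enough, the wrapper can be swapped for the new implementation and the old module deleted.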


At least where I work, switching from perl to python happens when the guy that wrote the perl leaves the company and nobody else wants to touch it.

Every time I've been handed a perl script to 'fix', if I think I'm ever going to have to touch it again, I rewrite it in python. Our perl code is a migraine-inducing mess of unmaintainable punctuation vomit.


Generally you'll be porting to a more expressive language so you'll be hopefully removing lines overall.


When Red Hat converted our OVirt VM/cloud management tool from C# to Java, it was done using an external tool (I think using a tool written by Tangible Inc). This converted the code mostly automatically, with some manual tidying up afterwards. It was about 100K lines of code, and I believe the conversion took about 6 months (while existing development continued). The project was ultimately successful. It is now a respected open source Java project and widely used and sold.

More on the conversion here: https://lpeer.blogspot.co.uk/2010/04/switching-from-c-to-jav...


An interesting recent example of successful rewrite was moving the Go compiler from C to Go. They wrote a transpiler from the (very disciplined) C code to Go, and then have been slowly idiomaticifying the code since.


You have to realise the scope and context: the Go compiler rewrite was done by people who were:

* working on the codebase full time

* intimately familiar with the quirks

* relying on the transpiling to be mostly automated

IMO as far as rewrites go, that's a pretty tall order.

I'm personally contemplating a rewrite of an old project (mostly due to GTK having gone downhill in past years), and harbour no misconceptions about the scale of the effort. It's only about 10k lines of python, across two different projects, but if it takes me less than 2 months (of active spare time work), I will be very much surprised.


Oh, totally. Very much agree.

Out of curiosity, what's the app, and what UI toolkit would you port to? (And would you stick with python?)


The app is my old online poker tracker, written with pygtk.

I am tempted to move to Qt instead, and for a number of reasons will probably try Go instead.

1) As much as I have traditionally disliked Go's dependency management, with the vendoring support it can be made reasonable

2) I never got python's threading to work properly, which made the UI unresponsive when dealing with database operations; and threading in python is a hack anyway

3) I believe (cough hubris!) that I can simplify the internals by further splitting the architecture

One of the best ideas I ever had was to forego all attempts at subprocess management and just go for DBus and its process autolauncher instead. I mean, have you ever tried to make a fork/exec work from within GTK software without causing X hiccups?


Cool stuff. I think the UI library story is still pretty raw, but I've done several backend applications in Go, and I'm pretty much in love. It's a very well-engineered language with a strong appreciation for real-world practicalities.


I was thinking many of the manual changes in step one could have been done through a program or script, freeing you to help with phases 2 and 3. Rewriting without introducing something new (features, stability, cost reduction, easier maintenance, etc.) is usually a waste of time, but could be fun if you are using the exercise to learn.


I mostly agree with the sentiment of this article but converting a bunch of shell, Awk and Perl into Python 3 is probably a good idea. It's not like Python is some newfangled untested language. Making the build system portable and easy to use is really important for an Open-Source project.


The Eighth Dirty Word - "Just"

https://blog.forrestthewoods.com/the-eighth-dirty-word-just-...


I'm not sure that large organizations are soon going to grok this trick to save them millions of dollars and tens of thousands of hours of meetings and employee debate. Anyhow, here it is:

Only allow coding that lets the company change its mind.

If you choose a language which has conservative, well established features, devoid of parsing quirks, and you enforce very strong coding standards per-team, then it is possible for every program to be composed of per-team idioms which can then be automatically translated to idiomatic code in a different language through isomorphic changes.

You will still need to have a project to port your program, but instead of trying to outstrip the rate at which production code changes, the effort is instead going towards perfecting a software machine which translates the program in an instant. By doing it this way, you can always upgrade your technology without slowing down the production team, or without trying to outstrip a production team which is already trying to move competitively fast.

I know this trick works. I've done it. What makes this challenging:

    - Management will be afraid of this, because it's unconventional
    - Teams may not want to or be able to usefully adhere to such standards
But if your organization can pull this off, then you will be able to upgrade technology without slowdowns in implementing new business functionality. If organizations can pull this off, they can "just rewrite it in X" any time they want to! (And wind up with idiomatic code in the new language! If you are trying to do this, and not winding up with idiomatic new code, then you have tackled this at the wrong level of abstraction, possibly being hampered by a team which didn't have suitable coding standards.)

(Think of it this way: Do you think it's easier for a very capable team to outstrip your ability to keep up with features, or your ability to keep up with new programming idioms? If your team is competent, they will be quickly producing new features, but seldom producing new coding idioms!)
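
To make the idea concrete, here is a toy sketch of such a "software machine". It assumes the team's coding standard limits the source to a fixed list of idioms, each paired with a template in the target language. The Perl-like and Python-like idioms, and the names `IDIOMS`, `translate_line` and `translate`, are invented for illustration; a real translator would work on a parse tree rather than on lines of text.

```python
import re

# Hypothetical per-team idioms: a strict source pattern (Perl-like here)
# paired with a template for the target language (Python-like here).
IDIOMS = [
    (re.compile(r"^push @(\w+), \$(\w+);$"), r"\1.append(\2)"),
    (re.compile(r"^my \$(\w+) = scalar\(@(\w+)\);$"), r"\1 = len(\2)"),
]


def translate_line(line):
    """Translate one line, refusing anything outside the agreed idioms."""
    stripped = line.strip()
    for pattern, template in IDIOMS:
        if pattern.match(stripped):
            return pattern.sub(template, stripped)
    raise ValueError(f"not a known idiom: {stripped!r}")


def translate(source):
    """Translate every non-blank line of the source text."""
    return "\n".join(
        translate_line(line) for line in source.splitlines() if line.strip()
    )


if __name__ == "__main__":
    print(translate("push @items, $x;\nmy $n = scalar(@items);"))
```

The loud failure on an unknown line is the point of the sketch: code that strays from the agreed idioms is caught at translation time, and the fix is either a new rule or a change to the offending code.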


Another flavor of this topic is the following: just convert from database X to database Y. It should be easy, they are both SQL.




