Hacker News

> Companies never think of the old guys as the ones to implement the new system - that's a job for the "enterprise experts"

Exactly. This is why rewrites fail. The challenge of a rewrite is not in mapping the core architecture and core use case; it's mapping all the edge cases and covering all the end-user needs. You need people intimately familiar with the old system to make sure the new system does all the weird stuff in the corners of the old system's code: the stuff nobody understood but that was there for good reasons. IMHO the best way to approach a rewrite is to make a blended team of experts on the old system and experts on the new technology, and put a manager in charge with excellent people skills who can get them to work together.

What a ton of people forget: you also need to reproduce bugs and edge cases exactly as they were in the old software.

More often than not, the next developer using the old system worked around it, but never documented it as a bug (or wrong format; it's a feature! Not a bug!)

Rewrites fail, IME, because the old system was never documented in the first place, and because they are only undertaken when a change is necessary that management is not confident can be made in the old system (usually because both the technical and business expertise associated with the old system has been lost, and it's being maintained by what amounts to a cargo-cult priesthood), usually on a firm and fairly short timetable.

And since the number of old guys available to help is constantly growing, it no longer makes sense to do projects without them. A diverse team makes better products.

What about not doing a rewrite? What about instead refactoring, documenting, etc., the old system?

If you want to move off of COBOL + mainframe, that kind of necessitates a rewrite, doesn't it?

Emulate the mainframe and make a modern abstraction interface over the platform. We investigated this option for some IBM 360 COBOL code for a project I was running. We were a very small team and were market consumers (not implementers) of the original system, which got open sourced in a panic. We eventually chose not to - but seriously considered it. If I were the owners (the Fed) I would have.

This doesn't help if part of your goal is to have a system not written in COBOL. Emulating the older hardware and OS gives you an even more complex system that's harder to hire people to work on.

It helps for the first stage - getting the platform onto a more sustainable environment. Once the interface exists and has full coverage, parts of the backend can begin migration without a doomed-to-fail 'big bang' rewrite.

Does it really help? What does it cost to build a trustworthy emulator for an ancient system that you don't own the source for? What would it cost to just migrate your legacy applications to a newer mainframe that IBM supports?

I wasn't suggesting writing an emulator from scratch. I was suggesting emulating the mainframe. Here is what we, specifically, were looking at: https://en.wikipedia.org/wiki/Hercules_%28emulator%29

Don't you then need to pay IBM for the OS anyway? Will they license the OS for this use?

Or did your system actually include the OS source?

For 360, it is now in the public domain: https://en.wikipedia.org/wiki/OS/360_and_successors

Interesting. Thanks for the details.

One of the problems with attempting to write an interface is the opaque source/consumer problem.

E.g. I'm working on a system that I could hypothetically abstract (I've got access to it, can poke with enough tests and test data, etc).

However, what I don't have is access to code or test injection for any of my sources/consumers. Both of which are expecting all the corner-case quirks to be exactly identical, and may actually have accreted software that depends on a specific quirk. A specific quirk that I have no way of knowing about. Or they may send me something I've never seen and am not expecting because it's a 1:1,000,000 corner case, and we don't have any logging of an example that came through production.

I haven't worked on too much of the heavyweight stuff like you have, but I tend to take the perspective that "a 100% compatible rewrite is impossible." 95%+ maybe, but we're going to have to deal with the <= 5% after it goes to production.

Did you ever pursue writing and dropping a new tailored load balancer / router type application on the incoming data stream such that you could divert a specific portion onto new system(s)?
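The diverting router the parent asks about is essentially a strangler-fig migration. A minimal sketch of the idea (the function names and the hash-based bucketing are my own illustration, not anything from the thread): a thin layer in front of the incoming stream routes a configurable fraction of keys to the new system and the rest to the legacy path.

```python
import hashlib

def route(record_key: str, new_system_fraction: float) -> str:
    """Deterministically divert a fraction of traffic to the new system.

    Hashing the record key (rather than sampling randomly per request)
    keeps each client/account pinned to one system, so consumers that
    depend on specific quirks see consistent behaviour throughout.
    """
    digest = hashlib.sha256(record_key.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "new" if bucket < new_system_fraction else "legacy"

# Ramp up gradually: start near 1% of keys, diff the two systems'
# outputs for diverted traffic, then widen the fraction.
```

The deterministic bucketing also makes it easy to shadow-run: send diverted records to both systems, serve the legacy answer, and log any divergence.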

> we don't have any logging of an example that came through production.

If it's something that's never come through production, it is not already documented anywhere, and it has an occurrence rate of approximately 0.001%, is it really a feature that needs to be replicated?

Sure, because clients will say anything short of 100% success is failure.

(Currently migrating a site of 125k pages of content with oodles of edge-cases)

Yes, but not necessarily a big-bang style rewrite and that's where the difference lies. Big-bang style rewrites have an extremely high failure rate, doing it smart is more work from the start but has a much higher chance of success.

Because COBOL is the PHP of the 60s, and mainframes are slow and expensive.

Also, too much of the talent is stuck in a blocked-I/O mindset somehow.

Some are wizards though, writing assembly and making Raspberry Pi sized systems blazingly fast. OK, a couple of Raspberry Pis.

Mainframes aren't hotbeds of compute power - they are all about the I/O. Your typical COBOL program reads a record, does some moderate processing, and writes a record out. Over and over. So keeping the input and output channels full so processing wouldn't stall was a key design goal.
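The read-process-write pattern described above can be sketched in a few lines (a hedged illustration in Python; real COBOL batch jobs run against fixed-width records on channel-attached datasets, not text files):

```python
def process_batch(infile: str, outfile: str, transform) -> None:
    """Classic mainframe-style batch loop: read a record, do some
    moderate processing, write a record, over and over. Throughput is
    bounded by I/O, not CPU, which is why keeping the input and output
    channels full matters more than raw compute."""
    with open(infile) as src, open(outfile, "w") as dst:
        for record in src:              # one record at a time
            dst.write(transform(record))  # moderate per-record work
```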

As a counterpoint, many very well funded and talent rich organizations have failed to retire what TPF mainframes do every day for airlines, banks, and credit card companies.

Cobol isn't involved, but those slow mainframes are.

To be fair, rewriting huge mission-critical systems is hard, no matter what kind of system it is and what you are changing to.

Non-blocking I/O isn't really some miracle new-age programming drug. It doesn't change the equation much.

Our mainframe is expensive but it isn't slow. It's not exactly sitting on the same hardware from the 70's.

The amount of performance (particularly CPU) you get per dollar is very low. Mainframes are all about lots of I/O with ridiculously high reliability and availability, but for an absurd amount of money.

Right, but that's our use case and it also happens to run our legacy applications! IBM's support is also very good.

We're not running our modeling engines and that stuff on it. We have HPC for that.

There was never an excuse not to document PHP functionality. In fact, it's quite easy to document now, just the same as any other language. Devs simply argue that the 'system is still changing', and so nothing gets documented.

Rewrites also fail because the thing being rewritten is a mess, no specification exists of what it does, and the only regression test suite is deployment into production.

All features and edge cases should have a test. If possible, also have tests for all the bugs ever found in the old system. It helps if the tests are not too coupled with the code and are well documented. A test could be making an entry where debit and credit don't match, or trying to enter an extra decimal, and checking whether the system adds two decimals correctly.
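The parent's examples (an unbalanced debit/credit entry, an extra decimal place, adding two decimals) can be pinned down as characterization tests before the rewrite starts. A sketch, where `validate_entry` and `to_cents` are hypothetical stand-ins for the system under test:

```python
from decimal import Decimal, ROUND_HALF_UP

def validate_entry(debit: Decimal, credit: Decimal) -> None:
    """Reject ledger entries where debit and credit don't balance."""
    if debit != credit:
        raise ValueError("debit and credit must balance")

def to_cents(amount: str) -> Decimal:
    """Normalize an amount with extra decimals to two places."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Characterization tests: record the old system's behaviour as-is.
def test_unbalanced_entry_is_rejected():
    try:
        validate_entry(Decimal("10.00"), Decimal("9.99"))
    except ValueError:
        pass  # expected: unbalanced entries must not be accepted
    else:
        raise AssertionError("unbalanced entry was accepted")

def test_extra_decimal_is_normalized():
    assert to_cents("10.005") == Decimal("10.01")

def test_two_decimals_add_correctly():
    # Decimal avoids binary-float surprises like 0.1 + 0.2 != 0.3
    assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")

test_unbalanced_entry_is_rejected()
test_extra_decimal_is_normalized()
test_two_decimals_add_correctly()
```

Keeping such tests decoupled from the implementation means the same suite can run against both the legacy and the rewritten system.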

> All features and edge cases should have a test.

Yeah, but the interesting problem is what happens next. Let's say a test case reveals that there's a flaw in the next-gen system. You fix it. It later turns out that the same flaw exists in the legacy system. What do you do?

Do you revert the fix, or leave it in place?

Yeah, I had to explain to a client once that our new version of their analytics query produced different results because their original SAS code didn't mind referencing variables before they're declared, and so one of the numbers in the analysis used to always be 0.

They did NOT appreciate hearing that they had been running a bugged query for years...

Revert the fix and add it to your backlog. You've got enough to worry about when porting/rewriting, don't add additional dimensions of complexity and risk by trying to change business logic at the same time. Minimize risk by minimizing change...then when the new system is up on its feet, go back and fix all the mistakes you found.

I upvoted that but I suspect that there will never be an opportunity to "go back and fix all the mistakes you found".

At least there never is for me.

I should count all the grains of rice in my lunch, too.

Maybe I don't give the rewriters enough credit, but in all honesty, I'd be surprised if they bothered to look that deep into the code and tests before it's already too late. Too often, there's just some hand-waving and proclaiming that you "don't do it like that in today's software".

I worked at a company where they did a rewrite of one of the main systems. My friend said it best when they finally finished and annoyed a lot of their customers. He said, "They fixed the things that were wrong, but they missed implementing all the things it got right."
