Hacker News

I work on some software of which the oldest parts of the source code date back to about 2009. Over the years, some very smart (some of them dangerously smart and woefully inexperienced, and clearly - not at all their fault - not properly mentored or supervised) people worked on it and left.

What we have now is frequently a mystery. Simple changes are difficult, difficult changes verge on the impossible. Every new feature requires reverse-engineering of the existing code. Sometimes literally 95% of the time is spent reverse-engineering the existing code (no exaggeration - we measured it); changes can take literally 20 times as long as they should while we work out what the existing code does (and also, often not quite the same, what it's meant to do, which is sometimes simply impossible to ever know).

Pieces are gradually being documented as we work out what they do, but layers of cruft from years gone by from people with deadlines to meet and no chance of understanding the existing code sit like landmines and sometimes like unbreakable bonds that can never be undone. In our estimates, every time we have to rely on existing functionality that should be rock solid reliable and completely understood yet that we have not yet had to fully reverse-engineer, we mark it "high risk, add a month". The time I found that someone had rewritten several pieces of the Qt libraries (without documenting what, or why) was devastating; it took away one of the cornerstones I'd been relying on, the one marked "at least I know I can trust the Qt libraries".

It doesn't matter how smart we are, how skilled a coder we are, how genius our algorithms are; if we write something that can't be understood by the next person to read it, and isn't properly documented somewhere in some way that our fifth replacement can find easily five years later - if we write something of which even the purpose, let alone the implementation, will take someone weeks to reverse engineer - we're writing legacy code on day one and, while we may be skilled programmers, we're truly Godawful software engineers.

Peter Naur's classic 1985 essay "Programming as Theory Building" argues that this is because a program is not its source code. A program is a shared mental construct (he uses the word theory) that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it's lossy, so you can't reconstruct a program from its code.



Thanks for sharing this. I've been looking for this for a long time to replace my own poor attempt at saying the same:

The code you write isn't the hard part. It's not the interesting part. And it's not really why you're paid. What I'm trying to imbue into you is the discipline to design and document and plan for the future. Saying you can "whip up a script this afternoon" is not impressive or commendable. It does harm without the other pieces.

Here's the thing: when I see people spend all their time on design discussions and documents, their work isn't any better for it. Usually it's worse. The documents are not clear or helpful. The designs encoded therein are not better than the first pass knee-jerk thought. The points raised in reviews are just bikeshedding. The process encourages explosion of complexity, making mountains of molehills, and sometimes can be a refuge for people who don't actually have the skills to hide out in. I do very much respect when someone can plop down some working code this afternoon vs. organize an 8-person kickoff meeting next week.

If it's a genuinely large project, the working code is a skeleton POC, and we can run a design process using what we learned from it. But most of the time it's not, and that engineer has saved us collectively hundreds of hours of waste talking to death something that just isn't that hard.

I think one of the critical senior engineer skills is having correct intuitions about how much time and energy should be sunk on a particular task. Sometimes that means asking people to slow down, collaborate, and think. Equally often, it means asking them to put down the calendar invites, make a choice, and bang something out. We have this bizarre cost model where code is expensive and meetings are free; nothing could be further from the truth.

The best software I ever worked on - software in which the customer never reported a single bug, and every requirement was met - could not have been done without the documentation. The documentation was part of the process of creating the software - the larger part, really.

Requirements, design, implementation, testing; all documented thoroughly, all reviewed and checked, each stage linked clearly to the previous and the next. One could literally trace a requirement from the requirements documentation, through the design, into implementation (which was done using literal programming, producing beautiful PDFs for humans to read - there was very little need for anyone to ever actually look at the actual pure Objective-C code that was fed to the compiler), and then onwards to testing, such that each individual test was linked to proving given requirement(s) that had been written down at the start; sometimes years previously.

The customer was so impressed that they asked us to take over from another supplier whom they were busy suing for lack of delivery of a piece of the same system. Our software was delivered on time, exactly to spec, fully documented from requirements to design to implementation to testing. The whole project ran for most of a decade.

It certainly wasn't easy. It required rigor and discipline and review panels and, I gather, somewhere between ten and twenty years' experience of creating software like that. It worked; the only software I've ever worked on in which the customer never reported so much as a single bug. The entire system could be completely understood down to the level of the code by reading the documentation, and when changes were required, the software engineer would begin by reading the documentation and upon opening the literal style source file, would find it just as expected from the design. I've never seen anything like it before or since.

They went all in, though, and took years to get there. One couldn't just take a modern, agile-style software shop and start doing this. It was culturally ingrained.

I certainly agree that in other places, I've seen some truly awful documentation that hindered more than it helped. I've seen system diagrams literally consisting of a square box with "server" written in it. Protocol documents that contradict themselves on the same page. I once was double-checking the wiring diagram for a cradle for a processor board, and disagreed with the voltage of a single pin; it turned out I and the other chap read the exact same line of text from the board manufacturer and interpreted it in completely opposite ways - one of us interpreted it as meaning the pin should be grounded, the other that it should be connected to the live rail. The documentation was so bad we ended up having to go back to the manufacturer and get a human on the line willing to go on record with the correct value.

What kind of software was this?

I've been on a similar project where we would get beautiful requirements from the client and implement them flawlessly on time, every time. I've also puzzled over this a lot and wondered why other projects can't be the same. The thing missing from the story is the complexity of the software and how long the client spent up front designing those requirements before handing them off. These were very trivial applications with few states, no UI, and clear inputs and outputs, nowhere near a big modern web service. If our implementation was according to spec but not what the end customer really needed, it would still not be counted as a bug; it was the client's responsibility for writing a bad spec, and the client was very much aware of this, so we also hardly ever got any bug reports in that sense.

It was waterfall in a nutshell, assuming the first design is correct and resource-optimizing every step of the way for it. You couldn't replicate that process on an app with a big UI footprint and quickly changing requirements, which calls for more iteration rather than upfront design.

>which was done using literal programming

I presume you meant "literate programming" (the one invented by Don Knuth)?

Regarding this project, can you share some more details as to how everything was done from requirements to implementation? How were they seamlessly linked together?

Yes I did mean that. We used noweb.

Let me probe my memory...

Requirements were developed first by the customer and our requirements specialists, working together. That was a big priority. Requirements were broken down into individual elements, examined for contradictions and ambiguities, they had to be discrete, they had to be testable, all that sort of thing. This required domain knowledge as well as just being thorough and having attention to detail. As one would expect when writing software, it turns out that if you don't know what it's meant to do, or you misunderstand what it's meant to do, the odds of building the right thing are very low. This is something I see very commonly in many software companies; it's very common to start building things without knowing what they're meant to do. Sometimes it's impossible to know at the start what they're meant to do, but I do wonder if we could be a little harder up front about demanding to know what it's meant to do. Agile seems to be a way to control against that risk; a way of building software without knowing at the start what it's meant to do but getting there by frequent course correction.

Once written down and formally agreed by the customer, each little requirement was given a unique identifier. That unique identifier was carried through the design and implementation and tests. The literate programming techniques used and the format of the design documents and test documents typically mean that a sidebar on each page showed the exact requirements that a given section was implementing; an index would show all such places, so if I wanted to know how we were going to meet requirement JAD-431, I could look up all such pages and satisfy myself that the design met it, that the implementation met it, that the tests covered it. Reviews would involve someone having a copy of the requirements any given design or implementation was meant to meet, and ensuring that every one of them was indeed met.
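A faint echo of that traceability is possible even in an ordinary test suite, by tagging tests with requirement IDs and indexing them so "how do we meet JAD-431?" becomes a lookup. A minimal sketch in Python; the decorator, the `to_feet` function, and the requirement text are all invented for illustration:

```python
# Sketch: carrying requirement IDs through a test suite so an index can
# answer "which tests cover requirement JAD-431?". All names are invented.
REQUIREMENT_INDEX = {}

def covers(*req_ids):
    """Decorator recording which requirements a test helps verify."""
    def wrap(fn):
        for rid in req_ids:
            REQUIREMENT_INDEX.setdefault(rid, []).append(fn.__name__)
        return fn
    return wrap

def to_feet(metres):
    # Invented example function under test.
    return round(metres * 3.28084)

@covers("JAD-431")
def test_altitude_reported_in_feet():
    # JAD-431 (invented): altitude shall be displayed in feet.
    assert to_feet(1000) == 3281

print(REQUIREMENT_INDEX["JAD-431"])  # ['test_altitude_reported_in_feet']
```

This is nothing like the full literate-programming pipeline described above, but even this much lets a reviewer hold a requirement list in one hand and mechanically check that every ID appears in the index.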

I remember printing out test documents, and executing them (including the build; each test document specified the versions of source code to be used and the tools used to build it, and often the exact build steps), and on each page where I signed and dated it to confirm that I had conducted the test and it had passed, there was a listing of the requirements being tested (or partly tested). That sure gives someone pause; when I signed my own name in ink to give my personal guarantee that the test passed and was all in order, I sure didn't accept any ambiguities or shoddiness. Those signed, dated test documents would get a double-check by QA and if that was the final test, sealed in an envelope with a signature over the seal, and securely stored against the customer performing a random inspection or against anyone ever needing to come back and verify that for a given version, at a given time, the tests did pass.

If a requirement ever changed (which did happen, and we sure did charge for it!), it was thus easy to identify everything else that would need to change. The effect of the change would ripple through the design and implementation and tests, linked by the affected requirement unique identifiers; each piece thence being examined, updated and confirmed as once again meeting the requirements, with confidence that nothing had been overlooked.

While I've really enjoyed your posts in this thread, your anecdotes do highlight something very important: quality software is expensive. Also, most clients, be they internal to the org or external, don't really know what they want and certainly don't have the wherewithal to supply detailed, testable requirements up front.

Where I think most shops touting "Agile" miss the mark is the customer collaboration part. Extracting the detailed requirements you've described would take a lot of upfront time and effort that I've found most "Agile" shops don't have the stomach for. Where the rubber usually meets the road is in "Grooming" sessions. These sessions are typically rush jobs to move Stories to "Ready for Dev" but what I've experienced is there's rarely adequate information to proceed to development after a typical Grooming session.

> most clients... don't really know what they want and certainly don't have the wherewithal to supply detailed, testable requirements up front.

Very much so. To do this well, one requires something that is in very short supply: high-quality, competent customers.

Nice, it seems like a simple process done rigorously. No highfalutin buzzwords (compared to Agile/Scrum :-) but a systematic approach using common-sense, well-known processes. In essence: a) utmost focus on requirements gathering, b) literate programming techniques to weave design into implementation and possibly into testing/QA, c) a "primary key" carried throughout to trace everything back to requirements.

> It certainly wasn't easy. It required rigor and discipline and review panels

In other words, it took... engineering.

What company was this? I’d love to see a talk about how this worked

This was a software specialist UK outpost of a second-tier US defence company, about a decade ago. They developed their process while an independent software company, and were subsequently eaten by that US defence company (before I joined) who had the good sense to leave them alone to do what they do.

I hear you. But I'm not convinced that there's ever an okay time to skip writing down design, requirements, etc. Even if they're rough in a wiki page and it takes 30 minutes. It simply doesn't belong in your head alone.

I'm certainly not advocating for adding insane amounts of process to little jobs. But show your work. It's worth most of the credit.

Sure, the author needs to write things down; often commit messages and block comments are enough, sometimes more is needed. If anything we are heavily under-invested in solo documentation-writing.

What drives me up the wall is starting a long, formal document and having a bunch of meetings about it, seeking input and consensus from 10+ people and committees, for something with the scope of a couple days. That process is designed for large, multi-engineer-month new systems, or decisions that will be hard to back out of and have ramifications for hundreds of people over years. But I often see junior engineers putting the wheels in motion for basic day-to-day maintenance and feature iteration tasks.

It doesn't help that our promotion process incentivizes this heavily, because it generates lots of evidence for the committee to evaluate.

I think Naur would say that design documents can't capture the program any more than the source code does. The program lives in the minds of the people who make it. It's like a distributed system with no non-volatile storage. When the last node goes down, end of system.

Yes, he says this pretty explicitly in the essay. Some relevant excerpts:

"Programming in this sense primarily must be the programmers' building up of knowledge of a certain kind, knowledge taken to be basically the programmers' immediate possession, any documentation being an auxiliary, secondary product." (emphasis mine)

"The death of a program happens when the programmer team possessing its theory is dissolved. A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered."

But we "Value working software over comprehensive documentation".

I don't fully agree with that; I mean, a program will keep running even if no one knows how it works anymore. But yes, knowing how it works and why is highly valuable, and expensive to restore if lost.

> I mean a program will keep running even if no one knows how it works anymore.

Not if it doesn't compile anymore because of a switch in hardware. I've seen this firsthand; same exact code but results were off by an order of magnitude.

As Naur pointed out in his article, just keeping the existing program running in its original state isn't enough. The program reflects the needs of the people using it, and those needs evolve over time.

For example, the software that runs a bank needs to be constantly updated to comply with current laws and regulations and to support the new types of accounts and services being offered by the bank. Over time, customers wanted to have access to their accounts through ATMs, then over the web, and later via mobile apps.

Sounds like Platonic Forms[1], but for applied computer science.

Skimming Naur's essay, the solution to this problem seems like it could involve building higher level code analysis into the language itself (making the linter part of the language and development environment).

Microsoft is doing this to some degree with compiler extensions on the .NET platform [2]. When used creatively, you can effectively deploy a custom linter with your libraries which emits compiler errors and warnings based on how the user is doing things (ie I think you're trying to do X, have you considered using this new functionality instead?). However, I think it is interesting to consider what it would look like if statically typed code and style analysis/suggestions were a first class citizen in a programming language, not just an add on for advanced users. If this exists now, I'd love to check it out.
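As a rough illustration of the general idea (not the .NET analyzer mechanism linked above), here is what a tiny project-specific lint pass looks like when the language ships with an introspectable syntax tree, as Python does via its standard `ast` module. The rule itself (flag bare `except:` clauses) is just an example:

```python
import ast

# Sketch of a project-specific lint pass built on Python's stdlib ast
# module. The rule ("flag bare `except:` clauses") is only an example.
def find_bare_excepts(source):
    """Return line numbers of bare `except:` clauses in the given source."""
    tree = ast.parse(source)
    return [node.lineno
            for node in ast.walk(tree)
            if isinstance(node, ast.ExceptHandler) and node.type is None]

code = """
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(code))  # [4]
```

Making this kind of check a first-class part of the toolchain, rather than a bolt-on, is essentially what the parent is wishing for.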

It's too bad we won't all live to see the inevitable improvement in computational notation over the next few hundred years. Time to binge watch some Alan Kay talks and try to imagine it.

[1] https://en.wikipedia.org/wiki/Theory_of_forms [2] https://docs.microsoft.com/en-us/visualstudio/code-quality/r...

There are worse things than no documentation when doing one of these excavations. Design docs that were aspirational and only partially implemented. Code comments that are incorrect, or reflect how things used to work. Reams of beautiful code and thorough documentation that, it turns out, are no longer the production implementation because of a feature-flag change or a migration elsewhere. Migration trackers with open action items and deadlines 7 years in the past, creepily abandoned mid-flight like Pripyat.

Detective work can go a long way, but in situations like this there's no substitute for ownership continuity and institutional memory.

See Conway's Law:


organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.

- M. Conway

This happened to me on a project. I spent months documenting the Byzantine labyrinth of hundreds of interwoven services that had been patched for a decade by a hundred developers. Much of it had been done under pair programming, so it required pair programming, or a 120 IQ, just to decipher.

In the end I was squeezed out not because I was wrong, but because my method of solving longstanding issues from first principles didn't fit within the company politics of the client's organization.

The gist of it is that clients mostly see short term personal productivity and not the long term gains from removing technical debt or avoiding it altogether by way of the developer's experience. Senior engineers can eventually reach a point where there isn't much for them to contribute on a project because every last issue could have been avoided through proper planning and engineering in the first place.

In my case, my greatest strengths (like refactoring ability) were overlooked and my greatest weaknesses (like implementing features at high speed by repeating myself or ignoring project conventions because I don't know any better - something I can no longer do) were brought to the forefront.

A year ago I thought Conway's law was nonsense, but recent experience has led me to walk some of that back. I would replace "are constrained to" with "grow to", or perhaps just "often".

On my first day of my current job, Mufasa (my boss) took me up on Pride Rock at sunrise, and we cast our eyes over the system diagram of the back-end supporting our product. And he told me, "We work on this box here." Our team's box gets data from that team's box, and we output data to be consumed by this other team's box...

"And what about that shadowy land?" I asked, pointing to the arrows coming from outside the diagram.

"That is the Front-end. You must never go there."

So yeah, Conway's law through and through. This is a big company, though, and a long-lived product. At my last job, at a startup that was ~3 years old, things were intentionally very different. This month you and a few others worked on feature X. Not on this system or that system, you were implementing something for the customer. Maybe you're a backend engineer, but you still wrote the database migration, the queries, the endpoints, the front-end logic, markup, CSS, whatever. Some of it you sucked at, but a lot of that was handled by good team composition. If another engineer on that feature knew CSS better, they mostly styled the things and you reviewed their work. Maybe you styled the next thing.

I was frustrated in that job, because I didn't like writing CSS, and I wasn't good at it, and I was very good at some other things. Why couldn't I just do those things I was good at? It would have kept me a lot more productive, and saved (and made) the company a lot of money! Why did I have to do things I was bad at? Did they think engineers were interchangeable?

Well, now I know. In my current job, working on five different features in my box in the system diagram, I'm super productive when pointed in the right direction with enough to work on and some autonomy. That just doesn't happen as often. We need to communicate between teams a lot more. We need to align our quarterly deliverables. We duplicate functionality on both sides of an interface, (or in each of N components), with subtly different semantics, because nobody technical is responsible for the feature. Show-stopping bugs before launch are a fact of life, because integration tests between our components suck.

Generously (though for other reasons too), we're 1/3 as productive as my last gig was. Conway's law is strangling us.

Legacy code from 2009. I'm getting old.

I'm still running/maintaining code I originally wrote in the 1980's.


Wow, I didn't realize D went back that far. I thought C++ came in the early 90s and D naturally followed as a "better version" after that. Very cool.

It was originally for a C compiler, and work started on it in 1983 or so.

Admittedly playing around with the definition of "legacy code" (and I get what you meant - I'm just playing with the words!), this particular set of code is legacy code from 2009, but I think I could argue that some of the world's legacy code was written yesterday, and there is code from decades ago that is not legacy code.

This particular set, though, was legacy code pretty much the week after it was written :(

I have played with this idea a lot recently. I consider almost all code out there as legacy. I was talking about it at a conference, also explaining how I like working with legacy code. (https://youtu.be/n0XCMHrSbwc) I think I'm the odd one who actually likes working with legacy code.

> I think I'm the odd one who actually likes working with legacy code.

I had a fun bit of legacy code wrangling a month or so back. I looked at production profiles of some very compute-heavy jobs (we have great tools for that) and found we were spending ~half of my pay on cloud compute in this one library function -- deep equality checks for a kind of object.

The equality-checking code itself is very slow, significantly slower than hand-written alternatives, but the objects are complicated (though luckily tree-like in structure) so hand-writing equality comparison functions would be a nightmare.

Many of these structures had "id" fields, though. Gasp, could we just compare ids instead of doing deep equality checks?

So I asked around, and nobody knew. Nobody even knew what the code was for. It'd be trivial to replace the deep equality checks with id comparisons, but we can't do it.

Seeing how some legacy code works is one thing: data comes in here, data goes out there. Knowing which invariants the code is trying to maintain, though, and what properties it assumes about the data here or there (Sorted? Unique? Up to date? All belonging to the same user?) is what makes changes difficult. It means you need to write defensive code that the original authors would find ridiculous, because you don't have their institutional knowledge, can't derive it locally from the code, and can't inspect it at runtime (or be sure that prod won't send a counter-example the day after you deploy).
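In a case like the id-vs-deep-equality one above, one defensive middle ground is to take the cheap comparison but keep the deep check as a sampled assertion, so the day the "same id implies same contents" assumption stops holding, you get a loud failure instead of silent corruption. A sketch, with all names invented:

```python
import random

def deep_equal(a, b):
    # Stand-in for the expensive recursive comparison from the story.
    return a == b

def records_equal(a, b, sample_rate=0.01):
    """Compare by id, but deep-check a random sample of calls so a
    violated 'same id implies same contents' assumption fails loudly."""
    fast = a["id"] == b["id"]
    if random.random() < sample_rate:
        if fast != deep_equal(a, b):
            raise AssertionError(
                f"id comparison disagrees with deep equality for id {a['id']}")
    return fast
```

The sampling keeps most of the cost savings while turning the unknown invariant into something you can observe in production, rather than something you have to take on faith.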

To me, "legacy" just means undocumented code that isn't well understood by anyone currently working in the code base. By that definition, pretty much anything written by your average contractor is legacy code.

I'm pretty sure I'm writing some legacy code from 2019 right now.

Every time I go back to code I had previously worked on, I go, "Did I write this!?" Every day is legacy code! Truth.

Although this is normal in the sense that it often happens, it should not be considered normal in the sense of being a situation that should be allowed to endure. You should educate yourself up to the point where it is no longer true. The thing that greatly increased my quality as a programmer was learning TDD, along with the book 'Working Effectively with Legacy Code' by Michael Feathers. It might be something different for you, but you should go find out what it is.

Great book. I think a lot of people want to test but lack the skills. Testing software is actually quite difficult: you have to know what to test, then how. I love TDD, but I find it takes too long for businesses to tolerate, so usually I have to test afterwards to hit deadlines. TDD works great for pure functions with logic. But things like UI integration tests (not to be confused with functional tests) are very time-consuming to write with red-green-refactor TDD.

I am not sure TDD really takes that much longer. Writing the tests certainly takes time, but if you don't write tests you will have to spend more time testing the software manually. If the test is a sequence of 20 clicks in a web interface and you have to go through it multiple times before it works, that also costs time, and you are probably not covering every scenario that way, so errors can remain undiscovered for some time.

I don't always do TDD in a 100% strict way. The duration of the TDD cycle varies for me mainly depending on how practical a very short cycle is and also depending on how complex the software is. I have found that overly obsessing about very short cycles can be inefficient. Also, I tend to test a cluster of multiple classes instead of a single class or a single method. I think the subject that a test is about should be something at a level that is a bit higher, perhaps even something that the customer could recognize as something they value. If you do not do that you might be testing implementation details that are very much subject to change. Testing at the right level also can save quite a bit of time.

I don't write user interfaces that often and when I do they are mostly not that fancy. In that case I have sometimes written tests that check that the HTML output is correct. In the case of things that are purely visible I quite often first fix the thing and then write a test because the browser in my head is not always quite good enough to write a correct test beforehand.
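A test that checks HTML output, written after the visual fix, can be as plain as an equality assertion on the rendered string. A minimal sketch; `render_greeting` and its markup are invented for illustration:

```python
# Sketch: asserting on HTML output directly, written after the visual
# behaviour was fixed in the browser. render_greeting is an invented
# example render function.
def render_greeting(name):
    return f'<p class="greeting">Hello, {name}!</p>'

def test_greeting_markup():
    assert render_greeting("Ada") == '<p class="greeting">Hello, Ada!</p>'

test_greeting_markup()
```

The test doesn't prove the page looks right, but it does lock in the markup once a human has confirmed it, which is exactly the "fix first, then test" order described above.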

I have generally found that management was encouraging towards writing tests in the places where I have worked. Occasionally I have been praised for writing high-quality software, but I have also sometimes got the remark that things were taking longer than expected. Everybody should be able to understand that there is a trade-off here. I would not want to work in a place where they produce as much crap as possible in as little time as possible. Lack of quality is quite hard on me emotionally.

I think the problem with TDD is it assumes you know what shape your functions and classes and data structures will take before you start coding. This is rarely the case, and if it were easy then CASE tools would have won over the world in the 90s. They didn't and people still code in typeless scripting languages.

I think it's this glossing over the inherent and deep complexity of software development that turns a lot of people off TDD.

What's wrong with getting the software right, often through a process of trial and error, and then writing the tests to lock in what you have working?

"TDD [...] assumes you know what shape your functions and classes and data structures will take before you start coding". Absolutely not. In a greenfield project you might start TDD with an 'architecture' consisting of only one function. Since a single function is a very bad architecture for anything except something extremely simple, at some point refactoring would split that function into multiple functions, make a class of it, or do something else again. The main difference from the CASE approach is that TDD does not presume we can know an adequate architecture beforehand, but instead lets us figure it out as we go; i.e., decide things when we actually have the information to decide them. TDD is actually the only method of software development I know of that respects the complexities involved in software development.

One might write test afterwards but as I note above tests do cost time so one should aim for the maximum payback of that time. When you start with the test it starts paying back as soon as possible starting from the point where it flags your first attempt making the test pass as not quite working.

One case where it may be better to start with some trial and error is when it is not clear what algorithm should be used.

> In the case of things that are purely visible I quite often first fix the thing and then write a test because the browser in my head is not always quite good enough to write a correct test beforehand.

This is what I was referring to.

Engineers are often ready to scrap things and start over, but sometimes it’s worth having a couple senior/principal/staff level folks do a quick evaluation and figure out if you’re needlessly duct taping an unmanageable piece of service/application/whatever abstraction and actually scrapping it might be the better option.

From my own experience, you don’t actually know how the existing system works, the expected business rules were never documented, etc, and so even starting over isn’t really a less ambiguous path.

> so even starting over isn't really a less ambiguous path.

Sure isn't. We don't know what the software does right now, but there are customers all over the world using it and paying tens of thousands per year for support contracts; since we don't know quite what it does, we don't know quite what they're doing with it. We can pretty much guarantee that if we started over, we would miss a great many requirements (and a great many bugs that happen to work for various specific customers) and a lot of customers would be very unhappy. We don't know what it does, they don't know what it does, but it's working for them at the moment.

Oh Gods, even the unit tests. There are some. Sometimes one fails. We stare at it. It was expecting this value to be 3, now the value is 4. Why was it expected to be 3? Is it wrong that it's now 4, or have we changed the behaviour such that the correct answer is now 4 and we need to change the test? Undocumented unit tests are worthless. I'm not asking for a lot of documentation - just commentary in the code next to them would be fine - but if I can't see what's actually being tested and how that "correct" value was ascertained, when it fails, I don't know if the bug is in the code or the test. Useless unit test.
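The kind of inline commentary being asked for doesn't need to be elaborate. A sketch, with hypothetical names, of a test whose expected value carries its own justification:

```python
# Hypothetical business rule under test: weekend days are never billable.
def billable_days(total_days, weekend_days):
    return total_days - weekend_days

def test_billable_days():
    # Expected value: 7 calendar days minus 2 weekend days = 5 billable days.
    # The "correct" answer 5 comes from the contract rule that weekends are
    # never billable - NOT from running the code once and pasting whatever
    # it happened to return. If this fails, check the rule first.
    assert billable_days(7, 2) == 5

test_billable_days()
```

When a test like this fails, the next maintainer can tell whether the bug is in the code or in the test, because the provenance of the expected value is written down next to it.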

I rewrote an internal tool from scratch to replace an old one that was being used by a few thousand people. It turned out that fully half of the users only ever used it to see the auto-filled partial results that populated into one of the form elements. That particular requirement never made it into the specification for the new tool, and the rollout of the replacement did not go well, to put it mildly.

In fact, there was so much of a mismatch between what these users thought the tool was for and what it seemed to be for from an engineering perspective that it was extremely hard to get any more specific feedback other than “The new one is terrible; I can’t get it to do anything”.

So you probably want to log the requests and figure out what people are doing.

There are no requests to log. It's not a web app.

It's installed on anywhere from one to many networked Windows machines on a given customer site, where that network is isolated and contains numerous IP- and serial-cable-connected pieces of specialist hardware (no two customers have the same set of hardware); we make them dance together. There is no interaction outside that network.

Fair enough, that does make the problem harder.

I once insisted on these comments as part of the deliverable to an outsource team.

I got an app full of comments like:

  assert(x == 3); //x should be 3 here

When I have some time available, but not enough to really get into something, I just pick up a piece of code and try to improve it. Over time, I hope this leads to the gradual removal of the cruft that has accumulated.

At my shop some guy created a new UI framework. Three years later he's still the only one who can use it. It looks pretty, so it's impossible to convince management that it's done more harm than good.

You could start with regression testing: each time you break something, write a test so that you don't break the same thing again. After a while you will have a decent test suite. Also write tests for all new stuff, and for everything you change. It would also be a good idea to start using version control, so that each change is documented in its commit message. Both tests and SCM can be adopted at any stage, even on legacy code. Another strategy would be to move some functionality into a microservice: for example, instead of implementing new feature X in the monolith, make it a standalone service that is not only decoupled from the monolith but doesn't even share its database - completely standalone.
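The "write a test each time you break something" idea is cheap to start. A hypothetical sketch (`parse_port` and the whitespace bug are invented for illustration) of a regression test that pins down one previously fixed bug:

```python
# Hypothetical legacy helper: reads a port number from a config value.
def parse_port(value):
    # Bug fixed once already: values from config files arrived padded with
    # whitespace and crashed int(). The strip() is load-bearing, and the
    # regression test below exists so the fix can't silently be lost.
    return int(value.strip())

def test_regression_whitespace_port():
    # Regression test for the field-reported bug: padded config values.
    assert parse_port(" 8080 ") == 8080
    # The ordinary case still works too.
    assert parse_port("443") == 443

test_regression_whitespace_port()
```

Each such test costs minutes when the bug is fresh; accumulated over a year or two, they become exactly the "decent test suite" described above.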

While we do use some of those methods, sadly we are not in possession of much of the hardware that our software runs on, making it impossible for us to fully test ourselves. We can usually bank on a period of testing on a customer site during installation, but after that, we're out of luck. Often the hardware claims to support some protocol, but we've found that this really means "a variation on the theme of a given protocol", so ensuring that we still meet the protocol isn't quite a guarantee of success. Customers are often sympathetic to the idea that their hardware doesn't quite work as advertised - but not very sympathetic.

There is often literally no way for us to know that we've broken it until a customer upgrades and discovers it no longer runs their hardware as it used to. Which does happen.

Can you record the actual machine protocol? Then use that recorded data in future regression/integration tests.
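One common shape for this is record/replay at the transport layer. A sketch, assuming the wire traffic can be captured as (request, response) byte pairs - all class and method names here are hypothetical:

```python
class RecordingTransport:
    """Wraps the real transport on-site and logs every exchange."""
    def __init__(self, real_transport):
        self.real = real_transport
        self.log = []  # list of (request, response) byte pairs

    def exchange(self, request):
        response = self.real.exchange(request)
        self.log.append((request, response))
        return response

class ReplayTransport:
    """Feeds a recorded log back during tests - no hardware needed."""
    def __init__(self, log):
        self.log = list(log)

    def exchange(self, request):
        expected_request, response = self.log.pop(0)
        # Fail loudly if the code under test no longer sends what it sent
        # when the recording was made against the real device.
        assert request == expected_request, "protocol drifted from recording"
        return response

# Recorded once on a customer site, replayed forever in regression tests:
replay = ReplayTransport([(b"STATUS?\r\n", b"READY\r\n")])
assert replay.exchange(b"STATUS?\r\n") == b"READY\r\n"
```

This doesn't prove the device's quirky "variation on the theme of a given protocol" still works - only real hardware can do that - but it does catch the case where an upgrade changes what the software sends.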

How do you sell the reverse engineering?

How does "the business" feel about this situation?

How does the team feel about it?

You don't have to sell anything. It's part of getting the job done.

In an ideal world, correct. If this is the case:

> changes can take literally 20 times as long as they should while we work out what the existing code does

I imagine you have to sell the extra time to (aka convince folks this is the right path) _somebody_. Maybe I've been working at the wrong places :) .

> Maybe I've been working at the wrong places :).

I think it's a balance between keeping certain key tasks below the radar of PMs (so you have time to think and act) and doing enough "deliverables" to keep them happy.

Sadly, PMs in many places have ground down that buffer to the point where you're maxed out on your backlog and they know everything that's in it. Or worse, they're creating fake deadlines to fabricate "urgency" so that you're no longer in control of the minutes in your day.

> they're creating fake deadlines to fabricate "urgency"

I've definitely had PMs do this to me when projects were wrapping up and they didn't have any real work to give me. I'd just work at the regular rate and, no surprise, no one squawked when I blew past the deadline for the busy work. It's mismanagement, IMO; with enough experience you start to resent being treated like a precocious child with ADHD.

I think you're fundamentally misunderstanding things. When I've worked on systems like the one EliRivers is describing, you don't have a choice between hacking something together in X time or doing it correctly in 20X time. You have a choice between doing it in 20X time and not doing it at all. If it's a part of the system you haven't touched before, it simply isn't possible to make changes without taking the time to understand what it's currently doing first.

The only part of the process that's optional and could theoretically be skipped is turning your notes into documentation consumable by others, but that's a pretty small portion of the time spent.

Thanks for calling out my misunderstanding.

My real question is, how did the business/management learn that they had "a choice between doing it in 20X time and not doing it at all"? How did those lessons get learned?

No reverse-engineering - no new feature or bugfix. That's all there is to it.

Did the business have to get bit a few times (with, say production outages?) before this became the protocol?

"Production outages" aren't something we have to worry about; we don't run a service and once something is working on a customer site, they basically don't tinker around with the configuration or setup. They just use it. But there have certainly been projects that massively overran because the estimates assumed that pieces that already existed were understood and reliable (sometimes they were reliable; they reliably did something, we just thought they were meant to do something else).

This happens less now that we assume any existing piece that we don't know about already will turn out to be a minefield.

> Pieces are gradually being documented as we work out what they do

Nothing wrong with that, but it's far more important to write TESTS for the code as you understand it.

Once you have tests for a subsystem, you can rewrite, refactor and/or replace it with confidence!
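These are sometimes called characterization tests: before touching a subsystem, you capture what it currently does, whatever that turns out to be. A sketch with an invented `legacy_discount` function standing in for the crufty code:

```python
# Hypothetical legacy code nobody fully understands anymore.
def legacy_discount(price, customer_years):
    if customer_years > 5:
        return price * 0.9
    return price

def test_characterization():
    # These expected values were obtained by RUNNING the current code,
    # not from a spec. They document behaviour, not intent - if a rewrite
    # breaks one, that's a deliberate decision point, not an accident.
    assert legacy_discount(100, 10) == 90.0
    assert legacy_discount(100, 2) == 100
    # Boundary pinned down explicitly: "more than 5 years" is exclusive.
    assert legacy_discount(100, 5) == 100

test_characterization()
```

The comments matter as much as the asserts: they record that the values describe observed behaviour rather than known-correct requirements, which is exactly the distinction the undocumented "expected 3, got 4" test above failed to preserve.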

The most important part is hidden in there:

> with deadlines to meet

the reason the company is there, employing you now, is that deadlines were met.

That is also the reason that the company is now sometimes spending $100,000 on work that should cost $5,000. A few years ago, the whole edifice was dangerously close to becoming literally unmaintainable - collapsing under its own complexity, the interest payments on the technical debt requiring more than 100% of available output.

Those deadlines were frequently missed, by the way. Sometimes by months. I think not quite ever by years.
