What we have now is frequently a mystery. Simple changes are difficult; difficult changes verge on the impossible. Every new feature requires reverse-engineering the existing code. Sometimes literally 95% of the time is spent on that reverse-engineering (no exaggeration - we measured it); changes can take literally 20 times as long as they should while we work out what the existing code does (and, often not quite the same thing, what it's meant to do, which is sometimes simply impossible to ever know).
Pieces are gradually being documented as we work out what they do, but layers of cruft from years gone by - left by people with deadlines to meet and no chance of understanding the existing code - sit like landmines, and sometimes like unbreakable bonds that can never be undone. In our estimates, every time we have to rely on existing functionality that should be rock-solid and completely understood, but that we have not yet fully reverse-engineered, we mark it "high risk, add a month". The time I found that someone had rewritten several pieces of the Qt libraries (without documenting what, or why) was devastating; it took away one of the cornerstones I'd been relying on, the one marked "at least I know I can trust the Qt libraries".
It doesn't matter how smart we are, how skilled a coder we are, how genius our algorithms are; if we write something that can't be understood by the next person to read it, and isn't properly documented somewhere in some way that our fifth replacement can find easily five years later - if we write something of which even the purpose, let alone the implementation, will take someone weeks to reverse engineer - we're writing legacy code on day one and, while we may be skilled programmers, we're truly Godawful software engineers.
The code you write isn't the hard part. It's not the interesting part. And it's not really why you're paid. What I'm trying to instill in you is the discipline to design and document and plan for the future. Saying you can "whip up a script this afternoon" is not impressive or commendable. Without the other pieces, it does harm.
If it's a genuinely large project, the working code is a skeleton POC, and we can run a design process using what we learned from it. But most of the time it's not, and that engineer has saved us collectively hundreds of hours of waste talking to death something that just isn't that hard.
I think one of the critical senior engineer skills is having correct intuitions about how much time and energy should be sunk on a particular task. Sometimes that means asking people to slow down, collaborate, and think. Equally often, it means asking them to put down the calendar invites, make a choice, and bang something out. We have this bizarre cost model where code is expensive and meetings are free; nothing could be further from the truth.
Requirements, design, implementation, testing: all documented thoroughly, all reviewed and checked, each stage linked clearly to the previous and the next. One could literally trace a requirement from the requirements documentation, through the design, into implementation (which was done using literal programming, producing beautiful PDFs for humans to read - there was very little need for anyone to ever actually look at the pure Objective-C code that was fed to the compiler), and then onwards to testing, such that each individual test was linked to proving given requirement(s) that had been written down at the start; sometimes years previously.
The customer was so impressed that they asked us to take over from another supplier whom they were busy suing for lack of delivery of a piece of the same system. Our software was delivered on time, exactly to spec, fully documented from requirements to design to implementation to testing. The whole project ran for most of a decade.
It certainly wasn't easy. It required rigor and discipline and review panels and, I gather, somewhere between ten and twenty years' experience of creating software like that. It worked: it's the only software I've ever worked on in which the customer never reported so much as a single bug. The entire system could be completely understood down to the level of the code by reading the documentation, and when changes were required, the software engineer would begin by reading the documentation and, upon opening the literal style source file, would find it just as expected from the design. I've never seen anything like it before or since.
They went all in, though, and took years to get there. One couldn't just take a modern, agile-style software shop and start doing this. It was culturally ingrained.
I certainly agree that in other places, I've seen some truly awful documentation that hindered more than it helped. I've seen system diagrams literally consisting of a square box with "server" written in it. Protocol documents that contradict themselves on the same page. I once was double-checking the wiring diagram for a cradle for a processor board, and disagreed with the voltage of a single pin; it turned out I and the other chap read the exact same line of text from the board manufacturer and interpreted it in completely opposite ways - one of us interpreted it as meaning the pin should be grounded, the other that it should be connected to the live rail. The documentation was so bad we ended up having to go back to the manufacturer and get a human on the line willing to go on record with the correct value.
I've been on a similar project where we would get beautiful requirements from the client and implement them flawlessly on time, every time. I've also been puzzled by this a lot and wondered why other projects can't be the same. The thing missing from the story is the complexity of the software and how long the client spent upfront on designing those requirements before handing them off. These were very trivial applications with few states, no UI, and clear inputs and outputs - nowhere near a big modern web service. If our implementation was according to spec but not what the end customer really needed, it would still not be counted as a bug; it was the client's responsibility for writing a bad spec. The client was very much aware of this, so we also hardly ever got any bug reports in that sense.
It was waterfall in a nutshell: assuming the first design is correct and resource-optimizing every step of the way for it. You couldn't replicate that process for an app with a big UI footprint and quickly changing requirements, which requires more iteration rather than upfront design.
I presume you meant "Literate Programming" (the technique invented by Don Knuth)?
Regarding this project, can you share some more details as to how everything was done from requirements to implementation? How were they seamlessly linked together?
Let me probe my memory...
Requirements were developed first by the customer and our requirements specialists, working together. That was a big priority. Requirements were broken down into individual elements and examined for contradictions and ambiguities; they had to be discrete, they had to be testable, all that sort of thing. This required domain knowledge as well as just being thorough and having attention to detail. As one would expect when writing software, it turns out that if you don't know what it's meant to do, or you misunderstand what it's meant to do, the odds of building the right thing are very low. This is something I see very commonly in many software companies; it's very common to start building things without knowing what they're meant to do. Sometimes it's impossible to know at the start what they're meant to do, but I do wonder if we could push a little harder up front about demanding to know what it's meant to do. Agile seems to be a way to control against that risk; a way of building software without knowing at the start what it's meant to do but getting there by frequent course correction.
Once written down and formally agreed by the customer, each little requirement was given a unique identifier. That unique identifier was carried through the design and implementation and tests. The literate programming techniques used and the format of the design documents and test documents typically mean that a sidebar on each page showed the exact requirements that a given section was implementing; an index would show all such places, so if I wanted to know how we were going to meet requirement JAD-431, I could look up all such pages and satisfy myself that the design met it, that the implementation met it, that the tests covered it. Reviews would involve someone having a copy of the requirements any given design or implementation was meant to meet, and ensuring that every one of them was indeed met.
I remember printing out test documents, and executing them (including the build; each test document specified the versions of source code to be used and the tools used to build it, and often the exact build steps), and on each page where I signed and dated it to confirm that I had conducted the test and it had passed, there was a listing of the requirements being tested (or partly tested). That sure gives someone pause; when I signed my own name in ink to give my personal guarantee that the test passed and was all in order, I sure didn't accept any ambiguities or shoddiness. Those signed, dated test documents would get a double-check by QA and if that was the final test, sealed in an envelope with a signature over the seal, and securely stored against the customer performing a random inspection or against anyone ever needing to come back and verify that for a given version, at a given time, the tests did pass.
If a requirement ever changed (which did happen, and we sure did charge for it!), it was thus easy to identify everything else that would need to change. The effect of the change would ripple through the design and implementation and tests, linked by the affected requirement unique identifiers; each piece thence being examined, updated and confirmed as once again meeting the requirements, with confidence that nothing had been overlooked.
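As a rough sketch of how that kind of traceability could look in code today (the requirement ID "JAD-431" comes from my comment above; "JAD-432", the decorator, and both tests are invented for illustration):

```python
# Hypothetical sketch: tag each test with the requirement ID(s) it proves,
# so a traceability index can be generated automatically.

REQUIREMENT_INDEX = {}  # requirement ID -> names of tests proving it

def proves(*req_ids):
    """Decorator linking a test to the requirement(s) it covers."""
    def wrap(test_fn):
        for req in req_ids:
            REQUIREMENT_INDEX.setdefault(req, []).append(test_fn.__name__)
        return test_fn
    return wrap

@proves("JAD-431")
def test_altitude_is_rounded_up():
    assert round(1234.6) == 1235

@proves("JAD-431", "JAD-432")
def test_altitude_is_reported_in_feet():
    assert 3 * 100 == 300  # placeholder assertion

# Looking up everything that covers JAD-431:
print(REQUIREMENT_INDEX["JAD-431"])
# -> ['test_altitude_is_rounded_up', 'test_altitude_is_reported_in_feet']
```

The index plays the role of the sidebar described above: given a requirement ID, it answers "which tests prove this?", and a requirement change points straight at the tests that must be re-examined.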
Where I think most shops touting "Agile" miss the mark is the customer collaboration part. Extracting the detailed requirements you've described would take a lot of upfront time and effort that I've found most "Agile" shops don't have the stomach for. Where the rubber usually meets the road is in "Grooming" sessions. These sessions are typically rush jobs to move Stories to "Ready for Dev" but what I've experienced is there's rarely adequate information to proceed to development after a typical Grooming session.
Very much so. To do this well, one requires something that is in very short supply: high-quality, competent customers.
In other words, it took... engineering.
I'm certainly not advocating for adding insane amounts of process to little jobs. But show your work. It's worth most of the credit.
What drives me up the wall is starting a long, formal document and having a bunch of meetings about it, seeking input and consensus from 10+ people and committees, for something with the scope of a couple days. That process is designed for large, multi-engineer-month new systems, or decisions that will be hard to back out of and have ramifications for hundreds of people over years. But I often see junior engineers putting the wheels in motion for basic day-to-day maintenance and feature iteration tasks.
It doesn't help that our promotion process incentivizes this heavily, because it generates lots of evidence for the committee to evaluate.
"Programming in this sense primarily must be the programmers' building up of knowledge of a certain kind, knowledge taken to be basically the programmers' immediate possession, any documentation being an auxiliary, secondary product." (emphasis mine)
"The death of a program happens when the programmer team possessing its theory is dissolved. A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered."
Not if it doesn't compile anymore because of a switch in hardware. I've seen this firsthand; same exact code but results were off by an order of magnitude.
For example, the software that runs a bank needs to be constantly updated to comply with current laws and regulations and to support the new types of accounts and services being offered by the bank. Over time, customers wanted to have access to their accounts through ATMs, then over the web, and later via mobile apps.
Skimming Naur's essay, the solution to this problem seems like it could involve building higher level code analysis into the language itself (making the linter part of the language and development environment).
Microsoft is doing this to some degree with compiler extensions on the .NET platform. When used creatively, you can effectively deploy a custom linter with your libraries which emits compiler errors and warnings based on how the user is doing things (i.e., "I think you're trying to do X; have you considered using this new functionality instead?"). However, I think it is interesting to consider what it would look like if statically typed code and style analysis/suggestions were a first-class citizen in a programming language, not just an add-on for advanced users. If this exists now, I'd love to check it out.
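As a toy illustration of the idea - in Python rather than .NET, using the standard `ast` module, with the `old_api`/`new_api` names invented - a library could ship a lint pass something like this:

```python
# Sketch: a library-shipped lint rule that scans user code for a pattern
# and suggests newer functionality, mimicking a compiler extension.

import ast

def suggest_new_api(source: str) -> list[str]:
    """Warn wherever the caller uses old_api()."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "old_api"):
            warnings.append(
                f"line {node.lineno}: it looks like you're calling old_api(); "
                "consider new_api() instead")
    return warnings

user_code = "x = old_api(1)\ny = new_api(2)\n"
for w in suggest_new_api(user_code):
    print(w)
```

Making something like this a first-class part of the language would presumably mean the compiler runs such rules automatically whenever the library is imported, rather than requiring a separate tool.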
It's too bad we won't all live to see the inevitable improvement in computational notation over the next few hundred years. Time to binge watch some Alan Kay talks and try to imagine it.
Detective work can go a long way, but in situations like this there's no substitute for ownership continuity and institutional memory.
organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.
- M. Conway
This happened to me on a project. I spent months documenting the Byzantine labyrinth of hundreds of interwoven services that had been patched for a decade by a hundred developers. Much of it had been done under pair programming, so it required pair programming - or a 120 IQ - just to decipher.
In the end I was squeezed out not because I was wrong, but because my method of solving longstanding issues from first principles didn't fit within the company politics of the client's organization.
The gist of it is that clients mostly see short term personal productivity and not the long term gains from removing technical debt or avoiding it altogether by way of the developer's experience. Senior engineers can eventually reach a point where there isn't much for them to contribute on a project because every last issue could have been avoided through proper planning and engineering in the first place.
In my case, my greatest strengths (like refactoring ability) were overlooked and my greatest weaknesses (like implementing features at high speed by repeating myself or ignoring project conventions because I don't know any better - something I can no longer do) were brought to the forefront.
On my first day of my current job, Mufasa (my boss) took me up on Pride Rock at sunrise, and we cast our eyes over the system diagram of the back-end supporting our product. And he told me, "We work on this box here." Our team's box gets data from that team's box, and we output data to be consumed by this other team's box...
"And what about that shadowy land?" I asked, pointing to the arrows coming from outside the diagram.
"That is the Front-end. You must never go there."
So yeah, Conway's law through and through. This is a big company, though, and a long-lived product. At my last job, at a startup that was ~3 years old, things were intentionally very different. This month you and a few others worked on feature X. Not on this system or that system, you were implementing something for the customer. Maybe you're a backend engineer, but you still wrote the database migration, the queries, the endpoints, the front-end logic, markup, CSS, whatever. Some of it you sucked at, but a lot of that was handled by good team composition. If another engineer on that feature knew CSS better, they mostly styled the things and you reviewed their work. Maybe you styled the next thing.
I was frustrated in that job, because I didn't like writing CSS, and I wasn't good at it, and I was very good at some other things. Why couldn't I just do those things I was good at? It would have kept me a lot more productive, and saved (and made) the company a lot of money! Why did I have to do things I was bad at? Did they think engineers were interchangeable?
Well, now I know. In my current job, working on five different features in my box in the system diagram, I'm super productive when pointed in the right direction with enough to work on and some autonomy. That just doesn't happen as often. We need to communicate between teams a lot more. We need to align our quarterly deliverables. We duplicate functionality on both sides of an interface (or in each of N components), with subtly different semantics, because nobody technical is responsible for the feature. Show-stopping bugs before launch are a fact of life, because integration tests between our components suck.
Generously (though for other reasons too), we're 1/3 as productive as my last gig was. Conway's law is strangling us.
This particular set, though, was legacy code pretty much the week after it was written :(
I had a fun bit of legacy code wrangling a month or so back. I looked at production profiles of some very compute-heavy jobs (we have great tools for that) and found we were spending ~half of my pay on cloud compute in this one library function -- deep equality checks for a kind of object.
The equality-checking code itself is very slow, significantly slower than hand-written alternatives, but the objects are complicated (though luckily tree-like in structure) so hand-writing equality comparison functions would be a nightmare.
Many of these structures had "id" fields, though. Gasp, could we just compare ids instead of doing deep equality checks?
So I asked around, and nobody knew. Nobody even knew what the code was for. It'd be trivial to replace the deep equality checks with id comparisons, but we can't do it.
Seeing how some legacy code works is one thing: data comes in here, data goes out there. Knowing which invariants the code is trying to maintain, though, and what properties it assumes about the data here or there (Sorted? Unique? Up to date? All belonging to the same user?), is another - and that is what makes changes difficult. It means you need to write defensive code that the original authors would find ridiculous, because you don't have their institutional knowledge, can't derive it locally from the code, and can't inspect it at runtime (or be sure that prod won't send a counter-example the day after you deploy).
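A minimal sketch of the trade-off, with all types invented: the fast id comparison is only correct under an invariant ("same id implies same contents") that nobody could confirm held.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    payload: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def deep_equal(a: Node, b: Node) -> bool:
    """The slow-but-safe check: compares the whole tree."""
    return (a.id == b.id and a.payload == b.payload
            and len(a.children) == len(b.children)
            and all(deep_equal(x, y) for x, y in zip(a.children, b.children)))

def id_equal(a: Node, b: Node) -> bool:
    """The fast check: correct ONLY if equal ids guarantee equal contents."""
    return a.id == b.id

a = Node("n1", {"v": 3})
b = Node("n1", {"v": 4})  # same id, different payload: the invariant is violated
assert deep_equal(a, b) is False
assert id_equal(a, b) is True  # the cheap check would wrongly say "equal"
```

Whether the shortcut is safe depends entirely on an invariant living in someone's head, which is exactly the institutional knowledge that had evaporated.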
I don't always do TDD in a 100% strict way. The duration of the TDD cycle varies for me, mainly depending on how practical a very short cycle is and on how complex the software is. I have found that obsessing over very short cycles can be inefficient. Also, I tend to test a cluster of multiple classes instead of a single class or a single method. I think the subject of a test should sit at a slightly higher level - perhaps even something that the customer could recognize as something they value. Otherwise you might be testing implementation details that are very much subject to change. Testing at the right level can also save quite a bit of time.
I don't write user interfaces that often and when I do they are mostly not that fancy. In that case I have sometimes written tests that check that the HTML output is correct. In the case of things that are purely visible I quite often first fix the thing and then write a test because the browser in my head is not always quite good enough to write a correct test beforehand.
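For what it's worth, such an HTML-output test can be very small; a sketch, with the render function invented for illustration:

```python
# Sketch: checking HTML output directly, as described above.

def render_greeting(name: str) -> str:
    """Stand-in for a simple, not-that-fancy view function."""
    return f"<p class='greeting'>Hello, {name}!</p>"

def test_render_greeting():
    html = render_greeting("Ada")
    assert "Hello, Ada!" in html
    assert html.startswith("<p") and html.endswith("</p>")

test_render_greeting()
```

For the purely visual cases, the same assertions can just as easily be written after fixing the thing, once you've seen in a real browser what "correct" looks like.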
I have generally found that management was encouraging about writing tests in the places where I have worked. Occasionally I have been praised for writing high-quality software, but I have also sometimes got the remark that things were taking longer than expected. Everybody should be able to understand that there is a trade-off here. I would not want to work in a place where they produce as much crap as possible in as little time as possible. Lack of quality is quite hard on me emotionally.
I think it's this glossing over the inherent and deep complexity of software development that turns a lot of people off TDD.
What's wrong with getting the software right, often through a process of trial and error, and then writing the tests to lock in what you have working?
One might write tests afterwards, but as I note above, tests cost time, so one should aim for the maximum payback on that time. When you start with the test, it starts paying back as soon as possible - from the moment it flags your first attempt at making it pass as not quite working.
One case where it may be better to start with some trial and error is when it is not clear what algorithm should be used.
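A minimal sketch of that test-first cycle, with invented names, testing at the level the customer recognizes (an order total, not the storage of line items):

```python
# Step 1 (red): write the test first; it fails because total_price doesn't exist yet.
# Step 2 (green): write the simplest code that makes it pass.
# Step 3: refactor freely, with the test as a safety net.

def test_total_price():
    assert total_price([("apple", 2, 50), ("pear", 1, 80)]) == 180

def total_price(items):
    """items: (name, quantity, unit_price) tuples."""
    return sum(qty * price for _, qty, price in items)

test_total_price()
```

The payback starts immediately: a first attempt that, say, forgot to multiply by quantity would be flagged the moment the test ran.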
This is what I was referring to.
From my own experience, you don’t actually know how the existing system works, the expected business rules were never documented, etc, and so even starting over isn’t really a less ambiguous path.
Sure isn't. We don't know what the software does right now, but there are customers all over the world using it and paying tens of thousands per year for support contracts; since we don't know quite what it does, we don't know quite what they're doing with it. We can pretty much guarantee that if we started over, we would miss a great many requirements (and a great many bugs that happen to work for various specific customers) and a lot of customers would be very unhappy. We don't know what it does, they don't know what it does, but it's working for them at the moment.
Oh Gods, even the unit tests. There are some. Sometimes one fails. We stare at it. It was expecting this value to be 3, now the value is 4. Why was it expected to be 3? Is it wrong that it's now 4, or have we changed the behaviour such that the correct answer is now 4 and we need to change the test? Undocumented unit tests are worthless. I'm not asking for a lot of documentation - just commentary in the code next to them would be fine - but if I can't see what's actually being tested and how that "correct" value was ascertained, when it fails, I don't know if the bug is in the code or the test. Useless unit test.
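To make that concrete, here's a sketch (all names, spec IDs, and values invented) of the difference one comment makes:

```python
def widgets_in_view():
    """Stand-in for the real code under test."""
    return 4

# The kind of test described above: when it fails, nobody knows whether
# the code or the expectation is wrong.
def test_widget_count_bad():
    assert widgets_in_view() == 4

# The same test with the provenance of the "correct" value written down.
def test_widget_count_documented():
    # Spec REQ-12 (invented): the toolbar shows one widget per open document,
    # plus the "new document" widget. The fixture opens three documents, so 3 + 1.
    assert widgets_in_view() == 3 + 1

test_widget_count_bad()
test_widget_count_documented()
```

When the documented version fails at 5, you can at least ask the right question: did a fourth document get opened, did the "new document" widget multiply, or did the spec change?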
In fact, there was so much of a mismatch between what these users thought the tool was for and what it seemed to be for from an engineering perspective that it was extremely hard to get any more specific feedback other than “The new one is terrible; I can’t get it to do anything”.
It's installed on at least one and up to many networked windows machines on a given customer site, where that network is isolated and contains numerous IP and serial cable connected pieces of specialist hardware (no two customers have the same sets of hardware); we make them dance together. There is no interaction outside that network.
I got an app full of comments like:
assert(x == 3); //x should be 3 here
There is often literally no way for us to know that we've broken it until a customer upgrades and discovers it no longer runs their hardware as it used to. Which does happen.
How does "the business" feel about this situation?
How does the team feel about it?
> changes can take literally 20 times as long as they should while we work out what the existing code does
I imagine you have to sell the extra time to (aka convince folks this is the right path) _somebody_. Maybe I've been working at the wrong places :) .
> Maybe I've been working at the wrong places :).
Sadly, PMs in many places have ground down that buffer to the point where you're maxed out on your backlog and they know everything that's in it. Or worse, they're creating fake deadlines to fabricate "urgency" so that you're no longer in control of the minutes in your day.
I've definitely had PMs do this to me when projects were wrapping up and they didn't have any real work to give me. I'd just work at the regular rate and, no surprise, no one squawked when I blew past the deadline for the busy work. It's mismanagement, IMO; with enough experience you start to resent being treated like a precocious child with ADHD.
The only part of the process that's optional and could theoretically be skipped is turning your notes into documentation consumable by others, but that's a pretty small portion of the time spent.
My real question is, how did the business/management learn that they had "a choice between doing it in 20X time and not doing it at all"? How did those lessons get learned?
This happens less now that we assume any existing piece that we don't know about already will turn out to be a minefield.
Nothing wrong with that, but it's far more important to write TESTS for the code as you understand it.
Once you have tests for a subsystem, you can rewrite, refactor and/or replace it with confidence!
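One common shape for such tests is a characterization (or "golden master") test: record what the code does today, without claiming it is "correct", then refactor against that record. A sketch, with the legacy function invented:

```python
def legacy_normalize(s):
    """Stand-in for code we don't fully understand."""
    return s.strip().lower().replace("  ", " ")

# Record current behaviour for a spread of inputs. The golden record only
# pins down behaviour for these inputs; widen it as new cases turn up.
CASES = ["  Hello  World  ", "A", "", "double  space"]
GOLDEN = {c: legacy_normalize(c) for c in CASES}

def refactored_normalize(s):
    """The rewrite must reproduce the recorded behaviour exactly."""
    return " ".join(s.lower().split())

for case, expected in GOLDEN.items():
    assert refactored_normalize(case) == expected
```

The point is that you never had to decide what the function *should* do; you only had to preserve what it *does*, which is exactly the confidence needed for a rewrite of poorly understood code.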
> with deadlines to meet
the reason the company is there, employing you now, is that deadlines were met.
Those deadlines were frequently missed, by the way. Sometimes by months. I think not quite ever by years.