The process probably looks silly in this case because nothing went wrong. But when a hard-coded variable is changed, who is to say it won't break some other system? And if the variable is then moved to a config file, who's to say it won't break in some obscure way, such as when the config file isn't found in the UAT/test environment? Additionally, after all this, the manager can change the variable in the config file as they please without the whole process needing to be repeated.
Why not take this:
> Shirley (Code Review): It is now against company policy to have any hard-coded variables. You will have to make this a record in the Parameters file. Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production.
and add at the end:
> However, I'll allow the change for now because this is an important update that affects people's jobs. I want to see every issue fixed before the end of the week, and I will not approve any other changes until these issues are resolved.
Working in big enterprise shops, I've noticed that that attitude does periodically take hold, and the result is that the firefighting ends up producing more fires than it puts out.
If you could change most of the people, culture, and incentives in a non-software-centered enterprise all at once (and this typically goes far beyond the IT shop), sure, maybe more fluidity in software development and release process would work better. But that's usually not possible, nor is it usually possible to durably wall off a group within an enterprise to isolate it from the problems of the larger organization (though that does happen sometimes.)
But there's a huge swath of dev shops that never pay any technical debt back, and the compound interest on that technical debt consumes all productivity. Past a certain point, drawing a line in the sand and forcing better practices no matter how easy it is to rationalize 'one more hack' is the only way to break that cycle.
Engineers often talk about technical debt as if the codebase is the currency of the realm. It's not. Business Value is the currency of the realm. The code is something that facilitates business value, when (and only when) it does what the business needs. The people working on the code have a customer, the business. The business is locked in to a single provider for that service. And the service provider has grown complacent, prioritizing its own needs and processes ahead of the needs of the customer. They're behaving like other entrenched and untouchable service providers do, in monopoly industries like government and cable companies. Deal with my service levels, or suck it.
There are many customer-obsessed engineers who know how to strike this balance, but this is one of the many issues where there's selection bias in who will take the time to participate in the discussion. People who think this is a powerful story of IT done right need to broaden their minds, stop playing around on Hacker News, and start thinking more broadly about how software service providers can do a better job of learning how to say "yes" faster.
"Technical Debt" is a term that exists largely to communicate the idea that the thing it represents increases the dollar cost of both operating the software to deliver constant business value in a steady state, and of delivering changes to the software to continue to deliver the same value as business needs change or to realize additional opportunities to deliver new value.
It is, absolutely and centrally, about business value; ignoring it in decision-making is exactly like ignoring maintenance/operations costs in a capital purchase and considering only acquisition costs.
The catch here, though, is that the rate of business value production is inversely proportional to the amount of tech debt. Meaning a ton of yeses today will mean a grinding halt in a couple of years. I have worked on too many (very successful) teams now, and watched many others, that have gotten themselves into this situation.
After this point, you end up in a place where tech debt repayment efforts can take years, which also coincides with the 4-5 year mark when the original developers are tired and leave. End result: a giant legacy system operated by a ton of new folks who have no clue what's up, which eventually gets 'deprecated' and replaced by a brand new one.
Data is gold, but I do think code needs to be incrementally ripped out and replaced every 3-5 years or so. Otherwise entropy is too crippling on the system.
We don't need more "yes men", we need more people able and willing to say 'Here are the trade-offs', and work with the business from there.
Yes, it absolutely is about Business Value, and the best Business Value is not always "yes" (of course, sometimes it is).
... speaking as someone currently dealing with and seeing the Business Value impact of technical debt.
There's so much technical debt in the old app that the first change I made to it 5 years ago caused it to stop compiling: the single line I added to a function made it too big to compile.
If you work in a shop that has been around long enough to lose its training wheels and get a driving licence, then you have technical debt that's core functionality everyone is scared of breaking.
The end result of that being that you couldn't really change _anything_ without breaking one of N client installs (each custom), and you couldn't reason about which one touched a particular part of the code, or what assumptions it made.
That company was all but dead inside two years. There were other factors leading to its demise, but far and away the biggest issue was the crippling technical debt that was piled on with reckless abandon.
There are definitely trade-offs to be made, and sometimes it makes sense to take on some technical debt. Your example of back-office apps may be one of them, but there's certainly a subset of software development for which this is suicide at both the business and the technical level.
I usually find that customers:
- Prefer getting features early, even if unpolished, meaning ship something broken and fix it later rather than polish and ship something perfect.
- Prefer many complex features over simplicity/performance in the product.
- Prefer having more bugs but a quick turnaround over having few bugs and a long turnaround.
In all the enterprise scenarios I have worked in, I have tried to stress the importance of simplicity etc., but really all customers have stressed is that they don't mind bugs, crashes or poor performance as long as they can get their work done, and that might depend on that last feature or putting out that last fire.
Whereas if you came back to them with a polished product with no bugs, they would ask: why am I spending all this money for this much time?
Any improvements you make down there just rack up time spent, which makes the bills higher, and that they will notice.
They do not see your improvements, but they will notice if your much betterer code now introduces a new bug.
Higher bills than your predecessor, slower fixes, and new bugs: that is the moment to worry about getting your bills paid at all, not about fixing fast.
The customers we want are those that aren't customers yet. Those customers require features that we don't have yet in order for them to become customers.
It might be the right decision as a business to screw the existing customers by accumulating tech debt in order to grab new market share before the competition does (again, this is in a context where a customer has no chance of leaving once on board, which is more or less the case in all the enterprise scenarios I have experience with, both web and desktop).
I hate this mentality and as a developer it's terrible - but I can't say that it's the wrong decision to make short term, business wise.
In addition, I am not in favor of accumulating technical debt; I suppose I wasn't clear. The article talked about a 1-line fix, and that by itself should not trigger a rewrite of a feature, especially not when the fix has to be done now, not next week.
I have never seen that happen, but then I have never worked on invoiced/project work, only long term enterprise/product dev. Customers license the software, but I suppose there could be some SLA type agreement also for desktop software (which happens to be the case).
There is something that isn't right here. I am not sure why I am feeling this.
It's a good thing.
If the company policy should be ignored, then the issue should have been escalated to someone with the authority to make that decision.
IMO the real problem is that the submitter didn't realize that was the case and so didn't escalate sooner. This could have all been avoided by asking for (and getting) carte blanche from Philip or David in the first place.
Talk about policy and it's politics and escalation and such. Talk about principles and it becomes a moral issue.
Alternatively, think of it as breaking a specific law (such as the speed limit) vs being immoral in a more general sense of the word.
Decisions should be made at the lowest level where all the relevant information is available. In this case, the executive understands the pressing reasons for making the change, so is properly able to weigh the incremental risk (with the technical advice of Shirley et al) vs. the business priority, whereas Shirley can only see half of the picture.
Imagine if Shirley had gone ahead and broken policy, then something went badly wrong. She likely would've been blamed for the whole fiasco.
This is what I was looking for in this entire thread. It's one thing to push through a small change in a small internal program for something that isn't critical like a batch job that clears old data from a server. That's not what this is.
It changes the potential output of the entire company. Yes, it's a single LOC totaling out to a single byte. Yes, it passed all of the tests. However, tests are written by people, and tests are only as good as the people who write them. This is partially why there are human eyes in the QA/IT/UAT phases. Shirley would have been viewed as the single point of failure had something gone wrong, because she made the decision to break policy.
For this reason, I always advocate reversing the numbers: the higher the P number, the higher the priority. That way you never run out of numbers to express how much more urgent and earth-shattering your next bug is. Try it!
So they introduced a new priority level: "Critical". The pattern repeated itself in short order.
It was a tragedy of the commons thing, and eventually it was literally impossible for a job to ever finish if it was marked anything other than "Critical".
So they introduced "Omega Critical". Now you needed to be on a permissions list maintained by the database team to raise something to "Omega Critical". I got on the list (because I was scheduling actually high-priority stuff). Then everyone in my org would come to me and ask me to set random bullshit to Omega Critical, because they knew I had permissions and "one more job won't hurt"...
I don't work there anymore, but I believe they have since developed even more weird categories like "Q4 Omega Critical" and who knows what else.
1. VP bangs heads with the equivalent VP in the other department, complaining there are too many requests. Can work; depends on the individual.
2. Actually have a service level agreement. Requests are limited to X per hour/day/week/whatever the process needs, and billed accordingly. Have a bit of buffer in this, a little creative billing. Requests within SLA are dealt with in agreed turnaround time. Have a dashboard that clearly shows this to customers. Alert customers if they're consistently submitting what's outside of SLA, and they will receive a degradation of service unless they provide further funding.
Everyone knows if you pour too much water into a cup it will overflow.
Put a checkbox: "This is an emergency." People are suddenly less likely to click it.
And anyway, if it really were an emergency, they'd have called, so it still doesn't matter; but it makes them happy :D
(And I managed to check it within my first month there. My boss kindly let me know the difference between a client saying emergency and a VP saying so....)
To get to the front of the queue, you had to mark your request "hot and urgent for the whole team"
Another great thing about inverting the priority numbers is that nothing can easily go beneath a floor of 0, though ultimately all of this is about fixing the symptom and not the cause.
Every single fucking time the PO thinks the deadline is too late, she tries to remove nice-to-have issues - sorry, but that doesn't change the deadline.
However, people soon discovered that priorities were stored as a float in the database and they could enter any number 0 < n <= 5 in the field. So bugs quickly started to get priorities like 0.5, 0.25, 0.1, 0.05 and so on...
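Once spotted, that loophole is cheap to close at the schema level. A minimal sketch, assuming something PostgreSQL-like and a hypothetical bugs table:

    -- Round the smuggled-in fractional priorities back up to whole
    -- numbers, then make fractional values impossible going forward.
    ALTER TABLE bugs
      ALTER COLUMN priority TYPE integer USING ceil(priority)::integer;
    ALTER TABLE bugs
      ADD CONSTRAINT priority_range CHECK (priority BETWEEN 1 AND 5);

Though, as with inverting the numbers, this fixes the symptom and not the cause.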
Sales-driven-development is lots and lots of fun.
UPDATE jira SET priority=NULL WHERE 1=1;
Just take all tickets that are "urgent", remove the urgent and put them at the back of the queue.
I worked at a company where the technical debt was so deep that I really wanted to say it was technically bankrupt, meaning the software should be entirely rewritten. Declaring that goes against every grain of my experience, and we ended up not declaring technical bankruptcy, but if you want to understand where it starts, it's "we'll do it later."
Each of those decisions needs to be accounted for just like a credit card purchase.
It is easy to forget what the real world looks like when staring at a screen all day. We take out tech debt like a mortgage - sometimes that debt doesn't make sense and we end up having to foreclose on it, other times we're able to pay it off slowly over time and it all works out.
Where the analogy to debt falls down is that sometimes you luck out and don't have to repay tech debt. Maybe your legacy manufacturing system gets replaced with something modern and all of that horrible code magically vanishes - maybe your single threaded software from the 90s is finally outdated and you need something modern and would have had to rewrite it anyhow.
Just call that "winning the tech debt lottery". Much like winning the real lottery it definitely happens but not enough to rely on it.
What happens if the findings are fixed later in the week? Are they promoted, too? Are large swaths of the system regressed because of the changes? The process should allow the reviewer to say "oh, this is legacy code--the actual change looks good, and I won't worry about the code from ten years ago that doesn't conform to the standards we created a year ago."
No matter how hard the process mavens try to reduce development to painting-by-numbers, good human judgement is the key to success.
My group supports several legacy applications (10+ years old). The product owners can barely be bothered to green-light bug fixes. I can't imagine they could find the justification to green-light tech debt fixes.
If it's a decade-old application that rarely even gets bug fixes, it's probably not worth doing major refactorings. On the other hand, if it's your core product, and your team is regularly making updates and fixes, it would be irresponsible to not incorporate your team's latest standards.
Also, it's worth investing in test automation, because the balance really shifts if you can minimize the cost of regression testing by automating it.
I have seen a 270k LOC codebase where every change has been made by this kind of thinking. Nobody ever went back to make the "temporary" changes properly.
So now there's a lot more disruption and a ton of frustrated people throwing way too much energy into finding what ultimately ends up being a one line mistake.
After a few rounds of that you decide that if it's going to take 6 days, it's going to take 6 days.
This goes in the backlog and gets pushed down further and further and further until it's backlog clutter that nobody even notices until that particular chunk of code no longer exists.
When did you ever get to this land of milk and honey called "later"?
Whenever you have a JFDI fix because it's broken, the way it's coded always stays...
I've never seen a manager sign off on reworking code that does the job, just to make it "more manageable" or "standard" or "less of a bodge"
A problem is that most changes which are minor in the big picture are thought of as critical things that need to bypass the process in the short term.
It's not just this case that seems to make sense. It's that there's dozens or hundreds of other similar critical patches that need to bypass the process.
Because "later" might end up actually being in the middle of the night, or later might not ever come. And then the mess might be magnitudes bigger than what you were facing before. Just wait, now. You have the time. Always. Unless your platform/software is continually crashing, your business always has time. Changing code like that without review is begging for you to "not have time" and cause an undue amount of stress for everyone involved.
Be an engineer, not a monkey. We're all only human and mistakes will be made.
Even if you believe that the next committer should clean up, it can require quite a big context switch - either the clean up explodes into quite a big task and you have to question the trade-offs being made in doing it now, or it doesn't have full focus and becomes risky and/or no real improvement.
I don't happen to believe that the next committer should clean up in most of these situations. Minor things, sure, you should do it (but don't side-track the review with it!). In my experience, people who already couldn't be bothered to do it or clean up later use a new commit as an excuse to get their work done for them. I've often seen people criticising tangential code issues in review who were responsible for the original commit. The question becomes - if it's such a minor issue for me to clean up, why wasn't it a minor issue for you to clean up? If it's bigger, how did it pass review in the first place?
My view is that extraneous cleanup should be proportional to the original change at most - and not discussed in code review of unrelated changes. If a particular area of the code sees sufficient churn to accumulate more and more debt, everybody will know and talk about it, it will be clear that some specific refactoring is necessary, and more importantly it will be evidence that small cleanup is worthless (high churn) and a better architecture is valuable (high churn). A given team may have high tolerance for technical debt and thus never do this work. But then they are screwed anyway, why create noise at every review just hoping that someone else solves it?
Remember, there's also a good chance that the mess stays the same as it always was, sitting there passively. And there's a chance that the cleanup introduces unanticipated problems, especially if it's legacy code with no/less testing.
And surely addressing the code you're touching isn't "extraneous" cleanup. You're modifying the code, you can do it bit by bit.
"Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production."
Basically, it's the idea that if you touch a piece of code, you have to fix everything else wrong with it, even if that involves touching other areas of the code that you weren't looking at initially and may not even have familiarity with.
This is "the perfect is the enemy of the good", applied to code refactoring. If your policies put people on the hook for fixing mistakes that were made by other people a long time ago every time they touch code, there is a very strong incentive to never touch code. You get a bunch of hyperactive software wizards that shit out a working product (with a bunch of crap code), put it on their resume, and then move on to another employer. The people hired after them have a strong incentive to never do anything (because touching code will require that they touch a bunch of other stuff that they don't understand and have no interest in dealing with), and so they browse Reddit and Hacker News all day long. The code remains crappy, but its crappiness can't be blamed on anyone except folks who are no longer with the company.
Actually, this is a pretty accurate description of the software industry.
And 2 days were lost to escalation while 2 were spent in test. Everything else didn't look crazy to me if the change is small but has a high system impact.
For an industry that is implicated in downed spacecraft, massive security vulnerabilities, and a reputation for unstable products, it's amazing how happy we are to optimize for developer satisfaction over all other metrics (I'm looking at you, Agile).
And before anyone says it, I'm not claiming this is a good process. But coding velocity is just one metric by which we should be measuring our success.
Developer comfort is not a goal of agile; the goal of agile is always having the project in a known good state.
Any developer comfort is a side effect of the prerequisites for having the project always be in a known good state.
Having said that, I think to say that "the goal of agile is always having the project in a known good state" is simplifying things a bit too much. A very important part of agile is preferring artefacts that are executable over artefacts that are not.
More relevant to this discussion is the idea that people should be making judgements rather than relying on hard-coded rules of process. If someone has made the decision that the risks of doing something are not worth the benefit, then avoiding the problem is not a waste of time. Importantly, though, a process or rule cannot make judgements -- only people can. Following a rule blindly certainly can waste a lot of time.
Setting up a team is not easy. If you put yourself in a situation where every single person on your team can and will press the big red "Stop The Presses!" button, then you only have yourself to blame. Every person on a team should have responsibility appropriate for their experience, skill and view of the situation. If you have disputes, then you don't want an arbitrary process determining the winner -- you want a real, thinking, feeling human being to make the judgement call.
I'm rambling on here, but most of the breakdowns I see in teams come from the idea that either employees are stupid and so we should shield ourselves from them with layers of process, OR the idea that my colleagues are stupid and so I should make sure to politically rig decisions so I don't have to talk to them. This is basically guaranteed to result in failure.
Agile isn't a process so much as it is conceding defeat on the very concept of process.
If someone wants to give you a full specification and stick to it, you can build something fully specified. If they give you a vague idea that will change a lot, build them as little as necessary and make it flexible enough to change with the idea. If your code's going along for the trip, it may as well be easy to pack.
And that error-proofing is entirely imaginary: they have added a ton of moving parts, handoffs, and manual processes, and so will have way more screw-ups than a small tight team practising continuous delivery, as anyone who has worked in such environments will know.
If you'll get off my back, let me refactor and get my "developer comfort" on, you'll never have to involve me or QA or IT in supporting Just One More File Type ever again - you can push that from policy on the server side. But "developer comfort" isn't "value for the customer."
That's a technical decision that they shouldn't have to know about and are not qualified to make any decisions about.
Or put in other words: The PM should be your customer, not your boss.
Why can't you say "we never designed the system to support more than X file types, so if we want arbitrary additions we need to take Y time to change the system to support that. Thus far, there have been Q requests for changes, and in my professional opinion we could save time now, and in the future if we make adding arbitrary filetypes to the feature set this sprint".
This places the refactoring as something necessary to accomplish business goals, and save time/money in the future while providing better service to the internal or external clients using the software.
As a developer, your goal is not really to write working code as quickly as possible, your goal is to provide the highest possible throughput of working code.
Consider the case of delivering A as fast as possible, B as fast as possible and C as fast as possible. Then contrast that to delivering A,B, and C as fast as possible. These are different problems. Your product manager may or may not understand this point, so ensuring that they do would be my first step.
To do this, I usually use the metaphor of cooking. You can cook scrambled eggs very quickly, and can save time by not putting things away or washing up. You can then make a salad -- again you will save time by not cleaning or washing. Finally you can make soup. But by the time you get to soup, your board is full of unnecessary ingredients, your sink is full of dirty dishes, you have no room on your burner for the pot, etc. Professional chefs clean as they go. Each task takes a little bit longer, but their overall throughput is massively higher (the Ratatouille scene where she says, "Keep your station clean... or I will kill you!", is a good visual to keep in mind).
But this isn't enough. Your product manager, once they realise this, will try to "optimise your experience". They will try to group tasks that they think will make you go faster. They will try to drop functionality which they think will make you go slower, etc., etc. Above all, they will keep asking you, "Can we make this change without having to refactor the code?".
In this way, you are going to start restricting what you can do, and you will make your code more rigid because you are always treading around the boundaries and never fixing fundamental problems in the center. This is a surefire way of making you go much, much slower and making your customers very unhappy (since you never do anything of any substance).
So what you need to do to remedy this situation is to cooperate with your product manager and agree where the appropriate place to make decisions is. For example, your product manager is the only person able to make a good judgement about what the customers actually need/want. It is usually hard for programmers to trust their product managers in this regard (because we tend to think of them as morons who are only pushing a schedule so that they will look good and get promoted). Making sure that your product manager understands that you depend on them for this aspect can go a long way to building a bridge. You have to create the idea that you are a team, working together to fulfil the same goals, rather than adversaries trying to achieve different things.
In exchange, the product manager will have to trust you that technical decisions will have a good outcome. If you tell them, "I'm going to take a detour", they have to understand that's because it will make things go faster -- you can see a roadblock ahead and pushing ahead will kill your throughput. This decision is yours, and yours alone. It is a matter of trust, though, so it will take time for your product manager to build their trust in you. Feel free to have a frank discussion about this and explain that it's difficult for you to trust them too. But when both sides trust the other, then there will be a much better outcome.
Now there is one last thing: emergencies. Occasionally, it doesn't matter whether there is a road block ahead or not. This is not your call. And as much as there are asshole product managers who will pull the "this is an emergency" card every 5 seconds, that's their call.
If you think the emergency card is uncalled for, balance your feeling for fixing the situation against the similar issue where you want to delay production to refactor code. That's going to be your call. Your product manager is going to be thinking, "Is it really necessary????" and your goal is to engender trust. It is a give and take, but your only route to success is building that trust.
In the end, if you find that you just can't trust your product manager (or they just can't trust you), I would talk with your management. If you still can't find an acceptable solution, then looking for a better situation elsewhere is probably your best option. Some people want to work in a broken way. You don't have to though.
This is precisely the problem. PM refuses to believe engineering when we say "You can have feature F on date D" rather than on date D-21 when PM thinks they 'must' have it. ("must" in this case is also not backed up with data, but random desires of potential customers that come up in sales presentations...)
I find fixed-length sprints harm agile. You're tempted to suggest sprint-length goals, not the minimum useful product. This leads to distrust, because everything looks like estimate stuffing; yet you, being honest, intend to fill those days doing cleanup, automation, etc. All real work. And you resent every effort to trim even a minute.
Usually even the surliest customer can be worked with by simply cutting them in on more of the planning (you do bill for meetings, right?) and letting them decide how far to push a "spike", etc. When they see their demands are heard and turned into work (and billed for!!) they often become much more open to suggestion, such as to stop beating an expensive dead horse.
The goal is to be able to break down a project into functional goals (index the data, query it, script that, schedule that, add a GUI, etc.) After the first step or two you're producing value, on the way to the goal.
One of the hardest problems in development is finding a competent customer for a PO, who can drive the project with requirements and political support. The more tangible you can make the benefits early on, the higher-level stakeholders you'll have access to and the less trouble you'll face getting things done.
It's that trust issue that you need to address, not the date. If you focus on the date, then you are doing exactly the same thing that the PM is doing when they complain that you insist on refactoring code -- you are abandoning trust and trying to make a judgement call from the wrong place.
Unfortunately, in this situation there are probably a few things you need to do to solve the situation. As usual, there are many ways to skin a cat, but I will tell you the way that has worked for me in the past.
First, if you have deadlines on individual features, then you have a problem. If you have features that span large amounts of time (as in more than a day or two), then you have an even bigger problem. From what you are saying, these two things seem likely.
The strategy I would suggest is to temporarily acquiesce to the deadlines. Yes, technical debt, but it will pale in comparison to the debt you will acquire if you don't fix this problem.
Next, split up the features into smaller pieces. Each piece should be (on average) about 1 day of work. So if your feature is due in 10 days, then you should have 10 pieces to build. Do not organise these pieces by design structure. Instead organise them such that every day you accomplish something that you can literally show the PM (it won't necessarily be possible every time, but do your best). Very important: doing this will require a lot of rework because you will have to put temporary design scaffolding in to make something user-visible every day. Do not avoid this!!!!! I repeat: Do not avoid this!!!!! You will have to refactor as you go. This is not bad.
Depending on what kind of thing you are building and what your deployment procedure is, try to deploy not-finished, but still useful bits as often as possible. Every day would be best if you can manage it, but don't go more than 2 weeks without a customer visible deployment (2 weeks is a rule of thumb, but you should probably view it as a hard and fast rule at first and adjust later when you have a feel for what works best). This may require you to split up the feature into sub-features that are complete in and of themselves. This can be challenging from a design perspective.
Your goal here is twofold. First, you are establishing credibility that you are working and delivering every day. Not only that, but if you miss the deadline, the PM has a very good idea of where you are with the feature. As you surmise, they probably don't care at all when the feature is done. They just care that you are working on it flat out. By delivering every day, you establish this fact.
Second, by deploying at least every 2 weeks, you are significantly reducing the pressure that the PM feels from other places. If you have an excellent PM, they will be insulating you from the white hot pressure that other business units are putting on your team. But even the best PM cracks and puts you under the vice.
Corporate attention span varies in different companies, but my 2 week rule of thumb has worked well for me in many different environments. It answers the "What have you done for me recently" question nicely. A stress free PM results in significantly more elbow room for you. Never underestimate this.
Now, it may be that you are already delivering dribs and drabs every day or two (because your PM is already pre-optimising your experience). If so you can probably go to step 2 faster. In this step you start negotiating to remove deadlines. Since you are already delivering every day and you are deploying every 2 weeks, negotiate the amount of work that you are going to deploy in the two weeks, while continuing to deliver every day.
So, if you have 10 reports to do, don't put a deadline on each report. Say that you will deploy all 10 reports in 2 weeks and that you will deliver functionality every day, just as you always have. Also negotiate that prioritisation happens every 2 weeks. So whenever they want something, they will have to wait (on average) one week before it is prioritised into the active development backlog. This will be a hard sell, but offer the addendum, "Of course if there is an emergency, we'll have to make adjustments, and I will leave the definition of 'emergency' up to you." (The general rule is that you remove a like amount of work from the backlog and replace it with the 'emergency'. If you have to stop work in progress, then it is lost and you emphasise that 'emergencies' have a cost).
It won't be smooth going all at once, but over time you should be able to negotiate a way that will work for your group.
Hope this helps! It doesn't always work out, but like I said: you always have the option to vote with your feet if you think that your partners just don't want to dance.
Please allow me to disagree with you that this is somehow my fault (which can apparently be corrected by my "learning to make my case more effectively.")
Edit: It's possible that your PM will then say, "This feature will earn $250k per week. Delaying six weeks will cost $1.5m. Over two years, 1800 hours of wasted developer time is ~$150k, and recruiting 20% new devs will cost $250k. Let's spend $400k in lost productivity to earn $1.5m in additional revenue."
Engineers and developers usually have a strong aversion to giving out numbers out of the ass, and you cannot give such an estimate in any other way (barring the cases of the most trivial patches).
I think the problem is that we often think giving a point estimate somehow has to be 100% accurate. Or, worse, we estimate only on the lines of code that will be created and forget that it is all of the lines of code discarded in the process that take most of the time.
So, no, refusing to give a date doesn't do this any service. If something is going to take a long time, speak to the reasons it will take time. But FFS, do not focus on some intrinsic quality of the existing code base that is meaningless to the customer. It is not a customer problem that there is cruft in the codebase that embarrasses you. It is a customer concern that the last X times a feature was added you spent Y hours dealing with high-priority bugs that resulted from compromises.
And if you don't know X and Y, then you are worse than making shit up: you are losing credibility.
That's a learned behavior from all the times that estimates become deadlines.
And no, I do not think it is that easy. The point is that refusing to speak simply robs the process of valuable input.
I'm not saying this is the issue in your case, but I've seen it a lot.
The Product Manager and Director of Engineering should be peers, not one reporting to the other.
1) He changed a number.
2) There was (apparently) a very simple test for the effect of the change being correct.
3) This was high-priority.
More often than not there's more to it than just policies and the amount of legacy code, but even when we had the most supportive of managers, morale was very low over it. I definitely think companies, as they age and go from startup to a firmly placed company, should take stock of their code assets and policies yearly, and really look at every policy and make sure it's absolutely needed. It's a lot easier than taking a 1-line project on a 9-year-old piece of code and turning it from a half-day full review into a 6-day haltingly slow process.
Can you please clarify what the case has been? Has it been that procedures require bringing all code up to standards if just a tiny bit is touched? Or has it been that nobody will ever touch legacy code if they can possibly help it?
"Fix this unrelated stuff or no approval" needs to be overridden by "This is critical". "Fix this before you add this new feature" is pretty reasonable.
You brought up the name for it. Okay. Now can we get back to the debate on whether it is a good practice?
These people are debating whether boy-scouting is 100% required every time, or whether there are critical issues that require making it sometimes-optional.
There are obviously issues where it would be better to skip it, but that's the wrong question. The question that needs asking is whether the organization can identify, adopt, and properly execute a system which identifies those cases reliably enough that the benefit of the true positives is not overridden by the costs of the false positives. And, IME, the places that tend to have inflexible policies like this are also places where the answer to that is "hell, no!". Which is demonstrated dramatically (and expensively) every time someone who fails to recognize the organizational capacity constraint gets into the right position of authority and tries to implement that kind of flexibility.
Without those steps there's another article waiting to be written: "Rogue developer brings down business for six days with a seemingly innocuous 1-line change"
Of the two, I've had to unexpectedly work nights, weekends or even vacations at only one kind.
Guess which one I decided I'd rather work at.
It can be annoying to go through code reviews and yet still have fires everywhere, all the time.
One of the methods I found had a cyclomatic complexity of 200+ and a single test, verifying that the result was not null.
I was told "no, you can't fix that, the NY office wrote it and they'll get angry with us if we change their code and refuse to help".
Things are worse than you think :)
This seems to be a pretty clear example of not making the process work for you. Unwritten policies. Having to escalate twice because of silly reasons. Hot fixes being miscategorized as features. Is a very urgent fix really the place to be fixing a bunch of other tech debt?
OP demonstrates the fatal flaw of having process: poorly designed process can be crippling. In this case, it may have led to significantly lower profits; if they had a way to fast-track something immediately, they could have avoided delaying such a simple change.
Another flaw is that process, if it's really bad, almost encourages people to circumvent it. At large, slow companies, the devs who push the most code are not necessarily the best and the brightest but rather those who can bend process in their favor.
(1) Making the variable a configuration variable will do nothing: it's not like you want two different versions of company policies at the same time.
(2) It won't help with testing, because any realistic test of the variable's change will involve changing the company policy and probably the way all systems behave.
In my opinion, making that into a config variable is a mistake. It's perfectly OK to keep as a hard-coded constant.
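To make that trade-off concrete, here's a minimal C++ sketch (all names are hypothetical, including the Config helper) of the two options:

    #include <map>
    #include <string>

    // Option A: a hard-coded constant, versioned with the code and
    // changed through code review, like the policy it encodes.
    constexpr int kPolicyLimit = 6;

    // Option B: a config lookup. It buys runtime changeability at the
    // cost of a file dependency and a new failure mode: a missing or
    // stale config in the UAT/test environment.
    struct Config {
        std::map<std::string, int> values;  // loaded from the Parameters file

        int GetIntOr(const std::string& key, int fall_back) const {
            auto it = values.find(key);
            return it != values.end() ? it->second : fall_back;
        }
    };

    int PolicyLimit(const Config& config) {
        return config.GetIntOr("policy_limit", /*fall_back=*/6);
    }

    int main() {
        Config config;  // empty: simulates the config file not being found
        return PolicyLimit(config) == kPolicyLimit ? 0 : 1;
    }

Option B only pays for itself if someone actually needs to change the value without a deploy; a value that changes once a decade arguably doesn't qualify.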
Yes, in many cases they are an improvement, but not always. Every "best practice" must be judged individually with the question "does it apply here?".
Years ago I was working with some very small piece of a web-crawling framework, and I had to do something special for www.naver.com, Korea's biggest portal. Because the string "www.naver.com" appeared multiple times in the same function, we put that into a string constant at the beginning: something like
static const std::string kNaverUrl = "www.naver.com";
...it was only much later that I realized the stupidity. As if Naver, Korea's #1 portal, would change its domain name on a whim! Even in the unlikely case that happens (say, they get acquired), so many other things will also change that my code will surely be unusable after that.
kNaverUrl was an unnecessary detour that only served to make the code longer and (slightly) less obvious, guarding against a theoretical possibility that will never come true.
Code Review has this nasty tendency of accumulating "best practices" beyond the breaking point, because nobody objects to best practices. Once in a while, these practices turn out not to be so "best" after all.
Just my two cents.
It also defends against typos, as serge2k pointed out.
You have changed your magic string into a magic constant. It's a step towards better code, but the next step is to give the thing a proper name. In this case, something like DoSomethingSpecialUrl. Now you have a bit of self-documenting code.
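A sketch of that step, reusing the earlier snippet (NeedsSpecialHandling is a made-up call site):

    #include <string>

    // Before: named after its value, so the name documents nothing.
    static const std::string kNaverUrl = "www.naver.com";

    // After: named after its role, so each call site explains itself.
    static const std::string kSpecialCrawlHandlingUrl = "www.naver.com";

    bool NeedsSpecialHandling(const std::string& host) {
        return host == kSpecialCrawlHandlingUrl;
    }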
Maybe they won't change their domain name, but what if you end up needing to move from http to https? Or what if you want to crawl their staging server at staging.naver.com? Or if you wanted to bypass the DNS?
Avoiding the magic constant achieves more than just guarding against name changes. It also guards against typos: you reduce n potential failure points to 1.
That's such a common junior developer mistake. I see it a lot - constants called ONE_SECOND or COMMA.
Hearing "Oh, make setting XYZ configurable" or "Stick a flag to turn FooBar'ing on or off" also fills me with worry, as all too often it has been the harbinger of terrible feature creep and piling on of technical debt.
However, not every hard coded constant should or needs to be in a config file.
Ideally I'd like to see code reviews culturally in an organization be mostly about catching bugs and collaboratively learning the craft of coding better. It can be very frustrating and demotivating when it devolves into a series of undocumented style and naming requirements. I think it is good to have standards and process but seems like most places I've worked are woefully short on documenting these hidden requirements.
Reviewers should not be afraid to reject code, changes should be tested.
However, there are both bureaucracy and code reviewing things that I would change here to not slow things down unnecessarily:
- If the code practices document hasn't been updated, pending future changes shouldn't block your PR
- You should never have any trouble finding someone to review your changes - the whole team should be empowered to review.
- Strictly, only the lines you touch should need to follow any existing code practices, not the whole module because you happened to change one line. Otherwise people will be scared to do small fixes. (Tooling can enforce exactly this; see the sketch after this list.)
- You should only be forced to write new tests for functionality you've actually added, for the same reason. As well as of course fixing tests you break with your code.
- Don't even get me started on Ed not having access to Marge...
- Why does this piece of work need to be done in the first place? Why would a backlog be limited to 3 months?
TL;DR: There should only be as much bureaucracy & strict rules as strictly necessary. Unnecessary bureaucracy is the death of companies, large and small.
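On the "only the lines you touch" point, a sketch of how to mechanize it, assuming a C++ codebase checked with clang-tidy, whose -line-filter flag limits diagnostics to given line ranges (the file name and ranges below are made up):

    # Lint only the lines this change actually touched.
    clang-tidy -line-filter='[{"name":"foo.cpp","lines":[[120,140]]}]' foo.cpp

That way "bring the whole module up to standard" never blocks a one-line fix, because the linter literally cannot see the rest of the module.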
If it's a literal factory, they may want to minimize storage costs. You have to rent space to keep the product until the customer expects it delivered. I'm a layman, so take my guess with a grain of salt.
Good tech managers do this today. I do. My direct reports that give good code reviews are more valuable to me, so I need to know how well they are reviewing code.
The real question is: What is this policy and process protecting against? Does it work?
We're all risk averse. Bad things happen. We come up with processes and systems to prevent them. We write our postmortems and say "How could this have been prevented?" and we say "Heavyweight process X and automated, difficult to maintain, high overhead system Y" and everyone nods their head sagely and makes plans to implement X and Y. Then we make a new mistake and postmortem and build process Z and system W.
But you know that old saying "Lightning never strikes the same place twice"? Well, it's wrong. Sometimes it does. Especially over a long enough timeline. But it's pretty rare.
We're the soldiers preparing to fight the last war and losing the next one.
Personally, when I see stuff like this I always think of https://medium.com/@webseanhickey/the-evolution-of-a-softwar...
It's a cartoon. But I'm getting old and angry. 15 years in, I think that most of this stuff that we do, these pedantic code reviews, processes, policies, best practices and test plans we're always beating each other over the head with, they don't always pay for themselves. We still write bad software. We still make stupid human mistakes.
There is no answer but good judgment, trust and respect. You need as much of it as possible. I think it's easier if you believe Clayton Christensen and build protected small teams that can operate in a limited bureaucracy and limit the connectivity between them and groups viewing them as "others". But again, trust, respect, autonomy and good judgment.
Or just make shittons of money. With lots of money, you can afford to hire lots of smart people to do stupid stuff and take forever doing it.
David/Julie: Having processes that allow critical issues to sit idle in a backlog for 2 days.
Shirley: Delaying a code change because of completely unrelated issues. Code changes that introduce bad code should be fixed first. But demanding that the person fix all unrelated issues in the file is not the way to go.
Tony: Not having tools in place for quick, completely automated testing. If a one line change requires 3 days of manual testing, something is wrong.
Mature, high-profit systems warrant much greater scrutiny on incoming changes than businesses/systems which aren't spinning off money yet. You can err in both ways: too slow where you need to be fast, and too fast where you need to be more careful. And AFAICT companies/organizations build processes, habits, and platforms around one mode and have a hard time switching to the other.
Last time we had a partial outage that only affected a piece of the base for a few hours, we got a first estimate at over 4 million dollar direct loss of revenues, and some percentage of that as direct guaranteed profit loss.
This isn't dev. Running a factory is ops. It can't take 6 days to renew your domain name when you realise it expires in 2. It can't take 6 days to patch your server when some script kiddie takes it down the first time. Not everyone has a job that gives them the luxury of making sure everything is perfect before they commit to anything.
(Of course, the rule for this button, when you have one, is that you can only press it like once per month or some other arbitrarily long time. Oh, and maybe having a physical button that actually sets off an alarm in the office could help too. Maximum annoyance to prevent abuse.)
> Also, there are 2 old Debug commands, an unassigned variable warning message, and a hard-coded Employee ID that will all have to be fixed before this module can be moved to production.
F* that s* indeed.
People beg for time to fix tech debt, and right here is someone complaining about having to do just that.
I've also worked support at a startup that was all about the whole "we don't really need tests for trivial changes!" factor. Oh, what fun it was when the devs would push straight to production at 1am, with an "ah, it'll get tested in the morning sometime, maybe"... in the meantime our farmer clients were using the tool well before office hours to plan their days around irrigation requirements. Those changes that devs thought so trivial were usually trivial, but sometimes were quite catastrophic due to unintended interactions.
Lots of stupid little processes that add up and make the managers feel power, but end up killing the performance of the business.
Brings up the interesting differences between leadership and management. Managing is what gets people into situations like this, whereas leadership (in late-stage companies) is what gets people questioning these processes instead of moving like cattle to slaughter.
I saw this exact type of behavior at General Electric back in 2011. People wanted to argue about acronyms and process design all day, whilst Samsung ate up 20% market share in 1 year. That business of GE no longer exists and was sold to a foreign buyer. Move fast or die -- this rule does not only apply to startups!
The startup may have 5 guys in a warehouse building their product, while the big company has 100 people on a $10M assembly line.
When the startup wants to scale by 20%, they work longer hours or add another worker. If they didn't plan properly and find that they don't have another workstation for that worker, they buy a table at OfficeDepot and there's their new workstation.
The assembly line might need an entire process flow redesign to add more workstations, plus a physical reconfiguration, software changes, etc.
If a developer just blindly adds a parameter that adds 10% to the workday without going through reviews and tests, he may find out from the physical plant engineers that they no longer have enough time to do preventative maintenance, so they need to schedule an outage to do deferred maintenance.
There's a reason why process exists in large companies, and it's not just for the sake of process.
That small startup might be able to usurp the large corporation, but the large company still has an advantage in throughput and/or efficiency. The corporation can promise Walmart that they'll have 100,000 units delivered by Dec 1st... to meet the same goal, the small startup might have to scale their workflow 100X, and they can't do that by scaling from 5 guys in a warehouse to 500 guys in a warehouse.
Right, a 78-year-old company with hundreds of billions of dollars of revenue is a startup killing big companies.
I'll just note that while Samsung is about half of GE's age, it makes twice as much in revenue.
> Move fast or die -- this rule does not only apply to startups!
Code is pretty much a footnote to that process. It's exactly one of very many tools a business can use.
It's true that technical debt can cause business problems. But it doesn't cause nearly as many problems as having an ineffective sales team, getting killed by regulation, or being eaten alive by a competitor.
Big companies aren't taking the chance.
And no one bothers suing into the ground a muppet with two packs of ramen to their name ...
See Also: Boeing, IBM, CGI Group, AT&T, NRC, etc.
The biggest reason large companies that move slow continue to succeed in their original business, is because their original processes and flow are from when they were growing, while they were learning and more agile in their space. As a result, they still generally provide much more value there than the average startup can compare to, in at least a few specific areas, specific to each business.
That's why the blue ocean approach often succeeds much more than trying to take on a large company at their own game. It's hard, because it's a huge organization dedicated to being good at that one thing, and nothing more. So the better answer is to sidestep, not to take them on directly.
Edited to update comment on regulatory capture.
Regulatory capture is a form of government failure that occurs when a regulatory agency, created to act in the public interest, instead advances the commercial or political concerns of special interest groups that dominate the industry or sector it is charged with regulating.
In the case of regulatory capture, you'd think regulation would most likely be reduced rather than increased.
No, in regulatory capture you expect regulation to be increased so as to form a barrier to entry to new competitors, while advantaging (comparatively) established incumbents that have the inside track with regulators and the regulatory process.
Additionally, you ignore the cross-departmental business logic issues when you do not have a centralized security department.
There is really no need for a full IT Security PCPAA compliance department.
Also, don't forget selection bias - most small companies, startups included, die in the first few years. Only the success stories make the papers.
1.) priority is not just high, it is critical, yet communicating this is lost at each layer (executive, planning, execution, process control, quality control)
2.) leadership is lax, the chain of command doesn't designate a clear single responsible individual
3.) policy enforcement in this example actually increases the risk of an unsatisfactory outcome, by increasing the complexity of the solution vs. what is in production
4.) quality control is adversarial and ass backwards; code review is supposed to be a sanity check: "does this code do what the developer thinks it does", a.k.a. "can some other person understand it"
5.) test planning should not be the developer's responsibility, quite frankly if QA can't figure out if it is working or not you should fire your QA department.
6.) Ultimately, it is a total failure of policy and management as it requires the President of the company to micromanage the situation.
If you think any of this is fine, I'm sorry but your company is doomed to fail (unless it is already so big it is too big to fail).
Slow/inflexible company policy has been built up, presumably under the IT Director's watch. The president didn't communicate that it was extremely urgent (and maybe it wasn't - the article doesn't say when the change actually needed to be made by to make a difference). Ideally, management can recognize that this change took a week because of their accumulated decisions, and lack of urgency. They should figure out exactly how their company policy is slowing them down instead of letting heads roll.
Just remember chapter 12 of Peopleware.
Between two organizations, productivity can vary by a factor of 10. It's not about you; it's a social characteristic of the organisation.
Just leave and find one whose speed suits you.
The real reason why this one-line change took so long is because the bureaucracy hasn't been automated away yet. Policy like not allowing magic numbers shouldn't be stuffed in a document, it should be enforced in a linting tool, and that tool ought to be runnable on the developer's workstation. If you have a QA department, QA should be spending its time writing automated tests, not maintaining and running manual tests. The developer ought to have been able to take 5 minutes to checkout the codebase, make the quick change, run the tests close to the change, and then submit to CI, which would run through the entire test suite and then continuously deploy to production. QA's approval should be built-in to the codebase and CI. Even in big companies, the process should not take longer than an afternoon.
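For example, assuming a C++ shop, the no-magic-numbers rule can live in a .clang-tidy file at the repository root instead of in a policy document; readability-magic-numbers is a stock clang-tidy check, and every developer and the CI pipeline then run exactly the same gate (a sketch):

    # .clang-tidy: the policy, as a mechanical check.
    Checks: '-*,readability-magic-numbers'
    WarningsAsErrors: 'readability-magic-numbers'

Shirley's review comment then becomes a red build the developer sees five minutes after making the change, not a rejection on day three.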
And then no one would ever try to block a code review ever, and your code quality would go in the toilet. Then, something like this would happen again, except with a much less benign change, and your servers blow up (not literally).
"1.) priority is not just high, it is critical"
Priority is always critical for whatever little changes management wants.
Which is why in an actual emergency, management needs to actually communicate with the people involved in the development pipeline, not just hope the normal flow will magically work faster.
This used to happen constantly at my office:
Boss: I made a ticket to [change thing]. It's urgent, and please let [coworker I pass tickets to] know that it's urgent too.
Me: Coworker, I just sent you a ticket, [boss] says it's urgent.
Coworker: Everything is urgent. Shrugs and goes back to working from the top of his ticket list, all of which have the same high-priority flag as the one I just sent him.
This was my exact thought. Why did a "fast tracked" ticket spend two days in a queue? Why did the requests to IT not also get set to urgent priority? (An IT request that blocks an urgent code change should itself be considered urgent.)
Most bizarrely to me, why the hell did they not defer updating the code to meet newer company policies? For the hardcoded variable thing, I would've said "ok, just change the value now, then create a separate, non-urgent ticket to bring that code up to our current standards."
It sounds like their bureaucracy isn't equipped to deal with fast-tracked changes. The worrying part is that management fed a critical issue into that bureaucracy and didn't prod it along until they noticed the delays. They may not fully appreciate how complex and full of roadblocks their process has become.
If this company runs into this kind of problem often, then something can be done, but if this is a rare occurrence, it's not so bad.
Remember, the dev did not 'waste' 6 days of his time; rather, it took that long for the cogs to move together on it.
And even with (1) ... it's not super clear.
(2) It does appear that David, the IT Director, was responsible, though he was out of town during part of the process. (What? In 2016 "out of town" means "not reachable"?)
It's on (3) that I totally agree with 'cthulhuology and disagree with you. Changing the value of an integer constant in code is far less risky than what the code reviewer demanded. As others have argued upthread, the original one-line change should have been put through to production as it was, and the additional changes should have followed later via the established process... if indeed they were truly necessary at all. (As someone else pointed out, a variable whose value changes once in 10 years probably doesn't really need to be a configuration parameter, though making it into one probably counts as a slight improvement.)
For (4), okay, here I don't entirely agree with 'cthulhuology. There is a place for enforcing standards; code review isn't just about comprehensibility.
The text explicitly says David's signature was needed. Unless they have an e-signature setup, or he can locate a fax machine, it doesn't matter how "reachable" he is.
(That's assuming that "out of town" doesn't mean "on vacation", in which case "not reachable" is in fact quite reasonable.)
I'd argue that adding a variable that changes once in 10 years is not an improvement. Config files should consist of things that actually need to be configured. Adding crap you'll never touch makes it harder to find the variables that are actually important, and also makes it harder to read and understand all available options.
Introducing configuration variables for the hell of it is analogous to writing every method to be as generic as possible: occasionally useful, but not worth the risks and expenditure. Just as we often draw the line and say "this method doesn't need to be any more generic", config variables need an approach more nuanced than "move everything to the config file".
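To sketch where that line might sit (all names invented for illustration): a once-a-decade value can live as a named constant right next to the code that uses it, with the config file reserved for knobs that genuinely vary per environment:

```python
# Hypothetical illustration; names invented, not the company's actual code.

# A named constant: greppable, versioned, and visible in code review. For a
# value that changes once a decade, requiring a small, reviewed code change
# to alter it is arguably a feature rather than a bug.
MAX_BACKLOG_MONTHS = 4

# Genuinely operational knobs -- things that really do differ per
# environment or deployment -- are what belongs in the config file.
import configparser

config = configparser.ConfigParser()
config.read("app.ini")  # hypothetical config file
db_host = config.get("database", "host", fallback="localhost")
```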
This depends on context.
An integer variable can kill humans if it's the wrong variable.
Was it related to safety?
I'm assuming the reviewer knew what they were doing ...
Maybe some common sense all around would have helped here.
"Philip (President): Our factory is underutilized by 10%. Either we start building more of our backlog or we lay people off. I'd rather keep everyone busy, build inventory, and get ahead of the curve before the busy season. How can we do that?
Lee (Operations Manager): Company policy restricts us from building more than 3 months of backlog. If you just change that to 4 months, we'll have plenty of work."
You can't absorb one week of 10% lower productivity without immediately firing everyone? Your company policies only work if you change a line of code? And that line of code sits in some old legacy system that no one touches, yet your whole process depends on it? No, 6 days of development is the least of the problems here. This company is absurd.
6 days for the line of code wasn't optimal, but it wasn't that bad. What this story shows me is how bad management pretends things are IT's fault.
As for requiring a code change to implement new policy: maybe not ideal, but it looks like they're trying to move away from that with the requirement that hard-coded constants be moved to a parameters file.
And in general, using the company's production software to enforce business policies is probably an important management control. Often when you see rules like that, it's because the company got burned by something in the past and wants to enforce a process that prevents it from happening again. In this case, maybe in the past an operations manager was evaluated on how busy the factory was kept, and so just kept building up more and more inventory to game the metrics he/she was being judged by.
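For illustration, enforcing such a policy in the production software might look something like this minimal sketch; the parameters-file layout and the function names are assumptions, not details from the story:

```python
# Hypothetical sketch: the policy limit is data the software enforces,
# not a suggestion in a memo. Section and key names are invented.
import configparser

def load_backlog_limit(path: str = "parameters.ini") -> int:
    config = configparser.ConfigParser()
    if not config.read(path):
        # Fail loudly if the parameters file is missing, rather than
        # silently falling back to a default in some environment.
        raise FileNotFoundError(f"parameters file not found: {path}")
    return config.getint("scheduling", "max_backlog_months")

def can_schedule_build(current_backlog_months: float) -> bool:
    """Refuse to schedule inventory builds past the policy limit."""
    return current_backlog_months < load_backlog_limit()
```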
I think it was a tad inaccurate of the manager to call it 'legacy' software. It might be an old system, but it appears to be critical to the company's operations, and it seems to be something that is worked on fairly regularly. Even if it's written in Cobol and JCL, I'm not sure it would be fair to call it legacy software unless the company is actively working on a replacement and making only necessary changes to the old system in the interim.
So, while I understand your concerns, I'm not sure it's fair to pin this on bad management. It sounds to me like a company that's probably been in operation for a long time, and has processes in place to make sure nobody pushes any changes to a production-critical system without proper approval and testing. They could possibly streamline the process, but it doesn't seem (necessarily) crazy based on the information we were given.
If you increase the backlog time you are building for, it's a one-off increase; it doesn't increase the quantity of new work coming in, so once they've worked through the one-off extra with their 10% spare capacity, things will just go back to how they were before.
I.e. "1 line in 6 days" is only a measure of latency, and not of throughput.
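A quick back-of-the-envelope check of the one-off claim (the 10% spare capacity and the 3-to-4-month change come from the story; the monthly output figure is arbitrary):

```python
# Rough arithmetic on the one-off nature of the policy change.
monthly_output = 100       # units/month at full utilization (hypothetical)
spare_fraction = 0.10      # the 10% idle capacity
extra_backlog = 1 * monthly_output  # one extra month of backlog to build

months_to_absorb = extra_backlog / (monthly_output * spare_fraction)
print(months_to_absorb)    # 10.0 -- after ~10 months the slack is back
```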
You could drive a station wagon full of backup tapes quite far in 6 days :)
Large blobs make the review process completely fall apart. In the past I dug up too many cases of people allowing 50-file steamrollers into a repository when the entire “review” was literally two words (“looks good”). While this probably happened due to time constraints, it makes the entire thing pointless. With a simpler commit process, engineers might submit smaller change requests at a time and people asked to review “7 files” might do a thorough job instead of balking at requests to review “50 files”.
"Don't have it debug log 'Foo'd 5 bars', make it 'Foodulated five bar things'"
There really needs to be a procedure for violating procedure, so that when higher-ups say "do this immediately", those same higher-ups can own the consequences of violating procedure, and so that they know the business implications of the rollback plan, etc.
If the bug is "people are getting laid off" or "our customer database is being siphoned by an SQL injection", it probably should not wait.
Proper handling would have had a dev working on it within the hour. The response to the code review would have been "other changes are out of scope, open another ticket for enhancements". If the developer didn't already have access to the test environment, at least IT would know that they're authorized for it. (Critical fixes shouldn't be assigned to a new hire who doesn't know the code base and doesn't have access to commit code yet.) The ticket would be edited (Test Case: view WorkOrdersHours report, Expected Result: ~10% higher total) and tester would be told to reread it and go ahead. And of course documenting the test runs is the tester's responsibility (they're the one doing the test runs), not the developer's, as is notifying the IT Admin and Operations Manager when the testing is done and it's ready for sign off.
Exact same process, same quality checking along the way, timeframe reduced from 6 days to 4 hours or less; likely under an hour if the team works together well and their tools are decent. A good process can handle emergencies without sacrificing quality, as long as the people work together and the priority is communicated.