If they succeeded, the manager would have pointed to the feature as an example of their “hustle” and ability to get things done where others couldn’t.
If they shipped the feature and it crashed the website, the manager would blame the front end team for making a fragile system that couldn’t handle a simple feature.
If they failed or were blocked, they’d point out their working proof-of-concept and blame the front end team for onerous process or politics.
The real litmus test is how the company reacts to that manager after this stunt. If the company sides with the hustle manager, it doesn’t bode well for engineering in the long term. When management rewards shows of effort instead of long-term results and fails to acknowledge negative externalities or selfish managers, you breed more of that behavior.
However, if management sides with engineering and shuts the hustle manager down, you’ve found a good workplace.
Ideally, once they identify themselves by trying to pull the trigger, you can move them out of the company.
Many of these processes may be necessary, but it's also necessary to explain why, and to make them as fast and painless/frictionless as possible - especially as each single process in isolation may seem reasonable, but when stacked on top of each other, the "get stuff done" approach becomes a lot more tempting.
Process stacking for me is one of the reasons why it's super painful to work in big companies if you want to get something done. As soon as somebody makes a mistake, they will add a little bit of process to ensure that never happens again.
Individually, as you point out, that makes sense. But if you have to go through a 1000-item review checklist for a single line of code, then I can assure you that no human will be able to actually think through those 1000 items. But they will go through the motions to satisfy the process. Then, because they have the checklist, they don't think they have to think about it anymore. They make a mistake. It gets added to the checklist.
I experienced situations where a single code change would take at least a month. This led to people trying to save time on a) tests, b) any kind of refactoring, c) adapting libraries instead of writing your own implementation (because fixing the library would be two code changes, and not just twice the process effort: an actual committee had to decide on the library change first).
So a lot of process IMHO is the worst thing you can do for your code quality. Checklists are good, but they should be limited to a manageable number (e.g. 10 items, if you want to add something, you have to remove something less important first). It should also not be harder to do the right thing, e.g. centralizing functionality in libraries should be easier than writing your own implementation.
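A rough sketch of the capped-checklist idea in Python, just to make the rule concrete. The class name `CappedChecklist`, the `replaces` parameter, and the default cap of 10 are all made up for illustration; this is one possible way to enforce "remove something before you add something", not a description of any real review tool.

```python
class CappedChecklist:
    """A review checklist that refuses to grow past a fixed size (hypothetical)."""

    def __init__(self, max_items=10):
        self.max_items = max_items
        self.items = []

    def add(self, item, replaces=None):
        """Add an item; once the list is full, the caller must name an item to drop."""
        if replaces is not None:
            if replaces not in self.items:
                raise ValueError(f"cannot replace unknown item: {replaces!r}")
            # Swap out the less important item for the new one.
            self.items.remove(replaces)
        elif len(self.items) >= self.max_items:
            raise ValueError("checklist is full; name a less important item to remove first")
        self.items.append(item)
```

The point of the design is that adding a rule after an incident is no longer free: whoever wants the new item has to argue that it matters more than something already on the list.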
Our sub would dock at the pier the day before, everyone but Weapons Department got the day off/in port duty day. Weapons Department would hold an all afternoon walkthrough of the entire process. Manpower locations and roles. Equipment setup and basic operations. Types, quantity, and sequence of weapons to take aboard. Expected timeframe / pace so that no one was expecting to have to hustle to catch up.
And everything was in binders, with plastic strip edged pages and fresh grease pencils issued to everyone managing.
Every one of those steps was a result of "Ok, crap, what do we rewrite to make sure (shudder) THAT NEVER happens again."
And even so, on my fifth loadout party, I still missed a retaining strap and almost helped dump a torpedo in the harbor, except there was already a step right after mine with a separate checkbox that said "Aux handler has checked strap type/quantity/positioning for weapon type."
Procedures are great for the things that need them.
And when you have numerous teams/functions scattered about, procedures are even more necessary.
And I do get that a lot of code is not likely to detonate under the company's hull, per se.
To be fair, the other side of the coin is when it is simply not possible to get certain things done, because the need was not anticipated when the processes were designed. If you don't have the company political clout to get these processes amended, your only option is to wait until a customer is negatively affected, in order to drive the point home. Still, hustling (even if it is well-meaning) is of course not an acceptable solution.
The less clarity there is on the "why" the more creative the management will be.
Of course, managers who say "I don't believe that will happen so I'm going to skip this part." should be walked out of the door to their car immediately. :-)
The issue can sometimes occur when the manager doesn't know that their rag tag team is not this special case, but actually clueless. Or they have not learned to spot the difference, or that there is a difference.
> just a counter example
It took me a while to understand why HN'ers revel in "the counter-example."
In mathematical proofs, you only need one counter-example to refute a proof or argument.
Pedantic HN'ers seemingly fail to realize that mathematics and the real world are not the same thing.
Here you're right to raise the issue, but it seems the comment is merely trying to point out that 'not all [Scottish!?!] rag-tag teams are bad' and to draw attention to some such teams being superb. Which seems a fair comment to me.
The things we're talking about here aren't mathematical axioms, they're general trends. One counter-example does not disprove a trend. Every real-life trend has exceptions, and it frequently is interesting to examine the exceptions to see why they bucked the trend.
But anyway, the main lesson I learned there is that as an ops team (or broader, as an IT department) you need to have Principles, capital P. A short set of rules and goals which you can always point to. Like the uptime goal, which excludes / includes a LOT of things right off the bat - access controls (nobody can touch production directly), testing practices, application architecture (stateless), etc.
These analysts say that Soviet authorities appear to recognize that operator errors at the Chernobyl plant on the night of April 25-26 were not the sole cause of the accident, and that technical flaws in the reactor’s design contributed to the worst accident in the 44-year history of nuclear energy.
In particular, they said, a distinctive feature of the Chernobyl design, which sets it apart from conventional nuclear power plants in most of the world, is its tendency to generate a sudden and uncontrollable burst of power if large steam bubbles, or “voids,” are allowed to form in the reactor core, as they did before the accident.
This peculiarity of the Chernobyl type of graphite reactor, called a positive void effect, is now seen as a decisive factor in the accident, one that transformed successive blunders on the part of Soviet operators over a period of hours into a catastrophe.
If a startup needed to move quickly, they’d ping the relevant parties at the planning phase and get everyone on the same page.
I was referring to the archetypal “hustle” managers who deliberately try to do an end-run around other teams for their own personal gain.
Startup or enterprise, doesn’t matter. You can’t have management that rewards asymmetrical games that benefit rule-breakers at the expense of everyone else, including the customers.
I had a laptop which I brought with me from Australia. It wasn't in the asset register in the USA, so I was entitled to a computer. I ordered the most highly specced desktop build available, put it into the empty cubicle next to mine, and spun it up as a development server. It didn't have backups, but that was OK because I never worked with primary data on it and all my work got committed back to Perforce daily.
Strictly it was very much against policy, but policy would have meant I spent 6 months sitting on my hands. My manager "hustled" for me and did an end run around process.
That service becomes the source for a management report.
That management report contains useful data that the CEO looks at weekly and uses to build his board report.
The original Aussie guy leaves, but leaves the laptop behind because it's not his. He also doesn't document it (because that would get him in trouble for running a server on a laptop).
The laptop finally dies. The CEO is furious because he can't create his report. He leans on the IT manager. The IT manager has no freaking idea where this report is coming from or who makes it. They lean on the Support team to find out which server produces this report. The Support team drop everything to work out wtf is going on, because this suddenly became their #1 priority.
Eventually, someone finds the decaying husk of the laptop, and works out what's going on. They put together a plan for creating a supported server to do the same thing. It'll take weeks, because they have to provision a server properly through the usual channels. CEO has a rant at the entire IT department for not supporting critical business processes, and not being agile enough to support the business. IT manager takes a beating in the next management reviews. No-one is happy.
usually a rogue spreadsheet rather than server. The worst case I saw was an Excel spreadsheet in a business-critical department running on a user machine with a post-it note on it saying "don't turn this machine off". If the logged-in user name wasn't the same as the temp who had originally built it, the spreadsheet refused to work and the department ground to a halt.
> ... benefit rule-breakers at the expense of everyone else, including the customers
Maybe I've been in startup land for too long, but seems super normal and fine to ship a feature over the weekend if it goes through the regular CI gates - it's tested, peer reviewed, been QAd in staging, etc. Is this not accepted outside of startups?
Deployment over the weekend can make a lot of sense in the world of B2B, but there’s a difference between a carefully thought out plan to deploy at a quiet time and sneaking something out when no one is looking.
In this case, someone was trying to quietly ship things to production on a Sunday without involving the owners of the front end. How would it look for you if some other team crashed your part of the website on a Sunday without even coordinating the change with you first?
My point was that it’s important for companies to not reward selfish behavior from managers who want to make a name for themselves. If you genuinely need to ship a website feature on Sunday, you involve the website team for launch and follow up monitoring. You don’t try to quietly ship it out the door at the risk of breaking other parts of the business, as Rachel explained in the article.
Her story was of an untested feature trying to get injected into the frontend on Sunday to meet a Monday deadline, by someone without the proper access, and with no apparent oversight or process concerns.
I'd do everything I could to block this push as well.
Sounds like these people designed a component that wouldn't work on the current infrastructure. That happens, but if there is a big oversight like this, then you should ship late, instead of risking taking everything down.
Several of the processes you listed often require humans. Either you are asking them to do this during the weekend, which is bad, or you gave them ample time during the week, meaning you're ok with steps needed on weekdays, so you can do that for the last step too, with full staff present.
(bugs and urgencies notwithstanding, but that doesn't appear to be what we're discussing)
Environments like those described often have continuous push and automated slow rollouts with health checks, so the idea of doing something on a Sunday isn’t that strange at all.
That said, there’s something to be said for not trying to locally optimize. If you push bad stuff on Sunday, you’re messing up a bunch of people’s well-earned rest and recovery time from work. You push bad stuff on Monday, and everyone’s there to help you fix it without the stress of lost family or other commitment time.
The difference is 24 hours, which likely isn’t going to make or break anything. It’s easy to get sucked into believing things like that matter when they don’t.
I haven't had specific conversations with anybody about it, but I think we have all been around the block enough times to have been burned on a few weekends when it really wasn't necessary.
Not a start up at all though, and not a team of twenty somethings with anything to prove by moving fast and breaking things.
Agree with you though that I have seen this at a lot of places. I did a number of phone interviews looking for a more relaxed place in order to end up here.
In fact most trading platforms have this huge advantage of not being 24/7 operations.
1) Is stuff really badly broken?
2) Was the really bad breakage introduced recently (this could either be an earlier bad rollout, or it could be external factors changing)?
3) Is this requested deploy either a revert of a recently-made bad change or the minimal possible fix/bandaid?
If all three of these are true, then you can do a deploy RIGHT NOW regardless of the calendar.
(recently - because if this is something which has been broken for years then it's unlikely that it suddenly became urgent absent external change - which is already carved out above)
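The three questions above boil down to a single AND. A minimal Python sketch, assuming a hypothetical `DeployRequest` record whose field names are invented here to mirror the questions (the answers would still come from a human judgment call, not from code):

```python
from dataclasses import dataclass


@dataclass
class DeployRequest:
    """Answers to the three questions above (hypothetical field names)."""
    badly_broken: bool              # 1) is stuff really badly broken?
    breakage_is_recent: bool        # 2) recent bad rollout or external change?
    is_revert_or_minimal_fix: bool  # 3) revert or minimal fix/bandaid only?


def may_deploy_now(req: DeployRequest) -> bool:
    """Off-hours deploy is allowed only if all three answers are yes."""
    return (
        req.badly_broken
        and req.breakage_is_recent
        and req.is_revert_or_minimal_fix
    )
```

Anything that fails one of the three checks, such as a new feature or a fix for something that has been broken for years, waits for normal working hours.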
Once a startup hits some level of maturity, it's unacceptable to be shipping something significant on the weekend (or whenever people aren't around to respond to an issue). Probably post product-market fit, maybe Series B.
I guess it also matters how much your company values work-life balance.
I remember as a young Eng. getting caught up in the platform holy wars, and then sitting as a PM looking back on it all like I must have been in some cult.
There's truth to the notion that 'it's complicated' and rarely does anything get done in a weekend, but if there is focus, a decent dynamic process, things can move faster.
I worked at one company that had a messaging product. It had a big team of Engineers and things moved at a snail's pace. I suggested bringing in a few talented people and starting from scratch as a re-factor; they thought I was crazy. A young intern left the team, did it on his own with one other person and met with enormous success. The company, even after literally watching an intern out-do them, never changed.
Both the old company and the new company are big enough names that you've all heard of them. I wish I was at liberty to share.
In another project, we were opening up some basic APIs. We did some work with Facebook and they were able to give us a custom API in literally a few days. Our own, simple APIs took 18 months to deliver. The weekly product teams consisted of 10 people rambling on - and the two most important people, the dudes actually writing the code, were not ever present. It was a colossal and shameful waste.
Even though getting a rag-tag bunch of Engineers over the weekend is usually not a good sign (it might actually work for some marathon bug fixes or something), I'm usually sympathetic to the cause.
It makes me wonder how these organisations don’t collapse under the weight of their ineptitude. Most of the bugs or issues I have to fix are from problems we created by short term hacks. Way beyond simple tech debt.
The engineers are as much at fault as the managers, particularly when it comes to introducing insane complexity to the stack to solve simple problems (how many startups seriously need to invest in tech like gRPC or graphql except to gain cool points?). Management, on the other hand, have no empathy for the people doing the work, so quality dips as we are pressured by both self-imposed and external deadlines which are decided with zero input from engineers.
Half the time we web engineers are building glorified content management systems with some nice design over the top. It’s boring but it’s not a burn out.