
I expect this post will resonate with a lot of HN readers. Every tech company I’ve worked for has had at least one manager who tries to ship features over the weekend with a ragtag team of developers who don’t understand the system or why that’s a problem.

If they succeeded, the manager would have pointed to the feature as an example of their “hustle” and ability to get things done where others couldn’t.

If they shipped the feature and it crashed the website, the manager would blame the front end team for making a fragile system that couldn’t handle a simple feature.

If they failed or were blocked, they’d point out their working proof-of-concept and blame the front end team for onerous process or politics.

The real litmus test is how the company reacts to that manager after this stunt. If the company sides with the hustle manager, it doesn’t bode well for engineering in the long term. When management rewards shows of effort instead of long-term results and fails to acknowledge negative externalities or selfish managers, you breed more of that behavior.

However, if management sides with engineering and shuts the hustle manager down, you’ve found a good workplace.

Every company over a certain size has a manager who explicitly puts together a ragtag team of the clueless in order to "get stuff done," because they don't understand the complexity of the stuff. They interpret pushback as a lack of cooperation rather than as the sanity check it actually is.

Ideally, once they identify themselves by trying to pull the trigger, you can move them out of the company.

The probability of such "get stuff done" stunts goes up as more seemingly pointless, painful, slow, and bureaucratic processes are introduced.

Many of these processes may be necessary, but it's also necessary to explain why, and to make them as fast and painless/frictionless as possible - especially as each single process in isolation may seem reasonable, but when stacked on top of each other, the "get stuff done" approach becomes a lot more tempting.

>especially as each single process in isolation may seem reasonable, but when stacked on top of each other, the "get stuff done" approach becomes a lot more tempting.

Process stacking for me is one of the reasons why it's super painful to work in big companies if you want to get something done. As soon as somebody makes a mistake, they will add a little bit of process to ensure that never happens again.

Individually, as you point out, that makes sense. But if you have to go through a 1000-item review checklist for a single line of code, then I can assure you that no human will actually think through those 1000 items. They will go through the motions to satisfy the process. And because they have the checklist, they don't think they have to think about it anymore. They make a mistake. It gets added to the checklist.

I experienced situations where a single code change would take at least a month. This led to people trying to save time on a) tests, b) any kind of refactoring, and c) adapting shared libraries instead of writing their own implementation (because fixing the library would be two code changes, and not just twice the process effort: an actual committee had to decide about the library change first).

So a lot of process, IMHO, is the worst thing you can do for your code quality. Checklists are good, but they should be limited to a manageable number (e.g. 10 items; if you want to add something, you have to remove something less important first). It should also not be harder to do the right thing, e.g. centralizing functionality in libraries should be easier than duplicating it.

It's a cool thing that software moves so much faster than other processes, like, in my experience, military weapons loading.

Our sub would dock at the pier the day before, everyone but Weapons Department got the day off/in port duty day. Weapons Department would hold an all afternoon walkthrough of the entire process. Manpower locations and roles. Equipment setup and basic operations. Types, quantity, and sequence of weapons to take aboard. Expected timeframe / pace so that no one was expecting to have to hustle to catch up.

And everything was in binders, with plastic strip edged pages and fresh grease pencils issued to everyone managing.

Every one of those steps was a result of "Ok, crap, what do we rewrite to make sure (shudder) THAT NEVER happens again."

And even so, on my fifth loadout party, I still missed a retaining strap and almost helped dump a torpedo in the harbor, except there was already a step right after mine with a separate checkbox that said "Aux handler has checked strap type/quantity/positioning for weapon type."

Procedures are great for the things that need them. And when you have numerous teams/functions scattered about, procedures are even more necessary.

And I do get that a lot of code is not likely to detonate under the company's hull, per se.

Moreover, in this industry, a checklist is usually a reinvention of an 18th-century manufacturing process. A lot of the "needs a checklist" work can be transformed and automated into "do the integration tests pass?" Several orders of magnitude faster, as it doesn't need couriers on horseback to carry documents to and fro, or a live human imprimatur. Talk about repeating history...
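As a sketch of what that transformation can look like (all names here are hypothetical, not from the thread): a manual checklist item like "confirm the API still responds correctly after the change" becomes an assertion that CI runs on every commit, with no human in the loop.

```python
# Hypothetical example: a manual review-checklist item ("verify the
# health endpoint still returns valid JSON") rewritten as automated
# tests that a CI pipeline runs on every commit.

import json


def handle_request(path):
    # Stand-in for the real request handler under review.
    if path == "/health":
        return 200, json.dumps({"status": "ok"})
    return 404, json.dumps({"error": "not found"})


def test_health_endpoint_returns_ok():
    status, body = handle_request("/health")
    assert status == 200
    assert json.loads(body)["status"] == "ok"


def test_unknown_path_returns_404():
    status, _ = handle_request("/nope")
    assert status == 404
```

Once the item lives in the test suite, the checklist entry can be deleted: the gate is now "tests green," and it never gets skimmed past.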

I tried to introduce that, but you need to change culture even more in that case.

>As soon as somebody makes a mistake, they will add a little bit of process to ensure that never happens again.

To be fair, the other side of the coin is when it is simply not possible to get certain things done, because the need was not anticipated when the processes were designed. If you don't have the company political clout to get these processes amended, your only option is to wait until a customer is negatively affected, in order to drive the point home. Still, hustling (even if it is well-meaning) is of course not an acceptable solution.

Absolutely agree. And it's up to a company to ensure that process isn't pointless or obfuscating the reason for its existence.

The less clarity there is on the "why" the more creative the management will be.

Of course, managers who say "I don't believe that will happen so I'm going to skip this part." should be walked out of the door to their car immediately. :-)

I have seen the rag tag "get stuff done" team that works really well because they are staffed with people who know what they're doing. A team that knows better than to try to ship late Sunday for example, perhaps small and organizationally lax but experienced and disciplined.

The issue can sometimes occur when the manager doesn't know that their ragtag team is not this special case, but actually clueless. Or hasn't learned to spot the difference, or that there is a difference.

Bingo! The origin of the cargo cult, right there. "Look at that team: gets stuff done, and quickly! Therefore, I shall assemble a team at random, and push for speed at the expense of everything else. The results must surely manifest, for I am following the incantation!"

Incidentally, the "pirates" of Apple put together the first Mac. Though not exactly rag-tag, but brilliant; just a counterexample, I suppose.

Well, the Mac was designed and developed by Apple engineers hand-picked by Steve Jobs. So it was an official long-term project with backing by the CEO.

> just a counter example

It took me a while to understand why HN'ers revel in "the counter-example."

In mathematical proofs, you only need one counter-example to refute a proof or argument.

Pedantic HN'ers seemingly fail to realize that mathematics and the real world are not the same thing.

I think it's the opposite of what you're imagining. Refutation is generally a claim that the truth of the situation is more nuanced than a simplistic statement would make it seem.

Here you're right to raise the issue, but it seems the comment is merely trying to point out that 'not all [Scottish!?!] rag-tag teams are bad' and to draw attention to some such teams being superb. Which seems a fair comment to me.

I don't think the goal here is refutation; it's adding additional data points to give a more nuanced picture.

>In mathematical proofs, you only need one counter-example to refute a proof or argument.

The things we're talking about here aren't mathematical axioms, they're general trends. One counter-example does not disprove a trend. Every real-life trend has exceptions, and it frequently is interesting to examine the exceptions to see why they bucked the trend.

Yup, and developers looking for shortcuts, or for ways to do things the way they're used to (e.g. calling a database directly). I can recall one instance where a guy was asking for SSH access to the production environments, just so he could look at some env variables and logs. We had the best ops team (IMO), who worked with Principles (they were working towards 99.99999% uptime), which simply excluded SSHing into production servers. The developer was told to add something to his application that logged the environment variable, if that's what he really needed. It's a bit more work, but at the same time he had no excuse, because new deploys were a matter of minutes.
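A minimal sketch of that workaround (variable names are invented for illustration): instead of shell access, the application logs an allowlisted snapshot of the environment it actually sees at startup, which then flows through the normal log pipeline.

```python
# Sketch of the "log it instead of SSHing in" workaround described
# above. The variable names are examples, not from the story.

import logging
import os

log = logging.getLogger("startup")

# Allowlist of variables that are safe to log -- never secrets.
SAFE_TO_LOG = ["APP_ENV", "REGION", "FEATURE_SET"]


def snapshot_environment():
    """Collect only the allowlisted environment variables."""
    return {key: os.environ.get(key) for key in SAFE_TO_LOG}


def log_environment():
    # Called once at startup; the values show up in the standard
    # logs, so no one needs shell access to a production host.
    for key, value in snapshot_environment().items():
        log.info("env %s=%r", key, value)
```

The allowlist matters: dumping the entire environment would leak credentials into the logs, which defeats the point of keeping people off the boxes.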

But anyway, the main lesson I learned there is that as an ops team (or broader, as an IT department) you need to have Principles, capital P. A short set of rules and goals which you can always point to. Like the uptime goal, which excludes / includes a LOT of things right off the bat - access controls (nobody can touch production directly), testing practices, application architecture (stateless), etc.

I don't think asking what environment variables are available for application developers to use on the servers is an unreasonable request. It should probably be part of the platform documentation though. Logs, again there should be a documented safe way to do this (ELK etc.). SSH to production is not the answer though.

BTW, the night crew pulling a stunt is how the Chernobyl Disaster happened.


They weren't exactly pulling a stunt. They were carrying out a test as planned, just off schedule (because of other operational demands) and the key thing is that neither the test planners nor the operators were aware of the design flaw in the reactor (because the government had declared it a state secret).


These analysts say that Soviet authorities appear to recognize that operator errors at the Chernobyl plant on the night of April 25-26 were not the sole cause of the accident, and that technical flaws in the reactor’s design contributed to the worst accident in the 44-year history of nuclear energy.

In particular, they said, a distinctive feature of the Chernobyl design, which sets it apart from conventional nuclear power plants in most of the world, is its tendency to generate a sudden and uncontrollable burst of power if large steam bubbles, or “voids,” are allowed to form in the reactor core, as they did before the accident.

This peculiarity of the Chernobyl type of graphite reactor, called a positive void effect, is now seen as a decisive factor in the accident, one that transformed successive blunders on the part of Soviet operators over a period of hours into a catastrophe.


Never heard of this before - thanks!

If this interests you, there's a show on HBO by the same name that you should give a go.


Making it easy and self service to do things the right way also helps create a cultural expectation that if you think you need to leave the happy path, you’re probably holding it wrong. Or at least, you brought the blockage/expensive coordination problem on yourself.

Yes, you are correct. But, to play devil's advocate, I will also say that there are operations where any change that would cause more than a trivial amount of work is roadblocked endlessly, for no substantive reason except that it would disrupt a cushy routine of just keeping the lights on.

It's not that black and white though. A startup will need to have a different culture than an established company with thousands of users. What is right for a consumer company is not necessarily right for an enterprise company. Companies need to evolve all the time, and such stories are tipping points where such evolution should happen.

Startups aren’t an excuse to get sloppy and break other team’s production environments.

If a startup needed to move quickly, they’d ping the relevant parties at the planning phase and get everyone on the same page.

I was referring to the archetypal “hustle” managers who deliberately try to do an end-run around other teams for their own personal gain.

Startup or enterprise, doesn’t matter. You can’t have management that rewards asymmetrical games that benefit rule-breakers at the expense of everyone else, including the customers.

Here's an example of an end-run. I had a 6 month secondment to the USA at a previous job, and I needed a server to do data analysis on. It would have taken over 6 months to get one provisioned for me because of "process".

I had a laptop which I brought with me from Australia. It wasn't in the asset register in the USA, so I was entitled to a computer. I ordered the most highly specced desktop build available, put it into the empty cubicle next to mine, and spun it up as a development server. It didn't have backups, but that was OK because I never worked with primary data on it and all my work got committed back to Perforce daily.

Strictly it was very much against policy, but policy would have meant I spent 6 months sitting on my hands. My manager "hustled" for me and did an end run around process.

That's fine because it isn't contagious. If your rulebreaking bites you, the entire company's customer base won't even know.

no no no no. I've seen this a hundred times[0].

That service becomes the source for a management report.

That management report contains useful data that the CEO looks at weekly and uses to build his board report.

The original Aussie guy leaves, but leaves the laptop behind because it's not his. He also doesn't document it (because that would get him in trouble for running a server on a laptop).

...Years pass...

The laptop finally dies. The CEO is furious because he can't create his report. He leans on the IT manager. The IT manager has no freaking idea where this report is coming from or who makes it. They lean on the Support team to find out which server produces this report. The Support team drop everything to work out wtf is going on, because this suddenly became their #1 priority.

Eventually, someone finds the decaying husk of the laptop, and works out what's going on. They put together a plan for creating a supported server to do the same thing. It'll take weeks, because they have to provision a server properly through the usual channels. CEO has a rant at the entire IT department for not supporting critical business processes, and not being agile enough to support the business. IT manager takes a beating in the next management reviews. No-one is happy.

[0]usually a rogue spreadsheet rather than server. The worst case I saw was an Excel spreadsheet in a business-critical department running on a user machine with a post-it note on it saying "don't turn this machine off". If the logged-in user name wasn't the same as the temp who had originally built it, the spreadsheet refused to work and the department ground to a halt.

Slippery slope fallacy. We're talking about temporary measures to get things done. If you don't pay back the technical debt, then sure. It always comes back to haunt you.

If you have worked in support, this tends to be less fallacy and more reality. The point is that when cutting corners in the purchasing process, people are often fed up or in too much of a hurry, so things are not documented the way they should be for production. When these process-jumpers leave, the "tech debt" Piper is paid via your support team's / IT's burnout.

I would agree with you, except I've seen it happen so freaking often

I hope you don't mind -- to me that sounds like an off-topic example. What you did, was good for you and the others. Whilst the one you replied to, wrote about doing things that would be good for you, but bad for others:

> ... benefit rule-breakers at the expense of everyone else, including the customers

Agree with that. Rules should be agreed and then followed by all. I was talking about the general rule of whether "ship features over the weekend with a ragtag team of developers" is bad.

At the exact moment you take money or personal information from users, you really had better have a safety/security-first approach and a defined, repeatable, scheduled, abortable deploy procedure. It's not the number of users, it's what it's going to cost each user when you screw up.

> Every tech company I’ve worked for has had at least 1 manager who tries to ship features over the weekend with a ragtag team of developers who don’t understand why that’s a bad idea.

Maybe I've been in startup land for too long, but seems super normal and fine to ship a feature over the weekend if it goes through the regular CI gates - it's tested, peer reviewed, been QAd in staging, etc. Is this not accepted outside of startups?

The hypothetical scenario you’re describing and the story I just read have almost nothing in common except the word “weekend”.

Deployment over the weekend can make a lot of sense in the world of B2B, but there’s a difference between a carefully thought out plan to deploy at a quiet time and sneaking something out when no one is looking.

Shipping over the weekend isn’t always a bad thing when it’s necessary, but it needs to be an all-hands effort. You need to give people a heads up and at least collect the minimum base knowledge to do it right.

In this case, someone was trying to quietly ship things to production on a Sunday without involving the owners of the front end. How would it look for you if some other team crashed your part of the website on a Sunday without even coordinating the change with you first?

My point was that it’s important for companies to not reward selfish behavior from managers who want to make a name for themselves. If you genuinely need to ship a website feature on Sunday, you involve the website team for launch and follow up monitoring. You don’t try to quietly ship it out the door at the risk of breaking other parts of the business, as Rachel explained in the article.

That sounds like a nice smooth pre-production process...

Her story was of an untested feature trying to get injected into the frontend on Sunday to meet a Monday deadline, by someone without the proper access, and with no apparent oversight or process concerns.

I'd do everything I could to block this push as well.

Wouldn’t “everything” in this case be to summon the managers to drop blocks on heads? If you are yourself a manager, summon a more powerful manager that can drop more blocks.

Here's another idea, you can ship late. There has to be a really good reason to meet the Monday deadline and it must be really important to the company. In fact it would be so important that the people and departments would know that it's important. They wouldn't find out about it on the weekend the day before it ships.

Sounds like these people designed a component that wouldn't work on the current infrastructure. That happens, but if there is a big oversight like this, then you should ship late, instead of risk taking everything down.

> seems super normal and fine to ship a feature over the weekend if it goes through the regular CI gates - it's tested, peer reviewed, been QAd in staging, etc.

Multiple of these processes you listed often require humans. You either are asking them to do this during the weekend, which is bad, or you gave them ample time during the week meaning you're ok with steps needed on weekdays, so you can do that for the last step too, with full staff present.

(bugs and urgencies notwithstanding, but that doesn't appear to be what we're discussing)

Depends. If you have a full CI/CD system and use feature flags, you can definitely push such an update into production, with the feature disabled by default but enabled if requests have special attributes set, or for a specified group of people (A/B).
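The pattern being described can be sketched in a few lines (flag names and group names here are made up): the code deploys dark, and a per-request check decides who sees the new path.

```python
# Minimal sketch of the feature-flag pattern: the new code ships to
# production disabled, and only a request attribute or group
# membership turns it on. All names are illustrative.

ROLLOUT_GROUPS = {"internal-testers"}  # groups opted in for the A/B


def new_checkout_enabled(request):
    # Explicit override via a special request attribute...
    if request.get("x-enable-new-checkout") == "1":
        return True
    # ...or membership in an opted-in group. Everyone else gets the
    # old path, so the deploy itself changes nothing by default.
    return request.get("group") in ROLLOUT_GROUPS


def checkout(request):
    if new_checkout_enabled(request):
        return "new checkout flow"
    return "old checkout flow"
```

The key property is that enabling the feature becomes a flag flip, separable from the deploy, and just as easy to reverse without another release.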

My comment is unrelated to what you can do, it's about what you should do. If you can afford weekday humans in the other parts of the process, you can afford a weekday release.

Considering the described situation is someone asking for help understanding how to achieve something (which indicates they don’t have the familiarity with the processes in place to do these things), and they’re asking the team responsible for that infrastructure for the first time on the day before launch, I don’t think one can claim that it’s been well-tested and peer reviewed.

sure, but I'd think some gates on that are standard practice no? Owners needing to approve changes, tests passing, QA'd in a staging env, etc

Sure, but we’re so far away from that scenario here. In that ideal world you describe above, the project leads would have already talked through the requirements and processes involved with the environment and/or service owners and/or release engineering groups. Not all changes are of the sort where “if the tests pass, it’s safe to ship it”, and that requires learning what is acceptable and not, and talking to the relevant people when you’re out of scope.

Environments like those described often have continuous push and automated slow rollouts with health checks, so the idea of doing something on a Sunday isn’t that strange at all.

That said, there’s something to be said for not trying to locally optimize. If you push bad stuff on Sunday, you’re messing up a bunch of people’s well-earned rest and recovery time from work. You push bad stuff on Monday, and everyone’s there to help you fix it without the stress of lost family or other commitment time.

The difference is 24 hours, which likely isn’t going to make or break anything. It’s easy to get sucked into believing things like that matter when they don’t.

I'm at a fintech in NYC and my team generally doesn't release stuff to prod between like 10AM Friday and 10AM Monday.

I haven't had specific conversations with anybody about it, but I think we have all been around the block enough times to have been burned on a few weekends when it really wasn't necessary.

Not a start up at all though, and not a team of twenty somethings with anything to prove by moving fast and breaking things.

Agree with you though that I have seen this at a lot of places. I did a number of phone interviews looking for a more relaxed place in order to end up here.

Deploying stuff over the weekend is standard practice for trading systems, because that's when markets are closed. But obviously those deployments are discussed and planned during the week.

In fact most trading platforms have this huge advantage of not being 24/7 operations.

This handy flowchart hangs on the wall near my company's core infrastructure group [0].

[0] https://www.amazon.com/Should-Deploy-Friday-T-Shirt-Sizes/dp...

That's almost complete... there's another three-question flow which gets to yes.

1) Is stuff really badly broken?

2) Was the really bad breakage introduced recently (this could either be an earlier bad rollout, or it could be external factors changing)?

3) Is this requested deploy either a revert of a recently-made bad change or the minimal possible fix/bandaid?

If all three of these are true, then you can do a deploy RIGHT NOW regardless of the calendar.

(recently - because if this is something which has been broken for years then it's unlikely that it suddenly became urgent absent external change - which is already carved out above)
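The three questions above amount to a single AND gate; a toy sketch, purely illustrative and no substitute for human judgment during an incident:

```python
# The three-question emergency-deploy flow as a predicate: all three
# answers must be "yes" before deploying outside normal windows.

def emergency_deploy_allowed(badly_broken,
                             breakage_is_recent,
                             deploy_is_revert_or_minimal_fix):
    # Any "no" means you wait for a normal deploy window instead.
    return (badly_broken
            and breakage_is_recent
            and deploy_is_revert_or_minimal_fix)
```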

So what you're saying is, it's acceptable to deploy on Friday at 5pm if and only if you're undoing the damage caused by some chump deploying on Friday at 4:55pm?

Yes. And then to find out why a DEFCON-1 level revert was necessary at all: that's a deploy that you want to be prepared for, but you never want to use.

Or some data corrupting bullshit that was released on Thursday and it's taken you that long to notice but it's causing ongoing damage.

It really depends on what it is you're shipping.

Once a startup hits some level of maturity, it's unacceptable to be shipping something significant on the weekend (or whenever people aren't around to respond to an issue). Probably post product-market fit, maybe Series B.

I guess it also matters how much your company values work-life balance.

In my experience at all levels, I would say engineers generally don't have a very good sense of the big picture or of what really matters, and are generally caught up in a lot of details and unneeded complexity that doesn't create value in the manner they think it does.

I remember as a young Eng. getting caught up in the platform holy wars, and then sitting as a PM looking back on it all like I must have been in some cult.

There's truth to the notion that 'it's complicated' and rarely does anything get done in a weekend, but if there is focus, a decent dynamic process, things can move faster.

I worked at one company that had a messaging product; it had a big team of engineers and things moved at a snail's pace. I suggested bringing in a few talented people and starting from scratch as a refactor, and they thought I was crazy. A young intern left the team, did it on his own with one other person, and met with enormous success. The company, even after literally watching an intern outdo them, never changed.

Both the old company and the new company are big enough names you've all heard of them I wish I was at liberty to share.

In another project, we were opening up some basic APIs. We did some work with Facebook and they were able to give us a custom API in literally a few days. Our own, simple APIs took 18 months to deliver. The weekly product teams consisted of 10 people rambling on - and the two most important people, the dudes actually writing the code, were not ever present. It was a colossal and shameful waste.

Even though getting a rag-tag bunch of Engineers over the weekend is usually not a good sign (it might actually work for some marathon bug fixes or something), I'm usually sympathetic to the cause.

I’ve faced this a lot, particularly when the management is a couple of layers away from the team doing the work.

It makes me wonder how these organisations don't collapse under the weight of their ineptitude. Most of the bugs or issues I have to fix are from problems we created with short-term hacks, way beyond simple tech debt.

The engineers are as much at fault as the managers, particularly when it comes to introducing insane complexity to the stack to solve simple problems (how many startups seriously need to invest in tech like gRPC or graphql except to gain cool points?). Management, on the other hand, have no empathy for the people doing the work, so quality dips as we are pressured by both self-imposed and external deadlines which are decided with zero input from engineers.

Half the time we web engineers are building glorified content management systems with some nice design over the top. It’s boring but it’s not a burn out.
