"The engineers and I handle customer support. When I tell people that, they look at me like I'm smoking crack. They say, "Why would you pay an engineer $150,000 to answer phones when you could pay someone in Arizona $8 an hour?" If you make the engineers answer e-mails and phone calls from the customers, the second or third time they get the same question, they'll actually stop what they're doing and fix the code. Then we don't have those questions anymore."
(This is of course a somewhat different thing; he's probably not suggesting having those engineers take calls in the middle of the night.)
In reality, at least in my experience, developers would be very happy to fix the code; it's just that you don't get any time to do it. It's only new features and new products.
So which set of customers do you please? How do you ensure that customer requests for behavior changes, or customers being surprised by application behavior, don't result in scope creep that expands the requirements of your application to an unsustainable level?
How do you ensure that your business doesn't get coopted by BoM (Buckets of Money) to basically be a contract development house and eventual acquisition target by your largest enterprise customer while ignoring/harming all your other customers who were earlier adopters?
There are a lot of jokes about bugs actually being features, but at some point, if an application behaves in a certain way long enough and that behavior is relied on elsewhere, changing it is a new feature with all the consequences that come along with that, even if the new behavior is strictly more correct.
All of these factors need to be balanced in determining the best path to tread in your software. At a small scale, most of these questions can be addressed organically, maybe by the developers themselves, but past a certain scale it's just not feasible to put customers in direct contact with developers. You are working on "new features and new products" because customer issues get turned into identification of new markets to solve those issues, which are serviced by new features and new products rather than by "fixing" the old products and features, which would break them for other customers.
The situation described by GP is primarily an expectation-management and people-management problem, not a technical issue.
You don't. You remember who pays your salaries (i.e., it is the customers who pay you buckets of money, not Joe Random User who bitches about paying $5/mo), and you prioritize the features/requests of the Bucket of Money customers.
If you have a real product, then the requests of Bucket of Money customers will closely match the requests of other customers, and your product will grow gangbusters (see AWS/Salesforce/Dropbox). If they do not, then it is quite possible that in reality you are nothing more than a custom dev shop for one or two Bucket of Money customers, and you don't have a real product.
This is my experience building for and selling to enterprise. There are a limited number of 800lb gorillas, but every company thinks it is a special snowflake. This quickly has you building software tightly coupled to workflow, which is death for commercial software. It's usually one of the reasons the market exists in the first place; incumbents can't get the required reward to match the effort at their desired scale.
> How do you ensure that your business doesn't get coopted by BoM (Buckets of Money) to basically be a contract development house and eventual acquisition target by your largest enterprise customer while ignoring/harming all your other customers who were earlier adopters?
Fork the product into two (or more) separate products, each of which is somewhat similar but caters to a different audience with different needs.
Also, organizing and managing it all will take more time and effort. You need to reconcile them or make a decision.
I agree. At GitLab, I'm trying to build our "Support Engineering" group into people who have the time and mandate to fix the things that spring up, leaving product devs free to focus on new features. It's been working pretty well:
https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/16511 (40% view speedup)
This might be the niche I'm working in, but that does not sound all that unreasonable to me; I certainly know the type.
At first, there is a debate whether there really is a bug (instead of giving a trained user and the corresponding support staff the benefit of the doubt), then there is a debate whether the bug should really be fixed, or the thing is delayed ad nauseam ("need more information" without anyone having even skimmed the damn code once). And when the business side finally puts its foot down, the defect is "fixed" with some minimalistic, ad-hoc duct-tape solution that guarantees five other bugs will pop up in the future because of it.
So while I agree with your second point, that doesn't invalidate the first. These kinds of devs actually do exist (with various degrees of intensity, of course). I'm not sure the archetype is really fit for support work, though.
I mean, bugs and features are intermixed ALL the time. Just this week I was working on a feature (that was, in reality, a bug) while talking to my partner who does support, and I saw him ignoring a red modal window with an error message ("oh, that is not a bug, it lets you keep working" (???)).
You can turn the bugs and anti-features of the competition into your own selling points (our product syncs the offline database without needing an operator to restart the service and click "sync" in the dashboard!!!) and stuff like that...
A simplistic description is, "sales and marketing increase MRR, a quality product decreases churn".
Often a company becomes financially viable by fixing churn.
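To put rough numbers on the MRR/churn point: if new MRR comes in at a constant rate and a constant fraction of existing MRR churns every month, revenue levels off where the two cancel out, at new MRR divided by the churn rate. This is a simplified model assumed purely for illustration (real sales and churn aren't constant), and the figures are made up.

```python
def simulate_mrr(new_mrr_per_month: float, monthly_churn: float,
                 months: int, start_mrr: float = 0.0) -> float:
    """MRR after `months`, assuming constant new MRR and a constant churn fraction."""
    mrr = start_mrr
    for _ in range(months):
        # Lose the churned fraction of existing revenue, then add this month's sales.
        mrr = mrr * (1 - monthly_churn) + new_mrr_per_month
    return mrr


def steady_state_mrr(new_mrr_per_month: float, monthly_churn: float) -> float:
    """The level at which revenue lost to churn equals new revenue each month."""
    return new_mrr_per_month / monthly_churn


# Same sales effort, churn halved from 5% to 2.5% a month: the ceiling doubles.
print(f"{steady_state_mrr(10_000, 0.05):,.0f}")   # 200,000
print(f"{steady_state_mrr(10_000, 0.025):,.0f}")  # 400,000
```

The point of the model is that sales and marketing only raise the inflow, while reducing churn raises the ceiling that inflow can ever reach.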
I started informing the developers every 2nd (or maybe it was 3rd) time I saw different people experiencing the same problem.
So by the time we'd had around 8 people reporting the same issue they'd gotten tired of seeing my face and fixed the bug.
It's the feedback that's important. A little friction in the path could be a good thing, as long as the people providing the friction have an incentive to do the right thing. I sat one room over from the devs, I was friends with one of them (and had designs on switching teams), and we all went to the same planning and status meetings, so my head was in the right place for this.
I recall many many years ago Dell cancelled a plan to move front-line QA overseas because they realized that there was a perverse incentive for an external team to keep quiet about frequently asked questions (because they're cheap to answer and you get paid by the call).
Those people in Arizona won't give a damn about how many people can't figure out the app, so long as users keep calling (i.e., it's just good enough that you retain customers).
The other thing is that you get familiar with the "type" of problems that arise. This helps you communicate with customer support when they call, or know what questions to ask to get to their issues.
I've had customer support complain about an issue or feature request they had never told us about, even though they'd had pain points and customer requests about it for months. It is always a trivial change. Improving communication between customer support staff and developers is hugely valuable, however it happens.
<< If they pay us enough money for me to notice when they no longer do, I will take their call at any time and so will you >>
This is great for actual issues, but what about the genuinely stupid questions? Or the customers who basically just refuse to read the Help section before contacting customer support? From my experience working in support, there are a fair number of those. And those just seem like a waste of the engineers' time.
The software field seems very anti-union, for reasons that I don't entirely understand. Protecting your time is one feature that unions offer. Want to be paid for every hour you're on call? Get the union to put it in their rules for employers.
The alternative we have now is that each developer is responsible for negotiating this into their contract on their own -- even though they probably have little or no experience with legal contracts. (Dammit, Jim, I'm a programmer, not a contract lawyer!)
I'm surprised that software professionals have opted for a solution that necessitates each person solving the same problem each time it comes up. It's like manual memory allocation but for employment contracts. We just assume everyone is always perfectly competent at this skill, and if they aren't, it's all on them!
Right now, software developers have LOTS OF POWER. Companies are constantly courting devs, offering very high salaries, and including incredible perks that simply aren't available in most other industries. Yes, some engineers are exploited, but on average the power lies with the employees.
When you think of all the "union overhead" (voting on contracts, complying with work rules, occasionally going on strike), there just isn't enough pain to make that worth it. Yes, I might work too much overtime and go on call at bad hours, but I can also leave and get a new job relatively quickly, and get paid a lot to work in a climate-controlled environment with free drinks and food.
A real software union will arise if and only if it becomes necessary. If somehow everyone is automated out of a job and the few engineers left feel that their very existence is at risk, and the few employers remaining ruthlessly exploit those engineers for minimal pay and benefits, THEN people will fight back (and a union is one possible strategy for this fight). Until then it's a waste of effort.
This is actually not true in my experience. Having a high salary and perks isn't the same as having power. On-call is a perfect example where, AFAIK, it's not the norm to get paid for it (as a developer at least), and if you don't like it, you're free to get a new job. Except the new job will likely have the same policy.
There are other work-life examples: for example, I've tried to find a part-time job, and to get jobs to offer me more vacation time in lieu of higher pay. Both of these are more common in countries with strong unions, but I've never been able to get them here.
My view is that you need to look at the totality of compensation for the totality of responsibilities and decide if the deal is fair or not.
Nonsense. There are a bazillion software companies out there and most look nothing alike. If on-call is a dealbreaker for you, just ask the right questions and be clear up front.
And if you want to work part time, consider contracting.
It’s fun and all, but I’d prefer more vacation time over a ping pong table and free dinner.
The "perks" I see being offered for software developers are essentially free for employers to offer at scale. When you're paying someone a six-figure salary, "free drinks and food" at work are cute but pointless. Anything of real value, like working conditions, is never up for negotiation. I see lots of people here saying they've never been paid for being on call (nor have I). Most people I've asked say they'd prefer, and are more productive in, a private office, but no employer in my city offers it. You can't just switch employers to fix this.
"Union overhead" is very low, for what workers get out of it. Strikes are actually quite rare. Boeing machinists are a giant union here, and they haven't been on strike in over a decade. Voting and compliance are strange things to complain about. Either they're already done today, but in different contexts (e.g., one-on-one meetings, if your manager listens to you and has the power to give you what you want), or they're not done today, and it would be a great improvement for workers if it were.
These complaints sound downright bizarre in any other context. I don't hear anyone looking at a modern democratic republic with safety regulations and saying "Well, things like voting and compliance are just more trouble than they're worth".
Only if you ignore the fact that union members are giving up power to union leaders, who can force them to give money to support political candidates as a condition of employment, and who can, as part of a contract, deny them the ability to participate in labor actions if the union leadership disagrees, probably for some profit motive.
(With how Hollywood is run, you might as well have While Loop Engineers.)
So... we switch hats a lot. In the last six months I've been (a) deep into research, producing basically Beamer slidesets; (b) implementing that research in Python; (c) spearheading the basic product design process, albeit based on the whole team's insight (my lemma was "I'm basically an anthropologist trying to make sense of what everyone's been saying", although I made significant calls); (d) pitching in with front-end programming in Angular when our third-party provider pooped out and it got to crunch time; and (e) spending deep time writing finance spreadsheets and valuation models.
It's a tight ship and I'm not even the most multi-talented person in the crew.
IMHO the real selling point for a software union would be a union-sponsored, tax-advantaged retirement plan that is ruthlessly optimized to favor employees. Lots of startups don't have retirement plans or have shitty ones. Furthermore, lots of companies run into issues with non-discrimination testing for "highly compensated" line engineers. If you get things set up with a multi-employer plan with a pretty lightweight collective bargaining agreement for employers to sign, you can get significant tax advantages.
A few ideas, not in any specific order:
It would kill start-ups, wouldn't it? If you have to obey union rules, you can't have one person who's the DBA and the project lead and the primary programmer and and and because those are all different jobs and require different union employees. Great if you're Microsoft and can afford it, but otherwise...
What would it do to Open Source and Free Software? Can it be Union-Certified if it's made by people who aren't all being paid? If a company has to Look For The Union Label, as the song goes, is it going to be allowed to use software which comes from outside the corporate sphere?
Related to the above, what would happen to hobbyist programming? Would it be regulated out of existence entirely, or merely relegated to people writing software nobody's allowed to use because it was made off-the-clock?
If the union so desires, it could state that no union member may do all the aforementioned jobs at once unless they have at minimum an indelible 5% ownership stake in the company; bear the title of Chief Technical Officer; report only to the CEO, COO, or directly to the board; and have no employee subordinates. It sets the expectations for any company wanting to do that. Any company wishing to ignore it can still hire non-union employees, but the union employees are instructed to embargo any company that does not meet the standards for employment.
Hobby projects are irrelevant. There is no employer. Unions are a cartel for paid labor by employees, where there may be a power imbalance between owners, managers, and laborers. In a hobby project, the worker is always the owner.
FOSS is likely unaffected. Any employer that has FOSS projects is already reined in by the threat of forks. In the extreme case, the union could manage their own forks of FOSS projects, and refuse to merge changes that don't meet union criteria.
It isn't about whether anyone is paid, but about whether everyone is following the cartel rules.
Why would the union rules require separate people for two positions? Why would this be relevant to open source and free software? Why would it be relevant to hobbyists? And so on. I just don't see these problems in other fields. There's unions for actors, and stagehands, and janitors, but I still have a local amateur theatre in my neighborhood, and the lighting director sweeps the floors when that needs doing.
Whenever I say the word "union", all the complaints I hear against them from programmers sound like straw men. That makes me believe that it's the right track.
You refused to understand my post and downvoted to disagree.
Your position is, therefore, refuted by your own dishonesty.
I didn't know how to react and said sure, here's my number. After waking up twice during the night, I decided to mute my phone completely from 10pm to 7am.
Sometimes I woke up to 50 messages and would try to solve the problem from home before heading into the office. Sometimes a coworker was already on it if they were in the office early.
One time my boss asked in the morning if I hadn't gotten any messages. I said, sure I did, but I was sleeping; I don't get paid to get up in the middle of the night. He didn't say anything, and I kept turning my notifications off :)
However, hiring people and not telling them this is part of the job is insanely unfair. People should get paid for being on-call. Further still, if something is going to wake me up during the night, I would expect to be able to work on those problems as a priority.
If I can't do that, then no dice. That further explains why I don't really agree that devs should be able to throw code over the wall at Ops and ruin their lives whilst they sleep easy and prioritise features over stability.
> which includes meeting the support requirements out of hours
I'm not good at dealing with issues with a deployed product while I should be busy living my life.
Instead, you think some ops guy should do it?
At the end of the day, an Ops guy who is not part of the development team can't really do all that much when a bad commit brings down the system. Sure, we can take a stab in the dark and roll back, but we don't know if that's going to make the problem worse; or we can restart it, but that's about the easiest thing in the world to automate.
So why not get the experts on the system, who rolled out the change and undertook the quality control, to be part of the team that fixes the outage (regardless of what time we do it at)? Who is better suited?
How does an on-call rotation prevent this?
Where does the responsibility end? At me doing the best job possible under the circumstances, all that could be expected of a human being.
Do you think the same about your previous jobs? If they find a bug you wrote you should come in and fix it?
Have you considered mentioning this to your colleagues, your boss, or anyone telling you it's your turn? If not, it's really not cool towards your colleagues, boss and organization in general.
a) my wife
b) my kids
c) willing to pay me
(I was on call every 3rd week back in my biomed days but I was explicitly compensated above my normal pay for this)
Additionally, if this job duty were actually important, then the employer would react with some criticism when they discovered that an employee was not performing it adequately. The fact that the employer isn't upset about the on-call employee ignoring messages means that it isn't important. Making employees be on call when the employer doesn't actually care about anyone being on call is also not cool.
In this case I was being paid for it, and I only heard about it in the first week after starting. In the contract negotiations I asked "how often do I have to work on weekends" but not "will I be on call". Definitely a question I will ask in the future. That said, I think being on call is a good thing.
" all work necessary outside these times, including Saturday and Sunday, is already remunerated in the base salary."
That's actually not enforceable in most US states. Typically a base salary cannot cover work which is outside normal business hours if it's a consistent expectation that work will be performed outside normal business hours. Contracts are enforceable because both parties are given "due consideration", so that the terms are intended to be mutually beneficial. As written, what you described is basically saying "Do more work, do it outside typically expected times to perform work, and you're not getting anything more for doing this."
As an example of how this works, in a situation in which you will work consistently outside normal business hours but are a salaried employee, such as a manager supervising an overnight crew at a manufacturing facility, you will be awarded "Shift Differential Pay", which is a percentage or dollar amount differential above and beyond normal pay for someone of your position at the company as "due consideration" for agreeing to work outside normal business hours. This would be true for your overnight crew on the line as well, except that it's included in their hourly.
Being salaried does not automatically mean you are a slave to the company and its whims, and being overtime-exempt does not mean you can be asked to consistently work outside normal business hours week after week without any additional compensation. Overtime is an occasional thing; if you are being asked to work outside normal business hours as standard operating practice, that's something else entirely.
All of my jobs have been as the GP describes: managers see after-hours problems as the "fault" of the dev responsible for the relevant feature, and expect them both to manage communication with business stakeholders about the problem and to make it go away on short notice.
Of course, getting a call in the middle of the night when your system goes down because of an AWS outage you can do nothing about just sucks ;)
I've been on call for 20+ years. I've never gotten paid extra for it. I just figure it's baked into the normal paycheck. As long as everyone on the team is doing on call about the same amount, it doesn't really matter. At the end of the year, it usually works out pretty evenly.
> Scheduling. When I have been on call, it has always been one week at a time,
I agree with this one. At Netflix we tried a bunch of different schemes, going from just a few hours on call at a time to a week at a time. The week seemed to work out best for everyone.
> Escalation. There should always be an escalation path if there is a real crisis.
At Netflix, our escalation path was always the on-call engineer, me (the team lead), and then my manager and then their manager. It almost never got past the first engineer, and in the rare cases it did, pretty much everyone on the team was ok with getting a call at any time to help in a crisis, so usually we'd just call whoever would have the most relevant expertise for the current issue. Oftentimes another one of us was already on the call listening anyway. It rarely rolled up to me.
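A minimal sketch of that escalation order, assuming a simple "page each person in turn until someone acknowledges" policy. The `acknowledges` callback is a hypothetical stand-in for whatever paging tool you actually use; it is not any real tool's API.

```python
from typing import Callable, Optional

# Escalation order as described above: on-call engineer, then the team lead,
# then the lead's manager, then that manager's manager.
ESCALATION_PATH = ["on-call engineer", "team lead", "manager", "manager's manager"]


def escalate(acknowledges: Callable[[str], bool],
             path: list[str] = ESCALATION_PATH) -> Optional[str]:
    """Page each role in order and return the first one that acknowledges.

    `acknowledges` is a hypothetical stand-in for "did this person pick up
    the page?". Returns None if nobody on the path responds.
    """
    for role in path:
        if acknowledges(role):
            return role
    return None


# The usual case: the first engineer handles it.
print(escalate(lambda role: True))                       # on-call engineer
# The rare case: the first engineer doesn't answer, so it rolls up to the lead.
print(escalate(lambda role: role != "on-call engineer"))  # team lead
```

In practice, as the comment says, the team often skips the strict order and just calls whoever has the most relevant expertise; the fixed list is only the fallback.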
My point being, beyond the rigidity of one person being designated on call, if you really want it to work well you need to be flexible and trust that your team is made of competent people that you can rely on, and they need to be cool with getting a call when they aren't on call, assuming that they might call on you one day.
They've noted that it's important to pay compensation, both to be fair to the employees and as a closed-loop feedback mechanism to ensure the business prioritizes fixing pages. This concept of business feedback is also discussed in the terrific Seeking SRE book, in the chapter "Against On-Call: A Polemic".
> Adequate compensation needs to be considered for out-of-hours support. Different organizations handle on-call compensation in different ways; Google offers time-off-in-lieu or straight cash compensation, capped at some proportion of overall salary.
> The compensation cap represents, in practice, a limit on the amount of on-call work that will be taken on by any individual.
> This compensation structure ensures incentivization to be involved in on-call duties as required by the team, but also promotes a balanced on-call work distribution and limits potential drawbacks of excessive on-call work, such as burnout or inadequate time for project work.
 https://landing.google.com/sre/sre-book/chapters/being-on-ca... (search Compensation)
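As a rough illustration of how a cap like that limits on-call load: the sketch below pays a flat hourly rate for on-call time and stops paying at a fixed fraction of salary. The hourly rate and cap fraction are hypothetical; the SRE book only says the cap is "some proportion of overall salary" and doesn't give numbers or say how the uncapped amount is computed.

```python
def capped_oncall_compensation(oncall_hours: float, hourly_rate: float,
                               annual_salary: float, cap_fraction: float) -> float:
    """Cash owed for on-call hours, capped at a fraction of annual salary.

    The flat hourly rate and the cap fraction are assumptions for illustration.
    """
    if not 0 <= cap_fraction <= 1:
        raise ValueError("cap_fraction must be between 0 and 1")
    uncapped = oncall_hours * hourly_rate
    cap = cap_fraction * annual_salary
    return float(min(uncapped, cap))


def hours_until_cap(hourly_rate: float, annual_salary: float,
                    cap_fraction: float) -> float:
    """On-call hours after which extra hours earn nothing more."""
    return cap_fraction * annual_salary / hourly_rate


print(capped_oncall_compensation(500, 20, 100_000, 0.25))    # 10000.0
print(capped_oncall_compensation(2_000, 20, 100_000, 0.25))  # 25000.0
print(hours_until_cap(20, 100_000, 0.25))                    # 1250.0
```

Past the cap there's no incentive to take more shifts, which is the "limit on the amount of on-call work" the quote describes.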
By default, in all but one US State, employees can be fired with no notice for no reason (or for any reason other than a handful of specific exceptions like "because of your race"). The legal departments of most companies feel that having a clearly specified contract would weaken this right, so as a matter of policy they do not have contracts with the majority of employees (this includes most tech employees). They have employees sign separate contracts for things like "keeping all the company's secrets", "granting ownership of copyrights and patents" or "promising not to work for any competitor for some length of time", and these contracts are not tied to their salaries.
You are welcome to decide that you don't like this system, but you will find it difficult to find employment.
- This is not a contract; any contract with us must be signed by the CEO. (Paper is not signed by the CEO.)
- You are an "at will" employee. The employment relationship may be ended at any time, by any party, for any reason, or no reason at all. There are no notice requirements, and any and all obligations of one party to the other are severed at the moment of separation.
- We may change the terms and conditions of your employment at any time. If you don't like it, you are free to leave.
As employment "contracts" go, these were slightly less useful to me than a roll of toilet paper.
When I made my comment, I was trying to get a handle on just why Americans don't think of these as contracts, and the quoted bit is why, I think. To an American, for whatever reason (probably because that's how Europeans do it), an employment contract means that the company can't just fire you.
The fact that should the agreement ever turn up in court, it's contract law that will be used to adjudicate it, just doesn't register. Probably because lawsuits are so far away from the American consciousness, something only big companies with huge budgets do with each other. Or ambulance chasers or other such grifters.
The only negotiable point is the rate of pay. The valuable consideration is money for labor. All other terms and conditions of employment are set by the employee manual, which is "do this, just like this, or we fire you".
As contracts go, they suck for the employee.
The culture shock I'm feeling at this discovery is worse than discovering the US doesn't have electric kettles, bans crossing the road, or still uses cheques in shops.
> they need to be cool with getting a call when they aren't on call
If I'm getting an emergency call when I'm not on call twice a year, well, these things happen.
If I'm getting such a call 12 times a year, I gotta wonder if there's a problem with our testing practices, our prioritisation/tech debt, our L1 support, our high-level architecture, our infrastructure, the teams that interface with our system, our training practices or our hiring standards.
Some of those will be within my power to fix. Others, maybe I figure it's easier for me to move companies.
So much for the 40-hour work week.
People would be spinning in their graves if they knew what had become of employer-employee relationship.
The best scheme in my experience is to have a fully staffed 24/7 L1 working shifts for initial triage, who then wake the appropriate L2 to deal with the actual problem if it isn’t covered in the runbook. No good waking the DBA if you can’t log in to the database because a router has crashed and failed to fail over. The next morning the L2 guy updates the runbook so that, if the same thing recurs, L1 can just handle it.
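Here's a toy sketch of that L1/L2 flow, assuming the runbook is just a lookup from a known symptom to a documented fix, and that L1's triage has already identified which component actually failed. All names here (`Incident`, `Runbook`, the `L2_ON_CALL` routing table, the "team lead" fallback) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Incident:
    component: str  # the component that actually failed (hypothetical labels)
    symptom: str    # short description, used as the runbook key


@dataclass
class Runbook:
    # Known symptom -> documented fix that L1 can apply without waking anyone.
    fixes: dict[str, str] = field(default_factory=dict)


# Hypothetical routing table: which L2 specialist owns which component.
L2_ON_CALL = {"network": "network engineer", "database": "DBA"}


def triage(incident: Incident, runbook: Runbook) -> str:
    """L1 checks the runbook first; only unknown problems wake an L2."""
    if incident.symptom in runbook.fixes:
        return f"L1 handled: {runbook.fixes[incident.symptom]}"
    # Wake the owner of the failed component, not of wherever the symptom
    # showed up (no DBA for a crashed router). Fallback is an assumption.
    return f"paged L2: {L2_ON_CALL.get(incident.component, 'team lead')}"


def postmortem_update(runbook: Runbook, incident: Incident, fix: str) -> None:
    """The next morning, L2 documents the fix so L1 can handle a repeat."""
    runbook.fixes[incident.symptom] = fix


rb = Runbook()
outage = Incident(component="network", symptom="cannot log in to database")
print(triage(outage, rb))  # paged L2: network engineer
postmortem_update(rb, outage, "fail over to the standby router")
print(triage(outage, rb))  # L1 handled: fail over to the standby router
```

The key property is the feedback loop: every L2 wake-up should shrink the set of problems that can wake L2 again.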
What really doesn’t work is having the on-call support also doing out-of-hours work such as maintenance or whatever. That’s a recipe for disaster. People willing to make the sacrifice of on-call are a precious commodity to be used wisely, or they’ll walk.
Paying only in the event of a callout doesn’t work either; that person has still had to decline other activities in order to make themselves available.
I like the idea of compensating by the week; seems like compensating per incident gives a bit of a perverse incentive.
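The perverse incentive is easy to see with made-up numbers: under a weekly stipend, fixing root causes doesn't change your on-call pay, while under per-incident pay it cuts it. The six on-call weeks, the stipend, and the per-incident rate below are all hypothetical.

```python
def weekly_stipend_total(weeks_on_call: int, stipend_per_week: float) -> float:
    """Paid for being available, independent of how often things break."""
    return weeks_on_call * stipend_per_week


def per_incident_total(incidents: int, pay_per_incident: float) -> float:
    """Paid only when paged, so income grows with the number of incidents."""
    return incidents * pay_per_incident


# One engineer, 6 on-call weeks a year; fixing root causes cuts
# incidents from 20 to 4.
for incidents in (20, 4):
    print(incidents, weekly_stipend_total(6, 500), per_incident_total(incidents, 150))
# 20 3000 3000
# 4 3000 600
```

With the stipend, the engineer who fixes the flaky service loses nothing; with per-incident pay, they take a pay cut for it.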
I agree that it is good to try both things, and developers should try to be support once in a while, and vice versa. However, I think they are very different mindsets and doing both is making people less productive.
When you're a developer, you need to concentrate on only the new feature you're working on. The better you concentrate, the fewer bugs and problems you introduce. However, when you fix issues for the customer, it's often punctuated work where you wait for the customer or investigate the problem, so you work on many different little things at once. It is not a very good environment for making bigger decisions about architecture changes.
Of course, it depends on the person. Some people are really good architects/developers, and you don't want them to spend their days talking customers through issues. On the other hand, some people are more comfortable in support, and that's good too.
And I think if a bug fix requires a larger rearchitecture of the thing (that is, the root of the problem is more conceptual than just incorrect code), then it's better addressed by a temporary patch, with the proper fix done in the development cycle.
Personally, I really dislike emergency software debugging. I much prefer the kinds of projects that prioritize testing and code stability so as to obviate the need for an 'on call' developer. In these kinds of projects an emergency call should be so rare as to essentially be never.
(High-reliability engineering is very much a different, more expensive, less agile culture from software startups, and I worry that the culture is bleeding across in inappropriate ways. The "self-driving" car with a "safety driver" is an extreme example of this: an on-call human that's supposed to respond to operational problems in an extremely short timeframe, but also provides an opportunity to blame the human rather than the software)
An on call engineer shouldn't be the solution for bad reliability, because as you said it doesn't help. He is primarily there for availability.
Instead of high throughput I maybe should've said highly available systems.
Splitting the people that write software, fix bugs for that software and make sure their system is available throughout the night is terrible from a responsibility point of view. It makes you not give a crap, while developing new features. That's just human nature.
1 - it's one week long
2 - it's paid extra
3 - it's optional, but you're a bit of a bad mate if you don't participate
4 - we try to have at least 6 people on rotation to ensure a full month between on-call shifts
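For anyone wondering how 6 people gets you "a full month" off: with one-week shifts in a simple round-robin (an assumption about how the rotation is ordered), everyone else takes a week before your turn comes back, so the gap is five weeks. A quick sketch, with hypothetical names and start date:

```python
from datetime import date, timedelta


def weekly_rotation(engineers: list[str], start: date,
                    weeks: int) -> list[tuple[date, str]]:
    """Round-robin schedule: each engineer takes one full week in turn."""
    return [(start + timedelta(weeks=i), engineers[i % len(engineers)])
            for i in range(weeks)]


def gap_between_shifts(team_size: int) -> timedelta:
    """Time off between the end of one of your shifts and the start of the next."""
    # With one-week shifts, every other team member takes a week first.
    return timedelta(weeks=team_size - 1)


team = ["ana", "ben", "chen", "dev", "eli", "fay"]  # hypothetical names
schedule = weekly_rotation(team, date(2024, 1, 1), weeks=12)
print(schedule[0][1], schedule[6][1])       # ana ana
print(gap_between_shifts(len(team)).days)   # 35
```

Drop to four people and the gap falls to three weeks, which is where people start to feel it.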
Because we make several changes to production per day, our test coverage is above 99% for all our services and libraries (my team is responsible for about 30 of them). We have near-zero live incidents, and whenever the phone does ring, it ends up being just some unpredictable spike in load that self-heals without intervention.
Because on call is not painful (as it shouldn't be!) and we support each other no one has any problem being on call.
IMO what companies should be doing is paying extra per hour until they get people that want to do it. As in, increase the price they pay "extra" until someone decides to give up their free time outside of normal development hours.
On-call also varies a ton between companies. I was technically on-call all of the time in my last job, but it was a low throughput system. I had to be up at odd hours maybe once every 2-3 months. I slept pretty well. If you offered me free meals for the week, I wouldn't mind taking my turn on the watch regularly.
This job, I'm on call maybe one week every two months on a high throughput system, and even though it's only half of the day (we have an overseas team to take the night shift), it's generally acknowledged in the team that your sleep takes a hit and you get no real work done that week. If this were an optional part of my job, you'd have to pay me double for the week (basically a 10% raise).
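Checking the "basically a 10% raise" figure: doubling pay for an on-call week adds one extra week's pay, and one week every two months is about six weeks a year. Assuming a 52-week year, that works out to roughly an 11.5% raise, so "basically 10%" is in the right ballpark.

```python
WEEKS_PER_YEAR = 52  # assumption: pay is spread evenly over 52 weeks


def raise_from_double_pay(oncall_weeks_per_year: int) -> float:
    """Fractional raise if each on-call week is paid at double the normal rate.

    Doubling one week's pay adds exactly one extra week's pay, so the raise
    is simply on-call weeks divided by weeks in the year.
    """
    return oncall_weeks_per_year / WEEKS_PER_YEAR


# "One week every two months" is about 6 on-call weeks a year.
print(f"{raise_from_double_pay(6):.1%}")  # 11.5%
```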
As I said, it is optional: no harm comes to you for not participating, and we have people with very good reasons for not helping the team support the code they built and deployed to production themselves.
This seems to clearly state that you have a negative opinion of people who turn down extra work that is clearly undesirable. I assume that you're not the only one on your team who feels this way, either; that's what I meant by social penalty.
Depending on who you ask it may not be a penalty but that would mean everyone on the team has to think this way. I personally don't think people should be considered a bad mate if they don't want to do an optional thing -- what if they value their time more than the pay+extra that's being offered?
I don't know if you've noticed, but time is the one thing you can never get back. Once you make a certain amount of money, you can live very comfortably -- priorities shift to things that you can't just buy, time is basically the most lucrative and rare yet abundant resource there is.
There are other companies to work at, and we make our expectations clear before the person is hired in relation to on call.
Because many optional activities have an impact on your peers and they are unlikely to judge you strictly based upon your job duties?
As far as your impact-on-peers argument goes -- you could "optionally" also stay 5 hours after you normally go home to reduce your peers' workload and help them. Do you do that? No? What about 4 hours? What about 3? 2? Where is it fair to stop? The rest of the adult world calls this professionalism: you stop at what's required of you as your job duty, as set out in your employment contract. In the course of fulfilling that duty you're expected to be reasonably courteous, not to subscribe to some weird hostage situation where the rest of your team suffers if you don't do something that was marked as optional.
You seem to be assuming there is a lot of peer pressure placed on you if you don’t want to do it.
I’m simply saying there are always social costs. For example you probably won’t be listened to as much when there are conversations around improving system stability.
It’s like our after work Friday drinks are entirely optional - but lots of people build friendships and trust there and this can often lead to higher productivity.
If you can build these friendships another way or have a different path to an equivalently high productivity then not going doesn’t have an impact on you.
That sounds like not listening to people about things they might be good at and know something about, because you want to punish them for something completely unrelated. Namely, punish them for not participating in "optional" activities. All the while you don't want to openly and transparently say what you expect from people.
Yes, it is manipulative, and it is a bad workplace.
> It’s like our after work Friday drinks are entirely optional - but lots of people build friendships and trust there and this can often lead to higher productivity.
It sounds like nepotism, where your ability to function and be promoted rests on your ability to make friends and be charming over beer.
Not a meritocracy, but rather a badly managed workplace.
Seriously, you openly say that you would listen to and judge system stability suggestions based on participation in a supposedly optional activity unrelated to system stability. You also openly say that you trust people's work based on Friday beers instead of on how they act when working.
That sounds like a horrible workplace for anyone who cares about work and a great workplace for charming bullshitters.
That means that fathers don't have to go drinking on Friday evenings and can be with their families. It means that parents who pick up kids after work are not disadvantaged by it. It means primary caregivers (women) take a smaller hit to their careers than they would otherwise. It means that people can do sports on Fridays, teetotalers do well, and anyone can use Friday evening to travel.
It is not merely mildly manipulative. It is literally bad office politics framed as "being social". Peers being passive aggressive is no different from management being manipulative or passive aggressive.
Lastly, it also means that I can make open, transparent agreements about my work, preferences, and salary compensation. In your setup, such things are not talked about openly and conflicts are not solved directly.
In my experience they are closely related.
> You also openly say that you trust people work based on Friday beer
Sure - there is an incredible depth of research on building trust through outside-of-work/after-work activities.
> That sounds like horrible workplace
Strange, considering I work at companies regularly listed in “best companies to work for” surveys.
As in, they are fun places if you're single, but if you don't want to offload all childcare or care for sick relatives onto your partner, you will be punished for drinking with buddies less. Your actual in-the-workplace behavior and output will be irrelevant.
They are fun places because of the ping pong table and Xbox console, but you won't be able to make explicit agreements about your workload and the nature of your work.
This is a problem, because during the interviews the person was repeatedly told we did on call, and they said they were OK with it.
For us, as a team, it's important because if you have the right to deploy to live whenever you want (after code review, obviously), you have the obligation to keep it running. When everyone shares the load, the load is lighter for everyone. And my experience tells me it just makes everyone much, much more responsible and professional.
However, my employer has moral agency above me only insofar as I'm not committing a crime against them (e.g., fraud or embezzling company funds). This does not include me not performing duties I'm paid for. If I don't do my work, then they don't pay me. This is a civil matter. This certainly doesn't include the reason why I'm deciding not to do optional work. My employer doesn't get to decide that I'm somehow a bad person because they don't agree with why I'm not doing extra non-required work.
Eventually, I'm going to be a corpse in the ground. I'm not missing out on my other life goals because you weren't satisfied with my priorities and it turns out that the money you were offering didn't help me accomplish what I want to accomplish.
How do you know this?
And if you think that's bad, let me tell you about my friend. He's the only cardiologist in town...
If I have the option to not do a bad thing (even if it is only minimally bad), then why shouldn't I pursue that option?
If you don't mind doing the bad thing, then you should definitely take advantage of that. But you probably shouldn't try to convince other people that the bad thing isn't bad. 1) It reduces your own advantage of being willing to do the bad thing, which you are hopefully converting into money. And 2) its end game is making people do something they don't enjoy without reason or compensation, which seems bad.
I'm glad that your situation worked out for your own personal good. Nice things happening to people make me happy. Things working out for people make me happy. However, the situation you describe is the latter, not the former. That your bad situation, which ultimately worked out for you, did not bother you enough to be problematic (for you personally) doesn't make it a good situation. I'm glad that it didn't bother you. However, it may have bothered someone else.
Your situation was objectively bad not because it bothered or didn't bother you or another person. Your situation was bad because it was the result of a powerful entity externalizing their failures onto weak entities.
A manufacturing plant that has the ability to setup logistics to keep a plant running 24/7 is a powerful entity. A manufacturing plant that is able to support jobs for at least three different engineering disciplines (chemical, mechanical, electrical) is a powerful entity.
A powerful entity is able to hire additional staff to handle non-working-hour emergencies. That they didn't hire this staff was their failure.
But that's okay, they don't have to pay for this failure because they can force their employees to pick up their failure by working extra hours. The employees are weak entities because they do not have the ability to decline an encroachment of their working lives into their personal lives.
They could be sleeping, or eating, or spending time with their families, or spending time on hobbies, or spending time innovating with their discipline. All things which help society and the economy. But instead that time has been stolen to make money for something that already has plenty.
For example, if I figured out the secret to creating strong AI with respect to writing software such that I could replace the entire software engineering industry with one large computer (note: this isn't something I believe will be possible for centuries if it is ever possible), then I would feel compelled to use the billions of dollars this would undoubtedly get me to help retrain all of the software engineers I just permanently put out of a job.
It's also stated as 'if it is within your power to do good, then you should do it'. My contention is that a powerful company should hire more people to cover additional work instead of finding creative ways to get additional work out of currently employed people for the same amount of money because hiring more people is a good that they are able to do and getting more work for less money isn't.
I'm fine with us not agreeing. But I'm also fine with me being right, which is why I'm still typing.
So how much did that pay?
Not that this is necessarily a good thing. I feel like most of the turnover is completely preventable, should the employer want to actually keep employees for longer than 2-3 years…
- $200 for being available for the week. Must be within 1hr of the office should the call require you to come in to use special equipment
- All calls first are screened by our Sales team: "Can this wait until business hours tomorrow? There is a substantial call out fee"
- 1hr minimum for phone support
- 3hr minimum for us logging in with our laptops
- 1.5x rate for calls during "days" (7am-10pm Mon->Sat)
- 2.0x rate for nights/Sundays/stats (statutory holidays)
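To make the scheme above concrete, here is a rough pay calculator. It is a sketch of one reading of the rules, not the company's actual payroll logic. Assumptions: the base hourly rate is not given, so $50/h is made up; the $200 stipend stacks on top of call pay; each minimum applies per call; and the multiplier is decided by when the call starts. Under that reading, the "6hr @ 2.0x for 15 minutes of work" Sunday matches two short laptop log-ins.

```python
from dataclasses import dataclass
from datetime import datetime

WEEKLY_STIPEND = 200.0   # flat amount for being available that week
PHONE_MIN_HOURS = 1.0    # minimum billed for phone support
LAPTOP_MIN_HOURS = 3.0   # minimum billed when we log in with our laptops


@dataclass
class Call:
    start: datetime
    hours_worked: float
    logged_in: bool  # True if we had to log in, False for phone-only support


def rate_multiplier(start: datetime, stat_holidays: frozenset) -> float:
    # Sundays (weekday() == 6) and stat holidays are always 2.0x.
    if start.weekday() == 6 or start.date() in stat_holidays:
        return 2.0
    # "Days" are 7am-10pm Monday to Saturday at 1.5x; everything else is night.
    return 1.5 if 7 <= start.hour < 22 else 2.0


def call_pay(call: Call, base_rate: float, stat_holidays: frozenset = frozenset()) -> float:
    minimum = LAPTOP_MIN_HOURS if call.logged_in else PHONE_MIN_HOURS
    billed_hours = max(call.hours_worked, minimum)
    return billed_hours * base_rate * rate_multiplier(call.start, stat_holidays)


def week_pay(calls: list[Call], base_rate: float, stat_holidays: frozenset = frozenset()) -> float:
    return WEEKLY_STIPEND + sum(call_pay(c, base_rate, stat_holidays) for c in calls)


# Hypothetical lazy Sunday: two short log-ins (15 minutes total) at $50/h.
sunday = [
    Call(datetime(2024, 3, 10, 11, 0), hours_worked=0.1, logged_in=True),
    Call(datetime(2024, 3, 10, 16, 0), hours_worked=0.15, logged_in=True),
]
print(week_pay(sunday, base_rate=50.0))  # prints 800.0
```

The minimums are what make swapping calls attractive: a short call is billed like a long one, so nobody feels robbed when covering for a colleague.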
I've done this for like... 4 years now? It's pretty decent overall. You can get some really great weeks where customers just want things fixed NOW, so a lazy Sunday watching Netflix turns into 6hr @ 2.0x rate (even though you only worked for 15 minutes). What this creates is an environment where most of the guys on rotation are happy to swap calls with you if you have something going on.
And with all that laid out, I want to say I agree with a lot of what the article says. Problems only exist for so long before people take the effort to fix them. Lots more time goes into testing and making sure we have a clear rollback plan when major installs go in. I think it's pushed people to follow my lead a lot more in adding very verbose logging options, so people not familiar with the project are able to quickly pull up logs and understand the issues.
Overall, I recommend it. I can see how it may not fit in different work environments, but I find it a great addition to my job both in giving me a wider breadth of understanding the work my company does and a bit of extra pay.
My only question is, is it possible to game the system? Meaning, deploy some sloppy code or config the week you know you're on call so you get a few extra 2.0x stints @ 3 hours apiece (in other words, try to manufacture your lazy Sunday Netflix scenario)?
In my last job, it was something you were expected to do, and there was no additional compensation. On the plus side, it did expose you to all parts of the application, areas beyond your usual domain. On the downside, you are not only responsible for your code, but everybody else's as well. It's really shitty to be up at 3 AM fixing a ball that somebody else has dropped.
In addition to the company-wide on-call, there was also team on-call, which was a schedule rotation with your team members to be on call for team-specific issues. The problem was, if your team was small, you ended up being on call a LOT. My team was being continually stripped of members, so for a while I was ending up on call 24/7 for weeks on end. It was very stressful.
In the early days, the initial dev team is involved in support. And it is amazing. You get real feedback and good insights into how users use the system. This pays off immensely.
Once the product and team grow, the real need for dedicated support becomes evident. And it becomes quite clear that, just as not all support folks can code, not all devs can do support. It requires a unique skill set.
Not all requests from customers are critical, and even when there are issues, hotfixes aren't always necessary. Apart from becoming anxious, devs may be too eager to comply with requests immediately. Also, support requires the team to speak the language users are comfortable with; overly technical communication may not be relevant.
But what has worked for us in the growth stage is devs spending some time with support. They can (just) listen in on important calls, and the two teams share information regularly.
What's also important is that management is good at prioritizing what's important, and what isn't.
Depending on many circumstances, you just can't fix everything.
More responsibility, more accountability, overseeing junior staff AND on call. No roster, no clear definition of exactly what that would entail, but it was the kind of place that had thousands of systems, and every day something was on fire.
All of that was offered for the glorious compensation rise of $0.
I happily turned down that "promotion" and it was clear the company hated me for it.
If there's no raise involved, I can only assume you thought I was good enough to do that job from day one.
Small startup that's still iterating or just found market fit? Doubtful.
I still have nightmares that I'm getting woken up into a hellish situation to fix code I've never seen at 3am. Or that I'm out on a date or having a beer or trying to enjoy my life when I get called.
I remember the constant state of anxiety just knowing I could be called. Couldn't even wind down watching a movie much less read a book. I quit when I realized I felt a sense of relief commuting to work the next morning because I wouldn't have to field an emergency by myself.
I also remember fantasizing about being a cafe barista or security guard that year. Waited way too long to get out.
I am way happier now that I don't have to carry my laptop with me 24/7 and worry about taking it out while on a date or running off to find a hallway or corner to sit in and do work during the middle of a movie or concert. Sometimes I'd even get an emergency phone call during my commute and have to pull off the freeway to work.
That said, just the other day (after 18 months off), I twitched a bit when my text message notification went off.
And of course, it affects everyone differently.
I can relate. Ever since, I get a big anxiety rush any time my phone rings.
This is a myth, and AFAICT, there is no proof of this being an occurrence in Roman society, at all.
A few history buffs couldn’t find anything to support it....
I’d be interested to know how they did test bridges etc.
I would argue that on-call shouldn't exist at all. If a company wants a system to be supported 24/7 it should have three eight hour shifts. Of course companies balk at this, saying it's too expensive, but if their product isn't worth paying extra for then perhaps it's also not worth being up 24/7.
This is the sort of thing that should be enforced by law or by a strong union contract because businesses can't be trusted to act in their employees' best interests.
If resolving an alert requires you to "turn it off and on again", you don't need a developer for that.
Stress and lack of sleep reduce cognitive performance (what you pay for when you hire a developer) and kill employee morale.
If you have 2 similar job offers for similar companies, one requires you to be on-call, the other one doesn't... which one would you pick?
If you are having a very bad on-call week and a recruiter reaches out to you, you will be more likely to talk to them, more likely to ask for a raise, or more likely to just quit.
The "skin in the game" argument sucks. Developers are not solely responsible for software quality. Deadlines are often not set by developers.
You need a person to do it, and a developer is-a person. It's funny how our community celebrates stories of startups where they built servers out of Lego and emptied the trash themselves, but can't be bothered to flip a switch ourselves. (Or you could write a program to flip this switch, since that is your profession.)
> If you have 2 similar job offers for similar companies, one requires you to be on-call, the other one doesn't... which one would you pick?
Easy: whichever one was better at the 10 other attributes I value more highly than that. It's vanishingly unlikely that I'd get two job offers from companies which were so similar I'd need to compare the LSB.
> If you are having a very bad on-call week and a recruiter reaches out to you, you will be more likely to talk to them, more likely to ask for a raise, or more likely to just quit.
Perhaps true, but not in any way specific to pager duty. You might be having a very bad debugging week, or a very bad legacy systems integration week, or any other kind of bad week.
> The "skin in the game" argument sucks. Developers are not solely responsible for software quality. Deadlines are often not set by developers.
Deadlines are usually set by managers, and when I worked at a place with pager duty, my manager had to be on the rotation, too. That company had a lot of problems, but pager duty was not one of them. He was well aware of how bugs would come back to bite us.
We basically run software that someone bought from a dev shop or wrote themselves. 90% of the time, restarting a service fixes the issue right then. If it happens more than once, you assess whether it's worth waking a developer and how likely they are to fix the bug. Normally you'd need a new deployment anyway, and you don't really want to do that at 3AM; better to wait until the morning.
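The night-time decision described above can be written down as a small triage rule. This is a hypothetical encoding of the comment's reasoning, not an actual runbook. The input names are made up, and paging a developer when a restart doesn't bring the service back is an added assumption that the comment doesn't state.

```python
from enum import Enum


class Action(Enum):
    RESTART = "restart the service"
    RESTART_AND_TICKET = "restart it, then hand the bug to developers in the morning"
    PAGE_DEVELOPER = "wake the developer now"


def triage(
    times_seen_tonight: int,
    restart_restores_service: bool,
    dev_likely_to_fix_tonight: bool,
    fix_needs_deploy: bool,
) -> Action:
    if not restart_restores_service:
        return Action.PAGE_DEVELOPER      # assumption: the cheap fix failed
    if times_seen_tonight <= 1:
        return Action.RESTART             # first occurrence: a restart usually works
    if fix_needs_deploy or not dev_likely_to_fix_tonight:
        return Action.RESTART_AND_TICKET  # no 3 AM deploys; wait for the morning
    return Action.PAGE_DEVELOPER          # recurring, and worth waking someone


print(triage(3, True, True, True).value)
```

Keeping the rule this explicit also makes it easy to count how often the last branch fires, which is the "more than once or twice a year" signal discussed next.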
You do need to have a developer on call, to some extent, but if you have to call them more than once or twice a year, something's not right. In those cases, where the same buggy software is at fault for waking you multiple times a week, it's not a developer you want on call, it's a project manager or whatever type of middle management is involved.
The issue is that the developers actually do want to fix bugs and write stable software. From a middle management perspective: if someone is up during the night to reboot servers and hand-hold data imports, then that's a fixed issue, and the developers can focus on new features.
I assure you that if you call up managers at 2AM to tell them that the software they are responsible for has a bug, they will start focusing on stability.
A decade ago, we had a physical pager that we handed off every week. The pager was tied to our ticketing system, and anyone could create a ticket for it. It worked for the most part, but every now and then the "entire system is down" issue would turn out to be that Mary in accounting's internet cable was loose.
Then we staffed up and hired a 24/7 on-call support staff. We also went from four small dev teams to dozens. These teams never felt the impact of their decisions on the support staff and would happily throw code over the wall. They didn't feel like it was their job. Having worked in those trenches, I spent a good portion of my time trying to make it easier for the support staff to troubleshoot our applications.
Over the past couple of years, we've moved to a more modern model. We still have the dedicated first line of defense to handle things outside of business hours. But if something happens and they can't handle it, there are on-call rotations for all products that they can escalate to. Eventually that escalation still makes it up to me, but having the teams in it has made it more likely that they will put the effort into making support easier.
I think it is important to have developers support their applications as long as the culture and process allow for it to be sustainable. Part of that is making sure the people on the rotation actually understand the systems they are supporting. Another is making sure each event results in learning and hopefully changes that prevent it. And another is recognizing that when someone has been up in the middle of the night, their productivity will decrease and they should be allowed to recover.
I've seen many decisions made where some attention to "what failure modes would such a design have, that might result in human attention at 3 AM?" would lead to different fundamental technology choices. I know that I have made different technology choices and design decisions, based on some early career experience where I was the person who would be paged if the system required human attention.
But if the people making the fundamental technology choices have no experience or exposure to the 3 AM possibility, the trade-off might never be considered until it is too late.
1) you get calls / emails from the clients. Anything from a P1 everything is on fire incident down to "we've seen some random SQL agent job has failed, drop what you are doing and give us an RCA now"
2) you get automated alerts via systems you don't own, like SQL Sentry, where someone somewhere years ago put in an alert that says "if XYZ batch job runs for 8 hours, alert" and then never touched the threshold again
3) you get automated alerts from systems you do own, which is a godsend because for once you can adjust down the noisy alerts
4) your manager or skip level will without warning create "dumb" (nuance free) Splunk alerts and expect you to see them, know when and how to respond to them minus any documentation to support the point of the alert or how to respond
5) your manager or skip level will accept any automated alert from any other dev or infra team and expect you to know when to respond, when to ignore, and don't you dare ask them to change the alerting thresholds to fix noise, that's not being a 'full stack on call'
6) you must respond to all of the business hours client email to the team distro, within 5 minutes of receipt. If someone puts SupportONcall@nolongerastartup.com on a thread, the subject instantly becomes your personal life mission until solved, dismissed by the email originator, or finally kills enough resources to annoy the manager to the point of (gasp) declaring the issue transient or not reproducible. Hope you like that your manager doubled the fields on JIRA tickets and marked them all as required.
7) everything in the company or business partners is in scope for your team until explicitly taken over by a dedicated team like DBAs
8) since we have one client with very strict SLAs, your manager has decided that now all of your alerts should be treated with the same urgency as those SLAs (email response within 30 minutes, 2 hr workaround, 1 day fix)
In exchange for this, you get one work from home day per week, where you get to be online an extra hour on your designated day to be on call while the on call is in traffic home. That way, you are always responsive to email originators who cannot bear to wait until 6pm to get a response as to whether or not to worry about a missed backup or nolock-laden SQL select query that isn't working.
Somewhat exaggerated... But it's close enough that if you see this deleted, I'm probably sitting in the discipline room or pink-slipped.
I've stayed away from a few similar situations due to glassdoor reviews.
Anecdata: there is sometimes a stigma from Dev to Support, the latter is lesser in programming skills than the former, so they "shouldn't" be allowed to cross over. If I could have told my past self not to take the support college job 'for the experience', I'd probably have gotten actual programming job offers out of school rather than an analyst role.
But thanks for the advice - I'll dig out of the hole sooner rather than later hopefully!
I suggest talking to a few recruiters, they'll know how to polish your resume. As long as you get developer phone screens, you're doing well. It can often take a few different interviews with a few different companies before you get an offer.
BTW, depending on your personal situation, it might be worth it to quit your job and start looking for a new one, full time. That's a big risk, though, so it really only works if you have no debt or very understanding parents. I've always found it easier to find a job when I can dedicate myself full time to my job search. 120 hours dedicated to a job search is easier when it's full time, instead of 1 hour a night for 6 months.
The cost of quality documentation, management tools, system reliability and intelligible logging is real. You either pay it up front or every time operational attention requires deep institutional knowledge. Having a developer there to catch your application whenever it falls down means the software deliverable can be opaque to a level that would be unacceptable to an exclusively operations-oriented audience.
Loosely related example: the support team for manufacturing/service is our engineering department and I field most software issues. If I'm on site, I can pop down, do a quick investigation, and explain how to get everything running again quickly. When I'm off-site or the issue is at another location, the friction of hand-holding someone through the process is just enough to highlight the places that need enhancement.
2. If a developer had to work nights, they should be compensated with additional days off.
3. No payment would reduce the stress, so we should not ask for monetary compensation.
4. We as developers have let this be put on us too easily; to eliminate the stress, devs must organize and refuse to sign contracts that don't provide an automatic day off for on-call.
Blame assignment is super counter-productive in the moment of emergency, but it seems like it could be useful for measuring developer effectiveness, incentivizing shipping features but also shipping features with a small number of bugs. I have written my fair share of bugs that have snuck into test/staging/production (I'm prolific at writing bugs), but that's the kind of thing that should come up in a yearly review (and I expect it to) and hurt my chances for raises/promotions, instead of the bullshit musical chairs, politics and level/rank setting (how many more years until you reach Senior Staff Software Engineer IV with distinction again?) that happens right now.
Also my ideal on-call situation (which probably doesn't exist):
- paid by supply/demand (price per hour on-call increases until someone decides to do it)
Companies should go back to hiring competent night staff for truly critical business processes and paying them whatever is appropriate. The on-call system as it sits now is heavily tilted in favor of business at the expense of employees -- the attention of a $100/hr+ professional for free, or some small percentage of the actual cost.
Also BTW if you write software and don't care that it's bug-free or don't take responsibility for it, you're a bad software engineer/developer. You don't have to be passionate about code but being a professional generally means producing quality work, and quality work is reasonably robust whenever it can be. One of the differences between a junior and senior software engineer is knowledge of what constitutes enough "quality" in context.
About the most consequential thing you can get away with without wrecking trust is mild, humorous social shame, like making someone wear a silly hat. And for things that have gone badly wrong, that seems inappropriate.
This kind of thing is what blogger Alex Harrowell brilliantly dubbed "Coasian hell", and it's why megaprojects don't work: http://www.harrowell.org.uk/blog/2018/01/31/in-the-eternal-i...
I also disagree about it wrecking trust -- a well built & fairly applied system should build trust -- it's when people put their trust in systems with no power/hidden manipulation that trust gets wrecked the fastest. You could even make it opt in, and tie bonuses to the risk taken by those who decide to have raises/promotions manipulated in the context of the system.
IMO this is basically just a sub-problem of the general "how do we govern societies" problem, and "don't have consequences" doesn't seem like a good plan either.
[EDIT] I want to add that I really would like to hear other suggestions for how to solve these kinds of issues. I could only imagine a truly no-consequences style working in a Xerox PARC-ish environment, which is only possible when there's more than enough money (both on the corporate and the people side) so desperation isn't producing rash actions, and most people are being motivated by something other than the normal money/prestige.
That said, I think if you want to encourage risk taking, do it directly -- incentivize it with money/prestige or hire more people who take risks (and give them free rein). In recent history more and more "labs"-type positions have been opening up at companies, with the aim being to lure in people who want to do interesting work. As an engineer, a labs position is 1000% more interesting to me than any other senior whatever position because of this ability to take risks and possibly reap large rewards (even if they go to the company).
As far as your point about productive members producing more bugs, you can incentivize/dis-incentivize this by changing how you perform reviews. Incentivize productivity, but not at the cost of introducing more bugs. Shower cash/prestige/autonomy on developers that produce lots of features with low error-counts and people will optimize for that if that's what drives them.
- Have the entire org take a hit
- Penalize the infra team
The thing about blame assignment is that no one wants to get blamed for anything (so like if you try #2 the infra team would likely find someone to blame as quick as possible), which ordinarily makes it pretty toxic but I think you can use it for good here with proper communication and goal-setting (which is of course harder than it sounds).
Outages caused entirely by infrastructure are, I think, pretty rare, but if it's really something like S3 going down or whatever, just don't count it. If someone pushed an invalid load balancer setting to an ELB, then someone is at fault, whether it's the infra team for making it possible in the first place or the developer who did it. In my experience, good infra teams try to make bad settings impossible and good settings self-serviceable.
All this said, it really doesn't need to be super heavy-handed. I mean, don't make some Orwellian report-your-neighbor-for-points system, but enforce accountability and link it to something people care about.
The interesting thing is that the majority of issues that came up were not necessarily bugs per se; rather, the hundreds of input sources our app consumed (algorithmic trading) frequently had bad data, so it was always a scramble to add fixes and stay on top of it until the next bad input stream came in. It never ends!
I'm not sure if I've seen it proposed yet, but a better strategy IMHO is to have folks be "on call" while they are at the office. Then rotate to the next global office when they leave. If devs want to stay and go above and beyond, great. If your company needs to be 24/7, you need to staff it properly 24/7. Or be very upfront about the sleep deprivation requirements when hiring for it.
Depending on where you live this may not be realistic. In Japan for example it's quite common for companies to put their engineers on call without compensation - even if it's a legal gray area. I was once on a team that had to threaten management with a lawyer when they tried to propose this, but I have a feeling the majority of workers here would just swallow it.
There are other logistical factors that need to be considered which this article makes no mention of. What happens when someone who is on call lives in or commutes through an area with patchy cellphone coverage? What do you do about alcohol consumption?
While the product is still in development (alpha, unstable), developers do the on-call. Benefit: they have the knowledge, and having skin in the game is a motivator. This part is mentioned in the article, btw. But once the product matures, an SRE organization takes over.
Key point: they do so voluntarily and can request changes before taking over. This creates the good dynamic of separate people for separate roles, which one can think of as 'judicial independence'. There's nothing like combining your own skin in the game with the fact that you're pointing out deficiencies in somebody else's product (not yours) to get the extreme level of diligence typical of those reviews.
SRE review is a long process and generally assures the product adheres to a set of good practices surrounding monitoring, alerting, logging, playbooks, rollouts & canarying, emergency levers and whatnot.
The places it falls down are where we interface with other teams who aren't on call for their systems and for them a weekend long outage is "acceptable".
I suggest you look at the on-call chapters in the SRE book, SRE Workbook, and Seeking SRE.
The solution is primarily to include the development team in the on-call rotation (you build it - you run it). This can be very hard to do politically.
"The solution is primarily to include the development team in the on-call rotation"
We're hiring people with on call being something that is part of the position they are taking.
As for the other teams, we're working on the politics to get them to support their systems, and looking at alternatives to using them if they don't.
I am having a hard time imagining scenarios that require developers to be on call. Is it a matter of pushing bad code to production?
And then there are other issues that classical operations people tend to be much better at finding, such as weird network/storage/compute/whatever disruptions or starvation, wobbly load balancers or firewall rules etc.
Of course, you can try to teach developers those skills as well, but then you could also teach operators more about application logic.
My point is that neither "developer on call" nor "operations on call" feel obviously right to me, and I haven't found a good solution yet. Maybe both need to be on call, and collaborate.
I've mentioned this a few times on here, but I know a lawyer, and he's friendly enough to look over any contracts I sign at work and let me know what to look out for, what is enforceable, etc. He does it for me for free, and I know he does it for others too (his specialty is contracts), for a lot less than I assumed a solid lawyer would cost.
Anyway, my employer brought us into a Monday morning meeting one day and told us that, due to signing a new contract with a client, we'd all be doing on-call support, with a rota for who would be on call each day. I had Fridays, and was told that every Friday I would need to be available from 6pm to 6pm. On top of this, we were told we'd be paid something stupid like £10. Not per hour, just £10 for being on call, and an extra £5 should an alarm go off.
I mentioned privately to the Head of Tech that this wasn't in our contracts, and that I didn't want to work on-call. Later that day, we were all told that new contracts would be available to sign later that week, so I texted my friend and asked for his thoughts.
The long and short of it was that I could refuse to sign a new contract if I wasn't happy with it, and that if no deal could be reached with work, I would be free to leave with no repercussions. I told my boss this, and in the end I was told I didn't have to do on-call work. I had mentioned this to a few others at work, and about half of the dev team chose not to work on-call. Those who did didn't even try to push for more money, and they took the extra days that others didn't work. One guy worked Friday to Monday on-call for two years, for around £30 a week and some Amazon vouchers.
Nowadays, I actively turn down jobs with on-call hours, and I won't take a job with on-call hours unless it's for my own company or my own product. I don't give a fuck if spending more time outside of work with the product I built will make it a better product, or if it'll force me to write better code. That said, in my experience there are plenty of developers out there who will happily work any extra hours requested, even if the money is poor, because it puts them in the good books of their managers.
At my last place, we needed to support a product outside of office hours, and we found that there are numerous consultancies/companies outside of our time zone that specialise in this exact thing. We ended up working with a developer in San Francisco that handled overnight support for us. Even with minimal experience of the product, we never had downtime they couldn't fix.
We received a fee per week and a fee for every incident (150% normal hourly wage). There were between 0 and 10 incidents outside of office hours in a week. Most of the time 0 to 1.
Even though it was part of the job and the compensation was fair, the only thing I miss about it now is the money ;)
Interesting. If I get woken up while on call, it's never by another person - rather it's by an alarm. Should I start deferring these alarms until next morning to get more sleep?
For context, I am new to being on call - this process was in place before I started.
If the phone rings, you answer it. If, after some assessment, it can wait until morning, you leave it until morning. Then you make sure it won't wake the next person up while they're on call.
I remember starting out early in my tradecraft. I'm an engine mechanic now, but years ago I worked mechanical maintenance for a state psychiatric hospital. The job came with an on-call pager about the size of a box of chocolates, but there was a limited and well-defined scope. HVAC and the standby power generators, for example, were considered "priority one", where I had to be on-site in 30 minutes or less. A busted light in the bathroom, however, was not an on-call priority.
It wasn't a rule when I started, but I eventually turned it into one: you cannot tack extra work onto an on-call event. Example: I'm not replacing lights or repainting lines in the garage because I'm "already here" for a faulty transfer switch.
Lol, does anyone here get paid extra for being on call?
A common system at our work is that support calls are first passed to a 24-hour helpdesk, who have a decent clue and have access to fault-finding documentation that the development team writes: if X, do Y, etc.
Only if that documentation fails does it get escalated to the developers. This encourages the developers to write good documentation, and ensures that trivial fixes can be sorted without calling out the developers.
Personally I love it when I get called for 5 minutes on a Saturday morning, tell them to turn it off and on again, and claim a half day off in compensation.
"Night shifts have detrimental effects on people’s health [Dur05], and a multi-site "follow the sun" rotation allows teams to avoid night shifts altogether."
"For each on-call shift, an engineer should have sufficient time to deal with any incidents and follow-up activities such as writing postmortems [Loo10]."
"Google offers time-off-in-lieu or straight cash compensation"
I recently had an incredibly dystopian experience with being on call as a developer. While I know for a fact it's not the norm, it's concerning enough that I want to share it, in the hope that companies that choose this are held to higher standards and processes.
I joined a company in Vancouver early this year, which I will call company X. Company X is a well-known name in the U.S. for real estate/property search/etc. I was hired to help transition a good chunk of their dated front-end code and to help champion the direction of the front-end for various product teams in the company. It turned out the front-end was a giant amalgamation of several things: Dust.js, jQuery, and bits of really poorly written React.js, all hooked up to and plugged into server-side pages rendered with Node.js. An immense number of UI bugs and regressions would appear whenever anyone haphazardly made a change to a seemingly unrelated component or page. Over the years, various people had made multiple efforts to "take the lead" on a shared UI/component library to be used across the teams and products, but the components themselves were very buggy and lacked clear, consistent design patterns or input from UX/UI designers. This caused most of the teams to resort to building their own variations of similar components, with little effort to contribute back. This would continue for a couple of iterations until someone else came up with the genius idea to build a shared UI/component library... you get the idea. Actually developing and making changes on the front-end was even more archaic. The products owned by the various teams each occupied a portion of the site, and were all hooked up by a build harness that someone had created. Only one person really knew how the harness worked, and you needed to be able to connect to a specific machine even to load the site navigation, or anything else for that matter. There was a whole week or two when this wasn't possible, and productivity slowed to a crawl. Interestingly enough, the versions of the harness that the various teams were running were also different and out of sync.
So you'd run the harness and wait some 3 minutes to test any little change, but no other pages or products worked, so if your feature required integration with other products, you were in for one hell of a ride. On top of this, a lot of the front-end code was written by developers who weren't well versed in building front-ends for web applications. Needless to say, the codebase was largely an entangled mess of different ideas, state management strategies, global namespace pollution, front-end libraries, duplicate code, hacks, and quirks. Some 2 to 3 years before I joined, the company had a mass exodus of developers; apparently the place is rife with political turmoil among various directors and departments, too.
Before joining, I was explicitly told there was no on-call. About 3 weeks in, there was talk about "testing Pagerduty". Very quickly, every developer on the product teams was required to be hooked up to Pagerduty and put on a recurring schedule. This is what that looked like for my team: 2 developers would be on call in any given week, for 2 straight weeks. The intern, the contractor, and the Principal were excluded. This meant that, as 1 of the 4 other people on the team, you'd be on call 24/7 for 2 weeks out of every 4. How were the escalation and notification policies set up? When any error occurred, you'd get an app notification from Pagerduty, immediately followed by a text message and a phone call. If you did not acknowledge within 3 minutes, it would text, phone, and notify again every minute until the 5-minute mark. At 5 minutes it would call the other 2 developers. No ack in 15 minutes -> Principal + Manager; another 15 minutes -> Director. My manager had 2 teams under him, and at one point he got an escalation from his other team. Saying he was unhappy would be an understatement; a large number of hours and meetings over the next couple of weeks went into coming up with a plan to make sure it never happened again and to keep people accountable.
Overly frequent on-call rotations and aggressive escalation policies aside, there were other major issues. Traditionally, the products/services were all part of one large monolithic application. At some point in the past 2 years, there was a big push towards microservices. However, there was no API versioning, no proper logging, and little ability to track where an error originated. Despite the move to microservices, deployments were a coordinated effort every Thursday, along with a code freeze and multiple rungs of approval from PMs up to Directors/VP. Unfortunately, my team was in charge of the CRM portion of the product, which was the most commonly used feature and had many integrations with other teams. This meant that for many teams, their errors would only bubble up through our front-end, where Pagerduty would be triggered for our team. There were several hurdles to making the alerts stop. First, there was no way to snooze some of these alerts, because they weren't identified as identical errors even though they were. Second, locating the root of the issue was often extremely difficult, between the broken build processes and the fragmentation. Third, since APIs weren't versioned and deployments were done once a week as a concerted effort, fixes would not land until the next week at the earliest.
There were multiple times when I was on call that I'd be woken up repeatedly at incredibly inconvenient times: 2am, 4am, 5am, any day, it didn't matter. Pagerduty bombardment was frequent. One day in particular I was at my desk trying to get work done and my phone went off some 13 times in 1 hour, all first alerts, and all for the same issue. The cause? One of the teams maintained a set of APIs around Twilio, and pushed an update that caused errors every time someone made a call. Obviously, this surfaced through our team instead of theirs. There was no rollback or any other way to address this immediately. After we tracked down the root cause and made the team aware, they had to prioritize the issue before it could be resolved. The fix took just over 3 weeks, during which time all our team could do was put up with the pages and dismiss them.
I'd expressed concerns about how Pagerduty was being put into place, both before all this happened and during it. Throughout, management's response was very clear: tough luck, deal with it or get out (in more words). Multiple members of both of my manager's teams (among other teams) expressed discontent and frustration, many talks were had, and all fell on deaf ears. To top it all off, there was zero compensation, neither monetary nor time off. Another colleague and I left, another transferred to a different part of the company without Pagerduty, and now another mass exodus is in full swing. Even the new contractor decided to get out well before his 8 months were up.
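To make that escalation timeline concrete, here is a minimal Python sketch of the policy as described, showing who has been paged after an alert sits unacknowledged for a given number of minutes. Everything here is a hypothetical encoding: the tier labels, the `Tier`, `paged_by`, and `primary_notifications` names, and the assumption that the repeat notifications go out at minutes 3 and 4 are illustrative only, and none of it reflects PagerDuty's actual API or configuration format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    after_min: int            # minutes without an ack before this tier is paged
    targets: tuple[str, ...]  # hypothetical labels for who gets paged


# Hypothetical encoding of the escalation policy described above.
POLICY = (
    Tier(0, ("primary on-call",)),
    Tier(5, ("other developers",)),
    Tier(15, ("principal", "manager")),
    Tier(30, ("director",)),
)

# Assumption: after 3 unacked minutes the primary is re-notified once per
# minute until the 5-minute mark, i.e. at minutes 3 and 4.
REPEAT_START, REPEAT_END = 3, 5


def paged_by(minutes_unacked: int) -> list[str]:
    """Everyone who has been paged once an alert has gone unacked this long."""
    return [
        target
        for tier in POLICY
        if minutes_unacked >= tier.after_min
        for target in tier.targets
    ]


def primary_notifications(minutes_unacked: int) -> int:
    """Notification bursts (app push + SMS + phone call) sent to the primary."""
    repeats = sum(1 for m in range(REPEAT_START, REPEAT_END) if m <= minutes_unacked)
    return 1 + repeats  # the initial burst plus any per-minute repeats


if __name__ == "__main__":
    for minute in (0, 3, 5, 15, 30):
        print(minute, paged_by(minute), primary_notifications(minute))
```

Under these assumptions, a single alert that nobody acknowledges reaches the Director after 30 minutes, and the primary has already received three full bursts (each one a push, a text, and a call) before anyone else is woken up.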
Overall it was a horrid experience, an incredible waste of everyone's time, productivity, health, and money. I'd hate to see this type of paradigm proliferate in the industry without due diligence and care around the whole practice. All I have left to show for it is my body in a constant state of anxiety, as if I'm still on 24/7 Pagerduty.
However it's often the case that "on-call" means ship broken software and fix bugs after hours.