For the first few jobs in my career, I lived with what many others did: the dreaded on-call rotation. For the uninitiated, that means you work your typical workday and then remain available to handle support issues through the night. One particular job left me with several weeks of late-night calls, and I couldn’t stand it anymore. So here’s the story of how I stopped doing on-call support from then on.
Let me go ahead and put something somewhat controversial out there. On-call support happens because we admit something terrible can go wrong, and we cannot wait to fix it. The less mature the systems and code, the more on-call hurts. The more mature and stable, the less it hurts.
Now I’m not saying it can be eliminated in its entirety, but it can turn into something where many people who handle on-call rarely receive calls and can go about living their lives with little chance of interruption.
Want a quick way to gauge the maturity of an organization you’re interviewing with? Ask them about on-call support and how many issues they receive.
Alright, so the story starts as a fairly large project launches to production for an alpha release.
It wasn’t going well.
We’d build features during the day, and then a select few of us would get calls throughout the night as the DevOps group tried to get it running in production. I was one of those select few.
My first phone call would come in around 11 PM and last an hour or so. My second call would come in around 3 or 4 AM and last another hour or two. Then I’d wake up at 7 and be at work by 9.
This pattern kept up for several weeks, and the lack of sleep was taking its toll. I couldn’t understand why we didn’t do this during the workday. After all, getting into production by our deadline was the most important thing, so why were we doing it when we had the least help?
I was assured by leadership that this would end, things would get better, and this would never happen again. I had very little confidence in those platitudes. So I took matters into my own hands.
One day I said, “I’m going to keep helping, but every time you call, I’m going to drink a Manhattan. You get to decide how drunk you want me in production.” Leadership said I was not allowed to do that, to which I retorted that I would do whatever I wanted on my own time, and that they had a choice: not call me, or do this work during the day.
That night I got my call, and I said, “Hang on.” I went over, made a drink, and returned. “I can’t believe you were serious,” someone on the call said. I resolved the issue, and in another hour, I got another call.
I said, “Hang on.”
That was the last call that night, and the next day I had to go into a closed room to talk about my antics. I made my point as clearly as I could: just a few of us were burning out at night trying to get the most important thing done, while all day long we did nothing to help.
I was told my help wouldn’t be needed again.
Management created a dedicated room where people from many teams would work on this during the day. People rotated in and out, and trying to do this at night came to a close.
That experience gave me a sort of epiphany about how companies and groups tend to treat supporting their products. They treat it as a secondary concern to building new features.
How crazy is it to add features to a product that struggles to run as is?
Since leaving that company and moving on to others, every time someone brings up support, I list my rules.
I get pushback when I ask managers to sit on the phone: “There’s nothing I can do to help.” I tell them that I want them to sit there awake as long as I do so that when I come to work exhausted and say this has to stop, they feel its importance.
In the seven years since I started this, companies I’ve joined have changed the way they handle on-call due to my rules. I won’t say that on-call isn’t needed, but treating it as something different than the rest of the job isn’t healthy or sustainable. My rules force that issue out.
You may find my rules too strict, and that’s alright. Consider, though, how your group thinks about after-hours support and how what happens when people are asleep feeds back into your regular workday. Most of the time, it doesn’t. Is the support demand growing? What is changing to make it that way, and what can you do to turn that trend around?
If nothing else, my rules force people to look at the choices they make during the day that lead to suffering at night.
What rules would you help create to make on-call something rarely used because your group cares so much about stability?