On-call is a coping mechanism, not a strategy

The phone that owns your evenings

It is 9:40 at night. Dinner is cold, the show you started is paused, and your phone is face-up on the table because you have learned that face-down is a luxury you do not get. You are not actively working. You are waiting to work, which is a different and worse thing. The team calls you reliable. Your manager calls you the safe pair of hands. What none of them say out loud is that the plan for keeping the company running after hours is you, awake enough to answer.

So you answer. A sync job failed and needs a rerun. A regional sales lead is locked out before a 7 a.m. demo overseas. A sign-in alert fired from a country nobody has visited, and you are the only person who can tell whether it matters. Each one is small. None of them needed to reach you at home, except that there is nowhere else for them to go. You clear the queue, set the phone back down, and start the wait again.

Being the person who always answers feels like a strength. It is worth asking what it is quietly covering for.

On-call is a symptom, not a structure

On-call is fine as a backstop for the genuinely rare and genuinely urgent. It becomes a problem when it quietly turns into the plan. When the answer to "what happens if this breaks at night" is "Morgan picks up," you do not have a resilient operation. You have one person standing in for the systems, runbooks, and routing that a resilient operation would have built.

The tell is the kind of page you get. A real incident (the database is down, a breach is in progress) deserves a human at any hour. Most after-hours pages are not that. They are routine work that simply had no defined path, so it defaulted to the one person who would always respond. Three categories show up again and again:

Work that should have been automated, like a failed job that could rerun itself or a routine access grant that could follow an approval flow.
Work that should have been routed, like an Amazon GuardDuty finding or a Microsoft Entra sign-in alert that belongs with someone who reads such alerts all day, not whoever is closest to a laptop.
Work that was never urgent at all, batched into the evening only because the day was too full to reach it.

None of those are on-call problems. They are design problems wearing an on-call costume.
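One way to make those categories concrete is to run a month of pages through a rough sorter and count what lands where. The following is a minimal Python sketch, not a finished tool. The bucket names mirror the categories above, but the keyword rules, the `classify` and `tally` helpers, and the sample page summaries are all hypothetical and would need to be replaced with whatever your own pages actually say.

```python
from collections import Counter
from enum import Enum


class Bucket(Enum):
    INCIDENT = "genuine incident"
    AUTOMATE = "should have been automated"
    ROUTE = "should have been routed"
    NOT_URGENT = "was never urgent"
    UNSORTED = "needs a human to decide"


# Hypothetical keyword rules, checked in order. These are assumptions for
# illustration only; tune them to the wording of your own pages.
RULES = [
    (Bucket.INCIDENT, ("database down", "breach", "outage")),
    (Bucket.ROUTE, ("guardduty", "sign-in alert")),
    (Bucket.AUTOMATE, ("sync failed", "rerun", "locked out", "access request")),
    (Bucket.NOT_URGENT, ("reminder", "quarterly report")),
]


def classify(summary: str) -> Bucket:
    """Return the first bucket whose keywords appear in the page summary."""
    text = summary.lower()
    for bucket, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return bucket
    # Nothing matched: leave it for a person rather than guess.
    return Bucket.UNSORTED


def tally(summaries) -> Counter:
    """Count how many pages fall into each bucket."""
    return Counter(classify(summary) for summary in summaries)


if __name__ == "__main__":
    # Made-up sample data standing in for an export of last month's pages.
    sample_pages = [
        "Nightly sync failed, needs rerun",
        "GuardDuty finding on a production account",
        "Entra sign-in alert from an unfamiliar country",
        "Production database down",
        "Quarterly report reminder",
    ]
    for bucket, count in tally(sample_pages).items():
        print(f"{bucket.value}: {count}")
```

Anything the rules cannot place falls into an unsorted bucket for a person to review. That is deliberate: guessing would hide exactly the pages that most need a decision.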

What the hero model costs you

The first cost is the person. Constant interruption and broken recovery are recognized drivers of burnout; the World Health Organization's ICD-11 classifies burnout as an occupational phenomenon resulting from chronic workplace stress that has not been successfully managed. The mechanism here is simple. You never fully clock out, so you never fully recover, so the reserve you would need for a real incident is already spent on reruns and lockouts.

The second cost is the business, and it is the one leaders miss. A function that runs on one person's willingness to answer is a single point of failure with a pulse. When that person takes a real vacation, gets sick, or leaves, the coverage leaves with them, because the coverage was never written down. The knowledge of which job to rerun and which alert to ignore lives in one head, and heads do not scale, back up, or hand off cleanly.

If your continuity plan is one tired person and a phone, you do not have a continuity plan. You have a person who has not burned out or quit yet.

The third cost is the quietest. The important work that never pages anyone (the stale Conditional Access policy, the half-finished offboarding, the access review nobody scheduled) gets crowded out by the noise of the pages that do. The hero is too busy answering to fix the things that would stop the calls.

Build the system the pages are asking for

The fix is not a tougher rotation or a second person to share the misery. It is to treat every recurring page as a defect and design it out. Security is an operational problem before it is a tool problem, and most after-hours noise is the same. The gap is a missing path for work to follow without a human relaying it at night, not a missing product.

Start by sorting last month's pages into the three buckets above, then act on each:

Automate the repeatable. A failed sync reruns itself with an escalation only if the retry fails. A routine access request runs through an approval flow, day or night, the way our approval links and access requests guide describes, so it never lands in your texts.
Route the specialist work. Send identity alerts and threat findings to people who read them as their daily job, so an Entra or GuardDuty signal lands in front of the right eyes the first time and never has to find you.
Give everything a defined home. A request becomes a tracked item with an owner and a status, not a message on your personal phone. The "submit a request" guide shows how one intake turns scattered after-hours pings into a queue someone is accountable for.
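The "automate the repeatable" step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a drop-in implementation: `run_with_retry`, the `job` and `escalate` callables, and the attempt and delay defaults are hypothetical names and values, and the scheduler or workflow tool you already use may offer its own retry settings.

```python
import time
from typing import Callable


def run_with_retry(
    job: Callable[[], None],
    escalate: Callable[[str], None],
    max_attempts: int = 3,
    base_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `job`, retrying with exponential backoff.

    `escalate` is called only if every attempt fails. All names and
    defaults here are hypothetical, chosen for illustration.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return True  # Success: nobody gets paged.
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Wait 30s, 60s, 120s, ... before the next attempt.
                sleep(base_delay_s * 2 ** (attempt - 1))
    # Every attempt failed, so this is now worth a human's attention.
    escalate(f"job failed after {max_attempts} attempts: {last_error!r}")
    return False


if __name__ == "__main__":
    def rerun_sync() -> None:
        # Placeholder for the real sync job (an assumption for this sketch).
        print("sync completed")

    run_with_retry(rerun_sync, escalate=print)
```

The design choice that matters is that `escalate` is called at most once, and only after the retries are exhausted, so a transient failure never reaches a person.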

Do that, and the page rate drops because the reasons to page dropped. The rare true incident still reaches a human, with a rested person on the other end and a runbook behind them.

The reliable person deserves a reliable system

Go back to 9:40 and the phone on the table. In a designed operation, most of what reached you tonight never would have. The failed job retried itself. The lockout followed a path that did not depend on you being awake. The sign-in alert was already in front of the person who reads them every day. The phone is on the table, but it is quiet, because the system is doing the catching that you used to do alone.

Being dependable is a real and rare quality. It deserves better than being spent as the company's after-hours infrastructure. The strong move is not answering faster; it is building the structure so fewer calls have to be answered at all. So the honest question for your team is this: how many of last month's pages were genuine emergencies, and how many were just work that had nowhere else to go?

