And, contrary to the stated intentions, I've directly observed developers making crappy, band-aid fixes to ongoing production problems in the interest of "making the pages stop". That is the mindset you fall into when you are on call and being paged at all hours.
In theory, DevOps is supposed to put those who can best fix things closest to the problems, but in reality a slight separation from the firestorm of ops actually produces better, more thoughtful solutions in the long run.
The best balance is to have a first-tier ops on-call and a second-tier engineering on-call, with any alerting issue getting attention within 24 hours by moving to the front of the work queue. But indiscriminately assigning everyone "pager-duty" rotations leads to lower-quality solutions in the end.
• It increases pager coverage and reduces any one person's pager obligations. Simply anticipating the pager becomes a mental burden after a while.
• It creates a stronger incentive to define response procedures: what are the expected obligations of response staff, what's considered sufficient effort, what's the escalation policy, who is expected to participate, and what are the consequences of failing to respond?
• Cross-training. Eng learns ops tasks, and ops gets a better opportunity to learn what eng is up to and what it deals with.
• It makes engineering more aware of the consequences of their actions. Is insufficient defensive engineering causing outages (say, unlimited remote access to expensive operations)? Are alerts, notification mails, and/or monitoring/logging obscuring rather than revealing anomalous conditions? Are the mechanisms for adjusting, repairing, updating, and/or restarting systems themselves complex and/or failure-prone?
My experience at one site did not endear me to the organization (I left shortly afterward). I was a recent staff member (and hence unfamiliar with policies, procedures, and capabilities), systems went down starting at 2am, and I was unable to raise engineering or my manager. When I brought this up at the next staff meeting, the response was pretty much "so what".
Note that what I'm calling for isn't for eng to be the sole group on pager duty, but for eng and ops to share that responsibility.
Within the right framework, keeping everyone on pager rotation can lead to much smoother operations, because everyone stays familiar with the system as a whole. This was going around recently, and captures the essence of the philosophy: http://catenary.wordpress.com/2011/04/19/naurs-programming-a...
At one place I worked we had a two-person support shop. We pointed out time and again that this or that affected customers or made support hard, but the devs would pick and choose what was fun to work on. I ended up leaving, and the other guy went on a prearranged month-long vacation. Everyone else (~5 devs) had to pick up support for a month, and I'm told they had so much trouble with the normal support load that development actually stopped for that month. Apparently, when the other guy got back, they started listening a bit more to his concerns, having had a taste of what happens on the pointy end.
In a similar vein, there's a wine distributor where all employees spend their first week half on the phones and half in the packing department, to give everyone a feel for the core function and what customers complain about. The guy telling me said that everyone gets the treatment except the new CEO, who got away with doing only a day rather than a whole week.