• It increases pager coverage, and reduces any one person's pager obligations. Simply having pager anticipation is a mental burden after a while.
• It creates a stronger incentive for response procedures: what are the expected obligations of response staff, what's considered sufficient effort, what's the escalation policy, who is expected to participate, what are consequences of failure to respond?
• Cross-training. Eng learns ops tasks, ops has a better opportunity for learning what eng is up to and deals with.
• It makes engineering more aware of the consequences of their actions: is insufficient defensive engineering causing outages (say, unlimited remote access to expensive operations), are alerts, notification mails, and/or monitoring/logging obscuring rather than revealing anomalous conditions? Are mechanisms for adjusting, repairing, updating, and/or restarting systems complex and/or failure prone themselves?
My experience at one site, where I was a recent staff member (and hence unfamiliar with policies, procedures, and capabilities), systems went down starting at 2am, I was unable to raise engineering or my manager, and the response the next staff meeting to my observation of this was pretty much "so what" did not endear me to the organization (I left it shortly afterward).
Note that what I'm calling for isn't for eng to be the sole group on pager duty, but for eng and ops to share that responsibility.