The blame vs reward issue sounds, to me, rather orthogonal to the one we are discussing here. If the house crumbles, one can choose to blame or not blame the one who built it, but independently of that, it is quite clear that this is not the time to attach pretty pictures to the walls. It is certainly not the time to make any improvement, let alone reward anyone for it. First the walls have to be reliable, and then we can attach pictures to them. The question of what percentage of my time I spend repairing failures versus what percentage I can spend writing new stuff seems more important to me than MTBF vs. MTTR.
I have to grant you that there is some fear underneath what I write, but it is not the fear of blame. It is the fear of finding myself in a situation I do not want to be in: the thing is not working in production, I have no idea what caused it, I have no way to reproduce it, and I will just have to make an educated guess at how to fix it. Note that all of the stuff written to provide quality gates is often also very helpful for reproducing customer issues in the lab. In that way the quality gates can decrease MTTR by a very large amount.
I think the quality gates mentioned in the article are the ones where a human approves a deployment. If you have an issue in production and you solve it, you should definitely add an automated test to make sure the same issue doesn't reappear. That automated test should then act as a gate, preventing deployment if it fails.
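In practice that can be as small as pinning the exact production input down in a test that CI runs before every deploy, so a failure blocks the release. A minimal sketch, assuming a pytest setup; the `billing` module and `parse_amount` function are made-up names for illustration, not from the article:

```python
# Hypothetical regression test for a bug first seen in production:
# parse_amount() crashed on locale-formatted input like "1.234,56".
from decimal import Decimal

import pytest

from billing import parse_amount  # assumed application module


def test_parse_amount_handles_locale_formatted_input():
    # The exact input that triggered the production incident.
    assert parse_amount("1.234,56") == Decimal("1234.56")


def test_parse_amount_rejects_garbage_instead_of_crashing():
    # The fix should fail loudly on bad input rather than crash later.
    with pytest.raises(ValueError):
        parse_amount("not a number")
```

The gate is then simply the pipeline rule "no deploy while any test fails", with no human approval step involved.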
And I think the issue of blame is very much related to what you say drives this: fear. Fear is the wrong mindset with which to approach quality. Much more effective are things like bravery, curiosity, and resolve. I think if you dig into why you experience fear, you'll find it relates to blame and to experiences with blame culture. That's how it was for me.
If you really want to know why bugs occur in production and how to keep them from happening again, the solution isn't to create a bunch of non-production environments that you hope will catch the kinds of bugs you expect. The solution is a better foundation (unit tests, acceptance tests, load tests), better monitoring (so you catch bugs sooner), and better operation of the app (including observability and replayability).
Then you say that, e.g., bravery is better than fear. Well, there is fear right there inside bravery. I would be inclined to write the equation bravery = fear + resolve.
And why are you pitting replayability against what I am saying? Replayability is a very good example of what I have been talking about the whole time. I once wrote an application that could replay its own log file, and that worked very well for reproducing issues; I would do it again if the situation arose (roughly along the lines of the sketch below). Many of those replayed logs would afterwards become automated tests. The author of the original article would be against it, though: the replaying is not done in the production environment, so apparently it is bad.
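For what it's worth, the shape of that replay idea is roughly the following. This is only a minimal sketch under assumptions of mine: the log format (one JSON object per line with "op" and "payload" fields), the `handle()` entry point, and the `myapp` module are all invented for illustration, not the application I actually wrote.

```python
# Sketch: replay an application's own log file outside production to
# reproduce an issue, then turn the failing operation into a test.
import json


def replay(log_path, handle):
    """Feed every logged operation back into the application entry point."""
    with open(log_path, encoding="utf-8") as log:
        for line_number, line in enumerate(log, start=1):
            record = json.loads(line)
            try:
                handle(record["op"], record["payload"])
            except Exception as exc:
                # The first operation that blows up is usually the repro
                # case that later becomes an automated regression test.
                print(f"replay failed at line {line_number} ({record['op']}): {exc}")
                raise


if __name__ == "__main__":
    from myapp import handle  # assumed application module
    replay("production.log", handle)
```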
And I'm saying that the things I listed are good ways to get quality while not having QA environments and QA steps in the process.
I also don't know where you got the notion that all debugging has to be done in production. If you can do it there, great. But if not, developers still have machines. He's pretty clearly against things like QA and pre-prod environments, not against developers running the code they're working on.
So it seems to me you're mainly upset at things that I don't see in his article.