I think flaky tests are always retried, and if one starts failing 100% of the time it’s marked as a pure regression. But flaky tests don’t block merges when they fail (i.e. it might only be done after the merge). I’m not 100% sure, but generally yes: the culture is to incentivize people to fix their flaky tests, and it’s one of the things EMs are judged on.
> Ultimately, our goal with PFS is not to assert that any test is 100 percent reliable, because that’s not realistic. Our goal is to simply assert that a test is sufficiently reliable and provide a scale to illustrate which tests are less reliable than they should be.
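
To make the retry behaviour I described concrete, here’s a rough sketch of "retry, and only call it a regression if it fails every time". This is my own illustration, not the actual system: `classify` and `retries` are made-up names, the retry count is arbitrary, and it doesn’t compute PFS or anything like it.

```python
from typing import Callable


def classify(run_test: Callable[[], bool], retries: int = 4) -> str:
    """Classify one test as 'pass', 'flaky', or 'regression'.

    Hypothetical helper: run_test returns True on success. The test is run
    once, then retried up to `retries` more times if the first run fails.
    """
    if run_test():
        return "pass"
    for _ in range(retries):
        if run_test():
            # Failed at least once, then passed: treat as flaky, not broken.
            return "flaky"
    # Failed on the first run and on every retry.
    return "regression"
```

A real system would presumably aggregate results over many runs rather than decide from a single burst of retries, which is the point of scoring reliability on a scale instead of pass/fail.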