
This is the most "damned if you do, damned if you don't" part of testing. I've found so many coding errors that weren't obvious until you looked at the day 2 or day 3 test results. "Hm, that's weird. Why is $thing happening in this test? It shouldn't even touch that component."

If you peek, you really have to commit to running the test for the full duration no matter what.
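The inflation the parent is guarding against can be shown with a small Monte Carlo sketch: run A/A tests (both arms identical), peek several times with the naive z threshold, and count how often *any* look declares significance. All parameters here (number of experiments, looks, sample sizes) are illustrative, not from the thread.

```python
import math
import random

def peeking_false_positive_rate(n_experiments=2000, n_per_arm=1000,
                                n_looks=10, z_crit=1.96, seed=0):
    """A/A tests where both arms share the same true mean: peek n_looks
    times with the naive two-sided z threshold and count how often any
    look (wrongly) declares significance. Hypothetical parameters."""
    rng = random.Random(seed)
    hits = 0
    step = n_per_arm // n_looks
    for _ in range(n_experiments):
        a_sum = b_sum = 0.0
        n = 0
        for _ in range(n_looks):
            for _ in range(step):
                a_sum += rng.gauss(0, 1)
                b_sum += rng.gauss(0, 1)
                n += 1
            # Both arms have known unit variance, so SE of the
            # difference in means is sqrt(2 / n).
            z = (a_sum / n - b_sum / n) / math.sqrt(2 / n)
            if abs(z) > z_crit:
                hits += 1  # declared significance on some peek
                break
    return hits / n_experiments
```

With a 5% nominal threshold, the observed rate of false alarms under repeated peeking comes out well above 5%, which is exactly why you can't peek and also stop early under a fixed-horizon analysis.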



No you don't. If your protocol involves peeking (and early stopping), you need different, stricter thresholds to declare statistical significance. But you can do that. You just need to know whether you're peeking or not, and you always know that.
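As a crude sketch of "different thresholds": if you plan k interim looks, a Bonferroni-style correction tests each look at alpha/k instead of alpha. Real group-sequential designs (Pocock, O'Brien-Fleming boundaries) are less conservative; this function name and the numbers are mine, just to show the idea.

```python
def per_look_alpha(alpha: float, k_looks: int) -> float:
    """Bonferroni-style per-look threshold: with k planned peeks,
    each look must clear alpha / k to keep the overall false-positive
    rate at or below alpha. Deliberately conservative."""
    return alpha / k_looks

# Peeking 5 times at an overall alpha of 0.05 means each look
# must clear p < 0.01 to declare significance.
threshold = per_look_alpha(0.05, 5)
```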


> If you peek, you really have to commit to running the test for the full duration no matter what.

It's more complicated, but you can also run sequential A/B tests using [SPRT](https://en.wikipedia.org/wiki/Sequential_probability_ratio_t...) or similar, where a test gets accepted or rejected once it hits a threshold. I won't go into the details, but you can incrementally calculate the test statistic, so if your test is performing very badly or very well, it will end early.

One product team I worked on ran all tests as sequential tests. If you build a framework around this, I'd argue it's easier for statistics-unaware stakeholders to understand when you _can_ end a test early.
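The incremental statistic the parent mentions is the log-likelihood ratio, updated per observation and compared against two Wald thresholds. A minimal sketch for a Bernoulli conversion rate, with made-up rates and error bounds (p0, p1, alpha, beta are illustrative):

```python
import math

def sprt(observations, p0=0.05, p1=0.07, alpha=0.05, beta=0.2):
    """Sequential probability ratio test for a Bernoulli rate.

    H0: conversion rate is p0; H1: it is p1. alpha/beta bound the
    type I/II error rates. Returns (decision, samples_used), where
    decision is "accept_h1", "accept_h0", or "continue".
    """
    upper = math.log((1 - beta) / alpha)  # cross -> accept H1
    lower = math.log(beta / (1 - alpha))  # cross -> accept H0
    llr = 0.0
    for n, converted in enumerate(observations, start=1):
        # Incremental log-likelihood ratio update per observation.
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", len(observations)
```

The early-stopping behaviour falls out naturally: a run of conversions pushes the statistic toward the upper boundary and ends the test after far fewer samples than a fixed-horizon design would need.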


If there is a bug, then the experiment needs to be called off and a new one constructed. You shouldn't change anything else during the execution of the experiment.



