
This is the most "damned if you do, damned if you don't" part of testing. I've found so many coding errors that weren't obvious until you looked at the day 2 or day 3 test results. "Hm, that's weird. Why is $thing happening in this test? It shouldn't even touch that component."

If you peek, you really have to commit to running the test for the full duration no matter what.
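The inflation the parent is guarding against can be shown with a small Monte Carlo sketch: run A/A tests (both arms identical), peek several times with the naive z threshold, and count how often *any* look declares significance. All parameters here (number of experiments, looks, sample sizes) are illustrative, not from the thread.

```python
import math
import random

def peeking_false_positive_rate(n_experiments=2000, n_per_arm=1000,
                                n_looks=10, z_crit=1.96, seed=0):
    """A/A tests where both arms share the same true mean: peek n_looks
    times with the naive two-sided z threshold and count how often any
    look (wrongly) declares significance. Hypothetical parameters."""
    rng = random.Random(seed)
    hits = 0
    step = n_per_arm // n_looks
    for _ in range(n_experiments):
        a_sum = b_sum = 0.0
        n = 0
        for _ in range(n_looks):
            for _ in range(step):
                a_sum += rng.gauss(0, 1)
                b_sum += rng.gauss(0, 1)
                n += 1
            # Both arms have known unit variance, so SE of the
            # difference in means is sqrt(2 / n).
            z = (a_sum / n - b_sum / n) / math.sqrt(2 / n)
            if abs(z) > z_crit:
                hits += 1  # declared significance on some peek
                break
    return hits / n_experiments
```

With a 5% nominal threshold, the observed rate of false alarms under repeated peeking comes out well above 5%, which is exactly why you can't peek and also stop early under a fixed-horizon analysis.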



No you don't. If your protocol involves peeking (and early stopping), you need different, stricter thresholds to declare statistical significance. But you can do that. You just need to know whether you're peeking or not, and you always know that.
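As a crude sketch of "different thresholds": if you plan k interim looks, a Bonferroni-style correction tests each look at alpha/k instead of alpha. Real group-sequential designs (Pocock, O'Brien-Fleming boundaries) are less conservative; this function name and the numbers are mine, just to show the idea.

```python
def per_look_alpha(alpha: float, k_looks: int) -> float:
    """Bonferroni-style per-look threshold: with k planned peeks,
    each look must clear alpha / k to keep the overall false-positive
    rate at or below alpha. Deliberately conservative."""
    return alpha / k_looks

# Peeking 5 times at an overall alpha of 0.05 means each look
# must clear p < 0.01 to declare significance.
threshold = per_look_alpha(0.05, 5)
```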


> If you peek, you really have to commit to running the test for the full duration no matter what.

It's more complicated, but you can also run sequential A/B tests using [SPRT](https://en.wikipedia.org/wiki/Sequential_probability_ratio_t...) or similar, where a test gets accepted or rejected once it hits a threshold. I won't go into the details, but you can incrementally calculate the test statistic, so if your test is performing very badly or very well, it will end early.

One product team I worked on ran all tests as sequential tests. If you build a framework around this, I'd argue it's easier for statistics-unaware stakeholders to understand when you _can_ end a test early.
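The incremental statistic the parent mentions is the log-likelihood ratio, updated per observation and compared against two Wald thresholds. A minimal sketch for a Bernoulli conversion rate, with made-up rates and error bounds (p0, p1, alpha, beta are illustrative):

```python
import math

def sprt(observations, p0=0.05, p1=0.07, alpha=0.05, beta=0.2):
    """Sequential probability ratio test for a Bernoulli rate.

    H0: conversion rate is p0; H1: it is p1. alpha/beta bound the
    type I/II error rates. Returns (decision, samples_used), where
    decision is "accept_h1", "accept_h0", or "continue".
    """
    upper = math.log((1 - beta) / alpha)  # cross -> accept H1
    lower = math.log(beta / (1 - alpha))  # cross -> accept H0
    llr = 0.0
    for n, converted in enumerate(observations, start=1):
        # Incremental log-likelihood ratio update per observation.
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "continue", len(observations)
```

The early-stopping behaviour falls out naturally: a run of conversions pushes the statistic toward the upper boundary and ends the test after far fewer samples than a fixed-horizon design would need.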


If there is a bug, then the experiment needs to be called off and a new one constructed. You shouldn't change anything else during the execution of the experiment.



