To me, this blog post misses what is the most important difference: If you are no...

btilly · on June 1, 2012

That was exactly my point at http://news.ycombinator.com/item?id=4040022 and it is a valid one.

However, I have been thinking about it since, and it is possible to design a multi-armed bandit approach with logarithmic regret (though with a larger constant factor than a traditional approach) that can handle the varying performance ratio. It would also allow you to add variations at any time.
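btilly doesn't spell out his design, so what follows is only a minimal Python sketch of the textbook UCB1 rule, a standard bandit with logarithmic regret, to show what "logarithmic regret" looks like in practice. It assumes each variation's conversion rate is stationary, which is exactly the assumption the weekday/weekend problem breaks, so it is not the varying-ratio fix described above. The function names and the conversion rates in the simulation are hypothetical.

```python
import math
import random


def ucb1_choose(successes, pulls, t):
    """Pick an arm by UCB1: observed mean plus sqrt(2 ln t / n) exploration bonus."""
    # Play every arm once before using the confidence bound.
    for i, n in enumerate(pulls):
        if n == 0:
            return i
    scores = [s / n + math.sqrt(2 * math.log(t) / n) for s, n in zip(successes, pulls)]
    return max(range(len(scores)), key=scores.__getitem__)


def simulate(rates, rounds, seed=0):
    """Run UCB1 against hypothetical arms with fixed (stationary) conversion rates."""
    rng = random.Random(seed)
    successes = [0] * len(rates)
    pulls = [0] * len(rates)
    for t in range(1, rounds + 1):
        arm = ucb1_choose(successes, pulls, t)
        pulls[arm] += 1
        if rng.random() < rates[arm]:
            successes[arm] += 1
    return successes, pulls


if __name__ == "__main__":
    s, n = simulate([0.2, 0.6], 5000, seed=1)
    print("conversions:", s, "visitors:", n)
```

Because the exploration bonus shrinks as an arm accumulates visitors, the worse arm is shown only a slowly (logarithmically) growing number of times. If the baseline conversion rate swings by day while the traffic split drifts, those observed means are no longer comparable.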

There remain operational differences, but this problem is fixable.

on June 1, 2012

[deleted]

birken · on June 1, 2012

The problem with this math is that the differences in the scores on Friday are already pretty significant (http://www.thumbtack.com/labs/abba/#A=60%2C500&B=40%2C50...)

Here is an absurd case. The conversion rate is 10% on Friday and 50% on Saturday, and the traffic split shifts heavily toward B on Saturday:

Friday (~10% conversion):

A: 10 / 100

B: 11 / 100

Saturday (~50% conversion):

A: 5 / 10

B: 45 / 90

-----------------------

A: (10 + 5) / (100 + 10) = 13.6%

B: (11 + 45) / (100 + 90) = 29.5%

This is 99.9% statistically significant even though the two variations convert at essentially the same rate on each day: http://www.thumbtack.com/labs/abba/#A=15%2C110&B=56%2C19...
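To check the arithmetic, here is a minimal Python sketch that compares A and B within each day and then pooled across days, using a standard two-sided two-proportion z-test with a pooled standard error. That test is an assumption on my part: ABBA's exact method may differ, so its reported figure need not match these p-values exactly. Within each day the z-score is near zero (exactly zero on Saturday); pooled, it comes out around 3.1.

```python
import math

# (conversions, visitors) per day and variation, copied from the example above.
DAYS = {
    "Friday":   {"A": (10, 100), "B": (11, 100)},
    "Saturday": {"A": (5, 10),   "B": (45, 90)},
}


def two_proportion_z(c1, n1, c2, n2):
    """Two-sided two-proportion z-test using the pooled standard error."""
    p1, p2 = c1 / n1, c2 / n2
    pooled_rate = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled_rate * (1 - pooled_rate) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Standard normal CDF via math.erf, then the two-sided tail probability.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


def totals(variation):
    """Sum conversions and visitors for one variation across all days."""
    conversions = sum(day[variation][0] for day in DAYS.values())
    visitors = sum(day[variation][1] for day in DAYS.values())
    return conversions, visitors


if __name__ == "__main__":
    # Within a single day, A and B are nearly indistinguishable.
    for name, day in DAYS.items():
        z, p = two_proportion_z(*day["A"], *day["B"])
        print(f"{name}: z = {z:.2f}, p = {p:.3f}")

    # Pooling days with different traffic splits manufactures a "winner".
    a, b = totals("A"), totals("B")
    z, p = two_proportion_z(*a, *b)
    print(f"Pooled: A = {a[0]}/{a[1]}, B = {b[0]}/{b[1]}, z = {z:.2f}, p = {p:.4f}")
```

Comparing within each day sidesteps the problem, because the Friday-to-Saturday swing in baseline conversion rate is no longer credited to whichever variation happened to get more Saturday traffic.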

AJ007 · on June 1, 2012

The human factor in split testing is rarely, if ever, discussed. I've been building and running landing pages for around 7 years now; I can't count how many millions of people have flowed through them.

Here are some things I've found:

1) Absolute conversion rate. After a certain point, whatever you add will just detract from the performance of something else. That detraction could come either from the landing page itself (if you're lucky) or from some longer-term variable (hope those "conversions" don't cost you too much money). I have had both occur.

2) "Statistically significant" can just be noise when variables are fairly close to each other. After getting rid of the obvious losers, I've watched the "winner" of elements like button color change back and forth daily for weeks, with no clear winner, even with 30,000+ conversions a day flowing through. This is the kind of test visualwebsiteoptimizer would write a case study about after one hour's worth of traffic, declaring a winner.

3) Your brand can get shit on when dealing with returning users. They are used to seeing one thing and now they see something else. Imagine if the colors, theme, and button locations changed every day (or, to be fair, once a week) when you visited HN. Often "better converting" designs actually convert worse when introduced to existing customers.

4) Failure to account for external variables, especially when dealing with small sample sizes. Testing is often done most vigorously with paid traffic sources, as the monetary incentive is direct. The traffic source itself is often a bigger determining factor behind the conversion rate than the design. With a small budget or sample size, you could end up with some pretty poor test results that the math will tell you are correct.

I am not saying don't test. I am saying A/B testing, split testing, multivariate testing, etc. are abused.

mef · on June 1, 2012

Thanks for your reply. That was my deleted post up there; I worked out a table like yours and realized your point shortly after posting.