

Show HN: The problem with the epsilon greedy method - crobertsbmw
https://github.com/crobertsbmw/EpsilonGreedy
I decided I wanted to roll my own A/B testing app for Django (https://github.com/crobertsbmw/RobertsAB). When I was finished, I came across this:

http://stevehanov.ca/blog/index.php?id=132

It's a very convincing article on why A/B testing sucks and how, with a few extra lines, you can improve your algorithm to select the best option so you never have to go back and update your code (yeah, right).

I then wondered: how many tests does this thing have to run to truly figure out which option is best?

I made 4 tests with probabilities of success equal to 1/2, 1/4, 1/5, and 1/6, and found that for this algorithm to settle on the best success rate (1/2), it took 91 hits on average, with a max of 876 tests.

I ran the same experiment using a standard A/B algorithm: pick whichever option has been tested the least and run that test. It took 32 tests on average to figure out which performed best, with a maximum of 363. On average, 3 times better than the epsilon greedy method.

I then tried tweaking my success ratios to something a little less dramatic: 1/10, 1/11, 1/12, 1/13. That just made everything take a LOT longer.

The only problem is that in reality you don't know what the best solution is, so you can never know whether you have gotten to the "actual" solution. The epsilon greedy method will eventually get there (although you will never know when). And if you are using the standard A/B method, you will never know whether you have arrived at the best option either, especially when we are talking about the difference between 1/20 clicks versus 1/21 clicks.

Moral of the story: A/B testing is probably a waste of time.

Here is a link to all the tests I ran (Python 3): https://github.com/crobertsbmw/EpsilonGreedy
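For anyone who wants to see roughly what was compared, here is a minimal sketch of the two strategies described above. This is my own simplified reconstruction, not code from the linked repo: the epsilon value, step count, and function names are all assumptions.

```python
import random

def pull(p):
    """Simulate one trial of an arm with success probability p."""
    return 1 if random.random() < p else 0

def epsilon_greedy(probs, epsilon=0.1, steps=1000):
    """Epsilon-greedy: usually exploit the best-looking arm,
    explore a random arm with probability epsilon."""
    wins = [0] * len(probs)
    trials = [0] * len(probs)
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(len(probs))          # explore
        else:
            rates = [w / t if t else 0.0 for w, t in zip(wins, trials)]
            arm = rates.index(max(rates))               # exploit current best
        trials[arm] += 1
        wins[arm] += pull(probs[arm])
    return trials, wins

def least_tested(probs, steps=1000):
    """Standard A/B-style allocation: always run the least-tested arm,
    which amounts to round-robin over the options."""
    wins = [0] * len(probs)
    trials = [0] * len(probs)
    for _ in range(steps):
        arm = trials.index(min(trials))
        trials[arm] += 1
        wins[arm] += pull(probs[arm])
    return trials, wins

probs = [1/2, 1/4, 1/5, 1/6]    # the success rates from the experiment above
eg_trials, eg_wins = epsilon_greedy(probs)
ab_trials, ab_wins = least_tested(probs)
```

With `least_tested`, every arm ends up with an almost identical trial count; with `epsilon_greedy`, the arm that looks best early on soaks up most of the traffic, which is exactly why its estimates of the other arms converge more slowly.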
======
marketforlemmas
Interesting comparison but, by my understanding, epsilon greedy and A/B
testing do not solve the same problem.

Epsilon greedy is a method for minimizing regret, that is, the expected loss
you incur from choosing options that are sub-optimal.

A/B testing's goal (or one of its many goals) is to maximize the chance that,
after the test is over, you select the best option going forward.

So e-greedy makes a conscious choice not to maximize its statistical
confidence in certain options, because it is trying to exploit the things it
knows to be good. Meanwhile, A/B testing balances the exploration so that it
can have that statistical confidence.

Hopefully someone with more expertise can chime in but I think this is the
gist of it.

------
bcbrown
[http://engineering.richrelevance.com/bandits-recommendation-systems/](http://engineering.richrelevance.com/bandits-recommendation-systems/)

~~~
crobertsbmw
Awesome read. This would have satisfied my curiosity and probably saved me an
entire day of messing around. Thanks.

