
This is a good conversation. Disclaimer: my company, Conductrics.com, lets you use bandit-style algorithms as well as AB/MVT.

AB testing can be thought of as a form of epoch-greedy: you play uniform random, then play greedy. One advantage of e-greedy is that if your environment is non-stationary, you are still sampling from all the arms - it's sort of an insurance premium. To address differences in reward signals across environments, there is the option to model the environment with features, since it may be that Fridays require a different selection than Sundays. I'm not sure why you would a priori assume that the relative values, or even just the arm rankings, are independent of environmental variables.
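To make the e-greedy "insurance premium" concrete, here's a rough Python sketch (the epsilon value and incremental-mean update are just illustrative choices, not Conductrics code):

    import random

    class EpsilonGreedy:
        def __init__(self, n_arms, epsilon=0.1):
            self.epsilon = epsilon
            self.counts = [0] * n_arms    # pulls per arm
            self.values = [0.0] * n_arms  # running mean reward per arm

        def select_arm(self):
            if random.random() < self.epsilon:
                # explore: keep sampling every arm, even "losers",
                # so estimates can track a drifting environment
                return random.randrange(len(self.values))
            # exploit: play the current best estimate
            return max(range(len(self.values)), key=lambda a: self.values[a])

        def update(self, arm, reward):
            self.counts[arm] += 1
            # incremental mean; swap in a fixed step size if rewards drift
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]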

One other point: if you just want a super quick hack to pick winners (or at least to pick from the set of higher-performing arms), you can optimistically seed your estimates for each arm, then just pick greedily. I'm not claiming it is optimal or anything, but it requires almost no coding overhead. Of course, you need the problem to be stationary.
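To show how little code the optimistic-seeding hack takes, here's a rough sketch (it assumes rewards in [0, 1]; the seed value/weight and the pull() callback are made up for illustration). Each arm starts with an inflated estimate, so even pure greedy selection tries every arm until the optimism gets worn down by real data:

    def optimistic_greedy(n_arms, pull, n_rounds, seed_value=1.0, seed_weight=5):
        # seed each arm as if it had already paid out seed_value, seed_weight times
        counts = [seed_weight] * n_arms
        values = [seed_value] * n_arms
        for _ in range(n_rounds):
            arm = max(range(n_arms), key=lambda a: values[a])  # pure greedy
            reward = pull(arm)  # pull(arm) -> observed reward in [0, 1]
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
        return values

The seed weight controls how stubborn the optimism is: a heavier seed forces more trials of each arm before greedy selection settles, which is exactly why this only works if the problem is stationary.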

Regardless of which algorithmic approach you use, I do think it is useful to at least think in terms of bandits or reinforcement learning when tackling these problems.



