
Without concrete details about the experimental setup and the dataset on which it was run (and ideally the source code, so we can be sure there aren't any bugs), this is essentially meaningless.

Anyone can run "simulations" to prove anything. Providing just a summary table is of little use. I am not saying that the Wingify folks are trying to mislead people - just that this article doesn't have sufficient rigor to justify its conclusions.

OTOH, many CS papers, even published ones, don't provide source code or datasets so people can replicate the results, so perhaps this is the 'new normal' ;) .

Here's the code (quick-and-dirty): http://pastie.org/4007859

I have double-checked the code, but it is quite possible that I made an oversight somewhere.

I am not saying that your code has bugs (or that it hasn't!).

However, running experiments to prove that one 'process' is superior to another in a real-world situation often involves more than running a chi-square test on randomly generated data ;) (which is what I understood your code to be doing on a very brief glance at it - sorry if I got it wrong).
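For context, a quick chi-square-on-simulated-data check can be written in a few lines of stdlib Python. This is a sketch of that general approach, not the linked pastie; the 10%/12% conversion rates and the sample size are invented for illustration:

```python
import random

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square statistic for a 2x2 conversions/non-conversions table."""
    table = [[conv_a, n_a - conv_a],
             [conv_b, n_b - conv_b]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

random.seed(0)
n = 10_000
conv_a = sum(random.random() < 0.10 for _ in range(n))  # arm A, true rate 10%
conv_b = sum(random.random() < 0.12 for _ in range(n))  # arm B, true rate 12%
stat = chi_square_2x2(conv_a, n, conv_b, n)
# Critical value for df=1 at alpha=0.05 is 3.841
print(stat, stat > 3.841)
```

Note how much is baked into even this toy: fixed true rates, independent Bernoulli visitors, a fixed horizon. Any conclusion drawn from such a simulation is only as good as those assumptions.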

There is nothing wrong with starting your exploration with something like this, but it really isn't sufficient to support the claims in the blog post about specific ways in which MAB is better/worse than A/B testing.

This seems to be methodologically dubious. I am not a stats expert, though I use statistics in my work, and I could be wrong, but 'testing convergence to statistical significance' doesn't seem to mean anything (mathematically/statistically)!

I could be wrong - there are people with Stats PhDs here and I'll leave it to them to tell me if I am. But I've never heard of an experiment (in the formal sense) which ran until a significance-checking algorithm crossed some tripwire level.
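To see why that tripwire rule is worrying, consider an A/A simulation: both arms have the same true rate, so every "significant" result is a false positive. The sketch below (rates, horizon, and check interval are all made up for illustration) compares a fixed-horizon z-test against a stop-at-first-significance rule:

```python
import math
import random

def z_stat(c_a, n_a, c_b, n_b):
    """Pooled two-proportion z statistic."""
    p = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (c_a / n_a - c_b / n_b) / se

def aa_run(rng, n_max=5_000, check_every=250, rate=0.1):
    """One A/A run: both arms share the SAME true rate, so any
    'significant' result is a false positive. Returns (peeked, final)."""
    c_a = c_b = 0
    peeked = False
    for i in range(1, n_max + 1):
        c_a += rng.random() < rate
        c_b += rng.random() < rate
        # Peek at the running z statistic every check_every visitors per arm
        if i % check_every == 0 and abs(z_stat(c_a, i, c_b, i)) > 1.96:
            peeked = True  # the tripwire rule would stop here and declare a winner
    final = abs(z_stat(c_a, n_max, c_b, n_max)) > 1.96
    return peeked, final

rng = random.Random(42)
runs = 400
peek_fp = final_fp = 0
for _ in range(runs):
    p, f = aa_run(rng)
    peek_fp += p
    final_fp += f
print(f"fixed-horizon false positive rate:        {final_fp / runs:.1%}")
print(f"stop-at-first-significance false positive: {peek_fp / runs:.1%}")
```

With repeated peeking, the chance of ever crossing the 1.96 threshold is much higher than the nominal 5%, which is exactly why "run until significant" isn't a valid stopping rule for a classical test.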

Is the sampling random? (especially for an MAB) What are the underlying distributions (and assumptions about them)?
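The sampling question matters because a bandit's allocation is adaptive, not random: which arm the next visitor sees depends on the data so far. A minimal epsilon-greedy sketch (the rates and parameters here are invented for illustration, not taken from the article) makes the point:

```python
import random

def epsilon_greedy(rng, true_rates, n_pulls=5_000, eps=0.1):
    """Minimal epsilon-greedy bandit: exploit the best-looking arm,
    explore a uniformly random arm with probability eps.
    Returns the pull count for each arm."""
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(n_pulls):
        if rng.random() < eps or min(pulls) == 0:
            arm = rng.randrange(len(true_rates))  # explore
        else:
            # exploit: arm with the best empirical conversion rate so far
            arm = max(range(len(true_rates)), key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        wins[arm] += rng.random() < true_rates[arm]
    return pulls

rng = random.Random(7)
pulls = epsilon_greedy(rng, [0.10, 0.12])
print(pulls)  # allocation is data-dependent, not a fixed 50/50 split
```

Because the per-arm samples are no longer independent draws from a fixed allocation, the usual assumptions behind a chi-square or z-test on the resulting counts don't hold as-is.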

In your article you (imo) either need to say something like "this test isn't really valid and so the conclusions shouldn't be taken very seriously" (which would be a weird thing for a company with your product to blog!) or you should do some rigorous experiment design and be able to defend your results.

Right now it reads like some MBA type plugged two black-box formulae into an Excel sheet and drew conclusions from the results. (Please note: I am not saying you did it that way - just saying that the article reads like not enough thought went into setting up a proper experiment.)

(Statistical) experiment design is a lot of work, surprisingly hard to carry off well, and involves much subtlety - and theory! There are whole books written about it. Maybe time to buy a few? :)

Yes, I agree we should probably put the code in the post so people know exactly what was done. Yes, it is a quick-and-dirty simulation, not a rigorous analysis. I hope someone more knowledgeable sheds light on whether the arguments have any fundamental oversights. The post is mainly meant for our customers who asked if MAB is "better" than A/B testing. It clearly isn't, as the two are meant for answering different types of questions.

It's the old normal. See also the famous article about how most medical research is mathematically wrong.
