Feature flags are a way to do that. By happy coincidence, they share semantics almost verbatim with A/B testing. (At a high level of abstraction, the most interesting API is basically User#should_see?(feature_name_goes_here).) They typically have a bit more going on in the API than that -- for example, the ability to assign users to groups (like, say, "our employees", "friends & family", "our relentlessly dedicated True Fans (TM) who are willing to suffer the odd bug", "10% of people who signed up last Monday", etc.) and to grant groups access to a feature. There is often a UI for managing this.
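A minimal sketch of that API, assuming nothing beyond the description above (the class, constant, and group names are all illustrative, not any particular library's):

    # Illustrative only: features map to the groups allowed to see them.
    class User
      FEATURE_GROUPS = Hash.new { |h, k| h[k] = [] }  # feature => groups

      def initialize(groups)
        @groups = groups  # e.g. [:employees, :true_fans]
      end

      def should_see?(feature)
        (FEATURE_GROUPS[feature] & @groups).any?
      end
    end

    User::FEATURE_GROUPS[:new_dashboard] = [:employees, :friends_and_family]
    User.new([:employees]).should_see?(:new_dashboard)  # => true
    User.new([:true_fans]).should_see?(:new_dashboard)  # => false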
It absolutely would, and guess what we use at work :-0
It absolutely would, now get off HN and fork it. I want a pull request by Monday!
Passing ?traffic_dist=10 to the participate endpoint will direct only 10% of traffic to the test. The rest will get the control alternative.
I'll add this to the documentation.
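For concreteness, a hedged sketch of such a call; only the participate endpoint and the traffic_dist parameter come from the comment above -- the host, port, experiment name, and other query parameters are assumptions:

    require 'net/http'
    require 'uri'

    # Hypothetical request: everything except the endpoint path and
    # traffic_dist is an assumed placeholder.
    uri = URI('http://localhost:5000/participate')
    uri.query = URI.encode_www_form(
      experiment:   'new_checkout',       # assumed experiment name
      alternatives: ['control', 'test'],  # assumed alternative names
      client_id:    'user-12345',         # assumed client identifier
      traffic_dist: 10                    # only 10% of traffic enters the test
    )
    response = Net::HTTP.get_response(uri)
    puts response.body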
Give everything a default traffic weight of, say, 10. Make it alterable as above. Divide traffic proportionately among all versions according to their weights.
The reason to do it this way is that when you go from 5 versions to 4, then 4 to 3, then 3 to 2, you just eliminate versions and don't have to recalculate percentages to rebalance (see the sketch below). Trust me, calculating percentages gets very old, very fast.
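A sketch of weight-based assignment under those assumptions (the hashing scheme and names are mine, purely illustrative):

    require 'digest'

    # versions is a hash of name => weight, e.g. { a: 10, b: 10, c: 10 }.
    # Each version gets weight / total of the traffic; deleting a key
    # rebalances the rest automatically, with no percentages to redo.
    def pick_version(versions, client_id)
      total = versions.values.sum
      # Hash the client id so a given user always sees the same version.
      point = Digest::MD5.hexdigest(client_id.to_s).to_i(16) % total
      versions.each do |name, weight|
        return name if point < weight
        point -= weight
      end
    end

    weights = { v1: 10, v2: 10, v3: 10, v4: 10, v5: 10 }
    pick_version(weights, 'user-12345')  # each version gets 1/5 of traffic
    weights.delete(:v5)                  # now each gets 1/4, no math needed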
While you're at it, add the ability to mark versions as not reported on. That lets people run three versions: test, control, and unreported control. You start with most traffic in unreported control, then ramp up the test by dropping the weight of unreported control.
(E.g. your sample mix will be invalid if you start at 10 test / 90 control and then ramp up to 30/70: samples gathered before and after the ramp have different test-to-control ratios, so pooling them biases the comparison. But if you start at 10 test / 10 control / 80 unreported, then ramp to 30/30/40, the reported ratio stays 1:1 the whole time and your samples remain valid.)
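Continuing the sketch above, the unreported flag might look like this (the structure and names are again illustrative):

    # Illustrative only: each version carries a weight plus a report flag.
    # pick_version above still assigns traffic by weight; analysis code
    # simply drops unreported versions before computing statistics.
    experiment = {
      test:               { weight: 10, report: true  },
      control:            { weight: 10, report: true  },
      unreported_control: { weight: 80, report: false },
    }

    # Ramping up later: 30/30/40. The reported test:control ratio stays
    # 1:1 before and after the ramp, so the pooled samples stay valid.
    experiment[:test][:weight]               = 30
    experiment[:control][:weight]            = 30
    experiment[:unreported_control][:weight] = 40

    weights  = experiment.transform_values { |v| v[:weight] }  # feeds pick_version
    reported = experiment.select { |_, v| v[:report] }         # => test, control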