
Sixpack: Language-Agnostic A/B Testing - zackkitzmiller
http://sixpack.seatgeek.com/
======
patio11
Awesome. A lot of places do things the way you used to do them, with one
system that is the A/B testing source of truth and then crufty glue code to
pass things back and forth to that system. Great to have more options for
folks who don't want to do this, especially as heterogeneous architectures
are only going to become more common.

------
lifeisstillgood
Looks very nice - should be pretty easy to fork for feature toggles too.

~~~
mrclownpants
What is a feature toggle?

~~~
patio11
Let's hypothetically say you have a new feature Foo. Foo is under active
development and works on the test and staging environments, but you're
concerned it might not be ready for prime time. You first release Foo to your
staff's accounts on the production servers. After they've broken Foo in for a
while, you roll it out to a randomly selected 10% of the user base, while
watching your
automated instrumentation to see how it reacts (does it blow up anything? do
users care about it? does anyone actually use the thing?). After you've proven
Foo out you release it to the entire userbase. Should you at some point have a
problem with Foo, you desire the ability to yank it back from all users while
you get back to tinkering on it privately.

Feature flags (a.k.a. feature toggles) are a way to do that. By happy
coincidence, they share semantics almost verbatim with A/B testing. (At a high
level of abstraction, the most interesting API is basically
User#should_see?(feature_name_goes_here).) They typically have a bit more
going on in the API than that -- for example, the ability to assign users to
groups (like, say, "our employees", "friends & family", "our relentlessly
dedicated True Fans (TM) who are willing to suffer the odd bug", "10% of
people who signed up last Monday", etc.) and to grant groups the ability to
view a feature. There is often a UI for managing all of that.
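
A rough sketch of those semantics in Python (hypothetical names and store,
not any particular library's API):

    import hashlib

    # Hypothetical in-memory flag store; a real system would keep this
    # in a database or config service, behind the management UI.
    FLAGS = {
        "foo": {
            "enabled_groups": {"employees", "friends_and_family"},
            "rollout_percent": 10,  # share of everyone else who sees it
        },
    }

    def should_see(user, flag_name):
        flag = FLAGS.get(flag_name)
        if flag is None:
            return False  # unknown flags are off by default
        if user["groups"] & flag["enabled_groups"]:
            return True
        # Deterministic bucketing: the same user always lands in the
        # same bucket, so the 10% cohort stays stable across requests.
        digest = hashlib.md5(f"{flag_name}:{user['id']}".encode()).hexdigest()
        return int(digest, 16) % 100 < flag["rollout_percent"]

Yanking Foo back is then just setting rollout_percent to 0 and emptying
enabled_groups.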

~~~
ficklelarry
@patio11 what do you think of this thing overall? (Not feature flags, but
rather using a language-agnostic framework.)

~~~
patio11
I'm broadly in favor of any technology which increases the number of firms
which are able to make A/B testing a routine practice at their organizations.
This solves a real issue -- the crufty-glue-code problem I mentioned above --
which comes up in some deployments. Its being available as OSS is therefore
unmitigated good news, though I probably won't use it myself.

------
ficklelarry
This is _exactly_ what I've been looking for. Ever since Google Website
Optimizer went away, I've been kinda floating in the wind when it comes to UI
testing. Major props to SeatGeek for releasing this so that others can use it.

------
mrclownpants
It would be cool if you could divert only a fraction of traffic into a
particular test. A future feature, perhaps?

~~~
zackkitzmiller
You actually already can do that, and all of the clients support it. It's
probably just not documented.

Passing ?traffic_dist=10 to the participate endpoint will direct only 10% of
traffic to the test. The rest will get the control alternative.
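
From a client it looks roughly like this (a sketch only -- the server URL
and the response shape here are assumptions; the other parameters are the
usual participate arguments):

    import requests

    resp = requests.get("http://localhost:5000/participate", params={
        "experiment": "button_color",
        "alternatives": ["red", "blue"],  # first alternative is the control
        "client_id": "some-unique-client-id",
        "traffic_dist": 10,  # route only 10% of traffic into the test
    })
    # Assumed response shape: {"alternative": {"name": ...}, ...}
    alternative = resp.json()["alternative"]["name"]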

I'll add this to the documentation.

~~~
btilly
That's not going to be the best API over time.

Give everything a default traffic weight of, say, 10. Make it alterable as
above. Divide traffic proportionately between all versions according to their
weight.

The reason to do it this way is so that when you go from 5 versions to 4, then
4 to 3, then 3 to 2 you just eliminate versions and don't have to calculate
the percentages to rebalance. Trust me, calculating percentages gets very old,
very fast.

While you're at it, add the ability to mark a version as not reported on.
That lets people run three versions: test, control, and unreported control.
You start with a lot of traffic in unreported control, then ramp up the test
by dropping the weight of the unreported control.
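
Something along these lines (a sketch of the scheme, not Sixpack's
internals):

    import random

    # Every version carries a weight (default 10); "reported" controls
    # whether it shows up in the results. Dropping a version is just
    # deleting its entry -- no percentage rebalancing needed.
    versions = [
        {"name": "test",               "weight": 10, "reported": True},
        {"name": "control",            "weight": 10, "reported": True},
        {"name": "unreported_control", "weight": 80, "reported": False},
    ]

    def pick_version(versions):
        # Divide traffic proportionately to the weights.
        return random.choices(
            versions, weights=[v["weight"] for v in versions])[0]

Ramping up is then a matter of raising the test and control weights (or
lowering the unreported control's) rather than recomputing percentages.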

~~~
sutterbomb
+1 on the unreported control. It's a common use case to start an A/B test at a
small test percentage, then ramp up over time. Without an unreported control,
you can't do that and have valid samples for each group.

(E.g. your sample mix will be invalid if you start at 10 test / 90 control
then ramp up to 30/70. But if you start 10 test / 10 control / 80 unreported,
then ramp to 30/30/40, your samples will be valid.)
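
To make the arithmetic concrete, say 1000 users arrive during each phase:

    # Without an unreported control: 10/90, then ramped to 30/70.
    test    = 0.10 * 1000 + 0.30 * 1000  # 400 users, 25% from phase one
    control = 0.90 * 1000 + 0.70 * 1000  # 1600 users, ~56% from phase one

    # With one: 10/10/80, then ramped to 30/30/40.
    test    = 0.10 * 1000 + 0.30 * 1000  # 400 users, 25% from phase one
    control = 0.10 * 1000 + 0.30 * 1000  # 400 users, 25% from phase one

In the first scheme, anything correlated with time (seasonality, other
product changes) hits test and control unevenly; in the second, both groups
have the same time mix, so the comparison stays clean.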

------
danmaz74
Or you can use ABalytics: no backend, just use Google Analytics :)
https://github.com/danmaz74/ABalytics/

