Hacker News
Sixpack: Language-Agnostic A/B Testing (seatgeek.com)
62 points by zackkitzmiller on Aug 28, 2013 | 16 comments

Awesome. A lot of places do things the way you used to do them, with one system that is the A/B testing source of truth and then crufty glue code to pass things back and forth to that system. Great to have more options for folks who don't want to do this, especially as heterogeneous architectures become more common.

Looks very nice - should be pretty easy to fork for feature toggles too.

What is a feature toggle?

Let's hypothetically say you have a new feature Foo. Foo is under active development and works on the test and staging environments, but you're concerned it might not be ready for prime time. You first release Foo to your staff's accounts on the production servers. After they've broken Foo in for a while, you roll it out to a randomly selected 10% of the user base while watching your automated instrumentation to see how it reacts (does it blow up anything? do users care about it? does anyone actually use the thing?). After you've proven Foo out, you release it to the entire user base. Should you at some point have a problem with Foo, you want the ability to yank it back from all users while you go back to tinkering on it privately.

Feature flags are a way to do that. By happy coincidence, they share semantics almost verbatim with A/B testing. At a high level of abstraction, the most interesting API is basically User#should_see?(feature_name_goes_here). They typically have a bit more going on in the API than that: for example, the ability to assign users to groups (like, say, "our employees", "friends & family", "our relentlessly dedicated True Fans (TM) who are willing to suffer the odd bug", "10% of people who signed up last Monday", etc.) and to grant groups access to a feature. There is often a UI for managing that.
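A minimal Python sketch of the User#should_see? idea, assuming a hypothetical `Feature` record with a group allowlist, a percentage rollout, and a kill switch. Users are bucketed by hashing their id together with the feature name, so the same 10% keeps seeing Foo across requests. None of these names come from Sixpack or any particular flag library.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    groups: set[str] = field(default_factory=set)


@dataclass
class Feature:
    name: str
    enabled: bool = True                            # kill switch: False yanks it from everyone
    groups: set[str] = field(default_factory=set)   # groups that always see it
    rollout_percent: int = 0                        # share of everyone else, 0-100


def bucket(user_id: int, feature_name: str) -> int:
    """Map a user to a stable bucket in [0, 100) for this feature."""
    digest = hashlib.sha256(f"{feature_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def should_see(user: User, feature: Feature) -> bool:
    if not feature.enabled:
        return False
    if user.groups & feature.groups:                # e.g. "staff" sees it first
        return True
    return bucket(user.id, feature.name) < feature.rollout_percent


foo = Feature("foo", groups={"staff"})
print(should_see(User(1, {"staff"}), foo))  # True: staff always see Foo
print(should_see(User(2), foo))             # False: rollout is still 0%
foo.enabled = False
print(should_see(User(1, {"staff"}), foo))  # False: the kill switch wins
```

Ramping from staff-only to 10% to everyone is then just a matter of raising `rollout_percent`; because buckets are stable, users admitted at 10% stay admitted at 30%.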

@patio11 what do you think of this thing overall? (Not feature flags, but rather using a language-agnostic framework.)

I'm broadly in favor of any technology which increases the number of firms which are able to make A/B testing a routine practice at their organizations. This solves an issue which occurs in some deployments. It being available as OSS is therefore unmitigated good news, though I probably won't use it myself.

It absolutely would.

Hmm, after looking up your profile, I'm now wondering whether that means:

It absolutely would, and guess what we use at work :-0


It absolutely would, now get off HN and fork it. I want a pull request by Monday!


Nice one.

We currently aren't using a fork for feature flags, though it would certainly be cool to do so.

Now get off HN and fork it.

This is exactly what I've been looking for. Ever since Google Website Optimizer went away, I've been kinda floating in the wind when it comes to UI testing. Major props to SeatGeek for releasing this so that others can use it.

edit: Changed "GWO" to "Google Website Optimizer"

It would be cool if you could divert only a fraction of traffic into a particular test. A future feature, perhaps?

You actually already can do that, and all of the clients support it. It's probably just not documented.

Passing ?traffic_dist=10 to the participate endpoint will only direct 10% of traffic to the test. The rest will get the control alternative.
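A rough Python sketch of what a traffic_dist check could look like server-side. This is not Sixpack's code: the function name, treating `alternatives[0]` as the control, and leaving clients outside the slice uncounted are all assumptions for illustration. Hashing the client id means a returning client lands in the same slice every time.

```python
import hashlib


def participate(client_id: str, experiment: str, alternatives: list[str],
                traffic_dist: int = 100) -> tuple[str, bool]:
    """Return (alternative, in_test) for one client.

    Hypothetical sketch: alternatives[0] is assumed to be the control, and
    traffic_dist is the percentage of clients (0-100) admitted to the test.
    """
    digest = hashlib.sha256(f"{experiment}:{client_id}".encode()).digest()
    slot = int.from_bytes(digest[:8], "big") % 100
    if slot >= traffic_dist:
        # Outside the test slice: serve the control, don't count the client.
        return alternatives[0], False
    # Inside the slice: split evenly across all alternatives, control included.
    index = int.from_bytes(digest[8:16], "big") % len(alternatives)
    return alternatives[index], True
```

For example, `participate("client-42", "button-color", ["red", "green"], traffic_dist=10)` admits roughly one client in ten to the test; everyone else sees "red".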

I'll add this to the documentation.

That's not going to be the best API over time.

Give everything a default traffic weight of, say, 10. Make it alterable as above. Divide traffic proportionately between all versions according to their weight.

The reason to do it this way is so that when you go from 5 versions to 4, then 4 to 3, then 3 to 2 you just eliminate versions and don't have to calculate the percentages to rebalance. Trust me, calculating percentages gets very old, very fast.
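A minimal Python sketch of the weight-based scheme, assuming each version carries an integer weight (default 10) and a client is placed by hashing its id into the range [0, total weight). The function names are hypothetical. Dropping a version just removes its weight, and the remaining shares rescale on their own.

```python
import hashlib


def choose_version(client_id: str, weights: dict[str, int]) -> str:
    """Pick a version with probability weight / sum(weights).

    Deterministic per client, so a visitor keeps seeing the same version
    as long as the weights (and their order) don't change.
    """
    total = sum(weights.values())
    digest = hashlib.sha256(client_id.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    for name, weight in weights.items():   # dicts keep insertion order
        if point < weight:
            return name
        point -= weight
    raise AssertionError("unreachable: point is always < total")


def shares(weights: dict[str, int]) -> dict[str, float]:
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


versions = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 10}
print(shares(versions))  # each version gets 0.2
del versions["E"]        # drop a loser; no percentages to recompute
print(shares(versions))  # each remaining version gets 0.25
```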

While you're at it, add the ability to mark versions as unreported. This lets people run three versions: test, control, and unreported control. You start with most of the traffic in unreported control, then ramp up the test by dropping the weight of unreported control.

+1 on the unreported control. It's a common use case to start an A/B test at a small test percentage, then ramp up over time. Without an unreported control, you can't do that and have valid samples for each group.

(E.g. your sample mix will be invalid if you start at 10 test / 90 control then ramp up to 30/70. But if you start 10 test / 10 control / 80 unreported, then ramp to 30/30/40, your samples will be valid.)
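The arithmetic behind that parenthetical, as a short Python check. It assumes two periods with equal traffic (an illustrative simplification) and asks what fraction of each group's sample arrived in the first period. If test and control differ, any shift in visitor behavior between the periods leaks into the comparison. The function name is illustrative.

```python
def early_share(schedule: list[dict[str, int]]) -> dict[str, float]:
    """Fraction of each group's sample that arrived in the first period.

    schedule[i] maps group name -> traffic percentage during period i;
    every period is assumed to see the same number of visitors.
    """
    early = schedule[0]
    totals = {g: sum(period[g] for period in schedule) for g in early}
    return {g: early[g] / totals[g] for g in early}


# Ramp 10/90 -> 30/70 without an unreported control
print(early_share([{"test": 10, "control": 90},
                   {"test": 30, "control": 70}]))
# test: 0.25, control: 0.5625, so the two samples are mixed differently

# Ramp 10/10/80 -> 30/30/40 with an unreported control
print(early_share([{"test": 10, "control": 10, "unreported": 80},
                   {"test": 30, "control": 30, "unreported": 40}]))
# test and control are both 0.25; only the unreported group differs
```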

Or you can use ABalytics: no backend, just use Google Analytics :) https://github.com/danmaz74/ABalytics/
