
The pitfalls of A/B testing in social networks - kiyanwang
https://tech.okcupid.com/the-pitfalls-of-a-b-testing-in-social-networks/
======
dalbasal
Many moons ago, OKC had a sort of clever & irreverent blog for stuff like
this. It was a great read. There were some classics like "_never pay for
online dating_" [1] and "_The case for an older woman_" [2]. The first made a
convincing case that the cheesy business model (see below for irony) of most
dating sites doomed them to suckness. The second made a hilariously quantified
(and also convincing) appeal for changing your age preference settings, while
actually talking through real social norms and culture.

They managed to deal with touchy dating stereotypes, race, and sex. Reading
it, you could tell that they were genuinely smart (as opposed to just
sophisticated) in their data/analytical mindset. You also got a feel for how
the author thinks and where that mindset translates into making OKC.

Anyway, this reads like a LinkedIn article. Safe, vanilla company blog post.
Big contrast. I wonder if the OKC guys writing that old blog 10 years ago knew
how unique their position was.

[1] Got taken down after they got bought by the target of the post. Accessible
here: [http://static.izs.me/why-you-should-never-pay-for-online-dat...](http://static.izs.me/why-you-should-never-pay-for-online-dating.html)

[2] [https://theblog.okcupid.com/the-case-for-an-older-woman-99d8...](https://theblog.okcupid.com/the-case-for-an-older-woman-99d8cabacdf5)

~~~
AznHisoka
I read somewhere that they had just one guy doing all the data analysis, and
after they got acquired he or she was just too busy with other work. Thus
those types of posts just disappeared.

~~~
shostack
This is actually a hard thing to manage at some companies. Writers often don't
have access to, or knowledge of, the data, and engineers or data scientists
often have bigger fish to fry beyond some content marketing explorations.

If anyone has found a good way to strike that balance and create really
solid, in-depth data-driven content for their org, I'd love to learn more.

------
pougetj
I'm currently in the process of getting my PhD on this exact topic, and I'm
always happy to see the problem of network interference in A/B tests being
tackled at more and more tech companies. Here's a small (incomplete) list of
(mainly industrial) references to solutions to this problem.

\- Facebook data scientists have an extensive list of publications on the
topic: [1][2]. (Dean Eckles is now at MIT).

\- LinkedIn's experimentation team has also published papers on the topic:
[3][4]

\- Data scientists at Google have also worked on this topic. One external
publication/collaboration I'm aware of is: [5]

[1] [http://www.pnas.org/content/113/27/7316](http://www.pnas.org/content/113/27/7316)

[2] [https://arxiv.org/abs/1404.7530](https://arxiv.org/abs/1404.7530)

[3] [http://www.kdd.org/kdd2017/papers/view/detecting-network-eff...](http://www.kdd.org/kdd2017/papers/view/detecting-network-effects-randomizing-over-randomized-experiments)

[4] [https://dl.acm.org/citation.cfm?doid=2783258.2788602](https://dl.acm.org/citation.cfm?doid=2783258.2788602)

[5] [http://proceedings.mlr.press/v51/basse16b.pdf](http://proceedings.mlr.press/v51/basse16b.pdf)

Disclaimer: this list is heavily biased by experiences I've had through
collaborations/internships. I am an author on [3]. Edit: formatting.
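A common remedy in this literature is graph cluster randomization: instead of
randomizing individual users, you partition the social graph into clusters and
assign treatment per cluster, so most of a user's neighbors share their
variant and interference is contained. A toy, stdlib-only sketch (the
friendship graph and the use of connected components as clusters are
illustrative assumptions, not the method of any specific paper above):

```python
import random
from collections import deque

def connected_components(adj):
    """Find connected components of an undirected graph via BFS."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        components.append(comp)
    return components

def cluster_randomize(adj, seed=0):
    """Assign a variant per cluster, not per user, so a user and
    their neighbors (mostly) experience the same treatment."""
    rng = random.Random(seed)
    assignment = {}
    for comp in connected_components(adj):
        arm = rng.choice(["treatment", "control"])
        for user in comp:
            assignment[user] = arm
    return assignment

# Toy friendship graph: two separate friend groups plus a loner.
graph = {
    "alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"],
    "dave": ["eve"], "eve": ["dave"],
    "frank": [],
}
assignment = cluster_randomize(graph)
```

In a real social graph the components are too entangled for this to work
directly, which is why the papers above use graph-partitioning heuristics and
then estimate (rather than eliminate) the remaining spillover.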

------
fragsworth
> In conclusion: A/B testing is conceptually simple, but can be difficult to
> execute if your product involves anything you'd consider "social
> interaction".

A/B testing is really hard, even when there is no social interaction internal
to your application (like a game). The easiest thing to test is marketing to
new users. But even that can be tricky: one group of people might click fewer
ads but be higher quality. This is especially troublesome when you can't tell
how many friends they invite through word of mouth. And if you want to
implement other features or fix bugs over the duration of the test, that also
poisons the results.

And then the number of incoming users you need is massive.
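To put a number on "massive": the standard two-proportion sample-size formula
shows why. A quick sketch (the 2% baseline conversion rate and 5% relative
lift are made-up illustrative inputs):

```python
import math

def users_per_arm(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a relative lift in a
    conversion rate, via the classic two-proportion z-test formula."""
    p2 = p_base * (1 + rel_lift)
    z_alpha = 1.959964  # two-sided 5% significance
    z_beta = 0.841621   # 80% power
    p_bar = (p_base + p2) / 2
    n = ((z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * math.sqrt(p_base * (1 - p_base) + p2 * (1 - p2))) ** 2
         / (p2 - p_base) ** 2)
    return math.ceil(n)

# Detecting a 5% relative lift on a 2% conversion rate takes on the
# order of 300k users *per arm*; a 20% lift needs far fewer.
small_lift = users_per_arm(0.02, 0.05)
big_lift = users_per_arm(0.02, 0.20)
```

The takeaway matches the comment: small effects on rare events need user
counts most products simply don't have.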

Doing a good A/B test that doesn't mislead you is extremely difficult, almost
to the point that, for most companies, I don't think it is worth doing.

> So... I guess my final recommendation is that you should hire some data
> scientists that like doing experiments.

Like this guy says, you basically need a data science team to do it
effectively. If you've got a small team (like I work with), don't waste your
time with it. You're better off just adding new features and fixing bugs.

~~~
jerednel
> But even that can be tricky, where one group of people might click fewer ads
> but are higher quality

This is why I always pull performance metrics in addition to
impressions/clicks when running A/B tests. Not being in the media/creative
department, I often don't see the creatives I'm asked to analyze, so
occasionally a different creative turns out to resonate more strongly with
people who have a higher propensity to convert. That is itself a learning: it
could lead to using that sort of creative more for acquisition-focused
campaigns, rotating it out of the upper-funnel ad rotation, and coming up
with a new upper-funnel creative to test for the purposes of building large
audience pools.

> This is especially troublesome when you can't tell how many friends they
> invite through word of mouth.

Could you provide an incentive (gold/credit/whatever) for each new active
user referred, which could feed into the evaluation criteria of the test as
the "value" a particular conversion brought with it? Then I suppose it turns
more into an LTV study.
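The attribution idea here could be sketched as: credit each converted user
with their own value plus the value of the active users they referred, minus
the incentive paid out (the function name and dollar figures are hypothetical
illustrations, not anyone's actual model):

```python
def conversion_value(direct_value, referred_values, referral_cost=0.0):
    """Total value a conversion brought to its test arm: the user's own
    value, plus the value of users they referred, minus the referral
    incentive paid per referred user."""
    return (direct_value
            + sum(referred_values)
            - referral_cost * len(referred_values))

# A user worth $10 who referred two active users worth $8 and $12,
# with a $2 credit paid per referral:
value = conversion_value(10.0, [8.0, 12.0], referral_cost=2.0)  # 26.0
```

Summing this per arm instead of raw conversion counts is what turns the
comparison into the LTV study the parent describes.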

------
FutureSpec
I think OkCupid must have fallen victim to getting bought out and watered
down. They've been removing features and making bizarre UI decisions for
months now, and people are not happy.

Many threads like this one:

[https://www.reddit.com/r/OkCupid/comments/75r2ay/seriously_o...](https://www.reddit.com/r/OkCupid/comments/75r2ay/seriously_okc_fuck_you/do8ewiz/)

~~~
slig
> I think OkCupid must have fallen victim to getting bought out and watered
> down

I'm betting that they run tests for everything and change in whatever
direction the results tell them. If people like "slide to the left/right"
and a simple UI with no walls of text, quizzes, etc., so be it.

------
carlsborg
The A/B testing example of the 500-word limit is a great example of what
Soros calls Reflexivity in his books on markets.

------
wyck
Here's an idea: don't trust your users, and use your intuition.

