

A/B Testing - jack7890
http://jackg.org/ab-testing/

======
kevinconroy
Excellent point. Blindly following A/B testing can lead to a good local
maximum, but odds are it's not the market maximum. Radical redesigns and new
business directions can lead to even higher (or lower) conversions.

In my experience, the trick is finding the balance between optimization and
strategy. Doing continuous testing along the way helps you find out what's
working and what's not.

~~~
reyan
It reminds me of Fisher's theorem. [1]

[1]
[http://en.wikipedia.org/wiki/Fishers_fundamental_theorem_of_...](http://en.wikipedia.org/wiki/Fishers_fundamental_theorem_of_natural_selection)

~~~
kevinconroy
Agreed.

Side note: HN removed the apostrophe from the Wikipedia URL, resulting in a
404. You can easily get to it via Google search link, though.

[https://www.google.com/search?q=Fishers+fundamental+theorem+...](https://www.google.com/search?q=Fishers+fundamental+theorem+of+natural+selection)

------
ssharp
I think the point "blindly following A/B tests is bad" is valid, but I don't
completely follow the path this article takes to get there.

For part one, regarding forecasting, you were A/B testing within the segment
you were targeting, whether you realized you were targeting that segment or
not. You saw improvements within this segment, but were further alienating
users in a blind spot. Strict, purely statistical A/B testing isn't a great
mechanism for testing out segmentation strategies, business models, etc. A/B
testing isn't a strategy to get you more oranges; it's a tactic to get more
juice from the oranges you already have.

For the second part, regarding the map, I'm still not sure if this is a good
candidate for A/B testing, or if the implementation of the A/B test really
isolated enough variables to reach a business conclusion. I suspect that may
be difficult. Critical portions of your user interface are most likely another
area where A/B testing is a bad idea.

------
austenallred
Well done; A/B testing is valuable, but under a very specific set of
circumstances. The true purpose of A/B testing is to answer, "Does my
conversion go up or down if I make that button green?" If it goes up, you keep
it. You put in some variables, and A/B testing can tell you which variables
will produce the best result. But it's important not to forget that the
results of these tests were (intentionally) limited to certain variables.

The reason being a founder is so hard, and the reason your intuition as a
founder is so valuable, is that answers which require finding and solving
pains can't (yet) be found mathematically. To use a very tired analogy, A/B
testing will tell you people want a faster horse instead of a car.

~~~
yummyfajitas
_But it's important not to forget that the results of these tests were
(intentionally) limited to certain variables._

I recently made the argument that you should cook up a single metric which
captures all variables of interest, and use it everywhere. Put all variables
of interest into it, and carefully define what tradeoffs you are willing to
make.

[http://www.chrisstucchio.com/blog/2013/metrics_manifesto.htm...](http://www.chrisstucchio.com/blog/2013/metrics_manifesto.html)

As for the tired analogy, founders intuition gives you the idea of cars. A/B
testing can still demonstrate that cars are superior.

~~~
darkxanthos
Since you're proposing a calculated metric, it's advisable to just log all
three and do the calculation afterwards. That way, if the business later
decides that number is silly, it can be changed and your results recalculated.
Also trying to come up with a simple formula that relates different units of
business value can be VERY difficult. Any literature on perspectives around
that?
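
A minimal sketch of this log-raw-then-score approach (the event fields,
weights, and numbers are all hypothetical): store the raw counts, and make the
composite metric a plain function of them, so the tradeoffs can be
renegotiated later and history re-scored.

```python
# Log raw events as-is; compute the composite metric afterwards, so the
# formula can change and past results can be re-scored.
events = [
    {"signups": 3, "revenue": 120.0, "support_tickets": 1},
    {"signups": 5, "revenue": 80.0, "support_tickets": 0},
]

def score(event, weights):
    """Composite business-value metric: a weighted sum over the raw logs.
    The weights are where you encode the tradeoffs you're willing to make."""
    return sum(weights[key] * event[key] for key in weights)

# Hypothetical tradeoffs: a signup is worth $10, a support ticket costs $5.
weights = {"signups": 10.0, "revenue": 1.0, "support_tickets": -5.0}
print([score(e, weights) for e in events])  # [145.0, 130.0]
```

If the business later decides support tickets should count double, only
`weights` changes; the logged events never have to be re-collected.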

~~~
yummyfajitas
_Also trying to come up with a simple formula that relates different units of
business value can be VERY difficult._

Deciding what you actually want and what tradeoffs you are willing to make can
be hard. Then again, if you don't know what you want you are unlikely to
achieve it.

------
kapilkale
I suspect many startups follow this pattern: 1. Launch with flawed hypothesis.
2. Confused or disappointed users say what they really want. 3. Pivot that
leads to product/market fit (in SeatGeek's case, launching stadium maps,
Columbus, etc).

We went through that at GiftRocket. We launched with the premise of geo-
located gift cards. Of those who tried, many had bad experiences. But they
gave us enough direction to kill the GPS component and turn GR into a well-
packaged online way to send money as a gift.

Funny enough, PB told us to do this before we launched (the same way PG
identified the issue for SeatGeek).

------
JumpCrisscross
Consumers don't want to think about fluctuating prices. Especially if it means
there is a chance they bought too early. They want a price today that's better
than the market.

Scenario 1: The Broker

I want to see Swedish House Mafia but don't want to pay more than $400.
Inversely, I bought Beyoncé tickets, sobered up, remembered that I hate
Beyoncé, and would like to get at least $300. Your give the customer a
probability of the ticket being sold. Perhaps also the option of resetting the
price if the probability falls below X.

Scenario 2: The Dealer

Consumers buy and sell a small number of tickets. If you bought (sold) tickets
when your model said they were likely to go up (down) by a margin in excess of
your error, you could diversify across multiple event types, venues, artists,
and dates. You would be exposed to model (transformed basis) risk and would
need to finance inventory.

Scenario 3: The Market Maker

Derivatives! :D The simplest way to "do arbitrage without ever holding the
tickets" would be to sell tickets as forwards. To illustrate, let's suppose
prices are at $500 and you believe they will fall to $200. You sell a "ticket"
at $250 to a customer and receive those funds today. The night before the
event you buy the ticket at $200 and deliver it to the customer, pocketing the
$50 plus interest as your spread. You're still exposed to model risk, but with
the benefit of float, i.e. cash today for a deliverable tomorrow. Bonus: easy
shorting.
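
A back-of-the-envelope sketch of the forward's economics (the $250 sale and
$200 expected cost come from the illustration above; the 60-day holding period
and the daily interest rate are made-up assumptions):

```python
def forward_spread(sale_price, expected_cost, days_held, daily_rate=0.0001):
    """Sell a ticket forward today, buy the actual ticket the night before
    the event; the spread is the sale proceeds, grown by interest on the
    float, minus the eventual purchase cost."""
    proceeds_with_float = sale_price * (1 + daily_rate) ** days_held
    return proceeds_with_float - expected_cost

spread = forward_spread(sale_price=250, expected_cost=200, days_held=60)
print(round(spread, 2))  # roughly the $50 spread plus interest on the float
```

The float is the point: the $250 arrives today, while the $200 cost is only
paid the night before the event, so the seller earns interest on the full sale
price in between. The model risk is that the night-before price ends up above
$250.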

------
JoelMarsh
A/B testing does not tell you what to do. It measures what you have done. HUGE
difference, but a difference that very few people actually seem to understand.

The last paragraph of this article made me shake my head though. If you do A/B
tests and then use the losing version anyway - because it feels better in your
gut - you're an idiot.

Choose a different path if you want, but if done properly, your A/B results
are fact (among the options you tested).

~~~
tgrass
The author wrote that they have a vision of what the seat-searching experience
_should_ be like. A/B testing lets us test what is. It doesn't test for what
should be.
should be.

If the goal is to force the evolution of a trend/habit/market, doesn't one
have to go beyond split testing, and venture into the risky unknown?

~~~
JoelMarsh
From the article: "A few months ago we tested a new design for our map UI. The
new UI converted a bit worse. But we preferred it, so we kept it anyway."

They A/B tested and then went with the LOSING version, because it felt better
in their gut.

Go for something radically different if you want, no problems there, but don't
test if you're going to choose your favorite even when it loses. That's just
dumb.

~~~
aurelianito
Why?

They tested and it converted "a bit" worse. I am pretty sure that if it
converted "a lot" worse, they would have not switched.

So, the test gave them useful information.

------
wavesarewet
I think most folks building websites _should_ blindly follow A/B test results.
There are no doubt situations where it's best not to, but introducing that
possibility means there are too many opportunities to make judgement mistakes.

~~~
mjn
Blindly following A/B testing with no design judgment is an interesting idea,
but difficult to truly do. At its extreme, it's just an interactive genetic
algorithm. You start with a blank HTML page, and then you design it _only_ by
A/B testing, not by any human judgment. You generate elements through some
kind of statistical process (maybe Markov chains trained on existing
webpages), then you make every decision of what to include or exclude based on
A/B testing.
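
Taken literally, that extreme might look like the following greedy sketch,
where each candidate element is kept only if its variant "wins the A/B test"
(the element names, effect sizes, and noise model are all invented; a real
version would run actual experiments at each step):

```python
import random

random.seed(1)
ELEMENTS = ["headline", "hero_image", "signup_form", "testimonial", "video"]

def simulated_conversion(page):
    """Stand-in for a real A/B test: each element has a hidden 'true' lift,
    observed with some measurement noise."""
    lift = {"headline": 0.02, "hero_image": 0.01, "signup_form": 0.05,
            "testimonial": 0.005, "video": -0.01}
    return sum(lift[e] for e in page) + random.gauss(0, 0.002)

page = []  # start from a blank page
for element in ELEMENTS:
    variant = page + [element]
    # The "A/B test": keep the element only if the variant converts better.
    if simulated_conversion(variant) > simulated_conversion(page):
        page = variant
print(page)
```

Even this toy version makes the difficulty visible: the search is greedy,
order-dependent, and at the mercy of measurement noise, which is why fully
judgment-free design stays a thought experiment.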

------
luigi
If you go through the trouble of setting up an A/B test and determining a
statistically significant result, you're not _blindly following_ it when you
implement the winning variant.

It's totally cool to disregard the result an A/B test gives you. But don't
justify that decision by claiming that following the results of a well-run A/B
test is somehow blind.

------
msencenb
Remember that A/B testing is a tool for refining the local maximum of a
specific design. It works wonders when you are at scale (with some sense of
product/market fit) but is not a cure-all at a small startup.

Going out and talking to 15 of your target customers in person will get you
more learning at an early stage than a month-long A/B test.

------
damoncali
A/B testing is for people who already know what they're doing, not for people
trying to figure out what to do.

------
cyphersanctus
Notice how nobody was interested in whether Marissa Mayer had any feedback for
SeatGeek on the video of their pitch. I suspect that question session would
work out quite differently today.

------
morganb180
Blindly following anything is always a bad idea.

~~~
narag
Like blindly following that rule?

------
jnarong
Perhaps it was just that the correct, holistic metrics were not used to judge
the A/B test?

~~~
btilly
It could be that the A/B test results were improperly stopped too early to be
meaningful. (Hint: stopping when you hit 95% is not enough.) It could be, as
they think, that the group that they were testing is not the group that they
needed to target for future growth. It could be that there are familiarity
effects that threw off the test. It could be that they were using the wrong
metrics for what they needed. It could be that to get at what they want,
they'd need to carefully segment their test in ways that they didn't know how
to do.

There are a lot of ways to not get the answers you were looking for out of
your A/B tests.
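
The early-stopping point deserves emphasis. A quick simulation (all parameters
arbitrary) of an A/A test, where both arms are identical, shows how often
repeatedly peeking at a 95% threshold declares a spurious "winner":

```python
import math
import random

def any_peek_significant(p, n_per_peek, peeks, rng):
    """Simulate an A/A test (both arms convert at the same rate p) and
    return True if ANY intermediate z-test check looks 'significant'."""
    a = b = n = 0
    for _ in range(peeks):
        for _ in range(n_per_peek):
            a += rng.random() < p
            b += rng.random() < p
        n += n_per_peek
        pooled = (a + b) / (2 * n)
        se = math.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(a - b) / n / se > 1.96:  # the naive 95% cutoff
            return True
    return False

rng = random.Random(0)
trials = 400
hits = sum(any_peek_significant(0.05, 500, 10, rng) for _ in range(trials))
rate = hits / trials
print(rate)  # far above the nominal 5% false-positive rate
```

Since there is no real difference between the arms, every "significant"
result here is a false positive; checking ten times and stopping at the first
95% reading multiplies the error rate several-fold.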

Of course it is also possible that they don't realize that their growth would
have been better had they paid attention to the A/B test. We have a tendency
towards confirmation bias - whatever decision we make, we often will look for
ways to convince ourselves that the decision we made is correct.

We're not given enough information to distinguish between these possibilities.

------
guiomie
I didn't finish the article, but the video linked in it was really good.

------
suyash
Thanks for exposing PG's and Jack's email to the whole world.

~~~
jack7890
Good catch, thanks. I just updated the images with the emails removed.

------
rorrr
I watched your pitch video, and it was indeed painful. You didn't have the
answer to an important question, and you still don't. "We don't want to be a
hedge fund" is not an answer. If your prediction software works, you are
sitting on a gold mine, and you DON'T need a huge amount of cash.

If ticket prices fluctuate 40%, and you're right 80% of the time, and half the
time the prices are going up, your potential gain is

    
    
        0.4 * 0.8 * 0.5 = 0.16 = 16%
    

Which is twice as good as your fee margin of 7%.

EDIT: And what's more important, your volume is not limited by the percentage
of your site visitors who make purchases, you are limited by your money, which
will grow exponentially, more or less. 32% average profit (minus fees) on each
buy/sell is fucking HUGE, it's unbelievable you're not doing it.

~~~
batiudrami
I don't know if this would have an important effect on the business model (or
if it's part of the reasoning of 'not wanting to be a hedge fund'), but being
in that position is going to give you a very hostile user base.

As a music fan, I can't stand scalpers. I understand the principles of supply
and demand, but scalpers subvert this because you don't necessarily have to
resell all the tickets you purchase to be profitable - just enough that your
margin covers the unsold ones. The only people losing are the fans, who can't
get affordable tickets, even when the venue is half full.

I had no idea how bad the scalping situation was until I tried to buy tickets
to a show in Chicago (in Australia, where I live, it's a problem, but I've
never been in a situation where I couldn't get tickets by buying them on
release). I stayed up late (in my time zone) to get tickets, and missed out
because the website wasn't dealing well with the load. The next morning there
were 400+ tickets for sale on StubHub. For a 1300-person venue. Even if 100%
of scalped tickets went on sale, on one website, within 10 hours of tickets
going on sale, that's still a 30% scalping rate.

There are a number of solutions, ranging from easy (printing the buyer's name
and date of birth on the ticket, matched against ID at the gate) to difficult
(personally, I like the idea of taking 'bids' for tickets, i.e. the maximum
someone is willing to pay, then at a cutoff date taking the nth person in
line's bid, for an n-capacity venue, and issuing all tickets at that price.
There's minimal risk for promoters, since they can cancel if they can't manage
a reasonable margin, and no one overpays for tickets).
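
The "nth person in line's bid" idea is a uniform-price auction: rank the bids,
sell to the top n bidders, and charge everyone the nth-highest bid. A toy
sketch (all numbers invented):

```python
def clearing_price(bids, capacity):
    """Uniform-price auction: the top `capacity` bidders win, and everyone
    pays the lowest winning bid (the nth person in line's bid)."""
    ranked = sorted(bids, reverse=True)
    winners = ranked[:capacity]
    price = winners[-1]  # the capacity-th highest bid sets the price
    return price, winners

price, winners = clearing_price([120, 90, 80, 75, 60, 40], capacity=4)
print(price)  # 75 -- all four winners pay the 4th-highest bid
```

One nice property: winners generally pay less than they bid, which softens the
incentive to underbid that's mentioned below.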

Anyway, I digress. My point is, I think potential sellers would see this
service as ripping them off and would be hesitant to list tickets unless they
weren't confident they could resell them elsewhere, and buyers would resent it
because you're essentially taking a large margin for almost no service. If the
algorithm works, there's minimal risk for the company; you'd just be taking
free cash. As a buyer, I could see myself actively avoiding this
(hypothetical) service where possible because of that, and sellers avoiding it
too, since they'd perceive an offer from you as a sign that the value is about
to go up and they shouldn't sell now.

As an aside, when I was in the US I got a ticket to see the SF Giants for
about $10, for great seats, on SeatGeek. I was super happy with the experience
(and with the people next to me, who so generously offered rules
clarifications).

~~~
DoubleCluster
It's really just market making. If the venue sells the tickets for much less
than people are willing to pay it's very reasonable that people will buy them
to resell them. Want to stop resale? Price the tickets closer to the real
value.

~~~
batiudrami
That was why I suggested asking for the maximum people will pay and charging
whatever rate fills the venue (though I suspect people would be likely to
underbid compared to being told the ticket price; on the other hand, they know
that if they don't bid enough they're guaranteed to miss out on tickets, which
might counterbalance it). It would certainly sort out the glut of music
festivals with diluted lineups that Australia is seeing.

But anyway, take it from me, music fans are irrational types who won't see it
that way. I've encountered significant resistance to my suggestions for
dealing with the scalping problem. There's an idea that the 'real fans' should
be able to get tickets at a price they can afford, for whatever definition of
'real' happens to include the person speaking.

Plus, sometimes tickets can be sold at market cost, but by taking advantage of
the upper end of the market and by limiting supply, scalpers can still be
profitable even when only selling a small proportion of the tickets they
purchased.

