
Optimizely (YC W10) Increases Homepage Conversion Rate by 29% - dsiroker
http://blog.optimizely.com/optimizelys-new-results-page-helps-you-determ
======
Eliezer
Dear Optimizely: Your statistics do not tell you that B has a 95% chance of
being better than A. Your statistics tell you that an excess of B over A as
large as the one you observed has less than a 5% probability of arising,
assuming B and A are actually equally effective.

A Bayesian would understand this in terms of prior probabilities and
likelihood ratios, but to put it into nontechnical terms, suppose that you
tried out 15 different alterations and none of them seem to work. Then on the
16th, your detector goes off and says, "Less than 5% probability of these
results arising by chance!" Do you conclude that it's 95% likely that this
version is genuinely better? No, because the first 15 failed attempts told you
that improving this webpage is actually pretty hard (the prior probability of
an effective improvement is low), and now when you see that the 16th attempt
has a result with a less than 5% probability of arising from chance, you
figure "Eh, it's worth testing further, but probably it _is_ just chance."
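To put numbers on that intuition, here's a quick sketch of Bayes's Theorem in Python. The prior, the test's power, and the false-positive rate are all illustrative assumptions, not Optimizely's figures:

```python
# Illustrative numbers only: prior, power, and alpha are assumptions.
prior = 1 / 16   # 15 failed attempts suggest genuine improvements are rare
power = 0.5      # assumed P(test fires | variant really is better)
alpha = 0.05     # P(test fires | no real difference), the 5% threshold

# Bayes's Theorem: P(real improvement | test fired)
posterior = (prior * power) / (prior * power + (1 - prior) * alpha)
# posterior comes out to about 0.40 -- nowhere near the 0.95
# the dashboard's "95% chance" language suggests
```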

Another extremely important point is that the classical statistics you learned
to use to decide that something was <5% likely to arise by chance, only apply
if you decided in advance to do exactly that many trials and then stop. Your
chance of finding, on _some_ trial, that your running total of results is
"statistically significant", when A and B are actually identically effective,
is _considerably greater_ than 5%. See
[http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/](http://lesswrong.com/lw/1gc/frequentist_statistics_are_frequently_subjective/)
- a trial I ran with 500 fair coinflips had at least one step where the
cumulative data "rejected the null hypothesis with p < 0.05" 30% of the time.
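That inflation is easy to reproduce. A rough simulation in Python (the minimum sample size of 30 and the normal-approximation z-test are my own choices, not necessarily the exact setup of the linked post):

```python
import math
import random

# The "peeking" problem: run many experiments of 500 fair coin flips,
# test the cumulative total at every step, and count how often we ever
# see "p < 0.05" even though the coin is fair.
rng = random.Random(0)
runs, flips, z_crit = 1000, 500, 1.96  # |z| > 1.96 ~ two-sided p < 0.05

false_positives = 0
for _ in range(runs):
    heads = 0
    for n in range(1, flips + 1):
        heads += rng.random() < 0.5
        if n >= 30:  # only start testing once there's a minimal sample
            z = (heads - n / 2) / math.sqrt(n / 4)
            if abs(z) > z_crit:
                false_positives += 1
                break

rate = false_positives / runs
# rate comes out far above the nominal 5% error rate
```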

You're not really to blame for this mistake, because the horrid non-Bayesian
classical statistics taught in college are just about _impossible_ to
understand clearly; but it does sound to me like someone at your org needs to
study (a) Bayes's Theorem (b) the case for reporting likelihood ratios rather
than p-values (likelihood ratios are objective, p-values decidedly not) and
(c) the beta distribution conjugate prior (which would make progress toward
having priors and likelihood ratios over "These two pages have a single
unknown conversion rate" or "These two pages have different unknown conversion
rates"). Or in simpler terms, "Someone at your company needs to study Bayesian
statistics, stat."
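For the curious, point (c) can be sketched in a few lines. This is an illustrative Beta-Binomial Monte Carlo under uniform Beta(1,1) priors, with made-up sample numbers; it is not Optimizely's method:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1,1) priors.

    The Beta distribution is conjugate to the Binomial, so after seeing
    `conv` conversions in `n` trials the posterior is Beta(1+conv, 1+n-conv).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws
```

Called as, say, `prob_b_beats_a(89, 1000, 115, 1000)`, this returns an actual posterior probability that B beats A, which is the quantity the "95% chance of being better" language implies.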

~~~
meric
The p-value represents "Probability this is all luck". So why isn't 1 - p
"Probability this isn't all luck"?

>> Another extremely important point is that the classical statistics you learned to use to decide that something was <5% likely to arise by chance, only apply if you decided in advance to do exactly that many trials and then stop.

I agree. Not doing that is, I'd say, "fudging the numbers". I can't find
where in their article they did this, though.

~~~
jharsman
A better interpretation might be "probability this would happen if results
were governed purely by chance". Note that the distinction is important if the
process is in fact governed by chance!

Let's say I roll a die two times and get a six both times. The probability of
this happening is 1/36, or about 3%.

Would you say I have established with 95% confidence that the particular die
I'm using always rolls six? No, because you have good reason to believe that
the results are in fact random. Or in other words, you have a strong prior
belief that the hypothesis you're testing is false.
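In Bayesian terms, with a made-up prior for illustration (say 1 in 1000 dice is rigged to always roll six):

```python
# Made-up prior: 1 in 1000 dice is rigged to always roll six.
prior_rigged = 0.001
p_two_sixes_if_rigged = 1.0
p_two_sixes_if_fair = (1 / 6) ** 2   # 1/36, about 3%

# Bayes's Theorem: P(rigged | two sixes observed)
posterior_rigged = (prior_rigged * p_two_sixes_if_rigged) / (
    prior_rigged * p_two_sixes_if_rigged
    + (1 - prior_rigged) * p_two_sixes_if_fair
)
# comes out around 0.035: even after two sixes, the die is
# almost certainly fair, because the prior dominates
```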

~~~
meric
>> Or in other words, you have a strong prior belief that the hypothesis
you're testing is false.

Yes, that is true for the dice roll; you already know most dice aren't rigged.

I don't see how this affects the results of Optimizely's product; you do not
have a strong prior belief about whether the hypothesis is true or false.

Also note that the number of observations used in the article was in the
thousands.

------
AJ007
They increased the number of people submitting their URL; there is no telling
whether that actually resulted in more leads for them.

What I have found is that a simple landing page, one that tells the user
exactly what you are providing and is free of any confusion, works best over
the long term.

I've run hundreds of thousands of website visitors through Google Website
Optimizer in multi-variate tests and what I've found is that over time there
is little to no difference in conversion rate for minor landing page changes.
The biggest jumps come from eliminating content in the design and clarifying
the message.

Looking at the small amount of users they sent to this landing page, I would
call the results inconclusive. You can ramble off statistics to me all day
long, but you can't change the fact that humans don't behave with the
predictability that coin flips and physics do. (It's really chilling when you
see how many drugs the FDA has approved over tiny margins of change/success.)

~~~
gnaritas
> Looking at the small amount of users they sent to this landing page, I would
> call the results inconclusive

The statistics say otherwise. A 29% bump with a 1% margin of error is not
inconclusive; it's virtually the very definition of a conclusive result.

> You can ramble off statistics to me all day long, but you can't change the
> fact that humans don't behave with the predictability that coin flips and
> physics do.

And you can rattle off personal anecdotes like this all day long; the
statistics, with their stated margins of error and accuracy, are still more
correct than you are. The statistics are more correct than your intuition.

~~~
Silhouette
> A 29% bump with a 1% margin of error is not inconclusive; it's virtually the
> very definition of a conclusive result.

Well, no, it's a set of numbers with percentage signs after them. Perhaps the
documentation for Optimizely specifies how their error margins etc. are
derived, but nothing in the linked article does as far as I can see. Without
knowing that underlying reasoning, all those pretty graphs and percentages are
just a load of gobbledegook, apart from the original data points and the
percentage increase figures derived directly from dividing them.

~~~
gnaritas
What a lazy, ignorant statement. If you want to see how they crunch the
numbers, go look.

~~~
Silhouette
I did. A Google search for

"error bars" site:optimizely.com

turns up exactly three hits. One of them is the blog post we're talking about.
The others are discussions on the Optimizely support pages from December 2010
and January 2011, which are similarly statistically waffly. The older one
promises a further clarifying post that never seems to have been written.

If you have found other sources where the Optimizely site publicly describes
their statistical methodology, please share them. I think several people
following this discussion would be interested.

Otherwise, I stand by my earlier comments.

~~~
gnaritas
So ask them; they aren't stupid, and they wouldn't be building a business
based on A/B testing without using valid methods of testing and displaying
results. Calling the results crap because you don't have the perfect details
of everything is simply absurd.

~~~
Silhouette
I didn't call the results "crap". I am simply pointing out that they are
meaningless without knowing the methodology behind them. (And we aren't just
missing the "perfect details of everything" here. As far as I can see, we have
no rigorous details whatsoever.)

I would remind you that _you_ were the person who was attacking another
poster's position based on your interpretation of those currently meaningless
numbers. It's up to you to back up your claim, not up to the rest of us to
figure out whether your argument has any merit.

~~~
gnaritas
You called them gobbledegook; same difference.

> I am simply pointing out that they are meaningless without knowing the
> methodology behind them.

Only if you assume incompetence or malice on the part of Optimizely, neither
of which you have any valid reason to do. It's perfectly reasonable to assume
they aren't stupid and the results are valid.

> It's up to you to back up your claim, not up to the rest of us to figure out
> whether your argument has any merit.

Um, my claim is don't assume they're idiots; that doesn't require me to back
anything up.

The poster I replied to wasn't attacking them; he was attacking statistics in
general, which is what I was replying to.

Your response was to imply that Optimizely doesn't know what they're doing and
therefore their results are invalid until you see how they're crunching the
data; that's simply absurd.

------
destraynor
As always you need to be wary of how these results are reported.

AJ already pointed out that they're not measuring "conversions" in the sense
of converting to paying customers, but converting in the sense of "entering a
URL in a field".

The 29% increase is always misunderstood (by clients at least).

The original page had a conversion rate of 8.9%.

The page they ended up with had a conversion rate of 11.5%.

The change is that an additional 2.6 percentage points of visitors are now
entering their URLs in a field and clicking a button.
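The relative-versus-absolute arithmetic, using the rates from the post:

```python
baseline, variant = 0.089, 0.115  # conversion rates from the post

relative_lift = variant / baseline - 1  # ~0.29: the headline "29% increase"
absolute_lift = variant - baseline      # 0.026: 2.6 percentage points
```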

------
petercooper
This is sorta related but since I concluded the test today and this post is
here, I thought I'd share.

I ran a 5-way split test for 9 days on my newsletter's signup page (JavaScript
Weekly). My original page was the 2nd best performing, but an identical page
just _without_ the subscriber count got an 8% higher conversion rate (or about
20% more signups in all) with 90% confidence at the end of testing.

The worst performer? A signup page with no screenshot preview of the
newsletter. It sent conversions from about 37% down to a mere 3% (!!). Lesson
learned? Always have visuals or screenshots on pages where you're trying to
get people to sign up for things they aren't sure about.

~~~
StavrosK
Were you looking at the statistics all the time to see when they passed 90%
confidence? That invalidates them, just so you know.

Basically, if you're watching it, it won't science.

~~~
petercooper
I'm not sure how tongue-in-cheek that was ;-) but 90% confidence was not a
_goal_, at least. I just got bored after a week and wanted to move on. C'est
la vie.

------
espadagroup
Does anyone know of a central repository for all of these little landing page
optimization tweaks? I know each site is different and you just have to test
to know for certain, but there should be some generally vetted decisions that
would make a good boilerplate.

~~~
kristofferR
<http://www.abtests.com/> is one

------
sgrove
We've been using Optimizely more and more extensively (we have a slightly
unusual use case for it), and it's been _fantastic_ at successfully ratcheting
up conversions. We use it in concert with Mixpanel when we need to push people
along a funnel.

The tool's insanely easy to implement, a joy to use, and I get to rely on them
to tell me when something is statistically meaningful.

~~~
walrus
What is your slightly unusual use case?

~~~
sgrove
We actually deploy it across a number of client apps, rather than just our own
site. We use it to run a dozen or so tests at any given time across multiple
apps, and to quickly iterate on others' products.

It's an interesting combination with mixpanel. If we could get one more thing
from Mixpanel, we'd really be set - I'm bugging them, we'll see if they come
out with it!

------
joeyespo
It would be very cool if Optimizely could gather the A/B test results across
all customers, run some statistical analysis, and publish which optimizations
are significant. Sort of like an OKTrends for websites.

That would be a great resource for initial usability and design decisions,
which could afterwards be tweaked and further optimized by their product.

~~~
dsiroker
Great idea! We'll do this.

~~~
gnaritas
That would rock.

------
pkamb
That "Enter Your website URL" field is really annoying. The easiest way to
change the default "<http://www.example.com>" to your website's URL would
normally be to double-click the "example" and type your address, leaving the
boilerplate "<http://www.>" and ".com" and such.

But you can't do that due to the fancy JavaScript and everything. You have to
type the whole thing yourself.

If the field _cleared_ when you gave it focus, it wouldn't really matter and
I'd just type in the URL myself. But the text remains, in the background,
taunting you. It even appears to highlight the "example" part if you double-
click it.

~~~
dsiroker
Sorry about that. The URL will fade out slightly when you focus on the textbox
but I agree it is better to just clear it completely. We'll make that change.

~~~
ceejayoz
Why clear it? Why not leave <http://> in there?

Might be worth A/B testing it. :-p

------
kristofferR
How does Optimizely compare with Visual Website Optimizer?

~~~
MikeX1555
I spent the last month using VWO, and despite the website needing a slight Web
2.0 visual update, it has been a good experience.

I'm now getting started with Optimizely, and the biggest thing I miss is video
demos. I also liked that VWO had a much smaller JS file, with the option to
remove jQuery. Fix this, Optimizely: I don't need to load jQuery twice!

~~~
Pickhardt
Hey Mike -

Good point about the video demos.

If you have your own version of jQuery running on your site, you can have
Optimizely exclude jQuery from the project bundle. Check out this for more
info:

[http://support.optimizely.com/kb/advanced/does-optimizely-conflict-with-existing-javascript-libraries-like-jquery-prototype-or-mootools](http://support.optimizely.com/kb/advanced/does-optimizely-conflict-with-existing-javascript-libraries-like-jquery-prototype-or-mootools)

If you have any other questions, you can contact us through support.

- Jeff (with Optimizely)

------
201studio
I'd be interested to see some testing around using a '!' in conversion button
text as opposed to a period.

------
JackWebbHeller
I got a lifetime free Optimizely account as part of an AppSumo deal. I can
safely say it's one of the simplest yet most powerful and effective web
products I've ever used.

The ease with which you can make changes is astonishing, and there are no
limits if you know a little jQuery.

------
garindra
Kinda off topic, but I tried it out on top websites that use long-polling
(like Quora), and it always fails. I guess the app waits for all the resources
to load completely, which for long-polling websites happens rather long after
the DOMready event.

------
usagi7
I wish I could apply Optimizely A/B tests to everything I do. Like measuring
the best way to word movie choices so that the one you secretly want gets
chosen (conversion)!

Anyway, interesting statistics. A larger sample size, I think, would
definitely make the results more dramatic.

------
rokhayakebe
How do you know an A/B test works if you aren't testing with the same user?

~~~
Raykhenberg
A/B testing works by randomly distributing your traffic. We can't assume that
every person is the same, but by splitting the traffic randomly we can assume
that the two groups are similar.
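A minimal sketch of that kind of sticky random bucketing in Python (my own illustration; Optimizely's actual assignment logic may differ):

```python
import random

def assign_variant(visitor_id, variants=("A", "B"), salt="exp-1"):
    # Seeding with the visitor id makes assignment sticky: the same
    # visitor always lands in the same bucket, while the population
    # as a whole is split roughly evenly at random.
    return random.Random(f"{salt}:{visitor_id}").choice(variants)
```

Sticky assignment matters because a returning visitor who bounced between A and B would contaminate both groups.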

For a little bit more information on how our particular flavor of AB testing
works check out <http://optimizely.appspot.com/works>

