
How Optimizely Almost Got Me Fired (2014) - yanowitz
http://blog.sumall.com/journal/optimizely-got-me-fired.html
======
iamleppert
A/B testing is high on hype and promise but low on actual results if you
follow it through to actual metrics. I've done various forms of A/B testing
throughout most of my career and found them to be consistent with the OP's
results.

A much better approach is to install significant instrumentation and actually
talk to users about what's wrong with your sign up form.

That, or actually build a product that users want instead of chasing after
pointless metrics. I mean, really, you think changing the color of text or a
call-out is going to make up for huge deficiencies in your product or make
people buy it? The entire premise seems illogical and just doesn't work. The
only time I've seen a/b tests truly help was when it accidentally fixed some
cross browser issue or moved a button within reach of a user.

Most of the A/B website optimization industry is an elaborate scam, foisted on
people who don't know any better and are looking for a magic bullet.

~~~
Aleman360
> A much better approach is to install significant instrumentation and
> actually talk to users about what's wrong with your sign up form.

But but but telemetry is evil!

~~~
ross-life
Telemetry in a specific product? Fine.

Telemetry in my OS* that has access to _everything_? No.

* That I cannot turn off.

~~~
quanticle
What if the OS is the product?

~~~
ross-life
An OS is a product of course, but to me it's on a completely different level
to "just" an application product. It has access to everything, I want to be
able to trust it and know exactly what it's doing. Telemetry I can't turn off
ruins that. Having it is fine, just let me turn it off...

I like Windows and really like VS; I was often literally the only (non-VM)
Windows user in a sea of OS X at several offices. But Windows 10, the flip-
flopping UI, the ads, and the telemetry pushed me to Linux and building my own
desktops.

------
feral
[I'm a PM @ Optimizely]

We were asked about this article before, on our community forums, and one of
our statisticians, David, wrote a detailed reply to this article's concerns
about one- vs two-tailed testing, which might be of interest [3rd from the
top]:

[https://community.optimizely.com/t5/Strategy-Culture/Let-
s-t...](https://community.optimizely.com/t5/Strategy-Culture/Let-s-talk-about-
Single-Tailed-vs-Double-Tailed/m-p/4220)

Additionally, since then, as other commenters have mentioned, we've completely
overhauled how we do our A/B-testing calculations, which, theoretically and
empirically, now have an accurate false-positive rate even when monitored
continuously. Details:

[https://blog.optimizely.com/2015/01/20/statistics-for-the-
in...](https://blog.optimizely.com/2015/01/20/statistics-for-the-internet-age-
the-story-behind-optimizelys-new-stats-engine/)

------
yummyfajitas
Disclaimer: I'm the director of data science at VWO, an Optimizely competitor.

In my view, the issue is not one-tail vs two-tail tests, or sequential vs one-
look tests at all. The issue is a failure to quantify uncertainty.

Optimizely (last time I looked), our old reports, and most other tools, all
give you improvement as a single number. Unfortunately that's BS. It's simply
a lie to say "Variation is 18% better than Control" unless you have Facebook
levels of traffic. An honest statement will quantify the uncertainty:
"Variation is between -4.5% and +36.4% better than Control".

When phrased this way, it's hardly surprising that deploying this variation
failed to achieve an 18% lift - 18% is just one possible value in a wide range
of possible values.

The big problem with this is that customers (particularly agencies who are
selling A/B test results to clients) hate it. If we were VC funded, we might
even have someone pushing us to tell customers the lie they want rather than
the truth they need.

Note that to provide uncertainty bounds like this, one needs to use a Bayesian
method (only us, AB Tasty and Qubit do this, unless I forgot about someone).

(Frequentist methods can provide confidence intervals, but these are NOT the
same thing. Unfortunately p-values and confidence intervals are completely
unsuitable for reporting to non-statisticians; they are misinterpreted by
almost 100% of laypeople.
[http://myweb.brooklyn.liu.edu/cortiz/PDF%20Files/Misinterpre...](http://myweb.brooklyn.liu.edu/cortiz/PDF%20Files/Misinterpretations%20of%20Significance.pdf)
[http://www.ejwagenmakers.com/inpress/HoekstraEtAlPBR.pdf](http://www.ejwagenmakers.com/inpress/HoekstraEtAlPBR.pdf)
)
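
For concreteness, here is a minimal sketch of how such an interval can be
computed with a Beta-Binomial model (illustrative counts, independent
Beta(1, 1) priors, Monte Carlo over the posteriors; not necessarily VWO's
exact model):

    import numpy as np
    
    rng = np.random.default_rng(0)
    
    # Hypothetical observed data (not from the article)
    control_n, control_conv = 4000, 640
    variant_n, variant_conv = 4000, 700
    
    # With a Beta(1, 1) prior, each rate's posterior is
    # Beta(1 + conversions, 1 + non-conversions); sample both posteriors
    control = rng.beta(1 + control_conv, 1 + control_n - control_conv, 100000)
    variant = rng.beta(1 + variant_conv, 1 + variant_n - variant_conv, 100000)
    
    # Posterior distribution of the relative lift of Variation over Control
    lift = (variant - control) / control
    lo, hi = np.percentile(lift, [2.5, 97.5])
    print("Variation is between {:+.1%} and {:+.1%} vs Control".format(lo, hi))

The point is the output: a wide interval rather than a single flattering
number.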

~~~
lifeisstillgood
Could you point to the math/protocol that goes from a sample set to a range
like "-4 to +34", and of course how one goes from a range to a single number?

I feel like the discussion on this thread is missing the underlying "source
code"

~~~
yummyfajitas
Here are slides on the topic and the full math paper. But we don't go from a
credible interval to a single number - we just report credible intervals. We
find that to be the only honest choice.

[https://www.chrisstucchio.com/pubs/slides/gilt_bayesian_ab_2...](https://www.chrisstucchio.com/pubs/slides/gilt_bayesian_ab_2015/slides.html)

[https://www.chrisstucchio.com/blog/2013/bayesian_analysis_co...](https://www.chrisstucchio.com/blog/2013/bayesian_analysis_conversion_rates.html)

[https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technic...](https://cdn2.hubspot.net/hubfs/310840/VWO_SmartStats_technical_whitepaper.pdf)

~~~
lifeisstillgood
thank you - bedside reading :-)

------
scuba_man_spiff
One thing I noticed that I haven't seen commented on yet:

The suggested solution of running a two-tailed test would not have solved the
problem of the false result the author demonstrated by conducting an A/A
test.

According to the image in the article: [http://blog.sumall.com/wp-
content/uploads/2014/06/optimizely...](http://blog.sumall.com/wp-
content/uploads/2014/06/optimizely-test.png)

The A/A test had:

    
    
        A1: Population: 3920  Conversions: 721
        A2: Population: 3999  Conversions: 623
    
        Z-Score: 3.3
        2-tailed test significance: 99.92%
    

Looks like the one-tailed vs. two-tailed test doesn't make a huge difference
in this case.
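
For reference, the z-score and significance can be reproduced from the raw
counts with a standard pooled two-proportion z-test (presumably close to
whatever calculator produced the numbers above):

    from math import erf, sqrt
    
    # A/A test numbers from the article's screenshot
    n1, c1 = 3920, 721   # cell A1
    n2, c2 = 3999, 623   # cell A2
    
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                   # ~3.3
    
    # Two-tailed "significance" = P(|Z| < z) for a standard normal Z
    print("z = {:.2f}, significance = {:.2%}".format(z, erf(abs(z) / sqrt(2))))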

So, maybe a larger sample size would have seen a reversion to the mean, but
given the size and high significance that would be unlikely (interesting
exercise to try different assumptions to calculate how unlikely, with the most
overly generous obviously just being the stated significance).

Yes, the test was only conducted over one day, but if it was the exact same
thing being served for both, that shouldn't matter.

If there was a reversion to the mean due to an early spike, we would expect to
see the % difference between the two cells narrow as the test kept running.
You can see in the chart that the % difference (relative gap between the
lines) stays about the same after 8pm on the 9th.

So if it's not the one-tailed test at fault, and it's not the short duration
of the test at fault, what is?

Don't know.

I have seen in the past that setup problems are incredibly easy to make with
a/b testing tools when implementing the tool on your site. In other tools I've
seen things like automated traffic from Akamai going only to the default
control, or subsets of traffic, such as returning visitors, being excluded
from some cells but not others.

Based on those results, I'd be suspicious of something in the tool setup being
amiss.

------
closed
> This usually happens because someone runs a one-tailed test that ends up
> being overpowered.

It always pains me a little when people doing research describe statistical
power as a type of curse. Overpowered? Should we reduce it? The risk isn't
having too much power; the risk is that someone will incorrectly interpret
their Null Hypothesis Significance Test (NHST). They need to shift their focus
to measuring something (and quantifying the uncertainty of their
measurements), rather than asking "how likely was this result given a null
hypothesis", whether that hypothesis is:

something is not greater than 0 (one-tail), or

something is not 0 (two-tail).

> You’ll often see statistical power conveyed as P90 or 90%. In other words,
> if there’s a 90% chance A is better than B, there’s a 10% chance B is better
> than A and you’ll actually get worse results.

This isn't necessarily true. A could be the same as B. Also, these tests are
being done from the frequentist perspective, so saying "there's X chance B is
better than A" is inappropriate, unless you're talking about the conclusions
of your significance test (e.g. 90% chance you correctly detect a difference
between them--a difference you assume is fixed to some true underlying value).
Overall, being aware that a one-tail test is taking the position that nothing
can happen in the other direction is useful, but a good next step is
understanding what NHST can and cannot say.

This isn't even a frequentist vs. Bayesian problem, since you could create
situations where a person felt a study was overpowered in either framework.

------
jacalata
Don't just run tests longer - run tests for a pre-defined amount of time
instead of "until you see a result you like".

~~~
scuba_man_spiff
Your comment hits the nail on the head here.

Standard statistical tests used in a/b testing are based on one check. If
someone is checking repeatedly on a test until they get a 'significant'
result, the chance of getting a false positive is many times the stated
significance.

Best practice - set a pre-defined end, plus one or two defined early check-in
points where you only make an early call if the result is overwhelmingly
significant or if the business has fallen off a cliff.
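
A small simulation makes the point. Run A/A tests (no true difference), peek
once a day at a nominal 95% level, and stop at the first 'significant'
result (all parameters below are arbitrary):

    import numpy as np
    
    rng = np.random.default_rng(1)
    sims, per_day, days, z_crit = 2000, 500, 30, 1.96
    false_pos = 0
    
    for _ in range(sims):
        # A/A test: both cells convert at the same true 10% rate
        a = rng.binomial(1, 0.10, per_day * days)
        b = rng.binomial(1, 0.10, per_day * days)
        for day in range(1, days + 1):
            n = day * per_day
            p1, p2 = a[:n].mean(), b[:n].mean()
            pooled = (p1 + p2) / 2
            se = np.sqrt(pooled * (1 - pooled) * 2 / n)
            if se > 0 and abs(p1 - p2) / se > z_crit:
                false_pos += 1       # declared a 'winner' in an A/A test
                break
    
    print("false positives with daily peeking: {:.1%}".format(false_pos / sims))

With settings like these the rate comes out at several times the nominal 5%.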

~~~
stdbrouw
That would be best practice if you insist on using null hypothesis
significance testing and only wanted to use classical frequentist statistics.
We really can do much better these days with multi-armed bandits, and by
focusing on effect sizes and credible intervals rather than a yes/no answer to
a hypothesis.

~~~
jacalata
Sounds like I need to update my stats knowledge! Do you happen to know of a
good place to start learning about today's state of the art?

------
elliptic
I agree with the main point of the article, but I'm somewhat disturbed by the
statistical errors and misconceptions.

> Few websites actually get enough traffic for their audiences to even out
> into a nice pretty bell curve. If you get less than a million visitors a
> month your audience won’t be identically distributed and, even then, it can
> be unlikely.

What is the author trying to say here? Has he thought hard about what it means
for "an audience" to be identically distributed?

> Likewise, the things that matter to you on your website, like order values,
> are not normally distributed

Why do they need to be?

> Statistical power is simply the likelihood that the difference you’ve
> detected during your experiment actually reflects a difference in the real
> world.

Simply googling the term would reveal this is incorrect.

------
hb42
> In most organizations, if someone wants to make a change to the website,
> they’ll want data to support that change.

So true and sad. In all the so-called data-driven groups I have worked for,
the tyranny of data makes metrics and numbers the justification for or counter
to anything, however they have been put together.

> The sad truth is that most people aren’t being rigorous about their A/B
> testing and, in fact, one could argue that they’re not A/B testing at all,
> they’re just confirming their own hypotheses.

The sad truth is that most people aren’t being rigorous about anything.

------
RA_Fisher
Here's a great article about how Optimizely gets it wrong:
[http://dataorigami.net/blogs/napkin-folding/17543303-the-
bin...](http://dataorigami.net/blogs/napkin-folding/17543303-the-binary-
problem-and-the-continuous-problem-in-a-b-testing)

There are _many_ offenders. I've yet to see a commercial tool that gets it
right.

Tragically, the revamp by Optimizely neglects the straightforward Bayesian
solution and uses a more fragile and complex sequential technique.

~~~
emcq
The article you mention names RichRelevance, but there are others who
implement Thompson Sampling or other forms of Bayesian Bandits, such as SigOpt
[0] and Dynamic Yield [1]. Various adtech companies also use it under the
hood.

[0] [https://sigopt.com/](https://sigopt.com/)

[1] [https://www.dynamicyield.com/](https://www.dynamicyield.com/)

~~~
yummyfajitas
Thompson Sampling is not a replacement for A/B tests. Unfortunately, the real
world violates the assumptions of Thompson sampling virtually all the time.

[https://www.chrisstucchio.com/blog/2015/dont_use_bandits.htm...](https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html)

Bandit algorithms do have some important use cases (optimizing yield from a
short-lived advertisement, e.g. a "Valentine's Day Sale"), but they are not
suitable for use as an A/B test replacement.

Also, I'd steer away from Dynamic Yield - I've found their descriptions of
their statistics to take dangerous (i.e. totally wrong) shortcuts. For
example, counting sessions instead of visitors as a way to avoid the delayed
reaction problem and increase sample size (as well as completely breaking the
IID assumption).
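
For readers unfamiliar with the term, a minimal Beta-Bernoulli Thompson
sampler looks something like the sketch below; note that it bakes in exactly
the assumptions (instant, IID feedback) at issue here:

    import numpy as np
    
    rng = np.random.default_rng(2)
    true_rates = [0.10, 0.12]        # unknown to the algorithm
    wins, losses = [0, 0], [0, 0]
    
    for _ in range(10000):
        # Draw one sample from each arm's Beta posterior, play the best draw
        draws = [rng.beta(1 + wins[i], 1 + losses[i]) for i in (0, 1)]
        arm = int(np.argmax(draws))
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    
    # Most traffic ends up on the better arm, at the cost of the assumptions
    # (and the inference guarantees) discussed above
    print("traffic per arm:", [wins[i] + losses[i] for i in (0, 1)])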

~~~
emcq
I love your posts, and completely agree that Bayesian Bandits are not
replacements for A/B tests.

To be fair though, the real-world issues like nonstationarity and delayed
feedback are also concerns for A/B tests (which you also bring up in your
great post), and you can tweak the Bayesian bandits to handle these cases
decently.

How does counting sessions instead of visitors avoid delayed feedback? I read
your post [0] but don't remember anything about that. Is it just that they say
that after a session is completed (which is somewhat nebulous to measure in
many cases), then you have all the data you need from the visit?

[0]
[https://www.chrisstucchio.com/blog/2015/no_free_samples.html](https://www.chrisstucchio.com/blog/2015/no_free_samples.html)

~~~
yummyfajitas
Absolutely agree you can tweak Thompson sampling to handle nonstationarity,
periodicity and delayed feedback. I even published the math for one variation
of nonstationarity:
[https://www.chrisstucchio.com/blog/2013/time_varying_convers...](https://www.chrisstucchio.com/blog/2013/time_varying_conversion_rates.html)

(I've also dealt with delayed reactions, but I've never published it, and
probably won't publish until I launch it.)

Dynamic yield has the delayed feedback problem because users might see a
variation in session 1 but convert in session 2 (days later). They "solve"
this by doing session-level tracking instead of visitor-level tracking - the
delayed feedback is now only 20 minutes (same session) instead of days.

The problem is that session A and session B are now correlated since they are
the same visitor. IID is now broken.

~~~
emcq
Makes sense, thanks for the explanation!

------
kylerush
Optimizely rolled out a huge update to the way it handles statistics called
Stats Engine last year. That update resolves the issues discussed in this
article. You can read more about Stats Engine here:
[https://www.optimizely.com/statistics/](https://www.optimizely.com/statistics/)

~~~
stdbrouw
Their Stats Engine does resolve the issues, but I have to laugh at their
marketing materials calling it 21st century statistics, because even the new
approach is comically behind the times. Looks like the poor schmucks
accidentally outsourced the project to some really old-school one trick pony
"let's throw some maximum likelihood theory at this" statisticians. I would've
hoped even die-hard frequentists would see the value of a multi-armed bandit
and the irrelevance of declaring a winner rather than quantifying the effect
of each variant and the uncertainty around it.

Cf. the technical paper:
[http://pages.optimizely.com/rs/optimizely/images/stats_engin...](http://pages.optimizely.com/rs/optimizely/images/stats_engine_technical_paper.pdf)

------
cwyers
> some testing vendors use one-tailed tests (Visual Website Optimizer,
> Optimizely)

> Most A/B testing tools recommend terminating tests as soon as they show
> significance, even though that significance may very well be due to short-
> term bias. A little green indicator will pop up, as it does in Optimizely,
> and the marketer will turn the test off.

People pay brisk money for this?

------
stdbrouw
After reading the blog post and reading through the comments, it looks like
people are drawing the wrong conclusion from this. The problem is not that AB-
testing is overrated, doesn't work, is bullshit, etc., but that Optimizely
used to do it wrong.

~~~
ErikVandeWater
Exactly. A/B testing is very useful in some situations (i.e. where traffic to
the page is great enough, product-market fit is achieved, and the product is
well-developed enough that any changes to it after the campaign is run will be
minimal). But many companies, usually startups, scale too early and use the
products poorly. Companies that sell these products have very little incentive
to discourage use of their products by ill-informed users, so the misuse
continues.

------
erikbern
> Statistical power is simply the likelihood that the difference you’ve
> detected during your experiment actually reflects a difference in the real
> world.

This seems incorrect to me. Isn't statistical power the likelihood that the
null hypothesis would generate an outcome at least as extreme as what you
observed?

I'm guessing the issue has a lot more to do with peeking at the outcome and
not correcting for it (and similarly running many tests)

[http://www.stat.columbia.edu/~gelman/research/unpublished/p_...](http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf)

~~~
smu3l
Yes, this is incorrect.

Also the next sentence is wrong:

>You’ll often see statistical power conveyed as P90 or 90%. In other words, if
there’s a 90% chance A is better than B, there’s a 10% chance B is better than
A and you’ll actually get worse results.

Having a test w/ 90% power means that if A is truly better than B (for one
sided) or A is truly different than B (two sided), then you'll detect it 90%
of the time you run the test (on independent data).
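
To make that concrete, here is a sketch of the power calculation for a
two-sided two-proportion z-test under the normal approximation (the rates and
sample sizes are illustrative):

    from math import sqrt
    from statistics import NormalDist
    
    norm = NormalDist()
    
    def power(p_a, p_b, n, alpha=0.05):
        """Chance a two-sided z-test detects the difference, n per arm."""
        se = sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)
        z_crit = norm.inv_cdf(1 - alpha / 2)
        shift = abs(p_a - p_b) / se
        return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)
    
    # If A truly converts at 12% and B at 10%: ~82% power at 4000 visitors
    # per arm, but only ~17% power at 500 per arm
    print(power(0.12, 0.10, 4000), power(0.12, 0.10, 500))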

------
nartz
In my experience, if you are seeing huge effects like a 60% difference in
conversion, you probably did something wrong (i.e. too small a sample, didn't
wait long enough, etc.) - I've never seen something this large by simply
moving things around, changing colors, changing messages, etc.

Also, in general, the more drastic the changes are, the more of an effect you
_could_ have (up to some percentage). I.e. a small change would be changing
a message or color; don't expect conversion to change by much. A large change
would be changing from a Flash site to an HTML one with a full redesign that
loads twice as fast...

~~~
ssharp
One of the better explanations of a/b testing I've read is that there are
three types of users on your site: those who will convert no matter what,
those who will not convert no matter what, and those who might convert.
Testing is for that final group. Thinking in those terms, you have to consider
whether such a large percentage fits into that "might" group. I've never
personally witnessed a 60% improvement at any volume I'd bother testing on,
and I've been doing tests for many years.

I also think a lot of the "change the button" tests or "increased email sign
up by x times" are dubious. Unless the conversion is someone giving you money,
then you still have steps to go before you see real business improvement.
There are lots of ways I can manipulate my traffic to make certain funnel
steps look better and more optimized, but the only thing I really care about
is what's coming out of the funnel. So all those extra email signups or button
pushes mean nothing if the group who perform those actions on your b version
still aren't interested in an actual purchase.

------
forrestthewoods
I'm increasingly convinced all statistical analysis performed by non-PhDs is
no better than a coin flip. Maybe even worse.

My favorite example is still the quite popular Page Weight Matters post. I
wonder how close they were to abandoning a 90% reduction in size. I wonder how
many improvements the world at large has thrown away due to faulty analysis.

[http://blog.chriszacharias.com/page-weight-
matters](http://blog.chriszacharias.com/page-weight-matters)

------
jamiequint
This is a huge problem with paid marketing as well. Many folks will look at a
conversion rate completely ignorant of sample size and allocate thousands of
dollars in budget to something which they have no idea performs better or
worse.

The real problem (as you allude to in the article) is that the demand for
accurate tools is not really there. Vendors don't build in accurate stats
because only a tiny portion of their client base understands/demands them.

------
anarchitect
There is much more to running experiments properly than it seems. While I'm
not an expert on the statistics side, there are a few things I've learned over
the years which come to mind...

1) Run the experiment in whole business cycles (for us, 1 week = one cycle),
based on a sample size you've calculated upfront (I use
[http://www.evanmiller.org/ab-testing/sample-
size.html](http://www.evanmiller.org/ab-testing/sample-size.html); see the
rough sketch after this list). Accept that some changes are just not testable
in any sensible amount of time (I wonder what effect changing a font would
have on e-commerce conversion rate).

2) Use more than one set of metrics for analysis to discover unexpected
effects. We use the Optimizely results screen for general steer, but do final
analysis in either Google Analytics or our own databases. Sometimes tests can
positively affect the primary metric but negatively affect another.

3) Get qualitative feedback either before or during the test. We use a
combination of user testing (remote or moderated) and session recording (we
use Hotjar, and send tags so we can view sessions in that experiment).
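
As promised in (1), a rough sketch of the upfront sample-size calculation
(normal-approximation formula, comparable in spirit to the evanmiller.org
calculator; the inputs are illustrative):

    from math import ceil, sqrt
    from statistics import NormalDist
    
    norm = NormalDist()
    
    def sample_size(base_rate, rel_lift, alpha=0.05, power=0.80):
        """Visitors needed per arm to detect a relative lift over base_rate."""
        p1, p2 = base_rate, base_rate * (1 + rel_lift)
        z_a = norm.inv_cdf(1 - alpha / 2)   # two-sided significance threshold
        z_b = norm.inv_cdf(power)
        var_sum = p1 * (1 - p1) + p2 * (1 - p2)
        return ceil(var_sum * ((z_a + z_b) / (p2 - p1)) ** 2)
    
    # Detecting a 10% relative lift on a 5% base rate takes serious traffic:
    print(sample_size(0.05, 0.10))   # ~31,000 visitors per arm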

------
filleokus
Interesting read, might be worth adding (2014) to the title though.

~~~
IkmoIkmo
Why the downvote? The year is relevant, as Optimizely has made various
adjustments in the past 1-2 years to the way they handle statistics; one other
user even asserts that these address the very issues this article mentions.

~~~
PhasmaFelis
One might wonder if a company which doesn't understand statistics, despite
statistics being their sole reason for existence, is actually _capable_ of
fixing these problems. If they don't understand what they don't understand,
how can they address it effectively?

------
gnicholas
Almost didn't click on this because the title seems (and is) clickbait-y, but
it was actually a very useful read for me.

As a founder, I'm constantly hearing about A/B testing and how great these
tools are. I'm not enough of a statistician to know whether everything in this
article is true/valid (and would welcome a rebuttal), but the part about
regression to the mean really hits home. Encouraging users to cut off testing
too early means that you make them feel good ("Look, we had this huge
difference!"), when in reality the difference is smaller/negligible.

I'll still do some A/B testing, but given our engineering/time constraints—and
my inability to accurately vet the claims/conclusions of the testing
software—I won't spend too much time on this.

~~~
fharper1961
Deciding not to AB test because of this article would be a huge mistake.

Learning from the mistakes made by others, and avoiding them, is what I would
suggest as the takeaway.

I work for a successful ($50M+ revenue) bootstrapped startup. And one of the
reasons for the success is that AB testing became part of the company's
culture, as soon as there was enough data coming in for the tests to become
useful.

AB testing is so important that we have built our own in-house framework that
automatically gives results for our company specific KPIs.

------
IndianAstronaut
AB testing is too simplistic. Even on my marketing team we have designed more
complex metrics to look at a factor's impact on multiple outcomes. The testing
is still a straightforward chi-square, but with a bit more depth.

------
TeMPOraL
Wow, did not see that coming. This article actually confirms the cynical
hypothesis I entertain - that most of the "data-driven" marketing and
analytics is basically marketers bullshitting each other, their bosses, their
customers and themselves, because nobody knows much statistics and everyone
wants to believe that if they're spending money and doing something, it must
be bringing results.

Some quotes from the article supporting the cynical worldview:

--

"Most A/B testing tools recommend terminating tests as soon as they show
significance, even though that significance may very well be due to short-term
bias. A little green indicator will pop up, as it does in Optimizely, and the
marketer will turn the test off. But most tests should run longer and in many
cases it’s likely that the results would be less impressive if they did.
Again, this is a great example of the default settings in these platforms
being used to increase excitement and keep the users coming back for more."

This basically stops short of implying that Optimizely is doing this totally
on purpose.

--

"In most organizations, if someone wants to make a change to the website,
they’ll want data to support that change. Instead of going into their
experiments being open to the unexpected, open to being wrong, open to being
surprised, they’re actively rooting for one of the variations. Illusory
results don’t matter as long as they have fodder for the next meeting with
their boss. And since most organizations aren’t tracking the results of their
winning A/B tests against the bottom line, no one notices."

In other words, everybody is bullshitting everybody, but it doesn't matter as
long as everyone plays along and money keeps flowing.

--

"Over the years, I’ve spoken to a lot of marketers about A/B testing and
conversion optimization, and, if one thing has become clear, it’s how
unconcerned with statistics most marketers are. Remarkably few marketers
understand statistics, sample size, or what it takes to run a valid A/B test."

"Companies that provide conversion testing know this. Many of those vendors
are more than happy to provide an interface with a simple mechanic that tells
the user if a test has been won or lost, and some numeric value indicating by
how much. These aren’t unbiased experiments; they’re a way of providing a fast
report with great looking results that are ideal for a PowerPoint
presentation. _Most conversion testing is a marketing toy, essentially_."
(emphasis mine)

_Thank you_ for admitting it publicly.

--

Like whales, whose cancers grow so big that the tumors catch their own cancers
and die[0], it seems that the marketing industry, a well-known paragon of
honesty and teacher of truth, is actually being held down by its own utility
makers applying their honourable strategies within their own industry.

I know it's not a very appropriate thing to do, but I _really_ want to laugh
out loud at this. Karma is a bitch. :).

[0] -
[http://www.nature.com/news/2007/070730/full/news070730-3.htm...](http://www.nature.com/news/2007/070730/full/news070730-3.html)

~~~
TheLogothete
>most of the "data-driven" marketing and analytics is basically marketers
bullshitting each other

Most of the ones you hear about. You know, the ones who were SEOs or content
writers or programmers before waking up one day and deciding to be marketers.

Marketing degrees have mandatory statistics courses. The good marketing
programs take them very seriously. However, a lot of schools focus on the
communication side of marketing, which leads a very significant chunk of the
marketing analyst positions to be filled by people with economics and
accounting degrees.

The Internet is actually quite new and has just begun maturing. A lot of
people working in digital marketing do not really understand what marketing
is. When you meet somebody on the street and ask him what marketing is, he
will describe advertising, or more precisely mar. communications. So when the
internet became this giant medium for doing all sorts of commerce, big
companies and schools couldn't fill the skill gap fast enough, so the gap was
filled by self-learners coming from all sorts of backgrounds. When they wanted
to build up their "marketing skills", naturally they defaulted to learning
about marketing communication instead of the economics-oriented part of
marketing. This is why digital marketers obsess with their site and drool over
a/b tests and such. The web site is a communication medium. They've put
themselves in this box equating marketing and communications.

Too bad for them, because graduates nowadays are digital natives too, so they
have no problem navigating the internet and learning html/css.

------
jmount
(as others have mentioned) Optimizely's newer engine uses ideas like Wald's
sequential analysis. Here is my article on the topic: [http://www.win-
vector.com/blog/2015/12/walds-sequential-anal...](http://www.win-
vector.com/blog/2015/12/walds-sequential-analysis-technique/) .
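
For the curious, the core of Wald's SPRT is small enough to sketch for
Bernoulli outcomes (the rates and error levels below are arbitrary, and this
is the textbook test, not Optimizely's exact engine):

    from math import log
    
    def sprt(outcomes, p0=0.10, p1=0.12, alpha=0.05, beta=0.20):
        """Wald's SPRT of H0: rate = p0 vs H1: rate = p1."""
        upper = log((1 - beta) / alpha)    # cross above -> accept H1
        lower = log(beta / (1 - alpha))    # cross below -> accept H0
        llr = 0.0                          # running log-likelihood ratio
        for n, converted in enumerate(outcomes, start=1):
            llr += log(p1 / p0) if converted else log((1 - p1) / (1 - p0))
            if llr >= upper:
                return "accept H1", n
            if llr <= lower:
                return "accept H0", n
        return "no decision yet", len(outcomes)

Unlike naive repeated peeking, the thresholds are chosen so the error rates
hold despite checking after every observation.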

------
hyperpallium
(Tangent) When I first heard of A/B testing, I thought of combining it with
genetic algorithms to evolve the entire site. Just run it until the money
rolls in.

Unfortunately, if it did work, it would probably be through something
misleading or scammy. Therefore, you need some kind of automatic legality
checking... which would be hard.

------
jbpetersen
Is anybody out there taking an approach of gradually driving more traffic to
whichever option is winning out and never running 100% with anything specific?

