
Whom the Gods Would Destroy, They First Give Real-time Analytics - chrisdinn
http://mcfunley.com/whom-the-gods-would-destroy-they-first-give-real-time-analytics
======
lmkg
Full-time web analyst here. Total agreement.

Information is as useful as your ability to act on it--no more, no less. Real-
time analytics is something that sounds sexy and gets a lot of headlines (and
probably sales), but it's not particularly useful, especially compared to the
cost to implement. Most organizations aren't capable of executing quick
decisions of any significance. In fact, quite a few business models wouldn't
have much to gain even if they were capable of it.

My experience is that there are three types of companies, with very little
overlap:

1\. Companies large enough to receive statistically significant amounts of
data in under an hour.

2\. Companies small enough to make decisions regarding significant site
updates in under an hour.

3\. Companies whose name is "Google."

Fact of the matter is, any change to your site more significant than changing
a hex value will require time overhead to think up, spec out, test, and apply.
Except in the most pathological cases of cowboy coding, it will take at least
a day for minor changes. Changing, say, the page flow of your registration
process will take a week to a month. You won't be re-allocating your multi-
million-dollar media budget more often than once a quarter, and you have to
plan it several months in advance anyways because you need to sign purchase
orders.

In short, you can usually wait 'til tomorrow to get your data. Really, you
can. Sure, you can probably stop an A/B test at the drop of a hat, but if it
took you a week to build it, you ought to let it run longer than that.

I have had one client who really did benefit from real-time-ish (same-day)
data. It was a large celebrity news site. They could use data from what
stories were popular in the morning to decide which drivel to shovel out that
afternoon. This exception nonetheless proves the rule: Of the 6 "requirements"
listed in the article, only 1.5 were needed in this particular case: hard yes
on accessibility, and timeliness was relaxed from 5 minutes to 30.

(Note that when I say analytics, I mean tools for making business decisions.
Ops teams have use for real-time data collection, but the data they need is
altogether different, and they are better served by specialized tools).

~~~
pdog
Why single out Google? They're in category 1.

~~~
cli
He is saying that only Google can do both number 1 and number 2.

~~~
mrtron
They seem to qualify for number 3 as well.

------
btilly
Gah, yet another article that links to Evan Miller's article on how to not run
an A/B test. I really need to finish writing my article that explains why it
is wrong, and how you can do better without such artificial restrictions.

His math is right, but the logic misses a basic fact. In A/B testing nobody
cares if you draw a conclusion when there is really no difference, because
that is a bad decision that costs no money. What people properly should care
about is drawing the wrong conclusion when there is a real difference. But if
there is a significant difference, only for small sample sizes is there a
realistic chance of drawing the wrong conclusion; after that, the only
question is whether the bias has been strong enough to make the statistical
conclusion right.

He also is using 95% confidence as a cut-off. Don't do that. You don't need
much more data to massively increase the confidence level, and so if the cost
of collecting it is not prohibitive you absolutely should go ahead and do
that. Particularly if you're tracking multiple statistics: if you test
regularly, those 5% chances of error add up fast.
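
To make that concrete, a quick sketch of how the errors compound, assuming
independent tests each run at a fixed confidence level:

    # Chance of at least one false positive across repeated A/B tests.
    def false_positive_rate(alpha, num_tests):
        return 1 - (1 - alpha) ** num_tests

    for n in (1, 5, 20, 50):
        print(f"{n:2d} tests at 95% confidence: "
              f"{false_positive_rate(0.05, n):.0%} chance of a false positive")

    # 20 tests at 95% -> ~64% chance of at least one false positive;
    # the same 20 tests at 99.9% -> ~2%.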

~~~
ISL
_He also is using 95% confidence as a cut-off. Don't do that. You don't need
much more data to massively increase the confidence level, and so if the cost
of collecting it is not prohibitive you absolutely should go ahead and do
that._

Statistical significance grows roughly like the square root of the number of
samples. Moving from 2-sigma (95%) to the physics gold-standard of 5-sigma
requires drastically more data in almost all cases.

Selecting a measurement's uncertainty is something which should be carefully
considered. Sometimes you only care about something to 10%, sometimes a 1-in-
a-million part failure kills someone's Mom.

If you're doing lots of A/B testing, where trials penalties add up, it might
be worth looking into the way that LIGO handles False Alarm Rates. They have
to contend with a lot of non-Gaussian noise/glitches.

~~~
btilly
_Statistical significance grows roughly like the square root of the number of
samples._

No, no, no. You are confusing the growth of the standard deviation (which
does grow like the square root of the number of samples) with the increase in
certainty as you add standard deviations. The chance of error falls off like
e^(-O(t^2)), where t is the number of standard deviations - literally faster
than exponentially.

What does this mean in the real world? In a standard 2-tailed test you get to
95% confidence at 1.96 standard deviations, 99% confidence at 2.58 standard
deviations, and 99.9% confidence at 3.29 standard deviations. These numbers
are all a long ways away from 5 standard deviations.
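
Those critical values are easy to check; a minimal sketch using scipy's
normal distribution (assuming a standard two-tailed test):

    # Two-tailed critical values for common confidence levels.
    from scipy.stats import norm

    for conf in (0.95, 0.99, 0.999):
        z = norm.ppf(1 - (1 - conf) / 2)  # two-tailed z-value
        print(f"{conf:.1%} confidence -> {z:.2f} standard deviations")

    # Prints: 95.0% -> 1.96, 99.0% -> 2.58, 99.9% -> 3.29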

Let's flip that around and take 95% confidence as your base. If you are
measuring a real difference, then on average 99% confidence requires a test to
get 32% more data, and 99.9% confidence requires a test to get 68% more data.
Depending on your business, the number of samples that you get is often
proportional to the time it takes to run the test. If a mistake can cost your
company significant dollar figures, the cost of running all of your tests to
higher confidence tends to be much, much less than the cost of one mistake.

That is why I say that if the cost of collecting more data is not prohibitive,
you shouldn't be satisfied with 95% confidence.

~~~
ISL
Assume a random variable is barely resolved at 1-sigma off zero with N
samples. If I wish to increase my confidence that it really is off zero (and
the mean with N samples is actually the mean of the distribution), then I'll
need 4N samples to halve my uncertainty and double the significance of the
observation (as measured in sigma-units). It is in that sense that the
significance of a measurement increases like \sqrt(N).

Viewed from my perspective, if you'd like to go from 2-sigma (95%) to
3.29-sigma, you'd need (3.29^2)/(2^2)=2.7 times the amount of data used to get
the 2-sigma result, or 170% more samples.

It looks like you've reached your conclusion that I'd need 68% more data to
reach 99.9% by taking 3.29/1.96=1.68. I believe that this is in error.
Uncertainty (in standard deviations) decreases like 1/\sqrt(N), not 1/N.

\sqrt(N) has driven me to depression more than once.

~~~
btilly
You are right that I used linear where I should have used quadratic.

However, consider this. To go from 95% to 99% confidence takes 73% more data
collection. So for 73% more data, you get 5x fewer mistakes.

To go from 95% to 99.9% confidence takes 182% more data. So for less than 3x
the data, you get 50x fewer mistakes.

My point remains. Confidence improves very, very rapidly.
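
A quick numeric check of that quadratic scaling, assuming a fixed effect size
in a standard two-tailed z-test:

    # Required sample size scales like the square of the critical
    # value: n is proportional to z^2 for a fixed effect size.
    Z = {"95%": 1.96, "99%": 2.58, "99.9%": 3.29}

    base = Z["95%"] ** 2
    for level, z in Z.items():
        print(f"{level}: {z ** 2 / base - 1:+.0%} more data than 95%")

    # 99% costs +73% more data; 99.9% costs +182% more data.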

~~~
ISL
Neat to see a different side of a coin. In our lab, individual measurements
can take as long as a year. \sqrt(N), when constrained by human realities,
presents a wall beyond which we cannot pass without experimental innovation.

As the derivative of \sqrt(N) is 1/(2\sqrt(N)), your first measurement
teaches you the most. Every measurement teaches you less than the last. In
general, we measure as much as we must, double the size of the dataset as a
consistency check, and move on. The allocation of time is one of the most
important decisions of an experimenter.

~~~
btilly
Ah. Well I talk about the cost of data acquisition for a reason.

I've seen a number of businesses that have a current body of active users, and
this does not change that fast. So when they run an A/B test, before long
their active users are all in it, and before too much longer those of their
active users who would have done X will have done X, and data stops piling up.
In that case there is a natural amount of data to collect, and you've got to
stop at that point and do the best you can.

Businesses are as alike as snowflakes - I am happy to talk about generalities
but in the end you have to know what your business looks like and customize to
that.

------
sardonicbryan
So I built and use a realtime analytics dashboard that tracks revenue,
projected revenue, revenue by hour for a portfolio of social games. I find it
incredibly useful, but I will give a couple tips that address some of the
issues in the article:

1) You have to provide context for everything. Current real-time revenue is
presented right next to the 14-day average revenue up to that point in time,
along with how many standard deviations the delta between the two is. I.e.:
current revenue is $100 at 10am, vs. a 14-day average of $90, which is 0.2
standard deviations of revenue at that time (see the sketch after this list).

2) Hourly revenue is presented the same way, right next to the 14 day average
revenue for that hour and the SD delta.

3) Look at it a lot. I've been looking at this sheet regularly for over a year
now, and I have a really good feel/instinct for what a normal revenue swing
is, and an even better feel for the impact of different
features/content/events/promotions on our revenue.

4) This approach also works better when the impact of your releases is high. A
big release typically spikes revenue 2-3 SD above baseline, and causes an
immediate and highly visible effect. So while I'm not strictly testing for
statistical significance, it's one of those things where it's pretty obvious.

5) It also works better if you use it in conjunction with other metrics. We
validate insights/intuitions gained from looking at realtime data against
weekly cohorted metrics for the last several months of cohorts.
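
As promised above, a minimal sketch of the point-1 calculation, with made-up
numbers standing in for live figures:

    import statistics

    # Revenue-so-far at the same time of day for the trailing 14 days
    # (illustrative values only).
    trailing_14d = [40, 150, 90, 20, 135, 60, 110, 45, 160, 70, 95, 30, 140, 115]
    current = 100  # revenue so far today at, say, 10am

    mean = statistics.mean(trailing_14d)
    sd = statistics.stdev(trailing_14d)
    delta_in_sd = (current - mean) / sd

    print(f"current ${current} vs 14-day avg ${mean:.0f} ({delta_in_sd:+.1f} SD)")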

~~~
ProblemFactory
While this sounds very cool, and my inner geek would love this dashboard on
the wall of the office - is it actually useful or is it a distraction?

What actions or decisions would you make within minutes of seeing the results?
If product changes take days or weeks, daily analytics is just as useful, and
stops people wasting time looking at the data more than once per day.

~~~
rossjudson
This is precisely the right point. We look at an analytics display to
determine whether action should be taken, and what those actions should be.

The vast majority of "big data" is noise, more than adequately summarized
with basic statistics.

------
physcab
I like this rant. Seldom do I see the need for a real-time system, and
sometimes I think engineers and program managers gravitate towards the concept
to better answer questions of "why" a problem happens. But most analytics
problems can't be solved in real time. You have to put on your thinking cap,
take a step back, do some background research, and be patient. And as an
analyst, it is bad for your credibility to jump to conclusions. Unlike
engineering, it is better to be slow and right on your first try than to
"move fast and break things".

------
ChuckMcM
Nice post. Ops guys, though, like to see the bushes rustling right away so
that we can reboot that switch before all hell breaks loose :-)

The central theme is a good one though: tactics and strategies have an innate
timeline associated with them, and deciding on tactics or strategies with data
that doesn't have a similar timeline leads to poor decisions. The coin flip
example in the article is a great one.

Ideally one could ask, "What is the shortest interval of coin flips I can
measure to 'accurately' determine a fair coin?" and realize that accuracy only
asymptotically approaches 100%. One of the things that separates experienced
people from inexperienced ones is having lived through a number of these
'collect-analyze-decide' cycles and getting a feel for how much data is
enough.
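
To put numbers on the coin example: the 95% interval for an observed heads
rate shrinks like 1/\sqrt(n), so (using the normal approximation) each extra
digit of accuracy costs 100x the flips:

    import math

    # 95% interval half-width for an observed proportion p ~ 0.5.
    for n in (100, 1_000, 10_000, 100_000):
        half_width = 1.96 * math.sqrt(0.5 * 0.5 / n)
        print(f"n={n:>6}: fair-coin estimate is 0.50 +/- {half_width:.3f}")

    # Reliably calling a 0.51-heads coin unfair takes on the order
    # of 10,000 flips.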

~~~
btilly
_One of the things that separates experienced people from inexperienced ones
is having lived through a number of these 'collect-analyze-decide' cycles and
getting a feel for how much data is enough._

If you're going off of "feel" instead of statistics then you're doing it
wrong. Period.

~~~
Karunamon
>If you're going off of "feel" instead of statistics then you're doing it
wrong. Period.

I disagree, at least to a point. Take an experienced Operations/monitoring guy
who's been around the block more than once, then sit him in front of the
monitoring utilities for a new company developing a new service.

Then, take a total newbie and put him in the same place.

Train them both to equal skill on your tools and operations.

Who do you think will make the most proper calls? Why?

At this point, those statistics and that documentation do not exist yet. What
constitutes a "false positive" vs a "drop everything and spin up more VMs and
get on the load balancer" can be more of an art than a science, especially
when you're first starting out.

As hokey as this sounds, certain systems have a "personality" that varies
between installations and companies, and that nothing short of day-to-day use
will educate one in.

~~~
btilly
For operations, I agree. A lot of the numbers you have don't have a rigorous
statistical interpretation - for instance, is a load average of 20 fine or a
problem? It depends on whether you're looking at the Oracle database.

But the original article was talking about A/B testing, and that is the
context that I was thinking about. There you both can and should use
statistics.

------
creature
I once interviewed for a lead webdev role at a small startup. They had 10-12
people, and a product that was doing OK. (I was thoroughly unconvinced by it,
but that's another story). One of the things they talked about was their
upcoming plan to build a real-time analytics system to track user behaviour. A
big project! That I would get to spearhead! They'd budgeted 2-3 months and 6-8
people to implement it. We talked about their plans for a bit, before I asked
(what I thought was) the obvious question:

"So, what's the real-time system going to help you decide that the current
system won't?"

There was a long, uncomfortable pause as the two people looked at each other,
each hoping the other would answer.

"Well... it's not so much the real-time element, per se..." one managed. "But
we want more granular data about how people are using our app."

"Okay. But you're currently doing analytics via HTTP callbacks, right? Why not
just extend that to hit some new endpoints for your more granular data? You've
already got infrastructure in place on the front and back end to support
that."

No answer. We moved on. I don't know if I actually saved them 1-2 man-years of
work or if they plowed ahead anyway.
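
For what it's worth, the kind of thing I was suggesting is nothing fancier
than one more endpoint on their existing callback pipeline; a hypothetical
sketch (Flask, with made-up route and field names):

    from flask import Flask, request

    app = Flask(__name__)

    # Hypothetical extra callback endpoint for more granular events,
    # feeding whatever store the existing analytics callbacks use.
    @app.route("/track/<event>", methods=["POST"])
    def track(event):
        app.logger.info("event=%s user=%s props=%s", event,
                        request.args.get("uid"),
                        request.get_json(silent=True))
        return "", 204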

------
lostnet
And we shouldn't have calculators because we may forget the relationships
between numbers?

I use analytics to do significant A/A testing on every configuration the
site's users are actually using, to determine what will work for my A/B
testing later... Should I maintain separate real-time analytics, or delay
deployments by 24 hours when I would like a little more assurance? This is
not a rhetorical question; whether I should keep maintaining separate
tracking for the 20% of the time where Google Analytics is unfit is an open
problem for me.

Similarly, I would like to know if there is a sudden plummet in some
demographic the second I start a test. It usually isn't significant, but the
client panic will be. It is better to cancel the test and do a post-mortem
before restarting. An A/B test doesn't have to get its day in court.

Giving delayed numbers for routine reports is perfectly valid; dressing up
that pig is Luddism.

~~~
gfodor
Presumably anything along the lines of the "demographics drop" you mention
would show up in live operational metrics, if it is an effective signal for
monitoring the health of the system.

------
josh2600
This is a really interesting post.

While I agree with the basic premise that real-time analytics are rarely
helpful, here are a couple of places where they could be very useful:

* Conferences - Being able to see live user analytics on a conference site, since it is ephemeral, would be great.

* Pop-up Sites - Again, the short nature of the site means seeing a blocking action or a broken link early is tremendously valuable.

Basically, there are a couple of circumstances where real-time analytics
might make sense, but they're generally short-duration engagements. Getting
analytics info for a site which is no longer being hammered is useless unless
it's a long-term project.

~~~
josephlord
What action will the conference take based on any immediate information or do
you mean the information will go on the site?

Broken links etc. are probably in the operational category, although a
validator is a better solution for that issue. Logging errors and maybe
tracking accesses to ensure every page is being reached can be done with
real-time operational stats and doesn't contradict this article.

~~~
josh2600
So think of conferences where there's a site up for one day.

If one wants users to clickthru to a particular page, but they're all going
elsewhere, that's something that's only actionable on the day of the event (as
changing it 12 or 24 hours later does little good).

I see your point about not contradicting the article, but I think there are
instances where evaluating the performance of a website in real time (for
time-sensitive events) could have a real impact.

I don't think we're disagreeing so much as talking about the same point from
different angles.

------
car54whereareu
"You just need to understand cause and effect," said Apollo.

"He's right, mortal. This isn't what you would call rocket science," added
Athena.

"Okay, and my business will succeed if I can understand cause and effect?"

"Yes," said Apollo.

"Of course! Why are you wasting time? Go write some software", said Athena.

So yeah, real-time A/B testing seems like a bad idea, but real-time analytics
sounds fine. On the other hand, maybe the Gods gave you the idea of cause and
effect to destroy you. I bet more than one story on Hacker News today
pretends to understand the causes for an effect.

------
AnthonyMouse
I agree with this in general, but there are exceptions. For example, it would
be nice to know _immediately_ if a new change has caused your conversion rate
to drop precipitously for some reason, so that you can turn it back off and
take a minute to see if you can figure out why before you lose a full day's
worth of revenue.
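
A minimal sketch of such a tripwire, with made-up names, numbers, and
thresholds:

    import statistics

    # Flag a rollback when today's conversion rate falls more than
    # threshold_sd standard deviations below the trailing baseline.
    def should_roll_back(today_rate, trailing_rates, threshold_sd=3.0):
        mean = statistics.mean(trailing_rates)
        sd = statistics.stdev(trailing_rates)
        return today_rate < mean - threshold_sd * sd

    baseline = [0.042, 0.045, 0.043, 0.044, 0.041, 0.046, 0.044]
    print(should_roll_back(0.025, baseline))  # True: precipitous drop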

~~~
scottandjames
Agreed - seeing real-time changes is helpful for responding to drop-offs or
spikes.

Also, if you are aware of the general trend of Tuesdays having higher
traffic/results than Saturdays (to take his Etsy example) and don't let those
trends skew product decisions, then watching real-time numbers and responding
to changes as they happen can help you hop on waves with supplemental content
or messaging.

------
cftm
Interesting post, though I feel the author is somewhat missing the forest for
the trees; the issue isn't "real-time", the issue is that many people
conducting A/B tests don't understand what the statistics are telling them,
nor do they understand when an adequate "sample" has been pulled.

Real-time data isn't needed for A/B testing, but this falls into the PEBKAC
category.

------
phyalow
Splunk? I can't help but think that piece of software would address most of
the concerns this article raises.

------
frozenport
Yes, Yes, and a Thousand Times, Yes!

