
Beginners Guide to Maths and Stats behind Web Analytics - ZanderEarth32
http://www.seotakeaways.com/beginners-guide-maths-stats-web-analytics/
======
pseut
WTF? This was a somewhat incoherent list of "stats stuff"; I especially liked
the equation:

    
    
                    signal
       confidence = ------  x  \sqrt(sample size)
                    noise
                    

(none of the terms in the 'equation' were defined beforehand) which was led
into by 'so when someone says “is your result statistically significant?” then
it means he is really asking “What is the likely hood that your result has not
occurred by chance”' No no no no no no no no no no non.

Edit: corrected despair-induced typos.
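
For reference, one charitable reading of the quoted formula is the one-sample t statistic. This is an assumption, since the article never defines its terms: here "signal" is taken to be the difference between the sample mean and the hypothesised mean, "noise" the sample standard deviation, and "sample size" n.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{\bar{x} - \mu_0}{s} \times \sqrt{n}
```

Even under that reading, the p-value derived from t is the probability of a result at least this extreme if there were no real effect, not the probability that the result "has not occurred by chance."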

------
jakozaur
In the real world, statistics is usually the most useful part of math.
Analysing data without some basic statistics often leads to erroneous
conclusions.

What surprises me is that it doesn't get the attention it deserves in
pre-university education. A lot of lessons in math are about geometry,
algebra, etc., which are a great way to learn logical and abstract thinking,
but aren't as useful as statistics.

------
dkarl
_Q1. When your website conversion rate jumps from 10% to 12% then is it 2%
rise in conversion rate or 20% rise in conversion rate?_

This question doesn't have a clear answer, because the values are already
percentages. Imagine the original numbers refer to apples. First there were 10
apples, then 12 apples. The absolute increase is 2 apples; the relative
increase is 20%. Obviously, if you say "the increase is __ apples" or "the
increase is __ %" there's only one right way to replace the blanks with
numbers. But since 10 and 12 are already percentages, the absolute and
relative changes would be stated as "the increase is 2%" (absolute increase)
and "the increase is 20%" (relative increase.) They mean different things, but
they're both correct statements if interpreted correctly.

In practice, people will expect one and interpret the other as wrong. Knowing
which one they expect is not a matter of statistics.

~~~
pseut
It's a change of two "percentage points," not 2%. With the right vocabulary,
there's no ambiguity.
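
A minimal sketch of the two readings of the 10% to 12% jump, in case it helps to see them side by side. The function names are made up for illustration.

```python
def absolute_change_pp(old_pct, new_pct):
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old_pct, new_pct):
    """Relative change of new_pct versus old_pct, as a percent of old_pct."""
    return (new_pct - old_pct) / old_pct * 100.0


old_rate, new_rate = 10.0, 12.0  # conversion rates, in percent
print(f"{absolute_change_pp(old_rate, new_rate):.1f} percentage points")
print(f"{relative_change_pct(old_rate, new_rate):.1f}% relative increase")
```

Both numbers describe the same change; only the unit differs.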

~~~
dkarl
If you rely on people making the same precise distinction between "percent"
and "percentage point" that you do, then you'll have more misunderstandings
rather than fewer. It's like "next Friday." If today is Monday the 3rd and I
say "next Friday," I don't rely on anyone outside my immediate family
understanding that I mean Friday the 14th, since many people understand it to
mean Friday the 7th.

~~~
pseut
I "rely" on the person saying that it went from 10% to 12%. I'll refer to such
a change as a 2 percentage point change, but if the other person calls it the
wrong thing, I don't really care. I'd be much more concerned about someone
getting worked up about the semantics of it and ignoring the context of
the numbers. E.g. If they want to call it a 20% change and leave it at that,
the misuse of terminology is the least of our problems.

Witness: <http://xkcd.com/1102/>

~~~
sctechie
Communication is at the heart of our problems. Your attitude is rather toxic.
The xkcd you linked does little to soften your stance, which appears to be "I
don't care if the other person and I don't agree on terminology"; to quote,
"but if the other person calls it the wrong thing, I don't really care."

There is REAL value in being able to communicate complex ideas effectively. In
my opinion, agreeing on definitions for terminology is step #1 to having an
effective conversation / discussion.

~~~
pseut
Look, this isn't a matter of "agreeing on the terminology." I know and use the
correct terminology, but I don't get bent out of shape if the person I'm
talking to doesn't know it. Especially when their mistake is obvious and I can
correct for it in my head. I don't see how it is "toxic" to be lenient when
talking to non-specialists who misuse terminology.

By and large -- and, hey, you'll think this statement is toxic -- people who
spend a lot of time worrying about whether something is called a "percent" or
a "percentage point", outside the context of a classroom, do not know what
they're talking about and have no business teaching anyone how to interpret
statistics. Sometimes a change from 10% to 12% matters a lot. Sometimes it
doesn't. Sometimes it would matter a lot, but is estimated so imprecisely as
to be indistinguishable from noise. Sometimes it is measured very precisely
but is meaningless. I could go on. This context-dependence is true whether you
call it a percent, a percentage point, or just a "change."
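
To make the "indistinguishable from noise" case concrete, here is a rough sketch that puts a confidence interval around the same 10% to 12% change at two sample sizes. The visitor counts are hypothetical, and the interval uses the simple normal approximation, which is only one of several reasonable choices.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Normal-approximation (Wald) interval for the difference p_b - p_a."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    std_err = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided Gaussian critical value
    diff = p_b - p_a
    return diff - z * std_err, diff + z * std_err


# Hypothetical data: the same 10% -> 12% change at two sample sizes.
small = diff_confidence_interval(20, 200, 24, 200)
large = diff_confidence_interval(2000, 20000, 2400, 20000)
for label, (low, high) in (("200 visitors per arm", small),
                           ("20,000 visitors per arm", large)):
    print(f"{label}: {low * 100:+.1f} to {high * 100:+.1f} percentage points")
```

With 200 visitors per arm the interval spans zero; with 20,000 it does not, even though the point estimate is identical.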

Most smart people understand this regardless of their statistical training.
But then they read that, no, what really matters is what people _call_ the
change, and then they either 1) conclude that statisticians are pedantic
morons who should be ignored and/or 2) psych themselves out and doubt their
instincts and wind up worrying about trivial, trivial shit.

Communication is important, but not the way you claim. It is important that
specialists (be they statisticians, programmers, whatever) be able to explain
things to clients/nonspecialists. It is also important that the specialists be
able to interpret what the clients/nonspecialists want to understand and do.
The burden falls entirely on the specialist, and any guide that spends any
amount of effort to get nonspecialists to use the "correct" terminology is
misguided and wasted at best. Which is what I meant by the "toxic" statement,
"but if the other person calls it the wrong thing, I don't really care."

~~~
sctechie
I'm sorry but I have to interact with non-technical managers on a regular
basis. I consider it part of my job responsibilities to ensure that we are
communicating using the same terminology. When I tell a client his conversions
are up 35% because of some change we made before A/B testing, I need to know
that he understands what that means.

I didn't mean toxic as a personal slight, sorry if you took it that way.

If the other person 'calls it the wrong thing', then how do you know they
understand what you're talking about? I think it's worthwhile in that
situation, if not necessary, to take a few minutes and define, specifically,
what the terms you're using mean.

I simply disagree that communication is not as important as I claim. The value
you bring as a statistician is not running a z-test. Any high-school kid with
a computer can go to Wikipedia and be running a z-test on some data 10 minutes
later. The value comes from being able to understand the results and
communicate them effectively to your clients.
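
For what it's worth, the "10 minutes later" version might look something like the sketch below: a pooled two-proportion z-test using only the Python standard library. The conversion counts are invented for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical A/B test: 100 of 1000 visitors convert on A, 120 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 120, 1000)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Producing the number is the easy part; deciding what it means for the client is the part that takes judgment.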

~~~
pseut
Reread my last paragraph -- you misunderstand my position (phrased for maximal
irony). I care very much if my students or (hypothetical) employees misuse
terminology. But the best way by far to communicate that a value changed from
10% to 12% is to say "xyz changed from 10% to 12%." I hope that the next step
in the conversation is not a discussion of whether that means that the value
changed by 20% or by 2%, but whether the change is important and measured
precisely...

but I am a little worried that you call them z-tests instead of t-tests (even
when using Gaussian critical values) (and, to belabor the point, I try to call
them "Gaussian" critical values because "Normal" may be interpreted
ambiguously by a non-technical reader, but I can usually tell whether someone
I'm talking to means "normal" in a technical or vague sense).

:)

~~~
sctechie
My apologies, I'm only an amateur statistician. I'll defer to your knowledge
in that area. =) (and I'm joking here, I'm decent with the stats, just wanted
to focus on the broader issue).

I don't want to get bogged down in the stats discussion because I was making a
broader point and don't claim to be an expert in statistics. We could extend
the example to any area where one person has more technical expertise in any
certain subject than the people they are communicating with.

So, let's step outside the arena of statistics for a second. If you were
teaching someone to cook, would you really explain the process using terms
like a 'pinch' or a 'dash' of salt? Sure, to an expert chef or grandmother, a
pinch of salt is a perfectly reasonable quantity to add to the recipe. The
student just learning to cook can only guess at what that term means. That's
why most recipes come with specific amounts or weights of ingredients to add,
because we need a common terminology to correctly express the recipe.

Taken totally as an argument for teaching or explaining statistics, I see your
point. It's far more important to discuss and quantify the significance of the
change rather than simply noting that something did change and by how much.

And yeah, irony went right over my head. I blame Friday. =)

~~~
pseut
For cooking: I'd say 'pinch' or 'dash' and then clarify if the person asked
for clarification. If he or she looks puzzled, I'll offer the clarification
unsolicited and as naturally as possible. When I cook pancakes on the weekend
with my three year old daughter, we talk about how the appearance of the edge
of the pancake changes as it gets closer to being ready to be flipped. To get
even further off topic -- if you like cooking, take a look at "Tartine Bread"
which is damn near the pinnacle of this sort of communication.

~~~
sctechie
Well, that will teach me to argue with an Econ professor! And I also find
myself hungry after staring at pictures of bread on Google for the past 10
minutes. Thanks for the interesting discussion. =)

~~~
pseut
bon appetit!

------
vtuulos
Shameless plug: <https://bitdeli.com> lets you put theory into practice - you
can build your own web analytics in plain and simple Python.

Feel free to ping me if you need help getting started (or a longer trial :)

------
alexatkeplar
If you want to have a go at rolling your own web analytics calculations, I
would recommend setting up SnowPlow (<https://github.com/snowplow/snowplow>)
and then following through the SnowPlow Web Analyst's Cookbook (start with the
simple recipes here:
<http://snowplowanalytics.com/analytics/basic-recipes.html>).

Here's a simple example for unique visitors by month:

    
    
        -- Unique visitors per calendar month
        SELECT
            YEAR(dt)                AS visit_year,
            MONTH(dt)               AS visit_month,
            COUNT(DISTINCT user_id) AS unique_visitors
        FROM events
        GROUP BY YEAR(dt), MONTH(dt);
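
The same calculation can be sketched in plain Python for anyone without a warehouse to hand. The event tuples below are a made-up stand-in for the `events` table, not SnowPlow's actual schema.

```python
from collections import defaultdict
from datetime import date


def unique_visitors_by_month(events):
    """Count distinct user_ids per (year, month) from (date, user_id) pairs."""
    visitors = defaultdict(set)
    for dt, user_id in events:
        visitors[(dt.year, dt.month)].add(user_id)
    return {month: len(ids) for month, ids in sorted(visitors.items())}


# Hypothetical events: (event date, user id)
sample_events = [
    (date(2013, 1, 3), "u1"),
    (date(2013, 1, 9), "u1"),   # repeat visit, counted once for January
    (date(2013, 1, 20), "u2"),
    (date(2013, 2, 1), "u1"),
]
print(unique_visitors_by_month(sample_events))
```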

------
prashantganti
Nice article. It could have done with a better example for %difference.
%difference is used when neither value is more important or topical than the
other.
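
A small sketch of the distinction, assuming the usual textbook definition of percent difference (the gap divided by the mean of the two values). The function names are illustrative only.

```python
def percent_difference(a, b):
    """Symmetric percent difference: |a - b| relative to the mean of a and b."""
    mean = (a + b) / 2
    if mean == 0:
        raise ValueError("percent difference is undefined when the mean is zero")
    return abs(a - b) / mean * 100.0


def percent_change(old, new):
    """Directional percent change of new relative to old."""
    return (new - old) / old * 100.0


# Percent difference does not care which value comes first.
print(f"{percent_difference(10, 12):.2f}% vs {percent_difference(12, 10):.2f}%")
# Percent change does: 10 -> 12 and 12 -> 10 give different magnitudes.
print(f"{percent_change(10, 12):.2f}% vs {percent_change(12, 10):.2f}%")
```

Because neither value acts as the baseline, the percent difference is the same in both directions, which is why it suits comparisons where neither value is the natural reference.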

------
tomrod
What is the pay rate for someone who knows far, far more statistics than this,
and can communicate at a coherent level?

~~~
sctechie
"Data science" is a hot thing right now. If your technical / design abilities
match your stats knowledge, you could pull 6 figures easily depending on the
market you're in.

<http://en.wikipedia.org/wiki/Data_science>

------
louwhopley
Awesome! This clears up stats and speaking statistics a lot for me. Thank you.

------
kappaloris
Any sources for people who know some statistical theory but know almost
nothing about how it is applied to web analytics?

~~~
disgruntledphd2
There isn't one, or at least there wasn't when I went looking (almost PhD
quantitative psychologist now working in web analytics). I was thinking of
starting a blog series, possibly a book called Web Analytics for Nerds or
something.

That's at least a month or two (95% interval: 1-12 months) off though.

~~~
tomrod
I'd love to correspond. ABD econ here, still in the grad game for another
year. Ping me if you're interested.

------
shanellem
Awesome post! Knowing how to use analytic suites is one thing, but knowing how
to accurately interpret the data is another.

