
Founders, teach your employees statistics - ttunguz
http://tomtunguz.com/founders-teach-your-employees-statistics
======
birken
I'm a huge fan of pandas (<http://pandas.pydata.org/>) for data analysis. It
offers a lot of the basic functionality of R, but everything is in Python. The
original author of pandas, Wes McKinney, even wrote a book about it: Python
for Data Analysis (<http://shop.oreilly.com/product/0636920023784.do>).

One caveat I would mention about data analysis would be that statistics is not
just number crunching. It is really a bit of an art to make sure you are
looking at the right sample of data in the right way, and to ensure you are
accounting for all potential biases. Surprisingly, I have noticed that as I've
gained more experience doing data analysis, it takes me longer to do and I
make less confident assertions. But on the other hand, I now very rarely make
assertions that are incorrect, which is extremely important. I believe that
incorrect data analysis is significantly worse than no data analysis.

So, the advice I would give to people getting started is: whenever you come to
a conclusion by analyzing a particular piece of data, ask yourself, "If I look
at the data differently, can I come to the opposite conclusion?" You would be
surprised how often the answer to this question is yes, and that is a good
indicator that you a) need more data or b) cannot make a significant
conclusion. This can be especially difficult when you are already _sure_ you
know the answer to a question even before you do the data analysis, but you
really have to be disciplined about it.
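
To make the "look at the data differently" check concrete, here is a minimal
sketch in plain Python. The counts are hypothetical, invented only for
illustration, and the names (`data`, `winner_by_segment`, `winner_overall`) are
not from any library. It shows one way the same numbers can support opposite
conclusions: variant A wins inside every segment, yet variant B wins once the
segments are pooled (Simpson's paradox).

```python
# Hypothetical counts, invented for illustration: (conversions, visitors)
data = {
    "A": {"new": (90, 100), "returning": (300, 1000)},
    "B": {"new": (800, 1000), "returning": (20, 100)},
}


def rate(conversions, visitors):
    """Conversion rate for a single (conversions, visitors) pair."""
    return conversions / visitors


def overall_rate(segments):
    """Conversion rate after pooling every segment of one variant."""
    conversions = sum(c for c, _ in segments.values())
    visitors = sum(v for _, v in segments.values())
    return conversions / visitors


def winner_by_segment(data):
    """Variant with the higher conversion rate inside each segment."""
    segments = data["A"].keys()
    return {s: max(data, key=lambda v: rate(*data[v][s])) for s in segments}


def winner_overall(data):
    """Variant with the higher conversion rate once segments are pooled."""
    return max(data, key=lambda v: overall_rate(data[v]))


print(winner_by_segment(data))  # segment-level view
print(winner_overall(data))     # pooled view
```

When the two views disagree like this, it is usually a sign that the segments
are unevenly sized across variants, and that neither answer should be asserted
without understanding why.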

~~~
Wilduck
> One caveat I would mention about data analysis would be that statistics is
> not just number crunching

Agreed. Analysis that is well presented _tells a story_. The story begins with
where the data comes from, then describes how the data was analyzed, and
finally how the results relate back to the real world. If any of these pieces
is missing or unconvincing, it's a good sign that something is off, as you
say, with either a) the data or b) the significance of the conclusion.

I emphasize the story aspect because the act of laying out all of the
assumptions in a semi-narrative form goes a long way towards deobfuscating the
potentially confusing statistics. It is also the sort of discipline that can
help to lay bare all of the assumptions that could make you _sure_ that you
know the answer.

------
btilly
I agree that learning statistics is invaluable.

I disagree that teaching R is the same as teaching statistics.

~~~
Homunculiheaded
I definitely have to agree. There is nothing more dangerous than plugging some
numbers into R, typing t.test(some_numbers) and proclaiming "Hey look at this
great p-value!" without understanding what a p-value is actually telling you.
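
As a rough illustration of what a p-value is telling you, here is a minimal
sketch of an exact permutation test in plain Python. This is not what R's
t.test computes (that is a t-test); it is a simpler procedure swapped in
because every step is visible. The function name `permutation_p_value` and the
sample numbers are hypothetical. The p-value here is the fraction of all ways
of relabeling the pooled data that produce a difference in means at least as
large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way to split the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so floating-point ties count as "at least as large".
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical measurements, invented for illustration.
print(permutation_p_value([1, 2, 3, 4], [5, 6, 7, 8]))
```

A small result says only that a difference this large would be rare if the
group labels were arbitrary. It does not say how large the effect is or
whether it matters.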

The most important part of doing statistics right is actually understanding
what you're doing, and you're far better off using less powerful tools that
you understand than pumping data into fancy functions that you can't really
explain.

And more important than just raw number crunching: understanding statistics
gives you an improved intuition regarding data. The real value in statistics
is the new tools it provides for thinking about problems.

A better solution than "learn R" would be to read through something like Head
First Statistics and then move on from there to more advanced stuff, and only
then start hacking around in R.

~~~
glaugh
Could not agree more. Most stats tools are so complicated that one spends more
time learning/thinking about the logistics of running an analysis ("Should I
run a chi-squared or a Fisher's Exact Test? And how do I get it to run?") than
about the part that demands real human creativity: thinking through the right
questions to ask, biases in the sample, etc.

Disclosure/shameless-promotion, I'm the cofounder of
<https://www.statwing.com>, an easy to use stats tool.

------
droithomme
Statistics is incredibly important, since nearly every scientific publication
and public policy decision these days refers to statistical results to justify
its conclusions.

Given this, a class in "bad statistics" would be even more useful: how
numbers are presented using a veneer of statistical analysis to fraudulently
imply incorrect conclusions that benefit the interest group or organization
publishing or financing the study. Such a class would have no shortage of case
studies, for example the more than 50% of peer-reviewed journal articles whose
findings are actually false (per Ioannidis,
[http://www.plosmedicine.org/article/info:doi/10.1371/journal...](http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124))

~~~
ttunguz
Ha! That's a great point.

------
kfk
I say teach them the business. Statistics is a fine tool, but you need to know
what you are doing. You can build a regression line to predict sales in a
given month, but you might forget important drivers, or forget to just _ask_
people with experience.

On top of that, remember that looking at the past to predict the future does
not always work. First, because you might not have a past (if you are a start
up...); second, because new tech and facts might screw up your regression
analysis very easily, especially in a fast moving sector (and if you are a
start up...).

That said, I find simple _descriptive_ statistics very useful. I wish people
in business knew at least what a variance is.
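
For anyone unsure what a variance adds beyond an average, here is a minimal
sketch using Python's standard `statistics` module. The monthly sales figures
are hypothetical, invented only for illustration, and `summarize` is just a
helper name. Both series have the same mean, but very different spread.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical monthly sales, invented for illustration.
steady_sales = [100, 100, 100, 100]
swingy_sales = [50, 150, 50, 150]


def summarize(values):
    """Mean, population variance, and population standard deviation."""
    return {
        "mean": mean(values),
        "variance": pvariance(values),  # average squared distance from the mean
        "stdev": pstdev(values),        # square root of the variance
    }


print(summarize(steady_sales))
print(summarize(swingy_sales))
```

Looking only at the mean, the two series are indistinguishable. The variance
is what shows that the second one swings by 50 units around that mean every
month.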

------
dizzystar
One of the most difficult things to do is to explain to the numerically
illiterate how your conclusions are useful. Many people have a huge distrust
of math and believe intuition trumps all considerations. I think it's a
cultural thing. Don't hire people who are afraid of quantification. They don't
need to know the exact procedures, but they shouldn't be violently opposed to
quantification. Your sales team doesn't really need to understand statistics
beyond the fact that it serves a certain function: focusing on certain
measurable targets.

------
dpritchett
I'll bet you could tell some great stories from your AdSense days to
illustrate your point.

~~~
ttunguz
Yes, we used a ton of stats to inform product decisions, measure the health of
the auction and inform our sales team (prioritize leads).

------
freshhawk
birken has made this point already, but to state it more generally: if you are
already using Python in any way, then use that instead of R. You will be much
happier.

If you are not already using Python then it still might be the right tool over
using R. R has great functionality but my god are there ever warts in the R
language and ecosystem.

Also: I'll add my support to the opinion that you need to be very careful that
you understand the statistics you are doing or you will be asking your tools
to lie to you and not know it. A whole lot of the statistics you are going to
want to do will be more advanced than stats 101.

------
michaelochurch
Problem: then they will realize they are getting a shitty deal with 20,000
shares out of 80 million and want a real slice. Better to leave them
numerically illiterate.

