
How to teach sensible elementary statistics to lower-division undergraduates? - luu
https://statmodeling.stat.columbia.edu/2019/11/03/how-to-teach-sensible-elementary-statistics-to-lower-division-undergraduates/
======
eximius
I wonder if we aren't going about this a bit wrong.

As much as _I_ love learning about the nuts and bolts of statistics, would the
general populace not be better off learning the basics of why (experiment
design, independence, covariance, etc.) and then the basics of how, augmented
with tools like probability toolkits that do the math for you?

Because, honestly, the math in statistics can be boring and error prone. I'd
much rather a population capable of setting up experiments and using the right
abstractions and then using a calculator for it.

And as much as I want to strive for greater mathematical literacy, perhaps
that should be sought independently elsewhere. Diving too deep into the math
might impede understanding of statistics.

Of course, I don't know where or how you draw the line on who should be taught
what...

I'd also tend towards Bayesian approaches. P-hacking (or its equivalents) is
of course still possible, but there are presentations of results (i.e.,
probability distributions of a result value) which make hacking to an
arbitrary value less tempting. The Bayesian route does tend to have somewhat
more complicated math past the first introduction.

~~~
xouse
I've heard of experimental classes before, where instead of a normal grading
rubric that includes tests and homework and grades out of 100%, you
effectively start at 0% and you do RPG-like quests/tasks to gain experience
points and "level up" to your final grade.

I think if one has a statistics class that's aimed at giving people better
statistics literacy it would be a remarkably good fit for a gamified
interactive curriculum using an in class currency to make bets/predictions!

You could earn currency the same way you do in games, completing mundane tasks
you really don't want to do, like having every homework problem you
successfully complete give you a few gold. This incentivizes studying and the
effort adds some real emotional value to the currency you wager later.

You could have a whole Final Fantasy-style RPG underlayer where currency is
used to buy armor/weapons to make you stronger and help you fight bigger
monsters for more currency and advancing in the "game" with the ultimate goal
of amassing enough currency through your activities to "buy" your A in the
class or whatever.

Then you could have the meat of the class be in lecture scenarios where
everyone is presented with a situation that's implicitly meant to test your
knowledge of a common gotcha in statistical literacy like base rate neglect in
a live statistical simulation. You get to watch as your characters are
subjected to "rolls" of the dice that determine their fate based on if you
chose A or B. Do you believe the wizard with the diagnosing spell with a x
rate of false positive/negative who says you have disease such and such with a
y incidence rate?

It's just the basic base rate neglect fallacy scenario, but putting people in
a situation where they're incentivized to care about it and rewarded for
getting it right.
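
For concreteness, the wizard scenario can be sketched in a few lines of Python. This is only an illustration: the incidence and error rates below are made-up numbers standing in for the "x" and "y" above, and the function names are hypothetical. It computes the chance you really have the disease given a positive diagnosis via Bayes' rule, then checks that figure with simulated "rolls".

```python
import random


def posterior_given_positive(prevalence, false_positive_rate, false_negative_rate):
    """P(disease | positive diagnosis) from Bayes' rule."""
    sensitivity = 1.0 - false_negative_rate
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def simulate(prevalence, false_positive_rate, false_negative_rate, n, seed=0):
    """Roll the dice for n characters; return the share of positives who are sick."""
    rng = random.Random(seed)
    positives = 0
    sick_and_positive = 0
    for _ in range(n):
        sick = rng.random() < prevalence
        if sick:
            positive = rng.random() >= false_negative_rate
        else:
            positive = rng.random() < false_positive_rate
        if positive:
            positives += 1
            sick_and_positive += int(sick)
    return sick_and_positive / positives if positives else float("nan")


# Hypothetical numbers: 1% incidence, 5% false positives, 10% false negatives.
exact = posterior_given_positive(0.01, 0.05, 0.10)
print(f"Bayes' rule: {exact:.3f}")  # 0.154
print(f"Simulation:  {simulate(0.01, 0.05, 0.10, n=200_000):.3f}")
```

With these assumed rates, a positive diagnosis means only about a 15% chance of actually having the disease, which is the kind of surprise the bet is meant to expose.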

And by doing these simulations live in class it makes for a lot of spectacle
and fun with everyone getting to gamble on these scenarios and then get to see
the results play out in real time with all the suspense that gambling normally
entails.

You could even have group scenarios where the entire class has to pick option
a or b as a whole with everyone's money collectively on the line, and have a
heated discussion period where different parties are trying to explain why the
class should choose a or b. Imagine being the sole voice of reason trying to
valiantly explain to your class how base rate neglect works with everything on
the line!

This just seems like such a natural fit to me, and I'm really excited thinking
about it.

~~~
eximius
I thought I'd be onboard from the first paragraph but then you took a hard
turn into gamification.

That seems like entirely too much overhead for a topic that already doesn't
have enough time to teach the fundamentals properly. Not to mention that, at
some point, you hit a wall of severe diminishing returns on promoting interest
through gamification without enormous payoffs. People will either be engaged
with the material or they won't. Your class/their grade is simply not high
enough stakes for someone not interested in the material to slog through with
that extra effort.

What would happen is a couple folks who would have already been motivated will
suggest their answer and the rest of the room will follow one of them.

I like the 'pick and choose'-additive model you mentioned. It builds in extra
credit along the way, too. 12 weekly assignments worth 5 pts apiece (say 1
question, 1 pt, they can be longer questions), 3 exams worth 20 points each.

Of course, that is merely making explicit a fairly normal system where you
just don't know the assignment counts or weights beforehand.

But you could adjust it so that, instead of 12 homeworks, you can do a handful
of projects that require deeper understanding. For people uninterested, just
getting their credit, they can slog through the simple homework. For those
motivated to learn the material more deeply, they can do the harder, deeper
work (that hopefully takes less overall time - no need to punish them).

------
bonoboTP
Statistics can actually be taught in an elegant, "pure" and enlightening
manner without cramming formulas and rules of thumb.

I had some lectures of the rules-of-thumb type and nothing stuck for me. It
was all jumping across different approximations, implicit assumptions, "use
this formula if 5<n<30 and this other one if n>=30", the whole thing felt very
ad hoc.

Bayesian formulations and the general didactic style of machine learning texts
did the trick for me. By understanding the Bayesian approach, I can see
through the frequentist style much better, things don't seem nearly as scary
as before.

People treat statistics as some dark art, when its principles are actually
quite simple and the applied techniques can be derived nicely by making your
assumptions explicit and _knowing_ exactly where and by how much you're
approximating and why. You need to go through some simple proofs of how
distributions, e.g. Poisson and binomial become normal in the limit.
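
A numerical check is not a proof, but it can make the limit tangible before working through one. The sketch below (binomial case only, with p = 0.3 chosen arbitrarily and helper names that are purely illustrative) compares the binomial probabilities with the normal density that has the same mean and variance, and reports the largest gap as n grows.

```python
import math


def binomial_pmf(n, k, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def max_abs_gap(n, p):
    """Largest pointwise gap between Binomial(n, p) and its matching normal."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(n, k, p) - normal_pdf(k, mean, sd)) for k in range(n + 1)
    )


for n in (10, 100, 1000):
    print(f"n={n:5d}  max gap = {max_abs_gap(n, 0.3):.6f}")
```

The gap shrinks as n grows, which is the statement "binomial becomes normal in the limit" seen from the approximation side: you can read off how much you are approximating for a given n.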

Maybe different people are different, but for me learning things at a shallow
level is quite difficult. It rather works like I either understand it or I'm
just memorizing for a test with a very confused and scrambled mental model.
The point of "getting it" or grokking it is often quite sudden, not a linear
progression through understanding it. If I stop in the middle, I may be able
to reproduce the content of a course convincingly, by knowing what I'm
supposed to say and regurgitating facts, but not really being convinced of it
deep inside.

~~~
commandlinefan
Any book recommendations for somebody who ignored stats as an undergrad but is
now realizing how important/useful it really is?

~~~
conjectures
Bayesian Data Analysis by Gelman et al (author of the linked blog, as it
happens).

------
maps
How about we stop trying to take shortcuts and have an actual rigorous
progression? It sure would help with the problem of statistical errors in
the scientific literature. Sure, not everyone might be able to pass these
classes, but that is kind of the point. Why do we want to just pass through
students with weak to no understanding? Ah yes, tuition fees.

~~~
swiley
Having an argument for why rather than just being handed a formula helps me. I
don’t see why people think this would make it harder.

~~~
chongli
Because the “why” of statistics has many turtles before you get to the bottom.
Measure theory, topology, real analysis, abstract algebra. You need to learn a
lot of math before you get a complete picture of the theoretical underpinnings
of modern probability theory, which forms the foundation of all of statistics.

Most people just want to calculate a P value or a 95% confidence interval for
the mean of whatever they’re researching. They’re not interested in how it all
works.

~~~
adrianN
You need a bit of analysis, but I'm pretty sure that you can understand
p-values for normal distributions of a single variable without abstract
algebra or topology. Understanding simple cases from first-ish principles
makes it much easier to swallow handwavy explanations for complicated stuff.
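
As a sketch of how small that simple case is, here is a two-sided p-value for the mean of a normally distributed variable with known standard deviation (a z-test). The numbers are hypothetical, and the known-sigma assumption is a simplification made for illustration.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, sigma, n):
    """Return (z, p) for a z-test of the mean with known sigma.

    z measures how many standard errors the sample mean lies from the
    null-hypothesis mean; p is the probability, under the null, of a
    result at least that far away in either direction.
    """
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: 100 observations averaging 103, null mean 100, sigma 15.
z, p = two_sided_p_value(sample_mean=103, null_mean=100, sigma=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

Nothing here needs more than the standard normal CDF, which supports the point that this case is approachable from first-ish principles.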

~~~
soVeryTired
You can't understand the _why_ of the normal distribution without Fourier
analysis though - which is pretty heavy going for anyone who's not hardcore
science or engineering.

~~~
adrianN
Can't you introduce the normal distribution as the limit of the binomial
distribution? I think you can prove the central limit theorem without using
terribly advanced math.

~~~
chongli
That's only the most limited version of the theorem which has since been
renamed the de Moivre-Laplace theorem [1]. The rabbit hole goes much deeper
when you talk about the most general form which works for any set of
independent and identically distributed random variables, not just binomial
random variables.

[1]
[https://en.wikipedia.org/wiki/De_Moivre–Laplace_theorem](https://en.wikipedia.org/wiki/De_Moivre–Laplace_theorem)
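
For reference, the two statements being contrasted can be written side by side. This is the standard textbook form rather than anything specific to this thread. Note that the classical i.i.d. version (Lindeberg-Lévy) also assumes a finite, nonzero variance.

```latex
% de Moivre-Laplace: S_n ~ Binomial(n, p), with 0 < p < 1 fixed
\lim_{n \to \infty}
  P\!\left( a \le \frac{S_n - np}{\sqrt{np(1-p)}} \le b \right)
  = \Phi(b) - \Phi(a)

% Classical CLT (Lindeberg-Levy): X_1, X_2, \dots i.i.d. with
% mean \mu and variance 0 < \sigma^2 < \infty
\frac{\sqrt{n}\,\left( \bar{X}_n - \mu \right)}{\sigma}
  \;\xrightarrow{\;d\;}\; \mathcal{N}(0, 1)
```

Here \Phi is the standard normal CDF and \bar{X}_n is the mean of the first n variables. The first statement is the second one specialised to Bernoulli(p) variables.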

~~~
adrianN
Sure, but do you need the most general form to build some intuition about
p-values? I don't think so.

~~~
chongli
That’s moving the goalposts. The original claim was about getting a complete
picture. A full understanding of the “why” of statistics going all the way to
the bottom.

------
roenxi
Society at large would get a lot more use out of a course that teaches people
to spot the more common statistical paradoxes so they know when they need to
call in a Real Statistician.

There are a huge number of hot button debates where it is a pointless and
uphill battle talking to anyone who isn't familiar with Simpson's Paradox.
Pushing ideas like that out into the broader Arts, etc, communities would do a
lot of good. A course leading up to that insight would probably make a good
elementary stats course.
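
For anyone who hasn't met Simpson's Paradox, a tiny sketch shows the shape of it. The counts below are invented purely for illustration (they are not from any study): option A has the better success rate within each group, yet option B looks better once the groups are pooled, because A was mostly applied to the hard cases.

```python
# Hypothetical counts, made up for illustration: (successes, trials).
data = {
    "easy cases": {"A": (90, 100), "B": (800, 1000)},
    "hard cases": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, trials):
    """Success rate as a fraction."""
    return successes / trials


def pooled_rate(option):
    """Success rate for one option after lumping all groups together."""
    successes = sum(group[option][0] for group in data.values())
    trials = sum(group[option][1] for group in data.values())
    return rate(successes, trials)


for name, group in data.items():
    print(f"{name}: A = {rate(*group['A']):.2f}, B = {rate(*group['B']):.2f}")
print(f"pooled:     A = {pooled_rate('A'):.3f}, B = {pooled_rate('B'):.3f}")
```

The within-group comparison and the pooled comparison point in opposite directions, which is exactly the situation where it pays to call in someone who knows which comparison answers the question being asked.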

------
jimhefferon
We use a text (Lock, Lock, Lock, Lock, and Lock) that teaches confidence
intervals and hypothesis tests via simulation. I understand this is becoming
more common. We also get further past those topics than OP says: ANOVA is
very possible.
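
To give a flavor of what "via simulation" can mean, here is a generic sketch (not taken from the textbook) of a bootstrap percentile interval for a mean and a randomization test for a difference in means. The two small samples are made-up numbers and the function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_ci(sample, n_resamples=5000, level=0.95, seed=0):
    """Percentile interval: resample with replacement, collect the means."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


def randomization_test(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided p-value: how often does shuffling the group labels give a
    difference in means at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


# Hypothetical measurements for two groups.
group_a = [12, 15, 14, 10, 13, 16, 11, 14]
group_b = [9, 11, 10, 8, 12, 10, 9, 11]

low, high = bootstrap_ci(group_a)
print(f"95% bootstrap interval for mean of A: ({low:.2f}, {high:.2f})")
print(f"randomization p-value for A vs B: {randomization_test(group_a, group_b):.4f}")
```

Both procedures replace a formula with a loop, so students can see the sampling distribution being built rather than being told which table to look up.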

My degree is in Math but I've been impressed that Stats professional societies
take education very seriously. There is widespread, evidence-based, discussion
of how to present the materials.

I want to say one more thing, based on prior experience discussing this kind of
thing here: we should acknowledge that some students find this hard. I've been
teaching the subject for twenty five years and I really do have some idea of
what I am doing. Many students are taking this for their Liberal Studies
course and I don't teach at Harvard so those may be factors, but another
factor is that different people think differently. For some folks a five-step
argument (this is so, and from this we conclude, etc.) is just not how they
usually work. I think this course is a help to them, both in developing
intellectually and in worthwhile life knowledge, but a person also needs to
respect that they are doing something hard.

~~~
mkl
> We use a text (Lock, Lock, Lock, Lock, and Lock)

This is a real book! [http://www.lock5stat.com/](http://www.lock5stat.com/)

~~~
jpm_sd
and if you're curious about the back-story, as I was:

[https://today.duke.edu/2012/11/lock5stat](https://today.duke.edu/2012/11/lock5stat)

------
6gvONxR4sf7o
This might be election season speaking, but I'd love to have people take a
course based on case studies, telling misleading stats from suggestive stats
from strong evidence. I'm so frustrated seeing misleading figures parroted
around by talking heads in media, in politics, and online. Often it's not just
misleading, it's wrong. I suspect it's because these people (or their teams)
just don't have the practice digging in to do a cursory "Is this bullshit?"
check.

As far as societal bang for your buck, I'd give people bullshit detecting
classes. I'd focus on this goal specifically, not incidentally.

------
alexhutcheson
To keep them engaged you need some applied project work with real (although
preferably pre-cleaned) data sets and a real software environment. In my
college courses I would lose focus in the lectures when equation after
equation was presented and explained, but the homework projects with real data
forced me to learn and internalize how to use different statistical tools
(although it didn't really teach me how they are implemented).

In my college econometrics courses we used Stata for this, but I'd probably
recommend R if you have a choice. The book "R for Data Science"[1] is really
good for teaching the basics of data manipulation, graphing, and running
regressions. However, it's not a statistics book - you'd need to consider it a
"supplement" to teach applied skills. You'd also want to skip the chapters
that focus on cleaning data, programming, etc.

[1] [https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)

------
dmurray
For those unfamiliar with the "lower-division" term of art, Google offers:

> Lower division courses are any course taken at a junior college or community
> college or courses offered at the freshman and sophomore level at a four-
> year college or university regardless of the title or content of the course

~~~
throwawaysea
In some schools it refers to courses that are taken by undergraduates in their
first and second years (the earlier/lower half of the four year degree). It
happens to also be the case that you can transfer into a four year university
from a community college/junior college after two years, thus bypassing lower
division coursework. However this can vary a lot school to school and degree
to degree because sometimes four year schools want you to complete certain
coursework at their standards rather than external standards (especially if
the course is directly a part of the major you’re pursuing).

------
ArtWomb
One of the best "Intro to Stats" at scale is Kaggle's Titanic Survivability
dataset. Real-world, tangible data. Providing an intuitive feel for the power
of multi-variate linear regression. Those interested can then seek out a more
rigorous essential backgrounding.

Titanic: Machine Learning from Disaster

[https://www.kaggle.com/c/titanic](https://www.kaggle.com/c/titanic)

------
lordnacho
Seems to me the real issue is giving enough time to it. After studying stats I
always found it odd that most of a science course is "the stuff of that
subject" rather than stats, which would teach you things relevant to every
subject. After all every subject at some point says "we took these
observations, and because of that we think..."

------
BasilAwad
Seeing theory is good.

[https://seeing-theory.brown.edu/](https://seeing-theory.brown.edu/)

------
iandanforth
If anyone actually has this problem and doesn't know about StatQuest, well
then, StatQuest!

[https://www.youtube.com/user/joshstarmer](https://www.youtube.com/user/joshstarmer)

------
blzrdnofreespch
Here’s a really cool, very fun intro to stats article I recently came across:

[https://www.jwilber.me/permutationtest/](https://www.jwilber.me/permutationtest/)

If nothing else, it’s amusing.

