
I've been working on an introductory STATS book for the past couple of years, and I totally understand where the OP is coming from. There are so many books out there that focus on technique (the HOW) but don't explain the reasoning (the WHY).

I guess it wouldn't be a problem if the techniques taught in STATS101 were actually usable in the real world. It's a bit like driving a car: you don't need to know how internal combustion engines work, you just need to press the pedals (and not endanger others on the road). The problem is that z-tests, t-tests, and ANOVA have very limited use cases. Most real-world data analysis requires more advanced models, so the STATS education is doubly problematic: it teaches you neither useful skills nor general principles.

I spent a lot of time researching and thinking about the STATS curriculum and choosing which topics are actually worth covering; I wrote a blog post about this[1]. In the end I settled on a computation-heavy approach, which lets me do lots of hands-on simulations and demonstrations of concepts. That will be helpful for tech-literate readers, but I think also for non-tech people, since it's easier to learn Python+STATS together than to try to learn STATS alone. Here is a detailed argument for how Python is useful for learning statistics[2].
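To give a flavor of what I mean by hands-on simulation (a rough illustrative sketch with made-up numbers, not an excerpt from the book), here is how you might replace the usual two-sample t-test recipe with a permutation test in plain Python+NumPy:

    import numpy as np

    rng = np.random.default_rng(42)

    # Two hypothetical samples, e.g. outcomes under treatments A and B
    groupA = np.array([7.1, 6.8, 7.4, 6.9, 7.6, 7.0, 7.3])
    groupB = np.array([6.5, 6.9, 6.4, 6.7, 6.2, 6.8])

    obs_diff = groupA.mean() - groupB.mean()

    # Permutation test: shuffle the group labels many times and count how
    # often a difference at least as large arises by chance alone.
    pooled = np.concatenate([groupA, groupB])
    n_A = len(groupA)
    perm_diffs = []
    for _ in range(10_000):
        shuffled = rng.permutation(pooled)
        perm_diffs.append(shuffled[:n_A].mean() - shuffled[n_A:].mean())

    p_value = np.mean(np.abs(perm_diffs) >= abs(obs_diff))
    print(f"observed difference = {obs_diff:.3f}, p-value = {p_value:.4f}")

The p-value falls out of a loop you can see and modify, instead of a formula you have to take on faith.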

If you're interested in seeing the book outline, you can check out this Google Doc[3]. Comments welcome. I'm currently writing the last chapter, so I hope to be done with it by January. I have a mailing list[4] for people who want to be notified when the book is ready.

[1] https://minireference.com/blog/fixing-the-statistics-curricu...

[2] https://minireference.com/blog/python-for-stats/

[3] https://docs.google.com/document/d/1fwep23-95U-w1QMPU31nOvUn...

[4] https://confirmsubscription.com/h/t/A17516BF2FCB41B2




I wrote a paper about this a few years ago, aimed at education researchers: https://journals.aps.org/prper/abstract/10.1103/PhysRevPhysE...

The paper probably seems obvious to a lot of people, but when I gave talks, and when I read and reviewed papers, I found that people typically didn't know basic things: why you might leave some data out as a test set, why some models work better than others, when to use logistic regression versus linear regression, etc.
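For instance (a minimal sketch with synthetic data, not taken from the paper), the test-set and linear-vs-logistic points come down to something like this:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))  # hypothetical features

    # Continuous outcome -> linear regression
    y_cont = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)
    # Binary outcome -> logistic regression
    y_bin = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    # Hold out a test set so the score reflects generalization,
    # not memorization of the training data.
    X_tr, X_te, yc_tr, yc_te, yb_tr, yb_te = train_test_split(
        X, y_cont, y_bin, test_size=0.25, random_state=0)

    lin = LinearRegression().fit(X_tr, yc_tr)
    log = LogisticRegression().fit(X_tr, yb_tr)

    print("linear R^2 on held-out data:", lin.score(X_te, yc_te))
    print("logistic accuracy on held-out data:", log.score(X_te, yb_te))

Scoring on data the model never saw tells you whether it generalizes; scoring on the training data mostly tells you how well it memorized.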


Nice. I see you cover hierarchical (multilevel) linear models, which are already a big step up from the techniques normally covered in STATS101.

The general advice about measuring/comparing models also seems useful.


Yeah, I tried to make the point that even with multilevel models you should still consider their ability to predict, because otherwise how can you trust that the model captures the underlying correlation structure of your data? Many people had been advocating for these models dogmatically while presenting very poor fit statistics (R^2 < 0.2) and making big claims. Since I finished my PhD I've calmed down a bit, haha. Now I just run workshops and conferences instead, and I try to present statistics and machine learning as building a lab apparatus: once the model is built, you can ask it research questions, but simply building the model is not research.
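As a concrete version of that point (my own rough sketch with synthetic data, assuming statsmodels' MixedLM): fit the multilevel model on part of the data and report how well it predicts the held-out rest, not just the in-sample fit.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)

    # Hypothetical grouped data: 20 groups, each with its own intercept shift
    n_groups, n_per = 20, 30
    group = np.repeat(np.arange(n_groups), n_per)
    group_effect = rng.normal(scale=1.0, size=n_groups)[group]
    x = rng.normal(size=n_groups * n_per)
    y = 2.0 + 0.8 * x + group_effect + rng.normal(scale=0.7, size=n_groups * n_per)
    df = pd.DataFrame({"y": y, "x": x, "group": group})

    # Random split into train/test rows
    test_mask = rng.random(len(df)) < 0.25
    train, test = df[~test_mask], df[test_mask]

    # Multilevel (random-intercept) model fit on the training rows only
    result = smf.mixedlm("y ~ x", data=train, groups=train["group"]).fit()

    # Out-of-sample check using the fixed effects only, which is a
    # deliberately conservative measure of predictive ability.
    pred = result.fe_params["Intercept"] + result.fe_params["x"] * test["x"]
    ss_res = np.sum((test["y"] - pred) ** 2)
    ss_tot = np.sum((test["y"] - test["y"].mean()) ** 2)
    print("out-of-sample R^2:", 1 - ss_res / ss_tot)

If that number stays tiny, the random effects may be soaking up structure the model doesn't actually explain.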



