An undergrad class in statistics will do a better job, I believe.

I think it really depends on what you're trying to do. I'm all for statistical rigour, and stats will help you with a nice structured dataset. A lot of the time, though, knowing how to scrape data and convert between formats opens opportunities that a pure statistician wouldn't have. Most of the interesting visualizations I've seen lately don't involve much stats at all.

Neither a conventional undergrad class nor this is "better." They just have different focuses.

But many (perhaps "most") intro stats classes don't involve any programming. So if you want to "implement" anything, an intro stats class may not get you there (even if it gives you a better foundation to understand what various statistical manipulations actually mean).

Probably, but undergrad classes in Statistics are not available to everyone.

Did you notice any specific weak areas?

I think this is aimed at an audience outside the academic system, and I like the focus on active involvement with data. I think there could be an activist underpinning here - Paulo Freire style data literacy. (http://www.infed.org/thinkers/et-freir.htm)

Having said that, my Maths teacher self wants to do some work on the glossary. In the spirit of 'code talks' I'll post some definitions up and link them to the issue tracker and see what happens...

>Probably, but undergrad classes in Statistics are not available to everyone.

If you have an internet connection you can learn statistics; there are a lot of good resources. For example:

https://www.coursera.org/course/stats1 (maybe not undergrad level, but good place to start)


Speaking of Coursera, Data Analysis[1] starts in January - "applied statistics course focusing on data analysis".

[1] https://www.coursera.org/course/dataanalysis

Undergrad classes don't teach you how to scrape: http://schoolofdata.org/handbook/recipes/scraping/

Scraping is not related to data.

Why not? Scraping is part of the data acquisition and cleanup process. You need to do it unless you're working with Bloomberg terminals or Census data.

I agree. If I want to engage with my local government on a local issue (e.g. anti-social behaviour) I need data. The data is increasingly available on Web sites. Hence scraping and format conversion become important...
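
For anyone wondering what "scraping and format conversion" looks like in practice, here is a minimal sketch, assuming the data sits in a plain HTML table on a page you have already downloaded. It uses only the Python standard library. The names TableExtractor and html_table_to_csv and the sample table are made up for illustration, and real council pages will usually be messier (nested tables, merged cells, pagination).

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample standing in for a downloaded page; not real figures.
SAMPLE_HTML = """
<table>
  <tr><th>Ward</th><th>Incidents</th></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South, East</td><td>7</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_HTML))
```

The CSV writer takes care of quoting cells that contain commas, which is the kind of detail that tends to bite when converting formats by hand.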


Just the other day I was thinking -- I end up losing in debates because I'm unable to cite data. Scraping and acquiring data is a key part of research, so I'm very much looking for a text that presents the big picture as well as the nitty-gritty details from beginning to end.

