This demonstration of using R to geocode (via the ggmap extension of ggplot2) was particularly cool (and also, as an example of the OP's organized notes, includes a copy of the data since the original link went dead): http://f.briatte.org/teaching/ida/101_geocoding.html
Also worth noting is the amazing growth of R in just the last few years. http://www.tiobe.com/index.php/content/paperinfo/tpci/R.html
(I know that a ranking is not the greatest argument for a language, but it does show its growth, at least somewhat.) Specifically, the flexibility of R (twelve ways to do one thing) has allowed it to evolve quickly, and the libraries are just amazing. RStudio and Hadley Wickham's packages (ggplot2, dplyr, reshape2, tidyr, etc.) have changed R. They make the language do so much and change so quickly.
I used to be in love with all things Python, and while I still respect Python and Pandas, I've kind of moved toward more domain-specific tools.
Dplyr and ggplot2 (noted by baldfat) are exceptional.
I recently wrote a tutorial on dplyr here: http://www.sharpsightlabs.com/dplyr-intro-data-manipulation-...
To put this simply, dplyr's syntax is set up to create streamlined workflows. All of the major data management tasks (sort, subset, group, summarize) are easy to do. And they can be "chained" together (much like using pipes in Unix).
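To illustrate the chaining described above, here is a minimal sketch of a dplyr pipeline using R's built-in mtcars dataset (the column names and thresholds are just illustrative):

```r
library(dplyr)

# Chain the major data management tasks with %>%, much like Unix pipes:
# subset -> group -> summarize -> sort.
result <- mtcars %>%
  filter(mpg > 15) %>%                 # subset: keep cars with mpg above 15
  group_by(cyl) %>%                    # group by cylinder count
  summarize(avg_mpg = mean(mpg)) %>%   # summarize each group
  arrange(desc(avg_mpg))               # sort by average mpg, descending

print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes the chaining work.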
Ggplot (another R package) is an amazing data visualization tool. The syntax has a deep underlying structure, based on the Grammar of Graphics theoretical framework. I won’t go into that too much, but suffice it to say, when you learn the ggplot2 syntax, you’re actually learning how to think about data visualization in a very deep way. You’ll eventually understand how to create complex visualizations without much effort.
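A small sketch of that grammar in practice (again on the built-in mtcars data): a plot is data plus aesthetic mappings plus layers, added with `+`.

```r
library(ggplot2)

# Data + aesthetics + layers: the Grammar of Graphics in miniature.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +   # map weight and mpg to x and y
  geom_point(aes(color = factor(cyl))) +      # layer 1: points colored by cylinders
  geom_smooth(method = "lm", se = FALSE) +    # layer 2: linear trend line
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
```

Because each layer is a separate object, complex visualizations are built by composing simple pieces rather than configuring one monolithic chart call.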
ggplot2 and dplyr are the reason I settled on R (instead of Python). When you use them together (again, using "chaining") you can explore your data rapidly and also create really high-quality analyses.
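The "using them together" point can be sketched like this: a dplyr pipeline feeds its summarized result straight into ggplot2 (built-in mtcars data again, purely for illustration).

```r
library(dplyr)
library(ggplot2)

# Summarize with dplyr, then pipe the result directly into ggplot2.
plot_obj <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg)) %>%
  ggplot(aes(x = factor(cyl), y = avg_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Average MPG")
```

This works because `ggplot()` takes a data frame as its first argument, so it slots into the end of any dplyr chain.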
So in terms of Python resources:
- The classic NLTK book: http://www.nltk.org/
- A Programmer's Guide to Data Mining: http://guidetodatamining.com/
- Hitchhiker's Guide to Python: http://docs.python-guide.org/en/latest/index.html
- Statistical Inference for Everyone: http://web.bryant.edu/~bblais/statistical-inference-for-ever...
- Frequentism and Bayesianism: A Python-driven Primer: http://arxiv.org/abs/1411.5018
- Probabilistic Programming and Bayesian Methods for Hackers: http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabil...
- Software Carpentry's primer: http://software-carpentry.org/v5/novice/python/index.html
- And of course, LPTHW: http://learnpythonthehardway.org/
I also came across the Practical Python and OpenCV book and have found it really useful for implementing computer-vision exercises. It's not free, but the author has a blog where he regularly posts insightful examples: http://www.pyimagesearch.com/
(I haven't created any class-specific lessons but will definitely post them when I have them ready)
What I mean is that they have you spend weeks and months learning data types and 'for loops' when in reality, you don't need those to get insights from data. There are other toolsets (namely: dplyr and ggplot) that don't require you to know control flow, etc. If you're a developer, I'm sure that sounds strange, but believe me, you don't need software development style knowledge. What you need is to be able to do data manipulation and data exploration. You need to be able to turn data into insight.
Said differently, these courses are teaching statistics and CS in R. What they should be teaching is data manipulation and data visualization right from the start.
I'm not a developer, just a mechanical engineer, and I'm having extreme difficulty landing a job. Do you believe that doing the data-analysis course will allow me to get into the financial industry? I really enjoy the concepts of FOREX, stock trading, and setting up trading algorithms like the 'investment companies' or 'hedge fund' guys do.
This environment is... how to describe it? "often wild horse, sometimes donkey"
It can be very (very!) stubborn at times, and it has some well-known issues with very large objects on machines with little memory (though it links easily with databases, so this is a minor problem)... but it is also very rewarding.
I think that any hour you spend on R will be worth it. Just don't be a masochist: buy or borrow a couple of good introductory books, read some manuals and links like the one above, and, once you have some practice and want to improve, one of the best ways is to go back to the repositories, install some packages of your interest, and take a look at as much of their R code as you can.
Highly recommended. Try it.
Using R to simulate the finances of public sector pension funds:
Curiously, a job position was announced (four months ago; currently closed) at the end of the note at the same link. They were looking for an R expert.
From the syllabus:
> The aim of the course is to show how to perform elementary data analysis in the social sciences.
I feel like the time series section doesn't teach basic time series analysis at all. For example, they show plots of the ACF and PACF without going into detail about what those are and how they're different. I don't think that's helpful.
This looks like a nice set of examples, but far from an actual course!