And I also think you're taking the right approach by building/hacking/doing rather than going to an institution. This stuff is so new that I find myself spending 5-10% of my time just trying to keep up with the latest tech.
One thing I would add, though: data science is usually really three things: business knowledge, hacking, and lastly stats/machine learning. The stats piece is shockingly easy now that more and more modules/packages/libraries make it possible to create and train a model in two lines of code. (Applying the right model to your data set is the difficult part.)
The other shocking thing is that roughly 80% of my time is probably spent hacking, and most of that goes into just getting the data.
Quoting the section 'An Academic Shortfall':
"Academic credentials are important but not necessary for high-quality data science. The core aptitudes – curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature – that distinguish the best data scientists are widely distributed throughout the population."
In my estimation, none of those aptitudes are covered by teaching technical skills like databases, NLP, ML, graphical models, and the other topics this curriculum covers.
The "core aptitudes" generally boil down to asking the correct questions, establishing the correct answers, and correctly defending them. Academia doesn't automatically instill these skills, but it can do a great job of instilling them.
Either way, inside or outside the ivory tower, an ace programmer who masters NLP, ML, Hadoop, and everything else could easily still come out without the required core aptitudes, and be thoroughly unprepared to do what data scientists are really expected to do: answer questions.
This is an applied curriculum with a focus on specific technologies that enable analysis-minded people to leverage their brains with technology. That's why data work is so beautiful: it's a space to demonstrate unquantifiables like curiosity, diligence, creativity, and grit.
The quality of your projects is likely a good metric for your aptitude for data work, which is why I strongly advise working on a personal project.
I'd love to get more pull requests with more materials that teach analysis!
Preemptive note: this isn't meant to start a language flamewar; I'm genuinely curious.
Also, I don't have a list of these handy, but I've found long annotated notebooks/blog posts of worked data science examples very helpful for refreshing my memory on applied techniques. http://derandomized.com/ is a great example. Maybe other HN readers have some favourites we could add.
> I geared the original curriculum toward Python tools and resources, so I've explicitly marked when resources use other tools to teach conceptual material (like R)
Why did you choose Python over R? Personal preference, a bent toward Python in the online courses you found, or is Python generally considered the de facto language choice for professional data scientists?
I imagine you could tackle these courses with any programming language, but if Python seems to be the way the data science community is going, it would be helpful to know that. Personally, I'm curious because I'm trying to decide if I should pick up Python on the side to supplement the knowledge I already have of R and various other programming languages.
Also, thanks for putting all this together. It's great!
Otherwise, asking what technology to use is like asking what mode of transportation to use to get to a destination -- it's not the point. The important part is that you arrive. Some days walking over the mountain is the least sensible method; other days high seas make taking the boat around it impossible. The tool that gets the job done is the best tool.
Thanks for putting these resources together.
 -- http://nltk.org/book3/
In particular, http://videolectures.net/pascal/ has plenty of lectures and tutorials from their summer schools on topics highly relevant to machine learning.
Regardless of the path you take, these are very exciting times to be in computing. All the best to everyone, and keep upgrading your skills and knowledge :)
You'll be required to log in, is all.