You all are the experts on what you need, and I'm all ears. Fire away!
In my experience learning ML, studying the concepts in theory is all well and good, but I never really understood the details until I had to implement the algorithm.
The one thing I'd really change is to tighten up the range of tools used. It seems helpful to show students a range of tools, but it usually ends up being a major distraction for students and a lot of extra effort for course staff. Any such course is already going to be a blitz of new concepts and technology.
Go full Python, plus interactive tools where helpful (Weka, Tableau). Let them pick up R or D3.js or whatever later, once they have a better appreciation for the concepts that make those tools useful.
I would want something on identifying actionable dimensions, and on how to talk to people to figure out how to help them.
1. Computer Science:
- Algorithms/Data Structures: enough to identify which problems are CS problems and which are statistics problems.
- Systems: you may not have to build a system, but it is useful to know how real-world systems are built, what sorts of constraints come into play, and what trade-offs exist, especially if you will be working with large-scale datasets. You don't want to be remembered as the dude who ran a select * order by rand() limit 10 on an HBase table.
- Programming Languages: learn one programming language well. Depending on your job, you may need to learn more than one; Python is a nice starting language. One useful trick for picking up new languages is to learn one really well and then notice how the things you can do change in the next one. Also, side note: don't get into one-true-language debates. They are useless; every language has its pros and cons.
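To make the sampling anecdote above concrete: the constant-memory way to grab a uniform random sample from a table you can only afford to scan once is reservoir sampling. A minimal sketch (plain Python over an arbitrary iterable, no particular database client assumed):

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Uniformly sample k rows from a stream in one pass, using O(k) memory.

    Unlike `select * order by rand() limit 10`, which forces a full
    scan-and-sort of the table, this never holds more than k rows.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample

# Works on any iterable, e.g. a streamed scan over a huge table:
print(reservoir_sample(range(1_000_000), 10, seed=42))
```

The nice property for a course is that the uniformity argument is a two-line induction, so it doubles as a probability exercise.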
This is tough to characterize, because the field is so diverse.
2. Math:
- Probability: get some basic probability under your belt. Getting the intuition right is more useful than covering a lot of material. You can pick up the more advanced topics (stochastic processes, stochastic calculus) as you progress further anyway.
- Statistics: at the very least, figure out hypothesis testing, biases, p-values, estimators, and regression. The more statistics I learn, the more I'm of the opinion that the tools matter less than a critical understanding of where statistics should apply: what biases are there, and how you can identify them.
- Linear Algebra: again, a basic undergraduate linear algebra course (with vector spaces) should help you understand, say, matrix completion. Off the top of my head, I think grokking how vector spaces work, what independence means, and how dimensionality reduction and kernels work is useful.
- Machine Learning: this is mostly a tie-up of the kind of stuff you learn in the math courses. My basic 101 ML grad course covered the following: unsupervised learning (k-means and other clustering algorithms) and supervised learning (discriminative and generative approaches, the bias-variance tradeoff, etc.). I also learnt some silly bullshit on genetic algorithms.
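On the probability point: simulation is a cheap way to build that intuition before the formal machinery arrives. A toy sketch (standard library only, the birthday problem chosen purely as an illustration) that estimates a famously counterintuitive probability:

```python
import random

def birthday_collision_prob(n_people, trials=20_000, seed=0):
    """Estimate P(at least two of n_people share a birthday) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        days = [rng.randrange(365) for _ in range(n_people)]
        # A repeated day shrinks the set below n_people.
        if len(set(days)) < n_people:
            hits += 1
    return hits / trials

# With only 23 people, the collision probability is already above 50%,
# which most students refuse to believe until they run the simulation.
print(birthday_collision_prob(23))
```

Getting students to predict the answer before running it is the whole point of the exercise.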
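On hypothesis testing: a permutation test is a good first encounter, because it makes the meaning of a p-value concrete without any distributional formulas. A rough sketch (pure Python; the two samples are made up for illustration):

```python
import random

def permutation_test(a, b, trials=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the fraction of random relabelings
    whose mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    n = len(a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / len(b))
        if diff >= observed:
            extreme += 1
    return extreme / trials

# Made-up measurements for a control and a treated group:
control = [5.1, 4.9, 5.0, 5.2, 4.8, 5.0]
treated = [5.6, 5.8, 5.5, 5.9, 5.7, 5.6]
print(permutation_test(control, treated))
```

The p-value here is literally a count, which makes "probability of a result this extreme under the null" tangible instead of a formula.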
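On dimensionality reduction: the first PCA component needs nothing more than a covariance matrix and power iteration, which makes the linear-algebra connection explicit. A small sketch (pure Python; the 2-D toy data is my own invention):

```python
import math
import random

def top_principal_direction(points, iters=200, seed=0):
    """Find the unit direction of maximum variance via power iteration.

    Projecting onto this single direction is dimensionality reduction
    in its simplest form (the first PCA component).
    """
    d, n = len(points[0]), len(points)
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Covariance matrix C[i][j] = (1/n) * sum over points of x_i * x_j.
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Repeatedly apply C and renormalize; v converges to the
        # eigenvector with the largest eigenvalue.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Points scattered near the line y = 2x: the top direction should come
# out close to (1, 2) / sqrt(5), up to sign.
pts = [(x, 2 * x + 0.01 * ((x * 7) % 3 - 1)) for x in range(-5, 6)]
print(top_principal_direction(pts))
```

Seeing the eigenvector "emerge" from repeated matrix multiplies is a nice payoff for the vector-space material.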
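On the clustering side: Lloyd's k-means algorithm fits on a page, which is exactly why it makes a good first implementation exercise. A minimal sketch (pure Python, with two deliberately obvious toy blobs):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster emptied.
        centroids = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

# Two obvious blobs around (0, 0) and (10, 10):
pts2 = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(kmeans(pts2, 2)))
```

It also sets up a natural follow-on discussion: sensitivity to initialization and why k-means++ exists.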
So yeah, as long as you learn the fundamentals really well, you should be able to pick up new stuff fairly easily.
E.g. recommendation systems: I never learnt most of this in school as part of a specific course. However, once you know what goes on in matrix decomposition and what regression is, you can understand why people do what they do when they solve these problems.
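To make that concrete: the core of a rating-prediction recommender is just a low-rank factorization fit to the observed entries, i.e. regression plus matrix decomposition. A toy sketch (pure Python; the matrix, rank, and hyperparameters are all illustrative, not from any real system):

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=20_000, lr=0.02, reg=0.02, seed=0):
    """Fit R ~= P @ Q.T by stochastic gradient descent on observed entries.

    ratings: list of (user, item, value) triples; missing entries are
    simply absent from the list, which is the matrix-completion part.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        u, i, r = ratings[rng.randrange(len(ratings))]
        err = r - sum(P[u][f] * Q[i][f] for f in range(k))
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            # Gradient step on squared error with L2 regularization.
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Toy 3-user, 3-item matrix; unlisted cells are the unknown ratings:
obs = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 0, 5), (2, 1, 4)]
P, Q = factorize(obs, 3, 3)
# Predict the missing rating for user 2, item 2:
print(round(sum(P[2][f] * Q[2][f] for f in range(2)), 2))
```

Once students have seen this, the jump to "add biases, regularize harder, tune k" is incremental rather than mysterious.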
Do you have any recommendations for big-data components? Would it be worth teaching how to use a Hadoop cluster, or is a small toy cluster too far removed from what using a large cluster requires?