Alternatively: 1. Take Coursera's excellent Intro to Data Science for free 2. Sp...

joshz · on July 17, 2013

Incidentally Coursera's Intro to Data Science looks/ed to be a trial run for the first of three classes in the UW Data Science certificate program [1]. Each class is a bit over $1k. UW did the same with Intro to Computational Finance and Financial Econometrics.

[1] http://www.pce.uw.edu/certificates/data-science.html

codyb · on July 17, 2013

Kaggle looks really cool. Thanks for this comment introducing me to it. They even have introductory competitions for those not well versed in data science, with tutorials on things like Python and random forests. Pretty sweet!

djvv · on July 17, 2013

You would be surprised how many people in the field have never heard of Kaggle.

achompas · on July 17, 2013

Or how many in the field are extremely skeptical of any lessons an aspiring analyst could learn from it.

joncooper · on July 17, 2013

It seems that you know something about the field.

Perhaps rather than offering snappy responses with negative tones, you could offer something constructive to the discussion?

Say, what you think the skills required for day-to-day work as a data scientist are, and how you'd suggest someone develop them.

Perhaps also what you think the best approach is to credentialing your learning--grooming a pedigree--if neither Kaggle nor a degree program is a good approach.

achompas · on July 17, 2013

Sure thing. Sorry for the previous terseness--I really really really hate this whole "Coursera --> Kaggle --> DS job at Facebook" meme when it (rarely) appears on HN, since it isn't even close to reality.

I'm not a data scientist, but I work with them very closely as an engineer and I've considered going down the same path. When I talk about data scientists, it's not a reference to any of the following:

> Engineers working with big data technology, like Hadoop, Storm, Kafka, who are essential but often uninvolved in model construction and evaluation.

> Analysts who develop models, then hand them off to engineers/IT to code them up (or keep them in Excel spreadsheets).

Instead, I'm thinking about someone with a specific background. They likely have a PhD, since that's an excellent way to experience the "ask-explore-code-test-present" workflow needed to answer an interesting question with real-world implications. The strong academic background is not necessary, but it greatly reduces friction during the research workflow (since you've spent 3-4 years in it). I'm getting an MS and working hard to make it as research-oriented as possible, fwiw.

This person also has a strong foundation in applied math. They might have worked on signal processing questions, applied algorithms for learning Bayesian network structure to proteins, or thought about the transition from Hopfield networks to RBMs or whatever awesome deep learning stuff is going on nowadays. A guy I respect described this quality as that of "a traveler," someone who can understand advanced work in a number of disciplines in addition to their specialty.

This person is an engineer. They learn languages easily, understand algorithmic complexity and think about the complexity of their models. They don't have to be Linus.

Finally, the person is forward-thinking. They understand that questions are motivated by business needs, and that answering these questions can have serious implications for the company or its partners. I should channel patio11 here!

Anyway I'm obviously very opinionated about this, but it's just one opinion. I'm happy to discuss this more with anyone who's interested, though--contact is in my profile.

achompas · on July 17, 2013

This is a recipe to get very good at basic analysis. It won't prepare you for the day-to-day responsibilities of a data scientist, though.

xiaoma · on July 18, 2013

What do you feel would need to be added to the mix in order to prepare a person for the day-to-day responsibilities of a data scientist? Also, which of those responsibilities do you see as most challenging?

achompas · on July 18, 2013

Yeah, sorry for the snarky one-liners. I wrote a bit more here:

https://news.ycombinator.com/item?id=6060821

There are two pieces Kaggle can't help you with: working through the full research cycle and developing performant models. It also emphasizes the wrong goals (for example, error minimization is almost never your primary goal), but I need to work at some point and have spent enough time in this thread, so I'll skip that. :P Email me if you want to discuss, though.

Anyway, Kaggle can't help with the full research cycle, since you're not identifying a relevant question yourself (this is surprisingly hard) or presenting your answer to others. The second piece, developing performant models, is hard by any route, since you really only encounter that type of volume in industry.

sker · on July 17, 2013

Thanks. I've been spending too much time on HackerRank lately. This Kaggle site will provide some new, interesting challenges.