Review of the First Three Johns Hopkins Coursera Data Science Courses (jeffheaton.com)
65 points by ignacioelola on May 14, 2014 | 26 comments

I am probably not the typical student for these courses. I already work as a data scientist. Additionally, I am a Ph.D. student in Computer Science, and the author of several Artificial Intelligence (AI) books.

Coming from HN, this is one of the great things about Coursera courses for me. But it seems to freak out some students when the bar for top of the class is set by people with extensive professional backgrounds. [1] What can make for a corrosive environment is that there is often someone who stokes their ego by talking about how easy the course is in the forums...though this is rarely anyone at Jeff Heaton's level [or an HN'er].

[1] I took an introductory programming course with an HNer who had been on a C++ technical committee. But I am used to looking around the table and not seeing the dimmest bulb in the chandelier.

I completed the first two courses in the first four-week block and am currently taking the third course. Mr. Heaton's review provides a really good summary of what to expect.

Overall, I'm also happy with the course. I was expecting a somewhat higher degree of difficulty and a somewhat heavier workload than what I've run into so far. If you're an experienced developer with a GitHub account, the first course can be completed in a couple of hours. The R Programming course was more along the lines of what I was expecting. So far, the third class is closer to the first than the second in terms of difficulty, though it does require a bit more time.

Going into the course, I wasn't expecting to come out a "data scientist" ready to land a full-time job in the field. My experience so far confirms that expectation. But it's a fun course, a good way to get started in R, and a good springboard for exploring the field. It's nice to have deadlines as motivation to stay on track with learning. I'm hoping that by the end of the curriculum I feel confident enough to try to land some small freelance projects.

I'm paying for the "official" certification. I'm not sure if it's really worth it, but at $50/class it's not putting a big dent in my finances.

Same here. The first three were a good start. It was a little annoying that there is zero feedback on the peer assessments other than the grade. The third class starts off slow but by the last week is at about the same level as the R Programming course. My biggest complaint (which applies to all of these courses) is that the lectures are just narrated PowerPoint presentations. Are all MOOCs like that? I'm not expecting polished videos at the same level as Khan Academy, but these lectures are just a small step up from reading the slides myself.

Now I'm taking the next three. They are a good continuation that picks up where the first three left off. I was looking forward to the Statistical Inference class. It has been almost 10 years since I took intro to stats in college. For someone without any stats background this course will really step up the difficulty. I was even more disappointed with the lectures in the stats course. The yellow highlighting as he reads each line on the slide is extremely distracting. But the content is exactly what I was hoping for.

I've done a little hacking with R for data-heavy analysis at work when Excel couldn't handle the data. I'm really glad to be taking advantage of this opportunity to get more experience with it in these courses. My day job is implementing the 'production' side of this kind of data processing with Java and Hadoop in the healthcare space. Hopefully this specialization will help me better communicate with our clinical/science teams.

You summed up pretty well how I feel about the first three to four (I've taken the first three and am now taking "Exploratory Data Analysis").

The ease with which I'm going through it is due to my 20+ years of programming in multiple languages, so I've pretty much seen it all. I'm also assuming that these classes are intended for more statistics-oriented people who may not have a ton of programming experience. Maybe both, of course, with the later classes appearing to be a little more stats-oriented.

It is easy for me, but I am learning a lot too. I'm not sure I would have dedicated myself to learning R if I didn't have these classes or if I didn't have something in my job that needed it.

The projects are actually fun if you allow yourself enough time to make them into something more than they are. The "Getting And Cleaning Data" class had a little bit of leeway in the project for doing things your way as long as you documented why you did them.

Anyway, I took it on the Signature Track as well and am having fun, so it's definitely worth it to me. I'm really looking forward to the "Statistical Inference" and "Practical Machine Learning" courses to help spring me into more AI and Machine Learning topics.

I find it amazing how many complaints there are on the official forums for these classes. I'm not sure what people were expecting. Equivalent material elsewhere online comes with a 2000-4000 dollar price tag.

These classes will always amount to what people can make of them. My expectation from this isn't to become a data scientist, it's simply to improve from where I currently am. After I've taken a few more of these I'll try some Kaggle.

My only issue so far has been the quality of lectures in the Inference Class, but even then it's likely worth the time and money invested.

My interest was piqued by this article, but upon initial investigation, I'm having trouble stomaching putting $50 toward learning how to use GitHub... Any reason I shouldn't just skip ahead?

The $50 is an optional fee for the Signature Track. If you pay for this track, Coursera adds some extra checks to validate your identity and issues you a "Verified" certificate.

You can still take all of the courses for free and get a certificate, but Coursera won't validate that you actually did the coursework.

I'm guessing you could pick and choose to pay for only certain classes, but you have to pay for all of them to earn the overall "Specialization" certificate.

See the following url for details: https://www.coursera.org/specialization/jhudatascience/1?utm...

To employers out there, how much does the fact that a candidate completed a Coursera/Udacity course factor into your decision to hire him? Does it matter if he paid for the certificate?

I'm not an employer but do interview people for my organization and I'd see completion of online courses as a positive. Not necessarily for the knowledge gained but for the ability to finish what one has started and as an obvious indicator of interest in continuous learning.

Whether or not the candidate was part of a MOOC won't directly factor into the technical evaluation though, irrespective of whether the courses completed were relevant technically or not.

Completing a course does not factor into the hiring decision. A paid certificate does not factor in, either. In almost all cases, the decision to hire is based on:

1) Whether we like the person (i.e., can we get along with him for months, years, etc.)

2) Is the person honest?

3) Was this person recommended, and do your views align with the recommender's?

0, pretty much the same as a traditional degree. It wouldn't even cross my mind to ask about the certificate.

You wouldn't consider a traditional degree to have value in the field of "data science"? So to you, "data science" is indeed truly defined as "statistics without a proper statistical background"?

(It's an honest question, since that's my impression of "data science" as a statistician)

The certificate does not matter to me, but completing the course is a bonus if the candidate mentions it. It demonstrates to me the ability to effect self-improvement.

I'm about to finish the ML course (after 3 tries). Essentially you have to battle a huge amount of boredom and keep doing it week after week without stopping. Just yesterday I took a quiz without even watching the lecture or anything: scored 4.5/5. Proceeded to finish the programming assignment, again without bothering with the lecture, 100/100. Not to say I won't watch the lecture later; I want to make sure I know exactly the way it's done (will do today probably; I wanted to finish the assignments fast this week because last week I almost went past the deadline). But they are passable without that much effort. Just persistence.

But they are passable without that much effort. Just persistence.

It depends on a person's background. There are people like Jeff Heaton who have years of professional experience in Data Science taking an introductory data science course, and there are invariably people taking machine learning with no programming experience. But in the sweet spot for any class there will be people who are really stretched. People who will spend twenty or thirty hours on a programming assignment that some people complete in one.

I enrolled in the last iteration of Van Hentenryck's Discrete Optimization. A great course and his enthusiasm is infectious. I learned a lot. Saw where I needed to go. But there was no chance I was going to pass. I just don't have the chops...yet, hopefully.

One of the things that makes Discrete Optimization a great MOOC course is that it can be approached at different levels. A student can attack the problems using dedicated optimization libraries. If that's not enough of a challenge, they can write their own algorithms. And if that's not enough, they can prove optimality for each of their solutions.

And like every Coursera computing course there are people who can do all of it. And most who cannot.

Damn, now I had to add it to my watch list in Coursera, seems very interesting. Of course, my background is in maths, so even if I did no machine learning back then I know how to fill the gaps as needed and as fast as possible. If someone just rang my door and told me he passed ML without any prior programming experience, I'd hire him: it definitely shows an incredible will and self-learning ability. In my case... Well, I got to know a little better some formalisms of machine learning, and also got me thinking more about ML problems, which is fun and gets me brewing with interesting ideas.

Discrete Optimization is full of fertile ideas. And they're all NP-hard. And that means that any success is meaningful. It inspired me to work on getting the tools I need for another assault.

I've become good at understanding lecture videos played back at faster than real-time. Sometimes I can get up to 2x playback speed with VLC and still understand what's being said. It'd be so boring watching things in real-time. I'm so glad Coursera lets you download videos so you can do that.

Yes, absolutely have to agree with this! The lectures can be time consuming in some of the classes and this really makes it bearable. You really just need to get the gist of what the professor is presenting and then use the lecture notes/presentations as-needed. I would probably skip the video lectures altogether, but occasionally there is something in the lectures themselves that you need to pick up on. At least this has been my experience with the "Data Analysis" classes, my first foray into MOOCs.

The online viewer at Coursera (including the iOS app) also has a speed setting: I watch them at between 1.5x and 1.75x. Occasionally 2x, and more often than not I skip 30 sec chunks when nothing interesting is going on.

Yes X 1000! Once I started doing this I started to finish courses, and lectures. Before, I was nodding off within a half hour. I'm not a smart guy either. Just an average, or low average IQ (too lazy to take the test actually). I watch the videos at 1.4x or higher, and slow down the speed when something complex is introduced. (Teacher must have good diction, and an accent I'm familiar with--no mumblers)

It is amazing how under-valued persistence is in engineering. It is the trait that I find the most exciting to discover in any candidate.

I think we are doing the specialization at the same time.

As an open source maintainer I can say that “question asking etiquette” is NOT common knowledge.

I don't have any problem with having that as a credit-earning question. But I hated it when it carried 20-25% of the credit in one of the quizzes. This course could have asked more questions in the quiz to test its students more.

Could anybody say how these compare to Andrew Ng's ml-class? I'm almost through ml-class and I'm wondering whether these courses add much.

Andrew Ng's class has hours and hours of content each week and it is pretty thorough (I am just auditing, don't have much knowledge).

I feel that the Data Scientist's Toolbox was a slacker course; it could have covered more topics or tested the students more. Five questions in each quiz and four such quizzes and you are done.

These courses are morale boosters: you can earn a certificate in four weeks, and that keeps some motivated.

I like the concept of five late days for the whole course; it lets you stay on track despite a busy schedule.
