New Data Science Certificate Program (datascience101.wordpress.com)
132 points by swGooF on June 11, 2012 | 50 comments

Why is this the top link on HN? There are already numerous courses available that will allow you to learn this stuff for free from very highly ranked universities, including Stanford [1] and CMU [2], among others. This will just teach you similar things while also taking your money and giving you a "certificate".

I guess if you want to enter a new field and you need to have some certifiable expertise, this may be a good option. That being said, if the field you plan on entering really does require some documented education, having this certificate will not even put you on the same playing field as those with actual degrees in the field, not to mention those with advanced degrees.

[1]: https://www.coursera.org/course/ml

[2]: http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml

The Coursera courses are excellent, but Coursera, Udacity, CMU, etc. are offering a different set of courses than the UW program. For instance, I don't think any of the current players are offering Hadoop courses... In general, it looks like the UW program is more technology-specific and applied than the other programs.

Personally, I'd prefer the less technology-specific topics already on offer. But, my employer would be much more likely to hire someone with UW's course-mix. So, there should be some demand for that.

And, if we are talking about a career decision, $3,000 is small potatoes compared to the value of getting the right topics.

BigDataUniversity (http://bigdatauniversity.com/) also has free courses and covers Hadoop and some other stuff.

The site appears to push sponsor products and doesn't really talk about alternatives. Expect to see lots of endorsements of IBM products.

For me, I'm wondering if I'm learning the right material. If I self teach, how do I really know if I'm doing this right unless I get feedback? Also, many hiring managers might not recognize self-taught expertise. I might not want to work for those managers, but unfortunately, this includes a very large swath of potential jobs.

Of course, I wonder if this certificate gets me anywhere as far as employment goes.

> Why is this the top link on HN?

Because this is the response of the "university system" trying to protect its cash flow. Perhaps you missed it, but Udacity said recently it will do some certification program for "minimal costs", which was the beginning of the conversation. Education should be free (Khan style), says Udacity.

The University establishment's response is "here we will give you certificates, but you have to give us three grand."

Money for certificates, knowledge for free.

I would think this would be more appropriate to add to an existing skill set in another field.

Yes, there are some good options available online. However, the certificate program and data science are more than just machine learning.

When I hear "Scientist" I tend to think "PhD level work." Graduate level work in math, Comp Sci and statistics is not something that can be readily compressed into a 9 month program without substantial prerequisites.

Are today's "data scientists" really just software devs who have specialized in digging around in data and using various data mining algorithms with only a superficial understanding of their inner workings?

While I'm sure we have different definitions of "superficial understanding" one thing I've noticed as I've gotten more interested in ML/datamining during the final stages of my master's in CS is that solving real world problems with these techniques is often a very different experience than deeply understanding the theory behind them.

For example I couldn't implement an SVM library from scratch to save my life, but I do understand what it means to be a 'maximum margin' classifier, from a high level how the 'kernel trick' works, and why you would tune regularization and cost parameters. However this knowledge has been enough to help me in quite a few interesting problems.
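To make the "maximum margin" and regularization ideas above a bit more concrete, here is a minimal sketch of a linear soft-margin SVM in plain Python. It is not how any particular SVM library works: it swaps the usual quadratic-programming solver for simple subgradient descent on the hinge loss, it has no kernel, and the function names, the toy data, and the defaults for `lam`, `lr`, and `epochs` are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.01,   # regularization strength (assumed default)
    lr: float = 0.01,    # step size (assumed default)
    epochs: int = 5000,  # number of full passes (assumed default)
) -> Tuple[List[float], float]:
    """Minimize (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b))).

    Labels must be +1 or -1. A larger lam penalizes large weights more,
    which favors a wider margin at the cost of more margin violations.
    """
    n = len(points)
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        # Gradient of the regularizer; the bias is left unregularized.
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            # Only points inside the margin (or misclassified) contribute.
            if y * (dot(w, x) + b) < 1:
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify by which side of the hyperplane w.x + b = 0 the point is on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Toy, linearly separable data (made up for this example).
toy_points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
toy_labels = [-1, -1, -1, 1, 1, 1]
weights, bias = train_linear_svm(toy_points, toy_labels)
```

Raising `lam` trades training accuracy for a wider margin, which is the knob the comment above refers to when it mentions tuning regularization and cost parameters.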

Reading accounts of how others have solved real world data mining issues it's amazing how often a very simple model will do the job, and also how often, even among more serious researchers, there's a bit of intuition in finding the right combination of parameters, and lots of trial and error in searching for which model/blend of models really does the job.

I think there's a lot of room for more people approaching data mining with the 'hacker' mentality. Sure you don't want 'data scientists' using a randomForest whose eyes glaze over when you mention the word "ensemble", or someone who couldn't explain in plain terms what a "maximum margin hyperplane" is. But, there is a growing space for practitioners in this space, that aren't necessarily as strong in the theory as people working in the pure research space.

Slightly off topic, but I just completed the online Learning from Data course offered by Caltech (for free), and am pleased to say that 1) I understand this thread, and 2) I have implemented an SVM starting with a quadratic programming package (not quite from scratch!). I highly recommend the class for anyone interested; a re-run of the course is starting soon: http://work.caltech.edu/telecourse.html

good to know, thanks for sharing

Much of what you say resonates with what I have seen from a different vantage point - old school software dev who has dabbled in data mining. Simple models often work and are preferable - easier to explain. More data and simple models generally provide superior results to small amounts of data and complex models.

It seems like "ensemble" methods - combining the results of several different algorithms - are generally a less-than-rigorous exercise that involves throwing a bunch of different approaches at the problem and averaging the results.

It is good to hear that there is "a growing space for practitioners in this space, that aren't necessarily as strong in the theory." But the term "Data Scientist" seems a bit lofty for folks doing this sort of work.

The thing is that "less-than-rigorous exercise" is true in many areas of ML. Take neural nets, for example, which are very popular and successful: even among real experts there's a lot of 'magic' behind why they really work. SVMs are loved partially because they work well, but also because they are very sound from a theoretical standpoint; if you know the math you can show that they will work. This is not necessarily true of many other successful techniques.

Interesting side note for ensembles: 'averaging' is usually not one of the best methods for blending results. More successful approaches include using either a perceptron or simply training a linear model to find appropriate weights for predictions from each individual model. I've even had a case where simply picking the MIN of each set of predictions worked surprisingly well for a particular problem.
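The three blending strategies mentioned above (plain averaging, a learned linear blend, and taking the MIN) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy predictions, and the gradient-descent settings are made up here, and the weights are fit on the same data they are scored on, whereas in practice you would fit them on held-out predictions.

```python
from typing import List, Sequence


def average_blend(preds: Sequence[Sequence[float]]) -> List[float]:
    """Per-example mean of every model's prediction."""
    return [sum(col) / len(preds) for col in zip(*preds)]


def min_blend(preds: Sequence[Sequence[float]]) -> List[float]:
    """Per-example minimum across models."""
    return [min(col) for col in zip(*preds)]


def apply_blend(preds: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of model predictions, one weight per model."""
    return [sum(w * p for w, p in zip(weights, col)) for col in zip(*preds)]


def mse(pred: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, targets)) / len(targets)


def fit_linear_blend(
    preds: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.01,    # step size (assumed; must be small for stability)
    epochs: int = 5000,  # number of passes (assumed)
) -> List[float]:
    """Fit blend weights by gradient descent on MSE, starting from equal weights."""
    n = len(targets)
    weights = [1.0 / len(preds)] * len(preds)
    for _ in range(epochs):
        errors = [b - t for b, t in zip(apply_blend(preds, weights), targets)]
        grads = [2.0 / n * sum(e * p for e, p in zip(errors, model)) for model in preds]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Toy data (made up): model_a is close to the truth, model_b is noisy and biased.
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.1, 1.9, 3.2, 3.9, 5.1]
model_b = [2.0, 2.5, 4.5, 4.0, 7.0]
learned = fit_linear_blend([model_a, model_b], truth)
```

Because the fit starts from equal weights, which is exactly the average, the learned blend can only match or improve on averaging for the data it was fit on; whether that carries over to new data is what a held-out set is for.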

The above btw is something that I think a "Data Scientist" should know, and is well out of the scope of a software engineer who just plugs values into prepackaged algorithms. A "data scientist" should be able to read papers [1] that explain these things, which is more than many software engineers do.

Now I'm not a data scientist, but while I can't write an SVM from scratch, when I'm working on data mining problems I am reading several academic papers a week. I really think we're looking at two sincerely distinct areas of expertise and it's not too lofty to look at someone who has to read academic papers to do his job as a "scientist".

[1] http://www.edscave.com/docs/Blending_Methods_AusDM2009.pdf

> Are today's "data scientists" really just software devs who have specialized in digging around in data and using various data mining algorithms with only a superficial understanding of their inner workings?

Ideally, no. We're witnessing an overuse of the name "data scientist" (which has its own problems, but that's another story). There's a non-trivial difference between a data scientist who understands the theory used for the EM algorithm or belief propagation, and a "data scientist" who is performing large-scale data analysis using various data mining tools.

Unfortunately, they're both getting lumped together. To become one of the former, you need graduate-level maths, CS, and statistics, while this certificate caters to the latter.

> When I hear "Scientist" I tend to think "PhD level work."

There's a lot of non-PhD level work that goes into a large research project, often for aspects you may not have considered like animal care.

A "Data Scientist" might be a specialized software developer, but it doesn't follow that they have only a superficial understanding of the inner workings, even if that understanding is not good enough on its own to do much original research.

This is the first I've heard the term "Data Scientist," though Harvard recently announced a masters-level "Computational Science and Engineering" degree. (http://news.harvard.edu/gazette/story/2012/06/a-new-masters-...)

The bottom line is that due to the amount of data being generated by research, demand for programmers to help deal with it is rising.

"opportunism 101"

I love this explanation on quora btw. http://www.quora.com/Career-Advice/How-do-I-become-a-data-sc...

Most likely Udacity will have all those classes for free within another year anyway, with the opportunity to get a certificate that's actually widely recognized. Not trying to knock the UW program, but going to college to learn CS just seems like it's going to become really unnecessary really fast.

It's funny, on the NYC subways the city has now put up ads warning kids against going to college, and telling them to call the hotline to ask if the college is credible before enrolling.

Wow. Are the warnings targeting college in general, or the diploma mill types?

Considering that it's the subway, I'm guessing that it's probably the diploma mills. The last time I took the subway, which, granted, was three years ago, the place was plastered with ads for diploma mills.

It seems like sage advice either way. Going to college certainly has merit if you are there for the right reasons, but blindly going to college because that is what graduating high school students do is a mistake many make. Many of those people would be better off exploring other opportunities.

It's a little ambiguous. It basically says that many colleges are very expensive and none of the students end up getting jobs, so you should call the number to learn what kinds of questions to ask to evaluate whether the college you're considering attending is a scam or not. I wanted to take a picture but there was someone sitting in front of it. The first time I noticed these ads was Friday; I think they're either on the S or the 1.

The ads are targeted towards trade schools and GED programs.

A more accurate headline: Get a Certificate in Data Science in 9 months.

That is probably true, but I think the goal of the certificate would be to become a data scientist.

"scientist" usually implies a doctoral degree in some field.

I've known quite a few people who had very responsible day-to-day jobs conducting scientific testing who didn't have PhDs. I'm not sure I would exclude them from the category of "scientists".

Is science only science if it is aimed towards publishable research (which is clearly the academic view)?

Scientists communicate and leave archive quality research by way of publications. Anyone can "do" science, but I think it is a bit unfair to arbitrarily level-up smart people to the same level as those that go through years of training.

The lab techs that do scientific testing under the direction of a PI, they do science, right? Does that mean we need to relabel their badges to Scientist?

That just sounds like a debate about status - which I appreciate is very important inside academia but far less so outside.

I'm pretty sure it implies the use of the scientific method.

That's the old economy definition, anyway.

Looking at the UW description of the program, it is assumed you are already a software engineer or statistician. This isn't for someone walking in off the street with no background in the subject.

If it wasn't offered under the auspices of UW, I would have thought it a scam...

Me too at first, but that's because the link goes to a 3rd party blogger, not the actual program being offered.

"...the cost is around $3000" and the primary benefit seems to be you get to call yourself a "Data Scientist". There are tons of free resources online for "hadoop, NoSQL, machine learning, statistics, graph algorithms..".

That's true, but I find the one thing missing from learning online is curriculum: I can find a ton of resources to learn a specific thing, but because I don't know what I don't know, I don't know where to start. It's also hard to judge the quality of the material without being versed in the subject. This sort of structured course adds value in both of those areas. Whether that's worth $3k is a matter of opinion.

You can actually become a data scientist in 9 seconds because the term has no meaning.

Well, for Danes it would mean computer science. In 1966 Peter Naur coined the word datalogi (data-logy), which has since been used in Danish for the subject computer science.

The data scientists I work with would strongly disagree. I've discussed this a bit on HN, and while the title is a bit nonsensical, it has come to stand for a mix of responsibilities.

Don't take my word for it, though--you should visit us and find out what data scientists do!

For a graduate certificate $3000 is actually quite pricey. At smaller universities in Europe it is possible to earn an MSc in Business Intelligence & Data Mining via distance for just €4000 (http://www.itb.ie/StudyatITB/bn518BID.html). Plus they focus almost exclusively on open source software such as RapidMiner.

I looked at that link and am interested.

8 hours of online lectures per week + whatever offline work, for 4 semesters. That format, at minimum, passes the "sniff test". I think four semesters at that rate is long enough to legitimately teach the content.

My only question now is how well received is it by the world at large? Does anyone have hands-on experience with ITB?

From what I understand, the degree is more or less a structured way to become really versatile in RapidMiner. The relationship between the software development team and those teaching the degree is very close (http://rapidminerresources.com/index.php?page=training / http://en.wikipedia.org/wiki/RapidMiner). Therefore I suspect the core motivation for those choosing this degree is to reach a high level of proficiency in that particular software by studying with one of its developers, and earning on the way an MSc from a state university. It is a legitimate university, but don't expect any employer to go wild about it.

Just as a disclaimer: I have no relationship with ITB or RapidMiner. A while ago I played around with the software because some of the tutorials are really interesting and very accessible for someone like me who lacks a deep statistical understanding (e.g. http://www.youtube.com/watch?v=OXIKydgGbYk).

This prospect makes me more interested. Learning theory is nice, but I'm interested in these topics primarily because I'm interested in the data and that requires proficiency in a good tool.

It's a bit more than half the price for a course that is half as long, so as such the pricing seems fair enough. Also it's not just a random certificate, but a certificate issued by a well known University. I'm sure ITB is a fine university, but it's hardly got serious name recognition, so I don't think that an argument can be made that an MSc from ITB will be worth more than a graduate certificate from UW.

The problem with certificates is that you really have to look very closely to understand what was covered. For example, Stanford has a range of certificates, some taking as little as about one day of work, the others a few semesters. There are so many factors involved, such as professional, undergraduate, graduate, for-credit, for continuing education credit, and so on. Therefore I became somewhat skeptical of them, as they almost always require more explanation than is available on a CV (if we are talking brand value here).

I like UDub even though I never went there (in fact I'm headed there right now), but I really hope they enforce the stated minimal program qualifications, i.e. applicants show aptitude in math, engineering, database design, programming, or the other things listed:


That is a great point. Enforcing the enrollment qualifications can increase the quality of the content.

Nice, I should have spent some time doing this instead of working through Axler and Feller through my undergrad and grad school.


What advantages in life does having a certificate yield? Would an employer really care if I had this certificate vs if I learned it on my own?

Like anything, you cannot predict the future. You might find your perfect dream job as a result of having the certificate. You might just as easily miss out on the opportunity to take your perfect dream job because you're busy acquiring the certificate. Anything can happen.

My personal advice is to ignore any future advantages that may or may not come as a result and focus on doing it because you simply want to do it. If you enjoyed the process and feel fulfilled at the end, it doesn't matter what may come as a result. If you need to work hard to sell yourself on the idea of doing it, it is probably not worth doing.
