Intro to Data Science at UC Berkeley taught by Jeff Hammerbacher (datascienc.es)
165 points by rxin on March 14, 2012 | 24 comments


We're on iteration 2 for this course, and it's still in somewhat rough shape. If you plan to devote significant time to the lectures, I'd recommend waiting until next spring, when we'll be teaching iteration 3 online at http://ds-class.org.

Later, Jeff

Awesome! The URL seems similar to other Coursera courses. Will this course be offered online through Coursera?

Maybe it's just me, but "data science" seems to be basically enterprise information systems (ETL, dataflow diagrams, data mining, data warehousing, business intelligence, and so on), but with a cool mobile-social-esque feel associated with the term.

I had classes in information systems analysis & design with the exact same content.

There have definitely been some ruffled feathers among BI professionals and data analysts over the "data scientist" title. We liked this band before they were popular, man!

And there's some truth to the criticism that it's mainly a rebranding. Someone (can't recall the source, sorry) recently defined "data scientist" as "a data analyst who lives in California."

That said, even though many of the generalized tasks are the same, I think there's some value to the title. There is a broad range of BI pro and analyst roles that don't fit it. Lots of BI pros just make SSRS reports, build star schemas, or look at data for insights, but don't apply any hypothesize, test, repeat method.

The key differentiators for a data scientist, IMO, are:

- they can do everything required to go from piles of unorganized data to usable insights: data munging, visualization design, programming, applying statistics correctly, and analyst activities like knowing which business questions to ask

- when doing analyst work, they use scientific(ish) methods to test and verify hypotheses about the data.

That describes many data analysts and BI pros who don't have cool titles now, but may soon. Recognizing the difference between people and businesses that do all of that vs. report writers and ad hoc OLAP browsing users is valuable and positive, IMO.

So, basically, you are saying that the main difference is that data scientists also make decisions based on the data, while the BI/DA person works as a "data guy" for executives. Is that a correct way to put it?

In a way, there seems to be a parallel between enterprise programmer vs. hacker and business intelligence/data analyst vs. data scientist.

Yeah, or at least the execs are saying "getting more users is important. How can we improve signups?" instead of "get me a time-on-signup-page metric on report X."

A data scientist is like an analyst that doesn't have to go beg the tech guys to collect a new data set or build a new mining model, etc.

When Jeff & DJ Patil started using the term "data scientist," they were at Facebook and LinkedIn making products ("People You May Know," etc.) via machine learning on massive datasets.

It may be my ignorance, but when I hear "enterprise," "BI," "ETL," etc., I'm picturing some poor analyst doing database JOINs in order to dump the latest widget numbers into a PowerPoint table for the next board meeting.

Insofar as there is such a thing as "data science," I think it means making transformative use of data (i.e., by creating tools or models), not just summarizing it.

The idea is that new technology makes all of this doable by one person. That is what he's teaching.

For some reason, I viewed data science as all of that (only as a means to an end) plus statistical analysis. Only one way to find out. =p

Some of that does include statistical analysis.

Lots of back-and-forth over the nomenclature, as usual. I hope not to obfuscate further by adding my definition.

I'm currently a BI consultant aspiring to the title of data scientist and here's my motivation...

Traditional business intelligence skills basically refer to people who are "IT guys that have finance knowledge," so generally you'll find yourself doing fairly general reporting along with some financial performance management (FPM), albeit at the data modeling/metadata modeling level (you're building metadata models, cubes, and dashboards with drill-down, not just flat reports). All of this is done at the whim of some exec/BA/line manager, all of whom (in my experience) seldom understand the subject well enough to pose sensible strategic questions.

Data science implies several levels of creativity expressed through solid technical skills, along with a dash of journalism. Maybe it is just a rebranding, but what it represents to those in the field is a total paradigm shift in where and how the skills are applied. This is key, because all too often my work as a BI consultant boils down to churning out X number of meaningless reports by a certain date so that some head of department can get his bonus and justify the Oracle purchase that, incidentally, resulted in a three-day trip to Paris funded by a stunningly sophisticated sales team.

If I come off as cynical, it's because I am passionate about data. I believe that data science and the paradigm shift it represents have the power to really change human lives, and that they have a key role to play in the future evolution of our species.

This year's offering has some changes from the Spring 2011 version of the course (the assignments are all different), but you can view the Spring 2011 version at http://datascienc.es/spring-2011-course/

Does anyone know the password to watch the old videos?

Try the course name, minus the "intro to", in one word.

That would be "datascience", lower case.

Bummer. It appears we have been locked out from the videos.

That worked for me. Thanks.

Related: the Data-Driven Modeling course by Jake Hofman (Yahoo Research): http://jakehofman.com/ddm/recent-posts/

It has lots of practical exercises, playing with real data and APIs.

I must say I am quite disappointed. I watched 2 hours of this, then started skimming. Sorry, but everything I have encountered is quite trivial (and I see myself as a rookie in the field). Did other HNers find anything new in these lectures? If so, please point it out.

I took this class last year (Spring 2011). Hammerbacher's a great professor. He focuses on teaching real-world data analysis tools and skills.

It does look great, especially the examples.

I'm confused since it looks like the course has already started. Can we still submit old assignments for credit?


It has videos from last year (Spring 2011). Videos for this year's offering should be posted online later.

Nice collection of resources: http://datascienc.es/resources/
