Data Mining and Analysis: Fundamental Concepts and Algorithms [pdf] (dcc.ufmg.br)
158 points by rawfael on Sept 15, 2013 | 12 comments

I had the luck of being taught by Prof. Meira - who is one hell of a professor - while this book was in draft stage. Here you'll find not only a great toolkit of techniques of data mining, but more importantly, a very comprehensive arsenal of concepts to properly analyse your data and the results of the techniques applied.

As grizzlon said in a sibling comment, you'll find the chapter dependencies on page 10, in case you need to look up/learn something specific on demand.

A chapter dependency chart is one of those things that are so obvious you wonder why it isn't in every textbook.

How was the class? The book is very comprehensive. I'd imagine it would be hard to cover it in a year, let alone a quarter or semester. What environments and languages did the class program in?

Agreed about the chapter dependencies. I hope this becomes a trend!

The class is one semester long, and is divided between data analysis, frequent pattern mining, clustering and classification. Most of the book is either covered or briefly discussed. It is indeed a 'deep' class: UFMG has a strong data mining/machine learning/information retrieval/natural computing/other related areas program, so each class can afford to be pretty specific.

The class is taught at the same time to undergrad and graduate students (the difference being that each group has a different class project; grads have to write a basic research paper).

There are lots of pen-and-paper theoretical quizzes and tests. AFAIK there are no restrictions on which technologies to use, but the popular choices are the ones the TAs are most experienced in, usually Weka, C++ and Python. As in other classes taught by Prof. Meira, students are pushed as far as possible in terms of evaluation difficulty, then graded on a curve. UFMG students considering these classes should be careful if they decide to take them along with other difficult classes.

Here is the course page (in pt-br, but Google Translate should be OK): http://homepages.dcc.ufmg.br/~meira/DokuWiki/wiki/ensino_md

Off-topic: OP seems to share my first name and one of my last names lol. Another seemingly Brazilian commenter in this thread also seems to share my first name. This may or may not indicate a correlation between the name Rafael and computer science in Brazil lol.

Ahhh - UFMG - solid place. I heard UFMG is why Google is in Belo Horizonte.

As for names, when I saw yours, I thought of this R Almeida. http://www.sherdog.com/fighter/Ricardo-Almeida-11

That would be correct. Google settled in Belo Horizonte by buying Akwan Search Technologies, which was co-founded by UFMG professors Nivio Ziviani[1] and Berthier Ribeiro-Neto (currently Google's Head of Engineering in Brazil). Information Retrieval - a department in which they play key roles - is another strong area of the university.

Ziviani is currently a co-founder of the recommendation startups Zunnit[2] (just upstairs in the building I am in right now) and Neemu[3].

PS: Ha! Maybe this guy is the reason I can't get my screen name everywhere I want!

[1] http://en.wikipedia.org/wiki/Nivio_Ziviani
[2] http://zunnit.com/
[3] http://neemu.com/

I like the chapter dependencies on page 10. I don't think I have seen that before.

This is great. I've been looking for something like this to refresh and build on a class I took in college. To my surprise, this is my professor's actual book. I look forward to reading it again. Thanks!

Thank you for the PDF. I am a math major and have been feeling rather bummed that I haven't had much of a chance to learn techniques for data analysis, so this is really appreciated.

Awesome, thanks!

I have ~3.5 million pieces of structured data to go through, so I will find this paper interesting.

It's great to see quality content from Brazil. Thank you.

Looks great!
