

Stanford CS Book: Mining of Massive Datasets [pdf] - yarapavan
http://infolab.stanford.edu/~ullman/pub/book.pdf

======
moultano
Pretty cool. I work in search quality at Google, and this is a pretty decent
overview of the more universal tricks I've picked up from people on the job,
as well as a lot of things I didn't know. MinHashing in particular is one of
my favorites.
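
For anyone who hasn't seen MinHashing before, here is a minimal sketch of the idea, not the book's code. It assumes non-empty sets of strings and a simple family of random linear hash functions; the names `make_hash_params`, `minhash_signature`, and `estimate_jaccard` are made up for illustration. The fraction of positions where two signatures agree approximates the Jaccard similarity of the underlying sets.

```python
import random
import zlib

# A large Mersenne prime used as the modulus for the linear hash family (an assumption).
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Return, for each hash function, the minimum hash over a non-empty set of strings."""
    # crc32 maps each string to an integer deterministically across runs.
    base = [zlib.crc32(s.encode("utf-8")) for s in items]
    return [min((a * x + b) % PRIME for x in base) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


params = make_hash_params(256)
set_a = {f"w{i}" for i in range(100)}
set_b = {f"w{i}" for i in range(50, 150)}   # true Jaccard similarity is 50/150
sig_a = minhash_signature(set_a, params)
sig_b = minhash_signature(set_b, params)
estimate = estimate_jaccard(sig_a, sig_b)
```

Both sets must be hashed with the same `params`, otherwise the signatures are not comparable. More hash functions give a tighter estimate at the cost of longer signatures.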

Also, if you like this, I'm trying to collect resources of this quality in
this subreddit: <http://www.reddit.com/r/learnit>

~~~
izendejas
Thanks for sharing! This is a great set of resources.

------
jak0
If you mostly have a CS background, you should definitely check this out:
<http://www-stat.stanford.edu/~tibs/ElemStatLearn/>

It's Data Mining from a statistician's point of view. You can download the
entire book for free on the website and the graphs and computations are done
with R.

This book is also one of Hal Varian's favourites:
<http://www.dataspora.com/blog/sexy-data-geeks/>

------
jaykz52
I've been thinking lately of (finally) pursuing graduate studies, and data
mining is an area that I find myself drawn to. Obviously Stanford is doing some
significant research in this area, but I've been out of academia for 4 years
and I somehow doubt I'd be a competitive applicant. Does anybody have personal
experience with other universities/programs that are doing extensive research
that they'd like to share? It'd be greatly appreciated!

~~~
eob
I was out of school for 3 years before going back to pursue a PhD. It can be
done, but you're going to have to put in a _lot_ of work the first year
getting back into the swing of academia, and even more if you're going back to
specialize in something you don't already do.

------
Raisin
Does anyone have an EPUB or Mobi version? I just got a Kindle, and its
converter ruins code samples when the source is a PDF.

~~~
j2d2j2d2
I used to face the same issue with my Kindle and bought an iPad, because of
how frustrating I found this, and am quite happy with the Kindle app on it.

Very smart of Amazon to hedge themselves on the hardware so cleanly.

I'm sharing my story not to diss the device, but in case anyone else is in the
same predicament and considering jumping.

~~~
moultano
Whenever I asked around in academic circles about this, the iPad was the
consensus. The Kindle DX does an OK job according to the people I talked to,
but the slow page turning drove people crazy when trying to understand a
complicated paper referencing previous math/diagrams.

I really wish the academic world weren't standardized around formats that only
work for print. There are some LaTeX->HTML converters out there that could
presumably be used to make EPUB files, but I have no idea how well they work.

~~~
evgen
Your peers are generally correct. I used a DX for technical papers and PDFs
after going through the conversion hell you are stuck in (albeit with a Sony
Reader) and then jumped to an iPad when it came out. The best advice I can
offer is to hop over to mobileread.com and check out the forums to see what
the current state of the art is. Back when I was doing two or three
conversions a day the tools to use were Rastafarian and PDF2LRF but I am
guessing there are better options available now. You may also want to check out
Calibre, which is an ebook library management app but one with a lot of
built-in conversion routines.

~~~
mdaniel
Holy cow, could a project have picked a worse name than "Rastafarian"? If one
searches for "Rastafarian", you can guess what turns up. But even "Rastafarian
pdf conversion" seems to drop the "pdf" and returns lots of results for
converting to the Rastafarian religion. Same for "convert".

I guess I'll have to go trolling through the mobileread.com forums, but just
damn.

------
clay
Chapter 2 of this book is awesome. It covers specific implementation and design
strategies for MapReduce matrix multiplication and table joins.
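
To give a flavor of the matrix multiplication part, here is a rough single-machine simulation of a one-pass MapReduce product P = MN. It is a sketch under my own assumptions, not the book's code: matrices are sparse dicts keyed by `(row, col)`, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_matmul`) are hypothetical. Each entry of M is sent to every output cell in its row, each entry of N to every output cell in its column, and the reducer for a cell pairs them up on the shared index j.

```python
from collections import defaultdict


def map_phase(M, N, n_rows, n_cols):
    """Emit ((i, k), tagged value) pairs; n_rows = rows of M, n_cols = columns of N."""
    for (i, j), v in M.items():
        for k in range(n_cols):
            yield (i, k), ("M", j, v)   # m_ij contributes to every cell in row i
    for (j, k), v in N.items():
        for i in range(n_rows):
            yield (i, k), ("N", j, v)   # n_jk contributes to every cell in column k


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each output cell, match M and N values on j and sum the products."""
    result = {}
    for key, values in groups.items():
        m_vals = {j: v for tag, j, v in values if tag == "M"}
        n_vals = {j: v for tag, j, v in values if tag == "N"}
        total = sum(m_vals[j] * n_vals[j] for j in m_vals.keys() & n_vals.keys())
        if total != 0:
            result[key] = total   # keep the output sparse
    return result


def mapreduce_matmul(M, N, n_rows, n_cols):
    return reduce_phase(shuffle(map_phase(M, N, n_rows, n_cols)))


M = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
N = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
P = mapreduce_matmul(M, N, n_rows=2, n_cols=2)
```

The trade-off in this one-pass form is that every input entry is replicated once per output row or column, so the shuffle moves a lot of data. Splitting the work into two passes (join on j first, then sum per cell) is the usual alternative.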

------
jacobshea
Thanks, great idea for the reddit group.

------
johnconroy
An amazing resource. Brilliant... Stanford lead the world in open education
(MIT are great too)

