

Algorithms for Massive Data Sets - helwr
http://www.cs.princeton.edu/courses/archive/spring02/cs493/schedule.html

======
maurits
One can also have a look at the Stanford course "Mining Massive Datasets". No
video lectures (yet), but the course information is here:

<http://www.stanford.edu/class/cs246/cs246-11-mmds/handouts.html>

The book here: <http://infolab.stanford.edu/~ullman/mmds.html>

~~~
DanielRibeiro
Also posted and discussed here: <http://news.ycombinator.com/item?id=1984449>

------
ajays
Dated: 15-May-2002. Just sayin'. There's been a lot of work in the last
decade on this subject.

~~~
PaulHoule
Yeah, my first thought was that all this stuff is about ten years old.

~~~
helwr
I posted more recent stuff under the link below. Feel free to add whatever is
missing.

------
grk
<https://massivedatasets.wordpress.com/> from a Danish Technical University
course with the same name

------
cdavid
As mentioned by others, this list is old. The cited algorithms are certainly
still good to know, but the meaning of massive is different now. Today,
massive means:

      - too large to fit even on big iron (which few people can afford anyway)
      - low value: much of the data is useless or too noisy to be useful, so ignoring part of it some of the time does little harm

Nothing other than near-linear or even sublinear algorithms really works in
those cases. Singular Value Decomposition is a great example. Until recently,
the work was mostly about doing fast, accurate SVD for large matrices. There is
a recent surge of approximate algorithms that see each piece of data at most
once. This is useless for most "hard" engineering tasks, but for analysis of
large graph data you can most likely tolerate a few percent of error in your
biggest singular values and still get something useful.
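To make the "see the data only once" idea concrete, here is a minimal sketch in plain Python. It is not any specific published algorithm's implementation: it assumes the matrix arrives as a stream of rows, compresses it with a random sign projection (one simple choice among several), and then estimates the largest singular value from the small sketch by power iteration. The names `sketch_rows`, `top_singular_value` and `demo_rows`, and the sketch size `k`, are made up for illustration.

```python
import math
import random


def sketch_rows(row_stream, k, seed=0):
    """Read each row exactly once and fold it into a k x n sketch B = S A.

    S has independent entries +-1/sqrt(k), so E[B^T B] = A^T A.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    sketch = None
    for row in row_stream:
        if sketch is None:
            sketch = [[0.0] * len(row) for _ in range(k)]
        for sketch_row in sketch:
            sign = scale if rng.random() < 0.5 else -scale
            for j, value in enumerate(row):
                sketch_row[j] += sign * value
    return sketch


def top_singular_value(matrix, iterations=100):
    """Largest singular value of a small in-memory matrix M,
    via power iteration on M^T M."""
    m, n = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n)] * n
    sigma = 0.0
    for _ in range(iterations):
        mv = [sum(r[j] * v[j] for j in range(n)) for r in matrix]  # M v
        w = [sum(matrix[i][j] * mv[i] for i in range(m)) for j in range(n)]  # M^T (M v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        sigma = math.sqrt(norm)  # ||M^T M v|| tends to sigma_1^2 for unit v
    return sigma


def demo_rows(m=500, n=8):
    """Hypothetical test data: a strong rank-1 pattern plus small noise."""
    rng = random.Random(1)
    for i in range(m):
        weight = 1.0 + (i % 5)
        yield [weight * (j + 1) + 0.1 * rng.gauss(0.0, 1.0) for j in range(n)]


if __name__ == "__main__":
    rows = list(demo_rows())  # kept in memory here only to compare with the exact value
    exact = top_singular_value(rows)
    approx = top_singular_value(sketch_rows(iter(rows), k=200))
    print("exact:", exact, "one-pass estimate:", approx)
```

The estimate is only good to within some percent, and the error shrinks roughly like 1/sqrt(k), which is the trade-off described above: fine for exploratory analysis, not for anything that needs accurate singular values.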

The fun part is that even something as simple as matrix multiplication becomes
an interesting and potentially hard problem.
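As an illustration of why matrix multiplication gets interesting, here is a rough sketch of one well-known sampling approach: write the product A B as a sum of outer products (column t of A times row t of B), sample a few of those terms with probability proportional to their norms, and rescale so the result is unbiased. The function name `approx_matmul` and the `samples` parameter are hypothetical, and this is a toy in plain Python, not a tuned implementation.

```python
import math
import random


def approx_matmul(a, b, samples, seed=0):
    """Unbiased estimate of the product a @ b from `samples` sampled
    column-of-a / row-of-b outer products (lists of lists in, list of lists out)."""
    rng = random.Random(seed)
    rows, inner, cols = len(a), len(b), len(b[0])
    # Sampling weight for index t: ||a[:, t]|| * ||b[t, :]||
    weights = [
        math.sqrt(sum(a[i][t] ** 2 for i in range(rows)))
        * math.sqrt(sum(x * x for x in b[t]))
        for t in range(inner)
    ]
    total = sum(weights)
    result = [[0.0] * cols for _ in range(rows)]
    if total == 0.0:
        return result  # the exact product is the zero matrix
    for t in rng.choices(range(inner), weights=weights, k=samples):
        scale = total / (weights[t] * samples)  # 1 / (samples * p_t)
        for i in range(rows):
            a_it = a[i][t] * scale
            for j in range(cols):
                result[i][j] += a_it * b[t][j]
    return result


if __name__ == "__main__":
    a = [[1.0, 0.5, 2.0], [0.0, 1.5, 1.0]]
    b = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]
    print(approx_matmul(a, b, samples=50))
```

The cost depends on the number of sampled terms rather than on the full inner dimension, at the price of an error that shrinks only like 1/sqrt(samples).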

------
pyronicide
Would anyone know if there's audio/video of these lectures? I keep seeing
amazing classes like this and wishing that everyone could enjoy them instead
of just the local students.

------
siculars
Slides, notes and papers from Sergei Vassilvitskii's class on a similar topic,
COMS 6998-12: Dealing with Massive Data,
<http://www.cs.columbia.edu/~coms699812/>.

~~~
akivabamberger
Great class.

------
helwr
Also see
<http://www.quora.com/Machine-Learning/What-are-some-introductory-resources-for-learning-about-large-scale-machine-learning>

------
chrisaycock
Also of potential interest is Stanford's "Workshop on Algorithms for Modern
Massive Data Sets" (MMDS):

<http://www.stanford.edu/group/mmds/>

