

UC Irvine Machine Learning Repository - jcr
http://archive.ics.uci.edu/ml

======
jcr
Jacques Mattheij (hn:jacquesm) has a nice listing of free datasets that was
posted here years ago.

[http://jacquesmattheij.com/Free+Public+Data+Sets](http://jacquesmattheij.com/Free+Public+Data+Sets)

If you know of any others, please post them.

~~~
sebg
Kaggle => [http://www.kaggle.com/](http://www.kaggle.com/)

List of data sets =>
[http://www.datawrangling.com/some-datasets-available-on-the-...](http://www.datawrangling.com/some-datasets-available-on-the-web)

Another list of data sets =>
[http://blog.mortardata.com/post/67652898761/6-dataset-lists-...](http://blog.mortardata.com/post/67652898761/6-dataset-lists-curated-by-data-scientists)

List of data sets on Quora =>
[http://www.quora.com/Where-can-I-find-large-datasets-open-to...](http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public)

Sub-reddit for data sets =>
[http://www.reddit.com/r/datasets](http://www.reddit.com/r/datasets)

------
infinitone
Does anyone know of a dataset of people, such as front-facing photos of
people's upper bodies with no clothes on? I've been meaning to work on a CV
method for body fat % estimation, given a series of progression photos of a
person. I have a rough idea of how to do it, but I need a training set.

~~~
sebg
Per my other comment on the page, you should also ask here => Sub-reddit for
data sets =>
[http://www.reddit.com/r/datasets](http://www.reddit.com/r/datasets)

------
sgt101
I was viva'd in 1998 and yea, verily, I used Iris and Abalone (and 10 others)
in my thesis. But why, young padawan? Why?

Coz they worked.

16 years of industrial machine learning research later...

The world is not like the UCI repository.

Getting the data into the form that the analysis algorithm can parse, asking
the question... that is the work. Ok, also dealing with the answer - that too.

~~~
tensor
Data munging may be time-consuming, but it is hardly difficult. The real
challenge in ML is still the algorithms and theory.

~~~
sgt101
No, I disagree.

Algorithms can be selected; it's not hard to learn which technique is suited
to which situation.

What theory are you talking about? What do we have: Kearns's contributions on
COLT and the descendants thereof (Michael Kearns is a great fella btw, and
COLT is super, but so 1990s), or Vapnik and Structural Risk Minimisation?

It's not really physics, is it?

~~~
tensor
All the data munging in the world will not make progress of any sort in
improving the state of ML/AI. All advances come from better algorithms and
a better understanding of the associated mathematics.

Applying techniques to new problems involves data munging, and that is an
important and useful task, but it is not difficult in the same way that doing
new algorithmic work is, nor does it advance the state of the art.

~~~
sgt101
I still think you are wrong.

There is no harm in developing a new algorithm, or coming up with new stories
as to how it works, but to say that "data munging" will not advance the state
of the art is to exclude the art! It's like saying that unless someone comes
up with new physics it's impossible to build better spaceships.

And if NASA were to build a better spaceship, would that advance the state of
the art in spaceship building?

------
alvaromuir
As a student @ UC Irvine, I can tell you top-notch talent leads most of their
coursework.

~~~
Difwif
As a student @ UC San Diego with the ability to take a few courses at Irvine,
are there any specific courses or professors you recommend looking into?

~~~
Govannon
I'm no longer a student there, but Shannon Alfaro was great at anything
parser/language design related, and Gopi Meenakshisundaram was great for
graphics. Eric Mjolsness is probably brilliant but everything I took from him
was very dense.

