

Ask HN: What's the best way to retrieve educational data? - vital101

I'm working on a project right now that requires a lot of data about higher education institutions to even get started. I need the following information about colleges in the US: State, Name of College, List of Departments, and List of Department Designators (CPS for Computer Science, BIO for Biology).

How would you go about farming this data? I've thought about submitting specific requests to Amazon Mechanical Turk, but I wasn't sure if there were any other resources out there that might be useful.
======
JimmyL
I don't think it's listed in one place - I would get a list of schools from
Wikipedia or the Dept. of Education (they must have a list somewhere), and put
it out as a MT task. If you do this, do a search for how to properly use the
service, as there seems to be a certain etiquette and way to interact with the
Turkers to get best results. One example of the kind of post I mean is
[http://iamelgringo.blogspot.com/2008/09/mechanical-turk-
now-...](http://iamelgringo.blogspot.com/2008/09/mechanical-turk-now-
with-25-percent.html), but I know there have been a few others posted here.

If you don't want to Turk it, you could put it out on craigslist or Kijiji as
a freelance contract, but I suspect this would be more trouble than any
savings you'd get over MT.

~~~
vital101
Thanks for the insight. I've actually found a fairly comprehensive listing of
colleges in the US, so formatting the submission for MT would probably be the
toughest part.

On a similar note, has anyone ever had any success with having end-users
populate data like this? For instance, ask them to "Add a school" or "Add a
department" and such? I only worry about having to fact-check all of the
incoming information with this method.

~~~
adinobro
If you are worried about fact checking you could ask two different people
to enter the same data. If both people enter the same data then there is a
higher chance that it is correct.
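
A rough sketch of that double-entry check in Python, in case it helps. The
function names, the (school, department) key, and the sample rows are all made
up for illustration, and the "two matching contributors" rule is just one
possible threshold:

```python
from collections import defaultdict


def normalize(value):
    """Collapse whitespace and uppercase so 'cps ' and 'CPS' compare equal."""
    return " ".join(value.split()).upper()


def reconcile(submissions):
    """Split submissions into accepted values and keys needing manual review.

    submissions: iterable of (key, contributor, value) tuples.
    A key is accepted only if at least two different contributors
    entered the same normalized value and nobody disagreed.
    """
    by_key = defaultdict(dict)
    for key, contributor, value in submissions:
        # A repeat entry from the same contributor overwrites the earlier one,
        # so one person cannot confirm their own data.
        by_key[key][contributor] = normalize(value)

    accepted = {}
    needs_review = []
    for key, entries in by_key.items():
        distinct_values = set(entries.values())
        if len(entries) >= 2 and len(distinct_values) == 1:
            accepted[key] = distinct_values.pop()
        else:
            needs_review.append(key)
    return accepted, needs_review


# Hypothetical sample data: (school, department) -> designator.
submissions = [
    (("Example State University", "Computer Science"), "alice", "CPS"),
    (("Example State University", "Computer Science"), "bob", "cps "),
    (("Example State University", "Biology"), "alice", "BIO"),
    (("Example State University", "Biology"), "bob", "BIOL"),
    (("Example State University", "History"), "alice", "HIS"),
]

accepted, needs_review = reconcile(submissions)
```

Here the Computer Science entry would be accepted because both people agree,
while Biology (conflicting values) and History (only one entry) would go to a
review queue. Matching entries can still both be wrong, so this only reduces
how much you have to check by hand.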

~~~
vital101
Excellent idea. It doesn't guarantee accuracy, but I think that it would go
a long way in making the site more accurate.

