

Ask HN: Historical data sets available online? - pzaich

As a former History major turned software engineer, I've been wondering lately where software and big data might be able to assist our historical perspective. I'm sure there are many data sets that are ripe for aggregation and visualization. What are some interesting data sets available online?
======
Someone
I doubt you will find historical data sets that are 'big data'. If stone,
paper or clay tablets are your storage medium, you will find it hard to buy
the material to store a terabyte of data on, even if you could generate it
(at 4 KB of text per page, a terabyte is 250,000,000 pages).
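
The page count above can be checked with a rough sketch. It assumes a decimal terabyte (10^12 bytes) and 4,000 bytes of text per page; both figures are my reading of "4k" and "terabyte", and the names are made up for illustration.

```python
# Assumptions (not stated in the comment): decimal units, one byte per character.
BYTES_PER_TERABYTE = 10**12
BYTES_PER_PAGE = 4_000


def pages_per_terabyte(bytes_per_page=BYTES_PER_PAGE):
    """Return how many whole pages of text fit in one terabyte."""
    return BYTES_PER_TERABYTE // bytes_per_page


print(f"{pages_per_terabyte():,} pages")
```

With binary units instead (2^40 bytes and 4,096 bytes per page) the count comes out somewhat higher, but the order of magnitude is the same.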

There are small interesting data sets, for example in the archives of the
Dutch East India Company.
[http://resources.huygens.knaw.nl/das/EnglishIntro](http://resources.huygens.knaw.nl/das/EnglishIntro):
_"This site presents tables which give a virtually complete survey of the
direct shipping between the Netherlands and Asia between 1595-1795"_

For larger data sets, it probably isn't that hard to find daily weather
measurements stretching over a century or more.

Also, churches may have large data sets of births, deaths and marriages that
can be mined.

In all cases, for historical data, I think the 'be mined' part will be more of
a challenge than the software part (e.g. figuring out why more people from
region X ended up on ships to Asia in years Y to Z)

------
kevindalias
There are a ton of interesting datasets at InfoChimps, but many are almost
completely unstructured. They require a lot of preprocessing and natural
language work to ready them for analysis. It's usually worth it, though, and
the ETL coding has really helped me grow with Python.

[http://www.infochimps.com/datasets](http://www.infochimps.com/datasets)

What other historical analyses have you worked on?

------
pizza
Probably not what you were going for, but
[http://www.math.uah.edu/stat/data/HorseKicks.html](http://www.math.uah.edu/stat/data/HorseKicks.html)
has an interesting backstory, to do with the development of Poisson
distributions.
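
For anyone curious how the horse-kick data relates to a Poisson distribution, here is a minimal sketch. The counts are the commonly quoted textbook summary of Bortkiewicz's data (200 corps-years), which is an assumption on my part and not read from the linked page, so check them against the source before relying on them. The function and variable names are just illustrative.

```python
import math

# Assumed data: the textbook summary of the horse-kick deaths,
# mapping deaths per corps-year -> number of corps-years observed.
# Verify against the linked data set before using these figures.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


total = sum(observed.values())
# The sample mean is the usual estimate of the Poisson rate.
mean_rate = sum(k * n for k, n in observed.items()) / total

for k, n in observed.items():
    expected = total * poisson_pmf(k, mean_rate)
    print(f"{k} deaths: observed {n:3d}, Poisson expects {expected:6.1f}")
```

The only fitted quantity is the mean number of deaths per corps-year; the expected counts then follow from the Poisson formula and can be compared with the observed ones by eye.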

