
Show HN: DataHub, open source datasets for Artificial Intelligence - theo31
Hi HN!

I am an undergrad student trying to build interesting things with AI.

Recently, I was looking for a dataset I could use for a new project. I realized that it is really frustrating to go through all the government websites (with terrible UX) just to find some usable dataset.

I set out to build a GitHub for datasets, named DataHub. Right now, we have more than 1000 datasets from Montréal and New York City, with more cities coming soon (and possibly government agencies).

All of this is wrapped into a powerful search. It's a breeze to find a dataset to work on.

I'd be interested to know what you guys are looking at when searching for datasets and if DataHub could be of any help!

[https://datahub.now.sh/](https://datahub.now.sh/)
======
alex_g
Very cool! The interface is really beautiful, and I would love it if data.gov
were formatted like this.

What is your strategy for acquiring these datasets? Are you going to pull them
from data.gov and other websites?

If those datasets are changed on data.gov, will you detect that?

~~~
theo31
Hi!

For now, I only save the links to the data on other websites. This way, the
data is always up to date.

In order to acquire those datasets, I wrote a couple of scrapers using
chromeless.
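
A minimal sketch of the link-only idea, not DataHub's actual code: each record keeps a title, a publisher, and a URL pointing back to the source portal, so the data itself is never copied and cannot go stale. The `DatasetLink` class, the `search` helper, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetLink:
    """Catalog record that stores only metadata and a link (hypothetical)."""
    title: str
    publisher: str
    source_url: str  # points at the original portal; no data is stored here


def search(catalog, query):
    """Return records whose title or publisher contains every query term."""
    terms = query.lower().split()
    return [
        d for d in catalog
        if all(t in f"{d.title} {d.publisher}".lower() for t in terms)
    ]


# Made-up entries for illustration only.
CATALOG = [
    DatasetLink("Street trees", "Montréal",
                "https://example.org/montreal/street-trees"),
    DatasetLink("Taxi trips", "New York City",
                "https://example.org/nyc/taxi-trips"),
]

if __name__ == "__main__":
    for hit in search(CATALOG, "taxi new york"):
        print(hit.title, "->", hit.source_url)
```

The trade-off with this design is that a link can break or its target can change without the catalog noticing, so some periodic re-check of the links would probably still be needed.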

------
bruth
Nice work. You definitely should get in touch with the Dat Project folks.
There are several of them on the core team and in the community who are
actively scraping government websites for open data.

~~~
theo31
Thanks! I've never heard of them, will get in touch!

------
colobas
There's a typo. "Datahub has more then 1200 datasets" should read "Datahub has
more than 1200 datasets"

~~~
theo31
Thank you! I'll fix it right now.

------
maz1b
Would love to use this once there's some kind of public health / medical
data!

~~~
theo31
Just added US Health Data (around 2,000 datasets):
[https://datahub.now.sh/u/ushealthgov](https://datahub.now.sh/u/ushealthgov)

I'd love to get your feedback on this!

