
Show HN: Public Datasets for Machine Learning - rahul_1212
https://medium.com/@matelabs_ai/public-data-sets-use-these-to-train-machine-learning-models-on-mateverse-4dda18a27851
======
jackschultz
I keep saying this, but I feel like gathering correct, valid, and formatted
data is the biggest part of machine learning and data analysis these days.
Learning the algorithms takes time, but using libraries for learning is
decently quick. Getting the data you want to use takes the most time and
effort.

Great to see posts and sites like this that share public data so people can
learn ML and possibly even build something beneficial with it.

------
miesman
Enron Corpus - not a clean dataset for training, but a unique resource.

https://en.wikipedia.org/wiki/Enron_Corpus

"The corpus is unique in that it is one of the only publicly available mass
collections of real emails easily available for study, as such collections are
typically bound by numerous privacy and legal restrictions which render them
prohibitively difficult to access"

