Hacker News

>>> If someone wants to host the raw files to allow others to download it let me know. It is an 83 GB tar.gz file which uncompressed is just over 1 TB in size.

Does anyone know if it is possible to download a similar data set for YouTube and Reddit? I have ideas for a search engine based on it, but I don't want to write or maintain scraper scripts.



There's a very large dataset of Reddit posts and comments at https://files.pushshift.io/reddit/


I have most of the historical Reddit data (all but roughly a year), which I use to train ML models. Let me see if I can find a public link for you...



