
Are the github repos intended to collect errata? Do you know of a database which has metadata for all the Gutenberg books?


There's a database of RDF files that describe the books (http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2), but it's a bit of a pain to use, and it doesn't link the books back to the API that should be used for crawling Project Gutenberg (http://www.gutenberg.org/robot/harvest).
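For anyone who wants to poke at that archive, here is a rough sketch of reading it without unpacking it to disk. It assumes each member ending in `.rdf` is an RDF/XML file carrying a `dcterms:title` element; the function name `iter_titles` is made up for illustration, and the layout inside the tarball is an assumption, not something confirmed above.

```python
import tarfile
import xml.etree.ElementTree as ET

# Dublin Core Terms namespace, in ElementTree's {uri}tag form.
DCTERMS = "{http://purl.org/dc/terms/}"


def iter_titles(fileobj):
    """Yield (member_name, title) for each .rdf file in a tar.bz2 archive.

    `fileobj` is a seekable binary file object, e.g. open(path, "rb").
    Assumption: the title lives in a dcterms:title element; records
    without one yield None as the title.
    """
    with tarfile.open(fileobj=fileobj, mode="r:bz2") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".rdf")):
                continue
            root = ET.parse(archive.extractfile(member)).getroot()
            title = root.find(f".//{DCTERMS}title")
            yield member.name, (title.text if title is not None else None)
```

Something like `with open("rdf-files.tar.bz2", "rb") as f: for name, title in iter_titles(f): ...` would walk the whole feed. It only pulls out titles; the other metadata fields would need the same treatment once you know which elements hold them.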


I think the previous version of the metadata included a path to the FTP server. Splitting the book id (4443 -> 4/4/4/4443) works for _most_ books, but there were somewhere between 800 and 3000 books organized in a different folder structure that I still need to track down.
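The id splitting can be sketched in a few lines: every digit except the last becomes a directory level, followed by the full id. The name `gutenberg_path` is hypothetical, and the handling of single-digit ids (placing them under `0/`) is my assumption rather than something stated above. As noted, this only covers the regular layout, not the books stored in the other folder structure.

```python
def gutenberg_path(book_id: int) -> str:
    """Map a book id to its presumed directory, e.g. 4443 -> '4/4/4/4443'."""
    digits = str(book_id)
    # Every digit except the last becomes one directory level.
    # Assumption: single-digit ids have no leading digits and live under '0/'.
    parents = list(digits[:-1]) or ["0"]
    return "/".join(parents + [digits])
```

A crawler would still need to check that the resulting path exists and fall back to some other lookup for the books that don't follow this scheme.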


The github repos are intended to collect issues and receive pull requests. Project Gutenberg doesn't have a public bug tracker, nor do they use version control.



