There's a database of RDF files that describe the books (http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2), but it's a bit of a pain to use and doesn't link the books back to the API that should be used for crawling Project Gutenberg (http://www.gutenberg.org/robot/harvest).


I think the previous version of the metadata included a path to the FTP server. Splitting the book ID into one directory per digit, minus the last digit (4443 -> 4/4/4/4443), works for _most_ books, but there were somewhere between 800 and 3000 books organized in a different folder structure that I still need to track down.
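
For reference, a rough Python sketch of that ID split. The function name guess_book_dir is made up for illustration, and putting single-digit IDs under a "0" directory is my assumption, not something confirmed above. It only covers the common layout, so the books stored in the other folder structure would still need a separate lookup.

```python
def guess_book_dir(book_id: int) -> str:
    """Guess the directory for a book ID, e.g. 4443 -> '4/4/4/4443'.

    Hypothetical helper: covers only the common layout described above.
    """
    if book_id < 1:
        raise ValueError("book_id must be a positive integer")
    digits = str(book_id)
    # Every digit except the last becomes one directory level.
    # Assumption: single-digit IDs live under a '0' directory.
    parents = list(digits[:-1]) or ["0"]
    # The full ID is the final path component.
    return "/".join(parents + [digits])


if __name__ == "__main__":
    print(guess_book_dir(4443))  # 4/4/4/4443
```

The result is a relative path, so it would still have to be joined to whatever mirror or FTP root you are crawling.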



