Cool, installing the applications on docker on my dedi.

I have a couple of questions though:

Will the data remain archived on my system after it is uploaded? And what format will that be in?

Will there be a public API to access this data once uploaded, or for services such as Feedly to import back entries from feeds? (I would hope they would support that, but the public API would be enough for me.)

Thank you for providing this service.




After the greader*-grab programs upload data to the target server, it is removed from your machine. All of the data eventually ends up in WARCs at https://archive.org/details/archiveteam_greader

As for an API, someone will hopefully write one to directly seek into a megawarc in that archive.org collection, or import everything into their feed reading service.
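For anyone who wants to try that, here is a rough sketch of the "seek into a megawarc" part. It assumes each WARC record is stored as its own gzip member (the usual convention for .warc.gz files) and that you already know the byte offset of the record from some index; `read_gzip_member` is a made-up helper name, not part of any ArchiveTeam tool.

```python
import gzip
import io
import zlib


def read_gzip_member(fileobj, offset, chunk_size=64 * 1024):
    """Return the decompressed bytes of the single gzip member starting at offset.

    Hypothetical helper: assumes one record per gzip member and a known offset.
    """
    fileobj.seek(offset)
    # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(31)
    parts = []
    while not decomp.eof:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            raise EOFError("file ended in the middle of a gzip member")
        parts.append(decomp.decompress(chunk))
    # Anything after the end of this member is left in decomp.unused_data.
    return b"".join(parts)


if __name__ == "__main__":
    # Two concatenated gzip members stand in for two records in a megawarc.
    first = gzip.compress(b"first record")
    second = gzip.compress(b"second record")
    archive = io.BytesIO(first + second)
    print(read_gzip_member(archive, len(first)))
```

The same idea should carry over to a remote file if the server supports HTTP range requests, since you only need the bytes from the offset to the end of that member.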


Any way I could patch the program to stop it from deleting the data after it is uploaded?

Among other things I would like to set up an ElasticSearch cluster for my own feeds.

Is the WARC format defined somewhere? I haven't looked at any other ArchiveTeam projects so I'm not informed if this format is used elsewhere.


Yeah, you could patch seesaw-kit to not delete local data. Note that greader-grab just gets a random work item from the tracker.

There's an ISO spec for WARC and tools linked at http://www.archiveteam.org/index.php?title=The_WARC_Ecosyste...
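To give a rough idea of the layout: a WARC record is a version line, some "Name: value" header lines, a blank line, then Content-Length bytes of content followed by two CRLFs. Below is a minimal sketch of a reader for uncompressed records. It is a simplification (it ignores folded header lines and treats header names as case-sensitive), and `read_warc_record` is a made-up name, so use one of the real tools for anything serious.

```python
import io


def read_warc_record(stream):
    """Read one uncompressed WARC record from a binary stream.

    Returns (version, headers, payload), or None at end of stream.
    Simplified sketch: no folded headers, header names kept as written.
    """
    version_line = stream.readline()
    if not version_line:
        return None
    version = version_line.strip().decode("ascii")
    if not version.startswith("WARC/"):
        raise ValueError("not at the start of a WARC record: %r" % version)

    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break  # blank line ends the header block
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()

    payload = stream.read(int(headers["Content-Length"]))
    stream.read(4)  # skip the two CRLFs that terminate the record
    return version, headers, payload


if __name__ == "__main__":
    raw = (b"WARC/1.0\r\n"
           b"WARC-Type: resource\r\n"
           b"Content-Length: 5\r\n"
           b"\r\n"
           b"hello"
           b"\r\n\r\n")
    print(read_warc_record(io.BytesIO(raw)))
```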


I haven't tried it, but the --keep-data option might work?

https://github.com/ArchiveTeam/seesaw-kit/blob/master/run-pi...


Yeah, that looks like the right thing.


No, the data will be uploaded, first to a staging server run by ivank/"ArchiveTeam". Then it will be uploaded to the Internet Archive. (Some has been uploaded already: https://archive.org/details/archiveteam_greader)

No, not currently. But the Internet Archive will provide the raw data and anyone is free to set up such an API :-)

Thanks for helping out!



