That's something that I'm currently trying to understand myself. I haven't yet f...

detaro · on July 9, 2016

WARC files are raw recordings of crawler runs, including HTTP headers and other metadata. They are the raw, archival result of the downloads, and you can extract the downloaded files from them.

https://en.wikipedia.org/wiki/Web_ARChive
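
To make the format a bit more concrete, here is a minimal sketch of reading records from an uncompressed WARC stream using only the Python standard library. It assumes the basic record layout (a `WARC/1.0` version line, named header fields, a blank line, then `Content-Length` bytes of content). The function name `iter_warc_records` and the sample records are made up for illustration, and real archives are usually gzip-compressed, which this does not handle.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration, not part of any library.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate consecutive records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError("expected a WARC version line, got %r" % version)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The content block is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(uri, block):
    """Build one made-up 'response' record for the demo below."""
    return b"".join([
        b"WARC/1.0\r\n",
        b"WARC-Type: response\r\n",
        b"WARC-Target-URI: " + uri.encode("utf-8") + b"\r\n",
        b"Content-Length: " + str(len(block)).encode("ascii") + b"\r\n",
        b"\r\n",
        block,
        b"\r\n\r\n",
    ])


# Captured HTTP response: the HTTP headers are stored alongside the body.
http_block = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hi</html>"
sample = make_record("http://example.com/a", http_block) + make_record(
    "http://example.com/b", http_block
)

records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

The payload of a `response` record still contains the original HTTP status line and headers, so getting at the downloaded file itself means splitting the payload once more at the first blank line.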

noufalibrahim · on July 9, 2016

During my time at the Internet Archive, when we were working on the Wayback Machine and related stuff, we wrote an ARC/WARC Python library to parse and unpack these files. The library is over here: https://github.com/internetarchive/warc. Just in case anyone is interested.

toomuchtodo · on July 9, 2016

Also, https://github.com/ikreymer/webarchiveplayer

vram22 · on July 9, 2016

That sounds like it could be useful.

Thanks, Noufal.

mihaitodor · on July 9, 2016

Yeah, but they claim that "URLs are directly available in the Wayback Machine too" over here: http://www.archiveteam.org/index.php?title=Coursera

What I don't get is which URLs...

detaro · on July 9, 2016

I guess if you go directly to the URL for a course/video via the Wayback Machine, you get the content as well?
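
For what it's worth, a rough sketch of what "going directly to the URL" would look like, assuming the usual public Wayback Machine address scheme of `https://web.archive.org/web/<timestamp>/<original URL>`. The helper name `wayback_url`, the timestamp, and the Coursera address are placeholders, not URLs known to be archived.

```python
def wayback_url(original_url, timestamp="2016"):
    """Build a Wayback Machine address for an original URL.

    Hypothetical helper. The timestamp is a prefix of YYYYMMDDhhmmss digits;
    as far as I know the Wayback Machine redirects to the closest capture.
    """
    return "https://web.archive.org/web/{}/{}".format(timestamp, original_url)


# Placeholder URL; whether a capture exists is exactly the open question here.
print(wayback_url("https://www.coursera.org/"))
```

This only builds the address. It says nothing about whether a capture actually exists for a given course or video URL.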

mihaitodor · on July 9, 2016

So far, I couldn't find any that work, but maybe I'm missing something :(