# If you are not a Python user or want to try something different (and faster), this can be done with sh, sed, openssl, curl/wget/etc., plus a simple utility I wrote called "yy025" (https://news.ycombinator.com/item?id=17689152). yy025 is a more generalised "Swiss Army Knife" for making requests to any website. This solution uses a traditional technique called HTTP pipelining.
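The original snippet is not preserved in the thread; what follows is a minimal sketch of the pipelining idea, assuming .torrent paths have already been scraped from IA's HTML. The item paths are illustrative, not from the original.

```shell
# Build one pipelined batch of HTTP/1.1 requests. Every request rides the
# same connection; keep-alive lets the server answer them back to back.
gen_requests() {
  while read -r path; do
    printf 'GET %s HTTP/1.1\r\nHost: archive.org\r\nConnection: keep-alive\r\n\r\n' "$path"
  done
}

# Hypothetical .torrent paths; the real ones would come from IA's item pages.
printf '%s\n' \
  /download/example-item/example-item_archive.torrent \
  /download/other-item/other-item_archive.torrent |
  gen_requests > requests.txt

# Send the whole batch over a single TLS connection (network step shown
# commented so the sketch runs offline):
# openssl s_client -quiet -connect archive.org:443 < requests.txt
```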
# Additional command-line options for openssl s_client omitted for the sake of brevity. The above outputs the torrent URLs. Feed those to curl, wget, or whatever similar program you choose, or perhaps directly to a torrent client. Something like
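The exact command was not preserved in the thread; a minimal sketch under the assumption that the extracted URLs were saved one per line (the URLs and file names here are illustrative):

```shell
# One URL per line, as extracted from the pipelined responses.
printf '%s\n' \
  'https://archive.org/download/example-item/example-item_archive.torrent' \
  'https://archive.org/download/other-item/other-item_archive.torrent' \
  > torrent-urls.txt

# Fetch them all; -nc skips anything already downloaded. (Network step,
# commented out so the sketch runs offline.)
# wget -nc -i torrent-urls.txt
# or: xargs -n1 curl -O < torrent-urls.txt
```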
You are probably thinking of pipelining in terms of the popular web browsers. Those programs want to do pipelining so they can load up resources (read: today, ads) from a variety of domains in order to present a web page with graphics and advertising.
That never really worked. Thus, we have HTTP/2, authored by an ad sales company. It is very important for an ad sales company that web pages contain not only what the user is requesting but also heaps of automatically followed pointers to third party resources hosted on other domains. That is, pages need to be able to contain advertising. HTTP/1.1 pipelining is of little benefit to the ad ecosystem.
However, sometimes the user is not trying to load up a graphical web page full of third party resources. Here, the HN commenter is just trying to get some HTML, extract some URLs and then download some files. The HTML is all obtained from the same domain. This is text retrieval, nothing more.
If all the resources the user wants are from the same domain, e.g., archive.org, then pipelining works great. I have been using HTTP/1.1 pipelining to do this for several decades and it has always worked flawlessly.
Typically httpd settings for any website would allow at least 100 pipelined requests per connection. As you might imagine, often the httpd settings are just unchanged defaults. Today the limits I see are often much higher, e.g., several hundred.
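For instance, Apache httpd ships with keep-alive on and a per-connection request cap of 100; these are real Apache directives, and the values shown are the stock defaults:

```
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
```

Raising MaxKeepAliveRequests (or setting it to 0 for unlimited) is a one-line change, which is why higher limits are common in the wild.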
In my experience it is very rare to find a site that has pipelining disabled. More likely, a site disables Connection: keep-alive and forces every request to Connection: close; even that I rarely see.
The HTTP/1.1 specification suggests a max connection limit per browser of two. There is no suggested limit on the number of requests per connection. In terms of efficiency, the more the better. How many connections does a popular web browser make when loading an "average" web page today? It is a lot more than two! In any event, pipelining as I have shown here stays under the two-connection limit.
I wanted to download all of the Computer Chronicles. Both for viewing offline and to have my own "set" of files. I even re-encoded them to HEVC (from MPEG-2) and put them up here https://intelminer.com/torrents/TV%20SHOWS/Computer%20Chroni...
Getting them from the Archive, though, was an exercise in frustration. IA offers (and heavily recommends) using the torrent download option to ease bandwidth costs.
Unfortunately, for whatever reason, there's no way to pull down the .torrent files using this method.
In the end I had to simply pull the MPEG-2 videos down one by one over the course of several months (due to speed limiting on IA's end).