# If you are not a Python user or want to try something different (and faster), this can be done with sh, sed, openssl, curl/wget/etc., plus a simple utility I wrote called "yy025" (https://news.ycombinator.com/item?id=17689152). yy025 is a more generalised "Swiss Army Knife" for making requests to any website. This solution uses a traditional technique called HTTP pipelining.
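The original snippet is not preserved in the thread; what follows is a minimal sketch of the pipelining idea, assuming .torrent paths have already been scraped from IA's HTML. The item paths are illustrative, not from the original.

```shell
# Build one pipelined batch of HTTP/1.1 requests. Every request rides the
# same connection; keep-alive lets the server answer them back to back.
gen_requests() {
  while read -r path; do
    printf 'GET %s HTTP/1.1\r\nHost: archive.org\r\nConnection: keep-alive\r\n\r\n' "$path"
  done
}

# Hypothetical .torrent paths; the real ones would come from IA's item pages.
printf '%s\n' \
  /download/example-item/example-item_archive.torrent \
  /download/other-item/other-item_archive.torrent |
  gen_requests > requests.txt

# Send the whole batch over a single TLS connection (network step shown
# commented so the sketch runs offline):
# openssl s_client -quiet -connect archive.org:443 < requests.txt
```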
# Additional command-line options for openssl s_client omitted for the sake of brevity. The above outputs the torrent URLs. Feed those to curl, wget, or whatever similar program you choose, or perhaps directly to a torrent client. Something like
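The exact command was not preserved in the thread; a minimal sketch under the assumption that the extracted URLs were saved one per line (the URLs and file names here are illustrative):

```shell
# One URL per line, as extracted from the pipelined responses.
printf '%s\n' \
  'https://archive.org/download/example-item/example-item_archive.torrent' \
  'https://archive.org/download/other-item/other-item_archive.torrent' \
  > torrent-urls.txt

# Fetch them all; -nc skips anything already downloaded. (Network step,
# commented out so the sketch runs offline.)
# wget -nc -i torrent-urls.txt
# or: xargs -n1 curl -O < torrent-urls.txt
```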
You are probably thinking of pipelining in terms of the popular web browsers. Those programs want to do pipelining so they can load up resources (read: today, ads) from a variety of domains in order to present a web page with graphics and advertising.
That never really worked. Thus, we have HTTP/2, authored by an ad sales company. It is very important for an ad sales company that web pages contain not only what the user is requesting but also heaps of automatically followed pointers to third party resources hosted on other domains. That is, pages need to be able to contain advertising. HTTP/1.1 pipelining is of little benefit to the ad ecosystem.
However, sometimes the user is not trying to load up a graphical web page full of third party resources. Here, the HN commenter is just trying to get some HTML, extract some URLs and then download some files. The HTML is all obtained from the same domain. This is text retrieval, nothing more.
If all the resources the user wants are from the same domain, e.g., archive.org, then pipelining works great. I have been using HTTP/1.1 pipelining to do this for several decades and it has always worked flawlessly.
Typically httpd settings for any website would allow at least 100 pipelined requests per connection. As you might imagine, often the httpd settings are just unchanged defaults. Today the limits I see are often much higher, e.g., several hundred.
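For instance, Apache httpd ships with keep-alive on and a per-connection request cap of 100; these are real Apache directives, and the values shown are the stock defaults:

```
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
```

Raising MaxKeepAliveRequests (or setting it to 0 for unlimited) is a one-line change, which is why higher limits are common in the wild.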
In my experience it is very rare to find a site that has pipelining disabled. More likely, a site disables Connection: keep-alive and forces every request to Connection: close; even that I rarely see.
The HTTP/1.1 specification suggests a max connection limit per browser of two. There is no suggested limit on the number of requests per connection. In terms of efficiency, the more the better. How many connections does a popular web browser make when loading an "average" web page today? It is a lot more than two! In any event, pipelining as I have shown here stays under the two-connection limit.
I wanted to download all of the Computer Chronicles. Both for viewing offline and to have my own "set" of files. I even re-encoded them to HEVC (from MPEG-2) and put them up here https://intelminer.com/torrents/TV%20SHOWS/Computer%20Chroni...
Getting them from the Archive, though, was an exercise in frustration. IA offers (and heavily recommends) using the torrent download option to ease bandwidth costs.
Unfortunately, for whatever reason, there's no way to pull down the .torrent files using this method.
In the end I had to simply pull the MPEG-2 videos down one by one over the course of several months (due to speed limiting on IA's end).