

Ask HN: How could you make/find a spider for JSTOR articles? - corporalagumbo


======
venomsnake
Ask the federal prosecutors in Boston - they have a lot of information
gathered already.

On a less sad note - writing a high-performance, fully async spider is a
trivial task, so it is just a matter of guessing the URLs.

<http://www.jstor.org/stable/i40055831> - this seems to be the issue format,
so just regex for those.

The article URLs on those pages are in this format:

<http://www.jstor.org/stable/41237135>

So parse every page for those links, and keep a record of what you have
already crawled along with the HTML you downloaded.
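
A rough sketch of that loop in Python, in case it helps. The regex is only inferred from the two example URLs above (an "i" prefix for issues, bare digits for articles), so treat it as a guess rather than a documented scheme. `fetch` is a hypothetical function you supply that takes a URL and returns the page HTML, and the loop is synchronous for brevity rather than async.

```python
import re

# Guessed from the two example URLs: "/stable/i<digits>" for issues,
# "/stable/<digits>" for articles. This is an assumption, not a documented format.
STABLE_RE = re.compile(r"/stable/(i?\d+)\b")
BASE = "http://www.jstor.org/stable/"


def extract_links(html):
    """Return (issue_urls, article_urls) found in a page, as two sets."""
    issues, articles = set(), set()
    for ident in STABLE_RE.findall(html):
        target = issues if ident.startswith("i") else articles
        target.add(BASE + ident)
    return issues, articles


def crawl(start_url, fetch):
    """Visit every reachable stable URL once and keep the downloaded HTML.

    `fetch` is a caller-supplied (hypothetical) function: url -> html string.
    """
    seen = set()   # URLs already crawled
    pages = {}     # url -> downloaded HTML
    queue = [start_url]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        pages[url] = html
        issues, articles = extract_links(html)
        queue.extend((issues | articles) - seen)
    return pages
```

You would call it as `crawl("http://www.jstor.org/stable/i40055831", fetch)` with whatever `fetch` you write. Matching on the `/stable/...` path instead of the full URL means relative links get picked up too.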

