

A Little Web Spider I Wrote Last Night (code included) - andrewljohnson
http://www.trailbehind.com/trips/view_report/118

======
rhett
Here is a web spider in 1 line (wrapped here for readability; note the fixed
`$r->content` call):

  perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '
    $ua = LWP::UserAgent->new;
    while (my $link = shift @ARGV) {
      print STDERR "working on $link";
      HTML::LinkExtor->new(sub {
        my ($t, %a) = @_;
        my @links = map { url($_, $link)->abs() }
                    grep { defined } @a{qw/href img/};
        print STDERR "+ $_" foreach @links;
        push @ARGV, @links;
      })->parse(do {
        my $r = $ua->simple_request(HTTP::Request->new("GET", $link));
        $r->content_type eq "text/html" ? $r->content : "";
      });
    }' http://www.google.com
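The one-liner's loop is simple: fetch a page, extract href/img attributes,
resolve them against the page's URL, and push the results onto the work queue.
Here is a rough Python sketch of just the link-extraction step using only the
standard library (the URL and HTML snippet are made-up inputs so the example
runs without a network; a real spider would fetch each queued URL and feed the
body back in):

  from html.parser import HTMLParser
  from urllib.parse import urljoin

  class LinkExtractor(HTMLParser):
      """Collect href/src attributes, resolved against a base URL."""
      def __init__(self, base_url):
          super().__init__()
          self.base_url = base_url
          self.links = []

      def handle_starttag(self, tag, attrs):
          for name, value in attrs:
              if name in ("href", "src") and value is not None:
                  self.links.append(urljoin(self.base_url, value))

  # Canned page standing in for a fetched response body.
  page = '<a href="/about">About</a> <img src="logo.png">'
  parser = LinkExtractor("http://example.com/index.html")
  parser.feed(page)
  print(parser.links)
  # -> ['http://example.com/about', 'http://example.com/logo.png']

A spider would then de-duplicate these URLs and append the new ones to its
queue, which is exactly what the Perl version's `push @ARGV, @links` does.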

~~~
yogione
I tested the above script - It works. You are good.

~~~
rhett
thanks, I didn't write it. I remembered it from a magazine back in 1999. The
link to the author is in the comment below.

------
jerf
Here's a little web spider I just wrote: "wget -r".

I haven't got a problem with people writing their own quickie scripts, but
they aren't really worth putting online without some other compelling reason.

------
timf
It's hard to tell whether you're after learning or something to use, but if
you haven't seen it yet, check out <http://scrapy.org/> for ideas (or even
something to use, if that's what you're after).

------
a-priori
For completeness, and to keep with the recent meme around here, here's a web
spider I wrote a while ago in Erlang.

<http://github.com/michaelmelanson/spider>

