
Here is a web spider in 1 line: perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '$ua = LWP::UserAgent->new; while (my $link = shift @ARGV) { print STDERR "working on $link"; HTML::LinkExtor->new( sub { my ($t, %a) = @_; my @links = map { url($_, $link)->abs() } grep { defined } @a{qw/href src/}; print STDERR "+ $_" foreach @links; push @ARGV, @links} )->parse(do { my $r = $ua->simple_request (HTTP::Request->new("GET", $link)); $r->content_type eq "text/html" ? $r->content : ""; } ) }' http://www.google.com




I tested the above script. It works. You are good.


Thanks, I didn't write it. I remembered it from a magazine from 1999. The link to the author is in the comment below.



