Hacker News
rhett on March 12, 2009 | on: A Little Web Spider I Wrote Last Night (code inclu...
Here is a web spider in 1 line:

perl -MLWP::UserAgent -MHTML::LinkExtor -MURI::URL -lwe '$ua = LWP::UserAgent->new; while (my $link = shift @ARGV) { print STDERR "working on $link"; HTML::LinkExtor->new( sub { my ($t, %a) = @_; my @links = map { url($_, $link)->abs() } grep { defined } @a{qw/href img/}; print STDERR "+ $_" foreach @links; push @ARGV, @links } )->parse(do { my $r = $ua->simple_request(HTTP::Request->new("GET", $link)); $r->content_type eq "text/html" ? $r->content : ""; } ) }' http://www.google.com
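
For readers who find the one-liner dense, here is a rough sketch of the same loop spelled out in Python rather than Perl: take a URL off the queue, fetch it, keep the body only if it is text/html, resolve each href against the page URL, and queue the results. The names `LinkParser`, `fetch_html`, and `crawl` are made up for this illustration. The `seen` set and the `max_pages` cap are additions of mine; the one-liner has neither, so it can revisit pages and runs until interrupted. Note also that `urlopen` follows redirects, which `simple_request` does not.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values as absolute URLs (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def fetch_html(url):
    """Return the body if the response is text/html, else an empty string."""
    try:
        with urlopen(url, timeout=10) as response:
            if response.headers.get_content_type() != "text/html":
                return ""
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except (OSError, ValueError):
        # Network errors, HTTP errors, and unsupported URL schemes are skipped.
        return ""


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl; returns the URLs visited, in order."""
    queue = deque([start_url])
    seen = {start_url}  # assumption: skip repeats, unlike the one-liner
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkParser(url)
        parser.feed(fetch(url))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("http://www.google.com")` would start from the same page as the command above. The `fetch` parameter is there so the crawl logic can be tried against canned pages without touching the network.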
quilby on March 13, 2009
More info: http://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0013.html
yogione on March 12, 2009
I tested the above script. It works. You are good.
rhett on March 13, 2009
Thanks, but I didn't write it. I remembered it from some magazine from 1999. The link to the author is in the comment below.