

Perl script to dump the urls submitted by an HN user - adulau
https://gist.github.com/939629

======
mike-cardwell
Without the crazy dependencies:

    
    
      #!/usr/bin/perl
      use strict;
      use warnings;
      use LWP::Simple;

      my $user = shift or die "usage: $0 <hn-username>\n";
      my %urls;
      process("http://news.ycombinator.com/submitted?id=$user");
      print join("\n", sort keys %urls), "\n";

      # Collect story URLs from one page, then follow the "More" link.
      sub process {
         my $html = get($_[0]);
         return unless defined $html;    # get() returns undef on failure
         $urls{$1}++ while $html =~ m#<td class="title"><a href="(http[^"]+)#g;
         process("http://news.ycombinator.com$1")
            if $html =~ m#<a href="(/x\?fnid=[^"]+)" rel="nofollow">More</a>#;
      }

------
fragmede
Far from perfect (regexps on HTML???), but it suffices to get the latest 22
submissions:

    
    
      wget -q -O - 'http://news.ycombinator.com/submitted?id=adulau' | grep -o 'title"><a href="[^"]*" rel="nofollow">[^<]*<' | sed -e 's/title"><a href="\([^"]*\)" rel="nofollow">\([^<]*\)</\2 - \1/'
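
To see what the grep/sed stages pull out without touching the network, here is a rough sketch that feeds them a made-up fragment. The `sample` markup and the example.com URLs are assumptions modeled on what the regex expects, not a capture of the real page.

```shell
# Hypothetical fragment modeled on the markup the regex expects; the live page may differ.
sample='<tr><td class="title"><a href="http://example.com/a" rel="nofollow">Example A</a></td></tr>
<tr><td class="title"><a href="http://example.com/b" rel="nofollow">Example B</a></td></tr>'

# Same grep/sed stages as the one-liner, minus the wget.
extracted=$(printf '%s\n' "$sample" \
  | grep -o 'title"><a href="[^"]*" rel="nofollow">[^<]*<' \
  | sed -e 's/title"><a href="\([^"]*\)" rel="nofollow">\([^<]*\)</\2 - \1/')

printf '%s\n' "$extracted"
```

On that fragment it prints one "title - url" line per story, e.g. `Example A - http://example.com/a`. Any link without `rel="nofollow"` would be skipped silently, which is part of why this is far from perfect.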

------
spudlyo
What makes this program so small is the Scrappy library from CPAN, which gets
most of its muscle from Web::Scraper and WWW::Mechanize.

I think you'll find if you try this program yourself, you're going to be
spending a lot of time downloading and installing CPAN dependencies, few of
which can likely be satisfied by your OS's package manager.

~~~
mapgrep
The `cpan` command line utility automatically downloads and installs
dependencies when you use it to install a module. It's the rough equivalent
of (and precursor to) Ruby's `gem`.

~~~
telemachos
I very strongly recommend cpanminus (aka, cpanm)[1] over cpan now. It's zero
configuration, lighter on memory use and much more straightforward for most
uses.

[1] <https://github.com/miyagawa/cpanminus>

~~~
pyre
I'd also recommend using perlbrew+cpanm instead of using cpanm to muck with
the system-installed perl; sometimes this can cause breakage for core system
components that run on perl (and are only tested against the system-installed
version of perl + packages).

~~~
prodigal_erik
> cpanm to muck with the system-installed perl; sometimes this can cause
> breakage for core system components

Yikes. This would be why I don't share root with anyone who thinks it's a good
idea to smuggle stuff onto a box without going through the system package
manager (which is rpm or dpkg or the like, _not_ the cpan client or any other
single-language ghetto).

~~~
pyre
Sorry. I meant that:

- Installing stuff into system directories through a second package
management system (perl = cpan, ruby = gems, etc) is generally a bad idea
(e.g. the apt-maintained perl libraries are not seen by cpan, so it will need
to install those if there is a requirement for one of them in a cpan package
you are installing).

- Upgrading Perl libraries that system Perl scripts depend on, outside of
the official package management system, can cause unexpected results.

This is why perlbrew is useful. It manages self-contained Perl installations
that you can switch between in your particular environment.

The only downside that I've seen is with system-installed scripts that use the
"#!/usr/bin/env perl" method to specify which Perl to run. I haven't seen
this with Perl scripts in particular, but I ran into it with virtualenv +
comix, a Python script with "#!/usr/bin/env python" for the shebang line (it
blew up since my virtualenv didn't have py-gtk installed).
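
A quick way to see the mechanism, as a sketch only: put a stand-in `perl` first on PATH and run a script with the env shebang. The stand-in and the scratch directory are made up for the demo; with perlbrew the first `perl` on PATH would be a real but different interpreter with its own set of modules.

```shell
# Scratch directory holding a stand-in "perl" and a script that uses the env shebang.
tmp=$(mktemp -d)

# The stand-in only reports that it was chosen; it is not a real interpreter.
cat > "$tmp/perl" <<'EOF'
#!/bin/sh
echo "stand-in perl ran: $1"
EOF

cat > "$tmp/script" <<'EOF'
#!/usr/bin/env perl
print "hello from the real perl\n";
EOF
chmod +x "$tmp/perl" "$tmp/script"

# With the scratch directory first on PATH, env resolves "perl" to the stand-in.
resolved=$(PATH="$tmp:$PATH"; command -v perl)
output=$(PATH="$tmp:$PATH" "$tmp/script")

printf '%s\n%s\n' "$resolved" "$output"
rm -rf "$tmp"
```

The script never reaches the system perl: env hands it to whichever `perl` comes first on PATH. A script with a hard-coded "#!/usr/bin/perl" would not be affected.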

~~~
prodigal_erik
> manages self-contained Perl installations

That's what a lot of languages resort to, but as the proverb goes, now you
have two problems. We instead make OS packages out of library dependencies
that didn't already have them, so we have a way to legitimately deploy them to
production servers.

------
rjbond3rd
Nice script. It would be even better with these two (virtually mandatory)
lines near the top:

use strict;

use warnings;
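
For anyone wondering what those two lines buy you, here is a small sketch run from the shell. It assumes a `perl` on your PATH; the variable names are made up, and `$coutn` is a deliberate typo for `$count`.

```shell
# Without strict, the misspelled $coutn is silently a new, empty global.
loose=$(perl -e '$count = 1; print "count=", $coutn, "\n"')

# With strict, the same typo is a compile-time error and perl exits non-zero.
if perl -e 'use strict; use warnings; my $count = 1; print "count=", $coutn, "\n"' 2>/dev/null; then
  strict_result=ran
else
  strict_result=rejected
fi

printf 'without strict: %s\nwith strict: %s\n' "$loose" "$strict_result"
```

The first run prints an empty count and carries on as if nothing were wrong; the second refuses to compile, which is the behaviour you want from a script that scrapes and follows links.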

~~~
pyre
Or:

    
    
      use common::sense;
    

<http://search.cpan.org/~mlehmann/common-sense-3.4/>

------
melling
Anyone have a Perl script to get all my saved stories, with URL, title, and
date?

~~~
pyre
It's a little rough, as I whipped it up in response to this article (it's
something that's been on my todo list), but here it is:

<https://github.com/bsandrow/hn-profile>

