
Python for Perl programmers - joslin01
http://everythingsysadmin.com/perl2python.html
======
jeremya

      However, most Python programmers tend to just read the
      entire file into one huge string and process it that way.
      I feel funny doing that. Having used machines with very
      limited amounts of RAM, I tend to try to keep my file 
      processing to a single line at a time. However, that 
      method is going the way of the dodo.
    
      contents = file('filename.txt').read()
      all_input = sys.stdin.read()
    

This is not the correct way to do things. The first example given is better.

    
    
      for line in file('filename.txt'):
        print line
    

Generators are a beautiful thing [1][2]

[1] <http://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python>

[2] <http://www.dabeaz.com/generators/>
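
For readers on Python 3, here is a minimal sketch of the same line-at-a-time idea written as a generator. The helper name `read_lines` and the temporary demo file are assumptions made for illustration; they are not from the article or the linked posts.

```python
import os
import tempfile


def read_lines(path):
    """Yield one line at a time so the whole file never sits in memory."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:          # the file object is itself a lazy iterator
            yield line.rstrip("\n")  # drop the trailing newline only


# Demo on a small temporary file (hypothetical content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")

lines_seen = [line for line in read_lines(tmp.name)]
os.remove(tmp.name)
print(lines_seen)
```

Because `read_lines` is a generator, a caller that stops early (for example after finding the first matching line) never reads the rest of the file.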

~~~
eblume
Though the results may have changed since, I once had to parse a very, very
large file (about 5 GB) line-by-line. I tested a number of different methods,
and ultimately discovered that reading the entire file using .readlines() was
faster by a significant margin. I vaguely recall it was more than four times
faster, actually - and that was a really big win when you're talking about a
data set that large.

Granted, our machine had a perhaps-unusual design in that it had multiple RAID
6 data banks, a very powerful dedicated IO controller, and about 128 GB of
RAM. So your mileage may vary.

Unfortunately I don't have access to that machine any more so I can't perform
the same benchmark, but I have a nagging feeling that if I ran the tests again
with Python 3.2 I'd get different results.

~~~
lucian1900
That's most likely because of the overhead of all those syscalls for each
line.

A nicer way to do such things is to open the file with mmap, which does one
system call and then just bumps a pointer. mmaps expose both a file and a
list-like API.
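
A rough sketch of the mmap approach in Python 3, assuming a small temporary file stands in for the large one. It only illustrates the two styles of access mentioned above (file-like `readline` and sequence-like slicing); it makes no claim about relative speed.

```python
import mmap
import os
import tempfile

# Hypothetical sample data standing in for a large file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"first line\nsecond line\nthird line\n")

with open(tmp.name, "rb") as handle:
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        first = mapped.readline()           # file-like API: read up to a newline
        header = mapped[:5]                 # sequence-like API: slice the bytes
        offset = mapped.find(b"third", 0)   # search from the start of the mapping
        # Count the lines left after the one already consumed by readline().
        line_count = sum(1 for _ in iter(mapped.readline, b""))

os.remove(tmp.name)
print(first, header, offset, line_count)
```

Note that the mapping yields `bytes`, so text has to be decoded explicitly if you need `str`.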

------
js2
_Python people tend to always compile their regular expressions; I guess they
aren't used to writing throw-away scripts like in Perl_

This is so that the RE isn't compiled each time it's used. These days the RE
module maintains an internal cache for compiled RE's, but that wasn't always
the case.

The 1.5.2 docs for compile() say "using compile() is more efficient when the
expression will be used several times in a single program" while the 2.7.2
docs add the note that "the compiled versions of the most recent patterns
passed to re.match(), re.search() or re.compile() are cached, so programs that
use only a few regular expressions at a time needn’t worry about compiling
regular expressions."

Edit: apparently the cache was present in 1.5.2; it just wasn't documented. In
1.5.2 the cache size was 20. In 2.0 it was increased to 100.

Still: explicit is better than implicit.

As well, I find assigning the RE to a well-named variable tends to make for
more readable code. RE patterns can be horribly ugly and I like to get their
definitions out of the way of the logic using them.
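
As an illustration of that last point, a small sketch in Python 3. The pattern, the name `ISO_DATE`, and the helper `extract_dates` are all made up for the example.

```python
import re

# Hypothetical pattern: an ISO-style date such as 2011-10-05.
# Compiling once at module level gives the pattern a descriptive name
# and keeps its definition away from the logic that uses it.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def extract_dates(text):
    """Return (year, month, day) string tuples for every date found in text."""
    return ISO_DATE.findall(text)


dates = extract_dates("released 2011-10-05, patched 2011-11-20")
print(dates)
```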

~~~
alexis-d
> RE patterns can be horribly ugly

It is worth noting that the verbose flag (re.X or re.VERBOSE in Python) can
make regexes more readable and friendly by allowing whitespace and
comments.

~~~
ajross
I'm desperately fighting the urge to snark, but another very effective
treatment for dealing with unreadable regex syntax is to _learn regex syntax_.
In my experience probably 90% of people who complain about regular expressions
simply don't know them very well.

It's a declarative language without symbols or recursion: that means that it's
just not well-suited to being "broken down and simplified". You just have to
bite the bullet and learn it. With a little practice, the basic features (e.g.
beginning/end of line markers, character classes and captures, maybe
non-capturing parentheses too) should be readable without trouble.

~~~
alexis-d
I think you missed my point.

I agree that for trivial regex it's useless. For instance (directly taken from
Python doc) in this case:

    
    
      a = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)
      b = re.compile(r"\d+\.\d*")
    

that's useless. But for more complex problems it may be useful (e.g.
<http://www.doughellmann.com/PyMOTW/re/>; I know email-matching regexes are
not really complex, or even a good example of regex use, but that's the first
example I found).

Another point is that even if I can read and write regexes (and I can), other
people may have to deal with my code, and it's well known that many people
don't understand or like regexes, so splitting them into small "chunks" may
help them.

~~~
ajross
To be fair: my argument wasn't _against_ the use of the /x suffix to a regex.
I'm sure that there are circumstances in the wild (though quite honestly I
can't recall seeing any in production code, and I've written and read a
truckload of regexes over the years) where it's used productively to document
a really hairy expression.

I'm just saying that 90% of the time when users complain about not being able
to read a regex, the solution should be "hit the books" not "rewrite the
expression".

------
tocomment
Can someone please make a Perl for Python programmers?

It actually took me 30 minutes today to figure out how to do:

text=open('file.txt').read()

without creating 3-5 lines of code which seems excessive for a simple
operation.

There are apparently 20 ways to do it in Perl and they need their own CPAN
library for it :-( [1]

[1] <http://search.cpan.org/~drolsky/File-Slurp-9999.13/extras/slurp_article.pod>

~~~
shanemhansen
This is actually really simple in perl. The idiomatic way to do it would be:

    
    
      open(my $fh, "<", "file.txt") or die "wft?";
      my @lines = <$fh>;
    

@lines is now an array of lines; you can join them into a single string if you
want.

~~~
ajross
Or just:

    
    
      my $data = `cat $file`;
    

Some folks dislike the useful use of cat, but anyone who's ever read a shell
script will recognize the idiom. And it's a trivial one-liner, much simpler
still than the Python code.

~~~
pdehaan
This isn't exactly idiomatic Perl and fails completely in environments that
don't have cat (Windows).

For one-time uses there's not a problem. However, if you find yourself writing
what amount to shell scripts in Perl, you might be better off just writing
shell scripts.

~~~
ajross
Excuse me? Who are you to determine what is and isn't "idiomatic"? And that
you would try this kind of thing about _perl_ of all languages is just
shocking. What's saddest is that the _lack_ of this kind of absurdist
pursuit of robustness and purity is one of the things that has always best
defined the perl community.

Broadly, your argument seems to be that spawning processes in the external
environment[1] (which, admittedly, is inherently nonportable) is a Bad Thing
in perl, and that we shouldn't do it. When I rephrase it like that, does it
sound as poo-flinging crazy to you as it does to me?

[1] Let's be honest: a working "cat" is pretty much the single most portable
thing you can put between the backticks. If you won't allow this, what _will_
you permit?

~~~
pdehaan
First off, apologies if I was coming off as rude, I wasn't trying to be.

My point was more that slurping a file without spawning another process is
already quite easy in perl. That method of slurping contradicts common
patterns like "while (<$fh>)". I'm not saying it's strictly wrong or even bad,
just not necessarily the best option.

------
mrschwabe
This post single-handedly uncovered an idea for a new niche of premium
programming books: how to program in language A for language B programmers.

------
kamaal
Sorry can't use Python instead of Perl. Basically because Python is not Perl.

Now if you want to just change a language for the sake of changing it, it's OK.
But Python and Perl have opposite philosophies. And the actual problem while
changing from Python to Perl occurs there.

To me Python looked like Java with a haircut and a French beard. That
impression keeps hovering over me; even in this tutorial, all those variable =
something.somethingElse.someMethod() lines give a strong Java vibe. When I
used it for the first time, it looked like a language for people who were
frustrated with Java but at the same time couldn't learn Perl either. It
looked like a language designed for such people.

Parsing a program visually hurts my eyes. Especially for large code blocks.
The inability to provide easy ways of writing throwaway command-line hacks is
just not acceptable at all on a Unix machine. That's precisely where Perl won
over sysadmins during its early days.

Having to bury each statement you suspect under piles of try/catch statements
doesn't feel like a scripting language. No multi-line lambdas is a major
turn-off.

Regular expression support looks totally alien and is nowhere near Perl's
way, $line =~ /<match something>/, and matching is just one part. Extraction,
substitution, et al. are all part of regular expression operations. Regexes
combined with map/grep functions make parsing a whole lot easier in
Perl.

Scoping sucks big time in Python. There is nothing remotely comparable to
CPAN.

I don't know which version to program in, 2.x or 3.x? There is nothing like
Moose and other associated modern perl packages in Python.

Lastly, I never needed another C-based language for scripting. What I needed
was a more extensible language (like Perl 6).

Python doesn't serve my scripting needs on Unix. But I agree it has good web
frameworks and it may be helpful there.

------
crb
Here is a good "cheat sheet" for how to do X in Y (where X and Y are in PHP,
Perl, Python or Ruby): <http://hyperpolyglot.org/scripting>

It doesn't cover idioms like Tom's post, but helps me remember the basic
syntax changes when jumping between these languages.

------
rgbrgb
Anyone have the opposite? I've been working in Python for a few years but now
I'm taking a Perl class.

~~~
Craiggybear
I've not used Perl in any real way for years but I am now brushing up on it
again.

You _can_ do neat one-liners in Python or Ruby but I've always liked Perl for
this kind of thing.

I also have a weird penchant for using BASH scripting to do quite complicated
things. Just because it can.

------
tantalor
Not mentioned is the common idiom of assigning regexp matches.

In perl,

> my ($a, $b) = '12' =~ /(\d)(\d)/;

In python,

> (a, b) = re.match(r'(\d)(\d)', '12').groups()

Note the python code here will throw an AttributeError if the match fails,
whereas the perl will just assign undef to the variables.
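
One way to get closer to the Perl behaviour is to check for `None` before unpacking. This is only a sketch; the helper name `two_digits` and the choice to return `(None, None)` on failure are assumptions for the example, not an established idiom from the article.

```python
import re

PAIR = re.compile(r"(\d)(\d)")


def two_digits(text):
    """Return the two captured digits, or (None, None) when there is no match."""
    match = PAIR.match(text)  # re.match returns None when the pattern fails
    return match.groups() if match else (None, None)


a, b = two_digits("12")   # match succeeds
c, d = two_digits("xy")   # match fails, so no AttributeError is raised
print(a, b, c, d)
```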

~~~
joslin01
Alternatively, use named captures:

    
    
      > obj = re.match(r'(?P<id>\d\d)(?P<name>\w+)', text)  # text: the string being matched
      > id = obj.group('id')
      > name = obj.group('name')

~~~
Craiggybear
Nice!

