
 (Python) Generator Tricks For Systems Programmers (pdf) - slig
http://www.dabeaz.com/generators/Generators.pdf
======
jbm
I love generators! They produce attractive, understandable code (once one
really understands what a generator is).

I remember using one when I was building a Mail DB search appliance (on the
backburner..) with the mail retrieval code being this:

<https://gist.github.com/1014045>

Basically, after importing, retrieving emails is as simple as this:

    
    
        get_messages, close_server  = __pop3()
        all_messages = get_messages()
        for message in all_messages:
                fn(message)
        close_server()
    

__pop3 returns a simple generator, while close_server deals with shutting
down the server / cleaning up the closure itself. There's no need to
retrieve/initialize all the emails at once (with the memory / other overhead
involved); you get just one at a time (with the connection kept open in the
background). Normally, if you wanted to do that, you'd have to stick the
"message handling code" into the "message retrieval code", or at least pass
along a "handling" function - neither of which really seems that kosher to me.
Generators solved that problem handily.
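
The linked gist isn't reproduced here, but the closure pattern being described can be sketched roughly as follows. `FakeServer` is a stand-in for a real POP3 connection, and `make_retriever` is a hypothetical name for the `__pop3`-style factory:

```python
class FakeServer:
    """Stand-in for a real mail server connection."""
    def __init__(self, messages):
        self.messages = messages
        self.open = True

    def retrieve(self, i):
        return self.messages[i]

    def quit(self):
        self.open = False


def make_retriever(server):
    """Return a (generator factory, cleanup function) pair closing
    over the same connection, as in the comment above."""
    def get_messages():
        # Lazily yield one message at a time; the connection stays
        # open in the background between yields.
        for i in range(len(server.messages)):
            yield server.retrieve(i)

    def close_server():
        server.quit()

    return get_messages, close_server


server = FakeServer(["msg one", "msg two"])
get_messages, close_server = make_retriever(server)
for message in get_messages():
    print(message)
close_server()
```

The handling code stays outside the retrieval code, and neither needs to know about the other.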

~~~
rmc
Looks like you could benefit from Python's context managers (a.k.a. the "with"
statement, <http://www.python.org/dev/peps/pep-0343/>). They let you do
something at the start, do stuff in the middle, then guarantee that something
is done at the end, even if an exception is thrown. Things that need to be
closed afterwards are a great example of where with statements help.
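
A minimal sketch of how the mail-retrieval case could look with a context manager (`FakeServer` and `pop3_session` are illustrative stand-ins, not the poster's actual code):

```python
from contextlib import contextmanager


class FakeServer:
    """Stand-in for a real mail server connection."""
    def __init__(self):
        self.open = True

    def messages(self):
        yield "hello"

    def quit(self):
        self.open = False


@contextmanager
def pop3_session():
    server = FakeServer()
    try:
        yield server      # the body of the `with` block runs here
    finally:
        server.quit()     # guaranteed cleanup, even on exceptions


with pop3_session() as server:
    for message in server.messages():
        print(message)
```

The explicit `close_server()` call disappears; cleanup happens even if message handling raises.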

~~~
jbm
Thanks, I'll take a look.

------
julian37
An alternate (and, arguably, more elegant) approach to "Parsing with Regex"
and "Tuples to Dictionaries" is to use named regex groups and the
MatchObject's "groupdict" method like so:

    
    
      import re

      # `loglines` is assumed to be an iterable of log lines,
      # as in the presentation
      logpats = r'(?P<host>\S+) (?P<referrer>\S+) (?P<user>\S+) \[(?P<datetime>.*?)\] '\
                r'"(?P<method>\S+) (?P<request>\S+) (?P<proto>\S+)" (?P<status>\S+) '\
                r'(?P<bytes>\S+)'
      logpat  = re.compile(logpats)
      groups  = (logpat.match(line) for line in loglines)
      log     = (g.groupdict() for g in groups if g)

------
RodgerTheGreat
If you really like the "pipelined" approach to describing data processing,
postfix languages _sing_ at it. Factor (<http://factorcode.org/>) is well
worth investigating.

edit: I built a little toy system for working with generators in Forth; some
aspects are a bit messy, but you can still write code like

    
    
      100 upto sum dup * 100 upto { dup * } map sum -
    

([https://github.com/JohnEarnest/Mako/blob/master/examples/Alg...](https://github.com/JohnEarnest/Mako/blob/master/examples/Algorithms/Feasible.fs))
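
For readers who don't speak Forth: the one-liner above appears to compute the square of the sum of 1..100 minus the sum of the squares. A rough generator-pipeline translation into Python (my sketch, assuming `upto` yields 1 through n inclusive):

```python
def upto(n):
    """Lazy sequence 1, 2, ..., n, like the Forth word `upto`."""
    return range(1, n + 1)

# 100 upto sum dup *          -> square of the sum
square_of_sum = sum(upto(100)) ** 2

# 100 upto { dup * } map sum  -> sum of the squares
sum_of_squares = sum(x * x for x in upto(100))

print(square_of_sum - sum_of_squares)  # → 25164150
```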

~~~
irahul
Apart from chaining generators, you can write a simple helper to pipe general
expressions.

    
    
        def pipe(cur_val, *fns):
            """
            Pipes `cur_val` through `fns`.
            ::
                def sqr(x): return x * x
    
                def negate(x): return -x
    
            `pipe(5, sqr, negate)`
    
            is the same as
            ::
                negate(sqr(5))
            """
            for fn in fns:
                cur_val = fn(cur_val)
            return cur_val
    
    

I find `pipe(5, sqr, negate)` neater than `negate(sqr(5))`, especially when
the nesting is greater than 2.

This is inspired by Clojure's threading macro -> and is a refactoring of
@fogus's code snippet on Stack Overflow. He used reduce to implement the loop
- though loops can be implemented with reduce, and that's how you generally do
it in Clojure et al., it isn't idiomatic Python.
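
For comparison, the reduce-based version alluded to looks roughly like this (a sketch of the idea, not @fogus's exact snippet):

```python
from functools import reduce


def pipe(cur_val, *fns):
    """Thread `cur_val` through `fns`, folding with reduce instead
    of an explicit loop."""
    return reduce(lambda acc, fn: fn(acc), fns, cur_val)


def sqr(x):
    return x * x


def negate(x):
    return -x


print(pipe(5, sqr, negate))  # same result as negate(sqr(5)), i.e. -25
```

Both versions behave identically; the explicit for loop is simply easier on Python readers.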

------
exDM69
Python's iterators and generators are a perfect example of doing practical
lazy evaluation. Streams in Scheme and lazy sequences in Clojure are similar
in many ways.

A good programming language should try to find a sweet spot that has the best
of both worlds, using lazy and eager evaluation in different situations. It's
also a lot harder to convert code from eager to lazy evaluation than it is to
do the other way around, so I'm thinking that lazy evaluation should be the
"default", as it is in Python 3 (where range is lazy, like Python 2's xrange)
and in Clojure for iteration and sequences.
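
A small illustration of the eager/lazy split: a list comprehension does all its work up front, while a generator expression produces values on demand and can even range over an infinite sequence:

```python
import itertools

# Eager: all ten squares exist in memory immediately.
eager = [x * x for x in range(10)]

# Lazy: an *infinite* sequence of squares; nothing computed yet.
lazy = (x * x for x in itertools.count())

# Pull just the first three values from the infinite sequence.
first_three = list(itertools.islice(lazy, 3))
print(eager[:3], first_three)  # → [0, 1, 4] [0, 1, 4]
```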

------
grifaton
Inspired by this very presentation, I recently knocked together a library for
munging text streams in Python, rather than calling through to sed and grep.

If anyone's interested, the code's here:
<https://github.com/inglesp/munger.py>

~~~
slig
Is there any special reason why you didn't use imap here
[https://github.com/inglesp/munger.py/blob/master/munger.py#L...](https://github.com/inglesp/munger.py/blob/master/munger.py#L116)
?

~~~
grifaton
Habit, I think. Will change it when I get the chance.

------
plainOldText
I've really enjoyed this presentation. Thanks for sharing.

Here's the implementation of "tail -f filename" in Python given as an example
(slightly altered):

    
    
      import time

      def follow(filename):
           f = open(filename)
           f.seek(0, 2)               # jump to the end of the file
           while True:
               line = f.readline()
               if not line:
                   time.sleep(0.1)    # no new data yet; avoid a busy loop
                   continue
               yield line

      for new_line in follow('/tmp/test.file'):
           print new_line
    

This is so cool.
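
And because follow() is a generator, it composes with further generator stages, e.g. a live grep, which is the same trick the presentation builds on (sketched here over a plain list so it runs without an ever-growing file):

```python
def grep(pattern, lines):
    """Yield only the lines containing `pattern` - works the same
    whether `lines` is a list or a live follow() generator."""
    for line in lines:
        if pattern in line:
            yield line


lines = ["GET /index", "POST /login", "GET /about"]
for hit in grep("GET", lines):
    print(hit)
```

With a real file you'd write `grep("ERROR", follow('/var/log/app.log'))` and get a live, filtered tail.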

------
drats
I love this pdf. The one thing I've been wanting is for the original author,
or someone else with another ~1gig dataset, to do the test again with pypy and
give us the speed figures.

~~~
bryanh
Running the same scripts (mostly, except my Apache log lines were a bit
longer) on a 1.1GB file resulted in:

python 2.7.1: list - 15.7s / generator - 14.1s

pypy 1.7: list - 7.5s / generator - 7.7s

On an MBP with a 2.2GHz i7 and 8GB of RAM. Fun!

~~~
drats
Thanks, the future looks bright. Although RAM usage would be higher for PyPy,
the improvement is massive.

~~~
bryanh
I checked: PyPy used about 30MB, while vanilla Python was right around 3-4MB.
So an order of magnitude.

------
slig
Reading the "Hacking Scrabble (part 1)"
(<http://news.ycombinator.com/item?id=3347720>)
source, I remembered this presentation. I learned a lot from it and thought
you guys might like it too.

~~~
insipid
The follow-up presentation here is just as good, in my opinion:
<http://www.dabeaz.com/coroutines/Coroutines.pdf>

~~~
slig
Thank you! I'll read it ASAP.

------
pyre
I've got an implementation of tee in Python. Python itself comes with a
reverse tee (reading from multiple files/file-like objects), but there was
nothing out there for writing once to multiple files (or file-like objects).

It wasn't as easy as one would think, because trying to use it in conjunction
with something like subprocess.Popen() presents a problem: Popen requires a
file object with an actual file handle (i.e. a real file.fileno()) rather than
just some mock object that implements the right instance methods.

I need to get it up on github and/or pypi.

~~~
bockris
<http://docs.python.org/library/itertools.html>

has an implementation of tee

~~~
pyre
That's not the tee that I'm talking about.

This takes one iterator and returns n independent iterators:

<http://docs.python.org/library/itertools.html#itertools.tee>

This is what I'm talking about when I'm talking about tee:

<http://linux.die.net/man/1/tee>

Python has a reverse tee:

<http://docs.python.org/library/fileinput.html>

This iterates over input from multiple sources, but my implementation takes
input from a single source and writes it to multiple destinations.
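
The easy part of such a write-side tee can be sketched in a few lines; as noted above, the hard part (which this sketch deliberately skips) is satisfying consumers that demand a real OS-level fileno():

```python
import io


class Tee:
    """Write-once-to-many: every write() fans out to all destinations.
    A sketch of the idea, not the poster's implementation - it has no
    fileno(), so it won't work with subprocess.Popen()."""
    def __init__(self, *files):
        self.files = files

    def write(self, data):
        for f in self.files:
            f.write(data)

    def flush(self):
        for f in self.files:
            f.flush()


a, b = io.StringIO(), io.StringIO()
tee = Tee(a, b)
tee.write("hello\n")
print(a.getvalue() == b.getvalue())  # → True
```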

------
gerds
Is there an equivalent in Ruby?

~~~
irahul
Not as high level as Python's generator expressions, but you can use Ruby's
fibers to implement generators.

~~~
rue
Enumerators solve typical generator scenarios reasonably well, and are easy to
implement for your own classes. They use Fibers internally and include
Enumerable. Enumerator supports both internal and external iteration.

