
Using Python generators for real work [pdf] - conesus
http://www.dabeaz.com/generators/Generators.pdf
======
davidhollander
Using Python generators for real work... or why I switched to Lua.

They're great at first, but then let's say you need to split pieces of the
generator into helper functions for better organization, or yield recursively.
Turns out you can't, because yielding A) is syntactic and B) can only be done
from the generator's own function body.

It took us until Python 3 to replace "print x" with the more sensible,
functional "print(x)". Yet for yield we are stuck with even more arbitrary
syntactic rules for "yield x", instead of being able to call "yield(x)" and
have it yield to the enclosing generator/coroutine from wherever it is in the
call stack. The result is unintuitive, hard-to-refactor code.

<http://lua-users.org/wiki/LuaCoroutinesVersusPythonGenerators>
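To make the complaint concrete: in Python 2 (before PEP 380), a recursive call or helper cannot yield on the caller's behalf. Calling it only creates another generator object, which the outer generator must drain and re-yield by hand. A minimal sketch, using a hypothetical `flatten` helper:

```python
def flatten(items):
    """Yield the leaves of an arbitrarily nested list, depth first."""
    for item in items:
        if isinstance(item, list):
            # Calling flatten(item) alone would just build a generator object;
            # the outer generator has to drain it and re-yield every value itself.
            for leaf in flatten(item):
                yield leaf
        else:
            yield item

print(list(flatten([1, [2, [3, 4]], 5])))  # [1, 2, 3, 4, 5]
```

This relay loop only forwards values outward: anything passed in with `send()` lands in the outer generator's `yield`, not the inner one, which is part of why the workaround scales poorly to coroutines.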

~~~
jnoller
Actually, Python generators will be changing quite a bit soon:
<http://www.python.org/dev/peps/pep-0380/>
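For reference, PEP 380 shipped in Python 3.3 as `yield from`. A minimal sketch (the names `flatten`, `averager`, and `collect` are illustrative, not taken from the PEP) showing that delegation replaces the manual re-yield loop and also forwards `send()` calls and the sub-generator's `return` value:

```python
def flatten(items):
    """Depth-first leaves of a nested list, using delegation."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)  # replaces "for x in flatten(item): yield x"
        else:
            yield item

def averager():
    """Coroutine: receives numbers via send(); None ends input and returns the mean."""
    total = 0.0
    count = 0
    while True:
        value = yield
        if value is None:
            return total / count if count else 0.0
        total += value
        count += 1

def collect(results):
    """Delegating generator: send() passes straight through to averager(),
    and averager's return value becomes the value of the yield from expression."""
    mean = yield from averager()
    results.append(mean)

results = []
coro = collect(results)
next(coro)            # prime: runs until averager's first bare yield
for n in (10, 20, 30):
    coro.send(n)
try:
    coro.send(None)   # averager returns; collect appends the mean and finishes
except StopIteration:
    pass
print(results)  # [20.0]
```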

~~~
davidhollander
Thanks for the link, I had not read 0380.

The proposed solution seems to increase syntactic complexity rather than
reduce it, by bringing another keyword, "from", into the mix, which is hardly
optimal imo.

To me this indicates a systematic flaw in thinking of hitting a "yield" in the
call stack as analogous to "return", when in fact it is the inverse: it is
analogous to waiting for a function to return, and thus more similar to a
print() function call. The whole point of coroutine yielding is that it
inverts the point of view of a call stack and allows the called routine to be
the calling routine as well.
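A small illustration of the "waiting" reading: when a Python generator is used as a coroutine, `line = yield` suspends it until the caller resumes it with `send()`, so data flows in through the yield rather than out of it. A minimal sketch with a hypothetical `grep` coroutine:

```python
def grep(pattern, matches):
    """Coroutine: each send(line) resumes it; lines containing pattern are collected."""
    while True:
        line = yield          # suspend here until the caller sends a value
        if pattern in line:
            matches.append(line)

matches = []
g = grep("python", matches)
next(g)                       # advance to the first yield ("priming")
g.send("I like python")
g.send("I prefer lua")
g.send("python generators")
g.close()                     # raises GeneratorExit inside grep, ending it
print(matches)  # ['I like python', 'python generators']
```

In plain Python generators that suspension point still has to sit in the coroutine's own body, which is the restriction discussed at the top of the thread.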

I still use Python for some things and have too many fond memories to ever
hate it, but even if generators get cleaned up I'll probably stay w/ Lua for
pipeline-type projects, as I've grown too used to its runtime speed within an
order of magnitude of C, its better coroutines, and its more Scheme-like nature.

~~~
sparky
My interpretation of the "yield" keyword is not that of a _coroutine_ yield,
but that of "cough up the following value", which is more in line with
"return". Under that interpretation, the fact that generators can also be used
to implement coroutines is a coincidence with unfortunate terminology
namespace conflicts.

I looked around a bit for the original semantic intention behind choosing
"yield" as a keyword, but came up empty. Anyone?

~~~
davidhollander
It is my understanding that "yield" refers to "execution control". If semantic
meaning matters more than lexical form, let us ignore the names "coroutine" and
"generator" and focus on the general logic of code continuation, that is, the
difference between pausing the current execution context and _destroying it_.

Function call: halt further processing of current call stack, execute
procedures, resume call stack after function call when control returns.

Yield: halt further processing of current call stack, execute procedures,
resume call stack after yield when control returns.

Return: destroy current call stack.

The variation present on the code continuation axis is far more significant
than the variation on the coughing-up-values axis.
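The pause-versus-destroy distinction can be observed directly with `inspect.getgeneratorstate` (Python 3.2+): after a `yield` the generator is suspended with its locals intact, while a `return` (or falling off the end) closes it for good. A minimal sketch with a hypothetical `counter` generator:

```python
import inspect

def counter():
    n = 0
    while n < 2:
        n += 1
        yield n       # pause: the frame, including the local n, is kept alive
    return            # destroy: the frame is discarded and iteration ends

gen = counter()
print(inspect.getgeneratorstate(gen))  # GEN_CREATED
print(next(gen))                       # 1
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED (n survives between calls)
print(next(gen))                       # 2
print(next(gen, "done"))               # done (the return raised StopIteration)
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
```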

~~~
tsellon
I think, in this case, 'yield' is being used less in the sense of a street
sign, and more in the sense of crop yield.

------
xtacy
Fantastic!

It would be interesting to combine pipelines into something that branches off
into various categories. For example, you could split the lines of an
access-log file into IP addresses and request sizes and fork off separate
processing threads: one for obtaining the unique set of IP addresses and
another for summing up the sizes.
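One way to get this kind of branching in a single thread is a push-style "broadcast" stage built from generator-based coroutines: each log line is sent once and fanned out to independent consumers. A minimal sketch, assuming Apache common-log-style lines (the sample lines and the `coroutine`, `broadcast`, `unique_ips`, and `total_bytes` names are made up for illustration); the branches here are coroutines, not threads:

```python
def coroutine(func):
    """Decorator: create a generator-based coroutine and advance it to its first yield."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def broadcast(targets):
    """Forward every received item to each target coroutine."""
    while True:
        item = yield
        for target in targets:
            target.send(item)

@coroutine
def unique_ips(result):
    """Accumulate the set of client IPs (first field of each line)."""
    while True:
        line = yield
        result.add(line.split(None, 1)[0])

@coroutine
def total_bytes(result):
    """Sum the last field (response size); '-' means no body was sent."""
    while True:
        line = yield
        size = line.rsplit(None, 1)[1]
        if size != "-":
            result[0] += int(size)

ips, size_total = set(), [0]
fan_out = broadcast([unique_ips(ips), total_bytes(size_total)])
log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 304 -',
    '10.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 200 100',
]
for line in log:
    fan_out.send(line)
print(sorted(ips), size_total[0])  # ['10.0.0.1', '10.0.0.2'] 2426
```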

It seems like all that is needed for describing a pipeline is:

    
    
      (a) a queue for input
      (b) a processing program that's connected to the queue
      (c) (possibly multiple) queues for output
      (d) a topology connecting the processing programs
      (e) a job scheduler.
    

On the face of it, it looks similar to Apple's Automator and to Matt Welsh's
PhD thesis on SEDA:

    
    
      * http://www.eecs.harvard.edu/~mdw/proj/seda/
      * Paper: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
    

EDIT: Formatting
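The five ingredients (a) to (e) map fairly directly onto the standard library's `queue` and `threading` modules. A minimal sketch, with hypothetical `stage`/`drain` helpers, a sentinel object for end-of-stream, and the simplest possible scheduler (one OS thread per stage); bounded queues, thread pools, and error handling are left out:

```python
import queue
import threading

SENTINEL = object()  # end-of-stream marker, forwarded downstream before a stage exits

def stage(func, inbox, outboxes):
    """(b) A processing program: read items from its (a) input queue, apply func,
    and put each result on every one of its (c) output queues."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            for out in outboxes:
                out.put(SENTINEL)
            return
        result = func(item)
        for out in outboxes:
            out.put(result)

def drain(q):
    """Collect items from a queue until the sentinel arrives."""
    items = []
    while True:
        item = q.get()
        if item is SENTINEL:
            return items
        items.append(item)

# (d) Topology: strip each line, then fork into an upper-casing branch and a length branch.
src, to_upper, to_len, upper_out, len_out = (queue.Queue() for _ in range(5))
topology = [
    (str.strip, src, [to_upper, to_len]),
    (str.upper, to_upper, [upper_out]),
    (len, to_len, [len_out]),
]

# (e) Scheduler: here simply one OS thread per stage.
threads = [threading.Thread(target=stage, args=spec) for spec in topology]
for t in threads:
    t.start()

for line in ["  alpha ", "beta  "]:
    src.put(line)
src.put(SENTINEL)

uppers, lengths = drain(upper_out), drain(len_out)
for t in threads:
    t.join()
print(uppers, lengths)  # ['ALPHA', 'BETA'] [5, 4]
```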

~~~
davidhollander
Here is some older material on pipeline coding in Lua that you might be
interested in; it describes this problem in terms of Filters, Sources, and
Sinks: <http://lua-users.org/wiki/FiltersSourcesAndSinks>

LuaSocket implementation of the above:
<http://w3.impa.br/~diego/software/luasocket/ltn12.html>
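The same vocabulary maps loosely onto Python generators: a source yields items, a filter is a generator function that takes an iterable and yields transformed items, and a sink consumes the stream. A minimal sketch (an analogy only, not a port of LTN12's API; `source`, `strip_comments`, `to_upper`, and `list_sink` are hypothetical):

```python
def source(lines):
    """Source: yields raw items one at a time."""
    for line in lines:
        yield line

def strip_comments(items):
    """Filter: removes '#' comments, then drops lines left blank."""
    for item in items:
        item = item.split("#", 1)[0].strip()
        if item:
            yield item

def to_upper(items):
    """Filter: transforms each item."""
    for item in items:
        yield item.upper()

def list_sink(items):
    """Sink: pulls the whole stream through the filters and stores the result."""
    return list(items)

raw = ["a  # note", "", "b"]
print(list_sink(to_upper(strip_comments(source(raw)))))  # ['A', 'B']
```

Nothing runs until the sink starts pulling; each item passes through every filter before the next one is read from the source.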

------
Gonsalu
There is an updated version of this talk:

    
    
        http://www.dabeaz.com/generators-uk/index.html
    

Also, more interesting talks from the same author:

    
    
        http://www.dabeaz.com/talks.html

~~~
Luyt
Clickable links (without spaces in front):

<http://www.dabeaz.com/generators-uk/index.html>

<http://www.dabeaz.com/talks.html>

------
ot
It seems that pipeline-style programming is now popular in many modern
programming languages, for example C# (with Linq), F# (with the |> pipe
operator and the Seq module), and Ruby and JS+jQuery (with method chaining).

I really wish Python had some syntactic feature that encourages this style;
writing "gen_cat(gen_open(gen_find(...)))" is rather cumbersome compared to
"gen_find(...) | gen_open | gen_cat".

Of course you can override __or__ like some libraries do, but it is not the
_One Way To Do It_...
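A minimal sketch of the operator-overloading approach (the `Pipe` class and the `squares`/`evens` stages are hypothetical, not from any particular library). Because generators and ranges define no `|` operator, Python falls back to the right operand's `__ror__`, so only the pipeline stages need wrapping:

```python
class Pipe:
    """Wraps a one-argument function so it can sit on the right of '|'."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, upstream):
        # Called for `upstream | pipe` when the left operand has no usable __or__.
        return self.func(upstream)

@Pipe
def squares(nums):
    for n in nums:
        yield n * n

@Pipe
def evens(nums):
    for n in nums:
        if n % 2 == 0:
            yield n

print(list(range(6) | squares | evens))  # [0, 4, 16]
```

Since `|` is left-associative, `range(6) | squares | evens` evaluates as `(range(6) | squares) | evens`, and every stage stays lazy until `list()` pulls the values through.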

~~~
xtacy
Mathematica has a nice syntax for function application, mapping over lists,
etc. It's also possible to define such operators in Haskell, which has the
added advantage of type-safety.

    
    
      f @ {1,2,3} == f[{1,2,3}]
      f /@ list == map(f, list)
      f @@ {1,2,3} == f[1,2,3]
    
      (#1 * 2)& /@ {1,2,3} == {2,4,6}
    

etc.
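For comparison, rough Python counterparts of the Mathematica forms above, using a hypothetical stand-in `f` that simply returns its arguments so the difference between the forms is visible:

```python
def f(*args):
    """Hypothetical stand-in: returns the tuple of arguments it was called with."""
    return args

xs = [1, 2, 3]

assert f(xs) == ([1, 2, 3],)                    # f @ {1,2,3}: f applied to the whole list
assert list(map(f, xs)) == [(1,), (2,), (3,)]   # f /@ xs: f mapped over the elements
assert f(*xs) == (1, 2, 3)                      # f @@ {1,2,3}: elements spliced in as arguments
assert [x * 2 for x in xs] == [2, 4, 6]         # (#1 * 2)& /@ {1,2,3}
```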

~~~
ot
That's like F#'s |> operator:

    
    
        f |> g == g(f)
        [1 .. 3] |> Seq.map (fun x -> x * x) |> Seq.sum |> printfn "%A" // prints 14
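
Python has no built-in `|>`, but a small helper built on `functools.reduce` gives the same left-to-right reading. A minimal sketch of the F# line above (`pipe` is a hypothetical helper, not a standard-library function):

```python
from functools import reduce

def pipe(value, *funcs):
    """Apply funcs left to right, like F#'s value |> f |> g, i.e. g(f(value))."""
    return reduce(lambda acc, func: func(acc), funcs, value)

total = pipe(
    range(1, 4),                      # [1 .. 3]
    lambda xs: (x * x for x in xs),   # Seq.map (fun x -> x * x)
    sum,                              # Seq.sum
)
print(total)  # 14
```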

------
Vivtek
I wish I could upvote this ten times. Fantastic presentation.

------
ramidarigaz
This is _awesome_. I think I finally understand generators, and I just thought
of a couple use cases in the code I'm working on now. This is great. Fantastic
submission.

------
carlhu
The design pattern the author presents for expressing iteration, and the log
parsing example, are original and beautiful. I hope the author reads this
comment thread and sees how appreciative we all are of his contribution.

