Yes, it has nice features, such as being able to replace the reactor/hub thing. It has futures/promises/deferreds. That has all been done before in Twisted. Yields are cute, and there was monocle; I wouldn't say it exactly took off: https://github.com/saucelabs/monocle
Twisted has inlineCallbacks that use yields as well. Just import Twisted into stdlib then and use that.
I am surprised that gevent was dismissed. OK, there is also eventlet, if someone doesn't like gevent. Monkey patching is scary? Is it really _that_ scary? Most sane and portable IO code probably runs on it today. Why? Because there is no need to create a parallel world of libraries. Write a test: does it pass? Does it handle your use case? I'll take not knowing exactly when my green threads switch tasks, and adding another green-thread lock here and there, over doubling my code size with yields, callbacks and futures.
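(For the skeptics, here is roughly what that buys you -- a minimal sketch assuming gevent; the fetch helper and the host names are just placeholders:)

from gevent import monkey
monkey.patch_all()        # stdlib socket & friends now yield to gevent's hub

import socket
import gevent

def fetch(host):
    # plain blocking-looking code; with the patched socket module, each call
    # cooperatively switches to another green thread while waiting on the network
    s = socket.create_connection((host, 80))
    s.sendall(b"HEAD / HTTP/1.0\r\nHost: " + host.encode("ascii") + b"\r\n\r\n")
    return s.recv(100)

# the three requests run concurrently, with no callbacks or yields in sight
jobs = [gevent.spawn(fetch, h) for h in ("example.com", "python.org", "pypi.org")]
gevent.joinall(jobs)
print([j.value[:20] for j in jobs])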
Let's talk about Twisted (sorry, couldn't resist the WAT video reference). I remember searching for years for parallel libraries to parse exotic protocols. The regular Python library is there, but no, can't use that, sorry. Gotta go find or write one that returns Deferreds. Add a single Twisted module to your code and good luck! -- it ripples all the way to the top through your API and you are doomed to being locked into the Twisted world forever.
When gevent and eventlet came around it was like a breath of fresh air. This is what sane concurrent IO looks like in Python: http://eventlet.net/doc/examples.html
My fear is that many will just say fuck it, I'll just use Go/Rust/Erlang for IO-bound concurrent problems.
It is nice having a benevolent dictator, except when he goes a little crazy; then dictatorship doesn't sound like so much fun anymore.
Sane is in the eye of the beholder. gevent looks nice, but I'd be very hesitant when it comes to actually supporting it in production. It monkey-patches the standard library and messes with CPython internals to achieve what it does, greatly increasing the chance it will conflict with some other piece of code (for example, that bizarre ancient internal proprietary library you're using that started life in Fortran, etc.).
In the case of async, I'm glad to see a from-scratch implementation for the standard library. It's a weird area that necessitates constructions for which there is no real standard Python style. You only need to look at Twisted and e.g. its method chaining to realize this stuff would need a thorough sanity rework before it ever became standard anyway.
Also, most other implementations take the approach of building their own little world. This is definitely true of Twisted. You write code for Twisted, not for Python. Gevent at least doesn't suffer from this.
As someone who's contributed to Eventlet, I've always felt that the only way that it could get over this scary hurdle (and it is legitimately scary) is for it to be integrated within Python itself. Almost all of the weird problems come from fighting with the baked-in assumptions of the Python runtime. Eventlet does try to alleviate the scariness a little bit by allowing you to import "greened" modules rather than changing the global versions, but that has its own problems.
If it were integrated with Python, there would be no monkeypatching, no special magic, it would be just how things work. That said, I'm not at all surprised that Guido doesn't favor a coroutine-based solution; his opposition to general coroutines is as famous as his opposition to anonymous functions. (to clarify: I don't think the @coroutine decorator creates "real" coroutines, any more than generators were already coroutines)
I used gevent at first for a project that needed async I/O and it worked really well, but then I switched to Erlang and I realized how poor a choice Python is for such tasks. The language really needs to be designed for it from the start (like Go, Rust, Erlang, etc. Haskell wasn't designed for it from the start, but because of its functional purity, bolting it on was "natural" -- it isn't so for Python, IMHO).
Yeah, if you're going to have to do async to be performant, then it had better be pretty pervasive throughout all the libraries. Bonus points if the language supports syntax to make async easier as well. Node is beating out Python for server stuff not simply because it is "async", but because it is so much FASTER. The speed of V8 vs. CPython is a big part of that. In fact, vanilla JS doesn't have much to make async programming particularly easy: it has verbose function declarations and no yield mechanism. Even library-level solutions like promises are merely 'OK'.
Still, it is easier to build a fast server that can handle streams in Node than it is in Python. Async Python? I'll just stick to async JS in that case.
I think there is another issue here. The Python world has watched Node.js eat its lunch on the server side and decided: ah, surely that is because Node.js has async; if we add that too, everyone will love Python again and come crawling back. They are not saying that outright, but I think it is written between the lines.
Except, as you pointed out, people use Node.js because 1) it is JS and 2) V8 is fast.
The reason you're getting so many downvotes is that your comment is not just silly but flat wrong. Twisted and Tornado were around before Node.js. There are also async frameworks that make writing async code the same as synchronous code. I like Tornado, but I am using Node.js for my current app because of the libraries. This is where Node.js really shines: the community and libraries are awesome. Twisted has a lot of libraries but it has so much going on that many developers find it too complex. Tornado is a much simpler async framework to adapt to and allows you to run Twisted libraries.
Twisted's inlineCallbacks and Tornado's gen module get rid of all the async spaghetti code. This is harder to do with Node.js, but I still chose Node.js because the available libraries made my project quicker to develop.
Sorry I didn't express myself correctly (see my reply to akuchling below).
Basically yes: Python has had Twisted for years; it had Diesel, Monocle, Tornado, and some others. I am aware of those, and if you've read my comment you saw that I used Twisted enough to know its ins and outs (5 years).
> There are also async frameworks that make writing async code the same as synchronous code.
Yes, there is inlineCallbacks, and I used it. Node.js also has async (https://github.com/caolan/async). But you don't address the main problem that I raised -- fragmentation of libraries. Python is great because it comes with batteries, and then you can find even more batteries everywhere, _except_ if you use an async framework like Twisted, which percolates all the way through your API code. Once your socket.recv() returns a Deferred, that Deferred will bubble up all the way to the user interface. So now you end up searching for, or recreating, a parallel set of libraries.
> Twisted has a lot of libraries but it has so much going on that many developers find it too complex.
For those who want to take it up, it is too complex and has too many of its own libraries; for those already in it, it is not complex, but it doesn't have enough libraries -- every library you use has to be Twisted now. That's the danger of inventing a new framework.
Yes, it will be standard, but there is already a practical standard -- eventlet and gevent. This is something Node.js doesn't have. I will personally take monkey-patching, and the danger that my function will context-switch inside while doing IO, over using Twisted. I saw a practical benefit from it at least.
I have a question for you since you have a lot of experience with async. Eventually node.js will have generators (when V8 implements ECMAScript 6) which should allow node.js to have something like gevent. What kind of effect do you think this will have on the node.js world?
If it has generators it will make things easier. I remember being happy about finding out Twisted has inlineCallbacks. It basically lets you not split your logical function into multiple functions simply because it has to do some IO.
Before, the code used to look like:

def processShoppingCart(...):
    d = <some_io_function_returning_a_Deferred>
    d.addCallback(_cb1)
    return d  # d is a Deferred

def _cb1(...):
    d2 = <another_io_function_returning_a_Deferred>
    d2.addCallback(_cb2)
    return d2  # d2 is another Deferred

etc.
Which, with generators and inlineCallbacks, turns into:

@someDecoratorThatEnablesUsingInlineCallbacks
def processShoppingCart(...):
    # .. do some work ..
    yield <some_io_function_like_check_db>
    # .. do some more work ..
    yield <some_other_io_function>
    ....
You get the idea. I am not too familiar with Node.js but I imagine it will help quite a bit.
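For readers who haven't seen the style, here is a minimal self-contained sketch using Twisted's inlineCallbacks; check_db and charge_card are made-up stand-ins for whatever IO a real cart would do:

from twisted.internet import defer

def check_db(cart_id):
    # stand-in for a real async DB lookup; a real one would return a Deferred
    # from an IO call, here we just return one that has already fired
    return defer.succeed({"id": cart_id, "items": ["book", "mug"]})

def charge_card(cart):
    # stand-in for a real async payment call
    return defer.succeed("receipt-%s" % cart["id"])

@defer.inlineCallbacks
def processShoppingCart(cart_id):
    cart = yield check_db(cart_id)      # reads like blocking code
    receipt = yield charge_card(cart)   # but each yield waits on a Deferred
    defer.returnValue(receipt)          # how inlineCallbacks "returns" a value

Calling processShoppingCart(42) still returns a Deferred that eventually fires with the receipt, so it composes with the rest of Twisted.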
People have been writing async apps in Python since 1995 (Medusa); Twisted was first published around 2001. It's not like async programming is new to Python.
Sorry I didn't say it correctly, I meant that it seems the renewed interest in async is stemming from watching Node.js get all the attention.
I have been using Twisted for 5 years full time and have also used eventlet and gevent. From talking to others, I have found few who enjoyed or loved Twisted. It was pretty much the only sane way to do concurrent, performant IO for a while. But when the green-thread approach came along, I never looked back.
All was well. Then one day Node.js showed up, and it seems it has started to eat Python's lunch -- fast, scripted development on the server side, with some reasonable concurrency. And it was faster too.
Python devs looked at it and couldn't believe their eyes. And I speculate many have concluded it was because everyone was in love with a callback-based async IO paradigm. So that's my guess as to why we are seeing this proposal.
Go was designed for concurrency from the start, and it even has null pointers and lots of mutation; this makes the concurrency problem easier to solve because you have more control over state.
Meh, I'm not sure where I stand on this argument - I've recently been having an affair with Haskell and I prefer the Haskell Way. The way Haskell models I/O seems confusing at first but monads make the whole thing more manageable than any language I've ever used.
"More control over state" makes me feel funny inside; if you haven't actually used Haskell then you would probably see that pretty much every language right now is immature in comparison to Haskell when it comes to "control over state".
I also think mutability makes reasoning about large concurrent and/or parallel programs much more difficult.
My company used gevent in production at very large scale, and we were extremely happy with it. In fact, we ported our existing Django and Flask applications to run under gevent, which was a surprisingly fast process. (Weeks, not months, to port rather large codebases.) We did have to be careful with third-party libraries, like Zookeeper clients, but that was worth the tradeoff. We got the performance of an evented structure without having to rewrite a ton of code.
How does one go about porting Django apps to be compatible with gevent? In that you used gevent in your Django code, or that you built something completely different?
I assume that you use a WSGI server which allocates one greenlet per request (e.g. gevent.wsgi, or Gunicorn's async workers), and make sure that the rest of your code isn't going to block the event loop too much. Once that's done, you can have a whole bunch of HTTP requests being handled at once. That's nice if your server spends most of its time blocking on database requests or something.
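Roughly, yes. A sketch of that setup, where myproject.wsgi is assumed to be whatever module exposes your existing Django/Flask WSGI callable:

from gevent import monkey
monkey.patch_all()                      # so DB/socket calls yield instead of blocking

from gevent.pywsgi import WSGIServer
from myproject.wsgi import application  # hypothetical: your existing WSGI app

# one greenlet per request; slow downstream IO no longer ties up a whole worker
WSGIServer(("0.0.0.0", 8000), application).serve_forever()

# or, equivalently, let Gunicorn manage it with its gevent worker class:
#   gunicorn -k gevent -w 4 myproject.wsgi:application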
I have -- we used Twisted (about 10k lines of code). Then we switched to eventlet -- and the code shrank to about 5 or 6k lines. There were some issues with monkey-patching, but tests showed them, and we fixed them.
As someone mentioned, if they instead standardized on greenlet, then the monkey-patching objection wouldn't apply.
If your bizarre ancient internal thing doesn't need to do async I/O, then don't import it monkey-patched. I use Eventlet heavily in production, and this tends to be pretty easy.
Fortran libraries; you mean SciPy? A bunch of the Python numerics world (one of the big growth cases for Python!) is built on Fortran. LAPACK/ARPACK are still unmatched.
YES. Thank you for saying this. Some of the async code that I maintain uses Twisted and some of it uses Eventlet, and the difference between them is night and day. The code using Eventlet is so much cleaner, so much easier to maintain, and (oddly enough) so much less magical than the Twisted stuff. This was written by the same people, and they're all really good programmers, so the obvious confounding variables are not an issue here. Eventlet and Gevent are just so much better.
Worried about monkey-patching? Then only monkey-patch the parts you need to be asynchronous. Worried about magic that you don't understand? Have a look at the code; the magic is actually pretty straightforward after you've paid a little attention to the man behind the curtain.
If you're interested in async stuff for Python, I urge you to have a look at Eventlet or Gevent.
> Yes, it has nice features, such as being able to replace the reactor/hub thing. It has futures/promises/deferreds. That has all been done before in Twisted.
Yeah, but this will be in the stdlib. And I think the hope is that one-event-loop-to-rule-them-all will let the various frameworks play nicely with each other. For instance, Glyph just mentioned to me that he doesn't use IPython any more to work on Twisted code, because IPython now has a Tornado event loop which can conflict with the Twisted code he's playing with...
I don't think the hope is that this will be better than Twisted or gevent in terms of implementation (obviously the API will be nicer than Twisted's, given Python 3), just that it will be the standard by virtue of being in the stdlib.
Tornado has a well-documented (and, as far as comparisons with gevent and Twisted go, the least complicated IMHO) but not well-known/used "inline callback" mechanism with yields to simplify async code.
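For reference, a small sketch of that style, assuming a recent Tornado with gen.coroutine (fetch_content_type is just an example name):

from tornado import gen
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_content_type(url):
    # fetch() returns a Future; yield suspends this coroutine until it
    # resolves, without blocking the IOLoop or nesting any callbacks
    response = yield AsyncHTTPClient().fetch(url)
    raise gen.Return(response.headers.get("Content-Type"))

You drive it from the IOLoop as usual; the point is that the body reads top to bottom with no callback nesting.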
One thing I love about gevent is that you can share code between async and non-async. Most of my project benefits from async IO, but there's one part that needs to use a lot of CPU within a single process. So that part uses multithreaded Jython, the rest uses gevent, the common code is shared, and it all just works.
Exactly. I was surprised at how, in the whole "ideas" mailing list discussion Guido had, and in other forums, that is dismissed with a "meh" or not even mentioned.
Discussions quickly turn theoretical and academic. "But you don't know when your green threads will switch, man, so I'll add yields in there for you." Yes, and then you also get to create a complete parallel universe of libraries.
Python is awesome not just because it is fun to write little hello-world examples in (so is Logo); it is awesome because it is easy to GetShitDone(TM) quickly. A big part of GetShitDone(TM) quickly is reusing libraries, not rewriting everything from scratch.
Using an exotic database for some reason -- great. Found a Python library to interact with it -- great. Oh but my framework is based on Deferreds and this one was written without Deferreds or this one returns Futures. Sorry, go write your own from scratch.
This has been the story of my life for 5+ years: searching for or re-writing Twisted versions of already existing libraries.
If they are going this route, at least just adopt Twisted and go with it. But no, they are 'standardizing' on something new. Had they done this in 2007, yeah, rock on, that would have made sense. They didn't. What saved and kept Python on the back end during the past 5 or so years was greenlet (eventlet and gevent). Guido is kicking all those people in the nuts and saying "no, we'll do Twisted now (with some changes)".
This is really a matter of taste. You should at least be aware that you're monkeypatching, and code and test accordingly. Many people have good results from monkeypatching, and even more have good results from calling well-written-and-tested libraries that monkeypatch.
But monkey patching turns python from explicit to implicit. It just doesn't feel pythonic to me, and I don't think I'm alone in this.
A big reason I use (and enjoy using) python is because it doesn't feel like a "bolted on" solution. All current concurrency options for python feel bolted on to me personally.
The python internals weren't designed for this, which is the reason they have to use monkey patching. It doesn't mean that you can't make something work, but it means the language/interpreter sure aren't going to help you make it work.
I just couldn't write something which depends on speed and concurrency in python right now, knowing there are solutions much better designed for the problem. Python holds a special place in my heart, but unicode and concurrency aren't so good right now.
However, this future async support and the unicode support in Python 3 very much excite me!
> It just doesn't feel pythonic to me, and I don't think I'm alone in this.
But how does a tangled mess of callback1, callback2, callback3 feel when all you want to do is a couple of DB reads and writes while processing a simple shopping cart? Is that Pythonic?
> The python internals weren't designed for this, which is the reason they have to use monkey patching.
So fix the internals. Here is a practical way people use Python every day; make that the default, don't resort to some academic or callback-based mechanism.
> I just couldn't write something which depends on speed and concurrency in python right now, knowing there are solutions much better designed for the problem.
See, that is what saddens me. gevent and eventlet do let you write reasonably good and concise concurrent IO code. Some have run large sites and deployments with them. I haven't found any major slowdowns or downsides that would make me switch yet. Because it is easy and simple to experiment, I'll always try Python first, even if later I might switch to Go or Erlang.
Seconded. I doubt many a sane person would use Python for IO-bound concurrency; they should definitely be looking at Rust (a little further in the future) and Erlang (currently the de facto standard for ease-of-use concurrency).
Perhaps the video makes the rationale clearer. E.g.:
Possible solution: "Standardizing gevent solves all its problems".
One of the responses: "I like to write clean code from scratch".
Another: "I really like clean interfaces".
So I'd prefer that the BDFL work with the gevent folks to get it cleaned up and integrated while adjusting it to expose a "clean interface".
Perhaps the whole thing will make more sense once Guido provides more detail, but I'm underwhelmed and confused.
Guido has been resisting the stackless stack slicing assembly technique since I first learned about Python and Stackless Python in 1999. That's obviously never going to change.
That reminds me of those famous Roman emperors: all is well and good as long as they make rational decisions, then eventually they turn senile or mad, and everyone realizes that dictatorship is not that much fun sometimes.
From a certain perspective it is a rational decision. Because the CPython API relies so heavily on the C stack, either some platform-specific assembly is required to slice up the C stack to implement green threads, or the entire CPython API would have to be redesigned to not keep the Python stack state on the C stack.
Way back in the day [1] the proposal for merging Stackless into mainline Python involved removing Python's stack state from the C stack. However there are complications with calling from C extensions back into Python that ultimately killed this approach.
After this Stackless evolved to be a much less modified fork of the Python codebase with a bit of platform specific assembly that performed "stack slicing". Basically when a coro starts, the contents of the stack pointer register are recorded, and when a coro wishes to switch, the slice of the stack from the recorded stack pointer value to the current stack pointer value is copied off onto the heap. The stack pointer is then adjusted back down to the saved value and another task can run in that same stack space, or a stack slice that was stored on the heap previously can be copied back onto the stack and the stack pointer adjusted so that the task resumes where it left off.
Then around 2005 the Stackless stack slicing assembly was ported into a CPython extension as part of py.lib. This was known as greenlet. Unfortunately all the original codespeak.net py.lib pages are 404 now, but here's a blog post from around that time that talks about it [2].
Finally the relevant parts of greenlet were extracted from py.lib into a standalone greenlet module, and eventlet, gevent, et cetera grew up around this packaging of the Stackless stack slicing code.
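To make the primitive concrete, this is roughly the greenlet API that eventlet and gevent sit on top of; each switch() is where a slice of the C stack gets copied to or restored from the heap:

from greenlet import greenlet

def task_a():
    print("A: start")
    gr_b.switch()        # A's stack slice is saved to the heap, B runs in its place
    print("A: resumed")  # A's slice was copied back and execution continues here

def task_b():
    print("B: running")
    gr_a.switch()        # switch back, restoring A exactly where it left off

gr_a = greenlet(task_a)
gr_b = greenlet(task_b)
gr_a.switch()            # prints: A: start / B: running / A: resumed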
So you see, using the Stackless strategy in mainline python would have either required breaking a bunch of existing C extensions and placing limitations on how C extensions could call back into Python, or custom low level stack slicing assembly that has to be maintained for each processor architecture. CPython does not contain any assembly, only portable C, so using greenlet in core would mean that CPython itself would become less portable.
Generators, on the other hand, get around the issue of CPython's dependence on the C stack by unwinding both the C and Python stack on yield. The C and Python stack state is lost, but a program counter state is kept so that the next time the generator is called, execution resumes in the middle of the function instead of the beginning.
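A toy illustration of that resumption, using nothing but plain generator machinery:

def countdown(n):
    while n:
        yield n    # hand control back to the caller; only this frame's position and locals are kept
        n -= 1     # execution resumes right here on the next next() call

gen = countdown(3)
print(next(gen), next(gen), next(gen))   # 3 2 1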
There are problems with this approach; the previous stack state is lost, so stack traces have less information in them; the entire call stack must be unwound back up to the main loop instead of a deeply nested call being able to switch without the callers being aware that the switch is happening; and special syntax (yield or yield from) must be explicitly used to call out a switch.
But at least generators don't require breaking changes to the CPython API or non-portable stack slicing assembly. So maybe now you can see why Guido prefers it.
Myself, I decided that the advantages of transparent stack switching and interoperability outweighed the disadvantages of relying on non-portable stack slicing assembly. However Guido just sees things in a different light, and I understand his perspective.
Thank you for providing all the background! I didn't know all the historical context.
> or custom low level stack slicing assembly that has to be maintained for each processor architecture.
Yeah, I would personally also say that writing the equivalent assembly for a handful of architectures is a smaller price to pay than re-writing / re-inventing the high-level library code that deals with concurrent IO.
I think Guido and a few others have decided to focus on some things rather than others, and the Python ecosystem will suffer in the long run because of these decisions. The ability to write beautiful concurrent IO code without a whole new async framework is better, even if it means breaking some C extensions, writing assembly, or not supporting exotic CPUs.
The programming world is not going to get less concurrent over time. Concurrency will spread more and more. There is already some momentum with gevent and eventlet, and it is something that Node.js doesn't have -- and I see Guido turning away from it and making something worse.
Just to check if I'm understanding the presentation right, will the implementation involve compiler magic to turn this:
@coroutine
def getresp():
    s = socket()
    yield from loop.sock_connect(s, host, port)
    yield from loop.sock_sendall(s, b'xyzzy')
    data = yield from loop.sock_recv(s, 100)
    # ...
into this, similar to how C# does it? (let's pretend multi-line lambdas exist for a minute)
def getresp():
    s = socket()
    loop.sock_connect(s, host, port).add_done_callback(lambda:
        loop.sock_sendall(s, b'xyzzy').add_done_callback(lambda:
            data = loop.sock_recv(s, 100).add_done_callback(lambda:
                # ...
            )
        )
    )
Or will the `yield from`s bubble up all the way to the event loop and avoid the need for that?
It is Eventlet and Gevent that have that magic. Here is how that looks:

def getresp():
    s = socket()
    s.connect((host, port))
    s.sendall(b'xyzzy')
    data = s.recv(100)
Compare that to any of the above. This is what is being thrown away in favor of the 'yield from' and @coroutine mess, coupled with a completely parallel set of IO libraries.
Well... there actually is a completely parallel set of IO libraries; it just happens that their interface can be identical to the existing blocking interfaces because of the greenlet stack-slicing magic... So it only appears as if there isn't a completely parallel set of IO libraries.
> So it only appears as if there isn't a completely parallel set of IO libraries.
Right on. That's the great part -- both a simple way to program and reusability of libraries.
So far, I see library ecosystem fragmentation as the biggest issue of all, and nobody seems to want to talk about it.
Academically, all the yields and coroutines look so cool; in practice, when you need 5 libraries to help with some task and now you have to re-write them -- not so cool.
I don't understand your question. From the implementation perspective, Python doesn't rewrite things into continuation-passing style, but the end result should be the same.
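To make that concrete: the yield froms forward the suspension up the call chain until it reaches whatever is driving the outermost generator (a Task/event loop in the proposal). A toy version with a hand-rolled driver and made-up names, no CPS rewriting anywhere:

def inner():
    result = yield "need-io"      # the request travels up through every yield from
    return result + 1

def outer():
    value = yield from inner()    # outer is suspended too; no callback needed
    return value * 10

gen = outer()
request = next(gen)               # "need-io" bubbles all the way out to the driver
try:
    gen.send(41)                  # pretend the IO completed and hand the result back in
except StopIteration as done:
    print(done.value)             # 420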
Twisted's inlineCallbacks singlehandedly turn Twisted in my mind from an abomination into something that is a joy to work with. In lieu of a more hands-off Erlang/Go approach, I am convinced that style is the only way to go.
A minor gripe (separate from all my other gripes, which other people have already talked about):
"...run code in another thread - sometimes there is no alternative - eg. getaddrinfo(), database connections"
Just thought I'd mention that async-supporting DNS libs do exist (e.g. gevent ships with c-ares), and in particular I've used async Postgres database connections in both C and gevent. The code to gevent-ise psycopg2 connections is about 10 or 15 lines, iirc.
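For the curious, the shim is roughly the usual wait-callback trick (essentially what the psycogreen project packages up); a sketch:

import psycopg2
from psycopg2 import extensions
from gevent.socket import wait_read, wait_write

def gevent_wait_callback(conn, timeout=None):
    # psycopg2 calls this whenever it would otherwise block; instead of
    # blocking the whole process, we park just this greenlet in gevent's hub
    while True:
        state = conn.poll()
        if state == extensions.POLL_OK:
            break
        elif state == extensions.POLL_READ:
            wait_read(conn.fileno(), timeout=timeout)
        elif state == extensions.POLL_WRITE:
            wait_write(conn.fileno(), timeout=timeout)
        else:
            raise psycopg2.OperationalError("bad poll state: %r" % state)

extensions.set_wait_callback(gevent_wait_callback)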
Reading that presentation, it seems that Python has way too many asynchronous I/O libraries/frameworks on its hands (not to be inflammatory, though -- I see it as an opportunity).
I really wonder why that is not the case in Ruby. I mean, there are some, but they're mostly obscure and there doesn't seem to be much interest around them. Especially not to the point that the project leader would take a stab at it.
For anyone curious, INTERCAL was originally a joke language which included a COMEFROM instruction that acted like GOTO in reverse: http://en.wikipedia.org/wiki/COMEFROM
I think it's a great idea. I haven't tried Twisted and having to install some 3rd party component to get it working doesn't sound tempting, however being supported by default, does.
Why does Guido think this is general purpose enough to add to Python but that the scientific features to make it competitive with R aren't? Is he envious of node.js?
The scientific community is not that interested in merging into the stdlib.
Also, the main point of this is to allow the different async libs to find some common ground, to stop the madness of having twisted-specific, tornado-specific, etc... The scientific community does not have this problem because everybody uses numpy.
The scientific features are only of interest to the (drum roll) scientific community.
Async interests potentially everybody, including the server guys, the backend guys AND the scientific community.
Not to mention that this is a few contained classes, whereas the scientific stuff is tons and tons of code to be included into Python, including lots of Fortran and C, that would more than triple the size of the standard library.
Lastly, node.js? Lots of languages have a good story for async, from C# and Scala, to Go and Rust...
I think it's because scientific features are not fundamental tools of expression, while he is working on a language that is trying to be the foundation (the most general layer) for more specific libraries and tools.