I think benchmarking asyncio with any type of CPU-bound task misses the point. Previously we were relying on hacks like monkeypatching with gevent, but now we've been presented with a clean, explicit and beautiful way to write massively parallel servers in Python.
But please do check this simple gist I found some time ago that helped me understand how powerful asyncio is:
> [..] It handles hundreds of thousands of concurrent connections [..]
Is it a single-process application, or do you use a few OS processes behind some load balancer?
I don't think you are an idiot; I just think you didn't search well enough. There is a reason there is no local file IO in asyncio.
Check these links for more info:
From what I understand the way libuv (what node.js uses) gets around OS limits is with a thread pool.
Also, this documentation might be helpful:
My experiences don't seem to mirror your frustration. I found asyncio quite useful for tinkering with a simple web scraper that hit multiple sites at once (each with a different response time) and munged all the data together into one data set.
Thanks for the write-up!
What's even more worrying is that it had almost run out of funds in July. Currently, it has $5,000 in the fund earmarked for Py3 support (http://pypy.org/py3donate.html)
somehow I thought that Google was supporting Pypy.
PS. We're closing in on the 3.3 release soon
PPS. Google donated a bit of money to PyPy when Guido was there, but "Google supporting PyPy" is definitely a bit of a stretch. It was years ago, though
Because it seems to me that PyPy is the future of Python, and it makes total sense for someone like Google or Dropbox to support it with a lot of money.
In January 2013 I joined Dropbox. I work on various Dropbox products and have 50% of my time for my Python work, no strings attached.
I don't have a 20% project per se, but I have Google's agreement that I can spend 50% of my time on Python, with no strings attached, so I call this my “50% project”.
I don't know if Google does or does not provide meaningful support to CPython development. Certainly it did, as you point out. But fijal's comment was about the present, not the past.
Now, nothing less than a wholesale change of the leadership can right the listing ship.
I'd suggest you to go to python-dev and post a concrete (and realistic!) proposal on how to improve whatever you think is broken.
Suggesting to change the leadership isn't really productive.
With due and high respect for the people who work hard on Python, I am happy to contribute, but I am not happy serving the leadership of the project as it currently stands. Python 3.x is a net value-destroying proposition, but the current leadership has spent so much energy exhorting the community to move, without success, that it has no choice now than to stand down.
1) Python 2.7 is still, 6 years later, much more popular than Python 3.x.
2) Python 3.x now has 5 versions, none of which has features strong enough to cause migration. Specifically, async is not as easy as the competition's, type annotations are an intellectual indulgence if they don't improve performance, and ease of use, a central pillar of the language's attractiveness to newcomers, is being eroded with unnecessary moves towards "seriousness".
3) It was a mistake by the Python authorities to break compatibility without bringing substantial new features. Breaking compatibility allows revolutionary features; instead we have breaking of compatibility with only incremental improvement.
4) Because the leadership refuses to acknowledge 1-3, its credibility is now questionable. It is an important related issue that people who point out these problems, as I do, face a strong barrage of criticism, with suspicions of orchestration.
5) With due respect for and gratitude to the inventors and contributors, the time for change has come. For Python to flourish, new leadership is needed.
I am happy to debate my strong opinions, but to suggest out of the blue that I don't know what I'm talking about is not interesting. I have been using Python for a decade. I know it very well. You may know it better, in which case I would like to hear your considered opinions on my points. The context, for avoidance of doubt, is my love of Python, and my strong desire for it to continue to do well.
"Go write a C extension" you tell me, "use something besides cPython" he says, "just use multiprocessing" I hear. Sure...but ffs, we've had multicore processors for almost TWO DECADES now.
One of my biggest, and apparently unchanging, problems with Python is the desire to keep things simple in the interpreter, to the disadvantage of the language. Sure "implementation for interpreters may vary" blah blah, but you have to target the bottom end in performance, and most widely installed, which is definitely cPython for both points.
IO-bound tasks in Python are a problem and I wish there were a clean solution. Python does not have a global event loop, so there is no easy place to hook in coroutines, callbacks, etc. So for a while we were stuck with one of the following:
1. Use threading or multiprocessing. This sucks for anything beyond a concurrency of like 2-8.
2. Use eventlet, gevent, or another event loop. The problem here is that you have to buy into it whole hog. No component of yours can be blocking, and that's hard to verify.
3. Write your own event loop. I've done this and find it to be the most understandable and easy to debug approach. This sucks because of the amount of effort it takes for something so fundamental (because networking is tricky).
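For a flavor of what option 3 involves, here is a toy sketch of a hand-rolled loop built on the stdlib selectors module. It uses a local socket pair instead of real network connections, and it only processes one round of events; a real loop would run select-and-dispatch forever and handle writes, timers, and errors too.

```python
import selectors
import socket

def run_echo_once():
    # A minimal hand-rolled event loop: register readable sockets with
    # a callback, block in select(), then dispatch whatever is ready.
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)

    received = []

    def on_readable(sock):
        received.append(sock.recv(1024))

    sel.register(b, selectors.EVENT_READ, on_readable)
    a.sendall(b"ping")

    # The "loop" body: wait for ready file objects, call their callbacks.
    for key, _events in sel.select(timeout=1.0):
        key.data(key.fileobj)

    sel.close()
    a.close()
    b.close()
    return received

print(run_echo_once())  # → [b'ping']
```

The painful part isn't this skeleton, it's everything around it: partial reads/writes, timeouts, cross-platform quirks, and protocol state machines, which is exactly why rolling your own is so much effort.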
Some people would be happy if Python got better at solving IO-only bound tasks. I guess that's where this feature comes in. I haven't played with 3.5 yet because I am mostly stuck on 2.7 for reasons. However, looking at it, I feel like there should have been more of a separation between blocking and non-blocking code here. Something along the lines of an async function not being able to call a blocking function.
Re: CPU and IO bound tasks: I know of no great framework for this besides threading (not the kind in Python + GIL, but real threading). I usually just side-step this problem by separating tasks that are both IO and CPU bound into smaller tasks that are only CPU or only IO bound. Thankfully, that's generally pretty easy to do.
When your competition is NodeJS there is nothing really left to say.
Python is really decades behind in this area and no async hack is going to improve matters. I think it's safe to say that this is an obvious example of how Python is not really a general-purpose language (disregarding the fact that it started out as a Christmas hack and really has no design behind it). It should be used for scripting and quick prototyping, but architecting anything substantial on it is not advisable.
Everyone who works with Python who understands its limitations has usually found reasonable workarounds for the things you would expect the language to bend to, much like other, similar languages that occupy that niche.
By the same token, we can claim that C++'s horrendous and undecidable syntax and footguns make it a language inadequate for "general purpose" programming; and for many cases you would be right!
But there is no such thing as a language that can do everything well. The other big contender for "general purpose" is the JVM languages, which also suffer from innumerable issues such as slow VM startup time, long VM warmup, enormous RAM usage, lack of very reasonable primitives such as unsigned integers, etc. etc.
Those are the reasons why very large systems either use a hybrid of languages and runtimes for different tasks, or use a monolithic solution and make the adequate compromises.
No one will disagree that the multithreading story would be easier in Python if it chucked the GIL out of the way, but then again, if you're doing that you might as well start a project in Elixir, Julia, Rust, or any number of modern languages that don't suffer from the cruft, but easily will need a decade to catch up in terms of library support and tooling.
If you're in Toronto, Canada in November you should come to my talk at Pycon CA.
OpenStack is a rather large Python codebase that powers some rather large public and private clouds, from Rackspace to HP and even CERN. I'll be discussing the various technologies and processes that allow us to do that in Python.
It's also one of the more popular platforms for scientific computing and data analysis. There are huge code-bases designed in Python that work quite well and continue to grow in adoption and usefulness. I don't see why anyone would advise against writing a substantial system in Python.
Yet I believe your premise is incorrect in the first place. Why has Golang, by most commentators' accounts, "got concurrency right"? Is that true? Tell me. If it is, why can't Python get concurrency right also? If it's not true, why is there an official solution being criticised before our eyes?
In this case, I will quote you: "I suspect you don't really know what you are talking about...".
The author is right. If you write a coroutine, any code that uses it must also be a coroutine. This is pretty annoying when you're trying to test something manually. It bubbles up this way until you eventually hit the event loop.
If you're trying to debug a coroutine in the interactive shell you've got to do something like this:
loop = asyncio.get_event_loop()
# Blocking call which returns when the hello_world() coroutine is done
loop.run_until_complete(hello_world())
It takes some getting used to - I think Python people are likely to have more trouble with this precisely because Python is a very clear, explicit, and mostly imperative language - you can read Python code as a sequence of instructions and that will be pretty much the way it gets executed, which makes understanding programs very easy. Async is simply a less intuitive way to program, so to adopt it you have to be sure it's worth the hassle of giving up easily understood code.
At least we have pypy. The community should really be rallying behind that project.
You're not supposed to use async/await without a framework like asyncio or tornado.
One way to provide a better UX is to merge asyncio into Python on a deeper level, but this is something that many people won't like.
asyncio doesn't provide any nio abstractions for files because (a) it's not really needed, and (b) there is no easy way to implement it.
(a) basically you shouldn't expect your code to block on disk io. But even if it does block for a very short amount of time it's probably fine.
(b) one way to implement nio for files is to use a threadpool. Maybe we'll add this in later versions of asyncio, but it will require writing some pretty low-level code in C (and reimplementing big chunks of asyncio in C too). Another option is to use modern APIs like aio on Linux, but as far as I know almost nobody uses it for real.
Bottom line -- you don't need coroutines or asyncio to do file IO. What you need asyncio (and frameworks like aiohttp) for is doing network programming in Python efficiently.
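To make the threadpool option concrete: you can already push a blocking file read onto a thread pool today with the event loop's run_in_executor, so the loop stays free to service sockets while the read happens. A minimal sketch (the temp file is just for the demo):

```python
import asyncio
import tempfile

async def read_file(loop, path):
    # Offload the blocking open/read to the loop's default thread pool
    # (executor=None). The coroutine suspends until the read finishes,
    # but the event loop keeps running other tasks in the meantime.
    def blocking_read():
        with open(path, "rb") as f:
            return f.read()
    return await loop.run_in_executor(None, blocking_read)

# Demo against a throwaway temp file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello from the thread pool")

loop = asyncio.new_event_loop()
data = loop.run_until_complete(read_file(loop, tmp.name))
loop.close()
print(data)  # → b'hello from the thread pool'
```

This is essentially what libuv does for node.js under the hood, just done by hand at the application level.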
No, it's very often not fine. Magnetic disks, still the norm for many, and definitely with large storage, often go as low as 5KB/s for random access (or even sequential access to very fragmented files). Reading a 1MB file can easily take 5-10 seconds in some setups - which is not acceptable for any interactive service. It's not fine for a web server to not service any requests for 5 seconds.
> Another option is to use modern APIs like aio in Linux, but as far as I know almost nobody uses it for real.
Anyone I know who tried came back screaming. There is no way to do an async file open, for example - which means that if you rely on aio, you can block for 10 minutes waiting for an NFS or SMB mounted file to open.
The only sane, portable thing to do for Unix/Posix is use a threadpool for async file io - or just use something like libuv which already abstracted async operations this way.
uh?! In what world are you living in? Blocking the main thread of your app on disk I/O makes for a poor user experience.
Disk I/O is a big deal for a whole range of applications. I would even say almost all apps have to deal with disk I/O at some point, as opposed to network I/O.
If Python 3 had good concurrency and optimization (something neither version has right now), I'd consider using it, but is there an already existing reason that I'm just not seeing?
The advantage of Python 2 is an evaporating pool of legacy Python2-only libraries and legacy Python2-only programmers. Python2 is dying at the rate that pool is evaporating.
Python 3 has the advantage of having a future, plus all the advantages of Python 2 and more with the exception of that shrinking pool. I have already replaced all my Python 2 code and won't intentionally write more code that I know will have to be replaced. I'd like anything I build now to have the greatest likelihood of still being something I can continue building on in the future. Python 3 is more likely to fit that requirement than Python 2.
I'm teaching my kids to program. Python 3 is a good choice for doing so. Python 2 would set them up to be teenage maintenance programmers. Doing that to them would be child abuse. I upgrade my own skills, too, so wouldn't choosing Python 2 for my own upgrades be a form of self-abuse?
More fancy answer -
There's plenty of nice improvements to the language (better handling of iterators, tidied up std lib, modern objects, etc), though it's probably worth it for unicode support alone - trying to convince python2 to properly handle international text is just a huge pain (also - no, you can't just replace all accented characters with non-accented ones if you want to preserve the correct meaning of text and not piss off all your customers by misspelling their names).
Yes you can eventually solve all problems with python2. But why bother when you can just use the latest version, there's basically no cost for new projects.
This idea that Python 3 is cost-free needs to be expunged. For large classes of users, Python 3 is not possible even if they wanted it (which they don't). Stop this erroneous propaganda. Unicode is nice if you're a web guy, and if you're a web guy, why are you using Python anyway? Unicode is completely irrelevant for everybody else and is most definitely not a core reason to move to Python 3. You're living in cloud cuckoo land on your 3.x magic mushroom trip.
Earth to web jockeys. Python's hardcore is numerical computing, and that hardcore is not on Python 3, and is not moving anytime soon, and certainly not for the dubious benefit of unicode. Moreover the web stack is much less important to Python than is numerical and scientific computing, for the simple reason that while the former has multiple better competitors in the form of golang, JS et al, the latter does not.
[sidebar: since when does ascii not cater for accents?? I am bilingual french / english and I have never had a problem typing french accents in ascii? You're creating misleading propaganda again. Once again, EatHeart, I quote you: "I suspect you don't really know what you are talking about...", or worse, you have an agenda to mislead.]
you're from the US, right? there's about 6 billion people for whom ascii isn't enough. some of them program in python.
Yes. I'd probably accept a slightly worse language than python 2 but with reasonable unicode support. Fortunately, there's no need to.
I never had any problem with 3.x conceptually, I was just slow/lazy and because I could see how each new 3.x release got better and better there wasn't much rush. I always thought I would move to 3.x one day. And 2.7 was a nice place to wait for awhile across distro upgrades.
I haven't bothered with any migration of old code yet - just starting new stuff in 3.4. I've got one project that aims to be 2.7/3.4+ bilingual and it's going well so far.
Now that I've tried it I do like 3.4 - four releases of small feature improvements do add up eventually. I think Python's strategy of keeping 2.7 around for a long time to let stuff migrate when ready is a good one. Each new 3.x release also makes migration from 2.7 a bit easier. Not worrying about supporting 3.0, 3.1 or 3.2 (and 3.3 even?) also makes migration easier too.
I'd recommend moving to 3 for new projects. Migrating existing projects is more of a case by case deal where it may or may not be worth it yet or even ever.
Despite what people like to say on forums, pretty much every library I'd want to use works fine. And it has for a long time.
When I've tried to go back and use Py2, I keep getting annoyed by simple things.
The unicode is a pita. Not having keyword-only arguments is annoying.
Not having "yield from" means more dancing.
Py2 forces me to make an lru_cache decorator manually, instead of it just working.
From my perspective, sure, I _could_ use Python 2.x..
But Python3 works better and is easier.
The question isn't "Why Python3", it's "Why the fsck wouldn't you?"
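For the curious, the keyword-only-argument and lru_cache points above look like this in practice (toy illustration, not from any real codebase):

```python
from functools import lru_cache

# Keyword-only arguments (Py3 only): everything after the bare * must
# be passed by name, so call sites stay readable and hard to misuse.
def connect(host, *, timeout=10, retries=3):
    return (host, timeout, retries)

# lru_cache: memoization for free, no hand-rolled decorator needed.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(connect("example.com", timeout=5))  # → ('example.com', 5, 3)
print(fib(100))  # → 354224848179261915075
```

Without the cache, that naive fib(100) would take longer than the heat death of your patience; with it, it's instant. In Py2 you write and maintain that decorator yourself.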
Unicode support is vastly improved.
Finally there are specific exceptions like FileNotFoundError, so you don't have to catch OSError and then check errno. I got so tired of that.
No more old-style classes, which means you know you can use super(), which is now less annoying since you don't need to “remind” Python which class you're in.
Lots of other small changes, most of which do lead to improvements, however minuscule. I would argue that unless you must use a Python2-only library, just go with 3.
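The errno dance mentioned above, before and after (the config path is made up for illustration):

```python
import errno

def read_config_old_style(path):
    # The old dance: catch the broad IOError/OSError, then inspect
    # errno to figure out which failure you actually got.
    try:
        with open(path) as f:
            return f.read()
    except (IOError, OSError) as e:
        if e.errno == errno.ENOENT:
            return None
        raise

def read_config_py3_style(path):
    # Python 3.3+: catch exactly the case you mean, nothing else.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None

print(read_config_py3_style("/no/such/file.cfg"))  # → None
```

The old version also silently re-raises permission errors and such, which is what you want - but you had to remember to write the `raise`, whereas the Py3 version gets it right by construction.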
The syntax is really nicer IMHO. Unicode strings are really nice to have too. Plus Python 2.x will be EOL'd soon. I just don't see the point of working with it anymore.
Why start a new project in a language that will have support dropped in ~4 years?
If it isn't going to be around in less than 5 years, I'm not going to use it for new projects which have lifespans measured in decades. I've got projects still in production, almost unchanged [only changes were to update business rules], for 4 years. It moves millions of dollars a year. I sure as hell don't want to worry about OS support for it another 4 years.
In the end what pushed me over were the improvements to the standard library multiprocessing module only available in Python 3.
If Python were a listed company, the CEO would have been replaced long ago. I'm tired of watching my favourite language flail around like this. Will Continuum Analytics or Enthought please fork 2.7?
The fundamental limitation here isn't unique to Python. Multiple tasks that are CPU bound simply cannot be managed using cooperative multitasking, and that's all asyncio is: cooperative multitasking, with a friendlier syntax than the old asyncore module.
Cooperative multitasking using async I/O works great when you have lots of tasks running that are I/O bound--they spend almost all of their time waiting for network or disk reads/writes. But as soon as you start piling up tasks that require CPU, you need either multiple threads or multiple processes. Python asyncio was never intended to handle that use case.
Multiple threads is where Python has a unique limitation: the GIL prevents multiple threads from running on multiple cores, so if you want multiple CPU bound tasks to use multiple cores, you need to fork each one into a separate process.
It also doesn't help that the multiprocessing module is, IMO, a huge ball of cruft that's overkill for a lot of multitasking use cases. But there are simpler ways to do it, at least on Unix systems, because the fork system call is very lightweight.
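A bare-bones version of that Unix fork approach looks something like this (a sketch, not the plib-io API; Unix-only, since it uses os.fork):

```python
import os

def run_in_child(task):
    # Fork a worker, let it run the task, and report success back to
    # the parent via the child's exit status. No multiprocessing
    # machinery, no pickling - on Unix, fork is cheap.
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, then exit with a status code.
        try:
            task()
            os._exit(0)
        except Exception:
            os._exit(1)
    # Parent process: wait for the child and check how it exited.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0

ok = run_in_child(lambda: sum(range(1000000)))
print(ok)  # → True
```

The trade-off versus multiprocessing is that you get no result channel for free - here the child can only signal pass/fail through its exit code; anything richer needs a pipe or socket between the processes.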
Shameless plug: I wrote a library for this some time ago, which I use whenever I have a Python project that needs to fork worker processes; it's the "comm" sub-package in my plib-io Python distribution (this is the Python 3 version):
What we need is a concentrated focus on the only two areas where Python beats the competition hands down: numerical computing and newbie accessibility. The first gets a token "@" operator (great, thanks); the second is moving backwards with 3.x. Asyncio, type annotations, and unicode are answers to questions nobody is asking (except by those who refuse to use the right tool for the right job).
Web programming in general is not CPU intensive, so async I/O (which Python already does well, as you point out--except for the API issue, see below) works fine for web programming.
> if you insist on python, we have plenty of options to do async i/o already
But all of their APIs suck, at least IMO. Supporting async I/O with built-in language syntax, to make code easier to write and more readable, seems like a good idea to me.
Multitasking is the most important issue that Python faces today (and maybe PyPy is the answer).
Could you tell me what (and how long) your experience is?
My answers about your complaints are basically:
1. Solving threading is (relatively) easy if you just give up backwards compatibility; not the 2.x -> 3.x compatibility, which is comparatively trivial, but in a major way: breaking every single extension library and the vast majority of Python code (or slowing it down unbearably). You might be happy, but the rest of the Python users (basically, the reason you actually use Python) won't be.
2. Assuming they agree with you about their failure (I don't, FWIW), someone has to step up and offer an alternative. Who is that, what are they doing these days, and why do you think they will succeed where Guido et al "failed"?
3. Perl at 28 years is not much older than Python at 24 years (relatively speaking), but has been sliding into obscurity for a long time now, whereas Python is flourishing. The Python 3 transition is actually happening as planned (IIRC, there was an expected 5 year period just for feature and speed parity!). Perl 6 is esoteric, Perl 5 is aging and dying. I think that for a living, popular language, the Python team is doing a commendable job, even if they do not address a specific issue (that many people care about, but would actually not make that much of a difference in practice if we are to learn from other languages)
It just needs further dev work and acceptance into py3
Definitely not "solved without breaking backwards compat".
Before answering your questions 1 thru 3, some context. It is my strong belief that the authorities are constantly looking at golang and JS as their competitors, in other words, the web world, whereas the real hardcore advantage of Python is in science and numerical computing. As evidence, witness numerous Python books which advise new users to hit the Continuum Analytics or Enthought sites for their full-stack Python installations, even texts which are not about numerics.
On your questions:
1) I don't care about threading. I care about pushing as many compute bits through the Xeon as I can in a given number of seconds (using Numpy). But as I am a data scientist, I need the REPL. C is out. Why can I still not do this easily? Multiprocessing is there, sure, but it's been unchanged for years, while Cuda, OpenCL etc. are far too hard for a guy like me whose intellectual bandwidth is occupied with the domain, not the CS. Isn't that what Python was supposed to be about? Getting stuff done? Why isn't Python vectoring my data through the CPU and GPU yet, 15 years after Numeric was first introduced?
2) Continuum Analytics is doing an awesome job and I don't see why they, or Enthought, couldn't take up the mantle, 10gen/Datastax style to use a database analogy. They really know their customers, and the Continuum stack delivers real new value every 6 months, and not only for a scientific audience. More generally, real users in real domains should be driving the project.
3) I am less concerned about Python 3 happening as planned than I am about the focus of the project. Type annotations? This is an intellectual indulgence if it does not increase performance. Asyncio? We've had async libraries for years! Even when I started Python we had async libraries (not as good, but they were there). How is async something fantastic and new? It's nothing but polishing an existing capability a little bit further. Unicode? Fine. But again, web-focused. Nobody else cares. xrange laziness? Okay. Leaves me cold. print()? No CS benefit, but a huge marketing loss, as you can't go out to newbies anymore and say "hey, check this out... Python hello world?"
>>> print "hello world"
So. What is Python. A serious language? NO. A wonderfully malleable, not too serious, friendly language, into which you can insert some real hardcore stuff (Numpy, ML, website parsing, database transformations, game scripting, image processing....the list is endless) really easily? Yes.
Where is Python genuinely way ahead of everyone else? Only on numerical computing.
It strikes me that Guido and co are embarrassed by their weekend hack of 1.x and 2.x, when that is precisely what the user base loves about it. Their attempt to make Python serious, is killing Python's original spirit. There is nothing wrong with 2.7. Nobody wants Python to morph into Java.
So, am I a Luddite wanting 2.7 to live forever? No. What I want is vectorization plus DAG-like workflows. These are the most important pieces of computer science that actually dovetail with real-world use cases today. Yes, async is cool, but golang now owns that space.

What I'd really like is for Python to give us a good framework for the Big Data world which is in almost everybody's use case now, and that means: Python needs to talk multiprocessor, Python needs to talk GPU, Python needs to talk cluster, and Python should long ago have been addressing this directly. Python needs to "go vectorized". That should be the project's obsession. SIMD, in a word, where the parallel granularity can go from GPU kernels, to CPU, to multi-machine clusters, with DAG-like workflows (with possible recursion) built in.

This is not a nice-to-have capability anymore. It is what the next wildly popular mainstream programming language will have, built into the language. The signal from everything we're seeing added to 3.x is NOT this. It's web-like stuff. That battle is over. JS won.
Let's not gift the opportunity of huge data to Java (Spark) and Cuda, or a new language that will see the future better than us. It should be Python!
There you go. My view.
No, that's not where it is. It's in being a user friendly language for people who need to get numerical work done, and there's a huge difference. IPython/Jupyter did not happen in any other language, not even remotely (the closest thing I've seen is some Lua graphic repl that didn't go far).
> Nobody wants Python to morph into Java.
Python is not morphing into Java, far from it. 3.x represents a cleanup of the language (str vs. unicode, new vs. old style classes, a lot of other stuff). I do not agree with all of it (I too disagree with demoting print from statement to function), and there is other stuff I would have done - but most of it was desperately needed.
> What I want is vectorization plus DAG-like workflows. These are the most important pieces of computer science that actually dovetail with real world use cases, today
For a tiny, tiny part of the users. And they already have e.g. Theano, and a few other tools, that give some solutions.
> What I'd really like is for Python to give us a good framework for the Big Data world which is in almost everybody's use case now,
Excuse me, but I have to disagree: only a tiny minority (perhaps 1 percent) of Python users cares about big data. Most definitely not "everybody's use case". And the minority which actually cares about big data should not get anywhere close to Python - the runtime overhead is prohibitive.
> This is not a nice-to-have capability anymore. It is what the next wildly popular mainstream programming language will have, built into the language.
Interesting. What language has this today? I know not of a single one, mainstream or niche. I don't believe that you are right on this.
> Let's not gift the opportunity of huge data to Java (Spark) and Cuda, or a new language that will see the future better than us. It should be Python!
Check out Nim + Nimborg. You'll get the best of Python and Nim, and Nimborg would let you do the transition smoothly.
Your idea of Python is not Guido's idea of Python, and as far as I know does not have popular support anywhere. If I were feeling so strongly about my favorite language being so misguided, I would look for a new language. The momentum of existing development direction is essentially unstoppable, whether it is right or wrong.
If you have CPU-bound tasks, using non-blocking I/O (which is what asyncio is about) is not going to help. You can use multiple processes (maybe via concurrent.futures) for that case.
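A minimal sketch of that concurrent.futures route, with a toy stand-in for the CPU-bound work:

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # Stand-in for real CPU-bound work. Each call runs in a separate
    # worker process, so the parent's GIL doesn't serialize them and
    # they can actually use multiple cores.
    return sum(i * i for i in range(n))

# Note: on platforms that spawn rather than fork worker processes
# (e.g. Windows), this part belongs under an
# `if __name__ == "__main__":` guard.
with ProcessPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(cpu_heavy, [10, 100, 1000]))

print(results)  # → [285, 328350, 332833500]
```

The API mirrors the built-in map(), so swapping between ThreadPoolExecutor (for I/O-bound work) and ProcessPoolExecutor (for CPU-bound work) is usually a one-line change.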
If you make one mistake in your thinking, it's treating Python as more than a Christmas hack (which is what it started out as). It's not a serious language; it has no design behind it and is decades behind the state of the art.
Use it for throw-away scripts and such is what I say, it's an area where it shines.
The moment you want to _architect_ something substantial, reach for something else cause chewing gum and duct tape is not going to do it.
I don't think concurrency is great in Python, but you're living in the wrong world if you don't think people are using the language for real things.
Unlike Perl, there is nothing immediate in Python to warn you of the pit you're digging for yourself, especially if you're a newcomer to programming (as a substantial chunk of the Python community is). The troubles begin when you make the colossal mistake of trying to use Python for more than it can do, and try to design and architect actual systems with it - but I digress.
On the other point you made, Google, Facebook, Twitter etc are using C++ too.
Does that make C++ a good language for architecting substantial systems? Show me (1) system of substantial size implemented in C++ that hasn't turned out to be a DISASTER in anything _but_ performance. When your browser can be taken over a million ways to Sunday _just because_ it's implemented in C++, do you think Google, Facebook, Twitter etc. give a damn if _you_ don't? Same argument for PHP and Facebook.
Would you say that these companies are "pretty serious" about web, infrastructure and platforms?
There are good languages out there, suited to architecting systems that are maintainable, secure and evolve gracefully.
You won't find them simply by looking at what Google, Facebook and Twitter are using. You will need to try harder.
Actually I think you're right. I think it's time to graduate to something serious. My mistake was to think that Python could be that serious language. 3.x, by trying to be that, is horribly compromising the basics of what Python is all about, namely accessibility, friendliness, discovery, malleability. Not production.
There's nothing new that I can see under the hood.
Example here: http://dask.pydata.org/en/latest/imperative.html
If you like it, can you please blog about it (and post on hn)? Needs exposure outside the python data community.