Python 3.5 and Multitasking (brianschrader.com)
84 points by sonicrocketman on Sept 24, 2015 | 99 comments



I've been using the new async/await syntax to write a beautiful asynchronous websockets server, and I fell in love with it. It handles hundreds of thousands of concurrent connections (on top of aiohttp), and the code is so much cleaner than it would be with, e.g., NodeJS with Express and Promises. It reads like serial code.

I think benchmarking asyncio with any kind of CPU-bound task misses the point. Previously we were relying on hacks like monkeypatching with gevent, but now we've been given a clean, explicit, and beautiful way to write massively parallel servers in Python.
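
To give a flavor, here's a minimal echo-server sketch in the same spirit - hedged, since the handler and route names are made up and aiohttp's websocket API has shifted a bit between versions:

  from aiohttp import web

  async def ws_echo(request):
      ws = web.WebSocketResponse()
      await ws.prepare(request)
      # each connection is a plain coroutine, so it reads like serial code
      async for msg in ws:
          await ws.send_str(msg.data)
      return ws

  app = web.Application()
  app.router.add_get('/ws', ws_echo)
  web.run_app(app)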


Anything you can share? Some good examples of networking code with async/await would be incredibly helpful. The documentation covers the primitives, but I have a hard time putting it all together.


Unfortunately the code is not open source - I'll try to open-source parts of it in the future.

But please do check this simple gist I found some time ago that helped me understand how powerful asyncio is:

https://gist.github.com/gdamjan/d7333a4d9069af96fa4d


I'm actually tearing up here. That is... Beautiful.


FYI: ES2016 JavaScript also supports async/await, and you can use it today with Babel.


No it doesn't. It just hit stage 3 in TC39 yesterday, meaning browsers should be considering implementing it experimentally in order to solicit feedback from developers. When it progresses beyond that you can start calling it ES2016.


I stand corrected; somehow I was under the impression that they were part of ES7 (ES2016). So does this mean there's still a chance they might not make it?


It's going to make it in some form, possibly even its current form, but it isn't formally part of the spec yet.


Thanks for writing this!

> [..] It handles hundreds of thousands of concurrent connections [..]

Is it a single-process application, or do you use a few OS processes behind some load balancer?


Yes, it is a single-process application, but to utilize all cores I use Gunicorn to spawn multiple processes.

http://aiohttp.readthedocs.org/en/stable/gunicorn.html
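
The launch then looks something like this (module and app names are placeholders; the worker class is the one from the aiohttp docs above):

  gunicorn myapp:app --workers 4 --worker-class aiohttp.worker.GunicornWebWorker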


>I did this for two reasons, the first being that I cannot, for the life of me, figure out how to use asyncio to do local file IO and not a network request, but maybe I'm just an idiot.

I don't think you are an idiot; I just think you didn't search well enough. There is a reason there is no local file I/O in asyncio.

Check these links for more info:

* https://stackoverflow.com/questions/87892/what-is-the-status...

* http://blog.libtorrent.org/2012/10/asynchronous-disk-io/

From what I understand, the way libuv (which node.js uses) gets around OS limits is with a thread pool.

Also, this documentation might be helpful:

* https://docs.python.org/3.5/library/asyncio-dev.html#handle-...

My experiences don't seem to mirror your frustration. I found asyncio quite useful for tinkering with a simple web scraper that hit multiple sites at once (each with a different response time) and munged all the data together into one data set.
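
The core of that scraper was roughly this shape - a hedged sketch, with made-up URLs:

  import asyncio
  import aiohttp

  async def fetch(session, url):
      async with session.get(url) as resp:
          return await resp.text()

  async def scrape(urls):
      async with aiohttp.ClientSession() as session:
          # all requests run concurrently; gather preserves input order
          return await asyncio.gather(*(fetch(session, u) for u in urls))

  loop = asyncio.get_event_loop()
  pages = loop.run_until_complete(scrape(['http://example.com', 'http://example.org']))
  loop.close()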

Thanks for the write-up!


Have you used a database in a non-blocking way with asyncio? I'm thinking of the way psycogreen or node-postgres works.


Please check aiopg [1]. There's also an interesting list of packages for asyncio [2].

[1] https://aiopg.readthedocs.org/en/stable/

[2] http://asyncio.org/
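
Roughly what aiopg usage looks like - a hedged sketch, with a placeholder DSN:

  import asyncio
  import aiopg

  async def main():
      pool = await aiopg.create_pool('dbname=test user=postgres')
      async with pool.acquire() as conn:
          async with conn.cursor() as cur:
              await cur.execute('SELECT 1')
              print(await cur.fetchone())

  loop = asyncio.get_event_loop()
  loop.run_until_complete(main())
  loop.close()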


Thanks for the suggestions!


Maybe it's worth noting, maybe not, but this extremely trivial example (e.g. serial.py) executes 20x faster just by using PyPy, which speeds up serial execution. That's more than you would get from any sort of multiprocessing/threading shenanigans, just by using an optimizing VM. (Granted, this example is very simple, but maybe addressing basic performance problems should come first.)


PyPy's Python 3 support has not been updated in a while (its compatibility is with Python 3.2.5).

What's even more worrying is that it had almost run out of funds in July. Currently it has $5,000 in the fund earmarked for Py3 support (http://pypy.org/py3donate.html).

Somehow I thought that Google was supporting PyPy.


PyPy is being updated very regularly, with 3 releases a year on average. Most of the work, however, focuses on things our users really want - stability, performance, C extension support, warmup speed, memory consumption - you know, the mundane stuff. If someone is willing to put effort into supporting more Python 3, that's great! We would welcome the contributions.

PS. We're closing in on a 3.3 release soon.

PPS. Google donated a bit of money to PyPy when Guido was there; "Google supporting PyPy" is definitely a bit of a stretch. It was years ago, though.


Google not supporting it in a big way is very strange. Is there a political conflict between CPython and PyPy?

Because it seems to me that PyPy is the future of Python, and it makes total sense for someone like Google or Dropbox to support it with a lot of money.


It makes sense != it makes money. Google does not support CPython either in any meaningful way (that is, with someone hired full-time to work on CPython, for example).


Huh?

https://www.python.org/~guido/

> In January 2013 I joined Dropbox. I work on various Dropbox products and have 50% for my Python work, no strings attached.

http://www.linuxjournal.com/magazine/interview-guido-van-ros...

> I don't have a 20% project per se, but I have Google's agreement that I can spend 50% of my time on Python, with no strings attached, so I call this my “50% project”.


Dropbox isn't Google.

I don't know if Google does or does not provide meaningful support to CPython development. Certainly it did, as you point out. But fijal's comment was about the present, not the past.


I really wanted to take a look at how PyPy handles this. Now I want to even more. Thanks!


If you are blocking on I/O (such as reads/writes to disk or over the network), PyPy's execution speed will not help, while threading will.


Obviously, but the time spent blocking on I/O should be the same across all strategies, plus the CPU overhead of doing it.


Concurrency in Python is kind of a disaster, in my opinion. There are a lot of different options, but they all seem to have significant drawbacks, and not just limited to ease of use. I know concurrency is a hard problem, but I wish there were one really good, straightforward solution instead of 3 or 4 different half-baked, convoluted solutions (threading, multiprocessing, asyncio, subprocess in the stdlib, plus Twisted, gevent, pulsar, etc. as third-party).


From what I see, the community keeps changing its mind every few years. We need to pick a multitasking method and really embrace it. If we want to keep the GIL, then we need to go whole hog on some other approach.


A community which changes its mind every few years is a community which does not have effective leadership. Python's leadership is ineffective because 3.x has erased its credibility. You have at most one opportunity to change a language in mid-course. That opportunity has already been used (wasted?) in the case of Python, and while it could have addressed the elephant in the room, multicore, it did not. Instead it indulged the wishes of its powers-that-be to tinker at the edges to make Python more "serious", in the process adding nothing that anybody needed, while reducing simplicity and accessibility, and, the criminal sin, breaking backward compatibility with no corresponding forward killer feature.

Now, nothing less than a wholesale change of the leadership can right the listing ship.


> Now, nothing less than a wholesale change of the leadership can right the listing ship.

I'd suggest you go to python-dev and post a concrete (and realistic!) proposal on how to improve whatever you think is broken.

Suggesting to change the leadership isn't really productive.


Leadership change is the fundamental basis of our societies, from the central tenets of democracy, to the structure of our economies where positions of responsibility require competence, at the risk of rapid replacement. It is, on the contrary, very productive to make a case in a public forum (this one) for an ineffective leadership to be changed, if it will help to alter the course of an endeavour which has lost its way, an endeavour, by the way, on which millions of programmers' livelihoods depend.

With due and high respect for the people who work hard on Python, I am happy to contribute, but I am not happy serving the leadership of the project as it currently stands. Python 3.x is a net value-destroying proposition, but the current leadership has spent so much energy exhorting the community to move, without success, that it has no choice now than to stand down.


I suspect you don't really know what you are talking about...


What is the point of this comment? How am I incorrect? My points are:

1) Python 2.7 is still, 6 years later, much more popular than Python 3.x.

2) Python 3.x now has 5 versions, none of which has features strong enough to cause migration. Specifically, async is not as easy as the competition's, type annotations are an intellectual indulgence if they don't improve performance, and ease of use, a central pillar of the language's attractiveness to newcomers, is being eroded with unnecessary moves towards "seriousness".

3) It was a mistake by the Python authorities to break compatibility without bringing substantial new features. Breaking compatibility allows revolutionary features. Instead we have breaking of compatibility with only incremental improvement.

4) Because the leadership refuses to acknowledge 1-3, its credibility is now questionable. It is an important related issue that people who point out these problems, as I do, face a strong barrage of criticism, with suspicions of orchestration.

5) With due respect for and gratitude to the inventors and contributors, the time for change has come. For Python to flourish, new leadership is needed.

I am happy to debate my strong opinions, but to suggest out of the blue that I don't know what I'm talking about is not interesting. I have been using Python for a decade. I know it very well. You may know it better, in which case I would like to hear your considered opinions on my points. The context, for avoidance of doubt, is my love of Python, and my strong desire for it to continue to do well.


I personally don't care too much about how Python 2 is still more popular than Python 3; it's like saying Java 1.5 is more popular than Java 8. It just doesn't matter to people who use Python 3. After working on a Python 3 code base for almost 2 years now, I wouldn't go back to Python 2 if they paid me, and there's one reason for that: exception handling. It's something that's just broken in Python 2. It's a huge change, and your 3) is invalid because of it (and unicode, but you probably don't care if you don't consider that a worthwhile change). Your 4) thus doesn't follow. Re your point 5), Python's current leadership should be commended for creating a language so good that people have no reason to switch from it. Python 3 is even better, so just start using it for new projects.


FWIW nothing stops you from forking the project and being a leader of it. Go ahead, show the better way.


We see people hyping NodeJS all the time, mainly because it can handle millions of concurrently open connections. Now we get a much cleaner (IMHO) way to achieve the same thing in Python (of course Python is slower) - e.g., see aiohttp as a websockets server. I would not call it a disaster.


It's great that it works for you for this one particular type of application, but the whole world isn't I/O bound. Some of us just want regular, plain old concurrency. As this shows, multiprocessing is still the only, insane, solution right now. I just want to be able to make a thread and do stuff in it in parallel to other threads, like I can in most other languages.

"Go write a C extension" you tell me, "use something besides cPython" he says, "just use multiprocessing" I hear. Sure...but ffs, we've had multicore processors for almost TWO DECADES now.

One of my biggest, and apparently unchanging, problems with Python is the desire to keep things simple in the interpreter, to the disadvantage of the language. Sure, "implementation for interpreters may vary" blah blah, but you have to target the bottom end in performance, and the most widely installed, which is definitely CPython on both counts.


I think these are two distinct use cases. NodeJS actually does not (AFAIK) handle one of them. Basically, you have IO bound tasks and CPU bound tasks (and a mix of both which is really nasty business). Python has had CPU-bound task concurrency via multiprocessing and it's been OK. My preference would be to get rid of GIL and improve how threading is actually done, but technically you can serve CPU-bound tasks today with Python 2 and 3. This is (AFAIK) not something that Node does out of the box.

The IO bound tasks in Python are a problem and I wish there was a clean solution. Python does not have a global event loop, so there is not an easy place to hook in coroutines, callbacks, etc. So for a while we were stuck with one of the following:

1. Use threading or multiprocessing. This sucks for more than concurrency of like 2-8.

2. Use eventlet, gevent, or another event loop. The problem here is that you have to buy into it whole hog. No component of yours can be blocking, and that's hard to tell.

3. Write your own event loop. I've done this and find it to be the most understandable and easy to debug approach. This sucks because of the amount of effort it takes for something so fundamental (because networking is tricky).

Some people would be happy if Python got better at solving I/O-only bound tasks. I guess that's where this feature comes in. I haven't played with 3.5 yet because I am mostly stuck on 2.7 for reasons. However, looking at it, I feel like there should have been more of a separation between blocking and non-blocking code here. Something along the lines of an async function not being able to call a blocking function.

Re: CPU and IO bound tasks: I know of no great framework for this besides threading (not the kind in Python + GIL, but real threading). I usually just side-step this problem by separating tasks that are both IO and CPU bound into smaller tasks that are only CPU or only IO bound. Thankfully, that's generally pretty easy to do.


asyncio provides an event loop in the standard lib now



And ironically, NodeJS is also single-threaded with its own GIL, which somehow no one ever mentions. It's great for async I/O-bound stuff but is really worse than Python for multicore CPU-bound work - at least Python has concurrent.futures, which makes throwing together a parallel map operation a few lines of code.
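
That parallel map really is just a few lines - a minimal sketch with a made-up CPU-bound function:

  from concurrent.futures import ProcessPoolExecutor

  def cpu_bound(n):
      return sum(i * i for i in range(n))

  if __name__ == '__main__':
      # worker processes sidestep the GIL for CPU-bound work
      with ProcessPoolExecutor() as pool:
          results = list(pool.map(cpu_bound, [10**6, 10**7, 10**8]))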


This is like the cripple making fun of the blind man.

When your competition is NodeJS there is nothing really left to say.


This isn't a problem unique to Python. There are a plethora of concurrency solutions in every language because there isn't a one-size-fits-all generic solution for every workload that exists.


It's unique to Python in that most other languages used for more than just scripting have true concurrency via OS preemptive threads with no GIL.

Python is really decades behind in this area, and no async hack is going to improve matters. I think it's safe to say that this is an obvious example of how Python is not really a general-purpose language (disregarding the fact that it started out as a Christmas hack and really has no design behind it). It should be used for scripting and quick prototyping, but architecting anything substantial on it is not advisable.


I think this a very unfair comparison.

Everyone who works with Python who understands its limitations has usually found reasonable workarounds for the things you would expect the language to bend to, much like other, similar languages that occupy that niche.

By the same token, we can claim that C++'s horrendous and undecidable syntax and footguns make it a language inadequate for "general purpose" programming; and for many cases you would be right!

But there is no such thing as a language that can do everything well. The other big contender for "general purpose" is the JVM languages, which also suffer from innumerable issues such as slow VM startup time, long VM warmup, enormous RAM usage, lack of very reasonable primitives such as unsigned integers, etc. etc.

Those are the reasons why very large systems either use a hybrid of languages and runtimes for different tasks, or use a monolithic solution and make the adequate compromises.

No one will disagree that the multithreading story would be easier in Python if it chucked the GIL out of the way, but then again, if you're doing that you might as well start a project in Elixir, Julia, Rust, or any number of modern languages that don't suffer from the cruft, but easily will need a decade to catch up in terms of library support and tooling.


> architecting anything substantial on it is not advisable.

If you're in Toronto, Canada in November you should come to my talk at Pycon CA.

Openstack is a rather large Python codebase that powers some rather large public and private clouds from Rackspace to HP and even CERN. I'll be discussing the various technologies and processes that allow us to do that in Python.

It's also one of the more popular platforms for scientific computing and data analysis. There are huge code-bases designed in Python that work quite well and continue to grow in adoption and usefulness. I don't see why anyone would advise against writing a substantial system in Python.


Concurrency is just difficult, with no single solution fully addressing every use case. There are a few mature approaches to Python concurrency; their weaknesses lie in fundamental tradeoffs rather than buggy code. Not that there isn't room for improvement, but it's almost impossible to have one solution that solves everything.


Then why do we have an official concurrency module? If there is "no single solution fully addressing every use case", why do we have one official solution? Which the main post criticises? If you are correct that there is no single solution, then surely there should be no single solution put forward as official?

Yet I believe your premise is incorrect in the first place. Why has Golang, by most commentators' accounts, "got concurrency right"? Is that true? Tell me. If it is, why can't Python get concurrency right too? If it's not true, why is there an official solution, being criticised before our eyes?

In this case, I will quote you: "I suspect you don't really know what you are talking about...".


While I haven't had a chance to use 3.5's async/await syntax, I have used AsyncIO pretty heavily to deal with multiple sensor inputs/outputs on a Raspberry Pi.

The author is right. If you write a coroutine, any code that uses it must also be a coroutine. This is pretty annoying when you're trying to test something manually. It bubbles up this way until you eventually hit the event loop.

If you're trying to debug a coroutine in the interactive shell you've got to do something like this:

  import asyncio

  @asyncio.coroutine
  def hello_world():
      print("Hello World")

  loop = asyncio.get_event_loop()
  # Blocking call which returns when the hello_world() coroutine is done
  loop.run_until_complete(hello_world())
  loop.close()
That's my main beef with it. Debugging can also be painful because when you hit an exception, your stack trace will also involve the asyncio library. Aside from those complaints, I'm a fan. It works fine and reads better than callback-style code.


You have a similar problem in NodeJS - if the API you want to use is async, it has to take callbacks, so if you have to do something with that result you have to nest another callback, and so on. It gets ridiculous. But it's pretty much a fundamental issue with async code - to keep things async you have to set up the whole interdependent network of functions to work async, since you don't know when anything is going to return its value.

It takes some getting used to - I think Python people are likely to have more trouble with this precisely because Python is a very clear, explicit, and mostly imperative language - you can read Python code as a sequence of instructions and that will be pretty much the way it gets executed, which makes understanding programs very easy. Async is simply a less intuitive way to program, so to adopt it you have to be sure it's worth the hassle of giving up easily understood code.


This mirrors my experience precisely. I was very excited about async/await, hoping that this would integrate coroutines into regular Python scripts, without the need to manage some complex dispatch engine. I was equally disappointed to learn that it's business as usual, with painful and inadequate semantics out of the box.

At least we have pypy. The community should really be rallying behind that project.


> This mirrors my experience precisely. I was very excited about async/await, hoping that this would integrate coroutines into regular Python scripts, without the need to manage some complex dispatch engine. I was equally disappointed to learn that it's business as usual, with painful and inadequate semantics out of the box.

You're not supposed to use async/await without a framework like asyncio or tornado.

One way to provide a better UX is to merge asyncio into Python on a deeper level, but this is something that many people won't like.


First of all, see this pic: https://pbs.twimg.com/media/COLLg0TUAAA4j79.jpg:large

asyncio doesn't provide any nio abstractions for files because (a) it's not really needed, and (b) there is no easy way to implement it.

(a) basically you shouldn't expect your code to block on disk io. But even if it does block for a very short amount of time it's probably fine.

(b) one way to implement nio for files is to use a threadpool. Maybe we'll add this in later versions of asyncio, but it will require writing some pretty low-level code in C (and reimplementing big chunks of asyncio in C too). Another option is to use modern APIs like aio on Linux, but as far as I know almost nobody uses it for real.

Bottom line -- you don't need coroutines or asyncio to do file I/O. What you need asyncio (and frameworks like aiohttp) for is to do network programming in Python efficiently.
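
That said, the threadpool trick is already reachable from user code via run_in_executor - a minimal sketch, with a placeholder filename:

  import asyncio

  def read_file(path):
      # ordinary blocking read; runs on a worker thread
      with open(path, 'rb') as f:
          return f.read()

  async def read_file_async(loop, path):
      # None selects the loop's default ThreadPoolExecutor
      return await loop.run_in_executor(None, read_file, path)

  loop = asyncio.get_event_loop()
  data = loop.run_until_complete(read_file_async(loop, 'some_file.bin'))
  loop.close()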


> basically you shouldn't expect your code to block on disk io. But even if it does block for a very short amount of time it's probably fine.

No, it's very often not fine. Magnetic disks, still the norm for many, and definitely with large storage, often go as low as 5KB/s for random access (or even sequential access to very fragmented files). Reading a 1MB file can easily take 5-10 seconds in some setups - which is not acceptable for any interactive service. It's not fine for a web server to not service any requests for 5 seconds.

> Another option is to use modern APIs like aio in Linux, but as far as I know almost nobody uses it for real.

Anyone I know who tried came back screaming. There is no way to do an async file open, for example - which means that if you rely on aio, you can block for 10 minutes waiting for an NFS or SMB mounted file to open.

The only sane, portable thing to do for Unix/Posix is use a threadpool for async file io - or just use something like libuv which already abstracted async operations this way.


"you shouldn't expect your code to block on disk io"

Uh?! What world are you living in? Blocking the main thread of your app on disk I/O makes for a poor user experience.

Disk I/O is a big deal for a whole range of applications. I would even say almost all apps have to deal with disk I/O at some point, as opposed to network I/O.


After playing with Asyncio more, I certainly see your point. I guess my larger point was that, even with the new async/await syntax, Python's general multitasking is still rough and incomplete imo. I thought async/await was a solution, but it's not. I just imagine it being so much simpler.


Tangentially related: of the people and projects that are using Python 3, why? I've found that aside from syntax and a few features here and there, Python 2 and 3 are more or less the same technology-wise, especially since 2.7 has a lot of features backported from 3. Thus it seems better to me to stick with 2, because there are so many existing libraries and CPython is the only implementation with complete Python 3 support.

If Python 3 had good concurrency and optimization (something neither version has right now), I'd consider using it, but is there an already existing reason that I'm just not seeing?


Python 3 has the advantage of having a future. It is the living branch of the language.

The advantage of Python 2 is an evaporating pool of legacy Python2-only libraries and legacy Python2-only programmers. Python2 is dying at the rate that pool is evaporating.

Python 3 has the advantage of having a future, plus all the advantages of Python 2 and more with the exception of that shrinking pool. I have already replaced all my Python 2 code and won't intentionally write more code that I know will have to be replaced. I'd like anything I build now to have the greatest likelihood of still being something I can continue building on in the future. Python 3 is more likely to fit that requirement than Python 2.

I'm teaching my kids to program. Python 3 is a good choice for doing so. Python 2 would set them up to be teenage maintenance programmers. Doing that to them would be child abuse. I upgrade my own skills, too, so wouldn't choosing Python 2 for my own upgrades be a form of self-abuse?


Simple answer - It's the current (and currently updated) version. Why would you not use it when starting a new project?

More fancy answer - there are plenty of nice improvements to the language (better handling of iterators, a tidied-up stdlib, modern objects, etc.), though it's probably worth it for unicode support alone - trying to convince Python 2 to properly handle international text is just a huge pain. (Also - no, you can't just replace all accented characters with non-accented ones if you want to preserve the correct meaning of the text and not piss off all your customers by misspelling their names.) Yes, you can eventually solve all these problems with Python 2, but why bother when you can just use the latest version? There's basically no cost for new projects.
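
A tiny illustration of the difference (Python 3 behavior, with a made-up customer name):

  name = 'Renée'
  print(len(name))       # 5 - characters, not bytes
  print(name.upper())    # RENÉE

  # bytes are a separate, explicit type; mixing them with str raises TypeError
  data = name.encode('utf-8')   # b'Ren\xc3\xa9e'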


There is a very clear cost for many new projects: infinite... for the simple reason that some key libraries still do not do Python 3 at all. Let's be clear. Cassandra is the best large-ingest NoSQL database out there: Python 2 only on CQLSH. Bloomberg, the default terminal used by the 300,000 most important financial people on earth: API is Python 2 only. Theano: 3.x via 2to3 only. Anaconda: 2 is still the default download, with 3 an afterthought (in other words, even for new downloads, 2 is still the majority!). This is more than about stuff not being ported yet. It's about people preferring 2.x.

This idea that Python 3 is cost free needs to be expunged. For large classes of users, Python 3 is not possible even if they want to (which they don't). Stop this erroneous propaganda. Unicode is nice if you're a web guy, and if you're a web guy, why are you using Python already? Unicode is completely irrelevant for everybody else and is most definitely not a core reason to move to Python 3. You're living in cloud cuckoo land on your 3.x magic mushroom trip.

Earth to web jockeys. Python's hardcore is numerical computing, and that hardcore is not on Python 3, and is not moving anytime soon, and certainly not for the dubious benefit of unicode. Moreover the web stack is much less important to Python than is numerical and scientific computing, for the simple reason that while the former has multiple better competitors in the form of golang, JS et al, the latter does not.

[sidebar: since when does ascii not cater for accents?? I am bilingual french / english and I have never had a problem typing french accents in ascii? You're creating misleading propaganda again. Once again, EatHeart, I quote you: "I suspect you don't really know what you are talking about...", or worse, you have an agenda to mislead.]


>Unicode is completely irrelevant for everybody else and is most definitely not a core reason to move to Python 3. You're living in cloud cuckoo land on your 3.x magic mushroom trip.

You're from the US, right? There are about 6 billion people for whom ASCII isn't enough. Some of them program in Python.


Bloomberg supports Python 3. Your information is old.


> it's probably worth it for unicode support alone

Yes. I'd probably accept a slightly worse language than python 2 but with reasonable unicode support. Fortunately, there's no need to.


I've been slow to move to 3.x, but have just recently started using 3.4 as my first Python 3.

I never had any problem with 3.x conceptually, I was just slow/lazy and because I could see how each new 3.x release got better and better there wasn't much rush. I always thought I would move to 3.x one day. And 2.7 was a nice place to wait for awhile across distro upgrades.

I haven't bothered with any migration of old code yet - just starting new stuff in 3.4. I've got one project that aims to be 2.7/3.4+ bilingual and it's going well so far.

Now that I've tried it I do like 3.4 - four releases of small feature improvements do add up eventually. I think Python's strategy of keeping 2.7 around for a long time to let stuff migrate when ready is a good one. Each new 3.x release also makes migration from 2.7 a bit easier. Not worrying about supporting 3.0, 3.1, or 3.2 (or even 3.3?) also makes migration easier.

I'd recommend moving to 3 for new projects. Migrating existing projects is more of a case by case deal where it may or may not be worth it yet or even ever.


For me personally, I've had much better experiences with Py3 than I ever did with Py2. I've used Py3 for most of my projects since 2012, and I've never run into any major problems.

Despite what people like to say on forums, pretty much every library I'd want to use works fine. And it has for a long time.

When I've tried to go back and use Py2, I keep getting annoyed by simple things. The unicode handling is a pita. Not having keyword-only arguments is annoying. Not having "yield from" means more dancing. Py2 forces me to make an lru_cache decorator manually, instead of it just working.

From my perspective, sure, I _could_ use Python 2.x.. But Python3 works better and is easier.

The question isn't "Why Python3", it's "Why the fsck wouldn't you?"


Type annotations. They're the sole reason we moved from Python2 to Python3, and it was completely worth it. With mypy, we now have at least some semblance of a type system and type checking, which is a huge win.

Unicode support is vastly improved.

Finally, there are specific exceptions like FileNotFoundError, so you don't have to catch OSError and then check errno. I got so tired of that.
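
The before/after, with a placeholder filename:

  import errno

  # the old dance: catch the broad error, inspect errno
  try:
      data = open('settings.conf').read()
  except (IOError, OSError) as e:
      if e.errno != errno.ENOENT:
          raise
      data = None

  # Python 3.3+: a specific exception, no errno check needed
  try:
      data = open('settings.conf').read()
  except FileNotFoundError:
      data = None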

No more old-style classes, which means you know you can use super(), which is now less annoying since you don't need to “remind” Python which class you're in.

Lots of other small changes, most of which do lead to improvements, however minuscule. I would argue that unless you must use a Python2-only library, just go with 3.


The number of libraries that work on Python 2 only is really small and not a problem for a lot of us: http://py3readiness.org/

The syntax is really nicer IMHO. Unicode strings are really nice to have too. Plus Python 2.x will be EOL'd soon. I just don't see the point of working with it anymore.


Python 2 has an EOL in 2020.

Why start a new project in a language that will have support dropped in ~4 years?

If it isn't going to be around in less than 5 years, I'm not going to use it for new projects which have lifespans measured in decades. I've got projects still in production, almost unchanged [only changes were to update business rules], for 4 years. It moves millions of dollars a year. I sure as hell don't want to worry about OS support for it another 4 years.


I held off switching to Python 3 until last year because I was worried about library support. But we're at a point now where the vast majority of currently maintained libraries are Python 3 compatible.

In the end what pushed me over were the improvements to the standard library multiprocessing module only available in Python 3.


I don't use python all that much, but out of curiosity, what's the problem with multiprocessing? In the languages I do develop in, I find it much easier to reason about than multithreading.


Data exchanged between the processes is serialized with pickle. Pickling is slow, adds latency, and doesn't work on all objects.

https://docs.python.org/2/library/multiprocessing.html#pipes...
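
A quick illustration of the "doesn't work on all objects" part (Python 3 syntax; lambdas, for instance, can't be pickled):

  import multiprocessing as mp

  def double(x):          # module-level functions pickle fine
      return x * 2

  if __name__ == '__main__':
      with mp.Pool(2) as pool:
          print(pool.map(double, [1, 2, 3]))       # [2, 4, 6]
          # pool.map(lambda x: x * 2, [1, 2, 3])   # PicklingError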


That sucks; there's no shared-memory message-queue implementation?


Any message queue implementation would require serializing (which the Python standard library provides with "pickling"). If there were a reasonable way to share live objects without the GIL, Python threads would use it.


When he brings up requests in comparison to urllib, he seems not to know about aiohttp: https://github.com/KeepSafe/aiohttp


Better multitasking and concurrency syntax, with task scheduling and out-of-core support, here: http://dask.pydata.org/


In other words, a lost opportunity. Asyncio and this new syntax are both hard for beginners and experienced Python coders alike, and still don't do multicore! We're clocking up 3.x version numbers as if this will magically provide the illusion of progress, but the killer feature is still not there. Instead we get type annotations. In a dynamic language. Which doesn't compile and therefore doesn't need them. With no performance advantage. If I have to type-declare everything, I want 10x performance. Okay?

If Python were a listed company, the CEO would have been replaced long ago. I'm tired of watching my favourite language flail around like this. Will Continuum Analytics or Enthought please fork 2.7?


> Asyncio and this new syntax are both hard for beginners and experienced Python coders alike, and still don't do multicore!

The fundamental limitation here isn't unique to Python. Multiple tasks that are CPU bound simply cannot be managed using cooperative multitasking, and that's all asyncio is: cooperative multitasking, with a friendlier syntax than the old asyncore module.

Cooperative multitasking using async I/O works great when you have lots of tasks running that are I/O bound--they spend almost all of their time waiting for network or disk reads/writes. But as soon as you start piling up tasks that require CPU, you need either multiple threads or multiple processes. Python asyncio was never intended to handle that use case.

Multiple threads is where Python has a unique limitation: the GIL prevents multiple threads from running on multiple cores, so if you want multiple CPU bound tasks to use multiple cores, you need to fork each one into a separate process.

It also doesn't help that the multiprocessing module is, IMO, a huge ball of cruft that's overkill for a lot of multitasking use cases. But there are simpler ways to do it, at least on Unix systems, because the fork system call is very lightweight.
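
The two worlds do compose, though: asyncio can await work farmed out to a process pool - a minimal sketch:

  import asyncio
  from concurrent.futures import ProcessPoolExecutor

  def crunch(n):
      # CPU-bound; would starve the event loop if run inline
      return sum(i * i for i in range(n))

  async def main(loop):
      with ProcessPoolExecutor() as pool:
          result = await loop.run_in_executor(pool, crunch, 10**7)
          print(result)

  loop = asyncio.get_event_loop()
  loop.run_until_complete(main(loop))
  loop.close()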

Shameless plug: I wrote a library for this some time ago, which I use whenever I have a Python project that needs to fork worker processes; it's the "comm" sub-package in my plib-io Python distribution (this is the Python 3 version):

https://bitbucket.org/pdonis/plib3-io/src


I know it's not easy to graft multicore onto Python. So why pretend? If you're serious about web programming you're not using Python anyway; you're using JS or Golang. And if you insist on Python, we have plenty of options to do async I/O already. Optimising async is like polishing the doorknobs on the Titanic. A waste of effort.

What we need is a concentrated focus on the only two areas where Python beats the competition hands down: numerical computing, and newbie accessibility. The first gets a token "@" operator (great, thanks); the second is moving backwards with 3.x. Asyncio, type annotations, and unicode are answers to questions nobody is asking (except those who refuse to use the right tool for the right job).


> If you're serious about web programming you're not using Python anyway.

Web programming in general is not CPU intensive, so async I/O (which Python already does well, as you point out--except for the API issue, see below) works fine for web programming.

> if you insist on python, we have plenty of options to do async i/o already

But all of their APIs suck, at least IMO. Supporting async I/O with built-in language syntax, to make code easier to write and more readable, seems like a good idea to me.


I think the limitations of the GIL are not all in all a bad thing, but I do think that the Python community needs to seriously look at making Asynchronous execution a priority in future versions. Asyncio is really complicated and provides very little in terms of performance.

Multitasking is the most important issue that Python faces today (and maybe PyPy is the answer).


The problem that Python is facing (and PyPy is not solving) is that in order to have a working async library ecosystem, everything, especially the stdlib, has to be async-aware. We're a far cry from that right now, and I don't see any of the Python devs really trying to address it. Otherwise you end up with the same situation as Twisted - it works and it's great, but every single library needs to be made Twisted-aware.


So we're in 3.5 already, and one of its central features, async, is flawed, you say. So not only do we not have a killer feature, but even one of the nice-to-haves is a dog. Timeout. This project needs new leadership. The competition (Golang) is walking all over Python on the async issue.


I read your comments on this thread, and I frankly wonder where you are coming from. It seems you care a lot about Python, but have a crisis because your pet peeve (threading) is not being addressed, and as a result expect the team who has been guiding Python to the success that it is over the last 25 years to step down. (And it has been a huge success - it has none of the big corporate money backing that Java, C# and Go have - but mindshare and notability on the same scale, dominates significant niches like non-HPC scientific computation, and has very significant presence in almost every field).

Could you tell me what (and how long) your experience is?

My answers about your complaints are basically:

1. Solving threading is (relatively) easy, if you just give up backwards compatibility; not the 2.x -> 3.x compatibility, which is comparatively trivial, but in a major way: breaking every single extension library and the vast majority of Python code (or slowing it down unbearably). You might be happy, but the rest of the Python users (basically, the reason you actually use Python) won't.

2. Assuming they agree with you about their failure (I don't, FWIW), someone has to step up and offer an alternative. Who is that, what are they doing these days, and why do you think they will succeed where Guido et al "failed"?

3. Perl at 28 years is not much older than Python at 24 years (relatively speaking), but has been sliding into obscurity for a long time now, whereas Python is flourishing. The Python 3 transition is actually happening as planned (IIRC, there was an expected 5 year period just for feature and speed parity!). Perl 6 is esoteric, Perl 5 is aging and dying. I think that for a living, popular language, the Python team is doing a commendable job, even if they do not address a specific issue (that many people care about, but would actually not make that much of a difference in practice if we are to learn from other languages)


The GIL issue has been solved without breaking backwards compat: http://pyparallel.org/

It just needs further dev work and acceptance into Py3.


Did you actually look into it? The threads you run can't make any changes to existing objects, among various other restrictions. It does break compatibility, and needs patched versions of NumPy, ODBC (and, I would guess, most other packages).

Definitely not "solved without breaking backwards compat".


On Windows


I have been using Python for a decade, and intensively for 7 years. I am a domain expert (finance), but I also have 4 years of CS, so I consider myself non-idiotic, if not Haskell-genius, on programming languages. I adopted Python (before it was popular - it was behind Ruby/PHP/Perl at the time) because it was pragmatic above all. That last statement is being violated with 3.x, and that is the seed of my concern.

Before answering your questions 1 thru 3, some context. It is my strong belief that the authorities are constantly looking at golang and JS as their competitors, in other words, the web world, whereas the real hardcore advantage of Python is in science and numerical computing. As evidence, witness numerous Python books which advise new users to hit the Continuum Analytics or Enthought sites for their full-stack Python installations, even texts which are not about numerics.

On your questions:

1) I don't care about threading. I care about pushing as many compute bits through the Xeon as I can in a given number of seconds (using NumPy). But as I am a data scientist, I need the REPL. C is out. Why can I still not do this easily? Multiprocessing is there, sure, but it's been unchanged for years, while CUDA, OpenCL, etc. are far too hard for a guy like me whose intellectual bandwidth is occupied with the domain, not the CS. Isn't that what Python was supposed to be about? Getting stuff done? Why isn't Python vectoring my data through the CPU and GPU yet, 15 years after Numeric was first introduced?

2) Continuum Analytics is doing an awesome job and I don't see why they, or Enthought, couldn't take up the mantle, 10gen/Datastax style to use a database analogy. They really know their customers, and the Continuum stack delivers real new value every 6 months, and not only for a scientific audience. More generally, real users in real domains should be driving the project.

3) I am less concerned about Python 3 happening as planned than I am about the focus of the project. Type annotations? This is an intellectual indulgence if it does not increase performance. Asyncio? We've had async libraries for years! Even when I started Python we had async libraries (not as good, but they were there). How is async something fantastic and new? It's nothing but polishing an existing capability a little bit further. Unicode? Fine, but again, web-focused; nobody else cares. Xrange laziness? Okay, leaves me cold. Print()? No CS benefit, but a huge marketing loss, as you can't go out to newbies anymore and say "hey, check this out... Python hello world?"

  >>> print "hello world"
  "hello world"
It just doesn't get any simpler, and yet Python wants to throw out that unique hook for new people who care little about coding but a lot about domain. It's zero, genuinely zero, boilerplate, whereas there are a dozen languages where you can do print("hello world"). Seems trivial, but in print vs print() we have the key difference in philosophy (get-stuff-done-now vs take-me-oh-so-seriously). If you're so serious at a computer science level, you're not going to do Python.

So. What is Python. A serious language? NO. A wonderfully malleable, not too serious, friendly language, into which you can insert some real hardcore stuff (Numpy, ML, website parsing, database transformations, game scripting, image processing....the list is endless) really easily? Yes.

Where is Python genuinely way ahead of everyone else? Only on numerical computing.

It strikes me that Guido and co are embarrassed by their weekend hack of 1.x and 2.x, when that is precisely what the user base loves about it. Their attempt to make Python serious, is killing Python's original spirit. There is nothing wrong with 2.7. Nobody wants Python to morph into Java.

So, am I a Luddite wanting 2.7 to live forever? no. What I want is vectorization plus DAG-like workflows. These are the most important pieces of computer science that actually dovetail with real world use cases, today. Yes async is cool, but golang now owns that space. What I'd really like is for Python to give us a good framework for the Big Data world which is in almost everybody's use case now, and that means, Python needs to talk multiprocessor, Python needs to talk GPU, Python needs to talk cluster, and Python should long ago have been addressing this directly. Python needs to "go vectorized". That should be the project's obsession. SIMD, in a word, where the parallel granularity can go from GPU kernels,to CPU, to multi-machine clusters, and DAG-like workflows (with possible recursion) built in. This is not a nice-to-have capability anymore. It is what the next wildly popular mainstream programming language will have, built into the language. The signals from everything we're seeing added to 3.x is NOT this. It's web-like stuff. That battle is over. JS won.

Let's not gift the opportunity of huge data to Java (Spark) and Cuda, or a new language that will see the future better than us. It should be Python!

There you go. My view.


> Where is Python genuinely way ahead of everyone else? Only on numerical computing.

No, that's not where it is. It's in being a user-friendly language for people who need to get numerical work done, and there's a huge difference. IPython/Jupyter did not happen in any other language, not even remotely (the closest thing I've seen is some Lua graphical REPL that didn't go far).

> Nobody wants Python to morph into Java.

Python is not morphing into Java, far from it. 3.x represents a cleanup of the language (str vs. unicode, new vs. old style classes, a lot of other stuff). I do not agree with all of it (I too disagree with demoting print from statement to function), and there is other stuff I would have done - but most of it was desperately needed.

> What I want is vectorization plus DAG-like workflows. These are the most important pieces of computer science that actually dovetail with real world use cases, today

For a tiny, tiny part of the users. And they already have e.g. Theano, and a few other tools, that give some solutions.

> What I'd really like is for Python to give us a good framework for the Big Data world which is in almost everybody's use case now,

Excuse me, but I have to disagree: only a tiny minority (perhaps 1 percent) of Python users cares about big data. Most definitely not "everybody's use case". And the minority which actually cares about big data should not get anywhere close to Python - the runtime is prohibitive.

> This is not a nice-to-have capability anymore. It is what the next wildly popular mainstream programming language will have, built into the language.

Interesting. What language has this today? I know not of a single one, mainstream or niche. I don't believe that you are right on this.

> Let's not gift the opportunity of huge data to Java (Spark) and Cuda, or a new language that will see the future better than us. It should be Python!

Check out Nim + Nimborg. You'll get the best of Python and Nim, and Nimborg would let you do the transition smoothly.

Your idea of Python is not Guido's idea of Python, and as far as I know does not have popular support anywhere. If I were feeling so strongly about my favorite language being so misguided, I would look for a new language. The momentum of existing development direction is essentially unstoppable, whether it is right or wrong.


I fail to see how asyncio should use multicore. Asynchronous programming uses a single thread...

If you have CPU-bound tasks, using non-blocking I/O (which is what asyncio is about) is not going to help. You can use multiple processes (maybe from concurrent.futures[1]) for that case.

[1] https://docs.python.org/3.5/library/concurrent.futures.html#...


They pretty much already did, but with 3.x: http://pyparallel.org/


Amen brother.

If you make one mistake in your thinking, it's treating Python as more than a Christmas hack (which is what it started out as). It's not a serious language; it has no design behind it and is decades behind the state of the art.

Use it for throw-away scripts and such is what I say, it's an area where it shines.

The moment you want to _architect_ something substantial, reach for something else, because chewing gum and duct tape are not going to do it.


We are pretty serious about the web, infrastructure, and platforms in general, and we use Python. As does Dropbox, Google, Facebook, Twitter, etc.

I don't think concurrency is great in Python, but you're living in the wrong world if you don't think people are using the language for real things.


Not sure what you mean by pretty serious about the web, infrastructure, and platforms. To me this reads as an oxymoron, because Python is not a pretty serious language. It's a toy at best. There is not a single feature that Python introduced or pioneered in the computing arena. It's a mishmash of bad ideas from other similarly bad languages, wrapped in user-friendly paper. In that regard, it's very similar to Perl, but a lot more dangerous.

Unlike Perl, there is nothing immediate in Python to warn you of the pit you're digging for yourself, especially if you're a newcomer to programming (as a substantial chunk of the Python community is). The troubles begin when you make the colossal mistake of trying to use Python for more than it can do, try to design and architect actual systems with it but I digress.

On the other point you made, Google, Facebook, Twitter etc are using C++ too.

Does that make C++ a good language for architecting substantial systems? Show me (1) system of substantial size implemented in C++ that hasn't turned out to be a DISASTER in anything _but_ performance. When your browser can be taken over a million ways to Sunday _just because_ it's implemented in C++, do you think Google, Facebook, Twitter etc give a damn if _you_ don't? Same argument for PHP and Facebook.

Would you say that these companies are "pretty serious" about web, infrastructure and platforms?

There are good languages out there, suited to architecting systems that are maintainable, secure and evolve gracefully. You won't find them simply by looking at what Google, Facebook and Twitter are using. You will need to try harder.


Amen buddy. I hack in Python. I discover in Python. I script, and I pre-production in Python. Then I do C or Java or even JS. But none of my pro-code would have been possible without all the learning I accumulated about the problem in good ol' 2.7 interactive and its arsenal of tools.

[edit] Actually I think you're right. I think it's time to graduate to something serious. My mistake is to think that Python can be that serious language. 3.x, by trying to be that, is horribly compromising the basics of what Python is all about, namely accessibility, friendliness, discovery, malleability. Not production.


So is it a little syntax sugar on top of the multiprocessing module to please iOS developers?

There's nothing new that I can see under the hood.


No, asyncio is a completely different module and paradigm from multiprocessing. The new standard syntax uses asyncio, and none of it has anything to do with iOS.


I mentioned iOS because Apple's Grand Central Dispatch API is something I wish Python had. Python's multitasking is so complicated while Apple's GCD is so elegant. It's sad really.


Your example is trivially parallelized with some decorators from the dask library.

Example here: http://dask.pydata.org/en/latest/imperative.html
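
Something like this, using dask's delayed interface (a hedged sketch - slow_task stands in for the article's functions, and older dask docs called this the "imperative" API):

  import dask
  from dask import delayed

  @delayed
  def slow_task(n):
      return n * 2

  # builds a task graph; nothing runs until compute()
  tasks = [slow_task(i) for i in range(10)]
  results = dask.compute(*tasks)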


I hadn't seen Dask before this. Thanks!


Sure!

If you like it, can you please blog about it (and post on hn)? Needs exposure outside the python data community.


Have you seen this: http://dask.pydata.org/en/latest/



