
The Future of Asynchronous IO in Python - leonardinius
https://medium.com/@paulcolomiets/the-future-of-asynchronous-io-in-python-ce200536d847
======
Animats
It's frustrating. Many people want an interprocess subroutine call, and few
OSs have the right primitives for it. (QNX does, and Windows sort of does, but
the Linux/Unix world does not.) There's an endless collection of kludges,
built on top of pipe/socket-like mechanisms, so that program A can call
program B.

OpenRPC and CORBA are out of fashion. Google protocol buffers help, but don't
come with a standard RPC mechanism. HTTP with JSON is easy to do but has high
overhead. Then there's the handling of failure, the curse of callback-oriented
systems. ("He said he'd call back. Why didn't he call back? Doesn't he like me
any more? Should I dump him?")

The lack of a good, standardized interprocess call mechanism results in
interprocess call logic appearing at the application level. At that level,
it's intertwined with business logic and often botched.
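
As a concrete (and deliberately ad hoc) sketch of the kind of kludge being
described, here is "program A calls program B" done as JSON over HTTP with
nothing but the Python standard library. The /add endpoint and the payload
shape are invented for illustration; the point is that every project invents
its own:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class RpcHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and perform the "remote procedure".
        body = self.rfile.read(int(self.headers["Content-Length"]))
        params = json.loads(body)
        result = json.dumps({"result": params["a"] + params["b"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, *args):  # silence per-request logging
        pass

def call(url, **params):
    # The "interprocess call". All the failure handling the comment
    # complains about would also have to live right here.
    req = urllib.request.Request(
        url, data=json.dumps(params).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.loads(resp.read())["result"]

server = HTTPServer(("127.0.0.1", 0), RpcHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(call(f"http://127.0.0.1:{server.server_port}/add", a=2, b=3))  # 5
server.shutdown()
```

Timeouts, retries, and the callback that never comes all end up in call(),
right next to the business logic, which is the complaint above.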

~~~
lmm
> Google protocol buffers help, but don't come with a standard RPC mechanism.

Sounds like you want Thrift.

~~~
jbergens
Doesn't Cap'n Proto have an RPC mechanism built in? Might help for some cases.

~~~
lmm
Maybe. RPC is a first-class citizen in Thrift, and it's more mature and
established than Cap'n Proto (and had better cross-language support last time
I checked). But whatever works for your use case.

------
Reef
The article suggests implementing a high-performance framework where
notification of an unavailability will be propagated "before a real user
tries to execute a request" (which would have a 50% probability of happening
if the number of heartbeats per second equaled the number of requests per
second the service handles, ha ha). Also from the article we learn that
sending a 304 with content does not work as expected (!?)

Not only that, but there should be only one connection to the database,
through which all traffic will go. Pulling that up through a network switch
connected to a server with multiple cables, or pushing all that through a
single TCP connection, with a footnote that advises against rewriting it
after a few years?

Also, everything explained there should run in a single thread, because surely
it will be fast enough.

Good luck with that.

[if you find other funny stuff that I missed, leave it in the comments below]

~~~
tailhook
> The article suggests implementing a high-performance framework where
> notification of an unavailability will be propagated "before a real user
> tries to execute a request" (which would have a 50% probability of
> happening if the number of heartbeats per second equaled the number of
> requests per second the service handles, ha ha).

Yes and no. Sometimes there is a dependency that is used on only 1% of
requests, or even 0.1%. And that's the thing that's hard to track by 50x
error codes, because they are really rare.

But anyway, it's not a fully designed feature, just a vision of what could
potentially be done. So it's not going to be a stumbling block for the
implementation.

> Also from the Article we can learn that sending a 304 with content does not
> work as expected (!?)

Sending a 304 works as expected. Passing an arbitrary status code with
arbitrary headers and an arbitrary body from the application to the framework
doesn't work as expected.

> Not only that, but there should be only one connection to the database,
> through which all traffic will go. Pulling that up through a network switch
> connected to a server with multiple cables, or pushing all that through a
> single TCP connection, with a footnote that advises against rewriting it
> after a few years?

Not sure I understand your question. But note that I'm speaking about Python.
We have many Python processes on the box anyway. Each of them has its own
connections to the downstreams (i.e. a database).

Then if we start writing asynchronous Python code, we need to send requests
from multiple asynchronous tasks in each of those Python processes. I argue
that it's more efficient to send requests from all tasks of a single process
through a single downstream connection.
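
A rough sketch of that multiplexing idea in asyncio (fake_server and all the
names here are invented stand-ins, not the framework from the article): many
tasks submit requests, one I/O task owns the single downstream connection,
and replies are matched back to callers by request id.

```python
import asyncio
import itertools

class MultiplexedConnection:
    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}              # request id -> Future awaiting reply
        self._outbox = asyncio.Queue()  # requests heading downstream

    async def request(self, payload):
        req_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut
        await self._outbox.put((req_id, payload))
        return await fut                # caller waits only for its own reply

    async def run(self):
        # The single I/O task: ships requests out, routes replies back.
        while True:
            req_id, payload = await self._outbox.get()
            reply = await fake_server(payload)   # stands in for the one wire
            self._pending.pop(req_id).set_result(reply)

async def fake_server(payload):
    await asyncio.sleep(0)              # pretend network round trip
    return payload.upper()

async def main():
    conn = MultiplexedConnection()
    io_task = asyncio.create_task(conn.run())
    results = await asyncio.gather(*(conn.request(w) for w in ["a", "b", "c"]))
    io_task.cancel()
    return results

print(asyncio.run(main()))  # ['A', 'B', 'C']
```

A real implementation would pipeline requests over one socket and handle
reconnects; this only shows the id-to-Future bookkeeping that makes one
connection serve many tasks.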

> Also, everything explained there should run in a single thread, because
> surely it will be fast enough.

Sure, a single I/O thread in C will outperform any Python processing of that
data. That's true for 99.9% of use cases.

------
xorcist
> (The GIL) is there for other scripting languages too (Ruby, Perl, Node.js,
> to name a few)

I don't know about Node, but I know it's perfectly possible to write "proper"
multi-threaded programs in Perl (and it has been ever since the version
string started reporting support for threads, which must be a good 15 years
ago now).

It's not terribly relevant to the article (because you normally don't write
multi-threaded programs in Python), but then why bring it up?

~~~
Flimm
True, and also relevant: interpreter threads are now officially discouraged.
[http://perldoc.perl.org/threads.html#WARNING](http://perldoc.perl.org/threads.html#WARNING)

------
mamcx
OK, and what if we could start clean?

(I'm in the process of building a toy language.)

Is this more of a problem for a language with baggage, or is it general? I
wonder if, for example, in Go or Erlang it's less of an issue...

~~~
rubiquity
The entire premise of this article is that the author has adopted the
microservices disease and is suffering from not having asynchronous IO. You'll
never hear an Erlang programmer even talk about microservices because they are
a solution to a problem that doesn't exist in Erlang, or any sufficiently
concurrent language for that matter[0]. Oh, and microservices clearly create
other problems, such as messaging.

0 - I do hear Go users talk about microservices, which is weird, because Go
has decent enough concurrency primitives that they shouldn't need to split
web applications up into microservices.

~~~
untothebreach
A lot of the articles I read about microservices seem to come at them more
from an organizational point of view than a concurrency one. That is,
encapsulating the logic for one specific component of a system, both for ease
of scaling up only the parts of the system that need to be scaled, and as a
different way for dev teams to interact with other parts of the system.

I'm not saying I agree, but I do understand why some Go programmers might be
evangelizing microservices as well.

~~~
jbergens
I still don't see how it really helps. For organizational concerns it may be
easier to say that a team is responsible for a (micro)service than for a
library, but it doesn't have to be easier. If they change their API all users
will suffer anyway. It doesn't have to help much with scaling either. If the
service/library is the thing that uses most of the cpu it will still help to
add more servers with everything on them. Microservices actually may cause
bigger lag since the network connections may take time (app calls service 1
which calls service 2 etc). If they all were libraries on the same server the
calls would be much faster.

You do get an opportunity to monitor each service more easily if they live on
separate machines (or just separate processes). If you want that with
libraries, you need to add monitoring code to all libraries and then collect
data from them to see where the bottlenecks are. You might also need to
develop new tools to configure things like pool sizes and cache sizes for
each library: for example, saying that you should have 5 threads (or
handlers) handling mail messaging and 20 threads/handlers handling
calculations. You might also want to be able to change these configurations
in a live system. I think a lot of applications miss this monitoring and
configuration part and then get into problems because of that. And of course
microservices help with using different languages for different parts, even
though most organizations I've seen seem to mandate a specific language
anyway.

------
rumcajz
This article raises many relevant points.

But the real issue behind all of that is that we lack the means to easily
implement protocol stacks. Implementing a new protocol (especially in user
space) is a task that can easily eat months or years of your precious time.

~~~
tailhook
Well, for many protocols it's easy. I wrote (a useful subset of) a MySQL
protocol parser in a weekend. Sure, that was in Python, so reimplementing it
in C would be harder, but still not years. Many protocols (MongoDB, Redis,
memcached, beanstalk, etc.) are much simpler.

There are protocols that take months and years, but they are not so
ubiquitous (with the obvious exception of HTTP, which is really complex and
ubiquitous). So they can be developed after the basic tools are in place.
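
To give a sense of scale, the core of the Redis wire protocol (RESP) really
is tiny. A toy reply parser covering the basic RESP reply types (a sketch,
not a complete or robust client):

```python
def parse_resp(data):
    """Parse one RESP reply from bytes; return (value, remaining_bytes)."""
    kind, rest = data[:1], data[1:]
    line, _, rest = rest.partition(b"\r\n")
    if kind == b"+":                       # simple string
        return line.decode(), rest
    if kind == b"-":                       # error reply
        return Exception(line.decode()), rest
    if kind == b":":                       # integer
        return int(line), rest
    if kind == b"$":                       # bulk string, -1 means null
        n = int(line)
        if n == -1:
            return None, rest
        return rest[:n], rest[n + 2:]      # skip the trailing \r\n
    if kind == b"*":                       # array of nested replies
        items = []
        for _ in range(int(line)):
            item, rest = parse_resp(rest)
            items.append(item)
        return items, rest
    raise ValueError("unknown RESP type: %r" % kind)

print(parse_resp(b"*2\r\n$3\r\nfoo\r\n:42\r\n")[0])  # [b'foo', 42]
```

A real client adds the request side and incomplete-input buffering, but the
grammar itself fits on one screen, which is the weekend-project point.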

~~~
rumcajz
Parsers are generally easy. I was thinking more about implementation
questions like: how do we pipe input from one protocol into another protocol
on top of it? How do we poll on a user-space protocol? How do we integrate it
with foreign event loops? If it's in user space, how do we handle process
termination when there's still data in the tx queue? How do we handle many
different peers at once without having to create a separate thread for each
of them? Etc.
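
One answer to several of these questions that later became popular is the
"sans-IO" style: write the protocol as a pure state machine that consumes
bytes and emits events, so any event loop can drive it and one protocol's
output can be fed straight into another's parser. A toy line-based protocol
sketched that way (all names invented for illustration):

```python
class LineProtocol:
    """A transport-free protocol: bytes in, events (lines) out."""

    def __init__(self):
        self._buf = b""

    def feed(self, data):
        """Consume raw bytes; return the list of completed lines."""
        self._buf += data
        *lines, self._buf = self._buf.split(b"\n")  # keep the partial tail
        return lines

    def send(self, line):
        """Serialize an outgoing event for whoever owns the socket."""
        return line + b"\n"

proto = LineProtocol()
print(proto.feed(b"hel"))          # []  (incomplete line stays buffered)
print(proto.feed(b"lo\nworld\n"))  # [b'hello', b'world']
```

Because the object never touches a socket, polling, process termination, and
foreign event loops stay the transport owner's problem, and stacking
protocols is just feeding one parser's output into the next.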

------
theVirginian
I'm sorry to be negative, but the grammar mistakes in this article were
glaring.

~~~
tailhook
Sorry, English is not my native language. You can use "notes" in Medium to
point out the mistakes to me.

------
notastartup
Do we really need to reinvent the wheel? The microservices/async craze feels
like we now need to create a single wheel that is made up of many wheels, and
spend an enormous amount of time making it look and feel like the single
wheel that was already working fine. Never mind that it makes absolutely no
difference to the end user; it will make our next few years interesting,
because the old way of making the wheel is boring and unexciting.

I think that after a few years, software businesses will realize that it was
an investment with questionable advantages and go right back to what was
working fine for the past decade and will continue to work fine.

~~~
tailhook
Well, here is a good article:

[https://plus.google.com/+RipRowan/posts/eVeouesvaVX](https://plus.google.com/+RipRowan/posts/eVeouesvaVX)

It seems that Amazon started using microservices in 2002. Do you think 12
years is not enough to learn from the mistakes?

~~~
lmm
Amazon uses microservices because they need to at their scale. But there are
maybe a few dozen companies that operate at that scale. 12 years isn't enough
time to learn a lot when there are only a few people doing it, just as in e.g.
spaceflight.

