The Future of Asynchronous IO in Python (medium.com)
85 points by leonardinius on Nov 26, 2014 | 50 comments



It's frustrating. Many people want an interprocess subroutine call, and few OSs have the right primitives for it. (QNX does, and Windows sort of does, but the Linux/Unix world does not.) There's an endless collection of kludges so program A can call program B, implemented on top of pipe/socket-like mechanisms.

OpenRPC and CORBA are out of fashion. Google protocol buffers help, but don't come with a standard RPC mechanism. HTTP with JSON is easy to do but has high overhead. Then there's the handling of failure, the curse of callback-oriented systems. ("He said he'd call back. Why didn't he call back? Doesn't he like me any more? Should I dump him?")

The lack of a good, standardized interprocess call mechanism results in interprocess call logic appearing at the application level. At that level, it's intertwined with business logic and often botched.


This x1000. Has anyone used POSIX message queues on Linux for this? I work on a Rube Goldberg app that uses a custom protocol to talk to a C++ app that has a plugin system (bidirectional communication between the host and a plugin DLL), which in turn talks to external processes over IPC. The IPC part is the cherry on top.
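For reference, here is a minimal sketch of using POSIX message queues from Python. It assumes the third-party posix_ipc module (the stdlib has no bindings for them); the queue name and the size limits are made-up illustration values.

    # Minimal POSIX message queue round trip. Assumes the third-party
    # posix_ipc module (pip install posix_ipc); queue name and limits
    # below are arbitrary illustration values.
    import posix_ipc

    QUEUE_NAME = "/demo_queue"  # POSIX mq names start with a slash

    # Create (or open) the queue. max_messages / max_message_size are
    # capped by /proc/sys/fs/mqueue/* on Linux.
    mq = posix_ipc.MessageQueue(QUEUE_NAME, posix_ipc.O_CREAT,
                                max_messages=8, max_message_size=1024)
    try:
        mq.send(b"hello from process A", priority=0)

        # In a real setup the receive would happen in another process
        # that opened the same queue by name.
        message, priority = mq.receive(timeout=1.0)
        print(message, priority)
    finally:
        mq.close()
        mq.unlink()  # remove the queue name from the system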


OS X still has a great basis for interprocess communication via NeXTSTEP's PDO: http://en.wikipedia.org/wiki/Portable_Distributed_Objects It has existed for over 20 years now and is IMO unjustly ignored, in spite of its simple implementation and low overhead.


> Many people want an interprocess subroutine call, and few OSs have the right primitives for it. (QNX does, and Windows sort of does, but the Linux/Unix world does not.)

Could you please elaborate on the Windows part of your statement? What in particular are you referring to?


COM/DCOM have offered a relatively usable solution for interprocess and cross-machine RPC for... quite a while. I used it to do interprocess communication back in the VB6 era (it was built-in), and in fact I did it by accident. I picked the wrong checkbox in the project options, so I ended up with the user interface of my application and the backend literally running in separate processes.

Other than the performance overhead (that I initially chalked up to 'oh, COM is just slow I guess'), the way I finally noticed was when a pointer I passed across the COM boundary wasn't valid (because it was to memory in another process). Whoops! Everything was working perfectly up until that point.

(FWIW, I am pretty sure the invalid pointer thing only happened because I was passing a raw address around - VB6 doesn't have pointer types.)

Plus, since COM provides ways to do source-level and binary compatibility, you can leverage that for your RPC.

I won't call it awesome, but it's quite robust and gets used in many places on Windows.

At a more basic level, you can trivially implement RPC using Windows messages, though the security model improvements in 7 and 8 have made this a bit more complicated. There is excellent, straightforward infrastructure for establishing message loops and sending messages - really easy to get right - and it doesn't require your application to have a UI. Pretty much any thread can receive and process messages if it wants to.


> I picked the wrong checkbox in the project options, so I ended up with (...)

You accidentally summarized my experience with development on Windows in the nineties :-)

It's funny that what was designed as a feature of the ecosystem (tooling that abstracts complexity) completely turned me away from Windows development and into Linux development. Development on Linux was, at the time, much cruder and closer to the metal. The positive note was that everyone was in the same boat, so documentation of low-level routines, as well as community support, was flawless.

(not that I can, to this day, understand documentation created by kernel hackers, such as the documentation of nftables, tc or the ifb module, but I digress)


> Google protocol buffers help, but don't come with a standard RPC mechanism.

Sounds like you want Thrift.


Doesn't Cap'n Proto have some RPC mechanism in it? Might help for some cases.


Maybe. RPC is a first-class citizen in Thrift, and it's more mature and established than Cap'n Proto (and had better cross-language support last time I checked). But whatever works for your use case.


So often an RPC mechanism is not what you really want in the end. You tend to be better off with a MOM-style (message-oriented middleware) interface.


What do you mean by "Windows sort of does"? Windows has good IPC.


It depends on what you mean by IPC, I guess. There are a billion and one ways to do it, everything from message passing to shared memory. If you want to do Windows-style IPC on UNIX, wouldn't SysV IPC suffice? (QNX is of course another beast altogether.)


Do you have a link for description of how it works in QNX?


I guess it's about the QNX message passing APIs http://www.qnx.com/developers/docs/6.4.1/neutrino/sys_arch/i...


Is this 2004? D-Bus has been around for 9 years.

It's not an OS primitive (yet: kdbus), but why does it need to be?


The article suggests implementing a high-performance framework in which notification of unavailability is propagated "before a real user tries to execute a request" (which would have a 50% probability of happening if the number of heartbeats per second equaled the number of requests per second the service handles, ha ha). Also, from the article we learn that sending a 304 with content does not work as expected (!?)

Not only that, but there should be only one connection to the database, through which all traffic will go. Pulling that up through a network switch connected to a server with multiple cables, or pushing all that through a single TCP connection, with a footnote that advises against rewriting it after a few years?

Also, everything explained there should run in a single thread, because surely it will be fast enough.

Good luck with that.

[if you find other funny stuff that I missed, leave them in the comment section below]


> The article suggests implementing a high-performance framework in which notification of unavailability is propagated "before a real user tries to execute a request" (which would have a 50% probability of happening if the number of heartbeats per second equaled the number of requests per second the service handles, ha ha).

Yes and no. Sometimes there is a dependency that is used on only 1% of requests, or even 0.1% of requests. That's the kind of thing that's hard to track via 50x error codes, because they are really rare.

But anyway, it's not a fully designed feature, just a vision of what could potentially be done. So it's not going to be a stumbling block for the implementation.

> Also, from the article we learn that sending a 304 with content does not work as expected (!?)

Sending a 304 works as expected. Passing an arbitrary status code with arbitrary headers and an arbitrary body from the application to the framework doesn't work as expected.

> Not only that, but there should be only one connection to the database, through which all traffic will go. Pulling that up through a network switch connected to a server with multiple cables, or pushing all that through a single TCP connection, with a footnote that advises against rewriting it after a few years?

Not sure I understand your question well. But note that I'm speaking about Python. We have many Python processes on the box anyway. Each of them has its own connections to the downstreams (e.g. a database).

Then, if we start writing asynchronous Python code, we need to send requests from multiple asynchronous tasks in each of those Python processes. I argue that it's more efficient to send requests from all tasks of a single process through a single downstream connection.
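As a rough illustration of that argument (not the design from the article), here is an asyncio sketch where many tasks funnel their requests through one shared connection via a queue; the newline-delimited protocol, host, and port are placeholders.

    # Sketch: many asyncio tasks, one shared downstream TCP connection.
    # The request/reply protocol, host, and port are placeholders; a real
    # framework would multiplex an actual protocol (MySQL, Redis, ...)
    # over the connection in the same way.
    import asyncio

    async def connection_worker(host, port, queue):
        # Owns the single connection and serializes every request onto it.
        reader, writer = await asyncio.open_connection(host, port)
        while True:
            request, future = await queue.get()
            writer.write(request + b"\n")
            await writer.drain()
            future.set_result(await reader.readline())

    async def call(queue, payload):
        # What each task does: enqueue its request and await its own reply.
        future = asyncio.get_running_loop().create_future()
        await queue.put((payload, future))
        return await future

    async def main():
        queue = asyncio.Queue()
        worker = asyncio.ensure_future(connection_worker("127.0.0.1", 7777, queue))
        # Hundreds of concurrent tasks, one TCP connection to the downstream.
        replies = await asyncio.gather(*(call(queue, b"ping") for _ in range(100)))
        worker.cancel()
        return replies

    asyncio.run(main())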

> Also, everything explained there should run in a single thread, because surely it will be fast enough.

Sure, a single I/O thread in C will outperform any Python processing of that data. That's true for 99.9% of use cases.


> (The GIL) is there for other scripting languages too (Ruby, Perl, Node.js, to name a few)

I don't know about Node, but I know it's perfectly possible to write "proper" multi-threaded programs in Perl (and it has always been, since the version string started reporting support for threads, which should be a good 15 years ago).

It's not terribly relevant to the article (because you normally don't write multi-threaded programs in Python), but then why bring it up?


True, also relevant is that interpreter threads are officially discouraged now. http://perldoc.perl.org/threads.html#WARNING


OK, and what if we can start clean?

(I'm in the process of building a toy language.)

Is this more a problem for a language with baggage, or is it general? I wonder whether, for example, in Go or Erlang it is less of an issue...


The entire premise of this article is that the author has adopted the microservices disease and is suffering from not having asynchronous IO. You'll never hear an Erlang programmer even talk about microservices because they are a solution to a problem that doesn't exist in Erlang, or any sufficiently concurrent language for that matter[0]. Oh, and microservices clearly create other problems, such as messaging.

[0] I do hear Go users talk about microservices, which is weird, because they have decent concurrency primitives and shouldn't need to split web applications up into microservices.


> You'll never hear an Erlang programmer even talk about microservices because they are a solution to a problem that doesn't exist in Erlang, or any sufficiently concurrent language for that matter[0].

The architecture of an idiomatic Erlang-based system is essentially a microservice architecture.

> [0] I do hear Go users talk about microservices, which is weird, because they have decent concurrency primitives and shouldn't need to split web applications up into microservices.

Microservice architecture has motivations (loose coupling, distribution, independent scalability of components) that go considerably beyond "my language doesn't have decent concurrency primitives".


> The architecture of an idiomatic Erlang-based system is essentially a microservice architecture.

:)

> loose coupling, distribution, independent scalability of components

If you have good abstractions for concurrency then distribution and independent scalability should be trivial in a single codebase. Loose coupling is usually a false dream. Microservices tend to get coupled at the network level instead of the code level. Yuck!


A system which serves millions of users spans multiple machines. It's much easier to scale and evolve your stack if you're keeping it decoupled. Plus if you use something like nanomsg you can keep the services on the same host, and use something like ipc:// for minimal latency, then just switch the protocol and move the service to a new host with no other changes when it's time to scale up.
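To make that concrete, here is a sketch of the "only the address changes" point. It uses the pynng bindings (for nng, nanomsg's successor) purely as a stand-in, so treat the module and the addresses as assumptions, not a recommendation.

    # Same-host deployment over a local IPC socket; scaling out later means
    # changing only the address string, not the code.
    import pynng

    ADDRESS = "ipc:///tmp/zipcode.sock"
    # ADDRESS = "tcp://10.0.0.7:5555"   # later: the service moves to another box

    with pynng.Rep0(listen=ADDRESS) as service, pynng.Req0(dial=ADDRESS) as client:
        client.send(b"90210")                 # client side: ask
        request = service.recv()              # service side: receive the request
        service.send(b"Beverly Hills, CA")    # service side: reply
        print(client.recv())                  # client side: read the reply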

No matter how good your concurrency primitives, you cannot escape the network. Yes, I know, Erlang is awesome and has excellent distribution and concurrency capabilities out of the box, but you cannot write all the things in Erlang, nor should you want to. Different languages/services have to talk to each other somehow and at some point.


>A system which serves millions of users spans multiple machines. It's much easier to scale and evolve your stack if you're keeping it decoupled.

It can be much harder. Let's say that you have a zipcode<->address conversion library and you're deciding whether to use it within the web stack (option A) or to create an additional service with a REST API on a separate machine (option B). Microservices!

Option A means that it scales with your web stack. If you have 5 application servers today behind a load balancer and your load doubles then tomorrow you will need 10.

Option B means an entirely new set of servers. Not only do you still need those 5 application servers, you need an entirely new server for handling zipcode<->address translation. Let's say it's maxed out.

What happens when your load doubles then?

Well, you'll still need to scale up those 5 application servers running behind a load balancer, but you ALSO need an entirely new load balancer and two servers behind it for the zipcode<->address translation service.

>Different languages/services have to talk to each other somehow and at some point.

This doesn't mean they need to talk over a network layer.


I don't see why you would need a load balancer for the zipcode service; if the service truly scales linearly with the number of application servers, you can simply tie them together 1-to-1 with simple configuration.


What, so one extra zipcode server for every application server?


Or N-to-1. You still don't need load balancing, if it truly scales linearly, just configuration.


Microservices are touted as being fault tolerant. Erlang provides fault tolerance as part of OTP. While I haven't programmed much in Go, I haven't seen much in the way of fault tolerance.


A lot of the articles I read about microservices seem to come at them more from an organizational point of view than a concurrency one. That is, encapsulating the logic for one specific component of a system, both for ease of scaling up only the parts of the system that need to be scaled, and as a different way for dev teams to interact with other parts of the system.

I'm not saying I agree, but I do understand why some Go programmers might be evangelizing microservices as well.


I still don't see how it really helps. For organizational concerns it may be easier to say that a team is responsible for a (micro)service than for a library, but it doesn't have to be. If they change their API, all users will suffer anyway. It doesn't have to help much with scaling either: if the service/library is the thing that uses most of the CPU, it will still help to add more servers with everything on them. Microservices may actually add latency, since the network hops take time (the app calls service 1, which calls service 2, etc.). If they were all libraries on the same server, the calls would be much faster.

You do get an opportunity to monitor each service more easily if they live on separate machines (or just separate processes). If you want that with libraries, you need to add monitoring code to all libraries and then collect data from them to see where the bottlenecks are. You might also need to develop new tools to configure things like pool sizes and cache sizes for each library, for example saying that you should have 5 threads (or handlers) handling mail messaging and 20 threads/handlers handling calculations. You might also want to be able to change these configurations in a live system. I think a lot of applications miss this monitoring and configuration part and then get into problems because of that. And of course microservices help with using different languages for different parts, even though most organizations I've seen seem to mandate a specific language anyway.


Good point about the organizational point of view. I'm still torn on whether more smaller teams is better for organizations or worse. Smaller is usually better, but now along with coordinating between the applications at the network level you also have to coordinate between applications at the human level. Some organizations probably do a lot better at this than others.


Maybe I'm dumb, but doesn't splitting up a web application into microservices have many other benefits, such as modularity, constraining complexity, code reuse, etc.?


You can get those with a straight library without introducing the complexity of message-passing, RPCs, and distributed systems.

(I'm a big fan of SOAs/microservices for companies that operate at Google-scale. Most companies do not, and for the rest of us - plain old libraries are very underrated. You can always slap an RPC layer or message broker on top of a library's API, but there's no reason to pay that cost until you need to. The real reason to break things up into microservices is when you run out of RAM on the box, or alternatively when you want better cache hit rates by focusing the processor on a small amount of code. You typically don't get there until you're serving thousands of QPS against a data set in the terabytes.)
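A sketch of that "library first, RPC later" idea; zipcode_to_address and its lookup data are hypothetical, and the HTTP wrapper is only what you would bolt on if the code ever had to move to another box.

    # Plain library call: every app server just imports and calls this.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def zipcode_to_address(zipcode):
        return {"90210": "Beverly Hills, CA"}.get(zipcode, "unknown")

    # Thin RPC layer slapped on top of the same library function, added only
    # when the lookup actually needs to live on a separate machine.
    class ZipcodeRPC(BaseHTTPRequestHandler):
        def do_GET(self):
            body = zipcode_to_address(self.path.lstrip("/")).encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8000), ZipcodeRPC).serve_forever()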


Modularity: No. Your headache in creating a system of microservices will just be that much greater if you don't have modular code. I wouldn't call that a benefit even if it does end up with you writing more modular code.

Constraining complexity: It actually makes complexity worse (what do you do when microservice A goes down? times out? returns data you can't deserialize? takes too long?). These are things you do not have to worry about if you are not doing any inter-process or inter-server communication.

Code reuse: No. Why would it help code reuse?

Additionally....

https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...


Bingo! It's extra funny when someone that dislikes distributed systems is pro SOA. How do SOA advocates not realize they are signing up for distributed systems? If you think keeping business logic decoupled is hard, try keeping network logic decoupled, oh and that unreliability thing, too.


Probably yes, it's a problem of baggage. If you were to start clean, you would need two things:

1. Support full threading and one of the memory models that allows it. For inspiration, look at Clojure, Rust, Go, and Erlang (in no particular order).

2. Have a standard way of communicating between processes, like Erlang has. My favourite would be to implement the SP protocol family developed as part of the nanomsg library (https://github.com/nanomsg/nanomsg/tree/master/rfc).


This article raises many relevant points.

But the real issue behind all of that is that we lack the means to easily implement protocol stacks. Implementing a new protocol (especially in user space) is a task that can easily eat months or years of your precious time.


Well, for many protocols it's easy. I wrote (a useful subset of) a MySQL protocol parser in a weekend. Sure, that was in Python, so reimplementing it in C would be harder, but still not years. Many protocols (MongoDB, Redis, memcached, beanstalk, etc.) are much simpler.

There are protocols that take months or years, but they are not so ubiquitous (with the obvious exception of HTTP, which is both really complex and ubiquitous). So those may be developed after the basic tools are in place.
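To give a feel for how small some of these protocols are, here is a sketch of a parser for the simple-reply subset of the Redis protocol (RESP); it ignores pipelining edge cases and newer protocol versions, and it is not the parser mentioned above.

    # Parse one Redis (RESP) reply from a complete buffer.
    # Returns (value, bytes_consumed); raises on error replies.
    def parse_reply(buf):
        line_end = buf.index(b"\r\n")
        kind, payload = buf[:1], buf[1:line_end]
        if kind == b"+":                      # simple string, e.g. +OK\r\n
            return payload.decode(), line_end + 2
        if kind == b"-":                      # error, e.g. -ERR ...\r\n
            raise ValueError(payload.decode())
        if kind == b":":                      # integer, e.g. :42\r\n
            return int(payload), line_end + 2
        if kind == b"$":                      # bulk string, e.g. $3\r\nfoo\r\n
            length = int(payload)
            if length == -1:                  # null bulk string
                return None, line_end + 2
            start = line_end + 2
            return buf[start:start + length], start + length + 2
        if kind == b"*":                      # array: <n> replies back to back
            items, pos = [], line_end + 2
            for _ in range(int(payload)):
                value, used = parse_reply(buf[pos:])
                items.append(value)
                pos += used
            return items, pos
        raise ValueError("unknown reply type %r" % kind)

    print(parse_reply(b"*2\r\n$3\r\nfoo\r\n:7\r\n"))  # ([b'foo', 7], 17)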


Parsers are generally easy; I was thinking more about implementation issues like: How can we pipe input from one protocol to another protocol on top of it? How do we poll on a user-space protocol? How do we integrate it with foreign event loops? If in user space, how do we handle process termination when there's still data in the TX queue? How do we handle many different peers at once without having to create a separate thread for each of them? Etc.


I'm sorry to be negative but the grammar mistakes in this article were glaring.


Sorry, English is not my native language. You can use "notes" in Medium to point me to the mistakes.


This is something that one should get over. Some of the most brilliant, and downright nice, people I've ever worked with had average to poor grammar and spelling. I'm much better off from having worked with, communicated and laughed with them over my life.


Yeah, I think so too.


Do we really need to reinvent the wheel? With microservices and the async craze, it feels like we now need to create a single wheel that is made up of many wheels, and spend an enormous amount of time making it look and feel like the single wheel that was already working fine. Never mind that it makes absolutely no difference to the end user; it will make our next few years interesting, because the old way of building the wheel is boring and unexciting.

I think that after a few years, software businesses will realize that it was an investment with questionable advantages and go right back to what was working fine for the past decade and will continue to work fine.


Well, here is a good article:

https://plus.google.com/+RipRowan/posts/eVeouesvaVX

It seems that Amazon started using microservices in 2002. Do you think 12 years is not enough to learn from mistakes?


Amazon uses microservices because they need to at their scale. But there are maybe a few dozen companies that operate at that scale. 12 years isn't enough time to learn a lot when there are only a few people doing it, just as in e.g. spaceflight.


Do they really use 'microservices'... or just services (SOA)? What are microservices anyway?


What does JavaScript have to do with anything? The article is about Python. It's about how we build the server side, not the client side.


What do microservices have to do with JavaScript?



