Hacker News
Uvloop: Fast Python networking (magic.io)
456 points by c17r on May 4, 2016 | 130 comments

> However, the performance bottleneck in aiohttp turned out to be its HTTP parser, which is so slow, that it matters very little how fast the underlying I/O library is.

This is exactly the same observation that motivated the Mongrel web server for Ruby, 10 years ago this year.

"Mongrel is a small library that provides a very fast HTTP 1.1 server for Ruby web applications. [...] What makes Mongrel so fast is the careful use of an Ragel [C] extension to provide fast, accurate HTTP 1.1 protocol parsing. This makes the server scream without too many portability issues." -- https://github.com/mongrel/mongrel

And its successor Thin:

"Thin is a Ruby web server that glues together 3 of the best Ruby libraries in web history: (1) the Mongrel parser, the root of Mongrel speed and security" -- http://code.macournoyer.com/thin/

In case anyone is still wondering, parsing in Ruby/Python/Lua is pretty slow compared to C/C++. That's why I personally have been really interested for a long time in writing parsers in C that can be used from higher level languages. That way you can get the best of both worlds.

It's the best of both worlds but doesn't shield you from the worst of one of the worlds. Untrusted input is still reaching code that has direct access to system memory. Hopefully not, anyway. But probably. Still, it's the way to go if performance is key.

These days you would probably want to write the parser part in Rust, with a small amount of unsafe code to implement a C-compatible API that could then be called from Python, or wherever. I did this for some regexp-based log parsing code written in Python, and saw a considerable (2-3x) performance win. The main outstanding issue is that Rust isn't as easy to distribute as C to random end users (e.g. it likely requires the user to have rustc installed for pip install to work, and rustc isn't always available through standard package managers).
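On the Python side of such a binding, the FFI pattern looks roughly like this. This is a sketch using ctypes, with the C runtime's strlen standing in for the hypothetical parser library (a real project would ship its own compiled .so built from C or Rust):

```python
import ctypes
import ctypes.util

# Stand-in: load the C runtime. A real deployment would ship its own
# shared library (e.g. one built from Rust exposing a C-compatible API).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the signature so ctypes marshals arguments correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def native_len(data):
    # Same round trip a C/Rust parser binding makes: bytes go in,
    # native code does the work, a plain Python value comes back.
    return libc.strlen(data)
```

The per-call overhead of crossing the FFI boundary is why real bindings hand the parser a whole buffer at once rather than calling into C byte by byte.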

Doesn't Conda help with that problem, allowing you to precompile Rust extensions just like C extensions?

Very true. Thankfully fuzzing tools are getting better all the time. LLVM's libfuzzer is great.

Absolutely it's a positive that this is an active area of research. One day these types of problems are going to be something future programmers joke about. Or don't even know existed.

I'd recommend watching this: https://www.youtube.com/watch?v=y0hyqzR6hIY

He goes into detail about how these types of libraries are particularly difficult or impossible to fuzz. He uses OpenSSL as an example but I would imagine an http library being similar.

> He goes into detail about how these types of libraries are particularly difficult or impossible to fuzz.

I just watched the video on 2x and I don't think that's a fair summary. He seems positive on fuzzing in general and mentions that fuzzing found two extremely tricky bugs in libsndfile and flac.

He does point out that there are some cases like OpenSSL that are particularly difficult to fuzz completely because they are encrypted and heavily stateful, creating transient keys on the fly and such. I don't think HTTP has this problem, for the most part.

Cool to see a video of Erik de Castro Lopo though -- I've worked with that guy since the early 2000s when I was working on Audacity (which uses his excellent libsndfile internally -- or at least did at the time).

On a related note, we go to great lengths in the Kestrel HTTP server [0] (which also uses libuv) to have fast HTTP parsing. As an example, we attempt to read the method and the HTTP version as longs and compare them to pre-computed longs in order to have fast comparisons, and we reuse strings containing standard methods and versions (reducing memory allocation is the main driver of our optimizations) [1], so we don't have to allocate those strings on every request. We do a similar thing for headers [2]. We also manage a lot of our own memory, despite using a garbage collected language [3].

[0] https://github.com/aspnet/KestrelHttpServer

[1] https://github.com/aspnet/KestrelHttpServer/blob/3a424f6abac...

[2] https://github.com/aspnet/KestrelHttpServer/blob/dev/src/Mic...

[3] https://github.com/aspnet/KestrelHttpServer/blob/dev/src/Mic...
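The precomputed-long comparison described above can be sketched in Python terms. This is a hypothetical illustration only (Kestrel does it in C# against real 8-byte longs), but the idea of matching a method with one integer comparison instead of a string allocation carries over:

```python
# Precompute integers for known method prefixes once, at startup.
KNOWN_METHODS = {
    int.from_bytes(b"GET ", "little"): "GET",
    int.from_bytes(b"PUT ", "little"): "PUT",
    int.from_bytes(b"POST", "little"): "POST",
}

def match_method(buf):
    # One dict probe on an integer key instead of allocating and
    # comparing a fresh string for every request.
    return KNOWN_METHODS.get(int.from_bytes(buf[:4], "little"))
```

Matching also returns a shared, preallocated string, so a hot server never allocates a new "GET" per request.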

Have you benchmarked it?

This is also what inspired the uwsgi[1] protocol in the uWSGI project[2].

1. https://uwsgi-docs.readthedocs.io/en/latest/Protocol.html

2. https://uwsgi-docs.readthedocs.io/en/latest/

Check out http-parser for Python: https://pypi.python.org/pypi/http-parser

This uses Dahl's original http_parser.c FSM. With a little work you can write a WSGI handler around it. Highly recommend.
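For contrast, the stdlib's pure-Python parser can be driven on a raw request the same way. This is a sketch pressing BaseHTTPRequestHandler into service, not the http-parser package's API; a C binding exposes the same fields via callbacks, just much faster:

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO

class ParsedRequest(BaseHTTPRequestHandler):
    # Reuse the stdlib's (pure-Python, hence slower) request parser
    # outside of a live socket by feeding it an in-memory buffer.
    def __init__(self, raw):
        self.rfile = BytesIO(raw)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()  # sets .command, .path, .request_version, .headers

    def send_error(self, code, message=None, explain=None):
        # Record parse failures instead of writing an HTTP error response.
        self.error_code, self.error_message = code, message

req = ParsedRequest(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n")
```

After construction, `req.command`, `req.path`, and `req.headers` hold the parsed pieces, which is roughly the surface a WSGI handler needs.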

How much of a parsing stack can you build atop Ragel? I gave Ragel a good look over recently when checking out parsers for Ruby, because Ragel has a Ruby binding. I chose a Ruby PEG† library (Parslet) because I decided that there was too much of a gap between the low-level finite automata machinery that Ragel provides and the parser generation I was looking for. Was I wrong to decide that, I wonder.

† On a related note, I'm finding it difficult finding (fast) GLR or GLL parser generators for Ruby

I think you're right: the gap between Ragel and what you want in higher-level languages is too wide to make a stack out of it IMO.

I'm working on what I consider a parsing stack. The key observation, in my opinion, is that you need a way of representing structured data in the high-level language that is both rich and efficient. I think Protocol Buffers can serve that role.

Once you have the structured data representation, you can write parsers to read/write it. But you want the parsers to be independent of how you represent the structured data in any one language.

I've spent the last little while browsing through the upb and gazelle repos. I'm struggling to get a high-level overview of what you want to do. I'm curious though because what you're saying here on HN intrigues me.

Also note that David Beazley (google him if you're not aware) has a competitor to asyncio, which arguably is also a direct competitor to this called curio:


The caveat is that it uses the new async/await coroutine bits that just landed in Python 3.5, so it only works with Python 3.5+. He also gave a talk on concurrency in python recently at last year's PyCon:


Unless I'm mistaken, the @asyncio.coroutine decorator is equivalent to async def and yield from is functionally a drop-in for await, so you should be able to use it with at least 3.4, maybe 3.3. Not that that's much better, though.

That makes it functionally equivalent to uvloop (which from my understanding is a drop-in replacement for the built-in asyncio eventloop).

There are benchmarks of curio in the blog post, FWIW.

Good stuff. I saw him RT your announcement of this on twitter.

This isn't a fair comparison: the "HTTP server" presented isn't doing any of the checks/validation that a typical web server does.


The benchmark is almost equivalent to testing raw eventloop performance against a complete http server.

To make this test fair, you need to write the same code in asyncio_http_server.py in other languages.

Keep in mind that the HTTP server in the benchmarks actually uses the httptools parser, which is a full-blown HTTP parser. A lot of heavy-lifting is done by the parser.

I plan to add a complete HTTP protocol implementation to httptools, but honestly, I don't expect it to be more than 20% slower.

> This isn't a fair comparison. the "HTTP server" presented isn't doing any checks / validation that a typical web server does.

How so? It uses a binding to http-parser, just like nodejs.

Parsing is just a small part of an http server. You can check the links I posted to see how much work is involved in validating http headers (after parsing), and creating http responses.

There is a TCP and a HTTP benchmark.

This is quite interesting, but I don't find req/sec very interesting at all. Benchmarks like these should be about concurrency, that is, how much is being done at once, not how many req/sec are done overall (which could almost be explained away purely by gains in lower latency).

These benchmarks seem to only use 10 clients concurrently, max. That's ridiculously low.

A few questions I'd like to see answered.

How many clients can you connect to each server, and have each one ping once per 5 mins, before the CPU gets overloaded?

How much memory is used per connection?

How does PyPy+Twisted fare in this?

> These benchmarks seem to only use 10 clients concurrently, max. That's ridiculously low.

I've just updated the post with more details on HTTP benchmarks and attached the correct full-results file [1]. The concurrency level for HTTP benchmarks is 300, not 10.

To answer other questions, I'll have to run some benchmarks tomorrow :)

[1] http://magic.io/blog/uvloop-blazing-fast-python-networking/h...

Cool! For the client test I'm referring to, these would be long-lived clients that stay around and just TCP ping for an echo server, rather than HTTP calls that connect/disconnect.

In my experience, PyPy+Twisted is around 5-25x faster than CPython+Twisted, and smoked asyncio as well. Would be great to see how uvloop compares there, and of course, someday when PyPy supports Python 3.5, there's no reason it couldn't use uvloop via cffi, I'd hope.

> [..] Would be great to see how uvloop compares there [..]

Yep, I'm curious to see what will happen there. Do you have any suggestions on what tool to use to generate the load?

> [..] someday when PyPy supports Python 3.5, there's no reason it couldn't use uvloop via cffi I'd hope.

We'll figure that out! ;)

I wrote a tool to evaluate memory per connection one layer up (at websocket level), using autobahn. It works in asyncio, so it should work fine with uvloop.


I'm the dev behind uvloop. AMA.

No questions, just a thank you for including relevant information about the tests' content, concurrency, percentile boxes on the graphs, etc. (and the tested environment itself!). It's refreshing to see a benchmark taken seriously rather than "we tested some stuff, here are 3 numbers, victory!".


1) What makes uvloop which is based on libuv 2x faster than node.js which is also based on libuv?

2) Can uvloop be used with frameworks like flask or django?

3) gevent uses monkey patching to turn blocking libraries such as DB drivers non-blocking, does uvloop do anything similar? If not how does it work with blocking libraries?

1) I don't know :( I've answered a similar question in this thread with a couple of guesses.

2) No, they have a different architecture. Although I heard that there is a project to integrate asyncio into django to get websockets and http/2.

3) asyncio/uvloop require you to use explicit async/await. So, unfortunately, the existing networking code that isn't built for asyncio can't be reused. On the bright side, there are so many asyncio DB drivers and other modules now!
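A minimal sketch of what that explicit async/await style looks like, assuming uvloop's documented EventLoopPolicy switch, with a sleep standing in for a real asyncio DB driver call (e.g. aiopg):

```python
import asyncio

# uvloop is a drop-in replacement for asyncio's event loop; if it isn't
# installed, this runs on the default loop with identical semantics.
try:
    import uvloop
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass

async def fetch_user(user_id):
    # Stand-in for an asyncio-native DB driver call; the point is that
    # every I/O hop is an explicit await, not a monkey-patched block.
    await asyncio.sleep(0)
    return {"id": user_id}

loop = asyncio.new_event_loop()
user = loop.run_until_complete(fetch_user(42))
loop.close()
```

Because the switch happens at the policy level, the rest of the application code doesn't change at all when uvloop is added.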

> 2) No, they have a different architecture

Having a web framework (a next-generation Flask, if you will) built from the ground up for concurrency is the missing key. I have used Flask for many years, but this is the right time to introduce a new framework, because of the internal restructuring and slowdown of Flask's maintainers (and I say this with the utmost respect).

If you are keen, this has the potential to be the killer application for Python 3.

One problem with this is that the entire ecosystem has to get on board with async. Maybe a new framework would make it compelling enough, who knows.

This was/is the big issue with Tornado, IMO (and Tornado has been around for ages in framework time). Tornado is only async if the entire call stack all the way down to the http socket is async, using callbacks instead of returning values. This means that any 3rd party client library you use has to be completely written asynchronously, and none are in python. So you end up with a lot of tedious work re-implementing http client libraries for Twilio or Stripe or whatever you're using.

I'm curious to see where asyncio goes in python, but I'm a bit skeptical after seeing how much of a pain it was to use Tornado on a large web app. In the meantime I'll be using Gevent + Flask, which isn't perfect since it adds some magic & complexity but has the huge upside of letting you keep using all the libraries you're used to.

There is already http://www.tornadoweb.org/en/stable/ which is python3 compatible and shipped in production software across lots of companies (last I heard, Hipmunk and Quora use it)

Tornado is excellent... But Flask is better than excellent. The mental map of Flask is incredible. Tornado is a little hard to grok. Now one may argue that Tornado is hard by choice...to not mask the complexity. But then we have node..the most hip of frameworks out there. Node's true innovation was not performance, but to simplify the mental model of async. Obviously there's all the callback hell and all..but still.

I'm pretty sure both Node and Tornado have the same mental model of async. Both have callbacks and both have async/await functionality that makes it look more like blocking code.

The main benefit of Node as I see it is that the entire Node community uses the same IO Loop whereas Python's community is fragmented between normal sync code and multiple different IO Loops (asyncio will probably help with this).

I'm actually thinking about writing such a framework :) I'll call it "spin".

You totally should. I think people are hungry for a next gen async web framework... And if it leverages Python 3, then so much the better. Flask cannot go here even if it wants because its core philosophy is to be WSGI compliant. You don't necessarily have to adhere to that.

Just one point, please make sure you have designed DB access as a core part of your framework (e.g. [1]). Too many frameworks discount database interaction until it's too late.

Oh and please please choose your name so that it doesn't conflict on Google search. http://www.spinframework.org

[1] http://initd.org/psycopg/docs/advanced.html#async-support vs https://github.com/chtd/psycopg2cffi

Since spin is conflicting, may I suggest spyn? Gets that nice little Python 'py' in there.

Can also be read as "spine".

> Just one point, please make sure you have designed DB access as a core part of your framework (e.g. [1]). Too many frameworks discount database interaction until it's too late.

I strongly advise against this. One of the reasons Flask is so attractive is the fact that it does not enforce any database on you.

Thanks to its decoupled design you can use it purely as a routing library, which is great! Letting the framework decide something as important as the database is a bad idea. [1]

[1] https://blog.8thlight.com/uncle-bob/2012/08/13/the-clean-arc...

that is ok - but there is a representative library that works. Loosely decoupled but definitely working is beautiful... and this is why I use Flask in my startup.

what frequently happens is a web framework without a thought for any kind of DB interaction (or, as you put it, a routing library). In things like an async web framework, that could leave users hanging. For example: psycopg vs psycopg2 vs psycogreen vs psycopg2-cffi. Tell me which one to use and benchmark it.

I agree, it's a fine line. And we can keep going back and forth on whether a framework should "recommend" or not. But in a case like this, I think there will NOT be a lot of libraries that are compliant with the async use case. I would hope that this framework will recommend... but not ship "batteries included".

If such a framework was built around asyncio, do we really need yet another database layer? One of the best things about Flask is that it doesn't reinvent the wheel.

It's not so much another DB layer as an async-compatible DB layer.

Most database libraries don't play very well with non-blocking code. In fact, nodejs DB libraries were specifically designed for this.

Building async frameworks is not trivial - http://initd.org/psycopg/docs/advanced.html#async-support

I'd like to chat with you about this, share ideas for API, and architectures. Even if we don't work to each others, we can benefit from sharing ideas about what should a next gen framework look like in Python.

Go for it! An async python web framework with the speed of Go and the ecosystem of python would be a killer app.

We are actually working on a project like this with Tygs (https://github.com/Tygs/tygs, which will use crossbar.io), and right now it's taking ages just to get the app life cycle right.

Indeed, we want to make it very easy to use, especially a clean/discoverable API, simple debugging, clear errors, etc. Which is the stuff async frameworks are not good at, and an incredible added value.

It's a lot more work than we initially thought. When you are doing async, you think not in terms of a sequence of actions ordered in time, but in terms of events. So for your framework to be usable, you must provide hooks to:

    - do something at init
    - register a component
    - do something when a component is registered
    - do something once it's ready
    - do something when there is an error
    - do something when it shuts down
With minimum boilerplate, and maximum clear error handling when the user tries to do it at the wrong time or in the wrong way.
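Those hook points could be sketched as a small async event registry. This is a hypothetical illustration, not Tygs' actual API:

```python
import asyncio
from collections import defaultdict

class LifeCycle:
    # Minimal sketch of the hook points listed above: init, component
    # registration, ready, error, shutdown.
    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, event):
        # Decorator that registers a coroutine for a named lifecycle event.
        def register(func):
            self._hooks[event].append(func)
            return func
        return register

    async def trigger(self, event, **info):
        # Run every registered hook for this event, in registration order.
        for func in self._hooks[event]:
            await func(**info)

app = LifeCycle()
events = []

@app.on("ready")
async def announce():
    events.append("ready")

@app.on("shutdown")
async def cleanup():
    events.append("shutdown")

loop = asyncio.new_event_loop()
loop.run_until_complete(app.trigger("ready"))
loop.run_until_complete(app.trigger("shutdown"))
loop.close()
```

The hard part the comment describes is everything around this core: clear errors when a hook is registered too late, hooks that raise, and playing nicely with an already running loop.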

But, we learned while coding the project that, with asyncio, exceptions in coroutines don't break the event loop (except KeyboardInterrupt, which is a weird hybrid), while exceptions from the loop itself do break it.

Plus you have to make a nice setup, which auto-starts the event loop with good defaults so that people don't have to think about it for simple apps. But it must be overridable, and handle the case where your framework is embedded inside an already started event loop, and make it easy to do so.

It's one of the strong points of gevent: you don't have to think about it. With asyncio/twisted, you have the benefit of explicit process switches and parallelism, but you pay the price in verbosity and complexity. We try to strike a balance, and it turns out to be harder than expected.

Then you have to provide clear error reporting, especially ironing out the mix of Tasks, Futures, coroutine functions and coroutines, and provide helpers so that common scheduling is done easily...

And we haven't even talked about HTTP yet. This is just proper asyncio management. This is why nobody has made a killer framework yet: it's a looooooot of work, it's hard, and it's very easy to get it wrong. Doing like <sync framework> but async doesn't cut it.

You've basically described Twisted. It's unfortunate that when twisted was initially developed python didn't have some of the syntactic niceness (coroutines, async) that would have made it a bit easier/cleaner to use.

you are doing excellent work. you should totally join hands with the OP and make something great!!

sounds good! Go forth!

https://github.com/pyGrowler/Growler "A micro web-framework using asyncio coroutines and chained middleware."

Forgive me if I'm misunderstanding what you're looking for, but doesn't Tornado fit that bill?

Tornado is still too low-level.

    - It's verbose compared to Flask.
    - It reinvents the wheel, while Flask uses great components such as Werkzeug.
    - If you want to make a component, it's complex.
    - It doesn't come batteries-included for the Web like Django, just the bare minimum.
    - It misses the opportunity to provide task queues, RPC or PUB/SUB, which are key components of any modern stack and are made easily possible by persistent connections.
    - It ignores async/await's potential for unifying threads/processes/asyncio and doesn't allow easy multi-CPU use.
Don't get me wrong, I think Tornado is a great piece of software, but it's no match for the innovative projects we see in Go or NodeJS, such as Meteor.

We will have to agree to disagree on many of your points as they are mostly personal opinions, but the last one is flat wrong. Tornado has for a long time had great support for fully leveraging all CPU cores. Happy to show sample code if you like.

My mistake. My knowledge on tornado is dated.

Yeah, it's hard for Python to compete with languages like Go and Erlang that multiplex IO in the runtime.

It's not hard for the language, but we don't have the proper framework yet. Because such a framework is quite a task.

I'm the main person in Django working on 2), and this is interesting for sure, though our current code is based on Twisted since we need Python 2 compatibility (there's some asyncio code, but not a complete webserver yet)

What are the chances that Django Channels (the project I believe you're working on) will be able to ship with support for asyncio/uvloop?

You can use guv[0] if you're looking for a UV based version of gevent. I've used it before with good success.

[0] https://github.com/veegee/guv/blob/develop/README.rst

The answer to 2) is likely to be "no", as WSGI is synchronous in nature. The only way to run it async is through gevent.

I'm working on a Python CLI that uses asyncio/aiohttp to make and process requests to a 3rd party API. Anyways, I ran into the 10,000 socket problem today and ended up using a semaphore, which actually boosted the overall performance. Why is that? Is it just because the CPU is overwhelmed otherwise?

It depends. Maybe you aren't closing the sockets properly (shutdown + close). Or maybe, because of how TCP works, the sockets are stuck in timeouts and don't really close for a long period of time. If it's something like that, then your old connections aren't really closing, and new ones can't be created.

Or maybe it's a simple problem of aiohttp performance -- as shown in the blog post, its HTTP parser is a bit slow.

In general, I'd recommend using fewer sockets and implementing some pipelining of API requests.
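The semaphore approach from the question above can be sketched like this (assumed structure; the sleep stands in for the real aiohttp request):

```python
import asyncio

async def fetch(sem, i):
    # The semaphore caps how many "requests" are in flight at once,
    # keeping the open-socket count well under OS limits.
    async with sem:
        await asyncio.sleep(0)  # stand-in for the actual aiohttp call
        return i

async def main():
    sem = asyncio.Semaphore(100)  # at most 100 concurrent requests
    # 1000 tasks are created, but only 100 hold the semaphore at a time.
    return await asyncio.gather(*(fetch(sem, i) for i in range(1000)))

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
```

Capping concurrency often improves throughput in practice: the process stays under its file-descriptor limit and each active connection gets serviced sooner.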

Are you making all connections within a single session?

Not long ago I saw an example here where someone was creating a new session for every single connection. That is not a very optimal way of using it. If you stay within the same session, aiohttp will make use of keep-alive, which in turn will reuse existing connections and reduce overhead. You also won't need to use a semaphore, since you can define a limit in TCPConnector.

Why did you have performance issues? As others said, you were making thousands of connections, and each socket needs to sit in the TIME_WAIT state for 2 minutes after closing (a limitation of TCP; SCTP does not have this problem). So if you use all connections within a short time, you'll essentially run out of them. Some people use tcp_tw_reuse/recycle, and that solves this issue, but it makes your connections no longer follow the RFC and you might encounter strange issues later on. The advice above should resolve your problem without any hacks.

> at least 2x faster than nodejs, gevent, as well as any other Python asynchronous framework

I did not see any benchmarks in the repo to support this. How was this statistic determined?

For benchmarks we have a separate project on the GH: [1] -- you can run the benchmarks on a Linux box pretty easily. Here's an example output: [2]

[1] https://github.com/MagicStack/vmbench [2] http://magic.io/blog/uvloop-blazing-fast-python-networking/t...

Not sure what you mean, the benchmark section lists ~40k packets/s for nodejs and asyncio, ~100k for uvloop (for 1KiB packets, with a similar difference for 10 and 100 KiB) - and ~20k req/s for nodejs, and ~37k for uvloop w/httptools. Interestingly, uvloop pulls ahead for the 100KiB request size in the http case.

The benchmark (simple echo server) results were right there in the blog post:


Building on to this, how does it compare to raw libuv in C? Personally, I'm not surprised that Python (especially Cython) is faster than node in this case, but I'd still like to see how much less overhead there is compared to node.

> Building on to this, how does it compare to raw libuv in c?

Building something in C is very hard. uvloop wraps all libuv primitives in Python objects which know how to manage the memory safely (i.e. not to "free" something before libuv is done with it). So development time wise, uvloop is much better.

As for the performance, I guess you'd be able to squeeze another 5-15% if you write the echo server in C.

> I'm not surprised that python (especially cython) is faster than node in this case

Cython is a statically typed compiled language, it can be anywhere from 2x to 100x faster than CPython.

>So development time wise, uvloop is much better.

Yeah, I definitely get that. I'm just trying to see the smaller picture here.

>Cython is a statically typed compiled language, it can be anywhere from 2x to 100x faster than Python.

Ah, my bad for not knowing the difference between Cython and CPython. It seems to me, then, that this isn't really a fair comparison to node, is it? Naturally a statically typed language is going to be faster than a dynamic one. Good on you for including a comparison with Go, though.

> It seems to me, then, that this isn't really a fair comparison to node, is it?

Isn't node written in C?

Yes, but JavaScript is still a dynamic, garbage collected language.

Most of nodejs internals are in C++ on top of libuv. Only a thin layer of JS interfaces wrap that.

Python is also a dynamic, GCed language. uvloop is built with Cython, which uses the Python object model (and all of its overhead!), and CPython C-API extensively (so it's slower than a pure C program using libuv).

I think the uvloop package is written in Cython, but the benchmarks just use Python.

What are some of the non-obvious limitations/peculiarities one should be aware before using this?

Wherever you use asyncio, it should be safe to just start using uvloop (once we graduate it from beta).

uvloop shouldn't behave any differently, I've paid special attention to make sure it works exactly the same way as asyncio (down to when its objects are garbage collected).

Pretty awesome stuff. Do you think aiohttp's http parsing would benefit much on a hypothetical pypy3 interpreter?

Maybe. But I think its parser would be better replaced with the httptools one.

It looks like uvloop requires Python 3.5. How practical is it to create a fork for companies stuck on Python 2.7?

Not really practical. uvloop is designed to work in tandem with asyncio, which is a Python 3-only module. asyncio, in turn, requires 'yield from' support, something that Python 2 doesn't have.

The 'Trollius' module backported asyncio to Python 2, though it required syntax changes. For example, "yield from" becomes "yield From()".

Work on Trollius was stopped a few weeks ago, there wasn't enough interest/use. Call for interested maintainers, http://trollius.readthedocs.io/deprecated.html#deprecated

The problem with Trollius was that asyncio packages needed to explicitly add support for it to work (because Python 2 does not have yield from). Several packages (I remember aiohttp was one of them) did not want to do that.

> How practical is it to create a fork for companies stuck on Python 2.7?

Not very, it's an alternative event loop for asyncio[0] which was introduced in 3.4 and builds upon other Python 3 features (e.g. `yield from`)

[0] https://docs.python.org/3/library/asyncio.html

Use either docker or/and crossbar.io to isolate the 2.7 code as a service if you don't want to port it, then add the rest of the code in 3.

Today it should be good practice to use containers with Python 2.7 and 3.5 or any other versions you like. With that you can handle most scenarios.

Python is actually designed in such a way that you can install multiple major versions and they coexist perfectly fine together.

For example you can install Python 2.6, 2.7, 3.3, 3.4, & 3.5 all on one host without any conflicts. The limitation is that many distributions prefer not to maintain different versions of supposedly the same language.

If you use RedHat or CentOS you can just use https://ius.io/ and get access to the other python versions. This is one of few repos that makes sure the packages don't conflict with system ones.

This is awesome!

How does it compare to PyPy?

Once PyPy3 is available, will this work with it?

> Once PyPy3 is available, will this work with it?

We'll find a way!

do you have perf benchmarks under heavier concurrency? (e.g., how does it do with 100 concurrent clients? 1000?)

The HTTP benchmarks were actually run under concurrency level of 300. I've just updated the post with extra details. See also the full report of HTTP benchmark [1]

[1] http://magic.io/blog/uvloop-blazing-fast-python-networking/h...

Wow. What's the intuition for why python on top of libuv is 2x faster than node on top of libuv?

Two things I'd check first:

1. The benchmarks make servers generate a huge number of objects, so maybe, the GC is under too much pressure.

2. Another possibility is that the v8 JIT can't optimize some JS code in nodejs, or does a poor job.

That said, only careful profiling can answer your question :)

I would be interested in seeing the performance difference in NodeJS TCP echo benchmark by using piping instead of reading/writing manually:


CPython uses ref counting, whereas Node/V8 uses GC. Ref counting schemes generally use much less memory, and GC scans can be expensive when there's a lot of new data, as is the case for a web server.
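The difference is easy to observe: in CPython, an acyclic object is freed the instant its refcount hits zero, with no collector pause (cycles are the exception and still need the cyclic GC):

```python
import weakref

class Response:
    # Stand-in for a short-lived per-request object on a web server.
    pass

r = Response()
alive = weakref.ref(r)  # observe the object without keeping it alive
del r                   # refcount drops to zero -> freed immediately

# No stop-the-world scan was needed; the object is already gone.
gone = alive() is None
```

For a server churning through millions of short-lived request/response objects, this deterministic reclamation is one plausible reason the GC pressure argument favors CPython here.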

How about clustering the Node server? By default, node uses only one process. Making the server use all available processes requires a bit more work (see https://nodejs.org/docs/latest/api/cluster.html or https://github.com/Unitech/pm2) but will probably do more justice to Node.js vs Python.

> but will probably do more justice to Node.js vs Python.

How so? Clustering for node just makes it run in several OS processes. You can (and should) do the same for Python code; it's easy.
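One common way to do that in Python, sketched here with SO_REUSEPORT (available on Linux 3.9+ and BSD/macOS; a pre-fork master handing one listening socket to child processes is the other classic approach):

```python
import socket

def make_listener(port):
    # Each worker process would create its own listening socket;
    # SO_REUSEPORT lets them all bind the same port, and the kernel
    # load-balances incoming connections across their accept() calls.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

a = make_listener(0)              # port 0: let the kernel pick a free port
port = a.getsockname()[1]
b = make_listener(port)           # a second "worker" binds the same port
same_port = b.getsockname()[1] == port
a.close()
b.close()
```

Each worker would then run its own event loop (asyncio, uvloop, or node alike) around its own listener, which is essentially what node's Cluster module automates.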

Asyncio is pretty fast, but as soon as you write any Python logic, requests per second will drop significantly.

I'm surprised uvloop is faster than node.js, since libuv was developed for the latter. Kudos to the author.

> Asyncio is pretty fast, but as soon as you write any Python logic, request per second will drop significantly.

Sure, but it really depends on how complex your Python code is.

> I'm surprised uvloop is faster than node.js since libuv was developed for the latter. Kudos to the author.


Can't one write the logic in cython also?

Looks good. Wonder why these benchmarks never include an actual database. I'm using firebase with nodejs and no matter how many requests per second my server can respond with, it's ultimately constrained by data requests and memory (ie how many connections can I wait on).

When I see numbers like 50k requests per second it's meaningless, unless I have no database or some kind of in memory cache only db.

It's not meaningless for new-gen apps that trade a lot of DB requests for message passing. In a micro-services app, when you update something, you propagate the change, and you get x client updates for one DB request instead of x + 1 DB requests. In that context, broadcasting quickly to a lot of clients is important.

Are you using this code in production?

I'd say it's not yet ready for production. The test coverage is fairly decent, though, so I hope we'll make a stable release soon.

That said, uvloop should be fully compatible with asyncio. All APIs should be ready, so you can start testing it in your projects.

I think claims about being faster than X require showing the code used in the benchmarking.

Time to take a rest for JS? :)

It would be great to see how this compares to async frameworks in Java, for example Netty and Vert.x, in the benchmarks.

Feel free to make a PR to https://github.com/MagicStack/vmbench!

Does this mean uvloop could allow using the same event loop implementation on all three major OSes?

asyncio already allows that.

uvloop, as of now, only runs on *nix; but I hope we'll have Windows supported soon too.

It is interesting that uvloop-streams is almost identical to gevent in performance. Gevent is based on libev, which libuv originally wrapped before growing its own event loop.

What exactly is the -streams addition that makes uvloop-streams perform so much worse than plain uvloop?

The streams implementation is a pretty big chunk of Python code that manages flow control, buffering, and integration with coroutines. You don't always need all of that when you're writing a protocol, since you can implement those pieces more efficiently as part of the protocol parser.
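The lower-level alternative being described here is asyncio's callback-based transport/protocol API, which skips the stream buffering layer entirely. A minimal echo sketch:

```python
import asyncio

class EchoProtocol(asyncio.Protocol):
    """Callback-based handler: no StreamReader/StreamWriter buffering
    sits between the socket and your parser."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # A real server would feed `data` straight into its protocol
        # parser here; this sketch just echoes the bytes back.
        self.transport.write(data)

async def round_trip():
    loop = asyncio.get_running_loop()
    server = await loop.create_server(EchoProtocol, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"hello")
    reply = await asyncio.wait_for(reader.readexactly(5), timeout=5)
    writer.close()
    server.close()
    await server.wait_closed()
    return reply
```

This is why "uvloop" and "uvloop-streams" benchmark differently: a protocol gets raw bytes via `data_received()` callbacks, while streams add a Python-level buffering and flow-control layer on top.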

Is it possible to add a benchmark for gevent using raw sockets rather than StreamServer (assuming the latter adds similar overhead)?

AFAIK, StreamServer should actually affect the benchmark in a positive way. In gevent's case, StreamServer isn't about high-level abstractions; it's about making sure that client sockets always have a READ flag set in the I/O multiplexor (epoll, kqueue, etc.).
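The "READ flag in the I/O multiplexor" idea can be illustrated with the stdlib selectors module, which wraps epoll/kqueue. This is a generic sketch of the mechanism, not gevent's actual internals:

```python
import selectors
import socket

# DefaultSelector picks the best multiplexor for the platform
# (epoll on Linux, kqueue on BSD/macOS).
sel = selectors.DefaultSelector()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen()

client = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()
conn.setblocking(False)

# Keep the client socket registered with a READ flag, so incoming data
# shows up as a read-readiness event on the next poll.
sel.register(conn, selectors.EVENT_READ)

client.sendall(b"ping")

events = sel.select(timeout=5)
readable = [key.fileobj for key, _ in events]
```

Keeping sockets permanently registered avoids the churn of adding and removing them from the multiplexor on every read, which is the win being described.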


Unless you're working on a legacy project (in which case you wouldn't be replacing the lib for such stuff anyway), I don't see any reason to use 2.7.x over 3.x, unless you desperately need something like gevent.

gevent got 3.x support a while ago.

I was waiting for this LOL... I think the parent was being sarcastic here :-)

The benchmarks for node.js are terribly misleading. The node.js implementations only ever spawn a single process, and thus node is only running on a single core and uses only a single thread.

Specifically, the http server example [1] doesn't even bother using the Cluster module [2] from the standard library. Cluster is specifically designed for distributing server workloads across multiple cores.

All node.js services/applications I've worked on in the past 3 years (that are concerned with scale) utilize a multi-process node architecture.

The current benchmark can only claim that a single python process that spawns multiple threads is 2x faster than a single node.js process that spawns only one thread.

This fact may be interesting to some, but is irrelevant to real world performance.

[1]: https://github.com/MagicStack/vmbench/blob/master/servers/no...

[2]: https://nodejs.org/api/cluster.html
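For comparison, a rough Python analogue of Node's Cluster pattern is forking workers that all accept on a shared listening socket. A sketch with hypothetical names (`start_cluster`, `worker`); it relies on workers inheriting the socket via multiprocessing, so it is most at home on Unix:

```python
import multiprocessing
import os
import socket

def worker(listener: socket.socket) -> None:
    """Each worker blocks in accept() on the same inherited socket;
    the kernel distributes incoming connections between them."""
    while True:
        conn, _ = listener.accept()
        conn.sendall(b"handled by pid %d" % os.getpid())
        conn.close()

def start_cluster(num_workers: int = 2):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: OS assigns a free port
    listener.listen(128)
    workers = [
        multiprocessing.Process(target=worker, args=(listener,), daemon=True)
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    return listener, workers
```

Whether the benchmark should include this is exactly the dispute in this thread: per-core throughput of a single event loop versus whole-machine throughput of a cluster.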

There is nothing misleading about the benchmarks. It is explicitly said that ALL frameworks were benchmarked in single-process and single-thread modes.

Yes, in production you should run your node.js app in a cluster, run your Python apps in a multiprocess configuration, and never use GOMAXPROCS=1 for your Go apps!

Running all benchmarks in multiprocess configuration wouldn't add anything new to the results.

The main premise of my comment is that the benchmarks do not resemble real-world performance, and are therefore misleading.

The comment above (https://news.ycombinator.com/item?id=11626762) further expands on why these kinds of benchmarks, although interesting, have no real value.

Each implementation does something wildly different and responds to different inputs with completely different outputs.

To put it metaphorically, if you put a car engine in two completely different chassis and then race them on a track, you aren't gaining any real insight into relative performance of the engine in the two vehicles.

Also, just to be clear, my qualms are with the benchmarks alone, I think the library is great! Thanks for all the hard work :)

I guess I look at the benchmarks in a bit different light.

These benchmarks primarily compare event loops and their performance. The TCP benchmark is very fair; the HTTP one, maybe not so much. The point is to show that you can write super fast servers in Python too; you just need a fast protocol parser.

As for the HTTP benchmarks, I plan to add more stuff to httptools and implement complete HTTP protocol support in it. I'll rerun the benchmarks then, but I don't expect more than a 20% performance drop.

Since this is a benchmark of event-loop-based frameworks, it makes sense to only spawn a single event loop and test against that. I looked through the code for the Python servers, and they are all configured for a single event loop, making this a comparison on equal footing.

Yes, it's true that you normally run multiple node processes in production, but you likewise normally run multiple asyncio/tornado/twisted processes in production as well. I don't see it as a big deal, or as misleading, to compare them in this sense.

It says that all the benchmarks are single-threaded, and it even mentions at the end that you could push performance further on multicore machines.

It doesn't matter anyway; with one thread per core, it would be pretty straightforward to scale on beefier machines.

Do the other benchmarked servers spawn multiple processes?

No, even Go is explicitly configured to only use one scheduler:

> We use Python 3.5, and all servers are single-threaded. Additionally, we use GOMAXPROCS=1 for Go code, nodejs does not use cluster, and all Python servers are single-process.
