
Ask HN: What is your preferred Python stack for high traffic webservices? - bigethan
There's a large project on our roadmap to rework a big ole chunk of legacy code into something that is actually an asset for the company instead of an anchor.

I'm considering giving it a run with a base of gunicorn/gevent/nginx/pyramid. It seems that gunicorn/gevent give us the ability to use threads where best, but without having to make everything callbacks. And Pyramid gives us a flexible framework to run our web service through (currently the main focus). I kicked around the idea of using M2/0MQ as a way to implement a SOA of sorts, but it feels like a bit too much.

So, if you were starting from scratch and wanted to build a robust high traffic web service (site, app, api, etc), what would you use?
======
espeed

      * haproxy - frontline proxy
      * varnish - app server cache
      * nginx - static files
      * uwsgi - app server
      * flask - primary framework
      * tornado - async/comet connections
      * bulbflow - graph database toolkit
      * rabbitmq - queue/messaging
      * redis - caching & sessions
      * neo4j - high performance graph database
      * hadoop - data processing

~~~
lamnk
Why Varnish in front of Nginx? AFAIK Nginx can pretty much handle the role of
Varnish.

~~~
espeed
Varnish connects directly to the app servers -- it's not in front of nginx
(nginx is to the side serving other content).

------
b14ck
I'm a fan of Django, so my ideal stack looks something like this:

    
    
      * puppet - managing server packages / infrastructure
      * monit - monitoring server processes / fixing things
      * django - primary web framework and ORM
      * amazon mysql - it's hosted, and works via plug-ins with Django
      * amazon s3 - storing static assets (images, css, javascript, etc.)
      * amazon elastic load balancer - for scaling incoming HTTP requests across multiple web app servers
      * amazon autoscale - for spinning up new web app servers to handle spikes in traffic
      * rabbitmq - message queueing
      * celery - processing async tasks in a robust fashion. must have
      * memcached - no explanation necessary
      * git
      * fabric for deploying software
      * jenkins for testing / building software
      * nginx for buffering elastic load balancing requests to web app servers

~~~
thedangler
how many EC2 servers do you have running for all this?

------
antimora
I am considering this stack off the shelf in my next big project:

\- uWSGI - performs better than gunicorn and has support for async apps using
gevent

\- nginx - front end server

\- pyramid - web framework

\- mongodb - database

\- mongoengine - mongodb and python mapper

\- zeromq - messaging and communication

\- jinja2 - for template engine

\- gevent - for async processing

\- gevent-zeromq - to make zeromq non-blocking and gevent compatible

\- socket-io - JS lib for realtime communication

I still need to develop robust session management. I considered various
options and came to the conclusion that if I want something fast, truly
distributed, and not reliant on sticky sessions, I should write my own session
manager daemon hosted on each node. I would use ZeroMQ to communicate with it.

~~~
bigethan
Is the uWSGI choice based purely on performance? I like gunicorn for its
simplicity; uWSGI has tons of options, though I wonder if they make management
more difficult.

~~~
antimora
Yes, it was primarily because of the performance. According to these
benchmarks, Gunicorn totally choked: "... At the bottom we have Twisted and
Gunicorn ..."

<http://nichol.as/benchmark-of-python-web-servers>

~~~
mardiros
Look at the comment; the Twisted code is wrong. By default the Twisted reactor
is multiplatform, but you can improve performance for your specific platform.
It's in the docs. If you run Twisted under Linux, and you should, use the epoll
reactor.

------
lightcatcher
It's not a high traffic site, but I'm running an app that serves an average of 5
req/s with Mongrel2 + wsgid + MySQL + django and that's working pretty well.

Also, the benchmark of Python web servers that gets linked everywhere
(<http://nichol.as/benchmark-of-python-web-servers>) is getting old. I'm
planning on doing a new benchmark, probably this coming weekend. As of now,
I'm planning to test gunicorn, uWSGI, tornado, bjoern, eventlet, and gevent
over HTTP, flup over FCGI, and uWSGI and wsgid over zeroMQ (behind Mongrel2).
Thinking of it, I probably need to put all of the HTTP servers behind nginx
for a more fair comparison. Am I forgetting any servers that people would like
to see benchmarked?

~~~
antimora
Could you also try benchmarking uWSGI in async mode? Preferably with gevent?

Looking forward, thanks!

------
timc3
This is what I am using currently:

    
    
      * haproxy - frontline proxy
      * nginx - static files and back proxy
      * supervisord - service uptime
      * gevent/meinheld - wsgi
      * django
      * gevent/eventlet - websockets/comet
      * postgresql - Database obviously
      * memcached - caching for django
      * rabbitmq - message queuing
      * celery - message processing
      * fabric - deploying
      * hudson - building

~~~
Pewpewarrows
Just described the stack that I use to a "T".

The only things I would add are "solr" for search, and "redis" for
miscellaneous speed improvements, such as statistics tracking and counting.

------
jsherer
Surprised at the low number of CherryPy posts in this thread. Not only is it a
great framework, it supports Python3 out of the box. My stack:

\- ubuntu/debian - apt ftw

\- python 3

\- haproxy - proxy

\- nginx - w/ uwsgi

\- cherrypy - framework that supports PY3

\- sqlalchemy - orm and sql

\- postgres - relational storage

\- mongodb - "mandatory" NoSQL

\- 0MQ - messaging

------
andybak
You should find Simon Willison's talk about Building Lanyrd very relevant.

Slides and video here: <http://lanyrd.com/2011/brightonpy-building-lanyrd/>

------
kingkilr
      * nginx
      * gunicorn
      * Django
      * PostgreSQL
      * memcached
      * Whatever else I need to implement the logic of the site (redis, celery, etc.)

------
amitutk
I am a newbie to using python for web services. Will django be better to start
with or should I consider pyramid/flask/uWSGI as suggested here?

~~~
espeed
Flask is great for beginners because it's well documented and easy to
understand. You can become familiar with Flask in a weekend -- start with the
Quickstart and then go through the tutorial (<http://flask.pocoo.org/docs/>).

You won't have to spend much time learning it or fighting with it -- you won't
find yourself asking, "Will I be able to do what I want in the framework
without hacking it?" Flask lets you program in Python rather than writing to
the framework like you typically have to in larger, opinionated frameworks
like Django and Rails.

Ironically, this also makes Flask an ideal choice for advanced Python
programmers because it gives you flexibility rather than always wondering
"will the framework allow me to easily do...?"

BTW uwsgi is a production app server, not a framework like
Django/Pyramid/Flask. For example, nginx has a built-in uwsgi connector, and
you can use uwsgi to serve Flask apps (see
<http://flask.pocoo.org/docs/deploying/>).
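
To make the app-server/framework split concrete, here is a minimal sketch of a bare WSGI callable. WSGI is the interface that servers such as uwsgi and gunicorn speak, and that frameworks like Flask implement for you. The names `application` and `call_app` are made up for illustration, and only the standard library is used, so this is not anyone's actual stack from the thread.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable: the contract between app server and application."""
    body = b"Hello from a plain WSGI app\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke the app in-process with a fake request, roughly as a server would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(application)
```

An app server would be pointed at `application` and handle sockets, workers, and process management itself; a framework sits on the other side of the same interface and adds routing, templates, and so on.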

------
ConceitedCode
uWSGI, nginx, pyramid, sqlalchemy, postgresql, mako, beaker and fabric to
deploy

My preferred setup that works for most cases. All reliable and fast.

~~~
bigethan
That does sound lovely. Have you played with any of the evented/thread options
of uWSGI?

~~~
ConceitedCode
Not really. I haven't had a need to yet, but it is on my list to check out.

------
lacion
      * Varnish / Frontline server sends static media to nginx, and other requests to uwsgi cluster
      * Nginx / static media serving
      * UWSGI / app servers
      * Django / Web Framework
      * PGSQL / Relational Database
      * Redis / NoSQL / cache / sessions
      * RabbitMQ / messaging queue

We use Varnish as a frontend server. It handles the load balancing between our
UWSGI servers, and if the request is for a static file it's sent to our nginx
server. We then use Redis to store all of our cache and sessions. We cache
everything, so every time there is a read from our database via the Django ORM,
our API grabs the whole object returned and stores it in Redis, so next time we
need to retrieve it we just hit Redis.
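
The read path described above (check the cache, fall back to the database, store the whole object for next time) can be sketched roughly as follows. This is only an illustration: a plain dict stands in for the Redis client, `fake_db_lookup` stands in for an ORM query, and the class and key names are hypothetical.

```python
import json


class ReadThroughCache:
    """Try the cache first; on a miss, load from the database and cache the result."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # stand-in for a Redis client (assumption)
        self.load_from_db = load_from_db  # stand-in for an ORM query (assumption)

    def get(self, key):
        cached = self.cache.get(key)
        if cached is not None:
            return json.loads(cached)          # cache hit: no database round trip
        obj = self.load_from_db(key)           # cache miss: read from the database
        if obj is not None:
            self.cache[key] = json.dumps(obj)  # store the whole object for next time
        return obj

    def invalidate(self, key):
        """Drop a stale entry, e.g. after the underlying row is updated."""
        self.cache.pop(key, None)


# Hypothetical data source that records how often the "database" is hit.
db_hits = []


def fake_db_lookup(key):
    db_hits.append(key)
    return {"id": key, "name": "example"}


store = ReadThroughCache(cache={}, load_from_db=fake_db_lookup)
first = store.get("user:1")   # miss: goes to the database
second = store.get("user:1")  # hit: served from the cache
```

The part a sketch like this glosses over is invalidation: caching everything only works if writes reliably evict or refresh the cached copy.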

------
z0r
At my job, we are running tornado w/ gunicorn and membase w/ haproxy to load
balance (and not much else) and handling quite a bit of traffic. If I were to
write my own from scratch I'd want to learn some erlang first ;)

------
gtaylor
nginx + gunicorn + pgbouncer + postgres, S3/CloudFront for -all- media. The
gunicorn app servers sit behind one of Amazon's Elastic Load Balancers, but that
could just as easily be HAProxy.

------
BarkMore
nginx - frontline proxy, static files

tornado - web

memcache - cache

mysql - database

------
steve8918
Does anyone have any opinions on web.py? I played around with this and it
seemed pretty easy to use.

~~~
zmitri
Tornado has a very similar feel to web.py, yet can do much more when you are
ready for it. If you like web.py you might as well use tornado.

~~~
espeed
Tornado is a different animal -- it's asynchronous so when you write code for
it you have to program using callbacks rather than in a traditional style.

My current project requires real-time/always-on connections so I started to
develop it using Tornado and then decided to switch to the Quora model -- use
a traditional Web framework like Flask for most things, and connect back to
Tornado for the real-time stuff.

When I switched to this model, the development process sped up considerably.
In addition to just being really-well designed, Flask has an amazing debugger
that makes me more productive, and it's easier to write unittests for Flask
because you can write them in a traditional way and don't have to contend with
Tornado's IOLoop.

For real-time stuff, you could forgo Tornado altogether and instead use
gevent to deploy your Flask app
(<http://flask.pocoo.org/docs/deploying/others/#gevent>), like some have done
with Django and Pyramid, but I haven't tried this yet.

~~~
zmitri
You don't need to write tornado in an async fashion. You can use it like you
would Flask, but it's got a more similar feel to web.py than Flask.

------
samuel1604
eventlet!

~~~
pjscott
Seriously, it's amazing. It's like gevent, but with better documentation.

~~~
bigethan
I was under the impression that gevent was like a new version of eventlet.
Other than the docs (which aren't insignificant), why choose one over the
other?

~~~
pjscott
They're pretty similar. The main differences are that Eventlet supports more
event loops, has a messier basic architecture, is slightly slower, and lets
you defer blocking tasks to threads. Since I'm connecting to MySQL, which
happens through blocking C code in libmysql, that's pretty important to me.
Eventlet's db_pool module is _very_ nice.

Personally, I'm happy with eventlet and can't see why I would want to go to
gevent. Does anybody using gevent want to chime in on what's better about it?
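
For readers unfamiliar with the "defer blocking tasks to threads" idea mentioned above, here is a rough sketch of the pattern. It deliberately uses the standard library's asyncio and `ThreadPoolExecutor` rather than Eventlet's own API, and `blocking_query` is a made-up stand-in for a blocking C-library call such as a libmysql query.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(n):
    """Stand-in for a blocking driver call that the event loop cannot interrupt."""
    time.sleep(0.1)
    return n * 2


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each blocking call runs in a worker thread, so the event loop
        # stays free to service other connections in the meantime.
        futures = [loop.run_in_executor(pool, blocking_query, n) for n in range(4)]
        return await asyncio.gather(*futures)


results = asyncio.run(main())
```

The trade-off is the usual one: the threads keep the loop responsive, but concurrency for those calls is capped by the pool size rather than by the event loop.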

~~~
Continuation
gevent has utilities to turn Python blocking calls into non-blocking ones.

In your case you could switch to a pure Python MySQL driver and gevent will
turn all your MySQL calls into async ones.

Actually there're also gevent non-blocking MySQL drivers that are written in
C.

------
MostAwesomeDude
Twisted/Twisted/Twisted/Twisted. >:3

More seriously, Twisted/Flask/SQLAlchemy has been the formula for the past two
deployments I've done, and I'm happy with it.

~~~
bigethan
Why Twisted over Tornado or other threaded solutions? I'm curious about the
specific strengths that you enjoy.

~~~
mardiros
Twisted is a tcp/udp[/ssl] framework, Tornado is an http one.

In other words, with Twisted, you can write http, ftp, smtp, ... and you are
free to write your own protocol.

~~~
dguaraglia
Well, twisted is an _async_ framework, and that's the important thing. Back in
the day I actually wrote a Twisted protocol to send/receive SMS messages from
a modem over serial. In total the code was like 100 LOC. That's how flexible
and well-organized it is.

------
antihero
PHP 4 running on IIS.

~~~
ashconnor
Did you miss Python in the title?

