There's a large project on our roadmap to rework a big ole chunk of legacy code into something that is actually an asset for the company instead of an anchor.
I'm considering giving it a run with a base of gunicorn/gevent/nginx/pyramid. Seems that gunicorn/gevent give us the ability to use threads where best, but without having to make everything callbacks. And Pyramid gives us a flexible framework to run our web service through (currently the main focus). Kicked around the idea of using M2/0MQ as a way to implement a SOA of sorts, but it feels like a bit too much.
So, if you were starting from scratch and wanted to build a robust high traffic web service (site, app, api, etc), what would you use?
Tornado is solid and proven, however we will explore gevent on uwsgi more in the future. Using gevent for comet/async would enable us to consolidate the Tornado code into Flask, but we have been focused on other stuff so we'll test gevent when we have more time.
Flask is at about the right level of abstraction for what a Web framework should be these days. In this era of the social graph, it can be more interesting to store your social graph in a graph database and use it as your primary datastore. And if you're not using a relational database as the primary datastore, why would you want a framework that's built around an ORM?
ORM-based frameworks are ok if you stay inside the box, but they can get in the way when you're not using the RDBMS for authentication and authorization. And when you strip out all the stuff that's tied to the ORM and auth, you end up with something that looks a lot like Flask. It's usually cleaner to start with something that was designed from the ground up to be a polyglot framework.
P.S. We chose uWSGI because it's high performance and high quality (Roberto is a really smart guy), and there's a little-known feature in the works -- binary connectors that will enable you to hook uWSGI directly to Varnish and HAProxy over a binary protocol and thus eliminate the HTTP overhead.
At scale yes. They're built to solve different problems and excel at each individually. Nginx can proxy, but HAProxy is more flexible. Nginx can cache, but Varnish is much more flexible and efficient. We use all three technologies and route the traffic to the service best suited for the traffic.
I'm a fan of Django, so my ideal stack looks something like this:
* puppet - managing server packages / infrastructure
* monit - monitoring server processes / fixing things
* django - primary web framework and ORM
* amazon mysql - it's hosted, and works via plug-ins with Django
* amazon elastic load balancer - for scaling incoming HTTP requests across multiple web app servers
* amazon autoscale - for spinning up new web app servers to handle spikes in traffic
* rabbitmq - message queueing
* celery - processing async tasks in a robust fashion. must have
* memcached - no explanation necessary
* fabric - deploying software
* jenkins - testing / building software
* nginx - buffering elastic load balancing requests to web app servers
I am considering this stack off the shelf in my next big project:
- uWSGI - performs better than gunicorn and has support for async apps using gevent
- nginx - frontend server
- pyramid - web framework
- mongodb - database
- mongoengine - mongodb and python mapper
- zeromq - messaging and communication
- jinja2 - template engine
- gevent - for async processing
- gevent-zeromq - to make zeromq non-blocking and gevent compatible
- socket-io - JS lib for realtime communication
I still need to develop robust session management. I considered various options and came to the conclusion that if I want something fast, truly distributed, and without sticky sessions, I should come up with my own session manager daemon hosted on each node. I would use ZeroMQ to communicate with it.
Thanks. Actually I did consider Beaker, but it didn't fulfill my requirements. I wanted to replicate sessions across nodes actively and asynchronously, and I also wanted to persist to MongoDB asynchronously.
So here is how I was considering to build:
- sessions updated and validated by a daemon process per node.
- each validation and update will be one call via ZeroMQ's Req/Rep pattern. With each call I can validate the session and reset its timestamp.
- after each validation I will asynchronously replicate the session across the other nodes via ZeroMQ's Pub/Sub (I don't care about extra memory)
- sessions will also persist in MongoDB (async), just in case a node is restarted, thus preserving sessions.
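A minimal sketch of the daemon's in-RAM validate/refresh logic (the class name, method names, and TTL value are my own, hypothetical; the ZeroMQ Req/Rep loop and Pub/Sub replication that would wrap it are omitted):

```python
import time

class SessionStore:
    """In-RAM session store with sliding expiration (hypothetical sketch).

    A per-node daemon would wrap this in a ZeroMQ REP loop: each REQ
    validates a token and, on success, resets its timestamp -- one
    round trip instead of the two Beaker/memcached would need.
    """

    def __init__(self, ttl_seconds=1800):
        self.ttl = ttl_seconds
        self.sessions = {}  # token -> (auth_data, last_seen)

    def create(self, token, auth_data):
        self.sessions[token] = (auth_data, time.time())

    def validate(self, token):
        """Return auth data and refresh the timestamp, or None if invalid."""
        entry = self.sessions.get(token)
        if entry is None:
            return None
        auth_data, last_seen = entry
        if time.time() - last_seen > self.ttl:
            del self.sessions[token]  # expired: drop it
            return None
        self.sessions[token] = (auth_data, time.time())  # sliding window
        return auth_data

    def invalidate(self, token):
        """Kick a user out regardless of TTL."""
        self.sessions.pop(token, None)
```

Replication would then be a matter of publishing each (token, auth_data, timestamp) update on a PUB socket that the other nodes subscribe to.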
Btw, I only want to validate/invalidate a session token and keep authorization information. Any other small values I could simply keep in the cookies, encrypted -- e.g. the user ID. I could keep session information in a cookie as well, but that allows sessions to live forever, and it's not good if I want to kick some user out.
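For those small cookie values, HMAC signing is the usual stdlib approach -- note that signing makes a value tamper-proof but not secret, so a user ID would still be readable; for actual secrecy you'd encrypt as well. A hypothetical sketch with a placeholder secret:

```python
import hashlib
import hmac

SECRET = b"change-me"  # placeholder; load a real secret from config

def sign_value(value):
    """Return 'value|signature' suitable for storing in a cookie."""
    sig = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}|{sig}"

def verify_value(cookie):
    """Return the value if the signature checks out, else None."""
    value, _, sig = cookie.rpartition("|")
    expected = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(sig, expected) else None
```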
Primarily I wanted it to be really fast, be able to invalidate sessions, and be available on any node via active replication. If I used Beaker, I'd have to use memcached, which is not replicated (in the default setup), and each validation would have required two round trips to memcached (validation, update timestamp).
It's easier to write my own with ZeroMQ, and I can build in custom logic at the daemon level.
A while back I looked into Mongo for sessions, but it isn't really designed for this. Sessions are temporal, and AFAIK Mongo doesn't have a built-in mechanism for expiring them; plus session data is usually small -- there are other datastores better suited for this type of data.
Primarily, sessions will live in RAM. I was going to use MongoDB to persist sessions as a fallback, in case the daemon dies or a session is not found in RAM. The reason for using MongoDB is that it's my primary database.
But your concern about growing table of sessions is valid, which I will handle by periodically archiving the old sessions so that index sizes remain small.
Look at the comment, the Twisted code is wrong. By default the Twisted reactor is multiplatform, but you can improve performance for your specific platform; it's in the docs. If you run Twisted under Linux (and you should), use the epoll reactor.
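Selecting the reactor is a one-time configuration step that has to happen before anything imports `twisted.internet.reactor`:

```python
# Install the epoll reactor before the default reactor gets imported.
from twisted.internet import epollreactor
epollreactor.install()

from twisted.internet import reactor  # now epoll-backed
```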
It's not a high-traffic site, but I'm running an app that serves an average of 5 req/s with Mongrel2 + wsgid + MySQL + Django, and that's working pretty well.
Also, the benchmark of Python web servers that gets linked everywhere (http://nichol.as/benchmark-of-python-web-servers) is getting old. I'm planning on doing a new benchmark, probably this coming weekend. As of now, I'm planning to test gunicorn, uWSGI, tornado, bjoern, eventlet, and gevent over HTTP, flup over FCGI, and uWSGI and wsgid over zeroMQ (behind Mongrel2). Thinking of it, I probably need to put all of the HTTP servers behind nginx for a more fair comparison. Am I forgetting any servers that people would like to see benchmarked?
Flask is great for beginners because it's well documented and easy to understand. You can become familiar with Flask in a weekend -- start with the Quickstart and then go through the tutorial (http://flask.pocoo.org/docs/).
You won't have to spend much time learning it or fighting with it -- you won't find yourself asking, "Will I be able to do what I want in the framework without hacking it?" Flask lets you program in Python rather than writing to the framework like you typically have to in larger, opinionated frameworks like Django and Rails.
Ironically, this also makes Flask an ideal choice for advanced Python programmers because it gives you flexibility rather than always wondering "will the framework allow me to easily do...?"
BTW uWSGI is a production app server, not a framework like Django/Pyramid/Flask. For example, nginx has a built-in uwsgi connector, and you can use uWSGI to serve Flask apps (see http://flask.pocoo.org/docs/deploying/).
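For example, the nginx side of that connection is just the built-in uwsgi module -- the socket address and module names below are placeholders:

```nginx
location / {
    include uwsgi_params;         # pass the standard CGI-style variables
    uwsgi_pass 127.0.0.1:3031;    # binary uwsgi protocol, no HTTP overhead
}
```

with the app server started along the lines of `uwsgi --socket 127.0.0.1:3031 --module myapp --callable app`.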
Use Flask to teach yourself about web services in general, and then move on to Django once you have a decent grasp of things.
Flask is still excellent for small to medium, toy-sized projects. Weekend hacks thrown together as a demo, for example. Beyond that I've found the lack of structure and third-party applications in Flask to be a hindrance for more mature sites. I'd say that for any given feature I need on a site, unless it's extremely specific to the site's purpose, 90% of the time I can find it on django-packages and have it dropped in and integrated within 10-15 minutes.
I'd venture to guess that for every hour I spend getting something to work that doesn't fall nicely into Django's structure, I save a hundred or so from not having to re-write a mature component that the community has solved ten times over.
I use http://cherrypy.org/ (behind an nginx revproxy) to my satisfaction. The data is stored either in http://www.mongodb-is-web-scale.com/ or in MySQL. Disclaimer: these websites are not designed for 50,000 hits per second, but during benchmarking I get consistent times like 4 msec per page, and I'm confident that nginx can handle many slow clients simultaneously.
Varnish / frontline server; sends static media requests to nginx and other requests to the uWSGI cluster
Nginx / static media serving
uWSGI / app servers
Django / web framework
PGSQL / relational database
Redis / NoSQL / cache / sessions
RabbitMQ / messaging queue
We use Varnish as a frontend server; it handles the load balancing between our uWSGI servers, and if the request is for a static file it's sent to our nginx server. We then use Redis to store all of our cache and sessions. We cache everything: every time there is a read from our database via the Django ORM, our API grabs the whole object returned and stores it in Redis, so the next time we need to retrieve it we just hit Redis.
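That read path is the classic cache-aside pattern. A minimal sketch (function and key names are my own; a plain dict stands in for Redis here -- a real deployment would use a redis client and `cache.set`):

```python
import json

def cached_fetch(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database,
    then store the serialized result so the next read skips the database.

    `cache` only needs dict-style get/set; a dict stands in for Redis.
    """
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)           # cache hit: no database touched
    obj = load_from_db()                 # e.g. a Django ORM query
    cache[key] = json.dumps(obj)         # with redis-py: cache.set(key, ...)
    return obj
```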
At my job, we are running tornado w/ gunicorn and membase w/ haproxy to load balance (and not much else) and handling quite a bit of traffic. If I were to write my own from scratch I'd want to learn some erlang first ;)
web.py development has stagnated lately. It's a great framework that many other Python frameworks used as inspiration, such as the webapp framework on Google App Engine and Tornado (and even Flask). However, Flask is more modern, under more active development, and is extensively documented. It's simple to use like web.py, and I would argue it's one of the cleanest Python Web frameworks out there.
Tornado is a different animal -- it's asynchronous so when you write code for it you have to program using callbacks rather than in a traditional style.
My current project requires real-time/always-on connections so I started to develop it using Tornado and then decided to switch to the Quora model -- use a traditional Web framework like Flask for most things, and connect back to Tornado for the real-time stuff.
When I switched to this model, the development process sped up considerably. In addition to just being really-well designed, Flask has an amazing debugger that makes me more productive, and it's easier to write unittests for Flask because you can write them in a traditional way and don't have to contend with Tornado's IOLoop.
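The routing for that split can live entirely in the front-end server; an nginx sketch (the path and ports are placeholders I've assumed, not the actual setup):

```nginx
# Long-lived/real-time connections go to Tornado;
# everything else goes to the Flask app.
location /realtime/ {
    proxy_pass http://127.0.0.1:8888;   # Tornado
    proxy_buffering off;                # don't buffer streaming responses
}
location / {
    proxy_pass http://127.0.0.1:5000;   # Flask (behind gunicorn/uWSGI in practice)
}
```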
They're pretty similar. The main difference is that Eventlet supports more event loops, has a messier basic architecture, is slightly slower, and lets you defer blocking tasks to threads. Since I'm connecting to MySQL, which happens through blocking C code in libmysql, that's pretty important to me. Eventlet's db_pool module is very nice.
Personally, I'm happy with eventlet and can't see why I would want to go to gevent. Does anybody using gevent want to chime in on what's better about it?
Well, Twisted is an async framework, and that's the important thing. Back in the day I actually wrote a Twisted protocol to send/receive SMS messages from a modem over serial. In total the code was like 100 LOC. That's how flexible and well-organized it is.
~ Mature. Twisted has been around for a lot longer than Tornado and has learned from all of the available history of modern OS networking.
~ Tested. Twisted's been in production for a long, long time, and is covered with thousands of unit tests. It's official policy that code may not enter the Twisted tree without accompanying tests.
~ Flexible. Twisted can be used as a general-purpose networking library, it can integrate with Pygame and Pyglet, GTK+, Wx, Qt, Tk; it doesn't have to be used for servers.
~ Extensible. Twisted's connectors are explicit, and rely on interfaces and adapters rather than inheritance. As an example, Twisted's SSH library lets you separate the SSH server, SSH channel, and SSH shell from each other. Annoying if you want a standard SSH server, but terrific if you're building a custom SSH proxy or tunnel. (I did this a few weeks ago at work. A lot easier with Twisted than with Paramiko!)
I should note that it's not an either-or; there is a branch of Tornado which throws out Tornado's own event loop and uses Twisted's event loop instead.
The tornado and cyclone APIs are closer to what I am thinking about as a web developer. Virtually never do I make an object and then think about how I'd like to adapt it to return an HTML view or programmatically add it under a parent object.