Real-time applications and will Django adapt? (arunrocks.com)
121 points by pramodliv1 1375 days ago | 69 comments

(My background: I work for Google, I did a real-time web prototype using the client libraries for GChat back in 2009 when real-time search was all the rage, my Noogler mentor at Google was the frontend tech lead for the eventual real-time search product we launched, and before Google I'd worked in financial software, where real-time responsiveness really is required.)

I think that the folks currently building prototypes in Meteor dramatically underestimate the difficulty of scaling up real-time software to production-grade quality.

The problem is that if a single component in your stack blocks, you are no longer real-time. Any time one client writes into the database and another reads it, you have to poll, since the DB won't give you notifications. (Exceptions: Postgres gives you PQnotifies, Oracle gives you the User Messaging Service, with MySQL it's theoretically possible with triggers and user-defined stored procedures that make a network call, and with MongoDB you can break the DB abstraction and tail the oplog. Good luck plumbing any of these up through your language DB driver and ORM, though.) If you have business logic in a middle-tier server that's request-response only, then that logic becomes a synchronization bottleneck, and you have to constantly update that server and poll it with requests. If your algorithms require complete state snapshots, you're out of luck unless you build a service to manage and update that state consistently while triggering the algorithms whenever it changes. If your algorithms can't run within soft real-time guarantees (dozens to hundreds of milliseconds, usually), you're still out of luck. You need to figure out sharding of state and message notifications yourself. You need to figure out message recovery protocols - most real-time systems have odd consistency problems when messages get dropped due to overload, network failures, or software errors.
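To make the message recovery point concrete, here is a minimal sketch of one common approach: number every message, and have the receiver ask for a replay when it sees a gap. The `Publisher` and `Subscriber` classes are hypothetical names invented for illustration, and everything runs in memory; a real system would also have to bound the log and handle the replay request itself failing.

```python
class Publisher:
    """Keeps every published message in an ordered log so gaps can be replayed."""

    def __init__(self):
        self.log = []  # index i holds the message with sequence number i + 1

    def publish(self, payload):
        seq = len(self.log) + 1
        message = (seq, payload)
        self.log.append(message)
        return message

    def replay(self, after_seq, up_to_seq):
        """Return messages with after_seq < seq <= up_to_seq."""
        return self.log[after_seq:up_to_seq]


class Subscriber:
    def __init__(self, publisher):
        self.publisher = publisher
        self.last_seq = 0
        self.received = []

    def on_message(self, message):
        seq, payload = message
        if seq <= self.last_seq:
            return  # duplicate or stale delivery; ignore it
        if seq > self.last_seq + 1:
            # Gap detected: fetch the missed messages before applying this one.
            for _, missed_payload in self.publisher.replay(self.last_seq, seq - 1):
                self.received.append(missed_payload)
        self.received.append(payload)
        self.last_seq = seq


publisher = Publisher()
subscriber = Subscriber(publisher)

first = publisher.publish("a")
second = publisher.publish("b")  # pretend this one is dropped in transit
third = publisher.publish("c")

subscriber.on_message(first)
subscriber.on_message(third)   # sees the gap and replays "b"
subscriber.on_message(second)  # late arrival is ignored as a duplicate
```

The subscriber ends up with the messages in order even though one was dropped and later delivered out of order.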

Google's real-time search ended up polling every 15 seconds with simple AJAX calls, because when the lag for a post to go through the indexing & serving pipeline is a minute or two (itself a major accomplishment), an additional 15 seconds isn't going to be noticeable to the user.

People on HN love to hate on Twitter engineering, but one thing they've done really well is scale a system that actually is soft real-time and has a lot of potential producers and consumers. This is far from the trivial exercise that someone who picked up Meteor in a weekend might think it is.

I think the twitter-hate was usually because Twitter started with a naive Rails approach on Joyent and would fail whale constantly in the early years.

Just recently they blogged how they finally completed a rewrite that improved per node performance to 100,000 messages per second instead of a few hundred - a better than expected result. It's just a bit OCD inducing how they had to solve problems of their own creation for a long time instead of "doing it right from the start".

I offer no opinion whether or not they were being smart and pragmatic or incompetent hipsters in the beginning.

Edit - Found it, here's the post: https://blog.twitter.com/2013/new-tweets-per-second-record-a...

> It's just a bit OCD inducing how they had to solve problems of their own creation for a long time instead of "doing it right from the start"

That's a really, really silly thing to get annoyed about, and flies straight in the face of lean, agile and MVP. I'd wager that if Twitter had "done it right from the start" then they wouldn't be around at all today.

> and flies straight in the face of lean, agile and MVP

It all depends on how complex it is to do something right from the start, and how right it must be done for you to stay out of trouble. Twitter had a really naïve approach when it started. Had they started with a slightly better (not necessarily harder) approach, they'd have had far fewer problems at a much later time.

MVP has a "V" in the middle. Twitter only survived because their V was extremely flexible.

I don't disagree.

Django's event system lets you avoid polling the DB.

Here is a plugin that offers some real-time capability (through a separate process of course): http://telegraphy.readthedocs.org/en/latest/django-telegraph...

Seems to be the best thing on offer that is off the shelf, but you may be better off rolling your own node, websocket, redis pub/sub, django integration yourself (which actually isn't that hard to do and may give you better flexibility).

If I understand the Telegraph docs correctly, it offers some event-based capability if you interact with the app through your Django models exclusively. If, say, you've got a crawler that's out there discovering new content on the web, or a chatbot that listens on IRC, or another app that interacts with the DB via a different framework, you're pretty much out of luck, unless you also write those components to use Telegraph and Django models.

That's my point about every part of your system needing to be async-aware. Most off-the-shelf open-source components that people use to build websites now are not. Ironically I think we were in a better position in this respect in the early 2000s with JMS, Jini, JavaSpaces, CORBA, RMI, etc. but all of those fell by the wayside because of justifiable concerns about programming complexity. Now the latest generation of web developers is discovering async soft real-time programming again, and soon we will again discover those concerns about programming complexity.

> Django's event system lets you avoid polling the DB.

Django's signal system must be run on the same instance, synchronously, before returning any response to client. This means if a signal blocks, the response will never reach the client.

It's technically impossible to handle post-response events in WSGI.


For example, user requests a blog post, you can't first return response then increase your view counter by 1. You just can't. Unless you use some container specific hooks (like uWSGI, Tornado, etc.)

Django's signals API is synchronous, but writing a wrapper to process signals as asynchronous tasks is relatively straightforward.
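A rough sketch of that wrapper idea, using only the standard library: the decorated handler does nothing but enqueue the work, and a background worker runs it later. This is not Django's signals API or Celery; `async_receiver`, `bump_view_counter`, and the in-process queue are stand-ins invented for illustration, and a real deployment would use a broker and separate worker processes.

```python
import queue
import threading

task_queue = queue.Queue()
results = []


def worker():
    """Background worker: runs queued handlers outside the request path."""
    while True:
        item = task_queue.get()
        if item is None:  # sentinel: shut down
            task_queue.task_done()
            break
        handler, kwargs = item
        try:
            handler(**kwargs)
        finally:
            task_queue.task_done()


def async_receiver(handler):
    """Wrap a handler so that calling it only enqueues the real work."""
    def enqueue(**kwargs):
        task_queue.put((handler, kwargs))
    return enqueue


@async_receiver
def bump_view_counter(post_id):
    results.append(("viewed", post_id))


thread = threading.Thread(target=worker, daemon=True)
thread.start()

bump_view_counter(post_id=42)  # returns immediately; the worker does the work
task_queue.join()              # demo only: wait until the worker has finished
task_queue.put(None)
thread.join()
```

The caller never waits on the handler itself, which is the property the synchronous signal dispatch lacks.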


> Async Signals uses Celery for signal routing

Since Celery is just a bunch of worker processes, why can't we build a web framework which natively supports both web worker and bg task worker?

Sounds good to me.

From what I understood, Meteor does tail the MongoDB oplog.

That's some dark magic out there. Does Meteor oplog monitor process need to run on the db instance as well?

Simply put, Meteor runs oplog monitor in the node.js process through node-native-driver.

Scaling anything is hard to achieve, especially when you are also trying to build a business around an idea.

Scaling is usually a problem you solve after you become more popular. My original article about Meteor and Rails wasn't about handling twitter-esque type traffic.

How do you know someone works for Google? They will tell you.

This is unnecessarily snarky. Sure, there is probably a status element in the declaration - but it also confers authority to their claims, whether you like it or not. This is handy for folks skimming comments to look for quality contributions.

The grandparent wasn't even referencing working at Google for a status boost, but merely because it directly related to their diatribe (their experience with implementing a Google feature and how that relates to scalability issues that Meteor fails to address)

There seems to be a meme going around that things like Rails or Django need to somehow change and react to single page javascript web apps.

Maybe it's just me, but trying to modify your favorite web app framework to accommodate something they were never designed to do in the first place is foolish and will end up ruining what was originally great about tools like Django in the first place.

Just because a hammer is a popular tool that you really like doesn't mean it needs to change into ladder when you decide you need to climb onto a roof.

Rails/django are built to build websites. Websites are changing towards being JavaScript in the client single page apps. Thus either django/rails changes or gets removed.

I'm currently architecting a new app, and my django layer is still crucial for: API access to the data, Auth & Auth, background processing.

What we are trying to do as website builders has changed, and thus we are at a turning point. It isn't obvious yet what the go-to stack of the future is going to look like - is it django + tastypie + angular, or rails + ember, or meteor or something else?

Django was great for the old way of doing things (static or Ajax enhanced web). But it's not clear what its role should be in the future.

To use your analogy: this is people trying to figure out if they still need hammers now that we're starting to use screws as fasteners.

Websites as a whole are not changing toward being single page apps. Some apps are being written as single page JS apps, but that's been the case for a while and we still aren't closer to the death of traditional websites than we were 5 years ago.

Another option is that websites that rely entirely on Javascript get removed. I prefer this one.

I just don't see that as a likely future.

I don't see the value proposition of making (most) web apps/sites real-time. Sure, it makes sense for a chat app or a stock ticker, but blogging? A news site? E-commerce?

Maybe it's important that eBay is "real time" in the last 5 minutes of an auction, but the rest of the time, the vast majority of the content is relatively static. A seller might update the description of a listing a couple times over a two week auction, for example. And while it sounds great to immediately update my search results when a new listing goes live, in reality, I already have 40 pages of results to look through, and that listing that just went live 5 seconds ago probably isn't much more relevant than any of the others I'm sifting through.

I'm not opposed to client-heavy apps where it makes sense. When done well, it can create a really responsive user experience. Gmail is great at this; I have no desire for it to be "real time" -- not any more than it already is.

Do we really believe that one day cnn.com will be "real-time", with article updates and errata popping up inline as we read?

It's not that everything must be real-time. But, the stuff that doesn't need it has already been well-done for over a decade. The frontier of new possibilities (including as incremental enhancement to the old categories) tends to involve what's enabled by real-time.

For example, sprinkling in a little real-time surprise – like a notification that others have already responded to your recent work – can accelerate valuable interactions.

For example, in 'blogging' and 'news', both the original authors and active commenters appreciate no-reload indications of fresh comments, mentions, and inlinks. You can do a site without that – but you'll be missing out on features that users increasingly expect, and work to create new interesting content and engagement.

In 'e-commerce', a client-pulled site works and is well-understood, but adding live sales help, or indicators of limited deals being exhausted, can help close sales... so why not try it?

Even where the major cores of these markets work fine without real-time, the frontier of exploration and optimization uses greater game-like liveliness.

When CNN.com replaces what gets piped into a 100 inch screen in your living room, yes.

Anything is possible.

Maybe it's just me, but I find the simultaneous popularity of "only check your email 4 times a day" and "OMG ALL WEB APPZ MUST BE REALTIME" slightly peculiar.

To tell you the truth, I don't normally use real time web apps at all. But I have an urge to turn what I write into real time apps, and no good explanation why; it just feels like they become much easier to use.

Maybe I (and everybody else) only have the wrong impression. It happens, and I don't have enough data to conclude anything.

idealistic view of the world vs the reality of the world

Python in general doesn't really have a good solution for this, so it's not something specific to Django. I run a Python web app that has certain real-time needs, and I had to forgo a popular web framework like Django so that I could use Twisted. The problem with solutions like this is that since the language doesn't have built-in support for asynchronous IO, everything has to be compatible with the library of your choice (whether that's Twisted, Gevent, or other), and at that point, you'd be better off just using a different language/runtime like Node.js or Erlang.

I think the current solution is to have Django serve the main app and have a separate "API server" that runs Node or whatever, but as the article points out, you're not really even using Django at that point because all it's doing is serving up a single HTML page--the rest is handled by the browser and the API server.

Python 3.4 may help a lot with the language mechanisms (with asyncio, pluggable event loops, and composable generators everywhere), but there's still the issue of getting library support to use all of that.
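For a sense of what the asyncio style looks like, here is a small sketch. It uses the `async`/`await` syntax and `asyncio.run`, both of which arrived in Python releases after 3.4 (3.4 itself used generator-based coroutines with `yield from`), and `asyncio.sleep` stands in for real non-blocking network I/O. The `fetch` function and its arguments are made up for illustration.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep stands in for a non-blocking network call.
    await asyncio.sleep(delay)
    return name


async def main():
    # Both "requests" wait concurrently on a single thread.
    return await asyncio.gather(fetch("feed", 0.02), fetch("profile", 0.01))


results = asyncio.run(main())
```

`asyncio.gather` returns results in the order the awaitables were passed, not the order they finished. The library-support concern still applies: any blocking call inside `fetch` would stall the whole loop.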

Node.js isn't actually better - it uses the callback model of async programming, which should be familiar to any C++ programmer who's been writing servers since the 80s, both because it's the current best solution for writing scalable event-driven servers and because it sucks.

For ease of programming a CSP-based language like Go or Erlang is really the way to go, but then you're back to the "lack of library support" problem that you'd get with Python 3.4, except worse because Python at least has libraries for the synchronous part of the computation.

> Node.js isn't actually better - it uses the callback model of async programming,

The point is not about the code style, it's that since Node is 'asynchronous by default,' culturally, Node produces libraries that are also asynchronous by default. Most Ruby or Python libraries aren't.

> but there's still the issue of getting library support to use all of that.

Much of Node's success in employing the event loop model comes from not having extensive libraries of blocking I/O code to work around. Maybe the difficulty of the 2-to-3 Python migration could be a boon here.

Good news: you can now use ES6 generators in Node.js, and combined with another ES6 feature, Promises, they can result in much nicer async code, e.g. http://taskjs.org/

ES6 is looking very nice indeed, great thing is that with Node you won't have to worry about old browser support.

I never understand the `function *(){}` syntax. Why can't we simply introduce a new keyword like `generator (){}`?

Or just recognize that the function contains the yield keyword. That said, `function*` isn't all that bad given what you get in return.

Perl has libraries that can work with lots of different event loops (see for example http://search.cpan.org/~mlehmann/AnyEvent-7.07/lib/AnyEvent.... ) so async IO doesn't need to be built in to support a very good level of portability.

I'm sure that'd be possible in python as well.

I guess it's less about native support and more about the culture. Very few Python libraries are built with async IO in mind. And honestly, I don't expect them to be, because there is no standard, agreed-upon way of doing async stuff in Python. So the result is you get a bunch of tiny communities around each event loop implementation, but because they aren't compatible, you get a bunch of repeated work and very little overall progress.

(Don't know if it's the same with Perl, but I feel like the Perl community evolves much more quickly than Python's.)

That's currently what I do. Also Django is useful for sketching your requirements out, then once you know what you're building you can move the real-time data retrieval into NodeJS and let Django handle the boring stuff.

Not EVERYTHING in an app will need to be real-time. Especially boring maintenance functionality such as password reset etc.

True, but for such basic things, you could just use something lightweight like Flask and not have to deal with all of Django's baggage.

It all depends on the app, though. In my case, even "boring" stuff often needed to make HTTP requests to the outside world, so Django would never have scaled for me. (In hindsight, I should have just written the whole thing in Node, but it wasn't as mature back then and I didn't want to take the risk.)

I think very few sites actually need to be SPA at all. Just because an e-commerce site has a real-time component doesn't mean it must be built in Meteor.

E-commerce sites are in fact a prime example of something that I think should be built using traditional technologies. Do you want price updates? Just poll them with AJAX and let the rest of the site remain static. It's far from a multiplayer game we're talking about.
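As a sketch of what such a polling endpoint could return, the server can keep a version counter and answer each poll with only the prices changed since the version the client last saw. `PriceStore` and its methods are hypothetical names for illustration, with an in-memory list standing in for the database.

```python
class PriceStore:
    def __init__(self):
        self.version = 0
        self.changes = []  # (version, sku, price), oldest first

    def set_price(self, sku, price):
        self.version += 1
        self.changes.append((self.version, sku, price))

    def poll(self, since):
        """Return only the prices that changed after version `since`."""
        updates = {
            sku: price
            for version, sku, price in self.changes
            if version > since
        }
        return {"version": self.version, "prices": updates}


store = PriceStore()
store.set_price("book", 12.0)
first = store.poll(since=0)

store.set_price("book", 11.5)
store.set_price("lamp", 30.0)
second = store.poll(since=first["version"])
third = store.poll(since=second["version"])  # nothing new: empty update
```

The client passes back the `version` from its previous response, so a quiet period costs an empty payload per poll and the rest of the page stays static.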

I don't think I really understand the limitations we're talking about. No you wouldn't ever want to write an app that had real-time elements in pure Django, but isn't that what Celery is for? I bet with a solid messaging queue and good architecture you could write a pretty convincing real-time app using Django as not much more than a REST api to celery tasks and the database (and really, that abstraction is what a framework is for anyway).

Besides, this sky-is-falling nonsense around frameworks is getting old. A framework either lives or dies. Django has a very healthy community around it and they are doing a great job right now of keeping the framework stable so folks who "just need to get work done" can get work done. There haven't been a lot of revolutions, and that's fine for me. Believe it or not, there's still a market for content-heavy, traditional MVC websites. And when you need to add real-time elements, Django, Celery and Django REST Framework are up to the task a vast majority of the time.

Another real time application issue that rarely gets any attention is WebRTC. I wish people would start tackling these issues for python/django, too. As of writing this I don't know about any library that would allow me to write a server application in python that would serve as a peer in a WebRTC session. The benefit would be unreliable real time data channels to the server. This can be of great use for games. Of course there are many different use cases.

Aside from an inability to run websockets on Django, I've been running "real time" websites for quite some time. AJAX calls are dirt simple to handle with your typical Django setup.

Scaling and blocking are handled pretty easily by running Django on FCGI using Flup and a Nginx frontend. No blocking problems since they're running in processes and threads, redis for caching and pub/sub, and a database for the backend. Works a charm.

Now then, this isn't a high volume site, getting only in the medium hundreds of requests per minute, but it's been working without problems on a small AWS instance. DB backups take more CPU than Django ever has.

Websockets, on the other hand, took me over to Go. Certainly not giving up Django for the rest of the site, however, until it really can't handle the load anymore.

Wouldn't a websocket middleware solve the issue? Client starts a socket and passes the id through HTTP to the Django app. When something happens in Django, the event is piped through the previously created socket and the problem is neatly solved. Could even have some sophisticated publish/subscribe mechanics in here.

There is a bit of middleware out there which enables websockets, but:

1) Doesn't currently work with Django 1.6+

2) Websockets and WSGI don't mix well. It's possible to capture the socket and use it further up the stack, but it requires some really nasty hacks.

3) Requires you to use a custom version of runserver, which allows the raw socket to be passed up into the handler code.

Not worth it to me. I'm sure the code could be made to work again, but then you lose a lot of the benefits of running it behind something like fcgi (since you need access to the socket for the persistent two way communication).

I wasn't thinking about Django middleware but a more generic message routing engine running as a separate process that speaks websockets on one end and has a bi-directional REST interface on the other exchanging messages with the Django backend.

edit: karneges points out Hookbox and Pushpin can fill this role.

Interesting idea, but you still end up with something performing polling against the backend, if you want the ability to send messages to the client without receiving a request first (the real strength of websockets).

Well, you always need the client to reach out first, since it is not otherwise addressable. ;) But this doesn't mean it has to poll. With a separate gateway that supports Websockets or Server Sent Events, it should be enough for the client to make an initial request to bootstrap the connection, and then the server can send as many messages as it wants downstream.
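For the Server-Sent Events case, what the server sends downstream after that initial request is a stream of plain-text frames. A minimal sketch of encoding one frame follows; `format_sse` is a hypothetical helper name, and the gateway that would hold the connection open is not shown.

```python
def format_sse(data, event=None, event_id=None):
    """Encode one message as a text/event-stream frame."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


frame = format_sse("price: 10\nstock: 3", event="update", event_id=7)
```

The server writes one such frame per message on the open response, and the browser's `EventSource` dispatches each as an event without the client making another request.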

> it should be enough for the client to make an initial request to bootstrap the connection, and then the server can send as many messages as it wants downstream.

Yes, that's the point of websockets; but what was discussed here is a program acting as a bridge between django and websockets using http to speak to django. http does not support such asynchronous communication. You get one response for one request. How can a backend send a new response if there is no open request from the intermediary?

Sure, you can hack http by refusing to close the stream of a response and sending data intermittently (well, if you're using a django->apache/nginx protocol that supports streaming responses), but then you're no longer speaking http; you're speaking your own protocol over http.

Sure, you can have your intermediary poll django, which reduces some of the overhead since you're bypassing the external network stack, but you're still relying on only sending messages every $poll_interval.

Sure, you can create some secondary process through manage.py that runs and communicates with the intermediary directly, but then you're no longer speaking http to Django.

If you want async communication between your client and your server using websockets, you can't rely on speaking http to anybody; it just isn't compatible with truly asynchronous websocket communication.

The HTTP side of the middleware server should be able to make and receive HTTP requests. If something causes an event in Django, it can make a request to the HTTP server side of the middleware and send a message to all listening websocket clients.
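The shape of that standalone server can be sketched in a few lines. This in-memory `Gateway` class is hypothetical: `subscribe` stands for what would happen when a browser opens a websocket, and `handle_publish` for what would happen when Django makes an HTTP request to it. No real sockets or HTTP handling are involved.

```python
from collections import defaultdict


class Gateway:
    """In-memory stand-in for a standalone push server."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of send callables

    def subscribe(self, channel, send):
        """A client opened a socket and asked to listen on a channel."""
        self.subscribers[channel].append(send)

    def handle_publish(self, channel, message):
        """The web app posted an event; returns how many clients received it."""
        delivered = 0
        for send in self.subscribers[channel]:
            send(message)
            delivered += 1
        return delivered


gateway = Gateway()
alice_inbox, bob_inbox = [], []
gateway.subscribe("comments", alice_inbox.append)
gateway.subscribe("comments", bob_inbox.append)

count = gateway.handle_publish("comments", {"text": "first!"})
```

Django only ever makes an ordinary outbound request here, so nothing in the request/response app needs to hold a connection open.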

Aaah, I think my confusion was from your misuse of the term middleware. When used in the context of Django, middleware is a layer in a stack of WSGI calls, not a standalone daemon which accepts and receives http posts and websockets.

I could see such a standalone daemon working, but it would seem more straightforward to just write a daemon which handles websockets and your application logic on its own.

Hookbox is pretty much in line with what you're describing. It speaks Websockets on one side and HTTP on the other.

There's also Pushpin, which I mentioned in a separate comment.

I'd completely agree.

Django is missing websockets, and little else. All that ode about not repeating code at the client and the server isn't that relevant because the view (client) operates on a completely different environment from the model (server), and does a completely different kind of data manipulation. Very little code repeats, and the little that does is trivial.

Ok, a better way to represent the client code is always welcome, but Django has already a lot of power and flexibility here, and it couples well with Javascript capabilities.

I don't know about Django but rails has the idea of "live controllers".

Sure it uses polling but didn't you watch DHH's railsconf presentation? They have 5-6 workers and a single redis server which sustains 100k+ reqs per minute.

It also only took DHH 4 hours to convert the entire basecamp project to be live (ie. live updating comments as it comes in).

Sure it's not really live since the polling is only happening every few seconds but who cares? Even for most chat systems it's completely reasonable to do polling, most certainly if it's 1:1 chat.

Also look at Disqus. They are mostly all django, they even use postgres with a schema. Their "real time comment system pusher" was written in Go in a week with almost no prior knowledge of Go. I see nothing wrong with that and IMO it's exactly what we should be doing.

Use Django/Rails for the bulk of your app, CRUD interfaces, etc. and then create optimized services with Go or some other language for real-time aspects.

[*] Everything I mentioned is documented online through talks, engineering blogs, etc..

I have written "realtime" web applications in Django, using Tornado for websockets (or their emulations). While a pure realtime non-blocking solution might be able to squeeze out a lot more performance, it's certainly possible.

Realtime web applications require a choreography of communication between server and client, with an unpredictable user and network messing stuff up all the time. Like much of web development it comes down to not going crazy. Otherwise we would be writing web applications in C++ or Java, wouldn't we?

I don't see anything wrong with a separate asynchronous server which handles real-time for your Django site.

When a user-generated event happens on your site, you just handle it in the traditional manner, i.e. POST via AJAX, validate, save if necessary, and then publish to the asynchronous server, which broadcasts the event to all connected clients. This way you have a graceful fallback in case of async server downtime, so your user doesn't even notice something went wrong. You are not mixing things which were not developed to be mixed. In this case you are just writing your site as usual and then adding real-time elements where necessary.
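A sketch of that flow, with the fallback made explicit: save first, then treat the publish as best effort. All names here (`handle_comment_post`, `publish`, `PushServerDown`) are hypothetical, the list stands in for the database, and the `push_available` flag only simulates the async server being down.

```python
saved_comments = []


class PushServerDown(Exception):
    pass


def publish(event, push_available=True):
    """Hypothetical call to the async server; raises when it is unreachable."""
    if not push_available:
        raise PushServerDown("async server unreachable")
    return True


def handle_comment_post(text, push_available=True):
    # 1. Validate as in any ordinary request handler.
    if not text.strip():
        return {"status": 400, "pushed": False}
    # 2. Save first, so the data is durable whatever happens to the push.
    saved_comments.append(text)
    # 3. Best-effort broadcast; a push failure must not fail the request.
    try:
        pushed = publish({"type": "comment", "text": text}, push_available)
    except PushServerDown:
        pushed = False
    return {"status": 201, "pushed": pushed}


ok = handle_comment_post("nice article")
degraded = handle_comment_post("still works", push_available=False)
rejected = handle_comment_post("   ")
```

With the async server down the request still succeeds; other clients just see the comment on their next page load instead of immediately.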

Using Gevent together with Django seems like monkey patching entire web site to me.

I really respect the work of the guys developing uWSGI. But at the moment it does not seem to be usable in a simple, obvious way. Maybe in the future their real-time support will become mature and convenient enough.

Of course, a Meteor- or Derby-like approach solves the problem at another level. But in the context of Django I don't think we should consider them as examples. We use python, not javascript - we have no native solution for the browser environment and I personally think we do not even need it.

The best way I found to do this for python is by using Tornado: you have an excellent websocket implementation baked in and a scheduler within the webserver itself, so it's simple to poll for changes and update only when necessary, or interleave with a callback if you want "true" real time. Plug in a front end with angular/knockout etc., pass around json objects and you are good.

As far as meteor/node goes, having the same language on the server and client is great. Having javascript as that language is not so great. Web apps are generally a front end to something bigger and I never want to do any serious data wrangling in javascript if I can avoid it.

You can use Pushpin in front of Django (or any web framework, whether event-driven or not) to implement realtime features.


The thesis behind this architecture is that most realtime web applications can be reduced to request/response and publish/subscribe messaging patterns. Instead of looking at Django as a legacy framework, look at it as 50% of the solution (read: request/response). Pushpin provides the rest.

This looks quite interesting; thanks for sharing.

I disagree that server side templates are no longer needed. Templates are often reused for things like sending emails or exporting to PDF. Sure, you could use a JavaScript server side template to do this.

The only issue I see with Django is websockets. Apart from that I have been using Django to build 'real time' web apps for years (AJAX). Django does server side very well, AngularJS does client side well, mix in django-angular and I have most of what I need. For websockets: django-websocket-redis.

I had the same problems.. I love Django and I want to use it for my real-time application but I just couldn't find a way to make it work. I've chosen to use node/angular/firebase instead and I'm very happy with my choice so far.

I use Angular with Django-Rest-Framework and I love it.

Hey, I wanted to pick your brain about it but I can't find your e-mail or a way to reach you. Would be awesome if you can contact me (phzbox at gmail)

localbitcoins.com is using django. Start a trade and messages are real time without needing to reload a page

Probably ajax, which is more resource intensive/laggy than WebSocket-y and isn't bi-directional (you've got to poll with ajax if you want to push changes from the server).

Ajax is Good Enough for this use case (and probably a lot of others), imo. Even polling every 10s is plenty for messaging on a site like localbitcoins.
