
The switch: Python to Node.js - pquerna
http://journal.paul.querna.org/articles/2011/12/18/the-switch-python-to-node-js/
======
wulczer
Sounds like a weird reason to switch the language and framework used for your
product. Blocking libraries block just the same in Node and in Twisted. Why
not just stop using the Django ORM or more generally, stop writing code that
blocks the main thread for long periods of time?

The problem of the main thread waiting on a response from the database is
easily solved without rewriting all your code, which incidentally will have
the same issues, since there's no new magic there.
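
To make the point concrete, here is a minimal sketch in Python. It uses the standard library's asyncio as a stand-in for Twisted's reactor (an assumption; the article's stack was Twisted), and `blocking_query`, `heartbeat`, and `handle_request` are hypothetical names, not anything from the article. Called inline, the blocking call freezes the event loop; handed to a worker thread, the loop keeps servicing other work while the query waits.

```python
import asyncio
import time


def blocking_query(delay: float = 0.05) -> str:
    """Hypothetical stand-in for a blocking ORM or database call."""
    time.sleep(delay)  # blocks whichever thread runs it
    return "rows"


async def heartbeat(log: list, interval: float = 0.01) -> None:
    """Appends a tick each time the event loop gives this task a turn."""
    while True:
        log.append("tick")
        await asyncio.sleep(interval)


async def handle_request(offload: bool) -> int:
    """Runs the query and returns how many heartbeat ticks happened meanwhile."""
    log: list = []
    ticker = asyncio.create_task(heartbeat(log))
    await asyncio.sleep(0)  # let the heartbeat start
    before = len(log)
    if offload:
        # A worker thread does the waiting; the loop stays responsive.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_query)
    else:
        # Called inline: the loop is frozen for the whole query.
        blocking_query()
    ticks_during = len(log) - before
    ticker.cancel()
    return ticks_during
```

`asyncio.run(handle_request(offload=False))` returns 0 because no other task gets a turn during the query, while `offload=True` lets the heartbeat keep ticking. In Twisted, `deferToThread` plays the role that `run_in_executor` plays here.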

If your software is a giant, ugly hairball because you wrote it in a hurry (I
know I did this many times!) don't blame it on the language or framework. What
I'm wondering is: Python/Twisted/Django rank equally or similarly on all the
points from the list, except for Team Experience, which has to be much higher
with the old stack than with a completely new technology and set of tools.
This makes me think that there was a "play with shiny things" point after all.

~~~
rictic
It's been a while since I've written nodejs in anger so I may be out of date,
but as a rule libraries in Node don't block, they present async interfaces and
do what they have to in order to keep from blocking.

Python has no such conventions, making many libraries into dangerous
minefields for the Twisted developer.

------
lordlarm
I've come to the same conclusion regarding the Django ORM as they have.
An excerpt:

 _I believe our single biggest mistake from a technical side was not reigning
in our use Django ORM earlier in our applications life. We had Twisted
services running huge Django ORM operations inside of the Twisted thread pool.
It was very easy to get going, but as our services grew, not only was this not
very performant, and it was extremely hard to debug_

Django has in many ways become the industry standard in regards to fast
development - but it, as so many other ORMs, does not scale and produces
horrible SQL-statements. Also for many applications I find Django to be a huge
overkill - and in more recent times I've focused a lot more energy using
microframeworks such as Flask and Bottle.

~~~
ergo14
"Django has in many ways become the industry standard in regards to fast
development - but it, as so many other ORMs, does not scale and produces
horrible SQL-statements."

I disagree - many ORMs in the Python world? Try using SQLAlchemy; it will put
Django's ORM to shame. It builds fast, good-looking queries most of the time,
and in the rare case where something is not good enough, you can easily tweak
it. I can't say anything about Storm, but saying that the "industry standard"
Django ORM is OK because "others have the same flaws" is just distorting
reality to fit this thesis ;-) The ugly truth is that Django's ORM is not OK,
and it bites many people when they are at the point where it's a real pain to
fix. The more developers are aware of this, the better.

~~~
jpk
I'm in the process of building an api in Django (GeoDjango, actually, with
PostGIS). What are the gripes with the ORM and how could it be better? (Not
that I'm disagreeing; I'm curious. I'm not experienced enough to know what
makes an ORM okay or not.)

~~~
clojurerocks
There are a couple of issues I had with it personally. It doesn't work that
well with nosql. Also I found it difficult to customize or do certain things
that I wanted to do, some of them very basic. For example I wanted to return
only individual columns rather than all of them. I spent hours looking for how
to do this. I finally found out you do .values or .select or something to that
effect at the end of your call. Unfortunately I tried this and got errors. So
I gave up in frustration.

~~~
kingkilr
.defer() and .only(), they're right in the list of methods on Querysets:
[https://docs.djangoproject.com/en/1.3/ref/models/querysets/#...](https://docs.djangoproject.com/en/1.3/ref/models/querysets/#defer).
Sorry, but it's a bit hard to take this critique seriously, given how trivial
it is. I have a list of complaints about Django's ORM that is about a mile
long, so don't think I'll defend it blindly, but let's try to stay grounded in
reality when we criticize something.

~~~
clojurerocks
No idea what defer and only are. Actually maybe only was what I was looking
for. The django documentation for many things such as this is almost
nonexistent. Some of it is from older versions as well. And I'm sorry, but how
is a newbie supposed to find this stuff? You've probably been working with
python and django for a while. Which is great. But when you're just starting
and playing around, having to spend hours looking up simple stuff like this
when you have a million other things to do really gets frustrating.

~~~
bretthoerner
None of your points are coherent or truthful. This seems common among people
that run to Node and Mongo.

> The django documentation for many things such as this is almost non
> existent.

He just linked you to documentation with exactly what you needed, so it surely
exists.

> Some of it is from older versions as well.

He linked to the current 1.3 docs.

> And im sorry but how is a newbie supposed to find this stuff.

The second section of the docs is about models, the second heading is all
about QuerySets. The link he pasted is literally right in front of you.

<https://docs.djangoproject.com/en/1.3/>

QuerySets: Executing queries | QuerySet method reference

------
sneak
Python is a programming language. Node.js is not. What you meant to say is
that you switched from Python (presumably running on top of a Real Web Server)
to javascript running on Node.

The argument could be made that they are both interpreters, but please, get it
right - you are writing javascript, which is a pretty shitty language in which
to develop apps these days. The only reason we use it is because it has near-
universal browser support - and then only for UIs.

Also, not to parrot, but it's true - node is cancer. It's for fad chasers who
have no idea how real servers manage to serve volume.

Fortunately, most people inexperienced enough to choose node are inexperienced
enough to never need to scale, so it'll work out okay for them - for
implementation-specific values of 'okay'. :)

~~~
karterk
> you are writing javascript, which is a pretty shitty language in which to
> develop apps these days.

Woah. I do both Python and JS, and guess what - there are a fair number of
people who hate Python's significant whitespace. And to them, it's also a
"hacky" language (with all kinds of __foo__ stuff). So, shitty-ness is in the
eyes of the beholder.

> It's for fad chasers who have no idea how real servers manage to serve
> volume.

There are real world apps like Trello, which is on Node and is scaling well.
Get your facts right, or stop trolling.

~~~
sneak
I think it's amusing that when I claim that javascript is a shitty language,
instead of trying to claim that javascript isn't shitty, you point out the
shittier attributes of the language I was comparing it to.

Regardless of whether you like python or not, javascript pretty much sucks.
Coffeescript helps a bit, but if you aren't running in browser, why hobble
yourself like that?

~~~
cm127
The "Javascript is a shitty language" argument is getting old. I feel like
you're only admitting that you're too lazy to learn how to program with a
prototype-based language. You even suggest Coffeescript which is big on making
Javascript like an OOP language when it's really not.

As for a reason why you might "hobble" yourself: Javascript is great at
asynchronous design. You should try checking it out for real instead of
jumping on the opinion bandwagon.

~~~
jashkenas
Oh man oh man.

    
    
> You even suggest Coffeescript which is big on making
> Javascript like an OOP language when it's really not.
    

JavaScript is deeply object-oriented.

------
amix
I blame them and not the technology stack they used. It's idiotic to mix
blocking technologies with non-blocking technologies as blocking the IO loop
is fatal for performance as you won't be able to process any events while the
IO loop is blocked.

And this person still has not learned from their mistake, they think that
using SQLAlchemy would have solved their problem. If they wanted to make
Twisted work they should have picked a database that can be used in a non-
blocking way.

The reason why I like node.js is that everything is built in a non-blocking
way. Twisted has the same philosophy until people think they can get away
using a blocking library inside the IO loop.

~~~
dextorious
So, basically you just learned about non-blocking and think node.js is a
golden bullet. It's not.

And it's exactly the same situation as anything they could have used in
Python, as soon as they add actual code to get work done (as opposed to hand
over work and wait for a callback).

"""The reason why I like node.js is that everything is built in a non-blocking
way. Twisted has the same philosophy until people think they can get away
using a blocking library inside the IO loop."""

That's not what they were doing, though. At the end of the non-blocking fiesta
you want to actually return results to a waiting connection -- actual results,
rendered templates etc. That will block and take time -- especially in node.js
which is single threaded.

So they just traded Python's threads/processes/lightweight threads for node.js
processes.

~~~
amix
No, I did not learn about the non-blocking approach recently. I have built
solutions in Java and in node that scaled to over 300.000 open connections
processing billions of messages. Besides that you are missing my point, which
basically is to not mix blocking with non-blocking code.

------
espeed
The depth of libraries available in the Python or JVM ecosystems is going to
dwarf what's available for Node. Were you able to find everything you need for
Node?

~~~
substack
Node is catching up pretty fast, although its modules might not be as mature
yet:

    
    
        102257  perl    http://www.cpan.org/
         33270  java    http://search.maven.org/#stats
         31921  ruby    https://rubygems.org/
         18068  python  http://pypi.python.org/pypi
          5732  node    http://search.npmjs.org/

~~~
espeed
What's the quality like of most of those Ruby gems? I keep seeing people refer
to the gem number, but how sophisticated are they?

Last time I looked there wasn't much along the line of SciPy, NumPy, NLTK, or
Matplotlib.

~~~
malandrew
I would venture to guess that the lack of such libraries is more the result of
community focus.

The rails community (and by extension ruby) is generally more product focused,
whereas the python community (and by extension django) is more science
focused.

I intentionally ordered the language and the framework differently for each
because most people who use ruby/rails, got into that community for the sake
of the framework, whereas most of the people who use python/django, got into
that community for the sake of the language.

Finally, python has much greater adoption in the academic world, so
it's natural to expect that its community will have built more academic
libraries like those you mentioned.

Node.js and Javascript are seeing adoption among those interested in real-time
applications, evented systems and applications that bridge the chasm between
the client and the server, so it is natural to expect more libraries focused
on problems within those domains.

One thing I like about the node.js community is that, because they are
developing with the same language on both the client and server, they consider
the two one and the same. The only thing separating the two is
latency and connection reliability. The latency issue is psychologically not
much different than having a bias for doing things in memory and avoiding disk
IO server-side. The lack of connection robustness is likely to evolve into
solutions that mirror the problems that the erlang/OTP community has spent a
lot of time solving.

The ecosystems around languages are heavily influenced by the strengths and
weaknesses of the communities that adopt them, and the community that adopts a
language is largely the result of which problems that language is well suited
for, either by language design or historical coincidence (e.g. Javascript is
in the browser).

------
andrewcooke
so what makes javascript callbacks easier to understand than twisted's
deferreds? aren't they basically the same thing?
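
A rough side-by-side, as a sketch only: `ToyDeferred` below is a hypothetical toy, not Twisted's `Deferred` (it has no errbacks and does not chain nested deferreds), and both `fetch_*` functions are made-up names that fire immediately instead of waiting on an event loop. It is only meant to show where the continuation gets attached in each style.

```python
from typing import Any, Callable, List


def fetch_with_callback(key: str, on_done: Callable[[str], None]) -> None:
    """Node-style: the caller passes the continuation in up front."""
    on_done(f"value-for-{key}")  # a real API would call this later, from the loop


class ToyDeferred:
    """Toy result holder with a callback chain (NOT Twisted's Deferred)."""

    _UNSET = object()

    def __init__(self) -> None:
        self._callbacks: List[Callable[[Any], Any]] = []
        self._result: Any = self._UNSET

    def add_callback(self, fn: Callable[[Any], Any]) -> "ToyDeferred":
        if self._result is self._UNSET:
            self._callbacks.append(fn)       # not fired yet: queue it
        else:
            self._result = fn(self._result)  # already fired: run immediately
        return self

    def fire(self, result: Any) -> None:
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)  # each callback feeds the next
        self._callbacks.clear()


def fetch_with_deferred(key: str) -> ToyDeferred:
    """Deferred-style: the continuation is attached to the returned object."""
    d = ToyDeferred()
    d.fire(f"value-for-{key}")
    return d
```

With the callback style you write `fetch_with_callback("a", handler)`; with the deferred style you write `fetch_with_deferred("b").add_callback(str.upper).add_callback(handler)`. The mechanism is the same, but the deferred is a value that can be returned, stored, and extended after the call has been made.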

and how does "we picked Node.js" logically follow from "It is obvious that the
JVM platform is one of the best ways to build large distributed systems right
now"?

~~~
dextorious
There's probably a middle step, "we also wanted to play with the latest shiny
technology", that's not mentioned.

~~~
andrewcooke
looking at the spreadsheet, "shiny newness" seems to be "velocity"; it has the
highest weight of all (twice the average).

i can only begin to imagine the conversations a year down the line:

\- _how did you decide this?_

\- we used a spreadsheet. science!

\- _so you weighted against the asynchronous approach that confused your
developers earlier?_

\- actually, we put most weight on it being an exciting new technology.

\- _oh._

[https://docs.google.com/spreadsheet/ccc?key=0AvBGESHWxhk2dHJ...](https://docs.google.com/spreadsheet/ccc?key=0AvBGESHWxhk2dHJ2Q0lWRFF3dkxLZmFiMVVGRElQaEE#gid=0)

~~~
pquerna
The state of the spreadsheet wasn't constant.

We played with the weights.

What I posted was just where it was left after a meeting 9 months ago.

Yes, we definitely weighted against our feelings of how we failed to employ
Twisted Python. This was our experience; are you suggesting we shouldn't
consider our experiences when evaluating something?

It really came down to a choice between pursuing a JVM based system, and
Node.js.

Yup, we probably did want to do something new. But guess what, it's worked.
I'm sure we would have haters in the other direction if we were posting about
how Node.js has failed us, but it hasn't, yet.

~~~
andrewcooke
i'm surprised that you chose a technology so similar to one that you had
problems with before. especially when the deciding factor seems to have been
"it's exciting" rather than "it does the job well". yes, you can argue that
"exciting" is important to attract a top-notch team, but then i am not sure
why a top-notch team would have problems with twisted (and not have the same
general issues with node).

how will node fix the issues you had with twisted?

------
hello_moto
If they believe in Node.JS, good for them. I assume they have the right people
to write good and modular javascript code.

If you're a good front-end JS developer but have never written even a minimal
reusable JS library, you've got a lot to learn before diving into JS.

As long as the code is waaay more modular than this:
[https://github.com/marijnh/CodeMirror2/blob/master/lib/codem...](https://github.com/marijnh/CodeMirror2/blob/master/lib/codemirror.js)

They should be alright in terms of adding new features. And if Node.JS hits a
scalability issue? Well, tough luck I suppose.

I see that JVM is a very serious contender but as already noted: License
issue. And let me add this: I'm betting my ass that there's the "I don't want
to write Java" whisperers as well. Now now, don't BS me. I know how the so-
called "engineers" think of Java these days. Sure, you could use Clojure,
Scala, or JRuby. But seeing the competitor is Node.JS, pretty easy choice
don't you think?

My experience writing JS code for Node.JS based platforms (ExpressJS, etc) is
that there's a bigger chance that you'll write more code. Code to make the
code more modular (sounds weird, doesn't it?). Code to make sure you work
around the warts of JS. Each line of code should be highly scrutinized due to
JS warts.

This thread is going to be a typical fun nerds-fight. I guarantee you. I'm
going to grab a popcorn and watch nerds doing keyboard slamming.

------
viraptor
I'd really like to see that spreadsheet, since there are many python
frameworks other than twisted available. There are eventlets/greenlets,
there's gevent, there's diesel and monocle for minimal things, etc. Makes you
wonder what actually made them change the whole technology... Then again, they
could probably even reuse pyjamas to still write node.js code in python...

OT: "We should of used SQLAlchemy. We should of built ..." I thought this
expression was Bristol-specific. Apparently it's taking US too.

~~~
tikhonj
I don't know if you missed the link or if it was added later, but it's here:
[https://docs.google.com/spreadsheet/ccc?key=0AvBGESHWxhk2dHJ...](https://docs.google.com/spreadsheet/ccc?key=0AvBGESHWxhk2dHJ2Q0lWRFF3dkxLZmFiMVVGRElQaEE#gid=0)

It seems they looked at Gevent, Go, C++ and the JVM. As usual, no love for
Haskell :(, not that they're entirely to blame there.

~~~
reinhardt
Too bad that the comparison doesn't make much sense: Node/Twisted/Gevent
(language specific frameworks) vs Go/C++ (languages) vs JVM (platform).

~~~
dextorious
It's not like (language specific framework + language) is that different from
(language + built-in features + standard library).

------
kyledrake
If you want to use a reactor pattern and you don’t want to use JavaScript, you
can use EventMachine on Ruby. Twisted is the same thing for Python. I’m sorry
to hear you were having problems with Twisted, but EventMachine is absolutely
rock solid. I have gotten a lot of work done with it.

Now, let’s say you don’t want to do callbacks, but still take advantage of the
Reactor pattern in EM. I wrote a patch for Sinatra that uses Fibers to wrap
callbacks, hence allowing you to continue to program synchronously as you used
to, and it’s called Sinatra Synchrony
(<http://kyledrake.net/sinatra-synchrony>).
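
The idea of hiding callbacks behind something that reads synchronously can be sketched in Python too, with a generator playing the role of the fiber. This is an analogy under stated assumptions, not how Sinatra Synchrony is implemented: `get_user`, `run`, and `handler` are hypothetical names, and the fake lookup fires its callback immediately rather than from a reactor.

```python
from typing import Any, Callable, Generator, List

# A callback-style operation: takes a continuation, calls it with the result.
AsyncOp = Callable[[Callable[[Any], None]], None]


def get_user(user_id: int) -> AsyncOp:
    """Hypothetical callback-style lookup."""
    def start(on_done: Callable[[Any], None]) -> None:
        on_done({"id": user_id, "name": f"user{user_id}"})
    return start


def run(gen: Generator[AsyncOp, Any, None]) -> None:
    """Drives a generator: each yielded op resumes it from the op's callback."""
    def step(value: Any = None) -> None:
        try:
            op = gen.send(value)   # run until the next yield
        except StopIteration:
            return                 # handler finished
        op(step)                   # resume when the callback fires
    step()


def handler(out: List[str]) -> Generator[AsyncOp, Any, None]:
    a = yield get_user(1)          # reads top to bottom, no nested callbacks
    b = yield get_user(2)
    out.append(f"{a['name']} and {b['name']}")
```

Calling `run(handler(results))` walks the handler through both lookups in order. The callbacks still exist, but only `run` ever sees them.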

But in order to do non-blocking IO, you don’t need to use a Reactor pattern,
because Ruby internally does not block on IO (the GIL does not affect this).
And of course you get solutions like JRuby (built on the JVM), which provide
for threads (and Rubinius 2 coming soon).

My APIs written with any and all of these concurrency options can get
thousands of hits per second. It’s quite scalable, but still provides all the
rich libraries, reusable code, and testing support that makes my APIs high
quality, which to me (and my users) is more important than making them fast.
With this approach, my APIs are both high quality and fast. And I have never
experienced a single crash of any worker processes in production. I would
know, because my process monitor is smart enough to observe the workers in my
deploy code and informs me, while adding another worker to replace the fallen
one. Did I mention I can do zero-downtime hot deploys? Again, all implemented
in Ruby.

I’m not trying to brag here, but there’s this weird current of arguments that
these existing solutions don’t scale. It’s a myth, I don’t know where it comes
from, and I’m worried that if I don’t speak out against these arguments, I’m
going to wake up in a giant pile of half-baked Javascript spaghetti written by
people that didn’t even understand the real problem in the first place
(blocking IO people, research it!).

------
clojurerocks
After developing a v1.0 of a project with django I ended up also moving to
node.js, as well as to mongodb instead of postgresql. I found myself
increasingly wrestling with django. I really like python and the django
community and think django is good for certain types of applications. But at
least for what I'm building it wasn't the right solution. I'm also using
node.js for another project which is also currently built with django. It has
different requirements and I might use python on the backend, whereas the
other project will most likely just be node.js and mongodb.

------
peregrine
I'm curious as to why Go wasn't a final contender? It started off with Go
being a crowd favorite and ended with it nowhere. I assume it has to do with
the young package repository but what's the reasoning you used?

~~~
pquerna
Go's core packages are good.

Go's not-in-core packages are a mixed bag. Lots of "one off" experiment
projects. Which is fine, that's where it is in its life. Node.js was in the
same place package-wise 18 months ago.

Additionally, at the time we were looking at this, Go was still doing releases
every 2 weeks -- this was months before the Go 1.0 plan was even announced:
<http://blog.golang.org/2011/10/preview-of-go-version-1.html>

~~~
peregrine
Thanks! I am playing around with Go and you are right the packages are a mixed
bag. Not to mention using a search engine is basically impossible.

------
baghali
Did you evaluate EventMachine? If so, would you please share your findings?

~~~
pimeys
I've used fibers-enabled EventMachine in one part of our application. It's
pretty nice, because you don't have to do callbacks and the Ruby fibers are
pretty nice actually.

We're using it for handling jobs from Resque, which read from and write to our
database and ping a 3rd party server.

But. The amount of bugs in the em-enabled libraries is just amazing. For
example em-activerecord just didn't work so well under huge load (dropping
SQL-connections), so I had to write my own mini ORM for the workers. Also
Resque blocks by default, so of course doing a non-blocking and non-forking
version was a priority.

The worst thing here are the error messages and backtraces. There are none
(DeadFiberException for failing assert) and with hacks you can get out a bit
more information if the job crashes.

Now thinking about this again, it would be a good idea to write it again with
Node.js. Just because it's still more mature compared to EventMachine, and all
of its libraries are reactor core friendly by default.

~~~
sandGorgon
> For example em-activerecord just didn't work so well under huge load
> (dropping SQL-connections), so I had to write my own mini ORM for the
> workers.

That's interesting - could you elaborate (here or a blog post) on your
findings. what was different between your ORM and em-activerecord that made it
more performant.

The thing is - my first thought would have been to leave the ORM alone and
focus on the DB side (like more sophisticated connection pooling/pgbouncer,
etc. ). Which is why I'm interested in what went wrong in the ORM that made it
screw up when used in a non-blocking kind of a setup.

~~~
pimeys
> That's interesting - could you elaborate (here or a blog post) on your
> findings. what was different between your ORM and em-activerecord that made
> it more performant.

I will do a blog post about this when I finally have some time (christmas
holidays, maybe). Em-activerecord worked very nicely at first, but when I hit it
with thousands of concurrent jobs, some workers just dropped dead saying MySQL
couldn't answer. This was annoying, because the workers didn't really fail in
Resque, just failed to do their job and if this was in production it would've
cost us thousands.

So, my own ORM is just a database superclass with em-connection-pool,
configuration, openstruct and a couple of class and instance methods (insert,
update, find, query). And now this thing is really, really fast, uses far
fewer SQL connections (delayed job had 300 connections, now we'll need only
~40 connections) and scales really well.
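
The shape described above can be sketched roughly as follows. This is not the actual code: the original is Ruby on em-connection-pool against MySQL, while this sketch assumes Python, with `sqlite3` standing in for the database, a blocking `queue.Queue` standing in for the connection pool, and `SimpleNamespace` standing in for OpenStruct. `Database` and `Jobs` are hypothetical names.

```python
import queue
import sqlite3
from contextlib import contextmanager
from types import SimpleNamespace


class Database:
    """Base class: a fixed-size connection pool plus a few query helpers."""

    table = None  # subclasses set this to their table name

    def __init__(self, path: str, pool_size: int = 4) -> None:
        self._pool: queue.Queue = queue.Queue()
        for _ in range(pool_size):
            conn = sqlite3.connect(path, check_same_thread=False)
            conn.row_factory = sqlite3.Row
            self._pool.put(conn)

    @contextmanager
    def _connection(self):
        conn = self._pool.get()      # waits if every connection is in use
        try:
            yield conn
        finally:
            self._pool.put(conn)     # always hand the connection back

    def query(self, sql, params=()):
        with self._connection() as conn:
            rows = conn.execute(sql, params).fetchall()
            conn.commit()
            return [SimpleNamespace(**dict(row)) for row in rows]

    # Column names are interpolated into the SQL, so they must come from
    # trusted code; values always go through bound parameters.
    def insert(self, **fields):
        cols = ", ".join(fields)
        marks = ", ".join("?" for _ in fields)
        self.query(f"INSERT INTO {self.table} ({cols}) VALUES ({marks})",
                   tuple(fields.values()))

    def update(self, row_id, **fields):
        sets = ", ".join(f"{col} = ?" for col in fields)
        self.query(f"UPDATE {self.table} SET {sets} WHERE id = ?",
                   (*fields.values(), row_id))

    def find(self, row_id):
        rows = self.query(f"SELECT * FROM {self.table} WHERE id = ?", (row_id,))
        return rows[0] if rows else None


class Jobs(Database):
    table = "jobs"
```

The pool size caps how many connections the workers can hold open at once, which is one plausible reading of the drop from 300 connections to about 40; the comment itself does not say exactly where the savings came from.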

The thing here is, that when I'm not using Rails at all, I don't know why I
should use evented Ruby instead of Node.

~~~
sandGorgon
Forgive me for digging deeper, but could you narrow down _why_ your
architecture is better? My gut feel is that you are leveraging the connection
pool better, because that's the only thing that can explain DB connection
issues. The other thing could be that your em-activerecord connection objects
might be too heavy, so you're hitting some kind of
ulimit/open-file-descriptors problem (basically OS issues).

This seems like exactly what would hit anybody who is trying to build a worker
model for processing data... hell even for sending emails. It would be
interesting to see where you are pushing the limits - it could even be
something deep like

~~~
pimeys
I tried to dig deeper to em-activerecord, but it really was a better idea to
build a very simple orm layer with specific sql queries instead of a more
complex solution.

This problem was noted by the developers of em-activerecord and em-synchrony
several months ago already. There seems to be no progress, which might point
to a harder problem, or mean that people are not using these libraries in
bigger services.

But my point was that Node.js seems to be a much, much more finished product
compared to EventMachine, for example. With EM you have to live without most
of Rails, and testing the EM code with unit tests, for example, is not such a
nice job to do...

What I liked the most with EM are the Ruby fibers. The best solution for
hiding the callbacks and writing nice and readable code by far. Too bad the
fibers are a bit of a ghetto still, like somebody said earlier.

