

Tornado twice as fast with PyPy 1.7 compared to Python 2.7 - peterbe
https://groups.google.com/d/topic/python-tornado/VkOPfrbhaXE/discussion

======
peterhunt
I _love_ PyPy, but it kind of sucks watching everyone get excited about the
CPU performance of a web server while forgetting about this:

"Pypy used about three times as much memory in both cases, but usage was
stable over time (i.e. it's not "leaking" like pypy 1.6 did)."

For a front-end web server, I'd trade CPU perf for memory any day of the week.

~~~
sb
I think that's a good point and was also heavily commented on when Google's
Unladen Swallow released its benchmark numbers (IIRC, for the Django
benchmark its binary size grew to 800 megs). That was probably even a reason
they stopped working on it (there was a link somewhere, but I cannot find it
right now).

Furthermore, I think this "problem" is attributable to JIT compilation in
general, since you have to store the generated code somewhere. The situation
was/is somewhat similar to the JVM's memory requirements. An interesting
alternative to code generation is to optimize interpreters instead.

~~~
sandGorgon
With SSDs catching up (in terms of price/GB) to HDDs, could you possibly use
an SSD as a JIT cache? Processors are the most power-hungry part of a machine,
and considering cooling is the biggest bottleneck in a modern datacenter, I
wonder if SSDs acting as caches could amplify a processor's power.

~~~
wmf
Sure, you can use an SSD as swap (unless you're in the cloud where SSDs have
inexplicably not been invented yet).

------
abecedarius
While I've also seen it to be about twice as fast in my own testing (not with
Tornado), it also had much longer GC pause times. (No stats, just watching my
program's logs go by -- very noticeable stutters every couple seconds or so.)

~~~
aidenn0
I would be very impressed if any real GC beat out CPython in terms of GC pause
times, as CPython uses reference counting, which is far less subject to GC
pauses than other GC methods (but has the disadvantages of worse amortized
performance and an inability to collect circular data structures).
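A minimal sketch of the point (note this immediacy is an implementation detail of CPython's reference counting; PyPy defers finalization to its tracing GC, which is exactly where the batched pauses come from):

```python
freed = []

class Tracked:
    def __del__(self):
        # In CPython this runs inline, the instant the last reference
        # disappears -- deallocation work is spread across the program
        # rather than batched into collector pauses.
        freed.append("Tracked")

obj = Tracked()
del obj            # refcount hits zero here
print(freed)       # on CPython: ['Tracked']
```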

~~~
jerf
"Difficulty" collecting circular structures, not "inability". You can still
walk the live set with reference counting every so often to catch the circular
trash, but you do have to pay through the nose, relatively speaking.

~~~
sb
Which is actually what Python is doing, too. There is (IIRC) a supplemental
mark-and-sweep collector that collects cycles. Another interesting technique
for dealing with this problem is called "trial deletion."
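The division of labor is easy to see with CPython's `gc` module: reference counting alone can never free a cycle, but the supplemental collector reclaims it on the next collection (a small sketch; the `Node` class is invented for illustration):

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()                 # start from a clean slate

# Build a two-node cycle, then drop the only external references.
a, b = Node(), Node()
a.ref, b.ref = b, a
a = b = None

# Each node still references the other, so neither refcount ever
# reaches zero; only the cycle-detecting collector can free them.
unreachable = gc.collect()
print(unreachable)           # counts at least the two Node objects
```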

------
peterbe
And memory stable!

And three times faster on template rendering!

~~~
kingkilr
Major thanks to Justin Peel for finding the memory leak! That was a fun one:
memory allocated external to the GC (e.g. something allocated by OpenSSL)
wasn't factored into calculating when to run the GC, which meant that the
destructors of the objects holding onto the OpenSSL structs weren't being
called, because the GC never saw the correct memory pressure.
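A hypothetical sketch of that failure mode, with `ctypes` standing in for OpenSSL's external allocations (the `Handle` class is invented for illustration, not PyPy's actual code):

```python
import ctypes

class Handle:
    """Tiny Python object wrapping a large allocation made outside the
    GC-managed heap. A collector whose trigger heuristic only counts
    tracked objects (not external bytes) sees 64 small wrappers rather
    than ~64 MB, so it feels no pressure to run their destructors."""
    def __init__(self, nbytes):
        # Stand-in for an OpenSSL struct allocated by a C library.
        self.buf = ctypes.create_string_buffer(nbytes)

handles = [Handle(1 << 20) for _ in range(64)]   # ~64 MB held externally
print(len(handles), "small wrappers visible to the GC heuristic")
```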

------
ecdavis
It's interesting that the first run on both the PyPy tests was significantly
slower than all subsequent runs. I guess PyPy needs to do a bit of work before
it's properly warmed up.

~~~
endtime
That's because PyPy is JIT-based - it compiles the code just before running
it, unless it has already been compiled (which is the case on all but the
first run).
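A quick way to see the warmup yourself (timings are illustrative: under CPython the runs are roughly uniform, while under PyPy the first run typically includes trace-compilation time):

```python
import time

def hot_loop(n):
    # A simple numeric loop: the kind of code PyPy's tracing JIT
    # compiles to machine code after observing it run.
    total = 0
    for i in range(n):
        total += i * i
    return total

for run in range(3):
    start = time.perf_counter()
    hot_loop(2_000_000)
    print(f"run {run}: {time.perf_counter() - start:.3f}s")
```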

