
PyPy for low-latency systems - shocks
https://morepypy.blogspot.com/2019/01/pypy-for-low-latency-systems.html
======
kahseng
Reminds me of a time at Quora in 2011 when we saw Python GC impact 99th-
percentile server-side site speed. Drawing inspiration from HFT, where some
companies disable JVM GC during trading hours and perform it offline, I
thought about taking some backends periodically offline so that GC would
never happen on user requests. A simpler operational solution emerged,
though: I just disabled GC on user requests and made it happen only on a
special "/_gc" endpoint. I then dual-purposed the frequent nginx/haproxy
backend health-check functionality to use that endpoint, ensuring all
backends ran GC frequently, with the time spent there impacting only the
health-check requests and not end users.

edit: added more details I remembered later
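
A rough sketch of that setup, for the curious (plain WSGI; the handler is
illustrative, not Quora's actual code):

    import gc
    from wsgiref.simple_server import make_server

    gc.disable()  # no automatic cyclic GC during user requests

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        if environ["PATH_INFO"] == "/_gc":
            # Health-check endpoint: nginx/haproxy hits this frequently,
            # so the GC pause is paid here instead of on user requests.
            gc.collect()
            return [b"ok"]
        return [b"hello"]  # normal request path, never pauses for cyclic GC

    make_server("", 8000, app).serve_forever()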

~~~
JanisL
This is a very interesting approach. What happened to the memory footprint
when you did this?

~~~
kahseng
Thanks, I don't think I saw much impact at all in aggregate: memory
consumption on these web servers was dominated by objects we intentionally
stored per request or globally, not by temporary/unreferenced Python objects.

~~~
cma
Even while GC is delayed, Python (CPython at least) will free most objects
through reference counting; only circularly referenced objects stick around
until the next GC run. That alone takes care of lots of stack temporaries
and the like.
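
A quick way to see the difference on CPython (an illustrative sketch):

    import gc

    class A:
        def __del__(self):
            print("freed")

    gc.disable()

    a = A()
    del a          # prints "freed" right away: refcount hit zero

    b = A()
    b.self = b     # reference cycle: refcount never reaches zero
    del b          # prints nothing
    gc.collect()   # prints "freed": only the cycle collector finds it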

~~~
Doxin
Theoretically, with your code structured right, you can disable the cyclic
garbage collector outright: it only deals with reference cycles, which you
can explicitly avoid by using the weakref module.

Not entirely sure how you'd go about writing code like that, but it's
possible.
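
For instance, a child keeping a strong reference back to its parent is a
classic cycle; a weak back-reference avoids it (a sketch, not anything from
the article):

    import weakref

    class Node:
        def __init__(self, parent=None):
            self.children = []
            # Weak back-reference: does not keep the parent alive, so
            # parent <-> child never forms a strong reference cycle.
            self._parent = weakref.ref(parent) if parent is not None else None

        @property
        def parent(self):
            return self._parent() if self._parent is not None else None

    root = Node()
    root.children.append(Node(root))
    del root  # the whole tree is freed by refcounting alone, no GC needed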

------
htfy96
Nice examples and graphs. What really confuses me is the definition of "low-
latency" nowadays; the meaning has been on a slippery slope in recent years.
It used to refer to a scale of microseconds at HFT shops; then it came to
mean web-request latency at a scale of milliseconds. Now every GC-based
language claims to be "low-latency" because its 90th/95th/99th-percentile GC
pause is within 16.67ms/50ms/whatever. So today some HFT developers have
coined terms like "ultra-low latency"[0] to describe their work.

[0]: https://en.wikipedia.org/wiki/Ultra-low_latency_direct_market_access

~~~
mcpherrinm
On the flip side, I've also heard "low-latency" (or even "instant") used to
mean "within 30 seconds", as an upgrade from "the mainframe runs a batch job
every hour (or day)".

I don't think there's any way to consider a phrase like "low latency" without
considering what you're talking about.

~~~
milkytron
Good point, the word "low" itself is relative.

------
zokier
This could possibly be combined with multiprocessing to great effect. I'm
imagining something like a pool of workers executing tasks (/reacting to
events/serving requests/etc), each running GC only after a task is done but
before indicating readiness to the supervising process (/load balancer/etc).
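
A sketch of one such worker loop (handle() and the queue wiring are
placeholders; the readiness signal would feed whatever supervises the pool):

    import gc
    import multiprocessing as mp

    def handle(task):
        ...  # latency-sensitive application work (placeholder)

    def worker(tasks, ready):
        gc.disable()  # never collect in the middle of a task
        ready.put(mp.current_process().name)  # initial "ready" signal
        while True:
            task = tasks.get()
            if task is None:
                break
            handle(task)
            gc.collect()  # pay the GC pause here, between tasks...
            ready.put(mp.current_process().name)  # ...before signalling ready

    if __name__ == "__main__":
        tasks, ready = mp.Queue(), mp.Queue()
        procs = [mp.Process(target=worker, args=(tasks, ready)) for _ in range(2)]
        for p in procs:
            p.start()
        tasks.put({"work": 1})
        for _ in procs:
            tasks.put(None)  # shut down
        for p in procs:
            p.join()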

~~~
aidenn0
Even a very simple setup with two processes that never have GC enabled at
the same time would greatly improve things.

------
dajonker
Seems quite useful: it lets the developer guide the garbage collector in the
right direction by carefully placing statements that tell it when it is
(not) OK to run. But make sure to add comments describing the purpose of
those statements, and how to profile the code to check it's still working
correctly. You don't want to accidentally stop garbage collection
altogether, either.
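
As a concrete example of such carefully placed, commented statements, this
is roughly the pattern the linked post describes for PyPy (where
gc.disable() stops only major collections and gc.collect_step() runs one
increment of them; handle_event() is a placeholder):

    import gc

    gc.disable()  # PyPy: stop major collections; minor ones still happen.
                  # Profile memory to verify usage stays bounded!

    def handle_event(event):
        ...  # latency-critical path: must never absorb a major-GC pause

    def idle_loop():
        # Safe window: advance the major collection one small step at a
        # time so no single pause is long (PyPy returns stats per step).
        gc.collect_step()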

------
raymondh
This is a nice bit of progress and addresses a major concern about using
PyPy in real-time systems.

------
genjipress
I'd be curious to see if any of the work done here could be applied back to
the main CPython project. I doubt it could happen immediately -- at least, not
with the way GC is currently implemented -- but PyPy has been a source of
innovation for CPython in the past (see: new dict implementation).

~~~
throwaway12iii
CPython already has a lower-latency GC than PyPy, gc.disable() already
works, and it has always allowed manual memory management when needed.

Reference counting lets you (if needed) keep references to memory in your
Python code and free them at the right spots.

This makes PyPy useful for a _lot_ more production use cases, from web APIs
with a latency SLA to audio and games. In many of those, peak performance is
not what matters; worst-case performance is.

~~~
mattip
Refcounting comes with its own in-thread GC pauses: whenever you exit a
block or context, the local variables are collected.

~~~
throwaway12iii
Yeah, but you have the option not to pause when it matters: you can control
where the memory management happens. Keep references to the memory and call
gc.disable(); when you are ready, let go of the references and re-enable the
GC.
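
One way to do that, sketched (hot_items() and process() are placeholders,
not from the comment):

    import gc

    def hot_items():    # placeholder: latency-sensitive event source
        return range(1000)

    def process(item):  # placeholder work that allocates
        return [item] * 10

    graveyard = []      # strong references keep refcounts above zero

    gc.disable()
    for item in hot_items():
        graveyard.append(process(item))  # nothing is freed mid-loop

    # Safe point: drop the references and let refcounting + GC clean up.
    graveyard.clear()
    gc.enable()
    gc.collect()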

PyPy now lets you control where memory management happens, making it
possible to bound worst-case performance. For many production apps this is a
big deal.

~~~
mattip
You can never prevent the collection CPython does at the end of a block
(context); you can only prevent the GC pass that breaks reference cycles. If
your class does crazy things at destruction, like "time.sleep(10)", and you
create an instance of it inside a function, then when that function returns
you will pause CPython even if you called gc.disable().

You also cannot disable the minor collections in PyPy, only the major ones,
but once the JIT kicks in PyPy can avoid some of the object churn by
optimizing instances away.
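
Easy to demonstrate on CPython:

    import gc
    import time

    class Slow:
        def __del__(self):
            time.sleep(10)  # pathological destructor

    def f():
        s = Slow()  # refcount drops to zero when f() returns...

    gc.disable()
    f()  # ...so this still blocks ~10 seconds: reference counting,
         # not the cycle collector, is what runs __del__ here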

~~~
throwaway12iii
Yeah. Avoiding slow things like classes, threads and adding time.sleep(10) is
the trick.

------
pstrateman
But a new event could be submitted at any time.

This is certainly an improvement, but not a complete solution.

~~~
viraptor
Ideally you have enough copies of the server process to handle the events
that come in while another process is running GC.

You already have to have enough of them to handle events while other workers
are busy.

------
cakoose
Relevant: "Blade: A Data Center Garbage Collector" (2015,
[https://arxiv.org/pdf/1504.02578.pdf](https://arxiv.org/pdf/1504.02578.pdf))

Terrible title, but basically the same idea.

------
a_imho
_Combining these two functions, it is possible to take control of the GC to
make sure it runs only when it is acceptable to do so._

I'm conflicted on this. My gut tells me that if I'm going to take manual
control of the garbage collector, I should reconsider my design decisions.

Disclaimer: I don't know the first thing about PyPy or Gambit Research, so
probably this is the right approach for them?

~~~
Fragoel2
The article explains that this solves one specific issue Gambit Research
had: in some parts of their code they need to act with very low latency
(<10ms), so they can't wait for the GC. This way they run the GC manually in
other sections of the code where the timing requirements are relaxed.

------
alex-wallish
This is awesome!

