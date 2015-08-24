Hacker News
Dismissing Python Garbage Collection at Instagram (engineering.instagram.com)
It's basically "cheating" at GC by exploiting a very narrow use case. I saw a trick like this at Smalltalk Solutions in 2000 with a 3D game debugging tool. The "GC" simply threw everything away on every frame tick.

Someone should build something like a functional language around a trick like this. Or maybe a meta-language akin to RPython, so people can write small domain-specific languages for things like serving web requests, combined with a domain-specific "cheating" GC that gets away with doing much less work than a full general-purpose GC.

Couldn't a pure functional programming environment be structured to allow for such GC "cheating"?

PHP works very similarly to what they describe in the post. It has a garbage collector that's effectively reference counting, but the whole heap just gets trashed at the end of every request.
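A rough Python analogue of that per-request heap, offered only as a sketch: `Arena` and `Node` are hypothetical names, and this is not how PHP or CPython manage memory internally. Everything a "request" allocates goes through the arena, and `release()` clears the objects' attributes before dropping them, so even reference cycles are freed by plain refcounting without the cycle collector. It assumes CPython and objects that keep their state in a `__dict__`.

```python
import gc
import weakref


class Arena:
    """Hypothetical request-scoped region: everything allocated through it dies together."""

    def __init__(self):
        self._objects = []

    def new(self, cls, *args, **kwargs):
        # Allocate via the arena so it keeps a handle on every object.
        obj = cls(*args, **kwargs)
        self._objects.append(obj)
        return obj

    def release(self):
        # Clear each object's attributes first (this breaks any cycles among
        # them), then drop the arena's own references. In CPython, refcounting
        # alone then frees everything. Assumes plain instances with a __dict__.
        for obj in self._objects:
            vars(obj).clear()
        self._objects.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


class Node:
    def __init__(self):
        self.other = None


if __name__ == "__main__":
    gc.disable()  # make sure the cycle collector is not what frees the cycle
    with Arena() as arena:  # one arena per "request"
        a, b = arena.new(Node), arena.new(Node)
        a.other, b.other = b, a  # a reference cycle
        probe = weakref.ref(a)
        del a, b
    print(probe() is None)  # True on CPython: the cycle died with the arena
    gc.enable()
```

The design choice here is the same one PHP makes at a larger scale: instead of tracing what is still reachable, it relies on knowing that nothing from the request is needed once the request ends.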

Not exactly, but have a look at how GC works in BEAM [1][2].

[1]: https://www.erlang-solutions.com/blog/erlang-19-0-garbage-co...
[2]: https://hamidreza-s.github.io/erlang%20garbage%20collection%...

I immediately thought of BEAM while reading this.

The Erlang VM's GC does exactly this. Each process has its own heap, and any time a process goes down, all of its memory is thrown away.

Also, any time a function's scope terminates, all of its memory immediately goes away.

This can be done because it's a functional language with immutable data structures.

Is it just me, or does this look like a typical short-term hack that will blow up in your face pretty quickly and turn your life into a constant stream of low-level tinkering?

I suppose people at Instagram didn't stop there and are also planning a longer-term solution for optimizing their stack (i.e., migrating to a more performant language).

I didn't know about atop or perf profiling. Cool write up.

I'm confused: doesn't a worker run out of memory if GC is disabled?

CPython also uses reference counting, and that does the bulk of the work; the GC is just for what refcounting misses (reference cycles). At Instagram, the process that spins up the Python workers kills and replaces any worker that eventually exceeds its allowed memory threshold.
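To make the refcounting point concrete, here is a small CPython sketch (the `Node` class and `demo()` function are made up for illustration) showing that `gc.disable()` only turns off the automatic cycle collector. An object without cycles is still freed the moment its last reference disappears, while a reference cycle lingers until something calls `gc.collect()`. Uncollected cycles like this are presumably part of the slow growth that the memory-threshold restarts clean up.

```python
import gc
import weakref


class Node:
    """A trivial object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def demo():
    was_enabled = gc.isenabled()
    gc.disable()  # turns off the automatic cycle collector, not refcounting
    try:
        # Acyclic object: refcounting frees it as soon as the last reference goes.
        plain = Node()
        plain_ref = weakref.ref(plain)
        del plain
        freed_by_refcount = plain_ref() is None

        # Two objects pointing at each other: their refcounts never reach zero.
        a, b = Node(), Node()
        a.other, b.other = b, a
        cycle_ref = weakref.ref(a)
        del a, b
        survives_while_disabled = cycle_ref() is not None

        # An explicit collection still runs even while automatic GC is off.
        gc.collect()
        freed_by_collect = cycle_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_by_refcount, survives_while_disabled, freed_by_collect


print(demo())  # (True, True, True) on CPython
```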

A question from someone who doesn't do this sort of work: is this typical? Memory ballooning to the point that you have to periodically kill processes sounds like fuzzy logic somewhere to me.

I get that software is complex and people have deadlines...

Did Instagram's change disable refcounting too? Otherwise, wouldn't they still be triggering copy-on-read through the refcount field?

They abandoned that change because it didn't solve their problem.

It will eventually, but the master process monitors it and restarts it when its RSS reaches a certain threshold. So it's indeed a trade-off.
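A minimal sketch of that master-side policy, with everything hypothetical: `Supervisor` and the `read_rss`, `spawn` and `kill` callables stand in for whatever the real process manager uses to measure a worker's resident set size, start a worker, and stop one. It only captures the decision logic: any worker at or above the RSS limit is killed and replaced in the same slot.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Supervisor:
    """Hypothetical master loop that recycles workers whose RSS crosses a limit."""

    rss_limit_bytes: int
    read_rss: Callable[[int], int]  # assumed: pid -> current RSS in bytes
    spawn: Callable[[], int]        # assumed: starts a fresh worker, returns its pid
    kill: Callable[[int], None]     # assumed: terminates the worker with this pid
    workers: List[int] = field(default_factory=list)

    def check_once(self) -> List[int]:
        """Replace every worker at or over the limit; return the recycled pids."""
        recycled = []
        for i, pid in enumerate(self.workers):
            if self.read_rss(pid) >= self.rss_limit_bytes:
                self.kill(pid)
                self.workers[i] = self.spawn()  # same slot, fresh process
                recycled.append(pid)
        return recycled


# Usage with fake measurements instead of real processes:
fake_rss = {101: 300 * 2**20, 102: 900 * 2**20}
new_pids = iter([201, 202])
sup = Supervisor(
    rss_limit_bytes=512 * 2**20,
    read_rss=lambda pid: fake_rss.get(pid, 0),
    spawn=lambda: next(new_pids),
    kill=lambda pid: None,
    workers=[101, 102],
)
print(sup.check_once())  # [102]
print(sup.workers)       # [101, 201]
```

A real master would call `check_once()` periodically, so a worker can overshoot the limit between checks; that polling gap is part of the trade-off.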

This is actually very clever, and it solves their problem quite neatly!

