
Composer – Disable GC when computing deps and refs - Damin0u
https://github.com/composer/composer/commit/ac676f47f7bbc619678a29deae097b6b0710b799
======
Seldaek
For those looking for a technical explanation, the PHP garbage collector in
this case is probably wasting a ton of CPU cycles trying to collect thousands
of objects (a LOT of objects are created to represent all the inter-package
rules when solving dependencies) during the solving process. It keeps trying
and trying as objects are allocated, and even though it cannot collect
anything, it still has to check them all every time it triggers.

Disabling GC only kills the advanced GC (the cycle collector) but leaves the
basic reference counting approach to freeing memory, so Composer can keep
trucking without using much more memory, as the GC wasn't really collecting
anything. The memory reduction many people report is rather due to some other
improvements we made yesterday.
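CPython has the same split between reference counting and a separate cycle collector, so it is a convenient place to see the point in miniature. A minimal sketch, assuming CPython (where `gc.disable()` turns off only the cycle collector) and a hypothetical `Rule` class standing in for one solver rule: with the cycle collector off, an object whose last reference goes away is still freed at once.

```python
import gc
import weakref

class Rule:
    """Hypothetical stand-in for one of a solver's many small rule objects."""
    def __init__(self, text):
        self.text = text

def freed_with_cycle_gc_disabled():
    was_enabled = gc.isenabled()
    gc.disable()                 # turns off only the cycle collector
    try:
        rule = Rule("a/a requires b/b ^1.0")
        probe = weakref.ref(rule)  # observe without keeping it alive
        del rule                 # last reference gone: refcount drops to zero
        return probe() is None   # already freed by plain reference counting
    finally:
        if was_enabled:
            gc.enable()

print(freed_with_cycle_gc_disabled())
```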

As to why the problem went unnoticed for so long, it seems that the GC is not
able to be observed by profilers, so whenever we looked at profiles to improve
things we obviously did not spot the issue. In most cases, though, this isn't
an issue, and I would NOT recommend that everyone disable GC on their project
:) GC is very useful in many cases, especially for long-running workers, but
the Composer solver falls outside the use cases it's made for.

~~~
sillysaurus3
_As to why the problem went unnoticed for so long, it seems that the GC is not
able to be observed by profilers, so whenever we looked at profiles to improve
things we obviously did not spot the issue._

That sounds like a bug in the profiler, not with Composer. Observing internal
time is pretty important for any profiler.

~~~
Seldaek
Yes, it is definitely a failure of the tooling, and I hear it is actually
being worked on.

------
tomp
I had the same issue in Python recently. The project runs as a server that
loads a huge number of objects from the database, and could use as much as
10GB of memory! Python's reference counting works great, but every so often
the full-heap-scanning cycle collector would run, and it took quite a lot of
time to scan a multi-GB heap.

We noticed the issue happened most often when deserializing objects (loading
them from Redis to memory). As it turns out, Python would schedule a
collection every time the _object_created_ counter was sufficiently higher
than the _object_destroyed_ counter. In general, this makes sense, because
that way you can be sure that objects are being created and not being freed,
which most likely means a resource leak or a reference cycle. However, the
same thing happens during deserialization: many new objects are created, and
none are freed. Coupled with Python's low threshold (700), GC was triggered
many, many times in every deserialization loop (usually in vain, as no new
objects became recyclable). Disabling GC and running full collections
manually solved the problem.
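A minimal sketch of that pattern in CPython, with a hypothetical `_load` function standing in for the Redis deserialization: it counts how many collections start (via `gc.callbacks`) while many containers are created and none are freed, then shows the disable-then-collect-once workaround. The 700 default applies to many CPython versions but may differ in yours, so the sketch only reads the threshold with `gc.get_threshold()` rather than assuming it.

```python
import gc

def _load(n_objects):
    # Hypothetical stand-in for deserializing records from Redis: lots of new
    # container objects are created and none are freed while the loop runs.
    return [{"id": i, "tags": [i]} for i in range(n_objects)]

def count_automatic_collections(n_objects, gc_enabled):
    """Count collections that start while _load(n_objects) is running."""
    started = []

    def on_gc(phase, info):
        if phase == "start":
            started.append(info["generation"])

    was_enabled = gc.isenabled()
    if gc_enabled:
        gc.enable()
    else:
        gc.disable()
    gc.callbacks.append(on_gc)
    try:
        _load(n_objects)
    finally:
        gc.callbacks.remove(on_gc)
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return len(started)

def load_then_collect_once(n_objects):
    """The workaround: no automatic GC during the load, one full pass after."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        records = _load(n_objects)
    finally:
        if was_enabled:
            gc.enable()
    gc.collect()
    return records

print(gc.get_threshold())  # first value is the generation-0 trigger
print(count_automatic_collections(100_000, gc_enabled=True))
print(count_automatic_collections(100_000, gc_enabled=False))  # 0
```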

------
Gigablah
Looks like someone disabled garbage collection on that comment thread as well
:)

~~~
mahouse
The truth is, I don't understand the point of having to download MBs of stupid
animated images I will not even look at when I expect to see a commit diff.

~~~
sergiosgc
The much anticipated sour-grapes-party-spoiler award goes to:

------
Mithaldu
As far as I understand, Composer is roughly the same thing as the CPAN client.
And they just simply disabled the garbage collector for it.

What is this guy doing that he needs gigabytes of memory to install a bunch of
PHP libraries?

    
    
        Before: Memory usage: 2194.78MB (peak: 3077.39MB), time: 1324.69s
        After:  Memory usage: 4542.54MB (peak: 4856.12MB), time:  232.66s

~~~
Maken
So, did they exchange a 70% reduction in execution time for a 100% increase
in memory usage?

~~~
antirez
1324 -> 232. Saying "70% reduction vs. 100% increment" makes it sound like you
are talking about the same stuff and that it is possible to compare 70 with
100.

The reality, even in the above edge case: 2x memory, 6x speed.
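Working the ratios out from the two lines quoted above (time, current memory, and peak memory):

$$
\frac{1324.69\ \text{s}}{232.66\ \text{s}} \approx 5.69,
\qquad
1 - \frac{232.66}{1324.69} \approx 0.824
$$

$$
\frac{4542.54\ \text{MB}}{2194.78\ \text{MB}} \approx 2.07,
\qquad
\frac{4856.12\ \text{MB}}{3077.39\ \text{MB}} \approx 1.58
$$

On these numbers the run time drops by about 82% (roughly 5.7x faster), the reported memory usage about doubles, and peak memory rises by about 58%.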

~~~
Cthulhu_
This; interpreting performance numbers is hard. Also, memory nowadays is
cheap, CPU power isn't.

~~~
porker
> Also, memory nowadays is cheap, CPU power isn't.

Unless you're running your deployment on a 512MB or 1GB VM. I've had Composer
max out swap on those too. Even with 2GB of RAM it's not been happy sometimes,
so it'll be interesting to see what difference this patch makes.

~~~
jayzalowitz
Yeah, I've had to move to a more expensive EC2 instance on account of this
very issue.

~~~
sneakest
IMO you should commit your composer.lock file to your repository and then
use composer.phar install --no-dev --optimize-autoloader on any production
instance. Install is much faster and uses hardly any memory compared to the
update command.

To add/update any dependencies for your project, run composer.phar update in
your development environment or somewhere it can use a ton of memory and CPU
without issue. Then just commit and push your composer.lock changes. I've been
doing it this way for over a year and have had no issues deploying changes in
EC2.

------
DangerousPie
Interesting. I was looking at the comments hoping for some more technical
background, but unfortunately they seem to have been run over by the animated
gif crowd.

Any more details on this?

~~~
rossriley
It seems that when you start to hit the memory limit, PHP's automatic garbage
collection will loop through the constructed objects to see if any can be
cleaned up.

If none can (and in the case of Composer, all the objects exist for a reason),
then it's wasting time analysing the objects.

So in this case, with GC enabled, there's only a large waste of CPU doing
nothing.

~~~
icebraining
Well, yes and no, since some of the reports show an increase in memory usage,
so the GC was doing something.

~~~
masklinn
Most reports don't show significantly changed memory usage though (some
increase slightly, others decrease slightly).

------
mtmail
Wonderful commit.

(I didn't know animated GIFs in GitHub comments were a thing. Maybe I work too
much with boring projects.)

------
dec0dedab0de
Could someone more versed in PHP and this project explain why turning off
garbage collection helped so much? And why they didn't turn it back on at the
end of the function?

~~~
jcampbell1
PHP is reference counted, so memory is typically freed as soon as an object is
no longer needed. Cycles are the exception and can cause memory leaks, so in
version 5.3 PHP added a cycle collector, which reads every object in memory
and very occasionally deletes objects that are disconnected but still have
greater-than-zero reference counts (cycles).

In my opinion, the PHP cycle collector is a pointless waste of time. In
Objective-C, Apple just lets the memory leak by default, and they give you
tools to find the leaks, and then you modify the code to break the cycles.

There is no need to turn cycle collection back on at the end of the program,
because the OS frees the memory at program termination.
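A minimal CPython sketch of the leak described above, using a hypothetical two-node `Node` class: once the outside references are gone, the pair keeps each other's counts above zero, so reference counting alone never frees them, while an explicit `gc.collect()` does.

```python
import gc
import weakref

class Node:
    """Hypothetical object that can point at another Node."""
    def __init__(self):
        self.other = None

def cycle_needs_cycle_collector():
    """Return (leaked_after_del, freed_after_collect) for a two-node cycle."""
    was_enabled = gc.isenabled()
    gc.disable()                 # keep automatic collection out of the experiment
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # each holds a reference to the other
        probe = weakref.ref(a)   # observe `a` without keeping it alive
        del a, b                 # unreachable now, yet both counts are still 1
        leaked = probe() is not None
        gc.collect()             # explicit collection still works while disabled
        freed = probe() is None
        return leaked, freed
    finally:
        if was_enabled:
            gc.enable()

print(cycle_needs_cycle_collector())
```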

~~~
innocenat
I agree that the cycle collector is a pointless waste of time. Most scripts
run short enough that the memory leak doesn't really matter.

But for long-running scripts, it's either a cycle collector or support for
weak references. But IMO, due to how references are stored in PHP, and to my
limited knowledge of PHP core, I am quite sure a cycle collector is more
beneficial in both developer time and usefulness. (Not every programmer knows
how to manage reference cycles.)
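The weak-reference alternative can be sketched in Python with the standard `weakref` module; `Parent` and `Child` are hypothetical. Because the child's back-pointer does not count as a reference, the parent is freed by reference counting alone, even with the cycle collector switched off (CPython assumed).

```python
import gc
import weakref

class Parent:
    def __init__(self):
        self.children = []

class Child:
    def __init__(self, parent):
        # Weak back-pointer: does not add to the parent's reference count.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        return self._parent()    # None once the parent is gone

def parent_freed_without_cycle_collector():
    was_enabled = gc.isenabled()
    gc.disable()                 # show the cycle collector is not involved
    try:
        p = Parent()
        child = Child(p)
        p.children.append(child)
        assert child.parent is p
        probe = weakref.ref(p)
        del p                    # the only strong reference to the parent
        return probe() is None and child.parent is None
    finally:
        if was_enabled:
            gc.enable()

print(parent_freed_without_cycle_collector())
```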

------
kornakiewicz
I remember a story about a friend of mine at an algorithmic contest for high
school students in Poland (which are quite hard). He solved the problems
correctly, but his implementation checked on every iteration of a loop
whether a collection still had any elements. He used col.size()==0 instead of
col.isEmpty(). The first was O(n), and it fucked up all the performance.

~~~
tantalor
That's a bug.

~~~
anon4
Not really; some containers have a linear-time size by design. The canonical
example is a linked list in which you wish to keep splicing another list into
the middle a constant-time operation.
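A minimal Python sketch of that trade-off, using a hypothetical singly linked list with head and tail pointers: because no element count is stored, moving a run of nodes from one list into the middle of another is a constant number of pointer updates, and the price is that `size()` must walk the list while `is_empty()` stays O(1). Calling `size() == 0` inside a loop, as in the contest story, turns each check into a full traversal.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    """Hypothetical singly linked list that deliberately stores no length."""

    def __init__(self, values=()):
        self.head = self.tail = None
        for v in values:
            self.push_back(v)

    def push_back(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        return node

    def is_empty(self):          # O(1)
        return self.head is None

    def size(self):              # O(n): no stored count, so walk every node
        n, node = 0, self.head
        while node is not None:
            n += 1
            node = node.next
        return n

    def splice_after(self, pos, other, prev, last):
        """Move other's nodes from prev.next (or other.head if prev is None)
        through `last` to just after `pos`. O(1) here; a stored count would
        force a walk over the moved run to fix both lists' lengths."""
        first = other.head if prev is None else prev.next
        if prev is None:
            other.head = last.next
        else:
            prev.next = last.next
        if other.tail is last:
            other.tail = prev
        last.next = pos.next
        pos.next = first
        if self.tail is pos:
            self.tail = last

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out
```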

------
shaurz
Wait, when did GitHub become the new 4chan?

~~~
meowface
It's much closer to reddit than 4chan, otherwise the nature of the images
posted would be a little different.

And it's been like this for 2 or 3 years now. I've seen comment spam of images
on commits and issues for quite a while.

~~~
joshuacc
The first one I remember was the commit that added CoffeeScript to Rails by
default.
[https://github.com/rails/rails/compare/9333ca7...23aa7da](https://github.com/rails/rails/compare/9333ca7...23aa7da)

~~~
lmm
The first I remember was [https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac), but I guess that
was a couple of months later.

------
markartur
What is wrong with the comments...

~~~
trebor
Nothing actually, they're working correctly. The _people_ on the other hand...
that's the questionable part. ;)

------
gus_massa
I found an interesting comment among the GIFs:
[https://github.com/composer/composer/commit/ac676f47f7bbc619...](https://github.com/composer/composer/commit/ac676f47f7bbc619678a29deae097b6b0710b799#commitcomment-8796588)
by h4cc

> Behold, found something in the docs about garbage collection:

>> Therefore, it is probably wise to call gc_collect_cycles() just before you
call gc_disable() to free up the memory that could be lost through possible
roots that are already recorded in the root buffer. [...]
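In Python terms, the advice maps onto collecting once before pausing automatic collection. A minimal sketch, assuming CPython and a hypothetical helper name `cycle_gc_paused`; CPython has no fixed-size root buffer that can overflow, so here the upfront `gc.collect()` simply starts the paused phase from a heap with no known cyclic garbage.

```python
import contextlib
import gc

@contextlib.contextmanager
def cycle_gc_paused():
    """Collect pending cycles, then pause automatic cycle collection."""
    was_enabled = gc.isenabled()
    gc.collect()      # analogue of gc_collect_cycles(): clear what is pending
    gc.disable()      # analogue of gc_disable(); refcounting keeps working
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with cycle_gc_paused():
    rules = [{"id": i} for i in range(100_000)]  # hypothetical heavy phase
```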

------
butwhy
Never have I seen so many gifs on a commit page.

~~~
TeMPOraL
Then you most certainly don't remember the infamous Bumblebee Fiasco.

[https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commi...](https://github.com/MrMEEE/bumblebee-Old-and-abbandoned/commit/a047be85247755cdbe0acce6f1dafc8beb84f2ac)

~~~
ward
Ugh. I remember that. I had posted on it to explain to the developer why
there was so much attention, and kept receiving emails and notifications from
GitHub for ages. At the time there was no way to "stop watching" something
once you had commented on it, if I remember correctly.

~~~
TeMPOraL
I still have all those notification emails; I am still thinking about graphing
them one day to see how the commenting rate on this thread evolved over time.

------
headius
Am I the only one who considers this disgusting? If the GC is so bad that it
causes 2-10x slower operation in this use case, then it's a bad GC. I mean
really, really bad. Short-lived objects in any modern GC should be swept away
trivially without a lot of overhead. Of course we're talking about PHP here,
so perhaps it's redundant to say that something about it sucks, but
Jesus... runtimes that require hacks like this should be taken out back and
shot.

------
NDizzle
Wow, 64-bit Chrome on OS X can't handle that many animated GIFs. 32-bit could
just fine. What gives?!

~~~
r109
BeamSyncDropper v2

------
munificent
I was curious, so I did some investigation, starting here:

[http://php.net/manual/en/features.gc.php](http://php.net/manual/en/features.gc.php)

Here's what I found:

PHP uses ref-counting for most garbage collection. That means non-cyclic data
structures are collected eagerly, as soon as the last reference to an object
is removed.

Naïve ref-counting can't collect cyclic data structures, though. Normally,
cycles are "collected" in PHP by just waiting until the request is done and
ditching everything. That works great for web sites, but makes less sense for
a command line app like Composer.

To better reclaim memory, PHP now has a cycle collector. Whenever a ref-count
is decremented but not zero, that means a new island of detached cyclic
objects _could_ have been created. When this happens, it adds that object to
an array of possible cyclic roots.

When that array gets full (10,000 elements), the cycle collector is triggered.
This walks the array and tries to collect any cyclic objects. They reference
this paper[1] for their algorithm for doing this, but what they describe just
sounds like a regular simple synchronous cycle collector to me.
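A toy model of the buffering rule described above, written in Python. `Obj`, `ToyHeap`, and the `buffered` flag (which stops one object being added twice) are assumptions of the sketch rather than PHP's actual data structures, and `collect_cycles` is only a placeholder that assumes every buffered root is live. It shows the buffer filling, and collections firing, purely from dropping and re-taking references to objects that stay alive.

```python
class Obj:
    """Hypothetical refcounted value in a toy model of the scheme above."""
    def __init__(self):
        self.rc = 0
        self.buffered = False    # already in the possible-root buffer?

class ToyHeap:
    def __init__(self, root_capacity=10_000):
        self.root_capacity = root_capacity
        self.possible_roots = []
        self.collections = 0

    def incref(self, obj):
        obj.rc += 1

    def decref(self, obj):
        obj.rc -= 1
        if obj.rc == 0:
            # Plain refcounting: nothing points here any more, free it now.
            if obj.buffered:
                self.possible_roots.remove(obj)
                obj.buffered = False
        elif not obj.buffered:
            # Still referenced, but it might now head a detached cycle.
            obj.buffered = True
            self.possible_roots.append(obj)
            if len(self.possible_roots) >= self.root_capacity:
                self.collect_cycles()

    def collect_cycles(self):
        # Placeholder for the real scan. This toy assumes every buffered root
        # is still live, so nothing is reclaimed; the buffer is just emptied.
        self.collections += 1
        for obj in self.possible_roots:
            obj.buffered = False
        self.possible_roots.clear()

def reassign_references(heap, live, passes):
    """Drop and re-take one reference to each live object; nothing is allocated."""
    for _ in range(passes):
        for obj in live:
            heap.decref(obj)     # rc 2 -> 1: becomes a possible root
            heap.incref(obj)     # rc back to 2

heap = ToyHeap()
live = [Obj() for _ in range(20_000)]
for obj in live:
    heap.incref(obj)
    heap.incref(obj)             # two owners each
reassign_references(heap, live, passes=3)
print(heap.collections)          # 6: two fruitless collections per pass
```

With 20,000 live objects and a 10,000-entry buffer, every pass of reassignments triggers two collections that reclaim nothing.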

The basic process is pretty simple. Starting at an object that could be the
beginning of some cyclic graph, _speculatively_ decrement the ref-count of
everything it refers to. If any of them go to zero, recursively do that to
everything they refer to and so on. When that's done, if you end up with any
objects that are at zero references, they can be collected. For everything
left, undo the speculative decrements.
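The trial-deletion process can be sketched as the synchronous algorithm from the paper referenced as [1] (mark gray, scan, collect white). This is a toy Python version over a hypothetical `Obj` graph, not PHP's C implementation; in it, every object reachable from a candidate root has its internal references subtracted, and recursion is used for brevity where a real collector would use an explicit stack.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Obj:
    """Hypothetical heap object: a real refcount plus outgoing references."""
    name: str
    rc: int = 0
    refs: list = field(default_factory=list)
    color: str = "black"   # black = in use, gray = under test, white = garbage

def link(src, dst):
    """src now holds a reference to dst."""
    src.refs.append(dst)
    dst.rc += 1

def mark_gray(s):
    # Speculatively subtract every internal reference reachable from s.
    if s.color != "gray":
        s.color = "gray"
        for t in s.refs:
            t.rc -= 1
            mark_gray(t)

def scan(s):
    # A count still above zero means something outside the subgraph points here.
    if s.color == "gray":
        if s.rc > 0:
            scan_black(s)
        else:
            s.color = "white"
            for t in s.refs:
                scan(t)

def scan_black(s):
    # Live after all: restore the counts that mark_gray subtracted.
    s.color = "black"
    for t in s.refs:
        t.rc += 1
        if t.color != "black":
            scan_black(t)

def collect_white(s, garbage):
    if s.color == "white":
        s.color = "black"
        garbage.append(s)
        for t in s.refs:
            collect_white(t, garbage)

def collect_cycles(candidate_roots):
    for r in candidate_roots:
        mark_gray(r)
    for r in candidate_roots:
        scan(r)
    garbage = []
    for r in candidate_roots:
        collect_white(r, garbage)
    return garbage

a, b = Obj("a"), Obj("b")
link(a, b)
link(b, a)                       # unreachable cycle
c, d = Obj("c"), Obj("d")
link(c, d)
link(d, c)
c.rc += 1                        # an outside variable still points at c
print([o.name for o in collect_cycles([a, c])])  # ['a', 'b']
print(c.rc, d.rc)                # 2 1: speculative decrements undone
```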

If you have a large live object graph, this process can be super slow: you
have to traverse the entire object graph. If there are few dead objects, you
burn a bunch of time doing this and don't get anything back.

Meanwhile, you're busy adding and removing references to live objects, so that
potential root array is constantly filling up, re-triggering the same
ineffective collection over and over again. Note that this happens even when
you aren't _allocating_: just assigning references is enough to fill the
array.

To me, this is the real problem compared to other languages. You shouldn't
thrash your GC if you aren't allocating anything!

Disabling the GC (which only disables the cycle collector, not the regular
delete-on-zero-refs) avoids that. However, it has a side effect. Once the
potential root array is full, any new potential roots get discarded. That
means even if you re-enable the cycle collector later, those cyclic objects
may _never_ be collected. Probably not a problem for Composer since it's a
command-line app that exits when done, but not a good idea for a long-running
app.
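A toy Python model of that side effect, assuming a fixed-capacity buffer as described above. `ToyRootBuffer` is hypothetical, and its `dropped` list exists only so the demo can show what was lost: with collection disabled, candidates that arrive after the buffer fills are discarded, so a later collection never sees them.

```python
class ToyRootBuffer:
    """Toy fixed-capacity possible-root buffer; not PHP's real implementation."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.roots = []
        self.enabled = True      # mirrors gc_enable()/gc_disable()
        self.dropped = []        # demo only: candidates that were silently lost

    def add(self, candidate):
        if len(self.roots) >= self.capacity:
            if not self.enabled:
                # Buffer full and collection off: nowhere to record it.
                self.dropped.append(candidate)
                return
            self.collect()       # buffer full and collection on: scan now
        self.roots.append(candidate)

    def collect(self):
        # Placeholder for the real scan: hand back everything it would examine.
        scanned, self.roots = self.roots, []
        return scanned

buf = ToyRootBuffer(capacity=3)
buf.enabled = False
for i in range(5):
    buf.add(f"cycle{i}")
buf.enabled = True
print(buf.collect())   # ['cycle0', 'cycle1', 'cycle2']
print(buf.dropped)     # ['cycle3', 'cycle4']: never seen by any collection
```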

There are other things PHP could do here:

1\. Don't use ref-counting. Use a normal tracing GC. Then you only kick off GC
based on _allocation_ pressure, not just by mutating memory. Obviously, this
would be a big change!

2\. Consider prioritizing and incrementally processing the root array. If it
kept track of how often the same object reappeared in the root array each GC,
it could get a sense of "hey, we're probably not going to collect this". Sort
the array by priority so that potentially cyclic objects that have been live
in the past are at one end. Then don't process the whole array: just process
for a while and stop.
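Suggestion 2 could look something like this hypothetical scheduler (not something PHP is known to implement): it remembers how often each root was scanned and found live, orders candidates so that frequent survivors come last, and examines only a fixed budget per slice. Roots are assumed to be hashable, and the survival counter is the only prioritization signal in this sketch.

```python
from collections import Counter

class PrioritizedRootScanner:
    """Hypothetical incremental scheduler for possible roots (not PHP's)."""

    def __init__(self, budget):
        self.budget = budget          # how many roots one GC slice may examine
        self.survivals = Counter()    # root -> times scanned and found live

    def next_slice(self, roots):
        # Roots that kept surviving are unlikely to be garbage: put them last.
        # sorted() is stable, so ties keep their arrival order.
        ordered = sorted(roots, key=lambda r: self.survivals[r])
        return ordered[: self.budget], ordered[self.budget :]

    def record(self, scanned, collected):
        for r in scanned:
            if r not in collected:
                self.survivals[r] += 1

scanner = PrioritizedRootScanner(budget=2)
now, later = scanner.next_slice(["a", "b", "c", "d"])  # ['a', 'b'], ['c', 'd']
scanner.record(now, collected={"b"})                   # 'a' turned out live
now, later = scanner.next_slice(["a", "c", "d"])       # ['c', 'd'], ['a']
```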

[1]:
[http://media.junglecode.net/media/Bacon01Concurrent.pdf](http://media.junglecode.net/media/Bacon01Concurrent.pdf)

------
echeese
That page has over 200MB worth of animated gifs, just as a warning.

------
aidenn0
Naive mark and sweep: making refcounting look fast for 50 years.

------
caiob
Some insightful comments would've been nice.

------
illumen
Excellent collection of gifs. That's the more interesting part for me. I
already knew garbage collection was slow.

~~~
JetSpiegel
[https://camo.githubusercontent.com/668aedc4bd252dd8fb5a57b90...](https://camo.githubusercontent.com/668aedc4bd252dd8fb5a57b902c9444354d79d49/687474703a2f2f692e696d6775722e636f6d2f4e7459384a71702e676966)

Particularly this. What a disturbing documentary.

~~~
tambourine_man
This got me curious, what's that?

~~~
byoung2
[http://tumblr.knowyourmeme.com/post/4028078808/this-raptor-j...](http://tumblr.knowyourmeme.com/post/4028078808/this-raptor-jesus-rave-gif-was-made-out-of-a-scene)

------
eXpl0it3r
I see two lines changed! Click bait! :P

------
btbuildem
Do you work with 13yr-olds?

------
benihana
The commit is great. I love that the comments have spiraled completely out of
control. At this point, 30 minutes after the link was posted, the comment
thread is now a competition to see who can post the best gif.

I know we're serious here, but stuff like this reminds me why I love the
internet so much. It's fun to cut loose once in a while.

~~~
bshimmin
Agreed. It's such a shame that Hacker News doesn't let you post animated GIFs -
I think it'd really add a lot of value to the discussions here.

~~~
Omniusaspirer
I'm hoping this is sarcasm since I don't think I've seen a single intelligent
discussion in my life that was helped along by a funny GIF.

Not to say there's anything wrong with funny GIFs, but I come to HN exactly
because it moderates away that sort of stuff.

~~~
bshimmin
It was sarcasm. Sorry I forgot the <sarcasm> tags - I'll be more careful in
future.

~~~
mring33621
You forgot them again.

~~~
pestaa
He only forgot to close it.

~~~
bshimmin
You guys are a tough crowd sometimes. </sarcasm>

------
Treasdex
Warning: NSFW commit detected. "Pedophile 11-year-old girl images from
4Chan"

Stay classy programmers.

------
Retrazder
Warning: The commit is NSFW.

The commits are embarrassing and stupid, and really expose why developers are
considered idiots. Why troll?

Because they are jerks. Period. Grow up noobs.

------
itamarhaber
gifhub galore :)

------
cdnsteve
What was I looking at again? I forgot because all of these animated GIFs are
amazing!

