
On Stack Overflow's recent battles with the .NET Garbage Collector - sams99
http://samsaffron.com/archive/2011/10/28/in-managed-code-we-trust-our-recent-battles-with-the-net-garbage-collector
======
david_a_r_kemp
Let me preface this rant by saying I love C# and .Net. I think it's a really
good platform to build on, as demonstrated by its growth outside of the
Microsoft world.

This article highlights why (good) C/C++ programmers have such a hard time
moving to C#/.Net. In .Net, memory management is, 99.9999999999% of the time,
fire and forget, whereas a big chunk of learning and programming C/C++ is
about memory management. Most C, and some C++, programmers would have baulked
at having a massive object graph in memory - releasing it becomes a massive
headache, and that's before you even get into synchronisation issues - and the
.Net team have done a really good job of taking this headache away (yes, I
realise they're not the first). However, we're now at the point where there
are a lot of C# programmers who've never heard of a linker (by default, csc
compiles and links in one go), have little concept of static vs dynamic
linking, and, more importantly, don't have the first clue about memory
management. This manifests itself in things like not using IDisposable
properly/consistently, using the string + operator rather than
StringBuilder/StringWriter, and having no idea about the difference between
class and struct, especially when it comes to parameter handling. Even books
like (the excellent) C# in Depth are pretty light on the memory management
side of things, deferring it to books about the CLI and DNR. Whereas, the
truth is, if you're going to write a big application, you're going to need
some concept of this.

~~~
MatthewPhillips
Could you explain what you mean about string + vs StringBuilder? I'm guilty of
this one.

~~~
stonemetal
In C#, strings are immutable, so string + string creates a new string. This
means copying both strings into the new result string. That is a lot of
copying, and can be pretty slow if you do it on large strings, or often on
small strings - especially if you concatenate multiple strings at once (A + B
+ C + D means that A is copied 3 times, B 3 times, C twice, and D once).
StringBuilder is mutable, so appends don't recopy data. It is really only a
problem if you do it on large strings more than a few times.

On the other hand, I am not sure if this is a problem in .NET. I know the
semantics look bad, but no one said the execution model had to match the
semantics. If I remember correctly, the JVM has the capability to rewrite the
code to the faster version, though I am not sure if the CLR does.
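For what it's worth, on the Java side this rewrite happens at compile time:
javac turns a chain of + into a StringBuilder behind the scenes. A minimal
sketch of the equivalence (class and variable names here are mine, not from
the thread):

```java
public class ConcatSketch {
    public static void main(String[] args) {
        String a = "A", b = "B", c = "C", d = "D";

        // Source form: under naive semantics, A would be copied 3 times,
        // B 3 times, C twice, and D once.
        String viaPlus = a + b + c + d;

        // Roughly what javac emits for that single expression:
        // one builder, four appends, no intermediate strings.
        String viaBuilder = new StringBuilder()
                .append(a).append(b).append(c).append(d)
                .toString();

        System.out.println(viaPlus.equals(viaBuilder)); // prints: true
    }
}
```

The rewrite only covers a single expression; as discussed below, a += inside a
loop still allocates a fresh builder on every iteration.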

~~~
m_myers
The JVM does rewrite individual concatenations as StringBuilders (so a+b+c+d
results in one StringBuilder with three .append() operations), but the real
performance problem is concatenating in a loop.

    
    
        String result = "";
        for (String str : myStrings) {
            result += str + " "; // on each execution, a new StringBuilder is created
        }
    

as opposed to

    
    
        StringBuilder result = new StringBuilder();
        for (String str : myStrings) {
            result.append(str).append(" ");
        }
    

The JVM can't hoist the generated StringBuilder out of loops to produce the
second, faster piece of code. I believe the CLR is the same.

~~~
njw45
Later JVMs do escape analysis, though - so there's not necessarily any
difference between the two methods...

~~~
m_myers
At least in Oracle Java 1.6 update 24, which is supposed to have escape
analysis enabled by default, there is a huge difference. I don't know if Java
7's analysis is smarter.

------
smiler
I think Stack Overflow is the first high-volume .NET site that has gone in
depth publicly with how they are solving their performance issues. Microsoft
should be paying them for the continued great PR.

~~~
riffraff
It's great that they share their findings and part of their code; even though
I'm not a .NET dev, I find the blog posts really interesting and their
approach/tools inspirational.

But wasn't Slashdot the first high-volume site sharing their solutions to
performance issues?

~~~
smiler
Notice I said .NET. You can find plenty of presentations / blogs on PHP
performance, Ruby / Rails performance, Python / Django performance, but not
many on .NET - I guess simply because most high-volume sites don't use .NET.

~~~
noselasd
I'd say it's rather because most high-volume sites using .NET are more
commercialized, and often run by companies that typically haven't been very
open about anything they do internally.

~~~
dbattaglia
Also, StackOverflow is one of the few .Net sites hackers/devs flock to and
care about. I work on a large-scale .Net web app, but I don't think many
Hacker News readers would really care about the application itself (and, as
you pointed out, my company would probably kill me if I started blogging about
the internals!).

------
cookiecaper
I wonder whether all those gymnastics were more prudent than simply writing a
C module to filter questions and feeding the results back to the C# web app.
Opinions?

~~~
melling
Jeff doesn't know C. "Should he learn C?" was an ongoing topic in Jeff and
Joel's StackExchange podcast, which was actually somewhat entertaining. Time
for another episode.

~~~
rbanffy
Which episode was that?

~~~
melling
It has been a while so I can't remember. They were getting people to do the
transcripts for their shows.

[http://meta.stackoverflow.com/questions/19244/community-podc...](http://meta.stackoverflow.com/questions/19244/community-podcast-transcription-project-idea)

If your Google-Fu is good.

------
Griever
Fantastic read. Thanks for posting it!

After reading this article I felt extremely humbled. As a developer doing
relatively "simple" work day to day, seeing this level of programming
proficiency is very impressive. There isn't a chance I'd be able to figure out
a problem such as this. At least certainly not within the timeframe that the
SO team fixed it in.

It's articles like this that make me want to continue pushing my limits as a
developer.

~~~
bhrgunatha
> There isn't a chance I'd be able to figure out a problem such as this.

Don't sell yourself short. You generally don't know what your _actual_ limits
are until you really push hard against them or break through them. I've had to
hunt down and fix some very unusual problems - with few or no references and
no results from Google.

Those kinds of mind-bending problems actually give a great deal of
satisfaction. I'm serious: you never really know your limits until you break
them. I think the trick is to keep growing and expanding your abilities and
knowledge.

~~~
Griever
Very true. This is generally why I appended that with "at least not within the
timeframe that the SO team did".

I think that I have gotten to where I am by constantly pushing myself, but
reading articles like this just reminds me that I still have a ways to go.

------
bluelu
Why not write a script that periodically takes a machine off the load
balancer, forces a GC on that server, and then adds it back again?

I think this would be a cleaner approach than changing the entire code base.
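A rough sketch of what such a rotation script might look like, assuming a
hypothetical load-balancer admin API and a hand-rolled endpoint on each server
that calls GC.Collect (the hostnames, URLs, and /force-gc endpoint are all
invented for illustration, not anything SO actually runs):

```shell
#!/bin/sh
# Rotate through the web servers: drain, force a full GC, re-enable.
# LB_ADMIN and the /force-gc endpoint are assumptions, not real APIs.
LB_ADMIN="http://lb.internal/admin"
SERVERS="web1 web2 web3"

for server in $SERVERS; do
    echo "disabling $server"       # e.g. curl -X POST "$LB_ADMIN/disable/$server"
    sleep 1                        # let in-flight requests drain (longer in practice)
    echo "forcing GC on $server"   # e.g. curl -X POST "http://$server/force-gc"
    echo "re-enabling $server"     # e.g. curl -X POST "$LB_ADMIN/enable/$server"
done
```

The real calls are left as comments since the admin API is a placeholder; the
draining step matters, otherwise the forced collection stalls live requests.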

~~~
scott_s
I disagree. Your solution attempts to fix the symptom. Their solution fixes
the root cause of the problem.

~~~
ajross
It does? What is the "root cause" though? You seem to take it to mean the
generation of references that must be traversed in gen2, but that sounds like
just as much a hack to me: they modify the code in ways that don't represent
the semantics of the problem to work around a limitation of the
implementation.

Hell, you could argue the "root cause" is the use of a garbage-collected
environment in the first place. All popular GC implementations have latency
issues like this. All of them. If you can't deal with occasional high
latencies, you should identify that requirement before choosing Java or C#.

All the solutions provided are just patching around that issue. I don't see
anything in the post that looks like a root cause, and the GP's post has the
advantage of being much simpler to implement.

~~~
scott_s
I consider the root cause to be a mismatch between the memory management
needs of the application and the assumptions of the GC. Their solution matches
the application's memory usage to the assumptions of the GC.

What would be more useful to them is to be able to provide either hints to the
GC, or to actually have some control over memory management. That is, keep the
application the same, but let it influence GC behavior. They're forced to go
at it the other way: keep the GC the same, but change the application so what
the GC does becomes the right thing.

 _they modify the code in ways that don't represent the semantics of the
problem to work around a limitation of the implementation._

That is always true when you start optimizing for performance. And
nick_craver explains why the proposed suggestion may be easier said than done:
there are hidden complexities. Personally, once you start introducing external
control to your application like that, alarm bells go off in my head.
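.NET does expose a couple of coarse knobs of this sort, though nothing like
per-object hints - for example, the server GC flavour can be selected in
configuration (ASP.NET under IIS typically gets it by default), and
System.Runtime.GCSettings.LatencyMode can be set in code. A sketch of the
config side; whether either helps depends entirely on the workload:

```xml
<!-- app.config / web.config: opt into the server GC, which uses
     one heap per core and different collection trade-offs -->
<configuration>
  <runtime>
    <gcServer enabled="true"/>
  </runtime>
</configuration>
```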

------
Sukotto

      I can not stress how important it is to have 
      queryable web logs.
    

I'd love to read some (actionable) articles on how to set up some of the
benchmarking / logging stuff that you guys have. For sure I'll check out _.NET
Memory Profiler_ and _mini profiler_ linked in the article, but I'd be
grateful if anyone wants to post more comprehensive info on where to start.

------
steve8918
I'm not one to hate on C#, or to be a programming language snob, but this
certainly looks like a case where poor performance and a lot of extra work
were created simply because of a deficiency in a language. Languages like C#
are advertised as being simple to use, and for the most part they are. They
fit the bill in a lot of different use cases.

But it certainly seems to me that if you want to build a high-performance,
scalable piece of software, you don't want to have to bang your head against
the wall and fight against deficiencies within the language implementation
itself in order to get your software working properly.

~~~
mwsherman
You are conflating the language with the runtime, first. And second, we were
experiencing the occasional page taking maybe one second, which we considered
unacceptable but 99% of the world wouldn’t notice.

(NB, languages like Ruby and PHP are lovely, but are over 10x slower than C#
and Java.)

~~~
Roboprog
I'm so tired of benchmarks that play with ints and floats which show how fast
Java is.

Most business app work is string manipulation. If you think Ruby is slow, look
at the timings for "sequential" operation in the table on this page (yes, it's
my site):

<http://roboprogs.com/devel/2009.12.html>

If you think that I have badly blundered in my methodology, get the benchmark
code, fiddle with it, and run it on your own hardware:

<https://github.com/roboprog/mp_bench>

FYI: I'm not trying to say that "Ruby is the most awesomest language, like
ever, d00dz!!!". I like different tools for different jobs. I'm just tired of
seeing people judge implementation speeds based on bit twiddling benchmarks,
rather than stuff that at least churns through a large number of strings, if
not other object types, and does some I/O.

~~~
qw
I noticed some differences in the code that I think make the versions not
equivalent:

Perl: print $local_process_var, " ", &gen_pg_template(), "\n"

Python: print local_process_var, " ", gen_pg_template()

Java: System.out.println( timestamp + " " + genPgTemplate() );

I'm not sure, but I think the Perl/Python code sends the strings as arguments
to print, which will just output them directly. In Java you will concatenate 3
strings before sending the result to "print". This adds some extra CPU usage
that the other versions don't have.

~~~
Roboprog
One item of possible note: it wasn't possible to have list-driven output
syntax in Java before 1.5. I'm not sure if they added an output method with
ellipses (varargs) then, or not. Even if so, I'm not sure it would be faster.

Still, I could modify the scripting output lines to bring them down to Java's
level :-)

------
MatthewPhillips
Slightly off topic: for any SO employees here, have you ever experimented
with using mono-fastcgi-server for any part of your sites, and what were your
findings? Assuming any of your code compiles under mono.

~~~
sams99
We have not. Has anyone run any benchmarks/tests comparing the mono GC to the
Microsoft GC? I am pretty sure we would need to change large chunks of our
system to run it on mono.

~~~
kmontrose
My understanding (from discussions with some Fog Creek devs who've done the
mono dance for FogBugz) is that ASP.NET on mono is an awful lot of trouble,
and doesn't run under IIS.

<http://www.mono-project.com/FAQ:_ASP.NET>

Seems to validate the IIS bit.

------
ableal
These two phrases, early in the article, could give someone an interesting
problem to solve:

 _Our initial tag engine design was a GC nightmare. The worst possible type of
memory allocation. Lots of objects with lots of references that lived just
long enough to get into generation 2 (which was collected every couple of
minutes on our web servers).

Interestingly, the worst possible scenario for the GC is quite common in our
environment. We quite often cache information for 1 to 5 minutes._

(Curiously, this is under a header mentioning "abuse of the garbage
collector". I'd put it the other way around ...)

------
mattmanser
Less a stop watch in the brain, more likely a case of simply pressing F12 in
the browser. All of the consoles have profilers these days.

I must admit I am a little surprised that tags are stored as a free text
field.

~~~
sams99
The issue popped up once in a hundred times; in general people do not notice.

Tags are stored both in a posttags table and as free text. FTS is faster for
many queries.

~~~
sams99
I do admit it surprised me that we stored tags in an FTS field when I
started, but after a few months it made quite a bit of sense; the queries are
just way faster.

------
jroseattle
To SamS and the SO team: what drove the choice of free text search? Were any
other options evaluated at the time?

------
platonichvn
Great post. I always learn something new when the SO guys share their
experiences with .NET in a large scale web application. Thanks for the link to
the memory profiler. Hopefully it will be better than struggling with windbg.

------
iam
If they had used an array of structs, they would've cut down on the GC's
graph traversal significantly.

Another option I was surprised they didn't mention was running GC.Collect
periodically to clear out gen2.
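Java has no user-defined value types, but the array-of-structs idea can be
sketched there (in keeping with the earlier Java code in this thread):
replacing an array of heap objects with parallel primitive arrays removes the
per-element references the GC would otherwise have to trace. A toy
illustration with made-up field names, not SO's actual data layout:

```java
public class StructOfArrays {
    // Object-per-element layout: one heap object and one reference
    // per entry, all of which the GC must trace on every gen2 sweep.
    static class TagRef {
        final int postId;
        final int tagId;
        TagRef(int postId, int tagId) { this.postId = postId; this.tagId = tagId; }
    }

    public static void main(String[] args) {
        int n = 1_000_000;

        // Array of objects: n references for the GC to walk.
        TagRef[] refs = new TagRef[n];
        for (int i = 0; i < n; i++) refs[i] = new TagRef(i, i % 50);

        // "Struct of arrays": two flat primitive arrays, zero per-entry
        // references - the GC can skip their contents entirely.
        int[] postIds = new int[n];
        int[] tagIds = new int[n];
        for (int i = 0; i < n; i++) { postIds[i] = i; tagIds[i] = i % 50; }

        // Both layouts hold the same data.
        System.out.println(refs[123].tagId == tagIds[123]); // prints: true
    }
}
```

In C#, a struct[] gets this layout directly; in Java the trade-off is losing
the per-entry object in exchange for a GC-transparent representation.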

~~~
mrgoldenbrown
Running Collect periodically probably won't help much in this scenario - the
huge data structure isn't actually being collected during the GC.Collect, but
the GC doesn't know this until it has spiked the CPU walking a bajillion
references.

Since nothing is actually getting collected, if you ran GC.Collect
periodically, you would just be incurring more lag with no benefit. I think
your idea might help more in a different situation, one where you do have
large amounts of memory getting reclaimed with each GC sweep.

