
Measuring the Impact of the .NET Garbage Collector - matthewwarren
http://mattwarren.org/2014/06/18/measuring-the-impact-of-the-net-garbage-collector/
======
Locke1689
FYI, as soon as you start getting serious about profiling, you should use
PerfView[0]. It's what we use in Roslyn for almost all investigations.

[0] http://www.microsoft.com/en-us/download/details.aspx?id=28567

~~~
matthewwarren
Yeah I keep meaning to dig into PerfView a bit more, I've only scratched the
surface of it.

BTW I really like the perf stuff in Roslyn. I wrote 2 posts about it, in case
you're interested, although as you worked on it, it won't be anything you
don't already know ;-)

http://mattwarren.org/2014/06/05/roslyn-code-base-performance-lessons-part-1/
http://mattwarren.org/2014/06/10/roslyn-code-base-performance-lessons-part-2/

~~~
Locke1689
Seems pretty accurate to me.

~~~
matthewwarren
Thanks for taking the time to read it!

------
x0x0
I've worked on a product that used the classic tricks (large byte[] arrays,
access through sun.misc.Unsafe, indexing instead of references, etc) and while
it was quite fast, it was written by someone who worked at Azul and deeply
understood the jvm and gc. Personally, it makes me think D is the right
solution: gc for 99.99% of your objects, but for the performance critical bits
or where you're fighting the gc, opt out of gc and manage memory by hand.

Also, theUnsafe makes me snicker:

    sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
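
To make the "indexing instead of references" trick a bit more concrete, here is a minimal Java sketch. It is not taken from the product described above; the class name `PointPool` and its methods are hypothetical. The idea is that records live in a few large primitive arrays and are referred to by an `int` index, so the GC sees a handful of big arrays rather than millions of small objects.

```java
import java.util.Arrays;

/** Hypothetical pool of (x, y) records held in flat arrays and addressed by index. */
final class PointPool {
    private double[] xs;
    private double[] ys;
    private int size;

    PointPool(int initialCapacity) {
        int capacity = Math.max(1, initialCapacity); // at least 1 so doubling works
        xs = new double[capacity];
        ys = new double[capacity];
    }

    /** Stores a record and returns its index, which stands in for an object reference. */
    int add(double x, double y) {
        if (size == xs.length) {
            // Grow both arrays together; indices handed out earlier stay valid.
            xs = Arrays.copyOf(xs, size * 2);
            ys = Arrays.copyOf(ys, size * 2);
        }
        xs[size] = x;
        ys[size] = y;
        return size++;
    }

    double x(int index) { check(index); return xs[index]; }

    double y(int index) { check(index); return ys[index]; }

    int size() { return size; }

    private void check(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
    }
}
```

The trade-off is that you give up type safety on the "references" (any int looks like a valid handle) and have to manage reuse of freed slots yourself.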

~~~
matthewwarren
Yeah there definitely comes a point with .NET/Java where, to get very high
performance, you are fighting the language/runtime.

The general argument is that you are still more productive writing 90% or 95%
of your app in a managed language, in the idiomatic way, and then using crazy
tricks to tune the last 5-10%, rather than doing the whole thing in C/C++,
which will give better performance but may not be quicker (more productive)
to write.

I don't know much about D, it's interesting to find that you can opt out of GC
like that.

~~~
x0x0
I haven't done it, but according to a tutorial [1] you can directly access
malloc / free and bypass the gc

[1] http://qznc.github.io/d-tut/memory.html
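
For comparison with the JVM side of this thread, a rough Java analogue of keeping data out of the collected heap is a direct `ByteBuffer`. This is only a sketch: the wrapper class `OffHeapLongs` is hypothetical, and unlike malloc / free there is no explicit free here, since the native memory is released only after the small buffer object itself becomes unreachable.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-size array of longs backed by a direct (off-heap) buffer. */
final class OffHeapLongs {
    private final ByteBuffer buf;

    OffHeapLongs(int count) {
        // allocateDirect may place the contents outside the garbage-collected heap.
        // multiplyExact guards against int overflow in the byte size.
        buf = ByteBuffer.allocateDirect(Math.multiplyExact(count, Long.BYTES))
                        .order(ByteOrder.nativeOrder());
    }

    /** Absolute write: does not move the buffer's position. */
    void set(int i, long value) { buf.putLong(i * Long.BYTES, value); }

    /** Absolute read: does not move the buffer's position. */
    long get(int i) { return buf.getLong(i * Long.BYTES); }

    int length() { return buf.capacity() / Long.BYTES; }

    boolean isDirect() { return buf.isDirect(); }
}
```

The GC only has to track the small `ByteBuffer` object, not the payload, which is the same basic effect as the manual allocation described in the tutorial.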

~~~
barrkel
You can directly access malloc and free in .NET as well, it's just more
inconvenient.

~~~
Locke1689
It's also usually a bad idea. Memory allocated via PInvoke or similar APIs is
basically opaque to the garbage collector. This can produce undesirable
behavior, from polluting your code to interfering with GC due to memory
fragmentation. Marshalling also isn't free.

Basically, if you're at the point where the GC is impacting you but you
haven't tried Roslyn-level optimizations -- do that first.

~~~
barrkel
Fragmentation is unlikely - they'll be completely different heaps, and on
64-bit probably far apart. Marshalling costs only in so far as you work with
safe code. C++/CLI is working at a slightly different level.

But I agree that working with the grain of the GC is usually more productive.

~~~
Locke1689
That seems like a lot of assumptions. I don't think I'd be OK with generally
advocating based on all those assumptions. For example, you never touched on
what would happen if you allocated enough in native and managed that you start
to get significant memory pressure -- collecting with paging is almost
impossible, so the CLR goes into panic mode in an attempt to prevent paging
and a large portion of the heap would be untouchable/immovable.

~~~
barrkel
Are you confusing physical memory with address space?

There's no good reason for the managed heap to be anywhere near any of the
native heaps in address space, on 64-bit platforms.

And the CLR's GC should actively allocate slabs well away from any native heap
(trivial to do - reserve (not commit, reserve) a big contiguous chunk of
address space), simply because it relies on third party code which will itself
be allocating native memory; everything from GUI code to native DB drivers and
their caches, quite independent of unsafe code doing manual allocation.

In the absence of a GC-aware virtual memory manager, GC-immovable memory has
little relevance to paging.

(Of course, GC.Add/RemoveMemoryPressure should be called if you're doing
native allocation from .net.)

------
Padding
What many people seem to miss in these sorts of discussions is that
"pauseless" usually also means less throughput. I guess it's no different
from real-time systems being slower in practice than non-deterministic ones.

If what Azul has built worked on conventional Client/Server JVMs,
Sun/Oracle/IBM would've adopted it long ago.

The sad reality is that spending 99% of your time in GC, but never in a pause
longer than "x units of time", qualifies as "pauseless", and if you want more
throughput, you have to scale up the system so that the remaining 1% of
throughput is sufficiently large for your workload.

I guess this works in the markets where Azul is active (things like finance)
but is useless in more conventional use cases.
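
The scaling arithmetic behind the 99% / 1% example can be sketched in a few lines of Java. The class and method names are hypothetical, and the model is deliberately crude: it assumes application work scales linearly with the time not spent in GC.

```java
/** Hypothetical helper for the "time in GC vs. useful throughput" arithmetic. */
final class GcThroughput {
    private GcThroughput() {}

    /** Fraction of wall-clock time left over for application work. */
    static double usefulFraction(double gcFraction) {
        validate(gcFraction);
        return 1.0 - gcFraction;
    }

    /**
     * How much bigger the system must be to do the same useful work
     * as one that spends no time in GC, under the linear assumption.
     */
    static double scaleFactor(double gcFraction) {
        validate(gcFraction);
        return 1.0 / (1.0 - gcFraction);
    }

    private static void validate(double gcFraction) {
        // gcFraction must be in [0, 1); at 1.0 no useful work is done at all.
        if (!(gcFraction >= 0.0 && gcFraction < 1.0)) {
            throw new IllegalArgumentException("gcFraction must be in [0, 1)");
        }
    }
}
```

With `gcFraction = 0.99` the scale factor comes out at roughly 100, which is the point of the example: a bounded pause time says nothing about how much capacity is left for real work.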

~~~
electrum
Azul's garbage collector is patented and until recently required specialized
hardware:
http://www.azulsystems.com/products/vega/processor

Recent advances in Intel CPUs have allowed them to run on commodity hardware,
but last I looked, it still required a custom kernel module.

------
molixiaoge
great

