

Microbenchmarking Mono's new GC - j_baker
http://gonzalo.name/blog/archive/2010/Jul-17.html

======
dododo
why are so many benchmarks done so badly statistically?

the benchmark was run once: we have no idea if this was a fluke or not.
hopefully the author didn't repeat and pick the best.

at best the numbers reported can be considered some mean. the mean is not a
robust estimator:
<http://en.wikipedia.org/wiki/Robust_statistics#Examples_of_robust_and_non-robust_statistics>

one sample is enough to distort the mean. furthermore we've no idea how typical
this run was: there should be some measure of spread.
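
e.g. something as simple as this would already give a spread (a rough sketch in c;
run_workload() is just a made-up stand-in for whatever the benchmark actually does):

    /* repeat the benchmark and report mean, standard deviation and median
       instead of a single number */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    static void run_workload(void)
    {
        /* placeholder for the code being measured */
        volatile double x = 0;
        for (int i = 0; i < 1000000; i++)
            x += i * 0.5;
    }

    static int cmp_double(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    int main(void)
    {
        enum { RUNS = 30 };
        double t[RUNS], sum = 0, sumsq = 0;

        for (int i = 0; i < RUNS; i++) {
            double start = now_sec();
            run_workload();
            t[i] = now_sec() - start;
            sum += t[i];
            sumsq += t[i] * t[i];
        }

        double mean = sum / RUNS;
        double sd = sqrt((sumsq - RUNS * mean * mean) / (RUNS - 1));
        qsort(t, RUNS, sizeof t[0], cmp_double);
        double median = (t[RUNS / 2 - 1] + t[RUNS / 2]) / 2;  /* RUNS is even */

        printf("mean %.4fs  stddev %.4fs  median %.4fs\n", mean, sd, median);
        return 0;
    }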

~~~
viraptor
Zed, is that you? :)

<http://zedshaw.com/essays/programmer_stats.html>

I think we've seen that already. Mean is "good enough" for most programmers,
apparently. And to be honest, in many cases it's ok. It was a microbenchmark,
not a full evaluation of the latency of both algorithms.

~~~
dododo
you only know the mean is good enough if you examine the spread and/or use a
robust estimator.

spikes happen in performance (for better and for worse) all the time. one
spike is enough to totally distort the mean, no matter how often you repeat.

it might be a microbenchmark but to paraphrase: if it's not done well, it's
not worth doing. the complete opposite of what the microbenchmark purports to
show could be true most of the time.

~~~
viraptor
I agree with you from the point of view of people doing proper statistics.
Mean is not good enough.

Then again - let's say I'm a user of some library and I trust the developers
to do the right thing and test the new implementation against the old one. What
interests me is whether, on average, I still see the difference on my own random
test. He did. Sure - the whole test is flawed, may be incorrect, etc. In
reality, he just verified something he already knew/expected against his random
data. No matter how much we care about correctness, he's still right on some
level. There's a visible improvement even on this random test, and if he tried
it at least 3 times we can assume that the new GC is worth looking at if you
had issues with the previous one.

"if it's not done well, it's not worth doing"? I'd say - if it answers your
question, it's worth doing. If you assume the question was - "can I verify
that the new GC is better in my random not-scientific test on a specific
dataset", then everything's all right.

And I do hope that the person writing the new GC will publish some better
stats.

~~~
dododo
1. the point of doing statistics properly is to try to remove your bias about
what you think the results are: every developer who has worked hard enough on
a problem probably thinks they made an improvement. it's a good idea to test
this, statistically, as you might be surprised.

2. the article didn't repeat anything at all, so it's not even an average (at
least, as reported). as far as i can tell, he ran the test once and reported
the result.

3. the point is that judging from one sample, or a mean of a small sample, does
not answer any reasonable question, because the (unreported) uncertainty in
the answer could mean the result actually goes either way in general.

btw, what i'm suggesting isn't some deep, complicated statistics! i'm
suggesting the most basic, elementary statistics. this is super, super basic
stuff. high school kids here know about the mean and standard deviation.

if you want to be more principled about this kind of thing, you could either
do some kind of hypothesis test (e.g., a two-sample t-test adjusting for
unequal variance would be appropriate here) or use a Bayes factor to compare
the two hypotheses (better, not better).
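
as a very rough sketch (the timings here are made up, and you'd still want to
compare |t| against the t distribution, or just use any stats package), welch's
t statistic is only a few lines:

    /* welch's two-sample t statistic for two sets of timings */
    #include <math.h>
    #include <stdio.h>

    static void mean_var(const double *x, int n, double *mean, double *var)
    {
        double s = 0, ss = 0;
        for (int i = 0; i < n; i++)
            s += x[i];
        *mean = s / n;
        for (int i = 0; i < n; i++)
            ss += (x[i] - *mean) * (x[i] - *mean);
        *var = ss / (n - 1);
    }

    int main(void)
    {
        /* made-up timings (seconds per run) for the old and new GC */
        double old_gc[] = { 1.92, 1.88, 1.95, 2.40, 1.90, 1.91 };
        double new_gc[] = { 1.41, 1.44, 1.39, 1.43, 1.40, 1.42 };
        int n1 = 6, n2 = 6;

        double m1, v1, m2, v2;
        mean_var(old_gc, n1, &m1, &v1);
        mean_var(new_gc, n2, &m2, &v2);

        double t = (m1 - m2) / sqrt(v1 / n1 + v2 / n2);

        /* welch-satterthwaite degrees of freedom */
        double df = pow(v1 / n1 + v2 / n2, 2) /
                    (pow(v1 / n1, 2) / (n1 - 1) + pow(v2 / n2, 2) / (n2 - 1));

        printf("t = %.2f, df = %.1f\n", t, df);
        return 0;
    }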

------
Roboprog
It looks like a promising improvement. From the sound of it, this seems like
the same sort of GC methodology that Java used the last few revs (1.5; 1.6):
putting objects in "recent" and "old" pools.

That said, I would like to see a benchmark that _uses_ a bit more memory, not
just temporary stuff to be discarded ASAP. Many of the Java promotion
benchmarks do that too (focus on use of transient data only). What happens if
you actually need to hold on to some state, rather than merely concatenating
somebody's account info into a dynamic index.html page in a web app?

Oh well, there's always memcached, I suppose. It would be nice to have
languages that support app types other than stateless-as-possible web apps.

~~~
dfox
Generational GCs are what almost every high-performance VM uses, and the
approach is well-tested in many practical systems. Almost all applications tend
to allocate large numbers of temporary objects while having large amounts of
memory consumed by long-lived objects (I would assume that at least half of
the used memory in this case is long-lived objects). Also, because a
generational GC is almost invariably a copying GC (it is possible to make a
non-moving generational GC, but that is more of academic interest), it tends to
improve caching behavior, which is even more important with each new
generation of CPUs.
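
As a toy illustration of why that allocation pattern favors a copying nursery
(this is just the general idea, not sgen's actual layout): allocation is a
pointer bump, and a minor collection only touches the few survivors, never the
garbage:

    #include <stddef.h>
    #include <stdint.h>

    #define NURSERY_SIZE (4 * 1024 * 1024)

    static uint8_t nursery[NURSERY_SIZE];
    static uint8_t *alloc_ptr = nursery;

    /* slow path: a real collector copies live nursery objects into the older
       generation here; this toy just resets the nursery */
    static void *minor_collect_and_alloc(size_t size)
    {
        alloc_ptr = nursery;
        void *obj = alloc_ptr;
        alloc_ptr += size;
        return obj;
    }

    void *gc_alloc(size_t size)
    {
        size = (size + 7) & ~(size_t)7;                /* 8-byte alignment */
        if (size > (size_t)(nursery + NURSERY_SIZE - alloc_ptr))
            return minor_collect_and_alloc(size);      /* nursery full: minor GC */
        void *obj = alloc_ptr;                         /* fast path: bump the pointer */
        alloc_ptr += size;
        return obj;
    }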

The main reason why copying and generational GCs are not used everywhere is
that they tend to complicate matters when you want to seamlessly interface
with C code (unmanaged code in .NET terms), because objects then tend to be
moved around by the GC, which has to see and be able to update every reference
into its heap.

In most applications and runtimes, there are a lot more behind-the-scenes
objects allocated by the VM and/or library routines than objects that you
allocate directly in your code.

Edit: and I think that Sun's JVM has used generational GC since at least 1.3,
if not 1.2.

~~~
riffraff
as a side note, Boehm's GC is also generational
<http://www.hpl.hp.com/personal/Hans_Boehm/gc/#details>

~~~
dfox
But generational collection has to be explicitly enabled by
GC_enable_incremental(), and the whole thing looks rather experimental. The
main problem is that you cannot reasonably maintain a remembered set (or a
grey set, for incremental GC) without write barriers, and write barriers are
not practical when you cannot modify the compiler.
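
For illustration, roughly what the compiler would have to emit on every pointer
store (made-up names; a real implementation would use a card table or a
sequential store buffer rather than a flat array):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* toy heap: one nursery region and one old-generation region */
    static uint8_t nursery[1 << 20];
    static uint8_t old_gen[1 << 20];

    /* remembered set: old-generation slots that point into the nursery, so a
       minor collection can find them without scanning all of old_gen */
    static void **remembered_set[1024];
    static size_t remembered_count;

    static bool in_region(const void *p, const uint8_t *base, size_t len)
    {
        return (const uint8_t *)p >= base && (const uint8_t *)p < base + len;
    }

    /* the write barrier: every pointer store in managed code goes through this */
    void gc_store(void **slot, void *value)
    {
        *slot = value;
        if (value != NULL &&
            in_region(slot, old_gen, sizeof old_gen) &&   /* store into an old object... */
            in_region(value, nursery, sizeof nursery) &&  /* ...of a pointer to a young one */
            remembered_count < 1024)
            remembered_set[remembered_count++] = slot;
    }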

------
jules
Looks good but one bit triggered WTF:

> Mostly precise scanning (stacks and registers are scanned conservatively).

Why would you code a conservative garbage collector for .NET?

~~~
dfox
Because of i386. i386 does not have enough registers to divide them efficiently
into pointer-only and non-pointer-only sets. For the stack, it is often
worthwhile to use a calling convention reasonably similar to what C/the OS
uses, but examining that mess precisely is not practical for the GC, so it's
better to just scan the stack conservatively (which, by the way, is done by
many other generational GCs).

Edit: also, you can implement a conservative scan of the stack and registers on
Unix without writing a single line of assembly by scanning the result of
getcontext(), or even almost completely portably by scanning the result of
setjmp() plus some heuristics for finding the stack in memory.
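
Roughly like this (a sketch only; stack_base and maybe_mark() are assumed to be
provided elsewhere, e.g. recorded at thread start and the collector's mark
routine):

    #include <setjmp.h>
    #include <stdint.h>

    extern void *stack_base;                 /* recorded when the thread started */
    extern void maybe_mark(void *candidate); /* marks it if it looks like a heap pointer */

    static void scan_words(void *lo, void *hi)
    {
        for (uintptr_t *p = lo; (void *)p < hi; p++)
            maybe_mark((void *)*p);          /* treat every word as a potential pointer */
    }

    void scan_stack_and_registers_conservatively(void)
    {
        jmp_buf regs;
        setjmp(regs);                        /* spills the callee-saved registers into regs */

        void *stack_top = &regs;             /* approximately the current stack pointer */

        scan_words(&regs, (char *)&regs + sizeof regs);  /* the saved registers */
        if (stack_top < stack_base)
            scan_words(stack_top, stack_base);           /* downward-growing stack */
        else
            scan_words(stack_base, stack_top);           /* upward-growing stack */
    }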

~~~
ssp
.NET is statically typed though, so it's possible to make a static
determination at each GC point which registers and stack locations contain
pointers.

~~~
dfox
Possible, but probably not practical.

And especially in the case of .NET, which offers pretty tight integration with
unmanaged code, you can have stack frames on the stack whose layout is
completely out of your control (I'm not sure whether such stack frames must be
scanned for pointers, but they will certainly complicate any attempt to
analyze the stack structure).

~~~
ssp
_Possible, but probably not practical._

It has certainly been done in various Java implementations. HotSpot I'm pretty
sure does precise garbage collection.

Native stacks don't need to be scanned for pointers, but the VM must of course
keep track of which parts of the stack are native.

~~~
barrkel
What you do is stop the world for a moment, set a GC due flag, and then wait
for threads to come to GC-safe points. At GC safe points, the set of live
registers and stack locations is definitely known.
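
Sketched very roughly (names are made up; real VMs fold the check into
something even cheaper, such as a polling page the collector protects):

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool gc_requested;

    void enter_safepoint(void);              /* blocks until the collection is done */

    /* called by the collector to ask every thread to stop */
    void request_gc(void)
    {
        atomic_store(&gc_requested, true);
    }

    /* the JIT emits this poll at safe points (call sites, loop back-edges),
       where it has an exact map of which registers/stack slots hold pointers */
    static inline void safepoint_poll(void)
    {
        if (atomic_load_explicit(&gc_requested, memory_order_relaxed))
            enter_safepoint();
    }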

~~~
dfox
This does not work when you can have threads executing arbitrary native code
that can do non-GC-related work for an essentially unbounded time, which is
certainly the case with .NET. In the JVM, any native code tends to be prepared
to cope with this somehow, but .NET is explicitly designed to interoperate
seamlessly with random COM and legacy native libraries, so you cannot
guarantee that such native code will get back to a GC-safe point in any
bounded amount of time.

Edit: you can do a stop-the-world GC at any point in the program when you scan
the stack conservatively; GC-safe points are only needed for precise scanning
of the stack and registers, so this is a problem only when you want to be fully
precise. A fully precise GC is certainly better when it is possible, but a
conservative scan mostly works well enough.

~~~
barrkel
Arbitrary native code is just fine - all you need to do is catch it when it
returns into the CLR. Native code can't (legally) have access to any managed
memory that isn't already pinned. Since you (by you, I mean the CLR) control
the call point into native code, you control the return point too - you can
patch the instructions after the call if necessary.

~~~
thedigitalengel
What happens if the native code enters, say, an infinite loop?

