
Garbage Collection vs. malloc/free - pron
https://groups.google.com/forum/#!topic/mechanical-sympathy/1TMjVjyyMmA
======
JoshTriplett
Java's garbage collection is far from the state of the art in GC technology.
While there are some non-standard JVMs that have advanced GC (such as IBM's
real-time garbage collector), the standard JVM does not.

Current-generation garbage collectors have far better performance, and many
research GCs support concurrent, pause-free collection.

(And that leaves aside the non-performance benefits of a GC, in terms of
engineering simplicity and bug elimination.)

~~~
dman
Any pointers on the state of the art?

~~~
pjmlp
[http://www.azulsystems.com/technology/c4-garbage-collector](http://www.azulsystems.com/technology/c4-garbage-collector)

~~~
hga
Which, I would add, is a production garbage collector, an updated version of
the Pauseless collector, which has its own paper and is covered in the 2nd
edition of _the_ book on garbage collection:
[http://www.amazon.com/The-Garbage-Collection-Handbook-Manage...](http://www.amazon.com/The-Garbage-Collection-Handbook-Management/dp/1420082795/)

It does require custom hardware or kernel cooperation for speed (e.g. it needs
to do batched MMU operations without flushing the TLB on each 2MB page). It
looks like it's got a better read barrier than the Pauseless one; that does of
course cost extra on stock hardware.

------
haberman
It's amusing to me that on the same thread GC proponents say that high-
performance GC is easy
([https://news.ycombinator.com/item?id=6485114](https://news.ycombinator.com/item?id=6485114))
_and_ that Java's GC is "far from state of the art"
([https://news.ycombinator.com/item?id=6483704](https://news.ycombinator.com/item?id=6483704)).

And so the perpetual argument goes: GC is superior to manual memory management
(the latter of which is "adding needless effort to writing software"), and the
performance is just as good, and the cases where performance is _not_ just as
good will be fixed Real Soon Now.

~~~
betterunix
I'd like to point out that I did not say that a _high-performance_ GC is
_easy_. What I actually said is that a _simple_ GC is _easier_ than simple
automatic optimizations. A high-performance GC is the same level of difficulty
as _good_ automatic optimization.

------
pjmlp
Actually the discussion at the newsgroup is more about Java's GC vs
malloc/free.

In GC enabled systems programming languages like the Oberon family, Modula-3,
D, Sing#, one has finer control over the types of available memory.

- The usual GC-allocated memory, with the usual set of possible GC
implementations

- Stack objects

- Global objects, statically allocated

- Manual memory management inside unsafe blocks, via untraced GC references

Not all GCs are made alike.

~~~
pron
Real-time Java (RTSJ) actually has some very cool mechanisms for fine-tuned
control over allocations.

------
bnolsen
In-language garbage collection just makes implementing a compiler that much
harder and frankly bloats up the runtime. Basic C++ shared pointers cover
pretty much 90+% of what people need from GC (non-stupid software
architecture decisions should cover the remaining 10%).

I _have_ seen problems with stalling caused by kernel memory heap defragging
on Linux, resulting in some serious thrashing. What that means is that heap
fragmentation can be a problem regardless of which language is used.

~~~
gsg
Reference counting, even ignoring the additional expense C++ places on
std::shared_ptr by requiring it to be thread-safe, is far too slow for
allocation-heavy languages. Only "we don't give a damn" implementations (like
CPython) use it, because it is simply not competitive.

~~~
praseodym
iOS uses Automatic Reference Counting (ARC,
[http://en.wikipedia.org/wiki/Automatic_Reference_Counting](http://en.wikipedia.org/wiki/Automatic_Reference_Counting)):
the compiler inserts the reference-counting instructions, sparing the
developer from tediously adding them by hand. I'm not sure whether there are
significant performance advantages over proper GC.

It's worth noting that Apple has implemented GC in Objective-C on OS X before,
but deprecated that in favour of its ARC implementation -- probably because
its garbage collection implementation was no good at all (performance-wise).

~~~
pjmlp
Apple dropped GC in Objective-C, because they never managed to make it work
properly.

By the time ARC was announced, there were still lots of corner cases in the
ways code compiled with GC was interacting with code that wasn't compiled with
GC enabled.

Plus, if you check Apple's developers documentation from those days, great
care was required to make sure the GC could play nice with the C part of
Objective-C.

So, in reality the ARC approach was taken because it was the only way to have
working automated memory management compatible with Objective-C code overall.
Of course, being Apple, they sold it as a better solution for memory
management and not as a result of their failure to implement a sound GC for
Objective-C.

------
rayiner
People tend to conflate garbage collection with Java's object model, but the
two things are distinct. Most of the time when you're talking about
performance limitations in Java, you're really concerned about Java's object
model, not the GC.

Garbage collection limits your object model, but Java's is much more limited
than required for garbage collection. For example, in C++, you might pass
small structs on the stack or in registers, while in Java you pass a pointer
to a heap allocated object. But there is no reason a garbage collector can't
handle value objects allocated on the stack, and indeed the CLR supports that
feature.

All a garbage collector prevents you from doing is hiding pointers in integers
and the like. But that's not as significant a limitation as not being able to
have, say, flat arrays of structs, something that is totally compatible with
garbage collection.

~~~
pron
Java value-types (+ arrays) are in the works. The JVM already does what is
known as "scalar replacement"
([http://www.stefankrause.net/wp/?p=64](http://www.stefankrause.net/wp/?p=64))
automatically allocating some non-escaping objects on the stack.

------
Pxtl
I loathe the fact that all intermediate options get left off the table -
reference-counting, or single-ownership-with-weak-references, or anything else
short of memory-gobbling full GC that breaks your real-time guarantees.

And C++'s hideous shared_ptr system isn't a good thing to point to.

~~~
lmm
If you have real-time requirements reference counting is worse than GC; it has
the same problem of some operations sometimes taking an unexpectedly long
amount of time, and the worst-case guarantees are weaker.

Single ownership with weak references is effectively what you have in C,
though I guess compiler support for distinguishing which references are weak
or strong is nice.

~~~
Pxtl
A weak reference implies a safe error result, not undefined behavior, if you
access it after the "strong" reference has been released. That's more than
just a bare pointer.

For refcounting and real-time, just let me leak the cycles. The non-realtime
part of reference counting is the cycle detection, let me control the cycle
sweeps and everybody's happy.

~~~
RyanZAG
_> let me control the cycle sweeps and everybody's happy._

Maybe you are happy, but 99.9% of people would probably be very unhappy with
that solution. How would you even go about accomplishing that in a way that
wouldn't make you lose your mind? I can only imagine the complexity of having
to decide how many cycle sweeps need to be done every time I reduce the
reference count on an object/struct. I also think the runtime cost of
enforcing that kind of limit would outweigh any possible benefit.

------
banachtarski
The benefit of malloc/free (or new/delete) is that you have control over your
own memory layout to handle optimizations like cache alignment, keeping
relevant data contiguous, deferring pauses that would introduce latency and
such.

~~~
pron
Memory layout is a separate discussion. The kind of layout optimizations
you're describing require work, and if you're willing to put in this work, you
can do it in GCed environments as well (like the JVM, with finer-grained
control coming in future versions).

Otherwise, several/most of the JVM's GCs do a very good job of handling this
for you. Objects that were allocated consecutively by the same thread will
tend to be kept close together in memory.

~~~
banachtarski
That's contiguity, but alignment is a different thing. At a certain point,
though, the two become one and the same. On the one hand, you could put
layout optimizations into a GC system. On the other hand, you can use things
like shared_ptr and effectively write your own GC even.

I guess the debate really isn't an A vs B thing. It's just a spectrum and
deciding which side of the spectrum you want to start on.

~~~
pron
> you can use things like shared_ptr and effectively write your own GC even.

This is hardly the case, especially on modern multi-core hardware. Writing a
GC that works well in a concurrent environment is a huge undertaking. And
unless your system is very simple, you'll often be better off with a good
general-purpose GC.

A shared_ptr introduces _a lot_ of contention.

~~~
ori_b
The thing is that most garbage generated is thread local. In fact, depending
on the program, it's likely that most of it follows a stack discipline, albeit
one that's not delimited by function boundaries.

A simple GC (or resettable arena allocator) can make use of this knowledge,
and be quite a bit faster in those specific cases.

It's hard to beat the man years of work in generic GC that works in all cases,
but it's not that hard to beat it in specific situations when the application
author knows the details of how the application will do the allocations.

99% of the time, writing your own custom allocation strategies is a waste of
time. Sometimes, though, it can be a win, allowing you to allocate _and_ free
in a matter of tens of CPU cycles with no GC pauses at all.

~~~
lmm
>99% of the time, writing your own custom allocation strategies is a waste of
time. Sometimes, though, it can be a win, allowing you to allocate and free in
a matter of tens of CPU cycles with no GC pauses at all.

Agreed, but again, if you're willing to put that effort in you can do it in a
GCed language.

------
jrochkind1
> What makes a difference is gc pauses. Imho if you can live without managed
> memory. You can do this in java as well and eliminate gc pauses.

Wait, how can you turn off GC'd managed memory in Java, and do it yourself? I
didn't know you could do that in Java?

~~~
fauigerzigerk
Now everyone will tell you that you can use ByteBuffer or sun.misc.Unsafe to
keep memory off-heap and hence out of sight of the GC.

BUT what that means is that you have to create your own custom
implementations of the data structures you need, and you'll have to write
your own memory allocator.

Unless that memory area holds just a simple data BLOB (like a video file)
you're always going to be much more productive implementing the entire thing
in C++.

~~~
pjmlp
> Unless that memory area holds just a simple data BLOB (like a video file)
> you're always going to be much more productive implementing the entire thing
> in C++.

Or use a GC-enabled systems programming language that provides control over
whether something is allocated via GC or via other means.

------
_pmf_
> In my experience the biggest difference is stack allocation

This is the only voice of reason in this thread full of bullshit by monkeys
who have obviously never used C to any extent.

~~~
pjmlp
Yes we did.

Some of us are even older than C, and are thus aware that there used to be a
time and age where C only mattered for UNIX developers.

